Getting `KeyError: 'nvrtc'` on CPU-only machine #75

geoffreyangus · 2022-11-08T17:18:52Z

Versions:

Python 3.9.14
macOS 12.6
292984c
ludwig@1d8154f

Hello!

I'd like to add S4 to the Ludwig OSS project. I've successfully imported and initialized the S4 module. However, in the forward pass, I am running into the following error:

```
[KeOps] Warning : Cuda libraries were not detected on the system ; using cpu only mode
Traceback (most recent call last):
  File "/Users/geoffreyangus/repositories/predibase/ludwig/ludwig/modules/s4_modules.py", line 1670, in <module>
    main()
  File "/Users/geoffreyangus/repositories/predibase/ludwig/ludwig/modules/s4_modules.py", line 1666, in main
    module(torch.randn(2, 16, 100))
  File "/Users/geoffreyangus/repositories/predibase/ludwig/venv39/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1423, in _call_impl
    return forward_call(*input, **kwargs)
  File "/Users/geoffreyangus/repositories/predibase/ludwig/ludwig/modules/s4_modules.py", line 1582, in forward
    k, k_state = self.kernel(L=L_kernel, rate=rate, state=state)  # (C H L) (B C H L)
  File "/Users/geoffreyangus/repositories/predibase/ludwig/venv39/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1423, in _call_impl
    return forward_call(*input, **kwargs)
  File "/Users/geoffreyangus/repositories/predibase/ludwig/ludwig/modules/s4_modules.py", line 1388, in forward
    return self.kernel(state=state, L=L, rate=rate)
  File "/Users/geoffreyangus/repositories/predibase/ludwig/venv39/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1423, in _call_impl
    return forward_call(*input, **kwargs)
  File "/Users/geoffreyangus/repositories/predibase/ludwig/ludwig/modules/s4_modules.py", line 821, in forward
    r = cauchy_conj(v, z, w)
  File "/Users/geoffreyangus/repositories/predibase/ludwig/ludwig/modules/s4_modules.py", line 59, in cauchy_conj
    r = 2 * cauchy_mult(v, z, w, backend="GPU")
  File "/Users/geoffreyangus/repositories/predibase/ludwig/venv39/lib/python3.9/site-packages/pykeops/torch/generic/generic_red.py", line 624, in __call__
    out = GenredAutograd.apply(
  File "/Users/geoffreyangus/repositories/predibase/ludwig/venv39/lib/python3.9/site-packages/pykeops/torch/generic/generic_red.py", line 78, in forward
    myconv = keops_binder["nvrtc" if tagCPUGPU else "cpp"](
KeyError: 'nvrtc'
```
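The last frame appears to be a plain dictionary lookup: pykeops picks a compiled binder keyed `"nvrtc"` when the GPU backend is requested and `"cpp"` otherwise. The KeOps warning at the top says no CUDA libraries were found, so presumably no `"nvrtc"` binder was ever registered, and the hard-coded `backend="GPU"` in `cauchy_conj` fails at the lookup. A minimal stdlib-only sketch of that failure mode; `binders`, `pick_binder` and `pick_binder_safe` are hypothetical stand-ins, not pykeops internals:

```python
# Hypothetical stand-in for pykeops' binder registry on a CPU-only machine:
# only the C++ ("cpp") binder is registered, no CUDA ("nvrtc") binder.
binders = {"cpp": "cpu kernel"}


def pick_binder(backend: str):
    """Mimic the lookup in the traceback: GPU backend -> "nvrtc", else "cpp"."""
    tag_cpu_gpu = backend == "GPU"  # hypothetical simplification of tagCPUGPU
    return binders["nvrtc" if tag_cpu_gpu else "cpp"]


def pick_binder_safe(backend: str):
    """Hypothetical guard: downgrade to CPU when no GPU binder is registered."""
    if backend == "GPU" and "nvrtc" not in binders:
        backend = "CPU"
    return pick_binder(backend)


try:
    pick_binder("GPU")
except KeyError as err:
    print("KeyError:", err)  # prints KeyError: 'nvrtc'

print(pick_binder_safe("GPU"))  # prints cpu kernel
```

In the real code the equivalent guard would live in the caller, i.e. not forcing `backend="GPU"` when CUDA is unavailable.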

Here is the code snippet I am currently running from within the file implementing the S4 class:

```python
def main():
    module = S4(
        16,
        gate=4,  # Multiplicative gating layer that also expands dimension by factor of 4
        bottleneck=4,  # Reduce dimension of SSM by factor of 4
        measure="legs",  # Randomly initialize A
        dt_min=1.0,
        dt_max=1.0,  # Initialize dt to 1.0
        lr={"dt": 0.0, "B": 0.0},  # Freeze B and dt
    )
    module(torch.randn(2, 16, 100))


if __name__ == "__main__":
    main()
```

To reproduce the error:

1. Download Ludwig on a CPU-only machine and `cd` into the repository.
2. Check out the `hack-s4` branch.
3. Run `python ludwig/modules/s4_modules.py`.

Let me know what you think, thanks!


albertfgu · 2022-11-09T00:43:04Z

Thanks for the report! Actually, I've never tested the code on CPU. It looks like your error is occurring in the pykeops package specifically. I'm trying to figure out whether it's a problem with your pykeops installation, a general issue with pykeops on CPU, something that only happens on a CPU-only machine, or something specific to the integration with Ludwig.

1. Could you follow the instructions at https://www.kernel-operations.io/keops/python/installation.html#testing-your-installation to check whether pykeops works on your machine?
2. On my standard machine, I turned off the GPU through pytorch-lightning and confirmed that it goes through the pykeops codepath on CPU. It seems to work fine.
3. I'll try installing the repo on a MacBook and see if the issue persists.
4. If everything else works, I can follow your instructions to reproduce the problem with Ludwig.
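Step 1 can be scripted. A minimal sketch, assuming the `pykeops.test_torch_bindings()` helper described on the KeOps installation page linked above; `check_pykeops` is a hypothetical wrapper, and whether a broken build raises an exception or only prints an error may depend on the pykeops version:

```python
def check_pykeops() -> str:
    """Report whether pykeops imports and whether its torch bindings compile and run."""
    try:
        import pykeops
    except ImportError:
        return "pykeops not installed"
    try:
        # Compiles and runs a small test reduction (see "Testing your installation").
        pykeops.test_torch_bindings()
    except Exception as err:  # broad on purpose: build failures differ across platforms
        return f"pykeops installed but torch bindings failed: {err!r}"
    return "pykeops torch bindings OK"


if __name__ == "__main__":
    print(check_pykeops())
```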

It's probably easier to just disable pykeops on CPU, as I doubt it adds much there (won't CPU be really slow for any reasonable use case?). You can try running `pip uninstall pykeops` on this machine and see whether the code runs.
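With pykeops disabled, the kernel still has to evaluate the Cauchy sum somehow. A minimal pure-PyTorch sketch, assuming `cauchy_conj` computes the conjugate-pair sum its name and the `2 *` factor suggest, i.e. the sum over n of v_n / (z - w_n) + conj(v_n) / (z - conj(w_n)). `cauchy_conj_naive`, `HAS_PYKEOPS` and `use_keops` are hypothetical names, not S4's API:

```python
import torch

try:
    import pykeops  # noqa: F401  (only probing whether it is installed)
    HAS_PYKEOPS = True
except ImportError:
    HAS_PYKEOPS = False


def cauchy_conj_naive(v: torch.Tensor, z: torch.Tensor, w: torch.Tensor) -> torch.Tensor:
    """Hypothetical pure-PyTorch Cauchy sum over conjugate pairs.

    v, w: (..., N) complex; z: (..., L) complex.
    Returns (..., L): sum_n v_n / (z - w_n) + conj(v_n) / (z - conj(w_n)).
    Materializes a (..., 2N, L) tensor, so memory grows with N * L.
    """
    v = torch.cat([v, v.conj()], dim=-1)  # (..., 2N): append conjugates
    w = torch.cat([w, w.conj()], dim=-1)  # (..., 2N)
    terms = v.unsqueeze(-1) / (z.unsqueeze(-2) - w.unsqueeze(-1))  # (..., 2N, L)
    return terms.sum(dim=-2)  # reduce over the state dimension


def use_keops(x: torch.Tensor) -> bool:
    """Hypothetical policy: take the pykeops GPU path only for CUDA tensors."""
    return HAS_PYKEOPS and x.is_cuda and torch.cuda.is_available()
```

A caller such as the kernel's forward pass could then branch with `r = cauchy_conj(v, z, w) if use_keops(v) else cauchy_conj_naive(v, z, w)`. The naive version pays the N x L memory cost that pykeops avoids, which, as discussed here, matters much less on CPU.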

albertfgu · 2022-11-10T17:03:48Z

Update: I tried running the code on a MacBook and ran into weird pickling issues during dataloading that likely stem from pytorch-lightning / torchvision. Given that CPU-only workflows are pretty suboptimal and rare (do you think it's likely that your users will run deep sequence models on CPU?), I think the best solution for now is to just uninstall pykeops if it causes issues. I'm not sure it even provides any speedup on CPU, and the memory savings are much less relevant than on an accelerator. Can you check step (1) above to see whether the pykeops installation is actually correct, and if so, whether uninstalling it fixes the issue?
