Getting KeyError: 'nvrtc' on CPU-only machine #75

Open
geoffreyangus opened this issue Nov 8, 2022 · 2 comments

@geoffreyangus

Versions:

  • Python 3.9.14
  • macOS 12.6
  • 292984c
  • ludwig@1d8154f

Hello!

I'd like to add S4 into the Ludwig OSS project. I've successfully imported and initialized the S4 module. However, in the forward pass, I am running into the following error:

[KeOps] Warning : Cuda libraries were not detected on the system ; using cpu only mode
Traceback (most recent call last):
  File "/Users/geoffreyangus/repositories/predibase/ludwig/ludwig/modules/s4_modules.py", line 1670, in <module>
    main()
  File "/Users/geoffreyangus/repositories/predibase/ludwig/ludwig/modules/s4_modules.py", line 1666, in main
    module(torch.randn(2, 16, 100))
  File "/Users/geoffreyangus/repositories/predibase/ludwig/venv39/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1423, in _call_impl
    return forward_call(*input, **kwargs)
  File "/Users/geoffreyangus/repositories/predibase/ludwig/ludwig/modules/s4_modules.py", line 1582, in forward
    k, k_state = self.kernel(L=L_kernel, rate=rate, state=state)  # (C H L) (B C H L)
  File "/Users/geoffreyangus/repositories/predibase/ludwig/venv39/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1423, in _call_impl
    return forward_call(*input, **kwargs)
  File "/Users/geoffreyangus/repositories/predibase/ludwig/ludwig/modules/s4_modules.py", line 1388, in forward
    return self.kernel(state=state, L=L, rate=rate)
  File "/Users/geoffreyangus/repositories/predibase/ludwig/venv39/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1423, in _call_impl
    return forward_call(*input, **kwargs)
  File "/Users/geoffreyangus/repositories/predibase/ludwig/ludwig/modules/s4_modules.py", line 821, in forward
    r = cauchy_conj(v, z, w)
  File "/Users/geoffreyangus/repositories/predibase/ludwig/ludwig/modules/s4_modules.py", line 59, in cauchy_conj
    r = 2 * cauchy_mult(v, z, w, backend="GPU")
  File "/Users/geoffreyangus/repositories/predibase/ludwig/venv39/lib/python3.9/site-packages/pykeops/torch/generic/generic_red.py", line 624, in __call__
    out = GenredAutograd.apply(
  File "/Users/geoffreyangus/repositories/predibase/ludwig/venv39/lib/python3.9/site-packages/pykeops/torch/generic/generic_red.py", line 78, in forward
    myconv = keops_binder["nvrtc" if tagCPUGPU else "cpp"](
KeyError: 'nvrtc'

Here is the code snippet I am currently running from within the file implementing the S4 class:

def main():
    module = S4(
        16,
        gate=4,  # Multiplicative gating layer that also expands dimension by factor of 4
        bottleneck=4,  # Reduce dimension of SSM by factor of 4
        measure="legs",  # Randomly initialize A
        dt_min=1.0,
        dt_max=1.0,  # Initialize dt to 1.0
        lr={"dt": 0.0, "B": 0.0},  # Freeze B and dt
    )
    module(torch.randn(2, 16, 100))


if __name__ == "__main__":
    main()

To reproduce the error:

  1. Download Ludwig on a CPU-only machine and cd into the repository.
  2. Check out the hack-s4 branch.
  3. Run python ludwig/modules/s4_modules.py.

Let me know what you think, thanks!

@albertfgu
Contributor

Thanks for the report! I've actually never tested the code on CPU. It looks like your error is occurring in the pykeops package specifically. I'm trying to figure out whether it's a problem with your pykeops installation, an issue with pykeops on CPU in general, something that only happens on a CPU-only machine, or something specific to the integration with Ludwig.

  1. Could you follow the instructions at https://www.kernel-operations.io/keops/python/installation.html#testing-your-installation to see if pykeops works on your machine?
  2. On my standard machine, I turned off the GPU through pytorch-lightning and checked that the code goes through the pykeops codepath on CPU. It seems to work fine.
  3. I'll try installing the repo on a MacBook and see if the issue persists.
  4. If everything else works fine, I can try following your instructions to reproduce the problem with Ludwig.

It's probably easier to just disable pykeops on CPU, as I doubt it adds much in that case (won't CPU be really slow for any reasonable use case?). You can try to pip uninstall pykeops on this machine and see if it can run.
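For reference, the traceback shows cauchy_conj calling cauchy_mult with backend="GPU" hard-coded, while KeOps reports CPU-only mode, which is consistent with the failed "nvrtc" lookup. A minimal sketch of picking the backend at runtime could look like the following. The helper names (module_installed, cuda_available, choose_cauchy_backend) and the "naive" label are hypothetical and not part of S4 or pykeops; "CPU" and "GPU" are assumed here to be the backend strings pykeops accepts.

```python
import importlib.util


def module_installed(name: str) -> bool:
    """Return True if a top-level module can be imported on this machine."""
    return importlib.util.find_spec(name) is not None


def cuda_available() -> bool:
    """Return True only if torch is installed and reports a usable CUDA device."""
    if not module_installed("torch"):
        return False
    import torch

    return torch.cuda.is_available()


def choose_cauchy_backend(has_pykeops: bool, has_cuda: bool) -> str:
    """Pick a backend label for the Cauchy kernel (hypothetical helper).

    "naive" is a made-up label meaning "skip pykeops entirely";
    "GPU" and "CPU" are meant to be passed on as the pykeops backend argument.
    """
    if not has_pykeops:
        return "naive"
    return "GPU" if has_cuda else "CPU"


if __name__ == "__main__":
    print(choose_cauchy_backend(module_installed("pykeops"), cuda_available()))
```

With something like this, the hard-coded backend="GPU" in cauchy_conj could be replaced by the returned value, so a CPU-only machine never requests the GPU path.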

@albertfgu
Contributor

Update: I tried running the code on a MacBook and ran into weird pickling issues during dataloading that likely stem from pytorch-lightning / torchvision. Given that CPU-only workflows are pretty suboptimal and rare (do you think it's likely that your users will run deep sequence models on CPU?), I think the best solution for now is to just uninstall pykeops if it causes issues. I'm not sure it even provides any speedup on CPU, and the memory savings are much less relevant than on an accelerator. Can you check step (1) above to see whether the pykeops installation is actually correct and, if so, whether uninstalling it fixes the issue?
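To make the trade-off concrete, here is a rough pure-Python reference for what a Cauchy kernel without pykeops would compute, under the assumption that cauchy_mult(v, z, w) evaluates sum_n v[n] / (z[l] - w[n]) for each z[l]. That definition is an assumption on my part and is not stated in this thread, and cauchy_reference is a hypothetical name. It materializes every (l, n) term, which is the memory cost pykeops is meant to avoid.

```python
from typing import List, Sequence


def cauchy_reference(
    v: Sequence[complex], z: Sequence[complex], w: Sequence[complex]
) -> List[complex]:
    """Naive Cauchy sum: out[l] = sum_n v[n] / (z[l] - w[n]).

    Assumed definition, for illustration only. Costs O(L * N) time.
    """
    if len(v) != len(w):
        raise ValueError("v and w must have the same length")
    return [sum(vn / (zl - wn) for vn, wn in zip(v, w)) for zl in z]


if __name__ == "__main__":
    v = [1 + 0j, 2 + 0j]
    w = [0j, 1 + 0j]
    z = [2 + 0j, 3 + 0j]
    print(cauchy_reference(v, z, w))
```

A vectorized PyTorch version would follow the same formula with broadcasting; this sketch only shows the arithmetic.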
