[Feature request]: PyG installation instructions (esp. for XPUs) #166

Closed
6 tasks
chaitjo opened this issue Mar 25, 2024 · 5 comments
Labels
enhancement New feature or request needs triage Issue needs decision making

Comments

@chaitjo

chaitjo commented Mar 25, 2024

Feature/behavior summary

I'm trying to get PyG to install and work well with Intel XPUs, and was hoping to use this repository as a reference. At present, I see that PyG is not installed by default, and there are no instructions for setting it up with XPUs.

Request attributes

  • Would this be a refactor of existing code?
  • Does this proposal require new package dependencies?
  • Would this change break backwards compatibility?
  • Does this proposal include a new model?
  • Does this proposal include a new dataset?
  • Does this proposal include a new task/workflow?

Related issues

No response

Solution description

Unknown.

Additional notes

At present, working with a different repository (https://github.com/a-r-j/ProteinWorkshop), I've been trying to integrate your code for the XPU as a new accelerator in PyTorch Lightning: https://github.com/IntelLabs/matsciml/blob/main/matsciml/lightning/xpu.py.

So far, I'm able to get my trainer to identify the XPU as a device, but it seems like some torch_cluster operations are not compatible with tensors stored on XPUs. I would like to perform torch_cluster operations such as knn graph creation on XPU tensors so that I can do data processing in a batched manner or on-the-fly, as opposed to on the CPU.

Here is a minimal example which fails:

import torch
import intel_extension_for_pytorch as ipex
from torch_geometric.nn import knn_graph

device = torch.device('xpu:0' if torch.xpu.is_available() else 'cpu')

x = torch.tensor([[-1.0, -1.0], [-1.0, 1.0], [1.0, -1.0], [1.0, 1.0]]).to(device)
batch = torch.tensor([0, 0, 0, 0]).to(device)
edge_index = knn_graph(x, k=2, batch=batch, loop=False)

The resulting error is RuntimeError: x.device().is_cpu() INTERNAL ASSERT FAILED at "csrc/cpu/knn_cpu.cpp":12, please report a bug to PyTorch. x must be CPU tensor.

And here's a longer trace from the ProteinWorkshop codebase, which probably won't make any sense to MatSciML maintainers.

File "/home/ckj24/rds/hpc-work/envs/proteinworkshop/lib/python3.10/site-packages/torch_geometric/nn/pool/__init__.py", line 171, in knn_graph
    return torch_cluster.knn_graph(x, k, batch, loop, flow, cosine,
  File "/home/ckj24/rds/hpc-work/envs/proteinworkshop/lib/python3.10/site-packages/torch_cluster/knn.py", line 132, in knn_graph
    edge_index = knn(x, x, k if loop else k + 1, batch, batch, cosine,
  File "/home/ckj24/rds/hpc-work/envs/proteinworkshop/lib/python3.10/site-packages/torch_cluster/knn.py", line 81, in knn
    return torch.ops.torch_cluster.knn(x, y, ptr_x, ptr_y, k, cosine,
  File "/home/ckj24/rds/hpc-work/envs/proteinworkshop/lib/python3.10/site-packages/torch/_ops.py", line 692, in __call__
    return self._op(*args, **kwargs or {})
RuntimeError: x.device().is_cpu() INTERNAL ASSERT FAILED at "csrc/cpu/knn_cpu.cpp":12, please report a bug to PyTorch. x must be CPU tensor
@chaitjo chaitjo added the enhancement New feature or request label Mar 25, 2024
@laserkelvin laserkelvin added the needs triage Issue needs decision making label Mar 25, 2024
@laserkelvin
Collaborator

Thanks for bringing this up! That's a good point, I think we've been taking a lot of the dependencies for granted and we'll update the documentation.

Nominally, as of a few versions ago, a lot of the PyG core functionality has been upstreamed into PyTorch (e.g. the torch_scatter functionality), but not everything. That means that, for the most part, PyG by itself should work out of the box on XPUs; however, functionality that lives in the external packages (torch_scatter, torch_cluster, torch_sparse) isn't supported yet. So the error you're seeing is basically because the low-level implementation of knn_graph only exists for CUDA and for CPUs, and it's expecting a tensor that resides on the latter.
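To illustrate the point about core PyTorch ops versus the external packages, here is a minimal sketch of a kNN graph built only from `torch.cdist` and `torch.topk`, so it never dispatches into torch_cluster. The function name `knn_graph_dense` is hypothetical (it is not part of PyG or torch_cluster), and whether these ops are actually implemented on a given XPU backend is an assumption that would need to be checked. It builds a dense N x N distance matrix, so it is only sensible for small point sets.

```python
import torch


def knn_graph_dense(x, k, batch=None, loop=False):
    """Hypothetical dense kNN graph using only core PyTorch ops.

    Returns edge_index of shape (2, E), with neighbours in row 0 and the
    node they are a neighbour of in row 1 (PyG's default
    "source_to_target" flow). Memory use is O(N^2).
    """
    n = x.size(0)
    dist = torch.cdist(x, x)  # (N, N) pairwise Euclidean distances
    inf = float("inf")
    if batch is not None:
        # Forbid edges between nodes that belong to different examples.
        cross = batch.unsqueeze(0) != batch.unsqueeze(1)
        dist = dist.masked_fill(cross, inf)
    if not loop:
        # Forbid self-loops by masking the diagonal.
        eye = torch.eye(n, dtype=torch.bool, device=x.device)
        dist = dist.masked_fill(eye, inf)
    d, nbr = dist.topk(min(k, n), dim=1, largest=False)
    center = torch.arange(n, device=x.device).unsqueeze(1).expand_as(nbr)
    valid = torch.isfinite(d)  # drop masked-out candidates
    return torch.stack([nbr[valid], center[valid]], dim=0)


# Same points as the failing example; shown on CPU here. The intent is that
# the call is unchanged for tensors that live on another device.
x = torch.tensor([[-1.0, -1.0], [-1.0, 1.0], [1.0, -1.0], [1.0, 1.0]])
batch = torch.tensor([0, 0, 0, 0])
edge_index = knn_graph_dense(x, k=2, batch=batch, loop=False)
```

The edge ordering may differ from what torch_cluster returns, so compare edge sets rather than raw tensors when checking it against `knn_graph` on the CPU.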

I'm not 100% sure what our plans are for supporting those supplementary libraries, so they might need to be treated on a case-by-case basis. Please reach out to me via email or Slack and we can discuss this further (even if it's not matsciml related). I'll keep this issue open for now, since I agree we do need to update our PyG + XPU instructions.

@chaitjo
Author

chaitjo commented Mar 25, 2024

Thanks!

What's the current recommended way to install PyG?

I'm currently using:

pip install torch_geometric
pip install torch-scatter torch-cluster

...and this seems fine unless I need some of the torch-cluster functions to run on tensors located on XPUs. Regarding torch-scatter and torch-cluster, PyG's docs also state that these packages 'come with their own CPU and GPU kernel implementations based on the PyTorch C++/CUDA/hip(ROCm) extension interface.' So I suppose there's no real fix yet for my particular use case apart from shifting my computation to the CPU.
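The "shift the computation to the CPU" workaround can be sketched as a small wrapper that copies tensor arguments to the CPU, calls the op there, and moves the result back to the original device. The helper name `call_on_cpu` is hypothetical and not part of PyG or torch_cluster; it assumes the op returns a single tensor and takes the target device from the first positional tensor argument.

```python
import torch


def call_on_cpu(op, *args, **kwargs):
    """Hypothetical helper: run `op` on CPU copies of its tensor arguments.

    The result is moved back to the device of the first positional tensor.
    Every call pays for a device-to-host and a host-to-device copy.
    """
    tensors = [a for a in args if torch.is_tensor(a)]
    target = tensors[0].device if tensors else torch.device("cpu")
    cpu_args = [a.cpu() if torch.is_tensor(a) else a for a in args]
    cpu_kwargs = {
        key: (val.cpu() if torch.is_tensor(val) else val)
        for key, val in kwargs.items()
    }
    out = op(*cpu_args, **cpu_kwargs)
    return out.to(target) if torch.is_tensor(out) else out


# Intended use with the example from this issue (requires torch_geometric
# and torch_cluster, so it is left commented out here):
#   from torch_geometric.nn import knn_graph
#   edge_index = call_on_cpu(knn_graph, x, k=2, batch=batch, loop=False)
```

This keeps the rest of the pipeline on the accelerator, at the cost of the round-trip copies, so it suits graph construction done once per batch rather than inside a hot loop.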

@laserkelvin
Collaborator

Those pip commands should work. If you are super paranoid, you can tack on --no-cache-dir to make sure you're not using a cached version, and also --no-binary :all: to make sure it's built from source. If you have issues, I'd suggest you step through those :)
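When stepping through installation problems, it can help to first confirm which of the relevant packages are actually installed in the active environment. The following is a minimal sketch using only the standard library; the function name `installed_versions` and the default package list are illustrative choices, not something prescribed by PyG or this repository.

```python
from importlib import metadata

# Packages discussed in this thread; adjust as needed.
PACKAGES = ["torch", "torch_geometric", "torch_scatter", "torch_cluster", "torch_sparse"]


def installed_versions(names=PACKAGES):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions().items():
        print(f"{name}: {version if version is not None else 'not installed'}")
```

This only reports what is installed; it does not tell you which device backends a given build was compiled for.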

I've brought up torch_cluster support internally, along with some things we could potentially do, but it will require some time. I'll send you an email separately.

@laserkelvin
Collaborator

@chaitjo do you think I can close this issue?

#198 updated the README, and I think it should be pretty complete - within the bounds of the current status of broader framework support

@chaitjo
Author

chaitjo commented May 29, 2024 via email
