"Cuda" counterpart for Apple M1 computers #149

Open
areyoukidneyme opened this issue Aug 23, 2022 · 25 comments

@areyoukidneyme

I'm running CellBender on my M1 MacBook, but it seems the "cuda" argument for GPU use does not work on M1 computers. Is there a way to use the GPU on M1 devices when running CellBender?

Thanks in advance!

@sjfleming
Member

If this source is to be believed
https://towardsdatascience.com/gpu-acceleration-comes-to-pytorch-on-m1-macs-195c399efcc1

then it looks like the M1 GPU might work in PyTorch 1.12+.

It looks like if we want cellbender to use the M1 GPU, then we have to use the device "mps" rather than "cuda". That's probably the only change that would need to be made.
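
A minimal sketch of what choosing "mps" instead of "cuda" could look like in PyTorch, assuming PyTorch 1.12 or newer for torch.backends.mps. The helper names pick_device and detect_device are made up for illustration and are not part of CellBender.

```python
def pick_device(cuda_available, mps_available):
    """Choose a PyTorch device string from two availability flags."""
    if cuda_available:
        return "cuda"   # NVIDIA GPU
    if mps_available:
        return "mps"    # Apple-silicon GPU via Metal Performance Shaders
    return "cpu"


def detect_device():
    """Query PyTorch for available backends; fall back to "cpu" without PyTorch."""
    try:
        import torch
    except ImportError:
        return "cpu"
    # torch.backends.mps is absent in PyTorch releases older than 1.12.
    mps_backend = getattr(torch.backends, "mps", None)
    mps_ok = mps_backend is not None and mps_backend.is_available()
    return pick_device(torch.cuda.is_available(), mps_ok)


print(detect_device())
```

The returned string can be passed to torch.device(...) or to tensor.to(...).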

I can try to make these changes if you're interested! But I would need you to test it out, as I don't have an Apple M1 myself :)
Want to test it @areyoukidneyme ?

@sjfleming sjfleming self-assigned this Sep 22, 2022
@sjfleming sjfleming added the enhancement New feature or improvement label Sep 22, 2022
@areyoukidneyme
Author

Hello @sjfleming !

I'd love to try this out. So are you saying that the code should look something like this?

cellbender remove-background \
    --input 'raw_feature_bc_matrix.h5' \
    --output 'X_cellbender_output.h5' \
    --expected-cells 10000 \
    --total-droplets-included 30000 \
    --epochs 150 \
    --z-dim 200 \
    --z-layers 1000 \
    --mps

Just use --mps instead of --cuda?

Update: I tried this and it seems like it's not working.

I've seen this link before: https://towardsdatascience.com/gpu-acceleration-comes-to-pytorch-on-m1-macs-195c399efcc1 . I did install PyTorch, but I'm not very familiar with it, and I'm not well-versed in Python either. If you could walk me through what I should do, that would be great.

I'd love to help figure this out.

Thank you!

@sjfleming
Member

Hi @areyoukidneyme , okay great to hear. Well, I will have to make some code changes in cellbender actually, before you can try that out. (I will make it so that you can use --mps instead of --cuda, but the code currently will not allow that.)
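
For illustration, a command-line parser could accept either flag along these lines. This is only a sketch using Python's standard argparse module; the parser name and both helper functions are hypothetical, and this is not CellBender's actual argument-handling code.

```python
import argparse


def make_parser():
    """Build a toy parser that accepts at most one of --cuda / --mps."""
    parser = argparse.ArgumentParser(prog="demo-remove-background")
    group = parser.add_mutually_exclusive_group()
    group.add_argument("--cuda", action="store_true",
                       help="run on an NVIDIA GPU")
    group.add_argument("--mps", action="store_true",
                       help="run on an Apple-silicon GPU")
    return parser


def device_from_args(args):
    """Translate the parsed flags into a PyTorch device string."""
    if args.cuda:
        return "cuda"
    if args.mps:
        return "mps"
    return "cpu"


args = make_parser().parse_args(["--mps"])
print(device_from_args(args))
```

Putting the two flags in a mutually exclusive group makes argparse reject a command line that asks for both devices at once.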

I will let you know once I get something ready for you to test out.

@areyoukidneyme
Author

Gotcha! Just let me know and I will try it out.

Honestly, I kind of stopped using CellBender for my dataset because it took 20+ hours to finish one sample without the --cuda option. I don't know how long it takes with --cuda, since I haven't gotten it to work on my Windows computer, but I believe using the GPU will help a lot.

Anyway, thank you and let me know!

@sjfleming
Member

Yes, on a GPU, the code should run in about 1.5 hours. Still kind of longer than I'd like, but much more manageable.

In the meantime, have you ever tried using a GPU on Google Colab? Some users told me they have been able to successfully run cellbender there for free. (You can get kicked off the machine randomly though...)
https://colab.research.google.com

You can run command-line commands from a Jupyter notebook cell in Google Colab by starting the line with !, like

!cellbender remove-background --cuda --input input_file.h5 --output output_file.h5

Or, if by chance you are part of a research lab that uses Terra for cloud compute (app.terra.bio), you can run the cellbender workflow there on google cloud GPUs. It's all set up for you. But you do have to pay for compute.
https://app.terra.bio/#workspaces/help-terra/CellBender

sjfleming added a commit that referenced this issue Sep 22, 2022
@sjfleming
Member

Okay, when you get a chance, could you install cellbender from the sf_pytorch_mps_backend branch on this GitHub repository? Then try the command you tried before:

cellbender remove-background \
    --mps \
    --input 'raw_feature_bc_matrix.h5' \
    --output 'X_cellbender_output.h5' \
    --expected-cells 10000 \
    --total-droplets-included 30000 \
    --epochs 150 \
    --z-dim 200 \
    --z-layers 1000

@sjfleming
Member

(I have not been able to test this, so we might need to go back-and-forth a bit until any errors I've made are corrected...)

@areyoukidneyme
Author

areyoukidneyme commented Sep 22, 2022

> Okay, when you get a chance, could you install cellbender from the sf_pytorch_mps_backend branch on this GitHub repository? Then try the command you tried before:
>
> cellbender remove-background \
>     --mps \
>     --input 'raw_feature_bc_matrix.h5' \
>     --output 'X_cellbender_output.h5' \
>     --expected-cells 10000 \
>     --total-droplets-included 30000 \
>     --epochs 150 \
>     --z-dim 200 \
>     --z-layers 1000

Sorry, what do you mean by "install cellbender from the sf_pytorch_mps_backend branch on this GitHub repository"?

I don't know if this info's gonna help but these are my current conda environments:

# conda environments:
#
base                  *  /Users/jevaremdphd/opt/anaconda3
cellbender               /Users/jevaremdphd/opt/anaconda3/envs/cellbender
torch-nightly            /Users/jevaremdphd/opt/anaconda3/envs/torch-nightly
                         /Users/jevaremdphd/opt/miniconda3/envs/cellbender
                         /Users/jevaremdphd/opt/miniconda3/envs/torch-gpu

@sjfleming
Member

We might also be a little too early for this to work… it looks like PyTorch is not yet fully functional for the MPS backend
pytorch/pytorch#77764

@areyoukidneyme
Author

> We might also be a little too early for this to work… it looks like PyTorch is not yet fully functional for the MPS backend
> pytorch/pytorch#77764

Ooooh. I see.

Anyway, thanks for trying tho! I really appreciate it!

@sjfleming
Member

Well it’s possible it might work! But now I’m slightly more pessimistic

@sjfleming
Member

If you installed the cellbender package with “pip install -e CellBender”, then it’s installed in editable mode, so all you need to do is navigate to that CellBender folder and then do

git pull
git checkout sf_pytorch_mps_backend

That will change your local copy of the code to the new branch, and since you pip installed with “-e”, then you’ll be ready to try it out without needing to reinstall CellBender.

@areyoukidneyme
Author

Hey @sjfleming ! So sorry for getting back at this super late.

I tried doing this

source activate cellbender
git pull
git checkout -b sf_pytorch_mps_backend

but it gives me this error: "fatal: not a git repository (or any of the parent directories): .git". Did I do this right, or did I miss some steps before doing the git pull? Sorry, I am not well-versed in Python.

@sjfleming
Member

Hi @areyoukidneyme , I think I can help. That git checkout command needs to happen in the directory that's the root directory for your clone of the cellbender repository.

So on my laptop for example, I did something like

cd /home/sfleming/github
git clone https://github.com/broadinstitute/CellBender.git CellBender

then the root directory of my clone would be at /home/sfleming/github/CellBender

Then if I want to check out the sf_pytorch_mps_backend branch, I would do

$ cd /home/sfleming/github/CellBender
$ git pull
$ git checkout sf_pytorch_mps_backend

I would then probably also do this (if you are using a conda environment for your cellbender installation, as in the README... source activate is replaced by conda activate for newer conda versions)

$ conda activate cellbender
(cellbender) $ pip install -e /home/sfleming/github/CellBender

Then you should be ready to try it out

@areyoukidneyme
Author

areyoukidneyme commented Oct 5, 2022

So I tried this out, this is what happened:

$ git pull

Already up to date.

$ git checkout -b sf_pytorch_mps_backend

Switched to a new branch 'sf_pytorch_mps_backend'

And then I tried running this:

cellbender remove-background \
    --mps \
    --input 'raw_feature_bc_matrix.h5' \
    --output 'X_cellbender_output.h5' \ 
    --expected-cells 10000 \ 
    --total-droplets-included 30000 \ 
    --epochs 150 \ 
    --z-dim 200 \ 
    --z-layers 1000

I got this error:

usage: cellbender [-h] {remove-background} ...
cellbender: error: unrecognized arguments: --mps
zsh: command not found: --expected-cells
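
The zsh line above is consistent with a space after a trailing backslash: the shell only joins two lines when the backslash is the very last character, so everything from --expected-cells onward would be treated as a separate command. One way to sidestep line continuations is to build the argument list in Python. A minimal sketch; build_command is a hypothetical helper, and --mps exists only on the test branch discussed in this thread.

```python
import shlex
import subprocess


def build_command(input_path, output_path, device_flag=None, **options):
    """Return the cellbender argv list; no shell line continuations involved."""
    argv = ["cellbender", "remove-background"]
    if device_flag:
        argv.append(device_flag)  # e.g. "--cuda", or "--mps" on the test branch
    argv += ["--input", input_path, "--output", output_path]
    for name, value in options.items():
        # expected_cells=10000 becomes "--expected-cells", "10000"
        argv += ["--" + name.replace("_", "-"), str(value)]
    return argv


argv = build_command(
    "raw_feature_bc_matrix.h5",
    "X_cellbender_output.h5",
    device_flag="--mps",
    expected_cells=10000,
    total_droplets_included=30000,
    epochs=150,
)
print(shlex.join(argv))  # the equivalent single-line shell command
# subprocess.run(argv, check=True)  # uncomment to actually run cellbender
```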

@sjfleming
Member

Okay it looks like the branch switching worked so that's good. But it seems like you will have to re-run this command:

cd <FOLDER_WITH_CELLBENDER_GITHUB_REPO>
pip install -e .

That should re-install cellbender and allow you to use that new branch (hopefully).
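
One way to check which copy of the code Python actually picks up after reinstalling is to ask where the package would be imported from. A minimal sketch, assuming the package imports under the name cellbender; package_location is a hypothetical helper.

```python
import importlib.util


def package_location(name):
    """Return the file a top-level package would be imported from, or None."""
    spec = importlib.util.find_spec(name)
    if spec is None:
        return None  # the package is not importable in this environment
    return spec.origin


# For an editable install, this path should point inside the cloned repository.
print(package_location("cellbender"))
```

Run this inside the activated conda environment; if the printed path is not inside your clone, the environment is still using a different installation.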

@asabjorklund

I installed from the origin/sf_pytorch_mps_backend branch which worked fine after removing the line

-f https://download.pytorch.org/whl/torch_stable.html

in REQUIREMENTS.txt since it threw an error.

But it seems you will have to wait for some more updates to pytorch. When running with --mps flag I get:

NotImplementedError: The operator 'aten::_standard_gamma' is not currently implemented for the MPS device. If you want this op to be added in priority during the prototype phase of this feature, please comment on https://github.com/pytorch/pytorch/issues/77764. As a temporary fix, you can set the environment variable PYTORCH_ENABLE_MPS_FALLBACK=1 to use the CPU as a fallback for this op. WARNING: this will be slower than running natively on MPS.

I have pytorch=1.13.0 installed via conda.

So I will just have to run the slow cpu version for now I guess.

@sjfleming
Member

Oh very interesting, thank you @asabjorklund !

(I guess you can also try setting that environment variable and see if other things are still faster on MPS.)
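
A minimal sketch of setting that variable from Python rather than from the shell. To be safe, it is set before PyTorch is first imported; enable_mps_fallback is a hypothetical helper name, not part of PyTorch or CellBender.

```python
import os


def enable_mps_fallback(environ=os.environ):
    """Ask PyTorch to run ops that MPS lacks on the CPU instead of raising.

    An existing value is left untouched, so a user can still opt out
    by exporting PYTORCH_ENABLE_MPS_FALLBACK=0 beforehand.
    """
    environ.setdefault("PYTORCH_ENABLE_MPS_FALLBACK", "1")
    return environ["PYTORCH_ENABLE_MPS_FALLBACK"]


enable_mps_fallback()  # call this before the first `import torch`
```

The shell equivalent is to prefix the command with PYTORCH_ENABLE_MPS_FALLBACK=1.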

I will be trying a small change to see if it speeds up CPU compute soon #160

@asabjorklund

@sjfleming I already tested setting the PYTORCH_ENABLE_MPS_FALLBACK=1 but it failed with error:

/Users/asabjor/miniconda3-arm64/envs/cellbender2/lib/python3.8/site-packages/torch/distributions/gamma.py:11: UserWarning: The operator 'aten::_standard_gamma' is not currently supported on the MPS backend and will fall back to run on the CPU. This may have performance implications. (Triggered internally at /Users/runner/work/_temp/anaconda/conda-bld/pytorch_1666646603923/work/aten/src/ATen/mps/MPSFallback.mm:11.)
  return torch._standard_gamma(concentration)
loc("mps_multiply"("(mpsFileLoc): /AppleInternal/Library/BuildRoots/a0876c02-1788-11ed-b9c4-96898e02b808/Library/Caches/com.apple.xbs/Sources/MetalPerformanceShadersGraph/mpsgraph/MetalPerformanceShadersGraph/Core/Files/MPSGraphUtilities.mm":219:0)): error: input types 'tensor<300x500xf32>' and 'tensor<1xi64>' are not broadcast compatible
LLVM ERROR: Failed to infer result type(s).

So I gave up and reverted to the cpu version, great to hear that it will be faster soon!

@sjfleming
Member

Hmm, okay thank you for posting this... I am not sure what's going on there unfortunately. Bummer that it doesn't work. Seems to be this pytorch/pytorch#78429 so I'll try to check what the solution is there.

@BradBalderson

Hey! This is an awesome tool, thank you for developing it. I am also trying to get this running on my M3 Mac, and I have the same issue with --cuda. Is there any update on this?

I will try the branch indicated above in the meantime, and if it doesn't work and I'm feeling motivated enough, I will see if I can update cellbender's code and put in a pull request :) Thanks!

@BradBalderson

BradBalderson commented Dec 4, 2023

OK so I got this going on my M3 mac now: https://github.com/BradBalderson/CellBender

Works with:
cellbender remove-background --mps --input {sample_input_counts} --output {sample_out_dir}

Note that I added this to the top of the run script:
os.environ['PYTORCH_ENABLE_MPS_FALLBACK'] = '1'

This ensures that components of PyTorch that are not implemented for the MPS device fall back to the CPU. I also had to make sure I had the latest versions of pyro and pytorch installed. These are my exact versions of the major CellBender dependencies:

pyro-ppl==1.8.6
torch==2.1.1

I determined the correct pytorch for my machine here: https://pytorch.org/get-started/locally/

At first I did not have the latest PyTorch version and so ran into more MPS issues (in particular, torch.where would not work).
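
To compare an environment against the versions listed above, the installed versions can be read from package metadata. A minimal sketch using the standard library; installed_version is a hypothetical helper, and the distribution names torch and pyro-ppl are taken from the list above.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string of a distribution, or None."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None  # not installed in the active environment


for name in ("torch", "pyro-ppl"):
    print(name, installed_version(name))
```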

Thanks for a nice tool! I will put in a pull request. I have not tested whether I broke cuda, though (I tried to write it in a way that should be more backend agnostic). Perhaps @sjfleming could test? Thank you!

@BradBalderson

BradBalderson commented Dec 11, 2023

I don't know why, but with the above implementation it runs for a few epochs and then I get a strange error about two tensors not being the same shape. I will debug next time I need to run it, but I thought I'd flag this for other users who try my M3 Apple silicon implementation. If I run the equivalent on CPU, no such error occurs. I think there is something M3-specific happening in the PyTorch backend that will be tricky to debug.

@kulinseth

@BradBalderson , can you please file a GitHub issue for module:mps with a way to reproduce the problem? We will take a look.
