GPU queue experiments #446

Draft
wants to merge 15 commits into
base: gpu-tests

Conversation

jaimergp
Member

@jaimergp jaimergp commented Dec 19, 2022

  • Add support in conda-smithy for rerenders
  • Log in with a conda-forge bot account to configure it with Quansight's OpenStack cloud provider
  • Add support for automatic repository registration:
    • Enable the Cirun app for the given repository, on the GitHub end
    • Enable the Cirun OpenStack-GPU runner for the given repository, on the Cirun end
  • Add support in admin-requests
  • Documentation

@conda-forge-linter
Contributor

Hi! This is the friendly automated conda-forge-linting service.

I just wanted to let you know that I linted all conda-recipes in your PR (recipe) and found it was in an excellent condition.

MNT: Re-rendered with conda-build 3.21.7+119.g1b221ef0, conda-smithy 3.22.1.post.dev3, and conda-forge-pinning 2022.12.19.14.36.50
@jaimergp jaimergp closed this Dec 27, 2022
@jaimergp jaimergp reopened this Dec 27, 2022
@jaimergp
Member Author

jaimergp commented Jan 4, 2023

We need to rebuild the VM images with docker support.

.cirun.yml (outdated review thread, resolved)
@conda-forge-webservices
Contributor

Hi! This is the friendly automated conda-forge-linting service.

I just wanted to let you know that I linted all conda-recipes in your PR (recipe) and found it was in an excellent condition.

@jaimergp
Member Author

jaimergp commented Jan 9, 2023

Updated the conda-smithy PR with support for recognizing when extra docker flags (--gpus all) are needed. Next step: nvidia-docker runtime support in the VM!
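
For readers following along, here is a minimal sketch of what "adding the extra docker flags" amounts to. It is not the conda-smithy implementation: the function name, the `use_gpus` switch, and the image name are all hypothetical, and only the `--gpus all` flag itself comes from the comment above.

```python
import shlex


def docker_run_command(image, command, use_gpus=False):
    """Build a `docker run` argument list, optionally requesting all GPUs.

    Hypothetical helper for illustration; not taken from conda-smithy.
    """
    args = ["docker", "run", "--rm"]
    if use_gpus:
        # `--gpus all` asks Docker to expose every host GPU to the container.
        args += ["--gpus", "all"]
    args.append(image)
    args += list(command)
    return args


# "example/build-image:latest" is a placeholder image name.
cmd = docker_run_command("example/build-image:latest", ["nvidia-smi"], use_gpus=True)
print(shlex.join(cmd))
```

The flag only has an effect if the host has a GPU-capable container runtime installed, which is the "next step" mentioned above.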

.cirun.yml (outdated review thread, resolved)
jaimergp and others added 2 commits January 10, 2023 18:50
Co-authored-by: Amit Kumar <dtu.amit@gmail.com>
@jaimergp
Member Author

Good news, the --gpus switch works now!

I think there is some issue with SSL downloads inside the Docker instance. With or without GPUs, we are failing to clone the repo that provides the source code to be built:

fatal: unable to access 'https://github.com/openmm/openmm.git/': SSL connection timeout

I have seen this in the two runs that managed to successfully start conda mambabuild (0486882 and 7901e95).


Another minor detail is that all actions checks are green despite the runner having obviously failed 🤔 See the green checks next to the commit hashes. Have you seen it elsewhere?

@jaimergp
Member Author

Might be an MTU mismatch. I recall we needed to route things through a VPN tunnel, so I guess we are using a lower MTU than the default, while Docker insists on using 1500. The daemon configuration would have to be amended.
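
A rough sketch of the kind of amendment this would involve, assuming the fix is to set the `mtu` key in the Docker daemon configuration file (typically /etc/docker/daemon.json on Linux). The helper name is hypothetical and the value 1400 is only a placeholder; the real value would have to match whatever the tunnel interface reports.

```python
import json


def with_bridge_mtu(daemon_config, mtu):
    """Return a copy of a Docker daemon config dict with the `mtu` key set.

    Hypothetical helper for illustration only.
    """
    if not 68 <= mtu <= 65535:  # 68 is the minimum MTU allowed for IPv4
        raise ValueError(f"implausible MTU: {mtu}")
    updated = dict(daemon_config)
    updated["mtu"] = mtu
    return updated


# 1400 is an assumed placeholder, not the measured MTU of the VPN tunnel.
config = with_bridge_mtu({}, 1400)
print(json.dumps(config, indent=2))
```

The printed JSON would be merged into the daemon configuration file, and the Docker daemon restarted for the change to take effect.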

@jaimergp
Member Author

Well, if you are reading this, please know you are witnessing the first ever conda-build run that has successfully used GPUs in conda-forge 🚀

Relevant block (ignore the warnings, that's OpenCL; omitted here for clarity):

+ python -m openmm.testInstallation

OpenMM Version: 7.7
Git Revision: 130124a3f9277b054ec40927360a6ad20c8f5fa6

There are 4 Platforms available:

1 Reference - Successfully computed forces
2 CPU - Successfully computed forces
3 CUDA - Successfully computed forces
4 OpenCL - Successfully computed forces

Median difference in forces between platforms:

Reference vs. CPU: 6.30618e-06
Reference vs. CUDA: 6.72268e-06
CPU vs. CUDA: 7.09326e-07
Reference vs. OpenCL: 6.74352e-06
CPU vs. OpenCL: 7.29696e-07
CUDA vs. OpenCL: 4.46884e-07

All differences are within tolerance.

Still a long way to go in terms of access control and so on, but this is a BIG milestone! Thanks @aktech for all the work in the backend!
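
Since the checks were showing green even on failed runs, one option would be to assert on this report explicitly. The following is only a sketch under that assumption: the regular expression and the function name are hypothetical, and the sample text is copied from the output above.

```python
import re

# Matches lines such as "3 CUDA - Successfully computed forces".
PLATFORM_LINE = re.compile(r"^\d+\s+(\w+)\s+-\s+(.*)$")


def platform_status(report):
    """Map each platform name in the report to True if it computed forces."""
    status = {}
    for line in report.splitlines():
        match = PLATFORM_LINE.match(line.strip())
        if match:
            name, message = match.groups()
            status[name] = message.startswith("Successfully computed forces")
    return status


sample = """\
There are 4 Platforms available:

1 Reference - Successfully computed forces
2 CPU - Successfully computed forces
3 CUDA - Successfully computed forces
4 OpenCL - Successfully computed forces
"""

status = platform_status(sample)
# A missing or failing CUDA entry raises here, which would fail the calling step.
assert status.get("CUDA"), "CUDA platform did not compute forces"
print(status)
```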

@leofang
Member

leofang commented Jan 10, 2023

Can we try it for CuPy? 🙂 Does it also support aarch64?

@jakirkham
Member

This is nice to see! Thanks Jaime 😄

For CuPy, we can run the tests during the build or on pre-built packages (using conda build --test <pkg file>). So it would make another nice test case.

@jaimergp
Member Author

> Can we try it for CuPy? 🙂 Does it also support aarch64?

The server is native x64, but I guess we can cross-compile and/or emulate?

I'll try CuPy next week. I am out of hours for this one.

@leofang
Member

leofang commented Jan 11, 2023

I am happy to help in any way, just let me know. (And no rush, ofc!) Right now we use QEMU for aarch64/ppc64le, as it's not possible to cross-compile AFAIK.

@jaimergp jaimergp mentioned this pull request Jan 23, 2023
8 tasks
5 participants