GPU queue experiments #446

jaimergp · 2022-12-19T11:15:57Z

- Add support in conda-smithy for rerenders
  - Add support for Cirun on self-hosted GHA runners conda-smithy#1703
- Log in with a conda-forge bot account to configure it with Quansight's OpenStack cloud provider
- Add support for automatic repository registration:
  - Enable Cirun app for given repository, on GitHub end
  - Enable Cirun OpenStack-GPU runner for given repository, on Cirun end
- Add support in admin-requests
- Documentation

conda-forge-linter · 2022-12-19T11:16:02Z

Hi! This is the friendly automated conda-forge-linting service.

I just wanted to let you know that I linted all conda-recipes in your PR (recipe) and found it was in an excellent condition.

MNT: Re-rendered with conda-build 3.21.7+119.g1b221ef0, conda-smithy 3.22.1.post.dev3, and conda-forge-pinning 2022.12.19.14.36.50

jaimergp · 2023-01-04T13:15:53Z

We need to rebuild the VM images with docker support.

conda-forge-webservices · 2023-01-09T10:18:48Z

Hi! This is the friendly automated conda-forge-linting service.

I just wanted to let you know that I linted all conda-recipes in your PR (recipe) and found it was in an excellent condition.

jaimergp · 2023-01-09T10:33:06Z

Updated the conda-smithy PR with support for recognizing the need for extra docker flags (--gpus all). Next step: nvidia-docker runtime support in the VM!
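
As an illustration of the flag handling described above, here is a minimal sketch of how a build script might append `--gpus all` to a `docker run` invocation. The function name `docker_run_args`, the `needs_gpu` parameter and the image name are hypothetical and not taken from the conda-smithy PR; only the `--gpus all` flag comes from this thread.

```python
from typing import List


def docker_run_args(image: str, command: List[str], needs_gpu: bool = False) -> List[str]:
    """Build the argv for `docker run`, optionally exposing all GPUs.

    Hypothetical helper: not the actual conda-smithy implementation.
    """
    args = ["docker", "run", "--rm"]
    if needs_gpu:
        # `--gpus all` asks Docker to expose every host GPU to the container.
        args += ["--gpus", "all"]
    args.append(image)
    args += command
    return args


if __name__ == "__main__":
    # Placeholder image name, used only for illustration.
    print(" ".join(docker_run_args("example/image:latest", ["nvidia-smi"], needs_gpu=True)))
```

Keeping the flag behind a boolean means recipes that do not need a GPU keep the same `docker run` invocation as before.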

jaimergp · 2023-01-10T18:17:10Z

Good news, the --gpus switch works now!

I think there is some issue with SSL downloads inside the Docker instance. With or without GPUs, we are failing to clone the repo that provides the source code to be built:

fatal: unable to access 'https://github.com/openmm/openmm.git/': SSL connection timeout

I have seen this in the two runs that managed to successfully start conda mambabuild (0486882 and 7901e95).

Another minor detail is that all actions checks are green despite the runner having obviously failed 🤔 See the green checks next to the commit hashes. Have you seen it elsewhere?

jaimergp · 2023-01-10T18:29:29Z

Might be an MTU mismatch. I recall we needed to route things through a VPN tunnel, so I guess we are using a lower MTU than the default, and Docker is insisting on using 1500. The daemon configuration would have to be amended.
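
For reference, a minimal sketch of what amending the daemon configuration could look like. It assumes the daemon reads its settings from a JSON file such as `/etc/docker/daemon.json` and uses 1400 as a placeholder MTU; the value actually required by the VPN tunnel is not stated in this thread, and the helper name `with_mtu` is hypothetical.

```python
import json


def with_mtu(config_text: str, mtu: int) -> str:
    """Return daemon.json content with the `mtu` key set, keeping other keys.

    Hypothetical helper; `config_text` is the current file content (may be empty).
    """
    config = json.loads(config_text) if config_text.strip() else {}
    config["mtu"] = mtu
    return json.dumps(config, indent=2, sort_keys=True)


if __name__ == "__main__":
    # Example existing configuration, used only for illustration.
    existing = '{"log-driver": "json-file"}'
    print(with_mtu(existing, 1400))
```

The resulting text would have to be written back with root privileges, and the Docker daemon restarted, before the new MTU applies.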

jaimergp · 2023-01-10T20:25:14Z

Well, if you are reading this, please know you are witnessing the first ever conda-build run that has successfully used GPUs in conda-forge 🚀

Relevant block (ignore the warnings, that's OpenCL; omitted here for clarity):

+ python -m openmm.testInstallation

OpenMM Version: 7.7
Git Revision: 130124a3f9277b054ec40927360a6ad20c8f5fa6

There are 4 Platforms available:

1 Reference - Successfully computed forces
2 CPU - Successfully computed forces
3 CUDA - Successfully computed forces
4 OpenCL - Successfully computed forces

Median difference in forces between platforms:

Reference vs. CPU: 6.30618e-06
Reference vs. CUDA: 6.72268e-06
CPU vs. CUDA: 7.09326e-07
Reference vs. OpenCL: 6.74352e-06
CPU vs. OpenCL: 7.29696e-07
CUDA vs. OpenCL: 4.46884e-07

All differences are within tolerance.

Still a long way to go in terms of access control and so on, but this is a BIG milestone! Thanks @aktech for all the work in the backend!

leofang · 2023-01-10T20:50:42Z

Can we try it for CuPy? 🙂 Does it also support aarch64?

jakirkham · 2023-01-10T23:15:16Z

This is nice to see! Thanks Jaime 😄

For CuPy, we can run the tests during the build or on pre-built packages (using conda build --test <pkg file>). So it would make another nice test case.
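
A minimal sketch of the second option, assuming the package has already been built and conda-build is installed on the GPU runner. The helper names and the package path are hypothetical; only the `conda build --test` invocation comes from the comment above.

```python
import subprocess
from typing import List


def conda_test_command(pkg_file: str) -> List[str]:
    """argv that runs only the test phase against an already built package."""
    return ["conda", "build", "--test", pkg_file]


def run_tests(pkg_file: str) -> int:
    """Run the tests and return the exit code (non-zero means failure)."""
    return subprocess.run(conda_test_command(pkg_file)).returncode
```

Splitting build and test this way would let the package be built on a regular runner and only the test phase be scheduled on the GPU machine.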

jaimergp · 2023-01-11T15:44:05Z

> Can we try it for CuPy? 🙂 Does it also support aarch64?

The server is native x64, but I guess we can cross-compile and/or emulate?

I'll try CuPy next week. I am out of hours for this one.

leofang · 2023-01-11T15:54:43Z

I am happy to help in any way, just let me know. (And no rush, ofc!) Right now we use QEMU for aarch64/ppc64le, as it's not possible to cross-compile AFAIK.

start doc

fbbdfd4

rerender with self-hosted cirun support

5dc139e

MNT: Re-rendered with conda-build 3.21.7+119.g1b221ef0, conda-smithy 3.22.1.post.dev3, and conda-forge-pinning 2022.12.19.14.36.50

jaimergp mentioned this pull request Dec 19, 2022

Add support for Cirun on self-hosted GHA runners conda-forge/conda-smithy#1703

Merged

1 task

jaimergp closed this Dec 27, 2022

jaimergp reopened this Dec 27, 2022

jaimergp added 3 commits December 27, 2022 11:52

rename cirun temporarily

c86bbed

use extended yaml syntax

e7604c1

try to match labels

55807e9

jaimergp closed this Dec 27, 2022

jaimergp reopened this Dec 27, 2022

jaimergp added 2 commits December 27, 2022 13:12

Some more docs [ci skip]

45acbf2

retrigger

41a4154

aktech reviewed Jan 6, 2023

.cirun.yml (review thread outdated and resolved)

jaimergp added 3 commits January 6, 2023 12:09

use latest vm

4b8e905

use gpu_medium

7901e95

rerender with smithy-dev 5f96e5ef

a7dff98

aktech reviewed Jan 10, 2023

.cirun.yml (review thread outdated and resolved)

jaimergp and others added 2 commits January 10, 2023 18:50

Update .cirun.yml

7929ebc

Co-authored-by: Amit Kumar <dtu.amit@gmail.com>

retrigger

0486882

jaimergp added 3 commits January 10, 2023 20:35

patch mtu?

86e6498

sudo it

9903ce8

use gpu_large?

6a0df57

jaimergp mentioned this pull request Jan 23, 2023

Try GPU CI with cupy (DNM) #466

Draft

8 tasks
