In our Buildkite longruns pipeline, the agent requirements we specify for the GPU AMIP runs (introduced in #632) are very difficult (impossible?) to satisfy. Because of this, these jobs wait a very long time for an agent (see here). We need to update these jobs to request GPUs and an appropriate amount of memory, but not multiple nodes/tasks. We should also figure out what the appropriate amount of memory is.
Update: the jobs take less than 5 GB of memory when run for a few timesteps (checked using CUDA.memory_status), so the existing limits of 16-20 GB should be sufficient.
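As a rough sketch of the kind of change proposed here, a GPU step could request a single GPU and a memory limit without asking for multiple nodes or tasks. The agent tag names (`slurm_gpus`, `slurm_mem`), the step label, and the command are assumptions for illustration; the actual keys depend on how the Buildkite agents for this pipeline are configured.

```yaml
steps:
  # Hypothetical step: label and command are placeholders, not the real pipeline entries.
  - label: "GPU AMIP longrun"
    command: "julia --project=experiments/AMIP run_amip.jl"
    agents:
      # Assumed agent tags: request one GPU and a memory limit in the
      # 16-20 GB range discussed above (usage was < 5 GB for a few timesteps).
      slurm_gpus: 1
      slurm_mem: 20GB
      # Deliberately no multi-node or multi-task request, since that is
      # what makes the current requirements hard to satisfy.
```

Dropping the node/task requests is the key point; the memory value can be tuned once usage over a full run is known.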