
unify buildkite pipelines #443

Merged: 1 commit merged from js/envvars into main on Oct 5, 2023
Conversation

@juliasloan25 (Member)

Purpose

Remove some environment variables from our Buildkite pipelines, and unify the standard and longrun pipelines.

closes #441 (Reduce env variables in buildkite pipelines)


  • I have read and checked the items on the review checklist.

Review comment on .buildkite/pipeline.yml (outdated, resolved)
@simonbyrne (Member) left a comment:

Looks good to me, but it might be a good idea to try running it.

@@ -1,22 +1,19 @@
agents:
  queue: central
  slurm_mem: 8G
Member:
You're hitting CliMA/slurm-buildkite#47.

This slurm_mem setting is applied to all jobs, but some jobs also specify slurm_mem_per_cpu, which causes conflicts.
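
A minimal sketch of the kind of clash being described, assuming the slurm-buildkite convention that slurm_* agent tags map onto the corresponding sbatch options (the step below is hypothetical, not taken from this pipeline):

```yaml
agents:
  queue: central
  slurm_mem: 8G              # pipeline-wide: every step inherits a --mem request

steps:
  - label: "example step"    # hypothetical step for illustration
    command: "julia --project -e 'println(1)'"
    agents:
      slurm_ntasks: 4
      slurm_mem_per_cpu: 4G  # adds --mem-per-cpu, which Slurm treats as
                             # mutually exclusive with the --mem set above
```

In plain Slurm, --mem and --mem-per-cpu cannot both be set for the same job, which is the conflict referenced here.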

Member Author:
Ahh okay, thank you

Member Author:
Do you think it may be similarly problematic to specify both slurm_ntasks and slurm_tasks_per_node, e.g. here?

Member:
Yes, I think that can do weird things.

@juliasloan25 (Member Author):
@LenkaNovak it looks like these changes will fix the longrun issues we had that prevented the coupler reports from being generated :)

@LenkaNovak (Collaborator):
Awesome! Thanks @juliasloan25 for pursuing this, and @simonbyrne for your help! Once this is merged, we can finalize #456 and start profiling the DYAMOND run with @sriharshakandala 👏

@LenkaNovak (Collaborator) commented on Oct 4, 2023:
Maybe just a clarification, @simonbyrne: wouldn't removing slurm_tasks_per_node: X make our weak scaling in these long runs less accurate? We noticed in the past that testing across nodes caused large variability in walltime.

@simonbyrne (Member):
Looking at the docs:

--ntasks-per-node=
Request that ntasks be invoked on each node. If used with the --ntasks option, the --ntasks option will take precedence and the --ntasks-per-node will be treated as a maximum count of tasks per node. Meant to be used with the --nodes option. This is related to --cpus-per-task=ncpus, but does not require knowledge of the actual number of cpus on each node. In some cases, it is more convenient to be able to request that no more than a specific number of tasks be invoked on each node. Examples of this include submitting a hybrid MPI/OpenMP app where only one MPI "task/rank" should be assigned to each node while allowing the OpenMP portion to utilize all of the parallelism present in the node, or submitting a single setup/cleanup/monitoring job to each node of a pre-existing allocation as one step in a larger job script.

What I would generally suggest is using either just slurm_ntasks, or slurm_ntasks_per_node + slurm_nodes.
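
A hedged sketch of the two suggested alternatives for an illustrative 8-task job (labels, commands, and resource counts are placeholders, not values from this pipeline):

```yaml
steps:
  # Option 1: fix only the total task count; Slurm chooses the node layout.
  - label: "mpi job, ntasks only"
    command: "mpiexec julia --project run.jl"
    agents:
      slurm_ntasks: 8

  # Option 2: pin the node count and the per-node layout explicitly.
  - label: "mpi job, fixed layout"
    command: "mpiexec julia --project run.jl"
    agents:
      slurm_nodes: 2
      slurm_ntasks_per_node: 4
```

Option 2 may wait longer for the exact allocation, but keeps the task placement identical from run to run.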

@juliasloan25 (Member Author) commented on Oct 4, 2023:

I can change it to slurm_ntasks_per_node + slurm_nodes if we want to be more specific! Let me know which you prefer, @LenkaNovak.

Also, as of this PR our target AMIP longrun fails due to numerical instability. It's hard to tell which PR introduced this, since the last time the longruns successfully ran was Sept 17, and between then and now the target AMIP run was failing due to HDF5 errors. It could have been the updates to ClimaAtmos v0.16.0 or v0.16.1.

@LenkaNovak (Collaborator):
slurm_ntasks_per_node + slurm_nodes

Sounds good, thanks for the pointer! @juliasloan25, would it be possible to do this, i.e. specify slurm_ntasks_per_node + slurm_nodes? I've just tested it here. The precise setup is a trade-off between the wait time for resource allocation and getting the specific setup we need, but it would be good to have more trackable / comparable configurations for the AMIP run set. Otherwise I'm happy to merge this.
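
If the fixed layout is adopted, the weak-scaling points could be kept comparable by holding the per-node task count constant and scaling only the node count, roughly as in this sketch (labels and numbers are illustrative only):

```yaml
steps:
  - label: "weak scaling, 1 node"
    command: "mpiexec julia --project run.jl"
    agents:
      slurm_nodes: 1
      slurm_ntasks_per_node: 16    # placeholder per-node task count

  - label: "weak scaling, 2 nodes" # same per-node layout, twice the resources
    command: "mpiexec julia --project run.jl"
    agents:
      slurm_nodes: 2
      slurm_ntasks_per_node: 16
```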

@juliasloan25 (Member Author):
bors r+

@bors bot (Contributor) commented on Oct 5, 2023:
Build succeeded!


bors bot merged commit 15314cf into main on Oct 5, 2023 (10 checks passed).
bors bot deleted the js/envvars branch on October 5, 2023 at 18:21.