Improve traceability between currently-running tests and Coiled resources they create #267

gjoseph92 · 2022-08-19T15:55:54Z

Tests generally create Coiled resources (clusters and software environments) with a name pattern like "{test/module name}_{random UUID}". When multiple GitHub Actions jobs run the same tests against different Python versions and platforms, this makes it very hard to track down which cluster a particular test is running on. (See #243 (comment) for an example.)

It would be nice to have a naming scheme that's more meaningful and hierarchical. Some ideas:

Instead of random UUIDs, can we use GitHub Actions job IDs? This lets you map between the cluster and the job that created it.
Can we use IDs more hierarchically, so that all clusters and senvs from the same GHA job have that ID in their name, even if they need a third component to further identify them? (I actually think they probably don't need a third component, though.)
Every time we create and close a cluster, can we log that, and note which module it's being used for?
At the start of every test, can we print the test name, the cluster name and ID it's running on, and a link to the dashboard?
Between each test on a cluster, use a client.run or something to print in all the worker and scheduler logs which test is about to run. When tests fail, this would make parsing logs a lot more meaningful.
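To make the first two ideas concrete, here is a minimal sketch of a deterministic, hierarchical name builder. `GITHUB_RUN_ID`, `GITHUB_RUN_ATTEMPT`, and `GITHUB_JOB` are environment variables set by GitHub Actions; the function name `resource_name`, the field order, and the set of characters kept in the name are assumptions for illustration, not something the test suite does today. Note that matrix jobs share the same `GITHUB_JOB` value, so distinguishing Python versions and platforms would need an extra component passed in by the caller.

```python
import os
import re
import uuid


def resource_name(component, env=os.environ):
    """Build a name like '<run_id>-<run_attempt>-<job>-<component>'.

    Hypothetical helper: outside of GitHub Actions it falls back to a short
    random prefix so that names stay unique.
    """
    run_id = env.get("GITHUB_RUN_ID")
    if run_id is None:
        prefix = f"local-{uuid.uuid4().hex[:8]}"
    else:
        attempt = env.get("GITHUB_RUN_ATTEMPT", "1")
        job = env.get("GITHUB_JOB", "job")
        prefix = f"{run_id}-{attempt}-{job}"
    # Assumption: lowercase letters, digits, '-' and '_' are acceptable in a
    # resource name; everything else is collapsed to '_'.
    slug = re.sub(r"[^a-z0-9_-]+", "_", component.lower()).strip("_")
    return f"{prefix}-{slug}"
```

Every cluster and senv created by one job would then share the same prefix, and the run ID in that prefix points back at the workflow run that created it.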

cc @ian-r-rose @ncclementi @jrbourbeau

jrbourbeau · 2022-08-19T15:57:24Z

+1 thanks for opening an issue on this, I've been meaning to get to it

ian-r-rose · 2022-08-19T16:03:48Z

> At the start of every test, can we print the test name, the cluster name and ID it's running on, and a link to the dashboard?

This in particular has been challenging to do in combination with pytest-xdist: see pytest-dev/pytest-xdist#402. Of course, we can still log more stuff for use in local development.
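One possible way around captured stdout, sketched here as an assumption rather than something that has been tried in this repo, is to append the test-to-cluster mapping to a per-worker file instead of printing it. pytest-xdist sets the `PYTEST_XDIST_WORKER` environment variable (for example `gw0`) in each worker process; the helper name `record_test_cluster`, the directory name, and the tab-separated layout are all hypothetical.

```python
import os
from pathlib import Path


def record_test_cluster(test_id, cluster_name, dashboard_link, log_dir="test-cluster-logs"):
    """Append one line mapping a test to its cluster, in a per-worker file."""
    # PYTEST_XDIST_WORKER is absent when xdist is not in use, so fall back to
    # a single "main" file in that case.
    worker = os.environ.get("PYTEST_XDIST_WORKER", "main")
    path = Path(log_dir) / f"{worker}.log"
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("a") as f:
        f.write(f"{test_id}\t{cluster_name}\t{dashboard_link}\n")
    return path
```

Because each xdist worker writes only to its own file, the files can be tailed while tests are still running, without depending on -s or on pytest's log capture.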

gjoseph92 · 2022-08-19T16:06:00Z

@ian-r-rose does that only apply to logging, or all stdout? Print statements and -s would be an acceptable alternative :)

ian-r-rose · 2022-08-19T16:12:31Z

> @ian-r-rose does that only apply to logging, or all stdout?

It's been a bit since I looked at it, but my recollection is that -s does not work in xdist.

> Print statements and -s would be an acceptable alternative :)

Don't reveal to the world my debugging techniques!

jrbourbeau · 2022-09-27T17:08:54Z

Noting that clusters can now be tagged (see https://docs.coiled.io/user_guide/cluster_creation.html#tags). We could probably make use of this to improve test/cluster/etc. traceability.
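As a rough sketch of what such tags could contain, the helper below collects string key/value pairs from environment variables that GitHub Actions sets (`GITHUB_RUN_ID`, `GITHUB_RUN_ATTEMPT`, `GITHUB_JOB`, `GITHUB_SHA`). The function name `traceability_tags` and the tag keys are hypothetical; how the dictionary is handed to Coiled is assumed to follow the linked docs and is not shown here.

```python
import os


def traceability_tags(test_module, env=os.environ):
    """Collect string key/value pairs identifying the CI run and test module."""
    tags = {"test_module": test_module}
    # Hypothetical tag names mapped to variables set by GitHub Actions.
    sources = {
        "gha_run_id": "GITHUB_RUN_ID",
        "gha_run_attempt": "GITHUB_RUN_ATTEMPT",
        "gha_job": "GITHUB_JOB",
        "git_sha": "GITHUB_SHA",
    }
    for tag, var in sources.items():
        value = env.get(var)
        if value:  # skip variables that are unset, e.g. when running locally
            tags[tag] = value
    return tags
```

The resulting dictionary would be passed along when the cluster is created, so the run and job could be looked up from the cluster side as well as from its name.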

cc @ntabris for visibility

ncclementi · 2022-10-03T19:15:33Z

> @ian-r-rose does that only apply to logging, or all stdout? Print statements and -s would be an acceptable alternative :)

I just saw this. The last time I tried, print statements with -s and xdist didn't work in GHA CI. I also tried logging, and it only showed up when tests failed; otherwise the output was swallowed. Some of the discussion is in #107 (comment), related to pytest-dev/pytest-xdist#402.

ncclementi · 2022-11-29T22:10:17Z

Closing: this discussion seems stale, since we now have easier traceability from the dashboard plot to the database table, which has links to the cluster.

gjoseph92 · 2022-11-29T23:48:51Z

The issue was also about currently-running tests, which AFAIK you still can't map to currently-running clusters? The linking in the static HTML files works for checking on things after the fact, but if you're developing new tests, or running tests to debug them, I think it's still hard to figure out which cluster they're running on.

See https://github.com/gjoseph92/snakebench#mapping-between-coiled-clusters-and-the-tests-running-them for an example of the sort of deterministic, hierarchical naming I was talking about.

ncclementi · 2022-11-30T14:38:50Z

> The issue was also about currently-running tests, which AFAIK you still can't map to currently-running clusters?

My bad, you're right, this is still a problem. AFAIK we still have the logging problem with xdist, so that is not an option. I'll reopen this.
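For reference, the worker- and scheduler-side idea from the original post does not go through pytest's stdout at all, since the message is written by the cluster processes themselves. A minimal sketch, assuming a `distributed.Client`-like object (`Client.run` and `Client.run_on_scheduler` are the calls used) and assuming that the "distributed" logger is what ends up in the cluster logs; the helper names are hypothetical.

```python
import logging


def _log_test_boundary(test_id):
    # Runs inside each worker or scheduler process, so the message lands in
    # that process's own log rather than in pytest's captured output.
    logging.getLogger("distributed").warning("Starting test: %s", test_id)
    return test_id


def announce_test(client, test_id):
    """Write a marker into every worker log and the scheduler log."""
    # client.run executes the function on all workers;
    # client.run_on_scheduler executes it in the scheduler process.
    worker_results = client.run(_log_test_boundary, test_id)
    scheduler_result = client.run_on_scheduler(_log_test_boundary, test_id)
    return worker_results, scheduler_result
```

Calling `announce_test(client, test_id)` before each test that shares a cluster would leave a marker in the logs, so that a failure could be matched to the test that was running at the time.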

gjoseph92 · 2022-11-30T16:25:43Z

Yeah, xdist makes printing info hard. But again, see the snakebench naming scheme: https://github.com/gjoseph92/snakebench/blob/8c6cda1f11e0f27e17bf7234ab1e66d040b3fe81/snakebench/core.py#L73-L92. When I see a cluster like f35ff10-3576597018-1-1-tests_benchmarks_test_array, I know exactly what's running it even without logging, because the triggering commit and GHA run ID are all in the name.
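Going the other way, from a cluster name back to the workflow run, could look roughly like the sketch below. It assumes the field order suggested by the example name (short commit SHA first, then the GHA run ID); the two numeric fields that follow are kept but not interpreted, since their meaning isn't spelled out in this thread. The helper names and the `owner/repo` argument are hypothetical.

```python
from typing import NamedTuple


class ParsedName(NamedTuple):
    commit: str
    run_id: str
    extra: tuple  # two numeric fields, left uninterpreted here
    test_module: str


def parse_cluster_name(name):
    """Split '<commit>-<run_id>-<n>-<n>-<module>' into its components."""
    commit, run_id, n1, n2, test_module = name.split("-", 4)
    return ParsedName(commit, run_id, (n1, n2), test_module)


def run_url(repo, parsed):
    """Link to the GitHub Actions run for a parsed name ('repo' is 'owner/name')."""
    return f"https://github.com/{repo}/actions/runs/{parsed.run_id}"
```

With the example above, `parse_cluster_name` gives the commit, run ID, and test module directly, and `run_url` turns the run ID into a link to the Actions run.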

gjoseph92 added the dx Developer experience label Aug 19, 2022

gjoseph92 mentioned this issue Aug 19, 2022

Store memory profile plots, show them when hovering over memory usage dashboard? #268

Open

gjoseph92 mentioned this issue Sep 8, 2022

test_basic_sum occasionally takes 340% time and 160% memory to complete #315

Closed

ncclementi closed this as completed Nov 29, 2022

ncclementi reopened this Nov 30, 2022
