Improve traceability between currently-running tests and Coiled resources they create #267

Open
5 tasks
gjoseph92 opened this issue Aug 19, 2022 · 10 comments
Labels
dx Developer experience

Comments

@gjoseph92
Contributor

gjoseph92 commented Aug 19, 2022

Tests generally create Coiled resources (clusters & software environments) with a pattern like "{test/module name}_{random UUID}". Especially when multiple GitHub Actions jobs are running the same tests against different Python versions and platforms, this can make it very hard to track down which cluster a particular test is running on. (See #243 (comment) for an example.)

It would be nice to have a naming scheme that's more meaningful and hierarchical. Some ideas:

  • Instead of random UUIDs, can we use GitHub Actions job IDs? This lets you map between the cluster and the job that created it.
  • Can we use IDs more hierarchically, so that all clusters and senvs from the same GHA job have that ID in their name, even if they need a third component to further identify them? (I actually think they probably don't need a third component, though.)
  • Every time we create and close a cluster, can we log that, and note which module it's being used for?
  • At the start of every test, can we print the test name, the cluster name and ID it's running on, and a link to the dashboard?
  • Between each test on a cluster, use a client.run or something to print in all the worker and scheduler logs which test is about to run. When tests fail, this would make parsing logs a lot more meaningful.
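
A minimal sketch of the last idea, assuming a `distributed.Client` connected to the shared cluster. `mark_test_in_logs` and the logger name `distributed.test_marker` are hypothetical names chosen for illustration; `Client.run` and `Client.run_on_scheduler` are the existing Dask calls for running a function on every worker and on the scheduler.

```python
def mark_test_in_logs(client, test_name):
    """Write a marker line into every worker log and the scheduler log."""

    def _announce(name):
        # Imported inline so the function carries no dependency on this
        # module's globals when it is shipped to the cluster.
        import logging

        msg = f"===== starting test: {name} ====="
        # Hypothetical logger name; WARNING level so the line is emitted
        # even if INFO messages are filtered out.
        logging.getLogger("distributed.test_marker").warning(msg)
        return msg

    client.run(_announce, test_name)               # all workers
    client.run_on_scheduler(_announce, test_name)  # the scheduler
```

This could be called from an autouse pytest fixture with `request.node.nodeid` as the test name, so that every test leaves a marker in the cluster logs before it starts.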

cc @ian-r-rose @ncclementi @jrbourbeau

@gjoseph92 gjoseph92 added the dx Developer experience label Aug 19, 2022
@jrbourbeau
Member

+1 thanks for opening an issue on this, I've been meaning to get to it

@ian-r-rose
Contributor

At the start of every test, can we print the test name, the cluster name and ID it's running on, and a link to the dashboard?

This in particular has been challenging to do in combination with pytest-xdist: pytest-dev/pytest-xdist#402. Of course, we can still log more stuff for use in local development.

@gjoseph92
Contributor Author

@ian-r-rose does that only apply to logging, or all stdout? Print statements and -s would be an acceptable alternative :)

@ian-r-rose
Contributor

@ian-r-rose does that only apply to logging, or all stdout?

It's been a bit since I looked at it, but my recollection is that -s does not work in xdist.

Print statements and -s would be an acceptable alternative :)

Don't reveal to the world my debugging techniques!

@jrbourbeau
Member

Noting that clusters can now be tagged (see https://docs.coiled.io/user_guide/cluster_creation.html#tags). We could probably make use of this to improve test/cluster/etc. traceability
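
A rough sketch of what such tags might contain, assuming the tests run in GitHub Actions. `ci_tags` and the tag keys are made up for illustration; the `GITHUB_*` names are the default environment variables GitHub Actions sets for a job. How the dict is passed to Coiled should be checked against the linked docs.

```python
import os
from typing import Dict, Mapping


def ci_tags(test_module: str, env: Mapping[str, str] = os.environ) -> Dict[str, str]:
    """Build tags that tie a cluster back to the CI job that created it."""
    return {
        "test_module": test_module,
        "gha_run_id": env.get("GITHUB_RUN_ID", "local"),
        "gha_run_attempt": env.get("GITHUB_RUN_ATTEMPT", "0"),
        "gha_job": env.get("GITHUB_JOB", "local"),
        "commit": env.get("GITHUB_SHA", "unknown")[:7],
    }


# Assumed usage (verify the `tags` argument against the Coiled docs):
# cluster = coiled.Cluster(tags=ci_tags("tests/benchmarks/test_array.py"))
```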

cc @ntabris for visibility

@ncclementi
Contributor

@ian-r-rose does that only apply to logging or all stdout? Print statements and -s would be an acceptable alternative :)

I just saw this. The last time I tried, print statements with -s didn't work under xdist in GHA CI. I also tried logging, and it only showed up when tests failed; otherwise the output was swallowed. Some of the discussion is in #107 (comment), related to pytest-dev/pytest-xdist#402.
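
One possible workaround, not something tried in this thread, is to bypass stdout capture entirely and append to a per-worker file. The sketch below assumes pytest-xdist's `PYTEST_XDIST_WORKER` environment variable (set to values like `gw0` in worker processes); `record_cluster`, `worker_log_path`, and the `test-logs` directory are hypothetical names.

```python
import os
from pathlib import Path
from typing import Mapping


def worker_log_path(log_dir: str = "test-logs", env: Mapping[str, str] = os.environ) -> Path:
    """One file per xdist worker; falls back to 'main' when xdist is not in use."""
    worker = env.get("PYTEST_XDIST_WORKER", "main")
    return Path(log_dir) / f"{worker}.log"


def record_cluster(test_name, cluster_name, dashboard_link,
                   log_dir: str = "test-logs",
                   env: Mapping[str, str] = os.environ) -> Path:
    """Append a 'test -> cluster' line that can be tailed while tests run."""
    path = worker_log_path(log_dir, env)
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("a") as f:
        f.write(f"{test_name}\t{cluster_name}\t{dashboard_link}\n")
    return path
```

Running `tail -f test-logs/*.log` in another terminal would then show which cluster each running test is using, without depending on -s.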

@ncclementi
Contributor

Closing: this discussion seems stale, since we now have easier traceability from the dashboard plot to the database table, which has links to the cluster.

@gjoseph92
Contributor Author

The issue was also about currently-running tests, which AFAIK you still can't map to currently-running clusters? The linking in the static HTML files works for checking on things after the fact, but if you're developing new tests, or running tests to debug them, I think it's still hard to figure out which cluster they're running on.

See https://github.com/gjoseph92/snakebench#mapping-between-coiled-clusters-and-the-tests-running-them for an example of the sort of deterministic, hierarchical naming I was talking about.

@ncclementi ncclementi reopened this Nov 30, 2022
@ncclementi
Contributor

The issue was also about currently-running tests, which AFAIK you still can't map to currently-running clusters?

My bad, you are right, this is still a problem. AFAIK we still have the logging problem with xdist, so that is not an option. I'll re-open this.

@gjoseph92
Contributor Author

Yeah, xdist makes printing info hard. But again, see the snakebench naming scheme: https://github.com/gjoseph92/snakebench/blob/8c6cda1f11e0f27e17bf7234ab1e66d040b3fe81/snakebench/core.py#L73-L92. When I see a cluster like f35ff10-3576597018-1-1-tests_benchmarks_test_array, I know exactly what's running it even without logging, because the triggering commit and GHA run ID are all in the name.
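
As a simplified illustration of that kind of deterministic name (this is not the snakebench code, and it produces one fewer numeric field than the example name above), the sketch below joins the short commit SHA, the GHA run ID, the run attempt, and a slug of the test module path. `cluster_name` is a hypothetical helper; the `GITHUB_*` variables are GitHub Actions defaults.

```python
import os
import re
from typing import Mapping


def cluster_name(module_path: str, env: Mapping[str, str] = os.environ) -> str:
    """Deterministic name: {short sha}-{run id}-{run attempt}-{module slug}."""
    sha = env.get("GITHUB_SHA", "local")[:7]
    run_id = env.get("GITHUB_RUN_ID", "0")
    attempt = env.get("GITHUB_RUN_ATTEMPT", "0")
    # Drop the .py suffix, then collapse path separators and other
    # non-alphanumeric characters into underscores.
    stem = module_path[:-3] if module_path.endswith(".py") else module_path
    slug = re.sub(r"[^A-Za-z0-9]+", "_", stem).strip("_")
    return f"{sha}-{run_id}-{attempt}-{slug}"
```

Because every component comes from the CI environment or the test path, the same job always produces the same name, and the name can be mapped back to a commit and a workflow run by eye.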

4 participants