GIN as GitHub Artifacts #266

CodyCBakerPhD · 2021-09-21T17:18:37Z

Motivation

Let's see if this works...

The idea is that, instead of manually re-downloading the GIN datasets each time we want to run the full tests, we (1) do an initial download from GIN (takes about 42 minutes) and 'upload' it to GitHub Actions as an artifact (takes less than 5 minutes; see https://docs.github.com/en/actions/guides/storing-workflow-data-as-artifacts), and then (2) to run the tests, 'download' the directory from the artifacts locally (download time TBD). In theory, this should allow the full tests to run much faster.

This also simplifies the test_gin script so that it no longer needs datalad through its imported Python API (all of that gets offloaded to the workflows), which significantly reduces the amount of code needed.

This also opens up the possibility of running the full GIN tests on other parts of the OS matrix, so that we become aware of OS-specific issues.

How to test the behavior?

No changes to how the tests are invoked; the CI will quickly report any issue with handling the artifact.

Checklist

Have you checked our Contributing document?
Have you ensured the PR description clearly describes the problem and solutions?
Have you checked to ensure that there aren't other open or previously closed Pull Requests for the same change?

CodyCBakerPhD · 2021-09-21T17:20:18Z

Immediately failing to recognize the successfully uploaded artifact; will work on debugging this.

CodyCBakerPhD · 2021-09-21T17:27:26Z

Aha, I can manually download the artifact from that job run, but to get it in the other workflow I'll have to do something like this:

https://stackoverflow.com/questions/60355925/share-artifacts-between-workflows-github-actions

The only catch, and this is why I put upload/download in single quotes above, is that it still seems to spend server bandwidth on its own end rather than storing the data somewhere local that allows direct access... so it remains to be seen whether it's actually any faster than just going through GIN directly.

CodyCBakerPhD · 2021-09-23T21:00:58Z

OK - this should be ready to go now.

Basically, once every 90 days, someone has to go to this branch and push (or just manually trigger that specific workflow through Actions), which updates the archived artifact of the ephy testing dataset.
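To make the 90-day refresh cadence concrete, here is a minimal sketch of the date arithmetic, assuming the artifact becomes unavailable 90 days after upload. The function names and the 7-day safety margin are hypothetical and not part of this PR.

```python
from datetime import datetime, timedelta, timezone

RETENTION_DAYS = 90  # refresh interval mentioned in this thread


def days_until_expiry(uploaded_at, now, retention_days=RETENTION_DAYS):
    """Return the whole days left before an artifact uploaded at `uploaded_at` expires."""
    expires_at = uploaded_at + timedelta(days=retention_days)
    return (expires_at - now).days


def needs_refresh(uploaded_at, now, retention_days=RETENTION_DAYS, margin_days=7):
    """Return True once fewer than `margin_days` remain (margin is a hypothetical choice)."""
    return days_until_expiry(uploaded_at, now, retention_days) < margin_days


if __name__ == "__main__":
    uploaded = datetime(2021, 9, 23, tzinfo=timezone.utc)
    today = datetime.now(timezone.utc)
    print("Refresh needed:", needs_refresh(uploaded, today))
```

A check like this could run at the start of a manually triggered or scheduled job to decide whether the slow re-upload is worth doing yet.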

Now, each development test workflow uses this custom action to download that artifact and perform the GIN tests on it.

Some statistics on performance:

The original approach for the "Full tests", running on ubuntu only, with Datalad fetching only specific data formats, takes on average 20-30 minutes to download and run.

This approach, which is currently able to run on both ubuntu and mac, takes about 8-9 minutes for each OS, and can run each in parallel.

One thing to note is that the archived artifact is currently the entire GIN repo (it takes about 40 minutes to upload as the artifact, but again that only has to happen once every 90 days at a minimum). We could narrow it down to the formats we currently use, which could speed it up even more, but that will require more complicated workflow code, so I'll table it for a later addition.

Another thing to add later is a scheduled workflow that automatically updates the GIN artifact once every month or so, so that we don't have to do it manually. However, it will take some tinkering to find the best way to then also propagate the new workflow/run IDs to the development tests so they know where to look for the artifact.

The custom action, however, currently throws an error on the Windows part of the matrix, so I'll post an issue on their repo to try to get that resolved; it looks like something simple involving the decompression. I'll extend coverage to include Windows once/if it is fixed.

CodyCBakerPhD · 2021-09-23T21:02:48Z

Note also that this removes the "Full tests" from the workflows; the "development tests" instead run test_gin.py every time, making those the ones we want to set to 'required'.
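Since test_gin.py now runs on every development test, it helps if the tests skip cleanly when the data has not been downloaded. The following is only a sketch of that kind of lazy check, not the code in this PR; the `GIN_DATA_PATH` environment variable, the default folder name, and the test class are all hypothetical.

```python
import os
import unittest
from pathlib import Path

# Hypothetical location of the downloaded artifact; the variable name is an assumption.
DATA_PATH = Path(os.environ.get("GIN_DATA_PATH", "ephy_testing_data"))


def data_available(path):
    """Return True when `path` is a directory containing at least one entry."""
    path = Path(path)
    return path.is_dir() and any(path.iterdir())


@unittest.skipUnless(data_available(DATA_PATH), "GIN data not found; skipping GIN tests")
class TestGinData(unittest.TestCase):
    def test_data_folder_is_populated(self):
        # Placeholder assertion; real conversion tests would go here.
        self.assertTrue(any(DATA_PATH.iterdir()))
```

With this pattern, a run without the artifact reports the GIN tests as skipped rather than failed.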

CodyCBakerPhD · 2021-09-27T18:33:09Z

Raised an issue about the Windows part of the test matrix; we'll see what they say: dawidd6/action-download-artifact#108

bendichter · 2021-09-27T18:41:41Z

@CodyCBakerPhD would you be able to check against the git hash or the hash of the files to update the artifacts any time there is a change?

CodyCBakerPhD · 2021-09-27T18:45:10Z

> @CodyCBakerPhD would you be able to check against the git hash or the hash of the files to update the artifacts any time there is a change?

There are probably a few different ways of doing that update; I think the best solution would be to have some kind of internal repo secret or token that gets updated to point to the correct workflow ID each time, but I haven't messed around with that yet to see what's best.

CodyCBakerPhD · 2021-09-27T18:46:13Z

Point is, however we do it, the value being referenced has to be (a) global to all actions/workflows and (b) modifiable by workflows (specifically the one responsible for updating the artifact archive on a schedule).
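For the change-detection half of the question (the hash of the files), a minimal sketch could look like the following. It assumes the dataset is an ordinary directory on disk and hashes relative paths plus file contents with SHA-256; `directory_digest` is a hypothetical helper, not something from this PR or from datalad.

```python
import hashlib
from pathlib import Path


def directory_digest(root):
    """Return a SHA-256 hex digest over the relative paths and contents of all files under `root`."""
    root = Path(root)
    digest = hashlib.sha256()
    # Sort so the digest does not depend on filesystem iteration order.
    for file_path in sorted(p for p in root.rglob("*") if p.is_file()):
        digest.update(file_path.relative_to(root).as_posix().encode("utf-8"))
        digest.update(b"\0")
        with open(file_path, "rb") as stream:
            # Read in 1 MiB chunks so large recordings are not loaded into memory at once.
            for chunk in iter(lambda: stream.read(1 << 20), b""):
                digest.update(chunk)
        digest.update(b"\0")
    return digest.hexdigest()
```

Comparing this digest against the one stored at the last upload would tell a workflow whether the artifact is stale. Hashing every file means reading the whole dataset, so comparing the git commit hash of the GIN repo is likely the cheaper option where it is available.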

bendichter · 2021-09-27T18:54:03Z

Could we save those hashes as artifacts?

CodyCBakerPhD · 2021-09-27T18:59:07Z

> Could we save those hashes as artifacts?

We would still have to know the workflow ID of the last hash update if it were uploaded as an artifact, even though its content is itself supposed to be a hash leading to the actual most up to date GIN artifact. Bit of a chicken and egg problem there... problem is still how to communicate the most recent hash references across the workflows.

CodyCBakerPhD · 2021-09-27T19:00:32Z

I suppose another alternative would be to have a .txt file or something (not sure where the best placement would be) and have the workflows open/edit/close that file, keeping a record of each artifact update with its corresponding ID reference.
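A minimal sketch of that record-keeping file, assuming one JSON object per line in a plain text file. The file layout, field names, and function names here are hypothetical, and the updating workflow would still need to commit the file back to the repo for other workflows to see it.

```python
import json
from pathlib import Path


def record_artifact_update(manifest_path, run_id, content_hash, updated_at):
    """Append one JSON line describing an artifact update to the manifest file."""
    entry = {"run_id": run_id, "content_hash": content_hash, "updated_at": updated_at}
    with open(manifest_path, "a", encoding="utf-8") as stream:
        stream.write(json.dumps(entry) + "\n")


def latest_artifact_update(manifest_path):
    """Return the most recently appended entry, or None if the file is missing or empty."""
    path = Path(manifest_path)
    if not path.exists():
        return None
    lines = [line for line in path.read_text(encoding="utf-8").splitlines() if line.strip()]
    return json.loads(lines[-1]) if lines else None
```

The development test workflows would then read only the last entry to find which run ID holds the current artifact, while the full file keeps a history of earlier updates.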

CodyCBakerPhD · 2021-10-06T19:01:19Z

Looking into using https://github.com/actions/cache now

CodyCBakerPhD · 2021-10-08T00:57:35Z

PR #274 shows better performance now, and solves many of the other open questions here, such as hashing.

CodyCBakerPhD added 4 commits September 21, 2021 10:53

initial code for weekly update of repo

e930b09

adjusting workflow and tests

dfbcdee

Merge branch 'master' into gin_as_artifact

74a975a

adjusting tests

67f8e9e

CodyCBakerPhD added the code reduction and performance labels Sep 21, 2021

CodyCBakerPhD self-assigned this Sep 21, 2021

Automated changes

5b5347a

CodyCBakerPhD added 7 commits September 23, 2021 13:48

debugging

6786b66

Merge branch 'gin_as_artifact' of https://github.com/catalystneuro/nwb-conversion-tools into gin_as_artifact

63a92b3

correcting workflows

ed2035d

extending to windows+mac, removing scheduled update

dabac59

update display

8d1ff0b

removing windows until fixed on download side

41e8ce9

adding lazy check for production without data

040f942

CodyCBakerPhD marked this pull request as ready for review September 23, 2021 21:01

CodyCBakerPhD requested a review from bendichter September 23, 2021 21:01

CodyCBakerPhD marked this pull request as draft October 6, 2021 19:00

CodyCBakerPhD mentioned this pull request Oct 8, 2021

GIN as cache #274

Merged

3 tasks

CodyCBakerPhD closed this Oct 8, 2021

CodyCBakerPhD deleted the gin_as_artifact branch October 8, 2021 00:57
