Conversation
It's immediately failing to recognize the successfully uploaded artifact; I'll work on debugging this.
Aha, I can manually download the artifact from that job run, but to get it on the other workflow I'll have to do something like this: https://stackoverflow.com/questions/60355925/share-artifacts-between-workflows-github-actions. The only catch (and this is why I put upload/download in single quotes above) is that it does seem like it's still spending server bandwidth on its own end, rather than storing it somewhere local that would allow memory access... so it remains to be seen whether it's actually any faster than just going through GIN directly.
OK - this should be ready to go now. Basically, once every 90 days, someone would have to go to this branch and do a push (or just manually trigger that specific workflow through Actions), and that will update the archived artifact of the ephy testing dataset. Each development test workflow then uses this custom action to download that artifact and perform the GIN tests on it.

Some statistics on performance: the original approach for the "Full tests", running on ubuntu only, with Datalad fetching only specific data formats, takes on average 20-30 minutes to download and run. This approach, which is currently able to run on both ubuntu and mac, takes about

One thing to note is that currently the archived artifact is the entire GIN repo (it takes about 40 minutes to upload as the artifact, but again that would only happen every 90 days at a minimum). We could narrow it down to the formats we currently use, which could speed things up even more, but that will require more complicated workflow code, so I'll table it for a future addition. Another thing to add later will be a scheduled workflow to automatically update the GIN artifact once every month or so, so we don't have to do it manually. However, it will take some tinkering to find the best way to then propagate the updated workflow/run IDs to the development tests so they know where to look for the artifact.

The custom action currently throws an error on the windows part of the matrix, so I'll post an issue on their repo to try to get that resolved; it looks like something simple involving the decompression. Will extend coverage to include windows once/if it is fixed.
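The 90-day refresh described above could be sketched roughly as the following workflow. This is an illustrative assumption, not the actual file in this PR: the workflow name, branch name, dataset URL, and paths are all hypothetical placeholders.

```yaml
# Hypothetical sketch of the artifact-refresh workflow described above.
# The branch, GIN URL, and artifact/path names are illustrative only.
name: Update GIN artifact
on:
  workflow_dispatch:              # allow manual trigger through the Actions tab
  push:
    branches: [gin_as_artifact]   # or trigger by pushing to this branch
jobs:
  upload:
    runs-on: ubuntu-latest
    steps:
      - name: Download the full dataset from GIN (the slow step, ~40 min)
        run: datalad install -r https://gin.g-node.org/NeuralEnsemble/ephy_testing_data
      - name: Archive it as a workflow artifact for the test workflows
        uses: actions/upload-artifact@v3
        with:
          name: ephy_testing_data
          path: ephy_testing_data
```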
Note also that this removes the "Full tests" from the workflows and the "development tests" instead run
Raised an issue for the windows part of the test matrix; we'll see what they say: dawidd6/action-download-artifact#108
@CodyCBakerPhD would you be able to check against the git hash or the hash of the files to update the artifacts any time there is a change? |
There's probably a few different ways of doing that update; I think the best solution would be to have some kind of internal repo secret or token that gets updated to point to the correct workflow ID each time, but I haven't messed around with that yet to see what's best. |
Point is, however we do it, the value being referenced has to be (a) global to all actions/workflows and (b) modifiable by workflows (specifically the one responsible for updating the artifact archive on a schedule).
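One way to satisfy (a) and (b) with the repo-secret idea might look like the step below. This is a sketch under assumptions: the secret names are hypothetical, and writing repository secrets requires a personal access token with repo scope, since the default `GITHUB_TOKEN` cannot modify secrets.

```yaml
# Hypothetical step in the archive-update workflow: record its own run ID
# in a repo secret so other workflows know where the latest artifact lives.
# SECRETS_ADMIN_PAT is an assumed PAT with repo scope (the default
# GITHUB_TOKEN cannot write repository secrets).
- name: Publish this run's ID for the test workflows
  env:
    GH_TOKEN: ${{ secrets.SECRETS_ADMIN_PAT }}
  run: gh secret set GIN_ARTIFACT_RUN_ID --body "${{ github.run_id }}"
```

Test workflows could then read `secrets.GIN_ARTIFACT_RUN_ID` to locate the artifact, satisfying the "global and workflow-modifiable" requirement.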
Could we save those hashes as artifacts? |
We would still have to know the workflow ID of the last hash update if it were uploaded as an artifact, even though its content is itself supposed to be a hash leading to the actual most up-to-date GIN artifact. Bit of a chicken-and-egg problem there... the problem is still how to communicate the most recent hash references across the workflows.
I suppose another alternative would be to have a .txt file or something (not sure where the best placement would be) and have the workflows open/edit/close that file, keeping a record of each artifact update with its corresponding ID reference.
Looking into using https://github.com/actions/cache now |
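A minimal sketch of what the actions/cache approach might look like. The path and cache key here are assumptions; the key would ideally incorporate the content hash discussed above, so the cache invalidates exactly when the GIN data changes.

```yaml
# Hypothetical caching step: restore the dataset when a cache entry exists
# for the current hash; if this step misses, a later step repopulates the
# directory and the cache is saved automatically at the end of the job.
- name: Cache the GIN dataset
  uses: actions/cache@v3
  with:
    path: ephy_testing_data
    key: gin-ephy-${{ hashFiles('**/gin_hash.txt') }}   # assumed hash file
    restore-keys: |
      gin-ephy-
```

Unlike artifacts, a cache is keyed by content rather than by workflow run ID, which would sidestep the chicken-and-egg problem of communicating run IDs across workflows.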
PR #274 shows better performance now, and solves many of the other open questions here, such as hashing. |
Motivation
Let's see if this works...
The idea is, instead of manually re-downloading the GIN datasets each time we want to do a full test, to (1) do an initial download from GIN (takes about 42 minutes) and ['upload' it to GitHub Actions as an artifact](https://docs.github.com/en/actions/guides/storing-workflow-data-as-artifacts) (takes less than 5 minutes), then (2) to perform tests, 'download' the directory from the artifacts locally (download time TBD). This, in theory, should allow the full tests to run much faster.
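As a rough sketch, the two halves of steps (1) and (2) might look like the steps below. The workflow filename and artifact names are assumptions, and the cross-workflow download uses the dawidd6/action-download-artifact action mentioned elsewhere in this thread (the built-in actions/download-artifact only works within the same workflow run).

```yaml
# Step (1), in the upload workflow: archive the downloaded dataset.
- uses: actions/upload-artifact@v3
  with:
    name: ephy_testing_data
    path: ephy_testing_data

# Step (2), in each test workflow: fetch that artifact across workflows.
# The workflow filename below is a hypothetical example.
- uses: dawidd6/action-download-artifact@v2
  with:
    workflow: update_gin_artifact.yml
    name: ephy_testing_data
    path: ephy_testing_data
```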
This also simplifies the `test_gin` script to no longer need datalad through that imported python API (all gets offloaded to workflows), which reduces the amount of code needed by a significant margin. This also allows the possibility of running full GIN tests on other parts of the OS matrix, to ensure our awareness of OS-specific issues.
How to test the behavior?
No changes to the way testing is invoked; the CI will report quickly if there's an issue with handling the artifact part.
Checklist