ci: implement queuing mechanism for acceptance tests #1038
@barbeau, as demonstrated by 7fd762c (see https://github.com/MobilityData/gtfs-validator/runs/4010914933?check_suite_focus=true), including "[acceptance test skip]" in the commit message makes it possible to skip the execution of the acceptance test. Note: this mechanism will be documented in #848, where additional documentation for the acceptance test has been created.
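The skip check boils down to looking for a token in the commit message. Here is a minimal sketch of that check, assuming a plain, case-sensitive substring match anywhere in the message. The helper name `should_skip_acceptance_test` is hypothetical; the workflow itself presumably performs this test in its own `if:` condition rather than in Python.

```python
# Token that opts a commit out of the acceptance test (taken from the comment above).
SKIP_TOKEN = "[acceptance test skip]"


def should_skip_acceptance_test(commit_message: str) -> bool:
    """Return True when the commit message asks to skip the acceptance test.

    Hypothetical helper: assumes a case-sensitive substring match anywhere
    in the message (subject line or body).
    """
    return SKIP_TOKEN in commit_message
```

For example, `should_skip_acceptance_test("docs: update README [acceptance test skip]")` is true, while a message that only mentions "acceptance tests" in passing is not.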
Great job @lionel-nj, LGTM!
```yaml
- name: Set URL matrix
  id: set-matrix
  run: |
    python3 scripts/mobility-database-harvester/harvest_latest_versions.py -d datasets_metadata -a gtfs_archives_ids.json -l latest_versions.json
```
For future optimization: I'm not sure what the primary cost is for fetching data from the Mobility Database, but if it's outbound data bandwidth from the Mobility Database, you could reduce it by supporting the HTTP header `If-Modified-Since` and caching a version of the response in the GitHub Action. Then, when you send this request to the Mobility Database with the date of the cached file, you'd only get the full dataset returned to you (and pay for it) if it was modified after the date of the cached file. Both sides (GitHub Action and Mobility Database) would need to support `If-Modified-Since` for that to work.
https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/If-Modified-Since
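On the GitHub Action side, the idea could look roughly like the sketch below. It uses only the Python standard library. The function names `conditional_headers` and `fetch_with_cache` and the cache file path are hypothetical, and it assumes the Mobility Database endpoint answers `304 Not Modified` when its data hasn't changed since the date sent, which (as noted above) it would have to support.

```python
import os
import urllib.error
import urllib.request
from email.utils import formatdate, parsedate_to_datetime


def conditional_headers(cache_path: str) -> dict:
    """Return an If-Modified-Since header based on the cached file's mtime, if any."""
    if not os.path.exists(cache_path):
        return {}  # no cache yet: request the full response
    # HTTP dates are always expressed in GMT.
    return {"If-Modified-Since": formatdate(os.path.getmtime(cache_path), usegmt=True)}


def fetch_with_cache(url: str, cache_path: str) -> bytes:
    """Download url, or reuse cache_path when the server answers 304 Not Modified."""
    request = urllib.request.Request(url, headers=conditional_headers(cache_path))
    try:
        with urllib.request.urlopen(request) as response:
            body = response.read()
            last_modified = response.headers.get("Last-Modified")
    except urllib.error.HTTPError as error:
        if error.code == 304:
            # urllib surfaces 304 as an HTTPError; the cached copy is still current.
            with open(cache_path, "rb") as cached:
                return cached.read()
        raise
    with open(cache_path, "wb") as cached:
        cached.write(body)
    if last_modified:
        # Stamp the cache with the server's own date so the next comparison
        # does not depend on the runner's clock.
        timestamp = parsedate_to_datetime(last_modified).timestamp()
        os.utime(cache_path, (timestamp, timestamp))
    return body
```

The cache file itself would still need to survive between workflow runs (for example through a cache or artifact step); the sketch only covers the request/response side.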
Thanks! I will definitely look into that!
Summary:
This PR implements a queueing mechanism dedicated to the acceptance test. It is needed because running the validator sequentially on 1000+ datasets takes more than 6 hours on GitHub, which exceeds the job time limit and causes the workflow to be cancelled.
Expected behavior:
Previous runs of the Rule acceptance tests workflow should be cancelled when a new one is triggered, to avoid queueing too many workflows for the same PR.
`reference.json` and `reference_errors.json`
`report.json` and `system_errors.json`
⚙️ The task `get-reports` (from `acceptance_test.yml`) works as follows: once 1000+ URLs are retrieved from the Mobility Database, they are grouped into a list of at most 256 objects, each of which defines one entry of a job matrix. The matrix then generates jobs that run concurrently. Each of these jobs runs `queue_runner.sh`, which parses its input as an array of JSON objects. For each of these objects, both the snapshot and the reference validator are run.
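The queueing logic described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the actual implementation: the matrix key `data`, the dataset fields `id` and `url`, and the `run_validator` callback are all hypothetical, and `run_queue` only mimics what `queue_runner.sh` is described as doing. Datasets are spread round-robin, so with 1000 datasets and 256 jobs every queue holds 3 or 4 datasets (232 queues of 4 and 24 of 3).

```python
from __future__ import annotations

import json
import os
from typing import Callable

# GitHub Actions allows at most 256 jobs per matrix in a workflow run.
MAX_MATRIX_JOBS = 256


def build_matrix(datasets: list[dict], max_jobs: int = MAX_MATRIX_JOBS) -> dict:
    """Spread datasets round-robin over at most max_jobs queues (one per matrix job)."""
    job_count = min(max_jobs, len(datasets))
    queues: list[list[dict]] = [[] for _ in range(job_count)]
    for index, dataset in enumerate(datasets):
        queues[index % job_count].append(dataset)
    # Hypothetical matrix key "data": each entry is a JSON array of dataset
    # objects, i.e. the kind of input the runner script parses.
    return {"data": [json.dumps(queue) for queue in queues]}


def run_queue(queue_json: str, run_validator: Callable[[str, str, str], None]) -> int:
    """Python stand-in for queue_runner.sh: run both validators on each queued dataset."""
    runs = 0
    for dataset in json.loads(queue_json):
        # All reports are collected under output/, one subdirectory per dataset.
        output_dir = os.path.join("output", dataset["id"])
        for version in ("snapshot", "reference"):
            run_validator(version, dataset["url"], output_dir)
            runs += 1
    return runs
```

In a workflow, the dict returned by `build_matrix` would typically be serialized to a step output and expanded with `fromJSON` in `strategy.matrix`. Round-robin keeps queue sizes within one dataset of each other, so no single job ends up much longer than the rest.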
The reports are all saved in a directory called `output`, which is persisted at the end of the job and uploaded to Google Cloud Storage to ease download.

Please make sure these boxes are checked before submitting your pull request - thanks!
- Run `gradle test` to make sure you didn't break anything