ci: implement queuing mechanism for acceptance tests #1038

lionel-nj · 2021-10-13T22:08:04Z

Summary:

This PR implements a queueing mechanism dedicated to the acceptance tests. The mechanism is needed because running the validator sequentially on 1000+ datasets takes more than 6 hours on GitHub, which causes the workflow to be cancelled.

Expected behavior:

CI should be triggered on all commits unless the developer added the keyword "[acceptance test skip]" to their latest commit message
Older workflows (only Rule acceptance tests) should be cancelled when a new one is triggered, to avoid queueing too many workflows for the same PR
Dataset URLs should be fetched from the Mobility Database
For each dataset (i.e. URL retrieved from the database), the reference version of the validator (from the master branch) should run and produce reference.json and reference_errors.json
For each dataset (i.e. URL retrieved from the database), the snapshot version of the validator (from the branch where this CI process is run) should run and produce report.json and system_errors.json
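The skip rule in the first item can be illustrated with a minimal Python sketch. The actual check lives in the workflow definition; the function name `should_run_acceptance_tests` and the constant `SKIP_KEYWORD` are hypothetical and used here only for illustration.

```python
# Hypothetical illustration of the skip rule; not the workflow's actual code.
SKIP_KEYWORD = "[acceptance test skip]"


def should_run_acceptance_tests(commit_message: str) -> bool:
    """Return False when the latest commit message contains the skip keyword."""
    return SKIP_KEYWORD not in commit_message
```

Under this sketch, a commit titled "touch [acceptance test skip]" would skip the workflow, while a commit without the keyword would trigger it.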

⚙️ The task get-reports (from acceptance_test.yml) works as follows:
Once the 1000+ URLs are retrieved from the Mobility Database as a list of at most 256 objects, each of these objects is used to define a matrix. The matrix generates jobs that run concurrently. Each of these jobs runs queue_runner.sh, which parses its input as an array of JSON objects:

({"id":"dataset id value-1", "url":"dataset url-1"} {"id":"dataset id value-2", "url":"dataset url-2"} ... {"id":"dataset id value-n", "url":"dataset url-n"})

For each of these dictionaries, both the snapshot and the reference validators are run.
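The splitting step described above can be sketched in Python as follows. This is a minimal sketch under assumptions, not the harvester script from this PR: the function names `split_into_sublists` and `build_matrix`, and the matrix key `data`, are hypothetical. The 256 cap reflects the GitHub Actions limit of 256 jobs per matrix.

```python
import json
from typing import Dict, List

# GitHub Actions generates at most 256 jobs from a matrix per workflow run.
MAX_MATRIX_JOBS = 256

Dataset = Dict[str, str]  # e.g. {"id": "...", "url": "..."}


def split_into_sublists(
    datasets: List[Dataset], max_sublists: int = MAX_MATRIX_JOBS
) -> List[List[Dataset]]:
    """Distribute datasets round-robin over at most max_sublists sublists."""
    if max_sublists < 1:
        raise ValueError("max_sublists must be at least 1")
    if not datasets:
        return []
    count = min(max_sublists, len(datasets))
    # Slicing with a stride keeps sublist sizes within one item of each other.
    return [datasets[i::count] for i in range(count)]


def build_matrix(sublists: List[List[Dataset]]) -> Dict[str, list]:
    """Wrap each sublist in one matrix entry (the key "data" is hypothetical)."""
    return {"include": [{"data": json.dumps(sublist)} for sublist in sublists]}
```

With 1000 datasets, this yields 256 sublists of 3 or 4 datasets each, so each concurrent job only validates a handful of datasets.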

All reports are saved in a directory called output, which is persisted at the end of the job and uploaded to Google Cloud Storage for easier download.

Please make sure these boxes are checked before submitting your pull request - thanks!

Run the unit tests with gradle test to make sure you didn't break anything
Format the title like "feat: [new feature short description]". Title must follow the Conventional Commit Specification (https://www.conventionalcommits.org/en/v1.0.0/).
Linked all relevant issues
Include screenshot(s) showing how this pull request works and fixes the issue(s)


lionel-nj · 2021-10-26T14:26:36Z

@barbeau, as demonstrated by 7fd762c (see https://github.com/MobilityData/gtfs-validator/runs/4010914933?check_suite_focus=true), including "[acceptance test skip]" in the commit message skips the execution of the Rule acceptance tests workflow. When the keyword is not included in the commit message, the workflow is executed, as shown in https://github.com/MobilityData/gtfs-validator/actions/runs/1386081082 from 7fd762c. PTAL.

Note: this mechanism will be documented in #848 where additional documentation for the acceptance test has been created.

barbeau

Great job @lionel-nj, LGTM!

barbeau · 2021-10-26T19:46:55Z

.github/workflows/acceptance_test.yml

+      - name: Set URL matrix
+        id: set-matrix
+        run: |
+          python3 scripts/mobility-database-harvester/harvest_latest_versions.py -d datasets_metadata -a gtfs_archives_ids.json -l latest_versions.json


For future optimization: I'm not sure what the primary cost is for fetching data from the Mobility Database, but if it's outbound data bandwidth from the Mobility Database, you could reduce it by supporting the HTTP header If-Modified-Since and caching a version of the response in the GitHub Action. Then, when you send this request to the Mobility Database with the date of the cached file, you'd only get the full dataset returned to you (and pay for it) if it was modified after the date of the cached file. Both sides (GitHub Action and Mobility Database) would need to support If-Modified-Since for that to work.

https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/If-Modified-Since
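A minimal sketch of the client side of this suggestion, using only the Python standard library. It assumes the server honors If-Modified-Since, which the comment above notes is not confirmed for the Mobility Database. The function names and the `cached_mtime` parameter (modification time of the cached file, in seconds since the epoch) are hypothetical.

```python
import urllib.error
import urllib.request
from email.utils import formatdate
from typing import Optional


def build_conditional_request(
    url: str, cached_mtime: Optional[float]
) -> urllib.request.Request:
    """Build a GET request that carries If-Modified-Since when a cache exists."""
    request = urllib.request.Request(url)
    if cached_mtime is not None:
        # HTTP dates are expressed in GMT, e.g. "Thu, 01 Jan 1970 00:00:00 GMT".
        request.add_header(
            "If-Modified-Since", formatdate(cached_mtime, usegmt=True)
        )
    return request


def fetch_if_modified(url: str, cached_mtime: Optional[float]) -> Optional[bytes]:
    """Return the new body, or None when the server answers 304 Not Modified."""
    request = build_conditional_request(url, cached_mtime)
    try:
        with urllib.request.urlopen(request) as response:
            return response.read()
    except urllib.error.HTTPError as error:
        if error.code == 304:
            return None  # The cached copy is still current; nothing to download.
        raise
```

When `fetch_if_modified` returns None, the job would keep using its cached file; otherwise it would overwrite the cache with the returned bytes.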

Thanks! I will definitely look into that!

lionel-nj added 30 commits October 13, 2021 15:33

modify script to save a list of 256 sublists

ee9acd2

do not execute end to end on changes in python scripts

68e30ab

do not execute end to end on changes in acceptance_test.yml

4cf243c

check matrix definition

47dd98d

modify branch name

0f0c74b

check $DATASETS

f103119

use line return

0a64b7b

add missing white space

1e21d35

do not use tostring

fd556cf

define output earlier

249af27

touch

5e37b09

use set-matrix as step id

ff93fd3

touch

a90834c

use line return

a357653

define output later

512ca0f

do not use tostring

fbb09b8

set name

d50ef2c

touch

c5b1da4

use strategy

b6bfa6a

set step name

63c7162

do not use tostring

2fe4f5f

touch

d2223b1

other endpoint

3c75865

use include

bf37be6

do not use tostring

e6c0658

use tostring

df15e68

use -r

afedab6

use -R

d5a0823

do not use from Json

96b3fce

use other test matrix

300575f

lionel-nj added 19 commits October 25, 2021 18:48

touch that is not supposed to trigger acceptance test

eb16b43

touch [acceptance test skip]

94669b5

touch [acceptance test skip]

8f93d8f

log

5729ecb

on push [acceptance test skip]

8fd7d48

try env [acceptance test skip]

489c382

use output [acceptance test skip]

532fd80

checkout repo [acceptance test skip]

117a998

try other job [acceptance test skip]

02a978b

touch that is not supposed to trigger acceptance test workflow

b03e06c

touch that is not supposed to trigger acceptance test workflow [acceptance test skip]

131336f

touch that is supposed to trigger workflow

95ac254

echo event

7179116

echo commit msg

cdbbee8

other job [acceptance test skip]

787a284

use pre_ci [acceptance test skip]

4763060

skip fetch-urls [acceptance test skip]

84b1be7

touch that is supposed to skip acceptance test [acceptance test skip]

7fd762c

touch that is supposed to execute acceptance test workflow

f2f1310

lionel-nj added 4 commits October 26, 2021 10:33

remove "push" language

7955f08

remove comments

17425c6

touch [acceptance test skip]

01114bc

remove extra blank lines

db40891

barbeau approved these changes Oct 26, 2021


lionel-nj marked this pull request as ready for review October 26, 2021 20:11

lionel-nj merged commit 8590cd1 into master Oct 26, 2021

lionel-nj deleted the ci/use-github-matrix branch October 26, 2021 20:11

isabelle-dr linked an issue Oct 28, 2021 that may be closed by this pull request

Automated tests to see if a PR will trigger new errors in datasets #1045

Closed

isabelle-dr mentioned this pull request Oct 28, 2021

3.0.0 Release tracking #1046

Closed

4 tasks
