feat: acceptance tests #848
Conversation
After 1 billion trials, this works: 7dc7016. Stopping here for today, resuming tomorrow. To all those who received a bunch of emails: thanks for bearing with me 😅
As 5b96df7 demonstrates, including the "[ci skip]" keyword in a commit message prevents execution of the integration test workflow.
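For context, one common way to implement such a skip is a job-level condition on the head commit message. This is only an illustrative sketch; the job name and step list are assumptions, not necessarily how this PR wires it up:

```yaml
jobs:
  acceptance-test:
    # Skip the job when the head commit message contains "[ci skip]".
    if: "!contains(github.event.head_commit.message, '[ci skip]')"
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2
```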
This requires being able to run the validator without providing
For now, this is done via a GitHub Action.
#852 makes report names user-configurable so that we can make sure reports do not collide.
Per trial and error, the value passed to
Thanks, some first comments regarding the code style. I have not checked the overall logic yet.
Thanks @aababilov - c4e3459 introduces changes to reduce the complexity of getNewErrorCount.
My bad, it is now an
Thanks for the updates, Lionel! And thanks for fixing the performance.
Indeed, that was confusing - I used GSON custom deserialization instead of using the proxy
The latest update of this PR closes the reader right after reading the files.
Modified. @aababilov PTAL
@lionel-nj Thanks for working on this! Some feedback in-line below.
```java
 * as parameter.
 */
public int getNewErrorCount(ValidationReport other) {
  return Sets.difference(other.getErrorCodes(), getErrorCodes()).size();
```
I think we want to know about a change in the other direction too. For example, if the last validator release found 4 error types, and the latest snapshot only found 2 error types (and presumably a rule implementation changed), that's going to allow new data that was previously invalid. My understanding is that the current implementation doesn't catch this due to order of variables here.
Should we compare both ways, and return a positive or negative value depending on the direction of change?
As discussed during our last meeting: this could be nice but not a priority now. @isabelle-dr will come back to us with more details about the possible use cases needed for these acceptance tests.
In order to keep track of this, I will leave this discussion open for now. Will revisit in the future if needed.
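The two-way comparison discussed above could look something like the following sketch. This is illustrative only: it uses plain `java.util` sets rather than Guava, and the class and method names are assumptions, not the PR's actual code.

```java
import java.util.HashSet;
import java.util.Set;

public class SignedErrorDelta {

  /**
   * Returns a signed count: positive when the latest report raises error codes
   * the reference report does not, negative when error codes disappeared.
   */
  public static int signedNewErrorCount(Set<String> referenceCodes, Set<String> latestCodes) {
    Set<String> added = new HashSet<>(latestCodes);
    added.removeAll(referenceCodes); // codes that are new in the snapshot
    Set<String> removed = new HashSet<>(referenceCodes);
    removed.removeAll(latestCodes);  // codes the snapshot no longer raises
    return added.size() - removed.size();
  }

  public static void main(String[] args) {
    // Reference release found 4 error types; the snapshot only finds 2.
    Set<String> reference = Set.of("a", "b", "c", "d");
    Set<String> latest = Set.of("a", "b");
    System.out.println(signedNewErrorCount(reference, latest)); // prints -2
  }
}
```

A negative value would surface exactly the case raised above, where a rule change silently stops reporting error types.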
@barbeau thanks for reviewing! One question: how would you recommend proceeding to execute this new workflow only when files from the `validator` packages change? I tried to leverage:

```yaml
on:
  push:
    branches: [ master, new-test-module ]
    paths:
      - 'main/src/main/java/org/mobilitydata/gtfsvalidator/validator'
      - 'core/src/main/java/org/mobilitydata/gtfsvalidator/validator'
```

and:

```yaml
on:
  push:
    branches: [ master, new-test-module ]
    paths:
      - '../../main/src/main/java/org/mobilitydata/gtfsvalidator/validator'
      - '../../core/src/main/java/org/mobilitydata/gtfsvalidator/validator'
```

but these attempts were not successful.
@lionel-nj I think you're just missing the wildcard? Try something like:

```yaml
on:
  push:
    branches: [ master, new-test-module ]
    paths:
      - 'main/src/main/java/org/mobilitydata/gtfsvalidator/validator/**'
      - 'core/src/main/java/org/mobilitydata/gtfsvalidator/validator/**'
```

For example see https://help.sumologic.com/03Send-Data/Sources/04Reference-Information-for-Sources/Using-Wildcards-in-Paths.
That worked! Thank you @barbeau
Looking good @lionel-nj! Some comments in-line.
Thanks for reviewing. After discussion with @isabelle-dr: this CI process should be executed on any code change. Hence, I removed the support for only executing this workflow when changes were provided to
OK - you could still set it to ignore changes to `.md` files, like the main CI.
Ugh... I tried to add a quote reply and accidentally sent the whole review...
We have pretty good coverage of unit tests. However, if we look at the overall feature, it does the following:
We're testing almost every step, but we don't know if each step is correctly linked to the next one. Once we make an update to the code, this linking could be broken, and we should have a way to make sure everything works fine. This could be just an instruction on how to run the whole pipeline and assess its results; unfortunately, I didn't have time to look into how GitHub pipelines could be tested.
Edit: actually, all validation reports are already available in the artifacts persisted after execution of the pipeline (therefore, no need to open the Google Storage bucket to the public). So I will update the documentation with basic instructions to verify the execution of the acceptance test. We could still provide a notebook to automate the task in the future. cc @asvechnikov2 @isabelle-dr
- generate source corruption report
- refactor MainTest.writeFile method
- refactor ValidationReport
- implement resolve for code clarity and consistency
- additional unit tests
- clarify documentation
Thank you for this contribution.

### Information about source corruption

0 out of 1247 sources are corrupted.

### Acceptance test details

Also, the changes in this pull request did not trigger any new errors on known GTFS datasets from the MobilityDatabase.
Thank you for this contribution! 🍰✨🦄

### Information about source corruption

0 out of 1247 sources are corrupted.

### Acceptance test details

The changes in this pull request did not trigger any new errors on known GTFS datasets from the MobilityDatabase.
The emoji choice is on point 👌
Thanks! LGTM!
> actually all validation reports are already available in the artifacts persisted after execution of the pipeline (therefore, no need to open the Google Storage bucket to the public)

I saw messages from github-actions that reflect new changes, so it seems that it's possible to start the pipeline with new changes and verify its results manually. I think this should be enough to verify that everything works the way it's expected, so there's no need for any automation. We might want to add one broken feed to the sources to make sure that the pipeline catches and correctly processes this use case.
```python
comment = (
    "Thank you for this contribution! 🍰✨🦄 \n\n"
    "### Information about source "
    "corruption \n\n"
    f"{corrupted_sources_report['corruptedSourcesCount']} out of "
    f"{corrupted_sources_report['sourceIdCount']}"
    f" sources are corrupted."
)
```
This information snippet looks really great! No need to make any additional changes right now; I just want to file a feature request to provide information about how many sources there were, how many were broken, how many were newly broken, how many were corrupted, etc.
I opened #1085 to follow up on the next steps. Feel free to comment there if you see something missing. Thank you again for all the precious feedback 🙏
- remove unused variable
Thank you for this contribution! 🍰✨🦄

### Information about source corruption

0 out of 1248 sources are corrupted.

### Acceptance test details

The changes in this pull request did not trigger any new errors on known GTFS datasets from the MobilityDatabase.
After making the last modifications, I am super excited about merging this PR. Thank you very much to everyone involved in the discussion and the PR review process (@aababilov, @asvechnikov2, @barbeau, @isabelle-dr, @maximearmstrong).
Amazing work @lionel-nj 👏👏👏
Congrats @lionel-nj for all your work on this!
Bravo @lionel-nj
Great work @lionel-nj! Congrats 🙌 🚀
closes #734
Summary:
This PR provides changes to create a new module to be used for acceptance tests.
Expected behavior:
Input data:
where:

- `latest.json` is the validation report produced by the snapshot version of the validator
- `reference.json` is the validation report generated by the reference version of the validator (typically the version published on the `master` branch)

Comparison process
Validation reports (`latest.json` and `reference.json`) are compared for each `dataset-id-value`.

If `latest.json` contains a type of error notice (identified by `notice_code`) that is not included in `reference.json`, it is flagged by incrementing a counter related to the dataset in question (identified by an id).

If the value of this counter is greater than the allowed threshold (determined by command line input; please see the documentation in /docs/ACCEPTANCE_TEST.md), then the dataset is flagged as faulty.

At the end, the percentage of new faulty datasets is compared to the allowed threshold (determined by command line input; please see the documentation in /docs/ACCEPTANCE_TEST.md) to determine if a rule is "acceptable" or not.
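The thresholding described above might be sketched as follows. This is a simplified illustration under the assumptions stated in the comments; the class name, method signature, and parameter names are hypothetical, not the module's actual API.

```java
import java.util.Map;

public class AcceptanceDecision {

  /**
   * Decides whether a rule change is acceptable: a dataset is considered "faulty"
   * when its count of new error types exceeds newErrorThreshold, and the change is
   * rejected when the percentage of faulty datasets exceeds faultyPercentageThreshold.
   */
  public static boolean isAcceptable(
      Map<String, Integer> newErrorTypeCountPerDataset,
      int newErrorThreshold,
      double faultyPercentageThreshold) {
    long faultyCount = newErrorTypeCountPerDataset.values().stream()
        .filter(count -> count > newErrorThreshold)
        .count();
    double faultyPercentage = 100.0 * faultyCount / newErrorTypeCountPerDataset.size();
    return faultyPercentage <= faultyPercentageThreshold;
  }

  public static void main(String[] args) {
    Map<String, Integer> counts =
        Map.of("dataset-a", 0, "dataset-b", 3, "dataset-c", 0, "dataset-d", 0);
    // 1 of 4 datasets (25%) exceeds the 1-new-error-type threshold.
    System.out.println(isAcceptable(counts, 1, 20.0)); // prints false
  }
}
```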
Final output

The final outputs:

- `acceptance_report.json`, which contains information about the differences encountered for each source
- `corrupted_sources_report.json`, which contains information about sources that could not be taken into account for the acceptance test
- A console log that highlights the percentage of existing datasets that contain more than 1 new type of error
- A comment on the PR with a link to the acceptance test report
- Workflow status is green (✅) if the acceptance test passed or red (❌) if it did not
Error handling

| Case | Behavior |
| --- | --- |
| Directory `output` is empty | log: "Specified directory is empty, cannot generate acceptance tests report." |
| No report is available for a given id | System exits with error code 1 |
| One of the reports is not available for a given id | System exits with error code 1 |
Please make sure these boxes are checked before submitting your pull request - thanks!
- Run `gradle test` to make sure you didn't break anything