
feat: acceptance tests #848

Merged (556 commits into master on Jan 10, 2022)

Conversation

@lionel-nj (Contributor) commented Apr 9, 2021:

closes #734

Summary:

This PR introduces a new module to be used for acceptance tests.

Expected behavior:

Input data:

/reports/archive-id-1
  - latest.json
  - reference.json
/reports/archive-id-2
  - latest.json
  - reference.json
/reports/archive-id-3
  - latest.json
  - reference.json

where:

  • latest.json is the validation report produced by the snapshot version of the validator
  • reference.json is the validation report produced by the reference version of the validator (typically the version published on the master branch)

Comparison process

Validation reports (latest.json and reference.json) are compared for each dataset-id-value.
If latest.json contains a type of error notice (identified by notice_code) that is not included in reference.json, then it is flagged by incrementing a counter related to the dataset in question (identified by an id).
If the value in this counter is greater than the allowed threshold (determined by command line input, please see documentation in /docs/ACCEPTANCE_TEST.md), then the dataset is flagged as faulty.

At the end, the percentage of newly faulty datasets is compared to the allowed threshold (determined by command line input; see documentation in /docs/ACCEPTANCE_TEST.md) to determine whether a rule is "acceptable" or not.
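
To make the comparison above concrete, here is a minimal, hypothetical sketch of the per-dataset check. Class and method names are illustrative only, not the PR's actual API, and it assumes Guava's Sets utility:

import com.google.common.collect.Sets;
import java.util.Map;
import java.util.Set;

public final class AcceptanceCheckSketch {

  // Notice codes present in latest.json but absent from reference.json.
  static int newErrorCount(Set<String> referenceCodes, Set<String> latestCodes) {
    return Sets.difference(latestCodes, referenceCodes).size();
  }

  // A change is "acceptable" when the share of datasets with more new error types
  // than newErrorThreshold stays within maxFaultyDatasetPercentage.
  static boolean isAcceptable(
      Map<String, Integer> newErrorCountPerDatasetId,
      int newErrorThreshold,
      double maxFaultyDatasetPercentage) {
    long faultyDatasets =
        newErrorCountPerDatasetId.values().stream()
            .filter(count -> count > newErrorThreshold)
            .count();
    double faultyPercentage = 100.0 * faultyDatasets / newErrorCountPerDatasetId.size();
    return faultyPercentage <= maxFaultyDatasetPercentage;
  }
}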

Final output

The final outputs:

  1. A JSON file named acceptance_report.json that contains information about the differences encountered for each source, formatted as follows:
{
  "newErrors": [
    {
      "noticeCode": "first_notice_code",
      "affectedSourcesCount": 2,
      "affectedSources": [
        {
          "sourceId": "source-id-1",
          "sourceUrl": "url to the latest version of the dataset issued by source-id-1",
          "count": 4
        },
        {
          "sourceId": "source-id-2",
          "sourceUrl": "url to the latest version of the dataset issued by source-id-2",
          "count": 6
        }
      ]
    },
    {
      "noticeCode": "fourth_notice_code",
      "affectedSourcesCount": 1,
      "affectedSources": [
        {
          "sourceId": "source-id-5",
          "sourceUrl": "url to the latest version of the dataset issued by source-id-5",
          "count": 5
        }
      ]
    },
    {
      "noticeCode": "second_notice_code",
      "affectedSourcesCount": 1,
      "affectedSources": [
        {
          "sourceId": "source-id-2",
          "sourceUrl": "url to the latest version of the dataset issued by source-id-2",
          "count": 40
        }
      ]
    },
    {
      "noticeCode": "third_notice_code",
      "affectedSourcesCount": 3,
      "affectedSources": [
        {
          "sourceId": "source-id-1",
          "sourceUrl": "url to the latest version of the dataset issued by source-id-1",
          "count": 40
        },
        {
          "sourceId": "source-id-3",
          "sourceUrl": "url to the latest version of the dataset issued by source-id-3",
          "count": 15
        },
        {
          "sourceId": "source-id-5",
          "sourceUrl": "url to the latest version of the dataset issued by source-id-5",
          "count": 2
        }
      ]
    }
  ]
}
  2. A JSON file named corrupted_sources_report.json that contains information about sources that could not be taken into account for the acceptance test, formatted as follows (a sketch of the status computation appears after this list):
{
  "corruptedSources": [
    "source-id-1",
    "source-id-2",
  ],
  "sourceIdCount": 1245,
  "status": "valid",
  "corruptedSourcesCount": 2,
  "maxPercentageCorruptedSources": 2
}
  3. A console log that shows the percentage of existing datasets that contain more than one new type of error.

  4. A comment on the PR with a link to the acceptance test report.

  5. Workflow status is green ( ✅ ) if the acceptance test passed or red ( ❌ ) if it did not.

[Screenshot: 2021-10-28 14:44:55]

[Screenshot: 2021-11-23 16:45:46]
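
As referenced in item 2 above, here is a minimal, hypothetical sketch of how the status field of corrupted_sources_report.json might be derived from its other fields. The method and parameter names are assumptions, not the PR's actual code:

final class CorruptedSourcesCheckSketch {

  // "valid" when the share of corrupted sources stays within the allowed percentage,
  // e.g. 2 corrupted out of 1245 sources (~0.16%) with a 2% ceiling.
  static String status(
      int corruptedSourcesCount, int sourceIdCount, double maxPercentageCorruptedSources) {
    double corruptedPercentage = 100.0 * corruptedSourcesCount / sourceIdCount;
    return corruptedPercentage <= maxPercentageCorruptedSources ? "valid" : "invalid";
  }
}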

Error handling

  1. The specified directory is empty
    log: "Specified directory is empty, cannot generate acceptance tests report."

  2. No report is available for a given id
    System exits with error code 1

  3. One of the reports is not available for a given id
    System exits with error code 1

Please make sure these boxes are checked before submitting your pull request - thanks!

  • Run the unit tests with gradle test to make sure you didn't break anything
  • Format the title like "feat: [new feature short description]". Title must follow the Conventional Commit Specification (https://www.conventionalcommits.org/en/v1.0.0/).
  • Linked all relevant issues
  • Include screenshot(s) showing how this pull request works and fixes the issue(s)

@lionel-nj lionel-nj self-assigned this Apr 9, 2021
@lionel-nj (Contributor, Author) commented Apr 14, 2021:

After 1 billion trials, this works: 7dc7016. Stopping here for today, resuming tomorrow.

To all those that received a bunch of emails: thanks for bearing with me 😅

@lionel-nj (Contributor, Author):

As 5b96df7 demonstrates, including the "[ci skip]" keyword in a commit message will prevent the execution of the integration test workflow.
[Screenshot: 2021-04-14 11:25:43]

@lionel-nj (Contributor, Author):

> Build the CLI project with the latest changes and run the gtfs-validator JAR in the GitHub action (i.e., what we're currently doing)

This requires being able to run the validator without providing the -f CLI arg. #851 has been created to this end.

> Download the latest release JAR from https://github.com/MobilityData/gtfs-validator/releases (GitHub might be able to do this directly vs. making an HTTP request?) and run it

For now this is done via a GitHub action.

> We'll need to make sure the two sets of output don't collide.

#852 makes report names user-configurable so that we can make sure reports do not collide.

@lionel-nj (Contributor, Author) commented Apr 27, 2021:

Through trial and error, I found that the value passed to DATASETS should be a compact stringified JSON object such as:

{"include":[{"url":"http://webapps.thebus.org/transitdata/Production/google_transit.zip","output":"thb"},{"url":"http://www.transperth.wa.gov.au/TimetablePDFs/GoogleTransit/Production/google_transit.zip","output":"transperth"},{"url":"https://octa.net/current/google_transit.zip","output":"octa"}]}

Review threads on main/build.gradle and comparator/build.gradle (two threads) were marked outdated and resolved.
@lionel-nj (Contributor, Author):

Thanks for the advice @barbeau! That worked: 562ec00 💯

[Screenshot: 2021-04-29 16:47:39]

@aababilov (Collaborator) left a review comment:

Thanks, some first comments regarding the code style. I have not checked the overall logic yet.

@lionel-nj (Contributor, Author) commented May 11, 2021:

Thanks @aababilov - c4e3459 introduces changes to reduce the complexity of getNewErrorCount.

  • I encapsulated ValidationReport in ValidationReportContainer so that the set of error codes is generated only once: we no longer have to create this set each time getErrorCodes is called.

> You still have code duplication and you do not close your latestReportReader.

  • ValidationReportContainer now implements AutoCloseable so that each container is automatically closed after usage in the try-with-resources block (see the sketch after this list).
  • ValidationReportContainer has a fromPath method to avoid code duplication in the main method.
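
A minimal sketch of that pattern, assuming a container that only wraps the open reader. The names and fields are illustrative, not the actual classes from this PR (which were later replaced by custom GSON deserialization):

import java.io.BufferedReader;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

final class ReportContainerSketch implements AutoCloseable {
  private final BufferedReader reader;

  private ReportContainerSketch(BufferedReader reader) {
    this.reader = reader;
  }

  // Opens the report at the given path; callers close it via try-with-resources.
  static ReportContainerSketch fromPath(Path path) throws IOException {
    return new ReportContainerSketch(Files.newBufferedReader(path));
  }

  BufferedReader reader() {
    return reader;
  }

  @Override
  public void close() throws IOException {
    reader.close();
  }
}

final class ComparisonSketch {
  static void compare(Path latestPath, Path referencePath) throws IOException {
    // Both containers (and their underlying readers) are closed automatically here.
    try (ReportContainerSketch latest = ReportContainerSketch.fromPath(latestPath);
        ReportContainerSketch reference = ReportContainerSketch.fromPath(referencePath)) {
      // ... parse each reader into a validation report and compare the two reports ...
    }
  }
}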

> But why double?

My bad, it is now an int as it should have been from the beginning.

@lionel-nj lionel-nj requested a review from aababilov May 11, 2021 14:18
@aababilov (Collaborator):

Thanks for updates, Lionel! And thanks for fixing the performance.

@lionel-nj (Contributor, Author) commented May 12, 2021:

> I am not sure that I see how this class helps to simplify the code. Instead, it mixes some unrelated logic:
>
> - reading of ValidationReport - with holding the input Reader object open;
> - caching reportErrorCodes that logically belong to ValidationReport

Indeed, that was confusing - I used GSON custom deserialization instead of using the proxy ValidationReportContainer. Now all the logic is in ValidationReport whose construction process has been clarified.
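
A hedged sketch of what such a custom GSON deserializer might look like; the JSON field names ("notices", "code") and the class shape are assumptions, not the PR's actual report model:

import com.google.gson.GsonBuilder;
import com.google.gson.JsonDeserializationContext;
import com.google.gson.JsonDeserializer;
import com.google.gson.JsonElement;
import java.lang.reflect.Type;
import java.util.LinkedHashSet;
import java.util.Set;

final class ValidationReportSketch {
  private final Set<String> errorCodes;

  ValidationReportSketch(Set<String> errorCodes) {
    this.errorCodes = errorCodes;
  }

  Set<String> getErrorCodes() {
    return errorCodes;
  }

  // Builds the report (and its cached set of error codes) directly while deserializing,
  // so no separate container class is needed.
  static final class Deserializer implements JsonDeserializer<ValidationReportSketch> {
    @Override
    public ValidationReportSketch deserialize(
        JsonElement json, Type typeOfT, JsonDeserializationContext context) {
      Set<String> codes = new LinkedHashSet<>();
      for (JsonElement notice : json.getAsJsonObject().getAsJsonArray("notices")) {
        codes.add(notice.getAsJsonObject().get("code").getAsString());
      }
      return new ValidationReportSketch(codes);
    }
  }

  static ValidationReportSketch fromJson(String reportJson) {
    return new GsonBuilder()
        .registerTypeAdapter(ValidationReportSketch.class, new Deserializer())
        .create()
        .fromJson(reportJson, ValidationReportSketch.class);
  }
}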

> Why hold the reader open after we read all the report?

The latest update of this PR closes the reader right after reading the files.

> This description of "return" is too long and it repeats the information above.

Modified.

@aababilov PTAL

@barbeau (Member) left a review comment:

@lionel-nj Thanks for working on this! Some feedback in-line below.

Review threads on .github/workflows/integration_test.yml (two threads), README.md, and docs/INTEGRATION_TESTS.md (two threads) were marked outdated and resolved.
 * as parameter.
 */
public int getNewErrorCount(ValidationReport other) {
  return Sets.difference(other.getErrorCodes(), getErrorCodes()).size();
}
Review comment from a Member:
I think we want to know about a change in the other direction too. For example, if the last validator release found 4 error types, and the latest snapshot only found 2 error types (and presumably a rule implementation changed), that's going to allow new data that was previously invalid. My understanding is that the current implementation doesn't catch this due to order of variables here.

Should we compare both ways, and return a positive or negative value depending on the direction of change?
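
A hedged sketch of the two-way comparison suggested here; this was not adopted in this PR, and the method name is illustrative and assumes Guava's Sets:

import com.google.common.collect.Sets;
import java.util.Set;

final class ErrorCodeDiffSketch {

  // Positive when the latest report introduces error types the reference lacked,
  // negative when the latest report drops error types the reference still reported.
  static int signedErrorTypeDelta(Set<String> referenceCodes, Set<String> latestCodes) {
    int added = Sets.difference(latestCodes, referenceCodes).size();
    int removed = Sets.difference(referenceCodes, latestCodes).size();
    return added - removed;
  }
}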

Reply from @lionel-nj (Contributor, Author):

As discussed during our last meeting: this could be nice but not a priority now. @isabelle-dr will come back to us with more details about the possible use cases needed for these acceptance tests.

In order to keep track of this, I will leave this discussion open for now. Will revisit in the future if needed.

@lionel-nj lionel-nj changed the title feat: create additional module for integration tests feat: create additional module for acceptance tests May 31, 2021
@lionel-nj (Contributor, Author):

@barbeau thanks for reviewing!

One question: how would you recommend executing this new workflow only when files from the package org.mobilitydata.gtfsvalidator.validator are changed?

I tried to leverage paths:

on:
  push:
    branches: [ master, new-test-module ]
    paths:
      - 'main/src/main/java/org/mobilitydata/gtfsvalidator/validator'
      - 'core/src/main/java/org/mobilitydata/gtfsvalidator/validator'
on:
  push:
    branches: [ master, new-test-module ]
    paths:
      - '../../main/src/main/java/org/mobilitydata/gtfsvalidator/validator'
      - '../../core/src/main/java/org/mobilitydata/gtfsvalidator/validator'

but these attempts were not successful.

@lionel-nj lionel-nj requested a review from barbeau May 31, 2021 20:51
@barbeau (Member) commented Jun 1, 2021:

@lionel-nj I think you're just missing the wildcard?

Try something like:

on:
  push:
    branches: [ master, new-test-module ]
    paths:
      - 'main/src/main/java/org/mobilitydata/gtfsvalidator/validator/**'
      - 'core/src/main/java/org/mobilitydata/gtfsvalidator/validator/**'

For example see https://help.sumologic.com/03Send-Data/Sources/04Reference-Information-for-Sources/Using-Wildcards-in-Paths.

@lionel-nj (Contributor, Author):

That worked! Thank you @barbeau

[Screenshot: 2021-06-01 10:26:03]

@barbeau (Member) left a review comment:

Looking good @lionel-nj! Some comments in-line.

Review threads on .github/workflows/acceptance_test.yml (three threads), output-comparator/README.md (two threads), and docs/ACCEPTANCE_TESTS.md were marked outdated and resolved.
@lionel-nj (Contributor, Author):

Thanks for reviewing. After discussion with @isabelle-dr: this CI process should be executed on any code change. Hence, I removed the restriction that executed this workflow only when changes were made to the validator module.

@barbeau (Member) commented Jun 11, 2021:

> After discussion with @isabelle-dr: this CI process should be executed on any code change.

OK - you could still set it to ignore changes to .md files, like the main CI does.

@lionel-nj lionel-nj requested a review from barbeau June 11, 2021 17:56
@asvechnikov2 (Collaborator):

Ugh... I tried to add a quote reply and accidentally sent the whole review...

> How would such test differ from the unit test provided in MainTest? From my understanding MainTest tests the comparison process. Do you think that these integration tests should test the entire Github pipeline?

We have pretty good coverage of unit tests; however, if we look at the overall feature, it does the following:

  • Fetches list of urls
  • Fetches individual datasets
  • Runs validation on the datasets and stores results
  • Runs reports comparison
  • Updates GitHub according to comparison

We're testing almost every step, but we don't know whether each step is correctly linked to the next one. Once we make an update to the code, this linking could be broken, and we should have a way to make sure everything works fine. This could be just an instruction on how to run the whole pipeline and assess its results; unfortunately, I didn't have time to look at how GitHub pipelines could be tested.

@lionel-nj (Contributor, Author) commented Dec 23, 2021:

> We're testing almost every step, but we don't know whether each step is correctly linked to the next one. Once we make an update to the code, this linking could be broken, and we should have a way to make sure everything works fine. This could be just an instruction on how to run the whole pipeline and assess its results; unfortunately, I didn't have time to look at how GitHub pipelines could be tested.

One thing that could be done to test the pipeline's execution would be allowing download of the validation reports from the Google Storage Bucket that we use. We also have an internal notebook that is used to compute information about the state of datasets - which could be leveraged for the sake of verification. We could integrate and document this verification process in a subsequent step. What do you think about that @asvechnikov2?

Edit: actually all validation reports are already available in the artifacts persisted after execution of the pipeline (therefore, no need to open the Google Storage bucket to the public). So I will update the documentation with basic instructions to verify the execution of the acceptance test. We could still provide a notebook to automate the task in the future. @asvechnikov2 @isabelle-dr

cc @isabelle-dr

lionel-nj added 3 commits December 23, 2021 11:17
- generate source corruption report
- refactor MainTest.writeFile method
- refactor ValidationReport
- implement resolve for code clarity and consistency
- additional unit tests
- clarify documentation
@github-actions (bot):

Thank you for this contribution.

Information about source corruption

0 out of 1247 sources are corrupted.
The following sources are corrupted:

Acceptance test details

Also, the changes in this pull request did not trigger any new errors on known GTFS datasets from the MobilityDatabase.
Download the full acceptance test report for commit 6a01baa here (report will disappear after 90 days).

@github-actions (bot):

Thank you for this contribution! 🍰✨🦄

Information about source corruption

0 out of 1247 sources are corrupted.

Acceptance test details

The changes in this pull request did not trigger any new errors on known GTFS datasets from the MobilityDatabase.
Download the full acceptance test report for commit 6eb4f45 here (report will disappear after 90 days).

@isabelle-dr (Contributor):

The emoji choice is on point 👌

@asvechnikov2 (Collaborator) left a review comment:

Thanks! LGTM!

> actually all validation reports are already available in the artifacts persisted after execution of the pipeline (therefore, no need to open the Google Storage bucket to the public)

I saw messages from github-actions that reflect new changes, so it seems that it's possible to start the pipeline with new changes and verify its results manually. I think this should be enough to verify that everything works the way it's expected, so there's no need for any automation. We might want to add one broken feed to the sources to make sure that the pipeline catches and correctly processes this use case.

Comment on lines +71 to +78
comment = (
"Thank you for this contribution! 🍰✨🦄 \n\n"
"### Information about source "
"corruption \n\n"
f"{corrupted_sources_report['corruptedSourcesCount']} out of "
f"{corrupted_sources_report['sourceIdCount']}"
f" sources are corrupted."
)
Review comment from a Collaborator:
This information snippet looks really great! No need to make any additional changes right now; I just want to file a feature request to provide information about how many sources there were, how many were broken, how many were newly broken, how many were corrupted, etc.

Reply from a Contributor:

I opened #1085 to follow up on the next steps. Feel free to comment there if you see something missing. Thank you again for all the precious feedback 🙏

- remove unused variable
@github-actions (bot):

Thank you for this contribution! 🍰✨🦄

Information about source corruption

0 out of 1248 sources are corrupted.

Acceptance test details

The changes in this pull request did not trigger any new errors on known GTFS datasets from the MobilityDatabase.
Download the full acceptance test report for commit 2d320f2 here (report will disappear after 90 days).

@lionel-nj (Contributor, Author) commented Jan 10, 2022:

After providing the last modifications, I am super excited about merging this PR. Thank you very much to everyone involved in the discussions and the PR review process (@aababilov, @asvechnikov2, @barbeau, @isabelle-dr, @maximearmstrong).

@MobilityData MobilityData deleted a comment from github-actions bot Jan 10, 2022
@lionel-nj lionel-nj merged commit fed4bad into master Jan 10, 2022
@lionel-nj lionel-nj deleted the new-test-module branch January 10, 2022 12:24
@isabelle-dr (Contributor) commented Jan 10, 2022:

Amazing work @lionel-nj 👏👏👏
Massive Kudos for bringing this feature to the (first) finish line :)

@barbeau (Member) commented Jan 10, 2022:

Congrats @lionel-nj for all your work on this!

@f8full (Contributor) commented Jan 11, 2022:

Bravo @lionel-nj

@maximearmstrong (Contributor):

Great work @lionel-nj ! Congrats 🙌 🚀

Development

Successfully merging this pull request may close these issues.

  • Automated tests to see if a PR will trigger new errors in datasets
  • Write higher level acceptance tests
7 participants