Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Initial support for GTFS Fares v2 - fare media #1305

Merged
merged 18 commits into from
Mar 31, 2023

Conversation

bdferris-v2
Copy link
Collaborator

As described in google/transit#355. This is a WIP that is intended to track the proposed modifications the spec. Closes #1301.

Please make sure these boxes are checked before submitting your pull request - thanks!

  • Run the unit tests with gradle test to make sure you didn't break anything
  • Format the title like "feat: [new feature short description]". Title must follow the Conventional Commit Specification(https://www.conventionalcommits.org/en/v1.0.0/).
  • Linked all relevant issues

Copy link
Contributor

@isabelle-dr isabelle-dr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@bdferris-v2 Thanks for working on this 🙏! Can we make this fork available for stakeholders producing this data?
Here are the name changes that we've made after you created this PR:

  • fare_payment_types.txt -> fare_payment_options.txt
  • fare_payment_type_group_id -> fare_payment_option_group_id
  • fare_payment_type_name -> fare_payment_option_name
  • fare_payment_type -> fare_payment_option_type

@bdferris-v2
Copy link
Collaborator Author

@isabelle-dr done

@github-actions
Copy link
Contributor

✅ Rule acceptance tests passed.
New Errors: 0 out of 1419 datasets (~0%) are invalid due to code change, which is less than the provided threshold of 1%.
Dropped Errors: 1 out of 1419 datasets (~0%) are invalid due to code change, which is less than the provided threshold of 1%.
0 out of 1419 sources (~0 %) are corrupted.
Commit: 2965826
Download the full acceptance test report here (report will disappear after 90 days).
✅ Rule acceptance tests passed.

@isabelle-dr isabelle-dr changed the title feat: Initial support for GTFS Fares v2 - fare containers / payment types feat: Initial support for GTFS Fares v2 - fare media Feb 28, 2023
@github-actions
Copy link
Contributor

github-actions bot commented Mar 6, 2023

✅ Rule acceptance tests passed.
New Errors: 1 out of 1419 datasets (~0%) are invalid due to code change, which is less than the provided threshold of 1%.
Dropped Errors: 0 out of 1419 datasets (~0%) are invalid due to code change, which is less than the provided threshold of 1%.
0 out of 1419 sources (~0 %) are corrupted.
Commit: 089ad3e
Download the full acceptance test report here (report will disappear after 90 days).
✅ Rule acceptance tests passed.

@isabelle-dr
Copy link
Contributor

@bdferris-v2 this is now unblocked :)
I believe we only need to update medium -> media in this PR.

I suggest we keep the paper_ticket as well, I expect it to be adopted soon and I see more pros (avoid false positives, avoid confusion) than cons (not yet adopted) in keeping it in this validator.
Let me know if you think otherwise.

@bdferris-v2
Copy link
Collaborator Author

@isabelle-dr I had already updated the bulk of references - just had to fix a few lingering mentions in the docs. This is ready for review.

@github-actions
Copy link
Contributor

✅ Rule acceptance tests passed.
New Errors: 0 out of 1420 datasets (~0%) are invalid due to code change, which is less than the provided threshold of 1%.
Dropped Errors: 1 out of 1420 datasets (~0%) are invalid due to code change, which is less than the provided threshold of 1%.
0 out of 1420 sources (~0 %) are corrupted.
Commit: e2d33bf
Download the full acceptance test report here (report will disappear after 90 days).
✅ Rule acceptance tests passed.

Copy link
Contributor

@isabelle-dr isabelle-dr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@bdferris-v2 there seems to be a problem with the structure of the duplicate_fare_media notice. See the image of the report.
Screenshot 2023-03-14 at 8 03 18 PM
Here is the JSON if it helps:

{"code":"duplicate_fare_media",
  "severity":"WARNING",
  "totalNotices":1,
  "sampleNotices":[{
    "fareMedia1":{
        "csvRowNumber":6.0,"fareMediaId":"cash"},
      "fareMedia2":{
        "csvRowNumber":7.0,"fareMediaId":"cash2"}}]}

RULES.md Outdated Show resolved Hide resolved
RULES.md Outdated Show resolved Hide resolved
bdferris-v2 and others added 2 commits March 15, 2023 12:08
Co-authored-by: isabelle-dr <isabelle@mobilitydata.org>
Co-authored-by: isabelle-dr <isabelle@mobilitydata.org>
@github-actions
Copy link
Contributor

✅ Rule acceptance tests passed.
New Errors: 0 out of 1420 datasets (~0%) are invalid due to code change, which is less than the provided threshold of 1%.
Dropped Errors: 1 out of 1420 datasets (~0%) are invalid due to code change, which is less than the provided threshold of 1%.
0 out of 1420 sources (~0 %) are corrupted.
Commit: bce6bd8
Download the full acceptance test report here (report will disappear after 90 days).
✅ Rule acceptance tests passed.

RULES.md Outdated Show resolved Hide resolved
@bdferris-v2
Copy link
Collaborator Author

So back to the high-level question from @isabelle-dr. I'll admit I took a slightly different approach to Notice schema modeling here and I wanted to explain why.

We have lots of notices where we are documenting the differences between two entities in a file. The existing convention for notices has been to introduce fields like:

  • csvRowNumber1
  • filename1
  • fieldName1

and corresponding:

  • csvRowNumber2
  • filename2
  • fieldName2

While that works, it leads to duplication of field descriptions/documentation and logic for setting field values.

Here, I took a different approach, where I bundled the fields for an entity together (csvRowNumber and fareMediaId) and then duplicated those across two parent fields fareMedia1 and fareMedia2.

The resulting JSON schema looks like:

{
  "fareMedia1": { "csvRowNumber": 1, "fareMediaId": "id1" },
  "fareMedia2": { "csvRowNumber": 2, "fareMediaId": "id2" }
}

I'll admit that renders a little weirdly in the existing report UI, but I have a follow-up PR that would make it render like this:

Screenshot 2023-03-20 at 10 47 54 AM

Note the nested rows of field name headers at the top.

Thoughts?

@isabelle-dr isabelle-dr added this to the 4.1.0 milestone Mar 20, 2023
@github-actions
Copy link
Contributor

✅ Rule acceptance tests passed.
New Errors: 0 out of 1423 datasets (~0%) are invalid due to code change, which is less than the provided threshold of 1%.
Dropped Errors: 1 out of 1423 datasets (~0%) are invalid due to code change, which is less than the provided threshold of 1%.
New Warnings: 0 out of 1423 datasets (~0%) are invalid due to code change, which is less than the provided threshold of 1%.
Dropped Warnings: 0 out of 1423 datasets (~0%) are invalid due to code change, which is less than the provided threshold of 1%.
0 out of 1423 sources (~0 %) are corrupted.
Commit: c334e58
Download the full acceptance test report here (report will disappear after 90 days).
✅ Rule acceptance tests passed.

@isabelle-dr
Copy link
Contributor

Okay, I understand. This seems more straightforward indeed.

I am wondering if the users that programmatically consume the JSON report will have issues with this.
@lauriemerrell and @derhuerst do you foresee any issues with having this additional nesting in the JSON report?

@lauriemerrell
Copy link

As noted on Slack, @atvaccaro will review for Cal-ITP impacts; thanks for the call-out @isabelle-dr, this does affect us!

@atvaccaro
Copy link

atvaccaro commented Mar 30, 2023

As noted on Slack, @atvaccaro will review for Cal-ITP impacts; thanks for the call-out @isabelle-dr, this does affect us!

Cross-posting from Slack: would it be possible to stick with either totally unnested or totally nested notices? Right we just parse each record as flat i.e. no nesting (https://github.com/cal-itp/data-infra/blob/main/airflow/dags/create_external_tables/gtfs_schedule_v2/validation_notices.yml#L41-L299) and while I wouldn’t be opposed to the nesting in general, my understanding is that the PR would only change to nesting for this specific fares work. (please correct me if I’m wrong)

We can handle nested fields for sure, just wondering about overall consistency.

@bdferris-v2
Copy link
Collaborator Author

It's true that this change is only for the single fare-related notice. There are a handful of existing notices that I believe could benefit from a nested field approach:

duplicate_key
foreign_key_violation
block_trips_with_overlapping_stop_times
duplicate_route_name
overlapping_frequency
wrong_parent_location_type
decreasing_shape_distance
equal_shape_distance_diff_coordinates
equal_shape_distance_same_coordinates
stops_match_shape_out_of_order
decreasing_or_equal_stop_time_distance
fast_travel_between_consecutive_stops
fast_travel_between_far_stops

but I acknowledge it's a more significant decision to migrate all of them and probably one we couldn't make in time for the next release. And if we were going to make a breaking change to the notice schema, I suppose I'd want to use it as an opportunity to fix some other inconsistencies as well (csvRowNumber1 vs csvRowNumberA vs oldCsvRowNumber, etc).

So I guess that's a long way of saying that I will revert the nest field notices in this PR for now.

@github-actions
Copy link
Contributor

✅ Rule acceptance tests passed.
New Errors: 0 out of 1423 datasets (~0%) are invalid due to code change, which is less than the provided threshold of 1%.
Dropped Errors: 1 out of 1423 datasets (~0%) are invalid due to code change, which is less than the provided threshold of 1%.
New Warnings: 0 out of 1423 datasets (~0%) are invalid due to code change, which is less than the provided threshold of 1%.
Dropped Warnings: 0 out of 1423 datasets (~0%) are invalid due to code change, which is less than the provided threshold of 1%.
0 out of 1423 sources (~0 %) are corrupted.
Commit: d58bb44
Download the full acceptance test report here (report will disappear after 90 days).
✅ Rule acceptance tests passed.

@derhuerst
Copy link

While I don't like that we're adding a different format/schema just for our use case here, I agree that breaking backwards compatibility (by changing all other notices' formats to work this way too) should be done in a more coordinated fashion. Si I'm okay with the format. 👍

@derhuerst
Copy link

derhuerst commented Mar 30, 2023

I am wondering if the users that programmatically consume the JSON report will have issues with this. […] @derhuerst do you […]?

I should note that I do not parse GTFS Validator's output in production yet. I have built a quick prototype script once, and I'm planning to use it in production, but not yet. 😊

@isabelle-dr
Copy link
Contributor

Thank you for the feedback.
@bdferris-v2, I've updated the notice schema in the doc to represent your latest changes.
Merging this PR, thank you!

@isabelle-dr isabelle-dr merged commit 09d0882 into master Mar 31, 2023
@isabelle-dr isabelle-dr deleted the issue/1301/fare_containers branch March 31, 2023 13:18
@github-actions
Copy link
Contributor

✅ Rule acceptance tests passed.
New Errors: 0 out of 1423 datasets (~0%) are invalid due to code change, which is less than the provided threshold of 1%.
Dropped Errors: 1 out of 1423 datasets (~0%) are invalid due to code change, which is less than the provided threshold of 1%.
New Warnings: 0 out of 1423 datasets (~0%) are invalid due to code change, which is less than the provided threshold of 1%.
Dropped Warnings: 0 out of 1423 datasets (~0%) are invalid due to code change, which is less than the provided threshold of 1%.
0 out of 1423 sources (~0 %) are corrupted.
Commit: 81dfd0f
Download the full acceptance test report here (report will disappear after 90 days).
✅ Rule acceptance tests passed.

@atvaccaro
Copy link

It's true that this change is only for the single fare-related notice. There are a handful of existing notices that I believe could benefit from a nested field approach:
...
And if we were going to make a breaking change to the notice schema, I suppose I'd want to use it as an opportunity to fix some other inconsistencies as well (csvRowNumber1 vs csvRowNumberA vs oldCsvRowNumber, etc).

Sounds good to me! We'd definitely be on board with normalization and consistency fixes, but agree that doing it in a single breaking change would be ideal.

ryon pushed a commit to JarvusInnovations/gtfs-validator that referenced this pull request Apr 1, 2023
* Initial entry for fare payment type schema and validation.

* Update to reflect new fare_payment_options.txt naming.

* Update to reflect fare payment option group discussion.

* Update to match fare_media.txt proposal.

* fare medium => fare media

* Clean up a few last usages of `medium`.

* Documentation.

* Fix link in RULES.md

Co-authored-by: isabelle-dr <isabelle@mobilitydata.org>

* Fix link in RULES.md

Co-authored-by: isabelle-dr <isabelle@mobilitydata.org>

* Remove extraneous tick-tick-ticks.

* Flatten notice fields.

* Add new notice fields.

* Update RULES.md

---------

Co-authored-by: isabelle-dr <isabelle@mobilitydata.org>
bradyhunsaker pushed a commit to bradyhunsaker/gtfs-validator that referenced this pull request Apr 25, 2023
* Initial entry for fare payment type schema and validation.

* Update to reflect new fare_payment_options.txt naming.

* Update to reflect fare payment option group discussion.

* Update to match fare_media.txt proposal.

* fare medium => fare media

* Clean up a few last usages of `medium`.

* Documentation.

* Fix link in RULES.md

Co-authored-by: isabelle-dr <isabelle@mobilitydata.org>

* Fix link in RULES.md

Co-authored-by: isabelle-dr <isabelle@mobilitydata.org>

* Remove extraneous tick-tick-ticks.

* Flatten notice fields.

* Add new notice fields.

* Update RULES.md

---------

Co-authored-by: isabelle-dr <isabelle@mobilitydata.org>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Initial support for GTFS Fares v2 - Fare Containers
5 participants