
Question: Have you thought about these validation rules? #1729

Open
dancesWithCycles opened this issue Apr 2, 2024 · 2 comments
Labels
new rule (New rule to be added), status: Needs triage (Applied to all new issues)

Comments

@dancesWithCycles

Describe the problem

Hi folks,
Thank you so much for maintaining this repository and this rule overview!

I came across the Duplicate Route Name Rule and thought to myself:

Has anyone already given a Duplicate Trip ID Rule and a Duplicate Trip Name Rule any thought?

According to trips.txt, trip_id and trip_short_name shall be unique (at least on a per-service-day basis). If a GTFS feed is the result of merging schedule data from many different public transport providers, it is a common observation (at least for me) that trip_id and trip_short_name values are unique within a single source but no longer unique in the resulting overall GTFS feed. Looking at the first and last departure stop and time, I can tell that several stop_times.txt entries belong to different trips but have the same trip_id or trip_short_name. Any idea how to tackle this observation with the GTFS validator?
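For illustration only, the trip_short_name half of this check could be sketched outside the validator as follows. This is a minimal sketch, not an existing validator rule: the function name `duplicate_short_names` is hypothetical, and it assumes that sharing a service_id is a good enough proxy for "same service day" (two different service_id values can still be active on the same date, which this sketch does not detect).

```python
import csv
import io
from collections import defaultdict


def duplicate_short_names(trips_csv: str) -> dict:
    """Return {(service_id, trip_short_name): [trip_id, ...]} for every
    pair that is shared by more than one trip. Hypothetical helper."""
    groups = defaultdict(list)
    for row in csv.DictReader(io.StringIO(trips_csv)):
        name = (row.get("trip_short_name") or "").strip()
        if name:  # trip_short_name is optional, so blanks are skipped
            groups[(row["service_id"], name)].append(row["trip_id"])
    return {key: ids for key, ids in groups.items() if len(ids) > 1}


# Made-up trips.txt content, as it might look after merging two sources.
sample_trips = """route_id,service_id,trip_id,trip_short_name
R1,WK,a-1,101
R9,WK,b-1,101
R1,SA,a-2,101
"""

print(duplicate_short_names(sample_trips))
```

In the sample, trips a-1 and b-1 share short name 101 on service WK and are reported, while a-2 runs on a different service and is not.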

How about vice versa? Has anyone already given a Duplicate Departure Stop Time Rule any thought? I am observing stop_times.txt entries that have different agency_id, route_id, trip_id and trip_short_name values but the same first and last departure_time and stop_id. I cannot imagine several trips with identical first and last departure_time and stop_id but different agency_id, route_id and trip_id. Can anyone make sense of this observation and suggest how to tackle it with the GTFS validator?
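A rough way to surface such candidates outside the validator is to group trips by their first and last stop and departure time. The sketch below is an assumption-laden illustration, not a validator feature: `trips_with_same_endpoints` is a hypothetical name, and the check ignores service days, so trips that legitimately run on different dates would also be grouped together.

```python
import csv
import io
from collections import defaultdict


def trips_with_same_endpoints(stop_times_csv: str) -> list:
    """Group trip_ids whose first and last (stop_id, departure_time)
    are identical. Hypothetical helper; service days are not considered."""
    by_trip = defaultdict(list)
    for row in csv.DictReader(io.StringIO(stop_times_csv)):
        by_trip[row["trip_id"]].append(
            (int(row["stop_sequence"]), row["stop_id"], row["departure_time"])
        )

    by_signature = defaultdict(list)
    for trip_id, stops in by_trip.items():
        stops.sort()  # order by stop_sequence
        first, last = stops[0], stops[-1]
        by_signature[(first[1], first[2], last[1], last[2])].append(trip_id)

    return [sorted(ids) for ids in by_signature.values() if len(ids) > 1]


# Made-up stop_times.txt content.
sample_stop_times = """trip_id,stop_sequence,stop_id,departure_time
t1,1,A,08:00:00
t1,2,B,08:30:00
t2,1,A,08:00:00
t2,2,B,08:30:00
t3,1,A,09:00:00
t3,2,B,09:30:00
"""

print(trips_with_same_endpoints(sample_stop_times))
```

Here t1 and t2 are grouped because both leave stop A at 08:00:00 and stop B at 08:30:00; t3 is left out. Joining the result with trips.txt and routes.txt would then show whether the grouped trips really belong to different routes or agencies.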

I stumbled over the Trip Coverage Next Days Rule. I like this rule very much. Kudos! I could make even more use of this rule if the number of days were an argument that I could supply to the GTFS validator as a parameter on a feed-specific or customer-specific basis. Any idea if a dynamic rule like this is possible with the current architecture of the GTFS validator?

I am also wondering if we can derive an Agency Coverage Trip Count Rule from the Trip Coverage Next Days Rule. I have observed many times that agencies provide schedule data for only a subset of their trips; the remaining trips are missing from the data delivery. As a consequence, the resulting GTFS feed contains only a subset of trips per agency. If I provide the GTFS validator with a list of minimum trip counts per agency in a CSV-like file, do you think a validation rule could tackle this observation? The validator should tell me which agencies have trip counts below their minimum trip count threshold.
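As a sketch of what such a check might compute, the following standalone script counts trips per agency and compares them with a threshold list. Everything here is an assumption for illustration: the function name `agencies_below_minimum`, the threshold file layout (`agency_id,min_trip_count`), and the premise that routes.txt carries an agency_id for every route (the field can be omitted in single-agency feeds).

```python
import csv
import io
from collections import Counter


def agencies_below_minimum(routes_csv: str, trips_csv: str, minimums_csv: str) -> dict:
    """Return {agency_id: (actual_trips, required_trips)} for agencies whose
    trip count is below the supplied minimum. Hypothetical helper."""
    route_agency = {
        row["route_id"]: row["agency_id"]
        for row in csv.DictReader(io.StringIO(routes_csv))
    }
    counts = Counter(
        route_agency[row["route_id"]]
        for row in csv.DictReader(io.StringIO(trips_csv))
        if row["route_id"] in route_agency
    )
    shortfalls = {}
    for row in csv.DictReader(io.StringIO(minimums_csv)):
        required = int(row["min_trip_count"])
        actual = counts.get(row["agency_id"], 0)  # agencies with no trips count as 0
        if actual < required:
            shortfalls[row["agency_id"]] = (actual, required)
    return shortfalls


# Made-up feed excerpts and a made-up threshold file.
sample_routes = "route_id,agency_id\nR1,A1\nR2,A2\n"
sample_trips_by_route = "route_id,service_id,trip_id\nR1,WK,t1\nR1,WK,t2\nR1,WK,t3\nR2,WK,t4\n"
sample_minimums = "agency_id,min_trip_count\nA1,2\nA2,5\nA3,1\n"

print(agencies_below_minimum(sample_routes, sample_trips_by_route, sample_minimums))
```

In the sample, A1 meets its minimum, A2 has 1 trip where 5 are required, and A3 is listed in the thresholds but has no trips in the feed at all.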

Cheers!

Describe the new validation rule

Please see above.

Sample GTFS datasets

Please see above.

Severity

Please see above.

Additional context

Please see above.

dancesWithCycles added the "new rule" and "status: Needs triage" labels on Apr 2, 2024
@emmambd
Contributor

emmambd commented May 10, 2024

Hi @dancesWithCycles! Thanks for your patience - our team had several different discussions about your proposed rules here.

> Looking at the first and last departure stop and time, I can tell that several stop_times.txt entries belong to different trips but have the same trip_id or trip_short_name. Any idea how to tackle this observation with the GTFS validator?

Could you share some more context for how you know that the stop_times.txt entries should belong to different trips but are associated with the same trip_id? We assume you're deriving this from the same stop being serviced at different times that are extremely close together, like 8am and 8:10am on the same day. But curious to know more. It would be very helpful if you had a feed example with trip rows to include as well.

> Has anyone already given a Duplicate Departure Stop Time Rule any thought? I am observing stop_times.txt entries that have different agency_id, route_id, trip_id and trip_short_name values but the same first and last departure_time and stop_id. I cannot imagine several trips with identical first and last departure_time and stop_id but different agency_id, route_id and trip_id.

Could you share examples of feeds where you're seeing this use case? It may warrant an INFO notice in the validator to flag that something looks strange, but we're wondering if there are cases of aggregate feeds where it might be legitimate.

> I stumbled over the Trip Coverage Next Days Rule. I like this rule very much. Kudos! I could make even more use of this rule if the number of days were an argument that I could supply to the GTFS validator as a parameter on a feed-specific or customer-specific basis. Any idea if a dynamic rule like this is possible with the current architecture of the GTFS validator?

Currently, making this rule dynamic is outside the scope of what's possible with the GTFS validator. However, providing custom validation in the validator has been a long-standing feature request that we intend to address in the future (though not within the next year). If you'd like to share your thoughts or needs on this feature, there's an issue for it here.

> If I provide the GTFS validator with a list of minimum trip counts per agency in a CSV-like file, do you think a validation rule could tackle this observation? The validator should tell me which agencies have trip counts below their minimum trip count threshold.

Similar to the above question, this is out of scope at present because it requires dynamic inputs. However, Transport Data Gouv has a great GTFS diff tool that can help compare two different feed versions and see if a trip count looks dramatically different from how it did previously. We also provide a trip count in the summary of the validation report.

Let me know if you have any other questions!

@dancesWithCycles
Author

> I stumbled over the Trip Coverage Next Days Rule. I like this rule very much. Kudos! I could make even more use of this rule if the number of days were an argument that I could supply to the GTFS validator as a parameter on a feed-specific or customer-specific basis. Any idea if a dynamic rule like this is possible with the current architecture of the GTFS validator?

> Currently, making this rule dynamic is outside the scope of what's possible with the GTFS validator. However, providing custom validation in the validator has been a long-standing feature request that we intend to address in the future (though not within the next year). If you'd like to share your thoughts or needs on this feature, see #1067.

Hi there,
I still like the trip_coverage_not_active_for_next7_days rule very much. Currently, I cannot use this rule as much as I would like. In everyday life, the 7-day window is too short for me to react to missing trip coverage in a production GTFS archive, and I am not aware of the definition of "...the majority service window...". That is why I would like to ask the following.

  • In everyday life, the transport companies in my area and I are usually interested in trip coverage for the next 4 weeks (about 1 month), so that there is enough time to react to missing data. What do you think about, for instance, a next_4weeks rule modeled on the next_7days rule? This way we would avoid the need for custom validation.
  • How is "...the majority service window..." defined? Is it based on the feed info, the calendar, or the current date plus the trip coverage window (e.g. next 7 days or next 4 weeks)?
  • Shall we continue this rule-specific discussion in this issue or create a dedicated one?
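To make the idea of a configurable window concrete, here is a minimal standalone sketch. It does not reproduce the validator's rule or its notion of a majority service window; it uses the simplest assumed definition instead: a day counts as covered if at least one service in calendar.txt is active on it, and calendar_dates.txt is ignored. The function name `uncovered_days` and the sample data are hypothetical.

```python
import csv
import io
from datetime import date, timedelta

WEEKDAYS = ["monday", "tuesday", "wednesday", "thursday",
            "friday", "saturday", "sunday"]


def uncovered_days(calendar_csv: str, start: date, window_days: int) -> list:
    """Return the dates in [start, start + window_days) on which no
    calendar.txt service is active. Hypothetical helper."""
    services = list(csv.DictReader(io.StringIO(calendar_csv)))
    missing = []
    for offset in range(window_days):
        day = start + timedelta(days=offset)
        stamp = day.strftime("%Y%m%d")      # GTFS dates are YYYYMMDD strings
        weekday = WEEKDAYS[day.weekday()]   # date.weekday(): Monday is 0
        active = any(
            s["start_date"] <= stamp <= s["end_date"] and s[weekday] == "1"
            for s in services
        )
        if not active:
            missing.append(day)
    return missing


# Made-up calendar.txt: one daily service that ends on 2024-04-14.
sample_calendar = """service_id,monday,tuesday,wednesday,thursday,friday,saturday,sunday,start_date,end_date
WK,1,1,1,1,1,1,1,20240401,20240414
"""

today = date(2024, 4, 8)
print(len(uncovered_days(sample_calendar, today, 7)))   # 7-day window
print(len(uncovered_days(sample_calendar, today, 28)))  # 4-week window
```

With this sample, the 7-day window is fully covered, while the 28-day window reports 21 uncovered days starting on 2024-04-15, which is the kind of early warning a 4-week variant is meant to give.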

Cheers!


2 participants