False positives for StopTimeTimepointWithoutTimesNotice #954
Thanks @mcplanner for opening.
The GTFS specification says that if
I am not sure a notice should be generated here: the specification handles the case where no value is attributed to timepoint. Please feel free to close this issue if that answers your questions.
Here's example data from the above GTFS file:
tl;dr - This data is fine and the validator shouldn't be logging any errors. The reason is that this data conforms to the original GTFS spec prior to the timepoint field being added. IMHO the timepoint definition really needs to be clarified in the spec, as it currently doesn't communicate this concept clearly; see google/transit#61. @MobilityData/transit-specs
The GTFS Best Practices (http://gtfs.org/best-practices/#stop_timestxt) say:
Sticking to the canonical validator definition, a missing timepoint field couldn't be an error, but it could certainly be a warning.
Thanks for the clarification @barbeau! Following your suggestion, I agree that the spec is not clear on this particular point. I'll keep this issue open so that we can fix this bug. Does the following seem reasonable?
Here are the cases to cover. For all stop_times.txt files:
If timepoint column does not exist:
If timepoint column exists:
There is also the case where a stop_times.txt file does not have the timepoint column, but all records have arrival and departure times populated. Technically this was against the original GTFS spec, but it became common practice, so we shouldn't consider it an error, because that would invalidate a lot of old datasets. In this case we'll still emit a warning that timepoint should be added. Does that make sense?
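The legacy case in the last paragraph boils down to a check over the rows of stop_times.txt. Here is a minimal Python sketch of that check, assuming rows are read with `csv.DictReader`; it is not the validator's code (the validator is Java), and `should_warn_missing_timepoint_column` and `has_times` are hypothetical names.

```python
import csv
import io


def has_times(row):
    """True when both arrival_time and departure_time are non-blank."""
    # DictReader fills short rows with None, so guard before strip().
    arrival = (row.get("arrival_time") or "").strip()
    departure = (row.get("departure_time") or "").strip()
    return bool(arrival) and bool(departure)


def should_warn_missing_timepoint_column(text):
    """Hypothetical check for the legacy case described above.

    Returns True when stop_times.txt has no timepoint column but every
    record has arrival and departure times, i.e. when a warning (not an
    error) suggesting that timepoint be added would be emitted.
    """
    reader = csv.DictReader(io.StringIO(text))
    if "timepoint" in (reader.fieldnames or []):
        return False  # column exists: covered by the other cases
    return all(has_times(row) for row in reader)


legacy = """trip_id,arrival_time,departure_time,stop_id,stop_sequence
t1,08:00:00,08:00:00,s1,1
t1,08:05:00,08:05:00,s2,2
"""
print(should_warn_missing_timepoint_column(legacy))  # True
```

When the column is absent and some times are blank, this check stays silent; that situation is handled by the other cases rather than by this warning.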
✅ Covered in
✅ Covered in
⏳
Yes. @isabelle-dr, should the specification be clarified first, or could these changes be implemented before that?
Hey all, thanks for flagging and looking into this. The case of legacy data with no timepoint column (prior to the timepoint field existing) was flagged when we initially introduced that rule. I agree that the root problem is that the specification contains two contradictory statements about the meaning of an "empty" value in
The former could mean that This is extremely well described in google/transit/issues/61. How should the validator work?
Absolutely, and this will be a new notice based on the Best Practices that we should add at some point.
I would only implement the second statement proposed here (at least for now), because the first is stricter than the specification: it would make the following data invalid, and there is currently nothing in the spec that would justify that.
@lionel-nj I'd say we're good to make the changes in the validator to maintain backwards compatibility (the second statement proposed by Sean, and your initial idea), but without upgrading this rule to an error; that will be done in the PR we already have.
Here are some examples of what would change. @barbeau, does this seem right?
Case 1: legacy dataset
Current behaviour: The notice is emitted for stops 2 & 3.
Case 2: dataset with timepoint column, filled
Current behaviour: No notice.
Case 3: invalid data
Current behaviour: The StopTimeTimepointWithoutTimesNotice is emitted for stop 2.
Case 4 (edge case): dataset with timepoint column, only 1's, all times populated
Current behaviour: No notice.
Case 5 (edge case): dataset with timepoint column, one 1's and some empty times
Current behaviour: The StopTimeTimepointWithoutTimesNotice is emitted for stops 2 and 3.
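Reading the five cases together, the current behaviour looks consistent with taking the spec literally ("1 or empty - Times are considered exact"), so a missing column or a blank value is treated like timepoint=1. Here is a minimal Python sketch of that reading; it is an interpretation of the behaviour listed above, not the validator's actual (Java) code, and `current_rule_flags` and the sample rows are hypothetical.

```python
import csv
import io


def is_blank(value):
    """True for a missing (None) or whitespace-only CSV value."""
    return value is None or value.strip() == ""


def current_rule_flags(text):
    """Return stop_sequence values the current rule (as we read it) would
    flag with StopTimeTimepointWithoutTimesNotice.

    Literal reading of the spec: timepoint "1 or empty" means exact times,
    so an absent column or a blank value is treated like timepoint=1.
    """
    flagged = []
    for row in csv.DictReader(io.StringIO(text)):
        timepoint = row.get("timepoint")  # None when the column is absent
        exact = is_blank(timepoint) or timepoint.strip() == "1"
        missing_times = is_blank(row.get("arrival_time")) or is_blank(
            row.get("departure_time")
        )
        if exact and missing_times:
            flagged.append(row["stop_sequence"])
    return flagged


# Made-up rows shaped like Case 1: no timepoint column, stops 2 and 3 untimed.
legacy = """trip_id,arrival_time,departure_time,stop_id,stop_sequence
t1,08:00:00,08:00:00,a,1
t1,,,b,2
t1,,,c,3
t1,08:20:00,08:20:00,d,4
"""
print(current_rule_flags(legacy))  # ['2', '3']
```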
@isabelle-dr Cases 1, 2, and 3 LGTM 👍
So Cases 4 and 5 fall into this category. I agree that the current language in the spec is fuzzy around this, although the original intent when introducing the Do we know if there are real datasets in the wild that are partially populating
@barbeau, here are some interesting edge cases of partial implementations, with real data samples.
They seem to have assumed that since "1 or empty - Times are considered exact."
Here, they seem to have assumed that since times are "Conditionally Required: Required for timepoint=1. Optional otherwise."
This part of the data is at the end of the dataset. Timepoint values have been added for most of the lines, but after this point all timepoint values are blank. Does this mean all times are exact? Or approximate? I believe neither; the values just haven't been added. There are also datasets that added the timepoint column but left all values blank, and it seems clear that this doesn't mean every stop is a timepoint; see the Via Rail Canada dataset below:
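To gauge how common these patterns are, one option is to classify how each stop_times.txt populates timepoint. Below is a hypothetical Python helper (`describe_timepoint_column` and its category names are made up for illustration, not part of the validator); it walks rows in file order, which matches the "blank after this point" pattern described above.

```python
import csv
import io


def describe_timepoint_column(text):
    """Classify how a stop_times.txt file populates timepoint.

    Illustrative categories: "absent", "all blank", "trailing blanks"
    (values stop after some row), "partially populated" (blanks scattered),
    or "fully populated".
    """
    reader = csv.DictReader(io.StringIO(text))
    if "timepoint" not in (reader.fieldnames or []):
        return "absent"
    # One flag per row, in file order: True when timepoint is blank.
    blanks = [(row.get("timepoint") or "").strip() == "" for row in reader]
    if not any(blanks):
        return "fully populated"
    if all(blanks):
        return "all blank"
    first_blank = blanks.index(True)
    if all(blanks[first_blank:]):
        return "trailing blanks"
    return "partially populated"


sample = """trip_id,arrival_time,departure_time,stop_id,stop_sequence,timepoint
t1,08:00:00,08:00:00,a,1,1
t1,08:05:00,08:05:00,b,2,0
t2,09:00:00,09:00:00,a,1,
t2,09:05:00,09:05:00,b,2,
"""
print(describe_timepoint_column(sample))  # trailing blanks
```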
😱 Yikes - so there is real-world data all over the place. It would be nice to get this clarified in the spec ASAP so we can implement the original intent in the validator and get these datasets cleaned up. IIRC there seemed to be clear agreement in the original proposal on how the data was intended to be represented; the wording apparently just wasn't clear enough.
As agreed with @isabelle-dr and @barbeau, we will implement the following solutions:
This rule will consider legacy datasets valid, and the validator will behave as follows:
Sample data 1 (no timepoint column): no notice.
Sample data 2 (timepoint column + no empty values): no notice.
Sample data 3 (timepoint with no times): a notice will be generated for the 2nd row.
Sample data 4 (approximate, with times): no notice.
Sample data 5 (approximate, with no times): no notice.
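The agreed behaviour for the five samples can be sketched as below. This is a minimal Python sketch, not the validator's implementation; `agreed_rule_flags` is hypothetical, and it assumes that only an explicit timepoint=1 triggers the notice. How a blank value in an existing column should be treated is an assumption here, since none of the samples covers that case.

```python
import csv
import io


def is_blank(value):
    """True for a missing (None) or whitespace-only CSV value."""
    return value is None or value.strip() == ""


def agreed_rule_flags(text):
    """Return stop_sequence values that would get a
    StopTimeTimepointWithoutTimesNotice under the agreed behaviour.

    Assumption: only an explicit timepoint=1 counts as a timepoint. An
    absent column (legacy data), 0 (approximate) or a blank value never
    triggers the notice.
    """
    flagged = []
    for row in csv.DictReader(io.StringIO(text)):
        explicit_timepoint = (row.get("timepoint") or "").strip() == "1"
        missing_times = is_blank(row.get("arrival_time")) or is_blank(
            row.get("departure_time")
        )
        if explicit_timepoint and missing_times:
            flagged.append(row["stop_sequence"])
    return flagged


# Made-up rows shaped like sample data 3: the 2nd row is a timepoint without times.
sample_3 = """trip_id,arrival_time,departure_time,stop_id,stop_sequence,timepoint
t1,08:00:00,08:00:00,a,1,1
t1,,,b,2,1
t1,08:10:00,08:10:00,c,3,1
"""
print(agreed_rule_flags(sample_3))  # ['2']
```

Compared with the literal "1 or empty" reading, the only change is that the absence of a value no longer counts as a timepoint, which is what keeps legacy datasets valid.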
@mcplanner-zz could you provide a URL to download the dataset you mentioned in the issue description, please? I tried to download the dataset at the URL you provided but could not: the operation timed out with error code
Bug report
Describe the bug
The StopTimeTimepointWithoutTimesNotice error is displayed when the stop_times.timepoint field is blank. This is unexpected. I would expect this to be interpreted as having no timepoints.
How we reproduce the bug
Steps to reproduce the behaviour:
This happened with this feed: http://iportal.sacrt.com/GTFS/Unitrans/google_transit.zip
Expected behaviour
If the timepoint field isn't being used, I would expect an error about it being missing.
Observed behaviour
I was alerted that every timepoint was lacking an arrival and departure time.
Screenshots:
Environment versions
Java 11, v2.0.0 validator, https://github.com/cal-itp/gtfs-validator