-
Notifications
You must be signed in to change notification settings - Fork 100
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix: False positives for StopTimeTimepointWithoutTimesNotice #1044
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Few questions and comments in-line before approval :)
* </ul> | ||
*/ | ||
@GtfsValidator | ||
public class TimepointTimeValidator extends SingleEntityValidator<GtfsStopTime> { | ||
|
||
@Override | ||
public void validate(GtfsStopTime stopTime, NoticeContainer noticeContainer) { | ||
if (!stopTime.hasTimepoint()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does the validator fill the column with a default value if it is not in the dataset, or are we not entering this TimepointTimeValidator
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does the validator fill the column with a default value if it is not in the dataset,
Yes, default is 1
.
Co-authored-by: Maxime Armstrong <46797220+maximearmstrong@users.noreply.github.com>
…r into fix/timepoint
One question here:
It's not exactly what is mentioned here, is this a typo in the description of this PR? We talked about throwing the Also: do you need real datasets that fall in those different categories to test this PR? |
Indeed, this will be fixed in the next commits. Thanks for flagging that.
I'll use the one you flagged in this document, thanks. |
@maximearmstrong PTAL - the rule logic has been changed a bit to generate At present, out of ~1200 datasets on the MobilityDatabse:
|
That The number of |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you rename the notice names? :)
- MissingTimepointColumn -> MissingTimePointColumnNotice
- MissingTimepointValue -> MissingTimepointValueNotice
main/src/main/java/org/mobilitydata/gtfsvalidator/validator/TimepointTimeValidator.java
Outdated
Show resolved
Hide resolved
} | ||
} | ||
|
||
private boolean isTimepoint(GtfsStopTime stopTime) { | ||
return stopTime.timepoint().equals(GtfsStopTimeTimepoint.EXACT); | ||
return stopTime.hasTimepoint() && stopTime.timepoint().equals(GtfsStopTimeTimepoint.EXACT); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see, in this case should we remove default value instead?
main/src/main/java/org/mobilitydata/gtfsvalidator/validator/TimepointTimeValidator.java
Outdated
Show resolved
Hide resolved
.setDepartureTime(null) | ||
.setStopId("stop id 0") | ||
.setStopSequence(2) | ||
.setTimepoint((Integer) null) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the explanation!
It looks like we may need to create a feature request to have an ability to specify test entities in the csv format, otherwise, we're trying to repeat logic of auto generated code, which may change and we end up with wrong tests.
- clarify docstrings - remove use of Strings.format()
@asvechnikov2 can we get a review so we can merge this PR? |
main/src/main/java/org/mobilitydata/gtfsvalidator/validator/TimepointTimeValidator.java
Outdated
Show resolved
Hide resolved
.setDepartureTime(GtfsTime.fromSecondsSinceMidnight(580)) | ||
.setStopId("stop id 2") | ||
.setStopSequence(2) | ||
.setTimepoint(1) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it right to set timepoint equal to 1
for this test? From the method name I expected both to be set to null
. If it's intended, IMHO it's a bit awkward as it's not real data - you'd never have two records for stop times where one has the column and the other does not.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Follow up question after looking at other tests - how do you differentiate between "no column" and "no value" in the unit tests?
Is the missing timepoint
value in CsvHeader
header what triggers MissingTimepointColumnNotice
(even if you call .setTimepoint(1)
on one of the StopTime records)?
Is setting .setTimepoint(null)
what is used to define a missing value (even if the timepoint
value is included in header)?
I'd suggest clearly documenting this behavior in the unit tests as right now that's not 100% clear (to me at least), even following the discussion thread in https://github.com/MobilityData/gtfs-validator/pull/1044/files#r742464479. And I'd avoid any confusing data that won't exist in the real world, like a missing column but a record with a timepoint value (i.e., this test currently).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Follow up question after looking at other tests - how do you differentiate between "no column" and "no value" in the unit tests?
- no column -> these tests use he legacy header
- no value -> these tests use the regular header, with
null
timepoint
value.
Is the missing timepoint value in CsvHeader header what triggers MissingTimepointColumnNotice (even if you call .setTimepoint(1) on one of the StopTime records)?
Yes.
Is setting .setTimepoint(null) what is used to define a missing value (even if the timepoint value is included in header)?
Exactly.
I'd suggest clearly documenting this behavior in the unit tests as right now that's not 100% clear (to me at least), even following the discussion thread in https://github.com/MobilityData/gtfs-validator/pull/1044/files#r742464479. And I'd avoid any confusing data that won't exist in the real world, like a missing column but a record with a timepoint value (i.e., this test currently).
👍🏾 I removed the non-necessary (and misleading records) in e33c960. And added additional documentation in 84926f4.
main/src/test/java/org/mobilitydata/gtfsvalidator/validator/TimepointTimeValidatorTest.java
Show resolved
Hide resolved
main/src/test/java/org/mobilitydata/gtfsvalidator/validator/TimepointTimeValidatorTest.java
Outdated
Show resolved
Hide resolved
main/src/test/java/org/mobilitydata/gtfsvalidator/validator/TimepointTimeValidatorTest.java
Outdated
Show resolved
Hide resolved
main/src/test/java/org/mobilitydata/gtfsvalidator/validator/TimepointTimeValidatorTest.java
Outdated
Show resolved
Hide resolved
main/src/test/java/org/mobilitydata/gtfsvalidator/validator/TimepointTimeValidatorTest.java
Outdated
Show resolved
Hide resolved
This contribution does not follow the conventions set by the Google Java style guide. Please run the following command line at the root of the project to fix formatting errors: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @lionel-nj, a suggestion for docs and a few more comments in-line.
main/src/test/java/org/mobilitydata/gtfsvalidator/validator/TimepointTimeValidatorTest.java
Outdated
Show resolved
Hide resolved
main/src/test/java/org/mobilitydata/gtfsvalidator/validator/TimepointTimeValidatorTest.java
Outdated
Show resolved
Hide resolved
main/src/test/java/org/mobilitydata/gtfsvalidator/validator/TimepointTimeValidatorTest.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 LGTM
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@asvechnikov2 can we get a review so we can merge this PR? See this comment for what we are proposing to do.
LGTM! Sorry, I was sure that I'd removed "requested changes" review and the PR had been merged already.
} | ||
} | ||
|
||
private boolean isTimepoint(GtfsStopTime stopTime) { | ||
return stopTime.timepoint().equals(GtfsStopTimeTimepoint.EXACT); | ||
return stopTime.hasTimepoint() && stopTime.timepoint().equals(GtfsStopTimeTimepoint.EXACT); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: I'd add a comment to clarify the behaviour, here and probably in the class docstring as well.
Summary:
This PR provides support to no longer throw false positives in
TimepointValidator
.Expected behavior:
sample data 1 (no timepoint column)
MissingTimepointColumnNotice
should be issued once.sample data 2 (timepoint column + no empty values)
No notice
sample data 3 (timepoint with no times)
StopTimeTimepointWithoutTimesNotice
should be issued for the 2nd row.sample data 4 (approximate, with times)
MissingTimepointValueNotice
should be issued for the 2nd and 3rd row.sample data 5 (approximate, with no times)
MissingTimepointValueNotice
should be issued for the 2nd and 3rd row.Please make sure these boxes are checked before submitting your pull request - thanks!
gradle test
to make sure you didn't break anything[ ] Include screenshot(s) showing how this pull request works and fixes the issue(s)