Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: False positives for StopTimeTimepointWithoutTimesNotice #1044

Merged
merged 27 commits into from
Nov 16, 2021
Merged
Show file tree
Hide file tree
Changes from 21 commits
Commits
Show all changes
27 commits
Select commit Hold shift + click to select a range
a72b888
Emit a notice when timepoint = 1 AND times are missing
Oct 26, 2021
febb426
Emit a notice (WARNING) when stop_times.timepoint value is missing fo…
Oct 26, 2021
c2fa449
update javadocs
Oct 26, 2021
b3ec654
Merge branch 'master' into fix/timepoint
Nov 1, 2021
fc14880
apply suggestion from code review
lionel-nj Nov 1, 2021
d44332a
Merge branch 'fix/timepoint' of github.com:MobilityData/gtfs-validato…
Nov 1, 2021
fa647ef
clarify documentation
Nov 1, 2021
8e91071
modify rule logic to flag legacy datasets not using timepoint column
Nov 1, 2021
c1d8fc3
refactor StopTimeTimepointWithoutTimesNotice for code clarity
Nov 1, 2021
7497e16
update rules documentation
Nov 1, 2021
bab59bf
issue MissingTimepointValueNotice in rule logic
Nov 1, 2021
df1f702
modify rule logic
Nov 1, 2021
a2b5e3b
touch
Nov 2, 2021
ddf420b
persist metadata
Nov 2, 2021
1135049
update docs: rename notices
Nov 2, 2021
539fc1f
no longer persist datasets metadata: to be done in a separate PR [acc…
Nov 2, 2021
2c62b0a
apply suggestions from code review
Nov 4, 2021
62c64c1
Merge branch 'master' into fix/timepoint
Nov 4, 2021
afa44dc
apply goJF
Nov 4, 2021
4420e7d
Merge branch 'master' into fix/timepoint
Nov 5, 2021
b9dd33f
apply suggestions from code review
Nov 8, 2021
22c59aa
Merge branch 'master' into fix/timepoint
Nov 10, 2021
e33c960
remove StopTime record where not needed
Nov 16, 2021
dc6462c
add context to notice upgrade [acceptance test skip]
Nov 16, 2021
0a74e40
apply goJF [acceptance test skip]
Nov 16, 2021
84926f4
additional documentation in unit tests [acceptance test skip]
Nov 16, 2021
35f47ce
apply suggestions from code review [acceptance test skip]
Nov 16, 2021
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
20 changes: 20 additions & 0 deletions RULES.md
Original file line number Diff line number Diff line change
Expand Up @@ -133,6 +133,8 @@ Additional details regarding the notices' context is provided in [`NOTICES.md`](
| [`InconsistentAgencyLangNotice`](#InconsistentAgencyLangNotice) | Inconsistent language among agencies. |
| [`LeadingOrTrailingWhitespacesNotice`](#LeadingOrTrailingWhitespacesNotice) | The value in CSV file has leading or trailing whitespaces. |
| [`MissingFeedInfoDateNotice`](#MissingFeedInfoDateNotice) | `feed_end_date` should be provided if `feed_start_date` is provided. `feed_start_date` should be provided if `feed_end_date` is provided. |
| [`MissingTimepointColumnNotice`](#MissingTimepointColumnNotice) | `timepoint` column is missing for a dataset. |
| [`MissingTimepointValueNotice`](#MissingTimepointValueNotice) | `stop_times.timepoint` value is missing for a record. |
| [`MoreThanOneEntityNotice`](#MoreThanOneEntityNotice) | More than one row in CSV. |
| [`NonAsciiOrNonPrintableCharNotice`](#NonAsciiOrNonPrintableCharNotice) | Non ascii or non printable char in `id`. |
| [`PathwayDanglingGenericNodeNotice`](#PathwayDanglingGenericNodeNotice) | A generic node has only one incident location in a pathway graph. |
Expand Down Expand Up @@ -760,6 +762,24 @@ Even though `feed_info.start_date` and `feed_info.end_date` are optional, if one

##### References:
* [feed_info.txt Best practices](http://gtfs.org/best-practices/#feed_infotxt)

<a name="MissingTimepointValueNotice"/>

#### MissingTimepointValueNotice

Even though the column `timepoint` is optional in `stop_times.txt` according to the specification, `stop_times.timepoint` should not be empty when provided.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Specification says 1 or empty - Times are considered exact.. Are we going to change specification?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We will advocate for @barbeau's recommendation in google/transit#61. If you support this recommendation or have other ideas, feel free to leave a comment there so we make a PR accordingly :)


##### References:
* [stop_times.txt specification](https://github.com/google/transit/blob/master/gtfs/spec/en/reference.md#stop_timestxt)

<a name="MissingTimepointColumnNotice"/>

#### MissingTimepointColumnNotice

The `timepoint` column should be provided.

##### References:
* [stop_times.txt bets practices](https://github.com/MobilityData/GTFS_Schedule_Best-Practices/blob/master/en/stop_times.md)

<a name="MoreThanOneEntityNotice"/>

Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -59,6 +59,10 @@ public int entityCount() {

public abstract String gtfsFilename();

public boolean hasColumn(String columnName) {
return header.hasColumn(columnName);
}

/**
* Returns names of the columns that are keys in that table.
*
Expand Down
24 changes: 24 additions & 0 deletions docs/NOTICES.md
Original file line number Diff line number Diff line change
Expand Up @@ -707,6 +707,8 @@
| `inconsistent_agency_lang` | [`InconsistentAgencyLangNotice`](#InconsistentAgencyLangNotice) |
| `leading_or_trailing_whitespaces` | [`LeadingOrTrailingWhitespacesNotice`](#LeadingOrTrailingWhitespacesNotice) |
| `missing_feed_info_date` | [`MissingFeedInfoDateNotice`](#MissingFeedInfoDateNotice) |
| `missing_timepoint_column` | [`MissingTimepointColumnNotice`](#MissingTimepointColumnNotice) |
| `missing_timepoint_value` | [`MissingTimepointValueNotice`](#MissingTimepointValueNotice) |
| `more_than_one_entity` | [`MoreThanOneEntityNotice`](#MoreThanOneEntityNotice) |
| `non_ascii_or_non_printable_char` | [`NonAsciiOrNonPrintableCharNotice`](#NonAsciiOrNonPrintableCharNotice) |
| `pathway_dangling_generic_node` | [`PathwayDanglingGenericNodeNotice`](#PathwayDanglingGenericNodeNotice) |
Expand Down Expand Up @@ -837,6 +839,28 @@
##### Affected files
* [`feed_info.txt`](http://gtfs.org/reference/static#feed_infotxt)

#### [MissingTimepointColumnNotice](/RULES.md#MissingTimepointColumnNotice)
##### Fields description

| Field name | Description | Type |
|---------------- |------------------------------------------------- |-------- |
| `filename` | The name of the affected file. | String |

##### Affected files
* [`stop_times.txt`](https://github.com/google/transit/blob/master/gtfs/spec/en/reference.md#stop_timestxt)

#### [MissingTimepointValueNotice](/RULES.md#MissingTimepointValueNotice)
##### Fields description

| Field name | Description | Type |
|---------------- |------------------------------------------------- |-------- |
| `csvRowNumber` | The row number of the faulty record. | Long |
| `tripId` | The faulty record's `stop_times.trip_id`. | String |
| `stopSequence` | The faulty record's `stop_times.stop_sequence`. | String |

##### Affected files
* [`stop_times.txt`](https://github.com/google/transit/blob/master/gtfs/spec/en/reference.md#stop_timestxt)

#### [MoreThanOneEntityNotice](/RULES.md#MoreThanOneEntityNotice)
##### Fields description

Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -16,11 +16,13 @@

package org.mobilitydata.gtfsvalidator.validator;

import javax.inject.Inject;
import org.mobilitydata.gtfsvalidator.annotation.GtfsValidator;
import org.mobilitydata.gtfsvalidator.notice.NoticeContainer;
import org.mobilitydata.gtfsvalidator.notice.SeverityLevel;
import org.mobilitydata.gtfsvalidator.notice.ValidationNotice;
import org.mobilitydata.gtfsvalidator.table.GtfsStopTime;
import org.mobilitydata.gtfsvalidator.table.GtfsStopTimeTableContainer;
import org.mobilitydata.gtfsvalidator.table.GtfsStopTimeTableLoader;
import org.mobilitydata.gtfsvalidator.table.GtfsStopTimeTimepoint;

Expand All @@ -32,56 +34,101 @@
* <ul>
* <li>{@link StopTimeTimepointWithoutTimesNotice} - a timepoint does not specifies arrival_time
* or departure_time
* <li>{@link MissingTimepointValueNotice} - value for {@code stop_times.timepoint} is missing
* <li>{@link MissingTimepointColumnNotice} - field {@code stop_times.timepoint} is missing
* </ul>
*/
@GtfsValidator
public class TimepointTimeValidator extends SingleEntityValidator<GtfsStopTime> {
public class TimepointTimeValidator extends FileValidator {
private final GtfsStopTimeTableContainer stopTimes;

@Inject
TimepointTimeValidator(GtfsStopTimeTableContainer stopTimes) {
this.stopTimes = stopTimes;
}

@Override
public void validate(GtfsStopTime stopTime, NoticeContainer noticeContainer) {
if (!isTimepoint(stopTime)) {
public void validate(NoticeContainer noticeContainer) {
if (!stopTimes.hasColumn(GtfsStopTimeTableLoader.TIMEPOINT_FIELD_NAME)) {
// legacy datasets do not use timepoint column in stop_times.txt as a result:
// - this should be flagged;
// - but also no notice regarding the absence of arrival_time or departure_time should be
// generated
noticeContainer.addValidationNotice(new MissingTimepointColumnNotice());
return;
}
if (!stopTime.hasArrivalTime()) {
noticeContainer.addValidationNotice(
new StopTimeTimepointWithoutTimesNotice(
stopTime.csvRowNumber(),
stopTime.tripId(),
stopTime.stopSequence(),
String.format(GtfsStopTimeTableLoader.ARRIVAL_TIME_FIELD_NAME)));
}
if (!stopTime.hasDepartureTime()) {
noticeContainer.addValidationNotice(
new StopTimeTimepointWithoutTimesNotice(
stopTime.csvRowNumber(),
stopTime.tripId(),
stopTime.stopSequence(),
GtfsStopTimeTableLoader.DEPARTURE_TIME_FIELD_NAME));
for (GtfsStopTime stopTime : stopTimes.getEntities()) {
if (!stopTime.hasTimepoint()) {
noticeContainer.addValidationNotice(new MissingTimepointValueNotice(stopTime));
}
if (isTimepoint(stopTime)) {
if (!stopTime.hasArrivalTime()) {
noticeContainer.addValidationNotice(
new StopTimeTimepointWithoutTimesNotice(
stopTime, GtfsStopTimeTableLoader.ARRIVAL_TIME_FIELD_NAME));
}
if (!stopTime.hasDepartureTime()) {
noticeContainer.addValidationNotice(
new StopTimeTimepointWithoutTimesNotice(
stopTime, GtfsStopTimeTableLoader.DEPARTURE_TIME_FIELD_NAME));
}
}
}
}

private boolean isTimepoint(GtfsStopTime stopTime) {
return stopTime.timepoint().equals(GtfsStopTimeTimepoint.EXACT);
return stopTime.hasTimepoint() && stopTime.timepoint().equals(GtfsStopTimeTimepoint.EXACT);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

usage of stopTime.hasTimepoint() shouldn't be necessary, because timepoint has a default value (see [0]). Was it added because unit tests don't respect the default values?

[0]

Copy link
Contributor Author

@lionel-nj lionel-nj Nov 4, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually because of the fact that timepoint has a default value we need to check stopTime.hasTimepoint(). See, if a stopTime has no value for timepoint or has value 1 for timepoint, stopTime.timepoint() returns the same value EXACT. Using stopTime.hasTimepoint() makes it possible to distinguish these two situations.

Bottom line, a stopTime is considered a timepoint if the value is provided and set to 1, otherwise it is not.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see, in this case should we remove default value instead?

Copy link
Contributor Author

@lionel-nj lionel-nj Nov 8, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That would make sense, thoughts @isabelle-dr @barbeau ?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed to remove default value.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For other default values, it seems, that default variable type value is used (e.g. 0 for integers),

@asvechnikov2 I was trying to trace the execution for this too. Where do you see this assignment?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's this one

switch (field.type()) {
case ENUM:
case INTEGER:
case FLOAT:
case LATITUDE:
case LONGITUDE:
return CodeBlock.of("0");
case DECIMAL:
return CodeBlock.of("$T.ZERO", BigDecimal.class);
case COLOR:
return CodeBlock.of("$T.fromInt(0)", GtfsColor.class);
case TEXT:
case URL:
case PHONE_NUMBER:
case ID:
case EMAIL:
return CodeBlock.of("\"\"");
case DATE:
return CodeBlock.of("$T.fromEpochDay(0)", GtfsDate.class);
case TIME:
return CodeBlock.of("$T.fromSecondsSinceMidnight(0)", GtfsTime.class);
case TIMEZONE:
return CodeBlock.of("$T.of(\"UTC\")", ZoneId.class);
case CURRENCY_CODE:
case LANGUAGE_CODE:
default:
return CodeBlock.of("null");

Copy link
Member

@barbeau barbeau Nov 9, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@asvechnikov2 Ah, thanks! Looks like you're right. I just ran this through the debugger with two different datasets to confirm:

This code is what auto-generates the GtfsStopTime.DEFAULT_TIMEPOINT variable value.

So the current behavior is:

  • If there is no @DefaultValue annotation, if the column is missing or the row has an empty value for an enum then the default value of 0 is used.
  • If there is a @DefaultValue annotation, then the provided default value is used.

So the default enum value is the value defined for 0 - for GtfsStopTimeTimepoint this is APPROXIMATE(0). This is similar to protocol buffer bindings behavior. I agree that explicitly defining the default enum value everywhere via @DefaultValue would be a good best practice to avoid confusion, especially for newcomers adding new fields where is isn't obvious that the 0 enum value is the de facto default.

Back to the original question - if we remove @DefaultValue("1"), the new value of timepoint field would be GtfsStopTimeTimepoint.APPROXIMATE if the timepoint column is missing or the field has an empty value in the CSV. We don't want this either because of legacy datasets from the original GTFS spec before the timepoint field existed (because all those records with times should be considered EXACT).

IMHO the best solution is still having the default is set to EXACT IFF the timepoint column isn't provided (EDIT - we also have to consider the presence/absence of times - if times are provided it's EXACT, otherwise APPROXIMATE), and otherwise have the default set to UNRECOGNIZED for empty. I think the 2nd best solution is to leave the @DefaultValue("1"), because that aligns with the legacy GTFS datasets and possible interpretations of the current GTFS spec.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMHO the best solution is still having the default is set to EXACT IFF the timepoint column isn't provided, and otherwise have the default set to UNRECOGNIZED for empty. I think the 2nd best solution is to leave the @DefaultValue("1"), because that aligns with the legacy GTFS datasets and possible interpretations of the current GTFS spec.

@barbeau, @asvechnikov2

After discussion with @isabelle-dr, we agreed on keeping the @DefaultValue("1") for now since removing this entails deeper conversations about how default values should be dealt with. We suggest to keep track of this in a separate issue and work on it as future improvement.

If you agree with that, I'll ask one of you their approval on this PR if the code looks good to you :)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: I'd add a comment to clarify the behaviour, here and probably in the class docstring as well.

}

/**
* Timepoint without time
*
* <p>Severity: {@code SeverityLevel.WARNING}
* <p>Severity: {@code SeverityLevel.WARNING} - to be upgraded as {@code SeverityLevel.ERROR}
lionel-nj marked this conversation as resolved.
Show resolved Hide resolved
*/
static class StopTimeTimepointWithoutTimesNotice extends ValidationNotice {
private final long csvRowNumber;
private final String tripId;
private final long stopSequence;
private final String specifiedField;

StopTimeTimepointWithoutTimesNotice(
long csvRowNumber, String tripId, long stopSequence, String specifiedField) {
StopTimeTimepointWithoutTimesNotice(GtfsStopTime stopTime, String specifiedField) {
super(SeverityLevel.WARNING);
this.csvRowNumber = csvRowNumber;
this.tripId = tripId;
this.stopSequence = stopSequence;
this.csvRowNumber = stopTime.csvRowNumber();
this.tripId = stopTime.tripId();
this.stopSequence = stopTime.stopSequence();
this.specifiedField = specifiedField;
}
}

/**
* {@code stop_times.timepoint} value is missing
*
* <p>Severity: {@code SeverityLevel.WARNING}
*/
static class MissingTimepointValueNotice extends ValidationNotice {
private final long csvRowNumber;
private final String tripId;
private final long stopSequence;

MissingTimepointValueNotice(GtfsStopTime stopTime) {
super(SeverityLevel.WARNING);
this.csvRowNumber = stopTime.csvRowNumber();
this.tripId = stopTime.tripId();
this.stopSequence = stopTime.stopSequence();
}
}

/**
* Column {@code stop_times.timepoint} is missing.
*
* <p>Severity: {@code SeverityLevel.WARNING}
*/
static class MissingTimepointColumnNotice extends ValidationNotice {
private final String filename;

MissingTimepointColumnNotice() {
super(SeverityLevel.WARNING);
this.filename = GtfsStopTimeTableLoader.FILENAME;
}
}
}
Loading