Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

False positives of missing_required_field when encountering empty transfer_type values in transfers.txt #1198

Closed
evansiroky opened this issue Jun 28, 2022 · 9 comments · Fixed by #1344
Assignees
Labels
bug Something isn't working (crash, a rule has a problem) GTFS Reference Used for Adding or changing rules that belong in the GTFS reference status: In review

Comments

@evansiroky
Copy link

Bug report

The GTFS Validator is generating false positives of the missing_required_field when it encounters empty transfers.txt#transfer_type field values.

Describe the bug

Given an empty value within the transfers.txt#transfer_type field (which is valid according to the spec), when the validator checks the file, then it will falsely raise a missing_required_field notice.

How we reproduce the bug

Example feed: Archive.zip

In this feed in the transfers.txt file, the following use case is shown:

from_stop_id,to_stop_id,transfer_type,min_transfer_time
1,2,,

This is a valid use case in that the transfer_type value is empty which indicates a "Recommended transfer point between routes".

Expected behaviour

The validator should not flag this as an error.

Observed behaviour

The GTFS Validator flagged the above-mentioned use case as an error.

Screenshots:

See relevant JSON output for the above-linked feed.

{
  "notices": [{
    "code": "missing_required_field",
    "severity": "ERROR",
    "totalNotices": 1,
    "sampleNotices": [{
      "filename": "transfers.txt",
      "csvRowNumber": 2.0,
      "fieldName": "transfer_type"
    }]
  },
...

Environment versions

  • validator version: v2, v3.0.1, v3.1.0
  • Java version:
  • OS versions:
@evansiroky evansiroky added the bug Something isn't working (crash, a rule has a problem) label Jun 28, 2022
@github-actions
Copy link
Contributor

Thank you for your reporting a bug. The issue has been placed in triage, the MobilityData team will follow-up on it.

@isabelle-dr
Copy link
Contributor

isabelle-dr commented Jun 29, 2022

Hello Evan,

Thanks for flagging this! This is super useful, having a reliable validation report is a top priority. 🙂
You are right, this field is considered required in the validator, so the notice is triggered for every row that has an empty value, regardless of whether the column is present or not.
👇 Source code from GtfsTransferSchema.java.

  @Required
  GtfsTransferType transferType();

This is another example of a slightly blurry definition of what "empty" means in the spec. We have a similar issue with timepoint, and we ended up differentiating between (a) an empty value but an existing column, and (b) an empty column.
I think we need to follow a similar logic here.

  1. empty value but existing column: valid data
from_stop_id to_stop_id transfer_type min_transfer_time
1 2
  1. empty column: invalid data
from_stop_id to_stop_id min_transfer_time
1 2

cc @Cristhian-HA @bdferris @barbeau

@isabelle-dr isabelle-dr added the GTFS Reference Used for Adding or changing rules that belong in the GTFS reference label Jun 29, 2022
@isabelle-dr isabelle-dr added this to the Rules improvements milestone Jun 29, 2022
@isabelle-dr isabelle-dr removed this from the Rules improvements milestone Oct 3, 2022
@isabelle-dr isabelle-dr added need spec clarification Needs a modification in the specification status: Ready An issue that is ready to be worked on. and removed need spec clarification Needs a modification in the specification labels Oct 3, 2022
@isabelle-dr isabelle-dr added this to the Q1 2023 milestone Feb 14, 2023
@briandonahue
Copy link
Collaborator

@isabelle-dr @bdferris-v2 It seems that @Required annotation is used to validate both that the header exists (DefaultTableHeaderValidator.java) and that the field is non-empty (RowParser.java).

If I understand correctly, the goal here is to generate an error notice if the column header is missing, but no notice if the header is present and the field is empty? Or is it conditional - the field should not be empty if min_transfer_time is non-empty?

@isabelle-dr
Copy link
Contributor

isabelle-dr commented Feb 23, 2023

@briandonahue that's correct, the goal is to generate a notice if the file transfers.txt is present but the header transfer_type is not in the header of the file. No link with min_transfer_time here.

It looks like @Required in the schema file was not a good approach because it generates missing_required_field for every record that has an empty value in transfer_type.

This "0 or empty" semantic is often used for optional fields in the spec, I believe transfer_type is the only required field that has it. It's not great in terms of interpretation by a machine.

@isabelle-dr
Copy link
Contributor

Could we use the MissingRequiredColumn notice for this?

@briandonahue
Copy link
Collaborator

Could we use the MissingRequiredColumn notice for this?

@isabelle-dr I believe so. I'll just add a validator that manually checks that the header exists, and throws that notice, and remove the @Required annotation.

@briandonahue
Copy link
Collaborator

@isabelle-dr on further review - I see that there is a GtfsTableContainer.TableStatus.INVALID_HEADERS status for Tables that is supposed to represent if columns are missing, which is tested here. I think a better approach may be to add a new annotation, @RequiredColumn or @RequiredHeader? And potentially rename @Required to @RequiredField or @RequiredValue?

Then we would update the logic in AnyTableLoader.java to check both annotations and add the MissingRequiredColumn notice in either case.

@isabelle-dr
Copy link
Contributor

Interesting. I think we also use @Required for files.
Brian is on holiday this week.
@aababilov, any wisdom to share?

@briandonahue
Copy link
Collaborator

I think we also use @Required for files.

You're correct, I missed that detail. Well, we could still do this and just add a @RequiredHeader annotation and leave the @Required functionality as-is. I'd argue it might be clearer to have separate annotations for Fields, Headers, and Files but I could raise that as a separate issue if it seems useful...

@briandonahue briandonahue self-assigned this Feb 23, 2023
@isabelle-dr isabelle-dr added status: In review and removed status: Ready An issue that is ready to be worked on. labels Mar 6, 2023
@isabelle-dr isabelle-dr modified the milestones: Q1 2023, 4.1.0 Mar 24, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working (crash, a rule has a problem) GTFS Reference Used for Adding or changing rules that belong in the GTFS reference status: In review
Projects
Development

Successfully merging a pull request may close this issue.

3 participants