Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Catch CSV parse errors and add validation notices #874

Merged
merged 2 commits into from
Apr 29, 2021

Conversation

aababilov
Copy link
Collaborator

If a CSV file failed to parse, there will be a clear user-visible
notice, e.g.:

        {
            "code": "csv_parsing_failed",
            "severity": "ERROR",
            "totalNotices": 1,
            "notices": [
                {
                    "filename": "stops.txt",
                    "charIndex": 278201,
                    "columnIndex": 2,
                    "lineIndex": 2148,
                    "message": "Length of parsed input (4097) exceeds the maximum number of characters defined in your parser settings (4096).",
                    "content": "..."
                }
            ]
         }

If a CSV file failed to parse, there will be a clear user-visible
notice, e.g.:

```
        {
            "code": "csv_parsing_failed",
            "severity": "ERROR",
            "totalNotices": 1,
            "notices": [
                {
                    "filename": "stops.txt",
                    "charIndex": 278201,
                    "columnIndex": 2,
                    "lineIndex": 2148,
                    "message": "Length of parsed input (4097) exceeds the maximum number of characters defined in your parser settings (4096).",
                    "content": "..."
                }
            ]
         }
```
@aababilov
Copy link
Collaborator Author

Note that Univocity provides ExpandingCharAppender that can expand up to (Integer.MAX_VALUE - 8). We may switch to it in a separate PR if there is a use case for that (which I doubt because a CSV value longer than 4096 characters looks invalid to me).

Copy link
Contributor

@lionel-nj lionel-nj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks for working on that @aababilov! Could you update RULES.md with this new notice description please?

Note that Univocity provides ExpandingCharAppender that can expand up to (Integer.MAX_VALUE - 8). We may switch to it in a separate PR if there is a use case for that (which I doubt because a CSV value longer than 4096 characters looks invalid to me).

@MobilityData/transit-specs could the GTFS specification add a requirement? "each cell value must be shorter than 4096 characters"? Unless there is a use case for as @aababilov suggested.

@lionel-nj
Copy link
Contributor

Also @maximearmstrong you should take a look, see how his new notice code impacts the MobilityDatabase.

@aababilov
Copy link
Collaborator Author

thanks for working on that @aababilov! Could you update RULES.md with this new notice description please?

Done.

@MobilityData/transit-specs could the GTFS specification add a requirement? "each cell value must be shorter than 4096 characters"? Unless there is a use case for as @aababilov suggested.

Good question. This requires an extra PR.

Copy link
Contributor

@lionel-nj lionel-nj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

@lionel-nj lionel-nj merged commit 58f4793 into MobilityData:master Apr 29, 2021
@timMillet
Copy link

@MobilityData/transit-specs could the GTFS specification add a requirement? "each cell value must be shorter than 4096 characters"?

Issue created to keep track https://github.com/MobilityData/gtfs-tasks/issues/118

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants