Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Invalid content #79

Merged
merged 5 commits into from
May 28, 2020
Merged

Invalid content #79

merged 5 commits into from
May 28, 2020

Conversation

kaitj
Copy link
Collaborator

@kaitj kaitj commented May 14, 2020

Handles previously found issue of a server error in the case of invalid content being uploaded (#71). This PR adds code to check for this by checking if a version is found. Also adds in a new test case for this particular error.

Additionally, removes an old error that is no longer raised!

- removed old ValueError (was not being raised and may have been from earlier version
- added IndexError, which is raised if a version is not found in the file
  - this is raised when the content contains non standad .fcsv content (eg. html)
- added a test case to test_model_auto.py
@github-actions github-actions bot requested a review from tkkuehn May 14, 2020 19:21
@kaitj kaitj mentioned this pull request May 14, 2020
Copy link
Collaborator

@tkkuehn tkkuehn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Largely looks good to me, with one criticism: In the case where the regex happens to match something on the first line but the rest of the document isn't a properly formatted csv, this will get through the check you added at line 102 but raise an unhandled csv.Error at line 120 (I think), so I don't think this fully solves the problem. Would like to see a test case for this (pathological) situation and some code to handle it gracefully.

@kaitj
Copy link
Collaborator Author

kaitj commented May 27, 2020

Looking at it again, I agree that we can probably have a better check in place, but I also see 3 cases where an error is going to get thrown if it passes the first check:

  1. Invalid content (either line 129 or line 152)
  2. Too few rows (if the content doesn't have the same number of rows line 159)
  3. Too many rows (as you mention the error at line 120)

I think (1) or (3) is most likely to occur in the case it passes that first test, in which case you'll probably be checking the .fcsv manually. We could write a check to properly validate the file, but I'm not sure how we go about implementing that without checking the file line by line (essentially what we are doing already anyways). Any ideas? (In the mean time I am going to mock up a fake .fcsv with a proper header and see what happens)

@tkkuehn
Copy link
Collaborator

tkkuehn commented May 27, 2020

I think we're already handling all those kinds of errors, I just mean that the csv reader won't know what to do when it starts parsing at line 120 if the file has no CSV-like structure, which will lead to an exception we're not handling. See the second last example here for an example of how to handle that kind of case.

@kaitj
Copy link
Collaborator Author

kaitj commented May 27, 2020

Ahh I see what you mean now. Can discuss some more during the call today!

@kaitj
Copy link
Collaborator Author

kaitj commented May 28, 2020

Changes from Tristan to handling valid header with invalid content looks good! Merging the PR and closing

@kaitj kaitj merged commit fe47358 into master May 28, 2020
@kaitj kaitj deleted the invalid-content branch May 28, 2020 20:10
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants