-
-
Notifications
You must be signed in to change notification settings - Fork 2.1k
Tab character in a value in a JSON file results in a truncated data import #1164
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
I think 3 would be most appropriate given OpenRefine is a data cleansing tool. And we should try to cleanse a bit of that. Folks expect it not to fail for a String value and we can just escape for those cases. |
Looks like we might be able to just expand on the CASE logic for VALUE_EMBEDDED_OBJECT |
Hi, I am now working with this issue and had make some progress. Just double check if I understand the expected effect correct: file with |
I am not sure what you mean by "two types of |
I see, so unquoted control chars (like \t, \n) should be ignored during the parsing right? |
I would not ignore these unescaped control characters - see the 3 options mentioned by Owen above. |
This currently does produce an error:
although it doesn't happen until you make your record selection (ie not in initial preview). Is there ever a time that we don't want to be permissive in parsing? ie Do we really need an import option for this? It's an invalid JSON file, but we're a data wrangling tool, not a validation tool. |
We clean messy data. We DO NOT clean messy formats? I guess that's what we should adhere to since we are dependent on a large ecosystem of helper libraries for parsing and validating before importing the data. In general, I take back my comment above that we should offer an option to cleanse unescaped values to clean messy JSON, since I was not aware that this was in fact an invalid or unknown format parsing error for JSON (Thanks to @tfmorris for helping me understand now on this) Let's close this issue if we all agree and that our existing error message is adequate enough for this.
|
I think option 3, permissive parsing, always enabled with no option setting is the right answer here.
More generally, I'd really like to avoid the situation where someone works on a PR that addresses an issue only to have the code review devolved into a labored discussion as to whether the original issue was valid or not. Let's get that resolved up front and not flip flop on our decisions. We have a number of open PRs where this is happening and it's got to be discouraging to new contributors. |
@tfmorris Ah, so you agreed with my original feeling, OK. I thought you were disagreeing. The flip-flopping happens because I read one thing instead of hearing you voice the comment. So I think about it more, because I value your opinion @tfmorris and think around it. No problem. I'm very used to code review opening up more questions back to the original design issue #. I think new contributors should be comfortable with some of that; they'll navigate that in the real world soon enough. But agree that the issues are where we should do the collaborating on the design or on the dev mailing list. |
Closed by #2431 |
Uh oh!
There was an error while loading. Please reload this page.
When importing JSON to OpenRefine through 'Create Project', if there is a tab inside a value in the JSON file, the import succeeds but only imports data up to the tab character. e.g. JSON:
{ "rows": [ { "Content": "A" }, { "Content": " B" }, { "Content": "C" } ] }
(tab character is before the 'B')
Since tab is an illegal character in this position (should be escaped as \t) it makes sense for the import to fail - but failing silently is a problem.
Not sure what the best solution is - three possibilities:
The text was updated successfully, but these errors were encountered: