Type Inference in inconsistent lists of dictionaries #31
Comments
Hi, The 1st element in the array says that
Option 1: Hard error. This is what
Option 2: Soft error. This is the current implementation in
Option 3: Remove the field from the schema, and also print a warning. Maybe Option 3 is the better choice, since it allows the rest of the data file to be imported.
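To make Option 3 concrete, here is a minimal sketch of the idea in plain Python. It is not the tool's actual code: the function names `bq_type` and `infer_schema_dropping_conflicts` are hypothetical, only top-level scalar fields are handled, and the type names are a small illustrative subset.

```python
import json
import sys


def bq_type(value):
    """Map a JSON scalar to a coarse type name (hypothetical helper, illustrative subset)."""
    if isinstance(value, bool):  # bool is a subclass of int, so check it first
        return "BOOLEAN"
    if isinstance(value, int):
        return "INTEGER"
    if isinstance(value, float):
        return "FLOAT"
    if isinstance(value, str):
        return "STRING"
    return "UNKNOWN"


def infer_schema_dropping_conflicts(lines):
    """Infer one type per top-level field; drop any field whose type is inconsistent."""
    types = {}          # field name -> first type seen
    conflicted = set()  # fields that showed more than one type
    for line_number, line in enumerate(lines, start=1):
        record = json.loads(line)
        for name, value in record.items():
            new_type = bq_type(value)
            old_type = types.setdefault(name, new_type)
            if old_type != new_type and name not in conflicted:
                conflicted.add(name)
                # Warn once per field, then keep processing the rest of the file.
                print(
                    f"Warning: line {line_number}: field '{name}' changed from "
                    f"{old_type} to {new_type}; dropping it from the schema",
                    file=sys.stderr,
                )
    return [
        {"name": name, "type": field_type}
        for name, field_type in types.items()
        if name not in conflicted
    ]
```

With this approach the inconsistent field disappears from the schema while every consistent field is still reported, so the rest of the file stays usable.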
Hi, many thanks for the answer! I've just realized that even in the simpler case:
{ "x": 30 }
{ "x": "50px" }
schema-generator will infer "type": "INTEGER", while for
{ "x": "50px" }
{ "x": 30 }
schema-generator will infer "type": "STRING". What drawbacks does casting to string have in such a case?
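The order dependence described here is what you would get from a "first type seen wins" rule. The sketch below assumes that rule purely for illustration; it matches the two results above but is not confirmed as the tool's actual logic, and `simple_type` and `infer_first_wins` are hypothetical names.

```python
import json


def simple_type(value):
    """Classify only the two cases discussed here: strings and integers."""
    if isinstance(value, str):
        return "STRING"
    if isinstance(value, int) and not isinstance(value, bool):
        return "INTEGER"
    raise TypeError(f"unsupported value in this sketch: {value!r}")


def infer_first_wins(lines, field):
    """Return (kept_type, mismatch_count), keeping the first type seen for `field`."""
    kept = None
    mismatches = 0
    for line in lines:
        record = json.loads(line)
        if field not in record:
            continue
        seen = simple_type(record[field])
        if kept is None:
            kept = seen          # the first record decides the type
        elif seen != kept:
            mismatches += 1      # later disagreements are only counted
    return kept, mismatches


print(infer_first_wins(['{"x": 30}', '{"x": "50px"}'], "x"))
print(infer_first_wins(['{"x": "50px"}', '{"x": 30}'], "x"))
```

Swapping the two records changes the kept type, which is why the same data can produce either INTEGER or STRING.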
Yes, your results look to be correct because
I'm not sure I understand your Option 4: why is STRING better than INTEGER? What if you had 1000 INTEGERs but only a single record with a STRING? Even if
It seems like the solution for you is to pre-filter your dataset to eliminate the records that you don't want, and keep the ones that you do want, then generate the schema using
In the meantime, I think I will implement Option 3 and make
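The pre-filtering suggestion could look something like the following sketch. It assumes newline-delimited JSON input and that the records to keep are those where the field is an integer (or absent); the function name `keep_records_with_integer_field` is hypothetical.

```python
import json


def keep_records_with_integer_field(lines, field):
    """Yield only the lines whose `field` is absent, null, or a JSON integer."""
    for line in lines:
        record = json.loads(line)
        value = record.get(field)
        is_integer = isinstance(value, int) and not isinstance(value, bool)
        if value is None or is_integer:
            yield line


records = ['{"x": 30}', '{"x": "50px"}', '{"y": 1}']
for kept_line in keep_records_with_integer_field(records, "x"):
    print(kept_line)
```

The filtered lines can then be written to a new file and used as the input for schema generation and loading, so both steps see the same consistent data.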
FYI: I won't be able to look at this again for the next 1-2 weeks... in case you start wondering about the lack of responses.
Hey, thanks anyway for the response! 👍
(Thanks, back from vacation.) I'm also a fan of putting everything in BQ then figuring out things later. For this particular problem of inconsistent types for a given field/column, it's not clear to me what
I'm planning on implementing Option 3 that I described above, which removes the offending field from the generated schema. That will allow the dataset to be loaded using the
(Keeping this issue open so that I can attach my commits to it.)
Pushed 0.5 to PyPI. Closing.
Hi, I would expect that the module would pass the following test:
Unfortunately, the type returned for the "i" field is INTEGER. I have trouble understanding whether this is a bug: it seems to be technically doable and useful, but it also seems to be a case mentioned somewhere in the README - "but bq load does not support it, so we follow its behavior".
Is it a bug to be fixed or not?