Support ERROR cell type when using inferSchema=true #343
@nightscape I'm afraid it's not possible to individually set …
Please see the commented-out test case for more info. I'd appreciate any ideas on how to output individual … If it's not possible, we can just go with the default behaviour of processing …
@derianpt Overall, great job!
Hi @nightscape, have you had the time to check out the tests yet? 🙇
I just had a look. I think the problem is that the second column in your example xlsx file contains only errors.
8e497af to 0237731
Thanks for the tip, @nightscape. It turns out I was missing the code change that updates how cell values should be extracted for … However, there is a side effect when we treat errors as strings: the entire column is then interpreted as string type. That's understandable and acceptable behaviour, IMO. This is reflected in the first test case, and I've also added a note about it to the README.
If I understand correctly, the type of the column changes to String only if there is an error in that column, and then only if it is within the first … Something else worth knowing is that it's possible to add metadata to a column. This could be used to specify a sensible fallback value for one specific column (e.g. …)
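To make the inference behaviour discussed above concrete, here is a minimal sketch. It does not use spark-excel's or POI's real classes: `Cell`, `InferredType`, `cellType`, `merge`, `inferColumnType` and the `sampleSize` parameter are all hypothetical stand-ins. It assumes ERROR cells are treated as strings and that only the first `sampleSize` rows take part in inference.

```scala
object ErrorAsStringInference {
  // Simplified, hypothetical stand-ins for sheet cells (not POI or spark-excel types)
  sealed trait Cell
  final case class NumericCell(value: Double) extends Cell
  final case class StringCell(value: String) extends Cell
  final case class ErrorCell(text: String) extends Cell // e.g. "#N/A"

  // Simplified, hypothetical stand-ins for the inferred column type
  sealed trait InferredType
  case object InferredDouble extends InferredType
  case object InferredString extends InferredType

  // With errors treated as strings, an ERROR cell counts as String, just like a text cell
  def cellType(cell: Cell): InferredType = cell match {
    case NumericCell(_)               => InferredDouble
    case StringCell(_) | ErrorCell(_) => InferredString
  }

  // Widening rule: the column stays numeric only if every sampled cell is numeric
  def merge(a: InferredType, b: InferredType): InferredType =
    if (a == InferredDouble && b == InferredDouble) InferredDouble else InferredString

  // Only the first `sampleSize` rows are inspected; an empty sample defaults to String
  def inferColumnType(column: Seq[Cell], sampleSize: Int): InferredType =
    column
      .take(sampleSize)
      .map(cellType)
      .reduceOption((a, b) => merge(a, b))
      .getOrElse(InferredString)
}
```

In this model, `inferColumnType(Seq(NumericCell(1.0), NumericCell(2.0), ErrorCell("#N/A")), 2)` yields `InferredDouble`: an error outside the sampled rows leaves the column numeric, so the error still has to be turned into a Double when the data is read. That is the case a conform-to-the-inferred-type approach would need to cover.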
I have modified the test file and confirmed this behaviour. It looks like blanket conversion to String will not work; we need to conform to the inferred column type. However, setting as … I suggest this strategy:
For the sensible fallback values, we can refer to the range of values defined in the Spark docs. This is similar to … EDIT: Please see the latest code changes.
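A minimal sketch of the conform-to-the-inferred-type idea, assuming a simple per-type fallback table. The names (`ColumnType`, `fallbackFor`) are hypothetical, and the concrete fallback values (the lower bound of each integral range, `NaN` for doubles, the error text for strings) are this sketch's own assumptions, not necessarily what the PR implemented. The integral ranges themselves are the ones Spark's data type documentation lists for `ByteType`, `ShortType`, `IntegerType` and `LongType`.

```scala
object ErrorFallbackSketch {
  // Simplified, hypothetical stand-in for the column type Spark inferred
  sealed trait ColumnType
  case object ByteCol extends ColumnType
  case object ShortCol extends ColumnType
  case object IntCol extends ColumnType
  case object LongCol extends ColumnType
  case object DoubleCol extends ColumnType
  case object StringCol extends ColumnType

  // Hypothetical policy: an ERROR cell is replaced by a value that still fits the column type.
  // Returns Any because values in a Spark Row are untyped at this level.
  def fallbackFor(colType: ColumnType, errorText: String): Any = colType match {
    case ByteCol   => Byte.MinValue  // assumption: lower bound of the type's range
    case ShortCol  => Short.MinValue
    case IntCol    => Int.MinValue
    case LongCol   => Long.MinValue
    case DoubleCol => Double.NaN     // assumption: NaN marks an unreadable double
    case StringCol => errorText      // string columns can keep the text, e.g. "#N/A"
  }
}
```

Keeping the fallback keyed on the inferred type means the column's schema never changes because of an error cell; only the substituted value does.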
Looks good! I'll merge once it's ready from your side 👍
Yup, the PR is ready to be merged from my side.
Hi @derianpt, great job!
Sure, @nightscape.
@nightscape Could I trouble you to release a new version with this fix? I would like to use it in my app 🙇
I just fixed the SNAPSHOT release mechanism, so every commit to main now produces a release. I would also like to add you as a contributor to the project; then you could create a …
Yes, I have 2FA enabled.
Fixes #208
When using the `inferSchema` option, reading an Excel file that contains `ERROR` type cells fails with a runtime error: `scala.MatchError: ERROR (of class shadeio.poi.ss.usermodel.CellType)`

This PR adds an option to treat `ERROR` cells as string cells and output the error messages instead (e.g. `#N/A`, `#NULL!`).
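The `scala.MatchError` arises when a pattern match over POI's `CellType` has no branch for `ERROR`. The sketch below shows the shape of the fix using simplified, hypothetical stand-ins for `CellType` and for a cell (not POI's real classes): every cell type, including `ERROR`, gets a branch, and an error cell is rendered as its Excel error text. The codes in the table are the standard Excel error codes; with real POI the text can likely be obtained via `FormulaError.forInt(cell.getErrorCellValue).getString`, though that should be checked against the POI version in use.

```scala
object ErrorCellSketch {
  // Hypothetical stand-in for POI's CellType enum (simplified)
  sealed trait CellType
  case object NUMERIC extends CellType
  case object STRING extends CellType
  case object BOOLEAN extends CellType
  case object BLANK extends CellType
  case object ERROR extends CellType

  // Hypothetical stand-in for a POI cell: one payload field per cell type
  final case class RawCell(
      cellType: CellType,
      num: Double = 0.0,
      str: String = "",
      bool: Boolean = false,
      errorCode: Byte = 0
  )

  // Standard Excel error codes and their display text
  private val errorText: Map[Byte, String] = Map(
    0x00.toByte -> "#NULL!",
    0x07.toByte -> "#DIV/0!",
    0x0f.toByte -> "#VALUE!",
    0x17.toByte -> "#REF!",
    0x1d.toByte -> "#NAME?",
    0x24.toByte -> "#NUM!",
    0x2a.toByte -> "#N/A"
  )

  // Every CellType has a branch, including ERROR, so no scala.MatchError can escape
  def cellAsString(cell: RawCell): String = cell.cellType match {
    case NUMERIC => cell.num.toString
    case STRING  => cell.str
    case BOOLEAN => cell.bool.toString
    case BLANK   => ""
    // Unknown codes get a hypothetical placeholder instead of failing
    case ERROR   => errorText.getOrElse(cell.errorCode, s"#ERROR(${cell.errorCode})")
  }
}
```

For example, `cellAsString(RawCell(ERROR, errorCode = 0x2a.toByte))` returns `"#N/A"`.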