Improve inference of booleans to handle string representations #153
Relates to

@gsheni can you explain why the example you gave is currently inferred as categorical? I wonder if this should get classified as a bug instead, and the fix is simply to set

Yeah, if I follow this right, my suggestion is:
@dsherry My example wasn't clear enough. Let's say we had some DataColumns like this:

All of these DataColumns should be inferred with the Boolean logical type and converted to the following representation (pd.BooleanDtype):

[True, False, True, True]

If there is
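A minimal sketch of that target representation in plain pandas (this is an illustration, not Woodwork's actual conversion code):

```python
import pandas as pd

# Construct the desired representation directly: pandas' nullable
# boolean extension dtype (pd.BooleanDtype), not numpy's bool_.
s = pd.Series([True, False, True, True], dtype="boolean")

print(s.dtype)                               # boolean
print(isinstance(s.dtype, pd.BooleanDtype))  # True
```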
Got it. So these:

[True, False, True, True, np.nan]
[True, False, True, True, pd.NA]

would also end up as the Boolean logical type, converted to [True, False, True, True, pd.NA], yes? It occurs to me we'll want the same nan-tolerant behavior when we infer any type, not just booleans, right? Are there other types we need to address right now? Whoever picks this up, please look into that and add test coverage :)
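A sketch of that nan-tolerant conversion using plain pandas (an assumption about the mechanism, not Woodwork's implementation):

```python
import numpy as np
import pandas as pd

# An object-dtype series with a missing value; casting to the nullable
# "boolean" dtype turns np.nan (or None) into pd.NA.
raw = pd.Series([True, False, True, True, np.nan])
converted = raw.astype("boolean")

print(converted.dtype)            # boolean
print(converted.isna().tolist())  # [False, False, False, False, True]
```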
@dsherry Yes, we want the NaNs converted properly for Boolean logical types. This issue, though, is more about converting string representations of booleans.
If we update the inference of booleans, it should identify series such as:

[1, True, "true", "True", "yes", "t", "T"]
[0, False, "false", "False", "no", "f", "F"]

See PR #830 for additional context.
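A rough sketch of what such inference could look like. The helper names and the accepted spellings below are assumptions for illustration; this is not Woodwork's actual API:

```python
import pandas as pd

# Hypothetical boolean-like vocabularies (assumed, not Woodwork's).
# Note True == 1 and False == 0 in Python, so ints are covered too.
TRUE_VALUES = {True, "true", "t", "yes", "y"}
FALSE_VALUES = {False, "false", "f", "no", "n"}

def looks_boolean(series: pd.Series) -> bool:
    """Return True if every non-null value maps to a boolean."""
    values = {
        v.lower() if isinstance(v, str) else v
        for v in series.dropna()
    }
    return bool(values) and values <= (TRUE_VALUES | FALSE_VALUES)

def to_boolean(series: pd.Series) -> pd.Series:
    """Convert to pandas' nullable boolean dtype, mapping strings."""
    def convert(v):
        if pd.isna(v):
            return pd.NA
        key = v.lower() if isinstance(v, str) else v
        return key in TRUE_VALUES
    return series.map(convert).astype("boolean")
```

With this sketch, `looks_boolean(pd.Series([1, True, "true", "True", "yes", "t", "T"]))` returns True, and `to_boolean` maps every element of that series to True while sending missing values to pd.NA.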