-
Notifications
You must be signed in to change notification settings - Fork 1
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
WIP: Fix issue 85 #91
base: master
Are you sure you want to change the base?
Conversation
This pull request introduces 1 alert and fixes 1 when merging 4d5d77f into f47716c - view on LGTM.com new alerts:
fixed alerts:
|
This pull request introduces 1 alert and fixes 1 when merging 6bf2ca9 into f47716c - view on LGTM.com new alerts:
fixed alerts:
|
""" | ||
Transform 'object'-typed column values to appropriate types. | ||
|
||
This static method analyzes the DataFrame ``df`` columns that have |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This static method analyzes the DataFrame ``df`` columns that have | |
Analyze the DataFrame ``df`` columns that have |
The method is meant to convert 'object' typed column into mixed typed data
This is to make the inferred type consistent with the CSV file reading
Moved type_checking imports to feature_operations, and used direct import in dataset. This is later used to instantiate FeatureOperation in Dataset class methods.
Added a separate convert_to_mixed_type module to avoid import loops. Created a related FeatureOperation for _DfConvertToMixedType class, that is tracked in dataset history
Added methods to control the column dtype conversion when the column contains values with only one type
The new approach of _convert_to_boolean_mixed_types is more similar to the other method _convert_to_numeric_mixed_types
Made the method _data_to_mixed_types static
Now the conversion is just a dry run for identifying mixed type columns. This can provide hints to user but it does not modify data
Adapted the other tests to the new column
The new test checks that the new columns read from CSV file are still consistent, even with small errors. More tests are needed
Added some missing comments
Fixed ReplaceStrings and ReplaceSubstrings typechecking with the new way
8c1b559
to
dff574d
Compare
This pull request introduces 1 alert and fixes 1 when merging dff574d into f878ec1 - view on LGTM.com new alerts:
fixed alerts:
|
This is to check that the new dtype of the converted values is the most appropriate. So we check if previous conversions have led good results
Moved boolean conversion to check for boolean before converting them to numbers.
Fixed the tests by adding the new columns to the appropriate expected column type sets. Reorganized the definition of the sets for differently typed features
This pull request introduces 1 alert and fixes 1 when merging a2d1fff into f878ec1 - view on LGTM.com new alerts:
fixed alerts:
|
The refactoring was aimed at making the class more clear, flexible and mostly testable. Another _MixedColumn is added to separate methods and attributes with different purposes.
Modified the column conversion considering the dtype the user selected. Added few comments
Now add_converted_values method can return a value if inplace=False Added the required deepcopy method for doing so.
Now the method _updated_dtype does not automatically update the private attribute dtype, but it is only returned and then set by the caller
This pull request introduces 1 alert and fixes 1 when merging aa697a2 into f878ec1 - view on LGTM.com new alerts:
fixed alerts:
|
This pull request introduces 1 alert and fixes 1 when merging 30acbc4 into f878ec1 - view on LGTM.com new alerts:
fixed alerts:
|
This pull request fixes 2 alerts when merging f5ab10d into f878ec1 - view on LGTM.com fixed alerts:
|
Added dataset method to analyze and convert (if possible) the columns with dtype='object' when the dataset is read