Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WIP: Fix issue 85 #91

Draft
wants to merge 32 commits into
base: master
Choose a base branch
from
Draft

WIP: Fix issue 85 #91

wants to merge 32 commits into from

Conversation

lorenz-gorini
Copy link
Member

Added dataset method to analyze and convert (if possible) the columns with dtype='object' when the dataset is read

@lgtm-com
Copy link

lgtm-com bot commented Oct 22, 2020

This pull request introduces 1 alert and fixes 1 when merging 4d5d77f into f47716c - view on LGTM.com

new alerts:

  • 1 for Module is imported with 'import' and 'import from'

fixed alerts:

  • 1 for Module is imported with 'import' and 'import from'

@lgtm-com
Copy link

lgtm-com bot commented Oct 22, 2020

This pull request introduces 1 alert and fixes 1 when merging 6bf2ca9 into f47716c - view on LGTM.com

new alerts:

  • 1 for Module is imported with 'import' and 'import from'

fixed alerts:

  • 1 for Module is imported with 'import' and 'import from'

src/trousse/dataset.py Outdated Show resolved Hide resolved
src/trousse/dataset.py Outdated Show resolved Hide resolved
src/trousse/dataset.py Outdated Show resolved Hide resolved
src/trousse/feature_operations.py Outdated Show resolved Hide resolved
src/trousse/dataset.py Outdated Show resolved Hide resolved
"""
Transform 'object'-typed column values to appropriate types.

This static method analyzes the DataFrame ``df`` columns that have
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
This static method analyzes the DataFrame ``df`` columns that have
Analyze the DataFrame ``df`` columns that have

The method is meant to convert 'object' typed column
into mixed typed data
This is to make the inferred type consistent with the CSV file reading
Moved type_checking imports to feature_operations,
and used direct import in dataset.
This is later used to instantiate FeatureOperation
in Dataset class methods.
Added a separate convert_to_mixed_type module to avoid import loops.
Created a related FeatureOperation for _DfConvertToMixedType class,
that is tracked in dataset history
Added methods to control the column dtype conversion
when the column contains values with only one type
The new approach of _convert_to_boolean_mixed_types is more similar
to the other method _convert_to_numeric_mixed_types
Made the method _data_to_mixed_types static
Now the conversion is just a dry run for identifying mixed type columns.
This can provide hints to user but it does not modify data
Adapted the other tests to the new column
The new test checks that the new columns read from CSV file are
still consistent, even with small errors.
More tests are needed
Added some missing comments
Fixed ReplaceStrings and ReplaceSubstrings typechecking
with the new way
@lgtm-com
Copy link

lgtm-com bot commented Oct 25, 2020

This pull request introduces 1 alert and fixes 1 when merging dff574d into f878ec1 - view on LGTM.com

new alerts:

  • 1 for Module is imported with 'import' and 'import from'

fixed alerts:

  • 1 for Module is imported with 'import' and 'import from'

This is to check that the new dtype of the converted values is the most
appropriate. So we check if previous conversions have led good results
Moved boolean conversion to check for boolean before converting them
 to numbers.
Fixed the tests by adding the new columns to the appropriate
expected column type sets.
Reorganized the definition of the sets for differently typed features
@lgtm-com
Copy link

lgtm-com bot commented Oct 28, 2020

This pull request introduces 1 alert and fixes 1 when merging a2d1fff into f878ec1 - view on LGTM.com

new alerts:

  • 1 for Module is imported with 'import' and 'import from'

fixed alerts:

  • 1 for Module is imported with 'import' and 'import from'

The refactoring was aimed at making the class more clear, flexible
and mostly testable.
Another _MixedColumn is added to separate methods and attributes
with different purposes.
Modified the column conversion considering the dtype the user selected.
Added few comments
Now add_converted_values method can return a value if inplace=False
Added the required deepcopy method for doing so.
Now the method _updated_dtype does not automatically update the private
attribute dtype, but it is only returned and then set by the caller
@lgtm-com
Copy link

lgtm-com bot commented Oct 30, 2020

This pull request introduces 1 alert and fixes 1 when merging aa697a2 into f878ec1 - view on LGTM.com

new alerts:

  • 1 for Module is imported with 'import' and 'import from'

fixed alerts:

  • 1 for Module is imported with 'import' and 'import from'

@lgtm-com
Copy link

lgtm-com bot commented Oct 30, 2020

This pull request introduces 1 alert and fixes 1 when merging 30acbc4 into f878ec1 - view on LGTM.com

new alerts:

  • 1 for Module is imported with 'import' and 'import from'

fixed alerts:

  • 1 for Module is imported with 'import' and 'import from'

@lgtm-com
Copy link

lgtm-com bot commented Oct 30, 2020

This pull request fixes 2 alerts when merging f5ab10d into f878ec1 - view on LGTM.com

fixed alerts:

  • 2 for Module is imported with 'import' and 'import from'

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants