Fix #970, sort feature types #975

Merged: 1 commit, Oct 8, 2020

mfeurer (Contributor) commented Oct 7, 2020

Closes #970

When a pandas DataFrame is encoded in autosklearn.data.validator, the ColumnTransformer re-orders the columns (a ColumnTransformer emits its output in the order of its transformers, with any remainder columns last). This PR re-orders the feature types accordingly, so that when the data is passed to the actual ML pipeline, the columns and the feature types are in the same order.

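A minimal plain-Python sketch of the idea, assuming (as with a ColumnTransformer that encodes only the categorical columns and uses `remainder="passthrough"`) that the encoded categorical columns come out first and the remaining columns follow in their original order. `align_feature_types` and the toy columns are hypothetical illustrations, not auto-sklearn's actual implementation:

```python
from typing import Dict, List, Tuple


def align_feature_types(
    columns: List[str], feature_types: Dict[str, str]
) -> Tuple[List[str], List[str]]:
    """Hypothetical helper: return the transformed column order and the
    feature types re-ordered to match it."""
    # Assumed ColumnTransformer output order: the encoded categorical columns
    # first, then the passed-through remainder, each in original input order.
    categorical = [c for c in columns if feature_types[c] == "categorical"]
    remainder = [c for c in columns if feature_types[c] != "categorical"]
    new_order = categorical + remainder
    # Re-order the feature types so that entry i describes output column i.
    return new_order, [feature_types[c] for c in new_order]


# Toy input with interleaved categorical and numerical columns.
columns = ["V1", "V2", "V3", "V4"]
feature_types = {"V1": "numerical", "V2": "categorical",
                 "V3": "numerical", "V4": "categorical"}
order, types = align_feature_types(columns, feature_types)
print(order)  # ['V2', 'V4', 'V1', 'V3']
print(types)  # ['categorical', 'categorical', 'numerical', 'numerical']
```

Passing the original, un-sorted feature types alongside the transformed data would label `V2` as numerical and `V1` as categorical; re-ordering both with the same rule keeps them in step.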
franchuterivera (Contributor) commented:

I would have expected this test to catch the bug:

def test_classification_pandas_support(self):

Maybe it makes sense to add dataset 1558 to that test (dataset 2 did not seem to catch the bug)? Its mix of categorical and numerical feature types looks like pretty decent coverage:

V1      float64
V2     category
V3     category
V4     category
V5     category
V6      float64
V7     category
V8     category
V9     category
V10     float64
V11    category
V12     float64
V13     float64
V14     float64
V15     float64
V16    category
dtype: object
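To see why interleaved feature types like these exercise the fix, here is a small plain-Python sketch that takes the dtypes listed above and reports the positions at which the original-order feature types would describe the wrong column, assuming the encoder moves all categorical columns to the front (the same `remainder="passthrough"`-style assumption as before). `count_misaligned` is a hypothetical helper for illustration only:

```python
from typing import Dict, List


def count_misaligned(dtypes: Dict[str, str]) -> List[int]:
    """Hypothetical helper: indices where stale (un-sorted) feature types
    would disagree with the column order after categorical columns move first."""
    columns = list(dtypes)  # dicts preserve insertion order (Python 3.7+)
    is_cat = {c: dtypes[c] == "category" for c in columns}
    # Assumed transformer output order: categorical columns, then the rest.
    transformed = ([c for c in columns if is_cat[c]]
                   + [c for c in columns if not is_cat[c]])
    return [i for i, (old, new) in enumerate(zip(columns, transformed))
            if is_cat[old] != is_cat[new]]


# dtypes of dataset 1558 as printed in the comment above
dtypes_1558 = {
    "V1": "float64", "V2": "category", "V3": "category", "V4": "category",
    "V5": "category", "V6": "float64", "V7": "category", "V8": "category",
    "V9": "category", "V10": "float64", "V11": "category", "V12": "float64",
    "V13": "float64", "V14": "float64", "V15": "float64", "V16": "category",
}
print(count_misaligned(dtypes_1558))  # [0, 5, 10, 15]

# If the categorical columns already come first, the order does not change,
# so such a dataset cannot expose this kind of bug.
print(count_misaligned({"a": "category", "b": "float64"}))  # []
```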

codecov-io commented Oct 7, 2020

Codecov Report

❗ No coverage uploaded for pull request base (development@49b3750).
The diff coverage is 90.00%.


@@              Coverage Diff               @@
##             development     #975   +/-   ##
==============================================
  Coverage               ?   85.32%           
==============================================
  Files                  ?      131           
  Lines                  ?     9834           
  Branches               ?        0           
==============================================
  Hits                   ?     8391           
  Misses                 ?     1443           
  Partials               ?        0           
Impacted Files                    Coverage Δ
autosklearn/data/validation.py    88.29% <90.00%> (ø)


Legend:
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 49b3750...50e9432.

mfeurer (Contributor, Author) commented Oct 8, 2020

That's a good suggestion; however, dataset 1558 doesn't have missing values. I just searched OpenML for small classification datasets with nominal features, missing values, and negative real values, but didn't find any.

Therefore, I suggest keeping the test case as it is, since we have fixed the issue and added a test at the right place in the code.

franchuterivera merged commit 08d099b into development on Oct 8, 2020.
franchuterivera deleted the fix_970 branch on October 8, 2020, 18:15.

3 participants