Fix #970, sort feature types #975

mfeurer · 2020-10-07T15:20:55Z

Closes #970

When a pandas DataFrame is encoded in autosklearn.data.validator, the ColumnTransformer re-orders the columns. This PR re-orders the feature types accordingly, so that columns and feature types are sorted the same way when the data is passed to the actual ML pipeline.
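The mismatch described above can be illustrated with a minimal, library-free sketch. It assumes the transformer emits the categorical columns first and passes the remaining columns through afterwards, which is how a ColumnTransformer with `remainder="passthrough"` orders its output. The function name `reorder_like_column_transformer` and the `"categorical"`/`"numerical"` labels are hypothetical and chosen for illustration only; this is not the code from the PR.

```python
from typing import List, Tuple


def reorder_like_column_transformer(
    columns: List[str], feature_types: List[str]
) -> Tuple[List[str], List[str]]:
    """Reorder columns and feature types together.

    Assumption: categorical columns are transformed first and all other
    columns are appended afterwards in their original relative order.
    """
    categorical = [i for i, t in enumerate(feature_types) if t == "categorical"]
    remainder = [i for i, t in enumerate(feature_types) if t != "categorical"]
    order = categorical + remainder
    # Apply the same permutation to both lists so they stay aligned.
    return [columns[i] for i in order], [feature_types[i] for i in order]


# Hypothetical input with interleaved column types.
columns = ["V1", "V2", "V3", "V6"]
feature_types = ["numerical", "categorical", "categorical", "numerical"]

new_columns, new_types = reorder_like_column_transformer(columns, feature_types)
print(new_columns)  # ['V2', 'V3', 'V1', 'V6']
print(new_types)    # ['categorical', 'categorical', 'numerical', 'numerical']
```

If only the columns were permuted and the feature types were left in their original order, the first entry would label the categorical column `V2` as numerical, which is the kind of misalignment this PR addresses.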

franchuterivera · 2020-10-07T16:38:45Z

I would have expected this check to catch this:

auto-sklearn/test/test_automl/test_estimators.py

Line 493 in 50e9432

def test_classification_pandas_support(self):

Maybe it makes sense to consider having dataset 1558 there (dataset 2 did not seem to catch it), and covertype categories seem pretty decent?

V1      float64
V2     category
V3     category
V4     category
V5     category
V6      float64
V7     category
V8     category
V9     category
V10     float64
V11    category
V12     float64
V13     float64
V14     float64
V15     float64
V16    category
dtype: object

codecov-io · 2020-10-07T17:19:44Z

Codecov Report

❗ No coverage uploaded for pull request base (development@49b3750).
The diff coverage is 90.00%.

@@              Coverage Diff               @@
##             development     #975   +/-   ##
==============================================
  Coverage               ?   85.32%           
==============================================
  Files                  ?      131           
  Lines                  ?     9834           
  Branches               ?        0           
==============================================
  Hits                   ?     8391           
  Misses                 ?     1443           
  Partials               ?        0

Impacted Files	Coverage Δ
autosklearn/data/validation.py	`88.29% <90.00%> (ø)`

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 49b3750...50e9432.

mfeurer · 2020-10-08T09:03:54Z

That's a good suggestion; however, dataset 1558 doesn't have missing values. I just checked OpenML for small classification datasets with nominal values, missing values, and negative real values, but didn't find any.

Therefore, I suggest keeping the test case as it is, since we have fixed this issue and added a test in the correct part of the code.

Fix #970, sort feature types

50e9432


mfeurer requested a review from franchuterivera October 7, 2020 15:20

mfeurer mentioned this pull request Oct 7, 2020

Fit fails when input data has categorical columns #970

Closed

franchuterivera merged commit 08d099b into development Oct 8, 2020

franchuterivera deleted the fix_970 branch October 8, 2020 18:15
