Add fit/transform interface to the data validation #1041

franchuterivera · 2020-12-20T20:23:10Z

Creates a fit/transform interface for the input validator.
Adds more checks, made possible by pytest (the tests also move from unittest to pytest).
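
As a rough illustration of what a fit/transform interface for a validator means, here is a minimal sketch. The class `SimpleFeatureValidator` and its attribute `n_features_` are hypothetical names made up for this example; this is not the code added in this PR. The idea is that `fit` records what valid data looks like, and `transform` checks new data against that record before converting it.

```python
from typing import List, Optional, Sequence


class SimpleFeatureValidator:
    """Hypothetical validator with a fit/transform interface (illustration only)."""

    def __init__(self) -> None:
        # Number of features seen during fit; None until fit() is called.
        self.n_features_: Optional[int] = None

    def fit(self, X: Sequence[Sequence[float]]) -> "SimpleFeatureValidator":
        # Learn the expected number of features from the training data.
        if len(X) == 0:
            raise ValueError("Cannot fit a validator on empty data")
        widths = {len(row) for row in X}
        if len(widths) != 1:
            raise ValueError("All rows must have the same number of features")
        self.n_features_ = widths.pop()
        return self

    def transform(self, X: Sequence[Sequence[float]]) -> List[List[float]]:
        # Reject data that does not match what was seen during fit.
        if self.n_features_ is None:
            raise RuntimeError("Call fit() before transform()")
        for row in X:
            if len(row) != self.n_features_:
                raise ValueError(
                    f"Expected {self.n_features_} features, got {len(row)}"
                )
        # Convert every value to float so downstream code sees one type.
        return [[float(value) for value in row] for row in X]


validator = SimpleFeatureValidator().fit([[1, 2], [3, 4]])
X_test = validator.transform([[5, 6]])
```

Splitting the work this way lets the same fitted validator be reused on training and test data, so a mismatch in the test data is caught at `transform` time.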

codecov · 2020-12-20T20:52:50Z

Codecov Report

Merging #1041 (3645162) into development (4d19551) will increase coverage by 0.20%.
The diff coverage is 95.49%.

@@               Coverage Diff               @@
##           development    #1041      +/-   ##
===============================================
+ Coverage        85.46%   85.66%   +0.20%     
===============================================
  Files              127      128       +1     
  Lines            10177    10272      +95     
===============================================
+ Hits              8698     8800     +102     
+ Misses            1479     1472       -7

| Impacted Files | Coverage Δ | |
|---|---|---|
| autosklearn/automl.py | `84.30% <83.33%> (-0.40%)` | ⬇️ |
| autosklearn/data/feature_validator.py | `96.35% <96.35%> (ø)` | |
| autosklearn/data/target_validator.py | `96.96% <96.96%> (ø)` | |
| autosklearn/data/validation.py | `97.14% <97.05%> (+7.74%)` | ⬆️ |
| ...ature_preprocessing/select_rates_classification.py | `83.09% <0.00%> (-4.23%)` | ⬇️ |
| ...ine/components/classification/gradient_boosting.py | `91.30% <0.00%> (-0.87%)` | ⬇️ |
| ...eline/components/feature_preprocessing/fast_ica.py | `93.47% <0.00%> (+2.17%)` | ⬆️ |
| ...ipeline/components/regression/gradient_boosting.py | `92.30% <0.00%> (+2.88%)` | ⬆️ |

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data

mfeurer

Hi, this looks great, but there are a lot of changes to look at, so I have only left a few initial comments and remarks. I will take a detailed look at the new transformers afterwards.

autosklearn/data/validation.py

autosklearn/data/feature_validator.py

autosklearn/data/target_validator.py

mfeurer · 2021-01-05T09:58:20Z

I was also wondering whether `score` works for all metrics, especially those that require probabilities, and also for binary cases. But maybe this is not within the scope of this PR and needs to be considered in a separate PR?
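
To make the concern concrete, here is a small sketch of why metrics differ in what they consume. It is not auto-sklearn's score code; the helper names `accuracy` and `binary_log_loss` are written for this example only. Accuracy needs hard labels, while log loss needs the predicted probability of the positive class, so a scoring routine has to know which kind of prediction to pass.

```python
import math
from typing import Sequence


def accuracy(y_true: Sequence[int], y_pred_labels: Sequence[int]) -> float:
    # Label-based metric: compares hard class predictions with the truth.
    correct = sum(int(t == p) for t, p in zip(y_true, y_pred_labels))
    return correct / len(y_true)


def binary_log_loss(
    y_true: Sequence[int], proba_positive: Sequence[float], eps: float = 1e-15
) -> float:
    # Probability-based metric: needs P(y = 1) for each sample, not labels.
    total = 0.0
    for t, p in zip(y_true, proba_positive):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total += -(t * math.log(p) + (1 - t) * math.log(1.0 - p))
    return total / len(y_true)


y_true = [0, 1, 1, 0]
proba_positive = [0.1, 0.8, 0.6, 0.4]

# Hard labels are derived from probabilities with a 0.5 threshold.
labels = [int(p >= 0.5) for p in proba_positive]

acc = accuracy(y_true, labels)
loss = binary_log_loss(y_true, proba_positive)
```

Passing `labels` to `binary_log_loss` instead of `proba_positive` would run without an error but produce a misleading value, which is the kind of silent mismatch worth testing for.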

mfeurer

Part 2 of the review. Tests are still to come.

autosklearn/data/target_validator.py

autosklearn/data/feature_validator.py

mfeurer

And one more batch of comments.

test/test_automl/test_automl.py

test/test_data/test_feature_validator.py

autosklearn/data/feature_validator.py

franchuterivera · 2021-01-08T16:26:40Z

> I was also wondering whether `score` works for all metrics, especially those that require probabilities, and also for binary cases. But maybe this is not within the scope of this PR and needs to be considered in a separate PR?

Do you think you can elaborate more on this? Do you mean making sure all metrics work for all types of data?

franchuterivera · 2021-01-08T17:19:52Z

There is one thing pending that needs further discussion: pandas' `infer_objects` will not convert an object column containing letters to categorical.

Should we handle this ourselves using some heuristic?
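
The behaviour in question can be reproduced with a few lines of pandas. The sketch below shows that `infer_objects` leaves a column of letters non-categorical, that an explicit `astype` is needed to get a categorical dtype, and what one possible heuristic might look like. The function `looks_categorical` and its `max_unique_ratio` threshold are hypothetical and are not part of this PR.

```python
import pandas as pd

# An object column that holds letters.
df = pd.DataFrame({"letters": pd.Series(["a", "b", "a", "b"], dtype=object)})

# infer_objects() tries to find a better dtype, but never picks "category".
inferred = df.infer_objects()
is_categorical = isinstance(inferred["letters"].dtype, pd.CategoricalDtype)

# An explicit cast is required to obtain a categorical column.
explicit = df.astype({"letters": "category"})


def looks_categorical(column: pd.Series, max_unique_ratio: float = 0.5) -> bool:
    # Hypothetical heuristic: treat a column as categorical when the share
    # of distinct values is small relative to the number of rows.
    n_rows = max(len(column), 1)
    return column.nunique(dropna=True) / n_rows <= max_unique_ratio


should_cast = looks_categorical(df["letters"])
```

Any such threshold is a judgment call: a free-text column with few rows could pass the check, so a heuristic like this would need to be validated on real datasets before being relied on.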

autosklearn/data/feature_validator.py

autosklearn/data/target_validator.py

autosklearn/data/feature_validator.py

autosklearn/data/target_validator.py

mfeurer · 2021-01-08T19:33:06Z

> Do you think you can elaborate more on this? Do you mean making sure all metrics work for all types of data?

Yes, that's what I meant here. The score code looks a bit broken to me at the moment.

> There is one thing pending that needs further discussion: pandas' `infer_objects` will not convert an object column containing letters to categorical.

As I mentioned above, this will be handled by an upcoming PR.

autosklearn/data/feature_validator.py

autosklearn/data/target_validator.py

test/test_data/test_feature_validator.py

test/test_data/test_target_validator.py

autosklearn/data/feature_validator.py

autosklearn/data/target_validator.py

test/test_automl/test_automl.py

mfeurer reviewed Jan 5, 2021

mfeurer reviewed Jan 7, 2021

franchuterivera added 4 commits January 8, 2021 17:28

Initial commit for new data scheme

c41085f

New input validator schema

5911e52

Incorporate feedback from automl#1041

e3628e7

Missing feedback from automl#1041

1fa170c

franchuterivera force-pushed the data_scheme_validator branch from 98273b3 to 1fa170c on January 8, 2021 17:18

Deleted missing file

246dbef

mfeurer reviewed Jan 8, 2021

autosklearn/data/feature_validator.py (outdated)

mfeurer reviewed Jan 8, 2021

autosklearn/data/target_validator.py

Merge conflict with two loggers

6da82a9

mfeurer reviewed Jan 8, 2021

autosklearn/data/feature_validator.py (outdated)

mfeurer reviewed Jan 8, 2021

autosklearn/data/target_validator.py (outdated)

mfeurer reviewed Jan 8, 2021

autosklearn/data/target_validator.py

franchuterivera added 6 commits January 8, 2021 23:52

Improving coverage

dbc79a2

Inverse transform unknown handling

fa54232

Test logger client for smbo error

02b2480

Try to remove random smbo error

61bdaee

Mode debug msg capabilites for smbo

c176706

Also print root logger

9c748fb