Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[REVIEW] Preprocessing out of experimental #3676

Merged

Conversation

viclafargue
Copy link
Contributor

Closes #3483.

This PR removes unnecessary thirdparty code, tests most thirdparty and adapter functions and moves preprocessing out of the experimental namespace. Preprocessing models testing will also be strengthen. Finally, the preprocessing models will keep being accessible from the experimental namespace at least for the 0.19 release.

@viclafargue viclafargue requested a review from a team as a code owner March 31, 2021 16:00
@viclafargue viclafargue added the 2 - In Progress Currenty a work in progress label Mar 31, 2021
@github-actions github-actions bot added the Cython / Python Cython or Python issue label Mar 31, 2021
Copy link
Contributor

@JohnZed JohnZed left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the expanded tests and new namespace! Just a few suggestions and mostly questions.

python/cuml/test/test_adapters.py Outdated Show resolved Hide resolved
python/cuml/test/test_adapters.py Show resolved Hide resolved
python/cuml/test/test_thirdparty.py Show resolved Hide resolved
python/cuml/test/test_thirdparty.py Show resolved Hide resolved
@JohnZed JohnZed added this to PR-WIP in v0.19 Release via automation Apr 1, 2021
@viclafargue viclafargue added 3 - Ready for Review Ready for review by team 2 - In Progress Currenty a work in progress and removed 2 - In Progress Currenty a work in progress 3 - Ready for Review Ready for review by team labels Apr 1, 2021
@JohnZed
Copy link
Contributor

JohnZed commented Apr 5, 2021

@viclafargue Per other discussions, if we can get the random-generators to all use the method where the seed used is random but reported in the logs, I think this will be good to go. (Also need to fix the merge conflicts of course)

@viclafargue viclafargue added 3 - Ready for Review Ready for review by team improvement Improvement / enhancement to an existing function non-breaking Non-breaking change and removed 2 - In Progress Currenty a work in progress labels Apr 6, 2021
@codecov-io
Copy link

Codecov Report

Merging #3676 (83e9b9b) into branch-0.19 (fd9ec89) will increase coverage by 5.18%.
The diff coverage is n/a.

Impacted file tree graph

@@               Coverage Diff               @@
##           branch-0.19    #3676      +/-   ##
===============================================
+ Coverage        80.70%   85.89%   +5.18%     
===============================================
  Files              227      225       -2     
  Lines            17615    17096     -519     
===============================================
+ Hits             14217    14684     +467     
+ Misses            3398     2412     -986     
Flag Coverage Δ
dask 49.27% <ø> (+4.28%) ⬆️
non-dask 77.80% <ø> (+4.89%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
python/cuml/__init__.py 95.58% <ø> (+0.20%) ⬆️
...cuml/_thirdparty/sklearn/preprocessing/__init__.py 100.00% <ø> (ø)
...on/cuml/_thirdparty/sklearn/preprocessing/_data.py 64.52% <ø> (+1.41%) ⬆️
...hirdparty/sklearn/preprocessing/_discretization.py 83.33% <ø> (-0.88%) ⬇️
...l/_thirdparty/sklearn/preprocessing/_imputation.py 85.54% <ø> (+22.74%) ⬆️
python/cuml/_thirdparty/sklearn/utils/extmath.py 97.22% <ø> (+40.32%) ⬆️
...cuml/_thirdparty/sklearn/utils/skl_dependencies.py 80.00% <ø> (+26.07%) ⬆️
...thon/cuml/_thirdparty/sklearn/utils/sparsefuncs.py 90.90% <ø> (+48.32%) ⬆️
...ython/cuml/_thirdparty/sklearn/utils/validation.py 64.10% <ø> (+41.65%) ⬆️
python/cuml/benchmark/algorithms.py 94.38% <ø> (ø)
... and 188 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 9feecfb...83e9b9b. Read the comment docs.

Copy link
Contributor

@JohnZed JohnZed left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the usages look good... would be nice if in the future we can automate a bit more so we just use the fixture and don't have to manually set the seed each time, but should be ok for release. Just want to make the small change to read seed from environment if available and then we're good to go!

python/cuml/test/conftest.py Outdated Show resolved Hide resolved
v0.19 Release automation moved this from PR-WIP to PR-Needs review Apr 6, 2021
@JohnZed JohnZed changed the title [WIP] Preprocessing out of experimental [REVIEW] Preprocessing out of experimental Apr 6, 2021
Copy link
Contributor

@JohnZed JohnZed left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One more tiny thing - can you add a row for preprocessing to the list of algorithms in the README? We don't need to list all of them, just something like:

Preprocessing | Scikit-learn compatible preprocessing including imputation, normalization, etc. |

@viclafargue
Copy link
Contributor Author

viclafargue commented Apr 7, 2021

Thanks for the review. @lowener has some changes that would be interesting to add to this PR before merging it. @lowener changes are now merged. Changes are visible here.

Copy link
Contributor

@JohnZed JohnZed left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great! Thank you for the additional test too, @lowener

v0.19 Release automation moved this from PR-Needs review to PR-Reviewer approved Apr 7, 2021
@JohnZed
Copy link
Contributor

JohnZed commented Apr 7, 2021

@gpucibot merge

@rapids-bot rapids-bot bot merged commit 58c4ef5 into rapidsai:branch-0.19 Apr 7, 2021
v0.19 Release automation moved this from PR-Reviewer approved to Done Apr 7, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
3 - Ready for Review Ready for review by team Cython / Python Cython or Python issue improvement Improvement / enhancement to an existing function non-breaking Non-breaking change
Projects
No open projects
v0.19 Release
  
Done
Development

Successfully merging this pull request may close these issues.

[FEA] Preprocessing module out of experimental
4 participants