[REVIEW] cuML's estimator Base class for preprocessing models #3270

viclafargue · 2020-12-07T15:56:24Z

Answers #3201.
This PR makes preprocessing models fully compliant with cuML's estimator Base class and the tagging system.

Preprocessing models were decorated with cuml_estimator and preprocessing functions with cuml_function to make use of the features offered by cuML's estimator Base class. The return types of the fit and transform methods were specified. CumlArrayDescriptor attributes were created, and the get_param_names and _more_tags methods were added where necessary.
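As a rough illustration of the pattern described above, here is a minimal sketch in plain Python. `SketchBase` and `SketchScaler` are hypothetical stand-ins, not cuML classes; only the method names `get_param_names` and `_more_tags` come from this PR, and the real Base class does considerably more (output-type handling, array descriptors).

```python
class SketchBase:
    """Hypothetical stand-in for an estimator base class (not cuML's Base)."""

    def get_param_names(self):
        # Subclasses extend this list with their own constructor arguments.
        return []

    def get_params(self):
        # Build the parameter dict from the declared names.
        return {name: getattr(self, name) for name in self.get_param_names()}

    def _more_tags(self):
        return {}


class SketchScaler(SketchBase):
    """Toy max-abs scaler over lists of rows, for illustration only."""

    def __init__(self, copy=True):
        self.copy = copy

    def get_param_names(self):
        return super().get_param_names() + ["copy"]

    def _more_tags(self):
        return {"allow_nan": True}

    def fit(self, X) -> "SketchScaler":
        # Per-column maximum absolute value; use 1.0 for all-zero columns.
        self.scale_ = [max(abs(v) for v in col) or 1.0 for col in zip(*X)]
        return self

    def transform(self, X):
        return [[v / s for v, s in zip(row, self.scale_)] for row in X]


scaler = SketchScaler().fit([[1.0, -4.0], [2.0, 2.0]])
params = scaler.get_params()
scaled = scaler.transform([[1.0, -4.0], [2.0, 2.0]])
```

Declaring parameter names in one place lets the base class implement `get_params` generically, which is the kind of feature the estimators gain by inheriting from a shared base.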

As the SparseCumlArray class can only handle CSR matrices for now, preprocessing models will only return this type as sparse outputs.
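To make the CSR-only constraint concrete, the sketch below converts COO triplets into the three CSR arrays using plain Python lists. The function name `coo_to_csr` is an assumption made for illustration; it is not how SparseCumlArray or the preprocessing models perform the conversion.

```python
def coo_to_csr(n_rows, rows, cols, vals):
    """Convert COO triplets to CSR arrays (indptr, indices, data)."""
    # Count the stored entries in each row.
    counts = [0] * n_rows
    for r in rows:
        counts[r] += 1
    # Prefix sums give the start offset of every row.
    indptr = [0] * (n_rows + 1)
    for i in range(n_rows):
        indptr[i + 1] = indptr[i] + counts[i]
    # Scatter each entry into its row's slot, keeping input order within a row.
    indices = [0] * len(vals)
    data = [0.0] * len(vals)
    next_slot = indptr[:-1].copy()
    for r, c, v in zip(rows, cols, vals):
        pos = next_slot[r]
        indices[pos] = c
        data[pos] = v
        next_slot[r] += 1
    return indptr, indices, data


# 3x3 matrix with entries (0, 2) = 5, (2, 0) = 7 and (0, 0) = 1.
indptr, indices, data = coo_to_csr(3, [0, 2, 0], [2, 0, 0], [5.0, 7.0, 1.0])
```

Note that this sketch does not sort column indices within a row, so callers that need canonical CSR would have to sort each row's slice afterwards.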

GPUtester · 2020-12-07T15:56:54Z

Please update the changelog in order to start CI tests.

View the gpuCI docs here.

dantegd

Had one question about the relationship with PR #3257

dantegd · 2020-12-07T16:49:00Z

python/cuml/_thirdparty/sklearn/preprocessing/_data.py

        return X

    def _more_tags(self):
-        return {'allow_nan': True}
+        return {'X_types_gpu': ['2darray', 'sparse'],


@viclafargue we're in the process of making the tags system static in #3257, so depending on timing that PR will affect this one or the other way around. Do you foresee many issues arising from that change for these classes in _thirdparty/sklearn?

No problem, I'll wait for your PR. Should be fairly simple, the models inherit from the Base class. I'll just have to make _more_tags methods static everywhere.

@viclafargue I think we're interested in knowing whether you have any tags that are dynamic and will change from one instance to another depending on the properties of the class. Or can all of the tags be determined in a static method?

@dantegd Look at these lines: https://github.com/rapidsai/cuml/pull/3270/files#diff-6eb3baa332b96d9c2a1a79101b54e7d74c7510d65407a7cb54db9d3606d406f6R140-R141

Sorry, I didn't know these tags could be instance specific. ~~From what I could see, all of them seem to be class specific (static) for preprocessing.~~ After a closer look, there seems to be at least one occurrence of an instance-specific tag.

which tag would be instance specific?

ah I see it, let me think on it for a second, have a couple of ideas
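The distinction discussed in this thread can be sketched as follows. Both classes and the `missing_marker` parameter are hypothetical and only illustrate the difference; they are not the code from _data.py that the link above points to.

```python
class StaticTags:
    """Tags that are identical for every instance of the class."""

    @staticmethod
    def _more_tags():
        # No instance state is needed, so this can be a static method.
        return {"allow_nan": True}


class InstanceTags:
    """Tags that depend on how a particular instance was configured."""

    def __init__(self, missing_marker="nan"):
        self.missing_marker = missing_marker

    def _more_tags(self):
        # The answer changes with a constructor argument, so a static
        # method cannot compute it.
        return {"allow_nan": self.missing_marker == "nan"}


static_tags = StaticTags._more_tags()
nan_tags = InstanceTags()._more_tags()
zero_tags = InstanceTags(missing_marker=0)._more_tags()
```

A fully static tag system can serve the first class without instantiating it, while the second class needs some per-instance hook, which is the concern raised here.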

Since this is using the AllowNaNTagMixin already, instead of defining _more_tags, you can just add

cuml/python/cuml/common/mixins.py

Line 306 in c4c4068

class SparseInputTagMixin:
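A minimal sketch of how tag mixins might compose is shown below. The mixin names match the ones mentioned in this comment, but their bodies, the `collect_tags` helper, and `SketchTransformer` are assumptions written for illustration; cuML's actual implementation in mixins.py may differ.

```python
class AllowNaNTagMixin:
    @staticmethod
    def _more_tags():
        return {"allow_nan": True}


class SparseInputTagMixin:
    @staticmethod
    def _more_tags():
        return {"X_types_gpu": ["2darray", "sparse"]}


def collect_tags(cls):
    """Merge the tags declared by every class in the MRO (hypothetical helper)."""
    tags = {}
    # Walk from the most generic class to the most specific one so that
    # subclasses override the tags of their parents.
    for klass in reversed(cls.__mro__):
        if "_more_tags" in vars(klass):
            tags.update(klass._more_tags())
    return tags


class SketchTransformer(AllowNaNTagMixin, SparseInputTagMixin):
    """Declares both tags purely through inheritance, with no _more_tags."""


tags = collect_tags(SketchTransformer)
```

With this approach each estimator lists the mixins it needs instead of repeating a hand-written tag dictionary.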

mdemoret-nv

Looking at this PR I think the use of CumlArrayDescriptor looks pretty good but I have some concerns about the use of decorators and the class inheritance. In order to work seamlessly with the descriptors/decorators added in 0.17, this will need some significant changes to the architecture (the ESTIMATOR_GUIDE.md might be helpful).

Before approving this PR or making any suggestions, I would prefer to discuss the design decisions with Victor to understand the motivation first and then do another review.

python/cuml/_thirdparty/sklearn/preprocessing/__init__.py

python/cuml/_thirdparty/sklearn/preprocessing/_data.py

python/cuml/thirdparty_adapters/adapters.py

python/cuml/_thirdparty/sklearn/preprocessing/_imputation.py

python/cuml/_thirdparty/sklearn/preprocessing/_data.py

mdemoret-nv · 2020-12-08T23:16:50Z


python/cuml/_thirdparty/sklearn/preprocessing/_data.py

dantegd

Sorry for the delay on my review @viclafargue

dantegd · 2021-03-16T22:28:20Z


python/cuml/_thirdparty/sklearn/preprocessing/_data.py

python/cuml/_thirdparty/sklearn/preprocessing/_discretization.py

python/cuml/_thirdparty/sklearn/preprocessing/_imputation.py

python/cuml/common/base.pyx

python/cuml/test/test_preproc_utils.py

JohnZed

(Will give feedback on the rest tomorrow - just need to think on it a bit but wanted to add this comment first)

I see the TODOs about preserving order. Is this something that should really be a generic feature of the base class output conversion? Maybe some other transform-type models need this?

python/cuml/_thirdparty/sklearn/preprocessing/_data.py

JohnZed

Apologies for being so slow! I REALLY debated whether there was another approach that would reduce the delta in the _thirdparty_dependencies section, but in the end I believe you found the best solution so we should move forward with this PR.

My one caveat overall is that we should make sure we're testing the various different sparse matrix formats that are supported... in a couple of cases, I think we may not be testing them all (noted in comments)
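One way to cover several sparse formats is to run the same check against each representation of one reference matrix. The sketch below uses plain Python lists rather than the project's test utilities; every helper name is hypothetical, and in the actual test suite a parametrized test would be the more natural shape.

```python
def to_coo(dense):
    # (row, col, value) triplets for every non-zero entry.
    return [(i, j, v) for i, row in enumerate(dense)
            for j, v in enumerate(row) if v != 0]


def to_csr(dense):
    indptr, indices, data = [0], [], []
    for row in dense:
        for j, v in enumerate(row):
            if v != 0:
                indices.append(j)
                data.append(v)
        indptr.append(len(data))
    return indptr, indices, data


def to_csc(dense):
    # CSC is the CSR layout of the transpose.
    return to_csr([list(col) for col in zip(*dense)])


def col_max_abs_coo(coo, n_cols):
    out = [0.0] * n_cols
    for _, j, v in coo:
        out[j] = max(out[j], abs(v))
    return out


def col_max_abs_csr(csr, n_cols):
    _, indices, data = csr
    out = [0.0] * n_cols
    for j, v in zip(indices, data):
        out[j] = max(out[j], abs(v))
    return out


def col_max_abs_csc(csc, n_cols):
    indptr, _, data = csc
    # Each column's values are contiguous in CSC.
    return [max((abs(v) for v in data[indptr[j]:indptr[j + 1]]), default=0.0)
            for j in range(n_cols)]


dense = [[0.0, -3.0, 0.0], [2.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
expected = [max(abs(v) for v in col) for col in zip(*dense)]
cases = {
    "coo": (to_coo, col_max_abs_coo),
    "csr": (to_csr, col_max_abs_csr),
    "csc": (to_csc, col_max_abs_csc),
}
results = {name: compute(build(dense), 3)
           for name, (build, compute) in cases.items()}
```

Listing the formats in one table makes it easy to see at a glance which ones a given check exercises.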

python/cuml/test/test_preprocessing.py

python/cuml/test/test_preproc_utils.py

python/cuml/test/test_base.py

python/cuml/common/array_sparse.py

python/cuml/_thirdparty/sklearn/preprocessing/_imputation.py

python/cuml/_thirdparty/sklearn/preprocessing/_discretization.py

python/cuml/test/test_preprocessing.py

python/cuml/_thirdparty/sklearn/preprocessing/_data.py

dantegd

Changes look good from my review pov

JohnZed · 2021-03-25T22:24:00Z

Looks great!

Outdated review from several months ago.

codecov-io · 2021-03-26T16:15:55Z

Codecov Report

Merging #3270 (d199019) into branch-0.19 (c2f246a) will increase coverage by 1.36%.
The diff coverage is 87.22%.

@@               Coverage Diff               @@
##           branch-0.19    #3270      +/-   ##
===============================================
+ Coverage        80.87%   82.23%   +1.36%     
===============================================
  Files              228      226       -2     
  Lines            17630    17480     -150     
===============================================
+ Hits             14258    14375     +117     
+ Misses            3372     3105     -267

| Flag | Coverage Δ | |
|---|---|---|
| dask | `46.37% <1.66%> (+1.41%)` | ⬆️ |
| non-dask | `74.17% <87.22%> (+1.07%)` | ⬆️ |

Flags with carried forward coverage won't be shown. Click here to find out more.

| Impacted Files | Coverage Δ | |
|---|---|---|
| ...cuml/_thirdparty/sklearn/preprocessing/__init__.py | `100.00% <ø> (ø)` | |
| python/cuml/thirdparty_adapters/adapters.py | `92.08% <ø> (+3.08%)` | ⬆️ |
| ...on/cuml/_thirdparty/sklearn/preprocessing/_data.py | `64.65% <81.45%> (+1.54%)` | ⬆️ |
| ...hirdparty/sklearn/preprocessing/_discretization.py | `83.59% <100.00%> (-0.62%)` | ⬇️ |
| ...l/_thirdparty/sklearn/preprocessing/_imputation.py | `64.54% <100.00%> (+1.74%)` | ⬆️ |
| ...cuml/_thirdparty/sklearn/utils/skl_dependencies.py | `80.00% <100.00%> (+25.09%)` | ⬆️ |
| python/cuml/common/array_sparse.py | `96.29% <100.00%> (+1.95%)` | ⬆️ |
| python/cuml/internals/api_context_managers.py | `93.61% <100.00%> (+0.13%)` | ⬆️ |
| python/cuml/thirdparty_adapters/__init__.py | `100.00% <100.00%> (ø)` | |
| python/cuml/_thirdparty/sklearn/utils/_pprint.py | `0.00% <0.00%> (-27.54%)` | ⬇️ |

... and 52 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update c2f246a...d199019. Read the comment docs.

JohnZed · 2021-03-29T16:29:02Z

@gpucibot merge

viclafargue added 6 commits December 4, 2020 16:51

cuML Base estimator class for preprocessing models

94d845f

cuml_function decorator

653b9cf

Cleaning code

4bf0c2d

Adding missing tags

d8b56bc

Merge branch 'branch-0.18' into fea-cuml-base-for-preproc

3545bfb

Changelog update

d64c1d1

viclafargue requested a review from a team as a code owner December 7, 2020 15:56

viclafargue requested review from wphicks and mdemoret-nv December 7, 2020 15:59

viclafargue added the breaking (Breaking change) and improvement (Improvement / enhancement to an existing function) labels Dec 7, 2020

Check style

ef74d52

dantegd reviewed Dec 7, 2020

Updated adapters testing

ff1f419

mdemoret-nv previously requested changes Dec 8, 2020

viclafargue added 10 commits December 9, 2020 13:17

Relative imports

829eabc

Remove cuml_function

4bcae89

Remove cuml_estimator

ca93dc5

Update get_param_names

dba3f37

Update __init_subclass__

4ed8be3

Update get_param_names

4b6bda4

Exemption for docstring

031ecf4

Restoring KernelCenterer, QuantileTransformer, and PowerTransformer code

f25ab52

Update coding style

42b6475

Merge branch 'branch-0.18' into fea-cuml-base-for-preproc

5827536

viclafargue requested a review from a team as a code owner December 29, 2020 10:40

viclafargue added 2 commits December 30, 2020 17:14

Merge branch 'branch-0.18' into fea-cuml-base-for-preproc

96c64d4

Tags deepcopy fix

0e467f8

viclafargue force-pushed the fea-cuml-base-for-preproc branch from 7023c85 to 0e467f8 Compare December 30, 2020 17:26

Remove changelog entry

7e15903

viclafargue removed the 0 - Blocked (Cannot progress due to external reasons) label Mar 5, 2021

Merge branch 'branch-0.19' into fea-cuml-base-for-preproc

ec001b5

v0.19 Release automation moved this from PR-WIP to PR-Needs review Mar 16, 2021

dantegd requested changes Mar 16, 2021

viclafargue added 3 commits March 17, 2021 11:02

Requested changes

8603650

Merge branch 'branch-0.19' into fea-cuml-base-for-preproc

dd09165

Requested changes (2/2)

c30722e

JohnZed self-assigned this Mar 18, 2021

JohnZed mentioned this pull request Mar 18, 2021

[REVIEW] SimpleImputer fix #3624

Merged

JohnZed reviewed Mar 19, 2021

Merge branch 'branch-0.19' into fea-cuml-base-for-preproc

26c2ddd

JohnZed requested changes Mar 24, 2021

Catch all fetch_20newsgroups exceptions

fd5e73c

viclafargue force-pushed the fea-cuml-base-for-preproc branch from 870a942 to fd5e73c Compare March 24, 2021 09:50

viclafargue added 2 commits March 25, 2021 18:04

Update testing

cc67116

Remove verbose from SimpleImputer

1c5f422

dantegd approved these changes Mar 25, 2021

JohnZed approved these changes Mar 25, 2021

v0.19 Release automation moved this from PR-Needs review to PR-Reviewer approved Mar 25, 2021

Fix base testing

d199019

viclafargue added the 5 - Ready to Merge (Testing and reviews complete, ready to merge) label and removed the 4 - Waiting on Reviewer (Waiting for reviewer to review or respond) label Mar 26, 2021

rapids-bot bot merged commit d4d1bcf into rapidsai:branch-0.19 Mar 29, 2021

v0.19 Release automation moved this from PR-Reviewer approved to Done Mar 29, 2021

viclafargue mentioned this pull request May 11, 2021

[ENH] Make preprocessing models fully compliant with cuML's estimator Base class and tagging system #3201

Closed
