
[REVIEW] MNMG RF broadcast feature #3349

Merged — 23 commits — Mar 25, 2021

Conversation

viclafargue
(Contributor)

Answers #3343 and #3342

This adds the following new features to MNMG RF:

  • Possibility to train on the whole dataset (every worker receives the full dataset)
  • Possibility to avoid transferring the model before inference: instead, each worker performs partial inference with the trees at its disposal, and the results are reduced on the client side. This is particularly helpful when the dataset is actually smaller than the model that would otherwise be transferred.
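The two data-distribution modes above can be illustrated with a minimal single-process sketch (function and parameter names here are hypothetical, not the cuML API):

```python
import numpy as np

def distribute(X, n_workers, broadcast=False):
    # Illustrative sketch of the two modes, not the cuML implementation.
    if broadcast:
        # Broadcast mode: every worker receives the full dataset.
        return [X] * n_workers
    # Default mode: each worker receives only its own partition.
    return np.array_split(X, n_workers)

X = np.arange(12).reshape(6, 2)
parts = distribute(X, 3)                  # 3 partitions of 2 rows each
full = distribute(X, 3, broadcast=True)   # 3 copies of all 6 rows
```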

@viclafargue viclafargue requested a review from a team as a code owner January 6, 2021 17:46
@viclafargue viclafargue changed the title [ENH] MNMG RF broadcast feature [WIP] MNMG RF broadcast feature Jan 6, 2021

hcho3 commented Jan 6, 2021

@viclafargue Thanks for working on this. I will make sure to review once it's marked ready for review. Also, let me know if you'd like some early feedback.

@viclafargue viclafargue changed the title [WIP] MNMG RF broadcast feature [REVIEW] MNMG RF broadcast feature Jan 11, 2021
@viclafargue viclafargue added the 3 - Ready for Review Ready for review by team label Jan 11, 2021

viclafargue commented Jan 11, 2021

Thanks @hcho3! I think it should be ready for a first review/discussion. For now, the broadcast feature is implemented for training, but only partially implemented for inference (predict, but not predict_proba). Reducing results produced by different sets of trees may not match the results of running inference with the full model, which is why I'm a bit dubious about the validity of the predictions with the broadcast feature enabled. Do you have any tips for getting more faithful results during the reduction operation? I'm also wondering how to implement the reduction for predict_proba.

Additionally, is it preferable to perform the reduction in a distributed or a local fashion? Some of the Dask features I am using only work with host arrays. A better solution might be to assume there is enough device memory on the client and to perform the reduction there with GPU acceleration.

@dantegd dantegd requested a review from hcho3 January 11, 2021 21:04
@JohnZed JohnZed requested review from JohnZed and removed request for hcho3 January 14, 2021 17:14
@drobison00 (Contributor) left a comment

Looks reasonable to me.

Have you tested in a MNMG environment?

@JohnZed (Contributor) left a comment

The training code looks really nice and clean. Per discussion, I think it's worth moving the prediction code over to predict_proba + average + vote for classification, to ensure no change in predictions. I'd really hope we can get it all onto GPU too with dask.array/CuPy arrays; if mean isn't working on GPU for the reduction you need, it may be worth conversing with the Dask team.
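A toy illustration of why the predict_proba route matters: reducing per-worker hard labels by majority vote can disagree with averaging probabilities first (which is what the full forest effectively does). The numbers below are invented for the example:

```python
import numpy as np

# Per-worker class probabilities for a single row (binary problem),
# each worker holding the same number of trees. Numbers are invented.
worker_probas = np.array([
    [0.55, 0.45],   # worker 1 -> hard label 0
    [0.55, 0.45],   # worker 2 -> hard label 0
    [0.05, 0.95],   # worker 3 -> hard label 1
])

# Majority vote over per-worker hard labels:
hard_labels = worker_probas.argmax(axis=1)          # [0, 0, 1]
vote = np.bincount(hard_labels).argmax()            # -> class 0

# Averaging probabilities first, then taking argmax:
proba_label = worker_probas.mean(axis=0).argmax()   # -> class 1

# vote != proba_label: averaging probabilities reproduces the
# full-forest behaviour; voting on per-worker labels does not.
```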

Resolved review threads (outdated): python/cuml/dask/ensemble/base.py (×5)
def _func_predict(model, input_data, **kwargs):
    X = concatenate(input_data)
    with cuml.using_output_type("numpy"):
        prediction = model.predict(X, **kwargs)
    return prediction
Inline review comment:

Per discussion, for classification this should probably use predict_proba, and then we take the per-class average (perhaps building a 3-D array of shape n_rows × n_classes × n_workers and averaging over the third dimension). We'd also want to weight by the number of trees per worker, since these can differ a bit (e.g. training 101 trees on 10 workers would leave one worker with an extra tree). The same approach (weighted average reduction) should work for regression, treating it as n_classes = 1.
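The suggested reduction could be sketched as follows (names are illustrative and this is a sketch of the idea, not the cuML implementation):

```python
import numpy as np

def reduce_proba(worker_probas, trees_per_worker):
    """Tree-count-weighted average of per-worker predict_proba outputs.

    worker_probas: list of (n_rows, n_classes) arrays, one per worker.
    trees_per_worker: number of trees each worker's partial model holds.
    """
    stacked = np.stack(worker_probas, axis=2)   # (n_rows, n_classes, n_workers)
    w = np.asarray(trees_per_worker, dtype=np.float64)
    w /= w.sum()                                # normalise tree counts to weights
    return (stacked * w).sum(axis=2)            # weighted average over workers

# Worker A holds 3 trees, worker B holds 1 (tree counts per worker can
# differ), so A's probabilities get 3x the weight:
out = reduce_proba(
    [np.array([[0.9, 0.1]]), np.array([[0.5, 0.5]])],
    trees_per_worker=[3, 1],
)
# out -> [[0.8, 0.2]]
```

For regression the same function applies with a single "class" column holding the predicted values.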

Resolved review threads (outdated): python/cuml/dask/ensemble/base.py (×2), python/cuml/dask/ensemble/randomforestclassifier.py, python/cuml/dask/ensemble/randomforestregressor.py
@viclafargue

Looks reasonable to me.

Have you tested in a MNMG environment?

Thanks, only MG for now. I need to do this.

@github-actions github-actions bot added the Cython / Python Cython or Python issue label Jan 21, 2021
@viclafargue viclafargue added 4 - Waiting on Reviewer Waiting for reviewer to review or respond improvement Improvement / enhancement to an existing function non-breaking Non-breaking change 4 - Waiting on Author Waiting for author to respond to review and removed 3 - Ready for Review Ready for review by team 4 - Waiting on Reviewer Waiting for reviewer to review or respond labels Jan 27, 2021
@drobison00

@viclafargue Looks good to me.

@viclafargue

@viclafargue Looks good to me.

Thank you for the review! I'll test the PR in an MNMG setting. Unfortunately, it's a little bit hard to set one up at the moment because of technical problems. I'll keep you updated.

@JohnZed JohnZed added this to PR-WIP in v0.18 Release via automation Feb 8, 2021
@viclafargue viclafargue changed the base branch from branch-0.18 to branch-0.19 February 9, 2021 09:34
@JohnZed JohnZed removed this from PR-WIP in v0.18 Release Feb 9, 2021
@JohnZed JohnZed added this to PR-WIP in v0.19 Release via automation Feb 9, 2021
@viclafargue viclafargue requested review from a team as code owners March 5, 2021 16:49
@codecov-io
Codecov Report

Merging #3349 (7cc2495) into branch-0.19 (6dfff66) will increase coverage by 1.67%.
The diff coverage is 95.24%.

Impacted file tree graph

@@               Coverage Diff               @@
##           branch-0.19    #3349      +/-   ##
===============================================
+ Coverage        79.21%   80.89%   +1.67%     
===============================================
  Files              226      227       +1     
  Lines            17900    17796     -104     
===============================================
+ Hits             14180    14396     +216     
+ Misses            3720     3400     -320     
Flag Coverage Δ
dask 45.49% <75.17%> (+1.74%) ⬆️
non-dask 72.90% <82.42%> (+1.42%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
python/cuml/dask/cluster/dbscan.py 97.29% <ø> (+0.07%) ⬆️
python/cuml/dask/linear_model/elastic_net.py 100.00% <ø> (ø)
python/cuml/datasets/arima.pyx 97.56% <ø> (ø)
python/cuml/datasets/blobs.py 77.27% <ø> (ø)
python/cuml/datasets/regression.pyx 98.11% <ø> (ø)
python/cuml/decomposition/incremental_pca.py 94.70% <ø> (ø)
python/cuml/feature_extraction/_tfidf.py 94.73% <ø> (+0.11%) ⬆️
python/cuml/metrics/pairwise_distances.pyx 98.83% <ø> (ø)
python/cuml/neighbors/__init__.py 100.00% <ø> (ø)
python/cuml/preprocessing/LabelEncoder.py 94.73% <ø> (ø)
... and 94 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update bacb05e...7cc2495. Read the comment docs.

@JohnZed JohnZed self-assigned this Mar 16, 2021
@JohnZed JohnZed added 4 - Waiting on Reviewer Waiting for reviewer to review or respond and removed 4 - Waiting on Author Waiting for author to respond to review labels Mar 16, 2021
v0.19 Release automation moved this from PR-WIP to PR-Needs review Mar 17, 2021
@JohnZed (Contributor) left a comment

Looking good... I just had a few suggestions, then I'll approve quickly. The weighting scheme via reduction looks good, but it may be non-obvious to readers of the code, so it would be good to beef up the comments there a bit. A few small test suggestions too. Otherwise great!

Resolved review threads: python/cuml/dask/ensemble/base.py (×2), python/cuml/dask/ensemble/randomforestregressor.py, python/cuml/test/dask/test_random_forest.py (×2)
@JohnZed (Contributor) left a comment

Looks great, Victor!

v0.19 Release automation moved this from PR-Needs review to PR-Reviewer approved Mar 25, 2021

JohnZed commented Mar 25, 2021

@gpucibot merge

@rapids-bot rapids-bot bot merged commit 9244037 into rapidsai:branch-0.19 Mar 25, 2021
v0.19 Release automation moved this from PR-Reviewer approved to Done Mar 25, 2021
Labels
4 - Waiting on Reviewer Waiting for reviewer to review or respond Cython / Python Cython or Python issue improvement Improvement / enhancement to an existing function non-breaking Non-breaking change
Projects
v0.19 Release (Done)
Development

Successfully merging this pull request may close these issues.

None yet

5 participants