Fix RF slowdown with n_jobs=-1 #206

jeremyliweishih · 2019-11-12T16:05:04Z

The estimator inside RFRegressorSelectFromModel was not being passed n_jobs, so this passes it through. Also changed all defaults to n_jobs=-1 for good measure. Integration test results below:

py37 Comparison:

Current branch:

Using random seed: 0
*****************************
* Beginning pipeline search *
*****************************

Optimizing for R2. Greater score is better.

Searching up to 10 pipelines.
Possible model types: linear_model, random_forest

✔ Random Forest Regressor w/ One Hot ...     0%|          | Elapsed:00:49
✔ Random Forest Regressor w/ One Hot ...    10%|█         | Elapsed:01:08
▹ Linear Regressor w/ One Hot Encoder...    20%|██        | Elapsed:01:08
Mean Squared Logarithmic Error cannot be used when targets contain negative values.
Mean Squared Logarithmic Error cannot be used when targets contain negative values.
✔ Linear Regressor w/ One Hot Encoder...    20%|██        | Elapsed:01:11
✔ Random Forest Regressor w/ One Hot ...    30%|███       | Elapsed:01:34
✔ Random Forest Regressor w/ One Hot ...    40%|████      | Elapsed:02:02
✔ Random Forest Regressor w/ One Hot ...    50%|█████     | Elapsed:02:32
✔ Random Forest Regressor w/ One Hot ...    60%|██████    | Elapsed:03:45
✔ Random Forest Regressor w/ One Hot ...    70%|███████   | Elapsed:04:43
✔ Random Forest Regressor w/ One Hot ...    80%|████████  | Elapsed:04:58
▹ Linear Regressor w/ One Hot Encoder...    90%|█████████ | Elapsed:04:58
Mean Squared Logarithmic Error cannot be used when targets contain negative values.
Mean Squared Logarithmic Error cannot be used when targets contain negative values.
Mean Squared Logarithmic Error cannot be used when targets contain negative values.
✔ Linear Regressor w/ One Hot Encoder...    90%|█████████ | Elapsed:05:01
✔ Linear Regressor w/ One Hot Encoder...   100%|██████████| Elapsed:05:01

✔ Optimization finished
************************
* Pipeline Description *
************************

Pipeline Name: Random Forest Regressor w/ One Hot Encoder + Simple Imputer + RF Regressor Select From Model
Model type: ModelTypes.RANDOM_FOREST
Objective: R2 (greater is better)
Total training time (including CV): 73.5 seconds

Parameters
==========
• n_estimators: 926
• max_depth: 20
• impute_strategy: median
• percent_features: 0.957583607363516

Cross Validation
=================
               R2        MAE              MSE  MSLE  MedianAE    MaxError  ExpVariance # Training # Testing
0           0.813 106669.717 105303801733.075 0.038 57287.997 7653940.019        0.813   4666.000  2334.000
1           0.846  90229.448  35308763568.433 0.034 57907.394 5485624.217        0.846   4667.000  2333.000
2           0.893  97391.557  30452872685.617 0.043 63516.589 2127794.844        0.893   4667.000  2333.000
mean        0.851  98096.907  57021812662.375 0.039 59570.660 5089119.694        0.851          -         -
std         0.040   8242.800  41883860501.796 0.004  3431.280 2784327.926        0.040          -         -
coef of var 0.047      0.084            0.735 0.113     0.058       0.547        0.047          -         -

v0.5.0:

Using random seed: 0
*****************************
* Beginning pipeline search *
*****************************

Optimizing for R2. Greater score is better.

Searching up to 10 pipelines. No time limit is set. Set one using max_time parameter.

Possible model types: random_forest

✔ Random Forest w/ imputation:               0%|          | Elapsed:00:40
✔ Random Forest w/ imputation:              10%|█         | Elapsed:00:58
✔ Random Forest w/ imputation:              20%|██        | Elapsed:01:21
✔ Random Forest w/ imputation:              30%|███       | Elapsed:01:51
✔ Random Forest w/ imputation:              40%|████      | Elapsed:02:24
✔ Random Forest w/ imputation:              50%|█████     | Elapsed:03:35
✔ Random Forest w/ imputation:              60%|██████    | Elapsed:04:33
✔ Random Forest w/ imputation:              70%|███████   | Elapsed:04:45
✔ Random Forest w/ imputation:              80%|████████  | Elapsed:04:55
✔ Random Forest w/ imputation:              90%|█████████ | Elapsed:05:26
✔ Random Forest w/ imputation:             100%|██████████| Elapsed:05:26
/home/ubuntu/evalml-integration-tests/.tox/py37/lib/python3.7/site-packages/sklearn/externals/joblib/__init__.py:15: DeprecationWarning: sklearn.externals.joblib is deprecated in 0.21 and will be removed in 0.23. Please import this functionality directly from joblib, which can be installed with: pip install joblib. If this warning is raised when loading pickled models, you may need to re-serialize those models with scikit-learn 0.21+.
  warnings.warn(msg, category=DeprecationWarning)


✔ Optimization finished
************************
* Pipeline Description *
************************

Pipeline Name: Random Forest w/ imputation
Model type: ModelTypes.RANDOM_FOREST
Objective: R2 (greater is better)
Total training time (including CV): 70.9 seconds

Parameters
==========
• n_estimators: 926
• max_depth: 20
• impute_strategy: median
• percent_features: 0.957583607363516

Cross Validation
=================
               R2        MAE              MSE  MSLE  MedianAE    MaxError  ExpVariance # Training # Testing
0           0.813 106669.717 105303801733.075 0.038 57287.997 7653940.019        0.813   4666.000  2334.000
1           0.846  90229.448  35308763568.433 0.034 57907.394 5485624.217        0.846   4667.000  2333.000
2           0.893  97391.557  30452872685.617 0.043 63516.589 2127794.844        0.893   4667.000  2333.000
mean        0.851  98096.907  57021812662.375 0.039 59570.660 5089119.694        0.851          -         -
std         0.040   8242.800  41883860501.796 0.004  3431.280 2784327.926        0.040          -         -

[n_jobs_logs.txt](https://github.com/FeatureLabs/evalml/files/3837315/n_jobs_logs.txt)
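For readers unfamiliar with the bug described above, here is a minimal sketch of the idea using plain scikit-learn. It is not evalml's actual component code: the helper name `make_rf_selector` and its defaults are hypothetical, chosen only to show `n_jobs` being forwarded to the random forest wrapped by `SelectFromModel`.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.feature_selection import SelectFromModel


def make_rf_selector(n_estimators=10, max_depth=None, n_jobs=-1, random_state=0):
    """Hypothetical helper (not evalml's code): build a SelectFromModel whose
    inner random forest receives n_jobs explicitly."""
    estimator = RandomForestRegressor(
        n_estimators=n_estimators,
        max_depth=max_depth,
        # If this argument is omitted, scikit-learn falls back to its own
        # default (n_jobs=None), regardless of what the outer pipeline uses.
        n_jobs=n_jobs,
        random_state=random_state,
    )
    return SelectFromModel(estimator=estimator, threshold="median")


# Small synthetic dataset, only to exercise the selector.
X, y = make_regression(n_samples=200, n_features=10, n_informative=4, random_state=0)
selector = make_rf_selector(n_estimators=20, max_depth=5)
X_selected = selector.fit_transform(X, y)

# The fitted inner forest keeps the n_jobs value that was forwarded.
print(selector.estimator_.n_jobs)
print(X_selected.shape)
```

Checking `selector.estimator_.n_jobs` after fitting is a quick way to confirm the setting reached the nested estimator rather than stopping at the wrapper.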

codecov · 2019-11-12T16:16:56Z

Codecov Report

Merging #206 into master will not change coverage.
The diff coverage is n/a.

@@           Coverage Diff           @@
##           master     #206   +/-   ##
=======================================
  Coverage   96.73%   96.73%           
=======================================
  Files          91       91           
  Lines        2298     2298           
=======================================
  Hits         2223     2223           
  Misses         75       75

| Impacted Files | Coverage Δ | |
|---|---|---|
| ...feature_selection/rf_regressor_feature_selector.py | `100% <ø> (ø)` | ⬆️ |
| evalml/pipelines/classification/random_forest.py | `100% <ø> (ø)` | ⬆️ |
| ...eature_selection/rf_classifier_feature_selector.py | `100% <ø> (ø)` | ⬆️ |
| evalml/tests/pipeline_tests/test_pipelines.py | `100% <ø> (ø)` | ⬆️ |
| evalml/pipelines/regression/random_forest.py | `100% <ø> (ø)` | ⬆️ |
| evalml/pipelines/classification/xgboost.py | `100% <ø> (ø)` | ⬆️ |

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 33c310b...4e61a1b.

angela97lin

Looks good! Glad you found the bug :D

kmax12

Good catch. LGTM

jeremyliweishih added 2 commits November 12, 2019 11:03

Fix n_jobs for SFM and defaults to -1

b5ea14c

CL

4e61a1b

jeremyliweishih requested review from kmax12, angela97lin and christopherbunn November 12, 2019 17:08

angela97lin approved these changes Nov 12, 2019

kmax12 approved these changes Nov 12, 2019

jeremyliweishih merged commit 116392b into master Nov 12, 2019

jeremyliweishih added a commit that referenced this pull request Nov 13, 2019

Revert "Fix RF slowdown with n_jobs=-1 (#206)"

f4cc431

This reverts commit 116392b.

angela97lin mentioned this pull request Nov 15, 2019

v0.5.1 #216

Merged

dsherry deleted the njobs-fix branch May 26, 2020 21:07
