Fix RF slowdown with n_jobs=-1 #206

Merged
merged 2 commits into from
Nov 12, 2019

@jeremyliweishih (Contributor) commented Nov 12, 2019

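We were missing `n_jobs` in the estimator under `RFRegressorSelectFromModel`, so its random forest fell back to the default `n_jobs` instead of using all cores. I also changed every `n_jobs` default to `-1` for good measure. Integration test results (current branch vs. v0.5.0) below: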
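A minimal sketch of the pattern the fix restores, assuming a feature selector built on scikit-learn's `SelectFromModel` around a `RandomForestRegressor`. This is not evalml's actual `RFRegressorSelectFromModel` code: the helper `make_rf_regressor_selector` is hypothetical, `percent_features` mirrors the parameter printed in the logs, and turning it into `max_features` with a ceiling is an assumption.

```python
import math

import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.feature_selection import SelectFromModel


def make_rf_regressor_selector(n_features, percent_features=0.5, n_estimators=100,
                               max_depth=None, n_jobs=-1, random_state=0):
    """Hypothetical helper: random-forest-based feature selector.

    The point of the sketch is that n_jobs must be forwarded to the inner
    RandomForestRegressor; if it is dropped, the forest uses its own default
    (n_jobs=None, i.e. one job outside a joblib parallel_backend context).
    """
    estimator = RandomForestRegressor(
        n_estimators=n_estimators,
        max_depth=max_depth,
        n_jobs=n_jobs,  # the argument that was missing in the bug described above
        random_state=random_state,
    )
    # Assumed mapping: keep the top percent_features fraction of columns by importance.
    max_features = max(1, math.ceil(percent_features * n_features))
    # threshold=-np.inf makes max_features the only selection criterion.
    return SelectFromModel(estimator=estimator, max_features=max_features, threshold=-np.inf)


X, y = make_regression(n_samples=200, n_features=10, random_state=0)
selector = make_rf_regressor_selector(n_features=X.shape[1], percent_features=0.5)
X_selected = selector.fit_transform(X, y)
print(selector.estimator_.n_jobs, X_selected.shape)
```

`SelectFromModel` has no `n_jobs` parameter of its own, so any parallelism in the forest fit has to come from the wrapped estimator's setting; that is why a dropped `n_jobs` silently serializes the selector.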

py37 Comparison:

Current branch:

```
Using random seed: 0
*****************************
* Beginning pipeline search *
*****************************

Optimizing for R2. Greater score is better.

Searching up to 10 pipelines.
Possible model types: linear_model, random_forest

✔ Random Forest Regressor w/ One Hot ...     0%|          | Elapsed:00:49
✔ Random Forest Regressor w/ One Hot ...    10%|█         | Elapsed:01:08
▹ Linear Regressor w/ One Hot Encoder...    20%|██        | Elapsed:01:08
Mean Squared Logarithmic Error cannot be used when targets contain negative values.
Mean Squared Logarithmic Error cannot be used when targets contain negative values.
✔ Linear Regressor w/ One Hot Encoder...    20%|██        | Elapsed:01:11
✔ Random Forest Regressor w/ One Hot ...    30%|███       | Elapsed:01:34
✔ Random Forest Regressor w/ One Hot ...    40%|████      | Elapsed:02:02
✔ Random Forest Regressor w/ One Hot ...    50%|█████     | Elapsed:02:32
✔ Random Forest Regressor w/ One Hot ...    60%|██████    | Elapsed:03:45
✔ Random Forest Regressor w/ One Hot ...    70%|███████   | Elapsed:04:43
✔ Random Forest Regressor w/ One Hot ...    80%|████████  | Elapsed:04:58
▹ Linear Regressor w/ One Hot Encoder...    90%|█████████ | Elapsed:04:58
Mean Squared Logarithmic Error cannot be used when targets contain negative values.
Mean Squared Logarithmic Error cannot be used when targets contain negative values.
Mean Squared Logarithmic Error cannot be used when targets contain negative values.
✔ Linear Regressor w/ One Hot Encoder...    90%|█████████ | Elapsed:05:01
✔ Linear Regressor w/ One Hot Encoder...   100%|██████████| Elapsed:05:01

✔ Optimization finished
************************
* Pipeline Description *
************************

Pipeline Name: Random Forest Regressor w/ One Hot Encoder + Simple Imputer + RF Regressor Select From Model
Model type: ModelTypes.RANDOM_FOREST
Objective: R2 (greater is better)
Total training time (including CV): 73.5 seconds

Parameters
==========
• n_estimators: 926
• max_depth: 20
• impute_strategy: median
• percent_features: 0.957583607363516

Cross Validation
=================
               R2        MAE              MSE  MSLE  MedianAE    MaxError  ExpVariance # Training # Testing
0           0.813 106669.717 105303801733.075 0.038 57287.997 7653940.019        0.813   4666.000  2334.000
1           0.846  90229.448  35308763568.433 0.034 57907.394 5485624.217        0.846   4667.000  2333.000
2           0.893  97391.557  30452872685.617 0.043 63516.589 2127794.844        0.893   4667.000  2333.000
mean        0.851  98096.907  57021812662.375 0.039 59570.660 5089119.694        0.851          -         -
std         0.040   8242.800  41883860501.796 0.004  3431.280 2784327.926        0.040          -         -
coef of var 0.047      0.084            0.735 0.113     0.058       0.547        0.047          -         -
```
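The summary rows of the cross-validation table can be recomputed from the three per-fold rows. The sketch below assumes they are the mean, the sample standard deviation (n - 1 denominator), and the coefficient of variation std / mean; checked against the R2 and MAE columns, this reproduces the printed mean, std, and coef of var values. The helper name `summarize` is illustrative only, not an evalml function.

```python
import statistics

# Per-fold R2 and MAE values copied from the "Current branch" cross-validation table.
r2 = [0.813, 0.846, 0.893]
mae = [106669.717, 90229.448, 97391.557]


def summarize(values):
    """Return (mean, sample std, coefficient of variation) for one metric column."""
    mean = statistics.mean(values)
    std = statistics.stdev(values)  # sample standard deviation (n - 1 denominator)
    return mean, std, std / mean


for name, values in [("R2", r2), ("MAE", mae)]:
    mean, std, cv = summarize(values)
    print(f"{name}: mean={mean:.3f} std={std:.3f} coef of var={cv:.3f}")
```

Using the population standard deviation instead (`statistics.pstdev`) would give about 0.033 for R2, which does not match the table, so the n - 1 form appears to be what the report uses.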

v0.5.0:

```
Using random seed: 0
*****************************
* Beginning pipeline search *
*****************************

Optimizing for R2. Greater score is better.

Searching up to 10 pipelines. No time limit is set. Set one using max_time parameter.

Possible model types: random_forest

✔ Random Forest w/ imputation:               0%|          | Elapsed:00:40
✔ Random Forest w/ imputation:              10%|█         | Elapsed:00:58
✔ Random Forest w/ imputation:              20%|██        | Elapsed:01:21
✔ Random Forest w/ imputation:              30%|███       | Elapsed:01:51
✔ Random Forest w/ imputation:              40%|████      | Elapsed:02:24
✔ Random Forest w/ imputation:              50%|█████     | Elapsed:03:35
✔ Random Forest w/ imputation:              60%|██████    | Elapsed:04:33
✔ Random Forest w/ imputation:              70%|███████   | Elapsed:04:45
✔ Random Forest w/ imputation:              80%|████████  | Elapsed:04:55
✔ Random Forest w/ imputation:              90%|█████████ | Elapsed:05:26
✔ Random Forest w/ imputation:             100%|██████████| Elapsed:05:26
/home/ubuntu/evalml-integration-tests/.tox/py37/lib/python3.7/site-packages/sklearn/externals/joblib/__init__.py:15: DeprecationWarning: sklearn.externals.joblib is deprecated in 0.21 and will be removed in 0.23. Please import this functionality directly from joblib, which can be installed with: pip install joblib. If this warning is raised when loading pickled models, you may need to re-serialize those models with scikit-learn 0.21+.
  warnings.warn(msg, category=DeprecationWarning)


✔ Optimization finished
************************
* Pipeline Description *
************************

Pipeline Name: Random Forest w/ imputation
Model type: ModelTypes.RANDOM_FOREST
Objective: R2 (greater is better)
Total training time (including CV): 70.9 seconds

Parameters
==========
• n_estimators: 926
• max_depth: 20
• impute_strategy: median
• percent_features: 0.957583607363516

Cross Validation
=================
               R2        MAE              MSE  MSLE  MedianAE    MaxError  ExpVariance # Training # Testing
0           0.813 106669.717 105303801733.075 0.038 57287.997 7653940.019        0.813   4666.000  2334.000
1           0.846  90229.448  35308763568.433 0.034 57907.394 5485624.217        0.846   4667.000  2333.000
2           0.893  97391.557  30452872685.617 0.043 63516.589 2127794.844        0.893   4667.000  2333.000
mean        0.851  98096.907  57021812662.375 0.039 59570.660 5089119.694        0.851          -         -
std         0.040   8242.800  41883860501.796 0.004  3431.280 2784327.926        0.040          -         -
```

[n_jobs_logs.txt](https://github.com/FeatureLabs/evalml/files/3837315/n_jobs_logs.txt)

@codecov bot commented Nov 12, 2019

Codecov Report

Merging #206 into master will not change coverage.
The diff coverage is n/a.


```diff
@@           Coverage Diff           @@
##           master     #206   +/-   ##
=======================================
  Coverage   96.73%   96.73%
=======================================
  Files          91       91
  Lines        2298     2298
=======================================
  Hits         2223     2223
  Misses         75       75
```

| Impacted Files | Coverage Δ |
|---|---|
| ...feature_selection/rf_regressor_feature_selector.py | 100% <ø> (ø) ⬆️ |
| evalml/pipelines/classification/random_forest.py | 100% <ø> (ø) ⬆️ |
| ...eature_selection/rf_classifier_feature_selector.py | 100% <ø> (ø) ⬆️ |
| evalml/tests/pipeline_tests/test_pipelines.py | 100% <ø> (ø) ⬆️ |
| evalml/pipelines/regression/random_forest.py | 100% <ø> (ø) ⬆️ |
| evalml/pipelines/classification/xgboost.py | 100% <ø> (ø) ⬆️ |

Continue to review full report at Codecov.

Legend
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 33c310b...4e61a1b.

@angela97lin (Contributor) left a comment

Looks good! Glad you found the bug :D

@kmax12 (Contributor) left a comment

Good catch. LGTM

@jeremyliweishih merged commit 116392b into master Nov 12, 2019
jeremyliweishih added a commit that referenced this pull request Nov 13, 2019
@angela97lin mentioned this pull request Nov 15, 2019
@dsherry deleted the njobs-fix branch May 26, 2020 21:07