
Add multi-output regression support for CascadeForestRegressor #40

Merged: 5 commits into LAMDA-NJU:master, Feb 22, 2021

Conversation

Alex-Medium (Contributor)

No description provided.

xuyxu (Member) commented on Feb 21, 2021

Thanks for your PR @Alex-Medium, I will take a careful look soon.

xuyxu (Member) commented on Feb 21, 2021

cc @tczhao

This PR extends your contributions to CascadeForestRegressor; I would appreciate it very much if you could leave some comments on the code ;-)

tczhao (Contributor) commented on Feb 21, 2021

LGTM

tczhao (Contributor) commented on Feb 21, 2021

It'd be good if we could add tests that don't use `use_predictor` in tests/test_model_{type}.py.

Alex-Medium (Contributor, Author)

> It'd be good if we could add tests that don't use `use_predictor` in tests/test_model_{type}.py.

Thanks @tczhao, I will work on this.
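A minimal sketch of the shape such a test could take. Scikit-learn's RandomForestRegressor stands in here so the snippet runs without deepforest installed; in the actual test suite one would fit CascadeForestRegressor the same way, with no `use_predictor` configured. The function name and sizes are illustrative, not taken from the repository.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor  # stand-in for CascadeForestRegressor
from sklearn.model_selection import train_test_split


def test_multi_output_regression_without_predictor():
    # Toy multi-output regression problem with 2 targets.
    X, y = make_regression(
        n_samples=200, n_features=10, n_targets=2, random_state=0
    )
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    # No predictor stacked on top: only the forest itself is used.
    model = RandomForestRegressor(n_estimators=10, random_state=0)
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)

    # Predictions must match the multi-output target shape.
    assert y_pred.shape == y_test.shape


test_multi_output_regression_without_predictor()
```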

xuyxu (Member) left a review comment

Hi @Alex-Medium, the code looks good. I have an additional suggestion; please refer to the comment for details.

xuyxu (Member) commented on Feb 22, 2021

Thanks @Alex-Medium, we are closer to the merge ;-). In addition, could you modify the docstrings of CascadeForestRegressor? A good example to follow is RandomForestRegressor.
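For illustration, this is the kind of `y` documentation being requested, modeled on scikit-learn's RandomForestRegressor docstring style. The class below is a hypothetical stub, not the actual deepforest source:

```python
class CascadeForestRegressor:  # illustrative stub only
    def fit(self, X, y):
        """Build a cascade forest from the training set (X, y).

        Parameters
        ----------
        X : ndarray of shape (n_samples, n_features)
            The training input samples.
        y : ndarray of shape (n_samples,) or (n_samples, n_outputs)
            The target values (real numbers). Passing a 2-D ``y``
            enables multi-output regression.
        """
```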

xuyxu (Member) left a review comment

LGTM.

xuyxu (Member) commented on Feb 22, 2021

Looks all good. I will merge this PR after conducting some experiments.

LAMDA-NJU deleted a comment from the allcontributors bot on Feb 22, 2021
xuyxu (Member) commented on Feb 22, 2021

@all-contributors please add @Alex-Medium for code test

allcontributors (bot)

@xuyxu

I've put up a pull request to add @Alex-Medium! 🎉

xuyxu (Member) commented on Feb 22, 2021

Testing MSE on the Sarcos dataset:

  • Deep forest: 0.44325 (2 layers) | 0.35531 (1 layer)
  • Random forest: 0.39799
  • XGBoost (exact) + scikit-learn MultiOutputRegressor: 1.37665
  • XGBoost (hist) + scikit-learn MultiOutputRegressor: 1.34101
  • LightGBM + scikit-learn MultiOutputRegressor: 2.62116

The experiment results look promising: deep forest clearly outperforms the GBDT baselines. However, adding a second cascade layer deteriorates performance on this dataset, which may be worth investigating as future work.

import scipy.io as scio
from sklearn.metrics import mean_squared_error

from deepforest import CascadeForestRegressor
from sklearn.ensemble import RandomForestRegressor
from sklearn.multioutput import MultiOutputRegressor
from xgboost.sklearn import XGBRegressor
from lightgbm import LGBMRegressor


if __name__ == "__main__":

    # Sarcos inverse-dynamics data: 21 input features, 7 torque targets.
    train = scio.loadmat("sarcos_inv.mat")["sarcos_inv"]
    test = scio.loadmat("sarcos_inv_test.mat")["sarcos_inv_test"]

    X_train, y_train = train[:, :21], train[:, 21:]
    X_test, y_test = test[:, :21], test[:, 21:]

    # Deep forest: native multi-output support.
    model = CascadeForestRegressor(n_jobs=-1, verbose=2, random_state=0)
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    print("Testing MSE: {:.5f}".format(mean_squared_error(y_test, y_pred)))

    # Random forest: native multi-output support.
    model = RandomForestRegressor(n_estimators=800, n_jobs=-1, random_state=0)
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    print("Testing MSE: {:.5f}".format(mean_squared_error(y_test, y_pred)))

    # XGBoost (exact splits). MultiOutputRegressor fits one model per target,
    # so 800 // 7 estimators per target keeps the total comparable.
    single = XGBRegressor(n_estimators=800 // 7,
                          objective="reg:squarederror",
                          tree_method="exact",
                          n_jobs=-1)
    model = MultiOutputRegressor(single)
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    print("Testing MSE: {:.5f}".format(mean_squared_error(y_test, y_pred)))

    # XGBoost (histogram-based splits).
    single = XGBRegressor(n_estimators=800 // 7,
                          objective="reg:squarederror",
                          tree_method="hist",
                          n_jobs=-1)
    model = MultiOutputRegressor(single)
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    print("Testing MSE: {:.5f}".format(mean_squared_error(y_test, y_pred)))

    # LightGBM, also wrapped to fit one model per target.
    single = LGBMRegressor(boosting_type="gbdt",
                           n_estimators=800 // 7,
                           n_jobs=-1)
    model = MultiOutputRegressor(single)
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    print("Testing MSE: {:.5f}".format(mean_squared_error(y_test, y_pred)))

xuyxu merged commit 2189a9b into LAMDA-NJU:master on Feb 22, 2021