
Add multi-output regression support for CascadeForestRegressor #40

Merged: 5 commits into LAMDA-NJU:master, Feb 22, 2021

Conversation

Alex-Medium (Contributor)

No description provided.

xuyxu (Member) commented on Feb 21, 2021

Thanks for your PR @Alex-Medium, I will take a careful look soon.

xuyxu (Member) commented on Feb 21, 2021

cc @tczhao

This PR extends your contributions to CascadeForestRegressor; I would appreciate it very much if you could leave some comments on the code ;-)

tczhao (Contributor) commented on Feb 21, 2021

LGTM

tczhao (Contributor) commented on Feb 21, 2021

It'd be good if we could add tests that don't use `use_predictor` in tests/test_model_{type}.py.

Alex-Medium (Contributor, Author)

> It'd be good if we could add tests that don't use `use_predictor` in tests/test_model_{type}.py.

Thanks @tczhao, I will work on this.
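A minimal sketch of the shape such a test could take. Scikit-learn's RandomForestRegressor stands in here so the snippet runs without deepforest installed; in the actual test suite one would fit CascadeForestRegressor the same way, with no `use_predictor` configured. The function name and sizes are illustrative, not taken from the repository.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor  # stand-in for CascadeForestRegressor
from sklearn.model_selection import train_test_split


def test_multi_output_regression_without_predictor():
    # Toy multi-output regression problem with 2 targets.
    X, y = make_regression(
        n_samples=200, n_features=10, n_targets=2, random_state=0
    )
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    # No predictor stacked on top: only the forest itself is used.
    model = RandomForestRegressor(n_estimators=10, random_state=0)
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)

    # Predictions must match the multi-output target shape.
    assert y_pred.shape == y_test.shape


test_multi_output_regression_without_predictor()
```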

xuyxu (Member) left a review comment

Hi @Alex-Medium, the code looks good. I have an additional suggestion; please refer to the comment for details.

xuyxu (Member) commented on Feb 22, 2021

Thanks @Alex-Medium, we are closer to the merge ;-). In addition, could you modify the docstrings of CascadeForestRegressor? A good example to follow is RandomForestRegressor.
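For illustration, this is the kind of `y` documentation being requested, modeled on scikit-learn's RandomForestRegressor docstring style. The class below is a hypothetical stub, not the actual deepforest source:

```python
class CascadeForestRegressor:  # illustrative stub only
    def fit(self, X, y):
        """Build a cascade forest from the training set (X, y).

        Parameters
        ----------
        X : ndarray of shape (n_samples, n_features)
            The training input samples.
        y : ndarray of shape (n_samples,) or (n_samples, n_outputs)
            The target values (real numbers). Passing a 2-D ``y``
            enables multi-output regression.
        """
```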

xuyxu (Member) left a review comment

LGTM.

xuyxu (Member) commented on Feb 22, 2021

Looks all good. I will merge this PR after conducting some experiments.

LAMDA-NJU deleted a comment from the allcontributors bot on Feb 22, 2021
xuyxu (Member) commented on Feb 22, 2021

@all-contributors please add @Alex-Medium for code test

allcontributors (bot)

@xuyxu

I've put up a pull request to add @Alex-Medium! 🎉

xuyxu (Member) commented on Feb 22, 2021

Testing MSE on the Sarcos dataset:

  • Deep forest: 0.44325 (2 layers) | 0.35531 (1 layer)
  • Random forest: 0.39799
  • XGBoost (exact) + scikit-learn MultiOutputRegressor: 1.37665
  • XGBoost (hist) + scikit-learn MultiOutputRegressor: 1.34101
  • LightGBM + scikit-learn MultiOutputRegressor: 2.62116

The experiment results look promising: deep forest clearly outperforms the GBDT baselines. However, adding a second cascade layer deteriorates performance on this dataset, which may be worth investigating as future work.

import scipy.io as scio
from sklearn.metrics import mean_squared_error

from deepforest import CascadeForestRegressor
from sklearn.ensemble import RandomForestRegressor
from sklearn.multioutput import MultiOutputRegressor
from xgboost.sklearn import XGBRegressor
from lightgbm import LGBMRegressor


if __name__ == "__main__":

    # Sarcos inverse-dynamics data: 21 input features, 7 torque targets.
    train = scio.loadmat("sarcos_inv.mat")["sarcos_inv"]
    test = scio.loadmat("sarcos_inv_test.mat")["sarcos_inv_test"]

    X_train, y_train = train[:, :21], train[:, 21:]
    X_test, y_test = test[:, :21], test[:, 21:]

    # Deep forest: native multi-output support.
    model = CascadeForestRegressor(n_jobs=-1, verbose=2, random_state=0)
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    print("Testing MSE: {:.5f}".format(mean_squared_error(y_test, y_pred)))

    # Random forest: native multi-output support.
    model = RandomForestRegressor(n_estimators=800, n_jobs=-1, random_state=0)
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    print("Testing MSE: {:.5f}".format(mean_squared_error(y_test, y_pred)))

    # XGBoost (exact splits). MultiOutputRegressor fits one model per target,
    # so 800 // 7 estimators per target keeps the total comparable.
    single = XGBRegressor(n_estimators=800 // 7,
                          objective="reg:squarederror",
                          tree_method="exact",
                          n_jobs=-1)
    model = MultiOutputRegressor(single)
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    print("Testing MSE: {:.5f}".format(mean_squared_error(y_test, y_pred)))

    # XGBoost (histogram-based splits).
    single = XGBRegressor(n_estimators=800 // 7,
                          objective="reg:squarederror",
                          tree_method="hist",
                          n_jobs=-1)
    model = MultiOutputRegressor(single)
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    print("Testing MSE: {:.5f}".format(mean_squared_error(y_test, y_pred)))

    # LightGBM, also wrapped to fit one model per target.
    single = LGBMRegressor(boosting_type="gbdt",
                           n_estimators=800 // 7,
                           n_jobs=-1)
    model = MultiOutputRegressor(single)
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    print("Testing MSE: {:.5f}".format(mean_squared_error(y_test, y_pred)))

xuyxu merged commit 2189a9b into LAMDA-NJU:master on Feb 22, 2021