Sample Weight Support? #89

Open
kmedved opened this issue Feb 8, 2022 · 7 comments
Labels
enhancement New feature or request

Comments


kmedved commented Feb 8, 2022

Hello - thanks all for the very interesting looking package. The hierarchical shrinkage wrapper seems especially interesting/novel. Would it be possible to add sample weight support to this package? For background, sample weights are supported by many scikit-learn estimators (e.g., RandomForestRegressor or HistGradientBoostingRegressor) and are passed via the fit call, e.g., model.fit(X_train, y_train, sample_weight=w_train).
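For readers unfamiliar with the convention, here is a minimal sketch of the fit call described above. The synthetic data and the weight values are assumptions made up for illustration; only RandomForestRegressor and its sample_weight argument come from scikit-learn.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Synthetic data, invented purely for illustration.
rng = np.random.default_rng(0)
X_train = rng.normal(size=(200, 3))
y_train = X_train[:, 0] + 0.1 * rng.normal(size=200)

# One non-negative weight per row; larger values mean the row counts for more.
w_train = rng.uniform(0.5, 2.0, size=200)

model = RandomForestRegressor(n_estimators=20, random_state=0)
model.fit(X_train, y_train, sample_weight=w_train)
preds = model.predict(X_train)
```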

The purpose of sample weights is to increase the weighting of rows/observations based on some external criteria, typically related to how the training data was gathered. For example, if your data comes from sensors of varying sensitivity, you may increase the sample weight of the more reliable sensors. Alternatively, if your data is aggregated in some form, you can set the weights based on the aggregation (e.g., weekly data with a weight of 7, daily data with a weight of 1).

In terms of implementation, it's typically as simple as multiplying the loss for each row by its sample weight, so that rows with larger weights have more influence on the fit, although I'm not sure if the novel hierarchical shrinkage capabilities of this package would present complications.
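As a rough sketch of that idea, a weighted squared-error loss can be written as below. The function name weighted_mse and the example numbers are assumptions for illustration, not part of any library. The check at the end shows that an integer weight of k gives the same loss as repeating the row k times, which is the intuition behind the aggregation example.

```python
import numpy as np

def weighted_mse(y_true, y_pred, sample_weight):
    """Mean squared error where each row's loss is scaled by its weight."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    w = np.asarray(sample_weight, dtype=float)
    return np.sum(w * (y_true - y_pred) ** 2) / np.sum(w)

y_true = np.array([1.0, 2.0, 4.0])
y_pred = np.array([1.5, 2.0, 3.0])
w = np.array([7, 1, 1])  # e.g. one weekly row and two daily rows

loss_weighted = weighted_mse(y_true, y_pred, w)

# Same loss computed by physically repeating each row w[i] times.
y_true_rep = np.repeat(y_true, w)
y_pred_rep = np.repeat(y_pred, w)
loss_repeated = weighted_mse(y_true_rep, y_pred_rep, np.ones(len(y_true_rep)))
```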

Thanks again for the very interesting looking package. I look forward to testing and using it.

@csinva csinva added the enhancement New feature or request label Feb 8, 2022
Owner

csinva commented Feb 8, 2022

Hi @kmedved 👋, thanks for your interest in the package! Indeed, supporting sample weights seems like it would be useful, and especially interesting for hierarchical shrinkage. We'll add it very soon :)

Owner

csinva commented Jul 29, 2022

An update: some of the models (but not all) now support sample_weight, including FIGS, TAO, SLIM, CART, BoostedRules, SLIPPER, and SkopeRules. Still working on the others...

Collaborator

mepland commented Dec 29, 2022

Some parts of FIGS do not support sample_weight, including the extract_sklearn_tree_from_figs() function.

Author

kmedved commented Dec 30, 2022

Thanks for the work on this @csinva. Any update on getting sample weight support added for hierarchical shrinkage?

Owner

csinva commented Dec 30, 2022

@aagarwal1996 @yanshuotan Can someone add in sample-weight support for HS?

Collaborator

yanshuotan commented Jan 1, 2023

Actually HS already supports sample weights. sample_weight is fed into self.estimator_.fit() as an element of kwargs. For instance, see the following snippet:

[Screenshot of the code snippet: "Screen Shot 2023-01-01 at 1 58 23 PM"; image not reproduced here]

Furthermore, line 84 of the code uses weighted_n_node_samples to do shrinkage. When the original tree estimator is fit, it stores the weighted number of samples in each node in this array.
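To illustrate what that array holds, the sketch below fits a plain scikit-learn DecisionTreeRegressor with a constant weight of 2 per row and compares the unweighted and weighted counts at the root node. The data is synthetic and assumed for illustration; this shows scikit-learn's tree_ attributes only, not the shrinkage code itself.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Synthetic data, invented for illustration.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
y = X[:, 0] + 0.1 * rng.normal(size=100)
w = np.full(100, 2.0)  # every row counts twice

tree = DecisionTreeRegressor(max_depth=3, random_state=0)
tree.fit(X, y, sample_weight=w)

# n_node_samples counts rows; weighted_n_node_samples sums their weights.
root_count = tree.tree_.n_node_samples[0]
root_weighted = tree.tree_.weighted_n_node_samples[0]
```

Here the root node sees 100 rows but a weighted total of 200, so anything computed from weighted_n_node_samples reflects the weights passed to fit.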

I do agree that it may be beneficial to make sample_weight an explicit (optional) argument to fit. @csinva what do you think?
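A minimal sketch of what an explicit argument could look like is below. ShrinkageWrapperSketch is a hypothetical class written for this illustration and is not the package's implementation; the shrinkage step is omitted entirely, and only the forwarding of sample_weight to the wrapped estimator's fit is shown.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

class ShrinkageWrapperSketch:
    """Hypothetical wrapper; only shows how sample_weight is forwarded."""

    def __init__(self, estimator):
        self.estimator_ = estimator

    def fit(self, X, y, sample_weight=None, **kwargs):
        # Explicit argument: forward it only when the caller supplied one.
        if sample_weight is not None:
            kwargs["sample_weight"] = sample_weight
        self.estimator_.fit(X, y, **kwargs)
        # A real wrapper would apply shrinkage to the fitted tree here.
        return self

    def predict(self, X):
        return self.estimator_.predict(X)

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
y = X[:, 0] + 0.1 * rng.normal(size=100)
w = rng.uniform(0.5, 2.0, size=100)

wrapped = ShrinkageWrapperSketch(DecisionTreeRegressor(max_depth=3, random_state=0))
wrapped.fit(X, y, sample_weight=w)

# Fitting the same tree directly with the same weights should agree.
direct = DecisionTreeRegressor(max_depth=3, random_state=0).fit(X, y, sample_weight=w)
same_predictions = np.allclose(wrapped.predict(X), direct.predict(X))
```

Compared with passing the weights through kwargs alone, the named argument makes the option visible in the signature and in generated documentation.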

Owner

csinva commented Jan 3, 2023

Agreed, thanks Yan Shuo for adding HS sample_weight as an explicit argument in #156.

Should work now @kmedved!
