Is it possible to have features only for certain parameters? #244

Open
z-feldman opened this issue Mar 11, 2021 · 2 comments

Comments

@z-feldman

I've really enjoyed being able to look at feature importances and SHAP values for different parameters; it can be really insightful. To take it a step further, I've been wondering whether certain features can be specific to some parameters and not used in the estimation of others.

I was using an automated feature selection tool (with CatBoost subbed in, since the tool doesn't support NGBoost), and it dropped some features from the point-estimate prediction that were at the top of the feature importances for the variance parameter when using NGBoost. That got me wondering whether there could be a way to specify in the model "I want features [a,b,c] to predict parameter_1 but I want [c,d,e] to predict parameter_2".

I'm not sure how the estimation of each parameter works under the hood, so I don't know whether this is possible. Either way, love the package, thanks for the great work!

@alejandroschuler
Collaborator

That's definitely feasible. It would be very similar to the method currently used to (randomly) subsample columns at each boosting iteration. Instead of tracking which columns were used globally in each iteration, you'd track which columns were used per parameter in each iteration. So it's a little more "paperwork", so to speak, but doable.

see:
https://github.com/stanfordmlgroup/ngboost/blob/master/ngboost/ngboost.py#L134
https://github.com/stanfordmlgroup/ngboost/blob/master/ngboost/ngboost.py#L260
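To make the "paperwork" concrete, here is a minimal sketch of the bookkeeping only, assuming each parameter gets its own list of allowed columns. The names `sample_param_columns`, `take_columns`, `allowed_cols`, and `col_history` are hypothetical and are not part of NGBoost; the sketch does not fit any learners.

```python
import random

# Hypothetical helpers for illustration; none of this is NGBoost's actual API.


def sample_param_columns(allowed_cols, col_sample=1.0, rng=None):
    """Pick, for one boosting iteration, the columns each parameter's learner may see.

    allowed_cols: one list of column indices per distribution parameter.
    col_sample: fraction of each parameter's allowed columns to keep.
    """
    rng = rng or random.Random()
    chosen = []
    for cols in allowed_cols:
        # Keep at least one column per parameter.
        k = max(1, int(round(col_sample * len(cols))))
        chosen.append(sorted(rng.sample(cols, k)))
    return chosen


def take_columns(X, cols):
    """Restrict a row-major list-of-lists X to the given column indices."""
    return [[row[j] for j in cols] for row in X]


# Features a..e are columns 0..4:
# parameter 0 may use [a, b, c], parameter 1 may use [c, d, e].
allowed_cols = [[0, 1, 2], [2, 3, 4]]
rng = random.Random(0)

# One entry per boosting iteration, each holding one column list per parameter.
col_history = [
    sample_param_columns(allowed_cols, col_sample=0.67, rng=rng)
    for _ in range(3)
]

X = [
    [1.0, 2.0, 3.0, 4.0, 5.0],
    [6.0, 7.0, 8.0, 9.0, 10.0],
]

# The learner for (iteration 0, parameter 1) would see only its recorded columns,
# both when fitting and when predicting.
X_iter0_param1 = take_columns(X, col_history[0][1])
```

The stored `col_history` is what would have to be replayed at prediction time, so that each base learner receives the same columns it was fit on.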

Ultimately I'm not sure how much it would change the final predictions. Boosting models basically do their own feature selection, so there's usually not much point to doing it a priori unless you have a strong inductive bias you want to provide (but even then, it's usually easier to let the model figure it out for itself). You're already seeing evidence of this in the feature importances. NGBoost is choosing different features to predict each of the parameters because different features turn out to be more or less useful. Interpreting what that means (or more likely doesn't) is a whole different story, of course.

@z-feldman
Author

Thanks! I'll check that out. My specific problem is a time-series problem where I started out using rolling means and standard deviations, since my time series isn't super strong lol. So I planned on keeping each rolling statistic only for its respective parameter. I agree that this usually wouldn't be necessary, and I'm going to keep the other, non-rolling variables for both parameters since there's no strong prior on those. Thanks again!
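For readers unfamiliar with the rolling features mentioned here, a minimal sketch follows. It assumes a trailing window that excludes the current value (an assumption made to avoid leaking the target into its own features, not something stated in the thread), and the name `rolling_features` is hypothetical.

```python
from statistics import mean, stdev


def rolling_features(series, window):
    """Trailing rolling mean and sample standard deviation.

    The value at position t is computed from series[t - window:t] only, so the
    current observation is excluded. Positions without a full window get None.
    The window must be at least 2 for the sample standard deviation to exist.
    """
    means, stds = [], []
    for t in range(len(series)):
        if t < window:
            means.append(None)
            stds.append(None)
        else:
            past = series[t - window:t]
            means.append(mean(past))
            stds.append(stdev(past))
    return means, stds


y = [1.0, 2.0, 3.0, 4.0, 5.0]
roll_mean, roll_std = rolling_features(y, window=3)
```

Under the setup described above, the rolling mean would presumably be offered only to the location parameter and the rolling standard deviation only to the scale parameter, with the remaining features shared by both.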
