
Improve computation of final betas #43

Open

PhilippKaniuth opened this issue Mar 20, 2023 · 2 comments
Labels: enhancement (Code improvement)

Comments

@PhilippKaniuth (Member)

Possibly change how the optionally returned final betas are computed to make them more useful for downstream analyses. For that, change frrsa/frrsa/fitting/fitting/final_model so that it runs a repeated cross-validation to find the best hyperparameter for the whole dataset. It is unclear how to deal with several hyperparameters, i.e., whether averaging them is actually sensible. The best course of action might also depend on the kind of hyperparameter, i.e., whether one uses fractional ridge regression or the more classical approach as in sklearn (this depends on how one sets the parameter nonnegative).
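A minimal sketch of what such a repeated CV could look like, assuming the classical sklearn-style ridge path (i.e., tuning alpha rather than a fraction); `best_alpha_repeated_cv` and all variable names are hypothetical, not frrsa's actual API:

```python
# Sketch: pick a hyperparameter for the whole dataset via repeated CV,
# then refit the final model on all data. Hypothetical names throughout.
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import RepeatedKFold, cross_val_score

def best_alpha_repeated_cv(X, y, alphas, n_splits=5, n_repeats=10, seed=0):
    """Return the alpha with the highest mean CV score over repeated splits."""
    cv = RepeatedKFold(n_splits=n_splits, n_repeats=n_repeats, random_state=seed)
    mean_scores = [
        cross_val_score(Ridge(alpha=a), X, y, cv=cv).mean() for a in alphas
    ]
    return alphas[int(np.argmax(mean_scores))]

# The final betas would then come from a refit on the whole dataset:
# alpha = best_alpha_repeated_cv(X, y, alphas=np.logspace(-3, 3, 13))
# final_model = Ridge(alpha=alpha).fit(X, y)
```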

PRs for this issue welcome.

PhilippKaniuth added the enhancement label Mar 20, 2023
@PhilippKaniuth (Member, Author)

For context: currently, the optionally returned betas are computed using the whole dataset. However, in order to do so, one needs the best hyperparameter for the whole dataset (for each target separately, if there is more than one). This best hyperparameter is currently only calculated ad hoc, by averaging the hyperparameters from all outer cross-validation folds. Note that the hyperparameter from an outer fold is based only on a sub-subset of the whole data (since it is estimated in the inner cross-validation), so this average is not guaranteed to be the actual best hyperparameter for the whole dataset.
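Schematically, the current ad-hoc step amounts to something like this (a sketch only; the array contents are made up and the per-fold collection is assumed, not frrsa's actual code):

```python
import numpy as np

# hyperparams_per_fold: shape (n_outer_folds, n_targets), each value
# estimated in the inner CV of one outer fold (hypothetical numbers).
hyperparams_per_fold = np.array([[0.1, 10.0],
                                 [0.3,  3.0],
                                 [1.0, 30.0]])

# Averaging across folds gives one value per target, but each fold's
# value was tuned on only a sub-subset of the data, so the mean need
# not be the best hyperparameter for the whole dataset.
final_hyperparams = hyperparams_per_fold.mean(axis=0)
```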

@PhilippKaniuth (Member, Author) commented Mar 29, 2023

If the aim is to generalize betas to a second target, there might be a (more feasible) alternative:

The user submits two (or several) targets. One of them is used to fit the statistical model. The others are simply correlated with the reweighted predicting matrix, essentially generalizing the model. This should probably be done in the function fit_and_score, as in the sketch below.
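A minimal sketch of that alternative, assuming ridge as the statistical model and Pearson correlation for scoring; `fit_and_generalize` and the `y_fit` / `y_generalize` names are hypothetical, not frrsa's actual API:

```python
import numpy as np
from scipy.stats import pearsonr
from sklearn.linear_model import Ridge

def fit_and_generalize(X, y_fit, y_generalize, alpha=1.0):
    """Fit the model on one target, then correlate the reweighted
    predictions (the predicting matrix reweighted by the fitted betas)
    with each remaining target."""
    model = Ridge(alpha=alpha).fit(X, y_fit)
    reweighted = model.predict(X)  # X reweighted by the fitted betas
    return [pearsonr(reweighted, y_other)[0] for y_other in y_generalize]
```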

Moved to its own issue: #46.
