Bayesian regularization support in DALEX #445

Open
asheetal opened this issue Aug 9, 2021 · 3 comments
Labels
question ❔ Further information is requested

Comments

@asheetal

asheetal commented Aug 9, 2021

In some social science fields, large datasets do not exist, and researchers must make decisions from a small number of samples (the p >> n problem: many more predictors than observations).
It is good to see support for Bayesian regularization in R (the tfprobability and brnn packages).
Does the DALEX team have any thoughts or comments on this?

@hbaniecki added the "question ❔" label (Further information is requested) on Aug 9, 2021
@pbiecek
Member

pbiecek commented Jan 15, 2022

@asheetal the size of the data should not matter for the implemented XAI techniques (neither local nor global),
but let's try. Do you have any trained models to test with?

@asheetal
Author

In a recent experiment with p >> n, I did the following:

create a p x l array (p = number of predictors, l = number of runs, 1000 below)
for (i in 1:1000) {
     randomize the seed
     build a keras model
     generate the variable importance ranking with DALEX
     for each predictor, append its rank from DALEX to that predictor's list
}
sort the predictors by how many times each was ranked first, then by how many times it was ranked second, and so on

It did help. The final result was a histogram of ranks for each predictor. Had I run it only once (l = 1), I would have gotten completely inaccurate results.
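
The loop above can be sketched in a few lines. This is a minimal, library-free Python illustration and not the code used in the experiment: `importance_for_seed` is a hypothetical stand-in for "set the seed, build the keras model, compute the DALEX variable importance", and the predictor names, the `TRUE_IMPORTANCE` values, and the noise level are all made up. Only the tallying and sorting steps mirror the procedure described.

```python
import random
from collections import Counter

# Hypothetical predictors and importances; they only drive the stand-in below.
PREDICTORS = ["x1", "x2", "x3", "x4"]
TRUE_IMPORTANCE = {"x1": 1.0, "x2": 0.8, "x3": 0.3, "x4": 0.1}


def importance_for_seed(seed):
    """Stand-in for: set the seed, train a model, compute variable importance.

    Returns a noisy importance score per predictor (assumed noise model).
    """
    rng = random.Random(seed)
    return {p: TRUE_IMPORTANCE[p] + rng.gauss(0.0, 0.3) for p in PREDICTORS}


def ranks_from_scores(scores):
    """Convert scores to ranks, where rank 1 is the largest importance."""
    ordered = sorted(scores, key=scores.get, reverse=True)
    return {p: rank for rank, p in enumerate(ordered, start=1)}


def aggregate_ranks(n_runs):
    """Tally, for each predictor, how often it received each rank."""
    rank_counts = {p: Counter() for p in PREDICTORS}
    for seed in range(n_runs):
        for p, rank in ranks_from_scores(importance_for_seed(seed)).items():
            rank_counts[p][rank] += 1
    return rank_counts


def final_order(rank_counts):
    """Sort by number of rank-1 results, then rank-2 results, and so on."""
    n_ranks = len(rank_counts)

    def key(p):
        # Negated counts so that more frequent high ranks sort first.
        return tuple(-rank_counts[p][r] for r in range(1, n_ranks + 1))

    return sorted(rank_counts, key=key)


rank_counts = aggregate_ranks(1000)
order = final_order(rank_counts)
for p in order:
    # One histogram row per predictor: counts for rank 1, 2, 3, 4.
    print(p, [rank_counts[p][r] for r in range(1, len(PREDICTORS) + 1)])
```

Calling `aggregate_ranks(1)` instead reproduces the single-run case (l = 1), where each predictor's histogram collapses to one rank and the noise in that one run decides the whole ordering.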

@asheetal
Author

I forgot to add: the problem is not within DALEX. The problem is the model itself. For p >> n, the model must be Bayesian (probabilistic), so DALEX would need to work in conjunction with models from tfprobability and similar packages. The variable importance is then not a single rank but a probabilistic range of ranks. The researcher can then decide how to infer the final rank: median, max, min, or overlap between ranges.
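
To make "a range of ranks" concrete, here is a small sketch in plain Python. The rank samples are invented for illustration (they do not come from any model), and using min/max as the range is only one possible choice; a quantile interval would work the same way. `summarize` and `ranges_overlap` are hypothetical helper names, not DALEX or tfprobability functions.

```python
from statistics import median

# Hypothetical rank samples per predictor, e.g. one rank per posterior draw
# or per repeated model fit. The values are made up for illustration.
rank_samples = {
    "x1": [1, 1, 2, 1, 1],
    "x2": [2, 2, 1, 2, 3],
    "x3": [3, 4, 3, 3, 2],
    "x4": [4, 3, 4, 4, 4],
}


def summarize(samples):
    """Reduce each predictor's rank samples to min, median, and max."""
    return {
        p: {"min": min(r), "median": median(r), "max": max(r)}
        for p, r in samples.items()
    }


def ranges_overlap(a, b):
    """True if the [min, max] rank ranges of two predictors intersect."""
    return a["min"] <= b["max"] and b["min"] <= a["max"]


summary = summarize(rank_samples)

# One possible point ranking: order predictors by their median rank.
by_median = sorted(summary, key=lambda p: summary[p]["median"])

for p in by_median:
    s = summary[p]
    print(p, "median:", s["median"], "range:", (s["min"], s["max"]))

# Predictors whose ranges overlap cannot be cleanly separated by rank.
print("x1 vs x2 overlap:", ranges_overlap(summary["x1"], summary["x2"]))
print("x1 vs x4 overlap:", ranges_overlap(summary["x1"], summary["x4"]))
```

Ordering by median gives a single ranking, while the overlap check flags pairs of predictors whose order should not be over-interpreted.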
