Add the choice between Mean/Std and Median/IQR #32

Closed
glevv opened this issue Jun 14, 2020 · 5 comments

@glevv

glevv commented Jun 14, 2020

Median and IQR could be more robust and useful if the distribution of importances is not normal.

Something like this:

import numpy as np
from scipy import stats
importance_df["importance_md"] = np.median(lofo_cv_scores_normalized, axis=1)  # np.median also works on plain arrays
importance_df["importance_iqr"] = stats.iqr(lofo_cv_scores_normalized, axis=1)
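To make the difference concrete, here is a minimal sketch on made-up numbers. It assumes `lofo_cv_scores_normalized` is a 2-D array with one row per feature and one column per fold; the feature names `f_a` and `f_b` and the scores are hypothetical. The IQR is computed with `np.percentile`, which should agree with `stats.iqr` under its default settings.

```python
import numpy as np
import pandas as pd

# Toy stand-in for lofo_cv_scores_normalized: rows = features, columns = folds.
scores = np.array([
    [0.10, 0.12, 0.11, 0.50],  # one outlying fold pulls the mean up
    [0.02, 0.03, 0.01, 0.02],
])

importance_df = pd.DataFrame({"feature": ["f_a", "f_b"]})

# Existing summary: mean and standard deviation across folds.
importance_df["importance_mean"] = scores.mean(axis=1)
importance_df["importance_std"] = scores.std(axis=1)

# Proposed robust summary: median and interquartile range across folds.
importance_df["importance_md"] = np.median(scores, axis=1)
q75, q25 = np.percentile(scores, [75, 25], axis=1)
importance_df["importance_iqr"] = q75 - q25

print(importance_df)
```

For `f_a`, the single outlying fold moves the mean well above the median, which is the case where the robust summary is more informative.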

Also, plot_importance could offer a choice between plain error bars and a 95% CI.

For std it would be:

importance_df.plot(x="feature",
                   y="importance_mean",
                   xerr=1.96 * importance_df.importance_std,
                   kind='barh',
                   color=importance_df["color"],
                   figsize=figsize)

and for IQR:

importance_df.plot(x="feature",
                   y="importance_md",
                   # n: num_sampling for FLOFO, number of folds for LOFO
                   xerr=1.57 * importance_df.importance_iqr / np.sqrt(n),
                   kind='barh',
                   color=importance_df["color"],
                   figsize=figsize)
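A small sketch of how the two error-bar widths could be computed side by side, assuming a 2-D score array with one row per feature and one column per fold (or per FLOFO sample). The helper name `error_bar_half_widths` and the numbers are hypothetical. The `1.57 * IQR / sqrt(n)` term is the same approximation boxplot notches use for an interval around the median. Note that `1.96 * std` describes the spread of the individual fold scores; the mean-based analogue of the median interval would also divide by `sqrt(n)`, so both are returned here.

```python
import numpy as np

def error_bar_half_widths(scores):
    """Return candidate xerr values for a (features x folds) score array."""
    scores = np.asarray(scores, dtype=float)
    n = scores.shape[1]  # number of folds or samples per feature

    std = scores.std(axis=1)
    q75, q25 = np.percentile(scores, [75, 25], axis=1)
    iqr = q75 - q25

    return {
        "std_spread": 1.96 * std,                # spread of the fold scores
        "mean_ci": 1.96 * std / np.sqrt(n),      # approx. 95% CI of the mean
        "median_ci": 1.57 * iqr / np.sqrt(n),    # notch-style interval for the median
    }

widths = error_bar_half_widths([
    [0.10, 0.12, 0.11, 0.50],
    [0.02, 0.03, 0.01, 0.02],
])
print(widths)
```

Any of the three arrays could be passed as `xerr` in the plotting calls above, depending on which interval the plot is meant to show.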
@aerdem4
Owner

aerdem4 commented Jun 14, 2020

Nice suggestion, @glevv. Maybe a more generic solution is to return the individual fold scores in the LOFO output and to add a boxplot option like:

df.set_index("feature").T.boxplot(column=features, vert=False)

What do you think?
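For reference, a sketch of the data layout that one-liner relies on. It assumes the output has a `feature` column plus one column per fold; the column names `fold_0` to `fold_3` and the scores are made up for illustration. After `set_index("feature").T`, each feature becomes a column and each row is a fold, which is the shape `DataFrame.boxplot` draws one box per column from.

```python
import pandas as pd

# Hypothetical per-fold output: one row per feature, one column per fold.
df = pd.DataFrame({
    "feature": ["f_a", "f_b", "f_c"],
    "fold_0": [0.10, 0.02, -0.01],
    "fold_1": [0.12, 0.03, 0.00],
    "fold_2": [0.11, 0.01, 0.01],
    "fold_3": [0.50, 0.02, -0.02],
})
features = df["feature"].tolist()

# Transpose so that features are columns and folds are rows.
wide = df.set_index("feature").T
print(wide)

# With matplotlib installed, this would draw one horizontal box per feature:
# wide.boxplot(column=features, vert=False)
```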

@glevv
Author

glevv commented Jun 14, 2020

Yes, that's great. Just tried it with FLOFO, works fine. Only 2 concerns:

  • with very small variance in feature importance, the boxes will be very small, basically lines (see screenshot)
    [Screenshot: Untitled]

  • it works poorly with a small number of folds, such as 2

x = np.random.random(2)
plt.boxplot(x);

With only two points it will always produce visually the same plot, just with different values. But I guess you could just add a warning at initialization:

import warnings
if (isinstance(cv, int) and cv < 3) or (hasattr(cv, 'n_splits') and cv.n_splits < 3):
    warning_str = "Warning: Small number of folds could lead to inadequate results"
    warnings.warn(warning_str)
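A slightly fuller sketch of that check as a standalone helper. The name `warn_if_few_folds`, the `min_folds` parameter, and the `FakeSplitter` stand-in are all hypothetical; the only assumption about `cv` is that it is either an int or an object exposing `n_splits`, as scikit-learn splitters such as `KFold` do.

```python
import warnings

def warn_if_few_folds(cv, min_folds=3):
    """Warn if cv implies fewer than min_folds folds; return True if it warned."""
    if isinstance(cv, int):
        n_folds = cv
    else:
        # Splitter objects without n_splits are silently skipped.
        n_folds = getattr(cv, "n_splits", None)

    if isinstance(n_folds, int) and n_folds < min_folds:
        warnings.warn(
            f"Only {n_folds} folds: boxplots and IQR-based summaries "
            "could be inadequate with so few scores per feature.",
            UserWarning,
        )
        return True
    return False

class FakeSplitter:
    """Minimal stand-in for a splitter object with an n_splits attribute."""
    def __init__(self, n_splits):
        self.n_splits = n_splits

# Capture the warnings instead of printing them, to show when the check fires.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    results = [warn_if_few_folds(2), warn_if_few_folds(FakeSplitter(2)), warn_if_few_folds(5)]

print(results, len(caught))
```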

@aerdem4
Owner

aerdem4 commented Jun 15, 2020

Since it will be optional, it won't be a big deal. The default plot can stay as it is, and people can select the boxplot with a parameter.

@glevv
Author

glevv commented Jun 16, 2020

I guess it is closed then

glevv closed this as completed Jun 16, 2020
@aerdem4
Owner

aerdem4 commented Jul 2, 2020

#33 brings the box plot feature.
