Feature importance from DataFrame #45
Labels
enhancement: New feature or request
good first issue: Good for newcomers
low priority: Not urgent and won't degrade with time
Current state
The Ensemble and Model classes have get_feat_importance methods to compute the permutation importance of input features. Currently, the input data must be supplied as a FoldYielder object. There are occasions when one may wish to evaluate the feature importance on only a subset of the data (e.g. only on 2-jet events). At present this requires saving the subset to a foldfile and instantiating a new FoldYielder to point to it.
Probable solution
The get_feat_importance methods are extended to take pandas.DataFrame objects as inputs. This will no doubt affect certain aspects of the returned information, such as averaging over folds and computing uncertainties. I think it is reasonable that if the user really wants this extra information, they can export a new foldfile. This extension would simply be for getting rough information quickly and producing an informative plot, which wouldn't necessarily be used for publication.
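To make the proposal concrete, here is a minimal sketch of permutation importance computed directly on a pandas.DataFrame subset. It is not the existing get_feat_importance implementation: the helper name permutation_importance_df, its arguments, the mean-squared-error loss, and the toy model are all assumptions made for illustration. Since there are no folds to average over, the spread over repeated shuffles is used here only as a rough stand-in for an uncertainty.

```python
import numpy as np
import pandas as pd


def permutation_importance_df(predict, df, feat_cols, targ_col, n_repeats=5, seed=0):
    """Hypothetical helper: permutation importance evaluated on a DataFrame.

    predict maps a float array of shape (n_rows, n_feats) to predictions of
    shape (n_rows,). The importance of a feature is the mean increase in
    mean-squared error after shuffling that feature's column.
    """
    rng = np.random.default_rng(seed)
    x = df[feat_cols].to_numpy(dtype=float)
    y = df[targ_col].to_numpy(dtype=float)

    def loss(inputs):
        return float(np.mean((predict(inputs) - y) ** 2))

    baseline = loss(x)
    rows = []
    for i, feat in enumerate(feat_cols):
        deltas = []
        for _ in range(n_repeats):
            shuffled = x.copy()
            # Break the link between this feature and the target
            shuffled[:, i] = rng.permutation(shuffled[:, i])
            deltas.append(loss(shuffled) - baseline)
        rows.append({"feature": feat,
                     "importance": float(np.mean(deltas)),
                     "spread": float(np.std(deltas))})
    out = pd.DataFrame(rows).sort_values("importance", ascending=False)
    return out.reset_index(drop=True)


# Toy data (assumed): the target depends only on feature "a"
data_rng = np.random.default_rng(1)
df = pd.DataFrame({"a": data_rng.normal(size=500),
                   "b": data_rng.normal(size=500),
                   "n_jets": data_rng.integers(0, 4, size=500)})
df["target"] = 3.0 * df["a"]

# Select a subset in memory, e.g. only 2-jet events, without writing a foldfile
subset = df[df["n_jets"] == 2]


def toy_model(arr):
    # Stand-in for a trained model; uses only the first input column
    return 3.0 * arr[:, 0]


result = permutation_importance_df(toy_model, subset, ["a", "b"], "target")
print(result)
```

In this toy setup, feature "a" should rank first and "b" should have zero importance, because the stand-in model ignores it. A real extension would call the trained Model or Ensemble in place of toy_model and use whatever loss the existing methods use.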