To demonstrate the ``TreeModel().add_feat_importance()`` method, we load the ``DOM_GSEC`` example dataset and its corresponding feature set (see [Breimann25a]_):

In [22]:
import aaanalysis as aa
aa.options["verbose"] = False # Disable verbosity

df_seq = aa.load_dataset(name="DOM_GSEC")
labels = df_seq["label"].to_list()
df_feat = aa.load_features(name="DOM_GSEC").head(7)

# Create feature matrix
sf = aa.SequenceFeature()
df_parts = sf.get_df_parts(df_seq=df_seq)
X = sf.feature_matrix(features=df_feat["feature"], df_parts=df_parts)

We can now fit the ``TreeModel``, which will internally fit 3 tree-based models over 5 training rounds by default:

In [23]:
tm = aa.TreeModel()
tm = tm.fit(X, labels=labels)
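
Fitting over several rounds is what makes it possible to report both an importance value and its spread per feature. The following is only a minimal sketch of that idea, not the library's implementation: the per-round numbers are hypothetical, and the aggregation (mean and population standard deviation across rounds) is an assumption made for illustration.

```python
from statistics import mean, pstdev

# Hypothetical importances (in %) for 3 features over 5 training rounds.
# These numbers are made up for illustration and do not come from TreeModel.
rounds = [
    [50.0, 30.0, 20.0],
    [52.0, 28.0, 20.0],
    [48.0, 32.0, 20.0],
    [50.0, 30.0, 20.0],
    [50.0, 30.0, 20.0],
]

# Transpose so that each tuple holds one feature's values across all rounds
per_feature = list(zip(*rounds))

# Assumed aggregation: average and spread across rounds, per feature
importance_mean = [round(mean(values), 3) for values in per_feature]
importance_std = [round(pstdev(values), 3) for values in per_feature]

print("mean:", importance_mean)
print("std: ", importance_std)
```

A feature whose importance barely changes between rounds gets a standard deviation close to zero, which is one way to read the ``feat_importance_std`` column shown later.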

We can directly retrieve the feature importance using the ``feat_importance`` attribute of the ``TreeModel`` class:

In [28]:
feat_importance = tm.feat_importance
print("Feature importance: ", feat_importance)

Feature importance:  [ 6.291 15.453 17.95  26.841 15.644  7.099 10.722]
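
The printed values can be inspected without any further library calls. As a small sketch, the snippet below copies the seven values from the output above, ranks the features by importance, and checks their total. In this example the values sum to 100, which suggests they are expressed as percentages; whether this holds in general is an assumption here.

```python
# Values copied from the printed output above (one per row of df_feat)
feat_importance = [6.291, 15.453, 17.95, 26.841, 15.644, 7.099, 10.722]

# 0-based row positions, ordered from most to least important feature
ranking = sorted(
    range(len(feat_importance)),
    key=lambda i: feat_importance[i],
    reverse=True,
)

# Rounded to absorb floating-point noise from the summation
total = round(sum(feat_importance), 3)

print("Ranking:", ranking)  # [3, 2, 4, 1, 6, 5, 0]
print("Total:", total)      # 100.0
```

The top-ranked position (row index 3) corresponds to the fourth feature in ``df_feat``.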


Before adding these values to the feature DataFrame (``df_feat``), make sure it does not already contain the ``feat_importance`` and ``feat_importance_std`` columns:

In [25]:
# Remove feature importance columns
df_feat = df_feat[[x for x in list(df_feat) if x not in ["feat_importance", "feat_importance_std"]]]
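
The same column removal can also be written with pandas' ``DataFrame.drop``. The sketch below uses a small toy DataFrame (``df_toy``, with made-up values) as a stand-in for ``df_feat``; ``errors="ignore"`` makes the call safe when one or both columns are absent.

```python
import pandas as pd

# Toy stand-in for df_feat; the values are made up for illustration only
df_toy = pd.DataFrame({
    "feature": ["feat_a", "feat_b"],
    "abs_auc": [0.244, 0.243],
    "feat_importance": [60.0, 40.0],
})

cols_importance = ["feat_importance", "feat_importance_std"]

# errors="ignore" skips labels that are missing (here: feat_importance_std)
df_clean = df_toy.drop(columns=cols_importance, errors="ignore")

print(list(df_clean))  # ['feature', 'abs_auc']
```

``drop`` returns a new DataFrame, so ``df_toy`` itself is left unchanged.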

The feature importance obtained from the fitted model can now be inserted under the conventional column names using the ``TreeModel().add_feat_importance()`` method:

In [26]:
df_feat = tm.add_feat_importance(df_feat=df_feat)
aa.display_df(df_feat)

Unnamed: 0,feature,category,subcategory,scale_name,scale_description,abs_auc,abs_mean_dif,mean_dif,std_test,std_ref,p_val_mann_whitney,p_val_fdr_bh,positions,feat_importance,feat_importance_std
1,"TMD_C_JMD_C-Seg...3,4)-KLEP840101",Energy,Charge,Charge,"Net charge (Kle...n et al., 1984)",0.244,0.103666,0.103666,0.106692,0.110506,0.0,0.0,"31,32,33,34,35",6.291,0.391
2,"TMD_C_JMD_C-Seg...3,4)-FINA910104",Conformation,α-helix (C-cap),α-helix termination,"Helix terminati...n et al., 1991)",0.243,0.085064,0.085064,0.098774,0.096946,0.0,0.0,"31,32,33,34,35",15.453,0.7
3,"TMD_C_JMD_C-Seg...6,9)-LEVM760105",Shape,Side chain length,Side chain length,"Radius of gyrat... (Levitt, 1976)",0.233,0.137044,0.137044,0.161683,0.176964,0.0,1e-06,"32,33",17.95,0.525
4,"TMD_C_JMD_C-Seg...3,4)-HUTJ700102",Energy,Entropy,Entropy,"Absolute entrop...Hutchens, 1970)",0.229,0.098224,0.098224,0.106865,0.124608,0.0,1e-06,"31,32,33,34,35",26.841,0.606
5,"TMD_C_JMD_C-Seg...6,9)-RADA880106",ASA/Volume,Volume,Accessible surface area (ASA),"Accessible surf...olfenden, 1988)",0.223,0.095071,0.095071,0.114758,0.132829,0.0,2e-06,"32,33",15.644,0.779
6,"TMD_C_JMD_C-Seg...2,3)-KLEP840101",Energy,Charge,Charge,"Net charge (Kle...n et al., 1984)",0.222,0.058671,0.058671,0.064895,0.069547,0.0,1e-06,"27,28,29,30,31,32,33",7.099,0.542
7,"TMD_C_JMD_C-Seg...4,5)-FAUJ880109",Energy,Isoelectric point,Number hydrogen bond donors,"Number of hydro...e et al., 1988)",0.215,0.146661,0.146661,0.174609,0.188034,0.0,4e-06,"33,34,35,36",10.722,0.35


To overwrite existing feature importance columns, set ``drop=True``:

In [27]:
# Drop existing feature importance columns and insert new ones
df_feat = tm.add_feat_importance(df_feat=df_feat, drop=True)
aa.display_df(df_feat)

Unnamed: 0,feature,category,subcategory,scale_name,scale_description,abs_auc,abs_mean_dif,mean_dif,std_test,std_ref,p_val_mann_whitney,p_val_fdr_bh,positions,feat_importance,feat_importance_std
1,"TMD_C_JMD_C-Seg...3,4)-KLEP840101",Energy,Charge,Charge,"Net charge (Kle...n et al., 1984)",0.244,0.103666,0.103666,0.106692,0.110506,0.0,0.0,"31,32,33,34,35",6.291,0.391
2,"TMD_C_JMD_C-Seg...3,4)-FINA910104",Conformation,α-helix (C-cap),α-helix termination,"Helix terminati...n et al., 1991)",0.243,0.085064,0.085064,0.098774,0.096946,0.0,0.0,"31,32,33,34,35",15.453,0.7
3,"TMD_C_JMD_C-Seg...6,9)-LEVM760105",Shape,Side chain length,Side chain length,"Radius of gyrat... (Levitt, 1976)",0.233,0.137044,0.137044,0.161683,0.176964,0.0,1e-06,"32,33",17.95,0.525
4,"TMD_C_JMD_C-Seg...3,4)-HUTJ700102",Energy,Entropy,Entropy,"Absolute entrop...Hutchens, 1970)",0.229,0.098224,0.098224,0.106865,0.124608,0.0,1e-06,"31,32,33,34,35",26.841,0.606
5,"TMD_C_JMD_C-Seg...6,9)-RADA880106",ASA/Volume,Volume,Accessible surface area (ASA),"Accessible surf...olfenden, 1988)",0.223,0.095071,0.095071,0.114758,0.132829,0.0,2e-06,"32,33",15.644,0.779
6,"TMD_C_JMD_C-Seg...2,3)-KLEP840101",Energy,Charge,Charge,"Net charge (Kle...n et al., 1984)",0.222,0.058671,0.058671,0.064895,0.069547,0.0,1e-06,"27,28,29,30,31,32,33",7.099,0.542
7,"TMD_C_JMD_C-Seg...4,5)-FAUJ880109",Energy,Isoelectric point,Number hydrogen bond donors,"Number of hydro...e et al., 1988)",0.215,0.146661,0.146661,0.174609,0.188034,0.0,4e-06,"33,34,35,36",10.722,0.35
