To demonstrate the ``SequenceFeature().get_df_feat()`` method, we load the ``DOM_GSEC`` example dataset together with its corresponding features (see [Breimann25a]_):

In [1]:
import aaanalysis as aa
aa.options["verbose"] = False
df_seq = aa.load_dataset(name="DOM_GSEC")
labels = df_seq["label"].to_list()
df_feat = aa.load_features(name="DOM_GSEC")
features = df_feat["feature"].to_list()
sf = aa.SequenceFeature()
df_parts = sf.get_df_parts(df_seq=df_seq)
aa.display_df(df_feat, n_rows=5)


Unnamed: 0,feature,category,subcategory,scale_name,scale_description,abs_auc,abs_mean_dif,mean_dif,std_test,std_ref,p_val_mann_whitney,p_val_fdr_bh,positions,feat_importance,feat_importance_std
1,"TMD_C_JMD_C-Seg...3,4)-KLEP840101",Energy,Charge,Charge,"Net charge (Kle...n et al., 1984)",0.244,0.103666,0.103666,0.106692,0.110506,0.0,0.0,"31,32,33,34,35",0.9704,1.438918
2,"TMD_C_JMD_C-Seg...3,4)-FINA910104",Conformation,α-helix (C-cap),α-helix termination,"Helix terminati...n et al., 1991)",0.243,0.085064,0.085064,0.098774,0.096946,0.0,0.0,"31,32,33,34,35",0.0,0.0
3,"TMD_C_JMD_C-Seg...6,9)-LEVM760105",Shape,Side chain length,Side chain length,"Radius of gyrat... (Levitt, 1976)",0.233,0.137044,0.137044,0.161683,0.176964,0.0,1e-06,"32,33",1.5548,2.109848
4,"TMD_C_JMD_C-Seg...3,4)-HUTJ700102",Energy,Entropy,Entropy,"Absolute entrop...Hutchens, 1970)",0.229,0.098224,0.098224,0.106865,0.124608,0.0,1e-06,"31,32,33,34,35",3.1112,3.109955
5,"TMD_C_JMD_C-Seg...6,9)-RADA880106",ASA/Volume,Volume,Accessible surface area (ASA),"Accessible surf...olfenden, 1988)",0.223,0.095071,0.095071,0.114758,0.132829,0.0,2e-06,"32,33",0.0,0.0


To retrieve the feature DataFrame, provide ``features``, ``df_parts``, and the ``labels`` of the corresponding samples in the sequence DataFrame:

In [2]:
# Mean differences are larger than in the loaded features because negative samples
# serve as the reference dataset here (Breimann25a used unlabeled samples instead)
df_feat = sf.get_df_feat(features=features, df_parts=df_parts, labels=labels)
aa.display_df(df_feat, n_rows=5)

Unnamed: 0,feature,category,subcategory,scale_name,scale_description,abs_auc,abs_mean_dif,mean_dif,std_test,std_ref,p_val_mann_whitney,p_val_fdr_bh,positions
1,"TMD_C_JMD_C-Seg...3,4)-KLEP840101",Energy,Charge,Charge,"Net charge (Kle...n et al., 1984)",0.335,0.168254,0.168254,0.106692,0.124924,0.0,0.0,"31,32,33,34,35"
2,"TMD_C_JMD_C-Seg...3,4)-FINA910104",Conformation,α-helix (C-cap),α-helix termination,"Helix terminati...n et al., 1991)",0.333,0.150698,0.150698,0.098774,0.119888,0.0,0.0,"31,32,33,34,35"
3,"TMD_C_JMD_C-Seg...6,9)-LEVM760105",Shape,Side chain length,Side chain length,"Radius of gyrat... (Levitt, 1976)",0.33,0.246867,0.246867,0.161683,0.197489,0.0,0.0,"32,33"
4,"TMD_C_JMD_C-Seg...3,4)-HUTJ700102",Energy,Entropy,Entropy,"Absolute entrop...Hutchens, 1970)",0.327,0.162229,0.162229,0.106865,0.135247,0.0,0.0,"31,32,33,34,35"
5,"TMD_C_JMD_C-Seg...6,9)-RADA880106",ASA/Volume,Volume,Accessible surface area (ASA),"Accessible surf...olfenden, 1988)",0.322,0.184252,0.184252,0.114758,0.164757,0.0,0.0,"32,33"


Use ``label_test`` and ``label_ref`` to set which label values define the test and reference groups; swapping them flips the sign of ``mean_dif``:

In [3]:
df_feat = sf.get_df_feat(features=features, df_parts=df_parts, labels=labels, label_test=0, label_ref=1)
# Signs are reversed because mean_dif is the mean of the test group minus the mean of the reference group
aa.display_df(df_feat, n_rows=5, col_to_show="mean_dif")

Unnamed: 0,mean_dif
1,-0.168254
2,-0.150698
3,-0.246867
4,-0.162229
5,-0.184252
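
The sign flip follows directly from how a mean difference is defined. The sketch below is a minimal illustration in plain Python and not the AAanalysis implementation: the helper ``mean_dif`` and the feature values are hypothetical and only mirror the ``label_test`` and ``label_ref`` arguments used above.

```python
from statistics import mean

# Hypothetical feature values and labels (not taken from DOM_GSEC)
values = [0.62, 0.58, 0.71, 0.40, 0.35, 0.44]
labels = [1, 1, 1, 0, 0, 0]


def mean_dif(values, labels, label_test=1, label_ref=0):
    """Return the mean of the test group minus the mean of the reference group."""
    test = [v for v, lab in zip(values, labels) if lab == label_test]
    ref = [v for v, lab in zip(values, labels) if lab == label_ref]
    return mean(test) - mean(ref)


d_default = mean_dif(values, labels)
# Swapping the roles of both groups negates the difference
d_swapped = mean_dif(values, labels, label_test=0, label_ref=1)
print(round(d_default, 6), round(d_swapped, 6))
```

The magnitude is unchanged; only the direction of the comparison differs, which is why ``abs_mean_dif`` is unaffected by swapping the groups.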


The residue positions can be adjusted using the ``start``, ``tmd_len``, ``jmd_n_len``, and ``jmd_c_len`` parameters:

In [4]:
# Shift positions by 10 residues
df_feat = sf.get_df_feat(features=features, df_parts=df_parts, labels=labels,
                         start=11)
aa.display_df(df_feat, n_rows=5, col_to_show="positions")

Unnamed: 0,positions
1,"41,42,43,44,45"
2,"41,42,43,44,45"
3,"42,43"
4,"41,42,43,44,45"
5,"42,43"


In [5]:
# Increase TMD length from 20 to 50
df_feat = sf.get_df_feat(features=features, df_parts=df_parts, labels=labels,
                         tmd_len=50)
aa.display_df(df_feat, n_rows=5, col_to_show="positions")

Unnamed: 0,positions
1,"53,54,55,56,57,58,59,60,61"
2,"53,54,55,56,57,58,59,60,61"
3,"55,56,57,58"
4,"53,54,55,56,57,58,59,60,61"
5,"55,56,57,58"
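
To see why the positions move as they do, the following sketch reproduces the position ranges printed above (31 to 35, 41 to 45, and 53 to 61) with simple integer arithmetic. It is an illustration under stated assumptions, not the library's code: the helpers ``part_bounds`` and ``segment_positions`` are hypothetical, the JMD lengths of 10 are assumed defaults, ``TMD_C`` is assumed to be the C-terminal half of the TMD, and a segment written as ``(3,4)`` is read as the third of four equal splits of the part.

```python
def part_bounds(start=1, jmd_n_len=10, tmd_len=20, jmd_c_len=10):
    """First and last position of the TMD_C_JMD_C part (assumed layout)."""
    tmd_start = start + jmd_n_len            # TMD follows the N-terminal JMD
    tmd_stop = tmd_start + tmd_len - 1
    tmd_c_start = tmd_start + tmd_len // 2   # assumption: TMD_C is the C-terminal half
    return tmd_c_start, tmd_stop + jmd_c_len


def segment_positions(i_th, n_split, **kwargs):
    """Positions of the i-th of n_split segments of the TMD_C_JMD_C part."""
    first, last = part_bounds(**kwargs)
    length = last - first + 1
    lo = (i_th - 1) * length // n_split      # inclusive offset within the part
    hi = i_th * length // n_split            # exclusive offset within the part
    return list(range(first + lo, first + hi))


print(segment_positions(3, 4))               # default numbering
print(segment_positions(3, 4, start=11))     # shifted by 10 residues
print(segment_positions(3, 4, tmd_len=50))   # longer TMD
```

Under these assumptions, ``start`` only shifts the numbering, whereas ``tmd_len`` changes the part length and therefore also the number of positions each segment covers.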


A t-test can be used instead of the Mann-Whitney U test by setting ``parametric=True``:

In [6]:
df_feat = sf.get_df_feat(features=features, df_parts=df_parts, labels=labels, parametric=True)
aa.display_df(df_feat, n_rows=5, col_to_show="p_val_ttest_indep")

Unnamed: 0,p_val_ttest_indep
1,0.0
2,0.0
3,0.0
4,0.0
5,0.0
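
The two tests compare groups in different ways: a t-test works on group means, while the Mann-Whitney U test works on the ordering of values. The sketch below computes only the test statistics in plain Python, using hypothetical data; it is not how AAanalysis computes its p-values, and the pooled-variance form of the t statistic is chosen here purely for illustration.

```python
from math import sqrt
from statistics import mean, variance


def t_statistic(test, ref):
    """Two-sample t statistic with pooled variance (compares group means)."""
    n1, n2 = len(test), len(ref)
    pooled = ((n1 - 1) * variance(test) + (n2 - 1) * variance(ref)) / (n1 + n2 - 2)
    return (mean(test) - mean(ref)) / sqrt(pooled * (1 / n1 + 1 / n2))


def u_statistic(test, ref):
    """Mann-Whitney U: count of (test, ref) pairs with test > ref; ties count 0.5."""
    return sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in test for b in ref)


# Hypothetical feature values (not taken from DOM_GSEC)
test = [0.62, 0.58, 0.71, 0.66]
ref = [0.40, 0.35, 0.44, 0.41]
ref_outlier = [0.40, 0.35, 0.44, 5.00]  # same group with one extreme value

t_clean, u_clean = t_statistic(test, ref), u_statistic(test, ref)
t_out, u_out = t_statistic(test, ref_outlier), u_statistic(test, ref_outlier)
print(t_clean, u_clean)
print(t_out, u_out)
```

In this toy example a single extreme value reverses the sign of the t statistic, while the U statistic still reports that most test values exceed the reference values. This sensitivity to outliers and to the shape of the distribution is one consideration when choosing between the parametric and the non-parametric option.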
