To demonstrate the ``SequenceFeature().get_df_pos()`` method, we load the ``DOM_GSEC`` example dataset and its corresponding features (see [Breimann25a]_):

In [1]:
import aaanalysis as aa
aa.options["verbose"] = False
df_seq = aa.load_dataset(name="DOM_GSEC", n=20)
labels = df_seq["label"].to_list()
df_feat = aa.load_features(name="DOM_GSEC").head(100)
features = df_feat["feature"].to_list()
sf = aa.SequenceFeature()
df_parts = sf.get_df_parts(df_seq=df_seq)
df_feat = sf.get_df_feat(features=features, labels=labels, df_parts=df_parts)
aa.display_df(df_feat, n_rows=5)

Unnamed: 0,feature,category,subcategory,scale_name,scale_description,abs_auc,abs_mean_dif,mean_dif,std_test,std_ref,p_val_mann_whitney,p_val_fdr_bh,positions
1,"TMD_C_JMD_C-Seg...3,4)-KLEP840101",Energy,Charge,Charge,"Net charge (Kle...n et al., 1984)",0.301,0.14,0.14,0.111692,0.110793,0.001116,0.00465,"31,32,33,34,35"
2,"TMD_C_JMD_C-Seg...3,4)-FINA910104",Conformation,α-helix (C-cap),α-helix termination,"Helix terminati...n et al., 1991)",0.295,0.12949,0.12949,0.111228,0.125451,0.001413,0.005048,"31,32,33,34,35"
3,"TMD_C_JMD_C-Seg...6,9)-LEVM760105",Shape,Side chain length,Side chain length,"Radius of gyrat... (Levitt, 1976)",0.335,0.245149,0.245149,0.176567,0.18247,0.000289,0.004133,"32,33"
4,"TMD_C_JMD_C-Seg...3,4)-HUTJ700102",Energy,Entropy,Entropy,"Absolute entrop...Hutchens, 1970)",0.306,0.15571,0.15571,0.104963,0.136006,0.000921,0.004605,"31,32,33,34,35"
5,"TMD_C_JMD_C-Seg...6,9)-RADA880106",ASA/Volume,Volume,Accessible surface area (ASA),"Accessible surf...olfenden, 1988)",0.342,0.18085,0.18085,0.138541,0.145353,0.000211,0.005267,"32,33"


``df_feat`` must be provided to create ``df_pos``, which contains one aggregated numerical value (taken from the ``mean_dif`` column by default) per residue position for each level of the selected scale classification (``category`` by default):


In [2]:
df_pos = sf.get_df_pos(df_feat=df_feat)
aa.display_df(df_pos, n_rows=5, show_shape=True)

DataFrame shape: (6, 40)


Unnamed: 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40
ASA/Volume,0.0,0.0,0.0,0.0,0.0,-0.059101,-0.040276,-0.040276,-0.059101,-0.040276,0.0,0.004078,0.0,0.0,0.004078,0.0,0.0,0.086081,0.0,0.0,0.086081,0.0,0.0,0.086081,0.0,0.141517,0.089894,0.093706,0.093706,0.042851,0.145993,0.070463,0.04751,0.055056,0.055056,-0.016557,0.018667,0.0,0.0,0.187233
Conformation,0.026359,0.0,0.002911,-0.012762,0.0,0.002911,-0.051884,0.026359,-0.020537,0.026359,-0.035033,-0.065563,0.002911,-0.1125,-0.039593,0.1778,-0.056972,-0.019602,0.051167,-0.009796,-0.112065,0.051167,0.0,0.015927,-0.054268,0.060008,-0.00544,0.021499,0.043311,0.026814,0.07173,0.100947,0.130135,0.142818,0.140699,0.144639,0.110996,0.103118,0.103118,0.103118
Energy,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,-0.107934,0.0,0.0,-0.011404,0.0,0.0,0.010029,-0.107934,-0.0726,0.010029,-0.107934,-0.066433,-0.06335,-0.045898,-0.027333,-0.021444,0.023545,0.010077,0.070978,0.055025,0.015219,0.058613,0.045465,0.051147,-0.068855,0.12159,0.12159,0.12159
Polarity,0.0,0.0,0.0,-0.122285,0.0,-0.07961,-0.100947,-0.07961,-0.07961,-0.07961,-0.122285,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,-0.140781,-0.127236,-0.127236,-0.134008,-0.133926,-0.127236,-0.067446,-0.101554,-0.107563,-0.012088,-0.012088,-0.140616,0.0,0.0,0.0
Shape,0.096837,-0.082142,0.0,0.0463,0.020332,-0.056744,-0.022184,0.025681,-0.031325,0.062349,0.007347,-0.125,0.0,-0.082142,-0.125,0.0,-0.082142,0.0,0.083999,-0.082142,0.0,0.083999,0.0,0.0,0.0,0.083999,-0.052823,-0.052823,-0.052823,-0.052823,0.004098,0.034592,0.073556,0.085189,0.05095,0.19045,0.0,0.153668,0.0,0.0
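To build intuition for this table, the following is a minimal pandas sketch of a per-position aggregation. It is not the library's implementation: the toy values are hypothetical, it assumes that ``positions`` holds comma-separated residue positions, and averaging per category and position is an assumption made for illustration.

```python
import pandas as pd

# Toy stand-in for df_feat (hypothetical values, not taken from DOM_GSEC)
df_feat_toy = pd.DataFrame({
    "category": ["Energy", "Energy", "Shape"],
    "mean_dif": [0.10, 0.30, -0.20],
    "positions": ["31,32", "32,33", "32,33"],
})

# One row per (feature, position): split the comma-separated positions string
df_long = df_feat_toy.assign(
    position=df_feat_toy["positions"].str.split(",")
).explode("position", ignore_index=True)
df_long["position"] = df_long["position"].astype(int)

# Aggregate the value column per category and position
# (using the mean is an assumption made for this sketch)
df_pos_toy = df_long.pivot_table(
    index="category", columns="position", values="mean_dif",
    aggfunc="mean", fill_value=0.0,
)

# Add positions not covered by any feature as all-zero columns
df_pos_toy = df_pos_toy.reindex(columns=range(31, 36), fill_value=0.0)
print(df_pos_toy)
```

Positions without any contributing feature end up as ``0.0``, which is one way to read the many zero entries in ``df_pos`` above.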


You can change the numerical and categorical columns used for the aggregation with the ``col_val`` and ``col_cat`` parameters:

In [3]:
df_pos = sf.get_df_pos(df_feat=df_feat, col_val="abs_auc", col_cat="subcategory")
aa.display_df(df_pos, n_rows=5, show_shape=True, n_cols=5)

DataFrame shape: (35, 40)


Unnamed: 0,1,2,3,4,5
Accessible surface area (ASA),0.0,0.0,0.0,0.0,0.0
Amphiphilicity,0.0,0.0,0.0,0.0,0.0
Amphiphilicity (α-helix),0.0,0.0,0.0,0.0,0.0
Backbone-dynamics (-CH),0.0,0.0,0.0,0.0,0.0
Buried,0.0,0.0,0.0,0.0,0.0
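The first five positions shown here are all zero, so it can help to restrict the view to positions that carry any signal. The sketch below uses a small hypothetical frame in place of ``df_pos`` and relies only on standard pandas operations, on the assumption that ``df_pos`` behaves like a regular ``pandas.DataFrame``.

```python
import pandas as pd

# Hypothetical stand-in for df_pos: rows are subcategories, columns are positions
df_pos_toy = pd.DataFrame(
    {1: [0.0, 0.0], 2: [0.0, 0.0], 3: [0.3, 0.0], 4: [0.0, 0.25]},
    index=["Buried", "Charge"],
)

# Keep only positions where at least one row is non-zero
df_nonzero = df_pos_toy.loc[:, (df_pos_toy != 0).any(axis=0)]
print(df_nonzero)
```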


The residue positions can be adjusted using the ``start``, ``tmd_len``, ``jmd_n_len``, and ``jmd_c_len`` parameters:

In [4]:
# Shift positions by 10 residues
df_pos = sf.get_df_pos(df_feat=df_feat, start=11)
aa.display_df(df_pos, n_rows=5, show_shape=True, n_cols=5)

DataFrame shape: (6, 40)


Unnamed: 0,11,12,13,14,15
ASA/Volume,0.0,0.004078,0.0,0.0,0.004078
Conformation,-0.035033,-0.065563,0.002911,-0.1125,-0.039593
Energy,0.0,0.0,-0.107934,0.0,0.0
Polarity,-0.122285,0.0,0.0,0.0,0.0
Shape,0.007347,-0.125,0.0,-0.082142,-0.125


In [5]:
# Increase TMD length from 20 to 50
df_pos = sf.get_df_pos(df_feat=df_feat, tmd_len=50)
aa.display_df(df_pos, n_rows=5, show_shape=True, n_cols=5)

DataFrame shape: (6, 70)


Unnamed: 0,1,2,3,4,5
ASA/Volume,0.0,0.0,0.0,0.0,0.0
Conformation,0.026359,0.0,0.002911,-0.012762,0.0
Energy,0.0,0.0,0.0,0.0,0.0
Polarity,0.0,0.0,0.0,-0.122285,0.0
Shape,0.096837,-0.082142,0.0,0.0463,0.020332
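The column labels in the two outputs above follow from simple arithmetic on the length parameters. The helper below is hypothetical (it is not part of ``aaanalysis``) and assumes default lengths of 10 residues for each JMD, which is consistent with the 40 columns shown for a TMD length of 20.

```python
def position_labels(start=1, jmd_n_len=10, tmd_len=20, jmd_c_len=10):
    """Return residue position labels for JMD-N, TMD, and JMD-C (hypothetical helper)."""
    total = jmd_n_len + tmd_len + jmd_c_len
    return list(range(start, start + total))


print(len(position_labels()))            # 40
print(position_labels(start=11)[:5])     # [11, 12, 13, 14, 15]
print(len(position_labels(tmd_len=50)))  # 70
```

Under these assumptions, ``start`` only relabels the columns, while the length parameters change how many columns ``df_pos`` has.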


You can select specific parts and normalize the results using the ``list_parts`` and ``normalize`` parameters:

In [6]:
df_pos = sf.get_df_pos(df_feat=df_feat, list_parts=["jmd_c"], normalize=True)
aa.display_df(df_pos)

Unnamed: 0,jmd_c
ASA/Volume,0.476542
Conformation,0.973787
Energy,0.50102
Polarity,-0.480913
Shape,0.50114
Structure-Activity,-0.117262
