To demonstrate the ``ShapExplainer.add_feat_impact()`` method, we load the ``DOM_GSEC`` example dataset and its corresponding feature set (see [Breimann24c]_):

In [1]:
import shap
import aaanalysis as aa
aa.options["verbose"] = False # Disable verbosity

df_seq = aa.load_dataset(name="DOM_GSEC", n=3)
labels = df_seq["label"].to_list()
df_feat = aa.load_features(name="DOM_GSEC").head(10)

# Create feature matrix
sf = aa.SequenceFeature()
df_parts = sf.get_df_parts(df_seq=df_seq)
X = sf.feature_matrix(features=df_feat["feature"], df_parts=df_parts)

aa.display_df(df_seq)

,entry,sequence,label,tmd_start,tmd_stop,jmd_n,tmd,jmd_c
1,Q14802,MQKVTLGLLVFLAGF...PGETPPLITPGSAQS,0,37,59,NSPFYYDWHS,LQVGGLICAGVLCAMGIIIVMSA,KCKCKFGQKS
2,Q86UE4,MAARSWQDELAQQAE...SPKQIKKKKKARRET,0,50,72,LGLEPKRYPG,WVILVGTGALGLLLLFLLGYGWA,AACAGARKKR
3,Q969W9,MHRLMGVNSTAAAAA...AIWSKEKDKQKGHPL,0,41,63,FQSMEITELE,FVQIIIIVVVMMVMVVVITCLLS,HYKLSARSFI
4,P05067,MLPGLALLLLAAWTA...GYENPTYKFFEQMQN,1,701,723,FAEDVGSNKG,AIIGLMVGGVVIATVIVITLVML,KKKQYTSIHH
5,P14925,MAGRARSGLLLLLLG...EEEYSAPLPKPAPSS,1,868,890,KLSTEPGSGV,SVVLITTLLVIPVLVLLAIVMFI,RWKKSRAFGD
6,P70180,MRSLLLFTFSACVLL...RELREDSIRSHFSVA,1,477,499,PCKSSGGLEE,SAVTGIVVGALLGAGLLMAFYFF,RKKYRITIER


We can now create a ``ShapExplainer`` object and fit it to compute the SHAP values, which are stored internally in its ``shap_values`` attribute:

In [2]:
se = aa.ShapExplainer()
se.fit(X, labels=labels)

shap_values = se.shap_values

# Print SHAP values
print("SHAP values explain the feature impact for 3 negative and 3 positive samples")
print(shap_values.round(2))

SHAP values explain the feature impact for 3 negative and 3 positive samples
[[-0.06 -0.07 -0.04 -0.04 -0.04  0.   -0.05 -0.07 -0.04 -0.04]
 [-0.07 -0.08 -0.04 -0.05 -0.04 -0.03 -0.06 -0.07 -0.04  0.01]
 [-0.07 -0.09 -0.02 -0.04 -0.01 -0.03 -0.05 -0.06  0.   -0.04]
 [ 0.07  0.09  0.02  0.05  0.02  0.03  0.06  0.07  0.04  0.02]
 [ 0.07  0.09  0.05  0.05  0.04  0.01  0.05  0.07  0.02  0.03]
 [ 0.07  0.08  0.04  0.05  0.03  0.03  0.06  0.07  0.04  0.02]]
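
To make the printed matrix easier to read, the following minimal sketch re-uses the rounded values from the output above and sums them per sample. It relies only on the general additivity property of SHAP values (the values of one sample sum to the offset of the model output from the expected value); ``base_value`` is a hypothetical placeholder chosen for illustration and is not taken from the fitted explainer:

```python
import numpy as np

# Rounded SHAP values copied from the printed output above
# (rows: 3 negative samples, then 3 positive samples; columns: 10 features)
shap_values = np.array([
    [-0.06, -0.07, -0.04, -0.04, -0.04,  0.00, -0.05, -0.07, -0.04, -0.04],
    [-0.07, -0.08, -0.04, -0.05, -0.04, -0.03, -0.06, -0.07, -0.04,  0.01],
    [-0.07, -0.09, -0.02, -0.04, -0.01, -0.03, -0.05, -0.06,  0.00, -0.04],
    [ 0.07,  0.09,  0.02,  0.05,  0.02,  0.03,  0.06,  0.07,  0.04,  0.02],
    [ 0.07,  0.09,  0.05,  0.05,  0.04,  0.01,  0.05,  0.07,  0.02,  0.03],
    [ 0.07,  0.08,  0.04,  0.05,  0.03,  0.03,  0.06,  0.07,  0.04,  0.02],
])

# Additivity: the row sum is the offset of a sample's model output
# from the expected (base) value
row_sums = shap_values.sum(axis=1)

# Hypothetical base value (assumption for illustration only);
# in practice the expected value of the fitted explainer would be used
base_value = 0.5
approx_output = base_value + row_sums

for i, (offset, output) in enumerate(zip(row_sums, approx_output), start=1):
    print(f"Sample {i}: SHAP sum = {offset:+.2f}, approx. output = {output:.2f}")
```

Under this assumption, the three negative samples fall well below the base value and the three positive samples well above it, which matches the sign pattern visible in the matrix.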


We can now include these SHAP values as feature impact (i.e., 