To demonstrate the ``ShapExplainer.fit()`` method, we load the DOM_GSEC example dataset and its corresponding feature set (see [Breimann24c]_):

In [4]:
import aaanalysis as aa
aa.options["verbose"] = False # Disable verbosity

df_seq = aa.load_dataset(name="DOM_GSEC", n=3)
labels = df_seq["label"].to_list()
df_feat = aa.load_features(name="DOM_GSEC").head(10)

# Create feature matrix
sf = aa.SequenceFeature()
df_parts = sf.get_df_parts(df_seq=df_seq)
X = sf.feature_matrix(features=df_feat["feature"], df_parts=df_parts)

aa.display_df(df_seq)

Unnamed: 0,entry,sequence,label,tmd_start,tmd_stop,jmd_n,tmd,jmd_c
1,Q14802,MQKVTLGLLVFLAGF...PGETPPLITPGSAQS,0,37,59,NSPFYYDWHS,LQVGGLICAGVLCAMGIIIVMSA,KCKCKFGQKS
2,Q86UE4,MAARSWQDELAQQAE...SPKQIKKKKKARRET,0,50,72,LGLEPKRYPG,WVILVGTGALGLLLLFLLGYGWA,AACAGARKKR
3,Q969W9,MHRLMGVNSTAAAAA...AIWSKEKDKQKGHPL,0,41,63,FQSMEITELE,FVQIIIIVVVMMVMVVVITCLLS,HYKLSARSFI
4,P05067,MLPGLALLLLAAWTA...GYENPTYKFFEQMQN,1,701,723,FAEDVGSNKG,AIIGLMVGGVVIATVIVITLVML,KKKQYTSIHH
5,P14925,MAGRARSGLLLLLLG...EEEYSAPLPKPAPSS,1,868,890,KLSTEPGSGV,SVVLITTLLVIPVLVLLAIVMFI,RWKKSRAFGD
6,P70180,MRSLLLFTFSACVLL...RELREDSIRSHFSVA,1,477,499,PCKSSGGLEE,SAVTGIVVGALLGAGLLMAFYFF,RKKYRITIER


We can now create a ``ShapExplainer`` object and fit it. The resulting SHAP values and the expected (base) value are then available through the ``shap_values`` and ``exp_value`` attributes:

In [8]:
se = aa.ShapExplainer()
se.fit(X, labels=labels)

shap_values = se.shap_values
exp_value = se.exp_value

# Print SHAP values and expected value
print("SHAP values explain the feature impact for 3 negative and 3 positive samples")
print(shap_values.round(2))

print("\nThe expected value approximates the expected model output (average prediction score).")
print("For a binary classification with balanced datasets, it is around 0.5:")
print(exp_value)

SHAP values explain the feature impact for 3 negative and 3 positive samples
[[-0.07 -0.06 -0.04 -0.04 -0.05 -0.   -0.06 -0.05 -0.05 -0.04]
 [-0.08 -0.08 -0.04 -0.04 -0.06 -0.03 -0.06 -0.06 -0.04  0.02]
 [-0.09 -0.08 -0.01 -0.04 -0.02 -0.03 -0.05 -0.05  0.   -0.05]
 [ 0.07  0.08  0.03  0.04  0.03  0.03  0.06  0.06  0.04  0.03]
 [ 0.08  0.08  0.04  0.05  0.06  0.01  0.05  0.06  0.02  0.03]
 [ 0.07  0.07  0.03  0.04  0.05  0.03  0.06  0.06  0.04  0.03]]

The expected value approximates the expected model output (average prediction score).
For a binary classification with balanced datasets, it is around 0.5:
0.5006666666666668
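
As a rough sanity check on the numbers printed above, recall that SHAP values are additive: for each sample, the base value plus the sum of its feature contributions approximates the model output. The sketch below is not part of the original notebook. It hard-codes the rounded values from the output above and assumes that additivity also holds for the values averaged across rounds; ``approx_output`` is an illustrative name:

```python
import numpy as np

# Rounded SHAP values copied from the printed output (rows: samples, columns: features)
shap_values = np.array([
    [-0.07, -0.06, -0.04, -0.04, -0.05, -0.00, -0.06, -0.05, -0.05, -0.04],
    [-0.08, -0.08, -0.04, -0.04, -0.06, -0.03, -0.06, -0.06, -0.04,  0.02],
    [-0.09, -0.08, -0.01, -0.04, -0.02, -0.03, -0.05, -0.05,  0.00, -0.05],
    [ 0.07,  0.08,  0.03,  0.04,  0.03,  0.03,  0.06,  0.06,  0.04,  0.03],
    [ 0.08,  0.08,  0.04,  0.05,  0.06,  0.01,  0.05,  0.06,  0.02,  0.03],
    [ 0.07,  0.07,  0.03,  0.04,  0.05,  0.03,  0.06,  0.06,  0.04,  0.03],
])
exp_value = 0.5006666666666668  # expected (base) value printed above

# Additivity: base value + sum of per-feature contributions ~ model output per sample
approx_output = exp_value + shap_values.sum(axis=1)
print(approx_output.round(2))
```

With these rounded inputs, the three negative samples land well below 0.5 and the three positive samples well above it, in line with their labels.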


To obtain Monte Carlo estimates of both the SHAP values and the expected value, the ``ShapExplainer.fit()`` method performs 5 rounds of model fitting by default and averages the results across all rounds. The number of rounds can be adjusted using the ``n_rounds`` parameter (default=5):

In [9]:
se = aa.ShapExplainer()
se = se.fit(X, labels=labels, n_rounds=10)

<aaanalysis.explainable_ai._shap_explainer.ShapExplainer at 0x7fd774ea1a60>
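
Why more rounds can give more stable estimates can be illustrated with a small simulation. The sketch below does not use the ``ShapExplainer`` implementation: ``one_round`` is a hypothetical stand-in that returns a noisy SHAP-like matrix, and the "true" value (0.05) and the noise level (0.02) are arbitrary assumptions. It only shows that averaging independent noisy estimates reduces their spread:

```python
import numpy as np

rng = np.random.default_rng(0)

def one_round(n_samples=6, n_features=10):
    """Hypothetical stand-in for one round of model fitting (noisy SHAP-like matrix)."""
    true_values = np.full((n_samples, n_features), 0.05)  # assumed "true" contributions
    noise = rng.normal(scale=0.02, size=(n_samples, n_features))  # assumed round-to-round noise
    return true_values + noise

def averaged_estimate(n_rounds):
    """Average the per-round matrices element-wise, as a Monte Carlo estimate."""
    return np.mean([one_round() for _ in range(n_rounds)], axis=0)

def spread(n_rounds, n_repeats=200):
    """Mean standard deviation of the averaged estimate over repeated runs."""
    estimates = np.stack([averaged_estimate(n_rounds) for _ in range(n_repeats)])
    return estimates.std(axis=0).mean()

spread_1, spread_5, spread_10 = spread(1), spread(5), spread(10)
print(f"n_rounds=1: {spread_1:.4f}, n_rounds=5: {spread_5:.4f}, n_rounds=10: {spread_10:.4f}")
```

In this toy setting the spread shrinks roughly with the square root of the number of rounds, so going from 5 to 10 rounds gives a smaller gain than going from 1 to 5, at twice the fitting cost.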