# Which of the interpretability methods is more faithful?

**Approach:**
Calculate Comprehensiveness and Sufficiency for both IG and SHAP.
* Which method performs better on each metric?
    - Why?
* Which labels stand out with particularly high or low scores?
    - Metric-specific tendencies?
    - Data bias?
* Finally: is IG or SHAP "better" overall?
    - Pros and cons of assessing the methods with faithfulness metrics

## Table of Contents
2. [Sufficiency](#suff)  
    2.1 [SHAP](#suff_shap)  
    2.2 [IG](#suff_ig)
3. [Additional Study](#suff_add)

In [1]:
import pickle
import numpy as np
from tabulate import tabulate
from statistics import mean

In [2]:
id2tag = {0: 'Anrede', 1: 'Diagnosen', 2: 'AllergienUnverträglichkeitenRisiken', 3: 'Anamnese',
          4: 'Medikation', 5: 'KUBefunde', 6: 'Befunde', 7: 'EchoBefunde', 8: 'Zusammenfassung',
          9: 'Mix', 10: 'Abschluss'}
tag2id = {tag: idx for idx, tag in id2tag.items()}  # inverse mapping: label name -> class id

labels = list(id2tag.values())

In [21]:
with open("data.p", "rb") as f:
    data = pickle.load(f)

In [None]:
with open("ig.p", "rb") as f:
    ig = pickle.load(f)
    
with open("shap.p", "rb") as f:
    shap = pickle.load(f)
    
with open("eva_ig.p", "rb") as f:
    eva_ig = pickle.load(f)
    
with open("eva_shap.p", "rb") as f:
    eva_shap = pickle.load(f)

In [5]:
compr, suff = {}, {}

for l in labels:
    # Positions of samples whose stored label d[0] matches (right) or differs from (wrong) the label l
    right = [i for i, d in enumerate(data[l]) if d[0] == l]
    wrong = [i for i, d in enumerate(data[l]) if d[0] != l]
    print(l, right, wrong)
    compr_ig = [e.score for eva in eva_ig[l] for e in eva.evaluation_scores if e.name=="aopc_compr"]
    compr_shap = [e.score for eva in eva_shap[l] for e in eva.evaluation_scores if e.name=="aopc_compr"]
    
    suff_ig = [e.score for eva in eva_ig[l] for e in eva.evaluation_scores if e.name=="aopc_suff"]
    suff_shap = [e.score for eva in eva_shap[l] for e in eva.evaluation_scores if e.name=="aopc_suff"]
    
    # Select scores by position via enumerate; list.index() returns the first equal value and
    # misassigns duplicate scores. np.nanmean of an empty selection returns nan and warns
    # "Mean of empty slice" (see the output below).
    compr[l] = {"IG": (np.nanmean([e for i, e in enumerate(compr_ig) if i in right]), np.nanmean([e for i, e in enumerate(compr_ig) if i in wrong])), 
                "SHAP": (np.nanmean([e for i, e in enumerate(compr_shap) if i in right]), np.nanmean([e for i, e in enumerate(compr_shap) if i in wrong]))}
    suff[l] = {"IG": (np.nanmean([e for i, e in enumerate(suff_ig) if i in right]), np.nanmean([e for i, e in enumerate(suff_ig) if i in wrong])), 
               "SHAP": (np.nanmean([e for i, e in enumerate(suff_shap) if i in right]), np.nanmean([e for i, e in enumerate(suff_shap) if i in wrong]))}

Anrede [0, 1, 2, 3, 4, 5, 6, 7, 8, 9] [10]
Diagnosen [0, 1, 2, 3, 4, 5, 6, 7, 8, 9] []
AllergienUnverträglichkeitenRisiken [0, 1, 2, 3, 4, 5, 6, 7, 8, 9] [10, 11, 12]
Anamnese [0, 1, 2, 3, 4, 6, 9, 10, 11, 12] [5, 7, 8, 13]
Medikation [0, 1, 2, 3, 4, 5, 6, 7, 8, 9] []
KUBefunde [0, 1, 2, 3, 5, 6, 7, 8, 9, 10] [4]
Befunde [0, 1, 2, 3, 4, 5, 6, 8, 9, 10] [7, 11]
EchoBefunde [0, 1, 2, 3, 4, 5, 6, 7, 8, 9] [10, 11, 12, 13, 14, 15, 16]
Zusammenfassung [0, 1, 2, 3, 4, 6, 7, 8, 9, 10] [5, 11, 12]
Mix [1, 2, 3, 4, 5, 6, 7, 8, 9, 10] [0, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20]
Abschluss [0, 2, 3, 4, 5, 6, 7, 8, 9, 11] [1, 10, 12, 13, 14, 15]


Mean of empty slice
Mean of empty slice
Mean of empty slice
Mean of empty slice


## **2. Sufficiency**<a name="suff"></a>
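$f(x)_j - f(r_j)_j$ → drop in the model's score for the target class $j$ when it sees only the rationale $r_j$ (the most important tokens) instead of the full input $x$.  
**Lower scores are better: the top 10-100% most important tokens alone should drive the prediction.**  
Measures whether the top $k\%$ tokens of an explanation ($k = 10, 20, \dots, 100$) are sufficient for the correct prediction; the AOPC variant averages the score difference over these steps.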
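
The step-wise definition can be made concrete with a minimal sketch. It assumes that the AOPC score is the plain mean of $f(x)_j - f(r_j)_j$ over the ten steps and, like the cells in this section, ranks only tokens with non-negative attribution and skips steps that would keep no token. `aopc_sufficiency` and the keyword scorer `toy_probs` are hypothetical stand-ins, not the evaluation library's implementation.

```python
import numpy as np


def aopc_sufficiency(tokens, attributions, prob_fn, target,
                     steps=np.linspace(0.1, 1.0, 10)):
    """Mean drop in the target probability when only the top-k% tokens are kept.

    Assumed procedure: drop tokens with negative attribution, rank the rest by
    score, keep the top round(n * k) tokens in their original order, and skip
    steps that would keep no token.
    """
    full = prob_fn(tokens)[target]  # f(x)_j on the full input
    candidates = [i for i, a in enumerate(attributions) if a >= 0]
    ranked = sorted(candidates, key=lambda i: attributions[i], reverse=True)
    diffs = []
    for k in steps:
        n_keep = int(round(len(ranked) * float(k)))
        if n_keep < 1:
            continue  # this step would keep zero tokens
        kept = [tokens[i] for i in sorted(ranked[:n_keep])]  # restore word order
        diffs.append(full - prob_fn(kept)[target])  # f(x)_j - f(r_j)_j
    return float(np.mean(diffs)) if diffs else float("nan")


def toy_probs(tokens):
    """Hypothetical two-class scorer: class 1 is likely iff 'echo' is present."""
    p1 = 0.9 if "echo" in tokens else 0.2
    return [1 - p1, p1]


tokens = ["echo", "normal", "the", "patient"]
good = aopc_sufficiency(tokens, [0.8, 0.3, 0.0, -0.1], toy_probs, target=1)
bad = aopc_sufficiency(tokens, [0.0, 0.1, 0.9, 0.2], toy_probs, target=1)
print(round(good, 3), round(bad, 3))  # → 0.0 0.544
```

A faithful ranking that puts `"echo"` first keeps the target probability unchanged (score 0), while a ranking that puts the filler token `"the"` first yields a large mean drop, i.e. worse sufficiency.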

### **2.1. SHAP**<a name="suff_shap"></a>

In [6]:
# Each cell lists the (right, wrong) mean scores; "wrong" is nan where no such sample has a score
table = [(l, [round(float(s), 2) for s in suff[l]["SHAP"]]) for l in labels]
table = sorted(table, key=lambda x: x[1][0])  # ascending by the "right" mean: best label first
table.insert(0, ["Label", "Sufficiency mean (right, wrong)"])

print(tabulate(table, headers="firstrow"))

Label                                Sufficiency mean (right, wrong)
-----------------------------------  ---------------------------------
Zusammenfassung                      [0.0, 0.01]
Anamnese                             [0.02, 0.05]
Befunde                              [0.02, 0.02]
EchoBefunde                          [0.04, nan]
AllergienUnverträglichkeitenRisiken  [0.06, nan]
Medikation                           [0.07, nan]
Anrede                               [0.1, nan]
Abschluss                            [0.13, 0.35]
Diagnosen                            [0.19, nan]
KUBefunde                            [0.26, 0.12]
Mix                                  [0.4, 0.1]


#### SHAP - Best Label: Zusammenfassung

In [8]:
sent = data["Zusammenfassung"][1][1]
score = 1.0 #bench.score(sent)
target = tag2id["Zusammenfassung"]
metr = eva_shap["Zusammenfassung"][1].explanation #bench.explain(sent, target=target)[0] ### SHAP ###
scores = metr.scores[1:-1]
tokens = [t if scores[i]>=0 else "[MASK]" for i, t in enumerate(metr.tokens[1:-1])] 

scores = [s for s in scores if s>=0]
aggr = []

#print(f"Original sentence: {sent} \tScore: {round(score[f'LABEL_{target}'],2)}\nFiltered: {[(t, s) for t, s in zip(tokens, metr.scores[1:-1])]}\n")
for i in np.arange(.1, 1.1, .1):
    sect = round(len(scores)*i)
    indices = np.argsort(scores)[::-1][:sect]
    filtered = list(filter(lambda x: x!= "[MASK]", tokens))
    # Get top k tokens
    top_tok = [filtered[i] for i in sorted(indices)]
    #s = tokenizer.convert_tokens_to_string(top_tok)
    new = shap["Zusammenfassung"][i] #score[f"LABEL_{target}"] - bench.score(s)[f"LABEL_{target}"]
    #print(f"{sect} important token(s) only: '{s}' affects original score: {round(new, 2)} | Labeled: {id2tag[np.argmax(list(bench.score(s).values()))]}: {np.max(list(bench.score(s).values()))}")
    if sect >= 1:
        aggr.append(new)

# Average over all steps; set() would silently drop repeated step scores
print(f"\nMean of all scores: {round(mean(aggr), 2)}")


Mean of all scores: -0.0


For comparison, the same sample explained with IG:

In [9]:
sent = data["Zusammenfassung"][1][1]
score = 1.0 #bench.score(sent)
target = tag2id["Zusammenfassung"]
metr = eva_ig["Zusammenfassung"][1].explanation #bench.explain(sent, target=target)[4] ### IG ###
scores = metr.scores[1:-1]
tokens = [t if scores[i]>=0 else "[MASK]" for i, t in enumerate(metr.tokens[1:-1])] 

scores = [s for s in scores if s>=0]
aggr = []

#print(f"Original sentence: {sent} \tScore: {round(score[f'LABEL_{target}'],2)}\nFiltered: {[(t, s) for t, s in zip(tokens, metr.scores[1:-1])]}\n")
for i in np.arange(.1, 1.1, .1):
    sect = round(len(scores)*i)
    indices = np.argsort(scores)[::-1][:sect]
    filtered = list(filter(lambda x: x!= "[MASK]", tokens))
    # Get top k tokens
    top_tok = [filtered[i] for i in sorted(indices)]
    #s = tokenizer.convert_tokens_to_string(top_tok)
    new = ig["Zusammenfassung"][i] #score[f"LABEL_{target}"] - bench.score(s)[f"LABEL_{target}"]
    #print(f"{sect} important token(s) only: '{s}' affects original score: {round(new, 2)} | Labeled: {id2tag[np.argmax(list(bench.score(s).values()))]}: {np.max(list(bench.score(s).values()))}")
    if sect >= 1:
        aggr.append(new)

# Average over all steps; set() would silently drop repeated step scores
print(f"\nMean of all scores: {round(mean(aggr), 2)}")


Mean of all scores: 0.99


#### SHAP - Worst Label: Mix

In [11]:
sent = data["Mix"][4][1]
score = 1.0 #bench.score(sent)
target = tag2id["Mix"]
metr = eva_shap["Mix"][4].explanation #bench.explain(sent, target=target)[0] ### SHAP ###
scores = metr.scores[1:-1]
tokens = [t if scores[i]>=0 else "[MASK]" for i, t in enumerate(metr.tokens[1:-1])] 

scores = [s for s in scores if s>=0]
aggr = []

#print(f"Original sentence: {sent} \tScore: {round(score[f'LABEL_{target}'],2)}\nFiltered: {[(t, s) for t, s in zip(tokens, metr.scores[1:-1])]}\n")
for i in np.arange(.1, 1.1, .1):
    sect = round(len(scores)*i)
    indices = np.argsort(scores)[::-1][:sect]
    filtered = list(filter(lambda x: x!= "[MASK]", tokens))
    # Get top k tokens
    top_tok = [filtered[i] for i in sorted(indices)]
    #s = tokenizer.convert_tokens_to_string(top_tok)
    new = shap["Mix"][i] #score[f"LABEL_{target}"] - bench.score(s)[f"LABEL_{target}"]
    #print(f"{sect} important token(s) only: '{s}' affects original score: {round(new, 2)} | Labeled: {id2tag[np.argmax(list(bench.score(s).values()))]}: {np.max(list(bench.score(s).values()))}")
    if sect >= 1:
        aggr.append(new)

# Average over all steps; set() would silently drop repeated step scores
print(f"\nMean of all scores: {round(mean(aggr), 2)}")


Mean of all scores: 0.56


### **2.2. Integrated Gradients**<a name="suff_ig"></a>

In [12]:
# Each cell lists the (right, wrong) mean scores; "wrong" is nan where no such sample has a score
table = [(l, [round(float(s), 2) for s in suff[l]["IG"]]) for l in labels]
table = sorted(table, key=lambda x: x[1][0])  # ascending by the "right" mean: best label first
table.insert(0, ["Label", "Sufficiency mean (right, wrong)"])

print(tabulate(table, headers="firstrow"))

Label                                Sufficiency mean (right, wrong)
-----------------------------------  ---------------------------------
Befunde                              [0.34, 0.98]
EchoBefunde                          [0.37, nan]
Medikation                           [0.43, nan]
Abschluss                            [0.43, 0.66]
Zusammenfassung                      [0.48, 0.01]
Anamnese                             [0.49, 0.4]
AllergienUnverträglichkeitenRisiken  [0.6, nan]
Diagnosen                            [0.66, nan]
Anrede                               [0.71, nan]
KUBefunde                            [0.79, 0.61]
Mix                                  [0.79, 0.95]
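
To answer the "which method performs better" question per label, the SHAP and IG tables can be paired. The sketch below assumes the `suff[label][method] = (right, wrong)` layout built earlier and reports each method's mean over labels plus the labels on which SHAP scores lower (better). To stay self-contained it runs on the rounded "right" values copied from the two tables rather than on the unrounded `suff` dict; `compare_methods` is a hypothetical helper.

```python
import numpy as np


def compare_methods(suff, labels, col=0):
    """Compare SHAP and IG on one column of `suff` (0 = "right", 1 = "wrong" mean).

    Returns each method's mean over labels (NaN entries ignored) and the labels
    on which SHAP has the lower, i.e. better, sufficiency score.
    """
    shap_vals = [suff[l]["SHAP"][col] for l in labels]
    ig_vals = [suff[l]["IG"][col] for l in labels]
    # NaN comparisons are False, so labels with missing scores never count as a SHAP win
    shap_better = [l for l, s, g in zip(labels, shap_vals, ig_vals) if s < g]
    return float(np.nanmean(shap_vals)), float(np.nanmean(ig_vals)), shap_better


# Rounded "right" means copied from the SHAP and IG tables: (SHAP, IG)
table_right = {
    "Zusammenfassung": (0.0, 0.48), "Anamnese": (0.02, 0.49), "Befunde": (0.02, 0.34),
    "EchoBefunde": (0.04, 0.37), "AllergienUnverträglichkeitenRisiken": (0.06, 0.6),
    "Medikation": (0.07, 0.43), "Anrede": (0.1, 0.71), "Abschluss": (0.13, 0.43),
    "Diagnosen": (0.19, 0.66), "KUBefunde": (0.26, 0.79), "Mix": (0.4, 0.79),
}
# Rebuild the suff[label][method] = (right, wrong) shape; "wrong" is unused here
demo = {l: {"SHAP": (s, np.nan), "IG": (g, np.nan)} for l, (s, g) in table_right.items()}

shap_mean, ig_mean, shap_better = compare_methods(demo, list(table_right))
print(round(shap_mean, 2), round(ig_mean, 2), len(shap_better))  # → 0.12 0.55 11
```

On these rounded values SHAP has the lower sufficiency on all eleven labels. Calling `compare_methods(suff, labels)` in the notebook would use the unrounded means, and `col=1` compares the "wrong" column, where NaN entries are ignored in the means and never count as a SHAP win.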


#### IG - Best Label: EchoBefunde

In [14]:
sent = data["EchoBefunde"][3][1]
score = 1.0 #bench.score(sent)
target = tag2id["EchoBefunde"]
metr = eva_ig["EchoBefunde"][3].explanation #bench.explain(sent, target=target)[4] ### IG ###
scores = metr.scores[1:-1]
tokens = [t if scores[i]>=0 else "[MASK]" for i, t in enumerate(metr.tokens[1:-1])] 

scores = [s for s in scores if s>=0]
aggr = []

#print(f"Original sentence: {sent} \tScore: {round(score[f'LABEL_{target}'],2)}\nFiltered: {[(t, s) for t, s in zip(tokens, metr.scores[1:-1])]}\n")
for i in np.arange(.1, 1.1, .1):
    sect = round(len(scores)*i)
    indices = np.argsort(scores)[::-1][:sect]
    filtered = list(filter(lambda x: x!= "[MASK]", tokens))
    # Get top k tokens
    top_tok = [filtered[i] for i in sorted(indices)]
    #s = tokenizer.convert_tokens_to_string(top_tok)
    new = ig["EchoBefunde"][i] #score[f"LABEL_{target}"] - bench.score(s)[f"LABEL_{target}"]
    #print(f"{sect} important token(s) only: '{s}' affects original score: {round(new, 2)} | Labeled: {id2tag[np.argmax(list(bench.score(s).values()))]}: {np.max(list(bench.score(s).values()))}")
    if sect >= 1:
        aggr.append(new)

# Average over all steps; set() would silently drop repeated step scores
print(f"\nMean of all scores: {round(mean(aggr), 2)}")


Mean of all scores: 0.34


#### IG - Worst Label: Mix

In [26]:
sent = data["Mix"][4][1]
score = 0.94 #bench.score(sent)
target = tag2id["Mix"]
metr = eva_ig["Mix"][4].explanation #bench.explain(sent, target=target)[4] ### IG ###
scores = metr.scores[1:-1]
tokens = [t if scores[i]>=0 else "[MASK]" for i, t in enumerate(metr.tokens[1:-1])] 

scores = [s for s in scores if s>=0]
aggr = []

#print(f"Original sentence: {sent} \tScore: {round(score[f'LABEL_{target}'],2)}\nFiltered: {[(t, s) for t, s in zip(tokens, metr.scores[1:-1])]}\n")
for i in np.arange(.1, 1.1, .1):
    sect = round(len(scores)*i)
    indices = np.argsort(scores)[::-1][:sect]
    filtered = list(filter(lambda x: x!= "[MASK]", tokens))
    # Get top k tokens
    top_tok = [filtered[i] for i in sorted(indices)]
    #s = tokenizer.convert_tokens_to_string(top_tok)
    new = ig["Suff_Mix"][i] #score[f"LABEL_{target}"] - bench.score(s)[f"LABEL_{target}"]
    #print(f"{sect} important token(s) only: '{s}' affects original score: {round(new, 2)} | Labeled: {id2tag[np.argmax(list(bench.score(s).values()))]}: {np.max(list(bench.score(s).values()))}")
    if sect >= 1:
        aggr.append(new)

# Average over all steps; set() would silently drop repeated step scores
print(f"\nMean of all scores: {round(mean(aggr), 2)}")


Mean of all scores: 0.93


## Additional Study: Include negative contributing tokens<a name="suff_add"></a>
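
The cells in this study read precomputed `"Add"` scores, so the ranking itself is not shown. A minimal sketch of one mechanical effect, assuming the variant ranks all tokens (negative ones included) instead of only the non-negative ones: the pool size n grows, so round(n · k) keeps more tokens at every step, while the negative tokens, which rank last, are only reached once every non-negative token is already kept. `tokens_per_step` and the attribution values are hypothetical; the helper only selects token positions and does not query a model.

```python
import numpy as np


def tokens_per_step(scores, include_negative, steps=np.linspace(0.1, 1.0, 10)):
    """For each step k, return the positions of the top round(n * k) tokens in word order."""
    # Candidate pool: all tokens, or only those with non-negative attribution
    pool = [i for i, s in enumerate(scores) if include_negative or s >= 0]
    ranked = sorted(pool, key=lambda i: scores[i], reverse=True)
    return [sorted(ranked[:int(round(len(ranked) * float(k)))]) for k in steps]


scores = [0.8, 0.5, -0.2, 0.1, -0.4, 0.3]  # hypothetical attributions, two negative
pos_only = tokens_per_step(scores, include_negative=False)
with_neg = tokens_per_step(scores, include_negative=True)
print([len(s) for s in pos_only])  # → [0, 1, 1, 2, 2, 2, 3, 3, 4, 4]
print([len(s) for s in with_neg])  # → [1, 1, 2, 2, 3, 4, 4, 5, 5, 6]
```

Here the positive-only ranking keeps no token at the 10% step (so that step is skipped) and never keeps more tokens than the other variant, whereas the negative tokens first appear at the 80% step. This offers one possible explanation why including negative tokens can change scores already in the early steps; whether the precomputed scores were produced exactly this way is an assumption.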

#### SHAP - Best Label: Zusammenfassung

In [17]:
sent = data["Zusammenfassung"][1][1]
score = 1.0 #bench.score(sent)
target = tag2id["Zusammenfassung"]
metr = eva_shap["Zusammenfassung"][1].explanation #bench.explain(sent, target=target)[0] ### SHAP ###
scores = metr.scores[1:-1]
tokens = [t if scores[i]>=0 else "[MASK]" for i, t in enumerate(metr.tokens[1:-1])] 

scores = [s for s in scores if s>=0]
aggr = []

#print(f"Original sentence: {sent} \tScore: {round(score[f'LABEL_{target}'],2)}\nFiltered: {[(t, s) for t, s in zip(tokens, metr.scores[1:-1])]}\n")
for i in np.arange(.1, 1.1, .1):
    sect = round(len(scores)*i)
    indices = np.argsort(scores)[::-1][:sect]
    filtered = list(filter(lambda x: x!= "[MASK]", tokens))
    # Get top k tokens
    top_tok = [filtered[i] for i in sorted(indices)]
    #s = tokenizer.convert_tokens_to_string(top_tok)
    new = shap["Zusammenfassung"]["Add"][i] #score[f"LABEL_{target}"] - bench.score(s)[f"LABEL_{target}"]
    #print(f"{sect} important token(s) only: '{s}' affects original score: {round(new, 2)} | Labeled: {id2tag[np.argmax(list(bench.score(s).values()))]}: {np.max(list(bench.score(s).values()))}")
    if sect >= 1:
        aggr.append(new)

# Average over all steps; set() would silently drop repeated step scores
print(f"\nMean of all scores: {round(mean(aggr), 2)}")


Mean of all scores: 0.02


<span style="color:purple">**! Including negative tokens has no notable effect (mean score 0.02 vs. -0.0 with positive tokens only)** </span>

#### SHAP - Worst Label: Mix

In [18]:
sent = data["Mix"][4][1]
score = 1.0 #bench.score(sent)
target = tag2id["Mix"]
metr = eva_shap["Mix"][4].explanation #bench.explain(sent, target=target)[0] ### SHAP ###
scores = metr.scores[1:-1]
tokens = [t if scores[i]>=0 else "[MASK]" for i, t in enumerate(metr.tokens[1:-1])] 

scores = [s for s in scores if s>=0]
aggr = []

#print(f"Original sentence: {sent} \tScore: {round(score[f'LABEL_{target}'],2)}\nFiltered: {[(t, s) for t, s in zip(tokens, metr.scores[1:-1])]}\n")
for i in np.arange(.1, 1.1, .1):
    sect = round(len(scores)*i)
    indices = np.argsort(scores)[::-1][:sect]
    filtered = list(filter(lambda x: x!= "[MASK]", tokens))
    # Get top k tokens
    top_tok = [filtered[i] for i in sorted(indices)]
    #s = tokenizer.convert_tokens_to_string(top_tok)
    new = shap["Mix"]["Add"][i] #score[f"LABEL_{target}"] - bench.score(s)[f"LABEL_{target}"]
    #print(f"{sect} important token(s) only: '{s}' affects original score: {round(new, 2)} | Labeled: {id2tag[np.argmax(list(bench.score(s).values()))]}: {np.max(list(bench.score(s).values()))}")
    if sect >= 1:
        aggr.append(new)

# Average over all steps; set() would silently drop repeated step scores
print(f"\nMean of all scores: {round(mean(aggr), 2)}")


Mean of all scores: 0.65


<span style="color:purple">**! Including negative tokens worsens the score (mean score 0.65 vs. 0.56 with positive tokens only)** </span>

#### IG - Worst Label: Mix

In [19]:
sent = data["Mix"][4][1]
score = 0.94 #bench.score(sent)
target = tag2id["Mix"]
metr = eva_ig["Mix"][4].explanation #bench.explain(sent, target=target)[4] ### IG ###
scores = metr.scores[1:-1]
tokens = [t if scores[i]>=0 else "[MASK]" for i, t in enumerate(metr.tokens[1:-1])] 

scores = [s for s in scores if s>=0]
aggr = []

#print(f"Original sentence: {sent} \tScore: {round(score[f'LABEL_{target}'],2)}\nFiltered: {[(t, s) for t, s in zip(tokens, metr.scores[1:-1])]}\n")
for i in np.arange(.1, 1.1, .1):
    sect = round(len(scores)*i)
    indices = np.argsort(scores)[::-1][:sect]
    filtered = list(filter(lambda x: x!= "[MASK]", tokens))
    # Get top k tokens
    top_tok = [filtered[i] for i in sorted(indices)]
    #s = tokenizer.convert_tokens_to_string(top_tok)
    new = ig["Suff_Mix"]["Add"][i] #score[f"LABEL_{target}"] - bench.score(s)[f"LABEL_{target}"]
    #print(f"{sect} important token(s) only: '{s}' affects original score: {round(new, 2)} | Labeled: {id2tag[np.argmax(list(bench.score(s).values()))]}: {np.max(list(bench.score(s).values()))}")
    if sect >= 1:
        aggr.append(new)

# Average over all steps; set() would silently drop repeated step scores
print(f"\nMean of all scores: {round(mean(aggr), 2)}")


Mean of all scores: 0.66


<span style="color:purple">**! Including negative tokens improves the score throughout the second half of the steps: the correct label is predicted in the last two steps, and in the steps before, the scores for incorrect labels are reduced (mean score 0.66 vs. 0.93 with positive tokens only)** </span>

#### IG - Best Label: EchoBefunde

In [20]:
sent = data["EchoBefunde"][3][1]
score = 1.0 #bench.score(sent)
target = tag2id["EchoBefunde"]
metr = eva_ig["EchoBefunde"][3].explanation #bench.explain(sent, target=target)[4] ### IG ###
scores = metr.scores[1:-1]
tokens = [t if scores[i]>=0 else "[MASK]" for i, t in enumerate(metr.tokens[1:-1])] 

scores = [s for s in scores if s>=0]
aggr = []

#print(f"Original sentence: {sent} \tScore: {round(score[f'LABEL_{target}'],2)}\nFiltered: {[(t, s) for t, s in zip(tokens, metr.scores[1:-1])]}\n")
for i in np.arange(.1, 1.1, .1):
    sect = round(len(scores)*i)
    indices = np.argsort(scores)[::-1][:sect]
    filtered = list(filter(lambda x: x!= "[MASK]", tokens))
    # Get top k tokens
    top_tok = [filtered[i] for i in sorted(indices)]
    #s = tokenizer.convert_tokens_to_string(top_tok)
    new = ig["EchoBefunde"]["Add"][i] #score[f"LABEL_{target}"] - bench.score(s)[f"LABEL_{target}"]
    #print(f"{sect} important token(s) only: '{s}' affects original score: {round(new, 2)} | Labeled: {id2tag[np.argmax(list(bench.score(s).values()))]}: {np.max(list(bench.score(s).values()))}")
    if sect >= 1:
        aggr.append(new)

# Average over all steps; set() would silently drop repeated step scores
print(f"\nMean of all scores: {round(mean(aggr), 2)}")


Mean of all scores: 0.08


<span style="color:purple">**! Including negative tokens improves the score substantially from the first step on, so that the correct label is predicted (mean score 0.08 vs. 0.34 with positive tokens only)** </span>