# Which of the interpretability methods is more faithful?

**Approach:**
Compute Comprehensiveness and Sufficiency for both IG and SHAP.
* Which method performs better on each metric?
    - Reasons?
* Which labels stand out for scoring high or low?
    - Metric-specific tendencies?
    - Data bias?
* Finally: is IG or SHAP "better" overall?
    - Pros/cons of the approach with the Faithfulness metric
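
Sufficiency is named in the approach but not worked through step by step in this notebook. As a minimal sketch, assuming the usual definition $f(x)_j - f(r_j)_j$ (keep only the most important tokens $r_j$ and measure how much of the class score is lost, so lower is better), it could be computed as below. The function name `aopc_sufficiency`, the toy scorer `cue_score` and the example attributions are illustrative assumptions, not the code that produced the cached evaluations.

```python
from typing import Callable, List, Sequence

FRACTIONS = (0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0)


def aopc_sufficiency(
    tokens: Sequence[str],
    attributions: Sequence[float],
    score_fn: Callable[[List[str]], float],
    fractions: Sequence[float] = FRACTIONS,
) -> float:
    """Average score loss when only the top-k attributed tokens are kept."""
    full_score = score_fn(list(tokens))
    # Token positions ordered from most to least important.
    ranked = sorted(range(len(tokens)), key=lambda i: attributions[i], reverse=True)
    gaps = []
    for frac in fractions:
        keep = set(ranked[: round(len(tokens) * frac)])
        kept = [t for i, t in enumerate(tokens) if i in keep]
        gaps.append(full_score - score_fn(kept))
    return sum(gaps) / len(gaps)


# Hypothetical stand-in for a classifier: the class score depends on one cue word.
def cue_score(toks: List[str]) -> float:
    return 1.0 if "Sehr" in toks else 0.0


tokens = ["Sehr", "geehrte", "Frau", "Kollegin"]
suff_good = aopc_sufficiency(tokens, [0.9, 0.05, 0.03, 0.02], cue_score)
suff_bad = aopc_sufficiency(tokens, [0.02, 0.03, 0.05, 0.9], cue_score)
```

An attribution that ranks the cue word first keeps the score intact with very few tokens, so its sufficiency is low; an attribution that ranks it last only recovers the score once almost everything is kept.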

## Table of Contents
1. [Comprehensiveness](#comp)  
    1.1 [SHAP](#comp_shap)  
    1.2 [IG](#comp_ig)
2. [Additional Study](#comp_add)

In [1]:
import pickle
import numpy as np
from tabulate import tabulate
from statistics import mean

In [2]:
id2tag = {0: 'Anrede', 1: 'Diagnosen', 2: 'AllergienUnverträglichkeitenRisiken', 3: 'Anamnese', 4: 'Medikation', 5: 'KUBefunde', 6: 'Befunde', 7: 'EchoBefunde', 8: 'Zusammenfassung', 9: 'Mix', 10: 'Abschluss'}
tag2id = {tag: id for id, tag in id2tag.items()}

labels = list(id2tag.values())

In [3]:
with open("data.p", "rb") as f:
    data = pickle.load(f)

In [None]:
with open("ig.p", "rb") as f:
    ig = pickle.load(f)
    
with open("shap.p", "rb") as f:
    shap = pickle.load(f)

with open("eva_ig.p", "rb") as f:
    eva_ig = pickle.load(f)
    
with open("eva_shap.p", "rb") as f:
    eva_shap = pickle.load(f)

In [5]:
compr, suff = {}, {}

def metric_scores(evaluations, name):
    # One score per evaluated sample for the metric called `name`.
    return [e.score for eva in evaluations for e in eva.evaluation_scores if e.name == name]

def right_wrong_means(scores, right, wrong):
    # Select by position: list.index() would map duplicate score values
    # onto their first occurrence and mix up the two groups.
    return (np.nanmean([s for i, s in enumerate(scores) if i in right]),
            np.nanmean([s for i, s in enumerate(scores) if i in wrong]))

for l in labels:
    # Positions of samples whose first field matches / does not match the label l.
    right = [i for i, d in enumerate(data[l]) if d[0] == l]
    wrong = [i for i, d in enumerate(data[l]) if d[0] != l]
    print(l, right, wrong)

    compr[l] = {"IG": right_wrong_means(metric_scores(eva_ig[l], "aopc_compr"), right, wrong),
                "SHAP": right_wrong_means(metric_scores(eva_shap[l], "aopc_compr"), right, wrong)}
    suff[l] = {"IG": right_wrong_means(metric_scores(eva_ig[l], "aopc_suff"), right, wrong),
               "SHAP": right_wrong_means(metric_scores(eva_shap[l], "aopc_suff"), right, wrong)}

Anrede [0, 1, 2, 3, 4, 5, 6, 7, 8, 9] [10]
Diagnosen [0, 1, 2, 3, 4, 5, 6, 7, 8, 9] []
AllergienUnverträglichkeitenRisiken [0, 1, 2, 3, 4, 5, 6, 7, 8, 9] [10, 11, 12]
Anamnese [0, 1, 2, 3, 4, 6, 9, 10, 11, 12] [5, 7, 8, 13]
Medikation [0, 1, 2, 3, 4, 5, 6, 7, 8, 9] []
KUBefunde [0, 1, 2, 3, 5, 6, 7, 8, 9, 10] [4]
Befunde [0, 1, 2, 3, 4, 5, 6, 8, 9, 10] [7, 11]
EchoBefunde [0, 1, 2, 3, 4, 5, 6, 7, 8, 9] [10, 11, 12, 13, 14, 15, 16]
Zusammenfassung [0, 1, 2, 3, 4, 6, 7, 8, 9, 10] [5, 11, 12]
Mix [1, 2, 3, 4, 5, 6, 7, 8, 9, 10] [0, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20]
Abschluss [0, 2, 3, 4, 5, 6, 7, 8, 9, 11] [1, 10, 12, 13, 14, 15]


Mean of empty slice
Mean of empty slice
Mean of empty slice
Mean of empty slice
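
`np.nanmean` prints `Mean of empty slice` when the selection it receives is empty or contains only NaN values, which is what happens for labels without usable scores in one of the two groups. A minimal sketch of a helper that makes this case explicit and returns NaN without a warning; the names `group_mean` and `right_wrong_means` and the example scores are illustrative assumptions.

```python
import math
from statistics import fmean
from typing import Iterable, Sequence, Tuple


def group_mean(scores: Sequence[float], positions: Iterable[int]) -> float:
    """Mean of the scores at `positions`, ignoring NaNs; NaN if nothing is left."""
    wanted = set(positions)
    picked = [s for i, s in enumerate(scores) if i in wanted and not math.isnan(s)]
    return fmean(picked) if picked else math.nan


def right_wrong_means(
    scores: Sequence[float], right: Iterable[int], wrong: Iterable[int]
) -> Tuple[float, float]:
    """Pair of means: (correctly predicted samples, incorrectly predicted samples)."""
    return group_mean(scores, right), group_mean(scores, wrong)


# Illustrative values only, not taken from the cached evaluations.
scores = [0.9, 0.9, 0.4, float("nan")]
means = right_wrong_means(scores, right=[0, 1, 3], wrong=[2])
no_wrong = right_wrong_means(scores, right=[0, 1, 2], wrong=[])
```

Returning NaN for an empty group keeps the tables unchanged while removing the ambiguity about whether a `nan` cell means "no samples" or "all scores missing".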


## **1. Comprehensiveness**<a name="comp"></a>
$f(x)_j - f(x \setminus r_j)_j$ → Score difference for class $j$ once the most important tokens $r_j$ are removed.   
**Higher scores are better: excluding the top 10-100% most important tokens should harm the prediction.**  
We expect the score for a sentence to drop sharply once the top 10% most significant tokens are removed.
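
The definition above can be sketched in a few lines. This is a simplified illustration, not the evaluator that produced the cached scores: it ranks all tokens by attribution (the manual cells below only count tokens with non-negative scores), and `aopc_comprehensiveness`, `toy_score` and the example attributions are hypothetical names and values.

```python
from typing import Callable, List, Sequence

FRACTIONS = (0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0)


def aopc_comprehensiveness(
    tokens: Sequence[str],
    attributions: Sequence[float],
    score_fn: Callable[[List[str]], float],
    fractions: Sequence[float] = FRACTIONS,
) -> float:
    """Average drop in the class score as the top-k attributed tokens are removed."""
    full_score = score_fn(list(tokens))
    # Token positions ordered from most to least important.
    ranked = sorted(range(len(tokens)), key=lambda i: attributions[i], reverse=True)
    drops = []
    for frac in fractions:
        removed = set(ranked[: round(len(tokens) * frac)])
        reduced = [t for i, t in enumerate(tokens) if i not in removed]
        drops.append(full_score - score_fn(reduced))
    return sum(drops) / len(drops)


# Hypothetical stand-in for a classifier: the class score depends on one cue word.
def toy_score(toks: List[str]) -> float:
    return 1.0 if "Sehr" in toks else 0.0


tokens = ["Sehr", "geehrte", "Frau", "Kollegin"]
compr_good = aopc_comprehensiveness(tokens, [0.9, 0.05, 0.03, 0.02], toy_score)
compr_bad = aopc_comprehensiveness(tokens, [0.02, 0.03, 0.05, 0.9], toy_score)
```

An attribution that puts the decisive token first makes the score collapse at the earliest removal steps and therefore earns a high mean; one that ranks it last only registers a drop near 100% removal.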

### **1.1. SHAP**<a name="comp_shap"></a>

In [6]:
table = [(l, [[round(s, 2) for s in v] for k, v in compr[l].items() if k == "SHAP"]) for l in labels]
table = sorted(table, key = lambda x: x[1][0][0], reverse = True)
table.insert(0, ["Label", "Comprehensiveness mean scores"])

print(tabulate(table, headers="firstrow"))

Label                                Comprehensiveness mean scores
-----------------------------------  -------------------------------
Anrede                               [[1.0, nan]]
AllergienUnverträglichkeitenRisiken  [[0.85, nan]]
Mix                                  [[0.85, 0.99]]
Zusammenfassung                      [[0.82, 0.12]]
KUBefunde                            [[0.76, 0.93]]
Diagnosen                            [[0.75, nan]]
EchoBefunde                          [[0.66, nan]]
Anamnese                             [[0.65, 0.62]]
Befunde                              [[0.65, 0.79]]
Medikation                           [[0.64, nan]]
Abschluss                            [[0.58, 0.86]]
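
Each cell in the table holds a nested `[[right, wrong]]` pair, i.e. the mean over correctly and over incorrectly predicted samples. A sketch of how the same information could be laid out in separate columns, using only the standard library; `flat_rows` and `render` are hypothetical helpers, and the three example rows are copied from the SHAP table above.

```python
import math
from typing import Dict, List, Tuple

Row = Tuple[str, float, float]


def flat_rows(scores_by_label: Dict[str, Dict[str, Tuple[float, float]]], method: str) -> List[Row]:
    """One (label, right, wrong) row per label, best 'right' mean first."""
    rows = [(label, *per_method[method]) for label, per_method in scores_by_label.items()]
    return sorted(rows, key=lambda r: r[1], reverse=True)


def render(rows: List[Row]) -> str:
    """Plain-text table with NaN shown as '-'."""
    def cell(v: float) -> str:
        return "-" if math.isnan(v) else f"{v:.2f}"

    width = max(len(r[0]) for r in rows)
    lines = [f"{'Label':<{width}}  {'Right':>5}  {'Wrong':>5}"]
    for label, right, wrong in rows:
        lines.append(f"{label:<{width}}  {cell(right):>5}  {cell(wrong):>5}")
    return "\n".join(lines)


# Three rows taken from the SHAP table above.
compr_demo = {
    "Abschluss": {"SHAP": (0.58, 0.86)},
    "Anrede": {"SHAP": (1.0, math.nan)},
    "Zusammenfassung": {"SHAP": (0.82, 0.12)},
}
rows = flat_rows(compr_demo, "SHAP")
text = render(rows)
```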


### Manual Comprehensiveness Scores

#### SHAP - Best Label: Anrede<a name="comp_shap_An"></a>

In [8]:
sent = data["Anrede"][9][1]
score = 1.0 #bench.score(sent)
target = tag2id["Anrede"]
metr = eva_shap["Anrede"][9].explanation #bench.explain(sent, target=target)[0] ### SHAP ###
scores = list(metr.scores[1:-1])
tokens = [t if scores[i]>=0 else "[MASK]" for i, t in enumerate(metr.tokens[1:-1])] 

aggr = []

#print(f"Original sentence: {sent} \tScore: {round(score[f'LABEL_{target}'],2)}\nFiltered: {[(t, s) for t, s in zip(tokens, scores)]}\n")

for i in np.arange(.1, 1.1, .1):
    sect = round(len([s for s in scores if s>=0])*i)
    sor_scores = np.sort(scores)[::-1]
    sentence = metr.tokens[1:-1]
    real_i = [scores.index(s) for s in sor_scores if s>=0][:sect]
    for x in real_i:
        sentence[x] = "[MASK]"
    sentence = list(filter(lambda x: x!= "[MASK]", sentence))
    #sentence = tokenizer.convert_tokens_to_string(sentence)
    new = shap["Anrede"][i] #score[f"LABEL_{target}"] - bench.score(sentence)[f"LABEL_{target}"]
    #print(f"{sect} important token(s) removed: {sentence} \t affects original sentence score: {round(new, 2)} | Labeled: {id2tag[np.argmax(list(bench.score(sentence).values()))]}: {np.max(list(bench.score(sentence).values()))}")
    aggr.append(new)

print(f"\nMean of all scores: {round(mean(set(aggr)),2)}")


Mean of all scores: 1.0
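
The cell above maps sorted scores back to token positions with `scores.index(s)`, which always returns the first match, so two tokens with the same attribution would resolve to the same position. A minimal sketch of ranking by position instead; `top_positive_positions` and `remove_positions` are hypothetical helpers and the scores are made up for illustration.

```python
from typing import List, Sequence


def top_positive_positions(scores: Sequence[float], fraction: float) -> List[int]:
    """Positions of the top `fraction` of non-negative scores, highest first."""
    positive = [i for i, s in enumerate(scores) if s >= 0]
    # sorted() is stable, so tied scores keep their left-to-right order.
    ranked = sorted(positive, key=lambda i: scores[i], reverse=True)
    return ranked[: round(len(positive) * fraction)]


def remove_positions(tokens: Sequence[str], positions: Sequence[int]) -> List[str]:
    """Copy of `tokens` without the given positions."""
    drop = set(positions)
    return [t for i, t in enumerate(tokens) if i not in drop]


tokens = ["Sehr", "geehrte", "Frau", "Kollegin", ","]
scores = [0.4, 0.4, 0.1, 0.3, -0.2]  # the first two tokens are tied
top_half = top_positive_positions(scores, 0.5)
reduced = remove_positions(tokens, top_half)
```

With the `index`-based lookup both tied tokens would point at position 0, so only one of them would be removed.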


#### IG: Anrede<a name="comp_ig_An"></a>

In [22]:
sent = data["Anrede"][9][1]
score = 1.0 #bench.score(sent)
target = tag2id["Anrede"]
metr = eva_ig["Anrede"][9].explanation #bench.explain(sent, target=target)[0] ### IG ###
scores = list(metr.scores[1:-1])
tokens = [t if scores[i]>=0 else "[MASK]" for i, t in enumerate(metr.tokens[1:-1])] 

aggr = []

#print(f"Original sentence: {sent} \tScore: {round(score[f'LABEL_{target}'],2)}\nFiltered: {[(t, s) for t, s in zip(tokens, scores)]}\n")

for i in np.arange(.1, 1.1, .1):
    sect = round(len([s for s in scores if s>=0])*i)
    sor_scores = np.sort(scores)[::-1]
    sentence = metr.tokens[1:-1]
    real_i = [scores.index(s) for s in sor_scores if s>=0][:sect]
    for x in real_i:
        sentence[x] = "[MASK]"
    sentence = list(filter(lambda x: x!= "[MASK]", sentence))
    #sentence = tokenizer.convert_tokens_to_string(sentence)
    new = ig["Anrede"][i] #score[f"LABEL_{target}"] - bench.score(sentence)[f"LABEL_{target}"]
    #print(f"{sect} important token(s) removed: {sentence} \t affects original sentence score: {round(new, 2)} | Labeled: {id2tag[np.argmax(list(bench.score(sentence).values()))]}: {np.max(list(bench.score(sentence).values()))}")
    aggr.append(new)

print(f"\nMean of all scores: {round(mean(set(aggr)),2)}")


Mean of all scores: 0.51


#### SHAP - Worst Label: Abschluss<a name="comp_shap_Ab"></a>

In [11]:
sent = data["Abschluss"][2][1]
target = tag2id["Abschluss"]
score = 1.0 #bench.score(sent)
metr = eva_shap["Abschluss"][1].explanation #bench.explain(sent, target=target)[4] ### SHAP ###
scores = list(metr.scores[1:-1])
tokens = [t if scores[i]>=0 else "[MASK]" for i, t in enumerate(metr.tokens[1:-1])] 

aggr = []

#print(f"Original sentence: {sent} \tScore: {round(score[f'LABEL_{target}'],2)}\nFiltered: {[(t, s) for t, s in zip(tokens, scores)]}\n")

for i in np.arange(.1, 1.1, .1):
    sect = round(len([s for s in scores if s>=0])*i)
    sor_scores = np.sort(scores)[::-1][:sect]
    sentence = metr.tokens[1:-1]
    real_i = [scores.index(s) for s in sor_scores if s>=0]
    for x in real_i:
        sentence[x] = "[MASK]"
    sentence = list(filter(lambda x: x!= "[MASK]", sentence))
    #sentence = tokenizer.convert_tokens_to_string(sentence)
    new = shap["Abschluss"][i]# score[f"LABEL_{target}"] - bench.score(sentence)[f"LABEL_{target}"]
    #print(f"{sect} important token(s) removed: {sentence} \t affects original sentence score: {round(new, 2)} | Labeled: {id2tag[np.argmax(list(bench.score(sentence).values()))]}")
    if sect >= 1:
        aggr.append(new)

print(f"\nMean of all scores: {round(mean(set(aggr)),2)}")


Mean of all scores: 0.86


#### IG: Abschluss<a name="comp_ig_Ab"></a>

In [12]:
sent = data["Abschluss"][2][1]
target = tag2id["Abschluss"]
score = 1.0 #bench.score(sent)
metr = eva_ig["Abschluss"][0].explanation #bench.explain(sent, target=target)[4] ### IG ###
scores = list(metr.scores[1:-1])
tokens = [t if scores[i]>=0 else "[MASK]" for i, t in enumerate(metr.tokens[1:-1])] 

aggr = []

#print(f"Original sentence: {sent} \tScore: {round(score[f'LABEL_{target}'],2)}\nFiltered: {[(t, s) for t, s in zip(tokens, scores)]}\n")

for i in np.arange(.1, 1.1, .1):
    sect = round(len([s for s in scores if s>=0])*i)
    sor_scores = np.sort(scores)[::-1][:sect]
    sentence = metr.tokens[1:-1]
    real_i = [scores.index(s) for s in sor_scores if s>=0]
    for x in real_i:
        sentence[x] = "[MASK]"
    sentence = list(filter(lambda x: x!= "[MASK]", sentence))
    #sentence = tokenizer.convert_tokens_to_string(sentence)
    new = ig["Abschluss"][i]# score[f"LABEL_{target}"] - bench.score(sentence)[f"LABEL_{target}"]
    #print(f"{sect} important token(s) removed: {sentence} \t affects original sentence score: {round(new, 2)} | Labeled: {id2tag[np.argmax(list(bench.score(sentence).values()))]}")
    if sect >= 1:
        aggr.append(new)

print(f"\nMean of all scores: {round(mean(set(aggr)),2)}")


Mean of all scores: 0.06


### **1.2. Integrated Gradients**<a name="comp_ig"></a>

In [13]:
table = [(l, [[round(s, 2) for s in v] for k, v in compr[l].items() if k == "IG"]) for l in labels]
table = sorted(table, key = lambda x: x[1][0][0], reverse = True)
table.insert(0, ["Label", "Comprehensiveness mean scores"])

print(tabulate(table, headers="firstrow"))

Label                                Comprehensiveness mean scores
-----------------------------------  -------------------------------
Mix                                  [[0.7, 0.17]]
Anrede                               [[0.4, nan]]
Abschluss                            [[0.33, 0.06]]
Diagnosen                            [[0.31, nan]]
AllergienUnverträglichkeitenRisiken  [[0.31, nan]]
KUBefunde                            [[0.29, 0.38]]
EchoBefunde                          [[0.25, nan]]
Befunde                              [[0.22, 0.0]]
Medikation                           [[0.13, nan]]
Zusammenfassung                      [[0.13, 0.26]]
Anamnese                             [[-0.0, 0.07]]


#### IG - Best Label: Mix<a name="comp_ig_Mi"></a>

In [15]:
sent = data["Mix"][4][1]
target = tag2id["Mix"]
score = 1.0 #bench.score(sent)
metr = eva_ig["Mix"][0].explanation #bench.explain(sent, target=target)[4] ### IG ###
scores = list(metr.scores[1:-1])
tokens = [t if scores[i]>=0 else "[MASK]" for i, t in enumerate(metr.tokens[1:-1])] 

aggr = []

#print(f"Original sentence: {sent} \tScore: {round(score[f'LABEL_{target}'],2)}\nFiltered: {[(t, s) for t, s in zip(tokens, scores)]}\n")

for i in np.arange(.1, 1.1, .1):
    sect = round(len([s for s in scores if s>=0])*i)
    sor_scores = np.sort(scores)[::-1][:sect]
    sentence = metr.tokens[1:-1]
    real_i = [scores.index(s) for s in sor_scores if s>=0]
    for x in real_i:
        sentence[x] = "[MASK]"
    sentence = list(filter(lambda x: x!= "[MASK]", sentence))
    #sentence = tokenizer.convert_tokens_to_string(sentence)
    new = ig["Mix"][i]# score[f"LABEL_{target}"] - bench.score(sentence)[f"LABEL_{target}"]
    #print(f"{sect} important token(s) removed: {sentence} \t affects original sentence score: {round(new, 2)} | Labeled: {id2tag[np.argmax(list(bench.score(sentence).values()))]}")
    if sect >= 1:
        aggr.append(new)

print(f"\nMean of all scores: {round(mean(set(aggr)),2)}")


Mean of all scores: 0.91


#### IG - Worst Label: Anamnese<a name="comp_ig_Ana"></a>

In [17]:
sent = data["Anamnese"][1][1]
target = tag2id["Anamnese"]
score = 1.0 #bench.score(sent)
metr = eva_ig["Anamnese"][0].explanation #bench.explain(sent, target=target)[4] ### IG ###
scores = list(metr.scores[1:-1])
tokens = [t if scores[i]>=0 else "[MASK]" for i, t in enumerate(metr.tokens[1:-1])] 

aggr = []

#print(f"Original sentence: {sent} \tScore: {round(score[f'LABEL_{target}'],2)}\nFiltered: {[(t, s) for t, s in zip(tokens, scores)]}\n")

for i in np.arange(.1, 1.1, .1):
    sect = round(len([s for s in scores if s>=0])*i)
    sor_scores = np.sort(scores)[::-1][:sect]
    sentence = metr.tokens[1:-1]
    real_i = [scores.index(s) for s in sor_scores if s>=0]
    for x in real_i:
        sentence[x] = "[MASK]"
    sentence = list(filter(lambda x: x!= "[MASK]", sentence))
    #sentence = tokenizer.convert_tokens_to_string(sentence)
    new = ig["Anamnese"][i]# score[f"LABEL_{target}"] - bench.score(sentence)[f"LABEL_{target}"]
    #print(f"{sect} important token(s) removed: {sentence} \t affects original sentence score: {round(new, 2)} | Labeled: {id2tag[np.argmax(list(bench.score(sentence).values()))]}")
    if sect >= 1:
        aggr.append(new)

print(f"\nMean of all scores: {round(mean(set(aggr)),2)}")


Mean of all scores: 0.0


---

## Additional Study: Include negatively contributing tokens<a name="comp_add"></a>

#### SHAP - Best Label: Anrede

In [18]:
sent = data["Anrede"][9][1]
score = 1.0 #bench.score(sent)
target = tag2id["Anrede"]
metr = eva_shap["Anrede"][9].explanation #bench.explain(sent, target=target)[0] ### SHAP ###
scores = list(metr.scores[1:-1])
tokens = [t if scores[i]>=0 else "[MASK]" for i, t in enumerate(metr.tokens[1:-1])] 

aggr = []

#print(f"Original sentence: {sent} \tScore: {round(score[f'LABEL_{target}'],2)}\nFiltered: {[(t, s) for t, s in zip(tokens, scores)]}\n")

for i in np.arange(.1, 1.1, .1):
    sect = round(len([s for s in scores if s>=0])*i)
    sor_scores = np.sort(scores)[::-1]
    sentence = metr.tokens[1:-1]
    real_i = [scores.index(s) for s in sor_scores if s>=0][:sect]
    for x in real_i:
        sentence[x] = "[MASK]"
    sentence = list(filter(lambda x: x!= "[MASK]", sentence))
    #sentence = tokenizer.convert_tokens_to_string(sentence)
    new = shap["Anrede"]["Add"][i] #score[f"LABEL_{target}"] - bench.score(sentence)[f"LABEL_{target}"]
    #print(f"{sect} important token(s) removed: {sentence} \t affects original sentence score: {round(new, 2)} | Labeled: {id2tag[np.argmax(list(bench.score(sentence).values()))]}: {np.max(list(bench.score(sentence).values()))}")
    aggr.append(new)

print(f"\nMean of all scores: {round(mean(set(aggr)),2)}")


Mean of all scores: 1.0


<span style="color:purple">**! Including negative tokens doesn't improve the score** </span>

#### SHAP - Worst Label: Abschluss

In [19]:
sent = data["Abschluss"][2][1]
target = tag2id["Abschluss"]
score = 1.0 #bench.score(sent)
metr = eva_shap["Abschluss"][1].explanation #bench.explain(sent, target=target)[4] ### SHAP ###
scores = list(metr.scores[1:-1])
tokens = [t if scores[i]>=0 else "[MASK]" for i, t in enumerate(metr.tokens[1:-1])] 

aggr = []

#print(f"Original sentence: {sent} \tScore: {round(score[f'LABEL_{target}'],2)}\nFiltered: {[(t, s) for t, s in zip(tokens, scores)]}\n")

for i in np.arange(.1, 1.1, .1):
    sect = round(len([s for s in scores if s>=0])*i)
    sor_scores = np.sort(scores)[::-1][:sect]
    sentence = metr.tokens[1:-1]
    real_i = [scores.index(s) for s in sor_scores if s>=0]
    for x in real_i:
        sentence[x] = "[MASK]"
    sentence = list(filter(lambda x: x!= "[MASK]", sentence))
    #sentence = tokenizer.convert_tokens_to_string(sentence)
    new = shap["Abschluss"]["Add"][i]# score[f"LABEL_{target}"] - bench.score(sentence)[f"LABEL_{target}"]
    #print(f"{sect} important token(s) removed: {sentence} \t affects original sentence score: {round(new, 2)} | Labeled: {id2tag[np.argmax(list(bench.score(sentence).values()))]}")
    if sect >= 1:
        aggr.append(new)

print(f"\nMean of all scores: {round(mean(set(aggr)),2)}")


Mean of all scores: 0.59


<span style="color:purple">**! Including negative tokens is a disadvantage here: it doesn't push the model as far away from the right prediction, so the correct label is still sometimes predicted in the upper half** </span>

#### IG - Best Label: Mix

In [20]:
sent = data["Mix"][4][1]
target = tag2id["Mix"]
score = 1.0 #bench.score(sent)
metr = eva_ig["Mix"][0].explanation #bench.explain(sent, target=target)[4] ### IG ###
scores = list(metr.scores[1:-1])
tokens = [t if scores[i]>=0 else "[MASK]" for i, t in enumerate(metr.tokens[1:-1])] 

aggr = []

#print(f"Original sentence: {sent} \tScore: {round(score[f'LABEL_{target}'],2)}\nFiltered: {[(t, s) for t, s in zip(tokens, scores)]}\n")

for i in np.arange(.1, 1.1, .1):
    sect = round(len([s for s in scores if s>=0])*i)
    sor_scores = np.sort(scores)[::-1][:sect]
    sentence = metr.tokens[1:-1]
    real_i = [scores.index(s) for s in sor_scores if s>=0]
    for x in real_i:
        sentence[x] = "[MASK]"
    sentence = list(filter(lambda x: x!= "[MASK]", sentence))
    #sentence = tokenizer.convert_tokens_to_string(sentence)
    new = ig["Mix"]["Add"][i]# score[f"LABEL_{target}"] - bench.score(sentence)[f"LABEL_{target}"]
    #print(f"{sect} important token(s) removed: {sentence} \t affects original sentence score: {round(new, 2)} | Labeled: {id2tag[np.argmax(list(bench.score(sentence).values()))]}")
    if sect >= 1:
        aggr.append(new)

print(f"\nMean of all scores: {round(mean(set(aggr)),2)}")


Mean of all scores: 0.93


<span style="color:purple">**! Including negative tokens has no significant effect** </span>

#### IG - Worst Label: Anamnese

In [21]:
sent = data["Anamnese"][1][1]
target = tag2id["Anamnese"]
score = 1.0 #bench.score(sent)
metr = eva_ig["Anamnese"][2].explanation #bench.explain(sent, target=target)[4] ### IG ###
scores = list(metr.scores[1:-1])
tokens = [t if scores[i]>=0 else "[MASK]" for i, t in enumerate(metr.tokens[1:-1])] 

aggr = []

#print(f"Original sentence: {sent} \tScore: {round(score[f'LABEL_{target}'],2)}\nFiltered: {[(t, s) for t, s in zip(tokens, scores)]}\n")

for i in np.arange(.1, 1.1, .1):
    sect = round(len([s for s in scores if s>=0])*i)
    sor_scores = np.sort(scores)[::-1][:sect]
    sentence = metr.tokens[1:-1]
    real_i = [scores.index(s) for s in sor_scores if s>=0]
    for x in real_i:
        sentence[x] = "[MASK]"
    sentence = list(filter(lambda x: x!= "[MASK]", sentence))
    #sentence = tokenizer.convert_tokens_to_string(sentence)
    new = ig["Anamnese"]["Add"][i]# score[f"LABEL_{target}"] - bench.score(sentence)[f"LABEL_{target}"]
    #print(f"{sect} important token(s) removed: {sentence} \t affects original sentence score: {round(new, 2)} | Labeled: {id2tag[np.argmax(list(bench.score(sentence).values()))]}")
    if sect >= 1:
        aggr.append(new)

print(f"\nMean of all scores: {round(mean(set(aggr)),2)}")


Mean of all scores: 0.46


<span style="color:purple">**! Including negative tokens improves the score: Anamnese loses probability in the upper half, and false positives are predicted in the lower half** </span>
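
The difference between the two settings compared in this study can be sketched as a choice of candidate pool. This assumes that "including negative tokens" means that all tokens, ranked by signed attribution, are eligible for removal instead of only those with non-negative scores; `removal_order` and `removal_steps` are hypothetical helpers and the scores are made up.

```python
from typing import List, Sequence


def removal_order(scores: Sequence[float], include_negative: bool) -> List[int]:
    """Token positions in removal order, highest signed attribution first."""
    pool = [i for i, s in enumerate(scores) if include_negative or s >= 0]
    return sorted(pool, key=lambda i: scores[i], reverse=True)


def removal_steps(
    scores: Sequence[float],
    include_negative: bool,
    fractions: Sequence[float] = (0.25, 0.5, 0.75, 1.0),
) -> List[List[int]]:
    """Positions removed at each fraction of the candidate pool."""
    order = removal_order(scores, include_negative)
    return [order[: round(len(order) * f)] for f in fractions]


scores = [0.5, -0.3, 0.2, -0.1]
positive_only = removal_steps(scores, include_negative=False)
with_negative = removal_steps(scores, include_negative=True)
```

With negative tokens in the pool, the later steps also remove tokens that argue against the label, which can raise the label's score again and so change the averaged drop.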