In [5]:
import warnings
warnings.filterwarnings("ignore")

import shap
import numpy as np
from src.modules.data_loader_single_hop import BoolQDataLoader
from src.modules.llm_client import LLMClient
from src.modules.rag_engine import RAGEngine

# DeepSHAP for NRMs

Here we try to rebuild the DeepSHAP explanation algorithm for NRMs (Neural Retrieval Models).

Unlike the other post-hoc methods we discuss, which explain where the information within the selected documents came from, this method explains the ranking of the documents themselves. This is a key difference.

## Evaluation Techniques

**NOTE**: We are using the FullWiki version of the HotpotQA dataset, meaning the **entire Wiki** is the context for each question.

There are no ground truth explanations available for any neural model. We therefore have to use different evaluation metrics.

1. In the paper "A study on the Interpretability of Neural Retrieval Models using DeepSHAP", a LIME-based explanation was used as a proxy (https://github.com/marcotcr/lime).
2. **Faithfulness/Fidelity**: We can also use a perturbation-based evaluation approach. We leave out the query tokens that were identified as important, rerun the retrieval, and compare the resulting ranking with the original one (AOPC = "Area Over the Perturbation Curve"). (https://github.com/CristianCosci/AOPC_MoRF)
3. **Sparseness**: Leave out all tokens except the top k identified by DeepSHAP.

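$\mathrm{AOPC}_M = \frac{1}{L+1} \left\langle \sum_{k=1}^{L} \left( f(x^{(0)}_M) - f(x^{(k)}_M) \right) \right\rangle_{p(X)}$

Here $x^{(0)}_M$ is the unperturbed input, $x^{(k)}_M$ is the input after $k$ perturbation steps in the order $M$ (for example, most relevant tokens first), $L$ is the number of perturbation steps, and $\langle \cdot \rangle_{p(X)}$ denotes the average over the evaluated instances.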
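To make the formula concrete, here is a minimal sketch of the per-instance part of AOPC with a most-relevant-first removal order. The scorer `toy_score`, the query, and the attribution values are made up for illustration; in the real setup `score_fn` would be the retriever's score for the top document, and the attributions would come from DeepSHAP. The average over $p(X)$ is then just the mean of this value over all evaluated instances.

```python
from typing import Callable, Sequence


def aopc_morf(
    score_fn: Callable[[Sequence[str]], float],
    tokens: Sequence[str],
    attributions: Sequence[float],
    max_steps: int,
) -> float:
    """AOPC for a single instance, removing tokens most-relevant-first."""
    # Token positions sorted by descending attribution (ties keep input order).
    order = sorted(range(len(tokens)), key=lambda i: attributions[i], reverse=True)
    steps = min(max_steps, len(tokens))  # L in the formula
    base = score_fn(tokens)  # f(x^(0)): score of the unperturbed query
    removed = set()
    total_drop = 0.0
    for k in range(steps):
        removed.add(order[k])
        perturbed = [t for i, t in enumerate(tokens) if i not in removed]
        total_drop += base - score_fn(perturbed)  # f(x^(0)) - f(x^(k))
    return total_drop / (steps + 1)


# Hypothetical stand-in for a retrieval score: number of query tokens
# that also occur in a fixed "document".
def toy_score(query_tokens: Sequence[str]) -> float:
    doc_terms = {"capital", "france", "paris"}
    return float(sum(t in doc_terms for t in query_tokens))


query = ["what", "is", "the", "capital", "of", "france"]
attr = [0.0, 0.0, 0.0, 0.6, 0.0, 0.4]  # made-up attribution values
print(aopc_morf(toy_score, query, attr, max_steps=3))  # 1.25
```

A good explanation removes the tokens that matter first, so the score drops early and the AOPC is high. An attribution that ranks the irrelevant tokens first gives an AOPC of 0 on this toy example.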

### Experiment Setup

Bound by the computational cost of DeepSHAP! (Run on KIGS server?)
1. Select a finite number of instances from the HotpotQA dataset.
2. Run the retrieval on those instances.
3. Compute the metrics for every instance and average them to get an overall score for DeepSHAP explainability.
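The three steps above could be sketched roughly as follows. Everything here is a placeholder: `toy_score` stands in for the retrieval score, the instances and attribution values are made up, and the per-instance metric is a simple top-k retention ratio in the spirit of the sparseness check (how much of the score survives when only the top k tokens are kept). The seed is fixed so that the sampled subset is reproducible.

```python
import random
from statistics import mean

DOC_TERMS = {"capital", "france", "paris"}  # toy stand-in for a document


def toy_score(tokens):
    """Hypothetical retrieval score: number of tokens found in the document."""
    return float(sum(t in DOC_TERMS for t in tokens))


def top_k_retention(tokens, attributions, k, score_fn):
    """Fraction of the full score that remains with only the top-k tokens."""
    ranked = sorted(range(len(tokens)), key=lambda i: attributions[i], reverse=True)
    keep = set(ranked[:k])
    reduced = [t for i, t in enumerate(tokens) if i in keep]
    full = score_fn(tokens)
    return score_fn(reduced) / full if full else 0.0


def run_experiment(instances, k, sample_size, seed=0):
    """Sample instances, compute the metric per instance, and average."""
    rng = random.Random(seed)
    sample = rng.sample(instances, min(sample_size, len(instances)))
    per_instance = [top_k_retention(toks, attrs, k, toy_score) for toks, attrs in sample]
    return mean(per_instance)


# (query tokens, made-up attributions) pairs
instances = [
    (["capital", "of", "france"], [0.5, 0.0, 0.5]),
    (["where", "is", "paris"], [0.1, 0.0, 0.9]),
]
print(run_experiment(instances, k=1, sample_size=10))  # 0.75
```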

It is crucial to select a fitting background image for DeepSHAP. Fortunately, this research was already conducted by [Fernando et al.](https://arxiv.org/abs/1907.06484). The selection depends on the NRM used for the retrieval, though performance is fairly similar across NRMs.

![Performance Metrics Background Images](img/image.png)

**ATTENTION**: We currently use "sentence-transformers/all-MiniLM-L6-v2" as the embedding model. This is a simple bi-encoder, not an NRM. An NRM is a reranker that takes the query and the document together and generates a relevance score instead of an embedding vector. These models match queries and documents significantly better, at the cost of more computation and time.
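The interface difference can be illustrated with a toy sketch. Both scorers below are invented for illustration and are far cruder than all-MiniLM-L6-v2 or any real reranker: the "bi-encoder" is a bag-of-words vector compared by cosine similarity, and the "cross-encoder" is a hand-written function that rewards shared bigrams. The only point is that the first encodes query and document separately (so document vectors can be precomputed), while the second needs the pair together and can therefore use interactions between them.

```python
import math
from collections import Counter


def embed(text):
    """Toy 'encoder': bag-of-words counts. It sees one text at a time."""
    return Counter(text.lower().split())


def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def bi_encoder_score(query, doc_vector):
    """Query is embedded on its own, then compared to a precomputed vector."""
    return cosine(embed(query), doc_vector)


def cross_encoder_score(query, doc):
    """Toy 'reranker': sees query and document jointly, one call per pair."""
    q, d = query.lower().split(), doc.lower().split()
    shared_unigrams = len(set(q) & set(d))
    shared_bigrams = len(set(zip(q, q[1:])) & set(zip(d, d[1:])))
    return shared_unigrams + 2.0 * shared_bigrams


docs = ["new york is a city", "york is a new city"]
doc_vectors = [embed(d) for d in docs]  # can be computed once, offline
query = "new york"

for doc, vec in zip(docs, doc_vectors):
    print(doc, bi_encoder_score(query, vec), cross_encoder_score(query, doc))
```

In this toy case the two documents get identical bi-encoder scores, while the joint scorer prefers the first one because it contains the query bigram. The joint scorer has to run once per query-document pair, which is where the extra cost comes from.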

## Next Steps

Points to discuss for the general experimental setup:

1. Explaining the output of the LLM based on the retrieved documents.
2. Explaining the retrieved documents.

This leads to key questions:

1. Should we compare those two experiments or link them in some way (**a sort of pipeline for explaining retrieval, covering both sides: retrieval and LLM**)?
- If so, how?
2. Should the setups for the experiments differ?
- Using NRM reranking for DeepSHAP and no NRM for the other part of the project?

# Implementation

First test implementation of DeepShap.

**NOTES:**
- We look only at the top document and calculate the AOPC metric for that document alone. Since it was ranked highest by the retriever (whether that is an NRM or just a similarity measure), we can assume that the identified tokens are relevant for the query.
- We can also use LIME for explanation or as a comparison for DeepSHAP; the implementation is straightforward and fairly similar.
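One simple way to compare two explanations of the same query, for example one from DeepSHAP and one from LIME, is the overlap of their top-k tokens. The sketch below uses Jaccard similarity; this is only one possible choice of comparison, and the attribution values are made up for illustration, not results.

```python
def top_k_tokens(attributions, k):
    """Return the k tokens with the highest attribution."""
    ranked = sorted(attributions, key=attributions.get, reverse=True)
    return set(ranked[:k])


def jaccard_at_k(attr_a, attr_b, k):
    """Jaccard similarity between the top-k token sets of two explanations."""
    a, b = top_k_tokens(attr_a, k), top_k_tokens(attr_b, k)
    union = a | b
    return len(a & b) / len(union) if union else 1.0


# Made-up attributions for the same query from two explainers.
deepshap_attr = {"capital": 0.6, "france": 0.3, "what": 0.05, "is": 0.05}
lime_attr = {"capital": 0.5, "what": 0.3, "france": 0.1, "is": 0.1}

print(jaccard_at_k(deepshap_attr, lime_attr, k=2))
```

Here the two explanations agree on one of their top-2 tokens ("capital"), so the overlap is one shared token out of three distinct ones.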

**1. DeepShap for Bi-Encoder (Cosine Similarity)**

We use the top Document for the AOPC Metric.
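As a sanity check for token attributions on short queries, exact Shapley values can be computed by enumerating all token subsets. Note that this is not DeepSHAP: DeepSHAP approximates these values efficiently for deep networks, while the sketch below brute-forces them and is only feasible for a handful of tokens. The scorer `toy_score` is a hypothetical stand-in for the similarity between the query and the top document, and "removing" a token plays the role of the background.

```python
from itertools import combinations
from math import factorial


def exact_shapley(tokens, value_fn):
    """Exact Shapley value of each token for value_fn (exponential cost)."""
    n = len(tokens)
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight for coalitions of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                without_i = [tokens[j] for j in sorted(subset)]
                with_i = [tokens[j] for j in sorted(subset + (i,))]
                phi[i] += weight * (value_fn(with_i) - value_fn(without_i))
    return phi


# Hypothetical stand-in for the query/top-document score.
def toy_score(query_tokens):
    doc_terms = {"capital", "france", "paris"}
    return float(sum(t in doc_terms for t in query_tokens))


query = ["capital", "of", "france"]
phi = exact_shapley(query, toy_score)
print(dict(zip(query, phi)))  # approximately capital: 1, of: 0, france: 1
```

The values sum to the difference between the score of the full query and the score of the empty query, which is the property that makes them usable as input to the AOPC metric.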

In [7]:
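data_loader = BoolQDataLoader()
documents = data_loader.setup()

# Use a lowercase instance name so the RAGEngine class is not shadowed
# (otherwise re-running this cell fails).
rag_engine = RAGEngine()
rag_engine.setup(documents=documents)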

Creating new vector store with 9427 documents...
Split into 15533 chunks.
RagEngine ready.


**2. DeepShap for NRMs (cross-encoder)**

More powerful and insightful explanations. This is also closer to a "real-world" RAG system, since powerful systems use NRMs for ranking. (**Explain NRM briefly**)