# <a id='toc1_'></a>[Explainable Machine Learning](#toc0_)

**Skills**
1. Implement local explainable techniques like LIME, SHAP, and ICE plots using Python.
2. Implement global explainable techniques such as [Partial Dependence Plots](https://scikit-learn.org/stable/modules/partial_dependence.html) (PDP) and [Accumulated Local Effects](https://en.wikipedia.org/wiki/Accumulated_local_effects) (ALE) plots in Python.
3. Apply example-based explanation techniques to explain machine learning models using Python.
4. Visualize and explain neural network models using SOTA techniques in Python.
5. Critically evaluate interpretable attention and [saliency](https://en.wikipedia.org/wiki/Saliency_map) methods for transformer model explanations.
6. Explore emerging approaches to explainability for large language models (LLMs) and generative computer vision models.

---

**Table of contents**<a id='toc0_'></a>    
- [Explainable Machine Learning](#toc1_)    
- [Module 3️⃣](#toc2_)    
  - [XAI in LLMs](#toc2_1_)    
    - [XAI in LLM Challenges](#toc2_1_1_)    
    - [XAI in LLM Fine-tuning](#toc2_1_2_)    
    - [XAI in LLM Prompting](#toc2_1_3_)    
  - [XAI in Generative Computer Vision](#toc2_2_)    
    - [XAI in Generative CV](#toc2_2_1_)    
    - [XAI in GANs](#toc2_2_2_)    
    - [XAI in Diffusion Models](#toc2_2_3_)    
- [My Questions](#toc3_)    
- [Resources](#toc4_)    

<!-- vscode-jupyter-toc-config
	numbering=false
	anchor=true
	flat=false
	minLevel=1
	maxLevel=6
	/vscode-jupyter-toc-config -->
<!-- THIS CELL WILL BE REPLACED ON TOC UPDATE. DO NOT WRITE YOUR TEXT IN THIS CELL -->

# <a id='toc2_'></a>[Module 3️⃣](#toc0_)
## <a id='toc2_1_'></a>[XAI in LLMs](#toc0_)
### <a id='toc2_1_1_'></a>[XAI in LLM Challenges](#toc0_)
1. Size and complexity of LLMs
   - e.g., GPT-4, whose parameter count is undisclosed but likely in the trillions, is too large for its reasoning to be traced tractably.
2. High-dimensional representations are entangled and abstract
3. There exist few or no ground truths
### <a id='toc2_1_2_'></a>[XAI in LLM Fine-tuning](#toc0_)
> $\textcolor{#ba4e00}{\textbf{Fine-tuning}}$: A large base model (typically >1B params) trained on a corpus of unlabeled data is fine-tuned on a smaller dataset with labels or through RLHF.

We can break down explanations in LLM fine-tuning into local and global explanations.

#### Local Explanations 
- Feature attribution
- Attention-based explanation
- Example-based explanation
- Natural language explanation

##### Feature Attribution
Types are:
- $\textcolor{#ba4e00}{\textbf{Perturbation-based}}$: where you perturb input examples by removing, masking, or altering input features and then evaluate how the model output changes. The perturbed features can be embedding vectors, hidden units, words, or tokens.
- $\textcolor{#ba4e00}{\textbf{Gradient-based}}$: where you determine the importance of each input feature by analyzing the partial derivatives of the output with respect to each input dimension. 
  - The magnitude of the derivatives reflects the sensitivity of the output to changes in the input. 
  - Integrated gradients are the primary approach to gradient-based explanation in LLMs.
- $\textcolor{#ba4e00}{\textbf{Trans-SHAP}}$ (2023): mainly focuses on adapting SHAP to sub-word text input and providing sequential visualization explanations.
  - Adapted SHAP to transformer-based language models.
- $\textcolor{#ba4e00}{\textbf{Decomposition-based}}$: aims to break down the relevance score into linear contributions from the input. An example of this is layer-wise relevance propagation (LRP).
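
To make the perturbation-based and gradient-based ideas concrete, here is a minimal sketch on a toy logistic scorer. The model, its weights, the all-zeros baseline, and the function names are all assumptions for illustration; a real LLM would be perturbed at the token or embedding level, and its gradients would come from an autodiff framework.

```python
import math

# Hypothetical toy model: a logistic scorer over a 3-dimensional input.
WEIGHTS = [2.0, -1.0, 0.5]
BIAS = -0.25

def model(x):
    """Return sigmoid(w . x + b), a stand-in for a model's output probability."""
    z = sum(w * xi for w, xi in zip(WEIGHTS, x)) + BIAS
    return 1.0 / (1.0 + math.exp(-z))

def model_grad(x):
    """Analytic gradient of the model output with respect to each input."""
    p = model(x)
    return [p * (1.0 - p) * w for w in WEIGHTS]

def occlusion_attribution(x, baseline):
    """Perturbation-based: replace one feature at a time with its baseline value."""
    full = model(x)
    scores = []
    for i in range(len(x)):
        perturbed = list(x)
        perturbed[i] = baseline[i]
        scores.append(full - model(perturbed))
    return scores

def integrated_gradients(x, baseline, steps=200):
    """Gradient-based: average gradients along the straight path baseline -> x."""
    total = [0.0] * len(x)
    for k in range(steps):
        alpha = (k + 0.5) / steps  # midpoint Riemann sum
        point = [b + alpha * (xi - b) for xi, b in zip(x, baseline)]
        for i, g in enumerate(model_grad(point)):
            total[i] += g
    return [(xi - b) * t / steps for xi, b, t in zip(x, baseline, total)]

x = [1.0, 2.0, -1.0]
baseline = [0.0, 0.0, 0.0]  # assumed "no information" reference input
occ = occlusion_attribution(x, baseline)
ig = integrated_gradients(x, baseline)
print("occlusion:", [round(s, 4) for s in occ])
print("integrated gradients:", [round(s, 4) for s in ig])
# Completeness: IG attributions sum (approximately) to f(x) - f(baseline).
print("sum of IG:", sum(ig), "output difference:", model(x) - model(baseline))
```

Both methods agree on the sign of each feature's contribution here, and the integrated-gradients attributions add up to the change in output between the baseline and the input, which is a useful sanity check.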

##### Example-based Explanations
Types are:
- $\textcolor{#ba4e00}{\textbf{Counterfactual explanation}}$: reveals what would have happened had certain parts of the input been changed.
- $\textcolor{#ba4e00}{\textbf{Influential instance}}$: characterizes the influence of individual training samples by measuring how much they affect the loss on test points.
- $\textcolor{#ba4e00}{\textbf{Adversarial example}}$: Neural models are highly vulnerable to carefully crafted small modifications in the input data that can drastically alter the model's predictions, despite being nearly imperceptible to humans. These are called adversarial examples. Adversarial examples expose areas where models fail and are used during training to improve model robustness and accuracy.
- $\textcolor{#ba4e00}{\textbf{Natural language explanations}}$: Explain a model's decision-making on an input sequence with generated text.
  - **Approach**: train a language model using both original textual data and human-annotated explanations.
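
A counterfactual explanation can be sketched as a search for the smallest input edit that flips the prediction. The lexicon-based classifier, the substitution table, and the function names below are hypothetical stand-ins; with a real model the candidate edits would typically come from a masked language model or a paraphraser.

```python
# Hypothetical toy sentiment lexicon standing in for a real classifier.
LEXICON = {"great": 2.0, "good": 1.0, "fine": 0.5, "bad": -1.0, "awful": -2.0}
# Hypothetical candidate replacements considered by the search.
SUBSTITUTES = {
    "great": ["awful", "bad"],
    "good": ["bad"],
    "bad": ["good"],
    "awful": ["great"],
}

def predict(tokens):
    """Label a token list by the sign of its summed lexicon score."""
    score = sum(LEXICON.get(t, 0.0) for t in tokens)
    return "positive" if score > 0 else "negative"

def find_counterfactuals(tokens):
    """Return every single-word edit that flips the predicted label."""
    original = predict(tokens)
    found = []
    for i, tok in enumerate(tokens):
        for sub in SUBSTITUTES.get(tok, []):
            edited = tokens[:i] + [sub] + tokens[i + 1:]
            if predict(edited) != original:
                found.append((tok, sub, " ".join(edited)))
    return found

sentence = "the food was great but the service was bad".split()
counterfactuals = find_counterfactuals(sentence)
print("original prediction:", predict(sentence))
for old, new, text in counterfactuals:
    print(f"replace {old!r} with {new!r}: {text}")
```

In this toy case only edits to "great" flip the label, which suggests that word is what the prediction hinges on.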

#### Global Explanations 
The different types of global explanations are:

##### Probing-based
> $\textcolor{#ba4e00}{\textbf{Probing-based explanation}}$: comes in two forms, classifier-based probing and parameter-free probing.

- In ***classifier-based probing***, the process is:
  1. Freeze the LLM parameters.
  2. Generate representations from the LLM.
  3. Train a shallow classifier on those representations to predict linguistic properties.
  - If you are probing for syntax, you are looking at things like parts of speech, morphology, or dependencies.
  - Alternatively, if you are probing for semantics, you are examining coreferences, entities, and relations.
- ***Parameter-free probing*** evaluates an LLM directly on tailored datasets without a classifier.
- The validity of probing, like that of many of these approaches, is debated by researchers: high probe performance may not mean true linguistic understanding by the LLM.

###### Probing Process
1. Freeze LLM parameters
2. Generate representations from the LLM
3. Train a shallow classifier on those representations to predict linguistic properties
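
The three steps above can be sketched with a logistic-regression probe. Everything here is an assumption for illustration: `frozen_representation` is a hypothetical stand-in that fabricates 4-dimensional "hidden states" with a noun/verb property planted along one dimension, whereas a real probe would read hidden states from an actual frozen model.

```python
import math
import random

random.seed(0)  # fixed seed so the synthetic data is reproducible

def frozen_representation(is_noun):
    """Hypothetical stand-in for a frozen LLM's hidden state (4 dimensions).

    The property is planted along dimension 2 by construction.
    """
    vec = [random.gauss(0.0, 1.0) for _ in range(4)]
    vec[2] += 2.0 if is_noun else -2.0
    return vec

def make_dataset(n):
    """Steps 1-2: the 'model' is fixed; we only collect its representations."""
    labels = [i % 2 for i in range(n)]
    return [frozen_representation(y) for y in labels], labels

def logit(w, b, x):
    return sum(wj * xj for wj, xj in zip(w, x)) + b

def train_probe(X, y, epochs=200, lr=0.1):
    """Step 3: shallow probe, logistic regression fit by batch gradient descent."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(w), 0.0
        for xi, yi in zip(X, y):
            err = 1.0 / (1.0 + math.exp(-logit(w, b, xi))) - yi
            grad_w = [g + err * xj for g, xj in zip(grad_w, xi)]
            grad_b += err
        w = [wj - lr * g / len(X) for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / len(X)
    return w, b

def accuracy(w, b, X, y):
    correct = sum(int(logit(w, b, xi) > 0) == yi for xi, yi in zip(X, y))
    return correct / len(y)

X_train, y_train = make_dataset(200)
X_test, y_test = make_dataset(200)
w, b = train_probe(X_train, y_train)
probe_accuracy = accuracy(w, b, X_test, y_test)
print("probe weights:", [round(v, 2) for v in w])
print("held-out accuracy:", probe_accuracy)
```

The probe puts most of its weight on the dimension where the property was planted. With a real model, high probe accuracy only shows that the property is linearly decodable from the representations, not that the model uses it.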

##### Neuron Activation
> $\textcolor{#ba4e00}{\textbf{Neuron activation explanation}}$: examines individual neurons or dimensions rather than the whole vector space. It involves two steps.

1. Identify important neurons.
2. Learn relations between linguistic properties and neurons.
- The lack of ground truth annotations makes evaluating component-level explanations challenging.

###### Neuron-Activation Process
1. Identify important neurons (unsupervised)
2. Learn relations between linguistic properties and the individual ranked neurons in supervised tasks
3. Verify via ablation experiments
4. Generate natural language explanations
5. Test
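
The ablation step can be sketched on a tiny network: zero out one hidden neuron at a time and measure how much the output moves. The weights, inputs, and function names are hypothetical; in practice the same idea is applied to neurons inside a trained model, usually through forward hooks.

```python
# Hypothetical fixed weights for a tiny network: 2 inputs, 3 hidden units, 1 output.
W_HIDDEN = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
W_OUT = [3.0, 0.1, -1.0]

def relu(v):
    return max(0.0, v)

def forward(x, ablate=None):
    """Run the network; if `ablate` is given, zero that hidden neuron's activation."""
    hidden = [relu(sum(w * xi for w, xi in zip(row, x))) for row in W_HIDDEN]
    if ablate is not None:
        hidden[ablate] = 0.0
    return sum(w * h for w, h in zip(W_OUT, hidden))

inputs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 1.0]]

def ablation_effect(neuron):
    """Mean absolute change in output when one neuron is removed."""
    diffs = [abs(forward(x) - forward(x, ablate=neuron)) for x in inputs]
    return sum(diffs) / len(diffs)

effects = {n: ablation_effect(n) for n in range(len(W_OUT))}
ranking = sorted(effects, key=effects.get, reverse=True)
print("mean output change per ablated neuron:", effects)
print("neurons ranked by importance:", ranking)
```

A neuron whose removal barely changes the output is unlikely to be important for the behavior being studied, which is how ablation is used to check a ranking obtained by other means.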

##### Concept-based
> $\textcolor{#ba4e00}{\textbf{Concept-based explanation}}$: maps the inputs to a set of concepts and measures the importance score of each predefined concept for the model's predictions. Testing with Concept Activation Vectors (TCAV) is an example of this.
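
A rough sketch of the concept-activation idea follows, under heavy simplifying assumptions. The 2-dimensional activations, the `class_logit` head, and all names are hypothetical. The concept direction is taken as the difference of mean activations, whereas the published method fits a linear classifier between concept and random activations and uses its normal vector.

```python
import math

# Hypothetical layer activations (2-D) for concept examples and random examples.
concept_acts = [[2.0, 0.1], [1.8, -0.2], [2.2, 0.1]]
random_acts = [[0.1, 0.0], [-0.2, 0.3], [0.1, -0.3]]
# Hypothetical activations of inputs belonging to the class being explained.
class_acts = [[0.5, 0.2], [1.0, -2.0], [0.3, 1.0], [0.8, -0.5]]

def mean(vectors):
    return [sum(col) / len(vectors) for col in zip(*vectors)]

def concept_direction(concept, rand):
    """Simplified concept vector: normalized difference of mean activations."""
    diff = [c - r for c, r in zip(mean(concept), mean(rand))]
    norm = math.sqrt(sum(d * d for d in diff))
    return [d / norm for d in diff]

def class_logit(a):
    """Hypothetical nonlinear head mapping layer activations to a class logit."""
    return a[0] * a[1] + a[0]

def directional_derivative(f, a, v, eps=1e-4):
    """Central-difference estimate of how f changes when moving a along v."""
    plus = [ai + eps * vi for ai, vi in zip(a, v)]
    minus = [ai - eps * vi for ai, vi in zip(a, v)]
    return (f(plus) - f(minus)) / (2 * eps)

cav = concept_direction(concept_acts, random_acts)
sensitivities = [directional_derivative(class_logit, a, cav) for a in class_acts]
# Score: fraction of class inputs whose logit increases along the concept direction.
concept_score = sum(s > 0 for s in sensitivities) / len(sensitivities)
print("concept direction:", [round(c, 3) for c in cav])
print("concept score:", concept_score)
```

A score near 1 would suggest that moving activations toward the concept tends to raise the class logit, and a score near 0 the opposite. The published method also compares against many random concept sets to test whether a score is statistically meaningful, which this sketch omits.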

### <a id='toc2_1_3_'></a>[XAI in LLM Prompting](#toc0_)

- $\textcolor{#ba4e00}{\textbf{Prompting}}$ involves strategically designing task-specific instructions in natural language to guide model output without altering parameters. 
- $\textcolor{#ba4e00}{\textbf{Prompt engineering}}$ combines instruction, context, and user input into a prompt that guides the output of the pre-trained large language model.

Types of Prompting:

- $\textcolor{#ba4e00}{\textbf{In-context learning (ICL)}}$: when a model is shown task demonstrations as part of the prompt.
- $\textcolor{#ba4e00}{\textbf{Chain-of-thought (CoT)}}$: involves prompting the model to describe its reasoning and go step by step through a problem.
  - Question we are trying to answer: how does in-context learning influence model behavior?
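
The two prompting styles differ only in how the prompt text is assembled, which a short sketch can show. The template layout (`Input:` / `Output:` labels), the example reviews, and the function names are assumptions; the trailing "Let's think step by step." is one commonly used zero-shot chain-of-thought trigger phrase.

```python
def build_icl_prompt(instruction, demonstrations, query):
    """Few-shot prompt: instruction, worked demonstrations, then the new input."""
    lines = [instruction, ""]
    for text, label in demonstrations:
        lines += [f"Input: {text}", f"Output: {label}", ""]
    lines += [f"Input: {query}", "Output:"]
    return "\n".join(lines)

def build_cot_prompt(question):
    """Zero-shot chain-of-thought prompt: ask the model to reason step by step."""
    return f"Question: {question}\nAnswer: Let's think step by step."

# Hypothetical task demonstrations shown to the model inside the prompt.
demos = [
    ("The battery lasts all day.", "positive"),
    ("The screen cracked within a week.", "negative"),
]
icl_prompt = build_icl_prompt(
    "Classify the sentiment of each review.", demos, "Setup was quick and painless."
)
cot_prompt = build_cot_prompt("A shop sells pens at 3 for $2. How much do 12 pens cost?")
print(icl_prompt)
print()
print(cot_prompt)
```

Because the model parameters never change, explaining prompting amounts to studying how the output shifts as the demonstrations or the reasoning instruction are varied.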

### XAI in Knowledge Augmentation (RAG)

RAG Pipeline: 
<img src="imgs/rag_pipeline.png" alt="RAG Pipeline" width="600">

1. $\textcolor{#ba4e00}{\textbf{Vector Database}}$: The first concept of our $\textcolor{#ba4e00}{\textbf{RAG}}$ pipeline utilizes embeddings. 
   - A Vector Database is where vector embeddings are stored. 
   - You typically take a bunch of unstructured data like PDFs, images, or videos, and embed some portion of that data using an embedding model that transforms data into vector embeddings.

<img src="imgs/rag_vector_db.png" alt="RAG Vector DB" width="600">

2. $\textcolor{#ba4e00}{\textbf{User query}}$: Also uses embeddings
   - When the user asks a question, the text is transformed into a vector embedding using the same embedding model you used to create your vector database.
   - E.g. if my user query is "black leather boots", then that query will be embedded using my embedding model, and I will have a vector embedding that represents that query.

<img src="imgs/rag_user_query.png" alt="RAG User Query" width="600">

3. $\textcolor{#ba4e00}{\textbf{Similarity Algorithm}}$: match the user query embedding to my vector database using a similarity algorithm. 
   - A similarity algorithm is used to find closest matches to the user query in the vector database. 
   - You can use any distance metric here, such as Euclidean distance. 
   - However, most people building RAG pipelines use cosine similarity. 
   - $\textcolor{#ba4e00}{\textbf{Cosine Similarity}}$ 
     - Is scale-invariant, so it can measure the similarity of vectors regardless of the magnitude. 
     - It is often considered more suitable for high-dimensional spaces because it compares angles rather than absolute distances, which can be affected by the curse of dimensionality.
     - It is interpretable because values are bounded between -1 and 1.
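
The retrieval step and the cosine-similarity properties above can be checked with a small sketch. The 3-dimensional embeddings, the document names, and the `top_k` helper are hypothetical; a real pipeline would obtain vectors from an embedding model and query a vector database.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: dot(a, b) / (|a| * |b|)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical toy embeddings standing in for a vector database.
vector_db = {
    "black leather ankle boots": [0.9, 0.8, 0.1],
    "brown suede loafers": [0.7, 0.2, 0.3],
    "stainless steel kettle": [0.0, 0.1, 0.9],
}
# Hypothetical embedding of the user query "black leather boots".
query_embedding = [0.8, 0.9, 0.0]

def top_k(query, db, k=2):
    """Return the k documents most similar to the query, best first."""
    scored = [(cosine_similarity(query, vec), doc) for doc, vec in db.items()]
    return sorted(scored, reverse=True)[:k]

results = top_k(query_embedding, vector_db)
for score, doc in results:
    print(f"{score:.3f}  {doc}")

# Scale invariance: multiplying the query by 10 leaves the similarity unchanged.
scaled_query = [10 * x for x in query_embedding]
boots = vector_db["black leather ankle boots"]
print(cosine_similarity(query_embedding, boots), cosine_similarity(scaled_query, boots))

# Boundedness: a vector and its negation have similarity -1 (up to rounding).
print(cosine_similarity(boots, [-x for x in boots]))
```

Printing the scores alongside the retrieved documents is itself a simple explainability aid, since it shows why each document was selected.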

<img src="imgs/rag_similarity.png" alt="RAG Similarity" width="600">

4. Add the LLM to the loop: the retrieved matches are passed to the LLM as context alongside the user query.

<img src="imgs/rag_llm.png" alt="RAG LLM" width="600">

**How to improve Explainability of RAG itself?**
1. Make embedding spaces more explainable through visualization.

$\textcolor{#ba4e00}{\textbf{Latent/Embedding Space}}$: multidimensional space onto which concepts are mapped.
- Interestingly we can map different information sources like text and images to the same latent space, as long as we have a dataset that connects them together, like a large dataset of images and captions from the Internet.

## <a id='toc2_2_'></a>[XAI in Generative Computer Vision](#toc0_)
### <a id='toc2_2_1_'></a>[XAI in Generative CV](#toc0_)
### <a id='toc2_2_2_'></a>[XAI in GANs](#toc0_)
### <a id='toc2_2_3_'></a>[XAI in Diffusion Models](#toc0_)

# <a id='toc3_'></a>[My Questions](#toc0_)

# <a id='toc4_'></a>[Resources](#toc0_)

- [CNN Visualization](https://adamharley.com/nn_vis/cnn/3d.html)
- Paper: [Google Feature Visualization Interactive](https://distill.pub/2017/feature-visualization/)
- Paper: [2020, The elephant in the interpretability room: Why use attention as explanation when we have saliency methods](https://arxiv.org/pdf/2010.05607)