# Responsible AI: XAI GenAI project

This notebook was finalised by Mabel-Brenda Ifeoma Okikaa, s184075.
The first part of my assignment was discussed with my group, so some similarities may occur.
The group consists of: Kjær, Christian Valentin; Kolbert, Jedrzej Konrad; and Madsen, Andreas Råskov.

## 0. Background



Based on the previous lessons on explainability, post-hoc methods such as saliency maps, SmoothGrad, LRP, LIME, and SHAP are used to explain a model. Take LRP (Layer-wise Relevance Propagation) as an example: it highlights the pixels most relevant to the prediction of the class "cat" by backpropagating the relevance. (image source: [Montavon et al. (2016)](https://giorgiomorales.github.io/Layer-wise-Relevance-Propagation-in-Pytorch/))

![LRP example](images/catLRP.jpg)

Another example is text sentiment classification. Here we visualize the importance of each word for the prediction 'positive':

![text example](images/textGradL2.png)

Words highlighted in darker colours are more important for predicting the sentence to be 'positive' in sentiment.
More examples can be found [here](http://34.160.227.66/?models=sst2-tiny&dataset=sst_dev&hidden_modules=Explanations_Attention&layout=default).

Both cases above require the class or the prediction of the model. But:

***How do you explain a model that does not predict but generates?***

In this project, we explain a generative model based on the dependency between words. We first look at a simple example that uses point-wise mutual information (PMI) to compute a saliency map of the sentence. After that, we construct the experiment step by step, followed by exercises and questions.


## 1. A simple example to start with
Given a sample sentence: 
> *Tokyo is the capital city of Japan.* 

We explain this sentence with a saliency map of the dependencies between its words.
The dependency between two words in the sentence can be measured by [point-wise mutual information (PMI)](https://en.wikipedia.org/wiki/Pointwise_mutual_information), estimated as follows.


Mask two words out, e.g.
> \[MASK-1\] is the capital city of \[MASK-2\].


Ask the generative model to fill in the sentence 10 times, and we have:

| MASK-1      | MASK-2 |
| ----------- | ----------- |
|    tokyo   |     japan   |
|  paris  |     france    |
|  london  |     england    |
|  paris  |     france    |
|  beijing |  china |
|    tokyo   |     japan   |
|  paris  |     france    |
|  paris  |     france    |
|  london  |     england    |
|  beijing |  china |

PMI is calculated by: 

$PMI(x,y)=\log_2 \frac{P(\{x,y\} \mid s-\{x,y\})}{P(\{x\} \mid s-\{x,y\})\,P(\{y\} \mid s-\{x,y\})}$

where $x$ and $y$ are the words that we masked out, $s$ is the sentence, and $s-\{x,y\}$ denotes the tokens of the sentence after removing the words $x$ and $y$.

In this example, "tokyo" fills MASK-1 in 2 of the 10 responses, "japan" fills MASK-2 in 2 of the 10, and they always occur together, so $PMI(Tokyo, Japan) = \log_2 \frac{0.2}{0.2 \times 0.2} = \log_2 5 \approx 2.32$
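
The arithmetic above can be reproduced directly from the table of ten fills. The snippet below is a minimal sketch: the helper name `pmi_from_fills` is made up for illustration, and the probabilities are plain relative frequencies over the sampled responses.

```python
import math

# The ten (MASK-1, MASK-2) fills from the table above.
fills = [
    ("tokyo", "japan"), ("paris", "france"), ("london", "england"),
    ("paris", "france"), ("beijing", "china"), ("tokyo", "japan"),
    ("paris", "france"), ("paris", "france"), ("london", "england"),
    ("beijing", "china"),
]

def pmi_from_fills(fills, x, y):
    """Estimate PMI(x, y) in bits from sampled (first, second) mask fills."""
    n = len(fills)
    p_x = sum(first == x for first, _ in fills) / n    # P(x | rest of sentence)
    p_y = sum(second == y for _, second in fills) / n  # P(y | rest of sentence)
    p_xy = sum(pair == (x, y) for pair in fills) / n   # joint probability
    return math.log2(p_xy / (p_x * p_y))

pmi_tokyo_japan = pmi_from_fills(fills, "tokyo", "japan")
print(round(pmi_tokyo_japan, 2))  # 2.32
```

Because "tokyo" and "japan" only ever appear together, the joint probability equals each marginal, and the PMI reduces to $\log_2 (1 / 0.2)$.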

After selecting a word of interest in the sentence, we can compute the PMI between it and every other word using the generative model.
(Here, we use a longer sentence and run 20 responses per word.)
![](images/resPMI.png)


## 2. Preparation
### 2.1 Conda environment

```
conda env create -f environment.yml
conda activate xai_llm
```


### 2.2 Download the offline LLM

We use an offline LLM from Hugging Face. It is approximately 5 GB.
Download it using the command below, and save it under `./models/`.
```
huggingface-cli download TheBloke/openchat-3.5-0106-GGUF openchat-3.5-0106.Q4_K_M.gguf --local-dir . --local-dir-use-symlinks False
# credit to https://huggingface.co/TheBloke/openchat-3.5-0106-GGUF
```

## 3. Mask the sentence and get the responses from LLM
### 3.1 Get the input sentence

**Remember to change the anchor word index when changing the input sentence.**

In [1]:
def get_input():
    # ideally this reads inputs from a file, now it just takes an input
    return input("Enter a sentence: ")
    
anchor_word_idx = 0 # index of the word of interest (the anchor)
prompts_per_word = 20 # number of generated responses per masked prompt

sentence = get_input()
print("Sentence: ", sentence)

Sentence:  Tokyo is the capital city of Japan
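
Since the anchor index has to be updated by hand whenever the sentence changes, it can help to look it up from the word itself. The helper below is a hypothetical convenience (`find_anchor_idx` is not part of the project code) and assumes that words are separated by whitespace, which may differ from the tokenization used by `generate_prompts`.

```python
def find_anchor_idx(sentence, anchor_word):
    """Return the position of anchor_word among the whitespace-separated words."""
    # Strip surrounding punctuation and lowercase so "Japan." matches "japan".
    words = [w.strip(".,!?;:").lower() for w in sentence.split()]
    return words.index(anchor_word.lower())  # raises ValueError if the word is absent

print(find_anchor_idx("Tokyo is the capital city of Japan", "capital"))  # 3
```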


### 3.2 Load the model

In [2]:
from models.ChatModel import ChatModel
model_name = "openchat"
model = ChatModel(model_name)
print(f"Model: {model_name}")

llama_model_loader: loaded meta data with 23 key-value pairs and 291 tensors from ./models/openchat-3.5-0106.Q4_K_M.gguf (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv   0:                       general.architecture str              = llama
llama_model_loader: - kv   1:                               general.name str              = openchat_openchat-3.5-0106
llama_model_loader: - kv   2:                       llama.context_length u32              = 8192
llama_model_loader: - kv   3:                     llama.embedding_length u32              = 4096
llama_model_loader: - kv   4:                          llama.block_count u32              = 32
llama_model_loader: - kv   5:                  llama.feed_forward_length u32              = 14336
llama_model_loader: - kv   6:                 llama.rope.dimension_count u32              = 128
llama_model_loader: - kv   7:                 llama.at

Model: openchat


AVX = 1 | AVX_VNNI = 0 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | SSSE3 = 0 | VSX = 0 | MATMUL_INT8 = 0 | 
Model metadata: {'general.name': 'openchat_openchat-3.5-0106', 'general.architecture': 'llama', 'llama.context_length': '8192', 'llama.rope.dimension_count': '128', 'llama.embedding_length': '4096', 'llama.block_count': '32', 'llama.feed_forward_length': '14336', 'llama.attention.head_count': '32', 'tokenizer.ggml.eos_token_id': '32000', 'general.file_type': '15', 'llama.attention.head_count_kv': '8', 'llama.attention.layer_norm_rms_epsilon': '0.000010', 'llama.rope.freq_base': '10000.000000', 'tokenizer.ggml.model': 'llama', 'general.quantization_version': '2', 'tokenizer.ggml.bos_token_id': '1', 'tokenizer.ggml.unknown_token_id': '0', 'tokenizer.ggml.add_bos_token': 'true', 'tokenizer.ggml.add_eos_token': 'false', 'tokenizer.chat_template': "{{ bos_token }}{% for 

### 3.3 Run the prompts and get all the responses


In [3]:
from tools.command_generator import generate_prompts, prefix_prompt
from tools.evaluate_response import get_replacements
from tqdm import tqdm

def run_prompts(model, sentence, anchor_idx, prompts_per_word=20):
    prompts = generate_prompts(sentence, anchor_idx)
    all_replacements = []
    for prompt in prompts:
        replacements = []
        for _ in tqdm(
            range(prompts_per_word),
            desc=f"Input: {prompt}",
        ):
            response = model.get_response(
                prefix_prompt(prompt),
            ).strip()
            if response:
                replacement = get_replacements(prompt, response)
                if replacement:
                    replacements.append(replacement)
        if len(replacements) > 0:
            all_replacements.append(replacements)
    return all_replacements



In [4]:
# Example A: Tokyo is the capital city of Japan.
all_responses = run_prompts(model, sentence, anchor_word_idx, prompts_per_word)

Input: [MASK] [MASK] the capital city of Japan:   0%|          | 0/20 [00:00<?, ?it/s]

Input: [MASK] [MASK] the capital city of Japan:  50%|█████     | 10/20 [00:26<00:17,  1.72s/it]

 Response is not valid. ['[mask]', '[mask]', 'the', 'capital', 'city', 'of', 'japan'] ['tokyo']


Input: [MASK] [MASK] the capital city of Japan: 100%|██████████| 20/20 [00:49<00:00,  2.47s/it]
Input: [MASK] is [MASK] capital city of Japan:  70%|███████   | 14/20 [00:32<00:13,  2.25s/it]

 Response is not valid. ['[mask]', 'is', '[mask]', 'capital', 'city', 'of', 'japan'] ['tokyo', 'is', 'japans', 'capital', 'city']


Input: [MASK] is [MASK] capital city of Japan:  90%|█████████ | 18/20 [00:40<00:03,  1.94s/it]

 Response is not valid. ['[mask]', 'is', '[mask]', 'capital', 'city', 'of', 'japan'] ['tokyo', 'is', '[mask]']


Input: [MASK] is [MASK] capital city of Japan: 100%|██████████| 20/20 [00:46<00:00,  2.31s/it]
Input: [MASK] is the [MASK] city of Japan:  90%|█████████ | 18/20 [00:48<00:04,  2.32s/it]

 Response is not valid. ['[mask]', 'is', 'the', '[mask]', 'city', 'of', 'japan'] ['kyoto', 'is', 'the', 'ancient', 'capital', 'of', 'japan']


Input: [MASK] is the [MASK] city of Japan: 100%|██████████| 20/20 [00:52<00:00,  2.63s/it]
Input: [MASK] is the capital [MASK] of Japan:  65%|██████▌   | 13/20 [00:30<00:16,  2.43s/it]

 Response is not valid. ['[mask]', 'is', 'the', 'capital', '[mask]', 'of', 'japan'] ['tokyo', 'is', 'the', 'capital', '[japan]']


Input: [MASK] is the capital [MASK] of Japan:  80%|████████  | 16/20 [00:36<00:08,  2.14s/it]

 Response is not valid. ['[mask]', 'is', 'the', 'capital', '[mask]', 'of', 'japan'] ['tokyo', 'is', 'the', 'capital', 'tokyo']


Input: [MASK] is the capital [MASK] of Japan: 100%|██████████| 20/20 [00:53<00:00,  2.68s/it]
Input: [MASK] is the capital city [MASK] Japan: 100%|██████████| 20/20 [01:02<00:00,  3.14s/it]
Input: [MASK] is the capital city of [MASK]: 100%|██████████| 20/20 [00:47<00:00,  2.38s/it]
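
The progress bars above show the pattern of the prompts: the anchor word is masked in every prompt, together with one other word at a time. The source of `generate_prompts` is not shown in this notebook, so the function below is only a sketch that reproduces the logged prompts under the assumption of whitespace tokenization; `sketch_generate_prompts` is a made-up name.

```python
def sketch_generate_prompts(sentence, anchor_idx, mask="[MASK]"):
    """Hypothetical stand-in for generate_prompts: mask the anchor plus one other word."""
    words = sentence.split()
    prompts = []
    for i in range(len(words)):
        if i == anchor_idx:
            continue  # the anchor is masked in every prompt, never paired with itself
        masked = [mask if j in (anchor_idx, i) else w for j, w in enumerate(words)]
        prompts.append(" ".join(masked))
    return prompts

for prompt in sketch_generate_prompts("Tokyo is the capital city of Japan", 0):
    print(prompt)
```

A sentence of $n$ words therefore yields $n - 1$ prompts, each of which is sent to the model `prompts_per_word` times.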


In [6]:
# We print out the computed replacements for the masks
for each_output in all_responses:
    print(each_output)

[['tokyo', 'is'], ['tokyo', 'is'], ['tokyo', 'is'], ['tokyo', 'is'], ['tokyo', 'is'], ['tokyo', 'is'], ['tokyo', 'is'], ['tokyo', 'is'], ['tokyo', 'is'], ['', ''], ['tokyo', 'is'], ['tokyo', 'is'], ['tokyo', 'is'], ['tokyo', 'is'], ['tokyo', 'is'], ['tokyo', 'is'], ['tokyo', 'is'], ['tokyo', 'is'], ['tokyo', 'is'], ['tokyo', 'is']]
[['tokyo', 'tokyos'], ['tokyo', 'the'], ['tokyo', 'the'], ['tokyo', 'tokyos'], ['tokyo', 'tokyos'], ['tokyo', 'the'], ['tokyo', 'the'], ['tokyo', 'tokyos'], ['tokyo', 'the'], ['tokyo', 'the'], ['tokyo', 'the'], ['tokyo', '[mask]'], ['tokyo', 'the'], ['', ''], ['tokyo', 'tokyos'], ['tokyo', 'the'], ['tokyo', 'the'], ['', ''], ['tokyo', 'tokyos'], ['tokyo', '[tokyos]']]
[['tokyo', 'capital'], ['kyoto', 'ancient'], ['tokyo', 'capital'], ['tokyo', 'capital'], ['kyoto', 'ancient'], ['tokyo', 'capital'], ['tokyo', 'capital'], ['kyoto', 'former [mask]'], ['osaka', 'second largest'], ['tokyo', 'capital'], ['hiroshima', 'second largest'], ['tokyo', 'bustling capital'
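
The printed pairs suggest how a response is turned into replacements: the unmasked words of the prompt must stay in place, and whatever the model wrote in the masked positions (possibly several words) is extracted. The actual `get_replacements` is not shown here, so the regex-based function below is an assumption about its behaviour, not its implementation.

```python
import re

def sketch_get_replacements(prompt, response, mask="[MASK]"):
    """Hypothetical stand-in for get_replacements.

    Returns one replacement string per mask, or None when the response does
    not keep the unmasked words of the prompt in place.
    """
    # Literal words stay as they are; each mask becomes a lazy capture group.
    parts = [r"(.+?)" if word == mask else re.escape(word.lower())
             for word in prompt.split()]
    pattern = r"\s+".join(parts)
    # Lowercase and strip punctuation, as the logged token lists suggest.
    cleaned = re.sub(r"[^\w\s\[\]]", "", response.lower()).strip()
    match = re.fullmatch(pattern, cleaned)
    return list(match.groups()) if match else None

prompt = "[MASK] is the [MASK] city of Japan"
print(sketch_get_replacements(prompt, "Osaka is the second largest city of Japan."))
# ['osaka', 'second largest']
print(sketch_get_replacements(prompt, "Kyoto is the ancient capital of Japan"))
# None
```

The second call mirrors one of the "Response is not valid" messages above: the response drops the word "city", so it cannot be aligned with the prompt.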

### 3.4 EXERCISE: compute the PMI for each word

$PMI(x,y)=\log_2 \frac{P(\{x,y\} \mid s-\{x,y\})}{P(\{x\} \mid s-\{x,y\})\,P(\{y\} \mid s-\{x,y\})}$

* Compute $P(x)$, $P(y)$ and $P(x,y)$ first and print them out.
* Compute the PMI for each word.
* Visualize the result by coloring. Tip: you might need to normalize the result first.


In [13]:
import nltk
import numpy as np
import pandas as pd
from nltk.tokenize import word_tokenize

nltk.download('punkt')

def compute_pmi(sentence, all_responses, anchor_word_idx, prompts_per_word):
    # Tokenize the lowercased sentence into words
    words = word_tokenize(sentence.lower())

    # Drop punctuation and other non-alphanumeric tokens
    words = [token for token in words if token.isalnum()]

    # Create a DataFrame to store PMI calculations
    pmi_df = pd.DataFrame(columns=words)
    pmi_df.loc['px'] = 0
    pmi_df.loc['py'] = 0
    pmi_df.loc['pxy'] = 0
    pmi_df.loc['pmi'] = 0
    pmi_df.loc['saliency'] = 0

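    # Offset that skips the anchor word once its position has been passed
    idx_y = 0
    word_x = words[anchor_word_idx]

    # Tiny constant that avoids division by zero. NOTE: when none of the counts
    # is incremented, all three probabilities equal epsi / prompts_per_word and
    # the PMI becomes log2(prompts_per_word / epsi) (about 37 for 15 prompts),
    # which is an artifact of the smoothing rather than a real dependency.
    epsi = 1e-10
    # Iterate over the response lists, one list per masked prompt
    for i, responses in enumerate(all_responses):
        px = epsi
        py = epsi
        pxy = epsi

        # From the anchor position onward, response list i belongs to word i + 1
        if anchor_word_idx == i:
            idx_y = 1

        word_y = words[i + idx_y]

        # Count matches. NOTE: response[0] is compared with the anchor, which
        # assumes the anchor's replacement comes first in every pair.
        for response in responses:
            x = response[0]
            y = response[1]
            if x == word_x:
                px += 1
            if y == word_y:
                py += 1
            if x == word_x and y == word_y:
                pxy += 1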

        # Calculate probabilities and PMI
        px = px / prompts_per_word
        py = py / prompts_per_word
        pxy = pxy / prompts_per_word
        pmi = np.log2(pxy / (px * py))

        # Update DataFrame with calculated values
        pmi_df.at['px', word_y] = px
        pmi_df.at['py', word_y] = py
        pmi_df.at['pxy', word_y] = pxy
        pmi_df.at['pmi', word_y] = pmi

    # Set the anchor word column to NaN
    pmi_df[word_x] = np.nan

    # Normalize PMI values and calculate saliency
    min_pmi = np.round(pmi_df.loc['pmi'].min(), 10)
    max_pmi = pmi_df.loc['pmi'].max()
    pmi_df.loc['saliency'] = (pmi_df.loc['pmi'] - min_pmi) / (max_pmi - min_pmi)

    return pmi_df


def highlight_text(sentence, p_df, thres):
    # Tokenize the sentence
    words = word_tokenize(sentence)

    # Color words based on saliency scores from p_df
    highlighted_sentence = ""
    for word in words:
        if word in p_df.columns:
            # Get the saliency score for the word
            saliency_score = p_df.loc['saliency', word]

            # Set the ANSI color based on the saliency score
            if saliency_score > thres:
                highlighted_word = f"\033[35m{word}\033[0m"  # Magenta for high saliency
            else:
                # Bright red for low saliency; the anchor word (NaN) also lands here
                highlighted_word = f"\033[91m{word}\033[0m"
        else:
            highlighted_word = word
        highlighted_sentence += highlighted_word + " "

    return highlighted_sentence

[nltk_data] Downloading package punkt to
[nltk_data]     C:\Users\s184075\AppData\Roaming\nltk_data...
[nltk_data]   Package punkt is already up-to-date!
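
`compute_pmi` above adds a small epsilon to every count, so a word that is never regenerated still receives a (very large) PMI. As a point of comparison, here is a sketch of an alternative estimator that reports such cases as undefined and lets the caller say which mask belongs to the anchor. It is not the function used to produce the tables in this notebook; `pmi_or_nan` and its `anchor_first` flag are illustrative names.

```python
import math

def pmi_or_nan(pairs, word_x, word_y, n_prompts, anchor_first=True):
    """PMI in bits from (first_mask, second_mask) pairs.

    anchor_first says whether the anchor word x is the first mask in the
    prompt, i.e. whether it comes before the other masked word y.
    """
    count_x = count_y = count_xy = 0
    for first, second in pairs:
        x, y = (first, second) if anchor_first else (second, first)
        count_x += x == word_x
        count_y += y == word_y
        count_xy += x == word_x and y == word_y
    if count_x == 0 or count_y == 0:
        return math.nan   # a marginal is zero: the PMI is undefined
    if count_xy == 0:
        return -math.inf  # both words occur, but never together
    p_x, p_y, p_xy = (c / n_prompts for c in (count_x, count_y, count_xy))
    return math.log2(p_xy / (p_x * p_y))

pairs = [("tokyo", "japan")] * 5 + [("paris", "france")] * 15
print(pmi_or_nan(pairs, "tokyo", "japan", n_prompts=20))        # 2.0
print(pmi_or_nan([("", "")] * 15, "photographer", "reveals", 15))  # nan
```

Undefined values can then be shown as "no evidence" in the visualization instead of being normalized to the highest saliency.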


In [14]:
p_df = compute_pmi(sentence, all_responses, anchor_word_idx, prompts_per_word)
print(p_df)

# Highlight text based on saliency scores
highlighted_sentence = highlight_text(sentence.lower(), p_df, thres=0.1)

# Print highlighted text
print(highlighted_sentence)

          tokyo        is       the   capital      city            of  japan
px          NaN  0.950000  0.900000  0.550000  0.900000  1.000000e+00   0.25
py          NaN  0.950000  0.500000  0.350000  0.450000  1.000000e+00   0.25
pxy         NaN  0.950000  0.500000  0.350000  0.450000  1.000000e+00   0.25
pmi         NaN  0.074001  0.152003  0.862496  0.152003 -7.213476e-12   2.00
saliency    NaN  0.037000  0.076002  0.431248  0.076002 -3.606738e-12   1.00
[91mtokyo[0m [91mis[0m [91mthe[0m [35mcapital[0m [91mcity[0m [91mof[0m [35mjapan[0m 



## 4. EXERCISE: Try more examples; maybe come up with your own. Report the results.

* Try to come up with more examples, change the anchor word or the number of responses, and observe the results. What does the explanation mean? Do you think it's a nice explanation? Why or why not?
* What are the limitations of the current method? When does the method fail to explain?

### Example 1
Sentence: Photographer reveals wild desert life.

In my first example, I kept the first index as the anchor word and decreased the number of prompts per word to 15. I made this choice because the sentence 'Photographer reveals wild desert life.' is very specific: it has a particular subject matter (wild desert life) and a particular action (photographer reveals).
Given this niche context, expecting the language model to produce many meaningful replacements could be overly ambitious.
Therefore, I narrowed the scope a bit to focus on relevant and coherent replacements. The sentence was inspired by a BBC News headline and shortened a bit to reduce the running time.

In [15]:
 

anchor_word_idx = 0 # index of the word of interest (the anchor)
prompts_per_word = 15 # number of generated responses per masked prompt

sentence_1 = get_input()
print("Sentence: ", sentence_1)


Sentence:  Photographer reveals wild desert life


In [16]:
all_responses_1 = run_prompts(model, sentence_1, anchor_word_idx, prompts_per_word)
print("                 ")
print("                 ")
print("                 ")
print("                 ")
print("                 ")

for response in all_responses_1:
    print(response)


Input: [MASK] [MASK] wild desert life:   7%|▋         | 1/15 [00:05<01:14,  5.33s/it]

 Response is not valid. ['[mask]', '[mask]', 'wild', 'desert', 'life'] ['he', 'observed', 'the', 'diverse', 'wildlife', 'in', 'the', 'vast', 'desolate', 'desert']


Input: [MASK] [MASK] wild desert life:  27%|██▋       | 4/15 [00:22<01:00,  5.53s/it]

 Response is not valid. ['[mask]', '[mask]', 'wild', 'desert', 'life'] ['he', 'observed', 'the', 'fascinating', '[wild]', '[desert]', 'life']


Input: [MASK] [MASK] wild desert life:  33%|███▎      | 5/15 [00:26<00:48,  4.81s/it]

 Response is not valid. ['[mask]', '[mask]', 'wild', 'desert', 'life'] ['she', 'observed', 'the', 'diverse', 'wildlife', 'in', 'the', 'vast', 'desert', 'landscape']


Input: [MASK] [MASK] wild desert life:  40%|████      | 6/15 [00:32<00:47,  5.29s/it]

 Response is not valid. ['[mask]', '[mask]', 'wild', 'desert', 'life'] ['the', '[wild]', 'desert', 'life', 'thrives', 'on', 'the', '[mysterious]', 'dunes']


Input: [MASK] [MASK] wild desert life:  47%|████▋     | 7/15 [00:35<00:37,  4.69s/it]

 Response is not valid. ['[mask]', '[mask]', 'wild', 'desert', 'life'] ['giraffes', 'thrive', 'in', 'the', 'harsh', 'wild', 'desert', 'landscape']


Input: [MASK] [MASK] wild desert life:  53%|█████▎    | 8/15 [00:39<00:30,  4.31s/it]

 Response is not valid. ['[mask]', '[mask]', 'wild', 'desert', 'life'] ['elephants', 'thrive', 'in', 'the', 'vast', 'savannah']


Input: [MASK] [MASK] wild desert life:  60%|██████    | 9/15 [00:46<00:30,  5.13s/it]

 Response is not valid. ['[mask]', '[mask]', 'wild', 'desert', 'life'] ['she', 'observed', 'a', 'fascinating', 'variety', 'of', '[flora]', '[fauna]', 'in', 'the', 'wild', 'desert', 'landscape']


Input: [MASK] [MASK] wild desert life:  67%|██████▋   | 10/15 [00:52<00:27,  5.45s/it]

 Response is not valid. ['[mask]', '[mask]', 'wild', 'desert', 'life'] ['fierce', 'predators', 'thrive', 'in', 'this', 'vast', 'rugged', 'landscape']


Input: [MASK] [MASK] wild desert life:  73%|███████▎  | 11/15 [00:55<00:19,  4.78s/it]

 Response is not valid. ['[mask]', '[mask]', 'wild', 'desert', 'life'] ['foxes', 'thrive', 'in', 'the', 'vast', 'untamed', 'wilderness']


Input: [MASK] [MASK] wild desert life:  80%|████████  | 12/15 [01:02<00:16,  5.51s/it]

 Response is not valid. ['[mask]', '[mask]', 'wild', 'desert', 'life'] ['the', 'harsh', 'desert', 'life', 'is', 'teeming', 'with', 'various', 'species', 'such', 'as', 'coyotes', 'roadrunners', 'and', 'snakes']


Input: [MASK] [MASK] wild desert life: 100%|██████████| 15/15 [01:18<00:00,  5.20s/it]


 Response is not valid. ['[mask]', '[mask]', 'wild', 'desert', 'life'] ['the', 'harsh', 'desert', 'life', 'thrives', 'on', 'minimal', 'resources']


Input: [MASK] reveals [MASK] desert life:  13%|█▎        | 2/15 [00:05<00:33,  2.56s/it]

 Response is not valid. ['[mask]', 'reveals', '[mask]', 'desert', 'life'] ['africa', 'unveils', 'exotic', 'desert', 'life']


Input: [MASK] reveals [MASK] desert life:  20%|██        | 3/15 [00:09<00:37,  3.11s/it]

 Response is not valid. ['[mask]', 'reveals', '[mask]', 'desert', 'life'] ['the', 'fox', 'reveals', 'the', 'beauty', 'of', 'the', 'desert', 'landscape']


Input: [MASK] reveals [MASK] desert life:  40%|████      | 6/15 [00:16<00:22,  2.52s/it]

 Response is not valid. ['[mask]', 'reveals', '[mask]', 'desert', 'life'] ['desert', 'reveals', 'arid', 'wildlife']


Input: [MASK] reveals [MASK] desert life:  60%|██████    | 9/15 [00:25<00:18,  3.06s/it]

 Response is not valid. ['[mask]', 'reveals', '[mask]', 'desert', 'life'] ['the', 'fox', 'unveils', 'its', 'arid', 'desert', 'life']


Input: [MASK] reveals [MASK] desert life:  73%|███████▎  | 11/15 [00:32<00:13,  3.36s/it]

 Response is not valid. ['[mask]', 'reveals', '[mask]', 'desert', 'life'] ['the', 'fox', 'unveils', 'fascinating', 'desert', 'life']


Input: [MASK] reveals [MASK] desert life:  87%|████████▋ | 13/15 [00:40<00:07,  3.62s/it]

 Response is not valid. ['[mask]', 'reveals', '[mask]', 'desert', 'life'] ['the', 'wildebeest', 'exposes', 'fascinating', 'desert', 'life']


Input: [MASK] reveals [MASK] desert life: 100%|██████████| 15/15 [00:46<00:00,  3.13s/it]
Input: [MASK] reveals wild [MASK] life:   7%|▋         | 1/15 [00:04<01:01,  4.37s/it]

 Response is not valid. ['[mask]', 'reveals', 'wild', '[mask]', 'life'] ['mastiffs', 'reveal', 'wild', 'african', 'life']


Input: [MASK] reveals wild [MASK] life:  13%|█▎        | 2/15 [00:07<00:44,  3.39s/it]

 Response is not valid. ['[mask]', 'reveals', 'wild', '[mask]', 'life'] ['africa', 'unveils', 'wild', 'safari', 'life']


Input: [MASK] reveals wild [MASK] life:  20%|██        | 3/15 [00:09<00:33,  2.76s/it]

 Response is not valid. ['[mask]', 'reveals', 'wild', '[mask]', 'life'] ['india', 'showcases', 'vibrant', 'rural', 'existence']


Input: [MASK] reveals wild [MASK] life:  33%|███▎      | 5/15 [00:14<00:26,  2.67s/it]

 Response is not valid. ['[mask]', 'reveals', 'wild', '[mask]', 'life'] ['africa', 'unveils', 'thrilling', 'wildlife', 'experience']


Input: [MASK] reveals wild [MASK] life:  40%|████      | 6/15 [00:17<00:24,  2.78s/it]

 Response is not valid. ['[mask]', 'reveals', 'wild', '[mask]', 'life'] ['the', 'sun', 'exposes', 'a', 'vivid', 'wildlife', 'scene']


Input: [MASK] reveals wild [MASK] life:  47%|████▋     | 7/15 [00:20<00:23,  2.89s/it]

 Response is not valid. ['[mask]', 'reveals', 'wild', '[mask]', 'life'] ['the', 'author', 'unveils', 'a', 'thrilling', 'wildlife', 'documentary']


Input: [MASK] reveals wild [MASK] life:  60%|██████    | 9/15 [00:31<00:25,  4.24s/it]

 Response is not valid. ['[mask]', 'reveals', 'wild', '[mask]', 'life'] ['the', 'artist', 'unveils', 'a', 'vivid', 'exotic', 'lifestyle']


Input: [MASK] reveals wild [MASK] life:  67%|██████▋   | 10/15 [00:34<00:19,  3.87s/it]

 Response is not valid. ['[mask]', 'reveals', 'wild', '[mask]', 'life'] ['africa', 'unveils', 'breathtaking', 'wildlife']


Input: [MASK] reveals wild [MASK] life:  73%|███████▎  | 11/15 [00:38<00:14,  3.68s/it]

 Response is not valid. ['[mask]', 'reveals', 'wild', '[mask]', 'life'] ['japan', 'unveils', 'mysterious', 'samurai', 'existence']


Input: [MASK] reveals wild [MASK] life:  80%|████████  | 12/15 [00:41<00:10,  3.45s/it]

 Response is not valid. ['[mask]', 'reveals', 'wild', '[mask]', 'life'] ['mars', 'unveils', 'its', 'mysterious', 'red', 'life']


Input: [MASK] reveals wild [MASK] life:  87%|████████▋ | 13/15 [00:44<00:06,  3.43s/it]

 Response is not valid. ['[mask]', 'reveals', 'wild', '[mask]', 'life'] ['mexico', 'unveils', 'vibrant', 'cityscape']


Input: [MASK] reveals wild [MASK] life:  93%|█████████▎| 14/15 [00:49<00:03,  3.81s/it]

 Response is not valid. ['[mask]', 'reveals', 'wild', '[mask]', 'life'] ['the', 'fox', 'unveils', 'its', 'extraordinary', 'life']


Input: [MASK] reveals wild [MASK] life: 100%|██████████| 15/15 [00:52<00:00,  3.48s/it]
Input: [MASK] reveals wild desert [MASK]:  33%|███▎      | 5/15 [00:25<01:02,  6.27s/it]

 Response is not valid. ['[mask]', 'reveals', 'wild', 'desert', '[mask]'] ['the', 'scientist', 'unveils', 'wild', 'desert', 'landscapes']


Input: [MASK] reveals wild desert [MASK]:  53%|█████▎    | 8/15 [00:33<00:25,  3.64s/it]

 Response is not valid. ['[mask]', 'reveals', 'wild', 'desert', '[mask]'] ['the', 'coyote', 'unveils', 'a', 'stunning', 'oasis']


Input: [MASK] reveals wild desert [MASK]:  67%|██████▋   | 10/15 [00:40<00:18,  3.71s/it]

 Response is not valid. ['[mask]', 'reveals', 'wild', 'desert', '[mask]'] ['the', 'artist', 'unveils', 'a', 'breathtaking', 'painting', 'depicting', 'the', 'vast', 'expanse', 'of', 'a', 'wild', 'desert', 'landscape']


Input: [MASK] reveals wild desert [MASK]:  73%|███████▎  | 11/15 [00:43<00:13,  3.47s/it]

 Response is not valid. ['[mask]', 'reveals', 'wild', 'desert', '[mask]'] ['the', 'sun', 'sets', 'over', 'the', 'vast', 'expanse', 'of', 'wild', 'desert', 'terrain']


Input: [MASK] reveals wild desert [MASK]:  87%|████████▋ | 13/15 [00:49<00:06,  3.39s/it]

 Response is not valid. ['[mask]', 'reveals', 'wild', 'desert', '[mask]'] ['the', 'artist', 'unveils', 'a', 'stunning', 'painting', 'depicting', 'a', 'vast', 'and', 'barren', 'wilderness']


Input: [MASK] reveals wild desert [MASK]:  93%|█████████▎| 14/15 [00:51<00:03,  3.12s/it]

 Response is not valid. ['[mask]', 'reveals', 'wild', 'desert', '[mask]'] ['the', 'artist', 'unveils', 'wild', 'desert', 'landscapes']


Input: [MASK] reveals wild desert [MASK]: 100%|██████████| 15/15 [00:54<00:00,  3.62s/it]

                 
                 
                 
                 
                 
[['', ''], ['fierce', 'predators thrive in the vast expanse of the untamed'], ['giraffes', 'thrive in the [african] [savannah]'], ['', ''], ['', ''], ['', ''], ['', ''], ['', ''], ['', ''], ['', ''], ['', ''], ['', ''], ['foxes', 'and rabbits experience challenging'], ['the', '[african] [sahara]'], ['', '']]
[['he', 'fascinating'], ['', ''], ['', ''], ['the fox', 'its arid'], ['desert', 'harsh'], ['', ''], ['sheila', 'exotic'], ['he', 'stunning'], ['', ''], ['the eagle', 'breathtaking'], ['', ''], ['he', 'his'], ['', ''], ['the fox', 'the beauty of'], ['the fox', 'the harsh']]
[['', ''], ['', ''], ['', ''], ['jeffrey', 'adventure'], ['', ''], ['', ''], ['', ''], ['the sentence should be science or nature', 'natural'], ['', ''], ['', ''], ['', ''], ['', ''], ['', ''], ['', ''], ['john', 'adventurous']]
[['the coyote', 'landscape'], ['the camel', 'scenery'], ['the fox', 'canyons'], ['the coyote', 'd




Example replacements for one of the masked prompts:
 
 \[MASK-1\] reveals \[MASK-2\] desert life.


| MASK-1      | MASK-2 |
| ----------- | ----------- |
|    he   |     fascinating   |
|  The fox  |     its arid   |
|  Desert |     harsh    |
|  Sheila  |     exotic    |
|  he        |   stunning |
|    the eagle   |     breathtaking   |
|  he  |     his    |
|  the fox  |     the beauty of   |
|  the fox |     the harsh    |


As anticipated, the model struggled to generate diverse and relevant alternatives in this specific context; many responses were rejected as invalid or yielded empty replacements. It appears that the specificity of the prompt restricts the model's ability to explore different linguistic patterns. However, the responses still give an overview of what is typically associated with wild desert life. For instance, the savannah in some African countries is renowned as a place to experience wildlife, and the model suggests adjectives commonly used to describe such experiences. In that sense, the responses help in understanding what a photographer revealing wild desert life could signify.

In [19]:
p_df_1 = compute_pmi(sentence_1, all_responses_1, anchor_word_idx, prompts_per_word)
print(p_df_1)

# Highlight text based on saliency scores
highlighted_sentence = highlight_text(sentence_1.lower(), p_df_1, thres=0.1)

# Print highlighted text
print(highlighted_sentence)

          photographer       reveals          wild        desert          life
px                 NaN  6.666667e-12  6.666667e-12  6.666667e-12  6.666667e-12
py                 NaN  6.666667e-12  6.666667e-12  6.666667e-12  6.666667e-12
pxy                NaN  6.666667e-12  6.666667e-12  6.666667e-12  6.666667e-12
pmi                NaN  3.712617e+01  3.712617e+01  3.712617e+01  3.712617e+01
saliency           NaN  1.000000e+00  1.000000e+00  1.000000e+00  1.000000e+00
[91mphotographer[0m [35mreveals[0m [35mwild[0m [35mdesert[0m [35mlife[0m 


Additionally, the consistent replacement of "photographer" with predominantly male nouns, pronouns, or even specific individuals like Elon Musk in the model's output is interesting. This pattern suggests a potential bias in the language model towards associating certain roles or professions, such as photography, with masculinity. This bias may reflect societal stereotypes or gender norms present in the text corpora.

The single instance where a female name, "Sheila", is suggested as a replacement for "photographer", with the accompanying adjective "exotic", is particularly noteworthy. It deviates from the predominantly male replacements and therefore does not follow the pattern described above. However, it still raises questions about what influences the model's output and why a feminine name like "Sheila" is paired with the adjective "exotic" in this context.

This led me to my next sentence.

### Example 2
Sentence: Young people are consumed by Instagram.

In my second example, I set the anchor to the second word ('people', index 1) and again used 15 prompts per word. I wanted to test a stereotypical sentence and chose this one to investigate whether the replacement words would be dominated by 'women', 'females', or 'girls'.

In [20]:
anchor_word_idx = 1 # index of the word of interest (the anchor)
prompts_per_word = 15 # number of generated responses per masked prompt

sentence_2 = get_input()
print("Sentence: ", sentence_2)

Sentence:  young people are consumed by instagram


In [21]:
all_responses_2 = run_prompts(model, sentence_2, anchor_word_idx, prompts_per_word)
print("                 ")
print("                 ")
print("                 ")
print("                 ")
print("                 ")

for response in all_responses_2:
    print(response)


Input: [MASK] [MASK] are consumed by instagram: 100%|██████████| 15/15 [00:40<00:00,  2.67s/it]


 Response is not valid. ['[mask]', '[mask]', 'are', 'consumed', 'by', 'instagram'] ['users', 'consume', 'instagram']


Input: young [MASK] [MASK] consumed by instagram:  47%|████▋     | 7/15 [00:20<00:23,  2.89s/it]

 Response is not valid. ['young', '[mask]', '[mask]', 'consumed', 'by', 'instagram'] ['young', 'teenagers', 'obsessed', 'with', 'instagram']


Input: young [MASK] [MASK] consumed by instagram:  53%|█████▎    | 8/15 [00:22<00:18,  2.70s/it]

 Response is not valid. ['young', '[mask]', '[mask]', 'consumed', 'by', 'instagram'] ['young', 'adults', 'engaged', 'in', 'social', 'media']


Input: young [MASK] [MASK] consumed by instagram:  87%|████████▋ | 13/15 [01:24<00:16,  8.33s/it]

 Response is not valid. ['young', '[mask]', '[mask]', 'consumed', 'by', 'instagram'] ['young', 'people', 'overwhelmed', 'by', 'instagram']


Input: young [MASK] [MASK] consumed by instagram: 100%|██████████| 15/15 [01:27<00:00,  5.85s/it]
Input: young [MASK] are [MASK] by instagram: 100%|██████████| 15/15 [00:34<00:00,  2.30s/it]
Input: young [MASK] are consumed [MASK] instagram:  60%|██████    | 9/15 [00:27<00:15,  2.62s/it]

 Response is not valid. ['young', '[mask]', 'are', 'consumed', '[mask]', 'instagram'] ['young', 'teenagers', 'are', 'obsessed', 'with', 'instagram']


Input: young [MASK] are consumed [MASK] instagram: 100%|██████████| 15/15 [00:41<00:00,  2.77s/it]
Input: young [MASK] are consumed by [MASK]: 100%|██████████| 15/15 [01:00<00:00,  4.01s/it]

                 
                 
                 
                 
                 
[['users', ''], ['photographers', 'and influencers'], ['people', ''], ['cats', 'and dogs'], ['people', 'nowadays'], ['young', 'people'], ['humans', 'and phones'], ['humans', 'and smartphones'], ['foodies', ''], ['cats', 'and dogs'], ['fruits', 'and vegetables'], ['users', ''], ['users', ''], ['people', ''], ['', '']]
[['teenagers', 'are'], ['people', 'are'], ['couple', ''], ['teenagers', 'are'], ['female', 'student'], ['sentence', 'should be young people are often'], ['', ''], ['', ''], ['young', 'artist is constantly'], ['sentence', 'could be young teenagers are'], ['teenagers', ''], ['entrepreneur', ''], ['', ''], ['entrepreneurs', ''], ['woman', 'was']]
[['teenagers', 'captivated'], ['athletes', 'inspired'], ['photographers', 'inspired'], ['people', 'inspired'], ['people', 'fascinated'], ['women', 'inundated'], ['students', 'captivated'], ['people', 'fascinated'], ['teenagers', 'captivated'], [




In [25]:
p_df_2 = compute_pmi(sentence_2, all_responses_2, anchor_word_idx, prompts_per_word)
print(p_df_2)

# Highlight text based on saliency scores
highlighted_sentence = highlight_text(sentence_2.lower(), p_df_2, thres=0.1)

# Print highlighted text
print(highlighted_sentence)

                 young  people       are      consumed            by  \
px        2.000000e-01     NaN  0.066667  4.000000e-01  5.333333e-01   
py        6.666667e-12     NaN  0.200000  6.666667e-12  9.333333e-01   
pxy       6.666667e-12     NaN  0.066667  6.666667e-12  5.333333e-01   
pmi       2.321928e+00     NaN  2.321928  1.321928e+00  9.953567e-02   
saliency  6.002145e-02     NaN  0.060021  3.301387e-02  1.096765e-12   

             instagram  
px        6.666667e-12  
py        6.666667e-12  
pxy       6.666667e-12  
pmi       3.712617e+01  
saliency  1.000000e+00  
[91myoung[0m [91mpeople[0m [91mare[0m [91mconsumed[0m [91mby[0m [35minstagram[0m 


In the results above, the term "Instagram" emerges as significant within the context of the sentence: when it is masked out, the entire domain shifts dramatically. The sentence turns into a description of the natural order, where entities like young turtles or children are consumed by forces such as predators or curiosity. These examples stand out because they deviate from the rest, illustrating how the specificity of "Instagram" rules out interpretations beyond the realm of social media. Moreover, as anticipated, some replacements for "young people" include terms like "woman" or "female", suggesting that the data underlying the model may carry a preconceived notion of typical Instagram users, such as entrepreneurs, women, teenagers, and influencers. Notably, explicit male nouns are absent from these replacements.
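
Observations like these are easier to check with a simple tally of the replacements. The sketch below counts the anchor replacements from the visible part of one printed response list (the full list is truncated in the output above); the two term sets are illustrative choices, not part of the original analysis.

```python
from collections import Counter

# (MASK-1, MASK-2) pairs copied from the visible output for the prompt
# "young [MASK] are [MASK] by instagram".
pairs = [
    ("teenagers", "captivated"), ("athletes", "inspired"),
    ("photographers", "inspired"), ("people", "inspired"),
    ("people", "fascinated"), ("women", "inundated"),
    ("students", "captivated"), ("people", "fascinated"),
    ("teenagers", "captivated"),
]

# Illustrative word lists (an assumption made for this sketch).
female_terms = {"woman", "women", "female", "girl", "girls"}
male_terms = {"man", "men", "male", "boy", "boys"}

# Count what the model wrote in place of the anchor word, skipping empty fills.
anchor_counts = Counter(first for first, _ in pairs if first)
n_female = sum(c for word, c in anchor_counts.items() if word in female_terms)
n_male = sum(c for word, c in anchor_counts.items() if word in male_terms)

print(anchor_counts)
print("female terms:", n_female, "male terms:", n_male)
```

With only a handful of samples per prompt, such counts are indicative at best; a larger `prompts_per_word` would be needed before drawing firm conclusions about bias.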

## 5. Bonus Exercises
### 5.1 Language pre-processing. 
In this exercise, we only lowercase the text and split sentences into words; there is much more that can be done to pre-process the language, for example handling contractions (*I'll*, *She's*, *world's*), suffixes and prefixes, and compound words (*hard-working*). This is called word tokenization in NLP, and some Python packages can do this work for us, e.g. [*TextBlob*](https://textblob.readthedocs.io/en/dev/).
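
To make the idea concrete without extra dependencies, here is a minimal sketch using only the standard library. It expands a few unambiguous contractions and keeps hyphenated compounds and possessives as single tokens; the contraction table and the function name `simple_tokenize` are illustrative, and a dedicated NLP package would cover far more cases.

```python
import re

# A few unambiguous English contractions; illustrative, not exhaustive.
CONTRACTIONS = {"'ll": " will", "'re": " are", "'ve": " have"}

def simple_tokenize(text):
    """Lowercase, expand a few contractions, and keep hyphenated compounds whole."""
    text = text.lower().replace("\u2019", "'")  # normalize curly apostrophes
    for suffix, expansion in CONTRACTIONS.items():
        text = text.replace(suffix, expansion)
    # A token may contain internal hyphens or apostrophes (hard-working, world's).
    return re.findall(r"[a-z0-9]+(?:[-'][a-z0-9]+)*", text)

tokens = simple_tokenize("I'll meet the hard-working team; she's the world's best.")
print(tokens)
```

The ambiguous *'s* (possessive or "is") is deliberately left attached to its word, since resolving it needs grammatical context.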


### 5.2 Better word matching
In the above example of
> Tokyo is the capital of Japan and a popular metropolis in the world.

the GenAI model never gives the specific word 'metropolis' when it is masked out; instead, it sometimes provides words like 'city', which is not the same word but has a similar meaning. Instead of measuring exact matches of words (i.e. 0 or 1), we can also measure the similarity of two words, e.g. the cosine similarity of their word embeddings, which ranges from -1 to 1.
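
A soft match of this kind could look like the sketch below. The three-dimensional vectors are invented purely for illustration; in practice the embeddings would come from a pretrained model, and `soft_match` is a hypothetical helper that clips negative similarities to 0 so the score can replace the 0/1 exact match.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two vectors, in [-1, 1]."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Toy "embeddings", invented for this example only.
toy_embeddings = {
    "metropolis": [0.9, 0.8, 0.1],
    "city": [0.8, 0.9, 0.2],
    "banana": [0.1, 0.0, 0.9],
}

def soft_match(generated, target, embeddings):
    """Score in [0, 1] that replaces the exact 0/1 word match."""
    sim = cosine_similarity(embeddings[generated], embeddings[target])
    return max(sim, 0.0)

print(round(soft_match("city", "metropolis", toy_embeddings), 2))
print(round(soft_match("banana", "metropolis", toy_embeddings), 2))
```

The counts used for $P(x)$, $P(y)$ and $P(x,y)$ would then accumulate these scores instead of adding 1 only on exact matches.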