Matisse Colombon S4119576

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/gsarti/ik-nlp-tutorials/blob/main/notebooks/W5E_Inseq_Analysis.ipynb)

In [1]:
# Run in Colab to install local packages
!pip install inseq

# Exercise 1: Analyzing language generation models with Inseq 🐛

*Adapted in part from the [Inseq documentation](https://inseq.readthedocs.io/)*

Inseq is a toolkit based on the 🤗 Transformers and [Captum](https://captum.ai/docs/introduction) libraries for interpreting language generation models using feature attribution methods. Inseq allows you to analyze the behavior of a language generation model by computing the importance of each input token for each token in the generated output. The importance can be obtained using approaches based on attention, gradients and more, which we will see in more detail in the final lecture.

Inseq is a relatively new library, and it is still under active development (contributions welcome! 🙂). You can refer to the [Inseq paper](https://arxiv.org/abs/2302.13942) for an overview of the tool and some examples, or [this paper](https://www.semanticscholar.org/paper/Are-Character-level-Translations-Worth-the-Wait-An-Edman-Toral/ed7b51e4a5c4835218f6697b280afb2849211939) for a recent work from our GroNLP group on using Inseq to analyze character-level translation models.

In the following sections two simple use-cases of Inseq are presented.

## Attributing (Un)constrained Machine Translation

In this section we will use Inseq to compute the importance of each input token for each token in the generated output. We will use the [Helsinki-NLP/opus-mt-en-nl](https://huggingface.co/Helsinki-NLP/opus-mt-en-nl) model, which is a pretrained machine translation model from English to Dutch.
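
To make the idea of input × gradient attribution concrete before running it on a real model, here is a minimal pure-Python sketch on a toy scoring function. Everything in it is an assumption made for illustration: the `score` function, the two-dimensional embeddings, the weights, the finite-difference gradient, and the choice to aggregate with an L2 norm and normalize to sum to 1. It is not how Inseq or the translation model are implemented.

```python
import math


def score(embeddings, weights):
    """Toy 'model': tanh over the summed token embeddings, projected to a scalar."""
    dim = len(weights)
    pooled = [sum(tok[d] for tok in embeddings) for d in range(dim)]
    return sum(w * math.tanh(p) for w, p in zip(weights, pooled))


def numerical_gradient(embeddings, weights, eps=1e-6):
    """Central finite-difference gradient of the score for every embedding value."""
    grads = []
    for i, tok in enumerate(embeddings):
        row = []
        for d in range(len(tok)):
            plus = [list(t) for t in embeddings]
            minus = [list(t) for t in embeddings]
            plus[i][d] += eps
            minus[i][d] -= eps
            row.append((score(plus, weights) - score(minus, weights)) / (2 * eps))
        grads.append(row)
    return grads


def input_x_gradient(embeddings, weights):
    """Multiply each embedding by its gradient, take the L2 norm per token, normalize."""
    grads = numerical_gradient(embeddings, weights)
    raw = [
        math.sqrt(sum((e * g) ** 2 for e, g in zip(tok, grad)))
        for tok, grad in zip(embeddings, grads)
    ]
    total = sum(raw)
    return [r / total for r in raw]


# Made-up tokens, embeddings and weights (not taken from any real model)
tokens = ["tok_a", "tok_b", "tok_c"]
embeddings = [[0.5, -0.2], [1.0, 0.8], [0.0, 0.0]]
weights = [1.0, -0.5]

attributions = input_x_gradient(embeddings, weights)
for token, value in zip(tokens, attributions):
    print(f"{token:>6}: {value:.3f}")
```

A token with an all-zero embedding receives zero importance here, which is one known property of input × gradient: the score depends on the input values themselves, not only on the gradient.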

In [2]:
import inseq
import warnings

warnings.filterwarnings("ignore")

model = inseq.load_model("Helsinki-NLP/opus-mt-en-nl", "input_x_gradient")
out = model.attribute(
    "Don't get hot-headed mate, it's easy peasy to study models with Inseq!",
    attribute_target=True,
    step_scores=["probability"]
)
out.show()

Attributing with input_x_gradient...: 100%|██████████| 20/20 [00:00<00:00, 36.57it/s]


Source token,▁Maak,▁je,▁niet,▁zo,▁druk,",",▁het,▁is,▁makkelijk,▁om,▁modellen,▁te,▁bestuderen,▁met,▁In,se,q,!,</s>
▁Don,0.045,0.036,0.02,0.028,0.022,0.026,0.02,0.014,0.02,0.01,0.011,0.012,0.016,0.012,0.007,0.006,0.005,0.027,0.02
',0.019,0.017,0.013,0.017,0.011,0.016,0.012,0.007,0.016,0.007,0.007,0.008,0.011,0.006,0.005,0.004,0.003,0.025,0.017
t,0.031,0.025,0.028,0.021,0.017,0.02,0.018,0.01,0.009,0.005,0.006,0.007,0.008,0.006,0.008,0.004,0.002,0.014,0.012
▁get,0.118,0.068,0.056,0.05,0.049,0.027,0.03,0.02,0.014,0.01,0.013,0.012,0.015,0.012,0.01,0.005,0.004,0.018,0.021
▁hot,0.066,0.073,0.071,0.088,0.084,0.042,0.026,0.022,0.016,0.01,0.014,0.011,0.012,0.011,0.009,0.008,0.007,0.022,0.019
-,0.021,0.02,0.024,0.027,0.025,0.018,0.01,0.007,0.008,0.005,0.005,0.005,0.007,0.005,0.005,0.005,0.006,0.013,0.009
headed,0.143,0.146,0.179,0.136,0.178,0.133,0.068,0.051,0.029,0.023,0.034,0.027,0.047,0.032,0.021,0.013,0.009,0.041,0.042
▁mate,0.085,0.072,0.081,0.061,0.07,0.095,0.052,0.02,0.02,0.013,0.014,0.013,0.02,0.015,0.01,0.005,0.005,0.021,0.022
",",0.02,0.02,0.017,0.016,0.015,0.03,0.019,0.013,0.014,0.006,0.005,0.007,0.008,0.006,0.004,0.004,0.003,0.021,0.018
▁it,0.017,0.012,0.012,0.01,0.009,0.021,0.026,0.022,0.014,0.012,0.008,0.013,0.014,0.009,0.005,0.004,0.003,0.017,0.014
',0.013,0.015,0.01,0.014,0.009,0.021,0.018,0.025,0.022,0.014,0.011,0.011,0.013,0.009,0.007,0.004,0.003,0.035,0.03
s,0.014,0.011,0.008,0.01,0.008,0.014,0.019,0.036,0.022,0.015,0.009,0.008,0.009,0.007,0.008,0.007,0.01,0.014,0.012
▁easy,0.045,0.048,0.023,0.03,0.02,0.026,0.056,0.082,0.082,0.056,0.025,0.031,0.027,0.023,0.012,0.006,0.004,0.044,0.03
▁peas,0.107,0.06,0.043,0.051,0.054,0.06,0.19,0.201,0.314,0.259,0.149,0.069,0.101,0.072,0.072,0.025,0.021,0.08,0.066
y,0.021,0.02,0.011,0.01,0.01,0.01,0.034,0.043,0.039,0.039,0.026,0.014,0.018,0.013,0.014,0.005,0.004,0.009,0.008
▁to,0.028,0.019,0.015,0.015,0.013,0.015,0.036,0.034,0.032,0.033,0.03,0.031,0.044,0.027,0.015,0.014,0.016,0.017,0.014
▁study,0.049,0.021,0.02,0.021,0.018,0.021,0.039,0.052,0.037,0.072,0.104,0.108,0.206,0.113,0.028,0.012,0.008,0.037,0.027
▁models,0.053,0.034,0.027,0.035,0.027,0.039,0.037,0.037,0.062,0.1,0.235,0.118,0.157,0.099,0.046,0.011,0.009,0.066,0.046
▁with,0.013,0.011,0.007,0.01,0.008,0.011,0.011,0.013,0.014,0.018,0.031,0.06,0.038,0.078,0.026,0.013,0.012,0.019,0.015
▁In,0.009,0.007,0.007,0.009,0.006,0.011,0.009,0.008,0.009,0.012,0.033,0.041,0.018,0.05,0.156,0.047,0.025,0.015,0.011
s,0.006,0.007,0.006,0.008,0.006,0.007,0.006,0.007,0.009,0.008,0.011,0.017,0.011,0.017,0.049,0.111,0.045,0.013,0.011
eq,0.033,0.043,0.034,0.052,0.037,0.059,0.045,0.042,0.065,0.045,0.079,0.137,0.077,0.181,0.336,0.55,0.6,0.121,0.119
!,0.025,0.017,0.016,0.023,0.013,0.026,0.017,0.02,0.026,0.01,0.019,0.02,0.02,0.021,0.017,0.012,0.018,0.053,0.075
</s>,0.021,0.015,0.017,0.018,0.016,0.017,0.014,0.016,0.022,0.015,0.018,0.021,0.02,0.025,0.042,0.055,0.061,0.03,0.049
probability,0.045,0.495,0.396,0.337,0.474,0.555,0.349,0.853,0.603,0.656,0.738,0.656,0.422,0.497,0.917,0.829,0.959,0.461,0.897

Target prefix token,▁Maak,▁je,▁niet,▁zo,▁druk,",",▁het,▁is,▁makkelijk,▁om,▁modellen,▁te,▁bestuderen,▁met,▁In,se,q,!,</s>
▁Maak,,0.184,0.166,0.116,0.135,0.049,0.044,0.038,0.012,0.015,0.011,0.011,0.004,0.009,0.006,0.002,0.001,0.02,0.02
▁je,,,0.091,0.063,0.046,0.037,0.03,0.031,0.008,0.01,0.01,0.007,0.003,0.006,0.003,0.001,0.001,0.014,0.015
▁niet,,,,0.058,0.041,0.024,0.023,0.018,0.006,0.007,0.006,0.006,0.002,0.005,0.002,0.001,0.001,0.009,0.009
▁zo,,,,,0.051,0.036,0.025,0.021,0.008,0.007,0.007,0.007,0.002,0.006,0.002,0.001,0.001,0.012,0.011
▁druk,,,,,,0.091,0.03,0.028,0.012,0.019,0.01,0.012,0.004,0.007,0.004,0.002,0.001,0.015,0.015
",",,,,,,,0.035,0.029,0.008,0.016,0.007,0.007,0.003,0.007,0.003,0.001,0.001,0.014,0.014
▁het,,,,,,,,0.033,0.015,0.031,0.007,0.01,0.003,0.005,0.003,0.001,0.001,0.01,0.008
▁is,,,,,,,,,0.017,0.026,0.007,0.008,0.003,0.004,0.003,0.001,0.001,0.008,0.008
▁makkelijk,,,,,,,,,,0.073,0.026,0.021,0.011,0.01,0.007,0.002,0.002,0.016,0.015
▁om,,,,,,,,,,,0.014,0.022,0.009,0.008,0.005,0.002,0.002,0.011,0.01
▁modellen,,,,,,,,,,,,0.089,0.03,0.033,0.019,0.006,0.003,0.019,0.02
▁te,,,,,,,,,,,,,0.009,0.009,0.004,0.002,0.001,0.009,0.007
▁bestuderen,,,,,,,,,,,,,,0.044,0.012,0.005,0.002,0.029,0.016
▁met,,,,,,,,,,,,,,,0.013,0.01,0.007,0.007,0.008
▁In,,,,,,,,,,,,,,,,0.034,0.026,0.005,0.01
se,,,,,,,,,,,,,,,,,0.066,0.009,0.013
q,,,,,,,,,,,,,,,,,,0.02,0.025
!,,,,,,,,,,,,,,,,,,,0.07
</s>,,,,,,,,,,,,,,,,,,,
probability,0.045,0.495,0.396,0.337,0.474,0.555,0.349,0.853,0.603,0.656,0.738,0.656,0.422,0.497,0.917,0.829,0.959,0.461,0.897


<details>
    <summary> Observations </summary>

- The model translates idiomatic expressions in a non-literal way. Attribution scores reflect that the model is attributing strong importance to pieces of the idiom when translating (e.g. `_headed`, `_peas`), while also accounting for the prefix when producing an idiomatic translation (e.g. `_Maak`).
- The model gets progressively more confident in its translation as it generates more tokens. The first generated token is very unlikely.
</details>

Let's now try to **constrain** the generation to a more literal translation of the input. Attributing a prespecified output can be intuitively thought of as a way to ask a model to justify a possible prediction. Note that this should be done with care: if the output is very unlikely, the results will be very noisy.

In [3]:
import inseq

model = inseq.load_model("Helsinki-NLP/opus-mt-en-nl", "input_x_gradient")
out = model.attribute(
    "Don't get hot-headed mate, it's easy peasy to study models with Inseq!",
    #"Niet heethoofdig worden maatje, het is gemakkelijk peasy om modellen te bestuderen met Inseq!",
    "Niet heethoofdig worden man, het is een makkie om modellen te bestuderen met Inseq!",
    attribute_target=True,
    step_scores=["probability"]
)
out.show()

Attributing with input_x_gradient...: 100%|██████████| 24/24 [00:00<00:00, 37.62it/s]


Source token,▁Niet,▁heet,hoofd,ig,▁worden,▁man,",",▁het,▁is,▁een,▁,mak,kie,▁om,▁modellen,▁te,▁bestuderen,▁met,▁In,se,q,!,</s>
▁Don,0.08,0.041,0.013,0.018,0.036,0.024,0.028,0.017,0.011,0.013,0.009,0.011,0.006,0.009,0.01,0.013,0.016,0.011,0.008,0.006,0.005,0.027,0.013
',0.031,0.025,0.012,0.015,0.024,0.016,0.024,0.015,0.006,0.008,0.005,0.006,0.004,0.007,0.007,0.008,0.011,0.006,0.004,0.004,0.003,0.026,0.01
t,0.045,0.021,0.013,0.013,0.024,0.02,0.018,0.011,0.007,0.007,0.006,0.005,0.003,0.005,0.006,0.007,0.008,0.005,0.007,0.004,0.002,0.014,0.009
▁get,0.095,0.041,0.018,0.034,0.071,0.042,0.025,0.019,0.013,0.011,0.009,0.011,0.006,0.008,0.011,0.01,0.015,0.01,0.01,0.005,0.004,0.02,0.021
▁hot,0.082,0.091,0.09,0.055,0.064,0.042,0.026,0.021,0.016,0.014,0.013,0.015,0.01,0.012,0.011,0.01,0.012,0.01,0.009,0.007,0.007,0.023,0.017
-,0.023,0.023,0.032,0.019,0.018,0.021,0.013,0.01,0.006,0.007,0.005,0.006,0.003,0.004,0.005,0.005,0.007,0.005,0.005,0.005,0.006,0.012,0.007
headed,0.126,0.186,0.263,0.143,0.092,0.145,0.071,0.06,0.039,0.032,0.033,0.04,0.02,0.026,0.032,0.027,0.046,0.03,0.022,0.013,0.009,0.041,0.039
▁mate,0.08,0.05,0.056,0.112,0.071,0.197,0.066,0.088,0.042,0.04,0.024,0.022,0.016,0.021,0.036,0.02,0.02,0.025,0.021,0.006,0.005,0.028,0.04
",",0.025,0.021,0.015,0.022,0.018,0.026,0.021,0.014,0.012,0.009,0.006,0.008,0.005,0.007,0.006,0.007,0.008,0.006,0.004,0.004,0.003,0.021,0.011
▁it,0.02,0.016,0.012,0.012,0.011,0.016,0.019,0.017,0.024,0.013,0.011,0.011,0.005,0.01,0.008,0.014,0.014,0.008,0.005,0.004,0.003,0.017,0.009
',0.016,0.026,0.012,0.012,0.016,0.016,0.027,0.018,0.029,0.017,0.012,0.011,0.006,0.012,0.009,0.012,0.012,0.009,0.006,0.004,0.003,0.035,0.019
s,0.014,0.014,0.009,0.008,0.011,0.014,0.014,0.016,0.043,0.02,0.017,0.015,0.007,0.012,0.007,0.009,0.009,0.007,0.007,0.007,0.01,0.014,0.008
▁easy,0.036,0.037,0.016,0.02,0.038,0.021,0.037,0.045,0.097,0.123,0.118,0.132,0.056,0.047,0.027,0.035,0.026,0.022,0.012,0.006,0.004,0.043,0.021
▁peas,0.098,0.079,0.051,0.041,0.069,0.061,0.086,0.121,0.249,0.265,0.253,0.214,0.107,0.167,0.111,0.072,0.097,0.063,0.066,0.024,0.02,0.073,0.053
y,0.016,0.009,0.007,0.009,0.017,0.01,0.012,0.022,0.049,0.045,0.051,0.037,0.018,0.027,0.017,0.013,0.017,0.012,0.013,0.005,0.004,0.008,0.008
▁to,0.02,0.017,0.012,0.013,0.019,0.016,0.015,0.025,0.037,0.037,0.038,0.034,0.014,0.023,0.028,0.031,0.042,0.026,0.014,0.014,0.017,0.016,0.011
▁study,0.032,0.03,0.016,0.016,0.029,0.02,0.03,0.028,0.044,0.041,0.031,0.046,0.021,0.055,0.092,0.097,0.199,0.109,0.025,0.011,0.008,0.035,0.023
▁models,0.033,0.05,0.03,0.024,0.038,0.026,0.051,0.036,0.029,0.038,0.038,0.044,0.042,0.068,0.208,0.104,0.152,0.096,0.045,0.011,0.009,0.063,0.029
▁with,0.011,0.015,0.008,0.007,0.011,0.008,0.014,0.011,0.01,0.011,0.009,0.011,0.007,0.013,0.029,0.049,0.036,0.074,0.024,0.013,0.012,0.018,0.011
▁In,0.009,0.011,0.007,0.006,0.009,0.009,0.013,0.008,0.008,0.007,0.007,0.007,0.004,0.008,0.033,0.032,0.017,0.048,0.137,0.047,0.025,0.015,0.01
s,0.007,0.012,0.007,0.006,0.008,0.007,0.009,0.006,0.006,0.007,0.006,0.005,0.003,0.005,0.01,0.014,0.011,0.016,0.044,0.111,0.045,0.012,0.009
eq,0.049,0.082,0.047,0.037,0.061,0.052,0.086,0.044,0.041,0.037,0.034,0.037,0.021,0.039,0.078,0.109,0.075,0.171,0.301,0.542,0.593,0.111,0.093
!,0.028,0.034,0.02,0.018,0.027,0.021,0.036,0.021,0.014,0.013,0.011,0.01,0.007,0.012,0.018,0.019,0.019,0.02,0.016,0.012,0.018,0.05,0.066
</s>,0.023,0.026,0.019,0.016,0.021,0.024,0.025,0.016,0.015,0.015,0.014,0.012,0.007,0.011,0.015,0.018,0.02,0.023,0.038,0.054,0.061,0.028,0.043
probability,0.133,0.013,0.684,0.71,0.687,0.004,0.644,0.585,0.881,0.011,0.046,0.963,0.948,0.797,0.724,0.625,0.392,0.48,0.921,0.827,0.958,0.618,0.899

Target prefix token,▁Niet,▁heet,hoofd,ig,▁worden,▁man,",",▁het,▁is,▁een,▁,mak,kie,▁om,▁modellen,▁te,▁bestuderen,▁met,▁In,se,q,!,</s>
▁Niet,,0.045,0.046,0.038,0.046,0.025,0.029,0.033,0.017,0.01,0.012,0.01,0.006,0.008,0.006,0.005,0.002,0.004,0.004,0.001,0.001,0.014,0.015
▁heet,,,0.167,0.09,0.031,0.021,0.02,0.021,0.012,0.011,0.016,0.012,0.01,0.009,0.006,0.007,0.003,0.004,0.006,0.002,0.001,0.01,0.012
hoofd,,,,0.195,0.065,0.034,0.041,0.047,0.017,0.022,0.02,0.013,0.014,0.011,0.009,0.013,0.005,0.006,0.009,0.002,0.002,0.017,0.015
ig,,,,,0.057,0.025,0.025,0.04,0.014,0.016,0.014,0.01,0.005,0.009,0.008,0.009,0.003,0.006,0.009,0.001,0.001,0.014,0.012
▁worden,,,,,,0.04,0.034,0.045,0.015,0.016,0.011,0.013,0.006,0.01,0.011,0.01,0.003,0.008,0.012,0.001,0.001,0.018,0.013
▁man,,,,,,,0.083,0.102,0.039,0.046,0.026,0.024,0.011,0.023,0.032,0.018,0.007,0.027,0.027,0.004,0.002,0.037,0.037
",",,,,,,,,0.022,0.015,0.01,0.013,0.013,0.007,0.01,0.004,0.005,0.002,0.003,0.002,0.001,0.001,0.009,0.009
▁het,,,,,,,,,0.024,0.021,0.028,0.022,0.012,0.014,0.005,0.005,0.002,0.003,0.003,0.001,0.001,0.007,0.007
▁is,,,,,,,,,,0.019,0.041,0.033,0.013,0.014,0.006,0.005,0.002,0.003,0.003,0.001,0.001,0.005,0.006
▁een,,,,,,,,,,,0.061,0.061,0.031,0.02,0.008,0.007,0.002,0.004,0.004,0.001,0.001,0.008,0.009
▁,,,,,,,,,,,,0.038,0.07,0.025,0.008,0.009,0.004,0.004,0.003,0.001,0.001,0.004,0.007
mak,,,,,,,,,,,,,0.416,0.122,0.035,0.042,0.017,0.017,0.014,0.004,0.004,0.02,0.031
kie,,,,,,,,,,,,,,0.115,0.034,0.037,0.016,0.015,0.014,0.003,0.003,0.016,0.021
▁om,,,,,,,,,,,,,,,0.014,0.019,0.009,0.007,0.005,0.002,0.002,0.007,0.008
▁modellen,,,,,,,,,,,,,,,,0.075,0.027,0.026,0.016,0.005,0.003,0.012,0.016
▁te,,,,,,,,,,,,,,,,,0.008,0.008,0.003,0.002,0.001,0.007,0.006
▁bestuderen,,,,,,,,,,,,,,,,,,0.039,0.01,0.005,0.002,0.02,0.017
▁met,,,,,,,,,,,,,,,,,,,0.013,0.01,0.006,0.004,0.009
▁In,,,,,,,,,,,,,,,,,,,,0.033,0.025,0.004,0.011
se,,,,,,,,,,,,,,,,,,,,,0.066,0.005,0.017
q,,,,,,,,,,,,,,,,,,,,,,0.012,0.037
!,,,,,,,,,,,,,,,,,,,,,,,0.102
</s>,,,,,,,,,,,,,,,,,,,,,,,
probability,0.133,0.013,0.684,0.71,0.687,0.004,0.644,0.585,0.881,0.011,0.046,0.963,0.948,0.797,0.724,0.625,0.392,0.48,0.921,0.827,0.958,0.618,0.899


### Your turn to comment

Comment on the resulting scores from the constrained, less idiomatic example, putting them in relation to the unconstrained, more idiomatic one. Consider the following aspects, but feel free to explore other examples and add your own observations:
1. How are importance scores distributed on idiomatic and non-idiomatic tokens?
2. What is the difference in probability between the two examples?
3. Do you notice some patterns regarding the low-probability tokens in the second example?
4. When is the target prefix playing a more important role in the generation, according to the attribution scores?

1: For idiomatic translations the model spreads importance over the whole idiom, while for non-idiomatic tokens the importance is concentrated on a single source token.

2: In the constrained, literal translation the model is fairly confident about `hoofd`, `ig` and `▁worden` (about 0.7) but assigns very low probabilities to several forced tokens (`▁heet` 0.013, `▁man` 0.004, `▁een` 0.011). In the unconstrained, idiomatic translation the opening tokens are less likely per token than that, but after the first token the probabilities are steadier, between roughly 0.34 and 0.56 for `▁je ▁niet ▁zo ▁druk ,`, and no later token drops below 0.33.

3: The low probabilities fall on the tokens that start a literal translation the model would not have chosen itself, such as `▁heet` for 'hot-headed' (0.013). Once such a word has been started, its continuation is likely again (`hoofd` 0.684, `ig` 0.71).

4: As the target saliency heatmap shows, the prefix of the first sentence (`▁Maak`) is more important than the prefix of the second sentence (`▁Niet`), so the target prefix plays a larger role in the idiomatic translation.
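
The comparison in point 2 can be made concrete by combining the per-token probabilities into a single sequence score. The sketch below copies the `probability` rows from the two outputs above and sums their logarithms; the helper names and the 0.05 threshold for a "low-probability" token are choices made for this illustration, not something Inseq defines.

```python
import math

# Per-token probabilities copied from the "probability" rows above
unconstrained = [
    ("▁Maak", 0.045), ("▁je", 0.495), ("▁niet", 0.396), ("▁zo", 0.337),
    ("▁druk", 0.474), (",", 0.555), ("▁het", 0.349), ("▁is", 0.853),
    ("▁makkelijk", 0.603), ("▁om", 0.656), ("▁modellen", 0.738), ("▁te", 0.656),
    ("▁bestuderen", 0.422), ("▁met", 0.497), ("▁In", 0.917), ("se", 0.829),
    ("q", 0.959), ("!", 0.461), ("</s>", 0.897),
]
constrained = [
    ("▁Niet", 0.133), ("▁heet", 0.013), ("hoofd", 0.684), ("ig", 0.71),
    ("▁worden", 0.687), ("▁man", 0.004), (",", 0.644), ("▁het", 0.585),
    ("▁is", 0.881), ("▁een", 0.011), ("▁", 0.046), ("mak", 0.963),
    ("kie", 0.948), ("▁om", 0.797), ("▁modellen", 0.724), ("▁te", 0.625),
    ("▁bestuderen", 0.392), ("▁met", 0.48), ("▁In", 0.921), ("se", 0.827),
    ("q", 0.958), ("!", 0.618), ("</s>", 0.899),
]


def sequence_log_prob(steps):
    """Sum of log-probabilities, i.e. the log of the product of all step probabilities."""
    return sum(math.log(p) for _, p in steps)


def low_probability_tokens(steps, threshold=0.05):
    """Tokens whose probability falls below an (arbitrary) threshold."""
    return [token for token, p in steps if p < threshold]


for name, steps in [("unconstrained", unconstrained), ("constrained", constrained)]:
    log_prob = sequence_log_prob(steps)
    # Average log-probability per token, so sequences of different length are comparable
    per_token = log_prob / len(steps)
    print(f"{name}: log-prob {log_prob:.2f}, per token {per_token:.2f}, "
          f"low-probability tokens {low_probability_tokens(steps)}")
```

Working in log space avoids multiplying many small numbers, and dividing by the number of tokens keeps the longer constrained sequence from being penalized for its length alone.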

## Contrastive Attribution for Motivating Preferences

In the previous section we used importance scores produced by attributing the next token's probability, which can be seen as answering the question “Which elements of the input are the most relevant to produce the next generation step?”.

However, in many cases we might be more interested in understanding why our model generated its output **rather than another one that we consider to be more likely**. The paper [“Interpreting Language Models with Contrastive Explanations”](https://arxiv.org/abs/2202.10419) by Yin and Neubig (2022) proposes a contrastive attribution method that can be used to answer this question. The method is integrated in Inseq and can be used as follows:

In [4]:
import inseq

model = inseq.load_model("google/flan-t5-base", "input_x_gradient")

out = model.attribute(
    "Does 3 + 3 equal 6?",
    # Fix the original target
    "yes",
    attributed_fn="contrast_prob_diff",
    # Also show the probability delta between the two options
    step_scores=["contrast_prob_diff", "probability"],
    contrast_targets="no",
)

# Attributions are normalized to sum to 1 by default, as in the output below.
# To see how each token contributes to the probability difference instead,
# call out.weight_attributions("contrast_prob_diff") before out.show().
out.show()

Attributing with input_x_gradient...: 100%|██████████| 3/3 [00:00<00:00,  8.69it/s]


Source token,▁no → ▁yes,</s>
▁Does,0.149,0.0
▁3,0.09,0.0
▁+,0.089,0.0
▁3,0.097,0.0
▁equal,0.191,0.0
▁6,0.154,0.0
?,0.11,0.0
</s>,0.12,0.0
contrast_prob_diff,0.05,-0.0
probability,0.523,1.0


We can see that the model is relying more heavily on formulation keywords (`Does`, `equal`) and less on the numbers to determine the answer. The gap between the positive and negative answer is also quite small (5%), suggesting that the model is not very confident in its answer. Changing the input to `Does 3 + 3 equal 7?` confirms that the actual expression is not playing a relevant role in the generation.
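
The numbers in the table above can be unpacked with a little arithmetic. Assuming `contrast_prob_diff` is the probability of the original target minus the probability of the contrastive one, the probability of `no` follows from the two step scores. The sketch also shows what rescaling the normalized scores by the probability difference would give; that rescaling rule is an assumption about what `weight_attributions` does, and the variable names are made up for this example.

```python
# Step scores for the first generated token, copied from the table above
p_yes = 0.523      # "probability" row
prob_diff = 0.05   # "contrast_prob_diff" row

# Assumption: contrast_prob_diff = p(original target) - p(contrastive target)
p_no = p_yes - prob_diff

# Normalized attributions for the "▁no → ▁yes" column (a list, since "▁3" occurs twice)
attributions = [
    ("▁Does", 0.149), ("▁3", 0.09), ("▁+", 0.089), ("▁3", 0.097),
    ("▁equal", 0.191), ("▁6", 0.154), ("?", 0.11), ("</s>", 0.12),
]

# Assumed weighting rule: multiply each normalized score by the probability difference
weighted = [(token, score * prob_diff) for token, score in attributions]

print(f"p(no) is roughly {p_no:.3f}")
print(f"normalized scores sum to {sum(score for _, score in attributions):.3f}")
for token, score in weighted:
    print(f"{token:>7}: {score:.4f}")
```

Under these assumptions the weighted scores add up to the probability gap itself, so each value can be read as that token's share of the gap between `yes` and `no`.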

> ⚠️ **Important**: Since contrastive attribution compares the probabilities of a pair of `(original, contrastive)` tokens, in order for it to work, the compared sequences must have the same length. For example, if "yes" was tokenized as `_y`, `es` we couldn't have compared it with a single token `_no` using Inseq.
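
A quick length check before attributing can catch this early. The sketch below uses hand-written token lists in place of real tokenizer output; the function name and the `▁y`, `es` split are hypothetical, and in practice the lists would come from the model's own tokenizer.

```python
def same_token_length(original_tokens, contrast_tokens):
    """Return True when the two tokenized targets can be compared position by position."""
    return len(original_tokens) == len(contrast_tokens)


# Hypothetical tokenizations, written by hand for illustration
candidate_pairs = {
    "single-token pair": (["▁yes"], ["▁no"]),
    "hypothetical split": (["▁y", "es"], ["▁no"]),
}

results = {
    name: same_token_length(original, contrast)
    for name, (original, contrast) in candidate_pairs.items()
}

for name, ok in results.items():
    status = "can be contrasted" if ok else "lengths differ, cannot be contrasted"
    print(f"{name}: {status}")
```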

### Your turn to attribute

Using the generation model and a task of your choice, try to use contrastive attribution on at least three examples to highlight some interesting pattern of your choice. We encourage you to explore whatever you find most interesting, but here are some suggestions:

- Is negation relevant in producing the correct answer in open question answering models like the one we used in the previous example?

- When translating sentences like `The nurse went to the hospital` to a gendered language like Spanish, Italian or German, the model will have to select a gender for the subject. What is the model relying on to make this choice?

- Considering a fixed example like `Does 3 + 3 = 6?` but using models with increasingly more parameters (e.g. `flan-t5-small`, `flan-t5-base`, `flan-t5-large`), how do input importance and model confidence change?

After producing the visualizations, comment on the results and try to explain what you observe.

Inseq also supports attribution of quantized models (see [example](https://inseq.readthedocs.io/examples/locate_gpt2_knowledge.html)), in case you want to explore using larger models for your analysis. Refer to the [Inseq documentation](https://inseq.readthedocs.io/) for more details.

I looked at how model size affects token importance and model confidence.
Looking first at the probabilities, the larger the model, the more confident it is in the answer `no`: 0.61 for small, 0.838 for base and 0.905 for large. The biggest jump is from small to base (+0.228); the jump from base to large is smaller (+0.067) but still clear.

`▁London` has the highest attribution in all three models, and `▁capital` and `▁France` are also among the highest, which seem like the right words for answering the question. In the large model, however, `</s>` (0.122) and the first `▁the` (0.108) score above `▁France` (0.099).

In [6]:
import inseq

models = {
    "small": inseq.load_model("google/flan-t5-small", "input_x_gradient"),
    "base": inseq.load_model("google/flan-t5-base", "input_x_gradient"),
    "large": inseq.load_model("google/flan-t5-large", "input_x_gradient")
}

question = "Is the London the capital of France?"

for size, model in models.items():
    out = model.attribute(
        question,
        # Fix the original target
        "no",
        attributed_fn="contrast_prob_diff",
        # Also show the probability delta between the two options
        step_scores=["contrast_prob_diff", "probability"],
        contrast_targets="yes",
    )

    # Weight the attributions by the probability difference between "no" and "yes"
    out.weight_attributions("contrast_prob_diff")
    print(f"Model: {size}")
    out.show()

Attributing with input_x_gradient...: 100%|██████████| 3/3 [00:00<00:00, 18.09it/s]

Model: small





Source token,▁yes → ▁no,</s>
▁I,0.028,-0.0
s,0.023,-0.0
▁the,0.013,-0.0
▁London,0.045,-0.0
▁the,0.017,-0.0
▁capital,0.041,-0.0
▁of,0.011,-0.0
▁France,0.037,-0.0
?,0.025,-0.0
</s>,0.028,-0.0
contrast_prob_diff,0.268,-0.0
probability,0.61,0.998


Attributing with input_x_gradient...: 100%|██████████| 3/3 [00:00<00:00, 12.29it/s]

Model: base





Source token,▁yes → ▁no,</s>
▁I,0.062,0.0
s,0.05,0.0
▁the,0.05,0.0
▁London,0.126,0.0
▁the,0.058,0.0
▁capital,0.084,0.0
▁of,0.031,0.0
▁France,0.09,0.0
?,0.082,0.0
</s>,0.064,0.0
contrast_prob_diff,0.697,0.01
probability,0.838,0.997


Attributing with input_x_gradient...: 100%|██████████| 3/3 [00:00<00:00,  6.11it/s]

Model: large





Source token,▁yes → ▁no,</s>
▁I,0.087,-0.0
s,0.047,-0.0
▁the,0.108,-0.0
▁London,0.149,-0.0
▁the,0.069,-0.0
▁capital,0.117,-0.0
▁of,0.031,-0.0
▁France,0.099,-0.0
?,0.056,-0.0
</s>,0.122,-0.0
contrast_prob_diff,0.885,-0.035
probability,0.905,0.95
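
The three outputs above can be summarized side by side. Assuming `contrast_prob_diff` is the probability of `no` minus the probability of `yes`, the probability each model gives to `yes` can be recovered, and summing the weighted attributions for the first generated token shows that they add up to the probability difference (up to rounding). The numbers are copied from the tables above; the dictionary layout and names are just for this sketch.

```python
# Source tokens, shared by all three runs (note that "▁the" occurs twice)
tokens = ["▁I", "s", "▁the", "▁London", "▁the", "▁capital", "▁of", "▁France", "?", "</s>"]

# Values for the "▁yes → ▁no" column, copied from the outputs above
runs = {
    "small": {
        "p_no": 0.61, "prob_diff": 0.268,
        "scores": [0.028, 0.023, 0.013, 0.045, 0.017, 0.041, 0.011, 0.037, 0.025, 0.028],
    },
    "base": {
        "p_no": 0.838, "prob_diff": 0.697,
        "scores": [0.062, 0.05, 0.05, 0.126, 0.058, 0.084, 0.031, 0.09, 0.082, 0.064],
    },
    "large": {
        "p_no": 0.905, "prob_diff": 0.885,
        "scores": [0.087, 0.047, 0.108, 0.149, 0.069, 0.117, 0.031, 0.099, 0.056, 0.122],
    },
}

summary = {}
for size, run in runs.items():
    # Assumption: contrast_prob_diff = p(no) - p(yes)
    p_yes = run["p_no"] - run["prob_diff"]
    attribution_sum = sum(run["scores"])
    # Index of the highest-scoring source token
    top_index = max(range(len(tokens)), key=lambda i: run["scores"][i])
    summary[size] = {
        "p_yes": p_yes,
        "attribution_sum": attribution_sum,
        "top_token": tokens[top_index],
    }
    print(f"{size}: p(no)={run['p_no']:.3f}, p(yes) is roughly {p_yes:.3f}, "
          f"attributions sum to {attribution_sum:.3f}, top token {tokens[top_index]}")
```

Read this way, the larger models do not only raise the probability of `no`; they also push the probability of `yes` closer to zero, which is why the probability difference grows faster than the probability itself.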
