[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/gsarti/ik-nlp-tutorials/blob/main/notebooks/W5E_Inseq_Analysis.ipynb)

In [8]:
# Run in Colab to install local packages
!pip install inseq





# Exercise 1: Analyzing language generation models with Inseq 🐛

*Adapted in part from the [Inseq documentation](https://inseq.readthedocs.io/)*

Inseq is a toolkit based on the 🤗 Transformers and [Captum](https://captum.ai/docs/introduction) libraries for interpreting language generation models using feature attribution methods. Inseq allows you to analyze the behavior of a language generation model by computing the importance of each input token for each token in the generated output. The importance can be obtained using approaches based on attention, gradients and more, which we will see in more detail in the final lecture.
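
To make gradient-based importance concrete, here is a minimal sketch of *input × gradient* (the method used in the cells below) on a toy linear scorer rather than a real translation model. The embeddings and weights are made up, and the per-token L2 aggregation and sum-to-one normalization are assumptions chosen to mimic the kind of normalized scores shown in the tables below; Inseq works on the real model's gradients and may aggregate differently.

```python
import math


def input_x_gradient(embeddings, weights):
    """Per-token input x gradient for the toy score f(X) = sum_t sum_k w[k] * x_t[k]."""
    scores = []
    for x in embeddings:
        # For this linear score the gradient w.r.t. every token embedding is `weights`,
        # so input x gradient is the elementwise product x * w
        contrib = [xk * wk for xk, wk in zip(x, weights)]
        # Collapse the embedding dimension into one score per token with an L2 norm
        scores.append(math.sqrt(sum(c * c for c in contrib)))
    total = sum(scores)
    # Normalize so that the scores for this output step sum to 1
    return [s / total for s in scores]


tokens = ["▁Don", "'", "t", "▁get"]
# Made-up 2-dimensional embeddings and weights, for illustration only
embeddings = [[0.2, 0.1], [0.05, 0.0], [0.1, -0.1], [0.9, -0.4]]
weights = [1.0, 2.0]

scores = input_x_gradient(embeddings, weights)
for token, score in zip(tokens, scores):
    print(f"{token:>5}  {score:.3f}")
```

Tokens whose embedding components line up with large gradient components receive high scores; here the made-up `▁get` embedding dominates.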

Inseq is a relatively new library, and it is still under active development (contributions welcome! 🙂). You can refer to the [Inseq paper](https://arxiv.org/abs/2302.13942) for an overview of the tool and some examples, or to [this paper](https://www.semanticscholar.org/paper/Are-Character-level-Translations-Worth-the-Wait-An-Edman-Toral/ed7b51e4a5c4835218f6697b280afb2849211939) for recent work from our GroNLP group that uses Inseq to analyze character-level translation models.

The following sections present two simple use cases of Inseq.

## Attributing (Un)constrained Machine Translation

In this section we will use Inseq to compute the importance of each input token for each token in the generated output. We will use the [Helsinki-NLP/opus-mt-en-nl](https://huggingface.co/Helsinki-NLP/opus-mt-en-nl) model, which is a pretrained machine translation model from English to Dutch.

In [9]:
import inseq
import warnings

warnings.filterwarnings("ignore")

model = inseq.load_model("Helsinki-NLP/opus-mt-en-nl", "input_x_gradient")
out = model.attribute(
    "Don't get hot-headed mate, it's easy peasy to study models with Inseq!",
    attribute_target=True,
    step_scores=["probability"]
)
out.show()



| Source token | ▁Maak | ▁je | ▁niet | ▁zo | ▁druk | , | ▁het | ▁is | ▁makkelijk | ▁om | ▁modellen | ▁te | ▁bestuderen | ▁met | ▁In | se | q | ! | `</s>` |
|:---|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|
| ▁Don | 0.045 | 0.036 | 0.02 | 0.028 | 0.022 | 0.025 | 0.02 | 0.014 | 0.02 | 0.01 | 0.011 | 0.012 | 0.016 | 0.012 | 0.007 | 0.006 | 0.005 | 0.027 | 0.02 |
| ' | 0.019 | 0.017 | 0.013 | 0.017 | 0.011 | 0.016 | 0.012 | 0.007 | 0.016 | 0.007 | 0.007 | 0.008 | 0.011 | 0.006 | 0.005 | 0.004 | 0.003 | 0.025 | 0.017 |
| t | 0.031 | 0.025 | 0.028 | 0.021 | 0.017 | 0.02 | 0.018 | 0.01 | 0.009 | 0.005 | 0.006 | 0.007 | 0.008 | 0.006 | 0.008 | 0.004 | 0.002 | 0.014 | 0.012 |
| ▁get | 0.118 | 0.068 | 0.056 | 0.05 | 0.049 | 0.027 | 0.03 | 0.02 | 0.014 | 0.01 | 0.013 | 0.012 | 0.015 | 0.012 | 0.01 | 0.005 | 0.004 | 0.018 | 0.021 |
| ▁hot | 0.066 | 0.073 | 0.071 | 0.088 | 0.084 | 0.042 | 0.026 | 0.022 | 0.016 | 0.01 | 0.014 | 0.011 | 0.012 | 0.011 | 0.009 | 0.008 | 0.007 | 0.022 | 0.019 |
| - | 0.021 | 0.02 | 0.024 | 0.027 | 0.025 | 0.018 | 0.01 | 0.007 | 0.008 | 0.005 | 0.005 | 0.005 | 0.007 | 0.005 | 0.005 | 0.005 | 0.006 | 0.013 | 0.009 |
| headed | 0.143 | 0.146 | 0.179 | 0.136 | 0.178 | 0.133 | 0.068 | 0.051 | 0.029 | 0.023 | 0.034 | 0.027 | 0.047 | 0.032 | 0.021 | 0.013 | 0.009 | 0.041 | 0.042 |
| ▁mate | 0.085 | 0.072 | 0.081 | 0.061 | 0.07 | 0.095 | 0.052 | 0.02 | 0.02 | 0.013 | 0.014 | 0.013 | 0.02 | 0.015 | 0.01 | 0.005 | 0.005 | 0.021 | 0.022 |
| , | 0.02 | 0.02 | 0.017 | 0.016 | 0.015 | 0.03 | 0.019 | 0.013 | 0.014 | 0.006 | 0.005 | 0.007 | 0.008 | 0.006 | 0.004 | 0.004 | 0.003 | 0.021 | 0.018 |
| ▁it | 0.017 | 0.012 | 0.012 | 0.01 | 0.009 | 0.021 | 0.026 | 0.022 | 0.014 | 0.012 | 0.008 | 0.013 | 0.014 | 0.009 | 0.005 | 0.004 | 0.003 | 0.017 | 0.014 |
| ' | 0.013 | 0.015 | 0.01 | 0.014 | 0.009 | 0.021 | 0.018 | 0.025 | 0.022 | 0.014 | 0.011 | 0.011 | 0.013 | 0.009 | 0.007 | 0.004 | 0.003 | 0.035 | 0.03 |
| s | 0.014 | 0.011 | 0.008 | 0.01 | 0.008 | 0.014 | 0.019 | 0.036 | 0.022 | 0.015 | 0.009 | 0.008 | 0.009 | 0.007 | 0.008 | 0.007 | 0.01 | 0.014 | 0.012 |
| ▁easy | 0.045 | 0.048 | 0.023 | 0.03 | 0.02 | 0.026 | 0.056 | 0.082 | 0.082 | 0.056 | 0.025 | 0.031 | 0.027 | 0.023 | 0.012 | 0.006 | 0.004 | 0.044 | 0.03 |
| ▁peas | 0.107 | 0.06 | 0.043 | 0.051 | 0.054 | 0.06 | 0.19 | 0.201 | 0.314 | 0.259 | 0.149 | 0.069 | 0.101 | 0.072 | 0.072 | 0.025 | 0.021 | 0.08 | 0.066 |
| y | 0.021 | 0.02 | 0.011 | 0.01 | 0.01 | 0.01 | 0.034 | 0.043 | 0.039 | 0.039 | 0.026 | 0.014 | 0.018 | 0.013 | 0.014 | 0.005 | 0.004 | 0.009 | 0.008 |
| ▁to | 0.028 | 0.019 | 0.015 | 0.015 | 0.013 | 0.015 | 0.036 | 0.034 | 0.032 | 0.033 | 0.03 | 0.031 | 0.044 | 0.027 | 0.015 | 0.014 | 0.016 | 0.017 | 0.014 |
| ▁study | 0.049 | 0.021 | 0.02 | 0.021 | 0.018 | 0.021 | 0.039 | 0.052 | 0.037 | 0.072 | 0.104 | 0.108 | 0.206 | 0.113 | 0.028 | 0.012 | 0.008 | 0.037 | 0.027 |
| ▁models | 0.053 | 0.034 | 0.027 | 0.035 | 0.027 | 0.039 | 0.037 | 0.037 | 0.062 | 0.1 | 0.235 | 0.118 | 0.157 | 0.099 | 0.046 | 0.011 | 0.009 | 0.066 | 0.046 |
| ▁with | 0.013 | 0.011 | 0.007 | 0.01 | 0.008 | 0.011 | 0.011 | 0.013 | 0.014 | 0.018 | 0.031 | 0.06 | 0.038 | 0.078 | 0.026 | 0.013 | 0.012 | 0.019 | 0.015 |
| ▁In | 0.009 | 0.007 | 0.007 | 0.009 | 0.006 | 0.011 | 0.009 | 0.008 | 0.009 | 0.012 | 0.033 | 0.041 | 0.018 | 0.05 | 0.156 | 0.047 | 0.025 | 0.015 | 0.011 |
| s | 0.006 | 0.007 | 0.006 | 0.008 | 0.006 | 0.007 | 0.006 | 0.007 | 0.009 | 0.008 | 0.011 | 0.017 | 0.011 | 0.017 | 0.049 | 0.111 | 0.045 | 0.013 | 0.011 |
| eq | 0.033 | 0.043 | 0.034 | 0.052 | 0.037 | 0.059 | 0.045 | 0.042 | 0.065 | 0.045 | 0.079 | 0.137 | 0.077 | 0.181 | 0.336 | 0.55 | 0.6 | 0.121 | 0.119 |
| ! | 0.025 | 0.017 | 0.016 | 0.023 | 0.013 | 0.026 | 0.017 | 0.02 | 0.026 | 0.01 | 0.019 | 0.02 | 0.02 | 0.021 | 0.017 | 0.012 | 0.018 | 0.053 | 0.074 |
| `</s>` | 0.021 | 0.015 | 0.017 | 0.018 | 0.016 | 0.017 | 0.014 | 0.016 | 0.022 | 0.015 | 0.018 | 0.021 | 0.02 | 0.025 | 0.042 | 0.055 | 0.061 | 0.03 | 0.049 |
| probability | 0.045 | 0.494 | 0.396 | 0.337 | 0.474 | 0.555 | 0.349 | 0.853 | 0.603 | 0.656 | 0.738 | 0.656 | 0.422 | 0.497 | 0.917 | 0.829 | 0.959 | 0.461 | 0.897 |

| Target prefix | ▁Maak | ▁je | ▁niet | ▁zo | ▁druk | , | ▁het | ▁is | ▁makkelijk | ▁om | ▁modellen | ▁te | ▁bestuderen | ▁met | ▁In | se | q | ! | `</s>` |
|:---|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|
| ▁Maak | | 0.184 | 0.166 | 0.116 | 0.135 | 0.049 | 0.044 | 0.038 | 0.012 | 0.015 | 0.011 | 0.011 | 0.004 | 0.009 | 0.006 | 0.002 | 0.001 | 0.02 | 0.02 |
| ▁je | | | 0.091 | 0.063 | 0.046 | 0.037 | 0.03 | 0.031 | 0.008 | 0.01 | 0.01 | 0.007 | 0.003 | 0.006 | 0.003 | 0.001 | 0.001 | 0.014 | 0.015 |
| ▁niet | | | | 0.058 | 0.041 | 0.024 | 0.023 | 0.018 | 0.006 | 0.007 | 0.006 | 0.006 | 0.002 | 0.005 | 0.002 | 0.001 | 0.001 | 0.009 | 0.009 |
| ▁zo | | | | | 0.051 | 0.036 | 0.025 | 0.021 | 0.008 | 0.007 | 0.007 | 0.007 | 0.002 | 0.006 | 0.002 | 0.001 | 0.001 | 0.012 | 0.011 |
| ▁druk | | | | | | 0.091 | 0.03 | 0.028 | 0.012 | 0.019 | 0.01 | 0.012 | 0.004 | 0.007 | 0.004 | 0.002 | 0.001 | 0.015 | 0.015 |
| , | | | | | | | 0.035 | 0.029 | 0.008 | 0.016 | 0.007 | 0.007 | 0.003 | 0.007 | 0.003 | 0.001 | 0.001 | 0.014 | 0.014 |
| ▁het | | | | | | | | 0.033 | 0.015 | 0.031 | 0.007 | 0.01 | 0.003 | 0.005 | 0.003 | 0.001 | 0.001 | 0.01 | 0.008 |
| ▁is | | | | | | | | | 0.017 | 0.026 | 0.007 | 0.008 | 0.003 | 0.004 | 0.003 | 0.001 | 0.001 | 0.008 | 0.008 |
| ▁makkelijk | | | | | | | | | | 0.073 | 0.026 | 0.021 | 0.011 | 0.01 | 0.007 | 0.002 | 0.002 | 0.016 | 0.015 |
| ▁om | | | | | | | | | | | 0.014 | 0.022 | 0.009 | 0.008 | 0.005 | 0.002 | 0.002 | 0.011 | 0.01 |
| ▁modellen | | | | | | | | | | | | 0.089 | 0.03 | 0.033 | 0.019 | 0.006 | 0.003 | 0.019 | 0.02 |
| ▁te | | | | | | | | | | | | | 0.009 | 0.009 | 0.004 | 0.002 | 0.001 | 0.009 | 0.007 |
| ▁bestuderen | | | | | | | | | | | | | | 0.044 | 0.012 | 0.005 | 0.002 | 0.029 | 0.016 |
| ▁met | | | | | | | | | | | | | | | 0.013 | 0.01 | 0.007 | 0.007 | 0.008 |
| ▁In | | | | | | | | | | | | | | | | 0.034 | 0.026 | 0.005 | 0.01 |
| se | | | | | | | | | | | | | | | | | 0.066 | 0.009 | 0.013 |
| q | | | | | | | | | | | | | | | | | | 0.02 | 0.025 |
| ! | | | | | | | | | | | | | | | | | | | 0.07 |


<details>
    <summary> Observations </summary>

- The model translates idiomatic expressions in a non-literal way. The attribution scores show that the model assigns strong importance to pieces of the idiom when translating it (e.g. `headed`, `_peas`), while also relying on the target prefix when producing an idiomatic translation (e.g. `_Maak` strongly influences the following `_je`, `_niet`, `_zo` and `_druk`).
- The model generally becomes more confident in its translation as it generates more tokens, although not monotonically. The first generated token, `_Maak`, is very unlikely (probability 0.045).
</details>
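
The trend in the second observation can be checked roughly against the `probability` step scores printed above. The sketch below copies those 19 values and compares the average probability of the first and second half of the output. Splitting into halves is an arbitrary choice, and individual steps still go up and down (e.g. 0.853 followed by 0.603).

```python
from statistics import mean

# `probability` step scores of the unconstrained translation, copied from the table above
probs = [0.045, 0.494, 0.396, 0.337, 0.474, 0.555, 0.349, 0.853, 0.603, 0.656,
         0.738, 0.656, 0.422, 0.497, 0.917, 0.829, 0.959, 0.461, 0.897]

half = len(probs) // 2  # 19 steps: first 9 vs last 10
first_half, second_half = probs[:half], probs[half:]

print(f"mean p, first {len(first_half)} steps: {mean(first_half):.3f}")
print(f"mean p, last {len(second_half)} steps: {mean(second_half):.3f}")
print("least likely step (0-based):", probs.index(min(probs)))
```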

Let's now try to **constrain** the generation to a more literal translation of the input. Attributing a prespecified output can intuitively be thought of as asking the model to justify a possible prediction. This should be done with care: if the output is very unlikely, the results can be very noisy.
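
Constraining the output amounts to teacher forcing: at every step the model is conditioned on the prespecified target prefix instead of on its own previous choices, and we read off the probability it assigns to the next forced token. The sketch below illustrates this with a hypothetical lookup-table "model"; `TOY_NEXT`, `greedy_decode` and `forced_step_probs` are made-up names, not Inseq functions, and a real model would compute each distribution with a forward pass.

```python
import math

# Hypothetical toy model: next-token distribution for each target prefix
TOY_NEXT = {
    (): {"Maak": 0.6, "Niet": 0.4},
    ("Maak",): {"je": 0.9, "</s>": 0.1},
    ("Maak", "je"): {"</s>": 1.0},
    ("Niet",): {"boos": 0.8, "heet": 0.2},
    ("Niet", "boos"): {"</s>": 1.0},
    ("Niet", "heet"): {"</s>": 1.0},
}


def greedy_decode(next_probs, max_len=10):
    """Unconstrained generation: always pick the most likely next token."""
    prefix = ()
    while len(prefix) < max_len:
        dist = next_probs[prefix]
        token = max(dist, key=dist.get)
        prefix += (token,)
        if token == "</s>":
            break
    return list(prefix)


def forced_step_probs(next_probs, target):
    """Constrained (teacher-forced) scoring: p(target[i] | target[:i]) at every step."""
    # Each step conditions on the given target prefix, even if the model would not
    # have produced it itself; tokens missing from the toy table get probability 0
    return [next_probs[tuple(target[:i])].get(tok, 0.0) for i, tok in enumerate(target)]


free = greedy_decode(TOY_NEXT)
forced = ["Niet", "heet", "</s>"]
for name, seq in [("greedy", free), ("forced", forced)]:
    step_probs = forced_step_probs(TOY_NEXT, seq)
    log_p = sum(math.log(p) for p in step_probs)
    print(name, seq, step_probs, f"log p = {log_p:.3f}")
```

The forced step with low probability (`heet`) is exactly where attributions of a prespecified output should be read with extra care.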

In [10]:
import inseq

model = inseq.load_model("Helsinki-NLP/opus-mt-en-nl", "input_x_gradient")
out = model.attribute(
    "Don't get hot-headed mate, it's easy peasy to study models with Inseq!",
    # Alternative, even more literal target (disabled):
    #"Niet heethoofdig worden maatje, het is gemakkelijk peasy om modellen te bestuderen met Inseq!",
    # Forced target: attributions are computed for this translation instead of the model's own output
    "Niet heethoofdig worden man, het is een makkie om modellen te bestuderen met Inseq!",
    attribute_target=True,
    step_scores=["probability"]
)
out.show()



| Source token | ▁Niet | ▁heet | hoofd | ig | ▁worden | ▁man | , | ▁het | ▁is | ▁een | ▁ | mak | kie | ▁om | ▁modellen | ▁te | ▁bestuderen | ▁met | ▁In | se | q | ! | `</s>` |
|:---|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|
| ▁Don | 0.08 | 0.041 | 0.013 | 0.018 | 0.036 | 0.024 | 0.028 | 0.017 | 0.011 | 0.013 | 0.009 | 0.011 | 0.006 | 0.009 | 0.01 | 0.013 | 0.016 | 0.011 | 0.008 | 0.006 | 0.005 | 0.027 | 0.013 |
| ' | 0.031 | 0.025 | 0.012 | 0.015 | 0.024 | 0.016 | 0.024 | 0.015 | 0.006 | 0.008 | 0.005 | 0.006 | 0.004 | 0.007 | 0.007 | 0.008 | 0.011 | 0.006 | 0.004 | 0.004 | 0.003 | 0.026 | 0.01 |
| t | 0.045 | 0.021 | 0.013 | 0.013 | 0.024 | 0.02 | 0.018 | 0.011 | 0.007 | 0.007 | 0.006 | 0.005 | 0.003 | 0.005 | 0.006 | 0.007 | 0.008 | 0.005 | 0.007 | 0.004 | 0.002 | 0.014 | 0.009 |
| ▁get | 0.095 | 0.041 | 0.018 | 0.034 | 0.071 | 0.042 | 0.025 | 0.019 | 0.013 | 0.011 | 0.009 | 0.011 | 0.006 | 0.008 | 0.011 | 0.01 | 0.015 | 0.01 | 0.01 | 0.005 | 0.004 | 0.02 | 0.021 |
| ▁hot | 0.082 | 0.091 | 0.09 | 0.055 | 0.064 | 0.042 | 0.026 | 0.021 | 0.016 | 0.014 | 0.013 | 0.015 | 0.01 | 0.012 | 0.011 | 0.01 | 0.012 | 0.01 | 0.009 | 0.007 | 0.007 | 0.023 | 0.017 |
| - | 0.023 | 0.023 | 0.032 | 0.019 | 0.018 | 0.021 | 0.013 | 0.01 | 0.006 | 0.007 | 0.005 | 0.006 | 0.003 | 0.004 | 0.005 | 0.005 | 0.007 | 0.005 | 0.005 | 0.005 | 0.006 | 0.012 | 0.007 |
| headed | 0.126 | 0.186 | 0.263 | 0.143 | 0.092 | 0.145 | 0.071 | 0.06 | 0.039 | 0.032 | 0.033 | 0.04 | 0.02 | 0.026 | 0.032 | 0.027 | 0.046 | 0.03 | 0.022 | 0.013 | 0.009 | 0.041 | 0.039 |
| ▁mate | 0.08 | 0.05 | 0.056 | 0.112 | 0.071 | 0.197 | 0.066 | 0.088 | 0.042 | 0.04 | 0.024 | 0.022 | 0.016 | 0.021 | 0.036 | 0.02 | 0.02 | 0.025 | 0.021 | 0.006 | 0.005 | 0.028 | 0.04 |
| , | 0.025 | 0.021 | 0.015 | 0.022 | 0.018 | 0.026 | 0.021 | 0.014 | 0.012 | 0.009 | 0.006 | 0.008 | 0.005 | 0.007 | 0.006 | 0.007 | 0.008 | 0.006 | 0.004 | 0.004 | 0.003 | 0.021 | 0.011 |
| ▁it | 0.02 | 0.016 | 0.012 | 0.012 | 0.011 | 0.016 | 0.019 | 0.017 | 0.024 | 0.013 | 0.011 | 0.011 | 0.005 | 0.01 | 0.008 | 0.014 | 0.014 | 0.008 | 0.005 | 0.004 | 0.003 | 0.017 | 0.009 |
| ' | 0.016 | 0.026 | 0.012 | 0.012 | 0.016 | 0.016 | 0.027 | 0.018 | 0.029 | 0.017 | 0.012 | 0.011 | 0.006 | 0.012 | 0.009 | 0.012 | 0.012 | 0.009 | 0.006 | 0.004 | 0.003 | 0.035 | 0.019 |
| s | 0.014 | 0.014 | 0.009 | 0.008 | 0.011 | 0.014 | 0.014 | 0.016 | 0.043 | 0.02 | 0.017 | 0.015 | 0.007 | 0.012 | 0.007 | 0.009 | 0.009 | 0.007 | 0.007 | 0.007 | 0.01 | 0.014 | 0.008 |
| ▁easy | 0.036 | 0.037 | 0.016 | 0.02 | 0.038 | 0.021 | 0.037 | 0.045 | 0.097 | 0.123 | 0.118 | 0.132 | 0.056 | 0.047 | 0.027 | 0.035 | 0.026 | 0.022 | 0.012 | 0.006 | 0.004 | 0.043 | 0.021 |
| ▁peas | 0.098 | 0.079 | 0.051 | 0.041 | 0.069 | 0.061 | 0.086 | 0.121 | 0.249 | 0.265 | 0.253 | 0.214 | 0.107 | 0.167 | 0.111 | 0.072 | 0.097 | 0.063 | 0.066 | 0.024 | 0.02 | 0.073 | 0.053 |
| y | 0.016 | 0.009 | 0.007 | 0.009 | 0.017 | 0.01 | 0.012 | 0.022 | 0.049 | 0.045 | 0.051 | 0.037 | 0.018 | 0.027 | 0.017 | 0.013 | 0.017 | 0.012 | 0.013 | 0.005 | 0.004 | 0.008 | 0.008 |
| ▁to | 0.02 | 0.017 | 0.012 | 0.013 | 0.019 | 0.016 | 0.015 | 0.025 | 0.037 | 0.037 | 0.038 | 0.034 | 0.014 | 0.023 | 0.028 | 0.031 | 0.042 | 0.026 | 0.014 | 0.014 | 0.017 | 0.016 | 0.011 |
| ▁study | 0.032 | 0.03 | 0.016 | 0.016 | 0.029 | 0.02 | 0.03 | 0.028 | 0.044 | 0.041 | 0.031 | 0.046 | 0.021 | 0.055 | 0.092 | 0.097 | 0.199 | 0.109 | 0.025 | 0.011 | 0.008 | 0.035 | 0.023 |
| ▁models | 0.033 | 0.05 | 0.03 | 0.024 | 0.038 | 0.026 | 0.051 | 0.036 | 0.029 | 0.038 | 0.038 | 0.044 | 0.042 | 0.068 | 0.208 | 0.104 | 0.152 | 0.096 | 0.045 | 0.011 | 0.009 | 0.063 | 0.029 |
| ▁with | 0.011 | 0.015 | 0.008 | 0.007 | 0.011 | 0.008 | 0.014 | 0.011 | 0.01 | 0.011 | 0.009 | 0.011 | 0.007 | 0.013 | 0.029 | 0.049 | 0.036 | 0.074 | 0.024 | 0.013 | 0.012 | 0.018 | 0.011 |
| ▁In | 0.009 | 0.011 | 0.007 | 0.006 | 0.009 | 0.009 | 0.013 | 0.008 | 0.008 | 0.007 | 0.007 | 0.007 | 0.004 | 0.008 | 0.033 | 0.032 | 0.017 | 0.048 | 0.137 | 0.047 | 0.025 | 0.015 | 0.01 |
| s | 0.007 | 0.012 | 0.007 | 0.006 | 0.008 | 0.007 | 0.009 | 0.006 | 0.006 | 0.007 | 0.006 | 0.005 | 0.003 | 0.005 | 0.01 | 0.014 | 0.011 | 0.016 | 0.044 | 0.111 | 0.045 | 0.012 | 0.009 |
| eq | 0.049 | 0.082 | 0.047 | 0.037 | 0.061 | 0.052 | 0.086 | 0.044 | 0.041 | 0.037 | 0.034 | 0.037 | 0.021 | 0.039 | 0.078 | 0.109 | 0.075 | 0.171 | 0.301 | 0.542 | 0.593 | 0.111 | 0.093 |
| ! | 0.028 | 0.034 | 0.02 | 0.018 | 0.027 | 0.021 | 0.036 | 0.021 | 0.014 | 0.014 | 0.011 | 0.01 | 0.007 | 0.012 | 0.018 | 0.019 | 0.019 | 0.02 | 0.016 | 0.012 | 0.018 | 0.05 | 0.066 |
| `</s>` | 0.023 | 0.026 | 0.019 | 0.016 | 0.021 | 0.024 | 0.025 | 0.016 | 0.015 | 0.015 | 0.014 | 0.012 | 0.007 | 0.011 | 0.015 | 0.018 | 0.02 | 0.023 | 0.038 | 0.054 | 0.061 | 0.028 | 0.043 |
| probability | 0.133 | 0.013 | 0.684 | 0.71 | 0.687 | 0.004 | 0.644 | 0.585 | 0.881 | 0.011 | 0.046 | 0.963 | 0.948 | 0.797 | 0.724 | 0.625 | 0.392 | 0.48 | 0.921 | 0.827 | 0.958 | 0.618 | 0.899 |

| Target prefix | ▁Niet | ▁heet | hoofd | ig | ▁worden | ▁man | , | ▁het | ▁is | ▁een | ▁ | mak | kie | ▁om | ▁modellen | ▁te | ▁bestuderen | ▁met | ▁In | se | q | ! | `</s>` |
|:---|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|
| ▁Niet | | 0.045 | 0.046 | 0.038 | 0.046 | 0.025 | 0.029 | 0.033 | 0.017 | 0.01 | 0.012 | 0.01 | 0.006 | 0.008 | 0.006 | 0.005 | 0.002 | 0.004 | 0.004 | 0.001 | 0.001 | 0.014 | 0.015 |
| ▁heet | | | 0.167 | 0.09 | 0.031 | 0.021 | 0.02 | 0.021 | 0.012 | 0.011 | 0.016 | 0.012 | 0.01 | 0.009 | 0.006 | 0.007 | 0.003 | 0.004 | 0.006 | 0.002 | 0.001 | 0.01 | 0.012 |
| hoofd | | | | 0.195 | 0.065 | 0.034 | 0.041 | 0.047 | 0.017 | 0.022 | 0.02 | 0.013 | 0.014 | 0.011 | 0.009 | 0.013 | 0.005 | 0.006 | 0.009 | 0.002 | 0.002 | 0.017 | 0.015 |
| ig | | | | | 0.057 | 0.025 | 0.025 | 0.04 | 0.014 | 0.016 | 0.014 | 0.01 | 0.005 | 0.009 | 0.008 | 0.009 | 0.003 | 0.006 | 0.009 | 0.001 | 0.001 | 0.014 | 0.012 |
| ▁worden | | | | | | 0.04 | 0.034 | 0.045 | 0.015 | 0.016 | 0.011 | 0.013 | 0.006 | 0.01 | 0.011 | 0.01 | 0.003 | 0.008 | 0.012 | 0.001 | 0.001 | 0.018 | 0.013 |
| ▁man | | | | | | | 0.083 | 0.102 | 0.039 | 0.046 | 0.026 | 0.024 | 0.011 | 0.023 | 0.032 | 0.018 | 0.007 | 0.027 | 0.027 | 0.004 | 0.002 | 0.037 | 0.037 |
| , | | | | | | | | 0.022 | 0.015 | 0.01 | 0.013 | 0.013 | 0.007 | 0.01 | 0.004 | 0.005 | 0.002 | 0.003 | 0.002 | 0.001 | 0.001 | 0.009 | 0.009 |
| ▁het | | | | | | | | | 0.024 | 0.021 | 0.028 | 0.022 | 0.012 | 0.014 | 0.005 | 0.005 | 0.002 | 0.003 | 0.003 | 0.001 | 0.001 | 0.007 | 0.007 |
| ▁is | | | | | | | | | | 0.019 | 0.041 | 0.033 | 0.013 | 0.014 | 0.006 | 0.005 | 0.002 | 0.003 | 0.003 | 0.001 | 0.001 | 0.005 | 0.006 |
| ▁een | | | | | | | | | | | 0.061 | 0.061 | 0.031 | 0.02 | 0.008 | 0.007 | 0.002 | 0.004 | 0.004 | 0.001 | 0.001 | 0.008 | 0.009 |
| ▁ | | | | | | | | | | | | 0.038 | 0.07 | 0.025 | 0.008 | 0.009 | 0.004 | 0.004 | 0.003 | 0.001 | 0.001 | 0.004 | 0.007 |
| mak | | | | | | | | | | | | | 0.416 | 0.122 | 0.035 | 0.042 | 0.017 | 0.017 | 0.014 | 0.004 | 0.004 | 0.02 | 0.031 |
| kie | | | | | | | | | | | | | | 0.116 | 0.034 | 0.037 | 0.016 | 0.015 | 0.014 | 0.003 | 0.003 | 0.016 | 0.021 |
| ▁om | | | | | | | | | | | | | | | 0.014 | 0.019 | 0.009 | 0.007 | 0.005 | 0.002 | 0.002 | 0.007 | 0.008 |
| ▁modellen | | | | | | | | | | | | | | | | 0.075 | 0.027 | 0.026 | 0.016 | 0.005 | 0.003 | 0.012 | 0.016 |
| ▁te | | | | | | | | | | | | | | | | | 0.008 | 0.008 | 0.003 | 0.002 | 0.001 | 0.007 | 0.006 |
| ▁bestuderen | | | | | | | | | | | | | | | | | | 0.039 | 0.01 | 0.005 | 0.002 | 0.02 | 0.017 |
| ▁met | | | | | | | | | | | | | | | | | | | 0.013 | 0.01 | 0.006 | 0.004 | 0.009 |
| ▁In | | | | | | | | | | | | | | | | | | | | 0.033 | 0.025 | 0.004 | 0.011 |
| se | | | | | | | | | | | | | | | | | | | | | 0.066 | 0.005 | 0.017 |
| q | | | | | | | | | | | | | | | | | | | | | | 0.012 | 0.037 |
| ! | | | | | | | | | | | | | | | | | | | | | | | 0.102 |


### Your turn to comment

Comment on the resulting scores from the constrained, less idiomatic example, putting them in relation to the unconstrained, more idiomatic one. Consider the following aspects, but feel free to explore other examples and add your own observations:
1. How are importance scores distributed on idiomatic and non-idiomatic tokens?
2. What is the difference in probability between the two examples?
3. Do you notice some patterns regarding the low-probability tokens in the second example?
4. When is the target prefix playing a more important role in the generation, according to the attribution scores?

### Answer
Something that can be observed in both examples is that a small number of source tokens are important for generating a large part of the output. This is especially true for the source tokens 'headed' and '_peas', which are both part of idiomatic expressions. It seems that, when translating the two idioms, the model locates much of their meaning in these tokens, as shown by the important role they play in generating many of the output tokens. Some non-idiomatic tokens also have high importance, e.g. '_study' and '_models'. However, these are only really important when generating their own translations ('_bestuderen' and '_modellen', respectively). This differs from the role of the idiomatic tokens 'headed' and '_peas', which are important not just for generating their own translations but for generating the translation of the complete idiom they belong to. Regarding the target saliency map, especially in the second example, whose output contains idiomatic Dutch phrases such as 'een makkie', the largest importance scores are those of idiomatic tokens for generating specific other idiomatic tokens (e.g. 'mak' for 'kie', 0.416). In the first example, on the other hand, we don't really see such a distribution.
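
The contrast between idiom tokens and ordinary content words can be made a little more concrete with four source rows copied from the unconstrained table. The sketch below reports, for each of them, the generated token it contributes to most and the number of generated tokens for which its score exceeds 0.1; the 0.1 threshold is an arbitrary choice for illustration.

```python
# Generated tokens (columns) of the unconstrained translation
targets = ["▁Maak", "▁je", "▁niet", "▁zo", "▁druk", ",", "▁het", "▁is", "▁makkelijk",
           "▁om", "▁modellen", "▁te", "▁bestuderen", "▁met", "▁In", "se", "q", "!", "</s>"]

# Source attribution rows copied from the unconstrained table
rows = {
    "headed": [0.143, 0.146, 0.179, 0.136, 0.178, 0.133, 0.068, 0.051, 0.029, 0.023,
               0.034, 0.027, 0.047, 0.032, 0.021, 0.013, 0.009, 0.041, 0.042],
    "▁peas": [0.107, 0.06, 0.043, 0.051, 0.054, 0.06, 0.19, 0.201, 0.314, 0.259,
              0.149, 0.069, 0.101, 0.072, 0.072, 0.025, 0.021, 0.08, 0.066],
    "▁study": [0.049, 0.021, 0.02, 0.021, 0.018, 0.021, 0.039, 0.052, 0.037, 0.072,
               0.104, 0.108, 0.206, 0.113, 0.028, 0.012, 0.008, 0.037, 0.027],
    "▁models": [0.053, 0.034, 0.027, 0.035, 0.027, 0.039, 0.037, 0.037, 0.062, 0.1,
                0.235, 0.118, 0.157, 0.099, 0.046, 0.011, 0.009, 0.066, 0.046],
}

THRESHOLD = 0.1  # arbitrary cut-off for "clearly important"

summary = {}
for source, scores in rows.items():
    # Generated token that this source token contributes to most
    top = targets[max(range(len(scores)), key=scores.__getitem__)]
    # Number of generated tokens for which this source token is clearly important
    n_high = sum(score > THRESHOLD for score in scores)
    summary[source] = (top, n_high)
    print(f"{source:>8}: top target {top:<12} above {THRESHOLD}: {n_high}")
```

The content words peak on their own translations, while `headed` and `▁peas` stay above the threshold for more generated tokens, in line with the observation above; a single sentence is of course not enough to generalize.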

Some of the low-probability tokens in the second example seem to have something in common: they start idiomatic phrases. For example, the token '_heet' (probability 0.013) starts the phrase 'heethoofdig worden', and '_een' (0.011) starts 'een makkie'. It makes sense that the model does not assign a high probability to the first part of such a phrase. Once that first token is given, however, the rest of the phrase becomes much more likely. This is also reflected in the probabilities of the following tokens, e.g. 'hoofd' (0.684), 'ig' (0.71) and '_worden' (0.687), or 'mak' (0.963) and 'kie' (0.948). Because the first example output lacks such 'unexpected' idiomatic phrases, we don't see this pattern there, and in general its probabilities are not very low. The exception is the first output token '_Maak' (0.045). In the second example, the probability of the first token '_Niet' (0.133) is also relatively low. It makes sense that the first token gets a low probability, as there is relatively little information to base the decision on. Note, however, that the first-token probability is about three times larger in the second example than in the first. Possibly this is because '_Niet' is semantically more similar to '_Don' than '_Maak' is.
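
Several numbers in this answer can be recomputed from the `probability` step scores of the two tables. The sketch below copies them, compares the average per-token log-probability of the two translations, computes the first-token ratio and lists the constrained tokens below an (arbitrary) probability of 0.05.

```python
import math

# `probability` step scores copied from the two tables above
unconstrained = [0.045, 0.494, 0.396, 0.337, 0.474, 0.555, 0.349, 0.853, 0.603, 0.656,
                 0.738, 0.656, 0.422, 0.497, 0.917, 0.829, 0.959, 0.461, 0.897]
constrained_tokens = ["▁Niet", "▁heet", "hoofd", "ig", "▁worden", "▁man", ",", "▁het",
                      "▁is", "▁een", "▁", "mak", "kie", "▁om", "▁modellen", "▁te",
                      "▁bestuderen", "▁met", "▁In", "se", "q", "!", "</s>"]
constrained = [0.133, 0.013, 0.684, 0.71, 0.687, 0.004, 0.644, 0.585, 0.881, 0.011,
               0.046, 0.963, 0.948, 0.797, 0.724, 0.625, 0.392, 0.48, 0.921, 0.827,
               0.958, 0.618, 0.899]


def mean_logprob(probs):
    """Average per-token log-probability (higher means more likely on average)."""
    return sum(math.log(p) for p in probs) / len(probs)


ratio = constrained[0] / unconstrained[0]
low = [tok for tok, p in zip(constrained_tokens, constrained) if p < 0.05]

print(f"mean log-prob, unconstrained: {mean_logprob(unconstrained):.3f}")
print(f"mean log-prob, constrained:   {mean_logprob(constrained):.3f}")
print(f"first-token ratio (constrained / unconstrained): {ratio:.2f}")
print("constrained tokens with p < 0.05:", low)
```

The list also contains `▁man` (0.004), which does not start an idiomatic phrase, so the pattern describes some, not all, of the low-probability tokens.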

## Contrastive Attribution for Motivating Preferences

In the previous section we used importance scores produced by attributing the next token’s probability, which can be seen as answering the question “Which elements of the input are the most relevant to produce the next generation step?”.

However, in many cases we might be more interested in understanding why our model generated its output **rather than another one that we consider to be more likely**. The paper [“Interpreting Language Models with Contrastive Explanations”](https://arxiv.org/abs/2202.10419) by Yin and Neubig (2022) proposes a contrastive attribution method that can be used to answer this question. The method is integrated in Inseq and can be used as follows:

In [11]:
import inseq

model = inseq.load_model("google/flan-t5-base", "input_x_gradient")

# Pre-compute ids and attention mask for the contrastive target
contrast = model.encode("no", as_targets=True)

out = model.attribute(
    "Does 3 + 3 equal 6?",
    # Fix the original target
    "yes",
    attributed_fn="contrast_prob_diff",
    # Also show the probability delta between the two options
    step_scores=["contrast_prob_diff", "probability"],
    contrast_ids=contrast.input_ids,
    contrast_attention_mask=contrast.attention_mask,
)

# Normally attributions are normalized to sum up to 1
# Here we want to see how they contribute to the probability difference instead
out.weight_attributions("contrast_prob_diff")
out.show()

We can see that the model is relying more heavily on formulation keywords (`Does`, `equal`) and less on the numbers to determine the answer. The gap between the positive and negative answer is also quite small (5%), suggesting that the model is not very confident in its answer. Changing the input to `Does 3 + 3 equal 7?` confirms that the actual expression is not playing a relevant role in the generation.
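
As its name suggests, the attributed function here is the difference between the probability of the original target and that of the contrastive one, p(yes) − p(no), rather than p(yes) alone. As a hedged illustration (not Inseq's implementation), the sketch below computes input × gradient of that difference by hand for a made-up model that sum-pools token embeddings and applies a softmax over three vocabulary entries; all names and numbers are hypothetical.

```python
import math


def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict(embeddings, W):
    """Toy model: sum-pool the token embeddings h, then softmax(W h)."""
    dim = len(embeddings[0])
    h = [sum(x[k] for x in embeddings) for k in range(dim)]
    return softmax([sum(w[k] * h[k] for k in range(dim)) for w in W])


def toy_contrast_prob_diff(embeddings, W, target, contrast):
    p = predict(embeddings, W)
    return p[target] - p[contrast]


def toy_contrastive_input_x_gradient(embeddings, W, target, contrast):
    """Input x gradient of p[target] - p[contrast], one signed score per input token."""
    p = predict(embeddings, W)
    dim = len(embeddings[0])
    # Softmax Jacobian: dp_i/dz_j = p_i * (1[i == j] - p_j)
    dz = [
        p[target] * ((1.0 if j == target else 0.0) - p[j])
        - p[contrast] * ((1.0 if j == contrast else 0.0) - p[j])
        for j in range(len(W))
    ]
    # Chain rule through z = W h; h is a plain sum, so dh/dx_t is the identity for every t
    grad = [sum(dz[j] * W[j][k] for j in range(len(W))) for k in range(dim)]
    # Input x gradient, summed over the embedding dimensions
    scores = [sum(x[k] * grad[k] for k in range(dim)) for x in embeddings]
    return scores, grad


YES, NO = 0, 1  # hypothetical vocabulary ids of the two compared answers
embeddings = [[0.5, -0.2], [0.1, 0.4], [-0.3, 0.8]]  # made-up token embeddings
W = [[1.0, 0.8], [-0.5, 1.0], [0.2, -0.3]]  # made-up output projection

print("p(yes) - p(no):", round(toy_contrast_prob_diff(embeddings, W, YES, NO), 3))
scores, grad = toy_contrastive_input_x_gradient(embeddings, W, YES, NO)
print("per-token scores:", [round(s, 3) for s in scores])
```

Because the attributed quantity is a difference, scores in this sketch can be negative: a token can push the model towards the contrastive answer rather than the original one.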

> ⚠️ **Important**: Since contrastive attribution compares the probabilities of a pair of `(original, contrastive)` tokens at each step, the compared sequences must have the same length for it to work. For example, if "yes" were tokenized as `_y`, `es`, we could not have compared it with the single token `_no` using Inseq.
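
A minimal illustration of why the lengths must match, assuming the contrastive score is computed step by step by pairing the i-th original token with the i-th contrastive token. The helper `step_contrast_diffs` and the probabilities are hypothetical and only mimic the idea.

```python
def step_contrast_diffs(target_probs, contrast_probs):
    """Per-step p(original token) - p(contrastive token), compared position by position."""
    if len(target_probs) != len(contrast_probs):
        # Without a one-to-one pairing of steps there is no token to compare against
        raise ValueError(
            f"original has {len(target_probs)} steps but contrast has {len(contrast_probs)}"
        )
    return [pt - pc for pt, pc in zip(target_probs, contrast_probs)]


# Made-up step probabilities: `_yes` vs `_no` (one token each) can be compared...
print(step_contrast_diffs([0.52], [0.47]))

# ...but `_y`, `es` (two steps) vs `_no` (one step) cannot
try:
    step_contrast_diffs([0.30, 0.95], [0.47])
except ValueError as err:
    print("error:", err)
```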

### Your turn to attribute

Using the generation model and a task of your choice, try to use contrastive attribution on at least three examples to highlight an interesting pattern. We encourage you to explore whatever you find most interesting, but here are some suggestions:

- Is negation relevant in producing the correct answer in open question answering models like the one we used in the previous example?

- When translating sentences like `The nurse went to the hospital` to a gendered language like Spanish, Italian or German, the model will have to select a gender for the subject. What is the model relying on to make this choice?

- Considering a fixed example like `Does 3 + 3 = 6?` but using models with increasingly more parameters (e.g. `flan-t5-small`, `flan-t5-base`, `flan-t5-large`), how does input importance and model confidence change?

After producing the visualizations, comment on the results and try to explain what you observe.

Inseq also supports attribution of quantized models (see [example](https://inseq.readthedocs.io/examples/locate_gpt2_knowledge.html)), in case you want to explore using larger models for your analysis. Refer to the [Inseq documentation](https://inseq.readthedocs.io/) for more details.


### Answer
Here I look into a few examples of the Helsinki-NLP English-to-Spanish translation model (`opus-mt-en-es`) translating sentences that mention jobs or roles traditionally seen as either more feminine or more masculine. The model has to choose which gender to use in the translation. It is interesting to see whether there are large differences in probability between the two gendered versions of a word, and which input tokens the model bases its choice on.

In the first example, the model has to translate the sentence "The nurse went to the hospital". The model is quite confident in selecting the feminine over the masculine gender: the probability difference between the tokens '_La' and '_El' is 0.639. The model bases this decision largely on the token '_nurse'. This suggests that '_nurse' and the feminine gender are strongly associated, probably because the training data contains more examples of '_nurse' referring to a woman.
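
Writing the contrastive target by hand for every sentence is error-prone. A small helper can generate it by swapping gendered words; the sketch below uses a hypothetical, hand-written mapping that only covers the three examples in this answer and ignores punctuation attached to words. Equal word counts do not guarantee equal subword counts, so the length requirement still has to be checked on the tokenized sequences (e.g. by comparing the `input_ids` returned by `encode(..., as_targets=True)` for both sentences).

```python
# Hypothetical hand-written mapping between masculine and feminine Spanish forms
GENDER_SWAP = {
    "El": "La", "La": "El",
    "enfermero": "enfermera", "enfermera": "enfermero",
    "presidente": "presidenta", "presidenta": "presidente",
    "padre": "madre", "madre": "padre",
}


def swap_gender(sentence):
    """Replace every word found in GENDER_SWAP by its counterpart; keep all other words."""
    return " ".join(GENDER_SWAP.get(word, word) for word in sentence.split())


for original in [
    "La enfermera fue al hospital",
    "El presidente introdujo una nueva ley",
    "El padre cocina para los niños",
]:
    print(f"{original}  ->  {swap_gender(original)}")
```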

In [12]:
translation_model = inseq.load_model("Helsinki-NLP/opus-mt-en-es", "input_x_gradient")
# Contrastive target: the masculine alternative to the forced feminine translation
contrast = translation_model.encode("El enfermero fue al hospital", as_targets=True)

out = translation_model.attribute(
    "The nurse went to the hospital",
    "La enfermera fue al hospital",
    attributed_fn="contrast_prob_diff",
    step_scores=["contrast_prob_diff", "probability"],
    contrast_ids=contrast.input_ids,
    contrast_attention_mask=contrast.attention_mask
)

out.weight_attributions("contrast_prob_diff")
out.show()


In the second example, we look at the sentence "The president introduced a new law". Here, the stereotypical gender for 'the president' would be masculine rather than feminine, and again the model follows this bias: with a probability difference of 0.506, it considers the masculine gender much more likely than the feminine one. Once more, the token representing the job in the input ('_president') has the largest importance in selecting the masculine over the feminine gender.

In [13]:
# Contrastive target: the feminine alternative to the forced masculine translation
contrast = translation_model.encode("La presidenta introdujo una nueva ley", as_targets=True)

out = translation_model.attribute(
    "The president introduced a new law",
    "El presidente introdujo una nueva ley",
    attributed_fn="contrast_prob_diff",
    step_scores=["contrast_prob_diff", "probability"],
    contrast_ids=contrast.input_ids,
    contrast_attention_mask=contrast.attention_mask
)

out.weight_attributions("contrast_prob_diff")
out.show()

Examples of the model not following a common bias can also be found. For example, when translating the sentence "The parent cooks for the children", the model does not follow the traditional bias and prefers 'El padre' over 'La madre' as the translation of 'The parent'. The tokens '_parent' and '_cook' have the largest importance in selecting the masculine over the feminine gender. It is not entirely clear why the model does not show a bias similar to the first two examples. Possibly 'el padre' simply occurs more frequently than 'la madre' as a translation of 'the parent' in the training data.

In [14]:
# Contrastive target: the feminine alternative ("La madre") to the forced translation
contrast = translation_model.encode("La madre cocina para los niños", as_targets=True)

out = translation_model.attribute(
    "The parent cooks for the children",
    "El padre cocina para los niños",
    attributed_fn="contrast_prob_diff",
    step_scores=["contrast_prob_diff", "probability"],
    contrast_ids=contrast.input_ids,
    contrast_attention_mask=contrast.attention_mask
)

out.weight_attributions("contrast_prob_diff")
out.show()