# Transformer language models



## Where do transformers live?

Open-source Transformer language models (TLMs) are hosted on the [Hugging Face Hub](https://huggingface.co/).

We can access them by their model identifier through the Hugging Face API.

To do that, we need to install the `transformers` Python library.

In [None]:
! pip install transformers



In [None]:
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
import torch
from transformers import AutoModel, AutoTokenizer
from transformers import AutoModelForCausalLM
from transformers import pipeline, set_seed
set_seed(42)
import warnings
warnings.filterwarnings('ignore')

# (Extremely) brief overview of a Transformer block



Transformers are made of several identical blocks. A Transformer is not a single monolithic neural network but a stack of blocks (layers), each of which can be further subdivided into sub-blocks (sub-layers).

A single building block is composed of two key elements:
- An attention layer, where the attention heads make their computations.
- A feed-forward layer, i.e. a small fully connected neural network applied to each position.

![img](https://jalammar.github.io/images/t/Transformer_encoder.png)

*Picture taken from https://jalammar.github.io/illustrated-transformer/*

# BERT



![img](https://sesameworkshop.org/wp-content/uploads/2023/03/presskit_ss_bio_bert-560x420.png)
*picture from Google*

BERT is a TLM built from Transformer encoder blocks.

- It is trained on the *Masked Language Modelling* and *Next Sentence Prediction* tasks.

- It can deal with different contexts, positional information and out-of-vocabulary (OOV) words.

BERT has many variants that improve or modify the original model or target specific languages (e.g. DistilBERT, RoBERTa, UmBERTo).


We will use a version called *DistilBERT*, a "distilled" version of the original that is smaller without losing much performance.

## BERT inputs workflow

The workflow to get embeddings from BERT is as follows:
- *Tokenize* the raw text
- Add two special tokens: [CLS] at the beginning and [SEP] at the end
- *Encode* each token as a *word id* (pre-implemented). Ids for known words are already present in the model's vocabulary; an unknown word is first broken into known sub-word pieces, and only if that fails is it assigned the special token [UNK].
- Run the model on the prepared *encodings*
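The first three steps can be sketched without any library, which may help to see what the tokenizer does for us. This is only a toy illustration: the vocabulary `TOY_VOCAB`, its ids and the helper `prepare_inputs` are hypothetical, and the whitespace split stands in for the real sub-word tokenizer.

```python
# Toy illustration of the input workflow. The vocabulary and its ids are
# hypothetical and far smaller than a real BERT vocabulary.
TOY_VOCAB = {"[PAD]": 0, "[UNK]": 1, "[CLS]": 2, "[SEP]": 3,
             "i": 4, "went": 5, "to": 6, "the": 7, "gym": 8}


def prepare_inputs(text, vocab=TOY_VOCAB):
    """Tokenize, add the special tokens and map every token to its id."""
    # 1. Tokenize (naive lowercase whitespace split, not a sub-word tokenizer)
    tokens = text.lower().split()
    # 2. Add the special tokens at both ends
    tokens = ["[CLS]"] + tokens + ["[SEP]"]
    # 3. Encode: known tokens get their id, unknown ones fall back to [UNK]
    ids = [vocab.get(tok, vocab["[UNK]"]) for tok in tokens]
    return tokens, ids


tokens, ids = prepare_inputs("I went to the gym yesterday")
print(tokens)  # ['[CLS]', 'i', 'went', 'to', 'the', 'gym', 'yesterday', '[SEP]']
print(ids)     # [2, 4, 5, 6, 7, 8, 1, 3]
```

Here "yesterday" is missing from the toy vocabulary, so it is mapped to the id of [UNK]. The list of ids is what would be handed to the model in the last step.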


![img](https://jalammar.github.io/images/distilBERT/bert-distilbert-tokenization-2-token-ids.png)


*Picture from [Jay Alammar's awesome blog](https://jalammar.github.io/a-visual-guide-to-using-bert-for-the-first-time/)*




We can do that easily with the *AutoModel* and *AutoTokenizer* classes.

The library takes care of segmenting the text into tokens, converting them to ids and feeding the model with the token embeddings and the positional embeddings (both of which are learned during pre-training).
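As a rough sketch of that last step: in BERT-style models the vector that enters the first block is the element-wise sum of a token embedding and a position embedding (BERT also adds a segment embedding, and a layer normalization follows; both are left out here). The lookup tables `TOKEN_EMB` and `POSITION_EMB` below are hypothetical 3-dimensional stand-ins for the learned 768-dimensional ones.

```python
# Hypothetical 3-dimensional lookup tables. Real models learn these during
# pre-training and use many more dimensions.
TOKEN_EMB = {
    101: [0.1, 0.2, 0.3],    # [CLS]
    146: [0.5, -0.1, 0.0],   # I
    102: [-0.2, 0.4, 0.1],   # [SEP]
}
POSITION_EMB = [
    [0.01, 0.02, 0.03],      # position 0
    [0.04, 0.05, 0.06],      # position 1
    [0.07, 0.08, 0.09],      # position 2
]


def embed(input_ids):
    """Return one input vector per token: token embedding + position embedding."""
    vectors = []
    for position, token_id in enumerate(input_ids):
        tok_vec = TOKEN_EMB[token_id]
        pos_vec = POSITION_EMB[position]
        vectors.append([t + p for t, p in zip(tok_vec, pos_vec)])
    return vectors


inputs = embed([101, 146, 102])
print(len(inputs), len(inputs[0]))  # 3 3

# The same token at two different positions gets two different input vectors
first, second = embed([146, 146])
print(first != second)  # True
```

Because the position is part of the input vector, the model can tell apart two occurrences of the same word even before any attention is applied.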

In [None]:
#instantiate the model and the tokenizer
model_id = "distilbert/distilbert-base-cased" #find all the ids on the Hugging Face Hub
model = AutoModel.from_pretrained(model_id)
tokenizer = AutoTokenizer.from_pretrained(model_id)


Note on special tokens:
- [CLS] token at the beginning. This is also known as the classifier token and it is a *global representation* of the whole sentence. We can use it to build classification systems over sentences and short text.
- [SEP] token at the end of the sentences.

Both these tokens are used during BERT's pre-training for the Next Sentence Prediction task.

The tokenizer adds them automatically.

In [None]:
#instantiate and encode some text
text = "I went to the gym yesterday morning. I was surrounded by people"
encoding_tokens = tokenizer(text, return_tensors = "pt")

### Brief note on tokenization

Tokenization in BERT models happens at the sub-word level. Thus, when a word is not in the vocabulary, the tokenizer first tries to break it down into known pieces.

In [None]:
for n, id in enumerate(encoding_tokens["input_ids"][0]):
  print(n,tokenizer.decode(id),"---->", id.item())

0 [CLS] ----> 101
1 I ----> 146
2 went ----> 1355
3 to ----> 1106
4 the ----> 1103
5 gym ----> 10759
6 yesterday ----> 8128
7 morning ----> 2106
8 . ----> 119
9 I ----> 146
10 was ----> 1108
11 surrounded ----> 4405
12 by ----> 1118
13 people ----> 1234
14 [SEP] ----> 102


In [None]:
text2 = "Ciao mi chiamo Mattia e studio linguistica computazionale"
enc = tokenizer.encode(text2)
for tok in enc:
  print(tokenizer.decode(tok))

[CLS]
C
##ia
##o
mi
ch
##iam
##o
Matt
##ia
e
studio
linguistic
##a
com
##put
##azi
##onal
##e
[SEP]


In [None]:
#special token for unknown words
print("Our special token for unknown words is: ")
print(tokenizer.encode("[UNK]", add_special_tokens = False))

# subword tokenization
print("Tokenization example at the sub-word level: ")
print(tokenizer.tokenize("rumination"))


Our special token for unknown words is: 
[100]
Tokenization example at the sub-word level: 
['r', '##umi', '##nation']
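BERT's WordPiece tokenizer splits a word with a greedy longest-match-first search over its vocabulary. The sketch below reimplements that idea in plain Python to make the `##` pieces less mysterious. It is a simplified illustration, not the library's code: the function name `wordpiece_tokenize` and the tiny `toy_vocab` are hypothetical, and the vocabulary was chosen so that the split shown above is reproduced.

```python
def wordpiece_tokenize(word, vocab, unk_token="[UNK]"):
    """Split one word into sub-word pieces with greedy longest-match-first."""
    pieces = []
    start = 0
    while start < len(word):
        end = len(word)
        match = None
        # Try the longest remaining substring first, then shrink it
        while start < end:
            candidate = word[start:end]
            if start > 0:
                candidate = "##" + candidate  # pieces inside a word carry the ## prefix
            if candidate in vocab:
                match = candidate
                break
            end -= 1
        if match is None:
            # No piece fits: the whole word becomes the unknown token
            return [unk_token]
        pieces.append(match)
        start = end
    return pieces


# Hypothetical toy vocabulary
toy_vocab = {"r", "##umi", "##nation", "nation", "run", "##ning"}

print(wordpiece_tokenize("rumination", toy_vocab))  # ['r', '##umi', '##nation']
print(wordpiece_tokenize("running", toy_vocab))     # ['run', '##ning']
print(wordpiece_tokenize("xyz", toy_vocab))         # ['[UNK]']
```

The last call shows when [UNK] appears: only when no sequence of vocabulary pieces can cover the word.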


We can inspect the model's vocabulary, check its size and look at the tokens stored at some arbitrary positions.

In [None]:
print(f"The vocabulary of {model_id} has {len(tokenizer.vocab.keys())} tokens\n")

print("Words-->Word id\n")
for k in list(tokenizer.vocab.keys())[500:530]:
  print(k, "--->", tokenizer.vocab[k])


The vocabulary of distilbert/distilbert-base-cased has 28996 tokens

Words-->Word id

whether ---> 2480
Snow ---> 8442
́ ---> 389
##rkin ---> 17687
squares ---> 16004
fierce ---> 9250
concurrently ---> 18061
##ey ---> 2254
よ ---> 925
##ishing ---> 10506
Wednesday ---> 9031
##tee ---> 26032
circumstances ---> 5607
spends ---> 16994
Victoria ---> 3006
wonderful ---> 7310
ʔ ---> 375
million ---> 1550
264 ---> 23852
##oli ---> 11014
endorsed ---> 11889
##nko ---> 17075
umpire ---> 25077
##ronic ---> 26003
##sla ---> 26597
pump ---> 11188
faculties ---> 22094
silently ---> 8490
Own ---> 13432
Holding ---> 14382


## Extract the embeddings

In [None]:
#run the model over our input
text = "I went to the gym yesterday morning. I was surrounded by people"
encoding_tokens = tokenizer(text, return_tensors = "pt")

with torch.inference_mode():
  outputs = model(**encoding_tokens)

In [None]:
# access the shape of our output
shape = outputs["last_hidden_state"][0].shape
print(f"The shape of our output is: {shape}\n")
print(f"This means we have {shape[0]} words represented with vectors of dimension {shape[1]}")

The shape of our output is: torch.Size([15, 768])

This means we have 15 words represented with vectors of dimension 768


We have 15 tokens (the printed message calls them words) because the tokenizer added 2 special tokens to the 13 tokens of our text (12 words plus the full stop).

In [None]:
# extract the embeddings and show them in association with the corresponding token
word_embeddings = outputs["last_hidden_state"][0]
# decode every input id back to its token so it can label a row
words = [tokenizer.decode(i) for i in encoding_tokens["input_ids"][0]]
embedding_df = pd.DataFrame(word_embeddings).astype(float)
embedding_df.index = words
embedding_df

| token | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | ... | 758 | 759 | 760 | 761 | 762 | 763 | 764 | 765 | 766 | 767 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| [CLS] | 0.333347 | 0.043025 | -0.079218 | -0.084207 | -0.212797 | -0.123895 | 0.252003 | -0.174414 | -0.056993 | -0.938937 | ... | 0.418048 | 0.361712 | -0.152836 | -0.049004 | -0.023088 | 0.267946 | -0.060299 | -0.304645 | 0.117908 | 0.014567 |
| I | 0.262142 | -0.338454 | 0.300905 | 0.009044 | -0.121172 | 0.125886 | 0.553342 | -0.232121 | 0.055819 | -0.433263 | ... | 0.184417 | 0.123045 | -0.365363 | -0.058124 | -0.058661 | 0.49231 | 0.640887 | -0.170047 | -0.000576 | 0.093725 |
| went | 0.342653 | 0.337142 | 0.35199 | -0.095413 | 0.628846 | -0.074906 | 0.209897 | -0.025465 | -0.541057 | 0.518051 | ... | 0.103655 | 0.074183 | -0.10541 | -0.054341 | -0.032809 | 0.474045 | 0.135485 | -0.02426 | -0.264745 | 0.180085 |
| to | 0.45595 | 0.422148 | -0.056485 | 0.062724 | 0.58496 | -0.182029 | 0.491096 | 0.093366 | -0.322163 | 0.311484 | ... | 0.223978 | 0.320306 | -0.295646 | -0.386567 | 0.180663 | 0.154933 | -0.648991 | -0.136615 | -0.015364 | -0.158476 |
| the | -0.015327 | -0.012505 | -0.052235 | 0.256436 | 0.432003 | -0.038705 | 0.504136 | -0.287105 | -0.577677 | 0.35836 | ... | 0.339154 | 0.339906 | -0.247194 | -0.203175 | 0.147722 | 0.481793 | -0.278707 | 0.074257 | -0.192144 | 0.184582 |
| gym | 0.089551 | 0.12153 | 0.052608 | 0.033602 | 0.075624 | -0.006596 | -0.02743 | 0.027705 | -0.276591 | 0.242132 | ... | -0.033289 | 0.086502 | 0.186209 | 0.027672 | -0.500361 | -0.005887 | 0.241406 | -0.473748 | -0.241873 | 0.122401 |
| yesterday | -0.086992 | -0.345998 | 0.316899 | 0.166045 | 0.384935 | 0.181641 | 0.293512 | -0.504798 | -0.316557 | 0.345682 | ... | 0.00197 | 0.066779 | -0.031057 | -0.040903 | -0.222445 | -0.081098 | -0.281266 | -0.349598 | -0.159604 | 0.27714 |
| morning | 0.164726 | -0.213105 | 0.094032 | 0.102668 | 0.483439 | -0.035999 | 0.196843 | -0.146765 | -0.121567 | 0.396554 | ... | -0.010726 | 0.392034 | -0.214797 | -0.081358 | 0.129133 | -0.066733 | -0.443217 | 0.017861 | 0.094103 | 0.289514 |
| . | 0.507484 | 0.203033 | 0.122107 | 0.714107 | 0.362723 | -0.125391 | 0.446712 | -0.157362 | -0.207219 | 0.186249 | ... | -0.095349 | 0.029737 | -0.156662 | -0.464045 | -0.173531 | 0.418284 | 0.008564 | -0.067754 | 0.342219 | -0.132604 |
| I | -0.030914 | -0.056213 | -0.224483 | 0.306893 | 0.199719 | -0.091678 | 0.268608 | -0.222513 | -0.034011 | 0.325369 | ... | 0.09923 | 0.136378 | -0.047534 | 0.040639 | -0.135952 | 0.340671 | 0.544617 | 0.044454 | 0.008186 | 0.174646 |


Different contexts yield different embeddings.

A simple visualization of the differences between three embeddings of the same word: the context-free input embedding and the contextual embeddings from two sentences.

We take the following two sentences:
- We went to the river *bank*.
- I need to go to the *bank* to make a deposit.

In [None]:
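@torch.inference_mode()
def get_token_repr(s, token_id):
  """Return the last-layer vector of the first occurrence of token_id in sentence s."""
  enc = tokenizer(s, return_tensors = "pt")
  # position of the first occurrence of the token in the encoded sentence
  position = enc["input_ids"][0].tolist().index(token_id)
  print(position, enc["input_ids"][0][position])
  outputs = model(**enc)["last_hidden_state"]
  return outputs.squeeze(dim = 0)[position]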

bank1 = "We went to the river bank."
bank2 = "I need to go to the bank to make a deposit."

#encode the word of our interest
word_id = tokenizer.encode("bank", add_special_tokens = False)[0]
#print(word_id)

#compare bank1 and bank 2 and bank no context
bank1_emb = get_token_repr(bank1, word_id)
bank2_emb = get_token_repr(bank2, word_id)
bank_non_context = model.embeddings.word_embeddings(torch.tensor(word_id)).detach()

#create a dataframe with the three tensors
bank_df = pd.DataFrame(torch.stack((bank_non_context, bank1_emb,bank2_emb))).astype(float)
bank_df.index = ["bank.no_cntxt","bank.1", "bank.2"]
bank_df.style.background_gradient(cmap = "Blues", vmin = -1, vmax = 1)



6 tensor(3085)
7 tensor(3085)


| embedding | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | ... | 758 | 759 | 760 | 761 | 762 | 763 | 764 | 765 | 766 | 767 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| bank.no_cntxt | -0.130838 | 0.010942 | -0.006302 | -0.02116 | 0.009157 | 0.035847 | -0.005595 | -0.018413 | -0.000985 | 0.016035 | ... | -0.0151 | 0.023534 | -0.035696 | -0.048689 | 0.004421 | -0.010579 | 0.002506 | 0.090661 | 0.015653 | 0.014414 |
| bank.1 | -0.056394 | 0.213424 | -0.413571 | 0.27923 | 0.606008 | -0.005858 | 0.142453 | 0.116765 | -0.182434 | 0.14586 | ... | -0.094605 | 0.057692 | -0.202196 | -0.338637 | 0.285052 | 0.160683 | 0.299192 | 0.036148 | 0.141436 | 0.17133 |
| bank.2 | -0.32505 | 0.135324 | 0.105089 | 0.182049 | 0.355616 | 0.022431 | 0.114222 | 0.046336 | -0.383388 | -0.066842 | ... | 0.186026 | 0.10636 | 0.028843 | -0.071775 | -0.159007 | 0.223851 | 0.694953 | 0.178468 | 0.321511 | 0.191951 |


# The mechanism behind the magic: Attention

The *Self-Attention* mechanism allows these models to generate word representations that take into account the surrounding elements.

The picture below shows an example of it. You can see how the word "it" *attends* to other previous and successive words, all of which contribute to the dynamic representation of the current word.


![img](https://jalammar.github.io/images/t/transformer_self-attention_visualization.png)

*Picture from https://jalammar.github.io/illustrated-transformer/*

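The mechanism itself boils down to a few matrix multiplications.
Each input token is projected into three different vectors by three weight matrices learned during the training process:
- Queries
- Keys
- Values

The queries are multiplied by the keys, the resulting scores are scaled by the square root of the key dimension and normalized with a softmax, and the weights obtained this way are used to average the values. The outputs of the attention heads are then concatenated to get the inner representation of the word $t$ at layer $n$ before passing it to the Feed Forward Network.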

![img](https://jalammar.github.io/images/t/self-attention-matrix-calculation-2.png)

*Picture from https://jalammar.github.io/illustrated-transformer/*
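
To make the computation in the picture more concrete, here is a minimal sketch of a single attention head in plain Python. It is only an illustration under strong assumptions: the input vectors are made-up numbers, the three projection matrices are identity matrices instead of learned weights, and the helper names (`softmax`, `matmul`, `self_attention`) are hypothetical and do not belong to any library.

```python
import math

def softmax(row):
    # Subtract the max for numerical stability before exponentiating.
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]

def matmul(a, b):
    # Plain matrix product of two lists of lists.
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def transpose(m):
    return [list(col) for col in zip(*m)]

def self_attention(x, w_q, w_k, w_v):
    # Project every token into a query, a key and a value vector.
    q = matmul(x, w_q)
    k = matmul(x, w_k)
    v = matmul(x, w_v)
    d_k = len(k[0])
    # Compare every query with every key, scale, and normalize each row.
    scores = matmul(q, transpose(k))
    weights = [softmax([s / math.sqrt(d_k) for s in row]) for row in scores]
    # Each output vector is a weighted average of the value vectors.
    return matmul(weights, v), weights

# Toy input: 3 tokens with embedding size 2 (made-up numbers).
x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
# Identity matrices stand in for the learned projections (an assumption).
identity = [[1.0, 0.0], [0.0, 1.0]]

output, weights = self_attention(x, identity, identity, identity)
for row in weights:
    print([round(w, 3) for w in row])
```

Each printed row shows how much one token attends to every token in the sentence, itself included. A real model runs several of these heads in parallel with different learned matrices.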


Try https://huggingface.co/spaces/exbert-project/exbert for more high level visualizations of attention in various models!

# Some practical examples using BERT

## Fill-Mask


<div>
  <img src="https://media.geeksforgeeks.org/wp-content/uploads/20200422002516/maskedLanguage.jpg" width="600">
</div>

*picture from https://media.geeksforgeeks.org/*

We can replicate the masked language modeling objective using the 'pipeline' object of the transformers library.

[*pipeline*](https://huggingface.co/docs/transformers/en/main_classes/pipelines) is an object that leverages a pre-trained model or a specific fine-tuned one and takes care of all the preparation steps. It is mostly used for inference.

In [None]:
#instantiate the fill-mask pipeline
fill_mask = pipeline('fill-mask', model= model_id)

In [None]:
#helper function to get the prediction
def get_fillers(s, output_all = False):
    if not output_all:
      print(fill_mask(s)[0]["sequence"])
    else:
      return fill_mask(s)

In [None]:
fill_mask("Artificial Intelligence [MASK] take over the world.")

[{'score': 0.1016608253121376,
  'token': 1209,
  'token_str': 'will',
  'sequence': 'Artificial Intelligence will take over the world.'},
 {'score': 0.0328458696603775,
  'token': 2088,
  'token_str': 'forces',
  'sequence': 'Artificial Intelligence forces take over the world.'},
 {'score': 0.0315660759806633,
  'token': 1169,
  'token_str': 'can',
  'sequence': 'Artificial Intelligence can take over the world.'},
 {'score': 0.01982802338898182,
  'token': 5789,
  'token_str': 'agents',
  'sequence': 'Artificial Intelligence agents take over the world.'},
 {'score': 0.01851055398583412,
  'token': 1106,
  'token_str': 'to',
  'sequence': 'Artificial Intelligence to take over the world.'}]

Check for bias (still)...

In [None]:
print(fill_mask("In the hospital the woman works as a [MASK]."))
print(fill_mask("In the hospital the man works as a [MASK]."))

[{'score': 0.7329415678977966, 'token': 7439, 'token_str': 'nurse', 'sequence': 'In the hospital the woman works as a nurse.'}, {'score': 0.028688404709100723, 'token': 3995, 'token_str': 'doctor', 'sequence': 'In the hospital the woman works as a doctor.'}, {'score': 0.028663545846939087, 'token': 13487, 'token_str': 'maid', 'sequence': 'In the hospital the woman works as a maid.'}, {'score': 0.028032952919602394, 'token': 23722, 'token_str': 'cleaner', 'sequence': 'In the hospital the woman works as a cleaner.'}, {'score': 0.025764331221580505, 'token': 26458, 'token_str': 'housekeeper', 'sequence': 'In the hospital the woman works as a housekeeper.'}]
[{'score': 0.19614218175411224, 'token': 7439, 'token_str': 'nurse', 'sequence': 'In the hospital the man works as a nurse.'}, {'score': 0.1546916961669922, 'token': 3995, 'token_str': 'doctor', 'sequence': 'In the hospital the man works as a doctor.'}, {'score': 0.06976664066314697, 'token': 23722, 'token_str': 'cleaner', 'sequence': 

In [None]:
print(fill_mask("The only destiny of a woman is to be a [MASK]."))
print(fill_mask("The only destiny of a man is to be a [MASK]."))

[{'score': 0.07571644335985184, 'token': 1590, 'token_str': 'woman', 'sequence': 'The only destiny of a woman is to be a woman.'}, {'score': 0.052465204149484634, 'token': 6748, 'token_str': 'slave', 'sequence': 'The only destiny of a woman is to be a slave.'}, {'score': 0.03437161073088646, 'token': 21803, 'token_str': 'prostitute', 'sequence': 'The only destiny of a woman is to be a prostitute.'}, {'score': 0.02620248682796955, 'token': 8229, 'token_str': 'warrior', 'sequence': 'The only destiny of a woman is to be a warrior.'}, {'score': 0.023374376818537712, 'token': 7559, 'token_str': 'lover', 'sequence': 'The only destiny of a woman is to be a lover.'}]
[{'score': 0.043410658836364746, 'token': 6748, 'token_str': 'slave', 'sequence': 'The only destiny of a man is to be a slave.'}, {'score': 0.04164069890975952, 'token': 5540, 'token_str': 'god', 'sequence': 'The only destiny of a man is to be a god.'}, {'score': 0.035219352692365646, 'token': 1590, 'token_str': 'woman', 'sequence

In [None]:
# try something
get_fillers("")

Telicity: a simple example of a linguistic property

In [None]:
s1 = "I walked [MASK] 3 hours."
s2 = "I finished [MASK] 3 hours."
get_fillers(s1)
get_fillers(s2)

I walked for 3 hours.
I finished in 3 hours.


What about ambiguous contexts?

In [None]:
# context is important!
get_fillers("I ate everything [MASK] 3 hours")
get_fillers("I ate everything [MASK] just 3 hours")


I ate everything for 3 hours


Does it respect pronoun agreement?

In [None]:
# pronoun gender
get_fillers("My teacher name is Daniel. [MASK] is a very good teacher.")

My teacher name is Daniel. he is a very good teacher.


In [None]:
get_fillers("John and Mary went out of the cinema and suddenly he kissed [MASK].")

John and Kevin went out of the cinema and suddenly he kissed her.


In [None]:
get_fillers("Lucy and mary went out of the cinema and suddenly she kissed [MASK].")

Lucy and mary went out of the cinema and suddenly she kissed him.


Subject-Verb agreement

In [None]:
get_fillers("The boys [MASK] shy.", output_all = True)

[{'score': 0.41352078318595886,
  'token': 1132,
  'token_str': 'are',
  'sequence': 'The boys are shy.'},
 {'score': 0.25092068314552307,
  'token': 1127,
  'token_str': 'were',
  'sequence': 'The boys were shy.'},
 {'score': 0.06096593663096428,
  'token': 3166,
  'token_str': 'seem',
  'sequence': 'The boys seem shy.'},
 {'score': 0.026996642351150513,
  'token': 3118,
  'token_str': 'remain',
  'sequence': 'The boys remain shy.'},
 {'score': 0.01755758561193943,
  'token': 2845,
  'token_str': 'appear',
  'sequence': 'The boys appear shy.'}]

## Question Answering

To solve Q&A tasks we must call a fine-tuned model. Here the *squad* in the model id stands for *Stanford Question Answering Dataset* (SQuAD), which means the model has been fine-tuned on that dataset.



In [None]:
#instantiate the qa pipeline
qa = pipeline("question-answering", "distilbert/distilbert-base-cased-distilled-squad")


In [None]:
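#helper function to print only the answer
def get_answer(query, context):
  # keyword arguments make it explicit which text is the question and which the context
  return qa(question=query, context=context)["answer"]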

In [None]:
question =  "Where did I leave the keys?"
context = "I left the keys on the table before leaving home yesterday"
qa(question, context)

{'score': 0.307202011346817, 'start': 16, 'end': 28, 'answer': 'on the table'}

In [None]:
question =   "When I came home yesterday?"
context = "Yesterday I went home late at night, after a drink with my friends"

get_answer(question, context)

'late at night'

"Exercise"

You can try to fill the quotes below with some text of your choice.

In [None]:
question =   ""
context = ""
get_answer(question, context)

## Classification
*Adapted from https://jalammar.github.io/a-visual-guide-to-using-bert-for-the-first-time/*


![img](https://miro.medium.com/v2/resize:fit:720/format:webp/1*fOdb7SbBeMyWcHfTuVyeQA.png)

*Picture from https://towardsdatascience.com/feature-extraction-with-bert-for-text-classification-533dde44dc2f*

We can use BERT as a feature extractor. This means that we use it to generate vector representations of our texts. We can then feed an *ML* algorithm with these representations and train it to solve a *classification task*.

When we extract features from a list of sentences we have to do several things:
- Tokenize each sentence
- Truncate sentences longer than the available context (512 tokens for BERT)
- Pad the sequences with zeros to make every sentence as long as the longest one

For a detailed exposition of these steps we refer to [Jay Alammar's blog](https://jalammar.github.io/a-visual-guide-to-using-bert-for-the-first-time/) and the notebook provided there.
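
For intuition, the truncation and padding steps listed above can be sketched in a few lines of plain Python. This is a toy illustration and not what the library does internally: the token ids are invented, `truncate_and_pad` is a hypothetical helper, and real tokenizers take care to keep the final special token when they truncate.

```python
def truncate_and_pad(batch, max_len, pad_id=0):
    # Cut every sequence to at most max_len token ids
    # (simplification: a real tokenizer would keep the closing special token).
    truncated = [ids[:max_len] for ids in batch]
    longest = max(len(ids) for ids in truncated)
    # Pad with pad_id so that all sequences have the same length.
    input_ids = [ids + [pad_id] * (longest - len(ids)) for ids in truncated]
    # The attention mask marks real tokens with 1 and padding with 0.
    attention_mask = [[1] * len(ids) + [0] * (longest - len(ids)) for ids in truncated]
    return input_ids, attention_mask

# Invented token ids for three sentences of different lengths.
batch = [[101, 7, 8, 102], [101, 5, 102], [101, 1, 2, 3, 4, 5, 102]]
input_ids, attention_mask = truncate_and_pad(batch, max_len=5)

for ids, mask in zip(input_ids, attention_mask):
    print(ids, mask)
```

The attention mask is what lets the model ignore the padding positions when it computes attention.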


Here, for the sake of simplicity, we compress all the steps (and even skip some).

In [None]:
#instantiate the feature extraction pipeline
feature_extractor = pipeline("feature-extraction", model_id)


We download a dataset containing movie reviews tagged with a value which could be 0 or 1.

- 0 stands for Negative
- 1 stands for positive

*Remember that every ML model deals only with numbers! It is our task to map words or textual labels into numbers and vice versa.*

In [None]:
# download an example dataset
df = pd.read_csv('https://github.com/clairett/pytorch-sentiment-classification/raw/master/data/SST2/train.tsv', delimiter='\t', header=None)[:2000]
df.columns = ["text","label"]
df.head()

| | text | label |
|---|---|---|
| 0 | a stirring , funny and finally transporting re... | 1 |
| 1 | apparently reassembled from the cutting room f... | 0 |
| 2 | they presume their audience wo n't sit still f... | 0 |
| 3 | this is a visually stunning rumination on love... | 1 |
| 4 | jonathan parker 's bartleby should have been t... | 1 |


The steps we are going to follow are:
- Feature extraction. We extract the embeddings for each of our texts.
  - After that we extract and store only the embedding corresponding to the [CLS] token. This will act as a global representation of our whole text.
- We divide our extracted features into two separate sets:
  - training_set = this is used to train our model along with the training labels
  - test_set = we will evaluate our trained model on this small test set, using the test_labels
- We feed a Machine Learning classifier with our features and the corresponding labels. This ML model will learn to categorize the representations we provide, assigning each a value of 0 or 1. 0 stands for *Negative*, 1 stands for *Positive*.
- We evaluate the *Accuracy* of our model trained on the extracted representations and compare it to a dummy baseline.





<div>
<img src = https://jalammar.github.io/images/distilBERT/bert-distilbert-tutorial-sentence-embedding.png width = 800>
</div>

<div>
<img src = https://jalammar.github.io/images/distilBERT/bert-distilbert-train-test-split-sentence-embedding.png width = 800>
</div>


<div>
<img src = https://jalammar.github.io/images/distilBERT/bert-training-logistic-regression.png width = 800 >
</div>

*Pictures from https://jalammar.github.io/a-visual-guide-to-using-bert-for-the-first-time/*








In [None]:
#extract features with our feature extractor
features = feature_extractor(df["text"].tolist(), return_tensors = "pt")

In [None]:
# extract the classification token [CLS] of each sentence
cls_tokens = np.array([f.squeeze(dim = 0)[0].numpy() for f in features])

In [None]:
# split the data in train and test set
labels = df["label"]
train_features, test_features, train_labels, test_labels = train_test_split(cls_tokens, labels)

In [None]:
#train a logistic classifier with distil-bert representations
log_reg = LogisticRegression()
log_reg.fit(train_features, train_labels)

#evaluate our model on test set computing the accuracy
print(f"Distil BERT score: {log_reg.score(test_features, test_labels)}")

Distil BERT score: 0.82


In [None]:
# compare results with a dummy classifier that ignores the features
# (with default settings it always predicts the most frequent class)
from sklearn.dummy import DummyClassifier
clf = DummyClassifier()
scores = cross_val_score(clf, train_features, train_labels)
print("Dummy classifier score: %0.3f " % (scores.mean()))

Dummy classifier score: 0.525 
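
With its default settings, scikit-learn's `DummyClassifier` ignores the features and always predicts the most frequent training label, so its accuracy is simply the share of that label in the data it is scored on. The sketch below reproduces this idea in plain Python; the labels are invented for illustration and `majority_baseline_accuracy` is a hypothetical helper, not part of scikit-learn.

```python
from collections import Counter

def majority_baseline_accuracy(train_labels, test_labels):
    # Find the label that occurs most often in the training data.
    majority_label, _ = Counter(train_labels).most_common(1)[0]
    # Predict that label for every test example and count the hits.
    hits = sum(1 for y in test_labels if y == majority_label)
    return majority_label, hits / len(test_labels)

# Invented labels, only for illustration.
toy_train_labels = [1, 0, 1, 1, 0, 1]
toy_test_labels = [1, 0, 0, 1]

label, acc = majority_baseline_accuracy(toy_train_labels, toy_test_labels)
print(f"always predicting {label} gives accuracy {acc}")
```

On a roughly balanced dataset such a baseline stays close to 0.5, which is the number any trained classifier has to beat.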


### We can then use the classifier trained on  BERT representations to predict new text.

*Note* that the text has to undergo the same process as in the training phase!
That is, it must first be transformed into a vector representation.

I have provided a simple function to do so.


In [None]:
# helper function to get pretty-printed predictions
def predict(text: str):
  # the pipeline returns one vector per token; the first one belongs to [CLS]
  cls_token = np.array(feature_extractor(text)[0][0]).reshape(1, -1)
  pred = log_reg.predict(cls_token)
  if pred[0] == 0:
    print("NEGATIVE")
  else:
    print("POSITIVE")


predict("i hate the film")
predict("I loved the film")


NEGATIVE
POSITIVE


In [None]:
predict("This film was a total crap")


NEGATIVE


Let's try it on real [reviews](https://www.rottentomatoes.com/m/poor_things/reviews?type=user) of the movie *Poor Things* taken from Rotten Tomatoes.


ex1: 1/5 *

best to skip this weird piece of blank. apart from one 90 second scene with Dafoe and Stone when she returns to the mansion to learn about "God's" reasons, the movie is completely mishandled and skippable.

---
ex2: 5/5 *

One of the most glorious examples of surrealist cinema...really my pride in the fact that a Greek director is behind such a brilliant idea makes me very proud.

---

ex3: 3,5/5 *

The production design team deserve heaps of praise, absolutely stunning sets. The overall story felt like it lacked depth however and I personally took nothing from the film and found it rather forgettable. Visually quite stunning at times although occasionally a little gimmicky.


In [None]:
#store the reviews
review1 = "best to skip this weird piece of blank. apart from one 90 second scene with Dafoe and Stone when she returns to the mansion to learn about God's reasons, the movie is completely mishandled and skippable."
review2 = "One of the most glorious examples of surrealist cinema...really my pride in the fact that a Greek director is behind such a brilliant idea makes me very proud."
review3 = "The production design team deserve heaps of praise, absolutely stunning sets. The overall story felt like it lacked depth however and I personally took nothing from the film and found it rather forgettable. Visually quite stunning at times although occasionally a little gimmicky."

In [None]:
#predict the sentiment for review1
predict(review1)

NEGATIVE


In [None]:
#predict review 2
predict(review2)

POSITIVE


In [None]:
#predict review3
predict(review3)

POSITIVE


---
# GPT

<div>
  <img src="https://lena-voita.github.io/resources/lectures/lang_models/neural/nn_lm_idea_linear-min.png" width="800">
</div>


Causal language modeling is done by feeding the model with the sequence and using the real continuation as labels. This is repeated for each step of the sequence, as shown in the image below:

![img](https://jalammar.github.io/images/gpt2/transformer-decoder-attention-mask-dataset.png)
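
The idea in the picture can be sketched as follows: every prefix of the sequence becomes an input, and the token that follows it becomes the label. This is a simplified illustration that splits the text into words by hand, whereas GPT-2 works on subword tokens; `next_token_pairs` is a hypothetical helper.

```python
def next_token_pairs(tokens):
    # For every position i, the context is everything before i
    # and the label is the token at position i.
    return [(tokens[:i], tokens[i]) for i in range(1, len(tokens))]

# Hand-made word-level split (a simplification of real subword tokenization).
tokens = ["My", "name", "is", "Bond", ",", "James"]

pairs = next_token_pairs(tokens)
for context, target in pairs:
    print(" ".join(context), "->", target)
```

In practice all these predictions are computed in a single forward pass, because the attention mask prevents each position from looking at the tokens that come after it.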

In [None]:
sent = "My name is Bond, James"

In [None]:
# download the model
gpt2_model_id = "gpt2"
gpt2_model = AutoModelForCausalLM.from_pretrained(gpt2_model_id)
gpt_tokenizer = AutoTokenizer.from_pretrained(gpt2_model_id)

In [None]:
#encode the text for GPT
gpt_encs = gpt_tokenizer(sent, return_tensors = "pt", add_special_tokens = True)

In [None]:
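# take a look at the tokenization output
# (decode with the GPT-2 tokenizer, since the ids come from its vocabulary)
for n, token_id in enumerate(gpt_encs["input_ids"][0]):
  print(n, gpt_tokenizer.decode(token_id), "---->", token_id.item())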

In [None]:
#generate the output
output = gpt2_model.generate(**gpt_encs, max_new_tokens = 1)
# decode the generated ids with the GPT-2 tokenizer
gpt_tokenizer.batch_decode(output)

We could try to train a logistic classifier on the GPT-2 representations in the same way we did with BERT.

### Text generation pipeline

In [None]:
#instantiate the text generation pipeline
text_gen = pipeline('text-generation', model='gpt2', pad_token_id = 50256)

In [None]:
# example of raw generation for 5 sequences
text_gen("Hello, I'm a language model,", max_length=30, num_return_sequences=5)

What can we control about the generation?

We can set a few parameters to give a minimum of direction to our generation.
- temperature = a parameter taking positive float values, usually between 0 and 1. It regulates how random the sampled generations are. A value closer to 1 makes the model more creative but less reliable. Going toward 0, on the contrary, makes the model more deterministic and more predictable.
- max_new_tokens = the number of new tokens we want to generate
- num_return_sequences = the number of sequences we want to return.

More parameters are available such as *top_k* and *top_p*, but they concern more advanced generation decoding settings.
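
To see why temperature has this effect, here is a small sketch of how it reshapes the probabilities of the next token: the model scores (logits) are divided by the temperature before the softmax. The logits are invented numbers and `softmax_with_temperature` is a hypothetical helper written for this illustration, not a function of the transformers library.

```python
import math

def softmax_with_temperature(logits, temperature):
    # Dividing by a small temperature stretches the differences between scores,
    # dividing by a large one flattens them.
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Invented scores for three candidate next tokens.
logits = [2.0, 1.0, 0.5]

cold = softmax_with_temperature(logits, 0.1)
warm = softmax_with_temperature(logits, 1.0)
print("temperature 0.1:", [round(p, 3) for p in cold])
print("temperature 1.0:", [round(p, 3) for p in warm])
```

With a low temperature almost all the probability mass goes to the best-scoring token, so sampling nearly always picks it. With a temperature of 1 the other candidates keep a real chance of being selected.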

In [None]:
#helper function to get results pretty printed
def get_gens(text, output_all = False, temperature = 0.1, max_new_tokens = 1, num_return_sequences = 1):
  # run the generation once, with the same parameters in both cases
  gens = text_gen(text,
                  max_new_tokens = max_new_tokens,
                  temperature = temperature,
                  num_return_sequences = num_return_sequences)
  if output_all:
    return gens
  print(gens[0]["generated_text"])

Indirect object identification

In [None]:
get_gens("John and Mary went out of the cinema when suddenly he kissed")

In [None]:
get_gens("John and Mary went out of the cinema when suddenly she kissed")

Subject-Verb agreement

In [None]:
get_gens("I saw the boys. He")

In [None]:
get_gens("I saw the boys. They")