In [1]:
from transformers import DistilBertTokenizer, DistilBertForSequenceClassification
import torch
import pandas as pd

# Transformer Language Models
### For sentiment analysis
Author: [Camilo](https://github.com/camilocarvajalreyes/)

**Objectives**: You will get familiar with using pre-trained models: how to interpret their outputs and how to use them in a simple NLP pipeline.

**To do**: Try a [pre-trained](https://huggingface.co/distilbert-base-uncased-finetuned-sst-2-english) Transformer-based sentiment classifier on an industry use case!

### Instructions
1. Download the data, install the packages and check that everything works (if you haven't done so yet)
2. Load the packages and the pre-trained model by running the cells in Section 0. We will get familiar with a pre-trained Transformer model from HuggingFace
3. Read the instructions in Section 1 and apply the model to a sentiment classification task
4. At the end of Section 1, we will see how to visualise the model's attention weights for some examples
5. Don't forget to ask if you have any questions

## Section 0 - Importing a pre-trained model
In this tutorial we will use a library called **transformers** from the French company **HuggingFace**. [Transformers](https://github.com/huggingface/transformers) offers more than 40 NLP architectures based on [the original Transformer architecture](https://arxiv.org/abs/1706.03762).

We'll use [distilbert-base-uncased-finetuned-sst-2-english](https://huggingface.co/distilbert-base-uncased-finetuned-sst-2-english).

This model is based on [DistilBERT](https://arxiv.org/abs/1910.01108), a lighter version of [BERT](https://arxiv.org/abs/1810.04805), and it has been fine-tuned on the [Stanford Sentiment Treebank (SST-2)](https://paperswithcode.com/dataset/sst).

In [2]:
MODEL_NAME = "distilbert-base-uncased-finetuned-sst-2-english"

tokenizer = DistilBertTokenizer.from_pretrained(MODEL_NAME)
model = DistilBertForSequenceClassification.from_pretrained(MODEL_NAME)

First of all, we need to pre-process (tokenise) the input text. Fortunately, HuggingFace provides the **tokenizer** that matches each pre-trained model.

Applying this particular pre-trained model to a tokenised input returns a [SequenceClassifierOutput](https://huggingface.co/transformers/main_classes/output.html#transformers.modeling_outputs.SequenceClassifierOutput) object. In particular, it contains the **logits**: one unnormalised score per class, which a softmax turns into class probabilities.

In this case, we have a **binary sentiment classification**, so the logits vector has two entries: the scores of the **negative** (index 0) and **positive** (index 1) classes, respectively.

In [3]:
# tokenise the sentence and return PyTorch tensors ("pt")
input_example = tokenizer("The cutest dog I've ever seen", return_tensors="pt")
output = model(**input_example)
print('Output:')
print(output)
print('Logits:')
print(output.logits)  # shape (1, 2): [negative score, positive score]
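To read the logits as probabilities, apply a softmax. Below is a minimal sketch in plain Python, using made-up logit values (not actual model output) for a single sentence; on the real model, `torch.softmax(output.logits, dim=-1)` should perform the same computation on the tensor.

```python
import math

def softmax(logits):
    """Turn a list of raw scores (logits) into probabilities that sum to 1."""
    m = max(logits)  # subtract the largest logit for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical logits for one sentence: [negative score, positive score]
logits = [-1.5, 2.0]
probs = softmax(logits)
predicted_label = probs.index(max(probs))  # 0 = negative, 1 = positive

# With two classes, P(positive) only depends on the difference of the logits:
# P(positive) = sigmoid(z_pos - z_neg), i.e. z_pos - z_neg is the log-odds
p_positive = 1 / (1 + math.exp(-(logits[1] - logits[0])))
print(probs, predicted_label, p_positive)
```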

Furthermore, if we happen to have labels for our samples, we can pass them when calling the model and it will also return the corresponding **loss**. Here we know that our sentence is **positive**, so its label is **1**.

In [None]:
label = torch.tensor([1])  # shape (batch_size,) = (1,): positive sentiment (label 1)
output = model(**input_example, labels=label)
print(output.loss)
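For a single-label classification head like this one, the returned loss should be the cross-entropy between the logits and the label, i.e. minus the log of the probability assigned to the correct class. The sketch below recomputes it in plain Python on made-up logits (an illustration, not the library's own code); on real tensors, `torch.nn.functional.cross_entropy(output.logits, label)` should reproduce `output.loss`.

```python
import math

def cross_entropy(logits, label):
    """Cross-entropy for one example: -log(softmax(logits)[label])."""
    m = max(logits)
    # log(sum(exp(z))) computed stably by factoring out the largest logit
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[label]

logits = [-1.5, 2.0]  # hypothetical [negative, positive] scores
loss_if_positive = cross_entropy(logits, 1)  # small: prediction agrees with label 1
loss_if_negative = cross_entropy(logits, 0)  # large: prediction disagrees with label 0
print(loss_if_positive, loss_if_negative)
```

The loss is close to 0 when the model is confident in the correct class and grows as it becomes confident in the wrong one.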

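Both outputs are PyTorch tensors, but we can convert them to **NumPy arrays**. Since they are part of the autograd graph, we first call `.detach()`: calling `.numpy()` directly on a tensor that requires gradients raises an error.

Remark: the first dimension has size 1 because it is the batch dimension, and we are only feeding a single example.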

In [None]:
tensor = output.logits
array = tensor.detach().numpy()
print(array.shape)
print(array)
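The batch dimension matters once several sentences are classified at once. A sketch with a made-up NumPy array of logits for three sentences (hypothetical values, not model output) shows how `argmax` along axis 1 gives one predicted label per sentence, with 0 = negative and 1 = positive:

```python
import numpy as np

# Hypothetical logits for a batch of 3 sentences, shape (batch_size, num_labels)
logits = np.array([[-1.5, 2.0],
                   [3.1, -2.7],
                   [0.2, 0.1]])

predicted = logits.argmax(axis=1)  # one label per row: 0 = negative, 1 = positive
first = logits[0]  # indexing the batch axis gives the two scores of one sentence
print(logits.shape, predicted)  # (3, 2) [1 0 0]
```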

## Section 1 - Classifying tweets
Let's consider a real-world scenario: a major **US airline** needs you to classify tweets from its customers. They provide the following [dataset](https://www.kaggle.com/crowdflower/twitter-airline-sentiment):

Since we won't fine-tune the model and will only evaluate it, we use a subset of 2000 of the ~14,000 tweets to keep the run time short.

In [None]:
full_dataset = pd.read_csv("data/Tweets-subset.csv")
pd.set_option('display.max_colwidth', 120)  # show up to 120 characters per cell
# drop the metadata columns we won't use
full_dataset = full_dataset.drop(['tweet_id','negativereason','negativereason_confidence','retweet_count',
             'airline','airline_sentiment_gold','name','negativereason_gold',
             'tweet_coord','tweet_created','tweet_location','user_timezone'], axis=1)
# discard neutral tweets: the model only predicts negative or positive
dataset = full_dataset[(full_dataset['airline_sentiment']!='neutral')]

# example tweet
index = 16
print(dataset.iloc[index].text)

dataset.head()
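The dataset stores sentiments as strings, whereas the model outputs class indices. Below is a sketch on a tiny hypothetical DataFrame (same column names as the airline data, invented tweets) that maps the strings to class indices, assuming 0 = negative and 1 = positive; `model.config.id2label` shows the mapping the checkpoint actually uses.

```python
import pandas as pd

# Hypothetical mini-dataset with two of the columns kept from the airline CSV
toy = pd.DataFrame({
    "airline_sentiment": ["negative", "neutral", "positive", "negative"],
    "text": ["@airline lost my bag", "@airline flight at 5?",
             "@airline great crew!", "@airline delayed again"],
})

# Drop neutral tweets, as done for the real dataset
binary = toy[toy["airline_sentiment"] != "neutral"]

# Map string labels to class indices (assumed: 0 = negative, 1 = positive)
label_ids = binary["airline_sentiment"].map({"negative": 0, "positive": 1})
print(label_ids.tolist())  # [0, 1, 0]
```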

**1. Using our pre-trained model, code a function that takes a string of text and outputs 1 if it's positive and 0 if negative**

In [None]:
### to do ###

**2. Iterate over the dataset and compute the average loss and the accuracy of the classifier**

(This cell should take about 1 minute to run)

In [None]:
for index, row in dataset.iterrows():
    text = row.text
    ### to do ###
    
print('average loss = {}'.format(average_loss))
print('accuracy = {}'.format(accuracy))

### Visualising Attention

The airline wants to understand how the model works. You therefore decide to visualise the attention weights of the first and last Transformer layers. Conveniently, the model returns its attention weights when it is called with `output_attentions=True`: a tuple with one tensor per layer, each of shape (batch_size, num_heads, seq_len, seq_len).

In [None]:
def attention(text, layer=0):
    ''' parameters:
    text: input string for the classifier
    layer: index of the layer to show the attention weights from (0 = first, -1 = last)
    returns: (seq_len, seq_len) array of attention weights averaged over all heads
    '''
    input_sentence = tokenizer(text, return_tensors="pt")
    output = model(**input_sentence, output_attentions=True)
    print('text: ' + text)
    # output.attentions: one tensor per layer, shape (batch_size, num_heads, seq_len, seq_len)
    array = output.attentions[layer].detach().numpy()
    array = array.mean(axis=1)  # average over all attention heads
    return array[0, :, :]  # drop the batch dimension (a single sentence)

#example
#attention("@VirginAmerica and it's a really big bad thing about it")
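To see what `attention` returns, here is a sketch on random numbers standing in for real attention weights. It assumes each layer's tensor has shape (batch_size, num_heads, seq_len, seq_len) with softmax-normalised rows; DistilBERT-base has 12 heads per layer, while the 7-token sequence is hypothetical. Because every head's rows sum to 1, the head-averaged matrix keeps that property, so each row of the heatmap can be read as how one token distributes its attention over the sequence.

```python
import numpy as np

rng = np.random.default_rng(0)
batch_size, num_heads, seq_len = 1, 12, 7  # hypothetical sizes

# Random scores turned into attention weights with a softmax over the last axis
scores = rng.normal(size=(batch_size, num_heads, seq_len, seq_len))
weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
weights /= weights.sum(axis=-1, keepdims=True)

# Same reduction as in `attention`: mean over heads, then drop the batch axis
avg_weights = weights.mean(axis=1)[0, :, :]
print(avg_weights.shape)                           # (7, 7)
print(np.allclose(avg_weights.sum(axis=-1), 1.0))  # True
```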

**3. Using the attention function, plot a heatmap to visualise the scores from the model's self-attention**

You can take inspiration from [this example](https://matplotlib.org/stable/gallery/images_contours_and_fields/image_annotated_heatmap.html)

In [None]:
import matplotlib.pyplot as plt
plt.rcParams["figure.figsize"] = (10,6) # setting figure size

def heatmap_attention(text,layer=0):
    tokenized_sequence = tokenizer.tokenize(text)
    labels = ['[CLS]']+tokenized_sequence+['[SEP]']

    matrix = ### to do ###


    fig, ax = plt.subplots()
    im = ax.imshow(matrix)

    # We want to show all ticks...
    
    ### to do ###
    
    # ... and label them with the respective list entries
    
    ### to do ###
    
    # Rotate the tick labels and set their alignment.
    plt.setp(ax.get_xticklabels(), rotation=45, ha="right",
             rotation_mode="anchor")

    ax.set_title("Attention visualisation for our Transformer based classifier")
    fig.tight_layout()
    plt.show()

In [None]:
heatmap_attention("@VirginAmerica and it's a really big bad thing about it")

### Bonus task
In order to make the task simpler, we have so far ignored neutral tweets.

In [None]:
full_dataset[(full_dataset['airline_sentiment']=='neutral')].head()

**Design a way of predicting that the tweet is neutral**

You may use the attribute **logits** from the model's output

In [None]:
### to do ###

## Section 2 - More on HuggingFace's transformers

We have used a pre-trained model as is, but we can also fine-tune it for our specific task or dataset. You can find the instructions for doing so in [this tutorial](https://github.com/huggingface/notebooks/blob/master/examples/text_classification.ipynb)

Furthermore, there's an even simpler way of using HuggingFace's pre-trained models: [pipelines](https://github.com/huggingface/transformers/blob/master/notebooks/03-pipelines.ipynb)

In [None]:
from transformers import pipeline

**Example using a sentiment classification task:**

In [None]:
nlp_sentence_classif = pipeline('sentiment-analysis')
nlp_sentence_classif("@VirginAmerica and it's a really big bad thing about it")

**Example of machine translation**

In [None]:
# English to French
translator = pipeline('translation_en_to_fr')
translator("HuggingFace is a French company that is based in New York City. HuggingFace's mission is to solve NLP one commit at a time")