## Introducing Facebook's Natural Language Model: RoBERTa



Keywords to research on your own, relating to the RoBERTa neural network architecture:

1. Attention mechanism underlying transformers
2. BERT
3. Bidirectional encoders
4. LSTM: long short-term memory neural networks
5. Tokenization: the byte-level BPE tokenizer
6. Masked language modeling
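Because the attention mechanism (item 1) is the building block of every RoBERTa layer, here is a minimal pure-Python sketch of scaled dot-product attention, softmax(QK^T / sqrt(d_k)) V, for a single head with no masking. The toy `Q`, `K` and `V` matrices are made up for illustration; RoBERTa itself uses multi-head attention over batched tensors with learned query, key and value projections.

```python
import math

def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]

def matmul(a, b):
    """Multiply an (n x k) matrix by a (k x m) matrix, both stored as lists of rows."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]

def transpose(m):
    """Swap rows and columns."""
    return [list(col) for col in zip(*m)]

def scaled_dot_product_attention(q, k, v):
    """Return (softmax(Q K^T / sqrt(d_k)) V, attention weights) for one head."""
    d_k = len(k[0])
    scores = matmul(q, transpose(k))                  # similarity of each query to each key
    weights = [softmax([s / math.sqrt(d_k) for s in row]) for row in scores]
    return matmul(weights, v), weights                # weighted average of the value rows

# Toy example: 3 tokens with d_k = 2 (arbitrary values)
Q = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
K = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
V = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
out, attn = scaled_dot_product_attention(Q, K, V)
```

Each row of `attn` sums to 1, so every output row is a weighted average of the value rows; dividing by sqrt(d_k) keeps the scores from growing with the vector size and saturating the softmax.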
 

 

#### Two NLP libraries for transformer models
1. Facebook's PyTorch library: an open-source machine learning framework that accelerates the path from research prototyping to production deployment.
PyTorch GitHub: https://github.com/pytorch
PyTorch website: https://pytorch.org/

2. Hugging Face Transformers (Python library)
website: https://huggingface.co/transformers/index.html




#### Links to relevant research papers: 

RoBERTa: https://arxiv.org/abs/1907.11692

Attention mechanism underlying the transformer: https://arxiv.org/abs/1706.03762

Tokenization: https://huggingface.co/transformers/tokenizer_summary.html
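Byte-level BPE starts from the raw bytes of the text and repeatedly merges the most frequent adjacent pair of symbols into a new symbol. The sketch below learns a few merges on a toy corpus to show the idea. It is a simplified illustration, not the GPT-2/RoBERTa tokenizer, which also pre-splits text into words, maps bytes to printable characters and ships a fixed vocabulary of about 50K subword units.

```python
from collections import Counter

def learn_bpe(text, num_merges):
    """Learn BPE merges over the UTF-8 bytes of `text`.

    Symbols start as single bytes; each merge fuses the most frequent
    adjacent pair into one new, longer bytes symbol.
    """
    seq = [bytes([b]) for b in text.encode("utf-8")]
    merges = []
    for _ in range(num_merges):
        pairs = Counter(zip(seq, seq[1:]))
        if not pairs:
            break
        (a, b), count = pairs.most_common(1)[0]
        if count < 2:
            break                      # no pair repeats, so nothing worth merging
        merges.append((a, b))
        merged, i = [], 0
        while i < len(seq):
            if i + 1 < len(seq) and seq[i] == a and seq[i + 1] == b:
                merged.append(a + b)   # fuse the pair into one symbol
                i += 2
            else:
                merged.append(seq[i])
                i += 1
        seq = merged
    return merges, seq

merges, symbols = learn_bpe("low lower lowest", num_merges=3)
print(merges)
print(symbols)
```

Because every string is a sequence of bytes, a byte-level vocabulary can encode any input without an "unknown" token; frequent byte sequences such as `b"low"` simply become single symbols.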


## Import a pretrained RoBERTa model from Facebook's fairseq repository on PyTorch Hub

https://pytorch.org/hub/pytorch_fairseq_roberta/
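The pretrained checkpoints on this page come from RoBERTa's masked language modeling pretraining (the `roberta.large.mnli` variant used below is then fine-tuned on MNLI). A minimal sketch of BERT-style masking follows: about 15% of positions are selected, and of those 80% become a mask token, 10% a random token and 10% stay unchanged. `MASK_ID`, `VOCAB_SIZE` and the token ids are hypothetical placeholders, not RoBERTa's real vocabulary values. The RoBERTa paper re-samples the mask each time a sequence is used (dynamic masking), which calling this function afresh every epoch would mimic.

```python
import random

MASK_ID = 4          # hypothetical id of the <mask> token
VOCAB_SIZE = 1000    # hypothetical vocabulary size
IGNORE = -100        # label for positions the loss should skip

def mask_tokens(token_ids, mask_prob=0.15, rng=random):
    """Return (inputs, labels) for masked language modeling.

    labels holds the original id at selected positions and IGNORE elsewhere,
    so the loss is only computed where the model must recover a token.
    """
    inputs = list(token_ids)
    labels = [IGNORE] * len(inputs)
    for i, tok in enumerate(token_ids):
        if rng.random() >= mask_prob:
            continue                               # position not selected
        labels[i] = tok                            # model must predict the original token
        r = rng.random()
        if r < 0.8:
            inputs[i] = MASK_ID                    # 80%: replace with <mask>
        elif r < 0.9:
            inputs[i] = rng.randrange(VOCAB_SIZE)  # 10%: random token
        # remaining 10%: keep the original token unchanged
    return inputs, labels

# A fresh call per epoch gives a new mask pattern (dynamic masking).
rng = random.Random(0)
ids = [101, 57, 893, 12, 640, 33, 7, 250]   # hypothetical token ids
inputs, labels = mask_tokens(ids, rng=rng)
print(inputs)
print(labels)
```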

## Focus on one NLP task and test the model on it: MNLI

## MNLI

"MNLI The Multi-Genre Natural Language Inference Corpus (Williams et al., 2018) is a crowdsourced collection of sentence pairs with textual entailment annotations. Given a premise sentence
and a hypothesis sentence, the task is to predict whether the premise entails the hypothesis (entailment), contradicts the hypothesis (contradiction), or neither (neutral). The premise sentences are
gathered from ten different sources, including transcribed speech, fiction, and government reports.
We use the standard test set, for which we obtained private labels from the authors, and evaluate
on both the matched (in-domain) and mismatched (cross-domain) sections. We also use and recommend the SNLI corpus (Bowman et al., 2015) as 550k examples of auxiliary training data." 

Source: https://openreview.net/pdf?id=rJ4km2R5t7
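Since MNLI is evaluated separately on the matched (in-domain) and mismatched (cross-domain) sections, accuracy is reported per section. A minimal sketch, assuming predictions and gold labels are already available as label strings; the `NLIExample` class, the example pairs and their labels are hypothetical, not real MNLI data or results.

```python
from dataclasses import dataclass

# Label names matching the three MNLI classes used later in this notebook
LABELS = ("Contradiction", "Neutral", "Entailment")

@dataclass
class NLIExample:
    premise: str
    hypothesis: str
    gold: str      # one of LABELS
    section: str   # "matched" or "mismatched"

def accuracy_by_section(examples, predictions):
    """Return {section: accuracy} for parallel lists of examples and predicted labels."""
    correct, total = {}, {}
    for ex, pred in zip(examples, predictions):
        total[ex.section] = total.get(ex.section, 0) + 1
        correct[ex.section] = correct.get(ex.section, 0) + (pred == ex.gold)
    return {s: correct[s] / total[s] for s in total}

# Hypothetical toy examples and predictions
examples = [
    NLIExample("A man is sleeping.", "A person is resting.", "Entailment", "matched"),
    NLIExample("The door is open.", "The door is shut.", "Contradiction", "matched"),
    NLIExample("She reads a novel.", "She likes books.", "Neutral", "mismatched"),
    NLIExample("The shop is closed.", "The shop is open.", "Contradiction", "mismatched"),
]
predictions = ["Entailment", "Neutral", "Neutral", "Contradiction"]
print(accuracy_by_section(examples, predictions))
```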

## Define two sets of sentences to compare

```python
# Premise sentences
s1 = ['I am definitely in the mood to go out for dinner tonight.',
      'My favourite wine comes from the Boudeaux region of France.',
      'The president was shot.']

# Hypothesis sentences
s2 = ['I wonder if we should eat at home this evening.',
      'If I had to pick I would pick the Burgundy over the Bordeaux',
      'The president is dead']
```


```python
# Premise sentences
s1
```

```
['I am definitely in the mood to go out for dinner tonight.',
 'My favourite wine comes from the Boudeaux region of France.',
 'The president was shot.']
```

```python
# Hypothesis sentences
s2
```

```
['I wonder if we should eat at home this evening.',
 'If I had to pick I would pick the Burgundy over the Bordeaux',
 'The president is dead']
```

## Run the model

```python
# Import the torch library and load the large pretrained RoBERTa model.
import torch

# Get RoBERTa from PyTorch Hub: the large model fine-tuned on MNLI.
roberta = torch.hub.load('pytorch/fairseq', 'roberta.large.mnli')
_ = roberta.eval()  # disable dropout so predictions are deterministic
```

```
Using cache found in /Users/brianfarrell/.cache/torch/hub/pytorch_fairseq_master
loading archive file http://dl.fbaipublicfiles.com/fairseq/models/roberta.large.mnli.tar.gz from cache at /Users/brianfarrell/.cache/torch/pytorch_fairseq/7685ba8546f9a5ce1a00c7a6d7d44f7e748d22681172f0f391c3d48f487c801c.74e37d47306b3cc51c5f8d335022a392c29f1906c8cd9e9cd3446d7422cf55d8
| dictionary: 50264 types
```


```python
# Function to test the model on the three premise/hypothesis pairs
def hello_roberta():
    # MNLI class indices used by the roberta.large.mnli classification head
    label = {0: 'Contradiction', 1: 'Neutral', 2: 'Entailment'}

    for premise, hypothesis in zip(s1, s2):
        # Encode premise and hypothesis together as one input sequence
        tokens = roberta.encode(premise, hypothesis)
        # predict() scores each class; argmax picks the most likely one
        prediction = roberta.predict('mnli', tokens).argmax().item()

        print("")
        print(premise, hypothesis)
        print("Roberta's prediction:", [label[prediction]])

hello_roberta()
```


```
I am definitely in the mood to go out for dinner tonight. I wonder if we should eat at home this evening.
Roberta's prediction: ['Contradiction']

My favourite wine comes from the Boudeaux region of France. If I had to pick I would pick the Burgundy over the Bordeaux
Roberta's prediction: ['Contradiction']

The president was shot. The president is dead
Roberta's prediction: ['Entailment']
```
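Taking `argmax` hides how confident the model is. As far as we can tell from the fairseq source, `roberta.predict` returns log-softmax scores by default (raw logits only with `return_logits=True`), so exponentiating them, e.g. `roberta.predict('mnli', tokens).exp()`, gives class probabilities. The sketch below does the same arithmetic in plain Python on made-up log-probabilities so it runs without downloading the model; the `describe` helper is hypothetical.

```python
import math

LABELS = {0: 'Contradiction', 1: 'Neutral', 2: 'Entailment'}

def describe(log_probs):
    """Turn one row of MNLI log-probabilities into (label, {label: probability})."""
    probs = [math.exp(lp) for lp in log_probs]          # undo the log-softmax
    best = max(range(len(probs)), key=probs.__getitem__)
    return LABELS[best], {LABELS[i]: round(p, 3) for i, p in enumerate(probs)}

# Hypothetical log-probabilities for one premise/hypothesis pair
example = [math.log(0.1), math.log(0.3), math.log(0.6)]
label, probs = describe(example)
print(label, probs)
```

Reporting the full distribution makes borderline cases visible: a pair where Contradiction and Neutral are close deserves more scrutiny than one where a single class dominates.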


## View the RoBERTa structure
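As a rough check on the model's size, the parameter count can be worked out from the dimensions in the printed structure (hidden size 1024, feed-forward size 4096, 50,265 token embeddings, 514 positions), assuming 24 encoder layers as in RoBERTa-large and counting only the embeddings and encoder layers (no language-model or classification head). The sketch also assumes separate 1024 x 1024 query, key and value projections with biases, which the printout does not list, plus one LayerNorm after the embeddings. On the loaded model, `sum(p.numel() for p in roberta.model.parameters())` would give the exact figure.

```python
def encoder_layer_params(hidden, ffn):
    """Parameters (weights + biases) in one Transformer encoder layer."""
    attention = 4 * (hidden * hidden + hidden)                     # q, k, v and out projections
    feed_forward = (hidden * ffn + ffn) + (ffn * hidden + hidden)  # fc1 + fc2
    layer_norms = 2 * (2 * hidden)                                 # two LayerNorms (gain + bias)
    return attention + feed_forward + layer_norms

def roberta_params(hidden=1024, ffn=4096, layers=24, vocab=50265, positions=514):
    """Embeddings + encoder stack; excludes the LM and classification heads."""
    # token + position embeddings, plus an embedding LayerNorm (assumed)
    embeddings = vocab * hidden + positions * hidden + 2 * hidden
    return embeddings + layers * encoder_layer_params(hidden, ffn)

print(encoder_layer_params(1024, 4096))  # 12596224
print(roberta_params())                  # 354309120
```

The 24 encoder layers alone account for about 302 million parameters and the embeddings for about 52 million, giving roughly 354 million, close to the 355 million usually quoted for RoBERTa-large.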

```python
# The large RoBERTa model has roughly 355 million parameters.
roberta.model
```

RobertaModel(
  (decoder): RobertaEncoder(
    (sentence_encoder): TransformerSentenceEncoder(
      (embed_tokens): Embedding(50265, 1024, padding_idx=1)
      (embed_positions): LearnedPositionalEmbedding(514, 1024, padding_idx=1)
      (layers): ModuleList(
        (0): TransformerSentenceEncoderLayer(
          (self_attn): MultiheadAttention(
            (out_proj): Linear(in_features=1024, out_features=1024, bias=True)
          )
          (self_attn_layer_norm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
          (fc1): Linear(in_features=1024, out_features=4096, bias=True)
          (fc2): Linear(in_features=4096, out_features=1024, bias=True)
          (final_layer_norm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
        )
        (1): TransformerSentenceEncoderLayer(
          (self_attn): MultiheadAttention(
            (out_proj): Linear(in_features=1024, out_features=1024, bias=True)
          )
          (self_attn_layer_norm): LayerNorm((1024,)