## **Assignment 3 - Deep Learning with RNNs**

### **Haoyuan Qin (001048178)**

### **Abstract**

A recurrent neural network is a class of artificial neural networks where connections between nodes form a directed or undirected graph along a temporal sequence. This allows it to exhibit temporal dynamic behavior. Derived from feedforward neural networks, RNNs can use their internal state (memory) to process variable length sequences of inputs. This makes them applicable to tasks such as unsegmented, connected handwriting recognition or speech recognition.

The following models are from [Hugging Face](https://huggingface.co/models). 

Most of the work relies on the Transformers library, whose goal is to provide a single API through which any Transformer model can be loaded, trained, and saved. Its `pipeline` function groups together three steps: preprocessing with a tokenizer (to convert the text inputs into numbers that the model can make sense of), passing the inputs through the model, and postprocessing the output (to convert logits, the raw, unnormalized scores produced by the last layer of the model, into probabilities).
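The postprocessing step can be illustrated without downloading a model. The sketch below, in plain Python, turns a list of logits into probabilities with a softmax. The logit values and label names are invented for illustration and are not output from any model in this report.

```python
import math

def softmax(logits):
    """Convert raw, unnormalized scores into probabilities that sum to 1."""
    # Subtracting the maximum keeps exp() numerically stable
    # and does not change the result.
    shift = max(logits)
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical logits for two classes (values invented for illustration).
labels = ["NEGATIVE", "POSITIVE"]
logits = [-3.2, 4.1]

probs = softmax(logits)
for label, p in zip(labels, probs):
    print(f"{label}: {p:.4f}")
```

The larger logit receives most of the probability mass, which is why pipeline scores for confident predictions are close to 1.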

For each question, I followed the official documentation, ran the standard example, and then tested the model on my own data to evaluate its performance.

### **Questions:**

**1. Fill-Mask (10 Points)**    Run a Fill-Mask language model. Explain the theory behind your model, and run it.  Analyze how well you think it worked.

In [None]:
pip install transformers

Successfully installed huggingface-hub-0.10.1 tokenizers-0.13.2 transformers-4.24.0


In [None]:
pip install SentencePiece

Successfully installed SentencePiece-0.1.97


In [None]:
# Example of Hugging face 
from transformers import pipeline
unmasker = pipeline('fill-mask', model='bert-base-uncased')
unmasker("Hello I'm a [MASK] model.")

Some weights of the model checkpoint at bert-base-uncased were not used when initializing BertForMaskedLM: ['cls.seq_relationship.bias', 'cls.seq_relationship.weight']
- This IS expected if you are initializing BertForMaskedLM from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForMaskedLM from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).


[{'score': 0.10731087625026703,
  'token': 4827,
  'token_str': 'fashion',
  'sequence': "hello i'm a fashion model."},
 {'score': 0.08774493634700775,
  'token': 2535,
  'token_str': 'role',
  'sequence': "hello i'm a role model."},
 {'score': 0.05338375270366669,
  'token': 2047,
  'token_str': 'new',
  'sequence': "hello i'm a new model."},
 {'score': 0.046672236174345016,
  'token': 3565,
  'token_str': 'super',
  'sequence': "hello i'm a super model."},
 {'score': 0.027095822617411613,
  'token': 2986,
  'token_str': 'fine',
  'sequence': "hello i'm a fine model."}]

In [None]:
# test of Hugging face 
from transformers import pipeline
unmasker = pipeline('fill-mask', model='bert-base-uncased')
unmasker("\"Now\" he said, you’d better meet my wife. She is the one who really [MASK] this school.")

[{'score': 0.4772254526615143,
  'token': 2318,
  'token_str': 'started',
  'sequence': '" now " he said, you ’ d better meet my wife. she is the one who really started this school.'},
 {'score': 0.1604544073343277,
  'token': 3216,
  'token_str': 'runs',
  'sequence': '" now " he said, you ’ d better meet my wife. she is the one who really runs this school.'},
 {'score': 0.035816799849271774,
  'token': 2631,
  'token_str': 'founded',
  'sequence': '" now " he said, you ’ d better meet my wife. she is the one who really founded this school.'},
 {'score': 0.03256159648299217,
  'token': 2580,
  'token_str': 'created',
  'sequence': '" now " he said, you ’ d better meet my wife. she is the one who really created this school.'},
 {'score': 0.030960548669099808,
  'token': 2328,
  'token_str': 'built',
  'sequence': '" now " he said, you ’ d better meet my wife. she is the one who really built this school.'}]

This model is the base, uncased version of BERT (Bidirectional Encoder Representations from Transformers). According to the paper by Jacob Devlin et al., there are two steps in the framework: pre-training and fine-tuning. During pre-training, the model is trained on unlabeled data over two pre-training tasks, masked language modeling and next sentence prediction; the fill-mask pipeline uses the masked language modeling head directly. For fine-tuning, the BERT model is first initialized with the pre-trained parameters, and all of the parameters are fine-tuned using labeled data from the downstream tasks. Each downstream task has its own fine-tuned model, even though they are all initialized with the same pre-trained parameters.
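One of BERT's pre-training tasks is masked language modeling, which is what the fill-mask pipeline exposes. The sketch below shows how a training input might be corrupted: about 15% of tokens are selected, and of those 80% become `[MASK]`, 10% become a random token, and 10% are left unchanged. It is a simplification under stated assumptions: whitespace tokens stand in for WordPiece tokens, each position is selected independently, and `mask_tokens` and the tiny vocabulary are hypothetical names made up for this example.

```python
import random

def mask_tokens(tokens, vocab, mask_prob=0.15, seed=0):
    """Return (corrupted tokens, labels) for masked language modeling."""
    rng = random.Random(seed)
    corrupted, labels = [], []
    for tok in tokens:
        if rng.random() < mask_prob:
            # Selected position: the model must predict the original token.
            labels.append(tok)
            r = rng.random()
            if r < 0.8:
                corrupted.append("[MASK]")           # 80%: mask token
            elif r < 0.9:
                corrupted.append(rng.choice(vocab))  # 10%: random token
            else:
                corrupted.append(tok)                # 10%: unchanged
        else:
            # Unselected position: kept as-is and ignored by the loss.
            labels.append(None)
            corrupted.append(tok)
    return corrupted, labels

tokens = "she is the one who really runs this school".split()
vocab = sorted(set(tokens))  # toy vocabulary, for illustration only
# A higher mask_prob than 0.15 is used so that the short sentence gets masked.
corrupted, labels = mask_tokens(tokens, vocab, mask_prob=0.3, seed=1)
print(corrupted)
print(labels)
```

At inference time the fill-mask pipeline performs only the prediction half: the user places `[MASK]` by hand and the model ranks candidate tokens for that position.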

I chose a cloze question from China's college entrance examination as the test sentence. The correct answer is "runs", which the model ranked second (score 0.16) behind "started" (score 0.48); I think "started" is also acceptable when the sentence is read without further context. The model therefore does well on this sentence. One weakness, however, is that it cannot make judgments based on long-range or implicit context. If the sentence around the mask is too neutral, the predictions can deviate considerably from the intended answer.

**2. Question Answering (10 Points)**  Run a Question Answering language model. Explain the theory behind your model, and run it.  Analyze how well you think it worked.

In [None]:
# Example of Hugging face

from transformers import pipeline

question_answerer = pipeline("question-answering")

context = r"""
Extractive Question Answering is the task of extracting an answer from a text given a question. 
An example of a question answering dataset is the SQuAD dataset, which is entirely based on that 
task. If you would like to fine-tune a model on a SQuAD task, you may leverage the 
examples/pytorch/question-answering/run_squad.py script.
"""

result = question_answerer(question="What is extractive question answering?", context=context)
print(f"Answer: '{result['answer']}', score: {round(result['score'], 4)}, start: {result['start']}, end: {result['end']}")

result = question_answerer(question="What is a good example of a question answering dataset?", context=context)
print(f"Answer: '{result['answer']}', score: {round(result['score'], 4)}, start: {result['start']}, end: {result['end']}")

No model was supplied, defaulted to distilbert-base-cased-distilled-squad and revision 626af31 (https://huggingface.co/distilbert-base-cased-distilled-squad).
Using a pipeline without specifying a model name and revision in production is not recommended.


Answer: 'the task of extracting an answer from a text given a question', score: 0.6177, start: 34, end: 95
Answer: 'SQuAD dataset', score: 0.5152, start: 148, end: 161


In [None]:
# test of Hugging face

from transformers import pipeline

question_answerer = pipeline("question-answering")

context = r"""
The Intelligent Transport team at Newcastle University have turned an electric car into a mobile laboratory named 
“DriveLAB” in order to understand the challenges faced by older drivers and to discover where the key stress points are.
Research shows that giving up driving is one of the key reasons for a fall in health and well-being among older people, 
leading to them becoming more isolated and inactive.
Led by Professor Phil Blythe, the Newcastle team are developing in-vehicle technologies for older drivers which they hope 
could help them to continue driving into later life.
These include custom-made navigation tools, night vision systems and intelligent speed adaptations. Phil Blythe explains: 
“For many older people, particularly those living alone or in the country, driving is important for preserving their 
independence, giving them the freedom to get out and about without having to rely on others.”
"But we all have to accept that as we get older our reactions slow down and this often results in people avoiding any 
potentially challenging driving conditions and losing confidence in their driving skills. The result is that people stop 
driving before they really need to."
Dr Amy Guo, the leading researcher on the older driver study, explains, "The DriveLAB is helping us to understand what 
the key points and difficulties are for older drivers and how we might use technology to address these problems.
"For example, most of us would expect older drivers always go slower than everyone else but surprisingly, we found that 
in 30mph zones they struggled to keep at a constant speed and so were more likely to break the speed limit and be at risk 
of getting fined. We’re looking at the benefits of systems which control their speed as a way of preventing that.
"We hope that our work will help with technological solutions to ensure that older drivers stay safer behind the wheel."
"""

result = question_answerer(question="What is the purpose of the Drivel AB?", context=context)
print(f"Answer: '{result['answer']}', score: {round(result['score'], 4)}, start: {result['start']}, end: {result['end']}")

result = question_answerer(question="Why is driving important for older people according to Phil Blythe?", context=context)
print(f"Answer: '{result['answer']}', score: {round(result['score'], 4)}, start: {result['start']}, end: {result['end']}")

result = question_answerer(question="What do researchers hope to do for older drivers?", context=context)
print(f"Answer: '{result['answer']}', score: {round(result['score'], 4)}, start: {result['start']}, end: {result['end']}")

Answer: 'to understand the challenges faced by older drivers', score: 0.1853, start: 136, end: 187
Answer: 'preserving their 
independence', score: 0.3683, start: 810, end: 840
Answer: 'continue driving into later life', score: 0.3947, start: 553, end: 585


Extractive Question Answering is the task of extracting an answer from a text given a question. The official Hugging Face documentation describes the process as follows:

>1. Instantiate a tokenizer and a model from the checkpoint name. The model is identified as a BERT model and loads it with the weights stored in the checkpoint.
>2. Define a text and a few questions.
>3. Iterate over the questions and build a sequence from the text and the current question, with the correct model-specific separators, token type ids and attention masks.
>4. Pass this sequence through the model. This outputs a range of scores across the entire sequence tokens (question and text), for both the start and end positions.
>5. Compute the softmax of the result to get probabilities over the tokens.
>6. Fetch the tokens from the identified start and stop values, convert those tokens to a string.
>7. Print the results.

The question-answering pipeline wraps these steps. It uses a model fine-tuned on SQuAD and returns the answer extracted from the text, a confidence score, and "start" and "end" values, which are the character positions of the extracted answer in the context. On the DriveLAB passage the model returned plausible spans for all three questions, though with lower scores (0.19 to 0.39) than on the documentation example (0.52 and 0.62).
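Steps 4 to 6 of the numbered list above can be sketched without loading a model. The example below takes made-up start and end logits for a short token sequence, applies a softmax to each, and picks the span whose start and end probabilities have the largest product. The tokens, the logit values, the `best_span` helper, and the `max_answer_len` limit are all assumptions made for illustration, not values produced by the model used in this report.

```python
import math

def softmax(scores):
    """Turn raw scores into probabilities (numerically stable)."""
    shift = max(scores)
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def best_span(start_logits, end_logits, max_answer_len=15):
    """Return (start, end, score) of the most probable valid span."""
    start_probs = softmax(start_logits)
    end_probs = softmax(end_logits)
    best = (0, 0, 0.0)
    for s, p_start in enumerate(start_probs):
        # Only consider spans that end at or after their start.
        for e in range(s, min(s + max_answer_len, len(end_probs))):
            score = p_start * end_probs[e]
            if score > best[2]:
                best = (s, e, score)
    return best

# Hypothetical tokens and logits (invented for illustration).
tokens = ["An", "example", "is", "the", "SQuAD", "dataset", "."]
start_logits = [0.1, 0.2, 0.0, 1.0, 4.0, 0.5, -1.0]
end_logits = [-1.0, 0.0, 0.1, 0.2, 1.5, 4.5, 0.0]

start, end, score = best_span(start_logits, end_logits)
answer = " ".join(tokens[start:end + 1])
print(f"Answer: '{answer}', score: {round(score, 4)}")
```

Because the score is a product of two probabilities, a long context with several plausible spans spreads the probability mass and tends to lower the reported score.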


**3. Summarization (10 Points)**    Run a Summarization language model. Explain the theory behind your model, and run it.  Analyze how well you think it worked.


In [None]:
# example
from transformers import pipeline

summarizer = pipeline("summarization", model="facebook/bart-large-cnn")

ARTICLE = """ New York (CNN)When Liana Barrientos was 23 years old, she got married in Westchester County, New York.
A year later, she got married again in Westchester County, but to a different man and without divorcing her first husband.
Only 18 days after that marriage, she got hitched yet again. Then, Barrientos declared "I do" five more times, sometimes 
only within two weeks of each other.
In 2010, she married once more, this time in the Bronx. In an application for a marriage license, she stated it was her 
"first and only" marriage.
Barrientos, now 39, is facing two criminal counts of "offering a false instrument for filing in the first degree," 
referring to her false statements on the 2010 marriage license application, according to court documents.
Prosecutors said the marriages were part of an immigration scam.
On Friday, she pleaded not guilty at State Supreme Court in the Bronx, according to her attorney, Christopher Wright, 
who declined to comment further.
After leaving court, Barrientos was arrested and charged with theft of service and criminal trespass for allegedly 
sneaking into the New York subway through an emergency exit, said Detective Annette Markowski, a police spokeswoman. 
In total, Barrientos has been married 10 times, with nine of her marriages occurring between 1999 and 2002.
All occurred either in Westchester County, Long Island, New Jersey or the Bronx. She is believed to still be married to 
four men, and at one time, she was married to eight men at once, prosecutors say.
Prosecutors said the immigration scam involved some of her husbands, who filed for permanent residence status shortly 
after the marriages. Any divorces happened only after such filings were approved. It was unclear whether any of the men 
will be prosecuted.
The case was referred to the Bronx District Attorney\'s Office by Immigration and Customs Enforcement and the Department 
of Homeland Security\'s
Investigation Division. Seven of the men are from so-called "red-flagged" countries, including Egypt, Turkey, Georgia, 
Pakistan and Mali.
Her eighth husband, Rashid Rajput, was deported in 2006 to his native Pakistan after an investigation by the Joint 
Terrorism Task Force.
If convicted, Barrientos faces up to four years in prison.  Her next court appearance is scheduled for May 18.
"""
print(summarizer(ARTICLE, max_length=130, min_length=30, do_sample=False))


[{'summary_text': 'Liana Barrientos has been married 10 times, sometimes within two weeks of each other. At one time, she was married to eight men at once, prosecutors say. She is believed to still be married to four men.'}]


In [None]:
# test 1
from transformers import pipeline

summarizer = pipeline("summarization", model="facebook/bart-large-cnn")

ARTICLE = """
The Intelligent Transport team at Newcastle University have turned an electric car into a mobile laboratory named 
“DriveLAB” in order to understand the challenges faced by older drivers and to discover where the key stress points are.
Research shows that giving up driving is one of the key reasons for a fall in health and well-being among older people, 
leading to them becoming more isolated and inactive.
Led by Professor Phil Blythe, the Newcastle team are developing in-vehicle technologies for older drivers which they hope 
could help them to continue driving into later life.
These include custom-made navigation tools, night vision systems and intelligent speed adaptations. Phil Blythe explains: 
“For many older people, particularly those living alone or in the country, driving is important for preserving their 
independence, giving them the freedom to get out and about without having to rely on others.”
"But we all have to accept that as we get older our reactions slow down and this often results in people avoiding any 
potentially challenging driving conditions and losing confidence in their driving skills. The result is that people stop 
driving before they really need to."
Dr Amy Guo, the leading researcher on the older driver study, explains, "The DriveLAB is helping us to understand what 
the key points and difficulties are for older drivers and how we might use technology to address these problems.
"For example, most of us would expect older drivers always go slower than everyone else but surprisingly, we found that 
in 30mph zones they struggled to keep at a constant speed and so were more likely to break the speed limit and be at risk 
of getting fined. We’re looking at the benefits of systems which control their speed as a way of preventing that.
"We hope that our work will help with technological solutions to ensure that older drivers stay safer behind the wheel."
"""
print(summarizer(ARTICLE, max_length=130, min_length=30, do_sample=False))


[{'summary_text': 'Newcastle University is developing in-vehicle technologies for older drivers. These include navigation tools, night vision systems and intelligent speed adaptations. Research shows that giving up driving is one of the key reasons for a fall in health and well-being among older people.'}]


In [None]:
# test 2
from transformers import pipeline

summarizer = pipeline("summarization", model="facebook/bart-large-cnn")

ARTICLE = """
Bill: I like eating pizza, how about you?
Alex: I feel it awful, I prefer steaks.
"""
print(summarizer(ARTICLE, max_length=20, min_length=10, do_sample=False))


[{'summary_text': 'Alex: I feel it awful, I prefer steaks. Bill: I like eating'}]


In summarization, the input sequence is the document to be summarized and the output sequence is its summary. The bart-large-cnn model is BART-large fine-tuned on the CNN/Daily Mail summarization dataset. Because BART is a sequence-to-sequence (encoder-decoder) model, it can be applied to summarization directly, without architectural changes, and its denoising pre-training objective is well suited to this kind of downstream generation task.

In terms of method, summarization falls into two categories: extractive and abstractive. The former copies sentences directly from the original text, while the latter generates the summary word by word. Extractive methods cannot rephrase or merge information across sentences, so they may not cover the original content well. Abstractive methods are more flexible but more error-prone. For example, in test 2 (the third run above) the output swaps the order of the two speakers and cuts Bill's sentence off, so it no longer reflects the original dialogue.
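For contrast with the abstractive BART model, here is a minimal sketch of an extractive baseline. It is a simple word-frequency heuristic, not the method used by any model in this report: each sentence is scored by the average frequency of its words in the whole text, and the top-scoring sentences are returned in their original order. The function name `extractive_summary` and the sample text are made up for this example.

```python
import re
from collections import Counter

def extractive_summary(text, num_sentences=1):
    """Pick the sentences whose words are most frequent in the text."""
    # Split after sentence-ending punctuation followed by whitespace.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s.strip()]
    freq = Counter(re.findall(r"[a-z']+", text.lower()))

    def score(sentence):
        words = re.findall(r"[a-z']+", sentence.lower())
        return sum(freq[w] for w in words) / max(len(words), 1)

    # Rank sentence indices by score, keep the best, restore original order.
    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    chosen = sorted(ranked[:num_sentences])
    return " ".join(sentences[i] for i in chosen)

# Invented sample text, for illustration only.
text = ("Older drivers often stop driving too early. "
        "The team builds tools for older drivers. "
        "The weather was fine.")
summary = extractive_summary(text, num_sentences=1)
print(summary)
```

Every sentence in the output is copied verbatim from the input, so this kind of method cannot contradict the source, but it also cannot shorten or combine sentences.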

Before putting such a model into use, it is important to check that its summaries are consistent with the meaning of the original text.

**4. Text Classification (10 Points)** Run a Text Classification language model. Explain the theory behind your model, and run it.  Analyze how well you think it worked.


In [None]:
# example
from transformers import pipeline
classifier = pipeline("text-classification",
                      model='distilbert-base-uncased-finetuned-sst-2-english',
                      return_all_scores=True)
prediction = classifier(
    "I love using transformers. The best part is wide range of support and its easy to use")
print(prediction)

[[{'label': 'NEGATIVE', 'score': 0.0007330990047194064}, {'label': 'POSITIVE', 'score': 0.9992669224739075}]]




In [None]:
# test 1
from transformers import pipeline
classifier = pipeline(
    "text-classification",model='distilbert-base-uncased-finetuned-sst-2-english', 
    return_all_scores=True)
prediction = classifier("Nobody doesn't like this restaurant.")
print(prediction)

[[{'label': 'NEGATIVE', 'score': 0.9927523136138916}, {'label': 'POSITIVE', 'score': 0.0072477348148822784}]]


This model is a checkpoint of DistilBERT-base-uncased fine-tuned on SST-2. According to its model card, its accuracy on the SST-2 development set is close to that of the full BERT-base model. Models such as BERT, which are pre-trained on large text corpora, usually need only fine-tuning to perform well on many NLP tasks. For sentiment classification, one or two linear layers are added on top of the last layer to output class probabilities, cross entropy is used as the loss, and the model is trained on a comparatively small labeled dataset. DistilBERT is an encoder-only model, so there is no decoder: the final hidden state of the first ([CLS]) token is fed to the classification (fully connected) layer to obtain the final result.
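The classification head described above (a linear layer, class probabilities, and a cross-entropy loss) can be written out in a few lines of plain Python. This is a sketch under stated assumptions: the 3-dimensional sentence vector, the weights, and the bias are invented numbers, whereas a real model uses a much larger hidden size and learned parameters.

```python
import math

def linear(x, weights, bias):
    """Compute one logit per class: weights @ x + bias."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]

def softmax(logits):
    """Turn logits into class probabilities (numerically stable)."""
    shift = max(logits)
    exps = [math.exp(v - shift) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(probs, target_index):
    """Negative log-probability assigned to the correct class."""
    return -math.log(probs[target_index])

# Hypothetical sentence representation and head parameters (invented values).
hidden = [0.5, -1.0, 2.0]
weights = [[0.2, 0.4, -0.5],   # row for class 0 (NEGATIVE)
           [-0.3, 0.1, 0.8]]   # row for class 1 (POSITIVE)
bias = [0.0, 0.1]

logits = linear(hidden, weights, bias)
probs = softmax(logits)
loss = cross_entropy(probs, target_index=1)  # assume the true label is POSITIVE
print(logits, probs, loss)
```

During fine-tuning, this loss is minimized over the labeled examples, which updates both the head and the pre-trained encoder weights.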

This approach judges whether the overall sentiment of the input is positive or negative, apparently by picking up the sentiment of key words. Its advantage is that it can reduce long film or restaurant reviews to a single label, but it does not handle indirect expressions well: the double negative "Nobody doesn't like this restaurant." is classified as NEGATIVE with a score of 0.99, although it means that everyone likes the restaurant.

**5. Text Generation (10 Points)** Run a Text Generation language model. Explain the theory behind your model, and run it.  Analyze how well you think it worked.

In [None]:
# example
from transformers import pipeline, set_seed
generator = pipeline('text-generation', model='gpt2')
set_seed(42)
generator("Hello, I'm a language model,", max_length=30, num_return_sequences=5)

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


[{'generated_text': "Hello, I'm a language model, I'm writing a new language for you. But first, I'd like to tell you about the language itself"},
 {'generated_text': "Hello, I'm a language model, and I'm trying to be as expressive as possible. In order to be expressive, it is necessary to know"},
 {'generated_text': "Hello, I'm a language model, so I don't get much of a license anymore, but I'm probably more familiar with other languages on that"},
 {'generated_text': "Hello, I'm a language model, a functional model... It's not me, it's me!\n\nI won't bore you with how"},
 {'generated_text': "Hello, I'm a language model, not an object model.\n\nIn a nutshell, I need to give language model a set of properties that"}]

In [None]:
# test 1
from transformers import pipeline, set_seed
generator = pipeline('text-generation', model='gpt2')
set_seed(42)
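# Feed each generated text back in as the next prompt, raising max_length by 30 per round.
# Note: on the first iteration i == 0, so max_length is 0, which triggers the warning below.
prompt = "Hello, I'm a language model,"  # named `prompt` to avoid shadowing the built-in input()
for i in range(0,5):
    g = generator(prompt, max_length=30*i, num_return_sequences=2)
    prompt = g[0]['generated_text']
    print(prompt)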

Input length of input_ids is 8, but `max_length` is set to 0. This can lead to unexpected behavior. You should consider increasing `max_new_tokens`.


Hello, I'm a language model, I


Hello, I'm a language model, I've got some more features to add and we'll get it done. In fact, I think there's


Hello, I'm a language model, I've got some more features to add and we'll get it done. In fact, I think there's a lot of potential here. In that respect, I wish it would be possible. If there's potential, what's the next step? And I


Hello, I'm a language model, I've got some more features to add and we'll get it done. In fact, I think there's a lot of potential here. In that respect, I wish it would be possible. If there's potential, what's the next step? And I've kind of got these ideas, as it turns out, and the next step is going forward, but we're at this very early stage, and
Hello, I'm a language model, I've got some more features to add and we'll get it done. In fact, I think there's a lot of potential here. In that respect, I wish it would be possible. If there's potential, what's the next step? And I've kind of got these ideas, as it turns out, and the next step is going forward, but we're at this very early stage, and it's kind of very early, and we'll see what's happened in the next round."

Hilbert didn't reveal that his work


In [None]:
# test 2
from transformers import pipeline, set_seed
generator = pipeline('text-generation', model='gpt2')
set_seed(42)
generator("Hello, I'm a language model,", max_length=300, num_return_sequences=2)

[{'generated_text': 'Hello, I\'m a language model, I\'ve got some more features to add and we\'ll get it done. In fact, I think there\'s a lot of potential here. In that respect, I wish it would be possible. If there\'s potential, what\'s the next step? And I\'ve kind of got these ideas, as it turns out, and the next step is going forward, but we\'re at this very early stage, and it\'s kind of very early, and we\'ll see what\'s happened in the next round."\n\nHilbert didn\'t reveal that his work here was just the beginning of the work that would go into his application. But he acknowledged that work was still to come.\n\n"We\'ll try to do some more," said the Michigan native in a prepared statement. "I don\'t really know everything yet. But I\'m sure we\'ll get it out that we\'re pretty sure it has some things that could be done that are very early on."'},
 {'generated_text': 'Hello, I\'m a language model, and a language has to respect the language."\n\nKrebs said he\'s always made sur

In [None]:
# test 3
from transformers import pipeline, set_seed
generator = pipeline('text-generation', model='gpt2')
set_seed(42)
generator("I used to be a positive man, until", max_length=300, num_return_sequences=2)

[{'generated_text': "I used to be a positive man, until I realised that I'm actually a negative man... it's just like I think what I'm doing is wrong.\n\nBENNETT : No man has ever done anything like that. If you did, it would have been really obvious. And I've had people say 'what are you thinking all of a sudden?' because you're going, 'what do you mean? Who am I writing about? Why are you writing this?'\n\nSIMON : I've got no idea what you're talking about.\n\nBENNETT : I understand, if you do it a few times over then it may come back again. It's just you have to recognise that you're out there doing that one little thing that's important to you in life, not that you're making anything up, or that you don't believe what you're talking about.\n\nSIMON : So do you?\n\nBENNETT : I do. I can hardly think of a time where that's something else I would ever write about, which is never going to happen. It would be funny to say that.\n\nSIMON : How are you now?\n\nBENNETT : I've been here for

For the text generation task, I used the GPT-2 model developed by OpenAI. This model was trained on (and evaluated against) WebText, a dataset consisting of the text contents of 45 million links posted by users of the ‘Reddit’ social network. WebText is made of data derived from outbound links from Reddit and does not consist of data taken directly from Reddit itself. The model was trained to guess the next word in a sentence. GPT-2 is a decoder-only Transformer, so there is no separate encoder: the prompt and the tokens generated so far are fed back into the same model, which produces the continuation autoregressively, one token at a time.

More precisely, inputs are sequences of continuous text of a certain length and the targets are the same sequence, shifted one token (word or piece of word) to the right. Internally, the model uses a masking mechanism to make sure the prediction for token `i` only uses the inputs from `1` to `i`, and not the future tokens.
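The shifted targets and the causal mask can be shown with a small plain-Python sketch. The token ids are arbitrary integers chosen for illustration, not real GPT-2 vocabulary ids, and `make_training_pair` and `causal_mask` are hypothetical helper names.

```python
def make_training_pair(token_ids):
    """Inputs are all tokens but the last; targets are the same tokens shifted by one."""
    inputs = token_ids[:-1]
    targets = token_ids[1:]
    return inputs, targets

def causal_mask(n):
    """mask[i][j] == 1 means position i may attend to position j (only j <= i)."""
    return [[1 if j <= i else 0 for j in range(n)] for i in range(n)]

# Arbitrary ids standing in for a tokenized sentence (not real GPT-2 ids).
token_ids = [10, 11, 12, 13, 14]
inputs, targets = make_training_pair(token_ids)
mask = causal_mask(len(inputs))

print("inputs: ", inputs)
print("targets:", targets)
for row in mask:
    print(row)
```

Each row of the mask has ones only up to its own position, so the prediction at a position never sees the token it is supposed to predict.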

In tests 1 and 2, I found little difference between generating 300 tokens in one call and extending the text step by step in increments of 30 tokens, which suggests that stepwise generation would not add much variety if the model were used as a writing aid. The generated text is nevertheless fluent and free of grammatical errors in the non-story paragraphs.



**6. Text2Text Generation (10 Points)** Run a Text2Text language model. Explain the theory behind your model, and run it.  Analyze how well you think it worked.

In [None]:
import torch
from transformers import PegasusForConditionalGeneration, PegasusTokenizer
model_name = 'tuner007/pegasus_paraphrase'
torch_device = 'cuda' if torch.cuda.is_available() else 'cpu'
tokenizer = PegasusTokenizer.from_pretrained(model_name)
model = PegasusForConditionalGeneration.from_pretrained(model_name).to(torch_device)

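def get_response(input_text, num_return_sequences, num_beams):
    # Tokenize the single input sentence, truncating it to at most 60 tokens.
    batch = tokenizer([input_text], truncation=True, padding='longest',
                      max_length=60, return_tensors="pt").to(torch_device)
    # Generate with beam search. Note: temperature is a sampling parameter and
    # likely has no effect here, because do_sample is not enabled.
    translated = model.generate(**batch, max_length=60, num_beams=num_beams,
                                num_return_sequences=num_return_sequences, temperature=1.5)
    # Decode the generated token ids back into strings.
    tgt_text = tokenizer.batch_decode(translated, skip_special_tokens=True)
    return tgt_text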

In [None]:
# example
num_beams = 10
num_return_sequences = 10
context = "The ultimate test of your knowledge is your capacity to convey it to another."
get_response(context,num_return_sequences,num_beams)

['The test of your knowledge is your ability to convey it.',
 'The ability to convey your knowledge is the ultimate test of your knowledge.',
 'The ability to convey your knowledge is the most important test of your knowledge.',
 'Your capacity to convey your knowledge is the ultimate test of it.',
 'The test of your knowledge is your ability to communicate it.',
 'Your capacity to convey your knowledge is the ultimate test of your knowledge.',
 'Your capacity to convey your knowledge to another is the ultimate test of your knowledge.',
 'Your capacity to convey your knowledge is the most important test of your knowledge.',
 'The test of your knowledge is how well you can convey it.',
 'Your capacity to convey your knowledge is the ultimate test.']

In [None]:
# test
num_beams = 10
num_return_sequences = 10
context = "As the chart above shows, revenues in 2009 more than doubled from 2008."
get_response(context,num_return_sequences,num_beams)

['The chart shows that revenues more than doubled in 2009.',
 'In 2009, revenues more than doubled from the year before.',
 'In 2009, revenues more than doubled from 2008.',
 'Revenues more than doubled from 2008 as shown in the chart above.',
 'In 2009, revenues more than doubled.',
 'Revenues more than doubled from 2008 to 2009.',
 'The chart shows that revenues doubled from 2008 to 2009.',
 'In 2009, revenues more than doubled from the previous year.',
 'The chart shows that revenues doubled in 2009.',
 'The chart shows that in 2009, revenues more than doubled.']

Text2TextGeneration is the pipeline for text-to-text generation with seq2seq models. A single pipeline covers many NLP tasks, such as question answering, sentiment classification, question generation, translation, paraphrasing, and summarization, so it acts as a more general version of the task-specific pipelines.

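I used the model pegasus_paraphrase, which generates alternative phrasings of an input sentence and can be used to vary sentence patterns in writing. The results appear to be produced by extracting and recombining parts of the input. They are grammatical and mostly faithful, although a few candidates lose detail (for example, "more than doubled" becomes "doubled" in two of the test outputs). Overall it looks adequate for its purpose.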

**7. Token Classification (10 Points)** Run a Token Classification language model. Explain the theory behind your model, and run it.  Analyze how well you think it worked.


In [None]:
from transformers import AutoTokenizer, AutoModelForTokenClassification
from transformers import pipeline

tokenizer = AutoTokenizer.from_pretrained("dslim/bert-base-NER")
model = AutoModelForTokenClassification.from_pretrained("dslim/bert-base-NER")

# example
nlp = pipeline("ner", model=model, tokenizer=tokenizer)
nlp("My name is Wolfgang and I live in Berlin")

[{'entity': 'B-PER',
  'score': 0.9990139,
  'index': 4,
  'word': 'Wolfgang',
  'start': 11,
  'end': 19},
 {'entity': 'B-LOC',
  'score': 0.999645,
  'index': 9,
  'word': 'Berlin',
  'start': 34,
  'end': 40}]

In [None]:
# test
nlp("Keira Knightley got married in a low-key in the south of France on Saturday")

[{'entity': 'B-PER',
  'score': 0.99957126,
  'index': 1,
  'word': 'Ke',
  'start': 0,
  'end': 2},
 {'entity': 'B-PER',
  'score': 0.995046,
  'index': 2,
  'word': '##ira',
  'start': 2,
  'end': 5},
 {'entity': 'I-PER',
  'score': 0.999683,
  'index': 3,
  'word': 'Knight',
  'start': 6,
  'end': 12},
 {'entity': 'I-PER',
  'score': 0.99626935,
  'index': 4,
  'word': '##ley',
  'start': 12,
  'end': 15},
 {'entity': 'B-LOC',
  'score': 0.99964833,
  'index': 16,
  'word': 'France',
  'start': 57,
  'end': 63}]

bert-base-NER is a fine-tuned BERT model that is ready to use for Named Entity Recognition (NER) and, according to its model card, achieves state-of-the-art performance on the NER task. It has been trained to recognize four types of entities: location (LOC), organization (ORG), person (PER), and miscellaneous (MISC).

Specifically, this model is a bert-base-cased model that was fine-tuned on the English version of the standard CoNLL-2003 Named Entity Recognition dataset.

For token classification, the complete input is passed through the BERT encoder (BERT has no decoder). The last layer's hidden state at each position serves as that token's representation, and a classification layer on top assigns a label to each token to produce the final output.

The training dataset distinguishes between the beginning and the continuation of an entity, so that if there are back-to-back entities of the same type, the model can output where the second entity begins. As in the dataset, each token is classified either as O (outside any named entity) or with a B- (beginning) or I- (continuation) tag for one of the four entity types, for example B-PER or I-LOC.
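To make the B-/I- tagging scheme concrete, here is a minimal sketch of how a tag sequence can be grouped into entity spans. The function name `bio_to_spans` and the sample tokens and tags are hypothetical and made up for illustration; they are not output from the model.

```python
def bio_to_spans(tokens, tags):
    """Group (token, BIO tag) pairs into (entity_type, text) spans."""
    spans = []
    current_type, current_tokens = None, []
    for token, tag in zip(tokens, tags):
        prefix, _, entity_type = tag.partition("-")
        # A span continues only on an I- tag of the same entity type.
        if prefix == "I" and entity_type == current_type:
            current_tokens.append(token)
            continue
        # Any other tag closes the currently open span, if there is one.
        if current_tokens:
            spans.append((current_type, " ".join(current_tokens)))
        if prefix in ("B", "I"):
            current_type, current_tokens = entity_type, [token]
        else:  # "O": outside any entity
            current_type, current_tokens = None, []
    if current_tokens:
        spans.append((current_type, " ".join(current_tokens)))
    return spans


# Hypothetical input: two person names back to back, then a location.
tokens = ["Alice", "Smith", "Bob", "visited", "Paris"]
tags = ["B-PER", "I-PER", "B-PER", "O", "B-LOC"]
print(bio_to_spans(tokens, tags))
```

The second B-PER tag is what separates "Bob" from "Alice Smith"; without a distinct beginning tag the three tokens would be merged into one person entity.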

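It shows high accuracy on the example input. On other input, however, the raw output splits a name into WordPiece sub-tokens ("Keira" appears as "Ke" and "##ira", with "##ira" tagged B-PER rather than I-PER), and it can also highlight the wrong words.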
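The split names can be repaired after the fact. The sketch below merges `##` continuation pieces back into whole words, using the test output above as input; `merge_wordpieces` is a hypothetical helper, and keeping the first piece's label and the lowest score are assumptions made for illustration.

```python
def merge_wordpieces(entities):
    """Merge '##' continuation pieces into the word that precedes them."""
    words = []
    for ent in entities:
        piece = ent["word"]
        # A continuation piece starts with '##' and begins where the
        # previous piece ended.
        if piece.startswith("##") and words and words[-1]["end"] == ent["start"]:
            prev = words[-1]
            prev["word"] += piece[2:]
            prev["end"] = ent["end"]
            # Assumption: report the least confident piece's score.
            prev["score"] = min(prev["score"], ent["score"])
        else:
            words.append(dict(ent))
    return words


# Token-level output copied from the test above ('index' omitted).
raw = [
    {"entity": "B-PER", "score": 0.99957126, "word": "Ke", "start": 0, "end": 2},
    {"entity": "B-PER", "score": 0.995046, "word": "##ira", "start": 2, "end": 5},
    {"entity": "I-PER", "score": 0.999683, "word": "Knight", "start": 6, "end": 12},
    {"entity": "I-PER", "score": 0.99626935, "word": "##ley", "start": 12, "end": 15},
    {"entity": "B-LOC", "score": 0.99964833, "word": "France", "start": 57, "end": 63},
]
merged = merge_wordpieces(raw)
print([(w["entity"], w["word"]) for w in merged])
```

The Transformers token-classification pipeline also accepts an `aggregation_strategy` argument (for example `"simple"`) that groups tokens into entities inside the pipeline; I did not run it here, so how it handles this sentence is untested.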

**8. Translation (10 Points)** Run a Translation language model. Explain the theory behind your model, and run it.  Analyze how well you think it worked.


In [None]:
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM, pipeline
  
tokenizer = AutoTokenizer.from_pretrained("unicamp-dl/translation-en-pt-t5")
model = AutoModelForSeq2SeqLM.from_pretrained("unicamp-dl/translation-en-pt-t5")

# example
enpt_pipeline = pipeline('text2text-generation', model=model, tokenizer=tokenizer)
enpt_pipeline("translate English to Portuguese: I like to eat rice.")


[{'generated_text': 'Eu gosto de comer arroz.'}]

In [None]:
# test
enpt_pipeline("translate English to Portuguese: I like to eat rice.")

[{'generated_text': 'Eu gosto de comer arroz.'}]

Transfer learning, where a model is first pre-trained on a data-rich task before being fine-tuned on a downstream task, has emerged as a powerful technique in natural language processing (NLP). The effectiveness of transfer learning has given rise to a diversity of approaches, methodology, and practice. The T5 authors at Google explore the landscape of transfer learning techniques for NLP by introducing a unified framework that converts every language problem into a text-to-text format. According to the model card of T5:

>Our systematic study compares pre-training objectives, architectures, unlabeled datasets, transfer approaches, and other factors on dozens of language understanding tasks. By combining the insights from our exploration with scale and our new “Colossal Clean Crawled Corpus”, we achieve state-of-the-art results on many benchmarks covering summarization, question answering, text classification, and more. To facilitate future work on transfer learning for NLP, we release our dataset, pre-trained models, and code.

This model is an implementation of T5 for EN-PT translation tasks that runs on a modest hardware setup. It proposes some changes to the tokenizer and post-processing that improve the results, and it uses a Portuguese pretrained model for the translation.
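The text-to-text format means the task itself is written into the input string, as in the `translate English to Portuguese:` prefix used above. A minimal sketch of that convention follows; the helper name `to_text2text` is hypothetical, and the second sentence is made up for illustration.

```python
def to_text2text(task_prefix, text):
    """Build a text-to-text input by prepending the task description."""
    return f"{task_prefix}: {text}"


sentences = ["I like to eat rice.", "The weather is nice today."]
# One prefixed input string per sentence, ready to pass to the pipeline.
inputs = [to_text2text("translate English to Portuguese", s) for s in sentences]
print(inputs[0])
```

The resulting strings have the same form as the input passed to `enpt_pipeline` in the cells above.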

**9. Zero-Shot Classification (10 Points)** Run a Zero-Shot language model. Explain the theory behind your model, and run it.  Analyze how well you think it worked.

In [None]:
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

model_name = "MoritzLaurer/DeBERTa-v3-base-mnli-fever-anli"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

# example
premise = "I first thought that I liked the movie, but upon second thought it was actually disappointing."
hypothesis = "The movie was good."

input = tokenizer(premise, hypothesis, truncation=True, return_tensors="pt")
output = model(input["input_ids"].to('cpu'))  # device = "cuda:0" or "cpu"
prediction = torch.softmax(output["logits"][0], -1).tolist()
label_names = ["entailment", "neutral", "contradiction"]
prediction = {name: round(float(pred) * 100, 1) for pred, name in zip(prediction, label_names)}
print(prediction)

{'entailment': 6.6, 'neutral': 17.3, 'contradiction': 76.1}


In [None]:
# test 1
premise = """A new model offers an explanation for how the Galilean satellites formed around the solar system’s 
largest world. Konstantin Batygin did not set out to solve one of the solar system’s most puzzling mysteries 
when he went for a run up a hill in Nice, France. Dr. Batygin, a Caltech researcher, best known for his 
contributions to the search for the solar system’s missing “Planet Nine,” spotted a beer bottle. At a steep, 20 
degree grade, he wondered why it wasn’t rolling down the hill. He realized there was a breeze at his back 
holding the bottle in place. Then he had a thought that would only pop into the mind of a theoretical 
astrophysicist: “Oh! This is how Europa formed.” Europa is one of Jupiter’s four large Galilean moons. And in 
a paper published Monday in the Astrophysical Journal, Dr. Batygin and a co-author, Alessandro Morbidelli, a 
planetary scientist at the Côte d’Azur Observatory in France, present a theory explaining how some moons form 
around gas giants like Jupiter and Saturn, suggesting that millimeter-sized grains of hail produced during the 
solar system’s formation became trapped around these massive worlds, taking shape one at a time into the 
potentially habitable moons we know today."""

# Only the premise is passed here; no hypothesis is given.
input = tokenizer(premise, truncation=True, return_tensors="pt")
output = model(input["input_ids"].to('cpu'))  # device = "cuda:0" or "cpu"
prediction = torch.softmax(output["logits"][0], -1).tolist()
label_names = ["space & cosmos", "scientific discovery", "microbiology", "robots", "archeology"]
# The model has three NLI outputs, so zip() pairs them with only the first three names.
prediction = {name: round(float(pred) * 100, 1) for pred, name in zip(prediction, label_names)}
print(prediction)

{'space & cosmos': 5.2, 'scientific discovery': 71.2, 'microbiology': 23.7}


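I chose the model DeBERTa-v3-base-mnli-fever-anli. It is a natural language inference (NLI) model: given a premise and a hypothesis, it predicts whether the premise entails, is neutral toward, or contradicts the hypothesis. Zero-shot classification builds on this by turning each candidate label into a hypothesis and comparing the entailment scores. In the example, the model correctly gave contradiction the highest probability (76.1%). In test 1, however, only the premise was passed and the model's three outputs were paired with the first three label names, so the printed numbers are the three NLI class probabilities rather than per-topic scores, and they do not show how well the model classifies topics.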
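A minimal sketch of how an NLI model can be used for zero-shot topic classification is shown below: each label becomes a hypothesis, and the entailment scores are normalized across labels. Everything here is an assumption for illustration: `zero_shot_scores`, the hypothesis template, and `toy_entailment_logit` (a word-overlap stand-in so the sketch runs without downloading a model). In a real run, the scorer would call the NLI model on the premise and hypothesis pair and return its entailment logit.

```python
import math


def zero_shot_scores(premise, labels, entailment_logit,
                     template="This example is about {}."):
    """Score each candidate label by how strongly the premise entails it."""
    # One hypothesis per candidate label (the template is an assumption).
    hypotheses = [template.format(label) for label in labels]
    logits = [entailment_logit(premise, h) for h in hypotheses]
    # Softmax over the per-label entailment logits.
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return {label: e / total for label, e in zip(labels, exps)}


def toy_entailment_logit(premise, hypothesis):
    """Stand-in scorer (NOT a real NLI model): counts shared words."""
    premise_words = set(premise.lower().split())
    hypothesis_words = [w.strip(".") for w in hypothesis.lower().split()]
    return float(sum(w in premise_words for w in hypothesis_words))


scores = zero_shot_scores(
    "a new theory explains how moons form around jupiter",
    ["moons", "robots"],
    toy_entailment_logit,
)
print(scores)
```

The Transformers library also provides a `zero-shot-classification` pipeline that takes `candidate_labels` and performs this per-label scoring with the model itself; I did not run it for this assignment.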

The underlying DeBERTa architecture improves on BERT and RoBERTa with two techniques. The first is the disentangled attention mechanism, where each word is represented using two vectors that encode its content and position, respectively, and the attention weights among words are computed using disentangled matrices on their contents and relative positions, respectively. Second, an enhanced mask decoder is used to incorporate absolute positions in the decoding layer to predict the masked tokens in model pre-training. In addition, a new virtual adversarial training method is used for fine-tuning to improve models' generalization.
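As I understand the DeBERTa paper, the disentangled attention score between tokens $i$ and $j$ can be written as the sum of a content-to-content, a content-to-position, and a position-to-content term. The notation below is my own summary and should be checked against the paper: $Q^c, K^c$ are projections of the content vectors, $Q^r, K^r$ are projections of the relative position embeddings, and $\delta(i,j)$ is the relative distance from token $i$ to token $j$.

```latex
$$
\tilde{A}_{i,j} =
  \underbrace{Q^{c}_{i} \, {K^{c}_{j}}^{\top}}_{\text{content-to-content}}
+ \underbrace{Q^{c}_{i} \, {K^{r}_{\delta(i,j)}}^{\top}}_{\text{content-to-position}}
+ \underbrace{K^{c}_{j} \, {Q^{r}_{\delta(j,i)}}^{\top}}_{\text{position-to-content}}
$$
```

Standard self-attention corresponds to keeping only the first term, with position information added to the input embeddings instead.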

**10. Sentence Similarity (10 Points)** Run a Sentence Similarity language model. Explain the theory behind your model, and run it.  Analyze how well you think it worked.

In [None]:
pip install sentence_transformers

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting sentence_transformers
  Downloading sentence-transformers-2.2.2.tar.gz (85 kB)
Building wheels for collected packages: sentence-transformers
  Building wheel for sentence-transformers (setup.py) ... done
  Created wheel for sentence-transformers: filename=sentence_transformers-2.2.2-py3-none-any.whl size=125938 sha256=94e4e9d79b0388dc7b504f33b786ac54e8856d5f9241d3d4bf8dfbd2b3b8a40f
  Stored in directory: /root/.cache/pip/wheels/bf/06/fb/d59c1e5bd1dac7f6cf61ec0036cc3a10ab8fecaa6b2c3d3ee9
Successfully built sentence-transformers
Installing collected packages: sentence-transformers
Successfully installed sentence-transformers-2.2.2


In [None]:
from sentence_transformers import SentenceTransformer

# example
question = "<Q>How many models can I host on HuggingFace?"
answers = [
  "<A>All plans come with unlimited private models and datasets.",
  "<A>AutoNLP seamlessly integrated with the Hugging Face ecosystem.",
  "<A>Based on how much training data and model variants are created, we send you a compute cost and payment link."
]
model = SentenceTransformer('clips/mfaq')
q_embedding, *a_embeddings = model.encode([question] + answers)
print(answers)
best_answer_idx = sorted(enumerate(a_embeddings), key=lambda x: q_embedding.dot(x[1]), reverse=True)[0][0]
print(answers[best_answer_idx])

['<A>All plans come with unlimited private models and datasets.', '<A>AutoNLP seamlessly integrated with the Hugging Face ecosystem.', '<A>Based on how much training data and model variants are created, we send you a compute cost and payment link.']
<A>All plans come with unlimited private models and datasets.


In [None]:
# test
question = "<Q>what is a standard drink measurement"
answers = [
  "<A>2 ounces of regular beer",
  "<A>Drinking an occasional glass of red wine is good for you.",
  "<A>the process of associating numbers with physical quantities and phenomena"
]
model = SentenceTransformer('clips/mfaq')
q_embedding, *a_embeddings = model.encode([question] + answers)
print(answers)

['<A>2 ounces of regular beer', '<A>Drinking an occasional glass of red wine is good for you.', '<A>the process of associating numbers with physical quantities and phenomena']


The mfaq model is a multilingual FAQ retrieval model trained on the MFAQ dataset. It ranks candidate answers according to a given question. Maxime De Bruyn et al. collected around 6M FAQ pairs from the web, in 21 different languages. Although this is significantly larger than existing FAQ retrieval datasets, it comes with its own challenges: duplication of content and uneven distribution of topics.
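The ranking step can be sketched on its own: embed the question and each answer, then order the answers by their dot product with the question, as the example cell does for the best answer. The helper `rank_answers` and the three-dimensional toy vectors below are hypothetical and only illustrate the computation; they are not embeddings produced by mfaq.

```python
def rank_answers(q_embedding, a_embeddings):
    """Return answer indices ordered from best to worst match."""
    def dot(u, v):
        return sum(x * y for x, y in zip(u, v))

    scores = [dot(q_embedding, a) for a in a_embeddings]
    # Highest dot product first.
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)


# Toy vectors standing in for model.encode(...) output.
toy_question = [1.0, 0.0, 0.5]
toy_answers = [
    [0.9, 0.1, 0.4],  # points in nearly the same direction as the question
    [0.0, 1.0, 0.0],  # orthogonal to the question
    [0.2, 0.3, 0.1],  # weak match
]
ranking = rank_answers(toy_question, toy_answers)
print(ranking)
```

Applied to the test cell, passing `q_embedding` and `a_embeddings` to this function would give the model's ordering of the three candidate answers, which that cell does not print.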
