<font size=6>NLP with Transformers with the T5 Model</font>

# Introduction

In this assignment, you will learn how to apply the pre-trained T5 model to three tasks:
1. summarization,
2. translation,
3. grammar checking.

This requires `pytorch` and the `transformers` package. You should already have `pytorch` installed. To install `transformers`, along with the `sentencepiece` package that `T5Tokenizer` depends on, you can use

    pip install transformers sentencepiece


Then run the following code cell.  The first time it is run, the t5-base model will be downloaded.

In [41]:
import transformers as tr

# initialize the model architecture and weights
model = tr.T5ForConditionalGeneration.from_pretrained("t5-base")
# initialize the model tokenizer
tokenizer = tr.T5Tokenizer.from_pretrained("t5-base")

Some weights of the model checkpoint at t5-base were not used when initializing T5ForConditionalGeneration: ['decoder.block.0.layer.1.EncDecAttention.relative_attention_bias.weight']
- This IS expected if you are initializing T5ForConditionalGeneration from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing T5ForConditionalGeneration from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).


Now you will use `model` and `tokenizer` to perform the above tasks.  Here are some examples.

## Summarize

First, let's summarize this text in at most 100 tokens (`max_length` counts tokens, not words).

In [42]:
text = """
Julia was designed from the beginning for high performance. Julia programs compile to efficient 
native code for multiple platforms via LLVM.
Julia is dynamically typed, feels like a scripting language, and has good support for interactive use.
Reproducible environments make it possible to recreate the same Julia environment every time, 
across platforms, with pre-built binaries.
Julia uses multiple dispatch as a paradigm, making it easy to express many object-oriented 
and functional programming patterns. The talk on the Unreasonable Effectiveness of Multiple 
Dispatch explains why it works so well.
Julia provides asynchronous I/O, metaprogramming, debugging, logging, profiling, a package manager, 
and more. One can build entire Applications and Microservices in Julia.
Julia is an open source project with over 1,000 contributors. It is made available under the 
MIT license. The source code is available on GitHub.
"""

In [43]:
len(text)

929

In [44]:
inputs = tokenizer.encode("summarize: " + text, return_tensors="pt", max_length=512, truncation=True)
inputs

tensor([[21603,    10, 18618,    47,   876,    45,     8,  1849,    21,   306,
           821,     5, 18618,  1356,  2890,   699,    12,  2918,  4262,  1081,
            21,  1317,  5357,  1009,     3, 10376, 12623,     5, 18618,    19,
          4896,  1427,   686,    26,     6,  4227,   114,     3,     9,  4943,
            53,  1612,     6,    11,    65,   207,   380,    21,  6076,   169,
             5,   419,  1409,  4817,  2317,  8258,   143,    34,   487,    12,
         23952,     8,   337, 18618,  1164,   334,    97,     6,   640,  5357,
             6,    28,   554,    18, 16152,  2701,  5414,     5, 18618,  2284,
          1317, 17648,    38,     3,     9, 20491,     6,   492,    34,   514,
            12,  3980,   186,  3735,    18,  9442,    11,  5014,  6020,  4264,
             5,    37,  1350,    30,     8,   597,   864,   739,   179, 18652,
           655,    13, 16821,     3, 23664, 14547,     3,  9453,   572,    34,
           930,    78,   168,     5, 18618,   795,  

In [45]:
outputs = model.generate(inputs, max_length=100, min_length=10, length_penalty=1.0, num_beams=4,
                         num_return_sequences=3)

In [46]:
print(outputs)
print(outputs.shape)

tensor([[    0, 18618,    19,  4896,  1427,   686,    26,     6,  4227,   114,
             3,     9,  4943,    53,  1612,     6,    11,    65,   207,   380,
            21,  6076,   169,     3,     5, 18618,   795,     3,     9, 30373,
            27,    87,   667,     6, 10531,  7050,    53,     6,    20, 14588,
          3896,     6,     3, 12578,     6,  9639,    53,     6,     3,     9,
          2642,  2743,     6,    11,    72,     3,     5,     8,  1391,  1081,
            19,   347,    30,     3, 30516,   365,     8,     3, 12604,  3344,
             3,     5,     1],
        [    0, 18618,    19,  4896,  1427,   686,    26,     6,  4227,   114,
             3,     9,  4943,    53,  1612,     6,    11,    65,   207,   380,
            21,  6076,   169,     3,     5, 18618,   795,     3,     9, 30373,
            27,    87,   667,     6, 10531,  7050,    53,     6,    20, 14588,
          3896,     6,     3, 12578,     6,  9639,    53,     6,     3,     9,
          2642,  2743

In [47]:
for i in range(outputs.shape[0]):
    print('\nResult', i + 1)
    print(tokenizer.decode(outputs[i]))


Result 1
<pad> Julia is dynamically typed, feels like a scripting language, and has good support for interactive use. Julia provides asynchronous I/O, metaprogramming, debugging, logging, profiling, a package manager, and more. the source code is available on GitHub under the MIT license.</s>

Result 2
<pad> Julia is dynamically typed, feels like a scripting language, and has good support for interactive use. Julia provides asynchronous I/O, metaprogramming, debugging, logging, profiling, a package manager, and more. it is an open source project with over 1,000 contributors.</s> <pad> <pad>

Result 3
<pad> Julia is dynamically typed, feels like a scripting language, and has good support for interactive use. Julia provides asynchronous I/O, metaprogramming, debugging, logging, profiling, a package manager, and more. Julia is an open source project with over 1,000 contributors.</s> <pad> <pad>
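Each decoded result starts with `<pad>` (T5 uses the pad token as the decoder's start token, which is why every output tensor begins with `0`) and ends with `</s>`, the end-of-sequence token. The trailing `<pad>`s appear because the returned sequences are padded to a common length. Passing `skip_special_tokens=True` to `tokenizer.decode` removes these markers directly. For strings that were already decoded, a small helper like the one below could do the same cleanup and also report whether `</s>` was present, since a missing `</s>` suggests the output was cut off by `max_length`. The name `clean_output` is hypothetical and not part of `transformers`.

```python
import re
from typing import Tuple


def clean_output(decoded: str) -> Tuple[str, bool]:
    """Strip T5's <pad> and </s> markers and report whether </s> was generated.

    A missing </s> usually means generation stopped because it hit max_length.
    """
    finished = "</s>" in decoded
    # Replace each special marker (and surrounding whitespace) with one space.
    text = re.sub(r"\s*(?:<pad>|</s>)\s*", " ", decoded).strip()
    return text, finished
```

For example, `clean_output("<pad> Julia is fast.</s> <pad> <pad>")` returns `("Julia is fast.", True)`.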


## Translation

Now let's translate the text to German.

In [48]:
inputs = tokenizer.encode('translate English to German: ' + text, return_tensors='pt',
                          max_length=512, truncation=True)
outputs = model.generate(inputs, max_length=1500, min_length=20, length_penalty=1.0, num_beams=10,
                         num_return_sequences=3)

for i in range(outputs.shape[0]):
    print('\nResult', i + 1)
    print(tokenizer.decode(outputs[i]))


Result 1
<pad> Julia wurde von Anfang an für hohe Leistung entwickelt. Julia-Programme kompilieren zu effizientem nativen Code für mehrere Plattformen über LLVM. Julia ist dynamisch getippt, fühlt sich wie eine Skriptsprache und hat gute Unterstützung für interaktive Verwendung. Reproduzierbare Umgebungen ermöglichen es, die gleiche Julia-Umgebung jedes Mal, über Plattformen hinweg, mit vorgefertigten Binärdateien zu erschaffen. Julia</s>

Result 2
<pad> Julia wurde von Anfang an für hohe Performance entwickelt. Julia-Programme kompilieren zu effizientem nativen Code für mehrere Plattformen über LLVM. Julia ist dynamisch getippt, fühlt sich wie eine Skriptsprache und hat gute Unterstützung für interaktive Verwendung. Reproduzierbare Umgebungen ermöglichen es, die gleiche Julia-Umgebung jedes Mal, über Plattformen hinweg, mit vorgefertigten Binärdateien zu erschaffen. Julia verwendet</s>

Result 3
<pad> Julia wurde von Anfang an für hohe Leistung entwickelt. Julia-Programme kompilieren

## Grammar Checker

Now let's check the grammar of a sentence.

In [49]:
sentence = 'This sentence do not be grammatical.'
inputs = tokenizer.encode('cola sentence: ' + sentence, return_tensors='pt')
outputs = model.generate(inputs)
print(tokenizer.decode(outputs[0]))

<pad> unacceptable</s>
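All three examples use the same model: T5 treats every task as text-to-text and selects the task from the prefix at the start of the input (`summarize: `, `translate English to German: `, `cola sentence: `). A small sketch that builds these prompts and interprets the grammar checker's answer could look like this. Only the three prefixes come from the cells above; the helper names and dictionary keys are hypothetical.

```python
from typing import Optional

# Task prefixes used in the examples above; the keys are hypothetical names.
T5_PREFIXES = {
    "summarize": "summarize: ",
    "translate_en_de": "translate English to German: ",
    "cola": "cola sentence: ",
}


def make_prompt(task: str, text: str) -> str:
    """Prepend the T5 task prefix for `task` to `text`."""
    if task not in T5_PREFIXES:
        raise ValueError(f"unknown task {task!r}; expected one of {sorted(T5_PREFIXES)}")
    return T5_PREFIXES[task] + text


def parse_cola(decoded: str) -> Optional[bool]:
    """Map T5's CoLA answer to True (acceptable), False (unacceptable) or None."""
    answer = decoded.replace("<pad>", "").replace("</s>", "").strip()
    # Compare exactly: "unacceptable" contains "acceptable" as a substring.
    if answer == "acceptable":
        return True
    if answer == "unacceptable":
        return False
    return None  # the model produced something other than the two expected labels
```

`make_prompt("cola", sentence)` builds the same input as In [49], and `parse_cola("<pad> unacceptable</s>")` returns `False`.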


# Requirements

## Summarization

Cut and paste a news story of at least five paragraphs that describes the recent news about the COVID-19 vaccine developed by the University of Oxford and AstraZeneca. Try at least three values for each of the parameters:

* `max_length`,
* `min_length`,
* `length_penalty`, and
* `num_beams`.

Copy and paste into a markdown cell what you consider to be the best summarization of the news article.  Also, describe the effects of these four parameters on the results with at least four sentences.

[This article](https://github.com/huggingface/blog/blob/master/notebooks/02_how_to_generate.ipynb) will help you understand a bit more about these parameters.
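To make the four parameters concrete, here is a simplified beam search over a tiny, made-up bigram "language model". The `TOY_LM` table and its probabilities are hypothetical, not taken from T5, and this is a sketch rather than Hugging Face's implementation (the real `generate` considers more candidates per step and has its own stopping rules). It does show the role of each parameter: `num_beams` is how many partial sequences survive each step, `min_length` forbids the end token `</s>` until the sequence is long enough, `max_length` cuts generation off (possibly mid-sentence), and `length_penalty` divides each finished sequence's summed log-probability by its length raised to that power before ranking, which is how the `transformers` documentation describes it.

```python
import math
from typing import Dict, List, Tuple

EOS = "</s>"

# Hypothetical bigram "language model": P(next token | previous token).
# These probabilities are invented for illustration; they do not come from T5.
TOY_LM: Dict[str, Dict[str, float]] = {
    "<s>": {"the": 0.5, "a": 0.4, EOS: 0.1},
    "the": {"trial": 0.6, "vaccine": 0.3, EOS: 0.1},
    "a": {"vaccine": 0.9, EOS: 0.1},
    "trial": {EOS: 0.6, "works": 0.4},
    "vaccine": {"works": 0.8, EOS: 0.2},
    "works": {EOS: 1.0},
}


def beam_search(num_beams: int, max_length: int, min_length: int = 0,
                length_penalty: float = 1.0) -> List[Tuple[List[str], float]]:
    """Simplified beam search; returns (tokens, normalized score) pairs, best first."""
    beams = [(["<s>"], 0.0)]  # (tokens so far, summed log-probability)
    finished = []
    for _ in range(max_length):
        candidates = []
        for tokens, logp in beams:
            new_len = len(tokens)  # generated length after adding one token ("<s>" not counted)
            for tok, p in TOY_LM[tokens[-1]].items():
                if tok == EOS and new_len < min_length:
                    continue  # min_length: ending is not allowed yet
                candidates.append((tokens + [tok], logp + math.log(p)))
        # num_beams: keep only the best candidates by raw summed log-probability.
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = []
        for tokens, logp in candidates[:num_beams]:
            if tokens[-1] == EOS:
                finished.append((tokens, logp))
            else:
                beams.append((tokens, logp))
        if not beams:
            break
    # max_length: beams still open here are returned truncated, without </s>.
    finished.extend(beams)
    # length_penalty: divide the summed log-prob by length ** length_penalty.
    scored = [(tokens[1:], logp / max(len(tokens) - 1, 1) ** length_penalty)
              for tokens, logp in finished]
    return sorted(scored, key=lambda s: s[1], reverse=True)
```

With these toy numbers, greedy decoding (`num_beams=1`) returns `the trial </s>`, while `num_beams=2` finds the more probable `a vaccine works </s>`. Setting `max_length=2` returns the truncated `a vaccine`, and `min_length=4` with one beam forces `the trial works </s>`.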

In [50]:
text1 = """
The coronavirus vaccine developed by the University of Oxford 
is highly effective at stopping people developing Covid-19 symptoms, a large trial shows.
Interim data suggests 70% protection, but the researchers say the figure may be as high as 90% by tweaking the dose.
The results will be seen as a triumph, but come after Pfizer and Moderna vaccines showed 95% protection.
However, the Oxford jab is far cheaper, and is easier to store and get to every corner of the world than the other two.
So the vaccine will play a significant role in tackling the pandemic, if it is approved for use by regulators.
"The announcement today takes us another step closer to the time when we can use vaccines to bring an 
end to the devastation caused by [the virus]," said the vaccine's architect, Prof Sarah Gilbert.
The UK government has pre-ordered 100 million doses of the Oxford vaccine, 
and AstraZeneca says it will make three billion doses for the world next year.
Prime Minister Boris Johnson said it was "incredibly exciting news" and that 
while there were still safety checks to come, "these are fantastic results".
Speaking at a Downing Street briefing on Monday evening, Mr Johnson added that 
the majority of people most in need of a vaccination in the UK might be able to get one by Easter.
And Prof Andrew Pollard - director of the Oxford vaccine group - said it had been "a very exciting day" and 
paid tribute to the 20,000 volunteers in the trials around the world, including more than 10,000 in the UK.
"""

In [51]:
len(text1)

1513

In [52]:
inputs = tokenizer.encode("summarize: " + text1, return_tensors="pt", max_length=512, truncation=True)
inputs
outputs = model.generate(inputs, max_length=100, min_length=10, length_penalty=1.0, num_beams=4,
                         num_return_sequences=3)
print(outputs)
print(outputs.shape)
for i in range(outputs.shape[0]):
    print('\nResult', i + 1)
    print(tokenizer.decode(outputs[i]))

tensor([[    0,     8,  4301,   106,     9, 18095, 12956,  1597,    57,     8,
          3819,    13, 10274,    19,  1385,  1231,    44, 10847,   151,  2421,
           638,  6961,  4481,  3976,     3,     5,  1413,   603,   331,  6490,
         14719,  1711,     6,    68,     8,  4768,   497,     8,  2320,   164,
            36,    38,   306,    38, 12669,    57, 22689,    53,     8,  6742,
             3,     5,     8, 12956,    19,   623,  8139,     6,    11,    19,
          1842,    12,  1078,    11,   129,    12,   334,  2752,    13,     8,
           296,   145,     8,   119,   192,     3,     5,     1,     0],
        [    0,     8,  4301,   106,     9, 18095, 12956,  1597,    57,     8,
          3819,    13, 10274,    19,  1385,  1231,    44, 10847,   151,  2421,
           638,  6961,  4481,  3976,     3,     5,     8,   772,    56,    36,
           894,    38,     3,     9, 20020,     6,    68,   369,   227,   276,
            89,  8585,    11,  5070,     9, 12956,     7, 

In [53]:
inputs = tokenizer.encode("summarize: " + text1, return_tensors="pt", max_length=512, truncation=True)
inputs
outputs = model.generate(inputs, max_length=1000, min_length=10, length_penalty=1.0, num_beams=4,
                         num_return_sequences=3)
print(outputs.shape)
for i in range(outputs.shape[0]):
    print('\nResult', i + 1)
    print(tokenizer.decode(outputs[i]))

outputs = model.generate(inputs, max_length=20, min_length=10, length_penalty=1.0, num_beams=4,
                         num_return_sequences=3)
print()
print(outputs.shape)
for i in range(outputs.shape[0]):
    print('\nResult', i + 1)
    print(tokenizer.decode(outputs[i]))  

outputs = model.generate(inputs, max_length=500, min_length=10, length_penalty=1.0, num_beams=4,
                         num_return_sequences=3)
print()
print(outputs.shape)
for i in range(outputs.shape[0]):
    print('\nResult', i + 1)
    print(tokenizer.decode(outputs[i]))  
    

torch.Size([3, 79])

Result 1
<pad> the coronavirus vaccine developed by the university of Oxford is highly effective at stopping people developing Covid-19 symptoms. interim data suggests 70% protection, but the researchers say the figure may be as high as 90% by tweaking the dose. the vaccine is far cheaper, and is easier to store and get to every corner of the world than the other two.</s> <pad>

Result 2
<pad> the coronavirus vaccine developed by the university of Oxford is highly effective at stopping people developing Covid-19 symptoms. the results will be seen as a triumph, but come after Pfizer and Moderna vaccines showed 95% protection. the vaccine is far cheaper, and is easier to store and get to every corner of the world than the other two.</s>

Result 3
<pad> the coronavirus vaccine developed by the university of Oxford is highly effective at stopping people developing Covid-19 symptoms. interim data suggests 70% protection, but the researchers say the figure may be as high
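Cells In [53] through In [57] repeat the same generate-and-print loop once per parameter setting. A helper that sweeps a list of settings could shorten them. This is a sketch, assuming `model` and `tokenizer` are the T5 objects created at the top of the notebook; `sweep_generate` is a hypothetical name. It uses `tokenizer.batch_decode(..., skip_special_tokens=True)` so the `<pad>` and `</s>` markers are dropped from the returned text.

```python
from typing import Any, Dict, Iterable, List


def sweep_generate(model: Any, tokenizer: Any, prompt: str,
                   settings: Iterable[Dict[str, Any]],
                   num_return_sequences: int = 3) -> List[Dict[str, Any]]:
    """Call model.generate once per settings dict and collect the decoded texts."""
    # Encode once; the same input ids are reused for every setting.
    inputs = tokenizer.encode(prompt, return_tensors="pt", max_length=512, truncation=True)
    results = []
    for params in settings:
        outputs = model.generate(inputs, num_return_sequences=num_return_sequences, **params)
        # skip_special_tokens=True drops <pad> and </s> from each decoded string.
        texts = tokenizer.batch_decode(outputs, skip_special_tokens=True)
        results.append({"params": dict(params), "texts": texts})
    return results
```

For example, `sweep_generate(model, tokenizer, "summarize: " + text1, [{"max_length": m, "min_length": 10, "length_penalty": 1.0, "num_beams": 4} for m in (1000, 20, 500)])` reruns the three settings of In [53] and returns the summaries instead of printing them.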

In [54]:
inputs = tokenizer.encode("summarize: " + text1, return_tensors="pt", max_length=512, truncation=True)
inputs
outputs = model.generate(inputs, max_length=100, min_length=80, length_penalty=1.0, num_beams=4,
                         num_return_sequences=3)

print(outputs.shape)
for i in range(outputs.shape[0]):
    print('\nResult', i + 1)
    print(tokenizer.decode(outputs[i]))

outputs = model.generate(inputs, max_length=100, min_length=50, length_penalty=1.0, num_beams=4,
                         num_return_sequences=3)
print()
print(outputs.shape)
for i in range(outputs.shape[0]):
    print('\nResult', i + 1)
    print(tokenizer.decode(outputs[i]))  

outputs = model.generate(inputs, max_length=100, min_length=1, length_penalty=1.0, num_beams=4,
                         num_return_sequences=3)
print()
print(outputs.shape)
for i in range(outputs.shape[0]):
    print('\nResult', i + 1)
    print(tokenizer.decode(outputs[i]))

torch.Size([3, 100])

Result 1
<pad> the coronavirus vaccine developed by the university of Oxford is highly effective at stopping people developing Covid-19 symptoms. interim data suggests 70% protection, but the researchers say the figure may be as high as 90% by tweaking the dose. the vaccine is far cheaper, and is easier to store and get to every corner of the world than the other two. the UK government has pre-ordered 100 million doses of the Oxford vaccine, and AstraZen

Result 2
<pad> the coronavirus vaccine developed by the university of Oxford is highly effective at stopping people developing Covid-19 symptoms. the results will be seen as a triumph, but come after Pfizer and Moderna vaccines showed 95% protection. the vaccine is far cheaper, and is easier to store and get to every corner of the world than the other two. the UK government has pre-ordered 100 million doses of the Oxford vaccine, and AstraZ

Result 3
<pad> the coronavirus vaccine developed by the university of Ox

In [55]:
inputs = tokenizer.encode("summarize: " + text1, return_tensors="pt", max_length=512, truncation=True)
inputs
outputs = model.generate(inputs, max_length=100, min_length=10, length_penalty=10.0, num_beams=4,
                         num_return_sequences=3)

print(outputs.shape)
for i in range(outputs.shape[0]):
    print('\nResult', i + 1)
    print(tokenizer.decode(outputs[i]))

outputs = model.generate(inputs, max_length=100, min_length=10, length_penalty=0.5, num_beams=4,
                         num_return_sequences=3)
print()
print(outputs.shape)
for i in range(outputs.shape[0]):
    print('\nResult', i + 1)
    print(tokenizer.decode(outputs[i]))  

outputs = model.generate(inputs, max_length=100, min_length=10, length_penalty=0.0, num_beams=4,
                         num_return_sequences=3)
print()
print(outputs.shape)
for i in range(outputs.shape[0]):
    print('\nResult', i + 1)
    print(tokenizer.decode(outputs[i]))

torch.Size([3, 100])

Result 1
<pad> the coronavirus vaccine developed by the university of Oxford is highly effective at stopping people developing Covid-19 symptoms. interim data suggests 70% protection, but the researchers say the figure may be as high as 90% by tweaking the dose. the vaccine is far cheaper, and is easier to store and get to every corner of the world than the other two. the UK government has pre-ordered 100 million doses of the Oxford vaccine, and AstraZen

Result 2
<pad> the coronavirus vaccine developed by the university of Oxford is highly effective at stopping people developing Covid-19 symptoms. the results will be seen as a triumph, but come after Pfizer and Moderna vaccines showed 95% protection. the vaccine is far cheaper, and is easier to store and get to every corner of the world than the other two. the UK government has pre-ordered 100 million doses of the Oxford vaccine, and AstraZ

Result 3
<pad> the coronavirus vaccine developed by the university of Ox

In [56]:
inputs = tokenizer.encode("summarize: " + text1, return_tensors="pt", max_length=512, truncation=True)
inputs
outputs = model.generate(inputs, max_length=100, min_length=10, length_penalty=1.0, num_beams=15,
                         num_return_sequences=3)

print(outputs.shape)
for i in range(outputs.shape[0]):
    print('\nResult', i + 1)
    print(tokenizer.decode(outputs[i]))

outputs = model.generate(inputs, max_length=100, min_length=10, length_penalty=1.0, num_beams=10,
                         num_return_sequences=3)
print()
print(outputs.shape)
for i in range(outputs.shape[0]):
    print('\nResult', i + 1)
    print(tokenizer.decode(outputs[i]))  

outputs = model.generate(inputs, max_length=100, min_length=10, length_penalty=1.0, num_beams=3,
                         num_return_sequences=3)
print()
print(outputs.shape)
for i in range(outputs.shape[0]):
    print('\nResult', i + 1)
    print(tokenizer.decode(outputs[i]))

torch.Size([3, 79])

Result 1
<pad> the coronavirus vaccine developed by the university of Oxford is highly effective at stopping people developing Covid-19 symptoms. interim data suggests 70% protection, but the researchers say the figure may be as high as 90% by tweaking the dose. the vaccine is far cheaper, and is easier to store and get to every corner of the world than the other two.</s> <pad>

Result 2
<pad> the coronavirus vaccine developed by the university of Oxford is highly effective at stopping people developing Covid-19 symptoms. the results will be seen as a triumph, but come after Pfizer and Moderna vaccines showed 95% protection. the vaccine is far cheaper, and is easier to store and get to every corner of the world than the other two.</s>

Result 3
<pad> the coronavirus vaccine developed by the university of Oxford is highly effective at stopping people developing Covid-19 symptoms. interim data suggests 70% protection, but the researchers say the figure may be as high

In [57]:
inputs = tokenizer.encode("summarize: " + text1, return_tensors="pt", max_length=512, truncation=True)
inputs
outputs = model.generate(inputs, max_length=1000, min_length=100, length_penalty=2.0, num_beams=8,
                         num_return_sequences=3)

print(outputs.shape)
for i in range(outputs.shape[0]):
    print('\nResult', i + 1)
    print(tokenizer.decode(outputs[i]))

outputs = model.generate(inputs, max_length=200, min_length=20, length_penalty=2.0, num_beams=8,
                         num_return_sequences=3)
print()
print(outputs.shape)
for i in range(outputs.shape[0]):
    print('\nResult', i + 1)
    print(tokenizer.decode(outputs[i]))  

outputs = model.generate(inputs, max_length=50, min_length=5, length_penalty=0.5, num_beams=5,
                         num_return_sequences=3)
print()
print(outputs.shape)
for i in range(outputs.shape[0]):
    print('\nResult', i + 1)
    print(tokenizer.decode(outputs[i]))

torch.Size([3, 115])

Result 1
<pad> the coronavirus vaccine developed by the university of Oxford is highly effective at stopping people developing Covid-19 symptoms. interim data suggests 70% protection, but the researchers say the figure may be as high as 90% by tweaking the dose. the vaccine is far cheaper, and is easier to store and get to every corner of the world than the other two. the UK government has pre-ordered 100 million doses of the Oxford vaccine, and AstraZeneca says it will make three billion doses for the world next</s>

Result 2
<pad> the coronavirus vaccine developed by the university of Oxford is highly effective at stopping people developing Covid-19 symptoms. interim data suggests 70% protection, but the researchers say the figure may be as high as 90% by tweaking the dose. the vaccine is far cheaper, and is easier to store and get to every corner of the world than the other two. the UK government has pre-ordered 100 million doses of the Oxford vaccine, and Astr

## Discussion about the summarization

the coronavirus vaccine developed by the university of Oxford is highly effective at stopping people developing Covid-19 symptoms. interim data suggests 70% protection, but the researchers say the figure may be as high as 90% by tweaking the dose. the vaccine is far cheaper, and is easier to store and get to every corner of the world than the other two. 


I think this is the best summary of the article I found online. It was generated with `max_length=100`, `min_length=10`, `length_penalty=1.0`, `num_beams=4`.


The four parameters affected the summaries as follows.

* `max_length` caps the length of the summary in tokens. With `max_length=20` and `min_length=10`, the summary was not even a complete sentence, which is not what we want.
* `min_length` also affects the length of the summary. With `min_length=80` and `max_length=100`, the results were longer than expected and were cut off mid-word at the 100-token limit (for example, ending in "AstraZen").
* `length_penalty` also changed the results. With `length_penalty=0.0`, the results were shorter than we want and contained many `<pad>` tokens. With `length_penalty=10.0`, the results were longer than with `1.0`, and the additional part was not a complete sentence.
* `num_beams` sets how many candidate sequences beam search keeps at each step, so more beams explore more alternatives and may find a better-scoring output. However, `num_beams=3` gave the same results as `num_beams=10`; I think the text I used is too short or simple for extra beams to help.

The same is likely true of the other three parameters: with a text of a different length, the best values would be different. For the text I chose, `max_length=100`, `min_length=10`, `length_penalty=1.0`, `num_beams=4` gave the result I wanted to see.

By the way, our results also show that doubling all four parameters to `max_length=200`, `min_length=20`, `length_penalty=2.0`, `num_beams=8` gives the same result as `max_length=100`, `min_length=10`, `length_penalty=1.0`, `num_beams=4`. That is part of the reason I chose `max_length=100`, `min_length=10`, `length_penalty=1.0`, `num_beams=4` as the settings for the best summary.
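The direction of the `length_penalty` effect described above is consistent with how `transformers` documents the parameter for beam search: a finished sequence's summed log-probability (a negative number) is divided by its length raised to `length_penalty`, so larger values favor longer outputs. The sketch below checks this with two candidate summaries whose scores and lengths are made up, not taken from the T5 runs.

```python
def normalized_score(sum_logprobs: float, length: int, length_penalty: float) -> float:
    """Beam score after length normalization: sum_logprobs / length ** length_penalty."""
    return sum_logprobs / length ** length_penalty


# Two hypothetical finished summaries (made-up numbers, not real T5 scores).
SHORT = (-5.0, 10)  # (summed log-probability, length in tokens)
LONG = (-8.0, 20)


def preferred(length_penalty: float) -> str:
    """Return which hypothetical summary ranks higher for this length_penalty."""
    short_score = normalized_score(*SHORT, length_penalty)
    long_score = normalized_score(*LONG, length_penalty)
    return "short" if short_score > long_score else "long"


for lp in (0.0, 0.5, 1.0, 2.0, 10.0):
    print(f"length_penalty={lp}: {preferred(lp)} summary wins")
```

With these numbers the shorter candidate wins for `length_penalty` of 0.0 and 0.5, and the longer one wins from 1.0 upward, which fits the observation that 0.0 gave shorter summaries and 10.0 gave longer ones.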

## Translation

Try translating the first paragraph of your news story into German.  Use `num_return_sequences=5` and translate the German back to English using [translate.google.com](https://translate.google.com/).  Experiment with at least three values for the above four parameters.  Using the google translations, describe which German translation is best, and which parameter values led to its generation.

In [77]:
text2 = """
The coronavirus vaccine developed by the University of Oxford is highly effective at stopping people 
developing Covid-19 symptoms, a large trial shows.
Interim data suggests 70% protection, but the researchers 
say the figure may be as high as 90% by tweaking the dose.
The results will be seen as a triumph, but come after Pfizer and Moderna vaccines showed 95% protection.
"""

inputs = tokenizer.encode('translate English to German: ' + text2, return_tensors='pt',
                          max_length=512, truncation=True)
outputs = model.generate(inputs, max_length=1500, min_length=20, length_penalty=1.0, num_beams=10,
                         num_return_sequences=5)

for i in range(outputs.shape[0]):
    print('\nResult', i + 1)
    print(tokenizer.decode(outputs[i]))


Result 1
<pad> Der von der Universität Oxford entwickelte Koronavirus-Impfstoff ist hochwirksam, um die Entwicklung von Covid-19-Symptomen zu stoppen, zeigt eine große Studie. Zwischenzeitliche Daten deuten auf einen Schutz von 70 % hin, aber die Forscher sagen, die Zahl könnte durch eine <unk> nderung der Dosis sogar bis zu 90 % betragen. Die Ergebnisse werden als Triumph angesehen, aber nachdem Impfstoffe von Pfizer und Moderna einen Schutz von 95 % aufwiesen.</s>

Result 2
<pad> Der von der Universität Oxford entwickelte Koronavirus-Impfstoff ist hochwirksam, um die Entwicklung von Covid-19-Symptomen zu stoppen, zeigt eine große Studie. Zwischenzeitliche Daten deuten auf einen Schutz von 70 % hin, aber die Forscher sagen, die Zahl könnte durch eine <unk> nderung der Dosis sogar bis zu 90 % betragen. Die Ergebnisse werden als Triumph angesehen, kommen jedoch nachdem Impfstoffe von Pfizer und Moderna 95 % des Schutzes</s>

Result 3
<pad> Der von der Universität Oxford entwickelte Kor

In [78]:

inputs = tokenizer.encode('translate English to German: ' + text2, return_tensors='pt',
                          max_length=512, truncation=True)
outputs = model.generate(inputs, max_length=30, min_length=20, length_penalty=1.0, num_beams=10,
                         num_return_sequences=5)

for i in range(outputs.shape[0]):
    print('\nResult', i + 1)
    print(tokenizer.decode(outputs[i]))
    
outputs = model.generate(inputs, max_length=150, min_length=20, length_penalty=1.0, num_beams=10,
                         num_return_sequences=5)
print()

for i in range(outputs.shape[0]):
    print('\nResult', i + 1)
    print(tokenizer.decode(outputs[i]))

outputs = model.generate(inputs, max_length=3000, min_length=20, length_penalty=1.0, num_beams=10,
                         num_return_sequences=5)
print()
for i in range(outputs.shape[0]):
    print('\nResult', i + 1)
    print(tokenizer.decode(outputs[i]))    


Result 1
<pad> Der von der Universität Oxford entwickelte Koronavirus-Impfstoff ist hochwirksam, um die Entwicklung von Covid-19-Symptomen

Result 2
<pad> Der von der University of Oxford entwickelte Koronavirus-Impfstoff hält Menschen, die Covid-19-Symptome entwickeln,

Result 3
<pad> Der von der Universität Oxford entwickelte Koronavirus-Impfstoff hält Menschen, die Covid-19-Symptome entwickeln, sehr

Result 4
<pad> Eine große Studie zeigt, dass der von der Universität Oxford entwickelte Koronavirus-Impfstoff die Entwicklung von Covid-19

Result 5
<pad> Eine große Studie zeigt, dass der von der Universität Oxford entwickelte Koronavirus-Impfstoff bei der Verhinderung


Result 1
<pad> Der von der Universität Oxford entwickelte Koronavirus-Impfstoff ist hochwirksam, um die Entwicklung von Covid-19-Symptomen zu stoppen, zeigt eine große Studie. Zwischenzeitliche Daten deuten auf einen Schutz von 70 % hin, aber die Forscher sagen, die Zahl könnte durch eine <unk> nderung der Dosis sogar

In [80]:


inputs = tokenizer.encode('translate English to German: ' + text2, return_tensors='pt',
                          max_length=512, truncation=True)
outputs = model.generate(inputs, max_length=150, min_length=10, length_penalty=1.0, num_beams=10,
                         num_return_sequences=5)

for i in range(outputs.shape[0]):
    print('\nResult', i + 1)
    print(tokenizer.decode(outputs[i]))
    
outputs = model.generate(inputs, max_length=150, min_length=130, length_penalty=1.0, num_beams=10,
                         num_return_sequences=5)
print()

for i in range(outputs.shape[0]):
    print('\nResult', i + 1)
    print(tokenizer.decode(outputs[i]))

outputs = model.generate(inputs, max_length=150, min_length=50, length_penalty=1.0, num_beams=10,
                         num_return_sequences=5)
print()
for i in range(outputs.shape[0]):
    print('\nResult', i + 1)
    print(tokenizer.decode(outputs[i]))


Result 1
<pad> Der von der Universität Oxford entwickelte Koronavirus-Impfstoff ist hochwirksam, um die Entwicklung von Covid-19-Symptomen zu stoppen, zeigt eine große Studie. Zwischenzeitliche Daten deuten auf einen Schutz von 70 % hin, aber die Forscher sagen, die Zahl könnte durch eine <unk> nderung der Dosis sogar bis zu 90 % betragen. Die Ergebnisse werden als Triumph angesehen, aber nachdem Impfstoffe von Pfizer und Moderna einen Schutz von 95 % aufwiesen.</s>

Result 2
<pad> Der von der Universität Oxford entwickelte Koronavirus-Impfstoff ist hochwirksam, um die Entwicklung von Covid-19-Symptomen zu stoppen, zeigt eine große Studie. Zwischenzeitliche Daten deuten auf einen Schutz von 70 % hin, aber die Forscher sagen, die Zahl könnte durch eine <unk> nderung der Dosis sogar bis zu 90 % betragen. Die Ergebnisse werden als Triumph angesehen, kommen jedoch nachdem Impfstoffe von Pfizer und Moderna 95 % des Schutzes</s>

Result 3
<pad> Der von der Universität Oxford entwickelte Kor

In [82]:
inputs = tokenizer.encode('translate English to German: ' + text2, return_tensors='pt',
                          max_length=512, truncation=True)
outputs = model.generate(inputs, max_length=150, min_length=10, length_penalty=10.0, num_beams=10,
                         num_return_sequences=5)

for i in range(outputs.shape[0]):
    print('\nResult', i + 1)
    print(tokenizer.decode(outputs[i]))
    
outputs = model.generate(inputs, max_length=150, min_length=10, length_penalty=3.0, num_beams=10,
                         num_return_sequences=5)
print()

for i in range(outputs.shape[0]):
    print('\nResult', i + 1)
    print(tokenizer.decode(outputs[i]))

outputs = model.generate(inputs, max_length=150, min_length=10, length_penalty=0.0, num_beams=10,
                         num_return_sequences=5)
print()
for i in range(outputs.shape[0]):
    print('\nResult', i + 1)
    print(tokenizer.decode(outputs[i]))


Result 1
<pad> Der von der Universität Oxford entwickelte Koronavirus-Impfstoff ist hochwirksam, um die Entwicklung von Covid-19-Symptomen zu stoppen, zeigt eine große Studie. Zwischenzeitliche Daten deuten auf einen Schutz von 70 % hin, aber die Forscher sagen, die Zahl könnte durch eine <unk> nderung der Dosis sogar bis zu 90 % betragen. Die Ergebnisse werden als Triumph angesehen, aber nachdem Impfstoffe von Pfizer und Moderna einen Schutz von 95 % aufwiesen.</s>

Result 2
<pad> Der von der Universität Oxford entwickelte Koronavirus-Impfstoff ist hochwirksam, um die Entwicklung von Covid-19-Symptomen zu stoppen, zeigt eine große Studie. Zwischenzeitliche Daten deuten auf einen Schutz von 70 % hin, aber die Forscher sagen, die Zahl könnte durch eine <unk> nderung der Dosis sogar bis zu 90 % betragen. Die Ergebnisse werden als Triumph angesehen, kommen jedoch nachdem Impfstoffe von Pfizer und Moderna 95 % des Schutzes</s>

Result 3
<pad> Der von der Universität Oxford entwickelte Kor

In [110]:
inputs = tokenizer.encode('translate English to German: ' + text2, return_tensors='pt',
                          max_length=512, truncation=True)

# compare beam widths while keeping the other generation settings fixed
for beams in [20, 15, 5]:
    outputs = model.generate(inputs, max_length=150, min_length=10, length_penalty=1.0,
                             num_beams=beams, num_return_sequences=5)
    print()
    for i in range(outputs.shape[0]):
        print('\nResult', i + 1)
        print(tokenizer.decode(outputs[i]))


Result 1
<pad> Der von der Universität Oxford entwickelte Koronavirus-Impfstoff ist hochwirksam, wenn es darum geht, Menschen, die an Covid-19-Symptomen erkranken, zu stoppen, so eine große Studie. Zwischenzeitliche Daten deuten auf einen Schutz von 70 % hin, aber die Forscher sagen, die Zahl könnte durch eine <unk> nderung der Dosis sogar bis zu 90 % betragen. Die Ergebnisse werden als Triumph angesehen, kommen jedoch nachdem Impfstoffe</s>

Result 2
<pad> Der von der Universität Oxford entwickelte Koronavirus-Impfstoff ist hochwirksam, wenn es darum geht, Menschen, die an Covid-19-Symptomen erkranken, zu stoppen, so eine große Studie. Zwischenzeitliche Daten deuten auf einen Schutz von 70 % hin, aber die Forscher sagen, die Zahl könnte durch eine <unk> nderung der Dosis sogar bis zu 90 % betragen. Die Ergebnisse werden als Triumph angesehen, aber nachdem Impfstoffe von P</s>

Result 3
<pad> Der von der Universität Oxford entwickelte Koronavirus-Impfstoff ist hochwirksam, wenn es dar

## Discussion about translation result



Based on these results, the best translation came from `max_length=150, min_length=10, length_penalty=1.0, num_beams=5`:

Eine große Studie zeigt, dass der von der Universität Oxford entwickelte Koronavirus-Impfstoff bei der Verhinderung von Covid-19-Symptomen sehr wirksam ist. Zwischenzeitliche Daten deuten auf einen Schutz von 70 % hin, aber die Forscher sagen, die Zahl könnte durch eine <unk> nderung der Dosis sogar bis zu 90 % betragen. Die Ergebnisse werden als Triumph angesehen, aber nachdem Impfstoffe von Pfizer und Moderna einen Schutz von 95 % aufwiesen
    
Translated back into English with Google Translate, it reads:
    A large study shows that the coronavirus vaccine developed by Oxford University is very effective in preventing Covid-19 symptoms. Interim data suggest 70% protection, but researchers say the number could go as high as 90% by changing the dose. The results are viewed as a triumph, but after Pfizer and Moderna vaccines showed 95% protection
    
    
This is very close to the original English text. First, I tried max_length values of 30, 150, and 3000 and found that 150 is enough for this paragraph; with max_length=10 the result would be too short. Second, I tried min_length values of 10, 130, and 50 while keeping max_length at 150, and min_length did not affect the translation much. Third, I tried three different values of length_penalty; the results were so similar that I could not find any difference, so this parameter does not appear to affect the translation of this paragraph. Finally, I tried num_beams values of 20, 15, and 5 with max_length=150, min_length=10, and length_penalty=1.0, the settings I chose from the previous experiments. These three values gave different results: the translations with num_beams=20 and num_beams=15 are slightly inaccurate and miss some information. From this, I conclude that num_beams has the biggest effect on the generated translation, while max_length has a smaller effect.
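To see how num_beams can change the output while length_penalty may not, here is a toy beam search over a hypothetical hand-written next-token table (`TOY_MODEL`, not T5). Finished hypotheses are scored as summed log-probability divided by `length ** length_penalty`, which is, as far as I know, the normalization Hugging Face's beam search uses, although the exact length it counts may differ. This is a simplified sketch, not the library's implementation.

```python
import math

EOS = "</s>"

# Hypothetical toy next-token probabilities, keyed by the tokens generated so far
TOY_MODEL = {
    (): {"a": 0.6, "b": 0.4},
    ("a",): {"x": 0.5, "y": 0.5},
    ("b",): {"z": 0.9, EOS: 0.1},
    ("a", "x"): {EOS: 1.0},
    ("a", "y"): {EOS: 1.0},
    ("b", "z"): {EOS: 1.0},
}


def beam_search(num_beams, length_penalty=1.0, max_length=5):
    """Return the best finished hypothesis as (tokens, length-normalized score)."""
    beams = [((), 0.0)]  # (tokens so far, summed log-probability)
    finished = []
    for _ in range(max_length):
        candidates = []
        for tokens, logp in beams:
            for tok, p in TOY_MODEL[tokens].items():
                candidates.append((tokens + (tok,), logp + math.log(p)))
        # keep the num_beams most probable continuations; simplification: finished
        # ones use up beam slots, unlike Hugging Face, which keeps num_beams beams alive
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = []
        for tokens, logp in candidates[:num_beams]:
            if tokens[-1] == EOS:
                # length-normalized score for a completed hypothesis
                finished.append((tokens, logp / len(tokens) ** length_penalty))
            else:
                beams.append((tokens, logp))
        if not beams:
            break
    return max(finished, key=lambda h: h[1])


print(beam_search(1)[0])  # → ('a', 'x', '</s>')
print(beam_search(2)[0])  # → ('b', 'z', '</s>')
```

With one beam (greedy search) the toy commits to the locally likely token `a` and ends with probability 0.3; with two beams it keeps `b` alive and finds `b z`, with probability 0.36. Because every complete candidate here has the same length, changing length_penalty cannot reorder them, which is one plausible reason the length_penalty experiments above showed no visible difference.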


## Grammar Checker

Write a for loop that checks the grammatical correctness of each sentence in a list of sentences.  Apply it to the first paragraph of your news article.  Describe the results.

Now modify at least three of the sentences in your paragraph to make the sentences grammatically incorrect and repeat the analysis of all sentences.  Describe the results. Are your grammatically incorrect sentences correctly identified?

In [102]:
sentence = """The coronavirus vaccine developed by the University of Oxford is highly effective at stopping people 
developing Covid-19 symptoms, a large trial shows.
Interim data suggests 70% protection, but the researchers 
say the figure may be as high as 90% by tweaking the dose.
The results will be seen as a triumph, but come after Pfizer and Moderna vaccines showed 95% protection."""

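# split the paragraph into sentences at each period
text3 = sentence.split(".")
print(text3)
# the text after the final period is an empty string, so skip the last element
for i in range(len(text3)-1):
    # the 'cola sentence: ' prefix asks T5 for a grammatical-acceptability (CoLA) label
    inputs = tokenizer.encode('cola sentence: ' + text3[i], return_tensors='pt')
    outputs = model.generate(inputs)
    # the decoded label is 'acceptable' or 'unacceptable'
    print(tokenizer.decode(outputs[0]))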

['The coronavirus vaccine developed by the University of Oxford is highly effective at stopping people \ndeveloping Covid-19 symptoms, a large trial shows', '\nInterim data suggests 70% protection, but the researchers \nsay the figure may be as high as 90% by tweaking the dose', '\nThe results will be seen as a triumph, but come after Pfizer and Moderna vaccines showed 95% protection', '']
<pad> acceptable</s>
<pad> acceptable</s>
<pad> acceptable</s>


## Discussion about result

T5 labeled all 3 sentences in the first paragraph of the article as acceptable, so it found no grammar errors in them. The loop ran 3 times, once per sentence, and every result was acceptable.
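The cell above splits on every `.` and skips the trailing empty string. That works for this paragraph, but a period inside a number (for example 3.5) would cut a sentence in two, and line breaks stay inside each sentence. A slightly more robust sketch, assuming sentences end with `.`, `!`, or `?` followed by whitespace (the helper names are hypothetical, and abbreviations such as "Dr." would still be split), is:

```python
import re


def split_sentences(paragraph):
    """Split a paragraph into sentences, dropping line breaks and empty pieces."""
    # collapse line breaks and repeated spaces into single spaces
    text = " ".join(paragraph.split())
    # split after '.', '!' or '?' only when whitespace follows, so "3.5" stays intact
    pieces = re.split(r"(?<=[.!?])\s+", text)
    return [p for p in pieces if p]


def cola_prompts(sentences):
    """Prefix each sentence with the 'cola sentence: ' task prefix used above."""
    return ["cola sentence: " + s for s in sentences]


for prompt in cola_prompts(split_sentences("It works.\nIt took 3.5\nweeks.")):
    print(prompt)
```

Each prompt can then be passed to `tokenizer.encode` and `model.generate` exactly as in the loop above.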

In [109]:
sentence = """The coronavirus vaccine developed by the University of Oxford is highly effective is at stopping people 
developing Covid-19 symptoms, a large trial shows.
Interim data suggests 70% protection, but the researchers 
is to say the figure may be as high as 90% by tweaking the dose.
The results will be seen as a triumph, but is will come after Pfizer and Moderna vaccines showed 95% protection."""

text3 = sentence.split(".")
print(text3)

for i in range(len(text3)-1):
    inputs = tokenizer.encode('cola sentence: ' + text3[i], return_tensors='pt')
    outputs = model.generate(inputs)
    print(tokenizer.decode(outputs[0]))

['The coronavirus vaccine developed by the University of Oxford is highly effective is at stopping people \ndeveloping Covid-19 symptoms, a large trial shows', '\nInterim data suggests 70% protection, but the researchers \nis to say the figure may be as high as 90% by tweaking the dose', '\nThe results will be seen as a triumph, but is will come after Pfizer and Moderna vaccines showed 95% protection', '']
<pad> unacceptable</s>
<pad> unacceptable</s>
<pad> unacceptable</s>


## Discussion about result

The model labeled all 3 sentences unacceptable, which is correct because I modified each of them to be grammatically incorrect. The grammatically incorrect sentences were correctly identified.
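To turn the "correctly identified" check into a number, one can strip the `<pad>` and `</s>` markers from each decoded output and compare it with the label you expect for that sentence. The helper names below are hypothetical; passing `skip_special_tokens=True` to `tokenizer.decode` should remove the markers directly.

```python
def parse_label(decoded):
    """Strip the <pad> and </s> markers from a decoded CoLA prediction."""
    return decoded.replace("<pad>", "").replace("</s>", "").strip()


def label_accuracy(decoded_outputs, expected_labels):
    """Fraction of predictions whose label matches the expected label."""
    if not expected_labels or len(decoded_outputs) != len(expected_labels):
        raise ValueError("need one expected label per prediction")
    matches = sum(parse_label(d) == e for d, e in zip(decoded_outputs, expected_labels))
    return matches / len(expected_labels)


# Decoded outputs copied from the two grammar-check cells above
original = ["<pad> acceptable</s>"] * 3
modified = ["<pad> unacceptable</s>"] * 3
print(label_accuracy(original + modified, ["acceptable"] * 3 + ["unacceptable"] * 3))  # → 1.0
```

An accuracy of 1.0 over these six sentences matches the discussion above; a larger set of deliberately broken sentences would give a more informative estimate.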

# Extra Credit

Read some of the on-line documentation and examples that describe how to fine-tune the T5 model to do better English to German and German to English translations.  Try fine-tuning the T5 model we use here on example translations.  Does it perform better?

Warning: This will take a lot of time to figure out.  First try to find examples on-line of training to fine-tune the model.
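A first step in fine-tuning is putting the example translations into T5's text-to-text format: an input string carrying a task prefix and a target string. The sketch below assumes the training data is a list of (English, German) pairs; the function name, the example pair, and the `"translate German to English: "` prefix are hypothetical choices (to my knowledge the released t5-base checkpoint was only pre-trained to translate from English).

```python
def make_translation_examples(pairs, both_directions=True):
    """Turn (english, german) sentence pairs into (input_text, target_text) examples."""
    examples = []
    for english, german in pairs:
        # same task prefix used for English-to-German translation above
        examples.append(("translate English to German: " + english, german))
        if both_directions:
            # hypothetical prefix for the reverse direction, learned during fine-tuning
            examples.append(("translate German to English: " + german, english))
    return examples


# made-up example pair for illustration
pairs = [("The vaccine is highly effective.", "Der Impfstoff ist hochwirksam.")]
for source, target in make_translation_examples(pairs):
    print(source, "->", target)
```

Each input and target would then be tokenized, with the target token ids passed as `labels` to `model(...)`, which returns a loss an optimizer can minimize. A full training loop is beyond this sketch.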