<font size=6>NLP with Transformers with the T5 Model</font>

# Introduction

In this assignment, you will learn how to apply the pre-trained T5 model to three tasks:
1. summarization,
2. translation,
3. grammar checking.

This will require installation of `pytorch` and the `transformers` package.  You should already have `pytorch` installed.  To install `transformers`, you can use

    pip install transformers


Then run the following code cell.  The first time it is run, the t5-base model will be downloaded.

In [1]:
import transformers as tr

# initialize the model architecture and weights
model = tr.T5ForConditionalGeneration.from_pretrained("t5-base")
# initialize the model tokenizer
tokenizer = tr.T5Tokenizer.from_pretrained("t5-base")

Some weights of the model checkpoint at t5-base were not used when initializing T5ForConditionalGeneration: ['decoder.block.0.layer.1.EncDecAttention.relative_attention_bias.weight']
- This IS expected if you are initializing T5ForConditionalGeneration from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing T5ForConditionalGeneration from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).


Now you will use `model` and `tokenizer` to perform the above tasks.  Here are some examples.

## Summarize

First, let's summarize this text in at most 100 tokens (`max_length` counts tokens, not words).

In [2]:
text = """
Julia was designed from the beginning for high performance. Julia programs compile to efficient 
native code for multiple platforms via LLVM.
Julia is dynamically typed, feels like a scripting language, and has good support for interactive use.
Reproducible environments make it possible to recreate the same Julia environment every time, 
across platforms, with pre-built binaries.
Julia uses multiple dispatch as a paradigm, making it easy to express many object-oriented 
and functional programming patterns. The talk on the Unreasonable Effectiveness of Multiple 
Dispatch explains why it works so well.
Julia provides asynchronous I/O, metaprogramming, debugging, logging, profiling, a package manager, 
and more. One can build entire Applications and Microservices in Julia.
Julia is an open source project with over 1,000 contributors. It is made available under the 
MIT license. The source code is available on GitHub.
"""

In [3]:
len(text)

929

In [4]:
inputs = tokenizer.encode("summarize: " + text, return_tensors="pt", max_length=512, truncation=True)
inputs

tensor([[21603,    10, 18618,    47,   876,    45,     8,  1849,    21,   306,
           821,     5, 18618,  1356,  2890,   699,    12,  2918,  4262,  1081,
            21,  1317,  5357,  1009,     3, 10376, 12623,     5, 18618,    19,
          4896,  1427,   686,    26,     6,  4227,   114,     3,     9,  4943,
            53,  1612,     6,    11,    65,   207,   380,    21,  6076,   169,
             5,   419,  1409,  4817,  2317,  8258,   143,    34,   487,    12,
         23952,     8,   337, 18618,  1164,   334,    97,     6,   640,  5357,
             6,    28,   554,    18, 16152,  2701,  5414,     5, 18618,  2284,
          1317, 17648,    38,     3,     9, 20491,     6,   492,    34,   514,
            12,  3980,   186,  3735,    18,  9442,    11,  5014,  6020,  4264,
             5,    37,  1350,    30,     8,   597,   864,   739,   179, 18652,
           655,    13, 16821,     3, 23664, 14547,     3,  9453,   572,    34,
           930,    78,   168,     5, 18618,   795,  

In [5]:
outputs = model.generate(inputs, max_length=100, min_length=10, length_penalty=1.0, num_beams=4,
                         num_return_sequences=3)

In [6]:
print(outputs)
print(outputs.shape)

tensor([[    0, 18618,    19,  4896,  1427,   686,    26,     6,  4227,   114,
             3,     9,  4943,    53,  1612,     6,    11,    65,   207,   380,
            21,  6076,   169,     3,     5, 18618,   795,     3,     9, 30373,
            27,    87,   667,     6, 10531,  7050,    53,     6,    20, 14588,
          3896,     6,     3, 12578,     6,  9639,    53,     6,     3,     9,
          2642,  2743,     6,    11,    72,     3,     5,     8,  1391,  1081,
            19,   347,    30,     3, 30516,   365,     8,     3, 12604,  3344,
             3,     5,     1],
        [    0, 18618,    19,  4896,  1427,   686,    26,     6,  4227,   114,
             3,     9,  4943,    53,  1612,     6,    11,    65,   207,   380,
            21,  6076,   169,     3,     5, 18618,   795,     3,     9, 30373,
            27,    87,   667,     6, 10531,  7050,    53,     6,    20, 14588,
          3896,     6,     3, 12578,     6,  9639,    53,     6,     3,     9,
          2642,  2743

In [7]:
for i in range(outputs.shape[0]):
    print('\nResult', i + 1)
    print(tokenizer.decode(outputs[i]))


Result 1
<pad> Julia is dynamically typed, feels like a scripting language, and has good support for interactive use. Julia provides asynchronous I/O, metaprogramming, debugging, logging, profiling, a package manager, and more. the source code is available on GitHub under the MIT license.</s>

Result 2
<pad> Julia is dynamically typed, feels like a scripting language, and has good support for interactive use. Julia provides asynchronous I/O, metaprogramming, debugging, logging, profiling, a package manager, and more. it is an open source project with over 1,000 contributors.</s> <pad> <pad>

Result 3
<pad> Julia is dynamically typed, feels like a scripting language, and has good support for interactive use. Julia provides asynchronous I/O, metaprogramming, debugging, logging, profiling, a package manager, and more. Julia is an open source project with over 1,000 contributors.</s> <pad> <pad>


## Translation

Now let's translate the text to German.

In [8]:
inputs = tokenizer.encode('translate English to German: ' + text, return_tensors='pt',
                          max_length=512, truncation=True)
outputs = model.generate(inputs, max_length=1500, min_length=20, length_penalty=1.0, num_beams=10,
                         num_return_sequences=3)

for i in range(outputs.shape[0]):
    print('\nResult', i + 1)
    print(tokenizer.decode(outputs[i]))


Result 1
<pad> Julia wurde von Anfang an für hohe Leistung entwickelt. Julia-Programme kompilieren zu effizientem nativen Code für mehrere Plattformen über LLVM. Julia ist dynamisch getippt, fühlt sich wie eine Skriptsprache und hat gute Unterstützung für interaktive Verwendung. Reproduzierbare Umgebungen ermöglichen es, die gleiche Julia-Umgebung jedes Mal, über Plattformen hinweg, mit vorgefertigten Binärdateien zu erschaffen. Julia</s>

Result 2
<pad> Julia wurde von Anfang an für hohe Performance entwickelt. Julia-Programme kompilieren zu effizientem nativen Code für mehrere Plattformen über LLVM. Julia ist dynamisch getippt, fühlt sich wie eine Skriptsprache und hat gute Unterstützung für interaktive Verwendung. Reproduzierbare Umgebungen ermöglichen es, die gleiche Julia-Umgebung jedes Mal, über Plattformen hinweg, mit vorgefertigten Binärdateien zu erschaffen. Julia verwendet</s>

Result 3
<pad> Julia wurde von Anfang an für hohe Leistung entwickelt. Julia-Programme kompilieren

## Grammar Checker

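Now let's check some grammar. The `cola sentence:` prefix asks T5 for a CoLA-style (Corpus of Linguistic Acceptability) judgment, which it answers with `acceptable` or `unacceptable`.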

In [9]:
sentence = 'This sentence do not be grammatical.'
inputs = tokenizer.encode('cola sentence: ' + sentence, return_tensors='pt')
outputs = model.generate(inputs)
print(tokenizer.decode(outputs[0]))

<pad> unacceptable</s>
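The decoded label above still carries T5's special tokens. If a boolean is more useful, one option is a small helper like the hypothetical `parse_cola_label` below. A substring test such as `"acceptable" in decoded` is not enough, because "acceptable" is also a substring of "unacceptable"; this sketch compares the whole cleaned label instead, assuming the model only ever answers with one of the two CoLA labels.

```python
def parse_cola_label(decoded: str) -> bool:
    """Map T5's decoded CoLA output to True (acceptable) or False (unacceptable).

    Hypothetical helper: it strips the <pad> and </s> markers that appear when
    tokenizer.decode() is called without skip_special_tokens=True.
    """
    label = decoded.replace("<pad>", "").replace("</s>", "").strip()
    if label == "acceptable":
        return True
    if label == "unacceptable":
        return False
    # Anything else means the model did not answer with a CoLA label.
    raise ValueError(f"unexpected CoLA output: {decoded!r}")


print(parse_cola_label("<pad> unacceptable</s>"))
print(parse_cola_label("acceptable"))
```

In the grammar-checking loop this would be called as `parse_cola_label(tokenizer.decode(outputs[0]))`.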


# Requirements

## Summarization

Cut and paste a news story of at least five paragraphs that describes the recent news about the COVID-19 vaccine developed by the University of Oxford and AstraZeneca.  Try at least three values for each of the parameters:

* `max_length`,
* `min_length`,
* `length_penalty`, and
* `num_beams`.

Copy and paste into a markdown cell what you consider to be the best summary of the news article.  Also, describe the effects of these four parameters on the results in at least four sentences.

[This article](https://github.com/huggingface/blog/blob/master/notebooks/02_how_to_generate.ipynb) will help you understand a bit more about these parameters.
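To make these parameters concrete, here is a toy, pure-Python beam search over a made-up next-token distribution (`toy_model` is hypothetical, and nothing here calls `transformers`). It is a simplified sketch, not Hugging Face's implementation: it keeps the `num_beams` best partial sequences by summed log-probability, forbids the end-of-sequence token before `min_length` tokens, stops at `max_length`, ranks finished sequences by `sum_logprobs / length ** length_penalty`, and returns the top `num_return_sequences`. The real `generate()` differs in details such as how many candidates it keeps per step and how lengths are counted.

```python
import math
from typing import Callable, Dict, List, Tuple

Tokens = Tuple[str, ...]
EOS = "</s>"


def beam_search(next_probs: Callable[[Tokens], Dict[str, float]],
                num_beams: int = 2, num_return_sequences: int = 1,
                max_length: int = 5, min_length: int = 0,
                length_penalty: float = 1.0) -> List[Tuple[Tokens, float]]:
    """Simplified beam search; returns (tokens, normalized score) pairs, best first."""
    if num_return_sequences > num_beams:
        raise ValueError("num_return_sequences must be <= num_beams")
    if max_length < 1:
        raise ValueError("max_length must be at least 1")

    def normalize(tokens: Tokens, logp: float) -> float:
        # Length normalization: a larger length_penalty favors longer outputs.
        return logp / len(tokens) ** length_penalty

    beams: List[Tuple[Tokens, float]] = [((), 0.0)]  # (prefix, summed log-prob)
    finished: List[Tuple[Tokens, float]] = []
    for _ in range(max_length):
        candidates = []
        for tokens, logp in beams:
            for tok, p in next_probs(tokens).items():
                if tok == EOS and len(tokens) < min_length:
                    continue  # min_length: ending is not allowed yet
                candidates.append((tokens + (tok,), logp + math.log(p)))
        # Keep only the num_beams most probable extensions.
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = []
        for tokens, logp in candidates[:num_beams]:
            if tokens[-1] == EOS:
                finished.append((tokens, normalize(tokens, logp)))
            else:
                beams.append((tokens, logp))
        if not beams:
            break
    # Sequences that reached max_length without EOS are scored as they are.
    finished += [(t, normalize(t, lp)) for t, lp in beams]
    finished.sort(key=lambda f: f[1], reverse=True)
    return finished[:num_return_sequences]


def toy_model(prefix: Tokens) -> Dict[str, float]:
    """Hypothetical next-token distribution: 'a' is likely until 3 tokens, then EOS."""
    return {"a": 0.9, EOS: 0.1} if len(prefix) < 3 else {EOS: 1.0}


for tokens, score in beam_search(toy_model, num_beams=2, num_return_sequences=2):
    print(" ".join(tokens), round(score, 3))
```

Raising `min_length` to 3 in this toy removes every hypothesis that ends early, and asking for more sequences than beams raises an error, mirroring the constraint `generate()` enforces for beam search.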

In [61]:
story2 = "A participant in AstraZeneca's COVID-19 vaccine trial in India claims he had an adverse reaction after receiving a shot of the coronavirus vaccine that is in late-stage testing, adding to a list of recent woes for the British drugmaker's experimental immunization.The Indian Council of Medical Research (ICMR), India's medical research regulator, is assisting an inquiry into the participant's allegation but told Reuters on Sunday there is currently \"no immediate cause of concern,\" nor are there any plans to halt the trial.The 40-year-old trial participant, who received the vaccine shots on Oct. 1 at a trial site in Chennai, India, said he experienced acute \"neurological and psychological\" side effects after he received the vaccine. He is seeking 50 million rupees—around $676,000—in compensation. The man also called for the testing, manufacturing, and distribution of the AstraZeneca vaccine to be \"stopped immediately.\"The Serum Institute of India, the vaccine manufacturer running the AstraZeneca vaccine trials in India, said in a statement to India's Economic Times that there is \"absolutely no correlation\" between the man's condition and the vaccine trial. It called the man's allegations \"malicious and misconceived\" and said it would seek around $13.5 million in damages for the allegations.The Serum Institute has already produced millions of doses of AstraZeneca's vaccine.The ICMR, the Serum Institute, AstraZeneca, and the Oxford Vaccine Group, which developed the vaccine with AstraZeneca, didn't immediately reply to Fortune’s requests for comment.AstraZeneca experienced a late-stage trial hiccup in September when it halted clinical trials across the globe because of a suspected adverse reaction in a U.K.-based trial participant.The vaccine's trials in the U.K. resumed on Sept. 12, four days after the suspension, following safety reviewers' confirmation that it was safe to do so; the Serum Institute received approval to resume trials on Sept. 16; U.S. trials resumed in October.The India trial participant's allegations follow last week's criticism of AstraZeneca for a perceived lack of transparency in its clinical trial analysis.On Nov. 23, AstraZeneca announced that an early analysis of its late-stage clinical trial data showed its COVID-19 vaccine candidate was either 62% or 90% effective, depending on how the doses were administered to participants. AstraZeneca's announcement followed COVID-19 vaccine trial results from Pfizer and Moderna, which had both reported efficacy rates of 90% and up.AstraZeneca's results were widely considered positive and promising, especially because its candidate is relatively cheap and easy to produce and a large portion of its doses are slated to go to low-income countries. It's also easier to transport and store than Pfizer's and Moderna's vaccines because it doesn't require ultralow storage temperatures.But days after its Nov. 23 news, AstraZeneca and Oxford came under fire for initially omitting some information about the trial results, including that that the 90% efficacy rate was discovered by mistake, when researchers unintentionally gave a group of participants half a dose of the vaccine instead of the full dose.AstraZeneca defended its results and methods, saying it used the \"highest standards\" and that it would carry out further analysis."
inputs = tokenizer.encode("summarize: " + story2, return_tensors="pt", truncation=False)

In [11]:
outputs1 = model.generate(inputs, max_length=1000, min_length=100, length_penalty=1.0, num_beams=4, num_return_sequences=1)

In [75]:
print("Summary 1:\n"+tokenizer.decode(outputs1[0], skip_special_tokens=True))

Summary 1:
the 40-year-old trial participant claims he experienced acute "neurological and psychological" side effects after he received the vaccine. he is seeking 50 million rupees—around $676,000—in compensation. the vaccine manufacturer says there is "absolutely no correlation" between the man's condition and the vaccine trial. astraZeneca experienced a late-stage trial hiccup in September when it halted clinical trials across the globe.


In [13]:
outputs2 = model.generate(inputs, max_length=200, min_length=50, length_penalty=1.0, num_beams=4, num_return_sequences=1)

In [74]:
print("Summary 2:\n"+tokenizer.decode(outputs2[0], skip_special_tokens=True))

Summary 2:
the 40-year-old trial participant claims he experienced acute "neurological and psychological" side effects after receiving the vaccine. he is seeking 50 million rupees—around $676,000—in compensation. he also called for the testing, manufacturing, and distribution of the vaccine to be "stopped immediately"


In [15]:
outputs3 = model.generate(inputs, max_length=150, min_length=20, length_penalty=0.75, num_beams=7, num_return_sequences=1)

In [73]:
print("Summary 3:\n"+tokenizer.decode(outputs3[0], skip_special_tokens=True))

Summary 3:
the 40-year-old man received the vaccine shots on Oct. 1 at a trial site in Chennai, india. he said he experienced acute "neurological and psychological" side effects after he received the vaccine. he is seeking 50 million rupees—around $676,000—in compensation.


In [17]:
outputs4 = model.generate(inputs, max_length=500, min_length=25, length_penalty=0.25, num_beams=5, num_return_sequences=1)

In [72]:
print("Summary 4:\n"+tokenizer.decode(outputs4[0], skip_special_tokens=True))

Summary 4:
the 40-year-old trial participant claims he experienced acute "neurological and psychological" side effects after he received the vaccine. he is seeking 50 million rupees—around $676,000—in compensation. the vaccine manufacturer says there is "absolutely no correlation" between the man's condition and the vaccine trial. astraZeneca experienced a late-stage trial hiccup in september when it halted clinical trials across the globe because of a suspected adverse reaction..... ­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­

In [68]:
outputs4 = model.generate(inputs, max_length=3000, min_length=1000, length_penalty=1.25, num_beams=4, num_return_sequences=1)

In [71]:
print("Summary 4:\n"+tokenizer.decode(outputs4[0], skip_special_tokens=True))

Summary 4:
the 40-year-old trial participant claims he experienced acute "neurological and psychological" side effects after he received the vaccine. he is seeking 50 million rupees—around $676,000—in compensation. the vaccine manufacturer says there is "absolutely no correlation" between the man's condition and the vaccine trial. astraZeneca experienced a late-stage trial hiccup in september when it halted clinical trials across the globe because of a suspected adverse reaction..... ­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­­

In [80]:
print(len(outputs1[0]),len(outputs2[0]),len(outputs3[0]),len(outputs4[0]))

108 71 71 3000


### Best Summary

the 40-year-old trial participant claims he experienced acute "neurological and psychological" side effects after he received the vaccine. he is seeking 50 million rupees—around $676,000—in compensation. the vaccine manufacturer says there is "absolutely no correlation" between the man's condition and the vaccine trial. astraZeneca experienced a late-stage trial hiccup in September when it halted clinical trials across the globe.

For the summarization portion of the assignment, I found that past a certain length the output stopped being a usable summary: after the summary text, it filled up with a long run of repeated filler characters, as shown in the two "Summary 4" printouts above. (Because In [68] reassigned `outputs4` before In [71] and In [72] were run, both printouts show the `max_length=3000`, `min_length=1000` run, which is also why `len(outputs4[0])` is 3000.) As such, I attempted to generate shorter summaries using smaller values for `length_penalty` and `max_length`.  Overall, I observed shorter summaries when using smaller values of `length_penalty` (< 1.0).  I also observed that `min_length` had a much clearer effect on the length of the summaries than `max_length` did. For example, although Summary 1 and Summary 2 had very different values of `max_length` (1000 and 200), their lengths (108 and 71 tokens) were much closer to their `min_length` values (100 and 50).  I think that using a higher value for `num_beams`, as in Summary 3 (`num_beams=7`), may have resulted in some phrases being substituted that would not have been otherwise.
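One way to see why smaller `length_penalty` values gave shorter summaries: as I understand Hugging Face's beam search, finished hypotheses are ranked by their summed log-probability divided by their length raised to `length_penalty` (treat this formula as an assumption about the library, and note it only matters when `num_beams > 1`). Because log-probabilities are negative, a longer sequence accumulates a more negative sum, and a larger exponent offsets that more strongly. The two hypotheses below are made up for illustration.

```python
def normalized_score(sum_logprobs: float, length: int, length_penalty: float) -> float:
    """Beam-search score after length normalization: sum_logprobs / length**length_penalty."""
    return sum_logprobs / length ** length_penalty


# Two made-up finished hypotheses: (sum of token log-probabilities, length in tokens).
short_hyp = (-5.0, 10)
long_hyp = (-8.0, 20)

for lp in (0.0, 0.25, 1.0, 2.0):
    short_score = normalized_score(*short_hyp, lp)
    long_score = normalized_score(*long_hyp, lp)
    winner = "short" if short_score > long_score else "long"
    print(f"length_penalty={lp}: short={short_score:.3f} long={long_score:.3f} -> {winner}")
```

With these numbers the shorter hypothesis wins at `length_penalty` 0.0 and 0.25, and the longer one wins at 1.0 and 2.0.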

## Translation

Try translating the first paragraph of your news story into German.  Use `num_return_sequences=5` and translate the German back to English using [translate.google.com](https://translate.google.com/).  Experiment with at least three values for the above four parameters.  Using the Google translations, describe which German translation is best and which parameter values led to its generation.
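Trying several values of four parameters quickly multiplies into many runs. One way to keep such a sweep organized is a small generator of keyword-argument dicts, sketched below as the hypothetical `param_grid`. It assumes beam search without sampling, where `generate()` rejects `num_return_sequences` larger than `num_beams`; the values are the ones used in this notebook, arranged as a grid.

```python
from itertools import product
from typing import Dict, Iterator, List


def param_grid(**values: List) -> Iterator[Dict]:
    """Yield each combination of generate() keyword values as a kwargs dict,
    skipping combinations that beam search rejects or that contradict each other."""
    keys = list(values)
    for combo in product(*(values[k] for k in keys)):
        kwargs = dict(zip(keys, combo))
        if kwargs.get("min_length", 0) > kwargs.get("max_length", float("inf")):
            continue  # a minimum above the maximum is contradictory
        if kwargs.get("num_return_sequences", 1) > kwargs.get("num_beams", 1):
            continue  # beam search cannot return more sequences than beams
        yield kwargs


grid = list(param_grid(max_length=[400, 1000, 1500], min_length=[20, 50, 100],
                       length_penalty=[0.4, 1.0, 1.5], num_beams=[5, 6, 7],
                       num_return_sequences=[5]))
print(len(grid), grid[0])
```

Each dict can be passed straight into the call used below, e.g. `model.generate(german_inputs, **kwargs)`. A full grid of 81 runs with `t5-base` is slow, so varying one parameter at a time from a baseline may be more practical.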

In [33]:
inputs = tokenizer.encode('translate English to German: ' + text, return_tensors='pt',
                          max_length=512, truncation=True)
outputs = model.generate(inputs, max_length=1500, min_length=20, length_penalty=1.0, num_beams=10,
                         num_return_sequences=3)

In [34]:
first_paragraph = "A participant in AstraZeneca's COVID-19 vaccine trial in India claims he had an adverse reaction after receiving a shot of the coronavirus vaccine that is in late-stage testing, adding to a list of recent woes for the British drugmaker's experimental immunization.The Indian Council of Medical Research (ICMR), India's medical research regulator, is assisting an inquiry into the participant's allegation but told Reuters on Sunday there is currently \"no immediate cause of concern,\" nor are there any plans to halt the trial. The 40-year-old trial participant, who received the vaccine shots on Oct. 1 at a trial site in Chennai, India, said he experienced acute \"neurological and psychological\" side effects after he received the vaccine. He is seeking 50 million rupees—around $676,000—in compensation. The man also called for the testing, manufacturing, and distribution of the AstraZeneca vaccine to be \"stopped immediately.\""
german_inputs = tokenizer.encode('translate English to German: ' + first_paragraph, return_tensors='pt', truncation=False)

In [35]:
german_outputs1 = model.generate(german_inputs, max_length=1500, min_length=20, length_penalty=1.0, num_beams=5, num_return_sequences=5)

In [76]:
for i in range(german_outputs1.shape[0]):
    print('\nResult', i + 1)
    print(tokenizer.decode(german_outputs1[i], skip_special_tokens=True))


Result 1
Ein Teilnehmer an AstraZeneca's COVID-19 Impfstoff-Studie in Indien behauptet, er habe eine unerwünschte Reaktion erlitten, nachdem er einen Schuss des Impfstoffs für das Koronavirus erhalten hatte, der in der späten Phase der Tests getestet wird. Der indische Rat für medizinische Forschung (ICMR), Indiens Aufsichtsbehörde für medizinische Forschung, unterstützt eine Untersuchung der Behauptung des

Result 2
Ein Teilnehmer an AstraZeneca's COVID-19 Impfstoff-Studie in Indien behauptet, er habe eine unerwünschte Reaktion erlitten, nachdem er einen Schuss des Impfstoffs für das Koronavirus erhalten hatte, der in der späten Phase der Tests getestet wird. Der indische Rat für medizinische Forschung (ICMR), Indiens Aufsichtsbehörde für medizinische Forschung, unterstützt die Untersuchung der Behauptung des

Result 3
Ein Teilnehmer an AstraZeneca's COVID-19 Impfstoff-Studie in Indien behauptet, er habe eine unerwünschte Reaktion erlitten, nachdem er einen Schuss des Impfstoffs für 

In [37]:
german_outputs2 = model.generate(german_inputs, max_length=1000, min_length=100, length_penalty=1.5, num_beams=6, num_return_sequences=5)

In [77]:
for i in range(german_outputs2.shape[0]):
    print('\nResult', i + 1)
    print(tokenizer.decode(german_outputs2[i], skip_special_tokens=True))


Result 1
Ein Teilnehmer an AstraZeneca's COVID-19 Impfstoff-Studie in Indien behauptet, er habe eine unerwünschte Reaktion erlitten, nachdem er eine Poke des Impfstoffs Coronavirus erhalten hatte, der in der letzten Phase der Tests getestet wird. Der indische Rat für medizinische Forschung (ICMR), Indiens Aufsichtsbehörde für medizinische Forschung, unterstützt eine Untersuchung der Behauptung des Teilnehmers,

Result 2
Ein Teilnehmer an AstraZeneca's COVID-19 Impfstoff-Studie in Indien behauptet, er habe eine unerwünschte Reaktion erlitten, nachdem er eine Poke des Impfstoffs Coronavirus erhalten hatte, der in der letzten Phase der Tests getestet wird. Der indische Rat für medizinische Forschung (ICMR), die indische Aufsichtsbehörde für medizinische Forschung, unterstützt eine Untersuchung der Behauptung des Teilnehmers

Result 3
Ein Teilnehmer an AstraZeneca's COVID-19 Impfstoff-Studie in Indien behauptet, er habe eine unerwünschte Reaktion erlitten, nachdem er eine Poke des Impfsto

In [51]:
german_outputs3 = model.generate(german_inputs, max_length=400, min_length=50, length_penalty=0.4, num_beams=7, num_return_sequences=5)

In [78]:
for i in range(german_outputs3.shape[0]):
    print('\nResult', i + 1)
    print(tokenizer.decode(german_outputs3[i], skip_special_tokens=True))


Result 1
Ein Teilnehmer an AstraZeneca's COVID-19 Impfstoff-Studie in Indien behauptet, er habe eine unerwünschte Reaktion erlitten, nachdem er einen Schuss des Impfstoffs für den Koronavirus erhalten hatte, der in der späten Phase der Tests getestet wird. Der indische Rat für medizinische Forschung (ICMR), Indiens Aufsichtsbehörde für medizinische Forschung, unterstützt eine Untersuchung der Behauptung des

Result 2
Ein Teilnehmer an AstraZeneca's COVID-19 Impfstoff-Studie in Indien behauptet, er habe eine unerwünschte Reaktion erlitten, nachdem er einen Schuss des Impfstoffs für das Koronavirus erhalten hatte, der in der späten Phase der Tests getestet wird. Der indische Rat für medizinische Forschung (ICMR), Indiens Aufsichtsbehörde für medizinische Forschung, unterstützt eine Untersuchung der Behauptung des

Result 3
Ein Teilnehmer an AstraZeneca's COVID-19 Impfstoff-Studie in Indien behauptet, er habe eine unerwünschte Reaktion erlitten, nachdem er einen Schuss des Impfstoffs für

### Google Translations

german_outputs1:

    Result 1: A participant in AstraZeneca's COVID-19 vaccine trial in India claims he suffered an adverse reaction after receiving a shot of the vaccine for the coronavirus that is being tested in the late stage of testing. India's Medical Research Council (ICMR), India's regulatory agency for medical research, is backing an investigation into the claim of the
    
    Result 2: A participant in AstraZeneca's COVID-19 vaccine trial in India claims he suffered an adverse reaction after receiving a shot of the vaccine for the coronavirus that is being tested in the late stage of testing. India's Medical Research Council (ICMR), India's regulatory agency for medical research, supports the investigation into the claim of the
    
    Result 3: A participant in AstraZeneca's COVID-19 vaccine trial in India claims he suffered an adverse reaction after receiving a shot of the vaccine for the coronavirus that is being tested in the late stage of testing. India's Medical Research Council (ICMR), India's regulatory agency for medical research, is backing an investigation into the claim
    
    Result 4: A participant in AstraZeneca's COVID-19 vaccine trial in India claims he suffered an adverse reaction after receiving a shot of the vaccine for the coronavirus that is being tested in the late stage of testing. The Indian Council for Medical Research (ICMR), India's regulatory agency for medical research, is supporting an investigation into the allegation of the
    
    Result 5: A participant in AstraZeneca's COVID-19 vaccine trial in India claims he suffered an adverse reaction after receiving a shot of the vaccine for the coronavirus that is being tested in the late stage of testing. India's Medical Research Council (ICMR), India's medical research regulatory agency, is backing an investigation into the claim

german_outputs2:

    Result 1: A participant in AstraZeneca's COVID-19 vaccine trial in India claims he suffered an adverse reaction after receiving a poke of the coronavirus vaccine being tested in the final stage of the tests. India's Medical Research Council (ICMR), India's regulatory agency for medical research, is backing an investigation into the participant's claim
    
    Result 2: A participant in AstraZeneca's COVID-19 vaccine trial in India claims he suffered an adverse reaction after receiving a poke of the coronavirus vaccine being tested in the final stage of the tests. Indian Medical Research Council (ICMR), the Indian regulatory agency for medical research, is backing an investigation into the participant's claim
    
    Result 3: A participant in AstraZeneca's COVID-19 vaccine trial in India claims he suffered an adverse reaction after receiving a poke of the coronavirus vaccine being tested in the final stage of the tests. The Indian Council for Medical Research (ICMR), India's regulatory agency for medical research, supports the investigation into the contestant's claim that
    
    Result 4: A participant in AstraZeneca's COVID-19 vaccine trial in India claims he suffered an adverse reaction after receiving a poke of the coronavirus vaccine being tested in the final stage of the tests. The Indian Medical Research Council (ICMR), India's regulatory agency for medical research, is supporting an investigation into the participant's claim
    
    Result 5: A participant in AstraZeneca's COVID-19 vaccine trial in India claims he suffered an adverse reaction after receiving a poke of the coronavirus vaccine being tested in the final stage of the tests. The Indian Medical Research Council (ICMR), the Indian regulatory agency for medical research, is supporting the investigation into the participant's claim
    
german_outputs3:

    Result 1: A participant in AstraZeneca's COVID-19 vaccine study in India claims he suffered an adverse reaction after receiving a shot of the vaccine for the coronavirus that is being tested in the late stage of testing. India's Medical Research Council (ICMR), India's regulatory agency for medical research, is backing an investigation into the claim of the
    
    Result 2: A participant in AstraZeneca's COVID-19 vaccine trial in India claims he suffered an adverse reaction after receiving a shot of the vaccine for the coronavirus that is being tested in the late stage of testing. India's Medical Research Council (ICMR), India's regulatory agency for medical research, is backing an investigation into the claim of the
    
    Result 3: A participant in AstraZeneca's COVID-19 vaccine trial in India claims he suffered an adverse reaction after receiving a shot of the vaccine for the coronavirus that is being tested in the late stage of testing. India's Medical Research Council (ICMR), India's regulatory agency for medical research, supports the investigation into the claim of the
    
    Result 4: A participant in AstraZeneca's COVID-19 vaccine study in India claims he suffered an adverse reaction after receiving a shot of the vaccine for the coronavirus that is being tested in the late stage of testing. India's Medical Research Council (ICMR), India's regulatory agency for medical research, supports the investigation into the claim of the
    
    Result 5: A participant in AstraZeneca's COVID-19 vaccine trial in India claims he suffered an adverse reaction after receiving a shot of the vaccine for the coronavirus that is being tested in the late stage of testing. The Indian Council for Medical Research (ICMR), India's regulatory agency for medical research, is supporting an investigation into the allegation of the
    
### Best Translation

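I think the best translation is Result 5 of `german_outputs3`, generated with `max_length=400`, `min_length=50`, `length_penalty=0.4`, and `num_beams=7`.  Its back-translation keeps "shot," which every result in `german_outputs2` rendered as "poke," and "allegation," which most other results rendered as "claim."  It also gives the regulator's name as "Indian Council for Medical Research," close to the original "Indian Council of Medical Research."  I suspect this was helped by `num_beams=7` being the highest value I tried, but that explanation is tentative: Result 4 of `german_outputs1` (`num_beams=5`) back-translates to exactly the same English.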
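Judging back-translations by eye works for a handful of runs, but a crude automatic check can help when there are many. The hypothetical `unigram_f1` below scores word overlap between a back-translation and the original English. It is a rough stand-in, not BLEU: it ignores word order and counts synonyms such as "claim" and "allegation" as misses. Every back-translation here is cut off partway through the paragraph, which depresses recall for all of them, so the scores are only meaningful relative to each other.

```python
import re
from collections import Counter


def _words(text: str) -> Counter:
    """Lowercased word counts; punctuation is dropped, apostrophes are kept."""
    return Counter(re.findall(r"[a-z0-9']+", text.lower()))


def unigram_f1(candidate: str, reference: str) -> float:
    """Harmonic mean of word-overlap precision and recall between two strings."""
    cand, ref = _words(candidate), _words(reference)
    overlap = sum((cand & ref).values())  # clipped counts of shared words
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


print(unigram_f1("a poke of the vaccine", "a shot of the vaccine"))
```

To rank results, something like `max(back_translations, key=lambda t: unigram_f1(t, first_paragraph))` would work, where `back_translations` is a hypothetical list of the Google Translate outputs pasted in as strings.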

## Grammar Checker

Write a for loop that checks the grammatical correctness of each sentence in a list of sentences.  Apply it to the first paragraph of your news article.  Describe the results.

Now modify at least three of the sentences in your paragraph to make the sentences grammatically incorrect and repeat the analysis of all sentences.  Describe the results. Are your grammatically incorrect sentences correctly identified?
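Rather than typing the sentences into a list by hand, the paragraph could be split automatically. Below is a naive regex splitter (hypothetical `split_sentences`, a sketch rather than a real sentence tokenizer). It splits after `.`, `!` or `?` (optionally followed by a closing quote) when the next character is an uppercase letter, with zero or more spaces in between, so `first_paragraph` as defined above, including its run-together "immunization.The", should come apart into the same five sentences as `sentence_list`. It will still wrongly split initials and titles such as "U.K." or "Dr. Smith".

```python
import re
from typing import List

# A boundary is the (possibly empty) whitespace after ., ! or ?, optionally
# followed by a closing quote, when the next character is an uppercase letter.
# Allowing zero whitespace also splits run-together text like "immunization.The".
_BOUNDARY = re.compile(r'(?:(?<=[.!?])|(?<=[.!?]["”]))\s*(?=[A-Z])')


def split_sentences(paragraph: str) -> List[str]:
    """Naive regex sentence splitter (a sketch, not a real sentence tokenizer)."""
    return [s.strip() for s in _BOUNDARY.split(paragraph) if s.strip()]


print(split_sentences('It rained.The sun set on Oct. 1. He said "stop." Then he left.'))
```

The loop would then iterate over `split_sentences(first_paragraph)` instead of a hand-written list.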

In [54]:
sentence_list = ["A participant in AstraZeneca's COVID-19 vaccine trial in India claims he had an adverse reaction after receiving a shot of the coronavirus vaccine that is in late-stage testing, adding to a list of recent woes for the British drugmaker's experimental immunization.", "The Indian Council of Medical Research (ICMR), India's medical research regulator, is assisting an inquiry into the participant's allegation but told Reuters on Sunday there is currently \"no immediate cause of concern,\" nor are there any plans to halt the trial.", "The 40-year-old trial participant, who received the vaccine shots on Oct. 1 at a trial site in Chennai, India, said he experienced acute \"neurological and psychological\" side effects after he received the vaccine.", "He is seeking 50 million rupees—around $676,000—in compensation.", "The man also called for the testing, manufacturing, and distribution of the AstraZeneca vaccine to be \"stopped immediately.\""]
#first_paragraph = "A participant in AstraZeneca's COVID-19 vaccine trial in India claims he had an adverse reaction after receiving a shot of the coronavirus vaccine that is in late-stage testing, adding to a list of recent woes for the British drugmaker's experimental immunization.The Indian Council of Medical Research (ICMR), India's medical research regulator, is assisting an inquiry into the participant's allegation but told Reuters on Sunday there is currently \"no immediate cause of concern,\" nor are there any plans to halt the trial. The 40-year-old trial participant, who received the vaccine shots on Oct. 1 at a trial site in Chennai, India, said he experienced acute \"neurological and psychological\" side effects after he received the vaccine. He is seeking 50 million rupees—around $676,000—in compensation. The man also called for the testing, manufacturing, and distribution of the AstraZeneca vaccine to be \"stopped immediately.\""
for sentence in sentence_list:
    inputs = tokenizer.encode('cola sentence: ' + sentence, return_tensors='pt')
    outputs = model.generate(inputs)
    print(sentence, "is grammatically", tokenizer.decode(outputs[0]), "\n")

A participant in AstraZeneca's COVID-19 vaccine trial in India claims he had an adverse reaction after receiving a shot of the coronavirus vaccine that is in late-stage testing, adding to a list of recent woes for the British drugmaker's experimental immunization. is grammatically <pad> acceptable</s> 

The Indian Council of Medical Research (ICMR), India's medical research regulator, is assisting an inquiry into the participant's allegation but told Reuters on Sunday there is currently "no immediate cause of concern," nor are there any plans to halt the trial. is grammatically <pad> acceptable</s> 

The 40-year-old trial participant, who received the vaccine shots on Oct. 1 at a trial site in Chennai, India, said he experienced acute "neurological and psychological" side effects after he received the vaccine. is grammatically <pad> acceptable</s> 

He is seeking 50 million rupees—around $676,000—in compensation. is grammatically <pad> acceptable</s> 

The man also called for the testi

In [60]:
bad_sentence_list = ["A participant in AstraZeneca's COVID-19 vaccine trial in India claims he had an adverse reactions after received a shots of the coronavirus vaccine that is in late-stage testing, adding to a list of recent woes for the British drugmaker's experimental immunization.", "The Indian Council of Medical Research (ICMR), India's medical research regulator, is assist an inquiry into the participant's allegation but did told Reuters on Sunday there is currently \"no immediate cause of concern,\" nor are there any plans to halting the trial.", "The 40-year-old trial participant, who will received the vaccine shots on Oct. 1 at a trial site in Chennai, India, said he did experienced acute \"neurological and psychological\" side effects after he will received the vaccine.", "He is be seeking 50 million rupees—around $676,000—in compensation.", "The man will also called for the testing, manufacturings, and distributings of the AstraZeneca vaccine to be \"stopped immediately.\""]
for sentence in bad_sentence_list:
    inputs = tokenizer.encode('cola sentence: ' + sentence, return_tensors='pt')
    outputs = model.generate(inputs)
    print(sentence, "is grammatically", tokenizer.decode(outputs[0]), "\n")

A participant in AstraZeneca's COVID-19 vaccine trial in India claims he had an adverse reactions after received a shots of the coronavirus vaccine that is in late-stage testing, adding to a list of recent woes for the British drugmaker's experimental immunization. is grammatically <pad> acceptable</s> 

The Indian Council of Medical Research (ICMR), India's medical research regulator, is assist an inquiry into the participant's allegation but did told Reuters on Sunday there is currently "no immediate cause of concern," nor are there any plans to halting the trial. is grammatically <pad> acceptable</s> 

The 40-year-old trial participant, who will received the vaccine shots on Oct. 1 at a trial site in Chennai, India, said he did experienced acute "neurological and psychological" side effects after he will received the vaccine. is grammatically <pad> acceptable</s> 

He is be seeking 50 million rupees—around $676,000—in compensation. is grammatically <pad> unacceptable</s> 

The man w

### Grammar Checker Results

The grammar-checking for loop I wrote found all of the original sentences to be grammatically acceptable.  I then modified each of the five sentences to be grammatically incorrect and ran the same loop on the modified sentences.  The loop identified only one of them, sentence 4, as grammatically unacceptable, so four of the five ungrammatical sentences were incorrectly labeled acceptable.

# Extra Credit

Read some of the online documentation and examples that describe how to fine-tune the T5 model to do better English-to-German and German-to-English translation.  Try fine-tuning the T5 model used here on example translations.  Does it perform better?

Warning: this will take a lot of time to figure out.  First, try to find online examples of fine-tuning the model.
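As a starting point, the training data has to be in the same text-to-text form used above: a task prefix plus the source text as input, and the target text as the label. The hypothetical `to_t5_examples` below builds such pairs; the example sentences are made up. The German-to-English prefix is an assumption for the fine-tuning setup (the prompts in this notebook only use English to German), and the tokenization and training loop (for example, passing tokenized targets as `labels` to `T5ForConditionalGeneration`) are not shown.

```python
from typing import Dict, List, Tuple


def to_t5_examples(pairs: List[Tuple[str, str]], src_lang: str = "English",
                   tgt_lang: str = "German") -> List[Dict[str, str]]:
    """Turn (source, target) sentence pairs into T5-style text-to-text examples."""
    prefix = f"translate {src_lang} to {tgt_lang}: "
    return [{"input_text": prefix + src, "target_text": tgt} for src, tgt in pairs]


# Made-up English/German pair, for illustration only.
en_de = [("The vaccine is safe.", "Der Impfstoff ist sicher.")]
forward = to_t5_examples(en_de)
# Swap each pair to train the reverse direction as well.
backward = to_t5_examples([(de, en) for en, de in en_de], "German", "English")
print(forward + backward)
```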