#### Prerequisites

In [None]:
%%capture 

!pip install transformers==4.18.0
!pip install datasets==2.4.0

### Imports 

In [2]:
from transformers import GPT2LMHeadModel
from transformers import GPT2Tokenizer
from transformers import pipeline
from datasets import load_metric 
import transformers
import pandas as pd
import datasets
import logging

In [3]:
pd.options.display.max_colwidth = 100

##### Setup logging

In [4]:
logger = logging.getLogger('sagemaker')
logger.setLevel(logging.DEBUG)
logger.addHandler(logging.StreamHandler())
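The cell above calls `addHandler` unconditionally, so re-running it in the same kernel attaches a second `StreamHandler`, and every log line is then printed twice. Below is a minimal sketch of an idempotent setup. It assumes the same logger name, `'sagemaker'`. The helper name `get_notebook_logger` is hypothetical.

```python
import logging


def get_notebook_logger(name: str = 'sagemaker') -> logging.Logger:
    """Return a DEBUG-level logger that carries exactly one StreamHandler.

    Calling addHandler() on every cell run stacks handlers, which duplicates
    each message once per run; this helper only adds one if none exists.
    """
    logger = logging.getLogger(name)
    logger.setLevel(logging.DEBUG)
    # Attach a StreamHandler only when the logger does not already have one
    if not any(isinstance(h, logging.StreamHandler) for h in logger.handlers):
        logger.addHandler(logging.StreamHandler())
    return logger


logger = get_notebook_logger()
logger = get_notebook_logger()  # simulates re-running the cell: still one handler
```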

##### Log versions of dependencies 

In [5]:
logger.info(f'[Using transformers: {transformers.__version__}]')
logger.info(f'[Using datasets: {datasets.__version__}]')

[Using transformers: 4.18.0]
[Using datasets: 2.4.0]


### Copy candidate models from S3 to local for evaluation 

In [None]:
!aws s3 cp s3://sagemaker-us-east-1-119174016168/model/custom/ ./models/pretrained-from-scratch/ --recursive
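`pipeline(..., model='./models/pretrained-from-scratch/')` fails with a fairly indirect error if the copy above was partial. Below is a small sketch of a pre-flight check that can be run before loading. The list of expected files is an assumption based on the default file names used by `save_pretrained` in transformers 4.x. Tokenizer files are deliberately not checked, because their names depend on the tokenizer type. `missing_model_files` is a hypothetical helper.

```python
from pathlib import Path

# Assumed layout of a saved Hugging Face model directory; adjust to match
# what was actually uploaded to S3 for the custom model.
REQUIRED_FILES = ('config.json',)
WEIGHT_FILES = ('pytorch_model.bin', 'pytorch_model.bin.index.json',
                'tf_model.h5', 'flax_model.msgpack')


def missing_model_files(model_dir: str) -> list:
    """Return descriptions of expected files that are absent from model_dir."""
    root = Path(model_dir)
    missing = [name for name in REQUIRED_FILES if not (root / name).is_file()]
    # At least one weights file (in any framework format) must be present
    if not any((root / name).is_file() for name in WEIGHT_FILES):
        missing.append(' or '.join(WEIGHT_FILES))
    return missing
```

For example, `assert not missing_model_files('./models/pretrained-from-scratch/')` stops the notebook early if the download is incomplete.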

### Load candidate models for evaluation 

##### Load Out-of-the-box (OOB) GPT2

In [6]:
oob_gpt2 = pipeline('text-generation', model='gpt2')

##### Load OOB GPT2 fine-tuned on covid news articles 

##### Load custom GPT2 pre-trained from scratch on covid news articles

In [7]:
custom_gpt2 = pipeline('text-generation', model='./models/pretrained-from-scratch/')

### Evaluate the candidate models against the reference articles

In [8]:
ref_df = pd.read_csv('./data/test_articles.csv', names=['reference_article', 'prompt'])
ref_df

| | reference_article | prompt |
|---|---|---|
| 0 | On Tuesday, Dr. Fauci and other health officials testified before the U.S. House Energy and Comm... | On Tuesday, Dr. Fauci and other health officials |
| 1 | Pfizer Inc. on Wednesday reported results from two late-stage studies ahead of schedule as it pu... | Pfizer Inc. on Wednesday reported results |
| 2 | President Donald Trump said the U.S. has the outbreak of the coronavirus under control and has b... | President Donald Trump said the U.S. has the outbreak |
| 3 | President Joe Biden has signed a flurry of executive orders, actions and memorandums aimed at ra... | President Joe Biden has signed a flurry of executive orders |
| 4 | China is effectively in a lockdown. From big cities to little villages, almost every community i... | China is effectively in a lockdown. |
| 5 | Australian biotech Mesoblast has been riding high on expectations for its COVID-19 treatment, li... | Australian biotech Mesoblast has been riding high on expectations for its |
| 6 | The December, 2019 coronavirus disease outbreak has seen many countries ask people who have pote... | The December, 2019 coronavirus disease outbreak has seen many countries ask people who have pote... |
| 7 | The first confirmed case of coronavirus in India was reported today ( Jan. 30) in the southern s... | The first confirmed case of coronavirus in India was |
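In every row shown, the prompt appears to be the opening words of its reference article. The fair comparison target for a model's continuation is therefore the rest of the article after the prompt. Below is a minimal sketch that checks this assumption and extracts the remainder. `reference_continuation` is a hypothetical helper, not part of the original notebook.

```python
def reference_continuation(reference_article: str, prompt: str) -> str:
    """Return the part of the reference article that follows the prompt.

    Raises ValueError if the prompt is not a prefix of the article, since the
    generated and reference texts could then not be aligned for comparison.
    """
    if not reference_article.startswith(prompt):
        raise ValueError(f'Prompt is not a prefix of its reference article: {prompt!r}')
    return reference_article[len(prompt):].lstrip()
```

Applied to the DataFrame, this could look like `[reference_continuation(a, p) for a, p in zip(ref_df['reference_article'], ref_df['prompt'])]`.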


In [9]:
ref_df['reference_article'][0]

'On Tuesday, Dr. Fauci and other health officials testified before the U.S. House Energy and Commerce Committee to discuss how the administration has been handling the coronavirus outbreak. Yahoo Finance’s Anjalee Khemlani breaks down the latest news about the coronavirus on The Final Round.'
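`load_metric` is imported at the top of the notebook but is never called in this section, so the comparison below stays qualitative. Below is a dependency-free sketch of a ROUGE-1-style score: F1 over clipped unigram overlap. It is a simplified stand-in, not the `datasets` `rouge` metric, and its lowercase alphanumeric tokenization is an assumption. Lowercasing matters here because the custom model emits lowercase text, and a case-sensitive comparison would penalize it for casing alone. `unigram_f1` is a hypothetical name.

```python
import re
from collections import Counter


def unigram_f1(candidate: str, reference: str) -> float:
    """ROUGE-1-style F1 between two texts using clipped unigram counts."""
    cand = Counter(re.findall(r'[a-z0-9]+', candidate.lower()))
    ref = Counter(re.findall(r'[a-z0-9]+', reference.lower()))
    # Counter intersection keeps min(count_in_cand, count_in_ref) per token
    overlap = sum((cand & ref).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)
```

Inside the evaluation loop, a call such as `unigram_f1(custom_gpt2_response, ref_article)` would give one number per model and article.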

In [10]:
# Shared decoding settings so both models are compared under identical conditions
gen_kwargs = dict(num_return_sequences=1, max_length=300, repetition_penalty=10.0, top_k=1, top_p=1.0)

for ref_article, prompt in ref_df[['reference_article', 'prompt']].itertuples(index=False):
    # Each pipeline returns a list of dicts; take the text of the single sequence
    custom_gpt2_response = custom_gpt2(prompt, **gen_kwargs)[0]['generated_text']
    oob_gpt2_response = oob_gpt2(prompt, **gen_kwargs)[0]['generated_text']
    print(f'Prompt: {prompt}\n')
    print(f'Ref article: {ref_article}\n')
    print(f'Custom GPT2 Response: {custom_gpt2_response}\n')
    print(f'OOB GPT2 Response: {oob_gpt2_response}')
    print('-' * 200)

Setting `pad_token_id` to `eos_token_id`:0 for open-end generation.


[2023-01-26 03:32:08.240 pytorch-1-8-gpu-py3-ml-g4dn-xlarge-60bd0d07a83be181dcf7335baae2:3606 INFO utils.py:27] RULE_JOB_STOP_SIGNAL_FILENAME: None
[2023-01-26 03:32:08.273 pytorch-1-8-gpu-py3-ml-g4dn-xlarge-60bd0d07a83be181dcf7335baae2:3606 INFO profiler_config_parser.py:102] Unable to find config at /opt/ml/input/config/profilerconfig.json. Profiler is disabled.


Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.
Setting `pad_token_id` to `eos_token_id`:0 for open-end generation.


Prompt: On Tuesday, Dr. Fauci and other health officials

Ref article: On Tuesday, Dr. Fauci and other health officials testified before the U.S. House Energy and Commerce Committee to discuss how the administration has been handling the coronavirus outbreak. Yahoo Finance’s Anjalee Khemlani breaks down the latest news about the coronavirus on The Final Round.

Custom GPT2 Response: On Tuesday, Dr. Fauci and other health officials in the city of tulsa have been working to develop a covid-19 testing site for residents who are asymptomatic or had mild symptoms but do not require hospitalization due directly by their employer ( e) - they will be tested at least once every two weeks starting on mondayth june 1st with results expected within 24 hours after that date; this is an important step towards reopening oklahoma’ s economy as it begins its phased reemergence from lockdown measures following months when cases were still low comparedwith many states across america where new infections 

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.
Setting `pad_token_id` to `eos_token_id`:0 for open-end generation.


Prompt: Pfizer Inc. on Wednesday reported results

Ref article: Pfizer Inc. on Wednesday reported results from two late-stage studies ahead of schedule as it put off its March 31 investor day amid the coronavirus outbreak. The drugmaker said its experimental treatment, abrocitinib, was effective in treating atopic dermatitis in combination with topical therapies in a late-stage study. In addition, it also reported positive top-line results from another late-stage study testing its pneumococcal conjugate vaccine candidate in adults 18 years of age or older not previously vaccinated against pneumococcal disease, a type of bacterial infection.

Custom GPT2 Response: Pfizer Inc. on Wednesday reported results for the first quarter of 2020, which ended march 31st and is now expected to be down around 30% year-over‑year ( yoy).the company’ s revenues were $ 1 billion in q1fy20 compared with revenue growth rate at 5%.notablyin its earnings call last week management noted that it had seen a dec

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.
Setting `pad_token_id` to `eos_token_id`:0 for open-end generation.


Prompt: President Donald Trump said the U.S. has the outbreak

Ref article: President Donald Trump said the U.S. has the outbreak of the coronavirus under control and has been briefed by the Centers for Disease Control and Prevention. Speaking to CNBC, Trump said he wasn't worried it would turn into a pandemic and said the only person infected had flown in from China. He repeated his view that the impeachment is a hoax. Trump batted away a question on whether the Fed's balance sheet was the prime reason for the stock-market SPX, +2.64% gains. He said Fed interest rates should still go lower because the dollar DXY, -0.19% is strong.

Custom GPT2 Response: President Donald Trump said the U.S. has the outbreak of covid-19 in france? or is it a new disease that will be with us for some time to come, and which we can not afford now.? ” “ i’ m very concerned about this virus because there are so many people who have been infected but don't show any symptoms at all... they're still being trea

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.
Setting `pad_token_id` to `eos_token_id`:0 for open-end generation.


Prompt: President Joe Biden has signed a flurry of executive orders

Ref article: President Joe Biden has signed a flurry of executive orders, actions and memorandums aimed at rapidly addressing the coronavirus pandemic and dismantling many of President Donald Trump's policies. The 30 executive actions Biden has taken in the first days of his administration include halting funding for the construction of Trump's border wall, reversing Trump's travel ban targeting largely Muslim countries, imposing a mask mandate on federal property, ramping up vaccination supplies and requiring international travelers to provide proof of a negative Covid-19 test prior to traveling to the US

Custom GPT2 Response: President Joe Biden has signed a flurry of executive orders to promote the rights and freedoms that are guaranteed by law in his home country. he is also seeking an injunction against any attempt at “ expropriation ” or confiscating property, including those owned under state-owned enterprises

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.
Setting `pad_token_id` to `eos_token_id`:0 for open-end generation.


Prompt: China is effectively in a lockdown.

Ref article: China is effectively in a lockdown. From big cities to little villages, almost every community is under quarantine to a varying degree, or at least faces some travel restrictions. There is little information on how long this will last. One thing for sure is that the government is willing to keep the country in lockdown until the virus outbreak comes under control. A government mobilisation on this scale is unprecedented.

Custom GPT2 Response: China is effectively in a lockdown. the company has been forced to close its stores and furlough staff, while it continues with online sales for essentials such as food delivery services that have seen an increase during covid-19 lockdowns across europe. “ we are seeing increased demand from our customers who want us more than ever before so they can shop at home or pick up their groceries on amazon fresh every day without leaving any time limit ”, said ceo nisha varma earlier this month, 

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.
Setting `pad_token_id` to `eos_token_id`:0 for open-end generation.


Prompt: Australian biotech Mesoblast has been riding high on expectations for its

Ref article: Australian biotech Mesoblast has been riding high on expectations for its COVID-19 treatment, licensed to Novartis, but fell back to Earth after it said a phase 3 trial of the cell therapy was a bust. Shares in the stem cell specialist on the ASX lost more than a third of their value after data experts said the study of remestemcel-L in ventilator-dependent patients with moderate to severe acute respiratory distress syndrome ( ARDS) due to COVID-19 was unlikely to show a benefit. Mesoblast said the trial could have been affected by improvements in the care of COVID-19 patients over the last few months, as doctors gathered experience in treating the disease. That included the use of experimental drugs like dexamethasone and Gilead’ s antiviral Veklury (remdesivir).

Custom GPT2 Response: Australian biotech Mesoblast has been riding high on expectations for its covid-19 vaccine. the company’ s

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.
Setting `pad_token_id` to `eos_token_id`:0 for open-end generation.


Prompt: The December, 2019 coronavirus disease outbreak has seen many countries ask people who have potentially come into contact with the infection to isolate themselves

Ref article: The December, 2019 coronavirus disease outbreak has seen many countries ask people who have potentially come into contact with the infection to isolate themselves at home or in a dedicated quarantine facility. Decisions on how to apply quarantine should be based on the best available evidence. This review of the psychological impact of quarantine using three electronic databases. Of 3166 papers found, 24 are included in this Review. Most reviewed studies reported negative psychological effects including post-traumatic stress symptoms, confusion, and anger. Stressors included longer quarantine duration, infection fears, frustration, boredom, inadequate supplies, inadequate information, financial loss, and stigma.

Custom GPT2 Response: The December, 2019 coronavirus disease outbreak has seen many countrie

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


Prompt: The first confirmed case of coronavirus in India was

Ref article: The first confirmed case of coronavirus in India was reported today ( Jan. 30) in the southern state of Kerala. The patient, a female student at Wuhan University in China, tested positive for the novel coronavirus after returning to Kerala. Kerala health minister KK Shailaja has called an emergency meeting at 3pm.

Custom GPT2 Response: The first confirmed case of coronavirus in India was the second person to die from covid-19. he had been admitted on march 26th, and died at a hospital where his family lives with him after contracting it while working as an emergency room doctor for two years before being discharged last week ( april 1st).the death has prompted tributes across social media platforms including twitter twtr which said that mr cassidy’ s wife is now “ well enough ” but did not specify how many people were affected by their illness or who they believed might have contracted them.) one tweeter wrote:
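Both pipelines are called with `top_k=1` and `repetition_penalty=10.0`. With `top_k=1` only the single highest-scoring token survives at each step, so decoding is effectively greedy whether or not sampling is enabled. The repetition penalty rescales the score of every token already present in the sequence, prompt included: a positive logit is divided by the penalty and a negative one is multiplied by it. This is the CTRL-style rule applied by Hugging Face's `RepetitionPenaltyLogitsProcessor`. Below is a pure-Python sketch on a toy three-token vocabulary. The helper names are hypothetical.

```python
def apply_repetition_penalty(logits, previous_ids, penalty):
    """CTRL-style repetition penalty for one decoding step.

    For each token id already in the sequence, divide a positive logit by
    `penalty` and multiply a negative one by it; with penalty > 1 the token
    always becomes less likely. Each distinct token is penalized once.
    """
    adjusted = list(logits)
    for token_id in set(previous_ids):
        score = adjusted[token_id]
        adjusted[token_id] = score * penalty if score < 0 else score / penalty
    return adjusted


def greedy_next_token(logits):
    """top_k=1: keep only the highest-scoring token, i.e. greedy decoding."""
    return max(range(len(logits)), key=lambda i: logits[i])


logits = [2.0, -1.0, 0.5]   # toy scores for a 3-token vocabulary
seen = [0, 1]               # tokens 0 and 1 already appear in the sequence
penalized = apply_repetition_penalty(logits, seen, penalty=10.0)
print(penalized)                                                # [0.2, -10.0, 0.5]
print(greedy_next_token(logits), greedy_next_token(penalized))  # 0 2
```

Since `1.0` means no penalty, `10.0` is a very strong setting. Every token already used, including common words from the prompt, is pushed down hard. This may contribute to the run-on, oddly worded continuations seen above.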