# GPT-3 - Generative Pre-trained Transformer 3

**Paper:** [Language Models are Few-Shot Learners](https://arxiv.org/abs/2005.14165)

Papers with Code: [GPT-3](https://paperswithcode.com/method/gpt-3)

[OpenAI Documentation](https://beta.openai.com/docs/models/gpt-3)

In [None]:
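# The cells below use the legacy openai.Completion interface, which was removed in openai 1.0
!pip install "openai<1.0"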

In [None]:
!pip install -U sagemaker

In [None]:
!pip install -U boto3 awswrangler

In [105]:
import os
import pandas as pd
import sagemaker
import awswrangler as wr
import openai

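Set the OpenAI API key from the `OPENAI_API_KEY` environment variable:

In [111]:
# Read the key from the environment instead of hard-coding it in the notebook
openai.api_key = os.getenv('OPENAI_API_KEY')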
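
A missing key otherwise only shows up as an authentication error on the first API call. The helper below is a minimal sketch that fails early instead; `load_api_key` is a hypothetical name that does not exist in the `openai` package.

```python
import os


def load_api_key(env_var="OPENAI_API_KEY"):
    """Return the key stored in `env_var`, or raise if it is missing or empty.

    Hypothetical helper for this notebook, not part of the openai library.
    """
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"Environment variable {env_var} is not set")
    return key
```

It could then be used as `openai.api_key = load_api_key()`.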

## Import data

In [268]:
role = sagemaker.get_execution_role()
data_location = 's3://datasets/text_summarization/corpus/corpus.csv'

# pd.read_csv already returns a DataFrame, so no extra wrapping is needed
df = pd.read_csv(data_location)

Choosing texts with `<= 2000` words

In [269]:
df_optimal = df[df['LENGHT'] == 'OPTIMAL']
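
The `LENGHT` column is already present in the corpus, and the notebook does not show how it was computed. As an illustration only, a label of this kind could be derived from a whitespace word count as sketched below. The function names and the `"TOO_LONG"` label are assumptions; only `"OPTIMAL"` and the 2000-word threshold come from this notebook.

```python
MAX_WORDS = 2000  # threshold quoted in the note above


def count_words(text):
    """Count whitespace-separated words."""
    return len(text.split())


def length_label(text, max_words=MAX_WORDS):
    """Return "OPTIMAL" for texts within the word limit.

    "TOO_LONG" is a hypothetical label; the corpus may use other values.
    """
    return "OPTIMAL" if count_words(text) <= max_words else "TOO_LONG"


print(length_label("a short text"))  # OPTIMAL
```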

In [None]:
print(df_optimal.shape)
df_optimal.head(3)

## Creating functions for the OpenAI applications:

- Summarize for a 2nd grader
- TL;DR summarization
- Keywords
- Notes to summary

In [274]:
# Summarize for a 2nd grader
def second_grader_summarization(text):
    response = openai.Completion.create(
        model="text-davinci-003",
        prompt="Summarize this for a second-grade student:\n\n{}.\n\n".format(text),
        temperature=0.7,
        max_tokens=600,
        top_p=1.0,
        frequency_penalty=0.0,
        presence_penalty=0.0
    )
    return response['choices'][0]['text']


# TL;DR summarization
def Tl_dr_summarization(text):
    response_Tl_dr = openai.Completion.create(
        model="text-davinci-003",
        prompt="{}\n\nTl;dr\n\n".format(text),
        temperature=0.7,
        max_tokens=600,
        top_p=1.0,
        frequency_penalty=0.0,
        presence_penalty=0.0
    )
    return response_Tl_dr['choices'][0]['text']


# Keywords
def keywords_extraction(text):
    response_extract = openai.Completion.create(
        model="text-davinci-003",
        prompt="Extract keywords from this text:\n\n{}\n\n".format(text),
        temperature=0.7,
        max_tokens=600,
        top_p=1.0,
        frequency_penalty=0.8,
        presence_penalty=0.0
    )
    return response_extract['choices'][0]['text']

# Keywords, temperature 0.5 variant
def keywords_extraction_05(text):
    response_extract = openai.Completion.create(
        model="text-davinci-003",
        prompt="Extract keywords from this text:\n\n{}\n\n".format(text),
        temperature=0.5,
        max_tokens=600,
        top_p=1.0,
        frequency_penalty=0.8,
        presence_penalty=0.0
    )
    return response_extract['choices'][0]['text']


# Notes to summary
def notes2summary(text):
    response_notes2summary = openai.Completion.create(
        model="text-davinci-003",
        prompt="Convert my short hand into a first-hand account of the meeting:\n\n{}\n\n".format(text),
        temperature=0,
        max_tokens=600,
        top_p=1.0,
        frequency_penalty=0.0,
        presence_penalty=0.0
    )
    return response_notes2summary['choices'][0]['text']

# Notes to summary, temperature 0.5 variant
def notes2summary_05(text):
    response_notes2summary = openai.Completion.create(
        model="text-davinci-003",
        prompt="Convert my short hand into a first-hand account of the meeting:\n\n{}\n\n".format(text),
        temperature=0.5,
        max_tokens=600,
        top_p=1.0,
        frequency_penalty=0.0,
        presence_penalty=0.0
    )
    return response_notes2summary['choices'][0]['text']
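
The six functions above differ only in their prompt template, `temperature`, and `frequency_penalty`. One possible way to remove the repetition is a single request builder, sketched below. `PROMPT_TEMPLATES` and `build_request` are hypothetical names introduced here; the templates and parameter values are copied from the functions above.

```python
# Prompt templates copied from the functions above, keyed by task name.
PROMPT_TEMPLATES = {
    "second_grader": "Summarize this for a second-grade student:\n\n{}.\n\n",
    "tldr": "{}\n\nTl;dr\n\n",
    "keywords": "Extract keywords from this text:\n\n{}\n\n",
    "notes2summary": "Convert my short hand into a first-hand account of the meeting:\n\n{}\n\n",
}


def build_request(task, text, temperature=0.7, frequency_penalty=0.0):
    """Build the keyword arguments for one completion request (hypothetical helper)."""
    return {
        "model": "text-davinci-003",
        "prompt": PROMPT_TEMPLATES[task].format(text),
        "temperature": temperature,
        "max_tokens": 600,
        "top_p": 1.0,
        "frequency_penalty": frequency_penalty,
        "presence_penalty": 0.0,
    }
```

With the legacy client used in this notebook, `keywords_extraction_05(text)` would then correspond to `openai.Completion.create(**build_request("keywords", text, temperature=0.5, frequency_penalty=0.8))['choices'][0]['text']`.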

## Calling the OpenAI API with GPT-3

In [None]:
summarized = pd.DataFrame()

for i in range(110):  # 357
    text = df_optimal['TEXT'].values[i]

    # Keep the source text next to its generated outputs
    summarized.at[i, "Original Text"] = text
    
    summarized.at[i, "second_grader_summarization"] = second_grader_summarization(text)
    summarized.at[i, "Tl_dr_summarization"] = Tl_dr_summarization(text)
    summarized.at[i, "keywords_extraction_0.7"] = keywords_extraction(text)
    summarized.at[i, "keywords_extraction_0.5"] = keywords_extraction_05(text)
    summarized.at[i, "notes2summary_0"] = notes2summary(text)
    summarized.at[i, "notes2summary_0.5"] = notes2summary_05(text)
    
    summarized.at[i, "Key"] = df_optimal['KEY'].values[i]
    summarized.at[i, "Filename"] = df_optimal['FILENAME'].values[i]
    summarized.at[i, "NUM_WORDS"] = df_optimal['NUM_WORDS'].values[i]
    summarized.at[i, "NUM_WORDS_ROUNDED"] = df_optimal['NUM_WORDS_ROUNDED'].values[i]
    
    if i % 5 == 0:
        wr.s3.to_csv(
            df=summarized,
            path='s3://datasets/text_summarization/summarized/GPT-3_text-davinci-003/GPT-3_text-davinci-003.csv'
        )
        print(i)

    
summarized    
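
The loop above issues six API calls per row, so a single transient failure (a rate limit or a timeout, for example) stops the run, and only the rows up to the last checkpoint are saved to S3. A minimal retry sketch with exponential backoff is shown below. `with_retries` is a hypothetical helper, and catching every `Exception` is a simplification; in practice the caught error types would probably be narrowed to those the client raises for transient failures.

```python
import time


def with_retries(fn, *args, attempts=3, base_delay=1.0, sleep=time.sleep, **kwargs):
    """Call fn(*args, **kwargs), retrying on failure with exponential backoff.

    Waits base_delay, then 2 * base_delay, and so on between attempts.
    The last failure is re-raised once `attempts` calls have been made.
    """
    for attempt in range(attempts):
        try:
            return fn(*args, **kwargs)
        except Exception:
            if attempt == attempts - 1:
                raise
            sleep(base_delay * 2 ** attempt)
```

Inside the loop, a call such as `Tl_dr_summarization(text)` would become `with_retries(Tl_dr_summarization, text)`.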

In [244]:
summarized = summarized.replace('\n',' ', regex = True)
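
Replacing each `\n` with a space keeps every record on a single CSV line, but a completion that starts with blank lines then starts with spaces. An alternative sketch that collapses all whitespace runs and trims both ends is shown below. `flatten_whitespace` is a hypothetical helper, and the sample string is made up for illustration.

```python
def flatten_whitespace(text):
    """Collapse every run of whitespace (newlines included) into one space."""
    return " ".join(text.split())


# Made-up string shaped like a raw completion, for illustration only.
raw_completion = "\n\nHawaii is a group of islands.\nIt became a state in 1959."
print(flatten_whitespace(raw_completion))
```

It could be applied to each text column with `Series.map`.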

In [None]:
summarized

In [None]:
wr.s3.to_csv(
    df=summarized,
    path='s3://datasets/text_summarization/summarized/GPT-3_text-davinci-003/GPT-3_text-davinci-003.csv'
)

----

<mark>Input text example:</mark>

In [259]:

text = '''Hawaii comprises nearly the entire Hawaiian archipelago, 137 volcanic islands spanning 1,500 miles (2,400 km) that are physiographically and ethnologically part of the Polynesian subregion of Oceania. The state's ocean coastline is consequently the fourth-longest in the U.S., at about 750 miles (1,210 km).[b] The eight main islands, from northwest to southeast, are Niʻihau, Kauaʻi, Oʻahu, Molokaʻi, Lānaʻi, Kahoʻolawe, Maui, and Hawaiʻi—the last of these, after which the state is named, is often called the "Big Island" or "Hawaii Island" to avoid confusion with the state or archipelago. The uninhabited Northwestern Hawaiian Islands make up most of the Papahānaumokuākea Marine National Monument, the United States' largest protected area and the fourth-largest in the world. Of the 50 U.S. states, Hawaii is the eighth-smallest in land area and the 11th-least populous, but with 1.4 million residents ranks 13th in population density. Two-thirds of the population lives on O'ahu, home to the state's capital and largest city, Honolulu. Hawaii is among the country's most diverse states, owing to its central location in the Pacific and over two centuries of migration. As one of only six majority-minority states, it has the country's only Asian American plurality, its largest Buddhist community, and the largest proportion of multiracial people. Consequently, it is a unique melting pot of North American and East Asian cultures, in addition to its indigenous Hawaiian heritage. Settled by Polynesians some time between 1000 and 1200 CE, Hawaii was home to numerous independent chiefdoms. In 1778, British explorer James Cook was the first known non-Polynesian to arrive at the archipelago; early British influence is reflected in the state flag, which bears a Union Jack. 
An influx of European and American explorers, traders, and whalers arrived shortly after leading to the decimation of the once isolated Indigenous community by introducing diseases such as syphilis, gonorrhea, tuberculosis, smallpox, measles, leprosy, and typhoid fever, reducing the native Hawaiian population from between 300,000 and one million to less than 40,000 by 1890. Hawaii became a unified, internationally recognized kingdom in 1810, remaining independent until American and European businessmen overthrew the monarchy in 1893; this led to annexation by the U.S. in 1898. As a strategically valuable U.S. territory, Hawaii was attacked by Japan on December 7, 1941, which brought it global and historical significance, and contributed to America's decisive entry into World War II. Hawaii is the most recent state to join the union, on August 21, 1959. In 1993, the U.S. government formally apologized for its role in the overthrow of Hawaii's government, which spurred the Hawaiian sovereignty movement.'''

## Summarize for a 2nd grader

> Translates difficult text into simpler concepts.

In [None]:
response = openai.Completion.create(
    model="text-davinci-003",
    #prompt=text,
    prompt="Summarize this for a second-grade student:\n\n{}\n\n".format(text),
    temperature=0.7,
    max_tokens=554,
    top_p=1.0,
    frequency_penalty=0.0,
    presence_penalty=0.0
)

response['choices'][0]['text']
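
`max_tokens` caps the length of the completion, and the prompt and the completion share one context window. The sketch below estimates whether a request fits. It rests on two assumptions that are not stated in this notebook: a context limit of 4,097 tokens for `text-davinci-003`, and the rough rule of thumb that one token is about 0.75 English words. A real tokenizer would give exact counts.

```python
import math

CONTEXT_LIMIT = 4097  # assumed context window for text-davinci-003 (prompt + completion)


def estimated_prompt_tokens(prompt):
    """Estimate tokens from the word count, assuming about 4 tokens per 3 words."""
    return math.ceil(len(prompt.split()) * 4 / 3)


def fits_in_context(prompt, max_tokens, limit=CONTEXT_LIMIT):
    """Return True if the estimated prompt plus the completion budget fits the limit."""
    return estimated_prompt_tokens(prompt) + max_tokens <= limit
```

Under these assumptions, a 2000-word text is estimated at 2,667 tokens, so it fits together with `max_tokens=600`, while a 3000-word text (about 4,000 tokens) does not.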

## TL;DR summarization

> Summarizes text by appending `Tl;dr` to the end of a passage. It shows that the model can perform some tasks without explicit instructions.

In [None]:
response_Tl_dr = openai.Completion.create(
    model="text-davinci-002",
    prompt="{}\n\nTl;dr\n\n".format(text),
    temperature=0.7,
    max_tokens=554,
    top_p=1.0,
    frequency_penalty=0.0,
    presence_penalty=0.0
)

response_Tl_dr['choices'][0]['text']

## Keywords

> Extracts keywords from a block of text. At a lower temperature it picks keywords from the text. At a higher temperature it generates related keywords, which can be helpful for creating search indexes.

In [None]:
response_extract = openai.Completion.create(
    model="text-davinci-002",
    prompt="Extract keywords from this text:\n\n{}\n\n".format(text),
    temperature=0.3,
    max_tokens=1024,
    top_p=1.0,
    frequency_penalty=0.8,
    presence_penalty=0.0
)

response_extract['choices'][0]['text']
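
The temperature behaviour described in the note for this section can be illustrated with a temperature-scaled softmax. This is a generic sketch of the technique, not the API's actual sampling code, and the logits are made-up numbers.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Turn logits into probabilities after dividing them by the temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive in this sketch")
    scaled = [logit / temperature for logit in logits]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(value - peak) for value in scaled]
    total = sum(exps)
    return [value / total for value in exps]


logits = [2.0, 1.0, 0.1]  # made-up scores for three candidate tokens
low = softmax_with_temperature(logits, 0.3)   # sharper: the top token dominates
high = softmax_with_temperature(logits, 1.5)  # flatter: less likely tokens gain weight
print([round(p, 3) for p in low])
print([round(p, 3) for p in high])
```

A lower temperature concentrates probability on the highest-scoring token, which matches picking keywords that appear in the text. A higher temperature spreads probability to less likely tokens, which matches generating related keywords.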

## Notes to summary

> Converts shorthand meeting notes into a first-hand account of the meeting.

In [None]:
response_notes2summary = openai.Completion.create(
    model="text-davinci-002",
    prompt="Convert my short hand into a first-hand account of the meeting:\n\n{}\n\n".format(text),
    temperature=0.7,
    max_tokens=600,
    top_p=1.0,
    frequency_penalty=0.0,
    presence_penalty=0.0
)

response_notes2summary['choices'][0]['text']