# SECTION 1: Introduction

Welcome to my NLP project using GPT-3 and Seinfeld data. The goal is to let the user prompt GPT-3 with an everyday topic and have it respond with an AI-generated Seinfeld situation.

Example:
> Prompt: "Trying to connect to WiFi"  
> Response: "When the WiFi George usually steals suddenly has a password, he becomes addicted to trying to "hack" in. J: 'Just get your own!' G: 'NEVER' "
(This example response is from the @ModernSeinfeld Twitter feed.)

Setup:
- This notebook probably won't work as-is on Colab. My local dev environment was Jupyter Notebook with a Miniconda 4.11.0 Python environment, run on a MacBook.
- You'll need an OpenAI API key. Save it either to your Google Drive at My Drive/Colab Notebooks/GPT3_api (the path the key-loading cell reads) or as OPENAI_API_KEY in a .env file in the working directory.
- The data is saved in ./data. You can use the ScrapeTwitter_ModernSeinfeld notebook to see how the tweets were gathered.
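Before running anything, it can help to confirm the local setup in one place. The sketch below assumes the local .env route (so the key ends up in the OPENAI_API_KEY environment variable) and checks for the two data files the loading cell reads; the helper name `setup_problems` is hypothetical.

```python
import os
from pathlib import Path

# Data files read later in this notebook (file names taken from the loading cell)
EXPECTED_FILES = ["seinfeld_imdb.csv.xls", "SeinfeldToday_tweets.csv"]


def setup_problems(data_dir: str = "./data", env_var: str = "OPENAI_API_KEY") -> list:
    """Hypothetical preflight check: list what is missing (an empty list means ready)."""
    problems = []
    for name in EXPECTED_FILES:
        path = Path(data_dir) / name
        if not path.is_file():
            problems.append(f"missing data file: {path}")
    # Assumes the key is exposed as an environment variable, e.g. loaded from .env
    if not os.getenv(env_var):
        problems.append(f"{env_var} is not set")
    return problems


for problem in setup_problems():
    print("Setup:", problem)
```

This only catches missing pieces; it does not check that the key is valid.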



# SECTION 2: Data

The data used is:


*   Seinfeld episode synopses (173) from IMDb, as scraped in this Kaggle dataset: https://www.kaggle.com/bcruise/seinfeld-episodes
*   @ModernSeinfeld tweets (492), scraped using twint in the accompanying notebook
*   Curb Your Enthusiasm episode synopses: not used yet, but might be interesting to add later for more of a "Larry David" bot

With a combined 665 examples (173 synopses + 492 tweets), we should have enough data to fine-tune GPT-3. The OpenAI fine-tuning guide <https://beta.openai.com/docs/guides/fine-tuning> says: "we recommend having at least a couple hundred examples. In general, we've found that each doubling of the dataset size leads to a linear increase in model quality."
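The "linear increase per doubling" remark amounts to quality growing roughly with the logarithm of dataset size. A quick way to put our examples in those terms, assuming 200 as the reading of "a couple hundred" (our assumption, not a number from OpenAI):

```python
import math


def doublings(n_examples: int, baseline: int = 200) -> float:
    """How many times the dataset doubles relative to a baseline size.

    If each doubling gives a roughly constant gain, expected quality tracks
    log2(n_examples / baseline) rather than n_examples itself.
    """
    return math.log2(n_examples / baseline)


n_total = 173 + 492  # episode synopses + tweets
print(n_total, round(doublings(n_total), 2))
```

This prints 665 and 1.73: under that reading we are a little under two doublings past the suggested minimum, and one more doubling would need about 1,330 examples.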

In [45]:
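import pandas as pd

# Episode titles and IMDb synopses (the file parses as CSV despite its .xls suffix)
episodes_df = pd.read_csv('./data/seinfeld_imdb.csv.xls', usecols=['title', 'desc'])
# @ModernSeinfeld tweets; skiprows=[1] drops the first row after the header
tweets_df = pd.read_csv('./data/SeinfeldToday_tweets.csv', usecols=['tweet'], skiprows=[1])
print(episodes_df.count())

print(episodes_df.head(1), "\n")
print(tweets_df.head(1))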

title    173
desc     173
dtype: int64
                 title                                               desc
0  Good News, Bad News  Jerry and George argue whether an overnight vi... 

                                               tweet
0  George's GF wants a "no phones at dinner" rule...
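Fine-tuning works on prompt/completion pairs, so the two sources need a common shape. One possible mapping, which is our assumption rather than something this notebook settles: an episode's title becomes the prompt and its synopsis the completion, while tweets, which have no natural prompt, get an empty one. The function name `to_training_frame` is hypothetical.

```python
import pandas as pd


def to_training_frame(episodes_df: pd.DataFrame, tweets_df: pd.DataFrame) -> pd.DataFrame:
    """Stack episode synopses and tweets into one prompt/completion table (hypothetical mapping)."""
    episodes = pd.DataFrame({
        "prompt": episodes_df["title"],      # assumed: title acts as the prompt
        "completion": episodes_df["desc"],
        "source": "imdb",
    })
    tweets = pd.DataFrame({
        "prompt": "",                        # assumed: tweets have no prompt
        "completion": tweets_df["tweet"],
        "source": "twitter",
    })
    combined = pd.concat([episodes, tweets], ignore_index=True)
    # Drop rows whose text is missing, then renumber
    return combined.dropna(subset=["completion"]).reset_index(drop=True)
```

Called as `to_training_frame(episodes_df, tweets_df)`, this should give 173 + 492 rows if no text is missing; the `source` column makes it easy to check the balance between the two sources later.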


# SECTION 3: Working with OpenAI and GPT-3

- First we load the API key
- Then we fine-tune a model, following https://beta.openai.com/docs/guides/fine-tuning
- Finally, we prompt the model with a topic and see what Seinfeldy situation it comes up with
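The fine-tuning guide expects training data as JSONL, one {"prompt": ..., "completion": ...} object per line. The sketch below follows the conventions that guide suggests: a fixed separator ends every prompt, and each completion starts with a space and ends with a fixed stop sequence. The specific separator "\n\n###\n\n", the stop sequence " END", and the helper names are our assumptions. Prompts at inference time must be built with the same separator.

```python
import json

# Assumed conventions; any fixed strings that never appear in the data would do
SEPARATOR = "\n\n###\n\n"
STOP = " END"


def to_jsonl_record(prompt: str, completion: str) -> str:
    """Serialize one training example as a single JSON line."""
    return json.dumps({
        "prompt": prompt + SEPARATOR,
        "completion": " " + completion.strip() + STOP,
    })


def write_jsonl(pairs, path: str) -> None:
    """Write (prompt, completion) pairs to a JSONL file, one object per line."""
    with open(path, "w", encoding="utf-8") as f:
        for prompt, completion in pairs:
            f.write(to_jsonl_record(prompt, completion) + "\n")


def build_prompt(topic: str) -> str:
    """Format a user topic the same way as the training prompts."""
    return topic.strip() + SEPARATOR
```

For a DataFrame with prompt and completion columns (a hypothetical layout), `write_jsonl(zip(df["prompt"], df["completion"]), path)` produces the training file. When querying the fine-tuned model, send `build_prompt(topic)` as the prompt and pass STOP as the stop sequence so generation ends where the training completions did.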


In [24]:
# openai.Completion (used below) was removed in openai>=1.0, so pin the older client
!pip -q install "openai<1.0"

In [2]:
import os
import openai

In [20]:
# Load OpenAI API Key

try:
    # When in Colab: read the key from a file on Google Drive
    from google.colab import drive
    drive.mount('/content/drive')
    with open("/content/drive/My Drive/Colab Notebooks/GPT3_api", 'r') as file:
        openai.api_key = file.read().strip()
except (ImportError, FileNotFoundError):
    # When in local dev environment: load variables from a .env file in the working directory
    !pip -q install python-dotenv
    from dotenv import load_dotenv
    # load_dotenv() does not raise when .env is missing; it returns False if no variables were found
    if not load_dotenv():
        # You'll need to set the environment variables some other way, perhaps in .bashrc
        print("Warning: no variables loaded from .env")
    API_KEY = os.getenv('PROJECT_API_KEY')
    openai.api_key = os.getenv("OPENAI_API_KEY")
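To confirm a key was actually loaded without printing the secret into the notebook output, a small masking helper is handy. This is a hypothetical helper, not part of the openai library; in the notebook, `masked(openai.api_key)` checks whichever branch above ran, while the standalone example below reads the environment variable.

```python
import os


def masked(key, visible: int = 4) -> str:
    """Return the key with all but its last few characters hidden."""
    if not key:
        return "<not set>"
    if len(key) <= visible:
        # Too short to reveal anything safely
        return "*" * len(key)
    return "*" * (len(key) - visible) + key[-visible:]


print("OpenAI key:", masked(os.getenv("OPENAI_API_KEY")))
```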



In [21]:
# Test OpenAI API
response = openai.Completion.create(engine="text-davinci-001", prompt="Say this is a test", max_tokens=6)
print(response)

{
  "choices": [
    {
      "finish_reason": "length",
      "index": 0,
      "logprobs": null,
      "text": "\n\nThis is a test"
    }
  ],
  "created": 1644344165,
  "id": "cmpl-4ZPK1Y6D34H8YooQRTdNznDCfDhvS",
  "model": "text-davinci:001",
  "object": "text_completion"
}
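The response above nests the generated text inside `choices`, with leading newlines and a `finish_reason`. A minimal sketch for pulling out clean text, assuming the dict-like response object returned by the pre-1.0 openai library (a plain dict with the same keys works too); `completion_texts` is a hypothetical name:

```python
def completion_texts(response) -> list:
    """Extract the stripped text of each choice from a Completion response.

    Flags choices whose finish_reason is "length", meaning generation
    stopped because it hit max_tokens rather than a natural end.
    """
    texts = []
    for choice in response["choices"]:
        if choice.get("finish_reason") == "length":
            print(f"Note: choice {choice['index']} was cut off by max_tokens")
        texts.append(choice["text"].strip())
    return texts
```

On the test response above, `completion_texts(response)` returns `['This is a test']` and prints the note for choice 0, since that response reports `"finish_reason": "length"`.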
