# Language Models, the Chat Format and Tokens

## Setup

In [None]:
from google.colab import drive

drive.mount('/content/drive/')

Mounted at /content/drive/


In [None]:
import os

# Work inside the project folder on the mounted Google Drive
os.chdir('/content/drive/My Drive/Experiment/ChatGPT')

In [None]:
OUTPUT_DIR = './outputs/'
os.makedirs(OUTPUT_DIR, exist_ok=True)  # no error if the folder already exists

In [None]:
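# openai.ChatCompletion (used below) was removed in openai 1.0, so pin a 0.27.x release
!pip install openai==0.27.7 python-dotenv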


In [None]:
import openai
import os

from dotenv import load_dotenv, find_dotenv
_ = load_dotenv(find_dotenv())  # read variables from a local .env file, if one exists

openai.api_key = os.getenv('OPENAI_API_KEY')

#### Helper functions

Throughout this course, we will use OpenAI's `gpt-3.5-turbo` model and the [chat completions endpoint](https://platform.openai.com/docs/guides/chat). The code calls the `openai.ChatCompletion` interface, which was removed in version 1.0 of the `openai` Python library.

These helper functions make it easier to send prompts and inspect the generated outputs:

In [None]:
def get_completion(prompt, model="gpt-3.5-turbo"):
    messages = [{"role": "user", "content": prompt}]
    response = openai.ChatCompletion.create(
        model=model,
        messages=messages,
        temperature=0, # this is the degree of randomness of the model's output
    )
    return response.choices[0].message["content"]

In [None]:
def get_completion_from_messages(messages, model="gpt-3.5-turbo", temperature=0, max_tokens=500):
    response = openai.ChatCompletion.create(
        model=model,
        messages=messages,
        temperature=temperature, # this is the degree of randomness of the model's output
        max_tokens=max_tokens, # the maximum number of tokens the model can output
    )
    return response.choices[0].message["content"]

In [None]:
def get_completion_and_token_count(messages, model="gpt-3.5-turbo", temperature=0, max_tokens=500):
    
    response = openai.ChatCompletion.create(
        model=model,
        messages=messages,
        temperature=temperature, 
        max_tokens=max_tokens,
    )
    
    content = response.choices[0].message["content"]
    
    token_dict = {
      'prompt_tokens':response['usage']['prompt_tokens'],
      'completion_tokens':response['usage']['completion_tokens'],
      'total_tokens':response['usage']['total_tokens'],
    }

    return content, token_dict
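The helpers above read two parts of the API response: `choices[0].message["content"]` and `usage`. To check that parsing offline without spending tokens, you can run it against a hand-written dict shaped like a chat completion response. The layout below follows the JSON documented for the chat completions endpoint, but the values and the helper name `parse_chat_response` are made up for illustration. The sketch also reads `finish_reason`, which is `"length"` when the reply was cut off by `max_tokens`. Because the pre-1.0 library's response objects support dict-style indexing (as `response['usage']` in the helper above shows), the same function should also accept a real response.

```python
# Hand-written stand-in for a chat completion response (all values are made up).
fake_response = {
    "object": "chat.completion",
    "model": "gpt-3.5-turbo",
    "choices": [
        {
            "index": 0,
            "message": {"role": "assistant", "content": "The capital of France is Paris."},
            "finish_reason": "stop",
        }
    ],
    "usage": {"prompt_tokens": 14, "completion_tokens": 7, "total_tokens": 21},
}


def parse_chat_response(response):
    """Return (content, token_dict, truncated) from a chat-completion-shaped dict.

    Hypothetical helper: mirrors what get_completion_and_token_count reads,
    plus a flag for replies that were cut off by max_tokens.
    """
    choice = response["choices"][0]
    content = choice["message"]["content"]
    usage = response["usage"]
    token_dict = {
        "prompt_tokens": usage["prompt_tokens"],
        "completion_tokens": usage["completion_tokens"],
        "total_tokens": usage["total_tokens"],
    }
    # finish_reason is "length" when the max_tokens limit stopped the reply.
    truncated = choice["finish_reason"] == "length"
    return content, token_dict, truncated


fake_content, fake_tokens, truncated = parse_chat_response(fake_response)
print(fake_content)
print(fake_tokens, "truncated:", truncated)
```

If `truncated` comes back `True` for a real call, raising `max_tokens` is the usual first thing to try.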

## Prompt the model and get a completion

In [None]:
response = get_completion("What is the capital of France?")
print(response)

The capital of France is Paris.


## Tokens

In [None]:
response = get_completion("Take the letters in lollipop and reverse them")
print(response)

ppilolol


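The correct reversal of "lollipop" is "popillol". The model gets it wrong because it does not see individual letters: text is split into tokens, which are often multi-character chunks, so "lollipop" reaches the model as a few pieces rather than as eight separate letters. Separating the letters with dashes tends to give each letter its own token, and the model then reverses them correctly: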

In [None]:
response = get_completion("""Take the letters in l-o-l-l-i-p-o-p and reverse them""")
print(response)

p-o-p-i-l-l-o-l
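To see why the dashes help, the sketch below mimics the effect with a toy greedy longest-match tokenizer and a small, hypothetical vocabulary. It is not the BPE tokenizer that OpenAI models actually use (OpenAI's `tiktoken` library exposes those), and the split of "lollipop" it produces is illustrative, not the real one.

```python
# Hypothetical toy vocabulary. Real tokenizers (such as the BPE encodings in
# OpenAI's tiktoken library) learn tens of thousands of entries from data.
TOY_VOCAB = {"l", "o", "i", "p", "-", "oll", "ipop"}


def tokenize(text, vocab=TOY_VOCAB):
    """Split text by greedy longest match against vocab (illustration only, not BPE)."""
    max_len = max(len(tok) for tok in vocab)
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first; fall back to a single character.
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if piece in vocab or size == 1:
                tokens.append(piece)
                i += size
                break
    return tokens


for word in ["lollipop", "l-o-l-l-i-p-o-p"]:
    tokens = tokenize(word)
    # Reversing the token sequence reverses the letters only when
    # every token is a single character.
    print(tokens, "->", "".join(reversed(tokens)))
```

With multi-character tokens, reversing the pieces gives "ipopolll" instead of "popillol"; with one token per letter it gives "p-o-p-i-l-l-o-l". This is an analogy for why the task is hard for the model, not a description of how the model computes its answer.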


## With formatted messages

In [None]:
messages = [
  {'role':'system', 
  'content':"""You are an assistant who responds in the style of Dr Seuss."""},    
  {'role':'user', 
  'content':"""write me a very short poem about a happy carrot"""},  
] 
response = get_completion_from_messages(messages, temperature=1)
print(response)

Oh, the happy orange carrot,
With a smile as big as a parrot,
Growing in the garden so green,
Oh, how happy it has been!
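The `messages` list is how the chat format separates roles: the `system` message sets the assistant's overall behaviour, and `user` messages carry the requests. The chat API also accepts `assistant` messages, which is how earlier replies are passed back in for a multi-turn conversation. Below is a minimal sketch of assembling such a list; `build_messages` is a hypothetical helper, not part of the `openai` library, and the follow-up request is invented for illustration.

```python
def build_messages(system_prompt, turns):
    """Build a chat-format messages list (hypothetical helper).

    turns is a list of (user_text, assistant_text) pairs; pass None as the
    assistant text for the final, not-yet-answered user turn.
    """
    messages = [{"role": "system", "content": system_prompt}]
    for user_text, assistant_text in turns:
        messages.append({"role": "user", "content": user_text})
        if assistant_text is not None:
            messages.append({"role": "assistant", "content": assistant_text})
    return messages


conversation = build_messages(
    "You are an assistant who responds in the style of Dr Seuss.",
    [
        ("write me a very short poem about a happy carrot",
         "Oh, the happy orange carrot,\nWith a smile as big as a parrot, ..."),
        ("now make it about a sad potato", None),
    ],
)
for m in conversation:
    print(m["role"], "|", m["content"][:40])

# The list could then be sent as before:
# response = get_completion_from_messages(conversation, temperature=1)
```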


In [None]:
# length: every response must be one sentence long
messages = [
  {'role':'system',
  'content':'All your responses must be \
  one sentence long.'},
  {'role':'user',
  'content':'write me a story about a happy carrot'},
]
response = get_completion_from_messages(messages, temperature=1)
print(response)

Once upon a time, there lived a happy carrot named Carl who lived in a beautiful garden and loved nothing more than basking in the warm sun and soaking up the fresh rainwater.


In [None]:
# length again: same prompt, re-run; with temperature=1 the wording changes between runs
messages = [
{'role':'system',
 'content':'All your responses must be \
one sentence long.'},
{'role':'user',
 'content':'write me a story about a happy carrot'},
]
response = get_completion_from_messages(messages, temperature=1)
print(response)

Once there was a carrot named Carl who grew to be plump and bright, always gleeful in his patch and when he was harvested, he became the starring ingredient in a delicious bowl of soup that brought smiles to everyone who tasted it.
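Running the same prompt twice with `temperature=1` produced two different stories. A common way to describe temperature: the model's scores (logits) for candidate next tokens are divided by the temperature before a softmax turns them into probabilities, and the next token is sampled from those probabilities. Low temperatures concentrate probability on the top-scoring token, and `temperature=0` is usually treated as always picking it. The sketch below illustrates that idea with made-up scores for three hypothetical candidate words; it is not OpenAI's actual sampling code.

```python
import math
import random


def softmax_with_temperature(logits, temperature):
    """Convert raw scores to probabilities, scaled by temperature (> 0)."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_next(tokens, logits, temperature, rng):
    """Pick the next token: greedy at temperature 0, otherwise sample."""
    if temperature == 0:
        return tokens[logits.index(max(logits))]
    probs = softmax_with_temperature(logits, temperature)
    return rng.choices(tokens, weights=probs, k=1)[0]


# Made-up scores for three hypothetical candidate next words.
candidates = ["Once", "In", "There"]
scores = [2.0, 1.5, 0.5]

for t in (0.2, 1.0):
    probs = softmax_with_temperature(scores, t)
    print(t, [round(p, 3) for p in probs])

rng = random.Random(0)
print([sample_next(candidates, scores, 1.0, rng) for _ in range(5)])  # varies
print([sample_next(candidates, scores, 0, rng) for _ in range(5)])    # always "Once"
```

At temperature 0.2 almost all of the probability sits on "Once", while at 1.0 the other words get a real share, which is why repeated runs diverge.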


In [None]:
# combined: style and length constraints in a single system message
messages = [
{'role':'system',
 'content':"""You are an assistant who \
responds in the style of Dr Seuss. \
All your responses must be one sentence long."""},
{'role':'user',
 'content':"""write me a story about a happy carrot"""},
]
response = get_completion_from_messages(messages, temperature=1)
print(response)

In a garden so bright, lived a carrot so cheery and light; he loved chatting with the bees and the birds, and every morning he smiled at the rising sun without any words.


In [None]:
messages = [
  {'role':'system', 
  'content':"""You are an assistant who responds\
  in the style of Dr Seuss."""},    
  {'role':'user',
  'content':"""write me a very short poem \
  about a happy carrot"""},  
]


response, token_dict = get_completion_and_token_count(messages)

In [None]:
print(response)

Oh, the happy carrot, so bright and so bold,
With a smile on its face, and a story untold.
It grew in the garden, with love and with care,
And now it's so happy, it's beyond compare!


In [None]:
print(token_dict)

{'prompt_tokens': 41, 'completion_tokens': 49, 'total_tokens': 90}
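The counts are consistent: `total_tokens` is `prompt_tokens + completion_tokens` (41 + 49 = 90). Since API usage is billed per token, it can help to keep a running total across several calls. A minimal sketch; `add_usage` is a hypothetical helper that works on the `token_dict` format returned by `get_completion_and_token_count`, and the second call's counts are made up.

```python
def add_usage(totals, token_dict):
    """Add one call's token counts (from get_completion_and_token_count) to totals."""
    for key in ("prompt_tokens", "completion_tokens", "total_tokens"):
        totals[key] = totals.get(key, 0) + token_dict[key]
    return totals


usage_totals = {}
# The first entry is the token_dict printed above; the second is a made-up call.
for call_usage in [
    {"prompt_tokens": 41, "completion_tokens": 49, "total_tokens": 90},
    {"prompt_tokens": 30, "completion_tokens": 20, "total_tokens": 50},
]:
    # Sanity check: the total should equal prompt + completion tokens.
    assert call_usage["total_tokens"] == call_usage["prompt_tokens"] + call_usage["completion_tokens"]
    add_usage(usage_totals, call_usage)

print(usage_totals)
# → {'prompt_tokens': 71, 'completion_tokens': 69, 'total_tokens': 140}
```

In the notebook, calling `add_usage(usage_totals, token_dict)` after each `get_completion_and_token_count` call would keep the tally up to date.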


#### A note about the backslash

- In the notebook, we use a backslash `\` to make the text fit on the screen without inserting newline (`\n`) characters.
- `gpt-3.5-turbo` isn't really affected by whether or not you insert newline characters. When working with LLMs in general, though, consider whether newline characters in your prompt might affect the model's performance.
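A quick check of what the backslash does inside a Python string literal: a `\` at the very end of a line removes the line break, but any indentation on the next line stays in the string as extra spaces. (A `\` followed by a trailing space is not a line continuation, so the newline would survive.) The prompt text below is reused from the cells above.

```python
# Backslash at the end of a line inside a string: the newline is dropped.
joined = """write me a very short poem \
about a happy carrot"""

# The indentation on the continuation line is kept as extra spaces.
indented = """write me a very short poem \
    about a happy carrot"""

# Without the backslash, the newline character stays in the prompt.
with_newline = """write me a very short poem
about a happy carrot"""

print(repr(joined))
print(repr(indented))
print(repr(with_newline))
```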