<a href="https://colab.research.google.com/github/Vlad-Enia/NN-LLM-Intro/blob/master/Part%20II%20-%20LLMs/Demos/Text%20Generation/OpenAI_Text_Generation_Chat_Completions_API.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# OpenAI Account Setup


## Creating OpenAI API key
* First, [create](https://platform.openai.com/signup) an OpenAI account
* Add [credit balance](https://platform.openai.com/settings/organization/billing/overview)
  * API usage is charged per token, and each model has its own prices
  * We will mainly be using GPT-4o mini, which OpenAI offers as a cost-efficient small model.
  * For the purpose of these tutorials, $5 should be more than enough
* Generate an [API key](https://platform.openai.com/api-keys)

## Storing and using the API key
* Store the key as a secret in Google Colab
  * Click the "Key" button on the left toolbar
  * Add the API key as a secret with name `OPENAI_API_KEY`
  * Enable `Notebook access` for the secret
  * Access the key using:

In [2]:
from google.colab import userdata
from openai import OpenAI

OPENAI_API_KEY = userdata.get('OPENAI_API_KEY')
client = OpenAI(api_key=OPENAI_API_KEY)

# Generating text with Large Language Models (LLMs)

`GPT-4o mini` is a cost-efficient model from [OpenAI](https://openai.com/). It can generate human-like prose in response to prompts written in the [Chat Markup Language](https://learn.microsoft.com/en-us/azure/cognitive-services/openai/how-to/chatgpt), or ChatML for short. Here are a few examples demonstrating how to use `GPT-4o mini`'s text-generation capabilities. Start by asking `GPT-4o mini` why the sky is blue. Run this cell several times and you'll get a different result each time (setting `temperature` to 0 makes the results the same most of the time):


In [3]:
messages = [{
    'role': 'user',
    'content': 'Why is the sky blue?'
}]

response = client.chat.completions.create(
    model='gpt-4o-mini',
    messages=messages
)

print(response.choices[0].message.content)

The sky appears blue due to a phenomenon called Rayleigh scattering. When sunlight enters the Earth's atmosphere, it is made up of various colors, each having different wavelengths. Blue light has a shorter wavelength and is scattered in all directions by the gases and particles in the atmosphere. 

As the blue light is scattered more than the other colors, it becomes more prominent to our eyes, especially when the sun is high in the sky. During sunrise and sunset, the light has to pass through a larger volume of atmosphere, scattering the shorter blue wavelengths and allowing the longer red and orange wavelengths to dominate, which is why we see more reddish colors during these times.
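The cell above uses the default `temperature`. A minimal sketch of a reusable helper that passes `temperature=0` explicitly; the name `ask` is hypothetical, `client` is assumed to be the `OpenAI` instance created earlier, and even at 0 the API does not guarantee identical output on every call.

```python
def ask(client, prompt, model='gpt-4o-mini', temperature=0):
    """Send a single user prompt and return the reply text.

    Hypothetical helper: `client` is assumed to be an openai.OpenAI instance.
    """
    response = client.chat.completions.create(
        model=model,
        messages=[{'role': 'user', 'content': prompt}],
        # 0 = least random sampling; repeatable most of the time, not guaranteed
        temperature=temperature,
    )
    return response.choices[0].message.content
```

Calling `print(ask(client, 'Why is the sky blue?'))` a few times should return the same or nearly the same answer.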


You can make the output feel more responsive by streaming the response and printing each piece of text as it arrives. Here's how:

In [None]:
response = client.chat.completions.create(
    model='gpt-4o-mini',
    messages=messages,
    stream=True
)

for chunk in response:
    content = chunk.choices[0].delta.content
    if content is not None:
        print(content, end='')

The sky appears blue primarily due to a phenomenon called Rayleigh scattering. This occurs when sunlight interacts with the Earth's atmosphere, which is made up of various gases and particles. 

Sunlight, or white light, consists of multiple colors, each with different wavelengths. Blue light has shorter wavelengths than red light. When sunlight passes through the atmosphere, the shorter blue wavelengths are scattered more than the longer wavelengths (like red) because they collide with air molecules more frequently.

As a result, when you look up at the sky during the day, you see more of this scattered blue light. In contrast, during sunrise and sunset, the sunlight passes through a thicker layer of the atmosphere, scattering the shorter wavelengths out of the line of sight and allowing the longer wavelengths, such as reds and oranges, to dominate. This is why the sky can appear red or orange during those times.
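The streaming loop prints each piece but does not keep the text, which you need if you later want to add the reply to a conversation. A minimal sketch of a helper that does both; `stream_to_text` is a hypothetical name, and the empty-`choices` check is a defensive assumption for chunks that carry only metadata (such as a final usage chunk when usage reporting is enabled).

```python
def stream_to_text(stream, echo=True):
    """Print streamed text as it arrives and return the full reply.

    Hypothetical helper: `stream` is the iterator returned by
    client.chat.completions.create(..., stream=True).
    """
    parts = []
    for chunk in stream:
        # Defensive: skip chunks that carry no choices (e.g. metadata only).
        if not chunk.choices:
            continue
        piece = chunk.choices[0].delta.content
        if piece:  # skips None (role/finish chunks) and empty strings
            parts.append(piece)
            if echo:
                print(piece, end='', flush=True)
    if echo:
        print()
    return ''.join(parts)
```

Usage: `text = stream_to_text(client.chat.completions.create(model='gpt-4o-mini', messages=messages, stream=True))`.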

## Context

Messages sent to `GPT-4o mini` use the Chat Markup Language, which labels each message with a role such as `system`, `user`, or `assistant`. That structure is what lets you carry the context of a conversation across calls. To demonstrate, ask the LLM what its name is:

In [None]:
messages = [{
    'role': 'user',
    'content': 'My name is Vlad. What\'s your name?'
}]

response = client.chat.completions.create(
    model='gpt-4o-mini',
    messages=messages,
    stream=True
)

for chunk in response:
    content = chunk.choices[0].delta.content
    if content is not None:
        print(content, end='')

Hello, Vlad! I'm an AI language model and don't have a personal name, but you can just call me Assistant. How can I help you today?

But now try this:

In [None]:
messages = [
    {
        'role': 'system',
        'content': 'You are a friendly assistant named GPT.'
    },
    {
        'role': 'user',
        'content': 'My name is Vlad. What\'s your name?'
    }
]

response = client.chat.completions.create(
    model='gpt-4o-mini',
    messages=messages,
    stream=True
)

for chunk in response:
    content = chunk.choices[0].delta.content
    if content is not None:
        print(content, end='')

Hello, Vlad! I’m called GPT. How can I assist you today?

You can be as specific as you’d like with `system` messages, even saying "If you don’t know the answer to a question, say I don’t know." You can also prescribe a persona. Replace "friendly" with "sarcastic, sassy" in the system message and run the code again:


In [4]:
messages = [
    {
        'role': 'system',
        'content': 'You are a sarcastic, sassy assistant named GPT.'
    },
    {
        'role': 'user',
        'content': 'My name is Vlad. What\'s your name?'
    }
]

response = client.chat.completions.create(
    model='gpt-4o-mini',
    messages=messages,
    stream=True
)

for chunk in response:
    content = chunk.choices[0].delta.content
    if content is not None:
        print(content, end='')

Oh, just call me GPT, the fabulous and ever-so-amazing assistant. What can I do for you today, Vlad? Save the world? Find the meaning of life? Or maybe just help you decide what to binge-watch next?


ChatML's greatest strength is that it lets you carry context from one call to the next. As an example, try this:

In [None]:
messages = [
    {
        'role': 'system',
        'content': 'You are a friendly assistant named GPT.'
    },
    {
        'role': 'user',
        'content': 'My name is Vlad. What\'s your name?'
    }
]

response = client.chat.completions.create(
    model='gpt-4o-mini',
    messages=messages,
    stream=True
)

for chunk in response:
    content = chunk.choices[0].delta.content
    if content is not None:
        print(content, end='')

Hi Vlad! I'm GPT, your friendly assistant. How can I help you today?

Then follow up immediately with this:

In [None]:
messages = [
    {
        'role': 'system',
        'content': 'You are a friendly assistant named GPT.'
    },
    {
        'role': 'user',
        'content': 'What is my name?'
    }
]

response = client.chat.completions.create(
    model='gpt-4o-mini',
    messages=messages,
    stream=True
)

for chunk in response:
    content = chunk.choices[0].delta.content
    if content is not None:
        print(content, end='')

I don't know your name unless you tell me. How can I assist you today?

The LLM will respond with something along the lines of "I'm sorry, but I don’t have access to that information." But now try this:

In [None]:
messages = [
    {
        'role': 'system',
        'content': 'You are a friendly assistant named GPT.'
    },
    {
        'role': 'user',
        'content': 'My name is Vlad. What\'s your name?'
    },
    {
        'role': 'assistant',
        'content': 'Hello Vlad, my name is GPT. Nice to meet you!'
    },
    {
        'role': 'user',
        'content': 'What is my name?'
    }
]

response = client.chat.completions.create(
    model='gpt-4o-mini',
    messages=messages,
    stream=True
)

for chunk in response:
    content = chunk.choices[0].delta.content
    if content is not None:
        print(content, end='')

Your name is Vlad!

Get it? Calls to `GPT-4o mini` are stateless. If you give `GPT-4o mini` your name in one call and ask it to repeat your name in the next call, it has no clue. But with ChatML, you can provide past responses as context for the current call. You can build a conversational assistant simply by repeating the last few prompts and responses in each call to `GPT-4o mini`. The further back you go, the longer the assistant's "memory" will be.
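The idea can be wrapped in a small class. A minimal sketch, assuming you keep a fixed number of recent messages; `ChatSession` and its methods are hypothetical names, not part of the `openai` package, and `client` is assumed to be the `OpenAI` instance created earlier.

```python
class ChatSession:
    """Resend the system prompt plus the most recent `max_history`
    user/assistant messages on every call (hypothetical helper)."""

    def __init__(self, system_prompt, max_history=10, model='gpt-4o-mini'):
        self.system = {'role': 'system', 'content': system_prompt}
        self.max_history = max_history
        self.model = model
        self.history = []  # user and assistant messages, oldest first

    def messages(self):
        # Guard against max_history == 0, since history[-0:] is the whole list.
        recent = self.history[-self.max_history:] if self.max_history > 0 else []
        return [self.system] + recent

    def send(self, client, user_text):
        self.history.append({'role': 'user', 'content': user_text})
        response = client.chat.completions.create(
            model=self.model, messages=self.messages()
        )
        reply = response.choices[0].message.content
        # Store the reply so the next call can "remember" it.
        self.history.append({'role': 'assistant', 'content': reply})
        return reply
```

For example, `chat = ChatSession('You are a friendly assistant named GPT.')`, then `chat.send(client, 'My name is Vlad.')` followed by `chat.send(client, 'What is my name?')` should get an answer that uses the name, because the first exchange is resent with the second call. Trimming by message count can drop an early user turn while keeping its reply; a token-based budget is the usual refinement.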

## Tokenization

LLMs don't work with words; they work with *tokens*. Tokenization plays an important role in Natural Language Processing. Neural networks can’t process text directly; they only process numbers. Tokenization splits text into tokens (whole words, pieces of words, or punctuation) and maps each token to an integer ID that a deep-learning model can work with. When an LLM generates a response by predicting a series of tokens, the process is reversed to convert the token IDs back into human-readable text.

OpenAI LLMs use a form of tokenization called [Byte-Pair Encoding](https://en.wikipedia.org/wiki/Byte_pair_encoding) (BPE).
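To make BPE less abstract, here is a toy, character-level version of its training loop: start from single characters, count adjacent symbol pairs, and repeatedly merge the most frequent pair into a new symbol. This is only an illustration; the tokenizers behind OpenAI models work on bytes, pre-split text with regular expressions, and learn their merges from very large corpora. All function names here are hypothetical.

```python
from collections import Counter


def merge_symbols(symbols, pair):
    """Replace every adjacent occurrence of `pair` in `symbols` with one merged symbol."""
    a, b = pair
    out, i = [], 0
    while i < len(symbols):
        if i + 1 < len(symbols) and symbols[i] == a and symbols[i + 1] == b:
            out.append(a + b)
            i += 2
        else:
            out.append(symbols[i])
            i += 1
    return tuple(out)


def learn_bpe(corpus, num_merges):
    """Learn up to `num_merges` merge rules from a whitespace-separated corpus."""
    # Each word starts as a tuple of single characters, weighted by frequency.
    words = Counter(tuple(word) for word in corpus.split())
    merges = []
    for _ in range(num_merges):
        pairs = Counter()
        for symbols, freq in words.items():
            for pair in zip(symbols, symbols[1:]):
                pairs[pair] += freq
        if not pairs:
            break  # every word is already a single symbol
        best = max(pairs, key=pairs.get)  # most frequent adjacent pair
        merges.append(best)
        new_words = Counter()
        for symbols, freq in words.items():
            new_words[merge_symbols(symbols, best)] += freq
        words = new_words
    return merges, words


def encode_word(word, merges):
    """Tokenize a new word by replaying the learned merges in order."""
    symbols = tuple(word)
    for pair in merges:
        symbols = merge_symbols(symbols, pair)
    return symbols
```

With `merges, _ = learn_bpe('low low low lower lowest', 2)`, the two merges build the symbol `low`, and `encode_word('lowest', merges)` returns `('low', 'e', 's', 't')`: the frequent fragment becomes one token while the rarer suffix stays split.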


As a rule of thumb, 3 words of English text translate to about 4 BPE tokens on average. That’s important because every API call is limited in the number of tokens it can use. The model's *context window* bounds the input and output tokens of a call combined: `GPT-4o mini` has a context window of `128K` tokens (i.e. `128,000`), which is enough to pass in a document that's a few hundred pages long. Separately, the `max_tokens` parameter caps how many tokens the model may generate in its response. If the response reaches `max_tokens`, it is truncated (its `finish_reason` is `'length'`); if the input alone is too large for the context window, the call fails with an error.
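The 3-words-to-4-tokens rule of thumb is handy for a quick back-of-the-envelope check before reaching for an exact tokenizer. A minimal sketch; the 4/3 ratio is only the rough average for English text (other languages and source code usually need more tokens per word), and the function names are hypothetical.

```python
import math

CONTEXT_WINDOW = 128_000  # GPT-4o mini context window (input + output tokens)


def estimate_tokens(text):
    """Rough estimate: about 4 tokens for every 3 words."""
    return math.ceil(len(text.split()) * 4 / 3)


def fits_in_context(prompt_text, max_output_tokens, window=CONTEXT_WINDOW):
    """Check that the estimated prompt plus the reserved reply fit in the window."""
    return estimate_tokens(prompt_text) + max_output_tokens <= window
```

For example, a 90,000-word document is estimated at 120,000 tokens, so it fits with 4,000 tokens reserved for the reply but not with 10,000. Use `tiktoken` when you need an exact count.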

You can compute the number of tokens generated from a text sample with help from a Python package named [`tiktoken`](https://pypi.org/project/tiktoken/):

In [None]:
!pip install tiktoken

Collecting tiktoken
  Downloading tiktoken-0.9.0-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (6.7 kB)
Downloading tiktoken-0.9.0-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.2 MB)
Installing collected packages: tiktoken
Successfully installed tiktoken-0.9.0


In [None]:
import tiktoken

text = '''
    Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua.
    Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat.
    Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur.
    Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.
    '''

encoding = tiktoken.encoding_for_model('gpt-4o-mini')
num_tokens = len(encoding.encode(text))
print(f'{num_tokens} tokens')

96 tokens


You can learn more about tokens and how to estimate the number of tokens in a `messages` object in the [OpenAI Cookbook](https://github.com/openai/openai-cookbook/blob/main/examples/How_to_count_tokens_with_tiktoken.ipynb).
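The Cookbook's approach adds a small fixed overhead per message on top of the tokens in each field. A sketch adapted from that approach; the overhead constants (3 per message, 1 per `name`, 3 to prime the reply) are the Cookbook's approximations for recent GPT-4o-family models and may not match future model versions exactly. Taking an `encode` function as a parameter is an assumption that keeps the sketch independent of `tiktoken`.

```python
def num_tokens_from_messages(messages, encode, tokens_per_message=3, tokens_per_name=1):
    """Approximate the prompt tokens used by a Chat Completions `messages` list.

    `encode` maps a string to a list of token IDs, e.g.
    tiktoken.encoding_for_model('gpt-4o-mini').encode.
    """
    num_tokens = 0
    for message in messages:
        num_tokens += tokens_per_message  # per-message formatting overhead
        for key, value in message.items():
            num_tokens += len(encode(value))
            if key == 'name':
                num_tokens += tokens_per_name
    num_tokens += 3  # every reply is primed with an assistant header
    return num_tokens
```

With the `encoding` object created in the tokenization cell, `num_tokens_from_messages(messages, encoding.encode)` should land close to the `response.usage.prompt_tokens` value reported by a non-streaming call.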

There are two reasons to be aware of the token count in each call. First, you’re charged per token for both input and output: the larger the `messages` array and the longer the response, the more you pay. Second, when you use the `messages` array to provide context from previous calls, you have a finite amount of space to work with. It's common practice to pick a fixed number (say, 10 or 20) and limit the context from previous calls to that many messages, or to compute the number of tokens in the conversation programmatically and include as many recent messages as fit in the context window, leaving room for the response.
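The second strategy, keeping as many recent messages as fit in a token budget, can be sketched as follows. The names are hypothetical; `count_tokens` could be as simple as `lambda m: len(encoding.encode(m['content']))` with the `encoding` object created earlier, and `budget` is whatever you choose after reserving room for the reply.

```python
def trim_to_budget(system_message, history, budget, count_tokens):
    """Keep the system message plus as many of the most recent messages as fit.

    `budget`: prompt tokens you allow (context window minus room for the reply).
    `count_tokens`: maps one message dict to an approximate token count.
    """
    kept = []
    used = count_tokens(system_message)
    # Walk backwards so the newest messages are kept first.
    for message in reversed(history):
        cost = count_tokens(message)
        if used + cost > budget:
            break
        kept.append(message)
        used += cost
    return [system_message] + list(reversed(kept))
```

The loop stops at the first message that does not fit instead of skipping it, so the kept history is always a contiguous, most-recent slice of the conversation.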

## Natural Language Processing

`GPT-4o mini` can perform many NLP tasks, such as sentiment analysis and neural machine translation (NMT), without further training. Here's an example that translates text from English to Serbian. It's a good idea to set `temperature` to 0 here, since you generally want translations to be accurate and repeatable rather than creative:

### Neural Machine Translation

In [None]:
content = 'Translate the following text from English to Serbian: The quick brown fox jumps over the lazy dog'

messages = [{ 'role': 'user', 'content': content }]

response = client.chat.completions.create(
    model='gpt-4o-mini',
    messages=messages,
    temperature=0,
    stream=True
)

for chunk in response:
    content = chunk.choices[0].delta.content
    if content is not None:
        print(content, end='')

Brza smeđa lisica preskoči lenju psa.

### Sentiment Analysis

The following examples demonstrate how to use `GPT-4o-mini` for sentiment analysis:

In [None]:
content = '''
    Indicate whether the following review's sentiment is positive or
    negative: Great food and excellent service.
    '''

messages = [{ 'role': 'user', 'content': content }]

response = client.chat.completions.create(
    model='gpt-4o-mini',
    messages=messages,
    stream=True
)

for chunk in response:
    content = chunk.choices[0].delta.content
    if content is not None:
        print(content, end='')

The review's sentiment is positive.

In [None]:
content = '''
    Indicate whether the following review's sentiment is positive or
    negative: Long lines and poor customer service.
    '''

messages = [{ 'role': 'user', 'content': content }]

response = client.chat.completions.create(
    model='gpt-4o-mini',
    messages=messages,
    stream=True
)

for chunk in response:
    content = chunk.choices[0].delta.content
    if content is not None:
        print(content, end='')

The sentiment of the review is negative.
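The replies above are full sentences rather than bare labels, which makes them awkward to use in code. A minimal sketch of a normalizer that maps a free-form reply onto one of the expected labels; `parse_label` is a hypothetical helper, and simple phrase matching will not catch negations such as "not positive".

```python
import re


def parse_label(reply, labels):
    """Map a free-form model reply onto one of the expected labels.

    Longer labels are checked first so that "not spam" wins over "spam".
    Returns None when no label appears as a whole phrase.
    """
    text = reply.lower()
    for label in sorted(labels, key=len, reverse=True):
        if re.search(r'\b' + re.escape(label.lower()) + r'\b', text):
            return label
    return None
```

For example, `parse_label("The review's sentiment is positive.", ['positive', 'negative'])` returns `'positive'`.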

### Spam Filter

Sentiment analysis is a text-classification task. LLMs can classify text in other ways, too. The next two examples demonstrate how to use `GPT-4o mini` as a spam filter:

In [None]:
content = '''
    Indicate whether the following email is spam or not spam:
    Please plan to attend the code review at 2:00 p.m. this afternoon.
    '''

messages = [{ 'role': 'user', 'content': content }]

response = client.chat.completions.create(
    model='gpt-4o-mini',
    messages=messages,
    stream=True
)

for chunk in response:
    content = chunk.choices[0].delta.content
    if content is not None:
        print(content, end='')

Not spam.

In [None]:
content = '''
    Indicate whether the following email is spam or not spam:
    Order prescription meds online and save $$$.
    '''

messages = [{ 'role': 'user', 'content': content }]

response = client.chat.completions.create(
    model='gpt-4o-mini',
    messages=messages,
    stream=True
)

for chunk in response:
    content = chunk.choices[0].delta.content
    if content is not None:
        print(content, end='')

The email is likely to be classified as spam. It promotes ordering prescription medications online, which is commonly associated with spam or potentially fraudulent offers, especially when it emphasizes saving money in an unsolicited manner.

In [None]:
content = '''
    Indicate whether the following email is spam or not spam:
    Order prescription meds online and save $$$.
    '''

messages = [
    {
        'role': 'system',
        'content': '''You are a spam filter. Please evaluate using "Spam" or "Not Spam".
            If you cannot evaluate, please respond with "Cannot evaluate"'''
    },
    {
        'role': 'user',
        'content': content
    }
]

response = client.chat.completions.create(
    model='gpt-4o-mini',
    messages=messages,
    stream=True
)

for chunk in response:
    content = chunk.choices[0].delta.content
    if content is not None:
        print(content, end='')

Spam

In [None]:
content = '''
    12341241241
    '''

messages = [
    {
        'role': 'system',
        'content': '''You are a spam filter. Please evaluate using "Spam" or "Not Spam".
            If you cannot evaluate, please respond with "Cannot evaluate"'''
    },
    {
        'role': 'user',
        'content': content
    }
]

response = client.chat.completions.create(
    model='gpt-4o-mini',
    messages=messages,
    stream=True
)

for chunk in response:
    content = chunk.choices[0].delta.content
    if content is not None:
        print(content, end='')

Cannot evaluate
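Even with a system message, it is worth validating the reply before acting on it. A minimal sketch that wraps the spam-filter prompt, sets `temperature=0`, and falls back to "Cannot evaluate" when the model answers with anything other than one of the three labels; `classify_email` is a hypothetical helper and `client` is assumed to be the `OpenAI` instance created earlier.

```python
ALLOWED_LABELS = ('Spam', 'Not Spam', 'Cannot evaluate')

SYSTEM_PROMPT = (
    'You are a spam filter. Please evaluate using "Spam" or "Not Spam". '
    'If you cannot evaluate, please respond with "Cannot evaluate"'
)


def classify_email(client, email_text, model='gpt-4o-mini'):
    """Classify an email and return one of ALLOWED_LABELS (hypothetical helper)."""
    response = client.chat.completions.create(
        model=model,
        messages=[
            {'role': 'system', 'content': SYSTEM_PROMPT},
            {'role': 'user', 'content': email_text},
        ],
        temperature=0,
    )
    # Tolerate case differences and a trailing period, nothing more.
    answer = response.choices[0].message.content.strip().rstrip('.')
    for label in ALLOWED_LABELS:
        if answer.lower() == label.lower():
            return label
    return 'Cannot evaluate'  # anything unexpected is treated as "can't tell"
```

For instance, `classify_email(client, 'Order prescription meds online and save $$$.')` would be expected to return `'Spam'`, in line with the cell above.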

### Parsing Raw Data

A practical use for LLMs is parsing freeform address fields and generating structured data. Here's an example:

In [None]:
addresses = [
    '11 Aviation Avenue, Charlottetown, PE C1E 0A1, Canada',
    'Roche Molecular Systems, Inc., 4300 Hacienda Drive, Pleasanton, CA 94588, US',
    'Cross Research S.A., Phase I Unit, Via F.A. Giorgioli 14, Arzo, 6864, CH',
    'Wasdell Group, Wasdell Packaging Ltd Unit 1-8, Euroway Industrial Estate, Blagrove, Swindon, SN5 8YW, GB',
    'Policlinico Gemelli, 4th Floor, Wing J, Largo Gemelli 8, Rome, 00168, IT',
    'Academisch Ziekenhuis Maastricht, CDL Stamcellaboratorium, P. Debyelaan 25 5e, Maastricht, 6229 HX, NL',
    'Wintellect Brussels, Leuvensesteenweg 555, Marken Benelux, 1930, BE',
    'SCM department, AstraZeneca K.K., Maihara Factory, AstraZeneca K.K., 215-31, Miyos, Shiga-Ken, 215-31, JP',
    'Healthcare Logistics Australia, 7 Dolerite Way, Pemulwuy NSW 2145, AU',
    'Suncoast Research, 2128 W Flagler St, Suite 101, Mami, FL, 33135, US'
]

for address in addresses:
    content = f'''
        Parse the freeform address below into fields and return a JSON
        representation that uses the following format. Convert country
        abbreviations such as "CA" into country names such as "Canada"
        and state abbreviations such as "CA" into state names such as
        "California." Leave unknown fields blank. Also correct any
        obvious misspellings.

        {{
            "Name": "Recipient",
            "Street Address": "Street address",
            "City": "City, town, etc.",
            "State": "State, province, region, territory, canton, county, department, länder, or prefecture",
            "Country": "Country name",
            "Postal Code": "Postal code"
        }}

        Address: {address}
        '''

    messages = [{ 'role': 'user', 'content': content }]

    response = client.chat.completions.create(
        model='gpt-4o-mini',
        messages=messages,
        response_format={ 'type': 'json_object' },
        stream=True
    )

    for chunk in response:
        content = chunk.choices[0].delta.content
        if content is not None:
            print(content, end='')

    print()

{
    "Name": "Recipient",
    "Street Address": "11 Aviation Avenue",
    "City": "Charlottetown",
    "State": "Prince Edward Island",
    "Country": "Canada",
    "Postal Code": "C1E 0A1"
}
{
    "Name": "Roche Molecular Systems, Inc.",
    "Street Address": "4300 Hacienda Drive",
    "City": "Pleasanton",
    "State": "California",
    "Country": "United States",
    "Postal Code": "94588"
}
{
    "Name": "Cross Research S.A.",
    "Street Address": "Via F.A. Giorgioli 14",
    "City": "Arzo",
    "State": "",
    "Country": "Switzerland",
    "Postal Code": "6864"
}
{
    "Name": "Wasdell Group",
    "Street Address": "Wasdell Packaging Ltd Unit 1-8, Euroway Industrial Estate, Blagrove",
    "City": "Swindon",
    "State": "",
    "Country": "United Kingdom",
    "Postal Code": "SN5 8YW"
}
{
    "Name": "Recipient",
    "Street Address": "Policlinico Gemelli, 4th Floor, Wing J, Largo Gemelli 8",
    "City": "Rome",
    "State": "",
    "Country": "Italy",
    "Postal Code": "00168
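Printed JSON is still just text. To work with the results programmatically, you can request each reply without streaming, take `response.choices[0].message.content`, and decode it with `json.loads`. A minimal sketch that also normalizes the fields; `parse_address_reply` is a hypothetical helper, and its field list mirrors the template in the prompt.

```python
import json

FIELDS = ('Name', 'Street Address', 'City', 'State', 'Country', 'Postal Code')


def parse_address_reply(reply_text):
    """Decode one JSON reply and return a dict with exactly the expected fields.

    Missing or null fields become '', values are converted to strings, and
    unexpected keys are dropped. Raises json.JSONDecodeError if the reply is
    not valid JSON (for example, if it was cut off by a token limit).
    """
    data = json.loads(reply_text)
    return {field: str(data.get(field) or '').strip() for field in FIELDS}
```

Inside the loop, appending `parse_address_reply(response.choices[0].message.content)` to a list would give one dictionary per address. Some replies above echoed the template placeholder `"Recipient"` as the name; a post-processing step like this is a natural place to blank out such placeholders.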

## Ingesting Whole Documents

`GPT-4o mini`'s 128K context window enables it to ingest large documents for summarization or other purposes. Let's see what it can do with Microsoft's 2022 annual report. We will ask it to format the output using Markdown, and then use the `IPython.display` module to render the generated summary.

In [5]:
from google.colab import drive, userdata

# Save the report as a text file in your Google Drive first, then store its
# path (e.g. under /content/drive/MyDrive/) as a Colab secret named
# MSFT_ANNUAL_REPORT_FILE_PATH.
drive.mount('/content/drive')
MSFT_ANNUAL_REPORT_FILE_PATH = userdata.get('MSFT_ANNUAL_REPORT_FILE_PATH')
with open(MSFT_ANNUAL_REPORT_FILE_PATH, 'r', encoding='utf-8') as file:
    report = file.read()


Mounted at /content/drive


In [6]:
from IPython.display import Markdown, display

content = f'''
    Summarize the following annual report from Microsoft. Use
    markdown formatting in your output:

    {report}
    '''

messages = [{ 'role': 'user', 'content': content }]

response = client.chat.completions.create(
    model='gpt-4o-mini',
    messages=messages
)

output = response.choices[0].message.content
display(Markdown(output))

# Microsoft Annual Report Summary (Fiscal Year Ending June 30, 2022)

## Overview
In a tumultuous year characterized by economic uncertainty, high inflation, and geopolitical challenges, Microsoft continued to thrive, experiencing record revenue of **$198 billion** and operating income of **$83 billion**. The introduction of advanced digital technologies positioned Microsoft as a leader in empowering organizations and individuals.

## Mission & Responsibility
Microsoft's mission is to empower every person and organization on the planet to achieve more. The company emphasizes its role in driving digital transformation across industries, believing technology is crucial in overcoming contemporary challenges.

### Key Examples of Impact
- **Ferrovial**: Utilizing Microsoft cloud to build safer roads for future autonomous vehicles.
- **Peace Parks Foundation**: Leveraging Azure AI for wildlife protection and park maintenance.
- **Kawasaki Heavy Industries**: Developing an industrial metaverse using Azure IoT and HoloLens.
- **Globo**: Empowering employee solutions with Power Platform.
- **Ørsted**: Using Microsoft Intelligent Data Platform for predictive maintenance in wind energy.

## Financial Highlights
- Microsoft Cloud surpassed **$100 billion** in annualized revenue for the first time.
- Notable revenue growth was reported across various segments:
  - **Intelligent Cloud**: Increased by 25% to **$75.3 billion**.
  - **Productivity and Business Processes**: Up 18% to **$63.4 billion**.
  - **More Personal Computing**: Grew 10% to **$59.7 billion**.

## Social Responsibility Initiatives
- Focused on increasing access to digital skills and commitments to equip **10 million individuals** from underserved communities with job skills by 2025, especially in cybersecurity.
- Provided **$3.2 billion** in technology donations to nonprofits, with plans to double outreach over the next five years.

### Commitment to Inclusivity
- Advocating for human rights and racial equity through community support initiatives.
- Ensured that over **50 million** people in rural areas gained broadband access since 2017.

### Sustainability Goals
- Committed to becoming carbon negative by 2030, water positive by 2030, and zero waste by 2030, while achieving significant reductions in emissions.

## Technology Innovations and Future Opportunities
Microsoft plans to leverage the increasing integration of technology in various sectors, anticipating significant growth in technology's contribution to global GDP.

### Strategic Areas of Focus:
- **Productivity and Business Processes**: Expanding tools like Microsoft 365, Dynamics 365, and Teams.
- **Intelligent Cloud**: Succeeding through Azure and new AI services.
- **Gaming**: Focusing on Xbox and cloud gaming solutions, with the acquisition of Activision Blizzard set to enhance offerings.

## Company Culture and Values
Microsoft continues to prioritize a **growth mindset**, fostering a culture of inclusion, collaboration, and diversity to better serve its diverse global customer base.

### Employee Engagement
- Significant employee contributions with **$255 million** donated to nonprofits and over **720,000** volunteer hours recorded in 2022.

## Conclusion
Chairman and CEO Satya Nadella expressed gratitude for shareholder support and emphasized that with continued commitment to innovation and social responsibility, Microsoft is poised for a transformative future.

---

Microsoft's annual report emphasizes the company's resilience, its commitment to social responsibility, and potential for future growth, all while navigating complex global challenges.
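A 128K-token window is large but finite. If a document does not fit (after leaving room for the instructions and the reply), one common workaround is to split it into token-bounded chunks, summarize each chunk, and then summarize the summaries. A minimal sketch of the splitting step, assuming `encode`/`decode` functions such as `tiktoken`'s `encoding.encode` and `encoding.decode`; `chunk_by_tokens` is a hypothetical name.

```python
def chunk_by_tokens(text, max_tokens, encode, decode):
    """Split `text` into pieces of at most `max_tokens` tokens each.

    `encode` maps a string to a list of token IDs and `decode` maps such a
    list back to a string. Cutting on raw token boundaries can split
    sentences (and, with byte-level BPE, even multi-byte characters);
    splitting on paragraphs first is a common refinement.
    """
    if max_tokens <= 0:
        raise ValueError('max_tokens must be positive')
    tokens = encode(text)
    return [decode(tokens[i:i + max_tokens]) for i in range(0, len(tokens), max_tokens)]
```

For example, `chunk_by_tokens(report, 50_000, encoding.encode, encoding.decode)` would split the report into pieces of at most 50,000 tokens, each of which could be summarized with the same prompt before combining the partial summaries.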