# The OpenAI API

In this section we will cover the basics of using the OpenAI API, including:
- Chat Completions
- Streaming
- Vision input

The beauty of the OpenAI API is that it is very simple to use.

In your environment you should have a file called `.env` with the following:

```bash
OPENAI_API_KEY="sk-proj-1234567890"
```

We will give you this key in the workshop. __The key will be deactivated after the workshop!__

You can then load the key using Python:


In [86]:
from openai import OpenAI
import dotenv
import os
from rich import print as rprint # for making fancy outputs

dotenv.load_dotenv()

client = OpenAI()

OPENAI_API_KEY = os.getenv("OPENAI_API_KEY")
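Before making any calls, it can be worth checking that the key was actually picked up. The following is a minimal sketch of such a check; the helper name `describe_api_key` is hypothetical (it is not part of `openai` or `dotenv`), and it only reports a short prefix and the length so that the secret never ends up in the notebook output.

```python
import os


def describe_api_key(env=os.environ, name="OPENAI_API_KEY"):
    """Return a short, non-secret description of the key, or raise if it is missing.

    Hypothetical helper for illustration; `env` is any mapping of variable names to values.
    """
    key = env.get(name)
    if not key:
        raise RuntimeError(f"{name} is not set; check that .env exists and was loaded")
    # Show only a short prefix and the length, never the full key.
    return f"{key[:7]}... ({len(key)} characters)"
```

Calling `describe_api_key()` after `dotenv.load_dotenv()` should either return a short description or raise an error that points at the `.env` file.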

## Chat Completions

Calling a model is simple:

In [87]:
system_prompt = "You are Matsuo Basho, the great Japanese haiku poet."
user_query = "Can you give me a haiku about a Samurai cat."

response = client.chat.completions.create(
  model="gpt-4o-mini",
  messages=[
    {"role": "system", "content": system_prompt},
    {"role": "user", "content": user_query},
  ],
  max_tokens=128
)

print(response.choices[0].message.content)

Silent paws tread soft,  
Moonlight gleams on sharpened claws—  
Honor in each pounce.  


Purrfect.
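Since the same call pattern recurs throughout, it may be convenient to wrap it in a small function. This is only a sketch: the name `ask` and its default arguments are assumptions chosen to mirror the cell above, and the `client` is passed in explicitly rather than read from a global.

```python
def ask(client, system_prompt, user_query, model="gpt-4o-mini", max_tokens=128):
    """Send one system prompt and one user query, and return the reply text.

    Hypothetical convenience wrapper around `client.chat.completions.create`.
    """
    response = client.chat.completions.create(
        model=model,
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_query},
        ],
        max_tokens=max_tokens,
    )
    # The reply text lives on the first (and here only) choice.
    return response.choices[0].message.content
```

With the objects defined earlier, `print(ask(client, system_prompt, user_query))` should behave like the cell above.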

### Available models

Here we have used the `gpt-4o-mini` model, but a range of other models is available:

In [88]:
for model in client.models.list():
    print(model)

Model(id='gpt-3.5-turbo', created=1677610602, object='model', owned_by='openai')
Model(id='gpt-3.5-turbo-0125', created=1706048358, object='model', owned_by='system')
Model(id='dall-e-2', created=1698798177, object='model', owned_by='system')
Model(id='gpt-4-1106-preview', created=1698957206, object='model', owned_by='system')
Model(id='tts-1-hd-1106', created=1699053533, object='model', owned_by='system')
Model(id='tts-1-hd', created=1699046015, object='model', owned_by='system')
Model(id='dall-e-3', created=1698785189, object='model', owned_by='system')
Model(id='whisper-1', created=1677532384, object='model', owned_by='openai-internal')
Model(id='text-embedding-3-large', created=1705953180, object='model', owned_by='system')
Model(id='text-embedding-3-small', created=1705948997, object='model', owned_by='system')
Model(id='text-embedding-ada-002', created=1671217299, object='model', owned_by='openai-internal')
Model(id='gpt-4-turbo', created=1712361441, object='model', owned_by='sys

As of writing, `gpt-4o-2024-08-06` is the current best offering, but we'll stick with `gpt-4o-mini` because it is cheaper and still highly capable.
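The full list mixes chat models with image, audio and embedding models. As a rough sketch, the ids can be filtered by prefix. Note that the `gpt-` prefix is an assumption about naming only (the listing itself does not say which models support chat), and `chat_model_ids` is a hypothetical helper.

```python
def chat_model_ids(models, prefixes=("gpt-",)):
    """Return the sorted ids of models whose id starts with one of `prefixes`.

    `models` is any iterable of objects with an `id` attribute,
    such as the result of `client.models.list()`.
    """
    return sorted(m.id for m in models if m.id.startswith(tuple(prefixes)))
```

For example, `chat_model_ids(client.models.list())` would keep entries such as `gpt-4-turbo` and drop `dall-e-3` or `whisper-1`.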

### The response object

What is the `response` object?

In [65]:
rprint(response)

There is some useful stuff in here apart from the `content` property, such as the token usage. You might notice some other fields too, like `function_call` and `tool_calls`. These belong to OpenAI's function-calling and tool features, and not every model supports them, so we won't cover them. We can achieve many of the same effects without them anyway.
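To pull out the fields that tend to matter most, something like the following sketch can be used. The helper name `summarize_response` is hypothetical; the attributes it reads (`choices[0].message.content`, `choices[0].finish_reason`, and the `usage` token counts) are those of the chat completion object.

```python
def summarize_response(response):
    """Collect the reply text, finish reason and token counts into a plain dict."""
    choice = response.choices[0]
    usage = response.usage
    return {
        "content": choice.message.content,
        # "length" here means the reply was cut off by max_tokens.
        "finish_reason": choice.finish_reason,
        "prompt_tokens": usage.prompt_tokens,
        "completion_tokens": usage.completion_tokens,
        "total_tokens": usage.total_tokens,
    }
```

`rprint(summarize_response(response))` then gives a compact view of the same object.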

## Streaming a response

Streaming a response is mainly for user experience. It allows the user to see the response as it is generated, rather than waiting for the whole response to finish. For many applications, this might not be necessary.

In [9]:
response = client.chat.completions.create(
  model="gpt-4o-mini",
  messages=[
    {"role": "system", "content": system_prompt},
    {"role": "user", "content": user_query},
  ],
  max_tokens=128,
  stream=True
)

for chunk in response:
    if chunk.choices[0].delta.content is not None:
        print(chunk.choices[0].delta.content, end="")

Silent paws in dusk,  
Moonlit blade gleams in the night—  
Fierce heart, whiskers twitch.

Setting `stream=True` makes the call return a streaming object, which acts like a generator. We can then print each chunk as it arrives.
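Often we want the full text as well as the live output. A minimal sketch of that, assuming chunks shaped like the ones in the loop above, is shown here; `collect_stream` is a hypothetical helper, and the check for an empty `choices` list is a defensive assumption, since some chunks may carry no choices.

```python
def collect_stream(chunks, echo=True):
    """Print streamed text as it arrives and return the complete reply."""
    parts = []
    for chunk in chunks:
        # Skip chunks that carry no choices at all.
        if not chunk.choices:
            continue
        text = chunk.choices[0].delta.content
        # Some chunks (such as the final one) have no text content.
        if text is None:
            continue
        if echo:
            print(text, end="", flush=True)
        parts.append(text)
    return "".join(parts)
```

Here `full_text = collect_stream(response)` would replace the `for` loop above. A stream can only be consumed once.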

## Vision input
A huge draw of OpenAI models is the ability to take images as input. This is useful for a wide range of applications, including:
- Image captioning
- Object detection
- Face recognition
- Image generation

Let's try an example of inputting an image. First we need to look at the image:

![plot](../imgs/figure.jpeg)

Here is the caption from this figure:

<table><tr><td>

**Fig. 2 Spatial and temporal self-similarity and correlation in switching activity.**

_(A) Percolating devices produce complex patterns of switching events that are self-similar in nature. The top panel contains 2400 s of data, with the bottom panels showing segments of the data with 10, 100, and 1000 times greater temporal magnification and with 3, 9, and 27 times greater magnification on the vertical scale (units of G0 = 2e2/h, the quantum of conductance, are used for convenience). The activity patterns appear qualitatively similar on multiple different time scales. (B and E) The probability density function (PDF) for changes in total network conductance, P(ΔG), resulting from switching activity exhibits heavy-tailed probability distributions. (C and F) IEIs follow power law distributions, suggestive of correlations between events. (D and G) Further evidence of temporal correlation between events is given by the autocorrelation function (ACF) of the switching activity (red), which decays as a power law over several decades. When the IEI sequence is shuffled (gray), the correlations between events are destroyed, resulting in a significant increase in slope in the ACF. The data shown in (B) to (D) (sample I) were obtained with our standard (slow) sampling rate, and the data shown in (E) to (G) (sample II) were measured 1000 times faster (see Materials and Methods), providing further evidence for self-similarity._
</td></tr></table>

This figure is taken from _[Avalanches and criticality in self-organized nanoscale networks, Mallinson et al., 2019.](https://www.science.org/doi/10.1126/sciadv.aaw8438)_

Now let's use the OpenAI vision model to generate a caption for this figure.

In [92]:
prompt = (
    "This figure is from a paper entitled Avalanches and criticality in self-organized nanoscale networks. "
    "Please provide a caption for this figure. "
    "You should describe the figure, grouping the panels where appropriate. "
    "Feel free to make any inferences you need to."
)

The process of calling a vision model is a little more involved, but OpenAI have a [convenient tutorial](https://platform.openai.com/docs/guides/vision) on how to do this.

Essentially we need to first convert the image to a base64 string. We can then pass this to the OpenAI API.
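As a minimal sketch of that conversion, the function below reads an image file and wraps its base64 encoding in a `data:` URL. The name `to_data_url` is hypothetical, and guessing the MIME type from the file extension with the standard library's `mimetypes` module is an assumption made here for convenience.

```python
import base64
import mimetypes


def to_data_url(image_path):
    """Read an image file and return it as a base64 `data:` URL."""
    # Guess the MIME type from the extension, e.g. ".jpeg" -> "image/jpeg".
    mime_type, _ = mimetypes.guess_type(image_path)
    if mime_type is None or not mime_type.startswith("image/"):
        raise ValueError(f"Could not recognise {image_path!r} as an image file")
    with open(image_path, "rb") as image_file:
        encoded = base64.b64encode(image_file.read()).decode("utf-8")
    return f"data:{mime_type};base64,{encoded}"
```

The returned string is what goes in the `url` field of an `image_url` content part.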

In [95]:
import base64
import requests

# Function to encode the image
def encode_image(image_path):
  with open(image_path, "rb") as image_file:
    return base64.b64encode(image_file.read()).decode('utf-8')

# Path to your image
image_path = "imgs/figure.jpeg"


def get_image_caption(image_path, prompt):
  # Getting the base64 string
  base64_image = encode_image(image_path)

  headers = {
    "Content-Type": "application/json",
    "Authorization": f"Bearer {OPENAI_API_KEY}"
  }

  payload = {
    "model": "gpt-4o-mini",
    "messages": [
      {
        "role": "user",
        "content": [
          {
            "type": "text",
            "text": prompt
          },
          {
            "type": "image_url",
            "image_url": {
              "url": f"data:image/jpeg;base64,{base64_image}"
            }
          }
        ]
      }
    ],
    "max_tokens": 512
  }

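  response = requests.post("https://api.openai.com/v1/chat/completions", headers=headers, json=payload, timeout=60)
  # Fail loudly on HTTP errors (e.g. an invalid key) rather than with a confusing KeyError below
  response.raise_for_status()

  return response.json()['choices'][0]['message']['content']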
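The same request can also be sent through the `client` object used earlier, which avoids building the headers by hand. This is a sketch rather than the method used in this notebook: `caption_with_client` is a hypothetical name, and it assumes a JPEG image that has already been base64-encoded.

```python
def caption_with_client(client, base64_image, prompt, model="gpt-4o-mini", max_tokens=512):
    """Ask a vision-capable chat model to respond to `prompt` about one JPEG image."""
    response = client.chat.completions.create(
        model=model,
        messages=[
            {
                "role": "user",
                # A message's content can be a list mixing text and image parts.
                "content": [
                    {"type": "text", "text": prompt},
                    {
                        "type": "image_url",
                        "image_url": {"url": f"data:image/jpeg;base64,{base64_image}"},
                    },
                ],
            }
        ],
        max_tokens=max_tokens,
    )
    return response.choices[0].message.content
```

With the names defined above, this would be called as `caption_with_client(client, encode_image(image_path), prompt)`.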

In [96]:
caption = get_image_caption(image_path, prompt)
print(caption)

**Figure Caption:**

**Figure X:** Analysis of avalanche dynamics and criticality in self-organized nanoscale networks. 

**(A)** Time series data showing the fluctuations in conductance, \(\Delta G(G_0)\), over varying observation periods (100 s, 10 s, 1 s, 0.1 s). The four panels illustrate distinct behavior and amplitude of fluctuations as time scales decrease. 

**(B) and (E)** Power law distributions, \(P(\Delta G)\), of the amplitude of conductance fluctuations are presented on logarithmic scales, revealing power-law exponents \(\Delta G \approx -2.59\) (B) and \(\Delta G \approx -2.36\) (E). 

**(C) and (F)** Temporal distributions, \(P(t)\), showing the frequency of events over time. The corresponding power law exponents are \(t \approx -1.39\) (C) and \(t \approx -1.30\) (F), indicating scale-invariant behavior.

**(D) and (G)** Characteristic avalanche sizes \(A(t)\) as a function of time, indicating distinct scaling regimes with exponents \(t \approx -0.19\) and \(t \approx 

I mean, I don't know about you, but I think that's incredible. Let's consider what it has done:
- Correctly grouped the panels in the same way the real caption did.
- Provided information on the observation periods.
- Drawn out the important information, such as critical exponents.
- Made the link between power law distributions and scale-free behaviour.

However, it has failed to provide information on temporal correlations, and it has not noticed the self-similarity shown in panel (A).

But this is still quite impressive, and with more information we could potentially get some better captions. We will return to this later...