# Chat Completions API

The simplest and most common way to call a Large Language Model (LLM) is through the **Chat Completions API**. The idea behind this interface is straightforward: you provide a sequence of messages that represent a conversation, and the model predicts what should come next. In other words, it completes the chat. This format is intuitive for both humans and machines, which is why it has become the dominant method for interacting with LLMs.

Although the Chat Completions API was originally introduced by OpenAI, it quickly became a standard across the industry. Most modern AI platforms, open-source and commercial, now support the same message-based interface because it provides a consistent, structured way to guide model behavior, maintain context, and control responses.

At its core, the API takes an ordered list of messages. Each message includes a role (**system**, **user**, or **assistant**) and content, which is the text being communicated. The **system role** defines the **model’s behavior**, the **user role** contains the **actual request**, and the **assistant role** represents the **model’s previous replies**. By structuring input this way, applications can build rich, multi-turn interactions that feel natural. The API itself is stateless, so the application carries context forward by sending the accumulated messages with each call.
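As a minimal sketch of this structure, the snippet below builds a short multi-turn conversation as a plain Python list. The helper `add_turn`, the set `VALID_ROLES`, and all message texts (including the assistant reply) are illustrative assumptions, not part of any library or real model output.

```python
# Illustrative sketch: a conversation is an ordered list of role/content dicts.
# `add_turn` is a hypothetical helper, not part of any SDK.
VALID_ROLES = {"system", "user", "assistant"}  # the three roles described above

def add_turn(messages: list[dict], role: str, content: str) -> list[dict]:
    """Return a new message list with one more turn appended."""
    if role not in VALID_ROLES:
        raise ValueError(f"Unknown role: {role!r}")
    return messages + [{"role": role, "content": content}]

# The system message comes first and sets the behavior.
messages = [{"role": "system", "content": "You are a concise, friendly assistant."}]
messages = add_turn(messages, "user", "Tell me a fun fact")
# Placeholder reply standing in for what the model returned on the previous call.
messages = add_turn(messages, "assistant", "Honey never spoils.")
messages = add_turn(messages, "user", "Why is that?")

for m in messages:
    print(f"{m['role']:>9}: {m['content']}")
```

Passing this list as the `messages` field of a request would ask the model to continue from the last user turn.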

This design makes **Chat Completions** ideal for **chatbots**, **automated customer support**, **interactive agents**, **tutoring systems**, and any application where a conversational experience is needed. It also serves as an accessible entry point for beginners, requiring no complex setup, only a list of messages and a single API call.

# Get the OpenAI Key

This section loads environment variables from the `.env` file and checks that the OpenAI API key stored in `OPENAI_API_KEY` is present and well-formed.

**Instructions**

1. Create a `.env` file in your project directory.

2. Add the following line to the file, replacing the value with your actual OpenAI API key: `OPENAI_API_KEY=sk-xxxxxxx`

3. Run the script below to load and verify the API key.

In [None]:
import os
from dotenv import load_dotenv

load_dotenv(override=True)
api_key = os.getenv('OPENAI_API_KEY')

def validate_api_key(key: str):
    if not key:
        raise ValueError(
            "❌ No API key found! Please check the troubleshooting notebook."
        )
    if not key.startswith("sk-proj-"):
        raise ValueError(
            "⚠️ API key found, but it doesn't start with 'sk-proj-'. Please use the correct key."
        )
    if key.strip() != key:
        raise ValueError(
            "⚠️ API key has leading or trailing whitespace. Please remove them."
        )
    print("✔ API key found and looks good!")

validate_api_key(api_key)

# Using an endpoint to call the Chat Completions API

An endpoint is the URL exposed by an API where you send a request and receive a response. For OpenAI's Chat Completions API, the endpoint is:

```bash
POST https://api.openai.com/v1/chat/completions
```

When calling an LLM, you send a **POST request** with two pieces of information:

- Headers, for authorization and content type

- Payload, which includes the messages, the model, and other parameters

In [None]:
import requests

headers = {
    "Authorization": f"Bearer {api_key}",
    "Content-Type": "application/json"
}

payload = {
    "model": "gpt-5-nano",
    "messages": [
        {"role": "user", "content": "Tell me a fun fact"}
    ]
}

payload

In [None]:
response = requests.post(
    "https://api.openai.com/v1/chat/completions",
    headers=headers,
    json=payload
)

response.json()

In [None]:
response.json()["choices"][0]["message"]["content"]
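The indexing above follows the shape of the JSON body. As a rough illustration, the dictionary below mimics an abbreviated response. Every value is a placeholder rather than real output, real responses carry additional fields, and the `error` layout handled by the hypothetical helper `first_reply` is an assumption about how failures are reported.

```python
# Abbreviated, hand-written stand-in for a Chat Completions response body.
# All values are placeholders; real responses contain more fields.
sample_body = {
    "object": "chat.completion",
    "model": "gpt-5-nano",
    "choices": [
        {
            "index": 0,
            "message": {"role": "assistant", "content": "<model reply text>"},
            "finish_reason": "stop",
        }
    ],
    "usage": {"prompt_tokens": 0, "completion_tokens": 0, "total_tokens": 0},
}

def first_reply(body: dict) -> str:
    """Return the text of the first choice (hypothetical helper).

    Assumes a failed call returns {"error": {"message": ...}} instead of choices.
    """
    if "error" in body:
        raise RuntimeError(body["error"].get("message", "unknown API error"))
    return body["choices"][0]["message"]["content"]

print(first_reply(sample_body))
```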

# OpenAI Package

Using HTTP endpoints directly can be **cumbersome**: you need to construct **HTTP requests**, set headers correctly, **handle authentication tokens**, and format **JSON payloads** precisely. On top of that, you must handle network errors and rate limits and parse responses manually, which can make the code verbose and error-prone.
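To give a sense of that extra work, here is a minimal retry-with-backoff sketch using only the standard library. `call_with_retries`, `TransientError`, and the delay values are hypothetical names and numbers chosen for illustration. In real code the wrapped function would perform the HTTP request and raise on a retryable failure such as a rate limit.

```python
import time

class TransientError(Exception):
    """Hypothetical marker for failures worth retrying (e.g. rate limit, timeout)."""

def call_with_retries(fn, max_attempts: int = 3, base_delay: float = 0.5, sleep=time.sleep):
    """Call fn(); on TransientError wait base_delay * 2**attempt, then try again."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except TransientError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(base_delay * 2 ** attempt)

# Demo with a fake call that fails twice before succeeding (no network involved).
attempts = {"count": 0}

def flaky_call():
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise TransientError("simulated rate limit")
    return "ok"

# The sleep function is swapped for a no-op so the demo finishes instantly.
result = call_with_retries(flaky_call, sleep=lambda seconds: None)
print(result, "after", attempts["count"], "attempts")
```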

The OpenAI Python package simplifies this process by providing a cleaner syntax that **abstracts** these details. It includes **built-in error handling**, **raises informative exceptions for common issues**, and **offers helpful parameter defaults** and **utility functions**, making it **easier to manage models**, **responses**, and advanced features like **streaming**.

## ChatGPT

### Create OpenAI Client

In [None]:
from openai import OpenAI

openai = OpenAI()

### Let's ask for a fun fact

In [None]:
response = openai.chat.completions.create(
    model="gpt-5-nano",
    messages=[
        {
            "role": "user",
            "content": "Tell me a fun fact"
        }
    ]
)

response.choices[0].message.content

## And then this great thing happened

**OpenAI's Chat Completions API** became so popular that other model providers started offering OpenAI-compatible endpoints.

The OpenAI Python client makes this even easier: you can use the same library developed for GPT to access another provider, simply by specifying a different endpoint URL and API key.

For example:

```python
from openai import OpenAI

gemini = OpenAI(
    base_url="https://generativelanguage.googleapis.com/v1beta/openai/",
    api_key="AIz...."
)

gemini.chat.completions.create(
    model="gemini-2.5-flash-lite",
    messages=[
        {
            "role": "user",
            "content": "Tell me a fun fact"
        }
    ]
)
```

Note that even though this code uses the OpenAI Python client, the client is only acting as a **generic client** that calls the endpoint you specify. No OpenAI models are being used; the actual computation is handled entirely by the provider at that endpoint.
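One way to picture the "generic client" idea is a small lookup table of providers. The sketch below is only an illustration: `PROVIDERS` and `client_kwargs` are hypothetical names, the base URLs are the ones used in this notebook (the Gemini and Ollama ones appear in later sections), and the environment variable names match the `.env` entries.

```python
import os

# Base URLs taken from this notebook; env var names match its .env entries.
# Ollama runs locally and needs no real key, so a placeholder string is used.
PROVIDERS = {
    "openai": {"base_url": "https://api.openai.com/v1", "key_env": "OPENAI_API_KEY"},
    "gemini": {
        "base_url": "https://generativelanguage.googleapis.com/v1beta/openai/",
        "key_env": "GOOGLE_API_KEY",
    },
    "ollama": {"base_url": "http://localhost:11434/v1", "key_env": None},
}

def client_kwargs(provider: str) -> dict:
    """Build the keyword arguments one would pass to OpenAI(...) for a provider."""
    cfg = PROVIDERS[provider]
    api_key = os.getenv(cfg["key_env"]) if cfg["key_env"] else "ollama"
    return {"base_url": cfg["base_url"], "api_key": api_key}

print(client_kwargs("ollama"))
```

With the `openai` package installed, `OpenAI(**client_kwargs("gemini"))` would then give a client pointed at Gemini, while the calling code stays the same.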

## Google Gemini

### Create a Google Gemini API Key

**1. Go to AI Studio**

Visit https://aistudio.google.com/ and sign in with your Google account. If you don’t have an account, create one.

**2. Navigate to API Keys**

Once signed in, go to the API keys page: https://aistudio.google.com/api-keys

**3. Create a new API key**

- Click on the button to create a new key.

- Give your key a name, for example `default`.

- Select the permissions or roles required for your use case, usually “Read & Write” for API access.

**4. Store Your API Key**

After creation, the API key will be displayed. Store it immediately, because some platforms only show it once.

### Add the key into the `.env` file

Open the `.env` file in your project and add a line like:

```bash
GOOGLE_API_KEY=AIz...
```

Replace `AIz...` with your actual API key, then save the `.env` file.

### Get the Gemini Key

In [None]:
import os
from dotenv import load_dotenv

# Gemini endpoint
GEMINI_BASE_URL = "https://generativelanguage.googleapis.com/v1beta/openai/"

# Load .env file
load_dotenv(override=True)

# Get the Gemini API key
google_api_key = os.getenv("GOOGLE_API_KEY")

def validate_gemini_api_key(key: str):
    if not key:
        raise ValueError(
            "❌ No Gemini API key found! Please add your key to the .env file and save it."
        )
    if not key.startswith("AIz"):
        raise ValueError(
            "⚠️ API key found, but it doesn't start with 'AIz'. Please use the correct Gemini key."
        )
    if key.strip() != key:
        raise ValueError(
            "⚠️ API key has leading or trailing whitespace. Please remove them."
        )
    print("✔ Gemini API key found and looks good!")

# Validate the key
validate_gemini_api_key(google_api_key)


### Create the OpenAI Client

In [None]:
from openai import OpenAI

gemini = OpenAI(base_url=GEMINI_BASE_URL, api_key=google_api_key)

### Let's ask for a fun fact

In [None]:
response = gemini.chat.completions.create(
    model="gemini-2.5-flash-lite",
    messages=[
        {
            "role": "user",
            "content": "Tell me a fun fact"
        }
    ]
)

response.choices[0].message.content

## Ollama

### Compatible model

Ollama runs open-weight models locally. This section uses a Llama model from Meta:

```bash
llama3.2
```

If your computer has limited resources, use the smaller version:

```bash
llama3.2:1b
```

Avoid `llama3.3` and `llama4`: they are too large for most personal computers.

Download `llama3.2` with the command below:

```bash
ollama pull llama3.2
```

### Running the Ollama server

Execute the code cell at the end of this section. If it prints "Ollama is running", the server is active and ready to accept requests. If not, you need to start the Ollama server manually.

Open a terminal and run:
```bash
ollama serve
```

This starts the local server at http://localhost:11434.

In [None]:
import requests
print(requests.get("http://localhost:11434").text)  # .text prints a plain string instead of bytes

### Create the OpenAI Client

In [None]:
OLLAMA_BASE_URL = "http://localhost:11434/v1"

ollama = OpenAI(base_url=OLLAMA_BASE_URL, api_key='ollama')

Note that the `/v1` in the base URL is required because Ollama serves its OpenAI-compatible API under the `/v1` path. Without `/v1`, requests to the server will fail.
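To see why the suffix matters, the sketch below mimics how a base URL and the `chat/completions` path combine into the final request URL. `chat_completions_url` is a hypothetical helper for illustration; it approximates, rather than reproduces, what the client does internally.

```python
def chat_completions_url(base_url: str) -> str:
    """Join a base URL with the chat completions path, tolerating a trailing slash."""
    return base_url.rstrip("/") + "/chat/completions"

# With /v1: the OpenAI-compatible route.
print(chat_completions_url("http://localhost:11434/v1"))
# Without /v1: the path no longer includes the version prefix.
print(chat_completions_url("http://localhost:11434"))
```

The first URL includes the `/v1` prefix and the second does not, so a client configured with the bare host would send its requests to the wrong path.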


### Let's ask for a fun fact

In [None]:
response = ollama.chat.completions.create(
    model="llama3.2",
    messages=[
        {
            "role": "user",
            "content": "Tell me a fun fact"
        }
    ]
)

response.choices[0].message.content

We can also use other models supported by Ollama, such as `deepseek-r1:1.5b`, a version of DeepSeek-R1 "distilled" into a Qwen model from Alibaba Cloud.

In [None]:
!ollama pull deepseek-r1:1.5b

In [None]:
response = ollama.chat.completions.create(
    model="deepseek-r1:1.5b",
    messages=[
        {
            "role": "user",
            "content": "Tell me a fun fact"
        }
    ]
)

response.choices[0].message.content