In [3]:
%pip install openai

Note: you may need to restart the kernel to use updated packages.


To run this in Colab, you will need to set two Colab secrets: `DSBA_LLAMA3_KEY`, the API key, and `MODAL_BASE_URL`, the URL of the endpoint where the Llama 3 model is hosted. Append `/v1/` to the `MODAL_BASE_URL` path so that it looks like:

```
# MODAL_BASE_URL
https://your-workspace-name--vllm-openai-compatible-serve.modal.run/v1/
```
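If you are unsure whether your URL already carries the `/v1/` suffix, a small helper can normalize it. This is a minimal sketch; `ensure_v1_suffix` is a hypothetical name introduced here for illustration and is not part of the `openai` package or Modal.

```python
def ensure_v1_suffix(base_url: str) -> str:
    """Return base_url ending in exactly one '/v1/' segment."""
    # Drop surrounding whitespace and any trailing slashes first
    trimmed = base_url.strip().rstrip("/")
    # Append '/v1' only if it is not already the last path segment
    if not trimmed.endswith("/v1"):
        trimmed += "/v1"
    return trimmed + "/"


print(ensure_v1_suffix("https://your-workspace-name--vllm-openai-compatible-serve.modal.run"))
```

The helper is idempotent, so it is safe to apply to a value that already ends in `/v1` or `/v1/`.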



If you are using the class API example, both values will be provided to you. Otherwise, you will need to get them from your own Modal service.

![](https://miro.medium.com/v2/resize:fit:1400/format:webp/1*5wEevNCOf80GTHwptPTB4g.png)

As mentioned, I have hosted a `Meta-Llama-3-8B-Instruct` model that we'll use instead of OpenAI. This avoids individual API costs: the only cost is mine, for hosting the model on Modal.

This hosted model is for class demo purposes only and will **not** stay up indefinitely.

If you host your own model, be sure to destroy it when you're done or you'll be charged.

In [4]:
# Environment variables: running locally
%pip install python-dotenv

Note: you may need to restart the kernel to use updated packages.


In [1]:
import os
from dotenv import load_dotenv
load_dotenv()

API_KEY = os.getenv('DSBA_LLAMA3_KEY')
MODAL_BASE_URL = os.getenv('MODAL_BASE_URL')
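
`os.getenv` returns `None` when a variable is missing, which only surfaces later as a confusing client error. A minimal sketch of failing early instead; `require_env` is a hypothetical helper name, not part of `python-dotenv`:

```python
import os


def require_env(name: str) -> str:
    """Return the value of environment variable `name`, or raise if it is missing or empty."""
    value = os.getenv(name)
    if not value:
        raise RuntimeError(
            f"{name} is not set; add it to your .env file or shell environment."
        )
    return value
```

After `load_dotenv()`, you could then write `API_KEY = require_env("DSBA_LLAMA3_KEY")` and `MODAL_BASE_URL = require_env("MODAL_BASE_URL")`.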


In [21]:
# Environment variables: running in Colab
from openai import OpenAI
from google.colab import userdata

# Use the same names as the local cell so the later cells work in either environment
API_KEY = userdata.get("DSBA_LLAMA3_KEY")
MODAL_BASE_URL = userdata.get("MODAL_BASE_URL")

ModuleNotFoundError: No module named 'google'
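
The `ModuleNotFoundError` above is what you get when the Colab cell runs outside Colab, since `google.colab` only exists there. One way to let a single cell work in both places is to try the Colab import and fall back to environment variables. This is a sketch under that assumption; `get_secret` is a hypothetical helper name.

```python
import os


def get_secret(name: str):
    """Read `name` from Colab secrets when available, else from the environment."""
    try:
        # Only importable inside Google Colab
        from google.colab import userdata
    except ImportError:
        # Running locally: rely on environment variables (e.g. loaded via load_dotenv())
        return os.getenv(name)
    return userdata.get(name)
```

With this in place, `API_KEY = get_secret("DSBA_LLAMA3_KEY")` and `MODAL_BASE_URL = get_secret("MODAL_BASE_URL")` would replace the two environment-specific cells. When running locally, call `load_dotenv()` first so the `.env` values are visible.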

In [5]:
from openai import OpenAI

# Point the OpenAI client at the Modal-hosted, OpenAI-compatible endpoint
client = OpenAI(api_key=API_KEY, base_url=MODAL_BASE_URL)

model = "/models/NousResearch/Meta-Llama-3-8B-Instruct"


In [6]:
messages = [
    {
        "role": "system",
        "content": "You are a poetic assistant, skilled in writing satirical doggerel with creative flair.",
    },
    {
        "role": "user",
        "content": "Compose a limerick about baboons and raccoons.",
    },
]

stream = client.chat.completions.create(
    model=model,
    messages=messages,
    stream=True,
)

for chunk in stream:
    if chunk.choices[0].delta.content is not None:
        print(chunk.choices[0].delta.content, end="")
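
The streaming loop above prints each piece of text as it arrives but does not keep the full reply. A minimal sketch of a helper that does both, and that also skips chunks with an empty `choices` list; `collect_stream` is a hypothetical name, and the stand-in chunks below only mimic the `choices[0].delta.content` shape so the example runs without a server.

```python
from types import SimpleNamespace


def collect_stream(stream) -> str:
    """Print streamed text as it arrives and return the full reply."""
    parts = []
    for chunk in stream:
        # Some chunks may carry no choices; skip them
        if not chunk.choices:
            continue
        text = chunk.choices[0].delta.content
        if text:
            print(text, end="", flush=True)
            parts.append(text)
    print()  # finish the line once the stream ends
    return "".join(parts)


def fake_chunk(text):
    """Build a stand-in object shaped like a streamed chat completion chunk."""
    return SimpleNamespace(choices=[SimpleNamespace(delta=SimpleNamespace(content=text))])


# Offline check: the None chunk and the empty-choices chunk are ignored
demo = [fake_chunk("Hello"), fake_chunk(None), SimpleNamespace(choices=[]), fake_chunk(", world")]
reply = collect_stream(demo)
```

With the real client, you would pass the object returned by `client.chat.completions.create(..., stream=True)` in place of `demo`.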

In [8]:
messages = [
    {
        "role": "system",
        "content": "You think logically, step by step.",
    },
    {
        "role": "user",
        "content": "What is 2 + 2 + 2?",
    },
]

stream = client.chat.completions.create(
    model=model,
    messages=messages,
    stream=True,
)

for chunk in stream:
    if chunk.choices[0].delta.content is not None:
        print(chunk.choices[0].delta.content, end="")

1. Start with the first number: 2
2. Add 2: 2 + 2 = 4
3. Add 2 again: 4 + 2 = 6

The answer is 6.
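
Both examples above send the same two-message structure: one `system` message that sets the behavior and one `user` message with the request. A small sketch of a helper that builds it; `build_messages` is a hypothetical name introduced for illustration.

```python
def build_messages(system_prompt: str, user_prompt: str) -> list:
    """Return a chat message list with one system and one user message."""
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_prompt},
    ]


messages = build_messages("You think logically, step by step.", "What is 2 + 2 + 2?")
```

The result can be passed directly as the `messages` argument of `client.chat.completions.create`.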

Alternatively, you can run this as a cURL command. This example shows how to run it in bash (Unix/Mac).

This assumes `MODAL_BASE_URL` and `DSBA_LLAMA3_KEY` are set as environment variables in your shell (e.g., loaded from a `.env` file).

```bash
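# ${MODAL_BASE_URL%/} drops the trailing slash so the path is /v1/chat/completions, not /v1//chat/completions
curl "${MODAL_BASE_URL%/}/chat/completions" \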
    -H "Content-Type: application/json" \
    -H "Authorization: Bearer $DSBA_LLAMA3_KEY" \
    -d '{
        "model": "/models/NousResearch/Meta-Llama-3-8B-Instruct",
        "messages": [
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": "Who won the world series in 2020?"}
        ]
    }'
```

If you have [`jq`](https://jqlang.github.io/jq/download/) installed, you can also append `| jq` to pretty-print the JSON response:
```bash
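# ${MODAL_BASE_URL%/} drops the trailing slash so the path is /v1/chat/completions, not /v1//chat/completions
curl "${MODAL_BASE_URL%/}/chat/completions" \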
    -H "Content-Type: application/json" \
    -H "Authorization: Bearer $DSBA_LLAMA3_KEY" \
    -d '{
        "model": "/models/NousResearch/Meta-Llama-3-8B-Instruct",
        "messages": [
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": "Who won the world series in 2020?"}
        ]
    }' | jq
```
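
The cURL call is a plain HTTP POST with a JSON body and a bearer token, so it can also be reproduced in Python without the `openai` package. This is a minimal sketch using only the standard library; `build_chat_request` is a hypothetical helper name, and the example URL and key below are placeholders.

```python
import json
import urllib.request


def build_chat_request(base_url, api_key, model, messages):
    """Build (but do not send) a POST request equivalent to the cURL example."""
    # Strip any trailing slash so the path has a single '/' before 'chat/completions'
    url = base_url.rstrip("/") + "/chat/completions"
    body = json.dumps({"model": model, "messages": messages}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


req = build_chat_request(
    "https://your-workspace-name--vllm-openai-compatible-serve.modal.run/v1/",  # placeholder
    "your-api-key",  # placeholder
    "/models/NousResearch/Meta-Llama-3-8B-Instruct",
    [{"role": "user", "content": "Who won the world series in 2020?"}],
)
print(req.full_url)
```

To actually send it, open the request with `urllib.request.urlopen(req)` and parse the response with `json.load`.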