<a href="https://colab.research.google.com/github/wandb/edu/blob/main/llm-structured-extraction/4.final-project.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>
<!--- @wandbcode{llmeng-1-final} -->

## Setup Colab

Run this cell if you're using Google Colab; skip it if you're running locally. You may need to restart the Colab runtime after installing the requirements.

In [11]:
from pathlib import Path

# Download the requirements file and helpers when running on Colab
if not Path("requirements.txt").exists():
    !wget https://raw.githubusercontent.com/wandb/edu/main/llm-structured-extraction/{requirements.txt,helpers.py}
    !pip install -r requirements.txt -Uqq

--2025-04-29 16:32:23--  https://raw.githubusercontent.com/wandb/edu/main/llm-structured-extraction/requirements.txt
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.108.133, 185.199.109.133, 185.199.110.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.108.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 90 [text/plain]
Saving to: ‘requirements.txt’


2025-04-29 16:32:23 (3.81 MB/s) - ‘requirements.txt’ saved [90/90]

--2025-04-29 16:32:23--  https://raw.githubusercontent.com/wandb/edu/main/llm-structured-extraction/helpers.py
Reusing existing connection to raw.githubusercontent.com:443.
HTTP request sent, awaiting response... 200 OK
Length: 994 [text/plain]
Saving to: ‘helpers.py’


2025-04-29 16:32:23 (47.5 MB/s) - ‘helpers.py’ saved [994/994]

FINISHED --2025-04-29 16:32:23--
Total wall clock time: 0.3s
Downloaded: 2 files, 1.1K in 0s (24.3 MB/s)

In [17]:
import os
from getpass import getpass
import openai

# Set up your OpenAI API key
if os.getenv("OPENAI_API_KEY") is None:
  if any(['VSCODE' in x for x in os.environ.keys()]):
    print('Please enter password in the VS Code prompt at the top of your VS Code window!')
  os.environ["OPENAI_API_KEY"] = getpass("Paste your OpenAI key from: https://platform.openai.com/account/api-keys\n")
  openai.api_key = os.getenv("OPENAI_API_KEY", "")


assert os.getenv("OPENAI_API_KEY", "").startswith("sk-"), "This doesn't look like a valid OpenAI API key"
print("OpenAI API key configured")

OpenAI API key configured


## Using Weave for LLM Experiment Tracking

[Weave](https://wandb.github.io/weave/) is a lightweight toolkit by Weights & Biases for tracking and evaluating LLM applications. It allows you to:

- Log and debug language model inputs, outputs, and traces
- Build rigorous evaluations for LLM use cases
- Organize information across the LLM workflow

Once `weave.init` has been called, OpenAI calls are logged to Weave automatically.
Decorating your own functions with `@weave.op()` logs their inputs and outputs as well.

In [18]:
import weave
weave.init("llmeng-1-final")

Please login to Weights & Biases (https://wandb.ai/) to continue:


wandb: Logging into wandb.ai. (Learn how to deploy a W&B server locally: https://wandb.me/wandb-server)
wandb: You can find your API key in your browser here: https://wandb.ai/authorize?ref=weave
wandb: Paste an API key from your profile and hit enter:

 ··········


wandb: No netrc file found, creating one.
wandb: Appending key for api.wandb.ai to your netrc file: /root/.netrc
wandb: Currently logged in as: edbertkwesi-ek (edbertkwesi-ek-unilever) to https://api.wandb.ai. Use `wandb login --relogin` to force relogin


Logged in as Weights & Biases user: edbertkwesi-ek.
View Weave data at https://wandb.ai/edbertkwesi-ek-unilever/llmeng-1-final/weave


<weave.trace.weave_client.WeaveClient at 0x7a1c78846e50>
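
With Weave initialized, your own functions can be traced too. The sketch below decorates a hypothetical helper, `range_length_days` (not part of the course code), with `@weave.op()` so that its inputs and outputs would be logged next to the OpenAI calls. The `try`/`except` fallback is an assumption added only so the sketch still runs where `weave` is not installed.

```python
from datetime import date

try:
    import weave

    op = weave.op
except ImportError:
    # Fallback (assumption for this sketch): a decorator factory that
    # returns the function unchanged when weave is not available.
    def op(*_args, **_kwargs):
        return lambda fn: fn


@op()
def range_length_days(start: str, end: str) -> int:
    """Hypothetical helper: days between two ISO dates (end minus start)."""
    return (date.fromisoformat(end) - date.fromisoformat(start)).days


# A three-week window, like the "last 3 weeks" test query further down
print(range_length_days("2024-01-01", "2024-01-22"))
```

A helper like this could be used to sanity-check the date ranges the model returns, with each call showing up as its own trace.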

## Using Weights & Biases to track experiments

Experimenting with prompts, function calling, and the response model schema is critical to getting good results. As LLM engineers, we should be methodical and use Weights & Biases to track our experiments.

Here are a few things you should consider logging:

1. Save input and output pairs for later analysis
2. Save the JSON schema for the response_model
3. Save snapshots of the model and data so that we can compare results over time and see how they change as we modify the model.

This is particularly useful when we want to blend synthetic and real data to evaluate our model. We will use the `wandb` library to track our experiments and save the results to a dashboard.
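
As a minimal sketch of the first item, input and output pairs can be written to a JSON Lines file before being attached to a run. The function names `save_pairs` and `load_pairs` are hypothetical and use only the standard library; they are not part of `helpers.py` or `wandb`.

```python
import json
from pathlib import Path


def save_pairs(path, inputs, outputs):
    """Write one JSON object per line, pairing each input with its output."""
    if len(inputs) != len(outputs):
        raise ValueError("inputs and outputs must have the same length")
    with Path(path).open("w", encoding="utf-8") as f:
        for inp, out in zip(inputs, outputs):
            f.write(json.dumps({"input": inp, "output": out}) + "\n")


def load_pairs(path):
    """Read the pairs back as a list of dicts, skipping blank lines."""
    with Path(path).open(encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]
```

Keeping the input next to each output makes it possible to re-score old runs later without having to reconstruct which query produced which result.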


In [21]:
import json
import instructor

from openai import AsyncOpenAI
from helpers import dicts_to_df
from datetime import date
from pydantic import BaseModel, Field


class DateRange(BaseModel):
    chain_of_thought: str = Field(
        description="Think step by step to plan what is the best time range to search in"
    )
    start: date
    end: date


class Query(BaseModel):
    rewritten_query: str = Field(
        description="Rewrite the query to make it more specific"
    )
    published_daterange: DateRange = Field(
        description="Effective date range to search in"
    )

    def report(self):
        dct = self.model_dump()
        dct["usage"] = self._raw_response.usage.model_dump()
        return dct



# We'll use a different client for async calls
# To highlight the difference and how we can use both
aclient = instructor.patch(AsyncOpenAI())


async def expand_query(
    q, *, model: str = "gpt-3.5-turbo", temp: float = 0
) -> Query:
    return await aclient.chat.completions.create(
        model=model,
        temperature=temp,
        response_model=Query,
        messages=[
            {
                "role": "system",
                "content": f"You're a query understanding system for the Metafor Systems search engine. Today is {date.today()}. Here are some tips: ...",
            },
            {"role": "user", "content": f"query: {q}"},
        ],
    )

In [24]:
import asyncio
import time
import pandas as pd
import wandb

model = "gpt-3.5-turbo"
temp = 0

run = wandb.init(
    project="llmeng-1-final",
    config={"model": model, "temp": temp},
)

test_queries = [
    "latest developments in artificial intelligence last 3 weeks",
    "renewable energy trends past month",
    "quantum computing advancements last 2 months",
    "biotechnology updates last 10 days",
]
start = time.perf_counter()

queries = await asyncio.gather(
    *[expand_query(q, model=model, temp=temp) for q in test_queries]
)
duration = time.perf_counter() - start

with open("schema.json", "w+") as f:
    schema = Query.model_json_schema()
    json.dump(schema, f, indent=2)

with open("results.jsonlines", "w+") as f:
    for query in queries:
        f.write(query.model_dump_json() + "\n")

df = dicts_to_df([q.report() for q in queries])
df["input"] = test_queries
df.to_csv("results.csv")


run.log({"schema": wandb.Table(dataframe=pd.DataFrame([{"schema": schema}]))})

run.log(
    {
        "usage_total_tokens": df["usage_total_tokens"].sum(),
        "usage_completion_tokens": df["usage_completion_tokens"].sum(),
        "usage_prompt_tokens": df["usage_prompt_tokens"].sum(),
        "duration (s)": duration,
        "average duration (s)": duration / len(queries),
        "n_queries": len(queries),
    }
)


run.log(
    {
        "results": wandb.Table(dataframe=df),
    }
)

files = wandb.Artifact("data", type="dataset")

files.add_file("schema.json")
files.add_file("results.jsonlines")
files.add_file("results.csv")


run.log_artifact(files)
run.finish()

🍩 https://wandb.ai/edbertkwesi-ek-unilever/llmeng-1-final/r/call/01968290-ac2f-71a1-bbab-f647f7295d23
🍩 https://wandb.ai/edbertkwesi-ek-unilever/llmeng-1-final/r/call/01968290-ac38-7cf1-a3c4-e02d1e3085ec
🍩 https://wandb.ai/edbertkwesi-ek-unilever/llmeng-1-final/r/call/01968290-ac27-7501-8a43-a274f4136974
🍩 https://wandb.ai/edbertkwesi-ek-unilever/llmeng-1-final/r/call/01968290-ac40-7421-8ed6-8195143528fd


InstructorRetryException: Connection error.
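
The run above stopped on a connection error, and because `asyncio.gather` re-raises the first exception by default, nothing after the `gather` call ran. One possible way to keep the queries that did succeed is `return_exceptions=True`, sketched here with a hypothetical stand-in coroutine `fake_expand` in place of the real `expand_query`. The helper name `gather_partial` is also an assumption, not part of the course code.

```python
import asyncio


async def gather_partial(items, worker):
    """Run worker(item) concurrently and split successes from failures."""
    results = await asyncio.gather(
        *(worker(item) for item in items), return_exceptions=True
    )
    succeeded, failed = [], []
    for item, result in zip(items, results):
        if isinstance(result, BaseException):
            failed.append((item, result))
        else:
            succeeded.append((item, result))
    return succeeded, failed


async def fake_expand(q):
    """Hypothetical stand-in for expand_query that fails on one input."""
    await asyncio.sleep(0)
    if "quantum" in q:
        raise ConnectionError("Connection error.")
    return q.upper()


# In a notebook cell:
#   succeeded, failed = await gather_partial(test_queries, fake_expand)
# In a plain script, wrap the call in asyncio.run(...) instead.
```

The failed inputs could then be logged to the run or retried, while the successful ones still reach the results table.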

## Create a W&B Report

After logging your experiments, create a [W&B Report](https://docs.wandb.ai/guides/reports/create-a-report) and document your findings. Copy the link to your report into a text file and submit it as the final project assignment on our course platform. Good luck!