# BrainTrust Classification Tutorial (Article Titles)

Welcome to [BrainTrust](https://www.braintrustdata.com/)! This tutorial will teach you the basics of working with BrainTrust to evaluate a text classification use case, including creating a project, running experiments, and analyzing their results.

Before starting, please make sure that you have a BrainTrust account. If you do not, please [sign up](https://www.braintrustdata.com) or [get in touch](mailto:info@braintrustdata.com). After this tutorial, feel free to dig deeper by visiting [the docs](http://www.braintrustdata.com/docs).

In [1]:
%%capture
# NOTE: Replace YOUR_OPENAI_KEY with your OpenAI API key and YOUR_BRAINTRUST_API_KEY with your BrainTrust API key. Do not put them in quotes.
# `%%capture` is a cell magic, so it must be the first line of the cell.
%env OPENAI_API_KEY=YOUR_OPENAI_KEY
%env BRAINTRUST_API_KEY=YOUR_BRAINTRUST_API_KEY
%env BRAINTRUST_API_URL=http://localhost:3000
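Because the keys must be set without quotes, a quick sanity check can catch a missing or quoted value before any API call is made. The following is a minimal sketch, not part of the BrainTrust or OpenAI libraries: `find_env_problems` and `REQUIRED_VARS` are hypothetical names introduced here for illustration.

```python
import os

# Variables this tutorial expects to be set (names taken from the cell above).
REQUIRED_VARS = ("OPENAI_API_KEY", "BRAINTRUST_API_KEY")


def find_env_problems(environ, required=REQUIRED_VARS):
    """Return a list of human-readable problems with the required variables."""
    problems = []
    for name in required:
        value = environ.get(name, "")
        if not value:
            problems.append(f"{name} is not set")
        elif value[0] in "\"'" or value[-1] in "\"'":
            # A leading or trailing quote usually means the value was pasted with quotes.
            problems.append(f"{name} contains literal quote characters")
    return problems


for problem in find_env_problems(os.environ):
    print("Warning:", problem)
```

If the loop prints nothing, both variables are present and unquoted; it does not verify that the keys are valid.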


We'll start by installing some dependencies, setting up the environment, and downloading the AG News dataset of news article titles.

In [2]:
%pip install braintrust guidance openai datasets

Note: you may need to restart the kernel to use updated packages.


In [3]:
import asyncio
import braintrust
import guidance
import openai
import time

from datasets import load_dataset

MODEL = "gpt-3.5-turbo"
guidance.llm = guidance.llms.OpenAI(MODEL)

# Load the dataset from Hugging Face.
dataset = load_dataset("ag_news", split="train")

# Shuffle and trim to 20 datapoints. Restructure our dataset
# slightly so that each item in the list contains the title
# itself ("text") and the expected category index ("label").
trimmed_dataset = dataset.shuffle()[:20]
articles = [{
    "text": trimmed_dataset["text"][i],
    "label": trimmed_dataset["label"][i],
    } for i in range(len(trimmed_dataset["text"]))]

# Extract category names from the dataset and build a map from index to
# category name. We will use this to compare the expected categories to
# those produced by the model.
category_names = dataset.features['label'].names
category_map = dict(enumerate(category_names))

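To make the restructuring step concrete without downloading anything, here is a small sketch of the same two transformations on stand-in data. The label order is an assumption (it matches the category list used in the prompt later in this tutorial); in the notebook the names come from `dataset.features['label'].names`. The rows in `columns` are hypothetical.

```python
# Assumed label order (index -> name); the notebook reads this from the dataset.
category_names = ["World", "Sports", "Business", "Sci/Tech"]
category_map = dict(enumerate(category_names))

# Hypothetical column-oriented sample, shaped like the result of slicing the dataset.
columns = {
    "text": ["Example title A", "Example title B"],
    "label": [1, 3],
}

# Convert columns into one dict per article.
articles = [
    {"text": text, "label": label}
    for text, label in zip(columns["text"], columns["label"])
]

for article in articles:
    print(article["text"], "->", category_map[article["label"]])
```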

## Writing the initial prompts

Let's analyze the first example and build up a prompt for categorizing a title. We're using a library called [Guidance](https://github.com/microsoft/guidance), which makes it easy to template prompts and cache results. With BrainTrust, you can use any library you'd like -- Guidance, LangChain, or even just direct calls to an LLM.

The prompt provides the article's title to the model, and asks it to generate a category.

In [None]:
one_article = articles[0]
print(one_article["text"])
print(one_article["label"])

In [None]:
classify_article = guidance('''
{{#system~}}
You are an editor in a newspaper who helps writers identify the right category for their news articles,
by reading the article's title. The category should be one of the following: World, Sports, Business
or Sci/Tech. Reply with one word corresponding to the category.
{{~/system}}

{{#user~}}
Article title: {{article_title}}
{{~/user}}

{{#assistant~}}
{{gen 'category' max_tokens=500}}
{{~/assistant}}''')

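# Template variables are passed by keyword, matching {{article_title}} above.
out = classify_article(article_title=one_article["text"])
print(out["category"])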
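The prompt asks for a one-word reply, but a model may still add whitespace, punctuation, or an unexpected label. One possible way to guard against that is sketched below; `normalize_category` and `ALLOWED_CATEGORIES` are hypothetical helpers introduced here, not part of Guidance or BrainTrust.

```python
# Lower-cased versions of the four categories named in the system prompt.
ALLOWED_CATEGORIES = ("world", "sports", "business", "sci/tech")


def normalize_category(raw_reply):
    """Map a raw model reply to an allowed category, or None if it does not match."""
    # Drop surrounding whitespace, then stray quotes or a trailing period.
    cleaned = raw_reply.strip().strip(".\"'").lower()
    return cleaned if cleaned in ALLOWED_CATEGORIES else None


print(normalize_category(" Sports\n"))
print(normalize_category("Politics"))
```

Returning `None` for unrecognized replies keeps them visible as mismatches instead of silently coercing them into a valid category.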

## Running across the dataset

Now that we have automated the process of classifying titles, we can test the full set of articles. This block uses Python's async features to generate and grade in parallel, effectively making your OpenAI account's rate limit the limiting factor.

As it runs, it compares the generated category to the expected one from the dataset. Once this loop completes, you can view the results in BrainTrust.
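For readers new to `asyncio`, the sketch below shows one way to run many classification calls concurrently while capping how many are in flight at once. It is an illustration under stated assumptions, not the tutorial's implementation: `classify_blocking` is a hypothetical stand-in for a blocking model call, `max_concurrency` is an illustrative parameter, and `asyncio.to_thread` requires Python 3.9 or later.

```python
import asyncio
import time


def classify_blocking(title):
    """Hypothetical stand-in for a blocking model call."""
    time.sleep(0.05)
    return "sports"


async def classify_with_limit(title, semaphore):
    # The semaphore caps concurrent calls; the worker thread keeps the
    # event loop free while the blocking call is waiting.
    async with semaphore:
        return await asyncio.to_thread(classify_blocking, title)


async def classify_all(titles, max_concurrency=5):
    semaphore = asyncio.Semaphore(max_concurrency)
    # gather() returns results in the same order as the input titles.
    return await asyncio.gather(
        *(classify_with_limit(title, semaphore) for title in titles)
    )
```

In a plain Python script this could be driven with `asyncio.run(classify_all(titles))`; inside a notebook that already has a running event loop, `await classify_all(titles)` is usually the form that works.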

In [None]:
async def evaluate_article(article):
    title = article["text"]
    category = classify_article(article_title=title)["category"].strip().lower()
    expected = category_map[article["label"]].strip().lower()
    return (title, expected, category)


async def run_on_all_articles():
    start = time.time()
    tasks = [asyncio.create_task(evaluate_article(article)) for article in articles]
    category_grades = [await t for t in tasks]
    end = time.time()
    print("Took", end - start, "seconds")
    return category_grades


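def analyze_experiment(data):
    # Log one record per article. `experiment` is the global created by
    # braintrust.init() below, before this function is called.
    for (title, expected, category) in data:
        experiment.log(
            inputs={"title": title},
            output=category,
            expected=expected,
            scores={"match": 1 if category == expected else 0},
        )

    print(experiment.summarize())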

# This line assumes there is an async event loop initialized but that the
# notebook is not running inside of an async task. This is true on Google Colab,
# but other notebook environments may vary. If you see errors while trying this
# code in a different tool, try changing this to `await run_on_all_articles()`, or
# initializing an event loop above with `asyncio.get_event_loop()`.
categories = asyncio.run(run_on_all_articles())
experiment = braintrust.init(
  project="classify-article-titles",
  experiment="original-prompt"
)
analyze_experiment(categories)
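Before opening the experiment, it can help to compute the same exact-match score locally as a cross-check. The sketch below assumes rows shaped like the `(title, expected, category)` tuples returned above; `summarize_matches` is a hypothetical helper and the sample rows are made up for illustration.

```python
from collections import defaultdict


def summarize_matches(rows):
    """Return (overall accuracy, per-category accuracy) for (title, expected, predicted) rows."""
    totals = defaultdict(int)
    hits = defaultdict(int)
    for _title, expected, predicted in rows:
        totals[expected] += 1
        hits[expected] += int(predicted == expected)
    overall = sum(hits.values()) / len(rows) if rows else 0.0
    per_category = {name: hits[name] / totals[name] for name in totals}
    return overall, per_category


# Hypothetical rows standing in for the output of run_on_all_articles().
sample_rows = [
    ("Title A", "sports", "sports"),
    ("Title B", "world", "business"),
    ("Title C", "sports", "world"),
    ("Title D", "sci/tech", "sci/tech"),
]
overall, per_category = summarize_matches(sample_rows)
print("overall:", overall)
print("per category:", per_category)
```

Per-category accuracy is grouped by the expected label, so it shows which true categories the prompt handles poorly.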

## Pause and analyze the results in BrainTrust!

The cell above will print a link to the BrainTrust experiment. Go check it out (NOTE: it may take up to a minute to synchronize the data for viewing).