
# Section 2: Article Analyzer

This Colab notebook analyzes the tariff article using OpenAI's **Responses API**.
It demonstrates **structured outputs** (JSON Schema) so you can convert model judgements into clean data (e.g., political slant, sentiment, and article quality).

### What you'll learn
- How to call the Responses API with structured outputs via `text={"format": {"type": "json_schema", ..., "strict": True}}`.
- How to design a robust extraction prompt with clear rubrics to reduce ambiguity.
- How to parse and store the model's JSON in a Polars DataFrame for later analysis.
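A minimal sketch of what a structured-output request could look like, assuming the Responses API's `text={"format": {...}}` parameter for JSON Schema output. The schema name `article_analysis`, its fields (`political_slant`, `sentiment`, `quality_score`, `rationale`), and the helpers `build_request` and `parse_analysis` are hypothetical choices for illustration; adapt them to your own rubric. The request is only built here, not sent.

```python
import json

# Hypothetical rubric fields. In strict mode every property must appear in
# "required" and additionalProperties must be False.
ARTICLE_SCHEMA = {
    "type": "object",
    "properties": {
        "political_slant": {"type": "string", "enum": ["left", "center", "right"]},
        "sentiment": {"type": "string", "enum": ["negative", "neutral", "positive"]},
        "quality_score": {"type": "integer", "description": "1 (poor) to 5 (excellent)"},
        "rationale": {"type": "string"},
    },
    "required": ["political_slant", "sentiment", "quality_score", "rationale"],
    "additionalProperties": False,
}


def build_request(article_text, model="gpt-5-mini"):
    """Keyword arguments for client.responses.create(**build_request(...))."""
    return {
        "model": model,
        "instructions": (
            "You are a careful media analyst. Classify the article's political slant "
            "and sentiment, and rate its quality from 1 to 5 using the schema."
        ),
        "input": "Analyze this article:\n\n" + article_text,
        "text": {
            "format": {
                "type": "json_schema",
                "name": "article_analysis",
                "schema": ARTICLE_SCHEMA,
                "strict": True,
            }
        },
    }


def parse_analysis(output_text):
    """Turn the model's JSON string into a dict and check the expected keys exist."""
    data = json.loads(output_text)
    missing = [key for key in ARTICLE_SCHEMA["required"] if key not in data]
    if missing:
        raise ValueError(f"Missing keys in model output: {missing}")
    return data
```

With the client from the setup cell, `resp = client.responses.create(**build_request(article_text))` followed by `parse_analysis(resp.output_text)` gives one dict per article; a list of such dicts can be passed to `pl.DataFrame(...)` to get one column per field.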



## 1) Setup

In [36]:
# @title
import os
import json
import getpass

import requests
import polars as pl
from openai import OpenAI
from google.colab import userdata
from tqdm import tqdm

# Try Colab's Secrets panel first. userdata.get raises an error (rather than
# returning None) when the secret is missing or notebook access is not granted.
try:
    OPENAI_API_KEY = userdata.get('OPENAI_API_KEY')
except Exception:
    OPENAI_API_KEY = None

# Fall back to a hidden prompt if no secret was found
if not OPENAI_API_KEY:
    try:
        OPENAI_API_KEY = getpass.getpass("Enter your OpenAI API key (input hidden): ").strip()
    except Exception:
        raise ValueError("Unable to capture API key input. Set os.environ['OPENAI_API_KEY'] manually.")

if not OPENAI_API_KEY:
    raise ValueError("Missing API key. Please provide a valid OpenAI API key.")

# Make the key available to the SDK and create the client used in the rest of the notebook
os.environ["OPENAI_API_KEY"] = OPENAI_API_KEY
client = OpenAI(api_key=OPENAI_API_KEY)



## 2.0) Working with text in Python

- A useful Python trick for combining strings is the `+` operator (string concatenation).

For example:



In [None]:
text = "Hello " + "World"

print(text) # -> "Hello World"

In [None]:
name = "Calvin"

print("Hello " + name) # --> "Hello Calvin"
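One thing to watch: `+` joins strings exactly as written, with nothing added in between. A quick check (the instruction and article strings here are made-up placeholders):

```python
instruction = "Analyze this article"
article = "Tariffs on steel rose again this week."

# No separator: the instruction runs straight into the article text
squashed = instruction + article

# Adding ":\n\n" keeps the instruction and the article visually separate
separated = instruction + ":\n\n" + article

print(squashed)
print(separated)
```

The same applies when joining an instruction with `article_text` in the API call below.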

## 2.1) Analyze the article using the Responses API

### First, let's read in the article text. You don't need to understand how this works; the code is shown so you can see that the article text is stored in a variable called `article_text`.


In [None]:
URL = "https://raw.githubusercontent.com/calisley/dpi-681/refs/heads/main/section-2/article.txt"
res = requests.get(URL, timeout=30)
res.raise_for_status()
res.encoding = res.encoding or "utf-8"
article_text = res.text

print(f"Loaded from URL: {URL} ({len(article_text):,} chars)")


### Here, use the prompt engineering tricks we discussed in class to get the model to tell you something about the article

- We use the string concatenation trick from section 2.0 to append the article text to our input prompt.

Play around with a few different system prompts and be ready to discuss what you found.

In [None]:

resp = client.responses.create(
    model="gpt-5-mini",
    instructions="[Your system prompt here]",
    input="Analyze this article:\n\n" + article_text  # string concatenation from section 2.0
)

print(resp.output_text)
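To compare several system prompts side by side, one option is to keep them in a dictionary and send the same article with each. A minimal sketch; the prompt texts and the function name `compare_prompts` are hypothetical, and `client` is assumed to be the `OpenAI` client from the setup cell.

```python
def compare_prompts(client, prompts, article_text, model="gpt-5-mini"):
    """Run the same article through each system prompt; return {label: output_text}."""
    results = {}
    for label, system_prompt in prompts.items():
        resp = client.responses.create(
            model=model,
            instructions=system_prompt,
            input="Analyze this article:\n\n" + article_text,
        )
        results[label] = resp.output_text
    return results


# Hypothetical prompts to try; swap in your own
PROMPTS = {
    "neutral": "You are a neutral summarizer. Summarize the article in three sentences.",
    "critic": "You are a media critic. Identify any loaded language or one-sided framing.",
}
```

For example, `results = compare_prompts(client, PROMPTS, article_text)` and then print each `label` with its text to compare the outputs.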


## 2.2) Interlude on Polars and Loops

### Example: Basic DataFrame

In [None]:
data = {
    "name": ["Alice", "Bob"],
    "age": [25, 30]
}

df = pl.DataFrame(data)
print(df)

### Example: Reading in some data

In [None]:
URL = "https://raw.githubusercontent.com/calisley/dpi-681/refs/heads/main/section-2/articles.csv"
article_df = pl.read_csv(URL)
print(article_df.head()) #prints the first 5 rows

### Example: Basic loop

In [None]:
numbers = [1, 2, 3, 4]
for num in numbers:
    print(num)

### Example: Looping over a DataFrame

> Important: Note how we extract the "name" value for each row with `row["name"]`. With `iter_rows(named=True)`, each row is a dictionary whose keys are the column names and whose values are that row's entries.

In [None]:
df = pl.DataFrame({
  "name": ["Alice", "Bob"],
  "age": [25, 30]
})

for row in df.iter_rows(named=True):
    print(row["name"])

### Example: Looping over our article DataFrame

In [None]:
for row in article_df.iter_rows(named=True):
    print(row["truncated_article"]) # Here, we extract the truncated article by its column name as shown above


## 3) Large-Scale Data Analysis

Now apply your prompt to every article in `article_df`, collecting one model output per row.



In [None]:
outputs = []

for row in tqdm(
    article_df.iter_rows(named=True),
    total=article_df.height,          # total rows for an accurate bar
    desc="Analyzing articles"
):
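    # Your API request goes here (same pattern as section 2.1); fill in your own system prompt.
    # For speed, use gpt-5-nano rather than gpt-5-mini.
    resp = client.responses.create(
        model="gpt-5-nano",
        instructions="[Your system prompt here]",
        input="Analyze this article:\n\n" + row["truncated_article"],
    )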
    outputs.append(resp.output_text)   # <- collect one result per row

# Add results as a new column
article_df = article_df.with_columns(pl.Series("output", outputs))

print(article_df.head())
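If one API call fails partway through (a timeout or rate limit, say), the loop above stops and none of the collected results are attached to `article_df`. A minimal sketch of a more forgiving loop, assuming a hypothetical `analyze_fn` that takes one article string and returns the model's text: failed rows get `None`, so the output list always has one entry per row, which `pl.Series` needs in order to line up with the DataFrame.

```python
def analyze_all(articles, analyze_fn):
    """Call analyze_fn on each article; store None (and report) when a call fails."""
    outputs = []
    for i, text in enumerate(articles):
        try:
            outputs.append(analyze_fn(text))
        except Exception as err:  # e.g. network errors or rate limits
            print(f"Row {i} failed: {err}")
            outputs.append(None)
    return outputs
```

In the notebook, `articles` could be `article_df["truncated_article"].to_list()` and `analyze_fn` a small function wrapping the `client.responses.create(...)` call; the result then goes into `pl.Series("output", ...)` as before.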

## 3.1) If time, try to vibe code some analysis of this data.

Ask Gemini to generate a plot of your data.
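If you asked the model for a short label per article (for example one of "left", "center", "right"), a tally of those labels is often the first thing worth plotting. A sketch using only the standard library; the sample labels are made up:

```python
from collections import Counter

# Made-up example outputs; in the notebook these would come from the "output" column
labels = ["left", "center", "Left", " right", "center", None]

# Normalize case and whitespace, and skip failed (None) rows before counting
counts = Counter(label.strip().lower() for label in labels if label)

for label, n in counts.most_common():
    print(label, n)
```

From here, `counts` can be turned into a bar chart, which is a reasonable first request to give Gemini.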