# Part 2: Structured Output Generation

Second part of the "AI Starter Pack JavaScript" NICAR25 class - let's try out structured output generation!

Structured output generation means giving an LLM a JSON schema to follow when it generates output. Instead of replying in English or with a "Sure thing!", it returns parseable JSON that matches the schema you care about.

Structured output generation is great for:

- Natural language processing
- Extracting structured fields out of super messy data
- Fake data generation

I wouldn't trust structured outputs for *real* facts, though, like "Return the FIPS code for Kern County" or "Return a list of current Lakers players and their season scores". The schema guarantees the shape of the answer, not its accuracy.

## Loading environment variables for OpenAI

Copy the `.env.sample` file to a new `.env` file, then paste in the OpenAI key that I will share during the class.

In [None]:
import "jsr:@std/dotenv/load";
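
A quick sanity check after this import can save a confusing authentication error later. The sketch below assumes the key is stored as `OPENAI_API_KEY`, which is the variable the `@ai-sdk/openai` provider reads by default (the name in `.env.sample` is an assumption here). `missingKeys` is a hypothetical helper, not part of any library.

```typescript
// Hypothetical helper: list the required variables that are unset or blank.
function missingKeys(
  env: Record<string, string | undefined>,
  required: string[],
): string[] {
  return required.filter((key) => {
    const value = env[key];
    return value === undefined || value.trim() === "";
  });
}

// In the notebook (Deno), pass the real environment instead:
//   missingKeys(Deno.env.toObject(), ["OPENAI_API_KEY"]);
// An empty array means the key was loaded from `.env`.
const exampleEnv = { OPENAI_API_KEY: "sk-placeholder" }; // placeholder, not a real key
console.log(missingKeys(exampleEnv, ["OPENAI_API_KEY"]));
```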

## Sample: Hitting the OpenAI API with the Vercel AI SDK

In [None]:
import { generateText } from "npm:ai";
import { openai } from "npm:@ai-sdk/openai";

const {text} = await generateText({
  model: openai("gpt-4o-mini"),
  prompt: "Short haiku about a lonely mountain",
});
console.log(text);

In [None]:
// How do we get JSON back? Can we just ask for it?
const {text} = await generateText({
  model: openai("gpt-4o-mini"),
  prompt: "JSON of a person with name and age",
});
console.log(text);

In [None]:
JSON.parse(text); // :( usually throws, since the reply tends to arrive wrapped in a Markdown code fence or extra prose
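
Why the parse tends to fail, and why patching it by hand is fragile, can be sketched without calling the API. The reply string below is made up in the shape chat models often produce (it is not real model output), and `tryParse` and `extractJson` are hypothetical helpers.

```typescript
// Made-up reply in the shape chat models often produce; not real model output.
const fence = "`".repeat(3);
const sampleReply =
  `Sure thing! Here is the JSON:\n${fence}json\n{"name": "Ada", "age": 36}\n${fence}`;

// Returns the parsed value, or null when the text is not valid JSON.
function tryParse(text: string): unknown {
  try {
    return JSON.parse(text);
  } catch {
    return null;
  }
}

// Hypothetical workaround: grab the text between the first "{" and the last "}".
// Brittle: it breaks on arrays, on multiple objects, and on braces in the prose.
function extractJson(reply: string): unknown {
  const match = reply.match(/\{[\s\S]*\}/);
  return match ? tryParse(match[0]) : null;
}

console.log(tryParse(sampleReply)); // null, because the raw reply is not JSON
console.log(extractJson(sampleReply)); // the parsed object with name and age
```

Even when the regex workaround succeeds, nothing guarantees which keys come back or what types they hold.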

## Solution: `generateObject()` with `zod`

In [None]:
import { generateObject } from "npm:ai";
import { z } from "npm:zod";

const PersonSchema = z.object({
  name: z.string(),
  age: z.number(),
});

const {object} = await generateObject({
  model: openai("gpt-4o-mini"),
  prompt: "JSON of a person with name and age",
  schema: PersonSchema
});

console.log(object);
console.log(object.name);
console.log(object.age);
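
As I understand it, the SDK sends the schema to the model as JSON Schema and validates the reply against the zod schema before handing back `object`. The sketch below shows roughly what `PersonSchema` corresponds to, plus a hand-written check standing in for zod's validation. `personJsonSchema` and `isPerson` are illustrative names, and the exact JSON Schema the SDK generates may differ.

```typescript
// Approximate JSON Schema equivalent of PersonSchema (written by hand, illustrative).
const personJsonSchema = {
  type: "object",
  properties: {
    name: { type: "string" },
    age: { type: "number" },
  },
  required: ["name", "age"],
  additionalProperties: false,
} as const;

type Person = { name: string; age: number };

// Hand-rolled stand-in for the check that zod performs on the model's reply.
function isPerson(value: unknown): value is Person {
  if (typeof value !== "object" || value === null) return false;
  const record = value as Record<string, unknown>;
  return typeof record.name === "string" && typeof record.age === "number";
}

console.log(personJsonSchema.required.join(", ")); // name, age
console.log(isPerson({ name: "Ada", age: 36 })); // true
console.log(isPerson({ name: "Ada", age: "36" })); // false: age is a string
```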

## Now let's cook with campaign emails

In [None]:
import {Database} from "jsr:@db/sqlite";
const db = new Database("dwillis-emails.db");

for (const { rowid, body } of db.sql`select rowid, body from emails_raw limit 10`) {
  console.log(rowid, body.substring(0, 100));
}

In [None]:
const PROMPT = `
    Parse the following political email and return a JSON object with the following schema:
    
    "committee": Name of the committee in the disclaimer that begins with 'Paid for by',
    not including 'Paid for by', the committee address, or the treasurer name.
    Should be null if not present.
    
    "sender": Name of the person, if any, mentioned as the author of the email.
    Should be null if not present.
    
    Do not include any other text, no yapping.
`;

const EmailSchema = z.object({
  committee: z.string().nullable(),
  sender: z.string().nullable(),
});

// Try just 3 emails first: every row is a separate API call.
for (const { rowid, body } of db.sql`select rowid, body from emails_raw limit 3`) {
  const { object } = await generateObject({
    model: openai("gpt-4o-mini"),
    prompt: PROMPT + body,
    schema: EmailSchema,
  });
  console.log(object);
}
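
Across thousands of emails, some calls will fail (rate limits, timeouts, or a reply that doesn't validate), and a single thrown error would end a loop like the one above. Here is a minimal sketch of collecting successes and failures separately. `extractAll` is a hypothetical helper, and `extract` stands in for a function wrapping the `generateObject()` call; it is faked here so the block runs without an API key.

```typescript
type Row = { rowid: number; body: string };
type Extracted = { committee: string | null; sender: string | null };

// Hypothetical helper: run `extract` on each row, recording failures instead of stopping.
async function extractAll(
  rows: Row[],
  extract: (body: string) => Promise<Extracted>,
) {
  const results: (Extracted & { rowid: number })[] = [];
  const failures: { rowid: number; error: string }[] = [];
  for (const { rowid, body } of rows) {
    try {
      results.push({ rowid, ...(await extract(body)) });
    } catch (err) {
      failures.push({ rowid, error: err instanceof Error ? err.message : String(err) });
    }
  }
  return { results, failures };
}

// Fake extractor for demonstration only; in the notebook it would call generateObject().
const fakeExtract = async (body: string): Promise<Extracted> => {
  if (body.length === 0) throw new Error("empty body");
  return { committee: "Example Committee", sender: null };
};

const demoRows: Row[] = [
  { rowid: 1, body: "Paid for by Example Committee" },
  { rowid: 2, body: "" },
];
extractAll(demoRows, fakeExtract).then(({ results, failures }) => {
  console.log(results.length, failures.length); // 1 1
});
```

Keeping the `rowid` on both lists makes it possible to retry only the failed rows later.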
