# JSON Mode

When building AI applications, especially agentic systems, you often need the model to generate more than just free-form text responses. Much more capable systems are possible when you can generate data in a format that your application can parse, process, and act upon. This is where JSON Mode comes in - a simple yet powerful technique for getting structured outputs from language models.

In this lesson, you'll learn how to enable JSON Mode in the OpenAI API, how to prompt models effectively to produce valid JSON, and how to process JSON lists and dictionaries to build more complex workflows. By the end, you'll have practical experience with a fundamental technique used in many advanced AI applications.

## Setup

First, let's set up our environment with the necessary imports.

In [None]:
from openai import OpenAI

client = OpenAI()

Before we dive into AI-generated JSON, let's do a super quick refresher on parsing JSON strings in Python.

The `json_string` below is a string of valid JSON that needs to be converted into a Python dictionary before we can access the data inside. For this purpose, we import the `json` module and call its `loads()` function, passing in the string. Execute the cell below to see it in action.

In [None]:
import json

json_string = """{"data": [1,2,3]}"""

parsed_json = json.loads(json_string)
print(parsed_json['data'])
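
As a small aside, `json.loads()` raises `json.JSONDecodeError` when the string is not valid JSON. The sketch below wraps the call in a hypothetical helper, `try_parse_json` (an illustrative name, not part of this lesson's code), that returns `None` instead of raising.

```python
import json

def try_parse_json(text):
    """Return the parsed value, or None if `text` is not valid JSON.

    Hypothetical helper for illustration; not part of the lesson's code.
    """
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        return None

valid = try_parse_json('{"data": [1, 2, 3]}')
invalid = try_parse_json("{'data': [1, 2, 3]}")  # single quotes are not valid JSON

print(valid)    # a Python dict with a list under "data"
print(invalid)  # None
```

Returning `None` is only one possible design; in a larger workflow you might prefer to let the exception propagate so the failure is not silently ignored.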

It sure would be nice if we could generate JSON with an LLM that we could parse using `.loads()` into valid Python, right?

While models are getting better and better at following instructions, they do still mess this up if we don't use all the tools at our disposal. For instance, execute the following cell and see what the model outputs when we request the answer in JSON.

In [None]:
def get_completion(prompt, model="gpt-4o"):
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        seed=42,
    )
    return response.choices[0].message.content

get_completion('Who won the world series in 2020? Please respond in JSON, with "winner" as the key and the winner as the value.')

See the problem? LLMs often default to adding markdown to outputs. As a result, our JSON is forced inside a markdown code block, which would cause an error if we tried to load it with `json.loads()`.
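
To make the failure concrete without calling the API, here is a minimal sketch using a canned string shaped like a markdown-wrapped reply (it is not real model output). The helper `strip_code_fence` is a hypothetical workaround shown for illustration only; the techniques that follow aim to avoid the fence in the first place.

```python
import json

FENCE = "`" * 3  # three backticks, built this way to keep the example readable

# A canned string shaped like a markdown-wrapped reply (not real model output).
wrapped = f'{FENCE}json\n{{"winner": "example team"}}\n{FENCE}'

def strip_code_fence(text):
    """Remove one leading and one trailing markdown fence line, if present."""
    lines = text.strip().splitlines()
    if lines and lines[0].startswith(FENCE):
        lines = lines[1:]
    if lines and lines[-1].strip() == FENCE:
        lines = lines[:-1]
    return "\n".join(lines)

# Parsing the wrapped string directly fails because it starts with backticks.
try:
    json.loads(wrapped)
    parsed_directly = True
except json.JSONDecodeError:
    parsed_directly = False

cleaned = json.loads(strip_code_fence(wrapped))
print(parsed_directly)    # False
print(cleaned["winner"])  # example team
```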

We'll now learn a few techniques to help guarantee the model outputs valid JSON when we need it to. The two tools we'll use to accomplish this are:

- adding instructions to the system prompt ensuring the output is JSON
- enabling OpenAI's JSON Mode in the Chat Completions API

While JSON Mode as used here is a feature of the OpenAI Chat Completions API, the system prompt technique we'll describe also helps with other frontier models.

If a system prompt is not available in your LLM of choice, you can also add these instructions to the main prompt body.

Below is the same code as the call we executed above, with a few changes:

- We've added a system prompt field instructing the model to output valid JSON without any markdown backticks
- We've added an argument for `response_format` that sets the type of output to `json_object`.

One important thing to note when setting the response format to `json_object` is that the messages must contain the word "JSON" somewhere, or the API call will return an error. This is an extra safeguard from OpenAI to ensure we get what we're asking for. Since we're including "JSON" in both our system prompt and user prompt, we'll be fine.
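
If you want to catch this before sending a request, one option is a small pre-flight check. The sketch below uses a hypothetical helper, `mentions_json`, and assumes a case-insensitive match is sufficient; that detail is an assumption, not something confirmed by this lesson.

```python
def mentions_json(messages):
    """Return True if any message's content contains "json" (case-insensitive).

    Hypothetical pre-flight check; the case-insensitive match is an assumption.
    """
    return any("json" in str(m.get("content", "")).lower() for m in messages)

ok = [
    {"role": "system", "content": "You are a helpful assistant designed to output JSON."},
    {"role": "user", "content": "Who won the world series in 2020?"},
]
missing = [{"role": "user", "content": "Who won the world series in 2020?"}]

print(mentions_json(ok))       # True
print(mentions_json(missing))  # False
```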

Execute the code below and see if we get valid JSON in answer to our question.

In [None]:
def get_demo_json_completion(prompt):
    system_prompt = "You are a helpful assistant designed to output JSON. Do not include any markdown backticks in your response."
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "system", "content": system_prompt}, {"role": "user", "content": prompt}],
        seed=42,
        response_format={"type": "json_object"}
    )
    return response.choices[0].message.content

answer = get_demo_json_completion('Who won the world series in 2020? Please respond in JSON, with "winner" as the key and the winner as the value.')
json_answer = json.loads(answer)
print(json_answer['winner'])

Success! Now you try. Below is a prompt that will not output valid JSON if sent to the model as-is. Make the same changes that we made to the prompt above so that it successfully parses and displays the data in the JSON.

## Checkpoint 1/3

Add a system prompt instructing the model to output valid JSON without markdown backticks and add `"json_object"` as the type of `response_format` when the API is called. Then parse the JSON using `json.loads()`, assigning the result to the variable `json_output`.

In [None]:
def get_json_completion(prompt):
    system_prompt = ## YOUR SOLUTION HERE ##
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "system", "content": system_prompt}, {"role": "user", "content": prompt}],
        seed=42,
        ## YOUR SOLUTION HERE ##
    )
    return response.choices[0].message.content

answer = get_json_completion("""Output a JSON object with four keys: "name", "age", "city", and "book".
Make their values the name, age, and city of a character from a book, and the book title.""")
json_output = ## YOUR SOLUTION HERE ##
print('NAME: ', json_output['name'])
print('AGE: ', json_output['age'])
print('CITY: ', json_output['city'])
print('BOOK: ', json_output['book'])

Now that we've got the basics down, let's explore some ways to use this tool to enable interesting structures for agentic systems.

The first such structure we'll explore involves prompting over the items in a list. Imagine you want the model to generate a list of ideas for possible blog posts, with a draft of a 'hook' paragraph for each. Attempting to do it all in one prompt may degrade quality, as the model is forced to attend to many different ideas at once. Instead, you can iterate across a list of ideas and prompt the model for only one draft at a time. Let's try it out.
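
The one-item-at-a-time pattern can be sketched offline as follows. Everything here is illustrative: `fake_list_model` is a stand-in that returns a canned JSON string instead of calling the API, and `extract_list` is a hypothetical helper that checks the parsed shape before you loop over it.

```python
import json

def fake_list_model(topic):
    """Stand-in for a JSON Mode call; returns a canned JSON string (no API request)."""
    return json.dumps({"data": [f"{topic} idea {n}" for n in (1, 2, 3)]})

def extract_list(raw, key="data"):
    """Parse `raw` and return the list stored under `key`, checking its shape."""
    parsed = json.loads(raw)
    items = parsed.get(key) if isinstance(parsed, dict) else None
    if not isinstance(items, list) or not all(isinstance(i, str) for i in items):
        raise ValueError(f"expected a list of strings under {key!r}")
    return items

items = extract_list(fake_list_model("gardening"))

# One focused follow-up prompt per item, instead of one large prompt for all of them.
follow_up_prompts = [f"Write a hook paragraph for: {item}" for item in items]
for p in follow_up_prompts:
    print(p)
```

JSON Mode is meant to return valid JSON, but it does not by itself promise a particular set of keys, so a shape check like this can make a downstream loop fail early with a clear message.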

## Checkpoint 2/3

Your task will be to complete the function `generate_blog_ideas` and correctly parse and prepare its result. The function will be used to generate valid JSON in the form of `{"data": ["list", "of", "blog ideas"]}`.

When the list of ideas is successfully extracted and assigned to `ideas_list`, we iterate through each item in that list, prompting the model for a 'hook' paragraph to start off a blog post about that topic. Note that you are expected to pass your own topic to the `generate_blog_ideas` function call. Make it a short string of one or two words.

Be sure to instruct the model to output JSON in the system prompt and set the `response_format` parameter to `{"type": "json_object"}`. Complete the messages list such that the system prompt and prompt are both passed into their appropriate positions.

Then, pass your own blog topic to the `generate_blog_ideas` function, parse the JSON using the method we used earlier, and extract the value of the `"data"` key, assigning it to `ideas_list`. Finally, execute the cell and watch the model write the hooks for three different blog post ideas.

In [None]:
def generate_blog_ideas(topic):
    system_prompt = ## YOUR SOLUTION HERE ##
    prompt = f"""
    <instructions>Generate a list of 3 blog post ideas related to the topic of '{topic}'.</instructions>
    <format>Each idea should be a string in the JSON field 'data'.</format>
    """
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "system", "content": }, ## YOUR SOLUTION HERE ##
                  {"role": "user", "content": }], ## YOUR SOLUTION HERE ##
        seed=42,
        response_format={"type": "json_object"})
    
    return response.choices[0].message.content

ideas = generate_blog_ideas() ## YOUR SOLUTION HERE ##
json_ideas = ## YOUR SOLUTION HERE ##
ideas_list = ## YOUR SOLUTION HERE ##

for idea in ideas_list:
    # Build the full prompt for this idea, then request a single draft for it
    draft_prompt = f"""
    <instructions>Generate the opening "hook" paragraph for a blog post about the following idea:</instructions>
    <style>
    The hook should be a single paragraph that is no more than 100 words. It should be written in a way that is engaging and interesting to the reader.
    Write at an eighth grade reading level.
    </style>
    <idea>{idea}</idea>
    """
    draft = get_completion(draft_prompt)
    print(draft)
    print('---')

Lists aren't the only way to interact with JSON data in agentically interesting ways. In our final checkpoint, let's see what dictionaries can accomplish.

One reliable use case of LLMs is in condensing and adding structure to large texts. Let's use JSON Mode to extract structured data from an email text. We'll then format the resulting dictionary to prepare it for upload into a database.

## Checkpoint 3/3

This exercise will be a little more open-ended. Given an email text, write a prompt that will extract data from the email and assign it to the variable `email_data`. The data must be in the following shape:

In [None]:
email_data = {
    "customer_info": {
        "name": "name from email here",
        "customer_id": "id from email here",
        "contact": {
            "email": "email address from email here",
            "phone": "phone number from email here"
        }
    },
    "email_summary": "email summary here",
    "order_details": {
        "order_number": "order number from email here",
        "issues": ["list", "of", "everything wrong with order here"]
    }
}

We'll provide the `email_text` below and you'll write the code to extract this data via the LLM. Be sure to assign the resulting Python dictionary to `email_data` at the bottom of the cell.
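
Once you have a parsed dictionary, it can help to confirm that it matches the required shape before using it. The sketch below is one possible check, not part of the exercise: `EXPECTED_SHAPE` and `shape_errors` are hypothetical names, and treating every leaf value as a string (with `issues` as a list) is an assumption based on the template above.

```python
# Hypothetical description of the required shape; leaf types are an assumption.
EXPECTED_SHAPE = {
    "customer_info": {
        "name": str,
        "customer_id": str,
        "contact": {"email": str, "phone": str},
    },
    "email_summary": str,
    "order_details": {"order_number": str, "issues": list},
}

def shape_errors(data, shape, path=""):
    """Return a list of readable problems; an empty list means the shape matches."""
    errors = []
    for key, expected in shape.items():
        where = f"{path}.{key}" if path else key
        if not isinstance(data, dict) or key not in data:
            errors.append(f"missing key: {where}")
        elif isinstance(expected, dict):
            errors.extend(shape_errors(data[key], expected, where))
        elif not isinstance(data[key], expected):
            errors.append(f"wrong type at {where}: expected {expected.__name__}")
    return errors

# Placeholder values in the style of the template; "phone" is left out on purpose.
sample = {
    "customer_info": {
        "name": "name from email here",
        "customer_id": "id from email here",
        "contact": {"email": "email address from email here"},
    },
    "email_summary": "email summary here",
    "order_details": {"order_number": "order number from email here", "issues": []},
}

print(shape_errors(sample, EXPECTED_SHAPE))
```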

One technique that will immensely help your outputs here is demonstrating to the model the shape of the JSON object you expect to be returned. Remember that curly braces can be included in f-strings by doubling them up. For example:

In [None]:
json_f_string = f"""
Make your output look like this:
{{
    "data": "big json object"
}}
"""

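As a quick check of the brace-doubling rule, the short sketch below mixes doubled braces (emitted literally) with a single-brace interpolation, then confirms the result parses as JSON. The variable names are illustrative only.

```python
import json

field = "data"

# Doubled braces are emitted as literal braces; single braces interpolate.
template = f'{{"{field}": ["list", "of", "values"]}}'

print(template)
print(json.loads(template)[field])
```
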
In [None]:
email_text = """
Subject: Urgent Issue with Recent Order #45789 and Website Navigation Problems

Dear Customer Support Team,

I hope this email finds you well. My name is Sarah Johnson, and I'm writing to express my frustration with several issues I've encountered with my recent order and your website.

First, I placed order #45789 on March 15th for the Premium XL Blender (Model BX2000) in blue, but I received the standard model in red instead.

Second, I've been trying to access my account on your website for the past three days, but I keep getting an error message saying "Server timeout" whenever I try to log in.

I also noticed that your company website has a slight typo on the homepage - it says "Our prodcuts are the best" instead of "Our products are the best." The navigation menu seems broken too.

On a slightly more positive note, I did appreciate the quick email response from your automated system acknowledging my order. The tracking information was helpful while it worked.

I've been a loyal customer for over five years and have recommended your products to many friends and family members. However, this experience has been disappointing.

Please address these issues as soon as possible. I expect either the correct product to be shipped within the next week or a full refund including shipping costs.

If I don't hear back within 48 hours, I will unfortunately need to contact my credit card company to dispute the charges and file a complaint with the Better Business Bureau.

Thank you for your prompt attention to this matter.

Sincerely,
Sarah Johnson
Customer ID: J29875
Phone: (555) 123-4567
Email: sarah.johnson@email.com
"""

In [None]:
def get_json_completion(prompt):
    system_prompt = "You are a helpful assistant that outputs JSON. Do not include any markdown backticks in your response."
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "system", "content": system_prompt}, {"role": "user", "content": prompt}],
        seed=42,
        response_format={"type": "json_object"})
    
    return response.choices[0].message.content

# Create a prompt that asks the LLM to extract structured data from the email
prompt = f"""
## YOUR SOLUTION HERE ##
"""

# Get the JSON response from the LLM
json_response = get_json_completion(prompt)

# Parse the JSON string into a Python dictionary
email_data = json.loads(json_response)
print(email_data)
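
The introduction to this checkpoint mentions formatting the dictionary for upload into a database. A minimal sketch of that step is below, assuming a single flat row is wanted; `flatten_for_db` and its column names are hypothetical, and the sample uses the placeholder values from the template rather than real model output.

```python
import json

def flatten_for_db(email_data):
    """Flatten the nested dictionary into one flat row (column names are hypothetical)."""
    customer = email_data["customer_info"]
    order = email_data["order_details"]
    return {
        "customer_name": customer["name"],
        "customer_id": customer["customer_id"],
        "customer_email": customer["contact"]["email"],
        "customer_phone": customer["contact"]["phone"],
        "email_summary": email_data["email_summary"],
        "order_number": order["order_number"],
        # Store the list as a JSON string so it fits in a single text column.
        "issues": json.dumps(order["issues"]),
    }

# Placeholder data in the required shape (not real model output).
sample_email_data = {
    "customer_info": {
        "name": "name from email here",
        "customer_id": "id from email here",
        "contact": {
            "email": "email address from email here",
            "phone": "phone number from email here",
        },
    },
    "email_summary": "email summary here",
    "order_details": {
        "order_number": "order number from email here",
        "issues": ["list", "of", "everything wrong with order here"],
    },
}

row = flatten_for_db(sample_email_data)
for column, value in row.items():
    print(f"{column}: {value}")
```

Serializing `issues` with `json.dumps()` is one design choice among several; a separate table with one row per issue would be another reasonable layout.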