# A Gentle Introduction to Structured Generation with Anthropic API

This notebook contains the code for the examples in the blog post "A Gentle Introduction to Structured Generation with Anthropic API", the first post in the series "Building Reproducible LLM Applications".

### Motivating Examples for Structured Generation

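Extracting information from unstructured text (e.g., with regular expressions) is cumbersome and error-prone, because every pattern has to anticipate the exact wording of the model's reply. Structured generation avoids this pitfall by asking the model to answer in a machine-readable format, so its output can be parsed without manual intervention.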

In [None]:
import re


def process_unstructured_response(response: str) -> dict:
    """Extract order details with regexes; only works for the specific wording below."""
    # Extract order number
    order_number_match = re.search(r'order #(\d+)', response)
    order_number = order_number_match.group(1) if order_number_match else None

    # Extract status
    status_match = re.search(r"it's currently ([\w\s]+)", response)
    status = status_match.group(1) if status_match else None

    # Extract shipping information
    shipped_match = re.search(r'It was shipped (\w+)', response)
    shipped_date = shipped_match.group(1) if shipped_match else None

    # Extract estimated delivery time: the range plus every word up to the period ("business days")
    delivery_match = re.search(r'expected to arrive within (\d+-\d+) ([\w ]+?)\.', response)
    if delivery_match:
        delivery_time = delivery_match.group(1)
        delivery_unit = delivery_match.group(2)
    else:
        delivery_time = None
        delivery_unit = None

    return {
        "order_number": order_number,
        "status": status,
        "shipped_date": shipped_date,
        "estimated_delivery": f"{delivery_time} {delivery_unit}" if delivery_time and delivery_unit else None
    }

# Test the function
response = "I've checked your order #12345, and it's currently in transit. It was shipped yesterday and is expected to arrive within 3-5 business days. Is there anything else I can help you with?"
result = process_unstructured_response(response)
print(result)

{'order_number': '12345', 'status': 'in transit', 'shipped_date': 'yesterday', 'estimated_delivery': '3-5 business days'}
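The extractor only works because the test sentence matches its patterns word for word. A minimal sketch of the failure mode, using a hypothetical rephrasing of the same reply: the status pattern from `process_unstructured_response` finds nothing once the wording changes, even though the information is still there.

```python
import re

# Same status pattern as in process_unstructured_response: it depends on exact wording
STATUS_PATTERN = r"it's currently ([\w\s]+)"

original = "I've checked your order #12345, and it's currently in transit."
# Hypothetical rephrasing that carries the same information
rephrased = "Order #12345 is in transit and should arrive soon."

statuses = []
for text in (original, rephrased):
    match = re.search(STATUS_PATTERN, text)
    statuses.append(match.group(1) if match else None)

print(statuses)  # ['in transit', None]
```

Every new phrasing would need another pattern, which is exactly the maintenance burden structured generation is meant to remove.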


### Set Up Environment and Test Anthropic API


Load the API key from a `.env` file (with `python-dotenv`) and use it to initialize the client. The `.env` file should define `ANTHROPIC_API_KEY`.

In [2]:
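import os

import anthropic
from dotenv import load_dotenv

# Read variables from .env into the process environment
load_dotenv()
# os.environ.get returns None instead of raising KeyError, so the check below can fire
key = os.environ.get("ANTHROPIC_API_KEY")
if key is None:
    raise ValueError("Error: ANTHROPIC_API_KEY not found")
client = anthropic.Anthropic(api_key=key)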

In [7]:
# Check that the API key works with a simple query
response = client.messages.create(
    model="claude-3-5-sonnet-20240620",
    messages=[
        {"role": "user", "content": "What is a JSON schema in a sentence?"}
    ],
    max_tokens=200,
)
print(response)


Message(id='msg_01JP2jU4EmvgYXbpcGSrdxu6', content=[TextBlock(text='A JSON schema is a declarative format for describing the structure, content, and validation rules of JSON data.', type='text')], model='claude-3-5-sonnet-20240620', role='assistant', stop_reason='end_turn', stop_sequence=None, type='message', usage=Usage(input_tokens=16, output_tokens=25))


In [8]:
# Access the model's text response
text_response = response.content[0].text
print(text_response)

A JSON schema is a declarative format for describing the structure, content, and validation rules of JSON data.
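`response.content` is a list of content blocks, and indexing `[0]` assumes the first block is text. A slightly more defensive sketch joins the text of every block whose `type` is `"text"`. `extract_text` is a hypothetical helper, and the `SimpleNamespace` objects only stand in for the SDK's `TextBlock` objects (assuming only their `type` and `text` attributes matter) so the example runs without an API call.

```python
from types import SimpleNamespace


def extract_text(content) -> str:
    """Concatenate the text of all blocks with type == "text" (hypothetical helper)."""
    return "".join(block.text for block in content if getattr(block, "type", None) == "text")


# Stand-ins mimicking the TextBlock objects printed above
fake_content = [
    SimpleNamespace(type="text", text="A JSON schema is a declarative format "),
    SimpleNamespace(type="text", text="for describing JSON data."),
]
print(extract_text(fake_content))
```

With a real reply, `extract_text(response.content)` would replace `response.content[0].text`.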


### Using System Prompts to Guide Output Formats

First, we define the same example answer in three formats (a Python dictionary, a JSON string, and a YAML string) that we want Claude to imitate.

In [9]:
example_dictionary = {
    "topic": "zip format",
    "citations": [{"citation_number": 1, "source": "https://example.com"}],
    "answer": "The .zip format is a compressed file format that groups multiple files into a single archive, with the files inside the archive appearing as if they were not compressed."
}

example_json_string = '{"topic": "zip format", "citations": [{"citation_number": 1, "source": "https://example.com"}], "answer": "The .zip format is a compressed file format that groups multiple files into a single archive, with the files inside the archive appearing as if they were not compressed."}'

example_yaml_string = """topic: zip format
citations:
  - citation_number: 1
    source: https://example.com
answer: The .zip format is a compressed file format that groups multiple files into a single archive, with the files inside the archive appearing as if they were not compressed.
"""
print(example_dictionary)
print(example_json_string)
print(example_yaml_string)


{'topic': 'zip format', 'citations': [{'citation_number': 1, 'source': 'https://example.com'}], 'answer': 'The .zip format is a compressed file format that groups multiple files into a single archive, with the files inside the archive appearing as if they were not compressed.'}
{"topic": "zip format", "citations": [{"citation_number": 1, "source": "https://example.com"}], "answer": "The .zip format is a compressed file format that groups multiple files into a single archive, with the files inside the archive appearing as if they were not compressed."}
topic: zip format
citations:
  - citation_number: 1
    source: https://example.com
answer: The .zip format is a compressed file format that groups multiple files into a single archive, with the files inside the archive appearing as if they were not compressed.
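The examples above are typed out by hand, so they can drift apart if one is edited. One option, sketched below, is to keep the dictionary as the single source of truth and derive the JSON string from it with `json.dumps`; with its default separators, this reproduces the hand-written `example_json_string` exactly.

```python
import json

example_dictionary = {
    "topic": "zip format",
    "citations": [{"citation_number": 1, "source": "https://example.com"}],
    "answer": "The .zip format is a compressed file format that groups multiple files into a single archive, with the files inside the archive appearing as if they were not compressed.",
}

# Derive the JSON example from the dictionary instead of typing it by hand
derived_json_string = json.dumps(example_dictionary)

# Round-tripping confirms both representations carry the same data
assert json.loads(derived_json_string) == example_dictionary
print(derived_json_string)
```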



Next, we include each example in the system prompt and ask Claude to answer in the same format.

In [12]:
# Examples are included in the system prompt, so Claude knows the format we want
response_list = []
for example in [example_dictionary, example_json_string, example_yaml_string]:
    response = client.messages.create(
        model="claude-3-5-sonnet-20240620",
        system=f"You are a helpful assistant that responds in the same format as the following example: {example}",
        messages=[
            {"role": "user", "content": "What is a JSON schema in a sentence?"}
        ],
        max_tokens=200,
    )
    response_list.append(response.content[0].text)
    print(f"Claude response: {response.content[0].text}")

Claude response: {
    "topic": "JSON schema",
    "citations": [
        {
            "citation_number": 1,
            "source": "https://json-schema.org/understanding-json-schema/"
        }
    ],
    "answer": "A JSON schema is a declarative language that allows you to annotate and validate JSON documents, defining the structure, constraints, and documentation of JSON data."
}
Claude response: {
  "topic": "JSON schema",
  "citations": [
    {
      "citation_number": 1,
      "source": "https://json-schema.org/understanding-json-schema/"
    }
  ],
  "answer": "A JSON schema is a declarative language that allows you to annotate and validate JSON documents, defining the structure, constraints, and documentation of JSON data."
}
Claude response: topic: JSON schema

citations:
  - citation_number: 1
    source: https://json-schema.org/understanding-json-schema/

answer: A JSON schema is a declarative language that allows you to annotate and validate JSON documents, defining the structure, constraints, and documentation of JSON data.
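Since this series is about reproducibility, it may be worth pinning more of the request than the loop above does. A minimal sketch that builds the keyword arguments for `client.messages.create` in one place; `build_format_request` is a hypothetical helper, and `temperature=0.0` is an added assumption, not part of the original calls. A low temperature makes sampling less random, but Anthropic's API reference notes that results are still not fully deterministic, so the parsing step below remains necessary.

```python
def build_format_request(example, question: str,
                         model: str = "claude-3-5-sonnet-20240620",
                         max_tokens: int = 200,
                         temperature: float = 0.0) -> dict:
    """Return kwargs for client.messages.create (hypothetical helper)."""
    return {
        "model": model,
        "system": ("You are a helpful assistant that responds in the same format "
                   f"as the following example: {example}"),
        "messages": [{"role": "user", "content": question}],
        "max_tokens": max_tokens,
        # Assumption: a low temperature reduces, but does not remove, run-to-run variation
        "temperature": temperature,
    }


request = build_format_request('{"topic": "..."}', "What is a JSON schema in a sentence?")
print(request["system"])
# With a configured client: response = client.messages.create(**request)
```

Keeping every request parameter in one dictionary also makes it easy to log exactly what was sent alongside each response.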

Because each response follows a known format, we can parse it with the matching parser (`ast.literal_eval` for the dictionary, `json.loads` for JSON, `yaml.safe_load` for YAML) and access its fields directly.

In [13]:
import json
import yaml
import ast

def parse_response(response, format_type):
    """Parse a model response as a Python literal, JSON, or YAML; return None on failure."""
    try:
        if format_type == 'dict':
            # ast.literal_eval only evaluates literals, so it is safer than eval, but still use caution
            return ast.literal_eval(response)
        elif format_type == 'json':
            return json.loads(response)
        elif format_type == 'yaml':
            return yaml.safe_load(response)
        else:
            raise ValueError(f"Unsupported format type: {format_type}")
    except Exception as e:
        print(f"Error parsing {format_type} response: {e}")
        return None

# Parse and print each response
for text, format_type in zip(response_list, ['dict', 'json', 'yaml']):
    parsed = parse_response(text, format_type)
    # A reply that parses to a plain string or list has no .get(), so require a dict
    if isinstance(parsed, dict):
        print(f"\nParsed {format_type.upper()} response:")
        print(f"Topic: {parsed.get('topic')}")
        citations = parsed.get('citations')
        print(f"Citation: {citations[0] if citations else 'No citation'}")
        print(f"Answer: {parsed.get('answer')}")
    else:
        print(f"\nFailed to parse {format_type.upper()} response")


Parsed DICT response:
Topic: JSON schema
Citation: {'citation_number': 1, 'source': 'https://json-schema.org/understanding-json-schema/'}
Answer: A JSON schema is a declarative language that allows you to annotate and validate JSON documents, defining the structure, constraints, and documentation of JSON data.

Parsed JSON response:
Topic: JSON schema
Citation: {'citation_number': 1, 'source': 'https://json-schema.org/understanding-json-schema/'}
Answer: A JSON schema is a declarative language that allows you to annotate and validate JSON documents, defining the structure, constraints, and documentation of JSON data.

Parsed YAML response:
Topic: JSON schema
Citation: {'citation_number': 1, 'source': 'https://json-schema.org/understanding-json-schema/'}
Answer: A JSON schema is a declarative language that allows you to annotate and validate JSON documents, defining the structure, constraints, and documentation of JSON data.
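Parsing only checks syntax: a response can be valid JSON or YAML and still be missing a field. A minimal hand-rolled check of the structure used in this notebook (`topic`, `citations`, `answer`) could look like the sketch below. `check_structure` is a hypothetical helper; a JSON Schema validator would be the more general tool.

```python
def check_structure(parsed) -> list:
    """Return a list of problems; an empty list means the expected fields are present."""
    if not isinstance(parsed, dict):
        return [f"expected a dict, got {type(parsed).__name__}"]
    problems = []
    for key in ("topic", "answer"):
        if not isinstance(parsed.get(key), str):
            problems.append(f"'{key}' should be a string")
    citations = parsed.get("citations")
    if not isinstance(citations, list):
        problems.append("'citations' should be a list")
    else:
        for i, citation in enumerate(citations):
            if not isinstance(citation, dict):
                problems.append(f"citation {i} should be a dict")
                continue
            if not isinstance(citation.get("citation_number"), int):
                problems.append(f"citation {i} needs an integer 'citation_number'")
            if not isinstance(citation.get("source"), str):
                problems.append(f"citation {i} needs a string 'source'")
    return problems


good = {
    "topic": "JSON schema",
    "citations": [{"citation_number": 1, "source": "https://json-schema.org/understanding-json-schema/"}],
    "answer": "A JSON schema is a declarative language...",
}
print(check_structure(good))  # []
print(check_structure({"topic": "JSON schema"}))
```

Returning a list of problems rather than raising lets the caller decide whether to log, retry, or discard a response.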


We can apply the same approach to a list of file formats and obtain outputs that share the same structure and are easy to parse.

In [14]:
file_formats = [
    "zip", "tar", "rar", "7z", "iso", "gz", "bz2", "xz", "pdf", "docx"
]

format_info = {}

for file_format in file_formats:  # avoid shadowing the built-in format()
    response = client.messages.create(
        model="claude-3-5-sonnet-20240620",
        system=f"You are a helpful assistant that responds in the same format as the following example: {example_json_string}",
        messages=[
            {"role": "user", "content": f"What is the {file_format} file format in one sentence?"}
        ],
        max_tokens=200,
    )

    try:
        parsed_response = json.loads(response.content[0].text)
        format_info[parsed_response['topic']] = parsed_response['answer']
    except (json.JSONDecodeError, KeyError, TypeError):
        # Skip replies that are not valid JSON or lack the expected keys
        print(f"Error parsing response for {file_format} format")

# Save the dictionary to a JSON file
with open('file_formats_info.json', 'w') as f:
    json.dump(format_info, f, indent=2)

print("File format information has been saved to file_formats_info.json")

File format information has been saved to file_formats_info.json
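The loop above skips a file format whenever its reply cannot be parsed. Two assumptions could make it more forgiving: the model may occasionally wrap its JSON in a Markdown code fence, and a failed parse may succeed on a second request. A minimal sketch of both, with hypothetical helpers `strip_code_fences` and `generate_json`; `call_model` stands in for a function that sends the prompt and returns the reply text, so the example runs without the API.

```python
import json
import re
from typing import Callable, Optional

# Matches text that is entirely one Markdown code block, with an optional language tag
_FENCE_RE = re.compile(r"^\s*`{3}[\w-]*\s*\n(.*?)\n\s*`{3}\s*$", re.DOTALL)


def strip_code_fences(text: str) -> str:
    """Return the body of a fenced block, or the stripped text if there is no fence."""
    match = _FENCE_RE.match(text)
    return match.group(1) if match else text.strip()


def generate_json(call_model: Callable[[], str], max_attempts: int = 3) -> Optional[dict]:
    """Call the model until its reply parses as a JSON object, up to max_attempts."""
    for attempt in range(1, max_attempts + 1):
        text = call_model()
        try:
            parsed = json.loads(strip_code_fences(text))
        except json.JSONDecodeError:
            print(f"Attempt {attempt}: reply is not valid JSON")
            continue
        if isinstance(parsed, dict):
            return parsed
        print(f"Attempt {attempt}: reply is JSON but not an object")
    return None


# Fake model: the first reply is prose, the second is fenced JSON
fence = "`" * 3
replies = iter([
    "Sure! Here is the answer.",
    fence + 'json\n{"topic": "zip format", "answer": "An archive format."}\n' + fence,
])
result = generate_json(lambda: next(replies))
print(result)
```

In the loop, `call_model` could wrap the `client.messages.create(...)` call and return `response.content[0].text`; a bounded number of attempts keeps cost predictable when a format keeps failing.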
