## Structured Output: Guaranteed Responses

Structured output refers to the ability of the Chat Completions API to return responses in a predefined format, such as a JSON object or a Pydantic model. This is particularly useful when you need the model to adhere to a specific schema for downstream processing or integration with other systems. By defining the expected structure up front, you can validate the response and parse it into a predictable format.

Key Features of Structured Outputs

1. Customizable Response Format
    - You can specify the expected structure of the response using the `response_format` parameter.
    - This can be defined as either a JSON schema or a Pydantic model, depending on your requirements.
2. Using JSON Schema with `create`
    - The `chat.completions.create` method allows you to provide a JSON schema via the `response_format` parameter.
    - This guides the model to generate responses in the desired structure without requiring Python-based schema definitions.
3. Using Pydantic Models with `parse`
    - The `chat.completions.parse` method supports validation and parsing using Pydantic models.
    - This is ideal for scenarios where you need Python-based schema definitions and strict adherence to the structure.
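
As a rough illustration of the JSON-schema route, the sketch below builds a `response_format` payload by hand for the sentence-parsing task used in this section. The schema name `parsed_sentence` and the helper `request_parsed_sentence` are hypothetical names chosen for this example, and whether a particular model or provider honors `json_schema` mode is an assumption you should verify.

```python
import json

# Hand-written JSON schema describing the fields we want back.
sentence_schema = {
    "type": "object",
    "properties": {
        "subject": {"type": "string"},
        "verb": {"type": "string"},
        "obj": {"type": "string"},
    },
    "required": ["subject", "verb", "obj"],
    "additionalProperties": False,
}

# Wrapper passed as `response_format` when using JSON-schema mode.
response_format = {
    "type": "json_schema",
    "json_schema": {
        "name": "parsed_sentence",  # hypothetical schema name
        "strict": True,
        "schema": sentence_schema,
    },
}


def request_parsed_sentence(client, model, sentence):
    """Call `chat.completions.create` and decode the JSON reply into a dict."""
    response = client.chat.completions.create(
        model=model,
        messages=[
            {"role": "system", "content": "Extract the grammatical components from the sentence."},
            {"role": "user", "content": sentence},
        ],
        response_format=response_format,
    )
    # With `create`, the reply is a plain JSON string that we decode ourselves.
    return json.loads(response.choices[0].message.content)
```

Unlike `parse`, this route hands back a plain dictionary, so any further validation is up to you.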

### Setting up Structured Output

In [14]:
from pydantic import BaseModel

# Define the expected structure of the response
class ParsedSentence(BaseModel):
    subject: str
    verb: str
    obj: str

In [15]:
# Make the request to extract parts of a simple sentence
response = client.chat.completions.parse(
    model="deepseek/deepseek-chat-v3-0324:free",
    messages=[
        {
            "role": "system",
            "content": "Extract the grammatical components from the sentence.",
        },
        {"role": "user", "content": "The cat chased the mouse."},
    ],
    response_format=ParsedSentence,  # The response_format parameter is the key to structured output
)

In [16]:
print(response.choices[0].message)

ParsedChatCompletionMessage[ParsedSentence](content='{"subject": "The cat", "verb": "chased", "obj": "the mouse"}', refusal=None, role='assistant', annotations=None, audio=None, function_call=None, tool_calls=None, parsed=ParsedSentence(subject='The cat', verb='chased', obj='the mouse'), reasoning=None)


🔔 Question: How can we extract our parsed message from this `ParsedChatCompletionMessage` Object? What fields can you see?

In [90]:
# To extract our Structured Response
print(response.choices[0].message.parsed)

print("Subject:", response.choices[0].message.parsed.subject)
print("Verb:", response.choices[0].message.parsed.verb)
print("Obj:", response.choices[0].message.parsed.obj)

subject='The cat' verb='chased' obj='the mouse'
Subject: The cat
Verb: chased
Obj: the mouse
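
The message printed earlier also carries a `refusal` field next to `parsed`, so it can be worth checking both before reading attributes. The helper below is a minimal sketch of that check; `extract_parsed` is a hypothetical name, and the assumption is that `parsed` may be `None` when the model declines or returns nothing usable.

```python
def extract_parsed(response):
    """Return the parsed object from the first choice, or raise if it is unavailable."""
    message = response.choices[0].message
    # A non-empty refusal means the model declined to answer.
    if getattr(message, "refusal", None):
        raise ValueError(f"Model refused the request: {message.refusal}")
    # Guard against a missing parsed object before accessing its fields.
    if message.parsed is None:
        raise ValueError("No parsed object; raw content was: " + repr(message.content))
    return message.parsed
```

With the response from the cell above, `extract_parsed(response).subject` would give the same value as `response.choices[0].message.parsed.subject`.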


### 🥊 [Final Challenge]: Putting it all together - Classification using Structured Output

The goal of this final challenge is to combine everything you've learned to build a reliable, end-to-end workflow for structured data extraction. You will take a new narrative from an essential worker and use a Pydantic model to extract key themes in a validated, structured format.

Your Goal:
Using a DeepSeek model through the API, create a Python script that takes the provided `new_narrative` as input and produces the expected output by:
1. **Defining a Pydantic Model**: Create a `BaseModel` that accurately represents the structure of the expected output.
2. **Building the Prompt**: Construct a `messages` list that includes the system prompt, a few-shot example, and the final user message containing the `new_narrative`.
3. **Making the API Call**: Use the `client.chat.completions.create` method to call the DeepSeek model.
4. **Validating the Response**: Use your Pydantic model to validate and parse the raw JSON string returned by the LLM. Alternatively, `client.chat.completions.parse` combines steps 3 and 4 in a single call.
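
For step 2, one possible shape of a few-shot `messages` list is sketched below. The example narrative and its coding are invented purely for illustration (they are not part of the workshop data), and `build_messages` is a hypothetical helper name.

```python
import json

system_prompt = (
    "You are an expert qualitative researcher. "
    "Analyze the following narrative and extract key themes."
)

# Hypothetical few-shot example: both the narrative and its coding are made up.
example_narrative = (
    "We stocked shelves all night during the storm, and my coworkers "
    "shared rides so nobody walked home alone."
)
example_analysis = {
    "emotion": ["exhaustion", "gratitude"],
    "material_conditions": ["overnight retail work"],
    "solidarity": "present",
    "theme": "mutual support among coworkers",
}


def build_messages(narrative):
    """Return system prompt, one user/assistant example pair, then the new narrative."""
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": example_narrative},
        # The assistant turn shows the model the exact JSON shape we expect back.
        {"role": "assistant", "content": json.dumps(example_analysis)},
        {"role": "user", "content": narrative},
    ]


messages = build_messages("In the quiet of the night, I'd mop floors at the hospital.")
```

Writing the assistant turn as JSON with the same keys as your Pydantic model keeps the example and the schema aligned.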

#### The Input

This is the narrative you will be using as input for the LLM.

```python
new_narrative = """
In the quiet of the night, I'd mop floors at the hospital, the only sound the soft swish of the bucket. Patients came and went, doctors hurried past. Sometimes, they'd look right through me. But I'd always tell myself: someone has to keep this place clean for the healers to do their healing. That thought got me through the loneliest shifts.
"""
```

#### The Expected Output
Your script should produce a Pydantic object that, when printed, looks like this. This is your target.

```python
ThematicAnalysis(emotion=['loneliness', 'dedication'], material_conditions=['hospital cleaning'], solidarity='present', theme='invisibility of labor and pride')
```

Hint: You will need to define the `ThematicAnalysis` Pydantic class to have fields for `emotion`, `material_conditions`, `solidarity`, and `theme`, just like the previous examples. Be sure to use the correct data types and a `Literal` type for the `solidarity` field. Good luck!

In [None]:
import json
from openai import OpenAI
from pydantic import BaseModel
from typing import List, Literal, Optional

# Initialize the client (skip this if `client` already exists from an earlier cell).
# `API_KEY` is assumed to be defined earlier in the notebook.
client = OpenAI(
  base_url="https://openrouter.ai/api/v1", 
  api_key=API_KEY,
)

# The new narrative to analyze
new_narrative = """
In the quiet of the night, I'd mop floors at the hospital, the only sound the soft swish of the bucket. Patients came and went, doctors hurried past. Sometimes, they'd look right through me. But I'd always tell myself: someone has to keep this place clean for the healers to do their healing. That thought got me through the loneliest shifts.
"""

# Step 1: Define the Pydantic class here to match the desired output structure.
# Hint: Your class should have fields for 'emotion', 'material_conditions', 'solidarity', and 'theme'.
#       Remember to use the correct data types!
#
#       You may need to look up how to allow list datatypes in Pydantic. 
class ThematicAnalysis(BaseModel):
    emotion: List[str]
    material_conditions: List[str]
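    solidarity: bool  # Note: the expected output uses a string label ('present'), which needs a Literal type; bool yields True/False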
    theme: str
    

# Step 2: Build the few-shot prompt.
# Hint: The messages list needs a system prompt, a user/assistant example pair,
#       and the final user message with the new narrative.
#
messages = [
    {"role": "system", "content": "You are an expert qualitative researcher. Analyze the following narrative and extract key themes."},
    {"role": "user", "content": new_narrative}
]


# Step 3: Make the API call to the DeepSeek model using `.parse`.
# Hint: The `.parse` method handles the validation; the resulting Pydantic object
#       is available at `.choices[0].message.parsed` on the returned completion.
#
parsed_analysis = client.chat.completions.parse(
    model="deepseek/deepseek-chat-v3-0324:free",
    messages=messages,
    response_format=ThematicAnalysis
)


# `parsed_analysis` is the full completion; the validated Pydantic object is at `.choices[0].message.parsed`.
print(parsed_analysis)

ParsedChatCompletion[ThematicAnalysis](id='gen-1755899251-sAqab8yxdqoknw0JL0dx', choices=[ParsedChoice[ThematicAnalysis](finish_reason='stop', index=0, logprobs=None, message=ParsedChatCompletionMessage[ThematicAnalysis](content='{\n  "emotion": [\n    "Loneliness",\n    "Resilience"\n  ],\n  "material_conditions": [\n    "Hospital environment",\n    "Night shift work"\n  ],\n  "solidarity":  false,\n  "theme": "Invisible labor"\n}', refusal=None, role='assistant', annotations=None, audio=None, function_call=None, tool_calls=None, parsed=ThematicAnalysis(emotion=['Loneliness', 'Resilience'], material_conditions=['Hospital environment', 'Night shift work'], solidarity=False, theme='Invisible labor'), reasoning=None), native_finish_reason='stop')], created=1755899252, model='deepseek/deepseek-chat-v3-0324:free', object='chat.completion', service_tier=None, system_fingerprint=None, usage=CompletionUsage(completion_tokens=67, prompt_tokens=101, total_tokens=168, completion_tokens_details=N

In [22]:
print(parsed_analysis.choices[0].message.parsed)

emotion=['Loneliness', 'Resilience'] material_conditions=['Hospital environment', 'Night shift work'] solidarity=False theme='Invisible labor'
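
The run above declares `solidarity` as `bool`, so the model returns `False` rather than a label such as `'present'` from the target output. A minimal sketch of a `Literal`-based variant follows. The allowed labels `"present"` and `"absent"` are an assumption (only `'present'` appears in the target), and the JSON string stands in for a model reply rather than a real API response.

```python
import json
from typing import List, Literal

from pydantic import BaseModel, ValidationError


class ThematicAnalysis(BaseModel):
    emotion: List[str]
    material_conditions: List[str]
    # Assumed label set; adjust to whatever your codebook defines.
    solidarity: Literal["present", "absent"]
    theme: str


# Stand-in for the raw JSON string a model might return via `create`.
raw_json = json.dumps({
    "emotion": ["loneliness", "dedication"],
    "material_conditions": ["hospital cleaning"],
    "solidarity": "present",
    "theme": "invisibility of labor and pride",
})

# Decode the JSON, then let Pydantic validate each field.
analysis = ThematicAnalysis(**json.loads(raw_json))
print(analysis)

# A value outside the Literal set is rejected instead of silently accepted.
rejected = False
try:
    ThematicAnalysis(**{**json.loads(raw_json), "solidarity": False})
except ValidationError:
    rejected = True
```

Passing this class as `response_format` to `.parse` should steer the model toward one of the declared labels, though that depends on the model honoring the schema.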


## Conclusion: What We've Learned

In this workshop, you've moved from a conceptual understanding of LLM APIs to hands-on, programmatic control. You now have the foundational knowledge to build powerful, automated workflows with language models.

Here’s a quick summary of the key concepts you’ve mastered:

1.  **API Mechanics:** You understand that LLM APIs are **stateless**. To simulate memory and maintain context, you are **responsible for managing the message history** yourself by sending a full list of messages with each new request.
2.  **The Power of Roles:** You know how to use the `system`, `user`, and `assistant` roles to give the model instructions, provide it with your prompts, and capture its responses. The `system` role is particularly powerful for setting high-level rules and persona.
3.  **Zero-Shot vs. Few-Shot Prompting:** You've seen how **zero-shot** prompting is great for general tasks but can result in inconsistent output. In contrast, **few-shot** prompting is essential for guiding the model to produce a consistent, predictable format by providing it with a single example or a few examples.
4.  **Structured Output:** You now have a strong tool for reliability: **structured output with Pydantic**. By defining a `BaseModel`, you give the LLM a clear blueprint for its response, and the `.parse` method validates the reply against that blueprint and returns it as a usable Python object, or surfaces a refusal or validation error when the reply does not fit. This is the critical step for moving from simple chat to scalable data analysis.
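
As a reminder of the first point, managing history yourself can be as small as the sketch below. `chat_turn` and `send` are hypothetical names: `send` stands for any function that forwards the full message list to the API and returns the reply text, and here it is replaced by a stub so the example runs offline.

```python
def chat_turn(history, user_text, send):
    """Append the user turn, send the full history, and record the reply."""
    history.append({"role": "user", "content": user_text})
    reply = send(history)  # the whole history goes out on every request
    history.append({"role": "assistant", "content": reply})
    return reply


# Stub standing in for a real API call; it only reports how much context it saw.
def fake_send(messages):
    return f"seen {len(messages)} messages"


history = [{"role": "system", "content": "You are a helpful assistant."}]
first = chat_turn(history, "Hello", fake_send)
second = chat_turn(history, "What did I just say?", fake_send)
```

Because each call receives the accumulated list, the second request includes the first exchange, which is what gives the model its apparent memory.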

The specific example we worked through, thematic coding of a narrative, is just one use case. The most powerful concept you’ve learned today is the idea of using structured output to programmatically define how the LLM should respond. This is a foundational technique that you can apply across many social science workflows, whether you are doing **data extraction**, **summarization**, or **classification**. By providing the LLM with a schema, you gain much tighter control over its output, making it a more reliable tool for your research.