# Pydantic AI with DeepSeek R1

A small prototype of an agentic workflow built with Pydantic AI.

Ollama must be installed and running to serve the local LLM.
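Before running the cells, it can help to confirm that the Ollama server is actually reachable. The following is a minimal sketch using only the standard library. It assumes the default host and port that the cells in this notebook use (`localhost:11434`) and Ollama's `/api/tags` endpoint, which lists locally pulled models; the helper name `ollama_is_running` is hypothetical.

```python
import urllib.request


def ollama_is_running(base_url: str = "http://localhost:11434", timeout: float = 2.0) -> bool:
    """Return True if an Ollama server answers at base_url (assumed default port)."""
    try:
        # /api/tags lists the models that have been pulled locally
        with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        # Covers refused connections, timeouts, and urllib's URLError/HTTPError
        return False


print("Ollama reachable:", ollama_is_running())
```

If the server is up but a model is missing, `ollama pull deepseek-r1` should download it.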

## Simple local call to Ollama

In [4]:
import openai

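# Connect to the local Ollama instance through its OpenAI-compatible endpoint
client = openai.Client(
    base_url="http://localhost:11434/v1",
    api_key="ollama"  # required by the client; Ollama does not check it
)

response = client.chat.completions.create(
    model="deepseek-r1",
    messages=[{"role": "user", "content": "What is Pydantic AI?"}],
    temperature=0.0,
    # Stops only on the literal string "<think></think>"; the reply recorded
    # below contains "<think>\n\n</think>", so it did not trigger in that run
    stop=["<think></think>"],
    max_tokens=500
)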

In [5]:
from IPython.display import display, Markdown

def pretty_print_response(response):
    """
    Displays the LLM response in a clean, readable format using Markdown.
    """
    display(Markdown(f"**Response:**\n\n{response}"))

pretty_print_response(response.choices[0].message.content)

**Response:**

<think>

</think>

Pydantic AI refers to the use of Pydantic, a Python library for data validation and serialization, in conjunction with artificial intelligence (AI) techniques. This combination allows for enhanced data handling, validation, and transformation in AI applications.

### Key Components:
1. **Pydantic**: A lightweight library that makes data validation easier by providing tools to define, validate, and serialize data structures.
2. **AI Techniques**: Machine learning models or algorithms that can be applied alongside Pydantic's validation capabilities.

### Applications of Pydantic AI:
1. **Data Cleaning**: AI-driven methods can automate the detection and correction of invalid or inconsistent data using Pydantic's validation rules.
2. **Schema Learning**: Pydantic can work with AI models to learn data schemas from datasets, enabling automatic validation without explicit schema definitions.
3. **Error Analysis**: By integrating AI, Pydantic can analyze why certain data entries are invalid and provide insights for improving data quality.
4. **Transformers**: AI-based transformers can preprocess or postprocess data using Pydantic's structure to ensure compliance with desired formats.

### Benefits:
- **Automation**: Reduces manual effort in data validation.
- **Scalability**: Handles large datasets efficiently.
- **Flexibility**: Combines the strengths of both Python and AI for versatile solutions.

### Use Cases:
- **Financial Services**: Automating validation of financial data to prevent errors or fraud.
- **Healthcare**: Ensuring patient data adheres to strict standards before analysis.
- **E-commerce**: Validating user inputs like orders, ratings, and reviews.

In summary, Pydantic AI leverages the power of Python's Pydantic library with AI techniques to create robust solutions for data validation, transformation, and automation.
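The response above still begins with an empty `<think>` block, which DeepSeek R1 emits before its answer. One way to drop it before display is a small post-processing step. This is a sketch, not part of the original notebook; the helper name `strip_reasoning` is hypothetical.

```python
import re

# Matches a <think>...</think> block (possibly empty or multi-line)
# together with any whitespace that follows it.
THINK_BLOCK = re.compile(r"<think>.*?</think>\s*", flags=re.DOTALL)


def strip_reasoning(text: str) -> str:
    """Return the model output with any <think>...</think> block removed."""
    return THINK_BLOCK.sub("", text).strip()


raw = "<think>\n\n</think>\n\nPydantic AI refers to the use of Pydantic."
print(strip_reasoning(raw))
```

With this in place, the display call would become `pretty_print_response(strip_reasoning(response.choices[0].message.content))`.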

## Simple Pydantic AI workflow

[**ollama_example.py**](https://ai.pydantic.dev/models/#example-local-usage)

In [10]:
from pydantic import BaseModel
from pydantic_ai import Agent
from pydantic_ai.models.openai import OpenAIModel
from pydantic_ai.providers.openai import OpenAIProvider
import nest_asyncio

nest_asyncio.apply()

# Define your structured output
class Explanation(BaseModel):
    summary: str
    python_function: str

# Set up the Ollama-compatible model; the API key belongs to the provider
ollama_model = OpenAIModel(
    model_name="llama3.1:8b",
    provider=OpenAIProvider(
        base_url="http://localhost:11434/v1",
        api_key="ollama",  # placeholder; Ollama does not check it
    ),
)

# Create the agent that will handle structured responses
agent = Agent(model=ollama_model, result_type=Explanation)

# Run the agent with a natural language query
result = agent.run_sync(
    """
    Write a Python decorator using a class with a '__call__' method.
    It must not use closures or nested functions.
    """
)
# Access parsed response
print("Summary:", result.data.summary)
print("\nPython Code:\n", result.data.python_function)

# Optional: Token usage
print("\nUsage stats:", result.usage())

Summary: A Python decorator using a class with a '__call__' method

Python Code:
 @property

class Deco:
  def __init__(self, func):
    self.func = func

  def __call__(self):
    return self(func)

def deco(func):return Deco(func)

Usage stats: Usage(requests=1, request_tokens=196, response_tokens=84, total_tokens=280, details=None)
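The code returned in this run passes schema validation (both fields are strings) but does not work as a decorator. For comparison, here is a minimal hand-written sketch of what the prompt asks for: a class whose instances wrap the function and forward calls through `__call__`, with no closures or nested functions. The class name `CountCalls` and the call counter are illustrative choices, not something the model produced.

```python
import functools


class CountCalls:
    """Class-based decorator: the instance replaces the decorated function."""

    def __init__(self, func):
        functools.update_wrapper(self, func)  # copy __name__, __doc__, etc.
        self.func = func
        self.calls = 0

    def __call__(self, *args, **kwargs):
        # Forward every call to the wrapped function and count it
        self.calls += 1
        return self.func(*args, **kwargs)


@CountCalls
def add(a, b):
    """Add two numbers."""
    return a + b


print(add(2, 3))
print(add.calls)
```

The gap between the two illustrates that `result_type` only guarantees the shape of the answer, not that the generated code is correct.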


## A more complex Pydantic AI workflow

In [24]:
from retry import retry
from pydantic_ai import Agent
from pydantic_ai.models.openai import OpenAIModel
from pydantic_ai.providers.openai import OpenAIProvider
from pydantic import BaseModel
from typing import List
import nest_asyncio

nest_asyncio.apply()


# Schema definition
class Subtask(BaseModel):
    name: str
    description: str


class SubtasksList(BaseModel):
    tasks: List[Subtask]


class GeneratedFunction(BaseModel):
    task_name: str
    python_code: str


# Ollama-compatible model
ollama_model = OpenAIModel(
    model_name="llama3.1:8b",
    provider=OpenAIProvider(
        base_url="http://localhost:11434/v1",
        api_key="ollama",  # placeholder; Ollama does not check it
    ),
)

# Agent using the structured schema; `retries` sets how many times Pydantic AI
# re-asks the model when its output fails validation
extract_agent = Agent(model=ollama_model, result_type=SubtasksList, retries=5)


# Retry wrapper
# @retry(tries=5, delay=2, backoff=2)
def extract_subtasks(description: str):
    return extract_agent.run_sync(
        f"""
        You are a task extraction assistant.

        Please respond with JSON structured as:
        {{
        "tasks": [
            {{"name": "...", "description": "..."}},
            ...
        ]
        }}

        Here is the description to analyze: {description}
        """
    )


# Description of the task for Agent
description = """
Build a pipeline that loads data from S3, cleans it, applies PCA, and saves it back.
"""

# Print tasks to execute
try:
    result = extract_subtasks(description)
    print("\nExtracted Subtasks:")
    for task in result.data.tasks:
        print(f"- {task.name}: {task.description}")
except Exception as e:
    print("\nStill failed after retries:", e)
    raise  # stop here: the code-generation loop below needs `result`

# New agent to generate code per task
codegen_agent = Agent(model=ollama_model, result_type=GeneratedFunction)

print("\n🧪 Generated Code Snippets:\n")
for task in result.data.tasks:
    codegen_prompt = (
        f"Write a Python function for the task '{task.name}': {task.description}. "
        "The function should be realistic and self-contained."
    )
    code_result = codegen_agent.run_sync(codegen_prompt)
    print(f"🔧 {code_result.data.task_name}:\n")
    print(code_result.data.python_code)
    print("\n" + "-" * 50 + "\n")
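The loop above prints whatever string the model places in `python_code`, and the schema only checks that it is a string. A light sanity check before printing or running a snippet is to parse it with the standard `ast` module and confirm it defines at least one function. This is a sketch under that assumption; the helper name `check_generated_code` is hypothetical and not part of Pydantic AI.

```python
import ast


def check_generated_code(source: str) -> list[str]:
    """Return the names of functions defined at the top level of `source`.

    Raises ValueError if the text is not valid Python or defines no function.
    """
    try:
        tree = ast.parse(source)
    except SyntaxError as exc:
        raise ValueError(f"not valid Python: {exc.msg}") from exc
    names = [node.name for node in tree.body if isinstance(node, ast.FunctionDef)]
    if not names:
        raise ValueError("no top-level function definition found")
    return names


good = "import pandas as pd\ndef clean_data(df):\n    return df.dropna().drop_duplicates()\n"
print(check_generated_code(good))

# A bare statement and a syntactically broken definition are both rejected
for bad in ("print('Function called successfully.')", "def broken(:\n    pass"):
    try:
        check_generated_code(bad)
    except ValueError as err:
        print("rejected:", err)
```

Inside the loop, the check could wrap `code_result.data.python_code`, skipping or re-requesting any snippet that raises `ValueError`. Parsing does not execute the code, so it says nothing about whether the function behaves as intended.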


Extracted Subtasks:
- : Load data from S3
- : Clean the data
- : Apply PCA to the cleaned data
- : Save the transformed data back to S3

🧪 Generated Code Snippets:

🔧 Load data from S3:

def load_data_from_s3(bucket_name, file_key):    s3 = boto3.client('s3')    try:        obj = s3.get_object(Bucket=bucket_name, Key=file_key)        data = obj['Body'].read()        return data    except Exception as e:        print(f'Error loading S3 object: {e}')        return None

--------------------------------------------------

🔧 Cleaned Data Function:

import pandas as pd
def clean_data(df):
    # drop missing values
    df = df.dropna()
    # remove duplicates
    df = df.drop_duplicates()
    return df

--------------------------------------------------

🔧 Plain text responses are not permitted, please call one of the functions instead.:

print('Function called successfully.')

--------------------------------------------------

🔧 save_transformed_data_to_s3:

import boto3
bucket_name = 'my