---
title: "Is Pydantic making your LLM dumber?"
date: "09/15/2024"
date-modified: last-modified
description-meta: "Pydantic is a powerful tool for data validation, but can it make your LLM dumber?"
toc: true
toc-depth: 3
lightbox: true
fig-cap-location: margin
categories:
  - llm
  - openai
  - pydantic
  - python
author:
  - name: Dylan Castillo
    url: https://dylancastillo.co
    affiliation: Iwana Labs
    affiliation-url: https://iwanalabs.com
citation: true
comments:
  utterances:
    repo: dylanjcastillo/blog_comments
    theme: dark-blue
    issue-term: pathname
---

In [168]:
import pandas as pd

from dotenv import load_dotenv
from openai import OpenAI

load_dotenv()

df = pd.read_json("../data/live_bench/coding/question.jsonl", lines=True, orient="records")
df = df.query("livebench_release_date == '2024-06-24'")
df.shape

(88, 14)

In [169]:
df.task.unique()

array(['coding_completion', 'LCB_generation'], dtype=object)

In [170]:
# Verify there's a single turn in all rows
assert all(len(turns) == 1 for turns in df.turns), "There should be a single turn in all rows"

In [171]:
#| output: false
#| echo: false

# Extract the full prompt from the turns
df["prompt"] = df.turns.apply(lambda x: x[0]).str.strip()

# Split the prompt on '###' and keep the first section
# (index 0 is the empty string before the leading '###')
df["first_message"] = df.prompt.str.split("###").str[1].str.strip()
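In the cell above, `.str[1]` is used rather than `.str[0]`. The prompts appear to start with `###`, so splitting on that delimiter leaves an empty string at index 0. A plain-Python illustration, using a made-up prompt that only mimics the `### Section:` layout:

```python
# Made-up prompt that mimics the "### Section: ..." layout of the LiveBench prompts
prompt = "### Instructions: Solve it.\n\n### Question: Add two numbers."

parts = prompt.split("###")
# parts == ['', ' Instructions: Solve it.\n\n', ' Question: Add two numbers.']

# Index 0 is the empty string before the leading '###', so the first section is index 1
first_message = parts[1].strip()
print(first_message)  # Instructions: Solve it.
```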

In [172]:
#| output: false
#| echo: false

with pd.option_context('display.max_colwidth', None):
    display(df[["task", "first_message"]].drop_duplicates().T)

| | 0 | 38 |
|---|---|---|
| task | coding_completion | LCB_generation |
| first_message | Instructions: You are an expert Python programmer. You will be given a question (problem specification) and the first lines of Python solution to this problem, and will write in Python the remaining lines of the program to produce a correct Python program that matches the specification and passes all tests. You will NOT return anything except for the second part of the program that you wrote. | Instructions: You are an expert Python programmer. You will be given a question (problem specification) and will generate a correct Python program that matches the specification and passes all tests. You will NOT return anything except for the program. |


In [173]:
df["original_json"][0]

{'question_title': 'apply-operations-to-maximize-frequency-score',
 'question_content': 'You are given a 0-indexed integer array nums and an integer k.\nYou can perform the following operation on the array at most k times:\n\nChoose any index i from the array and increase or decrease nums[i] by 1.\n\nThe score of the final array is the frequency of the most frequent element in the array.\nReturn the maximum score you can achieve.\nThe frequency of an element is the number of occurences of that element in the array.\n \nExample 1:\n\nInput: nums = [1,2,6,4], k = 3\nOutput: 3\nExplanation: We can do the following operations on the array:\n- Choose i = 0, and increase the value of nums[0] by 1. The resulting array is [2,2,6,4].\n- Choose i = 3, and decrease the value of nums[3] by 1. The resulting array is [2,2,6,3].\n- Choose i = 3, and decrease the value of nums[3] by 1. The resulting array is [2,2,6,2].\nThe element 2 is the most frequent in the final array so our score is 3.\nIt can b

In [174]:
full_message_coding_completion = """
### Instructions: You are an expert Python programmer. You will be given a question (problem specification) and the first lines of Python solution to this problem, and will write in Python the remaining lines of the program to produce a correct Python program that matches the specification and passes all tests. You will NOT return anything except for the second part of the program that you wrote.

### Question: {question}

### Format: You will use the following starter code to write the solution to the problem. 

```python
{starter_code}
```

Only write the missing portion of the code, not the entire code. Be very careful to match appropriate indentation. Directly appending your code after the partial code should produce a correct completion solution. You will write the answer using the following JSON format:
{{ "answer": "<answer>" }}
""".strip()

full_message_LCB_generation = """
### Instructions: You are an expert Python programmer. You will be given a question (problem specification) and will generate a correct Python program that matches the specification and passes all tests. You will NOT return anything except for the program.

### Question: {question}

### Format: Read the inputs from stdin solve the problem and write the answer to stdout (do not directly test on the sample inputs). You will write the answer using the following JSON format:
{{ "answer": "<answer>" }}
""".strip()
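Both templates are filled with `str.format`, so the literal braces of the JSON example must be doubled: `{{` and `}}` render as `{` and `}`, while `{question}` and `{starter_code}` are substituted. Braces inside a substituted value are not parsed again, so problem statements that contain `{}` are safe. A quick check with a shortened, illustrative template:

```python
# Shortened, illustrative template; doubled braces survive .format() as literal braces
template = """### Question: {question}
{{ "answer": "<answer>" }}"""

filled = template.format(question="Return the sum of a and b.")
print(filled)
# ### Question: Return the sum of a and b.
# { "answer": "<answer>" }

# Braces inside the substituted value are NOT re-parsed, so this does not raise
print(template.format(question="Return len({1, 2})."))
```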

In [175]:
def create_full_message(original_json, question_type):
    question = original_json["question_content"]
    if question_type == "coding_completion":
        return full_message_coding_completion.format(question=question, starter_code=original_json["starter_code"])
    elif question_type == "LCB_generation":
        return full_message_LCB_generation.format(question=question)
    else:
        raise ValueError(f"Invalid question type: {question_type}")

In [176]:
questions = []
questions = [create_full_message(row.original_json, row.task) for _, row in df.iterrows()]

In [226]:
import json

client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": questions[0]}
    ],
    response_format={"type": "json_object"}
)
response_data = json.loads(response.choices[0].message.content)
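JSON mode is intended to make the reply parse as JSON, but it does not enforce any particular keys, so `response_data["answer"]` can still fail, and a reply cut off by the token limit may not parse at all. A minimal defensive sketch; `extract_answer` is a hypothetical helper, not part of the OpenAI SDK:

```python
import json


def extract_answer(content: str) -> str:
    """Hypothetical helper: parse a JSON-mode reply and return its "answer" field.

    Raises ValueError for both unparseable JSON and a missing/non-string
    "answer", so malformed replies can be counted separately from wrong code.
    """
    try:
        data = json.loads(content)
    except json.JSONDecodeError as exc:
        raise ValueError(f"Reply is not valid JSON: {exc}") from exc
    if not isinstance(data, dict) or not isinstance(data.get("answer"), str):
        raise ValueError("Reply has no string 'answer' field")
    return data["answer"]


print(extract_answer('{"answer": "return a + b"}'))  # return a + b
```

In the notebook this would be called as `extract_answer(response.choices[0].message.content)`.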

In [234]:
import ast

question_number = 0
row = df.iloc[question_number]

# Name of the method the starter code defines on `Solution`
func_name = json.loads(row.original_json["metadata"])["func_name"]

# First public test case: one argument per line in "input", expected value in "output"
test_case = json.loads(row.public_test_cases)[0]
args = [ast.literal_eval(arg) for arg in test_case["input"].split("\n")]
expected = ast.literal_eval(test_case["output"])

# The starter code annotates with List[int] etc. without importing them, which raised
# "NameError: name 'List' is not defined", so expose the typing names first
namespace = {}
exec("from typing import *\n" + row.original_json["starter_code"] + "\n" + response_data["answer"], namespace)

# Call the generated method directly instead of rebuilding the call as a string
result = getattr(namespace["Solution"](), func_name)(*args)
assert result == expected, f"Expected {expected}, got {result}"

In [225]:
df.iloc[0]["public_test_cases"]

'[{"input": "[1, 2, 6, 4]\\n3", "output": "3", "testtype": "functional"}, {"input": "[1, 4, 4, 2, 4]\\n0", "output": "3", "testtype": "functional"}]'
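Each public test case stores one argument per line in `input` and the expected return value in `output`; the field is JSON-encoded, so the `\\n` shown above becomes a real newline after `json.loads`. The sketch below checks a completion against every public test case rather than only the first. Assumptions: `run_public_tests` is a hypothetical helper, arguments are parsed with `ast.literal_eval` (which would reject JSON-only literals such as `true`), and the brute-force `maxFrequencyScore` is an illustrative stand-in for starter code plus a model answer. The typing names are imported before executing the code because LeetCode-style starter code uses `List[int]` without importing `List`, which is what causes `NameError: name 'List' is not defined`.

```python
import ast
import json

# Public test cases for question 0, copied from the output above
public_test_cases = '[{"input": "[1, 2, 6, 4]\\n3", "output": "3", "testtype": "functional"}, {"input": "[1, 4, 4, 2, 4]\\n0", "output": "3", "testtype": "functional"}]'

# Illustrative stand-in for starter code + model completion (brute force, O(n^3))
solution_code = """
class Solution:
    def maxFrequencyScore(self, nums: List[int], k: int) -> int:
        nums = sorted(nums)
        best = 0
        for i in range(len(nums)):
            for j in range(i, len(nums)):
                window = nums[i:j + 1]
                median = window[len(window) // 2]
                # Cost of moving every element of the window to its median
                if sum(abs(x - median) for x in window) <= k:
                    best = max(best, len(window))
        return best
"""


def run_public_tests(code: str, func_name: str, test_cases_json: str) -> list[bool]:
    """Hypothetical helper: return one pass/fail flag per functional test case."""
    namespace: dict = {}
    # Starter code uses List[int] etc. without importing them
    exec("from typing import *\n" + code, namespace)
    method = getattr(namespace["Solution"](), func_name)

    results = []
    for case in json.loads(test_cases_json):
        # json.loads has already turned the "\n" escape into a real newline
        args = [ast.literal_eval(line) for line in case["input"].split("\n")]
        expected = ast.literal_eval(case["output"])
        results.append(method(*args) == expected)
    return results


print(run_public_tests(solution_code, "maxFrequencyScore", public_test_cases))  # [True, True]
```

Calling the method directly with parsed arguments avoids converting each argument back into source text before executing it.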