---
title: Comparing Pydantic
---
Pydantic is a popular library for data validation in Python used by most -- if not all -- LLM frameworks, like [instructor](https://github.com/jxnl/instructor/tree/main).
BAML also uses Pydantic. The BAML Rust compiler can generate Pydantic models from your `.baml` files. But that's not all the compiler does -- it also takes care of fixing common LLM parsing issues, supports more data types, handles retries, and reduces the amount of boilerplate code you have to write.
Let's dive into how Pydantic is used and its limitations.
### Why working with LLMs requires more than just Pydantic
Pydantic can help you get structured output from an LLM easily at first glance:
```python
from typing import List, Union

from openai import OpenAI
from pydantic import BaseModel

client = OpenAI()

class Resume(BaseModel):
    name: str
    skills: List[str]

def create_prompt(input_text: str) -> str:
    PROMPT_TEMPLATE = f"""Parse the following resume and return a structured representation of the data in the schema below.
Resume:
---
{input_text}
---

Schema:
{Resume.model_json_schema()['properties']}

Output JSON:
"""
    return PROMPT_TEMPLATE

def extract_resume(input_text: str) -> Union[Resume, None]:
    prompt = create_prompt(input_text)
    chat_completion = client.chat.completions.create(
        model="gpt-5", messages=[{"role": "system", "content": prompt}]
    )
    try:
        output = chat_completion.choices[0].message.content
        if output:
            # Validate the raw model output against the Resume schema
            return Resume.model_validate_json(output)
        return None
    except Exception as e:
        raise e
```
That's pretty good, but now we want to add an `Education` model to the `Resume` model. We add the following code:
```diff
...
+class Education(BaseModel):
+    school: str
+    degree: str
+    year: int

 class Resume(BaseModel):
     name: str
     skills: List[str]
+    education: List[Education]

 def create_prompt(input_text: str) -> str:
+    additional_models = ""
+    if "$defs" in Resume.model_json_schema():
+        additional_models += f"\nUse these other schema definitions as well:\n{Resume.model_json_schema()['$defs']}"
     PROMPT_TEMPLATE = f"""Parse the following resume and return a structured representation of the data in the schema below.
 Resume:
 ---
 {input_text}
 ---

 Schema:
 {Resume.model_json_schema()['properties']}

+{additional_models}

 Output JSON:
 """.strip()
     return PROMPT_TEMPLATE
...
```
A little ugly, but still readable... But managing all these prompt strings can make your codebase disorganized very quickly.
Then you realize the LLM sometimes outputs some text before giving you the JSON, like this:
```diff
+ The output is:
 {
   "name": "John Doe",
   ... // truncated for brevity
 }
```
So you add a regex that extracts everything between the outermost `{` and `}`:
```diff
 def extract_resume(input_text: str) -> Union[Resume, None]:
     prompt = create_prompt(input_text)
     print(prompt)
     chat_completion = client.chat.completions.create(
         model="gpt-5", messages=[{"role": "system", "content": prompt}]
     )
     try:
         output = chat_completion.choices[0].message.content
         print(output)
         if output:
-            return Resume.model_validate_json(output)
+            # Extract the JSON block using regex (greedy, so nested objects stay intact)
+            json_match = re.search(r"\{.*\}", output, re.DOTALL)
+            if json_match:
+                json_output = json_match.group(0)
+                return Resume.model_validate_json(json_output)
         return None
     except Exception as e:
         raise e
```
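Regex-based extraction is fragile once objects nest: a non-greedy pattern stops at the first closing brace, and a greedy one can swallow trailing text that happens to contain a brace. One possible alternative, sketched below using only the standard library, is to scan for the first position where `json.JSONDecoder.raw_decode` succeeds. The helper name `extract_first_json` and the sample output are illustrative assumptions, not part of BAML or Pydantic.

```python
import json
from typing import Any, Optional


def extract_first_json(text: str) -> Optional[Any]:
    """Return the first JSON object or array embedded in `text`, or None."""
    decoder = json.JSONDecoder()
    for start, char in enumerate(text):
        if char not in "{[":
            continue
        try:
            # raw_decode parses exactly one JSON value beginning at `start`
            # and ignores any text that follows it.
            value, _end = decoder.raw_decode(text, start)
        except json.JSONDecodeError:
            continue  # not valid JSON from this position; keep scanning
        return value
    return None


# Hypothetical LLM output with chatter before and after the JSON payload.
llm_output = (
    "The output is:\n"
    '{"name": "John Doe", "education": [{"school": "UC Berkeley", "year": 2020}]}\n'
    "Hope that helps!"
)
resume_data = extract_first_json(llm_output)
print(resume_data["education"][0]["year"])
```

This still only recovers well-formed JSON. Trailing commas, unquoted keys, or truncated output need further handling.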
Next you realize you actually want an array of resumes. `List[Resume]` is not a `BaseModel`, so it has no `model_validate_json` or `model_json_schema` method of its own, and you add another wrapper:
```diff
+class ResumeArray(BaseModel):
+    resumes: List[Resume]
```
Now you need to change the rest of your code to handle the different models. That's good in the long term, but it is more boilerplate you have to write, test, and maintain.
Next, you notice the LLM sometimes outputs a single resume `{...}`, and sometimes an array `[{...}]`...
You must now change your parser to handle both cases:
```diff
+def extract_resume(input_text: str) -> Union[List[Resume], None]:
+    prompt = create_prompt(input_text)  # Also requires changes
     chat_completion = client.chat.completions.create(
         model="gpt-5", messages=[{"role": "system", "content": prompt}]
     )
     try:
         output = chat_completion.choices[0].message.content
         if output:
+            # Extract the JSON block (array or object) using regex
+            json_match = re.search(r"\[.*\]|\{.*\}", output, re.DOTALL)
             if json_match:
                 json_output = json_match.group(0)
+                parsed = json.loads(json_output)
+                if isinstance(parsed, list):
+                    # The LLM returned an array of resumes
+                    return [Resume.model_validate(item) for item in parsed]
+                else:
+                    # The LLM returned a single resume object
+                    return [Resume.model_validate(parsed)]
         return None
     except Exception as e:
         raise e
```
You could retry the call against the LLM to fix the issue, but that will cost you precious seconds and tokens, so handling this corner case manually is the only solution.
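To make the shape problem concrete, here is a minimal sketch of a normalizer that accepts the three payload shapes discussed above (a single object, a bare array, or a `{"resumes": [...]}` wrapper) and always returns a list of dictionaries. It uses only the standard library; `normalize_resumes` is a hypothetical helper name, and each returned item would still need to go through something like `Resume.model_validate` afterwards.

```python
import json
from typing import Any, Dict, List


def normalize_resumes(json_text: str) -> List[Dict[str, Any]]:
    """Coerce the payload shapes an LLM might return into a list of dicts."""
    parsed = json.loads(json_text)
    if isinstance(parsed, list):
        # Bare array: [{...}, {...}]
        return parsed
    if isinstance(parsed, dict):
        # Wrapper object: {"resumes": [{...}]}
        if isinstance(parsed.get("resumes"), list):
            return parsed["resumes"]
        # Single resume: {...}
        return [parsed]
    raise ValueError(f"Unexpected JSON payload of type {type(parsed).__name__}")


single = normalize_resumes('{"name": "John Doe", "skills": ["Python"]}')
array = normalize_resumes('[{"name": "John Doe", "skills": ["Python"]}]')
wrapped = normalize_resumes('{"resumes": [{"name": "John Doe", "skills": ["Python"]}]}')
print(single == array == wrapped)
```

Every new output shape adds another branch to this function, which is exactly the kind of hand-maintained parsing logic this section is warning about.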
---
## A small tangent -- JSON schemas vs type definitions
Sidenote: At this point your prompt looks like this:
```
JSON Schema:
{'name': {'title': 'Name', 'type': 'string'}, 'skills': {'items': {'type': 'string'}, 'title': 'Skills', 'type': 'array'}, 'education': {'anyOf': [{'$ref': '#/$defs/Education'}, {'type': 'null'}]}}
Use these other JSON schema definitions as well:
{'Education': {'properties': {'degree': {'title': 'Degree', 'type': 'string'}, 'major': {'title': 'Major', 'type': 'string'}, 'school': {'title': 'School', 'type': 'string'}, 'year': {'title': 'Year', 'type': 'integer'}}, 'required': ['degree', 'major', 'school', 'year'], 'title': 'Education', 'type': 'object'}}
```
Sometimes even GPT-4 echoes the schema back instead of filling it in, as shown below. The output is technically valid JSON, so OpenAI's "JSON mode" will not save you:
```
{
  "name": {
    "title": "Name",
    "type": "string",
    "value": "John Doe"
  },
  "skills": {
    "items": {
      "type": "string",
      "values": [
        "Python",
        "JavaScript",
        "React"
      ]
... // truncated for brevity
```
(This is an actual result from GPT-4 before some more prompt engineering.)
All you really want is a prompt like the one below, with far fewer tokens and less room for confusion:
```diff
 Parse the following resume and return a structured representation of the data in the schema below.
 Resume:
 ---
 John Doe
 Python, Rust
 University of California, Berkeley, B.S. in Computer Science, 2020
 ---
+JSON Schema:
+{
+  "name": string,
+  "skills": string[],
+  "education": {
+    "school": string,
+    "degree": string,
+    "year": integer
+  }[]
+}
 Output JSON:
```
Ahh, much better. **That's 80% fewer tokens** with a simpler prompt, for the same results. (See also Microsoft's [TypeChat](https://microsoft.github.io/TypeChat/docs/introduction/), which uses a similar schema format based on TypeScript types.)
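As a rough illustration of the idea, and not BAML's actual implementation, the sketch below renders a JSON Schema dictionary as a compact type definition. The `render_type` function is hypothetical and handles only `$ref`, arrays, objects, and primitive types. `RESUME_SCHEMA` is a hand-written stand-in for the kind of dictionary `Resume.model_json_schema()` produces. The comparison at the end counts characters, which is only a crude proxy for tokens.

```python
import json
from typing import Any, Dict


def render_type(schema: Dict[str, Any], defs: Dict[str, Any], indent: int = 0) -> str:
    """Render a JSON Schema node as a compact, TypeScript-like type string."""
    if "$ref" in schema:
        # Inline the referenced definition, e.g. "#/$defs/Education"
        name = schema["$ref"].split("/")[-1]
        return render_type(defs[name], defs, indent)
    kind = schema.get("type")
    if kind == "array":
        return render_type(schema["items"], defs, indent) + "[]"
    if kind == "object":
        pad = "  " * (indent + 1)
        fields = [
            f'{pad}"{name}": {render_type(prop, defs, indent + 1)}'
            for name, prop in schema["properties"].items()
        ]
        return "{\n" + ",\n".join(fields) + "\n" + "  " * indent + "}"
    return str(kind)  # primitives such as "string" or "integer"


RESUME_SCHEMA = {
    "type": "object",
    "properties": {
        "name": {"title": "Name", "type": "string"},
        "skills": {"items": {"type": "string"}, "title": "Skills", "type": "array"},
        "education": {
            "items": {"$ref": "#/$defs/Education"},
            "title": "Education",
            "type": "array",
        },
    },
    "$defs": {
        "Education": {
            "type": "object",
            "properties": {
                "school": {"title": "School", "type": "string"},
                "degree": {"title": "Degree", "type": "string"},
                "year": {"title": "Year", "type": "integer"},
            },
        }
    },
}

compact = render_type(RESUME_SCHEMA, RESUME_SCHEMA["$defs"])
print(compact)
print(len(compact), "characters vs", len(json.dumps(RESUME_SCHEMA)), "for the raw schema")
```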
---
But we digress, let's get back to the point. You can see how this can get out of hand quickly, and how Pydantic wasn't really made with LLMs in mind. We haven't gotten around to adding resilience like **retries, or falling back to a different model in the event of an outage**. There's still a lot of wrapper code to write.
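For a sense of what that wrapper code involves, here is a minimal sketch of retries with exponential backoff plus fallback to a second model. Everything in it is an assumption for illustration: `call_with_fallback` is a hypothetical helper, and the two "clients" are stub functions standing in for real LLM calls.

```python
import time
from typing import Callable, Optional, Sequence


def call_with_fallback(
    clients: Sequence[Callable[[str], str]],
    prompt: str,
    max_attempts: int = 3,
    base_delay: float = 0.5,
) -> str:
    """Try each client in order, retrying each one before falling back."""
    last_error: Optional[Exception] = None
    for call_model in clients:
        for attempt in range(max_attempts):
            try:
                return call_model(prompt)
            except Exception as error:
                last_error = error
                # Exponential backoff before retrying the same client
                if attempt < max_attempts - 1:
                    time.sleep(base_delay * (2 ** attempt))
    raise RuntimeError("All clients failed") from last_error


# Stub clients standing in for real LLM calls (hypothetical).
def flaky_primary(prompt: str) -> str:
    raise TimeoutError("primary model unavailable")


def stable_backup(prompt: str) -> str:
    return '{"name": "John Doe"}'


result = call_with_fallback(
    [flaky_primary, stable_backup], "Parse the resume", max_attempts=2, base_delay=0.0
)
print(result)
```

A real version would also need to decide which errors are worth retrying, such as rate limits and timeouts, and which are not, such as authentication failures.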
### Pydantic and Enums
There are other core limitations.
Say you want to do a classification task using Pydantic. An Enum is a great fit for modelling this.
Assume this is our prompt:
```text
Classify the company described in this text into the best
of the following categories:
Text:
---
{some_text}
---
Categories:
- Technology: Companies involved in the development and production of technology products or services
- Healthcare: Includes companies in pharmaceuticals, biotechnology, medical devices.
- Real estate: Includes real estate investment trusts (REITs) and companies involved in real estate development.
The best category is:
```
Since we have descriptions, we need to generate a custom enum we can use to build the prompt:
```python
from enum import Enum

class FinancialCategory(Enum):
    technology = (
        "Technology",
        "Companies involved in the development and production of technology products or services.",
    )
    ...
    real_estate = (
        "Real Estate",
        "Includes real estate investment trusts (REITs) and companies involved in real estate development.",
    )

    def __init__(self, category, description):
        # Each member's tuple value is unpacked into these two fields
        self._category = category
        self._description = description

    @property
    def category(self):
        return self._category

    @property
    def description(self):
        return self._description
```
We add a class method to load the right enum from the LLM output string:
```python
    @classmethod
    def from_string(cls, category: str) -> "FinancialCategory":
        for c in cls:
            if c.category == category:
                return c
        raise ValueError(f"Invalid category: {category}")
```
Update the prompt to use the enum descriptions:
```python
def categories_and_descriptions() -> str:
    # Return the list as a string so it can be embedded in the prompt
    # (printing it would leave "None" in the f-string below).
    return "\n".join(
        f"- {category.category}: {category.description}"
        for category in FinancialCategory
    )

def create_prompt(text: str) -> str:
    PROMPT_TEMPLATE = f"""Classify the company described in this text into the best
of the following categories:

Text:
---
{text}
---

Categories:
{categories_and_descriptions()}

The best category is:
"""
    return PROMPT_TEMPLATE
```
And then we use it in our AI function:
```python
def classify_company(text: str) -> Union[FinancialCategory, None]:
    prompt = create_prompt(text)
    chat_completion = client.chat.completions.create(
        model="gpt-5", messages=[{"role": "system", "content": prompt}]
    )
    try:
        output = chat_completion.choices[0].message.content
        if output:
            # Use our helper function!
            return FinancialCategory.from_string(output)
        return None
    except Exception as e:
        raise e
```
What gets hairy is if you want to change your types.
- What if you want the LLM to return an object instead? You have to change your enum, your prompt, AND your parser.
- What if you want to handle cases where the LLM outputs "Real Estate" or "real estate"?
- What if you want to save the enum information in a database? `str(category)` will save `FinancialCategory.healthcare` into your DB, but your parser only recognizes "Healthcare", so you'll need more boilerplate if you ever want to programmatically analyze your data.
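The last two points can be sketched as follows. This is one possible way to patch them by hand, not a recommendation: `from_string` normalizes case, whitespace, and a trailing period before matching, and the member's `name` is used as a stable key for storage. The `healthcare` member and its description are taken from the prompt above. The normalization rules are assumptions about what an LLM might emit.

```python
from enum import Enum


class FinancialCategory(Enum):
    technology = (
        "Technology",
        "Companies involved in the development and production of technology products or services.",
    )
    healthcare = (
        "Healthcare",
        "Includes companies in pharmaceuticals, biotechnology, medical devices.",
    )
    real_estate = (
        "Real Estate",
        "Includes real estate investment trusts (REITs) and companies involved in real estate development.",
    )

    def __init__(self, category: str, description: str):
        self.category = category
        self.description = description

    @classmethod
    def from_string(cls, raw: str) -> "FinancialCategory":
        # Normalize case, underscores, surrounding whitespace, and a trailing period
        cleaned = raw.strip().strip(".").lower().replace("_", " ")
        normalized = " ".join(cleaned.split())
        for member in cls:
            if normalized in (member.category.lower(), member.name.replace("_", " ")):
                return member
        raise ValueError(f"Invalid category: {raw!r}")


parsed = FinancialCategory.from_string(" real estate.\n")
stored_value = parsed.name                       # stable key to save in a database
restored = FinancialCategory[stored_value]       # look the member up again by name
print(stored_value, restored is parsed)
```

Each of these fixes is small, but together they are more code that lives outside the type definition and has to be kept in sync with the prompt.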
### Alternatives
Libraries like [instructor](https://github.com/jxnl/instructor/tree/main) do take care of a great deal of this boilerplate, but you're still:
1. Using prompts that you cannot control. E.g. [a commit may change your results underneath you](https://github.com/jxnl/instructor/commit/1b6d8253c0f7dfdaa6cb1dbdbd37684d192ddecf).
1. Using more tokens than you may need to declare schemas (higher costs and latencies).
1. **Left without any included testing capabilities.** Developers have to copy-paste JSON blobs everywhere, potentially between their IDEs and other websites. Existing LLM Playgrounds were not made with structured data in mind.
1. Lacking observability. There is no automatic tracing of requests.
## Enter BAML
The Boundary toolkit lets you iterate without the hand-written glue code that the Pydantic approach requires.
Here's all the BAML code you need to solve the Extract Resume problem from earlier (VSCode prompt preview is shown on the right):
<img src="/assets/vscode/extract-resume-prompt-preview.png" />
<Note>
Here we use a "GPT4" client, but you can use any model. See [client docs](/ref/llm-client-providers/open-ai)
</Note>
{/*
```baml
class Education {
  school string
  degree string
  year int
}

class Resume {
  name string
  skills string[]
  education Education[]
}

function ExtractResume(resume_text: string) -> Resume {
  client GPT4
  prompt #"
    Parse the following resume and return a structured representation of the data in the schema below.
    Resume:
    ---
    {{ input.resume_text }}
    ---

    Output in this JSON format:
    {{ ctx.output_format }}

    Output JSON:
  "#
}
``` */}
The BAML compiler generates a python client that imports and calls the function:
```python
from baml_client import baml as b

async def main():
    resume = await b.ExtractResume(resume_text="""John Doe
Python, Rust
University of California, Berkeley, B.S. in Computer Science, 2020""")

    assert resume.name == "John Doe"
```
That's it! No need to write any more code. Since the compiler knows your function signature, we generate a custom deserializer for your specific use case that _just works_.
Converting the `Resume` into an array of resumes requires a single line change in BAML (vs having to create array wrapper classes and parsing logic).
In this image we change the types and BAML automatically updates the prompt, parser, and the Python types you get back.
<img src="/assets/comparisons/prompt_view.gif" />
Adding retries or resilience requires just [a couple of modifications](/ref/llm-client-strategies/retry-policy). And best of all, **you can test things instantly, without leaving VSCode**.
### The bottom line
Pydantic is excellent for data validation, but LLM applications need more than validation - they need a complete structured extraction solution.
**BAML's advantages over Pydantic:**
- **No boilerplate** - BAML generates all parsing, retry, and error handling code
- **Visual development** - See prompts and test instantly in VSCode
- **Better prompts** - Optimized schema format uses 80% fewer tokens
- **Schema-Aligned Parsing** - Handles malformed JSON and edge cases automatically
- **Multi-model support** - Works with any LLM provider, not just OpenAI
- **Type safety across languages** - Generated clients for Python, TypeScript, Go, Ruby
- **Built-in resilience** - Retries, fallbacks, and smart error recovery
**What you get with BAML that Pydantic can't provide:**
- **Instant testing** - No API calls or token costs during development
- **Prompt optimization** - See exactly what's sent and optimize token usage
- **Production features** - Automatic retries, model fallbacks, streaming support
- **Better debugging** - Know exactly why extraction failed
- **Future-proof** - Never get locked into one model or provider
**Why this matters for your team:**
- **10x faster iteration** - Test prompts instantly without running Python code
- **Better reliability** - Handle edge cases and malformed outputs automatically
- **Cost optimization** - Reduce token usage with optimized schema formats
- **Model flexibility** - Switch between GPT, Claude, open-source models seamlessly
We built BAML because writing a Python library wasn't powerful enough to solve the real challenges of LLM structured extraction.
### Conclusion
Get started today with [Python](/guide/installation-language/python), [TypeScript](/guide/installation-language/typescript), [Go](/guide/installation-language/go), [Ruby](/guide/installation-language/ruby) or [other languages](/guide/installation-language/rest-api-other-languages).
Our mission is to make the best developer experience for AI engineers working with LLMs. Contact us at founders@boundaryml.com or [Join us on Discord](https://discord.gg/BTNBeXGuaS) to stay in touch with the community and influence the roadmap.