- https://sean22492249.medium.com/pydantic-%E7%9A%84%E4%BB%8B%E7%B4%B9-3721a0691162
- https://www.youtube.com/watch?v=yj-wSRJwrrc

### Pydantic

- powered by type hints
    - Python is dynamically typed: type hints are not enforced at runtime
    - pydantic vs. dataclass
- field-level and model-level data validation
- JSON serialization
    - can also output JSON Schema
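The first and last bullets can be illustrated with a minimal sketch, assuming Pydantic v2 (`model_json_schema()` is the v2 method name). `add` and `Point` are hypothetical examples, not from the sources above. The same `int` hints are ignored by a plain function but enforced by a `BaseModel`, and they also produce a JSON Schema.

```python
from pydantic import BaseModel


def add(a: int, b: int) -> int:
    # The hints are metadata only: the interpreter never checks them.
    return a + b


class Point(BaseModel):
    x: int
    y: int


# Plain function: str arguments are accepted silently despite the hints.
print(add("1", "2"))  # prints 12 (string concatenation)

# Pydantic model: the same hints are enforced; "1" is coerced to 1 (lax mode).
p = Point(x="1", y="2")
print(p.x + p.y)  # prints 3

# The hints also drive JSON Schema generation (Pydantic v2 API).
schema = Point.model_json_schema()
print(schema["required"])  # prints ['x', 'y']
```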

In [19]:
from datetime import datetime
from typing import Tuple

from pydantic import BaseModel, Field

In [2]:
class Student(BaseModel):
    birth: datetime
    grades: Tuple[int, int]

In [3]:
stu = Student(birth='1992-01-01', grades=['80', '90'])

In [4]:
stu.birth

datetime.datetime(1992, 1, 1, 0, 0)

In [5]:
stu.grades

(80, 90)
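When coercion is impossible, Pydantic raises a single `ValidationError` that collects every failure. A minimal sketch, assuming Pydantic v2, reusing the `Student` model above with deliberately bad (hypothetical) input:

```python
from datetime import datetime
from typing import Tuple

from pydantic import BaseModel, ValidationError


class Student(BaseModel):
    birth: datetime
    grades: Tuple[int, int]


try:
    Student(birth="not a date", grades=["80", "ninety"])
except ValidationError as exc:
    # One entry per failing value; "loc" is the path to it inside the input.
    locs = [err["loc"] for err in exc.errors()]
    print(exc.error_count())  # prints 2
    print(locs)  # prints [('birth',), ('grades', 1)]
```

Note that `'80'` at index 0 is still coerced successfully; only the second tuple item is reported.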

### basics

In [20]:
from pydantic import EmailStr  # needs the optional email-validator package: pip install "pydantic[email]"

In [22]:
class User(BaseModel):
    name: str
    email: EmailStr
    account: int

In [24]:
User(name='zhang', email='zch@126', account='1234')

ValidationError: 1 validation error for User
email
  value is not a valid email address: The part after the @-sign is not valid. It should have a period. [type=value_error, input_value='zch@126', input_type=str]

### validator

In [14]:
from typing import Annotated
from pydantic import BaseModel, ValidationError
from pydantic import AfterValidator, BeforeValidator

def name_must_contain_space(v: str) -> str:
    if ' ' not in v:
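        # NOTE: this *returns* the exception instead of raising it, so pydantic
        # does not see a validation failure; use `raise` to reject the value.
        return ValueError(f'Name must contain a space: {v}')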
    return v.lower()

class UserDetail(BaseModel):
    age: int
    name: Annotated[str, AfterValidator(name_must_contain_space)]

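With `AfterValidator`, the function runs after pydantic's built-in `str` check and its return value is not re-checked, so the returned `ValueError` object becomes the value of `name`:

In [12]: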
person = UserDetail(age='32', name='zhang')

In [13]:
person.name

ValueError('Name must contain a space: zhang')

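With `BeforeValidator`, the function runs before the `str` check, so the returned `ValueError` object is then rejected as not a string:

In [15]: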
class UserDetail(BaseModel):
    age: int
    name: Annotated[str, BeforeValidator(name_must_contain_space)]

In [16]:
person = UserDetail(age='32', name='zhang')

ValidationError: 1 validation error for UserDetail
name
  Input should be a valid string [type=string_type, input_value=ValueError('Name must contain a space: zhang'), input_type=ValueError]
    For further information visit https://errors.pydantic.dev/2.8/v/string_type
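Because `name_must_contain_space` above returns the exception rather than raising it, neither run reports the intended error. A corrected sketch, assuming Pydantic v2, raises the `ValueError` and combines both validator kinds on one field; `strip_whitespace` is a hypothetical helper added only to show what a before-validator is useful for.

```python
from typing import Annotated, Any

from pydantic import AfterValidator, BaseModel, BeforeValidator, ValidationError


def strip_whitespace(v: Any) -> Any:
    # Before-validator: sees the raw input, which may not be a str yet.
    return v.strip() if isinstance(v, str) else v


def name_must_contain_space(v: str) -> str:
    # After-validator: v is already a str. Raise (not return) to reject it.
    if " " not in v:
        raise ValueError(f"Name must contain a space: {v}")
    return v.lower()


class UserDetail(BaseModel):
    age: int
    # Order: BeforeValidator -> built-in str validation -> AfterValidator.
    name: Annotated[
        str,
        BeforeValidator(strip_whitespace),
        AfterValidator(name_must_contain_space),
    ]


ok = UserDetail(age="32", name="  Zhang San  ")
print(ok.name)  # prints zhang san

try:
    UserDetail(age="32", name="zhang")
except ValidationError as exc:
    err = exc.errors()[0]
    print(err["type"], err["loc"])  # prints value_error ('name',)
```

Raising `ValueError` (or using `assert`) inside a validator is what lets pydantic turn the failure into a `ValidationError` entry.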

### custom validator

In [28]:
from pydantic import field_validator

In [30]:
class User(BaseModel):
    name: str
    email: EmailStr
    account: int
    
    @field_validator('account')
    @classmethod
    def validate_account(cls, value: int) -> int:
        # mode='after' is the default, so value has already been converted to int
        if value <= 0:
            raise ValueError(f'account id must be positive: {value}')
        return value

In [34]:
User(name='zhang', email='zch@126.com', account='-1234')

ValidationError: 1 validation error for User
account
  Value error, account id must be positive: -1234 [type=value_error, input_value='-1234', input_type=str]
    For further information visit https://errors.pydantic.dev/2.8/v/value_error
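The same positivity rule can also be declared with `Field(gt=0)`, and cross-field rules belong in a model-level validator. A minimal sketch, assuming Pydantic v2; the `Signup` model and its password fields are hypothetical, and `email` is omitted so the example does not need email-validator.

```python
from pydantic import BaseModel, Field, ValidationError, model_validator


class Signup(BaseModel):
    # Field-level constraint: declarative version of the positivity check.
    account: int = Field(gt=0)
    password: str
    password_repeat: str

    @model_validator(mode="after")
    def passwords_match(self) -> "Signup":
        # Model-level check: runs after every field is validated,
        # so it can compare fields with each other.
        if self.password != self.password_repeat:
            raise ValueError("passwords do not match")
        return self


try:
    Signup(account="-1234", password="a", password_repeat="a")
except ValidationError as exc:
    field_err = exc.errors()[0]
    print(field_err["type"], field_err["loc"])  # prints greater_than ('account',)

try:
    Signup(account="1234", password="a", password_repeat="b")
except ValidationError as exc:
    model_err = exc.errors()[0]
    print(model_err["type"], model_err["loc"])  # prints value_error ()
```

Model-level errors have an empty `loc` because they are not tied to a single field.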

### JSON serialization

In [35]:
user = User(name='zhang', email='zch@126.com', account='1234')

In [36]:
user.model_dump_json()  # Pydantic v2 name for the deprecated .json()

'{"name":"zhang","email":"zch@126.com","account":1234}'

In [37]:
user.model_dump()  # Pydantic v2 name for the deprecated .dict()

{'name': 'zhang', 'email': 'zch@126.com', 'account': 1234}
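`model_dump()` keeps Python objects, while `model_dump(mode="json")` and `model_dump_json()` produce JSON-safe values, and `model_validate_json()` parses JSON back into a validated model. A minimal round-trip sketch, assuming Pydantic v2; the `Event` model is hypothetical and uses a `datetime` so the difference between the modes is visible.

```python
from datetime import datetime

from pydantic import BaseModel


class Event(BaseModel):
    name: str
    at: datetime


event = Event(name="launch", at="2024-01-01T12:00:00")

as_json = event.model_dump_json()
print(as_json)  # prints {"name":"launch","at":"2024-01-01T12:00:00"}

# Python mode keeps the datetime object; JSON mode turns it into an ISO string.
print(event.model_dump()["at"])  # prints 2024-01-01 12:00:00
print(event.model_dump(mode="json")["at"])  # prints 2024-01-01T12:00:00

# Round trip: parse and validate the JSON string back into a model.
restored = Event.model_validate_json(as_json)
print(restored == event)  # prints True
```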

## pydantic vs. dataclasses

| Feature | pydantic | dataclass |
|---------|----------|-----------|
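| Type hints | ✅ | ✅ |
| Data validation | ✅ | ❌ |
| Serialization | ✅ (`model_dump()`, `model_dump_json()`) | ⚠️ partial (`dataclasses.asdict()`, no built-in JSON) |
| Part of the standard library | ❌ | ✅ |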
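The validation and serialization rows can be checked directly. A minimal sketch, assuming Pydantic v2; it also uses `pydantic.dataclasses.dataclass`, which keeps the dataclass syntax but adds pydantic validation. `PlainStudent` and `ValidatedStudent` are hypothetical names.

```python
import dataclasses
import json

from pydantic.dataclasses import dataclass as pydantic_dataclass


@dataclasses.dataclass
class PlainStudent:
    age: int


@pydantic_dataclass
class ValidatedStudent:
    age: int


plain = PlainStudent(age="32")
print(repr(plain.age))  # prints '32': stored as-is, no validation

validated = ValidatedStudent(age="32")
print(repr(validated.age))  # prints 32: coerced to int by pydantic

# Standard-library serialization: asdict() gives a dict, JSON is a manual step.
print(json.dumps(dataclasses.asdict(plain)))  # prints {"age": "32"}
```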

## structured prompting & LangChain

In [6]:
# compatibility shim exposing the Pydantic v1 API; newer langchain-core releases use Pydantic v2
from langchain_core.pydantic_v1 import BaseModel, Field

- LLMs (large language models) are eating software
    - LLM output has to stay compatible with existing software
    - we are building systems, not chatbots
- 90% of applications output JSON
    - or some other structured output that they parse with regex

```
(prompt: str, schema: Model) -> Model
```
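The signature above can be sketched in a few lines, assuming Pydantic v2. `fake_llm` is a hypothetical stand-in that returns a canned reply so the example runs offline; a real implementation would call an LLM API. The schema is embedded in the prompt, and the raw reply is validated (and coerced) by the model class.

```python
import json
from typing import Type, TypeVar

from pydantic import BaseModel

M = TypeVar("M", bound=BaseModel)


class UserInfo(BaseModel):
    name: str
    age: int


def fake_llm(prompt: str) -> str:
    # Hypothetical stand-in for a real LLM call: returns a canned JSON reply.
    return '{"name": "Jason", "age": "25"}'


def extract(prompt: str, schema: Type[M]) -> M:
    # Embed the JSON Schema in the prompt so the model knows the target shape.
    schema_json = json.dumps(schema.model_json_schema())
    full_prompt = f"{prompt}\n\nAnswer with JSON matching this schema:\n{schema_json}"
    raw = fake_llm(full_prompt)
    # Validate the raw text; a malformed reply raises ValidationError.
    return schema.model_validate_json(raw)


user = extract("Jason is 25 years old.", UserInfo)
print(user)  # prints name='Jason' age=25
```

Using a `TypeVar` bound to `BaseModel` keeps the return type tied to the schema that was passed in, matching `(prompt: str, schema: Model) -> Model`.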

### instructor

In [9]:
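import openai
import instructor
from pydantic import BaseModel

# instructor.patch() with no arguments fails with
# "ValueError: Either client or create must be provided".
# With instructor >= 1.0, wrap an OpenAI client instead
# (openai.OpenAI() reads the API key from the OPENAI_API_KEY environment variable).
client = instructor.from_openai(openai.OpenAI())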