# Getting started with Pydantic (and superpower your dataclasses)

2023-02-24

# Hello everyone 👋

# 🛣️ Agenda

1. Dataclass is cool
2. Pydantic is nice
3. Beyond Pydantic - Level II


# 1 - Dataclass is cool

* Dataclasses were introduced by PEP 557.

* Available since Python 3.7 (June 2018)

In [1]:
from dataclasses import dataclass, field
from typing import ClassVar, Optional
from datetime import datetime

@dataclass(order=True, frozen=True, unsafe_hash=True)
class Item:
    id: int
    name: str
    price: float = field(compare=False, hash=False)
    sku: int | None = None
    quantity: int = field(default=0, repr=False, hash=False)
    ingredients: list[str] = field(compare=False, default_factory=list)
    added_at: Optional[datetime] = field(default_factory=datetime.now)
    brand: ClassVar[int] = 19


In [2]:
i = Item(1, "compu", 10)
i


Item(id=1, name='compu', price=10, sku=None, ingredients=[], added_at=datetime.datetime(2023, 2, 28, 9, 51, 54, 707731))
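The decorator options used above (`order`, `frozen`, `compare=False`) are easier to grasp with a small demo. The sketch below uses a trimmed-down `Item` with only three fields; the instance names are made up for illustration.

```python
from dataclasses import dataclass, field, FrozenInstanceError


@dataclass(order=True, frozen=True)
class Item:
    id: int
    name: str
    price: float = field(compare=False)  # ignored by ==, <, >, ...


cheap = Item(1, "compu", 10)
pricey = Item(1, "compu", 999)
later = Item(2, "abc", 5)

# price is excluded from comparisons, so these two items are equal
print(cheap == pricey)  # True
# ordering compares the (id, name) tuples
print(cheap < later)  # True

try:
    cheap.price = 12  # frozen=True forbids attribute assignment
except FrozenInstanceError:
    print("Item is immutable")
```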

## But dataclasses have limitations

In [3]:
import json
with open("./2023-02-24_pydantic.json") as f:
    data = json.load(f)
data

{'items': [{'id': 123,
   'price': 152,
   'quantity': 65,
   'ingredients': ['roquefort', 'camembert'],
   'name': 'you'},
  {'id': 456,
   'price': '245',
   'quantity': -10,
   'ingredients': ['comté', 'st nectaire', 31000],
   'name': 'tartine',
   'added_at': '2021-03-20T00:00:00'}]}

In [4]:
# Read from json
items = [Item(**item) for item in data["items"]]
items

[Item(id=123, name='you', price=152, sku=None, ingredients=['roquefort', 'camembert'], added_at=datetime.datetime(2023, 2, 28, 9, 51, 56, 126017)),
 Item(id=456, name='tartine', price='245', sku=None, ingredients=['comté', 'st nectaire', 31000], added_at='2021-03-20T00:00:00')]

In [5]:
# dot notation
items[0].name

'you'

In [6]:
items[0].__dict__

{'id': 123,
 'name': 'you',
 'price': 152,
 'sku': None,
 'quantity': 65,
 'ingredients': ['roquefort', 'camembert'],
 'added_at': datetime.datetime(2023, 2, 28, 9, 51, 56, 126017)}

But dataclasses give you:
* no type checking at runtime
* no parsing
* no validation
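To make these three points concrete, here is a minimal sketch of the parsing and validation you would have to write by hand in `__post_init__` to get similar guarantees from a plain dataclass. The specific rules (coerce `price` to float, reject negative `quantity`) are assumptions chosen to match the sample JSON, not something the dataclass module provides.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Item:
    id: int
    name: str
    price: float
    quantity: int = 0
    ingredients: List[str] = field(default_factory=list)

    def __post_init__(self):
        # Manual parsing: a dataclass keeps whatever it receives, so coerce here.
        self.price = float(self.price)
        # Manual validation: reject negative quantities.
        if self.quantity < 0:
            raise ValueError("quantity must be >= 0")


ok = Item(id=456, name="tartine", price="245")
print(ok.price)  # 245.0

try:
    Item(id=456, name="tartine", price="245", quantity=-10)
except ValueError as exc:
    print(exc)  # quantity must be >= 0
```

Every new field needs its own hand-written check, which is the boilerplate Pydantic removes.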

# 2 - Pydantic is nice

## Getting started

In [7]:
from dataclasses import dataclass, field
from typing import ClassVar, Optional
from datetime import datetime
from pydantic import BaseModel

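class Item(BaseModel):
    id: int
    name: str
    price: float
    sku: int | None = None
    quantity: int = 0
    ingredients: list[str] = []  # fine here: Pydantic copies mutable defaults per instance
    # Caution: datetime.now() is evaluated once, when the class is defined, so every
    # instance shares this timestamp; Field(default_factory=datetime.now) avoids that.
    added_at: Optional[datetime] = datetime.now()
    brand: ClassVar[int] = 19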


In [8]:
i = Item(id=1, name="compu", price=10)
i
# Pydantic objects don't support positional arguments unless you implement it yourself: https://stackoverflow.com/questions/73156982/allow-positional-arguments-for-basemodel-pydantic

Item(id=1, name='compu', price=10.0, sku=None, quantity=0, ingredients=[], added_at=datetime.datetime(2023, 2, 28, 9, 52, 3, 295250))

In [9]:
import json
with open("./2023-02-24_pydantic.json") as f:
    data = json.load(f)

# Read from json
items = [Item(**item) for item in data["items"]]
items

[Item(id=123, name='you', price=152.0, sku=None, quantity=65, ingredients=['roquefort', 'camembert'], added_at=datetime.datetime(2023, 2, 28, 9, 52, 3, 295250)),
 Item(id=456, name='tartine', price=245.0, sku=None, quantity=-10, ingredients=['comté', 'st nectaire', '31000'], added_at=datetime.datetime(2021, 3, 20, 0, 0))]

In [10]:
item = items[0]
item.dict()

{'id': 123,
 'name': 'you',
 'price': 152.0,
 'sku': None,
 'quantity': 65,
 'ingredients': ['roquefort', 'camembert'],
 'added_at': datetime.datetime(2023, 2, 28, 9, 52, 3, 295250)}
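When a value cannot be parsed, Pydantic raises a `ValidationError` instead of silently keeping the bad value. A minimal sketch of catching it and listing the offending fields, using a shortened `Item` and made-up input values:

```python
from pydantic import BaseModel, ValidationError


class Item(BaseModel):
    id: int
    name: str
    price: float


bad_fields = []
try:
    # price="245" can be parsed to a float, id="not a number" cannot be parsed to an int
    Item(id="not a number", name="tartine", price="245")
except ValidationError as exc:
    # errors() returns one dict per problem; "loc" holds the field path
    bad_fields = [err["loc"][0] for err in exc.errors()]

print(bad_fields)  # ['id']
```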

## Field

In [38]:
# Default
from pydantic import BaseModel, Field

class Item(BaseModel):
    id:int = Field(default=0)
    name: str = Field(default="")
    price:float

i = Item(price=10)
i

Item(id=0, name='', price=10.0)

In [41]:
# Constraints
from pydantic import BaseModel, Field

class Item(BaseModel):
    id:int = Field(default=0)
    name: str = Field(default="", max_length=3)
    price:float = Field(gt=10, le=20, exclude=True)
    ingredients: list[str] = Field(max_items=2)

i = Item(name="compu", price=9, ingredients=["apple", "manzana", "pomme"])
i

ValidationError: 3 validation errors for Item
name
  ensure this value has at most 3 characters (type=value_error.any_str.max_length; limit_value=3)
price
  ensure this value is greater than 10 (type=value_error.number.not_gt; limit_value=10)
ingredients
  ensure this value has at most 2 items (type=value_error.list.max_items; limit_value=2)

In [43]:
i = Item(price=11, ingredients=["apple", "manzana"])
i.dict()
# Price is excluded

{'id': 0, 'name': '', 'ingredients': ['apple', 'manzana']}

* `Field` accepts, among others: default, alias, title, description, exclude, include, gt, ge, lt, le, multiple_of, min_items, max_items, min_length, max_length, regex...

More info at: https://docs.pydantic.dev/usage/schema/#field-customization
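Two of the options listed above, `alias` and `ge`, are sketched below. The key name `itemId` is a hypothetical external payload key, chosen only to show how an alias maps an incoming name onto a Python attribute.

```python
from pydantic import BaseModel, Field, ValidationError


class Item(BaseModel):
    # "itemId" is a hypothetical key used by an external payload
    id: int = Field(alias="itemId")
    quantity: int = Field(default=0, ge=0, description="Units in stock")


item = Item(**{"itemId": 7, "quantity": 3})
print(item.id, item.quantity)  # 7 3

try:
    Item(**{"itemId": 7, "quantity": -1})  # violates ge=0
except ValidationError as exc:
    print(exc.errors()[0]["loc"])  # ('quantity',)
```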

### Literal

In [12]:
# Literal
from dataclasses import dataclass, field
from typing import Literal

@dataclass
class Item():
    id:int
    name: Literal["apple", "ddmanzana"]
    price:float

In [14]:
i = Item(1, "aa", 3)
i
# This should fail, but a dataclass does not enforce Literal at runtime

Item(id=1, name='aa', price=3)

In [17]:
from dataclasses import dataclass, field
from typing import ClassVar, Optional
from datetime import datetime
from pydantic import BaseModel

class Item(BaseModel):
    id:int | str
    name: Literal["apple", "pomme"]
    price:float


In [18]:
i = Item(id=1, name="compu", price=10)
i

ValidationError: 1 validation error for Item
name
  unexpected value; permitted: 'apple', 'pomme' (type=value_error.const; given=compu; permitted=('apple', 'pomme'))

In [19]:
i = Item(id=1, name="apple", price=10)
i

Item(id=1, name='apple', price=10.0)

### Required Fields

In [25]:
# By default all annotated fields are required
from pydantic import BaseModel

class Item(BaseModel):
    id:int 
    name: str 
    price:float


In [26]:
i = Item(id=1)
i

ValidationError: 2 validation errors for Item
name
  field required (type=value_error.missing)
price
  field required (type=value_error.missing)

In [27]:
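# Optional: in Pydantic v1, a field annotated `int | None` without a default is not required and defaults to None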
from pydantic import BaseModel

class Item(BaseModel):
    id:int | None
    name: str 
    price:float


In [30]:
i = Item(name=1, price=10)
i

Item(id=None, name='1', price=10.0)

In [44]:
# Required but nullable: with an ellipsis (...) as default, id must be passed explicitly, even if it is None
from pydantic import BaseModel

class Item(BaseModel):
    id:int | None = ...
    name: str 
    price:float


In [45]:
i = Item(name=1, price=10)
i

ValidationError: 1 validation error for Item
id
  field required (type=value_error.missing)

In [34]:
i = Item(id=None, name=1, price=10)
i

Item(id=None, name='1', price=10.0)

In [None]:
# Required via Field(...): Item(price=10) below raises a ValidationError because id is missing
from pydantic import BaseModel, Field

class Item(BaseModel):
    id:int = Field(default=...)
    name: str = Field(default="")
    price:float

i = Item(price=10)
i
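The cell above has no recorded output. As a sketch of what to expect, the same model can be exercised inside a `try` block to see which field is reported as missing:

```python
from pydantic import BaseModel, Field, ValidationError


class Item(BaseModel):
    id: int = Field(...)  # the ellipsis marks the field as required
    name: str = Field(default="")
    price: float


missing = []
try:
    Item(price=10)
except ValidationError as exc:
    missing = [err["loc"][0] for err in exc.errors()]

print(missing)  # ['id']
print(Item(id=1, price=10).name == "")  # True
```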

## Validators

In [59]:
from pydantic import BaseModel, validator


class Item(BaseModel):
    id:int
    name: str
    price:float
    sku: int | None = None
    quantity: int = 0
    ingredients: list[str] = []
    added_at : Optional[datetime] = datetime.now()
    brand: ClassVar[int] = 19
    
    @validator('name')
    def name_must_contain_space(cls, v):
        if ' ' not in v:
            raise ValueError('must contain a space')
        return v.title()

In [60]:
i = Item(id=1, name="compu", price=10)
i

ValidationError: 1 validation error for Item
name
  must contain a space (type=value_error)

In [61]:
i = Item(id=1, name="computer asus", price=10)
i

Item(id=1, name='Computer Asus', price=10.0, sku=None, quantity=0, ingredients=[], added_at=datetime.datetime(2023, 2, 27, 10, 34, 38, 482242))

In [62]:
from pydantic import BaseModel, ValidationError, validator


class Item(BaseModel):
    id:int
    name: str
    price:float
    sku: int | None = None
    quantity: int = 0
    ingredients: list[str] = []
    added_at : Optional[datetime] = datetime.now()
    brand: ClassVar[int] = 19
    
    @validator('sku', 'quantity')
    def check_int(cls, v):
        if v < 0:
            raise ValueError('value must be greater than or equal to 0')
        return v

In [63]:
i = Item(id=1, name="computer asus", price=10)
i

Item(id=1, name='computer asus', price=10.0, sku=None, quantity=0, ingredients=[], added_at=datetime.datetime(2023, 2, 27, 10, 34, 39, 91602))

In [64]:
i = Item(id=1, name="computer asus", price=10, sku=-1)
i

ValidationError: 1 validation error for Item
sku
  value must be greater than or equal to 0 (type=value_error)

In [65]:
i = Item(id=1, name="computer asus", price=10, quantity=-1)
i

ValidationError: 1 validation error for Item
quantity
  value must be greater than or equal to 0 (type=value_error)

In [66]:
# Validate each item of a list
from pydantic import BaseModel, ValidationError, validator


class Item(BaseModel):
    id:int
    name: str
    price:float
    sku: int | None = None
    quantity: int = 0
    ingredients: list[str] = []
    added_at : Optional[datetime] = datetime.now()
    brand: ClassVar[int] = 19
    
    @validator('ingredients', each_item=True)
    def check_ingredients(cls, v):
        new_item = v.title()
        return new_item

In [67]:
i = Item(id=1, name="computer asus", price=10, ingredients=["camembert", "comté"])
i

Item(id=1, name='computer asus', price=10.0, sku=None, quantity=0, ingredients=['Camembert', 'Comté'], added_at=datetime.datetime(2023, 2, 27, 10, 34, 40, 992465))

In [76]:
# Root validators: validate the model as a whole
from pydantic import BaseModel, ValidationError, root_validator


class Item(BaseModel):
    id:int
    name: str
    price:float
    sku: int | None = None
    quantity: int = 0
    ingredients: list[str] = []
    added_at : Optional[datetime] = datetime.now()
    brand: ClassVar[int] = 19
    
    @root_validator(pre=True)  # pre=True: runs before field validation (default: False)
    def check_sku_or_quantity(cls, values):
        if "sku" not in values and "quantity" not in values:
            raise ValueError("Item should have either sku or quantity")
        return values

In [77]:
i = Item(id=1, name="computer asus", price=10, quantity=10)
i

Item(id=1, name='computer asus', price=10.0, sku=None, quantity=10, ingredients=[], added_at=datetime.datetime(2023, 2, 27, 10, 34, 57, 181930))

In [78]:
i = Item(id=1, name="computer asus", price=10, sku=1)
i

Item(id=1, name='computer asus', price=10.0, sku=1, quantity=0, ingredients=[], added_at=datetime.datetime(2023, 2, 27, 10, 34, 57, 181930))

In [79]:
i = Item(id=1, name="computer asus", price=10)
i

ValidationError: 1 validation error for Item
__root__
  Item should have either sku or quantity (type=value_error)
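The key-presence check above relies on `pre=True`, because the raw input only contains the keys the caller passed. A root validator without `pre=True` receives the validated values, defaults included, so the same rule has to look at the values themselves. The sketch below assumes the v1-style `root_validator` API used in this notebook and a trimmed-down `Item`; the rule "sku set or quantity non-zero" is one possible reading of the check.

```python
from typing import Optional

from pydantic import BaseModel, ValidationError, root_validator


class Item(BaseModel):
    name: str
    sku: Optional[int] = None
    quantity: int = 0

    @root_validator(skip_on_failure=True)  # runs after field validation
    def check_sku_or_quantity(cls, values):
        # values already contains the defaults, so test the values, not the keys
        if values.get("sku") is None and values.get("quantity", 0) == 0:
            raise ValueError("Item should have either sku or quantity")
        return values


print(Item(name="computer asus", quantity=10).quantity)  # 10

rejected = False
try:
    Item(name="computer asus")
except ValidationError as exc:
    rejected = "Item should have either sku or quantity" in str(exc)

print(rejected)  # True
```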

## Model Config

In [81]:
from pydantic import BaseModel, ValidationError, root_validator


class Item(BaseModel):
    id:int
    name: str
    price:float

    class Config:
        max_anystr_length = 10  # applies to every str/bytes field (a model-wide version of the max_length seen before)
      

In [82]:
i = Item(id=1, name="computer", price=10)
i

Item(id=1, name='computer', price=10.0)

In [83]:
i = Item(id=1, name="computer acheté récemment", price=10)
i

ValidationError: 1 validation error for Item
name
  ensure this value has at most 10 characters (type=value_error.any_str.max_length; limit_value=10)

In [None]:
# Change behaviour globally
from pydantic import BaseModel as PydanticBaseModel


class BaseModel(PydanticBaseModel):
    class Config:
        arbitrary_types_allowed = True


class MyClass:
    """A random class"""


class Model(BaseModel):
    x: MyClass
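The cell above only defines the classes. A short usage sketch, repeating the definitions so it runs on its own: with `arbitrary_types_allowed`, Pydantic accepts the custom class as a field type and checks values with a plain `isinstance` test.

```python
from pydantic import BaseModel as PydanticBaseModel, ValidationError


class BaseModel(PydanticBaseModel):
    class Config:
        arbitrary_types_allowed = True


class MyClass:
    """A random class"""


class Model(BaseModel):
    x: MyClass


m = Model(x=MyClass())
print(isinstance(m.x, MyClass))  # True

try:
    Model(x="not a MyClass")
except ValidationError:
    print("x must be a MyClass instance")
```

Every model that inherits from this custom `BaseModel` picks up the same configuration.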

## Schema

In [84]:
from pydantic import BaseModel, ValidationError, root_validator


class Item(BaseModel):
    id:int
    name: str
    price:float
    sku: int | None = None
    quantity: int = 0
    ingredients: list[str] = []
    added_at : Optional[datetime] = datetime.now()
    brand: ClassVar[int] = 19


In [86]:
print(Item.schema_json(indent=2))

{
  "title": "Item",
  "type": "object",
  "properties": {
    "id": {
      "title": "Id",
      "type": "integer"
    },
    "name": {
      "title": "Name",
      "type": "string"
    },
    "price": {
      "title": "Price",
      "type": "number"
    },
    "sku": {
      "title": "Sku",
      "type": "integer"
    },
    "quantity": {
      "title": "Quantity",
      "default": 0,
      "type": "integer"
    },
    "ingredients": {
      "title": "Ingredients",
      "default": [],
      "type": "array",
      "items": {
        "type": "string"
      }
    },
    "added_at": {
      "title": "Added At",
      "default": "2023-02-27T10:39:23.503610",
      "type": "string",
      "format": "date-time"
    }
  },
  "required": [
    "id",
    "name",
    "price"
  ]
}
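The same schema is also available as a Python dict, which is handy for checking it programmatically. A minimal sketch, assuming the v1-style `schema()` method that goes with `schema_json()` above, on a shortened `Item`:

```python
from typing import Optional

from pydantic import BaseModel


class Item(BaseModel):
    id: int
    name: str
    price: float
    sku: Optional[int] = None
    quantity: int = 0


schema = Item.schema()
# Fields without a default end up in "required"
print(schema["required"])  # ['id', 'name', 'price']
# Every field, required or not, is described under "properties"
print(sorted(schema["properties"]))
```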



# 3 - Beyond Pydantic - Level II

## Generate code

`pip install datamodel-code-generator`

`datamodel-codegen --input /input/ --input-file-type jsonschema --output /output/`


Supported input types:
* OpenAPI 3 (YAML/JSON, OpenAPI Data Type)
* JSON Schema (JSON Schema Core/JSON Schema Validation)
* JSON/YAML/CSV Data (it will be converted to JSON Schema)
* Python dictionary (it will be converted to JSON Schema)


Supported output types:
* pydantic.BaseModel
* dataclasses.dataclass



More info at https://github.com/koxudaxi/datamodel-code-generator

## Validate DAG files

```yaml
default_schedule_time : "30 0 * * *"
start_date: 2021-02-01 00:00:00
dag_frequency: daily
description: Dag to run my awesome ELT job
task_concurrency: 1
max_active_dag_runs: 1

data_points:
    - name: my_data_point
      external_tasks_to_wait:
        - dag_id: my_dag_id
          task_id: my_task_id
```

In [None]:
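from datetime import datetime
from typing import Optional, List, Literal
from pydantic import BaseModel

# ExternalTask and DataPoints are inferred from the YAML sample above
class ExternalTask(BaseModel):
    """An upstream task the data point waits for"""

    dag_id: str
    task_id: str

class DataPoints(BaseModel):
    """One entry of the data_points list"""

    name: str
    external_tasks_to_wait: List[ExternalTask]

class DagDefinition(BaseModel):
    """Dag yaml file definition"""

    default_schedule_time: str
    start_date: datetime
    dag_frequency: Literal["daily", "hourly"]
    description: Optional[str]
    task_concurrency: Optional[int]
    max_active_dag_runs: Optional[int]

    data_points: List[DataPoints]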
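A sketch of how such a model could validate a DAG file. It assumes the YAML has already been loaded into a dict (for example with PyYAML's `yaml.safe_load`); the dict below mirrors the sample file, and the `ExternalTask` and `DataPoints` models are inferred from that sample rather than taken from a real project.

```python
from datetime import datetime
from typing import List, Literal, Optional

from pydantic import BaseModel, ValidationError


class ExternalTask(BaseModel):
    dag_id: str
    task_id: str


class DataPoints(BaseModel):
    name: str
    external_tasks_to_wait: List[ExternalTask]


class DagDefinition(BaseModel):
    default_schedule_time: str
    start_date: datetime
    dag_frequency: Literal["daily", "hourly"]
    description: Optional[str] = None
    task_concurrency: Optional[int] = None
    max_active_dag_runs: Optional[int] = None
    data_points: List[DataPoints]


# What a YAML loader would typically return for the sample file
config = {
    "default_schedule_time": "30 0 * * *",
    "start_date": datetime(2021, 2, 1),
    "dag_frequency": "daily",
    "description": "Dag to run my awesome ELT job",
    "task_concurrency": 1,
    "max_active_dag_runs": 1,
    "data_points": [
        {
            "name": "my_data_point",
            "external_tasks_to_wait": [
                {"dag_id": "my_dag_id", "task_id": "my_task_id"}
            ],
        }
    ],
}

dag = DagDefinition(**config)
print(dag.data_points[0].external_tasks_to_wait[0].dag_id)  # my_dag_id

try:
    # "weekly" is not one of the allowed Literal values
    DagDefinition(**dict(config, dag_frequency="weekly"))
except ValidationError as exc:
    print(exc.errors()[0]["loc"])  # ('dag_frequency',)
```

Running this check in CI would catch a malformed DAG file before it reaches the scheduler.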

# 😎 Conclusion




Pydantic
* Enforces type hints at runtime
* Validates complex data structures
* Validates custom business logic with validators
* Ensures all the implementations follow the same data structure

# 📚 Sources

* Simple data validation and setting management with Pydantic - presented by Teddy Crepineau
     * https://www.youtube.com/watch?v=bLqU_1u_4Qw
* Do we still need dataclasses? // PYDANTIC tutorial
     * https://www.youtube.com/watch?v=Vj-iU-8_xLs
* Does Pydantic Replace Dataclasses in Python?
     * https://www.youtube.com/watch?v=Nlhp4EmE55I 
* Pydantic is the Python Package I Wish I'd Learned Earlier
     * https://www.youtube.com/watch?v=_1Nf9KNhsPw 
* Parsing data with Pydantic - A Python short by Michael Kennedy
     * https://www.youtube.com/watch?v=aHv7-6WIxNM 




