# (2) Using Pydantic

Pydantic is...

> Data validation and settings management using Python type annotations.
> 
> pydantic enforces type hints at runtime, and provides user friendly errors when data is invalid.
> 
> Define how data should be in pure, canonical Python; validate it with pydantic.

*https://docs.pydantic.dev*

Let's use Pydantic to load and validate our data. To get started, we will still read the file with the built-in `json` module.

In [22]:
```python
import json
from pathlib import Path

from rich.pretty import pprint
from rich import inspect


json_data = json.loads(Path("data.json").read_text())
inspect(json_data)
```
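The cell above expects a `data.json` file next to the notebook. To follow along without it, here is a minimal sketch that writes a stand-in file to a temporary directory and loads it the same way. The contents are a hypothetical reconstruction from the outputs shown later in this notebook; storing `year_of_birth` as a string is an assumption, chosen so the type coercion discussed below has something to convert.

```python
import json
import tempfile
from pathlib import Path

# Hypothetical stand-in for data.json, reconstructed from later outputs;
# the real file may differ.
AUTHOR = {
    "name": "Bruce Willis",
    "year_of_birth": "2019",  # assumption: stored as a string in the raw file
    "books": [{"title": "My First book", "publisher": "Big Books", "year": 2020}],
}
SAMPLE = {"authors": [AUTHOR, AUTHOR]}

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "data.json"
    path.write_text(json.dumps(SAMPLE))
    # Same loading step as the cell above: read the text, then parse it.
    json_data = json.loads(path.read_text())

print(type(json_data).__name__)  # dict
```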

Next, we need to define Pydantic models. This may seem like more work but...

- It will save you time in the long run
  - Easier to write code that uses the data
  - Easier to refactor the code
  - Fewer errors
- And you can even cheat by using tools like <https://jsontopydantic.com>

In [23]:
```python
from typing import List

from pydantic import BaseModel

class Book(BaseModel):
    title: str
    publisher: str
    year: int


class Author(BaseModel):
    name: str
    year_of_birth: int
    books: List[Book]


class ApiResponse(BaseModel):
    authors: List[Author]
```

Now you can easily parse and validate your API response!

In [24]:
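```python
# Unpack the dict into the model; Pydantic v2 also offers ApiResponse.model_validate(json_data)
data = ApiResponse(**json_data)
pprint(data)
```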
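Because `Author.books` is annotated as `List[Book]`, Pydantic builds `Book` instances for the nested dicts instead of leaving them as plain dictionaries. A small self-contained sketch, re-declaring the models from above and using a hypothetical one-author payload with the same shape as `data.json`:

```python
from typing import List

from pydantic import BaseModel


class Book(BaseModel):
    title: str
    publisher: str
    year: int


class Author(BaseModel):
    name: str
    year_of_birth: int
    books: List[Book]


class ApiResponse(BaseModel):
    authors: List[Author]


# Hypothetical payload mirroring the structure of data.json.
payload = {
    "authors": [
        {
            "name": "Bruce Willis",
            "year_of_birth": 2019,
            "books": [{"title": "My First book", "publisher": "Big Books", "year": 2020}],
        }
    ]
}

response = ApiResponse(**payload)
first_book = response.authors[0].books[0]
print(type(first_book).__name__)  # Book
print(first_book.title)  # My First book
```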

Now you can explore the data using Python attributes instead of dictionary keys, and because the fields are declared on the models, your editor can autocomplete them.

In [25]:
```python
data.authors[0].name
```

```
'Bruce Willis'
```

In [26]:
```python
data.authors[1].books[0].title
```

```
'My First book'
```

Pydantic has also coerced the values to the declared types: using the type annotations, it attempts to convert each value to the annotated type, for example turning a numeric string into an `int`.

In [27]:
```python
pprint(data.authors[0].year_of_birth)
pprint(json_data["authors"][0]["year_of_birth"])
```
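A minimal sketch of that coercion in isolation, re-declaring `Book` so the cell runs on its own. The input values are made up; in its default (non-strict) mode, Pydantic converts a numeric string such as `"2020"` to an `int` in both v1 and v2.

```python
from pydantic import BaseModel


class Book(BaseModel):
    title: str
    publisher: str
    year: int


# Hypothetical raw input where the year arrives as a string.
raw = {"title": "Hello world", "publisher": "O Willy", "year": "2020"}
book = Book(**raw)

print(repr(raw["year"]))  # '2020'  (str in the input dict)
print(repr(book.year))    # 2020    (int on the model)
```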

If it cannot convert a value, it raises a `ValidationError`. My IDE also warns me that I am passing an incorrect type.

In [28]:
```python
from pydantic import ValidationError

try:
    Book(
        title="Hello world",
        publisher="O Willy",
        year="two-thousand",  # cannot be converted to an int
    )
except ValidationError as e:
    pprint(e)
```
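Catching `ValidationError` rather than a bare `Exception` also lets you inspect each problem programmatically: `.errors()` returns a list of dicts with (at least) `loc`, `msg` and `type` keys, and Pydantic reports every failing field at once rather than stopping at the first. A sketch with a made-up input that both omits `publisher` and passes a bad `year`; the exact message wording differs between Pydantic v1 and v2, so no output is shown.

```python
from pydantic import BaseModel, ValidationError


class Book(BaseModel):
    title: str
    publisher: str
    year: int


try:
    # Hypothetical bad input: publisher is missing and year is not an int.
    Book(title="Hello world", year="two-thousand")
except ValidationError as exc:
    errors = exc.errors()
    # Each entry names the failing field (loc) and explains why (msg).
    for err in errors:
        print(err["loc"], err["msg"])
```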

Pydantic can also convert a model back into a plain Python dict or a JSON string. This is useful for passing the data to downstream libraries or tools.

In [29]:
```python
# convert to json (Pydantic v1 API; in v2 the equivalent is data.model_dump_json())
data.json()
```

```
'{"authors": [{"name": "Bruce Willis", "year_of_birth": 2019, "books": [{"title": "My First book", "publisher": "Big Books", "year": 2020}]}, {"name": "Bruce Willis", "year_of_birth": 2019, "books": [{"title": "My First book", "publisher": "Big Books", "year": 2020}]}]}'
```

In [30]:
```python
# convert to a dict (Pydantic v1 API; in v2 the equivalent is data.model_dump())
data.dict()
```

```
{'authors': [{'name': 'Bruce Willis',
   'year_of_birth': 2019,
   'books': [{'title': 'My First book',
     'publisher': 'Big Books',
     'year': 2020}]},
  {'name': 'Bruce Willis',
   'year_of_birth': 2019,
   'books': [{'title': 'My First book',
     'publisher': 'Big Books',
     'year': 2020}]}]}
```
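Because the dict and JSON forms keep the model's field names, you can feed them straight back into the model and get an equal object. A minimal round-trip sketch; `to_dict` is a hypothetical helper that calls `model_dump()` on Pydantic v2 and falls back to `.dict()` on v1, so the cell runs on either version.

```python
import json
from typing import List

from pydantic import BaseModel


class Book(BaseModel):
    title: str
    publisher: str
    year: int


class Author(BaseModel):
    name: str
    year_of_birth: int
    books: List[Book]


def to_dict(model: BaseModel) -> dict:
    """Hypothetical helper: use the v2 method if present, else the v1 one."""
    if hasattr(model, "model_dump"):
        return model.model_dump()
    return model.dict()


author = Author(
    name="Bruce Willis",
    year_of_birth=2019,
    books=[Book(title="My First book", publisher="Big Books", year=2020)],
)

as_dict = to_dict(author)
as_json = json.dumps(as_dict)

# Both serialized forms validate back into an equal model instance.
from_dict = Author(**as_dict)
from_json = Author(**json.loads(as_json))
print(from_dict == author, from_json == author)  # True True
```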

In [31]:
```python
# convert into a dataframe (one row per author)
import pandas as pd

df = pd.DataFrame.from_records([author.dict() for author in data.authors])
df
```

|   | name | year_of_birth | books |
|---|------|---------------|-------|
| 0 | Bruce Willis | 2019 | [{'title': 'My First book', 'publisher': 'Big ... |
| 1 | Bruce Willis | 2019 | [{'title': 'My First book', 'publisher': 'Big ... |
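The `books` column still holds a list of dicts per author. If you would rather have one row per book, pandas' `json_normalize` can flatten the nested records. A sketch using a literal copy of the `data.dict()["authors"]` output above, so it runs without the notebook state; in the notebook you could pass `data.dict()["authors"]` directly.

```python
import pandas as pd

# Copy of the data.dict()["authors"] output shown above.
authors = [
    {
        "name": "Bruce Willis",
        "year_of_birth": 2019,
        "books": [{"title": "My First book", "publisher": "Big Books", "year": 2020}],
    },
    {
        "name": "Bruce Willis",
        "year_of_birth": 2019,
        "books": [{"title": "My First book", "publisher": "Big Books", "year": 2020}],
    },
]

# record_path picks the nested list to expand into rows;
# meta copies the parent (author) fields onto each book row.
books_df = pd.json_normalize(authors, record_path="books", meta=["name", "year_of_birth"])
print(books_df)
```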
