# Pydantic:
- Pydantic is a Python library that handles data validation and settings management using type-annotated class fields. In this post, we will cover the basics of Pydantic and see how to use it to model and validate JSON data coming from an external source. We'll see how to create constrained fields, write custom field validators, and export models to JSON and dictionaries.

- When dealing with data from files, external APIs, or users, we often need to validate it and convert values from one type to another, for example turning a string representation of a number into an integer or a float. We may also need to account for optional fields, fields with dynamic default values, and fields with very specific validation rules.

- **Sample data:**
```
{
    "id": "d15782d9-3d8f-4624-a88b-c8e836569df8",
    "name": "Eric Travis",
    "date_of_birth": "1995-05-25",
    "GPA": "3.0",
    "course": "Computer Science",
    "department": "Science and Engineering",
    "fees_paid": false
}
```

- There are a few things we need to account for with this data:

    1. The **date_of_birth** is represented as a string in the dataset - we want to convert this to a date object.
    2. The **GPA** is also represented as a string - we want to convert this to a floating-point number between 0 and 4.
    3. The **course** field is potentially null for some records, as some students may not yet be assigned to a course. We need to make this field optional.
    4. The **department** field should be constrained to a small set of permissible values, since the university/college only has a small number of departments.
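
The first three points rely on Pydantic converting (coercing) input values to the annotated types. A minimal sketch of that behaviour, assuming Pydantic V2; `Record` is a hypothetical toy model used only for illustration, not the `Student` model built later:

```python
from datetime import date
from typing import Optional

from pydantic import BaseModel


class Record(BaseModel):  # hypothetical toy model
    date_of_birth: date            # ISO date strings are parsed into date objects
    GPA: float                     # numeric strings are parsed into floats
    course: Optional[str] = None   # may be a string or None


record = Record(date_of_birth="1995-05-25", GPA="3.0", course=None)

print(type(record.date_of_birth).__name__, record.date_of_birth)  # date 1995-05-25
print(type(record.GPA).__name__, record.GPA)                      # float 3.0
print(record.course)                                              # None
```

The range check on `GPA` and the restriction on `department` need more than plain type annotations, which is where constrained types and enums come in.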

In [6]:
import requests
from pprint import pprint

url = 'https://raw.githubusercontent.com/bugbytes-io/datasets/master/students_v1.json'
response = requests.get(url)
data = response.json()
pprint(data)

[{'GPA': '3.0',
  'course': 'Computer Science',
  'date_of_birth': '1995-05-25',
  'department': 'Science and Engineering',
  'fees_paid': False,
  'id': 'd15782d9-3d8f-4624-a88b-c8e836569df8',
  'name': 'Eric Travis'},
 {'GPA': '2.5',
  'course': None,
  'date_of_birth': '1996-02-10',
  'department': 'Science and Engineering',
  'fees_paid': True,
  'id': '4c7b4c43-c863-4855-abc0-3657c078ce23',
  'name': 'Mark Smith'},
 {'GPA': '3.5',
  'course': 'Biology',
  'date_of_birth': '1996-10-01',
  'department': 'Life Sciences',
  'fees_paid': False,
  'id': '5cd9ad59-fcf1-462c-8863-282a9fb693e4',
  'name': 'Marissa Barker'},
 {'GPA': '3.23',
  'course': 'Philosophy',
  'date_of_birth': '1994-08-22',
  'department': 'Arts and Humanities',
  'fees_paid': True,
  'id': '48dda775-785d-41e3-b0dd-26a4a2f7722f',
  'name': 'Justin Holden'},
 {'GPA': '3.9',
  'course': 'Film Studies',
  'date_of_birth': '1995-08-05',
  'department': 'Arts and Humanities',
  'fees_paid': True,
  'id': '7ffe2ceb-562b-

### Things to notice in the above data
- `id` - this should be a UUID object (a universally unique identifier). Python's standard library has a `uuid` module that we can use to validate this field.
- `name` - a string.
- `course` - a string that is **null** for some records. Hint: use a **Union** of string and null.
- `department` - a string restricted to the university's small set of departments. Hint: **enum**.
- `GPA` - this should be a floating-point number between 0 and 4. Hint: **constrained** types.
- `date_of_birth` - this should be a date object (we should not let anyone under 16 enroll). Hint: **custom validators**.
- `fees_paid` - a boolean field.
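
The age rule hinted at for `date_of_birth` can be sketched as a custom validator. The version below is only one possible approach, assuming Pydantic V2: it counts whole calendar years instead of days, so leap years do not shift the cutoff. `Applicant`, `age_on` and `MINIMUM_AGE` are hypothetical names introduced for this illustration.

```python
from datetime import date

from pydantic import BaseModel, ValidationError, field_validator

MINIMUM_AGE = 16  # hypothetical constant for the enrollment rule


def age_on(born: date, today: date) -> int:
    """Whole years elapsed between `born` and `today` (hypothetical helper)."""
    # Subtract one year if this year's birthday has not happened yet.
    had_birthday = (today.month, today.day) >= (born.month, born.day)
    return today.year - born.year - (0 if had_birthday else 1)


class Applicant(BaseModel):  # hypothetical toy model
    date_of_birth: date

    @field_validator("date_of_birth")
    @classmethod
    def ensure_16_or_over(cls, value: date) -> date:
        if age_on(value, date.today()) < MINIMUM_AGE:
            raise ValueError("Too young to enroll")
        return value


print(Applicant(date_of_birth="1995-05-25").date_of_birth)  # accepted

try:
    Applicant(date_of_birth=date.today())  # born today: rejected
except ValidationError as exc:
    print(exc.errors()[0]["msg"])
```

Raising `ValueError` inside the validator is what turns the rule into a regular `ValidationError` for the caller.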

In [22]:
import uuid
from datetime import datetime, date, timedelta
from pydantic import BaseModel, Field, confloat, field_validator
from typing import Union
from enum import Enum

class DepartmentEnum(Enum):
    ARTS_AND_HUMANITIES = 'Arts and Humanities'
    LIFE_SCIENCES = 'Life Sciences'
    SCIENCE_AND_ENGINEERING = 'Science and Engineering'

class Student(BaseModel):
    id : uuid.UUID
    name : str
    date_of_birth : date
    GPA : confloat(ge=0, le=4)      # constrained float: 0 <= GPA <= 4
    course : Union[str, None]       # course can contain either str or null values
    department : DepartmentEnum
    fees_paid : bool

    @field_validator('date_of_birth')
    @classmethod
    def ensure_16_or_over(cls, value):
        # Approximate cutoff: 16 * 365 days ignores leap days, so it is a few days short of 16 full years
        sixteen_years_ago = datetime.now() - timedelta(days=365*16) # datetime object
        sixteen_years_ago = sixteen_years_ago.date()  # date
        if value > sixteen_years_ago:
            raise ValueError("Too young to enroll")
        return value

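for student in data:
    model = Student(**student)
    print(model)
    print('\n')
    pprint(model.model_dump())       # Pydantic V2 name for the deprecated .dict()
    print('\n')
    pprint(model.model_dump_json())  # Pydantic V2 name for the deprecated .json()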

id=UUID('d15782d9-3d8f-4624-a88b-c8e836569df8') name='Eric Travis' date_of_birth=datetime.date(1995, 5, 25) GPA=3.0 course='Computer Science' department=<DepartmentEnum.SCIENCE_AND_ENGINEERING: 'Science and Engineering'> fees_paid=False


{'GPA': 3.0,
 'course': 'Computer Science',
 'date_of_birth': datetime.date(1995, 5, 25),
 'department': <DepartmentEnum.SCIENCE_AND_ENGINEERING: 'Science and Engineering'>,
 'fees_paid': False,
 'id': UUID('d15782d9-3d8f-4624-a88b-c8e836569df8'),
 'name': 'Eric Travis'}


('{"id":"d15782d9-3d8f-4624-a88b-c8e836569df8","name":"Eric '
 'Travis","date_of_birth":"1995-05-25","GPA":3.0,"course":"Computer '
 'Science","department":"Science and Engineering","fees_paid":false}')
id=UUID('4c7b4c43-c863-4855-abc0-3657c078ce23') name='Mark Smith' date_of_birth=datetime.date(1996, 2, 10) GPA=2.5 course=None department=<DepartmentEnum.SCIENCE_AND_ENGINEERING: 'Science and Engineering'> fees_paid=True


{'GPA': 2.5,
 'course': None,
 'date_of_birth': datetime.date

## 2. Nested Module and Literal Typing

In [35]:
import requests

url = 'https://raw.githubusercontent.com/bugbytes-io/datasets/master/students_v2.json'
response = requests.get(url)
data = response.json()
pprint(data)

[{'GPA': '3.0',
  'course': 'Computer Science',
  'date_of_birth': '1995-05-25',
  'department': 'Science and Engineering',
  'fees_paid': False,
  'id': 'd15782d9-3d8f-4624-a88b-c8e836569df8',
  'modules': [{'credits': 20,
               'id': 1,
               'name': 'Data Science and Machine Learning',
               'professor': 'Prof. Susan Love',
               'registration_code': 'abc'},
              {'credits': 20,
               'id': 'e96e86a6-c4e0-4441-af43-0c22cc472e18',
               'name': 'Web Development',
               'professor': 'Prof. James Herman',
               'registration_code': 'abc'},
              {'credits': 10,
               'id': 3,
               'name': 'Relational Databases and SQL',
               'professor': 'Prof. Samantha Curtis',
               'registration_code': 'abc'}],
  'name': 'Eric Travis'},
 {'GPA': '2.5',
  'course': None,
  'date_of_birth': '1996-02-10',
  'department': 'Science and Engineering',
  'fees_paid': True,
  'id': '

- Here we can see that each entry in **modules** is a nested JSON object:
```
{
    "id": 1,
    "name": "Data Science and Machine Learning",
    "professor": "Prof. Susan Love",
    "credits": 20,
    "registration_code": "abc"
}
```
- Each student record has a list of three of these modules, if the student has chosen a course. There are a few quirks here that we're going to cover with tools provided by Pydantic:

    1. The `id` field can be either an integer (as above), or a UUID.
    2. The `credits` field can only have the possible values of 10 or 20.
    3. If a student has not chosen a course, then the `modules` list will not exist in the data.
    4. If a student has `modules`, there must be exactly **3 modules** for the academic year.
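
The four quirks above can be tried out in isolation. This is a minimal sketch, assuming Pydantic V2; `Enrollment` is a hypothetical stand-in for the student model, and `Module` is trimmed to the fields that matter here.

```python
import uuid
from typing import List, Literal, Union

from pydantic import BaseModel, ValidationError, field_validator


class Module(BaseModel):
    id: Union[int, uuid.UUID]   # quirk 1: integer or UUID
    name: str
    credits: Literal[10, 20]    # quirk 2: only 10 or 20 allowed


class Enrollment(BaseModel):  # hypothetical stand-in for the student model
    modules: List[Module] = []  # quirk 3: the key may be missing entirely

    @field_validator("modules")
    @classmethod
    def zero_or_three(cls, value: List[Module]) -> List[Module]:
        # quirk 4: either no modules at all, or exactly three
        if len(value) not in (0, 3):
            raise ValueError("expected 0 or 3 modules")
        return value


good = [
    {"id": 1, "name": "Data Science and Machine Learning", "credits": 20},
    {"id": "e96e86a6-c4e0-4441-af43-0c22cc472e18", "name": "Web Development", "credits": 20},
    {"id": 3, "name": "Relational Databases and SQL", "credits": 10},
]

enrollment = Enrollment(modules=good)
print([type(m.id).__name__ for m in enrollment.modules])  # ['int', 'UUID', 'int']
print(Enrollment().modules)                               # []

# One module only, then three modules with a disallowed credit value.
for bad in ([good[0]], [{**good[0], "credits": 15}] * 3):
    try:
        Enrollment(modules=bad)
    except ValidationError as exc:
        print(exc.error_count(), "validation error(s)")
```

Note that the length check only runs once every module in the list has passed its own validation.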

In [47]:
from typing import Optional, Literal, Union, List
from datetime import datetime, date, timedelta
import uuid
from pydantic import BaseModel, confloat, Field, field_validator
from enum import Enum


class DepartmentEnum(Enum):
    ARTS_AND_HUMANITIES = 'Arts and Humanities'
    LIFE_SCIENCES = 'Life Sciences'
    SCIENCE_AND_ENGINEERING = 'Science and Engineering'

class Module(BaseModel): 
    id : Union[int, uuid.UUID]
    name : str
    professor : str
    credits : Literal[10,20]  # must be exactly 10 or 20
    registration_code : str

class Student(BaseModel):
    id : uuid.UUID
    name : str
    date_of_birth : date
    course : Optional[str]
    department : DepartmentEnum
    GPA : confloat(ge=0, le=4)
    fees_paid : bool
    modules : List[Module]=[]  # either empty or exactly 3 modules (enforced by validate_length below)

    @field_validator('date_of_birth')
    def ensure_age(cls, value):
        sixteen_years_ago = datetime.now() - timedelta(days=365*16)
        sixteen_years_ago = sixteen_years_ago.date()
        if value > sixteen_years_ago:
            raise ValueError('Too young to enroll the course')
        return value

    @field_validator('modules')
    def validate_length(cls,value):
        if len(value) and len(value)!=3:
            raise ValueError('List of Modules should have length 3 or 0')
        return value


for student in data:
    model = Student(**student)
    pprint(model.model_dump_json())  # Pydantic V2 name for the deprecated .json()

('{"id":"d15782d9-3d8f-4624-a88b-c8e836569df8","name":"Eric '
 'Travis","date_of_birth":"1995-05-25","course":"Computer '
 'Science","department":"Science and '
 'Engineering","GPA":3.0,"fees_paid":false,"modules":[{"id":1,"name":"Data '
 'Science and Machine Learning","professor":"Prof. Susan '
 'Love","credits":20,"registration_code":"abc"},{"id":"e96e86a6-c4e0-4441-af43-0c22cc472e18","name":"Web '
 'Development","professor":"Prof. James '
 'Herman","credits":20,"registration_code":"abc"},{"id":3,"name":"Relational '
 'Databases and SQL","professor":"Prof. Samantha '
 'Curtis","credits":10,"registration_code":"abc"}]}')
('{"id":"4c7b4c43-c863-4855-abc0-3657c078ce23","name":"Mark '
 'Smith","date_of_birth":"1996-02-10","course":null,"department":"Science and '
 'Engineering","GPA":2.5,"fees_paid":true,"modules":[]}')
('{"id":"5cd9ad59-fcf1-462c-8863-282a9fb693e4","name":"Marissa '
 'Barker","date_of_birth":"1996-10-01","course":"Biology","department":"Life '
 'Sciences","GPA":3.5,"fee

## 3. Pydantic Field, Advanced Export, config, alias
- Set **default values** with the `Field()` function.
- Use **aliases** to allow model fields to have different names than the fields in the source data.
- **Include/exclude** fields when exporting models, using both the `Field()` function and the model export functions.
- Add **titles** and descriptions for fields in JSON Schema outputs, using the `Field()` function.
- Define model-wide **configuration** (an inner `Config` class in Pydantic V1, `model_config` in V2).
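
As a small illustration of the alias, exclude and title/description points, here is a sketch assuming Pydantic V2. `StudentSketch` and its `registration_code` field are hypothetical and exist only for this example.

```python
from pydantic import BaseModel, ConfigDict, Field


class StudentSketch(BaseModel):  # hypothetical toy model
    model_config = ConfigDict(title="Student Model")  # title of the JSON Schema

    student_name: str = Field(
        alias="name",  # key used in the source data
        title="Student name",
        description="Full name as it appears in the source data",
    )
    # exclude=True keeps this field out of every export
    registration_code: str = Field(default="abc", exclude=True)


student = StudentSketch(name="Eric Travis")

print(student.model_dump())               # {'student_name': 'Eric Travis'}
print(student.model_dump(by_alias=True))  # {'name': 'Eric Travis'}

schema = StudentSketch.model_json_schema()
print(schema["title"])                    # Student Model
print(schema["properties"]["name"])       # title, description and type of the field
```

By default the JSON Schema lists properties under their aliases, which is why the field appears as `name` there.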

In [51]:
import requests

url = 'https://raw.githubusercontent.com/bugbytes-io/datasets/master/students_v2.json'
response = requests.get(url)
data = response.json()

In [60]:
from typing import Optional, Literal, Union, List
from datetime import datetime, date, timedelta
import uuid
from pydantic import BaseModel, ConfigDict, confloat, Field, field_validator
from enum import Enum

class DepartmentEnum(Enum):
    ARTS_AND_HUMANITIES = 'Arts and Humanities'
    LIFE_SCIENCES = 'Life Sciences'
    SCIENCE_AND_ENGINEERING = 'Science and Engineering'

class Module(BaseModel):
    id : Union[int, uuid.UUID]
    name : str
    professor : str
    credits : Literal[10,20]
    registration_code : str

class Student(BaseModel):
    id : uuid.UUID
    student_name : str = Field(alias='name')  # alias: read this field from the 'name' key of the source data
    date_of_birth : date = Field(default_factory=lambda : datetime.today().date())  # dynamic default: today's date
    GPA : confloat(ge=0, le=4)
    course : Optional[str]
    department : DepartmentEnum
    modules : List[Module] = Field(default=[], max_length=10) # defaults to an empty list

    # Pydantic V2 spelling of the V1 inner `class Config`
    model_config = ConfigDict(
        use_enum_values=True,    # store the enum's value (a plain string) instead of the enum member
        extra='ignore',          # silently drop unknown keys in the input; other options: 'forbid', 'allow'
        title='Student Model',   # title shown in the exported JSON Schema
    )

    @field_validator('date_of_birth')
    def ensure_age(cls, value):
        sixteen_years_ago = datetime.now() - timedelta(days=365*16)
        sixteen_years_ago = sixteen_years_ago.date()
        if value > sixteen_years_ago:
            raise ValueError('Too young to enroll the course')
        return value

    @field_validator('modules')
    def ensure_length(cls, value):
        if len(value) and len(value)!=3:
            raise ValueError('Length of modules should be either 0 or 3')
        return value


## before excluding
for student in data:
    model = Student(**student)
    pprint(model.model_dump())  # Pydantic V2 name for the deprecated .dict()
    print('++'*25)
    break

## after excluding keys from the output
exclude_keys = {
    'id': True,  # exclude the id (a UUID) for security reasons
    'modules': {'__all__': {'registration_code'}},  # '__all__' applies the exclusion to every module in the list
}
for student in data:
    model = Student(**student)
    print('Enum Original values:', model.department)  # use_enum_values=True, so this is the plain string value, not the enum member
    print('\n')
    pprint(model.model_dump(exclude=exclude_keys))  # the output has no id and no registration codes
    print('++'*25)
    break

{'GPA': 3.0,
 'course': 'Computer Science',
 'date_of_birth': datetime.date(1995, 5, 25),
 'department': 'Science and Engineering',
 'id': UUID('d15782d9-3d8f-4624-a88b-c8e836569df8'),
 'modules': [{'credits': 20,
              'id': 1,
              'name': 'Data Science and Machine Learning',
              'professor': 'Prof. Susan Love',
              'registration_code': 'abc'},
             {'credits': 20,
              'id': UUID('e96e86a6-c4e0-4441-af43-0c22cc472e18'),
              'name': 'Web Development',
              'professor': 'Prof. James Herman',
              'registration_code': 'abc'},
             {'credits': 10,
              'id': 3,
              'name': 'Relational Databases and SQL',
              'professor': 'Prof. Samantha Curtis',
              'registration_code': 'abc'}],
 'student_name': 'Eric Travis'}
++++++++++++++++++++++++++++++++++++++++++++++++++
Enum Original values: Science and Engineering


{'GPA': 3.0,
 'course': 'Computer Science',
 'date_of

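## 4. Pre Validators and Post Validators
- `@model_validator(mode='after')` runs after every individual field has been validated, so it can check fields against each other. It replaces the V1 `@root_validator`.
- `@field_validator(..., mode='before')` runs on the raw input, before Pydantic's own validation of that field. It replaces the V1 `pre=True` argument.
- The V1 `each_item=True` argument, which validated each element of a collection separately, has no direct V2 equivalent; loop over the items inside the field validator instead.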
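
A compact sketch of the three kinds of validators on a toy model, assuming Pydantic V2 (where "pre" validators are written with `mode='before'`). `Applicant` and `reject_slacker` are hypothetical names, and attaching `AfterValidator` to the item type is just one way to check each element of a list.

```python
from typing import Annotated, List

from pydantic import (
    AfterValidator,
    BaseModel,
    ValidationError,
    field_validator,
    model_validator,
)


def reject_slacker(tag: str) -> str:  # hypothetical per-item check
    if tag == "slacker":
        raise ValueError("slackers cannot enroll")
    return tag


class Applicant(BaseModel):  # hypothetical toy model
    GPA: float
    department: str
    # Per-item check: runs once for every element of the list.
    tags: List[Annotated[str, AfterValidator(reject_slacker)]]

    @field_validator("tags", mode="before")
    @classmethod
    def split_tags(cls, value):
        # Runs on the raw input, before Pydantic validates the list.
        if isinstance(value, str):
            return [tag.strip() for tag in value.split(",")]
        return value

    @model_validator(mode="after")
    def check_gpa_for_department(self):
        # Runs after all fields are validated, so fields can be compared.
        if self.department == "Science and Engineering" and self.GPA < 3.0:
            raise ValueError("GPA not high enough for Science and Engineering")
        return self


ok = Applicant(GPA="3.0", department="Science and Engineering",
               tags="motivated, skilled, hard-working")
print(ok.tags)  # ['motivated', 'skilled', 'hard-working']

for bad in (
    {"GPA": 2.5, "department": "Science and Engineering", "tags": "motivated"},
    {"GPA": 3.5, "department": "Life Sciences", "tags": "clever, slacker"},
):
    try:
        Applicant(**bad)
    except ValidationError as exc:
        print(exc.errors()[0]["msg"])
```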

In [62]:
import requests

url = 'https://raw.githubusercontent.com/bugbytes-io/datasets/master/students_v3.json'
response = requests.get(url)
data = response.json()
pprint(data)

[{'GPA': '3.0',
  'course': 'Computer Science',
  'date_of_birth': '1995-05-25',
  'department': 'Science and Engineering',
  'fees_paid': False,
  'id': 'd15782d9-3d8f-4624-a88b-c8e836569df8',
  'modules': [{'credits': 20,
               'id': 1,
               'name': 'Data Science and Machine Learning',
               'professor': 'Prof. Susan Love',
               'registration_code': 'abc'},
              {'credits': 20,
               'id': 'e96e86a6-c4e0-4441-af43-0c22cc472e18',
               'name': 'Web Development',
               'professor': 'Prof. James Herman',
               'registration_code': 'abc'},
              {'credits': 10,
               'id': 3,
               'name': 'Relational Databases and SQL',
               'professor': 'Prof. Samantha Curtis',
               'registration_code': 'abc'}],
  'name': 'Eric Travis',
  'tags': 'motivated, skilled, hard-working'},
 {'GPA': '2.5',
  'course': None,
  'date_of_birth': '1996-02-10',
  'department': 'Science an

In [68]:
from typing import Optional, Literal, Union, List
from datetime import datetime, date, timedelta
import uuid
from pydantic import BaseModel, ConfigDict, confloat, Field, ValidationError, field_validator, model_validator
from enum import Enum

class DepartmentEnum(Enum):
    ARTS_AND_HUMANITIES = 'Arts and Humanities'
    LIFE_SCIENCES = 'Life Sciences'
    SCIENCE_AND_ENGINEERING = 'Science and Engineering'

class Module(BaseModel):
    id : Union[int, uuid.UUID]
    name : str
    professor : str
    credits : Literal[10,20]
    registration_code : str

class Student(BaseModel):
    id : uuid.UUID
    student_name : str = Field(alias='name')  # alias: read this field from the 'name' key of the source data
    date_of_birth : date = Field(default_factory=lambda : datetime.today().date())  # dynamic default: today's date
    GPA : confloat(ge=0, le=4)
    course : Optional[str]
    department : DepartmentEnum
    modules : List[Module] = Field(default=[], max_length=10) # defaults to an empty list
    tags : List[str]  # arrives as one comma-separated string, e.g. 'motivated, skilled, hard-working'

    # Pydantic V2 spelling of the V1 inner `class Config`
    model_config = ConfigDict(
        use_enum_values=True,
        extra='ignore',             # silently drop unknown keys in the input; other options: 'forbid', 'allow'
        title='Student Model',      # title shown in the exported JSON Schema
        str_strip_whitespace=True,  # V2 name of anystr_strip_whitespace; here it strips each tag after splitting
    )

    # V2 replacement for the V1 `@root_validator`, which is deprecated and, on pydantic 2.9,
    # raises PydanticUserError when used without skip_on_failure=True
    @model_validator(mode='after')
    def check_gpa_department(self):  # a student with GPA < 3 can't enroll in Science and Engineering
        # use_enum_values=True stores the enum's value, so compare against .value
        is_science = (self.department == DepartmentEnum.SCIENCE_AND_ENGINEERING.value)
        valid_gpa = (self.GPA >= 3.0)
        if is_science and not valid_gpa:
            raise ValueError('GPA not high enough for Science & Engineering')
        return self

    # In the data, tags look like 'erudite, clever, motivated', so the split must run before list validation
    @field_validator('tags', mode='before')
    @classmethod
    def split_tags(cls, value):
        if isinstance(value, str):
            return value.split(',')
        return value

    # V2 has no each_item=True, so check every tag in a loop
    @field_validator('tags')
    @classmethod
    def validate_tags(cls, value):
        for tag in value:
            if tag == 'slacker':
                raise ValueError("Can't enroll the student as he is a slacker")
        return value

    @field_validator('date_of_birth')
    @classmethod
    def ensure_age(cls, value):
        sixteen_years_ago = datetime.now() - timedelta(days=365*16)
        sixteen_years_ago = sixteen_years_ago.date()
        if value > sixteen_years_ago:
            raise ValueError('Too young to enroll the course')
        return value

    @field_validator('modules')
    @classmethod
    def ensure_length(cls, value):
        if len(value) and len(value)!=3:
            raise ValueError('Length of modules should be either 0 or 3')
        return value


for student in data:
    try:
        model = Student(**student)
        print(model.tags)   # print out the list of tags to the terminal
    except ValidationError as e:
        print(e)

In [65]:
import pydantic
pydantic.__version__

'2.9.2'