# Advanced Usage

## Complex Types

We have already seen how to define classes / models using simple types, such as `int`, `float`, `str`, `bool`, etc. In practice, however, data structures are often more complex: they include, for example, dictionaries, lists of lists, or fields with multiple allowed types. Pydantic supports these more complex types as well, including lists, dictionaries, enums, and unions. The following sections give an overview of these types and how to use them:


### Typed Lists and Dictionaries

Lists and dictionaries are very common data structures in Python. Pydantic supports typed lists and dictionaries, which means that we can also define the type of the elements in the list or the type of the values in the dictionary.
Typed lists and dictionaries are defined using the `list` and `dict` generic types. For example, we can define a model with a list of floats as follows:

In [None]:
from pydantic import BaseModel

class LineV1(BaseModel):
    """A Line object that can be used to represent a line."""
    x: list[float]
    y: list[float]

    def length(self):
        """Length of the line"""
        length = 0

        for idx in range(len(self.x) - 1):
            length += ((self.x[idx] - self.x[idx + 1]) ** 2 + (self.y[idx] - self.y[idx + 1]) ** 2) ** 0.5
        
        return length

In [None]:
line_v1 = LineV1(x=[0, 1, 3], y=[0, 1, 2])
display(line_v1)

In [None]:
print(line_v1.length())

Note that the behavior is exactly the same as for simple types. So values are converted to the specified type if possible, and otherwise a `ValidationError` is raised.

In [None]:
line_v1 = LineV1(x=[0, 1, "3"], y=[0, True, 2])
display(line_v1)
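
The examples above only use typed lists. Typed dictionaries work the same way; the following is a minimal sketch, in which the `Measurement` model and its field name are made up for illustration:

```python
from pydantic import BaseModel, ValidationError


class Measurement(BaseModel):
    """Hypothetical model, used only to illustrate a typed dictionary."""
    values: dict[str, float]  # keys must be strings, values must be floats


# Values that can be converted to float are accepted
measurement = Measurement(values={"a": 1, "b": "2.5"})
print(measurement.values)

# Values that cannot be converted raise a ValidationError
try:
    Measurement(values={"a": "not a number"})
except ValidationError as error:
    print(error)
```

As with lists, the conversion is applied to every element, so a single invalid value is enough to reject the whole dictionary.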

### Enums and Union Types

In many cases it is useful to provide users with a selection of valid values, such as strings. The data structure to handle this is called an enumeration, or `Enum`. Enums are defined by subclassing the `Enum` class from Python's built-in `enum` module. For example, we can define a selection for the line color:


In [None]:
from pydantic import BaseModel
from enum import Enum

class LineColor(str, Enum):
    """Line color enum"""
    red = "red"
    green = "green"
    blue = "blue"


class LineV2(BaseModel):
    """A Line object that can be used to represent a line."""
    x: list[float]
    y: list[float]
    color: LineColor = LineColor.red

    def length(self):
        """Length of the line"""
        length = 0

        for idx in range(len(self.x) - 1):
            length += ((self.x[idx] - self.x[idx + 1]) ** 2 + (self.y[idx] - self.y[idx + 1]) ** 2) ** 0.5
        
        return length

In [None]:
line_v2 = LineV2(x=[0, 1, 2], y=[0, 1, 2], color="red")
display(line_v2) 

Now let's try an invalid color:

In [None]:
line_v2 = LineV2(x=[0, 1, 2], y=[0, 1, 2], color="purple")

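Note that Pydantic also supports validation of CSS colors through the `Color` type (`from pydantic.color import Color`). The separate package https://github.com/pydantic/pydantic-extra-types provides support for this as well.

A field can also accept more than one type by using `Union`. For example, we can allow the color to be either a `LineColor` or `None`: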

In [None]:
from typing import Optional, Union

from pydantic import BaseModel
from enum import Enum

class LineColor(str, Enum):
    """Line color enum"""
    red = "red"
    green = "green"
    blue = "blue"


class LineV2(BaseModel):
    """A Line object that can be used to represent a line."""
    x: list[float]
    y: list[float]
    color: Union[LineColor, None] = LineColor.red
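
To see the effect of the `Union` annotation, the following sketch repeats the model under the hypothetical name `LineV2Optional` (so that it runs on its own) and passes both a color and `None`:

```python
from enum import Enum
from typing import Union

from pydantic import BaseModel


class LineColor(str, Enum):
    """Line color enum"""
    red = "red"
    green = "green"
    blue = "blue"


class LineV2Optional(BaseModel):
    """Same fields as `LineV2`; the name is chosen for this sketch only."""
    x: list[float]
    y: list[float]
    color: Union[LineColor, None] = LineColor.red


# A valid color string is converted to the enum member
colored = LineV2Optional(x=[0, 1], y=[0, 1], color="green")
print(colored.color)

# None is accepted as well, because it is part of the Union
uncolored = LineV2Optional(x=[0, 1], y=[0, 1], color=None)
print(uncolored.color)
```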



### Datetime Types

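Pydantic supports parsing and validation of datetime types. A `datetime` field accepts `datetime` objects as well as ISO 8601 strings and Unix timestamps, and converts them to a `datetime` instance.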



In [None]:
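from datetime import datetime
from typing import Optional

from pydantic import BaseModel

class LonLatTimeVector(BaseModel):
    """Represent a position on earth with time."""
    lon: float
    lat: float
    time: Optional[datetime] = None  # the time is optional and defaults to None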
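
As a minimal sketch of the parsing behavior, the example below passes an ISO 8601 string to a `datetime` field. The model name `Event` and its field are hypothetical and only serve this illustration:

```python
from datetime import datetime

from pydantic import BaseModel, ValidationError


class Event(BaseModel):
    """Hypothetical model with a single datetime field."""
    time: datetime


# An ISO 8601 string is parsed into a datetime object
event = Event(time="2023-07-10T09:30:00")
print(repr(event.time))

# A string that is not a valid date raises a ValidationError
try:
    Event(time="not a date")
except ValidationError as error:
    print(error)
```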

### Custom Data Types

### Hierarchical Structures

One of the most powerful features of Pydantic is the ability to combine models to create hierarchical structures. This is done by defining a model attribute as another model.


In [None]:
from pydantic import BaseModel

class Point(BaseModel):
    """Representation of a two-dimensional point coordinate."""

    x: float
    y: float

    def distance_to(self, other: "Point") -> float:
        """Computes the distance to another `Point`."""
        dx = self.x - other.x
        dy = self.y - other.y
        return (dx**2 + dy**2)**0.5


class Triangle(BaseModel):
    """Representation of a triangle"""
    point_a: Point
    point_b: Point
    point_c: Point

    def circumference(self):
        """Circumference of the triangle"""
        return (
            self.point_a.distance_to(self.point_b)
            + self.point_b.distance_to(self.point_c)
            + self.point_c.distance_to(self.point_a)
        )

triangle = Triangle(
    point_a=Point(x=0, y=0),
    point_b=Point(x=1, y=0),
    point_c=Point(x=0, y=1),
)

display(triangle)

In [None]:
print(triangle.circumference())

In [None]:
class LineV3(BaseModel):
    """Line object"""
    points: list[Point]

    def length(self):
        """Line length"""
        length = 0

        for point, next_point in zip(self.points[:-1], self.points[1:]):
            length += point.distance_to(next_point)
        
        return length


Compared to our first implementation at the beginning, `LineV3` is more compact, more readable and more elegant.
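
As a usage sketch, the nested `Point` models can also be passed as plain dictionaries, which Pydantic converts into `Point` instances. The classes are repeated here (with `length` written as a `sum`) so that the example runs on its own:

```python
from pydantic import BaseModel


class Point(BaseModel):
    """Representation of a two-dimensional point coordinate."""
    x: float
    y: float

    def distance_to(self, other: "Point") -> float:
        """Computes the distance to another `Point`."""
        return ((self.x - other.x) ** 2 + (self.y - other.y) ** 2) ** 0.5


class LineV3(BaseModel):
    """Line object"""
    points: list[Point]

    def length(self) -> float:
        """Line length"""
        return sum(
            point.distance_to(next_point)
            for point, next_point in zip(self.points[:-1], self.points[1:])
        )


# Dictionaries are converted to Point instances during validation
line_v3 = LineV3(points=[{"x": 0, "y": 0}, {"x": 3, "y": 4}, Point(x=3, y=0)])
print(line_v3.length())  # 5.0 + 4.0 = 9.0
```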

### Private Attributes

By default all attributes of a Pydantic model are public fields, which are validated and included when the model is exported. However, sometimes we want to define attributes that are not part of the model's fields. This can be done using the `ClassVar` and `PrivateAttr` types. The `ClassVar` type is used to define class variables, which are shared by all instances of the class. The `PrivateAttr` type is used to define private attributes, whose names start with an underscore and which are neither validated nor exported as fields. The following example shows how to use these types:

In [None]:
class :
    pass
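
A minimal sketch of how `ClassVar` and `PrivateAttr` can be combined is shown below. The `Counter` model and all of its attribute names are hypothetical and chosen for illustration only:

```python
from typing import ClassVar

from pydantic import BaseModel, PrivateAttr


class Counter(BaseModel):
    """Hypothetical model, used only to illustrate the two types."""
    unit: ClassVar[str] = "items"         # class variable, shared by all instances
    start: int = 0                        # regular (public) field
    _count: int = PrivateAttr(default=0)  # private attribute, not a field

    def increment(self) -> int:
        """Increase the private counter and return the current total."""
        self._count += 1
        return self.start + self._count


counter = Counter(start=10)
counter.increment()
print(counter.increment())
print(Counter.unit)

# Only the public field shows up when the model is converted to a dict
print(dict(counter))
```

The private attribute can still be read and modified from within the model's methods, but it does not take part in validation or export.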

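Alternatively, if `Config.underscore_attrs_are_private` is set to `True`, every attribute whose name starts with an underscore is treated as a private attribute.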

## Type Validation

We saw a sneak peek of type validation in Part 2. Basic Usage. Now we are going to take a deep dive into type validation with Pydantic.

### `validator` decorator

If custom validation is required above and beyond what Pydantic provides out of the box, the `validator` decorator may be used to create validation class methods as part of the Pydantic model definition. For example, imagine we want to create a model representing a user, with fields for the user's given name, surname, username, and passwords from two separate password creation inputs. We may want to impose the following restrictions:
1. Username must contain only ASCII characters
1. Both provided passwords must be identical
1. Given name and surname must be alphabetic characters only and must start with a capital letter

We can accomplish this using the following model:

In [None]:
from typing import Any
from pydantic import BaseModel, validator


class User(BaseModel):

    username: str
    password1: str
    password2: str
    given_name: str
    surname: str

    @validator("username")
    def username_must_be_ascii(cls, username: str) -> str:
        if not username.isascii():
            raise ValueError("must contain only ASCII characters")
        return username

    @validator("password2")
    def passwords_must_match(cls, password2: str, values: dict[str, Any]) -> str:
        if ("password1" in values) and (password2 != values["password1"]):
            raise ValueError("Passwords do not match")
        return password2

    @validator("given_name", "surname")
    def names_must_be_alphabetic(cls, name: str) -> str:
        if not name.isalpha():
            raise ValueError("must be alphabetic")
        return name.capitalize()

Now let's create a valid user and see what happens...

In [None]:
display(
    User(
        username="scipy.2023.is.fun",
        password1="sup3rSecurePa$$w0rd",
        password2="sup3rSecurePa$$w0rd",
        given_name="joHn",
        surname="doe",
    )
)

Notice that even though the provided name "joHn doe" was not properly capitalized, we were able to correct this by using the `str.capitalize` method and thus did not need to throw any errors. Now let's create an invalid user and see what happens...

In [None]:
display(
    User(
        username="§cipy.2023.is.fun",
        password1="sup3rSecurePa$$w0rd",
        password2="sup3rSecurePa$$w0rd2",
        given_name="John Harry",
        surname="Doe-Smith",
    )
)

Notice that Pydantic does not stop at the first validation error. It keeps track of all validation errors and gives a detailed summary of all validation errors in one error message. This allows the user to fix multiple problems at once instead of fixing a problem and then running into the next problem.

Some notes to keep in mind when using the `validator` decorator:
* The name of the validation method can be any valid Python name, but it helps to be descriptive
* The method will be a class method and not an instance method, so it is customary to name the first method parameter `cls`
* The second method parameter will refer to the parsed value of the field under inspection and can be any valid Python name
* An optional third method parameter called `values` will refer to a dictionary of all previously parsed fields (fields are parsed in the order they are defined in the model)
  * If a field fails validation, it will not be present in the `values` dictionary in remaining validation methods
* The same method may be used to validate multiple fields by passing the name of each field as multiple arguments to the decorator
* Validation methods can perform additional parsing of fields on top of any parsing automatically provided by Pydantic
* Validation methods should either return the parsed value or raise one of `ValueError`, `TypeError`, or `AssertionError`

By default, `validator` will perform validation *after* other validation, such as rejecting `"hello"` for an `int` field. But we can create validation methods that operate *before* other validation by using `pre=True` in the `validator` keyword arguments.

`validator` also has an `each_item` keyword argument that will apply the method to each item of a list-like field.

Let's see `pre` and `each_item` in action...


In [None]:
from typing import List

class Foo(BaseModel):

    positive_ints: List[int]

    @validator("positive_ints", pre=True)
    def split_comma_separated_values(cls, positive_ints: Any) -> Any:
        if isinstance(positive_ints, str):
            return positive_ints.split(",")
        return positive_ints

    @validator("positive_ints", each_item=True)
    def must_be_positive(cls, item: int) -> int:
        if item <= 0:
            raise ValueError(f"{item} is not positive")
        return item

display(
    Foo(positive_ints=(67.4, 2, True))
)
display(
    Foo(positive_ints="2,4,6,8")
)
display(
    Foo(positive_ints=["-4", 4, 0, 7])
)

Notice the first example successfully converted `(67.4, 2, True)` into the list of integers `[67, 2, 1]`. And thanks to `pre=True`, the second example converted `"2,4,6,8"` to `[2, 4, 6, 8]`. Had we set `pre=False` or left the default value, the string `"2,4,6,8"` would have led to a validation error. The third example returns a validation error as the input list contains non-positive integers. The error message even tells us which indices of the input list are leading to the validation error.

Another default behavior of `validator` is to not validate fields when a value is not provided. But there may be scenarios where this is not the desired behavior. Let's return to our `User` class from above and create a `NewUser` class, which is the same as the `User` class but contains an additional `created_at` field. This field takes a `datetime` but is optional. When not supplied, the default value should be the current UTC time. A naive implementation might look like:

In [None]:
from datetime import datetime


class NewUser(User):

    created_at: datetime = datetime.utcnow()

But there is a problem with this implementation. The default time will represent the time the class was defined, not when it was instantiated...

In [None]:
import time


class NewUser(User):

    created_at: datetime = datetime.utcnow()

print(f"NewUser defined at  {datetime.utcnow()}")

time.sleep(3)

new_user = NewUser(
    username="joe",
    password1="1234",
    password2="1234",
    given_name="joe",
    surname="davis",
)
print(f"new_user created at {new_user.created_at}")

Notice that despite occurring 3 seconds apart, the class definition and class instantiation are reported as just milliseconds apart. We can solve this by creating a validation method where we set `always=True` in the `validator` keyword arguments. The proper implementation of this class looks like...

In [None]:
from typing import Optional, Union


class NewUser(User):

    created_at: Optional[datetime]

    @validator("created_at", always=True)
    def set_default_time(cls, created_at: Union[datetime, None]) -> datetime:
        return datetime.utcnow() if created_at is None else created_at

print(f"NewUser defined at  {datetime.utcnow()}")

time.sleep(3)

new_user = NewUser(
    username="joe",
    password1="1234",
    password2="1234",
    given_name="joe",
    surname="davis",
)
print(f"new_user created at {new_user.created_at}")

Now we get the expected behavior: the `new_user` was created 3 seconds after the `NewUser` class was defined.
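
As an aside, a dynamic default can also be expressed without a validator by passing a `default_factory` to Pydantic's `Field` function. This is an alternative technique, not the approach used above; the `LogEntry` model and the `utc_now` helper below are hypothetical stand-ins chosen for this sketch:

```python
import time
from datetime import datetime, timezone

from pydantic import BaseModel, Field


def utc_now() -> datetime:
    """Return the current UTC time (helper defined for this sketch)."""
    return datetime.now(timezone.utc)


class LogEntry(BaseModel):
    """Hypothetical model whose default is computed per instance."""
    created_at: datetime = Field(default_factory=utc_now)


defined_at = utc_now()
time.sleep(0.1)

# The factory runs at instantiation time, not at class definition time
entry = LogEntry()
print(entry.created_at - defined_at)
```

Whether a validator or a default factory is the better fit depends on whether the default needs access to other field values, which only a validator provides.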

### `root_validator` decorator

It is possible to perform validation on the entire model data in one validation method using the `root_validator` decorator. Recall in our `User` class we validated that `password1` and `password2` matched using the `validator` decorator. The same functionality can be implemented with the `root_validator` decorator...

In [None]:
from pydantic import root_validator


class UserPassword(BaseModel):

    password1: str
    password2: str

    @root_validator
    def passwords_must_match(cls, values: dict[str, Any]) -> dict[str, Any]:
        pw1, pw2 = values.get('password1'), values.get('password2')
        if pw1 is not None and pw2 is not None and pw1 != pw2:
            raise ValueError('passwords do not match')
        return values

display(
    UserPassword(password1="1234", password2="1234")
)
display(
    UserPassword(password1="1234", password2="12345")
)

Root validators also accept a `pre=True` keyword argument just like the `validator` decorator.
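
A minimal sketch of a root validator with `pre=True` is shown below. It rewrites the raw input before any field is validated. The `Credentials` model and the legacy key `user` are made up for this illustration:

```python
from typing import Any

from pydantic import BaseModel, root_validator


class Credentials(BaseModel):
    """Hypothetical model, used only to illustrate `pre=True`."""
    username: str
    password: str

    @root_validator(pre=True)
    def accept_user_alias(cls, values: dict[str, Any]) -> dict[str, Any]:
        """Rename the (made-up) legacy key `user` before field validation."""
        values = dict(values)  # copy, so the caller's data is not modified
        if "user" in values and "username" not in values:
            values["username"] = values.pop("user")
        return values


credentials = Credentials(user="joe", password="1234")
print(credentials.username)
```

Without `pre=True` the method would run after field validation, at which point the missing `username` would already have been reported as an error.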

The full documentation on validators can be found at https://docs.pydantic.dev/latest/usage/validators/

### Skipping validation

Type validation can be a slow process that you may want to skip for performance reasons. If you know you have data from a trusted source that is pre-validated, then you may use the `construct` method of your Pydantic model when instantiating the class.

In [None]:
user_data = {
    "username": "scipy.2023.is.fun",
    "password1": "1234",
    "password2": "1234",
    "given_name": "john",
    "surname": "doe",
}

In [None]:
%%timeit
User(**user_data)

In [None]:
%%timeit
User.construct(**user_data)

We can see that instantiating the `User` class with `construct` is much faster than with validation. But be mindful that skipping validation can result in invalid field values if the data is not pre-validated. Look what happens if we use the previous example that led to 4 validation errors...

In [None]:
display(
    User.construct(
        username="§cipy.2023.is.fun",
        password1="sup3rSecurePa$$w0rd",
        password2="sup3rSecurePa$$w0rd2",
        given_name="John Harry",
        surname="Doe-Smith",
    )
)

As expected, we see no validation error and the resulting `User` instance has several invalid values.

### `Config` type validation settings

Recall we can customize our Pydantic model by setting certain attributes in the `Config` class in our model. There are several settings related to type validation that can reduce the number of validation methods needed.
* `anystr_strip_whitespace` - strip leading and trailing whitespace for all `str` and `bytes` (default: `False`)
* `anystr_upper` - convert all characters to uppercase for `str` and `bytes` (default: `False`)
* `anystr_lower` - convert all characters to lowercase for `str` and `bytes` (default: `False`)
* `min_anystr_length` - minimum length for all `str` and `bytes` (default: `0`)
* `max_anystr_length` - maximum length for all `str` and `bytes` (default: `None`)
* `validate_all` - whether to validate fields when no value is provided (default: `False`)
* `validate_assignment` - whether to validate when fields are updated after instantiation (default: `False`)

Let's see some of these in action...

In [None]:
class Foo(BaseModel):

    bar: str

    class Config:
        anystr_strip_whitespace = True
        anystr_upper = True
        min_anystr_length = 8
        max_anystr_length = 32
        validate_assignment = True

display(
    Foo(bar="   hello SciPy!      ")
)

In [None]:
display(
    Foo(bar="    baz   ")
)

Notice that in this example the provided string has 10 characters, surpassing the 8 character minimum, but after stripping leading and trailing whitespace, the string only has 3 characters. Thus a validation error is raised stating that `bar` is not at least 8 characters long.

In [None]:
foo = Foo(bar="   hello SciPy!      ")
foo.bar = 80 * "-"

Here we try to change the value of `foo.bar` after instantiation, but we get a validation error stating that the provided string is too long. Note that if `validate_assignment` were set to `False`, this would not have raised a validation error.

### `validate_arguments` decorator

Until now we have been discussing type validation in the context of Pydantic models, but Pydantic also provides a way to enforce function argument types at runtime via the `validate_arguments` decorator. This decorator will perform type coercion and type validation just like any Pydantic model.

Continuing our example with the `User` class, suppose a user wants to log in to a service and we want to validate the provided credentials. We could write a function `validate_credentials` to compare a user-provided username/password pair to a list of `User` instances. We may want to be sure the provided arguments are of the expected types.

DISCLAIMER: The following is **not** intended to be used as a model for sensitive data storage and/or credential authentication. This example serves only to demonstrate the usage of the `validate_arguments` decorator.

In [None]:
from pydantic import validate_arguments

def login():
    print("Successfully logged in!")

def logout():
    print("Invalid username or password")

@validate_arguments
def validate_credentials(
    username: str,
    password: str,
    registered_users: list[User],
) -> None:

    for user in registered_users:
        if (user.username == username) and (user.password1 == password):
            login()
            return

    logout()

Now let's create a database of users, but we will not create any `User` instances. Each user will just be a dictionary that conforms to the `User` format.

In [None]:
registered_users = [
    {
        "username": "abc",
        "password1": "123",
        "password2": "123",
        "given_name": "Joe",
        "surname": "Smith",
    },
    {
        "username": "def",
        "password1": "321",
        "password2": "321",
        "given_name": "Jane",
        "surname": "Davidson",
    },
]

Now let's attempt to log in to the service...

In [None]:
username, password = "def", "321"

validate_credentials(
    username=username,
    password=password,
    registered_users=registered_users,
)

We successfully logged in! But notice the magic that happened here. Our database of users is just a list of dictionaries. None of the users in our database have `username` or `password1` attributes, yet `validate_credentials` was able to access these attributes without raising an exception. Evidently the `validate_arguments` decorator coerced the list of dictionaries into a list of `User` instances. Presumably, if we then alter our database to include invalid users, then we should get a validation error when we try to call `validate_credentials`. Let's see if we are right...

In [None]:
registered_users.append(
    {
        "username": "ghi",
        "password1": "pass",
        "password2": "different-pass",
        "given_name": "John",
        "surname": "Johnson",
    },
)

validate_credentials(
    username=username,
    password=password,
    registered_users=registered_users,
)

So even though we provided the same, valid username and password, the function did not execute and raised a validation error because it could not coerce the list of registered user dictionaries into a list of `User` instances.

The `validate_arguments` decorator can provide peace of mind when developing code, because you know the input arguments will have the specified type inside the function body. As with any validation, this does take time so the drawback is a decrease in performance. Though, in practice, this drop in speed is almost never important.
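
The coercion also applies to simple argument types. The following sketch uses a hypothetical helper function `repeat` to show both a successful coercion and a validation error:

```python
from pydantic import ValidationError, validate_arguments


@validate_arguments
def repeat(text: str, count: int) -> str:
    """Hypothetical helper that repeats `text` `count` times."""
    return text * count


# The string "3" is coerced to the integer 3 before the function body runs
print(repeat("ab", "3"))

# A value that cannot be coerced raises a ValidationError
try:
    repeat("ab", "three")
except ValidationError as error:
    print(error)
```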

## Dynamic Model Definition

In rare cases it can be useful to create a Pydantic model dynamically, i.e. without explicitly defining the model as a class. This is useful for situations where the shape of a model is not known until runtime. Pydantic supports this use case using the `create_model` function. The function takes the name of the model as the first argument, followed by the fields of the model. The fields are passed as keyword arguments, where the keyword is the name of the field and the value is a `(type, default)` tuple, with `...` as the default for required fields. The types can be either Python types or Pydantic models. The function returns a Pydantic model class.

In [None]:
from itertools import combinations

list(combinations([1, 2, 3], 2))
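
A minimal sketch of `create_model` as described above could look as follows. The model name `DynamicPoint` and its fields are hypothetical and chosen for illustration only:

```python
from pydantic import BaseModel, create_model

# Field definitions are passed as keyword arguments of the form
# name=(type, default); `...` marks a field as required.
DynamicPoint = create_model(
    "DynamicPoint",   # hypothetical model name
    x=(float, ...),   # required field
    y=(float, 0.0),   # optional field with a default value
)

# The returned class behaves like any other Pydantic model
point = DynamicPoint(x="1.5")
print(point)
print(issubclass(DynamicPoint, BaseModel))
```

Because the field definitions are ordinary keyword arguments, they can also be assembled in a dictionary at runtime and passed with `**`.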

Creating models from `TypedDict` and `NamedTuple` is also supported. The following example shows how to create a model from a `TypedDict`: