(ch05)=
# Validating Data Beyond Types

```{admonition} Starting File: <code>04_pydantic_molecule.py</code>
:class: important
This chapter starts from <code>04_pydantic_molecule.py</code> and ends with <code>05_valid_pydantic_molecule.py</code>.
```

Data validation goes far beyond just type. *Pydantic* has provided the basic tools for doing data validation on data types, but it also provides the tools for writing custom validators to check so much more.

We'll be covering the *pydantic* `validator` decorator and applying that to our data to check structure and scientific rigor. We'll also cover how to validate types not native to Python, such as NumPy arrays.

```{admonition} Check Out Pydantic
:class: note
We will not be covering all the capabilities of *pydantic* here, and we highly encourage you to visit [the pydantic docs](https://pydantic-docs.helpmanual.io/) to learn about all the powerful and easy-to-execute things *pydantic* can do.
```



```{admonition} Compatibility with Python 3.8 and below
:class: note
If you have Python 3.8 or below, you will need to import container type objects such as `List`, `Tuple`, `Dict`, etc. from the `typing` library instead of using the native types `list`, `tuple`, `dict`, etc. This chapter assumes Python 3.9 or greater; however, both approaches work in Python >= 3.9, and the `typing` versions are 1:1 replacements under the same (capitalized) names.
```
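
As a small illustration of that equivalence, the sketch below compares the two spellings with the standard library's `typing.get_origin` and `typing.get_args`. The variable names are only for illustration, and the second half requires Python 3.9 or greater.

```python
from typing import Dict, List, get_args, get_origin

# Python 3.8 and below: container types come from `typing`
legacy_symbols = List[str]
legacy_lookup = Dict[str, float]

# Python 3.9+: the built-in containers can be subscripted directly
modern_symbols = list[str]
modern_lookup = dict[str, float]

# Both spellings describe the same container and the same element types
print(get_origin(legacy_symbols) is get_origin(modern_symbols))
print(get_args(legacy_lookup) == get_args(modern_lookup))
```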

## Pydantic's Validator Decorator

Let's start by looking at the state of our code prior to extending the validators. As usual, let's also define our test data.

In [1]:
from pydantic import BaseModel


class Molecule(BaseModel):
    name: str
    charge: float
    symbols: list[str]
    coordinates: list[list[float]]

    @property
    def num_atoms(self):
        return len(self.symbols)

In [2]:
mol_data = {  # Good data
    "coordinates": [[0, 0, 0], [1, 1, 1], [2, 2, 2]], 
    "symbols": ["H", "H", "O"], 
    "charge": 0.0, 
    "name": "water"
}

bad_name = {"name": 789}  # Name is not str
bad_charge = {"charge": [1, 0.0]}  # Charge is not int or float
noniter_symbols = {"symbols": 1234567890}  # Symbols is an int
nonlist_symbols = {"symbols": '["H", "H", "O"]'}  # Symbols is a string (notably is a string-ified list)
tuple_symbols = {"symbols": ("H", "H", "O")}  # Symbols as a tuple?
bad_coords = {"coordinates": ["1", "2", "3"]}  # Coords is a single list of string
inner_coords_not3d = {"coordinates": [[1, 2, 3], [4, 5]]}
bad_symbols_and_cords = {"symbols": ["H", "H", "O"],
                         "coordinates": [[1, 1, 1], [2.0, 2.0, 2.0]]
                        }  # Coordinates top-level list is not the same length as symbols

You may notice we have extended our "Good Data" here to have `coordinates` actually define the `Nx3` structure where `N = len(symbols)`. This is important for what we plan to validate.

*pydantic* allows you to write custom validators, in addition to the type validators which run automatically for a type annotation. This `validator` is pulled from the `pydantic` module just like `BaseModel`, and is used to decorate a *class* function you write. Let's look at the most basic `validator` we can write and assign it to `coordinates`.

In [3]:
from pydantic import BaseModel, validator

class Molecule(BaseModel):
    name: str
    charge: float
    symbols: list[str]
    coordinates: list[list[float]]
        
    @validator("coordinates")
    def ensure_coordinates_is_3D(cls, coords):
        return coords

    @property
    def num_atoms(self):
        return len(self.symbols)

Here we have defined an additional validator which does nothing, but has the basic structure we can look at. For convenience and reference, I've broken the aspects of the `validator` into a list.

* The `validator` decorator takes as arguments the *exact* name of the attributes you are validating against as a string. In this case `coordinates`. You could provide multiple string args of each attribute you want to run through the validator if you want to reuse it.
* The function name can be whatever you want it to be. We've called it `ensure_coordinates_is_3D` to be meaningful if anyone ever wants to come back and see what this should be doing.
* The function itself is a *class function*. Similar to what happens when you use the `@classmethod` decorator from native Python, this validator is intended to be called on the non-instanced class. The formal nomenclature for the first variable here is therefore `cls` and not `self`. Your IDE may complain about this, but it should be `cls`. 
* The first argument after `cls` can have whatever name you want EXCLUDING the following list: `values`, `config`, and `field` (reasons discussed later in this chapter).
* The return MUST be the validated data to be fed into the attribute. We've done nothing to our variable `coords`, so we simply return it. If you forget the `return` statement, the function returns `None`, and *pydantic* will accept `None` as the validated value and store it in the field.
* `validator` runs *after* type validation, unless specified (see later in this chapter).
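
Two of the rules above are easy to check in isolation: reusing one validator for several fields, and what happens when the `return` is forgotten. The sketch below uses two hypothetical models, `Atom` and `ForgetfulAtom`, which are not part of the Lesson Materials, and assumes the same `validator` import used in this chapter.

```python
from pydantic import BaseModel, validator


class Atom(BaseModel):
    symbol: str
    label: str

    # One validator reused for two fields by listing both names
    @validator("symbol", "label")
    def must_not_be_blank(cls, value):
        if not value.strip():
            raise ValueError("must not be blank")
        return value  # the returned value is what gets stored


class ForgetfulAtom(BaseModel):
    symbol: str

    @validator("symbol")
    def forgot_to_return(cls, value):
        value.strip()  # no return statement, so the validator returns None


atom = Atom(symbol="O", label="oxygen-1")
forgetful = ForgetfulAtom(symbol="O")
print(atom.symbol, atom.label)
print(forgetful.symbol)
```

Printing `forgetful.symbol` is a quick way to confirm that the missing `return` replaced the input rather than raising an error.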

That may seem like a lot of rules, but most of them are boilerplate and intuitive. Let's apply these items to our validator. We want to make sure the inner lists of `coordinates` are 3D, or length 3. We don't have to worry about type checking (that was done before any custom `validator` was run), so we can simply iterate over the outer list and check the length of each inner list. Let's apply that now.

In [4]:
from pydantic import BaseModel, validator

class Molecule(BaseModel):
    name: str
    charge: float
    symbols: list[str]
    coordinates: list[list[float]]
        
    @validator("coordinates")
    def ensure_coordinates_is_3D(cls, coords):
        if any(len(failure := inner) != 3 for inner in coords):  # Walrus operator (:=) for Python 3.8+
            raise ValueError(f"Inner coordinates must be 3D, got {failure} of length {len(failure)}")
        return coords
    
    @property
    def num_atoms(self):
        return len(self.symbols)

In [5]:
good_water = Molecule(**mol_data)
mangled = {**mol_data, **inner_coords_not3d}
water = Molecule(**mangled)

ValidationError: 1 validation error for Molecule
coordinates
  Inner coordinates must be 3D, got [4.0, 5.0] of length 2 (type=value_error)

Here we have checked that the good data still works and that the mangled data raises an error. Note that we raised a `ValueError` inside the function, but what came out in the error report was a `ValidationError`. *pydantic* collects `ValueError`, `TypeError`, and `AssertionError` raised inside a validator into that report; other exception types are not collected and propagate as-is. We can also see the error message is what we put as the error string and the `type` of error is of the type we raised. This is why it's very important to have meaningful error strings when your custom validator fails.
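
Which exceptions end up inside the `ValidationError` report is worth verifying for yourself. The sketch below is a minimal check using a hypothetical `BondLength` model and a hypothetical helper `error_kind`; neither is part of the Lesson Materials. It raises a `ValueError` for one input and a `RuntimeError` for another, then records what actually reaches the caller.

```python
from pydantic import BaseModel, ValidationError, validator


class BondLength(BaseModel):
    value: float

    @validator("value")
    def check_value(cls, v):
        if v > 100:
            # RuntimeError is not an exception type that pydantic collects
            raise RuntimeError("bond length is implausibly large")
        if v <= 0:
            raise ValueError("bond length must be positive")
        return v


def error_kind(raw):
    """Return the name of the exception raised by BondLength(value=raw)."""
    try:
        BondLength(value=raw)
    except ValidationError:
        return "ValidationError"
    except RuntimeError:
        return "RuntimeError"
    return "no error"


print(error_kind(1.5))
print(error_kind(-1.0))
print(error_kind(1000.0))
```

Sticking to `ValueError` for failed checks keeps every problem inside a single, readable report.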

With all that said, our validator function really does look like any other function we may call to do a quick check of data, and then some special addons to make it work with *pydantic*. There is no practical limit to the number of `validator`s you have in a given class, so validate to your heart's content.

```{admonition} Python Assignment Expressions "The Walrus Operator" <code>:=</code>
:class: note
Since Python 3.8, there is a new operator for "assignment expressions" called "[The Walrus Operator](https://peps.python.org/pep-0572/)" which allows variables to be assigned inside other expressions. We've used it here to trap the value at time of error and save space. Do not feel compelled to use this yourself, especially if it's not clear what is happening.
```
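
If the walrus operator inside `any` is unclear, the plain-Python sketch below isolates the trick with no *pydantic* involved. The function names are made up for illustration. The second function gives the same answer without an assignment expression.

```python
def first_non_3d(coords):
    """Return the first inner list whose length is not 3, or None if all are 3D."""
    if any(len(failure := inner) != 3 for inner in coords):
        # `any` stops at the first True, so `failure` holds the offending entry
        return failure
    # Caveat: if nothing failed, `failure` is simply the last entry checked
    return None


def first_non_3d_no_walrus(coords):
    """Same result using a generator and `next` instead of `:=`."""
    return next((inner for inner in coords if len(inner) != 3), None)


print(first_non_3d([[1, 2, 3], [4, 5], [6]]))
print(first_non_3d_no_walrus([[1, 2, 3], [4, 5], [6]]))
```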

<div class="exercise">
<p class="exercise-title"> Check your knowledge: Validator Basics
    <p>How would you validate that <code>symbols</code> entries are at most 2 characters? There is more than one correct solution beyond what we show here.</p>

```{admonition} Possible Solution:
:class: dropdown
```python
@validator("symbols")
def symbols_are_possible_element_length(cls, symbs):
    if not all(1 <= len(failure := symb) <= 2 for symb in symbs):
        raise ValueError(f"Symbols must be 1 or 2 characters, got {failure}")
    return symbs
```
</div>

## Validating against other fields

*pydantic*'s validators can check fields beyond their own. This is helpful for cross-referencing dependent data. In our example, we want to make sure there are exactly as many `coordinates` as there are `symbols` in our `Molecule`. To check against other fields in a `validator`, we extend the arguments to include one called `values`. We'll keep our initial validator for now to show a feature of `validator`s, but we could combine them (and will) later.

In [6]:
from pydantic import BaseModel, validator

class Molecule(BaseModel):
    name: str
    charge: float
    symbols: list[str]
    coordinates: list[list[float]]
        
    @validator("coordinates")
    def ensure_coordinates_match_symbols(cls, coords, values):
        n_symbols = len(values["symbols"])
        if (n_coords := len(coords)) != n_symbols:  # Walrus operator (:=) for Python 3.8+
            raise ValueError(f"There must be an equal number of XYZ coordinates as there are symbols." 
                             f" There are {n_coords} coordinates and {n_symbols} symbols.")
        return coords
        
    @validator("coordinates")
    def ensure_coordinates_is_3D(cls, coords):
        if any(len(failure := inner) != 3 for inner in coords):  # Walrus operator (:=) for Python 3.8+
            raise ValueError(f"Inner coordinates must be 3D, got {failure} of length {len(failure)}")
        return coords
    
    @property
    def num_atoms(self):
        return len(self.symbols)

We've added a second validator to our code called `ensure_coordinates_match_symbols`, and this function will validate against `coordinates`. There are two main things we can see from adding this function:

1. Multiple functions can be declared to validate against the same field.
2. We've added one of the blocked argument names to our new validator: `values`.

The blocked argument names were given in the list of rules for `validator`s because *pydantic*'s `validator` reserves those names to inject special data. The addition of `values` as an argument tells the `validator` to also retrieve *all previously validated fields for the model*. In our case, that would be `name`, `charge`, and `symbols` as those entries appeared before `coordinates` in the list of attributes. Any and all validators which would have been applied to those three entries have already been run, and what we have access to is their validated records as a dictionary called `values` in the function itself. [See the *pydantic* docs](https://pydantic-docs.helpmanual.io/usage/validators/) for more details about the special arguments in `validator`.

Let's see this in action.

In [7]:
good_water = Molecule(**mol_data)
mangled = {**mol_data, **bad_symbols_and_cords}
water = Molecule(**mangled)

ValidationError: 1 validation error for Molecule
coordinates
  There must be an equal number of XYZ coordinates as there are symbols. There are 2 coordinates and 3 symbols. (type=value_error)
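
One detail to keep in mind with `values`: it only holds fields that passed their own validation. If `symbols` itself is invalid, it will be missing from the dictionary, and `values["symbols"]` raises a `KeyError` instead of adding to the report. The sketch below shows one defensive pattern using `dict.get`. The model name `TolerantMolecule` is hypothetical, and the model is trimmed to two fields to keep the example short.

```python
from pydantic import BaseModel, ValidationError, validator


class TolerantMolecule(BaseModel):
    symbols: list[str]
    coordinates: list[list[float]]

    @validator("coordinates")
    def ensure_coordinates_match_symbols(cls, coords, values):
        symbols = values.get("symbols")
        if symbols is None:
            # `symbols` failed its own validation and is already reported,
            # so there is nothing to compare against here
            return coords
        if len(coords) != len(symbols):
            raise ValueError(
                f"There are {len(coords)} coordinates and {len(symbols)} symbols."
            )
        return coords


try:
    TolerantMolecule(symbols=1234567890, coordinates=[[0, 0, 0]])
except ValidationError as err:
    # Collect the name of each field that appears in the error report
    failed_fields = [error["loc"][0] for error in err.errors()]
    print(failed_fields)
```

The design choice here is to let the `symbols` error speak for itself rather than pile a second, confusing error on top of it.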

## Non-native Types in Pydantic

Scientific data does not, and often should not, have to be confined to native Python types. One of the most common data types, especially in the sciences, is the NumPy array (the `ndarray` class). The most natural place for this would be `coordinates`, where we want to simplify the list-of-lists construct. Let's see what happens when we try to just make the type annotation an `ndarray` and see how *pydantic* handles coercion, or how it does not.

In [8]:
import numpy as np
from pydantic import BaseModel, validator

class Molecule(BaseModel):
    name: str
    charge: float
    symbols: list[str]
    coordinates: np.ndarray
        
    @validator("coordinates")
    def ensure_coordinates_match_symbols(cls, coords, values):
        n_symbols = len(values["symbols"])
        if (n_coords := len(coords)) != n_symbols:  # Walrus operator (:=) for Python 3.8+
            raise ValueError(f"There must be an equal number of XYZ coordinates as there are symbols." 
                             f" There are {n_coords} coordinates and {n_symbols} symbols.")
        return coords
        
    @validator("coordinates")
    def ensure_coordinates_is_3D(cls, coords):
        if any(len(failure := inner) != 3 for inner in coords):  # Walrus operator (:=) for Python 3.8+
            raise ValueError(f"Inner coordinates must be 3D, got {failure} of length {len(failure)}")
        return coords
    
    @property
    def num_atoms(self):
        return len(self.symbols)

RuntimeError: no validator found for <class 'numpy.ndarray'>, see `arbitrary_types_allowed` in Config

This error was thrown because *pydantic* is coded to handle certain types of data, but it cannot handle types it was not programmed to understand. However, *pydantic* does provide a useful error message to fix this.

You can configure your *pydantic* models to modify their behavior by adding a class within the `BaseModel` class explicitly called `Config`. This is not an imported object; it's just a class bearing that name. Within that class, you set class attributes that serve as the options.

```{admonition} More Config settings
:class: note
You can see all of the config settings [in the *pydantic* docs](https://pydantic-docs.helpmanual.io/usage/model_config/)
```

Our particular error is saying we need to configure our model and set `arbitrary_types_allowed`, in this case to `True`. This will tell this particular `BaseModel` to permit types that it does not naturally understand how to handle, and assume the user/programmer will handle them. Let's see what `Molecule` looks like with this set. Note: The location of the `class Config` statement does not matter, and `Config` is on a per-model basis, not a global *pydantic* configuration.

In [9]:
import numpy as np
from pydantic import BaseModel, validator

class Molecule(BaseModel):
    name: str
    charge: float
    symbols: list[str]
    coordinates: np.ndarray
        
    class Config:
        arbitrary_types_allowed = True
        
    @validator("coordinates")
    def ensure_coordinates_match_symbols(cls, coords, values):
        n_symbols = len(values["symbols"])
        if (n_coords := len(coords)) != n_symbols:  # Walrus operator (:=) for Python 3.8+
            raise ValueError(f"There must be an equal number of XYZ coordinates as there are symbols." 
                             f" There are {n_coords} coordinates and {n_symbols} symbols.")
        return coords
        
    @validator("coordinates")
    def ensure_coordinates_is_3D(cls, coords):
        if any(len(failure := inner) != 3 for inner in coords):  # Walrus operator (:=) for Python 3.8+
            raise ValueError(f"Inner coordinates must be 3D, got {failure} of length {len(failure)}")
        return coords
    
    @property
    def num_atoms(self):
        return len(self.symbols)

Our model is now configured to allow arbitrary types; no more error. Let's see what happens when we pass in our data.

In [10]:
water = Molecule(**mol_data)

ValidationError: 1 validation error for Molecule
coordinates
  instance of ndarray expected (type=type_error.arbitrary_type; expected_arbitrary_type=ndarray)

We're still getting a validation error, but it's different. *pydantic* is now telling us that the data given to `coordinates` must be of type `ndarray`. Remember there are two default levels of validation in *pydantic*: the type check, then any manually written validators. When `arbitrary_types_allowed` is configured, a type unknown to *pydantic* is not coerced; it is only checked to be an instance of the declared type. Effectively, a glorified `isinstance` check.

So to fix this, either the user has to have already cast the data to the expected type, or the developer has to preempt the type validation somehow.
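
The "glorified `isinstance` check" is not specific to NumPy. The sketch below repeats the experiment with a hypothetical plain Python class, `UnitCell`, and a hypothetical model, `Crystal`, neither of which is part of the Lesson Materials. An instance of the declared type passes through; anything else is rejected without any attempt at conversion.

```python
from pydantic import BaseModel, ValidationError


class UnitCell:
    """A plain Python class that pydantic knows nothing about (hypothetical)."""

    def __init__(self, lengths):
        self.lengths = lengths


class Crystal(BaseModel):
    cell: UnitCell

    class Config:
        arbitrary_types_allowed = True


# An actual UnitCell instance passes the check untouched
crystal = Crystal(cell=UnitCell([1.0, 1.0, 1.0]))

# Anything else is rejected; pydantic makes no attempt to build a UnitCell
try:
    Crystal(cell=[1.0, 1.0, 1.0])
    rejected = False
except ValidationError:
    rejected = True

print(type(crystal.cell).__name__, rejected)
```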

## Pre-Validators in Pydantic

Good news! You can make *pydantic* validators that run before the type validation, effectively adding a third layer to the validation stack. These are called "pre-validators" and will run before any other level of validator. The primary use case for these validators is data coercion, and that includes casting incoming data to specific types. For example, casting a list of lists to a NumPy array because we have `arbitrary_types_allowed` set.

A pre-validator is defined exactly like any other `validator`; it just has the keyword `pre=True` in its arguments. We're going to use the validator to take the `coordinates` data in and cast it to a NumPy array.

In [11]:
import numpy as np
from pydantic import BaseModel, validator

class Molecule(BaseModel):
    name: str
    charge: float
    symbols: list[str]
    coordinates: np.ndarray
        
    class Config:
        arbitrary_types_allowed = True
    
    @validator("coordinates", pre=True)
    def coord_to_numpy(cls, coords):
        try:
            coords = np.asarray(coords)
        except ValueError:
            raise ValueError(f"Could not cast {coords} to numpy array")
        return coords
        
    @validator("coordinates")
    def ensure_coordinates_match_symbols(cls, coords, values):
        n_symbols = len(values["symbols"])
        if (n_coords := len(coords)) != n_symbols:  # Walrus operator (:=) for Python 3.8+
            raise ValueError(f"There must be an equal number of XYZ coordinates as there are symbols." 
                             f" There are {n_coords} coordinates and {n_symbols} symbols.")
        return coords
        
    @validator("coordinates")
    def ensure_coordinates_is_3D(cls, coords):
        if any(len(failure := inner) != 3 for inner in coords):  # Walrus operator (:=) for Python 3.8+
            raise ValueError(f"Inner coordinates must be 3D, got {failure} of length {len(failure)}")
        return coords
    
    @property
    def num_atoms(self):
        return len(self.symbols)

Now we can see what happens when we run our model.

In [12]:
water = Molecule(**mol_data)
water.coordinates

array([[0, 0, 0],
       [1, 1, 1],
       [2, 2, 2]])
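
The ordering of the layers can be seen directly by recording what each validator receives. The sketch below uses a hypothetical `Ion` model with a single `float` field and passes the string `"1"`: the `pre=True` validator sees the raw string, while the regular validator sees the value after *pydantic*'s own type coercion. The `seen` list and the function names are only for illustration.

```python
from pydantic import BaseModel, validator

seen = []  # records the type each validator receives, in call order


class Ion(BaseModel):
    charge: float

    @validator("charge", pre=True)
    def before_type_check(cls, raw):
        seen.append(type(raw).__name__)
        return raw

    @validator("charge")
    def after_type_check(cls, value):
        seen.append(type(value).__name__)
        return value


ion = Ion(charge="1")
print(seen, ion.charge)
```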

With `coordinates` now a NumPy array, we can refine the original `validator`s. We'll condense our normal `coordinates` `validator`s down to a single one.

In [13]:
import numpy as np
from pydantic import BaseModel, validator


class Molecule(BaseModel):
    name: str
    charge: float
    symbols: list[str]
    coordinates: np.ndarray
        
    class Config:
        arbitrary_types_allowed = True
    
    @validator("coordinates", pre=True)
    def coord_to_numpy(cls, coords):
        try:
            coords = np.asarray(coords)
        except ValueError:
            raise ValueError(f"Could not cast {coords} to numpy array")
        return coords
        
    @validator("coordinates")
    def coords_length_of_symbols(cls, coords, values):
        symbols = values["symbols"]
        if (len(coords.shape) != 2) or (len(symbols) != coords.shape[0]) or (coords.shape[1] != 3):
            raise ValueError(f"Coordinates must be of shape [Number Symbols, 3], was {coords.shape}")
        return coords
    
    @property
    def num_atoms(self):
        return len(self.symbols)

In [14]:
water = Molecule(**mol_data)

In [15]:
mangle = {**mol_data, **bad_charge, **bad_coords}
water = Molecule(**mangle)

ValidationError: 2 validation errors for Molecule
charge
  value is not a valid float (type=type_error.float)
coordinates
  Coordinates must be of shape [Number Symbols, 3], was (3,) (type=value_error)

We've now upgraded our `Molecule` with more advanced data validation leaning into scientific validity, added in custom types which increase our model's usability, and configured our model to further expand our capabilities. The code is now at the Lesson Materials labeled `05_valid_pydantic_molecule.py`.

Next chapter we'll look at nesting models to allow more complicated data structures. 

Below is a supplementary section on how you can define custom, non-native types without `arbitrary_types_allowed`, giving you greater control over defining custom or even shorthand types.

## Supplemental: Defining Custom Types with Built-In Validators

In the example of this chapter, we showed how to combine `arbitrary_types_allowed` in `Config` with the `validator(..., pre=True)` to convert incoming data to the types not understood by *pydantic*. There are obvious limitations to this such as having to write a different set of validators for each Model, being limited (or at least confined) in how you can permit types through, and then having to be accepting of arbitrary types.

*pydantic* provides a separate way to write your custom class validator by extending the class in question. This can be done even to extend existing known types to augment them to special conditions. 

```{admonition} Pydantic example: Regular Expression Based String Extension
:class: note
The *pydantic* docs have an example of [validating a string that is a UK postcode](https://pydantic-docs.helpmanual.io/usage/types/#classes-with-__get_validators__), which creates a custom validator for the <code>str</code> type. Check it out for more examples.
```

Let's extend the NumPy array type to be something *pydantic* can validate without needing to use `arbitrary_types_allowed`. The main thing you need is to make a subclass of the type in question, then create a `classmethod` called `__get_validators__` that takes no arguments and yields a series of `classmethod` validation functions. But talk is cheap and examples are better.

In [16]:
import numpy as np

class ValidatableArray(np.ndarray):
    @classmethod
    def __get_validators__(cls):
        yield cls.cast_to_ndarray

    @classmethod
    def cast_to_ndarray(cls, v):
        try:
            v = np.asarray(v)
        except ValueError:
            raise ValueError(f"Could not cast {v} to NumPy Array!")

        return v

That's it. We've defined our subclass of `ndarray` called `ValidatableArray`. We added a `classmethod` called `__get_validators__` which accepts no arguments and yields the class methods that handle the validation. We only have one function here, so there is only one object to `yield`. If there were multiple objects to yield, each object would accept the validated/coerced value from the one before it. Because these are all `classmethod`s and not called on an instance of the `ValidatableArray`, we don't have to do anything with the validated value inside `__get_validators__`; it's all handled by a routine in *pydantic*.
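
To make the "each object accepts the value from the one before it" idea concrete, the sketch below yields two validators and applies them by hand. The `Coordinates` class and the `run_validators` helper are hypothetical. The helper is only a rough stand-in for the routine inside *pydantic*, not its actual implementation.

```python
import numpy as np


class Coordinates(np.ndarray):
    @classmethod
    def __get_validators__(cls):
        # Order matters: the cast runs first, the shape check second
        yield cls.cast_to_ndarray
        yield cls.require_nx3

    @classmethod
    def cast_to_ndarray(cls, v):
        return np.asarray(v, dtype=float)

    @classmethod
    def require_nx3(cls, v):
        # Receives the array produced by cast_to_ndarray
        if v.ndim != 2 or v.shape[1] != 3:
            raise ValueError(f"Expected shape (N, 3), got {v.shape}")
        return v


def run_validators(custom_type, value):
    """Rough stand-in for how the yielded validators are applied in sequence."""
    for validate in custom_type.__get_validators__():
        value = validate(value)
    return value


coords = run_validators(Coordinates, [[0, 0, 0], [1, 1, 1]])
print(coords.shape, coords.dtype)
```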

Let's apply this to our `Molecule`.

```{admonition} This won't appear in the next chapter
:class: note
The main Lesson Materials will not have this modification since this is all supplemental. Next chapter will start with the <code>05_valid_pydantic_molecule.py</code> Lesson Materials.
```

In [17]:
from pydantic import BaseModel, validator

class Molecule(BaseModel):
    name: str
    charge: float
    symbols: list[str]
    coordinates: ValidatableArray
        
    @validator("coordinates")
    def coords_length_of_symbols(cls, coords, values):
        symbols = values["symbols"]
        if (len(coords.shape) != 2) or (len(symbols) != coords.shape[0]) or (coords.shape[1] != 3):
            raise ValueError(f"Coordinates must be of shape [Number Symbols, 3], was {coords.shape}")
        return coords
    
    @property
    def num_atoms(self):
        return len(self.symbols)

In [18]:
water = Molecule(**mol_data)

In [19]:
mangle = {**mol_data, **bad_charge, **bad_coords}
water = Molecule(**mangle)

ValidationError: 2 validation errors for Molecule
charge
  value is not a valid float (type=type_error.float)
coordinates
  Coordinates must be of shape [Number Symbols, 3], was (3,) (type=value_error)

We removed the `Config` since we no longer are handling arbitrary types: we're handling the explicit type we defined. We also removed the `pre=True` validator on `coordinates` because that work got pushed to the `ValidatableArray`. That new subclass we wrote already preempts our custom `coords_length_of_symbols` `validator` because it operates at the same time as the type annotation check, which comes before custom validators in order of operations.

If we wanted to make a custom schema output for our new type, we would need to add another class method called `__modify_schema__`. However, please refer to the [*pydantic* docs](https://pydantic-docs.helpmanual.io/usage/types/#classes-with-__get_validators__) for more details.

## Supplemental: Defining Custom NumPy Type AND Setting Data Type (*dtype*)

It is also possible to set the NumPy array `dtype` as part of the type checking without having to define multiple custom types. This approach is not related to *pydantic* per se, but is a showcase of chaining several very advanced Python topics together.

In the previous Supplemental, we showed how to write a subclass with `__get_validators__` to define a NumPy `ndarray` type in *pydantic*. We cast the input data to a NumPy array with `np.asarray`. That function can also accept a `dtype=...` argument where you can specify the type of data the array will be. How would you support arbitrarily setting the `dtype`?

There are several, equally acceptable and perfectly valid, approaches to this. 

### Multiple Validators

One option would be to make multiple validator types and use the one you need. There are several ways to do this; the first is to just make multiple classes.

In [20]:
class IntArray(np.ndarray):
    @classmethod
    def __get_validators__(cls):
        yield cls.cast_to_ndarray

    @classmethod
    def cast_to_ndarray(cls, v):
        try:
            v = np.asarray(v, dtype=int)
        except ValueError:
            raise ValueError(f"Could not cast {v} to NumPy Array!")
        return v
    
class FloatArray(np.ndarray):
    @classmethod
    def __get_validators__(cls):
        yield cls.cast_to_ndarray

    @classmethod
    def cast_to_ndarray(cls, v):
        try:
            v = np.asarray(v, dtype=float)
        except ValueError:
            raise ValueError(f"Could not cast {v} to NumPy Array!")
        return v

In [21]:
class IntMolecule(Molecule):
    coordinates: IntArray
        
class FloatMolecule(Molecule):
    coordinates: FloatArray
        
print(IntMolecule(**mol_data).coordinates)
print(FloatMolecule(**mol_data).coordinates)

[[0 0 0]
 [1 1 1]
 [2 2 2]]
[[0. 0. 0.]
 [1. 1. 1.]
 [2. 2. 2.]]


This is a valid approach and can be dropped in when needed. However, it involves code duplication.

We can cut down on the work by defining a top-level class with a class attribute for the `dtype`, then subclassing it and overriding that attribute as needed.

In [22]:
class BaseArray(np.ndarray):
    _dtype = None
    
    @classmethod
    def __get_validators__(cls):
        yield cls.cast_to_ndarray

    @classmethod
    def cast_to_ndarray(cls, v):
        try:
            v = np.asarray(v, dtype=cls._dtype)
        except ValueError:
            raise ValueError(f"Could not cast {v} to NumPy Array!")
        return v

class IntArray(BaseArray):
    _dtype = int
    
class FloatArray(BaseArray):
    _dtype = float

In [23]:
class IntMolecule(Molecule):
    coordinates: IntArray
        
class FloatMolecule(Molecule):
    coordinates: FloatArray
        
print(IntMolecule(**mol_data).coordinates)
print(FloatMolecule(**mol_data).coordinates)

[[0 0 0]
 [1 1 1]
 [2 2 2]]
[[0. 0. 0.]
 [1. 1. 1.]
 [2. 2. 2.]]


### Make an on Demand Typer Function

Another option is to make a function that creates types on demand.

In [24]:
class BaseArray(np.ndarray):
    _dtype = None
    
    @classmethod
    def __get_validators__(cls):
        yield cls.cast_to_ndarray

    @classmethod
    def cast_to_ndarray(cls, v):
        try:
            v = np.asarray(v, dtype=cls._dtype)
        except ValueError:
            raise ValueError(f"Could not cast {v} to NumPy Array!")
        return v

def array_typer(dtype):
    class GeneratedType(BaseArray):
        _dtype = dtype
    return GeneratedType

In [25]:
class IntMolecule(Molecule):
    coordinates: array_typer(int)
        
class FloatMolecule(Molecule):
    coordinates: array_typer(float)
        
print(IntMolecule(**mol_data).coordinates)
print(FloatMolecule(**mol_data).coordinates)

[[0 0 0]
 [1 1 1]
 [2 2 2]]
[[0. 0. 0.]
 [1. 1. 1.]
 [2. 2. 2.]]


But this has the problem of generating a new class on every call, and its type schema will always be the "GeneratedType" class. This isn't a problem most of the time, but it can be a little confusing to suddenly see function calls in type annotations instead of the normal types and square brackets.
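
Both complaints can be softened with standard-library tools. The sketch below is one possible variation, not part of the Lesson Materials: `functools.lru_cache` returns the same class for repeated calls, and the three-argument form of `type` gives each generated class a descriptive name. The `BaseArray` here is a reduced stand-in for the one defined above, and the naming scheme assumes `dtype` is a Python type with a `__name__`.

```python
from functools import lru_cache


class BaseArray:
    """Stand-in for the BaseArray above, reduced to the part that matters here."""
    _dtype = None


@lru_cache(maxsize=None)
def array_typer(dtype):
    # type(name, bases, namespace) builds the class with a descriptive name;
    # assumes `dtype` is a Python type such as int or float
    name = f"{dtype.__name__.capitalize()}Array"
    return type(name, (BaseArray,), {"_dtype": dtype})


print(array_typer(int).__name__)
print(array_typer(int) is array_typer(int))
```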

### Metaclass for On-the-Fly Assignment

One option MolSSI deploys in our [`QCElemental` package](https://github.com/MolSSI/QCElemental/blob/295642189fe3c4d0812142c0304d8ae9c8674d4c/qcelemental/models/types.py#L39) is to use a [Python Metaclass](https://docs.python.org/3/reference/datamodel.html#metaclasses) as a way to define a class generator whose properties are set dynamically, then usable by the class. 

```{admonition} Here There be Forbidden Magics
:class: warning
“Metaclasses are deeper magic than 99% of users should ever worry about. If you wonder whether you need them, you don’t (the people who actually need them know with certainty that they need them, and don’t need an explanation about why).”

— Tim Peters, Author of [Zen of Python, PEP 20](https://peps.python.org/pep-0020/)
```

Metaclasses are usually not something you want to touch, because you don't need to. The above methods provide a fine way to generate type hints dynamically. However, if you want to be fancy, you can use a Metaclass. The best primer I, Levi Naden, have found on Metaclasses at the time of writing this section (Fall 2022) was through [this Stack Overflow answer](https://stackoverflow.com/a/6581949/10364409).

For our example, we're going to define a base typed array, define a Metaclass which abuses `__getitem__` to treat the `[ ]` arguments of the "type hint" as a type assignment, then have the metaclass use that type specification as a settable parameter to generate classes on the fly.

Let's see that in code, where the comments mark each step described in the last paragraph.

In [26]:
import numpy as np

# Base typed array
class TypedArray(np.ndarray):    
    _dtype = None
    
    @classmethod
    def __get_validators__(cls):
        yield cls.validate

    @classmethod
    def validate(cls, v):
        try:
            v = np.asarray(v, dtype=cls._dtype)
        except ValueError:
            raise ValueError("Could not cast {} to NumPy Array!".format(v))
        return v

# A Metaclass which abuses `__getitem__` to treat our `[ ]` arguments for "type hint" as a type assignment
class ArrayMeta(type):
    def __getitem__(self, dtype):
        # Feed the metaclass in with that type specification as a settable parameter
        return type("Array", (TypedArray,), {"_dtype": dtype})

# Generate classes on the fly.
class Array(np.ndarray, metaclass=ArrayMeta):
    pass

Then in practice, we just "index" the `Array` class as needed.

In [27]:
class IntMolecule(Molecule):
    coordinates: Array[int]
        
class FloatMolecule(Molecule):
    coordinates: Array[float]
        
print(IntMolecule(**mol_data).coordinates)
print(FloatMolecule(**mol_data).coordinates)

[[0 0 0]
 [1 1 1]
 [2 2 2]]
[[0. 0. 0.]
 [1. 1. 1.]
 [2. 2. 2.]]


There are also a couple downsides to this. You have to understand Metaclasses, which we showed a quote warning against. You cannot just use a bare `Array` because then it's an arbitrary type. Lastly, if you want default `dtype` handling you have to use the `TypedArray` as the type annotation or `Array[None]`, which can be a bit confusing.
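
For comparison, Python also offers `__class_getitem__` (PEP 560), which lets a class respond to `[ ]` without a metaclass. The sketch below is an alternative illustration only; it is not what `QCElemental` does, and it has not been exercised against *pydantic* here. The generated class name is an arbitrary choice and assumes `dtype` is a Python type with a `__name__`.

```python
import numpy as np


class Array(np.ndarray):
    _dtype = None  # None lets NumPy infer the dtype

    @classmethod
    def __get_validators__(cls):
        yield cls.validate

    @classmethod
    def validate(cls, v):
        try:
            return np.asarray(v, dtype=cls._dtype)
        except ValueError:
            raise ValueError(f"Could not cast {v} to NumPy Array!")

    def __class_getitem__(cls, dtype):
        # Called for Array[dtype]; builds a subclass carrying that dtype
        return type(f"Array[{dtype.__name__}]", (cls,), {"_dtype": dtype})


print(Array[float].validate([[0, 0, 0], [1, 1, 1]]).dtype)
print(Array[int].__name__)
```

In this variation the bare `Array` class still carries `__get_validators__` and the default `_dtype = None`, so one class covers both the default and the indexed case.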

### Do what makes sense, and only if you need to

All of these methods are equally valid, with upsides and downsides alike. Your use case may not even need `dtype` specification, and you can just accept the normal NumPy handling of casting to an array plus your own custom `validator` functions to make sure the data look correct. Hopefully, though, this supplemental section has given you ideas for your own code design and for interesting and helpful things you can do.