Manual Data Validation
==================

```{important} Starting File: CHAPTER_2_WAFFLES
This chapter will start from the CHAPTER_2_WAFFLES and end on the CHAPTER_3_WAFFLES.
```

So far, we have looked at type hints in Python as annotations on arguments, the `dataclass` decorator to reduce code and automatically assign arguments to attributes of the same name, and how to make our inputs more stable. Everything we've done up to now has made our code easier to set up, read, and work with. However, we have not done anything to validate the inputs.

No matter your field of work, validation of data is going to happen. There is no getting around it, especially in the scientific field. It may be offloaded, automated, or even trivially simple, but it will happen, so we may as well get better at it.

```{admonition} Compatibility with Python 3.8 and below
:class: note
If you have Python 3.8 or below, you will need to import container types such as `List`, `Tuple`, `Dict`, etc. from the `typing` module instead of using the native types `list`, `tuple`, `dict`, etc. This chapter assumes Python 3.9 or greater; however, both approaches work in Python 3.9 and later, and the `typing` versions are 1:1 replacements that use the capitalized form of the same name.
```
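As a quick illustration of the note above, here is a minimal sketch that annotates the same function both ways. The function names `count_symbols_typing` and `count_symbols_native` are hypothetical and exist only for this comparison; the second form assumes Python 3.9 or greater.

```python
from typing import List  # only required for the Python 3.8-and-below style


def count_symbols_typing(symbols: List[str]) -> int:
    # Works on Python 3.8 and below, and still works on 3.9+
    return len(symbols)


def count_symbols_native(symbols: list[str]) -> int:
    # Native container type as a hint; assumes Python 3.9+
    return len(symbols)


print(count_symbols_typing(["H", "H", "O"]))  # 3
print(count_symbols_native(["H", "H", "O"]))  # 3
```

Both functions behave identically at runtime; only the spelling of the hint differs.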

## Dataclass `__post_init__` method

Let's take a look at our code as we left it from last chapter.

In [1]:
from dataclasses import dataclass
from typing import Union

@dataclass
class Molecule:
    name: str
    charge: Union[float, int]
    symbols: list[str]
    coordinates: list[list[float]]
        
    @property
    def num_atoms(self):
        return len(self.symbols)
        
    def __str__(self):
        return f"name: {self.name}\ncharge: {self.charge}\nsymbols: {self.symbols}"

In [2]:
mol_data = {
    "coordinates": [[0, 0, 0]], 
    "symbols": ["H", "H", "O"], 
    "charge": 0.0, 
    "name": "water"
}

We've seen several examples so far of feeding in data of the wrong type and not getting errors. Now we're going to do actual validation on our `dataclass`. Although there are third-party libraries to do some of this, you, the developer, still have to understand the scientific use case well enough to know what counts as valid, even beyond the type checking itself.
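As a reminder of why validation is needed, the sketch below builds an instance from values that clearly do not match the hints. The class name `UncheckedMolecule` and the bad values are hypothetical, chosen only so this example does not shadow our real `Molecule`.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class UncheckedMolecule:
    name: str
    charge: Union[float, int]
    symbols: list[str]
    coordinates: list[list[float]]


# None of these values match their type hints, yet no error is raised:
# type hints are not enforced at runtime.
bad = UncheckedMolecule(name=789, charge="neutral", symbols=None, coordinates="nowhere")
print(type(bad.name).__name__)  # int
```

The instance is created without complaint, and the problem would only surface later when some other code tries to use the attributes.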

We first have to know how to access the data on input. The `dataclass` decorator generates the `__init__` method for us, which is where we might otherwise expect to write our validation code or call our validation function(s). To fill that gap, `dataclass` supports a second method called `__post_init__`, which the generated `__init__` calls automatically at its end if the method is defined. This method is basically free space for the developer to do whatever they want with the `dataclass`, as if they were writing an `__init__`, just after the instance attributes have been assigned.

In [9]:
@dataclass
class Molecule:
    name: str
    charge: Union[float, int]
    symbols: list[str]
    coordinates: list[list[float]]
    
    def __post_init__(self):
        # Do whatever you want here, all instance attributes will be available.
        # Note: `mol_data` is the module-level dict defined above; its keys match our field names.
        print(f"{[getattr(self, thing) for thing in mol_data.keys()]}")
        print("Post Init Ran")
        
    @property
    def num_atoms(self):
        return len(self.symbols)
        
    def __str__(self):
        return f"name: {self.name}\ncharge: {self.charge}\nsymbols: {self.symbols}"

In [10]:
water = Molecule(**mol_data)

[[[0, 0, 0]], ['H', 'H', 'O'], 0.0, 'water']
Post Init Ran


What you can see in the above code is that `__post_init__` did run and has access to all of the attributes we're working with in this problem. If we wanted to recreate the `__init__`-like settings we had in the BASE_FILE_WAFFLES, we could instead set `self.num_atoms = len(self.symbols)` in `__post_init__` in place of the property, but we'll leave it as a `property`. Let's actually delve into some validation, starting with simple type validation.
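For completeness, here is a minimal sketch of the alternative we are not taking: storing `num_atoms` as a regular attribute set in `__post_init__`. The class name `CountedMolecule` is hypothetical, and the class is trimmed to a single input field for brevity. The read-only property has to be removed for this to work, since assigning to a property that has no setter raises an `AttributeError`.

```python
from dataclasses import dataclass, field


@dataclass
class CountedMolecule:
    symbols: list[str]
    # init=False keeps num_atoms out of the generated __init__ signature
    num_atoms: int = field(init=False)

    def __post_init__(self):
        # Runs right after the generated __init__ has assigned self.symbols
        self.num_atoms = len(self.symbols)


water = CountedMolecule(symbols=["H", "H", "O"])
print(water.num_atoms)  # 3

# The stored value is computed once and does not follow later changes
water.symbols.append("H")
print(water.num_atoms)  # 3
```

The last two lines show the trade-off: a stored attribute can go stale when `symbols` changes, while the `property` recomputes the count on every access.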

## Manually validating types

Although there are external libraries for type and value validation of data, we're going to go through the manual process in this chapter to show all the nuances that have to be considered. Even the most sophisticated type-checking libraries still need the programmer to tell them what the correct types are and whether the values of the incoming data are correct for the application.
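To make the difference between type correctness and value correctness concrete, consider our `mol_data`: every entry has a plausible type, yet it lists three symbols and only one coordinate. The sketch below shows one way a check for that could look. The class name `CheckedMolecule` and the specific rule are assumptions made for illustration, not the validation this chapter settles on.

```python
from dataclasses import dataclass


@dataclass
class CheckedMolecule:
    symbols: list[str]
    coordinates: list[list[float]]

    def __post_init__(self):
        # Assumed rule for illustration: one coordinate per atomic symbol
        if len(self.symbols) != len(self.coordinates):
            raise ValueError(
                f"Got {len(self.symbols)} symbols but {len(self.coordinates)} coordinates"
            )


try:
    CheckedMolecule(symbols=["H", "H", "O"], coordinates=[[0, 0, 0]])
except ValueError as err:
    print(f"Rejected: {err}")
```

No type checker could flag this input on its own, because the rule comes from what a molecule is rather than from the types involved.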