The following is adapted from the pandera [README](https://github.com/unionai-oss/pandera).

**Key Features**

1. Define a schema once and use it to validate different dataframe types including `pandas`, `dask`, `modin`, and `pyspark.pandas`.
1. `Check` the types and properties of columns in a `pd.DataFrame` or values in a `pd.Series`.
1. Perform more complex statistical validation like *hypothesis testing*.
1. Seamlessly integrate with existing data analysis/processing pipelines via function *decorators*.
1. Define schema models with the class-based API with `pydantic`-style syntax and validate dataframes using the typing syntax.
1. Synthesize data from schema objects for property-based testing with pandas data structures.
1. Lazily validate dataframes so that all validation rules are executed before an error is raised.
1. Integrate with a rich ecosystem of Python tools like `pydantic`, `fastapi` and `mypy`.


### Quick Start

```python
import pandas as pd
import pandera as pa


# data to validate
df = pd.DataFrame(
    {
        "column1": [1, 4, 0, 10, 9],
        "column2": [-1.3, -1.4, -2.9, -10.1, -20.4],
        "column3": ["value_1", "value_2", "value_3", "value_2", "value_1"],
    }
)

# define schema
schema = pa.DataFrameSchema(
    {
        "column1": pa.Column(int, checks=pa.Check.le(10)),
        "column2": pa.Column(float, checks=pa.Check.lt(-1.2)),
        "column3": pa.Column(
            str,
            checks=[
                pa.Check.str_startswith("value_"),
                # define custom checks as functions that take a Series as input
                # and output a boolean or a boolean Series
                pa.Check(lambda s: s.str.split("_", expand=True).shape[1] == 2),
            ],
        ),
    }
)

# Calling the schema is shorthand for schema.validate(df): it returns the
# validated dataframe, or raises pa.errors.SchemaError if a check fails.
validated_df = schema(df)
display(validated_df)  # display() is available in IPython/Jupyter sessions
```

|   | column1 | column2 | column3 |
|---|---------|---------|---------|
| 0 | 1       | -1.3    | value_1 |
| 1 | 4       | -1.4    | value_2 |
| 2 | 0       | -2.9    | value_3 |
| 3 | 10      | -10.1   | value_2 |
| 4 | 9       | -20.4   | value_1 |


### Schema Model

`pandera` also provides an *alternative API* for expressing schemas inspired by `dataclasses` and `pydantic`.

The equivalent `SchemaModel` for the above `DataFrameSchema` would be:

```python
import pandera as pa
from pandera.typing import Series


class Schema(pa.SchemaModel):

    column1: Series[int] = pa.Field(le=10)
    column2: Series[float] = pa.Field(lt=-1.2)
    column3: Series[str] = pa.Field(str_startswith="value_")

    @pa.check("column3")
    def column_3_check(cls, series: Series[str]) -> Series[bool]:
        """Check that values have two elements after being split with '_'"""
        return series.str.split("_", expand=True).shape[1] == 2  # pyright: ignore


# validate the `df` defined in the Quick Start; returns the validated dataframe
Schema.validate(df)
```

|   | column1 | column2 | column3 |
|---|---------|---------|---------|
| 0 | 1       | -1.3    | value_1 |
| 1 | 4       | -1.4    | value_2 |
| 2 | 0       | -2.9    | value_3 |
| 3 | 10      | -10.1   | value_2 |
| 4 | 9       | -20.4   | value_1 |
