# <img src="https://raw.githubusercontent.com/OlivierNDO/framecheck/main/images/logo.png" alt="FrameCheck" width="512" height="125">

# FrameCheck: Pandas DataFrame Validation

**FrameCheck** is a lightweight, flexible validation library for pandas DataFrames.

Instead of writing dozens of repetitive checks or dealing with complex schema configurations, FrameCheck offers a clean, fluent API that makes validation both readable and maintainable.

Key features:
- Simple, chainable validation methods
- Column and DataFrame-level validation
- Support for both error and warning-level assertions
- No configuration files or decorators

This notebook demonstrates how FrameCheck can help you implement robust validation with minimal code.

## Setup

In [None]:
# Install framecheck
!pip install framecheck -q

# Import required packages
import logging
import pandas as pd
from framecheck import FrameCheck, register_check_function

## Sample Data: Model Output Validation

Let's create a dataset representing ML model output that requires validation:

In [None]:
df = pd.DataFrame({
    'transaction_id': ['TXN1001', 'TXN1002', 'TXN1003'],
    'user_id': [501, 502, 503],
    'transaction_time': ['2024-04-15 08:23:11', '2024-04-15 08:45:22', '2024-04-15 09:01:37'],
    'model_score': [0.0, 0.92, 0.95],
    'model_version': ['v2.1.0', 'v2.1.0', 'v2.1.0'],
    'flagged_for_review': [False, True, False]
})

<div style="background-color: #f8f9fa; padding: 15px; border-left: 5px solid #4285f4; border-radius: 4px; margin: 15px 0;">
  <p style="margin: 0; font-size: 14px;">
    <strong>Optional:</strong> You may pass a logger to FrameCheck as of version 0.5.0, but it's not required.
  </p>
</div>

In [None]:
logger = logging.getLogger("model_validation")
logger.setLevel(logging.INFO)
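
If the messages you expect don't appear, the cause may be that the logger has no handler attached. In a plain Python session, a logger with no handlers falls back to a last-resort handler that only emits `WARNING` and above. The sketch below uses only the standard library and attaches a `StreamHandler`. The format string is an arbitrary choice for illustration, not something FrameCheck requires.

```python
import logging
import sys

# Same logger name as in the cell above.
logger = logging.getLogger("model_validation")
logger.setLevel(logging.INFO)

# Send records to stdout so they show up in the notebook output.
handler = logging.StreamHandler(sys.stdout)
handler.setFormatter(logging.Formatter("%(levelname)s %(name)s: %(message)s"))
logger.addHandler(handler)

# Quick check that INFO records are now emitted.
logger.info("logger is configured")
```

Re-running this cell adds another handler each time, so run it once per session or clear `logger.handlers` first.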

## Validation Requirements

This data needs to meet these conditions:
- `transaction_id`: follows the TXN format (`TXN` followed by at least four digits)
- `user_id`: positive integer
- `transaction_time`: valid datetime
- `model_score`: float between 0 and 1 **(warn if equal to 0)**
- `model_version`: string
- `flagged_for_review`: boolean
- No missing values
- Business rule: high scores (>0.9) must be flagged
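
To make a few of these conditions concrete, here is a rough plain-pandas sketch of the business rule, the zero-score warning, and the ID format check, run against a trimmed copy of the sample data. It is not how FrameCheck works internally. It only shows which rows each rule singles out, and the variable names are made up for illustration.

```python
import pandas as pd

# Trimmed copy of the sample data: only the columns these three rules need.
df = pd.DataFrame({
    'transaction_id': ['TXN1001', 'TXN1002', 'TXN1003'],
    'model_score': [0.0, 0.92, 0.95],
    'flagged_for_review': [False, True, False],
})

# Business rule: a score above 0.9 must be flagged for review.
violates_rule = (df['model_score'] > 0.9) & ~df['flagged_for_review']

# Warning-level condition: a score of exactly 0 is allowed but suspicious.
suspicious_zero = df['model_score'] == 0.0

# Format check: "TXN" followed by at least four digits.
bad_id = ~df['transaction_id'].str.match(r'^TXN\d{4,}$')

print(df.loc[violates_rule, 'transaction_id'].tolist())    # ['TXN1003']
print(df.loc[suspicious_zero, 'transaction_id'].tolist())  # ['TXN1001']
print(int(bad_id.sum()))                                   # 0
```

Written by hand, every rule needs its own mask and its own reporting code, which is the repetition a validator object is meant to remove.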

Define a FrameCheck validator and validate the data:

In [None]:
# Note: a lambda also works here, but a validator that uses one cannot be saved
@register_check_function(name="high_score_is_flagged")
def high_score_is_flagged(row):
    return row['model_score'] <= 0.9 or row['flagged_for_review'] is True

model_score_validator = (
    FrameCheck(logger=logger)
    .column('transaction_id', type='string', regex=r'^TXN\d{4,}$')
    .column('user_id', type='int', min=1)
    .column('transaction_time', type='datetime', before='now')
    .column('model_score', type='float', min=0.0, max=1.0)
    .column('model_score', type='float', not_in_set=[0.0], warn_only=True)
    .column('model_version', type='string')
    .column('flagged_for_review', type='bool')
    .custom_check(
        high_score_is_flagged,
        "flagged_for_review must be True when model_score > 0.9"
    )
    .not_null()
    .not_empty()
    .only_defined_columns()
)

result = model_score_validator.validate(df)

if not result.is_valid:
    print(result.summary())

Optionally, save the validator as a serialized JSON file:

In [None]:
model_score_validator.save('model_score_validator.json')

Load and use the saved validator:

In [None]:
reloaded_validator = FrameCheck().load('model_score_validator.json')
result = reloaded_validator.validate(df)

Use `.info()` to see which checks the loaded validator performs:

In [None]:
reloaded_validator.info()

## Validation Results

FrameCheck shows two issues:

**Warning:**
- `model_score` contains value 0.0 (suspicious but allowed)

**Error:**
- Transaction with score > 0.9 not flagged for review (violates business rule)

## Using Validation Results

Get a summary of all validation issues:

In [None]:
print(result.summary())

Identify the invalid rows:

In [None]:
invalid_rows = result.get_invalid_rows(df)
invalid_rows

Access the errors and warnings directly (by default they are already logged or printed to the console):

In [None]:
result.errors

In [None]:
result.warnings

## Raising Exceptions

Use `.raise_on_error()` to raise an exception when the data is invalid.

In this example, we require that `id` be non-null, unique, and match the pattern `A` followed by three digits.

In [None]:
simple_df = pd.DataFrame({
    'id': ['A001', 'A001', 'A003', None],
    'value': [5, -1, 10, 7]
})

strict_validator = (
    FrameCheck()
    .column('id', type='string', regex=r'^A\d{3}$', not_null=True)
    .unique(columns=['id'])
    .raise_on_error()
)

# This will raise a ValueError with a detailed validation message
strict_validator.validate(simple_df)
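
In a pipeline you will usually want to catch that exception rather than let it stop the run. The sketch below shows the try/except shape using a hypothetical plain-pandas stand-in, `check_ids`, which is not part of FrameCheck and covers only the null and uniqueness rules. Since the comment above says `validate` raises a `ValueError`, the same pattern should apply if you call `strict_validator.validate(simple_df)` inside the `try` block instead.

```python
import pandas as pd

simple_df = pd.DataFrame({
    'id': ['A001', 'A001', 'A003', None],
    'value': [5, -1, 10, 7],
})

def check_ids(frame):
    """Hypothetical stand-in: raise ValueError if `id` has nulls or duplicates."""
    problems = []
    null_count = int(frame['id'].isna().sum())
    if null_count:
        problems.append(f"{null_count} null id value(s)")
    # keep=False marks every member of a duplicated group, not just the repeats.
    is_dup = frame['id'].notna() & frame['id'].duplicated(keep=False)
    dup_ids = frame.loc[is_dup, 'id']
    if not dup_ids.empty:
        problems.append(f"duplicate id value(s): {sorted(dup_ids.unique())}")
    if problems:
        raise ValueError("; ".join(problems))

try:
    check_ids(simple_df)
    message = None
except ValueError as exc:
    # Handle the failure here: log it, skip the batch, alert someone, etc.
    message = str(exc)
    print(f"Validation failed: {message}")
```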

## Design Patterns in Production

One of the framecheck principles is `No configuration files. Ever.`

To keep your codebase clean, you can define your FrameCheck objects in a module and import them.

> Pattern 1: Define validators in a module
-------------------------------------------

#### validators.py

```python

import logging
from framecheck import FrameCheck
logger = logging.getLogger('main') # logger optional but recommended for prod
# ... configure logger ...

price_validator = (
    FrameCheck(logger=logger)
    .column('item_id', type='string')
    .column('price', type='float', min=0)
    .not_null()
)
```

#### main.py (or wherever)

```python
from validators import price_validator

result = price_validator.validate(df)
```

> Pattern 2: Save and load serialized object
-------------------------------------------

Alternatively, as of version 0.4.4, you can save a FrameCheck object to a file:

```python
price_validator.save('price_validator.json')
```

...and then load it:

```python
price_validator = FrameCheck.load('price_validator.json')
```