
# STEP 10: Final Validation (Complete Pandas Guide)

This notebook covers the **essential validation checks** to run
AFTER data cleaning and BEFORE analysis, reporting, or ML.

Focus: **data quality, consistency, correctness, and readiness**.


In [None]:

import pandas as pd
import numpy as np


## 1. Sample Cleaned Dataset

In [None]:

df = pd.DataFrame({
    "order_id": [1001, 1002, 1003, 1004],
    "customer": ["Alice", "Bob", "Charlie", "David"],
    "amount": [2500.0, 1800.0, 2200.0, 3000.0],
    "city": ["Mumbai", "Delhi", "Mumbai", "Pune"],
    "order_date": pd.to_datetime(["2024-01-01", "2024-01-02", "2024-01-03", "2024-01-04"])
})
df


## 2. Final Shape & Structure Check

In [None]:

print(df.shape)  # (rows, columns); printed explicitly because a cell only displays its last expression
df.info()


## 3. Check Missing Values (Should be Zero)

In [None]:

df.isnull().sum()
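
When the null count is not zero, it helps to see which columns and rows are affected. The sketch below uses a small hypothetical frame named `sample` (not the notebook's `df`) with one value deliberately removed. The variable names are illustrative assumptions.

```python
import pandas as pd

# Hypothetical stand-in for a cleaned dataset, with one amount deliberately missing
sample = pd.DataFrame({
    "order_id": [1001, 1002, 1003],
    "amount": [2500.0, None, 2200.0],
    "city": ["Mumbai", "Delhi", "Mumbai"],
})

# Count nulls per column and keep only the columns that still contain any
null_counts = sample.isnull().sum()
columns_with_nulls = null_counts[null_counts > 0]

# Rows that contain at least one missing value, for manual inspection
rows_with_nulls = sample[sample.isnull().any(axis=1)]

print(columns_with_nulls)
print(rows_with_nulls)
```

An empty `columns_with_nulls` is the result to expect on a fully cleaned dataset.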


## 4. Validate Data Types

In [None]:

df.dtypes
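
Reading `df.dtypes` by eye works for small frames. A programmatic comparison is one possible alternative. The sketch below assumes the expected kinds are integer IDs, float amounts, and datetime order dates; the helper `dtype_mismatches` and the frame `df_check` are hypothetical names introduced for illustration.

```python
import pandas as pd

# Hypothetical frame with the same column kinds as the sample dataset
df_check = pd.DataFrame({
    "order_id": [1001, 1002],
    "amount": [2500.0, 1800.0],
    "order_date": pd.to_datetime(["2024-01-01", "2024-01-02"]),
})

def dtype_mismatches(frame):
    """Return the columns whose dtype does not match the assumed expectation."""
    # Assumed expectations: one pandas dtype predicate per column
    checks = {
        "order_id": pd.api.types.is_integer_dtype,
        "amount": pd.api.types.is_float_dtype,
        "order_date": pd.api.types.is_datetime64_any_dtype,
    }
    return [col for col, check in checks.items() if not check(frame[col])]

print(dtype_mismatches(df_check))

# Amounts stored as text should be reported as a mismatch
as_text = df_check.assign(amount=["2500", "1800"])
print(dtype_mismatches(as_text))
```

Using predicates rather than exact dtype strings keeps the check tolerant of details such as `int32` versus `int64`.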


## 5. Check Duplicate Records

In [None]:

df.duplicated().sum()
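
`df.duplicated()` only flags rows that match in every column. A key such as `order_id` can still repeat when another column differs. The sketch below, on a hypothetical `orders` frame, shows both checks side by side.

```python
import pandas as pd

# Hypothetical data: order 1002 appears twice with different amounts
orders = pd.DataFrame({
    "order_id": [1001, 1002, 1002, 1003],
    "amount": [2500.0, 1800.0, 1850.0, 2200.0],
})

# Full-row duplicates: rows identical in every column
full_row_duplicates = orders.duplicated().sum()

# Key duplicates: every row whose order_id occurs more than once
key_duplicates = orders[orders.duplicated(subset=["order_id"], keep=False)]

print("Full-row duplicates:", full_row_duplicates)
print(key_duplicates)
```

Passing `keep=False` marks all occurrences of a repeated key, which makes the conflicting rows easier to compare.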


## 6. Validate Business Rules

In [None]:

# Example business rules: each check should print True
print("Amounts non-negative:", df['amount'].min() >= 0)
print("Order IDs unique:", df['order_id'].is_unique)
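
A bare True or False says whether a rule holds, but not which rows break it. One possible extension, sketched on a hypothetical `orders` frame with two planted violations, collects the offending rows and a per-rule summary. The rule names in `rule_results` are illustrative assumptions.

```python
import pandas as pd

# Hypothetical data with a negative amount and a repeated order_id
orders = pd.DataFrame({
    "order_id": [1001, 1002, 1003, 1003],
    "amount": [2500.0, -50.0, 2200.0, 3000.0],
})

# Rows that violate each rule
negative_amounts = orders[orders["amount"] < 0]
repeated_ids = orders[orders["order_id"].duplicated(keep=False)]

# Summary of pass/fail per rule
rule_results = {
    "amount_non_negative": negative_amounts.empty,
    "order_id_unique": bool(orders["order_id"].is_unique),
}

print(rule_results)
print(negative_amounts)
print(repeated_ids)
```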


## 7. Statistical Sanity Check

In [None]:

df.describe()
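
The quartiles reported by `describe()` can also drive a simple outlier flag. The sketch below applies the common 1.5 × IQR rule (Tukey's fences) to a hypothetical `amounts` series. The rule and its 1.5 multiplier are assumptions chosen for illustration, not a requirement of this guide.

```python
import pandas as pd

# Hypothetical amounts with one implausibly large value
amounts = pd.Series([2500.0, 1800.0, 2200.0, 3000.0, 250000.0], name="amount")

# Quartiles, matching the 25% and 75% rows of describe()
q1 = amounts.quantile(0.25)
q3 = amounts.quantile(0.75)
iqr = q3 - q1

# Values outside the fences are flagged for review, not removed automatically
lower = q1 - 1.5 * iqr
upper = q3 + 1.5 * iqr
outliers = amounts[(amounts < lower) | (amounts > upper)]

print(f"Fences: [{lower}, {upper}]")
print(outliers)
```

A flagged value is a prompt to investigate; whether it is an error depends on the business context.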


## 8. Validate Categorical Values

In [None]:

print(df['city'].unique())  # printed explicitly; only the last expression is displayed
df['city'].value_counts()
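
If a reference list of valid categories exists, unexpected values can be found directly instead of by scanning the output. The sketch below assumes a hypothetical `allowed_cities` set and a `cities` series containing a casing/whitespace variant and an unlisted name.

```python
import pandas as pd

# Assumed reference list of valid city names (hypothetical)
allowed_cities = {"Mumbai", "Delhi", "Pune"}

# Hypothetical column with two problem values
cities = pd.Series(["Mumbai", "Delhi", "mumbai ", "Pune", "Bombay"], name="city")

# Values not present in the allowed set, compared exactly as stored
unexpected = cities[~cities.isin(allowed_cities)]

print(unexpected)
```

The comparison is exact, so stray whitespace or different casing counts as a mismatch.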


## 9. Date Range Validation

In [None]:

print("Earliest order:", df['order_date'].min())
print("Latest order:", df['order_date'].max())
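
The minimum and maximum are most useful when compared against the window the data is supposed to cover. The sketch below assumes a hypothetical valid window of calendar year 2024 and lists the dates that fall outside it.

```python
import pandas as pd

# Hypothetical order dates, two of them outside the expected window
order_dates = pd.Series(
    pd.to_datetime(["2024-01-01", "2024-01-04", "2023-12-31", "2031-05-01"]),
    name="order_date",
)

# Assumed valid window; both bounds are inclusive
earliest_allowed = pd.Timestamp("2024-01-01")
latest_allowed = pd.Timestamp("2024-12-31")

# between() is inclusive on both ends by default
out_of_range = order_dates[~order_dates.between(earliest_allowed, latest_allowed)]

print(out_of_range)
```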


## 10. Memory & Performance Check

In [None]:

df.memory_usage(deep=True)
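
For low-cardinality text columns such as `city`, converting to the `category` dtype is one common way to reduce the footprint that `memory_usage(deep=True)` reports. The sketch below compares both representations on a hypothetical 3,000-row series; the actual savings depend on the data and the pandas version.

```python
import pandas as pd

# Hypothetical column: three distinct cities repeated 1,000 times each
city_as_text = pd.Series(["Mumbai", "Delhi", "Pune"] * 1000, name="city")
city_as_category = city_as_text.astype("category")

# deep=True includes the memory held by the string values themselves
bytes_text = city_as_text.memory_usage(deep=True)
bytes_category = city_as_category.memory_usage(deep=True)

print("Text bytes:", bytes_text)
print("Category bytes:", bytes_category)
```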


## 11. Schema Validation (Optional)

In [None]:

expected_columns = ['order_id', 'customer', 'amount', 'city', 'order_date']
set(df.columns) == set(expected_columns)
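
When the set comparison returns False, the next question is which columns differ. The sketch below, using a hypothetical empty `frame` with one column misnamed as `City`, reports the missing and unexpected columns separately.

```python
import pandas as pd

expected_columns = ["order_id", "customer", "amount", "city", "order_date"]

# Hypothetical frame where 'city' was accidentally capitalised
frame = pd.DataFrame(columns=["order_id", "customer", "amount", "City", "order_date"])

# Set differences show exactly which names are absent or unexpected
missing_columns = sorted(set(expected_columns) - set(frame.columns))
extra_columns = sorted(set(frame.columns) - set(expected_columns))

print("Missing:", missing_columns)
print("Unexpected:", extra_columns)
```

Comparing sets ignores column order; compare `list(frame.columns)` with `expected_columns` if order also matters.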



## ✅ Best Practices & Interview Notes
- Validation is mandatory before analysis
- Always recheck nulls, types, duplicates
- Validate against business rules
- Sanity checks prevent silent data bugs



## ✔ Summary
- Final validation ensures data reliability
- This step protects downstream analytics
- Never skip validation in real projects
