# Data Observability Rules

In this notebook, we will see how rules can act as pre- or post-conditions in a use case.  
Starting from customer files, we will create a `Personal_data` dataset by joining the name and the age of each customer on their ID. 

## Init Agent
We first initialize the data observability library. This lets us track, in a log file, the data transformations happening in this notebook. 

In [None]:
from kensu.utils.kensu_provider import KensuProvider
K = KensuProvider().initKensu()
import kensu.pandas as pd

## Reading data
Let's read the datasets:
- The first one, `Name`, contains the full name of the customer, and the ID of the customer
- The second, `Age`, contains the age and the ID of the customer

In [None]:
Name = pd.read_csv('Name_Surname.csv')
Age = pd.read_csv('Age.csv')

In [None]:
Name

In [None]:
Age

## Pre-Condition

In the following cell, we check that both input files contain the `id` column.  
Without this column, we can't merge the datasets.  
If `id` is missing, a warning is logged. If the merge later goes wrong, this warning helps confirm or rule out a missing key as the cause. 

In [None]:
import logging

if 'id' not in Name.columns or 'id' not in Age.columns:
    logging.warning('Missing key in the input file(s)')
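
The same pre-condition can be written as a small reusable helper that also reports which input is missing the key. This is only a sketch in plain pandas: the helper name `missing_key_columns` and the sample frames are hypothetical and stand in for the real customer files.

```python
import logging

import pandas as pd


def missing_key_columns(frames, key="id"):
    """Return the names of the frames that lack the join key column."""
    return [name for name, df in frames.items() if key not in df.columns]


# Hypothetical sample data standing in for Name_Surname.csv and Age.csv.
name_df = pd.DataFrame({"id": [1, 2, 3], "name": ["Ann", "Bob", "Cy"]})
age_df = pd.DataFrame({"customer_id": [1, 2], "age": [34, 51]})

missing = missing_key_columns({"Name": name_df, "Age": age_df})
if missing:
    logging.warning("Missing key 'id' in input(s): %s", ", ".join(missing))
```

Naming the offending input in the message makes the warning easier to act on than a generic one.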
    

## Post-Condition

We can now merge the two datasets. 

In [None]:
Personal_data = Name.merge(Age, on='id')

In [None]:
Personal_data

The following check runs when the `Personal_data.csv` file is written.  
It verifies that the output dataset has the same number of rows as the inputs, meaning that every customer id in the `Name` dataset has a corresponding id in the `Age` dataset. 

In [None]:
from kensu.utils.rule_engine import check_nrows_consistency
check_nrows_consistency()
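
To make the idea behind a row-count rule concrete, here is a rough equivalent in plain pandas. It is not Kensu's implementation: the function `nrows_consistent` and the sample frames are hypothetical, and the sketch only compares row counts between the output and each input.

```python
import logging

import pandas as pd


def nrows_consistent(output, inputs):
    """Return True if the output has as many rows as every input frame."""
    return all(len(output) == len(df) for df in inputs)


# Hypothetical sample data: customer 3 has no age record.
name_df = pd.DataFrame({"id": [1, 2, 3], "name": ["Ann", "Bob", "Cy"]})
age_df = pd.DataFrame({"id": [1, 2], "age": [34, 51]})

# merge defaults to an inner join, so ids without a match are dropped.
merged = name_df.merge(age_df, on="id")

consistent = nrows_consistent(merged, [name_df, age_df])
if not consistent:
    logging.warning(
        "Row count mismatch: output has %d rows, inputs have %s",
        len(merged),
        [len(name_df), len(age_df)],
    )
```

Because `DataFrame.merge` uses `how='inner'` by default, a missing id silently shrinks the output, which is exactly what a row-count post-condition is meant to surface.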

In [None]:
Personal_data.to_csv('Personal_data.csv')

The warning indicates that only 10 of the 19 customers have their id in the `Age` dataset, meaning that the output dataset is incomplete and therefore of poor quality. 
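
Once such a warning appears, a natural follow-up is to find out which customers have no match. One possible way, sketched here with plain pandas on hypothetical sample data (not the notebook's actual files), is a left join with `indicator=True`:

```python
import pandas as pd

# Hypothetical sample data: customer 3 has no age record.
name_df = pd.DataFrame({"id": [1, 2, 3], "name": ["Ann", "Bob", "Cy"]})
age_df = pd.DataFrame({"id": [1, 2], "age": [34, 51]})

# A left join keeps every customer from name_df; indicator=True adds a
# `_merge` column telling whether each row was found in both frames.
flagged = name_df.merge(age_df, on="id", how="left", indicator=True)

# Rows marked 'left_only' are customers with no entry in age_df.
unmatched_ids = flagged.loc[flagged["_merge"] == "left_only", "id"].tolist()
print(unmatched_ids)
```

The list of unmatched ids can then be used to fix the `Age` source or to decide whether the incomplete output is acceptable.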