## Defining Data Quality SLAs
### Data Completeness
**Description**: Set an SLA requiring that at least 95% of the data fields in your dataset are filled (non-null). Practice by checking a dataset of your choice and calculating its completeness.

In [1]:
import pandas as pd

data = {
    "customer_id": [1, 2, 3, 4, 5],
    "name": ["Alice", "Bob", None, "David", "Eve"],
    "email": ["a@example.com", None, "c@example.com", "d@example.com", "e@example.com"],
    "age": [25, 30, 22, None, 29]
}

df = pd.DataFrame(data)

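# Completeness = non-null cells / total cells, as a percentage
total_values = df.size
non_null_values = df.count().sum()
completeness_percentage = (non_null_values / total_values) * 100

# The SLA passes only if at least 95% of cells are filled
sla_pass = completeness_percentage >= 95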

completeness_percentage, sla_pass


(85.0, False)
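
An overall percentage can hide which columns cause the shortfall. The sketch below, assuming the same sample data as above, computes completeness per column and compares each column against the 95% threshold. The helper name `column_completeness` and its `threshold` parameter are illustrative and not part of any library.

```python
import pandas as pd


def column_completeness(df: pd.DataFrame, threshold: float = 95.0) -> pd.DataFrame:
    """Return per-column completeness (%) and whether each column meets the threshold.

    Hypothetical helper for illustration; `threshold` is expressed in percent.
    """
    # notna().mean() gives the fraction of non-null values in each column
    pct = df.notna().mean() * 100
    return pd.DataFrame({"completeness_pct": pct, "sla_met": pct >= threshold})


# Same sample data as in the cell above
sample = pd.DataFrame({
    "customer_id": [1, 2, 3, 4, 5],
    "name": ["Alice", "Bob", None, "David", "Eve"],
    "email": ["a@example.com", None, "c@example.com", "d@example.com", "e@example.com"],
    "age": [25, 30, 22, None, 29],
})

report = column_completeness(sample)
print(report)
```

With this data, only `customer_id` is fully populated; `name`, `email`, and `age` each have one missing value out of five, so each falls below the threshold.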

### Data Timeliness
**Description**: Establish an SLA specifying that data must be integrated and processed within 24 hours of acquisition. Monitor the data pipeline for timeliness by measuring the delay between each record's acquisition time and its processing time.

In [2]:
import pandas as pd
from datetime import datetime, timedelta

# Capture the current time once so every timestamp shares the same reference point
now = datetime.now()

data = {
    "record_id": [1, 2, 3, 4],
    "acquisition_time": [
        now - timedelta(hours=10),
        now - timedelta(hours=25),
        now - timedelta(hours=3),
        now - timedelta(hours=27)
    ],
    "processing_time": [
        now - timedelta(hours=5),
        now - timedelta(hours=20),
        now - timedelta(hours=2),
        now - timedelta(hours=25)
    ]
}

df = pd.DataFrame(data)

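# Delay between acquisition and processing, in hours
df["processing_delay"] = (df["processing_time"] - df["acquisition_time"]).dt.total_seconds() / 3600
# A record meets the SLA if it was processed within 24 hours of acquisition
df["sla_met"] = df["processing_delay"] <= 24

# Share of records that met the SLA, as a percentage
sla_compliance = df["sla_met"].mean() * 100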

df[["record_id", "processing_delay", "sla_met"]], sla_compliance


(   record_id  processing_delay  sla_met
 0          1               5.0     True
 1          2               5.0     True
 2          3               1.0     True
 3          4               2.0     True,
 100.0)
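
Every record in the sample above meets the SLA, so the failing path is never exercised. The sketch below uses fixed, made-up timestamps (so the result is reproducible) in which one record takes longer than 24 hours to process. The helper name `timeliness_report` and its `max_hours` parameter are assumptions made for illustration.

```python
import pandas as pd


def timeliness_report(df: pd.DataFrame, max_hours: float = 24.0) -> pd.DataFrame:
    """Flag records whose processing delay exceeds `max_hours` (hypothetical helper)."""
    out = df.copy()
    delay = out["processing_time"] - out["acquisition_time"]
    # Convert the timedelta to hours
    out["processing_delay"] = delay.dt.total_seconds() / 3600
    out["sla_met"] = out["processing_delay"] <= max_hours
    return out


# Made-up timestamps for illustration; record 2 takes 30 hours to process
records = pd.DataFrame({
    "record_id": [1, 2, 3],
    "acquisition_time": pd.to_datetime(
        ["2024-01-01 00:00", "2024-01-01 06:00", "2024-01-02 12:00"]
    ),
    "processing_time": pd.to_datetime(
        ["2024-01-01 05:00", "2024-01-02 12:00", "2024-01-02 13:00"]
    ),
})

result = timeliness_report(records)

# Record IDs that breached the SLA, and the overall compliance percentage
breaches = result.loc[~result["sla_met"], "record_id"].tolist()
compliance = result["sla_met"].mean() * 100

print(result[["record_id", "processing_delay", "sla_met"]])
print(breaches, round(compliance, 1))
```

Listing the breaching record IDs alongside the compliance percentage makes it easier to follow up on individual late records instead of only tracking the aggregate.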

### Data Consistency
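**Description**: Define an SLA for maintaining consistency across related datasets. Implement a check to ensure that at least 99% of data entries are consistent. In this example, an order counts as consistent when its `customer_id` exists in the customers table.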

In [3]:
import pandas as pd

orders = pd.DataFrame({
    "order_id": [101, 102, 103, 104, 105],
    "customer_id": [1, 2, 3, 4, 5]
})

customers = pd.DataFrame({
    "customer_id": [1, 2, 3, 4],
    "name": ["Alice", "Bob", "Charlie", "David"]
})

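# An order is consistent if its customer_id exists in the customers table
orders["is_consistent"] = orders["customer_id"].isin(customers["customer_id"])
# Share of consistent orders, as a percentage
consistency_rate = orders["is_consistent"].mean() * 100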

orders, consistency_rate


(   order_id  customer_id  is_consistent
 0       101            1           True
 1       102            2           True
 2       103            3           True
 3       104            4           True
 4       105            5          False,
 80.0)
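
The check above reports a consistency rate but does not compare it against the 99% target or isolate the failing rows. A minimal sketch, reusing the same sample tables, uses a left merge with `indicator=True` to find orders whose `customer_id` has no match. The names `SLA_THRESHOLD` and `orphaned` are illustrative, and the sketch assumes `customer_id` is unique in the customers table.

```python
import pandas as pd

SLA_THRESHOLD = 99.0  # percent, taken from the SLA description above

orders = pd.DataFrame({
    "order_id": [101, 102, 103, 104, 105],
    "customer_id": [1, 2, 3, 4, 5],
})
customers = pd.DataFrame({
    "customer_id": [1, 2, 3, 4],
    "name": ["Alice", "Bob", "Charlie", "David"],
})

# A left merge keeps every order; the `_merge` column marks rows without a matching customer.
# Assumes customer_id is unique in `customers`, otherwise orders would be duplicated.
merged = orders.merge(customers, on="customer_id", how="left", indicator=True)
orphaned = merged.loc[merged["_merge"] == "left_only", ["order_id", "customer_id"]]

# Compare the consistency rate against the SLA target
consistency_rate = (merged["_merge"] == "both").mean() * 100
sla_pass = consistency_rate >= SLA_THRESHOLD

print(orphaned)
print(consistency_rate, sla_pass)
```

Here order 105 references customer 5, which is missing from the customers table, so the rate falls short of the 99% target and the SLA check fails.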