## Defining Data Quality SLAs
### Data Completeness
**Description**: Set an SLA requiring that at least 95% of the data fields in your dataset are filled (non-null). Practice by choosing a dataset and calculating its completeness.

In [1]:
# write your code from here
import pandas as pd

# Load your dataset ('your_dataset.csv' is a placeholder; replace it with the
# path to a real file, otherwise read_csv raises FileNotFoundError)
df = pd.read_csv('your_dataset.csv')

# Calculate completeness per column (percentage of non-null values)
completeness_per_column = df.notnull().mean() * 100

# Calculate overall completeness (percentage of all filled fields in dataset)
overall_completeness = df.notna().to_numpy().mean() * 100

print("Completeness per column (%):")
print(completeness_per_column)

print(f"\nOverall dataset completeness: {overall_completeness:.2f}%")

# Define SLA threshold
sla_threshold = 95.0

if overall_completeness >= sla_threshold:
    print(f"✅ SLA Met: Dataset completeness is at or above {sla_threshold}%")
else:
    print(f"⚠️ SLA Not Met: Dataset completeness is below {sla_threshold}%")


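
Because `your_dataset.csv` is only a placeholder path, the cell above cannot run until a real file is supplied. The following is a minimal sketch of the same completeness check on a small in-memory DataFrame. The sample data and the `completeness_report` helper are hypothetical and exist only for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; None and np.nan mark missing fields
sample = pd.DataFrame({
    "customer_id": [1, 2, 3, 4],
    "email": ["a@example.com", None, "c@example.com", "d@example.com"],
    "age": [34, 29, np.nan, 41],
})


def completeness_report(df, threshold=95.0):
    """Return per-column completeness (%), overall completeness (%), and SLA status."""
    filled = df.notna()
    per_column = filled.mean() * 100
    # Share of all cells that are filled; treat an empty frame as 0% complete
    overall = float(filled.to_numpy().mean() * 100) if filled.size else 0.0
    return per_column, overall, overall >= threshold


per_column, overall, sla_met = completeness_report(sample)
print(per_column.round(2))
print(f"Overall completeness: {overall:.2f}% (SLA met: {sla_met})")
```

Here 10 of the 12 cells are filled, so the overall completeness falls short of the 95% threshold and the SLA is reported as not met.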

### Data Timeliness
**Description**: Establish an SLA requiring that data be integrated and processed within 24 hours of acquisition. Monitor the data pipeline for timeliness.

In [None]:
# write your code from here
import pandas as pd

# Sample dataset with acquisition and processing timestamps
data = {
    'record_id': [1, 2, 3],
    'acquisition_time': ['2025-05-18 12:00:00', '2025-05-18 14:00:00', '2025-05-19 10:00:00'],
    'processing_time': ['2025-05-19 10:00:00', '2025-05-19 15:00:00', '2025-05-20 12:00:00']
}

df = pd.DataFrame(data)

# Convert to datetime
df['acquisition_time'] = pd.to_datetime(df['acquisition_time'])
df['processing_time'] = pd.to_datetime(df['processing_time'])

# Calculate delay between acquisition and processing
df['delay_hours'] = (df['processing_time'] - df['acquisition_time']).dt.total_seconds() / 3600

# SLA threshold in hours
sla_hours = 24

# Check which records meet the SLA
df['sla_met'] = df['delay_hours'] <= sla_hours

print(df)

# Summary (messages use sla_hours so they stay correct if the threshold changes)
if df['sla_met'].all():
    print(f"✅ SLA Met: All records processed within {sla_hours} hours.")
else:
    failed = df[~df['sla_met']]
    print(f"⚠️ SLA Not Met: {len(failed)} record(s) exceeded the {sla_hours}-hour processing time.")
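
The check above assumes every record already has a processing timestamp. When monitoring a live pipeline, some records may still be waiting, so one possible extension is to measure their delay up to the current time. The sketch below assumes that unprocessed records carry a missing `processing_time`; the `timeliness_status` helper, the `status` labels, and the fixed `now` value are hypothetical choices for illustration.

```python
import pandas as pd

SLA_HOURS = 24  # same threshold as the exercise


def timeliness_status(df, now, sla_hours=SLA_HOURS):
    """Label each record 'met', 'breached', or 'pending' (not processed, still within SLA)."""
    out = df.copy()
    acquired = pd.to_datetime(out["acquisition_time"])
    processed = pd.to_datetime(out["processing_time"])
    # For unprocessed records (NaT), measure the delay up to `now`
    effective_end = processed.fillna(now)
    out["delay_hours"] = (effective_end - acquired).dt.total_seconds() / 3600
    out["status"] = "met"
    out.loc[out["delay_hours"] > sla_hours, "status"] = "breached"
    out.loc[processed.isna() & (out["delay_hours"] <= sla_hours), "status"] = "pending"
    return out


# Hypothetical records: two have not been processed yet
records = pd.DataFrame({
    "record_id": [1, 2, 3],
    "acquisition_time": ["2025-05-18 12:00:00", "2025-05-18 14:00:00", "2025-05-19 10:00:00"],
    "processing_time": ["2025-05-19 10:00:00", None, None],
})

# A fixed `now` keeps the example reproducible; a monitor would use the current time
report = timeliness_status(records, now=pd.Timestamp("2025-05-19 16:00:00"))
print(report[["record_id", "delay_hours", "status"]])
```

Record 2 is flagged as breached even though it has no processing timestamp, because more than 24 hours have already passed since its acquisition.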



### Data Consistency
**Description**: Define an SLA for maintaining consistency across related datasets. Implement a check to ensure that at least 99% of data entries are consistent.

In [None]:
# write your code from here
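
The cell above is left empty in the source. One way to approach it, offered as a sketch rather than a definitive solution, is to join two related datasets on a shared key and count an entry as consistent when the key exists in both and a shared field agrees. The `customers` and `orders` tables, their columns, and this definition of "consistent" are all assumptions made for illustration.

```python
import pandas as pd

# Hypothetical related datasets: each order should reference a known customer
# and carry the same country as the customer master record
customers = pd.DataFrame({
    "customer_id": [1, 2, 3, 4],
    "country": ["DE", "FR", "US", "IN"],
})
orders = pd.DataFrame({
    "order_id": [101, 102, 103, 104, 105],
    "customer_id": [1, 2, 3, 4, 9],             # customer 9 has no master record
    "country": ["DE", "FR", "CA", "IN", "US"],  # order 103 disagrees with the master
})

# Left join keeps every order; validate guards against duplicate customer keys
merged = orders.merge(
    customers,
    on="customer_id",
    how="left",
    suffixes=("_order", "_master"),
    indicator=True,
    validate="many_to_one",
)

# Consistent = key found in both datasets AND the shared field agrees
merged["consistent"] = (merged["_merge"] == "both") & (
    merged["country_order"] == merged["country_master"]
)

consistency_pct = merged["consistent"].mean() * 100
sla_threshold = 99.0

print(merged[["order_id", "customer_id", "country_order", "country_master", "consistent"]])
print(f"\nConsistency: {consistency_pct:.2f}%")

if consistency_pct >= sla_threshold:
    print(f"✅ SLA Met: Consistency is at or above {sla_threshold}%")
else:
    inconsistent = merged[~merged["consistent"]]
    print(f"⚠️ SLA Not Met: {len(inconsistent)} inconsistent record(s) found.")
```

In this sample, three of the five orders are consistent, so the 99% SLA is not met. A left join is used so that orders with no matching customer remain in the result and are counted as inconsistent rather than silently dropped.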