### Monitor Data Quality Trends Over Time

**Task 1**: Create a Trends Analysis Report

**Objective**: Understand long-term data quality trends.

**Steps**:
1. Use historical data (or simulated data if none is available) to analyze how data quality has changed over time.
2. Calculate trends for the KPIs defined earlier using statistical measures or visual charts.
3. Write a report summarizing your findings, noting any persistent issues or improvements.

In [None]:

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

# Simulate 10 weekly KPI readings; the fixed seed makes the run reproducible
np.random.seed(42)
weeks = pd.date_range(start='2025-02-01', periods=10, freq='W')
data = {
    'Week': weeks,
    # Draw each KPI from a normal distribution, then clip it to a plausible range
    'Accuracy Rate': np.clip(np.random.normal(0.92, 0.02, size=10), 0.85, 0.95),
    'Completeness Rate': np.clip(np.random.normal(0.89, 0.03, size=10), 0.8, 0.95),
    'Timeliness Rate': np.clip(np.random.normal(0.75, 0.05, size=10), 0.6, 0.85)
}
df_trends = pd.DataFrame(data)
# Save the simulated series so it can be reused in the report
df_trends.to_csv('kpi_trends.csv', index=False)

# Plot all three KPIs against the week to show how each one moves over time
df_trends.set_index('Week').plot(figsize=(10, 6), marker='o', title='Data Quality KPI Trends Over Time')
plt.ylabel('KPI Value')
plt.grid(True)
plt.tight_layout()
plt.savefig('kpi_trend_chart.png')
plt.show()
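
Step 2 also asks for a statistical measure of the trend, which a chart alone does not give. One possible approach, shown here only as a minimal sketch, is to fit a least-squares line to each KPI with `np.polyfit` and read off the slope as the average change per week. The helper name `kpi_slopes` and the output label `slope_per_week` are hypothetical, and the data is regenerated the same way as in the cell above so that the block runs on its own.

```python
import numpy as np
import pandas as pd


def kpi_slopes(df, time_col="Week"):
    """Return the least-squares slope (change per weekly step) of each numeric KPI column."""
    x = np.arange(len(df))  # 0, 1, 2, ... one step per weekly observation
    kpi_cols = df.drop(columns=[time_col]).select_dtypes("number").columns
    # np.polyfit(..., 1) returns [slope, intercept]; keep only the slope
    slopes = {col: np.polyfit(x, df[col].to_numpy(), 1)[0] for col in kpi_cols}
    return pd.Series(slopes, name="slope_per_week")


# Same simulated data as in the Task 1 cell (assumption: no real history is available)
np.random.seed(42)
weeks = pd.date_range(start="2025-02-01", periods=10, freq="W")
df_trends = pd.DataFrame({
    "Week": weeks,
    "Accuracy Rate": np.clip(np.random.normal(0.92, 0.02, size=10), 0.85, 0.95),
    "Completeness Rate": np.clip(np.random.normal(0.89, 0.03, size=10), 0.8, 0.95),
    "Timeliness Rate": np.clip(np.random.normal(0.75, 0.05, size=10), 0.6, 0.85),
})

slopes = kpi_slopes(df_trends)
print(slopes.round(4))
```

A positive slope suggests the KPI is improving on average and a negative one that it is degrading. With only ten simulated points the slopes mostly reflect noise, so they are best treated as a rough indicator alongside the chart.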

**Task 2**: Evaluate Continuous Improvement Measures

**Objective**: Implement strategic changes based on trend analysis.

**Steps**:
1. Identify patterns or recurring issues from your trend analysis report.
2. Propose three continuous improvement strategies to address these issues.
3. Plan how to implement these strategies and measure their effectiveness over the next cycle.

In [None]:

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

# Simulate 10 weeks: 5 before ('Pre') and 5 after ('Post') the improvement strategy
np.random.seed(42)
weeks = pd.date_range(start='2025-02-01', periods=10, freq='W')
phase = ['Pre' if i < 5 else 'Post' for i in range(10)]
# Each KPI gets a lower mean before the change and a higher mean after it
accuracy = np.concatenate([np.random.normal(0.90, 0.02, 5), np.random.normal(0.94, 0.01, 5)])
completeness = np.concatenate([np.random.normal(0.85, 0.03, 5), np.random.normal(0.91, 0.02, 5)])
timeliness = np.concatenate([np.random.normal(0.70, 0.05, 5), np.random.normal(0.80, 0.03, 5)])
df = pd.DataFrame({
    'Week': weeks,
    'Phase': phase,
    'Accuracy Rate': np.clip(accuracy, 0.85, 0.96),
    'Completeness Rate': np.clip(completeness, 0.75, 0.95),
    'Timeliness Rate': np.clip(timeliness, 0.60, 0.90)
})
df.to_csv('kpi_improvement_trend.csv', index=False)

# One subplot per KPI, sharing the time axis
fig, axes = plt.subplots(3, 1, figsize=(10, 10), sharex=True)
kpis = ['Accuracy Rate', 'Completeness Rate', 'Timeliness Rate']
for i, kpi in enumerate(kpis):
    df.plot(x='Week', y=kpi, ax=axes[i], marker='o', color='tab:blue', label=kpi)
    # Mark the last 'Pre' week; only the first subplot carries the legend entry
    axes[i].axvline(df['Week'].iloc[4], color='red', linestyle='--',
                    label='Strategy Applied' if i == 0 else '_nolegend_')
    axes[i].legend()  # rebuild the legend so the vertical line's label is included
    axes[i].set_ylabel(kpi)
    axes[i].grid(True)
plt.suptitle('Data Quality KPI Trends Before and After Strategy Implementation', fontsize=14)
plt.tight_layout()
plt.subplots_adjust(top=0.92)
plt.savefig('kpi_strategy_impact.png')
plt.show()

# Mean of each KPI per phase; sort=False keeps 'Pre' before 'Post'
summary = df.groupby('Phase', sort=False)[kpis].mean().round(3)
print(summary)
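
To measure effectiveness as step 3 asks, the per-phase means can be turned into an absolute and a relative change per KPI. The following is a minimal sketch under stated assumptions: the helper name `phase_change` and its output column names are hypothetical, the phase column is assumed to contain exactly the labels `'Pre'` and `'Post'`, and the demo values are made up for illustration rather than taken from any result.

```python
import pandas as pd


def phase_change(df, kpis, phase_col="Phase"):
    """Compare mean KPI values before ('Pre') and after ('Post') a change."""
    means = df.groupby(phase_col)[kpis].mean()
    out = pd.DataFrame({
        "pre_mean": means.loc["Pre"],
        "post_mean": means.loc["Post"],
    })
    out["abs_change"] = out["post_mean"] - out["pre_mean"]
    out["pct_change"] = 100 * out["abs_change"] / out["pre_mean"]
    return out


# Tiny illustrative input (made-up values, not measured results)
demo = pd.DataFrame({
    "Phase": ["Pre", "Pre", "Post", "Post"],
    "Accuracy Rate": [0.90, 0.90, 0.945, 0.945],
})
result = phase_change(demo, ["Accuracy Rate"])
print(result.round(3))
```

Calling `phase_change(df, kpis)` on the DataFrame from the cell above would give one row per KPI. With only five weeks in each phase, a difference in means can easily come from noise, so it is worth tracking the same figures over the next cycle before concluding that a strategy worked.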
