## Compare Data Completeness Over Time

**Description**: Analyze how the rate of missing data in `sales_data.csv` changes over several months, using the `date` column to group rows by month. Visualize the missing-data rate for each month.

In [2]:
# Generate a synthetic sales_data.csv to use for the completeness analysis
import pandas as pd
import numpy as np

# Seed for reproducibility
np.random.seed(42)

# Generate date range for 6 months (e.g., Jan to June 2024)
dates = pd.date_range(start='2024-01-01', end='2024-06-30', freq='D')

# Number of rows (daily)
n = len(dates)

# Create synthetic sales data with some randomness
data = {
    'date': dates,
    'Monthly_Sales': np.random.randint(1000, 5000, size=n),
    'Region': np.random.choice(['North', 'South', 'East', 'West'], size=n),
    'Product_Category': np.random.choice(['A', 'B', 'C'], size=n),
    'Customer_Visits': np.random.randint(50, 200, size=n)
}

df = pd.DataFrame(data)

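# Introduce ~5% missing values in each column (rows are drawn independently per column).
# Numeric columns become float once they contain NaN.
for col in ['Monthly_Sales', 'Region', 'Product_Category', 'Customer_Visits']:
    df.loc[df.sample(frac=0.05).index, col] = np.nan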

# Save to CSV
df.to_csv('sales_data.csv', index=False)

print("Sample sales_data.csv generated with shape:", df.shape)
print(df.head())


Sample sales_data.csv generated with shape: (182, 5)
        date  Monthly_Sales Region Product_Category  Customer_Visits
0 2024-01-01         4174.0  North                B             72.0
1 2024-01-02         4507.0  South                C            176.0
2 2024-01-03         1860.0   East                C            186.0
3 2024-01-04         2294.0   East              NaN            189.0
4 2024-01-05         2130.0  South                A            178.0
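
The cell above only generates the file; the per-month missing rate that the description asks for is not computed yet. A minimal sketch of that step follows. The helper name `monthly_missing_rates` and the tiny `demo` frame are hypothetical and used only for illustration; the approach (group the `isna()` flags by calendar month and take the mean) is one reasonable way to do it, not necessarily the intended solution.

```python
import pandas as pd


def monthly_missing_rates(df: pd.DataFrame, date_col: str = "date") -> pd.DataFrame:
    """Return the percentage of missing values per column for each calendar month.

    Hypothetical helper: the name and signature are assumptions for this sketch.
    """
    # Parse the date column; rows whose date cannot be parsed are excluded
    dates = pd.to_datetime(df[date_col], errors="coerce")
    valid = dates.notna()
    months = dates[valid].dt.to_period("M")

    # isna() yields booleans; their mean within a month is the missing fraction
    flags = df.loc[valid].drop(columns=[date_col]).isna()
    rates = flags.groupby(months).mean() * 100
    rates.index.name = "month"
    return rates


# Tiny hand-made frame so the result can be checked by eye
demo = pd.DataFrame({
    "date": ["2024-01-01", "2024-01-02", "2024-02-01", "2024-02-02"],
    "Monthly_Sales": [100.0, None, 300.0, 400.0],
    "Region": ["North", None, None, "West"],
})
demo_rates = monthly_missing_rates(demo)
print(demo_rates)
```

To apply it to the generated file, something like `monthly_missing_rates(pd.read_csv("sales_data.csv"))` should work, assuming the CSV written by the cell above is in the working directory.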

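For the visualization part of the task, a line chart with one line per column is one simple option. The sketch below assumes matplotlib is installed and builds a small stand-in frame (`frame`, a hypothetical name) so that it runs on its own; in the notebook, the stand-in could be replaced by `pd.read_csv("sales_data.csv", parse_dates=["date"])`. The output file name is also an assumption.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; this line can be dropped inside a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical stand-in for sales_data.csv so this block is self-contained
rng = np.random.default_rng(0)
dates = pd.date_range("2024-01-01", "2024-06-30", freq="D")
frame = pd.DataFrame({
    "date": dates,
    "Monthly_Sales": rng.integers(1000, 5000, size=len(dates)).astype(float),
    "Customer_Visits": rng.integers(50, 200, size=len(dates)).astype(float),
})
# Blank out roughly 5% of each value column, independently per column
for col in ["Monthly_Sales", "Customer_Visits"]:
    frame.loc[rng.random(len(frame)) < 0.05, col] = np.nan

# Percentage of missing values per column, grouped by calendar month
month = frame["date"].dt.to_period("M")
rates = frame.drop(columns="date").isna().groupby(month).mean() * 100

# One line per column, months on the x-axis
fig, ax = plt.subplots(figsize=(8, 4))
for col in rates.columns:
    ax.plot(rates.index.astype(str), rates[col], marker="o", label=col)
ax.set_xlabel("Month")
ax.set_ylabel("Missing values (%)")
ax.set_title("Missing data rate by month")
ax.legend()
fig.tight_layout()
fig.savefig("missing_rates_by_month.png")  # assumed output file name
```

A grouped bar chart (`rates.plot(kind="bar")`) would work as well; lines tend to make a month-to-month trend easier to see when there are several columns.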