# AWS Product Usage — EDA & Visualization

This notebook explores your AWS product usage dataset and generates a few portfolio‑ready charts.

**Instructions:** Place your CSV files in `data-sets/`, or point `DATA_GLOB` at a different location.

In [None]:
import glob, os
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

DATA_GLOB = 'data-sets/*.csv'  # change if needed
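files = sorted(glob.glob(DATA_GLOB))  # glob order is arbitrary; sort so row order is reproducible
if not files:
    raise FileNotFoundError('Put CSV files under data-sets/ or change DATA_GLOB.')
# utf-8-sig strips the byte-order mark that spreadsheet exports often prepend
frames = [pd.read_csv(f, encoding='utf-8-sig') for f in files]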
df = pd.concat(frames, ignore_index=True)
len(df), df.columns.tolist()

In [None]:
# Standardize columns

df.columns = [c.strip().lower().replace(' ', '_') for c in df.columns]
if 'date' in df.columns:
    df['date'] = pd.to_datetime(df['date'], errors='coerce')
df.head()
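
Because `errors='coerce'` turns every unparseable value into `NaT` without raising, it can be worth counting those rows before building date-based charts. The following is a minimal sketch on a hypothetical three-row frame (`sample` and its values are invented for illustration, not taken from the dataset); it applies the same normalization as the cell above and reports how many dates failed to parse.

```python
import pandas as pd

# Hypothetical sample with messy headers and one unparseable date
sample = pd.DataFrame({
    " Date ": ["2024-01-05", "not a date", "2024-02-10"],
    "Product Name": ["S3", "EC2", "Lambda"],
})

# Same normalization as the cell above: trim, lowercase, spaces to underscores
sample.columns = [c.strip().lower().replace(" ", "_") for c in sample.columns]
sample["date"] = pd.to_datetime(sample["date"], errors="coerce")

# Coerced failures show up as NaT, so isna() counts them
n_bad_dates = int(sample["date"].isna().sum())
print(sample.columns.tolist())
print(f"{n_bad_dates} of {len(sample)} rows have an unparseable date")
```

If the count is large on the real data, the source files probably mix date formats, and passing an explicit `format=` to `pd.to_datetime` may be a better fit than coercion.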

In [None]:
# Example chart: Top products by revenue/requests

metric = 'revenue' if 'revenue' in df.columns else ('requests' if 'requests' in df.columns else None)
if metric and 'product' in df.columns:
    top = df.groupby('product')[metric].sum().sort_values(ascending=False).head(10)
    plt.figure()
    top.plot(kind='bar')
    plt.title(f'Top Products by {metric}')
    plt.tight_layout()
    plt.show()
else:
    print('Add a numeric `revenue` or `requests` column and a `product` column to unlock this chart.')

## Next ideas
- Trend lines for revenue/usage
- Cohort retention (needs `user_id` + `date`)
- Error rate vs traffic
- Product mix area chart
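
The first two ideas could be sketched roughly as follows. This is an illustration under assumptions, not a finished analysis: the `events` frame is hypothetical, the helper names `monthly_trend` and `cohort_retention` are invented here, and the code assumes the normalized columns `user_id`, `date`, and `revenue` exist with `date` already parsed.

```python
import pandas as pd

# Hypothetical event-level data using the normalized column names
events = pd.DataFrame({
    "user_id": ["a", "a", "b", "b", "c", "a"],
    "date": pd.to_datetime([
        "2024-01-03", "2024-02-11", "2024-01-20",
        "2024-03-02", "2024-02-14", "2024-03-09",
    ]),
    "revenue": [10.0, 12.0, 5.0, 7.0, 3.0, 9.0],
})


def monthly_trend(df, metric):
    """Sum `metric` per calendar month, indexed by month start."""
    return df.set_index("date")[metric].resample("MS").sum()


def cohort_retention(df):
    """Share of each first-seen-month cohort that is active N months later."""
    first_seen = df.groupby("user_id")["date"].transform("min")
    months_since = (
        (df["date"].dt.year - first_seen.dt.year) * 12
        + (df["date"].dt.month - first_seen.dt.month)
    )
    tidy = pd.DataFrame({
        "user_id": df["user_id"],
        "cohort": first_seen.dt.strftime("%Y-%m"),
        "months_since_first": months_since,
    })
    # Distinct active users per cohort and month offset
    active = (
        tidy.groupby(["cohort", "months_since_first"])["user_id"]
        .nunique()
        .unstack(fill_value=0)
    )
    # Divide by cohort size (users active at offset 0)
    return active.div(active[0], axis=0)


trend = monthly_trend(events, "revenue")
retention = cohort_retention(events)
print(trend)
print(retention)
```

`trend.plot()` gives the trend line, and the `retention` table can be passed to `plt.imshow` for a heatmap. A user counts as retained in a month if they have at least one row in it, which is one of several reasonable definitions.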
