# Phase 1: Data Collection & Exploratory Data Analysis

This notebook loads the raw data collected from the various APIs and performs basic exploratory analysis. Run it after executing `python src/data/collect_air_quality.py`, which populates the `data/raw/` directory.

In [None]:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Adjust these paths to point at your raw data files
openaq_path = '../data/raw/Vellore_openaq_2024-08-17_2024-08-24.csv'
weather_path = '../data/raw/Vellore_visualcrossing_2024-08-17_2024-08-24.csv'

openaq_df = pd.read_csv(openaq_path)
weather_df = pd.read_csv(weather_path)

print('OpenAQ records:', len(openaq_df))
print('Weather records:', len(weather_df))
openaq_df.head()
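
Before pivoting, it can help to check column types and missing values in each frame. The helper below is a minimal sketch of that check. `describe_frame` is a hypothetical name, not part of the project code, and it makes no assumptions about which columns the CSV files contain.

```python
import pandas as pd


def describe_frame(df: pd.DataFrame) -> pd.DataFrame:
    """Return one row per column with its dtype, missing count, and missing share (%)."""
    return pd.DataFrame({
        'dtype': df.dtypes.astype(str),
        'missing': df.isna().sum(),
        'missing_pct': (df.isna().mean() * 100).round(1),
    })
```

Calling `describe_frame(openaq_df)` and `describe_frame(weather_df)` in a cell would show which columns need cleaning before further analysis.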

In [None]:
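# Pivot OpenAQ data so each pollutant becomes a column, averaging duplicate readings per timestamp
pivot = openaq_df.pivot_table(values='value', index='datetime', columns='parameter', aggfunc='mean')
# Parse the index as timestamps and sort it so the series plots in chronological order
pivot.index = pd.to_datetime(pivot.index)
pivot = pivot.sort_index()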

# Plot PM2.5 time series
plt.figure(figsize=(12, 4))
pivot['pm25'].plot(title='PM2.5 concentration (OpenAQ)', ylabel='µg/m³')
plt.show()

You can extend this notebook by exploring other pollutants, overlaying weather variables and computing summary statistics.
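
The extensions mentioned above could be sketched roughly as follows. This is an illustration, not part of the project code: `summarize_pollutants` and `join_weather` are hypothetical helper names, and the weather timestamp column is assumed to be called `datetime` and to share the time zone handling of the OpenAQ index (both naive, or both in the same zone).

```python
import pandas as pd


def summarize_pollutants(pivot: pd.DataFrame) -> pd.DataFrame:
    """Return count, mean, std, min, median, and max with one row per pollutant."""
    return pivot.agg(['count', 'mean', 'std', 'min', 'median', 'max']).T


def join_weather(pivot: pd.DataFrame, weather: pd.DataFrame,
                 time_col: str = 'datetime', freq: str = 'h') -> pd.DataFrame:
    """Resample pollutants to a common frequency and join weather rows on the timestamp.

    Assumes `weather` has a timestamp column named `time_col` (an assumption
    about the weather CSV) that is already at the target frequency.
    """
    weather = weather.copy()
    weather[time_col] = pd.to_datetime(weather[time_col])
    weather = weather.set_index(time_col).sort_index()
    # Average pollutant readings within each period, e.g. each hour
    pollutants = pivot.sort_index().resample(freq).mean()
    # Keep only timestamps present in both sources
    return pollutants.join(weather, how='inner')
```

With the frames loaded earlier, `summarize_pollutants(pivot)` gives a quick statistical overview, and `join_weather(pivot, weather_df)` produces a single frame from which a pollutant and a weather column can be plotted together, for example with the `secondary_y` argument of `DataFrame.plot`.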