# Exercise 3: Detailed Data Analysis and Multiple Visualizations

**Objective:** Load a month of office sensor data, resample it to daily averages, and present it in two visualizations plus summary statistics.

**Skills Practiced:**
- Loading and cleaning complex datasets
- Data resampling and aggregation
- Creating dual y-axis plots
- Creating grouped bar charts
- Advanced plot customization
- Multiple visualizations in one exercise

## Part 1: Setup and Data Loading

In [None]:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import matplotlib.dates as mdates
import numpy as np
import os

# Plotting setup
pd.plotting.register_matplotlib_converters()
# If Times New Roman is not installed, matplotlib warns and falls back to its default font
sns.set_theme(style="whitegrid", font="Times New Roman", font_scale=2)

# Create plots folder if it doesn't exist
os.makedirs('plots', exist_ok=True)
print("Setup Complete")

## Part 2: Load and Clean the Data

**Task:** Load the office sensor data, clean the column names, and sort the rows by timestamp.

In [None]:
# File path
filepath = "datasets/May_Office_213.csv"

# Load the CSV with encoding fix and parse dates
df = pd.read_csv(
    filepath,
    encoding='ISO-8859-1',
    index_col="Datetime",
    parse_dates=True,
    dayfirst=True  # For DD.MM.YYYY format
)

# Clean up column names: strip surrounding spaces, then replace each run of
# special characters (anything other than letters, digits, or _) with one underscore
df.columns = df.columns.str.strip().str.replace('[^A-Za-z0-9_]+', '_', regex=True)

# Sort by datetime index
df = df.sort_index()

# Display information
print("First date in data:", df.index.min())
print("Last date in data:", df.index.max())
print("\nColumn names:")
print(df.columns.tolist())
print("\nFirst few rows:")
df.head()
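
To see what the column-name cleanup does, the sketch below applies the same two string operations to a few made-up headers. The header strings are assumptions for illustration (the actual headers in `May_Office_213.csv` may differ); they are chosen so that the cleaned names match the ones used later in this exercise.

```python
import pandas as pd

# Hypothetical raw headers; the real CSV headers may differ
raw_headers = pd.Index([
    " Humidity (%) ",
    "Temperature (°C)",
    "PM2.5 (ug/m3)",
    "PM10 (ug/m3)",
])

# Same cleanup as in the cell above: trim whitespace, then collapse each run
# of characters outside A-Z, a-z, 0-9, and _ into a single underscore
cleaned = raw_headers.str.strip().str.replace('[^A-Za-z0-9_]+', '_', regex=True)

print(cleaned.tolist())
# ['Humidity_', 'Temperature_C_', 'PM2_5_ug_m3_', 'PM10_ug_m3_']
```

Note the trailing underscore: a closing parenthesis counts as a special character, so it is replaced rather than dropped.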

## Part 3: Data Resampling

**Task:** Resample the data to daily averages for better visualization

In [None]:
# Select only numeric columns for resampling
numeric_df = df.select_dtypes(include='number')

# Resample to daily averages
daily_selected = numeric_df.resample('D').mean().round(2)

# Display the daily averages
print("Daily averages (from start of month):")
print(daily_selected.head())
print("\nShape:", daily_selected.shape)
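
One detail of `resample('D').mean()` is easy to miss: a calendar day with no readings still gets a row, filled with `NaN`. The following is a minimal sketch on synthetic readings (the values and timestamps are invented for illustration), not on the office dataset.

```python
import pandas as pd

# Synthetic readings: two on 1 May, none on 2 May, one on 3 May
readings = pd.DataFrame(
    {"Temperature_C_": [20.0, 22.0, 30.0]},
    index=pd.to_datetime(
        ["2025-05-01 00:00", "2025-05-01 12:00", "2025-05-03 06:00"]
    ),
)

# Same call chain as in the cell above
daily = readings.resample('D').mean().round(2)
print(daily)

# Days without any readings appear as NaN rather than being skipped
print("Days with no data:", int(daily["Temperature_C_"].isna().sum()))
```

If the sensor was offline on some days, checking `daily_selected.isna().sum()` before plotting shows how many daily values are missing per column.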

## Part 4: Visualization 1 - Dual Y-Axis Plot

**Task:** Create a plot with two y-axes showing Humidity and Temperature

**Requirements:**
- Left y-axis: Relative Humidity (0-100%)
- Right y-axis: Temperature (0-40°C)
- Use different colors and markers for each line
- Format dates on x-axis (every 2 days)
- Figure size: (18, 9)
- Save as 'Office213_HumTemp_May2025.jpg'

In [None]:
# Your code here
fig, ax1 = plt.subplots(figsize=(18, 9))

# Left y-axis: Relative Humidity
color1 = sns.color_palette("deep")[0]  # Seaborn's blue
l1, = ax1.plot(
    daily_selected.index, 
    daily_selected['Humidity_'], 
    color=color1, 
    linestyle='-', 
    linewidth=3, 
    marker='o', 
    markersize=8, 
    label='Relative Humidity [%]'
)
ax1.set_ylabel('Relative Humidity [%]', color='k', fontsize=32, weight='bold')
ax1.tick_params(axis='y', labelcolor='k')
ax1.set_ylim([0, 100])
ax1.set_yticks([0, 25, 50, 75, 100])

# Right y-axis: Temperature
ax2 = ax1.twinx()
color2 = sns.color_palette("deep")[1]  # Seaborn's orange
l2, = ax2.plot(
    daily_selected.index, 
    daily_selected['Temperature_C_'], 
    color=color2, 
    linestyle='-', 
    linewidth=3, 
    marker='s', 
    markersize=8, 
    label='Indoor Temperature [°C]'
)
ax2.set_ylabel('Indoor Temperature [°C]', color='k', fontsize=32, weight='bold')
ax2.tick_params(axis='y', labelcolor='k')
ax2.set_ylim([0, 40])

# X-axis: Format dates nicely
ax1.xaxis.set_major_locator(mdates.DayLocator(interval=2))  # every 2 days
ax1.xaxis.set_major_formatter(mdates.DateFormatter('%d-%b'))
fig.autofmt_xdate(rotation=45)

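# Grid and background
ax1.grid(True, which='major', axis='both', linestyle='--', alpha=0.5)
# The whitegrid theme also enables a grid on the twin axis; turn it off
# so only one set of grid lines is drawn
ax2.grid(False)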

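# Legend: combine the line handles from both axes into a single legend
lines = [l1, l2]
labels = [line.get_label() for line in lines]
# frameon=False hides the legend box, so box styling (fancybox, shadow) is omitted
ax1.legend(lines, labels, loc='upper right', fontsize=19, frameon=False)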

# Title (set on ax1 explicitly; after twinx(), plt.title would target ax2)
ax1.set_title('Humidity and Temperature Over Time', fontsize=36, weight='bold', pad=20)

plt.tight_layout()
plt.savefig('plots/Office213_HumTemp_May2025.jpg', format='jpg', dpi=150)
plt.show()
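
Because the two y-axes use different scales, their tick marks line up only when both axes carry the same number of evenly spaced ticks. The sketch below shows one way to arrange that for the 0-100 and 0-40 ranges used here; the names `ax_left`, `ax_right`, and `n_ticks` are illustrative and not part of the exercise code.

```python
import numpy as np
import matplotlib.pyplot as plt

fig, ax_left = plt.subplots(figsize=(6, 3))
ax_right = ax_left.twinx()

ax_left.set_ylim(0, 100)   # humidity range
ax_right.set_ylim(0, 40)   # temperature range

# Same number of evenly spaced ticks on both axes, so tick i on the left
# sits at the same height as tick i on the right
n_ticks = 5
left_ticks = np.linspace(0, 100, n_ticks)   # 0, 25, 50, 75, 100
right_ticks = np.linspace(0, 40, n_ticks)   # 0, 10, 20, 30, 40
ax_left.set_yticks(left_ticks)
ax_right.set_yticks(right_ticks)
```

With matching tick counts, horizontal grid lines drawn from the left axis pass through the tick labels on the right axis as well.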

## Part 5: Visualization 2 - Grouped Bar Chart

**Task:** Create a grouped bar chart comparing PM2.5 and PM10 concentrations

**Requirements:**
- Grouped bars side by side
- Y-axis: 0 to 25 with intervals of 5
- X-axis: Show dates every 3 days
- Use distinct colors for PM2.5 and PM10
- Figure size: (18, 9)
- Save as 'Office213_PMs_May2025.jpg'

In [None]:
# Your code here
# Data
pm25 = daily_selected['PM2_5_ug_m3_']
pm10 = daily_selected['PM10_ug_m3_']
dates = daily_selected.index

# Bar width and positions
bar_width = 0.4
x = np.arange(len(dates))

fig, ax = plt.subplots(figsize=(18, 9))

# Use distinct colors
color_pm25 = '#1f77b4'  # blue
color_pm10 = '#ff7f0e'  # orange

# PM2.5 bars
bars1 = ax.bar(x - bar_width/2, pm25, width=bar_width, label='PM2.5 (μg/m³)', color=color_pm25)

# PM10 bars
bars2 = ax.bar(x + bar_width/2, pm10, width=bar_width, label='PM10 (μg/m³)', color=color_pm10)

# Y-axis: interval of 5, max 25
ax.set_ylim(0, 25)
ax.set_yticks(np.arange(0, 26, 5))

# X-axis formatting: show every 3rd date for clarity
ax.set_xticks(x[::3])
ax.set_xticklabels([d.strftime('%d-%b') for d in dates[::3]], rotation=45, ha='right')

ax.set_ylabel('PM Concentration (μg/m³)', fontsize=32, weight='bold')
ax.set_xlabel('Date', fontsize=32, weight='bold')
ax.legend(fontsize=19, frameon=True, fancybox=True, shadow=True)
plt.title('Daily PM2.5 and PM10 Concentrations', fontsize=36, weight='bold', pad=20)

plt.tight_layout()
plt.savefig('plots/Office213_PMs_May2025.jpg', format='jpg', dpi=150)
plt.show()
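
The `x - bar_width/2` and `x + bar_width/2` positions are the two-series case of a general rule: spread the bars symmetrically around each tick. A small sketch of that rule follows; `bar_offsets` and `group_width` are hypothetical names introduced here for illustration.

```python
import numpy as np

def bar_offsets(n_series, group_width=0.8):
    """Return (offsets, bar_width) that center n_series bars on each tick."""
    bar_width = group_width / n_series
    # Shift the series indices so they are symmetric around zero,
    # then scale by the bar width
    offsets = (np.arange(n_series) - (n_series - 1) / 2) * bar_width
    return offsets, bar_width

offsets, width = bar_offsets(2)
print(offsets, width)  # [-0.2  0.2] 0.4
```

For two series this reproduces the offsets used above (bar width 0.4, centers at -0.2 and +0.2). For a third series, each bar would be drawn at `x + offsets[i]` with the narrower width the function returns.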

## Part 6: Data Summary and Statistics

**Task:** Calculate and display summary statistics for key variables

In [None]:
# Your code here
print("=== SUMMARY STATISTICS ===\n")

# Temperature statistics
if 'Temperature_C_' in daily_selected.columns:
    print("Temperature (°C):")
    print(f"  Mean: {daily_selected['Temperature_C_'].mean():.2f}")
    print(f"  Min: {daily_selected['Temperature_C_'].min():.2f}")
    print(f"  Max: {daily_selected['Temperature_C_'].max():.2f}")
    print(f"  Std: {daily_selected['Temperature_C_'].std():.2f}\n")

# Humidity statistics
if 'Humidity_' in daily_selected.columns:
    print("Humidity (%):")
    print(f"  Mean: {daily_selected['Humidity_'].mean():.2f}")
    print(f"  Min: {daily_selected['Humidity_'].min():.2f}")
    print(f"  Max: {daily_selected['Humidity_'].max():.2f}")
    print(f"  Std: {daily_selected['Humidity_'].std():.2f}\n")

# PM2.5 statistics
if 'PM2_5_ug_m3_' in daily_selected.columns:
    print("PM2.5 (μg/m³):")
    print(f"  Mean: {daily_selected['PM2_5_ug_m3_'].mean():.2f}")
    print(f"  Min: {daily_selected['PM2_5_ug_m3_'].min():.2f}")
    print(f"  Max: {daily_selected['PM2_5_ug_m3_'].max():.2f}\n")

# PM10 statistics
if 'PM10_ug_m3_' in daily_selected.columns:
    print("PM10 (μg/m³):")
    print(f"  Mean: {daily_selected['PM10_ug_m3_'].mean():.2f}")
    print(f"  Min: {daily_selected['PM10_ug_m3_'].min():.2f}")
    print(f"  Max: {daily_selected['PM10_ug_m3_'].max():.2f}\n")
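
The same statistics can also be produced in one call with `DataFrame.agg`, which is a compact alternative to the per-variable blocks above. The sketch uses a small synthetic frame (`demo`, invented values) in place of `daily_selected`.

```python
import pandas as pd

# Synthetic daily averages standing in for daily_selected
demo = pd.DataFrame({
    "Temperature_C_": [20.0, 22.0, 24.0],
    "Humidity_": [40.0, 50.0, 60.0],
})

# One row per variable, one column per statistic
summary = demo.agg(['mean', 'min', 'max', 'std']).T.round(2)
print(summary)
```

Note that pandas computes the sample standard deviation (`ddof=1`) by default, both here and in the `.std()` calls above.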

## Part 7: Challenge - Create Your Own Visualization

**Task:** Choose any two variables from the dataset and create a visualization

**Suggestions:**
- Compare any two environmental variables
- Create a scatter plot showing correlation
- Create a time series with multiple variables
- Be creative!
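
As a possible starting point for the scatter-plot suggestion, the sketch below computes a Pearson correlation coefficient and shows it in the plot title. The data is synthetic and deliberately perfectly correlated; swap in two columns from `daily_selected` for the actual challenge.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Synthetic values for illustration only; replace with two columns
# from daily_selected
demo = pd.DataFrame({
    "Temperature_C_": [20.0, 21.0, 22.0, 23.0, 24.0],
    "Humidity_": [50.0, 48.0, 46.0, 44.0, 42.0],
})

# Pearson correlation coefficient between the two columns
r = demo["Temperature_C_"].corr(demo["Humidity_"])

fig, ax = plt.subplots(figsize=(6, 4))
ax.scatter(demo["Temperature_C_"], demo["Humidity_"])
ax.set_xlabel("Indoor Temperature [°C]")
ax.set_ylabel("Relative Humidity [%]")
ax.set_title(f"r = {r:.2f}")
```

Real sensor data will rarely give a coefficient this close to -1 or +1; a value near 0 means the two variables show little linear relationship.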

## Exercise Complete! ✅

**What you learned:**
- Advanced data loading and cleaning
- Data resampling and aggregation
- Creating dual y-axis plots
- Creating grouped bar charts
- Advanced plot customization
- Calculating summary statistics

**Congratulations!** You've completed all three exercises and are now proficient in Python data visualization!