# Module 07: Capstone Project - Complete Data Visualization Analysis

**Estimated Time**: 120 minutes  
**Difficulty**: Intermediate

## Project Overview

In this capstone, you will carry out an end-to-end data analysis and visualization project that combines the skills you've learned:

- Data loading and preparation
- Exploratory data analysis with Matplotlib and Seaborn
- Time series analysis
- Interactive visualizations with Plotly
- Publication-quality figures
- Professional data storytelling

## Scenario: Global Climate Analysis

You are a data analyst for an environmental research organization. Your task is to analyze global temperature and CO2 data to create a comprehensive visual report that:

1. Shows historical temperature trends
2. Explores seasonal patterns
3. Examines the relationship between CO2 and temperature
4. Compares regional differences
5. Creates interactive dashboards for stakeholders
6. Produces publication-ready figures for a research paper

---

## Part 1: Setup and Data Generation

Since we're working with synthetic data for this educational project, we'll create realistic climate-like data.
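
For reference, the generator in the next cell builds each region's daily temperature as a sum of simple components. Reading the constants directly from that code, the model can be written roughly as follows (the symbols $B_r$, $A_r$, $y$, $d$, $\varepsilon_r$, and $\eta$ are notation introduced here, not names used in the code):

$$
\begin{aligned}
C(t) &= 315 + 1.8\,y(t) + \eta(t), \qquad \eta(t) \sim \mathcal{N}(0,\,2^2) \\
T_r(t) &= B_r + 0.02\,y(t) + A_r \sin\!\left(\frac{2\pi\,\bigl(d(t) - 80\bigr)}{365}\right) + \varepsilon_r(t) + 0.01\,\bigl(C(t) - 315\bigr), \qquad \varepsilon_r(t) \sim \mathcal{N}(0,\,2^2)
\end{aligned}
$$

Here $y(t)$ is the number of years elapsed since the first date, $d(t)$ is the day of the year, $B_r$ is the regional baseline, and $A_r$ is the seasonal amplitude (15 for the three temperate regions, 8 otherwise). Substituting the expected CO2 level gives an expected warming rate of $0.02 + 0.01 \times 1.8 = 0.038$ °C per year, which is the slope the trend fits later in the notebook should roughly recover.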

In [None]:
# Import all required libraries
%matplotlib inline

import matplotlib.pyplot as plt
import matplotlib.dates as mdates
import seaborn as sns
import plotly.express as px
import plotly.graph_objects as go
from plotly.subplots import make_subplots
import numpy as np
import pandas as pd
from datetime import datetime, timedelta
from scipy import stats
import warnings

# Configure settings
sns.set_theme(style="whitegrid", context="notebook")
warnings.filterwarnings("ignore")
np.random.seed(42)

print("All libraries imported successfully!")
print("\nCapstone Project: Global Climate Analysis")
print("=" * 50)

In [None]:
# Generate realistic climate data


def generate_climate_data(start_year=1960, end_year=2024, regions=None):
    """
    Generate synthetic but realistic climate data.

    Features:
    - Long-term warming trend
    - Seasonal patterns
    - Regional variations
    - Correlation with CO2
    - Natural variability
    """
    if regions is None:
        regions = ["North America", "Europe", "Asia", "South America", "Africa"]

    # Generate daily dates
    dates = pd.date_range(start=f"{start_year}-01-01", end=f"{end_year}-12-31", freq="D")

    n_days = len(dates)

    # Create time-based trend (global warming)
    years_elapsed = (dates - dates[0]).days / 365.25
    warming_trend = 0.02 * years_elapsed  # ~0.02°C per year

    # Create CO2 data (increasing trend)
    co2_base = 315  # Starting CO2 in ppm (1960 level)
    co2_increase = 1.8 * years_elapsed  # ~1.8 ppm per year
    co2 = co2_base + co2_increase + np.random.normal(0, 2, n_days)

    # Initialize data structure
    data_list = []

    for region in regions:
        # Regional baseline temperature
        regional_bases = {
            "North America": 10,
            "Europe": 9,
            "Asia": 12,
            "South America": 24,
            "Africa": 26,
        }
        base_temp = regional_bases.get(region, 15)

        # Seasonal pattern (stronger in temperate regions)
        seasonal_amplitude = 15 if region in ["North America", "Europe", "Asia"] else 8
        day_of_year = dates.dayofyear
        seasonal = seasonal_amplitude * np.sin(2 * np.pi * (day_of_year - 80) / 365)

        # Random daily variation
        daily_variation = np.random.normal(0, 2, n_days)

        # Combine all components
        temperature = base_temp + warming_trend + seasonal + daily_variation

        # Add a CO2-driven term (not a perfect correlation; adds ~0.018°C/year on top of the trend)
        co2_effect = 0.01 * (co2 - co2_base)
        temperature = temperature + co2_effect

        # Create DataFrame for this region
        for date, temp, co2_val in zip(dates, temperature, co2):
            data_list.append(
                {
                    "date": date,
                    "region": region,
                    "temperature": temp,
                    "co2": co2_val,
                    "year": date.year,
                    "month": date.month,
                    "season": get_season(date.month),
                }
            )

    return pd.DataFrame(data_list)


def get_season(month):
    """Convert month to season (Northern Hemisphere)"""
    if month in [12, 1, 2]:
        return "Winter"
    elif month in [3, 4, 5]:
        return "Spring"
    elif month in [6, 7, 8]:
        return "Summer"
    else:
        return "Fall"


# Generate the data
print("Generating climate dataset...")
climate_df = generate_climate_data()

print("\nDataset created successfully!")
print(f"Shape: {climate_df.shape}")
print(f"Date range: {climate_df['date'].min().date()} to {climate_df['date'].max().date()}")
print(f"Regions: {climate_df['region'].unique()}")
print("\nFirst few rows:")
climate_df.head(10)
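
The row-by-row loop above calls `get_season` once per record. As a hedged alternative, the same Northern Hemisphere mapping can be applied to a whole column at once with `Series.map`. The sketch below is self-contained; `SEASON_BY_MONTH` and `demo` are names introduced here for illustration and are not part of the notebook.

```python
import pandas as pd

# Hypothetical lookup table mirroring get_season (Northern Hemisphere)
SEASON_BY_MONTH = {
    12: "Winter", 1: "Winter", 2: "Winter",
    3: "Spring", 4: "Spring", 5: "Spring",
    6: "Summer", 7: "Summer", 8: "Summer",
    9: "Fall", 10: "Fall", 11: "Fall",
}

# One year of daily dates as a small stand-in for the full dataset
dates = pd.date_range("2024-01-01", "2024-12-31", freq="D")
demo = pd.DataFrame({"date": dates})
demo["month"] = demo["date"].dt.month

# Vectorized lookup: every month number is replaced by its season label
demo["season"] = demo["month"].map(SEASON_BY_MONTH)

season_counts = demo["season"].value_counts()
print(season_counts)
```

The same `map` call could be applied to a `month` column of the full dataset if you choose to build the DataFrame from arrays instead of a list of dictionaries.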

In [None]:
# Data quality check
print("Data Quality Report")
print("=" * 50)
print(f"\nTotal records: {len(climate_df):,}")
print(f"Missing values: {climate_df.isnull().sum().sum()}")
print(f"Duplicate rows: {climate_df.duplicated().sum()}")

print("\nTemperature Statistics:")
print(climate_df.groupby("region")["temperature"].describe())

print("\nCO2 Statistics:")
print(climate_df["co2"].describe())
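
Missing values and duplicates are not the only things that can go wrong in a daily time series: calendar gaps are easy to miss because they leave no `NaN` behind. A minimal sketch of a gap check follows, assuming a frame with `date` and `region` columns like `climate_df`; the helper `check_daily_coverage` and the tiny `demo` frame are hypothetical.

```python
import pandas as pd


def check_daily_coverage(df, date_col="date", group_col="region"):
    """Return, per group, the row count and the number of missing calendar days."""
    rows = []
    for name, group in df.groupby(group_col):
        # Every day between the group's first and last date should be present
        expected = pd.date_range(group[date_col].min(), group[date_col].max(), freq="D")
        missing = expected.difference(pd.DatetimeIndex(group[date_col]))
        rows.append({group_col: name, "rows": len(group), "missing_days": len(missing)})
    return pd.DataFrame(rows).set_index(group_col)


# Tiny demo frame: region "B" is missing 2024-01-03
demo = pd.DataFrame(
    {
        "date": pd.to_datetime(
            ["2024-01-01", "2024-01-02", "2024-01-03", "2024-01-04",
             "2024-01-01", "2024-01-02", "2024-01-04"]
        ),
        "region": ["A"] * 4 + ["B"] * 3,
    }
)

coverage = check_daily_coverage(demo)
print(coverage)
```

On the synthetic dataset every region should report zero missing days, since all regions share the same generated date range.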

## Part 2: Exploratory Data Analysis with Matplotlib

Let's start with basic exploratory analysis using Matplotlib.

In [None]:
# Overall temperature trend
# Calculate yearly averages
yearly_temp = climate_df.groupby("year")["temperature"].mean().reset_index()
yearly_co2 = climate_df.groupby("year")["co2"].mean().reset_index()

fig, axes = plt.subplots(2, 1, figsize=(14, 10), sharex=True)

# Temperature trend
axes[0].plot(
    yearly_temp["year"],
    yearly_temp["temperature"],
    linewidth=2.5,
    color="orangered",
    marker="o",
    markersize=4,
)

# Add trend line
z = np.polyfit(yearly_temp["year"], yearly_temp["temperature"], 1)
p = np.poly1d(z)
axes[0].plot(
    yearly_temp["year"],
    p(yearly_temp["year"]),
    "--",
    linewidth=2,
    color="darkred",
    alpha=0.7,
    label=f"Trend: +{z[0]:.3f}°C/year",
)

axes[0].set_ylabel("Global Average Temperature (°C)", fontsize=12, fontweight="bold")
axes[0].set_title("Global Temperature and CO2 Trends (1960-2024)", fontsize=16, fontweight="bold")
axes[0].legend(loc="upper left", fontsize=11)
axes[0].grid(True, alpha=0.3)

# CO2 trend
axes[1].plot(
    yearly_co2["year"],
    yearly_co2["co2"],
    linewidth=2.5,
    color="steelblue",
    marker="s",
    markersize=4,
)

# Add trend line
z_co2 = np.polyfit(yearly_co2["year"], yearly_co2["co2"], 1)
p_co2 = np.poly1d(z_co2)
axes[1].plot(
    yearly_co2["year"],
    p_co2(yearly_co2["year"]),
    "--",
    linewidth=2,
    color="darkblue",
    alpha=0.7,
    label=f"Trend: +{z_co2[0]:.2f} ppm/year",
)

axes[1].set_xlabel("Year", fontsize=12, fontweight="bold")
axes[1].set_ylabel("CO₂ Concentration (ppm)", fontsize=12, fontweight="bold")
axes[1].legend(loc="upper left", fontsize=11)
axes[1].grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

span_years = yearly_temp["year"].max() - yearly_temp["year"].min()  # 64 for 1960-2024
print(f"Temperature increase: {z[0]:.4f}°C per year")
print(f"Total warming (1960-2024): {z[0] * span_years:.2f}°C")
print(f"\nCO2 increase: {z_co2[0]:.2f} ppm per year")
print(f"Total CO2 rise (1960-2024): {z_co2[0] * span_years:.1f} ppm")
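
Two details of the trend calculation above are easy to get wrong: `np.polyfit` returns coefficients with the highest power first (so index 0 is the slope for a degree-1 fit), and the total change is the slope times the span in years, which is one less than the number of yearly points. The following sketch checks both on a noise-free line with a known slope; the numbers are illustrative only.

```python
import numpy as np

years = np.arange(1960, 2025)          # 65 yearly points spanning 64 years
temps = 14.0 + 0.02 * (years - 1960)   # noise-free line with a known slope

# Degree-1 fit: coefficients come back as [slope, intercept]
slope, intercept = np.polyfit(years, temps, 1)

span_years = years[-1] - years[0]      # 64, not 65
total_change = slope * span_years

print(f"slope = {slope:.4f} per year over {span_years} years")
print(f"total change = {total_change:.2f}")
```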

In [None]:
# Regional comparison
regional_yearly = climate_df.groupby(["year", "region"])["temperature"].mean().reset_index()

fig, ax = plt.subplots(figsize=(14, 7))

colors = ["#0173B2", "#DE8F05", "#029E73", "#CC78BC", "#949494"]
regions = climate_df["region"].unique()

for region, color in zip(regions, colors):
    data = regional_yearly[regional_yearly["region"] == region]
    ax.plot(data["year"], data["temperature"], linewidth=2, label=region, color=color, alpha=0.8)

ax.set_title("Temperature Trends by Region (1960-2024)", fontsize=16, fontweight="bold")
ax.set_xlabel("Year", fontsize=12, fontweight="bold")
ax.set_ylabel("Average Temperature (°C)", fontsize=12, fontweight="bold")
ax.legend(loc="upper left", fontsize=11, framealpha=0.9)
ax.grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

print("Regional warming rates:")
for region in regions:
    data = regional_yearly[regional_yearly["region"] == region]
    slope, intercept, r_value, p_value, std_err = stats.linregress(
        data["year"], data["temperature"]
    )
    print(f"{region:20} {slope:.4f}°C/year (R² = {r_value**2:.3f})")
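
If you want the regional warming rates as data rather than printed text (for example, to sort or plot them), one option is to collect the per-group slopes into a Series. This is a sketch under the assumption of a tidy frame with `region`, `year`, and `temperature` columns; `slope_per_group` and the `demo` frame are hypothetical, and `np.polyfit` is used in place of `stats.linregress` only to keep the example dependency-light.

```python
import numpy as np
import pandas as pd


def slope_per_group(df, group_col="region", x_col="year", y_col="temperature"):
    """Fit y = slope * x + intercept within each group and return the slopes."""
    slopes = {
        name: np.polyfit(group[x_col], group[y_col], 1)[0]
        for name, group in df.groupby(group_col)
    }
    return pd.Series(slopes, name="slope_per_year").sort_values(ascending=False)


# Two made-up regions with known slopes (0.05 and 0.01 per year)
years = np.arange(2000, 2011)
demo = pd.concat(
    [
        pd.DataFrame({"region": "Warm", "year": years, "temperature": 20 + 0.05 * (years - 2000)}),
        pd.DataFrame({"region": "Cool", "year": years, "temperature": 5 + 0.01 * (years - 2000)}),
    ],
    ignore_index=True,
)

rates = slope_per_group(demo)
print(rates)
```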

## Part 3: Statistical Analysis with Seaborn

In [None]:
# Seasonal patterns by region
fig, axes = plt.subplots(2, 2, figsize=(16, 12))
axes = axes.ravel()

# Restrict to recent years for clearer visualization
recent_data = climate_df[climate_df["year"] >= 2020]

# Violin plot
sns.violinplot(
    data=recent_data, x="season", y="temperature", hue="region", ax=axes[0], palette="Set2"
)
axes[0].set_title(
    "Seasonal Temperature Distribution by Region (2020-2024)", fontsize=13, fontweight="bold"
)
axes[0].set_xlabel("Season", fontsize=11)
axes[0].set_ylabel("Temperature (°C)", fontsize=11)
axes[0].legend(title="Region", bbox_to_anchor=(1.05, 1), loc="upper left")

# Box plot
sns.boxplot(
    data=climate_df[climate_df["year"].isin([1970, 1990, 2010, 2024])],
    x="year",
    y="temperature",
    ax=axes[1],
    palette="coolwarm",
)
axes[1].set_title("Temperature Distribution in Selected Years", fontsize=13, fontweight="bold")
axes[1].set_xlabel("Year", fontsize=11)
axes[1].set_ylabel("Temperature (°C)", fontsize=11)

# Correlation heatmap
# Prepare data for correlation
yearly_data = climate_df.groupby("year").agg({"temperature": "mean", "co2": "mean"}).reset_index()

corr_matrix = yearly_data[["year", "temperature", "co2"]].corr()
sns.heatmap(
    corr_matrix,
    annot=True,
    fmt=".3f",
    cmap="RdYlBu_r",
    center=0,
    square=True,
    ax=axes[2],
    cbar_kws={"shrink": 0.8},
)
axes[2].set_title("Correlation Matrix", fontsize=13, fontweight="bold")

# Regression plot
sns.regplot(
    data=yearly_data,
    x="co2",
    y="temperature",
    ax=axes[3],
    scatter_kws={"alpha": 0.6, "s": 50},
    line_kws={"color": "red", "linewidth": 2},
)
axes[3].set_title("Temperature vs CO₂ Concentration", fontsize=13, fontweight="bold")
axes[3].set_xlabel("CO₂ (ppm)", fontsize=11)
axes[3].set_ylabel("Temperature (°C)", fontsize=11)
axes[3].grid(True, alpha=0.3)

# Calculate correlation
corr = yearly_data["co2"].corr(yearly_data["temperature"])
axes[3].text(
    0.05,
    0.95,
    f"R = {corr:.3f}",
    transform=axes[3].transAxes,
    fontsize=12,
    verticalalignment="top",
    bbox=dict(boxstyle="round", facecolor="wheat", alpha=0.5),
)

plt.tight_layout()
plt.show()

print("\nKey Statistics:")
print(f"Temperature-CO2 correlation: {corr:.4f}")
strength = "strong" if abs(corr) > 0.7 else "moderate"
direction = "positive" if corr > 0 else "negative"
print(f"This indicates a {strength} {direction} relationship")
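
A note of caution when reading the correlation above: in this synthetic dataset both CO2 and temperature are built to rise steadily with time, so any two such series will correlate strongly whether or not one drives the other. One hedged way to see how much of the correlation comes from the shared trend is to remove a straight-line fit from each series and correlate the residuals. The sketch below uses made-up series with independent noise; `detrend` is a helper defined here, not a library function.

```python
import numpy as np

rng = np.random.default_rng(0)
years = np.arange(1960, 2025)
t = years - years[0]

# Two series that share a linear trend but have independent noise
series_a = 315 + 1.8 * t + rng.normal(0, 0.5, t.size)
series_b = 14 + 0.04 * t + rng.normal(0, 0.1, t.size)


def detrend(values, x):
    """Subtract the least-squares straight line from values."""
    slope, intercept = np.polyfit(x, values, 1)
    return values - (slope * x + intercept)


raw_r = np.corrcoef(series_a, series_b)[0, 1]
detrended_r = np.corrcoef(detrend(series_a, t), detrend(series_b, t))[0, 1]

print(f"Correlation of raw series:       {raw_r:.3f}")
print(f"Correlation of detrended series: {detrended_r:.3f}")
```

The raw correlation is close to 1 because of the common trend, while the detrended value reflects only the noise, which is unrelated here by construction.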

## Part 4: Time Series Analysis

Take a closer look at temporal patterns using monthly resampling, rolling averages, and the average seasonal cycle.

In [None]:
# Monthly analysis for a specific region
north_america = climate_df[climate_df["region"] == "North America"].copy()
north_america = north_america.set_index("date")

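# Resample to monthly means ("MS" labels each month by its first day and works
# across pandas versions; the older "M" alias is deprecated in pandas 2.2+)
monthly_temp = north_america["temperature"].resample("MS").mean()
monthly_co2 = north_america["co2"].resample("MS").mean()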

# Calculate rolling statistics
rolling_12m = monthly_temp.rolling(window=12).mean()
rolling_60m = monthly_temp.rolling(window=60).mean()

fig, axes = plt.subplots(2, 1, figsize=(14, 10), sharex=True)

# Temperature with rolling averages
axes[0].plot(
    monthly_temp.index, monthly_temp, linewidth=0.8, color="gray", alpha=0.5, label="Monthly"
)
axes[0].plot(rolling_12m.index, rolling_12m, linewidth=2, color="blue", label="12-month average")
axes[0].plot(rolling_60m.index, rolling_60m, linewidth=2.5, color="red", label="5-year average")

axes[0].set_title(
    "North America Temperature: Monthly Data and Rolling Averages", fontsize=15, fontweight="bold"
)
axes[0].set_ylabel("Temperature (°C)", fontsize=12)
axes[0].legend(loc="upper left", fontsize=11)
axes[0].grid(True, alpha=0.3)

# Seasonal decomposition visualization (simplified)
# Calculate yearly cycle
monthly_temp_df = monthly_temp.to_frame()
monthly_temp_df["month"] = monthly_temp_df.index.month
seasonal_pattern = monthly_temp_df.groupby("month")["temperature"].mean()

# Plot seasonal pattern
months = ["Jan", "Feb", "Mar", "Apr", "May", "Jun", "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]
axes[1].plot(
    range(1, 13), seasonal_pattern.values, "o-", linewidth=2.5, markersize=8, color="green"
)
axes[1].fill_between(range(1, 13), seasonal_pattern.values, alpha=0.3, color="green")
axes[1].set_xticks(range(1, 13))
axes[1].set_xticklabels(months)
axes[1].set_title("Average Seasonal Pattern", fontsize=15, fontweight="bold")
axes[1].set_xlabel("Month", fontsize=12)
axes[1].set_ylabel("Temperature (°C)", fontsize=12)
axes[1].grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

print("Seasonal amplitude (peak-to-trough):")
print(f"{seasonal_pattern.max() - seasonal_pattern.min():.2f}°C")
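
The rolling averages above start later than the monthly series because a trailing window of 12 needs 12 observations before it produces a value. The sketch below, on a small made-up series, shows that default behavior alongside two common variations: a centered window and `min_periods` to allow shorter windows at the edges.

```python
import numpy as np
import pandas as pd

# 24 months of made-up values: 0.0, 1.0, ..., 23.0
months = pd.date_range("2020-01-01", periods=24, freq="MS")
series = pd.Series(np.arange(24, dtype=float), index=months)

trailing = series.rolling(window=12).mean()                # default: trailing window
centered = series.rolling(window=12, center=True).mean()   # window centered on each point
partial = series.rolling(window=12, min_periods=1).mean()  # allow shorter windows at the start

print("NaN count (trailing):", trailing.isna().sum())
print("NaN count (centered):", centered.isna().sum())
print("NaN count (min_periods=1):", partial.isna().sum())
```

A trailing window lags the underlying signal by roughly half its length, so a centered window is often easier to read on a plot, at the cost of losing values at both ends.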

## Part 5: Interactive Visualizations with Plotly

Create interactive dashboards for stakeholders.

In [None]:
# Interactive line chart of regional temperature trends
# Aggregate by year and region
yearly_regional = (
    climate_df.groupby(["year", "region"]).agg({"temperature": "mean", "co2": "mean"}).reset_index()
)

fig = px.line(
    yearly_regional,
    x="year",
    y="temperature",
    color="region",
    title="Interactive Regional Temperature Trends",
    labels={"temperature": "Temperature (°C)", "year": "Year", "region": "Region"},
)

fig.update_traces(line=dict(width=2.5))
fig.update_layout(hovermode="x unified", height=600, font=dict(size=12))

fig.show()

print("Interactive features:")
print("  - Hover to see exact values")
print("  - Click legend to show/hide regions")
print("  - Zoom and pan to explore periods")
print("  - Double-click to reset view")

In [None]:
# Interactive 3D visualization
# Sample data for performance (a fixed random_state keeps the sample reproducible)
sample_data = climate_df[climate_df["year"] >= 2000].sample(n=5000, random_state=42)

fig = px.scatter_3d(
    sample_data,
    x="year",
    y="co2",
    z="temperature",
    color="region",
    title="3D Climate Data Exploration (2000-2024)",
    labels={"temperature": "Temperature (°C)", "co2": "CO₂ (ppm)", "year": "Year"},
    opacity=0.6,
)

fig.update_traces(marker=dict(size=3))
fig.update_layout(
    scene=dict(xaxis_title="Year", yaxis_title="CO₂ (ppm)", zaxis_title="Temperature (°C)"),
    height=700,
)

fig.show()

print("\nRotate the 3D plot to explore relationships from different angles!")

In [None]:
# Comprehensive interactive dashboard
fig = make_subplots(
    rows=2,
    cols=2,
    subplot_titles=(
        "Global Temperature Trend",
        "CO₂ Concentration",
        "Regional Comparison (2024)",
        "Temperature vs CO₂",
    ),
    specs=[[{"type": "scatter"}, {"type": "scatter"}], [{"type": "bar"}, {"type": "scatter"}]],
)

# Temperature trend
fig.add_trace(
    go.Scatter(
        x=yearly_temp["year"],
        y=yearly_temp["temperature"],
        mode="lines+markers",
        name="Temperature",
        line=dict(color="orangered", width=2),
        marker=dict(size=4),
    ),
    row=1,
    col=1,
)

# CO2 trend
fig.add_trace(
    go.Scatter(
        x=yearly_co2["year"],
        y=yearly_co2["co2"],
        mode="lines+markers",
        name="CO₂",
        line=dict(color="steelblue", width=2),
        marker=dict(size=4),
    ),
    row=1,
    col=2,
)

# Regional comparison (2024)
recent_regional = climate_df[climate_df["year"] == 2024].groupby("region")["temperature"].mean()
fig.add_trace(
    go.Bar(x=recent_regional.index, y=recent_regional.values, name="2024 Avg", marker_color="teal"),
    row=2,
    col=1,
)

# Temperature vs CO2
fig.add_trace(
    go.Scatter(
        x=yearly_data["co2"],
        y=yearly_data["temperature"],
        mode="markers",
        name="Temp vs CO₂",
        marker=dict(
            size=6,
            color=yearly_data["year"],
            colorscale="Viridis",
            showscale=True,
            colorbar=dict(title="Year", x=1.15),
        ),
    ),
    row=2,
    col=2,
)

# Update axes
fig.update_xaxes(title_text="Year", row=1, col=1)
fig.update_xaxes(title_text="Year", row=1, col=2)
fig.update_xaxes(title_text="Region", row=2, col=1)
fig.update_xaxes(title_text="CO₂ (ppm)", row=2, col=2)

fig.update_yaxes(title_text="Temperature (°C)", row=1, col=1)
fig.update_yaxes(title_text="CO₂ (ppm)", row=1, col=2)
fig.update_yaxes(title_text="Temperature (°C)", row=2, col=1)
fig.update_yaxes(title_text="Temperature (°C)", row=2, col=2)

fig.update_layout(title_text="Climate Analysis Dashboard", showlegend=False, height=800)

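# Save interactive dashboard (create the output folder first if it is missing)
import os

os.makedirs("../notebooks/outputs", exist_ok=True)
fig.write_html("../notebooks/outputs/climate_dashboard.html")
fig.show()

print("\nInteractive dashboard saved to: ../notebooks/outputs/climate_dashboard.html")
print("Share this file with stakeholders for interactive exploration!")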

## Part 6: Publication-Quality Figures

Create figures ready for scientific publication.

In [None]:
# Set publication parameters
plt.rcParams.update(
    {
        "font.size": 8,
        "axes.labelsize": 9,
        "axes.titlesize": 10,
        "xtick.labelsize": 7,
        "ytick.labelsize": 7,
        "legend.fontsize": 7,
        "font.family": "sans-serif",
        # Arial may not be installed everywhere; DejaVu Sans ships with Matplotlib
        "font.sans-serif": ["Arial", "DejaVu Sans"],
        "axes.linewidth": 0.5,
        "lines.linewidth": 1.2,
    }
)

# Create publication figure
fig = plt.figure(figsize=(7, 8), dpi=300)
gs = fig.add_gridspec(3, 2, hspace=0.4, wspace=0.35)

# Panel A: Global temperature trend
ax1 = fig.add_subplot(gs[0, :])
ax1.plot(
    yearly_temp["year"],
    yearly_temp["temperature"],
    color="#0173B2",
    linewidth=1.2,
    marker="o",
    markersize=2,
)
ax1.plot(
    yearly_temp["year"], p(yearly_temp["year"]), "--", color="#DE8F05", linewidth=1.5, alpha=0.8
)
ax1.set_ylabel("Temperature (°C)", fontsize=9)
ax1.set_title("A", fontsize=11, fontweight="bold", loc="left")
ax1.spines["top"].set_visible(False)
ax1.spines["right"].set_visible(False)
ax1.grid(True, alpha=0.2, linewidth=0.5)

# Panel B: Regional trends
ax2 = fig.add_subplot(gs[1, 0])
colors = ["#0173B2", "#DE8F05", "#029E73", "#CC78BC", "#949494"]
for region, color in zip(climate_df["region"].unique(), colors):
    data = regional_yearly[regional_yearly["region"] == region]
    ax2.plot(data["year"], data["temperature"], color=color, linewidth=0.8, alpha=0.8, label=region)
ax2.set_ylabel("Temperature (°C)", fontsize=9)
ax2.set_title("B", fontsize=11, fontweight="bold", loc="left")
ax2.legend(frameon=False, loc="upper left", fontsize=6)
ax2.spines["top"].set_visible(False)
ax2.spines["right"].set_visible(False)
ax2.grid(True, alpha=0.2, linewidth=0.5)

# Panel C: Temperature vs CO2
ax3 = fig.add_subplot(gs[1, 1])
scatter = ax3.scatter(
    yearly_data["co2"],
    yearly_data["temperature"],
    s=10,
    alpha=0.6,
    c=yearly_data["year"],
    cmap="viridis",
    edgecolors="none",
)
ax3.plot(
    yearly_data["co2"],
    np.poly1d(np.polyfit(yearly_data["co2"], yearly_data["temperature"], 1))(yearly_data["co2"]),
    "r--",
    linewidth=1,
    alpha=0.7,
)
ax3.set_xlabel("CO₂ (ppm)", fontsize=9)
ax3.set_ylabel("Temperature (°C)", fontsize=9)
ax3.set_title("C", fontsize=11, fontweight="bold", loc="left")
ax3.spines["top"].set_visible(False)
ax3.spines["right"].set_visible(False)
ax3.grid(True, alpha=0.2, linewidth=0.5)
cbar = plt.colorbar(scatter, ax=ax3)
cbar.set_label("Year", fontsize=7)
cbar.ax.tick_params(labelsize=6)

# Panel D: Seasonal patterns
ax4 = fig.add_subplot(gs[2, :])
seasonal_by_region = climate_df.groupby(["season", "region"])["temperature"].mean().unstack()
season_order = ["Winter", "Spring", "Summer", "Fall"]
seasonal_by_region = seasonal_by_region.reindex(season_order)

x = np.arange(len(season_order))
width = 0.15
# Iterate in the same region order as Panel B so each region keeps its color
for i, (region, color) in enumerate(zip(climate_df["region"].unique(), colors)):
    ax4.bar(x + i * width, seasonal_by_region[region], width, label=region, color=color, alpha=0.8)

ax4.set_xlabel("Season", fontsize=9)
ax4.set_ylabel("Temperature (°C)", fontsize=9)
ax4.set_title("D", fontsize=11, fontweight="bold", loc="left")
ax4.set_xticks(x + width * 2)
ax4.set_xticklabels(season_order)
ax4.legend(frameon=False, ncol=5, loc="upper center", fontsize=6)
ax4.spines["top"].set_visible(False)
ax4.spines["right"].set_visible(False)
ax4.grid(True, alpha=0.2, linewidth=0.5, axis="y")

# Overall title
fig.suptitle(
    "Global Climate Analysis: Temperature and CO₂ Trends (1960-2024)",
    fontsize=11,
    fontweight="bold",
    y=0.995,
)

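# Save (create the output folder first if it is missing)
import os

os.makedirs("../notebooks/outputs", exist_ok=True)
plt.savefig("../notebooks/outputs/publication_climate_figure.png", dpi=300, bbox_inches="tight")
plt.savefig("../notebooks/outputs/publication_climate_figure.pdf", bbox_inches="tight")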

plt.show()

print("Publication-quality figure created!")
print("\nFiles saved:")
print("  - publication_climate_figure.png (300 DPI)")
print("  - publication_climate_figure.pdf (vector)")
print("\nReady for journal submission!")

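# Reset parameters, then restore the seaborn theme set at the top of the notebook
plt.rcParams.update(plt.rcParamsDefault)
sns.set_theme(style="whitegrid", context="notebook")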
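
An alternative to updating `plt.rcParams` globally and resetting it afterwards is to scope the publication settings with `plt.rc_context`, which restores the previous values automatically when the block exits. This is a sketch of that pattern with a shortened, illustrative settings dictionary (`pub_style` is a name introduced here), not a replacement for the full figure code above.

```python
import matplotlib

matplotlib.use("Agg")  # only needed outside a notebook; drop this line when running inline
import matplotlib.pyplot as plt

# Shortened, illustrative subset of the publication settings
pub_style = {"font.size": 8, "axes.linewidth": 0.5, "lines.linewidth": 1.2}

size_before = plt.rcParams["font.size"]

with plt.rc_context(pub_style):
    # Settings apply only inside this block
    size_inside = plt.rcParams["font.size"]
    fig, ax = plt.subplots(figsize=(3.5, 2.5))
    ax.plot([0, 1, 2], [0, 1, 4])
    plt.close(fig)

# Previous settings are back without an explicit reset
size_after = plt.rcParams["font.size"]
print(size_before, size_inside, size_after)
```

Scoping the style this way leaves any theme configured earlier in the session untouched.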

## Part 7: Data Storytelling - Executive Summary

Summarize the key findings as a short, data-backed narrative.

In [None]:
# Generate executive summary statistics
print("=" * 60)
print("EXECUTIVE SUMMARY: GLOBAL CLIMATE ANALYSIS")
print("Period: 1960-2024")
print("=" * 60)

# Key findings
temp_1960 = climate_df[climate_df["year"] == 1960]["temperature"].mean()
temp_2024 = climate_df[climate_df["year"] == 2024]["temperature"].mean()
temp_increase = temp_2024 - temp_1960

co2_1960 = climate_df[climate_df["year"] == 1960]["co2"].mean()
co2_2024 = climate_df[climate_df["year"] == 2024]["co2"].mean()
co2_increase = co2_2024 - co2_1960

print("\n1. TEMPERATURE TRENDS")
print(f"   Average temperature (1960): {temp_1960:.2f}°C")
print(f"   Average temperature (2024): {temp_2024:.2f}°C")
# No percentage here: Celsius has an arbitrary zero, so a percent change is not meaningful
print(f"   Total increase: {temp_increase:.2f}°C")
print(f"   Rate of change: {z[0]:.4f}°C/year")

print("\n2. CO₂ CONCENTRATION")
print(f"   CO₂ level (1960): {co2_1960:.1f} ppm")
print(f"   CO₂ level (2024): {co2_2024:.1f} ppm")
print(f"   Total increase: {co2_increase:.1f} ppm ({(co2_increase/co2_1960*100):.1f}%)")
print(f"   Rate of change: {z_co2[0]:.2f} ppm/year")

print("\n3. REGIONAL VARIATIONS")
for region in climate_df["region"].unique():
    data = regional_yearly[regional_yearly["region"] == region]
    slope, _, r_value, _, _ = stats.linregress(data["year"], data["temperature"])
    print(f"   {region:20} {slope:+.4f}°C/year (R² = {r_value**2:.3f})")

print("\n4. KEY INSIGHTS")
print(f"   • Temperature-CO₂ correlation: {corr:.3f} (strong positive)")
print("   • All regions show warming trends")
print(f"   • Seasonal amplitude (North America): {seasonal_pattern.max() - seasonal_pattern.min():.1f}°C")
print("   • Steady long-term warming visible in 5-year rolling averages")

print("\n5. RECOMMENDATIONS")
print("   1. Continue monitoring global temperature and CO₂ levels")
print("   2. Focus on regional adaptation strategies")
print("   3. Investigate seasonal pattern changes over time")
print("   4. Develop predictive models for future scenarios")

print("\n" + "=" * 60)
print("Report generated:", datetime.now().strftime("%Y-%m-%d %H:%M:%S"))
print("=" * 60)

## Capstone Project Summary

### What You've Accomplished

Congratulations! You've completed a comprehensive data visualization project that demonstrates mastery of:

#### 1. Data Preparation
- Generated realistic synthetic climate dataset
- Performed data quality checks
- Structured data for analysis

#### 2. Exploratory Analysis (Matplotlib)
- Created trend visualizations
- Compared multiple time series
- Applied statistical analysis

#### 3. Statistical Visualization (Seaborn)
- Violin plots for distributions
- Box plots for comparisons
- Correlation heatmaps
- Regression analysis

#### 4. Time Series Analysis
- Monthly and yearly aggregations
- Rolling averages
- Average seasonal cycle (a simplified seasonal decomposition)
- Trend identification

#### 5. Interactive Visualizations (Plotly)
- Interactive line charts
- 3D visualizations
- Multi-panel dashboards
- HTML exports for sharing

#### 6. Publication-Quality Figures
- Professional multi-panel layout
- Proper sizing and DPI
- Colorblind-friendly palette
- Publication-ready formats (PNG, PDF)

#### 7. Data Storytelling
- Executive summary with key findings
- Clear insights and recommendations
- Professional presentation

### Skills Demonstrated

- ✓ Matplotlib fundamentals and customization
- ✓ Seaborn statistical plots
- ✓ Time series visualization
- ✓ Interactive Plotly dashboards
- ✓ Publication best practices
- ✓ Color accessibility
- ✓ Data storytelling
- ✓ Professional communication

### Project Outputs

1. **Interactive Dashboard**: `../notebooks/outputs/climate_dashboard.html`
2. **Publication Figure**: `../notebooks/outputs/publication_climate_figure.png` and `.pdf`
3. **Executive Summary**: Statistical insights and recommendations

---

## Next Steps

### Continue Your Learning

1. **Apply to Real Data**
   - Use actual climate datasets (NOAA, NASA)
   - Analyze your organization's data
   - Create dashboards for real stakeholders

2. **Advanced Topics**
   - Dash for web applications
   - D3.js for custom visualizations
   - Geographic mapping (GeoPandas, Folium)
   - Machine learning visualization (confusion matrices, ROC curves)

3. **Build Your Portfolio**
   - Create GitHub repository with your visualizations
   - Write blog posts explaining your work
   - Contribute to open-source visualization projects

4. **Practice Regularly**
   - Participate in data viz challenges (#TidyTuesday)
   - Recreate visualizations from news articles
   - Experiment with new techniques and libraries

### Resources

- **Documentation**: Matplotlib, Seaborn, Plotly official docs
- **Books**: 
  - "The Visual Display of Quantitative Information" by Edward Tufte
  - "Storytelling with Data" by Cole Nussbaumer Knaflic
- **Communities**: r/dataisbeautiful, Plotly community forum
- **Inspiration**: FlowingData, Information is Beautiful

---

**Congratulations on completing the Data Visualization Fundamentals course!**

You now have the skills to create professional, impactful visualizations that communicate insights effectively. Keep practicing and building your portfolio!

## Optional Challenge: Extend the Project

Take this project further by:

1. **Adding More Analysis**
   - Forecast future temperatures using trend analysis
   - Identify anomalies and extreme events
   - Calculate climate metrics (warming stripes, etc.)

2. **Enhanced Visualizations**
   - Animated timeline of temperature changes
   - Geographic heatmap of regional data
   - Polar plots for seasonal patterns

3. **Interactive Features**
   - Dropdown to select regions
   - Date range slider
   - Toggle between different metrics

4. **Real Data Integration**
   - Download actual climate data from NOAA
   - Merge with population or economic data
   - Validate findings against real trends
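
As a possible starting point for the first item, the sketch below computes temperature anomalies relative to a baseline period (the quantity that warming-stripes charts color by) and extrapolates a fitted linear trend. The 1961-1990 baseline, the target year 2030, and the noise-free `demo` series are all assumptions chosen for illustration; a straight-line extrapolation is a naive forecast, not a climate model.

```python
import numpy as np
import pandas as pd

# Made-up yearly series with a known slope of 0.04 per year
years = np.arange(1960, 2025)
demo = pd.DataFrame({"year": years, "temperature": 14.0 + 0.04 * (years - 1960)})

# Anomaly = value minus the mean over an assumed baseline window
baseline = demo[demo["year"].between(1961, 1990)]["temperature"].mean()
demo["anomaly"] = demo["temperature"] - baseline

# Naive forecast: extend the least-squares line to a future year
slope, intercept = np.polyfit(demo["year"], demo["temperature"], 1)
forecast_2030 = slope * 2030 + intercept

print(f"Baseline mean: {baseline:.2f}")
print(f"Latest anomaly: {demo['anomaly'].iloc[-1]:+.2f}")
print(f"Linear forecast for 2030: {forecast_2030:.2f}")
```

To adapt this to the project, replace `demo` with a yearly aggregate of the climate dataset, for example the per-year mean temperature.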

Use the cells below to experiment!

In [None]:
# Your extended analysis here

In [None]:
# Additional visualizations

In [None]:
# Experimentation space