# Climate Data Exploratory Data Analysis

## Introduction
This notebook contains an exploratory data analysis of climate data from 1900 to 2023. The dataset includes global temperatures, CO2 concentration, sea level rise, and Arctic ice area.

Your task is to perform a comprehensive EDA following the requirements in the README.md file.

In [1]:
# Import required libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from mpl_toolkits.mplot3d import Axes3D

# Set plot styling
plt.style.use('seaborn-v0_8-whitegrid')
sns.set_palette('viridis')
%matplotlib inline

## 1. Data Preparation

Load the climate data and perform necessary cleaning and aggregation.

In [4]:
# Load the dataset
# Load the dataset using a path relative to this notebook
df = pd.read_csv('data/Climate_Change_Indicators.csv')

# Display the first few rows of the dataset
df.head()

Unnamed: 0,Year,Global Average Temperature (°C),CO2 Concentration (ppm),Sea Level Rise (mm),Arctic Ice Area (million km²)
0,1948,13.17,397.04,116.25,5.97
1,1996,13.1,313.17,277.92,9.66
2,2015,14.67,311.95,290.32,8.4
3,1966,14.79,304.25,189.71,11.83
4,1992,13.15,354.52,14.84,11.23


In [None]:
# Check for missing values and basic information about the dataset
print("Dataset Information:")
print(df.info())
print("\nMissing Values:")
print(df.isnull().sum())

In [None]:
# Aggregate data by year to create a 124-year time series
# (one row per year, taking the mean of each numeric column)
df_agg = df.groupby("Year").mean(numeric_only=True).reset_index()

# Display the first few rows
print(df_agg.head())

# Check the number of unique years
print(f"\nNumber of unique years: {df_agg['Year'].nunique()}")
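
The aggregation above is expected to yield one row per year from 1900 to 2023. A minimal sketch of how that coverage could be checked follows; the helper name `check_year_coverage` and the small demo frame are hypothetical and only stand in for `df_agg`.

```python
import pandas as pd

def check_year_coverage(frame, start=1900, end=2023, year_col="Year"):
    """Return the sorted list of years in [start, end] that are absent from frame."""
    present = set(frame[year_col].astype(int))
    return [year for year in range(start, end + 1) if year not in present]

# Tiny made-up frame standing in for df_agg (values are illustrative only)
demo = pd.DataFrame({"Year": [1900, 1901, 1903],
                     "Sea Level Rise (mm)": [1.0, 2.0, 3.0]})
missing = check_year_coverage(demo, start=1900, end=1904)
print(missing)
```

Calling `check_year_coverage(df_agg)` on the real aggregate should return an empty list if all 124 years are present.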


## 2. Univariate Analysis

Analyze each climate variable independently.

In [None]:
# Univariate analysis: descriptive statistics and distribution plots
# for each climate variable
climate_variables = [
    "Global Average Temperature (°C)",
    "CO2 Concentration (ppm)",
    "Sea Level Rise (mm)",
    "Arctic Ice Area (million km²)"
]

# Descriptive Statistics
print("Descriptive Statistics:\n")
print(df_agg[climate_variables].describe())

# Visualizing Each Variable
plt.figure(figsize=(14, 10))

for i, col in enumerate(climate_variables, 1):
    plt.subplot(2, 2, i)
    sns.histplot(df_agg[col], bins=30, kde=True, color="royalblue")
    plt.title(f"Distribution of {col}")
    plt.xlabel(col)
    plt.ylabel("Frequency")

plt.tight_layout()
plt.show()

# Box Plots for Outlier Detection
plt.figure(figsize=(12, 8))
sns.boxplot(data=df_agg[climate_variables], palette="coolwarm")
plt.title("Box Plots of Climate Variables")
plt.xticks(rotation=15)
plt.show()
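
The box plots above flag outliers visually. As a rough numeric counterpart, the sketch below counts values outside the 1.5 × IQR whiskers for each column. The function name `iqr_outlier_counts` and the demo data are assumptions made for illustration, not part of the original analysis.

```python
import pandas as pd

def iqr_outlier_counts(frame, columns, k=1.5):
    """Count values below Q1 - k*IQR or above Q3 + k*IQR in each column."""
    counts = {}
    for col in columns:
        q1 = frame[col].quantile(0.25)
        q3 = frame[col].quantile(0.75)
        iqr = q3 - q1
        lower, upper = q1 - k * iqr, q3 + k * iqr
        counts[col] = int(((frame[col] < lower) | (frame[col] > upper)).sum())
    return pd.Series(counts, name="outliers")

# Made-up data: column "a" has one extreme value, column "b" has none
demo = pd.DataFrame({"a": [1, 2, 3, 4, 100], "b": [10, 11, 12, 13, 14]})
outliers = iqr_outlier_counts(demo, ["a", "b"])
print(outliers)
```

On the notebook's data this would be called as `iqr_outlier_counts(df_agg, climate_variables)`. Because the four variables have very different scales, per-column counts may be easier to read than a single shared-axis box plot.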


## 3. Bivariate Analysis

Explore relationships between pairs of climate variables.

In [None]:
# Bivariate analysis: correlation matrix, heatmap, and pairwise scatter plots
climate_variables = [
    "Global Average Temperature (°C)",
    "CO2 Concentration (ppm)",
    "Sea Level Rise (mm)",
    "Arctic Ice Area (million km²)"
]

#  Correlation Matrix
correlation_matrix = df_agg[climate_variables].corr()
print("Correlation Matrix:\n", correlation_matrix)

#  Heatmap to Visualize Correlations
plt.figure(figsize=(8, 6))
sns.heatmap(correlation_matrix, annot=True, cmap="coolwarm", fmt=".2f", linewidths=0.5)
plt.title("Correlation Heatmap of Climate Variables")
plt.show()

#  Scatter Plots to Show Relationships
sns.pairplot(df_agg[climate_variables], diag_kind="kde", markers="o", plot_kws={'alpha':0.5})
plt.suptitle("Pairplot of Climate Variables", y=1.02)
plt.show()
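
`DataFrame.corr()` computes Pearson correlation by default, which only captures linear association. A possible complement, sketched below, is a rank-based (Spearman) coefficient obtained by correlating the ranks. The helper `pearson_and_spearman` and the demo values are hypothetical.

```python
import pandas as pd

def pearson_and_spearman(frame, x, y):
    """Return (Pearson r, Spearman rho) for two columns of frame."""
    pearson = frame[x].corr(frame[y])                 # linear association
    spearman = frame[x].rank().corr(frame[y].rank())  # Pearson on ranks = Spearman
    return pearson, spearman

# Made-up monotonic but non-linear relationship: y = x squared
demo = pd.DataFrame({"x": [1, 2, 3, 4, 5], "y": [1, 4, 9, 16, 25]})
pearson_r, spearman_rho = pearson_and_spearman(demo, "x", "y")
print(pearson_r, spearman_rho)
```

A large gap between the two coefficients for a pair of climate variables would suggest a monotonic but non-linear relationship worth inspecting in the pairplot.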

## 4. Multivariate Analysis

Investigate relationships among three or more variables.

In [None]:
# Multivariate analysis: visualizations that combine three or more variables
climate_variables = [
    "Global Average Temperature (°C)",
    "CO2 Concentration (ppm)",
    "Sea Level Rise (mm)",
    "Arctic Ice Area (million km²)"
]

#  Heatmap for Multivariate Correlations
plt.figure(figsize=(8, 6))
sns.heatmap(df_agg[climate_variables].corr(), annot=True, cmap="coolwarm", fmt=".2f", linewidths=0.5)
plt.title("Multivariate Correlation Heatmap")
plt.show()

# Pairplot to Explore Pairwise Relationships
sns.pairplot(df_agg[climate_variables], diag_kind="kde", markers="o", plot_kws={'alpha':0.5})
plt.suptitle("Multivariate Pairplot of Climate Variables", y=1.02)
plt.show()

#  3D Scatter Plot (CO2 vs Temperature vs Sea Level)
fig = plt.figure(figsize=(10, 7))
ax = fig.add_subplot(111, projection='3d')
ax.scatter(df_agg["CO2 Concentration (ppm)"], df_agg["Global Average Temperature (°C)"], df_agg["Sea Level Rise (mm)"], c=df_agg["CO2 Concentration (ppm)"], cmap='coolwarm')

ax.set_xlabel("CO2 Concentration (ppm)")
ax.set_ylabel("Global Temperature (°C)")
ax.set_zlabel("Sea Level Rise (mm)")
ax.set_title("3D Scatter Plot: CO2 vs Temperature vs Sea Level")
plt.show()

#  Multivariate Line Plot (Trends Over Time)
plt.figure(figsize=(12, 6))
for col in climate_variables:  # 'Year' is not in this list, so plot every variable
    plt.plot(df_agg["Year"], df_agg[col], label=col, linewidth=2)

plt.xlabel("Year")
plt.ylabel("Value")
plt.title("Multivariate Time Series Plot of Climate Variables")
plt.legend()
plt.show()
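
The line plot above draws variables with very different units on one axis, so the series with the largest values can flatten the others. One option, sketched here under the assumption that only the shape of each trend matters, is to min-max scale every column to [0, 1] before plotting. `min_max_scale` is a hypothetical helper and the demo values are made up.

```python
import pandas as pd

def min_max_scale(frame, columns):
    """Rescale each column to the [0, 1] range; constant columns map to 0."""
    scaled = frame[columns].astype(float)
    span = scaled.max() - scaled.min()
    span = span.replace(0, 1.0)  # avoid division by zero for constant columns
    return (scaled - scaled.min()) / span

# Made-up values standing in for two of the climate variables
demo = pd.DataFrame({"Year": [2000, 2001, 2002],
                     "co2": [300.0, 350.0, 400.0],
                     "ice": [12.0, 10.0, 8.0]})
scaled = min_max_scale(demo, ["co2", "ice"])
print(scaled)
```

In the notebook, `min_max_scale(df_agg, climate_variables)` could be plotted against `df_agg["Year"]` with the y-axis labelled as a normalized value.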

## 5. Conclusions and Insights

Summarize your findings and discuss their implications.


### 🔍 Key Findings

1. **Rising Global Temperatures**  
   - Global average temperature has significantly increased over time.  
   - Strong positive correlation between **CO₂ concentration** and **temperature**, supporting greenhouse gas-induced warming.  

2. **Increase in CO₂ Concentration**  
   - CO₂ levels have risen steadily, especially in the post-industrial era.  
   - This trend aligns with increased **fossil fuel consumption** and **deforestation**.  

3. **Sea Level Rise**  
   - A consistent increase in **sea levels** is observed over the years.  
   - Rising temperatures contribute to **glacial melting** and **thermal expansion of seawater**.  

4. **Declining Arctic Ice Area**  
   - Arctic ice coverage is **shrinking** over time.  
   - This affects **polar ecosystems, ocean currents, and global climate patterns**.  
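
The direction and strength of each trend listed above could be checked numerically with a least-squares line fitted against the year. The sketch below is one way to do that; `linear_trends` is a hypothetical helper and the demo numbers are invented, not taken from the dataset.

```python
import numpy as np
import pandas as pd

def linear_trends(frame, columns, year_col="Year"):
    """Fit value = slope * year + intercept per column; report slope and R^2."""
    years = frame[year_col].to_numpy(dtype=float)
    centered = years - years.mean()  # centering improves numerical conditioning
    rows = {}
    for col in columns:
        values = frame[col].to_numpy(dtype=float)
        slope, _intercept = np.polyfit(centered, values, deg=1)
        r = np.corrcoef(years, values)[0, 1]
        rows[col] = {"slope_per_year": slope, "r_squared": r ** 2}
    return pd.DataFrame(rows).T

# Invented series rising by 0.1 units per year
demo = pd.DataFrame({"Year": [2000, 2001, 2002, 2003],
                     "temp": [14.0, 14.1, 14.2, 14.3]})
trends = linear_trends(demo, ["temp"])
print(trends)
```

Running `linear_trends(df_agg, climate_variables)` would show whether each slope has the sign claimed above and how much of the variance a linear trend explains.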

---

### 🌍 Implications

- **Climate Change Acceleration:**  
  - The trends are consistent with human activities (**CO₂ emissions**) **driving global warming**, although correlation in this dataset alone does not establish causation.  
  - Immediate actions like **reducing emissions** and **transitioning to renewable energy** are crucial.  

- **Rising Sea Levels Pose a Threat:**  
  - Coastal cities and low-lying areas are at **high risk of flooding** due to sea level rise.  
  - This calls for **adaptation strategies**, including improved infrastructure and policy measures.  

- **Arctic Melting & Global Impact:**  
  - Loss of Arctic ice affects **weather patterns, biodiversity, and ocean circulation**.  
  - This can lead to **more extreme weather events** globally.  

---

### 📊 Future Considerations

- Further research on **climate feedback loops** (e.g., methane release from permafrost).  
- Assessing the **impact of policy changes** (e.g., Paris Agreement) on emissions.  
- Conducting **regional climate analysis** to understand localized effects.  

---

