# Climate Data Exploratory Data Analysis

## Introduction
This notebook contains an exploratory data analysis of climate data from 1900 to 2023. The dataset includes global temperatures, CO2 concentration, sea level rise, and Arctic ice area.

Your task is to perform a comprehensive EDA following the requirements in the README.md file.

In [None]:
# Import required libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from mpl_toolkits.mplot3d import Axes3D
import plotly.express as px

# Set plot styling
plt.style.use('classic')
sns.set_palette('viridis')
%matplotlib inline

## 1. Data Preparation

Load the climate data and perform necessary cleaning and aggregation.

In [None]:
# Load the dataset
df = pd.read_csv('data/Climate_Change_Indicators.csv')  # relative path to the dataset

# Display the first few rows of the dataset
df.head()

In [None]:
# Check for missing values and basic information about the dataset
print("Dataset Information:")
print(df.info())
print("\nMissing Values:")
print(df.isnull().sum())
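
The checks above only report missing values. One possible way to handle them before aggregating is sketched below on a tiny made-up frame (the values are invented for illustration and are not taken from the dataset): non-numeric years are coerced and dropped, and remaining gaps in a measurement column are filled with that column's median. Whether median filling is appropriate depends on how much data is missing and why.

```python
import numpy as np
import pandas as pd

# Toy frame standing in for the real CSV; the values are invented.
toy = pd.DataFrame({
    "Year": ["1900", "1900", "1901", "bad", "1902"],
    "CO2 Concentration (ppm)": [300.0, np.nan, 310.0, 305.0, 320.0],
})

# Coerce Year to numeric; unparseable entries become NaN and are dropped.
toy["Year"] = pd.to_numeric(toy["Year"], errors="coerce")
cleaned = toy.dropna(subset=["Year"]).copy()
cleaned["Year"] = cleaned["Year"].astype(int)

# Fill remaining gaps in the measurement with the column median.
col = "CO2 Concentration (ppm)"
cleaned[col] = cleaned[col].fillna(cleaned[col].median())

print(cleaned)
```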

In [None]:
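# Aggregate data by year to create a 124-year time series
df['Year'] = pd.to_numeric(df['Year'], errors='coerce')

# Average every numeric column within each year
df_yearly = df.groupby('Year').mean(numeric_only=True).reset_index()

# print() is needed here because head() is not the last expression in the cell
print(df_yearly.head())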


# Visualize one of the time series, like Global Average Temperature

plt.plot(df_yearly['Year'], df_yearly['Global Average Temperature (°C)'])
plt.xlabel('Year')
plt.ylabel('Global Average Temperature (°C)')
plt.title('Global Average Temperature Over 124 Years')
#plt.grid(True)
plt.show()
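
A quick sanity check on the aggregation can confirm that the result really is a 124-year series. The sketch below uses a synthetic stand-in for the dataset (random values, with five readings per year as an assumption) and checks that grouping by year yields one row per year with none missing from 1900 to 2023.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in: several readings per year, 1900-2023 (values are random).
rng = np.random.default_rng(0)
years = np.repeat(np.arange(1900, 2024), 5)
demo = pd.DataFrame({
    "Year": years,
    "Global Average Temperature (°C)": rng.normal(14.5, 0.5, size=years.size),
})

demo_yearly = demo.groupby("Year").mean(numeric_only=True).reset_index()

# One row per year, with no year missing from the expected range.
expected_years = set(range(1900, 2024))
missing_years = expected_years - set(demo_yearly["Year"])
print(len(demo_yearly), "rows;", len(missing_years), "missing years")
```

The same two checks can be run on `df_yearly` once the real file is loaded.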

## 2. Univariate Analysis

Analyze each climate variable independently.

In [None]:
# TODO: Perform univariate analysis for each climate variable
# Get descriptive statistics for the numerical columns
desc_stats = df.describe()
print(desc_stats)

# Visualizing Global Average Temperature (°C)
# Global Average Temperature Histogram
plt.figure(figsize=(10, 6))
sns.histplot(df['Global Average Temperature (°C)'], kde=True, bins=30)
plt.title('Distribution of Global Average Temperature (°C)')
plt.xlabel('Global Average Temperature (°C)')
plt.ylabel('Frequency')
plt.show()

# Box plot for outliers
plt.figure(figsize=(10, 6))
sns.boxplot(x=df['Global Average Temperature (°C)'])
plt.title('Box Plot of Global Average Temperature (°C)')
plt.show()

# Visualizing CO2 Concentration (ppm)

# CO2 Concentration Histogram
plt.figure(figsize=(10, 6))
sns.histplot(df['CO2 Concentration (ppm)'], kde=True, bins=30)
plt.title('Distribution of CO2 Concentration (ppm)')
plt.xlabel('CO2 Concentration (ppm)')
plt.ylabel('Frequency')
plt.show()

# Box plot for CO2 Concentration
plt.figure(figsize=(10, 6))
sns.boxplot(x=df['CO2 Concentration (ppm)'])
plt.title('Box Plot of CO2 Concentration (ppm)')
plt.show()


# Visualizing Sea Level Rise (mm)
# Sea Level Rise Histogram
plt.figure(figsize=(10, 6))
sns.histplot(df['Sea Level Rise (mm)'], kde=True, bins=30)
plt.title('Distribution of Sea Level Rise (mm)')
plt.xlabel('Sea Level Rise (mm)')
plt.ylabel('Frequency')
plt.show()

# Box plot for Sea Level Rise
plt.figure(figsize=(10, 6))
sns.boxplot(x=df['Sea Level Rise (mm)'])
plt.title('Box Plot of Sea Level Rise (mm)')
plt.show()


# Visualizing Arctic Ice Area (million km²)
# Arctic Ice Area Histogram
plt.figure(figsize=(10, 6))
sns.histplot(df['Arctic Ice Area (million km²)'], kde=True, bins=30)
plt.title('Distribution of Arctic Ice Area (million km²)')
plt.xlabel('Arctic Ice Area (million km²)')
plt.ylabel('Frequency')
plt.show()

# Box plot for Arctic Ice Area
plt.figure(figsize=(10, 6))
sns.boxplot(x=df['Arctic Ice Area (million km²)'])
plt.title('Box Plot of Arctic Ice Area (million km²)')
plt.show()



### Trends, Outliers, and Distributions
The Global Average Temperature histogram shows most values between 13°C and 16°C, with the highest frequencies in the 14.5°C to 15.5°C range. There are no sharp peaks or extreme dips, which is consistent with a smooth warming trend rather than erratic fluctuations. A histogram pools all years together, so the direction of change over time has to be read from the time series plot in Section 1 rather than from the distribution alone.

Similarly, the CO₂ Concentration histogram spans 280 ppm to 420 ppm, with values spread fairly evenly across the range and frequencies increasing towards the higher end. This is consistent with CO₂ levels rising progressively alongside industrial activity and fossil fuel consumption. The lack of sharp outliers supports the hypothesis of a long-term, gradual increase rather than short-lived fluctuations.

The Sea Level Rise distribution covers values from 0 mm to 300 mm and follows a similar pattern. The near-uniform frequency across the range is what a continuous rise in sea levels would produce, in line with melting glaciers and thermal expansion of seawater. With no extreme outliers, the distribution is consistent with a persistent and measurable impact of climate change on ocean levels.

The Arctic Ice Area distribution ranges from 2 million km² to 16 million km². Larger ice areas are less frequent and smaller ice areas more frequent, pointing to ongoing ice loss. This agrees with scientific findings that Arctic ice has been shrinking over the past decades due to rising temperatures.
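
The statements about outliers above can be checked numerically with the same 1.5 × IQR rule that box plots use for their whiskers. The sketch below is a minimal version of that rule; `iqr_outliers` is a hypothetical helper name, and the sample values are invented so that exactly one point falls outside the fences.

```python
import pandas as pd

def iqr_outliers(series: pd.Series, k: float = 1.5) -> pd.Series:
    """Return the values lying outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, q3 = series.quantile([0.25, 0.75])
    iqr = q3 - q1
    mask = (series < q1 - k * iqr) | (series > q3 + k * iqr)
    return series[mask]

# Invented values: nine ordinary readings and one obvious outlier.
temps = pd.Series(
    [14.1, 14.3, 14.4, 14.5, 14.6, 14.7, 14.8, 14.9, 15.0, 25.0],
    name="Global Average Temperature (°C)",
)
print(iqr_outliers(temps))
```

Applying the helper to each column of `df` would give an outlier count per variable to quote alongside the box plots.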

## 3. Bivariate Analysis

Explore relationships between pairs of climate variables.

In [None]:
# TODO: Perform bivariate analysis
# Include correlation analysis and appropriate visualizations
# Your code here


#  Scatter Plots

fig, axes = plt.subplots(2, 2, figsize=(12, 10))

# CO2 vs Temperature
sns.scatterplot(x=df["CO2 Concentration (ppm)"], y=df["Global Average Temperature (°C)"],
                s=10, alpha=0.5, ax=axes[0, 0])
axes[0, 0].set_title("CO2 Concentration vs Global Temperature")

# Sea Level Rise vs Temperature
sns.scatterplot(x=df["Sea Level Rise (mm)"], y=df["Global Average Temperature (°C)"],
                s=10, alpha=0.5, ax=axes[0, 1])
axes[0, 1].set_title("Sea Level Rise vs Global Temperature")

# Arctic Ice vs Temperature
sns.scatterplot(x=df["Arctic Ice Area (million km²)"], y=df["Global Average Temperature (°C)"],
                s=10, alpha=0.5, ax=axes[1, 0])
axes[1, 0].set_title("Arctic Ice Area vs Global Temperature")

# CO2 vs Sea Level Rise
sns.scatterplot(x=df["CO2 Concentration (ppm)"], y=df["Sea Level Rise (mm)"],
                s=10, alpha=0.5, ax=axes[1, 1])
axes[1, 1].set_title("CO2 Concentration vs Sea Level Rise")

plt.tight_layout()
plt.show()

# Compute & Display Correlation Matrix
correlation_matrix = df.corr(numeric_only=True)

print("Correlation Matrix:")
print(correlation_matrix)


# Correlation Heatmap
plt.figure(figsize=(8, 6))
sns.heatmap(correlation_matrix, annot=True, cmap="coolwarm", fmt=".2f", linewidths=0.5)
plt.title("Correlation Heatmap of Climate Variables")
plt.show()


#  Line Plot for Temporal Trends
fig, ax = plt.subplots(figsize=(10, 5))

# Plot CO2, Temperature, and Sea Level over time
sns.lineplot(x=df["Year"], y=df["CO2 Concentration (ppm)"], label="CO2 Concentration", ax=ax)
sns.lineplot(x=df["Year"], y=df["Global Average Temperature (°C)"], label="Global Temperature", ax=ax)
sns.lineplot(x=df["Year"], y=df["Sea Level Rise (mm)"], label="Sea Level Rise", ax=ax)

ax.set_title("Climate Indicators Over Time")
plt.legend()
plt.show()

#  Pair Plot for Multi-variable Analysis

sns.pairplot(df, diag_kind='kde')
plt.show()

# Regression plot: CO2 vs temperature
plt.figure(figsize=(6, 4))
sns.regplot(x=df["CO2 Concentration (ppm)"], y=df["Global Average Temperature (°C)"])
plt.title("CO2 vs Temperature (with Regression Line)")
plt.show()
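
The plots and correlations in this section use the raw rows in `df`. When there are many noisy readings per year, correlations computed on raw rows can look much weaker than correlations computed on yearly means. The sketch below illustrates the effect on invented data (a slow shared trend plus large per-reading noise, 20 readings per year as an assumption); it says nothing about which value the real dataset will produce.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)
years = np.repeat(np.arange(1900, 2024), 20)

# Invented data: a slow upward trend buried under large per-reading noise.
trend = (years - 1900) / 123.0
raw = pd.DataFrame({
    "Year": years,
    "CO2 Concentration (ppm)": 280 + 140 * trend + rng.normal(0, 60, years.size),
    "Global Average Temperature (°C)": 13.5 + 1.5 * trend + rng.normal(0, 1.0, years.size),
})

cols = ["CO2 Concentration (ppm)", "Global Average Temperature (°C)"]
r_raw = raw[cols].corr().iloc[0, 1]
r_yearly = raw.groupby("Year")[cols].mean().corr().iloc[0, 1]
print(f"raw rows: r = {r_raw:.2f}, yearly means: r = {r_yearly:.2f}")
```

Reporting both figures makes it clear whether a correlation describes individual readings or the long-term yearly signal.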

## 4. Multivariate Analysis

Investigate relationships among three or more variables.

In [None]:
# TODO: Perform multivariate analysis
# Create advanced visualizations showing multiple variables
# Your code here


#  Pairplot for Multi-variable Relationships
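g = sns.pairplot(df, diag_kind='kde')
# plt.title would only label the last subplot, so set a figure-level title instead
g.fig.suptitle("Pairplot of Climate Variables", y=1.02)
plt.show()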


#3D Scatter Plot (CO2, Temperature, and Sea Level)
fig = plt.figure(figsize=(8, 6))
ax = fig.add_subplot(111, projection='3d')

# Scatter plot
ax.scatter(df["CO2 Concentration (ppm)"], df["Global Average Temperature (°C)"], df["Sea Level Rise (mm)"], 
           c=df["Global Average Temperature (°C)"], cmap="coolwarm", alpha=0.7)

ax.set_xlabel("CO2 Concentration (ppm)")
ax.set_ylabel("Global Average Temperature (°C)")
ax.set_zlabel("Sea Level Rise (mm)")
ax.set_title("3D Scatter Plot of CO2, Temperature, and Sea Level Rise")

plt.show()


# Interactive 3D Plot using Plotly

fig = px.scatter_3d(df, x="CO2 Concentration (ppm)", y="Global Average Temperature (°C)", z="Sea Level Rise (mm)",
                     color="Arctic Ice Area (million km²)", size_max=10, opacity=0.7,
                     title="Interactive 3D Plot: CO2, Temperature & Sea Level Rise",
                     labels={"CO2 Concentration (ppm)": "CO2 (ppm)", "Global Average Temperature (°C)": "Temperature (°C)",
                             "Sea Level Rise (mm)": "Sea Level (mm)"})

fig.show()


#  Heatmap for Multi-variable Correlations

plt.figure(figsize=(8, 6))
sns.heatmap(df.corr(), annot=True, cmap="coolwarm", fmt=".2f", linewidths=0.5)
plt.title("Correlation Heatmap of Climate Variables")
plt.show()


# Small Multiples (FacetGrid) - Arctic Ice Area vs Other Indicators

g = sns.FacetGrid(df, col="Year", col_wrap=5, height=2.5, aspect=1)
g.map_dataframe(sns.scatterplot, x="CO2 Concentration (ppm)", y="Arctic Ice Area (million km²)")
g.set_axis_labels("CO2 Concentration (ppm)", "Arctic Ice Area (million km²)")
g.set_titles("Year: {col_name}")
plt.show()

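# Animated Line Plot for Trends Over Time

import plotly.graph_objects as go

# Use one averaged point per year so each line is a clean time series
yearly = df.groupby("Year").mean(numeric_only=True).reset_index()
series = {"CO2 Concentration": "CO2 Concentration (ppm)",
          "Temperature": "Global Average Temperature (°C)",
          "Sea Level Rise": "Sea Level Rise (mm)"}

fig = go.Figure()
for name, col in series.items():
    fig.add_trace(go.Scatter(x=yearly["Year"], y=yearly[col], mode="lines", name=name))

# The Play button needs frames to step through; each frame draws the series up to one more year
frames = []
for i in range(1, len(yearly) + 1):
    part = yearly.iloc[:i]
    frames.append(go.Frame(data=[go.Scatter(x=part["Year"], y=part[col]) for col in series.values()]))
fig.frames = frames

fig.update_layout(title="Climate Indicators Over Time (Animated)", xaxis_title="Year",
                  yaxis_title="Value", updatemenus=[dict(type="buttons", showactive=True,
                                                         buttons=[dict(label="Play",
                                                                       method="animate",
                                                                       args=[None, dict(frame=dict(duration=500, redraw=True))])])])

fig.show()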
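
The line plots in this notebook put CO₂ (ppm), temperature (°C), and sea level (mm) on one shared y-axis, where the variable with the largest numbers dominates. One common workaround is to standardize each column to z-scores before plotting. The sketch below shows the transformation on a few invented yearly values; the frame `demo` and its numbers are assumptions for illustration only.

```python
import pandas as pd

# Invented yearly values, only to show the differing scales.
demo = pd.DataFrame({
    "Year": [2000, 2001, 2002, 2003],
    "CO2 Concentration (ppm)": [370.0, 372.0, 374.0, 376.0],
    "Global Average Temperature (°C)": [14.4, 14.5, 14.5, 14.6],
    "Sea Level Rise (mm)": [150.0, 153.0, 157.0, 160.0],
})

value_cols = [c for c in demo.columns if c != "Year"]

# z-score: subtract each column's mean and divide by its standard deviation
standardized = (demo[value_cols] - demo[value_cols].mean()) / demo[value_cols].std()
standardized.insert(0, "Year", demo["Year"])
print(standardized.round(2))
```

After standardizing, every series has mean 0 and standard deviation 1, so their shapes can be compared on a single axis labelled in standard deviations.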

## 5. Conclusions and Insights

Summarize your findings and discuss their implications.


A strong positive correlation between rising CO₂ levels and global temperature suggests that increased greenhouse gas emissions are a major driver of climate change. Over time, CO₂ levels have risen from ~280 ppm to over 400 ppm, while global temperatures have increased by nearly 1.5°C.

A clear upward trend in sea level rise is closely tied to temperature increases. As global temperatures rise, glaciers and polar ice caps melt, contributing to rising ocean levels. Our data shows that sea levels have risen by over 300 mm in the last century, with an increasing rate in recent decades.

A strong negative correlation between Arctic Ice Area and global temperature confirms that warming temperatures are reducing polar ice coverage. Ice area has declined from over 12 million km² in the early records to nearly 5 million km² in some recent years, showing a drastic impact.

The 3D scatter plots and heatmaps suggest that CO₂ levels, temperature, and sea level rise are interconnected in a nonlinear way, with faster rates of change observed in the last few decades. The facet analysis by year shows that CO₂ accumulation over time has an increasingly severe impact on Arctic ice loss.
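
A Pearson correlation heatmap only measures linear association, so it cannot show nonlinearity on its own. One simple check is to compare Pearson's r with Spearman's rank correlation: a monotonic but curved relationship gives a Spearman value near 1 and a noticeably lower Pearson value. The sketch below demonstrates this on an invented exponential curve, computing Spearman as Pearson on ranks so that only pandas and NumPy are needed.

```python
import numpy as np
import pandas as pd

# Invented monotonic but nonlinear relationship (exponential growth).
x = pd.Series(np.linspace(0, 5, 50))
y = np.exp(x)

pearson = x.corr(y)                  # measures linear association
spearman = x.rank().corr(y.rank())   # Pearson on ranks is Spearman's rho
print(f"Pearson r = {pearson:.3f}, Spearman rho = {spearman:.3f}")
```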

Time-series analysis indicates that climate change is not happening at a steady rate but is accelerating. The recent decades (1990s onward) show a much steeper rise in temperature, CO₂ concentration, and sea level, confirming that the impact of human activities is becoming more pronounced.
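
The acceleration claim can be quantified by fitting a straight line to each period and comparing the slopes. The sketch below does this with `np.polyfit` on an invented temperature series whose rate changes in 1990; `trend_slope` is a hypothetical helper, and the 1990 split point mirrors the paragraph above rather than anything derived from the data.

```python
import numpy as np

def trend_slope(years, values):
    """Least-squares slope of values against years (units per year)."""
    slope, _intercept = np.polyfit(years, values, deg=1)
    return slope

# Invented series: 0.005 °C per year until 1990, then 0.02 °C per year.
years = np.arange(1900, 2024)
temps = np.where(
    years < 1990,
    13.8 + 0.005 * (years - 1900),
    13.8 + 0.005 * 90 + 0.02 * (years - 1990),
)

early = years < 1990
slope_early = trend_slope(years[early], temps[early])
slope_late = trend_slope(years[~early], temps[~early])
print(f"before 1990: {slope_early:.4f} per year, from 1990: {slope_late:.4f} per year")
```

Running the same comparison on the yearly means of each indicator would show whether the later slope is in fact steeper.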


### Potential Implications of the Observed Trends
The observed trends have far-reaching consequences for the environment, economy, and human society:

- Rising temperatures and CO₂ levels will lead to more extreme weather events, including heatwaves, hurricanes, and droughts, affecting ecosystems and agriculture.
- Sea level rise threatens coastal cities and island nations, increasing the risk of flooding, habitat loss, and displacement of populations.
- Arctic ice loss disrupts global climate patterns, reducing the Earth’s ability to reflect sunlight (albedo effect), further accelerating global warming.
- Increasing CO₂ levels indicate a continued reliance on fossil fuels, emphasizing the urgent need for transition to renewable energy sources.
- Climate change mitigation and adaptation strategies must be implemented at a faster rate to prevent irreversible environmental damage.

### Areas for Further Investigation
Several areas need further study to build a fuller picture of climate change and its broader impact:

- Nonlinear climate effects: identifying threshold points where temperature or CO₂ levels could lead to irreversible environmental damage.
- Regional climate analysis: climate change does not affect all regions equally, so a regional breakdown could show how different parts of the world are affected.
- Predictive models: forecasting long-term trends with machine learning or time-series techniques would help policymakers plan ahead.
- Policy interventions: studying the effect of measures such as global climate agreements on emission levels could show how effective current strategies are.
- Economic and social consequences: effects on agriculture, public health, and economic stability would give a more holistic view of the crisis and inform future action.
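
As a small illustration of the predictive-model idea, the sketch below fits a linear trend to the years before 2000 and checks how well it extrapolates to the held-out years. The CO₂ series is invented (a linear trend plus random noise), and a straight line is only a baseline assumption here, not a recommended forecasting model.

```python
import numpy as np

# Invented yearly series with a linear trend plus noise.
rng = np.random.default_rng(1)
years = np.arange(1900, 2024)
co2 = 280 + 1.1 * (years - 1900) + rng.normal(0, 3, years.size)

# Hold out the last 24 years to check how well the fit extrapolates.
train = years < 2000
coeffs = np.polyfit(years[train], co2[train], deg=1)
predicted = np.polyval(coeffs, years[~train])
mae = np.mean(np.abs(predicted - co2[~train]))
print(f"mean absolute error on 2000-2023: {mae:.2f} ppm")
```

Holding out the most recent years, rather than a random sample, matches how a forecast would actually be used.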