# Climate Data Exploratory Data Analysis

## Introduction
This notebook contains an exploratory data analysis of climate data from 1900 to 2023. The dataset includes global temperatures, CO2 concentration, sea level rise, and Arctic ice area.

Your task is to perform a comprehensive EDA following the requirements in the README.md file.

In [44]:
# Import required libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns



# Set plot styling
plt.style.use('seaborn-v0_8-whitegrid')
sns.set_palette('viridis')
%matplotlib inline

## 1. Data Preparation

Load the climate data and perform necessary cleaning and aggregation.

In [45]:
# Load the dataset
df = pd.read_csv('data/Climate_Change_Indicators.csv')  # path relative to the notebook's working directory

# Display the first few rows of the dataset
df.head()

|   | Year | Global Average Temperature (°C) | CO2 Concentration (ppm) | Sea Level Rise (mm) | Arctic Ice Area (million km²) |
|---|------|------|------|------|------|
| 0 | 1948 | 13.17 | 397.04 | 116.25 | 5.97 |
| 1 | 1996 | 13.10 | 313.17 | 277.92 | 9.66 |
| 2 | 2015 | 14.67 | 311.95 | 290.32 | 8.40 |
| 3 | 1966 | 14.79 | 304.25 | 189.71 | 11.83 |
| 4 | 1992 | 13.15 | 354.52 | 14.84 | 11.23 |


In [46]:
# Check for missing values and basic information about the dataset
print("Dataset Information:")
df.info()  # prints its summary itself and returns None, so it is not wrapped in print()
print("\nMissing Values:")
print(df.isnull().sum())

Dataset Information:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1048576 entries, 0 to 1048575
Data columns (total 5 columns):
 #   Column                           Non-Null Count    Dtype  
---  ------                           --------------    -----  
 0   Year                             1048576 non-null  int64  
 1   Global Average Temperature (°C)  1048576 non-null  float64
 2   CO2 Concentration (ppm)          1048576 non-null  float64
 3   Sea Level Rise (mm)              1048576 non-null  float64
 4   Arctic Ice Area (million km²)    1048576 non-null  float64
dtypes: float64(4), int64(1)
memory usage: 40.0 MB

Missing Values:
Year                               0
Global Average Temperature (°C)    0
CO2 Concentration (ppm)            0
Sea Level Rise (mm)                0
Arctic Ice Area (million km²)      0
dtype: int64
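The summary reports 1,048,576 rows for data spanning 1900 to 2023, so each year must appear many times, and `head()` shows the rows are not in year order. Before aggregating, it can help to confirm how many rows each year has and whether any year is missing. A minimal sketch, assuming only the `Year` column shown above; `summarize_year_coverage` is a hypothetical helper name, and the toy frame is illustrative, not the real dataset.

```python
import pandas as pd


def summarize_year_coverage(frame: pd.DataFrame, year_col: str = "Year") -> dict:
    """Report the year range, rows per year, and any gaps (hypothetical helper)."""
    counts = frame[year_col].value_counts().sort_index()
    first, last = int(counts.index.min()), int(counts.index.max())
    # Years that fall inside the range but have no rows at all
    missing_years = sorted(set(range(first, last + 1)) - set(counts.index))
    return {
        "first_year": first,
        "last_year": last,
        "n_years": int(counts.size),
        "min_rows_per_year": int(counts.min()),
        "max_rows_per_year": int(counts.max()),
        "missing_years": missing_years,
    }


# Usage in the notebook: summarize_year_coverage(df)
# Toy frame for illustration only
toy = pd.DataFrame({"Year": [1900, 1900, 1901, 1903], "Value": [1.0, 2.0, 3.0, 4.0]})
coverage = summarize_year_coverage(toy)
print(coverage)
```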


In [47]:
# Aggregate by year (mean of all readings for each year) to create a 124-year time series (1900-2023)
df_aggregated = df.groupby('Year').mean().reset_index()

# Display the first few rows of the aggregated dataset
print(df_aggregated.head(10))

   Year  Global Average Temperature (°C)  CO2 Concentration (ppm)  \
0  1900                        14.506663               350.373405   
1  1901                        14.485343               349.757140   
2  1902                        14.476262               349.299686   
3  1903                        14.492360               349.644375   
4  1904                        14.494241               349.537032   
5  1905                        14.486222               349.768517   
6  1906                        14.501610               350.269288   
7  1907                        14.507352               349.707452   
8  1908                        14.489932               349.908538   
9  1909                        14.524320               349.477657   

   Sea Level Rise (mm)  Arctic Ice Area (million km²)  
0           150.408288                       8.978659  
1           150.548828                       8.947272  
2           152.174821                       9.035554  
3           150.
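Averaging each year's readings into a single value hides how much those readings vary. A sketch, assuming the raw column layout shown above, that keeps each year's standard deviation and row count next to the mean, and checks that the years form a complete run from 1900 to 2023 (124 years). `aggregate_by_year` and `is_complete_series` are hypothetical helper names; the toy frame is for illustration only.

```python
import numpy as np
import pandas as pd


def aggregate_by_year(frame: pd.DataFrame, year_col: str = "Year") -> pd.DataFrame:
    """Per-year mean, sample standard deviation and row count for every other column."""
    value_cols = [c for c in frame.columns if c != year_col]
    summary = frame.groupby(year_col)[value_cols].agg(["mean", "std", "count"])
    # Flatten the (column, statistic) MultiIndex into "column_statistic" names
    summary.columns = [f"{col}_{stat}" for col, stat in summary.columns]
    return summary.reset_index()


def is_complete_series(years: pd.Series, start: int = 1900, end: int = 2023) -> bool:
    """True if every year from start to end (inclusive) appears exactly once."""
    return np.array_equal(np.sort(years.to_numpy()), np.arange(start, end + 1))


# Usage in the notebook: aggregate_by_year(df); is_complete_series(df_aggregated["Year"])
toy = pd.DataFrame({"Year": [2000, 2000, 2001], "Temp": [14.0, 15.0, 14.5]})
agg = aggregate_by_year(toy)
print(agg)
print(is_complete_series(agg["Year"], start=2000, end=2001))
```

A year with a single reading gets a `NaN` standard deviation, which is a quick signal that its mean rests on very little data.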

## 2. Univariate Analysis

Analyze each climate variable independently.

In [None]:

# Rename columns for clarity
df_aggregated.rename(columns={
    "Global Average Temperature (°C)": "Temperature",
    "CO2 Concentration (ppm)": "CO2",
    "Sea Level Rise (mm)": "Sea_Level",
    "Arctic Ice Area (million km²)": "Ice_Area"
}, inplace=True)

# Define variables for analysis
variables = ["Temperature", "CO2", "Sea_Level", "Ice_Area"]

# Compute and print descriptive statistics
stats_df = df_aggregated[variables].describe().round(2)

for column in variables:
    print(f"\nDescriptive Statistics for {column}:")
    print(stats_df[column].to_string())
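`describe()` reports location and spread but says nothing about the shape of each distribution, which the histograms and KDE plots below explore visually. A short sketch that adds skewness and excess kurtosis using pandas' `DataFrame.skew` and `DataFrame.kurt` (bias-corrected sample estimates; kurtosis uses Fisher's definition, so a normal distribution scores 0). It assumes the renamed columns in `variables`; `shape_statistics` is a hypothetical helper name and the toy data is illustrative.

```python
import pandas as pd


def shape_statistics(frame: pd.DataFrame, columns: list) -> pd.DataFrame:
    """Skewness and excess kurtosis (normal distribution = 0) for each column."""
    return pd.DataFrame({
        "skewness": frame[columns].skew(),
        "excess_kurtosis": frame[columns].kurt(),
    }).round(3)


# Usage in the notebook: shape_statistics(df_aggregated, variables)
toy = pd.DataFrame({"A": [1.0, 2.0, 3.0, 4.0, 5.0], "B": [1.0, 1.0, 1.0, 1.0, 10.0]})
shape = shape_statistics(toy, ["A", "B"])
print(shape)
```

Positive skewness means a longer right tail; values near 0 for both statistics suggest a roughly symmetric, normal-like shape.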


In [None]:

# Plot histograms with KDE
fig, axes = plt.subplots(nrows=2, ncols=2, figsize=(12, 8))
axes = axes.flatten()  # Flatten to easily iterate over

for i, column in enumerate(variables):
    sns.histplot(df_aggregated[column], bins=20, kde=True, color="blue", ax=axes[i])
    axes[i].set_xlabel(column, fontsize=12)
    axes[i].set_ylabel("Frequency", fontsize=12)
    axes[i].set_title(f"Histogram of {column}", fontsize=14)

plt.tight_layout()
plt.show()


In [None]:

# Plot box plots for each variable
fig, axes = plt.subplots(nrows=2, ncols=2, figsize=(12, 8))
axes = axes.flatten()

for i, column in enumerate(variables):
    sns.boxplot(x=df_aggregated[column], ax=axes[i], whis=1.5)
    axes[i].set_title(f'Box Plot for {column}', fontsize=14)
    axes[i].set_xlabel(column, fontsize=12)

plt.tight_layout()
plt.show()
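With `whis=1.5`, the box plots draw any point lying more than 1.5 × IQR below the first quartile or above the third quartile as an individual flier. A sketch that lists those points numerically instead of reading them off the chart. It assumes linear-interpolation quartiles, which pandas' `quantile` uses by default; `iqr_outliers` is a hypothetical helper name and the toy series is illustrative.

```python
import pandas as pd


def iqr_outliers(series: pd.Series, whis: float = 1.5) -> pd.Series:
    """Values more than `whis` * IQR below Q1 or above Q3 (the box-plot flier rule)."""
    q1, q3 = series.quantile([0.25, 0.75])
    iqr = q3 - q1
    lower, upper = q1 - whis * iqr, q3 + whis * iqr
    return series[(series < lower) | (series > upper)]


# Usage in the notebook:
# for column in variables:
#     print(column, iqr_outliers(df_aggregated[column]).round(2).to_dict())
toy = pd.Series([10.0, 11.0, 12.0, 13.0, 14.0, 40.0])
print(iqr_outliers(toy))
```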


In [None]:

# Plot KDE plots for each variable
fig, axes = plt.subplots(nrows=2, ncols=2, figsize=(12, 8))
axes = axes.flatten()

for i, column in enumerate(variables):
    sns.kdeplot(df_aggregated[column], color='red', ax=axes[i])
    axes[i].set_title(f'KDE Plot of {column}', fontsize=14)
    axes[i].set_xlabel(column, fontsize=12)
    axes[i].set_ylabel('Density', fontsize=12)

plt.tight_layout()
plt.show()
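Seaborn's `kdeplot` smooths the data with a Gaussian kernel whose bandwidth defaults to Scott's rule (`bw_method="scott"`), which for a single variable is the sample standard deviation times n^(-1/5). A NumPy-only sketch of that estimator, to show what each KDE curve represents. It is an illustration rather than seaborn's own code and ignores options such as `bw_adjust` and `cut`; `gaussian_kde_scott` is a hypothetical name and the samples are synthetic.

```python
import numpy as np


def gaussian_kde_scott(samples, grid) -> np.ndarray:
    """Evaluate a 1-D Gaussian kernel density estimate on `grid` using Scott's rule."""
    samples = np.asarray(samples, dtype=float)
    grid = np.asarray(grid, dtype=float)
    n = samples.size
    bandwidth = samples.std(ddof=1) * n ** (-1 / 5)  # Scott's rule in one dimension
    # Standardized distance between every grid point and every sample
    z = (grid[:, None] - samples[None, :]) / bandwidth
    kernels = np.exp(-0.5 * z**2) / np.sqrt(2 * np.pi)
    # Average of the kernels, rescaled so the curve integrates to 1
    return kernels.sum(axis=1) / (n * bandwidth)


rng = np.random.default_rng(0)
samples = rng.normal(loc=14.5, scale=0.3, size=200)  # synthetic values, illustration only
grid = np.linspace(samples.min() - 2, samples.max() + 2, 1001)
density = gaussian_kde_scott(samples, grid)
area = density.sum() * (grid[1] - grid[0])  # Riemann-sum check of the total area
print(area)
```

A larger bandwidth gives a smoother curve; with only 124 yearly values per variable, the default can noticeably blur narrow features.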


In [None]:

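# Distribution plots: sns.displot is figure-level and cannot draw into an existing Axes,
# so its axes-level counterpart sns.histplot is used to fill the 2x2 grid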
fig, axes = plt.subplots(nrows=2, ncols=2, figsize=(12, 8))
axes = axes.flatten()

for i, column in enumerate(variables):
    sns.histplot(df_aggregated[column], bins=20, kde=True, color="blue", ax=axes[i])
    axes[i].set_title(f'Distribution Plot of {column}', fontsize=14)
    axes[i].set_xlabel(column, fontsize=12)
    axes[i].set_ylabel('Frequency', fontsize=12)

plt.tight_layout()
plt.show()


In [None]:

# Time Series Analysis for each variable
fig, axes = plt.subplots(nrows=2, ncols=2, figsize=(12, 8))
axes = axes.flatten()

for i, column in enumerate(variables):
    axes[i].plot(df_aggregated["Year"], df_aggregated[column], marker='o', linestyle='-', label=f"{column} Trend")
    axes[i].set_xlabel("Year", fontsize=12)
    axes[i].set_ylabel(column, fontsize=12)
    axes[i].set_title(f"Time Series (1900-2023) of {column}", fontsize=14)
    axes[i].legend()

plt.tight_layout()
plt.show()
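The time-series plots show each variable's direction of change but not its rate. A least-squares sketch that summarizes each series as a linear trend per decade using `np.polyfit` with `deg=1`. It assumes a single straight line is a reasonable summary of the whole 1900 to 2023 window, which can understate any acceleration in later decades; `trend_per_decade` is a hypothetical helper name and the toy data is illustrative.

```python
import numpy as np
import pandas as pd


def trend_per_decade(frame: pd.DataFrame, columns: list, year_col: str = "Year") -> pd.Series:
    """Ordinary least-squares slope of each column against year, in units per decade."""
    slopes = {}
    for column in columns:
        slope, _intercept = np.polyfit(frame[year_col], frame[column], deg=1)
        slopes[column] = slope * 10  # per-year slope scaled to ten years
    return pd.Series(slopes, name="trend_per_decade")


# Usage in the notebook: trend_per_decade(df_aggregated, variables)
toy = pd.DataFrame({"Year": np.arange(1900, 1910), "Temp": 14.0 + 0.02 * np.arange(10)})
trends = trend_per_decade(toy, ["Temp"])
print(trends)
```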


## 3. Bivariate Analysis

Explore relationships between pairs of climate variables.

In [None]:

# Bivariate Analysis - Scatter plots for pairs of variables
sns.pairplot(df_aggregated[variables], diag_kind='kde', markers='o')
plt.suptitle('Pair Plot of Climate Variables', y=1.02, fontsize=16)
plt.tight_layout()
plt.show()


In [None]:

# Compute correlation coefficients (excluding the 'Year' variable)
correlation_matrix = df_aggregated[variables].corr()
print("Correlation Coefficients:\n", correlation_matrix)
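Pearson correlations between series that all trend over time can be high even when the series are not directly linked. One common check, sketched here as an addition rather than part of the original analysis, is to compare the level correlation with a Spearman rank correlation and with the correlation of year-over-year changes (`diff()`). `correlation_views` is a hypothetical helper name, and the toy series are synthetic.

```python
import numpy as np
import pandas as pd


def correlation_views(frame: pd.DataFrame, columns: list) -> dict:
    """Pearson and Spearman on levels, plus Pearson on year-over-year differences."""
    levels = frame[columns]
    return {
        "pearson_levels": levels.corr(method="pearson"),
        "spearman_levels": levels.corr(method="spearman"),
        "pearson_differences": levels.diff().dropna().corr(method="pearson"),
    }


# Toy example: two series with independent noise that share only an upward trend
rng = np.random.default_rng(1)
years = np.arange(100)
toy = pd.DataFrame({
    "A": 0.05 * years + rng.normal(0, 0.3, 100),
    "B": 0.05 * years + rng.normal(0, 0.3, 100),
})
views = correlation_views(toy, ["A", "B"])
for name, matrix in views.items():
    print(name, round(matrix.loc["A", "B"], 2))
```

In the toy case, the level correlations are high while the differenced correlation stays near zero, because the shared trend is the only link. In the notebook, `correlation_views(df_aggregated, variables)` would show how much of each coefficient survives once the common trend is removed.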


In [None]:


# Correlation Heatmap
plt.figure(figsize=(8, 6))
sns.heatmap(correlation_matrix, annot=True, cmap='coolwarm', fmt='.2f', linewidths=0.5)
plt.title('Correlation Heatmap', fontsize=16)
plt.tight_layout()  # Ensures everything fits well in the figure
plt.show()


In [None]:

# Line plots to analyze trends over time
fig, axes = plt.subplots(2, 2, figsize=(12, 10))
variables = ['Temperature', 'CO2', 'Sea_Level', 'Ice_Area']
colors = ['r', 'g', 'b', 'purple']

for ax, var, color in zip(axes.flatten(), variables, colors):
    sns.lineplot(data=df_aggregated, x='Year', y=var, ax=ax, color=color)
    ax.set_title(f'Trend of {var} Over Time', fontsize=14)
    ax.set_xlabel('Year', fontsize=12)
    ax.set_ylabel(var, fontsize=12)

plt.tight_layout()
plt.show()
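Year-to-year fluctuations can obscure the longer-term trend in these line plots. A sketch of a centered rolling mean that could be plotted over each raw line. The window length is an illustrative assumption (10 years by default), and `min_periods=1` keeps the first and last few years from becoming `NaN` by averaging partial windows there. `rolling_means` is a hypothetical helper name.

```python
import pandas as pd


def rolling_means(frame: pd.DataFrame, columns: list, window: int = 10) -> pd.DataFrame:
    """Centered rolling mean of each column; partial windows are used at the edges."""
    smoothed = frame[columns].rolling(window=window, center=True, min_periods=1).mean()
    return smoothed.add_suffix(f"_{window}yr_mean")


# Usage in the notebook: smoothed = rolling_means(df_aggregated, variables)
toy = pd.DataFrame({"Temp": [1.0, 2.0, 3.0, 4.0, 5.0]})
smoothed = rolling_means(toy, ["Temp"], window=3)
print(smoothed)
```

Edge values average fewer points than interior ones, so they are noisier and should be read with that in mind.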


## 4. Multivariate Analysis

Investigate relationships among three or more variables.

In [None]:


# Create a 3D figure and scatter plot
fig = plt.figure(figsize=(10, 7))
ax = fig.add_subplot(111, projection="3d")

# Scatter plot
sc = ax.scatter(df_aggregated['Year'], df_aggregated['CO2'], df_aggregated['Temperature'],
                c=df_aggregated['Sea_Level'], cmap='coolwarm', s=50, alpha=0.7)

# Set labels (with the units dropped by the column renaming) and title
ax.set_xlabel('Year', fontsize=12)
ax.set_ylabel('CO2 (ppm)', fontsize=12)
ax.set_zlabel('Temperature (°C)', fontsize=12)
ax.set_title('3D Scatter Plot of Temperature vs CO2 vs Year (colored by Sea Level)', fontsize=14)

# Color bar attached to the 3D axes, padded so it does not overlap the z-axis label
cbar = fig.colorbar(sc, ax=ax, pad=0.1)
cbar.set_label('Sea Level Rise (mm)', fontsize=12)

plt.show()
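The 3D view places Year, CO2 and Temperature on separate axes, but both CO2 and Temperature also trend with Year, so their apparent link partly reflects shared time trends. A sketch of a partial correlation that removes a linear Year trend from each variable (by regressing each on Year with `np.linalg.lstsq`) and then correlates the residuals. This is an added technique rather than something the notebook computes, and it removes only a linear time trend; `partial_correlation` is a hypothetical helper name and the toy data is synthetic.

```python
import numpy as np
import pandas as pd


def partial_correlation(frame: pd.DataFrame, x: str, y: str, control: str) -> float:
    """Pearson correlation of x and y after regressing each on `control` (with intercept)."""
    design = np.column_stack([np.ones(len(frame)), frame[control].to_numpy(dtype=float)])

    def residuals(column: str) -> np.ndarray:
        target = frame[column].to_numpy(dtype=float)
        coef, *_ = np.linalg.lstsq(design, target, rcond=None)
        return target - design @ coef  # part of the column not explained by `control`

    return float(np.corrcoef(residuals(x), residuals(y))[0, 1])


# Usage in the notebook: partial_correlation(df_aggregated, "Temperature", "CO2", control="Year")
rng = np.random.default_rng(2)
years = np.arange(1900, 2024)
toy = pd.DataFrame({
    "Year": years,
    "A": 0.01 * (years - 1900) + rng.normal(0, 0.1, years.size),
    "B": 0.01 * (years - 1900) + rng.normal(0, 0.1, years.size),
})
print(round(toy["A"].corr(toy["B"]), 2), round(partial_correlation(toy, "A", "B", "Year"), 2))
```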


In [None]:



# Reshape data for Seaborn
df_melted = df_aggregated.melt(id_vars='Year', value_vars=['Temperature', 'CO2', 'Sea_Level', 'Ice_Area'])

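# Create one line plot per variable using FacetGrid; sharey=False because the variables
# are on very different scales (e.g., CO2 in ppm vs Arctic ice area in million km²)
g = sns.FacetGrid(df_melted, col='variable', col_wrap=2, height=4, sharex=True, sharey=False)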
g.map(sns.lineplot, 'Year', 'value')

# Set the title and show the plot
g.set_titles("{col_name}", fontsize=14)
g.set_axis_labels("Year", "Value", fontsize=12)

plt.tight_layout()
plt.show()
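Because the four variables use different units, the facet grid gives each its own panel. When the shapes of the curves matter more than their magnitudes, another option is to convert each series to z-scores so all four share one axis. A minimal sketch under that assumption; `standardize` is a hypothetical helper name and the toy values are illustrative.

```python
import pandas as pd


def standardize(frame: pd.DataFrame, columns: list) -> pd.DataFrame:
    """Convert each column to z-scores: (value - mean) / sample standard deviation."""
    subset = frame[columns]
    return (subset - subset.mean()) / subset.std(ddof=1)


# Usage in the notebook (all four variables on one axis):
# z = standardize(df_aggregated, variables)
# z.assign(Year=df_aggregated["Year"]).plot(x="Year", figsize=(12, 5))
toy = pd.DataFrame({"CO2": [300.0, 350.0, 400.0], "Ice_Area": [12.0, 10.0, 8.0]})
z = standardize(toy, ["CO2", "Ice_Area"])
print(z)
```

On a shared z-score axis, a declining series such as ice area appears as a mirror image of the rising ones, which makes opposite trends easy to spot.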


## 5. Conclusions and Insights

Summarize your findings and discuss their implications.

The analysis of the aggregated 1900 to 2023 series yields the following observations about long-term climate trends.

### Key Findings

1. **Global Average Temperature Increase:** Global average temperature rises markedly over the study period, consistent with global warming. The increase is most noticeable after the mid-20th century, coinciding with the rapid post-1950 growth in fossil fuel use and greenhouse gas emissions.

2. **Rising CO2 Concentration:** CO2 concentration rises consistently and is strongly correlated with global temperature. This matters because CO2 is one of the primary greenhouse gases driving climate change. The sharper rise from around the 1950s onward coincides with accelerating industrialization and fossil fuel consumption.

3. **Sea Level Rise:** Sea level rises steadily over the period. Physically, this rise is driven by meltwater from glaciers and the Greenland and Antarctic ice sheets and by the thermal expansion of a warming ocean. Rising seas threaten coastal ecosystems, infrastructure, and the populations living in these areas.

4. **Declining Arctic Ice Area:** Arctic ice area decreases over time, a clear sign of warming. Because Arctic sea ice already floats, its melting adds little to sea level directly; its loss matters mainly because it disrupts Arctic ecosystems and biodiversity and because open water absorbs more sunlight than ice, which amplifies regional warming.

5. **Correlations Between Variables:** Temperature and CO2 are strongly positively correlated, as expected given that CO2 is a major driver of warming. Temperature and sea level are also correlated, in line with warming-driven ice melt and thermal expansion. More generally, temperature, CO2, and sea level tend to rise together while Arctic ice area falls. Because all four series trend over time, these correlations describe co-movement and do not by themselves establish causation.
### Implications of the Findings

1. **Impact on Policy and Decision-Making:** The increase in global temperature and CO2 concentration underscores the urgent need for climate action. Policymakers should prioritize reducing carbon emissions and transitioning to renewable energy sources to mitigate climate change. Rising sea levels and shrinking ice area also call for stronger adaptation strategies, particularly for coastal and polar regions, including infrastructure that can withstand flooding and planning for potential population displacement.

2. **Environmental and Ecological Impact:** The decline in Arctic ice, together with rising sea levels, threatens biodiversity, particularly in the Arctic, so vulnerable species and ecosystems need protection. Changes in temperature and sea level also affect agriculture, water resources, and biodiversity worldwide, and understanding these trends supports better preparation for future climate-related challenges.

3. **Future Climate Scenarios:** If current trends continue, the world is likely to experience more extreme weather events, such as heatwaves, storms, and floods. Addressing these trends requires both reducing greenhouse gas emissions and building resilience through infrastructure, better land-use practices, and disaster preparedness.

4. **The Role of Climate Science and Research:** Continued research is essential to refine our understanding of the complex interactions between temperature, CO2, sea level, and ice area. Better understanding enables more accurate predictions to guide both mitigation and adaptation efforts in the coming decades.
### Final Thoughts

These results are consistent with the scientific consensus that human activity, particularly the burning of fossil fuels, is driving climate change, although an exploratory correlation analysis like this one cannot on its own attribute the changes to a cause. Immediate action is needed to address these issues. Alongside global initiatives, local efforts and innovative solutions will play a critical role in reducing the impact of climate change and building a sustainable future for generations to come.