# Climate Data Exploratory Data Analysis

## Introduction
This notebook contains an exploratory data analysis of climate data from 1900 to 2023. The dataset includes global temperatures, CO2 concentration, sea level rise, and Arctic ice area.

Your task is to perform a comprehensive EDA following the requirements in the README.md file.

In [None]:
# Import required libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import matplotlib.animation as animation

# Set plot styling
#plt.style.use('seaborn-whitegrid')
sns.set_style('whitegrid')
sns.set_palette('viridis')
%matplotlib inline

## 1. Data Preparation

Load the climate data and perform necessary cleaning and aggregation.

In [None]:
# Load the dataset using a path relative to this notebook
df = pd.read_csv("data/Climate_Change_Indicators.csv")

# Display the first few rows of the dataset
df.head()

In [None]:
# Check for missing values and basic information about the dataset
print("Dataset Information:")
df.info()  # info() prints its report directly and returns None, so it is not wrapped in print()
print("\nMissing Values:")
print(df.isnull().sum())

In [None]:
# Aggregate by year (mean of each indicator) to create a 124-year time series
indicators = [
    "Global Average Temperature (°C)",
    "CO2 Concentration (ppm)",
    "Sea Level Rise (mm)",
    "Arctic Ice Area (million km²)",
]
yearly_data = df.groupby('Year')[indicators].mean().reset_index()
yearly_data.head()
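
The aggregation is expected to yield one row per year from 1900 to 2023. A minimal sketch of a sanity check follows; `check_year_coverage` is a hypothetical helper, not part of the assignment, and the toy frame only stands in for `yearly_data`.

```python
import pandas as pd

def check_year_coverage(frame, year_col="Year", start=1900, end=2023):
    """Return the years in [start, end] that are absent from frame[year_col]."""
    expected = set(range(start, end + 1))
    present = set(frame[year_col].astype(int))
    return sorted(expected - present)

# Toy frame standing in for yearly_data (values are illustrative only).
toy = pd.DataFrame({"Year": [1900, 1901, 1903], "value": [1.0, 2.0, 3.0]})
missing_years = check_year_coverage(toy, start=1900, end=1903)
print(missing_years)  # [1902]
```

In the notebook, `check_year_coverage(yearly_data)` returning an empty list would indicate that no year in the range is missing.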



## 2. Univariate Analysis

Analyze each climate variable independently.

In [None]:
# Descriptive statistics and a line plot for each climate variable
print("Descriptive Statistics:")
print(yearly_data.describe())

print("\nVisualizations:")
variables = yearly_data.columns.drop('Year')  # the four climate indicators
fig, axes = plt.subplots(2, 2, figsize=(15, 10))  # 2x2 grid, one panel per variable
for ax, col in zip(axes.flat, variables):
    sns.lineplot(x='Year', y=col, data=yearly_data, ax=ax)
    ax.set_title(col)
plt.tight_layout()  # Adjust the subplots to fit into the figure area
plt.show()
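
Line plots show the direction of each series but not its rate of change. One way to quantify it, sketched here under the assumption that a straight-line fit is an adequate first summary, is an ordinary least-squares slope expressed per decade. `trend_per_decade` and the toy data are illustrative and not part of the assignment.

```python
import numpy as np
import pandas as pd

def trend_per_decade(frame, column, year_col="Year"):
    """Least-squares slope of `column` against year, scaled to units per decade."""
    clean = frame[[year_col, column]].dropna()
    slope, _intercept = np.polyfit(clean[year_col], clean[column], deg=1)
    return slope * 10

# Toy series that rises by exactly 0.5 units per year (illustrative only).
toy = pd.DataFrame({
    "Year": np.arange(2000, 2010),
    "signal": 0.5 * np.arange(10) + 3.0,
})
slope = trend_per_decade(toy, "signal")
print(round(slope, 6))  # 5.0
```

Applied to the notebook's data, a call such as `trend_per_decade(yearly_data, 'CO2 Concentration (ppm)')` would give the average change in ppm per decade.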


## 3. Bivariate Analysis

Explore relationships between pairs of climate variables.

In [None]:
# Pairwise scatter plots with KDE on the diagonal, a correlation heatmap, and regression plots
sns.pairplot(yearly_data, diag_kind='kde')
plt.show()


plt.figure(figsize=(8, 6))
sns.heatmap(yearly_data.corr(), annot=True, cmap='coolwarm', fmt='.2f')
plt.title("Correlation Heatmap")
plt.show()

# Two pairs are plotted, so a 1x2 grid avoids leaving empty panels
fig, axes = plt.subplots(1, 2, figsize=(14, 6))
pairs = [('Global Average Temperature (°C)', 'CO2 Concentration (ppm)'),
         ('Sea Level Rise (mm)', 'Arctic Ice Area (million km²)')]

# One scatter plot with a fitted regression line per pair
for ax, (x, y) in zip(axes, pairs):
    sns.regplot(x=x, y=y, data=yearly_data, ax=ax)
    ax.set_title(f'{x} vs {y}')
plt.tight_layout()
plt.show()
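
The heatmap reports Pearson correlation, which measures linear association only. As a hedged cross-check, Spearman rank correlation captures any monotonic relationship; it can be computed as the Pearson correlation of the ranks. `pearson_and_spearman` is a hypothetical helper and the toy data is illustrative.

```python
import pandas as pd

def pearson_and_spearman(frame, x, y):
    """Return (Pearson r, Spearman rho) for two columns of a DataFrame."""
    pearson = frame[x].corr(frame[y])
    # Spearman's rho is Pearson's r computed on the ranks of the values.
    spearman = frame[x].rank().corr(frame[y].rank())
    return pearson, spearman

# Toy data: y = x**3 is perfectly monotonic but not linear (illustrative only).
toy = pd.DataFrame({"x": [1, 2, 3, 4, 5], "y": [1, 8, 27, 64, 125]})
pearson, spearman = pearson_and_spearman(toy, "x", "y")
print(round(pearson, 3), round(spearman, 3))
```

A large gap between the two values suggests a relationship that is monotonic but curved, which a straight regression line would describe poorly.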


## 4. Multivariate Analysis

Investigate relationships among three or more variables.

In [None]:
# Multivariate analysis: 3D scatter plot, corner pairplot, and an animated line plot
# (matplotlib.animation is already imported in the first cell)


### 1️⃣ 3D Scatter Plot ###
fig = plt.figure(figsize=(10, 7))
ax = fig.add_subplot(111, projection='3d')

sc = ax.scatter(yearly_data['CO2 Concentration (ppm)'], 
                yearly_data['Sea Level Rise (mm)'], 
                yearly_data['Global Average Temperature (°C)'], 
                c=yearly_data['Year'], cmap='viridis')

ax.set_xlabel('CO2 Concentration (ppm)')
ax.set_ylabel('Sea Level Rise (mm)')
ax.set_zlabel('Global Avg Temp (°C)')
ax.set_title("3D Climate Data Analysis")

# Add a colorbar attached explicitly to the 3D axes
cbar = fig.colorbar(sc, ax=ax)
cbar.set_label('Year')

plt.show()

### 2️⃣ Pairplot to Explore Multivariate Relationships ###
sns.pairplot(yearly_data, diag_kind='kde', corner=True)
plt.suptitle("Pairplot of Climate Indicators", y=1.02)
plt.show()

### 3️⃣ Animated Line Plot ###
from IPython.display import HTML

fig, ax = plt.subplots(figsize=(10, 6))

def update(frame):
    # Redraw all series up to and including the current frame's year.
    # The series have different units, so the shared y-axis is only a rough guide.
    ax.clear()
    upto = yearly_data.iloc[:frame + 1]
    ax.plot(upto['Year'], upto['CO2 Concentration (ppm)'], label='CO2 (ppm)', color='red')
    ax.plot(upto['Year'], upto['Global Average Temperature (°C)'], label='Temp (°C)', color='blue')
    ax.plot(upto['Year'], upto['Sea Level Rise (mm)'], label='Sea Level (mm)', color='green')

    ax.set_xlabel('Year')
    ax.set_ylabel('Value')
    ax.set_title('Climate Indicators Over Time')
    ax.legend()
    ax.set_xlim(yearly_data['Year'].min(), yearly_data['Year'].max())

ani = animation.FuncAnimation(fig, update, frames=len(yearly_data), interval=100)

# With the inline backend, plt.show() renders only a static image,
# so embed the animation as an HTML/JavaScript player instead.
plt.close(fig)
HTML(ani.to_jshtml())
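
The indicators in the animated plot have different units and magnitudes, so on a shared axis the largest series dominates. A minimal sketch of one remedy, assuming z-score standardization is acceptable for visual comparison, is shown here. `standardize` is a hypothetical helper and the toy frame is illustrative.

```python
import pandas as pd

def standardize(frame, columns):
    """Return a copy with each listed column rescaled to mean 0 and unit standard deviation."""
    out = frame.copy()
    for col in columns:
        std = out[col].std(ddof=0)
        # A constant column has zero spread; map it to 0.0 instead of dividing by zero.
        out[col] = (out[col] - out[col].mean()) / std if std > 0 else 0.0
    return out

# Toy frame: two columns with the same shape but very different scales.
toy = pd.DataFrame({
    "Year": [2000, 2001, 2002],
    "a": [1.0, 2.0, 3.0],
    "b": [100.0, 200.0, 300.0],
})
z = standardize(toy, ["a", "b"])
print(z)
```

After standardization the two toy columns coincide, so plotted together they would overlap rather than one flattening the other. The `Year` column is left untouched because it is not listed in `columns`.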


## 5. Conclusions and Insights

Summarize your findings and discuss their implications.


In [None]:
# Summary of findings from the analysis above
conclusions = '''
1. Global temperatures, CO2 concentration, and sea levels have shown an increasing trend over time, while Arctic ice area has been decreasing.
2. Strong correlations exist between CO2 levels and temperature rise, as well as between sea level rise and Arctic ice area reduction.
3. Future studies could explore causal relationships and predictive modeling for climate indicators.
'''
print(conclusions)
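
Series that all trend over time can correlate strongly even when their year-to-year movements are unrelated. As a tentative check on the second conclusion, one option is to correlate year-over-year changes instead of levels. This is a sketch under that assumption, not a causal test; `differenced_corr` is a hypothetical helper and the toy data is illustrative.

```python
import pandas as pd

def differenced_corr(frame, x, y):
    """Pearson correlation of the year-over-year changes in columns x and y."""
    changes = frame[[x, y]].diff().dropna()
    return changes[x].corr(changes[y])

# Toy data: both series rise overall, but their yearly steps move in opposite directions.
toy = pd.DataFrame({
    "x": [0, 2, 3, 5, 6, 8],
    "y": [0, 1, 3, 4, 6, 7],
})
level_corr = toy["x"].corr(toy["y"])
diff_corr = differenced_corr(toy, "x", "y")
print(round(level_corr, 3), round(diff_corr, 3))
```

In the toy example the levels correlate strongly and positively while the changes are perfectly negatively correlated, which illustrates why a shared trend alone is weak evidence of a direct relationship.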
