# Climate Data Exploratory Data Analysis

## Introduction
This notebook contains an exploratory data analysis of climate data from 1900 to 2023. The dataset includes global temperatures, CO2 concentration, sea level rise, and Arctic ice area.

Your task is to perform a comprehensive EDA following the requirements in the README.md file.

In [None]:
# Import required libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

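# Set plot styling. Matplotlib 3.6 renamed the 'seaborn-whitegrid' style to
# 'seaborn-v0_8-whitegrid' and later releases dropped the old name, so use seaborn's own setter
sns.set_style('whitegrid')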
sns.set_palette('viridis')
%matplotlib inline

## 1. Data Preparation

Load the climate data and perform necessary cleaning and aggregation.

In [None]:
# Load the dataset
# Path is relative to the notebook's directory
df = pd.read_csv('data/Climate_Change_Indicators.csv')

# Display the first few rows of the dataset
df.head()

In [None]:
# Show basic information about the dataset and count missing values per column
print("Dataset Information:")
df.info()  # info() prints its own summary and returns None, so it is not wrapped in print()
print("\nMissing Values:")
print(df.isnull().sum())
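Raw missing counts are easier to judge next to the share of rows they represent. The sketch below tabulates both; `missing_summary` is a hypothetical helper name, and the toy frame uses made-up values only to show the output shape.

```python
import numpy as np
import pandas as pd


def missing_summary(frame: pd.DataFrame) -> pd.DataFrame:
    """Per-column missing counts and percentages, most-affected columns first."""
    counts = frame.isna().sum()
    summary = pd.DataFrame({
        "missing": counts,
        "percent": (counts / len(frame) * 100).round(2),
    })
    return summary.sort_values("missing", ascending=False)


# Hypothetical toy values, not taken from the dataset
toy = pd.DataFrame({
    "Year": [1900, 1901, 1902, 1903],
    "Sea Level Rise (mm)": [1.0, np.nan, 3.0, np.nan],
    "Arctic Ice Area (million km²)": [10.0, 9.9, np.nan, 9.7],
})
summary = missing_summary(toy)
print(summary)
```

On the real data, `missing_summary(df)` gives the same table for every column. Whether to drop, interpolate, or keep the gaps depends on where they fall in the record and remains a judgment call.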

In [None]:
# TODO: Aggregate data by year to create a 124-year time series
# Your code here
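A minimal sketch of the yearly aggregation, assuming the raw file holds several records per year (for example one per country or month) and a numeric `Year` column; neither is confirmed by this notebook. Taking the yearly mean is also an assumption: it suits averaged quantities such as temperature and CO2 concentration, but another rule may fit other variables better. `aggregate_by_year` is a hypothetical helper.

```python
import pandas as pd


def aggregate_by_year(raw: pd.DataFrame, year_col: str = "Year") -> pd.DataFrame:
    """Collapse multiple records per year into one row of yearly means.

    Non-numeric columns (e.g. country names) are dropped because a mean
    is undefined for them.
    """
    numeric = raw.select_dtypes(include="number")
    if year_col not in numeric.columns:
        raise KeyError(f"{year_col!r} must be a numeric column")
    yearly = numeric.groupby(year_col).mean().sort_index()
    return yearly.reset_index()


# Hypothetical toy values, not taken from the dataset
toy = pd.DataFrame({
    "Year": [1900, 1900, 1901, 1901],
    "Country": ["A", "B", "A", "B"],
    "CO2 Concentration (ppm)": [295.0, 297.0, 296.0, 298.0],
})
yearly_toy = aggregate_by_year(toy)
print(yearly_toy)
```

Applied to the notebook's data, `yearly_df = aggregate_by_year(df)` should have one row per year, so checking `len(yearly_df) == 2023 - 1900 + 1` (that is, 124) confirms no year is missing.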

## 2. Univariate Analysis

Analyze each climate variable independently.

In [None]:
# TODO: Perform univariate analysis for each climate variable
# Include descriptive statistics and appropriate visualizations
# Your code here
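One way to collect the descriptive statistics for every variable into a single table is sketched below. `describe()` supplies count, mean, standard deviation, min, quartiles, and max; the added `skew` and `range` columns are illustrative extras, `univariate_summary` is a hypothetical name, and the toy values are made up.

```python
import pandas as pd


def univariate_summary(frame: pd.DataFrame, columns: list[str]) -> pd.DataFrame:
    """One row per variable: describe() statistics plus skewness and range."""
    subset = frame[columns]
    summary = subset.describe().T          # count, mean, std, min, 25%, 50%, 75%, max
    summary["skew"] = subset.skew()        # pandas' bias-corrected sample skewness
    summary["range"] = summary["max"] - summary["min"]
    return summary


# Hypothetical toy values, not taken from the dataset
toy = pd.DataFrame({
    "Global Average Temperature (°C)": [13.9, 14.0, 14.1, 14.4],
    "CO2 Concentration (ppm)": [296.0, 310.0, 340.0, 420.0],
})
stats = univariate_summary(toy, list(toy.columns))
print(stats)
```

For the visual side, a `sns.histplot(data=df, x=col, kde=True)` and a `sns.boxplot(data=df, x=col)` per variable pair naturally with this table.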

## 3. Bivariate Analysis

Explore relationships between pairs of climate variables.

In [None]:
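# TODO: Perform bivariate analysis
# Include correlation analysis and appropriate visualizations

# Correlation heatmap. numeric_only=True restricts the matrix to numeric columns;
# pandas >= 2.0 raises an error when non-numeric columns are present
correlation_matrix = df.corr(numeric_only=True)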
plt.figure(figsize=(10, 6))
sns.heatmap(correlation_matrix, annot=True, cmap='coolwarm', linewidths=0.5)
plt.title('Correlation Matrix of Climate Indicators')
plt.show()
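If the indicators all trend over the same period, as long climate records often do, their Pearson correlations partly reflect the shared trend rather than year-to-year co-movement. The sketch below compares Pearson with rank-based Spearman correlation and with the Pearson correlation of year-over-year differences, a common way to remove most of a linear trend. It assumes the input has one row per year sorted by year (for example, the yearly aggregate); `correlation_views` and the toy columns `x` and `y` are hypothetical.

```python
import pandas as pd


def correlation_views(frame: pd.DataFrame, columns: list[str]) -> dict[str, pd.DataFrame]:
    """Correlation matrices on levels and on year-over-year differences.

    Assumes rows are ordered by year, since diff() subtracts the previous row.
    """
    data = frame[columns]
    return {
        "pearson": data.corr(method="pearson"),
        "spearman": data.corr(method="spearman"),  # rank-based, robust to non-linear monotone links
        "pearson_diff": data.diff().corr(method="pearson"),  # first row is NaN and is ignored
    }


# Hypothetical toy series: both rise over time but their yearly changes disagree
toy = pd.DataFrame({
    "Year": [2000, 2001, 2002, 2003, 2004, 2005],
    "x": [0, 2, 1, 3, 2, 4],
    "y": [0, 1, 3, 3, 5, 6],
})
views = correlation_views(toy, ["x", "y"])
for name, matrix in views.items():
    print(name, round(matrix.loc["x", "y"], 3))
```

In the toy data the levels correlate positively while the differences correlate negatively, which is the kind of gap worth checking for in the real indicators.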

In [None]:
scatter_pairs = [
    ('CO2 Concentration (ppm)', 'Global Average Temperature (°C)'),
    ('CO2 Concentration (ppm)', 'Sea Level Rise (mm)'),
    ('Global Average Temperature (°C)', 'Arctic Ice Area (million km²)')
]

# Plotting scatter plots
plt.figure(figsize=(15, 5))

for i, (x_var, y_var) in enumerate(scatter_pairs):
    plt.subplot(1, 3, i + 1)
    sns.scatterplot(data=df, x=x_var, y=y_var)
    plt.title(f'{y_var} vs {x_var}')
    plt.xlabel(x_var)
    plt.ylabel(y_var)

plt.tight_layout()
plt.show()
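The scatter plots show direction and shape; a least-squares line puts a number on the slope. The sketch fits a degree-1 polynomial with `np.polyfit`, which returns the coefficients highest power first (slope, then intercept), after dropping rows where either value is missing. `linear_slope` is a hypothetical helper, and the toy points are constructed to lie on y = 2x + 1 so the result is easy to check.

```python
import numpy as np
import pandas as pd


def linear_slope(frame: pd.DataFrame, x_col: str, y_col: str) -> tuple[float, float]:
    """Ordinary least-squares slope and intercept of y_col against x_col."""
    pair = frame[[x_col, y_col]].dropna()  # polyfit cannot handle NaN
    slope, intercept = np.polyfit(pair[x_col], pair[y_col], deg=1)
    return float(slope), float(intercept)


# Hypothetical toy points on the line y = 2x + 1, with one missing value
toy = pd.DataFrame({"x": [0, 1, 2, 3, 4, 5], "y": [1, 3, 5, 7, 9, np.nan]})
slope, intercept = linear_slope(toy, "x", "y")
print(slope, intercept)
```

For example, `linear_slope(df, 'CO2 Concentration (ppm)', 'Global Average Temperature (°C)')` would give a slope in °C per ppm; `sns.regplot` draws the same kind of fitted line directly on a scatter plot.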

In [None]:
# Plotting variables over time
plt.figure(figsize=(12, 8))

# Global Temperature over time
plt.subplot(2, 2, 1)
sns.lineplot(data=df, x='Year', y='Global Average Temperature (°C)')
plt.title('Global Average Temperature Over Time')

# CO2 Concentration over time
plt.subplot(2, 2, 2)
sns.lineplot(data=df, x='Year', y='CO2 Concentration (ppm)')
plt.title('CO2 Concentration Over Time')

# Sea Level Rise over time
plt.subplot(2, 2, 3)
sns.lineplot(data=df, x='Year', y='Sea Level Rise (mm)')
plt.title('Sea Level Rise Over Time')

# Arctic Ice Area over time
plt.subplot(2, 2, 4)
sns.lineplot(data=df, x='Year', y='Arctic Ice Area (million km²)')
plt.title('Arctic Ice Area Over Time')

plt.tight_layout()
plt.show()
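Year-to-year noise can hide the long-run shape of these curves. A centered rolling mean is one common smoother; the window length is an arbitrary illustrative choice, the input is assumed to hold one row per year, and `smooth_yearly` is a hypothetical helper.

```python
import pandas as pd


def smooth_yearly(frame: pd.DataFrame, columns: list[str],
                  window: int = 10, year_col: str = "Year") -> pd.DataFrame:
    """Centered rolling means of `columns`, keeping the year column.

    min_periods=1 lets the edges average over fewer years instead of
    returning NaN, so the first and last few values are less smoothed.
    """
    ordered = frame.sort_values(year_col)
    smoothed = ordered[columns].rolling(window=window, center=True, min_periods=1).mean()
    smoothed.insert(0, year_col, ordered[year_col].to_numpy())
    return smoothed.reset_index(drop=True)


# Hypothetical toy values, with a 3-year window so the result is easy to verify
toy = pd.DataFrame({"Year": [2000, 2001, 2002, 2003, 2004], "v": [1.0, 2.0, 3.0, 4.0, 5.0]})
smoothed_toy = smooth_yearly(toy, ["v"], window=3)
print(smoothed_toy)
```

Drawing the smoothed column on the same axes as the raw series (a second `sns.lineplot` call) makes the trend easier to see; treat the edge values with care because they average fewer years.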

In [None]:
selected_vars = [
    'Global Average Temperature (°C)',
    'CO2 Concentration (ppm)',
    'Sea Level Rise (mm)',
    'Arctic Ice Area (million km²)'
]

sns.pairplot(df[selected_vars], diag_kind='kde')
plt.suptitle('Pairwise Scatter Plots of Climate Variables', y=1.02)
plt.show()

## 4. Multivariate Analysis

Investigate relationships among three or more variables.

In [None]:
# TODO: Perform multivariate analysis
# Create advanced visualizations showing multiple variables
# Your code here
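Putting all four indicators on a common axis is one simple multivariate view. Their units differ (°C, ppm, mm, million km²), so the sketch converts each column to z-scores (subtract the mean, divide by the sample standard deviation) before comparing them. `standardize` is a hypothetical helper, and the toy values are made up.

```python
import pandas as pd


def standardize(frame: pd.DataFrame, columns: list[str], year_col: str = "Year") -> pd.DataFrame:
    """Z-scores of `columns` (mean 0, sample std 1), keeping the year column.

    A constant column has zero standard deviation and becomes NaN.
    """
    values = frame[columns]
    z = (values - values.mean()) / values.std()  # pandas std() uses ddof=1
    z.insert(0, year_col, frame[year_col].to_numpy())
    return z


# Hypothetical toy values, not taken from the dataset
toy = pd.DataFrame({"Year": [2000, 2001, 2002], "a": [1.0, 2.0, 3.0], "b": [10.0, 30.0, 20.0]})
z_toy = standardize(toy, ["a", "b"])
print(z_toy)
```

For a one-row-per-year frame (called `yearly` here for illustration), `standardize(yearly, selected_vars).plot(x='Year', figsize=(12, 6))` overlays the four standardized series, so diverging or mirrored trends stand out on a shared scale. To show three or more variables in one panel instead, `sns.scatterplot` accepts `hue` and `size` arguments, for example `hue='Year'` and `size='Sea Level Rise (mm)'`.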

## 5. Conclusions and Insights

Summarize your findings and discuss their implications.

# TODO: Write your conclusions here