# Climate Data Exploratory Data Analysis

## Introduction
This notebook contains an exploratory data analysis of climate data from 1900 to 2023. The dataset includes global temperatures, CO2 concentration, sea level rise, and Arctic ice area.

Your task is to perform a comprehensive EDA following the requirements in the README.md file.

In [None]:
# Import required libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

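# Set plot styling
# (the 'seaborn-whitegrid' Matplotlib style name is not available in recent Matplotlib releases,
# so the grid style and palette are set through seaborn instead)
sns.set_theme(style='whitegrid', palette='viridis')
%matplotlib inline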

## 1. Data Preparation

Load the climate data and perform necessary cleaning and aggregation.

In [None]:
# Load the dataset
df = pd.read_csv('climate_data.csv')  # Relative path to the dataset; adjust it if the file lives elsewhere

# Display the first few rows of the dataset
df.head()

In [None]:
# Check for missing values and basic information about the dataset
print("Dataset Information:")
df.info()  # info() prints its report directly and returns None, so it is not wrapped in print()
print("\nMissing Values:")
print(df.isnull().sum())
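
The cleaning step mentioned above is not spelled out in this notebook, so the following is only a minimal sketch of one possible approach: drop exact duplicate rows and fill numeric gaps with the column median. The helper name `clean_climate` and the toy values are assumptions made for illustration; the right strategy depends on what the missing-value report shows for the real file.

```python
import numpy as np
import pandas as pd

def clean_climate(frame: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: drop exact duplicates, fill numeric gaps with the column median."""
    cleaned = frame.drop_duplicates().copy()
    numeric_cols = cleaned.select_dtypes(include="number").columns
    cleaned[numeric_cols] = cleaned[numeric_cols].fillna(cleaned[numeric_cols].median())
    return cleaned

# Toy data standing in for the real file (values are made up)
toy = pd.DataFrame({
    "Year": [1900, 1900, 1901, 1901],
    "CO2 Concentration (ppm)": [300.0, 300.0, np.nan, 302.0],
})
toy_clean = clean_climate(toy)
print(toy_clean.isnull().sum())
```

Median filling is only one choice. Interpolating along the time axis or dropping incomplete rows may suit a climate time series better.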

In [None]:
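# Aggregate data by year to create a 124-year time series (1900-2023)
# Grouping on the Year column directly avoids a datetime round trip, and
# mean(numeric_only=True) replaces agg(np.mean), which newer pandas versions warn about
df_yearly = df.groupby('Year').mean(numeric_only=True)
df_yearly.head()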
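
After aggregating, it may be worth confirming that the result really has one row per year with no gaps. The sketch below shows one way to check this on made-up toy data covering 1900 to 1902. For the real dataset the expected range would presumably be `pd.RangeIndex(1900, 2024)`, which has 124 entries.

```python
import pandas as pd

# Toy data with several readings per year (values are made up)
toy = pd.DataFrame({
    "Year": [1900, 1900, 1901, 1901, 1902],
    "Global Average Temperature (°C)": [14.0, 14.2, 14.1, 14.3, 14.5],
})
toy_yearly = toy.groupby("Year").mean(numeric_only=True)

# Compare the years present against the years we expect to see
expected_years = pd.RangeIndex(1900, 1903)
missing_years = expected_years.difference(toy_yearly.index)
print(len(toy_yearly), list(missing_years))
```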

## 2. Univariate Analysis

Analyze each climate variable independently.

In [None]:
# Univariate analysis for each climate variable:
# a histogram with a KDE overlay, followed by descriptive statistics
variables = {
    'Global Average Temperature (°C)': 'Global Average Temperature',
    'CO2 Concentration (ppm)': 'CO2 Concentration',
    'Sea Level Rise (mm)': 'Sea Level Rise',
    'Arctic Ice Area (million km²)': 'Arctic Ice Area',
}
for column, label in variables.items():
    plt.figure(figsize=(10, 6))
    sns.histplot(df[column], bins=20, kde=True)
    plt.title(f'{label} Distribution')
    plt.xlabel(column)
    plt.ylabel('Frequency')
    plt.show()
    print(f"\nDescriptive Statistics for {label}:")
    print(df[column].describe())
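
Histograms and `describe()` can be complemented with a skewness value and a simple outlier check. The sketch below uses the common 1.5 × IQR rule on a made-up series. The helper name `iqr_outliers` and the sample values are assumptions for illustration, not part of the dataset.

```python
import pandas as pd

def iqr_outliers(values: pd.Series) -> pd.Series:
    """Hypothetical helper: return the values lying outside 1.5 * IQR of the quartiles."""
    q1, q3 = values.quantile(0.25), values.quantile(0.75)
    iqr = q3 - q1
    mask = (values < q1 - 1.5 * iqr) | (values > q3 + 1.5 * iqr)
    return values[mask]

# Toy temperature-like readings with one obvious outlier (values are made up)
toy = pd.Series([14.0, 14.1, 14.2, 14.3, 14.4, 20.0])
outliers = iqr_outliers(toy)
skewness = toy.skew()
print(outliers.tolist(), skewness)
```

The same helper could be applied to each column of `df` in turn, for example inside a loop over the four variable names.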

## 3. Bivariate Analysis

Explore relationships between pairs of climate variables.

In [None]:
# TODO: Perform bivariate analysis
# Include correlation analysis and appropriate visualizations
plt.figure(figsize=(10, 6))
sns.scatterplot(x=df['CO2 Concentration (ppm)'], y=df['Global Average Temperature (°C)'])
plt.title('CO2 Concentration vs Global Average Temperature')
plt.xlabel('CO2 Concentration (ppm)')
plt.ylabel('Global Average Temperature (°C)')
plt.show()
plt.figure(figsize=(10, 6))
sns.scatterplot(x=df['Sea Level Rise (mm)'], y=df['Global Average Temperature (°C)'])
plt.title('Sea Level Rise vs Global Average Temperature')
plt.xlabel('Sea Level Rise (mm)')
plt.ylabel('Global Average Temperature (°C)')
plt.show()
correlation_matrix = df[['Global Average Temperature (°C)', 'CO2 Concentration (ppm)', 'Sea Level Rise (mm)', 'Arctic Ice Area (million km²)']].corr()
plt.figure(figsize=(8, 6))
# Fix the color scale to [-1, 1] so the diverging colormap is centered on zero correlation
sns.heatmap(correlation_matrix, annot=True, fmt='.2f', cmap='coolwarm', vmin=-1, vmax=1)
plt.title('Correlation Heatmap')
plt.show()
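
`DataFrame.corr()` computes Pearson correlation by default, which only measures linear association. As a rough illustration of why a rank-based measure can be worth checking as well, the sketch below compares Pearson and Spearman on a made-up monotonic but nonlinear relationship. The column names `x` and `y` are placeholders.

```python
import pandas as pd

# Toy data: y grows monotonically with x, but not linearly (values are made up)
toy = pd.DataFrame({"x": [1, 2, 3, 4, 5]})
toy["y"] = toy["x"] ** 3

pearson = toy.corr(method="pearson").loc["x", "y"]
spearman = toy.corr(method="spearman").loc["x", "y"]
print(round(pearson, 3), round(spearman, 3))
```

On the real data, a large gap between the two coefficients for a pair of variables would hint at a nonlinear relationship that the scatter plots could confirm.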

## 4. Multivariate Analysis

Investigate relationships among three or more variables.

In [None]:
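# Multivariate analysis: PCA on the four climate variables
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

features = ['Global Average Temperature (°C)', 'CO2 Concentration (ppm)', 'Sea Level Rise (mm)', 'Arctic Ice Area (million km²)']
# PCA cannot handle missing values, and the variables use different units,
# so drop incomplete rows and scale each column to zero mean and unit variance first
X = StandardScaler().fit_transform(df[features].dropna())
pca = PCA(n_components=2)
pca_result = pca.fit_transform(X)
df_pca = pd.DataFrame(pca_result, columns=['PC1', 'PC2'])
print('Explained variance ratio:', pca.explained_variance_ratio_)
plt.figure(figsize=(10, 6))
sns.scatterplot(data=df_pca, x='PC1', y='PC2')
plt.title('PCA: Climate Data (standardized variables)')
plt.xlabel('Principal Component 1')
plt.ylabel('Principal Component 2')
plt.show()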
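
To make it clearer what the explained variance of a PCA represents, here is a NumPy-only sketch that standardizes four columns and reads the variance shares off the eigenvalues of their correlation matrix. The data is synthetic (four noisy copies of one shared trend, 124 points) and is only meant to illustrate the calculation, not to reproduce the real dataset.

```python
import numpy as np

rng = np.random.default_rng(0)
trend = np.linspace(0.0, 1.0, 124)  # one value per year, 1900-2023

# Four synthetic columns driven by the same trend, on different scales (made-up data)
X = np.column_stack([
    trend + rng.normal(0, 0.05, 124),
    100 * trend + rng.normal(0, 5, 124),
    50 * trend + rng.normal(0, 2.5, 124),
    -trend + rng.normal(0, 0.05, 124),
])

# Standardize each column, then form the correlation matrix
Z = (X - X.mean(axis=0)) / X.std(axis=0)
corr = (Z.T @ Z) / len(Z)

# Eigenvalues (largest first) give the variance captured by each principal component
eigenvalues = np.linalg.eigvalsh(corr)[::-1]
explained_ratio = eigenvalues / eigenvalues.sum()
print(np.round(explained_ratio, 3))
```

When the columns share a strong common trend, as here, the first component should account for most of the variance. Without the standardization step, the column with the largest numeric scale would dominate instead.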

## 5. Conclusions and Insights

Summarize your findings and discuss their implications.

# TODO: Write your conclusions here