## **Problem Statement**

Climate change, largely driven by increasing carbon dioxide (CO₂) emissions, poses a critical threat to global ecosystems and human societies. This study seeks to analyze the historical relationship between CO₂ concentrations and global temperature anomalies to identify long-term trends and key turning points. The objective is threefold:

(1) To examine how industrial and anthropogenic activities have influenced climate patterns over time,

(2) To detect significant anomalies that may indicate unusual natural or human-driven events, and

(3) To simulate future climate scenarios based on varying CO₂ concentration trajectories.

Through this comprehensive analysis, the study aims to generate actionable insights that can inform effective, data-driven policy decisions to mitigate the impacts of climate change.

## **About the Dataset**

To support this analysis, two datasets are used. The **Annual Surface Temperature Change** dataset captures temperature anomalies for individual countries from 1961 to 2022, measured in degrees Celsius relative to a historical baseline. Its annual observations give a clear view of how temperatures have evolved over time. Complementing this, the **Monthly Atmospheric CO₂ Concentrations** dataset provides monthly CO₂ measurements in parts per million (ppm) from 1958 to 2024, enabling a detailed view of rising CO₂ levels. Together, these datasets form a foundation for understanding historical climate trends, identifying anomalies, and simulating future scenarios to evaluate the impact of carbon emissions on global warming.

In [1]:
import pandas as pd

temperature_data = pd.read_csv('/content/temperature.csv')
co2_data = pd.read_csv('/content/carbon_emmission.csv')

temperature_data_preview = temperature_data.head()
co2_data_preview = co2_data.head()

print("Temperature Data Preview:")
print(temperature_data_preview)

print("\nCO2 Data Preview:")
print(co2_data_preview)

Temperature Data Preview:
   ObjectId                       Country ISO2 ISO3  F1961  F1962  F1963  \
0         1  Afghanistan, Islamic Rep. of   AF  AFG -0.113 -0.164  0.847   
1         2                       Albania   AL  ALB  0.627  0.326  0.075   
2         3                       Algeria   DZ  DZA  0.164  0.114  0.077   
3         4                American Samoa   AS  ASM  0.079 -0.042  0.169   
4         5      Andorra, Principality of   AD  AND  0.736  0.112 -0.752   

   F1964  F1965  F1966  ...  F2013  F2014  F2015  F2016  F2017  F2018  F2019  \
0 -0.764 -0.244  0.226  ...  1.281  0.456  1.093  1.555  1.540  1.544  0.910   
1 -0.166 -0.388  0.559  ...  1.333  1.198  1.569  1.464  1.121  2.028  1.675   
2  0.250 -0.100  0.433  ...  1.192  1.690  1.121  1.757  1.512  1.210  1.115   
3 -0.140 -0.562  0.181  ...  1.257  1.170  1.009  1.539  1.435  1.189  1.539   
4  0.308 -0.490  0.415  ...  0.831  1.946  1.690  1.990  1.925  1.919  1.964   

   F2020  F2021  F2022  
0  0.498  1

⚫ Here, we are using two datasets:

1) **Temperature Data**: Annual temperature anomalies measured in degrees Celsius across decades.

2) **CO₂ Data**: Monthly global atmospheric CO₂ concentrations in parts per million (ppm).

▶ Now, we will calculate key statistics for temperature changes and CO₂ concentrations, such as mean, median, and variance:

In [2]:
# selecting and computing statistics for temperature changes
temperature_values = temperature_data.filter(regex='^F').stack()
temperature_stats = {
    "Mean": temperature_values.mean(),
    "Median": temperature_values.median(),
    "Variance": temperature_values.var()
}

# computing statistics for CO2 concentrations
co2_values = co2_data["Value"]
co2_stats = {
    "Mean": co2_values.mean(),
    "Median": co2_values.median(),
    "Variance": co2_values.var()
}

temperature_stats, co2_stats

({'Mean': np.float64(0.5377713483146068),
  'Median': 0.47,
  'Variance': 0.4294524831504378},
 {'Mean': np.float64(180.71615286624203),
  'Median': 313.835,
  'Variance': 32600.00200469294})

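⚫ The mean temperature change is approximately **0.54°C**, with a median of **0.47°C** and a variance of **0.43 (°C)²** (a standard deviation of about 0.66°C), so anomalies vary considerably across countries and years. For CO₂ concentrations, the reported mean is **180.72 ppm**, the median is much higher at **313.84 ppm**, and the variance is **32,600**. These CO₂ figures need caution: atmospheric CO₂ has stayed above 310 ppm since measurements began in 1958, so a mean of 180.72 ppm cannot be an average of ppm readings alone. The large gap between mean and median suggests that the `Value` column mixes ppm readings with rows in a different unit, which also inflates the variance; filtering to the ppm rows before computing these statistics would give figures that reflect actual CO₂ levels.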
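If the `Value` column mixes ppm readings with rows in another unit (one possible reason its mean of 180.72 ppm sits so far below its 313.84 ppm median), the mean and variance are distorted while the median barely moves. A minimal sketch on made-up numbers: the `Unit` column and its labels are hypothetical stand-ins, so check the actual columns of the CO₂ CSV before filtering.

```python
import pandas as pd

# Synthetic example (NOT the notebook's data). The "Unit" column and its
# labels are hypothetical: they mimic ppm rows mixed with percent-change rows.
demo = pd.DataFrame({
    "Unit": ["Parts Per Million"] * 4 + ["Percent"] * 3,
    "Value": [315.0, 316.0, 317.0, 318.0, 0.3, 0.4, 0.5],
})

mixed_mean = demo["Value"].mean()      # dragged far below every ppm reading
mixed_median = demo["Value"].median()  # still lands on a ppm reading

# Keep only the ppm rows before computing statistics
ppm_only = demo.loc[demo["Unit"] == "Parts Per Million", "Value"]
print(round(mixed_mean, 2), mixed_median, ppm_only.mean())  # prints: 181.03 315.0 316.5
```

In the notebook, the equivalent step would be a boolean filter on `co2_data`, by whichever column identifies the unit (if the CSV has one), before computing `co2_stats`, `co2_yearly`, and `co2_monthly`.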

▶ Next, we'll examine how temperature changes and CO2 concentrations have evolved over time, and how they relate to each other:

In [3]:
import plotly.graph_objects as go
import plotly.express as px

# extracting time-series data for plotting
# temperature: averaging across countries for each year
temperature_years = temperature_data.filter(regex='^F').mean(axis=0)
temperature_years.index = temperature_years.index.str.replace('F','').astype(int)

# CO2: parsing the year from the first four characters of 'Date' and averaging monthly data
co2_data['Year'] = co2_data['Date'].str[:4].astype(int)
co2_yearly = co2_data.groupby('Year')['Value'].mean()

# time-series plot of temperature and CO2 levels
fig = go.Figure()
fig.add_trace(go.Scatter(
    x=temperature_years.index, y=temperature_years.values,
    mode='lines+markers', name="Temperature Change (°C)"
))
fig.add_trace(go.Scatter(
    x=co2_yearly.index, y=co2_yearly.values,
    mode='lines+markers', name="CO2 Concentration (ppm)", line=dict(dash='dash')
))
fig.update_layout(
    title="Time-series of Temperature Change and CO2 Concentrations",
    xaxis_title="Year",
    yaxis_title="Values",
    template="plotly_white",
    legend_title="Metrics"
)
fig.show()

# correlation heatmap
merged_data = pd.DataFrame({
    "Temperature Change": temperature_years,
    "CO2 Concentration": co2_yearly
}).dropna()

heatmap_fig = px.imshow(
    merged_data.corr(),
    text_auto=".2f",
    color_continuous_scale='RdBu',
    title="Correlation Heatmap")
heatmap_fig.update_layout(
    template="plotly_white"
)
heatmap_fig.show()

# scatter plot: temperature vs CO2 concentrations
scatter_fig = px.scatter(
    merged_data,
    x="CO2 Concentration", y="Temperature Change",
    labels={"CO2 Concentration": "CO2 Concentration (ppm)", "Temperature Change": "Temperature Change (°C)"},
    title="Temperature Change Vs CO2 Concentration",
    template="plotly_white"
)
scatter_fig.update_traces(marker=dict(size=10, opacity=0.7))
scatter_fig.show()

◼ The time-series graph shows a steady increase in CO₂ concentrations (ppm) over the years, reflecting the accumulation of greenhouse gases in the atmosphere. Temperature change also trends upward, although the rise looks slight here partly because both series share a single y-axis and the CO₂ values are hundreds of times larger. The parallel upward trends are consistent with the hypothesis that CO₂ contributes significantly to rising temperatures, though co-movement alone does not establish causation.

◼ The heatmap reveals a strong positive correlation (0.96) between CO₂ concentrations and temperature changes. This statistical relationship reinforces the observation that higher CO₂ levels are closely linked with increasing global temperatures, which highlights the importance of addressing carbon emissions to mitigate climate change. Because both series trend upward over time, part of this correlation reflects the shared trend rather than year-to-year co-movement.
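A correlation between two series that both trend upward over time partly reflects the shared trend itself. One common check, sketched below on synthetic data (not the notebook's series), is to compare the correlation of the levels with the correlation of their first differences, the same transformation the Granger test later applies via `merged_data.diff()`.

```python
import numpy as np

rng = np.random.default_rng(0)
years = np.arange(1961, 2023)
elapsed = years - years[0]

# Two synthetic series that share an upward trend but have independent noise
a = 0.02 * elapsed + rng.normal(0, 0.05, years.size)
b = 1.5 * elapsed + rng.normal(0, 1.0, years.size)

corr_levels = np.corrcoef(a, b)[0, 1]                   # dominated by the shared trend
corr_diffs = np.corrcoef(np.diff(a), np.diff(b))[0, 1]  # year-over-year changes only
print(f"levels: {corr_levels:.2f}, first differences: {corr_diffs:.2f}")
```

The levels correlate almost perfectly even though the year-to-year noise is independent. On the notebook's data, the equivalent check would be `merged_data[["Temperature Change", "CO2 Concentration"]].diff().corr()`.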

◼ The scatter plot shows a clear linear trend, where higher CO₂ concentrations correspond to greater temperature changes. This visual evidence underscores the close relationship between atmospheric CO₂ and global warming, which provides further support for policies targeting reductions in carbon emissions to combat climate impacts.

## **Trends and Seasonal Variations Analysis**

Now, we will identify long-term trends in both series using linear regression, and then examine seasonal variations in CO2 by averaging its concentrations by calendar month:

In [4]:
from scipy.stats import linregress

# temperature trend
temp_trend = linregress(temperature_years.index, temperature_years.values)
temp_trend_line = temp_trend.slope * temperature_years.index + temp_trend.intercept

# CO2 trend
co2_trend = linregress(co2_yearly.index, co2_yearly.values)
co2_trend_line = co2_trend.slope * co2_yearly.index + co2_trend.intercept

fig_trends = go.Figure()

fig_trends.add_trace(go.Scatter(
    x=temperature_years.index, y=temperature_years.values,
    mode='lines+markers', name="Temperature Change (°C)"
))
fig_trends.add_trace(go.Scatter(
    x=temperature_years.index, y=temp_trend_line,
    mode='lines', name=f"Temperature Trend (Slope: {temp_trend.slope:.2f})"
))
fig_trends.add_trace(go.Scatter(
    x=co2_yearly.index, y=co2_yearly.values,
    mode='lines+markers', name="CO2 Concentration (ppm)"
))
fig_trends.add_trace(go.Scatter(
    x=co2_yearly.index, y=co2_trend_line,
    mode='lines', name=f"CO2 Trend (Slope: {co2_trend.slope:.2f})", line=dict(dash='dash')
))

fig_trends.update_layout(
    title="Trends in Temperature Change and CO2 Concentrations",
    xaxis_title="Year",
    yaxis_title="Values",
    template="plotly_white",
    legend_title="Metrics"
)
fig_trends.show()

▶ The graph shows the linear trends in temperature change and CO2 concentrations over time, with their slopes shown in the legend. The CO2 slope (0.32 ppm per year) is numerically much larger than the temperature slope (0.03 °C per year), but because the two slopes are in different units, their magnitudes cannot be compared directly to conclude which quantity is rising faster. What the trends do show is that both quantities rise steadily: a temperature increase of about 0.03 °C per year accumulates to roughly 0.3 °C per decade, which may have long-term consequences.
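Because the two slopes carry different units (ppm per year and °C per year), their raw magnitudes say little about which quantity grows faster. One option, sketched here as an assumption rather than the notebook's method, is to z-score each series before fitting, so both slopes are expressed in standard deviations per year. The synthetic example uses two perfectly linear series whose raw slopes differ roughly tenfold but whose standardized slopes are identical.

```python
import numpy as np
from scipy.stats import linregress

def standardized_slope(years, values):
    """Linear trend in standard deviations per year (unit-free)."""
    values = np.asarray(values, dtype=float)
    z = (values - values.mean()) / values.std(ddof=1)
    return linregress(years, z).slope

# Synthetic check: two perfectly linear series in different units
years = np.arange(2000, 2010)
temp = 0.03 * (years - 2000)          # °C
co2 = 300 + 0.32 * (years - 2000)     # ppm

raw_ratio = linregress(years, co2).slope / linregress(years, temp).slope
s_temp = standardized_slope(years, temp)
s_co2 = standardized_slope(years, co2)
print(f"raw slope ratio: {raw_ratio:.1f}, standardized: {s_temp:.4f} vs {s_co2:.4f}")
```

Applied to the notebook, the calls would be `standardized_slope(temperature_years.index, temperature_years.values)` and `standardized_slope(co2_yearly.index, co2_yearly.values)`.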

In [5]:
# seasonal variations in CO2 concentrations
co2_data['Month'] = co2_data['Date'].str[-2:].astype(int)
co2_monthly = co2_data.groupby('Month')['Value'].mean()

fig_seasonal = px.line(
    co2_monthly,
    x=co2_monthly.index,
    y=co2_monthly.values,
    labels={"x": "Month", "y": "CO₂ Concentration (ppm)"},
    title="Seasonal variations in CO₂ Concentrations",
    markers=True
)
fig_seasonal.update_layout(
    xaxis=dict(tickmode="array", tickvals=list(range(1,13))),
    template="plotly_white"
)
fig_seasonal.show()

▶ The graph above highlights the seasonal cycle in CO2 concentrations, which peak in late spring (around May) and reach their lowest levels in fall (around September). This pattern is largely driven by Northern Hemisphere vegetation: photosynthesis draws CO2 down during the growing season, while respiration releases it during the rest of the year. The cycle underscores the role of natural carbon sinks in moderating atmospheric CO2 levels.
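Averaging raw monthly values across 1958 to 2024 lets the long-term rise leak into the monthly means: later months in each year, and months observed in more recent years, read slightly higher regardless of season. A sketch of one fix (an assumption, not the notebook's approach) removes a smooth quadratic trend first and then averages the residuals by month. The function expects the `Year`, `Month`, and `Value` columns the notebook creates; the demo uses synthetic data with a seasonal peak placed in month 5.

```python
import numpy as np
import pandas as pd

def seasonal_cycle(df, trend_degree=2):
    """Mean deviation from a smooth long-term trend for each calendar month.

    Expects 'Year', 'Month' and 'Value' columns. A low-order polynomial in
    fractional time (quadratic by default, since CO2 growth accelerates) is
    removed first so the long-term rise does not leak into the monthly means.
    """
    t = df["Year"] + (df["Month"] - 0.5) / 12   # fractional year
    t = t - t.mean()                             # centre for numerical stability
    coeffs = np.polyfit(t, df["Value"], trend_degree)
    residuals = df["Value"] - np.polyval(coeffs, t)
    return residuals.groupby(df["Month"]).mean()

# Synthetic demo: accelerating trend plus a seasonal cycle peaking in month 5
years = np.repeat(np.arange(1990, 2020), 12)
months = np.tile(np.arange(1, 13), 30)
t_demo = years + (months - 0.5) / 12 - 1990
values = 350 + 1.5 * t_demo + 0.01 * t_demo**2 + 3 * np.cos(2 * np.pi * (months - 5) / 12)
demo = pd.DataFrame({"Year": years, "Month": months, "Value": values})

cycle = seasonal_cycle(demo)
print(cycle.round(2))
```

The recovered cycle peaks in month 5 with an amplitude close to the 3 units built into the synthetic data, with no trend-induced tilt across the year.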

## **Correlation and Causality Analysis**

▶ To quantify the relationship between CO2 and temperature anomalies, we will compute the **Pearson** and **Spearman correlation coefficients**. To investigate whether past changes in CO2 help predict temperature anomalies, we will then run **Granger causality tests** on the year-over-year differences:

In [6]:
from scipy.stats import pearsonr, spearmanr
from statsmodels.tsa.stattools import grangercausalitytests

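# Pearson and Spearman correlation coefficients
pearson_corr, _ = pearsonr(merged_data["CO2 Concentration"], merged_data["Temperature Change"])
spearman_corr, _ = spearmanr(merged_data["CO2 Concentration"], merged_data["Temperature Change"])

# Granger causality test on first differences (removes the shared upward trend).
# statsmodels tests whether the 2nd column (CO2) Granger-causes the 1st (temperature),
# so the column order is fixed explicitly.
granger_data = merged_data[["Temperature Change", "CO2 Concentration"]].diff().dropna()
granger_results = grangercausalitytests(granger_data, maxlag=3, verbose=False)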

# extracting p-values for causality
granger_p_values = {f"Lag {lag}": round(results[0]['ssr_chi2test'][1], 4)
                    for lag, results in granger_results.items()}

pearson_corr, spearman_corr, granger_p_values


verbose is deprecated since functions should not print results



(np.float64(0.9554282559257312),
 np.float64(0.9379013371609882),
 {'Lag 1': np.float64(0.0617),
  'Lag 2': np.float64(0.6754),
  'Lag 3': np.float64(0.2994)})

▶ Here, the **Pearson correlation** (0.9554) indicates a very strong linear relationship between CO2 concentrations and temperature changes, and the **Spearman correlation** (0.9379) indicates a very strong monotonic relationship.
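Spearman's coefficient is Pearson's coefficient applied to the ranks of the data, which is why it captures any monotonic relationship rather than only a linear one. The sketch below checks this identity with SciPy on a synthetic, curved but increasing relationship (not the notebook's data).

```python
import numpy as np
from scipy.stats import pearsonr, spearmanr, rankdata

# Synthetic, monotonic but curved relationship
rng = np.random.default_rng(1)
x_syn = np.linspace(300, 420, 60)
y_syn = np.exp(x_syn / 100) + rng.normal(0, 0.5, x_syn.size)

rho_direct, _ = spearmanr(x_syn, y_syn)
rho_from_ranks, _ = pearsonr(rankdata(x_syn), rankdata(y_syn))  # Pearson on ranks
r_linear, _ = pearsonr(x_syn, y_syn)                            # Pearson on raw values
print(f"Spearman: {rho_direct:.4f}, Pearson on ranks: {rho_from_ranks:.4f}, "
      f"Pearson on raw values: {r_linear:.4f}")
```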

▶ Granger Causality Test: The p-values for lags 1, 2 and 3 are as follows:

*   Lag 1: 0.0617 (slightly above the common significance threshold of 0.05, suggesting weak evidence of Granger causality).
*   Lag 2: 0.6754 (not significant; no evidence of Granger causality).
*   Lag 3: 0.2994 (not significant; no evidence of Granger causality).


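▶ There is a very strong correlation between CO2 concentrations and temperature changes. However, the Granger causality tests do not provide strong evidence that past year-over-year changes in CO2 improve predictions of temperature changes at lags of 1 to 3 years. Note that Granger causality measures predictive precedence rather than physical causation, and with roughly 60 annual observations the tests have limited power.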
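The Granger test compares two autoregressions of the target: one using only its own past values and one that also adds past values of the candidate cause. The sketch below implements the F-test version with NumPy and SciPy on synthetic data. It is an illustration, not the statsmodels source: the notebook reads the `ssr_chi2test` p-value, a chi-square form of the same sum-of-squares comparison, so its p-values differ slightly from this F-test (statsmodels reports the F version as `ssr_ftest`).

```python
import numpy as np
from scipy.stats import f as f_dist

def granger_f_test(y, x, lag):
    """F-test of whether `lag` past values of x improve an autoregression of y.

    Restricted:   y_t ~ 1 + y_{t-1} + ... + y_{t-lag}
    Unrestricted: y_t ~ 1 + y_{t-1} + ... + y_{t-lag} + x_{t-1} + ... + x_{t-lag}
    Returns (F statistic, p-value).
    """
    y = np.asarray(y, dtype=float)
    x = np.asarray(x, dtype=float)
    n = len(y)
    target = y[lag:]
    # column k-1 holds the series shifted back by k steps, aligned with target
    y_lags = np.column_stack([y[lag - k:n - k] for k in range(1, lag + 1)])
    x_lags = np.column_stack([x[lag - k:n - k] for k in range(1, lag + 1)])
    const = np.ones((n - lag, 1))

    def ssr(design):
        coef, *_ = np.linalg.lstsq(design, target, rcond=None)
        resid = target - design @ coef
        return resid @ resid

    restricted = np.hstack([const, y_lags])
    unrestricted = np.hstack([const, y_lags, x_lags])
    ssr_r, ssr_u = ssr(restricted), ssr(unrestricted)
    df_num = lag
    df_den = (n - lag) - unrestricted.shape[1]
    F = ((ssr_r - ssr_u) / df_num) / (ssr_u / df_den)
    return F, f_dist.sf(F, df_num, df_den)

# Synthetic demo: y depends on last step's x, but x does not depend on y
rng = np.random.default_rng(2)
n_obs = 200
x_demo = rng.normal(size=n_obs)
y_demo = np.empty(n_obs)
y_demo[0] = rng.normal()
y_demo[1:] = 0.8 * x_demo[:-1] + rng.normal(scale=0.5, size=n_obs - 1)

F_fwd, p_fwd = granger_f_test(y_demo, x_demo, lag=1)  # does x help predict y?
F_rev, p_rev = granger_f_test(x_demo, y_demo, lag=1)  # does y help predict x?
print(f"x -> y: F={F_fwd:.1f}, p={p_fwd:.2g};  y -> x: F={F_rev:.2f}, p={p_rev:.2f}")
```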

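## **Lagged Effects Analysis**

▶ To check whether past CO2 levels help explain current temperature changes, we regress temperature change on the current CO2 concentration and its values lagged by one to three years using ordinary least squares (OLS):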

In [7]:
import statsmodels.api as sm

# creating lagged CO2 data to investigate lagged effects
merged_data['CO2_Lag 1'] = merged_data['CO2 Concentration'].shift(1)
merged_data['CO2_Lag 2'] = merged_data['CO2 Concentration'].shift(2)
merged_data['CO2_Lag 3'] = merged_data['CO2 Concentration'].shift(3)

# dropping rows with NaN due to lags
lagged_data = merged_data.dropna()

X = lagged_data[['CO2 Concentration','CO2_Lag 1', 'CO2_Lag 2', 'CO2_Lag 3']]
y = lagged_data['Temperature Change']
X = sm.add_constant(X)

# fitting linear regression model
model = sm.OLS(y, X).fit()

model_summary = model.summary()
model_summary

| Statistic | Value | Statistic | Value |
|---|---|---|---|
| Dep. Variable | Temperature Change | R-squared | 0.949 |
| Model | OLS | Adj. R-squared | 0.945 |
| Method | Least Squares | F-statistic | 252.5 |
| Date | Sat, 28 Jun 2025 | Prob (F-statistic) | 2.97e-34 |
| Time | 06:11:19 | Log-Likelihood | 45.098 |
| No. Observations | 59 | AIC | -80.2 |
| Df Residuals | 54 | BIC | -69.81 |
| Df Model | 4 | | |
| Covariance Type | nonrobust | | |

| | coef | std err | t | P>\|t\| | [0.025 | 0.975] |
|---|---|---|---|---|---|---|
| const | -4.7980 | 0.317 | -15.137 | 0.000 | -5.434 | -4.163 |
| CO2 Concentration | 0.3245 | 0.055 | 5.942 | 0.000 | 0.215 | 0.434 |
| CO2_Lag 1 | -0.2962 | 0.068 | -4.361 | 0.000 | -0.432 | -0.160 |
| CO2_Lag 2 | 0.0104 | 0.068 | 0.153 | 0.879 | -0.126 | 0.146 |
| CO2_Lag 3 | -0.0107 | 0.056 | -0.191 | 0.849 | -0.123 | 0.101 |

| Statistic | Value | Statistic | Value |
|---|---|---|---|
| Omnibus | 2.369 | Durbin-Watson | 1.554 |
| Prob(Omnibus) | 0.306 | Jarque-Bera (JB) | 2.077 |
| Skew | -0.457 | Prob(JB) | 0.354 |
| Kurtosis | 2.902 | Cond. No. | 7540.0 |


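▶ The OLS regression results indicate a strong relationship between CO2 concentration and temperature change: the R-squared of 0.949 means the model explains 94.9% of the variance in temperature change. The coefficient on current CO2 concentration (0.3245) is statistically significant (p < 0.05), which suggests a positive association between CO2 levels and temperature change. The coefficient on CO2_Lag 1 is also significant but negative (-0.2962), while lags 2 and 3 are not significant (p = 0.879 and 0.849). The current and lag-1 coefficients nearly offset each other (their sum is about 0.028), and together with the large condition number (7,540) this points to strong multicollinearity among the lagged CO2 terms, so individual coefficients should be interpreted with caution. Part of the high R-squared also reflects the shared upward trend in both series.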
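The variance inflation factor (VIF) quantifies how much each regressor is explained by the others: VIF_j = 1 / (1 - R_j²), where R_j² comes from regressing column j on the remaining columns. The sketch computes it with NumPy for a synthetic, steadily rising series and its first three lags, mimicking the structure of `CO2 Concentration` and `CO2_Lag 1` to `CO2_Lag 3`; it is not run on the notebook's data. statsmodels also ships a `variance_inflation_factor` helper in `statsmodels.stats.outliers_influence`.

```python
import numpy as np
import pandas as pd

def variance_inflation_factors(X):
    """VIF for each column of X (no constant column; an intercept is added
    internally to each auxiliary regression)."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    vifs = []
    for j in range(p):
        target = X[:, j]
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(others, target, rcond=None)
        resid = target - others @ coef
        centred = target - target.mean()
        r2 = 1.0 - (resid @ resid) / (centred @ centred)
        vifs.append(1.0 / (1.0 - r2))
    return vifs

# Synthetic stand-in for a steadily rising CO2 series and its first three lags
rng = np.random.default_rng(3)
series = pd.Series(315 + 1.5 * np.arange(62) + rng.normal(0, 0.3, 62))
lagged = pd.concat(
    {"lag0": series, "lag1": series.shift(1), "lag2": series.shift(2), "lag3": series.shift(3)},
    axis=1,
).dropna()

vifs = variance_inflation_factors(lagged.to_numpy())
print([round(v) for v in vifs])
```

A common rule of thumb treats VIFs above about 10 as a warning sign; lagged copies of a trending series typically land far above that.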

## **Clustering Climate Patterns**

▶ Next, we will group years based on similarities in temperature anomalies and CO2 concentrations using **K-Means clustering**:

In [8]:
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler
import numpy as np

# preparing the data for clustering
clustering_data = merged_data[["Temperature Change", "CO2 Concentration"]].dropna()

scaler = StandardScaler()
scaled_data = scaler.fit_transform(clustering_data)

# applying K-Means clustering
kmeans = KMeans(n_clusters=3, random_state=42)
clustering_data['Cluster'] = kmeans.fit_predict(scaled_data)

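# labeling clusters: K-Means cluster IDs are arbitrary, so rank clusters by the
# mean of their (scaled) centroid coordinates instead of hard-coding IDs
centroid_means = kmeans.cluster_centers_.mean(axis=1)
cluster_order = np.argsort(centroid_means)  # cluster IDs from lowest to highest
pattern_names = ['Low Temp & CO2', 'Moderate Temp & CO2', 'High Temp & CO2']
clustering_data['Label'] = clustering_data['Cluster'].map(
    {int(cluster_id): pattern_names[rank] for rank, cluster_id in enumerate(cluster_order)}
)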

fig_clusters = px.scatter(
    clustering_data,
    x="CO2 Concentration",
    y="Temperature Change",
    color="Label",
    color_discrete_sequence=px.colors.qualitative.Set2,
    labels={
        "CO2 Concentration": "CO2 Concentration (ppm)",
        "Temperature Change": "Temperature Change (°C)",
        "Label": "Climate Pattern"
    },
    title="Clustering of Years Based on Climate Patterns"
)

fig_clusters.update_layout(
    template="plotly_white",
    legend_title="Climate Pattern"
)

fig_clusters.show()

▶ The clustering graph segments years into three climate patterns based on CO2 concentration and temperature change: low CO2 and temperature (green), moderate CO2 and temperature (orange), and high CO2 and temperature (blue). The progression from the green to the orange and then the blue cluster reflects a clear pattern of larger temperature changes accompanying higher CO2 levels, illustrating the correlation between greenhouse gas concentrations and global temperature variations.

▶ This clustering emphasizes the escalating association between CO2 levels and global temperature patterns, which illustrates the need for targeted interventions to mitigate future increases.
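K-Means alternates two steps until the centroids stop moving: assign each point to its nearest centroid, then move each centroid to the mean of its points. The sketch below is a from-scratch NumPy version of that loop (Lloyd's algorithm) with k-means++ seeding, which is also scikit-learn's default initialization, and a few restarts that keep the lowest within-cluster sum of squares. Because cluster IDs are arbitrary, it relabels clusters by the mean of their centroid coordinates so that 0 is always the lowest group; the same idea lets labels such as "Low", "Moderate", and "High" be assigned without hard-coding IDs. The demo uses three synthetic, well-separated blobs.

```python
import numpy as np

def _nearest(points, centroids):
    """Index of, and squared distance to, the closest centroid for each point."""
    d2 = ((points[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1), d2.min(axis=1)

def _kmeans_pp_init(points, k, rng):
    """k-means++ seeding: later centroids favour points far from earlier ones."""
    centroids = points[[rng.integers(len(points))]]
    while len(centroids) < k:
        _, d2 = _nearest(points, centroids)
        idx = rng.choice(len(points), p=d2 / d2.sum())
        centroids = np.vstack([centroids, points[idx]])
    return centroids

def kmeans_lloyd(points, k, n_init=5, n_iter=100, seed=42):
    """Lloyd's algorithm with several k-means++ restarts; keeps the lowest inertia.

    Returns labels ordered so cluster 0 has the lowest centroid mean and
    cluster k-1 the highest, plus the centroids in that order.
    """
    rng = np.random.default_rng(seed)
    points = np.asarray(points, dtype=float)
    best = None
    for _init in range(n_init):
        centroids = _kmeans_pp_init(points, k, rng)
        for _step in range(n_iter):
            labels, _ = _nearest(points, centroids)        # assignment step
            new_centroids = np.array([                      # update step
                points[labels == c].mean(axis=0) if np.any(labels == c) else centroids[c]
                for c in range(k)
            ])
            converged = np.allclose(new_centroids, centroids)
            centroids = new_centroids
            if converged:
                break
        labels, d2 = _nearest(points, centroids)
        inertia = d2.sum()                                  # within-cluster sum of squares
        if best is None or inertia < best[0]:
            best = (inertia, labels, centroids)

    _, labels, centroids = best
    order = np.argsort(centroids.mean(axis=1))              # old IDs, lowest to highest
    relabel = np.empty(k, dtype=int)
    relabel[order] = np.arange(k)
    return relabel[labels], centroids[order]

# Synthetic demo: three well-separated blobs centred at (-3,-3), (0,0) and (3,3)
rng = np.random.default_rng(0)
blobs = np.vstack([rng.normal(c, 0.3, size=(20, 2)) for c in (-3.0, 0.0, 3.0)])
labels, centers = kmeans_lloyd(blobs, k=3)
print(labels)
```

With scikit-learn, the same relabeling can be applied to `kmeans.cluster_centers_` after `fit_predict`.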

## **Predicting Temperature Changes with What-If Analysis**

In [9]:
# setting up a simple predictive model using linear regression
from sklearn.linear_model import LinearRegression

# Preparing data
X = merged_data[["CO2 Concentration"]].values
y = merged_data["Temperature Change"].values

model = LinearRegression()
model.fit(X, y)

# function to simulate "what-if" scenarios
def simulate_temperature_change(co2_percentage_change):
    # Calculating new CO2 concentrations
    current_mean_co2 = merged_data["CO2 Concentration"].mean()
    new_co2 = current_mean_co2 * (1 + co2_percentage_change / 100)

    # predicting temperature change
    predicted_temp = model.predict([[new_co2]])
    return predicted_temp[0]

# simulating scenarios
scenarios = {
    "Increase CO2 by 10%": simulate_temperature_change(10),
    "Decrease CO2 by 10%": simulate_temperature_change(-10),
    "Increase CO2 by 20%": simulate_temperature_change(20),
    "Decrease CO2 by 20%": simulate_temperature_change(-20)
}

scenarios

{'Increase CO2 by 10%': np.float64(1.0866445037958163),
 'Decrease CO2 by 10%': np.float64(-0.059993041237237144),
 'Increase CO2 by 20%': np.float64(1.6599632763123422),
 'Decrease CO2 by 20%': np.float64(-0.6333118137537621)}

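▶ Under this linear model, a 10% increase in mean CO2 raises the predicted temperature anomaly to about 1.09°C (1.66°C for a 20% increase), which illustrates how sensitive the fitted relationship is to CO2 levels. Conversely, 10% and 20% reductions lower the predicted anomaly to about -0.06°C and -0.63°C, which suggests that sustained reductions could potentially reverse some of the warming trend. Because these are linear extrapolations of a statistical fit rather than outputs of a physical climate model, the scenarios should be read as illustrative rather than as forecasts.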
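Because the what-if function feeds a single linear regression, each 10% change in mean CO2 shifts the prediction by the same amount. The snippet below checks this using only the four scenario outputs reported above. The prediction for unchanged CO2 sits halfway between the ±10% cases and, for an OLS fit with an intercept, equals the mean temperature change in `merged_data`.

```python
# Scenario outputs copied from the notebook run above (percent change -> predicted °C)
reported = {
    -20: -0.6333118137537621,
    -10: -0.059993041237237144,
    10: 1.0866445037958163,
    20: 1.6599632763123422,
}

# Prediction at the unchanged mean CO2: midpoint of the ±10% scenarios
baseline = (reported[10] + reported[-10]) / 2

# Effect of one 10% step in mean CO2 under the linear model
step_per_10pct = reported[10] - baseline

print(f"baseline: {baseline:.3f} °C, shift per 10% of mean CO2: {step_per_10pct:.3f} °C")
print(f"+20% vs +10%: {reported[20] - reported[10]:.3f} °C")
print(f"-10% vs -20%: {reported[-10] - reported[-20]:.3f} °C")
```

All three shifts come out to about 0.573 °C around a baseline of about 0.513 °C, confirming that the scenarios are evenly spaced points on one fitted line.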

## **SUMMARY**

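*   **Strong Positive Correlation**: There is a clear positive relationship between rising CO₂ concentrations and global temperature anomalies (Pearson r ≈ 0.96), although Granger causality tests found no strong evidence that CO₂ changes predict temperature changes at 1 to 3-year lags.
*   **Steady Upward Trends**: Both CO₂ concentrations and temperature anomalies rise steadily over time; the CO₂ slope is numerically larger, but the two slopes are in different units and cannot be compared directly.
*   **Time-Series & Clustering Insights**: Analyses reveal consistent trends of escalating CO₂ concentrations accompanying temperature increases.
*   **Seasonal Variations**: The annual CO₂ cycle, driven largely by plant photosynthesis and respiration, reflects the moderating role of natural carbon sinks.
*   **Lagged Effects**: Current CO₂ concentration has a significant positive coefficient, the one-year lag is significant but negative, and the two- and three-year lags are not significant; strong collinearity among these terms limits how far individual lag effects can be interpreted.
*   **Scenario Simulations**: “What-if” modeling shows that predicted temperature anomalies are highly sensitive to changes in CO₂ levels.
*   **Policy Implications**: The scenarios suggest that even modest reductions in CO₂ levels could substantially lower temperature anomalies, emphasizing the need for immediate and effective climate policies.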






