
In this analysis, we aim to explore the impact of carbon emissions on global temperatures. The focus will be on identifying historical trends, detecting anomalies, and simulating potential future scenarios to understand how changes in CO₂ concentrations influence temperature anomalies.

In [None]:
import pandas as pd

In [None]:
# load the datasets
temperature_data = pd.read_csv('temperature.csv')
co2_data = pd.read_csv('carbon_emmission.csv')

In [None]:
# preview temperature data
temperature_data.head()

Unnamed: 0,ObjectId,Country,ISO2,ISO3,F1961,F1962,F1963,F1964,F1965,F1966,...,F2013,F2014,F2015,F2016,F2017,F2018,F2019,F2020,F2021,F2022
0,1,"Afghanistan, Islamic Rep. of",AF,AFG,-0.113,-0.164,0.847,-0.764,-0.244,0.226,...,1.281,0.456,1.093,1.555,1.54,1.544,0.91,0.498,1.327,2.012
1,2,Albania,AL,ALB,0.627,0.326,0.075,-0.166,-0.388,0.559,...,1.333,1.198,1.569,1.464,1.121,2.028,1.675,1.498,1.536,1.518
2,3,Algeria,DZ,DZA,0.164,0.114,0.077,0.25,-0.1,0.433,...,1.192,1.69,1.121,1.757,1.512,1.21,1.115,1.926,2.33,1.688
3,4,American Samoa,AS,ASM,0.079,-0.042,0.169,-0.14,-0.562,0.181,...,1.257,1.17,1.009,1.539,1.435,1.189,1.539,1.43,1.268,1.256
4,5,"Andorra, Principality of",AD,AND,0.736,0.112,-0.752,0.308,-0.49,0.415,...,0.831,1.946,1.69,1.99,1.925,1.919,1.964,2.562,1.533,3.243


`temperature_data` contains annual temperature anomalies in degrees Celsius, with one row per country and one column per year (`F1961` through `F2022`).
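For grouping or plotting, the wide layout (one `F<year>` column per year) is often easier to handle in long form. A minimal sketch on a two-row toy frame with values copied from the preview above; the names `toy`, `long_form`, and `Anomaly` are illustrative and not part of the notebook:

```python
import pandas as pd

# toy frame mimicking the layout of temperature_data (values copied from the preview)
toy = pd.DataFrame({
    "Country": ["Albania", "Algeria"],
    "F1961": [0.627, 0.164],
    "F1962": [0.326, 0.114],
})

# reshape to one row per (country, year) pair
long_form = toy.melt(id_vars="Country", var_name="Year", value_name="Anomaly")

# strip the leading "F" and convert the year to an integer
long_form["Year"] = long_form["Year"].str.lstrip("F").astype(int)

print(long_form)
```

In long form, a per-year average becomes a plain `groupby("Year")` on the `Anomaly` column.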

In [None]:
# preview co2_data
co2_data.head()

Unnamed: 0,ObjectId,Country,Date,Value
0,1,World,1958M03,315.7
1,2,World,1958M04,317.45
2,3,World,1958M05,317.51
3,4,World,1958M06,317.24
4,5,World,1958M07,315.86


In [None]:
co2_data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1570 entries, 0 to 1569
Data columns (total 5 columns):
 #   Column    Non-Null Count  Dtype  
---  ------    --------------  -----  
 0   ObjectId  1570 non-null   int64  
 1   Country   1570 non-null   object 
 2   Date      1570 non-null   object 
 3   Value     1570 non-null   float64
 4   Month     1570 non-null   int64  
dtypes: float64(1), int64(2), object(2)
memory usage: 61.5+ KB


`co2_data` contains monthly global atmospheric CO₂ concentrations in parts per million (ppm).

**Calculate key statistics for temperature changes and CO₂ concentrations, such as mean, median, and variance**




In [None]:
# selecting and computing statistics for temperature changes
temperature_values = temperature_data.filter(regex='^F').stack()  # extracting all year columns
temperature_stats = {
    "Mean": temperature_values.mean(),
    "Median": temperature_values.median(),
    "Variance": temperature_values.var()
}

temperature_stats

{'Mean': 0.5377713483146068, 'Median': 0.47, 'Variance': 0.4294524831504378}

The mean temperature change is approximately 0.54°C, with a median of 0.47°C and a variance of 0.43 (a standard deviation of about 0.66°C). The spread across countries and years is therefore comparable in size to the mean itself.

In [None]:
# computing statistics for CO2 concentrations
co2_values = co2_data["Value"]  # extracting the Value column
co2_stats = {
    "Mean": co2_values.mean(),
    "Median": co2_values.median(),
    "Variance": co2_values.var()
}
co2_stats

{'Mean': 180.71615286624203, 'Median': 313.835, 'Variance': 32600.00200469294}

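For CO₂ concentrations, the mean is 180.72 ppm, the median is much higher at 313.84 ppm, and the variance is 32,600. A mean this far below the median, and below the 315 to 318 ppm readings shown in the preview, suggests that the `Value` column also contains a large group of much smaller entries, so these summary statistics should be interpreted with caution. The variance is also not directly comparable with that of the temperature anomalies, because the two series are measured in different units.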
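When the mean falls far below the median, as it does here, it is worth checking the range of the column before interpreting the statistics. The sketch below uses made-up numbers (not the notebook's data) to show how a group of small values mixed into a ppm-scale column drags the mean down while the median stays near the ppm readings; `values` is a hypothetical stand-in for `co2_data["Value"]`.

```python
import pandas as pd

# hypothetical column: four ppm-scale readings mixed with three small values
values = pd.Series([315.7, 317.45, 317.51, 317.24, 0.5, 0.4, 0.6])

# the minimum and quartiles expose the two groups
print(values.describe())

# a simple check: how many entries sit far below the bulk of the data?
small = values[values < 100]
print(len(small), "of", len(values), "values are below 100")
```

If a similar check on the real column reveals two groups, filtering to the intended group before computing statistics would be the natural next step.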

**Time-Series Analysis** :                               
A statistical method that analyzes data points gathered over time. It can help identify patterns and trends in data, which can be used to make predictions.

**Examine how temperature changes and CO₂ concentrations have evolved over time and the relationships between them**

In [None]:
import plotly.graph_objects as go
import plotly.express as px

In [None]:
# temperature: averaging across countries for each year
temperature_years = temperature_data.filter(regex='^F').mean(axis=0)
temperature_years.index = temperature_years.index.str.replace('F', '').astype(int)

In [None]:
# CO2: parsing year and averaging monthly data
co2_data['Year'] = co2_data['Date'].str[:4].astype(int)
co2_yearly = co2_data.groupby('Year')['Value'].mean()

In [None]:
# time-series plot for temperature and CO2 levels
fig = go.Figure()
fig.add_trace(go.Scatter(
    x=temperature_years.index, y=temperature_years.values,
    mode='lines+markers', name="Temperature Change (°C)"
))
fig.add_trace(go.Scatter(
    x=co2_yearly.index, y=co2_yearly.values,
    mode='lines+markers', name="CO₂ Concentration (ppm)", line=dict(dash='dash')
))
fig.update_layout(
    title="Time-series of Temperature Change and CO₂ Concentrations",
    xaxis_title="Year",
    yaxis_title="Values",
    template="plotly_dark",
    legend_title="Metrics"
)
fig.show()

The time-series graph shows a consistent increase in CO₂ concentrations (measured in ppm) over the years, which indicates the accumulation of greenhouse gases in the atmosphere. Simultaneously, a slight upward trend in global temperature change suggests that rising CO₂ levels are associated with global warming. The temporal alignment supports the hypothesis of CO₂’s significant contribution to temperature increase.
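Because both series are drawn against a single y-axis, the ppm-scale CO₂ curve visually compresses the temperature curve, whose values span only a few degrees. One option, sketched here with made-up numbers, is to rescale each series to the 0 to 1 range before plotting; `min_max` is a hypothetical helper and not part of the notebook.

```python
import pandas as pd

def min_max(series: pd.Series) -> pd.Series:
    """Rescale a series linearly so its minimum maps to 0 and its maximum to 1."""
    return (series - series.min()) / (series.max() - series.min())

# made-up yearly values, for illustration only
years = [2000, 2001, 2002, 2003]
temp = pd.Series([0.1, 0.3, 0.2, 0.6], index=years)
co2 = pd.Series([370.0, 372.0, 374.0, 376.0], index=years)

scaled = pd.DataFrame({"Temperature": min_max(temp), "CO₂": min_max(co2)})
print(scaled)
```

The rescaled columns can be plotted together so that the shape of each curve is visible. A secondary y-axis is another common choice when the original units should stay on the chart.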

In [None]:
# correlation heatmap for temperature and CO2 levels
merged_data = pd.DataFrame({
    "Temperature Change": temperature_years,
    "CO₂ Concentration": co2_yearly
}).dropna()

heatmap_fig = px.imshow(
    merged_data.corr(),
    text_auto=".2f",
    color_continuous_scale="RdBu",  # diverging colormap similar to coolwarm
    title="Correlation Heatmap"
)
heatmap_fig.update_layout(
    template="plotly_dark"
)
heatmap_fig.show()

The heatmap reveals a strong positive correlation (0.96) between CO₂ concentrations and temperature changes. This statistical relationship reinforces the observation that higher CO₂ levels are closely linked with increasing global temperatures, which highlights the importance of addressing carbon emissions to mitigate climate change.

In [None]:
# scatter plot: temperature vs CO2 concentrations
scatter_fig = px.scatter(
    merged_data,
    x="CO₂ Concentration", y="Temperature Change",
    labels={"CO₂ Concentration": "CO₂ Concentration (ppm)", "Temperature Change": "Temperature Change (°C)"},
    title="Temperature Change vs CO₂ Concentration",
    template="plotly_dark"
)
scatter_fig.update_traces(marker=dict(size=10, opacity=0.7))
scatter_fig.show()

The scatter plot shows a clear linear trend, where higher CO₂ concentrations correspond to greater temperature changes. This visual evidence underscores the direct relationship between CO₂ emissions and global warming, which provides further support for policies targeting reductions in carbon emissions to combat climate impacts.

**Trends and Seasonal Variations Analysis**

Identify long-term trends using linear regression, and seasonal variations using monthly averages.

In [None]:
from scipy.stats import linregress

In [None]:
# temperature trend
temp_trend = linregress(temperature_years.index, temperature_years.values)
temp_trend_line = temp_trend.slope * temperature_years.index + temp_trend.intercept
temp_trend  # display the fitted slope, intercept, r-value, and p-value

LinregressResult(slope=0.02617007657931695, intercept=-51.604381776430415, rvalue=0.937864499613493, pvalue=2.8698368576501183e-29, stderr=0.001250031262730772, intercept_stderr=2.489537765485542)

In [None]:
# CO2 trend
co2_trend = linregress(co2_yearly.index, co2_yearly.values)
co2_trend_line = co2_trend.slope * co2_yearly.index + co2_trend.intercept

In [None]:
# visualization of the CO₂ concentration and temperature change trends
fig_trends = go.Figure()

fig_trends.add_trace(go.Scatter(
    x=temperature_years.index, y=temperature_years.values,
    mode='lines+markers', name="Temperature Change (°C)"
))
fig_trends.add_trace(go.Scatter(
    x=temperature_years.index, y=temp_trend_line,
    mode='lines', name=f"Temperature Trend (Slope: {temp_trend.slope:.2f})", line=dict(dash='dash')
))
fig_trends.add_trace(go.Scatter(
    x=co2_yearly.index, y=co2_yearly.values,
    mode='lines+markers', name="CO₂ Concentration (ppm)"
))
fig_trends.add_trace(go.Scatter(
    x=co2_yearly.index, y=co2_trend_line,
    mode='lines', name=f"CO₂ Trend (Slope: {co2_trend.slope:.2f})", line=dict(dash='dash')
))

fig_trends.update_layout(
    title="Trends in Temperature Change and CO₂ Concentrations",
    xaxis_title="Year",
    yaxis_title="Values",
    template="plotly_white",
    legend_title="Metrics"
)
fig_trends.show()


The graph shows the linear trends in temperature change and CO₂ concentration over time, each summarized by its slope. The fitted CO₂ slope is 0.32 ppm per year and the temperature slope is 0.03°C per year. Because the two slopes are expressed in different units, their magnitudes cannot be compared directly. What the fits do show is that both series rise steadily over the period, and the warming, although small in any single year, accumulates over decades and may have long-term consequences.
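Slopes measured in ppm per year and °C per year carry different units, so their raw magnitudes are hard to compare. One way to put them on a common footing, sketched below with made-up series, is to standardize each series before fitting, so that each slope is expressed in standard deviations per year. `standardized_slope` is a hypothetical helper and not part of the notebook.

```python
import numpy as np

def standardized_slope(years: np.ndarray, values: np.ndarray) -> float:
    """Slope of a least-squares line fitted to the z-scored values (std devs per year)."""
    z = (values - values.mean()) / values.std()
    slope, _intercept = np.polyfit(years, z, deg=1)
    return float(slope)

years = np.arange(2000, 2010)
temp = 0.03 * (years - 2000) + 0.2   # made-up anomaly series, °C
co2 = 2.0 * (years - 2000) + 370.0   # made-up concentration series, ppm

print(standardized_slope(years, temp))
print(standardized_slope(years, co2))
```

Both made-up series are exactly linear, so their standardized slopes coincide even though the raw slopes differ by a large factor.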

In [None]:
# seasonal variations in CO2 concentrations
co2_data['Month'] = co2_data['Date'].str[-2:].astype(int)
co2_monthly = co2_data.groupby('Month')['Value'].mean()

fig_seasonal = px.line(
    co2_monthly,
    x=co2_monthly.index,
    y=co2_monthly.values,
    labels={"x": "Month", "y": "CO₂ Concentration (ppm)"},
    title="Seasonal Variations in CO₂ Concentrations",
    markers=True
)
fig_seasonal.update_layout(
    xaxis=dict(tickmode="array", tickvals=list(range(1, 13))),
    template="plotly_white"
)
fig_seasonal.show()

The above graph highlights the seasonal fluctuations in CO₂ concentrations, which peak during late spring and early summer (around May) and reach the lowest levels in fall (around September). These variations are likely due to natural processes such as plant photosynthesis, which absorbs CO₂ during the growing season, and respiration, which releases CO₂ in the off-season. This seasonal cycle underscores the role of natural carbon sinks in moderating atmospheric CO₂ levels.

**Correlation and Causality Analysis**

To quantify the relationship between CO₂ and temperature anomalies, we now compute the **Pearson and Spearman correlation coefficients**. To investigate whether changes in CO₂ precede and help predict temperature anomalies, we then perform **Granger causality tests**.

In [None]:
from scipy.stats import pearsonr, spearmanr
from statsmodels.tsa.stattools import grangercausalitytests

**Spearman Correlation:** Spearman’s rank correlation coefficient measures how strongly two variables are connected without assuming a straight-line relationship. It assesses whether one variable tends to consistently increase or consistently decrease as the other increases, which is called a monotonic relationship. Even if the data do not fall on a straight line, Spearman’s correlation can reveal this underlying trend.

**Pearson Correlation Coefficient:** The Pearson correlation coefficient, also known as the linear correlation coefficient, quantifies the strength and direction of a linear relationship between two continuous variables. It ranges from -1 to 1: values close to -1 indicate a strong negative linear relationship, values close to 1 indicate a strong positive linear relationship, and 0 indicates no linear relationship.
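Both coefficients can be checked by hand: Pearson's coefficient is the covariance divided by the product of the standard deviations, and Spearman's coefficient is Pearson's coefficient computed on ranks. A minimal sketch with made-up data (`x`, `y`, and the `pearson` helper are illustrative):

```python
import numpy as np
import pandas as pd

x = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0])
y = pd.Series([1.0, 4.0, 9.0, 16.0, 25.0])  # monotonic in x, but not linear

def pearson(a: pd.Series, b: pd.Series) -> float:
    """Covariance of a and b divided by the product of their standard deviations."""
    a_c, b_c = a - a.mean(), b - b.mean()
    return float((a_c * b_c).sum() / np.sqrt((a_c ** 2).sum() * (b_c ** 2).sum()))

pearson_r = pearson(x, y)
spearman_r = pearson(x.rank(), y.rank())  # Spearman = Pearson on the ranks

print(f"Pearson:  {pearson_r:.4f}")
print(f"Spearman: {spearman_r:.4f}")
```

Because `y` rises whenever `x` rises, the ranks agree perfectly and the Spearman coefficient reaches 1, while the curvature keeps the Pearson coefficient slightly below 1.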

In [None]:
# pearson and spearman correlation coefficients
pearson_corr, _ = pearsonr(merged_data["CO₂ Concentration"], merged_data["Temperature Change"])
spearman_corr, _ = spearmanr(merged_data["CO₂ Concentration"], merged_data["Temperature Change"])
print("pearson_corr : ", pearson_corr)
print("spearman_corr : ", spearman_corr)

pearson_corr :  0.9554282559257312
spearman_corr :  0.9379013371609882


Pearson Correlation (0.9554) indicates a very strong linear relationship between CO₂ concentrations and temperature changes. Spearman Correlation (0.9379) indicates a very strong monotonic relationship between CO₂ concentrations and temperature changes.

In [None]:
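# granger causality test
# note: the test checks whether the second column (CO₂ Concentration)
# helps predict the first column (Temperature Change)
granger_data = merged_data.diff().dropna()  # first differencing to make data stationary
granger_results = grangercausalitytests(granger_data, maxlag=3, verbose=False)
print("granger_results :", granger_results)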

granger_results : {1: ({'ssr_ftest': (3.3160399767925055, 0.07385687345425217, 57.0, 1), 'ssr_chi2test': (3.49056839662369, 0.06171939026911884, 1), 'lrtest': (3.3928082074778843, 0.06548133596441778, 1), 'params_ftest': (3.3160399767925135, 0.07385687345425185, 57.0, 1.0)}, [<statsmodels.regression.linear_model.RegressionResultsWrapper object at 0x7da3ac76ffd0>, <statsmodels.regression.linear_model.RegressionResultsWrapper object at 0x7da3ac74a550>, array([[0., 1., 0.]])]), 2: ({'ssr_ftest': (0.3591506212976526, 0.6999244302356598, 54.0, 2), 'ssr_chi2test': (0.7848106169096851, 0.675430300863396, 2), 'lrtest': (0.7796367217699753, 0.677179865679328, 2), 'params_ftest': (0.35915062129765707, 0.6999244302356555, 54.0, 2.0)}, [<statsmodels.regression.linear_model.RegressionResultsWrapper object at 0x7da3a066c5d0>, <statsmodels.regression.linear_model.RegressionResultsWrapper object at 0x7da3a066c810>, array([[0., 0., 1., 0., 0.],
       [0., 0., 0., 1., 0.]])]), 3: ({'ssr_ftest': (1.0755


verbose is deprecated since functions should not print results



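**Granger causality test**: The Granger causality test is a statistical method that assesses whether past values of one time series help predict another time series beyond what that series' own past values already explain. It relies on the chronological order of events (the cause must precede the effect), so it measures predictive precedence rather than proving physical causation.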
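The idea can be sketched by hand for a single lag: fit one regression of `y` on its own lag, fit a second that also includes the lag of `x`, and compare the residual sums of squares with an F statistic. This is an illustration on synthetic data in which `x` is built to lead `y`. It mirrors the idea behind an SSR-based F test and is not a reimplementation of `grangercausalitytests`.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
x = rng.normal(size=n)
y = np.zeros(n)
for t in range(1, n):
    # by construction, y depends on the previous value of x
    y[t] = 0.8 * x[t - 1] + 0.1 * rng.normal()

def ssr(design: np.ndarray, target: np.ndarray) -> float:
    """Sum of squared residuals of an ordinary least-squares fit."""
    coef, *_ = np.linalg.lstsq(design, target, rcond=None)
    resid = target - design @ coef
    return float(resid @ resid)

target = y[1:]
const = np.ones(n - 1)
restricted = np.column_stack([const, y[:-1]])            # own lag only
unrestricted = np.column_stack([const, y[:-1], x[:-1]])  # own lag plus lag of x

ssr_r, ssr_u = ssr(restricted, target), ssr(unrestricted, target)
q = 1                                                    # number of added lag terms
dof = len(target) - unrestricted.shape[1]                # residual degrees of freedom
f_stat = ((ssr_r - ssr_u) / q) / (ssr_u / dof)
print(f"F statistic for 'x helps predict y': {f_stat:.1f}")
```

A large F statistic means that adding the lag of `x` reduces the prediction error by far more than chance would explain, which is what the test reports as evidence of Granger causality.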

In [None]:
# extracting p-values for causality
granger_p_values = {f"Lag {lag}": round(results[0]['ssr_chi2test'][1], 4)
                    for lag, results in granger_results.items()}

print("granger_p_values : ", granger_p_values)

granger_p_values :  {'Lag 1': 0.0617, 'Lag 2': 0.6754, 'Lag 3': 0.2994}


**Granger Causality Test:** The p-values for lags 1, 2, and 3 are as follows:

Lag 1: 0.0617 (slightly above the common significance threshold of 0.05, suggesting weak evidence for causality).

Lag 2: 0.6754 (not significant, no evidence of causality).

Lag 3: 0.2994 (not significant, no evidence of causality).

There is a very strong correlation between CO₂ concentrations and temperature changes. However, Granger Causality tests do not provide strong evidence that changes in CO₂ concentrations directly cause changes in temperature within the lags tested.

**Lagged Effects Analysis**: Analyze whether CO₂ concentrations from previous years (lagged values) influence current temperature anomalies. This allows us to test whether historical CO₂ levels have a delayed impact on temperature changes. Here we use an Ordinary Least Squares (OLS) regression model.

In [None]:
import statsmodels.api as sm

In [None]:
# creating lagged CO2 data to investigate lagged effects
merged_data['CO₂ Lag 1'] = merged_data["CO₂ Concentration"].shift(1)
merged_data['CO₂ Lag 2'] = merged_data["CO₂ Concentration"].shift(2)
merged_data['CO₂ Lag 3'] = merged_data["CO₂ Concentration"].shift(3)

In [None]:
# dropping rows with NaN due to lags
lagged_data = merged_data.dropna()

X = lagged_data[['CO₂ Concentration', 'CO₂ Lag 1', 'CO₂ Lag 2', 'CO₂ Lag 3']]
y = lagged_data['Temperature Change']
X = sm.add_constant(X)  # adding a constant for intercept

model = sm.OLS(y, X).fit()

model_summary = model.summary()
model_summary

0,1,2,3
Dep. Variable:,Temperature Change,R-squared:,0.949
Model:,OLS,Adj. R-squared:,0.945
Method:,Least Squares,F-statistic:,252.5
Date:,"Thu, 13 Mar 2025",Prob (F-statistic):,2.97e-34
Time:,15:16:12,Log-Likelihood:,45.098
No. Observations:,59,AIC:,-80.2
Df Residuals:,54,BIC:,-69.81
Df Model:,4,,
Covariance Type:,nonrobust,,

0,1,2,3,4,5,6
,coef,std err,t,P>|t|,[0.025,0.975]
const,-4.7980,0.317,-15.137,0.000,-5.434,-4.163
CO₂ Concentration,0.3245,0.055,5.942,0.000,0.215,0.434
CO₂ Lag 1,-0.2962,0.068,-4.361,0.000,-0.432,-0.160
CO₂ Lag 2,0.0104,0.068,0.153,0.879,-0.126,0.146
CO₂ Lag 3,-0.0107,0.056,-0.191,0.849,-0.123,0.101

0,1,2,3
Omnibus:,2.369,Durbin-Watson:,1.554
Prob(Omnibus):,0.306,Jarque-Bera (JB):,2.077
Skew:,-0.457,Prob(JB):,0.354
Kurtosis:,2.902,Cond. No.,7540.0


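The OLS regression results indicate a strong relationship between CO₂ concentration and temperature change, with an R-squared value of 0.949, meaning 94.9% of the variance in temperature change is explained by the model. The coefficient for current CO₂ concentration (0.3245) is positive and statistically significant (p < 0.05). The Lag 1 coefficient (-0.2962) is also significant but negative and nearly offsets it, while Lags 2 and 3 are not significant (p = 0.879 and p = 0.849). Together with the large condition number (7540), this pattern suggests strong multicollinearity among the CO₂ terms, so the individual coefficients should be interpreted with caution.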
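Yearly CO₂ values tend to change little from one year to the next, so the current and lagged regressors are likely to be highly correlated with one another, which is consistent with the large condition number (7540) in the summary above. A quick way to inspect this, sketched with a made-up steadily rising series rather than the notebook's data, is the correlation matrix of the regressors:

```python
import numpy as np
import pandas as pd

# made-up, steadily rising series standing in for yearly CO₂ (not real data)
rng = np.random.default_rng(1)
co2 = pd.Series(315 + 1.5 * np.arange(60) + rng.normal(scale=0.3, size=60))

lags = pd.DataFrame({
    "CO₂": co2,
    "CO₂ Lag 1": co2.shift(1),
    "CO₂ Lag 2": co2.shift(2),
    "CO₂ Lag 3": co2.shift(3),
}).dropna()

corr = lags.corr()
print(corr.round(4))
```

When the off-diagonal correlations are close to 1, the regression cannot cleanly separate the effect of one lag from another, and coefficients of opposite sign that nearly cancel are a typical symptom.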

**Clustering Climate Patterns**

We group years based on similarities in temperature anomalies and CO₂ concentrations using K-Means clustering.

In [None]:
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler
import numpy as np

In [None]:
# preparing the data for clustering
clustering_data = merged_data[["Temperature Change", "CO₂ Concentration"]].dropna()

scaler = StandardScaler()
scaled_data = scaler.fit_transform(clustering_data)

In [None]:
# applying K-Means clustering
kmeans = KMeans(n_clusters=3, random_state=42)  # assuming 3 clusters for simplicity
clustering_data['Cluster'] = kmeans.fit_predict(scaled_data)

In [None]:
# adding labels for periods with similar climate patterns
clustering_data['Label'] = clustering_data['Cluster'].map({
    0: 'Moderate Temp & CO₂',
    1: 'High Temp & CO₂',
    2: 'Low Temp & CO₂'
})

In [None]:
# visualization of the clustering of years based on climate patterns
import plotly.express as px

fig_clusters = px.scatter(
    clustering_data,
    x="CO₂ Concentration",
    y="Temperature Change",
    color="Label",
    color_discrete_sequence=px.colors.qualitative.Set2,
    labels={
        "CO₂ Concentration": "CO₂ Concentration (ppm)",
        "Temperature Change": "Temperature Change (°C)",
        "Label": "Climate Pattern"
    },
    title="Clustering of Years Based on Climate Patterns"
)

fig_clusters.update_layout(
    template="plotly_dark",
    legend_title="Climate Pattern"
)

fig_clusters.show()

The clustering graph segments years into three distinct climate patterns based on CO₂ concentration and temperature change: **low CO₂ and temperature (green), moderate CO₂ and temperature (orange), and high CO₂ and temperature (blue).**
The progression from green to orange and then to blue clusters reflects a clear trend of increasing temperature change corresponding to rising CO₂ levels, effectively illustrating the correlation between greenhouse gas concentrations and global temperature variations.



This clustering emphasizes the cumulative and escalating impact of carbon emissions on global temperature patterns, which illustrates the need for targeted interventions to mitigate future increases.
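K-Means assigns cluster numbers arbitrarily, so a fixed mapping from cluster id to label can attach the wrong name if the numbering changes between runs or library versions. A sketch of a more robust alternative ranks the clusters by their mean CO₂ level before naming them; the toy frame and its cluster ids are made up for illustration.

```python
import pandas as pd

# toy data with arbitrary cluster ids, as K-Means might return them
toy = pd.DataFrame({
    "CO₂ Concentration": [320, 325, 360, 365, 400, 410],
    "Cluster": [2, 2, 0, 0, 1, 1],
})

# order cluster ids from lowest to highest mean CO₂ and name them accordingly
order = toy.groupby("Cluster")["CO₂ Concentration"].mean().sort_values().index
names = ["Low Temp & CO₂", "Moderate Temp & CO₂", "High Temp & CO₂"]
label_map = dict(zip(order, names))

toy["Label"] = toy["Cluster"].map(label_map)
print(toy)
```

Ranking by CO₂ alone assumes that temperature rises with CO₂ across clusters, which matches the pattern described above but should be checked against the cluster means.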



**Predicting Temperature Changes Under What-If Scenarios**

In [None]:
# setting up a simple predictive model using linear regression
from sklearn.linear_model import LinearRegression

In [None]:
# Preparing data
X = merged_data[["CO₂ Concentration"]].values  # CO₂ concentration as input
y = merged_data["Temperature Change"].values   # temperature change as target

In [None]:
model = LinearRegression()
model.fit(X, y)

In [None]:
# function to simulate "what-if" scenarios
def simulate_temperature_change(co2_percentage_change):
    # Calculate new CO2 concentrations
    current_mean_co2 = merged_data["CO₂ Concentration"].mean()
    new_co2 = current_mean_co2 * (1 + co2_percentage_change / 100)

    # predict temperature change
    predicted_temp = model.predict([[new_co2]])
    return predicted_temp[0]


In [None]:
# simulating scenarios
scenarios = {
    "Increase CO₂ by 10%": simulate_temperature_change(10),
    "Decrease CO₂ by 10%": simulate_temperature_change(-10),
    "Increase CO₂ by 20%": simulate_temperature_change(20),
    "Decrease CO₂ by 20%": simulate_temperature_change(-20),
}
scenarios

{'Increase CO₂ by 10%': 1.0866445037958163,
 'Decrease CO₂ by 10%': -0.059993041237237144,
 'Increase CO₂ by 20%': 1.6599632763123422,
 'Decrease CO₂ by 20%': -0.6333118137537621}

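Each scenario value is the model's predicted temperature anomaly when the mean CO₂ concentration of the dataset is scaled by the given percentage. A 10% increase gives a predicted anomaly of about 1.09°C and a 20% increase about 1.66°C, while 10% and 20% decreases give about -0.06°C and -0.63°C. Under this linear model, every 10% change in CO₂ shifts the predicted anomaly by roughly 0.57°C, which demonstrates the sensitivity of the predictions to CO₂ levels. Because the model is a simple linear fit to historical data, these figures are illustrative rather than forecasts.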
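The values returned by `simulate_temperature_change` are predicted anomalies at the scaled CO₂ level rather than changes relative to a baseline. To report the change itself, subtract the prediction at the baseline. A minimal sketch with a made-up, exactly linear relation, using NumPy's `polyfit` in place of scikit-learn; the numbers and the `predicted_delta` helper are hypothetical:

```python
import numpy as np

# made-up, exactly linear relation between CO₂ (ppm) and anomaly (°C)
co2 = np.array([320.0, 340.0, 360.0, 380.0, 400.0])
temp = 0.01 * co2 - 3.0

slope, intercept = np.polyfit(co2, temp, deg=1)

def predicted_delta(pct_change: float) -> float:
    """Predicted anomaly at the scaled mean CO₂ minus the prediction at the mean."""
    baseline = co2.mean()
    scenario = baseline * (1 + pct_change / 100)
    return float(slope * (scenario - baseline))

for pct in (10, -10, 20, -20):
    print(f"{pct:+d}% CO₂ -> {predicted_delta(pct):+.2f} °C relative to baseline")
```

With a linear model the intercept cancels in the difference, so increases and decreases of the same percentage give changes of equal size and opposite sign.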

**Summary**:  
Our analysis highlights a strong positive correlation between rising CO₂ concentrations and global temperature anomalies. Time-series and clustering analyses reveal clear trends of rising CO₂ levels accompanied by temperature increases, while seasonal variations underscore the moderating role of natural carbon sinks. The lagged regression suggests that current CO₂ levels have the most significant association with temperature changes, with diminishing influence from earlier years. Simulating “what-if” scenarios demonstrates the sensitivity of predicted temperatures to CO₂ levels, which emphasizes that even modest reductions in emissions could significantly mitigate global warming. These findings underline the urgent need for actionable policies to address climate change effectively.