# Temperature Rise Prediction
This notebook analyzes global temperature trends and builds forecasting models.

## This notebook is the work of MIKOŁAJ HOJDA and can be found [here](https://www.kaggle.com/code/mikolajhojda/predictions-of-the-average-temperature-rise)
It focuses on global change using a single data source for the whole Earth, which in my opinion is misleading, but it provides a good reference point for delving deeper into local changes.

# Background

Climate change is one of the biggest threats to our planet, so I decided to predict the future average land temperature. I built three models, each fitted to a different period (1750-2015, 1850-2015, and 1950-2015), because the rate of temperature growth has varied over nearly three centuries. Data from the 18th and 19th centuries are also more scattered than recent data, which may be a consequence of less accurate measurement technology.
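
To illustrate why the fitting window matters, here is a minimal sketch on synthetic data (not the Kaggle dataset). When warming accelerates over time, a least-squares line fitted only to recent years has a steeper slope than one fitted to the whole record. The synthetic curve, the noise level, and the helper `slope_since` are assumptions made for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)
years = np.arange(1750, 2016)
# Synthetic, accelerating series: quadratic in time since 1750, plus noise.
temps = 8.0 + 5e-5 * (years - 1750) ** 2 + rng.normal(0, 0.1, years.size)

def slope_since(start_year):
    """Least-squares slope (°C per year) using only years >= start_year."""
    mask = years >= start_year
    slope, _intercept = np.polyfit(years[mask], temps[mask], deg=1)
    return slope

# Fit one line per window, mirroring the three periods used in this notebook.
slopes = {start: slope_since(start) for start in (1750, 1850, 1950)}
for start, slope in slopes.items():
    print(f"{start}-2015: {slope * 10:.4f} °C per decade")
```

On this synthetic series the later the window starts, the larger the fitted rate, which is the pattern the three models below are meant to capture on the real data.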

# Content:
1. [Setup](#1)
2. [EDA](#2)
3. [Trends](#3)
    * [Create a Trend Feature](#3.1)
4. [Linear Regression](#4)
    * [1750 - 2015](#4.1)
    * [1850 - 2015](#4.2)
    * [1950 - 2015](#4.3)

# <a id="1">Setup</a>

In [None]:
import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)
import matplotlib.pyplot as plt
import datetime as dt
from scipy.stats import pearsonr
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression

from learntools.time_series.style import *  # Kaggle Learn plot style; provides `plot_params` used below

import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))

In [None]:
temperatures = pd.read_csv('../input/climate-change-earth-surface-temperature-data/GlobalTemperatures.csv')

# <a id="2">EDA</a>

In [None]:
temperatures.head()

In [None]:
max(temperatures.dt)

In [None]:
temperatures.dtypes

In [None]:
temperatures.describe()

In [None]:
plt.figure(figsize=(18,10))
plt.scatter(data = temperatures, x = 'dt',y = 'LandAverageTemperature')
plt.show()

In [None]:
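# Dates in the `dt` column are formatted as YYYY-MM-DD
temperatures['Date'] = pd.to_datetime(temperatures.dt, format='%Y-%m-%d')
temperatures['Year'] = temperatures['Date'].dt.year
# Convert each date to its proleptic Gregorian ordinal (an integer day count)
temperatures['Date'] = temperatures['Date'].map(dt.datetime.toordinal)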

In [None]:
df = temperatures.groupby('Year')['LandAverageTemperature'].mean().reset_index()

In [None]:
plt.figure(figsize=(18,10))
plt.scatter(data = df, x = 'Year',y = 'LandAverageTemperature')
plt.show()

# <a id="3">Trends</a>

In [None]:
temperature_px = df['LandAverageTemperature']
df['10'] = temperature_px.rolling(window=10).mean()

plt.figure(figsize=(18,10))
ax = plt.subplot()
ax.plot(df['LandAverageTemperature'], alpha=0.8, label='land average temperature')
ax.plot(df['10'], color="orange", label='10-year land average temperature')
ax.set_xticks([0,50,100,150,200,250])
ax.set_xticklabels([1750,1800,1850,1900,1950,2000])
plt.xlabel('Years')
plt.ylabel('Temperature (in °C)')
plt.grid()
plt.legend()
plt.show()
plt.clf()

In [None]:
ax = df['LandAverageTemperature'].plot(**plot_params)
ax.set(title="Land Average Temperature per Year in the last 250 years", ylabel="Land Average Temperature")
ax.set_xticks([0,50,100,150,200,250])
ax.set_xticklabels([1750,1800,1850,1900,1950,2000])
plt.show()

## <a id="3.1">Create a Trend Feature</a>

In [None]:
trend = df['LandAverageTemperature'].rolling(
    window=10,
    center=True,
    min_periods=6,
).mean()

ax = df['LandAverageTemperature'].plot(**plot_params, alpha=0.5)
ax = trend.plot(ax=ax, linewidth=3)
ax.set(title="Land Average Temperature in the last 250 years", ylabel="Land Average Temperature")
ax.set_xticks([0,50,100,150,200,250])
ax.set_xticklabels([1750,1800,1850,1900,1950,2000])
plt.show()

In [None]:
from statsmodels.tsa.deterministic import DeterministicProcess

# Select the column before aggregating so non-numeric columns (e.g. `dt`) are not averaged
average_temperature = temperatures.groupby('Year')['LandAverageTemperature'].mean()
y = average_temperature.copy()  # the target

# Cubic trend features (order=3); LinearRegression fits the intercept itself
dp = DeterministicProcess(index=y.index, order=3)
X = dp.in_sample()
X_fore = dp.out_of_sample(steps=90)  # features for the next 90 years

In [None]:
model = LinearRegression()
model.fit(X, y)

y_pred = pd.Series(model.predict(X), index=X.index)
y_fore = pd.Series(model.predict(X_fore), index=X_fore.index)

ax = y.plot(**plot_params, alpha=0.5, title="Average Land Temperature", ylabel="Land Temperature")
ax = y_pred.plot(ax=ax, linewidth=3, label="Trend", color='C0')
ax = y_fore.plot(ax=ax, linewidth=3, label="Trend Forecast", color='C3')
ax.legend();

# <a id="4">Linear Regression</a>

## <a id="4.1">Data from 1750 - 2015</a>

In [None]:
corr, p = pearsonr(df['Year'], df['LandAverageTemperature'])
print('Pearson correlation of Year and Land Average Temperature: ' + str(corr))

Strong positive correlation
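
A Pearson coefficient close to 1 means the points lie close to an increasing straight line. For a straight-line fit with an intercept, the squared coefficient equals the in-sample R², which ties this check to the regression that follows. The sketch below demonstrates the identity on synthetic data; it uses `np.corrcoef` and `np.polyfit` in place of `pearsonr` and `LinearRegression` to stay self-contained, and all values in it are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
x = np.arange(1950, 2016, dtype=float)
# Synthetic data: a linear trend plus noise (not the notebook's dataset).
y = 8.5 + 0.02 * (x - 1950) + rng.normal(0, 0.2, x.size)

r = np.corrcoef(x, y)[0, 1]                 # Pearson correlation coefficient

slope, intercept = np.polyfit(x, y, deg=1)  # least-squares line
residuals = y - (intercept + slope * x)
r_squared = 1 - residuals.var() / y.var()   # in-sample R^2 of that line

print(f"r = {r:.3f}, r^2 = {r**2:.3f}, R^2 = {r_squared:.3f}")
```

Note that `lr.score(X_test, y_test)` in the cells below is computed on held-out data, so it will generally differ from the squared correlation of the full sample.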

In [None]:
lr = LinearRegression()

In [None]:
X = df['Year']
y = df['LandAverageTemperature']

X = X.values.reshape(-1,1)

In [None]:
X_train, X_test, y_train, y_test = train_test_split(X, y, train_size = 0.8)

In [None]:
lr.fit(X_train, y_train)
lr.score(X_test, y_test)

In [None]:
y_pred = lr.predict(X_test)

In [None]:
years = pd.DataFrame(X_test)

In [None]:
plt.figure(figsize=(18,10))
plt.scatter(X, y, alpha=0.6)
plt.plot(X_test, y_pred, color="orange")
plt.xlabel('Years')
plt.ylabel('Temperature (in °C)')
plt.show()
plt.clf()

In [None]:
print(lr.coef_)
print(10 * lr.coef_)

In [None]:
print(lr.predict(np.array([2030, 2050]).reshape(-1,1)))

#### Interpretation

On average, the land temperature rises by about 0.0047 °C per year, or 0.0475 °C per decade.
Extrapolating this trend, the model predicts an average land temperature of 9.0572 °C in 2030 and 9.1521 °C in 2050.
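
The figures above can be cross-checked with a little arithmetic. A fitted line predicts `intercept + slope * year`, so two predictions 20 years apart must differ by 20 times the annual slope. The sketch below uses only the rounded numbers quoted in this interpretation; the variable names are illustrative.

```python
slope_per_decade = 0.0475   # °C per decade, as quoted above
pred_2030 = 9.0572          # °C, as quoted above
pred_2050 = 9.1521          # °C, as quoted above

slope_per_year = slope_per_decade / 10
# Moving 20 years along the fitted line adds 20 * slope to the prediction.
implied_2050 = pred_2030 + 20 * slope_per_year

print(f"implied 2050 prediction: {implied_2050:.4f} °C")
print(f"difference from quoted value: {abs(implied_2050 - pred_2050):.4f} °C")
```

The quoted values agree to within rounding. Because `train_test_split` is called without a fixed `random_state`, re-running the notebook draws a different training set and will give slightly different coefficients and predictions.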

## <a id="4.2">Data from 1850 - 2015</a>

In [None]:
corr, p = pearsonr(df[df['Year'] >= 1850]['Year'], df[df['Year'] >= 1850]['LandAverageTemperature'])
print('Pearson correlation of Year and Land Average Temperature: ' + str(corr))

Very strong positive correlation

In [None]:
lr = LinearRegression()

In [None]:
X = df[df['Year'] >= 1850]['Year']
y = df[df['Year'] >= 1850]['LandAverageTemperature']

X = X.values.reshape(-1,1)

In [None]:
X_train, X_test, y_train, y_test = train_test_split(X, y, train_size = 0.8)

In [None]:
lr.fit(X_train, y_train)
lr.score(X_test, y_test)

In [None]:
y_pred = lr.predict(X_test)

In [None]:
plt.figure(figsize=(18,10))
plt.scatter(X, y, alpha=0.6)
plt.plot(X_test, y_pred, color="orange")
plt.xlabel('Years')
plt.ylabel('Temperature (in °C)')
plt.show()
plt.clf()

In [None]:
print(lr.coef_)
print(10 * lr.coef_)

In [None]:
print(lr.predict(np.array([2030, 2050]).reshape(-1,1)))

#### Interpretation

On average, the land temperature rises by about 0.0084 °C per year, or 0.0843 °C per decade.
Extrapolating this trend, the model predicts an average land temperature of 9.4019 °C in 2030 and 9.5706 °C in 2050.

## <a id="4.3">Data from 1950 - 2015</a>

In [None]:
corr, p = pearsonr(df[df['Year'] >= 1950]['Year'], df[df['Year'] >= 1950]['LandAverageTemperature'])
print('Pearson correlation of Year and Land Average Temperature: ' + str(corr))

Very strong positive correlation

In [None]:
lr = LinearRegression()

In [None]:
X = df[df['Year'] >= 1950]['Year']
y = df[df['Year'] >= 1950]['LandAverageTemperature']

X = X.values.reshape(-1,1)

In [None]:
X_train, X_test, y_train, y_test = train_test_split(X, y, train_size = 0.8)

In [None]:
lr.fit(X_train, y_train)
lr.score(X_test, y_test)

In [None]:
y_pred = lr.predict(X_test)

In [None]:
plt.figure(figsize=(18,10))
plt.scatter(X, y, alpha=0.6)
plt.plot(X_test, y_pred, color="orange")
plt.xlabel('Years')
plt.ylabel('Temperature (in °C)')
plt.show()
plt.clf()

In [None]:
print(lr.coef_)
print(lr.coef_ * 10)

In [None]:
print(lr.predict(np.array([2030, 2050]).reshape(-1,1)))

#### Interpretation

On average, the land temperature rises by about 0.0201 °C per year, or 0.2009 °C per decade.
Extrapolating this trend, the model predicts an average land temperature of 9.9087 °C in 2030 and 10.3106 °C in 2050.
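
To compare the three fits side by side, the rates and predictions quoted in the interpretations can be collected in one table. This is a minimal sketch that uses only the rounded figures reported above; the DataFrame layout and column names are assumptions made for illustration.

```python
import pandas as pd

summary = pd.DataFrame(
    {
        "rate_per_decade": [0.0475, 0.0843, 0.2009],  # °C per decade
        "pred_2030": [9.0572, 9.4019, 9.9087],         # °C
        "pred_2050": [9.1521, 9.5706, 10.3106],        # °C
    },
    index=["1750-2015", "1850-2015", "1950-2015"],
)

# How much faster each fitted rate is compared with the full-period fit.
full_period_rate = summary.loc["1750-2015", "rate_per_decade"]
summary["rate_vs_full_period"] = summary["rate_per_decade"] / full_period_rate

print(summary.round(2))
```

The fit restricted to 1950-2015 gives a warming rate roughly four times that of the full-period fit, which is why the choice of training period changes the 2050 prediction by more than 1 °C.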