The notebook below fits a seasonal ARIMA (SARIMA) model to predict average monthly temperatures. Its purpose is purely educational. First, the user chooses the country to run predictions for. They then choose the parameters of the model: both the numerical orders and the temporal parameters (the start and end of the training period and the end of the test period; the test period begins the month after training ends).

In [14]:
import pandas as pd
import ipywidgets as widgets
from IPython.display import display

# Load the CSV file into a DataFrame
df = pd.read_csv('./archivetemp/GlobalLandTemperaturesByCountry.csv')
df['dt'] = pd.to_datetime(df['dt'])

# Set the 'dt' column as the index
df.set_index('dt', inplace=True)
# Module-level DataFrame that the dropdown callback overwrites with the selected country's rows
filtered_df = pd.DataFrame()

# Get unique countries
countries = df['Country'].unique()

# Create a dropdown widget for countries
country_dropdown = widgets.Dropdown(
    options=['Select a country'] + list(countries),
    description='Country:',
    disabled=False,
)

# Function to update the global DataFrame
def update_dataframe(change):
    global filtered_df
    if change['new'] != 'Select a country':
        filtered_df = df[df['Country'] == change['new']]
        print(f"DataFrame updated for {change['new']}")

# Observe changes in the dropdown
country_dropdown.observe(update_dataframe, names='value')

# Display the widget
display(country_dropdown)


Dropdown(description='Country:', options=('Select a country', 'Åland', 'Afghanistan', 'Africa', 'Albania', 'Al…

The above cell loads data from the CSV file GlobalLandTemperaturesByCountry. The file contains average monthly temperatures for all countries, spanning from the 18th century up to 2013.
The dropdown list selects the country whose monthly temperatures are stored in filtered_df. To switch to a different country, go back to the dropdown in the above cell, pick the new country, and re-run the cells below it so that they pick up the new data.
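
For readers who prefer to skip the widget, the same per-country filter can be sketched directly in code. The snippet below is a minimal illustration on a tiny made-up DataFrame that reuses the column names of the CSV; the values and the helper name `select_country` are assumptions for the example, not part of the notebook.

```python
import pandas as pd

# Tiny stand-in for GlobalLandTemperaturesByCountry.csv (values are made up).
toy = pd.DataFrame(
    {
        "dt": ["2012-01-01", "2012-02-01", "2012-01-01", "2012-02-01"],
        "AverageTemperature": [-1.5, -4.0, 25.1, None],
        "Country": ["Poland", "Poland", "Brazil", "Brazil"],
    }
)
toy["dt"] = pd.to_datetime(toy["dt"])
toy = toy.set_index("dt")


def select_country(frame, country):
    """Hypothetical helper: rows for one country, without months that have missing values."""
    return frame[frame["Country"] == country].dropna()


poland = select_country(toy, "Poland")
brazil = select_country(toy, "Brazil")
print(poland)
print(len(brazil))
```

Doing the filter and the dropna() in one function keeps the selection and the cleaning step together, so a change of country cannot leave a stale intermediate DataFrame behind.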

In [15]:
# Example code to process the filtered DataFrame
# Ensure that a country has been selected and the DataFrame is not empty
preprocessed_df = filtered_df.dropna()

if not preprocessed_df.empty:
    # Example processing: display the first few rows
    display(preprocessed_df.head())
else:
    print("Please select a country first.")

dt,AverageTemperature,AverageTemperatureUncertainty,Country
1743-11-01,3.937,2.057,Poland
1744-04-01,8.889,3.11,Poland
1744-05-01,11.952,1.839,Poland
1744-06-01,14.867,1.799,Poland
1744-07-01,17.313,1.803,Poland


The above cell's sole purpose is to ensure that we have correctly selected the country.

In [16]:
import matplotlib.pyplot as plt
processed_df = preprocessed_df.copy()
processed_df['Year'] = processed_df.index.year

In [17]:

year_range = (processed_df.index.year.min(), processed_df.index.year.max())

start_year_slider = widgets.IntSlider(
    value=year_range[1]-10,
    min=year_range[0],
    max=year_range[1],
    step=1,
    description='Start Year:',
    continuous_update=False
)

end_year_slider = widgets.IntSlider(
    value=year_range[1],
    min=year_range[0],
    max=year_range[1],
    step=1,
    description='End Year:',
    continuous_update=False
)
def update_plot(start_year, end_year):
    # Filter the DataFrame
    plotted_df = processed_df[(processed_df.index.year >= start_year) & (processed_df.index.year <= end_year)].copy()
    plotted_df['PrevYearTemp'] = plotted_df['AverageTemperature'].shift(12)
    plotted_df['TempDifference'] = plotted_df['AverageTemperature'] - plotted_df['PrevYearTemp']
    # Plotting
    plt.figure(figsize=(10, 5))
    plt.plot(plotted_df.index, plotted_df['AverageTemperature'], label='Average Temp')
    plt.plot(plotted_df.index, plotted_df['PrevYearTemp'], label='Previous Year Temp', linestyle='--')
    plt.title(f'Average Temperature from {start_year} to {end_year}')
    plt.xlabel('Year')
    plt.ylabel('Average Temperature')
    plt.legend()  # Without this call the line labels are never shown
    plt.grid(True)
    plt.show()
    # Plotting the Temperature Difference
    plt.figure(figsize=(10, 5))
    plt.plot(plotted_df.index, plotted_df['TempDifference'], color='red', marker='o', linestyle='-')
    plt.title(f'Temperature Difference from {start_year+1} to {end_year}')
    plt.xlabel('Year')
    plt.ylabel('Temperature Difference')
    plt.grid(True)
    plt.show()

widgets.interactive(update_plot, start_year=start_year_slider, end_year=end_year_slider)

interactive(children=(IntSlider(value=2003, continuous_update=False, description='Start Year:', max=2013, min=…

In the above cell we plot the average monthly temperatures from January of the starting year to the last available month of the ending year (August for 2013, December for earlier years).
We first plot the average for month X in year Y together with the average for month X in year Y-1. We then plot the difference between each month's average and the average of the same month one year earlier. This shows that
these differences can exceed 10 degrees Celsius, especially in winter months, which show the largest standard deviations.
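
The year-over-year comparison above relies on shift(12), which moves values by 12 rows rather than by 12 calendar months. The sketch below illustrates the idea on synthetic data (a made-up seasonal cycle plus a warming of 0.5 degrees per year); it assumes a gap-free monthly index. If dropna() has removed a month, the two lines would no longer compare the same calendar month.

```python
import numpy as np
import pandas as pd

# Three years of synthetic monthly data: a fixed seasonal cycle plus 0.5 degrees per year.
index = pd.date_range("2010-01-01", periods=36, freq="MS")
months = index.month.to_numpy()
years = index.year.to_numpy()
seasonal = 10 * np.sin(2 * np.pi * (months - 1) / 12)
warming = 0.5 * (years - 2010)
temps = pd.Series(seasonal + warming, index=index)

# Same month one year earlier; valid only because no month is missing.
prev_year = temps.shift(12)
diff_by_shift = temps - prev_year

# pandas offers the same operation as a single call.
diff_builtin = temps.diff(12)

print(diff_by_shift.dropna().round(2).unique())
```

The first 12 differences are NaN because there is no earlier year to compare against, which is why the second plot's title starts one year after the selected start year.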

In [18]:
from statsmodels.tsa.arima.model import ARIMA
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np 
from sklearn.metrics import mean_squared_error

caption = widgets.Label(
    value='Adjust the SARIMA Model Parameters and Date Ranges:',
    layout=widgets.Layout(width='100%', justify_content='center')
)

train_start_year_widget = widgets.IntSlider(value=1990, min=year_range[0], max=year_range[1], description='Train Start Year', tooltip='Train Start Year')
train_end_year_widget = widgets.IntSlider(value=2011, min=year_range[0], max=year_range[1], description='Train End Year', tooltip='Train End Year')
test_end_year_widget = widgets.IntSlider(value=2013, min=year_range[0], max=year_range[1], description='Test End Year', tooltip='Test End Year')

# Month widgets
train_start_month_widget = widgets.IntSlider(value=1, min=1, max=12, step=1, description='Train Start Month', tooltip='Train Start Month')
train_end_month_widget = widgets.IntSlider(value=12, min=1, max=12, step=1, description='Train End Month', tooltip='Train End Month')
test_end_month_widget = widgets.IntSlider(value=8, min=1, max=12, step=1, description='Test End Month', tooltip='Test End Month')

p_widget = widgets.IntSlider(value=1, min=0, max=5, step=1, description='p:', tooltip='Autoregressive order (p)')
d_widget = widgets.IntSlider(value=1, min=0, max=2, step=1, description='d:', tooltip='Degree of Differencing (d)')
q_widget = widgets.IntSlider(value=1, min=0, max=5, step=1, description='q:', tooltip='Moving Average Order (q)')
P_widget = widgets.IntSlider(value=1, min=0, max=5, step=1, description='P:', tooltip='Seasonal AutoRegressive Order (P)')
D_widget = widgets.IntSlider(value=1, min=0, max=2, step=1, description='D:', tooltip='Seasonal Differencing Order (D)')
Q_widget = widgets.IntSlider(value=1, min=0, max=5, step=1, description='Q:', tooltip='Seasonal Moving Average Order (Q)')
S_widget = widgets.IntSlider(value=12, min=1, max=24, step=1, description='S:', tooltip='Seasonal Periodicity (S)')

def plot_forecast(p, d, q, P, D, Q, S, train_start_year, train_start_month, train_end_year, train_end_month, test_end_year, test_end_month):
    # The test period starts one month after the training period ends
    train_start = f"{train_start_year}-{train_start_month:02d}-01"
    train_end = f"{train_end_year}-{train_end_month:02d}-01"
    if train_end_month < 12:
        test_start = f"{train_end_year}-{train_end_month + 1:02d}-01"  # One month after training ends
    else:
        test_start = f"{train_end_year + 1}-01-01"  # One month after training ends
    test_end = f"{test_end_year}-{test_end_month:02d}-01"

    train_series = processed_df[(processed_df.index >= train_start) & (processed_df.index <= train_end)]['AverageTemperature'].asfreq('MS')
    test_series = processed_df[(processed_df.index >= test_start) & (processed_df.index <= test_end)]['AverageTemperature'].asfreq('MS')

    # Fit the SARIMA model and plot as before
    model = ARIMA(train_series, order=(p, d, q), seasonal_order=(P, D, Q, S))
    model_fit = model.fit()

    forecasts = model_fit.get_forecast(steps=len(test_series))
    forecast_series = forecasts.predicted_mean

    rmse = np.sqrt(mean_squared_error(test_series, forecast_series))
    print(f'RMSE: {rmse}')

    plt.figure(figsize=(10, 5))
    plt.plot(train_series.index, train_series, label='Training Data')
    plt.plot(test_series.index, test_series, label='Actual Test Data', color='blue')
    plt.plot(forecast_series.index, forecast_series, label='Forecast', color='red', linestyle='--')
    plt.title('SARIMA Forecast')
    plt.xlabel('Year')
    plt.ylabel('Average Temperature')
    plt.legend()
    plt.grid(True)
    plt.show()

# Grouping the existing widgets
parameter_widgets = widgets.VBox([p_widget, d_widget, q_widget, P_widget, D_widget, Q_widget, S_widget])
date_widgets = widgets.VBox([train_start_year_widget, train_start_month_widget, 
                             train_end_year_widget, train_end_month_widget, 
                             test_end_year_widget, test_end_month_widget])

# Creating a vertical box layout with the caption at the top
vbox_layout = widgets.VBox([caption, parameter_widgets, date_widgets])


# Now use this vbox_layout in your interactive plot
interactive_plot = widgets.interactive_output(plot_forecast, 
                                              {'p': p_widget, 'd': d_widget, 'q': q_widget,
                                               'P': P_widget, 'D': D_widget, 'Q': Q_widget, 'S': S_widget,
                                               'train_start_year': train_start_year_widget,
                                               'train_start_month': train_start_month_widget,
                                               'train_end_year': train_end_year_widget,
                                               'train_end_month': train_end_month_widget,
                                               'test_end_year': test_end_year_widget,
                                               'test_end_month': test_end_month_widget})

display(vbox_layout, interactive_plot)

VBox(children=(Label(value='Adjust the SARIMA Model Parameters and Date Ranges:', layout=Layout(justify_conten…

Output()

In the last cell we fit a seasonal ARIMA (SARIMA) model. The user can adjust the start and end months of the training period and the end month of the test period (the test period starts the month after training ends), as well as the orders of the seasonal ARIMA model. Once again large year-to-year differences within the same season can be seen: for Poland there are winter months with averages both above 0 and below -10 degrees Celsius. Changing the months does not change the appearance of the graph much. The most interesting parameter is S (seasonality). Depending on its value and on where the training period ends, the forecast can turn into an endless summer, an endless winter or even an ice age.
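
The way the training and test windows are derived from the sliders can be sketched with pandas date offsets. This is a minimal illustration, not the notebook's own code: the helper name `split_windows` and the synthetic series are assumptions, and the check for an empty test period is an addition, since the sliders allow a test end that lies before the training end.

```python
import pandas as pd


def split_windows(train_start, train_end, test_end):
    """Hypothetical helper: return (train_start, train_end, test_start, test_end).

    The test window is assumed to begin one month after training ends,
    mirroring the if/else on train_end_month in plot_forecast.
    """
    train_start = pd.Timestamp(train_start)
    train_end = pd.Timestamp(train_end)
    test_start = train_end + pd.DateOffset(months=1)
    test_end = pd.Timestamp(test_end)
    if test_end < test_start:
        raise ValueError("test period ends before it starts")
    return train_start, train_end, test_start, test_end


# Default slider values from the cell above.
windows = split_windows("1990-01-01", "2011-12-01", "2013-08-01")
print(windows[2])  # first month of the test period

# Synthetic monthly series (1989-01 to 2013-12) to show the slicing itself.
series = pd.Series(range(300), index=pd.date_range("1989-01-01", periods=300, freq="MS"))
train = series.loc[windows[0]:windows[1]]
test = series.loc[windows[2]:windows[3]]
print(len(train), len(test))
```

Adding a month with DateOffset handles the December-to-January rollover without a separate branch.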

p (AutoRegressive Order):
This parameter represents the number of lagged observations in the model.
In other words, it's the number of past data points that the model uses to predict the current value.

d (Degree of Differencing):
This parameter is the number of times the series is differenced, i.e. each value is replaced by its difference from the previous value.
Differencing is a method of transforming a time series dataset to make it stationary (i.e., to have constant mean and variance over time).
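
As a small illustration of what d does, the sketch below differences a made-up series with a quadratic trend. The series and its coefficients are assumptions chosen so that the arithmetic is easy to follow.

```python
import numpy as np

# Quadratic trend: not stationary, because its mean keeps changing.
t = np.arange(10)
series = 2.0 * t**2 + 3.0 * t + 1.0

first_diff = np.diff(series, n=1)   # d = 1: a linear trend remains
second_diff = np.diff(series, n=2)  # d = 2: the result is constant

print(first_diff)
print(second_diff)
```

Each round of differencing shortens the series by one observation and lowers the order of a polynomial trend by one.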

q (Moving Average Order):
This parameter is the number of lagged forecast errors that the model uses.
Despite the name, it is not the window size of a rolling average: a moving average term in an ARIMA model is a past error (the difference between an observed value and a predicted value).
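
Taken together, p and q define the non-seasonal part of the model. In common textbook notation (the symbols below are conventional and do not appear in the notebook), with y'_t denoting the series after d rounds of differencing, c a constant and ε_t the forecast error at time t, the model can be written as:

```latex
y'_t = c
     + \phi_1 y'_{t-1} + \dots + \phi_p y'_{t-p}
     + \theta_1 \varepsilon_{t-1} + \dots + \theta_q \varepsilon_{t-q}
     + \varepsilon_t
```

The φ terms are the p autoregressive lags and the θ terms are the q lagged errors.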

P (Seasonal AutoRegressive Order):
This is similar to p but for the seasonal component of the model.
It represents the number of seasonal lags of the autoregressive model.

D (Seasonal Differencing Order):
Similar to d, it's the number of seasonal differences applied to the series.
If a series has a stable seasonal pattern over time, D can help in making the series stationary.
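
Seasonal differencing can be sketched in the same way as ordinary differencing, except that each value is compared with the one a full season earlier. The example uses a made-up series with a repeating 12-step pattern and a slow drift; both are assumptions for illustration.

```python
import numpy as np

period = 12
months = np.arange(48)  # four years of monthly observations

# Repeating yearly pattern plus a slow upward drift of 0.1 per month.
pattern = 10.0 * np.sin(2 * np.pi * months / period)
series = pattern + 0.1 * months

# D = 1 with S = 12: subtract the value observed 12 steps earlier.
seasonal_diff = series[period:] - series[:-period]

print(seasonal_diff.round(3))
```

The repeating pattern cancels out and only the drift accumulated over one season remains, at the cost of the first 12 observations.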

Q (Seasonal Moving Average Order):
Analogous to q, this is for the seasonal part of the time series.
It's the number of seasonal moving average terms.

S (Seasonal Periodicity):
This parameter represents the length of the seasonal cycle, measured in observations.
For example, S is 12 for monthly data with an annual cycle, 4 for quarterly data, etc.
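
All seven parameters can be combined into a single expression using the backshift operator B, where B y_t = y_{t-1}. The form below is the usual textbook notation without a constant term; it is given as a reading aid, and the exact parameterization used internally by statsmodels may differ in details such as how trend terms enter.

```latex
\phi(B)\,\Phi(B^{S})\,(1-B)^{d}\,(1-B^{S})^{D}\,y_t
  = \theta(B)\,\Theta(B^{S})\,\varepsilon_t

\phi(B)     = 1 - \phi_1 B - \dots - \phi_p B^{p}
\theta(B)   = 1 + \theta_1 B + \dots + \theta_q B^{q}
\Phi(B^{S}) = 1 - \Phi_1 B^{S} - \dots - \Phi_P B^{PS}
\Theta(B^{S}) = 1 + \Theta_1 B^{S} + \dots + \Theta_Q B^{QS}
```

Here (1-B)^d is the ordinary differencing controlled by d, and (1-B^S)^D is the seasonal differencing controlled by D and S.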