# Notebook 5: Weather predictions for: Łódź, Gdańsk, Szczecin, Rzeszów, Warszawa and Kraków
Each city's CSV file contains data collected in 2022 or in 2022/23. Predictions are made for 2022 only, and the MAE (mean absolute error) is calculated for each prediction.
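
As a reminder of what the MAE measures, here is a minimal sketch in plain Python. The function name `mean_abs_error` and the temperature values are made up for illustration; the notebook itself uses `mean_absolute_error` from scikit-learn.

```python
def mean_abs_error(actual, predicted):
    """Average of the absolute differences between paired values."""
    if len(actual) != len(predicted):
        raise ValueError("inputs must have the same length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Made-up temperatures in °C, for illustration only
actual_temp = [10.0, 12.0, 15.0, 11.0]
predicted_temp = [11.0, 12.5, 14.0, 11.5]

mae = mean_abs_error(actual_temp, predicted_temp)
print(mae)  # (1.0 + 0.5 + 1.0 + 0.5) / 4 = 0.75
```

A lower MAE means the predictions are, on average, closer to the measured values, in the same unit as the variable itself.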

This notebook compares the predictions made by the LSTM models with those made by the XGBoost models.

Predictions are made for several Polish cities. The main goal is to check how the predictions for different cities are affected by training the LSTM models on data from Łódź only and the XGBoost models on data from Warszawa, Wrocław, Szczecin and Rzeszów.

The XGBoost models make predictions for each city separately, and the results are then saved.

The LSTM models are updated with data from Łódź roughly every seven days (every 320 timestamps), and predictions are then made for the following seven days.
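
The update-then-predict schedule can be sketched as a pair of index ranges per round. The constants below are the values used in the LSTM cell of this notebook; the helper `patch_bounds` is a hypothetical name, and the 30-minute sampling interval is an assumption inferred from the "320 timestamps is 6.66 days" comment in that cell.

```python
# Values taken from the LSTM cell of this notebook
PATCH_SIZE = 320   # timestamps per update patch
WINDOW_SIZE = 12   # input window length


def patch_bounds(update, horizon_steps):
    """Return ((train_start, train_end), (test_start, test_end)) for one round.

    horizon_steps mirrors the expression (hour * 2) + 1 used in the notebook.
    """
    extra = WINDOW_SIZE + horizon_steps
    train = (update * PATCH_SIZE, (update + 1) * PATCH_SIZE + extra)
    test = ((update + 1) * PATCH_SIZE, (update + 2) * PATCH_SIZE + extra)
    return train, test


# Assumption: 30-minute sampling, i.e. 48 timestamps per day
days_per_patch = PATCH_SIZE / 48

for update in range(3):
    train, test = patch_bounds(update, horizon_steps=1)
    print(f"round {update}: update on rows {train}, predict on rows {test}")
```

Each round updates the model on one patch of Łódź data and evaluates it on the patch that follows, so every prediction is made for data the model has not yet been updated on.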

Important note: run the XGB_model.ipynb and LSTM_model_different_windows_sizes_and_50epochs.ipynb notebooks first to create the models.

## Importing the necessary libraries

In [1]:
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
import pickle

from sklearn.metrics import mean_absolute_error

Imports from the helpful_functions.py script located in the root/notebooks folder.

In [2]:
from helpful_functions import min_max_denormalization, transform_data

## Files to load

In [3]:
# read the minimum and maximum values used for denormalization
min_and_max = pd.read_csv("generated_models/lstm_models/min_and_max")
# note: these names shadow Python's built-in min() and max() for the rest of the notebook
min = min_and_max['min']
max = min_and_max['max']
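
The implementation of `min_max_denormalization` lives in helpful_functions and is not shown in this notebook. The sketch below assumes it follows the standard min-max formula; the function names, the argument order (value, max, min, mirroring the calls later in the notebook) and the temperature range are illustrative assumptions.

```python
def min_max_normalize(value, max_value, min_value):
    """Scale a value to the [0, 1] range given the known extremes."""
    return (value - min_value) / (max_value - min_value)


def min_max_denormalize(scaled, max_value, min_value):
    """Invert min_max_normalize: map a [0, 1] value back to original units."""
    return scaled * (max_value - min_value) + min_value


# Hypothetical temperature range in °C, for illustration only
temp_min, temp_max = -20.0, 40.0

scaled = min_max_normalize(10.0, temp_max, temp_min)
restored = min_max_denormalize(scaled, temp_max, temp_min)
print(scaled, restored)
```

Denormalizing both the predictions and the targets before computing the MAE keeps the error in physical units (%, km/h, °C) rather than in the unitless [0, 1] scale.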

In [4]:
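# XGBoost biases: one row per forecast horizon (1, 2 and 3 hours ahead);
# the values are later added to the relh, sped and tmpc predictions, in that order
xgb_bias_df = pd.read_csv("generated_models/xgb_models/biases_xgboost")
xgb_bias_df.drop('Unnamed: 0', axis='columns', inplace=True)

# one NumPy array of three biases per horizon
xgb_bias = [xgb_bias_df.iloc[i].values for i in range(3)]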

In [5]:
xgb_bias

[array([ 0.23349158, -0.21502533, -0.00570393]),
 array([ 0.24179471, -0.25555926, -0.02740839]),
 array([ 0.18760138, -0.33614829, -0.02982774])]
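
A short sketch of how such a bias is applied: it is simply added to the raw model output before the error is computed. The bias values are copied from the first row of the output above, in the order in which the notebook later uses them (relh, sped, tmpc); the raw predictions are made up.

```python
# Biases for the 1-hour horizon, copied from the output above
bias_1h = {"relh": 0.23349158, "sped": -0.21502533, "tmpc": -0.00570393}

# Made-up raw model outputs, for illustration only
raw_prediction = {"relh": 80.0, "sped": 12.0, "tmpc": 5.0}

# Bias correction: shift every predicted variable by its stored bias
corrected = {name: raw_prediction[name] + bias_1h[name] for name in raw_prediction}
print(corrected)
```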

In [6]:
# models load
def load_pickle(path):
    # open in binary mode and close the file once the model is loaded
    with open(path, "rb") as f:
        return pickle.load(f)

# LSTM models, one per forecast horizon
model_lstm = [load_pickle(f"generated_models/lstm_models/lstm_{i}.pkl") for i in range(1, 4)]

# XGBoost models, one per forecast horizon
model_xgb = [load_pickle(f"generated_models/xgb_models/xgb_{i}.pkl") for i in range(1, 4)]

In [7]:
# Read the CSV file with data from Łódź; these data are needed for the LSTM model updates
Lodz_data = pd.read_csv("all_data/data_distance_from_Lodz/Lodz_2022.csv")
Lodz_lstm = Lodz_data[["relh", "skph", "temp"]]

## Predictions for each city

In [8]:
# longitude and latitude of the cities, required by the XGBoost models
long_lat = {
    "Lodz_2022_lon": 19, "Lodz_2022_lat": 51,
    "Gdansk_2022_2023_lon": 18.64, "Gdansk_2022_2023_lat": 54.35,
    "Krakow_2022_2023_lon": 19.9450, "Krakow_2022_2023_lat": 50.0647,
    "Rzeszow_2022_2023_lon": 22, "Rzeszow_2022_2023_lat": 50.04,
    "Szczecin_2022_2023_lon": 14.55, "Szczecin_2022_2023_lat": 53.42,
    "Warszawa_2022_2023_lon": 21.01, "Warszawa_2022_2023_lat": 52.22,
}

# names of the city data files to load (without the .csv extension)
all_cities = ["Lodz_2022", "Gdansk_2022_2023", "Krakow_2022_2023", "Rzeszow_2022_2023", "Szczecin_2022_2023", "Warszawa_2022_2023"]

# dictionaries for storing the MAE values used to evaluate the predictions
MAE_lstm_all = {}
MAE_xgb_all = {}

Predictions and MAE by XGBoost models.

In [9]:

########################### XGB #############################

for city in all_cities:
    # log
    print(city)

    # name of file to read data from
    fileName = "all_data/data_distance_from_Lodz/" + city + ".csv"
    all_data = pd.read_csv(fileName)
    # predictions for 2022 only (copy, so the in-place edits below do not act on a slice)
    all_data = all_data[all_data['year'] == 2022].copy()

    # data for the XGBoost model
    all_data.drop('Unnamed: 0', axis='columns', inplace=True)
    all_data_x = all_data.copy()
    all_data_x.rename(columns={'temp': 'tmpc', 'skph': 'sped'}, inplace=True)
    all_data_x['hour'] = pd.to_datetime(all_data['time']).dt.hour
    all_data_x.drop('time', axis='columns', inplace=True)

    # each city has its own longitude and latitude
    lon = long_lat[city + "_lon"]
    lat = long_lat[city + "_lat"]
    all_data_x.insert(loc=0, column="lat", value=lat)
    all_data_x.insert(loc=0, column="lon", value=lon)
    # keep only the full-hour rows
    all_data_x = all_data_x[all_data_x['minutes'] == 0]
    all_data_x.drop('minutes', axis='columns', inplace=True)

    ########################### XGB #############################
    MAE_humid_xgb_temp = []
    MAE_wind_xgb_temp = []
    MAE_temp_xgb_temp = []

    for hour in range(1, 4):  # predictions for the next 3 hours
        # make predictions: row i of X is paired with row i + hour of y
        X = all_data_x[:-hour]
        y = all_data_x[hour:]
        y_pred_xgb = model_xgb[hour-1].predict(X)
        y_pred_xgb = pd.DataFrame(y_pred_xgb, columns=["lon", "lat", "tmpc", "relh", "sped", "day", "month", "year", "hour"])

        MAE_humid_xgb_temp.append(mean_absolute_error(y_pred_xgb[["relh"]] + xgb_bias[hour-1][0],y[["relh"]]))
        MAE_wind_xgb_temp.append(mean_absolute_error(y_pred_xgb[["sped"]] + xgb_bias[hour-1][1],y[["sped"]]))
        MAE_temp_xgb_temp.append(mean_absolute_error(y_pred_xgb[["tmpc"]] + xgb_bias[hour-1][2],y[["tmpc"]]))

    # save MAE for each place
    MAE_xgb_all[city] = [MAE_humid_xgb_temp, MAE_wind_xgb_temp, MAE_temp_xgb_temp]     


Lodz_2022
Gdansk_2022_2023
Krakow_2022_2023
Rzeszow_2022_2023
Szczecin_2022_2023
Warszawa_2022_2023
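
The XGBoost cell builds its input/target pairs by slicing the same table twice, once without the last rows and once without the first rows. A minimal sketch of that shifting on a plain list follows; the helper name `make_horizon_pairs` and the temperatures are made up for illustration.

```python
def make_horizon_pairs(rows, horizon):
    """Pair each row with the row `horizon` steps later."""
    if not 0 < horizon < len(rows):
        raise ValueError("horizon must be between 1 and len(rows) - 1")
    inputs = rows[:-horizon]   # everything except the last `horizon` rows
    targets = rows[horizon:]   # everything except the first `horizon` rows
    return inputs, targets


# Made-up hourly temperatures in °C, for illustration only
hourly_temp = [4.0, 5.0, 7.0, 8.0, 6.0]

for horizon in range(1, 4):
    inputs, targets = make_horizon_pairs(hourly_temp, horizon)
    print(horizon, list(zip(inputs, targets)))
```

Both slices always have the same length, so element i of the inputs lines up with the value observed `horizon` steps later.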


Predictions and MAE by LSTM models.

In [10]:

######################### LSTM ###############################

# parameters needed for the updates
how_many_updates = 53 # one year has ~52 weeks; 320 timestamps correspond to about 6.66 days
size_of_timestamps_in_updating_set = 320
window_size = 12
epochs = 5

# convert the data frame to a NumPy array
Lodz_lstm = Lodz_lstm.to_numpy()

for update in range(how_many_updates):
    
    for hour in range(3):
        # update every model with data from Łódź
        # data for the update
        data_patch_update = Lodz_lstm[update * size_of_timestamps_in_updating_set : (update+1) * size_of_timestamps_in_updating_set + window_size + (hour*2)+1]
        X, y = transform_data(data_patch_update, max, min, timestamps_count=(hour*2)+1, is_update=True)
        # trim to a multiple of the batch size (32) without shadowing the built-in len()
        n_samples = y.shape[0] - (y.shape[0] % 32)
        X = X[:n_samples]
        y = y[:n_samples]
        # model update
        model_lstm[hour].reset_states()
        model_lstm[hour].fit(X, y, epochs=epochs, shuffle=False, verbose=0, batch_size=32)

    for city in all_cities:
        fileName = "all_data/data_distance_from_Lodz/" + city + ".csv"
        all_data = pd.read_csv(fileName)

        # data for LSTM model #####################
        all_data_lstm = all_data[["relh", "skph", "temp"]]
        all_data_lstm=all_data_lstm.to_numpy()

        ######################### LSTM ###############################

        MAE_humid_lstm_temp = []
        MAE_wind_lstm_temp = []
        MAE_temp_lstm_temp = []

        for hour in range(3): # predictions for the next 3 hours
            # choose the proper patch set
            data_patch_test = all_data_lstm[(update+1) * size_of_timestamps_in_updating_set : (update+2) * size_of_timestamps_in_updating_set + window_size + (hour*2)+1]
            X_test, y_test = transform_data(data_patch_test, max, min, timestamps_count=(hour*2)+1, is_update=True)

            # trim to a multiple of the batch size (32) without shadowing the built-in len()
            n_test_samples = y_test.shape[0] - (y_test.shape[0] % 32)
            X_test = X_test[:n_test_samples]
            y_test = y_test[:n_test_samples]

            # make predictions
            model_lstm[hour].reset_states()
            predictions = model_lstm[hour].predict(X_test, verbose = 0, batch_size=32)

            pred = []
            actual = []

            # denormalization of data (number of weather components taken from the test targets)
            weather_components_size = y_test.shape[1]
            for i in range(weather_components_size):
                denormalized = min_max_denormalization(predictions[:,i], max[i], min[i])
                pred.append(denormalized)
                actual.append(min_max_denormalization(y_test[:,i], max[i], min[i]))

            # MAE for one patch set
            MAE_humid_lstm_temp.append(mean_absolute_error(actual[0], pred[0]))
            MAE_wind_lstm_temp.append(mean_absolute_error(actual[1], pred[1]))
            MAE_temp_lstm_temp.append(mean_absolute_error(actual[2], pred[2]))

        # save data for each place: add this round's share (MAE / how_many_updates)
        # to the running total, so the final value is the mean MAE over all rounds
        round_mae = [np.array(m) / how_many_updates for m in (MAE_humid_lstm_temp, MAE_wind_lstm_temp, MAE_temp_lstm_temp)]
        if city in MAE_lstm_all:
            MAE_lstm_all[city] = [total + new for total, new in zip(MAE_lstm_all[city], round_mae)]
        else:
            MAE_lstm_all[city] = round_mae
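
The accumulation at the end of the LSTM cell divides each round's MAE by the number of rounds and adds it to a running total. The sketch below checks, on made-up numbers, that this gives the same result as averaging once at the end; `accumulate_mean` is a hypothetical helper name.

```python
def accumulate_mean(per_round_values):
    """Add value / n_rounds for every round, as the LSTM cell does."""
    n_rounds = len(per_round_values)
    running_total = 0.0
    for value in per_round_values:
        running_total += value / n_rounds
    return running_total


# Made-up per-round MAE values, for illustration only
round_mae = [4.0, 3.5, 4.5, 4.0]

averaged = accumulate_mean(round_mae)
direct_mean = sum(round_mae) / len(round_mae)
print(averaged, direct_mean)
```

The two agree only when every round contributes exactly once, and every round gets the same weight regardless of how many samples it contained.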

## Results

Comparison of the LSTM and XGBoost predictions, shown in one dataframe for each weather variable: humidity, wind speed and temperature.

The second and third columns show the MAE for Łódź obtained by the LSTM and XGBoost models, respectively. The following pairs of columns show the same for the other cities.

Row number 0: MAE of predictions for the next hour.

Row number 1: MAE of predictions for the second hour.

Row number 2: MAE of predictions for the third hour.

In [11]:
# MAE for relative humidity in %
data_h = pd.DataFrame()
data_h['hour'] = [1,2,3]
for city in all_cities:
    name_lstm = city[:3] + "_lstm"
    data_h[name_lstm] = MAE_lstm_all[city][0]

    name_xgb = city[:3] + "_xgb"
    data_h[name_xgb] = MAE_xgb_all[city][0]

# display dataframe showing humidity MAE
data_h

Unnamed: 0,hour,Lod_lstm,Lod_xgb,Gda_lstm,Gda_xgb,Kra_lstm,Kra_xgb,Rze_lstm,Rze_xgb,Szc_lstm,Szc_xgb,War_lstm,War_xgb
0,1,3.954737,4.984522,4.082874,4.907365,4.058131,5.001627,3.98982,4.74469,4.023046,4.982228,3.981287,4.863664
1,2,5.602851,6.586282,5.723394,6.335953,5.629078,6.543729,5.705966,6.228138,5.735522,6.375445,5.656984,6.347952
2,3,7.08247,7.660446,7.044643,7.56646,6.915136,7.510011,7.173334,7.351211,7.099335,7.271966,7.081618,7.477397


In [12]:
# MAE for speed of wind in km/h
data_w = pd.DataFrame()
data_w['hour'] = [1,2,3]
for city in all_cities:
    name_lstm = city[:3] + "_lstm"
    data_w[name_lstm] = MAE_lstm_all[city][1]

    name_xgb = city[:3] + "_xgb"
    data_w[name_xgb] = MAE_xgb_all[city][1]

# display dataframe showing speed of wind MAE
data_w

Unnamed: 0,hour,Lod_lstm,Lod_xgb,Gda_lstm,Gda_xgb,Kra_lstm,Kra_xgb,Rze_lstm,Rze_xgb,Szc_lstm,Szc_xgb,War_lstm,War_xgb
0,1,2.73888,3.821859,2.951593,3.896294,2.785627,3.750902,2.993489,3.948651,2.753421,3.615799,2.70546,3.640435
1,2,3.543294,4.540666,3.913893,4.830589,3.503742,4.617141,3.766094,4.643947,3.483085,4.371958,3.436027,4.326958
2,3,4.162859,5.061692,4.608375,5.224802,4.063432,5.143353,4.329574,5.073702,4.054976,4.680567,3.997314,4.69287


In [13]:
# MAE for temperature in °C
data_t = pd.DataFrame()
data_t['hour'] = [1,2,3]
for city in all_cities:
    name_lstm = city[:3] + "_lstm"
    data_t[name_lstm] = MAE_lstm_all[city][2]

    name_xgb = city[:3] + "_xgb"
    data_t[name_xgb] = MAE_xgb_all[city][2]

# display dataframe showing temperature MAE
data_t

Unnamed: 0,hour,Lod_lstm,Lod_xgb,Gda_lstm,Gda_xgb,Kra_lstm,Kra_xgb,Rze_lstm,Rze_xgb,Szc_lstm,Szc_xgb,War_lstm,War_xgb
0,1,0.74082,0.880744,0.67847,0.829363,0.742519,0.902643,0.728595,0.8366,0.733584,0.839178,0.723126,0.845153
1,2,1.186689,1.237262,1.057698,1.164239,1.172115,1.251426,1.171509,1.156465,1.140331,1.178679,1.131275,1.18377
2,3,1.5467,1.547482,1.377356,1.45653,1.515188,1.559553,1.537935,1.428198,1.486455,1.428072,1.467036,1.453026
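
The three cells above repeat the same column-building loop with a different metric index. As a sketch only, the loop could be factored into one helper. `build_mae_columns` is a hypothetical name, the demo values are made up, and the helper returns a plain dict of columns, which `pd.DataFrame(...)` accepts directly.

```python
def build_mae_columns(mae_by_model, cities, metric_index):
    """Collect one MAE metric for every city and model into table columns.

    mae_by_model maps a column suffix (e.g. "lstm") to a dict shaped like
    MAE_lstm_all: {city: [humidity, wind, temperature]}, where each entry
    holds three values, one per forecast hour.
    """
    columns = {"hour": [1, 2, 3]}
    for city in cities:
        for suffix, mae_all in mae_by_model.items():
            columns[f"{city[:3]}_{suffix}"] = list(mae_all[city][metric_index])
    return columns


# Made-up MAE values, for illustration only
demo_lstm = {"Lodz_2022": [[4.0, 5.5, 7.0], [2.5, 3.5, 4.0], [0.7, 1.2, 1.5]]}
demo_xgb = {"Lodz_2022": [[5.0, 6.5, 7.5], [3.5, 4.5, 5.0], [0.9, 1.2, 1.5]]}

humidity_columns = build_mae_columns(
    {"lstm": demo_lstm, "xgb": demo_xgb}, ["Lodz_2022"], metric_index=0
)
print(humidity_columns)
```

Passing only `{"lstm": ...}` or only `{"xgb": ...}` would produce single-model tables with the same column naming scheme.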


Comparison of the LSTM predictions across cities, shown in one dataframe for each weather variable: humidity, wind speed and temperature.

The second column shows the MAE for Łódź. The remaining columns show the MAE for Gdańsk, Kraków, Rzeszów, Szczecin and Warszawa, respectively.

Row number 0: MAE of predictions for the next hour.

Row number 1: MAE of predictions for the second hour.

Row number 2: MAE of predictions for the third hour.

In [14]:
# humidity LSTM only
# MAE for relative humidity in %
data_h_lstm = pd.DataFrame()
data_h_lstm['hour'] = [1,2,3]
for city in all_cities:
    name_lstm = city[:3] + "_lstm"
    data_h_lstm[name_lstm] = MAE_lstm_all[city][0]

data_h_lstm

Unnamed: 0,hour,Lod_lstm,Gda_lstm,Kra_lstm,Rze_lstm,Szc_lstm,War_lstm
0,1,3.954737,4.082874,4.058131,3.98982,4.023046,3.981287
1,2,5.602851,5.723394,5.629078,5.705966,5.735522,5.656984
2,3,7.08247,7.044643,6.915136,7.173334,7.099335,7.081618


In [15]:
# speed of wind LSTM only
# MAE for speed of wind in km/h
data_w_lstm = pd.DataFrame()
data_w_lstm['hour'] = [1,2,3]
for city in all_cities:
    name_lstm = city[:3] + "_lstm"
    data_w_lstm[name_lstm] = MAE_lstm_all[city][1]

data_w_lstm

Unnamed: 0,hour,Lod_lstm,Gda_lstm,Kra_lstm,Rze_lstm,Szc_lstm,War_lstm
0,1,2.73888,2.951593,2.785627,2.993489,2.753421,2.70546
1,2,3.543294,3.913893,3.503742,3.766094,3.483085,3.436027
2,3,4.162859,4.608375,4.063432,4.329574,4.054976,3.997314


In [20]:
# temperature LSTM only
# MAE for temperature in °C
data_t_lstm = pd.DataFrame()
data_t_lstm['hour'] = [1,2,3]
for city in all_cities:
    name_lstm = city[:3] + "_lstm"
    data_t_lstm[name_lstm] = MAE_lstm_all[city][2]

data_t_lstm

Unnamed: 0,hour,Lod_lstm,Gda_lstm,Kra_lstm,Rze_lstm,Szc_lstm,War_lstm
0,1,0.74082,0.67847,0.742519,0.728595,0.733584,0.723126
1,2,1.186689,1.057698,1.172115,1.171509,1.140331,1.131275
2,3,1.5467,1.377356,1.515188,1.537935,1.486455,1.467036


Comparison of the XGBoost predictions across cities, shown in one dataframe for each weather variable: humidity, wind speed and temperature.

The second column shows the MAE for Łódź. The remaining columns show the MAE for Gdańsk, Kraków, Rzeszów, Szczecin and Warszawa, respectively.

Row number 0: MAE of predictions for the next hour.

Row number 1: MAE of predictions for the second hour.

Row number 2: MAE of predictions for the third hour.

In [17]:
# humidity XGBoost only
# MAE for relative humidity in %
data_h_xgb = pd.DataFrame()
data_h_xgb['hour'] = [1,2,3]
for city in all_cities:
    name_xgb = city[:3] + "_xgb"
    data_h_xgb[name_xgb] = MAE_xgb_all[city][0]

data_h_xgb

Unnamed: 0,hour,Lod_xgb,Gda_xgb,Kra_xgb,Rze_xgb,Szc_xgb,War_xgb
0,1,4.984522,4.907365,5.001627,4.74469,4.982228,4.863664
1,2,6.586282,6.335953,6.543729,6.228138,6.375445,6.347952
2,3,7.660446,7.56646,7.510011,7.351211,7.271966,7.477397


In [18]:
# speed of wind XGBoost only
# MAE for speed of wind in km/h
data_w_xgb = pd.DataFrame()
data_w_xgb['hour'] = [1,2,3]
for city in all_cities:
    name_xgb = city[:3] + "_xgb"
    data_w_xgb[name_xgb] = MAE_xgb_all[city][1]

data_w_xgb

Unnamed: 0,hour,Lod_xgb,Gda_xgb,Kra_xgb,Rze_xgb,Szc_xgb,War_xgb
0,1,3.821859,3.896294,3.750902,3.948651,3.615799,3.640435
1,2,4.540666,4.830589,4.617141,4.643947,4.371958,4.326958
2,3,5.061692,5.224802,5.143353,5.073702,4.680567,4.69287


In [19]:
# temperature XGBoost only
# MAE for temperature in °C
data_t_xgb = pd.DataFrame()
data_t_xgb['hour'] = [1,2,3]
for city in all_cities:
    name_xgb = city[:3] + "_xgb"
    data_t_xgb[name_xgb] = MAE_xgb_all[city][2]

data_t_xgb

Unnamed: 0,hour,Lod_xgb,Gda_xgb,Kra_xgb,Rze_xgb,Szc_xgb,War_xgb
0,1,0.880744,0.829363,0.902643,0.8366,0.839178,0.845153
1,2,1.237262,1.164239,1.251426,1.156465,1.178679,1.18377
2,3,1.547482,1.45653,1.559553,1.428198,1.428072,1.453026


### Conclusion:
Although the LSTM models were trained only on historical data from Łódź, and the XGBoost models on data from Warszawa, Wrocław, Szczecin and Rzeszów, this has no large impact on the predictions for different Polish cities, at least for those tested in this notebook. For example, the 1-hour humidity MAE of the LSTM models ranges only from 3.95 to 4.08 across the six cities.
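
One way to put a number on "no large impact" is the spread of the MAE across cities. The sketch below uses the 1-hour humidity MAE of the LSTM models, copied from the results tables above. Using the max-min spread relative to the mean is only one possible measure, chosen here as an assumption.

```python
# 1-hour humidity MAE of the LSTM models, copied from the results tables above
lstm_humidity_1h = {
    "Lodz": 3.954737, "Gdansk": 4.082874, "Krakow": 4.058131,
    "Rzeszow": 3.98982, "Szczecin": 4.023046, "Warszawa": 3.981287,
}

values = list(lstm_humidity_1h.values())
spread = max(values) - min(values)       # absolute difference between the extremes
mean_mae = sum(values) / len(values)     # average MAE over the six cities
relative_spread = spread / mean_mae      # spread as a fraction of the average

print(f"spread: {spread:.3f} percentage points of relative humidity")
print(f"relative spread: {relative_spread:.1%}")
```

The same check can be repeated for the other variables, horizons and for the XGBoost columns by swapping in the corresponding table row.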