# Modeling Pipeline

The first step is to initialize the configuration file at <strong>src/modeling/utils/config.json</strong>. Two example configuration files, one for the Winter season and one for the Summer season, are reported below.

Next, we import all the necessary libraries.

In [None]:
import sys
sys.path.append('../src/')  # sometimes needed when the ../src/ path is not found
import numpy as np  # used further down for np.argmax
from modeling.utils.data import *
from modeling.utils.plotting import *
from modeling.utils.config import *
# Only useful in Jupyter notebooks: reloads edited source modules without restarting the kernel
%load_ext autoreload
%autoreload 2

The first step is to create the dataset with the build_data() function. The process is automated: if the daily anomaly dataset already exists, the file is read directly. Otherwise the process starts from the hourly observation file, resamples it to daily values, computes the normal, and finally computes the anomaly, writing every intermediate result to file.

In [None]:
dt = build_data(normal_mode = 'flat', normal_freq = 'm')
# normal_mode is either 'flat' or 'dynamic'. With 'flat', normal_freq is ignored.
# With 'dynamic', normal_freq sets the frequency: {'m': 'monthly', 'w': 'weekly', 'd': 'daily'}
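The resample, normal, and anomaly chain described above can be illustrated with plain pandas. This is only a sketch on synthetic data, not the code behind build_data(): in particular, reading 'flat' as a single long-term mean and 'dynamic' monthly as one mean per calendar month is an assumption, since the source does not define them.

```python
import numpy as np
import pandas as pd

# Synthetic hourly series standing in for the hourly observation file (hypothetical data).
rng = np.random.default_rng(0)
n_days = 731  # 2000 (leap year) + 2001
hours = pd.date_range("2000-01-01", periods=24 * n_days, freq=pd.Timedelta(hours=1))
hourly = pd.Series(rng.normal(size=len(hours)), index=hours)

# 1) Resample the hourly observations to daily means.
daily = hourly.resample("D").mean()

# 2a) "Flat" normal (assumed meaning): one long-term mean for the whole record.
flat_normal = daily.mean()

# 2b) "Dynamic" monthly normal (assumed meaning): one mean per calendar month.
monthly_normal = daily.groupby(daily.index.month).transform("mean")

# 3) Anomaly = daily value minus the chosen normal.
flat_anomaly = daily - flat_normal
dynamic_anomaly = daily - monthly_normal
```

With a per-month normal, the anomaly averages to zero inside every calendar month, whereas the flat normal only removes the overall mean.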

We then weight the anomaly to compensate for the distortion of the grid along the latitude.

In [None]:
dt = weighted_anomaly(dt)
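A common way to apply such a latitude weighting on a regular latitude-longitude grid is to scale each grid row by the square root of the cosine of its latitude. The sketch below assumes that convention and a (time, lat, lon) array layout; the source does not say which weights weighted_anomaly() actually uses.

```python
import numpy as np

# Hypothetical grid and anomaly field with layout (time, lat, lon).
lats = np.linspace(30.0, 70.0, 5)   # degrees north
lons = np.linspace(-20.0, 20.0, 4)  # degrees east
anomaly = np.ones((3, lats.size, lons.size))

# sqrt(cos(lat)) weights: assumed convention, so that squared values scale with cell area.
weights = np.sqrt(np.cos(np.deg2rad(lats)))

# Broadcast the per-latitude weights over the time and longitude axes.
weighted = anomaly * weights[np.newaxis, :, np.newaxis]
```

Grid rows closer to the pole receive smaller weights, so they contribute less to any variance-based analysis performed afterwards.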

We then flatten the spatial dimensions, both to perform the dimensionality reduction through PCA and to plot the results of the modeling.

In [None]:
pivot_anomaly = flat_table(dt)
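Flattening here means turning each daily map into a single row of grid-point values. A minimal NumPy sketch of that idea follows; the (time, lat, lon) layout is an assumption, and flat_table() may work on a different data structure.

```python
import numpy as np

# Hypothetical 3D field with layout (time, lat, lon).
n_time, n_lat, n_lon = 3, 4, 5
cube = np.arange(n_time * n_lat * n_lon, dtype=float).reshape(n_time, n_lat, n_lon)

# One row per time step, one column per grid point.
flat = cube.reshape(n_time, n_lat * n_lon)

# The operation is reversible, which is what makes plotting maps from the flat table possible.
restored = flat.reshape(n_time, n_lat, n_lon)
```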

Next comes the <strong>dimensionality reduction</strong> step, either through <strong>PCA</strong>

In [None]:
reduced_anomaly = reduce_dim(pivot_anomaly, method = 'PCA', exp_variance = .9, season = 'SUMMER', load_est = 'pca_summer.pkl')
# exp_variance is either a float between 0 and 1, giving the fraction of variance to explain,
# or an integer, giving the exact number of components to keep
# load_est loads a pre-fitted PCA estimator (to be used for inference)
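The float-or-integer behaviour of exp_variance mirrors the `n_components` argument of scikit-learn's PCA. The sketch below shows both modes on synthetic data; whether reduce_dim() forwards the value to scikit-learn in exactly this way is an assumption.

```python
import numpy as np
from sklearn.decomposition import PCA

# Synthetic data: 200 samples, 10 features with decreasing variance (hypothetical).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10)) * np.linspace(5.0, 0.5, 10)

# Float in (0, 1): keep the smallest number of components reaching that explained variance.
pca_var = PCA(n_components=0.9, svd_solver="full").fit(X)

# Integer: keep exactly that many components.
pca_fixed = PCA(n_components=4).fit(X)

n_kept = pca_var.n_components_
explained = pca_var.explained_variance_ratio_.sum()
```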

or through <strong>VAE</strong>

In [None]:
reduced_anomaly = reduce_dim(dt, method = 'VAE', season = 'SUMMER', model = "sigma_vae_statedict_5_SLP")
# Note that here we pass the 3D dataset dt directly, because the VAE works on the images
# season is the folder where the VAE models are saved
# model is the file of the VAE model we want to use

We now split the data into train and test datasets, in case we want to train the clustering algorithms from scratch.

In [None]:
train_X, test_X, pivot_train, pivot_test = train_test_split(
    reduced_anomaly, pivot_anomaly, test_size = 0.2, random_state = 42)
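The unpacking order above matches scikit-learn's `train_test_split`, which returns a train and a test part for each input, in input order, and keeps the rows of all inputs aligned. The sketch below assumes the function in use behaves like the scikit-learn one; the toy arrays are hypothetical.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Row i of `reduced` starts with 2*i, row i of `pivot` starts with 5*i (toy data).
reduced = np.arange(20).reshape(10, 2)
pivot = np.arange(50).reshape(10, 5)

train_X, test_X, pivot_train, pivot_test = train_test_split(
    reduced, pivot, test_size=0.2, random_state=42)

# Recover the original row index from each array to check that the rows stay paired.
train_rows_reduced = train_X[:, 0] // 2
train_rows_pivot = pivot_train[:, 0] // 5
```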

The training loop uses 5-fold cross-validation and optimizes each model under 4 different scores. Both the models to be optimized and the scores are customizable.

In [None]:
# `season` is assumed to be defined already (e.g. 'SUMMER', as used above)
for model in ['kmeans', 'bayesian_gmm', 'gmm']:
    for scoring in ["score", "ch", "bic", "silhouette"]:
        estimator = cross_val(reduced_anomaly.values, method = model, scoring = scoring,
                              season = season, folder = 'new folder name')
# We specify again the season (root folder) and the folder where the models will be saved.
# If the folder does not exist, it is created automatically
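To make the idea of cross-validated model selection concrete, here is a minimal sketch that selects the number of K-Means clusters by mean silhouette score over 5 folds. It uses scikit-learn directly on synthetic data; the helper `cv_silhouette` is hypothetical and is not the cross_val() function above, whose search space and internals are not described in this notebook.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score
from sklearn.model_selection import KFold

# Synthetic data: three well-separated 2D blobs (hypothetical).
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [10.0, 10.0], [-10.0, 10.0]])
X = np.vstack([c + rng.normal(size=(60, 2)) for c in centers])

def cv_silhouette(X, n_clusters, n_splits=5):
    """Mean silhouette score on the held-out folds (hypothetical helper)."""
    folds = KFold(n_splits=n_splits, shuffle=True, random_state=0)
    scores = []
    for train_idx, val_idx in folds.split(X):
        model = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit(X[train_idx])
        val_labels = model.predict(X[val_idx])
        scores.append(silhouette_score(X[val_idx], val_labels))
    return float(np.mean(scores))

# Evaluate a small grid of cluster counts and keep the best one.
mean_scores = {k: cv_silhouette(X, k) for k in range(2, 6)}
best_k = max(mean_scores, key=mean_scores.get)
```

Other scores, such as Calinski-Harabasz or BIC for mixture models, can be swapped in by replacing the scoring call inside the loop.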

If we decide to skip the training procedure, or simply want to look at the performance of the models, we use the following function

In [None]:
get_statistics(f'../models/{season}/wanted folder', train_X, test_X)

Finally, we can visualize the centroids of one particular model.

For <strong>K-Means</strong>

In [None]:
estimator = load_estimator(f'../models/{season}/SLP_pca_4pcs/kmeans_model_ch.pkl')
outputs = extract_regimes(reduced_anomaly, method='kmeans', nb_regimes = None, estimator = estimator)
labels, inertias, _ = outputs
plot_regimes(pivot_anomaly, labels)

For <strong>Mixture Models</strong> (just change the name of the wanted model)

In [None]:
estimator = load_estimator(f'../models/{season}/SLP_pca_4pcs/gmm_model_silhouette.pkl')
probas, elbo, means, covariances, _ = extract_regimes(reduced_anomaly, method='gmm',
                                                      nb_regimes = None, estimator = estimator)
labels = np.argmax(probas, axis=1)
plot_regimes(pivot_anomaly, labels)
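The `np.argmax(probas, axis=1)` step turns the soft membership probabilities of a mixture model into one hard regime label per sample. A minimal sketch with scikit-learn's `GaussianMixture` on synthetic data follows; it assumes that `probas` has one row per sample and one column per regime, as the argmax call suggests.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Synthetic data: two well-separated 2D blobs (hypothetical).
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-5.0, 1.0, size=(50, 2)),
               rng.normal(5.0, 1.0, size=(50, 2))])

gmm = GaussianMixture(n_components=2, random_state=0).fit(X)

# Soft assignment: one probability per sample and component, each row summing to 1.
probas = gmm.predict_proba(X)

# Hard assignment: the most probable component for each sample.
labels = np.argmax(probas, axis=1)
```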

## Additional

To retrieve the EOFs (empirical orthogonal functions) and the PCs (principal components), we use the following function

In [None]:
eofs_, pcs = eofs(pivot_anomaly)
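EOFs and PCs can be obtained from a singular value decomposition of the centered (time, space) anomaly matrix. The sketch below shows that textbook construction with NumPy on synthetic data; the names `eof_patterns` and `pc_series` are illustrative, and the eofs() function above may use a different routine or normalization.

```python
import numpy as np

# Synthetic (time, space) anomaly matrix (hypothetical), centered along the time axis.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 12))
X = X - X.mean(axis=0)

# Thin SVD: X = U @ diag(s) @ Vt
U, s, Vt = np.linalg.svd(X, full_matrices=False)

eof_patterns = Vt   # each row is one spatial pattern (EOF), rows are orthonormal
pc_series = U * s   # each column is the time series (PC) of the matching EOF

# Fraction of total variance carried by each EOF.
explained = s**2 / np.sum(s**2)
```

Multiplying `pc_series` by `eof_patterns` reconstructs the original anomaly matrix, and truncating both to the leading modes gives the reduced representation.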

To plot the EOFs, we use instead

In [None]:
plot_EOFS(eofs_)

# Dashboard

The first step is to initialize the configuration file at <strong>src/dashboard/utils/config.json</strong>. Two example configuration files, one for the Winter season and one for the Summer season, are reported below.

To start the dashboard, run the following command in a terminal or command prompt

In [None]:
!streamlit run energy_dash.py

The options that can be selected in the dashboard are: <strong>EU-7, BE, ES, FR, GE, IT, NE, UK</strong> to look at the per-country quantifications, <strong>Sub-seasonal Forecasts</strong> to test the sub-seasonal forecast of ECMWF, <strong>Meteo-France</strong> to compare against the Meteo-France predictions, <strong>Model Dynamics</strong> to examine the transition probabilities of each model, and <strong>Comparison True Measurements and Synthetic Data</strong> to explore the additional weather and energy data.
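As an illustration of what transition probabilities between regimes mean, the sketch below estimates a first-order transition matrix from a sequence of daily regime labels. The helper `transition_matrix` and the label sequence are hypothetical; how the dashboard itself computes these probabilities is not described here.

```python
import numpy as np

def transition_matrix(labels, n_regimes):
    """Row-normalized counts of transitions from regime i (row) to regime j (column)."""
    counts = np.zeros((n_regimes, n_regimes))
    for current, following in zip(labels[:-1], labels[1:]):
        counts[current, following] += 1
    row_sums = counts.sum(axis=1, keepdims=True)
    # Rows of regimes that never occur as a starting state are left at zero.
    return np.divide(counts, row_sums, out=np.zeros_like(counts), where=row_sums > 0)

# Hypothetical sequence of daily regime labels.
labels = np.array([0, 0, 1, 1, 1, 2, 0, 0, 1])
P = transition_matrix(labels, n_regimes=3)
```

Entry `P[i, j]` is the estimated probability that a day in regime `i` is followed by a day in regime `j`.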