## ELECTRIC ENERGY DEMAND FORECASTING

Before delving into this exploration of Convolutional Neural Networks, it is recommended to first review the related material on Recurrent Neural Networks (file`2_programming_rnn.ipyn`). This sequence reflects the order in which these methods were explored, and references to the RNN model's results are made throughout this process.


### Convolucional Neural Network

In this part of the project, we will explore the Convolutional Neural Network (CNN) method to observe how the prediction results behave using this approach.  
We will begin by importing all the necessary libraries for this study.


In [None]:
import pandas as pd 
import numpy as np
import math
from sklearn.preprocessing import MinMaxScaler
from sklearn.preprocessing import LabelEncoder

from tensorflow.keras import layers, models
from sklearn.metrics import mean_squared_error

The next step is to load the data that has already been downloaded and processed in the correct format for subsequent use in training and evaluating the models. If you do not yet have the data loaded, run the`downloading_script.ipynb` script or manually download the data from the`data` folder. Once you have the data locally, you can proceed.

We load the data and perform the necessary transformations for its subsequent use.

In [None]:
df_pivoted_2020 = pd.read_csv('../data/df_pivoted/df_pivoted_2020.csv')
df_pivoted_2021 = pd.read_csv('../data/df_pivoted/df_pivoted_2021.csv')
df_pivoted_2022 = pd.read_csv('../data/df_pivoted/df_pivoted_2022.csv')
df_pivoted_2023 = pd.read_csv('../data/df_pivoted/df_pivoted_2023.csv')

df_2021 = pd.read_csv('../data/df/df_2021.csv')
df_2022 = pd.read_csv('../data/df/df_2022.csv')
df_2023 = pd.read_csv('../data/df/df_2023.csv')

# Updating specific columns for 2020 based on a specific date and applying interpolation for missing data
indx = df_pivoted_2020[df_pivoted_2020['Date'] == '2020-03-29'].index[0]
df_pivoted_2020.iloc[indx, 24:36] = [df_pivoted_2020.iloc[indx, 23] - i*(df_pivoted_2020.iloc[indx, 23] - df_pivoted_2020.iloc[indx, 36]) / 13 for i in range(1,13)]

# Updating specific columns for 2021 based on a specific date and applying interpolation for missing data
indx = df_pivoted_2021[df_pivoted_2021['Date'] == '2021-03-28'].index[0]
df_pivoted_2021.iloc[indx, 24:36] = [df_pivoted_2021.iloc[indx, 23] - i*(df_pivoted_2021.iloc[indx, 23] - df_pivoted_2021.iloc[indx, 36]) / 13 for i in range(1,13)]

# Updating specific columns for 2022 based on a specific date and applying interpolation for missing data
indx = df_pivoted_2022[df_pivoted_2022['Date'] == '2022-03-27'].index[0]
df_pivoted_2022.iloc[indx, 24:36] = [df_pivoted_2022.iloc[indx, 23] - i*(df_pivoted_2022.iloc[indx, 23] - df_pivoted_2022.iloc[indx, 36]) / 13 for i in range(1,13)]

# Updating specific columns for 2023 based on a specific date and applying interpolation for missing data
indx = df_pivoted_2023[df_pivoted_2023['Date'] == '2023-03-26'].index[0]
df_pivoted_2023.iloc[indx, 24:36] = [df_pivoted_2023.iloc[indx, 23] - i*(df_pivoted_2023.iloc[indx, 23] - df_pivoted_2023.iloc[indx, 36]) / 13 for i in range(1,13)]

# Handle missing values for 2020 by filling them with corresponding values from 2021 data
null_index = df_pivoted_2020[df_pivoted_2020.isnull().any(axis=1)].index[0]
df_pivoted_2020.iloc[null_index, 253:289] = df_2021[df_2021['Date'] == '2020-12-31']['Real']

# Handle missing values for 2021 by filling them with corresponding values from 2022 data
null_index = df_pivoted_2021[df_pivoted_2021.isnull().any(axis=1)].index[0]
df_pivoted_2021.iloc[null_index, 253:289] = df_2022[df_2022['Date'] == '2021-12-31']['Real']

# Handle missing values for 2022 by filling them with corresponding values from 2023 data
null_index = df_pivoted_2022[df_pivoted_2022.isnull().any(axis=1)].index[0]
df_pivoted_2022.iloc[null_index, 253:289] = df_2023[df_2023['Date'] == '2022-12-31']['Real']

# Train dataframe creation
train_df = pd.concat([df_pivoted_2020, df_pivoted_2021, df_pivoted_2022])
train_df = train_df.reset_index().drop(['index'], axis=1)
train_df['Date'] = pd.to_datetime(train_df['Date'])
train_df['Day_Week'] = train_df['Date'].dt.day_name()

# Test dataframe creation
test_df = df_pivoted_2023.drop(df_pivoted_2023.index[-1])
test_df['Date'] = pd.to_datetime(test_df['Date'])
test_df['Day_Week'] = test_df['Date'].dt.day_name()

# Convert the 'Date' column to datetime type
train_df['Date'] = pd.to_datetime(train_df['Date'])
test_df['Date'] = pd.to_datetime(test_df['Date'])

# Encode the 'Day_Week' column into numeric variables
label_encoder = LabelEncoder()
train_df['Day_Week'] = label_encoder.fit_transform(train_df['Day_Week'])
test_df['Day_Week'] = label_encoder.fit_transform(test_df['Day_Week'])

Next, we will scale our data, and here you have two versions of the dataset. One is the version I initially used for testing where I did not consider the day of the week data, and the other includes it (which is the version I ultimately used). I am providing both versions so you can experiment with them as well.

In [3]:
scaler = MinMaxScaler(feature_range=(0, 1))
train_scaled = scaler.fit_transform(train_df.drop(['Date', 'Day_Week'], axis=1))
test_scaled = scaler.fit_transform(test_df.drop(['Date', 'Day_Week'], axis=1))

In [4]:
scaler = MinMaxScaler(feature_range=(0, 1))
train_scaled = scaler.fit_transform(train_df.drop(['Date'], axis=1))
test_scaled = scaler.fit_transform(test_df.drop(['Date'], axis=1))

In the next step, similar to the Recurrent Neural Networks, I need to create functions that allow me to reorder the data in the format required by the models, so that the pre-existing libraries can be utilized. In this case, the two versions I created are the same as with the RNN. Version 1 (which I use) takes the data from the last 21 days to predict day 22. Version 2, on the other hand, also includes the information from the same day of the week for the last 4 weeks. In practice, version 1 yielded the best results for me, but there is always the option to create new versions of the functions.

In [5]:
# convert an array of values into a dataset matrix
def create_dataset_v1(dataset, look_back=21):
	dataX, dataY = [], []
	for i in range(len(dataset)-look_back-1):
		a = dataset[i:(i+look_back)]
		dataX.append(a)
		dataY.append(dataset[i + look_back])
	return np.array(dataX), np.array(dataY)

def create_dataset_v2(dataset, look_back):
	dataX, dataY = [], []
	for i in range(len(dataset)-29):
		mismo_dia = [(x * 7) + i for x in list(range(4))]
		uno = dataset[mismo_dia[-1]+7-look_back:mismo_dia[-1]+7]
		dos = dataset[mismo_dia]
		a = np.concatenate((uno,dos))
		dataX.append(a)
		dataY.append(dataset[mismo_dia[-1]+7])
	return np.array(dataX), np.array(dataY)

We use version 1 of the function to properly reorder my dataset.

In [6]:
look_back = 21
trainX, trainY = create_dataset_v1(train_scaled, look_back)
testX, testY = create_dataset_v1(test_scaled, look_back)

Next, I will show you the structure your dataset should have. As you can see, the training dataset consists of 1074 records (corresponding to 3 years of training data), with 289 columns (representing the data collected for each day: 288 demand data points for every 5 minutes and 1 for the day of the week). Additionally, both the training and test sets have 21, representing the number of past days' data that the algorithm uses to predict the next day.

In [9]:
print("Train x shape:", trainX.shape)
print("Test x shape:", testX.shape)

print("Train y shape:", trainY.shape)
print("Test y shape:", testY.shape)

Train x shape: (1074, 21, 289)
Test x shape: (342, 21, 289)
Train y shape: (1074, 289)
Test y shape: (342, 289)


### Creation of the CNN model

We will use the same TensorFlow library that we used for creating the RNN model. Let's define the model parameters.

In [None]:
# Define the CNN model
model = models.Sequential([
    # Convolutional layer with 64 filters and ReLU activation function
    layers.Conv1D(64, kernel_size=1, activation='relu', input_shape=(21, 289)),
    # Max pooling layer with a window size of 3
    layers.MaxPooling1D(pool_size=3),
    # Convolutional layer with 64 filters and ReLU activation function
    layers.Conv1D(64, kernel_size=4, activation='relu'),
    # Max pooling layer with a window size of 4
    layers.MaxPooling1D(pool_size=4),
    # Flatten layer to connect with fully connected layers
    layers.Flatten(),
    # Fully connected layer with 64 neurons and ReLU activation function
    layers.Dense(64, activation='relu'),
    # Output layer with 289 neurons
    layers.Dense(289)
])
# Compile the model
model.compile(optimizer='adam', loss='mse', metrics=['mae'])

# Model summary
model.summary()

Model: "sequential_2"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 conv1d_4 (Conv1D)           (None, 21, 64)            18560     
                                                                 
 max_pooling1d_4 (MaxPooling  (None, 7, 64)            0         
 1D)                                                             
                                                                 
 conv1d_5 (Conv1D)           (None, 4, 64)             16448     
                                                                 
 max_pooling1d_5 (MaxPooling  (None, 1, 64)            0         
 1D)                                                             
                                                                 
 flatten_2 (Flatten)         (None, 64)                0         
                                                                 
 dense_4 (Dense)             (None, 64)               

This model is a Convolutional Neural Network (CNN) designed to work with multivariate time series data. Below is an explanation of each layer:

**Conv1D Layer**: This is the first convolutional layer. Conv1D refers to a one-dimensional convolution, which is suitable for time series data. It has 64 filters, meaning the layer will learn 64 different patterns from the input data. Each filter has a size of 1 (kernel_size), meaning each time the filter is applied, one consecutive time point will be considered. The ReLU activation function is applied after the convolution. The input shape specified is (21, 289), meaning each sample has a length of 21 time points, and each time point has 289 features (variables).

**MaxPooling1D Layer**: This is a one-dimensional max pooling layer. MaxPooling1D is used to reduce the dimensionality of the output from the convolutional layer while preserving the most important features. Here, a max pooling operation with a window size of 3 is applied, reducing the length of the resulting sequence by one-third.

**Second Conv1D Layer**: This is another convolutional layer similar to the first one, with 64 filters.

**Second MaxPooling1D Layer**: Another max pooling layer that further reduces the dimensionality of the output.

**Flatten Layer**: This layer is used to flatten the output from the convolutional layers and prepare it to be fed into the fully connected layers.

**Dense Layer**: This is a fully connected layer with 64 neurons and the ReLU activation function. Dense layers are used to learn non-linear relationships in the data.

**Dense Output Layer**: This is the final output layer. It has 289 neurons, matching the number of features in your output data. This layer does not have a specific activation function, meaning the output is not restricted to any particular range.

We now train the model for 50 epochs with a batch_size of 1.

In [12]:
model.fit(trainX, trainY, epochs=50, batch_size=1, verbose=0)

<keras.callbacks.History at 0x20fbe9ddd50>

Next, we will evaluate the model's performance using the test dataset.

In [13]:
trainY_inv = scaler.inverse_transform(trainY)
testY_inv = scaler.inverse_transform(testY)
# make predictions
trainPredict = model.predict(trainX)
testPredict = model.predict(testX)
# invert predictions
trainPredict = scaler.inverse_transform(trainPredict)
testPredict = scaler.inverse_transform(testPredict)
# calculate root mean squared error
trainScore = math.sqrt(mean_squared_error(trainY_inv, trainPredict))
print('Train Score: %.2f RMSE' % (trainScore))
testScore = math.sqrt(mean_squared_error(testY_inv, testPredict))
print('Test Score: %.2f RMSE' % (testScore))

Train Score: 1.39 RMSE
Test Score: 2.29 RMSE


With these parameters, we obtain an RMSE of 1.39 GW on the test set. However, with the evaluation set, we get an RMSE of 2.29, which is not very good considering the results achieved with the RNN model. 

Next, just as we did with the RNN model, we will use nested loops to test different combinations of parameters such as the number of neurons and the number of epochs. We will store all the results in a dataframe.

In [None]:
df_nn = pd.DataFrame(columns=['Neurons', 'Epochs', 'Train RMSE', 'Test RMSE'])
trainY_inv = scaler.inverse_transform(trainY)
testY_inv = scaler.inverse_transform(testY)

for neurons_number in [20, 50, 100]:
    for num_epochs in [20, 50, 100]:

        model = models.Sequential([
            layers.Conv1D(neurons_number, kernel_size=6, activation='relu', input_shape=(21, 289)),
            layers.MaxPooling1D(pool_size=4),
            layers.Conv1D(64, kernel_size=2, activation='relu'),
            layers.MaxPooling1D(pool_size=3),
            layers.Flatten(),
            layers.Dense(64, activation='relu'),
            layers.Dense(289)
        ])
        model.compile(optimizer='adam', loss='mse', metrics=['mae'])
        model.fit(trainX, trainY, epochs=num_epochs, batch_size=1, verbose=0)

        # make predictions
        trainPredict = model.predict(trainX)
        testPredict = model.predict(testX)

        
        # invert predictions
        trainPredict = scaler.inverse_transform(trainPredict)
        testPredict = scaler.inverse_transform(testPredict)
        
        # calculate root mean squared error
        trainScore = math.sqrt(mean_squared_error(trainY_inv, trainPredict))
        testScore = math.sqrt(mean_squared_error(testY_inv, testPredict))

        new_row = {'Neurons': neurons_number, 'Epochs': num_epochs, 'Train RMSE': trainScore, 'Test RMSE': testScore}
        df_nn = pd.concat([df_nn, pd.DataFrame([new_row])])
        print(df_nn)

  Neurons Epochs  Train RMSE  Test RMSE
0      20     20    1.464021   1.886013
  Neurons Epochs  Train RMSE  Test RMSE
0      20     20    1.464021   1.886013
0      20     50    1.393362   1.914653
  Neurons Epochs  Train RMSE  Test RMSE
0      20     20    1.464021   1.886013
0      20     50    1.393362   1.914653
0      20    100    1.294735   1.961142
  Neurons Epochs  Train RMSE  Test RMSE
0      20     20    1.464021   1.886013
0      20     50    1.393362   1.914653
0      20    100    1.294735   1.961142
0      50     20    1.801341   2.127014
  Neurons Epochs  Train RMSE  Test RMSE
0      20     20    1.464021   1.886013
0      20     50    1.393362   1.914653
0      20    100    1.294735   1.961142
0      50     20    1.801341   2.127014
0      50     50    1.254227   1.731056
  Neurons Epochs  Train RMSE  Test RMSE
0      20     20    1.464021   1.886013
0      20     50    1.393362   1.914653
0      20    100    1.294735   1.961142
0      50     20    1.801341   2.127014


We can observe that the results improve compared to the initial test we conducted, but they are still worse than those obtained with the RNN model.

During the time I was searching for the model that would yield the best solution, I also experimented, as mentioned, with different dataset sizes. For example, this case is the evaluation using the data from the last 27 days (4 weeks).

In [18]:
df_nn = pd.DataFrame(columns=['Neurons', 'Epochs', 'Train RMSE', 'Test RMSE'])
trainY_inv = scaler.inverse_transform(trainY)
testY_inv = scaler.inverse_transform(testY)

for neurons_number in [20, 50, 100]:
    for num_epochs in [20, 50, 100]:

        model = models.Sequential([
            layers.Conv1D(neurons_number, kernel_size=7, activation='relu', input_shape=(27, 289)),
            layers.MaxPooling1D(pool_size=3),
            layers.Conv1D(64, kernel_size=4, activation='relu'),
            layers.MaxPooling1D(pool_size=4),
            layers.Flatten(),
            layers.Dense(64, activation='relu'),
            layers.Dense(289)
        ])
        model.compile(optimizer='adam', loss='mse', metrics=['mae'])
        model.fit(trainX, trainY, epochs=num_epochs, batch_size=1, verbose=0)

        # make predictions
        trainPredict = model.predict(trainX)
        testPredict = model.predict(testX)

        
        # invert predictions
        trainPredict = scaler.inverse_transform(trainPredict)
        testPredict = scaler.inverse_transform(testPredict)
        
        # calculate root mean squared error
        trainScore = math.sqrt(mean_squared_error(trainY_inv, trainPredict))
        testScore = math.sqrt(mean_squared_error(testY_inv, testPredict))

        new_row = {'Neurons': neurons_number, 'Epochs': num_epochs, 'Train RMSE': trainScore, 'Test RMSE': testScore}
        df_nn = pd.concat([df_nn, pd.DataFrame([new_row])])
        print(df_nn)

  Neurons Epochs  Train RMSE  Test RMSE
0      20     20    1.547657   1.963241
  Neurons Epochs  Train RMSE  Test RMSE
0      20     20    1.547657   1.963241
0      20     50    1.402327   2.002637
  Neurons Epochs  Train RMSE  Test RMSE
0      20     20    1.547657   1.963241
0      20     50    1.402327   2.002637
0      20    100    1.351819   2.223125
  Neurons Epochs  Train RMSE  Test RMSE
0      20     20    1.547657   1.963241
0      20     50    1.402327   2.002637
0      20    100    1.351819   2.223125
0      50     20    1.268707   1.785111
  Neurons Epochs  Train RMSE  Test RMSE
0      20     20    1.547657   1.963241
0      20     50    1.402327   2.002637
0      20    100    1.351819   2.223125
0      50     20    1.268707   1.785111
0      50     50    1.098674   1.698830
  Neurons Epochs  Train RMSE  Test RMSE
0      20     20    1.547657   1.963241
0      20     50    1.402327   2.002637
0      20    100    1.351819   2.223125
0      50     20    1.268707   1.785111


Since the results are very close but the difficulty of the model increases to a greater extent, I decided to stick with the option of using 21 days. The next step is to find the optimal value for the number of neurons. To do this, we will use the GridSearchCV function, testing different values.

In [None]:
from sklearn.model_selection import GridSearchCV
from keras import models, layers
from keras.wrappers.scikit_learn import KerasRegressor

# Function to create the CNN model
def create_cnn(neurons_number=64):
    model = models.Sequential([
        layers.Conv1D(neurons_number, kernel_size=6, activation='relu', input_shape=(21, 289)),
        layers.MaxPooling1D(pool_size=4),
        layers.Conv1D(64, kernel_size=3, activation='relu'),
        layers.MaxPooling1D(pool_size=2),
        layers.Flatten(),
        layers.Dense(64, activation='relu'),
        layers.Dense(289)
    ])
    model.compile(optimizer='adam', loss='mse', metrics=['mae'])
    return model

# Create the KerasRegressor model
model = KerasRegressor(build_fn=create_cnn, epochs=100, batch_size=1, verbose=0)

# Define the hyperparameters for grid search
param_grid = {'neurons_number': [32, 64, 128]}  # You can adjust the values as needed

# Perform grid search
grid_search = GridSearchCV(estimator=model, param_grid=param_grid, scoring='neg_mean_squared_error', cv=5)
grid_result = grid_search.fit(trainX, trainY)

# Display the results
print("Best result: %f using %s" % (grid_result.best_score_, grid_result.best_params_))

  model = KerasRegressor(build_fn=create_cnn, epochs=100, batch_size=1, verbose=0)


Mejor resultado: -0.007537 usando {'neurons_number': 64}


We can observe that the optimal number of neurons is 64. Therefore, I will create and evaluate a model with these parameters to compare it with the best model obtained in the RNN study.

In [19]:
model = models.Sequential([
    layers.Conv1D(64, kernel_size=6, activation='relu', input_shape=(21, 289)),
    layers.MaxPooling1D(pool_size=4),
    layers.Conv1D(64, kernel_size=3, activation='relu'),
    layers.MaxPooling1D(pool_size=2),
    layers.Flatten(),
    layers.Dense(64, activation='relu'),
    layers.Dense(289)
])
# Compilar el modelo
model.compile(optimizer='adam', loss='mse', metrics=['mae'])
model.fit(trainX, trainY, epochs=100, batch_size=1, verbose=0)

# make predictions
trainPredict = model.predict(trainX)
testPredict = model.predict(testX)

# invert predictions
trainPredict = scaler.inverse_transform(trainPredict)
testPredict = scaler.inverse_transform(testPredict)

trainScore = math.sqrt(mean_squared_error(trainY_inv, trainPredict))
testScore = math.sqrt(mean_squared_error(testY_inv, testPredict))
print('Train Score: %.2f RMSE' % (trainScore))
print('Test Score: %.2f RMSE' % (testScore))

Train Score: 1.26 RMSE
Test Score: 2.26 RMSE


Now I will test with different values of kernel_size and pool_size to see if it changes the result.

In [18]:
model = models.Sequential([
    layers.Conv1D(64, kernel_size=1, activation='relu', input_shape=(21, 289)),
    layers.MaxPooling1D(pool_size=3),
    layers.Conv1D(64, kernel_size=4, activation='relu'),
    layers.MaxPooling1D(pool_size=4),
    layers.Flatten(),
    layers.Dense(64, activation='relu'),
    layers.Dense(289)
])
# Compilar el modelo
model.compile(optimizer='adam', loss='mse', metrics=['mae'])
model.fit(trainX, trainY, epochs=100, batch_size=1, verbose=0)

# make predictions
trainPredict = model.predict(trainX)
testPredict = model.predict(testX)

# invert predictions
trainPredict = scaler.inverse_transform(trainPredict)
testPredict = scaler.inverse_transform(testPredict)

trainScore = math.sqrt(mean_squared_error(trainY_inv, trainPredict))
testScore = math.sqrt(mean_squared_error(testY_inv, testPredict))
print('Train Score: %.2f RMSE' % (trainScore))
print('Test Score: %.2f RMSE' % (testScore))

Train Score: 1.56 RMSE
Test Score: 2.27 RMSE


In my case, I will not continue exploring this path, as the results obtained have not been promising. After analyzing the potential of RNNs and CNNs, it is evident from the results that RNNs are better at handling and predicting time series data. CNNs, on the other hand, are not able to capture this relationship effectively, which is why their results are considerably worse than those of RNNs.

However, there is still much more to explore in this area. It is possible to optimize many more parameters (not just the number of neurons) as well as all the other parameters used in the model. One can also experiment with modifying the input dataset, both by altering its format (such as adding more than 21 days of data to predict day number 22) and by including additional fields of interest (for example, weather data).