# LSTM Utilization Prediction

This Jupyter notebook tests whether the utilization of hardware can be predicted from its historical utilization.
For this, a Long Short-Term Memory (LSTM) neural network is used.

LSTMs are a special kind of recurrent neural network (RNN) that is capable of learning long-term dependencies.
This property fits our use case of predicting a future time series from a past time series.


## Resources 

- This notebook relies on the following sources:
  - [How to apply LSTM using PyTorch](https://cnvrg.io/pytorch-lstm/)
  - [Predicting-cloud-CPU-usage-on-Azure-data](https://github.com/amcs1729/Predicting-cloud-CPU-usage-on-Azure-data)
- [Further Reading on LSTM](https://colah.github.io/posts/2015-08-Understanding-LSTMs/)

## Importing the Python Modules

*Note: If you encounter an error while loading the modules, see the README.md for installation instructions.*

In [None]:
# statistical utilities, e.g. scaling the dataset and computing error metrics

from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error, mean_absolute_error
from sklearn.preprocessing import MinMaxScaler, StandardScaler
# plotting the data
import matplotlib.pyplot as plt
# used for the dataframes
import pandas as pd
# transforming dataframes into arrays
# and those arrays to Tensors, the ML approach can work with
import numpy as np
%matplotlib inline

# required for the LSTM model
import torch
import torch.nn as nn
from torch.autograd import Variable

## Loading the Dataframe

In this cell the dataframe with the machine utilization data will be loaded and prepared if necessary.

In [None]:
df = pd.read_csv('training_machine_sorted_df.csv')
df['timestamp'] = pd.to_datetime(df['start_date'])
df = df.set_index('timestamp')
# df = df.sort_index()
df.drop(columns=['start_date'], inplace=True)
df.head()

In [None]:
df.groupby('machine').count()

df = df.query("machine == 'ffb1bc4dc2fbb09d0477f0f0'")
df = df.drop(columns=['machine', 'gpu_type', 'job_name'])
df

## Add One-Hot Encoded Columns for Taskname

In order to process categorical data, in this case the `task_name` column, we need to encode it.

For this, we use the `pandas.get_dummies()` method that returns the `task_name` column as one-hot encoded columns.

In [None]:
dummies = pd.get_dummies(df.task_name)
dummies

### Add the One-Hot Encoded Columns

After generating the one-hot encoded columns for `task_name`, we append them to the dataframe.
Afterwards, we remove the `task_name` column since it is now represented by those appended columns.

In [None]:
df = df.join(dummies)
df.drop(columns=['task_name'], inplace=True)
df
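
The encoding and join steps above can be sketched on a toy frame. The column values and task names below are made up for illustration and are not taken from the dataset; the `.astype(int)` call is only there to make the indicator columns print as 0/1 regardless of the pandas version.

```python
import pandas as pd

# Toy frame; the task names are invented for this example.
toy = pd.DataFrame({
    "cpu_usage": [10.0, 55.0, 30.0],
    "task_name": ["worker", "ps", "worker"],
})

# One indicator column per distinct task name.
toy_dummies = pd.get_dummies(toy["task_name"]).astype(int)

# Append the indicator columns and drop the original categorical column.
toy_encoded = toy.join(toy_dummies).drop(columns=["task_name"])
print(toy_encoded)
```

Each row ends up with exactly one non-zero indicator, so the categorical information is preserved in a purely numeric form.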

## Splitting the Dataframe into Train and Test Sets

In [None]:
TRAIN_LENGTH = round(len(df) * 0.8)
TEST_LENGTH = len(df) - TRAIN_LENGTH
train = df.iloc[0:TRAIN_LENGTH]
test = df[TRAIN_LENGTH:]

df.columns

## Scaling the Datasets

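In this step the features are standardized with `StandardScaler` (zero mean, unit variance per column) and the target columns are scaled to the [0, 1] interval with `MinMaxScaler`.
Bringing all columns onto comparable scales helps to keep columns with large raw values from dominating the training.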

[Code Source](https://cnvrg.io/pytorch-lstm/)

In [None]:
ss_scaler = StandardScaler()
mm_scaler = MinMaxScaler()

# columns the model should predict
y_range = ['cpu_usage', 'gpu_wrk_util', 'avg_mem', 'max_mem',
           'avg_gpu_wrk_mem', 'max_gpu_wrk_mem', 'runtime']

# fit both scalers on the same training rows so features and targets stay aligned
X_ss = pd.DataFrame(ss_scaler.fit_transform(train))
y_mm = pd.DataFrame(mm_scaler.fit_transform(train[y_range]))
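
As a rough illustration of what the two scalers compute with their default settings, the arithmetic can be written out in plain NumPy. The numbers below are made up, and the sketch assumes that the scaling statistics are taken from the training rows only and then reused for unseen rows.

```python
import numpy as np

# Made-up utilization values: rows are time steps, columns are features.
train_values = np.array([[10.0, 200.0], [20.0, 400.0], [40.0, 300.0]])
test_values = np.array([[30.0, 500.0]])

# Min-max scaling: the statistics come from the training rows only.
col_min = train_values.min(axis=0)
col_max = train_values.max(axis=0)
train_mm = (train_values - col_min) / (col_max - col_min)
test_mm = (test_values - col_min) / (col_max - col_min)  # can fall outside [0, 1]

# Standardization: zero mean and unit variance per column.
col_mean = train_values.mean(axis=0)
col_std = train_values.std(axis=0)
train_ss = (train_values - col_mean) / col_std

# Inverting the min-max scaling recovers the original units.
restored = train_mm * (col_max - col_min) + col_min
print(train_mm, test_mm, train_ss, restored, sep="\n")
```

Unseen rows that exceed the training range map to values outside [0, 1], which is worth keeping in mind when interpreting scaled predictions.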

## Split the Dataset

The scaled data is now split into a training part (the first 600 rows) and a test part (the following 100 rows).

*Note: To later be able to convert the dataset into Tensors, it is necessary to convert them to numpy arrays via `.to_numpy()`*.

In [None]:
TRAIN_SPLIT: int = 600
TEST_SPLIT = TRAIN_SPLIT + 100

X_train = X_ss[:TRAIN_SPLIT].to_numpy()
X_test = X_ss[TRAIN_SPLIT:TEST_SPLIT].to_numpy()

y_train = y_mm[:TRAIN_SPLIT].to_numpy()
y_test = y_mm[TRAIN_SPLIT:TEST_SPLIT].to_numpy()

print("Training Shape", X_train.shape, y_train.shape)
print("Testing Shape", X_test.shape, y_test.shape) 

## Converting the Datasets to Tensors

In order to be able to use the datasets with PyTorch, we first have to convert them to Tensors.

In [None]:
X_train_tensors = Variable(torch.Tensor(X_train))
X_test_tensors = Variable(torch.Tensor(X_test))

y_train_tensors = Variable(torch.Tensor(y_train))
y_test_tensors = Variable(torch.Tensor(y_test))

y_train_tensors.shape

## Reshaping to Rows, Timestamps and Features

In the reshaping step, we add an additional dimension.

This is necessary because LSTMs are built for sequential data: with `batch_first=True`, `nn.LSTM` expects batched input as a 3-D tensor of shape `(batch, sequence length, features)` rather than as a 2-D table.
Here the sequence length is 1, so every row becomes its own single-step sequence.

In [None]:
# Reshaping
X_train_tensors_final = torch.reshape(X_train_tensors, (X_train_tensors.shape[0], 1, X_train_tensors.shape[1]))
X_test_tensors_final = torch.reshape(X_test_tensors, (X_test_tensors.shape[0], 1, X_test_tensors.shape[1]))

print("Training Shape", X_train_tensors_final.shape, y_train_tensors.shape)
print("Testing Shape", X_test_tensors_final.shape, y_test_tensors.shape) 
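
With a sequence length of 1, each sample contains a single time step. A common alternative, shown here only as a sketch and not as something this notebook does, is to stack several consecutive rows into one sample with a sliding window. The helper `make_windows` and the window length of 3 are assumptions made for illustration.

```python
import numpy as np

def make_windows(features, targets, seq_len):
    """Stack seq_len consecutive rows per sample; the target is the row after the window."""
    xs, ys = [], []
    for start in range(len(features) - seq_len):
        xs.append(features[start:start + seq_len])
        ys.append(targets[start + seq_len])
    return np.stack(xs), np.stack(ys)

# 10 made-up time steps with 2 features each.
features = np.arange(20, dtype=np.float32).reshape(10, 2)
targets = features[:, :1]

X, y = make_windows(features, targets, seq_len=3)
print(X.shape, y.shape)
```

The resulting `X` has the `(samples, sequence length, features)` layout, so it could be converted to a tensor and passed to a batch-first LSTM without a further reshape.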

## Create the LSTM Model



In [None]:
class LSTM(nn.Module):

    def __init__(self, num_classes: int, input_size: int, hidden_size: int, num_layers: int, seq_length: int) -> None:
        super(LSTM, self).__init__()
        self.num_classes: int = num_classes
        self.input_size: int = input_size
        self.hidden_size: int = hidden_size
        self.num_layers: int = num_layers
        self.seq_length: int = seq_length

        # long-short term memory layer
        self.lstm = nn.LSTM(
            input_size=input_size,
            hidden_size=hidden_size,
            num_layers=num_layers,
            batch_first=True)

        # first fully connected layer
        self.fc_1 = nn.Linear(hidden_size, 256)
        # second fully connected layer
        self.fc_2 = nn.Linear(256, 128)
        # third fully connected layer
        self.fc_3 = nn.Linear(128, num_classes)
        # activation function
        self.relu = nn.LeakyReLU()

    def forward(self, input):
        hidden_state = Variable(torch.zeros(self.num_layers, input.size(0), self.hidden_size))
        internal_state = Variable(torch.zeros(self.num_layers, input.size(0), self.hidden_size))

        # Propagate input through LSTM
        output, (hn, cn) = self.lstm(input, (hidden_state, internal_state))
        # Reshaping the data for the Dense layer
        hn = hn.view(-1, self.hidden_size)
        out = self.relu(hn)
        out = self.fc_1(out)
        out = self.relu(out)
        out = self.fc_2(out)
        out = self.relu(out)
        out = self.fc_3(out)
        
        return out
    

## Defining some hyperparameters

In the following cell, some hyperparameters are defined for further usage.

In [None]:
num_epochs: int = 1000
learning_rate: float = 0.005

# number of features
input_size: int = len(train.columns)
# number of features in hidden state
hidden_size: int = len(train.columns) * 2
# number of stacked lstm layers
num_layers: int = 1
# number of output classes
num_classes: int = len(y_range)

## Instantiate the LSTM object

In [None]:
lstm = LSTM(num_classes, input_size, hidden_size, num_layers, X_train_tensors_final.shape[1])
lstm.train()

## Define the Loss Function and Optimizer

In [None]:
# mean squared error for regression
criterion = nn.MSELoss()
# optimizer function
optimizer = torch.optim.AdamW(lstm.parameters(), lr=learning_rate)

## Training Loop

In the following, the training of the LSTM model is done.

In [None]:
for epoch in range(num_epochs):
    # forward pass
    outputs = lstm(X_train_tensors_final)
    # reset the gradients accumulated in the previous iteration
    optimizer.zero_grad()

    # compute the loss between predictions and targets
    loss = criterion(outputs, y_train_tensors)

    # backpropagation: compute the gradients of the loss
    loss.backward()

    # update the model parameters using the gradients
    optimizer.step()
    if epoch % 100 == 0:
        print("Epoch: %d, loss: %1.5f" % (epoch, loss.item()))


In [None]:
# reuse the scalers that were fitted above
df_X_ss = ss_scaler.transform(df)
df_y_mm = mm_scaler.transform(df[y_range])

# converting to Tensors
df_X_ss = Variable(torch.Tensor(df_X_ss))
df_y_mm = Variable(torch.Tensor(df_y_mm))

# reshaping the dataset
df_X_ss = torch.reshape(df_X_ss, (df_X_ss.shape[0], 1, df_X_ss.shape[1]))


In [None]:
# Evaluation Mode
lstm.eval()

# forward pass without gradient tracking
with torch.no_grad():
    train_predict = lstm(df_X_ss)
data_predict = train_predict.numpy()
dataY_plot = df_y_mm.numpy()

# reverse transformation: both arrays live in the MinMax-scaled target space,
# so the already fitted mm_scaler maps them back to the original units
# (refitting a scaler here would overwrite the fitted statistics)
data_predict = mm_scaler.inverse_transform(data_predict)
dataY_plot = mm_scaler.inverse_transform(dataY_plot)

In [None]:
data_predict_df = pd.DataFrame(data_predict, columns=y_range)
data_y_plot_df = pd.DataFrame(dataY_plot, columns=y_range)

In [None]:
df.columns

### Root Mean Squared Error (RMSE)

$\operatorname{RMSE}(\hat{\theta}) = \sqrt{\operatorname{MSE}(\hat{\theta})} = \sqrt{\operatorname{E}((\hat{\theta}-\theta)^2)}$

In [None]:
import math

def get_rmse(actual_values, predicted_values) -> float:
    return math.sqrt(mean_squared_error(actual_values, predicted_values))

rmse_result = get_rmse(dataY_plot[:], data_predict_df[:])
print(f'Test Score: {rmse_result:.2f} RMSE')

### Mean Absolute Error (MAE)

$\mathrm{MAE} = \frac{\sum_{i=1}^{n}\left|y_{i}-x_{i}\right|}{n} = \frac{\sum_{i=1}^{n}\left|e_{i}\right|}{n}$

In [None]:
mae_result = mean_absolute_error(dataY_plot[:], data_predict_df[:])
print(f'Test Score: {mae_result} MAE')
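
To make the RMSE and MAE formulas above concrete, here is a small worked example. The numbers are made up and not taken from the dataset, and the sketch only assumes NumPy.

```python
import numpy as np

actual = np.array([2.0, 4.0, 6.0, 8.0])
predicted = np.array([1.0, 4.0, 8.0, 8.0])

errors = predicted - actual            # [-1, 0, 2, 0]
rmse = np.sqrt(np.mean(errors ** 2))   # sqrt((1 + 0 + 4 + 0) / 4)
mae = np.mean(np.abs(errors))          # (1 + 0 + 2 + 0) / 4

print(f"RMSE: {rmse:.4f}, MAE: {mae:.4f}")
```

Because the errors are squared before averaging, the single error of 2 weighs more heavily in the RMSE than in the MAE.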

### Mean Absolute Percentage Error (MAPE)

$\mathrm{MAPE} = \frac{100\%}{n}\sum_{t=1}^{n}\left|\frac{A_{t}-F_{t}}{A_{t}}\right|$

The **mean absolute percentage error (MAPE)** is a measure of the prediction accuracy of a forecasting method.

$A_t$ is the actual value and $F_t$ is the predicted value. Their difference is divided by the actual value $A_t$.

The absolute value of this ratio is summed over every predicted point in time and divided by the number of fitted points $n$.

In [None]:
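def get_mape(actual_values, predicted_values) -> float:
    # convert to plain arrays so the mean is taken over all entries
    actual = np.asarray(actual_values)
    predicted = np.asarray(predicted_values)
    # note: entries where the actual value is 0 produce inf or nan
    return float(np.mean(np.abs(actual - predicted) / np.abs(actual)) * 100)

mape_result = get_mape(dataY_plot, data_predict_df)
print(f'Test Score: {mape_result:.2f} MAPE')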
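
One caveat of the MAPE formula is the division by $A_t$: if an actual value is 0, the ratio is undefined. The following is a minimal sketch of one possible workaround, assuming it is acceptable to skip such entries. `safe_mape` is a hypothetical helper and not part of this notebook, and the numbers are made up.

```python
import numpy as np

def safe_mape(actual, predicted):
    """MAPE in percent over the entries whose actual value is non-zero."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    mask = actual != 0  # skip entries that would divide by zero
    ratios = np.abs((actual[mask] - predicted[mask]) / actual[mask])
    return float(np.mean(ratios) * 100)

actual = np.array([100.0, 50.0, 0.0, 200.0])
predicted = np.array([110.0, 45.0, 5.0, 180.0])

# The third entry is ignored; the remaining three each deviate by 10 %.
print(safe_mape(actual, predicted))
```

Skipping entries changes what the score measures, so the number of ignored points should be reported alongside the result.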

In [None]:
def plot_column(actual_values = data_y_plot_df, predicted_values = data_predict_df, column_number: int = 0, rmse_threshold: float = 0.30):
    
    if len(y_range) <= column_number:
        print('Out of Prediction Bounds')
        return
    
    plt.figure(figsize=(25, 15))  # plotting

    column = y_range[column_number]

    rmse = get_rmse(actual_values[column], predicted_values[column])
    mae = mean_absolute_error(actual_values[column], predicted_values[column])
    
    predicted_color = 'green' if rmse < rmse_threshold else 'orange'

    plt.plot(actual_values[column], label=column, color='black')  # actual plot
    plt.plot(predicted_values[column], label='pred_' + column, color=predicted_color)  # predicted plot
    
    plt.title('Time-Series Prediction')
    plt.plot([], [], ' ', label=f'RMSE: {rmse}')
    plt.plot([], [], ' ', label=f'MAE: {mae}')
    plt.legend()
    plt.show()


In [None]:
plot_column(column_number=0)

In [None]:
plot_column(column_number=1)

In [None]:
plot_column(column_number=2)

In [None]:
plot_column(column_number=3)

In [None]:
plot_column(column_number=4)

In [None]:
plot_column(column_number=5)

In [None]:
plot_column(column_number=6)