# <span style="font-weight:bold; font-size: 3rem; color:#2656a3;">**MSc. BDS - M7 Second Semester Project** </span> <span style="font-weight:bold; font-size: 3rem; color:#333;">- Part 03: Training Pipeline</span>

## <span style='color:#2656a3'> 🗒️ This notebook is divided into the following sections:</span>
1. Feature selection.
2. Feature View creation.
3. Training dataset creation: splitting into train and test sets.
4. Model training.
5. Model registration in the Hopsworks Model Registry.

## <span style='color:#2656a3'> ⚙️ Import of libraries and packages</span>
We start by importing the libraries needed for this notebook and suppressing warnings to keep the output clean.

In [1]:
# Importing the packages and libraries
import pandas as pd
import numpy as np
from sklearn.preprocessing import MinMaxScaler
import matplotlib.pyplot as plt
import tensorflow as tf
import os

# Ignore warnings
import warnings
warnings.filterwarnings('ignore')
warnings.filterwarnings("ignore", category=DeprecationWarning)

## <span style="color:#2656a3;"> 📡 Connecting to Hopsworks Feature Store</span>
We connect to Hopsworks Feature Store so we can retrieve the Feature Groups and select features for training data.

In [2]:
# Importing the hopsworks module for interacting with the Hopsworks platform
import hopsworks

# Logging into the Hopsworks project
project = hopsworks.login()

# Getting the feature store from the project
fs = project.get_feature_store() 

Connected. Call `.close()` to terminate connection gracefully.

Logged in to project, explore it here https://c.app.hopsworks.ai:443/p/554133
Connected. Call `.close()` to terminate connection gracefully.


In [3]:
# Retrieve the feature groups
electricity_fg = fs.get_feature_group(
    name='electricity_prices',
    version=1,
)

electricity_price_window_fg = fs.get_feature_group(
    name='electricity_price_window',
    version=1,
)

weather_fg = fs.get_feature_group(
    name='weather_measurements',
    version=1,
)

danish_calendar_fg = fs.get_feature_group(
    name='dk_calendar',
    version=1,
)

## <span style="color:#2656a3;"> 🖍 Feature View Creation and Retrieving </span>

We first select the features that we want to include for model training.

Since we specified `date` and `timestamp` as the `primary_key` in `1_feature_backfill`, we can now join `electricity_fg`, `electricity_price_window_fg`, `weather_fg` and `danish_calendar_fg` on these keys. Duplicate columns are excluded from the joined feature groups with `select_except()`.

`join_type` specifies the type of join to perform. An inner join keeps only the rows whose keys are present in all joined feature groups.
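The inner-join behaviour can be illustrated with plain pandas. This is only an analogy: Hopsworks runs the join itself when the query executes. The two small frames below are hypothetical stand-ins for feature groups that share a `timestamp` key.

```python
import pandas as pd

# Hypothetical stand-ins for two feature groups that share the `timestamp` key
prices = pd.DataFrame({
    "timestamp": [1, 2, 3, 4],
    "dk1_spotpricedkk_kwh": [1.23, 0.62, 1.53, -0.01],
})
weather = pd.DataFrame({
    "timestamp": [2, 3, 4, 5],
    "temperature_2m": [14.8, 19.4, 12.9, 11.3],
})

# An inner join keeps only the timestamps found in BOTH frames (2, 3 and 4).
# Timestamp 1 (no weather row) and timestamp 5 (no price row) are dropped.
joined = prices.merge(weather, on="timestamp", how="inner")
print(joined)
```

Any hour that is missing from even one feature group is therefore dropped from the training data.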

In [4]:
# Select the features for training, join the feature groups and exclude duplicate columns
selected_features = electricity_fg.select_all()\
    .join(electricity_price_window_fg.select_except(["timestamp"]), join_type="inner")\
    .join(weather_fg.select_except(["timestamp", "datetime", "hour", "date"]), join_type="inner")\
    .join(danish_calendar_fg.select_except(["timestamp", "datetime", "hour", "date"]), join_type="inner")

In [None]:
# transformation_functions = {
#         "hour": fs.get_transformation_function(name="min_max_scaler"),
#         "dk1_spotpricedkk_kwh": fs.get_transformation_function(name="min_max_scaler"),
#         "temperature_2m": fs.get_transformation_function(name="min_max_scaler"),
#         "relative_humidity_2m": fs.get_transformation_function(name="min_max_scaler"),
#         "precipitation": fs.get_transformation_function(name="min_max_scaler"),
#         "rain": fs.get_transformation_function(name="min_max_scaler"),
#         "snowfall": fs.get_transformation_function(name="min_max_scaler"),
#         "weather_code": fs.get_transformation_function(name="min_max_scaler"),
#         "cloud_cover": fs.get_transformation_function(name="min_max_scaler"),
#         "wind_speed_10m": fs.get_transformation_function(name="min_max_scaler"),
#         "wind_gusts_10m": fs.get_transformation_function(name="min_max_scaler"),
#         "dayofweek": fs.get_transformation_function(name="min_max_scaler"),
#         "day": fs.get_transformation_function(name="min_max_scaler"),
#         "month": fs.get_transformation_function(name="min_max_scaler"),
#         "year": fs.get_transformation_function(name="min_max_scaler"),
#         "workday": fs.get_transformation_function(name="min_max_scaler"),
#     }


In [5]:
# Display the first 5 rows of the selected features
selected_features.show(5)

Finished: Reading data from Hopsworks, using ArrowFlight (4.43s) 


|   | timestamp | datetime | date | hour | dk1_spotpricedkk_kwh | prev_1w_mean | prev_2w_mean | prev_4w_mean | temperature_2m | relative_humidity_2m | ... | snowfall | weather_code | cloud_cover | wind_speed_10m | wind_gusts_10m | dayofweek | day | month | year | workday |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1692856800000 | 2023-08-24 06:00:00+00:00 | 2023-08-24 00:00:00+00:00 | 6 | 1.22897 | 0.795215 | 0.729342 | 0.56633 | 14.8 | 77.0 | ... | 0.0 | 0.0 | 10.0 | 17.7 | 32.8 | 3 | 24 | 8 | 2023 | 1 |
| 1 | 1695733200000 | 2023-09-26 13:00:00+00:00 | 2023-09-26 00:00:00+00:00 | 13 | 0.61856 | 0.458488 | 0.540868 | 0.662041 | 19.4 | 66.0 | ... | 0.0 | 1.0 | 40.0 | 12.4 | 25.9 | 1 | 26 | 9 | 2023 | 1 |
| 2 | 1657242000000 | 2022-07-08 01:00:00+00:00 | 2022-07-08 00:00:00+00:00 | 1 | 1.53483 | 1.791675 | 1.946746 | 1.756484 | 12.9 | 87.0 | ... | 0.0 | 0.0 | 6.0 | 18.0 | 33.8 | 4 | 8 | 7 | 2022 | 1 |
| 3 | 1691539200000 | 2023-08-09 00:00:00+00:00 | 2023-08-09 00:00:00+00:00 | 0 | -0.00842 | 0.32863 | 0.434574 | 0.44715 | 11.3 | 87.0 | ... | 0.0 | 53.0 | 100.0 | 35.0 | 60.1 | 2 | 9 | 8 | 2023 | 1 |
| 4 | 1666929600000 | 2022-10-28 04:00:00+00:00 | 2022-10-28 00:00:00+00:00 | 4 | 0.66365 | 0.872462 | 1.027621 | 1.074615 | 14.7 | 94.0 | ... | 0.0 | 3.0 | 100.0 | 17.9 | 34.6 | 4 | 28 | 10 | 2022 | 1 |


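A **Feature View** sits between the **Feature Groups** and the **Training Dataset**. By combining **Feature Groups** we create a **Feature View**, which stores metadata about our data: the selected features and the query that joins them. From the **Feature View** we can then create a **Training Dataset**.

To create a Feature View, we use the `fs.get_or_create_feature_view()` method. It returns the existing feature view if one with the same name and version already exists, and creates it otherwise.

We specify the following parameters: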

- `name` - Name of the feature view to create.
- `version` - Version of the feature view to create.
- `query` - Query object with the data.

In [None]:
# Getting or creating a feature view named 'electricity_price_feature_view'
version=1
feature_view_training = fs.get_or_create_feature_view(
    name='electricity_price_feature_view',
    version=version,
    query=selected_features,
)

In [None]:
# # Getting or creating a feature view named 'lstm_dk1_electricity_training_feature_view'
# version = 1
# feature_view_training = fs.get_or_create_feature_view(
#     name='lstm_dk1_electricity_training_feature_view',
#     version=version,
#     transformation_functions=transformation_functions,
#     query=selected_features,
# )

## <span style="color:#2656a3;"> 🏋️ Training Dataset Creation</span>

In Hopsworks, a training dataset is generated from a query defined by the parent FeatureView, which determines the set of features.

**A training dataset may contain the following splits:**
* Training set: the subset of the training data used to train the model.
* Validation set: used to tune hyperparameters during model training. *(We have not included a validation set in this project.)*
* Test set: a holdout subset of the training data, used to evaluate the trained model's performance.
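Time-series data is usually split in chronological order rather than at random. That way the model is always evaluated on hours that come after the ones it was trained on. The pandas sketch below assumes a DataFrame with a `timestamp` column. The helper name `chronological_split` and the 70/15/15 fractions are hypothetical choices.

```python
import pandas as pd

def chronological_split(df, train_frac=0.7, val_frac=0.15, time_col="timestamp"):
    """Hypothetical helper: split a DataFrame into train/validation/test sets by time order."""
    df = df.sort_values(time_col).reset_index(drop=True)
    n = len(df)
    train_end = round(n * train_frac)
    val_end = round(n * (train_frac + val_frac))
    # Earliest rows -> train, middle rows -> validation, latest rows -> test
    return df.iloc[:train_end], df.iloc[train_end:val_end], df.iloc[val_end:]

# Usage: 20 made-up hourly rows in shuffled order
hours = pd.DataFrame({"timestamp": range(20), "price": range(20)}).sample(frac=1, random_state=0)
train, val, test = chronological_split(hours)
print(len(train), len(val), len(test))
```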

The training dataset is created with the feature view's `training_data()` method, which returns a tuple of features and labels.

In [None]:
# Retrieve the training data from 'feature_view_training'. No label is defined on the feature view,
# so all columns are returned in 'df' and the second return value (the labels) is ignored.
df, _ = feature_view_training.training_data(
    description = 'LSTM Electricity Prices Training Dataset',
)

In [None]:
# sort the data by timestamp and reset the index for time series data
df.sort_values(by='timestamp', ascending=True, inplace=True)
df = df.reset_index(drop=True)

df.head()

## <span style="color:#2656a3;">🧬 Modeling</span>

### <span style="color:#2656a3;">🧬 Model 1: Random Train/Test Split LSTM Model</span>

#### <span style="color:#2656a3;">👆 Feature Selection</span>

In [None]:
# Select features and target
features = df.drop(columns=['dk1_spotpricedkk_kwh','datetime','date','timestamp'])
target = df['dk1_spotpricedkk_kwh']

# Normalize the features
scaler = MinMaxScaler()
features_scaled = scaler.fit_transform(features)

# Normalize the target
target = target.values.reshape(-1, 1)
target_scaler = MinMaxScaler()
target_scaled = target_scaler.fit_transform(target)

# Convert back to DataFrame for easier handling
features_scaled = pd.DataFrame(features_scaled, index=features.index, columns=features.columns)
target_scaled = pd.DataFrame(target_scaled, index=features.index, columns=['dk1_spotpricedkk_kwh'])

#### <span style="color:#2656a3;"> ⛳️ Dataset with train and test splits</span>

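Here we define our train and test splits for training the model.

We first convert the scaled data into sliding-window sequences. Each sample in `X` holds the features of the past 24 hours, and the matching entry in `y` is the price in the following hour. The sequences are then split randomly into train and test sets.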

In [None]:
from sklearn.model_selection import train_test_split

def create_sequences(features, target, time_steps=24):
    X, y = [], []
    for i in range(len(features) - time_steps):
        X.append(features.iloc[i:i+time_steps].values)
        y.append(target.iloc[i+time_steps].values)
    return np.array(X), np.array(y)

time_steps = 24  # Use the past 24 hours to predict the next hour
X, y = create_sequences(features_scaled, target_scaled, time_steps)

# Split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
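Consecutive windows overlap by `time_steps - 1` hours. Because of this, a random split places near-duplicate windows on both sides of the split. The sketch below works only with window start indices. It is a toy setup with a hypothetical 1,000 windows, not the notebook's data. It counts how many test windows have a directly neighbouring window in the training set.

```python
import numpy as np
from sklearn.model_selection import train_test_split

n_windows = 1000  # hypothetical number of sliding windows
window_ids = np.arange(n_windows)  # window i covers hours i .. i + 23

train_ids, test_ids = train_test_split(window_ids, test_size=0.2, random_state=42)
train_set = set(train_ids.tolist())

# A test window "leaks" if the window starting one hour earlier or later is in
# the training set: the two windows share 23 of their 24 input hours.
leaky = [i for i in test_ids.tolist() if (i - 1) in train_set or (i + 1) in train_set]
print(f"{len(leaky) / len(test_ids):.0%} of test windows have a neighbour in the training set")
```

This overlap is one reason the chronological split used for Model 2 tends to give a more honest estimate of performance on future hours.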

#### <span style="color:#2656a3;">🏠 Model Building</span>

In [None]:
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense

# Define the LSTM model
model = Sequential([
    LSTM(50, return_sequences=True, input_shape=(X_train.shape[1], X_train.shape[2])), # return_sequences=True passes the full sequence to the next LSTM layer
    LSTM(50, return_sequences=False), # the last LSTM layer returns only its final output
    Dense(1) # Output layer: the predicted (scaled) price for the next hour
])

# Compile the model
model.compile(optimizer='adam', loss='mean_squared_error')



#### <span style="color:#2656a3;">💪 Model Training</span>

In [None]:
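# Train the model. More epochs may improve the fit but also increase the risk of overfitting.
# validation_data=(X_test, y_test) only reports the test loss after each epoch; it does not update the weights.
history = model.fit(X_train, y_train, epochs=12, batch_size=32, validation_data=(X_test, y_test))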

#### <span style='color:#2656a3'> ⚖️ Model Validation</span>

In [None]:
loss = model.evaluate(X_test, y_test)
print(f'Test Loss (MSE on the scaled target): {loss}')

In [None]:
# Predict on the test set
y_pred = model.predict(X_test)

# Inverse-transform the predictions and the true values back to the original price scale
y_pred_inverse = target_scaler.inverse_transform(y_pred)
y_test_inverse = target_scaler.inverse_transform(y_test)

from sklearn.metrics import mean_squared_error, r2_score, mean_absolute_error

# Calculate performance metrics on the original price scale
mse = mean_squared_error(y_test_inverse, y_pred_inverse)
rmse = np.sqrt(mse)
mae = mean_absolute_error(y_test_inverse, y_pred_inverse)
r2 = r2_score(y_test_inverse, y_pred_inverse)

print(f'RMSE: {rmse}')
print(f'MAE: {mae}')
print(f'MSE: {mse}')
print(f'R²: {r2}')
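As a cross-check, the four metrics can be computed directly from their definitions with NumPy. The arrays below are made-up values. RMSE is the square root of MSE, so taking a further square root of `rmse` would give the wrong number.

```python
import numpy as np

# Made-up actual and predicted prices (DKK/kWh)
y_true = np.array([1.0, 0.5, 1.5, 0.0])
y_hat = np.array([0.8, 0.7, 1.4, 0.1])

errors = y_true - y_hat
mse = np.mean(errors ** 2)      # mean squared error
rmse = np.sqrt(mse)             # root mean squared error, in the same unit as the price
mae = np.mean(np.abs(errors))   # mean absolute error
# R² = 1 - (residual sum of squares / total sum of squares)
r2 = 1 - np.sum(errors ** 2) / np.sum((y_true - y_true.mean()) ** 2)

print(f"MSE={mse:.4f}  RMSE={rmse:.4f}  MAE={mae:.4f}  R²={r2:.4f}")
```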

In [None]:
import matplotlib.pyplot as plt

# Plotting the true values and the predicted values
plt.figure(figsize=(14, 7))
plt.plot(y_test_inverse, label='Actual dk1_spotpricedkk_kwh')
plt.plot(y_pred_inverse, label='Predicted dk1_spotpricedkk_kwh')
plt.title('Actual vs Predicted dk1_spotpricedkk_kwh')
plt.xlabel('Test sample (random order, since the split is shuffled)')
plt.ylabel('dk1_spotpricedkk_kwh')
plt.legend()
plt.show()

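#### Feature Importance

Permutation feature importance measures how much the test MSE increases when the values of one feature are shuffled across samples. Shuffling breaks the relationship between that feature and the target. A larger increase therefore indicates that the model relies more heavily on that feature.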

In [None]:
import numpy as np
from sklearn.metrics import mean_squared_error
import matplotlib.pyplot as plt

# Function to calculate permutation feature importance
def permutation_feature_importance(model, X_val, y_val, feature_names):
    baseline_mse = mean_squared_error(y_val, model.predict(X_val))
    importances = []

    for col in range(X_val.shape[2]):
        X_val_permuted = np.copy(X_val)
        np.random.shuffle(X_val_permuted[:, :, col])
        permuted_mse = mean_squared_error(y_val, model.predict(X_val_permuted))
        importances.append(permuted_mse - baseline_mse)

    return np.array(importances), feature_names

# Calculate feature importance
importances, feature_names = permutation_feature_importance(model, X_test, y_test, features.columns)

# Plot feature importance
plt.figure(figsize=(10, 6))
plt.barh(range(len(importances)), importances, align='center')
plt.yticks(range(len(importances)), feature_names)
plt.xlabel('Increase in MSE after Permutation')
plt.title('Permutation Feature Importance')
plt.show()
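To see what the permutation scores measure, the same procedure can be run on a hypothetical toy "model" whose output depends only on feature 0. Shuffling feature 0 across samples raises the MSE. Shuffling the unused feature 1 leaves the predictions unchanged, so its MSE does not change either. The data and model below are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy input shaped like the LSTM data: (samples, time_steps, features)
X_toy = rng.normal(size=(500, 24, 2))

def toy_predict(X):
    """Hypothetical model: the mean of feature 0 over the window; feature 1 is ignored."""
    return X[:, :, 0].mean(axis=1, keepdims=True)

y_toy = toy_predict(X_toy)  # targets that the toy model predicts perfectly

def permutation_importance(predict, X, y):
    """Increase in MSE when one feature's windows are shuffled across samples."""
    baseline = np.mean((y - predict(X)) ** 2)
    scores = []
    for col in range(X.shape[2]):
        X_perm = X.copy()
        X_perm[:, :, col] = X[rng.permutation(len(X)), :, col]
        scores.append(np.mean((y - predict(X_perm)) ** 2) - baseline)
    return np.array(scores)

toy_importances = permutation_importance(toy_predict, X_toy, y_toy)
print(toy_importances)  # feature 0 gets a positive score, feature 1 gets exactly 0
```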

#### <span style="color:#2656a3;">🤖 Making the predictions</span>

In [None]:
# Extract the last 5 test predictions and their actual values (in shuffled order, since this split is random)
last_5_predictions = y_pred_inverse[-5:]
last_5_actuals = y_test_inverse[-5:]

# Print the last 5 predictions and their actual values
print("Last 5 Predictions vs Actual Values:")
for i in range(5):
    print(f"Prediction: {last_5_predictions[i][0]:.4f}, Actual: {last_5_actuals[i][0]:.4f}")

### <span style="color:#2656a3;">🧬 Model 2: Temporal LSTM model</span>

#### Preprocess the Data

In [None]:
import pandas as pd
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# `df` is the time-sorted DataFrame retrieved from the feature view above

# Selecting the relevant features and target
features = df.drop(columns=['dk1_spotpricedkk_kwh','datetime','date','timestamp'])

target = df['dk1_spotpricedkk_kwh'].values.reshape(-1, 1)

# Scaling the features and target
scaler_features = MinMaxScaler()
scaler_target = MinMaxScaler()

features_scaled = scaler_features.fit_transform(features)
target_scaled = scaler_target.fit_transform(target)


#### Create Sequences

In [None]:
def create_sequences(features, target, time_steps=24):
    X, y = [], []
    for i in range(len(features) - time_steps):
        X.append(features[i:(i + time_steps)])
        y.append(target[i + time_steps])
    return np.array(X), np.array(y)

time_steps = 24  # For hourly data, 24 time steps correspond to one day
X, y = create_sequences(features_scaled, target_scaled, time_steps)

#### <span style="color:#2656a3;"> ⛳️ Dataset with train and test splits</span>

Here we define our train and test splits for training the model. The first 80% of the sequences in time order are used for training and the last 20% for testing.

In [None]:
# Train-test split
split_ratio = 0.8
split_index = int(len(X) * split_ratio)

X_train, X_test = X[:split_index], X[split_index:]
y_train, y_test = y[:split_index], y[split_index:]
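In the cell that builds `features_scaled` and `target_scaled` for Model 2, the scalers are fitted on the full dataset before the chronological split. As a result, the minimum and maximum of the test period influence how the training inputs are scaled. The sketch below shows the alternative using made-up random-walk data. It splits the raw rows chronologically first and then fits `MinMaxScaler` on the training rows only.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Made-up raw feature matrix: 100 hourly rows, 3 random-walk features
rng = np.random.default_rng(1)
raw = np.cumsum(rng.normal(size=(100, 3)), axis=0)

split_index = int(len(raw) * 0.8)
raw_train, raw_test = raw[:split_index], raw[split_index:]

scaler = MinMaxScaler()
train_scaled = scaler.fit_transform(raw_train)  # min/max learned from training rows only
test_scaled = scaler.transform(raw_test)        # test rows reuse the training min/max

# Test values outside the training range map outside [0, 1], which is expected
print(train_scaled.min(axis=0), train_scaled.max(axis=0))
print(test_scaled.min(axis=0), test_scaled.max(axis=0))
```

With this approach, sequences would then be created separately from the scaled training rows and the scaled test rows.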

#### <span style="color:#2656a3;">🏠 Model Building</span>

In [None]:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense

model = Sequential()
model.add(LSTM(50, return_sequences=True, input_shape=(time_steps, X_train.shape[2])))
model.add(LSTM(50, return_sequences=False))
model.add(Dense(1))

model.compile(optimizer='adam', loss='mse')

#### <span style="color:#2656a3;">💪 Model Training</span>

In [None]:
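# validation_split=0.2 holds out the last 20% of the training windows (selected before shuffling) for validation
history = model.fit(X_train, y_train, epochs=20, batch_size=32, validation_split=0.2)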

#### <span style='color:#2656a3'> ⚖️ Model Validation</span>

In [None]:
loss = model.evaluate(X_test, y_test)
print(f'Test Loss (MSE on the scaled target): {loss}')

In [None]:
# Predict on the test set
y_pred = model.predict(X_test)

# Inverse-transform the predictions and the true values back to the original price scale
y_pred_inverse = scaler_target.inverse_transform(y_pred)
y_test_inverse = scaler_target.inverse_transform(y_test)

from sklearn.metrics import mean_squared_error, r2_score, mean_absolute_error

# Calculate performance metrics on the original price scale
mse = mean_squared_error(y_test_inverse, y_pred_inverse)
rmse = np.sqrt(mse)
mae = mean_absolute_error(y_test_inverse, y_pred_inverse)
r2 = r2_score(y_test_inverse, y_pred_inverse)

print(f'RMSE: {rmse}')
print(f'MAE: {mae}')
print(f'MSE: {mse}')
print(f'R2: {r2}')

In [None]:
import matplotlib.pyplot as plt

# Plotting the true values and the predicted values
plt.figure(figsize=(14, 7))
plt.plot(y_test_inverse, label='Actual dk1_spotpricedkk_kwh')
plt.plot(y_pred_inverse, label='Predicted dk1_spotpricedkk_kwh')
plt.title('Actual vs Predicted dk1_spotpricedkk_kwh')
plt.xlabel('Time')
plt.ylabel('dk1_spotpricedkk_kwh')
plt.legend()
plt.show()

#### Feature Importance

In [None]:
import numpy as np
from sklearn.metrics import mean_squared_error
import matplotlib.pyplot as plt

# Function to calculate permutation feature importance
def permutation_feature_importance(model, X_val, y_val, feature_names):
    baseline_mse = mean_squared_error(y_val, model.predict(X_val))
    importances = []

    for col in range(X_val.shape[2]):
        X_val_permuted = np.copy(X_val)
        np.random.shuffle(X_val_permuted[:, :, col])
        permuted_mse = mean_squared_error(y_val, model.predict(X_val_permuted))
        importances.append(permuted_mse - baseline_mse)

    return np.array(importances), feature_names

# Calculate feature importance
importances, feature_names = permutation_feature_importance(model, X_test, y_test, features.columns)

# Plot feature importance
plt.figure(figsize=(10, 6))
plt.barh(range(len(importances)), importances, align='center')
plt.yticks(range(len(importances)), feature_names)
plt.xlabel('Increase in MSE after Permutation')
plt.title('Permutation Feature Importance')
plt.show()

#### <span style="color:#2656a3;">🤖 Making the predictions</span>

In [None]:
# Extract the last 5 predictions and their corresponding actual values
last_5_predictions = y_pred_inverse[-5:]
last_5_actuals = y_test_inverse[-5:]

# Print the last 5 predictions and their actual values
print("Last 5 Predictions vs Actual Values:")
for i in range(5):
    print(f"Prediction: {last_5_predictions[i][0]:.4f}, Actual: {last_5_actuals[i][0]:.4f}")


### <span style="color:#2656a3;">🧬 Model 3: Hybrid Conv1D-Bidirectional LSTM Time Series Model</span>

In [None]:
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense, Conv1D, BatchNormalization, LeakyReLU, MaxPooling1D, Bidirectional, Dropout

def build_model(input_dim):
    # Creating a Sequential model
    model = Sequential()

    # Adding a 1D convolutional layer
    model.add(Conv1D(filters=64, # Number of filters (output channels) of the convolutional layer
                     kernel_size=1, # kernel_size=1 transforms each time step independently, without mixing neighbouring hours
                     padding='same', # padding so the output has the same sequence length as the input
                     kernel_initializer="uniform", # initialize the kernel weights from a uniform distribution
                     input_shape=(input_dim[0], input_dim[1]))) # input shape: (sequence_length, num_features)
    model.add(BatchNormalization()) # normalize the activations of the previous layer over each batch
    model.add(LeakyReLU(alpha=0.2)) # like ReLU, but with a small slope for negative inputs instead of a flat zero

    # Adding a second 1D convolutional layer
    model.add(Conv1D(filters=32, # fewer filters than the previous layer
                     kernel_size=1, 
                     padding='same', 
                     kernel_initializer="uniform"))
    model.add(BatchNormalization())
    model.add(LeakyReLU(alpha=0.2)) # alpha=0.2 sets the slope for negative inputs (the amount of "leakage")

    # Adding a third 1D convolutional layer
    model.add(Conv1D(filters=16, 
                     kernel_size=1, 
                     padding='same', 
                     kernel_initializer="uniform"))
    model.add(BatchNormalization())
    model.add(LeakyReLU(alpha=0.2))
    
    # Adding a 1D max pooling layer (with pool_size=1 the sequence length is unchanged, so no downsampling takes place)
    model.add(MaxPooling1D(pool_size=1, padding='same'))

    # Adding a Bidirectional LSTM layer
    model.add(Bidirectional(LSTM(units=50, return_sequences=True))) # Bidirectional layer to learn from the input sequence in both forward and backward directions
    model.add(Dropout(rate=0.1))

    # Adding a second LSTM layer
    model.add(LSTM(units=50, return_sequences=False))
    
    # Adding a Dense layer with 1 unit for the output
    model.add(Dense(units=1))  # Output layer

    # Displaying the model summary
    model.summary()

    # Compiling the model with mean squared error loss and the Adam optimizer
    model.compile(loss='mean_squared_error', optimizer='adam')

    return model
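With `kernel_size=1`, each `Conv1D` layer looks at one time step at a time. It applies the same linear map to the feature vector of every hour and does not mix neighbouring hours. The NumPy sketch below uses random, hypothetical weights and omits the bias and activation. It shows that such a convolution is equivalent to a matrix product applied at each time step, so temporal patterns are left to the LSTM layers. In the same way, `MaxPooling1D(pool_size=1)` passes its input through unchanged.

```python
import numpy as np

rng = np.random.default_rng(0)
batch, time_steps, in_features, filters = 2, 24, 5, 3
x = rng.normal(size=(batch, time_steps, in_features))
kernel = rng.normal(size=(1, in_features, filters))  # Keras Conv1D kernel layout: (kernel_size, in, out)

def conv1d_valid(x, kernel):
    """Naive 1D convolution (cross-correlation, stride 1, no padding), as computed by Conv1D."""
    k = kernel.shape[0]
    out_len = x.shape[1] - k + 1
    out = np.zeros((x.shape[0], out_len, kernel.shape[2]))
    for t in range(out_len):
        # Sum over the k time steps in the window and over all input features
        out[:, t, :] = np.einsum("bki,kio->bo", x[:, t:t + k, :], kernel)
    return out

conv_out = conv1d_valid(x, kernel)
dense_out = x @ kernel[0]  # the same (in_features, filters) matrix applied at every time step

print(np.allclose(conv_out, dense_out))
```

A kernel size larger than 1 (for example 3) would let each convolution output combine several consecutive hours.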

#### <span style="color:#2656a3;">🏠 Model Building</span>

In [None]:
# input_dim
input_dim = (X_train.shape[1], X_train.shape[2]) # sequence_length, num_features
model = build_model(input_dim) 

#### <span style="color:#2656a3;">💪 Model Training</span>

In [None]:
# Train the model
# history for the loss function and the validation loss function
history = model.fit(X_train, y_train, epochs=12, batch_size=32, validation_data=(X_test, y_test)) 

In [None]:
# Extracting the training history dictionary from the model training
history_dict = history.history

# Displaying the keys in the history dictionary
print(history_dict.keys())

In [None]:
# Extracting training and validation loss values from the history dictionary
loss_values = history_dict['loss']
val_loss_values = history_dict['val_loss']

# Generating a plot for training and validation loss over epochs
epochs = range(1, len(loss_values) + 1)
plt.rc('font', size=18)
plt.figure(figsize=(15, 7))
plt.plot(epochs, loss_values, color='blue', label='Training loss')
plt.plot(epochs, val_loss_values, color='red', label='Validation loss')

# Setting plot details and labels
plt.title('Training and validation loss')
plt.xlabel('Epochs')
plt.ylabel('Loss')
plt.legend()
plt.xticks(epochs)

# Displaying the plot
plt.show()

In [None]:
# Predict on the test set
y_pred = model.predict(X_test)

# Inverse transform the predictions and the true values to their original scale
y_pred_inverse = scaler_target.inverse_transform(y_pred)
y_test_inverse = scaler_target.inverse_transform(y_test)

In [None]:
import matplotlib.pyplot as plt

# Plotting the true values and the predicted values
plt.figure(figsize=(14, 7))
plt.plot(y_test_inverse, label='Actual dk1_spotpricedkk_kwh')
plt.plot(y_pred_inverse, label='Predicted dk1_spotpricedkk_kwh')
plt.title('Actual vs Predicted dk1_spotpricedkk_kwh')
plt.xlabel('Time')
plt.ylabel('dk1_spotpricedkk_kwh')
plt.legend()
plt.show()

#### <span style='color:#2656a3'> ⚖️ Model Validation</span>

In [None]:
# Calculate performance metrics on the original price scale
mse = mean_squared_error(y_test_inverse, y_pred_inverse)
rmse = np.sqrt(mse)
mae = mean_absolute_error(y_test_inverse, y_pred_inverse)
r2 = r2_score(y_test_inverse, y_pred_inverse)

print(f'RMSE: {rmse}')
print(f'MAE: {mae}')
print(f'MSE: {mse}')
print(f'R2: {r2}')

#### <span style="color:#2656a3;">🤖 Making the predictions</span>

In [None]:
# Extract the last 5 predictions and their corresponding actual values
last_5_predictions = y_pred_inverse[-5:]
last_5_actuals = y_test_inverse[-5:]

# Print the last 5 predictions and their actual values
print("Last 5 Predictions vs Actual Values:")
for i in range(5):
    print(f"Prediction: {last_5_predictions[i][0]:.4f}, Actual: {last_5_actuals[i][0]:.4f}")

## <span style="color:#2656a3;"> Hyperparameter tuning of the last model</span>

To tune the last model, we configure a `RandomSearch` tuner and set it up with a hypermodel.
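A random-search tuner samples a fixed number of hyperparameter combinations at random from a search space, scores each one and keeps the best. In the Keras ecosystem, the KerasTuner library's `RandomSearch` tuner runs this loop with a model-building function. The framework-free sketch below shows only the loop itself. The search space is hypothetical, and `toy_objective` is a made-up stand-in for building and training the Conv1D-BiLSTM model and returning its validation loss.

```python
import random

# Hypothetical search space for the Conv1D-BiLSTM model
search_space = {
    "lstm_units": [32, 50, 64, 128],
    "dropout": [0.0, 0.1, 0.2, 0.3],
    "learning_rate": [1e-2, 1e-3, 1e-4],
}

def random_search(objective, space, max_trials=10, seed=42):
    """Evaluate `max_trials` random configurations and return the best one (lowest score)."""
    rng = random.Random(seed)
    best_config, best_score = None, float("inf")
    for _ in range(max_trials):
        config = {name: rng.choice(values) for name, values in space.items()}
        score = objective(config)
        if score < best_score:
            best_config, best_score = config, score
    return best_config, best_score

def toy_objective(config):
    # Made-up stand-in for "train the model with this config and return its validation loss"
    return abs(config["lstm_units"] - 64) / 64 + config["dropout"] + abs(config["learning_rate"] - 1e-3)

best_config, best_score = random_search(toy_objective, search_space, max_trials=20)
print(best_config, best_score)
```

Random search is often preferred over a full grid when only a few hyperparameters matter, because it tries more distinct values of each one for the same number of trials.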



### <span style="color:#2656a3;"> ⛳️ Dataset with train and test splits</span>

Here we define our train and test splits for training the model.

## <span style="color:#2656a3;">🗃 Window timeseries</span>

For this case, we assume that, given the observations from the past 10 days, we need to forecast the observations for the next 5 days.
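Forecasting several steps ahead changes the windowing. Each sample now pairs `n_in` past observations with the next `n_out` observations, instead of with a single next value. The NumPy sketch below assumes one observation per day, so `n_in=10` and `n_out=5` match the 10-day/5-day setup. The helper name and the series are made up.

```python
import numpy as np

def make_multistep_windows(series, n_in=10, n_out=5):
    """Hypothetical helper: inputs of shape (samples, n_in, features), targets of shape (samples, n_out)."""
    series = np.asarray(series, dtype=float)
    if series.ndim == 1:
        series = series[:, None]  # treat a 1D series as a single feature
    X, y = [], []
    for start in range(len(series) - n_in - n_out + 1):
        X.append(series[start:start + n_in])                    # the past n_in days (all features)
        y.append(series[start + n_in:start + n_in + n_out, 0])  # the next n_out days of column 0 (the target)
    return np.array(X), np.array(y)

daily_prices = np.arange(30)  # made-up daily series: 0, 1, ..., 29
X_multi, y_multi = make_multistep_windows(daily_prices)
print(X_multi.shape, y_multi.shape)
print(X_multi[0, :, 0], y_multi[0])
```

With the hourly data used in this notebook, the same 10-day/5-day setup would correspond to `n_in=240` and `n_out=120`. The model's output layer would then need `n_out` units instead of one.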

## <span style='color:#2656a3'>🗄 Model Registry</span>

The Model Registry in Hopsworks enables us to store the trained model. It centralizes model management, so that models can be securely accessed and governed. We can also save model metrics together with the model, which lets users understand how the model performs on test (unseen) data.

In [None]:
# Exporting the trained model to a directory
model_dir = "lstm_electricity_price_model"
print('Exporting trained model to: {}'.format(model_dir))

# Saving the model using TensorFlow's saved_model.save function
tf.saved_model.save(model, model_dir)

### <span style="color:#2656a3;">⚙️ Model Schema</span>
A model schema defines the structure and format of the input data a machine learning model expects and of the output it produces. It serves as a **blueprint** for interacting with the model in terms of input features and output predictions. In Hopsworks, a model schema is created with the `ModelSchema` class from two `Schema` objects: one describing the input features and one describing the output (the target variable). This helps ensure consistency and compatibility between the model and the data it operates on.

In [None]:
# Importing the libraries for saving the model
from hsml.schema import Schema
from hsml.model_schema import ModelSchema

In [None]:
# # Specify the schema of the model's input and output using the features (X_train) and dependent variable (y_train)
# input_schema = Schema(X_train)
# output_schema = Schema(y_train)

# # Create a model schema using the input and output schemas
# model_schema = ModelSchema(input_schema, output_schema)

In [None]:
# Retrieving the Model Registry
mr = project.get_model_registry()

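# Metrics to store with the model: the final-epoch validation loss and the test-set metrics on the original scale
metrics = {
    'val_loss': float(history_dict['val_loss'][-1]),
    'rmse': float(rmse),
    'mae': float(mae),
    'r2': float(r2),
}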

# Creating a TensorFlow model in the Model Registry
tf_model = mr.tensorflow.create_model(
    name="lstm_electricity_price_model",
    metrics=metrics,
    #model_schema=model_schema,
    description="Conv1D-Bidirectional LSTM model for hourly DK1 electricity price prediction.",
    #input_example=X_train[:1]
)

# Saving the model to the specified directory
tf_model.save(model_dir)

## <span style="color:#2656a3;">⏭️ **Next:** Part 04: Batch Inference </span>

In the next notebook, we will use the registered model to make predictions on batch data.