# <span style="font-weight:bold; font-size: 3rem; color:#2656a3;">**Data Engineering and Machine Learning Operations in Business** </span> <span style="font-weight:bold; font-size: 3rem; color:#333;">- Part 04: Batch Inference</span>

## <span style='color:#2656a3'> 🗒️ This notebook is divided into the following sections:

1. Load batch data.
2. Predict using model from Model Registry.

## <span style='color:#2656a3'> ⚙️ Import of libraries and packages

First, we'll import the libraries this notebook needs. The packages are assumed to be installed already; if they are not, install them with `pip install`, where the `--quiet` flag placed after the package names suppresses most of the installer output.

In [1]:
# Importing the libraries used in this notebook
import joblib
import inspect
import pandas as pd
import matplotlib.pyplot as plt
from matplotlib.ticker import FuncFormatter
import os

#%config InlineBackend.figure_format='retina'
#%matplotlib inline

## <span style="color:#2656a3;"> 📡 Connecting to Hopsworks Feature Store

In [2]:
# Importing the hopsworks module
import hopsworks

# Logging in to the Hopsworks project
project = hopsworks.login()

# Getting the feature store from the project
fs = project.get_feature_store() 

Connected. Call `.close()` to terminate connection gracefully.

Logged in to project, explore it here https://c.app.hopsworks.ai:443/p/550040
Connected. Call `.close()` to terminate connection gracefully.


### <span style='color:#2656a3'> ⚙️ Feature View Retrieval

In [3]:
# Retrieve the 'electricity_feature_view' feature view
feature_view = fs.get_feature_view(
    name='electricity_feature_view',
    version=1,
)

### <span style='color:#2656a3'> 🗄 Model Registry

In [4]:
# Retrieve the model registry
mr = project.get_model_registry()

Connected. Call `.close()` to terminate connection gracefully.


## <span style='color:#2656a3'> 📮 Retrieving model from Model Registry

In [5]:
# Retrieving the model from the Model Registry
retrieved_model = mr.get_model(
    name="electricity_price_prediction_model", 
    version=1,
)

# Downloading the saved model to a local directory
saved_model_dir = retrieved_model.download()

# Loading the saved XGBoost model from the downloaded directory
retrieved_xgboost_model = joblib.load(os.path.join(saved_model_dir, "dk_electricity_model.pkl"))


Downloading model artifact (0 dirs, 3 files)... DONE
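
The load step above assumes that the downloaded directory contains a file named `dk_electricity_model.pkl`. As a minimal sketch of a safer lookup, the hypothetical helper `resolve_model_path` (not part of Hopsworks or joblib) builds the path with `os.path.join` and reports the files actually present if the expected one is missing. A temporary directory stands in for the download folder here.

```python
import os
import tempfile


def resolve_model_path(model_dir: str, filename: str) -> str:
    """Return the full path to `filename` inside `model_dir`, or raise if it is missing."""
    path = os.path.join(model_dir, filename)
    if not os.path.isfile(path):
        # List what is there so a wrong filename is easy to spot.
        available = sorted(os.listdir(model_dir))
        raise FileNotFoundError(
            f"{filename!r} not found in {model_dir!r}; available files: {available}"
        )
    return path


# Demonstration: a temporary directory stands in for the downloaded model folder.
with tempfile.TemporaryDirectory() as demo_dir:
    # Create an empty placeholder file with the expected name.
    open(os.path.join(demo_dir, "dk_electricity_model.pkl"), "wb").close()
    model_path = resolve_model_path(demo_dir, "dk_electricity_model.pkl")
    print(os.path.basename(model_path))
```

In the notebook, the returned path could be passed to `joblib.load` in place of the hand-built string.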

In [6]:
# Display the retrieved XGBoost regressor model
retrieved_xgboost_model

## <span style='color:#2656a3'> ✨ Load Batch Data

In [7]:
import datetime

# Calculating the start date as 5 days ago from the current date
start_date = datetime.datetime.now() - datetime.timedelta(days=5)

# Converting the start date to a timestamp in milliseconds
start_time = int(start_date.timestamp()) * 1000
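
The conversion above can be sketched as a pair of small helpers. The names `to_epoch_ms` and `batch_window_start` are illustrative and not part of any library. Note that `datetime.now()` returns a naive datetime, which `.timestamp()` interprets in the machine's local time zone; the sketch uses a fixed, timezone-aware reference time instead so that the result is reproducible.

```python
from datetime import datetime, timedelta, timezone


def to_epoch_ms(moment: datetime) -> int:
    """Convert a datetime to whole milliseconds since the Unix epoch."""
    return int(moment.timestamp() * 1000)


def batch_window_start(now: datetime, days_back: int = 5) -> int:
    """Return the start of the batch window, in epoch milliseconds."""
    return to_epoch_ms(now - timedelta(days=days_back))


# A fixed, timezone-aware reference time keeps the example reproducible.
reference = datetime(2024, 4, 30, 9, 0, tzinfo=timezone.utc)
start_time_ms = batch_window_start(reference)
print(start_time_ms)  # 1714035600000, i.e. 2024-04-25 09:00:00+00:00
```

The printed value has the same millisecond format as the `timestamp` column of the batch data.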

In [13]:
# Initializing batch scoring with training dataset version 1
feature_view.init_batch_scoring(1)

# Retrieving batch data from the feature view starting from the specified start time
batch_data = feature_view.get_batch_data(
    start_time=start_time,
)

Finished: Reading data from Hopsworks, using ArrowFlight (2.67s) 


In [35]:
# Display the first 5 rows of the batch data
batch_data.head(5)

Unnamed: 0,timestamp,time,date,dk1_spotpricedkk_kwh,dk1_offshore_wind_forecastintraday_kwh,dk1_onshore_wind_forecastintraday_kwh,dk1_solar_forecastintraday_kwh,temperature_2m,relative_humidity_2m,precipitation,rain,snowfall,weather_code,cloud_cover,wind_speed_10m,wind_gusts_10m,type
36,1714035600000,2024-04-25 09:00:00+00:00,2024-04-25,0.222241,0.088847,0.055661,0.364648,0.445455,0.373333,0.0,0.0,0.0,0.04,1.0,0.319192,0.313417,0
16,1714039200000,2024-04-25 10:00:00+00:00,2024-04-25,0.210133,0.092402,0.072598,0.431977,0.454545,0.333333,0.0,0.0,0.0,0.04,0.89,0.335354,0.33543,0
1,1714042800000,2024-04-25 11:00:00+00:00,2024-04-25,0.20832,0.110765,0.088312,0.45882,0.456818,0.333333,0.0,0.0,0.0,0.026667,0.55,0.363636,0.361635,0
71,1714046400000,2024-04-25 12:00:00+00:00,2024-04-25,0.204035,0.109006,0.10076,0.489359,0.456818,0.346667,0.0,0.0,0.0,0.026667,0.56,0.361616,0.365828,0
21,1714050000000,2024-04-25 13:00:00+00:00,2024-04-25,0.198776,0.093025,0.113017,0.486068,0.452273,0.36,0.0,0.0,0.0,0.013333,0.37,0.367677,0.365828,0


### <span style="color:#ff5f27;">🤖 Making the predictions</span>

In [37]:
# Sorting the DataFrame based on the 'timestamp' column
batch_data.sort_values(["timestamp"], inplace=True)

# Dropping the 'date', 'timestamp' and 'time' columns from the DataFrame
X_batch = batch_data.drop(["date", "timestamp", "time"], axis=1)

# Displaying the first 3 rows of the modified DataFrame
X_batch.head(3)

Unnamed: 0,dk1_spotpricedkk_kwh,dk1_offshore_wind_forecastintraday_kwh,dk1_onshore_wind_forecastintraday_kwh,dk1_solar_forecastintraday_kwh,temperature_2m,relative_humidity_2m,precipitation,rain,snowfall,weather_code,cloud_cover,wind_speed_10m,wind_gusts_10m,type
36,0.222241,0.088847,0.055661,0.364648,0.445455,0.373333,0.0,0.0,0.0,0.04,1.0,0.319192,0.313417,0
16,0.210133,0.092402,0.072598,0.431977,0.454545,0.333333,0.0,0.0,0.0,0.04,0.89,0.335354,0.33543,0
1,0.20832,0.110765,0.088312,0.45882,0.456818,0.333333,0.0,0.0,0.0,0.026667,0.55,0.363636,0.361635,0


In [38]:
# Extract the target variable 'dk1_spotpricedkk_kwh' from the batch data
y_batch = X_batch.pop('dk1_spotpricedkk_kwh')

# Displaying the first 3 rows of the modified DataFrame
X_batch.head(3)

Unnamed: 0,dk1_offshore_wind_forecastintraday_kwh,dk1_onshore_wind_forecastintraday_kwh,dk1_solar_forecastintraday_kwh,temperature_2m,relative_humidity_2m,precipitation,rain,snowfall,weather_code,cloud_cover,wind_speed_10m,wind_gusts_10m,type
36,0.088847,0.055661,0.364648,0.445455,0.373333,0.0,0.0,0.0,0.04,1.0,0.319192,0.313417,0
16,0.092402,0.072598,0.431977,0.454545,0.333333,0.0,0.0,0.0,0.04,0.89,0.335354,0.33543,0
1,0.110765,0.088312,0.45882,0.456818,0.333333,0.0,0.0,0.0,0.026667,0.55,0.363636,0.361635,0


In [42]:
# Display the target variable
y_batch

36     0.222241
16     0.210133
1      0.208320
71     0.204035
21     0.198776
         ...   
100    0.195324
27     0.194834
67     0.196156
96     0.184178
42     0.181624
Name: dk1_spotpricedkk_kwh, Length: 111, dtype: float64

In [43]:
# Make predictions on the batch data using the retrieved XGBoost regressor model
predictions = retrieved_xgboost_model.predict(X_batch)

# Display the first 5 predictions
predictions[:5]

array([0.30831686, 0.29268646, 0.24546978, 0.28364897, 0.20107578],
      dtype=float32)
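
Because `y_batch` holds the actual prices for the same rows, the predictions can be compared against it. The following is a minimal sketch, not part of the original notebook, that computes the mean absolute error and root mean squared error in plain Python. The helper names are illustrative, and the sample values are the first five entries of `y_batch` and `predictions` shown above.

```python
import math


def mean_absolute_error(actual, predicted):
    """Average absolute difference between actual and predicted values."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def root_mean_squared_error(actual, predicted):
    """Square root of the average squared difference."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    squared = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return math.sqrt(squared / len(actual))


# First five actual values and predictions, copied from the outputs above.
actual = [0.222241, 0.210133, 0.208320, 0.204035, 0.198776]
predicted = [0.30831686, 0.29268646, 0.24546978, 0.28364897, 0.20107578]

mae = mean_absolute_error(actual, predicted)
rmse = root_mean_squared_error(actual, predicted)
print(f"MAE:  {mae:.4f}")
print(f"RMSE: {rmse:.4f}")
```

In the notebook, `y_batch` and `predictions` could be passed to the same functions directly, since both support `len` and iteration. Five rows are far too few to judge the model; the full batch gives a more meaningful figure.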

## <span style='color:#2656a3'> 🤖 Testing the predictions on a single instance

In [46]:
# Selecting the most recent row of the batch features as a single instance
X_pred = X_batch.iloc[[-1]]

print("Latest instance: \n{}".format(X_pred))

In [47]:
# Predicting the price for the latest instance
y_pred = retrieved_xgboost_model.predict(X_pred)
print("Prediction: {}".format(y_pred[0]))

---

### <span style="color:#ff5f27;">🥳 <b> Next Steps  </b> </span>
Congratulations, you've now completed the electricity price tutorial for Managed Hopsworks.

Check out our other tutorials on ➡ https://github.com/logicalclocks/hopsworks-tutorials

Or documentation at ➡ https://docs.hopsworks.ai