# <span style="font-weight:bold; font-size: 3rem; color:#2656a3;">**Data Engineering and Machine Learning Operations in Business** </span> <span style="font-weight:bold; font-size: 3rem; color:#333;">- Part 04: Batch Inference</span>

## <span style='color:#2656a3'> 🗒️ This notebook is divided into the following sections:</span>

1. Load batch data.
2. Predict using model from Model Registry.

## <span style='color:#2656a3'> ⚙️ Import of libraries and packages</span>

First, we import the libraries this notebook needs. If any of them are missing from your environment, install them with `pip install` beforehand; adding the `--quiet` flag after the package names keeps the installation output silent.

In [1]:
# Import the libraries used in this notebook
import joblib
import inspect
import pandas as pd
import matplotlib.pyplot as plt
from matplotlib.ticker import FuncFormatter
import os

#%config InlineBackend.figure_format='retina'
#%matplotlib inline

## <span style="color:#2656a3;"> 📡 Connecting to Hopsworks Feature Store</span>

In [2]:
# Importing the hopsworks module
import hopsworks

# Logging in to the Hopsworks project
project = hopsworks.login()

# Getting the feature store from the project
fs = project.get_feature_store() 

Connected. Call `.close()` to terminate connection gracefully.

Logged in to project, explore it here https://c.app.hopsworks.ai:443/p/554133
Connected. Call `.close()` to terminate connection gracefully.


### <span style='color:#2656a3'> ⚙️ Feature View Retrieval</span>

In [3]:
# Retrieve the 'electricity_feature_view' feature view
feature_view = fs.get_feature_view(
    name='electricity_feature_view',
    version=1,
)

### <span style='color:#2656a3'> 🗄 Model Registry</span>

In [4]:
# Retrieve the model registry
mr = project.get_model_registry()

Connected. Call `.close()` to terminate connection gracefully.


## <span style='color:#2656a3'> 📮 Retrieving model from Model Registry</span>

In [5]:
# Retrieving the model from the Model Registry
retrieved_model = mr.get_model(
    name="electricity_price_prediction_model", 
    version=1,
)

# Downloading the saved model to a local directory
saved_model_dir = retrieved_model.download()

# Loading the saved XGBoost model from the downloaded directory
retrieved_xgboost_model = joblib.load(os.path.join(saved_model_dir, "dk_electricity_model.pkl"))

Downloading model artifact (0 dirs, 3 files)... DONE

In [6]:
# Display the retrieved XGBoost regressor model
retrieved_xgboost_model

## <span style='color:#2656a3'> ✨ Load Batch Data</span>

In [7]:
import datetime

# Calculating the start date as 5 days ago from the current date
start_date = datetime.datetime.now() - datetime.timedelta(days=5)

# Converting the start date to a timestamp in milliseconds
start_time = int(start_date.timestamp()) * 1000

# Displaying the start date in timestamp format
start_time

1714138328000
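
The value above is a Unix timestamp in milliseconds, which is the format passed to `get_batch_data` as `start_time`. As a minimal sketch of the conversion in both directions, using only the standard library: `to_epoch_ms` and `from_epoch_ms` are hypothetical helper names, not part of Hopsworks, and the sketch uses a timezone-aware UTC datetime where the cell above uses a naive local one.

```python
import datetime


def to_epoch_ms(dt: datetime.datetime) -> int:
    """Convert a datetime to Unix epoch milliseconds (truncated to whole seconds)."""
    return int(dt.timestamp()) * 1000


def from_epoch_ms(ms: int) -> datetime.datetime:
    """Convert Unix epoch milliseconds back to a timezone-aware UTC datetime."""
    return datetime.datetime.fromtimestamp(ms / 1000, tz=datetime.timezone.utc)


# Start of the batch window: five days before the current time
now = datetime.datetime.now(tz=datetime.timezone.utc)
start_time = to_epoch_ms(now - datetime.timedelta(days=5))
print(start_time)

# Decode the timestamp shown in the cell output above
print(from_epoch_ms(1714138328000).isoformat())  # 2024-04-26T13:32:08+00:00
```

Decoding the timestamp this way is a quick sanity check that the batch window starts where you expect before any data is read.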

In [10]:
# Initializing batch scoring
feature_view.init_batch_scoring(training_dataset_version=1)

# Retrieving batch data from the feature view starting from the specified start time
batch_data = feature_view.get_batch_data(
    start_time=start_time,
)

Finished: Reading data from Hopsworks, using ArrowFlight (2.85s) 


In [11]:
# Display the first 5 rows of the batch data
batch_data.head(5)

Unnamed: 0,timestamp,time,date,dk1_spotpricedkk_kwh,dk1_offshore_wind_forecastintraday_kwh,dk1_onshore_wind_forecastintraday_kwh,dk1_solar_forecastintraday_kwh,temperature_2m,relative_humidity_2m,precipitation,rain,snowfall,weather_code,cloud_cover,wind_speed_10m,wind_gusts_10m,type
0,1714287600000,2024-04-28 07:00:00+00:00,2024-04-28,0.00186,0.959167,0.77175,0.184346,8.5,91.0,0.0,0.0,0.0,2.0,62.0,12.8,22.3,Not a Workday
1,1714392000000,2024-04-29 12:00:00+00:00,2024-04-29,0.26984,0.649292,1.123,1.615064,14.1,48.0,0.0,0.0,0.0,1.0,32.0,17.8,39.2,Workday
2,1714536000000,2024-05-01 04:00:00+00:00,2024-05-01,0.35659,0.605792,1.227542,0.000218,11.2,78.0,0.0,0.0,0.0,3.0,96.0,18.0,33.5,Workday
3,1714172400000,2024-04-26 23:00:00+00:00,2024-04-26,0.65829,0.178042,0.244625,0.0,3.9,96.0,0.0,0.0,0.0,1.0,36.0,3.9,8.6,Workday
4,1714258800000,2024-04-27 23:00:00+00:00,2024-04-27,0.48644,0.657625,0.999583,0.0,6.9,93.0,0.0,0.0,0.0,1.0,40.0,16.2,29.9,Not a Workday


### <span style="color:#ff5f27;">🤖 Making the predictions</span>

In [16]:
from sklearn.preprocessing import LabelEncoder

# Create a LabelEncoder object
label_encoder = LabelEncoder()

# Fit the encoder to the values in the 'type' column (passed as a 1-D Series)
label_encoder.fit(batch_data['type'])

# Transform the 'type' column using the fitted encoder
encoded = label_encoder.transform(batch_data['type'])
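
Fitting a new `LabelEncoder` on every batch only reproduces the training encoding when the batch contains the same set of categories: the encoder assigns integers in sorted order, so a batch holding only `Workday` rows would encode that value as 0 rather than 1. A minimal sketch of a fixed mapping that avoids this is shown below. It assumes the training pipeline encoded the same two categories with `LabelEncoder`; `TYPE_MAPPING` and `encode_type` are hypothetical names, and the mapping should be checked against the training notebook.

```python
import pandas as pd

# Assumed mapping: LabelEncoder sorts classes alphabetically, which for these
# two categories gives "Not a Workday" -> 0 and "Workday" -> 1.
TYPE_MAPPING = {"Not a Workday": 0, "Workday": 1}


def encode_type(types: pd.Series) -> pd.Series:
    """Encode the 'type' column with a fixed mapping, failing on unseen values."""
    encoded = types.map(TYPE_MAPPING)
    if encoded.isna().any():
        unknown = sorted(types[encoded.isna()].unique())
        raise ValueError(f"Unseen 'type' values: {unknown}")
    return encoded.astype("int64")


# A batch without any "Not a Workday" rows still encodes "Workday" as 1
sample = pd.Series(["Workday", "Workday"], index=[48, 38])
print(encode_type(sample).tolist())  # [1, 1]
```

Because the result keeps the index of the input Series, it can be assigned straight back to the batch DataFrame as a new column.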



In [17]:
# Add the label-encoded values as a new 'type_encoded' column (assigned by position,
# so the result does not depend on the index of batch_data)
X_batch = batch_data.copy()
X_batch['type_encoded'] = encoded

# Drop the 'date', 'time', 'timestamp' and 'type' columns, which are not passed to the model
X_batch = X_batch.drop(columns=['date', 'time', 'timestamp', 'type'])

# Display the first 5 rows of the modified DataFrame
X_batch.head()



Unnamed: 0,dk1_spotpricedkk_kwh,dk1_offshore_wind_forecastintraday_kwh,dk1_onshore_wind_forecastintraday_kwh,dk1_solar_forecastintraday_kwh,temperature_2m,relative_humidity_2m,precipitation,rain,snowfall,weather_code,cloud_cover,wind_speed_10m,wind_gusts_10m,type_encoded
48,0.48757,0.42825,1.065542,0.712989,7.9,67.0,0.1,0.1,0.0,51.0,100.0,24.5,49.3,0
38,0.5215,0.374083,0.968125,0.740813,7.5,67.0,0.1,0.1,0.0,51.0,100.0,23.1,47.5,0
33,0.53478,0.322542,0.848917,0.666078,6.8,73.0,0.1,0.1,0.0,51.0,100.0,21.2,43.9,0
23,0.60012,0.29775,0.743667,0.498373,6.4,76.0,0.2,0.2,0.0,51.0,81.0,19.6,40.0,0
18,0.70021,0.281875,0.633917,0.315199,5.8,81.0,0.1,0.1,0.0,51.0,61.0,15.3,37.1,0


In [18]:
# Separate the target variable 'dk1_spotpricedkk_kwh' from the features in X_batch
y_batch = X_batch.pop('dk1_spotpricedkk_kwh')

# Display the first 5 values of the target variable
y_batch.head()

48    0.48757
38    0.52150
33    0.53478
23    0.60012
18    0.70021
Name: dk1_spotpricedkk_kwh, dtype: float64

In [19]:
# Display the target variable
y_batch

48    0.48757
38    0.52150
33    0.53478
23    0.60012
18    0.70021
       ...   
32    0.37590
47    0.37292
27    0.25366
64    0.22315
96    0.16408
Name: dk1_spotpricedkk_kwh, Length: 106, dtype: float64
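
Before predicting, it can help to confirm that the batch features have the same columns, in the same order, as the model saw during training. The sketch below is one way to do that with plain pandas. `align_features` is a hypothetical helper, and `expected_columns` is an illustrative subset rather than the model's real feature list; if the loaded model exposes its training column names (scikit-learn style estimators fitted on a DataFrame often provide `feature_names_in_`, though whether this model does is an assumption), that list could be used instead.

```python
import pandas as pd


def align_features(X: pd.DataFrame, expected: list) -> pd.DataFrame:
    """Return X restricted to the expected columns, in the expected order."""
    missing = [col for col in expected if col not in X.columns]
    if missing:
        raise KeyError(f"Missing feature columns: {missing}")
    return X[list(expected)]


# Illustrative subset of feature names, not the model's full feature list
expected_columns = ["temperature_2m", "wind_speed_10m", "type_encoded"]

# Toy batch with shuffled columns and one extra column
demo = pd.DataFrame(
    {
        "type_encoded": [0],
        "wind_speed_10m": [24.5],
        "temperature_2m": [7.9],
        "extra_column": [1],
    }
)

aligned = align_features(demo, expected_columns)
print(list(aligned.columns))  # ['temperature_2m', 'wind_speed_10m', 'type_encoded']
```

Raising on missing columns makes a schema change in the feature view fail loudly instead of producing silently wrong predictions.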

In [20]:
# Make predictions on the batch data using the retrieved XGBoost regressor model
predictions = retrieved_xgboost_model.predict(X_batch)

# Display the first 5 predictions
predictions[:5]



array([0.25547686, 0.37913612, 0.33905983, 0.3961694 , 0.5968245 ],
      dtype=float32)
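
Since the batch still contains the observed prices in `y_batch`, the predictions can be compared against them. The sketch below computes the mean absolute error and root mean squared error with NumPy, using only the five target values and five predictions printed in the outputs above, so it illustrates the calculation rather than evaluating the full batch of 106 rows. The unit label assumes the target is expressed in DKK per kWh, as its column name suggests.

```python
import numpy as np

# First five observed prices (from y_batch.head()) and first five predictions
y_true = np.array([0.48757, 0.52150, 0.53478, 0.60012, 0.70021])
y_pred = np.array([0.25547686, 0.37913612, 0.33905983, 0.3961694, 0.5968245])

# Per-row prediction errors
errors = y_pred - y_true

# Mean absolute error and root mean squared error
mae = np.mean(np.abs(errors))
rmse = np.sqrt(np.mean(errors ** 2))

print(f"MAE:  {mae:.4f} DKK/kWh")
print(f"RMSE: {rmse:.4f} DKK/kWh")
```

In the notebook itself the same calculation would apply to the full arrays, for example with `y_batch.to_numpy()` and `predictions` in place of the hard-coded values.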

---
## <span style="color:#ff5f27;">👾 Next: Creating our Streamlit App</span>