# <span style="font-weight:bold; font-size: 3rem; color:#2656a3;">**Data Engineering and Machine Learning Operations in Business** </span> <span style="font-weight:bold; font-size: 3rem; color:#333;">- Part 04: Batch Inference</span>

## <span style='color:#2656a3'> 🗒️ This notebook is divided into the following sections:</span>

1. Load batch data.
2. Predict using model from Model Registry.

## <span style='color:#2656a3'> ⚙️ Import of libraries and packages</span>

First, we import the libraries this notebook needs. The packages themselves are assumed to be installed already; if they are not, they can be installed with pip, where the `--quiet` flag keeps the installation output silent.

In [1]:
# Importing the libraries used in this notebook
import joblib
import inspect 
import pandas as pd
import matplotlib.pyplot as plt
from matplotlib.ticker import FuncFormatter
import os

#%config InlineBackend.figure_format='retina'
#%matplotlib inline

## <span style="color:#2656a3;"> 📡 Connecting to Hopsworks Feature Store</span>

In [2]:
# Importing the hopsworks module
import hopsworks

# Logging in to the Hopsworks project
project = hopsworks.login()

# Getting the feature store from the project
fs = project.get_feature_store() 

Connected. Call `.close()` to terminate connection gracefully.

Logged in to project, explore it here https://c.app.hopsworks.ai:443/p/550040
Connected. Call `.close()` to terminate connection gracefully.


### <span style='color:#2656a3'> ⚙️ Feature View Retrieval</span>

In [3]:
# Retrieve version 1 of the 'electricity_feature_view2' feature view
feature_view = fs.get_feature_view(
    name='electricity_feature_view2',
    version=1,
)

### <span style='color:#2656a3'> 🗄 Model Registry</span>

In [4]:
# Retrieve the model registry
mr = project.get_model_registry()

Connected. Call `.close()` to terminate connection gracefully.


## <span style='color:#2656a3'> 📮 Retrieving model from Model Registry</span>

In [5]:
# Retrieving the model from the Model Registry
retrieved_model = mr.get_model(
    name="electricity_price_prediction_model", 
    version=1,
)

# Downloading the saved model to a local directory
saved_model_dir = retrieved_model.download()

# Loading the saved XGBoost model (os.path.join builds the path portably)
retrieved_xgboost_model = joblib.load(os.path.join(saved_model_dir, "dk_electricity_model.pkl"))


Downloading model artifact (0 dirs, 3 files)... DONE

In [6]:
# Display the retrieved XGBoost regressor model
retrieved_xgboost_model

## <span style='color:#2656a3'> ✨ Load Batch Data</span>

In [7]:
import datetime

# Calculating the start date as 5 days ago from the current date
start_date = datetime.datetime.now() - datetime.timedelta(days=5)

# Converting the start date to a timestamp in milliseconds
start_time = int(start_date.timestamp()) * 1000

In [8]:
start_time

1714120603000
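To make the millisecond conversion above easier to check, here is a minimal sketch that converts a datetime to epoch milliseconds and back. The helper names `to_epoch_ms` and `from_epoch_ms` and the fixed reference time are our own assumptions for illustration; they are not part of Hopsworks. Unlike the cell above, which truncates to whole seconds before multiplying, this sketch keeps sub-second precision and uses timezone-aware UTC datetimes.

```python
from datetime import datetime, timedelta, timezone


def to_epoch_ms(moment: datetime) -> int:
    """Convert a datetime to whole milliseconds since the Unix epoch."""
    return int(moment.timestamp() * 1000)


def from_epoch_ms(epoch_ms: int) -> datetime:
    """Convert epoch milliseconds back to a timezone-aware UTC datetime."""
    return datetime.fromtimestamp(epoch_ms / 1000, tz=timezone.utc)


# A fixed, hypothetical reference time keeps the example reproducible
reference = datetime(2024, 5, 1, 10, 0, tzinfo=timezone.utc)
window_start = reference - timedelta(days=5)

window_start_ms = to_epoch_ms(window_start)
print(window_start_ms)                 # 1714125600000
print(from_epoch_ms(window_start_ms))  # 2024-04-26 10:00:00+00:00
```

Converting the value back to a datetime is a quick sanity check that the start of the batch window is where we expect it to be.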

In [9]:
# filtered_df["timestamp"] = filtered_df["time"].apply(lambda x: int(x.timestamp() * 1000))

In [10]:
# Initializing batch scoring; the argument 1 is the training dataset version to use
feature_view.init_batch_scoring(1)

# Retrieving batch data from the feature view starting from the specified start time
batch_data = feature_view.get_batch_data(
    start_time=start_time,
)

Finished: Reading data from Hopsworks, using ArrowFlight (2.45s) 


In [11]:
# Display the first 5 rows of the batch data
batch_data.head(5)

| | timestamp | time | date | dk1_offshore_wind_forecastintraday_kwh | dk1_onshore_wind_forecastintraday_kwh | dk1_solar_forecastintraday_kwh | temperature_2m | relative_humidity_2m | precipitation | rain | snowfall | weather_code | cloud_cover | wind_speed_10m | wind_gusts_10m | type |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1714287600000 | 2024-04-28 07:00:00+00:00 | 2024-04-28 | 0.843712 | 0.206571 | 0.098264 | 0.456818 | 0.88 | 0.0 | 0.0 | 0.0 | 0.026667 | 0.62 | 0.258586 | 0.203354 | 0 |
| 1 | 1714392000000 | 2024-04-29 12:00:00+00:00 | 2024-04-29 | 0.571125 | 0.301191 | 0.860894 | 0.584091 | 0.306667 | 0.0 | 0.0 | 0.0 | 0.013333 | 0.32 | 0.359596 | 0.380503 | 1 |
| 2 | 1714536000000 | 2024-05-01 04:00:00+00:00 | 2024-05-01 | 0.532859 | 0.329352 | 0.000116 | 0.518182 | 0.706667 | 0.0 | 0.0 | 0.0 | 0.04 | 0.96 | 0.363636 | 0.320755 | 1 |
| 3 | 1714125600000 | 2024-04-26 10:00:00+00:00 | 2024-04-26 | 0.536781 | 0.30934 | 0.373829 | 0.456818 | 0.4 | 0.0 | 0.0 | 0.0 | 0.0 | 0.09 | 0.446465 | 0.4413 | 1 |
| 4 | 1714258800000 | 2024-04-27 23:00:00+00:00 | 2024-04-27 | 0.578455 | 0.267945 | 0.0 | 0.420455 | 0.906667 | 0.0 | 0.0 | 0.0 | 0.013333 | 0.4 | 0.327273 | 0.283019 | 0 |


### <span style="color:#ff5f27;">🤖 Making the predictions</span>

In [12]:
# Sorting the DataFrame based on the 'timestamp' column
batch_data.sort_values(["timestamp"], inplace=True)

# Dropping the 'date', 'timestamp' and 'time' columns from the DataFrame
X_batch = batch_data.drop(["date", "timestamp","time"], axis=1)

# Displaying the first 3 rows of the modified DataFrame
X_batch.head(3)

| | dk1_offshore_wind_forecastintraday_kwh | dk1_onshore_wind_forecastintraday_kwh | dk1_solar_forecastintraday_kwh | temperature_2m | relative_humidity_2m | precipitation | rain | snowfall | weather_code | cloud_cover | wind_speed_10m | wind_gusts_10m | type |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 23 | 0.508742 | 0.276666 | 0.318783 | 0.443182 | 0.413333 | 0.0 | 0.0 | 0.0 | 0.0 | 0.18 | 0.426263 | 0.415094 | 1 |
| 3 | 0.536781 | 0.30934 | 0.373829 | 0.456818 | 0.4 | 0.0 | 0.0 | 0.0 | 0.0 | 0.09 | 0.446465 | 0.4413 | 1 |
| 13 | 0.535901 | 0.320317 | 0.397666 | 0.461364 | 0.373333 | 0.0 | 0.0 | 0.0 | 0.026667 | 0.52 | 0.466667 | 0.460168 | 1 |
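The row labels in the output above (23, 3, 13) are no longer in ascending order because sorting reorders the rows but keeps their original index labels. The sketch below reproduces the sort-then-drop step on a tiny frame. The name `toy_batch` and its three rows are a hypothetical stand-in built from a few values shown earlier, not the real batch data.

```python
import pandas as pd

# Hypothetical toy frame standing in for batch_data (three rows only)
toy_batch = pd.DataFrame(
    {
        "timestamp": [1714392000000, 1714125600000, 1714287600000],
        "date": ["2024-04-29", "2024-04-26", "2024-04-28"],
        "temperature_2m": [0.584091, 0.456818, 0.456818],
    }
)
toy_batch["time"] = pd.to_datetime(toy_batch["timestamp"], unit="ms", utc=True)

# Sort in place by timestamp; the original index labels travel with the rows
toy_batch.sort_values(["timestamp"], inplace=True)

# drop() returns a new frame, so toy_batch keeps its time-related columns
toy_features = toy_batch.drop(["date", "timestamp", "time"], axis=1)

print(toy_features.index.tolist())    # [1, 2, 0]
print(toy_features.columns.tolist())  # ['temperature_2m']
```

Because `drop` does not modify the sorted frame, the `time` column remains available there for labelling predictions later.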


In [13]:
# The batch data retrieved above contains no 'dk1_spotpricedkk_kwh' column,
# so extract the target variable only if it is present
target = 'dk1_spotpricedkk_kwh'
y_batch = X_batch.pop(target) if target in X_batch.columns else None

# Displaying the first 3 rows of the feature DataFrame
X_batch.head(3)

In [14]:
# Display the target variable (None when the batch data has no target column)
y_batch

In [None]:
# Make predictions on the batch data using the retrieved XGBoost regressor model
predictions = retrieved_xgboost_model.predict(X_batch)

# Display the first 5 predictions
predictions[:5]
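The predictions come back as a plain array with no timestamps attached, so it can help to pair them with the `time` column of the sorted batch data. The following is a minimal sketch of that pairing. `StubModel` is a hypothetical stand-in that only mimics a `predict` method, and the toy values are made up for illustration; neither comes from the trained model or the feature store.

```python
import pandas as pd


class StubModel:
    """Hypothetical stand-in for the trained regressor; not a real model."""

    def predict(self, features: pd.DataFrame):
        # Pretend the prediction is the mean of the feature columns
        return features.mean(axis=1).to_numpy()


# Hypothetical toy batch, already sorted by time
toy_batch = pd.DataFrame(
    {
        "time": pd.to_datetime(["2024-04-26 10:00", "2024-04-26 11:00"], utc=True),
        "wind_speed_10m": [0.4, 0.6],
        "cloud_cover": [0.2, 0.8],
    }
)

# Features only: the model must not see the time column
toy_features = toy_batch.drop(["time"], axis=1)
toy_predictions = StubModel().predict(toy_features)

# Attach the predictions to their timestamps; the row order is unchanged
forecast = toy_batch[["time"]].assign(prediction=toy_predictions)
print(forecast)
```

This relies on the features and the `time` column coming from the same sorted frame, so that row *i* of the predictions belongs to row *i* of the timestamps.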

---
## <span style="color:#ff5f27;">👾 Next: Creating our Streamlit App</span>