# <span style="font-weight:bold; font-size: 3rem; color:#1EB182;"><img src="../images/icon102.png" width="38px"></img> **Hopsworks Feature Store** </span><span style="font-weight:bold; font-size: 3rem; color:#333;">- Part 04: Batch Inference</span>

## 🗒️ This notebook is divided into the following sections:

1. Connect to the Hopsworks Feature Store and retrieve the feature view.
2. Retrieve your trained model from the Model Registry.
3. Load batch data.
4. Predict batch data.

## <span style='color:#ff5f27'> 📝 Imports </span>

In [1]:
import joblib
import datetime
import pandas as pd

## <span style="color:#ff5f27;"> 📡 Connect to Hopsworks Feature Store </span>

In [2]:
import hopsworks

project = hopsworks.login()

fs = project.get_feature_store()



Connected. Call `.close()` to terminate connection gracefully.

Logged in to project, explore it here https://snurran.hops.works/p/5242
Connected. Call `.close()` to terminate connection gracefully.


## <span style="color:#ff5f27;"> ⚙️ Feature View Retrieval</span>

In [3]:
# Retrieve the 'air_quality_fv' feature view
feature_view = fs.get_feature_view(
    name='air_quality_fv',
    version=1,
)

## <span style="color:#ff5f27;">🗄 Model Registry</span>

In [4]:
# Retrieve the model registry
mr = project.get_model_registry()

Connected. Call `.close()` to terminate connection gracefully.


## <span style="color:#ff5f27;">🪝 Retrieve model from Model Registry</span>

In [5]:
# Retrieve the 'air_quality_xgboost_model' from the model registry
retrieved_model = mr.get_model(
    name="air_quality_xgboost_model",
    version=1,
)

# Download the saved model artifacts to a local directory
saved_model_dir = retrieved_model.download()

Downloading model artifact (0 dirs, 6 files)... DONE

In [6]:
# Load the XGBoost regressor model and label encoder from the saved model directory
retrieved_xgboost_model = joblib.load(saved_model_dir + "/xgboost_regressor.pkl")
retrieved_encoder = joblib.load(saved_model_dir + "/label_encoder.pkl")

# Display the retrieved XGBoost regressor model
retrieved_xgboost_model

## <span style="color:#ff5f27;">✨ Load Batch Data for the Last 30 Days</span>

First, fetch the batch data for the last 30 days from the feature view you retrieved above.

In [7]:
# Get the current date
today = datetime.date.today()

# Calculate a date threshold 30 days ago from the current date
date_threshold = today - datetime.timedelta(days=30)

# Convert the date threshold to a string format
str(date_threshold)

'2024-01-22'
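
The threshold depends on the day the notebook is run. As a minimal sketch, the same calculation can be wrapped in a small helper and called with a fixed run date so the result is reproducible. The helper name `date_threshold_for` and the run date 2024-02-21 are illustrative assumptions and are not part of the tutorial code.

```python
import datetime


def date_threshold_for(run_date: datetime.date, days: int = 30) -> datetime.date:
    """Return the date `days` days before `run_date` (hypothetical helper)."""
    return run_date - datetime.timedelta(days=days)


# Assumed run date, chosen only for illustration
run_date = datetime.date(2024, 2, 21)
threshold = date_threshold_for(run_date)

print(threshold)  # prints 2024-01-22
```

Passing the run date explicitly, rather than calling `datetime.date.today()` inside the helper, makes the batch window easy to reproduce when a job is re-run later.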

In [8]:
# Initialize batch scoring; the argument is the training dataset version (here, version 1)
feature_view.init_batch_scoring(1)

# Retrieve batch data from the feature view with a start time set to the date threshold
batch_data = feature_view.get_batch_data(
    start_time=date_threshold,
)

Finished: Reading data from Hopsworks, using ArrowFlight (7.95s) 


### <span style="color:#ff5f27;">🤖 Making the predictions</span>

In [12]:
# Transform the 'city_name' column in the batch data using the retrieved label encoder
encoded = retrieved_encoder.transform(batch_data['city_name'])

# Concatenate the label-encoded 'city_name' with the original batch data,
# reusing the batch index so the rows stay aligned
X_batch = pd.concat([batch_data, pd.DataFrame(encoded, index=batch_data.index)], axis=1)

# Drop unnecessary columns ('date', 'city_name', 'unix_time') from the batch data
X_batch = X_batch.drop(columns=['date', 'city_name', 'unix_time'])

# Rename the newly added column with label-encoded city names to 'city_name_encoded'
X_batch = X_batch.rename(columns={0: 'city_name_encoded'})

# Extract the target variable 'pm2_5' from the batch data
y_batch = X_batch.pop('pm2_5')

X_batch.head(3)

| | pm_2_5_previous_1_day | pm_2_5_previous_2_day | pm_2_5_previous_3_day | pm_2_5_previous_4_day | pm_2_5_previous_5_day | pm_2_5_previous_6_day | pm_2_5_previous_7_day | mean_7_days | mean_14_days | mean_28_days | ... | temperature_max | temperature_min | precipitation_sum | rain_sum | snowfall_sum | precipitation_hours | wind_speed_max | wind_gusts_max | wind_direction_dominant | city_name_encoded |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 7.7 | 7.4 | 6.8 | 22.9 | 8.6 | 11.1 | 8.5 | 10.428571 | 9.15 | 9.610714 | ... | 8.8 | -3.9 | 0.0 | 0.0 | 0.0 | 0.0 | 15.1 | 32.4 | 108 | 3 |
| 1 | 13.9 | 11.9 | 15.2 | 19.1 | 14.8 | 10.6 | 6.4 | 13.128571 | 9.6 | 11.667857 | ... | 7.9 | 2.5 | 0.0 | 0.0 | 0.0 | 0.0 | 13.0 | 19.4 | 83 | 31 |
| 2 | 2.7 | 5.1 | 6.9 | 3.7 | 5.1 | 7.6 | 10.5 | 5.942857 | 5.307143 | 5.285714 | ... | 10.7 | 7.5 | 1.9 | 1.9 | 0.0 | 4.0 | 18.5 | 39.2 | 225 | 37 |
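
One detail of the concatenation step is worth a sketch: `pd.concat(..., axis=1)` aligns rows on index labels, not on position. The toy example below uses made-up data (`toy_batch` and `toy_encoded` are hypothetical stand-ins for `batch_data` and the label encoder output) to show what happens when the batch frame does not carry the default `0..n-1` index, and how passing the batch index to the new frame keeps rows aligned.

```python
import numpy as np
import pandas as pd

# Toy stand-in for batch_data with a non-default index (e.g. after filtering)
toy_batch = pd.DataFrame(
    {"city_name": ["Paris", "Berlin", "Paris"], "pm2_5": [7.7, 13.9, 2.7]},
    index=[10, 11, 12],
)

# Toy stand-in for the encoder output, which is a plain NumPy array
toy_encoded = np.array([1, 0, 1])

# Without an explicit index the new frame is labelled 0..2, so concat
# takes the union of both indexes and pads the gaps with NaN
misaligned = pd.concat([toy_batch, pd.DataFrame(toy_encoded)], axis=1)

# Reusing the batch index keeps each encoded value on its own row
aligned = pd.concat(
    [toy_batch, pd.DataFrame(toy_encoded, index=toy_batch.index)], axis=1
).rename(columns={0: "city_name_encoded"})

print(len(misaligned), len(aligned))  # prints 6 3
```

If the batch data already has a default index, both variants give the same result, so this only matters when rows were filtered or reordered beforehand.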


In [10]:
# Make predictions on the batch data using the retrieved XGBoost regressor model
predictions = retrieved_xgboost_model.predict(X_batch)

# Display the first 5 predictions
predictions[:5]

array([11.763566 , 12.726435 ,  3.3570244,  5.7458963,  5.986775 ],
      dtype=float32)
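
The notebook extracts the observed `pm2_5` values into `y_batch` but does not compare them with the predictions. As a rough sketch of how such a comparison could look, the example below computes the mean absolute error and root mean squared error with NumPy. The arrays `y_true` and `y_pred` are made-up stand-ins for `y_batch` and `predictions`; the numbers are not results from the tutorial model.

```python
import numpy as np

# Made-up stand-ins for y_batch (observed) and predictions (model output)
y_true = np.array([10.0, 12.0, 4.0, 6.0])
y_pred = np.array([11.0, 13.0, 3.0, 6.0])

# Per-row prediction errors
errors = y_pred - y_true

# Mean absolute error and root mean squared error
mae = float(np.mean(np.abs(errors)))
rmse = float(np.sqrt(np.mean(errors ** 2)))

print(f"MAE={mae:.3f}, RMSE={rmse:.3f}")  # prints MAE=0.750, RMSE=0.866
```

In the notebook itself, the same two lines could be applied to `y_batch.to_numpy()` and `predictions`, assuming both have the same length and row order.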

---
## <span style="color:#ff5f27;">👾 Now try out the Streamlit App!</span>

In [11]:
# !python3 -m streamlit run streamlit_app.py

---

### <span style="color:#ff5f27;">🥳 <b> Next Steps  </b> </span>
Congratulations, you've now completed the Air Quality tutorial for Managed Hopsworks.

Check out our other tutorials at ➡ https://github.com/logicalclocks/hopsworks-tutorials

Or read the documentation at ➡ https://docs.hopsworks.ai