# Air Quality Index (AQI) - Batch Prediction


In this notebook we will:

1. Load the batch inference data from the feature store
2. Predict the AQI category for every row in the batch
3. Record the prediction and the actual label for the most recent row in a monitoring feature group

In [1]:
import pandas as pd
import hopsworks
import joblib

project = hopsworks.login()

fs = project.get_feature_store()

2025-01-03 01:46:30,812 INFO: Initializing external client
2025-01-03 01:46:30,813 INFO: Base URL: https://c.app.hopsworks.ai:443
2025-01-03 01:46:34,142 INFO: Python Engine initialized.

Logged in to project, explore it here https://c.app.hopsworks.ai:443/p/1207459


In [2]:
mr = project.get_model_registry()
model = mr.get_model("aqi", version=1)
model_dir = model.download()
model = joblib.load(model_dir + "/aqi.pkl")

Downloading model artifact (0 dirs, 1 files)... DONE

Next we get the feature view for the AQI data. It supplies the 'raw' feature values that the model takes as input; we explicitly do not want transformed data here.

Note that this is 'tabular data': each row describes one city, and the "aqi_category" column is the **target** (what we are trying to predict using the feature values in the target's row).

In [3]:
feature_view = fs.get_feature_view(name="aqi", version=1)

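Now we will do some **Batch Inference**.

We will read the input features from the feature view and score them. Called without a time range, `get_batch_data()` returns all available rows, not only those that arrived in the last 24 hours.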
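To restrict the batch to the last 24 hours, a time window can be passed to `get_batch_data`. The following is a minimal sketch, assuming the underlying feature group defines an event-time column (this notebook does not confirm that). `last_24h_window` is a hypothetical helper, and the Hopsworks call is shown only as a comment.

```python
from datetime import datetime, timedelta, timezone


def last_24h_window(now=None):
    """Return (start, end) datetimes covering the 24 hours before `now`."""
    end = now if now is not None else datetime.now(timezone.utc)
    start = end - timedelta(hours=24)
    return start, end


start_time, end_time = last_24h_window()
print(start_time.isoformat(), "->", end_time.isoformat())

# Assumed usage with the feature view from the previous cell; the window only
# filters rows if the feature group was created with an event_time column:
# batch_data = feature_view.get_batch_data(start_time=start_time, end_time=end_time)
```

Passing `now` explicitly keeps the helper deterministic, which is convenient when a scheduled job should score a fixed window.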

In [4]:
import datetime
from PIL import Image

# The batch data is the feature data, i.e. without the labels
batch_data = feature_view.get_batch_data()

y_pred = model.predict(batch_data)

y_pred

Finished: Reading data from Hopsworks, using Hopsworks Feature Query Service (2.04s) 


array(['Good', 'Good', 'Good', ..., 'Good', 'Good', 'Hazardous'],
      dtype=object)

In [8]:
batch_data['aqi_value']

0         49
1         37
2         48
3         61
4         54
        ... 
16389    114
16390     41
16391     43
16392     34
16393    303
Name: aqi_value, Length: 16394, dtype: int64
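As a sanity check, the predicted categories can be compared with the category implied by `aqi_value` alone. The sketch below uses the US EPA AQI breakpoints. That this dataset follows the same scale is an assumption, although the values and predictions shown in this notebook are consistent with it. `aqi_category` is a hypothetical helper, not part of the original pipeline.

```python
# Upper bound of each band on the US EPA AQI scale (assumed to match the dataset)
AQI_BANDS = [
    (50, "Good"),
    (100, "Moderate"),
    (150, "Unhealthy for Sensitive Groups"),
    (200, "Unhealthy"),
    (300, "Very Unhealthy"),
]


def aqi_category(aqi_value):
    """Map an overall AQI value to its category name."""
    for upper_bound, name in AQI_BANDS:
        if aqi_value <= upper_bound:
            return name
    # Anything above the last band is the top category
    return "Hazardous"


# Values taken from the `aqi_value` output above
for value in (49, 61, 114, 303):
    print(value, aqi_category(value))
```

If the rule-based category and the model's prediction disagree for many rows, that points at either a different scale in the data or a problem with the model.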

The batch prediction output is the prediction for the last entry in the batch. The code that saved it as the image file 'latest_iris.png' is commented out in the next cell.

In [5]:
# Predicted category for the last row, i.e. the latest data added to the feature store
category = y_pred[-1]
# flower_img = "assets/" + flower + ".png"
# img = Image.open(flower_img)            

# img.save("../../assets/latest_iris.png")

In [6]:
aqi_fg = fs.get_feature_group(name="aqi", version=1)
df = aqi_fg.read()
df

Finished: Reading data from Hopsworks, using Hopsworks Feature Query Service (4.87s) 


| | country | city | aqi_value | aqi_category | co_aqi_value | ozone_aqi_value | no2_aqi_value | pm25_aqi_value | lat | lng | uuid |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Germany | Wolgast | 49 | Good | 1 | 35 | 2 | 49 | 54.0500 | 13.7667 | ad2eed88-7404-4fea-83e1-a5365da42f69 |
| 1 | Belgium | Ans | 37 | Good | 1 | 22 | 4 | 37 | 50.6625 | 5.5200 | 346747f4-55c0-469d-81a2-cae081a4a615 |
| 2 | Belgium | Tubize | 48 | Good | 1 | 25 | 5 | 48 | 50.6930 | 4.2047 | a5ffad51-2116-41d0-bafa-65b2aeac110d |
| 3 | Romania | Gheorgheni | 61 | Moderate | 1 | 40 | 0 | 61 | 46.7200 | 25.5900 | 3d62cb08-f644-478e-b18e-9d91780a7d62 |
| 4 | United States of America | State College | 54 | Moderate | 1 | 40 | 1 | 54 | 40.7909 | -77.8567 | b03c92c8-5507-440f-a514-32e4ab235dd4 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 16389 | India | Barauli | 114 | Unhealthy for Sensitive Groups | 3 | 67 | 2 | 114 | 26.3815 | 84.5872 | 4b0be1d1-8a70-4e76-93ba-99b7932dd866 |
| 16390 | United States of America | Albany | 41 | Good | 1 | 15 | 8 | 41 | 44.6272 | -123.0965 | d6ad878d-33c5-462d-bde7-9b1eb7b6d703 |
| 16391 | Switzerland | Winterthur | 43 | Good | 1 | 25 | 2 | 43 | 47.4989 | 8.7286 | 3168e7be-9301-4b27-8b01-611b36ad0556 |
| 16392 | Finland | Salo | 34 | Good | 0 | 34 | 1 | 20 | 60.3861 | 23.1250 | d8a12282-c99d-4197-8e2c-cfdddbff3ed9 |


In [7]:
label = df.iloc[-1]["aqi_category"]
label

'Hazardous'

In [9]:
#image of the actual flower
# label_flower = "assets/" + label + ".png"

# img = Image.open(label_flower)            

# img.save("../../assets/actual_iris.png")

In [8]:
import pandas as pd

monitor_fg = fs.get_or_create_feature_group(name="aqi_predictions",
                                  version=1,
                                  primary_key=["datetime"],
                                  description="Air Quality Prediction/Outcome Monitoring"
                                 )

# To start over, delete the monitoring feature group (this drops the
# feature group itself, not only its rows)
# monitor_fg = fs.get_feature_group(name="aqi_predictions", version=1)

# monitor_fg.delete()

# print("Feature group deleted successfully!")

In [9]:
from datetime import datetime
now = datetime.now().strftime("%m/%d/%Y, %H:%M:%S")
last_row = df.iloc[-1]
data = {
    'country': [last_row['country']],  # Convert to list for DataFrame compatibility
    'city': [last_row['city']],
    'aqi_value': [last_row['aqi_value']],
    'co_aqi_value': [last_row['co_aqi_value']],
    'ozone_aqi_value': [last_row['ozone_aqi_value']],
    'no2_aqi_value': [last_row['no2_aqi_value']],
    'pm25_aqi_value': [last_row['pm25_aqi_value']],
    'lat': [last_row['lat']],
    'lng': [last_row['lng']],
    'prediction': [category],  # predicted AQI category for the last row
    'label': [label],          # actual AQI category for the last row
    'datetime': [now],
}
monitor_df = pd.DataFrame(data)
monitor_fg.insert(monitor_df)

Feature Group created successfully, explore it at 
https://c.app.hopsworks.ai:443/p/1207459/fs/1195092/fg/1393475


Uploading Dataframe: 100.00% |██████████| Rows 1/1 | Elapsed Time: 00:00 | Remaining Time: 00:00


Launching job: aqi_predictions_1_offline_fg_materialization
Job started successfully, you can follow the progress at 
https://c.app.hopsworks.ai:443/p/1207459/jobs/named/aqi_predictions_1_offline_fg_materialization/executions


(Job('aqi_predictions_1_offline_fg_materialization', 'SPARK'), None)
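Once the monitoring feature group holds several rows, the `prediction` and `label` columns written above can be compared to track how often the model is right. The following is a minimal sketch over plain dictionaries. `hit_rate` is a hypothetical helper, and the sample rows are made up for illustration.

```python
def hit_rate(records):
    """Fraction of records whose prediction equals the label (None if empty)."""
    if not records:
        return None
    hits = sum(1 for row in records if row["prediction"] == row["label"])
    return hits / len(records)


# Hypothetical monitoring rows; with pandas, real rows could be obtained from
# a DataFrame read back from the feature group via
# df[["prediction", "label"]].to_dict("records")
sample = [
    {"prediction": "Hazardous", "label": "Hazardous"},
    {"prediction": "Good", "label": "Moderate"},
]
print(hit_rate(sample))
```

Returning `None` for an empty history avoids a division by zero on the first run, before any outcome has been logged.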

In [15]:
history_df = monitor_fg.read()
history_df

2025-01-03 01:48:06,540 ERROR: [Errno 2] Opening HDFS file '/apps/hive/warehouse/mlops101_featurestore.db/aqi_predictions_1/.hoodie/hoodie.properties' failed. Detail: [errno 2] No such file or directory. Detail: Python exception: FlyingDuckException. gRPC client debug context: UNKNOWN:Error received from peer ipv4:51.79.26.27:5005 {grpc_message:"[Errno 2] Opening HDFS file \'/apps/hive/warehouse/mlops101_featurestore.db/aqi_predictions_1/.hoodie/hoodie.properties\' failed. Detail: [errno 2] No such file or directory. Detail: Python exception: FlyingDuckException", grpc_status:2, created_time:"2025-01-03T00:48:06.541484656+00:00"}. Client context: IOError: Server never sent a data message. Detail: Internal
Traceback (most recent call last):
  File "C:\Users\HP\anaconda3\Lib\site-packages\hsfs\core\arrow_flight_client.py", line 364, in afs_error_handler_wrapper
    return func(instance, *args, **kw)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\HP\anaconda3\Lib\site-packages\hs

FeatureStoreException: Could not read data using Hopsworks Feature Query Service.
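One plausible cause of this error is that the read ran before the materialization job launched by `insert` had finished writing the offline table. This is an assumption and is not confirmed by the log. The following is a minimal sketch of a generic retry wrapper for that situation. `read_with_retry` is a hypothetical helper and not part of the Hopsworks API.

```python
import time


def read_with_retry(read_fn, attempts=5, delay_seconds=30.0):
    """Call read_fn until it succeeds or attempts run out, then re-raise."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return read_fn()
        except Exception as exc:  # broad on purpose: the client wraps its errors
            if attempt == attempts:
                raise
            print(f"Read failed ({exc!r}); retry {attempt} of {attempts - 1} in {delay_seconds}s")
            time.sleep(delay_seconds)


# Assumed usage with the feature group from the cells above:
# history_df = read_with_retry(monitor_fg.read)
```

A fixed delay is the simplest choice here. If the job duration varies widely, waiting for the job to finish before reading would be more reliable than polling.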