# Iris Flower - Batch Prediction


In this notebook we will:

1. Load the batch inference data that arrived in the last 24 hours
2. Predict the first Iris Flower found in the batch
3. Write the output PNG of the predicted Iris flower, to be displayed in GitHub Pages.

In [1]:
import pandas as pd
import hopsworks
import joblib

project = hopsworks.login()

fs = project.get_feature_store()

2024-12-30 20:15:35,738 INFO: Initializing external client
2024-12-30 20:15:35,741 INFO: Base URL: https://c.app.hopsworks.ai:443
2024-12-30 20:15:39,756 INFO: Python Engine initialized.

Logged in to project, explore it here https://c.app.hopsworks.ai:443/p/1207459


In [2]:
mr = project.get_model_registry()
model = mr.get_model("iris", version=1)
model_dir = model.download()
model = joblib.load(model_dir + "/iris_model.pkl")

Downloading model artifact (0 dirs, 2 files)... DONE

We want the 'raw' iris data here. We explicitly do not want transformed data that is ready for training.

So, let's get the `iris` feature view, which we will use to read the data.

Note that it is 'tabular data'. There are 5 columns: 4 of them are "features", and the "variety" column is the **target** (what we are trying to predict using the 4 feature values in the target's row).
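
To make the feature/target distinction concrete, here is a minimal sketch in plain Python. The two rows are copied from the feature group preview shown later in this notebook. The helper name `split_features_target` and the constants are hypothetical and are not part of the Hopsworks API.

```python
# The four feature columns and the target column described above.
FEATURE_COLUMNS = ["sepal_length", "sepal_width", "petal_length", "petal_width"]
TARGET_COLUMN = "variety"

# Two example rows, taken from the feature group preview in this notebook.
rows = [
    {"sepal_length": 5.0, "sepal_width": 3.3, "petal_length": 1.4,
     "petal_width": 0.2, "variety": "Setosa"},
    {"sepal_length": 6.6, "sepal_width": 3.0, "petal_length": 4.4,
     "petal_width": 1.4, "variety": "Versicolor"},
]


def split_features_target(rows):
    """Split tabular rows into feature vectors and target labels."""
    features = [[row[name] for name in FEATURE_COLUMNS] for row in rows]
    targets = [row[TARGET_COLUMN] for row in rows]
    return features, targets


X, y = split_features_target(rows)
print(X[0], y[0])
```

At inference time only the feature part is available, which is why the batch data read below carries no labels.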

In [3]:
feature_view = fs.get_feature_view(name="iris", version=1)

Now we will do some **Batch Inference**. 

We will read all the input features that have arrived in the last 24 hours, and score them.
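
The 24-hour window mentioned above can be computed with the standard library, as in the sketch below. The helper `last_24_hours` is hypothetical. Passing the window to `get_batch_data` through `start_time` and `end_time` is an assumption about your setup: it only makes sense if the underlying feature group defines an event-time column, which this notebook does not show.

```python
from datetime import datetime, timedelta, timezone


def last_24_hours(now=None):
    """Return (start_time, end_time) covering the 24 hours up to `now`."""
    end_time = now if now is not None else datetime.now(timezone.utc)
    start_time = end_time - timedelta(hours=24)
    return start_time, end_time


# A fixed timestamp is used here so the example is reproducible.
start_time, end_time = last_24_hours(
    datetime(2024, 12, 30, 20, 15, tzinfo=timezone.utc)
)
print(start_time.isoformat(), end_time.isoformat())

# Assumption: with an event-time column on the feature group, the window
# could be applied as
#   feature_view.get_batch_data(start_time=start_time, end_time=end_time)
```

Without such a window, the call in the next cell reads all the data available through the feature view.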

In [4]:
import datetime
from PIL import Image

# The batch data is the feature data, i.e. without the labels
batch_data = feature_view.get_batch_data()

y_pred = model.predict(batch_data)

y_pred

Finished: Reading data from Hopsworks, using Hopsworks Feature Query Service (0.94s) 


array(['Setosa', 'Versicolor', 'Setosa', 'Setosa', 'Setosa', 'Versicolor',
       'Virginica', 'Setosa', 'Virginica', 'Setosa', 'Setosa', 'Setosa',
       'Versicolor', 'Setosa', 'Versicolor', 'Virginica', 'Setosa',
       'Virginica', 'Virginica', 'Virginica', 'Versicolor', 'Virginica',
       'Setosa', 'Setosa', 'Setosa', 'Versicolor', 'Virginica',
       'Versicolor', 'Versicolor', 'Virginica', 'Virginica', 'Setosa',
       'Setosa', 'Setosa', 'Versicolor', 'Setosa', 'Versicolor',
       'Versicolor', 'Virginica', 'Versicolor', 'Versicolor',
       'Versicolor', 'Versicolor', 'Virginica', 'Versicolor', 'Setosa',
       'Virginica', 'Virginica', 'Setosa', 'Setosa', 'Versicolor',
       'Versicolor', 'Virginica', 'Setosa', 'Virginica', 'Setosa',
       'Versicolor', 'Versicolor', 'Virginica', 'Setosa', 'Virginica',
       'Setosa', 'Setosa', 'Virginica', 'Setosa', 'Versicolor', 'Setosa',
       'Virginica', 'Versicolor', 'Setosa', 'Virginica', 'Versicolor',
       'Virginica', 'Virgin

In [5]:
batch_data['sepal_length']

0      5.000000
1      6.600000
2      5.100000
3      5.100000
4      5.400000
         ...   
177    5.367117
178    5.367117
179    5.367117
180    6.533010
181    4.995918
Name: sepal_length, Length: 182, dtype: float64

The batch prediction output is the prediction for the last entry in the batch. It is written to the file 'latest_iris.png'.

In [6]:
# Image of the predicted flower for the latest data added to the feature store
flower = y_pred[-1]  # the last prediction in the batch
flower_img = "assets/" + flower + ".png"
img = Image.open(flower_img)

img.save("../../assets/latest_iris.png")
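
The cell above builds the image path by string concatenation, which fails with a file error if the model ever returns an unexpected label. As a hedged sketch, the lookup could be validated first. The helper `flower_image_path` and the `KNOWN_VARIETIES` tuple are hypothetical names. The three varieties are the ones that appear in the prediction output above.

```python
from pathlib import Path

# The three varieties that appear in the prediction output.
KNOWN_VARIETIES = ("Setosa", "Versicolor", "Virginica")


def flower_image_path(variety, assets_dir="assets"):
    """Return the path of the PNG for a predicted variety.

    Raises ValueError for a label outside the known varieties, so a bad
    prediction is reported before any file is opened.
    """
    if variety not in KNOWN_VARIETIES:
        raise ValueError(f"Unexpected variety: {variety!r}")
    return Path(assets_dir) / f"{variety}.png"


print(flower_image_path("Setosa"))
```

The returned path can be passed directly to `Image.open`, which accepts `pathlib.Path` objects.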

In [7]:
iris_fg = fs.get_feature_group(name="iris", version=1)
df = iris_fg.read()
df

Finished: Reading data from Hopsworks, using Hopsworks Feature Query Service (1.33s) 


| | sepal_length | sepal_width | petal_length | petal_width | variety | uuid |
|---|---|---|---|---|---|---|
| 0 | 5.000000 | 3.300000 | 1.400000 | 0.200000 | Setosa | 48fe1330-1f4f-4526-8bb6-2f3592ab7b61 |
| 1 | 6.600000 | 3.000000 | 4.400000 | 1.400000 | Versicolor | 9b23506a-83b3-47fd-ac8f-8acfa716837d |
| 2 | 5.100000 | 3.500000 | 1.400000 | 0.300000 | Setosa | 18383b75-a81b-4527-9cde-59b047a66bca |
| 3 | 5.100000 | 3.700000 | 1.500000 | 0.400000 | Setosa | 0dd9d1fb-1d52-4df5-a6f2-b2c1a49c5db0 |
| 4 | 5.400000 | 3.400000 | 1.700000 | 0.200000 | Setosa | eb0183b4-f3a2-4248-9b8b-a661f1bbcd54 |
| ... | ... | ... | ... | ... | ... | ... |
| 177 | 5.367117 | 4.434446 | 1.374910 | 0.497858 | Setosa | 312cab31-018c-449e-88d5-ea84fe943b8e |
| 178 | 5.367117 | 4.434446 | 1.374910 | 0.497858 | Setosa | aaeb3d94-6142-4d24-88ad-4a497fee1337 |
| 179 | 5.367117 | 4.434446 | 1.374910 | 0.497858 | Setosa | c374f893-edc1-449a-bea6-e0a1b2a43d0a |
| 180 | 6.533010 | 3.296076 | 3.580095 | 1.754300 | Versicolor | 4ce96f90-c87f-476d-8303-b0ea8890f9fe |


In [8]:
label = df.iloc[-1]["variety"]
label

'Versicolor'

In [9]:
# Image of the actual flower
label_flower = "assets/" + label + ".png"

img = Image.open(label_flower)            

img.save("../../assets/actual_iris.png")

In [9]:
import pandas as pd

monitor_fg = fs.get_or_create_feature_group(name="iris_predictions",
                                  version=1,
                                  primary_key=["datetime"],
                                  description="Iris flower Prediction/Outcome Monitoring"
                                 )

# Clear the contents of the feature group
# monitor_fg = fs.get_feature_group(name="iris_predictions", version=1)

# monitor_fg.delete()

# print("Feature group contents cleared successfully!")

In [None]:
from datetime import datetime
now = datetime.now().strftime("%m/%d/%Y, %H:%M:%S")

data = {
    's_length': [float(df.iloc[-1]['sepal_length'])],  # Convert to scalar
    's_width': [float(df.iloc[-1]['sepal_width'])],
    'p_length': [float(df.iloc[-1]['petal_length'])],
    'p_width': [float(df.iloc[-1]['petal_width'])],
    'prediction': [flower],  # Ensure 'flower' is a scalar value
    'label': [label],        # Ensure 'label' is a scalar value
    'datetime': [now],
}
monitor_df = pd.DataFrame(data)
monitor_fg.insert(monitor_df)

Feature Group created successfully, explore it at 
https://c.app.hopsworks.ai:443/p/1207459/fs/1195092/fg/1394387


In [None]:
history_df = monitor_fg.read()
history_df

In [13]:
# import dataframe_image as dfi

# df_recent = history_df.tail(5)
 
# # If you exclude this image, latest_iris.png and actual_iris.png may be identical to the files from the previous run
# # If no files have changed, the GH-action 'git commit/push' stage fails, failing your GH action (last step)
# # This image, however, is always new, ensuring git commit/push will succeed.
# dfi.export(df_recent, '../../assets/df_recent.png', table_conversion = 'matplotlib')



In [19]:
# from sklearn.metrics import confusion_matrix

# predictions = history_df[['prediction']]
# labels = history_df[['label']]

# results = confusion_matrix(labels, predictions)
# print(results)

[[0 1]
 [0 0]]


In [20]:
# from matplotlib import pyplot
# import seaborn as sns

# # Only create the confusion matrix when our iris_predictions feature group has examples of all 3 iris flowers
# if results.shape == (3,3):

#     df_cm = pd.DataFrame(results, ['True Setosa', 'True Versicolor', 'True Virginica'],
#                          ['Pred Setosa', 'Pred Versicolor', 'Pred Virginica'])

#     cm = sns.heatmap(df_cm, annot=True)

#     fig = cm.get_figure()
#     fig.savefig("../../assets/confusion_matrix.png") 
#     df_cm
# else:
#     print("Run the batch inference pipeline more times until you get 3 different iris flowers")    

Run the batch inference pipeline more times until you get 3 different iris flowers
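
The 3x3 shape check in the commented-out cell is needed because a confusion matrix built only from observed values has one row and column per class seen so far. One way around this is to fix the class order up front. The sketch below does this in plain Python, and the names `VARIETIES` and `confusion_matrix_fixed` are hypothetical. scikit-learn's `confusion_matrix` offers the same behaviour through its `labels` parameter. The history used here is made up for illustration and is not data from the feature group.

```python
from collections import Counter

# Fixed class order, so the matrix is always 3x3.
VARIETIES = ["Setosa", "Versicolor", "Virginica"]


def confusion_matrix_fixed(labels, predictions, classes=VARIETIES):
    """Rows are true classes, columns are predicted classes, in `classes` order."""
    counts = Counter(zip(labels, predictions))
    return [[counts[(true, pred)] for pred in classes] for true in classes]


# Hypothetical monitoring history in which only two varieties were observed.
labels = ["Setosa", "Versicolor", "Setosa"]
predictions = ["Setosa", "Setosa", "Setosa"]

matrix = confusion_matrix_fixed(labels, predictions)
for name, row in zip(VARIETIES, matrix):
    print(f"True {name}: {row}")
```

Classes that have not been seen yet simply show up as all-zero rows and columns, so the heatmap could be drawn from the first run onwards.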
