<h1 style="text-align: center; font-size: 50px;">🌷 Register Model with LDA and SVM </h1>
This notebook works with the Iris flower dataset, a classic machine learning classification problem. <br>
The goal is to build a model that predicts the species of a flower (setosa, versicolor, or virginica) from its sepal and petal measurements.

## Notebook Overview
- Imports
- Configurations
- Define User Constants
- Loading the dataset
- Summarize the Dataset 
- Define MLflow Class
- Logging Model to MLflow
- Fetching the Latest Model Version from MLflow
- Loading the Model and Running Inference


## Imports

In [1]:
%%time

%pip install -r ../requirements.txt --quiet

# ------------------------ Data Manipulation ------------------------
import numpy as np
import pandas as pd

# ------------------------ System Utilities ------------------------
import os
import warnings
import logging
import time
from typing import Optional, Any

# ------------------------ Machine Learning tools ------------------------
from sklearn.model_selection import train_test_split
from sklearn.model_selection import cross_val_score
from sklearn.metrics import accuracy_score
from sklearn.metrics import classification_report
from sklearn.preprocessing import StandardScaler
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.svm import SVC

# ------------------------ MLflow for Experiment Tracking and Model Management ------------------------
import mlflow
from mlflow import MlflowClient
from mlflow.models.signature import ModelSignature
from mlflow.types.schema import Schema, ColSpec


## Configurations

In [2]:
# Suppress Python warnings
warnings.filterwarnings("ignore")

In [3]:
# Create logger
logger = logging.getLogger("flower_logger")
logger.setLevel(logging.INFO)
logger.propagate = False
logger.handlers.clear()

formatter = logging.Formatter("%(asctime)s - %(levelname)s - %(message)s", 
                              datefmt="%Y-%m-%d %H:%M:%S")  

stream_handler = logging.StreamHandler()
stream_handler.setFormatter(formatter)
logger.addHandler(stream_handler)
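
Re-running a logging cell in a notebook can stack handlers and print every message several times, which is why the cell above clears `logger.handlers` first. The sketch below wraps the same steps in a helper so the effect is easy to check; the function name `get_notebook_logger` is hypothetical and not part of this notebook.

```python
import logging


def get_notebook_logger(name: str = "flower_logger") -> logging.Logger:
    """Return a logger that has exactly one stream handler, even when called repeatedly."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.propagate = False   # keep records away from the root logger
    logger.handlers.clear()    # drop handlers left over from earlier runs of the cell

    handler = logging.StreamHandler()
    handler.setFormatter(
        logging.Formatter("%(asctime)s - %(levelname)s - %(message)s",
                          datefmt="%Y-%m-%d %H:%M:%S")
    )
    logger.addHandler(handler)
    return logger


# Simulate running the configuration cell twice.
first = get_notebook_logger()
second = get_notebook_logger()
```

Both calls return the same logger object, and it still carries a single handler after the second call.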

## Define User Constants

In [4]:
# ------------------------- Paths -------------------------
DATASET_URL = "https://raw.githubusercontent.com/jbrownlee/Datasets/master/iris.csv"
MODEL_DIR = "../model/Iris_model.joblib"
ARTIFACT_PATH = "iris_model"

# ------------------------ MLflow Integration ------------------------
EXPERIMENT_NAME = "Iris_Flower_Experiment"
RUN_NAME = "Iris_Flower_Run"
MODEL_NAME = "Iris_Flower_Model"

In [5]:
start_time = time.time() 
logger.info('Notebook execution started.')

2025-08-14 15:10:59 - INFO - Notebook execution started.


## Loading the dataset
The dataset is read from `DATASET_URL`. The CSV file has no header row, so the column names are supplied explicitly:

In [6]:
dataset_url = DATASET_URL
col_name = ["sepal-length", "sepal-width", "petal-length","petal-width","class"]

# Reading the .csv file
dataset = pd.read_csv(dataset_url, names = col_name)
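
Because the file is fetched from a remote URL, a few sanity checks right after loading can catch a truncated or altered download. The following is a minimal sketch, assuming a hypothetical helper `load_iris_csv`; it reads from an in-memory string holding the first three rows of the dataset so that it runs without network access.

```python
import io

import pandas as pd

COLUMNS = ["sepal-length", "sepal-width", "petal-length", "petal-width", "class"]

# First three rows of the dataset; the real file has 150 rows.
SAMPLE_CSV = (
    "5.1,3.5,1.4,0.2,Iris-setosa\n"
    "4.9,3.0,1.4,0.2,Iris-setosa\n"
    "4.7,3.2,1.3,0.2,Iris-setosa\n"
)


def load_iris_csv(source, expected_columns=COLUMNS) -> pd.DataFrame:
    """Read a header-less Iris CSV and run basic sanity checks (hypothetical helper)."""
    df = pd.read_csv(source, names=expected_columns, header=None)
    if df.isna().any().any():
        raise ValueError("dataset contains missing values")
    feature_cols = expected_columns[:-1]
    non_numeric = [c for c in feature_cols
                   if not pd.api.types.is_numeric_dtype(df[c])]
    if non_numeric:
        raise ValueError(f"non-numeric feature columns: {non_numeric}")
    return df


sample = load_iris_csv(io.StringIO(SAMPLE_CSV))
```

The same function should also accept the URL in `DATASET_URL`, since `pd.read_csv` takes either a path, a URL, or a file-like object.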

## Summarize the Dataset

### Dataset overview

The dataset is stored in comma-separated values (CSV) format and has 150 rows and 5 columns: four numeric measurements and the `class` label.

In [7]:
print("Dataset shape:", dataset.shape, " => 150 rows and 5 columns \n")
dataset.head(10) # The head() function is used to get the first n rows. By default: n = 5

Dataset shape: (150, 5)  => 150 rows and 5 columns 



| | sepal-length | sepal-width | petal-length | petal-width | class |
|---|---|---|---|---|---|
| 0 | 5.1 | 3.5 | 1.4 | 0.2 | Iris-setosa |
| 1 | 4.9 | 3.0 | 1.4 | 0.2 | Iris-setosa |
| 2 | 4.7 | 3.2 | 1.3 | 0.2 | Iris-setosa |
| 3 | 4.6 | 3.1 | 1.5 | 0.2 | Iris-setosa |
| 4 | 5.0 | 3.6 | 1.4 | 0.2 | Iris-setosa |
| 5 | 5.4 | 3.9 | 1.7 | 0.4 | Iris-setosa |
| 6 | 4.6 | 3.4 | 1.4 | 0.3 | Iris-setosa |
| 7 | 5.0 | 3.4 | 1.5 | 0.2 | Iris-setosa |
| 8 | 4.4 | 2.9 | 1.4 | 0.2 | Iris-setosa |
| 9 | 4.9 | 3.1 | 1.5 | 0.1 | Iris-setosa |


In [8]:
x = dataset.drop(['class'], axis=1)
y = dataset['class']
logger.info(f'x shape: {x.shape} | y shape: {y.shape} ')

x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.20, random_state=1)

2025-08-14 15:10:59 - INFO - x shape: (150, 4) | y shape: (150,) 
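
The split above is random but not stratified, so the three classes may not be represented equally in the 30-row test set. A possible variant, shown here only as a sketch, passes `stratify=y` to `train_test_split`. It assumes the copy of Iris bundled with scikit-learn as a stand-in for the CSV so that it runs offline.

```python
from collections import Counter

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

# Assumption: scikit-learn's bundled Iris data replaces the downloaded CSV here.
x, y = load_iris(return_X_y=True)

# stratify=y keeps the class proportions the same in both partitions.
x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.20, random_state=1, stratify=y
)

# Count how many samples of each class ended up in the test set.
test_counts = Counter(y_test.tolist())
```

With 50 samples per class and a 20% test share, each class contributes 10 rows to the test set.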


## Define MLflow Class

In [9]:
class IrisFlowerModel(mlflow.pyfunc.PythonModel):
    def load_context(self, context: Optional[Any]) -> None:
        try:    
            dataset_url = DATASET_URL
            col_name = ["sepal-length", "sepal-width", "petal-length","petal-width","class"]
            dataset = pd.read_csv(dataset_url, names = col_name)

            x = dataset.drop(['class'], axis=1)
            y = dataset['class']

            x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.20, random_state=1)

            self.scaler = StandardScaler()
            x_train_scaled = self.scaler.fit_transform(x_train)
            x_test_scaled = self.scaler.transform(x_test)

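            # Fit the SVM and keep it under its own attribute so the LDA fit below does not overwrite it
            self.svm_model = SVC(kernel="rbf", gamma="scale", C=6.812920690579608)
            self.svm_model.fit(x_train_scaled, y_train)

            # LDA is the model used by predict() and for the logged test accuracy
            self.model = LinearDiscriminantAnalysis(solver="svd")
            self.model.fit(x_train_scaled, y_train)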

            self.acc_test = accuracy_score(y_test, self.model.predict(x_test_scaled))

        except Exception as e:
            logger.error(f"Error during initialization: {str(e)}")
            raise
    
    def predict(self, context: Any, model_input: pd.DataFrame, params: Optional[dict] = None) -> list[str]:
        """
        Computes the predicted class of Iris Flower.
        """
        try:       
            x_scaled = self.scaler.transform(model_input)
            prediction = self.model.predict(x_scaled)
            return prediction.tolist()
            
        except Exception as e:
            logger.error(f"Error performing prediction: {str(e)}")
            raise
            
    @classmethod
    def log_model(cls, model_name: str) -> None:
        """
        Logs the model to MLflow with artifacts for demo and config.
        """
        try:
            # Define input and output schema
            input_schema = Schema([
                ColSpec("double","sepal-length"),
                ColSpec("double","sepal-width"),
                ColSpec("double","petal-length"),
                ColSpec("double","petal-width"),
                ])
            output_schema = Schema([
                ColSpec("string", "class"),
            ])
            
            # Define model signature
            signature = ModelSignature(inputs=input_schema, outputs=output_schema)

            model_instance = cls()
            model_instance.load_context(None)
            
            # Prepare artifacts dictionary
            artifacts = {}
            
            # Add demo folder as artifact
            demo_folder = "../demo"
            if os.path.exists(demo_folder):
                artifacts["demo"] = demo_folder
                logger.info(f"✅ Demo folder added to artifacts: {demo_folder}")
            else:
                logger.warning(f"⚠️  Demo folder not found: {demo_folder}")
                
            # Add config file as artifact
            config_path = "../configs/config.yaml"
            if os.path.exists(config_path):
                artifacts["config"] = config_path
                logger.info(f"✅ Config file added to artifacts: {config_path}")
            else:
                logger.warning(f"⚠️  Config file not found: {config_path}")
            
            # Log the model in MLflow
            mlflow.pyfunc.log_model(
                artifact_path=model_name,
                python_model=cls(),
                signature=signature,
                artifacts=artifacts if artifacts else None,
                pip_requirements=["mlflow", "pandas", "scikit-learn", "numpy"]
            )
            
            mlflow.log_metric("test_accuracy", model_instance.acc_test)
        
        except Exception as e:
            logger.error(f"Error logging model: {str(e)}")
            raise
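
The class fits both an SVM and an LDA model on the scaled training data. One way to compare the two, sketched here under assumptions, is 5-fold cross-validation with `cross_val_score` (imported at the top of the notebook). The bundled scikit-learn copy of Iris stands in for the CSV, and the `candidates` dictionary and the pipeline wrapper are illustrative names rather than part of this notebook.

```python
from sklearn.datasets import load_iris
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Assumption: scikit-learn's bundled Iris data replaces the downloaded CSV here.
x, y = load_iris(return_X_y=True)

# Same estimators and hyperparameters as in the class above.
candidates = {
    "LDA": LinearDiscriminantAnalysis(solver="svd"),
    "SVM": SVC(kernel="rbf", gamma="scale", C=6.812920690579608),
}

scores = {}
for name, estimator in candidates.items():
    # Scaling sits inside the pipeline so it is re-fitted on each training fold.
    pipeline = make_pipeline(StandardScaler(), estimator)
    fold_scores = cross_val_score(pipeline, x, y, cv=5, scoring="accuracy")
    scores[name] = fold_scores.mean()

# Name of the candidate with the highest mean accuracy.
best_name = max(scores, key=scores.get)
```

Putting the scaler inside the pipeline avoids leaking statistics from the validation fold into the training fold, which a single up-front `fit_transform` on all rows would do.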

## Logging Model to MLflow

In [10]:
# Set up MLflow experiment
mlflow.set_experiment(EXPERIMENT_NAME)

# Start an MLflow run
with mlflow.start_run(run_name=RUN_NAME) as run:
    logger.info(f"Starting the experiment: {EXPERIMENT_NAME}")
    
    # Log the model using our custom class
    IrisFlowerModel.log_model(model_name=MODEL_NAME)
    
    # Register the model in the MLflow Model Registry
    model_uri = f"runs:/{run.info.run_id}/{MODEL_NAME}"
    mlflow.register_model(
        model_uri=model_uri, 
        name=MODEL_NAME
    )

logger.info(f'Registered the model: {MODEL_NAME}')

2025/08/14 15:10:59 INFO mlflow.tracking.fluent: Experiment with name 'Iris_Flower_Experiment' does not exist. Creating a new experiment.
2025-08-14 15:10:59 - INFO - Starting the experiment: Iris_Flower_Experiment
2025-08-14 15:10:59 - INFO - ✅ Demo folder added to artifacts: ../demo
2025-08-14 15:10:59 - INFO - ✅ Config file added to artifacts: ../configs/config.yaml


Successfully registered model 'Iris_Flower_Model'.
Created version '1' of model 'Iris_Flower_Model'.
2025-08-14 15:11:01 - INFO - Registered the model: Iris_Flower_Model
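
As written, `load_context` downloads the dataset and retrains the model every time the logged model is loaded, and the `MODEL_DIR` constant is defined but never used. One possible alternative, shown only as a sketch, is to fit once, persist the fitted objects with `joblib`, and load that file later. The example uses a temporary directory and the bundled scikit-learn Iris data, both of which are assumptions made so the snippet runs on its own.

```python
import os
import tempfile

import joblib
from sklearn.datasets import load_iris
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Assumption: scikit-learn's bundled Iris data replaces the downloaded CSV here.
x, y = load_iris(return_X_y=True)

# Scaler and classifier are bundled so a single file restores both.
pipeline = make_pipeline(StandardScaler(), LinearDiscriminantAnalysis(solver="svd"))
pipeline.fit(x, y)

with tempfile.TemporaryDirectory() as tmp_dir:
    # The file name mirrors MODEL_DIR; the temporary directory is illustrative.
    model_path = os.path.join(tmp_dir, "Iris_model.joblib")
    joblib.dump(pipeline, model_path)
    restored = joblib.load(model_path)

# The restored pipeline should reproduce the original predictions.
same_predictions = bool((restored.predict(x) == pipeline.predict(x)).all())
```

In an MLflow setting, such a file could be passed through the `artifacts` dictionary of `mlflow.pyfunc.log_model` and read back in `load_context` via `context.artifacts`, which would remove the network dependency at load time.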


## Fetching the Latest Model Version from MLflow

In [11]:
# Initialize the MLflow client
client = MlflowClient()

# Retrieve the latest version of the "Iris_Flower_Model" model (not yet in a specific stage)
model_metadata = client.get_latest_versions(MODEL_NAME, stages=["None"])
latest_model_version = model_metadata[0].version  # Extract the latest model version

# Fetch model information, including its signature
model_info = mlflow.models.get_model_info(f"models:/{MODEL_NAME}/{latest_model_version}")

# Print the latest model version and its signature
logger.info(f"Latest Model Version: {latest_model_version}")
logger.info(f"Model Signature: {model_info.signature}")

2025-08-14 15:11:01 - INFO - Latest Model Version: 1
2025-08-14 15:11:01 - INFO - Model Signature: inputs: 
  ['sepal-length': double (required), 'sepal-width': double (required), 'petal-length': double (required), 'petal-width': double (required)]
outputs: 
  ['class': string (required)]
params: 
  None
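
The cell above indexes `model_metadata[0]` directly, which fails with an `IndexError` if the model has no version in the requested stage. Recent MLflow releases also mark stage-based lookups as deprecated. A minimal sketch of a stage-independent alternative follows: it picks the highest version number from whatever list of version objects the client returns. The helper name `latest_version` is hypothetical, and `SimpleNamespace` objects stand in for MLflow's model-version objects so the snippet runs without a registry.

```python
from types import SimpleNamespace


def latest_version(versions) -> int:
    """Return the highest version number among model-version objects (hypothetical helper)."""
    # Versions are compared as integers so that "10" sorts above "2".
    numbers = [int(v.version) for v in versions]
    if not numbers:
        raise LookupError("no registered versions found for this model")
    return max(numbers)


# Stand-ins for the objects a registry client would return.
fake_versions = [
    SimpleNamespace(version="2"),
    SimpleNamespace(version="10"),
    SimpleNamespace(version="1"),
]
newest = latest_version(fake_versions)

# Possible use against a real registry (not executed here):
# versions = MlflowClient().search_model_versions(f"name='{MODEL_NAME}'")
# latest_model_version = latest_version(versions)
```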



## Loading the Model and Running Inference

In [12]:
model = mlflow.pyfunc.load_model(model_uri=f"models:/{MODEL_NAME}/{latest_model_version}")

df_input = pd.DataFrame({
    'sepal-length': [5.1],
    'sepal-width': [3.5],
    'petal-length': [1.4],
    'petal-width': [0.2]
})
prediction = model.predict(df_input)
logger.info(prediction)


2025-08-14 15:11:01 - INFO - ['Iris-setosa']
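
The model signature expects four `double` columns with fixed names, so it can help to normalize incoming records before calling `model.predict`. The sketch below assumes a hypothetical helper `prepare_input` that checks for missing columns, restores the expected column order, and casts to `float64`. The two records are the first two rows of the dataset.

```python
import pandas as pd

FEATURES = ["sepal-length", "sepal-width", "petal-length", "petal-width"]


def prepare_input(records) -> pd.DataFrame:
    """Build a DataFrame matching the model signature (hypothetical helper)."""
    df = pd.DataFrame(records)
    missing = [c for c in FEATURES if c not in df.columns]
    if missing:
        raise ValueError(f"missing feature columns: {missing}")
    # Reorder the columns and cast to float64 to match the 'double' schema.
    return df[FEATURES].astype("float64")


batch = prepare_input([
    {"petal-width": 0.2, "sepal-length": 5.1, "sepal-width": 3.5, "petal-length": 1.4},
    {"sepal-length": 4.9, "sepal-width": 3, "petal-length": 1.4, "petal-width": 0.2},
])
```

The resulting `batch` could then be passed to `model.predict(batch)` to score several flowers in one call.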


In [13]:
end_time: float = time.time()
elapsed_time: float = end_time - start_time
elapsed_minutes: int = int(elapsed_time // 60)
elapsed_seconds: float = elapsed_time % 60
logger.info(f"Total execution time: {elapsed_minutes}m {elapsed_seconds:.2f}s")
logger.info("Notebook execution completed successfully.")


2025-08-14 15:11:01 - INFO - Total execution time: 0m 2.52s
2025-08-14 15:11:01 - INFO - Notebook execution completed successfully.


Built with ❤️ using [**HP AI Studio**](https://hp.com/ai-studio).