# Scikit-Learn Classifier

This notebook is an example of developing a modular, reusable Scikit-Learn classification backend.
In this guide we will:

1. Create a project with Poetry
2. Train a classifier with Scikit-Learn
3. Develop the Inference Backend for running the model with Packflow
4. Load and validate the Backend from the installed package

## Creating a Project

First, we'll install Poetry and create a new project:

In [1]:
%pip install poetry --quiet

Note: you may need to restart the kernel to use updated packages.


In [2]:
%%sh

poetry new sklearn_classifier

Created package sklearn_classifier in sklearn_classifier


Next, we need to add a few dependencies to our Poetry project:

In [3]:
%%sh

poetry --directory ./sklearn_classifier add scikit-learn joblib pandas

Using version ^1.8.0 for scikit-learn
Using version ^1.5.3 for joblib
Using version ^3.0.0 for pandas

Updating dependencies
Resolving dependencies...

No dependencies to install or update

Writing lock file


## Training an Iris Classifier

For our sample use-case, we'll use the Scikit-Learn Iris dataset and train a simple Decision Tree Classifier:

In [4]:
from sklearn.datasets import load_iris

X, y = load_iris(return_X_y=True, as_frame=True)

X.sample(3)

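|     | sepal length (cm) | sepal width (cm) | petal length (cm) | petal width (cm) |
|-----|-------------------|------------------|-------------------|------------------|
| 21  | 5.1               | 3.7              | 1.5               | 0.4              |
| 29  | 4.7               | 3.2              | 1.6               | 0.2              |
| 111 | 6.4               | 2.7              | 5.3               | 1.9              |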


For simplicity, we will skip best practices such as holding out a test set and train our model on the entire dataset:

In [5]:
from sklearn.tree import DecisionTreeClassifier
import joblib

model = DecisionTreeClassifier()

model.fit(X, y)

joblib.dump(model, "sklearn_classifier/src/sklearn_classifier/model.joblib")

['sklearn_classifier/src/sklearn_classifier/model.joblib']

The model has now been trained (fit) and serialized with joblib to the path output above.
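
Because the cell above trains on all 150 rows, it gives no estimate of how well the model generalizes. As a minimal sketch of a more conventional workflow, the following holds out a stratified test set and reports accuracy on it. The 25% test size and `random_state=0` are arbitrary choices for illustration, not values from this guide.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True, as_frame=True)

# Hold out a stratified test set so each class keeps its share of rows.
# test_size and random_state are illustrative assumptions.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

model = DecisionTreeClassifier(random_state=0)
model.fit(X_train, y_train)

# Mean accuracy on rows the model has not seen during training
accuracy = model.score(X_test, y_test)
print(f"Held-out accuracy: {accuracy:.3f}")
```

Once the evaluation looks reasonable, the model can still be refit on the full dataset before serializing it, as done in this guide.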

## Developing the Inference Backend

Now we can develop the Inference Backend for running and sharing the model with Packflow:

In [6]:
%%writefile sklearn_classifier/src/sklearn_classifier/inference.py

# -- Packflow imports --
from packflow import InferenceBackend, BackendConfig
from packflow.utils.normalize import ensure_valid_output

# -- Imports that are required to run the model --
from pathlib import Path
import pandas as pd
import sklearn
import joblib


class SklearnClassifierConfig(BackendConfig):
    # Config field for where to load the model from.
    # Defaults to the model.joblib file that sits next to this module.
    serialized_model_path: Path = Path(__file__).resolve().parent.joinpath('model.joblib')
    
    # Define the default input feature names
    feature_names: list[str] = [
        'sepal length (cm)', 
        'sepal width (cm)', 
        'petal length (cm)', 
        'petal width (cm)'
    ]


class Backend(InferenceBackend):
    # Override the default config model with the custom config class defined above
    backend_config_model = SklearnClassifierConfig

    def initialize(self):
        self.logger.info(f'Loading model from: {self.config.serialized_model_path}')
        self.model = joblib.load(self.config.serialized_model_path)

    def transform_inputs(self, inputs):
        """
        Convert the input array (this backend uses the NumPy preprocessor) to a pandas DataFrame
        """
        return pd.DataFrame(columns=self.config.feature_names, data=inputs)
        
    
    def execute(self, inputs):
        """
        Run the Pandas DataFrame through the loaded model 
        and return the predicted class.
        """
        return self.model.predict(inputs)

    def transform_outputs(self, outputs):
        """
        Use Packflow's `ensure_valid_output` to convert outputs to safe return types.

        Note: 
            This method is less flexible and does not apply business-logic.
            However, for this demo we will assume the output does not need
            any special postprocessing.
        """
        return ensure_valid_output(outputs, parent_key='class')


# Set defaults for base fields
backend = Backend(
    input_format='numpy'
)

Writing sklearn_classifier/src/sklearn_classifier/inference.py
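
Note that `transform_inputs` labels the incoming array positionally: the first column receives the first entry of `feature_names`, the second column the second entry, and so on. The list therefore has to match the column order used at training time. The plain-Python sketch below illustrates that pairing without pandas; `label_rows` is a hypothetical helper written for this illustration and is not part of Packflow.

```python
def label_rows(rows, feature_names):
    """Pair each value in a row with the feature name at the same position."""
    labeled = []
    for index, row in enumerate(rows):
        # A width mismatch means the names cannot be assigned unambiguously
        if len(row) != len(feature_names):
            raise ValueError(
                f"row {index} has {len(row)} values, expected {len(feature_names)}"
            )
        labeled.append(dict(zip(feature_names, row)))
    return labeled


feature_names = [
    "sepal length (cm)",
    "sepal width (cm)",
    "petal length (cm)",
    "petal width (cm)",
]
rows = [[5.1, 3.7, 1.5, 0.4], [6.4, 2.7, 5.3, 1.9]]

labeled = label_rows(rows, feature_names)
print(labeled[0]["petal length (cm)"])  # 1.5
```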


We also need to import the `inference` module in the package's `__init__.py` file so that it is available when the package is imported:

In [7]:
%%writefile sklearn_classifier/src/sklearn_classifier/__init__.py

from . import inference

Overwriting sklearn_classifier/src/sklearn_classifier/__init__.py


Now that we've added an `inference.py` file to our Poetry package, we can use Packflow's ModuleLoader to import the backend and run it wherever needed.
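
The string passed to `ModuleLoader` in the validation step below has the form `module:attribute`. How Packflow resolves it internally is not shown in this guide, so the following is only a sketch of the general idea using the standard library's `importlib`. The name `load_from_spec` is hypothetical.

```python
import importlib


def load_from_spec(spec):
    """Resolve a 'package.module:attribute' string to the named object."""
    module_name, separator, attribute = spec.partition(":")
    if not separator or not module_name or not attribute:
        raise ValueError(f"expected 'module:attribute', got {spec!r}")
    module = importlib.import_module(module_name)
    return getattr(module, attribute)


# Demonstrated with a standard-library module so the sketch runs anywhere
dumps = load_from_spec("json:dumps")
print(dumps({"class": 0}))  # {"class": 0}
```

One benefit of addressing objects this way is that the caller only needs the installed package name, not a file path.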

## Validating the Inference Backend

Now that the backend is defined, let's install the package, then load the backend and validate that it runs as expected:

In [8]:
%pip install ./sklearn_classifier --quiet

Note: you may need to restart the kernel to use updated packages.


## IMPORTANT
You will likely need to restart the kernel in this notebook to proceed with loading and running the inference backend!

In [9]:
from packflow.loaders import ModuleLoader

# Import from the installed Poetry package
# We want to import the `backend` object from the `inference` module
# we will also pass a relative
backend = ModuleLoader("sklearn_classifier.inference:backend").load()

backend

2026-01-21 14:16:08.037 | DEBUG    | packflow.utils.normalize.base:_import_module:30 - TorchScalarHandler Type Converter is not available. Reason: No module named 'torch'
2026-01-21 14:16:08.038 | DEBUG    | packflow.utils.normalize.base:_import_module:30 - TorchTensorHandler Type Converter is not available. Reason: No module named 'torch'
2026-01-21 14:16:08.039 | DEBUG    | packflow.utils.normalize.base:_import_module:30 - PillowImageHandler Type Converter is not available. Reason: No module named 'PIL'
2026-01-21 14:16:08.059 | DEBUG    | packflow.backend.configuration:load_backend_configuration:63 - Loaded raw configuration: {'input_format': 'numpy'}
2026-01-21 14:16:08.060 | INFO     | packflow.backend.configuration

Backend[
  SklearnClassifierConfig(verbose=True, input_format=<InputFormats.NUMPY: 'numpy'>, rename_fields={}, feature_names=['sepal length (cm)', 'sepal width (cm)', 'petal length (cm)', 'petal width (cm)'], flatten_nested_inputs=False, flatten_lists=False, nested_field_delimiter='.', serialized_model_path=PosixPath('/Users/cdao-user/.pyenv/versions/3.11.14/envs/packflow/lib/python3.11/site-packages/sklearn_classifier/model.joblib'))
]
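
The `serialized_model_path` in the output above points into `site-packages` because the default is computed relative to the installed `inference.py` file rather than the notebook's working directory. A small sketch of that resolution follows; `packaged_model_path` and the example path are hypothetical and used only for illustration.

```python
from pathlib import Path


def packaged_model_path(module_file, filename="model.joblib"):
    """Return the absolute path of `filename` in the same directory as `module_file`."""
    return Path(module_file).resolve().parent.joinpath(filename)


# Inside a real module you would pass __file__; a made-up path is used here
path = packaged_model_path("/tmp/site-packages/sklearn_classifier/inference.py")
print(path.parent.name, path.name)  # sklearn_classifier model.joblib
```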

In [10]:
from sklearn.datasets import load_iris

X, _ = load_iris(return_X_y=True, as_frame=True)

outputs = backend.validate(X.sample(10).to_dict("records"))

outputs[:5]

2026-01-21 14:16:08.071 | INFO     | packflow.backend.base:__call__:86 - ExecutionMetrics(batch_size=10, execution_times=ExecutionTimes(preprocess=0.01938, transform_inputs=0.09275, execute=0.60546, transform_outputs=0.02338), total_execution_time=0.74097)


[{'class': 0}, {'class': 0}, {'class': 1}, {'class': 0}, {'class': 1}]
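
The `ExecutionMetrics` log line above reports a separate time for each stage of the backend. As a rough sketch of how per-stage timing of this kind could be collected, the following runs a list of stand-in stages in order and records the time spent in each. This is not Packflow's implementation: `run_timed` is a hypothetical helper, and the toy "model" is a single threshold on petal length.

```python
import time


def run_timed(stages, data):
    """Run (name, function) stages in order and record seconds spent in each."""
    timings = {}
    for name, stage in stages:
        start = time.perf_counter()
        data = stage(data)
        timings[name] = time.perf_counter() - start
    return data, timings


# Stand-in stages: the fake "model" predicts class 1 when petal length > 2.5
stages = [
    ("transform_inputs", lambda rows: [list(row) for row in rows]),
    ("execute", lambda rows: [int(row[2] > 2.5) for row in rows]),
    ("transform_outputs", lambda preds: [{"class": p} for p in preds]),
]

outputs, timings = run_timed(stages, [(5.1, 3.7, 1.5, 0.4), (6.4, 2.7, 5.3, 1.9)])
print(outputs)  # [{'class': 0}, {'class': 1}]
print(list(timings))  # ['transform_inputs', 'execute', 'transform_outputs']
```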

## Conclusion

In this notebook, we walked through a simple example of creating an inference backend for a scikit-learn classifier.

Try extending this example to support custom output logic or different model types!
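
As one possible starting point for custom output logic, the sketch below adds a human-readable species name to each output record. The class order matches the `target_names` of scikit-learn's Iris dataset; the helper `add_class_names` and the `label` key are hypothetical choices made for this illustration.

```python
# Class index -> species name, in the order used by scikit-learn's Iris dataset
IRIS_CLASS_NAMES = ("setosa", "versicolor", "virginica")


def add_class_names(outputs, class_names=IRIS_CLASS_NAMES):
    """Return copies of the output records with an added 'label' field."""
    named = []
    for record in outputs:
        index = int(record["class"])
        if not 0 <= index < len(class_names):
            raise ValueError(f"unknown class index: {index}")
        named.append({**record, "label": class_names[index]})
    return named


print(add_class_names([{"class": 0}, {"class": 1}]))
# [{'class': 0, 'label': 'setosa'}, {'class': 1, 'label': 'versicolor'}]
```

Logic like this could live in the backend's `transform_outputs` method, so that callers receive labels without needing to know the class encoding.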