# Example usage

## The Journey to Building the Best Classifier: A Regular Story

It was an overcast Wednesday morning when Alex decided to tackle a classification challenge. Armed with the `sklearn` datasets and a toolbox of custom Python functions from the `classifierpromax` package, Alex set out to build the best possible model to classify data efficiently and effectively.

```python
import pandas as pd
import numpy as np
from classifierpromax.ClassifierTrainer import ClassifierTrainer
from classifierpromax.ClassifierOptimizer import ClassifierOptimizer
from classifierpromax.FeatureSelector import FeatureSelector
from classifierpromax.ResultHandler import ResultHandler
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report
from sklearn import datasets
```

## Loading the Dataset
Alex started by loading a familiar dataset from `sklearn`. For this project, the Iris dataset served as the training ground: 150 flowers, each described by four measurements (sepal and petal length and width, in cm) and labeled as one of three iris species.

```python
# Load the dataset
iris = datasets.load_iris()
X, y = pd.DataFrame(iris.data, columns=iris.feature_names), pd.Series(iris.target)

# Split the data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
pd.concat([X_train, y_train], axis=1)
```

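| | sepal length (cm) | sepal width (cm) | petal length (cm) | petal width (cm) | 0 |
|---|---|---|---|---|---|
| 81 | 5.5 | 2.4 | 3.7 | 1.0 | 1 |
| 133 | 6.3 | 2.8 | 5.1 | 1.5 | 2 |
| 137 | 6.4 | 3.1 | 5.5 | 1.8 | 2 |
| 75 | 6.6 | 3.0 | 4.4 | 1.4 | 1 |
| 109 | 7.2 | 3.6 | 6.1 | 2.5 | 2 |
| ... | ... | ... | ... | ... | ... |
| 71 | 6.1 | 2.8 | 4.0 | 1.3 | 1 |
| 106 | 4.9 | 2.5 | 4.5 | 1.7 | 2 |
| 14 | 5.8 | 4.0 | 1.2 | 0.2 | 0 |
| 92 | 5.8 | 2.6 | 4.0 | 1.2 | 1 |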


With the data ready, Alex built a preprocessing pipeline that standardizes every input feature to zero mean and unit variance.

```python
# Preprocessing pipeline
preprocessor = make_pipeline(StandardScaler())
```
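`StandardScaler` rescales each column to z = (x - mean) / std, using the population standard deviation. The sketch below checks that on a small toy array; the array values are illustrative only and are not taken from the Iris data.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Toy feature matrix (illustrative values, not from the Iris data)
X = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0], [4.0, 40.0]])

preprocessor = make_pipeline(StandardScaler())
X_scaled = preprocessor.fit_transform(X)

# Manual z-scores: StandardScaler divides by the population std (ddof=0)
X_manual = (X - X.mean(axis=0)) / X.std(axis=0)

print(np.allclose(X_scaled, X_manual))  # True
print(X_scaled.mean(axis=0), X_scaled.std(axis=0))
```

Keeping the scaler inside a pipeline means that, once a classifier is chained after it, cross-validation learns the scaling statistics from the training folds only.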

## Training the Initial Models
The first step in Alex’s workflow was training baseline models with the `ClassifierTrainer` function. It fit several models (a dummy baseline, Logistic Regression, SVC, and Random Forest) and evaluated each with cross-validation, reporting the mean and standard deviation of fit and score times along with accuracy, precision, recall, and F1 on both the training and validation folds.

```python
trained_models, initial_scores = ClassifierTrainer(preprocessor, X_train, y_train, seed=42)
# Combine the per-model score tables side by side
pd.concat(initial_scores, axis=1)
```

| metric | dummy mean | dummy std | logreg mean | logreg std | svc mean | svc std | random_forest mean | random_forest std |
|---|---|---|---|---|---|---|---|---|
| fit_time | 0.001092 | 0.000358 | 0.001765 | 0.000331 | 0.001121 | 3.7e-05 | 0.039312 | 0.000288 |
| score_time | 0.002533 | 0.000309 | 0.00245 | 1.1e-05 | 0.002483 | 3.5e-05 | 0.004324 | 6.3e-05 |
| test_accuracy | 0.333333 | 0.0 | 0.942857 | 0.039841 | 0.942857 | 0.039841 | 0.942857 | 0.039841 |
| train_accuracy | 0.357143 | 0.0 | 0.954762 | 0.005324 | 0.97381 | 0.00996 | 1.0 | 0.0 |
| test_precision | 0.111111 | 0.0 | 0.949339 | 0.036236 | 0.953148 | 0.031001 | 0.949339 | 0.036236 |
| train_precision | 0.127551 | 0.0 | 0.955993 | 0.006492 | 0.97521 | 0.009469 | 1.0 | 0.0 |
| test_recall | 0.333333 | 0.0 | 0.942857 | 0.039841 | 0.942857 | 0.039841 | 0.942857 | 0.039841 |
| train_recall | 0.357143 | 0.0 | 0.954762 | 0.005324 | 0.97381 | 0.00996 | 1.0 | 0.0 |
| test_f1 | 0.166667 | 0.0 | 0.942552 | 0.040005 | 0.942023 | 0.040642 | 0.942552 | 0.040005 |
| train_f1 | 0.18797 | 0.0 | 0.954726 | 0.005275 | 0.973775 | 0.009977 | 1.0 | 0.0 |


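As Alex examined the results, they noticed that Logistic Regression, SVC, and Random Forest all reached the same mean validation accuracy (about 0.943), far above the dummy baseline's 0.333. Random Forest, however, scored a perfect 1.0 on every training metric, a sign of overfitting that feature selection and hyperparameter tuning could likely reduce.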
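The score table above has the shape of scikit-learn's `cross_validate` output (with `return_train_score=True`) aggregated to a mean and standard deviation per metric. The sketch below reproduces that shape with a hypothetical `train_baselines` helper; it is not the `classifierpromax` implementation. The model list mirrors the table's columns, while the 5-fold split and the weighted-average precision, recall, and F1 scorers are assumptions.

```python
import pandas as pd
from sklearn import datasets
from sklearn.base import clone
from sklearn.dummy import DummyClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_validate, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Assumed scorers: weighted averages, since Iris has three classes
SCORING = {
    "accuracy": "accuracy",
    "precision": "precision_weighted",
    "recall": "recall_weighted",
    "f1": "f1_weighted",
}


def train_baselines(preprocessor, X, y, seed=42):
    """Hypothetical stand-in for ClassifierTrainer (not the package's code)."""
    models = {
        "dummy": DummyClassifier(),
        "logreg": LogisticRegression(max_iter=1000, random_state=seed),
        "svc": SVC(random_state=seed),
        "random_forest": RandomForestClassifier(random_state=seed),
    }
    fitted, scores = {}, {}
    for name, clf in models.items():
        # Clone the preprocessor so each pipeline owns its own scaler
        pipe = make_pipeline(clone(preprocessor), clf)
        cv = cross_validate(pipe, X, y, cv=5, scoring=SCORING, return_train_score=True)
        # Rows = metrics (fit_time, test_f1, ...), columns = mean and std over folds
        scores[name] = pd.DataFrame(cv).agg(["mean", "std"]).T
        # Refit on all training data so the returned model is ready to use
        fitted[name] = pipe.fit(X, y)
    return fitted, scores


iris = datasets.load_iris(as_frame=True)
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.3, random_state=42
)
fitted, scores = train_baselines(make_pipeline(StandardScaler()), X_train, y_train)
print(pd.concat(scores, axis=1).round(3))
```

Passing the dict of per-model frames to `pd.concat(..., axis=1)` yields the two-level (model, mean/std) column layout seen in the table. The dummy model may trigger an `UndefinedMetricWarning`, because it never predicts two of the three classes.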

## Selecting Features for Simplicity
With a sense that the dataset contained redundant features, Alex decided to use the `FeatureSelector` function to refine the models further. They opted for **Recursive Feature Elimination (RFE)** to identify and select the most informative features.

In [5]:
feature_selected_models = FeatureSelector(
    preprocessor, trained_models, X_train, y_train, method='RFE', n_features_to_select=2
)

Now equipped with models that only used the top two features, Alex could already see improvements in simplicity and interpretability. But there was one final step to maximize the models’ potential.
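How `FeatureSelector` applies RFE internally is not shown, so the following is a sketch using scikit-learn's own `RFE` class. RFE fits an estimator, drops the feature with the smallest importance (for a linear model, the size of its coefficients), and repeats until `n_features_to_select` features remain. Using `LogisticRegression` as the ranking estimator is an assumption.

```python
from sklearn import datasets
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

iris = datasets.load_iris(as_frame=True)
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.3, random_state=42
)

# Assumed ranking estimator; step=1 removes one feature per iteration
selector = RFE(LogisticRegression(max_iter=1000), n_features_to_select=2, step=1)
pipe = make_pipeline(StandardScaler(), selector)
pipe.fit(X_train, y_train)

# support_ marks kept features; ranking_ is 1 for kept, higher for earlier drops
kept = X_train.columns[selector.support_]
print("Kept features:", list(kept))
print("Ranking:", dict(zip(X_train.columns, selector.ranking_)))
```

Because RFE is a transformer, the fitted pipeline can reduce new data to the two kept columns with `pipe.transform(X_test)`.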

## Optimizing Model Hyperparameters
The models were performing well, but Alex wanted them to be great. Enter the `ClassifierOptimizer` function. This function searched for better hyperparameters using `RandomizedSearchCV`, which samples `n_iter` candidate settings for each model (50 here), scores each candidate by cross-validation on the chosen metric, and keeps the best-scoring one.

```python
optimized_models, optimized_scores = ClassifierOptimizer(
    feature_selected_models, X_train, y_train, scoring='f1', n_iter=50, random_state=42
)
# Combine the per-model score tables side by side
pd.concat(optimized_scores, axis=1)
```

```
Training logreg...
Training svc...
Training random_forest...
```


| metric | logreg mean | logreg std | svc mean | svc std | random_forest mean | random_forest std |
|---|---|---|---|---|---|---|
| fit_time | 0.003958 | 0.000129 | 0.002467 | 7e-05 | 0.130747 | 0.000622 |
| score_time | 0.002571 | 9.8e-05 | 0.002607 | 6.9e-05 | 0.003194 | 5.3e-05 |
| test_accuracy | 0.971429 | 0.042592 | 0.952381 | 0.033672 | 0.952381 | 0.047619 |
| train_accuracy | 0.961905 | 0.013041 | 0.940476 | 0.008418 | 0.990476 | 0.005324 |
| test_precision | 0.97672 | 0.033796 | 0.959921 | 0.026337 | 0.957804 | 0.042985 |
| train_precision | 0.965881 | 0.010422 | 0.944519 | 0.009479 | 0.990789 | 0.005149 |
| test_recall | 0.971429 | 0.042592 | 0.952381 | 0.033672 | 0.952381 | 0.047619 |
| train_recall | 0.961905 | 0.013041 | 0.940476 | 0.008418 | 0.990476 | 0.005324 |
| test_f1 | 0.971172 | 0.042973 | 0.95199 | 0.033972 | 0.952162 | 0.047833 |
| train_f1 | 0.961743 | 0.013238 | 0.940283 | 0.00842 | 0.990474 | 0.005325 |


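When the search finished, the function returned the optimized models and their cross-validated metrics. With two features and tuned hyperparameters, every model's mean validation F1 improved: Logistic Regression led with 0.971, while SVC and Random Forest both reached about 0.952. Random Forest's training F1 also dropped from 1.000 to 0.990, narrowing the gap between its training and validation scores.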
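The random search step can be sketched with scikit-learn directly. The search space below is hypothetical, and `n_iter` is reduced to 10 to keep the example fast. The sketch scores with `f1_weighted`, because scikit-learn's plain `'f1'` scorer assumes a binary target; how `ClassifierOptimizer` handles `scoring='f1'` on the three-class Iris data is not shown in this example.

```python
from scipy.stats import randint
from sklearn import datasets
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

iris = datasets.load_iris(as_frame=True)
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.3, random_state=42
)

pipe = make_pipeline(StandardScaler(), RandomForestClassifier(random_state=42))

# Hypothetical search space; keys are "<step name>__<parameter>"
param_distributions = {
    "randomforestclassifier__n_estimators": randint(10, 100),
    "randomforestclassifier__max_depth": randint(2, 10),
}

# Sample 10 candidates, score each with 5-fold CV, refit the best on all data
search = RandomizedSearchCV(
    pipe,
    param_distributions,
    n_iter=10,
    scoring="f1_weighted",
    cv=5,
    random_state=42,
)
search.fit(X_train, y_train)

print(search.best_params_)
print(round(search.best_score_, 3))
```

Random search does not guarantee the global optimum; it trades exhaustive coverage for a fixed budget of `n_iter` fits per fold.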


## Summarizing the Results
To present the results clearly, Alex planned to use the `ResultHandler` function to compile the performance metrics and hyperparameters into a tidy DataFrame. In this example the call is left commented out, so no output is shown.

```python
# results_df = ResultHandler(initial_scores, optimized_scores)
# print(results_df)
```
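Since the `ResultHandler` call is not run here, the following is only a sketch of one way to build such a summary with plain pandas. `summarize_scores` is a hypothetical helper, not the package's API; it assumes each score collection is a dict mapping model name to a DataFrame indexed by metric with `mean` and `std` columns, the shape the score tables above suggest. The example inputs are `test_f1` values copied from those tables.

```python
import pandas as pd


def summarize_scores(initial_scores, optimized_scores, metric="test_f1"):
    """Hypothetical helper: compare one metric before and after tuning."""
    rows = {}
    for stage, scores in (("baseline", initial_scores), ("optimized", optimized_scores)):
        for model, df in scores.items():
            # One (model, stage) row holding the metric's mean and std
            rows[(model, stage)] = df.loc[metric, ["mean", "std"]]
    # Tuple keys become a two-level index after transposing
    return pd.DataFrame(rows).T.sort_index()


initial = {
    "logreg": pd.DataFrame({"mean": [0.942552], "std": [0.040005]}, index=["test_f1"]),
    "random_forest": pd.DataFrame({"mean": [0.942552], "std": [0.040005]}, index=["test_f1"]),
}
optimized = {
    "logreg": pd.DataFrame({"mean": [0.971172], "std": [0.042973]}, index=["test_f1"]),
    "random_forest": pd.DataFrame({"mean": [0.952162], "std": [0.047833]}, index=["test_f1"]),
}

summary = summarize_scores(initial, optimized)
print(summary)
```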

## Testing the Final Model
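Alex then evaluated the optimized Random Forest on the 45-sample held-out test set. It classified every test sample correctly (precision, recall, and F1 of 1.00 for all three classes), a strong sign that it generalizes beyond the training data, although 45 samples is a small test set.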

```python
# Evaluate the optimized Random Forest on the held-out test set
final_model = optimized_models['random_forest']
y_pred = final_model.predict(X_test)

print(classification_report(y_test, y_pred))
```

```
              precision    recall  f1-score   support

           0       1.00      1.00      1.00        19
           1       1.00      1.00      1.00        13
           2       1.00      1.00      1.00        13

    accuracy                           1.00        45
   macro avg       1.00      1.00      1.00        45
weighted avg       1.00      1.00      1.00        45
```



## Reflection
By combining tools like `ClassifierTrainer`, `FeatureSelector`, `ClassifierOptimizer`, and `ResultHandler`, Alex had created an efficient, repeatable, and transparent machine learning pipeline. It was a productive day of problem-solving, leaving Alex eager to tackle their next project.