# Example usage

## The Journey to Building the Best Classifier: A Regular Story

It was an overcast Wednesday morning when Alex decided to tackle a classification challenge. Armed with the `sklearn` datasets and a toolbox of custom Python functions from the `classifierpromax` package, Alex set out to build the best possible model to classify data efficiently and effectively.

In [1]:
import pandas as pd
import numpy as np
from classifierpromax.ClassifierTrainer import ClassifierTrainer
from classifierpromax.ClassifierOptimizer import ClassifierOptimizer
from classifierpromax.FeatureSelector import FeatureSelector
from classifierpromax.ResultHandler import ResultHandler
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
from sklearn import datasets

## Loading the Dataset
Alex started by loading a familiar dataset from `sklearn`. For this project, the Iris dataset served as the training ground: a well-known dataset of 150 flowers from three iris species, each described by four measurements.

In [2]:
# Load the dataset
iris = datasets.load_iris()
X, y = pd.DataFrame(iris.data, columns=iris.feature_names), pd.Series(iris.target)

# Split the data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
pd.concat([X_train, y_train], axis=1)

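| | sepal length (cm) | sepal width (cm) | petal length (cm) | petal width (cm) | 0 |
| --- | --- | --- | --- | --- | --- |
| 22 | 4.6 | 3.6 | 1.0 | 0.2 | 0 |
| 15 | 5.7 | 4.4 | 1.5 | 0.4 | 0 |
| 65 | 6.7 | 3.1 | 4.4 | 1.4 | 1 |
| 11 | 4.8 | 3.4 | 1.6 | 0.2 | 0 |
| 42 | 4.4 | 3.2 | 1.3 | 0.2 | 0 |
| ... | ... | ... | ... | ... | ... |
| 71 | 6.1 | 2.8 | 4.0 | 1.3 | 1 |
| 106 | 4.9 | 2.5 | 4.5 | 1.7 | 2 |
| 14 | 5.8 | 4.0 | 1.2 | 0.2 | 0 |
| 92 | 5.8 | 2.6 | 4.0 | 1.2 | 1 |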


With the data ready, Alex built a preprocessing pipeline to ensure the input features were standardized.

In [3]:
# Preprocessing pipeline
preprocessor = make_pipeline(StandardScaler())
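
To make the effect of standardization concrete, here is a minimal sketch that stands apart from Alex's notebook. It assumes the same 80/20 split as above and uses only `sklearn` calls; the variable names (`X_train_scaled`, `train_means`, `train_stds`) are illustrative and not part of `classifierpromax`.

```python
import numpy as np
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = datasets.load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

preprocessor = make_pipeline(StandardScaler())

# Learn each feature's mean and standard deviation from the training split only
X_train_scaled = preprocessor.fit_transform(X_train)
# Reuse the training statistics on the test split so no test information leaks in
X_test_scaled = preprocessor.transform(X_test)

train_means = X_train_scaled.mean(axis=0)
train_stds = X_train_scaled.std(axis=0)
print(np.round(train_means, 6))
print(np.round(train_stds, 6))
```

After scaling, every training feature has mean 0 and standard deviation 1. The test features are shifted and scaled by the same training statistics, so their means and deviations will only be close to those values.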

## Training the Initial Models
The first step in Alex’s workflow was training baseline models with the `ClassifierTrainer` function. This function trained a dummy baseline alongside Logistic Regression, SVC, and Random Forest, and reported the mean and standard deviation of each cross-validation metric.

In [4]:
trained_models, initial_scores = ClassifierTrainer(preprocessor, X_train, y_train, seed=42)
pd.concat(initial_scores, axis=1) # might change to use Result handler

| | dummy mean | dummy std | logreg mean | logreg std | svc mean | svc std | random_forest mean | random_forest std |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| fit_time | 0.004301 | 0.001606887 | 0.011091 | 0.004068 | 0.004701 | 0.002998 | 0.166458 | 0.00463 |
| score_time | 0.012599 | 0.001685312 | 0.0117 | 0.003424 | 0.012587 | 0.002765 | 0.021542 | 0.003534 |
| test_accuracy | 0.333333 | 0.0 | 0.958333 | 0.051031 | 0.95 | 0.068465 | 0.941667 | 0.063191 |
| train_accuracy | 0.341667 | 0.004658475 | 0.970833 | 0.011411 | 0.975 | 0.011877 | 1.0 | 0.0 |
| test_precision | 0.111111 | 1.551584e-17 | 0.966911 | 0.037309 | 0.96287 | 0.045361 | 0.946574 | 0.05932 |
| train_precision | 0.116753 | 0.003154176 | 0.971805 | 0.011092 | 0.975868 | 0.011727 | 1.0 | 0.0 |
| test_recall | 0.333333 | 0.0 | 0.958333 | 0.051031 | 0.95 | 0.068465 | 0.941667 | 0.063191 |
| train_recall | 0.341667 | 0.004658475 | 0.970833 | 0.011411 | 0.975 | 0.011877 | 1.0 | 0.0 |
| test_f1 | 0.166667 | 0.0 | 0.957289 | 0.052895 | 0.947644 | 0.073191 | 0.940971 | 0.06427 |
| train_f1 | 0.174031 | 0.004116792 | 0.970821 | 0.01141 | 0.974996 | 0.011872 | 1.0 | 0.0 |


As Alex examined the results, they noticed that Random Forest showed promise, but its perfect training scores (1.0) against a test accuracy of about 0.94 hinted at overfitting, which feature selection and hyperparameter tuning could help reduce.
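
For readers wondering how a score table of this shape can be produced, the sketch below builds one with plain `sklearn` and `pandas`. It is not the `ClassifierTrainer` implementation, which the source does not show. The model settings, the five-fold split, and the weighted averaging of precision, recall, and F1 are all assumptions made for illustration.

```python
import pandas as pd
from sklearn import datasets
from sklearn.dummy import DummyClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import make_scorer, precision_score
from sklearn.model_selection import cross_validate, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = datasets.load_iris(return_X_y=True, as_frame=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Assumed model set; the keys mirror the column names in the table above
models = {
    "dummy": DummyClassifier(strategy="most_frequent"),
    "logreg": LogisticRegression(max_iter=1000, random_state=42),
    "svc": SVC(random_state=42),
    "random_forest": RandomForestClassifier(random_state=42),
}

# Assumed metrics: weighted averages, since Iris has three classes.
# zero_division=0 silences the warning for classes the dummy never predicts.
scoring = {
    "accuracy": "accuracy",
    "precision": make_scorer(precision_score, average="weighted", zero_division=0),
    "recall": "recall_weighted",
    "f1": "f1_weighted",
}

scores = {}
for name, model in models.items():
    pipe = make_pipeline(StandardScaler(), model)
    cv = cross_validate(
        pipe, X_train, y_train, cv=5, scoring=scoring, return_train_score=True
    )
    # One row per metric, with the mean and std taken across the five folds
    scores[name] = pd.DataFrame(cv).agg(["mean", "std"]).T

summary = pd.concat(scores, axis=1)
print(summary.round(3))
```

Comparing each `train_*` row with its `test_*` counterpart is what exposes a gap like the one Alex spotted for Random Forest.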

## Selecting Features for Simplicity
With a sense that the dataset contained redundant features, Alex decided to use the `FeatureSelector` function to refine the models further. They opted for **Recursive Feature Elimination (RFE)** to identify and select the most informative features.

In [5]:
feature_selected_models = FeatureSelector(
    preprocessor, trained_models, X_train, y_train, method='RFE', n_features_to_select=2
)

Now equipped with models that only used the top two features, Alex could already see improvements in simplicity and interpretability. But there was one final step to maximize the models’ potential.
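
Recursive Feature Elimination itself is available in `sklearn`, so the idea can be sketched without `classifierpromax`. The example below is an assumption about the general technique, not a description of what `FeatureSelector` does internally: it ranks features with a Logistic Regression, repeatedly drops the weakest one until two remain, and then fits a final classifier on those two.

```python
from sklearn import datasets
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = datasets.load_iris(return_X_y=True, as_frame=True)

pipe = make_pipeline(
    StandardScaler(),
    # RFE refits the estimator, removing the lowest-ranked feature each round
    RFE(LogisticRegression(max_iter=1000), n_features_to_select=2),
    # Final classifier sees only the two surviving features
    LogisticRegression(max_iter=1000),
)
pipe.fit(X, y)

# make_pipeline names each step after its lowercased class name
mask = pipe.named_steps["rfe"].support_
selected = list(X.columns[mask])
print(selected)
print(round(pipe.score(X, y), 3))
```

Placing the selector inside the pipeline means the feature ranking is recomputed on each training fold during cross-validation, which avoids leaking information from the validation folds.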

## Optimizing Model Hyperparameters
The models were performing well, but Alex wanted them to be great. Enter the `ClassifierOptimizer` function. This function searched for better hyperparameters using `RandomizedSearchCV`, sampling up to `n_iter` candidate configurations and keeping the best one found for each model.

In [7]:
optimized_models, optimized_scores = ClassifierOptimizer(
    feature_selected_models, X_train, y_train, scoring='f1', n_iter=50, random_state=42
)
pd.concat(optimized_scores, axis=1) # might change to use Result handler


Training logreg...

Training svc...

Training random_forest...


| | logreg mean | logreg std | svc mean | svc std | random_forest mean | random_forest std |
| --- | --- | --- | --- | --- | --- | --- |
| fit_time | 0.026346 | 0.004516 | 0.010971 | 0.002906 | 0.599469 | 0.009448 |
| score_time | 0.014132 | 0.004257 | 0.012767 | 0.004132 | 0.014775 | 0.000935 |
| test_accuracy | 0.975 | 0.055902 | 0.958333 | 0.051031 | 0.958333 | 0.072169 |
| train_accuracy | 0.96875 | 0.012758 | 0.958333 | 0.024429 | 0.989583 | 0.007366 |
| test_precision | 0.981818 | 0.040656 | 0.966911 | 0.037309 | 0.961481 | 0.067363 |
| train_precision | 0.970391 | 0.012431 | 0.958708 | 0.024365 | 0.989777 | 0.007368 |
| test_recall | 0.975 | 0.055902 | 0.958333 | 0.051031 | 0.958333 | 0.072169 |
| train_recall | 0.96875 | 0.012758 | 0.958333 | 0.024429 | 0.989583 | 0.007366 |
| test_f1 | 0.974089 | 0.057939 | 0.957289 | 0.052895 | 0.957772 | 0.073315 |
| train_f1 | 0.968732 | 0.012759 | 0.958338 | 0.024427 | 0.989584 | 0.007366 |


After a few minutes of computation, the function returned the optimized models and their performance metrics. Alex was pleased to see the Random Forest classifier’s test F1 score rise from 0.941 to 0.958, although Logistic Regression now had the best test F1 score among all models at 0.974.
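
The randomized search behind this step can be sketched directly with `sklearn`. The search space, the choice of Logistic Regression, and `n_iter=20` below are assumptions chosen to keep the example short. Note also that `sklearn`'s own `'f1'` scorer is defined for binary targets only, so this sketch uses `'f1_weighted'` for the three Iris classes. How `ClassifierOptimizer` interprets `scoring='f1'` is not shown in the source.

```python
from scipy.stats import loguniform
from sklearn import datasets
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import RandomizedSearchCV, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = datasets.load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

pipe = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))

# Assumed search space: regularization strength sampled on a log scale.
# The "logisticregression__" prefix targets that step of the pipeline.
param_distributions = {"logisticregression__C": loguniform(1e-3, 1e3)}

search = RandomizedSearchCV(
    pipe,
    param_distributions,
    n_iter=20,              # number of sampled configurations
    scoring="f1_weighted",  # multiclass-friendly F1
    cv=5,
    random_state=42,
)
search.fit(X_train, y_train)

print(search.best_params_)
print(round(search.best_score_, 3))
```

The reported `best_score_` is the mean cross-validated score of the best sampled configuration, not a guarantee that no better setting exists outside the samples drawn.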


## Summarizing the Results
To present the results clearly, Alex used the `ResultHandler` function to compile the performance metrics and hyperparameters into a tidy DataFrame.

In [None]:
# results_df = ResultHandler(initial_scores, optimized_scores)
# print(results_df)
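
Since the `ResultHandler` call above is commented out, here is a hedged sketch of how the two sets of scores could be lined up with `pandas` alone. `combine_scores` is a hypothetical helper, not part of `classifierpromax`, and it covers metrics only, not hyperparameters. The numbers are copied from the two score tables earlier in this example.

```python
import pandas as pd

rows = ["test_accuracy", "test_f1"]

# Values taken from the baseline score table
initial = {
    "logreg": pd.DataFrame(
        {"mean": [0.958333, 0.957289], "std": [0.051031, 0.052895]}, index=rows
    ),
    "random_forest": pd.DataFrame(
        {"mean": [0.941667, 0.940971], "std": [0.063191, 0.06427]}, index=rows
    ),
}
# Values taken from the optimized score table
optimized = {
    "logreg": pd.DataFrame(
        {"mean": [0.975, 0.974089], "std": [0.055902, 0.057939]}, index=rows
    ),
    "random_forest": pd.DataFrame(
        {"mean": [0.958333, 0.957772], "std": [0.072169, 0.073315]}, index=rows
    ),
}


def combine_scores(before, after):
    """Hypothetical helper: place baseline and optimized scores side by side."""
    combined = pd.concat(
        {
            "baseline": pd.concat(before, axis=1),
            "optimized": pd.concat(after, axis=1),
        },
        axis=1,
    )
    combined.columns.names = ["stage", "model", "stat"]
    return combined


results = combine_scores(initial, optimized)

# Mean test F1 per model, with one column per stage
f1 = results.xs("mean", axis=1, level="stat").loc["test_f1"].unstack("stage")
f1["gain"] = f1["optimized"] - f1["baseline"]
print(f1.round(4))
```

Laid out this way, the change in each model's score from tuning can be read across a single row.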

## Testing the Final Model
The optimized Random Forest model was then evaluated on the holdout dataset. Alex measured its performance and found that it classified all 30 held-out samples correctly, a sign that it generalized well beyond the training data.

In [20]:
# Testing the final Random Forest model
from sklearn.metrics import classification_report

final_model = optimized_models['random_forest']
y_pred = final_model.predict(X_test)

print(classification_report(y_test, y_pred))

              precision    recall  f1-score   support

           0       1.00      1.00      1.00        10
           1       1.00      1.00      1.00         9
           2       1.00      1.00      1.00        11

    accuracy                           1.00        30
   macro avg       1.00      1.00      1.00        30
weighted avg       1.00      1.00      1.00        30



## Reflection
By combining tools like `ClassifierTrainer`, `FeatureSelector`, `ClassifierOptimizer`, and `ResultHandler`, Alex had created an efficient, repeatable, and transparent machine learning pipeline. It was a productive day of problem-solving, leaving Alex eager to tackle their next project.