# Example usage

## The Journey to Building the Best Classifier: A Regular Story

It was an overcast Wednesday morning when Alex decided to tackle a classification challenge. Armed with the `sklearn` datasets and a toolbox of custom Python functions from the `classifierpromax` package, Alex set out to build the best possible model to classify data efficiently and effectively.

```python
import pandas as pd
import numpy as np
from classifierpromax.ClassifierTrainer import ClassifierTrainer
from classifierpromax.ClassifierOptimizer import ClassifierOptimizer
from classifierpromax.FeatureSelector import FeatureSelector
from classifierpromax.ResultHandler import ResultHandler
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report
from sklearn import datasets
```

## Loading the Dataset
Alex started by loading a familiar dataset from `sklearn`. For this project, the Iris dataset served as the training ground: a well-known dataset for classifying three species of iris flowers.

```python
# Load the dataset
iris = datasets.load_iris()
X, y = pd.DataFrame(iris.data, columns=iris.feature_names), pd.Series(iris.target)

# Split the data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.5, random_state=421)
pd.concat([X_train, y_train], axis=1)
```

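|  | sepal length (cm) | sepal width (cm) | petal length (cm) | petal width (cm) | 0 |
|---|---|---|---|---|---|
| 84 | 5.4 | 3.0 | 4.5 | 1.5 | 1 |
| 148 | 6.2 | 3.4 | 5.4 | 2.3 | 2 |
| 53 | 5.5 | 2.3 | 4.0 | 1.3 | 1 |
| 5 | 5.4 | 3.9 | 1.7 | 0.4 | 0 |
| 93 | 5.0 | 2.3 | 3.3 | 1.0 | 1 |
| ... | ... | ... | ... | ... | ... |
| 71 | 6.1 | 2.8 | 4.0 | 1.3 | 1 |
| 106 | 4.9 | 2.5 | 4.5 | 1.7 | 2 |
| 14 | 5.8 | 4.0 | 1.2 | 0.2 | 0 |
| 92 | 5.8 | 2.6 | 4.0 | 1.2 | 1 |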


With the data ready, Alex built a preprocessing pipeline to ensure the input features were standardized.

```python
# Preprocessing pipeline
preprocessor = make_pipeline(StandardScaler())
```
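
To make the effect of standardization concrete, here is a minimal sketch that uses only scikit-learn and NumPy. It fits the same kind of one-step pipeline on the full Iris feature matrix (an assumption made for brevity; the walkthrough itself fits on `X_train`) and checks that each column ends up with mean 0 and standard deviation 1.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Illustration only: scale the full Iris feature matrix.
X = load_iris().data

preprocessor = make_pipeline(StandardScaler())
X_scaled = preprocessor.fit_transform(X)

# StandardScaler subtracts each column's mean and divides by its
# population standard deviation (ddof=0), which matches NumPy's default.
col_means = X_scaled.mean(axis=0)
col_stds = X_scaled.std(axis=0)

print(np.round(col_means, 6))
print(np.round(col_stds, 6))
```

Standardization matters most for scale-sensitive models such as Logistic Regression and SVC; tree-based models like Random Forest are largely unaffected by it.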

## Training the Initial Models
The first step in Alex’s workflow was training baseline models using the `ClassifierTrainer` function. This function automatically trained several models (a dummy baseline, Logistic Regression, SVC, and Random Forest) and evaluated them using cross-validation metrics.

```python
trained_models, initial_scores = ClassifierTrainer(preprocessor, X_train, y_train, seed=421)
pd.concat(initial_scores, axis=1) # might change to use Result handler
```

| metric | dummy mean | dummy std | logreg mean | logreg std | svc mean | svc std | random_forest mean | random_forest std |
|---|---|---|---|---|---|---|---|---|
| fit_time | 0.001692 | 0.000566169 | 0.001818 | 0.000376 | 0.001029 | 7e-05 | 0.037255 | 0.000146 |
| score_time | 0.004134 | 0.001808231 | 0.002404 | 0.000139 | 0.002261 | 2.9e-05 | 0.004107 | 3.2e-05 |
| test_accuracy | 0.333333 | 0.0 | 0.92 | 0.07303 | 0.946667 | 0.07303 | 0.893333 | 0.076012 |
| train_accuracy | 0.366667 | 0.0 | 0.94 | 0.022361 | 0.98 | 0.007454 | 1.0 | 0.0 |
| test_precision | 0.111111 | 1.551584e-17 | 0.928444 | 0.071031 | 0.961905 | 0.052164 | 0.904952 | 0.072952 |
| train_precision | 0.134444 | 0.0 | 0.941105 | 0.022996 | 0.981148 | 0.006543 | 1.0 | 0.0 |
| test_recall | 0.333333 | 0.0 | 0.92 | 0.07303 | 0.946667 | 0.07303 | 0.893333 | 0.076012 |
| train_recall | 0.366667 | 0.0 | 0.94 | 0.022361 | 0.98 | 0.007454 | 1.0 | 0.0 |
| test_f1 | 0.166667 | 0.0 | 0.919519 | 0.072924 | 0.945778 | 0.074247 | 0.892222 | 0.076538 |
| train_f1 | 0.196748 | 0.0 | 0.939973 | 0.022325 | 0.979979 | 0.00748 | 1.0 | 0.0 |


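As Alex examined the results, they noticed that SVC had the highest cross-validated accuracy (0.947), while Random Forest fit the training folds perfectly (train accuracy 1.0) but scored lower on validation (0.893), a sign of overfitting. Random Forest still showed promise, and its performance could likely improve with feature selection and hyperparameter tuning.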
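
For readers who want to see what a baseline-training step like this might look like under the hood, the following is a rough sketch built only on scikit-learn's `cross_validate`. It is not the `classifierpromax` implementation: the model settings, the 5-fold split, and the weighted averaging of precision, recall, and F1 are all assumptions chosen to produce a table with the same layout as the one above.

```python
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.dummy import DummyClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import make_scorer, precision_score
from sklearn.model_selection import cross_validate, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True, as_frame=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.5, random_state=421
)

seed = 421
# Assumed model line-up, mirroring the column names in the table above.
models = {
    "dummy": DummyClassifier(strategy="most_frequent"),
    "logreg": LogisticRegression(max_iter=1000, random_state=seed),
    "svc": SVC(random_state=seed),
    "random_forest": RandomForestClassifier(random_state=seed),
}

# Assumed multiclass averaging; zero_division=0 silences the warning the
# dummy model would otherwise raise for classes it never predicts.
scoring = {
    "accuracy": "accuracy",
    "precision": make_scorer(precision_score, average="weighted", zero_division=0),
    "recall": "recall_weighted",
    "f1": "f1_weighted",
}

baseline_scores = {}
for name, model in models.items():
    pipe = make_pipeline(StandardScaler(), model)
    cv = cross_validate(
        pipe, X_train, y_train, cv=5, scoring=scoring, return_train_score=True
    )
    # One row per metric, with the mean and std across the five folds.
    baseline_scores[name] = pd.DataFrame(cv).agg(["mean", "std"]).T

summary = pd.concat(baseline_scores, axis=1)
print(summary.round(3))
```

Comparing `train_*` and `test_*` rows side by side, as in the table above, is what makes an overfitting model easy to spot.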

## Selecting Features for Simplicity
With a sense that the dataset contained redundant features, Alex decided to use the `FeatureSelector` function to refine the models further. They opted for **Recursive Feature Elimination (RFE)** to identify and select the most informative features.

```python
feature_selected_models = FeatureSelector(
    preprocessor, trained_models, X_train, y_train, method='RFE', n_features_to_select=2
)
```

Now equipped with models that only used the top two features, Alex could already see improvements in simplicity and interpretability. But there was one final step to maximize the models’ potential.
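
As an illustration of the RFE idea itself, here is a small sketch using scikit-learn's `RFE` directly. It is an assumption about how such a step could be wired into a pipeline, not a description of what `FeatureSelector` does internally. Logistic Regression is used as the ranking estimator because RFE needs a model that exposes `coef_` or `feature_importances_`.

```python
from sklearn.datasets import load_iris
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True, as_frame=True)
X_train, _, y_train, _ = train_test_split(X, y, test_size=0.5, random_state=421)

# Scale, keep the two highest-ranked features, then fit the final classifier.
pipe = make_pipeline(
    StandardScaler(),
    RFE(estimator=LogisticRegression(max_iter=1000), n_features_to_select=2),
    LogisticRegression(max_iter=1000),
)
pipe.fit(X_train, y_train)

# support_ is a boolean mask over the original columns.
selector = pipe.named_steps["rfe"]
selected = list(X_train.columns[selector.support_])
print(selected)
```

RFE repeatedly fits the ranking estimator and drops the weakest feature until only `n_features_to_select` remain, so the mask always contains exactly two `True` entries here.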

## Optimizing Model Hyperparameters
The models were performing well, but Alex wanted them to be great. Enter the `ClassifierOptimizer` function. This function searched for the best hyperparameters using `RandomizedSearchCV`, tuning each model to its optimal configuration.

```python
optimized_models, optimized_scores = ClassifierOptimizer(
    feature_selected_models, X_train, y_train, scoring='f1', n_iter=10, random_state=421
)
pd.concat(optimized_scores, axis=1) # might change to use Result handler
```


```
Training logreg...
Training svc...
Training random_forest...
```


| metric | logreg mean | logreg std | svc mean | svc std | random_forest mean | random_forest std |
|---|---|---|---|---|---|---|
| fit_time | 0.003884 | 0.000142 | 0.002529 | 0.000255 | 0.130282 | 0.002758 |
| score_time | 0.002527 | 7.3e-05 | 0.002502 | 5.5e-05 | 0.003265 | 0.000124 |
| test_accuracy | 0.933333 | 0.066667 | 0.946667 | 0.055777 | 0.92 | 0.055777 |
| train_accuracy | 0.94 | 0.014907 | 0.936667 | 0.018257 | 0.99 | 0.009129 |
| test_precision | 0.950476 | 0.047809 | 0.950794 | 0.054825 | 0.932063 | 0.04986 |
| train_precision | 0.948938 | 0.010463 | 0.938374 | 0.019488 | 0.990448 | 0.00872 |
| test_recall | 0.933333 | 0.066667 | 0.946667 | 0.055777 | 0.92 | 0.055777 |
| train_recall | 0.94 | 0.014907 | 0.936667 | 0.018257 | 0.99 | 0.009129 |
| test_f1 | 0.932217 | 0.067778 | 0.946304 | 0.055888 | 0.919421 | 0.056274 |
| train_f1 | 0.939554 | 0.015163 | 0.936582 | 0.018263 | 0.989998 | 0.00913 |


Once the search finished, the function returned the optimized models and their performance metrics. Alex was pleased to see that the Random Forest classifier's cross-validated F1 score had risen from 0.892 to 0.919, although SVC still had the highest cross-validated F1 score (0.946) among all models.
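
The randomized search described above can be sketched with scikit-learn alone. The search space below is purely illustrative, and the `f1_weighted` scorer is an assumption: plain `'f1'` in scikit-learn is a binary-classification scorer, so a multiclass problem like Iris needs an averaged variant when `RandomizedSearchCV` is called directly. How `ClassifierOptimizer` maps its `scoring` argument internally is not shown in this walkthrough.

```python
from scipy.stats import randint
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True, as_frame=True)
X_train, _, y_train, _ = train_test_split(X, y, test_size=0.5, random_state=421)

pipe = make_pipeline(StandardScaler(), RandomForestClassifier(random_state=421))

# Hypothetical search space; keys follow the <step name>__<parameter> pattern.
param_distributions = {
    "randomforestclassifier__n_estimators": randint(50, 201),
    "randomforestclassifier__max_depth": randint(2, 11),
    "randomforestclassifier__min_samples_leaf": randint(1, 6),
}

search = RandomizedSearchCV(
    pipe,
    param_distributions,
    n_iter=10,              # number of sampled configurations
    scoring="f1_weighted",  # assumed multiclass-friendly F1 variant
    cv=5,
    random_state=421,
)
search.fit(X_train, y_train)

print(search.best_params_)
print(round(search.best_score_, 3))
```

With `n_iter=10`, only ten configurations are sampled, which keeps the search cheap at the cost of possibly missing the best region of the space.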


## Summarizing the Results
To present the results clearly, Alex used the `ResultHandler` function to compile the performance metrics and hyperparameters into a tidy DataFrame.

```python
# results_df = ResultHandler(initial_scores, optimized_scores)
# print(results_df)
```
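
Since the `ResultHandler` call is commented out here, the sketch below shows one way a before/after summary could be assembled with pandas. The `summarize` helper is hypothetical and is not the `ResultHandler` API; it also covers only metrics, not hyperparameters. It assumes each scores object is a dict mapping model names to DataFrames with `mean` and `std` columns, and it reuses the `test_f1` values from the two tables above.

```python
import pandas as pd

# test_f1 rows copied from the baseline and optimized tables above.
initial_scores = {
    "logreg": pd.DataFrame({"mean": [0.919519], "std": [0.072924]}, index=["test_f1"]),
    "svc": pd.DataFrame({"mean": [0.945778], "std": [0.074247]}, index=["test_f1"]),
    "random_forest": pd.DataFrame({"mean": [0.892222], "std": [0.076538]}, index=["test_f1"]),
}
optimized_scores = {
    "logreg": pd.DataFrame({"mean": [0.932217], "std": [0.067778]}, index=["test_f1"]),
    "svc": pd.DataFrame({"mean": [0.946304], "std": [0.055888]}, index=["test_f1"]),
    "random_forest": pd.DataFrame({"mean": [0.919421], "std": [0.056274]}, index=["test_f1"]),
}


def summarize(initial, optimized, metric="test_f1"):
    """Hypothetical helper: compare one metric's mean before and after tuning."""
    rows = {
        name: {
            "initial": initial[name].loc[metric, "mean"],
            "optimized": optimized[name].loc[metric, "mean"],
        }
        for name in optimized
        if name in initial
    }
    out = pd.DataFrame.from_dict(rows, orient="index")
    out["change"] = out["optimized"] - out["initial"]
    return out


results_df = summarize(initial_scores, optimized_scores)
print(results_df.round(3))
```

A table like this puts the gain from tuning next to each model's starting point, which makes the comparison between models easier to read than two separate score tables.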

## Testing the Final Model
The optimized Random Forest model was then evaluated on the held-out test set. Alex measured its performance and confirmed that it generalized well beyond the training data, classifying all 75 test samples correctly.

```python
# Testing the final Random Forest model
final_model = optimized_models['random_forest']
y_pred = final_model.predict(X_test)

print(classification_report(y_test, y_pred))
```

```
              precision    recall  f1-score   support

           0       1.00      1.00      1.00        29
           1       1.00      1.00      1.00        23
           2       1.00      1.00      1.00        23

    accuracy                           1.00        75
   macro avg       1.00      1.00      1.00        75
weighted avg       1.00      1.00      1.00        75
```



## Reflection
By combining tools like `ClassifierTrainer`, `FeatureSelector`, `ClassifierOptimizer`, and `ResultHandler`, Alex had created an efficient, repeatable, and transparent machine learning pipeline. It was a productive day of problem-solving, leaving Alex eager to tackle their next project.