# Introduction
This is a simple notebook exploring Poniard (https://github.com/rxavier/poniard), a Python-based ML library that provides a one-stop platform for comparing multiple models on classification and regression problems.

Installation

In [1]:
#!pip install poniard

For exploration purposes we use a toy dataset bundled with scikit-learn (the breast cancer dataset), but any other dataset can be used instead.
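A quick look at the raw data before handing it to Poniard helps when interpreting the results later, especially the `DummyClassifier` baseline. This minimal sketch uses only scikit-learn and NumPy (not Poniard) to report the dataset shape and class balance; the variable names are illustrative.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer

# Load the same toy dataset as plain NumPy arrays
X_np, y_np = load_breast_cancer(return_X_y=True)

n_samples, n_features = X_np.shape
class_counts = np.bincount(y_np)  # index 0 = malignant, index 1 = benign

print(f"{n_samples} samples, {n_features} features")
print("class counts:", class_counts.tolist())
print("majority-class share:", round(class_counts.max() / n_samples, 4))
```

The classes are moderately imbalanced, with the benign class making up about 63% of the samples.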

In [2]:
from sklearn.datasets import load_breast_cancer
from poniard import PoniardClassifier

X, y = load_breast_cancer(return_X_y=True, as_frame=True)
pclf = PoniardClassifier(random_state=0).setup(X, y)

Main metric: roc_auc
Minimum unique values to consider a number feature numeric: 56
Minimum unique values to consider a non-number feature high cardinality: 20

Inferred feature types:
                    numeric categorical_high categorical_low datetime
0               mean radius                                          
1              mean texture                                          
2            mean perimeter                                          
3                 mean area                                          
4           mean smoothness                                          
5          mean compactness                                          
6            mean concavity                                          
7       mean concave points                                          
8             mean symmetry                                          
9    mean fractal dimension                                          
10             radius error                  
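The value 56 in the log is consistent with `numeric_threshold=0.1` (shown in the estimator repr printed after fitting) being applied as a fraction of the 569 rows and truncated to an integer. This is an inference from the printed numbers rather than a reading of Poniard's source, and both helper names below are hypothetical.

```python
def min_unique_for_numeric(n_rows: int, numeric_threshold: float) -> int:
    """Hypothetical helper: threshold as a fraction of rows, truncated to an int (assumed rule)."""
    return int(n_rows * numeric_threshold)


def looks_numeric(n_unique: int, n_rows: int, numeric_threshold: float = 0.1) -> bool:
    """Hypothetical helper: a number-typed column with enough distinct values is treated as numeric."""
    return n_unique >= min_unique_for_numeric(n_rows, numeric_threshold)


n_rows = 569  # rows in the breast cancer dataset
print(min_unique_for_numeric(n_rows, 0.1))  # 56, matching the log above
print(looks_numeric(3, n_rows), looks_numeric(400, n_rows))
```

Under this assumed rule, a number-typed column with only a handful of distinct values would be treated as categorical rather than numeric.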

Before going further, we can take a glimpse at the default preprocessing pipeline that the library builds:

In [3]:
pclf.preprocessor_
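The estimator repr printed after fitting lists `scaler=standard` and `numeric_imputer=simple`. For this all-numeric dataset, a rough scikit-learn equivalent is an imputer followed by a standard scaler. This is a hedged sketch, not Poniard's exact pipeline (which also handles categorical and datetime columns), and the `mean` imputation strategy is an assumption.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X_np, _ = load_breast_cancer(return_X_y=True)

# Approximation of the numeric branch: impute missing values, then standardize
numeric_preprocessor = make_pipeline(
    SimpleImputer(strategy="mean"),  # assumed strategy
    StandardScaler(),
)
X_scaled = numeric_preprocessor.fit_transform(X_np)

# After standard scaling, each column has mean ~0 and standard deviation ~1
print(np.round(X_scaled.mean(axis=0)[:3], 6))
print(np.round(X_scaled.std(axis=0)[:3], 6))
```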

Model training configuration (the cross-validation strategy):

In [4]:
pclf.cv_

StratifiedKFold(n_splits=5, random_state=0, shuffle=True)
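`StratifiedKFold` keeps the class proportions roughly equal in every fold, which matters here because the classes are imbalanced. The sketch below rebuilds the same splitter with plain scikit-learn and prints the share of the benign class in each test fold.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import StratifiedKFold

X_np, y_np = load_breast_cancer(return_X_y=True)

# Same configuration as the pclf.cv_ repr above
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

fold_shares = []
for fold, (train_idx, test_idx) in enumerate(cv.split(X_np, y_np)):
    share = y_np[test_idx].mean()  # fraction of class 1 (benign) in this test fold
    fold_shares.append(share)
    print(f"fold {fold}: {len(test_idx)} test rows, benign share = {share:.3f}")
```

Every fold ends up with close to the overall benign share of 357/569, so each test fold is representative of the full dataset.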

In [5]:
pclf.metrics_

['roc_auc', 'accuracy', 'precision', 'recall', 'f1']
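These metric names are scikit-learn scorer strings, so they can be resolved with `sklearn.metrics.get_scorer`. The sketch below resolves each name and evaluates it on a simple fitted model; the scaled `LogisticRegression` pipeline is an illustrative choice, not what Poniard runs internally.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import get_scorer
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

metric_names = ["roc_auc", "accuracy", "precision", "recall", "f1"]

# Each string maps to a scorer callable with signature scorer(estimator, X, y)
scorers = {name: get_scorer(name) for name in metric_names}

X_np, y_np = load_breast_cancer(return_X_y=True)
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)).fit(X_np, y_np)

# Scores on the training data itself, for illustration only (optimistic)
train_scores = {name: scorer(model, X_np, y_np) for name, scorer in scorers.items()}
print({name: round(score, 4) for name, score in train_scores.items()})
```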

Models considered by default:

In [6]:
pclf.estimators_.keys()

dict_keys(['LogisticRegression', 'GaussianNB', 'LinearSVC', 'KNeighborsClassifier', 'DecisionTreeClassifier', 'RandomForestClassifier', 'HistGradientBoostingClassifier', 'XGBClassifier', 'DummyClassifier'])

Having inspected the preprocessing, the cross-validation strategy, the metrics and the candidate models, we can now fit all of the models:

In [7]:
pclf.fit()

Completed: 100%|██████████| 9/9 [00:11<00:00,  1.22s/it]                     


PoniardClassifier(estimators=None, metrics=None,
    preprocess=True, scaler=standard, numeric_imputer=simple,
    custom_preprocessor=None, numeric_threshold=0.1,
    cardinality_threshold=20, cv=None, verbose=0,
    random_state=0, n_jobs=None, plugins=None,
    plot_options=PoniardPlotFactory())
            

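`show_results()` reports the cross-validated scores of each model (averaged over the folds) on the train and test folds, together with the fit and score times, sorted by the main metric: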
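The column names in the results table (`test_roc_auc`, `train_roc_auc`, ..., `fit_time`, `score_time`) follow the keys returned by scikit-learn's `cross_validate` with `return_train_score=True`. Below is a hedged sketch of reproducing the `LogisticRegression` row by hand, assuming Poniard's default preprocessing amounts to imputation plus standard scaling for this dataset; the averaged scores should be close to the table but are not guaranteed to match it exactly.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_validate
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X_np, y_np = load_breast_cancer(return_X_y=True)

# Assumed approximation of Poniard's pipeline for this model
model = make_pipeline(SimpleImputer(), StandardScaler(), LogisticRegression(random_state=0))
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

results = cross_validate(
    model,
    X_np,
    y_np,
    cv=cv,
    scoring=["roc_auc", "accuracy", "precision", "recall", "f1"],
    return_train_score=True,
)

# Average each metric across the 5 folds to get one summary row
row = {key: float(np.mean(values)) for key, values in results.items()}
print(round(row["test_roc_auc"], 4), round(row["train_roc_auc"], 4))
```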

In [8]:
pclf.show_results()

| Model | test_roc_auc | train_roc_auc | test_accuracy | train_accuracy | test_precision | train_precision | test_recall | train_recall | test_f1 | train_f1 | fit_time (s) | score_time (s) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| LogisticRegression | 0.995456 | 0.997511 | 0.978916 | 0.988137 | 0.975411 | 0.98613 | 0.991549 | 0.995095 | 0.983351 | 0.990589 | 0.01666 | 0.013003 |
| HistGradientBoostingClassifier | 0.994128 | 1.0 | 0.970129 | 1.0 | 0.967263 | 1.0 | 0.985955 | 1.0 | 0.976433 | 1.0 | 0.979213 | 0.027605 |
| XGBClassifier | 0.994123 | 1.0 | 0.970129 | 1.0 | 0.967554 | 1.0 | 0.985915 | 1.0 | 0.976469 | 1.0 | 0.189375 | 0.027806 |
| LinearSVC | 0.992901 | 0.998985 | 0.968359 | 0.989895 | 0.974993 | 0.98751 | 0.974765 | 0.996496 | 0.974783 | 0.991982 | 0.01488 | 0.013422 |
| RandomForestClassifier | 0.992264 | 1.0 | 0.964881 | 1.0 | 0.964647 | 1.0 | 0.980282 | 1.0 | 0.972192 | 1.0 | 0.29234 | 0.042403 |
| GaussianNB | 0.98873 | 0.988861 | 0.9297 | 0.939369 | 0.940993 | 0.941821 | 0.949413 | 0.962883 | 0.9443 | 0.952219 | 0.008902 | 0.013409 |
| KNeighborsClassifier | 0.98061 | 0.998064 | 0.964881 | 0.978472 | 0.955018 | 0.97003 | 0.991628 | 0.996501 | 0.972746 | 0.983079 | 0.00889 | 0.131973 |
| DecisionTreeClassifier | 0.920983 | 1.0 | 0.926223 | 1.0 | 0.941672 | 1.0 | 0.94108 | 1.0 | 0.941054 | 1.0 | 0.019817 | 0.014049 |
| DummyClassifier | 0.5 | 0.5 | 0.627418 | 0.627417 | 0.627418 | 0.627417 | 1.0 | 1.0 | 0.771052 | 0.771058 | 0.012467 | 0.021321 |
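The `DummyClassifier` row can be checked by hand. Its numbers are consistent with a baseline that always predicts the majority (benign) class: accuracy and precision both equal the benign share 357/569, recall is 1.0, ROC AUC is 0.5 because every sample receives the same score, and F1 follows from precision and recall.

```python
# Majority-class baseline on the breast cancer data: 357 benign out of 569
benign, total = 357, 569

accuracy = benign / total   # every benign sample is classified correctly
precision = benign / total  # all samples are predicted positive (benign)
recall = 1.0                # no benign sample is missed
f1 = 2 * precision * recall / (precision + recall)

print(f"accuracy={accuracy:.6f} precision={precision:.6f} recall={recall} f1={f1:.6f}")
```

These match the `train_*` columns of the dummy row; the `test_*` columns differ slightly because they are averaged over folds of unequal size.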


# Building with custom settings

## Case 1: Using a custom set of estimators

In [9]:

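from sklearn.linear_model import LogisticRegression
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from poniard import PoniardClassifier

# Compare only these named estimators (a DummyClassifier baseline is still added)
pclf = PoniardClassifier(
    random_state=0,
    estimators={
        "lr": LogisticRegression(max_iter=1000),
        # max_iter=100 is what triggers the ConvergenceWarning below;
        # scikit-learn >= 1.2 expects penalty=None instead of "none"
        "lr_no_penalty": LogisticRegression(max_iter=100, penalty="none"),
        "lda": LinearDiscriminantAnalysis(),
    },
)
pclf.setup(X, y)
pclf.fit()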


Main metric: roc_auc
Minimum unique values to consider a number feature numeric: 56
Minimum unique values to consider a non-number feature high cardinality: 20

Inferred feature types:
                    numeric categorical_high categorical_low datetime
0               mean radius                                          
1              mean texture                                          
2            mean perimeter                                          
3                 mean area                                          
4           mean smoothness                                          
5          mean compactness                                          
6            mean concavity                                          
7       mean concave points                                          
8             mean symmetry                                          
9    mean fractal dimension                                          
10             radius error                  

STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
Completed: 100%|██████████| 4/4 [00:01<00:00,  3.45it/s]      


PoniardClassifier(estimators={'lr': LogisticRegression(max_iter=1000, random_state=0), 'lr_no_penalty': LogisticRegression(penalty='none', random_state=0), 'lda': LinearDiscriminantAnalysis()}, metrics=None,
    preprocess=True, scaler=standard, numeric_imputer=simple,
    custom_preprocessor=None, numeric_threshold=0.1,
    cardinality_threshold=20, cv=None, verbose=0,
    random_state=0, n_jobs=None, plugins=None,
    plot_options=PoniardPlotFactory())
            

In [10]:
pclf.show_results()

| Model | test_roc_auc | train_roc_auc | test_accuracy | train_accuracy | test_precision | train_precision | test_recall | train_recall | test_f1 | train_f1 | fit_time (s) | score_time (s) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| lr | 0.995456 | 0.997511 | 0.978916 | 0.988137 | 0.975411 | 0.98613 | 0.991549 | 0.995095 | 0.983351 | 0.990589 | 0.022983 | 0.020219 |
| lda | 0.990223 | 0.996792 | 0.954308 | 0.965728 | 0.93689 | 0.951829 | 0.994405 | 0.995799 | 0.964749 | 0.973307 | 0.016805 | 0.016446 |
| lr_no_penalty | 0.986705 | 1.0 | 0.950769 | 0.99956 | 0.968989 | 1.0 | 0.952308 | 0.999298 | 0.960329 | 0.999649 | 0.025204 | 0.015374 |
| DummyClassifier | 0.5 | 0.5 | 0.627418 | 0.627417 | 0.627418 | 0.627417 | 1.0 | 1.0 | 0.771052 | 0.771058 | 0.009397 | 0.01579 |
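Comparing the train and test columns is a quick overfitting check. Using the mean ROC AUC values from the case 1 results, the unpenalized model shows the largest gap: it scores a perfect 1.0 on the training folds but trails the regularized `lr` on the test folds.

```python
# Mean ROC AUC values copied from the case 1 results table
roc_auc = {
    "lr": {"train": 0.997511, "test": 0.995456},
    "lda": {"train": 0.996792, "test": 0.990223},
    "lr_no_penalty": {"train": 1.0, "test": 0.986705},
    "DummyClassifier": {"train": 0.5, "test": 0.5},
}

# Train-minus-test gap per model; larger gaps suggest more overfitting
gaps = {name: s["train"] - s["test"] for name, s in roc_auc.items()}
for name, gap in sorted(gaps.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:16s} {gap:.4f}")
```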


## Case 2: Changing the evaluation metric

In [11]:

# Use log loss (negated, so greater is better) as the only metric
pclf = PoniardClassifier(
    metrics=["neg_log_loss"],
    estimators={
        "lr": LogisticRegression(max_iter=1000),
        "lr_no_penalty": LogisticRegression(max_iter=1000, penalty="none"),
        "lda": LinearDiscriminantAnalysis(),
    },
)
pclf.setup(X, y)
pclf.fit()


Main metric: neg_log_loss
Minimum unique values to consider a number feature numeric: 56
Minimum unique values to consider a non-number feature high cardinality: 20

Inferred feature types:
                    numeric categorical_high categorical_low datetime
0               mean radius                                          
1              mean texture                                          
2            mean perimeter                                          
3                 mean area                                          
4           mean smoothness                                          
5          mean compactness                                          
6            mean concavity                                          
7       mean concave points                                          
8             mean symmetry                                          
9    mean fractal dimension                                          
10             radius error             

Completed: 100%|██████████| 4/4 [00:00<00:00,  5.82it/s]      


PoniardClassifier(estimators={'lr': LogisticRegression(max_iter=1000, random_state=0), 'lr_no_penalty': LogisticRegression(max_iter=1000, penalty='none', random_state=0), 'lda': LinearDiscriminantAnalysis()}, metrics=['neg_log_loss'],
    preprocess=True, scaler=standard, numeric_imputer=simple,
    custom_preprocessor=None, numeric_threshold=0.1,
    cardinality_threshold=20, cv=None, verbose=0,
    random_state=0, n_jobs=None, plugins=None,
    plot_options=PoniardPlotFactory())
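scikit-learn scorers follow a greater-is-better convention, so log loss is exposed as `neg_log_loss`, the negated loss. That is why every value in the case 2 results is negative and why values closer to 0 rank higher. For binary labels $y_i$ and predicted positive-class probabilities $p_i$, log loss is $-\frac{1}{n}\sum_i \left[y_i \log p_i + (1-y_i)\log(1-p_i)\right]$. The pure-Python sketch below computes it and its negated form; the function names are illustrative and not scikit-learn's implementation.

```python
import math


def log_loss(y_true, p_pos, eps=1e-15):
    """Mean binary cross-entropy; p_pos holds the predicted P(y=1) for each sample."""
    total = 0.0
    for y, p in zip(y_true, p_pos):
        p = min(max(p, eps), 1 - eps)  # clip to avoid log(0)
        total += y * math.log(p) + (1 - y) * math.log(1 - p)
    return -total / len(y_true)


def neg_log_loss(y_true, p_pos):
    # Negated so that greater is better, matching scikit-learn's scorer convention
    return -log_loss(y_true, p_pos)


y_true = [1, 0, 1, 1]
confident = [0.9, 0.1, 0.8, 0.95]
uncertain = [0.6, 0.4, 0.55, 0.6]
print(neg_log_loss(y_true, confident), neg_log_loss(y_true, uncertain))
```

Correct, confident probabilities give a `neg_log_loss` closer to 0 than correct but hesitant ones.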
            

In [12]:
pclf.show_results()

| Model | test_neg_log_loss | train_neg_log_loss | fit_time (s) | score_time (s) |
|---|---|---|---|---|
| lr | -0.073786 | -0.05296873 | 0.01768 | 0.004923 |
| lda | -0.13966 | -0.08686903 | 0.015739 | 0.006414 |
| DummyClassifier | -0.660334 | -0.6603142 | 0.010395 | 0.008578 |
| lr_no_penalty | -1.655873 | -5.829453e-07 | 0.025737 | 0.006319 |
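The `DummyClassifier` score of about -0.660 can also be derived by hand. Assuming the dummy predicts the class prior (roughly 357/569 benign) as the probability for every sample, its log loss is the binary entropy of that prior. By contrast, `lr_no_penalty` scores far worse on the test folds (-1.66) than this baseline despite a near-zero training loss, which is consistent with an unregularized model producing overconfident probabilities.

```python
import math

# Prior of the benign (positive) class in the full dataset
p = 357 / 569

# Predicting the constant probability p gives log loss = binary entropy of the prior
entropy = -(p * math.log(p) + (1 - p) * math.log(1 - p))
print(f"log loss = {entropy:.4f}, neg_log_loss = {-entropy:.4f}")
```

The small difference from the table comes from the prior being estimated on each training split rather than on the full dataset.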
