In [11]:
import pandas as pd
from sklearn.preprocessing import MinMaxScaler
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, fbeta_score
from sklearn.linear_model import LogisticRegression, Perceptron
from sklearn.datasets import load_breast_cancer, load_iris

In [12]:
# import datasets and load them as pandas frames

iris_set = load_iris()
breast_cancer_set = load_breast_cancer()

iris_raw = pd.DataFrame(iris_set.data, columns=iris_set.feature_names)
breast_cancer_raw = pd.DataFrame(breast_cancer_set.data, columns=breast_cancer_set.feature_names)

breast_cancer_raw.head(5)

Unnamed: 0,mean radius,mean texture,mean perimeter,mean area,mean smoothness,mean compactness,mean concavity,mean concave points,mean symmetry,mean fractal dimension,...,worst radius,worst texture,worst perimeter,worst area,worst smoothness,worst compactness,worst concavity,worst concave points,worst symmetry,worst fractal dimension
0,17.99,10.38,122.8,1001.0,0.1184,0.2776,0.3001,0.1471,0.2419,0.07871,...,25.38,17.33,184.6,2019.0,0.1622,0.6656,0.7119,0.2654,0.4601,0.1189
1,20.57,17.77,132.9,1326.0,0.08474,0.07864,0.0869,0.07017,0.1812,0.05667,...,24.99,23.41,158.8,1956.0,0.1238,0.1866,0.2416,0.186,0.275,0.08902
2,19.69,21.25,130.0,1203.0,0.1096,0.1599,0.1974,0.1279,0.2069,0.05999,...,23.57,25.53,152.5,1709.0,0.1444,0.4245,0.4504,0.243,0.3613,0.08758
3,11.42,20.38,77.58,386.1,0.1425,0.2839,0.2414,0.1052,0.2597,0.09744,...,14.91,26.5,98.87,567.7,0.2098,0.8663,0.6869,0.2575,0.6638,0.173
4,20.29,14.34,135.1,1297.0,0.1003,0.1328,0.198,0.1043,0.1809,0.05883,...,22.54,16.67,152.2,1575.0,0.1374,0.205,0.4,0.1625,0.2364,0.07678


In [13]:
# See the targets also
iris_target = iris_set['target']
breast_cancer_target = breast_cancer_set['target']

### Quick exploration and rule setting

From `sklearn description` it appears that both dataset labels are fairly balanced (40-60 at worse) in the case of iris are perfectly balanced. 

So for the iris we will pick straight **accuracy** as our evaluation metric,since we don't have significant interest on putting more weight of importance on predicting a specific class better than the other.

For the breast cancer, surely we are more interested in true positives, and false negatives. So we are mostly interested on prediction positive cases. So recall would be our first priority here, but because we want to be careful and not classify also people with no cancer as positive, we will pick some kind of mean between recall and precision, with a weight towards recall, namely **f2_score**.

#### Scaling

Last but not least, after splitting the data (for both training and test set) we will scale numerical columns to values between 0 and 1. This scaled version of both datasets will be used only for the perceptron training. The logic behind this action is that although both are linear models, and each input will have it's corresponding co-efficient, when updating the weights in perceptron training after a miss-classified point, we want the update to be as smooth as it gets and not to depend on extreme values because a unit is bigger than the other. So we make the change a bit less sensitive (since the input of the new point to be added/removed will be scaled down).

#### Note

We could also avoid to scale down Iris since all of it's independent variables are in the same scale (cm).

In [15]:
# Let's find types of our dataframes columns

print(iris_raw.dtypes)
print(breast_cancer_raw.dtypes)

sepal length (cm)    float64
sepal width (cm)     float64
petal length (cm)    float64
petal width (cm)     float64
dtype: object
mean radius                float64
mean texture               float64
mean perimeter             float64
mean area                  float64
mean smoothness            float64
mean compactness           float64
mean concavity             float64
mean concave points        float64
mean symmetry              float64
mean fractal dimension     float64
radius error               float64
texture error              float64
perimeter error            float64
area error                 float64
smoothness error           float64
compactness error          float64
concavity error            float64
concave points error       float64
symmetry error             float64
fractal dimension error    float64
worst radius               float64
worst texture              float64
worst perimeter            float64
worst area                 float64
worst smoothness           flo

Since all of the columns are numerical and we don't deal with categorical variables or strings to hot encode, we will proceed with the train_test_split (and a random_state so we have the same always) and the both train and test set scales. So we have two different version for logistic regression and our perceptron as well.

In [18]:
# Split training and test sets

iris_X_train, iris_X_test, iris_Y_train, iris_Y_test = train_test_split(iris_raw, iris_target, test_size=0.35, random_state=323)
breast_cancer_X_train, breast_cancer_X_test, breast_cancer_Y_train, breast_cancer_Y_test = train_test_split(breast_cancer_raw, breast_cancer_target, test_size=0.35, random_state=313)

In [20]:
# Scale using minmax scaler for perceptron

min_max_scaler = MinMaxScaler()

breast_X_train_scaled = min_max_scaler.fit_transform(breast_cancer_X_train)
breast_X_test_scaled = min_max_scaler.fit_transform(breast_cancer_X_test)

## Let the training begin

We will start with the iris dataset, first logisitic regression and then the perceptron. To answer the question we will try to apply the same hyperparameters to estimate which algorithm performs better and then having put that out of the way, we will try and tune perceptron a bit.

In [100]:
# train logistic regression with iris

lr_model = LogisticRegression(penalty="elasticnet", l1_ratio=0.5, solver="saga", max_iter=2000, random_state=101).fit(iris_X_train, iris_Y_train)
lr_predictions = lr_model.predict(iris_X_test)
print(accuracy_score(iris_Y_test, lr_predictions))

0.9811320754716981


In [106]:
# Train our perceptron model 

prct_model = Perceptron(penalty="elasticnet", l1_ratio=0.5, max_iter=300, random_state=105).fit(iris_X_train, iris_Y_train)
prct_predictions = prct_model.predict(iris_X_test)
print(accuracy_score(iris_Y_test, prct_predictions))

0.33962264150943394


## First impressions

Logistic regressions appears far more stable no matter what the parameters passed are, predicted accuracy comes close to 98% almost all of the times. We have tried different combinations of **lasso l1**, **ridge l2** and combined **elasticnet** regularization as a method to penalize the slope, and the outcome is the same. 

At the same time percpetron seems very sensible to hyperparameter tuning, not only the type of slope penalty we apply, but also to the **lambda** ratio that adjusts the penalty. The variation of accuracy comes from as low as **33%** as high as **92%** as we will demonstrate below with a fine l1_ratio tuning. 

#### Note:

A positive side I noticed is that perceptron model takes a lot smaller iterations to converge than equivalent logistic regression. Depends also a lot on the solver, the dataset, if it's multinomial or not, it's size and more, but for now pound to pound it appeared to me that perceptron converged significantly faster, no matter the outcome.

In [107]:
# Train our perceptron model with optimal l1_ratio

prct_model = Perceptron(penalty="elasticnet", l1_ratio=0.9, max_iter=300, random_state=105).fit(iris_X_train, iris_Y_train)
prct_predictions = prct_model.predict(iris_X_test)
print(accuracy_score(iris_Y_test, prct_predictions))

0.9245283018867925


### Outcome thoughts

Specifically for this dataset, logistic regression appears to be not only more stable, but also more accurate in any case we tried. Perceptron has it's ups and down's and with fine tuning and correct regularization can also be very competitive, but also very sensitive, which is giving me thoughts if there is an **overfitting** pattern and it just performs better in some cases because the sample size it's small and it just fits better with these params. (for example try to adjust the l1_ratio from 0.9 to either 0.8 or to 1 and hell unleashes).

On to the breast cancer dataset

In [167]:
# train logistic regression with breast cancer not scaled

lr_model = LogisticRegression(penalty="elasticnet", l1_ratio=0.5, solver="saga",max_iter=4000, random_state=101).fit(breast_cancer_X_train, breast_cancer_Y_train)
lr_predictions = lr_model.predict(breast_cancer_X_test)
print(fbeta_score(breast_cancer_Y_test, lr_predictions, average="micro", beta=2.0))

0.915


### Let's try the scaled features for perceptron

In [168]:
# Train our perceptron model non scaled

prct_model = Perceptron(penalty="elasticnet", l1_ratio=0.9, max_iter=300, random_state=105).fit(breast_cancer_X_train, breast_cancer_Y_train)
prct_predictions = prct_model.predict(breast_cancer_X_test)
print(fbeta_score(breast_cancer_Y_test, prct_predictions, average="micro", beta=2.0))

0.9200000000000002


In [169]:
# Train our perceptron model scaled data

prct_model = Perceptron(penalty="elasticnet", l1_ratio=0.9, max_iter=300, random_state=105).fit(breast_X_train_scaled, breast_cancer_Y_train)
prct_predictions = prct_model.predict(breast_X_test_scaled)
print(fbeta_score(breast_cancer_Y_test, prct_predictions, average="micro", beta=2.0))

0.975


### A significant improvement based on the scaled features

But maybe it was a co-incidence let's try and have a look with an **l2** ridge type of regularization penalty

In [170]:
# Train our perceptron model scaled data

prct_model = Perceptron(penalty="l2", max_iter=300, random_state=105).fit(breast_X_train_scaled, breast_cancer_Y_train)
prct_predictions = prct_model.predict(breast_X_test_scaled)
print(fbeta_score(breast_cancer_Y_test, prct_predictions, average="micro", beta=2.0))

0.98


In [171]:
# Let's see how logistic regression classifier performs under l2, 

# train logistic regression with breast cancer not scaled

lr_model = LogisticRegression(penalty="l2", max_iter=4000, random_state=101).fit(breast_cancer_X_train, breast_cancer_Y_train)
lr_predictions = lr_model.predict(breast_cancer_X_test)
print(fbeta_score(breast_cancer_Y_test, lr_predictions, average="micro", beta=2.0))

0.96


## Thoughts on the outcome

For the second dataset we can conclude that for both classifiers the penalty set to **L2** helped significantly (regardless of changing the score, with accuracy worked the same). This may have to do due to the change of the solver (probably **saga** solver does not produce the same results with elastic net config here). 

Scaling also improved our perceptron model, to achieve high **f2_score** accuracy and perform slightly better than logistic regression on this dataset. 