# Session 4

In this session, we will be using one of the classification tasks
found in OpenML as the basis for a mock-up exam. More precisely, the
task
[*semeion*](https://www.openml.org/search?type=data&status=any&id=1501)
is selected. This is another handwritten digit dataset. Specifically,
it is composed of 1593 handwritten digits from around 80 different
people. Each digit is represented as a binary image of 16x16 pixels
(256 values).
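
As a quick illustration of this representation, the sketch below turns one 256-value vector back into a 16x16 image and prints it as text. The vector here is random, hypothetical stand-in data (not a real row of the dataset), and the row-by-row pixel order is an assumption.

```python
import numpy as np

# Hypothetical stand-in for one sample: 256 binary pixel values.
rng = np.random.default_rng(0)
x = rng.integers(0, 2, size=256)

# Assumption: pixels are stored row by row, so a plain reshape recovers the image.
img = x.reshape(16, 16)

# Print the image as text: '#' for a pixel set to 1, '.' for a pixel set to 0.
for row in img:
    print("".join("#" if v else "." for v in row))
```

With the real data, `x` would be a row of the loaded array instead of the random vector.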

You may need to run this code if this is the first time you are running this notebook.

In [2]:
!pip install scikit-learn


Before starting the mock-up exam, you must download the data
(**semeion_X.npy** and **semeion_y.npy**) and the logistic regression
library (**LogisticRegression.py**) from poliFormat.

In [3]:
# Execute this cell only when running in Google Colab 

# You need to upload LogisticRegression.py
# from google.colab import files
# uploaded = files.upload()

# You need to upload semeion_X.npy 
# from google.colab import files
# uploaded = files.upload()

# You need to upload semeion_y.npy
# from google.colab import files
# uploaded = files.upload()

<p style="page-break-after:always;"></p>

Below is a baseline result obtained with the logistic regression classifier using default parameters and batch size 10, with 80% of the samples used for training and 20% for testing (seed `random_state=23`).

In [4]:
from LogisticRegression import LogisticRegressionClassification, LogisticRegressionTraining
import warnings
warnings.filterwarnings("ignore")
import numpy as np
from sklearn.model_selection import train_test_split

# Load data
X = np.load('semeion_X.npy'); y = np.load('semeion_y.npy')
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, shuffle=True, random_state=23)
N = len(X_train); M = len(X_test)

# Train and classify
W = LogisticRegressionTraining(X_train, y_train, bs=10)
haty_test = LogisticRegressionClassification(X_test, W)
accuracy = np.sum(haty_test==y_test)/M
print(f"Test error: {1.0-accuracy:.1%}")

Test error: 12.5%


<p style="page-break-after:always;"></p>

### Exercise 1
Applying the logistic regression classifier with default parameter values and batch size 10, adjust the maximum number of epochs in logarithmic scale to determine an optimal value. Report the classification error rate on the training and test sets. Is overfitting observed? If so, from what epoch?

In [15]:
print("  bs maxEps     eta trainErr testErr")
print("---- ------ ------- -------- -------")
bs=10; eta=1e-2;
for maxEpochs in (5, 10, 20, 50, 100, 200, 500, 1000, 2000):
    W = LogisticRegressionTraining(X_train,y_train, bs=bs, maxEpochs=maxEpochs, eta=eta)
    haty_train = LogisticRegressionClassification(X_train,W)
    acc_train = np.sum(haty_train==y_train)/N
    haty_test = LogisticRegressionClassification(X_test,W)
    acc_test = np.sum(haty_test==y_test)/M
    print(f"{bs:4d} {maxEpochs:6d} {eta:.1e} {1.0-acc_train:8.1%} {1.0-acc_test:7.1%}")

  bs maxEps     eta trainErr testErr
---- ------ ------- -------- -------
  10      5 1.0e-02    10.3%   14.4%
  10     10 1.0e-02     8.0%   12.5%
  10     20 1.0e-02     5.6%   10.7%
  10     50 1.0e-02     1.8%    9.4%
  10    100 1.0e-02     0.5%    8.5%
  10    200 1.0e-02     0.1%    9.1%
  10    500 1.0e-02     0.1%    9.7%
  10   1000 1.0e-02     0.1%    9.7%
  10   2000 1.0e-02     0.1%    9.7%


<p style="page-break-after:always;"></p>
Overfitting is observed from 200 epochs onwards, because the test error increases (from 8.5% to 9.1%, then 9.7%) while the training error keeps decreasing to 0.1%.<br>
<b>The best value is 100 epochs, as it gives the lowest test error (8.5%), and we want a model that generalizes.</b>
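
The same reading of the table can be sketched in code. This is only an illustration: the numbers are copied by hand from the output of this exercise, and "overfitting onset" is taken here to mean the first setting after the minimum whose test error is higher than that minimum.

```python
# (maxEpochs, train error %, test error %) copied from the table of this exercise
results = [
    (5, 10.3, 14.4), (10, 8.0, 12.5), (20, 5.6, 10.7), (50, 1.8, 9.4),
    (100, 0.5, 8.5), (200, 0.1, 9.1), (500, 0.1, 9.7), (1000, 0.1, 9.7),
    (2000, 0.1, 9.7),
]

# Best setting: the row with the lowest test error.
best_epochs, best_train, best_test = min(results, key=lambda r: r[2])

# Onset: first larger epoch budget whose test error exceeds the minimum.
onset = next((e for e, _, te in results if e > best_epochs and te > best_test), None)

print(f"best maxEpochs = {best_epochs} (test error {best_test}%)")
print(f"test error starts rising at maxEpochs = {onset}")
```

On the reported numbers this selects 100 epochs and flags 200 as the point where the test error starts to rise.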

### Exercise 2
Using maximum number of epochs 2000, learning rate (eta) 1e-2 and
applying the logistic regression classifier, adjust the batch size in logarithmic
scale to determine an optimal value. Report the classification error rate on the training and test sets.

In [6]:
print("  bs maxEps     eta trainErr testErr")
print("---- ------ ------- -------- -------")
maxEpochs=2000; eta=1e-2;
for bs in (1, 2, 5, 10, 20, 50, 100):
    W = LogisticRegressionTraining(X_train,y_train, bs=bs, maxEpochs=maxEpochs, eta=eta)
    haty_train = LogisticRegressionClassification(X_train,W)
    acc_train = np.sum(haty_train==y_train)/N
    haty_test = LogisticRegressionClassification(X_test,W)
    acc_test = np.sum(haty_test==y_test)/M
    print(f"{bs:4d} {maxEpochs:6d} {eta:.1e} {1.0-acc_train:8.1%} {1.0-acc_test:7.1%}")

  bs maxEps     eta trainErr testErr
---- ------ ------- -------- -------
   1   2000 1.0e-02     0.0%   10.0%
   2   2000 1.0e-02     0.0%   10.0%
   5   2000 1.0e-02     0.0%    9.7%
  10   2000 1.0e-02     0.1%    9.7%
  20   2000 1.0e-02     0.0%    9.7%
  50   2000 1.0e-02     0.0%    9.7%
 100   2000 1.0e-02     0.1%    9.7%


<p style="page-break-after:always;"></p>
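Batch sizes 5, 20 and 50 all give the lowest training and test errors (0.0% and 9.7% respectively), so 5 is one valid choice. After 2000 epochs the batch size has little effect: the test error only ranges from 9.7% to 10.0%.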
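
One possible way to see why the batch size matters so little here is to count weight updates. The sketch below assumes standard mini-batch gradient descent (one update per batch, last partial batch kept); the internals of `LogisticRegressionTraining` are not shown in this notebook, so this is an assumption. The training set size is derived from the 1593 samples and the 80/20 split, assuming scikit-learn rounds the test share up.

```python
import math

n_samples = 1593                      # dataset size stated in the introduction
n_test = math.ceil(0.2 * n_samples)   # assumption: the test share is rounded up
n_train = n_samples - n_test
max_epochs = 2000

for bs in (1, 2, 5, 10, 20, 50, 100):
    # Assumption: one update per mini-batch, last partial batch included.
    updates_per_epoch = math.ceil(n_train / bs)
    total = updates_per_epoch * max_epochs
    print(f"bs={bs:3d}  updates/epoch={updates_per_epoch:5d}  total updates={total:8d}")
```

Under these assumptions even the largest batch size performs 13 updates per epoch, i.e. 26000 updates in total, which is consistent with every configuration reaching a training error close to 0%.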

### Exercise 3
Applying the logistic regression classifier with default parameter values and
batch size 10, adjust both the maximum number of epochs and the learning rate
(eta) to determine the optimal values. Use in both cases a logarithmic
scale. Report the classification error rate on the training and test
sets. Discuss the results obtained.

In [17]:
print("  bs maxEps     eta trainErr testErr")
print("---- ------ ------- -------- -------")
bs=10
for maxEpochs in (5, 10, 20, 50, 100, 200, 500, 1000, 2000):
    for eta in (1e-4, 1e-3, 1e-2, 1e-1, 1, 1e1, 1e2):
        W = LogisticRegressionTraining(X_train,y_train, bs=bs, maxEpochs=maxEpochs, eta=eta)
        haty_train = LogisticRegressionClassification(X_train,W)
        acc_train = np.sum(haty_train==y_train)/N
        haty_test = LogisticRegressionClassification(X_test,W)
        acc_test = np.sum(haty_test==y_test)/M
        print(f"{bs:4d} {maxEpochs:6d} {eta:.1e} {1.0-acc_train:8.1%} {1.0-acc_test:7.1%}")

  bs maxEps     eta trainErr testErr
---- ------ ------- -------- -------
  10      5 1.0e-04    38.5%   42.9%
  10      5 1.0e-03    21.7%   25.7%
  10      5 1.0e-02    10.3%   14.4%
  10      5 1.0e-01     2.2%   11.0%
  10      5 1.0e+00     0.2%   11.0%
  10      5 1.0e+01     6.7%   13.5%
  10      5 1.0e+02     8.0%   11.6%
  10     10 1.0e-04    35.8%   42.6%
  10     10 1.0e-03    17.0%   19.7%
  10     10 1.0e-02     8.0%   12.5%
  10     10 1.0e-01     0.5%    9.7%
  10     10 1.0e+00     0.2%   11.0%
  10     10 1.0e+01     6.7%   13.5%
  10     10 1.0e+02     8.0%   11.6%
  10     20 1.0e-04    30.5%   34.8%
  10     20 1.0e-03    14.1%   17.9%
  10     20 1.0e-02     5.6%   10.7%
  10     20 1.0e-01     0.1%   10.0%
  10     20 1.0e+00     0.2%   11.0%
  10     20 1.0e+01     6.7%   13.5%
  10     20 1.0e+02     8.0%   11.6%
  10     50 1.0e-04    21.5%   25.7%
  10     50 1.0e-03    10.7%   14.4%
  10     50 1.0e-02     1.8%    9.4%
  10     50 1.0e-01     0.0%    9.7%
 

<p style="page-break-after:always;"></p>
We can use 100 epochs and a learning rate of 1e-2, which gives the smallest test error (8.5%, as in Exercise 1).
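
To discuss how the two parameters interact, the rows visible in the output above can be grouped by epoch budget. The sketch below is only an illustration on those visible rows, copied by hand; the output is cut off after 50 epochs, so the 50-epoch group is incomplete and larger budgets are not covered.

```python
# (maxEpochs, eta, test error %) for the rows visible in the output of this exercise
rows = [
    (5, 1e-4, 42.9), (5, 1e-3, 25.7), (5, 1e-2, 14.4), (5, 1e-1, 11.0),
    (5, 1e0, 11.0), (5, 1e1, 13.5), (5, 1e2, 11.6),
    (10, 1e-4, 42.6), (10, 1e-3, 19.7), (10, 1e-2, 12.5), (10, 1e-1, 9.7),
    (10, 1e0, 11.0), (10, 1e1, 13.5), (10, 1e2, 11.6),
    (20, 1e-4, 34.8), (20, 1e-3, 17.9), (20, 1e-2, 10.7), (20, 1e-1, 10.0),
    (20, 1e0, 11.0), (20, 1e1, 13.5), (20, 1e2, 11.6),
    (50, 1e-4, 25.7), (50, 1e-3, 14.4), (50, 1e-2, 9.4), (50, 1e-1, 9.7),
]

# For each epoch budget keep the learning rate with the lowest test error
# (ties are resolved in favour of the smaller learning rate).
best = {}
for epochs, eta, test_err in rows:
    if epochs not in best or test_err < best[epochs][1]:
        best[epochs] = (eta, test_err)

for epochs, (eta, test_err) in best.items():
    print(f"maxEpochs={epochs:3d}  best eta={eta:.0e}  test error={test_err}%")
```

On these rows, short runs (5 to 20 epochs) do best with eta 1e-1, while at 50 epochs eta 1e-2 is already better, which suggests that a smaller learning rate pays off once enough epochs are available.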

### Exercise 4
According to the results you have obtained, could you claim that this task is linearly separable? Why?