### Foma Mironenko, <br>SPbU, Faculty of Mathematics and Mechanics,<br>431

# CNN, *Part I*

### The purpose of this work is to recognize handwritten digits automatically with a high level of confidence. The MNIST dataset serves as the source image set. The precision of the resulting model (here, the share of correctly classified images) is measured on a testing sample. Finally, we apply the model to a set of custom images and compare the predicted values against the real ones.

## Model description

### We choose a convolutional neural network (CNN) as a classifier. The network structure is as follows:
- *Convolution* 3x3, 32 filters
- *Max Pooling* 2x2
- *Batch Normalization*
- *Convolution* 3x3, 16 filters
- *Flattening*
- *Dense* 160, relu
- *Dense* 10, softmax

### The total number of model parameters is 316,602, of which 64 (the batch-normalization moving statistics) are not trained.
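The parameter total can be checked by hand from the layer list above. The sketch below is plain arithmetic, not part of the notebook; it assumes unpadded ("valid") convolutions on 28x28 single-channel input and four parameters per channel in batch normalization, and the helper names `conv2d_valid` and `dense` are made up for illustration.

```python
def conv2d_valid(h, w, c_in, filters, k):
    """Output shape and parameter count of a k x k convolution without padding."""
    params = k * k * c_in * filters + filters  # kernel weights + one bias per filter
    return (h - k + 1, w - k + 1, filters), params

def dense(n_in, units):
    """Output width and parameter count of a fully connected layer."""
    return units, n_in * units + units  # weights + biases

shape, p_conv1 = conv2d_valid(28, 28, 1, 32, 3)    # -> (26, 26, 32)
shape = (shape[0] // 2, shape[1] // 2, shape[2])   # 2x2 max pooling -> (13, 13, 32)
p_bn = 4 * shape[2]                                # scale, shift, moving mean, moving variance
shape, p_conv2 = conv2d_valid(*shape, 16, 3)       # -> (11, 11, 16)
flat = shape[0] * shape[1] * shape[2]              # flattening -> 1936 values
width, p_dense1 = dense(flat, 160)
width, p_dense2 = dense(width, 10)

total = p_conv1 + p_bn + p_conv2 + p_dense1 + p_dense2
non_trainable = 2 * 32                             # moving mean and variance are not optimized
print(total, total - non_trainable)
```

Almost all of the parameters sit in the first dense layer, which connects the 1936 flattened values to 160 units.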

## Training

### We split the total sample of 60k images into training and testing sets in the proportion 5 : 1, that is, 50k and 10k images respectively. <br>The precision achieved on the testing set is 98.89%.<br>However, the performance on custom images leaves much to be desired. Digits 6, 8 and 9 are poorly recognized by the model.

In [1]:
#----- data handling -----#
import pandas as pd
import numpy as np
from PIL import Image

In [19]:
#----- conv net -----#
from tensorflow import keras

## Functions definitions

In [20]:
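def draw_image(array, R, C):
    # Restore the flat pixel vector to an R x C grid.
    array = array.reshape((R, C))
    # Invert intensities so the digit is dark on a light background;
    # a 2-D uint8 array is interpreted as 8-bit grayscale (mode 'L').
    img = Image.fromarray(255 - array)
    img.show()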

In [21]:
def decode(S: bytes):
    # IDX files store integers in big-endian byte order.
    return int.from_bytes(S, byteorder='big')

def parse_images(k: int):
    #----- read a file -----#
    with open("./train-images-idx3-ubyte", "rb") as f:
        # Header: magic number, image count, rows, columns (4 bytes each).
        assert decode(f.read(4)) == 2051
        N = decode(f.read(4))
        assert k <= N
        #----- parse image data -----#
        R = decode(f.read(4))
        C = decode(f.read(4))
        assert R == 28 and C == 28
        images = []
        for j in range(k):
            # One byte per pixel, stored row by row.
            bits = [decode(f.read(1)) for i in range(R*C)]
            img = np.array(bits, dtype=np.uint8)
            images.append(img)
    return images, R, C

def parse_labels(k: int):
    #----- read a file -----#
    with open("./train-labels-idx1-ubyte", "rb") as f:
        # Header: magic number and label count (4 bytes each).
        assert decode(f.read(4)) == 2049
        N = decode(f.read(4))
        assert k <= N
        #----- parse label data -----#
        # One byte per label, a digit from 0 to 9.
        result = [decode(f.read(1)) for i in range(k)]
    return result

def parse_data(k: int):
    #----- create a dataset -----#
    imgs, R, C = parse_images(k);
    labs = parse_labels(k);
    return pd.DataFrame({'image': imgs, 'label': labs}), R, C;
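Reading one byte at a time is simple but slow for tens of thousands of images. As a possible alternative, the sketch below decodes the header with `struct` and the pixel block in one `np.frombuffer` call. The function name `parse_idx_images` and the synthetic two-image "file" are assumptions made for the demonstration; the notebook itself uses the functions above.

```python
import struct
import numpy as np

def parse_idx_images(raw: bytes, k: int) -> np.ndarray:
    """Decode the first k images from the raw bytes of an IDX3 file."""
    # Header: four big-endian unsigned 32-bit integers.
    magic, n, rows, cols = struct.unpack(">IIII", raw[:16])
    assert magic == 2051 and k <= n
    # Pixels follow the 16-byte header, one unsigned byte each.
    pixels = np.frombuffer(raw, dtype=np.uint8, count=k * rows * cols, offset=16)
    return pixels.reshape(k, rows, cols)

# Synthetic stand-in for a real file: one all-black and one all-white image.
header = struct.pack(">IIII", 2051, 2, 28, 28)
body = bytes([0] * 784 + [255] * 784)
images = parse_idx_images(header + body, k=2)
print(images.shape)
```

For the real dataset, the bytes would come from reading the whole file in binary mode. Note that arrays created by `np.frombuffer` over a `bytes` object are read-only; copy them before modifying pixels in place.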

## Load training and testing sets

In [13]:
Ntrain = 50000;
Ntest  = 10000;
N = Ntrain + Ntest;
df, R, C = parse_data(N);

df_test  = df.iloc[range(Ntest), :];
df_train = df.iloc[range(Ntest, N), :];

## Prepare training set
#### The model outputs a vector of length 10, so the labels must be converted to one-hot vectors of the same length.

In [14]:
Xtrain = np.stack( df_train
    .loc[:, 'image']
    .apply(lambda arr: arr.reshape((R, C))
));

inds = list(df_train.loc[:, 'label']);
Ytrain = np.zeros((Ntrain, 10));
Ytrain[range(Ntrain), inds] = 1;
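The same one-hot conversion can be illustrated on a handful of labels. This is a minimal sketch with made-up label values; it indexes the rows of an identity matrix, which produces the same result as the index assignment above. Keras also offers `keras.utils.to_categorical` for this purpose.

```python
import numpy as np

labels = np.array([5, 0, 4, 1, 9])   # hypothetical digit labels
one_hot = np.eye(10)[labels]         # row i of the identity matrix encodes digit i

# Taking the position of the maximum recovers the original labels,
# which is how the predictions are decoded later with np.argmax.
decoded = one_hot.argmax(axis=1)
print(one_hot[0], decoded)
```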

## Initialize a conv net with max pooling and dense layers

In [36]:
model = keras.Sequential([
    keras.Input(
        shape=(R, C, 1)),
    keras.layers.Conv2D(
        filters=32, 
        kernel_size=(3, 3), 
        activation='relu'),
    keras.layers.MaxPool2D(
        pool_size=(2, 2)),
    keras.layers.BatchNormalization(),
    keras.layers.Conv2D(
        filters=16, 
        kernel_size=(3, 3), 
        activation='relu'),
    keras.layers.Flatten(),
    keras.layers.Dense(
        160, activation='relu'),
    keras.layers.Dense(
        10, activation='softmax')
]);

In [37]:
model.compile(loss='kl_divergence');
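With one-hot targets, the Kullback-Leibler divergence reduces to the negative log-probability of the true class, so it coincides with categorical cross-entropy. The pure-Python sketch below checks this on one made-up prediction; it is not the Keras implementation, which may also clip values for numerical stability. Since the call above passes no optimizer, Keras uses its default one.

```python
import math

def kl_divergence(y_true, y_pred):
    # Terms with y_true == 0 contribute nothing (0 * log 0 is taken as 0).
    return sum(t * math.log(t / p) for t, p in zip(y_true, y_pred) if t > 0)

def cross_entropy(y_true, y_pred):
    return -sum(t * math.log(p) for t, p in zip(y_true, y_pred) if t > 0)

y_true = [0, 0, 1, 0]            # one-hot target, true class is index 2
y_pred = [0.1, 0.2, 0.6, 0.1]    # hypothetical softmax output

kl = kl_divergence(y_true, y_pred)
ce = cross_entropy(y_true, y_pred)
print(kl, ce)                    # both equal -log(0.6)
```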

## Print model summary

In [38]:
print(model.summary())

Model: "sequential_4"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 conv2d_11 (Conv2D)          (None, 26, 26, 32)        320       
                                                                 
 max_pooling2d_8 (MaxPooling  (None, 13, 13, 32)       0         
 2D)                                                             
                                                                 
 batch_normalization (BatchN  (None, 13, 13, 32)       128       
 ormalization)                                                   
                                                                 
 conv2d_12 (Conv2D)          (None, 11, 11, 16)        4624      
                                                                 
 flatten_4 (Flatten)         (None, 1936)              0         
                                                                 
 dense_8 (Dense)             (None, 160)              

## Train model

In [39]:
model.fit(Xtrain, Ytrain, batch_size=1000, epochs=10, verbose=True)

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


<keras.callbacks.History at 0x7ff5e0d4fca0>

In [40]:
model.fit(Xtrain, Ytrain, batch_size=10000, epochs=3, verbose=True)

Epoch 1/3
Epoch 2/3
Epoch 3/3


<keras.callbacks.History at 0x7ff3a0eaba60>

## Predict testing labels and compare with actual values

In [42]:
Xtest = np.stack(df_test
        .loc[:, 'image']
        .apply(lambda arr: arr.reshape((R, C))
));

Ytest = np.argmax( 
    model.predict(Xtest, verbose=True), 
    axis=1 
);

Yactual = df_test.loc[:, 'label'];

precision_pct = 100 * np.equal(Ytest, Yactual).sum() / Ntest;
print(f"Model precision: {precision_pct}%");

Model precision: 98.89%
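A single overall figure can hide digits that are recognized badly, such as the 6, 8 and 9 mentioned in the introduction. One way to look into this is to compute the share of correct predictions per digit. The sketch below uses small made-up arrays in place of `Ytest` and `Yactual`, and the helper `per_class_accuracy` is hypothetical.

```python
import numpy as np

def per_class_accuracy(predicted, actual, n_classes=10):
    """Share of correctly predicted samples for each class present in `actual`."""
    predicted = np.asarray(predicted)
    actual = np.asarray(actual)
    result = {}
    for c in range(n_classes):
        mask = actual == c
        if mask.any():  # skip classes with no samples
            result[c] = float((predicted[mask] == c).mean())
    return result

# Hypothetical labels and predictions for illustration only.
actual = np.array([6, 6, 8, 9, 9, 1])
predicted = np.array([6, 5, 3, 9, 4, 1])
acc = per_class_accuracy(predicted, actual)
print(acc)
```

Applied to the real arrays, the call would be `per_class_accuracy(Ytest, Yactual)`.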


## Save model if intended precision was reached

In [43]:
if precision_pct >= 98.0:
    model.save('./trained-model', overwrite=True);

INFO:tensorflow:Assets written to: ./trained-model/assets
