# Character Recognition

This notebook shows how to recognize handwritten characters with machine learning, reusing the code from our binary classifier notebook.

The next cell downloads MNIST from the Internet to an `mnist` directory. It takes some time, but you should only have to run it once. Running it again by mistake does no harm: the files are simply downloaded again.

In [1]:
# Only run once, to download MNIST.

import urllib.request
import os

# Create an 'mnist' directory unless it exists:
LOCAL_DIR = './mnist/'
if not os.path.exists(LOCAL_DIR):
    os.makedirs(LOCAL_DIR)

# Download the four MNIST files from the official site:
MNIST_SITE = 'http://yann.lecun.com/exdb/mnist/'
TRAINING_IMAGES = 'train-images-idx3-ubyte.gz'
TRAINING_LABELS = 'train-labels-idx1-ubyte.gz'
TEST_IMAGES = 't10k-images-idx3-ubyte.gz'
TEST_LABELS = 't10k-labels-idx1-ubyte.gz'

urllib.request.urlretrieve(MNIST_SITE + TRAINING_IMAGES, LOCAL_DIR + TRAINING_IMAGES)
urllib.request.urlretrieve(MNIST_SITE + TRAINING_LABELS, LOCAL_DIR + TRAINING_LABELS)
urllib.request.urlretrieve(MNIST_SITE + TEST_IMAGES, LOCAL_DIR + TEST_IMAGES)
urllib.request.urlretrieve(MNIST_SITE + TEST_LABELS, LOCAL_DIR + TEST_LABELS)

print("Data loaded")

Data loaded
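The download cell fetches all four files unconditionally, so a rerun repeats the whole download. A minimal sketch of a variant that skips files already on disk is shown here. The name `download_if_missing` and its `fetch` parameter are hypothetical and not part of the original notebook:

```python
import os
import urllib.request

def download_if_missing(url, path, fetch=urllib.request.urlretrieve):
    """Download url to path unless path already exists.

    Hypothetical helper: returns True if a download happened, False if skipped.
    """
    if os.path.exists(path):
        return False
    fetch(url, path)
    return True
```

With this helper, each `urllib.request.urlretrieve(...)` line could become `download_if_missing(MNIST_SITE + TRAINING_IMAGES, LOCAL_DIR + TRAINING_IMAGES)`, and so on for the other three files.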


Now here's the code that loads MNIST, starting with the images:

In [2]:
import numpy as np
import gzip
import struct

def load_images(filename):
    # Open and unzip the file of images:
    with gzip.open(filename, 'rb') as f:
        # Read the 16-byte header: magic number, image count, rows, columns
        # (four big-endian unsigned 32-bit integers, in that order):
        _ignored, n_images, image_rows, image_columns = struct.unpack('>IIII', f.read(16))
        # Read all the pixels into a long NumPy array:
        all_pixels = np.frombuffer(f.read(), dtype=np.uint8)
        # Reshape the array into a matrix where each line is an image:
        images_matrix = all_pixels.reshape(n_images, image_rows * image_columns)
        # Add a bias column full of 1s as the first column in the matrix
        return np.insert(images_matrix, 0, 1, axis=1)

In [3]:
# 60000 images, each 785 elements (1 bias + 28 * 28 pixels)
X_train = load_images("./mnist/train-images-idx3-ubyte.gz")

# 10000 images, each 785 elements, with the same structure as X_train
X_test = load_images("./mnist/t10k-images-idx3-ubyte.gz")

Let's check that we have a (60000, 785) matrix of training images:

In [4]:
X_train.shape

(60000, 785)
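To make the file layout concrete, here is a small sketch that writes a tiny IDX-style image file and reads it back with the same steps as `load_images`. The two 2x3 images and the helper names `write_tiny_idx_images` and `read_idx_images` are made up for illustration. The value 2051 is the magic number of IDX image files, which the loader ignores anyway:

```python
import gzip
import os
import struct
import tempfile

import numpy as np

def write_tiny_idx_images(path, images):
    """Write a (n, rows, cols) uint8 array as a gzipped IDX-style image file."""
    n, rows, cols = images.shape
    with gzip.open(path, 'wb') as f:
        # Header: magic number, image count, rows, columns (big-endian uint32)
        f.write(struct.pack('>IIII', 2051, n, rows, cols))
        f.write(images.astype(np.uint8).tobytes())

def read_idx_images(path):
    """Same steps as load_images: parse header, flatten images, prepend bias."""
    with gzip.open(path, 'rb') as f:
        _magic, n, rows, cols = struct.unpack('>IIII', f.read(16))
        pixels = np.frombuffer(f.read(), dtype=np.uint8)
    matrix = pixels.reshape(n, rows * cols)
    return np.insert(matrix, 0, 1, axis=1)

# Two illustrative 2x3 "images" with pixel values 0..11
tiny = np.arange(12, dtype=np.uint8).reshape(2, 2, 3)
path = os.path.join(tempfile.mkdtemp(), 'tiny-images.gz')
write_tiny_idx_images(path, tiny)
X_tiny = read_idx_images(path)
print(X_tiny.shape)  # (2, 7): 2 images, each 1 bias + 2 * 3 pixels
print(X_tiny[0])     # [1 0 1 2 3 4 5]
```

The same arithmetic gives the shape of the real training set: 60000 rows, each with 1 + 28 * 28 = 785 elements.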

Now let's load the labels. Note that the system we're writing identifies the digit 4, so the labels that are originally 4 become 1, and the others become 0:

In [5]:
def load_labels(filename):
    # Open and unzip the file of labels:
    with gzip.open(filename, 'rb') as f:
        # Skip the 8 header bytes (magic number and label count):
        f.read(8)
        # Read all the labels as raw bytes, one byte per label:
        all_labels = f.read()
        # Convert the bytes into a one-column matrix of labels:
        labels_matrix = np.frombuffer(all_labels, dtype=np.uint8).reshape(-1, 1)
        # Encode the matrix so that all 4s become 1, and other digits become 0s:
        return (labels_matrix == 4).astype(int)

In [6]:
# 60000 labels, each with value 1 if the digit is a four, and 0 otherwise
Y_train = load_labels("./mnist/train-labels-idx1-ubyte.gz")

# 10000 labels, with the same encoding as Y_train
Y_test = load_labels("./mnist/t10k-labels-idx1-ubyte.gz")

The training labels should be a matrix with 1 column and 60K rows:

In [7]:
Y_train.shape

(60000, 1)
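The encoding step in `load_labels` is easy to try on a handful of values. This sketch uses five made-up labels rather than the real MNIST file:

```python
import numpy as np

# Five illustrative raw labels, shaped as a one-column matrix like labels_matrix
raw_labels = np.array([7, 4, 0, 4, 9], dtype=np.uint8).reshape(-1, 1)

# The comparison yields booleans; astype(int) turns True/False into 1/0
encoded = (raw_labels == 4).astype(int)

print(encoded.ravel())  # [0 1 0 1 0]
print(encoded.mean())   # 0.4, the fraction of positive examples
```

The mean of the encoded column is the fraction of positive examples. On the real data, `Y_train.mean()` would show how rare the positive class is, which is worth keeping in mind when judging the classifier's results.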

Now here is the code of the binary classifier notebook. Nothing changed in any of these functions:

In [8]:
def sigmoid(z):
    return 1 / (1 + np.exp(-z))

In [9]:
def predict(X, w):
    return sigmoid(np.matmul(X, w))

In [10]:
def loss(X, Y, w):
    predictions = predict(X, w)
    first_term = Y * np.log(predictions)
    second_term = (1 - Y) * np.log(1 - predictions)
    return -np.average(first_term + second_term)

In [11]:
def gradient(X, Y, w):
    return np.matmul(X.T, (predict(X, w) - Y)) / X.shape[0]
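For reference, the `predict`, `loss` and `gradient` functions above can be written as formulas. The notation is an assumption of this note, not taken from the original notebook: $m$ is the number of examples (`X.shape[0]`), $\hat{y}$ is the output of `predict`, and $y_i$ is the $i$-th label in $Y$.

```latex
\hat{y} = \sigma(Xw), \qquad \sigma(z) = \frac{1}{1 + e^{-z}}

L = -\frac{1}{m} \sum_{i=1}^{m} \left[ y_i \log \hat{y}_i + (1 - y_i) \log(1 - \hat{y}_i) \right]

\nabla_w L = \frac{1}{m} X^{T} (\hat{y} - Y)
```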

In [12]:
def train(X, Y, iterations, lr):
    w = np.zeros((X.shape[1], 1))
    for i in range(iterations):
        print("Iteration %4d => Loss: %.20f" % (i, loss(X, Y, w)))
        w -= gradient(X, Y, w) * lr
    return w
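One way to gain confidence that `gradient` really is the derivative of `loss` is to compare it with a finite-difference estimate on a small random problem. The sketch below repeats the notebook's functions so that it runs on its own. The helper `numerical_gradient`, the random data and the tolerance are assumptions made for this check:

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def predict(X, w):
    return sigmoid(np.matmul(X, w))

def loss(X, Y, w):
    p = predict(X, w)
    return -np.average(Y * np.log(p) + (1 - Y) * np.log(1 - p))

def gradient(X, Y, w):
    return np.matmul(X.T, (predict(X, w) - Y)) / X.shape[0]

def numerical_gradient(X, Y, w, eps=1e-6):
    """Central finite-difference estimate of d(loss)/dw, one weight at a time."""
    grad = np.zeros_like(w)
    for j in range(w.shape[0]):
        step = np.zeros_like(w)
        step[j, 0] = eps
        grad[j, 0] = (loss(X, Y, w + step) - loss(X, Y, w - step)) / (2 * eps)
    return grad

# Small random problem: 20 examples, a bias column plus 3 features
rng = np.random.default_rng(0)
X = np.insert(rng.random((20, 3)), 0, 1, axis=1)
Y = (rng.random((20, 1)) > 0.5).astype(int)
w = rng.normal(size=(4, 1)) * 0.1

max_diff = np.max(np.abs(gradient(X, Y, w) - numerical_gradient(X, Y, w)))
print("Largest difference:", max_diff)
```

If the two gradients agree to within a tiny difference, the analytical formula matches the loss. A check like this is only practical on small inputs, because it evaluates the loss twice per weight.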

Let's run training with 200 iterations and a pretty small learning rate. This is going to take a minute or two:

In [13]:
w = train(X_train, Y_train, iterations=200, lr=0.00001)

Iteration    0 => Loss: 0.69314718055994528623
Iteration    1 => Loss: 0.80017553516096073807
Iteration    2 => Loss: 0.55723828104553563279
Iteration    3 => Loss: 0.32973175390660297568
Iteration    4 => Loss: 0.18334562770359463801
Iteration    5 => Loss: 0.16726190795891721086
Iteration    6 => Loss: 0.15898112692349325448
Iteration    7 => Loss: 0.15231619727522435759
Iteration    8 => Loss: 0.14690314715421115555
Iteration    9 => Loss: 0.14226046885782328566
Iteration   10 => Loss: 0.13819528582968026997
Iteration   11 => Loss: 0.13458801318399327140
Iteration   12 => Loss: 0.13135763336291331194
Iteration   13 => Loss: 0.12844347920976570410
Iteration   14 => Loss: 0.12579789856333775666
Iteration   15 => Loss: 0.12338265705502209080
Iteration   16 => Loss: 0.12116664919212588591
Iteration   17 => Loss: 0.11912428830004077873
Iteration   18 => Loss: 0.11723432263009328502
Iteration   19 => Loss: 0.11547894406452016702
Iteration   20 => Loss: 0.11384310912932420201
...

Iteration  175 => Loss: 0.07005631044904066240
Iteration  176 => Loss: 0.06999624899762441066
Iteration  177 => Loss: 0.06993673333136346537
Iteration  178 => Loss: 0.06987775538901459804
Iteration  179 => Loss: 0.06981930726987627123
Iteration  180 => Loss: 0.06976138122977343370
Iteration  181 => Loss: 0.06970396967716367687
Iteration  182 => Loss: 0.06964706516935990910
Iteration  183 => Loss: 0.06959066040886570381
Iteration  184 => Loss: 0.06953474823981939390
Iteration  185 => Loss: 0.06947932164454294346
Iteration  186 => Loss: 0.06942437374019248819
Iteration  187 => Loss: 0.06936989777550613134
Iteration  188 => Loss: 0.06931588712764681637
Iteration  189 => Loss: 0.06926233529913597420
Iteration  190 => Loss: 0.06920923591487564142
Iteration  191 => Loss: 0.06915658271925567702
Iteration  192 => Loss: 0.06910436957334328834
Iteration  193 => Loss: 0.06905259045215231262
Iteration  194 => Loss: 0.06900123944198929826
Iteration  195 => Loss: 0.06895031073787410980
...

The result is a matrix of 785 weights: one for each pixel in the images, plus one for the bias.

In [14]:
w.shape

(785, 1)

Now let's check the first ten predictions, and compare them with the first ten labels:

In [15]:
np.round(predict(X_test, w))[0:10]

array([[0.],
       [0.],
       [0.],
       [0.],
       [1.],
       [0.],
       [1.],
       [0.],
       [0.],
       [0.]])

In [16]:
Y_test[0:10]

array([[0],
       [0],
       [0],
       [0],
       [1],
       [0],
       [1],
       [0],
       [0],
       [0]])
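The ten rounded predictions match the ten labels, but ten examples are a small sample. A minimal sketch of measuring the fraction of correct predictions over any set of examples follows. The function name `accuracy` and the four sample values are hypothetical:

```python
import numpy as np

def accuracy(predicted_probabilities, labels):
    """Fraction of examples where the rounded prediction equals the label."""
    predicted_labels = np.round(predicted_probabilities)
    return np.mean(predicted_labels == labels)

# Four made-up predictions and labels, shaped like the notebook's matrices
probs = np.array([[0.1], [0.8], [0.4], [0.9]])
labels = np.array([[0], [1], [1], [1]])
print(accuracy(probs, labels))  # 0.75
```

In this notebook, the call would be `accuracy(predict(X_test, w), Y_test)`, which covers all 10000 test images instead of the first ten.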