# Classification

Let's load the data as usual:

In [None]:
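import numpy as np
# Load the three input columns and the label column, skipping the header row
x1, x2, x3, y = np.loadtxt("pizza_categorical.txt", skiprows=1, unpack=True)
# Prepend a column of ones so that the first weight acts as the bias
X = np.column_stack((np.ones(x1.size), x1, x2, x3))
# Reshape the labels into a column vector, one row per example
Y = y.reshape(-1, 1)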

This time the labels are categorical instead of numerical. To be precise, they're either 0 or 1:

In [None]:
Y

To deal with categorical quantities, we can define a `sigmoid` function that squashes the result of linear regression into the range from 0 to 1:

In [None]:
def sigmoid(z):
    return 1 / (1 + np.exp(-z))
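
To get a feel for the squashing, here is a small sketch that applies `sigmoid` to a handful of sample inputs. The values in `z` are arbitrary and chosen only for illustration:

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

# Arbitrary sample inputs, from strongly negative to strongly positive
z = np.array([-10.0, -1.0, 0.0, 1.0, 10.0])
s = sigmoid(z)

# Every output falls strictly between 0 and 1, and sigmoid(0) is exactly 0.5
print(s)
```

Large negative inputs land close to 0, large positive inputs land close to 1, and the output grows steadily in between.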

The updated `predict` function passes the result of the matrix multiplication through the sigmoid:

In [None]:
def predict(X, w):
    return sigmoid(np.matmul(X, w))

Combining the sigmoid with the mean squared error loss yields a loss surface that is no longer convex: it can have flat regions and local minima where gradient descent gets stuck. Instead, let's switch to a loss that works well with the sigmoid: the log loss.

In [None]:
def loss(X, Y, w):
    predictions = predict(X, w)
    first_term = Y * np.log(predictions)
    second_term = (1 - Y) * np.log(1 - predictions)
    return -np.average(first_term + second_term)
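
To see how the log loss behaves, the sketch below evaluates the same formula on hand-picked predictions. The helper `log_loss` is a hypothetical variant introduced here for illustration: it takes the predictions directly instead of computing them from `X` and `w`. The labels and predictions are made up.

```python
import numpy as np

def log_loss(predictions, Y):
    # Same formula as loss(), applied to predictions that are passed in directly
    first_term = Y * np.log(predictions)
    second_term = (1 - Y) * np.log(1 - predictions)
    return -np.average(first_term + second_term)

# Made-up labels: one positive example and one negative example
Y = np.array([[1.0], [0.0]])

good = np.array([[0.9], [0.1]])    # confident and right
unsure = np.array([[0.5], [0.5]])  # sitting on the fence
bad = np.array([[0.1], [0.9]])     # confident and wrong

loss_good = log_loss(good, Y)
loss_unsure = log_loss(unsure, Y)
loss_bad = log_loss(bad, Y)
print(loss_good, loss_unsure, loss_bad)
```

The loss grows steeply as predictions become confidently wrong. Note that a prediction of exactly 0 or 1 makes `np.log` return `-inf`, so the result can turn into `inf` or `nan`; clipping the predictions with `np.clip` is one common guard against that.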

And here is the gradient of the log loss:

In [None]:
def gradient(X, Y, w):
    return np.matmul(X.T, (predict(X, w) - Y)) / X.shape[0]
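
One way to gain confidence in this formula is to compare it against a numerical approximation. The sketch below does that with central finite differences on randomly generated data. The data, the seed, and the step size `eps` are assumptions made for this check, not part of the pizza dataset.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def predict(X, w):
    return sigmoid(np.matmul(X, w))

def loss(X, Y, w):
    predictions = predict(X, w)
    first_term = Y * np.log(predictions)
    second_term = (1 - Y) * np.log(1 - predictions)
    return -np.average(first_term + second_term)

def gradient(X, Y, w):
    return np.matmul(X.T, (predict(X, w) - Y)) / X.shape[0]

# Random stand-in data: 20 examples, a bias column plus 3 features
rng = np.random.default_rng(0)
X = np.column_stack((np.ones(20), rng.normal(size=(20, 3))))
Y = rng.integers(0, 2, size=(20, 1)).astype(float)
w = rng.normal(size=(4, 1)) * 0.1

# Nudge one weight at a time and measure how the loss changes
eps = 1e-6
numeric = np.zeros_like(w)
for j in range(w.shape[0]):
    step = np.zeros_like(w)
    step[j, 0] = eps
    numeric[j, 0] = (loss(X, Y, w + step) - loss(X, Y, w - step)) / (2 * eps)

analytic = gradient(X, Y, w)
max_diff = np.max(np.abs(numeric - analytic))
print(max_diff)
```

If the two gradients agree to within a tiny difference, the analytic formula matches the slope of the loss.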

Finally, the `train` function is the same as before:

In [None]:
def train(X, Y, iterations, lr):
    w = np.zeros((X.shape[1], 1))
    for i in range(iterations):
        print("Iteration %4d => Loss: %.20f" % (i, loss(X, Y, w)))
        w -= gradient(X, Y, w) * lr
    return w

Time to train:

In [None]:
w = train(X, Y, iterations=100000, lr=0.001)
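
With 100,000 iterations, the cell above prints one line per iteration. As a sketch of a quieter alternative, the variant below logs only every few hundred iterations. The name `train_quiet` and the `log_every` parameter are hypothetical, and the data is synthetic (randomly generated, with noisy labels), so the numbers it prints say nothing about the pizza dataset.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def predict(X, w):
    return sigmoid(np.matmul(X, w))

def loss(X, Y, w):
    predictions = predict(X, w)
    first_term = Y * np.log(predictions)
    second_term = (1 - Y) * np.log(1 - predictions)
    return -np.average(first_term + second_term)

def gradient(X, Y, w):
    return np.matmul(X.T, (predict(X, w) - Y)) / X.shape[0]

def train_quiet(X, Y, iterations, lr, log_every=500):
    # Same update rule as train(), but prints only every log_every iterations
    w = np.zeros((X.shape[1], 1))
    for i in range(iterations):
        if i % log_every == 0:
            print("Iteration %4d => Loss: %.6f" % (i, loss(X, Y, w)))
        w -= gradient(X, Y, w) * lr
    return w

# Synthetic stand-in data: 50 examples, a bias column plus 3 features
rng = np.random.default_rng(42)
features = rng.normal(size=(50, 3))
noise = rng.normal(scale=1.0, size=50)
X = np.column_stack((np.ones(50), features))
Y = (features[:, 0] + features[:, 1] + noise > 0).astype(float).reshape(-1, 1)

initial_loss = loss(X, Y, np.zeros((4, 1)))
w = train_quiet(X, Y, iterations=1000, lr=0.1)
final_loss = loss(X, Y, w)
print(initial_loss, final_loss)
```

With all weights at zero, every prediction is 0.5, so the starting loss is the natural log of 2 (about 0.693). Training should push the loss below that value.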

Let's run predictions on the entire dataset, rounding each output to the nearest label (0 or 1):

In [None]:
np.round(predict(X, w))

Now let's see which predicted labels are the same as the actual labels:

In [None]:
np.round(predict(X, w)) == Y

Only one inaccurate prediction. Not bad!
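
Counting matches by eye gets tedious on larger datasets. Here is a minimal sketch that turns the comparison into a single accuracy number. The labels and probabilities below are made up for illustration; they are not the output of the model trained above.

```python
import numpy as np

# Made-up labels and predicted probabilities for five examples
Y = np.array([[1], [0], [1], [1], [0]])
probabilities = np.array([[0.92], [0.13], [0.41], [0.77], [0.08]])

# Round to 0 or 1, then compare with the actual labels
labels = np.round(probabilities)
matches = labels == Y

# The mean of a boolean array is the fraction of True values
accuracy = np.mean(matches)
print("Accuracy: %.2f" % accuracy)
```

Here four of the five rounded predictions match the labels, so the accuracy is 0.8. Applied to the notebook's data, the same expression would be `np.mean(np.round(predict(X, w)) == Y)`.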