# E-commerce Project - Logistic Regression
We are now going to look at training **logistic regression** with softmax. In my other walkthroughs on logistic regression we only covered binary classification, so we used the **sigmoid** function rather than softmax. This will give us the chance to see how logistic regression performs compared to a neural network. Remember, for logistic regression our architecture looks like:

<img src="images/logistic-neuron.png">

The only difference now is that instead of using the sigmoid at the logistic neuron, we are going to use the softmax, since we are performing multiclass classification.

<img src="images/logistic-regression-softmax.png">

We can start with our imports.

In [40]:
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import os

from sklearn.utils import shuffle

And let's define our `get_data` function:

In [56]:
def get_data():
    df = pd.read_csv('data/ecommerce_data.csv')
    data = df.to_numpy()  # as_matrix() was removed in pandas 1.0
    np.random.shuffle(data)
    X = data[:,:-1]
    Y = data[:,-1].astype(np.int32)

    # one-hot encode the categorical data
    N, D = X.shape
    X2 = np.zeros((N, D+3))
    X2[:,0:(D-1)] = X[:,0:(D-1)] # non-categorical

    # one-hot
    for n in range(N):
        t = int(X[n,D-1])
        X2[n,t+D-1] = 1
    X = X2

    # split train and test
    Xtrain = X[:-100]
    Ytrain = Y[:-100]
    Xtest = X[-100:]
    Ytest = Y[-100:]

    # normalize columns 1 and 2
    for i in (1, 2):
        m = Xtrain[:,i].mean()
        s = Xtrain[:,i].std()
        Xtrain[:,i] = (Xtrain[:,i] - m) / s
        Xtest[:,i] = (Xtest[:,i] - m) / s

    return Xtrain, Ytrain, Xtest, Ytest
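
To make the one-hot step in `get_data` concrete, here is a minimal sketch on a made-up array. The toy values and the names `X_toy` and `X2_toy` are assumptions for illustration only; the sketch mirrors the layout used above, where the last column holds a category in 0..3 and is expanded into four indicator columns.

```python
import numpy as np

# Toy feature matrix (made-up values): two numeric columns plus a final
# category column taking values 0..3.
X_toy = np.array([
    [1.0, 0.5, 0],
    [0.0, 1.5, 2],
    [1.0, 2.5, 3],
])
N, D = X_toy.shape

X2_toy = np.zeros((N, D + 3))           # the category column becomes 4 columns
X2_toy[:, :D - 1] = X_toy[:, :D - 1]    # copy the non-categorical columns

# Same effect as the per-row loop in get_data, done with fancy indexing.
categories = X_toy[:, D - 1].astype(int)
X2_toy[np.arange(N), categories + D - 1] = 1

print(X2_toy)
```

Each row ends up with exactly one 1 among its last four columns, at the position given by its category.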

We are going to need a function to get the indicator matrix from the targets.

In [48]:
def y2indicator(y, K):
    N = len(y)
    ind = np.zeros((N,K))
    for i in range(N):
        ind[i, y[i]] = 1 
    return ind
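
As a quick check of what the indicator matrix looks like, the sketch below builds one for four made-up labels. `y2indicator_vectorized` is a hypothetical alternative to `y2indicator` that avoids the Python loop; it is not part of the original notebook.

```python
import numpy as np

def y2indicator_vectorized(y, K):
    # Hypothetical loop-free variant: row i gets a 1 in column y[i].
    y = np.asarray(y)
    ind = np.zeros((len(y), K))
    ind[np.arange(len(y)), y] = 1
    return ind

# Made-up labels for illustration
example = y2indicator_vectorized([0, 2, 1, 3], 4)
print(example)
```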

Now we can get our data.

In [57]:
Xtrain, Ytrain, Xtest, Ytest = get_data()
D = Xtrain.shape[1]
K = len(set(Ytrain) | set(Ytest))

And convert our `Y` data into an indicator matrix. 

In [None]:
Ytrain_ind = y2indicator(Ytrain, K)
Ytest_ind = y2indicator(Ytest, K)

`Ytrain_ind` has shape (400 x 4) because we have four classes that make up Y.

In [None]:
Ytrain_ind.shape

Now randomly initialize our weights. 

In [None]:
W = np.random.randn(D, K)
b = np.zeros(K)

And we can define our `softmax` function:

In [None]:
def softmax(a):
    expA = np.exp(a)
    return expA / expA.sum(axis=1, keepdims=True)
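
One practical caveat: `np.exp` overflows for large activations. A common remedy, sketched here as an optional variant rather than something the walkthrough requires, is to subtract each row's maximum before exponentiating. The name `softmax_stable` and the test values are assumptions for illustration.

```python
import numpy as np

def softmax_stable(a):
    # Subtracting the row-wise max does not change the softmax output,
    # but it keeps np.exp from overflowing on large activations.
    shifted = a - a.max(axis=1, keepdims=True)
    expA = np.exp(shifted)
    return expA / expA.sum(axis=1, keepdims=True)

# The second row is the first row shifted by 999, so both rows
# should produce the same probabilities.
a = np.array([[1.0, 2.0, 3.0, 4.0],
              [1000.0, 1001.0, 1002.0, 1003.0]])
p = softmax_stable(a)
print(p)
print(p.sum(axis=1))
```

With the plain `softmax` above, the second row would overflow to `inf` and produce `nan` values.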

And now let's define our forward function:

In [None]:
def forward(X, W, b):
    return softmax(X.dot(W) + b)

Next, our predict function:

In [None]:
def predict(P_Y_given_X):
    return np.argmax(P_Y_given_X, axis=1)

Our classification rate: 

In [None]:
def classification_rate(Y, P):
    return np.mean(Y == P)
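
To see how `predict` and `classification_rate` fit together, here is a small worked example. The probabilities and targets are made up for illustration.

```python
import numpy as np

# Made-up class probabilities for 4 samples and 4 classes
probs = np.array([
    [0.7, 0.1, 0.1, 0.1],
    [0.2, 0.5, 0.2, 0.1],
    [0.1, 0.2, 0.3, 0.4],
    [0.4, 0.3, 0.2, 0.1],
])
targets = np.array([0, 1, 2, 0])

predictions = np.argmax(probs, axis=1)   # most probable class per row
rate = np.mean(targets == predictions)   # fraction of correct predictions

print(predictions)  # [0 1 3 0]
print(rate)         # 0.75
```

The third sample is predicted as class 3 while its target is class 2, so three of the four predictions are correct.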

And the **Cross Entropy**:

In [None]:
def cross_entropy(T, pY):
    return -np.mean(T * np.log(pY))
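
Note that `np.mean` here averages over all N x K entries of the matrix, not over the N samples. The sketch below, using made-up targets and probabilities, compares this against the per-sample average; the two differ only by the constant factor K, so the shape of the cost curve is unaffected.

```python
import numpy as np

# Made-up one-hot targets and predicted probabilities (2 samples, 4 classes)
T = np.array([[1, 0, 0, 0],
              [0, 1, 0, 0]], dtype=float)
pY = np.array([[0.7, 0.1, 0.1, 0.1],
               [0.2, 0.5, 0.2, 0.1]])
N, K = T.shape

mean_over_entries = -np.mean(T * np.log(pY))       # what cross_entropy computes
mean_over_samples = -np.sum(T * np.log(pY)) / N    # average loss per sample

print(mean_over_entries, mean_over_samples)
print(np.isclose(mean_over_entries * K, mean_over_samples))
```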

Now that we have done all of that, we can enter our training loop. Note: our weight update in gradient descent is based on the rule we derived last lecture: `Z.T.dot(T - Y)`. In logistic regression we only have an input and an output layer, with no hidden layer, so `Z` is simply `X`. We are also now performing gradient descent rather than ascent, so the update has a minus sign and we subtract the targets from the predictions: `Xtrain.T.dot(pYtrain - Ytrain_ind)`.
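
If you want to convince yourself that `X.T.dot(pY - T)` really is the gradient, a finite-difference check is one way to do it. The sketch below is self-contained and uses small random data. It makes two assumptions: the cost is the summed cross-entropy (the update rule is the gradient of the sum, not of the mean reported by `cross_entropy`), and the softmax subtracts the row maximum for numerical stability.

```python
import numpy as np

rng = np.random.default_rng(0)
N, D, K = 5, 3, 4
X = rng.standard_normal((N, D))
T = np.eye(K)[rng.integers(0, K, size=N)]   # random one-hot targets
W = rng.standard_normal((D, K))
b = np.zeros(K)

def forward(X, W, b):
    a = X.dot(W) + b
    expA = np.exp(a - a.max(axis=1, keepdims=True))
    return expA / expA.sum(axis=1, keepdims=True)

def total_cost(W, b):
    # Summed (not averaged) cross-entropy over all samples
    return -np.sum(T * np.log(forward(X, W, b)))

# Analytic gradient, as used in the update rule
pY = forward(X, W, b)
grad_W = X.T.dot(pY - T)

# Numerical gradient via central differences, one weight at a time
eps = 1e-6
num_grad_W = np.zeros_like(W)
for d in range(D):
    for k in range(K):
        W_plus, W_minus = W.copy(), W.copy()
        W_plus[d, k] += eps
        W_minus[d, k] -= eps
        num_grad_W[d, k] = (total_cost(W_plus, b) - total_cost(W_minus, b)) / (2 * eps)

max_diff = np.abs(grad_W - num_grad_W).max()
print(max_diff)
```

A `max_diff` close to zero indicates the analytic and numerical gradients agree.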

In [None]:
train_costs = []
test_costs = []
learning_rate = 0.001
for i in range(10000):
    pYtrain = forward(Xtrain, W, b)        # find the predictions for y train
    pYtest = forward(Xtest, W, b)          # find the predictions for y test
    
    ctrain = cross_entropy(Ytrain_ind, pYtrain)     # Ytrain_ind === targets in this case
    ctest = cross_entropy(Ytest_ind, pYtest)
    train_costs.append(ctrain)
    test_costs.append(ctest)
    
    # now we can perform gradient descent
    W -= learning_rate * Xtrain.T.dot(pYtrain - Ytrain_ind)  
    b -= learning_rate * (pYtrain - Ytrain_ind).sum(axis=0)
    if i % 1000 == 0:
        print(i, ctrain, ctest)
print("Final training classification rate: ", classification_rate(Ytrain, predict(pYtrain)))
print("Final testing classification rate: ", classification_rate(Ytest, predict(pYtest)))

legend1, = plt.plot(train_costs, label='train cost')
legend2, = plt.plot(test_costs, label='test cost')
plt.legend(handles=[legend1, legend2])
plt.show()