# Multinomial Classification

## (Review) Logistic Regression

### Hypothesis

$$
{H(X)} = {1 \over {1+ e^{-XW}}}
$$
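
As a rough illustration of the hypothesis above, the following NumPy sketch evaluates the logistic function on a few scores. The function name `sigmoid` and the sample scores are assumptions made for this example; they are not part of the original notebook.

```python
import numpy as np

def sigmoid(z):
    """Logistic function 1 / (1 + exp(-z)), applied element-wise."""
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical scores XW for three instances
scores = np.array([-2.0, 0.0, 2.0])
probs = sigmoid(scores)
print(probs)  # a score of 0 maps to a probability of 0.5
```

Every output lies strictly between 0 and 1, which is what allows it to be read as the probability of class 1.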

### Cost Function

$$
Cost(W) = {1 \over m} {\sum_{i=1}^m c(H(x_{i}), y_{i})}
$$

### Cross Entropy in Logistic Classification

$$
c(H(x), y) = \begin{cases} -\log(H(x)) & : y = 1 \\ -\log(1 - H(x)) & : y = 0 \end{cases}
= -y \log(H(x)) - (1 - y) \log(1 - H(x))
$$
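
To see that the two forms of the cost agree, the sketch below evaluates both for each label value. The helper names `cost_piecewise` and `cost_combined` and the probability 0.9 are hypothetical, chosen only for illustration.

```python
import numpy as np

def cost_piecewise(h, y):
    """Case form: -log(h) when y == 1, -log(1 - h) when y == 0."""
    return -np.log(h) if y == 1 else -np.log(1.0 - h)

def cost_combined(h, y):
    """Single-expression form: -y*log(h) - (1 - y)*log(1 - h)."""
    return -y * np.log(h) - (1 - y) * np.log(1.0 - h)

h = 0.9  # hypothetical predicted probability of class 1
for y in (0, 1):
    print(y, cost_piecewise(h, y), cost_combined(h, y))
```

With `h = 0.9`, the cost is small when the true label is 1 and large when it is 0, so confident wrong predictions are penalized most.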

### Minimizing Cost

$$
W_{new} = W_{old} - \alpha {\partial \over {\partial W}} Cost(W)
$$
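
A minimal sketch of this update rule for logistic regression, assuming a tiny made-up dataset and a learning rate of 0.1 (neither comes from the notebook). It relies on the standard result that the gradient of the cross-entropy cost with respect to `W` is `X.T @ (H - y) / m`.

```python
import numpy as np

# Hypothetical toy data: 4 instances, 1 feature plus a constant column acting as the bias
X = np.array([[1.0, 1.0], [2.0, 1.0], [3.0, 1.0], [4.0, 1.0]])
y = np.array([[0.0], [0.0], [1.0], [1.0]])
W = np.zeros((2, 1))
alpha = 0.1  # learning rate (assumed value)

def cost(W):
    """Mean binary cross entropy of the sigmoid hypothesis."""
    h = 1.0 / (1.0 + np.exp(-X @ W))
    return float(np.mean(-y * np.log(h) - (1 - y) * np.log(1 - h)))

cost_before = cost(W)
for _ in range(100):
    h = 1.0 / (1.0 + np.exp(-X @ W))
    grad = X.T @ (h - y) / len(X)   # dCost/dW for the cross-entropy cost
    W = W - alpha * grad            # W_new = W_old - alpha * gradient
cost_after = cost(W)
print(cost_before, cost_after)
```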

## Multinomial Classification

### Softmax Function (Hypothesis)

$$
S(y_{i}) = {{e^{y_i}} \over {\sum_{j=1}^n e^{y_{j}}}} \\
\\
n: Number\ of\ classes \\
\\
i: Index\ of\ the\ class
$$
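
The softmax formula can be sketched in NumPy as follows. The function name `softmax` and the sample scores are assumptions for this example. Subtracting the row maximum before exponentiating is a common numerical-stability trick that leaves the result unchanged.

```python
import numpy as np

def softmax(logits):
    """Row-wise softmax; subtracting the row max avoids overflow in exp."""
    shifted = logits - np.max(logits, axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / np.sum(exp, axis=-1, keepdims=True)

# Hypothetical scores for one instance over n = 3 classes
scores = np.array([2.0, 1.0, 0.1])
probs = softmax(scores)
print(probs, probs.sum())
```

The outputs are positive and sum to 1, and the class with the largest score receives the largest probability.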

### Cost Function

$$
Cost(W) = {1 \over m} {\sum_{i=1}^m D(S(X_{i}W + b),L_{i})} \\
\\
m: Number\ of\ instances \\
\\
i: Index\ of\ the\ instance
$$

### Cross Entropy in Multinomial Classification

$$
D(S, L) = - \sum_{j=1}^n L_{j} \log(S(y_{j})) \\
\\
n: Number\ of\ classes \\
\\
j: Index\ of\ the\ class
$$
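
The following sketch computes the cross entropy per instance and then averages it, as in the cost function above. The probabilities and one-hot labels are made-up values, and `cross_entropy` is a hypothetical helper name.

```python
import numpy as np

def cross_entropy(S, L):
    """D(S, L) = -sum_j L_j * log(S_j), computed per instance (row)."""
    return -np.sum(L * np.log(S), axis=1)

# Hypothetical softmax outputs for two instances over three classes
S = np.array([[0.7, 0.2, 0.1],
              [0.1, 0.1, 0.8]])
# One-hot labels: instance 0 is class 0, instance 1 is class 1
L = np.array([[1.0, 0.0, 0.0],
              [0.0, 1.0, 0.0]])

per_instance = cross_entropy(S, L)   # equals [-log(0.7), -log(0.1)]
cost = per_instance.mean()           # average over the m instances
print(per_instance, cost)
```

Because the labels are one-hot, only the log-probability assigned to the true class contributes to each instance's cost; the second instance, which puts little mass on its true class, is penalized more.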

### Minimizing Cost

$$
W_{new} = W_{old} - \alpha {\partial \over {\partial W}} Cost(W)
$$
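
For softmax with cross entropy, the gradient of the cost with respect to `W` takes the well-known form `X.T @ (S - L) / m`. The sketch below applies the update rule with that gradient on a small made-up dataset; the data, the learning rate, and the iteration count are assumptions, not values from the notebook.

```python
import numpy as np

def softmax(z):
    """Row-wise softmax with the row max subtracted for stability."""
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

# Hypothetical toy data: 3 instances, 2 features, 3 classes (one-hot labels)
X = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
L = np.eye(3)
W = np.zeros((2, 3))
b = np.zeros(3)
alpha = 0.1  # learning rate (assumed value)

def cost(W, b):
    """Mean cross entropy between softmax outputs and one-hot labels."""
    S = softmax(X @ W + b)
    return float(np.mean(-np.sum(L * np.log(S), axis=1)))

cost_before = cost(W, b)
for _ in range(200):
    S = softmax(X @ W + b)
    W -= alpha * X.T @ (S - L) / len(X)   # gradient step on the weights
    b -= alpha * (S - L).mean(axis=0)     # gradient step on the bias
cost_after = cost(W, b)
print(cost_before, cost_after)
```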

## Implementation

In [2]:
import tensorflow as tf
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

print("TensorFlow Version: %s" % (tf.__version__))

TensorFlow Version: 2.0.0


## Data

In [7]:
x_data = [[1, 2, 1, 1],
          [2, 1, 3, 2],
          [3, 1, 3, 4],
          [4, 1, 5, 5],
          [1, 7, 5, 5],
          [1, 2, 5, 6],
          [1, 6, 6, 6],
          [1, 7, 7, 7]]

y_data = [[0, 0, 1],
          [0, 0, 1],
          [0, 0, 1],
          [0, 1, 0],
          [0, 1, 0],
          [0, 1, 0],
          [1, 0, 0],
          [1, 0, 0]]

x_data = np.array(x_data, dtype=np.float32)
y_data = np.array(y_data, dtype=np.float32)

nb_classes = 3

In [26]:
# Weights
tf.random.set_seed(2020)
W = tf.Variable(tf.random.normal([4, nb_classes], mean=0.0))
b = tf.Variable(tf.random.normal([nb_classes], mean=0.0))

print('# Weights: \n', W.numpy(), '\n\n# Bias: \n', b.numpy())

# Weights: 
 [[-0.10099822  0.6847899   1.6258513 ]
 [ 0.88112587 -0.63692456 -0.1427695 ]
 [ 0.82411087 -0.91326994 -0.4510184 ]
 [ 0.58053356  1.3066356  -0.60428965]] 

# Bias: 
 [ 0.38414612 -0.6159301  -0.5453214 ]


In [None]:
# Learning Rate
learning_rate = 0.01

# Hypothesis and Prediction Function
def predict(X):
    # Softmax over the class scores XW + b yields one probability per class
    hypothesis = tf.nn.softmax(tf.matmul(X, W) + b)
    return hypothesis

# Training
for i in range(2000+1):
    
    with tf.GradientTape() as tape:
        
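        hypothesis = predict(x_data)
        # Cross entropy D(S, L) per instance, averaged over all instances
        cost = tf.reduce_mean(-tf.reduce_sum(y_data * tf.math.log(hypothesis), axis=1))

    # Compute the gradients once the tape has recorded the forward pass
    W_grad, b_grad = tape.gradient(cost, [W, b])

    # Gradient descent step: W_new = W_old - alpha * dCost/dW
    W.assign_sub(learning_rate * W_grad)
    b.assign_sub(learning_rate * b_grad)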
    
    if i % 400 == 0:
        print(">>> #%s \n Weights: \n%s \n Bias: \n%s \n cost: %s\n" % (i, W.numpy(), b.numpy(), cost.numpy()))

In [27]:
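# Class probabilities for every instance; the output shown below matches the initial weights printed above
hypothesis = tf.nn.softmax(tf.matmul(x_data, W) + b)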
hypothesis

<tf.Tensor: id=101, shape=(8, 3), dtype=float32, numpy=
array([[9.6286476e-01, 1.3575208e-02, 2.3560027e-02],
       [9.8214459e-01, 8.8751027e-03, 8.9803627e-03],
       [9.1783530e-01, 7.7751756e-02, 4.4128583e-03],
       [9.8761183e-01, 1.1750807e-02, 6.3734938e-04],
       [9.9999988e-01, 1.2473994e-07, 7.7960598e-09],
       [9.9948967e-01, 5.0995382e-04, 3.9852188e-07],
       [9.9999976e-01, 2.0705905e-07, 1.8543984e-09],
       [1.0000000e+00, 1.6505206e-08, 5.6909893e-11]], dtype=float32)>

In [None]:
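# Mean cross entropy between the one-hot labels and the predicted probabilities
cost = tf.reduce_mean(-tf.reduce_sum(y_data * tf.math.log(hypothesis), axis=1))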
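
Class probabilities are usually turned into predicted labels by taking the argmax of each row. The sketch below does this in NumPy and compares the result against one-hot labels; the probabilities and labels are made-up values, not outputs of the model above.

```python
import numpy as np

# Hypothetical softmax outputs for three instances over three classes
probs = np.array([[0.1, 0.2, 0.7],
                  [0.2, 0.6, 0.2],
                  [0.5, 0.3, 0.2]])
# Hypothetical one-hot labels
labels = np.array([[0, 0, 1],
                   [0, 1, 0],
                   [0, 1, 0]])

predicted = np.argmax(probs, axis=1)     # most probable class per instance
actual = np.argmax(labels, axis=1)       # class index encoded by each one-hot row
accuracy = np.mean(predicted == actual)  # fraction of instances classified correctly
print(predicted, actual, accuracy)
```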