# Softmax Regression / Multiclass Classification

When we need to classify an input into one of more than two classes, we use softmax regression. For example, to classify handwritten digits into the 10 classes 0 through 9, softmax regression assigns a probability to each possible digit.
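As a rough illustration of what the softmax function computes, the sketch below exponentiates each raw score and divides by the sum of the exponentials, so the results are positive and sum to 1. The function name `softmax` and the example scores are made up for this illustration and are not part of the notebook's model.

```python
import numpy as np

def softmax(z):
    """Map a vector of raw scores z to probabilities that sum to 1."""
    z = np.asarray(z, dtype=float)
    # Subtracting the maximum does not change the result but avoids overflow in exp
    shifted = z - np.max(z)
    exp_z = np.exp(shifted)
    return exp_z / np.sum(exp_z)

# Hypothetical raw scores for 4 classes (illustrative values only)
scores = np.array([2.0, 1.0, 0.1, -1.0])
probs = softmax(scores)

print(probs)
print("sum:", probs.sum(), "predicted class:", np.argmax(probs))
```

The class with the largest raw score also receives the largest probability, which is why the predicted class can be read from either.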

In [1]:
import numpy as np
import tensorflow as tf
import matplotlib.pyplot as plt
from tensorflow.keras.layers import Dense
from tensorflow.keras.models import Sequential

## Creating the Dataset

In [2]:
from sklearn.datasets import make_blobs
centers = [[-5, 2], [-2, -2], [1, 2], [5, -2]]
X_train, y_train = make_blobs(n_samples=2000, centers=centers, cluster_std=1.0,random_state=30)

## Method with Numerical Roundoff Errors

In [3]:
model = Sequential([
    Dense(units=25,activation='relu'),
    Dense(units=15,activation='relu'),
    Dense(units=4,activation='softmax')
])

model.compile(loss=tf.keras.losses.SparseCategoricalCrossentropy(),
             optimizer=tf.keras.optimizers.Adam(0.001),)

model.fit(X_train,y_train,epochs=10)

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


<keras.callbacks.History at 0x1f7e13cb0c8>

In [4]:
p_nonpreferred = model.predict(X_train)
print(p_nonpreferred[:2])
print("largest value", np.max(p_nonpreferred), "smallest value", np.min(p_nonpreferred))

[[1.4677609e-03 1.4148162e-03 9.6415544e-01 3.2962032e-02]
 [9.8882282e-01 1.0765087e-02 3.0768575e-04 1.0433059e-04]]
largest value 0.9999961 smallest value 8.694405e-11


The outputs are probabilities: each row holds one value per class, and the values in a row sum to 1 (up to rounding).

## Preferred Model with Fewer Numerical Roundoff Errors

In [5]:
model_preffered = Sequential([
    Dense(units=25,activation='relu'),
    Dense(units=15,activation='relu'),
    Dense(units=4,activation='linear')
])

model_preffered.compile(
                loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
                optimizer = tf.keras.optimizers.Adam(0.001)
                       )

model_preffered.fit(X_train,y_train,epochs=10)

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


<keras.callbacks.History at 0x1f7e29e9308>

In [6]:
p_preferred = model_preffered.predict(X_train)
print(f"two example output vectors:\n {p_preferred[:2]}")
print("largest value", np.max(p_preferred), "smallest value", np.min(p_preferred))

two example output vectors:
 [[-2.1648939  -1.5291836   3.9892135  -0.23768035]
 [ 6.5366626   1.1982427  -3.97533    -3.7934895 ]]
largest value 12.664385 smallest value -9.028359


**Note: The output predictions are not probabilities!**

If the problem only requires selecting a category, these raw outputs are sufficient: the index of the largest value gives the predicted class. Use NumPy [argmax](https://numpy.org/doc/stable/reference/generated/numpy.argmax.html) to select it. If the problem requires a probability, a softmax is required:

In [7]:
sm_preferred = tf.nn.softmax(p_preferred).numpy()
print(f"two example output vectors:\n {sm_preferred[:2]}")
print("largest value", np.max(sm_preferred), "smallest value", np.min(sm_preferred))

two example output vectors:
 [[2.0815765e-03 3.9307699e-03 9.7968656e-01 1.4301133e-02]
 [9.9516022e-01 4.7802068e-03 2.7076510e-05 3.2476168e-05]]
largest value 0.9999997 smallest value 3.7928033e-10


This gives us the required probability for each class.

To return an integer representing the predicted target, the softmax is not required. The most likely category is the index of the largest output, which [np.argmax()](https://numpy.org/doc/stable/reference/generated/numpy.argmax.html) returns.

In [8]:
for i in range(5):
    print(f"{p_preferred[i]}, category: {np.argmax(p_preferred[i])}")

[-2.1648939  -1.5291836   3.9892135  -0.23768035], category: 2
[ 6.5366626  1.1982427 -3.97533   -3.7934895], category: 0
[ 4.717441   1.3192291 -2.9539359 -3.1261587], category: 0
[-2.1094606   5.076804   -0.98137516 -1.982822  ], category: 1
[-0.7942974 -1.4857364  4.6711507 -2.1158438], category: 2
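The loop above handles one row at a time. As a possible alternative, `np.argmax` with `axis=1` returns the category for every row in one call. The sketch below copies the five printed logit rows by hand into an array named `logits` (a name chosen for this example) so that it runs without the trained model, and it also checks that applying a softmax first does not change the selected index.

```python
import numpy as np

# The five logit rows printed above, copied by hand so the example is self-contained
logits = np.array([
    [-2.1648939, -1.5291836, 3.9892135, -0.23768035],
    [6.5366626, 1.1982427, -3.97533, -3.7934895],
    [4.717441, 1.3192291, -2.9539359, -3.1261587],
    [-2.1094606, 5.076804, -0.98137516, -1.982822],
    [-0.7942974, -1.4857364, 4.6711507, -2.1158438],
])

# axis=1 takes the argmax within each row, giving one prediction per example
categories = np.argmax(logits, axis=1)
print(categories)

# Softmax preserves the ordering within a row, so the largest index is unchanged
exp_l = np.exp(logits - logits.max(axis=1, keepdims=True))
probs = exp_l / exp_l.sum(axis=1, keepdims=True)
same = np.array_equal(categories, np.argmax(probs, axis=1))
print("same categories after softmax:", same)
```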


Compared with the non-preferred model above, which outputs probabilities directly, this model is trained with from_logits=True and outputs raw values (logits). Obtaining probabilities is therefore a two-step procedure: predict the logits, then apply a softmax.

The index of the largest value in p_preferred gives the category. Equivalently, look at the probabilities in sm_preferred: the class with the highest probability is the same category.
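To give a feel for why combining the softmax with the loss can reduce roundoff problems, here is a small NumPy sketch. It is not TensorFlow's actual implementation; the function names `naive_loss` and `stable_loss` and the extreme logit values are assumptions chosen to make the effect visible. The naive version computes the softmax first and then takes the log, while the combined version rearranges the same expression so that no very large or very small intermediate value is formed.

```python
import numpy as np

def naive_loss(z, y):
    # Two steps: softmax first, then -log of the true-class probability
    p = np.exp(z) / np.sum(np.exp(z))
    return -np.log(p[y])

def stable_loss(z, y):
    # One step: -log(softmax(z)[y]) rewritten as logsumexp(z) - z[y],
    # with the maximum subtracted before exponentiating
    m = np.max(z)
    return m + np.log(np.sum(np.exp(z - m))) - z[y]

# Hypothetical extreme logits, chosen only to trigger overflow and underflow
z_big = np.array([1000.0, 0.0, -1000.0])
z_small = np.array([0.0, -1000.0])

with np.errstate(over="ignore", invalid="ignore", divide="ignore"):
    naive_a = naive_loss(z_big, 0)    # exp(1000) overflows, so the result is nan
    naive_b = naive_loss(z_small, 1)  # exp(-1000) underflows to 0, so -log(0) is inf

stable_a = stable_loss(z_big, 0)
stable_b = stable_loss(z_small, 1)

print("naive: ", naive_a, naive_b)
print("stable:", stable_a, stable_b)
```

With ordinary logits the two versions agree closely; the difference shows up only when intermediate values become very large or very small.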

## SparseCategoricalCrossentropy or CategoricalCrossentropy
TensorFlow has two possible formats for target values, and the choice of loss determines which one is expected.
- SparseCategoricalCrossentropy: expects the target to be an integer corresponding to the index. For example, if there are 10 potential target values, y would be between 0 and 9.
- CategoricalCrossentropy: expects the target value of an example to be one-hot encoded, where the value at the target index is 1 while the other N-1 entries are zero. For an example with 10 potential target values where the target is 2, y would be [0,0,1,0,0,0,0,0,0,0].
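The two target formats listed above carry the same information. The sketch below uses plain NumPy, with made-up labels and uniform predicted probabilities, to convert integer labels to one-hot rows and to show that the cross-entropy computed from either format is the same number. In a Keras workflow, `tf.keras.utils.to_categorical` is one way to do the same conversion.

```python
import numpy as np

num_classes = 10

# Integer labels, the format SparseCategoricalCrossentropy expects
y_sparse = np.array([2, 0, 9])

# One-hot rows, the format CategoricalCrossentropy expects
y_onehot = np.eye(num_classes)[y_sparse]
print(y_onehot[0])

# Hypothetical predicted probabilities: uniform over the 10 classes
probs = np.full((len(y_sparse), num_classes), 1.0 / num_classes)

# Integer format: pick the probability at the label index
loss_sparse = -np.log(probs[np.arange(len(y_sparse)), y_sparse]).mean()

# One-hot format: the sum keeps only the term at the target index
loss_onehot = -(y_onehot * np.log(probs)).sum(axis=1).mean()

print(loss_sparse, loss_onehot)
```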