# Softmax and multi-class classification

- Multi-class classification (softmax version): As in the binary classification case, we have X, our predictors. Each predictor describes the subject that we are trying to classify, and each row is one example. The predictors are put into a linear combination, wX + b = y, which gives one result per class.

- We need to turn the linear combination results into a probability distribution over the classes. To do so, take the exponential of the linear combination result (y) for each class and divide it by the sum of the exponentials over all classes. The exponential makes every value positive, so each ratio lies between 0 and 1, the ratios sum to 1, and the denominator can never be zero.
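
The two steps above can be sketched roughly as follows. The toy data, the shapes, and the names `X`, `W`, `b`, `scores`, and `probs` are made up for illustration and are not part of the original notebook; the linear combination is written as `X @ W + b` so that each row of `scores` holds one score per class.

```python
import numpy as np

# Hypothetical toy data: 2 examples (rows), 3 predictors (columns).
X = np.array([[1.0, 2.0, 0.5],
              [0.0, 1.0, 3.0]])

# Hypothetical weights: one column per class (3 predictors x 4 classes).
W = np.array([[0.2, -0.1, 0.0, 0.5],
              [0.4, 0.3, -0.2, 0.1],
              [-0.3, 0.2, 0.6, 0.0]])
b = np.array([0.1, 0.0, -0.1, 0.2])  # one bias per class

# Linear combination: one score per class for every example (shape 2 x 4).
scores = X @ W + b

# Exponentiate so every value is positive, then divide each row by its sum.
exp_scores = np.exp(scores)
probs = exp_scores / exp_scores.sum(axis=1, keepdims=True)

print(probs.sum(axis=1))  # each row should sum to 1, up to rounding
```

Each row of `probs` can then be read as the predicted probability of every class for that example.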


## One Hot Encoding

- NOTE: We need to one-hot encode the class labels (similar to what we did for the perceptron algorithm, although in that case we used 1 and -1). We one-hot encode because we don't want to imply any ordering or distance between classes. One-hot encoding gives each class a numerical code without implying such relationships.

- We then optimize our predictions using the iterative process of gradient descent.
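
As a minimal sketch of one-hot encoding, assuming the classes are already coded as integers starting at 0, the labels can be used to index the rows of an identity matrix. The names `labels`, `num_classes`, and `one_hot` and the sample values are hypothetical.

```python
import numpy as np

# Hypothetical integer class labels for five examples, with three classes (0, 1, 2).
labels = np.array([0, 2, 1, 2, 0])
num_classes = 3

# Row i of the identity matrix is the one-hot vector for class i,
# so indexing the identity matrix by the labels encodes every example at once.
one_hot = np.eye(num_classes, dtype=int)[labels]

print(one_hot)
```

Every row contains a single 1 in the column of its class, so no class is numerically "larger" or "closer" to another.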

In [None]:
from IPython import display

display.Image('./images/one-hot-encoding.png')

## Calculating the SoftMax Function

In [None]:
import numpy as np
import pandas as pd
from typing import List
from IPython import display

pd.set_option('display.max_columns',500)

In [None]:
display.Image('./images/softmax-function.png')

In [None]:
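import numpy as np
from typing import List


def softmax(scores: List[float]) -> List[float]:
    """Return the softmax probabilities for a list of class scores."""
    # Exponentiate every score so that all values are positive.
    exp_scores = np.exp(scores)
    sum_exp_scores = sum(exp_scores)
    # Divide each exponentiated score by the total so the results sum to 1.
    result = []
    for i in exp_scores:
        result.append(i * 1.0 / sum_exp_scores)
    return result


# Note: np.divide can also be used here, as follows
# (this version returns a NumPy array rather than a list):
# def softmax(scores: List[float]) -> List[float]:
#     exp_scores = np.exp(scores)
#     sum_exp_scores = sum(exp_scores)
#     return np.divide(exp_scores, sum_exp_scores)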
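
One practical caveat: `np.exp` overflows for large scores, which turns the probabilities into `nan`. A common workaround, which is not part of the original notebook, is to subtract the largest score before exponentiating; the shift cancels in the ratio, so the result is unchanged. The sketch below assumes a single 1-D list of scores, and `stable_softmax` is a hypothetical name.

```python
import numpy as np

def stable_softmax(scores):
    """Softmax that subtracts the largest score before exponentiating."""
    scores = np.asarray(scores, dtype=float)
    # After the shift the largest entry is 0, so exp() cannot overflow.
    shifted = scores - np.max(scores)
    exp_scores = np.exp(shifted)
    return exp_scores / np.sum(exp_scores)

# Scores this large would overflow without the shift.
probs = stable_softmax([1000.0, 1001.0, 1002.0])
print(probs, probs.sum())
```

Because only the differences between scores matter, `stable_softmax([1, 2, 3])` gives the same probabilities as the call above.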