## Softmax regression

Softmax regression, also known as multinomial logistic regression, is a classification model for problems where the target variable takes one of more than two classes. It extends logistic regression, which handles binary classification.

In softmax regression, the goal is to predict the probability that an input belongs to each possible class. The output of a softmax regression model is a probability distribution over the different classes, with each class having a probability assigned to it. This makes softmax regression particularly useful when dealing with problems where there are more than two possible outcomes.

Here's how softmax regression works:

1. **Input Data**: As in any other regression or classification problem, you start with a dataset consisting of input features and corresponding labels.

2. **Linear Combination**: Softmax regression computes one weighted sum of the input features per class (plus a bias term), similar to linear regression. These raw scores are not output directly; they are processed further in the next step.

3. **Activation Function**: After computing the weighted sum of the input features, softmax regression applies the softmax function to these values. The softmax function transforms the raw scores into probabilities. It does this by exponentiating each score and then normalizing the resulting values to sum up to 1. This ensures that the output can be interpreted as probabilities.

    Mathematically, the softmax function for class \( j \) given the input \( x \) and parameters \( W \) (weights) and \( b \) (bias) is defined as:

    \[ P(y = j \mid x; W, b) = \frac{e^{W_j x + b_j}}{\sum_{k=1}^{K} e^{W_k x + b_k}} \]

    Where:
    - \( P(y = j \mid x; W, b) \) is the probability that input \( x \) belongs to class \( j \).
    - \( W_j \) is the weight vector for class \( j \).
    - \( b_j \) is the bias term for class \( j \).
    - \( K \) is the total number of classes.

4. **Loss Function**: To train the softmax regression model, you need a loss function that quantifies the difference between the predicted probabilities and the actual labels. A common choice is the cross-entropy loss: the negative log of the probability the model assigns to the true class, averaged over the training examples.

5. **Optimization**: The goal of optimization is to adjust the parameters (weights and biases) of the model in such a way that the loss function is minimized. This is typically done using optimization algorithms such as gradient descent.

6. **Training**: During the training phase, the model is trained on the dataset by repeatedly adjusting the parameters based on the gradients of the loss function with respect to the parameters. This process continues until the model converges to a set of parameters that minimize the loss function.

7. **Prediction**: Once the model is trained, it can be used to predict the class probabilities for new input instances. The class with the highest probability is then chosen as the predicted class.
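The seven steps above can be sketched from scratch in NumPy. This is a minimal illustration, not how any particular library implements it: the helper names (`fit_softmax_regression`, `predict_proba`), the learning rate, the step count, and the synthetic three-cluster data are all assumptions made for the example.

```python
import numpy as np

def softmax(scores):
    """Row-wise softmax; subtracting the row max avoids overflow in exp."""
    shifted = scores - scores.max(axis=1, keepdims=True)
    exp_scores = np.exp(shifted)
    return exp_scores / exp_scores.sum(axis=1, keepdims=True)

def cross_entropy(probs, y):
    """Mean negative log-probability assigned to the true class."""
    n = y.shape[0]
    return -np.mean(np.log(probs[np.arange(n), y] + 1e-12))

def fit_softmax_regression(X, y, n_classes, lr=0.1, n_steps=500):
    """Batch gradient descent on the cross-entropy loss (hypothetical helper)."""
    n_samples, n_features = X.shape
    W = np.zeros((n_features, n_classes))  # column j holds the weights of class j
    b = np.zeros(n_classes)
    Y = np.eye(n_classes)[y]               # one-hot encoding of the labels
    losses = []
    for _ in range(n_steps):
        probs = softmax(X @ W + b)                # steps 2 and 3: scores, then softmax
        losses.append(cross_entropy(probs, y))    # step 4: loss
        grad_scores = (probs - Y) / n_samples     # gradient of the mean loss w.r.t. the scores
        W -= lr * (X.T @ grad_scores)             # steps 5 and 6: gradient descent update
        b -= lr * grad_scores.sum(axis=0)
    return W, b, losses

def predict_proba(X, W, b):
    """Class probabilities for each row of X (hypothetical helper)."""
    return softmax(X @ W + b)

# Step 1: synthetic data, three well-separated 2-D clusters (an assumption for this demo).
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [3.0, 3.0], [0.0, 4.0]])
y_toy = np.repeat(np.arange(3), 30)
X_toy = centers[y_toy] + rng.normal(scale=0.5, size=(90, 2))

W, b, losses = fit_softmax_regression(X_toy, y_toy, n_classes=3)
probs = predict_proba(X_toy, W, b)
y_hat = probs.argmax(axis=1)                      # step 7: pick the most probable class

print("first loss:", losses[0], "last loss:", losses[-1])
print("training accuracy:", (y_hat == y_toy).mean())
```

With all parameters initialised to zero, the first predicted distribution is uniform, so the first loss equals \( \log 3 \); the loss should then fall as the updates proceed.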

In summary, softmax regression is a supervised learning algorithm used for classification problems with multiple classes. It computes the probability distribution over the classes using the softmax function and is trained using techniques like gradient descent to minimize the cross-entropy loss.
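The cross-entropy loss and the gradients used by gradient descent can be written out explicitly. The following is a sketch under the assumption of \( N \) training pairs \( (x_i, y_i) \) with integer class labels; \( N \), the index \( i \), and the indicator \( \mathbb{1}[\cdot] \) are notation introduced here for illustration.

\[ L(W, b) = -\frac{1}{N} \sum_{i=1}^{N} \sum_{j=1}^{K} \mathbb{1}[y_i = j] \, \log P(y = j \mid x_i; W, b) \]

\[ \frac{\partial L}{\partial W_j} = \frac{1}{N} \sum_{i=1}^{N} \big( P(y = j \mid x_i; W, b) - \mathbb{1}[y_i = j] \big) \, x_i, \qquad \frac{\partial L}{\partial b_j} = \frac{1}{N} \sum_{i=1}^{N} \big( P(y = j \mid x_i; W, b) - \mathbb{1}[y_i = j] \big) \]

Each gradient is the average gap between the predicted probability and the one-hot label, weighted by the input, so the update pushes probability toward the true class.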

### Softmax function

The softmax function is a mathematical function that takes as input a vector of numerical scores (often called logits) and transforms them into a probability distribution over multiple classes. It's commonly used in machine learning for multiclass classification tasks.

Here's how the softmax function works:

1. **Input**: Given a vector of numerical scores \( z = (z_1, z_2, ..., z_k) \), where \( k \) is the number of classes and \( z_i \) is the score for class \( i \), the softmax function computes a probability for each class.

2. **Exponentiation**: The first step is to exponentiate each score in the input vector. This turns every score into a positive value while preserving their order: a larger score always maps to a larger exponentiated value.

3. **Normalization**: After exponentiation, the next step is to normalize the resulting values to ensure they sum up to 1. This is achieved by dividing each exponentiated score by the sum of all exponentiated scores. This normalization ensures that the output represents a valid probability distribution.

Mathematically, the softmax function can be defined as follows:

![Softmax Function](https://assets-global.website-files.com/5ef788f07804fb7d78a4127a/61d596de34b8ea3f9e3dce2d_Softmax%20function-min.png)

Where:
- \( z_i \) is the \( i \)-th score in the input vector \( z \).
- \( e \) is the base of the natural logarithm (Euler's number).
- \( \sum_{j=1}^{k} e^{z_j} \) represents the sum of the exponentiated scores over all classes.
- \( k \) is the number of classes (for example, 3 for the labels 0, 1, 2 or for yes, no, maybe).
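
For readers who cannot load the image, the definition implied by the symbols listed above can be written as follows (the name \( \operatorname{softmax} \) for the function is the only notation assumed here):

\[ \operatorname{softmax}(z)_i = \frac{e^{z_i}}{\sum_{j=1}^{k} e^{z_j}}, \qquad i = 1, \dots, k \]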

This operation ensures that the output of the softmax function is a probability distribution, with each element representing the probability of the corresponding class.

Each output value lies strictly between 0 and 1, and the outputs sum to 1.

In summary, the softmax function transforms numerical scores into probabilities, making it useful for multiclass classification tasks where you need to predict the probability of an input belonging to each class.
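
The exponentiate-then-normalize procedure described in this section can be sketched in a few lines of NumPy. The example scores are made up for illustration, and subtracting the maximum score before exponentiating is a common numerical-stability trick that is an addition to the description above; it does not change the result because the shift cancels in the ratio.

```python
import numpy as np

def softmax(z):
    """Softmax of a 1-D vector of scores (logits)."""
    z = np.asarray(z, dtype=float)
    exp_z = np.exp(z - z.max())   # shift by the max so exp never overflows
    return exp_z / exp_z.sum()    # normalize so the values sum to 1

scores = np.array([2.0, 1.0, 0.1])   # hypothetical logits for three classes
probs = softmax(scores)

print(probs)          # roughly [0.659 0.242 0.099]
print(probs.sum())    # 1.0 up to floating-point rounding

# Adding the same constant to every score leaves the probabilities unchanged.
print(softmax(scores + 100.0))

# Large scores would overflow a naive exp, but the shifted version stays finite.
print(softmax([1000.0, 1001.0, 1002.0]))
```

The largest score receives the largest probability, and every output lies strictly between 0 and 1.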

In [1]:
import seaborn as sns
from sklearn.preprocessing import LabelEncoder
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score,confusion_matrix
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

In [2]:
df = sns.load_dataset('iris')

In [3]:
df.head()

| | sepal_length | sepal_width | petal_length | petal_width | species |
|---|---|---|---|---|---|
| 0 | 5.1 | 3.5 | 1.4 | 0.2 | setosa |
| 1 | 4.9 | 3.0 | 1.4 | 0.2 | setosa |
| 2 | 4.7 | 3.2 | 1.3 | 0.2 | setosa |
| 3 | 4.6 | 3.1 | 1.5 | 0.2 | setosa |
| 4 | 5.0 | 3.6 | 1.4 | 0.2 | setosa |


In [4]:
encoder = LabelEncoder()
df['species'] = encoder.fit_transform(df['species'])

In [5]:
df.head()

| | sepal_length | sepal_width | petal_length | petal_width | species |
|---|---|---|---|---|---|
| 0 | 5.1 | 3.5 | 1.4 | 0.2 | 0 |
| 1 | 4.9 | 3.0 | 1.4 | 0.2 | 0 |
| 2 | 4.7 | 3.2 | 1.3 | 0.2 | 0 |
| 3 | 4.6 | 3.1 | 1.5 | 0.2 | 0 |
| 4 | 5.0 | 3.6 | 1.4 | 0.2 | 0 |


In [6]:
df = df[['sepal_length','petal_length','species']]

In [7]:
df.head()

| | sepal_length | petal_length | species |
|---|---|---|---|
| 0 | 5.1 | 1.4 | 0 |
| 1 | 4.9 | 1.4 | 0 |
| 2 | 4.7 | 1.3 | 0 |
| 3 | 4.6 | 1.5 | 0 |
| 4 | 5.0 | 1.4 | 0 |


In [8]:
X = df.iloc[:,0:2]
y = df.iloc[:,-1]

In [9]:
X_train,X_test,y_train,y_test = train_test_split(X,y,test_size=0.2,random_state=2)

In [10]:
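# Softmax (multinomial) logistic regression.
# Note: the `multi_class` argument is deprecated since scikit-learn 1.5, where the
# default lbfgs solver already fits a multinomial model for multiclass targets.
clf = LogisticRegression(multi_class='multinomial')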

In [11]:
clf.fit(X_train, y_train)

In [12]:
y_pred = clf.predict(X_test)

In [13]:
print(accuracy_score(y_test,y_pred))

0.9666666666666667


In [14]:
pd.DataFrame(confusion_matrix(y_test,y_pred))

| | 0 | 1 | 2 |
|---|---|---|---|
| 0 | 14 | 0 | 0 |
| 1 | 0 | 7 | 1 |
| 2 | 0 | 0 | 8 |
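
The confusion matrix above can be read row by row: scikit-learn puts the true class on the rows and the predicted class on the columns. As a small sketch, the numbers below are copied by hand from that output, and the per-class recall and precision are computed directly from them.

```python
import numpy as np

# Confusion matrix copied from the output above
# (rows: true class, columns: predicted class).
cm = np.array([[14, 0, 0],
               [0, 7, 1],
               [0, 0, 8]])

accuracy = np.trace(cm) / cm.sum()          # correct predictions / all predictions
recall = np.diag(cm) / cm.sum(axis=1)       # per true class: fraction recovered
precision = np.diag(cm) / cm.sum(axis=0)    # per predicted class: fraction correct

print("accuracy:", accuracy)
print("recall per class:", recall)
print("precision per class:", precision)
```

The accuracy computed this way is 29/30, which matches the `accuracy_score` output shown earlier; the single error is a class 1 sample predicted as class 2.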


In [15]:
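# Predict class probabilities for one new flower:
# sepal_length = 3.4, petal_length = 2.7 (same column order as X)
query = np.array([[3.4,2.7]])
clf.predict_proba(query)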



array([[7.25957888e-01, 2.73627865e-01, 4.14246954e-04]])

In [16]:
clf.predict(query)



array([0])
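
To connect these predictions back to the softmax function, the sketch below checks that `predict_proba` equals the softmax of the model's linear scores. It is self-contained and makes several assumptions that differ from the cells above: it loads Iris through `sklearn.datasets.load_iris` instead of seaborn, fits on the full dataset instead of the training split (so the numbers will not match the output above), and relies on scikit-learn 0.22 or later, where the default lbfgs solver fits a multinomial model for multiclass targets.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

iris = load_iris()
X = iris.data[:, [0, 2]]   # columns 0 and 2: sepal length and petal length
y = iris.target

# Default settings; assumed to give a multinomial (softmax) model, see the note above.
model = LogisticRegression(max_iter=1000).fit(X, y)

query = np.array([[3.4, 2.7]])

# Raw per-class scores: one weighted sum plus bias for each class.
scores = query @ model.coef_.T + model.intercept_

# Softmax by hand (shifted by the row max for numerical stability).
exp_scores = np.exp(scores - scores.max(axis=1, keepdims=True))
manual_proba = exp_scores / exp_scores.sum(axis=1, keepdims=True)

print("manual softmax:", manual_proba)
print("predict_proba: ", model.predict_proba(query))
print("predicted class:", manual_proba.argmax(axis=1))
```

The two probability rows should agree up to floating-point error, and the arg max of either one is the class returned by `predict`.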

In [17]:
# from mlxtend.plotting import plot_decision_regions

# plot_decision_regions(X.values, y.values, clf, legend=2)

# # Adding axes annotations
# plt.xlabel('sepal length [cm]')
# plt.ylabel('petal length [cm]')
# plt.title('Softmax on Iris')

# plt.show()