# Multiclass Classification Logic

Multiclass classification assigns an input $x$ to one of $K > 2$ distinct classes, so the target variable $y$ takes values in the discrete set $\{1, 2, \ldots, K\}$.

Because $y$ is discrete with $K$ possible values, we model it with the categorical distribution. This distribution has $K$ parameters $\lambda_1, \lambda_2, \ldots, \lambda_K$, where $\lambda_k$ is the probability of class $k$:

$$Pr(y=k)=\lambda_k$$

The parameters $\lambda_k$ of the categorical distribution must satisfy two conditions for the distribution to be valid: they should lie within the interval $[0, 1]$, and their sum must be exactly $1$. These constraints ensure that we are dealing with a proper probability distribution.
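
The two validity conditions can be checked numerically. The following is a minimal sketch, assuming NumPy; the helper name `is_valid_categorical` and the parameter values are illustrative and not part of the original text. It also draws samples from a valid distribution to show that the empirical class frequencies approach the $\lambda_k$.

```python
import numpy as np

def is_valid_categorical(lam: np.ndarray) -> bool:
    """Return True if lam is a valid parameter vector for a categorical distribution."""
    in_range = np.all((lam >= 0.0) & (lam <= 1.0))
    sums_to_one = np.isclose(lam.sum(), 1.0)
    return bool(in_range and sums_to_one)

valid_lam = np.array([0.2, 0.5, 0.3])      # hypothetical parameters for K = 3
invalid_lam = np.array([0.5, 0.6, -0.1])   # sums to 1 but has a negative entry

print(is_valid_categorical(valid_lam))
print(is_valid_categorical(invalid_lam))

# Draw zero-based class indices 0..K-1 with probabilities valid_lam
rng = np.random.default_rng(0)
samples = rng.choice(len(valid_lam), size=10_000, p=valid_lam)

# Empirical frequency of each class among the samples
frequencies = np.bincount(samples, minlength=len(valid_lam)) / samples.size
print(frequencies)
```

Note that the sampled indices are zero-based, whereas the text labels the classes $1, 2, \ldots, K$.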

The $K$ parameters are computed from the input $x$ by a network $f[x, \phi]$ with $K$ outputs. Because these outputs are unconstrained real numbers, they are passed through the softmax function, which maps any real-valued vector of length $K$ to $K$ values in the range $[0, 1]$ that sum to $1$. The $k^{th}$ element of the softmax output is given below; the exponentials make every value positive, and the denominator normalizes the values so that they sum to $1$:

$$\text{softmax}_k[z] = \frac{\exp[z_k]}{\sum_{k'=1}^{K}\exp[z_{k'}]}$$

The probability that the input $x$ has label $y = k$ is therefore:

$$Pr(y=k|x) = \text{softmax}_k[f[x,\phi]]$$

The prediction $\hat{y}$ is the class with the highest probability under the trained parameters $\hat{\phi}$:

$$\hat{y} = \text{argmax}_k\left[Pr(y = k|f[x,\hat{\phi}])\right]$$
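
Because softmax is strictly increasing in each logit, the most probable class can also be read directly from the raw network outputs. A minimal sketch, assuming NumPy and a hypothetical logit vector standing in for $f[x,\hat{\phi}]$:

```python
import numpy as np

logits = np.array([0.3, 2.2, -1.0, 1.5])   # hypothetical network outputs, K = 4

# Softmax probabilities (the maximum is subtracted for numerical stability)
exp_shifted = np.exp(logits - logits.max())
probabilities = exp_shifted / exp_shifted.sum()

# np.argmax is zero-based; add 1 to match the class labels 1, 2, ..., K
predicted_from_probabilities = int(np.argmax(probabilities)) + 1
predicted_from_logits = int(np.argmax(logits)) + 1

print(predicted_from_probabilities, predicted_from_logits)
```

Softmax is still needed during training, where the loss depends on the probability values and not only on their ordering.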

The loss function used to train the model is the negative log-likelihood of the observed labels over the $I$ training pairs $\{x_i, y_i\}$:

$$L[\phi] = -\sum_{i=1}^{I}\log\left[\text{softmax}_{y_i}[f[x_i,\phi]]\right]$$

Minimizing this loss is equivalent to maximizing the likelihood of the true labels under the model. In this setting it is known as the multiclass cross-entropy loss.
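
The sum over training examples can be sketched for a small batch as follows. This is an illustrative sketch, assuming NumPy; the function names `log_softmax` and `negative_log_likelihood`, the logits, and the labels are all hypothetical. Labels are zero-based array indices here, so class $k$ in the text corresponds to index $k-1$.

```python
import numpy as np

def log_softmax(logits: np.ndarray) -> np.ndarray:
    """Row-wise log-softmax for an array of shape (I, K)."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))

def negative_log_likelihood(logits: np.ndarray, labels: np.ndarray) -> float:
    """Sum over examples of -log softmax_{y_i}[f[x_i, phi]]."""
    log_probs = log_softmax(logits)
    rows = np.arange(logits.shape[0])
    # Pick the log-probability of the true class in each row
    return float(-log_probs[rows, labels].sum())

# Hypothetical network outputs for I = 2 examples and K = 3 classes
logits = np.array([[2.0, 1.0, 0.1],
                   [0.5, 2.5, 0.3]])
labels = np.array([0, 1])  # zero-based true classes

loss = negative_log_likelihood(logits, labels)
print(loss)
```

Computing the log-softmax directly, instead of taking the log of the softmax output, avoids evaluating `log(0)` when a probability underflows.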

In [1]:
import numpy as np

**sample problem**:

- Simulate logits for an input: [2.0, 1.0, 0.1] (these could be outputs from the last layer of a neural network before applying softmax).
- Apply the softmax function to convert logits into probabilities.
- Predict the class with the highest probability.
- Compute the loss using a true label (assume the true class is 1, one-hot encoded as [1, 0, 0]).

**Feature Representation**: In the fruit scenario later in this section, each image is represented by a simplified, hand-written feature vector. In a real application, features would be extracted automatically by the neural network.

**Simulated Neural Network Output (Logits)**: These logits are typically the output of the final layer of a neural network before applying the softmax function. They represent the network's raw predictions.

**Softmax Function**: Converts raw logits into probabilities by applying the exponential function to each logit, normalizing them to ensure they sum to 1. This step is crucial because it allows us to interpret the network's output as probabilities.

**Prediction**: We take the class with the highest probability as the prediction.

In [2]:
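def Softmax(values:np.ndarray)->np.ndarray:
  """
  Compute softmax probabilities over the last axis of values
  """
  # Subtract the maximum before exponentiating for numerical stability; the result is unchanged
  expValue = np.exp(values-np.max(values,axis=-1,keepdims=True))
  return expValue/expValue.sum(axis=-1,keepdims=True)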
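
The subtraction of the maximum in `Softmax` does not change the result, because a constant added to every logit cancels between numerator and denominator. It does prevent overflow in the exponential. The following sketch, assuming NumPy, compares a naive computation with the shifted one on deliberately large logits; `softmax_stable` is an illustrative name.

```python
import numpy as np

def softmax_stable(z: np.ndarray) -> np.ndarray:
    """Softmax of a 1-D vector, shifted by its maximum before exponentiating."""
    exp_shifted = np.exp(z - np.max(z))
    return exp_shifted / exp_shifted.sum()

small = np.array([2.0, 1.0, 0.1])
large = small + 1000.0   # same logits shifted by a constant

# Naive version: exp(1000) overflows to inf in float64, and inf / inf is nan
with np.errstate(over="ignore", invalid="ignore"):
    naive = np.exp(large) / np.exp(large).sum()

print(np.isnan(naive).all())
print(np.allclose(softmax_stable(large), softmax_stable(small)))
```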

In [3]:
values = np.array([2.0,1.0,0.1])

In [4]:
# Applying softmax to simulate neural network's output layer for class probabilities
probabilities = Softmax(values)
print(f"Probabilities:\n{probabilities}")

Probabilities:
[0.65900114 0.24243297 0.09856589]


In [5]:
def PredictClass(probabilities:np.ndarray)->int:
  """
  Predict the class with the highest probability
  """
  return int(np.argmax(probabilities))+1 # Adding 1 to match the class labels 1, 2, ..., K

In [6]:
predicted = PredictClass(probabilities)
print(f"Predicted Class:\n{predicted}")

Predicted Class:
1


In [7]:
def CrossEntropyLoss(groundTruth:np.ndarray,probabilities:np.ndarray)->float:
  """Cross-entropy between a one-hot label and the predicted probabilities"""
  return float(-np.sum(groundTruth*np.log(probabilities)))

In [8]:
# Example true label for class 1 in one-hot encoding
groundTruth = np.array([1,0,0]) # Class 1

In [9]:
loss = CrossEntropyLoss(groundTruth,probabilities)
print(f"Cross Entropy Loss:\n{loss}")

Cross Entropy Loss:
0.4170300162778335
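
As a check on this value: because the label is one-hot, only the term for the true class survives in the sum, so the loss reduces to the negative log of the probability assigned to class 1:

$$L = -\log[\lambda_1] = -\log[0.65900114] \approx 0.41703$$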


- **scenario**:

Suppose we have a dataset of images, and each image is represented by a set of features extracted from the image. For simplicity, let's assume these features are already extracted and represent color intensity, texture, and size. Based on these features, we want to classify each image into one of the three categories (apple, banana, orange).

For this example, let's assume we have the following feature vectors for three images, representing an apple, a banana, and an orange, respectively:

- Apple: [0.9, 0.1, 0.8] (High red intensity, low texture, medium size)
- Banana: [0.1, 0.3, 0.9] (Low red intensity, medium texture, large size)
- Orange: [0.8, 0.2, 0.7] (High red intensity, low texture, medium size)

In a real scenario, a neural network trained on our dataset would produce a set of logits for each class based on the input features. For this example, let's simulate the logits for our three images:

- Apple logits: [2.5, 0.5, 1.0]
- Banana logits: [1.0, 3.0, 0.5]
- Orange logits: [2.0, 1.0, 2.5]

In [10]:
appleLogit = np.array([2.5,0.5,1.0])
bananaLogit = np.array([1.0,3.0,0.5])
orangeLogit = np.array([2.0,1.0,2.5])

In [11]:
appleProbabilities = Softmax(appleLogit)
bananaProbabilities = Softmax(bananaLogit)
orangeProbabilities = Softmax(orangeLogit)
print(f"Apple Probabilities: {appleProbabilities}")
print(f"Banana Probabilities: {bananaProbabilities}")
print(f"Orange Probabilities: {orangeProbabilities}")

Apple Probabilities: [0.73612472 0.09962365 0.16425163]
Banana Probabilities: [0.11116562 0.82140902 0.06742536]
Orange Probabilities: [0.33149896 0.12195165 0.54654939]


In [12]:
def PredictionClass(probabilities:np.ndarray)->str:
  classes = ["Apple","Banana","Orange"]
  return classes[np.argmax(probabilities)]

In [13]:
applePredicted = PredictionClass(appleProbabilities)
bananaPredicted = PredictionClass(bananaProbabilities)
orangePredicted = PredictionClass(orangeProbabilities)
print(f"Predicted Class for Apple: {applePredicted}")
print(f"Predicted Class for Banana: {bananaPredicted}")
print(f"Predicted Class for Orange: {orangePredicted}")

Predicted Class for Apple: Apple
Predicted Class for Banana: Banana
Predicted Class for Orange: Orange
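
The three images can also be processed as one batch instead of one vector at a time. The sketch below assumes NumPy and stacks the simulated logits into a matrix with one row per image; the variable names are illustrative. The important detail is that softmax and argmax act along the class axis (`axis=1`), so each row is normalized independently.

```python
import numpy as np

classes = ["Apple", "Banana", "Orange"]

# One row of simulated logits per image, in the order apple, banana, orange
logits = np.array([[2.5, 0.5, 1.0],
                   [1.0, 3.0, 0.5],
                   [2.0, 1.0, 2.5]])

# Row-wise softmax: normalize over the class axis, not over the images
exp_shifted = np.exp(logits - logits.max(axis=1, keepdims=True))
probabilities = exp_shifted / exp_shifted.sum(axis=1, keepdims=True)

# Row-wise argmax gives one predicted class index per image
predicted = [classes[i] for i in probabilities.argmax(axis=1)]
print(predicted)
```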


- **scenario**:

Imagine a customer service system designed to automatically categorize incoming text messages to streamline the response process. We'll simulate a simplified version of this task, focusing on the core aspects of preprocessing text data, vectorizing the text, and applying a classification model.

In [16]:
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.preprocessing import LabelEncoder
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

In [15]:
data = [
    ("What are your opening hours?","Question"),
    ("Can you provide me with your location?","Question"),
    ("Please cancel my subscription.","Request"),
    ("I need information about my order.","Request"),
    ("Tell me more about your products.","Information"),
    ("How do I create an account?","Question"),
    ("What types of payment do you accept?","Question"),
    ("I'd like to know more about the refund policy.","Information"),
    ("Please update my contact details.","Request"),
    ("Can I change my delivery address?","Request")
]

In [17]:
messages,labels = zip(*data)

In [18]:
vectorizer = TfidfVectorizer(stop_words="english",max_features=100)

In [19]:
xData = vectorizer.fit_transform(messages)

In [22]:
print(f"Shape of data: {xData.shape}")

Shape of data: (10, 26)


In [23]:
labelEngine = LabelEncoder()

In [24]:
yValues = labelEngine.fit_transform(labels)

In [27]:
print(f"Shape of labels: {yValues.shape}")
print(f"Labels:\n{set(yValues)}")

Shape of labels: (10,)
Labels:
{0, 1, 2}


In [28]:
xTrain,xTest,yTrain,yTest = train_test_split(xData,yValues,test_size=0.2,random_state=45)

In [29]:
model = LogisticRegression()

In [30]:
model.fit(xTrain,yTrain)

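- This split is for illustration only: with ten messages and `test_size=0.2`, the test set contains just two examples.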

In [31]:
predictions = model.predict(xTest)

In [37]:
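# Manual accuracy: fraction of correct predictions on the test set, as a percentage
accuracyManual = sum(1 for true,predicted in zip(yTest,predictions) if true == predicted)/len(yTest)*100
print(f"Accuracy-Manual: {int(accuracyManual)}%")
# accuracy_score expects the true labels first, then the predictions
accuracy = accuracy_score(yTest,predictions)
print(f"Accuracy: {accuracy}")

Accuracy-Manual: 0%
Accuracy: 0.0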
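
An accuracy measured on two test messages says very little about the model. One way to use the small dataset more fully is cross-validation. The sketch below is one possible setup and not part of the original notebook: it assumes scikit-learn's `make_pipeline`, `StratifiedKFold`, and `cross_val_score`, and uses two folds because the smallest class ("Information") has only two examples. Placing the vectorizer inside the pipeline means each fold builds its vocabulary from its own training messages only.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline

data = [
    ("What are your opening hours?", "Question"),
    ("Can you provide me with your location?", "Question"),
    ("Please cancel my subscription.", "Request"),
    ("I need information about my order.", "Request"),
    ("Tell me more about your products.", "Information"),
    ("How do I create an account?", "Question"),
    ("What types of payment do you accept?", "Question"),
    ("I'd like to know more about the refund policy.", "Information"),
    ("Please update my contact details.", "Request"),
    ("Can I change my delivery address?", "Request"),
]
messages, labels = map(list, zip(*data))

# Vectorizer and classifier are refit together on each training fold
pipeline = make_pipeline(TfidfVectorizer(stop_words="english"), LogisticRegression())

# Two stratified folds keep one "Information" example in each fold
cv = StratifiedKFold(n_splits=2, shuffle=True, random_state=0)
scores = cross_val_score(pipeline, messages, labels, cv=cv, scoring="accuracy")

print(f"Fold accuracies: {scores}")
print(f"Mean accuracy: {scores.mean():.2f}")
```

With so few messages the scores will still vary considerably between splits, so they are best read as a rough indication only.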


In [38]:
def ClassifyMessage(model,vectorizer,labelEncoder,message)->str:
  """
  Vectorize a single message, predict its class, and return the class name
  """
  vectorized = vectorizer.transform([message])
  prediction = model.predict(vectorized)
  return labelEncoder.inverse_transform(prediction)[0]

In [40]:
message = "How can I track my shipment?"
prediction = ClassifyMessage(model,vectorizer,labelEngine,message)
print(f"Prediction: {prediction} for --> {message}")

Prediction: Question for --> How can I track my shipment?
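
To connect this prediction back to the categorical distribution, the class probabilities behind it can be inspected with `predict_proba`. The sketch below is self-contained and makes some assumptions that differ from the cells above: it fits on all ten messages instead of the training split, and passes the string labels directly to `LogisticRegression`. In recent scikit-learn versions the default solver fits a multinomial (softmax) model for multiclass targets; older versions may use a one-vs-rest scheme instead.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

data = [
    ("What are your opening hours?", "Question"),
    ("Can you provide me with your location?", "Question"),
    ("Please cancel my subscription.", "Request"),
    ("I need information about my order.", "Request"),
    ("Tell me more about your products.", "Information"),
    ("How do I create an account?", "Question"),
    ("What types of payment do you accept?", "Question"),
    ("I'd like to know more about the refund policy.", "Information"),
    ("Please update my contact details.", "Request"),
    ("Can I change my delivery address?", "Request"),
]
messages, labels = zip(*data)

vectorizer = TfidfVectorizer(stop_words="english")
model = LogisticRegression().fit(vectorizer.fit_transform(messages), labels)

message = "How can I track my shipment?"
vectorized = vectorizer.transform([message])

# One probability per class, in the order given by model.classes_
probabilities = model.predict_proba(vectorized)[0]
for name, probability in zip(model.classes_, probabilities):
    print(f"{name}: {probability:.3f}")

# Number of non-zero features the message shares with the training vocabulary
print(f"Non-zero features: {vectorized.nnz}")
```

For this particular message, the content words "track" and "shipment" do not occur in the training messages, so the feature vector should be empty and the probabilities reflect only the learned class offsets. A confident-looking label can therefore rest on very little evidence.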
