# Multiclass Classification Models

## Objectives:

1. **Build a Multiclass Classification Model with Linear Decision Boundaries:**
    - Import necessary libraries.
    - Read the dataset from the `waveform` folder, view it, and check for missing values.
    - Define functions for splitting the dataset and standardizing it using Scikit-Learn.
    - Convert the dataset into NumPy arrays and split it.
    - Train linear and non-linear/polynomial multinomial logistic regression models using Scikit-Learn.
    - Evaluate each model on training, cross-validation, and testing datasets.
    - Determine the optimal model.
    - Build a custom model based on the resulting optimal model.
    - Train and evaluate the custom model and compare results with the Scikit-Learn model.

2. **Build a Multiclass Classification Model with Non-linear Decision Boundaries:**
    - Read the dataset from the `optdigits` folder, view it, and check for missing values.
    - Convert the dataset into NumPy arrays and split it.
    - Train and evaluate linear and non-linear/polynomial multinomial logistic regression models using Scikit-Learn.
    - Determine the optimal model.
    - Based on the resulting optimal model, build a custom model.
    - Train and evaluate the custom model and compare results with the Scikit-Learn model.

## 1. Multiclass Classification Model with Linear Decision Boundaries

In [1]:
# Import necessary libraries
import numpy as np
import plotly.express as px

##### This project uses data from the [`UCI Machine Learning Repository`](https://archive.ics.uci.edu/dataset/107/waveform+database+generator+version+1). The dataset is licensed under a [`Creative Commons Attribution 4.0 International (CC BY 4.0) license`](https://creativecommons.org/licenses/by/4.0/legalcode). Variable names were added to the `waveform.data` file, and the dataset was converted into NumPy arrays for building a machine learning model.

In [2]:
import pandas as pd

data = pd.read_csv("waveform/waveform.data")
data

Unnamed: 0,Attribute1,Attribute2,Attribute3,Attribute4,Attribute5,Attribute6,Attribute7,Attribute8,Attribute9,Attribute10,...,Attribute13,Attribute14,Attribute15,Attribute16,Attribute17,Attribute18,Attribute19,Attribute20,Attribute21,class
0,-1.23,-1.56,-1.75,-0.28,0.60,2.22,0.85,0.21,-0.20,0.89,...,2.89,7.75,4.59,3.15,5.12,3.32,1.20,0.24,-0.56,2
1,-0.69,2.43,0.61,2.08,2.30,3.25,5.52,4.55,2.97,2.22,...,1.24,1.89,1.88,-1.34,0.83,1.41,1.78,0.60,2.42,1
2,-0.12,-0.94,1.29,2.59,2.42,3.55,4.94,3.25,1.90,2.07,...,2.50,0.12,1.41,2.78,0.64,0.62,-0.01,-0.79,-0.12,0
3,0.86,0.29,2.19,-0.02,1.13,2.51,2.37,5.45,5.45,4.84,...,2.58,1.40,1.24,1.41,1.07,-1.43,2.84,-1.18,1.12,1
4,1.16,0.37,0.40,-0.59,2.66,1.00,2.69,4.06,5.34,3.53,...,4.30,1.84,1.73,0.21,-0.18,0.13,-0.21,-0.80,-0.68,1
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
4995,-0.65,0.69,2.29,-0.16,0.51,0.26,2.34,-0.42,0.49,0.31,...,3.46,4.81,5.49,5.19,3.10,3.86,2.96,1.09,-1.42,0
4996,-0.02,0.67,2.34,3.18,2.16,4.77,6.70,4.54,4.92,3.39,...,1.53,2.52,1.14,-1.56,-1.18,-0.56,0.02,-1.05,-0.18,1
4997,0.01,-1.99,0.16,2.30,-0.53,1.93,3.61,3.00,4.61,5.73,...,3.14,3.04,1.61,0.60,-0.52,0.62,1.00,1.21,-0.27,1
4998,-0.40,0.41,-0.48,1.04,0.79,-0.66,1.18,0.52,2.20,0.59,...,3.64,3.62,5.97,2.63,3.83,1.72,2.08,1.31,1.37,0


In [3]:
# Check for any missing values in the dataset
print(f"\nMissing values in dataset: \n{data.isna().any().values}")


Missing values in dataset: 
[False False False False False False False False False False False False
 False False False False False False False False False False]


We have a clean dataset with **5,000** examples, **21** feature variables, and **1** target variable. Our goal is to build a multiclass classification model with linear decision boundaries. To check that linear decision boundaries are suitable for this dataset, we will first use scikit-learn to build and evaluate both linear and non-linear/polynomial multiclass classification models. After identifying the optimal scikit-learn model, we will develop a custom model based on these findings and compare its results with the optimal scikit-learn model. This approach saves time by determining the best model type before we build the custom model.

In [4]:
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler


def split_dataset(X, y):
    # Splitting the data into training (60%), cross-validation (20%), and testing (20%) sets
    X_train, X_temp, y_train, y_temp = train_test_split(X, y, test_size=0.4, random_state=42, stratify=y)
    X_val, X_test, y_val, y_test = train_test_split(X_temp, y_temp, test_size=0.5, random_state=42, stratify=y_temp)

    return X_train, y_train, X_val, y_val, X_test, y_test


def standardize_dataset(X_train, X_val, X_test):
    # Standardizing the datasets
    scaler = StandardScaler()

    # Fitting the scaler on the training data and transforming training, validation, and testing sets
    X_train_scaled = scaler.fit_transform(X_train)
    X_val_scaled = scaler.transform(X_val)
    X_test_scaled = scaler.transform(X_test)

    return scaler, X_train_scaled, X_val_scaled, X_test_scaled

In [5]:
# Convert the data into numpy arrays of features (X) and target (y)
X = data.iloc[:, :-1].values
y = data.iloc[:, -1].values

# Splitting the dataset into training, validation, and test sets
X_train, y_train, X_val, y_val, X_test, y_test = split_dataset(X, y)

# Printing the shapes of the resulting datasets
print(f"Training set: {X_train.shape}, {y_train.shape}")
print(f"Validation set: {X_val.shape}, {y_val.shape}")
print(f"Test set: {X_test.shape}, {y_test.shape}")

Training set: (3000, 21), (3000,)
Validation set: (1000, 21), (1000,)
Test set: (1000, 21), (1000,)
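The 60/20/20 proportions follow from the two-stage split: 40% is held out first, and that hold-out is then halved. The arithmetic can be sketched as below. The helper `split_sizes` and its parameter names are hypothetical, and it only rounds to the nearest integer; it does not reproduce scikit-learn's exact rounding or its stratification.

```python
def split_sizes(n_samples, temp_fraction=0.4, test_fraction_of_temp=0.5):
    """Return (train, validation, test) sizes for a two-stage split.

    Simplified arithmetic only: rounds to the nearest integer and ignores
    stratification, so it is an approximation of what the library does.
    """
    n_temp = round(n_samples * temp_fraction)        # held out in stage 1
    n_test = round(n_temp * test_fraction_of_temp)   # stage 2 halves the hold-out
    n_val = n_temp - n_test
    n_train = n_samples - n_temp
    return n_train, n_val, n_test


print(split_sizes(5000))  # sizes for a 5,000-row dataset
```

For 5,000 rows this gives the same sizes as the shapes printed above.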


In [6]:
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score


def train_eval_multinomial_logistic_regression(X_train, y_train, X_val, y_val, X_test, y_test, degree=1):
    # Initialize multinomial logistic regression model for multiclass classification (linear/polynomial)
    model = make_pipeline(
        PolynomialFeatures(degree=degree),
        StandardScaler(),
        LogisticRegression(random_state=42, max_iter=1000, multi_class="multinomial", solver="lbfgs")
    )
    
    # Train the model
    model.fit(X_train, y_train)
    
    # Evaluate on training set
    train_acc, train_prec, train_rec, train_f1 = evaluate_model(model, X_train, y_train)
    
    # Evaluate on validation set
    val_acc, val_prec, val_rec, val_f1 = evaluate_model(model, X_val, y_val)

    # Evaluate on testing set
    test_acc, test_prec, test_rec, test_f1 = evaluate_model(model, X_test, y_test)

    if degree == 1:
        print("Linear Multinomial Logistic Regression:")
    else:
        print(f"\nPolynomial Degree {degree} Multinomial Logistic Regression:")
    
    print(f"Training set - Accuracy: {train_acc:.4f}, Precision: {train_prec:.4f}, Recall: {train_rec:.4f}, "
            f"F1-score: {train_f1:.4f}")

    print(f"Validation set - Accuracy: {val_acc:.4f}, Precision: {val_prec:.4f}, Recall: {val_rec:.4f}, "
            f"F1-score: {val_f1:.4f}")
    
    print(f"Test set - Accuracy: {test_acc:.4f}, Precision: {test_prec:.4f}, Recall: {test_rec:.4f}, "
            f"F1-score: {test_f1:.4f}")
    
    return model


def evaluate_model(model, X, y):
    # Predict the target values using the provided model and features
    y_pred = model.predict(X)
    
    # Calculate the accuracy of the model
    accuracy = accuracy_score(y, y_pred)
    # Calculate the precision of the model
    precision = precision_score(y, y_pred, average="weighted")
    # Calculate the recall of the model
    recall = recall_score(y, y_pred, average="weighted")
    # Calculate the F1 score of the model
    f1 = f1_score(y, y_pred, average="weighted")
    
    return accuracy, precision, recall, f1


# Train and evaluate linear multinomial logistic regression model for multiclass classification
linear_mlr_model = train_eval_multinomial_logistic_regression(X_train, y_train, X_val, y_val, X_test, y_test)

# Train and evaluate polynomial (degree 2) multinomial logistic regression model for multiclass classification
poly2_mlr_model = train_eval_multinomial_logistic_regression(X_train, y_train, X_val, y_val, X_test, y_test,
                                                             degree=2)

# Train and evaluate polynomial (degree 3) multinomial logistic regression model for multiclass classification
poly3_mlr_model = train_eval_multinomial_logistic_regression(X_train, y_train, X_val, y_val, X_test, y_test,
                                                             degree=3)

Linear Multinomial Logistic Regression:
Training set - Accuracy: 0.8727, Precision: 0.8725, Recall: 0.8727, F1-score: 0.8724
Validation set - Accuracy: 0.8670, Precision: 0.8679, Recall: 0.8670, F1-score: 0.8668
Test set - Accuracy: 0.8860, Precision: 0.8862, Recall: 0.8860, F1-score: 0.8857

Polynomial Degree 2 Multinomial Logistic Regression:
Training set - Accuracy: 0.9267, Precision: 0.9265, Recall: 0.9267, F1-score: 0.9265
Validation set - Accuracy: 0.8440, Precision: 0.8442, Recall: 0.8440, F1-score: 0.8437
Test set - Accuracy: 0.8500, Precision: 0.8500, Recall: 0.8500, F1-score: 0.8499

Polynomial Degree 3 Multinomial Logistic Regression:
Training set - Accuracy: 1.0000, Precision: 1.0000, Recall: 1.0000, F1-score: 1.0000
Validation set - Accuracy: 0.8120, Precision: 0.8119, Recall: 0.8120, F1-score: 0.8119
Test set - Accuracy: 0.8300, Precision: 0.8301, Recall: 0.8300, F1-score: 0.8299
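A note on reading these metrics: in every row the recall equals the accuracy. This is expected with `average="weighted"`, because weighting each class's recall by its support and summing gives the overall fraction of correct predictions. The sketch below illustrates this with plain Python on made-up labels; the helper names `accuracy` and `weighted_recall` are hypothetical and are not the scikit-learn functions.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def weighted_recall(y_true, y_pred):
    """Per-class recall averaged with weights equal to each class's support."""
    support = Counter(y_true)
    correct = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    total = len(y_true)
    return sum((support[c] / total) * (correct[c] / support[c]) for c in support)


# Toy labels (made up for illustration), three classes with unequal support
y_true = [0, 0, 0, 0, 1, 1, 1, 2, 2, 2]
y_pred = [0, 0, 1, 0, 1, 2, 1, 2, 2, 0]

print(accuracy(y_true, y_pred), weighted_recall(y_true, y_pred))
```

The two printed values agree up to floating-point rounding, so weighted recall carries no information beyond accuracy; precision and F1-score are the columns that can differ.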


Based on the above evaluations, the **Linear Multinomial Logistic Regression** model shows the best balance between bias and variance, with validation and test set scores indicating better generalization: **validation accuracy** of **0.8670** and **test accuracy** of **0.8860**. The **Polynomial Degree 2** model, while achieving a high **training accuracy** of **0.9267**, performs worse on validation (**0.8440**) and test sets (**0.8500**), suggesting overfitting. The **Polynomial Degree 3** model significantly overfits, with a perfect **training accuracy** of **1.0000** but lower validation (**0.8120**) and test accuracy (**0.8300**). Hence, the **Linear Multinomial Logistic Regression** model is the optimal choice for this dataset. Now, we will build our **Custom Linear Multinomial Logistic Regression** model and compare the results with the scikit-learn optimal model.
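As a rough check on the overfitting pattern described above, it helps to count how many polynomial terms each degree produces relative to the 3,000 training examples. The sketch below assumes the counting used by a full polynomial expansion with a constant term and all interaction terms (the default behavior of `PolynomialFeatures`); the helper name `n_polynomial_features` is hypothetical.

```python
from math import comb


def n_polynomial_features(n_inputs, degree):
    """Count monomials of total degree <= `degree` in `n_inputs` variables,
    including the constant (bias) term."""
    return comb(n_inputs + degree, degree)


n_inputs = 21    # waveform feature variables
n_train = 3000   # training examples after the split

for degree in (1, 2, 3):
    n_terms = n_polynomial_features(n_inputs, degree)
    print(f"degree {degree}: {n_terms} terms, "
          f"{n_train / n_terms:.1f} training examples per term")
```

The counts are 22, 253, and 2,024 terms for degrees 1, 2, and 3. At degree 3 there are fewer than two training examples per term (and each term has one weight per class), which is consistent with the model fitting the training set perfectly while losing accuracy on held-out data.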

The multinomial logistic regression (softmax regression) model can be represented as:

$$
\begin{equation}
\hat{\mathbf{Y}} = \sigma(\mathbf{X} \cdot \mathbf{W} + \mathbf{b}) \tag{1}
\end{equation}
$$

where:
- $\hat{\mathbf{Y}}$ represents the predicted probabilities matrix,
- $\sigma(\mathbf{z})_j = \frac{e^{z_j}}{\sum_{k=1}^{K} e^{z_k}}$ is the softmax function that converts the output logits into probabilities,
- $j = 1, 2, \ldots, K$,
- $K$ is the number of classes,
- $\mathbf{X}$ represents the features matrix,
- $\mathbf{W}$ represents the weight matrix,
- $\mathbf{b}$ represents the bias vector.
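
To make the softmax definition concrete, here is a minimal sketch for a single row of logits in plain Python. The logit values are illustrative and not taken from the dataset. The sketch also subtracts the largest logit before exponentiating, a common stability trick that cancels in the ratio and therefore does not change the result.

```python
import math


def softmax(logits):
    """Softmax of one row of logits, shifted by the max for numerical stability."""
    shift = max(logits)
    exps = [math.exp(z - shift) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


probs = softmax([2.0, 1.0, 0.1])       # illustrative logits
print([round(p, 4) for p in probs])    # largest logit gets the largest probability
print(sum(probs))                      # probabilities sum to 1 (up to rounding)

# Adding a constant to every logit leaves the output unchanged,
# so very large logits do not overflow:
print(softmax([1002.0, 1001.0, 1000.1]))
```

Without the shift, `math.exp(1002.0)` would raise an `OverflowError`; with it, the second call returns the same probabilities as the first up to rounding.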

To train the multinomial logistic regression model, we use the cross-entropy loss as the loss function:

$$
\begin{equation}
\text{Cross-Entropy Loss} = -\frac{1}{m} \sum_{i=1}^{m} \sum_{k=1}^{K} y_{i,k} \log(\hat{y}_{i,k}) \tag{2}
\end{equation}
$$

where:
- $m$ is the number of data points,
- $y_{i,k}$ is the actual binary indicator (0 or 1) if the class label of the $i$-th instance is $k$,
- $\hat{y}_{i,k}$ is the predicted probability that the $i$-th instance belongs to class $k$.
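
A small worked example of equation (2), written in plain Python under the assumption that labels are integers in $0, \ldots, K-1$. Because $y_{i,k}$ is one-hot, only the true-class term of the inner sum survives. The labels and probabilities below are made up for illustration.

```python
import math


def cross_entropy(y_true, y_prob):
    """Mean cross-entropy for integer labels `y_true` and per-row probabilities `y_prob`.

    With one-hot targets the inner sum over classes reduces to
    -log(probability assigned to the true class).
    """
    m = len(y_true)
    return -sum(math.log(y_prob[i][y_true[i]]) for i in range(m)) / m


labels = [0, 2, 1, 1]                     # toy labels, K = 3 classes
uniform = [[1 / 3] * 3 for _ in labels]   # what a zero-initialized model predicts
confident = [[0.9, 0.05, 0.05], [0.1, 0.1, 0.8], [0.2, 0.7, 0.1], [0.05, 0.9, 0.05]]

print(cross_entropy(labels, uniform))     # approximately ln(3)
print(math.log(3))
print(cross_entropy(labels, confident))   # lower: true classes get high probability
```

This also gives a useful sanity check for training: a model whose weights and biases start at zero predicts uniform probabilities, so its first reported cost should be close to $\ln K$, about 1.0986 for $K = 3$.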

To update the parameters $\mathbf{W}$ and $\mathbf{b}$, we use gradient descent:

$$
\begin{equation}
\mathbf{W}_{\text{new}} = \mathbf{W}_{\text{old}} - \alpha \times \frac{\partial \text{Cross-Entropy Loss}}{\partial \mathbf{W}} \tag{3}
\end{equation}
$$

$$
\begin{equation}
\mathbf{b}_{\text{new}} = \mathbf{b}_{\text{old}} - \alpha \times \frac{\partial \text{Cross-Entropy Loss}}{\partial \mathbf{b}} \tag{4}
\end{equation}
$$

where:
- $\mathbf{W}_{\text{new}}$ and $\mathbf{W}_{\text{old}}$ are the updated and current weight matrices, respectively,
- $\mathbf{b}_{\text{new}}$ and $\mathbf{b}_{\text{old}}$ are the updated and current bias vectors, respectively,
- $\alpha$ is the learning rate,
- $\frac{\partial \text{Cross-Entropy Loss}}{\partial \mathbf{W}}$ is the gradient of the Cross-Entropy Loss function with respect to the weight matrix,
- $\frac{\partial \text{Cross-Entropy Loss}}{\partial \mathbf{b}}$ is the gradient of the Cross-Entropy Loss function with respect to the bias vector.

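The gradients are computed entry by entry, for each weight $W_{j,k}$ (feature $j$, class $k$) and each bias $b_k$:

$$
\begin{equation}
\frac{\partial \text{Cross-Entropy Loss}}{\partial W_{j,k}} = \frac{1}{m} \sum_{i=1}^{m} (\hat{y}_{i,k} - y_{i,k}) \, x_{i,j} \tag{5}
\end{equation}
$$

$$
\begin{equation}
\frac{\partial \text{Cross-Entropy Loss}}{\partial b_k} = \frac{1}{m} \sum_{i=1}^{m} (\hat{y}_{i,k} - y_{i,k}) \tag{6}
\end{equation}
$$

In matrix form these are $\frac{1}{m} \mathbf{X}^\top (\hat{\mathbf{Y}} - \mathbf{Y})$ and the column-wise mean of $\hat{\mathbf{Y}} - \mathbf{Y}$, where $\mathbf{Y}$ is the one-hot label matrix.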
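
One way to gain confidence in equations (5) and (6) before relying on them is a finite-difference gradient check. The sketch below is a minimal version in plain Python on a tiny made-up problem; all helper names and data values are hypothetical, and the step size `h` is an assumption that works for values of this scale.

```python
import math


def softmax(row):
    shift = max(row)  # stability shift; does not change the result
    exps = [math.exp(z - shift) for z in row]
    total = sum(exps)
    return [e / total for e in exps]


def predict_proba(X, W, b):
    """Row-wise softmax(X @ W + b) written with plain lists."""
    n, K = len(W), len(b)
    return [softmax([sum(x[j] * W[j][k] for j in range(n)) + b[k] for k in range(K)])
            for x in X]


def loss(X, y, W, b):
    """Cross-entropy loss of equation (2) for integer labels y."""
    P = predict_proba(X, W, b)
    return -sum(math.log(P[i][y[i]]) for i in range(len(y))) / len(y)


def analytic_gradients(X, y, W, b):
    """Equations (5) and (6), computed entry by entry."""
    m, n, K = len(X), len(W), len(b)
    P = predict_proba(X, W, b)
    # Residual (y_hat - y), with y one-hot encoded on the fly
    R = [[P[i][k] - (1.0 if y[i] == k else 0.0) for k in range(K)] for i in range(m)]
    dW = [[sum(R[i][k] * X[i][j] for i in range(m)) / m for k in range(K)]
          for j in range(n)]
    db = [sum(R[i][k] for i in range(m)) / m for k in range(K)]
    return dW, db


def numeric_derivative(params, index, X, y, W, b, h=1e-6):
    """Central finite difference of the loss with respect to params[index].

    `params` must be `b` or one row of `W`, so the in-place change is seen by loss().
    """
    original = params[index]
    params[index] = original + h
    up = loss(X, y, W, b)
    params[index] = original - h
    down = loss(X, y, W, b)
    params[index] = original
    return (up - down) / (2 * h)


# Toy data (made up): 4 samples, 2 features, 3 classes
X = [[0.5, -1.2], [1.5, 0.3], [-0.7, 0.8], [0.1, 2.0]]
y = [0, 1, 2, 1]
W = [[0.1, -0.2, 0.05], [0.3, 0.0, -0.1]]
b = [0.0, 0.1, -0.1]

dW, db = analytic_gradients(X, y, W, b)

max_diff = 0.0
for j in range(len(W)):
    for k in range(len(b)):
        max_diff = max(max_diff, abs(numeric_derivative(W[j], k, X, y, W, b) - dW[j][k]))
for k in range(len(b)):
    max_diff = max(max_diff, abs(numeric_derivative(b, k, X, y, W, b) - db[k]))

print(f"largest |numeric - analytic| difference: {max_diff:.2e}")
```

A very small difference indicates that the analytic formulas agree with the slope of the loss. The same idea carries over to a vectorized implementation, where it can catch sign or transposition mistakes.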

In [34]:
class MultinomialLogisticRegression:
    def __init__(self, learning_rate=0.01, num_iterations=1000):
        self.learning_rate = learning_rate
        self.num_iterations = num_iterations
        self.weights = None
        self.biases = None
        self.cost_history = []


    def initialize_parameters(self, n_features, k_classes):
        # Initialize weights as zero matrix of shape (n_features, k_classes)
        self.weights = np.zeros((n_features, k_classes))
        
        # Initialize biases as a zero vector of shape (k_classes,)
        self.biases = np.zeros(k_classes)

    
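    def softmax(self, Z):
        # Subtract the row-wise maximum before exponentiating to avoid overflow;
        # the shift cancels in the ratio, so the output is unchanged
        exp_Z = np.exp(Z - np.max(Z, axis=1, keepdims=True))
        return exp_Z / np.sum(exp_Z, axis=1, keepdims=True)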


    def compute_cost(self, Y_hat, Y_one_hot):
        # Get the number of samples
        m = Y_one_hot.shape[0]

        # Small epsilon value to prevent log(0)
        epsilon = 1e-10

        # Compute the cost using cross-entropy loss function
        cost = - (1 / m) * np.sum(Y_one_hot * np.log(Y_hat + epsilon))
        
        return cost 
   

    def fit(self, X, y):
        # Get the shape of X
        m, n = X.shape

        # Get the total number of unique classes in y
        k = len(np.unique(y))

        # One-hot encoding for vectorized implementation
        Y_one_hot = np.eye(k)[y]
        
        # Initialize parameters weights and biases
        self.initialize_parameters(n, k)

        # Run the gradient descent loop
        for i in range(self.num_iterations):
            # Forward propagation
            Z = np.matmul(X, self.weights) + self.biases
            Y_hat = self.softmax(Z)

            # Compute cost
            cost = self.compute_cost(Y_hat, Y_one_hot)
            print(f"Iteration {i + 1}/{self.num_iterations}: Cost {cost}")

            # Save cost at every 100 iterations
            if (i + 1) % 100 == 0:
                self.cost_history.append((i+1, cost))

            # Compute gradients
            dW = (1 / m) * np.matmul(X.T, (Y_hat - Y_one_hot))
            db = (1 / m) * np.sum((Y_hat - Y_one_hot), axis=0)

            # Update parameters
            self.weights -= self.learning_rate * dW
            self.biases -= self.learning_rate * db

    
    def predict(self, X):
        # Compute the linear combination of input features and weights, plus biases
        # Note: X can be polynomial features
        Z = np.matmul(X, self.weights) + self.biases
        
        # Apply the softmax function to the linear combination
        Y_hat = self.softmax(Z)
        
        # Return the index of the maximum probability in each row of Y_hat
        return np.argmax(Y_hat, axis=1)
    

    def evaluate_model(self, X, y):
        # Predict the target values using the provided features
        y_pred = self.predict(X)
        
        # Calculate the accuracy of the model
        accuracy = accuracy_score(y, y_pred)
        # Calculate the precision of the model
        precision = precision_score(y, y_pred, average="weighted")
        # Calculate the recall of the model
        recall = recall_score(y, y_pred, average="weighted")
        # Calculate the F1 score of the model
        f1 = f1_score(y, y_pred, average="weighted")

        return accuracy, precision, recall, f1

In [8]:
# Standardize the dataset
scaler, X_train_scaled, X_val_scaled, X_test_scaled = standardize_dataset(X_train, X_val, X_test)

In [35]:
# Train custom linear multinomial logistic regression model
custom_linear_mlr_model = MultinomialLogisticRegression(learning_rate=0.7)
custom_linear_mlr_model.fit(X_train_scaled, y_train)

Iteration 1/1000: Cost 1.0986122883681098
Iteration 2/1000: Cost 0.5682868017509506
Iteration 3/1000: Cost 0.46041648364230336
Iteration 4/1000: Cost 0.425842482563487
Iteration 5/1000: Cost 0.40417337730229
Iteration 6/1000: Cost 0.38903024415488197
Iteration 7/1000: Cost 0.3778006873832341
Iteration 8/1000: Cost 0.36912311243647794
Iteration 9/1000: Cost 0.36221055450224016
Iteration 10/1000: Cost 0.35657361431550316
Iteration 11/1000: Cost 0.3518906309067311
Iteration 12/1000: Cost 0.34794081475793637
Iteration 13/1000: Cost 0.3445672184074419
Iteration 14/1000: Cost 0.3416550121294961
Iteration 15/1000: Cost 0.3391181243411838
Iteration 16/1000: Cost 0.3368906974496641
Iteration 17/1000: Cost 0.3349214389634937
Iteration 18/1000: Cost 0.3331697788261672
Iteration 19/1000: Cost 0.33160318995863175
Iteration 20/1000: Cost 0.3301952789307131
Iteration 21/1000: Cost 0.3289243990696909
Iteration 22/1000: Cost 0.32777262570168597
Iteration 23/1000: Cost 0.32672498728680166
Iteration 24/1

In [36]:
# Convert cost history into numpy arrays
cost_hist = np.array(custom_linear_mlr_model.cost_history)

# Plotly Express line chart
fig = px.line(
    x=cost_hist[:, 0],
    y=cost_hist[:, 1],
    title="Iteration vs Cost",
    labels={"x": "Iteration", "y": "Cost"}
)

# Show the plot
fig.show()

# Saved as plot_1.png in the current directory/folder

The above plot shows the cost decreasing as the number of iterations increases and flattening out at approximately 400 iterations. This indicates that gradient descent is working as intended: the cost has converged, and the model parameters are close to those that minimize it.
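
Reading the flattening point off a plot is subjective. One possible way to make it explicit is a relative-improvement threshold on the recorded costs, sketched below. The helper `first_converged_iteration`, the tolerance `rel_tol`, and the example cost values are all hypothetical; the values are not the notebook's actual costs.

```python
def first_converged_iteration(cost_history, rel_tol=1e-3):
    """Return the first recorded iteration whose relative cost improvement
    over the previous record is at most `rel_tol`, or None if never reached.

    `cost_history` is a list of (iteration, cost) pairs, the same layout the
    custom model stores in its `cost_history` attribute.
    """
    for (_, prev_cost), (iteration, cost) in zip(cost_history, cost_history[1:]):
        if abs(prev_cost - cost) <= rel_tol * abs(prev_cost):
            return iteration
    return None


# Made-up cost values for illustration only
example_history = [(100, 0.3100), (200, 0.3050), (300, 0.3040), (400, 0.3039), (500, 0.3039)]
print(first_converged_iteration(example_history))
```

The answer depends on the chosen tolerance and on how often costs are recorded, so it should be treated as a rough guide rather than a precise convergence point.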

In [37]:
# Evaluate the custom linear multinomial logistic regression model
train_acc, train_prec, train_rec, train_f1 = custom_linear_mlr_model.evaluate_model(X_train_scaled, y_train)
val_acc, val_prec, val_rec, val_f1 = custom_linear_mlr_model.evaluate_model(X_val_scaled, y_val)
test_acc, test_prec, test_rec, test_f1 = custom_linear_mlr_model.evaluate_model(X_test_scaled, y_test)

print("Custom Linear Multinomial Logistic Regression:\n")

print(f"Training set - Accuracy: {train_acc:.4f}, Precision: {train_prec:.4f}, Recall: {train_rec:.4f}, "
        f"F1-score: {train_f1:.4f}")

print(f"Validation set - Accuracy: {val_acc:.4f}, Precision: {val_prec:.4f}, Recall: {val_rec:.4f}, "
        f"F1-score: {val_f1:.4f}")

print(f"Test set - Accuracy: {test_acc:.4f}, Precision: {test_prec:.4f}, Recall: {test_rec:.4f}, "
        f"F1-score: {test_f1:.4f}")

Custom Linear Multinomial Logistic Regression:

Training set - Accuracy: 0.8727, Precision: 0.8725, Recall: 0.8727, F1-score: 0.8724
Validation set - Accuracy: 0.8670, Precision: 0.8679, Recall: 0.8670, F1-score: 0.8668
Test set - Accuracy: 0.8860, Precision: 0.8862, Recall: 0.8860, F1-score: 0.8857


Scikit-Learn Linear Multinomial Logistic Regression: <br>

Training set - Accuracy: 0.8727, Precision: 0.8725, Recall: 0.8727, F1-score: 0.8724<br>
Validation set - Accuracy: 0.8670, Precision: 0.8679, Recall: 0.8670, F1-score: 0.8668<br>
Test set - Accuracy: 0.8860, Precision: 0.8862, Recall: 0.8860, F1-score: 0.8857

The Custom and Scikit-Learn implementations of the **Linear Multinomial Logistic Regression** model yield identical results across the training, validation, and test sets. This agreement indicates that the custom gradient-descent implementation is correct, and the close match between training, validation, and test scores shows that the model generalizes well to unseen data. With the goal of building a **Multiclass Classification Model with Linear Decision Boundaries** achieved, our next step is to develop a **Multiclass Classification Model with Non-Linear Decision Boundaries**.

## 2. Multiclass Classification Model with Non-Linear Decision Boundaries

##### This project also uses data from the [`UCI Machine Learning Repository`](https://archive.ics.uci.edu/dataset/80/optical+recognition+of+handwritten+digits). The dataset is licensed under a [`Creative Commons Attribution 4.0 International (CC BY 4.0) license`](https://creativecommons.org/licenses/by/4.0/legalcode). The data from `optdigits.tra` and `optdigits.tes` was merged into a single `optdigits.csv` file with variable names added, and the dataset was converted into NumPy arrays for building a machine learning model.

In [12]:
data_ = pd.read_csv("optdigits/optdigits.csv")
data_

Unnamed: 0,Attribute1,Attribute2,Attribute3,Attribute4,Attribute5,Attribute6,Attribute7,Attribute8,Attribute9,Attribute10,...,Attribute56,Attribute57,Attribute58,Attribute59,Attribute60,Attribute61,Attribute62,Attribute63,Attribute64,class
0,0,1,6,15,12,1,0,0,0,7,...,0,0,0,6,14,7,1,0,0,0
1,0,0,10,16,6,0,0,0,0,7,...,0,0,0,10,16,15,3,0,0,0
2,0,0,8,15,16,13,0,0,0,1,...,0,0,0,9,14,0,0,0,0,7
3,0,0,0,3,11,16,0,0,0,0,...,0,0,0,0,1,15,2,0,0,4
4,0,0,5,14,4,0,0,0,0,0,...,0,0,0,4,12,14,7,0,0,6
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
5615,0,0,4,10,13,6,0,0,0,1,...,0,0,0,2,14,15,9,0,0,9
5616,0,0,6,16,13,11,1,0,0,0,...,0,0,0,6,16,14,6,0,0,0
5617,0,0,1,11,15,1,0,0,0,0,...,0,0,0,2,9,13,6,0,0,8
5618,0,0,2,10,7,0,0,0,0,0,...,0,0,0,5,12,16,12,0,0,9


In [13]:
# Check for any missing values in the dataset
print(f"\nMissing values in dataset: \n{data_.isna().any().values}")


Missing values in dataset: 
[False False False False False False False False False False False False
 False False False False False False False False False False False False
 False False False False False False False False False False False False
 False False False False False False False False False False False False
 False False False False False False False False False False False False
 False False False False False]


We have a clean dataset with **5,620** examples, **64** feature variables, and **1** target variable. Our goal is to build a multiclass classification model with non-linear decision boundaries. To check that non-linear decision boundaries are suitable for this dataset, we will use the same approach as for the linear case: compare linear and polynomial scikit-learn models first, then build a custom model based on the best one.

In [14]:
# Convert the data_ into numpy arrays of features (X_) and target (y_)
X_ = data_.iloc[:, :-1].values
y_ = data_.iloc[:, -1].values

# Splitting the dataset into training, validation, and test sets
X_train_, y_train_, X_val_, y_val_, X_test_, y_test_ = split_dataset(X_, y_)

# Printing the shapes of the resulting datasets
print(f"Training set: {X_train_.shape}, {y_train_.shape}")
print(f"Validation set: {X_val_.shape}, {y_val_.shape}")
print(f"Test set: {X_test_.shape}, {y_test_.shape}")

Training set: (3372, 64), (3372,)
Validation set: (1124, 64), (1124,)
Test set: (1124, 64), (1124,)


In [15]:
# Train and evaluate linear multinomial logistic regression model for multiclass classification
linear_mlr_model_ = train_eval_multinomial_logistic_regression(X_train_, y_train_, X_val_, y_val_,
                                                               X_test_, y_test_)

# Train and evaluate polynomial (degree 2) multinomial logistic regression model for multiclass classification
poly2_mlr_model_ = train_eval_multinomial_logistic_regression(X_train_, y_train_, X_val_, y_val_,
                                                              X_test_, y_test_, degree=2)

# Train and evaluate polynomial (degree 3) multinomial logistic regression model for multiclass classification
poly3_mlr_model_ = train_eval_multinomial_logistic_regression(X_train_, y_train_, X_val_, y_val_,
                                                              X_test_, y_test_, degree=3)

Linear Multinomial Logistic Regression:
Training set - Accuracy: 0.9923, Precision: 0.9923, Recall: 0.9923, F1-score: 0.9923
Validation set - Accuracy: 0.9724, Precision: 0.9728, Recall: 0.9724, F1-score: 0.9725
Test set - Accuracy: 0.9680, Precision: 0.9684, Recall: 0.9680, F1-score: 0.9680

Polynomial Degree 2 Multinomial Logistic Regression:
Training set - Accuracy: 1.0000, Precision: 1.0000, Recall: 1.0000, F1-score: 1.0000
Validation set - Accuracy: 0.9831, Precision: 0.9833, Recall: 0.9831, F1-score: 0.9832
Test set - Accuracy: 0.9849, Precision: 0.9850, Recall: 0.9849, F1-score: 0.9849

Polynomial Degree 3 Multinomial Logistic Regression:
Training set - Accuracy: 1.0000, Precision: 1.0000, Recall: 1.0000, F1-score: 1.0000
Validation set - Accuracy: 0.9875, Precision: 0.9877, Recall: 0.9875, F1-score: 0.9876
Test set - Accuracy: 0.9858, Precision: 0.9859, Recall: 0.9858, F1-score: 0.9858


The comparison shows that the **Linear Multinomial Logistic Regression** model performs well (training accuracy **0.9923**, validation **0.9724**, test **0.9680**), but both polynomial models outperform it on the validation and test sets. The **Polynomial Degree 2 Multinomial Logistic Regression** model reaches perfect training accuracy (**1.0000**) with validation accuracy **0.9831** and test accuracy **0.9849**. The **Polynomial Degree 3 Multinomial Logistic Regression** model also reaches perfect training accuracy (**1.0000**) and scores marginally higher (validation **0.9875**, test **0.9858**), although its test accuracy falls slightly below its validation accuracy. The gain from degree 3 is small, roughly five more correct validation examples and one more correct test example out of 1,124, while it expands the 64 inputs into 47,905 polynomial terms compared with 2,145 for degree 2. With near-identical accuracy at a fraction of the feature count, the **Polynomial Degree 2** model is the more practical choice for this dataset.

In [16]:
# Apply the polynomial features to the dataset
poly = PolynomialFeatures(degree=2, include_bias=False)
X_train_poly = poly.fit_transform(X_train_)
X_val_poly = poly.transform(X_val_)
X_test_poly = poly.transform(X_test_)

In [17]:
# Standardize the dataset
scaler_, X_train_poly_scaled, X_val_poly_scaled, X_test_poly_scaled = standardize_dataset(X_train_poly, 
                                                                                          X_val_poly, X_test_poly)

In [43]:
# Train custom polynomial (degree 2) multinomial logistic regression model
custom_poly2_mlr_model = MultinomialLogisticRegression(learning_rate=1, num_iterations=200)
custom_poly2_mlr_model.fit(X_train_poly_scaled, y_train_)

Iteration 1/200: Cost 2.302585091994046
Iteration 2/200: Cost 1.260707638406088
Iteration 3/200: Cost 0.5906363871207801
Iteration 4/200: Cost 0.3850760328444319
Iteration 5/200: Cost 0.28646833689095524
Iteration 6/200: Cost 0.22936520103099922
Iteration 7/200: Cost 0.1902085829983312
Iteration 8/200: Cost 0.1642712611924324
Iteration 9/200: Cost 0.14429775044678267
Iteration 10/200: Cost 0.12818792572331209
Iteration 11/200: Cost 0.11490871978986932
Iteration 12/200: Cost 0.10414014842955388
Iteration 13/200: Cost 0.0949956607435425
Iteration 14/200: Cost 0.08694258248632855
Iteration 15/200: Cost 0.0797225418925461
Iteration 16/200: Cost 0.07322485193497326
Iteration 17/200: Cost 0.06736334610207807
Iteration 18/200: Cost 0.06207643111205437
Iteration 19/200: Cost 0.05729847333398635
Iteration 20/200: Cost 0.052964918398233314
Iteration 21/200: Cost 0.04902494121563997
Iteration 22/200: Cost 0.04544033252753974
Iteration 23/200: Cost 0.042177995619156236
Iteration 24/200: Cost 0.039

In [44]:
# Convert cost history into numpy arrays
cost_hist_ = np.array(custom_poly2_mlr_model.cost_history)

# Plotly Express line chart
fig = px.line(
    x=cost_hist_[:, 0],
    y=cost_hist_[:, 1],
    title="Iteration vs Cost",
    labels={"x": "Iteration", "y": "Cost"}
)

# Show the plot
fig.show()

# Saved as plot_2.png in the current directory/folder

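The above plot shows the cost decreasing as the number of iterations increases, indicating that gradient descent is minimizing the cost. Because the cost is recorded only every 100 iterations, this 200-iteration run contributes just two points to the plot; the per-iteration log above gives a finer view of the decline.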

In [45]:
# Evaluate the custom polynomial (degree 2) multinomial logistic regression model
train_acc_, train_prec_, train_rec_, train_f1_ = custom_poly2_mlr_model.evaluate_model(X_train_poly_scaled,
                                                                                       y_train_)

val_acc_, val_prec_, val_rec_, val_f1_ = custom_poly2_mlr_model.evaluate_model(X_val_poly_scaled, y_val_)

test_acc_, test_prec_, test_rec_, test_f1_ = custom_poly2_mlr_model.evaluate_model(X_test_poly_scaled, y_test_)

print("Custom Polynomial (degree 2) Multinomial Logistic Regression:\n")

print(f"Training set - Accuracy: {train_acc_:.4f}, Precision: {train_prec_:.4f}, Recall: {train_rec_:.4f}, "
        f"F1-score: {train_f1_:.4f}")

print(f"Validation set - Accuracy: {val_acc_:.4f}, Precision: {val_prec_:.4f}, Recall: {val_rec_:.4f}, "
        f"F1-score: {val_f1_:.4f}")

print(f"Test set - Accuracy: {test_acc_:.4f}, Precision: {test_prec_:.4f}, Recall: {test_rec_:.4f}, "
        f"F1-score: {test_f1_:.4f}")

Custom Polynomial (degree 2) Multinomial Logistic Regression:

Training set - Accuracy: 1.0000, Precision: 1.0000, Recall: 1.0000, F1-score: 1.0000
Validation set - Accuracy: 0.9733, Precision: 0.9735, Recall: 0.9733, F1-score: 0.9733
Test set - Accuracy: 0.9813, Precision: 0.9816, Recall: 0.9813, F1-score: 0.9813


Scikit-Learn Polynomial (degree 2) Multinomial Logistic Regression: <br>

Training set - Accuracy: 1.0000, Precision: 1.0000, Recall: 1.0000, F1-score: 1.0000<br>
Validation set - Accuracy: 0.9831, Precision: 0.9833, Recall: 0.9831, F1-score: 0.9832<br>
Test set - Accuracy: 0.9849, Precision: 0.9850, Recall: 0.9849, F1-score: 0.9849

Both the Custom and Scikit-Learn implementations of the **Polynomial (degree 2) Multinomial Logistic Regression** model perform strongly across the training, validation, and test sets. Both reach perfect scores on the training set (**accuracy**, **precision**, **recall**, and **F1-score** all at **1.0000**). The Custom model scores slightly lower on the validation set (**accuracy** 0.9733 versus 0.9831 for Scikit-Learn) and on the test set (**accuracy** 0.9813 versus 0.9849); precision, recall, and F1-score follow the same pattern.

The two implementations are not expected to match exactly here. The Custom model runs 200 iterations of plain gradient descent without regularization, whereas Scikit-Learn's `LogisticRegression` uses the L-BFGS solver and applies L2 regularization by default, which may account for part of the gap.

These results show that both implementations generalize well to unseen data. With this comparison, the objective of building a **Multiclass Classification Model with Non-Linear Decision Boundaries** has been achieved successfully.