# Multiclass Classification Models

## Objectives

1. **Build a Multiclass Classification Model with Linear Decision Boundaries:**
    - Import necessary libraries.
    - Import the dataset from the UCI Machine Learning Repository, view it, check its shape, and check for missing values and the total number of classes in $y$.
    - Define functions for splitting the dataset and standardizing it using Scikit-Learn.
    - Split and standardize the dataset.
    - Train linear and non-linear/polynomial multinomial logistic regression models using Scikit-Learn.
    - Evaluate each model on training, cross-validation, and testing datasets.
    - Determine the optimal model, which was linear multinomial logistic regression.
    - Build a custom model based on the resulting optimal model.
    - Train and evaluate the custom model and compare results with the Scikit-Learn model.

2. **Build a Multiclass Classification Model with Non-linear Decision Boundaries:**
    - Get the dataset from the same source as before, view it, check its shape, check for missing values and total classes in $y$.
    - Split and standardize the dataset.
    - Train and evaluate linear and non-linear/polynomial multinomial logistic regression models using Scikit-Learn.
    - Determine the optimal model, which was a polynomial degree **2** multinomial logistic regression model.
    - Based on the resulting optimal model, build a custom model.
    - Train and evaluate the custom model and compare results with the Scikit-Learn model.

## 1. Multiclass Classification Model with Linear Decision Boundaries

In [1]:
# Import necessary libraries
import numpy as np
import plotly.express as px

##### This project uses data from the [`UCI Machine Learning Repository`](https://archive.ics.uci.edu/dataset/107/waveform+database+generator+version+1). The dataset is licensed under a [`Creative Commons Attribution 4.0 International (CC BY 4.0) license`](https://creativecommons.org/licenses/by/4.0/legalcode). The features (X) and target (y) variables were converted into numpy arrays, for building a machine learning model.

In [2]:
from ucimlrepo import fetch_ucirepo 
  
# fetch dataset 
waveform_database_generator_version_1 = fetch_ucirepo(id=107) 
  
# data (as pandas dataframes) 
X = waveform_database_generator_version_1.data.features 
y = waveform_database_generator_version_1.data.targets 

In [3]:
# View first 10 rows of features data
X.head(10)

Unnamed: 0,Attribute1,Attribute2,Attribute3,Attribute4,Attribute5,Attribute6,Attribute7,Attribute8,Attribute9,Attribute10,...,Attribute12,Attribute13,Attribute14,Attribute15,Attribute16,Attribute17,Attribute18,Attribute19,Attribute20,Attribute21
0,-1.23,-1.56,-1.75,-0.28,0.6,2.22,0.85,0.21,-0.2,0.89,...,4.2,2.89,7.75,4.59,3.15,5.12,3.32,1.2,0.24,-0.56
1,-0.69,2.43,0.61,2.08,2.3,3.25,5.52,4.55,2.97,2.22,...,1.61,1.24,1.89,1.88,-1.34,0.83,1.41,1.78,0.6,2.42
2,-0.12,-0.94,1.29,2.59,2.42,3.55,4.94,3.25,1.9,2.07,...,1.45,2.5,0.12,1.41,2.78,0.64,0.62,-0.01,-0.79,-0.12
3,0.86,0.29,2.19,-0.02,1.13,2.51,2.37,5.45,5.45,4.84,...,4.05,2.58,1.4,1.24,1.41,1.07,-1.43,2.84,-1.18,1.12
4,1.16,0.37,0.4,-0.59,2.66,1.0,2.69,4.06,5.34,3.53,...,4.79,4.3,1.84,1.73,0.21,-0.18,0.13,-0.21,-0.8,-0.68
5,-0.0,0.77,1.32,0.29,-1.28,0.84,1.6,1.55,2.93,4.76,...,4.3,4.89,2.81,2.37,3.68,-0.98,0.69,0.91,-1.8,0.39
6,0.87,1.07,-0.65,1.46,0.84,2.7,3.67,2.94,3.81,5.2,...,3.29,4.24,2.43,0.4,1.6,0.72,0.66,0.05,-0.24,0.67
7,-0.22,-0.91,-1.18,0.35,-1.92,-1.59,1.91,0.75,1.72,2.02,...,3.91,2.73,4.29,4.89,2.04,1.13,-0.66,-1.33,0.41,-0.75
8,-1.11,-1.14,-0.89,0.0,0.53,0.44,0.24,2.15,1.64,1.75,...,5.68,3.39,4.24,3.81,4.56,3.18,1.51,2.9,0.14,-0.12
9,-0.75,1.1,-1.9,1.43,0.47,0.4,0.86,3.51,2.62,4.5,...,6.94,0.75,3.23,1.08,-0.25,0.73,-0.41,-1.5,0.46,1.47


In [23]:
# View target data
y

Unnamed: 0,class
0,2
1,1
2,0
3,1
4,1
...,...
4995,0
4996,1
4997,1
4998,0


In [24]:
# Check the shape of X and y
print(f"Shape of X = {X.shape}")
print(f"Shape of y = {y.shape}")

print("\n---------------------------")

# Check for any missing values for X and y
print(f"\nMissing values in X: \n{X.isna().any().values}")
print(f"\nMissing values in y: \n{y.isna().any().values}")

print("\n---------------------------")

# Check for how many classes are in y
print(f"\nTotal classes in y: {np.unique(y)}")

Shape of X = (5000, 21)
Shape of y = (5000, 1)

---------------------------

Missing values in X: 
[False False False False False False False False False False False False
 False False False False False False False False False]

Missing values in y: 
[False]

---------------------------

Total classes in y: [0 1 2]


We have a clean dataset with **5,000** examples, **21** feature variables, and **1** target variable with **3** classes. Our goal is to build a multiclass classification model with linear decision boundaries. To check that linear decision boundaries are suitable for this dataset, we will first use scikit-learn to build and evaluate both linear and non-linear/polynomial multiclass classification models. After identifying the optimal scikit-learn model, we will develop a custom model of the same type and compare its results with the optimal scikit-learn model. This approach saves time by determining the best model type before we build the custom model.

In [25]:
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler


def split_dataset(X, y):
    # Splitting the data into training (60%), cross-validation (20%), and testing (20%) sets
    X_train, X_temp, y_train, y_temp = train_test_split(X, y, test_size=0.4, random_state=42, stratify=y)
    X_val, X_test, y_val, y_test = train_test_split(X_temp, y_temp, test_size=0.5, random_state=42, stratify=y_temp)

    return X_train, y_train, X_val, y_val, X_test, y_test


def standardize_dataset(X_train, X_val, X_test):
    # Standardizing the datasets
    scaler = StandardScaler()

    # Fitting the scaler on the training data and transforming training, validation, and testing sets
    X_train_scaled = scaler.fit_transform(X_train)
    X_val_scaled = scaler.transform(X_val)
    X_test_scaled = scaler.transform(X_test)

    return scaler, X_train_scaled, X_val_scaled, X_test_scaled

In [26]:
# Convert the features (X) and target (y) variables into numpy arrays
X = X.to_numpy()
y = y["class"].to_numpy()

# Splitting the dataset into training, validation, and test sets
X_train, y_train, X_val, y_val, X_test, y_test = split_dataset(X, y)

# Standardizing the feature datasets (training, validation, and test sets) to have zero mean and unit variance
scaler, X_train_scaled, X_val_scaled, X_test_scaled = standardize_dataset(X_train, X_val, X_test)

# Printing the shapes of the resulting datasets
print(f"Training set: {X_train_scaled.shape}, {y_train.shape}")
print(f"Validation set: {X_val_scaled.shape}, {y_val.shape}")
print(f"Test set: {X_test_scaled.shape}, {y_test.shape}")

Training set: (3000, 21), (3000,)
Validation set: (1000, 21), (1000,)
Test set: (1000, 21), (1000,)
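
To make the standardization step concrete, the following is a minimal NumPy sketch of the computation `StandardScaler` performs: the per-feature mean and population standard deviation are estimated on the training split only and then reused for the other splits. The toy arrays and the helper name `standardize_with_train_stats` are illustrative assumptions and are not part of the notebook.

```python
import numpy as np


def standardize_with_train_stats(X_train, X_other):
    # Statistics come from the training split only, so no information
    # from the validation/test data leaks into the preprocessing step
    mean = X_train.mean(axis=0)
    std = X_train.std(axis=0)  # population std (ddof=0), as StandardScaler uses
    return (X_train - mean) / std, (X_other - mean) / std


# Hypothetical toy data, for illustration only
rng = np.random.default_rng(0)
toy_train = rng.normal(loc=5.0, scale=2.0, size=(200, 3))
toy_val = rng.normal(loc=5.0, scale=2.0, size=(50, 3))

toy_train_scaled, toy_val_scaled = standardize_with_train_stats(toy_train, toy_val)

print(toy_train_scaled.mean(axis=0))  # approximately 0 for every feature
print(toy_train_scaled.std(axis=0))   # approximately 1 for every feature
print(toy_val_scaled.mean(axis=0))    # close to 0, but not exactly 0
```

The validation split is only approximately centered because it is transformed with the training statistics, which is the intended behavior.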


In [27]:
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import PolynomialFeatures
from sklearn.pipeline import make_pipeline
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score


def train_eval_multinomial_logistic_regression(X_train, y_train, X_val, y_val, X_test, y_test, degree=1):
    # Initialize multinomial logistic regression model for multiclass classification (linear/polynomial)
    model = make_pipeline(
        PolynomialFeatures(degree=degree),
        LogisticRegression(random_state=42, max_iter=1000, multi_class="multinomial", solver="lbfgs")
    )
    
    # Train the model
    model.fit(X_train, y_train)
    
    # Evaluate on training set
    train_acc, train_prec, train_rec, train_f1 = evaluate_model(model, X_train, y_train)
    
    # Evaluate on validation set
    val_acc, val_prec, val_rec, val_f1 = evaluate_model(model, X_val, y_val)

    # Evaluate on testing set
    test_acc, test_prec, test_rec, test_f1 = evaluate_model(model, X_test, y_test)

    # Print a header that identifies the model type
    if degree == 1:
        print("Linear Multinomial Logistic Regression:")
    else:
        print(f"\nPolynomial Degree {degree} Multinomial Logistic Regression:")
    
    print(f"Training set - Accuracy: {train_acc:.4f}, Precision: {train_prec:.4f}, Recall: {train_rec:.4f}, "
            f"F1-score: {train_f1:.4f}")

    print(f"Validation set - Accuracy: {val_acc:.4f}, Precision: {val_prec:.4f}, Recall: {val_rec:.4f}, "
            f"F1-score: {val_f1:.4f}")
    
    print(f"Test set - Accuracy: {test_acc:.4f}, Precision: {test_prec:.4f}, Recall: {test_rec:.4f}, "
            f"F1-score: {test_f1:.4f}")
    
    return model


def evaluate_model(model, X, y):
    # Predict the target values using the provided model and features
    y_pred = model.predict(X)
    
    # Calculate the accuracy of the model
    accuracy = accuracy_score(y, y_pred)
    # Calculate the precision of the model
    precision = precision_score(y, y_pred, average="weighted")
    # Calculate the recall of the model
    recall = recall_score(y, y_pred, average="weighted")
    # Calculate the F1 score of the model
    f1 = f1_score(y, y_pred, average="weighted")
    
    return accuracy, precision, recall, f1


# Train and evaluate linear multinomial logistic regression model for multiclass classification
linear_mlr_model = train_eval_multinomial_logistic_regression(
    X_train_scaled, y_train, X_val_scaled, y_val, X_test_scaled, y_test
    )

# Train and evaluate polynomial (degree 2) multinomial logistic regression model for multiclass classification
poly2_mlr_model = train_eval_multinomial_logistic_regression(
    X_train_scaled, y_train, X_val_scaled, y_val, X_test_scaled, y_test, degree=2
    )

# Train and evaluate polynomial (degree 3) multinomial logistic regression model for multiclass classification
poly3_mlr_model = train_eval_multinomial_logistic_regression(
    X_train_scaled, y_train, X_val_scaled, y_val, X_test_scaled, y_test, degree=3
    )

Linear Multinomial Logistic Regression:
Training set - Accuracy: 0.8727, Precision: 0.8725, Recall: 0.8727, F1-score: 0.8724
Validation set - Accuracy: 0.8670, Precision: 0.8679, Recall: 0.8670, F1-score: 0.8668
Test set - Accuracy: 0.8860, Precision: 0.8862, Recall: 0.8860, F1-score: 0.8857

Polynomial Degree 2 Multinomial Logistic Regression:
Training set - Accuracy: 0.9330, Precision: 0.9329, Recall: 0.9330, F1-score: 0.9329
Validation set - Accuracy: 0.8390, Precision: 0.8390, Recall: 0.8390, F1-score: 0.8388
Test set - Accuracy: 0.8490, Precision: 0.8489, Recall: 0.8490, F1-score: 0.8490

Polynomial Degree 3 Multinomial Logistic Regression:
Training set - Accuracy: 1.0000, Precision: 1.0000, Recall: 1.0000, F1-score: 1.0000
Validation set - Accuracy: 0.8200, Precision: 0.8199, Recall: 0.8200, F1-score: 0.8199
Test set - Accuracy: 0.8280, Precision: 0.8283, Recall: 0.8280, F1-score: 0.8278


Based on the above evaluations, the **Linear Multinomial Logistic Regression** model shows the best balance between bias and variance, with validation and test set scores indicating better generalization: **validation accuracy** of **0.8670** and **test accuracy** of **0.8860**. The **Polynomial Degree 2** model, while achieving a high **training accuracy** of **0.9330**, performs worse on validation (**0.8390**) and test sets (**0.8490**), suggesting overfitting. The **Polynomial Degree 3** model significantly overfits, with a perfect **training accuracy** of **1.0000** but lower validation (**0.8200**) and test accuracy (**0.8280**). Hence, the **Linear Multinomial Logistic Regression** model is the optimal choice for this dataset. Now, we will build our **Custom Linear Multinomial Logistic Regression** model and compare the results with the scikit-learn optimal model.
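
The selection logic can be sketched in a few lines using only the accuracies reported above. The dictionary names are illustrative assumptions. The sketch picks the degree with the highest validation accuracy and reports the train-validation gap as a rough overfitting indicator.

```python
# Accuracies reported above, keyed by polynomial degree
train_accuracy = {1: 0.8727, 2: 0.9330, 3: 1.0000}
val_accuracy = {1: 0.8670, 2: 0.8390, 3: 0.8200}

# Select the degree with the highest validation accuracy
best_degree = max(val_accuracy, key=val_accuracy.get)

# A large train-validation gap suggests overfitting
gaps = {d: round(train_accuracy[d] - val_accuracy[d], 4) for d in val_accuracy}

print(f"Best degree by validation accuracy: {best_degree}")
for degree, gap in gaps.items():
    print(f"degree {degree}: train - validation gap = {gap}")
```

The gap widens as the degree increases, which is consistent with the overfitting described above.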

The multinomial logistic regression (softmax regression) model can be represented as:

$$
\begin{equation}
\hat{\mathbf{Y}} = \sigma(\mathbf{X} \cdot \mathbf{W} + \mathbf{b}) \tag{1}
\end{equation}
$$

where:
- $\hat{\mathbf{Y}}$ represents the predicted probabilities matrix,
- $\sigma(\mathbf{z})_j = \frac{e^{z_j}}{\sum_{k=1}^{K} e^{z_k}}$ is the softmax function that converts the output logits into probabilities,
- $j = 1, 2, \ldots, K$,
- $K$ is the number of classes,
- $\mathbf{X}$ represents the features matrix,
- $\mathbf{W}$ represents the weight matrix,
- $\mathbf{b}$ represents the bias vector.
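
As a small illustration of the model equation, the sketch below computes the probability matrix for a toy problem. The shapes, the random values, and the helper name `softmax_rows` are assumptions made for the example. Subtracting the row-wise maximum is a common stabilization step that leaves the softmax output unchanged.

```python
import numpy as np


def softmax_rows(Z):
    # Shifting each row by its maximum avoids overflow in np.exp
    # and does not change the resulting probabilities
    Z_shifted = Z - Z.max(axis=1, keepdims=True)
    exp_Z = np.exp(Z_shifted)
    return exp_Z / exp_Z.sum(axis=1, keepdims=True)


# Hypothetical toy problem: m = 4 samples, n = 3 features, K = 3 classes
rng = np.random.default_rng(0)
X_toy = rng.normal(size=(4, 3))
W_toy = rng.normal(size=(3, 3))
b_toy = np.zeros(3)

Y_hat_toy = softmax_rows(X_toy @ W_toy + b_toy)

print(Y_hat_toy.shape)           # one row per sample, one column per class
print(Y_hat_toy.sum(axis=1))     # each row sums to 1 (up to rounding)
print(Y_hat_toy.argmax(axis=1))  # predicted class per sample
```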

To train the multinomial logistic regression model, we use the cross-entropy loss as the loss function:

$$
\begin{equation}
\text{Cross-Entropy Loss} = -\frac{1}{m} \sum_{i=1}^{m} \sum_{k=1}^{K} y_{i,k} \log(\hat{y}_{i,k}) \tag{2}
\end{equation}
$$

where:
- $m$ is the number of data points,
- $y_{i,k}$ is the actual binary indicator (0 or 1) if the class label of the $i$-th instance is $k$,
- $\hat{y}_{i,k}$ is the predicted probability that the $i$-th instance belongs to class $k$.
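
A worked example may help here. The sketch below one-hot encodes integer labels and evaluates the cross-entropy loss for a handful of made-up predictions. The function name `cross_entropy` and the toy values are assumptions for illustration.

```python
import numpy as np


def cross_entropy(Y_hat, y, eps=1e-10):
    # One-hot encode integer labels: row i has a 1 in column y[i]
    K = Y_hat.shape[1]
    Y_one_hot = np.eye(K)[y]
    # Average negative log-probability assigned to the true class;
    # eps guards against log(0)
    return -np.mean(np.sum(Y_one_hot * np.log(Y_hat + eps), axis=1))


# Hypothetical predictions for 3 samples and K = 3 classes
Y_hat_toy = np.array([
    [0.7, 0.2, 0.1],
    [0.1, 0.8, 0.1],
    [0.2, 0.2, 0.6],
])
y_toy = np.array([0, 1, 2])

loss = cross_entropy(Y_hat_toy, y_toy)
print(round(float(loss), 4))  # -(ln 0.7 + ln 0.8 + ln 0.6) / 3

# A model that predicts uniform probabilities scores ln(K)
uniform_loss = cross_entropy(np.full((2, 3), 1 / 3), np.array([0, 2]))
print(round(float(uniform_loss), 4))
```

Only the probability assigned to the true class contributes to each sample's term, because the one-hot row is zero elsewhere.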

To update the parameters $\mathbf{W}$ and $\mathbf{b}$, we use gradient descent:

$$
\begin{equation}
\mathbf{W}_{\text{new}} = \mathbf{W}_{\text{old}} - \alpha \times \frac{\partial \text{Cross-Entropy Loss}}{\partial \mathbf{W}} \tag{3}
\end{equation}
$$

$$
\begin{equation}
\mathbf{b}_{\text{new}} = \mathbf{b}_{\text{old}} - \alpha \times \frac{\partial \text{Cross-Entropy Loss}}{\partial \mathbf{b}} \tag{4}
\end{equation}
$$

where:
- $\mathbf{W}_{\text{new}}$ and $\mathbf{W}_{\text{old}}$ are the updated and current weight matrices, respectively,
- $\mathbf{b}_{\text{new}}$ and $\mathbf{b}_{\text{old}}$ are the updated and current bias vectors, respectively,
- $\alpha$ is the learning rate,
- $\frac{\partial \text{Cross-Entropy Loss}}{\partial \mathbf{W}}$ is the gradient of the Cross-Entropy Loss function with respect to the weight matrix,
- $\frac{\partial \text{Cross-Entropy Loss}}{\partial \mathbf{b}}$ is the gradient of the Cross-Entropy Loss function with respect to the bias vector.

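The gradients with respect to each weight $W_{j,k}$ and each bias $b_k$ are computed as follows:

$$
\begin{equation}
\frac{\partial \text{Cross-Entropy Loss}}{\partial W_{j,k}} = \frac{1}{m} \sum_{i=1}^{m} (\hat{y}_{i,k} - y_{i,k}) \, x_{i,j} \tag{5}
\end{equation}
$$

$$
\begin{equation}
\frac{\partial \text{Cross-Entropy Loss}}{\partial b_{k}} = \frac{1}{m} \sum_{i=1}^{m} (\hat{y}_{i,k} - y_{i,k}) \tag{6}
\end{equation}
$$

where $x_{i,j}$ is the value of feature $j$ for the $i$-th instance. In matrix form, these are $\frac{1}{m} \mathbf{X}^{\top} (\hat{\mathbf{Y}} - \mathbf{Y})$ for the weights and the column-wise mean of $\hat{\mathbf{Y}} - \mathbf{Y}$ for the biases.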
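
One way to gain confidence in the gradient formulas is a finite-difference check. The sketch below compares the analytic weight gradient with central differences on a small random problem. All names and the toy data are assumptions for illustration.

```python
import numpy as np


def softmax_rows(Z):
    Z = Z - Z.max(axis=1, keepdims=True)  # numerical stability
    E = np.exp(Z)
    return E / E.sum(axis=1, keepdims=True)


def loss_fn(X, Y, W, b):
    # Mean cross-entropy over all samples
    return -np.mean(np.sum(Y * np.log(softmax_rows(X @ W + b)), axis=1))


def analytic_gradients(X, Y, W, b):
    m = X.shape[0]
    residual = softmax_rows(X @ W + b) - Y  # shape (m, K)
    dW = X.T @ residual / m                 # shape (n, K), equation (5)
    db = residual.sum(axis=0) / m           # shape (K,), equation (6)
    return dW, db


# Hypothetical toy problem
rng = np.random.default_rng(1)
m, n, K = 20, 4, 3
X_toy = rng.normal(size=(m, n))
Y_toy = np.eye(K)[rng.integers(0, K, size=m)]
W_toy = rng.normal(scale=0.1, size=(n, K))
b_toy = rng.normal(scale=0.1, size=K)

dW, db = analytic_gradients(X_toy, Y_toy, W_toy, b_toy)

# Central finite differences for every entry of W
h = 1e-6
dW_numeric = np.zeros_like(W_toy)
for j in range(n):
    for k in range(K):
        W_plus, W_minus = W_toy.copy(), W_toy.copy()
        W_plus[j, k] += h
        W_minus[j, k] -= h
        dW_numeric[j, k] = (
            loss_fn(X_toy, Y_toy, W_plus, b_toy) - loss_fn(X_toy, Y_toy, W_minus, b_toy)
        ) / (2 * h)

max_abs_error = np.max(np.abs(dW - dW_numeric))
print(f"Largest difference between analytic and numeric gradient: {max_abs_error:.2e}")
```

A difference many orders of magnitude smaller than the gradient entries themselves indicates that the analytic expression and the loss agree.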

In [52]:
class MultinomialLogisticRegression:
    def __init__(self, learning_rate=0.01, num_iterations=1000):
        self.learning_rate = learning_rate
        self.num_iterations = num_iterations
        self.weights = None
        self.biases = None
        self.cost_history = []


    def initialize_parameters(self, n_features, k_classes):
        # Initialize weights as a zero matrix of shape (n_features, k_classes)
        self.weights = np.zeros((n_features, k_classes))
        
        # Initialize biases as a zero vector of shape (k_classes,)
        self.biases = np.zeros(k_classes)

    
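    def softmax(self, Z):
        # Subtract the row-wise maximum before exponentiating to avoid overflow;
        # this does not change the softmax output
        exp_Z = np.exp(Z - np.max(Z, axis=1, keepdims=True))
        return exp_Z / np.sum(exp_Z, axis=1, keepdims=True)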


    def compute_cost(self, Y_hat, Y_one_hot):
        # Get the number of samples
        m = Y_one_hot.shape[0]

        # Small epsilon value to prevent log(0)
        epsilon = 1e-10

        # Compute the cost using cross-entropy loss function
        cost = - (1 / m) * np.sum(Y_one_hot * np.log(Y_hat + epsilon))
        
        return cost 
   

    def fit(self, X, y):
        # Get the shape of X
        m, n = X.shape

        # Get the total number of unique classes in y
        k = len(np.unique(y))

        # One-hot encoding for vectorized implementation
        Y_one_hot = np.eye(k)[y]
        
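        # Initialize the weights and biases, and reset the cost history
        # so that calling fit again does not append to a previous run
        self.initialize_parameters(n, k)
        self.cost_history = []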

        # Run the gradient descent loop
        for i in range(self.num_iterations):
            # Forward propagation
            Z = np.matmul(X, self.weights) + self.biases
            Y_hat = self.softmax(Z)

            # Compute cost
            cost = self.compute_cost(Y_hat, Y_one_hot)

            # Print cost every 100 iterations and save cost
            if (i + 1) % 100 == 0:
                print(f"Iteration {i + 1}/{self.num_iterations}: Cost {cost}")
                self.cost_history.append((i+1, cost))

            # Compute gradients
            dW = (1 / m) * np.matmul(X.T, (Y_hat - Y_one_hot))
            db = (1 / m) * np.sum((Y_hat - Y_one_hot), axis=0)

            # Update parameters
            self.weights -= self.learning_rate * dW
            self.biases -= self.learning_rate * db

    
    def predict(self, X):
        # Compute the linear combination of input features and weights, plus biases
        # Note: X can be polynomial features
        Z = np.matmul(X, self.weights) + self.biases
        
        # Apply the softmax function to the linear combination
        Y_hat = self.softmax(Z)
        
        # Return the index of the maximum probability in each row of Y_hat
        return np.argmax(Y_hat, axis=1)
    

    def evaluate_model(self, X, y):
        # Predict the target values using the provided features
        y_pred = self.predict(X)
        
        # Calculate the accuracy of the model
        accuracy = accuracy_score(y, y_pred)
        # Calculate the precision of the model
        precision = precision_score(y, y_pred, average="weighted")
        # Calculate the recall of the model
        recall = recall_score(y, y_pred, average="weighted")
        # Calculate the F1 score of the model
        f1 = f1_score(y, y_pred, average="weighted")

        return accuracy, precision, recall, f1

In [87]:
# Train custom linear multinomial logistic regression model
custom_linear_mlr_model = MultinomialLogisticRegression(learning_rate=0.7)
custom_linear_mlr_model.fit(X_train_scaled, y_train)

Iteration 100/1000: Cost 0.31142376048471093
Iteration 200/1000: Cost 0.3102091157073313
Iteration 300/1000: Cost 0.3100006408145589
Iteration 400/1000: Cost 0.30995123348071446
Iteration 500/1000: Cost 0.3099373155757351
Iteration 600/1000: Cost 0.30993294801407006
Iteration 700/1000: Cost 0.3099314935273836
Iteration 800/1000: Cost 0.3099309945440475
Iteration 900/1000: Cost 0.3099308209192659
Iteration 1000/1000: Cost 0.30993076010325465


In [88]:
# Convert cost history into numpy arrays
cost_hist = np.array(custom_linear_mlr_model.cost_history)

# Plotly Express line chart
fig = px.line(
    x=cost_hist[:, 0],
    y=cost_hist[:, 1],
    title="Iteration vs Cost",
    labels={"x": "Iteration", "y": "Cost"}
)

# Show the plot
fig.show()

# Saved as plot_1.png in the current directory/folder

The above plot shows the decrease in cost as the number of iterations increases, flattening out at approximately 500 iterations. This indicates the proper functioning of gradient descent in minimizing cost, leading to the convergence of the model and the attainment of optimal model parameters.
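
The point at which the curve flattens can also be read off the logged costs. The sketch below uses the values printed during training and a relative tolerance of `1e-4`. Both the tolerance and the helper name `first_flat_iteration` are arbitrary choices made for illustration.

```python
# Costs logged every 100 iterations in the training output above
logged_costs = [
    (100, 0.31142376048471093),
    (200, 0.3102091157073313),
    (300, 0.3100006408145589),
    (400, 0.30995123348071446),
    (500, 0.3099373155757351),
    (600, 0.30993294801407006),
    (700, 0.3099314935273836),
    (800, 0.3099309945440475),
    (900, 0.3099308209192659),
    (1000, 0.30993076010325465),
]


def first_flat_iteration(history, rel_tol=1e-4):
    # Return the first logged iteration whose relative improvement over
    # the previous logged cost falls below rel_tol, or None if none does
    for (_, prev_cost), (iteration, cost) in zip(history, history[1:]):
        if (prev_cost - cost) / prev_cost < rel_tol:
            return iteration
    return None


print(first_flat_iteration(logged_costs))  # 500
```

With this tolerance the check agrees with the visual reading of roughly 500 iterations. A stricter tolerance would place the flattening point later.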

In [89]:
# Evaluate the custom linear multinomial logistic regression model
train_acc, train_prec, train_rec, train_f1 = custom_linear_mlr_model.evaluate_model(X_train_scaled, y_train)
val_acc, val_prec, val_rec, val_f1 = custom_linear_mlr_model.evaluate_model(X_val_scaled, y_val)
test_acc, test_prec, test_rec, test_f1 = custom_linear_mlr_model.evaluate_model(X_test_scaled, y_test)

print("Custom Linear Multinomial Logistic Regression:")

print(f"Training set - Accuracy: {train_acc:.4f}, Precision: {train_prec:.4f}, Recall: {train_rec:.4f}, "
        f"F1-score: {train_f1:.4f}")

print(f"Validation set - Accuracy: {val_acc:.4f}, Precision: {val_prec:.4f}, Recall: {val_rec:.4f}, "
        f"F1-score: {val_f1:.4f}")

print(f"Test set - Accuracy: {test_acc:.4f}, Precision: {test_prec:.4f}, Recall: {test_rec:.4f}, "
        f"F1-score: {test_f1:.4f}")

Custom Linear Multinomial Logistic Regression:
Training set - Accuracy: 0.8727, Precision: 0.8725, Recall: 0.8727, F1-score: 0.8724
Validation set - Accuracy: 0.8670, Precision: 0.8679, Recall: 0.8670, F1-score: 0.8668
Test set - Accuracy: 0.8860, Precision: 0.8862, Recall: 0.8860, F1-score: 0.8857


Scikit-Learn Linear Multinomial Logistic Regression: <br>
Training set - Accuracy: 0.8727, Precision: 0.8725, Recall: 0.8727, F1-score: 0.8724 <br>
Validation set - Accuracy: 0.8670, Precision: 0.8679, Recall: 0.8670, F1-score: 0.8668 <br>
Test set - Accuracy: 0.8860, Precision: 0.8862, Recall: 0.8860, F1-score: 0.8857

The Custom and Scikit-Learn implementations of the **Linear Multinomial Logistic Regression** model yield identical metrics (to four decimal places) across the training, validation, and test sets. This agreement supports the correctness of the custom implementation, and the validation and test scores show that the model generalizes well to unseen data. With the goal of building a **Multiclass Classification Model with Linear Decision Boundaries** achieved, our next step is to develop a **Multiclass Classification Model with Non-Linear Decision Boundaries**.

## 2. Multiclass Classification Model with Non-Linear Decision Boundaries

##### This project uses data from the [`UCI Machine Learning Repository`](https://archive.ics.uci.edu/dataset/80/optical+recognition+of+handwritten+digits). The dataset is licensed under a [`Creative Commons Attribution 4.0 International (CC BY 4.0) license`](https://creativecommons.org/licenses/by/4.0/legalcode). The features (X_) and target (y_) variables were converted into numpy arrays, for building a machine learning model.

In [90]:
# fetch dataset 
optical_recognition_of_handwritten_digits = fetch_ucirepo(id=80) 
  
# data (as pandas dataframes) 
X_ = optical_recognition_of_handwritten_digits.data.features 
y_ = optical_recognition_of_handwritten_digits.data.targets 

In [91]:
# View first 10 rows of features data
X_.head(10)

Unnamed: 0,Attribute1,Attribute2,Attribute3,Attribute4,Attribute5,Attribute6,Attribute7,Attribute8,Attribute9,Attribute10,...,Attribute55,Attribute56,Attribute57,Attribute58,Attribute59,Attribute60,Attribute61,Attribute62,Attribute63,Attribute64
0,0,1,6,15,12,1,0,0,0,7,...,0,0,0,0,6,14,7,1,0,0
1,0,0,10,16,6,0,0,0,0,7,...,3,0,0,0,10,16,15,3,0,0
2,0,0,8,15,16,13,0,0,0,1,...,0,0,0,0,9,14,0,0,0,0
3,0,0,0,3,11,16,0,0,0,0,...,0,0,0,0,0,1,15,2,0,0
4,0,0,5,14,4,0,0,0,0,0,...,12,0,0,0,4,12,14,7,0,0
5,0,0,11,16,10,1,0,0,0,4,...,8,3,0,0,10,16,16,16,16,6
6,0,0,1,11,13,11,7,0,0,0,...,0,0,0,0,1,13,5,0,0,0
7,0,0,8,10,8,7,2,0,0,1,...,0,0,0,0,4,13,8,0,0,0
8,0,0,15,2,14,13,2,0,0,0,...,0,0,0,0,10,12,5,0,0,0
9,0,0,3,13,13,2,0,0,0,6,...,12,0,0,0,3,15,11,6,0,0


In [93]:
# View target data
y_

Unnamed: 0,class
0,0
1,0
2,7
3,4
4,6
...,...
5615,9
5616,0
5617,8
5618,9


In [94]:
# Check the shape of X_ and y_
print(f"Shape of X_ = {X_.shape}")
print(f"Shape of y_ = {y_.shape}")

print("\n---------------------------")

# Check for any missing values for X_ and y_
print(f"\nMissing values in X_: \n{X_.isna().any().values}")
print(f"\nMissing values in y_: \n{y_.isna().any().values}")

print("\n---------------------------")

# Check for how many classes are in y_
print(f"\nTotal classes in y_: {np.unique(y_)}")

Shape of X_ = (5620, 64)
Shape of y_ = (5620, 1)

---------------------------

Missing values in X_: 
[False False False False False False False False False False False False
 False False False False False False False False False False False False
 False False False False False False False False False False False False
 False False False False False False False False False False False False
 False False False False False False False False False False False False
 False False False False]

Missing values in y_: 
[False]

---------------------------

Total classes in y_: [0 1 2 3 4 5 6 7 8 9]


We have a clean dataset with **5,620** examples, **64** feature variables, and **1** target variable with **10** classes. Our goal is to build a multiclass classification model with non-linear decision boundaries. To check that non-linear decision boundaries are suitable for this dataset, we will use the same approach as for our previous goal of building a multiclass classification model with linear decision boundaries.

In [95]:
# Convert the features (X_) and target (y_) variables into numpy arrays
X_ = X_.to_numpy()
y_ = y_["class"].to_numpy()

# Splitting the dataset into training, validation, and test sets
X_train_, y_train_, X_val_, y_val_, X_test_, y_test_ = split_dataset(X_, y_)

# Standardizing the feature datasets (training, validation, and test sets) to have zero mean and unit variance
scaler_, X_train_scaled_, X_val_scaled_, X_test_scaled_ = standardize_dataset(X_train_, X_val_, X_test_)

# Printing the shapes of the resulting datasets
print(f"Training set: {X_train_scaled_.shape}, {y_train_.shape}")
print(f"Validation set: {X_val_scaled_.shape}, {y_val_.shape}")
print(f"Test set: {X_test_scaled_.shape}, {y_test_.shape}")

Training set: (3372, 64), (3372,)
Validation set: (1124, 64), (1124,)
Test set: (1124, 64), (1124,)


In [96]:
# Train and evaluate linear multinomial logistic regression model for multiclass classification
linear_mlr_model_ = train_eval_multinomial_logistic_regression(
    X_train_scaled_, y_train_, X_val_scaled_, y_val_, X_test_scaled_, y_test_
    )

# Train and evaluate polynomial (degree 2) multinomial logistic regression model for multiclass classification
poly2_mlr_model_ = train_eval_multinomial_logistic_regression(
    X_train_scaled_, y_train_, X_val_scaled_, y_val_, X_test_scaled_, y_test_, degree=2
    )

# Train and evaluate polynomial (degree 3) multinomial logistic regression model for multiclass classification
poly3_mlr_model_ = train_eval_multinomial_logistic_regression(
    X_train_scaled_, y_train_, X_val_scaled_, y_val_, X_test_scaled_, y_test_, degree=3
    )

Linear Multinomial Logistic Regression:
Training set - Accuracy: 0.9926, Precision: 0.9926, Recall: 0.9926, F1-score: 0.9926
Validation set - Accuracy: 0.9724, Precision: 0.9728, Recall: 0.9724, F1-score: 0.9725
Test set - Accuracy: 0.9680, Precision: 0.9683, Recall: 0.9680, F1-score: 0.9680

Polynomial Degree 2 Multinomial Logistic Regression:
Training set - Accuracy: 1.0000, Precision: 1.0000, Recall: 1.0000, F1-score: 1.0000
Validation set - Accuracy: 0.9858, Precision: 0.9861, Recall: 0.9858, F1-score: 0.9858
Test set - Accuracy: 0.9858, Precision: 0.9858, Recall: 0.9858, F1-score: 0.9858

Polynomial Degree 3 Multinomial Logistic Regression:
Training set - Accuracy: 1.0000, Precision: 1.0000, Recall: 1.0000, F1-score: 1.0000
Validation set - Accuracy: 0.9840, Precision: 0.9843, Recall: 0.9840, F1-score: 0.9840
Test set - Accuracy: 0.9911, Precision: 0.9912, Recall: 0.9911, F1-score: 0.9911


The comparison of multinomial logistic regression models shows that the **Linear Multinomial Logistic Regression** model performs well, with a training accuracy of **0.9926**, a validation accuracy of **0.9724**, and a test accuracy of **0.9680**, but slightly below the polynomial models. The **Polynomial Degree 2 Multinomial Logistic Regression** model achieves perfect training accuracy (**1.0000**) and maintains high accuracy on the validation (**0.9858**) and test sets (**0.9858**). The **Polynomial Degree 3** model also reaches perfect training accuracy, with a slightly lower validation accuracy (**0.9840**) and a slightly higher test accuracy (**0.9911**) than the degree **2** model. Because model selection should rest on the validation set rather than the test set, and the degree **2** model has the highest validation accuracy while using far fewer features, the **Polynomial Degree 2** model is chosen as optimal. An attempt to fit a polynomial degree **4** model raised a **MemoryError**: the system, which has **4 GB** of RAM, could not allocate the **20.5 GiB** required for an array of shape **(3372, 814385)**. A machine with significantly more RAM would avoid this specific error, but high-degree polynomial models quickly become impractical because the number of polynomial features grows combinatorially with the degree. The **Polynomial Degree 2 Multinomial Logistic Regression** model therefore offers robust and consistent performance without the added complexity of the polynomial degree **3** and **4** models. Now, we will build our **Custom Polynomial Degree 2 Multinomial Logistic Regression** model and compare the results with the scikit-learn optimal model.
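
The size of the expanded feature matrix can be estimated before fitting. With $n$ input features and degree $d$, the number of monomials of total degree at most $d$ (including the constant term) is $\binom{n+d}{d}$. The sketch below applies this to the **3,372** training rows and **64** features, assuming 8-byte floating-point values. The function name `n_polynomial_features` is an illustrative assumption.

```python
from math import comb


def n_polynomial_features(n_features, degree):
    # Number of monomials of total degree <= `degree` in `n_features`
    # variables, including the constant (bias) term
    return comb(n_features + degree, degree)


n_samples, n_features = 3372, 64

for degree in (1, 2, 3, 4):
    n_out = n_polynomial_features(n_features, degree)
    gib = n_samples * n_out * 8 / 2**30  # float64 uses 8 bytes per value
    print(f"degree {degree}: {n_out:>7} features, {gib:6.2f} GiB")
```

For degree **4** this gives 814,385 columns and roughly 20.5 GiB, which matches the shape and allocation size reported in the error.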

In [97]:
from itertools import combinations_with_replacement


def polynomial_features_multiple(X: np.ndarray, degree: int):
    """
    Generate polynomial features for a given standardized dataset X up to a specific degree.
    
    Parameters:
    X (np.ndarray): The standardized input dataset of shape (n_samples, n_features).
    degree (int): The degree of the polynomial features.
    
    Returns:
    np.ndarray: A new dataset with polynomial features of shape (n_samples, n_output_features).
    
    """
    
    n_features = X.shape[1]
    
    # List to hold the polynomial features
    features = []
    
    # Add polynomial features of all degrees from 1 to the specified degree
    for d in range(1, degree + 1):
        for indices in combinations_with_replacement(range(n_features), d):
            new_feature = np.prod(X[:, indices], axis=1)
            features.append(new_feature)
    
    return np.vstack(features).T
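
To see what the expansion produces, the sketch below applies the same construction to a tiny two-feature array. The function is repeated under the hypothetical name `poly_features_sketch` so that the example runs on its own, and the toy values are assumptions.

```python
from itertools import combinations_with_replacement
from math import comb

import numpy as np


def poly_features_sketch(X, degree):
    # Same construction as polynomial_features_multiple: one column per
    # multiset of feature indices of size 1..degree (no constant column)
    n_features = X.shape[1]
    columns = [
        np.prod(X[:, indices], axis=1)
        for d in range(1, degree + 1)
        for indices in combinations_with_replacement(range(n_features), d)
    ]
    return np.vstack(columns).T


# Two samples with features (a, b)
X_toy = np.array([[2.0, 3.0],
                  [1.0, -1.0]])

expanded = poly_features_sketch(X_toy, 2)
print(expanded)
# Columns are a, b, a^2, a*b, b^2:
# row 1 -> 2, 3, 4, 6, 9
# row 2 -> 1, -1, 1, -1, 1

# For 64 input features and degree 2, the column count is C(66, 2) - 1
print(poly_features_sketch(np.ones((3, 64)), 2).shape[1], comb(66, 2) - 1)
```

No constant column is generated here. The custom model does not need one because it learns a separate bias vector.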

In [112]:
# Train custom polynomial (degree 2) multinomial logistic regression model
custom_poly2_mlr_model = MultinomialLogisticRegression(learning_rate=0.15)
custom_poly2_mlr_model.fit(polynomial_features_multiple(X_train_scaled_, 2), y_train_)

Iteration 100/1000: Cost 0.025256094558993584
Iteration 200/1000: Cost 0.013207418209254789
Iteration 300/1000: Cost 0.008968965239506051
Iteration 400/1000: Cost 0.006803447193990029
Iteration 500/1000: Cost 0.005487381298418004
Iteration 600/1000: Cost 0.004602137894336613
Iteration 700/1000: Cost 0.003965500216623285
Iteration 800/1000: Cost 0.0034853852375776894
Iteration 900/1000: Cost 0.0031102287784110626
Iteration 1000/1000: Cost 0.0028088988398663564


In [113]:
# Convert cost history into numpy arrays
cost_hist_ = np.array(custom_poly2_mlr_model.cost_history)

# Plotly Express line chart
fig = px.line(
    x=cost_hist_[:, 0],
    y=cost_hist_[:, 1],
    title="Iteration vs Cost",
    labels={"x": "Iteration", "y": "Cost"}
)

# Show the plot
fig.show()

# Saved as plot_2.png in the current directory/folder

The above plot shows the decrease in cost as the number of iterations increases, indicating the proper functioning of gradient descent in minimizing cost and leading to optimal model parameters.

In [114]:
# Evaluate the custom polynomial (degree 2) multinomial logistic regression model
train_acc_, train_prec_, train_rec_, train_f1_ = \
        custom_poly2_mlr_model.evaluate_model(polynomial_features_multiple(X_train_scaled_, 2), y_train_)

val_acc_, val_prec_, val_rec_, val_f1_ = \
        custom_poly2_mlr_model.evaluate_model(polynomial_features_multiple(X_val_scaled_, 2), y_val_)

test_acc_, test_prec_, test_rec_, test_f1_ = \
        custom_poly2_mlr_model.evaluate_model(polynomial_features_multiple(X_test_scaled_, 2), y_test_)

print("Custom Polynomial (degree 2) Multinomial Logistic Regression:")

print(f"Training set - Accuracy: {train_acc_:.4f}, Precision: {train_prec_:.4f}, Recall: {train_rec_:.4f}, "
        f"F1-score: {train_f1_:.4f}")

print(f"Validation set - Accuracy: {val_acc_:.4f}, Precision: {val_prec_:.4f}, Recall: {val_rec_:.4f}, "
        f"F1-score: {val_f1_:.4f}")

print(f"Test set - Accuracy: {test_acc_:.4f}, Precision: {test_prec_:.4f}, Recall: {test_rec_:.4f}, "
        f"F1-score: {test_f1_:.4f}")

Custom Polynomial (degree 2) Multinomial Logistic Regression:
Training set - Accuracy: 1.0000, Precision: 1.0000, Recall: 1.0000, F1-score: 1.0000
Validation set - Accuracy: 0.9858, Precision: 0.9862, Recall: 0.9858, F1-score: 0.9858
Test set - Accuracy: 0.9902, Precision: 0.9903, Recall: 0.9902, F1-score: 0.9902


Scikit-Learn Polynomial (degree 2) Multinomial Logistic Regression: <br>
Training set - Accuracy: 1.0000, Precision: 1.0000, Recall: 1.0000, F1-score: 1.0000 <br>
Validation set - Accuracy: 0.9858, Precision: 0.9861, Recall: 0.9858, F1-score: 0.9858 <br>
Test set - Accuracy: 0.9858, Precision: 0.9858, Recall: 0.9858, F1-score: 0.9858

Both the Custom and Scikit-Learn implementations of the **Polynomial (degree 2) Multinomial Logistic Regression** model perform very well across the training, validation, and test sets. Both reach perfect training scores (**accuracy**, **precision**, **recall**, and **F1-score** all at **1.0000**) and the same validation **accuracy** (0.9858), with validation **precision** differing only in the fourth decimal place (0.9862 for the Custom model, 0.9861 for Scikit-Learn). On the test set, the Custom model scores slightly higher (**accuracy** 0.9902, **precision** 0.9903, **recall** and **F1-score** 0.9902) than the Scikit-Learn model (all at **0.9858**). Small differences of this kind are plausible because the two models are not trained identically: the Custom model uses plain gradient descent without regularization, whereas Scikit-Learn's `LogisticRegression` uses the L-BFGS solver and applies L2 regularization by default. These results affirm the robustness and reliability of both implementations in predicting outcomes and generalizing to unseen data. So, with this comparison, our goal of building a **Multiclass Classification Model with Non-Linear Decision Boundaries** has been achieved successfully.