# Multi-class Classification using Logistic Regression

Discriminative multi-class classification aims to learn decision boundaries that effectively separate multiple 
classes within a dataset. Three common methods for discriminative multi-class classification are one-vs-one, 
one-vs-all and softmax regression.

- One-vs-One (OvO) trains several binary logistic classifiers, each specialized in distinguishing between 
a specific pair of classes, and predicts the class based on their combined outcomes. 

- One-vs-All (OvA) builds one binary classifier per class against all remaining classes and assigns the 
label with the highest confidence score.

- Softmax regression extends logistic regression to the multi-class setting by modeling a normalized 
probability distribution over all classes and selecting the class with the maximum probability.

In [48]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

In [49]:
data = pd.read_csv("./data/wine_dataset.csv")
data

Unnamed: 0,alcohol,malic_acid,ash,alcalinity_of_ash,magnesium,total_phenols,flavanoids,nonflavanoid_phenols,proanthocyanins,color_intensity,hue,od280_od315_of_diluted_wines,proline,class_label
0,14.23,1.71,2.43,15.6,127,2.80,3.06,0.28,2.29,5.64,1.04,3.92,1065,1
1,13.20,1.78,2.14,11.2,100,2.65,2.76,0.26,1.28,4.38,1.05,3.40,1050,1
2,13.16,2.36,2.67,18.6,101,2.80,3.24,0.30,2.81,5.68,1.03,3.17,1185,1
3,14.37,1.95,2.50,16.8,113,3.85,3.49,0.24,2.18,7.80,0.86,3.45,1480,1
4,13.24,2.59,2.87,21.0,118,2.80,2.69,0.39,1.82,4.32,1.04,2.93,735,1
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
173,13.71,5.65,2.45,20.5,95,1.68,0.61,0.52,1.06,7.70,0.64,1.74,740,3
174,13.40,3.91,2.48,23.0,102,1.80,0.75,0.43,1.41,7.30,0.70,1.56,750,3
175,13.27,4.28,2.26,20.0,120,1.59,0.69,0.43,1.35,10.20,0.59,1.56,835,3
176,13.17,2.59,2.37,20.0,120,1.65,0.68,0.53,1.46,9.30,0.60,1.62,840,3


In [50]:
data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 178 entries, 0 to 177
Data columns (total 14 columns):
 #   Column                        Non-Null Count  Dtype  
---  ------                        --------------  -----  
 0   alcohol                       178 non-null    float64
 1   malic_acid                    178 non-null    float64
 2   ash                           178 non-null    float64
 3   alcalinity_of_ash             178 non-null    float64
 4   magnesium                     178 non-null    int64  
 5   total_phenols                 178 non-null    float64
 6   flavanoids                    178 non-null    float64
 7   nonflavanoid_phenols          178 non-null    float64
 8   proanthocyanins               178 non-null    float64
 9   color_intensity               178 non-null    float64
 10  hue                           178 non-null    float64
 11  od280_od315_of_diluted_wines  178 non-null    float64
 12  proline                       178 non-null    int64  
 13  class

In [51]:
data.shape

(178, 14)

well as always we have the std and mean in the describe but let's do them ourself!

In [52]:
data.describe()

Unnamed: 0,alcohol,malic_acid,ash,alcalinity_of_ash,magnesium,total_phenols,flavanoids,nonflavanoid_phenols,proanthocyanins,color_intensity,hue,od280_od315_of_diluted_wines,proline,class_label
count,178.0,178.0,178.0,178.0,178.0,178.0,178.0,178.0,178.0,178.0,178.0,178.0,178.0,178.0
mean,13.000618,2.336348,2.366517,19.494944,99.741573,2.295112,2.02927,0.361854,1.590899,5.05809,0.957449,2.611685,746.893258,1.938202
std,0.811827,1.117146,0.274344,3.339564,14.282484,0.625851,0.998859,0.124453,0.572359,2.318286,0.228572,0.70999,314.907474,0.775035
min,11.03,0.74,1.36,10.6,70.0,0.98,0.34,0.13,0.41,1.28,0.48,1.27,278.0,1.0
25%,12.3625,1.6025,2.21,17.2,88.0,1.7425,1.205,0.27,1.25,3.22,0.7825,1.9375,500.5,1.0
50%,13.05,1.865,2.36,19.5,98.0,2.355,2.135,0.34,1.555,4.69,0.965,2.78,673.5,2.0
75%,13.6775,3.0825,2.5575,21.5,107.0,2.8,2.875,0.4375,1.95,6.2,1.12,3.17,985.0,3.0
max,14.83,5.8,3.23,30.0,162.0,3.88,5.08,0.66,3.58,13.0,1.71,4.0,1680.0,3.0


In [53]:
data.dtypes

alcohol                         float64
malic_acid                      float64
ash                             float64
alcalinity_of_ash               float64
magnesium                         int64
total_phenols                   float64
flavanoids                      float64
nonflavanoid_phenols            float64
proanthocyanins                 float64
color_intensity                 float64
hue                             float64
od280_od315_of_diluted_wines    float64
proline                           int64
class_label                       int64
dtype: object

In [54]:
data.std()

alcohol                           0.811827
malic_acid                        1.117146
ash                               0.274344
alcalinity_of_ash                 3.339564
magnesium                        14.282484
total_phenols                     0.625851
flavanoids                        0.998859
nonflavanoid_phenols              0.124453
proanthocyanins                   0.572359
color_intensity                   2.318286
hue                               0.228572
od280_od315_of_diluted_wines      0.709990
proline                         314.907474
class_label                       0.775035
dtype: float64

In [55]:
data.mean()

alcohol                          13.000618
malic_acid                        2.336348
ash                               2.366517
alcalinity_of_ash                19.494944
magnesium                        99.741573
total_phenols                     2.295112
flavanoids                        2.029270
nonflavanoid_phenols              0.361854
proanthocyanins                   1.590899
color_intensity                   5.058090
hue                               0.957449
od280_od315_of_diluted_wines      2.611685
proline                         746.893258
class_label                       1.938202
dtype: float64

In [56]:
data.head()

Unnamed: 0,alcohol,malic_acid,ash,alcalinity_of_ash,magnesium,total_phenols,flavanoids,nonflavanoid_phenols,proanthocyanins,color_intensity,hue,od280_od315_of_diluted_wines,proline,class_label
0,14.23,1.71,2.43,15.6,127,2.8,3.06,0.28,2.29,5.64,1.04,3.92,1065,1
1,13.2,1.78,2.14,11.2,100,2.65,2.76,0.26,1.28,4.38,1.05,3.4,1050,1
2,13.16,2.36,2.67,18.6,101,2.8,3.24,0.3,2.81,5.68,1.03,3.17,1185,1
3,14.37,1.95,2.5,16.8,113,3.85,3.49,0.24,2.18,7.8,0.86,3.45,1480,1
4,13.24,2.59,2.87,21.0,118,2.8,2.69,0.39,1.82,4.32,1.04,2.93,735,1


NO Categorical Features But We could check like this too:

In [57]:
categorical_features = data.select_dtypes(include=['object']).columns.tolist()

if categorical_features:
    for feature in categorical_features:
        if feature != 'target' and 'class' not in feature.lower():
            print(f"   {feature}: {data[feature].nunique()} unique values")
else:
    print(f"\n4. No categorical features (all numerical)")


4. No categorical features (all numerical)


In [58]:
df_processed = data.copy()

print(f"  Shape: {df_processed.shape}")
print(f"  Missing values: {df_processed.isnull().sum().sum()}")
print(f"  Duplicates: {df_processed.duplicated().sum()}")

  Shape: (178, 14)
  Missing values: 0
  Duplicates: 0


In [None]:
np.random.seed(42)

target_col = 'class_label'

print(f"\n7. Target variable: {target_col}")
print(f"Classes: {sorted(data[target_col].unique())}")
print(f"Class distribution:")
class_counts = data[target_col].value_counts().sort_index()
for cls, count in class_counts.items():
    print(f"Class {cls}: {count} samples ({100*count/len(data):.1f}%)")

print(f"\nDataset is clean! No preprocessing needed beyond normalization.")



7. Target variable: class_label
Classes: [np.int64(1), np.int64(2), np.int64(3)]
Class distribution:
Class 1: 59 samples (33.1%)
Class 2: 71 samples (39.9%)
Class 3: 48 samples (27.0%)

✓ Dataset is clean! No preprocessing needed beyond normalization.


In [60]:
X = data.drop(target_col, axis=1).values
y = data[target_col].values

print(f"X shape: {X.shape} (samples or m, features or n)")
print(f"y shape: {y.shape} (samples or m, )")

X shape: (178, 13) (samples or m, features or n)
y shape: (178,) (samples or m, )


In [61]:
# We could do it this way too!
print(f"\nMean of each feature:")
for i, col in enumerate(data.columns[:-1]):
    print(f"  {col}: mean={X[:, i].mean():.4f}, std={X[:, i].std():.4f}")


Mean of each feature:
  alcohol: mean=13.0006, std=0.8095
  malic_acid: mean=2.3363, std=1.1140
  ash: mean=2.3665, std=0.2736
  alcalinity_of_ash: mean=19.4949, std=3.3302
  magnesium: mean=99.7416, std=14.2423
  total_phenols: mean=2.2951, std=0.6241
  flavanoids: mean=2.0293, std=0.9960
  nonflavanoid_phenols: mean=0.3619, std=0.1241
  proanthocyanins: mean=1.5909, std=0.5707
  color_intensity: mean=5.0581, std=2.3118
  hue: mean=0.9574, std=0.2279
  od280_od315_of_diluted_wines: mean=2.6117, std=0.7080
  proline: mean=746.8933, std=314.0217


### Data Split with Stratified Train-Test

In [62]:
np.random.seed(42)

unique_classes = np.unique(y)

train_idx = []
test_idx = []

# Splitting Each Class separately, so we maintain the proportions:
for cls in unique_classes:
    cls_indices = np.where(y == cls)[0]
    split_point = int(0.75 * len(cls_indices))
    
    np.random.shuffle(cls_indices)
    train_idx.extend(cls_indices[:split_point])
    test_idx.extend(cls_indices[split_point:])

train_idx = np.array(train_idx)
test_idx = np.array(train_idx)

X_train = X[train_idx]
y_train = y[train_idx]

X_test = X[test_idx]
y_test = y[test_idx]

In [63]:
print(f"\nClass distribution preserved:")
print(f"  Training set:")
for cls in unique_classes:
    count = np.sum(y_train == cls)
    print(f"Class {cls}: {count} ({100*count/len(y_train):.1f}%)")

print(f"  Test set:")
for cls in unique_classes:
    count = np.sum(y_test == cls)
    print(f"Class {cls}: {count} ({100*count/len(y_test):.1f}%)")


Class distribution preserved:
  Training set:
Class 1: 44 (33.1%)
Class 2: 53 (39.8%)
Class 3: 36 (27.1%)
  Test set:
Class 1: 44 (33.1%)
Class 2: 53 (39.8%)
Class 3: 36 (27.1%)


In [64]:
print(f"\nBefore outlier removal: X_train shape = {X_train.shape}")

# Z-scores using training set statistics
train_mean = np.mean(X_train, axis=0)
train_std = np.std(X_train, axis=0)

Z_scores = np.abs((X_train - train_mean) / (train_std + 1e-8))

# outliers (any feature with |z| > 2.75)
outlier_mask = np.any(Z_scores > 2.75, axis=1)
n_outliers = np.sum(outlier_mask)

print(f"Outliers detected: {n_outliers}")

# Removing outliers
X_train_clean = X_train[~outlier_mask]
y_train_clean = y_train[~outlier_mask]

print(f"After outlier removal: X_train shape = {X_train_clean.shape}")
print(f"Samples removed: {n_outliers}")

X_train = X_train_clean
y_train = y_train_clean


Before outlier removal: X_train shape = (133, 13)
Outliers detected: 11
After outlier removal: X_train shape = (122, 13)
Samples removed: 11


In [65]:
print(f"\nBefore normalization:")
print(f"  X_train range: [{X_train.min():.2f}, {X_train.max():.2f}]")

# min/max
X_train_min = np.min(X_train, axis=0)
X_train_max = np.max(X_train, axis=0)

# Avoid division by zero
X_range = X_train_max - X_train_min
X_range = np.where(X_range == 0, 1, X_range)

# Apply min-max normalization: (X - min) / (max - min)
X_train_normalized = (X_train - X_train_min) / X_range
X_test_normalized = (X_test - X_train_min) / X_range

print(f"After normalization:")
print(f"  X_train_normalized range: [{X_train_normalized.min():.2f}, {X_train_normalized.max():.2f}]")
print(f"  All values now in [0, 1] range ✓")


Before normalization:
  X_train range: [0.13, 1515.00]
After normalization:
  X_train_normalized range: [0.00, 1.00]
  All values now in [0, 1] range ✓


In [72]:
class BinaryLogisticRegression: 
    
    def __init__(self, learning_rate=0.01, n_iterations=1000):
        self.learning_rate = learning_rate
        self.n_iterations = n_iterations
        self.theta = None
        self.cost_history = []
        
    def sigmoid(self, z):
        z = np.clip(z, -500, 500) # limit the values in a NumPy array to a specified minimum and maximum range. Values outside this interval are replaced by the interval's edge values. 
        return 1 / (1 + np.exp(-z))
    
    def fit(self, X, y):
        m, n = X.shape
        
        X_with_bias = np.column_stack([np.ones(m), X])
        self.theta = np.zeros(X_with_bias.shape[1])
        
        for iteration in range(self.n_iterations):
            z = X_with_bias @ self.theta
            h_theta = self.sigmoid(z)
            
            errors = h_theta - y
            # Update with gradient
            gradients = (1/m) * (X_with_bias.T @ errors)
            self.theta -= self.learning_rate * gradients
            
            if iteration % 100 == 0:
                h_theta_clipped = np.clip(h_theta, 1e-15, 1 - 1e-15)
                cost = -1/m * np.sum(y * np.log(h_theta_clipped) + (1 - y) * np.log(1 - h_theta_clipped))
                self.cost_history.append(cost)
        
        return self
    
    def predict_prob(self, X):
        m = len(X)
        X_with_bias = np.column_stack([np.ones(m), X])
        z = X_with_bias @ self.theta
        return self.sigmoid(z)
    
    def predict(self, X, threshold=0.5):
        return (self.predict_prob(X) >= threshold).astype(int)
    

In [73]:
class SoftmaxLogisticRegression:
    
    def __init__(self, learning_rate=0.01, n_iterations=1000):
        self.learning_rate = learning_rate
        self.n_iterations = n_iterations
        self.theta = None
        self.cost_history = []
        self.classes = None
    
    def softmax(self, z):
        """Softmax activation function"""
        z = z - np.max(z, axis=1, keepdims=True)
        exp_z = np.exp(z)
        return exp_z / np.sum(exp_z, axis=1, keepdims=True)
    
    def fit(self, X, y):
        """Fit multi-class logistic regression using softmax"""
        m, _ = X.shape
        self.classes = np.unique(y)
        n_classes = len(self.classes)
        
        # Add bias term
        X_with_bias = np.column_stack([np.ones(m), X])
        
        # Initialize parameters
        self.theta = np.random.randn(X_with_bias.shape[1], n_classes) * 0.01
        
        # Gradient descent
        for iteration in range(self.n_iterations):
            z = X_with_bias @ self.theta
            h_theta = self.softmax(z)
            
            # Convert y to one-hot encoding
            y_one_hot = np.zeros((m, n_classes))
            for i, cls in enumerate(self.classes):
                y_one_hot[y == cls, i] = 1
            
            # Cross-entropy loss
            h_theta_clipped = np.clip(h_theta, 1e-15, 1 - 1e-15)
            cost = -1/m * np.sum(y_one_hot * np.log(h_theta_clipped))
            
            # Gradients
            errors = h_theta - y_one_hot
            gradients = (1/m) * (X_with_bias.T @ errors)
            
            # Update parameters
            self.theta -= self.learning_rate * gradients
            
            if iteration % 100 == 0:
                self.cost_history.append(cost)
        
        return self
    
    def predict_prob(self, X):
        """Predict class probabilities"""
        m = len(X)
        X_with_bias = np.column_stack([np.ones(m), X])
        z = X_with_bias @ self.theta
        return self.softmax(z)
    
    def predict(self, X):
        """Predict class labels"""
        prob = self.predict_prob(X)
        class_indices = np.argmax(prob, axis=1)
        return self.classes[class_indices]

In [None]:
class OneVsAll:
    def __init__ (self, learning_rate=0.01, n_iterations=1000):
        self.learning_rate = learning_rate
        self.n_iterations = n_iterations
        self.model = {}
        self.classes = None
        
    def fit(self, X, y):
        self.classes = np.unique(y)
        
        print(f"\nTraining {len(self.classes)} OvA binary classifiers...")
        for cls in self.classes:
            y_binary = (y == cls).astype(int)
            
            model = BinaryLogisticRegression(learning_rate=self.learning_rate, n_iterations=self.n_iterations)
            model.fit(X, y_binary)
            
            self.model[cls] = model
        print(f"Trained {len(self.classes)} classifiers")
        return self
    
    def predict_prob(self, X):
        n_samples = len(X)
        prob = np.zeros((n_samples, len(self.classes)))
        
        for i, cls in enumerate(self.classes):
            prob[:, i] = self.model[cls].predict_prob(X)
            
        prob /= prob.sum(axis=1, keepdims=True)
        return prob
    
    def predict(self, X):
        prob = self.predict_prob(X)
        return self.classes[np.argmax(prob, axis=1)]

In [79]:
ova_model = OneVsAll(learning_rate=0.1, n_iterations=1000)
ova_model.fit(X_train_normalized, y_train)

y_train_pred_ova = ova_model.predict(X_train_normalized)
y_test_pred_ova = ova_model.predict(X_test_normalized)

train_acc_ova = np.mean(y_train_pred_ova == y_train)
test_acc_ova = np.mean(y_test_pred_ova == y_test)

print(f"\nOvA Results:")
print(f"  Train Accuracy: {train_acc_ova:.4f}")
print(f"  Test Accuracy: {test_acc_ova:.4f}")


Training 3 OvA binary classifiers...
✓ Trained 3 classifiers

OvA Results:
  Train Accuracy: 0.9918
  Test Accuracy: 0.9850


In [83]:
from itertools import combinations

class OneVsOne:
    
    def __init__(self, learning_rate=0.01, n_iterations=1000):
        self.learning_rate = learning_rate
        self.n_iterations = n_iterations
        self.model = {}
        self.classes = None
        
    def fit(self, X, y):
        self.classes = np.unique(y)
        
        class_pairs = list(combinations(self.classes, 2))
        print(f"\nTraining {len(class_pairs)} OvO binary classifiers...")

        for cls1, cls2 in class_pairs:
            mask = (y == cls1) | (y == cls2)
            X_pair = X[mask]
            y_pair = y[mask]
            
            y_binary = (y_pair == cls1).astype(int)
            
            model = BinaryLogisticRegression(
                learning_rate=self.learning_rate,
                n_iterations=self.n_iterations
            )
            
            model.fit(X_pair, y_binary)
            self.model[(cls1, cls2)] = model
            
        print(f"Trained {len(class_pairs)} classifiers")
        return self
    
    def predict(self, X):
        n_samples = len(X)
        votes = np.zeros((n_samples, len(self.classes)))
        
        for (cls1, cls2), model in self.model.items():
            prob = model.predict_prob(X)
            
            cls1_idx = self.classes.tolist().index(cls1)
            cls2_idx = self.classes.tolist().index(cls2)
            
            # Class 1 gets votes where prob >= 0.5
            votes[prob >= 0.5, cls1_idx] += 1
            votes[prob < 0.5, cls2_idx] += 1
        
        return self.classes[np.argmax(votes, axis=1)]


In [84]:
ovo_model = OneVsOne(learning_rate=0.1, n_iterations=1000)
ovo_model.fit(X_train_normalized, y_train)

y_train_pred_ovo = ovo_model.predict(X_train_normalized)
y_test_pred_ovo = ovo_model.predict(X_test_normalized)

train_acc_ovo = np.mean(y_train_pred_ovo == y_train)
test_acc_ovo = np.mean(y_test_pred_ovo == y_test)

print(f"\nOvO Results:")
print(f"  Train Accuracy: {train_acc_ovo:.4f}")
print(f"  Test Accuracy: {test_acc_ovo:.4f}")


Training 3 OvO binary classifiers...
Trained 3 classifiers

OvO Results:
  Train Accuracy: 0.9918
  Test Accuracy: 0.9850


In [85]:
# Train Softmax
softmax_model = SoftmaxLogisticRegression(learning_rate=0.1, n_iterations=1000)
softmax_model.fit(X_train_normalized, y_train)

y_train_pred_softmax = softmax_model.predict(X_train_normalized)
y_test_pred_softmax = softmax_model.predict(X_test_normalized)

train_acc_softmax = np.mean(y_train_pred_softmax == y_train)
test_acc_softmax = np.mean(y_test_pred_softmax == y_test)

print(f"\nSoftmax Results:")
print(f"  Train Accuracy: {train_acc_softmax:.4f}")
print(f"  Test Accuracy: {test_acc_softmax:.4f}")


Softmax Results:
  Train Accuracy: 0.9918
  Test Accuracy: 0.9850


In [86]:
print(f"\n1. TEST SET ACCURACY COMPARISON:")
print(f"   One-vs-All (OvA):     {test_acc_ova:.4f}")
print(f"   One-vs-One (OvO):     {test_acc_ovo:.4f}")
print(f"   Softmax Regression:   {test_acc_softmax:.4f}")

print(f"\n2. TRAINING SET ACCURACY:")
print(f"   One-vs-All (OvA):     {train_acc_ova:.4f}")
print(f"   One-vs-One (OvO):     {train_acc_ovo:.4f}")
print(f"   Softmax Regression:   {train_acc_softmax:.4f}")

print(f"\n3. CONVERGENCE BEHAVIOR:")
print(f"   OvA:       Trains {len(np.unique(y_train))} binary classifiers")
print(f"   OvO:       Trains {len(list(combinations(np.unique(y_train), 2)))} binary classifiers")
print(f"   Softmax:   Direct multi-class training")

print(f"\n4. COMPUTATIONAL COMPLEXITY:")
n_classes = len(np.unique(y_train))
print(f"   OvA:       O(K) binary classifiers, where K={n_classes}")
print(f"   OvO:       O(K²) binary classifiers, where K(K-1)/2={(n_classes*(n_classes-1))//2}")
print(f"   Softmax:   O(1) multi-class classifier")

best_method = ['OvA', 'OvO', 'Softmax'][np.argmax([test_acc_ova, test_acc_ovo, test_acc_softmax])]
best_acc = max([test_acc_ova, test_acc_ovo, test_acc_softmax])
print(f"\n5. BEST PERFORMANCE:")
print(f"   Method: {best_method}")
print(f"   Test Accuracy: {best_acc:.4f}")


1. TEST SET ACCURACY COMPARISON:
   One-vs-All (OvA):     0.9850
   One-vs-One (OvO):     0.9850
   Softmax Regression:   0.9850

2. TRAINING SET ACCURACY:
   One-vs-All (OvA):     0.9918
   One-vs-One (OvO):     0.9918
   Softmax Regression:   0.9918

3. CONVERGENCE BEHAVIOR:
   OvA:       Trains 3 binary classifiers
   OvO:       Trains 3 binary classifiers
   Softmax:   Direct multi-class training

4. COMPUTATIONAL COMPLEXITY:
   OvA:       O(K) binary classifiers, where K=3
   OvO:       O(K²) binary classifiers, where K(K-1)/2=3
   Softmax:   O(1) multi-class classifier

5. BEST PERFORMANCE:
   Method: OvA
   Test Accuracy: 0.9850


In [None]:
fig, axes = plt.subplots(2, 2, figsize=(14, 10))

# Plot 1: Cost function convergence (Softmax)
ax = axes[0, 0]
ax.plot(softmax_model.cost_history, marker='o', color='purple', linewidth=2)
ax.set_xlabel('Iterations (×100)')
ax.set_ylabel('Cross-Entropy Loss')
ax.set_title('Softmax Regression: Cost Function Convergence')
ax.grid(True, alpha=0.3)

# Plot 2: Accuracy Comparison
ax = axes[0, 1]
methods = ['OvA', 'OvO', 'Softmax']
train_accs = [train_acc_ova, train_acc_ovo, train_acc_softmax]
test_accs = [test_acc_ova, test_acc_ovo, test_acc_softmax]

x = np.arange(len(methods))
width = 0.35

bars1 = ax.bar(x - width/2, train_accs, width, label='Train', alpha=0.8)
bars2 = ax.bar(x + width/2, test_accs, width, label='Test', alpha=0.8)

ax.set_ylabel('Accuracy')
ax.set_title('Accuracy Comparison: All Three Methods')
ax.set_xticks(x)
ax.set_xticklabels(methods)
ax.legend()
ax.grid(True, alpha=0.3, axis='y')
ax.set_ylim([0, 1.1])

# Add value labels on bars
for bars in [bars1, bars2]:
    for bar in bars:
        height = bar.get_height()
        ax.text(bar.get_x() + bar.get_width()/2., height,
                f'{height:.3f}',
                ha='center', va='bottom', fontsize=9)

# Plot 3: Confusion Matrix for best method
ax = axes[1, 0]
if best_method == 'OvA':
    y_pred_best = y_test_pred_ova
elif best_method == 'OvO':
    y_pred_best = y_test_pred_ovo
else:
    y_pred_best = y_test_pred_softmax

classes = np.unique(y_test)
confusion = np.zeros((len(classes), len(classes)))
for i, true_cls in enumerate(classes):
    for j, pred_cls in enumerate(classes):
        confusion[i, j] = np.sum((y_test == true_cls) & (y_pred_best == pred_cls))

im = ax.imshow(confusion, cmap='Blues', aspect='auto')
ax.set_xlabel('Predicted Class')
ax.set_ylabel('True Class')
ax.set_title(f'Confusion Matrix ({best_method})')
ax.set_xticks(range(len(classes)))
ax.set_yticks(range(len(classes)))
ax.set_xticklabels(classes)
ax.set_yticklabels(classes)

for i in range(len(classes)):
    for j in range(len(classes)):
        text = ax.text(j, i, int(confusion[i, j]),
                      ha="center", va="center", color="black", fontsize=12)

plt.colorbar(im, ax=ax)

# Plot 4: Summary
ax = axes[1, 1]
ax.axis('off')

summary_text = f"""
MULTI-CLASS CLASSIFICATION RESULTS

Test Accuracy (Wine Dataset):
  • One-vs-All (OvA):      {test_acc_ova:.4f}
  • One-vs-One (OvO):      {test_acc_ovo:.4f}
  • Softmax Regression:    {test_acc_softmax:.4f}
  
BEST: {best_method} ({best_acc:.4f})

Data Summary:
  • Samples: {len(y_train)} train, {len(y_test)} test
  • Features: {X_train_normalized.shape[1]}
  • Classes: {len(np.unique(y_train))}
  • Outliers removed: {len(X) - len(X_train)}

Training Details:
  • Normalization: Min-Max [0, 1]
  • Learning rate: 0.1
  • Iterations: 1000
  • Train/Test: 75/25 (stratified)
"""

ax.text(0.05, 0.95, summary_text, transform=ax.transAxes,
        fontsize=10, verticalalignment='top', family='monospace',
        bbox=dict(boxstyle='round', facecolor='wheat', alpha=0.5))

plt.tight_layout()
plt.savefig('task3_multiclass_classification.png', dpi=150, bbox_inches='tight')
print("Plots saved as 'task3_multiclass_classification.png'")
plt.close()

✓ Plots saved as 'task3_multiclass_classification.png'
