<a href="https://colab.research.google.com/github/dzervenes/dzervenes.github.io/blob/master/e_Portfolio_Activity_Model_Performance_Measurement.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Introduction

In this notebook, I run experiments to explore how varying model and data parameters affects two performance metrics: AUC (area under the ROC curve) and the R² (coefficient of determination) score. The objective is to relate each parameter change to the resulting metric values, providing insights into the models' behaviour and predictive capabilities.

# Baseline values

Running the default model, without altering its parameters, yields the following baseline values:

- AUC (Linear Model): 0.994767718408118
- AUC (Multiclass Case): 0.9913333333333334
- R²: 0.9486081370449679

These baseline metrics will serve as a reference point for evaluating how changes to various parameters influence the model's performance. By comparing these values with those obtained from parameter tuning, we can gain insights into the model's sensitivity to different configurations.

**Regularisation strength**

In this section, I explore the effect of the regularisation parameter C on the model's performance, measured using the AUC metric for both binary and multiclass classification tasks. In scikit-learn's LogisticRegression, C is the inverse of the regularisation strength, so smaller values mean stronger regularisation.

C is set to two different values: C = 0.001 and C = 1000.

Because the AUC is computed on the same data used for fitting, this analysis shows how regularisation affects the classifier's fit to the training data rather than its ability to generalise to unseen data.

In [None]:
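from sklearn.datasets import load_breast_cancer, load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

# Binary task: breast cancer dataset; multiclass task: iris dataset
X_binary, y_binary = load_breast_cancer(return_X_y=True)
X_multi, y_multi = load_iris(return_X_y=True)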

# Testing with Small C = 0.001
print("\nTesting with C = 0.001")
clf_binary_small = LogisticRegression(C=0.001, solver="liblinear")
clf_binary_small.fit(X_binary, y_binary)
auc_binary_small = roc_auc_score(y_binary, clf_binary_small.predict_proba(X_binary)[:, 1])
print(f"Binary Classification - AUC: {auc_binary_small:.4f}")

clf_multi_small = LogisticRegression(C=0.001, solver="liblinear")
clf_multi_small.fit(X_multi, y_multi)
auc_multi_small = roc_auc_score(y_multi, clf_multi_small.predict_proba(X_multi), multi_class='ovr')
print(f"Multiclass Classification - AUC: {auc_multi_small:.4f}")

# Testing with Large C = 1000
print("\nTesting with C = 1000")
clf_binary_large = LogisticRegression(C=1000, solver="liblinear")
clf_binary_large.fit(X_binary, y_binary)
auc_binary_large = roc_auc_score(y_binary, clf_binary_large.predict_proba(X_binary)[:, 1])
print(f"Binary Classification - AUC: {auc_binary_large:.4f}")

clf_multi_large = LogisticRegression(C=1000, solver="liblinear")
clf_multi_large.fit(X_multi, y_multi)
auc_multi_large = roc_auc_score(y_multi, clf_multi_large.predict_proba(X_multi), multi_class='ovr')
print(f"Multiclass Classification - AUC: {auc_multi_large:.4f}")


Testing with C = 0.001
Binary Classification - AUC: 0.9751
Multiclass Classification - AUC: 0.8351

Testing with C = 1000
Binary Classification - AUC: 0.9969
Multiclass Classification - AUC: 0.9956


**Results**

Comparison of Binary Classification AUC

- Baseline AUC: 0.9948
- C = 0.001: AUC = 0.9751
- C = 1000: AUC = 0.9969

Comparison of Multiclass Classification AUC

- Baseline AUC: 0.9913
- C = 0.001: AUC = 0.8351
- C = 1000: AUC = 0.9956

For C = 0.001 (strong regularisation), both the binary and multiclass AUC values fall below the baseline, most sharply in the multiclass case (0.8351 against 0.9913). This indicates that excessive regularisation limits the model's ability to capture the structure of the data, leading to underfitting. For C = 1000 (weak regularisation), both AUC values exceed the baseline. Since these AUC values are measured on the training data, the gain shows that a less constrained model fits that data more closely; it does not by itself demonstrate better performance on unseen data. Overall, the comparison highlights the trade-off between regularisation strength and model fit, and the importance of tuning C against held-out data.
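To check whether the pattern above also holds on unseen data, the same comparison can be repeated with a held-out test set. The following is a minimal sketch, not part of the original experiment: the 70/30 stratified split, the seed, the extra value C = 1, and the helper name `train_test_auc` are all illustrative assumptions.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

X_binary, y_binary = load_breast_cancer(return_X_y=True)

# Hold out 30% of the data; the ratio and seed are illustrative choices.
X_train, X_test, y_train, y_test = train_test_split(
    X_binary, y_binary, test_size=0.3, random_state=0, stratify=y_binary
)


def train_test_auc(C):
    """Fit a liblinear logistic regression and return (train AUC, test AUC)."""
    clf = LogisticRegression(C=C, solver="liblinear").fit(X_train, y_train)
    auc_train = roc_auc_score(y_train, clf.predict_proba(X_train)[:, 1])
    auc_test = roc_auc_score(y_test, clf.predict_proba(X_test)[:, 1])
    return auc_train, auc_test


for C in [0.001, 1, 1000]:
    auc_train, auc_test = train_test_auc(C)
    print(f"C = {C}: train AUC = {auc_train:.4f}, test AUC = {auc_test:.4f}")
```

A large gap between the train and test AUC for a given C would point to overfitting, while low values on both would point to underfitting.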

**Solver**

In this section, I decided to experiment with the lbfgs solver for logistic regression to evaluate its impact on the model's performance. The solver is known for its efficiency and suitability for both binary and multiclass classification tasks.

However, during the experiments, the default number of iterations (max_iter=100) was insufficient for convergence, resulting in a ConvergenceWarning. To address this, I increased the number of iterations to 5000 to ensure proper convergence of the model.
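An alternative to raising max_iter, not used in the cell that follows, is to standardise the features first, since lbfgs tends to converge in fewer iterations when features share a similar scale. The sketch below assumes a StandardScaler pipeline with the default max_iter; the variable names are hypothetical, and the resulting AUC is not directly comparable with the unscaled results because scaling also changes the effect of regularisation.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X_binary, y_binary = load_breast_cancer(return_X_y=True)

# Standardise each feature to zero mean and unit variance before fitting.
scaled_lbfgs = make_pipeline(StandardScaler(), LogisticRegression(solver="lbfgs"))
scaled_lbfgs.fit(X_binary, y_binary)

# n_iter_ holds the number of iterations the solver actually used.
n_iter = scaled_lbfgs.named_steps["logisticregression"].n_iter_[0]
auc_scaled = roc_auc_score(y_binary, scaled_lbfgs.predict_proba(X_binary)[:, 1])
print(f"Iterations used: {n_iter}")
print(f"Binary Classification (scaled features) - AUC: {auc_scaled:.4f}")
```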

In [None]:
# Binary Classification with lbfgs
clf_binary_lbfgs = LogisticRegression(solver="lbfgs", max_iter=5000).fit(X_binary, y_binary)
auc_binary_lbfgs = roc_auc_score(y_binary, clf_binary_lbfgs.predict_proba(X_binary)[:, 1])
print(f"Binary Classification - AUC: {auc_binary_lbfgs:.4f}")

# Multiclass Classification with lbfgs (fits a multinomial model by default; the AUC is averaged one-vs-rest)
clf_multi_lbfgs = LogisticRegression(solver="lbfgs", max_iter=5000).fit(X_multi, y_multi)
auc_multi_lbfgs = roc_auc_score(y_multi, clf_multi_lbfgs.predict_proba(X_multi), multi_class="ovr")
print(f"Multiclass Classification - AUC: {auc_multi_lbfgs:.4f}")

Binary Classification - AUC: 0.9947
Multiclass Classification - AUC: 0.9983


**Results**

The experiment using the lbfgs solver with max_iter=5000 yielded the following AUC values:

- Binary Classification - AUC: 0.9947
- Multiclass Classification - AUC: 0.9983

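For binary classification, the lbfgs solver performs on a par with the baseline (0.9947 against 0.9948). For multiclass classification, lbfgs achieves a higher AUC than the baseline (0.9983 against 0.9913). Part of this gain may come from the multiclass formulation rather than the optimiser itself, since lbfgs fits a multinomial model by default. As the AUC is measured on the training data, the result reflects a closer fit rather than demonstrated generalisation.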
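To separate the effect of the solver from the effect of the multiclass formulation, lbfgs can be run both ways on the iris data. This is a rough sketch rather than part of the original experiment: it assumes that wrapping the estimator in scikit-learn's OneVsRestClassifier is an acceptable stand-in for a one-vs-rest fit, and the helper name `ovr_auc` is hypothetical.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.multiclass import OneVsRestClassifier

X_multi, y_multi = load_iris(return_X_y=True)

# Multinomial fit: a single model over all three classes.
multinomial = LogisticRegression(solver="lbfgs", max_iter=5000).fit(X_multi, y_multi)

# One-vs-rest fit: one binary lbfgs model per class.
one_vs_rest = OneVsRestClassifier(
    LogisticRegression(solver="lbfgs", max_iter=5000)
).fit(X_multi, y_multi)


def ovr_auc(model):
    """Training-set AUC, averaged one-vs-rest over the classes."""
    return roc_auc_score(y_multi, model.predict_proba(X_multi), multi_class="ovr")


print(f"lbfgs multinomial - AUC: {ovr_auc(multinomial):.4f}")
print(f"lbfgs one-vs-rest - AUC: {ovr_auc(one_vs_rest):.4f}")
```

If the two values differ noticeably, the multiclass scheme accounts for at least part of the change from the baseline.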

**Regularisation Penalty**

In this section, I explore two types of regularisation penalties, l1 and l2, to evaluate their impact on the performance of a logistic regression model for binary classification. Due to the computational intensity of l1 regularisation, the default number of iterations (max_iter=100) was insufficient for the solver to converge. To address this issue, I increased max_iter to 1000, providing the solver with additional iterations to optimise the model's coefficients effectively.

In [None]:
# Testing L1 Penalty with increased iterations
clf_binary_l1 = LogisticRegression(solver="liblinear", penalty="l1", max_iter=1000).fit(X_binary, y_binary)
auc_binary_l1 = roc_auc_score(y_binary, clf_binary_l1.predict_proba(X_binary)[:, 1])
print(f"Binary Classification (L1 Penalty) - AUC: {auc_binary_l1:.4f}")

# Testing L2 Penalty with increased iterations
clf_binary_l2 = LogisticRegression(solver="liblinear", penalty="l2", max_iter=1000).fit(X_binary, y_binary)
auc_binary_l2 = roc_auc_score(y_binary, clf_binary_l2.predict_proba(X_binary)[:, 1])
print(f"Binary Classification (L2 Penalty) - AUC: {auc_binary_l2:.4f}")

Binary Classification (L1 Penalty) - AUC: 0.9951
Binary Classification (L2 Penalty) - AUC: 0.9948


**Results**

- L1 Penalty - AUC: 0.9951
- L2 Penalty - AUC: 0.9948

Both penalties achieve high AUC values: l2 matches the baseline (0.9948), while l1 is marginally higher (0.9951). A difference of 0.0003 on the training data is too small to conclude that one penalty generalises better than the other. The main practical distinction is that l1 can drive some coefficients to exactly zero, acting as a form of feature selection, whereas l2 shrinks coefficients without eliminating them.
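The feature-selection behaviour of l1 can be inspected directly by counting the non-zero coefficients of each fitted model. This is a minimal sketch using the same settings as the cell above; the helper name `count_nonzero_coefficients` is an assumption introduced for illustration.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression

X_binary, y_binary = load_breast_cancer(return_X_y=True)


def count_nonzero_coefficients(penalty):
    """Fit a liblinear logistic regression and count its non-zero weights."""
    clf = LogisticRegression(solver="liblinear", penalty=penalty, max_iter=1000)
    clf.fit(X_binary, y_binary)
    return int(np.count_nonzero(clf.coef_))


n_features = X_binary.shape[1]
for penalty in ["l1", "l2"]:
    n_nonzero = count_nonzero_coefficients(penalty)
    print(f"{penalty} penalty: {n_nonzero} of {n_features} coefficients are non-zero")
```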

**Train-Test Split**

In this section, I experiment with different train-test splits to observe how the proportion of data used for training and testing affects the model's performance. By holding out 20%, 50%, and 70% of the data for testing, I aim to understand how the amount of training data affects the R² score on unseen data.

In [None]:
from sklearn import svm
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

# X and y are the regression features and targets, assumed to be defined in an earlier cell
splits = [0.2, 0.5, 0.7]

for test_size in splits:
    # Split the data
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=test_size, random_state=42)

    # Train a linear support vector regression (SVR) model
    svr = svm.SVR(kernel="linear")
    svr.fit(X_train, y_train)

    # Make predictions
    y_pred = svr.predict(X_test)

    # Calculate R² on the held-out test set
    r2 = r2_score(y_test, y_pred)

    # Print results for this split
    print(f"Test Size: {test_size}")
    print(f"R² Score: {r2:.4f}\n")

Test Size: 0.2
R² Score: 0.9481

Test Size: 0.5
R² Score: 0.9392

Test Size: 0.7
R² Score: 0.9218



**Results**

The experiment shows that the R² score decreases slightly as the test size increases. With the smallest test size (20%), the model achieves the highest R² score of 0.9481, indicating better performance when more data is available for training. As the test size grows to 50% and 70%, the R² score falls to 0.9392 and 0.9218, respectively, reflecting the trade-off between the amount of training data and the model's ability to generalise to unseen data.

It is worth noting that the difference between the R² score of 0.9392 at test size 0.5 and the baseline R² (which was also calculated based on a test size of 0.5) is likely due to the randomness introduced during the train-test split.
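The influence of the random split can be estimated by repeating each split with several seeds and reporting the mean and standard deviation of R². The dataset behind X and y is not shown in this section, so the sketch below uses a synthetic make_regression dataset as a stand-in; `X_demo`, `y_demo`, `r2_over_splits`, and the choice of 10 repeats are all assumptions made for illustration.

```python
from sklearn import svm
from sklearn.datasets import make_regression
from sklearn.model_selection import ShuffleSplit, cross_val_score

# Synthetic stand-in for the notebook's X and y (assumption, not the real data).
X_demo, y_demo = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=0)


def r2_over_splits(X, y, test_size, n_repeats=10):
    """Return the mean and standard deviation of test R² over random splits."""
    cv = ShuffleSplit(n_splits=n_repeats, test_size=test_size, random_state=0)
    scores = cross_val_score(svm.SVR(kernel="linear"), X, y, cv=cv, scoring="r2")
    return scores.mean(), scores.std()


for test_size in [0.2, 0.5, 0.7]:
    mean_r2, std_r2 = r2_over_splits(X_demo, y_demo, test_size)
    print(f"Test Size: {test_size} - R² mean: {mean_r2:.4f}, std: {std_r2:.4f}")
```

If the standard deviation is of the same order as the gap between two results, that gap can reasonably be attributed to the split rather than to the setting being varied.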

**Noise levels**

In this section, I experiment with different noise levels added to the dataset to evaluate their impact on the model's performance. Gaussian noise, scaled by a noise level of 0, 0.5, or 1.0, is added to the features before the data is split, so both the training and test sets are affected. I then train a linear SVR model and measure the resulting R² score on the test set. This analysis helps assess the model's robustness to data distortion.


In [None]:
import numpy as np

# Experiment with different noise levels (X and y as in the train-test split experiment)
noise_levels = [0, 0.5, 1.0]
test_size = 0.5

for noise in noise_levels:
    # Add noise to the features
    random_state = np.random.RandomState(42)
    X_noisy = X + noise * random_state.normal(size=X.shape)

    # Split the noisy data
    X_train, X_test, y_train, y_test = train_test_split(X_noisy, y, test_size=test_size, random_state=42)

    # Train SVM model
    svr = svm.SVR(kernel="linear")
    svr.fit(X_train, y_train)

    # Make predictions
    y_pred = svr.predict(X_test)

    # Calculate metrics
    r2 = r2_score(y_test, y_pred)

    print(f"Noise Level: {noise}")
    print(f"R² Score: {r2:.4f}\n")

Noise Level: 0
R² Score: 0.9392

Noise Level: 0.5
R² Score: 0.8293

Noise Level: 1.0
R² Score: 0.6389



**Results**

The experiment shows that increasing the noise level reduces the model's performance: R² falls from 0.9392 without noise to 0.8293 at a noise level of 0.5 and 0.6389 at 1.0. Higher noise levels make it harder for the model to capture the underlying relationship between features and target, highlighting its sensitivity to data distortion.
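The noise above has the same absolute standard deviation for every feature, so its relative effect depends on each feature's scale. One possible refinement, sketched here under the assumption that the features differ in scale (not verified, since the dataset is defined outside this section), is to scale the noise by each feature's standard deviation. The function `add_gaussian_noise`, its `relative` flag, and the toy array `X_demo` are hypothetical.

```python
import numpy as np


def add_gaussian_noise(X, level, seed=42, relative=False):
    """Add Gaussian noise to X, optionally scaled per feature.

    With relative=False the noise has standard deviation `level` for every
    feature, as in the cell above. With relative=True it has standard
    deviation `level` times that feature's own standard deviation.
    """
    rng = np.random.RandomState(seed)
    noise = rng.normal(size=X.shape)
    if relative:
        noise = noise * X.std(axis=0)
    return X + level * noise


# Toy features with very different scales (illustrative data only).
X_demo = np.column_stack([np.linspace(0, 1, 5), np.linspace(0, 1000, 5)])

X_abs = add_gaussian_noise(X_demo, level=0.5)
X_rel = add_gaussian_noise(X_demo, level=0.5, relative=True)
print("Absolute noise, mean shift per feature:", np.abs(X_abs - X_demo).mean(axis=0))
print("Relative noise, mean shift per feature:", np.abs(X_rel - X_demo).mean(axis=0))
```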

**Conclusion**

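These experiments show how regularisation strength, solver choice, penalty type, train-test split, and noise levels affect model performance. The classification results are based on AUC measured on the training data, so they describe model fit, while the regression results are based on R² measured on held-out test data. Together, the results highlight the importance of tuning parameters and ensuring data quality for achieving reliable and generalisable models.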