                                                         KNN & PCA

Question 1: What is K-Nearest Neighbors (KNN) and how does it work in both
classification and regression problems?

Answer- K-Nearest Neighbors (KNN) is a simple, non-parametric, supervised learning algorithm used for both classification and regression that stores training data and predicts a new data point's class or value by finding its "k" nearest neighbors in the training set and performing a majority vote (for classification) or an average (for regression) of their labels or values. It is a "lazy" or "instance-based" learner because it doesn't build a traditional model but rather performs all computation during the prediction phase.

**How it works**

1. **Store training data:** KNN keeps the entire training dataset in memory.
Choose K: Select the number of nearest neighbors (K) to consider for prediction.

2. **Calculate distances:** For a new, unlabeled data point, the algorithm calculates the distance between this new point and all points in the training set.

3. **Identify neighbors:** The "k" data points from the training set that are closest to the new data point are identified as its neighbors.


4. **Predict:**

 * **For Classification:** The class label that is most frequent among the k nearest neighbors is assigned to the new data point.

 * **For Regression:** The average (or weighted average) of the values of the k nearest neighbors is computed and used as the predicted value for the new data point.



Question 2: What is the Curse of Dimensionality and how does it affect KNN
performance?

The Curse of Dimensionality describes how data becomes sparse and distances between points become less meaningful as the number of features (dimensions) increases, exponentially increasing the volume of the feature space. This adversely affects K-Nearest Neighbors (KNN) by requiring vastly more data for effective pattern identification, leading to increased computational complexity, a higher risk of overfitting, and generally worse performance as distances become less distinct.

**What is the Curse of Dimensionality?**

**Sparsity:**

In a low-dimensional space, data points are relatively close together. As dimensions increase, data points spread out, becoming sparse, with more empty space and fewer points per unit volume.

**Volume Growth:**

The volume of the feature space grows exponentially with the number of dimensions, making it difficult to adequately sample the entire space.

**Distance Concentration:**

Distances between points become more uniform in higher dimensions, making it harder to differentiate between nearest and farthest neighbors based on the chosen distance metric.

**How does it affect KNN performance?**

**Increased Data Requirement:**

KNN relies on identifying neighbors. With more dimensions, a much larger dataset is needed to find meaningful neighbors that are close enough to form reliable "neighborhoods".

**Deteriorating Performance:**

As distances become less distinct, the identified nearest neighbors are often very far from the target point, making it harder for KNN to classify correctly.

**Computational Complexity:**

High-dimensional data increases the computational cost of calculating distances and searching for neighbors, making the algorithm slower and resource-intensive.

**Overfitting:**

The sparsity and vastness of high-dimensional spaces can lead to overfitting, where the KNN model learns noise instead of actual patterns, resulting in poor generalization to new data.

**Strategies to mitigate the curse of dimensionality for KNN: **

**Dimensionality Reduction:**

Techniques like Principal Component Analysis (PCA) reduce the number of features while preserving essential information, making the data more manageable for KNN.

**Feature Selection:**

Identifying and using only the most relevant features can reduce dimensionality and improve performance.

**Alternative Algorithms:**

For high-dimensional data, algorithms like decision trees or support vector machines (SVMs) that are less sensitive to the curse of dimensionality may be more effective.



Question 3: What is Principal Component Analysis (PCA)? How is it different from
feature selection?

Answer- Principal Component Analysis (PCA) is a statistical technique used for dimensionality reduction by transforming the original, potentially correlated variables into a new set of uncorrelated variables called

principal components. These components are linear combinations of the original features and are ordered by the amount of variance they capture in the data—PC1 captures the most, PC2 the next most, and so on

**Key aspects of PCA:**


* Transformation: The original features are linearly combined into new principal components.


* Variance maximization: Each component captures as much variance as possible.


* Orthogonality: Components are orthogonal (uncorrelated).


* Unsupervised: PCA doesn't consider any target variable or labels for constructing components


* Use cases: Commonly applied for visualization, preprocessing, noise reduction, and tackling the curse of dimensionality

**What is Feature Selection?**

Feature selection involves choosing a subset of the existing original features that are most relevant for a predictive model or analysis. It doesn't create new features—it filters the existing ones.

**Key aspects of feature selection:**

* Subset selection: Keeps only original features (no transformation).

* Supervised or unsupervised: Many methods consider the target variable, especially in classification/regression tasks

* Purposes: Simplify models, reduce training time, improve interpretability, and avoid overfitting

**PCA vs. Feature Selection: Comparison Table**


| Feature                        | PCA                                             | Feature Selection                               |
| ------------------------------ | ----------------------------------------------- | ----------------------------------------------- |
| Creates new features?          | Yes (linear combinations, principal components) | No (selects from original features only)        |
| Uses target variable?          | No (unsupervised)                               | Often yes (supervised), but can be unsupervised |
| Interpretability of features?  | Lower — components are harder to interpret      | Higher — original features retain their meaning |
| Dimensionality reduction style | Feature extraction                              | Feature subset selection                        |
| Common goals                   | Reduce dimensions, capture variance             | Simplify models, improve interpretability       |


Question 4: What are eigenvalues and eigenvectors in PCA, and why are they
important?

Answer- n PCA, eigenvectors represent the directions (principal components) of maximum variance in the data, while eigenvalues indicate the amount of variance captured by each corresponding eigenvector. They are crucial because they rank the principal components by importance, enabling dimensionality reduction by identifying the most significant features that capture the most information, which is vital for simplifying complex datasets.

**Eigenvectors:**

**What they are:**

These are non-zero vectors that point in the directions of maximum variance in the dataset.

**What they do:**

They define the principal components, which are the new axes that best represent the data.

**How to think of them:**

Imagine your data plotted in a multidimensional scatterplot; an eigenvector is a particular direction or axis on that plot.

**Eigenvalues:**

What they are:They are scalar values that measure the magnitude of variance associated with each principal component (eigenvector).

What they do: They quantify the amount of information or variance explained by each corresponding eigenvector.

How to think of them: A higher eigenvalue corresponds to a more important direction of variance.

Why they are important:

**Dimensionality Reduction:**

By examining the eigenvalues, you can determine which principal components (eigenvectors) explain the most variance and are most significant. You can then discard components with small eigenvalues, thereby reducing the data's dimensionality while retaining most of its important information.

**Feature Importance:**

Eigenvalues provide a direct measure of how important each feature (principal component) is.

**Data Simplification:**

PCA uses eigenvectors and eigenvalues to transform the data into a new coordinate system (the principal components) that highlights the most important patterns and reduces noise and redundancy, making the data easier to analyze and use.




Question 5: How do KNN and PCA complement each other when applied in a single
pipeline?

When Principal Component Analysis (PCA) is used as a preprocessing step before a K-Nearest Neighbors (KNN) algorithm in a single pipeline, they complement each other by addressing the "curse of dimensionality". PCA reduces the number of features in the data, making the KNN algorithm more efficient and reliable, while still preserving the most important information.

**How PCA enhances KNN**

A single pipeline that applies PCA before KNN offers several key benefits:

**Mitigates the curse of dimensionality**

The "curse of dimensionality" refers to the problem where the performance of algorithms like KNN deteriorates with a large number of features. In high-dimensional space, data points are sparse, and the distance to the nearest neighbor can become similar to the average distance, making KNN less effective.

**How PCA helps:** PCA reduces the number of dimensions by projecting the data onto a smaller number of principal components. This transformation effectively compacts the data into a lower-dimensional space where the concept of distance is more meaningful and reliable for KNN.

**Improves computational efficiency**

KNN requires calculating the distance from a new data point to every other point in the training set, a process that is computationally expensive and slow with a high number of features.


*  **How PCA helps:** By reducing the number of features, PCA dramatically decreases the number of calculations needed for each distance metric, resulting in a significant speedup of the KNN algorithm.

**Reduces noise**
High-dimensional datasets often contain noisy or redundant features that do not contribute to predictive power but can negatively influence KNN's distance calculations.

* **How PCA helps:** The principal components are ordered by the amount of variance they capture. Components with low variance are often associated with noise and are discarded, allowing the pipeline to retain only the most informative features. This makes the KNN classification more robust and less susceptible to misleading, noisy features.

**Removes multicollinearity**

Multicollinearity occurs when features are highly correlated with one another. In KNN, this means multiple features are essentially capturing the same information, which can skew distance calculations and reduce the model's stability.

 * **How PCA helps:** PCA creates a new set of orthogonal (uncorrelated) features known as principal components. This naturally eliminates multicollinearity and ensures that each feature in the reduced dataset contributes unique information, leading to a more stable and reliable KNN model.

**The pipeline in practice**

1. Standardization: The data is first standardized (e.g., using StandardScaler) so that each feature contributes equally to the PCA.

2. PCA: The standardized data is passed to the PCA algorithm, which identifies and selects the principal components that capture the majority of the data's variance. The number of components to keep can be chosen based on an explained variance ratio.

3. KNN: The transformed, lower-dimensional data is then used to train the KNN algorithm. When a new data point is introduced, it is also transformed using the same PCA model before finding its nearest neighbors.

**Potential drawbacks**

While highly beneficial, this pipeline has some drawbacks:

* **Information loss:** As a lossy compression method, PCA discards some variance from the data. While this often represents noise, it is possible for important nuances to be lost if too many components are discarded, potentially harming the model's accuracy.

* **Lack of interpretability:** The principal components are new, abstract features that are linear combinations of the original variables. Interpreting the meaning of a principal component can be challenging, making it difficult to understand exactly which original features are driving the KNN classification.

Question 6: Train a KNN Classifier on the Wine dataset with and without feature
scaling. Compare model accuracy in both cases.

Answer- 1. Why Feature Scaling Matters for KNN

* The KNN algorithm uses distance metrics (e.g., Euclidean). When features are on different scales, those with larger numeric ranges dominate the distance calculation, skewing predictions. Scaling helps each feature contribute more equally.

* Empirical results (on classification tasks involving PCA and logistic regression) show that scaling can dramatically boost accuracy—from just 35.19% to 96.30%, for example.

2. What Past Implementations Reveal

* In a Medium tutorial using the UCI wine quality dataset and KNN:

 * Without scaling, the model still reached about 85–86% accuracy, though with imbalance issues—especially poorer classification of minority classes.

 * When feature selection or PCA weren’t discussed, but scaling combined with other techniques (like balancing or resampling) markedly influences performance.

* Other references to KNN on wine data show a wide range of accuracy outcomes, varying from ~72% to near-100%, depending on preprocessing steps—including scaling, hyperparameter tuning, and dealing with class imbalance.

3. Proposed Step-By-Step Implementation (Python / scikit-learn)

In [1]:
from sklearn.datasets import load_wine
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

In [2]:
# Load data
data = load_wine()
X, y = data.data, data.target
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)


In [3]:
# 1. Without scaling
knn = KNeighborsClassifier(n_neighbors=5)
knn.fit(X_train, y_train)
y_pred = knn.predict(X_test)
acc_unscaled = accuracy_score(y_test, y_pred)


In [4]:
# 2. With scaling
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)
knn_scaled = KNeighborsClassifier(n_neighbors=5)
knn_scaled.fit(X_train_scaled, y_train)
y_pred_scaled = knn_scaled.predict(X_test_scaled)
acc_scaled = accuracy_score(y_test, y_pred_scaled)

print(f"Accuracy without scaling: {acc_unscaled:.2%}")
print(f"Accuracy with scaling:    {acc_scaled:.2%}")

Accuracy without scaling: 74.07%
Accuracy with scaling:    96.30%


| Setup                  | Expected Accuracy  | Notes                                                             |
| ---------------------- | ------------------ | ----------------------------------------------------------------- |
| Without Scaling        | \~85–90%           | Some features dominate, distances skewed                          |
| With Scaling           | \~95–97% or higher | Balanced contributions, more reliable distances                   |
| Extreme cases (imbal.) | Lower performance  | Minority class misclassified (balanced accuracy low)([Medium][1]) |

[1]: https://medium.com/%40caleb.lam10/building-a-knn-model-with-the-uci-wine-dataset-25d89db2003b?utm_source=chatgpt.com "Building a KNN Model With the UCI Wine Dataset | by Caleb Lam"


Question 7: Train a PCA model on the Wine dataset and print the explained variance
ratio of each principal component.

Here’s how you can train a PCA model on the Wine dataset (from sklearn) and print the explained variance ratio of each principal component—a key step to understand how much variance each component captures.

In [5]:
# Step-by-Step Guide with Python (using scikit-learn)

from sklearn.datasets import load_wine
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA


In [6]:
# 1. Load the Wine dataset
wine = load_wine()
X = wine.data  # features (13 chemical measurements)
y = wine.target  # class labels

In [7]:
# 2. Standardize the features (important for PCA)
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

In [8]:
# 3. Apply PCA using all available components
pca = PCA(n_components=X.shape[1])  # 13 components
pca.fit(X_scaled)

In [9]:
# 4. Print explained variance ratio
for idx, ratio in enumerate(pca.explained_variance_ratio_, start=1):
    print(f"Principal Component {idx}: {ratio:.4f}")

Principal Component 1: 0.3620
Principal Component 2: 0.1921
Principal Component 3: 0.1112
Principal Component 4: 0.0707
Principal Component 5: 0.0656
Principal Component 6: 0.0494
Principal Component 7: 0.0424
Principal Component 8: 0.0268
Principal Component 9: 0.0222
Principal Component 10: 0.0193
Principal Component 11: 0.0174
Principal Component 12: 0.0130
Principal Component 13: 0.0080


This will output how much variance each of the 13 components explains. If you’d like, you can also view the cumulative variance to decide how many components might suffice:

In [10]:
import numpy as np
cumulative = np.cumsum(pca.explained_variance_ratio_)
print("Cumulative explained variance:", cumulative)

Cumulative explained variance: [0.36198848 0.55406338 0.66529969 0.73598999 0.80162293 0.85098116
 0.89336795 0.92017544 0.94239698 0.96169717 0.97906553 0.99204785
 1.        ]


**What Explained Variance Ratio Means**

The explained variance ratio indicates the fraction of the dataset’s total variance captured by each principal component. It helps answer questions like: “How many components are needed to preserve, say, 90–95% of the data’s information?”

* In general tutorials, it's shown that the first few components can capture most variance.

* For instance, one source shared a case where with unscaled data and only two components, the explained variance ratios were approximately [0.99809, 0.00174], meaning the first component captured ~99.8% of the variance


* However, when features are properly scaled and all components are used, you'll see a more spread-out distribution depending on the data's structure

Question 8: Train a KNN Classifier on the PCA-transformed dataset (retain top 2
components). Compare the accuracy with the original dataset.


Here’s a Python code snippet to train a K-Nearest Neighbors (KNN) classifier on both:

1. The original Wine dataset, and

2. The Wine dataset after applying PCA, reduced to the top 2 principal components;

then compare their accuracies:

In [11]:
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score

In [12]:
# 1. Load data
data = load_wine()
X, y = data.data, data.target

In [13]:
# 2. Split into train/test sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y
)


In [14]:
# 3. Scale features
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

In [15]:
# 4. Train KNN on original scaled data
knn_orig = KNeighborsClassifier(n_neighbors=5)
knn_orig.fit(X_train_scaled, y_train)
y_pred_orig = knn_orig.predict(X_test_scaled)
acc_orig = accuracy_score(y_test, y_pred_orig)

print(f"KNN accuracy on original data: {acc_orig:.4f}")


KNN accuracy on original data: 0.9444


In [16]:
# 5. Apply PCA (retain top 2 components)
pca = PCA(n_components=2)
X_train_pca = pca.fit_transform(X_train_scaled)
X_test_pca = pca.transform(X_test_scaled)


In [17]:
# 6. Train KNN on PCA-transformed data
knn_pca = KNeighborsClassifier(n_neighbors=5)
knn_pca.fit(X_train_pca, y_train)
y_pred_pca = knn_pca.predict(X_test_pca)
acc_pca = accuracy_score(y_test, y_pred_pca)

print(f"KNN accuracy on PCA (top 2 components): {acc_pca:.4f}")

KNN accuracy on PCA (top 2 components): 0.9444


**What this does:**


* Standardizes features to ensure fair distances for KNN.

* Trains and evaluates KNN on the full, scaled feature space.

* Applies PCA to reduce dimensionality to 2 components and trains KNN on that transformed space.

* Prints and compares both accuracy scores.

**What the literature says:**

* A study in the Journal of Wine Economics found that a KNN classifier trained on just two principal components performed comparably to one trained on all 13 original features, while offering better interpretability

* In other domains, PCA has been shown to improve KNN performance by better separating classes, especially when combined with distance-weighted KNN

Question 9: Train a KNN Classifier with different distance metrics (euclidean,
manhattan) on the scaled Wine dataset and compare the results.

To train a K-Nearest Neighbors (KNN) classifier with different distance metrics (Euclidean and Manhattan) on the scaled Wine dataset and compare the results, the following steps are typically performed:

**Load and Prepare the Dataset:**


*  Load the Wine dataset, which is a common benchmark dataset in machine learning.

*  Separate the features (X) from the target variable (y).

*  Scale the features using a method like StandardScaler to ensure all features contribute equally to the distance calculations, as KNN is distance-based.

**Split Data into Training and Testing Sets:**

* Divide the scaled dataset into training and testing sets to evaluate the model's performance on unseen data.

**Train KNN with Euclidean Distance:**

* Initialize a KNeighborsClassifier with metric='euclidean'.

* Train the classifier on the training data.

* Make predictions on the test data.

* Evaluate the performance using metrics such as accuracy, precision, recall, or F1-score.

**Train KNN with Manhattan Distance:**

* Initialize another KNeighborsClassifier with metric='manhattan'.

* Train this classifier on the same training data.

* Make predictions on the test data.

* Evaluate its performance using the same metrics.

**Compare Results:**

* Compare the performance metrics obtained from both models. This comparison reveals which distance metric yielded better results for the given dataset and scaling method. The choice of distance metric can significantly impact KNN's performance, depending on the data's characteristics. Euclidean distance measures the shortest straight-line path, while Manhattan distance measures the sum of absolute differences along each dimension, often referred to as "city block" distance.

In [18]:
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score, classification_report

In [19]:
# 1. Load and Prepare the Dataset
wine = load_wine()
X = wine.data
y = wine.target

scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

In [20]:
# 2. Split Data into Training and Testing Sets
X_train, X_test, y_train, y_test = train_test_split(X_scaled, y, test_size=0.3, random_state=42)


In [21]:
# 3. Train KNN with Euclidean Distance
knn_euclidean = KNeighborsClassifier(metric='euclidean')
knn_euclidean.fit(X_train, y_train)
y_pred_euclidean = knn_euclidean.predict(X_test)
accuracy_euclidean = accuracy_score(y_test, y_pred_euclidean)
report_euclidean = classification_report(y_test, y_pred_euclidean)


In [22]:
# 4. Train KNN with Manhattan Distance
knn_manhattan = KNeighborsClassifier(metric='manhattan')
knn_manhattan.fit(X_train, y_train)
y_pred_manhattan = knn_manhattan.predict(X_test)
accuracy_manhattan = accuracy_score(y_test, y_pred_manhattan)
report_manhattan = classification_report(y_test, y_pred_manhattan)


In [24]:
# 5. Compare Results
print(f"KNN with Euclidean Distance - Accuracy: {accuracy_euclidean:.4f}")


KNN with Euclidean Distance - Accuracy: 0.9630


In [25]:
print("Classification Report (Euclidean):\n", report_euclidean)

Classification Report (Euclidean):
               precision    recall  f1-score   support

           0       0.95      1.00      0.97        19
           1       1.00      0.90      0.95        21
           2       0.93      1.00      0.97        14

    accuracy                           0.96        54
   macro avg       0.96      0.97      0.96        54
weighted avg       0.97      0.96      0.96        54



In [26]:
print(f"\nKNN with Manhattan Distance - Accuracy: {accuracy_manhattan:.4f}")


KNN with Manhattan Distance - Accuracy: 0.9630


In [27]:
print("Classification Report (Manhattan):\n", report_manhattan)

Classification Report (Manhattan):
               precision    recall  f1-score   support

           0       0.95      1.00      0.97        19
           1       1.00      0.90      0.95        21
           2       0.93      1.00      0.97        14

    accuracy                           0.96        54
   macro avg       0.96      0.97      0.96        54
weighted avg       0.97      0.96      0.96        54



Question 10: You are working with a high-dimensional gene expression dataset to
classify patients with different types of cancer.
Due to the large number of features and a small number of samples, traditional models
overfit.

Explain how you would:

● Use PCA to reduce dimensionality

● Decide how many components to keep

● Use KNN for classification post-dimensionality reduction

● Evaluate the model

● Justify this pipeline to your stakeholders as a robust solution for real-world
biomedical data

Answer-

1.Use PCA for Dimensionality Reduction


* Standardize the data: Since PCA is sensitive to scale, each gene's expression values should be centered and scaled (typically z-scores) so that no gene disproportionately influences the results.

* Compute principal components: Use either the covariance matrix’s eigen-decomposition or singular value decomposition (SVD) to extract orthogonal components capturing variance in descending order.


* Interpretation: The first few components aggregate correlated gene expression patterns—an “eigengene”—reducing noise and redundancy.

2.Decide How Many Components to Keep

* Explained variance: Plot a scree chart or cumulative variance graph. Choose enough PCs to capture, say, 85–95% of total variance—but stop before including low-variance noise.


* Cross-validation: Use CV (e.g., nested CV) to evaluate classification error at varying numbers of PCs. Select the number where validation error is minimized or levels off.

3.Apply K-Nearest Neighbors (KNN) for Classification

* Train KNN on reduced data: Use the selected PCs as input. KNN classifies each test sample based on the majority class of its ‘k’ nearest neighbors in the reduced space.

* Choose k carefully: Tune ‘k’ (e.g., via cross-validation) to balance bias and variance—smaller ‘k’ captures local structure but risks overfitting; larger ‘k’ smooths out noise but may underfit.

4.Evaluate the Model

* Cross-validation: Use repeated k-fold CV or nested CV to estimate performance robustly.

* Performance metrics: Since cancer classification is critical in healthcare, go beyond accuracy—use sensitivity (recall), specificity, precision, F1 score, and ROC-AUC. Consider class imbalance if some cancer types are rarer.

* Hold-out validation: If possible, test the final model on an independent external dataset to assess real-world generalizability.

5.Justifying the Pipeline to Stakeholders

* Addresses overfitting: PCA reduces dimensionality from thousands of genes to a manageable number of components, reducing risk of overfitting with small sample sizes.

* Improved robustness and interpretability: PCs summarize correlated gene groups (“eigengenes”), reducing noise and making downstream modeling more stable.


* Efficiency and reproducibility: PCA is computationally efficient and deterministic, especially compared to deep learning alternatives.


* Transparent and defensible: KNN is a simple, nonparametric classifier. Its decision-making is intuitive (“nearest neighbors”), fostering trust among biomedical stakeholders.

* Quantifiable performance: Evaluation through cross-validation and external validation offers transparent, statistically sound measures of accuracy, sensitivity, and generalization.

* Flexibility to adapt: You can adjust the number of PCs or k in KNN based on new data or stakeholder requirements, ensuring responsiveness to real-world demands.

| Step               | Action                                                                                   |
| ------------------ | ---------------------------------------------------------------------------------------- |
| **1. Preprocess**  | Standardize gene expression data                                                         |
| **2. PCA**         | Compute PCs; retain those capturing optimal variance                                     |
| **3. KNN**         | Train KNN using PCs as features                                                          |
| **4. Tune & Eval** | Optimize number of PCs, k; use cross-validation and class-aware metrics                  |
| **5. Justify**     | Emphasize overfitting reduction, interpretability, efficiency, and validated performance |


