## Empirical Tuning

### Round 1:

In [81]:
from sklearn.model_selection import GridSearchCV

param_grid = {
    'n_neighbors': [3, 5],
    'weights': ['uniform', 'distance'],
    'metric': ['euclidean'],
}

grid = GridSearchCV(KNeighborsClassifier(n_jobs=-1), param_grid, cv=5, scoring='accuracy')
grid.fit(X_train, y_train)

print("Best Parameters:", grid.best_params_)
print("Best CV Accuracy:", grid.best_score_)

Best Parameters: {'metric': 'euclidean', 'n_neighbors': 5, 'weights': 'distance'}
Best CV Accuracy: 0.6872464764523891


## 🔍 Observations – Round 1 (KNN with GridSearchCV)

### 🧪 Model Performance
- **Best CV Accuracy**: `68.72%` via GridSearchCV
- **Test Accuracy**: `69.29%`, a fair baseline given KNN's simplicity and scalability limitations.

In [87]:
best_knn = grid.best_estimator_
y_pred = best_knn.predict(X_test)
y_pred_probs = best_knn.predict_proba(X_test)

In [88]:
accuracy = accuracy_score(y_test, y_pred)
print(f"\n🔹 Accuracy: {accuracy:.4f}")


🔹 Accuracy: 0.6929


In [89]:
cm = confusion_matrix(y_test, y_pred)
print("\n🔹 Confusion Matrix:")
print(cm)


🔹 Confusion Matrix:
[[  808  1073  1004    17   232]
 [  231 17822  2695    91   770]
 [  183  2053  7838    26  1106]
 [    7   161    84    71    13]
 [   62  1000  1460    17  1175]]


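### 🧾 Confusion Matrix Insights
- Large confusion between:
  - `'Drink'` ↔ `'Food'` and `'Inside'`: 1073 and 1004 of 3134 drink samples, more than the 808 classified correctly
  - `'Menu'` ↔ `'Food'` and `'Inside'`: 161 and 84 of 336 menu samples
  - `'Outside'` misclassified as `'Inside'` (1460) and `'Food'` (1000), against 1175 correct
- **High true positives for 'Food'** (17822 of 21609), but minority classes get heavily confused or ignored.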
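
These patterns are easier to read once each row of the matrix is divided by its class total. A minimal sketch using the Round 1 matrix exactly as printed, assuming the row and column order `drink, food, inside, menu, outside` shown in the classification report:

```python
import numpy as np

# Round 1 confusion matrix copied from the printed output (rows = true class).
cm_round1 = np.array([
    [808, 1073, 1004, 17, 232],
    [231, 17822, 2695, 91, 770],
    [183, 2053, 7838, 26, 1106],
    [7, 161, 84, 71, 13],
    [62, 1000, 1460, 17, 1175],
])
# Class order is an assumption taken from the classification report.
labels = ["drink", "food", "inside", "menu", "outside"]

# Divide each row by its total so the diagonal becomes per-class recall.
row_rates = cm_round1 / cm_round1.sum(axis=1, keepdims=True)

for i, name in enumerate(labels):
    # Zero out the diagonal to find the most common wrong prediction.
    off_diag = row_rates[i].copy()
    off_diag[i] = 0.0
    worst = labels[int(off_diag.argmax())]
    print(f"{name:>8}: recall={row_rates[i, i]:.2f}, "
          f"most often mistaken for {worst} ({off_diag.max():.2f})")
```

The diagonal of `row_rates` should agree with the recall column of the classification report, which is a quick consistency check on the assumed label order.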

In [90]:
report = classification_report(y_test, y_pred, target_names=list(label_dict.values()))
print("\n🔹 Classification Report:")
print(report)


🔹 Classification Report:
              precision    recall  f1-score   support

       drink       0.63      0.26      0.37      3134
        food       0.81      0.82      0.82     21609
      inside       0.60      0.70      0.65     11206
        menu       0.32      0.21      0.25       336
     outside       0.36      0.32      0.34      3714

    accuracy                           0.69     39999
   macro avg       0.54      0.46      0.48     39999
weighted avg       0.69      0.69      0.68     39999



### 📉 Class-wise Performance
- **'Food'** is the top-performing class:
  - `Precision: 0.81`, `Recall: 0.82`, `F1-score: 0.82`: consistent and strong.
- **'Inside'** performs moderately:
  - `Recall: 0.70`, `F1-score: 0.65`: fairly balanced.
- **'Drink'**, **'Menu'**, and **'Outside'** have poor performance:
  - 'Drink': `F1-score: 0.37`
  - 'Menu': `F1-score: 0.25`
  - 'Outside': `F1-score: 0.34`

### 📊 Advanced Metrics
- **Macro F1-score**: `0.48`, well below the weighted score because the three weakest classes count as much as 'Food' and 'Inside'.
- **Weighted F1-score**: `0.68`, lifted by the large 'Food' class (21609 of 39999 test samples).
- **Macro Precision**: `0.54`, **Macro Recall**: `0.46`: room for improvement, particularly on minority classes.
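
The gap between the macro and weighted averages follows directly from how the two are computed. A small sketch that recomputes both from the per-class values in the Round 1 report; because those inputs are already rounded to two decimals, the results can differ from the report in the last digit:

```python
# Per-class F1 and support copied from the Round 1 classification report.
f1 = {"drink": 0.37, "food": 0.82, "inside": 0.65, "menu": 0.25, "outside": 0.34}
support = {"drink": 3134, "food": 21609, "inside": 11206, "menu": 336, "outside": 3714}

# Macro average: every class counts equally, regardless of size.
macro_f1 = sum(f1.values()) / len(f1)

# Weighted average: each class counts in proportion to its support.
total = sum(support.values())
weighted_f1 = sum(f1[c] * support[c] for c in f1) / total

print(f"macro F1    ~ {macro_f1:.3f}")
print(f"weighted F1 ~ {weighted_f1:.3f}")
print(f"'food' share of the test set: {support['food'] / total:.1%}")
```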


In [91]:
print("\n🔹 AUC Scores (per class):")
num_classes = len(label_dict)
y_test_cat = to_categorical(y_test, num_classes=num_classes)

auc_scores = []
for i in range(num_classes):
    try:
        auc = roc_auc_score(y_test_cat[:, i], y_pred_probs[:, i])
        auc_scores.append(auc)
        print(f"Class {label_dict[i]} AUC: {auc:.3f}")
    except Exception:
        auc_scores.append(None)
        print(f"Class {label_dict[i]} AUC: N/A")


🔹 AUC Scores (per class):
Class drink AUC: 0.679
Class food AUC: 0.873
Class inside AUC: 0.836
Class menu AUC: 0.706
Class outside AUC: 0.733


### 📈 AUC per Class
- **'Food' and 'Inside'** show strong AUC:
  - `Food: 0.873`, `Inside: 0.836`
- **'Drink', 'Menu', 'Outside'** have weaker AUCs:
  - `Drink: 0.679`, `Menu: 0.706`, `Outside: 0.733`
- Indicates good discrimination for dominant classes, but struggles with edge cases and small classes.
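
The per-class AUCs above are one-vs-rest scores. As a sketch of the same computation without the Keras `to_categorical` helper, the snippet below uses scikit-learn's `label_binarize` on a synthetic three-class dataset; the data, the `*_demo` names, and the class count are assumptions for illustration, not the notebook's real features:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import label_binarize

# Synthetic 3-class stand-in for the notebook's features (an assumption).
X_demo, y_demo = make_classification(
    n_samples=600, n_features=20, n_informative=6, n_classes=3, random_state=0
)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.25, stratify=y_demo, random_state=0
)

knn_demo = KNeighborsClassifier(n_neighbors=5).fit(X_tr, y_tr)
probs = knn_demo.predict_proba(X_te)

# One indicator column per class, in the same order as knn_demo.classes_.
y_onehot = label_binarize(y_te, classes=knn_demo.classes_)

# Per-class AUC: class i versus all other classes.
per_class_auc = [
    roc_auc_score(y_onehot[:, i], probs[:, i])
    for i in range(len(knn_demo.classes_))
]
# roc_auc_score can also return the macro one-vs-rest average directly.
macro_auc = roc_auc_score(y_te, probs, multi_class="ovr", average="macro")

print([round(a, 3) for a in per_class_auc], round(macro_auc, 3))
```

Taking the class order from `classes_` keeps the probability columns and the indicator columns aligned even if the labels are not `0..n-1`.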

---

## 📌 Summary – Round 1

- **KNN with GridSearchCV** provides a baseline with **~69% test accuracy**, above the ~54% that always predicting 'food' would score (21609 of 39999 test samples), but limited in class sensitivity.
- **Class imbalance is clearly a major challenge**:
  - Minority classes like **'menu'**, **'outside'**, and **'drink'** perform poorly in terms of recall and F1-score.
- **Strong reliance on majority class ('food')**, which inflates weighted metrics but masks poor per-class performance.
- **AUC scores reveal good potential for ranking/class separation**, especially for 'food' and 'inside'.

### ✅ Recommendation
- **KNN is sensitive to class imbalance and high-dimensional data**, so consider switching to tree-based models or neural nets.
- If continuing with KNN:
  - Try **dimensionality reduction (e.g., PCA)** before fitting
  - Implement **SMOTE** or **class-weighted strategies** to balance classes
  - Consider **distance-based kernel tuning** or different distance metrics
- Use these results as a **baseline** to benchmark future models.
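
As a rough illustration of the rebalancing idea, the sketch below uses plain random oversampling as a simpler stand-in for SMOTE (SMOTE itself lives in the separate `imbalanced-learn` package and synthesizes new points instead of duplicating existing ones). The synthetic data and the `oversample` helper are hypothetical, and no improvement is guaranteed on the real dataset:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Imbalanced synthetic stand-in for the real features (an assumption).
X_demo, y_demo = make_classification(
    n_samples=3000, n_features=20, n_informative=8, n_classes=3,
    weights=[0.8, 0.15, 0.05], random_state=0,
)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.25, stratify=y_demo, random_state=0
)

def oversample(X, y, seed=0):
    """Hypothetical helper: duplicate random rows until all classes match the largest."""
    rng = np.random.default_rng(seed)
    classes, counts = np.unique(y, return_counts=True)
    target = counts.max()
    parts = []
    for c in classes:
        members = np.flatnonzero(y == c)
        # Keep every original row, then draw the shortfall with replacement.
        extra = rng.choice(members, size=target - members.size, replace=True)
        parts.append(np.concatenate([members, extra]))
    idx = np.concatenate(parts)
    return X[idx], y[idx]

# Only the training split is rebalanced; the test split keeps its real distribution.
X_bal, y_bal = oversample(X_tr, y_tr)

for name, (Xf, yf) in {"original": (X_tr, y_tr), "oversampled": (X_bal, y_bal)}.items():
    model = KNeighborsClassifier(n_neighbors=5).fit(Xf, yf)
    macro = f1_score(y_te, model.predict(X_te), average="macro")
    print(f"{name:>11}: macro F1 = {macro:.3f}")
```

Macro F1 is used for the comparison because accuracy alone would hide changes on the small classes.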

### Round 2:

In [98]:
best_pca_knn = None
best_pca_acc = 0
best_pca_X_test = None
best_pca_y_test = None
best_pca_y_pred_probs = None
best_pca_label_dict = label_dict

In [100]:
for n in [50, 100]:
    pca = PCA(n_components=n)
    X_pca = pca.fit_transform(X_scaled)

    X_train, X_test, y_train, y_test = train_test_split(X_pca, y_encoded, test_size=0.2, stratify=y_encoded)

    knn = KNeighborsClassifier(n_neighbors=5, n_jobs=-1)
    knn.fit(X_train, y_train)
    y_pred = knn.predict(X_test)
    y_pred_probs = knn.predict_proba(X_test)

    acc = accuracy_score(y_test, y_pred)
    print(f"PCA Components: {n}, Accuracy: {acc:.4f}")

    if acc > best_pca_acc:
        best_pca_acc = acc
        best_pca_knn = knn
        best_pca_X_test = X_test
        best_pca_y_test = y_test
        best_pca_y_pred_probs = y_pred_probs

PCA Components: 50, Accuracy: 0.6883
PCA Components: 100, Accuracy: 0.6880


## 🔍 Observations – Round 2 (KNN with PCA)

### 🧪 Model Performance
- **Test Accuracy**: `68.83%`
- **PCA Comparison**:
  - `50 components`: Accuracy = `68.83%`
  - `100 components`: Accuracy = `68.80%`
- PCA doesn't significantly affect performance: the 0.03-point gap between 50 and 100 components is within the noise of the unseeded `train_test_split`, which draws a different test set on each iteration. This suggests that the intrinsic data structure may not be highly compressible, or that KNN is not effectively leveraging the reduced dimensionality.
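
One way to probe the compressibility question is to look at how much variance each component count retains. A minimal sketch on synthetic data (the `X_demo` matrix is an assumption standing in for `X_scaled`, so the printed percentages say nothing about the real features):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for X_scaled (an assumption; the real features are not shown here).
X_demo, _ = make_classification(
    n_samples=1000, n_features=200, n_informative=30, random_state=0
)
X_demo = StandardScaler().fit_transform(X_demo)

# Fit PCA once with the largest component count of interest.
pca_demo = PCA(n_components=100, random_state=0).fit(X_demo)
cumulative = np.cumsum(pca_demo.explained_variance_ratio_)

for n in (50, 100):
    print(f"{n} components retain {cumulative[n - 1]:.1%} of the variance")
```

If the first 50 components already retain nearly as much variance as 100, similar KNN accuracy at both settings would be expected.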

In [101]:
y_pred = best_pca_knn.predict(best_pca_X_test)
y_pred_probs = best_pca_y_pred_probs
y_test = best_pca_y_test

In [102]:
accuracy = accuracy_score(y_test, y_pred)
print(f"\n🔸 Round 2 - Accuracy: {accuracy:.4f}")


🔸 Round 2 - Accuracy: 0.6883


In [103]:
cm = confusion_matrix(y_test, y_pred)
print("\n🔸 Round 2 - Confusion Matrix:")
print(cm)


🔸 Round 2 - Confusion Matrix:
[[  783  1279   950     8   114]
 [  396 18277  2551    30   355]
 [  324  2646  7568    11   657]
 [   14   193    80    39    10]
 [  116  1216  1513     6   863]]


### 🧾 Confusion Matrix Insights
- **'Food'** predictions overwhelm others: 193 of 336 `'menu'` samples and 1279 of 3134 `'drink'` samples are misclassified as `'food'`. `'Outside'` is lost mostly to `'inside'` (1513), then `'food'` (1216).
- **'Menu' class struggles most**:
  - Out of 336 samples, only 39 are correctly classified.
- **'Inside'** loses ground relative to Round 1: 7568 correct predictions versus 7838.

In [104]:
report = classification_report(y_test, y_pred, target_names=list(label_dict.values()))
print("\n🔸 Round 2 - Classification Report:")
print(report)


🔸 Round 2 - Classification Report:
              precision    recall  f1-score   support

       drink       0.48      0.25      0.33      3134
        food       0.77      0.85      0.81     21609
      inside       0.60      0.68      0.63     11206
        menu       0.41      0.12      0.18       336
     outside       0.43      0.23      0.30      3714

    accuracy                           0.69     39999
   macro avg       0.54      0.42      0.45     39999
weighted avg       0.67      0.69      0.67     39999



### 📉 Class-wise Performance
- **'Food'** class continues to dominate:
  - `Precision: 0.77`, `Recall: 0.85`, `F1-score: 0.81`: precision drops from 0.81 while recall rises from 0.82, leaving F1 slightly below Round 1 (0.82) but still strong.
- **'Inside'** remains decent:
  - `F1-score: 0.63`, down slightly from Round 1 (0.65).
- **'Drink'**, **'Outside'**, and **'Menu'** remain problematic:
  - 'Drink': `F1-score: 0.33` (recall dropped to 0.25)
  - 'Outside': `F1-score: 0.30`
  - 'Menu': `F1-score: 0.18`; recall plummets to 0.12, so nearly all menu items are misclassified.

### 📊 Advanced Metrics
- **Macro F1-score**: `0.45`, down from `0.48` in Round 1.
- **Weighted F1-score**: `0.67`, slightly lower than Round 1 (`0.68`) because F1 fell for every class.
- **Macro Recall**: `0.42` (down from `0.46`), **Macro Precision**: `0.54` (unchanged): PCA cost recall without improving precision.

In [105]:
print("\n🔸 Round 2 - AUC Scores (per class):")
# One-hot encode the true labels so each class can be scored one-vs-rest
y_test_cat = to_categorical(y_test, num_classes=len(label_dict))
auc_scores = []
for i in range(len(label_dict)):
    try:
        auc = roc_auc_score(y_test_cat[:, i], y_pred_probs[:, i])
        auc_scores.append(auc)
        print(f"Class {label_dict[i]} AUC: {auc:.3f}")
    except Exception:
        # Record a placeholder if AUC cannot be computed, e.g. a class absent from y_test
        auc_scores.append(None)
        print(f"Class {label_dict[i]} AUC: N/A")


🔸 Round 2 - AUC Scores (per class):
Class drink AUC: 0.676
Class food AUC: 0.861
Class inside AUC: 0.821
Class menu AUC: 0.684
Class outside AUC: 0.711


### 📈 AUC per Class
- Slight dip in AUC scores compared to Round 1:
  - `'Drink': 0.676` (↓ from 0.679)
  - `'Food': 0.861` (↓ from 0.873)
  - `'Inside': 0.821` (↓ from 0.836)
  - `'Menu': 0.684` (↓ from 0.706)
  - `'Outside': 0.711` (↓ from 0.733)
- PCA may have caused subtle degradation in feature separation ability.

---

In [106]:
results['Round2'] = {'accuracy': accuracy, 'auc_scores': auc_scores, 'confusion_matrix': cm}
# Use the class names (label_dict values) so the stored report matches the printed one
report_dicts['Round2'] = classification_report(y_test, y_pred, target_names=list(label_dict.values()), output_dict=True)

---
## 📌 Summary – Round 2

- Applying **PCA before KNN** offers **no improvement**: accuracy slips from `69.29%` in Round 1 to `~68.8%`, with slightly lower performance on minority classes. Round 2 also uses the default uniform weights where the Round 1 grid search selected `weights='distance'`, so part of the drop may come from that change and not from PCA.
- **'Food'** remains a strong performer, but **other classes suffer from confusion**, particularly 'menu' and 'outside'.
- **Recall for minority classes deteriorated**, especially for **'menu'**.
- **AUC scores and F1-metrics show general performance stagnation or decline**: PCA hasn't added discriminative power in this case.

### ✅ Recommendation
- PCA may not be effective for this dataset + KNN combo:
  - **Original high-dimensional features may retain critical class signals**
  - **KNN** is sensitive to distance metrics, and PCA may distort class boundaries
- Try:
  - **Other dimensionality reduction techniques** (e.g., UMAP, t-SNE for visualization or LDA for supervised reduction)
  - **Alternative classifiers** (Random Forest, Gradient Boosting, or Neural Networks)
  - **Class rebalancing techniques** or custom distance metrics that weight minority classes more
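
The supervised-reduction suggestion could look roughly like the sketch below, which places LDA in front of KNN inside a scikit-learn pipeline. The five-class synthetic dataset is an assumption standing in for the real features, so the printed score is illustrative only:

```python
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Five-class synthetic stand-in for the real data (an assumption).
X_demo, y_demo = make_classification(
    n_samples=2000, n_features=50, n_informative=10, n_classes=5, random_state=0
)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, stratify=y_demo, random_state=0
)

# LDA yields at most n_classes - 1 components, so 4 for five classes.
lda_knn = make_pipeline(
    StandardScaler(),
    LinearDiscriminantAnalysis(n_components=4),
    KNeighborsClassifier(n_neighbors=5),
)
lda_knn.fit(X_tr, y_tr)

# Slice off the classifier to inspect the output of the scaler + LDA steps.
reduced = lda_knn[:-1].transform(X_te)
print("reduced shape:", reduced.shape)
print("macro F1:", round(f1_score(y_te, lda_knn.predict(X_te), average="macro"), 3))
```

Unlike PCA, LDA uses the labels when choosing directions, which is why it is capped at four components for a five-class problem.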

### Round 3:

In [141]:
scalers = {'Standard': StandardScaler(),'MinMax': MinMaxScaler(),'Robust': RobustScaler()}

In [143]:
best_scaler_name = None
best_scaler_knn = None
best_scaler_X_test = None
best_scaler_y_test = None
best_scaler_y_pred_probs = None
best_scaler_acc = 0

In [145]:
for name, scaler in scalers.items():
    X_scaled = scaler.fit_transform(X)
    pca = PCA(n_components=100)
    X_pca = pca.fit_transform(X_scaled)

    X_train, X_test, y_train, y_test = train_test_split(X_pca, y_encoded, test_size=0.2, stratify=y_encoded)

    knn = KNeighborsClassifier(n_neighbors=5, n_jobs=-1)
    knn.fit(X_train, y_train)
    y_pred = knn.predict(X_test)
    y_pred_probs = knn.predict_proba(X_test)

    acc = accuracy_score(y_test, y_pred)
    print(f"{name} Scaler, Accuracy: {acc:.4f}")

    if acc > best_scaler_acc:
        best_scaler_acc = acc
        best_scaler_name = name
        best_scaler_knn = knn
        best_scaler_X_test = X_test
        best_scaler_y_test = y_test
        best_scaler_y_pred_probs = y_pred_probs

Standard Scaler, Accuracy: 0.6876
MinMax Scaler, Accuracy: 0.6898
Robust Scaler, Accuracy: 0.6868


## 🔍 Observations – Round 3 (KNN with Scaling Techniques)

### 🧪 Model Performance
- **Best Scaler**: `MinMaxScaler` with **Accuracy = 68.98%**
- **Other Scalers**:
  - StandardScaler: `68.76%`
  - RobustScaler: `68.68%`
- All scalers yielded similar performance, but **MinMaxScaler slightly outperformed** in accuracy.
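
Differences this small are hard to separate from split-to-split noise, since each scaler above was scored on its own unseeded split and the scaler and PCA were fit on the full dataset before splitting. A hedged sketch of a stricter comparison, using a pipeline and shared cross-validation folds on synthetic data (the dataset, component count, and `*_demo` names are assumptions):

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler, RobustScaler, StandardScaler

# Synthetic stand-in for X and y_encoded (an assumption).
X_demo, y_demo = make_classification(
    n_samples=1500, n_features=60, n_informative=12, n_classes=3, random_state=0
)

# One fixed set of folds so every scaler is scored on identical splits.
folds = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

scaler_scores = {}
for name, scaler in {
    "Standard": StandardScaler(),
    "MinMax": MinMaxScaler(),
    "Robust": RobustScaler(),
}.items():
    # Scaler and PCA are refit inside each fold, on training rows only.
    pipe = make_pipeline(
        scaler,
        PCA(n_components=20, random_state=0),
        KNeighborsClassifier(n_neighbors=5),
    )
    scores = cross_val_score(pipe, X_demo, y_demo, cv=folds, scoring="accuracy")
    scaler_scores[name] = (scores.mean(), scores.std())
    print(f"{name:>8}: {scores.mean():.4f} +/- {scores.std():.4f}")
```

Reporting the fold-to-fold standard deviation next to the mean shows whether a gap between scalers is larger than the variation from resampling alone.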

In [147]:
y_pred = best_scaler_knn.predict(best_scaler_X_test)
y_pred_probs = best_scaler_y_pred_probs
y_test = best_scaler_y_test

In [148]:
accuracy = accuracy_score(y_test, y_pred)
print(f"\n🔸 Round 3 - Best Scaler: {best_scaler_name}")
print(f"🔸 Accuracy: {accuracy:.4f}")


🔸 Round 3 - Best Scaler: MinMax
🔸 Accuracy: 0.6898


In [149]:
cm = confusion_matrix(y_test, y_pred)
print("\n🔸 Confusion Matrix:")
print(cm)


🔸 Confusion Matrix:
[[  795  1191   984     7   157]
 [  374 18086  2624    44   481]
 [  296  2453  7676    27   754]
 [   15   164    96    54     7]
 [   95  1144  1485     9   981]]


### 🧾 Confusion Matrix Insights
- **'Food' continues to absorb misclassifications** from most other classes, especially 'menu' (164 of 336) and 'drink' (1191 of 3134); 'outside' is still lost mostly to 'inside' (1485, versus 1144 to 'food').
- **'Menu' classification slightly improved** compared to Round 2 (54 correct versus 39), but recall remains very low (`0.16`).
- **'Inside' performance holds steady**, reflecting decent model sensitivity to indoor content.

In [150]:
report = classification_report(y_test, y_pred, target_names=list(label_dict.values()))
print("\n🔸 Classification Report:")
print(report)


🔸 Classification Report:
              precision    recall  f1-score   support

       drink       0.50      0.25      0.34      3134
        food       0.79      0.84      0.81     21609
      inside       0.60      0.68      0.64     11206
        menu       0.38      0.16      0.23       336
     outside       0.41      0.26      0.32      3714

    accuracy                           0.69     39999
   macro avg       0.54      0.44      0.47     39999
weighted avg       0.67      0.69      0.67     39999



### 📉 Class-wise Performance
- **'Food'** continues to dominate:
  - `Precision: 0.79`, `Recall: 0.84`, `F1-score: 0.81`
- **'Inside'** remains consistent:
  - `F1-score: 0.64`, comparable to Round 2
- **'Drink'**, **'Menu'**, and **'Outside'** still underperform:
  - 'Drink': `F1-score: 0.34`
  - 'Outside': `F1-score: 0.32`
  - 'Menu': `F1-score: 0.23`, a slight gain from Round 2 but still very weak

### 📊 Advanced Metrics
- **Macro F1-score**: `0.47`, a slight improvement over Round 2 (`0.45`)
- **Weighted F1-score**: `0.67`, unchanged from Round 2 and just below Round 1 (`0.68`)
- **Macro Recall**: `0.44` (up from `0.42` in Round 2), **Macro Precision**: `0.54` (unchanged)

In [151]:
print("\n🔸 AUC Scores (per class):")
y_test_cat = to_categorical(y_test, num_classes=len(label_dict))
auc_scores = []
for i in range(len(label_dict)):
    try:
        auc = roc_auc_score(y_test_cat[:, i], y_pred_probs[:, i])
        auc_scores.append(auc)
        print(f"Class {label_dict[i]} AUC: {auc:.3f}")
    except Exception:
        auc_scores.append(None)
        print(f"Class {label_dict[i]} AUC: N/A")


🔸 AUC Scores (per class):
Class drink AUC: 0.673
Class food AUC: 0.861
Class inside AUC: 0.822
Class menu AUC: 0.707
Class outside AUC: 0.723


### 📈 AUC per Class
- Very similar to Round 2:
  - `'Drink': 0.673`, `'Food': 0.861`, `'Inside': 0.822`, `'Menu': 0.707`, `'Outside': 0.723`
- **No meaningful improvement in class separation** from scaling: 'Drink', 'Food', and 'Inside' move by 0.003 or less, with small gains for 'Menu' (0.684 → 0.707) and 'Outside' (0.711 → 0.723).

---

In [152]:
results['Round3'] = {'accuracy': accuracy, 'auc_scores': auc_scores, 'confusion_matrix': cm, 'best_scaler': best_scaler_name}
# Use the class names (label_dict values) so the stored report matches the printed one
report_dicts['Round3'] = classification_report(y_test, y_pred, target_names=list(label_dict.values()), output_dict=True)

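## 📌 Summary – Round 3

- **MinMaxScaler** produced the best results among the scaling methods, but **improvements were minimal**: `+0.22` and `+0.30` percentage points over StandardScaler and RobustScaler, measured on a different random split per scaler and still below Round 1's `69.29%`.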
- **No breakthrough in addressing class imbalance or minority class confusion**.
- AUC and F1-scores are largely unchanged from Round 2, suggesting **scaling has limited impact on KNN performance in this context**.
- **'Food' class performance props up overall metrics**, while 'drink', 'outside', and 'menu' continue to perform poorly.

### ✅ Recommendation
- Scaling has **marginal benefits** for KNN in this scenario.
- Consider focusing on:
  - **Model type upgrades** (e.g., SVMs, ensemble models)
  - **Feature engineering** to highlight class-discriminative properties
  - **Data rebalancing techniques** (e.g., SMOTE) or **class-specific tuning**
- Use this round as a confirmation that **data scaling alone cannot resolve class-level performance gaps**.
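
As one possible starting point for the model-upgrade direction, the sketch below compares KNN with a class-weighted random forest on macro F1. The imbalanced synthetic dataset, the choice of random forest, and its settings are all assumptions for illustration; results on the real data may look quite different:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.neighbors import KNeighborsClassifier

# Imbalanced synthetic stand-in for the real features (an assumption).
X_demo, y_demo = make_classification(
    n_samples=2000, n_features=30, n_informative=10, n_classes=3,
    weights=[0.75, 0.2, 0.05], random_state=0,
)

# Shared, seeded folds so both models see identical splits.
folds = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

candidates = {
    "knn": KNeighborsClassifier(n_neighbors=5),
    # class_weight="balanced" reweights samples inversely to class frequency.
    "rf_balanced": RandomForestClassifier(
        n_estimators=100, class_weight="balanced", random_state=0
    ),
}

macro_f1 = {}
for name, model in candidates.items():
    # f1_macro weights every class equally, so minority classes are not hidden.
    scores = cross_val_score(model, X_demo, y_demo, cv=folds, scoring="f1_macro")
    macro_f1[name] = scores.mean()
    print(f"{name:>12}: macro F1 = {scores.mean():.3f}")
```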