# Phase 2: Classification - Credit Rating Prediction

This notebook implements and evaluates classification models for predicting sovereign credit ratings as categorical labels (AAA, BB+, etc.).

## 1. Setup and Data Loading

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import StratifiedKFold, cross_val_score, cross_val_predict
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import confusion_matrix, classification_report, accuracy_score, f1_score

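import warnings
# Note: this also hides scikit-learn's warnings about classes with fewer members
# than n_splits and about undefined F1 scores for classes that are never predicted
warnings.filterwarnings('ignore')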

plt.style.use('seaborn-v0_8-darkgrid')
sns.set_palette('husl')

In [None]:
# Load dataset with credit rating labels
df = pd.read_csv('../data/processed/merged_dataset_labels.csv')

print(f"Dataset shape: {df.shape}")
print(f"\nColumns: {df.columns.tolist()}")
print(f"\nFirst 5 rows:")
df.head()

In [None]:
# Prepare features (X) and target (y)
X = df.drop(['Country', 'Year', 'Credit_Rating_Label'], axis=1)
y = df['Credit_Rating_Label']

print(f"Features shape: {X.shape}")
print(f"Target shape: {y.shape}")
print(f"\nNumber of classes: {y.nunique()}")
print(f"\nClass distribution:")
y.value_counts().sort_index()

## 2. k-Nearest Neighbors (k-NN) Classification

k-NN predicts the credit rating of an observation by finding the k most similar country-year observations (based on macroeconomic indicators) and taking the majority vote of their ratings.

**Key points:**
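- Feature standardization is effectively **mandatory**: k-NN uses Euclidean distance, so features on larger scales would otherwise dominate the neighbor search
- We test k = 3, 5, 7, 9
- Stratified K-Fold CV (K=5) keeps the class distribution similar across folds; classes with fewer than 5 examples cannot appear in every test fold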
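
The evaluation protocol above can be sketched with the scaler placed inside a pipeline, so that it is refit on the training part of each fold. This is a minimal sketch, not the notebook's exact code: `X_demo` and `y_demo` are synthetic stand-ins for the real features and labels, and `knn_results` is a hypothetical name.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import StratifiedKFold, cross_validate
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the notebook's X and y (assumption, not the real data)
X_demo, y_demo = make_classification(
    n_samples=300, n_features=8, n_informative=5, n_classes=4,
    n_clusters_per_class=1, random_state=42,
)

skf_demo = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
scoring = ["accuracy", "f1_weighted", "f1_macro"]

knn_results = {}
for k in (3, 5, 7, 9):
    # The scaler is part of the estimator, so it only ever sees training folds
    pipe = make_pipeline(
        StandardScaler(),
        KNeighborsClassifier(n_neighbors=k, metric="euclidean"),
    )
    # One cross-validation run returns all three metrics
    cv = cross_validate(pipe, X_demo, y_demo, cv=skf_demo, scoring=scoring)
    knn_results[k] = {m: cv[f"test_{m}"].mean() for m in scoring}

for k, scores in knn_results.items():
    print(k, {m: round(v, 4) for m, v in scores.items()})
```

Scores obtained this way may differ slightly from those of a scaler fit on the full dataset, because held-out rows no longer contribute to the scaling statistics.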

### 2.1 k-NN with k=3

In [None]:
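# Normalize features
# Note: fitting the scaler on all rows before cross-validation lets each fold's
# held-out rows influence the scaling statistics; a Pipeline would refit it per fold
scaler_k3 = StandardScaler()
X_scaled_k3 = scaler_k3.fit_transform(X)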

# Create k-NN model with k=3
knn_k3 = KNeighborsClassifier(n_neighbors=3, metric='euclidean')

# Stratified K-Fold CV
skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)

# Cross-validation scores
accuracy_k3 = cross_val_score(knn_k3, X_scaled_k3, y, cv=skf, scoring='accuracy')
f1_weighted_k3 = cross_val_score(knn_k3, X_scaled_k3, y, cv=skf, scoring='f1_weighted')
f1_macro_k3 = cross_val_score(knn_k3, X_scaled_k3, y, cv=skf, scoring='f1_macro')

print("k-NN (k=3) Cross-Validation Results:")
print(f"  Accuracy:      {accuracy_k3.mean():.4f} ± {accuracy_k3.std():.4f}")
print(f"  F1-Weighted:   {f1_weighted_k3.mean():.4f} ± {f1_weighted_k3.std():.4f}")
print(f"  F1-Macro:      {f1_macro_k3.mean():.4f} ± {f1_macro_k3.std():.4f}")

# Train final model
knn_k3.fit(X_scaled_k3, y)
print("\n✓ Model trained on full dataset")

In [None]:
# Confusion Matrix for k=3 (built from out-of-fold predictions)
y_pred_k3 = cross_val_predict(knn_k3, X_scaled_k3, y, cv=skf)
labels = sorted(y.unique())  # alphabetical order, not rating order
cm_k3 = confusion_matrix(y, y_pred_k3, labels=labels)

# Visualize
plt.figure(figsize=(14, 12))
sns.heatmap(cm_k3, annot=True, fmt='d', cmap='Blues', 
            xticklabels=labels, yticklabels=labels,
            cbar_kws={'label': 'Count'})
plt.title(f'k-NN (k=3) Confusion Matrix\nAccuracy: {accuracy_score(y, y_pred_k3):.4f}', 
          fontsize=14, fontweight='bold')
plt.ylabel('True Label', fontsize=12)
plt.xlabel('Predicted Label', fontsize=12)
plt.xticks(rotation=45, ha='right')
plt.yticks(rotation=0)
plt.tight_layout()
plt.show()

In [None]:
# Classification Report for k=3
print("Classification Report (k=3):")
print(classification_report(y, y_pred_k3, zero_division=0))

### 2.2 k-NN with k=5

In [None]:
# k-NN with k=5
scaler_k5 = StandardScaler()
X_scaled_k5 = scaler_k5.fit_transform(X)

knn_k5 = KNeighborsClassifier(n_neighbors=5, metric='euclidean')

accuracy_k5 = cross_val_score(knn_k5, X_scaled_k5, y, cv=skf, scoring='accuracy')
f1_weighted_k5 = cross_val_score(knn_k5, X_scaled_k5, y, cv=skf, scoring='f1_weighted')
f1_macro_k5 = cross_val_score(knn_k5, X_scaled_k5, y, cv=skf, scoring='f1_macro')

print("k-NN (k=5) Cross-Validation Results:")
print(f"  Accuracy:      {accuracy_k5.mean():.4f} ± {accuracy_k5.std():.4f}")
print(f"  F1-Weighted:   {f1_weighted_k5.mean():.4f} ± {f1_weighted_k5.std():.4f}")
print(f"  F1-Macro:      {f1_macro_k5.mean():.4f} ± {f1_macro_k5.std():.4f}")

knn_k5.fit(X_scaled_k5, y)
y_pred_k5 = cross_val_predict(knn_k5, X_scaled_k5, y, cv=skf)

### 2.3 k-NN with k=7

In [None]:
# k-NN with k=7
scaler_k7 = StandardScaler()
X_scaled_k7 = scaler_k7.fit_transform(X)

knn_k7 = KNeighborsClassifier(n_neighbors=7, metric='euclidean')

accuracy_k7 = cross_val_score(knn_k7, X_scaled_k7, y, cv=skf, scoring='accuracy')
f1_weighted_k7 = cross_val_score(knn_k7, X_scaled_k7, y, cv=skf, scoring='f1_weighted')
f1_macro_k7 = cross_val_score(knn_k7, X_scaled_k7, y, cv=skf, scoring='f1_macro')

print("k-NN (k=7) Cross-Validation Results:")
print(f"  Accuracy:      {accuracy_k7.mean():.4f} ± {accuracy_k7.std():.4f}")
print(f"  F1-Weighted:   {f1_weighted_k7.mean():.4f} ± {f1_weighted_k7.std():.4f}")
print(f"  F1-Macro:      {f1_macro_k7.mean():.4f} ± {f1_macro_k7.std():.4f}")

knn_k7.fit(X_scaled_k7, y)
y_pred_k7 = cross_val_predict(knn_k7, X_scaled_k7, y, cv=skf)

### 2.4 k-NN with k=9

In [None]:
# k-NN with k=9
scaler_k9 = StandardScaler()
X_scaled_k9 = scaler_k9.fit_transform(X)

knn_k9 = KNeighborsClassifier(n_neighbors=9, metric='euclidean')

accuracy_k9 = cross_val_score(knn_k9, X_scaled_k9, y, cv=skf, scoring='accuracy')
f1_weighted_k9 = cross_val_score(knn_k9, X_scaled_k9, y, cv=skf, scoring='f1_weighted')
f1_macro_k9 = cross_val_score(knn_k9, X_scaled_k9, y, cv=skf, scoring='f1_macro')

print("k-NN (k=9) Cross-Validation Results:")
print(f"  Accuracy:      {accuracy_k9.mean():.4f} ± {accuracy_k9.std():.4f}")
print(f"  F1-Weighted:   {f1_weighted_k9.mean():.4f} ± {f1_weighted_k9.std():.4f}")
print(f"  F1-Macro:      {f1_macro_k9.mean():.4f} ± {f1_macro_k9.std():.4f}")

knn_k9.fit(X_scaled_k9, y)
y_pred_k9 = cross_val_predict(knn_k9, X_scaled_k9, y, cv=skf)

## 3. Comparison of k Values

In [None]:
# Create comparison dataframe
comparison_df = pd.DataFrame({
    'k': [3, 5, 7, 9],
    'Accuracy': [accuracy_k3.mean(), accuracy_k5.mean(), accuracy_k7.mean(), accuracy_k9.mean()],
    'Accuracy_Std': [accuracy_k3.std(), accuracy_k5.std(), accuracy_k7.std(), accuracy_k9.std()],
    'F1_Weighted': [f1_weighted_k3.mean(), f1_weighted_k5.mean(), f1_weighted_k7.mean(), f1_weighted_k9.mean()],
    'F1_Weighted_Std': [f1_weighted_k3.std(), f1_weighted_k5.std(), f1_weighted_k7.std(), f1_weighted_k9.std()],
    'F1_Macro': [f1_macro_k3.mean(), f1_macro_k5.mean(), f1_macro_k7.mean(), f1_macro_k9.mean()],
    'F1_Macro_Std': [f1_macro_k3.std(), f1_macro_k5.std(), f1_macro_k7.std(), f1_macro_k9.std()]
})

print("k-NN Performance Comparison:")
print("="*80)
comparison_df

In [None]:
# Visualize comparison
fig, axes = plt.subplots(1, 3, figsize=(18, 5))

# Accuracy
axes[0].errorbar(comparison_df['k'], comparison_df['Accuracy'], 
                 yerr=comparison_df['Accuracy_Std'], 
                 marker='o', capsize=5, capthick=2, linewidth=2, markersize=8)
axes[0].set_xlabel('k (Number of Neighbors)', fontsize=12)
axes[0].set_ylabel('Accuracy', fontsize=12)
axes[0].set_title('Accuracy vs k', fontsize=14, fontweight='bold')
axes[0].grid(True, alpha=0.3)
axes[0].set_xticks([3, 5, 7, 9])

# F1-Weighted
axes[1].errorbar(comparison_df['k'], comparison_df['F1_Weighted'], 
                 yerr=comparison_df['F1_Weighted_Std'], 
                 marker='o', capsize=5, capthick=2, linewidth=2, markersize=8, color='orange')
axes[1].set_xlabel('k (Number of Neighbors)', fontsize=12)
axes[1].set_ylabel('F1-Weighted', fontsize=12)
axes[1].set_title('F1-Weighted vs k', fontsize=14, fontweight='bold')
axes[1].grid(True, alpha=0.3)
axes[1].set_xticks([3, 5, 7, 9])

# F1-Macro
axes[2].errorbar(comparison_df['k'], comparison_df['F1_Macro'], 
                 yerr=comparison_df['F1_Macro_Std'], 
                 marker='o', capsize=5, capthick=2, linewidth=2, markersize=8, color='green')
axes[2].set_xlabel('k (Number of Neighbors)', fontsize=12)
axes[2].set_ylabel('F1-Macro', fontsize=12)
axes[2].set_title('F1-Macro vs k', fontsize=14, fontweight='bold')
axes[2].grid(True, alpha=0.3)
axes[2].set_xticks([3, 5, 7, 9])

plt.tight_layout()
plt.show()

### 3.1 k-NN Analysis and Conclusions

### Key Findings:

1. **Best k value**: k=3 achieves the highest performance
   - Accuracy: ~68.84%
   - F1-Weighted: ~68.12%
   - F1-Macro: ~54.37%

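2. **Performance decreases as k increases**:
   - k=3: The vote uses only the closest neighbors, so small classes are less likely to be outvoted
   - k=9: The vote is smoothed over more neighbors, so rare classes are likely outvoted by larger neighboring classes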

3. **Well-predicted classes** (k=3):
   - AAA: 89% F1-Score (226 examples)
   - AA+: 80% F1-Score (50 examples)
   - BB: 78% F1-Score (12 examples)

4. **Difficult classes** (k=3):
   - CC, CCC, CCC+: 0% F1-Score (only 3-4 examples each)
   - Classes with <10 examples are very hard to predict

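5. **Overall performance**:
   - 68.84% accuracy across 20 classes is well above chance
   - Uniform random guessing would give ~5% accuracy (1/20)
   - k-NN is roughly 14× better than uniform guessing (68.84 / 5 ≈ 13.8); because the classes are imbalanced, always predicting the most frequent class is a stronger baseline to compare against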
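
To put the accuracy figure in context, both chance-level baselines can be measured with the same folds. The sketch below uses scikit-learn's `DummyClassifier` on hypothetical imbalanced labels; `y_demo`, `X_demo` and the class counts are illustrative assumptions, not the notebook's data.

```python
import numpy as np
from sklearn.dummy import DummyClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score

# Hypothetical imbalanced labels: one dominant class plus smaller ones
rng = np.random.default_rng(0)
y_demo = np.repeat(["AAA", "AA+", "BB", "CCC"], [60, 25, 10, 5])
X_demo = rng.normal(size=(len(y_demo), 3))  # the dummy models ignore the features

skf_demo = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)

baselines = {}
for strategy in ("uniform", "most_frequent"):
    # "uniform" guesses a class at random; "most_frequent" always predicts the largest class
    dummy = DummyClassifier(strategy=strategy, random_state=0)
    scores = cross_val_score(dummy, X_demo, y_demo, cv=skf_demo, scoring="accuracy")
    baselines[strategy] = scores.mean()

for strategy, acc in baselines.items():
    print(f"{strategy:>14}: {acc:.4f}")
```

On imbalanced data the `most_frequent` baseline equals the share of the largest class, which is usually well above the uniform-guessing rate, so it is the more demanding reference point.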

In [None]:
# Load saved metrics
metrics_df = pd.read_csv('../results/classification_metrics.csv')
print("Saved Classification Metrics:")
print("="*80)
metrics_df

## 4. Naive Bayes Classification

Naive Bayes is a probabilistic classifier based on Bayes' theorem. It calculates the probability of each class given the features and predicts the class with the highest probability.

**Key points:**
- No feature normalization needed (unlike k-NN): each feature gets its own per-class Gaussian
- Assumes features are conditionally independent given the class (the "naive" assumption)
- Fast training and prediction
- Good probabilistic baseline
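
The decision rule described above can be written out by hand. This is an illustrative NumPy sketch on synthetic, well-separated data, compared against scikit-learn's `GaussianNB`; the helper `gaussian_nb_predict` is hypothetical, and its variance smoothing term is an assumption chosen to mirror the library default.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.naive_bayes import GaussianNB

# Synthetic stand-in data (assumption, not the notebook's dataset)
X_demo, y_demo = make_blobs(
    n_samples=150, centers=3, n_features=4, cluster_std=1.0, random_state=0
)

def gaussian_nb_predict(X_train, y_train, X_new, var_smoothing=1e-9):
    """Gaussian Naive Bayes written out with NumPy for illustration."""
    classes = np.unique(y_train)
    # Small variance floor for numerical stability
    eps = var_smoothing * X_train.var(axis=0).max()
    log_posteriors = []
    for c in classes:
        Xc = X_train[y_train == c]
        prior = Xc.shape[0] / X_train.shape[0]
        mean = Xc.mean(axis=0)
        var = Xc.var(axis=0) + eps
        # Summing per-feature log densities is the independence assumption
        log_lik = -0.5 * np.sum(
            np.log(2 * np.pi * var) + (X_new - mean) ** 2 / var, axis=1
        )
        log_posteriors.append(np.log(prior) + log_lik)
    # Predict the class with the highest (unnormalized) log posterior
    return classes[np.argmax(np.column_stack(log_posteriors), axis=1)]

manual_pred = gaussian_nb_predict(X_demo, y_demo, X_demo)
sklearn_pred = GaussianNB().fit(X_demo, y_demo).predict(X_demo)
agreement = np.mean(manual_pred == sklearn_pred)
print(f"Agreement with GaussianNB: {agreement:.3f}")
```

Because the log-likelihood is a plain sum over features, two strongly correlated features contribute the same evidence twice, which is one way the independence assumption can hurt.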

In [None]:
# Create Gaussian Naive Bayes model
nb_model = GaussianNB()

# Cross-validation scores (no normalization needed)
accuracy_nb = cross_val_score(nb_model, X, y, cv=skf, scoring='accuracy')
f1_weighted_nb = cross_val_score(nb_model, X, y, cv=skf, scoring='f1_weighted')
f1_macro_nb = cross_val_score(nb_model, X, y, cv=skf, scoring='f1_macro')

print("Naive Bayes Cross-Validation Results:")
print(f"  Accuracy:      {accuracy_nb.mean():.4f} ± {accuracy_nb.std():.4f}")
print(f"  F1-Weighted:   {f1_weighted_nb.mean():.4f} ± {f1_weighted_nb.std():.4f}")
print(f"  F1-Macro:      {f1_macro_nb.mean():.4f} ± {f1_macro_nb.std():.4f}")

# Train final model
nb_model.fit(X, y)
print("\n✓ Model trained on full dataset")

In [None]:
# Confusion Matrix for Naive Bayes (built from out-of-fold predictions)
y_pred_nb = cross_val_predict(nb_model, X, y, cv=skf)
labels = sorted(y.unique())  # alphabetical order, not rating order
cm_nb = confusion_matrix(y, y_pred_nb, labels=labels)

# Visualize
plt.figure(figsize=(14, 12))
sns.heatmap(cm_nb, annot=True, fmt='d', cmap='Blues', 
            xticklabels=labels, yticklabels=labels,
            cbar_kws={'label': 'Count'})
plt.title(f'Naive Bayes Confusion Matrix\nAccuracy: {accuracy_score(y, y_pred_nb):.4f}', 
          fontsize=14, fontweight='bold')
plt.ylabel('True Label', fontsize=12)
plt.xlabel('Predicted Label', fontsize=12)
plt.xticks(rotation=45, ha='right')
plt.yticks(rotation=0)
plt.tight_layout()
plt.show()

In [None]:
# Classification Report for Naive Bayes
print("Classification Report (Naive Bayes):")
print(classification_report(y, y_pred_nb, zero_division=0))

## 5. Comparison: k-NN vs Naive Bayes

In [None]:
# Create comparison dataframe including Naive Bayes
comparison_all_df = pd.DataFrame({
    'Model': ['k-NN (k=3)', 'k-NN (k=5)', 'k-NN (k=7)', 'k-NN (k=9)', 'Naive Bayes'],
    'Accuracy': [accuracy_k3.mean(), accuracy_k5.mean(), accuracy_k7.mean(), accuracy_k9.mean(), accuracy_nb.mean()],
    'Accuracy_Std': [accuracy_k3.std(), accuracy_k5.std(), accuracy_k7.std(), accuracy_k9.std(), accuracy_nb.std()],
    'F1_Weighted': [f1_weighted_k3.mean(), f1_weighted_k5.mean(), f1_weighted_k7.mean(), f1_weighted_k9.mean(), f1_weighted_nb.mean()],
    'F1_Weighted_Std': [f1_weighted_k3.std(), f1_weighted_k5.std(), f1_weighted_k7.std(), f1_weighted_k9.std(), f1_weighted_nb.std()],
    'F1_Macro': [f1_macro_k3.mean(), f1_macro_k5.mean(), f1_macro_k7.mean(), f1_macro_k9.mean(), f1_macro_nb.mean()],
    'F1_Macro_Std': [f1_macro_k3.std(), f1_macro_k5.std(), f1_macro_k7.std(), f1_macro_k9.std(), f1_macro_nb.std()]
})

print("All Models Performance Comparison:")
print("="*80)
comparison_all_df

In [None]:
# Visualize comparison
fig, axes = plt.subplots(1, 3, figsize=(18, 5))

models = ['k-NN\n(k=3)', 'k-NN\n(k=5)', 'k-NN\n(k=7)', 'k-NN\n(k=9)', 'Naive\nBayes']
x_pos = np.arange(len(models))

# Accuracy
axes[0].bar(x_pos, comparison_all_df['Accuracy'], yerr=comparison_all_df['Accuracy_Std'], 
            capsize=5, alpha=0.7, color=['#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#ff7f0e'])
axes[0].set_ylabel('Accuracy', fontsize=12)
axes[0].set_title('Accuracy Comparison', fontsize=14, fontweight='bold')
axes[0].set_xticks(x_pos)
axes[0].set_xticklabels(models)
axes[0].grid(True, alpha=0.3, axis='y')

# F1-Weighted
axes[1].bar(x_pos, comparison_all_df['F1_Weighted'], yerr=comparison_all_df['F1_Weighted_Std'], 
            capsize=5, alpha=0.7, color=['#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#ff7f0e'])
axes[1].set_ylabel('F1-Weighted', fontsize=12)
axes[1].set_title('F1-Weighted Comparison', fontsize=14, fontweight='bold')
axes[1].set_xticks(x_pos)
axes[1].set_xticklabels(models)
axes[1].grid(True, alpha=0.3, axis='y')

# F1-Macro
axes[2].bar(x_pos, comparison_all_df['F1_Macro'], yerr=comparison_all_df['F1_Macro_Std'], 
            capsize=5, alpha=0.7, color=['#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#ff7f0e'])
axes[2].set_ylabel('F1-Macro', fontsize=12)
axes[2].set_title('F1-Macro Comparison', fontsize=14, fontweight='bold')
axes[2].set_xticks(x_pos)
axes[2].set_xticklabels(models)
axes[2].grid(True, alpha=0.3, axis='y')

plt.tight_layout()
plt.show()

## 6. Overall Analysis

### Naive Bayes Performance:

**Results:**
- Accuracy: ~37.26%
- F1-Weighted: ~37.32%
- F1-Macro: ~30.67%

**Why is Naive Bayes much weaker than k-NN?**

1. **Independence assumption violated**: Naive Bayes assumes features are conditionally independent given the class, but macroeconomic indicators move together:
   - Interest_Rate and Inflation are correlated
   - Unemployment and GDP_Growth are correlated
   - Correlated features are counted as separate evidence, which distorts the class probabilities

2. **20 classes with imbalanced data**: 
   - Classes with few examples (CC=3, CCC=4) are very difficult: per-class means and variances are estimated from only a handful of points
   - Gaussian assumption may not hold for all features

3. **k-NN handles correlation naturally**: 
   - k-NN uses distances, not probabilities
   - No independence assumption needed

### Best Model So Far:

**k-NN (k=3)** is the clear winner:
- 68.84% accuracy (vs 37.26% for Naive Bayes)
- 31.58 percentage points higher than Naive Bayes, an 85% relative improvement
- About 14× better than uniform random guessing (5%)
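
The comparison figures follow from simple arithmetic on the reported accuracies. The short check below uses only the numbers quoted in this section; the variable names are illustrative.

```python
knn_acc = 68.84   # k-NN (k=3) accuracy in %, as reported above
nb_acc = 37.26    # Naive Bayes accuracy in %, as reported above
n_classes = 20

uniform_acc = 100 / n_classes                 # accuracy of uniform guessing, in %
point_gap = knn_acc - nb_acc                  # absolute gap in percentage points
relative_gain = point_gap / nb_acc * 100      # relative improvement over Naive Bayes, in %
ratio_vs_uniform = knn_acc / uniform_acc      # multiple of the uniform-guessing rate

print(f"Gap: {point_gap:.2f} points")
print(f"Relative gain: {relative_gain:.1f}%")
print(f"Ratio vs uniform guessing: {ratio_vs_uniform:.1f}x")
```

Keeping percentage points and relative percentages separate avoids ambiguity when the same comparison is quoted in both forms.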

In [None]:
# Load saved metrics
metrics_df = pd.read_csv('../results/classification_metrics.csv')
print("Saved Classification Metrics:")
print("="*80)
metrics_df

## Next Steps

In the following sections, we will implement and compare:
- Decision Tree (interpretable model with max_depth tuning)
- Random Forest (ensemble method for improved performance)

These tree-based models should handle feature correlation better than Naive Bayes and potentially match or exceed k-NN performance.

## 7. Decision Tree Classification

Decision Tree creates a tree-like model of decisions based on feature values. Each internal node represents a test on a feature, each branch represents the outcome, and each leaf node represents a class label.

**Key points:**
- No feature normalization needed
- Highly interpretable (can visualize the tree)
- Provides feature importance
- Test max_depth = 3, 5, 10
- Uses `class_weight='balanced'` to handle class imbalance
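
As a rough illustration of what `class_weight='balanced'` does, scikit-learn weights each class by `n_samples / (n_classes * count)`. The sketch below uses four of the class counts quoted earlier in this notebook as a toy label set, so the resulting weights are illustrative and not those of the full 20-class problem.

```python
import numpy as np
from sklearn.utils.class_weight import compute_class_weight

# Toy label set built from four class counts mentioned in this notebook
# (illustrative subset only; the real data has 20 classes)
y_demo = np.repeat(["AAA", "AA+", "BB", "CC"], [226, 50, 12, 3])

classes = np.unique(y_demo)
counts = np.array([(y_demo == c).sum() for c in classes])

# Weights as computed by scikit-learn for class_weight='balanced'
weights = compute_class_weight(class_weight="balanced", classes=classes, y=y_demo)

# The same formula written out by hand
manual = len(y_demo) / (len(classes) * counts)

for c, n, w in zip(classes, counts, weights):
    print(f"{c:>4}: count={n:>3}  weight={w:.3f}")
```

The weight ratio between two classes is the inverse of their count ratio, so in this toy example a single CC sample carries about 75 times the weight of an AAA sample (226 / 3). A shallow tree with few leaves can spend its limited splits on those heavily weighted rare classes.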

### 7.1 Decision Tree with max_depth=3

In [None]:
# Create Decision Tree model with max_depth=3
dt_model_3 = DecisionTreeClassifier(max_depth=3, random_state=42, class_weight='balanced')

# Cross-validation scores
accuracy_dt3 = cross_val_score(dt_model_3, X, y, cv=skf, scoring='accuracy')
f1_weighted_dt3 = cross_val_score(dt_model_3, X, y, cv=skf, scoring='f1_weighted')
f1_macro_dt3 = cross_val_score(dt_model_3, X, y, cv=skf, scoring='f1_macro')

print("Decision Tree (depth=3) Cross-Validation Results:")
print(f"  Accuracy:      {accuracy_dt3.mean():.4f} ± {accuracy_dt3.std():.4f}")
print(f"  F1-Weighted:   {f1_weighted_dt3.mean():.4f} ± {f1_weighted_dt3.std():.4f}")
print(f"  F1-Macro:      {f1_macro_dt3.mean():.4f} ± {f1_macro_dt3.std():.4f}")

# Train final model
dt_model_3.fit(X, y)
print("\n✓ Model trained on full dataset")

In [None]:
# Feature Importance for depth=3
feature_importance_dt3 = dt_model_3.feature_importances_
importance_df_dt3 = pd.DataFrame({
    'Feature': X.columns,
    'Importance': feature_importance_dt3
}).sort_values('Importance', ascending=False)

print("Feature Importance (depth=3):")
print(importance_df_dt3)

### 7.2 Decision Tree with max_depth=5

In [None]:
# Decision Tree with max_depth=5
dt_model_5 = DecisionTreeClassifier(max_depth=5, random_state=42, class_weight='balanced')

accuracy_dt5 = cross_val_score(dt_model_5, X, y, cv=skf, scoring='accuracy')
f1_weighted_dt5 = cross_val_score(dt_model_5, X, y, cv=skf, scoring='f1_weighted')
f1_macro_dt5 = cross_val_score(dt_model_5, X, y, cv=skf, scoring='f1_macro')

print("Decision Tree (depth=5) Cross-Validation Results:")
print(f"  Accuracy:      {accuracy_dt5.mean():.4f} ± {accuracy_dt5.std():.4f}")
print(f"  F1-Weighted:   {f1_weighted_dt5.mean():.4f} ± {f1_weighted_dt5.std():.4f}")
print(f"  F1-Macro:      {f1_macro_dt5.mean():.4f} ± {f1_macro_dt5.std():.4f}")

dt_model_5.fit(X, y)
y_pred_dt5 = cross_val_predict(dt_model_5, X, y, cv=skf)

### 7.3 Decision Tree with max_depth=10

In [None]:
# Decision Tree with max_depth=10
dt_model_10 = DecisionTreeClassifier(max_depth=10, random_state=42, class_weight='balanced')

accuracy_dt10 = cross_val_score(dt_model_10, X, y, cv=skf, scoring='accuracy')
f1_weighted_dt10 = cross_val_score(dt_model_10, X, y, cv=skf, scoring='f1_weighted')
f1_macro_dt10 = cross_val_score(dt_model_10, X, y, cv=skf, scoring='f1_macro')

print("Decision Tree (depth=10) Cross-Validation Results:")
print(f"  Accuracy:      {accuracy_dt10.mean():.4f} ± {accuracy_dt10.std():.4f}")
print(f"  F1-Weighted:   {f1_weighted_dt10.mean():.4f} ± {f1_weighted_dt10.std():.4f}")
print(f"  F1-Macro:      {f1_macro_dt10.mean():.4f} ± {f1_macro_dt10.std():.4f}")

dt_model_10.fit(X, y)
y_pred_dt10 = cross_val_predict(dt_model_10, X, y, cv=skf)

In [None]:
# Feature Importance for depth=10 (Best Decision Tree)
feature_importance_dt10 = dt_model_10.feature_importances_
importance_df_dt10 = pd.DataFrame({
    'Feature': X.columns,
    'Importance': feature_importance_dt10
}).sort_values('Importance', ascending=False)

# Visualize
plt.figure(figsize=(10, 6))
plt.barh(importance_df_dt10.sort_values('Importance')['Feature'], 
         importance_df_dt10.sort_values('Importance')['Importance'], 
         color='steelblue')
plt.xlabel('Importance', fontsize=12)
plt.ylabel('Feature', fontsize=12)
plt.title('Decision Tree (depth=10) - Feature Importance', fontsize=14, fontweight='bold')
plt.grid(True, alpha=0.3, axis='x')
plt.tight_layout()
plt.show()

print("\nFeature Importance Ranking:")
print(importance_df_dt10)

### 7.4 Comparison of Decision Tree max_depth Values with Earlier Models

In [None]:
# Create comparison dataframe for all models including Decision Trees
comparison_all_models = pd.DataFrame({
    'Model': ['k-NN (k=3)', 'k-NN (k=5)', 'k-NN (k=7)', 'k-NN (k=9)', 
              'Naive Bayes', 'DT (depth=3)', 'DT (depth=5)', 'DT (depth=10)'],
    'Accuracy': [accuracy_k3.mean(), accuracy_k5.mean(), accuracy_k7.mean(), accuracy_k9.mean(), 
                 accuracy_nb.mean(), accuracy_dt3.mean(), accuracy_dt5.mean(), accuracy_dt10.mean()],
    'Accuracy_Std': [accuracy_k3.std(), accuracy_k5.std(), accuracy_k7.std(), accuracy_k9.std(), 
                     accuracy_nb.std(), accuracy_dt3.std(), accuracy_dt5.std(), accuracy_dt10.std()],
    'F1_Weighted': [f1_weighted_k3.mean(), f1_weighted_k5.mean(), f1_weighted_k7.mean(), f1_weighted_k9.mean(), 
                    f1_weighted_nb.mean(), f1_weighted_dt3.mean(), f1_weighted_dt5.mean(), f1_weighted_dt10.mean()],
    'F1_Weighted_Std': [f1_weighted_k3.std(), f1_weighted_k5.std(), f1_weighted_k7.std(), f1_weighted_k9.std(), 
                        f1_weighted_nb.std(), f1_weighted_dt3.std(), f1_weighted_dt5.std(), f1_weighted_dt10.std()]
})

print("All Models Performance Comparison:")
print("="*80)
comparison_all_models.sort_values('Accuracy', ascending=False)

In [None]:
# Visualize comparison of all models
fig, axes = plt.subplots(1, 2, figsize=(16, 6))

models = comparison_all_models['Model']
x_pos = np.arange(len(models))

# Accuracy
axes[0].bar(x_pos, comparison_all_models['Accuracy'], 
            yerr=comparison_all_models['Accuracy_Std'], 
            capsize=5, alpha=0.7, 
            color=['#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#ff7f0e', '#2ca02c', '#2ca02c', '#2ca02c'])
axes[0].set_ylabel('Accuracy', fontsize=12)
axes[0].set_title('Accuracy Comparison - All Models', fontsize=14, fontweight='bold')
axes[0].set_xticks(x_pos)
axes[0].set_xticklabels(models, rotation=45, ha='right')
axes[0].grid(True, alpha=0.3, axis='y')

# F1-Weighted
axes[1].bar(x_pos, comparison_all_models['F1_Weighted'], 
            yerr=comparison_all_models['F1_Weighted_Std'], 
            capsize=5, alpha=0.7, 
            color=['#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#ff7f0e', '#2ca02c', '#2ca02c', '#2ca02c'])
axes[1].set_ylabel('F1-Weighted', fontsize=12)
axes[1].set_title('F1-Weighted Comparison - All Models', fontsize=14, fontweight='bold')
axes[1].set_xticks(x_pos)
axes[1].set_xticklabels(models, rotation=45, ha='right')
axes[1].grid(True, alpha=0.3, axis='y')

plt.tight_layout()
plt.show()

## 8. Decision Tree Analysis

### Key Findings:

1. **Performance varies significantly with max_depth**:
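   - depth=3: Very poor (8% accuracy) - a depth-3 tree has at most 8 leaves, so it cannot predict all 20 classes, and balanced class weights likely push those few leaves toward rare classes
   - depth=5: Poor (23% accuracy) - at most 32 leaves, still too coarse
   - depth=10: Moderate (52% accuracy) - best Decision Tree performance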

2. **Decision Tree (depth=10) vs other models**:
   - Better than Naive Bayes (37%)
   - Worse than all k-NN variants (60-69%)
   - Best single Decision Tree: depth=10

3. **Feature Importance (depth=10)**:
   - **FX_Reserves**: 23.63% (most important!)
   - **Public_Debt**: 21.28%
   - **Interest_Rate**: 14.54%
   - **Unemployment**: 14.51%
   
4. **Insights**:
   - Foreign exchange reserves and public debt are the strongest predictors
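   - Decision Trees make no independence assumption, so correlated features do not distort predictions the way they do for Naive Bayes
   - Deeper trees perform better over the tested range (3, 5, 10), but deeper trees risk overfitting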
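
One way to probe the overfitting risk mentioned above is to compare training and validation accuracy at each depth. This is a minimal sketch on synthetic stand-in data (`X_demo`, `y_demo` are assumptions), and it adds an unlimited-depth tree that the notebook itself does not evaluate.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import StratifiedKFold, cross_validate
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the notebook's X and y (assumption, not the real data)
X_demo, y_demo = make_classification(
    n_samples=400, n_features=8, n_informative=5, n_classes=4,
    n_clusters_per_class=1, random_state=42,
)
skf_demo = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)

depth_scores = {}
for depth in (3, 5, 10, None):  # None lets the tree grow until leaves are pure
    tree = DecisionTreeClassifier(
        max_depth=depth, random_state=42, class_weight="balanced"
    )
    cv = cross_validate(
        tree, X_demo, y_demo, cv=skf_demo,
        scoring="accuracy", return_train_score=True,
    )
    depth_scores[depth] = (cv["train_score"].mean(), cv["test_score"].mean())

for depth, (train_acc, test_acc) in depth_scores.items():
    print(f"max_depth={depth}: train={train_acc:.3f}  cv={test_acc:.3f}")
```

A training score that keeps climbing while the cross-validated score flattens or drops is the usual sign that additional depth is memorizing the training folds.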

### Current Best Model: k-NN (k=3) with 68.84% accuracy

## 9. Random Forest Classification

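Random Forest is an ensemble learning method that combines multiple decision trees. Each tree is trained on a bootstrap sample of the data and considers a random subset of features at each split. The final prediction aggregates the trees' outputs; scikit-learn averages the trees' predicted class probabilities rather than counting hard votes.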

**Key points:**
- Ensemble of decision trees (Bagging)
- No feature normalization needed
- Reduces overfitting compared to single Decision Tree
- Test n_estimators = 50, 100, 200
- Provides feature importance
- Uses `class_weight='balanced'` and `n_jobs=-1` (parallel processing)
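
The aggregation step can be checked directly on a fitted forest. The sketch below, on synthetic stand-in data, averages the per-tree class probabilities and compares the result with the forest's own `predict_proba`; variable names such as `avg_proba` are illustrative.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in for the notebook's X and y (assumption, not the real data)
X_demo, y_demo = make_classification(
    n_samples=300, n_features=8, n_informative=5, n_classes=4,
    n_clusters_per_class=1, random_state=42,
)

forest = RandomForestClassifier(
    n_estimators=50, random_state=42, class_weight="balanced"
).fit(X_demo, y_demo)

# Class probabilities from every individual tree: shape (n_trees, n_samples, n_classes)
tree_probas = np.stack([tree.predict_proba(X_demo) for tree in forest.estimators_])

# Averaging over trees reproduces the forest's probabilities
avg_proba = tree_probas.mean(axis=0)
matches = np.allclose(avg_proba, forest.predict_proba(X_demo))
print("Forest probabilities equal the tree average:", matches)
```

With fully grown trees most leaves are pure, so each tree's probabilities are close to one-hot and the average behaves much like a majority vote.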

### 9.1 Random Forest with n_estimators=50

In [None]:
# Create Random Forest model with n_estimators=50
rf_model_50 = RandomForestClassifier(n_estimators=50, max_depth=None, random_state=42, 
                                      class_weight='balanced', n_jobs=-1)

# Cross-validation scores
accuracy_rf50 = cross_val_score(rf_model_50, X, y, cv=skf, scoring='accuracy')
f1_weighted_rf50 = cross_val_score(rf_model_50, X, y, cv=skf, scoring='f1_weighted')
f1_macro_rf50 = cross_val_score(rf_model_50, X, y, cv=skf, scoring='f1_macro')

print("Random Forest (n=50) Cross-Validation Results:")
print(f"  Accuracy:      {accuracy_rf50.mean():.4f} ± {accuracy_rf50.std():.4f}")
print(f"  F1-Weighted:   {f1_weighted_rf50.mean():.4f} ± {f1_weighted_rf50.std():.4f}")
print(f"  F1-Macro:      {f1_macro_rf50.mean():.4f} ± {f1_macro_rf50.std():.4f}")

# Train final model
rf_model_50.fit(X, y)
print("\n✓ Model trained on full dataset")

### 9.2 Random Forest with n_estimators=100

In [None]:
# Random Forest with n_estimators=100
rf_model_100 = RandomForestClassifier(n_estimators=100, max_depth=None, random_state=42, 
                                       class_weight='balanced', n_jobs=-1)

accuracy_rf100 = cross_val_score(rf_model_100, X, y, cv=skf, scoring='accuracy')
f1_weighted_rf100 = cross_val_score(rf_model_100, X, y, cv=skf, scoring='f1_weighted')
f1_macro_rf100 = cross_val_score(rf_model_100, X, y, cv=skf, scoring='f1_macro')

print("Random Forest (n=100) Cross-Validation Results:")
print(f"  Accuracy:      {accuracy_rf100.mean():.4f} ± {accuracy_rf100.std():.4f}")
print(f"  F1-Weighted:   {f1_weighted_rf100.mean():.4f} ± {f1_weighted_rf100.std():.4f}")
print(f"  F1-Macro:      {f1_macro_rf100.mean():.4f} ± {f1_macro_rf100.std():.4f}")

rf_model_100.fit(X, y)
y_pred_rf100 = cross_val_predict(rf_model_100, X, y, cv=skf)

### 9.3 Random Forest with n_estimators=200 (Best Model)

In [None]:
# Random Forest with n_estimators=200
rf_model_200 = RandomForestClassifier(n_estimators=200, max_depth=None, random_state=42, 
                                       class_weight='balanced', n_jobs=-1)

accuracy_rf200 = cross_val_score(rf_model_200, X, y, cv=skf, scoring='accuracy')
f1_weighted_rf200 = cross_val_score(rf_model_200, X, y, cv=skf, scoring='f1_weighted')
f1_macro_rf200 = cross_val_score(rf_model_200, X, y, cv=skf, scoring='f1_macro')

print("Random Forest (n=200) Cross-Validation Results:")
print(f"  Accuracy:      {accuracy_rf200.mean():.4f} ± {accuracy_rf200.std():.4f}")
print(f"  F1-Weighted:   {f1_weighted_rf200.mean():.4f} ± {f1_weighted_rf200.std():.4f}")
print(f"  F1-Macro:      {f1_macro_rf200.mean():.4f} ± {f1_macro_rf200.std():.4f}")

rf_model_200.fit(X, y)
y_pred_rf200 = cross_val_predict(rf_model_200, X, y, cv=skf)

In [None]:
# Feature Importance for Random Forest (n=200)
feature_importance_rf200 = rf_model_200.feature_importances_
importance_df_rf200 = pd.DataFrame({
    'Feature': X.columns,
    'Importance': feature_importance_rf200
}).sort_values('Importance', ascending=False)

# Visualize
plt.figure(figsize=(10, 6))
plt.barh(importance_df_rf200.sort_values('Importance')['Feature'], 
         importance_df_rf200.sort_values('Importance')['Importance'], 
         color='forestgreen')
plt.xlabel('Importance', fontsize=12)
plt.ylabel('Feature', fontsize=12)
plt.title('Random Forest (n=200) - Feature Importance', fontsize=14, fontweight='bold')
plt.grid(True, alpha=0.3, axis='x')
plt.tight_layout()
plt.show()

print("\nFeature Importance Ranking:")
print(importance_df_rf200)

### 9.4 Comparison of Random Forest n_estimators Values and Final Comparison of All Models

In [None]:
# Create final comparison dataframe for ALL models
comparison_final = pd.DataFrame({
    'Model': ['k-NN (k=3)', 'k-NN (k=5)', 'k-NN (k=7)', 'k-NN (k=9)', 
              'Naive Bayes', 'DT (depth=3)', 'DT (depth=5)', 'DT (depth=10)',
              'RF (n=50)', 'RF (n=100)', 'RF (n=200)'],
    'Accuracy': [accuracy_k3.mean(), accuracy_k5.mean(), accuracy_k7.mean(), accuracy_k9.mean(), 
                 accuracy_nb.mean(), accuracy_dt3.mean(), accuracy_dt5.mean(), accuracy_dt10.mean(),
                 accuracy_rf50.mean(), accuracy_rf100.mean(), accuracy_rf200.mean()],
    'Accuracy_Std': [accuracy_k3.std(), accuracy_k5.std(), accuracy_k7.std(), accuracy_k9.std(), 
                     accuracy_nb.std(), accuracy_dt3.std(), accuracy_dt5.std(), accuracy_dt10.std(),
                     accuracy_rf50.std(), accuracy_rf100.std(), accuracy_rf200.std()],
    'F1_Weighted': [f1_weighted_k3.mean(), f1_weighted_k5.mean(), f1_weighted_k7.mean(), f1_weighted_k9.mean(), 
                    f1_weighted_nb.mean(), f1_weighted_dt3.mean(), f1_weighted_dt5.mean(), f1_weighted_dt10.mean(),
                    f1_weighted_rf50.mean(), f1_weighted_rf100.mean(), f1_weighted_rf200.mean()],
    'F1_Weighted_Std': [f1_weighted_k3.std(), f1_weighted_k5.std(), f1_weighted_k7.std(), f1_weighted_k9.std(), 
                        f1_weighted_nb.std(), f1_weighted_dt3.std(), f1_weighted_dt5.std(), f1_weighted_dt10.std(),
                        f1_weighted_rf50.std(), f1_weighted_rf100.std(), f1_weighted_rf200.std()]
})

print("FINAL Performance Comparison - All Models:")
print("="*80)
comparison_final.sort_values('Accuracy', ascending=False)

In [None]:
# Visualize final comparison of all models
fig, axes = plt.subplots(1, 2, figsize=(18, 6))

models = comparison_final['Model']
x_pos = np.arange(len(models))

# Color coding: k-NN (blue), Naive Bayes (orange), Decision Tree (green), Random Forest (red)
colors = ['#1f77b4', '#1f77b4', '#1f77b4', '#1f77b4', '#ff7f0e', 
          '#2ca02c', '#2ca02c', '#2ca02c', '#d62728', '#d62728', '#d62728']

# Accuracy
axes[0].bar(x_pos, comparison_final['Accuracy'], 
            yerr=comparison_final['Accuracy_Std'], 
            capsize=5, alpha=0.7, color=colors)
axes[0].set_ylabel('Accuracy', fontsize=12)
axes[0].set_title('Accuracy Comparison - All Models', fontsize=14, fontweight='bold')
axes[0].set_xticks(x_pos)
axes[0].set_xticklabels(models, rotation=45, ha='right')
axes[0].grid(True, alpha=0.3, axis='y')
axes[0].axhline(y=0.8, color='red', linestyle='--', alpha=0.5, label='80% threshold')
axes[0].legend()

# F1-Weighted
axes[1].bar(x_pos, comparison_final['F1_Weighted'], 
            yerr=comparison_final['F1_Weighted_Std'], 
            capsize=5, alpha=0.7, color=colors)
axes[1].set_ylabel('F1-Weighted', fontsize=12)
axes[1].set_title('F1-Weighted Comparison - All Models', fontsize=14, fontweight='bold')
axes[1].set_xticks(x_pos)
axes[1].set_xticklabels(models, rotation=45, ha='right')
axes[1].grid(True, alpha=0.3, axis='y')
axes[1].axhline(y=0.8, color='red', linestyle='--', alpha=0.5, label='80% threshold')
axes[1].legend()

plt.tight_layout()
plt.show()

## 10. Final Analysis and Conclusions

### Random Forest Performance:

**Results:**
- **n=50**: 78.42% accuracy, 77.20% F1-weighted
- **n=100**: 78.95% accuracy, 77.52% F1-weighted
- **n=200**: **79.68% accuracy, 78.47% F1-weighted** ← **BEST MODEL**

**Why Random Forest is the best:**

1. **Ensemble Learning**: Combines 200 decision trees by aggregating their predictions
   - Each tree sees different data (bootstrap sampling)
   - Each split considers a random subset of features
   - Averaging many decorrelated trees reduces variance compared with a single tree

2. **Performance comparison**:
   - **10.84 points better** than k-NN (k=3): 79.68% vs 68.84%
   - **27.89 points better** than Decision Tree (depth=10): 79.68% vs 51.79%
   - **42.42 points better** than Naive Bayes: 79.68% vs 37.26%

3. **Feature Importance (n=200)**:
   - **FX_Reserves**: 16.97% (most important)
   - **Public_Debt**: 15.45%
   - **Unemployment**: 13.86%
   - **Interest_Rate**: 13.53%
   - All features contribute (balanced usage)

4. **Stability**:
   - Low standard deviation: ±2.6%
   - Consistent across all 5 folds
   - Comparable to k-NN (±2.8%); a 0.2-point difference over 5 folds is too small to call one model more stable
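
Because both models are scored on the same folds, their per-fold accuracies can be compared pairwise rather than through the two standard deviations alone. This is a hedged sketch on synthetic stand-in data; `fold_gap` and the other names are illustrative.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the notebook's X and y (assumption, not the real data)
X_demo, y_demo = make_classification(
    n_samples=300, n_features=8, n_informative=5, n_classes=4,
    n_clusters_per_class=1, random_state=42,
)

# A fixed random_state makes the splitter produce identical folds for both models
skf_demo = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)

knn = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=3))
forest = RandomForestClassifier(n_estimators=50, random_state=42)

knn_folds = cross_val_score(knn, X_demo, y_demo, cv=skf_demo, scoring="accuracy")
rf_folds = cross_val_score(forest, X_demo, y_demo, cv=skf_demo, scoring="accuracy")

# Per-fold difference: positive values mean the forest did better on that fold
fold_gap = rf_folds - knn_folds
print("Per-fold gap:", np.round(fold_gap, 3))
print(f"Mean gap: {fold_gap.mean():.3f} ± {fold_gap.std():.3f}")
```

If the gap has the same sign on every fold, the ranking of the two models is less likely to be an artifact of one particular split.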

### Model Type Comparison:

| Model Type | Best Model | Accuracy | Strength |
|------------|------------|----------|----------|
| **Ensemble (Bagging)** | **Random Forest (n=200)** | **79.68%** | **Winner** |
| Distance-based | k-NN (k=3) | 68.84% | Good |
| Tree-based | Decision Tree (depth=10) | 51.79% | Interpretable |
| Probabilistic | Naive Bayes | 37.26% | Fast baseline |

### Key Insights:

1. **Ensemble methods dominate**: Random Forest significantly outperforms all single models
2. **Feature importance**: Foreign exchange reserves and public debt are the strongest predictors
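3. **Class imbalance addressed**: `class_weight='balanced'` is used to counter class imbalance; per-class F1 scores for the rarest classes should be checked before concluding it is fully handled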
4. **Cross-validation essential**: Provides reliable performance estimates

### Final Recommendation:

**Use Random Forest (n=200) for production** with 79.68% accuracy and 78.47% F1-weighted score.

In [None]:
# Load and display saved metrics
metrics_df = pd.read_csv('../results/classification_metrics.csv')
print("Saved Classification Metrics:")
print("="*80)
metrics_df