# Airbnb New User Booking Destinations - Model Evaluation

This notebook evaluates the models trained in the previous notebook and selects the best-performing one for predicting Airbnb new user booking destinations.

## Objectives:
- Load the trained models
- Evaluate their performance using various metrics
- Compare models and select the best one
- Generate predictions for the test set
- Prepare submission file

## Import Required Libraries

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import pickle
import os
from datetime import datetime

from sklearn.metrics import accuracy_score, f1_score, confusion_matrix, classification_report
from sklearn.preprocessing import LabelEncoder
import joblib

# Set plotting style
plt.style.use('seaborn-v0_8-whitegrid')
plt.rcParams['figure.figsize'] = (12, 8)

# For reproducibility
np.random.seed(42)

## Load Preprocessed Data

First, let's load the preprocessed training and test data that we prepared in the previous notebooks.

In [None]:
# Load the preprocessed training and test data
X_train = pd.read_csv('../data/processed/X_train_preprocessed.csv')
X_test = pd.read_csv('../data/processed/X_test_preprocessed.csv')
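# `read_csv(..., squeeze=True)` was removed in pandas 2.0; squeeze the single column explicitly
y_train = pd.read_csv('../data/processed/y_train.csv').squeeze('columns')
X_val = pd.read_csv('../data/processed/X_val_preprocessed.csv')
y_val = pd.read_csv('../data/processed/y_val.csv').squeeze('columns')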

print(f"Training data shape: {X_train.shape}")
print(f"Validation data shape: {X_val.shape}")
print(f"Test data shape: {X_test.shape}")
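
Before evaluating, it can help to confirm that the three feature matrices share the same columns in the same order, since a model fitted on `X_train` will silently misread reordered columns. The sketch below is one way to check this; `check_same_columns` and the toy frames are hypothetical helpers for illustration, not part of the earlier notebooks.

```python
import pandas as pd

def check_same_columns(reference, other, name):
    """Raise if `other` does not have the same columns, in the same order, as `reference`."""
    missing = [c for c in reference.columns if c not in other.columns]
    extra = [c for c in other.columns if c not in reference.columns]
    if missing or extra:
        raise ValueError(f"{name}: missing columns {missing}, unexpected columns {extra}")
    if list(reference.columns) != list(other.columns):
        raise ValueError(f"{name}: same columns as the reference but in a different order")

# Toy frames standing in for X_train / X_val / X_test (values are made up)
toy_train = pd.DataFrame({"age": [30, 41], "signup_flow": [0, 3]})
toy_val = pd.DataFrame({"age": [25], "signup_flow": [1]})
toy_reordered = pd.DataFrame({"signup_flow": [1], "age": [25]})

check_same_columns(toy_train, toy_val, "validation")  # passes silently

try:
    check_same_columns(toy_train, toy_reordered, "test")
except ValueError as err:
    order_error = str(err)
    print(order_error)
```

In this notebook the same call would be made with `X_train` as the reference and `X_val` and `X_test` as the frames to check.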

## Load Trained Models

Let's load the models we trained in the previous notebook.

In [None]:
# Create a directory to store models if it doesn't exist
os.makedirs('../models', exist_ok=True)

# Function to load models
def load_model(model_path):
    try:
        return joblib.load(model_path)
    except Exception as e:
        print(f"Error loading model from {model_path}: {e}")
        return None

# Load all models
models = {
    'Logistic Regression': load_model('../models/logistic_regression_model.pkl'),
    'Random Forest': load_model('../models/random_forest_model.pkl'),
    'XGBoost': load_model('../models/xgboost_model.pkl'),
    'LightGBM': load_model('../models/lightgbm_model.pkl')
}

# Print loaded models
for model_name, model in models.items():
    if model is not None:
        print(f"Loaded model: {model_name}")
    else:
        print(f"Failed to load model: {model_name}")
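
The ranking metrics defined later in this notebook map probability columns to labels through the `classes_` attribute of one model, so it is worth verifying that every loaded model orders its classes identically. The following is a minimal sketch of such a check; `classes_agree` is a hypothetical helper, and `SimpleNamespace` objects stand in for real fitted estimators.

```python
import numpy as np
from types import SimpleNamespace

def classes_agree(models):
    """Return True if every loaded model exposing `classes_` lists the same labels in the same order."""
    class_lists = [
        list(m.classes_)
        for m in models.values()
        if m is not None and hasattr(m, "classes_")
    ]
    # With zero or one model there is nothing to compare, so the check passes
    return all(c == class_lists[0] for c in class_lists[1:])

# Stand-ins for fitted estimators; None mimics a model that failed to load
toy_models = {
    "a": SimpleNamespace(classes_=np.array(["FR", "NDF", "US"])),
    "b": SimpleNamespace(classes_=np.array(["FR", "NDF", "US"])),
    "c": None,
}
same_order = classes_agree(toy_models)

toy_models["b"] = SimpleNamespace(classes_=np.array(["NDF", "FR", "US"]))
different_order = classes_agree(toy_models)

print(same_order, different_order)
```

If the check fails on the real `models` dictionary, each model's own `classes_` should be used when interpreting its `predict_proba` output.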

## Evaluation Functions

Define functions to evaluate and compare the performance of our models.

In [None]:
def evaluate_model(model, X, y, model_name):
    """Evaluate a model's performance on the given dataset"""
    if model is None:
        print(f"Model {model_name} is not available for evaluation")
        return {}
    
    # Make predictions
    y_pred = model.predict(X)
    
    # Calculate metrics
    metrics = {
        'accuracy': accuracy_score(y, y_pred),
        'f1_macro': f1_score(y, y_pred, average='macro'),
        'f1_weighted': f1_score(y, y_pred, average='weighted'),
        'confusion_matrix': confusion_matrix(y, y_pred),
        'classification_report': classification_report(y, y_pred, output_dict=True)
    }
    
    return metrics

def ndcg_at_k(y_true, y_pred_proba, k=5):
    """Calculate NDCG@k metric for a multiclass prediction"""
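    # Simplified NDCG@k: each sample has exactly one relevant class (the true label),
    # so the score is 1 / log2(rank + 1) if the label is in the top k, else 0.
    # Note: the label mapping below is read from the global `models` dictionary.
    # scikit-learn's `ndcg_score` and `top_k_accuracy_score` are library alternatives.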
    if len(y_pred_proba.shape) < 2:
        return 0
    
    # Get top k predictions for each sample
    top_k_indices = np.argsort(-y_pred_proba, axis=1)[:, :k]
    
    # Create a mapping for label encoder if needed
    if hasattr(models['Logistic Regression'], 'classes_'):
        class_mapping = {i: cls for i, cls in enumerate(models['Logistic Regression'].classes_)}
    else:
        # If no mapping is available, we'll use indices directly
        class_mapping = {i: i for i in range(y_pred_proba.shape[1])}
    
    # Calculate DCG and IDCG
    dcg = 0
    idcg = 1.0  # With a single relevant class per sample, the ideal DCG is 1 for any k
    
    for i, (true_label, pred_indices) in enumerate(zip(y_true, top_k_indices)):
        # Check if the true label is in the top k predictions
        if any(class_mapping[idx] == true_label for idx in pred_indices):
            rank = 1 + next(r for r, idx in enumerate(pred_indices) if class_mapping[idx] == true_label)
            dcg += 1.0 / np.log2(rank + 1)
    
    return dcg / (len(y_true) * idcg)

def top_k_accuracy(y_true, y_pred_proba, k=5):
    """Calculate Top-K accuracy for multiclass prediction"""
    if len(y_pred_proba.shape) < 2:
        return 0
    
    top_k_indices = np.argsort(-y_pred_proba, axis=1)[:, :k]
    
    if hasattr(models['Logistic Regression'], 'classes_'):
        class_mapping = {i: cls for i, cls in enumerate(models['Logistic Regression'].classes_)}
    else:
        class_mapping = {i: i for i in range(y_pred_proba.shape[1])}
    
    correct = 0
    for i, (true_label, pred_indices) in enumerate(zip(y_true, top_k_indices)):
        if any(class_mapping[idx] == true_label for idx in pred_indices):
            correct += 1
    
    return correct / len(y_true)
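
The two loops above can also be expressed without Python-level iteration. The sketch below is a vectorized variant under the same assumption (one relevant class per sample); unlike the functions above, it takes the class ordering as an explicit `classes` argument rather than reading it from a global. `rank_metrics` and the toy data are hypothetical and only illustrate the arithmetic.

```python
import numpy as np

def rank_metrics(y_true, y_pred_proba, classes, k=5):
    """Return NDCG@k and top-k accuracy, assuming one relevant class per sample."""
    classes = np.asarray(classes)
    y_true = np.asarray(y_true)
    y_pred_proba = np.asarray(y_pred_proba)

    order = np.argsort(-y_pred_proba, axis=1)   # column indices, most likely first
    ranked_labels = classes[order]              # same shape, labels instead of indices
    hits = ranked_labels == y_true[:, None]     # True where the ranked label is the true one
    ranks = hits.argmax(axis=1) + 1             # 1-based rank of the true label
    in_top_k = hits.any(axis=1) & (ranks <= k)  # unknown labels never count as hits

    gains = np.where(in_top_k, 1.0 / np.log2(ranks + 1), 0.0)
    return {"ndcg": float(gains.mean()), "top_k_accuracy": float(in_top_k.mean())}

# Toy example: three samples, four classes, k=3 (probabilities are made up)
toy_classes = ["FR", "NDF", "US", "other"]
toy_proba = np.array([
    [0.50, 0.30, 0.15, 0.05],  # true label FR is ranked 1st
    [0.20, 0.50, 0.25, 0.05],  # true label US is ranked 2nd
    [0.40, 0.30, 0.20, 0.10],  # true label other is ranked 4th, outside the top 3
])
toy_scores = rank_metrics(["FR", "US", "other"], toy_proba, toy_classes, k=3)
print(toy_scores)
```

The per-sample gains here are 1, 1/log2(3), and 0, so the NDCG@3 is their mean and the top-3 accuracy is 2/3.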

## Model Evaluation

Evaluate each model's performance on the validation dataset.

In [None]:
# Dictionary to store all evaluation results
evaluation_results = {}

# For each model, evaluate on validation set
for model_name, model in models.items():
    if model is not None:
        print(f"Evaluating {model_name}...")
        evaluation_results[model_name] = evaluate_model(model, X_val, y_val, model_name)
        
        # Print basic metrics
        print(f"  Accuracy: {evaluation_results[model_name]['accuracy']:.4f}")
        print(f"  F1 (macro): {evaluation_results[model_name]['f1_macro']:.4f}")
        print(f"  F1 (weighted): {evaluation_results[model_name]['f1_weighted']:.4f}")
        print("\n")
        
        # Calculate probability predictions for top-k metrics if the model supports predict_proba
        if hasattr(model, 'predict_proba'):
            y_pred_proba = model.predict_proba(X_val)
            top5_acc = top_k_accuracy(y_val, y_pred_proba, k=5)
            ndcg5 = ndcg_at_k(y_val, y_pred_proba, k=5)
            
            evaluation_results[model_name]['top5_accuracy'] = top5_acc
            evaluation_results[model_name]['ndcg@5'] = ndcg5
            
            print(f"  Top-5 Accuracy: {top5_acc:.4f}")
            print(f"  NDCG@5: {ndcg5:.4f}")
            print("\n")
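
Macro and weighted F1 can differ sharply on imbalanced targets, because macro F1 averages the per-class scores equally while weighted F1 weights them by the number of true samples per class. The sketch below recomputes both from a confusion matrix to make the difference concrete; `f1_from_confusion` and the toy matrix are illustrative assumptions, not values from this project.

```python
import numpy as np

def f1_from_confusion(cm):
    """Return (per-class F1, macro F1, support-weighted F1) from a confusion matrix.

    Rows are true labels and columns are predicted labels, as in scikit-learn.
    """
    cm = np.asarray(cm, dtype=float)
    tp = np.diag(cm)
    support = cm.sum(axis=1)     # true samples per class: TP + FN
    predicted = cm.sum(axis=0)   # predicted samples per class: TP + FP
    denom = support + predicted  # equals 2*TP + FP + FN
    f1 = np.divide(2 * tp, denom, out=np.zeros_like(tp), where=denom > 0)
    return f1, float(f1.mean()), float(np.average(f1, weights=support))

# Toy two-class matrix with a dominant first class (counts are made up)
toy_cm = [[90, 10],
          [8, 2]]
per_class_f1, macro_f1, weighted_f1 = f1_from_confusion(toy_cm)
print(per_class_f1, macro_f1, weighted_f1)
```

In this toy case the minority class drags macro F1 down to about 0.55, while weighted F1 stays near 0.84 because the majority class dominates the average.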

## Confusion Matrix Visualization

Visualize the confusion matrix for each model to better understand where they make mistakes.

In [None]:
def plot_confusion_matrix(cm, classes, model_name, normalize=False, figsize=(10, 8)):
    """Plot confusion matrix for a given model"""
    plt.figure(figsize=figsize)
    
    if normalize:
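        # Divide each row by its total; rows with no true samples are left as zeros
        row_sums = cm.sum(axis=1, keepdims=True)
        cm = cm.astype('float') / np.where(row_sums == 0, 1, row_sums)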
        title = f'Normalized Confusion Matrix - {model_name}'
    else:
        title = f'Confusion Matrix - {model_name}'
    
    sns.heatmap(cm, annot=True, fmt='.2f' if normalize else 'd', 
                cmap='Blues', xticklabels=classes, yticklabels=classes)
    
    plt.title(title, fontsize=16)
    plt.ylabel('True Label', fontsize=14)
    plt.xlabel('Predicted Label', fontsize=14)
    plt.tight_layout()
    plt.show()

# Plot confusion matrices for each model
for model_name, results in evaluation_results.items():
    if 'confusion_matrix' in results:
        # Get unique classes from validation set
        classes = np.unique(y_val)
        
        # Plot non-normalized confusion matrix
        plot_confusion_matrix(results['confusion_matrix'], classes, model_name, normalize=False)
        
        # Plot normalized confusion matrix
        plot_confusion_matrix(results['confusion_matrix'], classes, model_name, normalize=True)
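
In the normalized plot, each row is divided by its total, so the diagonal entries are the per-class recall. The short sketch below shows that calculation on a toy matrix, including a class with no true samples; `normalize_rows` is a hypothetical helper and the counts are made up.

```python
import numpy as np

def normalize_rows(cm):
    """Divide each row of a confusion matrix by its sum; all-zero rows stay zero."""
    cm = np.asarray(cm, dtype=float)
    row_sums = cm.sum(axis=1, keepdims=True)
    return np.divide(cm, row_sums, out=np.zeros_like(cm), where=row_sums > 0)

# Third class has no true samples, which would give NaN with a plain division
toy_cm = np.array([
    [90, 10, 0],
    [8, 2, 0],
    [0, 0, 0],
])
normalized = normalize_rows(toy_cm)
per_class_recall = np.diag(normalized)
print(per_class_recall)
```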

## Comparative Analysis

Compare all models side by side to identify the best performer.

In [None]:
# Create a DataFrame for comparison
comparison_data = []

for model_name, results in evaluation_results.items():
    row = {
        'Model': model_name,
        'Accuracy': results.get('accuracy', 0),
        'F1 (macro)': results.get('f1_macro', 0),
        'F1 (weighted)': results.get('f1_weighted', 0),
        'Top-5 Accuracy': results.get('top5_accuracy', 0),
        'NDCG@5': results.get('ndcg@5', 0)
    }
    comparison_data.append(row)

comparison_df = pd.DataFrame(comparison_data)
comparison_df.set_index('Model', inplace=True)

# Display the comparison table
print("Model Performance Comparison:")
comparison_df

In [None]:
# Bar plot for model comparison
metrics = ['Accuracy', 'F1 (macro)', 'F1 (weighted)', 'Top-5 Accuracy', 'NDCG@5']

plt.figure(figsize=(14, 10))

for i, metric in enumerate(metrics):
    if metric in comparison_df.columns:
        plt.subplot(len(metrics), 1, i+1)
        sns.barplot(x=comparison_df.index, y=comparison_df[metric])
        plt.title(f'Model Comparison - {metric}')
        plt.ylabel(metric)
        plt.xticks(rotation=45)
        plt.grid(axis='y', linestyle='--', alpha=0.7)

plt.tight_layout()
plt.show()

## Select Best Model

Based on the evaluation metrics, select the best performing model.

In [None]:
# Select the best model based on F1 (macro) score
best_model_name = comparison_df['F1 (macro)'].idxmax()
best_model = models[best_model_name]

print(f"Best Model based on F1 (macro): {best_model_name}")
print(f"F1 (macro) score: {comparison_df.loc[best_model_name, 'F1 (macro)']:.4f}")

# Check if Top-5 Accuracy might be a better metric for this task
if 'Top-5 Accuracy' in comparison_df.columns:
    best_model_top5 = comparison_df['Top-5 Accuracy'].idxmax()
    print(f"\nBest Model based on Top-5 Accuracy: {best_model_top5}")
    print(f"Top-5 Accuracy: {comparison_df.loc[best_model_top5, 'Top-5 Accuracy']:.4f}")

# For Airbnb destination prediction, Top-5 Accuracy might be more relevant
# Let's decide on our final model
final_model_name = best_model_top5 if 'Top-5 Accuracy' in comparison_df.columns else best_model_name
final_model = models[final_model_name]

print(f"\nSelected Final Model: {final_model_name}")
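
When several metrics disagree, one option is to rank the models by an explicit priority list, with later metrics acting as tie-breakers. The sketch below assumes NDCG@5 comes first (to my knowledge it is the official metric of the Kaggle competition), followed by Top-5 Accuracy and macro F1. `pick_model` is a hypothetical helper, and the scores in `toy_comparison` are invented purely to show the tie-breaking.

```python
import pandas as pd

def pick_model(comparison, priority):
    """Return the index label of the best row, sorting by `priority` columns in order."""
    ranked = comparison.sort_values(list(priority), ascending=False)
    return ranked.index[0]

# Made-up scores, not results from this project
toy_comparison = pd.DataFrame(
    {
        "NDCG@5": [0.80, 0.82, 0.82],
        "Top-5 Accuracy": [0.93, 0.95, 0.94],
        "F1 (macro)": [0.10, 0.12, 0.15],
    },
    index=["model_a", "model_b", "model_c"],
)

# model_b and model_c tie on NDCG@5, so Top-5 Accuracy decides
chosen = pick_model(toy_comparison, ["NDCG@5", "Top-5 Accuracy", "F1 (macro)"])
chosen_by_f1 = pick_model(toy_comparison, ["F1 (macro)"])
print(chosen, chosen_by_f1)
```

The same function could be applied to `comparison_df`, provided the listed columns were actually computed for every model rather than defaulted to 0.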

## Feature Importance Analysis

If the selected model supports feature importance, let's analyze which features contribute most to the predictions.

In [None]:
def plot_feature_importance(model, feature_names, model_name, top_n=20):
    """Plot feature importance for tree-based models"""
    if hasattr(model, 'feature_importances_'):
        # Get feature importance
        importances = model.feature_importances_
        
        # Create a DataFrame for easier manipulation
        feature_importance_df = pd.DataFrame({
            'Feature': feature_names,
            'Importance': importances
        }).sort_values('Importance', ascending=False)
        
        # Plot top N features
        plt.figure(figsize=(12, 8))
        sns.barplot(x='Importance', y='Feature', data=feature_importance_df.head(top_n))
        plt.title(f'Top {top_n} Feature Importance - {model_name}', fontsize=16)
        plt.tight_layout()
        plt.show()
        
        return feature_importance_df
    elif hasattr(model, 'coef_'):
        # For linear models like Logistic Regression
        coefs = model.coef_
        
        # For multiclass, take the average of absolute coefficients across classes
        if len(coefs.shape) > 1:
            importances = np.mean(np.abs(coefs), axis=0)
        else:
            importances = np.abs(coefs)
        
        # Create a DataFrame for easier manipulation
        feature_importance_df = pd.DataFrame({
            'Feature': feature_names,
            'Importance': importances
        }).sort_values('Importance', ascending=False)
        
        # Plot top N features
        plt.figure(figsize=(12, 8))
        sns.barplot(x='Importance', y='Feature', data=feature_importance_df.head(top_n))
        plt.title(f'Top {top_n} Feature Importance - {model_name}', fontsize=16)
        plt.tight_layout()
        plt.show()
        
        return feature_importance_df
    else:
        print(f"Model {model_name} doesn't provide feature importance information")
        return None

# Plot feature importance for the final model
feature_names = X_train.columns
importance_df = plot_feature_importance(final_model, feature_names, final_model_name)

if importance_df is not None:
    print("Top 10 Most Important Features:")
    # A bare expression inside an `if` block is not displayed by the notebook, so print it
    print(importance_df.head(10))
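
For linear models, the function above averages the absolute coefficients across classes. The sketch below reproduces that aggregation on a toy coefficient matrix and shows why the absolute value matters: a feature whose coefficients cancel across classes would otherwise appear unimportant. `coef_importance` and the coefficient values are illustrative assumptions, and the ranking is only meaningful when features are on comparable scales.

```python
import numpy as np
import pandas as pd

def coef_importance(coef, feature_names):
    """Rank features by the mean absolute coefficient across classes."""
    coef = np.atleast_2d(np.asarray(coef, dtype=float))  # binary models may give a 1-D array
    importance = np.abs(coef).mean(axis=0)
    series = pd.Series(importance, index=list(feature_names), name="Importance")
    return series.sort_values(ascending=False)

# Rows are classes, columns are features (values are made up)
toy_coef = np.array([
    [2.0, -0.5, 0.0],
    [-2.0, 0.5, 0.1],
    [0.0, 1.0, 0.2],
])
ranking = coef_importance(toy_coef, ["age", "signup_flow", "gender_FEMALE"])
signed_mean_first_feature = toy_coef.mean(axis=0)[0]  # cancels to 0 without np.abs
print(ranking)
```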

## Save the Best Model

Save the best performing model for future use.

In [None]:
# Save the best model
now = datetime.now().strftime('%Y%m%d_%H%M%S')
best_model_path = f'../models/best_model_{now}.pkl'

joblib.dump(final_model, best_model_path)
print(f"Best model saved to {best_model_path}")

# Also save model metadata
model_metadata = {
    'model_name': final_model_name,
    'accuracy': comparison_df.loc[final_model_name, 'Accuracy'],
    'f1_macro': comparison_df.loc[final_model_name, 'F1 (macro)'],
    'f1_weighted': comparison_df.loc[final_model_name, 'F1 (weighted)'],
    'timestamp': now,
    'features': list(X_train.columns)
}

if 'Top-5 Accuracy' in comparison_df.columns:
    model_metadata['top5_accuracy'] = comparison_df.loc[final_model_name, 'Top-5 Accuracy']

# Save metadata
import json

with open(f'../models/best_model_metadata_{now}.json', 'w') as f:
    # `default=float` converts any NumPy scalar that the json module cannot serialize
    json.dump(model_metadata, f, indent=4, default=float)

print("Model metadata saved successfully")
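
The saved feature list is useful when the model is reloaded later: comparing it with the columns of the incoming data catches missing or reordered features before prediction. The sketch below shows one way to do that check with the standard library; `features_match` is a hypothetical helper, and a temporary file stands in for the real metadata file.

```python
import json
import os
import tempfile

def features_match(metadata_path, columns):
    """Return True if the metadata's feature list equals `columns`, including order."""
    with open(metadata_path) as f:
        metadata = json.load(f)
    return metadata["features"] == list(columns)

with tempfile.TemporaryDirectory() as tmp_dir:
    # Toy metadata file with the same 'features' key used above
    toy_path = os.path.join(tmp_dir, "toy_metadata.json")
    with open(toy_path, "w") as f:
        json.dump({"model_name": "toy", "features": ["age", "signup_flow"]}, f)

    same = features_match(toy_path, ["age", "signup_flow"])
    reordered = features_match(toy_path, ["signup_flow", "age"])

print(same, reordered)
```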

## Generate Predictions for Test Set

Use the best model to make predictions on the test set and prepare the submission file.

In [None]:
# Load the original test data to get the user IDs
test_users = pd.read_csv('../test_users.csv')
test_user_ids = test_users['id']

# Make predictions with the best model
if hasattr(final_model, 'predict_proba'):
    # Get probability predictions
    y_test_proba = final_model.predict_proba(X_test)
    
    # Get the top 5 predictions for each user
    n_classes = y_test_proba.shape[1]
    
    # Get the class names (country destinations)
    if hasattr(final_model, 'classes_'):
        class_names = final_model.classes_
    else:
        # If class names are not available, use indices
        print("Warning: Class names not found. Using numerical indices instead.")
        class_names = np.arange(n_classes)
    
    # Create a list to store the submission rows
    submissions = []
    
    # For each user, get the top 5 predicted destinations.
    # The competition's submission format expects one row per (id, country) pair,
    # ordered from most to least likely, rather than a space-separated string.
    for i, user_id in enumerate(test_user_ids):
        probas = y_test_proba[i]
        top5_indices = np.argsort(-probas)[:5]  # Descending order
        
        # Add one row per predicted destination
        for idx in top5_indices:
            submissions.append({
                'id': user_id,
                'country': class_names[idx]
            })
    
    # Create submission DataFrame
    submission_df = pd.DataFrame(submissions)
else:
    # If model doesn't support predict_proba, just use predict
    y_test_pred = final_model.predict(X_test)
    
    # Create submission DataFrame
    submission_df = pd.DataFrame({
        'id': test_user_ids,
        'country': y_test_pred
    })

# Display first few rows of the submission file
print("Submission file preview:")
submission_df.head()
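
On a large test set, the per-user loop can be replaced by a single array operation. The sketch below extracts the top-k labels for all rows at once; `top_k_labels` is a hypothetical helper and the probabilities are made up. The resulting label matrix can then be reshaped into whatever layout the submission file needs.

```python
import numpy as np

def top_k_labels(proba, class_names, k=5):
    """Return an (n_samples, k) array of labels, most likely first."""
    proba = np.asarray(proba)
    class_names = np.asarray(class_names)
    k = min(k, proba.shape[1])                   # cannot return more labels than classes
    top_idx = np.argsort(-proba, axis=1)[:, :k]  # indices of the k largest probabilities
    return class_names[top_idx]

toy_proba = np.array([
    [0.1, 0.7, 0.2],
    [0.5, 0.2, 0.3],
])
toy_top2 = top_k_labels(toy_proba, ["FR", "NDF", "US"], k=2)
print(toy_top2)
```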

In [None]:
# Save the submission file
submission_path = f'../outputs/submission_{now}.csv'
os.makedirs('../outputs', exist_ok=True)
submission_df.to_csv(submission_path, index=False)

print(f"Submission file saved to {submission_path}")

## Summary and Conclusions

This section summarizes the findings of the model evaluation and outlines possible next steps.

### Evaluation Summary

In this notebook, we've evaluated several machine learning models for predicting Airbnb new user booking destinations:

1. **Models Evaluated**:
   - Logistic Regression
   - Random Forest
   - XGBoost
   - LightGBM

2. **Evaluation Metrics**:
   - Accuracy
   - F1-Score (Macro and Weighted)
   - Top-5 Accuracy
   - NDCG@5

3. **Best Performing Model**:
   - We selected our final model based primarily on Top-5 Accuracy, as this is most relevant for the Airbnb prediction task where suggesting a set of potential destinations is more valuable than predicting a single destination.

4. **Feature Importance**:
   - We analyzed which features had the most predictive power in our model, helping us understand what factors influence new users' booking destinations.

5. **Submission File**:
   - We generated predictions for the test set and prepared a submission file in the required format.

### Next Steps

If we wanted to improve our model further, we could consider:

1. **Feature Engineering**: Develop more sophisticated features based on the sessions data.
2. **Hyperparameter Tuning**: Conduct more extensive hyperparameter optimization.
3. **Ensemble Methods**: Create an ensemble of our best models to further improve performance.
4. **Deep Learning**: Experiment with neural networks for this classification task.

The code and model for this project have been saved and can be used for making predictions on new data or deployed as part of a user recommendation system.