# Cart Super Add-On (CSAO) Rail Recommendation System

This notebook implements a complete end-to-end ML system for recommending add-ons in a food delivery platform.

**To run in Google Colab:**
1. Upload this notebook to Google Colab
2. Upload the `src` and `data` folders to the Colab file system
3. Run the cells in order, from top to bottom

In [None]:
# Install required packages
!pip install -q pandas numpy scikit-learn xgboost matplotlib seaborn

In [1]:
import sys
import os

# Check if running in Google Colab
IN_COLAB = 'google.colab' in sys.modules

if IN_COLAB:
    # In Colab, we need to upload files or use from google.colab import files
    from google.colab import files
    print("Running in Google Colab!")
    print("Please upload the 'src' and 'data' folders using the file upload button")
    
    # Set paths for Colab
    src_dir = 'src'
    data_dir = 'data'
else:
    # Local execution: notebooks do not define __file__, so start from the working directory
    notebook_dir = os.getcwd()
    project_root = os.path.dirname(notebook_dir)
    src_dir = os.path.join(project_root, 'src')
    data_dir = os.path.join(project_root, 'data')

sys.path.insert(0, src_dir)

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split

# Custom modules
from features import create_features
from models import train_and_evaluate
from cold_start import handle_cold_start
from business_impact import simulate_business_impact
from ab_testing import simulate_ab_test

# Set style
sns.set_style('whitegrid')
plt.rcParams['figure.figsize'] = (10, 6)

print(f"Source directory: {src_dir}")
print(f"Data directory: {data_dir}")

## 1. Data Generation

In [2]:
# Generate the data files if they do not exist yet
import subprocess

orders_path = os.path.join(data_dir, 'orders.csv')
if not os.path.exists(orders_path):
    print("Generating synthetic data...")
    # Run with the notebook's own interpreter and fail loudly if generation fails
    subprocess.run([sys.executable, os.path.join(data_dir, 'generate_data.py')], check=True)
else:
    print("Data files already exist.")

# Load data
orders_df = pd.read_csv(orders_path)
users_df = pd.read_csv(os.path.join(data_dir, 'users.csv'))
restaurants_df = pd.read_csv(os.path.join(data_dir, 'restaurants.csv'))
items_df = pd.read_csv(os.path.join(data_dir, 'items.csv'))

print(f"Orders shape: {orders_df.shape}")
print(f"Users shape: {users_df.shape}")
print(f"Restaurants shape: {restaurants_df.shape}")
print(f"Items shape: {items_df.shape}")

In [3]:
# Data exploration
orders_df.head()

In [4]:
# Target distribution
sns.countplot(x='addon_accepted', data=orders_df)
plt.title('Add-on Acceptance Distribution')
plt.show()

## 2. Feature Engineering

In [None]:
# Create features
orders_df = create_features(orders_df)
orders_df.head()

## 3. Model Training and Evaluation

In [None]:
# Train and evaluate models
results, lr_model, gb_model = train_and_evaluate(orders_df)

print("Model Performance:")
for model_name, metrics in results.items():
    print(
        f"{model_name}: AUC={metrics['auc']:.3f}, "
        f"Precision={metrics['precision']:.3f}, "
        f"Recall={metrics['recall']:.3f}, "
        f"Precision@5={metrics['precision_at_5']:.3f}"
    )
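
The `precision_at_5` value is computed inside `train_and_evaluate`, whose implementation is not shown in this notebook. As a rough illustration only, one common definition of Precision@k groups candidates (for example by order), keeps the k highest-scored candidates in each group, and averages the fraction that were accepted. The function name, the tuple layout, and the toy rows below are assumptions made for this sketch, not the project's code.

```python
from collections import defaultdict

def precision_at_k(rows, k=5):
    """rows: iterable of (group_id, score, label) with label in {0, 1}.

    Returns the mean, over groups, of the share of positives among the
    k highest-scored rows. Groups with fewer than k rows are divided by
    their actual size (an assumption of this sketch).
    """
    groups = defaultdict(list)
    for group_id, score, label in rows:
        groups[group_id].append((score, label))

    per_group = []
    for candidates in groups.values():
        top_k = sorted(candidates, key=lambda c: c[0], reverse=True)[:k]
        per_group.append(sum(label for _, label in top_k) / len(top_k))
    return sum(per_group) / len(per_group)

# Toy data: two orders with three candidate add-ons each
toy_rows = [
    (1, 0.9, 1), (1, 0.2, 0), (1, 0.6, 0),
    (2, 0.8, 0), (2, 0.7, 1), (2, 0.1, 1),
]
p_at_2 = precision_at_k(toy_rows, k=2)
```

In the toy data each order has exactly one accepted add-on among its top two candidates, so `p_at_2` averages two values of 0.5.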

## 4. Cold Start Handling

In [None]:
# Example cold start
user_history = orders_df['user_id'].unique()
cold_start_prob = handle_cold_start(9999, 201, 'New York', 'Italian', 12, 'Premium', 'Main', orders_df, user_history)
print(f"Cold start probability: {cold_start_prob}")
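
How `handle_cold_start` arrives at its probability is defined in `src/cold_start.py` and is not shown here. A common pattern for cold start, sketched below under that caveat, is a hierarchical fallback: use the user's own acceptance rate when there is enough history, otherwise a segment-level rate (for example city and cuisine), otherwise the global rate. Every name, the `min_orders` threshold, and the counts in this example are hypothetical.

```python
def fallback_acceptance_rate(user_id, segment, user_stats, segment_stats,
                             global_rate, min_orders=5):
    """Return an acceptance-rate estimate, backing off when data is sparse.

    user_stats and segment_stats map a key to (accepted_count, total_count).
    """
    accepted, total = user_stats.get(user_id, (0, 0))
    if total >= min_orders:
        return accepted / total          # enough personal history

    accepted, total = segment_stats.get(segment, (0, 0))
    if total >= min_orders:
        return accepted / total          # back off to the segment

    return global_rate                   # last resort: overall rate

# Made-up counts for illustration
user_stats = {101: (6, 10)}
segment_stats = {("New York", "Italian"): (30, 100)}

known_user = fallback_acceptance_rate(101, ("New York", "Italian"),
                                      user_stats, segment_stats, 0.2)
new_user = fallback_acceptance_rate(9999, ("New York", "Italian"),
                                    user_stats, segment_stats, 0.2)
new_segment = fallback_acceptance_rate(9999, ("Austin", "Thai"),
                                       user_stats, segment_stats, 0.2)
```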

## 5. Business Impact Simulation

In [None]:
# Simulate impact
baseline_auc = results['baseline']['auc']
model_auc = results['gradient_boosting']['auc']
impact = simulate_business_impact(baseline_auc, model_auc)

print("Business Impact:")
for key, value in impact.items():
    print(f"{key}: {value}")

## 6. A/B Testing Simulation

In [5]:
# Simulate A/B test
ab_results = simulate_ab_test(baseline_auc, model_auc)

print("A/B Test Results:")
for key, value in ab_results.items():
    print(f"{key}: {value}")
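
The internals of `simulate_ab_test` are not shown in this notebook. For readers who want to check the significance of an acceptance-rate difference by hand, the sketch below uses a standard two-proportion z-test with a pooled standard error. The function name and the counts are made up for illustration and are not results from this project.

```python
import math

def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for a difference in conversion rates (B minus A)."""
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal: 2 * (1 - Phi(|z|))
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# Hypothetical counts: control vs. treatment add-on acceptances
z_stat, p_val = two_proportion_z_test(conv_a=1000, n_a=10000,
                                      conv_b=1100, n_b=10000)
print(f"z = {z_stat:.2f}, p = {p_val:.4f}")
```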

## 7. Production Architecture Overview

### System Design:
- **Cart Event**: User adds item to cart → triggers recommendation request.
- **Feature Store**: Retrieve real-time features (user history, restaurant data, time, cart context).
- **Model API**: Serve predictions from trained models.
- **Ranking Engine**: Rank add-on suggestions based on scores.
- **API Response**: Return top add-ons to app.
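
The request flow listed above can be sketched as a single function that chains the stages. This is a minimal illustration, not the production design: `recommend_addons`, the toy feature lookup, the toy scoring rule, and the menu items are all hypothetical stand-ins for the feature store and the model API.

```python
def recommend_addons(cart_event, get_features, score, candidates, top_k=3):
    """Cart event -> features -> scores -> ranked top-k add-ons."""
    features = get_features(cart_event)                       # Feature Store
    scored = [(item, score(features, item)) for item in candidates]  # Model API
    scored.sort(key=lambda pair: pair[1], reverse=True)       # Ranking Engine
    return [item for item, _ in scored[:top_k]]               # API Response

def toy_features(cart_event):
    # Stand-in for a feature store lookup
    return {"hour": cart_event["hour"], "cart_size": len(cart_event["items"])}

def toy_score(features, item):
    # Stand-in for a trained model: made-up base scores plus an evening
    # boost for dessert
    base = {"garlic bread": 0.6, "cola": 0.5, "tiramisu": 0.3}[item]
    boost = 0.4 if item == "tiramisu" and features["hour"] >= 19 else 0.0
    return base + boost

event = {"user_id": 42, "items": ["margherita pizza"], "hour": 20}
top = recommend_addons(event, toy_features, toy_score,
                       ["cola", "tiramisu", "garlic bread"], top_k=2)
```

Passing the feature lookup and the scorer as arguments keeps the ranking logic testable without a live feature store or model server.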

### Inference < 300ms:
- Use optimized models (e.g., export to ONNX and serve with ONNX Runtime for fast inference).
- Pre-compute features where possible.
- Async processing for non-critical parts.
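
One way to enforce a latency target like this is to bound the model call with a timeout and fall back to a pre-computed list when the budget is exceeded. The sketch below uses `asyncio.wait_for`. The helper name, the fallback list, and the simulated model coroutines are assumptions for illustration, and the sleeps only imitate model latency.

```python
import asyncio

POPULAR_ADDONS = ["cola", "garlic bread"]  # hypothetical pre-computed fallback

async def with_latency_budget(make_call, fallback, budget_s=0.3):
    """Await make_call(); return fallback if it takes longer than budget_s."""
    try:
        return await asyncio.wait_for(make_call(), timeout=budget_s)
    except asyncio.TimeoutError:
        return fallback

async def fast_model():
    await asyncio.sleep(0.01)   # simulated quick inference
    return ["tiramisu", "cola"]

async def slow_model():
    await asyncio.sleep(1.0)    # simulated overloaded model server
    return ["never returned in time"]

async def demo():
    fast = await with_latency_budget(fast_model, POPULAR_ADDONS, budget_s=0.2)
    slow = await with_latency_budget(slow_model, POPULAR_ADDONS, budget_s=0.05)
    return fast, slow

fast_result, slow_result = asyncio.run(demo())
```

Inside a notebook an event loop is already running, so `await demo()` in a cell replaces the `asyncio.run(demo())` call.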

### Scalability:
- Kubernetes for container orchestration.
- Load balancers.
- Horizontal scaling.

### Caching:
- Redis for user features, popular add-ons.
- TTL-based eviction.
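
To make the TTL idea concrete, here is a minimal in-process cache that expires entries after a fixed lifetime. It is a stand-in for illustration only: a real deployment would rely on Redis key expiry rather than this class, and the class name, key format, and 60-second TTL are assumptions. The clock is injectable so expiry can be demonstrated without waiting.

```python
import time

class TTLCache:
    """Tiny dictionary-backed cache whose entries expire after ttl_seconds."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._store = {}

    def set(self, key, value):
        self._store[key] = (value, self._clock() + self._ttl)

    def get(self, key, default=None):
        entry = self._store.get(key)
        if entry is None:
            return default
        value, expires_at = entry
        if self._clock() >= expires_at:
            del self._store[key]      # lazy eviction on read
            return default
        return value

# Demonstrate expiry with a fake clock instead of sleeping
now = [0.0]
cache = TTLCache(ttl_seconds=60, clock=lambda: now[0])
cache.set("user:42:features", {"avg_order_value": 18.5})

hit = cache.get("user:42:features")
now[0] = 61.0                          # advance the fake clock past the TTL
miss = cache.get("user:42:features")
```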

### Retraining:
- Batch retraining weekly with new data.
- A/B test new models before deployment.
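
When a retrained model is A/B tested before rollout, users need a stable assignment to control or treatment. A common approach, sketched here with hypothetical names and a made-up 10% treatment share, is to hash the user ID together with an experiment name, so that the same user always lands in the same bucket without storing any state.

```python
import hashlib

def assign_variant(user_id, experiment="csao_model_candidate", treatment_share=0.1):
    """Deterministically map a user to 'treatment' or 'control'."""
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode("utf-8")).hexdigest()
    # First 8 hex digits as a number in [0, 1)
    bucket = int(digest[:8], 16) / 0x100000000
    return "treatment" if bucket < treatment_share else "control"

variants = [assign_variant(user_id) for user_id in range(10000)]
observed_share = variants.count("treatment") / len(variants)
print(f"Share of users in treatment: {observed_share:.3f}")
```

Including the experiment name in the hash means a new experiment reshuffles users instead of reusing the previous split.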