# Synthetic Route Planning Data Generator

**Description:**  
This notebook programmatically generates a **synthetic warehouse route-planning dataset** for the logistics AI project. It creates simulated delivery points for multiple warehouses, assigning random customer locations, shipment demands, vehicle capacities, and contextual variables such as delivery priority and traffic conditions.

**Goal:**  
To produce a realistic dataset, **Warehouse_Route_Planning.csv**, for use in later project phases for **route optimization modeling** (e.g., with OR-Tools or Pyomo).  
This dataset complements the real **Product_Demand_Forecasting** data, allowing you to build a full end-to-end pipeline that forecasts shipment volumes and then optimizes delivery routes.

In [1]:
# ====================================================================
# Connect Google Drive
# ====================================================================
from google.colab import drive
drive.mount('/content/drive')

# Set the working directory
%cd /content/drive/MyDrive/ai_logistics

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).
/content/drive/MyDrive/ai_logistics


In [2]:
# =====================================================================
# Imports and Setup
# =====================================================================
import sys, os
sys.path.append('/content/drive/MyDrive/ai_logistics')

# Make sure next import line includes all functions you need!
from utils.utils_data import (
    load_real_dataset,
    get_unique_warehouses,
    generate_city_coordinates,
    generate_route_data,
    save_dataset,
)

import pandas as pd
import numpy as np
from pathlib import Path
# Set random seed for reproducibility
np.random.seed(42)
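
The global `np.random.seed(42)` call makes results repeatable only when the cells run in the same order each time, because every random draw advances the shared state. As a minimal sketch of an alternative, assuming the helpers could accept their own generator (the notebook does not show that they do), a local `numpy.random.Generator` keeps each draw independent of cell order. The `draw_demands` function and its 50 to 250 demand range are hypothetical and used only for illustration.

```python
import numpy as np


def draw_demands(seed, n=4):
    """Hypothetical helper: draw n integer demands from a locally seeded generator."""
    rng = np.random.default_rng(seed)  # local generator, no global state involved
    # Assumed demand range of 50..250 units (upper bound is exclusive in integers()).
    return rng.integers(50, 251, size=n)


first = draw_demands(42)
second = draw_demands(42)

# The same seed yields the same sequence, regardless of what ran in between.
print(np.array_equal(first, second))  # True
```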

In [3]:
# =====================================================================
# Paths
# =====================================================================
DATA_DIR = Path("data")
DATA_DIR.mkdir(exist_ok=True)

# The working directory was already set with:
# %cd /content/drive/MyDrive/ai_logistics
# so all paths below are relative to that location
REAL_DATA_PATH = DATA_DIR / "Historical_Product_Demand.csv"
OUTPUT_PATH = DATA_DIR / "Warehouse_Route_Planning.csv"

## Generate Synthetic Route Optimization Dataset

In this phase, we generate a synthetic dataset to complement the real *Historical Product Demand* data.  
The goal is to simulate realistic delivery points, customer demand, and traffic conditions around four warehouses in Spain.

Each step in the main execution cell corresponds directly to a function in `utils/utils_data.py`, which keeps the workflow modular and easy to reuse.  
The workflow includes:
1. Loading and validating the real dataset.  
2. Extracting unique warehouse identifiers.  
3. Generating coordinates for each warehouse.  
4. Creating synthetic customer delivery points with demand and traffic features.  
5. Saving the generated dataset for future analysis.
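
Step 4 can be sketched as follows. This is an illustrative stand-in, not the implementation in `utils_data.py`, which is not shown in this notebook. It assumes that coordinates arrive as a list of `(latitude, longitude)` tuples, that customers are scattered uniformly within about 0.05 degrees of each warehouse, and that demand falls between 50 and 250 units. The function name `sketch_route_data` and all parameter defaults are hypothetical. Only the columns visible in the notebook output are produced; priority and traffic columns are left out because their names are not shown.

```python
import numpy as np
import pandas as pd


def sketch_route_data(warehouses, coordinates, customers_per_warehouse=4,
                      vehicle_capacity=500, seed=42):
    """Hypothetical stand-in for generate_route_data (assumed behavior only)."""
    rng = np.random.default_rng(seed)
    rows = []
    for wh, (lat, lon) in zip(warehouses, coordinates):
        for i in range(1, customers_per_warehouse + 1):
            rows.append({
                "Warehouse": wh,
                "Customer_ID": f"{wh}_CUST_{i}",
                # Assumed jitter: uniform offset of up to 0.05 degrees per axis.
                "Latitude": round(lat + rng.uniform(-0.05, 0.05), 4),
                "Longitude": round(lon + rng.uniform(-0.05, 0.05), 4),
                # Assumed demand range: 50..250 units per customer.
                "Demand": int(rng.integers(50, 251)),
                "Vehicle_Capacity": vehicle_capacity,
            })
    return pd.DataFrame(rows)


# Example with two warehouses placed at Madrid and Barcelona city centers.
demo = sketch_route_data(
    ["Whse_A", "Whse_C"],
    [(40.4168, -3.7038), (41.3851, 2.1734)],
)
print(demo.shape)  # (8, 6)
```

Building the rows as a list of dictionaries and converting once at the end avoids repeatedly growing a DataFrame, which matters little at this size but scales better if `customers_per_warehouse` is increased.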


In [4]:
# =====================================================================
# Main Execution
# =====================================================================
if __name__ == "__main__":
    # ---------------------------------------------------------
    # Step 1. Load the real dataset
    # (uses: load_real_dataset)
    # ---------------------------------------------------------
    df_real = load_real_dataset(REAL_DATA_PATH)

    # ---------------------------------------------------------
    # Step 2. Extract unique warehouse codes
    # (uses: get_unique_warehouses)
    # ---------------------------------------------------------
    warehouse_list = get_unique_warehouses(df_real)

    # ---------------------------------------------------------
    # Step 3. Generate coordinates for each warehouse
    # (uses: generate_city_coordinates)
    # ---------------------------------------------------------
    coordinates = generate_city_coordinates(len(warehouse_list))

    # ---------------------------------------------------------
    # Step 4. Generate synthetic route data
    # (uses: generate_route_data)
    # ---------------------------------------------------------
    df_routes = generate_route_data(warehouse_list, coordinates, customers_per_warehouse=4)

    # ---------------------------------------------------------
    # Step 5. Save the generated dataset
    # (uses: save_dataset)
    # ---------------------------------------------------------
    save_dataset(df_routes, OUTPUT_PATH)

    print("\nSynthetic route dataset generation complete!")

Loaded dataset with 1,048,575 rows and 4 unique warehouses.
Detected 4 warehouses: ['Whse_A', 'Whse_C', 'Whse_J', 'Whse_S']
Generated 16 route records for 4 warehouses.
Saved dataset to /content/drive/MyDrive/ai_logistics/data/Warehouse_Route_Planning.csv
  Warehouse    Customer_ID  Latitude  Longitude  Demand  Vehicle_Capacity  \
0    Whse_A  Whse_A_CUST_1   40.4075    -3.6549     156               500   
1    Whse_A  Whse_A_CUST_2   40.4146    -3.7400     137               500   
2    Whse_A  Whse_A_CUST_3   40.4351    -3.7444     137               500   
3    Whse_A  Whse_A_CUST_4   40.3882    -3.7317     107               500   
4    Whse_C  Whse_C_CUST_1   41.3591     2.1812     219               500   
5    Whse_C  Whse_C_CUST_2   41.3533     2.1291     239               500   
6    Whse_C  Whse_C_CUST_3   41.3892     2.1246     100               500   
7    Whse_C  Whse_C_CUST_4   41.3313     2.2142      63               500   
8    Whse_J  Whse_J_CUST_1   37.3398    -5.9716    