# Use Case 1: "Give Me Routes Like This"

**Purpose:** Given a set of route features, find the most similar routes in our dataset.

**Model:** Nearest-neighbour search (scikit-learn `NearestNeighbors`) with k=5 and the cosine metric (93.9% improvement over baseline)

**Use Cases:**
- A cyclist wants routes similar to one they enjoyed
- Find alternatives in different locations with the same characteristics
- Discover routes with similar difficulty, distance, or terrain

## 1. Setup and Load Data

In [2]:
import pandas as pd
import numpy as np
from sklearn.preprocessing import StandardScaler, MinMaxScaler
from sklearn.compose import ColumnTransformer
from sklearn.neighbors import NearestNeighbors

In [3]:
df = pd.read_csv("/Users/eugeneleach/code/Eugle3/cycle_more/Notebooks/KNN Model/Data_Engineered.csv")

In [4]:
pd.set_option('display.max_columns', None)
print(f"Loaded {len(df)} routes")

Loaded 16878 routes


## 2. Prepare Features and Scaler

In [7]:
# Create feature matrix
X = df.drop(['id', 'name'], axis=1)

# Fit the scaler on the full dataset, using the same column groups as in training
scaler = ColumnTransformer(transformers=[
    ('standard', StandardScaler(), ['distance_m', 'duration_s', 'ascent_m', 'descent_m', 'Turn_Density', 'steps', 'turns']),
    ('minmax', MinMaxScaler(), ['Cycleway', 'on_road', 'off_road', 'Gravel_Tracks', 'Paved_Paths', 'Other', 'Unknown Surface', 'Paved_Road', 'Pedestrian', 'Unknown_Way', 'Cycle Track', 'Main Road', 'Steep Section', 'Moderate Section', 'Flat Section', 'Downhill Section', 'Steep Downhill Section']),
], remainder='passthrough')

X_scaled = scaler.fit_transform(X)
print(f"Features prepared: {X.shape[1]} features")

Features prepared: 24 features


In [8]:
X_scaled

array([[-0.16616222, -0.17159471, -0.0605682 , ...,  0.        ,
         0.0968    ,  0.1452    ],
       [-0.06858741, -0.07637643,  0.25097899, ...,  0.        ,
         0.2119    ,  0.0169    ],
       [ 0.82257908,  0.79343882,  5.83035516, ...,  0.        ,
         0.0925    ,  0.0435    ],
       ...,
       [-0.23618705, -0.23994896, -0.11723828, ...,  0.        ,
         0.        ,  0.        ],
       [-0.24164854, -0.24527526, -0.2037638 , ...,  0.        ,
         1.        ,  0.        ],
       [-0.23640794, -0.24016032, -0.2037638 , ...,  0.        ,
         1.        ,  0.        ]])

## 3. Train Optimized KNN Model

Using optimal parameters from grid search: k=5, cosine metric

In [9]:
# Train with optimal parameters
knn_optimal = NearestNeighbors(n_neighbors=5, metric='cosine')
knn_optimal.fit(X_scaled)

print("✅ Optimized KNN model trained!")
print(f"   k = {knn_optimal.n_neighbors}")
print(f"   metric = '{knn_optimal.metric}'")
print(f"   Training samples = {X_scaled.shape[0]}")

✅ Optimized KNN model trained!
   k = 5
   metric = 'cosine'
   Training samples = 16878
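
With `metric='cosine'`, the distances returned by `kneighbors` are cosine distances, i.e. 1 minus the cosine similarity of the two scaled feature vectors. The metric compares the direction of the vectors rather than their magnitude. The sketch below checks this on a toy matrix; `X_toy`, `knn_toy`, and `cosine_distance` are illustrative names and the values are not taken from the route dataset.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

# Toy feature matrix (illustrative values, not from the route dataset)
X_toy = np.array([
    [1.0, 0.0],
    [2.0, 0.0],   # same direction as row 0, twice the magnitude
    [0.0, 1.0],   # orthogonal to row 0
    [1.0, 1.0],   # 45 degrees from row 0
])

knn_toy = NearestNeighbors(n_neighbors=4, metric='cosine')
knn_toy.fit(X_toy)

# Query with row 0 and get the distance to every row
distances, indices = knn_toy.kneighbors(X_toy[[0]])

def cosine_distance(a, b):
    """Cosine distance: 1 - (a . b) / (|a| * |b|)."""
    return 1.0 - np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

# Recompute the same distances by hand, in the order the model returned them
manual = np.array([cosine_distance(X_toy[0], X_toy[i]) for i in indices[0]])
print(np.allclose(distances[0], manual))
```

Rows 0 and 1 point in the same direction, so their cosine distance is zero even though one vector is twice as long. This may help explain why recommended routes can differ noticeably in raw distance or ascent while still scoring as close matches.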


### 3.5 Save Model with joblib

In [10]:
import joblib
joblib.dump(knn_optimal, "model.pkl")

['model.pkl']
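
The saved file contains only the fitted `NearestNeighbors` object. New inputs also need the fitted scaler before they can be queried, so one option is to persist both together. The following is a minimal sketch on synthetic data, assuming a dictionary bundle; the file name `route_recommender.joblib` and the names `X_demo`, `scaler_demo`, `knn_demo`, and `bundle` are hypothetical.

```python
import os
import tempfile

import joblib
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.neighbors import NearestNeighbors
from sklearn.preprocessing import MinMaxScaler, StandardScaler

# Small synthetic stand-in for the route features (values are made up)
rng = np.random.default_rng(0)
X_demo = pd.DataFrame({
    'distance_m': rng.uniform(1000, 50000, 50),
    'ascent_m': rng.uniform(0, 1500, 50),
    'on_road': rng.uniform(0, 1, 50),
})

scaler_demo = ColumnTransformer(transformers=[
    ('standard', StandardScaler(), ['distance_m', 'ascent_m']),
    ('minmax', MinMaxScaler(), ['on_road']),
], remainder='passthrough')
knn_demo = NearestNeighbors(n_neighbors=5, metric='cosine')
knn_demo.fit(scaler_demo.fit_transform(X_demo))

# Bundle everything needed at inference time into a single artifact
bundle = {'scaler': scaler_demo, 'knn': knn_demo, 'columns': list(X_demo.columns)}
path = os.path.join(tempfile.mkdtemp(), 'route_recommender.joblib')  # hypothetical file name
joblib.dump(bundle, path)

# Later, possibly in another process: reload, scale the query, and search
loaded = joblib.load(path)
query = X_demo.iloc[[0]][loaded['columns']]
distances, indices = loaded['knn'].kneighbors(loaded['scaler'].transform(query))
print(indices[0])
```

Storing the column list alongside the estimators keeps the expected input order explicit when the bundle is loaded elsewhere.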

---
# PRODUCTION RECOMMENDATION FUNCTION

## 4. Define Recommendation Function

In [11]:
def recommend_similar_routes(input_features, n_recommendations=5, show_details=True):
    """
    Find routes similar to given features

    Parameters:
    - input_features: Dict, Series, or DataFrame with route features (must match X columns)
    - n_recommendations: Number of similar routes to return (default 5)
    - show_details: If True, prints detailed comparison (default True)

    Returns:
    - DataFrame with recommended routes and similarity scores
    """
    # Convert input to DataFrame
    if isinstance(input_features, dict):
        input_df = pd.DataFrame([input_features])
    elif isinstance(input_features, pd.Series):
        input_df = input_features.to_frame().T
    else:
        input_df = input_features.copy()

    # Validate columns
    if not all(col in input_df.columns for col in X.columns):
        missing = [col for col in X.columns if col not in input_df.columns]
        print(f"❌ Error: Missing features: {missing}")
        return None

    # Ensure column order matches
    input_df = input_df[X.columns]

    # Print input summary
    print("=" * 80)
    print("INPUT ROUTE FEATURES")
    print("=" * 80)
    print(f"Distance: {input_df['distance_m'].values[0]:.1f}m")
    print(f"Ascent: {input_df['ascent_m'].values[0]:.1f}m")
    print(f"Duration: {input_df['duration_s'].values[0]:.1f}s")
    print(f"Turn Density: {input_df['Turn_Density'].values[0]:.2f}")
    print()

    # Scale features
    input_scaled = scaler.transform(input_df)

    # Find nearest neighbors
    distances, indices = knn_optimal.kneighbors(input_scaled, n_neighbors=n_recommendations)

    # Build results
    recommendations = []
    for rank, (dist, idx) in enumerate(zip(distances[0], indices[0]), 1):
        rec_route = df.iloc[idx]

        recommendations.append({
            'rank': rank,
            'route_id': rec_route['id'],
            'route_name': rec_route['name'],
            'distance_m': rec_route['distance_m'],
            'ascent_m': rec_route['ascent_m'],
            'duration_s': rec_route['duration_s'],
            'turn_density': rec_route['Turn_Density'],
            'similarity_score': dist  # cosine distance: 0 = same direction, lower = more similar
        })

    results_df = pd.DataFrame(recommendations)

    # Print results
    print("=" * 80)
    print(f"TOP {n_recommendations} SIMILAR ROUTES")
    print("=" * 80)
    print()

    for i, row in results_df.iterrows():
        dist_diff = row['distance_m'] - input_df['distance_m'].values[0]
        ascent_diff = row['ascent_m'] - input_df['ascent_m'].values[0]

        print(f"{row['rank']}. {row['route_name'][:60]}")
        print(f"   Route ID: {row['route_id']}")
        print(f"   Similarity: {row['similarity_score']:.4f} (lower = more similar)")
        print(f"   Distance: {row['distance_m']:.1f}m ({dist_diff:+.1f}m)")
        print(f"   Ascent: {row['ascent_m']:.1f}m ({ascent_diff:+.1f}m)")
        print(f"   Duration: {row['duration_s']:.1f}s")
        print()

    # Detailed comparison
    if show_details:
        print("=" * 80)
        print("FEATURE COMPARISON")
        print("=" * 80)
        print(f"{'Feature':<25} {'Input':<15} {'Avg Recommended':<20} {'Difference':<15}")
        print("-" * 80)

        # Map between input column names and results_df column names
        feature_mapping = [
            ('distance_m', 'distance_m'),
            ('ascent_m', 'ascent_m'),
            ('duration_s', 'duration_s'),
            ('Turn_Density', 'turn_density')      # results_df uses lowercase
        ]

        for input_col, results_col in feature_mapping:
            input_val = input_df[input_col].values[0]
            avg_rec_val = results_df[results_col].mean()
            diff = avg_rec_val - input_val
            print(f"{input_col:<25} {input_val:<15.2f} {avg_rec_val:<20.2f} {diff:+.2f}")

    print("\n" + "=" * 80)

    return results_df

print("✅ Recommendation function ready!")

✅ Recommendation function ready!


## 5. Helper Function: Get Features from Route ID

In [12]:
def get_route_features(route_id):
    """
    Extract features from an existing route by ID
    Useful for testing with known routes

    Parameters:
    - route_id: The route ID to look up

    Returns:
    - Series with route features
    """
    if route_id not in df['id'].values:
        print(f"❌ Error: Route ID {route_id} not found in dataset")
        return None

    route = df[df['id'] == route_id].iloc[0]
    features = route[X.columns]

    print(f"✅ Extracted features from: '{route['name']}' (ID: {route_id})")
    print(f"   Distance: {route['distance_m']:.1f}m, Ascent: {route['ascent_m']:.1f}m")
    print()

    return features

print("✅ Helper function ready!")

✅ Helper function ready!


---
# EXAMPLES

## Example 1: Use Existing Route Features

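In [17]:
# Note: .iloc expects a row position (0 to len(df) - 1), not a route ID,
# so passing the ID 14310607 raises the IndexError shown below.
# To look a route up by ID, filter on the 'id' column instead: df[df['id'] == 14310607]
df.iloc[14310607]['id']

IndexError: single positional indexer is out-of-bounds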

In [13]:
# Pick a route from the dataset
example_route_id = df.iloc[169]['id']

# Get its features
features = get_route_features(example_route_id)

# Find similar routes
recommendations = recommend_similar_routes(features, n_recommendations=5)

✅ Extracted features from: 'Schabs-Brixen' (ID: 13237554)
   Distance: 4894.7m, Ascent: 27.5m

INPUT ROUTE FEATURES
Distance: 4894.7m
Ascent: 27.5m
Duration: 1020.2s
Turn Density: 1.23

TOP 5 SIMILAR ROUTES

1. Schabs-Brixen
   Route ID: 13237554
   Similarity: 0.0000 (lower = more similar)
   Distance: 4894.7m (+0.0m)
   Ascent: 27.5m (+0.0m)
   Duration: 1020.2s

2. Pista Ciclabile Taio - Sabino
   Route ID: 14310607
   Similarity: 0.0714 (lower = more similar)
   Distance: 7574.3m (+2679.6m)
   Ascent: 127.2m (+99.7m)
   Duration: 1524.9s

3. Unnamed route
   Route ID: 6677164
   Similarity: 0.0718 (lower = more similar)
   Distance: 6109.8m (+1215.1m)
   Ascent: 54.7m (+27.2m)
   Duration: 1221.9s

4. Unnamed route
   Route ID: 6593010
   Similarity: 0.0722 (lower = more similar)
   Distance: 5613.5m (+718.8m)
   Ascent: 39.0m (+11.5m)
   Duration: 1127.8s

5. Unnamed route
   Route ID: 7106095
   Similarity: 0.0737 (lower = more similar)
   Distance: 3389.5m (-1505.2m)
   Ascent: 
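
In the output above, the top match is the query route itself (distance 0.0000), so only four of the five results are new suggestions. When the query comes from the fitted dataset, one way to handle this is to request one extra neighbour and drop the query row. The sketch below runs on random data; `neighbors_excluding_self` and its parameters are hypothetical names, not part of the notebook.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def neighbors_excluding_self(knn, X_fitted, query_pos, n_recommendations=5):
    """Nearest rows to X_fitted[query_pos], skipping that row itself.

    Hypothetical helper: query_pos is the row position of the query
    in the matrix the model was fitted on.
    """
    # Ask for one extra neighbour so that dropping the query still leaves n results
    distances, indices = knn.kneighbors(
        X_fitted[[query_pos]], n_neighbors=n_recommendations + 1
    )
    keep = indices[0] != query_pos
    return distances[0][keep][:n_recommendations], indices[0][keep][:n_recommendations]

# Random stand-in for the scaled feature matrix (not the route data)
rng = np.random.default_rng(42)
X_toy = rng.normal(size=(100, 6))
knn_toy = NearestNeighbors(n_neighbors=5, metric='cosine').fit(X_toy)

dists, idxs = neighbors_excluding_self(knn_toy, X_toy, query_pos=10)
print(idxs)
```

The final slice keeps the result at `n_recommendations` rows even if the query row does not appear among the neighbours, which can happen when the dataset contains exact duplicates.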

## Example 2: Manual Feature Input (Short, Flat Route)

In [14]:
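# User wants: short, flat, paved route
# Note: these keys do not match the engineered columns in X (e.g. 'on_road', 'Flat Section'),
# so the validation step rejects the input, as the output below shows.
custom_features = {
    'distance_m': 3000.0,        # 3km
    'duration_s': 600.0,         # 10 minutes
    'ascent_m': 100.0,           # 100 m of climbing over 3 km, so not very flat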
    'descent_m': 100.0,
    'steps': 3,
    'turns': 5,
    'Asphalt': 90.0,             # 90% paved
    'Unknown': 0.0,
    'Paved': 10.0,
    'Compacted Gravel': 0.0,
    'Wood': 0.0,
    'Gravel': 0.0,
    'Paving Stones': 0.0,
    'Ground': 0.0,
    'Concrete': 0.0,
    'Grass': 0.0,
    'Metal': 0.0,
    'Unpaved': 0.0,
    'Dirt': 0.0,
    'Grass Paver': 0.0,
    'Sand': 0.0,
    'Road': 100.0,
    'Cycleway': 0.0,
    'State Road': 0.0,
    'Track': 0.0,
    'Street': 0.0,
    'Path': 0.0,
    'Footway': 0.0,
    'Unknown.1': 0.0,
    'Steps': 0.0,
    'Construction': 0.0,
    'Ferry': 0.0,
    'uphill_very_steep (7% to 10%)': 0.0,
    'uphill_moderate (3% to 5%)': 0.0,
    'uphill_gentle (0% to 3%)': 10.0,
    'flat (0%)': 90.0,           # Mostly flat
    'downhill_gentle (-5% to 0%)': 0.0,
    'uphill_steep (5% to 7%)': 0.0,
    'uphill_extreme (>10%)': 0.0,
    'downhill_extreme (<-15%)': 0.0,
    'downhill_moderate (-7% to -5%)': 0.0,
    'downhill_steep (-10% to -7%)': 0.0,
    'downhill_very_steep (-15% to -10%)': 0.0,
    'Average_Speed': 5.0,
    'Turn_Density': 1.67
}

print("\n" + "#" * 80)
print("# EXAMPLE 2: Short, Flat, Paved Route")
print("#" * 80)
print()

recommendations2 = recommend_similar_routes(custom_features, n_recommendations=5)


################################################################################
# EXAMPLE 2: Short, Flat, Paved Route
################################################################################

❌ Error: Missing features: ['on_road', 'off_road', 'Gravel_Tracks', 'Paved_Paths', 'Other', 'Unknown Surface', 'Paved_Road', 'Pedestrian', 'Unknown_Way', 'Cycle Track', 'Main Road', 'Steep Section', 'Moderate Section', 'Flat Section', 'Downhill Section', 'Steep Downhill Section']
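
The error arises because the dictionary does not supply every column the scaler was fitted on. One possible way to accept partial input is to fill unspecified features with a default taken from the dataset. The sketch below assumes median imputation, which is a choice made here for illustration and not something the notebook prescribes; `complete_features` and `X_ref` are hypothetical names, and the toy values are made up.

```python
import pandas as pd

def complete_features(partial, reference_df):
    """Build a one-row DataFrame with every column of reference_df.

    Hypothetical helper: values missing from `partial` are filled with the
    column median of reference_df (median imputation is an assumption).
    """
    unknown = set(partial) - set(reference_df.columns)
    if unknown:
        raise KeyError(f"Unknown feature names: {sorted(unknown)}")
    defaults = reference_df.median(numeric_only=True)
    full = {col: float(partial.get(col, defaults[col])) for col in reference_df.columns}
    return pd.DataFrame([full])

# Toy reference table standing in for the feature matrix (made-up values)
X_ref = pd.DataFrame({
    'distance_m': [3000.0, 5000.0, 12000.0],
    'ascent_m': [20.0, 50.0, 400.0],
    'on_road': [0.9, 0.5, 0.1],
    'Flat Section': [0.8, 0.6, 0.2],
})

row = complete_features({'distance_m': 3000.0, 'Flat Section': 0.9}, X_ref)
print(row)
```

In the notebook, `X` could play the role of `X_ref`, and the resulting row could be passed to `recommend_similar_routes`. Imputed values pull the query toward a typical route, so results are most meaningful when the features the user cares about are given explicitly and on the same scale as the dataset columns.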


---
## Summary

**Use Case 1 Function Complete!**

**How to use:**
```python
# Option 1: From existing route
features = get_route_features(route_id)
recommendations = recommend_similar_routes(features)

# Option 2: Custom features (the dict must include every column in X)
custom = {'distance_m': 5000, 'ascent_m': 50, ...}
recommendations = recommend_similar_routes(custom)
```

**Returns:**
- DataFrame with route IDs, names, features, and similarity scores (cosine distance; lower = more similar)
- `None` if any required feature is missing
- Prints detailed comparison

**Next:** Use Case 2 (custom modifications like "2x distance")