# Use Case 1: "Give Me Routes Like This"

**Purpose:** Given a set of route features, find the most similar routes in our dataset.

**Model:** KNN with k=5, cosine metric (93.9% improvement over baseline)

**Use Cases:**
- Cyclist wants routes similar to one they enjoyed
- Find alternatives in different locations with same characteristics
- Discover routes with similar difficulty/distance/terrain

## 1. Setup and Load Optimized Model

In [None]:
# Create feature matrix
X = df.drop(['Unnamed: 0.1', 'Unnamed: 0', 'id', 'name'], axis=1)

# DEBUG: Check what columns we actually have
print("Available columns in X:")
print(list(X.columns))
print(f"\nTotal columns: {len(X.columns)}")

# Check for specific columns we need for scaling
standard_cols = ['distance_m', 'duration_s', 'ascent_m', 'descent_m', 'Turn_Density', 'Average_Speed', 'steps', 'turns']
print("\n\nChecking StandardScaler columns:")
for col in standard_cols:
    exists = "✅" if col in X.columns else "❌"
    print(f"{exists} {col}")

NameError: name 'df' is not defined

## 2. Prepare Features and Scaler

In [2]:
# Create feature matrix
X = df.drop(['Unnamed: 0.1', 'Unnamed: 0', 'id', 'name'], axis=1)

# Apply same scaling as training
scaler = ColumnTransformer(transformers=[
    ('standard', StandardScaler(), ['distance_m', 'duration_s', 'ascent_m', 'descent_m', 'Turn_Density', 'Average_Speed', 'steps', 'turns']),
    ('minmax', MinMaxScaler(), ['Asphalt', 'Unknown', 'Paved', 'Compacted Gravel', 'Wood', 'Gravel', 'Paving Stones', 'Ground', 'Concrete', 'Grass', 'Metal', 'Unpaved', 'Dirt', 'Grass Paver', 'Sand', 'Road', 'Cycleway', 'State Road', 'Track', 'Street', 'Path', 'Footway', 'Unknown.1', 'Steps', 'Construction', 'Ferry', 'uphill_very_steep (7% to 10%)', 'uphill_moderate (3% to 5%)', 'uphill_gentle (0% to 3%)', 'flat (0%)', 'downhill_gentle (-5% to 0%)', 'uphill_steep (5% to 7%)', 'uphill_extreme (>10%)', 'downhill_extreme (<-15%)', 'downhill_moderate (-7% to -5%)', 'downhill_steep (-10% to -7%)', 'downhill_very_steep (-15% to -10%)']),
], remainder='passthrough')

X_scaled = scaler.fit_transform(X)
print(f"Features prepared: {X.shape[1]} features")
print(f"Feature names: {list(X.columns)}")

Features prepared: 45 features
Feature names: ['distance_m', 'duration_s', 'ascent_m', 'descent_m', 'steps', 'turns', 'Asphalt', 'Unknown', 'Paved', 'Compacted Gravel', 'Wood', 'Gravel', 'Paving Stones', 'Ground', 'Concrete', 'Grass', 'Metal', 'Unpaved', 'Dirt', 'Grass Paver', 'Sand', 'Road', 'Cycleway', 'State Road', 'Track', 'Street', 'Path', 'Footway', 'Unknown.1', 'Steps', 'Construction', 'Ferry', 'uphill_very_steep (7% to 10%)', 'uphill_moderate (3% to 5%)', 'uphill_gentle (0% to 3%)', 'flat (0%)', 'downhill_gentle (-5% to 0%)', 'uphill_steep (5% to 7%)', 'uphill_extreme (>10%)', 'downhill_extreme (<-15%)', 'downhill_moderate (-7% to -5%)', 'downhill_steep (-10% to -7%)', 'downhill_very_steep (-15% to -10%)', 'Average_Speed', 'Turn_Density']


## 3. Train Optimized KNN Model

Using optimal parameters from grid search: k=5, cosine metric

In [3]:
# Train with optimal parameters
knn_optimal = NearestNeighbors(n_neighbors=5, metric='cosine')
knn_optimal.fit(X_scaled)

print("✅ Optimized KNN model trained!")
print(f"   k = {knn_optimal.n_neighbors}")
print(f"   metric = '{knn_optimal.metric}'")
print(f"   Training samples = {X_scaled.shape[0]}")

✅ Optimized KNN model trained!
   k = 5
   metric = 'cosine'
   Training samples = 7717


---
# PRODUCTION RECOMMENDATION FUNCTION

## 4. Define Recommendation Function

In [None]:
def recommend_similar_routes(input_features, n_recommendations=5, show_details=True):
    """
    Find routes similar to given features

    Parameters:
    - input_features: Dict, Series, or DataFrame with route features (must match X columns)
    - n_recommendations: Number of similar routes to return (default 5)
    - show_details: If True, prints detailed comparison (default True)

    Returns:
    - DataFrame with recommended routes and similarity scores
    """
    # Convert input to DataFrame
    if isinstance(input_features, dict):
        input_df = pd.DataFrame([input_features])
    elif isinstance(input_features, pd.Series):
        input_df = input_features.to_frame().T
    else:
        input_df = input_features.copy()

    # Validate columns
    if not all(col in input_df.columns for col in X.columns):
        missing = [col for col in X.columns if col not in input_df.columns]
        print(f"❌ Error: Missing features: {missing}")
        return None

    # Ensure column order matches
    input_df = input_df[X.columns]

    # Print input summary
    print("=" * 80)
    print("INPUT ROUTE FEATURES")
    print("=" * 80)
    print(f"Distance: {input_df['distance_m'].values[0]:.1f}m")
    print(f"Ascent: {input_df['ascent_m'].values[0]:.1f}m")
    print(f"Duration: {input_df['duration_s'].values[0]:.1f}s")
    print(f"Avg Speed: {input_df['Average_Speed'].values[0]:.2f}")
    print(f"Turn Density: {input_df['Turn_Density'].values[0]:.2f}")
    print()

    # Scale features
    input_scaled = scaler.transform(input_df)

    # Find nearest neighbors
    distances, indices = knn_optimal.kneighbors(input_scaled, n_neighbors=n_recommendations)

    # Build results
    recommendations = []
    for rank, (dist, idx) in enumerate(zip(distances[0], indices[0]), 1):
        rec_route = df.iloc[idx]

        recommendations.append({
            'rank': rank,
            'route_id': rec_route['id'],
            'route_name': rec_route['name'],
            'distance_m': rec_route['distance_m'],
            'ascent_m': rec_route['ascent_m'],
            'duration_s': rec_route['duration_s'],
            'avg_speed': rec_route['Average_Speed'],
            'turn_density': rec_route['Turn_Density'],
            'similarity_score': dist
        })

    results_df = pd.DataFrame(recommendations)

    # Print results
    print("=" * 80)
    print(f"TOP {n_recommendations} SIMILAR ROUTES")
    print("=" * 80)
    print()

    for i, row in results_df.iterrows():
        dist_diff = row['distance_m'] - input_df['distance_m'].values[0]
        ascent_diff = row['ascent_m'] - input_df['ascent_m'].values[0]

        print(f"{row['rank']}. {row['route_name'][:60]}")
        print(f"   Route ID: {row['route_id']}")
        print(f"   Similarity: {row['similarity_score']:.4f} (lower = more similar)")
        print(f"   Distance: {row['distance_m']:.1f}m ({dist_diff:+.1f}m)")
        print(f"   Ascent: {row['ascent_m']:.1f}m ({ascent_diff:+.1f}m)")
        print(f"   Duration: {row['duration_s']:.1f}s")
        print(f"   Avg Speed: {row['avg_speed']:.2f}")
        print()

    # Detailed comparison
    if show_details:
        print("=" * 80)
        print("FEATURE COMPARISON")
        print("=" * 80)
        print(f"{'Feature':<25} {'Input':<15} {'Avg Recommended':<20} {'Difference':<15}")
        print("-" * 80)

        key_features = ['distance_m', 'ascent_m', 'duration_s', 'Average_Speed', 'Turn_Density']
        for feature in key_features:
            input_val = input_df[feature].values[0]
            avg_rec_val = results_df[feature].mean()
            diff = avg_rec_val - input_val
            print(f"{feature:<25} {input_val:<15.2f} {avg_rec_val:<20.2f} {diff:+.2f}")

    print("\n" + "=" * 80)

    return results_df

print("✅ Recommendation function ready!")

✅ Recommendation function ready!


## 5. Helper Function: Get Features from Route ID

In [None]:
def get_route_features(route_id):
    """
    Extract features from an existing route by ID
    Useful for testing with known routes

    Parameters:
    - route_id: The route ID to look up

    Returns:
    - Series with route features
    """
    if route_id not in df['id'].values:
        print(f"❌ Error: Route ID {route_id} not found in dataset")
        return None

    route = df[df['id'] == route_id].iloc[0]
    features = route[X.columns]

    print(f"✅ Extracted features from: '{route['name']}' (ID: {route_id})")
    print(f"   Distance: {route['distance_m']:.1f}m, Ascent: {route['ascent_m']:.1f}m")
    print()

    return features

print("✅ Helper function ready!")

✅ Helper function ready!


---
# EXAMPLES

## Example 1: Use Existing Route Features

In [None]:
# Pick a route from the dataset
example_route_id = df.iloc[100]['id']

# Get its features
features = get_route_features(example_route_id)

# Find similar routes
recommendations = recommend_similar_routes(features, n_recommendations=5)

✅ Extracted features from: 'Salle Cycle Loop' (ID: 18923017)
   Distance: 2031.1m, Ascent: 6.7m

INPUT ROUTE FEATURES
Distance: 2031.1m
Ascent: 6.7m
Duration: 406.2s
Avg Speed: 5.00
Turn Density: 0.00

TOP 5 SIMILAR ROUTES

1. Salle Cycle Loop
   Route ID: 18923017
   Similarity: 0.0000 (lower = more similar)
   Distance: 2031.1m (+0.0m)
   Ascent: 6.7m (+0.0m)
   Duration: 406.2s
   Avg Speed: 5.00

2. Unnamed route
   Route ID: 15590082
   Similarity: 0.0001 (lower = more similar)
   Distance: 1797.5m (-233.6m)
   Ascent: 6.0m (-0.7m)
   Duration: 359.5s
   Avg Speed: 5.00

3. Unnamed route
   Route ID: 15588106
   Similarity: 0.0001 (lower = more similar)
   Distance: 1634.9m (-396.2m)
   Ascent: 4.3m (-2.4m)
   Duration: 327.0s
   Avg Speed: 5.00

4. Unnamed route
   Route ID: 15588105
   Similarity: 0.0002 (lower = more similar)
   Distance: 2055.2m (+24.1m)
   Ascent: 15.4m (+8.7m)
   Duration: 411.0s
   Avg Speed: 5.00

5. Hampden Route
   Route ID: 12890644
   Similarity: 0.000

KeyError: 'Average_Speed'

## Example 2: Manual Feature Input (Short, Flat Route)

In [None]:
# User wants: short, flat, paved route
custom_features = {
    'distance_m': 3000.0,        # 3km
    'duration_s': 600.0,         # 10 minutes
    'ascent_m': 10.0,            # Very flat
    'descent_m': 10.0,
    'steps': 3,
    'turns': 5,
    'Asphalt': 90.0,             # 90% paved
    'Unknown': 0.0,
    'Paved': 10.0,
    'Compacted Gravel': 0.0,
    'Wood': 0.0,
    'Gravel': 0.0,
    'Paving Stones': 0.0,
    'Ground': 0.0,
    'Concrete': 0.0,
    'Grass': 0.0,
    'Metal': 0.0,
    'Unpaved': 0.0,
    'Dirt': 0.0,
    'Grass Paver': 0.0,
    'Sand': 0.0,
    'Road': 100.0,
    'Cycleway': 0.0,
    'State Road': 0.0,
    'Track': 0.0,
    'Street': 0.0,
    'Path': 0.0,
    'Footway': 0.0,
    'Unknown.1': 0.0,
    'Steps': 0.0,
    'Construction': 0.0,
    'Ferry': 0.0,
    'uphill_very_steep (7% to 10%)': 0.0,
    'uphill_moderate (3% to 5%)': 0.0,
    'uphill_gentle (0% to 3%)': 10.0,
    'flat (0%)': 90.0,           # Mostly flat
    'downhill_gentle (-5% to 0%)': 0.0,
    'uphill_steep (5% to 7%)': 0.0,
    'uphill_extreme (>10%)': 0.0,
    'downhill_extreme (<-15%)': 0.0,
    'downhill_moderate (-7% to -5%)': 0.0,
    'downhill_steep (-10% to -7%)': 0.0,
    'downhill_very_steep (-15% to -10%)': 0.0,
    'Average_Speed': 5.0,
    'Turn_Density': 1.67
}

print("\n" + "#" * 80)
print("# EXAMPLE 2: Short, Flat, Paved Route")
print("#" * 80)
print()

recommendations2 = recommend_similar_routes(custom_features, n_recommendations=5)

## Example 3: Challenging Hilly Route

In [None]:
# User wants: longer route with significant climbing
hilly_features = {
    'distance_m': 15000.0,       # 15km
    'duration_s': 3600.0,        # 1 hour
    'ascent_m': 300.0,           # 300m climbing
    'descent_m': 300.0,
    'steps': 10,
    'turns': 20,
    'Asphalt': 70.0,
    'Unknown': 0.0,
    'Paved': 20.0,
    'Compacted Gravel': 10.0,
    'Wood': 0.0,
    'Gravel': 0.0,
    'Paving Stones': 0.0,
    'Ground': 0.0,
    'Concrete': 0.0,
    'Grass': 0.0,
    'Metal': 0.0,
    'Unpaved': 0.0,
    'Dirt': 0.0,
    'Grass Paver': 0.0,
    'Sand': 0.0,
    'Road': 80.0,
    'Cycleway': 20.0,
    'State Road': 0.0,
    'Track': 0.0,
    'Street': 0.0,
    'Path': 0.0,
    'Footway': 0.0,
    'Unknown.1': 0.0,
    'Steps': 0.0,
    'Construction': 0.0,
    'Ferry': 0.0,
    'uphill_very_steep (7% to 10%)': 10.0,
    'uphill_moderate (3% to 5%)': 30.0,  # Significant climbing
    'uphill_gentle (0% to 3%)': 20.0,
    'flat (0%)': 20.0,
    'downhill_gentle (-5% to 0%)': 20.0,
    'uphill_steep (5% to 7%)': 0.0,
    'uphill_extreme (>10%)': 0.0,
    'downhill_extreme (<-15%)': 0.0,
    'downhill_moderate (-7% to -5%)': 0.0,
    'downhill_steep (-10% to -7%)': 0.0,
    'downhill_very_steep (-15% to -10%)': 0.0,
    'Average_Speed': 4.17,
    'Turn_Density': 1.33
}

print("\n" + "#" * 80)
print("# EXAMPLE 3: Challenging Hilly Route")
print("#" * 80)
print()

recommendations3 = recommend_similar_routes(hilly_features, n_recommendations=5)

---
## Summary

**Use Case 1 Function Complete!**

**How to use:**
```python
# Option 1: From existing route
features = get_route_features(route_id)
recommendations = recommend_similar_routes(features)

# Option 2: Custom features
custom = {'distance_m': 5000, 'ascent_m': 50, ...}
recommendations = recommend_similar_routes(custom)
```

**Returns:**
- DataFrame with route IDs, names, features, and similarity scores
- Prints detailed comparison

**Next:** Use Case 2 (custom modifications like "2x distance")