# Guide My Sleigh Data Overview

The crew here at Data Fusion wants to make sure the joy and cheer of SantaCon comes with great bars, no lines, and a map that gets you from one stop to the next and back to your hotel without a second thought about logistics. Santas have done enough work all year long.

Below we extract the list of available bars from the santacon.nyc website. We also list 4 hotels that are offering reduced nightly rates for our Santas so they don't have to worry about lodging.

After gathering this data, we will build a few "Sleigh Rides": curated groups of bars where Santas have already paid the cover, giving them no-wait access to those bars.

An algorithm will calculate the optimal commute (shortest walking distance and stop order) for each Sleigh Ride, including getting our Santas back to their hotel. That information will be passed on to our users visually in the "Guide My Sleigh" app.

In [1]:
import json
import pandas as pd
from datetime import time
import openrouteservice
import numpy as np
import math
import requests
import hashlib

## Bar Data

In [2]:
with open("santacon_2024_venues.json", "r") as file:
    json_data = json.load(file)

In [3]:
# Convert the nested JSON object (assuming the key is 'data') into a pandas DataFrame
df = pd.DataFrame(json_data["data"])

columns = ["name", "latitude", "longitude", "address", "opens", "closes", "categories", "description", "image"]
df = df[columns]

df.columns = [
    "Name", "Latitude", "Longitude", "Address",
    "Opens", "Closes", "Categories", "Description", "Image"
]

display(df.head())

Unnamed: 0,Name,Latitude,Longitude,Address,Opens,Closes,Categories,Description,Image
0,*10AM START POINT THE CHRISTMAS SPECTACULAR**,40.754349,-73.986899,"Broadway at 40th Street New York, NY",10:00,11:00,START POINT!,10am Santa is Painting the town Red<br>We will...,https://santacon.nyc/wp-content/uploads/2023/1...
1,The Rutherford,40.751373,-73.993552,"W 33rd St at 8th Ave, New York, NY 10119",10:00,20:00,Huge Venues,Get here early for views from the Roof Deck / ...,https://santacon.nyc/wp-content/uploads/2023/1...
2,Avenida,40.752193,-73.993465,"W. 34 St & 8th Ave, New York, NY 10001",10:00,20:00,Huge Venues,The Holiday Hustle at this rooftop + mexican b...,https://santacon.nyc/wp-content/uploads/2023/1...
3,Taj II,40.741051,-73.992882,"48 W 21st St, New York, NY 10010",12:00,22:00,Huge Venues,Naughty vs Nice at this Sexy Dance Floor Vibin...,https://santacon.nyc/wp-content/uploads/2023/1...
4,Clinton Hall 36,40.750099,-73.984395,"16 W, 36th Street New York, NY 10018",9:00,17:00,Huge Venues,DJs Fun games at this HUGE open spot 🎲🎟️🎁,https://santacon.nyc/wp-content/uploads/2023/1...


In [4]:
display(df.info())

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 75 entries, 0 to 74
Data columns (total 9 columns):
 #   Column       Non-Null Count  Dtype  
---  ------       --------------  -----  
 0   Name         75 non-null     object 
 1   Latitude     75 non-null     float64
 2   Longitude    75 non-null     float64
 3   Address      75 non-null     object 
 4   Opens        75 non-null     object 
 5   Closes       75 non-null     object 
 6   Categories   75 non-null     object 
 7   Description  75 non-null     object 
 8   Image        75 non-null     object 
dtypes: float64(2), object(7)
memory usage: 5.4+ KB


None

There are 75 venues, including the starting location. We are going to convert the "Opens" and "Closes" columns from strings to `datetime.time` values so they can be compared (pandas still reports the dtype as `object`). From there we should trim this list down, as 75 options would inundate the user and make it complicated to choose effective Sleigh Rides.

In [5]:
df['Opens'] = pd.to_datetime(df['Opens'], format='%H:%M').dt.time
df['Closes'] = pd.to_datetime(df['Closes'], format='%H:%M').dt.time

In [6]:
display(df.info())

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 75 entries, 0 to 74
Data columns (total 9 columns):
 #   Column       Non-Null Count  Dtype  
---  ------       --------------  -----  
 0   Name         75 non-null     object 
 1   Latitude     75 non-null     float64
 2   Longitude    75 non-null     float64
 3   Address      75 non-null     object 
 4   Opens        75 non-null     object 
 5   Closes       75 non-null     object 
 6   Categories   75 non-null     object 
 7   Description  75 non-null     object 
 8   Image        75 non-null     object 
dtypes: float64(2), object(7)
memory usage: 5.4+ KB


None
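
The `info()` output above still lists "Opens" and "Closes" as `object` even after the conversion. A minimal sketch on a toy frame (the `demo` frame is made up for illustration, not the venue data) shows why: `.dt.time` stores plain `datetime.time` values, which pandas keeps in an `object` column, yet they still compare correctly against a `time` threshold.

```python
import pandas as pd
from datetime import time

# Toy frame using the same "H:MM" string format as the venue data (illustrative values).
demo = pd.DataFrame({"Opens": ["9:00", "10:00", "12:00"]})

# Parse the strings, then keep only the time-of-day component.
demo["Opens"] = pd.to_datetime(demo["Opens"], format="%H:%M").dt.time

print(demo["Opens"].dtype)           # object: each cell holds a datetime.time
print(type(demo["Opens"].iloc[0]))   # <class 'datetime.time'>

# Element-wise comparison against a time threshold still works.
opens_by_eleven = (demo["Opens"] <= time(11, 0)).tolist()
print(opens_by_eleven)               # [True, True, False]
```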

In [7]:
display(df.head(3))

Unnamed: 0,Name,Latitude,Longitude,Address,Opens,Closes,Categories,Description,Image
0,*10AM START POINT THE CHRISTMAS SPECTACULAR**,40.754349,-73.986899,"Broadway at 40th Street New York, NY",10:00:00,11:00:00,START POINT!,10am Santa is Painting the town Red<br>We will...,https://santacon.nyc/wp-content/uploads/2023/1...
1,The Rutherford,40.751373,-73.993552,"W 33rd St at 8th Ave, New York, NY 10119",10:00:00,20:00:00,Huge Venues,Get here early for views from the Roof Deck / ...,https://santacon.nyc/wp-content/uploads/2023/1...
2,Avenida,40.752193,-73.993465,"W. 34 St & 8th Ave, New York, NY 10001",10:00:00,20:00:00,Huge Venues,The Holiday Hustle at this rooftop + mexican b...,https://santacon.nyc/wp-content/uploads/2023/1...


In [8]:
# Create a slice of the starting point for SantaCon and convert to a dataframe
start_loc = df.iloc[0].to_frame().T
start_loc['Address'] = '1415 Broadway, New York, NY 10018'

In [9]:
display(start_loc)

Unnamed: 0,Name,Latitude,Longitude,Address,Opens,Closes,Categories,Description,Image
0,*10AM START POINT THE CHRISTMAS SPECTACULAR**,40.754349,-73.986899,"1415 Broadway, New York, NY 10018",10:00:00,11:00:00,START POINT!,10am Santa is Painting the town Red<br>We will...,https://santacon.nyc/wp-content/uploads/2023/1...


Our Santas may want the option to go elsewhere or see something nearby.

In [10]:
bar_filt = df.copy()
bar_filt = bar_filt[bar_filt['Opens']<=time(11, 0)]
bar_filt = bar_filt[bar_filt['Closes']>=time(20, 0)]
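
The two filters above can also be written as one boolean mask. The sketch below is only an illustration on made-up rows; `open_during` and its parameter names are hypothetical helpers, not part of the notebook. One caveat worth noting: comparing times of day would wrongly exclude a venue that closes after midnight (for example 02:00), which does not seem to occur in the rows shown here.

```python
import pandas as pd
from datetime import time

def open_during(df, latest_open=time(11, 0), earliest_close=time(20, 0)):
    """Keep rows that open by `latest_open` and close at or after `earliest_close`."""
    mask = (df["Opens"] <= latest_open) & (df["Closes"] >= earliest_close)
    return df[mask].copy()

# Made-up venues for illustration only.
toy = pd.DataFrame({
    "Name": ["A", "B", "C"],
    "Opens": [time(10, 0), time(12, 0), time(11, 0)],
    "Closes": [time(20, 0), time(22, 0), time(17, 0)],
})

kept = open_during(toy)
print(kept["Name"].tolist())  # ['A']: B opens too late, C closes too early
```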

In [11]:
display(bar_filt.info())

<class 'pandas.core.frame.DataFrame'>
Index: 42 entries, 1 to 74
Data columns (total 9 columns):
 #   Column       Non-Null Count  Dtype  
---  ------       --------------  -----  
 0   Name         42 non-null     object 
 1   Latitude     42 non-null     float64
 2   Longitude    42 non-null     float64
 3   Address      42 non-null     object 
 4   Opens        42 non-null     object 
 5   Closes       42 non-null     object 
 6   Categories   42 non-null     object 
 7   Description  42 non-null     object 
 8   Image        42 non-null     object 
dtypes: float64(2), object(7)
memory usage: 3.3+ KB


None

In [12]:
display(bar_filt)

Unnamed: 0,Name,Latitude,Longitude,Address,Opens,Closes,Categories,Description,Image
1,The Rutherford,40.751373,-73.993552,"W 33rd St at 8th Ave, New York, NY 10119",10:00:00,20:00:00,Huge Venues,Get here early for views from the Roof Deck / ...,https://santacon.nyc/wp-content/uploads/2023/1...
2,Avenida,40.752193,-73.993465,"W. 34 St & 8th Ave, New York, NY 10001",10:00:00,20:00:00,Huge Venues,The Holiday Hustle at this rooftop + mexican b...,https://santacon.nyc/wp-content/uploads/2023/1...
5,5th & Mad,40.749771,-73.982867,"7 E 36th St, New York, NY 10016",11:00:00,20:00:00,Huge Venues,The Snow Ball at this hugh East Side Bar + DJ ...,https://santacon.nyc/wp-content/uploads/2023/1...
8,Solas,40.729445,-73.988059,"232 E 9th St, New York, NY 10003",11:00:00,20:00:00,Huge Venues,Jingle Bell Rockout at this lounge & dance clu...,https://santacon.nyc/wp-content/uploads/2023/1...
10,The Tailor,40.753109,-73.993027,"505 8th Ave, New York, NY 10018",11:00:00,20:00:00,Huge Venues,Mistletoe Mania at this HUGE - 1000+ Elves & S...,https://santacon.nyc/wp-content/uploads/2023/1...
12,Bar 13,40.734531,-73.99218,"121 University Pl, New York, NY 10003",11:00:00,21:00:00,Huge Venues,"Frosty Fest with Drinking & techno, house & hi...",https://santacon.nyc/wp-content/uploads/2023/1...
16,Amsterdam Billiards and Bar,40.731881,-73.989596,"110 E 11th St, New York, NY 10003",11:00:00,20:00:00,East Village Bars,"25 Billiards Tables plus Ping-Pong, Darts, Foo...",https://santacon.nyc/wp-content/uploads/2023/1...
18,Horseshoe Bar,40.725217,-73.981493,"108 Ave B, New York, NY 10009",11:00:00,20:00:00,East Village Bars,No-nonsense Punk & Old School Bar 🎅🧲,https://santacon.nyc/wp-content/uploads/2023/1...
19,Coyote Ugly,40.733034,-73.985653,"233 E. 14th St., New York, NY 10003",11:00:00,21:00:00,East Village Bars,"The One, The Only, Leave Your Bra on the Ceili...",https://santacon.nyc/wp-content/uploads/2023/1...
20,The Laurels,40.732803,-73.984964,"231 2nd Ave, New York, NY 10003",10:00:00,23:00:00,East Village Bars,"Sexy, Cozy Cocktail Bar in East Village 💃🍺⛄",https://santacon.nyc/wp-content/uploads/2023/1...


We have filtered the potential bars down from 74 to 42 options that open right after the initial meetup and stay open late enough that Santas won't feel rushed. We have come up with 4 Sleigh Ride themes to choose from:

- The Colossal Cheers Circuit (only the largest venues on our list)
- Holiday Hors D'oeuvres Hop (a foodie-focused bar crawl)
- The Midtown Mistletoe March (only Midtown locations)
- Shamrocks & Stockings Crawl (Irish pub focused)

Let's filter further and get each Sleigh Ride finalized.

In [13]:
huge_df = bar_filt[bar_filt['Categories']=='Huge Venues']
display(huge_df)

Unnamed: 0,Name,Latitude,Longitude,Address,Opens,Closes,Categories,Description,Image
1,The Rutherford,40.751373,-73.993552,"W 33rd St at 8th Ave, New York, NY 10119",10:00:00,20:00:00,Huge Venues,Get here early for views from the Roof Deck / ...,https://santacon.nyc/wp-content/uploads/2023/1...
2,Avenida,40.752193,-73.993465,"W. 34 St & 8th Ave, New York, NY 10001",10:00:00,20:00:00,Huge Venues,The Holiday Hustle at this rooftop + mexican b...,https://santacon.nyc/wp-content/uploads/2023/1...
5,5th & Mad,40.749771,-73.982867,"7 E 36th St, New York, NY 10016",11:00:00,20:00:00,Huge Venues,The Snow Ball at this hugh East Side Bar + DJ ...,https://santacon.nyc/wp-content/uploads/2023/1...
8,Solas,40.729445,-73.988059,"232 E 9th St, New York, NY 10003",11:00:00,20:00:00,Huge Venues,Jingle Bell Rockout at this lounge & dance clu...,https://santacon.nyc/wp-content/uploads/2023/1...
10,The Tailor,40.753109,-73.993027,"505 8th Ave, New York, NY 10018",11:00:00,20:00:00,Huge Venues,Mistletoe Mania at this HUGE - 1000+ Elves & S...,https://santacon.nyc/wp-content/uploads/2023/1...
12,Bar 13,40.734531,-73.99218,"121 University Pl, New York, NY 10003",11:00:00,21:00:00,Huge Venues,"Frosty Fest with Drinking & techno, house & hi...",https://santacon.nyc/wp-content/uploads/2023/1...
74,Circo,40.76042,-73.98425,"1604 Broadway York, NY 10019",11:00:00,22:00:00,Huge Venues,BIGGEST Santacon venue! Get to this new tri-le...,https://santacon.nyc/wp-content/uploads/2024/1...


In [14]:
huge_df = huge_df.drop(5)

In [15]:
midtown_df = bar_filt[bar_filt['Categories'].str.contains('Midtown', case=False, na=False)]

In [16]:
display(midtown_df)

Unnamed: 0,Name,Latitude,Longitude,Address,Opens,Closes,Categories,Description,Image
26,Backstage Tavern,40.760502,-73.989732,"346 W 46th St, New York, NY 10036",10:00:00,22:00:00,Midtown West Bars,Where Santa's stars align 🌟🎄🩵,https://santacon.nyc/wp-content/uploads/2024/1...
27,Bar Dough,40.760531,-73.989846,"350 W. 46th St, New York, NY 10036",10:00:00,22:00:00,Midtown West Bars,beer & cocktails + wood-fired pizzas 🎅🍕,https://santacon.nyc/wp-content/uploads/2023/1...
28,Blarney Stone,40.75055,-73.99483,"410 8th Ave, New York, NY 10001",11:00:00,20:00:00,Midtown West Bars,Classic NYC Irish Midtown🎅🇮🇪,https://santacon.nyc/wp-content/uploads/2023/1...
34,The Dean,40.754518,-73.989304,"214 W 39th St, New York, NY 10018",11:00:00,22:00:00,Midtown West Bars,Spacious industrial-chic trendy hub 🎅🥃,https://santacon.nyc/wp-content/uploads/2023/1...
39,The Independent,40.754843,-73.987502,"147 W 40th St, New York, NY 10018",11:00:00,20:00:00,Midtown West Bars,Snug and stylish space for notable drinks 🎅🥃,https://santacon.nyc/wp-content/uploads/2023/1...
42,Jack Doyle's,40.75228,-73.992082,"240 W 35th St, New York, NY 10001",10:00:00,21:00:00,Midtown West Bars,DJ at this HUGE Irish Rockin' Haus 🎅🍀,https://santacon.nyc/wp-content/uploads/2023/1...
43,John Sullivan's,40.751815,-73.990753,"210 W 35th St, New York, NY 10001",10:00:00,21:00:00,Midtown West Bars,2 Levels & Amazing Drink Specials 👯🕶️,https://santacon.nyc/wp-content/uploads/2023/1...
45,The Liberty,40.749799,-73.985657,"29 W 35th St, New York, NY 10001",10:00:00,20:00:00,Midtown West Bars,Amazing fusion between classic & contemporary ...,https://santacon.nyc/wp-content/uploads/2023/1...
46,Printers Alley,40.755506,-73.988588,"215 West 40th St, New York, NY 10018",10:00:00,20:00:00,Midtown West Bars,4 Floor Sprawling Irish Pub 🎅🏠,https://santacon.nyc/wp-content/uploads/2023/1...
49,Peter Dillons 36th,40.749603,-73.983317,"2 E 36th St, New York, NY 10016",10:00:00,20:00:00,Midtown East Bars,Beers & Cocktails for Santa 🎅🇮🇪,https://santacon.nyc/wp-content/uploads/2023/1...


In [17]:
midtown_exclude_list = ["Bar Dough", "Blarney Stone", "Jack Doyle's",
                        "Printers Alley", "Peter Dillons 36th", "Peter Dillons Pub 40th",
                        "Playwright Irish Pub", "Slattery's", "Westbury", 
                        "Walters Cottage", "Celtic Rail", "Jameson's",
                        "The Dean", "The Independent"]
midtown_df = midtown_df[~midtown_df['Name'].isin(midtown_exclude_list)]

In [18]:
food_word_list = ['food', 'pizza']
food_pattern = '|'.join(food_word_list)

food_bars = bar_filt[
    (bar_filt['Categories']== "Grub") |
    (bar_filt['Description'].str.contains(food_pattern,
                                          case=False, na=False))
]

display(food_bars)

Unnamed: 0,Name,Latitude,Longitude,Address,Opens,Closes,Categories,Description,Image
16,Amsterdam Billiards and Bar,40.731881,-73.989596,"110 E 11th St, New York, NY 10003",11:00:00,20:00:00,East Village Bars,"25 Billiards Tables plus Ping-Pong, Darts, Foo...",https://santacon.nyc/wp-content/uploads/2023/1...
27,Bar Dough,40.760531,-73.989846,"350 W. 46th St, New York, NY 10036",10:00:00,22:00:00,Midtown West Bars,beer & cocktails + wood-fired pizzas 🎅🍕,https://santacon.nyc/wp-content/uploads/2023/1...
31,Brooklyn Deli Times Square,40.757228,-73.986979,"211 West 43rd St, New York, NY 10036",10:00:00,20:00:00,Grub,Satin Dolls performing Christmas classics & th...,https://santacon.nyc/wp-content/uploads/2024/1...
47,Montagu's Gusto,40.745833,-73.975552,"645 2nd Ave, New York, NY 10016",09:00:00,23:00:00,Grub,Artisan eatery bringing new flavors to Santaco...,https://santacon.nyc/wp-content/uploads/2024/1...
62,La Macarena NYC,40.760747,-73.986387,"234 W 48th St, New York, NY 10036",11:00:00,21:00:00,Midtown West Bars,"Santa loves Latin Food, Hooka & Party Bar in T...",https://santacon.nyc/wp-content/uploads/2024/1...
65,Kinky's Dessert Bar,40.722162,-73.988571,"181 Orchard St, New York, NY 10002",11:00:00,23:00:00,Grub,A Dessert Shop with Booze...Let's Get Kinky! 👄,https://santacon.nyc/wp-content/uploads/2023/1...
70,Cafe Flor,40.744262,-73.999222,"218 8th Ave, New York, NY 10011",10:00:00,20:00:00,Grub,Santa's cozy Coffee Shop / Bar & quick bites 🥖🍸🎄🎅,https://lh3.googleusercontent.com/p/AF1QipO49m...
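
To make the food filter above easier to follow, here is a small sketch on made-up rows. It combines a category match with a case-insensitive keyword search, where `'|'.join(...)` builds a regex alternation. The `re.escape` call is an addition not present in the notebook; it only matters if a keyword ever contains regex metacharacters.

```python
import re
import pandas as pd

# Made-up rows for illustration only.
toy = pd.DataFrame({
    "Categories": ["Grub", "Midtown West Bars", "East Village Bars", "Huge Venues"],
    "Description": ["Dessert shop", "wood-fired PIZZAS", None, "DJ and dancing"],
})

food_words = ["food", "pizza"]
# "food|pizza": matches if either keyword appears anywhere in the text.
pattern = "|".join(re.escape(word) for word in food_words)

by_category = toy["Categories"] == "Grub"
# case=False ignores capitalization; na=False treats missing descriptions as no match.
by_text = toy["Description"].str.contains(pattern, case=False, na=False)

is_food = (by_category | by_text).tolist()
print(is_food)  # [True, True, False, False]
```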


In [19]:
# removing a specific location because they don't offer liquor
food_bars = food_bars.drop(31)

In [20]:
irish_bars = bar_filt[bar_filt['Description'].str.contains('Irish|🇮🇪', case=False, na=False)]

In [21]:
irish_exclusion_list = ['Blarney Stone', 'Peter Dillons 36th', 
                        'Peter Dillons Pub 40th', "Slattery's",
                        'Westbury']
irish_bars = irish_bars[~irish_bars['Name'].isin(irish_exclusion_list)]

Okay, the Sleigh Ride venues have been finalized. Each ride has access to 6-7 venues. We will now get the name, coordinates, and address for the 4 hotels doing deals.

## Hotels

In [22]:
with open("hotels.json", "r") as file2:
    json_hotel = json.load(file2)

In [23]:
# Convert the nested JSON object (assuming the key is 'Hotels') into a pandas DataFrame
hotel_df = pd.DataFrame(json_hotel["Hotels"])

# Select only the desired columns
columns = ["name", 'short', "latitude", "longitude", "address", "description", "image"]
hotel_df = hotel_df[columns]

# Rename columns to match your desired naming convention
hotel_df.columns = [
    "Name", "Short", "Latitude", "Longitude", "Address",
    "Description", "Image"
]

# Preview the DataFrame
display(hotel_df.head())

Unnamed: 0,Name,Short,Latitude,Longitude,Address,Description,Image
0,Arthouse Hotel NYC,Arthouse,40.782253,-73.980271,"2178 Broadway, New York, NY, 10024","Arthouse Hotel New York City has brought hip, ...",https://image-tc.galaxy.tf/wijpeg-1nbmvmxtb567...
1,Hotel Indigo LES,Indigo,40.721923,-73.987749,"171 Ludlow St, New York, NY 10002",Whether you’re snapping photos of our spectacu...,https://digital.ihg.com/is/image/ihg/hotel-ind...
2,The High Line Hotel,Highline,40.746105,-74.004977,"180 10th Ave, New York, NY 10011",Nestled in the heart of Chelsea’s buzzing gall...,https://thehighlinehotel.com/wp-content/upload...
3,Virgin Hotels NYC,Virgin,40.746757,-73.988796,"1227 Broadway, New York, NY 10001",Discover the best bed in all five boroughs in ...,https://newyorkyimby.com/wp-content/uploads/20...


In [24]:
def coordinates(row):
    return (row['Longitude'], row['Latitude'])

In [25]:
def make_route_base(start, route_df, hotels):
    """
    Creates and returns one route per hotel (start, bars, then that hotel). It also
    replaces the latitude and longitude columns with a single coordinates column.
    """
    # Concatenate the start location, route DataFrame, and a single hotel row for each route
    art_route = pd.concat([start, route_df, hotels.iloc[[0]].copy()], ignore_index=True).reset_index(drop=True)
    ind_route = pd.concat([start, route_df, hotels.iloc[[1]].copy()], ignore_index=True).reset_index(drop=True)
    high_route = pd.concat([start, route_df, hotels.iloc[[2]].copy()], ignore_index=True).reset_index(drop=True)
    vir_route = pd.concat([start, route_df, hotels.iloc[[3]].copy()], ignore_index=True).reset_index(drop=True)
    
    # Replace latitude and longitude columns with a coordinate column
    for route in [art_route, ind_route, high_route, vir_route]:
        route['Coordinates'] = route.apply(coordinates, axis=1)
        
        # Drop the 'Latitude' and 'Longitude' columns
        route.drop(columns=['Latitude', 'Longitude'], inplace=True)
    
    return art_route, ind_route, high_route, vir_route
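
The function above hard-codes four hotel rows. As a hedged alternative, the same idea can be sketched as a loop that returns a dictionary keyed by hotel name, so adding or removing a hotel needs no code change. `build_routes` and the toy frames below are hypothetical and only illustrate the pattern.

```python
import pandas as pd

def build_routes(start, bars, hotels):
    """Return {hotel name: route DataFrame}, with the start first and the hotel last."""
    routes = {}
    for i in range(len(hotels)):
        hotel = hotels.iloc[[i]]  # double brackets keep a one-row DataFrame
        route = pd.concat([start, bars, hotel], ignore_index=True)
        # OpenRouteService expects (longitude, latitude) ordering.
        route["Coordinates"] = list(zip(route["Longitude"], route["Latitude"]))
        routes[hotel["Name"].iloc[0]] = route.drop(columns=["Latitude", "Longitude"])
    return routes

# Made-up locations for illustration only.
start = pd.DataFrame({"Name": ["Start"], "Latitude": [40.75], "Longitude": [-73.98]})
bars = pd.DataFrame({"Name": ["Bar A", "Bar B"],
                     "Latitude": [40.751, 40.752],
                     "Longitude": [-73.99, -73.991]})
hotels = pd.DataFrame({"Name": ["Hotel X", "Hotel Y"],
                       "Latitude": [40.78, 40.72],
                       "Longitude": [-73.98, -73.987]})

routes = build_routes(start, bars, hotels)
print(routes["Hotel X"]["Name"].tolist())  # ['Start', 'Bar A', 'Bar B', 'Hotel X']
```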

In [26]:
start_loc_cleaned = start_loc.drop(['Opens', 'Closes', 'Categories'], axis=1)
huge_bars_cleaned = huge_df.drop(['Opens', 'Closes', 'Categories'], axis=1)
midtown_bars_cleaned = midtown_df.drop(['Opens', 'Closes', 'Categories'], axis=1)
food_bars_cleaned = food_bars.drop(['Opens', 'Closes', 'Categories'], axis=1)
irish_bars_cleaned = irish_bars.drop(['Opens', 'Closes', 'Categories'], axis=1)
hotel_df_cleaned = hotel_df.drop(['Short'], axis=1)

In [27]:
# Huge venue routes
huge_arthouse, huge_indigo, huge_highline, huge_virgin = \
    make_route_base(start_loc_cleaned, huge_bars_cleaned, hotel_df_cleaned)

# Midtown venue routes
midtown_arthouse, midtown_indigo, midtown_highline, midtown_virgin = \
    make_route_base(start_loc_cleaned, midtown_bars_cleaned, hotel_df_cleaned)

# Food venue routes
food_arthouse, food_indigo, food_highline, food_virgin = \
    make_route_base(start_loc_cleaned, food_bars_cleaned, hotel_df_cleaned)

# Irish venue routes
irish_arthouse, irish_indigo, irish_highline, irish_virgin = \
    make_route_base(start_loc_cleaned, irish_bars_cleaned, hotel_df_cleaned)

In [28]:
display(irish_arthouse)
display(irish_arthouse.columns)

Unnamed: 0,Name,Address,Description,Image,Coordinates
0,*10AM START POINT THE CHRISTMAS SPECTACULAR**,"1415 Broadway, New York, NY 10018",10am Santa is Painting the town Red<br>We will...,https://santacon.nyc/wp-content/uploads/2023/1...,"(-73.9868991, 40.7543488)"
1,Jack Doyle's,"240 W 35th St, New York, NY 10001",DJ at this HUGE Irish Rockin' Haus 🎅🍀,https://santacon.nyc/wp-content/uploads/2023/1...,"(-73.9920823380098, 40.7522799496454)"
2,Printers Alley,"215 West 40th St, New York, NY 10018",4 Floor Sprawling Irish Pub 🎅🏠,https://santacon.nyc/wp-content/uploads/2023/1...,"(-73.9885881785405, 40.7555056845062)"
3,Playwright Irish Pub,"27 W 35th St, New York, NY 10001",Santa's Winter Warmer Bar 🎅🇮🇪,https://santacon.nyc/wp-content/uploads/2023/1...,"(-73.9854266119098, 40.7499486051269)"
4,Walters Bar,"389 8th Ave, New York, NY 10001","beers, darts & a pool table 🎅🇮🇪",https://santacon.nyc/wp-content/uploads/2023/1...,"(-73.9958497233574, 40.7494953724932)"
5,Celtic Rail,"137 W 33rd St, New York, NY 10120",Madison Square Garden's staple Irish Locals Ha...,https://santacon.nyc/wp-content/uploads/2023/1...,"(-73.989906, 40.750069)"
6,Jameson's,"920 2nd Ave, New York, NY 10017",Real NYC Bar hang for Santa 🎅🇮🇪,https://santacon.nyc/wp-content/uploads/2024/1...,"(-73.9689739476126, 40.7547953755226)"
7,McKenna's,"250 W 14th St, New York, NY 10011",Santa's Time-tested pub + bar bites 🎅🇮🇪,https://lh3.googleusercontent.com/p/AF1QipN3xQ...,"(-74.002150722183, 40.739637278931)"
8,Arthouse Hotel NYC,"2178 Broadway, New York, NY, 10024","Arthouse Hotel New York City has brought hip, ...",https://image-tc.galaxy.tf/wijpeg-1nbmvmxtb567...,"(-73.98027072698984, 40.78225322430293)"


Index(['Name', 'Address', 'Description', 'Image', 'Coordinates'], dtype='object')

## Algorithms for calculating route distances and routes

We now have every component we need for assessing optimal routes. Let's devise an algorithm to find optimal routes and compare it to a baseline random route. 

In [29]:
def get_route_coordinates(route_df):
    """
    Get the coordinates of the route, including the start, all waypoints, and the hotel (end).
    The start and hotel remain fixed, while intermediate waypoints will be rearranged.
    """
    
    # Extract the start location (first row), all waypoints (middle), and the hotel (last row)
    start_location = route_df.iloc[0]['Coordinates']
    hotel_location = route_df.iloc[-1]['Coordinates']
    waypoints = route_df.iloc[1:-1]['Coordinates'].tolist()

    # Combine start, waypoints, and hotel into a single list
    coordinates_list = [start_location] + waypoints + [hotel_location]
    return coordinates_list

In [30]:
def calculate_route_distance(route_df, route_order_list):
    """
    Calculate the total route distance using OpenRouteService for a given order of locations.
    
    Parameters:
    - route_df: The dataframe containing the locations (including start, waypoints, and hotel).
    - route_order_list: A list of indices that represents the order of locations in the route.
    
    Returns:
    - total_distance: The total distance of the route, in miles.
    """
    
    # Create a list of coordinates based on the ordered route
    coordinates_list = []
    for i in route_order_list:
        location = route_df.iloc[i]
        coordinates_list.append(location['Coordinates'])
    
    # Calculate the route distance using OpenRouteService's directions API
    total_distance = 0
    
    for i in range(len(coordinates_list) - 1):
        origin = coordinates_list[i]
        destination = coordinates_list[i + 1]
        
        # Request directions between consecutive locations
        routes = client.directions(
            coordinates=[origin, destination],
            profile='foot-walking',  # walking profile, since the goal is the shortest walking distance
            format='geojson'
        )
        
        # Extract the distance from the response (in meters)
        route_distance = routes['features'][0]['properties']['segments'][0]['distance'] / 1000  # Convert to kilometers
        
        route_distance_miles = route_distance * 0.621371  # Conversion factor from km to miles
        
        # Add this distance to the total distance
        total_distance += route_distance_miles
    
    return total_distance


In [31]:
def create_distance_matrix(route_df):
    """
    Calculate the distance matrix using Openrouteservice for the given list of 
    locations. Each location is a tuple of (longitude, latitude).
    """
    
    location_list = get_route_coordinates(route_df)
    
    matrix = client.distance_matrix(
        locations=location_list,
        profile='foot-walking',  # walking profile, since the goal is the shortest walking distance
        metrics=['distance'],
        sources=None,  # All points are sources
        destinations=None  # All points are destinations
    )
    
    # Extract the distance matrix (in meters)
    return matrix['distances']


In [32]:
def nearest_neighbor_tsp(distance_matrix):
    """
    Solve the Traveling Salesman Problem using the Nearest Neighbor heuristic,
    keeping the start and end locations fixed.
    """
    n = len(distance_matrix)  # Total number of locations
    visited = [False] * n  # To track visited locations
    route = [0]  # Start at the first location (start location)
    visited[0] = True  # Mark the start as visited
    total_distance = 0

    current_index = 0  # Start at the first location (start point)

    # Loop over all intermediate locations to find the nearest unvisited neighbor
    for _ in range(n - 2):  # Skip the start (0) and end (n-1)
        nearest_distance = float('inf')
        nearest_index = -1
        
        for i in range(1, n - 1):  # Only consider intermediate locations (1 to n-2)
            if not visited[i] and distance_matrix[current_index][i] < nearest_distance:
                nearest_distance = distance_matrix[current_index][i]
                nearest_index = i
        
        # Add the nearest location to the route
        route.append(nearest_index)
        visited[nearest_index] = True
        total_distance += nearest_distance
        current_index = nearest_index

    # Add the distance from the last visited location to the hotel (end point)
    total_distance += distance_matrix[current_index][n-1]

    return route, total_distance

In [33]:
def find_optimal_route_nn(route_df):
    """
    Finds the optimal route for the given route dataframe using TSP.
    The start and hotel locations are fixed, while the waypoints (bars) are rearranged.
    """

    # Step 1: Get the distance matrix (distance between all locations)
    distance_matrix = create_distance_matrix(route_df)

    # Step 2: Solve TSP using Nearest Neighbor heuristic
    optimal_route, total_distance = nearest_neighbor_tsp(distance_matrix)
    
    optimal_route = optimal_route + [len(route_df) - 1]  # Add the hotel at the end

    # Map the optimal route back to hotel/bar names (based on their index in the dataframe)
    route_names = [route_df.iloc[i]['Name'] for i in optimal_route] 

    return route_names, (total_distance/1000)*0.621371
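
Nearest Neighbor is a heuristic, so it can miss the shortest order. With only 6-7 bars per ride there are at most 7! = 5,040 orderings, which is small enough to check exhaustively. The sketch below is one possible exact baseline, not part of the notebook; `path_length` and `brute_force_route` are hypothetical helpers, and the toy matrix uses four points on a line rather than real venue distances.

```python
from itertools import permutations

def path_length(matrix, order):
    """Sum the leg distances along `order`, a list of location indices."""
    return sum(matrix[a][b] for a, b in zip(order, order[1:]))

def brute_force_route(matrix):
    """Try every ordering of the intermediate stops; index 0 and n-1 stay fixed."""
    n = len(matrix)
    candidates = ([0, *perm, n - 1] for perm in permutations(range(1, n - 1)))
    best = min(candidates, key=lambda order: path_length(matrix, order))
    return best, path_length(matrix, best)

# Toy example: start at x=0, bar 1 at x=1, bar 2 at x=-2, hotel at x=2.
xs = [0, 1, -2, 2]
matrix = [[abs(a - b) for b in xs] for a in xs]

# Nearest Neighbor would visit bar 1 first (it is closest), giving 0 -> 1 -> 2 -> 3.
greedy_length = path_length(matrix, [0, 1, 2, 3])
best_order, best_length = brute_force_route(matrix)

print(greedy_length)            # 8
print(best_order, best_length)  # [0, 2, 1, 3] 6
```

The toy case is built so that the greedy choice is wrong: heading to the far bar first saves two units overall.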

In [34]:
# Below was an attempt at making a greedy algorithm with nearest neighbors, but it did not
# prove to be any better in testing

# def nearest_neighbor_tsp2(distance_matrix, waypoint_indices):
#     """
#     Solves the TSP using the nearest neighbor heuristic.
#     Returns the optimal order of waypoints and the total distance.
#     """
#     # Start from the first waypoint in the list
#     current_index = waypoint_indices[0]
#     unvisited = set(waypoint_indices)
#     unvisited.remove(current_index)
    
#     route = [current_index]
#     total_distance = 0
    
#     # Visit each waypoint based on the nearest neighbor
#     while unvisited:
#         nearest_neighbor = min(unvisited, key=lambda x: distance_matrix[current_index][x])
#         route.append(nearest_neighbor)
#         total_distance += distance_matrix[current_index][nearest_neighbor]
#         current_index = nearest_neighbor
#         unvisited.remove(current_index)

#     return route, total_distance


In [35]:
# def find_optimal_route_reversed(route_df):
#     """
#     Finds the optimal route while keeping the start location fixed as the first stop.
#     Optimizes the route in reverse order (bars) ensuring the hotel is the last stop.
#     """
#     # Step 1: Create the distance matrix (distance between all locations)
#     distance_matrix = create_distance_matrix(route_df)

#     # Step 2: Define indices for start and hotel
#     start_index = 0  # Start location (fixed as first)
#     hotel_index = len(route_df) - 1  # Hotel location (fixed as last)

#     # Step 3: Extract waypoint indices (bars), which will be optimized
#     waypoint_indices = list(range(1, len(route_df) - 1))  # Exclude start (0) and hotel (last index)

#     # Step 4: Reverse the waypoints (bars) to optimize in reverse
#     reversed_waypoints = waypoint_indices[::-1]

#     # Step 5: Solve the TSP (nearest neighbor) for the reversed waypoints
#     optimal_waypoints, waypoints_distance = nearest_neighbor_tsp2(distance_matrix, reversed_waypoints)

#     # Step 6: Reconstruct the full route: Start -> Optimized Bars -> Hotel
#     optimal_waypoints.insert(0, start_index)  # Add start at the beginning
#     optimal_waypoints.append(hotel_index)  # Add hotel at the end
#     full_route = optimal_waypoints

#     # Step 7: Calculate the total distance for the full route
#     total_distance = (
#         sum(distance_matrix[full_route[i]][full_route[i + 1]] for i in range(len(full_route) - 1))
#     )

#     # Step 8: Map indices to names
#     route_names = [route_df.iloc[i]['Name'] for i in full_route]

#     return route_names, (total_distance/1000)*0.621371


In [36]:
# Initialize the client with an API key (placeholder shown; supply your own key)
client = openrouteservice.Client(key='YOUR OPENROUTESERVICE KEY NOT MINE')


In [37]:
display(huge_arthouse)

Unnamed: 0,Name,Address,Description,Image,Coordinates
0,*10AM START POINT THE CHRISTMAS SPECTACULAR**,"1415 Broadway, New York, NY 10018",10am Santa is Painting the town Red<br>We will...,https://santacon.nyc/wp-content/uploads/2023/1...,"(-73.9868991, 40.7543488)"
1,The Rutherford,"W 33rd St at 8th Ave, New York, NY 10119",Get here early for views from the Roof Deck / ...,https://santacon.nyc/wp-content/uploads/2023/1...,"(-73.9935521707214, 40.7513730510933)"
2,Avenida,"W. 34 St & 8th Ave, New York, NY 10001",The Holiday Hustle at this rooftop + mexican b...,https://santacon.nyc/wp-content/uploads/2023/1...,"(-73.993465, 40.752193)"
3,Solas,"232 E 9th St, New York, NY 10003",Jingle Bell Rockout at this lounge & dance clu...,https://santacon.nyc/wp-content/uploads/2023/1...,"(-73.988059, 40.729445)"
4,The Tailor,"505 8th Ave, New York, NY 10018",Mistletoe Mania at this HUGE - 1000+ Elves & S...,https://santacon.nyc/wp-content/uploads/2023/1...,"(-73.993027, 40.753109)"
5,Bar 13,"121 University Pl, New York, NY 10003","Frosty Fest with Drinking & techno, house & hi...",https://santacon.nyc/wp-content/uploads/2023/1...,"(-73.99218, 40.734531)"
6,Circo,"1604 Broadway York, NY 10019",BIGGEST Santacon venue! Get to this new tri-le...,https://santacon.nyc/wp-content/uploads/2024/1...,"(-73.98425, 40.76042)"
7,Arthouse Hotel NYC,"2178 Broadway, New York, NY, 10024","Arthouse Hotel New York City has brought hip, ...",https://image-tc.galaxy.tf/wijpeg-1nbmvmxtb567...,"(-73.98027072698984, 40.78225322430293)"


In [39]:
find_optimal_route_nn(huge_arthouse)

ApiError: 403 ({'error': 'Access to this API has been disallowed'})
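
Because the request above failed with a 403, one way to keep testing the routing logic offline is to approximate the distance matrix with great-circle (haversine) distances. This is only a rough stand-in and an assumption on our part: straight-line distance underestimates walking distance on a street grid, and the values below are in miles rather than the meters the API returns. `haversine_miles` and `haversine_matrix` are hypothetical helpers.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius

def haversine_miles(a, b):
    """Great-circle distance in miles between two (longitude, latitude) tuples."""
    lon1, lat1, lon2, lat2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(h))

def haversine_matrix(coords):
    """Square matrix of pairwise distances, in the same order as `coords`."""
    return [[haversine_miles(a, b) for b in coords] for a in coords]

# Start point and The Rutherford, using the coordinates shown in the tables above.
coords = [(-73.9868991, 40.7543488), (-73.9935521707214, 40.7513730510933)]
offline_matrix = haversine_matrix(coords)
print(round(offline_matrix[0][1], 2))  # roughly 0.4 miles
```

A matrix built this way can be passed straight to `nearest_neighbor_tsp`, which only compares distances; the meters-to-miles conversion in `find_optimal_route_nn` would need to be skipped.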

In [None]:
# Requires uncommenting find_optimal_route_reversed and nearest_neighbor_tsp2 above
find_optimal_route_reversed(huge_arthouse)

In [None]:
calculate_route_distance(huge_arthouse, [0, 1, 5, 3, 2, 4, 6, 7])

In [None]:
def get_optimized_route_mq(api_key, route_df):
    """
    Retrieve the optimized route from MapQuest API based on a DataFrame.

    Parameters:
    - api_key (str): Your MapQuest API key.
    - route_df (pd.DataFrame): DataFrame containing the addresses for the optimization.

    Returns:
    - dict: Optimized route information or None if the request fails.
    """
    # Extract the Address column as a list
    addresses = route_df['Address'].tolist()

    # MapQuest Optimized Route API endpoint
    base_url = "https://www.mapquestapi.com/directions/v2/optimizedroute"

    # Payload for the API request
    payload = {
        "key": api_key,
        "locations": addresses
    }

    # API request
    response = requests.post(base_url, json=payload)

    if response.status_code == 200:
        # Parse and display the optimized route
        optimized_route = response.json()
        print("Optimized Route Order:")
        for loc in optimized_route['route']['locations']:
            print(loc.get('street', 'No street information'))
            
        # Display total route distance
        total_distance = optimized_route['route'].get('distance', 0)
        print(f"\nTotal Route Distance: {total_distance} miles \n")
        
#         # Display distances between each leg
#         print("\nDistance Between Stops:")
#         for i, leg in enumerate(optimized_route['route']['legs'], 1):
#             print(f"Leg {i}: {leg.get('distance', 0)} miles")
            
        return optimized_route
    else:
        # Print error details if the request fails
        print(f"Error: {response.status_code}, {response.text}")
        return None


In [None]:
mq_key = 'YOUR MAPQUEST KEY NOT MINE'

In [None]:
huge_arthouse_mqopt = get_optimized_route_mq(mq_key, huge_arthouse)
huge_indigo_mqopt = get_optimized_route_mq(mq_key, huge_indigo)
huge_highline_mqopt = get_optimized_route_mq(mq_key, huge_highline)
huge_virgin_mqopt = get_optimized_route_mq(mq_key, huge_virgin)


In [None]:
midtown_arthouse_mqopt = get_optimized_route_mq(mq_key, midtown_arthouse)
midtown_indigo_mqopt = get_optimized_route_mq(mq_key, midtown_indigo)
midtown_highline_mqopt = get_optimized_route_mq(mq_key, midtown_highline)
midtown_virgin_mqopt = get_optimized_route_mq(mq_key, midtown_virgin)


In [None]:
food_arthouse_mqopt = get_optimized_route_mq(mq_key, food_arthouse)
food_indigo_mqopt = get_optimized_route_mq(mq_key, food_indigo)
food_highline_mqopt = get_optimized_route_mq(mq_key, food_highline)
food_virgin_mqopt = get_optimized_route_mq(mq_key, food_virgin)


In [None]:
irish_arthouse_mqopt = get_optimized_route_mq(mq_key, irish_arthouse)
irish_indigo_mqopt = get_optimized_route_mq(mq_key, irish_indigo)
irish_highline_mqopt = get_optimized_route_mq(mq_key, irish_highline)
irish_virgin_mqopt = get_optimized_route_mq(mq_key, irish_virgin)

In [None]:
def add_mq_route_to_summary(optimal_route_json, route_df, summary_df, route_name):
    """
    Add a route's optimized order and total distance to the appropriate row in the summary DataFrame.

    Parameters:
    - optimal_route_json (dict): JSON data from MapQuest optimized route API.
    - route_df (pd.DataFrame): Original DataFrame containing the route addresses.
    - summary_df (pd.DataFrame): DataFrame to store route orders and distances.
    - route_name (str): The name of the route to match in the summary DataFrame.

    Returns:
    - pd.DataFrame: Updated summary DataFrame with the new route details.
    """
    # Normalize the optimized addresses and a local copy of route_df['Address']
    # (a local Series avoids lowercasing the caller's DataFrame as a side effect)
    optimized_addresses = [loc['street'].strip().lower() for loc in optimal_route_json['route']['locations']]
    normalized_addresses = route_df['Address'].str.strip().str.lower()

    # Fallback row indices for addresses that MapQuest returns in a form that does not match route_df
    address_mismatching_dict = {
        '33 8th ave': 1,
        'w 34th st': 2,
        '22 w 32nd st, 7th floor': 5,
        '350 w 46th st': 2,
        '215 w 40th st': 2,
    }

    # Create the order list with partial matching
    order_list = []
    for addr in optimized_addresses:
        # Find the first row whose address contains addr as a literal substring (not a regex)
        matching_rows = route_df[normalized_addresses.str.contains(addr, na=False, regex=False)]
        if matching_rows.empty:
            if addr in address_mismatching_dict:
                order_list.append(address_mismatching_dict[addr])
            else:
                print(f"Address not found in route_df: {addr}")
                order_list.append(None)  # Handle missing addresses as needed
        else:
            order_list.append(matching_rows.index[0])

    # Extract the total distance from the JSON
    total_distance_miles = optimal_route_json['route'].get('distance', 0)

    # Update the appropriate row in the summary DataFrame
    summary_df.loc[summary_df['Route Name'] == route_name, "MQ's Route Order"] = [order_list]
    summary_df.loc[summary_df['Route Name'] == route_name, "MQ's Route Distance (Miles)"] = total_distance_miles
    
    return summary_df


In [None]:
# Initialize a summary DataFrame with route names
route_names_list = [
    "Huge Arthouse", "Huge Indigo", "Huge Highline", "Huge Virgin",
    "Midtown Arthouse", "Midtown Indigo", "Midtown Highline", "Midtown Virgin",
    "Food Arthouse", "Food Indigo", "Food Highline", "Food Virgin",
    "Irish Arthouse", "Irish Indigo", "Irish Highline", "Irish Virgin"
]

# Create the DataFrame
route_summary_df = pd.DataFrame({
    "Route Name": route_names_list,
    "MQ's Route Order": [None] * len(route_names_list),
    "MQ's Route Distance (Miles)": [None] * len(route_names_list),
    "NN's Route Order": [None] * len(route_names_list),
    "NN's Route Distance (Miles)": [None] * len(route_names_list),
    "Random Route Order": [None] * len(route_names_list),
    "Random Route Distance (Miles)": [None] * len(route_names_list)
})


In [None]:
# make a list of the dataframes of each route
route_df_list = [huge_arthouse, huge_indigo, huge_highline, huge_virgin,
                 midtown_arthouse, midtown_indigo, midtown_highline, midtown_virgin,
                 food_arthouse, food_indigo, food_highline, food_virgin,
                 irish_arthouse, irish_indigo, irish_highline, irish_virgin
                ]

In [None]:
# make a list of the returned mapquest optimized json files
mq_opt_json_list = [huge_arthouse_mqopt, huge_indigo_mqopt,
                    huge_highline_mqopt, huge_virgin_mqopt,
                    midtown_arthouse_mqopt,midtown_indigo_mqopt,
                    midtown_highline_mqopt, midtown_virgin_mqopt,
                    food_arthouse_mqopt,food_indigo_mqopt,
                    food_highline_mqopt, food_virgin_mqopt,
                    irish_arthouse_mqopt, irish_indigo_mqopt,
                    irish_highline_mqopt, irish_virgin_mqopt
                   ]              

In [None]:
for route in route_df_list:
    display(route)

In [None]:
# For loop to update the summary DataFrame
for route_name, optimal_route_json, route_df in zip(route_names_list, mq_opt_json_list, route_df_list):
    route_summary_df = add_mq_route_to_summary(optimal_route_json, route_df, route_summary_df, route_name)

# Display the updated summary DataFrame
display(route_summary_df)


In [None]:
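def find_optimal_route_nn(route_df):
    """
    Finds a short route for the given route dataframe using the nearest neighbor TSP heuristic.
    Because this is a heuristic, the result is not guaranteed to be the true optimum.
    The start and hotel locations are fixed, while the waypoints (bars) are rearranged.
    """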

    # Step 1: Get the distance matrix (distance between all locations)
    distance_matrix = create_distance_matrix(route_df)

    # Step 2: Solve TSP using Nearest Neighbor heuristic
    optimal_route, total_distance = nearest_neighbor_tsp(distance_matrix)
    
    optimal_route = optimal_route + [len(route_df) - 1]  # Add the hotel at the end

    # Map the optimal route back to hotel/bar names (based on their index in the dataframe)
    route_names = [route_df.iloc[i]['Name'] for i in optimal_route]

    # Convert the total distance from meters to miles (meters -> km -> miles)
    return route_names, (total_distance/1000)*0.621371

In [None]:
# Define empty placeholders for routes and distances
nn_opt_route = [None] * 16
nn_opt_dist = [None] * 16

# For loop to fill in the route and distance lists
for i, sleigh_ride in enumerate(route_df_list):
    nn_opt_route[i], nn_opt_dist[i] = find_optimal_route_nn(sleigh_ride)

In [None]:
def add_nn_to_summary(summary_df, route_df, route_name, nn_route_list, nn_dist):
    """
    Add a route's optimized order and distance from the nearest neighbor outputs 
    to the appropriate row in the summary DataFrame.

    Parameters:
    - summary_df (pd.DataFrame): DataFrame to store route orders and distances.
    - route_df (pd.DataFrame): Original DataFrame containing the route addresses.
    - route_name (str): The name of the route to match in the summary DataFrame.
    - nn_route_list (list of str): Names of locations in route order.
    - nn_dist (float): Total distance of the route in miles.

    Returns:
    - pd.DataFrame: Updated summary DataFrame with the new route details.
    """

    # Create the order list by exact name matching
    order_list = []
    for name in nn_route_list:
        # Find the first row whose Name equals name
        matching_rows = route_df[route_df['Name']==name]
        if matching_rows.empty:
            print(f"Name not found in route_df: {name}")
            order_list.append(None)  # Handle missing names as needed
        else:
            order_list.append(matching_rows.index[0])

    # Extract the total distance from the nearest neighbor output
    total_distance_miles = nn_dist

    # Update the appropriate row in the summary DataFrame
    summary_df.loc[summary_df['Route Name'] == route_name, "NN's Route Order"] = [order_list]
    summary_df.loc[summary_df['Route Name'] == route_name, "NN's Route Distance (Miles)"] = total_distance_miles
    
    return summary_df


In [None]:
display(route_df_list[0])

In [None]:
for route_df, route_name, nn_route, nn_dist  in zip(route_df_list, route_names_list, nn_opt_route, nn_opt_dist):
    route_summary_df = add_nn_to_summary(route_summary_df, route_df, route_name, nn_route, nn_dist)
display(route_summary_df)

In [None]:
def make_random_order_list(route_df, random_state_value):
    """
    Create a randomized route order with the first and last values fixed.

    Parameters:
    - route_df (pd.DataFrame): DataFrame that contains the route information (length is used for the list).
    - random_state_value (int): The value to set the random seed, ensuring reproducibility.

    Returns:
    - list: A list representing a randomized route order with fixed first and last values.
    """
    
    # Set the random seed for reproducibility
    np.random.seed(random_state_value)
    
    # Generate a list of indices from 1 to len(route_df) - 2 (for the middle values)
    middle_indices = list(range(1, len(route_df) - 1))
    
    # Shuffle the middle indices
    np.random.shuffle(middle_indices)
    
    # The randomized order starts with 0, followed by the shuffled middle indices, and ends with len(route_df) - 1
    randomized_order = [0] + middle_indices + [len(route_df) - 1]
        
    return randomized_order


In [None]:
# Define a function to generate a reproducible random seed based on the index or some property of the data
def get_reproducible_seed(index, route_df):
    # Create a seed from a hash of the index and a value from route_df (e.g., the first row or column)
    unique_string = f"{index}-{route_df.iloc[0, 0]}"  # Hash the combination of index and a route_df value
    return int(hashlib.sha256(unique_string.encode('utf-8')).hexdigest(), 16) % (10 ** 8)

In [None]:
random_order_list = [None] * 16

In [None]:
# Loop through route_df_list and generate a unique but reproducible seed
# (the loop variable is named route_df so it does not overwrite the bar DataFrame `df`)
for i, route_df in enumerate(route_df_list):
    random_seed = get_reproducible_seed(i, route_df)  # Generate a reproducible seed based on the index and a route_df value
    random_order_list[i] = make_random_order_list(route_df, random_seed)

In [None]:
address_lists = [None]*16
random_dists = [None]*16

for i, (route_df, random_order) in enumerate(zip(route_df_list, random_order_list)):
    address_lists[i] = [route_df['Address'][idx] for idx in random_order]

In [None]:
import requests

def get_mapquest_distance_between_points(from_address, to_address, api_key):
    """
    Get the distance between two points using MapQuest API.

    Parameters:
    - from_address (str): The starting address.
    - to_address (str): The destination address.
    - api_key (str): Your MapQuest API key.

    Returns:
    - float: The distance between the two addresses in miles, or None if there is an error.
    """
    url = "https://www.mapquestapi.com/directions/v2/route"
    
    # Set up the parameters for the request
    params = {
        'key': api_key,  # MapQuest API Key
        'from': from_address,  # Starting address
        'to': to_address,  # Ending address
        'outFormat': 'json'  # Get the response in JSON format
    }
    
    # Send the request and parse the response
    response = requests.get(url, params=params)
    
    # Check if the response was successful
    if response.status_code == 200:
        data = response.json()
        if 'route' in data and 'distance' in data['route']:
            # Return the route distance in miles
            return data['route']['distance']
        else:
            print("Error: No route data in the response.")
            return None
    else:
        print(f"Error fetching data from MapQuest API: {response.status_code}")
        return None


def calculate_total_distance(addresses, api_key):
    """
    Calculate the total distance between a list of addresses by summing up the distances 
    between each consecutive pair of addresses.

    Parameters:
    - addresses (list): List of addresses in order.
    - api_key (str): Your MapQuest API key.

    Returns:
    - float: The total driving distance in miles, or None if there is an error.
    """
    total_distance = 0

    # Iterate through the list of addresses and calculate the distance between each consecutive pair
    for i in range(len(addresses) - 1):
        from_address = addresses[i]
        to_address = addresses[i + 1]
        
        # Get the distance between each pair of consecutive points
        distance = get_mapquest_distance_between_points(from_address, to_address, api_key)
        
        if distance is None:
            print("Error calculating distance between {} and {}.".format(from_address, to_address))
            return None
        
        # Add the distance to the total distance
        total_distance += distance
    
    return total_distance


In [None]:
for i in range(len(address_lists)):
    random_dists[i] = calculate_total_distance(address_lists[i], mq_key)
    print(f'Total route distance: {random_dists[i]} miles')

In [None]:
def add_random_to_summary(summary_df, route_name, route_order, route_distance):
    """
    Add a random route's order and distance to the appropriate row in the summary DataFrame.

    Parameters:
    - summary_df (pd.DataFrame): DataFrame to store route orders and distances.
    - route_name (str): Name of the route to match in the summary DataFrame.
    - route_order (list of int): Order of the route, randomly generated.
    - route_distance (float or str): Total route distance in miles (can be a float or string).

    Returns:
    - pd.DataFrame: Updated summary DataFrame with the new route details.
    """

    # Update the appropriate row in the summary DataFrame
    summary_df.loc[summary_df['Route Name'] == route_name, "Random Route Order"] = [route_order]
    summary_df.loc[summary_df['Route Name'] == route_name, "Random Route Distance (Miles)"] = route_distance
    
    return summary_df


In [None]:
for route_name, random_order, random_dist  in zip(route_names_list, random_order_list, random_dists):
    route_summary_df = add_random_to_summary(route_summary_df, route_name, random_order, random_dist)
display(route_summary_df)

In [None]:
worse_df = route_summary_df.copy().drop(["MQ's Route Order", "NN's Route Order", 'Random Route Order'], axis=1)
huge_worse_df = worse_df[worse_df['Route Name'].str.contains('Huge')]
mid_worse_df = worse_df[worse_df['Route Name'].str.contains('Midtown')]
food_worse_df = worse_df[worse_df['Route Name'].str.contains('Food')]
irish_worse_df = worse_df[worse_df['Route Name'].str.contains('Irish')]

In [None]:
def calculate_distance_diff(df):
    """
    Calculate the difference in distance from the optimized route for Nearest Neighbors and Random routes.
    """
    # Work on a copy: the input is a filtered slice of worse_df, and assigning
    # new columns to a slice can trigger SettingWithCopyWarning
    df = df.copy()
    df['NN_Distance_Diff'] = df["NN's Route Distance (Miles)"] - df["MQ's Route Distance (Miles)"]
    df['Random_Distance_Diff'] = df["Random Route Distance (Miles)"] - df["MQ's Route Distance (Miles)"]
    return df


In [None]:
huge_worse_df = calculate_distance_diff(huge_worse_df)
mid_worse_df = calculate_distance_diff(mid_worse_df)
food_worse_df = calculate_distance_diff(food_worse_df)
irish_worse_df = calculate_distance_diff(irish_worse_df)

In [None]:
import plotly.express as px

# Function to create the bar graph
def create_bar_plot(df, title):
    """
    Create a bar plot comparing distance differences for Nearest Neighbors and Random routes.

    Parameters:
    - df (pd.DataFrame): DataFrame containing the route distances and their differences.
    - title (str): Title of the plot.
    """
    # Create a long format DataFrame for Plotly Express
    df_long = df.melt(id_vars=["Route Name"], 
                      value_vars=["NN_Distance_Diff", "Random_Distance_Diff"], 
                      var_name="Route Type", 
                      value_name="Distance Difference (Miles)")
    
    # Create the bar plot
    fig = px.bar(df_long, 
                 x='Route Name', 
                 y='Distance Difference (Miles)', 
                 color='Route Type',  # Color by the type of route (NN vs Random)
                 barmode='group',  # Group bars by route type
                 labels={'Route Name': 'Route', 'Distance Difference (Miles)': 'Distance Difference (Miles)', 'Route Type': 'Route Optimization'},
                 title=title,
                 hover_data=['Route Name', 'Distance Difference (Miles)'])
    
    # Update layout
    fig.update_layout(xaxis_title="Route Name", 
                      yaxis_title="Distance Difference (Miles)", 
                      xaxis_tickangle=-45)

    return fig

# Generate and show bar plots for each DataFrame
huge_plot = create_bar_plot(huge_worse_df, "How much farther compared to optimized route")
mid_plot = create_bar_plot(mid_worse_df, "How much farther compared to optimized route")
food_plot = create_bar_plot(food_worse_df, "How much farther compared to optimized route")
irish_plot = create_bar_plot(irish_worse_df, "How much farther compared to optimized route")

# Show the plots
huge_plot.show()
mid_plot.show()
food_plot.show()
irish_plot.show()


In [None]:
# Backup if something happens and we max out our api token.

# # Optimal huge routes
# order_art = [0, 1, 5, 3, 2, 4, 6, 7]
# order_ind = [0, 6, 4, 2, 1, 5, 3, 7]
# order_high = [0, 2, 3, 6, 4, 1, 5, 7]
# order_vir = [0, 2, 1, 6, 3, 4, 5, 7]

# # optimized ordered DataFrames
# huge_arthouse_opt = huge_arthouse.iloc[order_art]
# huge_indigo_opt = huge_indigo.iloc[order_ind]
# huge_highline_opt = huge_highline.iloc[order_high]
# huge_virgin_opt = huge_virgin.iloc[order_vir]

# # Optimal midtown routes
# order_art = [0, 5, 6, 3, 2, 4, 1, 7]
# order_ind = [0, 4, 1, 3, 2, 5, 6, 7]
# order_high = [0, 4, 1, 3, 2, 5, 6, 7]
# order_vir = [0, 4, 1, 3, 2, 5, 6, 7]

# # optimized ordered DataFrames
# midtown_arthouse_opt = midtown_arthouse.iloc[order_art].reset_index(drop=True)
# midtown_indigo_opt = midtown_indigo.iloc[order_ind].reset_index(drop=True)
# midtown_highline_opt = midtown_highline.iloc[order_high].reset_index(drop=True)
# midtown_virgin_opt = midtown_virgin.iloc[order_vir].reset_index(drop=True)

# # Optimal food routes
# order_art = [0, 3, 5, 1, 6, 4, 2, 7]
# order_ind = [0, 2, 4, 6, 3, 1, 5, 7]
# order_high = [0, 4, 6, 1, 5, 2, 3, 7]
# order_vir = [0, 1, 2, 6, 5, 3, 4, 7]

# # optimized ordered DataFrames
# food_arthouse_opt = food_arthouse.iloc[order_art].reset_index(drop=True)
# food_indigo_opt = food_indigo.iloc[order_ind].reset_index(drop=True)
# food_highline_opt = food_highline.iloc[order_high].reset_index(drop=True)
# food_virgin_opt = food_virgin.iloc[order_vir].reset_index(drop=True)

# # Optimal irish routes
# order_art = [0, 1, 3, 5, 7, 4, 2, 6, 8]
# order_ind = [0, 1, 3, 5, 7, 4, 2, 6, 8]
# order_high = [0, 1, 2, 6, 3, 5, 4, 7, 8]
# order_vir = [0, 1, 2, 6, 3, 5, 7, 4, 8]

# # optimized ordered DataFrames
# irish_arthouse_opt = irish_arthouse.iloc[order_art].reset_index(drop=True)
# irish_indigo_opt = irish_indigo.iloc[order_ind].reset_index(drop=True)
# irish_highline_opt = irish_highline.iloc[order_high].reset_index(drop=True)
# irish_virgin_opt = irish_virgin.iloc[order_vir].reset_index(drop=True)

## Review of Data and Algorithms

In review, the data came directly from a quality source. No major preprocessing was necessary, though a few edits were made. After choosing routes from a few themes, and making sure users would not arrive at a closed venue, we tried several ways of finding the optimal route. One constraint we set was that distances would be based on driving routes, not distances as the crow flies.

We started with nearest neighbors optimization, which reduces the number of calculations to a complexity of O(n^2): high, but not unmanageable. While this seemed to produce good results, the routes did not differ between hotels, because we could not find good logic that both kept the hotel as the final destination and factored that last distance into the optimization. We also tried reversing the route to see whether leaving the starting location out of consideration worked better. This actually gave worse results most of the time.
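To make the limitation above concrete, here is a minimal sketch of a nearest neighbor pass with a fixed start and a fixed hotel. It is not the notebook's `nearest_neighbor_tsp`; the function name `nearest_neighbor_order` and the tiny distance matrix are hypothetical, and the matrix values are made up purely for illustration.

```python
def nearest_neighbor_order(dist):
    """Greedy nearest neighbor order over a square distance matrix.

    Assumes index 0 is the start and the last index is the hotel;
    only the indices in between (the bars) are reordered.
    """
    hotel = len(dist) - 1
    unvisited = set(range(1, hotel))  # bars only
    order, total, current = [0], 0.0, 0
    while unvisited:
        # Pick the closest unvisited bar; ties are broken by the lower index
        nxt = min(unvisited, key=lambda j: (dist[current][j], j))
        total += dist[current][nxt]
        order.append(nxt)
        unvisited.remove(nxt)
        current = nxt
    # The hotel leg is only added at the end, so it never influences the bar order
    total += dist[current][hotel]
    order.append(hotel)
    return order, total


# Made-up distances: 0 = start, 1 and 2 = bars, 3 = hotel
toy_dist = [
    [0, 1, 2, 9],
    [1, 0, 1, 1],
    [2, 1, 0, 5],
    [9, 1, 5, 0],
]
nn_order, nn_total = nearest_neighbor_order(toy_dist)
print(nn_order, nn_total)
```

In this toy case the heuristic visits bar 1 first because it is closest to the start, then pays a long final leg from bar 2 to the hotel. Visiting bar 2 first would have been shorter overall, which is the kind of miss described above.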

We considered a greedy algorithm, which has another nice complexity, O(n^2 log2(n)), but because street distances keep it from being optimal, we decided not to attempt it.

From there I attempted a brute force approach. This, however, has a very large complexity, O(n!), and while attempting it I ran out of our API's free capacity. Because of this, we decided to use MapQuest's API, which generates an optimized route for our users.
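For reference, a brute force search with the start and hotel pinned can be sketched as below. This is an illustration under assumptions, not the code that was run: it presumes a distance matrix has already been fetched (so the search itself makes no API calls), and the name `brute_force_order` and the matrix values are hypothetical.

```python
from itertools import permutations


def brute_force_order(dist):
    """Try every bar ordering; index 0 (start) and the last index (hotel) stay fixed."""
    hotel = len(dist) - 1
    best_order, best_total = None, float("inf")
    for middle in permutations(range(1, hotel)):
        order = [0, *middle, hotel]
        # Sum the legs between consecutive stops
        total = sum(dist[a][b] for a, b in zip(order, order[1:]))
        if total < best_total:
            best_order, best_total = order, total
    return best_order, best_total


# Made-up distances: 0 = start, 1 and 2 = bars, 3 = hotel
toy_dist = [
    [0, 1, 2, 9],
    [1, 0, 1, 1],
    [2, 1, 0, 5],
    [9, 1, 5, 0],
]
bf_order, bf_total = brute_force_order(toy_dist)
print(bf_order, bf_total)
```

The loop body runs once per permutation of the bars, which is where the factorial growth comes from.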

We also generated a random route for each possible trip to see whether our optimizations were worth the work.

I compared the distances of the Nearest Neighbors and Random routes against the route MapQuest's API generated. In the end, we can see that when the distances between bars were short (the Irish and Midtown routes), the nearest neighbor algorithm often performed worse than random. Overall, though, nearest neighbor performed better than random 75% of the time, so notably better than random chance. Additionally, Nearest Neighbors outperformed the MapQuest distances 2 times (about 12.5% of the time). The differences were small, 0.3 and 0.5 miles, but they do show that the API uses algorithms that aren't guaranteed to produce the best outcomes.
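A win rate like the percentages above can be computed with a small helper such as the one sketched here. The function name `win_rate` is hypothetical, and the distances are made-up placeholders, not the notebook's results; in the notebook the inputs would be the distance columns of `route_summary_df` converted with `.tolist()`.

```python
def win_rate(candidate_miles, baseline_miles):
    """Share of routes where the candidate distance is strictly shorter than the baseline."""
    pairs = list(zip(candidate_miles, baseline_miles))
    wins = sum(1 for candidate, baseline in pairs if candidate < baseline)
    return wins / len(pairs)


# Placeholder distances in miles (illustrative only)
example_nn = [4.1, 5.0, 3.2, 7.4]
example_random = [5.3, 4.8, 3.9, 7.0]
example_rate = win_rate(example_nn, example_random)
print(f"{example_rate:.1%}")
```

With 16 routes, 2 wins works out to 2 / 16 = 12.5%, which is the MapQuest comparison quoted above.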

If I were to do this again, I would consider a variation of brute force. First I would calculate all the permutations of the bar stops only. With 6 and 7 bars, that gives 720 and 5040 orderings. I would then add the first stop and the last stop to each ordering and brute force from there. (Including the start and the hotel in the permutations would instead mean 40320 and 362880 orderings.) I would also consider fixing the shortest leg, either from the start to the first stop or from the penultimate stop to the hotel, which reduces the computations further (to 126 and 727). The logic behind this is that each hotel is in a different spot while the starting point is always the same, so minimizing this leg's distance has the potential to help us consistently, though not always. While reducing the complexity this way drastically increases the risk of not finding the global minimum, the computational savings are MASSIVE. There would even be capacity to randomly generate a large number of routes without the constraint to help ensure the quality of our result.
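The counts in this paragraph can be checked with a few lines of arithmetic. The helper `ordering_counts` is hypothetical. The "one bar pinned" figure rests on an assumption: it reads 126 and 727 as one distance comparison per bar to pick the pinned stop, plus every ordering of the remaining bars (6 + 5! and 7 + 6!).

```python
from math import factorial


def ordering_counts(n_bars):
    """Number of orderings to evaluate under each strategy for n_bars bars."""
    return {
        # Start and hotel fixed, only the bars permuted
        "bars_only": factorial(n_bars),
        # Start and hotel treated as free stops as well
        "all_stops": factorial(n_bars + 2),
        # Assumed reading: n_bars comparisons to pin one bar, then permute the rest
        "one_bar_pinned": n_bars + factorial(n_bars - 1),
    }


for n_bars in (6, 7):
    print(n_bars, ordering_counts(n_bars))
```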