# Getting Hotel Distances From Attractions

We want to retrieve the straight-line ("as the crow flies") distances, in miles, from each hotel listed in the Excel file "Manhattan_selected_to_STR" to each of the four attractions used by the data distributors. We'll then use those distances to match hotel IDs to the hotels themselves.
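The distances below come from geopy, which measures along an ellipsoidal model of the Earth. To show what an "as the crow flies" distance means numerically, here is a simpler spherical approximation, the haversine formula. This is not the method the notebook uses. The 3958.8-mile mean Earth radius and the sample coordinates are approximate values we chose for illustration. On city-scale distances, the result typically differs from the ellipsoidal one by well under one percent.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius; treating the Earth as a sphere is an approximation


def haversine_miles(coord_a, coord_b):
    """Great-circle distance in miles between two (latitude, longitude) pairs given in degrees."""
    lat1, lon1 = map(math.radians, coord_a)
    lat2, lon2 = map(math.radians, coord_b)
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    # haversine of the central angle between the two points
    h = math.sin(dlat / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(h))


# Approximate, hand-entered coordinates for the Empire State Building and the Met
esb = (40.7484, -73.9857)
met = (40.7794, -73.9632)
print(round(haversine_miles(esb, met), 2))
```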

In [4]:
# imports
import warnings

import numpy as np
import pandas
from geopy.geocoders import GoogleV3
from geopy.exc import GeocoderTimedOut
from geopy.distance import geodesic  # `vincenty` was removed in geopy 2.0; `geodesic` replaces it

# importing helper methods
from util import *

# for auto-reloading external modules
# see http://stackoverflow.com/questions/1907993/autoreload-of-modules-in-ipython
%load_ext autoreload
%autoreload 2

# suppress all warnings (not great practice, but it keeps the output readable)
warnings.filterwarnings("ignore")



## Loading Data

We load the Excel file containing the hotels and their details, drop the columns we don't need, and condense all of the address fields into a single line.

In [5]:
# loading data
hotels_info = pandas.read_excel('../data/Manhattan_selected_to_STR.xls')

# keep only the name and address columns; .copy() avoids modifying a view of the original frame
hotels_info = hotels_info[['Name', 'Address', 'City', 'ST', 'Zip']].copy()

# condense the address fields into one line; astype(str) guards against ZIP codes parsed as integers
hotels_info['Address'] = (hotels_info['Address'] + ' ' + hotels_info['City'] + ' '
                          + hotels_info['ST'] + ' ' + hotels_info['Zip'].astype(str))
hotels_info = hotels_info.drop(columns=['City', 'ST', 'Zip'])
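Condensing the address fields is plain string concatenation. One pitfall is that `read_excel` may parse ZIP codes as integers, and adding an integer column to a string column raises a `TypeError`. Below is a sketch on a toy frame that casts every part to `str` first. The rows are hypothetical stand-ins for the spreadsheet.

```python
import pandas as pd

# Hypothetical rows standing in for the hotel spreadsheet
df = pd.DataFrame({
    'Name': ['Example Hotel A', 'Example Hotel B'],
    'Address': ['1 Example St', '2 Sample Ave'],
    'City': ['New York', 'New York'],
    'ST': ['NY', 'NY'],
    'Zip': [10001, 10019],  # integer ZIPs, as read_excel may produce
})

# Cast every part to str so the integer ZIPs concatenate cleanly
parts = df[['Address', 'City', 'ST', 'Zip']].astype(str)
df['Address'] = parts['Address'] + ' ' + parts['City'] + ' ' + parts['ST'] + ' ' + parts['Zip']
df = df.drop(columns=['City', 'ST', 'Zip'])
print(df)
```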

## Calculating Distances From Attractions

The goal of the code that follows is to create a new spreadsheet with the columns (Name, Address, Broadway Theatre Distance, Metropolitan Museum of Art Distance, Empire State Building Distance, One World Trade Center Distance). Each distance column holds the hotel's distance in miles from that attraction.

In [6]:
import os

# specifying attraction addresses
attr1 = '1681 Broadway, New York, NY 10019'  # Broadway Theatre
attr2 = '1000 5th Ave, New York, NY 10028'   # The Metropolitan Museum of Art
attr3 = '350 5th Ave, New York, NY 10118'    # Empire State Building
attr4 = '285 Fulton St, New York, NY 10007'  # One World Trade Center

# creating geocoder object; the API key is read from an environment variable
# (GOOGLE_API_KEY is an assumed name) instead of being hard-coded in the notebook
geolocator = GoogleV3(api_key=os.environ['GOOGLE_API_KEY'], timeout=10)

# geocode each attraction and store its (latitude, longitude)
attr_coords = []
for attr in (attr1, attr2, attr3, attr4):
    loc = geolocator.geocode(attr)
    attr_coords.append((loc.latitude, loc.longitude))

# create zero-filled columns in 'hotels_info' for the distances (in miles) from each hotel to each attraction
for col in ['Broadway Theatre Distance', 'Metropolitan Museum of Art Distance',
            'Empire State Building Distance', 'One World Trade Center Distance']:
    hotels_info[col] = 0.0
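Each attraction gets its own "<name> Distance" column. The columns can therefore be generated from a single list of attraction names, which also keeps those names in one place for the distance loop. The sketch below uses a hypothetical helper, `add_distance_columns`, and a toy two-hotel frame standing in for `hotels_info`.

```python
import pandas as pd

ATTRACTION_NAMES = ['Broadway Theatre', 'Metropolitan Museum of Art',
                    'Empire State Building', 'One World Trade Center']


def add_distance_columns(df, names):
    """Return a copy of df with a float '<name> Distance' column per attraction, set to 0.0."""
    out = df.copy()  # leave the caller's frame untouched
    for name in names:
        out[f'{name} Distance'] = 0.0
    return out


# Toy stand-in for hotels_info (hypothetical rows)
toy = pd.DataFrame({'Name': ['Example Hotel A', 'Example Hotel B'],
                    'Address': ['1 Example St New York NY 10001', '2 Sample Ave New York NY 10019']})
with_cols = add_distance_columns(toy, ATTRACTION_NAMES)
print(with_cols.columns.tolist())
```

Because the names live in one list, a distance loop can `zip` the generated column names with the attraction coordinates instead of repeating four near-identical lines.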

In [8]:
# loop through the hotels, geocode each address, and calculate its distance in miles from each attraction
distance_cols = ['Broadway Theatre Distance', 'Metropolitan Museum of Art Distance',
                 'Empire State Building Distance', 'One World Trade Center Distance']
n_hotels = len(hotels_info)

for i, (row_label, address) in enumerate(hotels_info['Address'].items()):
    # print progress to console
    if i % 10 == 0:
        print('Progress:', i, '/', n_hotels)

    # getting location of hotel via Google geocoding; a timeout or unknown address yields None
    try:
        hotel_location = geolocator.geocode(address)
    except GeocoderTimedOut:
        hotel_location = None
    if hotel_location is None:
        print('Could not geocode:', address)
        hotels_info.loc[row_label, distance_cols] = np.nan  # mark as missing rather than 0 miles
        continue

    # getting coordinates of hotel
    hotel_coords = (hotel_location.latitude, hotel_location.longitude)

    # calculate the distance in miles from each attraction; .at avoids chained assignment
    for col, coords in zip(distance_cols, attr_coords):
        hotels_info.at[row_label, col] = geodesic(hotel_coords, coords).miles

print('Progress:', n_hotels, '/', n_hotels)

Progress: 0 / 178
Progress: 10 / 178
Progress: 20 / 178
Progress: 30 / 178
Progress: 40 / 178
Progress: 50 / 178
Progress: 60 / 178
Progress: 70 / 178
Progress: 80 / 178
Progress: 90 / 178
Progress: 100 / 178
Progress: 110 / 178
Progress: 120 / 178
Progress: 130 / 178
Progress: 140 / 178
Progress: 150 / 178
Progress: 160 / 178
Progress: 170 / 178
Progress: 178 / 178
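Geocoding 178 addresses against a web service can fail partway through, for instance on a timeout. A retry wrapper is one way to make the loop more robust. The sketch below is hypothetical: `geocode_with_retry` is not part of geopy. It defaults to Python's built-in `TimeoutError` so that it runs without network access, and it is exercised here with a fake geocoder. geopy signals timeouts with its own `GeocoderTimedOut`, so in this notebook we would call `geocode_with_retry(geolocator.geocode, address, retry_on=(GeocoderTimedOut,))`.

```python
import time


def geocode_with_retry(geocode, address, attempts=3, delay=1.0, retry_on=(TimeoutError,)):
    """Call geocode(address), retrying when it raises one of the `retry_on` exceptions.

    Returns whatever `geocode` returns (which may itself be None for an unknown address),
    or None if every one of the `attempts` calls raised.
    """
    for attempt in range(attempts):
        try:
            return geocode(address)
        except retry_on:
            if attempt < attempts - 1:
                time.sleep(delay)  # back off briefly before trying again
    return None


# Usage with a hypothetical stand-in geocoder that times out twice, then answers
_outcomes = iter([TimeoutError(), TimeoutError(), (40.75, -73.99)])


def fake_geocode(address):
    outcome = next(_outcomes)
    if isinstance(outcome, Exception):
        raise outcome
    return outcome


result = geocode_with_retry(fake_geocode, '1 Example St New York NY 10001', delay=0)
print(result)
```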


In [9]:
# save the (Name, Address, 4 x Distance) dataframe to an Excel file; the context manager closes the writer
with pandas.ExcelWriter('../data/Hotel_Names_Addresses_Distances.xlsx') as writer:
    hotels_info.to_excel(writer, sheet_name='Names, Addresses, Attractions')
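This notebook stops at saving the spreadsheet. The stated next step is to use the distances to match hotel IDs to hotels. Suppose the distributors supply, for each anonymous hotel ID, its distances to the same attractions. That is an assumption, since their data format isn't shown here. One simple approach would then be nearest-neighbour matching on the distance vectors. The sketch below is hypothetical: the function name `match_by_distance`, the IDs, and the toy distances are all made up.

```python
import numpy as np
import pandas as pd


def match_by_distance(known, anonymous, distance_cols):
    """For each row of `anonymous`, find the `known` row whose distance vector is closest
    (Euclidean norm) and report its index label plus the size of that gap in miles."""
    known_vals = known[distance_cols].to_numpy(dtype=float)     # shape (k, d)
    anon_vals = anonymous[distance_cols].to_numpy(dtype=float)  # shape (a, d)
    # pairwise differences via broadcasting: (a, 1, d) - (1, k, d) -> (a, k, d), norms -> (a, k)
    gaps = np.linalg.norm(anon_vals[:, None, :] - known_vals[None, :, :], axis=2)
    best = gaps.argmin(axis=1)
    return pd.DataFrame({
        'match': known.index.to_numpy()[best],
        'gap_miles': gaps[np.arange(len(anonymous)), best],
    }, index=anonymous.index)


cols = ['Broadway Theatre Distance', 'Empire State Building Distance']
known = pd.DataFrame({cols[0]: [1.0, 3.0], cols[1]: [2.0, 0.5]}, index=['Hotel A', 'Hotel B'])
anonymous = pd.DataFrame({cols[0]: [2.9, 1.1], cols[1]: [0.6, 2.0]}, index=['ID 7', 'ID 9'])
result = match_by_distance(known, anonymous, cols)
print(result)
```

This greedy version allows two IDs to map to the same hotel. Enforcing a one-to-one pairing would call for an assignment solver, such as `scipy.optimize.linear_sum_assignment`, run on the same `gaps` matrix. A large `gap_miles` is a hint that a match is unreliable.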