# Getting Hotel Distances From Attractions

We want to compute the straight-line ("as the crow flies") distance, in miles, from each hotel listed in the Excel file "Manhattan_selected_to_STR" to each of the four attractions used by the data distributors. We'll then use those distances to match hotel IDs to the hotels themselves.
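
As a rough illustration of what a straight-line distance in miles means here, the sketch below computes a great-circle distance with the haversine formula using only the standard library. This is not the method the notebook uses (the cells below rely on geopy's `vincenty`, which works on an ellipsoid, so results will differ slightly); the function name `haversine_miles` and the radius constant are assumptions made for this example.

```python
import math

# Mean Earth radius in miles (assumed constant for this sketch).
EARTH_RADIUS_MILES = 3958.8


def haversine_miles(coord_a, coord_b):
    """Great-circle distance in miles between two (latitude, longitude) pairs in degrees."""
    lat1, lon1 = map(math.radians, coord_a)
    lat2, lon2 = map(math.radians, coord_b)
    dlat, dlon = lat2 - lat1, lon2 - lon1
    # Haversine of the central angle between the two points.
    h = math.sin(dlat / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(h))
```

Each argument is a `(latitude, longitude)` tuple, the same shape as the coordinate pairs built later in this notebook.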

In [40]:
# imports...
import math
import numpy as np
import pandas
from geopy.geocoders import GoogleV3
from geopy.distance import vincenty

# importing helper methods
from util import *

# for auto-reloading external modules
# see http://stackoverflow.com/questions/1907993/autoreload-of-modules-in-ipython
%load_ext autoreload
%autoreload 2

The autoreload extension is already loaded. To reload it, use:
  %reload_ext autoreload


## Loading Data

We load the Excel file, which contains the hotels and information about them. We drop the columns we don't need and condense each hotel's address information into a single line.

In [41]:
# loading data
hotels_info = pandas.read_excel('Manhattan_selected_to_STR.xls')

# removing unnecessary columns and condensing into (Name, Address) only
hotels_info = hotels_info[['Name', 'Address', 'City', 'ST', 'Zip']]
hotels_info['Address'] = hotels_info['Address'] + ' ' + hotels_info['City'] + ' ' + hotels_info['ST'] + ' ' + hotels_info['Zip']
hotels_info = hotels_info.drop(['City', 'ST', 'Zip'], axis=1)
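
One thing to watch for when condensing the address: if the spreadsheet stores ZIP codes as numbers rather than text, concatenating them to strings with `+` raises a `TypeError`. The sketch below shows one way to guard against that by casting every address part to `str` first. The two rows are made-up placeholders, not data from the actual file.

```python
import pandas as pd

# Hypothetical rows standing in for the spreadsheet; Zip is numeric on purpose.
hotels = pd.DataFrame({
    "Name": ["Hotel A", "Hotel B"],
    "Address": ["1 Example St", "2 Sample Ave"],
    "City": ["New York", "New York"],
    "ST": ["NY", "NY"],
    "Zip": [10019, 10028],
})

parts = ["Address", "City", "ST", "Zip"]
# Cast each part to str, then join the parts of every row with single spaces.
hotels["Address"] = hotels[parts].astype(str).apply(" ".join, axis=1)
# Keep only the (Name, Address) columns.
hotels = hotels.drop(["City", "ST", "Zip"], axis=1)
```

After this runs, `hotels` has just the `Name` and `Address` columns, with each address on one line.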

## Calculating Distances From Attractions

The goal of the code that follows is to create a new spreadsheet containing (Name, Address, Distance 1, Distance 2, Distance 3, Distance 4) columns, where Distance _i_ is the distance in miles from the hotel to the _i_ th attraction.

In [42]:
# creating geocoder object
geolocator = GoogleV3()

# specifying attraction addresses
attr1 = '1681 Broadway, New York, NY 10019' # Broadway Theatre
attr2 = '1000 5th Ave, New York, NY 10028' # The Metropolitan Museum of Art
attr3 = '350 5th Ave, New York, NY 10118' # Empire State Building
attr4 = '285 Fulton St, New York, NY 10007' # One World Trade Center

# getting locations of attractions
loc1, loc2, loc3, loc4 = geolocator.geocode(attr1), geolocator.geocode(attr2), geolocator.geocode(attr3), geolocator.geocode(attr4)

# storing the latitude and longitude of each attraction
attr_coords = [ (loc1.latitude, loc1.longitude), (loc2.latitude, loc2.longitude), (loc3.latitude, loc3.longitude), (loc4.latitude, loc4.longitude) ]

# create columns in 'hotels_info' dataframe for distances (in miles) to attractions from hotels
sLength = len(hotels_info['Name'])
hotels_info['Broadway Theatre Distance'] = pandas.Series(np.random.randn(sLength), index=hotels_info.index)
hotels_info['Metropolitan Museum of Art Distance'] = pandas.Series(np.random.randn(sLength), index=hotels_info.index)
hotels_info['Empire State Building Distance'] = pandas.Series(np.random.randn(sLength), index=hotels_info.index)
hotels_info['One World Trade Center Distance'] = pandas.Series(np.random.randn(sLength), index=hotels_info.index)

# loop through the Addresses column, get coordinates, and calculate distance in miles from each attraction
for idx, address in enumerate(hotels_info['Address']):
    # getting location and then coordinates from it
    hotel_location = geolocator.geocode(address)
    hotel_coords = hotel_location.latitude, hotel_location.longitude
    
    # calculate distances in miles from each attraction and store it in its corresponding column
    # (vincenty returns a Distance object, so take .miles; .loc avoids chained assignment)
    hotels_info.loc[idx, 'Broadway Theatre Distance'] = vincenty(hotel_coords, attr_coords[0]).miles
    hotels_info.loc[idx, 'Metropolitan Museum of Art Distance'] = vincenty(hotel_coords, attr_coords[1]).miles
    hotels_info.loc[idx, 'Empire State Building Distance'] = vincenty(hotel_coords, attr_coords[2]).miles
    hotels_info.loc[idx, 'One World Trade Center Distance'] = vincenty(hotel_coords, attr_coords[3]).miles

# save out (Name, Address, Distances x 4) dataframe to Excel file
# (passing the path directly writes and closes the file; Excel limits sheet names to 31 characters)
hotels_info.to_excel('Hotel_Names_Addresses_Distances.xlsx', sheet_name='Attraction Distances')

GeocoderServiceError: <urlopen error [Errno -3] Temporary failure in name resolution>
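
The error above is a transient network failure, and because the loop issues one geocoding request per hotel, a single failure aborts the whole run. A minimal sketch of one way to soften this is a retry wrapper with exponential backoff. The helper name `geocode_with_retry` and its parameters are hypothetical, and it catches any `Exception` only to stay self-contained; in this notebook one would likely narrow that to `geopy.exc.GeocoderServiceError`.

```python
import time


def geocode_with_retry(geocode, address, attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call geocode(address), retrying with exponential backoff when it raises.

    `geocode` is any callable taking an address string. The result of the
    first successful call is returned; the last exception is re-raised once
    all attempts are used up.
    """
    for attempt in range(attempts):
        try:
            return geocode(address)
        except Exception:
            if attempt == attempts - 1:
                raise
            # Wait base_delay, then 2x, 4x, ... before the next attempt.
            sleep(base_delay * 2 ** attempt)
```

In the loop above, the lookup would become `geocode_with_retry(geolocator.geocode, address)`. Note that geopy's `geocode` returns `None` when it finds no match, so the result should still be checked before reading `.latitude` and `.longitude`.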