## Introduction
### Where to locate a new hotel in Toronto for young people?
The idea is to provide a tool that helps investors decide where to locate new hotels, based on a target group and the venues found in different neighbourhoods.

Considering a young person to be someone between 20 and 40 years old, the hotel would need to be close (no more than 500 meters away) to facilities of interest to that group.

Those venues would be:
- Fast food restaurants (rather than sit-down restaurants)
- Bars
- Night clubs
- Public transportation
- Live music venues

Also, the hotel should be as far from other hotels as possible.
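Both criteria depend on the distance between two points given as latitude and longitude. The notebook delegates this to the Foursquare `radius` parameter, but a minimal sketch of the check itself, using the haversine great-circle formula under a spherical-Earth assumption (mean radius 6,371 km), could look like this. The helper names `haversine_m` and `is_within` are hypothetical and not part of the original analysis.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres (spherical approximation)


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = math.radians(lat2 - lat1)
    d_lambda = math.radians(lon2 - lon1)
    a = math.sin(d_phi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def is_within(point_a, point_b, max_distance_m=500):
    """True if two (lat, lon) points are at most max_distance_m apart."""
    return haversine_m(*point_a, *point_b) <= max_distance_m


# Neighbourhood centers copied from the tables in this notebook
toronto_dominion_centre = (43.647177, -79.381576)
first_canadian_place = (43.648429, -79.38228)
parkwoods = (43.753259, -79.329656)

distance = haversine_m(*toronto_dominion_centre, *first_canadian_place)
print(round(distance))
print(is_within(toronto_dominion_centre, first_canadian_place))  # two nearby downtown centers
print(is_within(toronto_dominion_centre, parkwoods))             # more than 10 km apart
```

For the two downtown centers the distance comes out at roughly 150 m, well inside the 500 m limit, while Parkwoods is far outside it.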

## Data
### What data is going to serve this project?
The project uses location data for Toronto neighbourhoods, such as their centers, boundaries and geospatial coordinates. Foursquare data is also used to search for venues in each neighbourhood.

Data sources:
- Toronto postal codes: https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M
- `Geospatial_Coordinates.csv`: a file mapping each postal code to its latitude and longitude.
- Foursquare API (`venues/explore` endpoint): venues near each neighbourhood center.

## Methodology
### How will the problem be solved?

The process would cover the following steps:
- Assign a value to each neighbourhood based on its proximity to the desired places.
- Adjust that value with a coefficient that accounts for proximity to other hotels.
- Display the results on a map.
- Determine which neighbourhood best fits our needs.

## Let's find the coordinates for every neighbourhood center in Toronto

### Scrape the Wikipedia page with Canadian postal codes

In [1]:
# Import the needed libraries
import requests
from bs4 import BeautifulSoup
import pandas as pd

# Scrape the Wikipedia page listing Canadian postal codes that start with M (Toronto)
html = requests.get('https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M').text
soup = BeautifulSoup(html, 'lxml')

# After inspecting the HTML, locate the table and read its headers from the first row
table = soup.find('table', {'class':'wikitable sortable'})
headers = list(filter(None, table.tr.text.splitlines()))

print('Headers for table information from Wikipedia.')
print(headers)

Headers for table information from Wikipedia.
['Postal Code', 'Borough', 'Neighbourhood']


### Transform the table data into a CSV file for use in a DataFrame and in later steps

In [2]:
# Helper function that converts the HTML table into CSV-like text that Pandas can read
def parse_table(t):
    table = ""
    for tr in t.find_all('tr'):
        row = ""
        for td in tr.find_all('td'):
            row = row + "," + td.text.replace('\n', '')
        table = table + row[1:] + '\n'
    return table

parsed = parse_table(table)
print('Parsed html table transformed content (first 200 characters):')
print(parsed[:200])

Parsed html table transformed content (first 200 characters):

M1A,Not assigned,Not assigned
M2A,Not assigned,Not assigned
M3A,North York,Parkwoods
M4A,North York,Victoria Village
M5A,Downtown Toronto,Regent Park, Harbourfront
M6A,North York,Lawrence Manor, Lawr


In [3]:
# Write the parsed table to a CSV file and load it as a DataFrame
# (the with block guarantees the file is flushed and closed before it is read back)
with open("toronto.csv", "wb") as file:
    file.write(bytes(parsed, encoding="ascii", errors="ignore"))

df = pd.read_csv('toronto.csv', header=None, usecols=[0,1,2])
df.columns = headers

df.head(10)

|   | Postal Code | Borough | Neighbourhood |
|---|---|---|---|
| 0 | M1A | Not assigned | Not assigned |
| 1 | M2A | Not assigned | Not assigned |
| 2 | M3A | North York | Parkwoods |
| 3 | M4A | North York | Victoria Village |
| 4 | M5A | Downtown Toronto | Regent Park |
| 5 | M6A | North York | Lawrence Manor |
| 6 | M7A | Downtown Toronto | Queen's Park |
| 7 | M8A | Not assigned | Not assigned |
| 8 | M9A | Etobicoke | Islington Avenue |
| 9 | M1B | Scarborough | Malvern |
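Because `parse_table` joins cells with bare commas, a neighbourhood cell such as `Regent Park, Harbourfront` is split across two columns, and `usecols=[0,1,2]` then keeps only `Regent Park`, as the table above shows. One way to avoid this, sketched here under the assumption that the cell texts have already been extracted into lists of strings, is to let Python's `csv` module quote such cells. The helper name `rows_to_csv` is hypothetical.

```python
import csv
import io


def rows_to_csv(rows):
    """Serialise rows of cell strings as CSV text, quoting any cell that contains a comma."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator='\n')  # default QUOTE_MINIMAL quoting
    for row in rows:
        writer.writerow(row)
    return buffer.getvalue()


rows = [
    ['M3A', 'North York', 'Parkwoods'],
    ['M5A', 'Downtown Toronto', 'Regent Park, Harbourfront'],
]
text = rows_to_csv(rows)
print(text)

# Reading the text back keeps the full neighbourhood name in a single field
parsed_back = list(csv.reader(io.StringIO(text)))
print(parsed_back[1])
```

In the notebook, the rows could be built with `[[td.text.replace('\n', '') for td in tr.find_all('td')] for tr in table.find_all('tr')]`. Since `pd.read_csv` honours double quotes by default, the full neighbourhood names would then survive the round trip.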


### Clean up unusable data
The DataFrame above shows that some postal codes are not assigned to any borough; we treat those rows as useless and drop them. In addition, if a row has a borough but its neighbourhood is "Not assigned", the neighbourhood takes the same name as the borough.

In [4]:
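# Drop rows whose postal code has no borough assigned
toRemoveIndexes = df[df['Borough'] == 'Not assigned'].index
df.drop(toRemoveIndexes, inplace=True)

# Where only the neighbourhood is missing, reuse the borough name
df.loc[df['Neighbourhood'] == 'Not assigned', 'Neighbourhood'] = df['Borough']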

print('Clean DataFrame:')
df.head(10)

Clean DataFrame:


|    | Postal Code | Borough | Neighbourhood |
|---|---|---|---|
| 2 | M3A | North York | Parkwoods |
| 3 | M4A | North York | Victoria Village |
| 4 | M5A | Downtown Toronto | Regent Park |
| 5 | M6A | North York | Lawrence Manor |
| 6 | M7A | Downtown Toronto | Queen's Park |
| 8 | M9A | Etobicoke | Islington Avenue |
| 9 | M1B | Scarborough | Malvern |
| 11 | M3B | North York | Don Mills |
| 12 | M4B | East York | Parkview Hill |
| 13 | M5B | Downtown Toronto | Garden District |


### Load the latitude and longitude of each postal code
Then merge this information into the neighbourhood DataFrame.

In [5]:
df_lon_lat = pd.read_csv('Geospatial_Coordinates.csv')

print('Geospatial coordinates mapped to Postal Code DataFrame:')
df_lon_lat.head()

Geospatial coordinates mapped to Postal Code DataFrame:


|   | Postal Code | Latitude | Longitude |
|---|---|---|---|
| 0 | M1B | 43.806686 | -79.194353 |
| 1 | M1C | 43.784535 | -79.160497 |
| 2 | M1E | 43.763573 | -79.188711 |
| 3 | M1G | 43.770992 | -79.216917 |
| 4 | M1H | 43.773136 | -79.239476 |


In [6]:
toronto_df = pd.merge(df, df_lon_lat, on='Postal Code')

print('Merged DataFrame')
toronto_df.head()

Merged DataFrame


|   | Postal Code | Borough | Neighbourhood | Latitude | Longitude |
|---|---|---|---|---|---|
| 0 | M3A | North York | Parkwoods | 43.753259 | -79.329656 |
| 1 | M4A | North York | Victoria Village | 43.725882 | -79.315572 |
| 2 | M5A | Downtown Toronto | Regent Park | 43.65426 | -79.360636 |
| 3 | M6A | North York | Lawrence Manor | 43.718518 | -79.464763 |
| 4 | M7A | Downtown Toronto | Queen's Park | 43.662301 | -79.389494 |
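`pd.merge` performs an inner join by default, so any postal code missing from `Geospatial_Coordinates.csv` would be dropped without warning. A quick sanity check, sketched here on a few rows copied from the tables above, is a left join with `indicator=True`, which adds a `_merge` column recording whether each row found a match. The variable names are illustrative only.

```python
import pandas as pd

# A few rows copied from the neighbourhood and coordinates tables above
neighbourhoods = pd.DataFrame({
    'Postal Code': ['M3A', 'M4A', 'M5A'],
    'Borough': ['North York', 'North York', 'Downtown Toronto'],
    'Neighbourhood': ['Parkwoods', 'Victoria Village', 'Regent Park'],
})
coords = pd.DataFrame({
    'Postal Code': ['M3A', 'M4A', 'M5A'],
    'Latitude': [43.753259, 43.725882, 43.65426],
    'Longitude': [-79.329656, -79.315572, -79.360636],
})

# how='left' keeps every neighbourhood; indicator=True adds a '_merge' column
checked = pd.merge(neighbourhoods, coords, on='Postal Code', how='left', indicator=True)

# Rows that found no coordinates are marked 'left_only'
missing = checked[checked['_merge'] != 'both']
print(len(missing))
```

If `missing` is empty, the inner join used in the notebook loses no neighbourhoods; otherwise its rows list the postal codes that lack coordinates.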


## Let's assign a value to each neighbourhood based on proximity to desired places

For this purpose, we count how many venues of each listed type lie within a 1000 m radius of the neighbourhood center. Every venue found adds one to a total that represents the neighbourhood's value under our particular criteria.
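The counting rule can be written as a formula. Using notation introduced here only for illustration, let $N_{n,c}$ be the number of venues of category $c$ returned within $r = 1000$ m of the center of neighbourhood $n$; the neighbourhood value $V_n$ and the total score $S_n$ computed later in the notebook are then:

$$
V_n = \sum_{c \in C} N_{n,c}, \qquad C = \{\text{fast food},\ \text{bar},\ \text{night club},\ \text{live music},\ \text{transport}\}
$$

$$
S_n = V_n - N_{n,\text{hotel}}
$$

For Toronto Dominion Centre, for instance, $V_n = 100 + 100 + 27 + 18 + 11 = 256$ and $S_n = 256 - 58 = 198$, which matches its total score in the results.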

### Build a function that counts venues matching our criteria

In [7]:
# Define the categories using Foursquare codes
cat_fast_food = '4bf58dd8d48988d16e941735'
cat_bar = '4bf58dd8d48988d116941735'
cat_disco = '4bf58dd8d48988d11f941735'
cat_live_music = '4bf58dd8d48988d1e5931735'
cat_public_transport = '52f2ab2ebcbc57f1066b8b4f'
cat_hotels = '4bf58dd8d48988d1fa931735'
cats = [cat_fast_food, cat_bar, cat_disco, cat_live_music, cat_public_transport, cat_hotels]

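# Foursquare credentials, read from environment variables rather than hard-coded in the notebook
# (the variable names below are arbitrary; set them before running)
import os
client_id = os.environ.get('FOURSQUARE_CLIENT_ID')
client_secret = os.environ.get('FOURSQUARE_CLIENT_SECRET')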

# A function that uses the Foursquare API to count venues of a category within a radius of a location
def get_venues_count(lat, lon, category, client_id, client_secret, radius=1000, limit=100):
    version = '20180724'
    url = 'https://api.foursquare.com/v2/venues/explore'
    params = {
        'client_id': client_id,
        'client_secret': client_secret,
        'v': version,
        'll': '{},{}'.format(lat, lon),
        'categoryId': category,
        'radius': radius,
        'limit': limit,
    }
    try:
        results = requests.get(url, params=params).json()['response']['groups'][0]['items']
    except (requests.RequestException, ValueError, KeyError, IndexError):
        # Network error, invalid JSON or an unexpected response shape: count no venues
        results = []
    return len(results)

example_venues_count = get_venues_count(toronto_df.loc[0, 'Latitude'], toronto_df.loc[0, 'Longitude'], cat_fast_food, client_id, client_secret)

print('As an example, let\'s print how many fast food restaurants can be found within 1000 m of the Parkwoods center:')
example_venues_count

As an example, let's print how many fast food restaurants can be found within 1000 m of the Parkwoods center:


4

### Let's extend the Toronto DataFrame with the number of venues in each category

In [8]:
for cat in cats:
    toronto_df[cat] = toronto_df.apply(lambda row: get_venues_count(row.Latitude, row.Longitude, cat, client_id, client_secret), axis = 1)
    
toronto_df.head()

|   | Postal Code | Borough | Neighbourhood | Latitude | Longitude | 4bf58dd8d48988d16e941735 | 4bf58dd8d48988d116941735 | 4bf58dd8d48988d11f941735 | 4bf58dd8d48988d1e5931735 | 52f2ab2ebcbc57f1066b8b4f | 4bf58dd8d48988d1fa931735 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | M3A | North York | Parkwoods | 43.753259 | -79.329656 | 4 | 0 | 0 | 0 | 5 | 0 |
| 1 | M4A | North York | Victoria Village | 43.725882 | -79.315572 | 0 | 1 | 0 | 1 | 5 | 1 |
| 2 | M5A | Downtown Toronto | Regent Park | 43.65426 | -79.360636 | 10 | 27 | 0 | 2 | 4 | 4 |
| 3 | M6A | North York | Lawrence Manor | 43.718518 | -79.464763 | 7 | 3 | 0 | 0 | 7 | 0 |
| 4 | M7A | Downtown Toronto | Queen's Park | 43.662301 | -79.389494 | 76 | 94 | 7 | 6 | 7 | 25 |


In [9]:
# Rename columns for readability
toronto_df.rename(columns = {
    '4bf58dd8d48988d16e941735':'Fast food',
    '4bf58dd8d48988d116941735':'Bar',
    '4bf58dd8d48988d11f941735':'Night club',
    '4bf58dd8d48988d1e5931735':'Live music',
    '52f2ab2ebcbc57f1066b8b4f':'Transport',
    '4bf58dd8d48988d1fa931735':'Hotel'}, inplace = True)

toronto_df.head(10)

|   | Postal Code | Borough | Neighbourhood | Latitude | Longitude | Fast food | Bar | Night club | Live music | Transport | Hotel |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | M3A | North York | Parkwoods | 43.753259 | -79.329656 | 4 | 0 | 0 | 0 | 5 | 0 |
| 1 | M4A | North York | Victoria Village | 43.725882 | -79.315572 | 0 | 1 | 0 | 1 | 5 | 1 |
| 2 | M5A | Downtown Toronto | Regent Park | 43.65426 | -79.360636 | 10 | 27 | 0 | 2 | 4 | 4 |
| 3 | M6A | North York | Lawrence Manor | 43.718518 | -79.464763 | 7 | 3 | 0 | 0 | 7 | 0 |
| 4 | M7A | Downtown Toronto | Queen's Park | 43.662301 | -79.389494 | 76 | 94 | 7 | 6 | 7 | 25 |
| 5 | M9A | Etobicoke | Islington Avenue | 43.667856 | -79.532242 | 1 | 0 | 0 | 1 | 1 | 0 |
| 6 | M1B | Scarborough | Malvern | 43.806686 | -79.194353 | 3 | 0 | 0 | 0 | 0 | 0 |
| 7 | M3B | North York | Don Mills | 43.745906 | -79.352188 | 3 | 2 | 0 | 0 | 0 | 0 |
| 8 | M4B | East York | Parkview Hill | 43.706397 | -79.309937 | 3 | 4 | 0 | 1 | 2 | 0 |
| 9 | M5B | Downtown Toronto | Garden District | 43.657162 | -79.378937 | 100 | 100 | 8 | 8 | 6 | 47 |


### Add a total score number to each neighbourhood
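This is calculated as the total number of desired venues minus the number of hotels in the area; other approaches could be considered. Note that each count is capped by the `limit=100` used in `get_venues_count`, so busy downtown neighbourhoods saturate at 100 fast food restaurants and 100 bars.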

In [10]:
# Sum the desired venue counts and subtract the number of nearby hotels
venue_columns = ['Fast food', 'Bar', 'Night club', 'Live music', 'Transport']
toronto_df['Total score'] = toronto_df[venue_columns].sum(axis=1) - toronto_df['Hotel']
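The methodology mentions a coefficient for nearby hotels, while the score above simply subtracts the hotel count. As a hypothetical variant (the coefficient's form is an assumption, not part of the original analysis), the venue total could be multiplied by `1 / (1 + hotels)`, so the score shrinks as competing hotels increase. The sketch below compares both scores on three rows copied from the venues table.

```python
import pandas as pd

# Counts copied from three rows of the venues table above
df = pd.DataFrame({
    'Neighbourhood': ['Parkwoods', 'Regent Park', "Queen's Park"],
    'Fast food': [4, 10, 76],
    'Bar': [0, 27, 94],
    'Night club': [0, 0, 7],
    'Live music': [0, 2, 6],
    'Transport': [5, 4, 7],
    'Hotel': [0, 4, 25],
})

VENUE_COLUMNS = ['Fast food', 'Bar', 'Night club', 'Live music', 'Transport']
venues = df[VENUE_COLUMNS].sum(axis=1)

# Score used in the notebook: desired venues minus hotels
df['Total score'] = venues - df['Hotel']

# Hypothetical alternative: scale the venue total by a coefficient that falls as hotels increase
df['Coefficient'] = 1 / (1 + df['Hotel'])
df['Weighted score'] = venues * df['Coefficient']

print(df[['Neighbourhood', 'Total score', 'Weighted score']])
```

On these rows this particular coefficient penalises hotels so strongly that the ranking reverses (Parkwoods 9.0, Regent Park 8.6, Queen's Park about 7.3), which shows how much the final choice depends on how the hotel penalty is defined.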

## Results
### Top 5 neighbourhoods on a map
These are the top 5 neighbourhoods in which to locate a new hotel for young people in Toronto.

In [11]:
# Install the needed libraries
!conda install -c conda-forge geopy --yes
from geopy.geocoders import Nominatim

!conda install -c conda-forge folium=0.5.0 --yes
import folium

print('Libraries imported.')

Collecting package metadata (current_repodata.json): done
Solving environment: done

# All requested packages already installed.

Collecting package metadata (current_repodata.json): done
Solving environment: done

# All requested packages already installed.

Libraries imported.


In [12]:
# Geolocate Toronto
address = 'Toronto, ON'

geolocator = Nominatim(user_agent="toronto_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Toronto are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Toronto are 43.6534817, -79.3839347.


In [16]:
# Get the top 5 neighbourhoods
toronto_df.sort_values(by=['Total score'], ascending=False, inplace=True)
top_5 = toronto_df.head()
top_5

|    | Postal Code | Borough | Neighbourhood | Latitude | Longitude | Fast food | Bar | Night club | Live music | Transport | Hotel | Total score |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 42 | M5K | Downtown Toronto | Toronto Dominion Centre | 43.647177 | -79.381576 | 100 | 100 | 27 | 18 | 11 | 58 | 198 |
| 97 | M5X | Downtown Toronto | First Canadian Place | 43.648429 | -79.38228 | 100 | 100 | 29 | 17 | 11 | 60 | 197 |
| 30 | M5H | Downtown Toronto | Richmond | 43.650571 | -79.384568 | 100 | 100 | 29 | 20 | 12 | 66 | 195 |
| 48 | M5L | Downtown Toronto | Commerce Court | 43.648198 | -79.379817 | 100 | 100 | 20 | 16 | 11 | 53 | 194 |
| 84 | M5T | Downtown Toronto | Kensington Market | 43.653206 | -79.400049 | 37 | 100 | 28 | 24 | 10 | 18 | 181 |
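`sort_values(..., inplace=True)` reorders `toronto_df` itself. If the original order should be preserved, `DataFrame.nlargest` returns the top rows as a new DataFrame instead. A small sketch, using total scores taken from the tables above (Parkwoods' 9 is computed from its row as 4 + 5), with illustrative variable names:

```python
import pandas as pd

# Total scores taken from the tables above
scores = pd.DataFrame({
    'Neighbourhood': ['Parkwoods', 'Toronto Dominion Centre', 'First Canadian Place',
                      'Richmond', 'Commerce Court', 'Kensington Market'],
    'Total score': [9, 198, 197, 195, 194, 181],
})

# nlargest returns a new DataFrame with the n highest scores, leaving `scores` untouched
top_3 = scores.nlargest(3, 'Total score')
print(top_3['Neighbourhood'].tolist())
print(scores.iloc[0]['Neighbourhood'])  # original row order is preserved
```

Applied to the notebook, the equivalent would be `top_5 = toronto_df.nlargest(5, 'Total score')`.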


In [18]:
toronto_map = folium.Map(location=[latitude, longitude], zoom_start=14)

# add markers to map
for lat, lng, neighborhood, total in zip(top_5['Latitude'], top_5['Longitude'], top_5['Neighbourhood'], top_5['Total score']):
    label = '{}: {}'.format(neighborhood, total)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7).add_to(toronto_map)
    
toronto_map

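## Conclusion
Under this scoring, Toronto Dominion Centre (total score 198) comes out on top, closely followed by First Canadian Place, Richmond and Commerce Court, all in Downtown Toronto. More generally, the approach above works as a tool for describing neighbourhoods in terms of desired and undesired venues, and it could be extended to any city, with criteria that vary depending on the investor's goals.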