## Capstone Project - The Battle of Neighborhoods

### Project Part 1

#### 1-A. Problem Description

A relatively new company that recently opened a successful store in New York now wants to open a second store in a different market. The store offers healthy (non-alcoholic) drinks and light food to both locals and out-of-town visitors. Its products attract people of all ethnicities, ages, and educational backgrounds. The common trait is that its customers are health-conscious or want to try healthy food and drinks.

The store uses a small format with a few menu choices, similar to Jamba Juice. One ingredient of its success is that the store needs to be in an area with sufficient traffic. The company therefore wants to open the second store in a similar market, and it has decided on Toronto, Canada.

The first step is to decide which neighborhood to open in. The company will then narrow down the exact location based on rental space availability and similar real-estate considerations.

Because the product attracts people across demographics, there is no fixed set of rules for picking a neighborhood. The company elected to consult a data science service company to help identify a few neighborhoods similar to the current one. It will then choose a location among those neighborhoods.

The data science consultancy proposed the following approach. It will use K-means to group New York's neighborhoods and identify the cluster that the current store's neighborhood belongs to. It will then find neighborhoods of the same kind in Toronto, Canada by scoring Toronto neighborhoods with the K-means model developed on the New York neighborhoods.
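
The fit-on-one-city, score-on-another idea can be sketched on toy data. This is only an illustration under stated assumptions: the frequency values are made up, two clusters are used for brevity, and row 0 of the New York array merely stands in for "Lincoln Square".

```python
import numpy as np
from sklearn.cluster import KMeans

# Toy venue-frequency profiles: rows are neighborhoods, columns are venue categories.
# All numbers are invented for illustration.
ny_profiles = np.array([
    [0.8, 0.1, 0.1],
    [0.7, 0.2, 0.1],
    [0.1, 0.8, 0.1],
    [0.2, 0.7, 0.1],
])
toronto_profiles = np.array([
    [0.75, 0.15, 0.10],
    [0.15, 0.75, 0.10],
])

# Fit the model on New York only.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(ny_profiles)

# Cluster label of the reference neighborhood (row 0 is the stand-in).
reference_label = kmeans.labels_[0]

# Score Toronto with the New York model; columns must be in the same order.
toronto_labels = kmeans.predict(toronto_profiles)
similar_rows = np.where(toronto_labels == reference_label)[0]
print(similar_rows)
```

Because `predict` assigns each row to the nearest New York centroid, the Toronto feature columns have to match the New York columns exactly.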

#### 1-B. Data and Methodology 

<b> Data </b>
<ol>
  <li>The neighborhood of the current store in New York.</li>
    <p>Based on the company store address, the neighborhood has been determined to be "Lincoln Square".</p>
    <p>  </p>
   
  <li>New York geodata.</li>
    <p>This is the data with borough and neighborhood names.</p>
    <p>  </p>    
    
  <li>New York Foursquare data.</li>
    <p>This contains the most common venues of each neighborhood in New York.</p>
    <p>  </p>    
    
  <li>Toronto Wikipedia page.</li>
    <p>This has postal codes with their borough and neighborhood names for Toronto.</p>
    <p>  </p>    
    
  <li>Toronto Foursquare data.</li>
    <p>This contains the most common venues of each neighborhood in Toronto.</p>
</ol>

<b> Methodology </b>
<ol>
  <li>New York geo data is merged with Foursquare New York data, so each neighborhood has its venues listed.</li>
    <p>  </p>    
   
  <li>Based on the top 10 venues in each neighborhood, the neighborhoods are clustered into a few groups.</li>
    <p>  </p>    
    
  <li>Find the group number that "Lincoln Square" is in.</li>
    <p>  </p>    
    
  <li>Toronto postal data is processed into neighborhood data like the New York geo data.</li>
    <p>  </p>    
    
  <li>Toronto geo data is merged with Toronto Foursquare data and further processed to obtain the top 10 venues in each neighborhood.</li>
    <p>  </p>    
    
  <li>Apply the K-means model from New York to the Toronto neighborhood venue data to find neighborhoods that have the same group number as "Lincoln Square".</li>

</ol>
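
The last methodology step only works if the Toronto features line up with the New York features column for column. A minimal sketch of one way to do that with `DataFrame.reindex` follows; the tiny frames and their values are invented for illustration, not taken from the project data.

```python
import pandas as pd

# Toy per-neighborhood category frequencies (values are made up).
ny = pd.DataFrame({"Neighbourhood": ["A"], "Coffee Shop": [0.5], "Park": [0.5]})
tor = pd.DataFrame({"Neighbourhood": ["X"], "Park": [0.4], "Hockey Arena": [0.6]})

# Feature columns the New York model would have been trained on.
feature_cols = [c for c in ny.columns if c != "Neighbourhood"]

# Keep only the New York columns, in New York order.
# Categories missing in Toronto are filled with 0; Toronto-only categories are dropped.
tor_aligned = tor.reindex(columns=["Neighbourhood"] + feature_cols, fill_value=0)
print(tor_aligned)
```

Dropping Toronto-only categories loses some information, so this is a design trade-off rather than the only option.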

## Project Part 2: Data Preparation, Modeling and Results

#### 2-A. Data preparation

<b>Import libs for data processing</b>

In [1]:
import numpy as np # library to handle data in a vectorized manner

import pandas as pd # library for data analysis
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

import json # library to handle JSON files

import requests # library to handle requests
from pandas import json_normalize # transform JSON into a pandas dataframe (pandas.io.json.json_normalize is deprecated)

<b>Install geopy and folium</b>

In [2]:
# !conda install -c conda-forge geopy --yes # uncomment this line if you haven't completed the Foursquare API lab
# from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

# !conda install -c conda-forge folium=0.5.0 --yes # uncomment this line if you haven't completed the Foursquare API lab
# import folium # map rendering library


In [3]:
!pip install folium
import folium    # for map

Collecting folium
  Downloading folium-0.11.0-py2.py3-none-any.whl (93 kB)
Collecting branca>=0.3.0
  Downloading branca-0.4.1-py3-none-any.whl (24 kB)
Installing collected packages: branca, folium
Successfully installed branca-0.4.1 folium-0.11.0


In [4]:
!pip install geopy   # for coordinates
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values



<b>Download and Explore Datasets</b>

<b>New York neighborhood geo data</b>

In [5]:
!wget -q -O 'newyork_data.json' https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBMDeveloperSkillsNetwork-DS0701EN-SkillsNetwork/labs/newyork_data.json
print('Data downloaded!')

Data downloaded!


Read data and extract neighborhood related columns

In [6]:
with open('newyork_data.json') as json_data:
    newyork_data = json.load(json_data)

neighborhoods_data = newyork_data['features']
print(len(neighborhoods_data))
neighborhoods_data[0]

306


{'type': 'Feature',
 'id': 'nyu_2451_34572.1',
 'geometry': {'type': 'Point',
  'coordinates': [-73.84720052054902, 40.89470517661]},
 'geometry_name': 'geom',
 'properties': {'name': 'Wakefield',
  'stacked': 1,
  'annoline1': 'Wakefield',
  'annoline2': None,
  'annoline3': None,
  'annoangle': 0.0,
  'borough': 'Bronx',
  'bbox': [-73.84720052054902,
   40.89470517661,
   -73.84720052054902,
   40.89470517661]}}

In [7]:
# define the dataframe columns
column_names = ['Borough', 'Neighborhood', 'Latitude', 'Longitude'] 

# instantiate the dataframe
neighborhoods = pd.DataFrame(columns=column_names)

Read neighborhoods_data into data frame

In [8]:
# collect one record per neighborhood, then build the dataframe in a single step
# (DataFrame.append was removed in pandas 2.0)
rows = []
for data in neighborhoods_data:
    borough = data['properties']['borough']
    neighborhood_name = data['properties']['name']

    # GeoJSON stores coordinates as [longitude, latitude]
    neighborhood_latlon = data['geometry']['coordinates']
    neighborhood_lat = neighborhood_latlon[1]
    neighborhood_lon = neighborhood_latlon[0]

    rows.append({'Borough': borough,
                 'Neighborhood': neighborhood_name,
                 'Latitude': neighborhood_lat,
                 'Longitude': neighborhood_lon})

neighborhoods = pd.DataFrame(rows, columns=column_names)

neighborhoods.head()

Unnamed: 0,Borough,Neighborhood,Latitude,Longitude
0,Bronx,Wakefield,40.894705,-73.847201
1,Bronx,Co-op City,40.874294,-73.829939
2,Bronx,Eastchester,40.887556,-73.827806
3,Bronx,Fieldston,40.895437,-73.905643
4,Bronx,Riverdale,40.890834,-73.912585


Quick look at the neighborhood data

In [9]:
print('The dataframe has {} boroughs and {} neighborhoods.'.format(
        len(neighborhoods['Borough'].unique()),
        neighborhoods.shape[0]
    )
)

The dataframe has 5 boroughs and 306 neighborhoods.


<b> Visualize Manhattan New York Map</b>

In [10]:
# address = 'New York City, NY'
address = 'Manhattan, NY'

geolocator = Nominatim(user_agent="ny_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Manhattan are {}, {}.'.format(latitude, longitude))

# create map of New York using latitude and longitude values
map_newyork = folium.Map(location=[latitude, longitude], zoom_start=11)

# add markers to map
for lat, lng, borough, neighborhood in zip(neighborhoods['Latitude'], neighborhoods['Longitude'], neighborhoods['Borough'], neighborhoods['Neighborhood']):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_newyork)  
    
map_newyork

The geographical coordinates of Manhattan are 40.7896239, -73.9598939.


Sample Manhattan data

In [11]:
manhattan_data = neighborhoods[neighborhoods['Borough'] == 'Manhattan'].reset_index(drop=True)
manhattan_data.head()

Unnamed: 0,Borough,Neighborhood,Latitude,Longitude
0,Manhattan,Marble Hill,40.876551,-73.91066
1,Manhattan,Chinatown,40.715618,-73.994279
2,Manhattan,Washington Heights,40.851903,-73.9369
3,Manhattan,Inwood,40.867684,-73.92121
4,Manhattan,Hamilton Heights,40.823604,-73.949688


<b>Manhattan neighborhood Foursquare data</b>

In [12]:
CLIENT_ID = 'EWOBIN44ST33Q2UXOCR1K5JZ55BI51IJHOY3NHYNWR3IR32B' # your Foursquare ID
CLIENT_SECRET = 'TB5DKEEO2TPVMF0XRJTK0MXEMWTYGAU0Q3Z2SHF5ZUHGFV1U' # your Foursquare Secret
# ACCESS_TOKEN = 'Q2Y1KSJYJUMIIBSDTNHL4ZXJ2CB3L3HMB1BNNALV5P2JXITA' # your FourSquare Access Token
VERSION = '20180605' # Foursquare API version
LIMIT = 100 # A default Foursquare API limit value

print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)

Your credentials:
CLIENT_ID: EWOBIN44ST33Q2UXOCR1K5JZ55BI51IJHOY3NHYNWR3IR32B
CLIENT_SECRET:TB5DKEEO2TPVMF0XRJTK0MXEMWTYGAU0Q3Z2SHF5ZUHGFV1U
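
Since the cell above hard-codes and prints the API keys, a shared notebook may prefer to read them from the environment instead. The sketch below is one possible approach; the variable names `FOURSQUARE_CLIENT_ID` and `FOURSQUARE_CLIENT_SECRET` are hypothetical, not something Foursquare prescribes.

```python
import os

def load_foursquare_credentials(env=os.environ):
    """Return (client_id, client_secret) read from a mapping of environment variables.

    The variable names are assumptions chosen for this sketch.
    """
    client_id = env.get("FOURSQUARE_CLIENT_ID")
    client_secret = env.get("FOURSQUARE_CLIENT_SECRET")
    if not client_id or not client_secret:
        raise RuntimeError("Foursquare credentials are not set in the environment")
    return client_id, client_secret

# Usage with an explicit mapping, so the example runs without real keys.
demo_env = {"FOURSQUARE_CLIENT_ID": "demo-id", "FOURSQUARE_CLIENT_SECRET": "demo-secret"}
CLIENT_ID_DEMO, CLIENT_SECRET_DEMO = load_foursquare_credentials(demo_env)
```

Passing the mapping as a parameter keeps the function easy to test; in the notebook it would be called without arguments.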


Start with the first neighborhood to form an area for illustration

In [13]:
neighborhood_latitude = manhattan_data.loc[0, 'Latitude'] # neighborhood latitude value
neighborhood_longitude = manhattan_data.loc[0, 'Longitude'] # neighborhood longitude value

neighborhood_name = manhattan_data.loc[0, 'Neighborhood'] # neighborhood name

print('Latitude and longitude values of {} are {}, {}.'.format(neighborhood_name, 
                                                               neighborhood_latitude, 
                                                               neighborhood_longitude))

Latitude and longitude values of Marble Hill are 40.87655077879964, -73.91065965862981.


Limit to a 2 km radius and 500 venues for illustration

In [14]:
radius = 2000
LIMIT = 500 # this account is capped at 100 results per request, so nothing beyond 100 is returned

#url = 'https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(CLIENT_ID, CLIENT_SECRET, VERSION, neighborhood_latitude, neighborhood_longitude, radius, LIMIT)

url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
    CLIENT_ID, 
    CLIENT_SECRET, 
    VERSION, 
    neighborhood_latitude, 
    neighborhood_longitude, 
    radius, 
    LIMIT)

url

'https://api.foursquare.com/v2/venues/explore?&client_id=EWOBIN44ST33Q2UXOCR1K5JZ55BI51IJHOY3NHYNWR3IR32B&client_secret=TB5DKEEO2TPVMF0XRJTK0MXEMWTYGAU0Q3Z2SHF5ZUHGFV1U&v=20180605&ll=40.87655077879964,-73.91065965862981&radius=2000&limit=500'

In [15]:
results = requests.get(url).json()
# results
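
The request URL above is assembled with string formatting. As an alternative sketch, the same query string can be built with the standard library's `urlencode`, which escapes values automatically. The helper name `build_explore_url` and the placeholder credentials are hypothetical.

```python
from urllib.parse import urlencode

def build_explore_url(client_id, client_secret, version, lat, lng, radius, limit):
    """Build a venues/explore URL with the same parameters the notebook uses."""
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "v": version,
        "ll": "{},{}".format(lat, lng),  # latitude,longitude pair
        "radius": radius,
        "limit": limit,
    }
    return "https://api.foursquare.com/v2/venues/explore?" + urlencode(params)

# Placeholder credentials; real keys would come from the notebook's variables.
demo_url = build_explore_url("ID", "SECRET", "20180605", 40.87, -73.91, 2000, 100)
print(demo_url)
```

Note that `urlencode` percent-encodes the comma in `ll` as `%2C`, which is the standard encoding for that character in a query string.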

Build a function to extract venue categories from the neighborhood results

In [16]:
# function that extracts the category of the venue
def get_category_type(row):
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

In [17]:
venues = results['response']['groups'][0]['items']
    
nearby_venues = json_normalize(venues) # flatten JSON

# filter columns
filtered_columns = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
nearby_venues = nearby_venues.loc[:, filtered_columns]

# filter the category for each row
nearby_venues['venue.categories'] = nearby_venues.apply(get_category_type, axis=1)

# clean columns
nearby_venues.columns = [col.split(".")[-1] for col in nearby_venues.columns]

nearby_venues.head()



Unnamed: 0,name,categories,lat,lng
0,Bikram Yoga,Yoga Studio,40.876844,-73.906204
1,Tibbett Diner,Diner,40.880404,-73.908937
2,Sam's Pizza,Pizza Place,40.879435,-73.905859
3,Arturo's,Pizza Place,40.874412,-73.910271
4,The Bronx Public,Pub,40.878377,-73.903481


In [18]:
print('{} venues were returned by Foursquare.'.format(nearby_venues.shape[0]))

100 venues were returned by Foursquare.


Create a function to repeat the same process for all the neighborhoods in Manhattan

In [19]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        # print(name)  # commented out for sharing purpose
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)
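
`getNearbyVenues` indexes `categories[0]` directly, which would raise an `IndexError` for a venue that comes back with an empty category list. A minimal defensive helper could look like the sketch below; the name `first_category_name` and the `"Unknown"` default are assumptions made for illustration.

```python
def first_category_name(venue, default="Unknown"):
    """Return the name of a venue's first category, or a default if it has none."""
    categories = venue.get("categories") or []
    return categories[0]["name"] if categories else default

# Toy venue dictionaries shaped like the 'venue' entries used above.
with_category = {"name": "Arturo's", "categories": [{"name": "Pizza Place"}]}
without_category = {"name": "Mystery Spot", "categories": []}

print(first_category_name(with_category))
print(first_category_name(without_category))
```

Inside the list comprehension, `v['venue']['categories'][0]['name']` could then be replaced by `first_category_name(v['venue'])`.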

In [20]:
# fetch nearby venues for every Manhattan neighborhood
manhattan_venues = getNearbyVenues(names=manhattan_data['Neighborhood'],
                                   latitudes=manhattan_data['Latitude'],
                                   longitudes=manhattan_data['Longitude']
                                  )

Check Manhattan venue data frame shape

In [21]:
print(manhattan_venues.shape)
manhattan_venues.head()

(3201, 7)


Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Marble Hill,40.876551,-73.91066,Arturo's,40.874412,-73.910271,Pizza Place
1,Marble Hill,40.876551,-73.91066,Bikram Yoga,40.876844,-73.906204,Yoga Studio
2,Marble Hill,40.876551,-73.91066,Tibbett Diner,40.880404,-73.908937,Diner
3,Marble Hill,40.876551,-73.91066,Dunkin',40.877136,-73.906666,Donut Shop
4,Marble Hill,40.876551,-73.91066,Starbucks,40.877531,-73.905582,Coffee Shop


Check Manhattan neighborhood venue count and categories

In [22]:
manhattan_venues.groupby('Neighborhood')['Venue'].count()

Neighborhood
Battery Park City       65
Carnegie Hill           86
Central Harlem          45
Chelsea                100
Chinatown              100
Civic Center           100
Clinton                100
East Harlem             40
East Village           100
Financial District     100
Flatiron               100
Gramercy                94
Greenwich Village      100
Hamilton Heights        63
Hudson Yards            62
Inwood                  56
Lenox Hill             100
Lincoln Square          92
Little Italy           100
Lower East Side         47
Manhattan Valley        49
Manhattanville          46
Marble Hill             22
Midtown                100
Midtown South          100
Morningside Heights     43
Murray Hill            100
Noho                   100
Roosevelt Island        26
Soho                   100
Stuyvesant Town         17
Sutton Place           100
Tribeca                 88
Tudor City              81
Turtle Bay             100
Upper East Side         97
Upper West Side

In [23]:
print('There are {} unique categories.'.format(len(manhattan_venues['Venue Category'].unique())))

There are 323 unique categories.


<b>Convert Manhattan neighborhood venue data to dataframe with each category a column</b>

In [24]:
# one hot encoding
manhattan_onehot = pd.get_dummies(manhattan_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
manhattan_onehot['Neighbourhood'] = manhattan_venues['Neighborhood'] 
# note: the one-hot frame spells this column 'Neighbourhood' (with a 'u'), unlike manhattan_venues

# move neighborhood column to the first column
fixed_columns = [manhattan_onehot.columns[-1]] + list(manhattan_onehot.columns[:-1])
manhattan_onehot = manhattan_onehot[fixed_columns]

manhattan_onehot.head()

Unnamed: 0,Neighbourhood,Accessories Store,Adult Boutique,African Restaurant,American Restaurant,Antique Shop,Arepa Restaurant,Argentinian Restaurant,Art Gallery,Art Museum,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,Auditorium,Australian Restaurant,Austrian Restaurant,BBQ Joint,Baby Store,Bagel Shop,Bakery,Bank,Bar,Baseball Field,Basketball Court,Beer Bar,Beer Garden,Beer Store,Bike Rental / Bike Share,Bike Shop,Bike Trail,Bistro,Board Shop,Boat or Ferry,Bookstore,Boutique,Boxing Gym,Brazilian Restaurant,Breakfast Spot,Bridal Shop,Bridge,Bubble Tea Shop,Building,Burger Joint,Burrito Place,Bus Station,Butcher,Cafeteria,Café,Cajun / Creole Restaurant,Camera Store,Candy Store,Cantonese Restaurant,Caribbean Restaurant,Check Cashing Service,Cheese Shop,Chinese Restaurant,Chocolate Shop,Circus,Climbing Gym,Clothing Store,Club House,Cocktail Bar,Coffee Shop,College Academic Building,College Arts Building,College Bookstore,College Cafeteria,College Theater,Comedy Club,Community Center,Concert Hall,Convenience Store,Cooking School,Cosmetics Shop,Coworking Space,Creperie,Cuban Restaurant,Cupcake Shop,Cycle Studio,Czech Restaurant,Dance Studio,Daycare,Deli / Bodega,Department Store,Design Studio,Dessert Shop,Dim Sum Restaurant,Diner,Discount Store,Dive Bar,Doctor's Office,Dog Run,Donut Shop,Drugstore,Dry Cleaner,Dumpling Restaurant,Duty-free Shop,Eastern European Restaurant,Electronics Store,Empanada Restaurant,Ethiopian Restaurant,Event Space,Exhibit,Eye Doctor,Falafel Restaurant,Farmers Market,Fast Food Restaurant,Filipino Restaurant,Fish Market,Flea Market,Flower Shop,Food & Drink Shop,Food Court,Food Stand,Food Truck,Fountain,French Restaurant,Fried Chicken Joint,Frozen Yogurt Shop,Furniture / Home Store,Gaming Cafe,Garden,Garden Center,Gas Station,Gastropub,Gay Bar,General Entertainment,German Restaurant,Gift Shop,Golf Course,Gourmet Shop,Greek Restaurant,Grocery Store,Gym,Gym / Fitness Center,Gym Pool,Gymnastics Gym,Harbor / Marina,Hardware Store,Hawaiian 
Restaurant,Health & Beauty Service,Health Food Store,Heliport,High School,Himalayan Restaurant,Historic Site,History Museum,Hobby Shop,Hookah Bar,Hostel,Hot Dog Joint,Hotel,Hotel Bar,Hotpot Restaurant,Ice Cream Shop,Indian Restaurant,Indie Movie Theater,Indie Theater,Irish Pub,Israeli Restaurant,Italian Restaurant,Japanese Curry Restaurant,Japanese Restaurant,Jazz Club,Jewelry Store,Jewish Restaurant,Juice Bar,Karaoke Bar,Kebab Restaurant,Kids Store,Kitchen Supply Store,Korean Restaurant,Kosher Restaurant,Latin American Restaurant,Laundry Service,Leather Goods Store,Lebanese Restaurant,Library,Lingerie Store,Liquor Store,Lounge,Malay Restaurant,Market,Martial Arts School,Massage Studio,Mattress Store,Medical Center,Mediterranean Restaurant,Memorial Site,Men's Store,Mexican Restaurant,Middle Eastern Restaurant,Mini Golf,Miscellaneous Shop,Mobile Phone Shop,Modern European Restaurant,Molecular Gastronomy Restaurant,Monument / Landmark,Moroccan Restaurant,Movie Theater,Moving Target,Museum,Music School,Music Venue,Nail Salon,New American Restaurant,Newsstand,Nightclub,Non-Profit,Noodle House,North Indian Restaurant,Office,Opera House,Optical Shop,Organic Grocery,Outdoor Sculpture,Outdoors & Recreation,Paella Restaurant,Paper / Office Supplies Store,Park,Pedestrian Plaza,Performing Arts Venue,Persian Restaurant,Peruvian Restaurant,Pet Café,Pet Service,Pet Store,Pharmacy,Photography Studio,Piano Bar,Pie Shop,Pier,Pilates Studio,Pizza Place,Playground,Plaza,Poke Place,Pool,Pub,Public Art,Ramen Restaurant,Record Shop,Rental Car Location,Residential Building (Apartment / Condo),Resort,Rest Area,Restaurant,River,Rock Club,Roof Deck,Russian Restaurant,Sake Bar,Salad Place,Salon / Barbershop,Sandwich Place,Scandinavian Restaurant,Scenic Lookout,School,Sculpture Garden,Seafood Restaurant,Shanghai Restaurant,Shipping Store,Shoe Store,Shopping Mall,Skate Park,Smoke Shop,Smoothie Shop,Snack Place,Soba Restaurant,Soccer Field,Social Club,Soup Place,South American Restaurant,South 
Indian Restaurant,Southern / Soul Food Restaurant,Spa,Spanish Restaurant,Speakeasy,Sporting Goods Shop,Sports Bar,Sports Club,Steakhouse,Street Art,Strip Club,Supermarket,Supplement Shop,Sushi Restaurant,Swiss Restaurant,Szechuan Restaurant,Taco Place,Tailor Shop,Taiwanese Restaurant,Tapas Restaurant,Tattoo Parlor,Tea Room,Tech Startup,Tennis Court,Tennis Stadium,Thai Restaurant,Theater,Theme Restaurant,Thrift / Vintage Store,Tiki Bar,Tourist Information Center,Toy / Game Store,Trail,Train Station,Turkish Restaurant,Udon Restaurant,Used Bookstore,Vegetarian / Vegan Restaurant,Venezuelan Restaurant,Veterinarian,Video Game Store,Video Store,Vietnamese Restaurant,Volleyball Court,Waterfront,Whisky Bar,Wine Bar,Wine Shop,Wings Joint,Women's Store,Yoga Studio
0,Marble Hill,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,Marble Hill,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1
2,Marble Hill,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3,Marble Hill,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,Marble Hill,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0


In [25]:
print(manhattan_onehot.shape)

(3201, 324)
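
The methodology calls for the top venues per neighborhood, which can be derived from a one-hot frame like the one above by averaging per neighborhood and ranking the categories. The following is a sketch on invented toy data, using the top 2 rather than the top 10 to keep the example small.

```python
import pandas as pd

# Toy one-hot rows: one row per venue, one column per category (values are made up).
onehot = pd.DataFrame({
    "Neighbourhood": ["A", "A", "A", "B", "B", "B"],
    "Coffee Shop":   [1, 1, 0, 0, 0, 0],
    "Park":          [0, 0, 1, 1, 1, 0],
    "Pizza Place":   [0, 0, 0, 0, 0, 1],
})

# Mean of each one-hot column = share of a neighborhood's venues in that category.
grouped = onehot.groupby("Neighbourhood").mean()

def top_venues(row, n=2):
    """Return the n most frequent categories for one neighborhood."""
    return row.sort_values(ascending=False).index[:n].tolist()

top = {name: top_venues(row) for name, row in grouped.iterrows()}
print(top)
```

The `grouped` frame of frequencies is also a natural input for clustering, since every neighborhood becomes one numeric row.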


<b> Process Toronto data to a similar onehot format </b>

<b> Create Toronto Postal Code DataFrame</b>

In [26]:
# use pandas to read the wiki page directly and load its first table into a dataframe
df_tor = pd.read_html('https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M')[0]
df_tor.shape

(180, 3)

In [27]:
# Delete rows with a borough that is Not assigned.

df_tor_bor = df_tor[df_tor['Borough'] != 'Not assigned']
df_tor_bor.head()

Unnamed: 0,Postal Code,Borough,Neighbourhood
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,"Regent Park, Harbourfront"
5,M6A,North York,"Lawrence Manor, Lawrence Heights"
6,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government"


Check whether any Neighbourhood has the value "Not assigned"

In [28]:
mask = (df_tor_bor['Neighbourhood'] == 'Not assigned')
mask.sum()

0

Get geo data from the spreadsheet

In [29]:
!wget -q -O 'geo_data.csv' https://cocl.us/Geospatial_data
print('Data downloaded!')
# convert to a dataframe
df_geo = pd.read_csv('geo_data.csv')
df_geo.head()

Data downloaded!


Unnamed: 0,Postal Code,Latitude,Longitude
0,M1B,43.806686,-79.194353
1,M1C,43.784535,-79.160497
2,M1E,43.763573,-79.188711
3,M1G,43.770992,-79.216917
4,M1H,43.773136,-79.239476


In [30]:
df_geo.shape

(103, 3)

Merge neighborhood data with geo data to add latitude/longitude for each postal code

In [31]:
df_tor_bor_geo = df_tor_bor

# merge neighborhood data to add latitude/longitude for each postal code
df_tor_bor_geo = df_tor_bor_geo.join(df_geo.set_index('Postal Code'), on='Postal Code')
df_tor_bor_geo.head()

Unnamed: 0,Postal Code,Borough,Neighbourhood,Latitude,Longitude
2,M3A,North York,Parkwoods,43.753259,-79.329656
3,M4A,North York,Victoria Village,43.725882,-79.315572
4,M5A,Downtown Toronto,"Regent Park, Harbourfront",43.65426,-79.360636
5,M6A,North York,"Lawrence Manor, Lawrence Heights",43.718518,-79.464763
6,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government",43.662301,-79.389494
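
A join like the one above silently leaves `NaN` coordinates for any postal code missing from the geo file. One way to check for that, sketched here on toy data, is a left `merge` with `indicator=True`; the postal code `M9Z` and the borough label attached to it are made up to force a mismatch.

```python
import pandas as pd

# Toy inputs; 'M9Z' is a made-up code with no coordinates.
boroughs = pd.DataFrame({
    "Postal Code": ["M3A", "M4A", "M9Z"],
    "Borough": ["North York", "North York", "Example Borough"],
})
geo = pd.DataFrame({
    "Postal Code": ["M3A", "M4A"],
    "Latitude": [43.753259, 43.725882],
    "Longitude": [-79.329656, -79.315572],
})

# indicator=True adds a '_merge' column telling where each row was found.
merged = boroughs.merge(geo, on="Postal Code", how="left", indicator=True)
missing = merged.loc[merged["_merge"] == "left_only", "Postal Code"].tolist()
print(missing)
```

An empty `missing` list would confirm that every postal code received coordinates.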


Quick glance at Toronto neighborhoods

In [32]:
print('Toronto has {} boroughs and {} postal codes.'.format(
        len(df_tor_bor_geo['Borough'].unique()),
        df_tor_bor_geo.shape[0]
    )
)

Toronto has 10 boroughs and 103 postal codes.


Reformat the Neighbourhood column: split comma-separated values into one row per neighborhood, and drop the postal code

In [33]:
from itertools import chain

# return list from series of comma-separated strings
def chainer(s):
    return list(chain.from_iterable(s.str.split(',')))

# calculate lengths of splits
lens = df_tor_bor_geo['Neighbourhood'].str.split(',').map(len)

# create new dataframe, repeating or chaining as appropriate
df_temp = pd.DataFrame({'Neighbourhood': chainer(df_tor_bor_geo['Neighbourhood']),
                    'Latitude': np.repeat(df_tor_bor_geo['Latitude'], lens),
                    'Longitude': np.repeat(df_tor_bor_geo['Longitude'], lens)})

# drop duplicate records
df_tor_bor_geo = df_temp.drop_duplicates(subset='Neighbourhood')
df_tor_bor_geo.shape

(209, 3)
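
The split above uses `','` as the separator, so every name after the first in a cell keeps a leading space. An alternative sketch with `DataFrame.explode` (pandas 0.25 or later) that also strips whitespace is shown below on toy rows; it is not the code that produced the shape above, and stripping could change how many duplicates are dropped.

```python
import pandas as pd

# Two toy rows shaped like the Toronto frame.
df = pd.DataFrame({
    "Neighbourhood": ["Regent Park, Harbourfront", "Parkwoods"],
    "Latitude": [43.65426, 43.753259],
    "Longitude": [-79.360636, -79.329656],
})

# Turn each comma-separated cell into a list, then emit one row per list element.
exploded = (
    df.assign(Neighbourhood=df["Neighbourhood"].str.split(","))
      .explode("Neighbourhood")
      .reset_index(drop=True)
)

# Remove the whitespace left over from the ", " separators.
exploded["Neighbourhood"] = exploded["Neighbourhood"].str.strip()
exploded = exploded.drop_duplicates(subset="Neighbourhood").reset_index(drop=True)
print(exploded)
```

Each resulting row keeps the latitude and longitude of the postal code it came from.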

Create a map for visual inspection

In [34]:
tor_lat = 43.780918
tor_long = -79.421371
# create map of Toronto using latitude and longitude values
map_toronto  = folium.Map(location=[tor_lat, tor_long], zoom_start=11)

# add markers to map
for lat, lng, label in zip(df_tor_bor_geo['Latitude'], df_tor_bor_geo['Longitude'], df_tor_bor_geo['Neighbourhood']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_toronto)  
    
map_toronto

Get Toronto neighborhood venue data

In [35]:
df_tor_data = df_tor_bor_geo.reset_index(drop=True)

df_tor_venues = getNearbyVenues(names=df_tor_data['Neighbourhood'],
                                   latitudes=df_tor_data['Latitude'],
                                   longitudes=df_tor_data['Longitude']
                                  )

In [36]:
print(df_tor_venues.shape)
df_tor_venues.head()

(4214, 7)


Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Parkwoods,43.753259,-79.329656,Brookbanks Park,43.751976,-79.33214,Park
1,Parkwoods,43.753259,-79.329656,Variety Store,43.751974,-79.333114,Food & Drink Shop
2,Victoria Village,43.725882,-79.315572,Victoria Village Arena,43.723481,-79.315635,Hockey Arena
3,Victoria Village,43.725882,-79.315572,Tim Hortons,43.725517,-79.313103,Coffee Shop
4,Victoria Village,43.725882,-79.315572,Portugril,43.725819,-79.312785,Portuguese Restaurant


In [37]:
len(list(df_tor_venues['Neighborhood'].unique()))
# note: 5 of the 209 neighborhoods are missing, since only neighborhoods with at least one returned venue appear here

204

In [38]:
# visual inspection, commented out for easy read of shared notebook
# df_tor_venues.groupby('Neighborhood')['Neighborhood'].count()

In [39]:
print('There are {} unique categories.'.format(len(df_tor_venues['Venue Category'].unique())))

There are 271 unique categories.


<b> Create Toronto one-hot data (note: the key column is named "Neighbourhood" rather than "Neighborhood", because "Neighborhood" is already a venue category)</b>

In [40]:
# one hot encoding
df_tor_onehot = pd.get_dummies(df_tor_venues[['Venue Category']], prefix="", prefix_sep="")
print(df_tor_onehot.shape)
# print(df_tor_onehot.head())

# add neighborhood column back to dataframe
df_tor_onehot['Neighbourhood'] = df_tor_venues['Neighborhood'] 
# 'Neighborhood' already exists as a venue category, so use the spelling with a 'u' for the key column
# print(df_tor_onehot['Neighborhood'][:5])
# print(df_tor_onehot.columns[-1])

# move neighborhood column to the first column
fixed_columns = [df_tor_onehot.columns[-1]] + list(df_tor_onehot.columns[:-1])
df_tor_onehot = df_tor_onehot[fixed_columns]

print(df_tor_onehot.shape)
df_tor_onehot.head()

(4214, 271)
(4214, 272)


Unnamed: 0,Neighbourhood,Accessories Store,Afghan Restaurant,Airport,Airport Food Court,Airport Gate,Airport Lounge,Airport Service,Airport Terminal,American Restaurant,Antique Shop,Aquarium,Art Gallery,Art Museum,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,Auto Garage,Auto Workshop,BBQ Joint,Baby Store,Bagel Shop,Bakery,Bank,Bar,Baseball Field,Baseball Stadium,Basketball Stadium,Beach,Bed & Breakfast,Beer Bar,Beer Store,Belgian Restaurant,Bike Shop,Bistro,Boat or Ferry,Bookstore,Boutique,Brazilian Restaurant,Breakfast Spot,Brewery,Bridal Shop,Bubble Tea Shop,Building,Burger Joint,Burrito Place,Bus Line,Bus Station,Butcher,Café,Cajun / Creole Restaurant,Camera Store,Candy Store,Caribbean Restaurant,Cheese Shop,Chinese Restaurant,Chocolate Shop,Church,Climbing Gym,Clothing Store,Cocktail Bar,Coffee Shop,College Arts Building,College Auditorium,College Cafeteria,College Gym,College Rec Center,College Stadium,Colombian Restaurant,Comfort Food Restaurant,Comic Shop,Concert Hall,Construction & Landscaping,Convenience Store,Cosmetics Shop,Coworking Space,Creperie,Cuban Restaurant,Cupcake Shop,Curling Ice,Dance Studio,Deli / Bodega,Department Store,Dessert Shop,Dim Sum Restaurant,Diner,Discount Store,Distribution Center,Dog Run,Doner Restaurant,Donut Shop,Drugstore,Dumpling Restaurant,Eastern European Restaurant,Electronics Store,Escape Room,Ethiopian Restaurant,Event Space,Falafel Restaurant,Farmers Market,Fast Food Restaurant,Field,Filipino Restaurant,Fish & Chips Shop,Fish Market,Flea Market,Food & Drink Shop,Food Court,Food Truck,Fountain,French Restaurant,Fried Chicken Joint,Frozen Yogurt Shop,Fruit & Vegetable Store,Furniture / Home Store,Gaming Cafe,Garden,Garden Center,Gas Station,Gastropub,Gay Bar,General Entertainment,General Travel,German Restaurant,Gift Shop,Gluten-free Restaurant,Golf Course,Gourmet Shop,Greek Restaurant,Grocery Store,Gym,Gym / Fitness Center,Hakka Restaurant,Harbor / Marina,Hardware Store,Health & Beauty Service,Health Food 
Store,Historic Site,History Museum,Hobby Shop,Hockey Arena,Hookah Bar,Hospital,Hostel,Hotel,Hotel Bar,IT Services,Ice Cream Shop,Indian Restaurant,Indie Movie Theater,Indoor Play Area,Intersection,Irish Pub,Italian Restaurant,Japanese Restaurant,Jazz Club,Jewelry Store,Juice Bar,Kids Store,Kitchen Supply Store,Korean BBQ Restaurant,Korean Restaurant,Lake,Latin American Restaurant,Light Rail Station,Lingerie Store,Liquor Store,Locksmith,Lounge,Luggage Store,Malay Restaurant,Market,Martial Arts School,Massage Studio,Medical Center,Mediterranean Restaurant,Men's Store,Metro Station,Mexican Restaurant,Middle Eastern Restaurant,Miscellaneous Shop,Mobile Phone Shop,Modern European Restaurant,Molecular Gastronomy Restaurant,Monument / Landmark,Moroccan Restaurant,Motel,Movie Theater,Museum,Music Venue,Neighborhood,New American Restaurant,Nightclub,Noodle House,Office,Opera House,Optical Shop,Organic Grocery,Other Great Outdoors,Park,Performing Arts Venue,Pet Store,Pharmacy,Pizza Place,Plane,Playground,Plaza,Poke Place,Pool,Portuguese Restaurant,Poutine Place,Print Shop,Pub,Ramen Restaurant,Record Shop,Recording Studio,Rental Car Location,Restaurant,River,Roof Deck,Sake Bar,Salad Place,Salon / Barbershop,Sandwich Place,Scenic Lookout,Sculpture Garden,Seafood Restaurant,Shoe Store,Shopping Mall,Shopping Plaza,Skate Park,Skating Rink,Smoke Shop,Smoothie Shop,Snack Place,Soccer Field,Social Club,Soup Place,Southern / Soul Food Restaurant,Spa,Speakeasy,Sporting Goods Shop,Sports Bar,Stadium,Stationery Store,Steakhouse,Strip Club,Supermarket,Supplement Shop,Sushi Restaurant,Swim School,Tailor Shop,Taiwanese Restaurant,Tanning Salon,Tea Room,Tennis Court,Thai Restaurant,Theater,Theme Restaurant,Thrift / Vintage Store,Toy / Game Store,Trail,Train Station,Turkish Restaurant,Vegetarian / Vegan Restaurant,Video Game Store,Vietnamese Restaurant,Warehouse Store,Wine Bar,Wings Joint,Women's Store,Yoga Studio
0,Parkwoods,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,Parkwoods,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,Victoria Village,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3,Victoria Village,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,Victoria Village,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0


<b> Create a common set of categories between Manhattan (New York) and Toronto </b>

This is necessary because a model fitted on the New York data can only be applied to Toronto data that has exactly the same feature columns.

In [41]:
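# venue categories seen in Manhattan (drop the leading 'Neighbourhood' column)
cats_ny = list(manhattan_onehot.columns)
cats_ny.pop(0)
print(len(cats_ny))
# venue categories seen in Toronto
cats_tor = list(df_tor_onehot.columns)
cats_tor.pop(0)
print(len(cats_tor))

# keep 'Neighbourhood' plus the categories present in both cities
cats_com = ['Neighbourhood']+list(set(cats_ny).intersection(cats_tor))
len(cats_com)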

323
271


210

In [42]:
manhattan_onehot_2 = manhattan_onehot[cats_com]
df_tor_onehot_2 =df_tor_onehot[cats_com]
manhattan_onehot_2.shape, df_tor_onehot_2.shape

((3201, 210), (4214, 210))
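
A side note on the intersection step: a Python `set` does not guarantee any particular ordering, so the column order of `cats_com` can differ between runs. The sketch below shows one order-preserving alternative on two small, made-up one-hot frames (`ny` and `tor` are hypothetical stand-ins for `manhattan_onehot` and `df_tor_onehot`); it is an illustration, not the code used to produce the outputs in this notebook.

```python
import pandas as pd

# Hypothetical toy one-hot frames standing in for manhattan_onehot / df_tor_onehot.
ny = pd.DataFrame({"Neighbourhood": ["A", "B"], "Bar": [1, 0], "Café": [0, 1], "Zoo": [0, 0]})
tor = pd.DataFrame({"Neighbourhood": ["X", "Y"], "Café": [1, 0], "Airport": [0, 1], "Bar": [0, 0]})

# Keep the New York column order and retain only categories that Toronto also has.
tor_cols = set(tor.columns)
common = ["Neighbourhood"] + [c for c in ny.columns if c != "Neighbourhood" and c in tor_cols]

# Both frames now share the same columns in the same order.
ny_common = ny[common]
tor_common = tor[common]
print(common)
```

Because both frames are indexed with the same list, the feature columns line up position by position, which is what the model needs when it is later applied to the second city.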

<b> Prepare NY data and Toronto data for Modeling and Applying model</b>

In [43]:
manhattan_grouped = manhattan_onehot_2.groupby('Neighbourhood').mean().reset_index()
print(manhattan_grouped.shape)

toronto_grouped = df_tor_onehot_2.groupby('Neighbourhood').mean().reset_index()
print(toronto_grouped.shape)
toronto_grouped.head()

(40, 210)
(204, 210)


Unnamed: 0,Neighbourhood,Mobile Phone Shop,Social Club,Gym / Fitness Center,Trail,Hostel,Gaming Cafe,Cajun / Creole Restaurant,Taiwanese Restaurant,Noodle House,Bus Station,Chocolate Shop,Bookstore,Convenience Store,College Arts Building,Women's Store,Shoe Store,Skate Park,Theater,Creperie,Filipino Restaurant,Supplement Shop,Steakhouse,American Restaurant,Cocktail Bar,Wings Joint,Pharmacy,Italian Restaurant,Dance Studio,Pizza Place,Martial Arts School,Dog Run,French Restaurant,Juice Bar,Sculpture Garden,Coworking Space,Candy Store,Drugstore,Strip Club,Bridal Shop,Camera Store,Bagel Shop,Gay Bar,Bar,Bistro,Cosmetics Shop,Korean Restaurant,Snack Place,Accessories Store,Southern / Soul Food Restaurant,Soccer Field,Molecular Gastronomy Restaurant,Salon / Barbershop,College Cafeteria,Gourmet Shop,Shopping Mall,Latin American Restaurant,Food Truck,Theme Restaurant,Hobby Shop,Tea Room,Miscellaneous Shop,Modern European Restaurant,Sushi Restaurant,Pool,Antique Shop,Garden,Moroccan Restaurant,Dessert Shop,Fish Market,Record Shop,Fried Chicken Joint,Health & Beauty Service,Ramen Restaurant,Thrift / Vintage Store,Indian Restaurant,Rental Car Location,Boutique,Brazilian Restaurant,Indie Movie Theater,Smoke Shop,Deli / Bodega,Electronics Store,Middle Eastern Restaurant,Jewelry Store,Dim Sum Restaurant,Flea Market,Event Space,Music Venue,Office,German Restaurant,Spa,Gastropub,Speakeasy,Malay Restaurant,Discount Store,Food & Drink Shop,Mexican Restaurant,Art Gallery,Diner,Kids Store,River,Bubble Tea Shop,Breakfast Spot,Grocery Store,Hotel,Restaurant,Train Station,Sports Bar,Vegetarian / Vegan Restaurant,Yoga Studio,Medical Center,Museum,Beer Bar,Coffee Shop,Pet Store,Poke Place,Burrito Place,Optical Shop,Climbing Gym,Ethiopian Restaurant,Donut Shop,Bank,Massage Studio,Dumpling Restaurant,Athletics & Sports,Golf Course,Food Court,Nightclub,Historic Site,Jazz Club,Hookah Bar,Baby Store,General Entertainment,Wine Bar,Boat or Ferry,Hardware Store,Café,Arts & Crafts Store,Gas 
Station,Beer Store,Vietnamese Restaurant,Bike Shop,Gift Shop,Turkish Restaurant,Seafood Restaurant,Chinese Restaurant,Burger Joint,Lingerie Store,Furniture / Home Store,Garden Center,Performing Arts Venue,Bakery,Organic Grocery,Butcher,Gym,Caribbean Restaurant,Video Game Store,Sake Bar,Building,Art Museum,Monument / Landmark,Hotel Bar,Health Food Store,Irish Pub,Liquor Store,Roof Deck,Pub,Lounge,Movie Theater,Frozen Yogurt Shop,Opera House,History Museum,Toy / Game Store,Market,Supermarket,Plaza,Cheese Shop,Eastern European Restaurant,New American Restaurant,Mediterranean Restaurant,Park,Soup Place,Sporting Goods Shop,Asian Restaurant,Kitchen Supply Store,Harbor / Marina,Concert Hall,Fountain,Ice Cream Shop,Farmers Market,Sandwich Place,Men's Store,Tennis Court,Clothing Store,Cuban Restaurant,Salad Place,Cupcake Shop,Scenic Lookout,Thai Restaurant,Smoothie Shop,Department Store,BBQ Joint,Greek Restaurant,Fast Food Restaurant,Japanese Restaurant,Playground,Baseball Field,Falafel Restaurant,Tailor Shop
0,Adelaide,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.02,0.02,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.03,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.01,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.01,0.01,0.01,0.0,0.0,0.0,0.0,0.01,0.0,0.02,0.0,0.0,0.01,0.01,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.02,0.0,0.04,0.04,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.09,0.0,0.01,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.01,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.05,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.01,0.0,0.01,0.0,0.01,0.0,0.0,0.01,0.0,0.0,0.04,0.0,0.0,0.0,0.01,0.01,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.01,0.01,0.0,0.01,0.0,0.01,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.03,0.0,0.02,0.01,0.0,0.03,0.0,0.01,0.0,0.0,0.01,0.01,0.0,0.0,0.0,0.0
1,Agincourt North,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0
2,Albion Gardens,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.1,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.1,0.0,0.0,0.0,0.0,0.0
3,Bathurst Quay,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0625,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0625,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0625,0.0625,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0625,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0625,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0625,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,Beaumond Heights,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.1,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.1,0.0,0.0,0.0,0.0,0.0


<b>Additional information: top venue types for each neighborhood</b>, for better understanding by the client

Function to sort the venues in descending order of frequency.

In [44]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]
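
As a small illustration of what this function returns, the sketch below applies it to a made-up row (the neighbourhood name and frequencies are hypothetical). It also shows a possible variant, `top_nonzero_venues` (a name introduced here, not part of the original notebook), that skips categories with zero frequency. This matters for neighborhoods with only a handful of venues, where the plain ranking fills the remaining slots with categories that do not occur there at all.

```python
import pandas as pd

def return_most_common_venues(row, num_top_venues):
    # Same logic as the notebook cell above: skip the name, sort by frequency.
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    return row_categories_sorted.index.values[0:num_top_venues]

def top_nonzero_venues(row, num_top_venues):
    # Hypothetical variant: rank only the categories that actually occur.
    freqs = row.iloc[1:].astype(float)
    freqs = freqs[freqs > 0].sort_values(ascending=False)
    return list(freqs.index[:num_top_venues])

# Made-up row: first entry is the neighbourhood name, the rest are frequencies.
row = pd.Series({"Neighbourhood": "Example", "Park": 0.5, "Bakery": 0.3,
                 "Playground": 0.2, "Garden": 0.0, "Pub": 0.0})

top_two = list(return_most_common_venues(row, 2))
nonzero_top_five = top_nonzero_venues(row, 5)
print(top_two)
print(nonzero_top_five)
```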

Manhattan top 15 venues for each neighborhood.

In [45]:
num_top_venues = 15
indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighbourhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
manhattan_venues_sorted = pd.DataFrame(columns=columns)
manhattan_venues_sorted['Neighbourhood'] = manhattan_grouped['Neighbourhood']

for ind in np.arange(manhattan_grouped.shape[0]):
    manhattan_venues_sorted.iloc[ind, 1:] = return_most_common_venues(manhattan_grouped.iloc[ind, :], num_top_venues)

manhattan_venues_sorted.head()

Unnamed: 0,Neighbourhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue,11th Most Common Venue,12th Most Common Venue,13th Most Common Venue,14th Most Common Venue,15th Most Common Venue
0,Battery Park City,Park,Hotel,Gym,Coffee Shop,Clothing Store,Burger Joint,Boat or Ferry,Food Court,Pizza Place,Gourmet Shop,Plaza,Shopping Mall,Playground,Movie Theater,Steakhouse
1,Carnegie Hill,Coffee Shop,Café,French Restaurant,Italian Restaurant,Gym / Fitness Center,Yoga Studio,Bookstore,Gym,Pizza Place,Cocktail Bar,Vietnamese Restaurant,Cosmetics Shop,Indian Restaurant,Bar,Grocery Store
2,Central Harlem,Chinese Restaurant,Cosmetics Shop,Seafood Restaurant,French Restaurant,American Restaurant,Bar,Southern / Soul Food Restaurant,Café,Caribbean Restaurant,Dessert Shop,Gym,Bagel Shop,Fried Chicken Joint,Jazz Club,Juice Bar
3,Chelsea,Coffee Shop,Art Gallery,Bakery,American Restaurant,Café,Italian Restaurant,Hotel,Bookstore,French Restaurant,Market,Park,Theater,Ice Cream Shop,Bar,Cupcake Shop
4,Chinatown,Chinese Restaurant,Bakery,Cocktail Bar,American Restaurant,Dessert Shop,Spa,Salon / Barbershop,Optical Shop,Coffee Shop,Asian Restaurant,Vietnamese Restaurant,Dumpling Restaurant,Ice Cream Shop,Sandwich Place,Bar


Toronto top 15 venues for each neighborhood.

In [46]:
# create a new dataframe
toronto_venues_sorted = pd.DataFrame(columns=columns)
toronto_venues_sorted['Neighbourhood'] = toronto_grouped['Neighbourhood']

for ind in np.arange(toronto_grouped.shape[0]):
    toronto_venues_sorted.iloc[ind, 1:] = return_most_common_venues(toronto_grouped.iloc[ind, :], num_top_venues)

toronto_venues_sorted.head()


Unnamed: 0,Neighbourhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue,11th Most Common Venue,12th Most Common Venue,13th Most Common Venue,14th Most Common Venue,15th Most Common Venue
0,Adelaide,Coffee Shop,Café,Hotel,Gym,Restaurant,Bar,Thai Restaurant,Clothing Store,American Restaurant,Steakhouse,Cosmetics Shop,Concert Hall,Pizza Place,Sushi Restaurant,Bookstore
1,Agincourt North,Playground,Bakery,Park,Tailor Shop,Antique Shop,Indian Restaurant,Thrift / Vintage Store,Ramen Restaurant,Health & Beauty Service,Fried Chicken Joint,Record Shop,Fish Market,Dessert Shop,Moroccan Restaurant,Garden
2,Albion Gardens,Pizza Place,Grocery Store,Beer Store,Fried Chicken Joint,Sandwich Place,Pharmacy,Liquor Store,Fast Food Restaurant,Latin American Restaurant,Indian Restaurant,Thrift / Vintage Store,Ramen Restaurant,Health & Beauty Service,Shopping Mall,Record Shop
3,Bathurst Quay,Rental Car Location,Harbor / Marina,Boat or Ferry,Sculpture Garden,Bar,Coffee Shop,Boutique,Dessert Shop,Indian Restaurant,Thrift / Vintage Store,Ramen Restaurant,Health & Beauty Service,Fried Chicken Joint,Record Shop,Fish Market
4,Beaumond Heights,Pizza Place,Grocery Store,Beer Store,Fried Chicken Joint,Sandwich Place,Pharmacy,Liquor Store,Fast Food Restaurant,Latin American Restaurant,Indian Restaurant,Thrift / Vintage Store,Ramen Restaurant,Health & Beauty Service,Shopping Mall,Record Shop


Now both the Manhattan and Toronto neighbourhood data are prepared and ready for modeling.

#### 2-B. Build K-Means model on Manhattan data and Apply to Toronto

We use a k-means model because this is an unsupervised learning problem: the neighborhoods have no predefined labels.

Ideally, the number of clusters would be determined by evaluating a range of values of k and picking the inflection ("elbow") point. Since this was already covered in the ML course, and in the interest of time, I use a predetermined number here; see below.

I decided to set k=6. This is a fairly large number of clusters for this model. The reason is to narrow down the group of neighborhoods that "Lincoln Square" falls into, and likewise the group of matching neighborhoods in Toronto, which gives the company a smaller area in which to choose a potential location.
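
For reference, the evaluation that was skipped could look roughly like the sketch below. It is a minimal illustration on random data (`X` is a hypothetical stand-in for the Manhattan feature matrix, and the range of k is an arbitrary choice), so it says nothing about the best k for the real data.

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical stand-in for the Manhattan feature matrix: 40 rows, 5 features.
rng = np.random.default_rng(0)
X = rng.random((40, 5))

# Fit one model per candidate k and record the within-cluster sum of squares.
inertias = {}
for k in range(2, 11):
    model = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    inertias[k] = model.inertia_

# Inspect (or plot) inertia against k and look for the point where it flattens.
for k, inertia in inertias.items():
    print(k, round(inertia, 3))
```

On real data the inertia curve is typically plotted against k, and the value where the decrease levels off is taken as the candidate number of clusters.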

<b> Build k-means model on Manhattan data</b>

Build model

In [47]:
# set number of clusters
kclusters = 6

manhattan_grouped_clustering = manhattan_grouped.drop('Neighbourhood', axis=1)

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(manhattan_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 

array([0, 2, 0, 2, 5, 2, 2, 4, 2, 0], dtype=int32)

Add the cluster labels to the neighborhood data

In [48]:
# add clustering labels
manhattan_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

In [49]:
manhattan_merged = manhattan_data

# join the sorted venues (with cluster labels) onto manhattan_data, which holds latitude/longitude for each neighborhood
manhattan_merged = manhattan_merged.join(manhattan_venues_sorted.set_index('Neighbourhood'), on='Neighborhood')

manhattan_merged.head() # check the last columns!

Unnamed: 0,Borough,Neighborhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue,11th Most Common Venue,12th Most Common Venue,13th Most Common Venue,14th Most Common Venue,15th Most Common Venue
0,Manhattan,Marble Hill,40.876551,-73.91066,0,Gym,Sandwich Place,Discount Store,Coffee Shop,Yoga Studio,Donut Shop,Deli / Bodega,Seafood Restaurant,Kids Store,Video Game Store,Pizza Place,Pharmacy,Steakhouse,Supplement Shop,Ice Cream Shop
1,Manhattan,Chinatown,40.715618,-73.994279,5,Chinese Restaurant,Bakery,Cocktail Bar,American Restaurant,Dessert Shop,Spa,Salon / Barbershop,Optical Shop,Coffee Shop,Asian Restaurant,Vietnamese Restaurant,Dumpling Restaurant,Ice Cream Shop,Sandwich Place,Bar
2,Manhattan,Washington Heights,40.851903,-73.9369,0,Café,Bakery,Grocery Store,Mobile Phone Shop,Sandwich Place,Latin American Restaurant,Chinese Restaurant,Bank,Donut Shop,Gym,Coffee Shop,Supermarket,New American Restaurant,Park,Supplement Shop
3,Manhattan,Inwood,40.867684,-73.92121,4,Mexican Restaurant,Café,Restaurant,Lounge,Bakery,Wine Bar,Caribbean Restaurant,Park,Frozen Yogurt Shop,Deli / Bodega,Pizza Place,Chinese Restaurant,Juice Bar,Dog Run,Latin American Restaurant
4,Manhattan,Hamilton Heights,40.823604,-73.949688,4,Pizza Place,Coffee Shop,Café,Mexican Restaurant,Deli / Bodega,Sushi Restaurant,Park,Cocktail Bar,Liquor Store,Caribbean Restaurant,Chinese Restaurant,Sandwich Place,Latin American Restaurant,Bakery,Yoga Studio


Find the cluster number for "Lincoln Square"

In [50]:
Target_Cluster_num = manhattan_merged[manhattan_merged['Neighborhood']=='Lincoln Square']['Cluster Labels'].iloc[0]
print('Lincoln Square has cluster number {}'.format(Target_Cluster_num))

Lincoln Square has cluster number 2


<b>Apply the built k-means model to Toronto data</b>

Apply the model

In [51]:
toronto_grouped_clustering = toronto_grouped.drop('Neighbourhood', axis=1)
toronto_cluster = kmeans.predict(toronto_grouped_clustering)

In [52]:
type(toronto_cluster)

numpy.ndarray
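
To make explicit what `kmeans.predict` does here: it does not refit anything, it assigns each new row to the nearest centroid learned from the training data. The sketch below checks this on a tiny made-up dataset (`train` and `new` are hypothetical arrays, not the notebook's data). One consequence is that every Toronto neighborhood receives one of the Manhattan cluster labels, even if it is far from all of the centroids.

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical training data with two obvious groups, and two new points.
train = np.array([[0.0, 0.0], [0.1, 0.0], [1.0, 1.0], [0.9, 1.0]])
new = np.array([[0.05, 0.1], [0.95, 0.9]])

model = KMeans(n_clusters=2, n_init=10, random_state=0).fit(train)
predicted = model.predict(new)

# Manual equivalent: Euclidean distance to each learned centroid, take the closest.
distances = np.linalg.norm(new[:, None, :] - model.cluster_centers_[None, :, :], axis=2)
manual = distances.argmin(axis=1)

print(predicted, manual)
```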

Insert the cluster labels back into the Toronto data

In [53]:
# add clustering labels
toronto_venues_sorted.insert(0, 'Cluster Labels', toronto_cluster)

In [54]:
toronto_venues_sorted.head()

Unnamed: 0,Cluster Labels,Neighbourhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue,11th Most Common Venue,12th Most Common Venue,13th Most Common Venue,14th Most Common Venue,15th Most Common Venue
0,0,Adelaide,Coffee Shop,Café,Hotel,Gym,Restaurant,Bar,Thai Restaurant,Clothing Store,American Restaurant,Steakhouse,Cosmetics Shop,Concert Hall,Pizza Place,Sushi Restaurant,Bookstore
1,3,Agincourt North,Playground,Bakery,Park,Tailor Shop,Antique Shop,Indian Restaurant,Thrift / Vintage Store,Ramen Restaurant,Health & Beauty Service,Fried Chicken Joint,Record Shop,Fish Market,Dessert Shop,Moroccan Restaurant,Garden
2,4,Albion Gardens,Pizza Place,Grocery Store,Beer Store,Fried Chicken Joint,Sandwich Place,Pharmacy,Liquor Store,Fast Food Restaurant,Latin American Restaurant,Indian Restaurant,Thrift / Vintage Store,Ramen Restaurant,Health & Beauty Service,Shopping Mall,Record Shop
3,0,Bathurst Quay,Rental Car Location,Harbor / Marina,Boat or Ferry,Sculpture Garden,Bar,Coffee Shop,Boutique,Dessert Shop,Indian Restaurant,Thrift / Vintage Store,Ramen Restaurant,Health & Beauty Service,Fried Chicken Joint,Record Shop,Fish Market
4,4,Beaumond Heights,Pizza Place,Grocery Store,Beer Store,Fried Chicken Joint,Sandwich Place,Pharmacy,Liquor Store,Fast Food Restaurant,Latin American Restaurant,Indian Restaurant,Thrift / Vintage Store,Ramen Restaurant,Health & Beauty Service,Shopping Mall,Record Shop


Merge with the original (non-modeling) Toronto geo data

In [55]:
toronto_merged = df_tor_bor_geo

# join the sorted Toronto venues (with cluster labels) onto the geo data, which holds latitude/longitude for each neighbourhood
toronto_merged = toronto_merged.join(toronto_venues_sorted.set_index('Neighbourhood'), on='Neighbourhood')

toronto_merged.head() # check the last columns!

Unnamed: 0,Neighbourhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue,11th Most Common Venue,12th Most Common Venue,13th Most Common Venue,14th Most Common Venue,15th Most Common Venue
2,Parkwoods,43.753259,-79.329656,3.0,Park,Food & Drink Shop,Antique Shop,Rental Car Location,Indian Restaurant,Thrift / Vintage Store,Ramen Restaurant,Health & Beauty Service,Fried Chicken Joint,Record Shop,Fish Market,Dessert Shop,Moroccan Restaurant,Garden,Tailor Shop
3,Victoria Village,43.725882,-79.315572,2.0,Pizza Place,French Restaurant,Coffee Shop,Tailor Shop,Moroccan Restaurant,Rental Car Location,Indian Restaurant,Thrift / Vintage Store,Ramen Restaurant,Health & Beauty Service,Fried Chicken Joint,Record Shop,Fish Market,Dessert Shop,Garden
4,Regent Park,43.65426,-79.360636,0.0,Coffee Shop,Pub,Bakery,Park,Café,Breakfast Spot,Theater,Hotel,Art Gallery,Performing Arts Venue,Beer Store,Antique Shop,Historic Site,Dessert Shop,Bank
4,Harbourfront,43.65426,-79.360636,0.0,Coffee Shop,Pub,Bakery,Park,Café,Breakfast Spot,Theater,Hotel,Art Gallery,Performing Arts Venue,Beer Store,Antique Shop,Historic Site,Dessert Shop,Bank
5,Lawrence Manor,43.718518,-79.464763,2.0,Clothing Store,Accessories Store,Women's Store,Gift Shop,Coffee Shop,Event Space,Vietnamese Restaurant,Furniture / Home Store,Boutique,Fried Chicken Joint,Record Shop,Dessert Shop,Moroccan Restaurant,Health & Beauty Service,Ramen Restaurant


<b> List all Toronto neighborhoods with cluster number 2</b>

In [56]:
Target_neighbourhoods = toronto_merged.loc[toronto_merged['Cluster Labels']==float(Target_Cluster_num)]
Target_neighbourhoods.shape

(50, 19)

In [57]:
print('There are {} neighborhoods similar to "Lincoln Square"'.format(Target_neighbourhoods.shape[0]))

There are 50 neighborhoods similar to "Lincoln Square"


Print names of those neighborhoods

In [58]:
tgt_tor_bor = Target_neighbourhoods[['Neighbourhood','Latitude', 'Longitude']].reset_index(drop=True)
tgt_tor_bor

Unnamed: 0,Neighbourhood,Latitude,Longitude
0,Victoria Village,43.725882,-79.315572
1,Lawrence Manor,43.718518,-79.464763
2,Lawrence Heights,43.718518,-79.464763
3,Garden District,43.657162,-79.378937
4,Ryerson,43.657162,-79.378937
5,West Deane Park,43.650943,-79.554724
6,Princess Gardens,43.650943,-79.554724
7,Martin Grove,43.650943,-79.554724
8,Islington,43.650943,-79.554724
9,Cloverdale,43.650943,-79.554724


<b> Put those neighborhoods on a map</b>

In [59]:
tor_lat = 43.780918
tor_long = -79.421371
# create map of Toronto using latitude and longitude values
map_toronto  = folium.Map(location=[tor_lat, tor_long], zoom_start=11)

# add markers to map
for lat, lng, label in zip(tgt_tor_bor['Latitude'], tgt_tor_bor['Longitude'], tgt_tor_bor['Neighbourhood']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_toronto)  
    
map_toronto

The map shows the neighborhoods that share similar characteristics with "Lincoln Square", which should give the company the best chance of opening a successful new store. As noted earlier, some neighborhoods share the same postal code, so while there are 50 neighborhoods, the number of distinct locations, as differentiated by latitude and longitude, is only about 20.
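
The number of distinct locations can be checked directly rather than estimated. The sketch below uses only the first six rows of the table printed above (so its count applies to that sample, not to the full result); on the full `tgt_tor_bor` frame the same two calls would give the exact figure.

```python
import pandas as pd

# First six rows of the target-neighbourhood table shown earlier.
sample = pd.DataFrame({
    "Neighbourhood": ["Victoria Village", "Lawrence Manor", "Lawrence Heights",
                      "Garden District", "Ryerson", "West Deane Park"],
    "Latitude": [43.725882, 43.718518, 43.718518, 43.657162, 43.657162, 43.650943],
    "Longitude": [-79.315572, -79.464763, -79.464763, -79.378937, -79.378937, -79.554724],
})

# One row per distinct coordinate pair.
distinct = sample.drop_duplicates(subset=["Latitude", "Longitude"])
n_distinct = len(distinct)

# Neighbourhood names that share each coordinate pair.
names_by_location = (
    sample.groupby(["Latitude", "Longitude"])["Neighbourhood"]
    .agg(", ".join)
    .reset_index()
)

print(n_distinct)
print(names_by_location)
```

Grouping the names by location also gives a more readable popup label for the map, since each marker then lists every neighbourhood that shares that point.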