# Requirements:

For this week, you will be required to submit the following:

1. A description of the problem and a discussion of the background. (15 marks)
2. A description of the data and how it will be used to solve the problem. (15 marks)

# Project Description

I want to open a new grocery shop in New York City, and I want to find the best place for it.

We will use Foursquare data about venues in New York City and k-means clustering to identify locations with a high concentration of store venues. Those locations will be the candidates for opening the new shop.

# Import Data for venues in NY

In [2]:
# Import Libraries

import numpy as np # library to handle data in a vectorized manner
import wget

import pandas as pd # library for data analysis
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import json # library to handle JSON files

#!conda install -c conda-forge geopy --yes # uncomment this line if you haven't completed the Foursquare API lab
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

import requests # library to handle requests
from pandas import json_normalize # transform JSON file into a pandas dataframe

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

#!conda install -c conda-forge folium=0.5.0 --yes # uncomment this line if you haven't completed the Foursquare API lab
import folium # map rendering library

print('Libraries imported.')

Libraries imported.


## Download and Explore Dataset

New York City has a total of 5 boroughs and 306 neighborhoods. In order to segment the neighborhoods and explore them, we need a dataset that contains the 5 boroughs and the neighborhoods that exist in each borough, as well as the latitude and longitude coordinates of each neighborhood.

In [3]:
#!wget -q -o'newyork_data.json' https://cocl.us/new_york_dataset
#url  = 'https://cocl.us/new_york_dataset'
#file = wget.download(url)
print('Data downloaded!')

Data downloaded!


In [4]:
with open('new_york_dataset') as json_data:
    newyork_data = json.load(json_data)

In [5]:
newyork_data

{'type': 'FeatureCollection',
 'totalFeatures': 306,
 'features': [{'type': 'Feature',
   'id': 'nyu_2451_34572.1',
   'geometry': {'type': 'Point',
    'coordinates': [-73.84720052054902, 40.89470517661]},
   'geometry_name': 'geom',
   'properties': {'name': 'Wakefield',
    'stacked': 1,
    'annoline1': 'Wakefield',
    'annoline2': None,
    'annoline3': None,
    'annoangle': 0.0,
    'borough': 'Bronx',
    'bbox': [-73.84720052054902,
     40.89470517661,
     -73.84720052054902,
     40.89470517661]}},
  {'type': 'Feature',
   'id': 'nyu_2451_34572.2',
   'geometry': {'type': 'Point',
    'coordinates': [-73.82993910812398, 40.87429419303012]},
   'geometry_name': 'geom',
   'properties': {'name': 'Co-op City',
    'stacked': 2,
    'annoline1': 'Co-op',
    'annoline2': 'City',
    'annoline3': None,
    'annoangle': 0.0,
    'borough': 'Bronx',
    'bbox': [-73.82993910812398,
     40.87429419303012,
     -73.82993910812398,
     40.87429419303012]}},
  {'type': 'Feature',
 

In [6]:
# Take the list of neighborhoods from the 'features' key of the JSON file
neighborhoods_data = newyork_data['features']
neighborhoods_data[0]

{'type': 'Feature',
 'id': 'nyu_2451_34572.1',
 'geometry': {'type': 'Point',
  'coordinates': [-73.84720052054902, 40.89470517661]},
 'geometry_name': 'geom',
 'properties': {'name': 'Wakefield',
  'stacked': 1,
  'annoline1': 'Wakefield',
  'annoline2': None,
  'annoline3': None,
  'annoangle': 0.0,
  'borough': 'Bronx',
  'bbox': [-73.84720052054902,
   40.89470517661,
   -73.84720052054902,
   40.89470517661]}}
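
The fields needed later sit in two places in each feature: `properties` holds the borough and the name, and `geometry.coordinates` holds the position in GeoJSON order, longitude first. A minimal sketch of pulling one flat record out of the feature shown above (the helper name `feature_to_row` is hypothetical, not part of the notebook):

```python
# Hypothetical helper: flatten one GeoJSON feature into a flat record.
def feature_to_row(feature):
    lon, lat = feature['geometry']['coordinates']  # GeoJSON order: [lon, lat]
    props = feature['properties']
    return {
        'Borough': props['borough'],
        'Neighborhood': props['name'],
        'Latitude': lat,
        'Longitude': lon,
    }

# Trimmed copy of the first feature printed above.
sample_feature = {
    'type': 'Feature',
    'geometry': {'type': 'Point',
                 'coordinates': [-73.84720052054902, 40.89470517661]},
    'properties': {'name': 'Wakefield', 'borough': 'Bronx'},
}

row = feature_to_row(sample_feature)
print(row)
```

Unpacking the pair by name makes it harder to swap latitude and longitude by accident.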

In [7]:
# Transform the data into pandas df

# define the dataframe columns
column_names = ['Borough', 'Neighborhood', 'Latitude', 'Longitude'] 

# instantiate the dataframe
neighborhoods = pd.DataFrame(columns=column_names)
neighborhoods

Unnamed: 0,Borough,Neighborhood,Latitude,Longitude


In [8]:
# loop through the data and collect one row per neighborhood
rows = []
for data in neighborhoods_data:
    borough = data['properties']['borough']
    neighborhood_name = data['properties']['name']

    # GeoJSON stores coordinates as [longitude, latitude]
    neighborhood_latlon = data['geometry']['coordinates']
    neighborhood_lat = neighborhood_latlon[1]
    neighborhood_lon = neighborhood_latlon[0]

    rows.append({'Borough': borough,
                 'Neighborhood': neighborhood_name,
                 'Latitude': neighborhood_lat,
                 'Longitude': neighborhood_lon})

# build the dataframe in one step (DataFrame.append is not available in pandas 2.x)
neighborhoods = pd.DataFrame(rows, columns=column_names)

In [9]:
neighborhoods.head()

Unnamed: 0,Borough,Neighborhood,Latitude,Longitude
0,Bronx,Wakefield,40.894705,-73.847201
1,Bronx,Co-op City,40.874294,-73.829939
2,Bronx,Eastchester,40.887556,-73.827806
3,Bronx,Fieldston,40.895437,-73.905643
4,Bronx,Riverdale,40.890834,-73.912585


In [10]:
print('The dataframe has {} boroughs and {} neighborhoods.'.format(
        len(neighborhoods['Borough'].unique()),
        neighborhoods.shape[0]
    )
)

The dataframe has 5 boroughs and 306 neighborhoods.
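
Later cells group venues by the `Neighborhood` column alone, which would silently merge any neighborhoods that share a name across boroughs. Whether that happens in this dataset is not shown above, so the following is only a sketch of how one might check, using a small made-up dataframe (the names in `toy` are illustrative, not taken from the data):

```python
import pandas as pd

# Made-up rows: 'Hilltop' appears in two different boroughs.
toy = pd.DataFrame({
    'Borough': ['A', 'B', 'B'],
    'Neighborhood': ['Hilltop', 'Hilltop', 'Riverside'],
})

# Count distinct boroughs per neighborhood name; more than one means the name is ambiguous.
boroughs_per_name = toy.groupby('Neighborhood')['Borough'].nunique()
ambiguous = boroughs_per_name[boroughs_per_name > 1].index.tolist()
print(ambiguous)
```

If the real data turned out to contain such names, grouping on both `Borough` and `Neighborhood` would keep them apart.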


### Use the geopy library to get the latitude and longitude of NYC

In [11]:
address = 'New York City, NY'

geolocator = Nominatim(user_agent="ny_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of New York City are {}, {}.'.format(latitude, longitude))

The geographical coordinates of New York City are 40.7127281, -74.0060152.


Create map of NY and its neighborhoods

In [12]:
# create map of New York using latitude and longitude values
map_newyork = folium.Map(location=[latitude, longitude], zoom_start=10)

# add markers to map
for lat, lng, borough, neighborhood in zip(neighborhoods['Latitude'], neighborhoods['Longitude'], neighborhoods['Borough'], neighborhoods['Neighborhood']):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_newyork)  
    
map_newyork

### Define Foursquare Credentials and Version

In [13]:
CLIENT_ID = 'L5ZVQIFSZXJWKXR3131RTBYXLULHLMZB0M1QXDEMUNTMJWKD' # your Foursquare ID
CLIENT_SECRET = '1IAZIRFGEEACMPJ2XX0VIEQX4VNQWLTGSXQANIZCVL4UNFVW' # your Foursquare Secret
VERSION = '20191212' # Foursquare API version
LIMIT = 100

print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)

Your credentials:
CLIENT_ID: L5ZVQIFSZXJWKXR3131RTBYXLULHLMZB0M1QXDEMUNTMJWKD
CLIENT_SECRET:1IAZIRFGEEACMPJ2XX0VIEQX4VNQWLTGSXQANIZCVL4UNFVW
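
Printing the secret puts it into the saved notebook output. A common alternative, sketched here under the assumption that the credentials are exported as environment variables (the names `FOURSQUARE_CLIENT_ID` and `FOURSQUARE_CLIENT_SECRET` are hypothetical), is to read them at run time and display only a masked form:

```python
import os

def mask(value, visible=4):
    """Keep the first `visible` characters and replace the rest with asterisks."""
    if not value:
        return '<not set>'
    return value[:visible] + '*' * max(len(value) - visible, 0)

# Hypothetical variable names; set them in the shell before starting the notebook.
client_id = os.environ.get('FOURSQUARE_CLIENT_ID', '')
client_secret = os.environ.get('FOURSQUARE_CLIENT_SECRET', '')

print('CLIENT_ID: ' + mask(client_id))
print('CLIENT_SECRET: ' + mask(client_secret))
```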


In [14]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)
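
`getNearbyVenues` indexes straight into `["response"]['groups'][0]['items']` and `categories[0]`, so a single response without groups, or a venue without a category, raises an error and aborts the whole loop. A more defensive parsing step might look like the sketch below. It assumes the same response layout that the function above relies on; the helper name `parse_explore_items` and the sample payload are made up for illustration:

```python
def parse_explore_items(payload, name, lat, lng):
    """Return one tuple per venue, tolerating missing groups or categories."""
    groups = payload.get('response', {}).get('groups', [])
    items = groups[0].get('items', []) if groups else []
    rows = []
    for item in items:
        venue = item['venue']
        categories = venue.get('categories', [])
        # Use None when a venue carries no category instead of raising IndexError.
        category = categories[0]['name'] if categories else None
        rows.append((name, lat, lng,
                     venue['name'],
                     venue['location']['lat'],
                     venue['location']['lng'],
                     category))
    return rows

# Made-up payload in the layout assumed above.
sample_payload = {'response': {'groups': [{'items': [
    {'venue': {'name': 'Rite Aid',
               'location': {'lat': 40.896649, 'lng': -73.844846},
               'categories': [{'name': 'Pharmacy'}]}},
    {'venue': {'name': 'Unlabelled Venue',
               'location': {'lat': 40.89, 'lng': -73.84},
               'categories': []}},
]}]}}

rows = parse_explore_items(sample_payload, 'Wakefield', 40.894705, -73.847201)
empty = parse_explore_items({'response': {}}, 'Wakefield', 40.894705, -73.847201)
print(rows)
print(empty)
```

Keeping the parsing separate from the HTTP call also makes it possible to test the logic without spending API quota.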

Request all venues within a 500 m radius of each neighborhood in NY

In [15]:
newyork_venues = getNearbyVenues(names=neighborhoods['Neighborhood'],
                                   latitudes=neighborhoods['Latitude'],
                                   longitudes=neighborhoods['Longitude']
                                  )

Wakefield
Co-op City
Eastchester
Fieldston
Riverdale
Kingsbridge
Marble Hill
Woodlawn
Norwood
Williamsbridge
Baychester
Pelham Parkway
City Island
Bedford Park
University Heights
Morris Heights
Fordham
East Tremont
West Farms
High  Bridge
Melrose
Mott Haven
Port Morris
Longwood
Hunts Point
Morrisania
Soundview
Clason Point
Throgs Neck
Country Club
Parkchester
Westchester Square
Van Nest
Morris Park
Belmont
Spuyten Duyvil
North Riverdale
Pelham Bay
Schuylerville
Edgewater Park
Castle Hill
Olinville
Pelham Gardens
Concourse
Unionport
Edenwald
Bay Ridge
Bensonhurst
Sunset Park
Greenpoint
Gravesend
Brighton Beach
Sheepshead Bay
Manhattan Terrace
Flatbush
Crown Heights
East Flatbush
Kensington
Windsor Terrace
Prospect Heights
Brownsville
Williamsburg
Bushwick
Bedford Stuyvesant
Brooklyn Heights
Cobble Hill
Carroll Gardens
Red Hook
Gowanus
Fort Greene
Park Slope
Cypress Hills
East New York
Starrett City
Canarsie
Flatlands
Mill Island
Manhattan Beach
Coney Island
Bath Beach
Borough Park
Dyker

In [16]:
newyork_venues.head()

Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Wakefield,40.894705,-73.847201,Lollipops Gelato,40.894123,-73.845892,Dessert Shop
1,Wakefield,40.894705,-73.847201,Rite Aid,40.896649,-73.844846,Pharmacy
2,Wakefield,40.894705,-73.847201,Carvel Ice Cream,40.890487,-73.848568,Ice Cream Shop
3,Wakefield,40.894705,-73.847201,Dunkin',40.890459,-73.849089,Donut Shop
4,Wakefield,40.894705,-73.847201,Shell,40.894187,-73.845862,Gas Station


In [17]:
newyork_venues.shape

(10245, 7)

In [18]:
newyork_venues.groupby('Neighborhood').count()

Unnamed: 0_level_0,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Neighborhood,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
Allerton,33,33,33,33,33,33
Annadale,11,11,11,11,11,11
Arden Heights,6,6,6,6,6,6
Arlington,5,5,5,5,5,5
Arrochar,24,24,24,24,24,24
Arverne,18,18,18,18,18,18
Astoria,100,100,100,100,100,100
Astoria Heights,13,13,13,13,13,13
Auburndale,19,19,19,19,19,19
Bath Beach,53,53,53,53,53,53


In [19]:
# Unique categories
print('There are {} unique categories.'.format(len(newyork_venues['Venue Category'].unique())))

There are 433 unique categories.


## Analyze 

What are the venue categories in NY?

In [20]:
newyork_venues['Venue Category'].unique()

array(['Dessert Shop', 'Pharmacy', 'Ice Cream Shop', 'Donut Shop',
       'Gas Station', 'Caribbean Restaurant', 'Sandwich Place',
       'Laundromat', 'Discount Store', 'Mattress Store', 'Pizza Place',
       'Bagel Shop', 'Fast Food Restaurant', 'Grocery Store',
       'Restaurant', 'Baseball Field', 'Chinese Restaurant',
       'Salon / Barbershop', 'Gift Shop', 'Park', 'Bus Station',
       'Accessories Store', 'Diner', 'Seafood Restaurant',
       'Deli / Bodega', 'Bowling Alley', 'Bus Stop', 'Platform',
       'Metro Station', 'Convenience Store', 'Cosmetics Shop', 'Plaza',
       'River', 'Bank', 'Food Truck', 'Home Service', 'Gym',
       'Gourmet Shop', 'Latin American Restaurant', 'Pub', 'Beer Bar',
       'Warehouse Store', 'Spanish Restaurant', 'Burger Joint',
       'Coffee Shop', 'Mexican Restaurant', 'Bar', 'Wings Joint', 'Trail',
       'Supermarket', 'Thrift / Vintage Store', 'Bakery',
       'Breakfast Spot', 'Candy Store', 'Café', 'Rental Car Location',
       'Fried
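
With several hundred categories, the raw `unique()` listing is hard to scan. Counting venues per category gives a quicker picture of which categories dominate. A minimal sketch on a made-up frame that mimics the `Venue Category` column:

```python
import pandas as pd

# Made-up rows standing in for newyork_venues.
toy_venues = pd.DataFrame({'Venue Category': [
    'Pizza Place', 'Grocery Store', 'Pizza Place',
    'Pharmacy', 'Grocery Store', 'Pizza Place',
]})

# value_counts sorts categories from most to least frequent.
category_counts = toy_venues['Venue Category'].value_counts()
print(category_counts.head(2))
```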

Find all venues whose category contains "Store"

In [21]:
ny_allstore = newyork_venues[newyork_venues['Venue Category'].str.contains('Store')].reset_index(drop=True)
ny_allstore.head()

Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Co-op City,40.874294,-73.829939,Dollar Tree,40.870125,-73.828989,Discount Store
1,Co-op City,40.874294,-73.829939,Mattress Firm,40.872234,-73.828607,Mattress Store
2,Co-op City,40.874294,-73.829939,Food Universe Marketplace,40.87674,-73.82898,Grocery Store
3,Co-op City,40.874294,-73.829939,Jimmy Jazz,40.873286,-73.82437,Accessories Store
4,Eastchester,40.887556,-73.827806,Adil Newsstand & Grocery,40.888433,-73.831277,Convenience Store


In [22]:
ny_allstore['Venue Category'].unique()

array(['Discount Store', 'Mattress Store', 'Grocery Store',
       'Accessories Store', 'Convenience Store', 'Warehouse Store',
       'Thrift / Vintage Store', 'Candy Store', 'Shipping Store',
       'Pet Store', 'Liquor Store', 'Department Store', 'Kids Store',
       "Men's Store", 'Electronics Store', 'Shoe Store', 'Clothing Store',
       'Video Game Store', 'Outlet Store', 'Music Store',
       'Paper / Office Supplies Store', 'Furniture / Home Store',
       'Video Store', "Women's Store", 'Lingerie Store',
       'Toy / Game Store', 'Arts & Crafts Store', 'Beer Store',
       'Vape Store', 'Fruit & Vegetable Store', 'Hardware Store',
       'Jewelry Store', 'Herbs & Spices Store', 'Stationery Store',
       'Big Box Store', 'Kitchen Supply Store', 'Health Food Store',
       'Frame Store', 'Baby Store', 'Camera Store', 'Leather Goods Store'],
      dtype=object)

My shop will sell goods similar to those of many existing stores, so it makes sense to consider only competing stores.

In [25]:
#store_list = ['Grocery Store', 'Convenience Store', 'Liquor Store', 
#              'Fruit & Vegetable Store', 'Paper / Office Supplies Store',
#            'Kitchen Supply Store',  'Outdoor Supply Store']

In [None]:
#newyork_venues[newyork_venues['Venue Category'].isin(store_list)]
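
Restricting the analysis to competing stores comes down to a membership test on `Venue Category`. `Series.isin` keeps only the rows whose category appears in a list; comparing the column to a list with `==` does not do that. A sketch on a few rows copied from the tables above, with a shortened, illustrative `store_list`:

```python
import pandas as pd

# Shortened, illustrative list of competing categories.
store_list = ['Grocery Store', 'Convenience Store', 'Fruit & Vegetable Store']

toy_venues = pd.DataFrame({
    'Venue': ['Food Universe Marketplace', 'Mattress Firm', 'Adil Newsstand & Grocery'],
    'Venue Category': ['Grocery Store', 'Mattress Store', 'Convenience Store'],
})

# Keep only rows whose category is one of the competing categories.
competitors = toy_venues[toy_venues['Venue Category'].isin(store_list)].reset_index(drop=True)
print(competitors)
```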

In [26]:
ny_allstore.groupby('Neighborhood').count()

Unnamed: 0_level_0,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Neighborhood,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
Allerton,3,3,3,3,3,3
Arrochar,2,2,2,2,2,2
Astoria,3,3,3,3,3,3
Auburndale,4,4,4,4,4,4
Bath Beach,5,5,5,5,5,5
Battery Park City,14,14,14,14,14,14
Bay Ridge,10,10,10,10,10,10
Bay Terrace,18,18,18,18,18,18
Baychester,7,7,7,7,7,7
Bayside,5,5,5,5,5,5


## Analyze Neighborhood

In [27]:
# one hot encoding
ny_allstore_onehot = pd.get_dummies(ny_allstore[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
ny_allstore_onehot['Neighborhood'] = ny_allstore['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [ny_allstore_onehot.columns[-1]] + list(ny_allstore_onehot.columns[:-1])
ny_allstore_onehot = ny_allstore_onehot[fixed_columns]

ny_allstore_onehot.head()

Unnamed: 0,Neighborhood,Accessories Store,Arts & Crafts Store,Baby Store,Beer Store,Big Box Store,Camera Store,Candy Store,Clothing Store,Convenience Store,Department Store,Discount Store,Electronics Store,Frame Store,Fruit & Vegetable Store,Furniture / Home Store,Grocery Store,Hardware Store,Health Food Store,Herbs & Spices Store,Jewelry Store,Kids Store,Kitchen Supply Store,Leather Goods Store,Lingerie Store,Liquor Store,Mattress Store,Men's Store,Music Store,Outlet Store,Paper / Office Supplies Store,Pet Store,Shipping Store,Shoe Store,Stationery Store,Thrift / Vintage Store,Toy / Game Store,Vape Store,Video Game Store,Video Store,Warehouse Store,Women's Store
0,Co-op City,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,Co-op City,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,Co-op City,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3,Co-op City,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,Eastchester,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0


In [28]:
ny_allstore_onehot.shape

(918, 42)

In [29]:
ny_allstore_grouped = ny_allstore_onehot.groupby('Neighborhood').mean().reset_index()
ny_allstore_grouped

Unnamed: 0,Neighborhood,Accessories Store,Arts & Crafts Store,Baby Store,Beer Store,Big Box Store,Camera Store,Candy Store,Clothing Store,Convenience Store,Department Store,Discount Store,Electronics Store,Frame Store,Fruit & Vegetable Store,Furniture / Home Store,Grocery Store,Hardware Store,Health Food Store,Herbs & Spices Store,Jewelry Store,Kids Store,Kitchen Supply Store,Leather Goods Store,Lingerie Store,Liquor Store,Mattress Store,Men's Store,Music Store,Outlet Store,Paper / Office Supplies Store,Pet Store,Shipping Store,Shoe Store,Stationery Store,Thrift / Vintage Store,Toy / Game Store,Vape Store,Video Game Store,Video Store,Warehouse Store,Women's Store
0,Allerton,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.333333,0.333333,0.0,0.0,0.0,0.0,0.333333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,Arrochar,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,Astoria,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.666667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.333333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,Auburndale,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0
4,Bath Beach,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.2,0.0,0.2
5,Battery Park City,0.0,0.0,0.0,0.0,0.0,0.0,0.071429,0.214286,0.0,0.071429,0.0,0.071429,0.0,0.0,0.0,0.142857,0.0,0.0,0.0,0.071429,0.0,0.0,0.0,0.071429,0.0,0.0,0.0,0.0,0.0,0.0,0.071429,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.214286
6,Bay Ridge,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.1,0.0,0.1,0.0,0.1,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.1,0.0,0.0,0.1,0.0,0.1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.1,0.0,0.1,0.0,0.0,0.0
7,Bay Terrace,0.055556,0.0,0.0,0.0,0.0,0.0,0.0,0.222222,0.0,0.0,0.0,0.0,0.0,0.0,0.055556,0.0,0.0,0.0,0.0,0.0,0.111111,0.055556,0.0,0.111111,0.0,0.0,0.055556,0.0,0.0,0.0,0.0,0.055556,0.111111,0.0,0.0,0.0,0.0,0.055556,0.0,0.0,0.111111
8,Baychester,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.142857,0.0,0.285714,0.142857,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.142857,0.142857,0.0,0.0,0.0,0.142857,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
9,Bayside,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
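
Taking the mean of the one-hot columns per neighborhood turns counts into shares: each value is the fraction of that neighborhood's stores that fall in the category, so every row sums to 1. A small sketch with made-up rows shows the arithmetic:

```python
import pandas as pd

# Made-up data: neighborhood A has three stores, B has one.
toy = pd.DataFrame({
    'Neighborhood': ['A', 'A', 'A', 'B'],
    'Venue Category': ['Grocery Store', 'Grocery Store', 'Liquor Store', 'Liquor Store'],
})

# dtype=int keeps the indicator columns numeric regardless of pandas version.
onehot = pd.get_dummies(toy[['Venue Category']], prefix="", prefix_sep="", dtype=int)
onehot['Neighborhood'] = toy['Neighborhood']

# Mean of 0/1 indicators = share of stores in each category.
shares = onehot.groupby('Neighborhood').mean()
print(shares)
```

One consequence is that a neighborhood with a single store and one with fifty stores of the same type look identical in this table, since the store count is normalized away.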


In [31]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

In [32]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = ny_allstore_grouped['Neighborhood']

for ind in np.arange(ny_allstore_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(ny_allstore_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted.head()

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Allerton,Department Store,Grocery Store,Discount Store,Jewelry Store,Health Food Store,Hardware Store,Furniture / Home Store,Fruit & Vegetable Store,Frame Store,Electronics Store
1,Arrochar,Liquor Store,Women's Store,Discount Store,Health Food Store,Hardware Store,Grocery Store,Furniture / Home Store,Fruit & Vegetable Store,Frame Store,Electronics Store
2,Astoria,Grocery Store,Liquor Store,Women's Store,Discount Store,Health Food Store,Hardware Store,Furniture / Home Store,Fruit & Vegetable Store,Frame Store,Electronics Store
3,Auburndale,Pet Store,Furniture / Home Store,Toy / Game Store,Discount Store,Convenience Store,Hardware Store,Grocery Store,Fruit & Vegetable Store,Frame Store,Electronics Store
4,Bath Beach,Women's Store,Video Store,Video Game Store,Liquor Store,Clothing Store,Discount Store,Hardware Store,Grocery Store,Furniture / Home Store,Fruit & Vegetable Store
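
In the table above, a neighborhood with only a few store types still gets ten ranked entries; the tail is filled with categories whose frequency is zero, in whatever order the sort leaves them. One way to avoid reading meaning into those fillers is to blank them out. The variant below is a sketch, not the notebook's function, and `top_nonzero_venues` is a hypothetical name:

```python
import pandas as pd

def top_nonzero_venues(row, num_top_venues):
    """Rank categories by frequency, padding with None once frequencies hit zero."""
    freqs = row.iloc[1:].astype(float)  # skip the 'Neighborhood' entry
    ranked = freqs[freqs > 0].sort_values(ascending=False)
    names = list(ranked.index[:num_top_venues])
    return names + [None] * (num_top_venues - len(names))

# Row shaped like one line of ny_allstore_grouped, reduced to three categories.
toy_row = pd.Series({
    'Neighborhood': 'Astoria',
    'Grocery Store': 0.666667,
    'Hardware Store': 0.0,
    'Liquor Store': 0.333333,
})

top = top_nonzero_venues(toy_row, 3)
print(top)
```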


# Cluster Neighborhoods

In [30]:
# set number of clusters
kclusters = 5

ny_allstore_grouped_clustering = ny_allstore_grouped.drop(columns='Neighborhood')

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(ny_allstore_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 

array([0, 1, 3, 2, 2, 2, 2, 2, 2, 2])
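
The labels by themselves do not say what distinguishes one cluster from another. A quick way to interpret them is to attach the labels to the frequency table and compare cluster sizes and average category shares. The sketch below uses made-up frequencies and labels rather than the notebook's results:

```python
import pandas as pd

# Made-up frequency table in the shape of ny_allstore_grouped.
toy_grouped = pd.DataFrame({
    'Neighborhood': ['A', 'B', 'C', 'D'],
    'Grocery Store': [0.6, 0.5, 0.0, 0.1],
    'Clothing Store': [0.1, 0.2, 0.8, 0.7],
})
toy_labels = [0, 0, 1, 1]  # stand-in for kmeans.labels_

# Average category share per cluster.
summary = (toy_grouped.drop(columns='Neighborhood')
           .assign(Cluster=toy_labels)
           .groupby('Cluster')
           .mean())

# Number of neighborhoods in each cluster.
sizes = pd.Series(toy_labels).value_counts().sort_index()
print(summary)
print(sizes)
```

In this toy case cluster 0 is grocery-heavy and cluster 1 is clothing-heavy, which is the kind of reading the real clusters would need before choosing a location.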

In [33]:
# add clustering labels
neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

ny_allstore_merged = ny_allstore

# join the cluster labels and top-venue columns onto each store row by neighborhood
ny_allstore_merged = ny_allstore_merged.join(neighborhoods_venues_sorted.set_index('Neighborhood'), on='Neighborhood')

ny_allstore_merged.head() # check the last columns!

Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Co-op City,40.874294,-73.829939,Dollar Tree,40.870125,-73.828989,Discount Store,2,Accessories Store,Discount Store,Grocery Store,Mattress Store,Health Food Store,Hardware Store,Furniture / Home Store,Fruit & Vegetable Store,Frame Store,Electronics Store
1,Co-op City,40.874294,-73.829939,Mattress Firm,40.872234,-73.828607,Mattress Store,2,Accessories Store,Discount Store,Grocery Store,Mattress Store,Health Food Store,Hardware Store,Furniture / Home Store,Fruit & Vegetable Store,Frame Store,Electronics Store
2,Co-op City,40.874294,-73.829939,Food Universe Marketplace,40.87674,-73.82898,Grocery Store,2,Accessories Store,Discount Store,Grocery Store,Mattress Store,Health Food Store,Hardware Store,Furniture / Home Store,Fruit & Vegetable Store,Frame Store,Electronics Store
3,Co-op City,40.874294,-73.829939,Jimmy Jazz,40.873286,-73.82437,Accessories Store,2,Accessories Store,Discount Store,Grocery Store,Mattress Store,Health Food Store,Hardware Store,Furniture / Home Store,Fruit & Vegetable Store,Frame Store,Electronics Store
4,Eastchester,40.887556,-73.827806,Adil Newsstand & Grocery,40.888433,-73.831277,Convenience Store,4,Convenience Store,Women's Store,Discount Store,Health Food Store,Hardware Store,Grocery Store,Furniture / Home Store,Fruit & Vegetable Store,Frame Store,Electronics Store


In [35]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(ny_allstore_merged['Venue Latitude'], ny_allstore_merged['Venue Longitude'], ny_allstore_merged['Neighborhood'], ny_allstore_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters