# Battle of the Neighborhoods | Coursera Capstone Project
### Brooklyn, NY vs Scarborough, Toronto, CA
___

## Introduction 

#### The Horizon Corporation is looking to expand by opening a second North American office. The company's stakeholders have narrowed the new location down to either Brooklyn, New York, or Scarborough in Toronto, Canada. The Horizon Corporation is a mid-size company with approximately 350 employees and is growing rapidly. Because about 150 employees will relocate to open the new office, the company wants insight into the businesses and venues surrounding Brooklyn and Scarborough to determine which location will be most beneficial to its employees and to the company as a whole.

#### This project will use neighborhood data from each city to compare and contrast the two locations and determine which one is a good fit for the culture of the organization.



### Begin by importing the necessary libraries and modules.

In [2]:
import requests
import pandas as pd # library for data analysis
import numpy as np # library to handle data in a vectorized manner
import random # library for random number generation
import json # library to handle JSON files

!conda install -c conda-forge geopy --yes 
from geopy.geocoders import Nominatim # module to convert an address into latitude and longitude values

# libraries for displaying images
from IPython.display import Image 
from IPython.core.display import HTML 
    
# function for transforming a JSON file into a pandas dataframe
from pandas.io.json import json_normalize

!conda install -c conda-forge folium=0.5.0 --yes
import folium # plotting library

import matplotlib.pyplot as plt # plotting library
# backend for rendering plots within the browser
%matplotlib inline 

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

from sklearn.cluster import KMeans 
from sklearn.datasets import make_blobs # the samples_generator submodule was removed in newer scikit-learn releases

print('Folium installed')
print('Libraries imported.')

Collecting package metadata (current_repodata.json): done
Solving environment: done

## Package Plan ##

  environment location: /home/jupyterlab/conda/envs/python

  added / updated specs:
    - geopy


The following packages will be downloaded:

    package                    |            build
    ---------------------------|-----------------
    ca-certificates-2020.4.5.2 |       hecda079_0         147 KB  conda-forge
    certifi-2020.4.5.2         |   py36h9f0ad1d_0         152 KB  conda-forge
    geographiclib-1.50         |             py_0          34 KB  conda-forge
    geopy-1.22.0               |     pyh9f0ad1d_0          63 KB  conda-forge
    ------------------------------------------------------------
                                           Total:         395 KB

The following NEW packages will be INSTALLED:

  geographiclib      conda-forge/noarch::geographiclib-1.50-py_0
  geopy              conda-forge/noarch::geopy-1.22.0-pyh9f0ad1d_0

The following packages will b

In [3]:
pip install BeautifulSoup4

Collecting BeautifulSoup4
  Downloading https://files.pythonhosted.org/packages/66/25/ff030e2437265616a1e9b25ccc864e0371a0bc3adb7c5a404fd661c6f4f6/beautifulsoup4-4.9.1-py3-none-any.whl (115kB)
Collecting soupsieve>1.2 (from BeautifulSoup4)
  Downloading https://files.pythonhosted.org/packages/6f/8f/457f4a5390eeae1cc3aeab89deb7724c965be841ffca6cfca9197482e470/soupsieve-2.0.1-py3-none-any.whl
Installing collected packages: soupsieve, BeautifulSoup4
Successfully installed BeautifulSoup4-4.9.1 soupsieve-2.0.1
Note: you may need to restart the kernel to use updated packages.


In [4]:
pip install lxml

Collecting lxml
  Downloading https://files.pythonhosted.org/packages/55/6f/c87dffdd88a54dd26a3a9fef1d14b6384a9933c455c54ce3ca7d64a84c88/lxml-4.5.1-cp36-cp36m-manylinux1_x86_64.whl (5.5MB)
Installing collected packages: lxml
Successfully installed lxml-4.5.1
Note: you may need to restart the kernel to use updated packages.


## Segmenting and Clustering Neighborhoods in Toronto
#### Use data from the Toronto neighborhood Wikipedia page to segment, cluster, and explore neighborhoods in Toronto

## Step 1: Download and Arrange Toronto Dataset

The following dataset containing a list of postal codes, boroughs, and neighborhood names within the city of Toronto, CA is found at https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M.  

In [5]:
#Acquire Toronto Postal Code, Borough, and Neighborhood information:
table = pd.read_html('https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M', header = 0)
df_toronto = table[0]
df_toronto.head()

Unnamed: 0,Postal Code,Borough,Neighborhood
0,M1A,Not assigned,Not assigned
1,M2A,Not assigned,Not assigned
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,"Regent Park, Harbourfront"


### Transform the Data

In [93]:
df_toronto.rename(columns = {"Postal Code": "PostalCode", "Neighbourhood": "Neighborhood"}, inplace = True)

#Process the cells containing an assigned borough. Ignore cells with a borough that is "Not assigned".
df_toronto.drop(df_toronto[df_toronto.Borough == 'Not assigned'].index, inplace=True)

#Combine the neighborhoods that exist in one postal code
df_toronto = df_toronto.groupby(['PostalCode', 'Borough'])['Neighborhood'].apply(lambda x: ','.join(x)).reset_index()

#Set the neighborhood in row 85 (a hard-coded index) to Queen's Park, following the rule that an unassigned neighborhood takes its borough's name
df_toronto.loc[85,'Neighborhood'] = "Queen's Park"

print (df_toronto.shape)
df_toronto

(103, 3)


Unnamed: 0,PostalCode,Borough,Neighborhood
0,M1B,Scarborough,"Malvern, Rouge"
1,M1C,Scarborough,"Rouge Hill, Port Union, Highland Creek"
2,M1E,Scarborough,"Guildwood, Morningside, West Hill"
3,M1G,Scarborough,Woburn
4,M1H,Scarborough,Cedarbrae
...,...,...,...
98,M9N,York,Weston
99,M9P,Etobicoke,Westmount
100,M9R,Etobicoke,"Kingsview Village, St. Phillips, Martin Grove ..."
101,M9V,Etobicoke,"South Steeles, Silverstone, Humbergate, Jamest..."
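
The grouping step above can be illustrated on a toy frame. This is a minimal sketch, assuming a table with one row per (postal code, neighborhood) pair; the three rows below are made up for illustration and are not read from the Wikipedia table. It shows how `groupby` followed by a string join collapses several neighborhood rows into one row per postal code.

```python
import pandas as pd

# Hypothetical rows: one line per (postal code, neighborhood) pair.
toy = pd.DataFrame({
    "PostalCode": ["M1B", "M1B", "M1C"],
    "Borough": ["Scarborough", "Scarborough", "Scarborough"],
    "Neighborhood": ["Malvern", "Rouge", "Rouge Hill"],
})

# Collapse to one row per postal code, joining the neighborhood names.
toy_grouped = (
    toy.groupby(["PostalCode", "Borough"])["Neighborhood"]
    .agg(", ".join)
    .reset_index()
)
print(toy_grouped)
```

If the source table already lists one row per postal code, this step leaves the rows unchanged apart from sorting them by postal code.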


### Add latitude and longitude information to the neighborhood table

In [7]:
#Create a dataframe of the latitude and longitudes of the Toronto Neighborhoods
toronto_ll = pd.read_csv("http://cocl.us/Geospatial_data")
toronto_ll.head()

Unnamed: 0,Postal Code,Latitude,Longitude
0,M1B,43.806686,-79.194353
1,M1C,43.784535,-79.160497
2,M1E,43.763573,-79.188711
3,M1G,43.770992,-79.216917
4,M1H,43.773136,-79.239476


In [8]:
toronto_ll.rename(columns = {"Postal Code": "PostalCode"}, inplace = True)
toronto_ll.head()

Unnamed: 0,PostalCode,Latitude,Longitude
0,M1B,43.806686,-79.194353
1,M1C,43.784535,-79.160497
2,M1E,43.763573,-79.188711
3,M1G,43.770992,-79.216917
4,M1H,43.773136,-79.239476


In [9]:
toronto_ll.shape

(103, 3)

In [92]:
#Combine the latitude/longitude dataframe with the neighborhoods dataframe on their shared PostalCode column
toronto_data = pd.merge(df_toronto, toronto_ll, on="PostalCode")
toronto_data

Unnamed: 0,PostalCode,Borough,Neighborhood,Latitude,Longitude
0,M1B,Scarborough,"Malvern, Rouge",43.806686,-79.194353
1,M1C,Scarborough,"Rouge Hill, Port Union, Highland Creek",43.784535,-79.160497
2,M1E,Scarborough,"Guildwood, Morningside, West Hill",43.763573,-79.188711
3,M1G,Scarborough,Woburn,43.770992,-79.216917
4,M1H,Scarborough,Cedarbrae,43.773136,-79.239476
...,...,...,...,...,...
98,M9N,York,Weston,43.706876,-79.518188
99,M9P,Etobicoke,Westmount,43.696319,-79.532242
100,M9R,Etobicoke,"Kingsview Village, St. Phillips, Martin Grove ...",43.688905,-79.554724
101,M9V,Etobicoke,"South Steeles, Silverstone, Humbergate, Jamest...",43.739416,-79.588437
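
Because the merge silently drops postal codes that appear in only one table, it can be worth checking for unmatched keys. The following is a minimal sketch on made-up stand-ins for `df_toronto` and `toronto_ll` (the frame names `left` and `right` are hypothetical); it uses the `how`, `indicator`, and `validate` options of `pd.merge` to surface rows without coordinates.

```python
import pandas as pd

# Hypothetical stand-ins for the neighborhood and coordinate tables.
left = pd.DataFrame({
    "PostalCode": ["M1B", "M1C", "M1E"],
    "Neighborhood": ["Malvern, Rouge", "Rouge Hill", "Guildwood"],
})
right = pd.DataFrame({
    "PostalCode": ["M1B", "M1C"],
    "Latitude": [43.806686, 43.784535],
    "Longitude": [-79.194353, -79.160497],
})

# how="left" keeps every neighborhood row; indicator=True adds a "_merge"
# column; validate="one_to_one" raises if a postal code is duplicated.
checked = pd.merge(left, right, on="PostalCode", how="left",
                   indicator=True, validate="one_to_one")

# Rows that found no coordinates are flagged "left_only".
missing = checked.loc[checked["_merge"] == "left_only", "PostalCode"].tolist()
print(missing)
```

An empty `missing` list would indicate that every postal code received coordinates, which is what the matching (103, 3) shapes of the two notebook tables suggest.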


In [11]:
print('Toronto has {} boroughs and {} neighborhoods.'.format(
        len(toronto_data['Borough'].unique()),
        toronto_data.shape[0]
    )
)

Toronto has 10 boroughs and 103 neighborhoods.


### Use the geopy library to find the latitude and longitude values of Toronto, Canada.

In [12]:
address = 'Toronto, CA'

# geopy's Nominatim expects a user_agent; the name used here is an arbitrary placeholder
geolocator = Nominatim(user_agent="toronto_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Toronto, Canada are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Toronto, Canada are 43.6534817, -79.3839347.


### Create a map of Toronto with neighborhoods superimposed on top.

In [13]:
# create map of Toronto using latitude and longitude values
toronto_map = folium.Map(location=[latitude, longitude], zoom_start=10)

# add markers to map
for lat, lng, borough, neighborhood in zip(toronto_data['Latitude'], toronto_data['Longitude'], toronto_data['Borough'], toronto_data['Neighborhood']):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(toronto_map)  
    
toronto_map

### Isolate the **Scarborough** neighborhood data.

We will simplify our Toronto data by narrowing our focus to the neighborhoods in the borough of Scarborough, Toronto.

In [91]:
df_scarborough = toronto_data[toronto_data['Borough'] == 'Scarborough'].reset_index(drop=True)
df_scarborough

Unnamed: 0,PostalCode,Borough,Neighborhood,Latitude,Longitude
0,M1B,Scarborough,"Malvern, Rouge",43.806686,-79.194353
1,M1C,Scarborough,"Rouge Hill, Port Union, Highland Creek",43.784535,-79.160497
2,M1E,Scarborough,"Guildwood, Morningside, West Hill",43.763573,-79.188711
3,M1G,Scarborough,Woburn,43.770992,-79.216917
4,M1H,Scarborough,Cedarbrae,43.773136,-79.239476
5,M1J,Scarborough,Scarborough Village,43.744734,-79.239476
6,M1K,Scarborough,"Kennedy Park, Ionview, East Birchmount Park",43.727929,-79.262029
7,M1L,Scarborough,"Golden Mile, Clairlea, Oakridge",43.711112,-79.284577
8,M1M,Scarborough,"Cliffside, Cliffcrest, Scarborough Village West",43.716316,-79.239476
9,M1N,Scarborough,"Birch Cliff, Cliffside West",43.692657,-79.264848


In [15]:
df_scarborough.shape

(17, 5)

### Get the geographical coordinates of Scarborough

In [16]:
address = 'Scarborough, Toronto'

# geopy's Nominatim expects a user_agent; the name used here is an arbitrary placeholder
geolocator = Nominatim(user_agent="scarborough_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Scarborough, CA are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Scarborough, CA are 43.773077, -79.257774.


In [17]:
# create map of Scarborough using latitude and longitude values
scarborough_map = folium.Map(location=[latitude, longitude], zoom_start=11)

# add markers to map
for lat, lng, label in zip(df_scarborough['Latitude'], df_scarborough['Longitude'], df_scarborough['Neighborhood']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(scarborough_map)  
    
scarborough_map

## Step 2: Use Foursquare API to explore Scarborough neighborhoods

### Define Foursquare credentials and version

In [18]:
CLIENT_ID = '*******' # your Foursquare ID
CLIENT_SECRET = '*******' # your Foursquare Secret
VERSION = '20180605' # Foursquare API version

print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET: ' + CLIENT_SECRET)

Your credentials:
CLIENT_ID: *******
CLIENT_SECRET: *******


### Write a function to explore neighborhoods

In [19]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        #print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            100)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    print('Found {} venues in {} neighborhoods.'.format(nearby_venues.shape[0], len(venues_list)))
    
    return(nearby_venues)

In [20]:
scarborough_venues = getNearbyVenues(names=df_scarborough['Neighborhood'],
                                   latitudes=df_scarborough['Latitude'],
                                   longitudes=df_scarborough['Longitude']
                                  )

Found 90 venues in 17 neighborhoods.
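
The request-and-parse logic inside `getNearbyVenues` can also be split into two small helpers, which makes the parsing testable without network access. This is a sketch under stated assumptions, not the notebook's method: `build_explore_params` and `parse_explore_items` are hypothetical names, the payload shape (`response` → `groups[0]` → `items`) is taken from the function above, and `fake_payload` is hand-written for illustration rather than real API output.

```python
def build_explore_params(client_id, client_secret, version, lat, lng,
                         radius=500, limit=100):
    """Return query parameters matching the URL built in getNearbyVenues."""
    return {
        "client_id": client_id,
        "client_secret": client_secret,
        "v": version,
        "ll": "{},{}".format(lat, lng),
        "radius": radius,
        "limit": limit,
    }


def parse_explore_items(payload):
    """Extract (name, lat, lng, category) tuples from an explore payload.

    Assumes the shape used above: response -> groups[0] -> items.
    Returns an empty list when no groups are present, and uses None as
    the category when a venue lists no categories.
    """
    groups = payload.get("response", {}).get("groups", [])
    items = groups[0].get("items", []) if groups else []
    rows = []
    for item in items:
        venue = item["venue"]
        categories = venue.get("categories") or []
        category = categories[0]["name"] if categories else None
        rows.append((venue["name"],
                     venue["location"]["lat"],
                     venue["location"]["lng"],
                     category))
    return rows


# Hand-written payload for illustration only; not real API output.
fake_payload = {"response": {"groups": [{"items": [
    {"venue": {"name": "Example Cafe",
               "location": {"lat": 43.8, "lng": -79.2},
               "categories": [{"name": "Café"}]}},
    {"venue": {"name": "Uncategorized Spot",
               "location": {"lat": 43.81, "lng": -79.21},
               "categories": []}},
]}]}}

rows = parse_explore_items(fake_payload)
print(rows)
```

In the notebook, the parameters would be passed as `requests.get('https://api.foursquare.com/v2/venues/explore', params=params, timeout=10).json()`, which lets `requests` handle URL encoding and avoids waiting indefinitely on a stalled connection.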


In [21]:
print(scarborough_venues.shape)
scarborough_venues.head()

(90, 7)


Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,"Malvern, Rouge",43.806686,-79.194353,Wendy’s,43.807448,-79.199056,Fast Food Restaurant
1,"Rouge Hill, Port Union, Highland Creek",43.784535,-79.160497,Chris Effects Painting,43.784343,-79.163742,Construction & Landscaping
2,"Rouge Hill, Port Union, Highland Creek",43.784535,-79.160497,Royal Canadian Legion,43.782533,-79.163085,Bar
3,"Guildwood, Morningside, West Hill",43.763573,-79.188711,RBC Royal Bank,43.76679,-79.191151,Bank
4,"Guildwood, Morningside, West Hill",43.763573,-79.188711,G & G Electronics,43.765309,-79.191537,Electronics Store


In [22]:
# Display the number of venues per Neighborhood
scarborough_venues.groupby('Neighborhood').count()

Unnamed: 0_level_0,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Neighborhood,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
Agincourt,5,5,5,5,5,5
"Birch Cliff, Cliffside West",4,4,4,4,4,4
Cedarbrae,8,8,8,8,8,8
"Clarks Corners, Tam O'Shanter, Sullivan",12,12,12,12,12,12
"Cliffside, Cliffcrest, Scarborough Village West",2,2,2,2,2,2
"Dorset Park, Wexford Heights, Scarborough Town Centre",5,5,5,5,5,5
"Golden Mile, Clairlea, Oakridge",10,10,10,10,10,10
"Guildwood, Morningside, West Hill",7,7,7,7,7,7
"Kennedy Park, Ionview, East Birchmount Park",5,5,5,5,5,5
"Malvern, Rouge",1,1,1,1,1,1


In [23]:
print('There are {} distinct venues in {} categories.'.format(
    len(scarborough_venues['Venue'].unique()),len(scarborough_venues['Venue Category'].unique())))

#print('There are {} uniques categories.'.format(len(scarborough_venues['Venue Category'].unique())))

There are 80 distinct venues in 55 categories.


### Analyze each Neighborhood

In [24]:
# one hot encoding
scarborough_onehot = pd.get_dummies(scarborough_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
scarborough_onehot['Neighborhood'] = scarborough_venues['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [scarborough_onehot.columns[-1]] + list(scarborough_onehot.columns[:-1])
scarborough_onehot = scarborough_onehot[fixed_columns]

scarborough_onehot.head()

Unnamed: 0,Neighborhood,American Restaurant,Athletics & Sports,Auto Garage,Bakery,Bank,Bar,Breakfast Spot,Burger Joint,Bus Line,...,Pharmacy,Pizza Place,Playground,Rental Car Location,Sandwich Place,Skating Rink,Soccer Field,Thai Restaurant,Thrift / Vintage Store,Vietnamese Restaurant
0,"Malvern, Rouge",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,"Rouge Hill, Port Union, Highland Creek",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,"Rouge Hill, Port Union, Highland Creek",0,0,0,0,0,1,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,"Guildwood, Morningside, West Hill",0,0,0,0,1,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,"Guildwood, Morningside, West Hill",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


#### Group rows by neighborhood, taking the mean frequency of occurrence of each category

In [25]:
scarborough_groups = scarborough_onehot.groupby('Neighborhood').mean().reset_index()
scarborough_groups

Unnamed: 0,Neighborhood,American Restaurant,Athletics & Sports,Auto Garage,Bakery,Bank,Bar,Breakfast Spot,Burger Joint,Bus Line,...,Pharmacy,Pizza Place,Playground,Rental Car Location,Sandwich Place,Skating Rink,Soccer Field,Thai Restaurant,Thrift / Vintage Store,Vietnamese Restaurant
0,Agincourt,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0
1,"Birch Cliff, Cliffside West",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0
2,Cedarbrae,0.0,0.125,0.0,0.125,0.125,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.125,0.0,0.0
3,"Clarks Corners, Tam O'Shanter, Sullivan",0.0,0.0,0.0,0.0,0.083333,0.0,0.0,0.0,0.0,...,0.083333,0.166667,0.0,0.083333,0.0,0.0,0.0,0.083333,0.0,0.0
4,"Cliffside, Cliffcrest, Scarborough Village West",0.5,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
5,"Dorset Park, Wexford Heights, Scarborough Town...",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2
6,"Golden Mile, Clairlea, Oakridge",0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.2,...,0.0,0.0,0.0,0.0,0.0,0.0,0.1,0.0,0.0,0.0
7,"Guildwood, Morningside, West Hill",0.0,0.0,0.0,0.0,0.142857,0.0,0.142857,0.0,0.0,...,0.0,0.0,0.0,0.142857,0.0,0.0,0.0,0.0,0.0,0.0
8,"Kennedy Park, Ionview, East Birchmount Park",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
9,"Malvern, Rouge",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
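
To see why the mean of the one-hot columns is a frequency, consider a toy example. This is a minimal sketch with made-up neighborhoods `A` and `B`; the venue rows are hypothetical. Each one-hot column holds a 1 for venues of that category, so averaging within a neighborhood yields the share of its venues that fall in each category.

```python
import pandas as pd

# Hypothetical venues: four in neighborhood A, one in neighborhood B.
venues = pd.DataFrame({
    "Neighborhood": ["A", "A", "A", "A", "B"],
    "Venue Category": ["Bank", "Bank", "Bakery", "Park", "Bank"],
})

# One indicator column per category, stored as floats.
onehot = pd.get_dummies(venues["Venue Category"], dtype=float)
onehot.insert(0, "Neighborhood", venues["Neighborhood"])

# The mean of a 0/1 column is the fraction of rows equal to 1.
freq = onehot.groupby("Neighborhood").mean().reset_index()
print(freq)
```

Each row of `freq` sums to 1, so a neighborhood with a single venue shows a frequency of 1.0 for that venue's category regardless of how small the sample is.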


#### List the top 5 venues of each neighborhood

In [26]:
top_venues_5 = 5

for hood in scarborough_groups['Neighborhood']:
    print("----"+hood+"----")
    temp = scarborough_groups[scarborough_groups['Neighborhood'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(top_venues_5))
    print('\n')

----Agincourt----
                       venue  freq
0               Skating Rink   0.2
1             Breakfast Spot   0.2
2  Latin American Restaurant   0.2
3                     Lounge   0.2
4             Clothing Store   0.2


----Birch Cliff, Cliffside West----
                   venue  freq
0           Skating Rink  0.25
1  General Entertainment  0.25
2                   Café  0.25
3        College Stadium  0.25
4    American Restaurant  0.00


----Cedarbrae----
                  venue  freq
0  Caribbean Restaurant  0.12
1                Bakery  0.12
2                  Bank  0.12
3       Thai Restaurant  0.12
4      Hakka Restaurant  0.12


----Clarks Corners, Tam O'Shanter, Sullivan----
                venue  freq
0         Pizza Place  0.17
1        Noodle House  0.08
2  Chinese Restaurant  0.08
3  Italian Restaurant  0.08
4     Thai Restaurant  0.08


----Cliffside, Cliffcrest, Scarborough Village West----
                 venue  freq
0  American Restaurant   0.5
1             

#### Put the top venues into a *pandas* dataframe

In [27]:
# Write a function to sort the venues in descending order

def return_most_common_venues(row, top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:top_venues]

Create a new dataframe to display the top 10 venues for each neighborhood

In [28]:
top_venues_10 = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(top_venues_10):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhood_venues = pd.DataFrame(columns=columns)
neighborhood_venues['Neighborhood'] = scarborough_groups['Neighborhood']

for ind in np.arange(scarborough_groups.shape[0]):
    neighborhood_venues.iloc[ind, 1:] = return_most_common_venues(scarborough_groups.iloc[ind, :], top_venues_10)

neighborhood_venues

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Agincourt,Skating Rink,Breakfast Spot,Latin American Restaurant,Lounge,Clothing Store,Vietnamese Restaurant,Coffee Shop,Gas Station,Fried Chicken Joint,Fast Food Restaurant
1,"Birch Cliff, Cliffside West",General Entertainment,Skating Rink,Café,College Stadium,Vietnamese Restaurant,Clothing Store,Gas Station,Fried Chicken Joint,Fast Food Restaurant,Electronics Store
2,Cedarbrae,Thai Restaurant,Athletics & Sports,Hakka Restaurant,Bakery,Bank,Gas Station,Fried Chicken Joint,Caribbean Restaurant,Vietnamese Restaurant,College Stadium
3,"Clarks Corners, Tam O'Shanter, Sullivan",Pizza Place,Fried Chicken Joint,Bank,Gas Station,Italian Restaurant,Noodle House,Pharmacy,Rental Car Location,Fast Food Restaurant,Thai Restaurant
4,"Cliffside, Cliffcrest, Scarborough Village West",American Restaurant,Motel,Athletics & Sports,Auto Garage,General Entertainment,Gas Station,Fried Chicken Joint,Fast Food Restaurant,Electronics Store,Discount Store
5,"Dorset Park, Wexford Heights, Scarborough Town...",Indian Restaurant,Vietnamese Restaurant,Pet Store,Chinese Restaurant,Bakery,Bank,Gas Station,Fried Chicken Joint,Athletics & Sports,Fast Food Restaurant
6,"Golden Mile, Clairlea, Oakridge",Bus Line,Bakery,Intersection,Ice Cream Shop,Bus Station,Park,Metro Station,Soccer Field,Electronics Store,Discount Store
7,"Guildwood, Morningside, West Hill",Mexican Restaurant,Electronics Store,Bank,Rental Car Location,Breakfast Spot,Intersection,Medical Center,Vietnamese Restaurant,Fried Chicken Joint,Fast Food Restaurant
8,"Kennedy Park, Ionview, East Birchmount Park",Discount Store,Department Store,Convenience Store,Bus Station,Coffee Shop,Vietnamese Restaurant,Clothing Store,General Entertainment,Gas Station,Fried Chicken Joint
9,"Malvern, Rouge",Fast Food Restaurant,Vietnamese Restaurant,Clothing Store,General Entertainment,Gas Station,Fried Chicken Joint,Electronics Store,Discount Store,Department Store,Convenience Store
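
The column labels above rely on a three-element suffix list and a fallback for everything else, which works for ten columns. A more general alternative is sketched below; `ordinal` is a hypothetical helper, not part of the notebook, and it also handles the 11th, 12th, and 13th cases that a lookup by last digit would get wrong.

```python
def ordinal(n):
    """Return n with its English ordinal suffix, e.g. 1 -> '1st'."""
    # 11, 12, and 13 take 'th' even though they end in 1, 2, and 3.
    if 10 <= n % 100 <= 20:
        suffix = "th"
    else:
        suffix = {1: "st", 2: "nd", 3: "rd"}.get(n % 10, "th")
    return "{}{}".format(n, suffix)


# Build the same column labels as the cell above.
columns = ["Neighborhood"] + [
    "{} Most Common Venue".format(ordinal(i)) for i in range(1, 11)
]
print(columns[:4])
```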


In [29]:
neighborhood_venues.iloc[11,]

Neighborhood              Rouge Hill, Port Union, Highland Creek
1st Most Common Venue                                        Bar
2nd Most Common Venue                 Construction & Landscaping
3rd Most Common Venue                      Vietnamese Restaurant
4th Most Common Venue                             Clothing Store
5th Most Common Venue                      General Entertainment
6th Most Common Venue                                Gas Station
7th Most Common Venue                        Fried Chicken Joint
8th Most Common Venue                       Fast Food Restaurant
9th Most Common Venue                          Electronics Store
10th Most Common Venue                            Discount Store
Name: 11, dtype: object

## Step 3: Cluster Scarborough neighborhoods using K-means

Use the K-means algorithm to separate the neighborhoods into three clusters.

In [30]:
# set number of clusters
kclusters = 3

scarborough_clusters = scarborough_groups.drop(columns='Neighborhood')

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=2).fit(scarborough_clusters)

# check cluster labels generated for each row in the dataframe
#kmeans.labels_[0:10] 
kmeans.labels_

array([0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 1, 1, 1, 0, 0, 0], dtype=int32)

Create a new dataframe that includes the cluster and top ten venues for each neighborhood

In [31]:
# The neighborhood Upper Rouge returned no venues, so drop it from the dataset
df_scarborough.drop(df_scarborough[df_scarborough.Neighborhood == 'Upper Rouge'].index, inplace = True)

scarborough_merged = df_scarborough

# add clustering labels
# NOTE: this assigns by position; kmeans.labels_ follows the row order of
# scarborough_groups (sorted by neighborhood name), not the postal-code order used here
scarborough_merged['Cluster Labels'] = kmeans.labels_

# join neighborhood_venues to add the top 10 venues for each neighborhood
scarborough_merged = scarborough_merged.join(neighborhood_venues.set_index('Neighborhood'), on='Neighborhood')

scarborough_merged.head() # check the last columns!

Unnamed: 0,PostalCode,Borough,Neighborhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,M1B,Scarborough,"Malvern, Rouge",43.806686,-79.194353,0,Fast Food Restaurant,Vietnamese Restaurant,Clothing Store,General Entertainment,Gas Station,Fried Chicken Joint,Electronics Store,Discount Store,Department Store,Convenience Store
1,M1C,Scarborough,"Rouge Hill, Port Union, Highland Creek",43.784535,-79.160497,0,Bar,Construction & Landscaping,Vietnamese Restaurant,Clothing Store,General Entertainment,Gas Station,Fried Chicken Joint,Fast Food Restaurant,Electronics Store,Discount Store
2,M1E,Scarborough,"Guildwood, Morningside, West Hill",43.763573,-79.188711,0,Mexican Restaurant,Electronics Store,Bank,Rental Car Location,Breakfast Spot,Intersection,Medical Center,Vietnamese Restaurant,Fried Chicken Joint,Fast Food Restaurant
3,M1G,Scarborough,Woburn,43.770992,-79.216917,0,Coffee Shop,Korean Restaurant,Vietnamese Restaurant,Gym,General Entertainment,Gas Station,Fried Chicken Joint,Fast Food Restaurant,Electronics Store,Discount Store
4,M1H,Scarborough,Cedarbrae,43.773136,-79.239476,0,Thai Restaurant,Athletics & Sports,Hakka Restaurant,Bakery,Bank,Gas Station,Fried Chicken Joint,Caribbean Restaurant,Vietnamese Restaurant,College Stadium
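
`kmeans.labels_` is ordered like the rows passed to `fit`, which here come from `scarborough_groups` (sorted by neighborhood name by `groupby`), whereas `df_scarborough` is ordered by postal code. One way to keep each label attached to the right neighborhood is to join on the name instead of assigning by position. The following is a minimal sketch with made-up frames; `groups`, `labels`, and `locations` are hypothetical stand-ins for `scarborough_groups`, `kmeans.labels_`, and `df_scarborough`.

```python
import pandas as pd

# Hypothetical grouped frame, in the alphabetical order groupby produces.
groups = pd.DataFrame({"Neighborhood": ["Agincourt", "Cedarbrae", "Woburn"]})
labels = [1, 0, 2]  # stand-in for kmeans.labels_, same row order as groups

# Hypothetical location frame, ordered by postal code instead.
locations = pd.DataFrame({
    "PostalCode": ["M1G", "M1H", "M1S"],
    "Neighborhood": ["Woburn", "Cedarbrae", "Agincourt"],
})

# Attach labels to the frame they were computed from, then join on the name.
labeled = groups.assign(**{"Cluster Labels": labels})
merged = locations.merge(labeled, on="Neighborhood", how="left")
print(merged)
```

With a positional assignment, Woburn would have received label 1 in this toy case; the name-based join gives it label 2, the value computed for its own row.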


In [32]:
# create map
cluster_map = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i+x+(i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(scarborough_merged['Latitude'], scarborough_merged['Longitude'], scarborough_merged['Neighborhood'], scarborough_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(cluster_map)
       
cluster_map

## Step 4: Examine the Scarborough neighborhood clusters

#### Scarborough Clusters 0, 1, 2

In [33]:
scarborough_cluster_0 = scarborough_merged.loc[scarborough_merged['Cluster Labels'] == 0, scarborough_merged.columns[[1] + list(range(4, scarborough_merged.shape[1]))]]

scarborough_cluster_1 = scarborough_merged.loc[scarborough_merged['Cluster Labels'] == 1, scarborough_merged.columns[[1] + list(range(4, scarborough_merged.shape[1]))]]

scarborough_cluster_2 = scarborough_merged.loc[scarborough_merged['Cluster Labels'] == 2, scarborough_merged.columns[[1] + list(range(4, scarborough_merged.shape[1]))]]





In [34]:
scarborough_cluster_0

Unnamed: 0,Borough,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Scarborough,-79.194353,0,Fast Food Restaurant,Vietnamese Restaurant,Clothing Store,General Entertainment,Gas Station,Fried Chicken Joint,Electronics Store,Discount Store,Department Store,Convenience Store
1,Scarborough,-79.160497,0,Bar,Construction & Landscaping,Vietnamese Restaurant,Clothing Store,General Entertainment,Gas Station,Fried Chicken Joint,Fast Food Restaurant,Electronics Store,Discount Store
2,Scarborough,-79.188711,0,Mexican Restaurant,Electronics Store,Bank,Rental Car Location,Breakfast Spot,Intersection,Medical Center,Vietnamese Restaurant,Fried Chicken Joint,Fast Food Restaurant
3,Scarborough,-79.216917,0,Coffee Shop,Korean Restaurant,Vietnamese Restaurant,Gym,General Entertainment,Gas Station,Fried Chicken Joint,Fast Food Restaurant,Electronics Store,Discount Store
4,Scarborough,-79.239476,0,Thai Restaurant,Athletics & Sports,Hakka Restaurant,Bakery,Bank,Gas Station,Fried Chicken Joint,Caribbean Restaurant,Vietnamese Restaurant,College Stadium
5,Scarborough,-79.239476,0,Playground,Construction & Landscaping,Vietnamese Restaurant,Clothing Store,General Entertainment,Gas Station,Fried Chicken Joint,Fast Food Restaurant,Electronics Store,Discount Store
6,Scarborough,-79.262029,0,Discount Store,Department Store,Convenience Store,Bus Station,Coffee Shop,Vietnamese Restaurant,Clothing Store,General Entertainment,Gas Station,Fried Chicken Joint
7,Scarborough,-79.284577,0,Bus Line,Bakery,Intersection,Ice Cream Shop,Bus Station,Park,Metro Station,Soccer Field,Electronics Store,Discount Store
8,Scarborough,-79.239476,0,American Restaurant,Motel,Athletics & Sports,Auto Garage,General Entertainment,Gas Station,Fried Chicken Joint,Fast Food Restaurant,Electronics Store,Discount Store
13,Scarborough,-79.304302,0,Pizza Place,Fried Chicken Joint,Bank,Gas Station,Italian Restaurant,Noodle House,Pharmacy,Rental Car Location,Fast Food Restaurant,Thai Restaurant


In [35]:
scarborough_cluster_1

Unnamed: 0,Borough,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
10,Scarborough,-79.273304,1,Indian Restaurant,Vietnamese Restaurant,Pet Store,Chinese Restaurant,Bakery,Bank,Gas Station,Fried Chicken Joint,Athletics & Sports,Fast Food Restaurant
11,Scarborough,-79.295849,1,Middle Eastern Restaurant,Auto Garage,Bakery,Sandwich Place,Breakfast Spot,Vietnamese Restaurant,Coffee Shop,Gas Station,Fried Chicken Joint,Fast Food Restaurant
12,Scarborough,-79.262029,1,Skating Rink,Breakfast Spot,Latin American Restaurant,Lounge,Clothing Store,Vietnamese Restaurant,Coffee Shop,Gas Station,Fried Chicken Joint,Fast Food Restaurant


In [36]:
scarborough_cluster_2

Unnamed: 0,Borough,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
9,Scarborough,-79.264848,2,General Entertainment,Skating Rink,Café,College Stadium,Vietnamese Restaurant,Clothing Store,Gas Station,Fried Chicken Joint,Fast Food Restaurant,Electronics Store


## Step 5: Explore New York City Neighborhoods

New York City has a total of 5 boroughs and 306 neighborhoods.

This dataset, which contains the 5 boroughs, the neighborhoods in each borough, and the latitude and longitude coordinates of each neighborhood, can be acquired for free from the following link:
https://geo.nyu.edu/catalog/nyu_2451_34572

Run a `wget` command to access the data:

In [37]:
!wget -q -O 'newyork_data.json' https://ibm.box.com/shared/static/fbpwbovar7lf8p5sgddm06cgipa2rxpe.json
print('Data downloaded!')

Data downloaded!


#### Load and explore the dataset

In [38]:
with open('newyork_data.json') as json_data:
    df_newyork = json.load(json_data)

#### Because all of the relevant data is in the *features* key, define a new variable that includes this data.

In [39]:
ny_neighborhoods = df_newyork['features']

In [40]:
ny_neighborhoods[0]

{'type': 'Feature',
 'id': 'nyu_2451_34572.1',
 'geometry': {'type': 'Point',
  'coordinates': [-73.84720052054902, 40.89470517661]},
 'geometry_name': 'geom',
 'properties': {'name': 'Wakefield',
  'stacked': 1,
  'annoline1': 'Wakefield',
  'annoline2': None,
  'annoline3': None,
  'annoangle': 0.0,
  'borough': 'Bronx',
  'bbox': [-73.84720052054902,
   40.89470517661,
   -73.84720052054902,
   40.89470517661]}}

### Transform the data into a pandas dataframe

In [41]:
# define the dataframe columns
column_names = ['Borough', 'Neighborhood', 'Latitude', 'Longitude'] 

# instantiate the dataframe
neighborhoods = pd.DataFrame(columns=column_names)

In [42]:
for data in ny_neighborhoods:
    borough = data['properties']['borough']
    neighborhood_name = data['properties']['name']
        
    neighborhood_ll = data['geometry']['coordinates']
    neighborhood_lat = neighborhood_ll[1]
    neighborhood_lon = neighborhood_ll[0]
    
    neighborhoods = neighborhoods.append({'Borough': borough,
                                          'Neighborhood': neighborhood_name,
                                          'Latitude': neighborhood_lat,
                                          'Longitude': neighborhood_lon}, ignore_index=True)
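
The loop above grows the dataframe one row at a time with `DataFrame.append`, which was deprecated in pandas 1.4 and removed in pandas 2.0. A minimal sketch of an alternative, assuming the same feature layout as `ny_neighborhoods`, collects the rows first and builds the dataframe once. The helper name `features_to_frame` and the two-feature sample are hypothetical and only illustrate the idea.

```python
import pandas as pd


def features_to_frame(features):
    """Flatten GeoJSON point features into one row per neighborhood."""
    rows = [
        {
            "Borough": feature["properties"]["borough"],
            "Neighborhood": feature["properties"]["name"],
            # GeoJSON stores coordinates as [longitude, latitude]
            "Latitude": feature["geometry"]["coordinates"][1],
            "Longitude": feature["geometry"]["coordinates"][0],
        }
        for feature in features
    ]
    return pd.DataFrame(
        rows, columns=["Borough", "Neighborhood", "Latitude", "Longitude"]
    )


# Hypothetical sample shaped like the entries of ny_neighborhoods
sample_features = [
    {"geometry": {"coordinates": [-73.84720052054902, 40.89470517661]},
     "properties": {"name": "Wakefield", "borough": "Bronx"}},
    {"geometry": {"coordinates": [-74.030621, 40.625801]},
     "properties": {"name": "Bay Ridge", "borough": "Brooklyn"}},
]

sample_frame = features_to_frame(sample_features)
print(sample_frame)
```

With the real data this would be called as `features_to_frame(ny_neighborhoods)`.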

In [43]:
neighborhoods.head()

Unnamed: 0,Borough,Neighborhood,Latitude,Longitude
0,Bronx,Wakefield,40.894705,-73.847201
1,Bronx,Co-op City,40.874294,-73.829939
2,Bronx,Eastchester,40.887556,-73.827806
3,Bronx,Fieldston,40.895437,-73.905643
4,Bronx,Riverdale,40.890834,-73.912585


In [44]:
neighborhoods.shape

(306, 4)

In [45]:
print('The dataframe has {} boroughs and {} neighborhoods.'.format(
        len(neighborhoods['Borough'].unique()),
        neighborhoods.shape[0]
    )
)

The dataframe has 5 boroughs and 306 neighborhoods.


### Use the geopy library to get the latitude and longitude of New York City

In [46]:
address = 'New York City, NY'

geolocator = Nominatim(user_agent = 'my-application')
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of New York City are {}, {}.'.format(latitude, longitude))

The geographical coordinates of New York City are 40.7127281, -74.0060152.
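
geopy's `geocode` returns `None` when the service finds no match, in which case `location.latitude` raises an `AttributeError`. The sketch below guards against that. The helper `geocode_or_raise` is a hypothetical name, and `FakeGeocoder` is only a stand-in so the example runs without network access; in the notebook the `Nominatim` instance would be passed instead.

```python
from types import SimpleNamespace


def geocode_or_raise(geocoder, address):
    """Return (latitude, longitude) for address, or raise if nothing matched."""
    location = geocoder.geocode(address)
    if location is None:
        raise ValueError("No geocoding result for {!r}".format(address))
    return location.latitude, location.longitude


class FakeGeocoder:
    """Offline stand-in that mimics the geocode() method used above."""

    def __init__(self, known):
        self.known = known

    def geocode(self, address):
        # Mirrors geopy's behavior of returning None for unknown addresses
        return self.known.get(address)


fake = FakeGeocoder({
    "New York City, NY": SimpleNamespace(latitude=40.7127281, longitude=-74.0060152),
})

coords = geocode_or_raise(fake, "New York City, NY")
print(coords)
```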


In [47]:
# create map of New York using latitude and longitude values
newyork_map = folium.Map(location=[latitude, longitude], zoom_start=10)

# add markers to map
for lat, lng, borough, neighborhood in zip(neighborhoods['Latitude'], neighborhoods['Longitude'], neighborhoods['Borough'], neighborhoods['Neighborhood']):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(newyork_map)  
    
newyork_map

#### Isolate the Brooklyn borough data by slicing the original dataframe and creating a new dataframe of the Brooklyn neighborhood data.

In [48]:
brooklyn_data = neighborhoods[neighborhoods['Borough'] == 'Brooklyn'].reset_index(drop=True)
brooklyn_data.head(50)

Unnamed: 0,Borough,Neighborhood,Latitude,Longitude
0,Brooklyn,Bay Ridge,40.625801,-74.030621
1,Brooklyn,Bensonhurst,40.611009,-73.99518
2,Brooklyn,Sunset Park,40.645103,-74.010316
3,Brooklyn,Greenpoint,40.730201,-73.954241
4,Brooklyn,Gravesend,40.59526,-73.973471
5,Brooklyn,Brighton Beach,40.576825,-73.965094
6,Brooklyn,Sheepshead Bay,40.58689,-73.943186
7,Brooklyn,Manhattan Terrace,40.614433,-73.957438
8,Brooklyn,Flatbush,40.636326,-73.958401
9,Brooklyn,Crown Heights,40.670829,-73.943291


#### Get the geographical location of Brooklyn, NY

In [49]:
address = 'Brooklyn, NY'

geolocator = Nominatim(user_agent = 'my-application')
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Brooklyn are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Brooklyn are 40.6501038, -73.9495823.


In [50]:
# create map of Brooklyn using latitude and longitude values
brooklyn_map = folium.Map(location=[latitude, longitude], zoom_start=11)

# add markers to map
for lat, lng, label in zip(brooklyn_data['Latitude'], brooklyn_data['Longitude'], brooklyn_data['Neighborhood']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(brooklyn_map)  
    
brooklyn_map

## Part 6: Explore the East Flatbush neighborhood in Brooklyn, NY

In [51]:
brooklyn_data.loc[10, 'Neighborhood']

'East Flatbush'

In [52]:
# Latitude and longitude values of the neighborhood at index 10 (East Flatbush); the bedstuy_ variable names are kept as written

bedstuy_latitude = brooklyn_data.loc[10, 'Latitude'] # neighborhood latitude value
bedstuy_longitude = brooklyn_data.loc[10, 'Longitude'] # neighborhood longitude value

neighborhood_name = brooklyn_data.loc[10, 'Neighborhood'] # neighborhood name

print('Latitude and longitude values of {} are {}, {}.'.format(neighborhood_name, 
                                                               bedstuy_latitude, 
                                                               bedstuy_longitude))

Latitude and longitude values of East Flatbush are 40.64171776668961, -73.93610256185836.


### Top 100 venues in the East Flatbush neighborhood within a radius of 500 meters

First, let's create the GET request URL named **bedstuy_url**.

In [53]:
# build the Foursquare explore URL: up to LIMIT venues within `radius` meters
LIMIT = 100
radius = 500
bedstuy_url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
    CLIENT_ID, 
    CLIENT_SECRET, 
    VERSION, 
    bedstuy_latitude, 
    bedstuy_longitude, 
    radius, 
    LIMIT)

bedstuy_url

'https://api.foursquare.com/v2/venues/explore?&client_id=PLY4EEOMDIHOV5LPWRVCSNEJSBCZJTRT5M1PBMGWXDP5CCOI&client_secret=WGILVD4Z3515GALQPTVBCJP2YXYIP0QYFWVF30MLJI5ISJF5&v=20180605&ll=40.64171776668961,-73.93610256185836&radius=500&limit=100'
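
Echoing the formatted URL in a cell output also echoes the client ID and secret. The sketch below assembles the same query string with the standard library's `urlencode` and masks the credentials before anything is displayed. `build_explore_url`, `mask_credentials`, and the `DEMO_*` values are hypothetical names and placeholders, not part of the original notebook.

```python
from urllib.parse import urlencode

EXPLORE_ENDPOINT = "https://api.foursquare.com/v2/venues/explore"


def build_explore_url(client_id, client_secret, version, lat, lng, radius, limit):
    """Assemble the explore URL from named query parameters."""
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "v": version,
        "ll": "{},{}".format(lat, lng),
        "radius": radius,
        "limit": limit,
    }
    # safe="," keeps the comma in ll=<lat>,<lng> unescaped, as in the original URL
    return EXPLORE_ENDPOINT + "?" + urlencode(params, safe=",")


def mask_credentials(url, secrets):
    """Replace each secret value with *** so the URL can be shown safely."""
    for secret in secrets:
        url = url.replace(secret, "***")
    return url


demo_url = build_explore_url(
    "DEMO_ID", "DEMO_SECRET", "20180605",
    40.64171776668961, -73.93610256185836, 500, 100,
)
printable = mask_credentials(demo_url, ["DEMO_ID", "DEMO_SECRET"])
print(printable)
```

The unmasked `demo_url` is what would be passed to `requests.get`; only the masked copy is printed.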

In [54]:
#Send the GET request
bedstuy_results = requests.get(bedstuy_url).json()

In [55]:
# function that extracts the category of the venue
def get_category_type(row):
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

In [56]:
venues = bedstuy_results['response']['groups'][0]['items']
    
bedstuy_venues = json_normalize(venues) # flatten JSON

# filter columns
filtered_columns = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
bedstuy_venues = bedstuy_venues.loc[:, filtered_columns]

# filter the category for each row
bedstuy_venues['venue.categories'] = bedstuy_venues.apply(get_category_type, axis=1)

# clean columns
bedstuy_venues.columns = [col.split(".")[-1] for col in bedstuy_venues.columns]

bedstuy_venues.head()


Unnamed: 0,name,categories,lat,lng
0,VIVID Caribbean American Bistro,Caribbean Restaurant,40.642025,-73.932636
1,Paerdegat Park,Park,40.638137,-73.938138
2,Rite Aid,Pharmacy,40.641799,-73.937224
3,Key Food,Supermarket,40.641806,-73.93649
4,Kennedy Fried Chicken,Fast Food Restaurant,40.641409,-73.937811


In [57]:
print('{} venues were returned by Foursquare.'.format(bedstuy_venues.shape[0]))

11 venues were returned by Foursquare.


## Part 7: Analyze Each Neighborhood in Brooklyn

In [58]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        #print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    print('Found {} venues in {} neighborhoods.'.format(nearby_venues.shape[0], len(venues_list)))
    
    return nearby_venues

In [59]:
brooklyn_venues = getNearbyVenues(names=brooklyn_data['Neighborhood'],
                                   latitudes=brooklyn_data['Latitude'],
                                   longitudes=brooklyn_data['Longitude'])
                                  

Found 2712 venues in 70 neighborhoods.
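
`getNearbyVenues` indexes `v['venue']['categories'][0]` directly, which raises an `IndexError` for any venue that comes back without a category. The sketch below separates the parsing step from the HTTP call and falls back to `None` in that case, in the same spirit as `get_category_type`. The helper name `extract_venue_rows` and the sample `items` list are hypothetical.

```python
def extract_venue_rows(name, lat, lng, items):
    """Turn the 'items' list of an explore response into flat tuples."""
    rows = []
    for item in items:
        venue = item["venue"]
        categories = venue.get("categories") or []
        # Fall back to None instead of raising IndexError on an empty list
        category = categories[0]["name"] if categories else None
        rows.append((
            name,
            lat,
            lng,
            venue["name"],
            venue["location"]["lat"],
            venue["location"]["lng"],
            category,
        ))
    return rows


# Hypothetical items shaped like response['groups'][0]['items']
sample_items = [
    {"venue": {"name": "Key Food",
               "location": {"lat": 40.641806, "lng": -73.93649},
               "categories": [{"name": "Supermarket"}]}},
    {"venue": {"name": "Unlabelled Venue",
               "location": {"lat": 40.6410, "lng": -73.9370},
               "categories": []}},
]

sample_rows = extract_venue_rows("East Flatbush", 40.641718, -73.936103, sample_items)
for row in sample_rows:
    print(row)
```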


In [60]:
brooklyn_venues.head()

Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Bay Ridge,40.625801,-74.030621,Pilo Arts Day Spa and Salon,40.624748,-74.030591,Spa
1,Bay Ridge,40.625801,-74.030621,Bagel Boy,40.627896,-74.029335,Bagel Shop
2,Bay Ridge,40.625801,-74.030621,Leo's Casa Calamari,40.6242,-74.030931,Pizza Place
3,Bay Ridge,40.625801,-74.030621,Pegasus Cafe,40.623168,-74.031186,Breakfast Spot
4,Bay Ridge,40.625801,-74.030621,The Bookmark Shoppe,40.624577,-74.030562,Bookstore


In [61]:
print(brooklyn_venues.shape)
brooklyn_venues.head()

(2712, 7)


Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Bay Ridge,40.625801,-74.030621,Pilo Arts Day Spa and Salon,40.624748,-74.030591,Spa
1,Bay Ridge,40.625801,-74.030621,Bagel Boy,40.627896,-74.029335,Bagel Shop
2,Bay Ridge,40.625801,-74.030621,Leo's Casa Calamari,40.6242,-74.030931,Pizza Place
3,Bay Ridge,40.625801,-74.030621,Pegasus Cafe,40.623168,-74.031186,Breakfast Spot
4,Bay Ridge,40.625801,-74.030621,The Bookmark Shoppe,40.624577,-74.030562,Bookstore


In [95]:
#Venues per Neighborhood
brooklyn_venues.groupby('Neighborhood').count()

Unnamed: 0_level_0,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Neighborhood,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
Bath Beach,46,46,46,46,46,46
Bay Ridge,83,83,83,83,83,83
Bedford Stuyvesant,29,29,29,29,29,29
Bensonhurst,31,31,31,31,31,31
Bergen Beach,5,5,5,5,5,5
...,...,...,...,...,...,...
Vinegar Hill,29,29,29,29,29,29
Weeksville,16,16,16,16,16,16
Williamsburg,34,34,34,34,34,34
Windsor Terrace,28,28,28,28,28,28


In [63]:
print('There are {} distinct venues in {} categories.'.format(
    len(brooklyn_venues['Venue'].unique()),len(brooklyn_venues['Venue Category'].unique())))

There are 2213 distinct venues in 281 categories.


In [64]:
# one hot encoding
brooklyn_onehot = pd.get_dummies(brooklyn_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
brooklyn_onehot['Neighborhood'] = brooklyn_venues['Neighborhood'] 

# move the neighborhood column to the first position

brooklyn_neigh = brooklyn_onehot['Neighborhood']
brooklyn_onehot.drop(labels=['Neighborhood'], axis=1,inplace = True)
brooklyn_onehot.insert(0, 'Neighborhood', brooklyn_neigh)

brooklyn_onehot.head()

Unnamed: 0,Neighborhood,Accessories Store,Adult Boutique,Airport Terminal,American Restaurant,Animal Shelter,Antique Shop,Arepa Restaurant,Argentinian Restaurant,Art Gallery,...,Video Game Store,Video Store,Vietnamese Restaurant,Waterfront,Whisky Bar,Wine Bar,Wine Shop,Wings Joint,Women's Store,Yoga Studio
0,Bay Ridge,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,Bay Ridge,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,Bay Ridge,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,Bay Ridge,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,Bay Ridge,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


### Group by neighborhood and examine the frequency of occurrence of each venue category

In [65]:
brooklyn_groups = brooklyn_onehot.groupby('Neighborhood').mean().reset_index()
brooklyn_groups

Unnamed: 0,Neighborhood,Accessories Store,Adult Boutique,Airport Terminal,American Restaurant,Animal Shelter,Antique Shop,Arepa Restaurant,Argentinian Restaurant,Art Gallery,...,Video Game Store,Video Store,Vietnamese Restaurant,Waterfront,Whisky Bar,Wine Bar,Wine Shop,Wings Joint,Women's Store,Yoga Studio
0,Bath Beach,0.0,0.0,0.0,0.000000,0.0,0.000000,0.0,0.0,0.000000,...,0.021739,0.021739,0.000000,0.0,0.000000,0.000000,0.000000,0.0,0.0,0.000000
1,Bay Ridge,0.0,0.0,0.0,0.036145,0.0,0.000000,0.0,0.0,0.000000,...,0.012048,0.000000,0.012048,0.0,0.000000,0.000000,0.000000,0.0,0.0,0.000000
2,Bedford Stuyvesant,0.0,0.0,0.0,0.000000,0.0,0.000000,0.0,0.0,0.000000,...,0.000000,0.000000,0.000000,0.0,0.000000,0.034483,0.034483,0.0,0.0,0.000000
3,Bensonhurst,0.0,0.0,0.0,0.000000,0.0,0.000000,0.0,0.0,0.000000,...,0.000000,0.000000,0.000000,0.0,0.000000,0.000000,0.000000,0.0,0.0,0.000000
4,Bergen Beach,0.0,0.0,0.0,0.000000,0.0,0.000000,0.0,0.0,0.000000,...,0.000000,0.000000,0.000000,0.0,0.000000,0.000000,0.000000,0.0,0.0,0.000000
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
65,Vinegar Hill,0.0,0.0,0.0,0.034483,0.0,0.034483,0.0,0.0,0.034483,...,0.000000,0.000000,0.000000,0.0,0.034483,0.034483,0.034483,0.0,0.0,0.000000
66,Weeksville,0.0,0.0,0.0,0.062500,0.0,0.000000,0.0,0.0,0.000000,...,0.000000,0.000000,0.000000,0.0,0.000000,0.000000,0.000000,0.0,0.0,0.000000
67,Williamsburg,0.0,0.0,0.0,0.000000,0.0,0.000000,0.0,0.0,0.029412,...,0.000000,0.000000,0.000000,0.0,0.000000,0.029412,0.000000,0.0,0.0,0.029412
68,Windsor Terrace,0.0,0.0,0.0,0.035714,0.0,0.035714,0.0,0.0,0.000000,...,0.000000,0.000000,0.000000,0.0,0.000000,0.000000,0.035714,0.0,0.0,0.000000
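
Each value in the grouped table is the share of a neighborhood's venues that fall into a category: the mean of a 0/1 indicator column is the fraction of rows equal to 1. A toy example with made-up neighborhoods `A` and `B` (not taken from the Brooklyn data) shows the arithmetic.

```python
import pandas as pd

# Hypothetical venues: four in neighborhood A, two in neighborhood B
toy_venues = pd.DataFrame({
    "Neighborhood": ["A", "A", "A", "A", "B", "B"],
    "Venue Category": ["Park", "Park", "Diner", "Bar", "Bar", "Bar"],
})

# One indicator column per category; cast to float so the mean is numeric
toy_onehot = pd.get_dummies(
    toy_venues[["Venue Category"]], prefix="", prefix_sep=""
).astype(float)
toy_onehot.insert(0, "Neighborhood", toy_venues["Neighborhood"])

# Mean of the indicators = fraction of the neighborhood's venues per category
toy_freq = toy_onehot.groupby("Neighborhood").mean().reset_index()
print(toy_freq)
```

Neighborhood A has two parks among four venues, so its `Park` frequency is 0.5; every venue in B is a bar, so its `Bar` frequency is 1.0.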


Print each neighborhood with its top 5 venue categories

In [66]:
brooklyn_venues_5 = 5

for hood in brooklyn_groups['Neighborhood']:
    print("----"+hood+"----")
    temp = brooklyn_groups[brooklyn_groups['Neighborhood'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(brooklyn_venues_5))
    print('\n')

----Bath Beach----
                venue  freq
0            Pharmacy  0.07
1  Chinese Restaurant  0.07
2          Donut Shop  0.04
3         Gas Station  0.04
4  Italian Restaurant  0.04


----Bay Ridge----
                venue  freq
0                 Spa  0.07
1  Italian Restaurant  0.07
2         Pizza Place  0.06
3     Thai Restaurant  0.04
4    Greek Restaurant  0.04


----Bedford Stuyvesant----
           venue  freq
0    Coffee Shop  0.10
1           Café  0.07
2    Pizza Place  0.07
3  Deli / Bodega  0.07
4            Bar  0.07


----Bensonhurst----
                venue  freq
0  Chinese Restaurant  0.13
1  Italian Restaurant  0.06
2          Donut Shop  0.06
3      Ice Cream Shop  0.06
4    Sushi Restaurant  0.06


----Bergen Beach----
                  venue  freq
0       Harbor / Marina   0.4
1        Baseball Field   0.2
2            Playground   0.2
3    Athletics & Sports   0.2
4  Other Great Outdoors   0.0


----Boerum Hill----
           venue  freq
0    Coffee Shop  0.

In [67]:
#Function to sort venues in descending order
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]
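
One caveat: when a neighborhood has fewer distinct categories than `num_top_venues`, the remaining slots are filled with categories whose frequency is 0, as the Bergen Beach output above shows (its fifth entry has a frequency of 0.0). A possible variant, sketched here under the hypothetical name `top_nonzero_categories`, keeps only categories that actually occur. The toy row is illustrative and not taken from the results.

```python
import pandas as pd


def top_nonzero_categories(row, num_top_venues):
    """Return up to num_top_venues category names with frequency > 0."""
    frequencies = row.iloc[1:].astype(float)  # skip the Neighborhood entry
    present = frequencies[frequencies > 0]
    ordered = present.sort_values(ascending=False)
    return list(ordered.index[:num_top_venues])


# Hypothetical row shaped like one row of brooklyn_groups
toy_row = pd.Series({
    "Neighborhood": "Toy Beach",
    "Harbor / Marina": 0.4,
    "Playground": 0.2,
    "Yoga Studio": 0.0,
    "Factory": 0.0,
    "Baseball Field": 0.2,
})

top = top_nonzero_categories(toy_row, 5)
print(top)
```

The returned list can be shorter than `num_top_venues`, so a caller filling fixed-width columns would need to pad it.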

### Top venues for each neighborhood

In [68]:
brooklyn_venues_10 = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(brooklyn_venues_10):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
brooklyn_venues_sorted = pd.DataFrame(columns=columns)
brooklyn_venues_sorted['Neighborhood'] = brooklyn_groups['Neighborhood']

for ind in np.arange(brooklyn_groups.shape[0]):
    brooklyn_venues_sorted.iloc[ind, 1:] = return_most_common_venues(brooklyn_groups.iloc[ind, :], brooklyn_venues_10)

brooklyn_venues_sorted

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Bath Beach,Pharmacy,Chinese Restaurant,Fast Food Restaurant,Pizza Place,Bubble Tea Shop,Donut Shop,Italian Restaurant,Gas Station,Surf Spot,Spanish Restaurant
1,Bay Ridge,Italian Restaurant,Spa,Pizza Place,Greek Restaurant,American Restaurant,Bar,Thai Restaurant,Bagel Shop,Grocery Store,Hookah Bar
2,Bedford Stuyvesant,Coffee Shop,Pizza Place,Café,Bar,Deli / Bodega,Wine Bar,Juice Bar,Gift Shop,Basketball Court,New American Restaurant
3,Bensonhurst,Chinese Restaurant,Ice Cream Shop,Italian Restaurant,Grocery Store,Bakery,Donut Shop,Sushi Restaurant,Road,Smoke Shop,Butcher
4,Bergen Beach,Harbor / Marina,Playground,Athletics & Sports,Baseball Field,Yoga Studio,Farmers Market,Ethiopian Restaurant,Event Space,Factory,Falafel Restaurant
...,...,...,...,...,...,...,...,...,...,...,...
65,Vinegar Hill,Food Truck,Coffee Shop,Café,Bike Rental / Bike Share,Ice Cream Shop,Bakery,Performing Arts Venue,Park,Entertainment Service,Factory
66,Weeksville,Chinese Restaurant,Discount Store,Park,Juice Bar,Lounge,Liquor Store,Laundry Service,Gas Station,Grocery Store,Cocktail Bar
67,Williamsburg,Coffee Shop,Bar,Bagel Shop,Yoga Studio,Middle Eastern Restaurant,Tapas Restaurant,Latin American Restaurant,Taco Place,Liquor Store,Steakhouse
68,Windsor Terrace,Diner,Plaza,Café,Park,Grocery Store,Deli / Bodega,American Restaurant,Bagel Shop,Sushi Restaurant,Beer Store
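
The column labels above rely on the three-element `indicators` list plus an exception for everything else, which is correct for 1 through 10 but would produce labels such as "21th" for larger counts. A small helper, sketched here under the hypothetical name `ordinal`, handles the general case.

```python
def ordinal(n):
    """Return n with its English ordinal suffix, e.g. 1 -> '1st'."""
    if 10 <= n % 100 <= 20:
        # 10th to 20th (and 110th to 120th, ...) always take 'th'
        suffix = "th"
    else:
        suffix = {1: "st", 2: "nd", 3: "rd"}.get(n % 10, "th")
    return "{}{}".format(n, suffix)


columns = ["Neighborhood"] + [
    "{} Most Common Venue".format(ordinal(i)) for i in range(1, 11)
]
print(columns)
```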


In [69]:
brooklyn_venues_sorted.iloc[47,]

Neighborhood                       Mill Island
1st Most Common Venue                     Pool
2nd Most Common Venue        Other Repair Shop
3rd Most Common Venue              Yoga Studio
4th Most Common Venue     Fast Food Restaurant
5th Most Common Venue              Event Space
6th Most Common Venue                  Factory
7th Most Common Venue       Falafel Restaurant
8th Most Common Venue                     Farm
9th Most Common Venue           Farmers Market
10th Most Common Venue                   Field
Name: 47, dtype: object

## Part 8: Use K-Means to cluster the Brooklyn borough 

In [106]:
# set number of clusters
kclusters = 7

brooklyn_clusters = brooklyn_groups.drop(columns='Neighborhood')

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=2).fit(brooklyn_clusters)

# check cluster labels generated for each row in the dataframe
#kmeans.labels_[0:10] 
kmeans.labels_

array([4, 6, 6, 4, 1, 6, 0, 6, 4, 6, 4, 6, 5, 6, 4, 4, 6, 6, 4, 4, 4, 6,
       6, 3, 0, 0, 6, 0, 0, 0, 6, 4, 6, 4, 4, 6, 5, 6, 4, 4, 4, 4, 4, 4,
       5, 5, 4, 2, 4, 6, 4, 4, 6, 6, 6, 4, 0, 6, 0, 0, 6, 4, 6, 5, 4, 6,
       4, 6, 6, 4], dtype=int32)

### Dataframe that includes the cluster of each neighborhood

In [107]:
brooklyn_merged = brooklyn_data

# add clustering labels
brooklyn_merged['Cluster Labels'] = kmeans.labels_

# join brooklyn_venues_sorted onto brooklyn_data to add the most common venues for each neighborhood
brooklyn_merged = brooklyn_merged.join(brooklyn_venues_sorted.set_index('Neighborhood'), on='Neighborhood')

brooklyn_merged.head() # check the last columns!

Unnamed: 0,Borough,Neighborhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Brooklyn,Bay Ridge,40.625801,-74.030621,4,Italian Restaurant,Spa,Pizza Place,Greek Restaurant,American Restaurant,Bar,Thai Restaurant,Bagel Shop,Grocery Store,Hookah Bar
1,Brooklyn,Bensonhurst,40.611009,-73.99518,6,Chinese Restaurant,Ice Cream Shop,Italian Restaurant,Grocery Store,Bakery,Donut Shop,Sushi Restaurant,Road,Smoke Shop,Butcher
2,Brooklyn,Sunset Park,40.645103,-74.010316,6,Pizza Place,Mobile Phone Shop,Mexican Restaurant,Latin American Restaurant,Bank,Bakery,Fried Chicken Joint,Gym,Grocery Store,Creperie
3,Brooklyn,Greenpoint,40.730201,-73.954241,4,Bar,Pizza Place,Cocktail Bar,Coffee Shop,Yoga Studio,Sushi Restaurant,French Restaurant,Deli / Bodega,Café,Restaurant
4,Brooklyn,Gravesend,40.59526,-73.973471,1,Pizza Place,Lounge,Chinese Restaurant,Italian Restaurant,Bakery,Gym,Spa,Breakfast Spot,Furniture / Home Store,Metro Station
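
A point worth checking here: `kmeans.labels_` follows the row order of `brooklyn_groups`, which the `groupby` sorts alphabetically by neighborhood, whereas `brooklyn_data` keeps the order of the source file. Assigning the array by position only pairs each label with the right neighborhood if the two orders agree. In the output above, Bay Ridge (row 0 of `brooklyn_data`) receives the first label, 4, while row 0 of `brooklyn_groups` is Bath Beach. A minimal sketch of attaching labels by neighborhood name instead, using hypothetical toy data and a stand-in for `kmeans.labels_`:

```python
import numpy as np
import pandas as pd

# Hypothetical toy frames: source order differs from alphabetical order
toy_data = pd.DataFrame({"Neighborhood": ["Sunset Park", "Bay Ridge", "Greenpoint"]})
toy_groups = pd.DataFrame({"Neighborhood": ["Bay Ridge", "Greenpoint", "Sunset Park"]})

# Stand-in for kmeans.labels_: one label per row of toy_groups, in that order
toy_labels = np.array([1, 0, 2])

# Pair each label with its neighborhood name first ...
labelled = toy_groups[["Neighborhood"]].copy()
labelled["Cluster Labels"] = toy_labels

# ... then merge on the name, so row order no longer matters
toy_merged = toy_data.merge(labelled, on="Neighborhood", how="left")
print(toy_merged)
```

Applied to the notebook, the same pattern would pair `kmeans.labels_` with `brooklyn_groups['Neighborhood']` before merging into a copy of `brooklyn_data`.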


In [108]:
# create map
brooklyn_map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i+x+(i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(brooklyn_merged['Latitude'], brooklyn_merged['Longitude'], brooklyn_merged['Neighborhood'], brooklyn_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(brooklyn_map_clusters)
       
brooklyn_map_clusters

## Part 9: Examine the Brooklyn neighborhood Clusters

In [99]:
brooklyn_cluster_0 = brooklyn_merged.loc[brooklyn_merged['Cluster Labels'] == 0, brooklyn_merged.columns[[1] + list(range(4, brooklyn_merged.shape[1]))]]

brooklyn_cluster_1 = brooklyn_merged.loc[brooklyn_merged['Cluster Labels'] == 1, brooklyn_merged.columns[[1] + list(range(4, brooklyn_merged.shape[1]))]]

brooklyn_cluster_2 = brooklyn_merged.loc[brooklyn_merged['Cluster Labels'] == 2, brooklyn_merged.columns[[1] + list(range(4, brooklyn_merged.shape[1]))]]

brooklyn_cluster_3 = brooklyn_merged.loc[brooklyn_merged['Cluster Labels'] == 3, brooklyn_merged.columns[[1] + list(range(4, brooklyn_merged.shape[1]))]]

brooklyn_cluster_4 = brooklyn_merged.loc[brooklyn_merged['Cluster Labels'] == 4, brooklyn_merged.columns[[1] + list(range(4, brooklyn_merged.shape[1]))]]

brooklyn_cluster_5 = brooklyn_merged.loc[brooklyn_merged['Cluster Labels'] == 5, brooklyn_merged.columns[[1] + list(range(4, brooklyn_merged.shape[1]))]]

brooklyn_cluster_6 = brooklyn_merged.loc[brooklyn_merged['Cluster Labels'] == 6, brooklyn_merged.columns[[1] + list(range(4, brooklyn_merged.shape[1]))]]
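
The seven assignments above differ only in the cluster number, so they can also be written as a single `groupby` that yields one dataframe per label. The sketch below uses a small hypothetical frame with the same column layout as `brooklyn_merged`; its values are illustrative only.

```python
import pandas as pd

# Hypothetical frame with the same leading columns as brooklyn_merged
toy_merged = pd.DataFrame({
    "Borough": ["Brooklyn"] * 4,
    "Neighborhood": ["N1", "N2", "N3", "N4"],
    "Latitude": [40.60, 40.61, 40.62, 40.63],
    "Longitude": [-73.90, -73.91, -73.92, -73.93],
    "Cluster Labels": [0, 1, 0, 2],
    "1st Most Common Venue": ["Park", "Bar", "Diner", "Bank"],
})

# Same column selection as above: Neighborhood plus everything from column 4 on
keep = toy_merged.columns[[1] + list(range(4, toy_merged.shape[1]))]

# One dataframe per cluster label, keyed by the label
clusters = {
    label: frame[keep] for label, frame in toy_merged.groupby("Cluster Labels")
}

for label, frame in clusters.items():
    print(label, frame.shape)
```

`clusters[0]` then plays the role of `brooklyn_cluster_0`, and the loop scales to any value of `kclusters`.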


In [84]:
print(brooklyn_cluster_0.shape)
brooklyn_cluster_0

(12, 12)


Unnamed: 0,Neighborhood,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
6,Sheepshead Bay,0,Turkish Restaurant,Dessert Shop,Sandwich Place,Yoga Studio,Hotel,Restaurant,Pizza Place,Outlet Store,Miscellaneous Shop,Karaoke Bar
14,Brownsville,0,Restaurant,Moving Target,Pool,Burger Joint,Farmers Market,Fried Chicken Joint,Chinese Restaurant,Park,Performing Arts Venue,Spanish Restaurant
19,Cobble Hill,0,Playground,Bar,Coffee Shop,Pizza Place,Yoga Studio,Cocktail Bar,Deli / Bodega,Italian Restaurant,Ice Cream Shop,Wine Shop
25,Cypress Hills,0,Latin American Restaurant,Ice Cream Shop,Donut Shop,Fried Chicken Joint,Metro Station,Fast Food Restaurant,Spanish Restaurant,Dance Studio,Supermarket,Gas Station
38,Clinton Hill,0,Pizza Place,Italian Restaurant,Wine Shop,Mexican Restaurant,Thai Restaurant,Yoga Studio,Japanese Restaurant,Indian Restaurant,Restaurant,Deli / Bodega
39,Sea Gate,0,Spa,Beach,American Restaurant,Sports Club,Bus Station,Fish & Chips Shop,Falafel Restaurant,Farm,Farmers Market,Fast Food Restaurant
43,Ocean Hill,0,Deli / Bodega,Convenience Store,Southern / Soul Food Restaurant,Bakery,Playground,Coffee Shop,Donut Shop,Chinese Restaurant,Dry Cleaner,Salad Place
45,Bergen Beach,0,Harbor / Marina,Playground,Athletics & Sports,Baseball Field,Yoga Studio,Farmers Market,Ethiopian Restaurant,Event Space,Factory,Falafel Restaurant
48,Georgetown,0,Bank,Pharmacy,Donut Shop,Burger Joint,Supplement Shop,Supermarket,Frozen Yogurt Shop,Mexican Restaurant,Miscellaneous Shop,Cosmetics Shop
56,Rugby,0,Bank,Grocery Store,Caribbean Restaurant,Deli / Bodega,Bus Station,Sandwich Place,Chinese Restaurant,Fried Chicken Joint,Seafood Restaurant,Pharmacy


In [100]:
print(brooklyn_cluster_1.shape)
brooklyn_cluster_1

(1, 12)


Unnamed: 0,Neighborhood,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
4,Gravesend,1,Pizza Place,Lounge,Chinese Restaurant,Italian Restaurant,Bakery,Gym,Spa,Breakfast Spot,Furniture / Home Store,Metro Station


In [101]:
print(brooklyn_cluster_2.shape)
brooklyn_cluster_2

(1, 12)


Unnamed: 0,Neighborhood,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
47,Prospect Park South,2,Caribbean Restaurant,Mobile Phone Shop,Pizza Place,Fast Food Restaurant,Grocery Store,Mexican Restaurant,Latin American Restaurant,Donut Shop,Fried Chicken Joint,Supermarket


In [102]:
print(brooklyn_cluster_3.shape)
brooklyn_cluster_3

(1, 12)


Unnamed: 0,Neighborhood,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
23,Fort Greene,3,Wine Shop,Flower Shop,Italian Restaurant,Playground,Cocktail Bar,Opera House,Pizza Place,French Restaurant,New American Restaurant,Coffee Shop


In [103]:
print(brooklyn_cluster_4.shape)
brooklyn_cluster_4

(27, 12)


Unnamed: 0,Neighborhood,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Bay Ridge,4,Italian Restaurant,Spa,Pizza Place,Greek Restaurant,American Restaurant,Bar,Thai Restaurant,Bagel Shop,Grocery Store,Hookah Bar
3,Greenpoint,4,Bar,Pizza Place,Cocktail Bar,Coffee Shop,Yoga Studio,Sushi Restaurant,French Restaurant,Deli / Bodega,Café,Restaurant
8,Flatbush,4,Deli / Bodega,Juice Bar,Mexican Restaurant,Caribbean Restaurant,Pharmacy,Coffee Shop,Plaza,Middle Eastern Restaurant,Lounge,Liquor Store
10,East Flatbush,4,Food & Drink Shop,Supermarket,Caribbean Restaurant,Chinese Restaurant,Park,Fast Food Restaurant,Pharmacy,Liquor Store,Moving Target,Wine Shop
14,Brownsville,4,Restaurant,Moving Target,Pool,Burger Joint,Farmers Market,Fried Chicken Joint,Chinese Restaurant,Park,Performing Arts Venue,Spanish Restaurant
15,Williamsburg,4,Coffee Shop,Bar,Bagel Shop,Yoga Studio,Middle Eastern Restaurant,Tapas Restaurant,Latin American Restaurant,Taco Place,Liquor Store,Steakhouse
18,Brooklyn Heights,4,Yoga Studio,Deli / Bodega,Park,Italian Restaurant,Pizza Place,Bakery,Mexican Restaurant,Gym,Plaza,Pharmacy
19,Cobble Hill,4,Playground,Bar,Coffee Shop,Pizza Place,Yoga Studio,Cocktail Bar,Deli / Bodega,Italian Restaurant,Ice Cream Shop,Wine Shop
20,Carroll Gardens,4,Italian Restaurant,Coffee Shop,Pizza Place,Bakery,Cocktail Bar,Spa,Wine Shop,Bar,Café,Thai Restaurant
31,Manhattan Beach,4,Café,Ice Cream Shop,Harbor / Marina,Beach,Sandwich Place,Bus Stop,Pizza Place,Playground,Food,Fish Market


In [104]:
print(brooklyn_cluster_5.shape)
brooklyn_cluster_5

(5, 12)


Unnamed: 0,Neighborhood,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
12,Windsor Terrace,5,Diner,Plaza,Café,Park,Grocery Store,Deli / Bodega,American Restaurant,Bagel Shop,Sushi Restaurant,Beer Store
36,Gerritsen Beach,5,Ice Cream Shop,Pizza Place,Bar,Bagel Shop,Department Store,Convenience Store,Restaurant,Park,Event Space,Seafood Restaurant
44,City Line,5,Donut Shop,Mobile Phone Shop,Fried Chicken Joint,Grocery Store,Fast Food Restaurant,Food Truck,Metro Station,Food,South American Restaurant,Flower Shop
45,Bergen Beach,5,Harbor / Marina,Playground,Athletics & Sports,Baseball Field,Yoga Studio,Farmers Market,Ethiopian Restaurant,Event Space,Factory,Falafel Restaurant
63,Weeksville,5,Chinese Restaurant,Discount Store,Park,Juice Bar,Lounge,Liquor Store,Laundry Service,Gas Station,Grocery Store,Cocktail Bar


In [105]:
print(brooklyn_cluster_6.shape)
brooklyn_cluster_6

(26, 12)


Unnamed: 0,Neighborhood,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
1,Bensonhurst,6,Chinese Restaurant,Ice Cream Shop,Italian Restaurant,Grocery Store,Bakery,Donut Shop,Sushi Restaurant,Road,Smoke Shop,Butcher
2,Sunset Park,6,Pizza Place,Mobile Phone Shop,Mexican Restaurant,Latin American Restaurant,Bank,Bakery,Fried Chicken Joint,Gym,Grocery Store,Creperie
5,Brighton Beach,6,Beach,Russian Restaurant,Eastern European Restaurant,Restaurant,Gourmet Shop,Sushi Restaurant,Bank,Mobile Phone Shop,Taco Place,Korean Restaurant
7,Manhattan Terrace,6,Pizza Place,Donut Shop,Ice Cream Shop,Bagel Shop,Cosmetics Shop,Convenience Store,Coffee Shop,Chinese Restaurant,Organic Grocery,Steakhouse
9,Crown Heights,6,Pizza Place,Museum,Café,Grocery Store,Fried Chicken Joint,Bookstore,Candy Store,Supermarket,Sushi Restaurant,Salon / Barbershop
11,Kensington,6,Grocery Store,Thai Restaurant,Sandwich Place,Restaurant,Ice Cream Shop,Pizza Place,Pub,Café,Lingerie Store,Liquor Store
13,Prospect Heights,6,Bar,Mexican Restaurant,Thai Restaurant,Wine Shop,Café,Cocktail Bar,Gourmet Shop,Yoga Studio,Bakery,Beer Bar
16,Bushwick,6,Bar,Deli / Bodega,Mexican Restaurant,Coffee Shop,Pizza Place,Bakery,Thrift / Vintage Store,Discount Store,Vegetarian / Vegan Restaurant,Pharmacy
17,Bedford Stuyvesant,6,Coffee Shop,Pizza Place,Café,Bar,Deli / Bodega,Wine Bar,Juice Bar,Gift Shop,Basketball Court,New American Restaurant
21,Red Hook,6,Art Gallery,Seafood Restaurant,American Restaurant,Park,Bar,Ice Cream Shop,Farm,Pizza Place,Wine Shop,Flower Shop
