# The Battle of Restaurants

#### This notebook is used to implement the Coursera Capstone project 

## Table of Contents

<div class="alert alert-block alert-info" style="margin-top: 20px">

<font size = 3>

1.  <a href="#item1">Business Problem</a>

2.  <a href="#item2">Data</a>

3.  <a href="#item3">Exploratory Data Analysis</a>

4.  <a href="#item4">Conclusion</a>
    
    </font>
    </div>

## 1. Business Problem

_Jhon_ owns a restaurant chain and is thinking of opening a new restaurant in __Brooklyn__, __New York__. He wants to know the __best__ place to open it in order to maximise his profit and make the project a success.
To answer _Jhon_'s question, we will apply __Machine Learning__ techniques to __Foursquare API__ location data. The Foursquare data lets us locate the existing restaurants in Brooklyn. With this data we can apply Machine Learning algorithms, in particular __Clustering__ algorithms, to find the similarities between the __neighborhoods__ of Brooklyn in terms of their restaurants. This information would help _Jhon_ decide where to open his next restaurant, depending on the type of service he wants to offer and the clients he is targeting.

## 2. Data

In this project, we will use two sources of data. The first is the __New York__ dataset already used in the previous lab. It covers __5__ boroughs and __306__ neighborhoods and gives the __latitude__ and __longitude__ coordinates of each neighborhood. The data is stored in a __json__ file, from which we will extract the fields we need and load them into a __pandas DataFrame__ so that they are easier to manipulate. Since _Jhon_ wants to open his restaurant in Brooklyn specifically, the rest of the project focuses only on the neighborhoods of the Brooklyn borough.

The second source is the __Foursquare API__ location data. With it, we can explore every neighborhood in Brooklyn and retrieve the venues located within a given radius, along with each venue's category. Once this data is collected, we can process it to answer _Jhon_'s question: __"Where should I open my next restaurant in Brooklyn?"__ This kind of analysis is often done with Machine Learning techniques; in our case, we will use Clustering algorithms.

## 3. Exploratory Data Analysis

Let's import all the libraries we need.

In [1]:
# Import required libraries
import numpy as np
import pandas as pd 
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)
import json 
from geopy.geocoders import Nominatim 
import requests 
from pandas import json_normalize # Top-level import; the pandas.io.json path is deprecated
import matplotlib.cm as cm
import matplotlib.colors as colors
from sklearn.cluster import KMeans
import folium 
import urllib.request
print('Libraries imported.')

Libraries imported.


### 3.1 Download and Explore Dataset

The dataset has a total of 5 boroughs and 306 neighborhoods. To segment the neighborhoods and explore them, we need a dataset that lists the 5 boroughs, the neighborhoods in each borough, and the latitude and longitude coordinates of each neighborhood.

In [2]:
# Download the dataset
url = 'https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBMDeveloperSkillsNetwork-DS0701EN-SkillsNetwork/labs/newyork_data.json'
filename = 'newyork_data.json'
urllib.request.urlretrieve(url, filename)
print('Data downloaded!')

Data downloaded!


Now, let's load the data.

In [3]:
# Load the dataset
with open('newyork_data.json') as json_data:
    newyork_data = json.load(json_data)

Let's take a quick look at the data.

In [4]:
# Open the json file
newyork_data

{'type': 'FeatureCollection',
 'totalFeatures': 306,
 'features': [{'type': 'Feature',
   'id': 'nyu_2451_34572.1',
   'geometry': {'type': 'Point',
    'coordinates': [-73.84720052054902, 40.89470517661]},
   'geometry_name': 'geom',
   'properties': {'name': 'Wakefield',
    'stacked': 1,
    'annoline1': 'Wakefield',
    'annoline2': None,
    'annoline3': None,
    'annoangle': 0.0,
    'borough': 'Bronx',
    'bbox': [-73.84720052054902,
     40.89470517661,
     -73.84720052054902,
     40.89470517661]}},
  {'type': 'Feature',
   'id': 'nyu_2451_34572.2',
   'geometry': {'type': 'Point',
    'coordinates': [-73.82993910812398, 40.87429419303012]},
   'geometry_name': 'geom',
   'properties': {'name': 'Co-op City',
    'stacked': 2,
    'annoline1': 'Co-op',
    'annoline2': 'City',
    'annoline3': None,
    'annoangle': 0.0,
    'borough': 'Bronx',
    'bbox': [-73.82993910812398,
     40.87429419303012,
     -73.82993910812398,
     40.87429419303012]}},
  {'type': 'Feature',
 

We notice that all the relevant data is in the _features_ key, which is basically a list of the neighborhoods. So, let's define a new variable that includes this data and take a quick look at the first item.

In [5]:
# Extract the data in "features"
new_york_neigh=newyork_data["features"]
# Print the first item
new_york_neigh[0]

{'type': 'Feature',
 'id': 'nyu_2451_34572.1',
 'geometry': {'type': 'Point',
  'coordinates': [-73.84720052054902, 40.89470517661]},
 'geometry_name': 'geom',
 'properties': {'name': 'Wakefield',
  'stacked': 1,
  'annoline1': 'Wakefield',
  'annoline2': None,
  'annoline3': None,
  'annoangle': 0.0,
  'borough': 'Bronx',
  'bbox': [-73.84720052054902,
   40.89470517661,
   -73.84720052054902,
   40.89470517661]}}

Next, we have to transform the data from the json file into a _pandas_ DataFrame. Let's loop through the data, collect one row per neighborhood, and then build the DataFrame from those rows.

In [6]:
# Define the DataFrame columns
column_names = ['Borough', 'Neighborhood', 'Latitude', 'Longitude'] 
# Collect one dictionary per neighborhood
rows = []
for data in new_york_neigh:
    borough = data['properties']['borough']
    neighborhood_name = data['properties']['name']
    # GeoJSON coordinates are ordered [longitude, latitude]
    neighborhood_latlon = data['geometry']['coordinates']
    neighborhood_lat = neighborhood_latlon[1]
    neighborhood_lon = neighborhood_latlon[0]
    rows.append({'Borough': borough,
                 'Neighborhood': neighborhood_name,
                 'Latitude': neighborhood_lat,
                 'Longitude': neighborhood_lon})
# Build the DataFrame in one step (DataFrame.append was removed in pandas 2.0)
neighborhoods = pd.DataFrame(rows, columns=column_names)
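
As an alternative to the explicit loop, the nested json records could probably be flattened with `pd.json_normalize`. The sketch below is not part of the original notebook; it uses two sample features trimmed from the output shown earlier, and the name `neighborhoods_alt` is only illustrative.

```python
import pandas as pd

# Two sample features trimmed from the GeoJSON structure displayed above
sample_features = [
    {"geometry": {"type": "Point",
                  "coordinates": [-73.84720052054902, 40.89470517661]},
     "properties": {"name": "Wakefield", "borough": "Bronx"}},
    {"geometry": {"type": "Point",
                  "coordinates": [-73.82993910812398, 40.87429419303012]},
     "properties": {"name": "Co-op City", "borough": "Bronx"}},
]

# Flatten nested dictionaries into dotted column names
flat = pd.json_normalize(sample_features)

# GeoJSON stores coordinates as [longitude, latitude]
neighborhoods_alt = pd.DataFrame({
    "Borough": flat["properties.borough"],
    "Neighborhood": flat["properties.name"],
    "Latitude": [coords[1] for coords in flat["geometry.coordinates"]],
    "Longitude": [coords[0] for coords in flat["geometry.coordinates"]],
})
print(neighborhoods_alt)
```

On the full dataset, `sample_features` would be replaced by `new_york_neigh`.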

Let's take a quick look at the resulting DataFrame.

In [7]:
# Show the first five rows
neighborhoods.head()

Unnamed: 0,Borough,Neighborhood,Latitude,Longitude
0,Bronx,Wakefield,40.894705,-73.847201
1,Bronx,Co-op City,40.874294,-73.829939
2,Bronx,Eastchester,40.887556,-73.827806
3,Bronx,Fieldston,40.895437,-73.905643
4,Bronx,Riverdale,40.890834,-73.912585


We stated at the beginning that this dataset contains 5 boroughs and 306 neighborhoods. Let's check that the data has been loaded completely and successfully.


In [8]:
# Check the number of boroughs and neighborhoods
print('The dataframe has {} boroughs and {} neighborhoods.'.format(len(neighborhoods['Borough'].unique()),neighborhoods.shape[0]))

The dataframe has 5 boroughs and 306 neighborhoods.


Before working with our data, let's do some quick preprocessing and check whether there are any _missing_ or _not assigned_ cells. If there are, we will drop the rows that contain missing values.

In [9]:
neighborhoods.replace("Not assigned",np.nan, inplace = True) # Replace "Not assigned" with NaN
neighborhoods.replace("",np.nan, inplace = True) # Replace empty strings with NaN
neighborhoods.replace(" ",np.nan, inplace = True) # Replace single spaces with NaN
neighborhoods.replace("-",np.nan, inplace = True) # Replace "-" with NaN
neighborhoods.dropna(inplace = True) # Drop rows with missing values

Let's check again the size of our dataset. If it changes, then we had missing values.

In [10]:
# Check the number of boroughs and neighborhoods
print('The dataframe has {} boroughs and {} neighborhoods.'.format(len(neighborhoods['Borough'].unique()),neighborhoods.shape[0]))

The dataframe has 5 boroughs and 306 neighborhoods.
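
Comparing row counts before and after works, but counting the missing cells per column gives a more direct answer. The following is a minimal sketch on a made-up toy DataFrame (the toy values and the names `toy`, `placeholders` and `cleaned` are assumptions for illustration, not data from the notebook).

```python
import numpy as np
import pandas as pd

# Toy data with two placeholder values standing in for missing cells
toy = pd.DataFrame({
    "Borough": ["Bronx", "Not assigned", "Brooklyn"],
    "Neighborhood": ["Wakefield", "Co-op City", " "],
})

# Same placeholder strings as in the cell above, replaced in one call
placeholders = ["Not assigned", "", " ", "-"]
cleaned = toy.replace(placeholders, np.nan)

# Number of missing cells in each column
missing_per_column = cleaned.isna().sum()
print(missing_per_column)

# Rows that survive once incomplete rows are dropped
print(len(cleaned.dropna()))
```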


Let's get the geographic coordinates of __New York City__ using the _geopy_ geocoder and plot them alongside the 306 neighborhoods using _Folium_.

In [11]:
# Define an instance of the geocoder
address = 'New York City, NY'
geolocator = Nominatim(user_agent="ny_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
# Create map of New York using latitude and longitude values
map_newyork = folium.Map(location=[latitude, longitude], zoom_start=10)
# Add markers to map
for lat, lng, borough, neighborhood in zip(neighborhoods['Latitude'], neighborhoods['Longitude'], neighborhoods['Borough'], neighborhoods['Neighborhood']):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker([lat, lng], radius=5, popup=label, color='blue', fill=True, fill_color='#3186cc', fill_opacity=0.7, parse_html=False).add_to(map_newyork)     
map_newyork # Display map

However, as we stated in the Business Problem, our friend _Jhon_ is only interested in opening a restaurant in the __Brooklyn__ borough. So let's simplify the map above and segment and cluster only the neighborhoods in Brooklyn. To do that, we'll slice the original DataFrame and create a new DataFrame that contains only the Brooklyn data.


In [12]:
# Create the new DataFrame for Brooklyn borough
brooklyn_data = neighborhoods[neighborhoods['Borough'] == 'Brooklyn'].reset_index(drop=True)
# Show the first five rows
brooklyn_data.head()

Unnamed: 0,Borough,Neighborhood,Latitude,Longitude
0,Brooklyn,Bay Ridge,40.625801,-74.030621
1,Brooklyn,Bensonhurst,40.611009,-73.99518
2,Brooklyn,Sunset Park,40.645103,-74.010316
3,Brooklyn,Greenpoint,40.730201,-73.954241
4,Brooklyn,Gravesend,40.59526,-73.973471


Next, as we did for New York City, let's get the geographic coordinates of __Brooklyn__ and plot its own map using _Folium_.

In [13]:
# Define an instance of the geocoder
address = 'Brooklyn, NY'
geolocator = Nominatim(user_agent="ny_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
# Create map of Brooklyn using latitude and longitude values
map_brooklyn = folium.Map(location=[latitude, longitude], zoom_start=11)
# Add markers to map
for lat, lng, label in zip(brooklyn_data['Latitude'], brooklyn_data['Longitude'], brooklyn_data['Neighborhood']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker([lat, lng], radius=5, popup=label, color='blue', fill=True, fill_color='#3186cc', fill_opacity=0.7, parse_html=False).add_to(map_brooklyn)      
map_brooklyn # Display map

### 3.2 Explore Neighborhoods in Brooklyn

We are going to use the __Foursquare API__ to explore the neighbourhoods and segment them.

__Let's define Foursquare Credentials and Version__.

In [14]:
CLIENT_ID = 'RLHUH1EHWMKBYQZQ5FOQU1JEJZSHUQYATCPNPC2GPEU1D3ZF' # Foursquare ID
CLIENT_SECRET = 'PRMGPYDSCSEDMSMZCJ32SCBQAD2TJFDA3LUCU2D1ZYLUGJLB' # Foursquare Secret
VERSION = '20180605' # Foursquare API version
LIMIT = 150 # Foursquare API limit value
radius = 2000 # Define radius

The next function gets up to __150 venues__ located within a __2000-meter__ radius of each neighborhood and stores all the information in a DataFrame.

In [15]:
# Function that fetches the nearby venues (name, coordinates, category) for each neighborhood
def getNearbyVenues(names, latitudes, longitudes, radius=2000, LIMIT=150):
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
        # Create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(CLIENT_ID, CLIENT_SECRET, VERSION, lat, lng, radius, LIMIT)
        # Make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        # Return only relevant information for each nearby venue
        venues_list.append([(name, lat, lng, v['venue']['name'], v['venue']['location']['lat'], v['venue']['location']['lng'], v['venue']['categories'][0]['name']) for v in results])
    # Put everything into a pandas DataFrame
    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 'Neighborhood Latitude', 'Neighborhood Longitude', 'Venue', 'Venue Latitude', 'Venue Longitude', 'Venue Category'] 
    return(nearby_venues)
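
The function above indexes straight into the response, so a venue without a category or a response without `groups` would raise an exception. Below is a minimal sketch of a more defensive parsing step. The helper name `extract_venue_rows` is hypothetical, and the payload layout is assumed only from the keys that `getNearbyVenues` accesses; the sample payload is hand-written, not a real API response.

```python
def extract_venue_rows(name, lat, lng, payload):
    """Turn one decoded 'explore' response into rows (hypothetical helper)."""
    groups = payload.get("response", {}).get("groups", [])
    items = groups[0].get("items", []) if groups else []
    rows = []
    for item in items:
        venue = item["venue"]
        categories = venue.get("categories") or []
        # Fall back to None when a venue has no category listed
        category = categories[0]["name"] if categories else None
        rows.append((name, lat, lng, venue["name"],
                     venue["location"]["lat"], venue["location"]["lng"],
                     category))
    return rows


# Hand-written payload that mimics the keys accessed in getNearbyVenues;
# the second venue is invented to exercise the "no category" branch
fake_payload = {"response": {"groups": [{"items": [
    {"venue": {"name": "Bagel Boy",
               "location": {"lat": 40.627896, "lng": -74.029335},
               "categories": [{"name": "Bagel Shop"}]}},
    {"venue": {"name": "Venue Without Category",
               "location": {"lat": 40.62, "lng": -74.03},
               "categories": []}},
]}]}}

rows = extract_venue_rows("Bay Ridge", 40.625801, -74.030621, fake_payload)
print(rows)

# A payload without the expected keys yields no rows instead of raising
empty_rows = extract_venue_rows("Bay Ridge", 40.625801, -74.030621, {})
print(empty_rows)
```

Keeping the parsing separate from the HTTP request also makes it possible to test it without network access or credentials.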

Let's test the function and get the DataFrame containing all the venue __categories__ in Brooklyn.

In [16]:
# Test the function to verify its output
brooklyn_venues = getNearbyVenues(names=brooklyn_data['Neighborhood'], latitudes=brooklyn_data['Latitude'], longitudes=brooklyn_data['Longitude'])
brooklyn_venues.head() # Show the first five rows

Bay Ridge
Bensonhurst
Sunset Park
Greenpoint
Gravesend
Brighton Beach
Sheepshead Bay
Manhattan Terrace
Flatbush
Crown Heights
East Flatbush
Kensington
Windsor Terrace
Prospect Heights
Brownsville
Williamsburg
Bushwick
Bedford Stuyvesant
Brooklyn Heights
Cobble Hill
Carroll Gardens
Red Hook
Gowanus
Fort Greene
Park Slope
Cypress Hills
East New York
Starrett City
Canarsie
Flatlands
Mill Island
Manhattan Beach
Coney Island
Bath Beach
Borough Park
Dyker Heights
Gerritsen Beach
Marine Park
Clinton Hill
Sea Gate
Downtown
Boerum Hill
Prospect Lefferts Gardens
Ocean Hill
City Line
Bergen Beach
Midwood
Prospect Park South
Georgetown
East Williamsburg
North Side
South Side
Ocean Parkway
Fort Hamilton
Ditmas Park
Wingate
Rugby
Remsen Village
New Lots
Paerdegat Basin
Mill Basin
Fulton Ferry
Vinegar Hill
Weeksville
Broadway Junction
Dumbo
Homecrest
Highland Park
Madison
Erasmus


Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Bay Ridge,40.625801,-74.030621,Bagel Boy,40.627896,-74.029335,Bagel Shop
1,Bay Ridge,40.625801,-74.030621,Pilo Arts Day Spa and Salon,40.624748,-74.030591,Spa
2,Bay Ridge,40.625801,-74.030621,Pegasus Cafe,40.623168,-74.031186,Breakfast Spot
3,Bay Ridge,40.625801,-74.030621,Ho' Brah Taco Joint,40.62296,-74.031371,Taco Place
4,Bay Ridge,40.625801,-74.030621,Mimi Nails,40.622571,-74.031477,Spa


Let's check the DataFrame shape.

In [17]:
brooklyn_venues.shape # Size of DataFrame

(6928, 7)

Since our friend _Jhon_ is only interested in __restaurants__, we'll reduce the current DataFrame to a new DataFrame containing only restaurants.

In [18]:
# Create a DataFrame with only restaurants
brookly_restaurants = pd.DataFrame(brooklyn_venues[brooklyn_venues['Venue Category'].str.contains('Restaurant')])
brookly_restaurants.head() # Show the first five rows

Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
6,Bay Ridge,40.625801,-74.030621,Karam,40.622931,-74.028316,Middle Eastern Restaurant
15,Bay Ridge,40.625801,-74.030621,Tanoreen,40.630726,-74.027954,Lebanese Restaurant
17,Bay Ridge,40.625801,-74.030621,Tuscany Grill,40.622913,-74.031387,Italian Restaurant
19,Bay Ridge,40.625801,-74.030621,Chadwick's Restaurant,40.62145,-74.031964,American Restaurant
23,Bay Ridge,40.625801,-74.030621,Elia Restaurant,40.62309,-74.031156,Greek Restaurant


Let's check the current shape. We notice that it went down from __6928__ venues to only __1693__.

In [19]:
brookly_restaurants.shape # Size of DataFrame

(1693, 7)
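
Note that the filter keeps only categories whose name contains the word "Restaurant". The sketch below applies the same test to the categories visible in the two previews above, which suggests that food venues such as "Taco Place" or "Bagel Shop" are excluded by this rule. The variable names are illustrative only.

```python
import pandas as pd

# Venue categories taken from the two previews shown above
categories = pd.Series([
    "Bagel Shop", "Spa", "Breakfast Spot", "Taco Place", "Spa",
    "Middle Eastern Restaurant", "Lebanese Restaurant",
])

# Same substring test as in the filter above (plain text, no regex)
is_restaurant = categories.str.contains("Restaurant", regex=False)

kept = categories[is_restaurant].tolist()
dropped = categories[~is_restaurant].unique().tolist()
print("Kept:", kept)
print("Dropped:", dropped)
```

Whether those excluded food venues matter depends on the kind of restaurant _Jhon_ has in mind.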

Let's have a look at the restaurant __types__ and the __number__ of restaurants of each type.

In [20]:
brookly_restaurants["Venue Category"].value_counts() # Unique categories of the variable

Caribbean Restaurant               224
Italian Restaurant                 186
Mexican Restaurant                 102
Chinese Restaurant                  99
Japanese Restaurant                 89
Sushi Restaurant                    86
American Restaurant                 86
Seafood Restaurant                  60
Restaurant                          60
New American Restaurant             46
Fast Food Restaurant                45
Latin American Restaurant           44
Thai Restaurant                     41
Vietnamese Restaurant               38
Middle Eastern Restaurant           38
French Restaurant                   29
Southern / Soul Food Restaurant     29
Russian Restaurant                  29
Mediterranean Restaurant            27
Turkish Restaurant                  26
Greek Restaurant                    23
Tapas Restaurant                    23
Indian Restaurant                   20
Vegetarian / Vegan Restaurant       17
Eastern European Restaurant         15
Asian Restaurant         

### 3.3 Analyze Each Neighborhood

We apply *one-hot encoding* to the __"Venue Category"__ column and store the result in a new DataFrame. We then add the __"Neighborhood"__ column to the new DataFrame and move it to the first position.

In [21]:
# One Hot encoding
brooklyn_dummy= pd.get_dummies(brookly_restaurants[['Venue Category']], prefix="", prefix_sep="")
# Add neighborhood column back to DataFrame
brooklyn_dummy['Neighborhood'] = brookly_restaurants['Neighborhood'] 
# Move neighborhood column to the first column
fixed_columns = [brooklyn_dummy.columns[-1]] + list(brooklyn_dummy.columns[:-1])
brooklyn_dummy = brooklyn_dummy[fixed_columns]
brooklyn_dummy.head() # Show the first five rows

Unnamed: 0,Neighborhood,African Restaurant,American Restaurant,Arepa Restaurant,Argentinian Restaurant,Asian Restaurant,Austrian Restaurant,Brazilian Restaurant,Burmese Restaurant,Cajun / Creole Restaurant,Cantonese Restaurant,Caribbean Restaurant,Caucasian Restaurant,Chinese Restaurant,Colombian Restaurant,Comfort Food Restaurant,Cuban Restaurant,Dim Sum Restaurant,Dumpling Restaurant,Eastern European Restaurant,Empanada Restaurant,Ethiopian Restaurant,Falafel Restaurant,Fast Food Restaurant,Filipino Restaurant,French Restaurant,German Restaurant,Greek Restaurant,Halal Restaurant,Hotpot Restaurant,Indian Restaurant,Israeli Restaurant,Italian Restaurant,Japanese Restaurant,Jewish Restaurant,Kebab Restaurant,Korean Restaurant,Latin American Restaurant,Lebanese Restaurant,Mediterranean Restaurant,Mexican Restaurant,Middle Eastern Restaurant,Moroccan Restaurant,New American Restaurant,Paella Restaurant,Persian Restaurant,Peruvian Restaurant,Polish Restaurant,Ramen Restaurant,Restaurant,Russian Restaurant,Scandinavian Restaurant,Seafood Restaurant,Shanghai Restaurant,South American Restaurant,Southern / Soul Food Restaurant,Spanish Restaurant,Sushi Restaurant,Szechuan Restaurant,Taiwanese Restaurant,Tapas Restaurant,Thai Restaurant,Tibetan Restaurant,Turkish Restaurant,Udon Restaurant,Vegetarian / Vegan Restaurant,Vietnamese Restaurant
6,Bay Ridge,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
15,Bay Ridge,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
17,Bay Ridge,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
19,Bay Ridge,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
23,Bay Ridge,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0


Now, let's group the rows by neighbourhood and take the __mean of the frequency of occurrence__ of each category.

In [22]:
brooklyn_grouped = brooklyn_dummy.groupby('Neighborhood').mean().reset_index() # Mean of each one-hot column = share of that category in the neighborhood
brooklyn_grouped.head() # Show the first five rows

Unnamed: 0,Neighborhood,African Restaurant,American Restaurant,Arepa Restaurant,Argentinian Restaurant,Asian Restaurant,Austrian Restaurant,Brazilian Restaurant,Burmese Restaurant,Cajun / Creole Restaurant,Cantonese Restaurant,Caribbean Restaurant,Caucasian Restaurant,Chinese Restaurant,Colombian Restaurant,Comfort Food Restaurant,Cuban Restaurant,Dim Sum Restaurant,Dumpling Restaurant,Eastern European Restaurant,Empanada Restaurant,Ethiopian Restaurant,Falafel Restaurant,Fast Food Restaurant,Filipino Restaurant,French Restaurant,German Restaurant,Greek Restaurant,Halal Restaurant,Hotpot Restaurant,Indian Restaurant,Israeli Restaurant,Italian Restaurant,Japanese Restaurant,Jewish Restaurant,Kebab Restaurant,Korean Restaurant,Latin American Restaurant,Lebanese Restaurant,Mediterranean Restaurant,Mexican Restaurant,Middle Eastern Restaurant,Moroccan Restaurant,New American Restaurant,Paella Restaurant,Persian Restaurant,Peruvian Restaurant,Polish Restaurant,Ramen Restaurant,Restaurant,Russian Restaurant,Scandinavian Restaurant,Seafood Restaurant,Shanghai Restaurant,South American Restaurant,Southern / Soul Food Restaurant,Spanish Restaurant,Sushi Restaurant,Szechuan Restaurant,Taiwanese Restaurant,Tapas Restaurant,Thai Restaurant,Tibetan Restaurant,Turkish Restaurant,Udon Restaurant,Vegetarian / Vegan Restaurant,Vietnamese Restaurant
0,Bath Beach,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.045455,0.0,0.045455,0.090909,0.0,0.0,0.0,0.045455,0.0,0.0,0.0,0.0,0.0,0.045455,0.0,0.0,0.045455,0.0,0.0,0.0,0.0,0.0,0.318182,0.045455,0.0,0.0,0.0,0.0,0.0,0.0,0.045455,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.136364,0.0,0.0,0.0,0.0,0.0,0.045455,0.0,0.0,0.090909
1,Bay Ridge,0.0,0.1,0.0,0.0,0.033333,0.0,0.0,0.0,0.0,0.0,0.0,0.033333,0.1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.133333,0.033333,0.0,0.033333,0.0,0.166667,0.066667,0.0,0.0,0.0,0.0,0.033333,0.0,0.033333,0.066667,0.0,0.033333,0.0,0.0,0.0,0.0,0.0,0.066667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.033333,0.0,0.0,0.0,0.0,0.0,0.033333,0.0,0.0,0.0
2,Bedford Stuyvesant,0.071429,0.035714,0.035714,0.0,0.035714,0.0,0.0,0.0,0.0,0.0,0.107143,0.0,0.035714,0.0,0.035714,0.071429,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.035714,0.107143,0.0,0.0,0.0,0.0,0.0,0.0,0.035714,0.035714,0.0,0.0,0.0,0.0,0.0,0.0,0.035714,0.0,0.0,0.142857,0.0,0.0,0.0,0.0,0.035714,0.0,0.0,0.0,0.035714,0.0,0.0,0.035714,0.0,0.035714,0.0,0.0,0.035714,0.0,0.0,0.0,0.0,0.0,0.0
3,Bensonhurst,0.0,0.0,0.0,0.0,0.035714,0.0,0.0,0.035714,0.0,0.035714,0.0,0.0,0.178571,0.0,0.0,0.0,0.035714,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.035714,0.035714,0.0,0.0,0.0,0.0,0.285714,0.035714,0.0,0.0,0.0,0.0,0.0,0.0,0.071429,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.107143,0.0,0.0,0.0,0.0,0.0,0.035714,0.0,0.0,0.071429
4,Bergen Beach,0.0,0.153846,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.076923,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.230769,0.153846,0.0,0.0,0.0,0.0,0.0,0.0,0.153846,0.0,0.0,0.0,0.0,0.0,0.076923,0.0,0.0,0.0,0.0,0.0,0.076923,0.0,0.0,0.0,0.0,0.076923,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
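
To see why the mean of one-hot columns can be read as a frequency, here is a small worked example on made-up data (the neighborhoods "A" and "B" and their venues are invented for illustration). Each row of the result sums to 1, because every restaurant belongs to exactly one category.

```python
import pandas as pd

# Toy data: neighborhood A has 4 restaurants, neighborhood B has 2
toy = pd.DataFrame({
    "Neighborhood": ["A", "A", "A", "A", "B", "B"],
    "Venue Category": ["Italian Restaurant", "Italian Restaurant",
                       "Thai Restaurant", "Greek Restaurant",
                       "Thai Restaurant", "Thai Restaurant"],
})

# One column per category, 1.0 where the row belongs to that category
dummies = pd.get_dummies(toy["Venue Category"], dtype=float)
dummies.insert(0, "Neighborhood", toy["Neighborhood"])

# Averaging the indicators gives the share of each category per neighborhood
shares = dummies.groupby("Neighborhood").mean()
print(shares)
```

In neighborhood A, two of the four restaurants are Italian, so the Italian share is 2 / 4 = 0.5.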


Now, let's create a new DataFrame that lists, for each neighbourhood, its __Top 15__ restaurant categories in descending order of frequency.
To do so, let's first create a function that sorts the categories of a row in descending order.

In [23]:
# Function that returns the most frequent categories of a row, sorted in descending order
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]
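
A quick way to check the function is to call it on a hand-made row. The sketch below repeats the function so that it runs on its own; the toy row and its values are invented. It also shows that when fewer categories have a non-zero frequency than `num_top_venues`, the remaining positions are filled with zero-frequency categories.

```python
import pandas as pd


def return_most_common_venues(row, num_top_venues):
    # Skip the first entry, which holds the neighborhood name
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    return row_categories_sorted.index.values[0:num_top_venues]


# Toy row: the neighborhood name followed by category frequencies
toy_row = pd.Series({
    "Neighborhood": "Toy Neighborhood",
    "Greek Restaurant": 0.0,
    "Italian Restaurant": 0.6,
    "Thai Restaurant": 0.4,
})

top_three = list(return_most_common_venues(toy_row, 3))
print(top_three)
```

The last position here is a category with a frequency of 0, so the lower ranks of a Top 15 list should be read with care for neighborhoods that have few restaurants.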

Next, we create the DataFrame.

In [24]:
num_top_venues = 15
indicators = ['st', 'nd', 'rd']
# Create columns according to number of top restaurants
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Restaurant'.format(ind+1, indicators[ind]))
    except IndexError: # No suffix defined beyond 'rd', fall back to 'th'
        columns.append('{}th Most Common Restaurant'.format(ind+1))
# Create a new DataFrame
neighbourhoods_venues_sorted = pd.DataFrame(columns=columns)
neighbourhoods_venues_sorted['Neighborhood'] = brooklyn_grouped['Neighborhood']
for ind in np.arange(brooklyn_grouped.shape[0]):
    neighbourhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(brooklyn_grouped.iloc[ind, :], num_top_venues)
neighbourhoods_venues_sorted.head()

Unnamed: 0,Neighborhood,1st Most Common Restaurant,2nd Most Common Restaurant,3rd Most Common Restaurant,4th Most Common Restaurant,5th Most Common Restaurant,6th Most Common Restaurant,7th Most Common Restaurant,8th Most Common Restaurant,9th Most Common Restaurant,10th Most Common Restaurant,11th Most Common Restaurant,12th Most Common Restaurant,13th Most Common Restaurant,14th Most Common Restaurant,15th Most Common Restaurant
0,Bath Beach,Italian Restaurant,Sushi Restaurant,Vietnamese Restaurant,Chinese Restaurant,Mexican Restaurant,Cantonese Restaurant,Caucasian Restaurant,Dim Sum Restaurant,Fast Food Restaurant,German Restaurant,Japanese Restaurant,Turkish Restaurant,Southern / Soul Food Restaurant,Colombian Restaurant,French Restaurant
1,Bay Ridge,Italian Restaurant,Greek Restaurant,American Restaurant,Chinese Restaurant,Japanese Restaurant,Middle Eastern Restaurant,Restaurant,Caucasian Restaurant,Lebanese Restaurant,Mexican Restaurant,New American Restaurant,Halal Restaurant,Indian Restaurant,Asian Restaurant,Turkish Restaurant
2,Bedford Stuyvesant,New American Restaurant,Caribbean Restaurant,French Restaurant,African Restaurant,Cuban Restaurant,Ramen Restaurant,American Restaurant,Arepa Restaurant,Asian Restaurant,Chinese Restaurant,Comfort Food Restaurant,Filipino Restaurant,Italian Restaurant,Mexican Restaurant,Japanese Restaurant
3,Bensonhurst,Italian Restaurant,Chinese Restaurant,Sushi Restaurant,Vietnamese Restaurant,Mexican Restaurant,Asian Restaurant,Burmese Restaurant,Cantonese Restaurant,Dim Sum Restaurant,German Restaurant,Greek Restaurant,Japanese Restaurant,Turkish Restaurant,South American Restaurant,Colombian Restaurant
4,Bergen Beach,Italian Restaurant,Mexican Restaurant,Japanese Restaurant,American Restaurant,Peruvian Restaurant,Sushi Restaurant,Chinese Restaurant,Seafood Restaurant,Eastern European Restaurant,Empanada Restaurant,Ethiopian Restaurant,Falafel Restaurant,Arepa Restaurant,Dumpling Restaurant,Filipino Restaurant
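
The column labels above are built by indexing a three-element list and falling back to "th" when the index is out of range. That is fine for 15 columns, but it would label a 21st column "21th". A possible alternative is a small helper that computes the suffix; `ordinal` is a hypothetical name, not part of the original notebook.

```python
def ordinal(n):
    """Return n with its English ordinal suffix (hypothetical helper)."""
    # 11, 12 and 13 (and 111, 112, ...) take 'th' despite their last digit
    if 10 <= n % 100 <= 20:
        suffix = "th"
    else:
        suffix = {1: "st", 2: "nd", 3: "rd"}.get(n % 10, "th")
    return f"{n}{suffix}"


num_top_venues = 15
columns = ["Neighborhood"] + [
    f"{ordinal(i)} Most Common Restaurant" for i in range(1, num_top_venues + 1)
]
print(columns)
```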


### 3.4 Cluster Neighborhoods

After preparing the DataFrame, it's time to run a __Clustering__ algorithm on it.

In my case, I'll proceed with the __K-Means__ algorithm using __5 clusters__.

In [25]:
# Number of clusters
kclusters = 5
brooklyn_grouped_clustering = brooklyn_grouped.drop('Neighborhood', axis = 1)
# Run K-Means
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(brooklyn_grouped_clustering)
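
The number of clusters is fixed at 5 here. One common way to sanity-check such a choice, which is not what this notebook does, is to compare silhouette scores for several values of k. The sketch below uses synthetic data as a stand-in for `brooklyn_grouped_clustering`; the blob centers and sizes are arbitrary assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

rng = np.random.default_rng(0)

# Synthetic stand-in for the feature matrix: three well-separated blobs
centers = np.array([[0.0, 0.0], [5.0, 5.0], [0.0, 5.0]])
X = np.vstack([c + 0.3 * rng.standard_normal((20, 2)) for c in centers])

# Fit K-Means for several k and record the mean silhouette score
scores = {}
for k in range(2, 7):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    scores[k] = silhouette_score(X, labels)

best_k = max(scores, key=scores.get)
print(scores)
print("Best k by silhouette:", best_k)
```

On the real data, `X` would be replaced by `brooklyn_grouped_clustering`, and the score would be only one input to the choice of k alongside how interpretable the clusters are.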

Now let's create a __final DataFrame__ that contains the previous information along with the cluster labels and the __Top 15__ restaurant categories.

In [26]:
# Add clustering labels
neighbourhoods_venues_sorted.insert(0, 'Cluster Label', kmeans.labels_)
brooklyn_merged = brooklyn_data
# Join neighbourhoods_venues_sorted onto brooklyn_data to add latitude/longitude for each neighborhood
brooklyn_merged = brooklyn_merged.join(neighbourhoods_venues_sorted.set_index('Neighborhood'), on='Neighborhood', how ='inner')
brooklyn_merged.head() # Show the first five rows

Unnamed: 0,Borough,Neighborhood,Latitude,Longitude,Cluster Label,1st Most Common Restaurant,2nd Most Common Restaurant,3rd Most Common Restaurant,4th Most Common Restaurant,5th Most Common Restaurant,6th Most Common Restaurant,7th Most Common Restaurant,8th Most Common Restaurant,9th Most Common Restaurant,10th Most Common Restaurant,11th Most Common Restaurant,12th Most Common Restaurant,13th Most Common Restaurant,14th Most Common Restaurant,15th Most Common Restaurant
0,Brooklyn,Bay Ridge,40.625801,-74.030621,1,Italian Restaurant,Greek Restaurant,American Restaurant,Chinese Restaurant,Japanese Restaurant,Middle Eastern Restaurant,Restaurant,Caucasian Restaurant,Lebanese Restaurant,Mexican Restaurant,New American Restaurant,Halal Restaurant,Indian Restaurant,Asian Restaurant,Turkish Restaurant
1,Brooklyn,Bensonhurst,40.611009,-73.99518,3,Italian Restaurant,Chinese Restaurant,Sushi Restaurant,Vietnamese Restaurant,Mexican Restaurant,Asian Restaurant,Burmese Restaurant,Cantonese Restaurant,Dim Sum Restaurant,German Restaurant,Greek Restaurant,Japanese Restaurant,Turkish Restaurant,South American Restaurant,Colombian Restaurant
2,Brooklyn,Sunset Park,40.645103,-74.010316,1,Mexican Restaurant,Latin American Restaurant,Chinese Restaurant,Vietnamese Restaurant,Dumpling Restaurant,Restaurant,Sushi Restaurant,Seafood Restaurant,Szechuan Restaurant,Cantonese Restaurant,Dim Sum Restaurant,Greek Restaurant,Hotpot Restaurant,Italian Restaurant,Vegetarian / Vegan Restaurant
3,Brooklyn,Greenpoint,40.730201,-73.954241,1,Italian Restaurant,Polish Restaurant,Japanese Restaurant,French Restaurant,New American Restaurant,Seafood Restaurant,Brazilian Restaurant,Chinese Restaurant,Jewish Restaurant,Mediterranean Restaurant,Mexican Restaurant,Moroccan Restaurant,Restaurant,Vietnamese Restaurant,Southern / Soul Food Restaurant
4,Brooklyn,Gravesend,40.59526,-73.973471,3,Vietnamese Restaurant,Sushi Restaurant,Italian Restaurant,Chinese Restaurant,Turkish Restaurant,Mexican Restaurant,Restaurant,Russian Restaurant,Japanese Restaurant,Caucasian Restaurant,Falafel Restaurant,New American Restaurant,Eastern European Restaurant,Cantonese Restaurant,Peruvian Restaurant


Finally, let's visualize the resulting clusters on a map of Brooklyn using _Folium_.

You can click on a colored circle to see the __neighborhood__, its __cluster label__ and its __most common__ restaurant category.

In [27]:
# Create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)
# Set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]
# Add markers to the map
markers_colors = []
for lat, lon, poi, cluster, rest in zip(brooklyn_merged['Latitude'], brooklyn_merged['Longitude'], brooklyn_merged['Neighborhood'],brooklyn_merged['Cluster Label'] ,brooklyn_merged['1st Most Common Restaurant']):
    label = folium.Popup(str(poi) + '\nCluster ' + str(cluster) + '\n' + str(rest), parse_html=True)
    folium.CircleMarker([lat, lon], radius=5, popup=label, color=rainbow[cluster-1], fill=True, fill_color=rainbow[cluster-1], fill_opacity=0.7).add_to(map_clusters)  
map_clusters # Display map

### 3.5 Examine Clusters

Now, we can examine each cluster and determine the discriminating restaurant categories that distinguish each cluster.
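
Before going through the clusters one by one, a cross-tabulation of cluster labels against the most common category can give a compact overview. The sketch below uses only the five rows of `brooklyn_merged` displayed earlier, reduced to three columns; on the full DataFrame the same call could be applied directly.

```python
import pandas as pd

# First five rows of brooklyn_merged as displayed earlier (subset of columns)
sample = pd.DataFrame({
    "Neighborhood": ["Bay Ridge", "Bensonhurst", "Sunset Park",
                     "Greenpoint", "Gravesend"],
    "Cluster Label": [1, 3, 1, 1, 3],
    "1st Most Common Restaurant": ["Italian Restaurant", "Italian Restaurant",
                                   "Mexican Restaurant", "Italian Restaurant",
                                   "Vietnamese Restaurant"],
})

# Count how often each category ranks first within each cluster
summary = pd.crosstab(sample["Cluster Label"],
                      sample["1st Most Common Restaurant"])
print(summary)
```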


#### Cluster 1

In [28]:
# Create a DataFrame to display neighborhoods and restaurants that exist in Cluster 1 
cluster1=pd.DataFrame(brooklyn_merged.loc[brooklyn_merged['Cluster Label'] == 0, brooklyn_merged.columns[[1] + list(range(5, brooklyn_merged.shape[1]))]])
cluster1 # Show the DataFrame

Unnamed: 0,Neighborhood,1st Most Common Restaurant,2nd Most Common Restaurant,3rd Most Common Restaurant,4th Most Common Restaurant,5th Most Common Restaurant,6th Most Common Restaurant,7th Most Common Restaurant,8th Most Common Restaurant,9th Most Common Restaurant,10th Most Common Restaurant,11th Most Common Restaurant,12th Most Common Restaurant,13th Most Common Restaurant,14th Most Common Restaurant,15th Most Common Restaurant
8,Flatbush,Caribbean Restaurant,Mexican Restaurant,Thai Restaurant,Restaurant,Halal Restaurant,Middle Eastern Restaurant,Tibetan Restaurant,Tapas Restaurant,Filipino Restaurant,Italian Restaurant,Austrian Restaurant,Vegetarian / Vegan Restaurant,New American Restaurant,Mediterranean Restaurant,Latin American Restaurant
9,Crown Heights,Caribbean Restaurant,Southern / Soul Food Restaurant,New American Restaurant,American Restaurant,Mexican Restaurant,Tapas Restaurant,Sushi Restaurant,Seafood Restaurant,Empanada Restaurant,Chinese Restaurant,Ethiopian Restaurant,African Restaurant,French Restaurant,Vegetarian / Vegan Restaurant,Italian Restaurant
14,Brownsville,Caribbean Restaurant,Seafood Restaurant,Fast Food Restaurant,Latin American Restaurant,Southern / Soul Food Restaurant,Spanish Restaurant,American Restaurant,Vegetarian / Vegan Restaurant,Chinese Restaurant,Empanada Restaurant,Ethiopian Restaurant,Falafel Restaurant,Vietnamese Restaurant,Dumpling Restaurant,Filipino Restaurant
17,Bedford Stuyvesant,New American Restaurant,Caribbean Restaurant,French Restaurant,African Restaurant,Cuban Restaurant,Ramen Restaurant,American Restaurant,Arepa Restaurant,Asian Restaurant,Chinese Restaurant,Comfort Food Restaurant,Filipino Restaurant,Italian Restaurant,Mexican Restaurant,Japanese Restaurant
32,Coney Island,Italian Restaurant,Mexican Restaurant,Caribbean Restaurant,Sushi Restaurant,Chinese Restaurant,Restaurant,Austrian Restaurant,Eastern European Restaurant,Indian Restaurant,Hotpot Restaurant,Halal Restaurant,Greek Restaurant,German Restaurant,French Restaurant,Filipino Restaurant
42,Prospect Lefferts Gardens,Caribbean Restaurant,American Restaurant,Italian Restaurant,Chinese Restaurant,Korean Restaurant,Empanada Restaurant,Ramen Restaurant,Restaurant,Falafel Restaurant,Southern / Soul Food Restaurant,Sushi Restaurant,Vegetarian / Vegan Restaurant,Tapas Restaurant,Thai Restaurant,Indian Restaurant
43,Ocean Hill,Southern / Soul Food Restaurant,Caribbean Restaurant,Latin American Restaurant,French Restaurant,Mexican Restaurant,Chinese Restaurant,Comfort Food Restaurant,Ramen Restaurant,Restaurant,Mediterranean Restaurant,African Restaurant,Spanish Restaurant,Tapas Restaurant,Vegetarian / Vegan Restaurant,Italian Restaurant
46,Midwood,Caribbean Restaurant,Halal Restaurant,Italian Restaurant,Mediterranean Restaurant,Mexican Restaurant,Middle Eastern Restaurant,Restaurant,Sushi Restaurant,Filipino Restaurant,New American Restaurant,Russian Restaurant,Japanese Restaurant,Tapas Restaurant,Tibetan Restaurant,Turkish Restaurant
47,Prospect Park South,Caribbean Restaurant,Thai Restaurant,Mexican Restaurant,Middle Eastern Restaurant,French Restaurant,Italian Restaurant,Latin American Restaurant,Chinese Restaurant,Restaurant,Korean Restaurant,Vegetarian / Vegan Restaurant,Austrian Restaurant,Sushi Restaurant,Indian Restaurant,Tapas Restaurant
48,Georgetown,Caribbean Restaurant,Italian Restaurant,American Restaurant,Mexican Restaurant,Fast Food Restaurant,Peruvian Restaurant,Chinese Restaurant,Japanese Restaurant,Restaurant,Empanada Restaurant,Indian Restaurant,Hotpot Restaurant,Halal Restaurant,Greek Restaurant,German Restaurant


Let's find the restaurant categories that occur most often in this cluster, to get an idea of which type of restaurant is most __suitable__ for it.

In [29]:
# Count how often each value occurs across the whole DataFrame (neighborhood names are counted too)
cluster1.stack().value_counts()

Caribbean Restaurant               15
Italian Restaurant                 14
Mexican Restaurant                 13
Chinese Restaurant                 13
Tapas Restaurant                   10
Restaurant                         10
Vegetarian / Vegan Restaurant       9
Sushi Restaurant                    8
Indian Restaurant                   7
Southern / Soul Food Restaurant     7
Thai Restaurant                     7
New American Restaurant             7
Empanada Restaurant                 7
Filipino Restaurant                 6
French Restaurant                   6
Latin American Restaurant           6
American Restaurant                 5
Mediterranean Restaurant            5
Seafood Restaurant                  5
Austrian Restaurant                 5
Ramen Restaurant                    5
Japanese Restaurant                 5
Korean Restaurant                   4
African Restaurant                  4
Halal Restaurant                    4
Tibetan Restaurant                  4
Middle Easte

In this cluster, __Caribbean__, __Italian__, __Chinese__ and __Mexican__ restaurants occur most often, so these are the preferred categories to open here. If _Jhon_ is interested in offering this kind of food, the __neighborhoods__ in this cluster are the perfect place.
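Note that `cluster1.stack()` stacks every column, including `Neighborhood`, so each neighborhood name also shows up once in the counts. A minimal sketch of a stricter count, which drops the name column first, could look like this (the toy DataFrame and the variable names are hypothetical):

```python
import pandas as pd

# Toy stand-in for a cluster DataFrame (hypothetical values)
toy_cluster = pd.DataFrame({
    'Neighborhood': ['A', 'B'],
    '1st Most Common Restaurant': ['Caribbean Restaurant', 'Caribbean Restaurant'],
    '2nd Most Common Restaurant': ['Italian Restaurant', 'Mexican Restaurant'],
})

# Drop the name column so that only restaurant categories are counted
category_counts = (
    toy_cluster.drop(columns='Neighborhood')
    .stack()
    .value_counts()
)
print(category_counts)
```

Assuming each row lists a category at most once (which the tables in this section suggest), a count is the number of neighborhoods in the cluster whose top 15 includes that category, not a number of venues.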

Let's display the neighborhoods in this cluster to make it easier for our friend _Jhon_ to make his __decision__.

In [30]:
# Neighborhoods that exist in this cluster
cluster1["Neighborhood"]

8                      Flatbush
9                 Crown Heights
14                  Brownsville
17           Bedford Stuyvesant
32                 Coney Island
42    Prospect Lefferts Gardens
43                   Ocean Hill
46                      Midwood
47          Prospect Park South
48                   Georgetown
54                  Ditmas Park
55                      Wingate
63                   Weeksville
64            Broadway Junction
69                      Erasmus
Name: Neighborhood, dtype: object

#### Cluster 2

In [31]:
# Create a DataFrame to display neighborhoods and restaurants that exist in Cluster 2
cluster2=pd.DataFrame(brooklyn_merged.loc[brooklyn_merged['Cluster Label'] == 1, brooklyn_merged.columns[[1] + list(range(5, brooklyn_merged.shape[1]))]])
cluster2 # Show the DataFrame

Unnamed: 0,Neighborhood,1st Most Common Restaurant,2nd Most Common Restaurant,3rd Most Common Restaurant,4th Most Common Restaurant,5th Most Common Restaurant,6th Most Common Restaurant,7th Most Common Restaurant,8th Most Common Restaurant,9th Most Common Restaurant,10th Most Common Restaurant,11th Most Common Restaurant,12th Most Common Restaurant,13th Most Common Restaurant,14th Most Common Restaurant,15th Most Common Restaurant
0,Bay Ridge,Italian Restaurant,Greek Restaurant,American Restaurant,Chinese Restaurant,Japanese Restaurant,Middle Eastern Restaurant,Restaurant,Caucasian Restaurant,Lebanese Restaurant,Mexican Restaurant,New American Restaurant,Halal Restaurant,Indian Restaurant,Asian Restaurant,Turkish Restaurant
2,Sunset Park,Mexican Restaurant,Latin American Restaurant,Chinese Restaurant,Vietnamese Restaurant,Dumpling Restaurant,Restaurant,Sushi Restaurant,Seafood Restaurant,Szechuan Restaurant,Cantonese Restaurant,Dim Sum Restaurant,Greek Restaurant,Hotpot Restaurant,Italian Restaurant,Vegetarian / Vegan Restaurant
3,Greenpoint,Italian Restaurant,Polish Restaurant,Japanese Restaurant,French Restaurant,New American Restaurant,Seafood Restaurant,Brazilian Restaurant,Chinese Restaurant,Jewish Restaurant,Mediterranean Restaurant,Mexican Restaurant,Moroccan Restaurant,Restaurant,Vietnamese Restaurant,Southern / Soul Food Restaurant
11,Kensington,Mexican Restaurant,Thai Restaurant,American Restaurant,Middle Eastern Restaurant,Caribbean Restaurant,Italian Restaurant,Vietnamese Restaurant,New American Restaurant,Asian Restaurant,Austrian Restaurant,Chinese Restaurant,Dumpling Restaurant,Filipino Restaurant,French Restaurant,Mediterranean Restaurant
12,Windsor Terrace,Italian Restaurant,American Restaurant,Mexican Restaurant,Thai Restaurant,Vietnamese Restaurant,Restaurant,Greek Restaurant,French Restaurant,Chinese Restaurant,Middle Eastern Restaurant,Eastern European Restaurant,Hotpot Restaurant,Halal Restaurant,German Restaurant,Filipino Restaurant
13,Prospect Heights,Sushi Restaurant,New American Restaurant,American Restaurant,Italian Restaurant,Mexican Restaurant,Ramen Restaurant,Vietnamese Restaurant,Restaurant,Cajun / Creole Restaurant,Caribbean Restaurant,Colombian Restaurant,Indian Restaurant,Latin American Restaurant,Middle Eastern Restaurant,Persian Restaurant
15,Williamsburg,Japanese Restaurant,American Restaurant,Chinese Restaurant,Italian Restaurant,New American Restaurant,Mediterranean Restaurant,Latin American Restaurant,Ramen Restaurant,Scandinavian Restaurant,Seafood Restaurant,South American Restaurant,Sushi Restaurant,Tapas Restaurant,French Restaurant,Arepa Restaurant
16,Bushwick,Mexican Restaurant,Cuban Restaurant,Italian Restaurant,French Restaurant,New American Restaurant,Caribbean Restaurant,Vegetarian / Vegan Restaurant,Latin American Restaurant,Mediterranean Restaurant,Ethiopian Restaurant,Peruvian Restaurant,Seafood Restaurant,Japanese Restaurant,Greek Restaurant,American Restaurant
18,Brooklyn Heights,American Restaurant,Italian Restaurant,Seafood Restaurant,Vietnamese Restaurant,Chinese Restaurant,French Restaurant,Vegetarian / Vegan Restaurant,Middle Eastern Restaurant,New American Restaurant,Japanese Restaurant,Sushi Restaurant,Thai Restaurant,Spanish Restaurant,Colombian Restaurant,Tibetan Restaurant
19,Cobble Hill,Italian Restaurant,Vietnamese Restaurant,American Restaurant,Seafood Restaurant,Chinese Restaurant,Dumpling Restaurant,French Restaurant,German Restaurant,Greek Restaurant,Vegetarian / Vegan Restaurant,Middle Eastern Restaurant,Japanese Restaurant,Thai Restaurant,Sushi Restaurant,Colombian Restaurant


Let's find the restaurant categories that occur most often in this cluster, to get an idea of which type of restaurant is most __suitable__ for it.

In [32]:
# Count how often each value occurs across the whole DataFrame (neighborhood names are counted too)
cluster2.stack().value_counts()

Italian Restaurant                 30
Japanese Restaurant                24
Chinese Restaurant                 23
American Restaurant                21
Seafood Restaurant                 21
Mexican Restaurant                 20
New American Restaurant            20
French Restaurant                  19
Middle Eastern Restaurant          17
Vietnamese Restaurant              14
Restaurant                         14
Thai Restaurant                    13
Greek Restaurant                   13
Sushi Restaurant                   13
Dumpling Restaurant                10
Latin American Restaurant           9
German Restaurant                   9
Caribbean Restaurant                8
Ethiopian Restaurant                7
Vegetarian / Vegan Restaurant       6
Cuban Restaurant                    6
Asian Restaurant                    6
Mediterranean Restaurant            6
Halal Restaurant                    6
Filipino Restaurant                 6
Southern / Soul Food Restaurant     6
Peruvian Res

In this cluster, in addition to Italian and Chinese, __Japanese__, __American__, __French__ and __Seafood__ restaurants occur very often. If _Jhon_ is interested in offering this kind of food, the __neighborhoods__ in this cluster are the perfect place.

Let's display the neighborhoods in this cluster to make it easier for our friend _Jhon_ to make his __decision__.

In [33]:
# Neighborhoods that exist in this cluster
cluster2["Neighborhood"]

0             Bay Ridge
2           Sunset Park
3            Greenpoint
11           Kensington
12      Windsor Terrace
13     Prospect Heights
15         Williamsburg
16             Bushwick
18     Brooklyn Heights
19          Cobble Hill
20      Carroll Gardens
21             Red Hook
22              Gowanus
23          Fort Greene
24           Park Slope
30          Mill Island
34         Borough Park
35        Dyker Heights
38         Clinton Hill
40             Downtown
41          Boerum Hill
45         Bergen Beach
49    East Williamsburg
50           North Side
51           South Side
53        Fort Hamilton
60           Mill Basin
61         Fulton Ferry
62         Vinegar Hill
65                Dumbo
Name: Neighborhood, dtype: object

#### Cluster 3

In [34]:
# Create a DataFrame to display neighborhoods and restaurants that exist in Cluster 3
cluster3=pd.DataFrame(brooklyn_merged.loc[brooklyn_merged['Cluster Label'] == 2, brooklyn_merged.columns[[1] + list(range(5, brooklyn_merged.shape[1]))]])
cluster3 # Show the DataFrame

Unnamed: 0,Neighborhood,1st Most Common Restaurant,2nd Most Common Restaurant,3rd Most Common Restaurant,4th Most Common Restaurant,5th Most Common Restaurant,6th Most Common Restaurant,7th Most Common Restaurant,8th Most Common Restaurant,9th Most Common Restaurant,10th Most Common Restaurant,11th Most Common Restaurant,12th Most Common Restaurant,13th Most Common Restaurant,14th Most Common Restaurant,15th Most Common Restaurant
10,East Flatbush,Caribbean Restaurant,Chinese Restaurant,Indian Restaurant,Fast Food Restaurant,Seafood Restaurant,Latin American Restaurant,Mexican Restaurant,Empanada Restaurant,Ethiopian Restaurant,Falafel Restaurant,Filipino Restaurant,Dumpling Restaurant,French Restaurant,German Restaurant,Greek Restaurant
27,Starrett City,Caribbean Restaurant,Fast Food Restaurant,American Restaurant,Chinese Restaurant,Vietnamese Restaurant,Filipino Restaurant,Empanada Restaurant,Ethiopian Restaurant,Falafel Restaurant,French Restaurant,Dumpling Restaurant,German Restaurant,Greek Restaurant,Halal Restaurant,Hotpot Restaurant
28,Canarsie,Caribbean Restaurant,American Restaurant,Fast Food Restaurant,Italian Restaurant,Mexican Restaurant,Vietnamese Restaurant,Filipino Restaurant,Empanada Restaurant,Ethiopian Restaurant,Falafel Restaurant,French Restaurant,Dumpling Restaurant,German Restaurant,Greek Restaurant,Halal Restaurant
29,Flatlands,Caribbean Restaurant,Restaurant,American Restaurant,Italian Restaurant,Chinese Restaurant,Indian Restaurant,Mexican Restaurant,Fast Food Restaurant,Japanese Restaurant,Cajun / Creole Restaurant,Ethiopian Restaurant,Arepa Restaurant,Hotpot Restaurant,Halal Restaurant,Greek Restaurant
56,Rugby,Caribbean Restaurant,Seafood Restaurant,Fast Food Restaurant,Chinese Restaurant,Japanese Restaurant,Spanish Restaurant,Indian Restaurant,Vegetarian / Vegan Restaurant,American Restaurant,Burmese Restaurant,Cajun / Creole Restaurant,Hotpot Restaurant,Halal Restaurant,Greek Restaurant,German Restaurant
57,Remsen Village,Caribbean Restaurant,Seafood Restaurant,Fast Food Restaurant,Spanish Restaurant,Indian Restaurant,Chinese Restaurant,Eastern European Restaurant,Empanada Restaurant,Ethiopian Restaurant,Falafel Restaurant,Vietnamese Restaurant,Dim Sum Restaurant,Filipino Restaurant,French Restaurant,German Restaurant
59,Paerdegat Basin,Caribbean Restaurant,American Restaurant,Italian Restaurant,Japanese Restaurant,Chinese Restaurant,Fast Food Restaurant,Mexican Restaurant,Peruvian Restaurant,German Restaurant,French Restaurant,Eastern European Restaurant,Filipino Restaurant,Greek Restaurant,Halal Restaurant,Hotpot Restaurant


Let's find the restaurant categories that occur most often in this cluster, to get an idea of which type of restaurant is most __suitable__ for it.

In [35]:
# Count how often each value occurs across the whole DataFrame (neighborhood names are counted too)
cluster3.stack().value_counts()

Fast Food Restaurant             7
Caribbean Restaurant             7
German Restaurant                6
Greek Restaurant                 6
Chinese Restaurant               6
Filipino Restaurant              5
American Restaurant              5
Halal Restaurant                 5
French Restaurant                5
Ethiopian Restaurant             5
Hotpot Restaurant                4
Falafel Restaurant               4
Empanada Restaurant              4
Mexican Restaurant               4
Indian Restaurant                4
Vietnamese Restaurant            3
Dumpling Restaurant              3
Italian Restaurant               3
Seafood Restaurant               3
Japanese Restaurant              3
Cajun / Creole Restaurant        2
Spanish Restaurant               2
Eastern European Restaurant      2
Restaurant                       1
Arepa Restaurant                 1
Peruvian Restaurant              1
Canarsie                         1
Vegetarian / Vegan Restaurant    1
Paerdegat Basin     

In this cluster, in addition to Caribbean and Chinese, __Fast Food__, __German__, __Greek__ and __Halal__ restaurants are an excellent idea. If _Jhon_ is interested in offering this kind of food, the __neighborhoods__ in this cluster are the perfect place.

Let's display the neighborhoods in this cluster to make it easier for our friend _Jhon_ to make his __decision__.

In [36]:
# Neighborhoods that exist in this cluster
cluster3["Neighborhood"]

10      East Flatbush
27      Starrett City
28           Canarsie
29          Flatlands
56              Rugby
57     Remsen Village
59    Paerdegat Basin
Name: Neighborhood, dtype: object

#### Cluster 4

In [37]:
# Create a DataFrame to display neighborhoods and restaurants that exist in Cluster 4
cluster4=pd.DataFrame(brooklyn_merged.loc[brooklyn_merged['Cluster Label'] == 3, brooklyn_merged.columns[[1] + list(range(5, brooklyn_merged.shape[1]))]])
cluster4 # Show the DataFrame

Unnamed: 0,Neighborhood,1st Most Common Restaurant,2nd Most Common Restaurant,3rd Most Common Restaurant,4th Most Common Restaurant,5th Most Common Restaurant,6th Most Common Restaurant,7th Most Common Restaurant,8th Most Common Restaurant,9th Most Common Restaurant,10th Most Common Restaurant,11th Most Common Restaurant,12th Most Common Restaurant,13th Most Common Restaurant,14th Most Common Restaurant,15th Most Common Restaurant
1,Bensonhurst,Italian Restaurant,Chinese Restaurant,Sushi Restaurant,Vietnamese Restaurant,Mexican Restaurant,Asian Restaurant,Burmese Restaurant,Cantonese Restaurant,Dim Sum Restaurant,German Restaurant,Greek Restaurant,Japanese Restaurant,Turkish Restaurant,South American Restaurant,Colombian Restaurant
4,Gravesend,Vietnamese Restaurant,Sushi Restaurant,Italian Restaurant,Chinese Restaurant,Turkish Restaurant,Mexican Restaurant,Restaurant,Russian Restaurant,Japanese Restaurant,Caucasian Restaurant,Falafel Restaurant,New American Restaurant,Eastern European Restaurant,Cantonese Restaurant,Peruvian Restaurant
5,Brighton Beach,Sushi Restaurant,Eastern European Restaurant,Restaurant,Russian Restaurant,Japanese Restaurant,Italian Restaurant,Turkish Restaurant,Mediterranean Restaurant,Seafood Restaurant,Indian Restaurant,Asian Restaurant,Burmese Restaurant,Argentinian Restaurant,Halal Restaurant,American Restaurant
6,Sheepshead Bay,Sushi Restaurant,Japanese Restaurant,Italian Restaurant,Restaurant,Russian Restaurant,Turkish Restaurant,Eastern European Restaurant,Seafood Restaurant,Mediterranean Restaurant,Asian Restaurant,Chinese Restaurant,Greek Restaurant,Indian Restaurant,Mexican Restaurant,New American Restaurant
7,Manhattan Terrace,Sushi Restaurant,Chinese Restaurant,Italian Restaurant,Russian Restaurant,Vietnamese Restaurant,American Restaurant,Restaurant,Japanese Restaurant,Halal Restaurant,Middle Eastern Restaurant,Falafel Restaurant,Burmese Restaurant,Turkish Restaurant,Mediterranean Restaurant,Mexican Restaurant
31,Manhattan Beach,Sushi Restaurant,Italian Restaurant,Restaurant,Russian Restaurant,Seafood Restaurant,Turkish Restaurant,Eastern European Restaurant,Japanese Restaurant,Mediterranean Restaurant,Greek Restaurant,Middle Eastern Restaurant,Indian Restaurant,Asian Restaurant,Cantonese Restaurant,American Restaurant
33,Bath Beach,Italian Restaurant,Sushi Restaurant,Vietnamese Restaurant,Chinese Restaurant,Mexican Restaurant,Cantonese Restaurant,Caucasian Restaurant,Dim Sum Restaurant,Fast Food Restaurant,German Restaurant,Japanese Restaurant,Turkish Restaurant,Southern / Soul Food Restaurant,Colombian Restaurant,French Restaurant
36,Gerritsen Beach,Italian Restaurant,Sushi Restaurant,Seafood Restaurant,Japanese Restaurant,Turkish Restaurant,Russian Restaurant,American Restaurant,Greek Restaurant,Restaurant,Mediterranean Restaurant,Dumpling Restaurant,Eastern European Restaurant,Empanada Restaurant,Fast Food Restaurant,Ethiopian Restaurant
37,Marine Park,Italian Restaurant,Japanese Restaurant,American Restaurant,Sushi Restaurant,Restaurant,Mediterranean Restaurant,Peruvian Restaurant,Mexican Restaurant,Russian Restaurant,Chinese Restaurant,Seafood Restaurant,Caribbean Restaurant,Empanada Restaurant,Ethiopian Restaurant,Dumpling Restaurant
52,Ocean Parkway,Italian Restaurant,Chinese Restaurant,Sushi Restaurant,Japanese Restaurant,Russian Restaurant,Mexican Restaurant,Burmese Restaurant,Falafel Restaurant,Halal Restaurant,Mediterranean Restaurant,Middle Eastern Restaurant,Vietnamese Restaurant,Turkish Restaurant,Colombian Restaurant,French Restaurant


Let's find the restaurant categories that occur most often in this cluster, to get an idea of which type of restaurant is most __suitable__ for it.

In [38]:
# Count how often each value occurs across the whole DataFrame (neighborhood names are counted too)
cluster4.stack().value_counts()

Sushi Restaurant                   12
Italian Restaurant                 12
Japanese Restaurant                12
Turkish Restaurant                 11
Russian Restaurant                 10
Chinese Restaurant                  9
Mexican Restaurant                  9
Mediterranean Restaurant            9
Restaurant                          9
Vietnamese Restaurant               7
Seafood Restaurant                  7
Eastern European Restaurant         6
American Restaurant                 6
Asian Restaurant                    5
Halal Restaurant                    5
Greek Restaurant                    4
Burmese Restaurant                  4
Middle Eastern Restaurant           4
Cantonese Restaurant                4
New American Restaurant             4
Colombian Restaurant                3
Falafel Restaurant                  3
Indian Restaurant                   3
Ethiopian Restaurant                2
Fast Food Restaurant                2
German Restaurant                   2
French Resta

In this cluster, in addition to Italian, Mexican and Japanese, __Sushi__, __Vietnamese__ and __Turkish__ restaurants also occur very often. If _Jhon_ is interested in offering this kind of food, the __neighborhoods__ in this cluster are the perfect place.

Let's display the neighborhoods in this cluster to make it easier for our friend _Jhon_ to make his __decision__.

In [39]:
# Neighborhoods that exist in this cluster
cluster4["Neighborhood"]

1           Bensonhurst
4             Gravesend
5        Brighton Beach
6        Sheepshead Bay
7     Manhattan Terrace
31      Manhattan Beach
33           Bath Beach
36      Gerritsen Beach
37          Marine Park
52        Ocean Parkway
66            Homecrest
68              Madison
Name: Neighborhood, dtype: object

#### Cluster 5

In [40]:
# Create a DataFrame to display neighborhoods and restaurants that exist in Cluster 5
cluster5=pd.DataFrame(brooklyn_merged.loc[brooklyn_merged['Cluster Label'] == 4, brooklyn_merged.columns[[1] + list(range(5, brooklyn_merged.shape[1]))]])
cluster5 # Show the DataFrame

Unnamed: 0,Neighborhood,1st Most Common Restaurant,2nd Most Common Restaurant,3rd Most Common Restaurant,4th Most Common Restaurant,5th Most Common Restaurant,6th Most Common Restaurant,7th Most Common Restaurant,8th Most Common Restaurant,9th Most Common Restaurant,10th Most Common Restaurant,11th Most Common Restaurant,12th Most Common Restaurant,13th Most Common Restaurant,14th Most Common Restaurant,15th Most Common Restaurant
25,Cypress Hills,Latin American Restaurant,Fast Food Restaurant,South American Restaurant,African Restaurant,Cuban Restaurant,Chinese Restaurant,Caribbean Restaurant,Restaurant,American Restaurant,Thai Restaurant,Brazilian Restaurant,Eastern European Restaurant,Hotpot Restaurant,Halal Restaurant,Greek Restaurant
26,East New York,Latin American Restaurant,Fast Food Restaurant,Caribbean Restaurant,Cuban Restaurant,Chinese Restaurant,South American Restaurant,Empanada Restaurant,Ethiopian Restaurant,Falafel Restaurant,Filipino Restaurant,Dumpling Restaurant,French Restaurant,German Restaurant,Greek Restaurant,Halal Restaurant
39,Sea Gate,Fast Food Restaurant,Chinese Restaurant,Falafel Restaurant,Mexican Restaurant,Vietnamese Restaurant,Filipino Restaurant,Eastern European Restaurant,Empanada Restaurant,Ethiopian Restaurant,French Restaurant,Dim Sum Restaurant,German Restaurant,Greek Restaurant,Halal Restaurant,Hotpot Restaurant
44,City Line,Fast Food Restaurant,Latin American Restaurant,South American Restaurant,Chinese Restaurant,Japanese Restaurant,Caribbean Restaurant,Cuban Restaurant,Restaurant,African Restaurant,Sushi Restaurant,Halal Restaurant,German Restaurant,French Restaurant,Filipino Restaurant,Thai Restaurant
58,New Lots,Fast Food Restaurant,Caribbean Restaurant,Latin American Restaurant,American Restaurant,Chinese Restaurant,Vietnamese Restaurant,Filipino Restaurant,Empanada Restaurant,Ethiopian Restaurant,Falafel Restaurant,French Restaurant,Dumpling Restaurant,German Restaurant,Greek Restaurant,Halal Restaurant
67,Highland Park,Latin American Restaurant,Chinese Restaurant,Fast Food Restaurant,Restaurant,Indian Restaurant,South American Restaurant,Southern / Soul Food Restaurant,Mexican Restaurant,American Restaurant,Filipino Restaurant,French Restaurant,Dim Sum Restaurant,German Restaurant,Greek Restaurant,Falafel Restaurant


Let's find the restaurant categories that occur most often in this cluster, to get an idea of which type of restaurant is most __suitable__ for it.

In [41]:
# Count how often each value occurs across the whole DataFrame (neighborhood names are counted too)
cluster5.stack().value_counts()

Fast Food Restaurant               6
Chinese Restaurant                 6
Halal Restaurant                   5
German Restaurant                  5
Filipino Restaurant                5
Greek Restaurant                   5
Latin American Restaurant          5
French Restaurant                  5
South American Restaurant          4
Caribbean Restaurant               4
Falafel Restaurant                 4
American Restaurant                3
Empanada Restaurant                3
Restaurant                         3
Ethiopian Restaurant               3
Cuban Restaurant                   3
Eastern European Restaurant        2
Dumpling Restaurant                2
Vietnamese Restaurant              2
African Restaurant                 2
Dim Sum Restaurant                 2
Thai Restaurant                    2
Mexican Restaurant                 2
Hotpot Restaurant                  2
New Lots                           1
Indian Restaurant                  1
Sea Gate                           1
S

Finally, in this cluster, __Latin American__, __Filipino__ and __French__ restaurants would be a great idea. If _Jhon_ is interested in offering this kind of food, the __neighborhoods__ in this cluster are the perfect place.

Let's display the neighborhoods in this cluster to make it easier for our friend _Jhon_ to make his __decision__.

In [42]:
# Neighborhoods that exist in this cluster
cluster5["Neighborhood"]

25    Cypress Hills
26    East New York
39         Sea Gate
44        City Line
58         New Lots
67    Highland Park
Name: Neighborhood, dtype: object

## 4. Conclusion

The main goal of this project was to help our friend _Jhon_, a __restaurant chain owner__, open a restaurant in __Brooklyn__. To do so, we used both the __New York City__ neighborhood data and __Foursquare API__ location data. By __segmenting__ Brooklyn's neighborhoods and __analyzing__ each of them with a clustering algorithm, we grouped Brooklyn into __5 main clusters__. These 5 clusters serve as a __decision-support tool__: each cluster is characterized by specific types of restaurants, and for each one we have listed the __neighborhoods__ it contains and the __types of services__ their restaurants offer. All that is left for _Jhon_ is to choose the type of restaurant he wants to open; based on that choice, the __most suitable__ neighborhoods can be suggested.
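As an illustration of how such a suggestion could be looked up, here is a minimal sketch. It is not part of the analysis above: the function name `rank_clusters` is hypothetical, and ranking by the share of neighborhoods (rather than by raw counts) is an assumption made here because the clusters differ in size. The counts are copied from the `value_counts()` outputs in section 3.5, restricted to two categories whose counts are visible for all five clusters.

```python
# Counts copied from the value_counts() outputs in section 3.5
category_counts = {
    'Chinese Restaurant': {'Cluster 1': 13, 'Cluster 2': 23, 'Cluster 3': 6, 'Cluster 4': 9, 'Cluster 5': 6},
    'Mexican Restaurant': {'Cluster 1': 13, 'Cluster 2': 20, 'Cluster 3': 4, 'Cluster 4': 9, 'Cluster 5': 2},
}

# Number of neighborhoods listed for each cluster in section 3.5
cluster_sizes = {'Cluster 1': 15, 'Cluster 2': 30, 'Cluster 3': 7, 'Cluster 4': 12, 'Cluster 5': 6}

def rank_clusters(category):
    """Rank clusters by the share of their neighborhoods that list `category` in the top 15."""
    counts = category_counts[category]
    shares = {name: counts[name] / cluster_sizes[name] for name in counts}
    # Highest share first
    return sorted(shares.items(), key=lambda item: item[1], reverse=True)

for name, share in rank_clusters('Mexican Restaurant'):
    print(f'{name}: {share:.2f}')
```

The neighborhoods of the top-ranked cluster can then be read from the corresponding `clusterN["Neighborhood"]` listing.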