<h1><center>The Battle of the Neighborhoods</center></h1>
<h3><center>Applied Data Science Capstone by IBM/Coursera</center></h3>

## Table of contents
* [1. Introduction: Business Problem](#introduction)
* [2. Data: Data Sources](#data)
* [3. Methodology](#methodology)
* [4. Discussion and Recommendation](#discussion)
* [5. Conclusion](#conclusion)

## 1. Introduction: Business Problem <a name="introduction"></a>

The business problem is to capitalize on the increasing demand for, and interest in, Chinese cuisine by opening a Chinese restaurant in London. The first thing to decide when opening a new restaurant is its location, so the purpose of this project is to determine which neighborhood would be the best place to open a new Chinese restaurant. To do so, we analyze the demographic data of the London boroughs, explore the venues near each neighborhood, and perform clustering on the neighborhoods. This project will be useful to anyone looking to open a Chinese restaurant in London.

## 2. Data: Data Sources <a name="data"></a>

* Demographic data to understand the Chinese population in all boroughs from the **Wikipedia page – Ethnic Groups in London**.
* List of neighborhoods and postal codes for the boroughs from the **Wikipedia page – List of Areas of London**.
* List of venues in each neighborhood from the **Foursquare API**.

<h3>2.1 Installing Required Libraries</h3>

In [1]:
!pip -q install geopy

!pip -q install geocoder

!pip install pgeocode

!pip -q install folium
print('Installation Completed')

Installation Completed


<h3> 2.2 Importing Required Libraries</h3>

In [2]:
# Library for BeautifulSoup, for web scraping
from bs4 import BeautifulSoup

# Library to handle data in a vectorized manner
import numpy as np

# Library for data analysis
import pandas as pd
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

# Library to handle JSON files
import json

# Convert an address into latitude and longitude values
from geopy.geocoders import Nominatim

# Library to handle requests
import requests

# Transform JSON file into a pandas dataframe (pandas.io.json.json_normalize is deprecated)
from pandas import json_normalize

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# Import k-means from clustering stage
from sklearn.cluster import KMeans

# Import the Geocoder
import geocoder

# Import pgeocode
import pgeocode

# Map rendering library
import folium

# Import matplotlib and seaborn for visualisation
import matplotlib.pyplot as plt
import seaborn as sns

print('Libraries Imported')

Libraries Imported


<h3>2.3 Getting Demographic Data for Ethnic Groups in London</h3>

To determine which neighborhood in London would be the best place to open a new Chinese restaurant, we assume that the Chinese population makes up the majority of the market for Chinese cuisine. In reality, people of other ethnicities also frequent Chinese restaurants. However, since market data for Chinese cuisine is not freely available on the internet, we make this assumption for the purposes of this project.

In [3]:
# Submitting a GET request and parsing the response with BeautifulSoup
url = 'https://en.wikipedia.org/wiki/Ethnic_groups_in_London#:~:text=At%20the%202011%20census%2C%20London,24.5%25%20born%20outside%20of%20Europe.'
html_data  = requests.get(url).text
soup = BeautifulSoup(html_data, "html5lib")

In [4]:
tables = soup.find_all('table', {'class':'wikitable'})

In [5]:
asian_pop = tables[6].tbody

In [6]:
# Define the column names for a new dataframe
columns = ['London Borough', 'Chinese Population']

# Collect the borough and Chinese population from each row of the html table
records = []
for row in asian_pop.find_all('tr')[1:]: # The first row is ignored as it only contains headers
    col = row.find_all("td")
    borough = col[1].text
    chinese = col[5].text.replace(',','')
    records.append({"London Borough":borough, "Chinese Population":chinese})

# Build the dataframe chinese_pop from the collected records (DataFrame.append was removed in pandas 2.0)
chinese_pop = pd.DataFrame(records, columns = columns)

In [7]:
# Examine the first 5 rows of the dataframe
chinese_pop.head()

Unnamed: 0,London Borough,Chinese Population
0,Newham,3930
1,Redbridge,3000
2,Brent,3250
3,Tower Hamlets,8109
4,Harrow,2629


Now that we have our data, we want to sort the dataframe by 'Chinese Population' from the borough with the highest Chinese population to the lowest.

In [8]:
# Converting the values to type int so we can sort by population
chinese_pop['Chinese Population'] = chinese_pop['Chinese Population'].astype('int')

# Sorting the Chinese Population column in descending order 
chinese_pop_sort = chinese_pop.sort_values(by='Chinese Population', ascending = False).reset_index(drop = True)

# Returning the top 5 boroughs with the highest Chinese population
chinese_pop_sort.head()

Unnamed: 0,London Borough,Chinese Population
0,Barnet,8259
1,Tower Hamlets,8109
2,Southwark,8074
3,Camden,6493
4,Westminster,5917
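
The convert-then-sort step can also be sketched with `pd.to_numeric` and `nlargest`. This is only an illustrative variant, not the notebook's own method: the boroughs and counts are the five rows shown in the first table above, and writing them as text with thousands separators is an assumption made to mimic scraped cells.

```python
import pandas as pd

# Boroughs and counts from the first table above; the counts are written as text
# with thousands separators to mimic scraped cells (an assumption for this sketch)
raw = pd.DataFrame({
    'London Borough': ['Newham', 'Redbridge', 'Brent', 'Tower Hamlets', 'Harrow'],
    'Chinese Population': ['3,930', '3,000', '3,250', '8,109', '2,629'],
})

# Remove the separators, then convert; unparseable cells become NaN instead of raising
cleaned = raw['Chinese Population'].str.replace(',', '', regex=False)
raw['Chinese Population'] = pd.to_numeric(cleaned, errors='coerce')

# Keep the three rows with the largest values
top3 = raw.nlargest(3, 'Chinese Population').reset_index(drop=True)
print(top3)
```

Using `errors='coerce'` trades a hard failure for missing values, which may be preferable when a scraped cell contains a footnote marker or other stray text.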


<h3>2.4 Getting Neighborhood and Postal Code Data for the 5 Boroughs in London</h3>

Now that we have identified the 5 boroughs that we will be focusing on, we want to get a list of the neighborhoods and postal codes in those boroughs.

In [9]:
# Submitting a GET request and parsing the response with BeautifulSoup
url = 'https://en.wikipedia.org/wiki/List_of_areas_of_London'
html_data  = requests.get(url).text
soup = BeautifulSoup(html_data, "html5lib")

In [10]:
# This extracts the "tbody" within the table where class is "wikitable sortable"
table = soup.find('table', {'class':'wikitable sortable'}).tbody

# Extracts all table rows within the table above
rows = table.find_all('tr')

# Define the column names for a new dataframe
columns = ['Neighborhood', 'Borough', 'PostTown', 'PostCode']

# Collect the neighborhood, borough, posttown and postal code data from each row of the html table
records = []
for row in rows[1:]: # The first row is ignored as it only contains headers
    col = row.find_all("td")
    location = col[0].text
    borough = col[1].text.rstrip(']').rstrip('0123456789').rstrip('[') # Strip a trailing footnote marker such as [1]
    posttown = col[2].text
    postcode = col[3].text
    records.append({"Neighborhood":location, "Borough":borough, 'PostTown':posttown, 'PostCode':postcode})

# Build the dataframe london from the collected records (DataFrame.append was removed in pandas 2.0)
london = pd.DataFrame(records, columns = columns)

In [11]:
# Examine the first 5 rows of the dataframe
london.head()

Unnamed: 0,Neighborhood,Borough,PostTown,PostCode
0,Abbey Wood,"Bexley, Greenwich",LONDON,SE2
1,Acton,"Ealing, Hammersmith and Fulham",LONDON,"W3, W4"
2,Addington,Croydon,CROYDON,CR0
3,Addiscombe,Croydon,CROYDON,CR0
4,Albany Park,Bexley,"BEXLEY, SIDCUP","DA5, DA14"


In the <code>london</code> dataframe, we note that some neighborhoods are listed with more than one postcode. Since we look up coordinates one postcode at a time, we split these entries so that each row holds a single postcode; a neighborhood with several postcodes therefore appears on several rows.

In [12]:
# Splitting the postcode values on ',' so that each postcode gets its own row
london = london.drop('PostCode', axis=1).join(london['PostCode'].str.split(',', expand=True).stack().reset_index(level=1, drop=True).rename('PostCode'))

In [13]:
# Examine the first 5 rows of the dataframe
london.head()

Unnamed: 0,Neighborhood,Borough,PostTown,PostCode
0,Abbey Wood,"Bexley, Greenwich",LONDON,SE2
1,Acton,"Ealing, Hammersmith and Fulham",LONDON,W3
1,Acton,"Ealing, Hammersmith and Fulham",LONDON,W4
2,Addington,Croydon,CROYDON,CR0
3,Addiscombe,Croydon,CROYDON,CR0


Now that each row holds a single postal code, we want to filter the dataframe for PostTown 'LONDON' and for the 5 boroughs.

In [14]:
# Filtering the data to include only neighborhoods in London
london = london[london['PostTown'].str.contains('LONDON')]

# Dropping the PostTown column after we have filtered the dataframe
london = london.drop(['PostTown'], axis=1)

# Filtering the data to include only the 5 boroughs
london = london[london['Borough'].str.contains('Barnet|Tower Hamlets|Southwark|Camden|Westminster')].reset_index(drop = True)

In [15]:
# Examine the first 10 rows of the dataframe
london.head(10)

Unnamed: 0,Neighborhood,Borough,PostCode
0,Aldwych,Westminster,WC2
1,Arkley,Barnet,EN5
2,Arkley,Barnet,NW7
3,Bankside,Southwark,SE1
4,Barnet Gate,Barnet,NW7
5,Barnet Gate,Barnet,EN5
6,Bayswater,Westminster,W2
7,Belgravia,Westminster,SW1
8,Belsize Park,Camden,NW3
9,Bermondsey,Southwark,SE1


Although each row now holds a single postcode, we can see from the dataframe above that some postcodes still repeat across rows. The latitude and longitude values for a repeated postcode will be the same, so multiple rows with the same postcode add no value to our analysis and are dropped.

In [16]:
# Drop duplicated postcodes
london.drop_duplicates(subset ="PostCode", keep='first', inplace = True)

In [17]:
# Check the final number of rows of the dataframe
london.shape

(54, 3)
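
One thing worth checking at this point: splitting a string such as 'EN5, NW7' on ',' alone leaves a leading space on the second value, so 'NW7' and ' NW7' may be treated as different postcodes when duplicates are dropped. The sketch below illustrates this on the Arkley and Barnet Gate rows from the table above. The unsplit strings are reconstructed for illustration, and the `strip` step is an assumption about how one might handle the whitespace, not part of the original notebook.

```python
import pandas as pd

# Arkley and Barnet Gate with their postcodes as one comma-separated string each
# (reconstructed from the split rows shown above)
sample = pd.DataFrame({
    'Neighborhood': ['Arkley', 'Barnet Gate'],
    'Borough': ['Barnet', 'Barnet'],
    'PostCode': ['EN5, NW7', 'NW7, EN5'],
})

# One row per postcode; splitting on ',' alone keeps the space after each comma
raw = (
    sample.assign(PostCode=sample['PostCode'].str.split(','))
    .explode('PostCode')
    .reset_index(drop=True)
)

# Same rows, with surrounding whitespace removed from every postcode
clean = raw.assign(PostCode=raw['PostCode'].str.strip())

raw_unique = raw.drop_duplicates(subset='PostCode', keep='first')
clean_unique = clean.drop_duplicates(subset='PostCode', keep='first')

print(len(raw_unique), len(clean_unique))
```

With the whitespace stripped, only one row per distinct postcode remains; without it, all four rows survive the deduplication.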

<h3>2.5 Getting the Latitude and Longitude for each PostCode</h3>

Now that we have the names of the neighborhoods and their respective postal codes, we want to get the latitude and longitude values as well. To do so, we will be using the Python library <code>pgeocode</code>.

In [18]:
# Extracting the PostCode column from our london dataframe
postal_codes = london['PostCode']

# Define the column names for a new dataframe
columns = ['PostCode', 'Latitude', 'Longitude']

# Create the geocoder once instead of once per postcode
nomi = pgeocode.Nominatim('gb')

# Iterate through the postcodes and collect the latitude and longitude values
records = []
for postal_code in postal_codes:
    coordinate = nomi.query_postal_code(postal_code)
    lat = coordinate['latitude']
    lng = coordinate['longitude']
    records.append({"PostCode":postal_code, "Latitude":lat, 'Longitude':lng})

# Build the dataframe coordinates from the collected records (DataFrame.append was removed in pandas 2.0)
coordinates = pd.DataFrame(records, columns = columns)

# Examine the first 5 rows of the dataframe
coordinates.head()

Unnamed: 0,PostCode,Latitude,Longitude
0,WC2,51.5142,-0.123382
1,EN5,51.6562,-0.194317
2,NW7,51.6143,-0.2273
3,SE1,51.4963,-0.093038
4,NW7,51.6143,-0.2273


We now have 2 dataframes: <code>london</code>, containing the names of the neighborhoods and boroughs, and <code>coordinates</code>, containing the coordinates for these neighborhoods. We want to merge the two dataframes to get the main dataframe for our analysis.

In [19]:
# Perform a left join on london and coordinates dataframe
london = pd.merge(london, coordinates,on='PostCode', how='left')

# Examine the first 5 rows of the dataframe
london.head()

Unnamed: 0,Neighborhood,Borough,PostCode,Latitude,Longitude
0,Aldwych,Westminster,WC2,51.5142,-0.123382
1,Arkley,Barnet,EN5,51.6562,-0.194317
2,Arkley,Barnet,NW7,51.6143,-0.2273
3,Bankside,Southwark,SE1,51.4963,-0.093038
4,Barnet Gate,Barnet,NW7,51.6143,-0.2273


In [20]:
# Check the shape of the dataframe london to ensure no information has been dropped or lost during the process
london.shape

(54, 5)
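
Comparing shapes is one way to confirm that the join neither lost nor multiplied rows. As a complementary sketch, `pd.merge` can perform similar checks itself through its `validate` and `indicator` parameters. The three rows below are copied from the tables above, and the variable names are made up for this example.

```python
import pandas as pd

# Rows and coordinates copied from the tables above
london_sample = pd.DataFrame({
    'Neighborhood': ['Aldwych', 'Arkley', 'Bankside'],
    'Borough': ['Westminster', 'Barnet', 'Southwark'],
    'PostCode': ['WC2', 'EN5', 'SE1'],
})
coordinates_sample = pd.DataFrame({
    'PostCode': ['WC2', 'EN5', 'SE1'],
    'Latitude': [51.5142, 51.6562, 51.4963],
    'Longitude': [-0.123382, -0.194317, -0.093038],
})

merged = pd.merge(
    london_sample,
    coordinates_sample,
    on='PostCode',
    how='left',
    validate='many_to_one',  # raise if a postcode appears more than once on the right
    indicator=True,          # add a _merge column saying where each row came from
)

# Rows without a matching postcode on the right would be marked 'left_only'
unmatched = merged[merged['_merge'] != 'both']
print(merged.shape, len(unmatched))
```

If the right-hand table contained the same postcode twice, `validate='many_to_one'` would raise an error instead of silently duplicating rows.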

<h3>2.6 Defining the Foursquare API Data</h3>

Finally, we define the credentials used to access the Foursquare API, along with the limit and radius of our calls.

In [21]:
CLIENT_ID = 'W33WOM3RWJHP511UIYKJKJISIWDN5KAOWU3N2V3ECGQGYIYP'
CLIENT_SECRET = 'P22DORR32XWLTRDAZQJ2T24N1YUVPM3DLV3G0UR4RNJAULGP'
VERSION = '20180605' # Foursquare API version
LIMIT = 100 # A default Foursquare API limit value
radius = 2000 # A default radius value

print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)

Your credentials:
CLIENT_ID: W33WOM3RWJHP511UIYKJKJISIWDN5KAOWU3N2V3ECGQGYIYP
CLIENT_SECRET:P22DORR32XWLTRDAZQJ2T24N1YUVPM3DLV3G0UR4RNJAULGP
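
Hardcoding and printing API credentials exposes them to anyone who can read the notebook. A minimal sketch of one alternative is to read them from environment variables and mask them when printing. The variable names `FOURSQUARE_CLIENT_ID` and `FOURSQUARE_CLIENT_SECRET` and both helper functions are hypothetical, chosen only for this example.

```python
import os


def load_foursquare_credentials(env=os.environ):
    """Read the Foursquare credentials from a mapping of environment variables."""
    # FOURSQUARE_CLIENT_ID / FOURSQUARE_CLIENT_SECRET are hypothetical variable names
    client_id = env.get('FOURSQUARE_CLIENT_ID')
    client_secret = env.get('FOURSQUARE_CLIENT_SECRET')
    if not client_id or not client_secret:
        raise RuntimeError('Foursquare credentials are not set in the environment')
    return client_id, client_secret


def mask(secret, visible=4):
    """Show only the first few characters of a secret when printing it."""
    return secret[:visible] + '*' * max(len(secret) - visible, 0)


# Example with a stand-in mapping instead of the real environment
demo_env = {
    'FOURSQUARE_CLIENT_ID': 'demo-id-123',
    'FOURSQUARE_CLIENT_SECRET': 'demo-secret-456',
}
demo_id, demo_secret = load_foursquare_credentials(demo_env)
print('CLIENT_ID: ' + mask(demo_id))
print('CLIENT_SECRET: ' + mask(demo_secret))
```

In a real session the function would be called without arguments, so that it reads from `os.environ`.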


## 3. Methodology <a name="methodology"></a>

We will be performing 3 main analyses:
* Exploring a sample neighborhood
* Exploring all the neighborhoods in the 5 boroughs
* Clustering the neighborhoods

<h3>3.1 Exploring a Sample Neighborhood</h3>

We start with an initial exploration of a sample neighborhood, Arkley, Barnet, to determine the workability of the Foursquare API data. First, let us check how many venues there are in Arkley.

In [22]:
# Extracting the coordinates of the sample neighborhood
barnet_lat = london.loc[1, 'Latitude']
barnet_lng = london.loc[1, 'Longitude']
barnet_name = london.loc[1, 'Neighborhood']
print('The geographical coordinates of {}, Barnet are {}, {}.'.format(barnet_name, barnet_lat, barnet_lng))

The geographical coordinates of Arkley, Barnet are 51.6562, -0.19431666666666667.


In [23]:
# Define the URL to explore the nearby venues in the sample neighborhood
url = 'https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&ll={},{}&v={}&radius={}&limit={}'.format(CLIENT_ID, CLIENT_SECRET, barnet_lat, barnet_lng, VERSION, radius, LIMIT)
print(url)

https://api.foursquare.com/v2/venues/explore?client_id=W33WOM3RWJHP511UIYKJKJISIWDN5KAOWU3N2V3ECGQGYIYP&client_secret=P22DORR32XWLTRDAZQJ2T24N1YUVPM3DLV3G0UR4RNJAULGP&ll=51.6562,-0.19431666666666667&v=20180605&radius=2000&limit=100
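
To make the structure of this request URL easier to read, the same query string can be assembled from named parameters with the standard library. This is a sketch, not the notebook's code: `build_explore_url` is a hypothetical helper and the credentials are placeholders, while the endpoint and parameter names are those visible in the URL above.

```python
from urllib.parse import urlencode

EXPLORE_ENDPOINT = 'https://api.foursquare.com/v2/venues/explore'


def build_explore_url(client_id, client_secret, lat, lng, version, radius, limit):
    """Assemble the explore URL from named query parameters."""
    params = {
        'client_id': client_id,
        'client_secret': client_secret,
        'll': '{},{}'.format(lat, lng),  # latitude,longitude of the search centre
        'v': version,                     # API version date, YYYYMMDD
        'radius': radius,                 # search radius in metres
        'limit': limit,                   # maximum number of venues to return
    }
    # safe=',' keeps the comma in ll readable instead of encoding it as %2C
    return EXPLORE_ENDPOINT + '?' + urlencode(params, safe=',')


# Placeholder credentials; the coordinates are those of the sample neighborhood
demo_url = build_explore_url(
    'MY_ID', 'MY_SECRET', 51.6562, -0.19431666666666667, '20180605', 2000, 100
)
print(demo_url)
```

Keeping the parameters in a dictionary makes it harder to swap two positional arguments by accident, which is easy to do with a long `format` call.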


In [24]:
# Send GET request and examine the results
results = requests.get(url).json()
'There are {} nearby venues in {}, Barnet.'.format(len(results['response']['groups'][0]['items']), barnet_name)

'There are 44 nearby venues in Arkley, Barnet.'

OK, we know that there are **44** venues near Arkley, Barnet. Let us drill down into these venues and see what types of venues they are.

In [25]:
# Get the relevant part of the JSON file
items = results['response']['groups'][0]['items']

In [26]:
# Function that extracts the category of the venue
def get_category_type(row):
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

In [27]:
# Process JSON and convert it to a clean dataframe
dataframe = json_normalize(items) # flatten JSON

# Filter columns
filtered_columns = ['venue.name', 'venue.categories'] + [col for col in dataframe.columns if col.startswith('venue.location.')] + ['venue.id']

arkley_venues = dataframe.loc[:, filtered_columns]

# Filter the category for each row
arkley_venues['venue.categories'] = arkley_venues.apply(get_category_type, axis=1)

# Clean columns
arkley_venues.columns = [col.split('.')[-1] for col in arkley_venues.columns]

# Examine the first 5 rows of the dataframe
arkley_venues.head()


Unnamed: 0,name,categories,address,lat,lng,labeledLatLngs,distance,postalCode,cc,city,state,country,formattedAddress,neighborhood,crossStreet,id
0,Ye Old Mitre Inne,Pub,58 High Street,51.65294,-0.199507,"[{'label': 'display', 'lat': 51.65293985597287...",510,EN5 5SJ,GB,Hertfordshire,Hertfordshire,United Kingdom,"[58 High Street, Hertfordshire, EN5 5SJ, Unite...",,,4b995bccf964a5209f7535e3
1,The Black Horse,Pub,Wood St,51.653075,-0.206719,"[{'label': 'display', 'lat': 51.65307467634626...",924,EN5 4BW,GB,London,Greater London,United Kingdom,"[Wood St, London, Greater London, EN5 4BW, Uni...",High Barnet,,4bc1e42eabf49521c690c193
2,Everyman Cinema,Movie Theater,Great North Rd,51.646793,-0.187675,"[{'label': 'display', 'lat': 51.64679349064748...",1143,EN5 1AB,GB,Barnet,Greater London,United Kingdom,"[Great North Rd, Barnet, Greater London, EN5 1...",,,55bfd6a3498ecb12ed3241e3
3,Joie de Vie,Bakery,,51.653659,-0.201288,"[{'label': 'display', 'lat': 51.653659, 'lng':...",558,,GB,,,United Kingdom,[United Kingdom],,,55f6c258498e7a5e6b9bda02
4,Caffè Nero,Coffee Shop,128 High St,51.654861,-0.201743,"[{'label': 'display', 'lat': 51.65486135090324...",534,EN5 5XQ,GB,Barnet,Greater London,United Kingdom,"[128 High St, Barnet, Greater London, EN5 5XQ,...",,,56aa3b9038facb6e8b642f7d


We now have a dataframe of all 44 venues and their respective categories. For our purposes, however, we are only interested in venues labeled 'Chinese Restaurant'. So, let us write a block of code that goes through our dataframe and prints any rows where the column 'categories' includes the word 'Chinese'.

In [28]:
# Select the venues whose category contains 'Chinese' (a missing category counts as no match) and print each row
chinese_venues = arkley_venues[arkley_venues['categories'].str.contains('Chinese', na=False)]
for _, venue in chinese_venues.iterrows():
    print(venue)

Interestingly, no results were returned. This means that although the borough of Barnet has one of the highest Chinese populations in London, Foursquare lists no Chinese restaurants among the venues returned for the Arkley, Barnet area.

<h3>3.2 Explore all the Neighborhoods in the 5 Boroughs</h3>

We have analyzed one neighborhood and gotten interesting results. Now let us explore all the neighborhoods within the 5 boroughs and determine which neighborhoods have the highest number of Chinese restaurants. 

In [29]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)
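
The list comprehension in the function above indexes `categories[0]` directly, which would raise an `IndexError` for any venue returned without a category. A minimal sketch of a more defensive extraction follows. The helper names are hypothetical, and the items are hand-written to mimic the response shape used above rather than real API output.

```python
def first_category_name(venue):
    """Return the name of the venue's first category, or None if it has none."""
    categories = venue.get('categories') or []
    return categories[0]['name'] if categories else None


def flatten_items(neighborhood, lat, lng, items):
    """Turn the 'items' list of one explore response into flat tuples."""
    rows = []
    for item in items:
        venue = item['venue']
        rows.append((
            neighborhood,
            lat,
            lng,
            venue['name'],
            venue['location']['lat'],
            venue['location']['lng'],
            first_category_name(venue),
        ))
    return rows


# Hand-written items that mimic the response shape used above (not real API output)
demo_items = [
    {'venue': {'name': 'Venue A', 'location': {'lat': 51.65, 'lng': -0.20},
               'categories': [{'name': 'Pub'}]}},
    {'venue': {'name': 'Venue B', 'location': {'lat': 51.66, 'lng': -0.19},
               'categories': []}},
]
demo_rows = flatten_items('Arkley', 51.6562, -0.194317, demo_items)
print(demo_rows)
```

Venues without a category are kept with `None` as their category, so they still count towards the total number of venues in a neighborhood.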

In [30]:
# Get all the nearby venues for the neighborhoods in our dataframe london (uses the function's default radius of 500 m)
london_venues = getNearbyVenues(names=london['Neighborhood'], latitudes=london['Latitude'],longitudes=london['Longitude'])

Aldwych
Arkley
Arkley
Bankside
Barnet Gate
Barnet Gate
Bayswater
Belgravia
Belsize Park
Bethnal Green
Blackwall
Bloomsbury
Bow
Brent Cross
Brent Cross
Brunswick Park
Burroughs, The
Camberwell
Camden Town
Chinatown
Church End
Colindale
Colney Hatch
Dulwich
East Dulwich
East Finchley
Elephant and Castle
Elephant and Castle
Finchley
Finchley
Golders Green
Gospel Oak
Gospel Oak
Highgate
Holborn
Kennington
Kilburn
Lisson Grove
Little Venice
Little Venice
Mile End
Muswell Hill
North Finchley
Nunhead
Oakleigh Park
Osidge
Primrose Hill
Rotherhithe
Sydenham Hill
Temple
Tower Hill
Tufnell Park
Tufnell Park
Walworth


Now that we have all the nearby venues for the relevant neighborhoods, we can create a dataframe with the frequency of occurrence of every venue category grouped by each neighborhood.

In [31]:
# Perform one hot encoding
london_onehot = pd.get_dummies(london_venues[['Venue Category']], prefix="", prefix_sep="")

# Add neighborhood column back to dataframe
london_onehot['Neighborhood'] = london_venues['Neighborhood'] 

# Move neighborhood column to the first column
fixed_columns = [london_onehot.columns[-1]] + list(london_onehot.columns[:-1])
london_onehot = london_onehot[fixed_columns]

london_onehot.head()

Unnamed: 0,Yoga Studio,African Restaurant,American Restaurant,Arepa Restaurant,Argentinian Restaurant,Art Gallery,Art Museum,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,Australian Restaurant,Austrian Restaurant,BBQ Joint,Baby Store,Bakery,Bar,Bed & Breakfast,Beer Bar,Beer Store,Bike Rental / Bike Share,Bike Shop,Bistro,Bookstore,Boutique,Brasserie,Brazilian Restaurant,Breakfast Spot,Brewery,Bubble Tea Shop,Buddhist Temple,Buffet,Building,Burger Joint,Bus Station,Bus Stop,Business Service,Butcher,Café,Camera Store,Canal,Candy Store,Caribbean Restaurant,Champagne Bar,Cheese Shop,Chinese Restaurant,Chocolate Shop,Church,Climbing Gym,Clothing Store,Cocktail Bar,Coffee Shop,Comedy Club,Comic Shop,Community College,Concert Hall,Construction & Landscaping,Convenience Store,Cosmetics Shop,Creperie,Cricket Ground,Cycle Studio,Dance Studio,Deli / Bodega,Department Store,Dessert Shop,Diner,Distillery,Dive Bar,Eastern European Restaurant,Electronics Store,English Restaurant,Ethiopian Restaurant,Event Space,Falafel Restaurant,Farmers Market,Fast Food Restaurant,Fish & Chips Shop,Fish Market,Flower Shop,Food & Drink Shop,Food Court,Food Truck,Fountain,French Restaurant,Fried Chicken Joint,Furniture / Home Store,Garden,Garden Center,Gastropub,Gay Bar,Gelato Shop,General Entertainment,Gift Shop,Golf Course,Greek Restaurant,Grocery Store,Gym,Gym / Fitness Center,Gym Pool,Hardware Store,Health & Beauty Service,Historic Site,History Museum,Hobby Shop,Home Service,Hookah Bar,Hostel,Hotel,Hotel Bar,Ice Cream Shop,Indian Restaurant,Indie Movie Theater,Irish Pub,Italian Restaurant,Japanese Curry Restaurant,Japanese Restaurant,Juice Bar,Kebab Restaurant,Korean Restaurant,Kosher Restaurant,Lake,Latin American Restaurant,Lawyer,Lebanese Restaurant,Library,Liquor Store,Lounge,Malay Restaurant,Market,Martial Arts School,Mediterranean Restaurant,Men's Store,Metro Station,Middle Eastern Restaurant,Mobile Phone Shop,Modern European Restaurant,Monument / Landmark,Moroccan Restaurant,Movie Theater,Museum,Music Venue,Neighborhood,Nightclub,Noodle House,North Indian Restaurant,Office,Okonomiyaki Restaurant,Opera House,Optical Shop,Organic Grocery,Outdoor Sculpture,Outdoor Supply Store,Pakistani Restaurant,Palace,Paper / Office Supplies Store,Park,Pedestrian Plaza,Performing Arts Venue,Persian Restaurant,Pharmacy,Photography Studio,Pizza Place,Platform,Playground,Plaza,Poke Place,Pool,Portuguese Restaurant,Pub,RV Park,Ramen Restaurant,Record Shop,Recording Studio,Recreation Center,Residential Building (Apartment / Condo),Restaurant,Road,Roof Deck,Rugby Pitch,Salad Place,Salon / Barbershop,Sandwich Place,Scenic Lookout,Science Museum,Seafood Restaurant,Shoe Store,Shopping Mall,Shopping Plaza,Skate Park,Smoothie Shop,Soccer Field,Social Club,South American Restaurant,Spa,Speakeasy,Sporting Goods Shop,Sports Club,Stationery Store,Steakhouse,Street Food Gathering,Supermarket,Sushi Restaurant,Szechuan Restaurant,Tapas Restaurant,Tea Room,Tennis Court,Thai Restaurant,Theater,Theme Park Ride / Attraction,Thrift / Vintage Store,Tour Provider,Tourist Information Center,Toy / Game Store,Trail,Train Station,Turkish Restaurant,Vegetarian / Vegan Restaurant,Video Game Store,Vietnamese Restaurant,Warehouse Store,Wine Bar,Wine Shop,Women's Store,Xinjiang Restaurant
0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,Aldwych,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,Aldwych,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,Aldwych,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,Aldwych,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,Aldwych,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
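
The header above contains a venue category that is itself called 'Neighborhood', and 'Yoga Studio' rather than 'Neighborhood' ends up as the first column. This suggests that the assignment overwrote the existing category column in place, so `columns[-1]` picked up the last category instead. The sketch below shows one way to avoid the clash on made-up data; the replacement label `'Neighborhood (venue category)'` is an arbitrary choice for this example.

```python
import pandas as pd

# Made-up venues; one of them has the venue category 'Neighborhood'
venues = pd.DataFrame({
    'Neighborhood': ['Aldwych', 'Aldwych', 'Bow'],
    'Venue Category': ['Pub', 'Neighborhood', 'Chinese Restaurant'],
})

# One indicator column per venue category
onehot = pd.get_dummies(venues['Venue Category'], dtype=int)

# Rename the clashing category so it is not overwritten by the area names;
# the new label is an arbitrary choice made for this sketch
onehot = onehot.rename(columns={'Neighborhood': 'Neighborhood (venue category)'})

# Put the area name in the first position explicitly
onehot.insert(0, 'Neighborhood', venues['Neighborhood'])

print(list(onehot.columns))
```

Because `insert` raises an error when the column name already exists, a clash of this kind would surface immediately instead of passing silently.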


In [32]:
# Create a dataframe of each neighborhood and the frequency of each venue
london_grouped = london_onehot.groupby('Neighborhood').mean().reset_index()

In [33]:
# Examine the dataframe
london_grouped.head()

Unnamed: 0,Neighborhood,Yoga Studio,African Restaurant,American Restaurant,Arepa Restaurant,Argentinian Restaurant,Art Gallery,Art Museum,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,Australian Restaurant,Austrian Restaurant,BBQ Joint,Baby Store,Bakery,Bar,Bed & Breakfast,Beer Bar,Beer Store,Bike Rental / Bike Share,Bike Shop,Bistro,Bookstore,Boutique,Brasserie,Brazilian Restaurant,Breakfast Spot,Brewery,Bubble Tea Shop,Buddhist Temple,Buffet,Building,Burger Joint,Bus Station,Bus Stop,Business Service,Butcher,Café,Camera Store,Canal,Candy Store,Caribbean Restaurant,Champagne Bar,Cheese Shop,Chinese Restaurant,Chocolate Shop,Church,Climbing Gym,Clothing Store,Cocktail Bar,Coffee Shop,Comedy Club,Comic Shop,Community College,Concert Hall,Construction & Landscaping,Convenience Store,Cosmetics Shop,Creperie,Cricket Ground,Cycle Studio,Dance Studio,Deli / Bodega,Department Store,Dessert Shop,Diner,Distillery,Dive Bar,Eastern European Restaurant,Electronics Store,English Restaurant,Ethiopian Restaurant,Event Space,Falafel Restaurant,Farmers Market,Fast Food Restaurant,Fish & Chips Shop,Fish Market,Flower Shop,Food & Drink Shop,Food Court,Food Truck,Fountain,French Restaurant,Fried Chicken Joint,Furniture / Home Store,Garden,Garden Center,Gastropub,Gay Bar,Gelato Shop,General Entertainment,Gift Shop,Golf Course,Greek Restaurant,Grocery Store,Gym,Gym / Fitness Center,Gym Pool,Hardware Store,Health & Beauty Service,Historic Site,History Museum,Hobby Shop,Home Service,Hookah Bar,Hostel,Hotel,Hotel Bar,Ice Cream Shop,Indian Restaurant,Indie Movie Theater,Irish Pub,Italian Restaurant,Japanese Curry Restaurant,Japanese Restaurant,Juice Bar,Kebab Restaurant,Korean Restaurant,Kosher Restaurant,Lake,Latin American Restaurant,Lawyer,Lebanese Restaurant,Library,Liquor Store,Lounge,Malay Restaurant,Market,Martial Arts School,Mediterranean Restaurant,Men's Store,Metro Station,Middle Eastern Restaurant,Mobile Phone Shop,Modern European Restaurant,Monument / Landmark,Moroccan Restaurant,Movie Theater,Museum,Music Venue,Nightclub,Noodle House,North Indian Restaurant,Office,Okonomiyaki Restaurant,Opera House,Optical Shop,Organic Grocery,Outdoor Sculpture,Outdoor Supply Store,Pakistani Restaurant,Palace,Paper / Office Supplies Store,Park,Pedestrian Plaza,Performing Arts Venue,Persian Restaurant,Pharmacy,Photography Studio,Pizza Place,Platform,Playground,Plaza,Poke Place,Pool,Portuguese Restaurant,Pub,RV Park,Ramen Restaurant,Record Shop,Recording Studio,Recreation Center,Residential Building (Apartment / Condo),Restaurant,Road,Roof Deck,Rugby Pitch,Salad Place,Salon / Barbershop,Sandwich Place,Scenic Lookout,Science Museum,Seafood Restaurant,Shoe Store,Shopping Mall,Shopping Plaza,Skate Park,Smoothie Shop,Soccer Field,Social Club,South American Restaurant,Spa,Speakeasy,Sporting Goods Shop,Sports Club,Stationery Store,Steakhouse,Street Food Gathering,Supermarket,Sushi Restaurant,Szechuan Restaurant,Tapas Restaurant,Tea Room,Tennis Court,Thai Restaurant,Theater,Theme Park Ride / Attraction,Thrift / Vintage Store,Tour Provider,Tourist Information Center,Toy / Game Store,Trail,Train Station,Turkish Restaurant,Vegetarian / Vegan Restaurant,Video Game Store,Vietnamese Restaurant,Warehouse Store,Wine Bar,Wine Shop,Women's Store,Xinjiang Restaurant
0,Aldwych,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.05,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.01,0.02,0.01,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.03,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.02,0.0,0.0,0.05,0.02,0.05,0.01,0.01,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.02,0.01,0.0,0.02,0.0,0.0,0.0,0.0,0.01,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.02,0.0,0.01,0.0,0.0,0.0,0.01,0.01,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.02,0.0,0.02,0.02,0.0,0.0,0.01,0.01,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.01,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.01,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.02,0.0,0.0,0.0,0.01,0.0,0.01,0.01,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.02,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.02,0.0,0.01,0.0,0.0,0.0,0.08,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.03,0.01,0.01,0.0
1,Arkley,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,Bankside,0.0,0.0,0.029412,0.0,0.029412,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.029412,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.029412,0.0,0.029412,0.0,0.0,0.0,0.0,0.0,0.0,0.029412,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.029412,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.029412,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.029412,0.029412,0.029412,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.029412,0.0,0.0,0.0,0.029412,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.029412,0.0,0.029412,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.029412,0.0,0.029412,0.029412,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.088235,0.0,0.0,0.0,0.0,0.0,0.029412,0.0,0.0,0.0,0.0,0.0,0.0,0.147059,0.0,0.029412,0.0,0.0,0.0,0.029412,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.029412,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.029412,0.0,0.0,0.0,0.0,0.029412,0.029412,0.0,0.0,0.0,0.0,0.0,0.0,0.029412,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,Barnet Gate,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,Bayswater,0.0,0.01,0.01,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.01,0.01,0.0,0.0,0.0,0.0,0.0,0.01,0.02,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.05,0.0,0.0,0.0,0.0,0.0,0.0,0.03,0.01,0.0,0.0,0.0,0.01,0.05,0.0,0.0,0.0,0.01,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.01,0.0,0.0,0.01,0.0,0.0,0.01,0.0,0.0,0.0,0.01,0.0,0.0,0.04,0.0,0.0,0.0,0.01,0.0,0.01,0.0,0.03,0.0,0.01,0.04,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.05,0.0,0.0,0.03,0.0,0.0,0.02,0.0,0.0,0.01,0.01,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.02,0.01,0.0,0.01,0.0,0.0,0.0,0.01,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.04,0.02,0.0,0.04,0.0,0.0,0.0,0.0,0.0,0.01,0.05,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.01,0.0,0.0,0.02,0.01,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0


Now that we have a dataframe of all the neighborhoods and the frequency of each venue category occurring in each neighborhood, we can filter for the neighborhoods that have Chinese restaurants.

In [34]:
# Keep only the rows where the 'Chinese Restaurant' frequency is greater than 0
chinese_restaurant = london_grouped.loc[london_grouped['Chinese Restaurant'] > 0]

# Keep only the neighborhood name and the frequency of Chinese restaurants; drop all other columns
chinese_restaurant = chinese_restaurant[['Neighborhood', 'Chinese Restaurant']]

# Sort the dataframe in descending order. We don't reset the index as we will be needing the information later
chinese_restaurant = chinese_restaurant.sort_values(by='Chinese Restaurant', ascending = False)

In [35]:
# Examine the results of the neighborhood
chinese_restaurant

Unnamed: 0,Neighborhood,Chinese Restaurant
11,Brent Cross,0.181818
18,Colindale,0.1
36,Nunhead,0.066667
10,Bow,0.0625
40,Rotherhithe,0.034483
17,Church End,0.033333
4,Bayswater,0.03
32,Little Venice,0.03
24,Finchley,0.028571
9,Bloomsbury,0.020833


As we can see, there are **11 neighborhoods** with Chinese restaurants. Next, we check the top 10 most common venues in these neighborhoods to determine whether 'Chinese Restaurant' is one of them.

In [36]:
# Function returning the most common venues in the neighborhood
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]
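
As a quick illustration of what this function returns, the sketch below applies it to a single made-up row. The function is repeated so the snippet runs on its own, and `toy_row` with its values is hypothetical, not taken from the London data.

```python
import pandas as pd

def return_most_common_venues(row, num_top_venues):
    # Skip the first entry (the neighborhood name) and rank the remaining frequencies
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    return row_categories_sorted.index.values[0:num_top_venues]

# Hypothetical frequencies for one neighborhood (illustrative values only)
toy_row = pd.Series({
    'Neighborhood': 'Example',
    'Pub': 0.50,
    'Chinese Restaurant': 0.30,
    'Park': 0.15,
    'Café': 0.05,
})

top_venues = return_most_common_venues(toy_row, 3)
print(list(top_venues))
```

Note that categories with equal frequency (including many zeros in a sparsely populated neighborhood) have no meaningful relative order, so the lower ranks for such neighborhoods should be read with care.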

In [37]:
# Slicing the dataframe london_grouped to include only the neighborhoods with Chinese restaurants
select_neigh = london_grouped.loc[[4,9,10,11,13,16,17,23,31,35,39],:] # Indices obtained from the above dataframe
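
A hard-coded index list has to be kept in sync with the filtered dataframe by hand, and it can silently go stale if the venue data is fetched again. One possible alternative, sketched here on a made-up stand-in for `london_grouped`, is to reuse the index of the filtered dataframe directly. The toy values are assumptions for illustration only.

```python
import pandas as pd

# Hypothetical stand-in for london_grouped (illustrative values only)
london_grouped = pd.DataFrame({
    'Neighborhood': ['A', 'B', 'C', 'D'],
    'Chinese Restaurant': [0.0, 0.10, 0.0, 0.03],
    'Pub': [0.5, 0.2, 0.4, 0.1],
})

# Same filter as in the analysis: rows with a non-zero Chinese restaurant frequency
chinese_restaurant = london_grouped.loc[london_grouped['Chinese Restaurant'] > 0]

# Reuse the surviving index labels instead of typing them by hand
select_neigh = london_grouped.loc[chinese_restaurant.index.sort_values(), :]
print(select_neigh['Neighborhood'].tolist())
```

This works because the filtered dataframe keeps the original index labels as long as the index is not reset.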

In [38]:
# Specifying we want the top 10 venues
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# Create columns according to number of top venues
columns = ['Neighborhood']
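for ind in np.arange(num_top_venues):
    try:
        # 1st, 2nd, 3rd use the suffixes listed in indicators
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        # Every rank past the 3rd falls back to the 'th' suffix
        columns.append('{}th Most Common Venue'.format(ind+1))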

# Create a new dataframe
common_venues = pd.DataFrame(columns=columns)
common_venues['Neighborhood'] = select_neigh['Neighborhood']

for ind in np.arange(select_neigh.shape[0]):
    common_venues.iloc[ind, 1:] = return_most_common_venues(select_neigh.iloc[ind, :], num_top_venues)

# Examine the dataframe
common_venues

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
4,Bayswater,Coffee Shop,Café,Hotel,Pub,Garden,Persian Restaurant,Pizza Place,Gym / Fitness Center,Greek Restaurant,Indian Restaurant
9,Bloomsbury,Garden,Pub,Coffee Shop,Bookstore,Café,Fish & Chips Shop,Park,Plaza,Gay Bar,Supermarket
10,Bow,Pub,Bus Stop,Bar,Park,Chinese Restaurant,Burger Joint,Road,Metro Station,Coffee Shop,Breakfast Spot
11,Brent Cross,Chinese Restaurant,Sporting Goods Shop,Supermarket,Grocery Store,Hardware Store,Clothing Store,Arts & Crafts Store,Home Service,Warehouse Store,Organic Grocery
13,"Burroughs, The",Home Service,Xinjiang Restaurant,Electronics Store,Food Court,Food & Drink Shop,Flower Shop,Fish Market,Fish & Chips Shop,Fast Food Restaurant,Farmers Market
16,Chinatown,Coffee Shop,Italian Restaurant,Gym / Fitness Center,Clothing Store,Wine Bar,Vietnamese Restaurant,Sandwich Place,Bakery,Restaurant,Burger Joint
17,Church End,Supermarket,Turkish Restaurant,Coffee Shop,Pub,Pizza Place,Japanese Restaurant,Park,Café,Restaurant,Chinese Restaurant
23,Elephant and Castle,Pub,Café,Hotel,Park,History Museum,Bar,Gastropub,Restaurant,Coffee Shop,Climbing Gym
31,Lisson Grove,Cricket Ground,Café,Deli / Bodega,Hookah Bar,Grocery Store,Canal,Modern European Restaurant,French Restaurant,Salad Place,Garden
35,North Finchley,Sports Club,Soccer Field,Photography Studio,Health & Beauty Service,Rugby Pitch,Xinjiang Restaurant,Eastern European Restaurant,Fish Market,Fish & Chips Shop,Fast Food Restaurant


<h3>3.3 Clustering the Neighborhoods</h3>

We cluster the neighborhoods to see whether neighborhoods with similar venue profiles also share a similar concentration of Chinese restaurants. We chose **k-means clustering** because it is a common, simple method that suits our purposes. Using the Elbow Method, we determined that the best number of clusters is 5.
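
The Elbow Method run itself is not shown in this notebook. The sketch below shows one common way to carry it out: fit k-means for several values of k and look at where the within-cluster sum of squares (`inertia_`) stops dropping sharply. The data here is synthetic and the variable names are assumptions; in the notebook the input would be the venue-frequency matrix, and k cannot exceed the number of rows.

```python
import numpy as np
from sklearn.cluster import KMeans

# Synthetic stand-in for the venue-frequency matrix (illustrative only):
# three well-separated groups of 20 points each
rng = np.random.default_rng(0)
centers = np.array([[0, 0], [5, 5], [0, 5]])
X = np.vstack([c + rng.normal(scale=0.3, size=(20, 2)) for c in centers])

# Fit k-means for a range of k and record the within-cluster sum of squares
k_values = list(range(1, 8))
inertias = []
for k in k_values:
    model = KMeans(n_clusters=k, random_state=0, n_init=10).fit(X)
    inertias.append(model.inertia_)

# The "elbow" is the k after which inertia decreases only slowly
for k, inertia in zip(k_values, inertias):
    print(k, round(inertia, 2))
```

Plotting `inertias` against `k_values` makes the elbow easier to spot than reading the printed numbers.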

In [39]:
# Set number of clusters
kclusters = 5

# Drop the non-numeric name column so only venue frequencies are clustered
london_grouped_clustering = select_neigh.drop(columns='Neighborhood')

# Run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(london_grouped_clustering)

# Check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10]

array([2, 2, 2, 0, 1, 2, 2, 2, 4, 3])

Next, we do a right join on the two main dataframes we have: our original dataframe containing the neighborhood, borough, postal code, latitude and longitude, and the most common venue dataframe.

In [40]:
# Add clustering labels
common_venues.insert(0, 'Cluster Labels', kmeans.labels_)

london_merged = london

# Perform a right join to merge the london dataframe with common_venues to add latitude/longitude for each neighborhood
london_merged = london_merged.join(common_venues.set_index('Neighborhood'), on='Neighborhood', how='right').reset_index(drop = True)

# Examine the dataframe
london_merged.head()

Unnamed: 0,Neighborhood,Borough,PostCode,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Bayswater,Westminster,W2,51.5156,-0.188645,2,Coffee Shop,Café,Hotel,Pub,Garden,Persian Restaurant,Pizza Place,Gym / Fitness Center,Greek Restaurant,Indian Restaurant
1,Bloomsbury,Camden,WC1,51.5236,-0.1223,2,Garden,Pub,Coffee Shop,Bookstore,Café,Fish & Chips Shop,Park,Plaza,Gay Bar,Supermarket
2,Bow,Tower Hamlets,E3,51.525,-0.026571,2,Pub,Bus Stop,Bar,Park,Chinese Restaurant,Burger Joint,Road,Metro Station,Coffee Shop,Breakfast Spot
3,Brent Cross,Barnet,NW2,51.5649,-0.223325,0,Chinese Restaurant,Sporting Goods Shop,Supermarket,Grocery Store,Hardware Store,Clothing Store,Arts & Crafts Store,Home Service,Warehouse Store,Organic Grocery
4,Brent Cross,Barnet,NW4,51.6,-0.2167,0,Chinese Restaurant,Sporting Goods Shop,Supermarket,Grocery Store,Hardware Store,Clothing Store,Arts & Crafts Store,Home Service,Warehouse Store,Organic Grocery
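
The merged output lists Brent Cross twice, once per postal code. The sketch below, using small made-up stand-ins for `london` and `common_venues`, shows why: a right join keeps every row of the right-hand dataframe and repeats it for each matching row on the left. The 'Elsewhere' row and its postal code are hypothetical.

```python
import pandas as pd

# Hypothetical stand-ins for the london and common_venues dataframes
london = pd.DataFrame({
    'Neighborhood': ['Brent Cross', 'Brent Cross', 'Bow', 'Elsewhere'],
    'PostCode': ['NW2', 'NW4', 'E3', 'X1'],
})
common_venues = pd.DataFrame({
    'Neighborhood': ['Bow', 'Brent Cross'],
    'Cluster Labels': [2, 0],
})

# Right join: keep every neighborhood in common_venues and attach matching london rows
merged = london.join(common_venues.set_index('Neighborhood'),
                     on='Neighborhood', how='right').reset_index(drop=True)
print(merged)
```

Neighborhoods that appear only in the left dataframe are dropped, while a neighborhood with two postal codes yields two rows carrying the same cluster label and venues.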


Let us also visualize our results by utilizing folium to superimpose the clusters onto a map of London.

In [41]:
# Get the latitude and longitude of London, UK
address = 'London, UK'

geolocator = Nominatim(user_agent="ny_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of London are {}, {}.'.format(latitude, longitude))

The geographical coordinates of London are 51.5073219, -0.1276474.


In [42]:
# Create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# Set color scheme for the clusters: one evenly spaced rainbow color per cluster
colors_array = cm.rainbow(np.linspace(0, 1, kclusters))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# Add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(london_merged['Latitude'], london_merged['Longitude'], london_merged['Neighborhood'], london_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

Lastly, let's examine each cluster by slicing the dataframe by cluster label.

In [43]:
# Cluster 1 (cluster label 0)
london_merged.loc[london_merged['Cluster Labels'] == 0, london_merged.columns[[0] + list(range(5, london_merged.shape[1]))]]

Unnamed: 0,Neighborhood,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
3,Brent Cross,0,Chinese Restaurant,Sporting Goods Shop,Supermarket,Grocery Store,Hardware Store,Clothing Store,Arts & Crafts Store,Home Service,Warehouse Store,Organic Grocery
4,Brent Cross,0,Chinese Restaurant,Sporting Goods Shop,Supermarket,Grocery Store,Hardware Store,Clothing Store,Arts & Crafts Store,Home Service,Warehouse Store,Organic Grocery


In [44]:
# Cluster 2 (cluster label 1)
london_merged.loc[london_merged['Cluster Labels'] == 1, london_merged.columns[[0] + list(range(5, london_merged.shape[1]))]]

Unnamed: 0,Neighborhood,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
5,"Burroughs, The",1,Home Service,Xinjiang Restaurant,Electronics Store,Food Court,Food & Drink Shop,Flower Shop,Fish Market,Fish & Chips Shop,Fast Food Restaurant,Farmers Market


In [45]:
# Cluster 3 (cluster label 2)
london_merged.loc[london_merged['Cluster Labels'] == 2, london_merged.columns[[0] + list(range(5, london_merged.shape[1]))]]

Unnamed: 0,Neighborhood,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Bayswater,2,Coffee Shop,Café,Hotel,Pub,Garden,Persian Restaurant,Pizza Place,Gym / Fitness Center,Greek Restaurant,Indian Restaurant
1,Bloomsbury,2,Garden,Pub,Coffee Shop,Bookstore,Café,Fish & Chips Shop,Park,Plaza,Gay Bar,Supermarket
2,Bow,2,Pub,Bus Stop,Bar,Park,Chinese Restaurant,Burger Joint,Road,Metro Station,Coffee Shop,Breakfast Spot
6,Chinatown,2,Coffee Shop,Italian Restaurant,Gym / Fitness Center,Clothing Store,Wine Bar,Vietnamese Restaurant,Sandwich Place,Bakery,Restaurant,Burger Joint
7,Church End,2,Supermarket,Turkish Restaurant,Coffee Shop,Pub,Pizza Place,Japanese Restaurant,Park,Café,Restaurant,Chinese Restaurant
8,Elephant and Castle,2,Pub,Café,Hotel,Park,History Museum,Bar,Gastropub,Restaurant,Coffee Shop,Climbing Gym
9,Elephant and Castle,2,Pub,Café,Hotel,Park,History Museum,Bar,Gastropub,Restaurant,Coffee Shop,Climbing Gym


In [46]:
# Cluster 4 (cluster label 3)
london_merged.loc[london_merged['Cluster Labels'] == 3, london_merged.columns[[0] + list(range(5, london_merged.shape[1]))]]

Unnamed: 0,Neighborhood,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
11,North Finchley,3,Sports Club,Soccer Field,Photography Studio,Health & Beauty Service,Rugby Pitch,Xinjiang Restaurant,Eastern European Restaurant,Fish Market,Fish & Chips Shop,Fast Food Restaurant


In [47]:
# Cluster 5 (cluster label 4)
london_merged.loc[london_merged['Cluster Labels'] == 4, london_merged.columns[[0] + list(range(5, london_merged.shape[1]))]]

Unnamed: 0,Neighborhood,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
10,Lisson Grove,4,Cricket Ground,Café,Deli / Bodega,Hookah Bar,Grocery Store,Canal,Modern European Restaurant,French Restaurant,Salad Place,Garden
12,Primrose Hill,4,Cricket Ground,Café,Deli / Bodega,Hookah Bar,Grocery Store,Canal,Modern European Restaurant,French Restaurant,Salad Place,Garden


## 4. Discussion and Recommendation <a name="discussion"></a>

Based on an initial look at the <code>common_venues</code> dataframe, only 2 neighborhoods – **Brent Cross, Barnet and Bow, Tower Hamlets** – have Chinese restaurants among their top 5 most common venues. Some of the other neighborhoods do not list Chinese restaurants at all. This gives us a clearer picture of which neighborhoods are popular for Chinese restaurants.
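
The top-5 observation can also be checked programmatically rather than by eye. The sketch below is one possible way to do it; the three rows are copied from the first five ranks of the `common_venues` output, and the names `rank_columns`, `top_n` and `has_chinese` are introduced here for illustration.

```python
import pandas as pd

# Excerpt of common_venues: first five ranks for three of the neighborhoods
common_venues = pd.DataFrame({
    'Neighborhood': ['Bayswater', 'Bow', 'Brent Cross'],
    '1st Most Common Venue': ['Coffee Shop', 'Pub', 'Chinese Restaurant'],
    '2nd Most Common Venue': ['Café', 'Bus Stop', 'Sporting Goods Shop'],
    '3rd Most Common Venue': ['Hotel', 'Bar', 'Supermarket'],
    '4th Most Common Venue': ['Pub', 'Park', 'Grocery Store'],
    '5th Most Common Venue': ['Garden', 'Chinese Restaurant', 'Hardware Store'],
})

# Collect the rank columns in order and look only at the first top_n of them
rank_columns = [c for c in common_venues.columns if c.endswith('Most Common Venue')]
top_n = 5

# True for rows where any of the top_n ranks is 'Chinese Restaurant'
has_chinese = common_venues[rank_columns[:top_n]].eq('Chinese Restaurant').any(axis=1)
print(common_venues.loc[has_chinese, 'Neighborhood'].tolist())
```

Changing `top_n` to 10 on the full dataframe would give the check used to rule clusters in or out.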

From the folium map, we can see that the areas to focus on are near the London city center and in north London.
Clusters 2 and 5 are removed from consideration because they do not list Chinese restaurants among their 10 most common venues. We are looking for areas where demand for Chinese restaurants is high, that is, where Chinese restaurants are already a popular venue.
The most viable options are Clusters 1 and 3. The clustering supports Brent Cross and Bow as our best options. However, other contributing factors were not considered because public information is lacking: a breakdown of ethnicities in each neighborhood, and market data on Chinese restaurants, such as who is most likely to frequent them by age, income, education and ethnicity. In addition, the Census data we used dates from 2011, 10 years ago, so the demographics of London may have changed since then.

It also seems unlikely that, in a city as diverse as London, the 5 selected boroughs with the highest Chinese population contain only 11 neighborhoods with Chinese restaurants, so the venue data may be incomplete.
Given these caveats, our recommendation for where to open a Chinese restaurant is either Brent Cross, Barnet or Bow, Tower Hamlets.

## 5. Conclusion <a name="conclusion"></a>

In this project, our aim was to identify which neighborhood would be best suited for opening a new Chinese restaurant. Our main approach was to understand where the demand for Chinese restaurants is. We did this by determining where Chinese communities are concentrated, then analyzing each neighborhood in those boroughs. The neighborhoods with the highest share of Chinese restaurants were deemed the most viable options, as this indicates where demand is high.

Our recommendation, ultimately, is either Brent Cross, Barnet or Bow, Tower Hamlets. We acknowledge, however, that the missing data, especially market data on Chinese restaurants, would be needed to give a more accurate recommendation.
This information should be useful to anyone looking to open a new Chinese restaurant who wants to know where demand is high.
