# Capstone Project - The Battle of the Neighborhoods 
### Applied Data Science Capstone by IBM/Coursera

<img src = "https://p2.zoon.ru/preview/zuUcMIdBXqJR1uMkFUVmBQ/584x440x85/1/c/1/original_5d284555ca057f02bc6262fa_5d284743b20e5.jpg">

## Table of contents
* [Introduction: Business Problem](#introduction)
* [Data](#data)
* [Methodology](#methodology)
* [Analysis](#analysis)
* [Results and Discussion](#results)
* [Conclusion](#conclusion)


## Introduction: Business Problem <a name="introduction"></a>

In this project we will try to find an optimal location for a bakery. Specifically, this report is aimed at stakeholders interested in opening a luxury bakery in London, UK.

Since there are lots of restaurants in London, we will try to detect **locations that are not already crowded with bakeries**. We are also particularly interested in **areas with no popular bakeries**. We would also prefer locations **with wealthy residents**.

We will use data science methods to shortlist the most promising neighborhoods based on these criteria. The advantages of each area will then be clearly described so that stakeholders can choose the best possible final location.

## Data <a name="data"></a>

Based on the definition of our problem, the factors that will influence our decision are:
* Average monthly rental costs in London. We assume that the higher the rent, the wealthier the residents of a borough, which makes that borough a better place to open a luxury bakery.
* The number of bakeries among the top 10 places in the chosen boroughs.

We use the postcode districts of the most expensive boroughs to define our neighborhoods; each district is represented by a geocoded latitude and longitude.

The following data sources are needed to extract or generate the required information:
* Average monthly rental costs in Greater London as of June 2019, by borough (in GBP): https://www.statista.com/statistics/752279/average-rental-costs-in-greater-london-boroughs/
* The number of top 10 places in every neighborhood, with their type and location, obtained using the **Foursquare API**
* Greater London Area postal codes: <https://en.wikipedia.org/wiki/List_of_areas_of_London>

Libraries: for convenience, all the libraries are imported at the beginning.

In [2]:
# BeautifulSoup, for web scraping
from bs4 import BeautifulSoup
# library to handle data in a vectorized manner
import numpy as np
# library for data analysis
import pandas as pd
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)
# library to handle JSON files
import json
print('numpy, pandas, ..., imported...')
!pip -q install geopy
print('geopy installed...')
# convert an address into latitude and longitude values
from geopy.geocoders import Nominatim
print('Nominatim imported...')
# library to handle requests
import requests
print('requests imported...')
# transform JSON file into a pandas dataframe
from pandas.io.json import json_normalize
print('json_normalize imported...')
# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors
print('matplotlib imported...')
# import k-means for the clustering stage
from sklearn.cluster import KMeans
print('Kmeans imported...')
# install the Geocoder
!pip -q install geocoder
import geocoder
# import time
import time
!conda install -c conda-forge folium=0.5.0 --yes
print('folium installed...')
import folium # map rendering library
print('folium imported...')


import types
from botocore.client import Config
import ibm_boto3

print('...Done')

numpy, pandas, ..., imported...
geopy installed...
Nominatim imported...
requests imported...
json_normalize imported...
matplotlib imported...
Kmeans imported...

### Neighborhood Candidates

First, let's analyse average rental costs by borough, based on the data published at https://www.statista.com/statistics/752279/average-rental-costs-in-greater-london-boroughs/

In [3]:
def __iter__(self): return 0

# @hidden_cell
# The following code accesses a file in your IBM Cloud Object Storage. It includes your credentials.
# You might want to remove those credentials before you share the notebook.
client_11aaefccbd32400f8784ede2c5c844af = ibm_boto3.client(service_name='s3',
    ibm_api_key_id='YOUR_IBM_API_KEY',  # credential redacted; supply your own key
    ibm_auth_endpoint="https://iam.ng.bluemix.net/oidc/token",
    config=Config(signature_version='oauth'),
    endpoint_url='https://s3-api.us-geo.objectstorage.service.networklayer.com')

body = client_11aaefccbd32400f8784ede2c5c844af.get_object(Bucket='courseracapstone-donotdelete-pr-uzusml2razjnkt',Key='Rental costs London.xlsx')['Body']
# add missing __iter__ method, so pandas accepts body as file-like object
if not hasattr(body, "__iter__"): body.__iter__ = types.MethodType( __iter__, body )

df_rent = pd.read_excel(body)
df_rent.head()

| | Borough | Price |
|---|---|---|
| 0 | City of Westminster | 2534 |
| 1 | Lambeth | 2055 |
| 2 | Camden & City of London | 2013 |
| 3 | Hammersmith and Fulham & Kensington and Chelsea | 1945 |
| 4 | Wandsworth | 1760 |


In [4]:
df_rent = df_rent.set_index(['Price']).stack().str.split('&|,', expand=True).stack().unstack(-2).reset_index(-1, drop=True).reset_index()
df_rent.shape

(33, 2)

In [5]:
df_rent

| | Price | Borough |
|---|---|---|
| 0 | 2534 | City of Westminster |
| 1 | 2055 | Lambeth |
| 2 | 2013 | Camden |
| 3 | 2013 | City of London |
| 4 | 1945 | Hammersmith and Fulham |
| 5 | 1945 | Kensington and Chelsea |
| 6 | 1760 | Wandsworth |
| 7 | 1729 | Lewisham |
| 8 | 1729 | Southwark |
| 9 | 1719 | Tower Hamlets |
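
The one-line `stack`/`unstack` chain in cell 4 is compact but hard to follow. As a minimal sketch in plain Python, the same reshaping could be written as below. The function name `split_boroughs` is illustrative and not part of the notebook; the sample rows are taken from the `df_rent.head()` output above. This version also strips the whitespace that splitting on `&` leaves around each name.

```python
import re

def split_boroughs(rows):
    """Expand (label, price) rows so that every borough gets its own row."""
    expanded = []
    for label, price in rows:
        # A label may combine several boroughs with '&' or ','
        for name in re.split(r"[&,]", label):
            name = name.strip()  # drop the spaces left around the separator
            if name:
                expanded.append((name, price))
    return expanded

# Sample rows copied from the rental table shown earlier
rows = [
    ("City of Westminster", 2534),
    ("Camden & City of London", 2013),
    ("Hammersmith and Fulham & Kensington and Chelsea", 1945),
]
expanded = split_boroughs(rows)
for borough, price in expanded:
    print(borough, price)
```

Stripping the names matters later on, because a borough name with a stray leading or trailing space will not match the same name in another table or in a GeoJSON file.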


As we can see, the top 4 boroughs by rent (City of Westminster, Lambeth, Camden and City of London) are good candidate areas for opening a luxury bakery.

In [6]:
import json
from six.moves.urllib.request import urlopen

json_url = urlopen('https://raw.githubusercontent.com/Helavissa05/Coursera_Capstone/master/London_boroughs.json')
world_geo = json.loads(json_url.read())
world_geo


{'type': 'FeatureCollection',
 'features': [{'type': 'Feature',
   'properties': {'AREA_CODE': 'LBO',
    'DESCRIPTIO': 'London Borough',
    'FILE_NAME': 'GREATER_LONDON_AUTHORITY',
    'NUMBER': 77.0,
    'NUMBER0': 1312.0,
    'POLYGON_ID': 50632.0,
    'UNIT_ID': 11244.0,
    'CODE': 'E09000007',
    'HECTARES': 2178.932,
    'AREA': 0.0,
    'TYPE_CODE': 'AA',
    'DESCRIPT0': 'CIVIL ADMINISTRATION AREA',
    'TYPE_COD0': None,
    'DESCRIPT1': None,
    'type': 'borough',
    'name': 'Camden'},
   'geometry': {'type': 'Polygon',
    'coordinates': [[[-0.140893669649311, 51.568510662383844],
      [-0.140063090227398, 51.56718624944498],
      [-0.139589907361594, 51.56516780518284],
      [-0.139082922856378, 51.564084144247936],
      [-0.139065605693718, 51.5638014742053],
      [-0.139195555312194, 51.56302200168111],
      [-0.138985366946239, 51.5623387602664],
      [-0.139091811893047, 51.56160478162274],
      [-0.139436095067137, 51.56073517593641],
      [-0.13944925426

In [8]:
latitude = 51.509865
longitude = -0.118092
world_map = folium.Map(location=[latitude,longitude], zoom_start=10)

world_map.choropleth(
    geo_data=world_geo,
    data=df_rent,
    columns=['Borough', 'Price'],
    key_on='properties.name',
    fill_color='YlOrRd', 
    fill_opacity=0.7, 
    line_opacity=0.2,
    legend_name='Rent prices in London'
)

#display map
world_map
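
The choropleth joins the rent table to the GeoJSON through `key_on='properties.name'`, so a borough is only shaded if its name in `df_rent` matches the `name` property of a feature exactly. The following is a minimal sketch of a check that could be run before drawing the map. The helper `unmatched_names` and the sample data are assumptions for illustration; whether the real files disagree on any name is not shown in the notebook.

```python
def unmatched_names(geojson, names):
    """Return the names that have no feature with the same properties.name."""
    geo_names = {feature["properties"]["name"] for feature in geojson["features"]}
    return sorted({name.strip() for name in names} - geo_names)

# Tiny hypothetical GeoJSON with the same layout as the borough file above
sample_geo = {
    "type": "FeatureCollection",
    "features": [
        {"type": "Feature", "properties": {"name": "Camden"}},
        {"type": "Feature", "properties": {"name": "Lambeth"}},
    ],
}

# Hypothetical table names, one of them padded with a space
sample_names = ["Camden ", "Lambeth", "City of Westminster"]
missing = unmatched_names(sample_geo, sample_names)
print(missing)
```

Any name reported by such a check would need to be renamed on one side before the map can colour that borough.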

Let's create latitude & longitude coordinates for centroids of our candidate neighborhoods.

In this project, "London" is used as a synonym for the Greater London Area. Within the Greater London Area, some areas fall inside the London postcode area; the focus of this project is the neighbourhoods within that postcode area.
The London area consists of 32 boroughs and the City of London. Our data comes from the Wikipedia list of areas of London: <https://en.wikipedia.org/wiki/List_of_areas_of_London>
The BeautifulSoup package is used to scrape the needed data from this Wikipedia page, as shown below:

In [10]:
wikipedia_link = 'https://en.wikipedia.org/wiki/List_of_areas_of_London'
wikipedia_page = requests.get(wikipedia_link)

# Parse the HTML of the page
soup = BeautifulSoup(wikipedia_page.content, 'html.parser')
# Extract the "tbody" of the first table whose class is "wikitable sortable"
table = soup.find('table', {'class':'wikitable sortable'}).tbody
# Extract all "tr" (table rows) within the table above
rows = table.find_all('tr')
# Extract the column headers from the "th" tags, removing any '\n'
columns = [i.text.replace('\n', '')
           for i in rows[0].find_all('th')]
# Collect every data row, stripping '\n' and non-breaking spaces ('\xa0').
# The first row (rows[0]) is skipped because it is the header.
records = []
for i in range(1, len(rows)):
    tds = rows[i].find_all('td')
    values = [td.text.replace('\n', '').replace('\xa0', '') for td in tds]
    # Keep only rows whose cell count matches the header
    if len(values) == len(columns):
        records.append(values)
# Build the dataframe in one step from the collected rows
df = pd.DataFrame(records, columns = columns)
df

The resulting dataframe needs to be cleaned as follows:

In [11]:
# Remove Borough reference numbers with []

df.iloc[:,1] = df.iloc[:,1].map(lambda x: x.rstrip("]").rstrip("0123456789").rstrip("["))
df.head()

| | Location | London borough | Post town | Postcode district | Dial code | OS grid ref |
|---|---|---|---|---|---|---|
| 0 | Abbey Wood | Bexley, Greenwich | LONDON | SE2 | 20 | TQ465785 |
| 1 | Acton | Ealing, Hammersmith and Fulham | LONDON | W3, W4 | 20 | TQ205805 |
| 2 | Addington | Croydon | CROYDON | CR0 | 20 | TQ375645 |
| 3 | Addiscombe | Croydon | CROYDON | CR0 | 20 | TQ345665 |
| 4 | Albany Park | Bexley | BEXLEY, SIDCUP | DA5, DA14 | 20 | TQ478728 |
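
The `rstrip` chain in cell 11 removes a trailing reference marker such as `[7]`, but it would also remove trailing digits from a name that has no marker at all. As a hedged alternative, a regular expression can target only bracketed numbers at the end of the string. The helper names and the sample strings below are illustrative assumptions, not values taken from the scraped table.

```python
import re

def strip_reference(text):
    """Remove one or more trailing markers like '[7]' and nothing else."""
    return re.sub(r"(?:\s*\[\d+\])+\s*$", "", text)

def strip_reference_rstrip(text):
    """The chained rstrip approach, reproduced for comparison."""
    return text.rstrip("]").rstrip("0123456789").rstrip("[")

# Hypothetical inputs covering the common cases
cleaned = [strip_reference(s) for s in ["Bexley, Greenwich[7]", "Camden[1][2]", "Croydon"]]
print(cleaned)

# A made-up name ending in a digit shows where the two approaches differ
print(repr(strip_reference("Zone 1")))
print(repr(strip_reference_rstrip("Zone 1")))
```

On the borough column the two approaches should agree whenever names do not end in digits; the regular expression is simply the more defensive choice.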


In [12]:
df0 = df.drop(df.columns[3],axis='columns').join(df.iloc[:,3].str.split(",", expand=True).stack().reset_index(level=1, drop=True).rename("Postcode"))
df0.columns = ['Location', 'Borough', 'Post-town','Dial-code','OSgridref','Postcode']
df0.head()

| | Location | Borough | Post-town | Dial-code | OSgridref | Postcode |
|---|---|---|---|---|---|---|
| 0 | Abbey Wood | Bexley, Greenwich | LONDON | 20 | TQ465785 | SE2 |
| 1 | Acton | Ealing, Hammersmith and Fulham | LONDON | 20 | TQ205805 | W3 |
| 1 | Acton | Ealing, Hammersmith and Fulham | LONDON | 20 | TQ205805 | W4 |
| 2 | Addington | Croydon | CROYDON | 20 | TQ375645 | CR0 |
| 3 | Addiscombe | Croydon | CROYDON | 20 | TQ345665 | CR0 |


In [13]:
df1 = df0[['Location', 'Borough', 'Postcode', 'Post-town']].reset_index(drop=True)

In [14]:
df2 = df1 # assigns df1 to df2
df21 = df2[df2['Post-town'].str.contains('LONDON')]

In [15]:
df3 = df21[["Location", "Borough", "Postcode"]].reset_index(drop=True)
df3

| | Location | Borough | Postcode |
|---|---|---|---|
| 0 | Abbey Wood | Bexley, Greenwich | SE2 |
| 1 | Acton | Ealing, Hammersmith and Fulham | W3 |
| 2 | Acton | Ealing, Hammersmith and Fulham | W4 |
| 3 | Aldgate | City | EC3 |
| 4 | Aldwych | Westminster | WC2 |
| 5 | Anerley | Bromley | SE20 |
| 6 | Angel | Islington | EC1 |
| 7 | Angel | Islington | N1 |
| 8 | Archway | Islington | N19 |
| 9 | Arkley | Barnet | EN5 |


Let's keep only the postcodes that belong to our top 4 boroughs.

In [16]:

df_lux = df3.loc[df3['Borough'].isin(['Westminster','City','Lambeth','Camden', 'City, Westminster','Islington & City'])].reset_index(drop=True)
df_lux.drop('Location', axis=1, inplace=True)
df_lux

| | Borough | Postcode |
|---|---|---|
| 0 | City | EC3 |
| 1 | Westminster | WC2 |
| 2 | City | EC1 |
| 3 | Westminster | W2 |
| 4 | Westminster | SW1 |
| 5 | Camden | NW3 |
| 6 | City | EC4 |
| 7 | Camden | WC1 |
| 8 | Lambeth | SW2 |
| 9 | Lambeth | SW9 |


Let's remove duplicated postcodes.

In [17]:
df_lux.drop_duplicates('Postcode',inplace=True)
df_lux.reset_index(drop=True, inplace=True)
df_lux

| | Borough | Postcode |
|---|---|---|
| 0 | City | EC3 |
| 1 | Westminster | WC2 |
| 2 | City | EC1 |
| 3 | Westminster | W2 |
| 4 | Westminster | SW1 |
| 5 | Camden | NW3 |
| 6 | City | EC4 |
| 7 | Camden | WC1 |
| 8 | Lambeth | SW2 |
| 9 | Lambeth | SW9 |


In [18]:
df_lux["Borough"].unique()

array(['City', 'Westminster', 'Camden', 'Lambeth'], dtype=object)

In [19]:
df_lux.Borough.replace(['City'],['City of London'], inplace=True)
df_lux["Borough"].unique()

array(['City of London', 'Westminster', 'Camden', 'Lambeth'], dtype=object)

### Geocoder

To obtain the location data, the Geocoder package is used with its ArcGIS provider to look up the latitude and longitude of each postcode.
These coordinates are then added to a new dataframe that is used in the following steps for the top 4 boroughs.

In [20]:
# Geocoder starts here
# Define a helper function: get_latlng()
def get_latlng(arcgis_geocoder):
    
    # Initialize the location (lat. and long.) to None
    lat_lng_coords = None
    
    # Keep querying until the geocoder returns coordinates for the location
    while(lat_lng_coords is None):
        g = geocoder.arcgis('{}, London, United Kingdom'.format(arcgis_geocoder))
        lat_lng_coords = g.latlng
    return lat_lng_coords
# Geocoder ends here
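
The `while` loop above retries forever if the service never returns coordinates for a query. A minimal sketch of a bounded variant is shown here, assuming the lookup is passed in as a callable that returns `None` on failure. The names `get_latlng_bounded` and `FlakyGeocoder` are hypothetical, and the stand-in class only simulates a service so the example runs without network access.

```python
import time

def get_latlng_bounded(query, geocode, max_attempts=5, delay_seconds=0.0):
    """Call geocode(query) until it returns coordinates or attempts run out."""
    for _ in range(max_attempts):
        coords = geocode(query)
        if coords is not None:
            return coords
        time.sleep(delay_seconds)  # pause before the next attempt
    return None  # the caller decides how to handle a missing result

class FlakyGeocoder:
    """Hypothetical stand-in for a geocoding service that fails a few times."""
    def __init__(self, failures, answer):
        self.failures = failures
        self.answer = answer
        self.calls = 0

    def __call__(self, query):
        self.calls += 1
        if self.calls <= self.failures:
            return None
        return self.answer

# Succeeds on the third call
flaky = FlakyGeocoder(failures=2, answer=[51.512, -0.08058])
coords = get_latlng_bounded("EC3, London, United Kingdom", flaky)

# Never succeeds within the allowed attempts
never = FlakyGeocoder(failures=10, answer=[0.0, 0.0])
missing = get_latlng_bounded("EC3, London, United Kingdom", never, max_attempts=3)
print(coords, missing)
```

In the notebook, the callable could wrap `geocoder.arcgis(...).latlng`; that wiring is left out here because it needs a live service.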

In [21]:
postal_codes = df_lux['Postcode']    
coordinates = [get_latlng(postal_code) for postal_code in postal_codes.tolist()]
df_loc = df_lux
# The obtained coordinates (latitude and longitude) are joined with the dataframe as shown
df_loc_coordinates = pd.DataFrame(coordinates, columns = ['Latitude', 'Longitude'])
df_loc['Latitude'] = df_loc_coordinates['Latitude']
df_loc['Longitude'] = df_loc_coordinates['Longitude']
df_loc.head(5)

| | Borough | Postcode | Latitude | Longitude |
|---|---|---|---|---|
| 0 | City of London | EC3 | 51.512 | -0.08058 |
| 1 | Westminster | WC2 | 51.51651 | -0.11968 |
| 2 | City of London | EC1 | 51.52361 | -0.09877 |
| 3 | Westminster | W2 | 51.51494 | -0.18048 |
| 4 | Westminster | SW1 | 51.49713 | -0.13829 |


In [22]:
df_loc.shape

(30, 4)

### Foursquare
Now that we have our location candidates, let's use the Foursquare API to get information on venues in each neighborhood.

In [23]:
CLIENT_ID = 'YOUR_FOURSQUARE_CLIENT_ID' # your Foursquare ID (redacted)
CLIENT_SECRET = 'YOUR_FOURSQUARE_CLIENT_SECRET' # your Foursquare Secret (redacted)
VERSION = '20191105' # Foursquare API version
LIMIT = 100

print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET: ' + CLIENT_SECRET)

Your credentials:
CLIENT_ID: YOUR_FOURSQUARE_CLIENT_ID
CLIENT_SECRET: YOUR_FOURSQUARE_CLIENT_SECRET


In [27]:
def getNearbyVenues(names, latitudes, longitudes, radius=1000):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
    
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)
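
`getNearbyVenues` indexes straight into `["response"]['groups'][0]['items']` and `['categories'][0]`, so a single response without groups or a venue without a category raises an exception and stops the whole loop. The sketch below shows one defensive way to pull the same seven fields out of a response that has already been decoded to a dictionary. The function name `extract_venue_rows` and the second sample venue are hypothetical; the nesting of the payload follows the keys used in the function above.

```python
def extract_venue_rows(payload, name, lat, lng):
    """Turn one decoded 'explore' response into a list of row tuples."""
    groups = payload.get("response", {}).get("groups", [])
    items = groups[0].get("items", []) if groups else []
    rows = []
    for item in items:
        venue = item["venue"]
        categories = venue.get("categories") or []
        # Use None when a venue carries no category at all
        category = categories[0]["name"] if categories else None
        rows.append((
            name, lat, lng,
            venue["name"],
            venue["location"]["lat"],
            venue["location"]["lng"],
            category,
        ))
    return rows

# Sample payload: the first venue is from the output shown further down,
# the second one is made up to exercise the missing-category branch
sample_payload = {
    "response": {"groups": [{"items": [
        {"venue": {"name": "The Association",
                   "location": {"lat": 51.513733, "lng": -0.079132},
                   "categories": [{"name": "Coffee Shop"}]}},
        {"venue": {"name": "Uncategorised Place",
                   "location": {"lat": 51.51, "lng": -0.08},
                   "categories": []}},
    ]}]}
}

venue_rows = extract_venue_rows(sample_payload, "City of London", 51.512, -0.08058)
empty_rows = extract_venue_rows({}, "City of London", 51.512, -0.08058)
print(len(venue_rows), len(empty_rows))
```

Rows built this way can be passed to `pd.DataFrame` with the same seven column names used in the notebook.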

In [28]:
london_venues = getNearbyVenues(names=df_loc['Borough'],
                                   latitudes=df_loc['Latitude'],
                                   longitudes=df_loc['Longitude']
                                  )

City of London
Westminster
City of London
Westminster
Westminster
Camden
City of London
Camden
Lambeth
Lambeth
Lambeth
Camden
Westminster
Lambeth
Lambeth
Camden
Camden
Lambeth
Camden
Camden
Lambeth
Westminster
Westminster
Westminster
Lambeth
Lambeth
Camden
Lambeth
Camden
Lambeth


In [29]:
print(london_venues.shape)
london_venues.head()

(2496, 7)


| | Neighborhood | Neighborhood Latitude | Neighborhood Longitude | Venue | Venue Latitude | Venue Longitude | Venue Category |
|---|---|---|---|---|---|---|---|
| 0 | City of London | 51.512 | -0.08058 | The Association | 51.513733 | -0.079132 | Coffee Shop |
| 1 | City of London | 51.512 | -0.08058 | The Garden at 120 | 51.512101 | -0.080799 | Garden |
| 2 | City of London | 51.512 | -0.08058 | Sky Garden | 51.511168 | -0.083625 | Scenic Lookout |
| 3 | City of London | 51.512 | -0.08058 | Curators Coffee Studio | 51.512085 | -0.082568 | Coffee Shop |
| 4 | City of London | 51.512 | -0.08058 | BrewDog Tower Hill | 51.509948 | -0.080977 | Beer Bar |


In [30]:
london_venues.groupby('Neighborhood').count()

| Neighborhood | Neighborhood Latitude | Neighborhood Longitude | Venue | Venue Latitude | Venue Longitude | Venue Category |
|---|---|---|---|---|---|---|
| Camden | 833 | 833 | 833 | 833 | 833 | 833 |
| City of London | 300 | 300 | 300 | 300 | 300 | 300 |
| Lambeth | 726 | 726 | 726 | 726 | 726 | 726 |
| Westminster | 637 | 637 | 637 | 637 | 637 | 637 |


In [31]:
print('There are {} unique categories.'.format(len(london_venues['Venue Category'].unique())))

There are 263 unique categories.


## Methodology <a name="methodology"></a>

In this project we will direct our efforts towards detecting areas of London that have few popular bakeries. We have limited our analysis to an area of about 1 km around each neighborhood's center.

In the first step we collected the required **data: location and type (category) of every venue within 1 km of each neighborhood's center**.

In the second and final step we will focus on the most promising areas and, within those, create **clusters of locations that meet some basic requirements** established in discussion with stakeholders. We will present a map of all such locations and also cluster them (using **k-means clustering**) to identify general zones / neighborhoods that should be the starting point for the final 'street level' exploration and search for the optimal venue location by stakeholders.
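
To make the clustering step concrete, here is a minimal plain-Python sketch of the k-means iteration (Lloyd's algorithm) on 2-D points. It is an illustration of the technique only: the notebook imports `KMeans` from scikit-learn, and the function name `kmeans`, the toy points and the starting centroids below are assumptions chosen so the result is easy to verify by hand.

```python
def kmeans(points, centroids, iterations=10):
    """Lloyd's algorithm on 2-D points, starting from the given centroids."""
    centroids = [tuple(c) for c in centroids]
    labels = []
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid
        labels = [
            min(range(len(centroids)),
                key=lambda k: (p[0] - centroids[k][0]) ** 2
                              + (p[1] - centroids[k][1]) ** 2)
            for p in points
        ]
        # Update step: move each centroid to the mean of its members
        new_centroids = []
        for k in range(len(centroids)):
            members = [p for p, label in zip(points, labels) if label == k]
            if members:
                new_centroids.append((
                    sum(m[0] for m in members) / len(members),
                    sum(m[1] for m in members) / len(members),
                ))
            else:
                new_centroids.append(centroids[k])  # keep an empty cluster in place
        if new_centroids == centroids:
            break  # converged: no centroid moved
        centroids = new_centroids
    return labels, centroids

# Two well-separated toy groups and one starting centroid in each
toy_points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
labels, centers = kmeans(toy_points, centroids=[(0, 0), (10, 10)])
print(labels)
print(centers)
```

With real data the starting centroids are usually chosen at random or with a seeding heuristic, which is one of the details a library implementation handles.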

## Analysis <a name="analysis"></a>

Let us now **cluster** those locations to create **centers of zones containing good locations**. Those zones, their centers and addresses will be the final result of our analysis. 

### Clustering

In [32]:
# one hot encoding
london_onehot = pd.get_dummies(london_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
london_onehot['Neighborhood'] = london_venues['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [london_onehot.columns[-1]] + list(london_onehot.columns[:-1])
london_onehot = london_onehot[fixed_columns]

london_onehot.head()

Unnamed: 0,Zoo Exhibit,African Restaurant,American Restaurant,Antique Shop,Arcade,Argentinian Restaurant,Art Gallery,Art Museum,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,Australian Restaurant,Auto Garage,BBQ Joint,Bagel Shop,Bakery,Bar,Beach,Bed & Breakfast,Beer Bar,Beer Store,Bike Rental / Bike Share,Bike Shop,Bistro,Boat or Ferry,Bookstore,Botanical Garden,Boutique,Boxing Gym,Brasserie,Brazilian Restaurant,Breakfast Spot,Brewery,Building,Burger Joint,Burrito Place,Bus Station,Bus Stop,Business Service,Butcher,Café,Camera Store,Canal,Cantonese Restaurant,Caribbean Restaurant,Castle,Caucasian Restaurant,Champagne Bar,Cheese Shop,Chinese Restaurant,Chocolate Shop,Church,Climbing Gym,Clothing Store,Club House,Cocktail Bar,Coffee Shop,College Cafeteria,Colombian Restaurant,Comedy Club,Comic Shop,Community Center,Concert Hall,Convenience Store,Cosmetics Shop,Coworking Space,Creperie,Cricket Ground,Cupcake Shop,Cycle Studio,Czech Restaurant,Dance Studio,Deli / Bodega,Department Store,Dessert Shop,Dim Sum Restaurant,Diner,Discount Store,Distillery,Doner Restaurant,Donut Shop,Dumpling Restaurant,Eastern European Restaurant,Electronics Store,Empanada Restaurant,English Restaurant,Ethiopian Restaurant,Event Space,Exhibit,Falafel Restaurant,Farm,Farmers Market,Fast Food Restaurant,Fish & Chips Shop,Fish Market,Flea Market,Flower Shop,Food & Drink Shop,Food Court,Food Stand,Food Truck,Fountain,French Restaurant,Fried Chicken Joint,Fruit & Vegetable Store,Garden,Garden Center,Gastropub,Gay Bar,General Entertainment,Gift Shop,Gourmet Shop,Greek Restaurant,Grocery Store,Gym,Gym / Fitness Center,Gym Pool,Harbor / Marina,Hardware Store,Hill,Himalayan Restaurant,Historic Site,History Museum,Hookah Bar,Hostel,Hotel,Hotel Bar,Ice Cream Shop,Indian Restaurant,Indie Movie Theater,Irish Pub,Israeli Restaurant,Italian Restaurant,Japanese Restaurant,Jazz Club,Jewelry Store,Juice Bar,Karaoke Bar,Kebab Restaurant,Kids Store,Korean Restaurant,Lake,Latin American 
Restaurant,Lebanese Restaurant,Library,Lingerie Store,Liquor Store,Lounge,Malay Restaurant,Market,Martial Arts Dojo,Mediterranean Restaurant,Men's Store,Metro Station,Mexican Restaurant,Middle Eastern Restaurant,Mini Golf,Mobile Phone Shop,Modern European Restaurant,Monument / Landmark,Moroccan Restaurant,Motorcycle Shop,Movie Theater,Multiplex,Museum,Music Store,Music Venue,Nail Salon,Nature Preserve,Neighborhood,New American Restaurant,Nightclub,Office,Okonomiyaki Restaurant,Opera House,Organic Grocery,Outdoor Sculpture,Outdoors & Recreation,Pakistani Restaurant,Palace,Park,Pedestrian Plaza,Performing Arts Venue,Persian Restaurant,Peruvian Restaurant,Pet Store,Pharmacy,Pilates Studio,Pizza Place,Platform,Playground,Plaza,Pool,Portuguese Restaurant,Pub,Ramen Restaurant,Record Shop,Recording Studio,Recreation Center,Rental Car Location,Residential Building (Apartment / Condo),Restaurant,Rock Club,Roof Deck,Sake Bar,Salad Place,Sandwich Place,Scandinavian Restaurant,Scenic Lookout,Science Museum,Scottish Restaurant,Seafood Restaurant,Shoe Store,Shopping Mall,Shopping Plaza,Skate Park,Skating Rink,Snack Place,Social Club,South American Restaurant,Souvenir Shop,Souvlaki Shop,Spa,Spanish Restaurant,Sporting Goods Shop,Sports Bar,Stables,Stadium,Steakhouse,Street Art,Street Food Gathering,Supermarket,Sushi Restaurant,Szechuan Restaurant,Taco Place,Tapas Restaurant,Tea Room,Tennis Court,Thai Restaurant,Theater,Theme Park Ride / Attraction,Theme Restaurant,Toy / Game Store,Track,Trail,Train Station,Turkish Restaurant,Used Bookstore,Vape Store,Vegetarian / Vegan Restaurant,Video Game Store,Vietnamese Restaurant,Warehouse Store,Whisky Bar,Windmill,Wine Bar,Wine Shop,Winery,Wings Joint,Women's Store,Xinjiang Restaurant,Yoga Studio,Zoo
0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,City of London,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,City of London,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,City of London,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,City of London,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,City of London,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0


In [33]:
london_grouped = london_onehot.groupby('Neighborhood').mean().reset_index()
london_grouped

Unnamed: 0,Neighborhood,Zoo Exhibit,African Restaurant,American Restaurant,Antique Shop,Arcade,Argentinian Restaurant,Art Gallery,Art Museum,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,Australian Restaurant,Auto Garage,BBQ Joint,Bagel Shop,Bakery,Bar,Beach,Bed & Breakfast,Beer Bar,Beer Store,Bike Rental / Bike Share,Bike Shop,Bistro,Boat or Ferry,Bookstore,Botanical Garden,Boutique,Boxing Gym,Brasserie,Brazilian Restaurant,Breakfast Spot,Brewery,Building,Burger Joint,Burrito Place,Bus Station,Bus Stop,Business Service,Butcher,Café,Camera Store,Canal,Cantonese Restaurant,Caribbean Restaurant,Castle,Caucasian Restaurant,Champagne Bar,Cheese Shop,Chinese Restaurant,Chocolate Shop,Church,Climbing Gym,Clothing Store,Club House,Cocktail Bar,Coffee Shop,College Cafeteria,Colombian Restaurant,Comedy Club,Comic Shop,Community Center,Concert Hall,Convenience Store,Cosmetics Shop,Coworking Space,Creperie,Cricket Ground,Cupcake Shop,Cycle Studio,Czech Restaurant,Dance Studio,Deli / Bodega,Department Store,Dessert Shop,Dim Sum Restaurant,Diner,Discount Store,Distillery,Doner Restaurant,Donut Shop,Dumpling Restaurant,Eastern European Restaurant,Electronics Store,Empanada Restaurant,English Restaurant,Ethiopian Restaurant,Event Space,Exhibit,Falafel Restaurant,Farm,Farmers Market,Fast Food Restaurant,Fish & Chips Shop,Fish Market,Flea Market,Flower Shop,Food & Drink Shop,Food Court,Food Stand,Food Truck,Fountain,French Restaurant,Fried Chicken Joint,Fruit & Vegetable Store,Garden,Garden Center,Gastropub,Gay Bar,General Entertainment,Gift Shop,Gourmet Shop,Greek Restaurant,Grocery Store,Gym,Gym / Fitness Center,Gym Pool,Harbor / Marina,Hardware Store,Hill,Himalayan Restaurant,Historic Site,History Museum,Hookah Bar,Hostel,Hotel,Hotel Bar,Ice Cream Shop,Indian Restaurant,Indie Movie Theater,Irish Pub,Israeli Restaurant,Italian Restaurant,Japanese Restaurant,Jazz Club,Jewelry Store,Juice Bar,Karaoke Bar,Kebab Restaurant,Kids Store,Korean Restaurant,Lake,Latin American 
Restaurant,Lebanese Restaurant,Library,Lingerie Store,Liquor Store,Lounge,Malay Restaurant,Market,Martial Arts Dojo,Mediterranean Restaurant,Men's Store,Metro Station,Mexican Restaurant,Middle Eastern Restaurant,Mini Golf,Mobile Phone Shop,Modern European Restaurant,Monument / Landmark,Moroccan Restaurant,Motorcycle Shop,Movie Theater,Multiplex,Museum,Music Store,Music Venue,Nail Salon,Nature Preserve,New American Restaurant,Nightclub,Office,Okonomiyaki Restaurant,Opera House,Organic Grocery,Outdoor Sculpture,Outdoors & Recreation,Pakistani Restaurant,Palace,Park,Pedestrian Plaza,Performing Arts Venue,Persian Restaurant,Peruvian Restaurant,Pet Store,Pharmacy,Pilates Studio,Pizza Place,Platform,Playground,Plaza,Pool,Portuguese Restaurant,Pub,Ramen Restaurant,Record Shop,Recording Studio,Recreation Center,Rental Car Location,Residential Building (Apartment / Condo),Restaurant,Rock Club,Roof Deck,Sake Bar,Salad Place,Sandwich Place,Scandinavian Restaurant,Scenic Lookout,Science Museum,Scottish Restaurant,Seafood Restaurant,Shoe Store,Shopping Mall,Shopping Plaza,Skate Park,Skating Rink,Snack Place,Social Club,South American Restaurant,Souvenir Shop,Souvlaki Shop,Spa,Spanish Restaurant,Sporting Goods Shop,Sports Bar,Stables,Stadium,Steakhouse,Street Art,Street Food Gathering,Supermarket,Sushi Restaurant,Szechuan Restaurant,Taco Place,Tapas Restaurant,Tea Room,Tennis Court,Thai Restaurant,Theater,Theme Park Ride / Attraction,Theme Restaurant,Toy / Game Store,Track,Trail,Train Station,Turkish Restaurant,Used Bookstore,Vape Store,Vegetarian / Vegan Restaurant,Video Game Store,Vietnamese Restaurant,Warehouse Store,Whisky Bar,Windmill,Wine Bar,Wine Shop,Winery,Wings Joint,Women's Store,Xinjiang Restaurant,Yoga Studio,Zoo
0,Camden,0.009604,0.0,0.004802,0.0,0.0012,0.003601,0.006002,0.002401,0.006002,0.002401,0.002401,0.0,0.0,0.0,0.002401,0.040816,0.012005,0.0,0.002401,0.003601,0.0012,0.0012,0.0,0.003601,0.0012,0.019208,0.0,0.002401,0.0,0.0012,0.002401,0.002401,0.0,0.0,0.010804,0.003601,0.0,0.006002,0.0,0.0,0.060024,0.0,0.002401,0.0,0.0,0.0,0.0012,0.0,0.002401,0.002401,0.0012,0.0,0.0,0.009604,0.0,0.006002,0.060024,0.0,0.0,0.0012,0.0012,0.002401,0.0,0.006002,0.0012,0.0,0.002401,0.004802,0.0012,0.0,0.0012,0.003601,0.014406,0.0,0.003601,0.002401,0.003601,0.002401,0.0,0.0012,0.0,0.0,0.0,0.0012,0.0,0.002401,0.002401,0.0012,0.008403,0.0012,0.0012,0.006002,0.004802,0.006002,0.0012,0.002401,0.0012,0.003601,0.0012,0.0,0.002401,0.0,0.014406,0.0,0.0012,0.010804,0.0012,0.012005,0.0,0.0,0.004802,0.0012,0.008403,0.021609,0.0012,0.018007,0.0,0.0,0.0,0.002401,0.0,0.003601,0.012005,0.002401,0.0012,0.022809,0.007203,0.010804,0.018007,0.0,0.0,0.0,0.033613,0.010804,0.002401,0.0,0.0012,0.0,0.002401,0.0012,0.002401,0.004802,0.0,0.002401,0.0012,0.0,0.0012,0.004802,0.002401,0.004802,0.0,0.003601,0.0,0.0012,0.003601,0.007203,0.0,0.0,0.004802,0.0,0.0012,0.0,0.007203,0.0012,0.008403,0.0012,0.009604,0.0,0.0,0.0012,0.0,0.002401,0.002401,0.0012,0.0,0.0012,0.002401,0.0,0.0,0.010804,0.002401,0.0012,0.0012,0.0,0.0012,0.006002,0.0012,0.028812,0.0,0.003601,0.004802,0.002401,0.006002,0.07563,0.0012,0.002401,0.0012,0.0,0.0,0.0,0.007203,0.0012,0.0,0.0,0.0012,0.008403,0.0,0.0,0.002401,0.0,0.003601,0.004802,0.0012,0.0012,0.0,0.0012,0.0012,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.004802,0.0,0.0012,0.004802,0.002401,0.0,0.0,0.009604,0.006002,0.0012,0.016807,0.010804,0.0012,0.0012,0.0,0.0012,0.002401,0.003601,0.004802,0.0012,0.002401,0.006002,0.0,0.004802,0.0,0.0,0.0,0.003601,0.006002,0.0,0.004802,0.0,0.0,0.002401,0.0012
1,City of London,0.0,0.0,0.003333,0.0,0.0,0.01,0.013333,0.01,0.0,0.003333,0.0,0.0,0.0,0.003333,0.003333,0.006667,0.013333,0.003333,0.0,0.016667,0.0,0.0,0.003333,0.0,0.003333,0.0,0.003333,0.0,0.006667,0.0,0.0,0.006667,0.0,0.003333,0.006667,0.003333,0.0,0.0,0.0,0.003333,0.023333,0.0,0.0,0.0,0.0,0.006667,0.0,0.003333,0.0,0.0,0.0,0.006667,0.0,0.0,0.0,0.03,0.103333,0.0,0.003333,0.0,0.0,0.0,0.006667,0.0,0.0,0.003333,0.0,0.0,0.0,0.0,0.0,0.003333,0.0,0.0,0.003333,0.0,0.0,0.0,0.003333,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.006667,0.003333,0.016667,0.0,0.0,0.003333,0.0,0.0,0.006667,0.0,0.0,0.0,0.003333,0.016667,0.0,0.016667,0.0,0.0,0.02,0.0,0.006667,0.0,0.003333,0.0,0.0,0.0,0.006667,0.0,0.05,0.003333,0.0,0.0,0.0,0.0,0.01,0.02,0.0,0.0,0.07,0.006667,0.0,0.006667,0.006667,0.0,0.0,0.026667,0.01,0.0,0.0,0.003333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.006667,0.0,0.006667,0.0,0.003333,0.0,0.0,0.0,0.003333,0.003333,0.0,0.01,0.003333,0.003333,0.0,0.0,0.0,0.003333,0.0,0.0,0.0,0.0,0.0,0.003333,0.003333,0.0,0.0,0.003333,0.0,0.0,0.0,0.0,0.016667,0.0,0.003333,0.0,0.0,0.0,0.0,0.0,0.013333,0.0,0.0,0.006667,0.0,0.003333,0.036667,0.003333,0.0,0.0,0.0,0.0,0.0,0.013333,0.0,0.006667,0.0,0.013333,0.006667,0.0,0.026667,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.003333,0.003333,0.003333,0.003333,0.006667,0.0,0.003333,0.0,0.0,0.006667,0.0,0.006667,0.003333,0.01,0.003333,0.003333,0.003333,0.003333,0.0,0.0,0.013333,0.0,0.0,0.0,0.0,0.003333,0.0,0.006667,0.0,0.0,0.0,0.0,0.016667,0.0,0.006667,0.0,0.016667,0.003333,0.0,0.0,0.0,0.0,0.0,0.0
2,Lambeth,0.0,0.004132,0.001377,0.002755,0.0,0.002755,0.006887,0.001377,0.002755,0.004132,0.004132,0.0,0.001377,0.004132,0.0,0.020661,0.012397,0.0,0.0,0.009642,0.001377,0.0,0.004132,0.001377,0.0,0.004132,0.0,0.0,0.001377,0.0,0.002755,0.012397,0.009642,0.0,0.006887,0.0,0.004132,0.013774,0.0,0.0,0.063361,0.0,0.0,0.002755,0.008264,0.0,0.0,0.001377,0.001377,0.004132,0.0,0.0,0.001377,0.001377,0.0,0.009642,0.068871,0.001377,0.0,0.0,0.0,0.0,0.0,0.008264,0.001377,0.0,0.002755,0.002755,0.002755,0.0,0.0,0.002755,0.00551,0.001377,0.0,0.001377,0.004132,0.0,0.0,0.0,0.0,0.001377,0.001377,0.0,0.001377,0.002755,0.002755,0.0,0.0,0.001377,0.001377,0.006887,0.001377,0.00551,0.0,0.001377,0.004132,0.002755,0.002755,0.001377,0.001377,0.0,0.004132,0.001377,0.001377,0.009642,0.00551,0.020661,0.006887,0.0,0.002755,0.0,0.001377,0.041322,0.006887,0.024793,0.0,0.0,0.001377,0.0,0.001377,0.0,0.004132,0.0,0.001377,0.023416,0.0,0.004132,0.023416,0.004132,0.001377,0.001377,0.023416,0.002755,0.006887,0.001377,0.0,0.001377,0.001377,0.0,0.0,0.002755,0.002755,0.0,0.0,0.0,0.0,0.0,0.0,0.006887,0.001377,0.002755,0.0,0.0,0.001377,0.002755,0.0,0.0,0.004132,0.0,0.001377,0.001377,0.001377,0.0,0.001377,0.0,0.009642,0.0,0.001377,0.001377,0.00551,0.0,0.001377,0.0,0.0,0.001377,0.0,0.001377,0.001377,0.034435,0.0,0.00551,0.0,0.002755,0.001377,0.0,0.0,0.022039,0.006887,0.001377,0.00551,0.001377,0.020661,0.095041,0.001377,0.002755,0.0,0.002755,0.002755,0.001377,0.012397,0.0,0.0,0.0,0.001377,0.011019,0.0,0.002755,0.0,0.0,0.004132,0.001377,0.0,0.0,0.001377,0.001377,0.0,0.002755,0.001377,0.0,0.0,0.0,0.0,0.00551,0.0,0.0,0.001377,0.0,0.001377,0.002755,0.012397,0.001377,0.001377,0.0,0.006887,0.001377,0.0,0.008264,0.006887,0.0,0.0,0.001377,0.0,0.001377,0.008264,0.004132,0.0,0.0,0.00551,0.001377,0.00551,0.001377,0.0,0.001377,0.004132,0.0,0.001377,0.0,0.0,0.001377,0.004132,0.0
3,Westminster,0.0,0.0,0.0,0.0,0.0,0.00157,0.010989,0.00157,0.00314,0.0,0.00471,0.00157,0.0,0.0,0.0,0.021978,0.006279,0.0,0.0,0.009419,0.0,0.0,0.0,0.00157,0.00157,0.007849,0.0,0.00314,0.0,0.00157,0.00314,0.007849,0.0,0.0,0.012559,0.0,0.0,0.00157,0.00314,0.0,0.058085,0.00157,0.00471,0.0,0.0,0.0,0.00157,0.0,0.00471,0.017268,0.00471,0.0,0.0,0.009419,0.00157,0.007849,0.043956,0.0,0.0,0.00157,0.00157,0.0,0.00157,0.00157,0.006279,0.0,0.0,0.006279,0.00314,0.00157,0.0,0.00471,0.015699,0.00314,0.007849,0.0,0.00314,0.0,0.0,0.0,0.00157,0.0,0.0,0.00314,0.0,0.006279,0.00157,0.00157,0.00314,0.00471,0.0,0.00314,0.00471,0.00314,0.0,0.0,0.00157,0.0,0.00314,0.0,0.0,0.00314,0.025118,0.0,0.0,0.036107,0.00471,0.00314,0.0,0.00157,0.006279,0.00471,0.012559,0.009419,0.006279,0.018838,0.0,0.00314,0.0,0.0,0.0,0.006279,0.00471,0.00157,0.0,0.062794,0.014129,0.010989,0.018838,0.0,0.0,0.0,0.017268,0.00471,0.0,0.0,0.012559,0.0,0.00314,0.00157,0.00314,0.0,0.0,0.00471,0.0,0.00157,0.00157,0.00314,0.00471,0.00157,0.0,0.00314,0.00157,0.00157,0.0,0.006279,0.0,0.00314,0.006279,0.0,0.00157,0.0,0.006279,0.00157,0.0,0.0,0.0,0.00157,0.0,0.0,0.0,0.00157,0.00157,0.00157,0.00314,0.006279,0.0,0.0,0.00157,0.010989,0.00471,0.00157,0.009419,0.0,0.0,0.006279,0.00157,0.012559,0.0,0.00471,0.006279,0.0,0.00314,0.048666,0.00157,0.0,0.00157,0.0,0.0,0.0,0.017268,0.0,0.0,0.00157,0.00471,0.014129,0.00314,0.0,0.0,0.00157,0.009419,0.00314,0.0,0.00157,0.00157,0.00157,0.0,0.0,0.0,0.00314,0.0,0.0,0.0,0.00471,0.0,0.00314,0.0,0.007849,0.0,0.00157,0.00471,0.010989,0.0,0.0,0.00471,0.006279,0.0,0.009419,0.018838,0.0,0.0,0.00157,0.00157,0.0,0.00314,0.00471,0.0,0.0,0.00314,0.0,0.0,0.0,0.0,0.0,0.006279,0.006279,0.0,0.00157,0.00157,0.0,0.00157,0.0


In [34]:
london_grouped.shape

(4, 263)

In [35]:
# set number of clusters
kclusters = 2

# drop the label column so that only the venue frequencies are clustered
london_grouped_clustering = london_grouped.drop(columns='Neighborhood')

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(london_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 

array([0, 1, 0, 0], dtype=int32)
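
To make the clustering step above more concrete, here is a minimal pure-Python sketch of Lloyd's algorithm, the iteration behind k-means. It is not scikit-learn's implementation (which also handles initialisation and restarts), and the two-feature frequency vectors and starting centroids below are made-up illustrative values, not rows of `london_grouped`.

```python
import math

def kmeans_sketch(points, start_centroids, max_iter=100):
    """Plain Lloyd's algorithm; start_centroids are chosen by the caller."""
    centroids = [list(c) for c in start_centroids]
    labels = None
    for _ in range(max_iter):
        # assignment step: each point goes to its nearest centroid
        new_labels = [
            min(range(len(centroids)), key=lambda k: math.dist(p, centroids[k]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so the algorithm has converged
        labels = new_labels
        # update step: move each centroid to the mean of its members
        for k in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                centroids[k] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids

# hypothetical venue-frequency vectors for four neighborhoods (two features each)
toy_points = [(0.06, 0.04), (0.10, 0.00), (0.07, 0.03), (0.04, 0.05)]
toy_labels, toy_centroids = kmeans_sketch(toy_points, toy_points[:2])
print(toy_labels)
```

With only four rows, as in this notebook, the grouping can depend heavily on the starting centroids, so the resulting clusters are best read as a rough indication rather than a robust segmentation.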

In [36]:
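def return_most_common_venues(row, num_top_venues):
    # skip the first entry (the 'Neighborhood' label) and keep only the venue frequencies
    row_categories = row.iloc[1:]
    # sort venue categories from most to least frequent
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]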

In [37]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = london_grouped['Neighborhood']

for ind in np.arange(london_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(london_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted.head()

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Camden,Pub,Café,Coffee Shop,Bakery,Italian Restaurant,Pizza Place,Hotel,Grocery Store,Bookstore,Indian Restaurant
1,City of London,Coffee Shop,Hotel,Gym / Fitness Center,Pub,Cocktail Bar,Scenic Lookout,Italian Restaurant,Café,History Museum,Garden
2,Lambeth,Pub,Coffee Shop,Café,Grocery Store,Park,Gym / Fitness Center,Italian Restaurant,Hotel,Indian Restaurant,Pizza Place
3,Westminster,Hotel,Café,Pub,Coffee Shop,Garden,French Restaurant,Bakery,Indian Restaurant,Theater,Gym / Fitness Center
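
The column labels above are built from a three-element suffix list with a fallback to `th`, which is enough for ten venues but would produce labels such as `21th` for longer rankings. A small helper along the following lines is one possible alternative; the name `ordinal` is hypothetical and not part of the notebook.

```python
def ordinal(n):
    """Return n with its English ordinal suffix, e.g. 1 -> '1st', 11 -> '11th'."""
    if 10 <= n % 100 <= 20:
        suffix = 'th'  # 11th, 12th, 13th and the rest of the teens
    else:
        suffix = {1: 'st', 2: 'nd', 3: 'rd'}.get(n % 10, 'th')
    return f'{n}{suffix}'

num_top_venues = 10
columns = ['Neighborhood'] + [
    f'{ordinal(i)} Most Common Venue' for i in range(1, num_top_venues + 1)
]
print(columns)
```

This yields the same eleven column names as the loop in the cell above, without relying on an exception for control flow.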


In [38]:
# add clustering labels
neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

df_loc.columns = ['Neighborhood','Postcode', 'Latitude', 'Longitude']

london_merged = df_loc

# join the sorted venues (with cluster labels) onto the location data to add latitude/longitude for each neighborhood
london_merged = london_merged.join(neighborhoods_venues_sorted.set_index('Neighborhood'), on='Neighborhood')

london_merged.head() 

Unnamed: 0,Neighborhood,Postcode,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,City of London,EC3,51.512,-0.08058,1,Coffee Shop,Hotel,Gym / Fitness Center,Pub,Cocktail Bar,Scenic Lookout,Italian Restaurant,Café,History Museum,Garden
1,Westminster,WC2,51.51651,-0.11968,0,Hotel,Café,Pub,Coffee Shop,Garden,French Restaurant,Bakery,Indian Restaurant,Theater,Gym / Fitness Center
2,City of London,EC1,51.52361,-0.09877,1,Coffee Shop,Hotel,Gym / Fitness Center,Pub,Cocktail Bar,Scenic Lookout,Italian Restaurant,Café,History Museum,Garden
3,Westminster,W2,51.51494,-0.18048,0,Hotel,Café,Pub,Coffee Shop,Garden,French Restaurant,Bakery,Indian Restaurant,Theater,Gym / Fitness Center
4,Westminster,SW1,51.49713,-0.13829,0,Hotel,Café,Pub,Coffee Shop,Garden,French Restaurant,Bakery,Indian Restaurant,Theater,Gym / Fitness Center


In [40]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=10)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(london_merged['Latitude'], london_merged['Longitude'], london_merged['Neighborhood'], london_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

In [41]:
# take an explicit copy so that adding a column does not trigger SettingWithCopyWarning
london_clusters = london_merged[['Neighborhood', 'Cluster Labels']].copy()
london_clusters["value"] = 1
pivot = pd.pivot_table(london_clusters, values="value", index=["Neighborhood"], columns="Cluster Labels", fill_value=0) 
pivot


Cluster Labels,0,1
Neighborhood,Unnamed: 1_level_1,Unnamed: 2_level_1
Camden,1,0
City of London,0,1
Lambeth,1,0
Westminster,1,0


In [143]:
london_merged.loc[london_merged['Cluster Labels'] == 0, london_merged.columns[[0] + list(range(5, london_merged.shape[1]))]].drop_duplicates()

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
1,Westminster,Hotel,Café,Pub,Coffee Shop,Garden,French Restaurant,Bakery,Theater,Restaurant,Indian Restaurant
5,Camden,Pub,Coffee Shop,Café,Bakery,Italian Restaurant,Pizza Place,Hotel,Grocery Store,Bookstore,Gym / Fitness Center
8,Lambeth,Pub,Coffee Shop,Café,Grocery Store,Park,Gym / Fitness Center,Italian Restaurant,Hotel,Pizza Place,Bakery


In [144]:
london_merged.loc[london_merged['Cluster Labels'] == 1, london_merged.columns[[0] + list(range(5, london_merged.shape[1]))]].drop_duplicates()

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,City of London,Coffee Shop,Hotel,Gym / Fitness Center,Pub,Cocktail Bar,Italian Restaurant,Scenic Lookout,Café,History Museum,Garden


## Results and Discussion <a name="results"></a>

Our analysis focuses on four neighborhoods (Camden, City of London, Lambeth and Westminster) that appear best suited to the luxury bakery segment. They were selected on the assumption that rent prices are positively correlated with the wealth of the people living in those neighborhoods.

After narrowing our attention to this smaller area of interest, we first explored the 10 most common venue categories in each neighborhood.

Those location candidates were then clustered to create zones of interest. Addresses of the centers of those zones were also generated using reverse geocoding, to be used as markers/starting points for more detailed local analysis based on other factors.

The result is two clusters: Cluster 1 (k-means label 0: Westminster, Camden and Lambeth) and Cluster 2 (k-means label 1: City of London). After analyzing them, we concluded that Cluster 1 is less attractive for stakeholders than Cluster 2: every neighborhood in Cluster 1 has Bakery among its 10 most common venue categories, whereas the City of London has none. The City of London should therefore be the starting point for a more detailed analysis, which could eventually identify a location with no nearby competition.
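
The comparison above can be checked directly against the top-10 lists. The sketch below copies the venue lists from the two per-cluster output tables and the 0/1 labels from the k-means step into plain dictionaries (the names `top_venues`, `cluster_labels` and `bakery_rank` are ours, introduced for illustration) and reports where, if anywhere, Bakery ranks in each neighborhood.

```python
# top-10 venue categories, copied from the per-cluster tables above
top_venues = {
    'Westminster': ['Hotel', 'Café', 'Pub', 'Coffee Shop', 'Garden',
                    'French Restaurant', 'Bakery', 'Theater', 'Restaurant',
                    'Indian Restaurant'],
    'Camden': ['Pub', 'Coffee Shop', 'Café', 'Bakery', 'Italian Restaurant',
               'Pizza Place', 'Hotel', 'Grocery Store', 'Bookstore',
               'Gym / Fitness Center'],
    'Lambeth': ['Pub', 'Coffee Shop', 'Café', 'Grocery Store', 'Park',
                'Gym / Fitness Center', 'Italian Restaurant', 'Hotel',
                'Pizza Place', 'Bakery'],
    'City of London': ['Coffee Shop', 'Hotel', 'Gym / Fitness Center', 'Pub',
                       'Cocktail Bar', 'Italian Restaurant', 'Scenic Lookout',
                       'Café', 'History Museum', 'Garden'],
}
cluster_labels = {'Westminster': 0, 'Camden': 0, 'Lambeth': 0, 'City of London': 1}

# 1-based rank of 'Bakery' in each list, or None when it is absent
bakery_rank = {
    name: (venues.index('Bakery') + 1 if 'Bakery' in venues else None)
    for name, venues in top_venues.items()
}

for name, rank in bakery_rank.items():
    print(f'{name} (cluster label {cluster_labels[name]}): Bakery rank = {rank}')
```

A rank of `None` only means that Bakery is not among the ten most common categories; it does not show that the area has no bakeries at all, so a follow-up count of individual bakery venues would still be needed.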

## Conclusion <a name="conclusion"></a>

The purpose of this project was to identify London neighborhoods close to the center with a low number of bakeries among the top venues, in order to help stakeholders narrow down the search for an optimal location for a new luxury bakery. We chose the top four areas by average monthly rent, assuming that rent prices are positively correlated with the wealth of the people living in those neighborhoods.

Those neighborhoods were then clustered in order to create major zones of interest (containing the greatest number of potential locations), and addresses of those zone centers were generated to be used as starting points for final exploration by stakeholders.

The final decision on the optimal bakery location will be made by stakeholders based on the specific characteristics of neighborhoods and locations in every recommended zone, taking into consideration additional factors such as the attractiveness of each location (proximity to a park or water), noise levels / proximity to major roads, real estate availability, prices, and the social and economic dynamics of every neighborhood.