Project Title: The Beverage Battle of the Neighborhoods in London, UK

Part II:

DATA AND SOLUTION DESCRIPTIONS:

First, I will install and import the necessary Python packages, such as BeautifulSoup, wikipedia, pandas, numpy, geopy, and sklearn. Second, I will download the 'List of areas of London' from Wikipedia (https://en.wikipedia.org/wiki/List_of_areas_of_London) and organize the data into the first data frame. The data frame contains the columns 'Location', 'Borough', 'Post town', 'PostalCode', 'Dial code', and 'OS grid ref'. I will rename 'Location' to 'Neighborhood' to make it easier to understand and drop the 'Dial code' column, as it will not be used later. Next, I will convert 'OS grid ref' to latitude and longitude, create a new data frame to store the converted information, and merge this new data frame with the first one.

After cleaning the data, I will keep only the rows whose post town is LONDON, so that the business location derived later lies in the main area of London. I will then gather information about the most common venues in each neighborhood of the main London area using the Foursquare API.

Subsequently, I will apply k-means clustering with k=5 to segment the neighborhoods based on the frequency of venue categories in each neighborhood, and I will summarize each neighborhood by its five most common venues. The cluster in which the three most common venues most often fall into one of the following categories (coffee shop, café, pub, or bar) will be analyzed further. Within this cluster, I will select the neighborhoods whose three most common venues all belong to these categories and present their locations using the folium package.

Finally, I will use the 'search for venues' function of the Foursquare API to identify coffee shops, cafés, pubs, and bars near the selected neighborhoods as potential customers for this new beverage and ingredient supply company.

In the section below, I show the data extraction and cleaning for the subsequent analysis.

In [1]:
import wikipedia
print(wikipedia.WikipediaPage(title = 'List of areas of London').summary)

This is a list of the areas of London, in alphabetical order.
London is administered by the City of London and 32 London boroughs. These boroughs are modern, having been created in 1965 and have a weaker sense of identity than their constituent "districts" (considered in speech, "parts of London" or more formally, "areas"). The boroughs were primarily formed from amalgamations of Metropolitan, County and Municipal Boroughs. These in turn were groupings of Ancient Parishes, in turn being economically large enough villages and towns which warranted a church, and before 1900 gained a Civil Parish counterpart in almost all instances. The capital had three ancient boroughs, uniting for a purpose once-subdivided parishes, London, Southwark and Kingston upon Thames. Most areas were instead towns and villages with quite steady boundaries from the Middle Ages, through the process of urbanisation and into the modern era.
Sub-districts of the districts rooted on parishes are of five types: 

form

In [1]:
import pandas as pd
import numpy as np
import geopy
import folium
from geopy.geocoders import Nominatim
import requests # library to handle requests
from pandas import json_normalize # transform JSON file into a pandas dataframe (pandas >= 1.0)

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

print('Libraries imported.')

Libraries imported.


In [3]:
from bs4 import BeautifulSoup
import requests

wikipage = 'https://en.wikipedia.org/wiki/List_of_areas_of_London'

In [4]:
page = requests.get(wikipage)
page_html = BeautifulSoup(page.text, 'lxml')
wiki_table = page_html.find('table', attrs = {'class':'wikitable sortable'})
#print(wiki_table)
row_list = wiki_table.find_all('tr')

In [5]:
header_row = row_list.pop(0)
header_th = header_row.find_all('th')
header = [el.text for el in header_th]

table_dict = {x:[] for x in header}

In [6]:
for row in row_list:
    row_td = row.find_all('td')
    for el, td in zip(header, row_td):
        table_dict[el].append(td.text)

London = pd.DataFrame(table_dict)
print(London)

                                       Location  \
0                                    Abbey Wood   
1                                         Acton   
2                                     Addington   
3                                    Addiscombe   
4                                   Albany Park   
5                              Aldborough Hatch   
6                                       Aldgate   
7                                       Aldwych   
8                                      Alperton   
9                                       Anerley   
10                                        Angel   
11                                    Aperfield   
12                                      Archway   
13                               Ardleigh Green   
14                                       Arkley   
15                                  Arnos Grove   
16                                       Balham   
17                                     Bankside   
18                             

Set the correct column names, then remove the "\n" characters and the citation markers left over from the Wikipedia table.

In [7]:
London.columns=['Neighborhood', 'Borough', 'Post town', 'PostalCode', 'Dial code', 'OS grid ref']
London = London.replace('\n','', regex=True)
London['Borough'] = London['Borough'].str.replace(r'\d+', '', regex=True)
London['Borough']=London['Borough'].str.replace("[]",'', regex=False)

London

Unnamed: 0,Neighborhood,Borough,Post town,PostalCode,Dial code,OS grid ref
0,Abbey Wood,Greenwich,LONDON,SE2,020,TQ465785
1,Acton,"Ealing, Hammersmith and Fulham",LONDON,"W3, W4",020,TQ205805
2,Addington,Croydon,CROYDON,CR0,020,TQ375645
3,Addiscombe,Croydon,CROYDON,CR0,020,TQ345665
4,Albany Park,Bexley,"BEXLEY, SIDCUP","DA5, DA14",020,TQ478728
5,Aldborough Hatch,Redbridge,ILFORD,IG2,020,TQ455895
6,Aldgate,City,LONDON,EC3,020,TQ334813
7,Aldwych,Westminster,LONDON,WC2,020,TQ307810
8,Alperton,Brent,WEMBLEY,HA0,020,TQ185835
9,Anerley,Bromley,LONDON,SE20,020,TQ345695
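
As a minimal sketch of the cleaning step above, the citation markers can also be removed in a single pass with one regular expression. The sample cells below are hypothetical and only mimic the shape of the scraped table.

```python
import pandas as pd

# Hypothetical raw cells, shaped like the scraped Wikipedia table
raw = pd.DataFrame({
    "Neighborhood": ["Abbey Wood\n", "Acton\n"],
    "Borough": ["Greenwich[1]", "Ealing, Hammersmith and Fulham [2]"],
})

# Strip newline characters from every cell
clean = raw.replace(r"\n", "", regex=True)

# Remove bracketed citation markers such as "[1]", including any space before them
clean["Borough"] = clean["Borough"].str.replace(r"\s*\[\d+\]", "", regex=True)

print(clean)
```

Passing `regex=True` explicitly keeps the behavior the same across pandas versions, since the default of `str.replace` has changed over time.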


Group the rows by neighborhood, combine the postal codes and boroughs of rows that share a neighborhood name, and drop the duplicated rows. The 'Dial code' column is dropped as well.

In [8]:
London['PostalCode'] = London[['Neighborhood','Borough','Post town','PostalCode','Dial code','OS grid ref']].groupby(['Neighborhood'])['PostalCode'].transform(lambda x: ','.join(x))
London['Borough'] = London[['Neighborhood','Borough','Post town','PostalCode','Dial code','OS grid ref']].groupby(['Neighborhood'])['Borough'].transform(lambda x: ','.join(x))
London=London[['Neighborhood','Borough','Post town','PostalCode','Dial code','OS grid ref']].drop_duplicates(subset='Neighborhood')
London=London.drop(['Dial code'], axis =1)
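
Joining every value in a group repeats identical entries (for example 'Islington,Islington'). The sketch below shows one possible variant that keeps each value once. The helper `join_unique` and the sample rows are assumptions for illustration, not part of the original notebook.

```python
import pandas as pd

# Hypothetical rows: one neighborhood listed twice with identical values
df = pd.DataFrame({
    "Neighborhood": ["Angel", "Angel", "Aldgate"],
    "Borough": ["Islington", "Islington", "City"],
    "PostalCode": ["EC1, N1", "EC1, N1", "EC3"],
})

def join_unique(values):
    """Join values with commas, keeping first-seen order and dropping repeats."""
    return ",".join(dict.fromkeys(values))

# Combine the values of rows that share a neighborhood name
for col in ["Borough", "PostalCode"]:
    df[col] = df.groupby("Neighborhood")[col].transform(join_unique)

# Keep one row per neighborhood
df = df.drop_duplicates(subset="Neighborhood").reset_index(drop=True)
print(df)
```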

In [9]:
London.to_csv('London.csv', index=False)
London.shape

(527, 5)

Use the following link to convert 'OS grid ref' in the London data frame to 'latitude' and 'longitude', and save the result as 'London_Geospatial_Coordinates.csv':
https://gridreferencefinder.com/batchConvert/batchConvert.php
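
The easting and northing part of this conversion (the X and Y columns in the converted file) can also be reproduced locally. The function below is a sketch under stated assumptions: the name `os_grid_to_easting_northing` is hypothetical, it returns the south-west corner of the referenced square, and it does not check that the letters fall inside the area actually covered by the grid. Converting easting and northing to latitude and longitude needs a separate datum transformation, which is not shown here.

```python
def os_grid_to_easting_northing(grid_ref):
    """Convert an OS grid reference such as 'TQ465785' to (easting, northing) in metres."""
    ref = grid_ref.replace(" ", "").upper()
    letters, digits = ref[:2], ref[2:]
    if len(letters) != 2 or not letters.isalpha() or "I" in letters:
        raise ValueError(f"unexpected grid letters in {grid_ref!r}")
    if not digits.isdigit() or len(digits) % 2 or len(digits) > 10:
        raise ValueError(f"unexpected digits in {grid_ref!r}")

    # Letter positions on the 5x5 lettering scheme; the letter I is not used
    l1 = ord(letters[0]) - ord("A")
    l2 = ord(letters[1]) - ord("A")
    if l1 > 7:
        l1 -= 1
    if l2 > 7:
        l2 -= 1

    # Index of the 100 km square, counted from the grid's false origin
    e100km = ((l1 - 2) % 5) * 5 + (l2 % 5)
    n100km = (19 - (l1 // 5) * 5) - (l2 // 5)

    # Pad each half of the digits to five places to get metres within the square
    half = len(digits) // 2
    easting = e100km * 100_000 + int(digits[:half].ljust(5, "0"))
    northing = n100km * 100_000 + int(digits[half:].ljust(5, "0"))
    return easting, northing


print(os_grid_to_easting_northing("TQ465785"))
```

For 'TQ465785' (Abbey Wood) this gives an easting of 546500 and a northing of 178500, which agrees with the X and Y values in the merged table.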

In [10]:
coord=pd.read_csv('London_Geospatial_Coordinates.csv')
coord.shape

(527, 5)

In [11]:
combin = pd.merge(London, coord)
combin

Unnamed: 0,Neighborhood,Borough,Post town,PostalCode,OS grid ref,X,Y,Lat,Lng
0,Abbey Wood,Greenwich,LONDON,SE2,TQ465785,546500.0,178500.0,51.486481,0.108592
1,Acton,"Ealing, Hammersmith and Fulham",LONDON,"W3, W4",TQ205805,520500.0,180500.0,51.510588,-0.264989
2,Addington,Croydon,CROYDON,CR0,TQ375645,537500.0,164500.0,51.362931,-0.026374
3,Addiscombe,Croydon,CROYDON,CR0,TQ345665,534500.0,166500.0,51.381622,-0.068682
4,Addiscombe,Croydon,CROYDON,CR0,TQ345665,534500.0,166500.0,51.381622,-0.068682
5,Angel,Islington,LONDON,"EC1, N1",TQ345665,534500.0,166500.0,51.381622,-0.068682
6,Angel,Islington,LONDON,"EC1, N1",TQ345665,534500.0,166500.0,51.381622,-0.068682
7,Albany Park,Bexley,"BEXLEY, SIDCUP","DA5, DA14",TQ478728,547800.0,172800.0,51.434926,0.124921
8,Aldborough Hatch,Redbridge,ILFORD,IG2,TQ455895,545500.0,189500.0,51.585578,0.098742
9,Aldgate,City,LONDON,EC3,TQ334813,533400.0,181300.0,51.514882,-0.078905
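
When `pd.merge` is called without `on=`, it joins on every column name the two frames share, and repeated keys on either side multiply the rows. The following sketch shows one way to make the join explicit and to have pandas check the key relationship. The sample frames are hypothetical, and the choice of 'OS grid ref' as the key is an assumption about the converted file.

```python
import pandas as pd

# Hypothetical inputs: a repeated neighborhood on the left, unique grid refs on the right
areas = pd.DataFrame({
    "Neighborhood": ["Abbey Wood", "Acton", "Acton"],
    "OS grid ref": ["TQ465785", "TQ205805", "TQ205805"],
})
coords = pd.DataFrame({
    "OS grid ref": ["TQ465785", "TQ205805"],
    "Lat": [51.486481, 51.510588],
    "Lng": [0.108592, -0.264989],
})

# Deduplicate first, then merge on a named key.
# validate="many_to_one" raises MergeError if the right side has repeated keys.
areas = areas.drop_duplicates(subset="Neighborhood")
combined = pd.merge(areas, coords, on="OS grid ref", how="left", validate="many_to_one")
print(combined)
```

`many_to_one` is used rather than `one_to_one` because several neighborhoods may legitimately share one grid reference.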


In [12]:
combin.to_csv('combin.csv', index=False)

In [15]:
Lon_data = combin[combin['Post town'] == 'LONDON'].reset_index(drop=True)
Lon_data['PostalCode'] = Lon_data[['Neighborhood','Borough','Post town','PostalCode','OS grid ref','X','Y','Lat','Lng']].groupby(['Neighborhood'])['PostalCode'].transform(lambda x: ','.join(x))
Lon_data['Borough'] = Lon_data[['Neighborhood','Borough','Post town','PostalCode','OS grid ref','X','Y','Lat','Lng']].groupby(['Neighborhood'])['Borough'].transform(lambda x: ','.join(x))
Lon_data=Lon_data[['Neighborhood','Borough','Post town','PostalCode', 'OS grid ref', 'X','Y','Lat','Lng']].drop_duplicates(subset='Neighborhood')
Lon_data.head()
Lon_data.shape
Lon_data.to_csv('Lon_data.csv', index=False)

In [16]:
address = 'London area, London'

# Nominatim expects a user agent string; the name used here is arbitrary
geolocator = Nominatim(user_agent='london_explorer')
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of London are {}, {}.'.format(latitude, longitude))

The geographical coordinates of London are 51.5073219, -0.1276474.


In [17]:
# create map of London using latitude and longitude values
Lon_map = folium.Map(location=[latitude, longitude], zoom_start=10)

# add markers to map
for lat, lng, label in zip(Lon_data['Lat'], Lon_data['Lng'], Lon_data['Borough']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.8
        ).add_to(Lon_map)  

Lon_map.save("Lon_map.html") 
Lon_map

In [2]:
# The code was removed by Watson Studio for sharing.

CLIENT_ID, CLIENT_SECRET, VERSION are defined


In [19]:
Lon_data=pd.read_csv('Lon_data.csv')
Lon_data.head()

Unnamed: 0,Neighborhood,Borough,Post town,PostalCode,OS grid ref,X,Y,Lat,Lng
0,Abbey Wood,Greenwich,LONDON,SE2,TQ465785,546500.0,178500.0,51.486481,0.108592
1,Acton,"Ealing, Hammersmith and Fulham",LONDON,"W3, W4",TQ205805,520500.0,180500.0,51.510588,-0.264989
2,Angel,"Islington,Islington",LONDON,"EC1, N1,EC1, N1",TQ345665,534500.0,166500.0,51.381622,-0.068682
3,Aldgate,City,LONDON,EC3,TQ334813,533400.0,181300.0,51.514882,-0.078905
4,Aldwych,Westminster,LONDON,WC2,TQ307810,530700.0,181000.0,51.512816,-0.117904


In [20]:
Lon_data.shape

(297, 9)

In [21]:
neighborhood_latitude = Lon_data.loc[1, 'Lat'] # neighborhood latitude value
neighborhood_longitude = Lon_data.loc[1, 'Lng'] # neighborhood longitude value

neighborhood_name = Lon_data.loc[1, 'Neighborhood'] # neighborhood name

print('Latitude and longitude values of {} are {}, {}.'.format(neighborhood_name, 
                                                               neighborhood_latitude, 
                                                               neighborhood_longitude))

Latitude and longitude values of Acton are 51.510588, -0.26498903.


In [22]:
LIMIT = 100
radius = 500
url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
    CLIENT_ID,
    CLIENT_SECRET,
    VERSION, 
    neighborhood_latitude, 
    neighborhood_longitude,
    radius,
    LIMIT)
url

'https://api.foursquare.com/v2/venues/explore?&client_id=<CLIENT_ID>&client_secret=<CLIENT_SECRET>&v=20180605&ll=51.510588,-0.26498903&radius=500&limit=100'
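
Because the request URL carries the client ID and secret as query parameters, displaying it exposes them. The sketch below builds the same kind of URL with the standard library and masks the credentials before printing. The function names `build_explore_url` and `redact` and the sample credential values are hypothetical.

```python
import re
from urllib.parse import urlencode

def build_explore_url(client_id, client_secret, lat, lng,
                      radius=500, limit=100, version="20180605"):
    """Assemble the explore request URL from its parameters."""
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "v": version,
        "ll": f"{lat},{lng}",
        "radius": radius,
        "limit": limit,
    }
    # safe="," keeps the comma in the ll parameter readable
    return "https://api.foursquare.com/v2/venues/explore?" + urlencode(params, safe=",")

def redact(url):
    """Mask credential values so the URL can be printed or shared."""
    return re.sub(r"(client_id|client_secret)=[^&]*", r"\1=REDACTED", url)

# Placeholder credentials for illustration only
url = build_explore_url("ID123", "SECRET456", 51.510588, -0.26498903)
print(redact(url))
```

In a notebook, the real values could be read from environment variables with `os.environ` instead of being typed into a cell.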

In [23]:
findings = requests.get(url).json()
findings

{'meta': {'code': 200, 'requestId': '5c4e584d4434b974435507ff'},
 'response': {'headerLocation': 'Acton',
  'headerFullLocation': 'Acton, London',
  'headerLocationGranularity': 'neighborhood',
  'totalResults': 12,
  'suggestedBounds': {'ne': {'lat': 51.5150880045,
    'lng': -0.25777209722545796},
   'sw': {'lat': 51.506087995499996, 'lng': -0.27220596277454207}},
  'groups': [{'type': 'Recommended Places',
    'name': 'recommended',
    'items': [{'reasons': {'count': 0,
       'items': [{'summary': 'This spot is popular',
         'type': 'general',
         'reasonName': 'globalInteractionReason'}]},
      'venue': {'id': '4cb84e45a33bb1f781d87ffd',
       'name': 'The Station House',
       'location': {'address': 'Station Buildings',
        'crossStreet': 'Churchfield Rd.',
        'lat': 51.50887658087082,
        'lng': -0.2630759210521079,
        'labeledLatLngs': [{'label': 'display',
          'lat': 51.50887658087082,
          'lng': -0.2630759210521079}],
        'dist

To explore the venues of every neighborhood in the London post town, I define a function that repeats the request above for each neighborhood.

In [24]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        findings = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'], 
            v['venue']['categories'][0]['name']) for v in findings])
            

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return nearby_venues
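
The list comprehension in the function assumes that every response contains a group and that every venue has at least one category. A more defensive variant of the extraction step could look like the sketch below. The helper name `extract_venue_rows` is hypothetical, and the sample payload only imitates the structure of the response shown earlier; the second venue is invented to exercise the empty-category case.

```python
def extract_venue_rows(name, lat, lng, payload):
    """Flatten one explore response into rows, skipping venues without a category."""
    groups = payload.get("response", {}).get("groups", [])
    items = groups[0].get("items", []) if groups else []
    rows = []
    for item in items:
        venue = item.get("venue", {})
        location = venue.get("location", {})
        categories = venue.get("categories") or []
        if not categories:
            continue  # nothing to one-hot encode later
        rows.append((name, lat, lng,
                     venue.get("name"),
                     location.get("lat"),
                     location.get("lng"),
                     categories[0]["name"]))
    return rows

# Sample payload imitating the response structure
sample = {
    "response": {
        "groups": [{
            "items": [
                {"venue": {"name": "The Station House",
                           "location": {"lat": 51.50887658087082, "lng": -0.2630759210521079},
                           "categories": [{"name": "Pub"}]}},
                {"venue": {"name": "Uncategorized place (hypothetical)",
                           "location": {"lat": 51.51, "lng": -0.26},
                           "categories": []}},
            ]
        }]
    }
}

rows = extract_venue_rows("Acton", 51.510588, -0.264989, sample)
print(rows)
```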

Run the above function on each neighborhood and create a new dataframe called Lon_venues.

In [32]:
Lon_venues = getNearbyVenues(names=Lon_data['Neighborhood'], latitudes=Lon_data['Lat'], 
                                   longitudes=Lon_data['Lng'], radius=500)

Abbey Wood
Acton
Angel
Aldgate
Aldwych
Anerley
Archway
Highgate
Arnos Grove
New Southgate
Balham
Bankside
Newington
Barbican
Barnes
Barnsbury
Battersea
Bayswater
Bedford Park
Belgravia
Brompton
Knightsbridge
Bellingham
Belsize Park
Bermondsey
Bethnal Green
Blackfriars
Blackheath
Blackheath Royal Standard
Blackwall
Bloomsbury
St Pancras
Bounds Green
Bow
Mile End
Bowes Park
Brent Cross
Brent Park
Brixton
Brockley
Bromley (also Bromley-by-Bow)
Brondesbury
Brunswick Park
Burroughs, The
Camberwell
Cambridge Heath
Camden Town
Canary Wharf
Cann Hall
Leytonstone
Canning Town
Canonbury
Castelnau
Catford
Chalk Farm
Charing Cross
Charlton
Chelsea
Childs Hill
Chinatown
Snaresbrook
Chinbrook
Chingford
Chiswick
Church End
Clapham
Clerkenwell
Finsbury
Colindale
Colliers Wood
Colney Hatch
Covent Garden
Cricklewood
Crofton Park
Crossness
Crouch End
Crystal Palace
Cubitt Town
Custom House
Dalston
Hackney
Hackney Central
Dartford
De Beauvoir Town
Denmark Hill
Deptford
Dollis Hill
Dulwich
Ealing
Earls Cou

Check the size of the resulting dataframe and preview its first rows.

In [33]:
print(Lon_venues.shape)
Lon_venues.head()

(9354, 7)


Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Abbey Wood,51.486481,0.108592,Bostal Gardens,51.48667,0.110462,Playground
1,Abbey Wood,51.486481,0.108592,Co-op Food,51.48765,0.11349,Grocery Store
2,Abbey Wood,51.486481,0.108592,tommysdriveways,51.489386,0.104273,Construction & Landscaping
3,Abbey Wood,51.486481,0.108592,Meghna Tandoori,51.485709,0.101681,Indian Restaurant
4,Acton,51.510588,-0.264989,The Station House,51.508877,-0.263076,Pub


In [34]:
Lon_venues.groupby('Neighborhood').count()

Unnamed: 0_level_0,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Neighborhood,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
Abbey Wood,4,4,4,4,4,4
Acton,13,13,13,13,13,13
Aldgate,100,100,100,100,100,100
Aldwych,100,100,100,100,100,100
Anerley,6,6,6,6,6,6
Angel,16,16,16,16,16,16
Archway,28,28,28,28,28,28
Arnos Grove,4,4,4,4,4,4
Balham,54,54,54,54,54,54
Bankside,25,25,25,25,25,25


In [35]:
print('There are {} unique categories.'.format(len(Lon_venues['Venue Category'].unique())))

There are 386 unique categories.


In [36]:
# one hot encoding
Lon_onehot = pd.get_dummies(Lon_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
Lon_onehot['Neighborhood'] = Lon_venues['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [Lon_onehot.columns[-1]] + list(Lon_onehot.columns[:-1])
Lon_onehot = Lon_onehot[fixed_columns]

Lon_onehot.head()

Unnamed: 0,Zoo Exhibit,Accessories Store,Adult Boutique,Afghan Restaurant,African Restaurant,American Restaurant,Antique Shop,Aquarium,Arcade,Arepa Restaurant,...,Whisky Bar,Wine Bar,Wine Shop,Winery,Wings Joint,Women's Store,Xinjiang Restaurant,Yakitori Restaurant,Yoga Studio,Zoo
0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [37]:
Lon_onehot.shape

(9354, 386)
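
The one-hot table has exactly as many columns as there are unique categories, and its preview lists 'Zoo Exhibit' first rather than 'Neighborhood'. One possible explanation (an assumption, not verified against the raw data) is that a venue category is itself named 'Neighborhood', so assigning the neighborhood names overwrites that category column instead of adding a new one. The sketch below shows one way to avoid such a name collision on hypothetical data.

```python
import pandas as pd

# Hypothetical venues, including a category literally named "Neighborhood"
venues = pd.DataFrame({
    "Neighborhood": ["Acton", "Acton", "Aldgate"],
    "Venue Category": ["Pub", "Neighborhood", "Coffee Shop"],
})

onehot = pd.get_dummies(venues["Venue Category"], dtype=int)

# Rename the clashing category (no effect if it is absent),
# then insert the key column explicitly as the first column
onehot = onehot.rename(columns={"Neighborhood": "Neighborhood (category)"})
onehot.insert(0, "Neighborhood", venues["Neighborhood"])

# Mean of the indicator columns gives the frequency of each category per neighborhood
grouped = onehot.groupby("Neighborhood").mean().reset_index()
print(grouped)
```

`DataFrame.insert` raises an error if the column name already exists, which makes a silent overwrite impossible.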

In [38]:
Lon_grouped = Lon_onehot.groupby('Neighborhood').mean().reset_index()
Lon_grouped
Lon_grouped.to_csv('Lon_grouped_19Jan.csv', index= False)

In [39]:
Lon_grouped.shape

(297, 386)

Print each neighborhood along with its five most common venue categories.

In [40]:
num_top_venues = 5

for hood in Lon_grouped['Neighborhood']:
    print("----"+hood+"----")
    temp = Lon_grouped[Lon_grouped['Neighborhood'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----Abbey Wood----
                        venue  freq
0           Indian Restaurant  0.25
1                  Playground  0.25
2  Construction & Landscaping  0.25
3               Grocery Store  0.25
4                        Park  0.00


----Acton----
                  venue  freq
0                   Pub  0.23
1  Gym / Fitness Center  0.15
2                  Park  0.15
3         Train Station  0.08
4         Grocery Store  0.08


----Aldgate----
                  venue  freq
0           Coffee Shop  0.09
1                 Hotel  0.06
2  Gym / Fitness Center  0.04
3          Cocktail Bar  0.04
4           Salad Place  0.04


----Aldwych----
                venue  freq
0             Theater  0.08
1         Coffee Shop  0.05
2  Italian Restaurant  0.05
3          Restaurant  0.04
4        Dessert Shop  0.04


----Anerley----
            venue  freq
0   Grocery Store  0.33
1   Train Station  0.17
2  Hardware Store  0.17
3             Pub  0.17
4            Park  0.17


----Angel----
       

                  venue  freq
0              Pharmacy  0.25
1           Coffee Shop  0.25
2  Gym / Fitness Center  0.25
3         Grocery Store  0.25
4           Zoo Exhibit  0.00


----Chalk Farm----
                venue  freq
0                Café  0.12
1                 Bar  0.08
2                 Pub  0.07
3         Pizza Place  0.05
4  Italian Restaurant  0.05


----Charing Cross----
               venue  freq
0            Theater  0.06
1              Hotel  0.06
2       Dessert Shop  0.04
3  French Restaurant  0.04
4       Burger Joint  0.04


----Charlton----
               venue  freq
0  Electronics Store  0.07
1     Soccer Stadium  0.07
2           Platform  0.07
3    Thai Restaurant  0.07
4              Hotel  0.07


----Chelsea----
               venue  freq
0                Pub  0.23
1             Garden  0.09
2  Convenience Store  0.09
3      Grocery Store  0.09
4               Café  0.05


----Childs Hill----
              venue  freq
0              Park  0.25
1  Sushi R

             venue  freq
0           Garden  0.25
1  Nature Preserve  0.25
2   Scenic Lookout  0.25
3    Garden Center  0.25
4  Other Nightlife  0.00


----Fortis Green----
               venue  freq
0               Café  0.17
1      Deli / Bodega  0.11
2                Pub  0.11
3  Indian Restaurant  0.06
4    Organic Grocery  0.06


----Friern Barnet----
                venue  freq
0       Grocery Store   0.4
1   Fish & Chips Shop   0.2
2        Dessert Shop   0.2
3  Italian Restaurant   0.2
4         Zoo Exhibit   0.0


----Frognal----
               venue  freq
0  Indian Restaurant  0.06
1             Bakery  0.06
2        Pizza Place  0.06
3               Café  0.06
4                Pub  0.06


----Fulham----
                venue  freq
0  Italian Restaurant  0.15
1                Café  0.13
2                 Pub  0.06
3         Pizza Place  0.06
4         Yoga Studio  0.04


----Gipsy Hill----
                venue  freq
0         Coffee Shop  0.16
1  Italian Restaurant  0.08
2  

                     venue  freq
0                      Pub  0.12
1    Portuguese Restaurant  0.12
2              Supermarket  0.12
3  Fruit & Vegetable Store  0.12
4            Metro Station  0.12


----Kingston Vale----
           venue  freq
0      Rest Area  0.25
1   Soccer Field  0.25
2       Bus Stop  0.25
3  Bowling Alley  0.25
4           Park  0.00


----Knightsbridge----
                venue  freq
0            Boutique  0.13
1                Café  0.11
2  Italian Restaurant  0.08
3               Hotel  0.07
4  Seafood Restaurant  0.04


----Ladywell----
           venue  freq
0            Pub  0.25
1           Park  0.17
2  Grocery Store  0.08
3    Coffee Shop  0.08
4           Café  0.08


----Lambeth----
                   venue  freq
0            Art Gallery  0.16
1                  Hotel  0.11
2                    Pub  0.08
3            Pizza Place  0.05
4  Portuguese Restaurant  0.05


----Lea Bridge----
             venue  freq
0             Park  0.29
1     Skating Ri

           venue  freq
0           Café  0.22
1    Coffee Shop  0.12
2            Pub  0.09
3    Pizza Place  0.09
4  Grocery Store  0.06


----Old Oak Common----
                        venue  freq
0                 Bus Station   0.2
1  Modern European Restaurant   0.2
2                       Canal   0.2
3                         Gym   0.2
4             Warehouse Store   0.2


----Osidge----
              venue  freq
0     Grocery Store  0.25
1               Pub  0.25
2             Plaza  0.25
3  Greek Restaurant  0.25
4       Zoo Exhibit  0.00


----Oval----
                        venue  freq
0                Home Service  0.33
1                        Café  0.33
2  Construction & Landscaping  0.33
3                 Zoo Exhibit  0.00
4                 Pastry Shop  0.00


----Paddington----
                venue  freq
0               Hotel  0.17
1         Coffee Shop  0.08
2                Café  0.08
3                 Pub  0.05
4  Italian Restaurant  0.05


----Palmers Green----
    

         venue  freq
0        Hotel  0.05
1   Restaurant  0.05
2      Theater  0.05
3          Pub  0.04
4  Coffee Shop  0.03


----St James's----
                venue  freq
0             Theater  0.05
1      Clothing Store  0.05
2   Indian Restaurant  0.04
3  Italian Restaurant  0.04
4               Hotel  0.04


----St John's Wood----
                       venue  freq
0                       Café  0.12
1              Deli / Bodega  0.08
2                Coffee Shop  0.08
3  Middle Eastern Restaurant  0.08
4                     Bakery  0.04


----St Johns----
                venue  freq
0          Food Truck  0.13
1               Hotel  0.10
2  Light Rail Station  0.07
3       Grocery Store  0.07
4                 Pub  0.07


----St Luke's----
                venue  freq
0         Coffee Shop  0.12
1                Café  0.05
2          Food Truck  0.05
3                 Pub  0.05
4  Italian Restaurant  0.05


----St Pancras----
                venue  freq
0                 Pub  0.1

               venue  freq
0                Pub  0.23
1  Indian Restaurant  0.09
2     Farmers Market  0.09
3        Pizza Place  0.09
4      Grocery Store  0.09


----West Ealing----
               venue  freq
0               Café  0.15
1               Park  0.08
2  Fish & Chips Shop  0.08
3     Hardware Store  0.08
4        Coffee Shop  0.08


----West Green----
                venue  freq
0  Turkish Restaurant  0.15
1                Park  0.15
2   Convenience Store  0.15
3         Coffee Shop  0.08
4      Sandwich Place  0.08


----West Hackney----
           venue  freq
0            Pub  0.13
1           Café  0.09
2    Pizza Place  0.07
3            Bar  0.07
4  Grocery Store  0.04


----West Ham----
               venue  freq
0                Pub  0.33
1           Bus Line  0.17
2           Bus Stop  0.17
3           Boutique  0.17
4  Convenience Store  0.17


----West Hampstead----
               venue  freq
0      Grocery Store  0.06
1               Café  0.06
2  Indian Restaur

To put the information above into a pandas dataframe, I first define a function that sorts the venue categories of one neighborhood in descending order of frequency.

In [41]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]
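
Sorting all categories and taking the first few also returns categories whose frequency is zero when a neighborhood has fewer distinct categories than requested. The variant below is a sketch of one alternative that leaves those slots empty. The function name `top_venues` and the sample row are hypothetical.

```python
import pandas as pd

# Hypothetical frequency row shaped like one row of Lon_grouped
row = pd.Series({"Neighborhood": "Acton", "Pub": 0.23, "Park": 0.15, "Bakery": 0.08, "Zoo": 0.0})

def top_venues(row, n):
    """Return the n most frequent categories, padding with None instead of zero-frequency ones."""
    freqs = row.iloc[1:].astype(float)   # skip the neighborhood name
    top = freqs[freqs > 0].nlargest(n).index.tolist()
    return top + [None] * (n - len(top))

print(top_venues(row, 2))
print(top_venues(row, 5))
```

Categories with equal frequency can come back in a different order depending on the sorting method, so ties should not be read as a ranking.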

Next, I am going to create the new dataframe and display the top 5 venues for each neighborhood.

In [42]:
num_top_venues = 5

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = Lon_grouped['Neighborhood']

for ind in np.arange(Lon_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(Lon_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
0,Abbey Wood,Grocery Store,Construction & Landscaping,Playground,Indian Restaurant,Zoo
1,Acton,Pub,Park,Gym / Fitness Center,Creperie,Bakery
2,Aldgate,Coffee Shop,Hotel,Italian Restaurant,Cocktail Bar,Salad Place
3,Aldwych,Theater,Coffee Shop,Italian Restaurant,Restaurant,Cocktail Bar
4,Anerley,Grocery Store,Hardware Store,Pub,Park,Train Station
5,Angel,Grocery Store,Tram Station,Café,Park,Bakery
6,Archway,Pub,Bakery,Indian Restaurant,Park,Jazz Club
7,Arnos Grove,Pool,Park,Grocery Store,Chinese Restaurant,French Restaurant
8,Balham,Coffee Shop,Pub,Pizza Place,Indian Restaurant,Steakhouse
9,Bankside,Pub,Park,Coffee Shop,Garden,Argentinian Restaurant
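
The column labels above rely on an index lookup that falls back to 'th' when the list of suffixes runs out. A small helper can produce the same labels without exception handling; `ordinal` is a hypothetical name used only in this sketch.

```python
def ordinal(n):
    """Return n with its English ordinal suffix, e.g. 1 -> '1st', 11 -> '11th'."""
    if 10 <= n % 100 <= 20:
        suffix = "th"                      # 11th, 12th, 13th, ...
    else:
        suffix = {1: "st", 2: "nd", 3: "rd"}.get(n % 10, "th")
    return f"{n}{suffix}"

num_top_venues = 5
columns = ["Neighborhood"] + [
    f"{ordinal(i)} Most Common Venue" for i in range(1, num_top_venues + 1)
]
print(columns)
```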


Cluster Neighborhoods: apply k-means to group the neighborhoods into 5 clusters

In [43]:
# set number of clusters
kclusters = 5

Lon_grouped_clustering = Lon_grouped.drop('Neighborhood', axis=1)

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(Lon_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_

array([1, 4, 0, 0, 1, 1, 4, 1, 0, 4, 0, 0, 0, 0, 0, 2, 0, 1, 0, 0, 0, 0,
       0, 4, 0, 0, 4, 0, 0, 0, 0, 0, 0, 0, 0, 0, 3, 0, 0, 0, 4, 0, 0, 3,
       0, 0, 1, 0, 0, 0, 4, 2, 0, 0, 4, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 2,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 4, 3, 0, 0, 0, 4, 4, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 3, 0, 0, 0, 0, 0, 0, 0, 4, 0, 0, 0, 0, 0, 0, 3, 4, 0, 0,
       0, 0, 4, 0, 0, 3, 0, 0, 0, 2, 0, 4, 0, 3, 0, 0, 0, 2, 0, 0, 0, 1,
       0, 0, 0, 0, 0, 0, 0, 0, 2, 4, 4, 0, 4, 0, 0, 0, 3, 0, 0, 2, 0, 2,
       1, 4, 0, 2, 0, 0, 0, 0, 4, 0, 0, 0, 4, 0, 0, 4, 0, 0, 0, 0, 0, 0,
       3, 1, 0, 0, 4, 0, 0, 3, 2, 0, 0, 0, 1, 0, 0, 0, 4, 0, 0, 0, 0, 0,
       0, 0, 0, 1, 0, 0, 1, 4, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0,
       0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 2, 0, 0, 2, 0, 3, 0, 3,
       0, 4, 0, 2, 0, 0, 0, 0, 0, 1, 4, 0, 0, 0, 4, 0, 4, 0, 0, 4, 2, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 3], dtype=int3

In [44]:
kmeans.labels_.shape

(297,)
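
Before interpreting the clusters, it helps to check how many neighborhoods fall into each one. The sketch below counts labels with NumPy, using only the first 22 labels of the array printed above as sample input; on the real data, `kmeans.labels_` would be passed instead.

```python
import numpy as np

# First 22 labels from the printed array, used here as sample input
labels = np.array([1, 4, 0, 0, 1, 1, 4, 1, 0, 4, 0, 0, 0, 0, 0, 2, 0, 1, 0, 0, 0, 0])

kclusters = 5
# minlength guarantees one entry per cluster, even for clusters with no members
sizes = np.bincount(labels, minlength=kclusters)

for cluster, size in enumerate(sizes):
    print(f"cluster {cluster}: {size} neighborhoods")
```

A strongly unbalanced result, with most neighborhoods in a single cluster, would suggest trying other values of k.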

In [45]:
Lon_merged = Lon_data.copy()
Lon_merged.shape

# add clustering labels to the table they were computed from:
# kmeans.labels_ follows the row order of Lon_grouped (sorted by neighborhood),
# which differs from the row order of Lon_data
neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

# merge Lon_data with neighborhoods_venues_sorted, matching rows by neighborhood name
Lon_merged = Lon_merged.join(neighborhoods_venues_sorted.set_index('Neighborhood'), on='Neighborhood')

Lon_merged.head() # check the last columns!

Unnamed: 0,Neighborhood,Borough,Post town,PostalCode,OS grid ref,X,Y,Lat,Lng,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
0,Abbey Wood,Greenwich,LONDON,SE2,TQ465785,546500.0,178500.0,51.486481,0.108592,1,Grocery Store,Construction & Landscaping,Playground,Indian Restaurant,Zoo
1,Acton,"Ealing, Hammersmith and Fulham",LONDON,"W3, W4",TQ205805,520500.0,180500.0,51.510588,-0.264989,4,Pub,Park,Gym / Fitness Center,Creperie,Bakery
2,Angel,"Islington,Islington",LONDON,"EC1, N1,EC1, N1",TQ345665,534500.0,166500.0,51.381622,-0.068682,1,Grocery Store,Tram Station,Café,Park,Bakery
3,Aldgate,City,LONDON,EC3,TQ334813,533400.0,181300.0,51.514882,-0.078905,0,Coffee Shop,Hotel,Italian Restaurant,Cocktail Bar,Salad Place
4,Aldwych,Westminster,LONDON,WC2,TQ307810,530700.0,181000.0,51.512816,-0.117904,0,Theater,Coffee Shop,Italian Restaurant,Restaurant,Cocktail Bar
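
k-means returns one label per row of the table it was fitted on (Lon_grouped, which groupby sorts by neighborhood name), whereas Lon_data keeps the original row order. Attaching the labels by neighborhood name rather than by row position keeps each label with its neighborhood. The sketch below illustrates this on hypothetical data; the label values are invented.

```python
import pandas as pd

# Original row order, as in Lon_data
data = pd.DataFrame({"Neighborhood": ["Abbey Wood", "Acton", "Angel", "Aldgate"]})

# groupby sorts its keys, so the grouped table is in alphabetical order
grouped = pd.DataFrame({"Neighborhood": ["Abbey Wood", "Acton", "Aldgate", "Angel"]})
labels = [1, 4, 0, 1]  # hypothetical labels, one per row of `grouped`

# Attach the labels to the table they belong to, then match on the key
labelled = grouped.assign(**{"Cluster Labels": labels})
merged = data.merge(labelled, on="Neighborhood", how="left", validate="one_to_one")
print(merged)
```

Assigning `labels` directly to `data` would give Angel the label that belongs to Aldgate, because the two tables list them in a different order.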


Visualize the resulting clusters on a map.

In [46]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i+x+(i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(Lon_merged['Lat'], Lon_merged['Lng'], Lon_merged['Neighborhood'], Lon_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters.save("map_clusters.html")   
map_clusters

Now, I am going to examine each cluster and determine the discriminating venue categories that distinguish each cluster. Based on the defining categories, I will assign a name to each cluster.

In [25]:
Lon_merged=pd.read_csv('Lon_merged.csv')


Cluster 1

In [70]:
cluster1=Lon_merged.loc[Lon_merged['Cluster Labels'] == 0, Lon_merged.columns[[0] + list(range(1, Lon_merged.shape[1]))]]
cluster1['count1'] = cluster1.groupby('1st Most Common Venue')['1st Most Common Venue'].transform('count')
cluster1['count2'] = cluster1.groupby('2nd Most Common Venue')['2nd Most Common Venue'].transform('count')
cluster1['count3'] = cluster1.groupby('3rd Most Common Venue')['3rd Most Common Venue'].transform('count')
#cluster1.to_csv('cluster1.csv', index=False)
cluster1.sort_values(['count1'], ascending=False)

Unnamed: 0,Neighborhood,Borough,Post town,PostalCode,Lat,Lng,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,count1,count2,count3
144,Hornsey,Haringey,LONDON,N8,51.589248,-0.117634,0,Pub,Gym,Pizza Place,Gym / Fitness Center,Hotel,46,1,12
239,Southfields,Wandsworth,LONDON,"SW18, SW19",51.446591,-0.195461,0,Pub,Thai Restaurant,Gym Pool,Furniture / Home Store,Gym,46,1,1
204,Palmers Green,Enfield,LONDON,N13,51.625194,-0.116146,0,Pub,Tennis Court,French Restaurant,Hotel,Café,46,1,3
168,Limehouse,Tower Hamlets,LONDON,E14,51.515939,-0.034180,0,Pub,Athletics & Sports,Canal Lock,Track Stadium,Playground,46,2,1
251,Stratford,Newham,LONDON,E15,51.542411,-0.004196,0,Pub,Sandwich Place,Bookstore,Café,Fast Food Restaurant,46,3,2
45,Cambridge Heath,Tower Hamlets,LONDON,E2,51.531624,-0.058015,0,Pub,Coffee Shop,Pizza Place,Restaurant,Cocktail Bar,46,23,12
48,Cann Hall,"Waltham Forest,Waltham Forest",LONDON,"E11,E11",51.569122,0.011404,0,Pub,Café,Grocery Store,Platform,Coffee Shop,46,30,13
49,Leytonstone,"Waltham Forest,Waltham Forest",LONDON,"E11,E11",51.569122,0.011404,0,Pub,Café,Grocery Store,Platform,Coffee Shop,46,30,13
248,Stockwell,Lambeth,LONDON,"SW8, SW9",51.463436,-0.122816,0,Pub,Bar,Coffee Shop,Café,African Restaurant,46,5,11
99,Farringdon,Islington & City,LONDON,EC1,51.519820,-0.106084,0,Pub,Coffee Shop,Hotel,French Restaurant,Vietnamese Restaurant,46,23,10
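The count1, count2, and count3 columns record how many neighborhoods in the same cluster share a row's 1st, 2nd, or 3rd most common venue category. The following is a minimal, pandas-free sketch of that idea, assuming each row is a plain dictionary; the helper name add_category_counts and the neighborhoods "A", "B", and "C" are hypothetical and only serve as an illustration.

```python
from collections import Counter


def add_category_counts(rows, column, count_name):
    """Attach to every row the number of rows sharing its value in `column`."""
    counts = Counter(row[column] for row in rows)
    return [{**row, count_name: counts[row[column]]} for row in rows]


# Hypothetical rows, not taken from the London data
rows = [
    {"Neighborhood": "A", "1st Most Common Venue": "Pub"},
    {"Neighborhood": "B", "1st Most Common Venue": "Pub"},
    {"Neighborhood": "C", "1st Most Common Venue": "Café"},
]

counted = add_category_counts(rows, "1st Most Common Venue", "count1")

# Sort so that the most frequent top venue category comes first
ranked = sorted(counted, key=lambda r: r["count1"], reverse=True)
for row in ranked:
    print(row["Neighborhood"], row["1st Most Common Venue"], row["count1"])
```

Calling the helper once per rank column (1st, 2nd, 3rd) would mirror the three groupby and transform lines used for each cluster.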


Cluster 2

In [71]:
cluster2=Lon_merged.loc[Lon_merged['Cluster Labels'] == 1, Lon_merged.columns[[0] + list(range(1, Lon_merged.shape[1]))]]
cluster2['count1'] = cluster2.groupby('1st Most Common Venue')['1st Most Common Venue'].transform('count')
cluster2['count2'] = cluster2.groupby('2nd Most Common Venue')['2nd Most Common Venue'].transform('count')
cluster2['count3'] = cluster2.groupby('3rd Most Common Venue')['3rd Most Common Venue'].transform('count')
cluster2.to_csv('cluster2.csv', index=False)
cluster2.sort_values(['count1'], ascending=False)

Unnamed: 0,Neighborhood,Borough,Post town,PostalCode,Lat,Lng,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,count1,count2,count3
7,Highgate,"Camden,Camden",LONDON,"N6,N6",51.571734,-0.147219,1,Pub,Bakery,Park,Indian Restaurant,Jazz Club,6,1,2
17,Bayswater,Westminster,LONDON,W2,51.509501,-0.192977,1,Pub,Hotel,Grocery Store,Indian Restaurant,Ice Cream Shop,6,2,2
46,Camden Town,Camden,LONDON,NW1,51.544545,-0.133901,1,Pub,Italian Restaurant,Park,Pizza Place,Hotel,6,1,2
92,East Finchley,Barnet,LONDON,N2,51.590159,-0.175342,1,Pub,Grocery Store,Bus Stop,Italian Restaurant,Bakery,6,2,1
101,Finsbury Park,"Haringey, Islington",LONDON,N4,51.56837,-0.10551,1,Pub,Coffee Shop,Café,Grocery Store,Italian Restaurant,6,2,2
249,Stoke Newington,Hackney,LONDON,N16,51.561587,-0.075494,1,Pub,Café,Cocktail Bar,Pizza Place,Gift Shop,6,4,1
223,South Norwood,"Croydon,Croydon",LONDON,"SE25,SE25",51.398815,-0.075146,1,Platform,Café,Coffee Shop,Asian Restaurant,Park,2,4,1
5,Anerley,Bromley,LONDON,SE20,51.408582,-0.067546,1,Grocery Store,Train Station,Pub,Park,Hardware Store,2,2,3
61,Chinbrook,Lewisham,LONDON,SE12,51.431241,0.02836,1,Platform,Chinese Restaurant,Fried Chicken Joint,Event Service,Coffee Shop,2,1,1
237,South Tottenham,Haringey,LONDON,"N15, N17",51.57956,-0.074735,1,Grocery Store,Hardware Store,Pizza Place,Concert Hall,Park,2,1,1


Cluster 3

In [72]:
cluster3= Lon_merged.loc[Lon_merged['Cluster Labels'] == 2, Lon_merged.columns[[0] + list(range(1, Lon_merged.shape[1]))]]
cluster3['count1']=cluster3.groupby('1st Most Common Venue')['1st Most Common Venue'].transform('count')
cluster3['count2'] = cluster3.groupby('2nd Most Common Venue')['2nd Most Common Venue'].transform('count')
cluster3['count3'] = cluster3.groupby('3rd Most Common Venue')['3rd Most Common Venue'].transform('count')
cluster3.to_csv('cluster3.csv', index=False)
cluster3.sort_values(['count1'], ascending=False)

Unnamed: 0,Neighborhood,Borough,Post town,PostalCode,Lat,Lng,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,count1,count2,count3
15,Barnsbury,Islington,LONDON,N1,51.544315,-0.119489,2,Café,Grocery Store,Pub,Brewery,Nightclub,2,2,3
65,Clapham,"Lambeth, Wandsworth",LONDON,SW4,51.463665,-0.137203,2,Pub,Burger Joint,Café,Restaurant,Cocktail Bar,2,1,2
141,Lower Clapton,"Hackney,Hackney",LONDON,"E5,E5",51.552124,-0.047045,2,Coffee Shop,Café,Pub,Italian Restaurant,Bookstore,2,3,3
173,Manor House,Hackney,LONDON,N4,51.571825,-0.096708,2,Café,Trail,Clothing Store,Convenience Store,Reservoir,2,1,1
206,Parsons Green,Hammersmith and Fulham,LONDON,SW6,51.473553,-0.194398,2,Coffee Shop,Grocery Store,Italian Restaurant,Café,Pub,2,2,1
284,West Hackney,Hackney,LONDON,N16,51.561468,-0.068285,2,Pub,Café,Bar,Pizza Place,Coffee Shop,2,3,1
51,Canonbury,Islington,LONDON,N1,51.54385,-0.090664,2,Gastropub,Café,Grocery Store,Pub,Fish & Chips Shop,1,3,1
149,Kennington,"Lambeth, Southwark",LONDON,SE11,51.481409,-0.122078,2,Portuguese Restaurant,Pub,Café,Coffee Shop,Park,1,3,2
162,Vauxhall,"Lambeth,Lambeth",LONDON,"SW8,SW8",51.490396,-0.121708,2,Art Gallery,Hotel,Pub,Restaurant,Portuguese Restaurant,1,1,3
175,Maryland,Newham,LONDON,E15,51.545858,0.004608,2,Hotel,Pub,Bus Stop,Café,Grocery Store,1,3,1


Cluster 4

In [73]:
cluster4=Lon_merged.loc[Lon_merged['Cluster Labels'] == 3, Lon_merged.columns[[0] + list(range(1, Lon_merged.shape[1]))]]
cluster4['count1']=cluster4.groupby('1st Most Common Venue')['1st Most Common Venue'].transform('count')
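# NOTE: count2 and count3 reuse the '1st Most Common Venue' grouping, so they duplicate count1 in the output below.
# Grouping by '2nd Most Common Venue' and '3rd Most Common Venue', as for clusters 1-3, would give the per-rank counts.
cluster4['count2']=cluster4.groupby('1st Most Common Venue')['1st Most Common Venue'].transform('count')
cluster4['count3']=cluster4.groupby('1st Most Common Venue')['1st Most Common Venue'].transform('count')
cluster4.to_csv('cluster4.csv', index= False)
cluster4.sort_values(['count1'], ascending=False)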

Unnamed: 0,Neighborhood,Borough,Post town,PostalCode,Lat,Lng,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,count1,count2,count3
36,Brent Cross,Barnet,LONDON,"NW2, NW4",51.572061,-0.226573,3,Café,Clothing Store,Coffee Shop,Department Store,Electronics Store,2,2,2
76,Crystal Palace,Bromley,LONDON,"SE19, SE20, SE26",51.420359,-0.072803,3,Platform,Café,Breakfast Spot,Italian Restaurant,Farm,2,2,2
137,Holland Park,Kensington and Chelsea,LONDON,"W8, W11, W14",51.503409,-0.206186,3,Café,Garden,Grocery Store,Restaurant,Tennis Court,2,2,2
263,Tottenham Hale,Haringey,LONDON,"N15, N17",51.588308,-0.059929,3,Platform,Fast Food Restaurant,Clothing Store,Furniture / Home Store,Coffee Shop,2,2,2
43,"Burroughs, The",Barnet,LONDON,NW4,51.587404,-0.230307,3,Coffee Shop,Burger Joint,Japanese Restaurant,Grocery Store,Chinese Restaurant,1,1,1
113,Grange Park,Enfield,LONDON,N21,51.649271,-0.103584,3,Golf Course,Dance Studio,Zoo Exhibit,Food Court,Farmers Market,1,1,1
128,Harringay,Haringey,LONDON,"N4, N8, N15",51.58178,-0.100622,3,Turkish Restaurant,Café,Coffee Shop,Pub,Bakery,1,1,1
145,The Hyde,Barnet,LONDON,NW9,51.584968,-0.247723,3,Asian Restaurant,Pub,Pet Store,Hookah Bar,Auto Workshop,1,1,1
170,Little Ilford,Newham,LONDON,E12,51.550148,0.068263,3,Indian Restaurant,Ice Cream Shop,Fried Chicken Joint,Grocery Store,Farm,1,1,1
198,Whetstone,"Barnet,Barnet",LONDON,"N20,N20",51.626106,-0.1739,3,Pub,Coffee Shop,Auto Garage,Mediterranean Restaurant,Pharmacy,1,1,1


Cluster 5

In [74]:
cluster5=Lon_merged.loc[Lon_merged['Cluster Labels'] == 4, Lon_merged.columns[[0] + list(range(1, Lon_merged.shape[1]))]]
cluster5['count1']=cluster5.groupby('1st Most Common Venue')['1st Most Common Venue'].transform('count')
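# NOTE: count2 and count3 reuse the '1st Most Common Venue' grouping, so they duplicate count1 in the output below.
# Grouping by '2nd Most Common Venue' and '3rd Most Common Venue', as for clusters 1-3, would give the per-rank counts.
cluster5['count2']=cluster5.groupby('1st Most Common Venue')['1st Most Common Venue'].transform('count')
cluster5['count3']=cluster5.groupby('1st Most Common Venue')['1st Most Common Venue'].transform('count')
cluster5.to_csv('cluster5.csv', index=False)
cluster5.sort_values(['count1'], ascending=False)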

Unnamed: 0,Neighborhood,Borough,Post town,PostalCode,Lat,Lng,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,count1,count2,count3
23,Belsize Park,Camden,LONDON,NW3,51.545046,-0.165609,4,Café,Italian Restaurant,Convenience Store,Hotel,Hotel Bar,5,5,5
54,Chalk Farm,Camden,LONDON,NW1,51.543966,-0.154115,4,Café,Bar,Pub,Italian Restaurant,Pizza Place,5,5,5
80,Hackney,"Hackney,Hackney,Hackney",LONDON,"E5, E8, E9, N1, N16,E5, E8, E9, N1, N16,E5, E8...",51.543377,-0.061841,4,Café,Pub,Thai Restaurant,Gym / Fitness Center,Vietnamese Restaurant,5,5,5
81,Hackney Central,"Hackney,Hackney,Hackney",LONDON,"E8,E8,E8",51.543377,-0.061841,4,Café,Pub,Thai Restaurant,Gym / Fitness Center,Vietnamese Restaurant,5,5,5
188,New Cross,Lewisham,LONDON,SE14,51.471008,-0.036112,4,Café,Bus Stop,Coffee Shop,Gastropub,Indie Movie Theater,5,5,5
1,Acton,"Ealing, Hammersmith and Fulham",LONDON,"W3, W4",51.510588,-0.264989,4,Pub,Gym / Fitness Center,Park,Shopping Mall,Bakery,3,3,3
280,Wapping,Tower Hamlets,LONDON,E1,51.507432,-0.063367,4,Pub,Park,Grocery Store,Bar,History Museum,3,3,3
26,Blackfriars,City,LONDON,EC4,51.510764,-0.102136,4,Coffee Shop,Italian Restaurant,Restaurant,Bakery,Pub,3,3,3
265,Tower Hill,Tower Hamlets,LONDON,EC3,51.508615,-0.080609,4,Hotel,Coffee Shop,Gym / Fitness Center,French Restaurant,Sandwich Place,3,3,3
214,Putney,Wandsworth,LONDON,SW15,51.465005,-0.223529,4,Coffee Shop,Pub,Indian Restaurant,Grocery Store,Café,3,3,3


Although Cluster 1 apparently has the highest frequency of coffee shops, cafés, pubs, and bars among the top 3 most common venues, I still conduct a chi-square test to check whether this conclusion holds statistically.

To do so, I counted, for each cluster, the neighborhoods whose most common venue is a pub, bar, coffee shop, or café, and applied a chi-square test to the five totals.

In [75]:
cluster1=cluster1.rename(columns={'1st Most Common Venue':'toponeVenue'})
pub_count1=cluster1.loc[cluster1.toponeVenue== 'Pub', 'toponeVenue'].count()
bar_count1=cluster1.loc[cluster1.toponeVenue== 'Bar', 'toponeVenue'].count()
coffeeshop_count1=cluster1.loc[cluster1.toponeVenue== 'Coffee Shop', 'toponeVenue'].count()
cafe_count1=cluster1.loc[cluster1.toponeVenue== 'Café', 'toponeVenue'].count()
numbers1=[pub_count1, bar_count1, coffeeshop_count1, cafe_count1]
beverage1=sum(numbers1)

cluster2=cluster2.rename(columns={'1st Most Common Venue':'toponeVenue'})
pub_count2=cluster2.loc[cluster2.toponeVenue== 'Pub', 'toponeVenue'].count()
bar_count2=cluster2.loc[cluster2.toponeVenue== 'Bar', 'toponeVenue'].count()
coffeeshop_count2=cluster2.loc[cluster2.toponeVenue== 'Coffee Shop', 'toponeVenue'].count()
cafe_count2=cluster2.loc[cluster2.toponeVenue== 'Café', 'toponeVenue'].count()
numbers2=[pub_count2, bar_count2, coffeeshop_count2, cafe_count2]
beverage2=sum(numbers2)

cluster3=cluster3.rename(columns={'1st Most Common Venue':'toponeVenue'})
pub_count3=cluster3.loc[cluster3.toponeVenue== 'Pub', 'toponeVenue'].count()
bar_count3=cluster3.loc[cluster3.toponeVenue== 'Bar', 'toponeVenue'].count()
coffeeshop_count3=cluster3.loc[cluster3.toponeVenue== 'Coffee Shop', 'toponeVenue'].count()
cafe_count3=cluster3.loc[cluster3.toponeVenue== 'Café', 'toponeVenue'].count()
numbers3=[pub_count3, bar_count3, coffeeshop_count3, cafe_count3]
beverage3=sum(numbers3)

cluster4=cluster4.rename(columns={'1st Most Common Venue':'toponeVenue'})
pub_count4=cluster4.loc[cluster4.toponeVenue== 'Pub', 'toponeVenue'].count()
bar_count4=cluster4.loc[cluster4.toponeVenue== 'Bar', 'toponeVenue'].count()
coffeeshop_count4=cluster4.loc[cluster4.toponeVenue== 'Coffee Shop', 'toponeVenue'].count()
cafe_count4=cluster4.loc[cluster4.toponeVenue== 'Café', 'toponeVenue'].count()
numbers4=[pub_count4, bar_count4, coffeeshop_count4, cafe_count4]
beverage4=sum(numbers4)

cluster5=cluster5.rename(columns={'1st Most Common Venue':'toponeVenue'})
pub_count5=cluster5.loc[cluster5.toponeVenue== 'Pub', 'toponeVenue'].count()
bar_count5=cluster5.loc[cluster5.toponeVenue== 'Bar', 'toponeVenue'].count()
coffeeshop_count5=cluster5.loc[cluster5.toponeVenue== 'Coffee Shop', 'toponeVenue'].count()
cafe_count5=cluster5.loc[cluster5.toponeVenue== 'Café', 'toponeVenue'].count()
numbers5=[pub_count5, bar_count5, coffeeshop_count5, cafe_count5]
beverage5=sum(numbers5)

from scipy.stats import chisquare
chisquare([beverage1, beverage2, beverage3, beverage4, beverage5])

Power_divergenceResult(statistic=286.5652173913044, pvalue=8.557909248426446e-61)
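When no expected frequencies are passed, scipy.stats.chisquare assumes that all categories are equally likely, so the test above asks whether the beverage counts are spread evenly across the five clusters. Because the clusters differ in size, it may also be worth comparing against expected counts that are proportional to cluster size. The sketch below computes the Pearson statistic by hand for both cases; the function name chi_square_statistic and all counts and cluster sizes are hypothetical, chosen only to keep the arithmetic easy to follow, and are not the values from this notebook.

```python
def chi_square_statistic(observed, expected=None):
    """Pearson chi-square goodness-of-fit statistic.

    With expected=None every category is given the mean of the observed
    counts, i.e. the categories are assumed to be equally likely.
    """
    if expected is None:
        mean = sum(observed) / len(observed)
        expected = [mean] * len(observed)
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical beverage counts per cluster (NOT the notebook's values)
observed = [40, 10, 10, 10, 10]

# Case 1: equal expected frequencies
stat_uniform = chi_square_statistic(observed)

# Case 2: expected frequencies proportional to hypothetical cluster sizes
cluster_sizes = [100, 25, 25, 25, 25]
total_observed = sum(observed)
total_size = sum(cluster_sizes)
expected_by_size = [total_observed * n / total_size for n in cluster_sizes]
stat_by_size = chi_square_statistic(observed, expected_by_size)

print(stat_uniform, stat_by_size)
```

In this made-up example the first cluster looks dominant under the equal-frequency assumption, yet the difference disappears once cluster size is taken into account, which is why the choice of expected frequencies matters when interpreting the result.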

Next, I am going to identify the neighborhoods whose top 3 most common venues are all coffee shops, cafés, pubs, or bars.

In [76]:
# Reload cluster 1; the saved file already uses the renamed 'toponeVenue' column
cluster1=pd.read_csv('cluster1.csv')
# Keep only rows whose 1st, 2nd, and 3rd most common venues are all beverage venues
cluster1_sub = cluster1[cluster1['toponeVenue'].isin(['Coffee Shop', 'Café', 'Pub', 'Bar'])]
cluster1_sub2=cluster1_sub[cluster1_sub['2nd Most Common Venue'].isin(['Coffee Shop', 'Café', 'Pub', 'Bar'])]
cluster1_sub3=cluster1_sub2[cluster1_sub2['3rd Most Common Venue'].isin(['Coffee Shop', 'Café', 'Pub', 'Bar'])]
cluster1_sub3

Unnamed: 0,Neighborhood,Borough,Post town,PostalCode,Lat,Lng,Cluster Labels,toponeVenue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,count1,count2,count3
25,Bethnal Green,Tower Hamlets,LONDON,E2,51.525405,-0.062604,0,Pub,Café,Coffee Shop,Grocery Store,Gym,46,30,11
44,Camberwell,Southwark,LONDON,SE5,51.473755,-0.093593,0,Café,Pub,Coffee Shop,Grocery Store,Bus Stop,26,21,11
66,Clerkenwell,"Islington,Islington",LONDON,"EC1,EC1",51.52611,-0.105823,0,Pub,Coffee Shop,Café,Hotel,Bar,46,23,17
67,Finsbury,"Islington,Islington",LONDON,"EC1,EC1",51.52611,-0.105823,0,Pub,Coffee Shop,Café,Hotel,Bar,46,23,17
83,De Beauvoir Town,Islington,LONDON,N1,51.540989,-0.08069,0,Pub,Coffee Shop,Café,Deli / Bodega,Cocktail Bar,46,23,17
138,Holloway,"Islington,Islington",LONDON,"N7,N7",51.556873,-0.117528,0,Pub,Café,Coffee Shop,Bus Stop,Fast Food Restaurant,46,30,11
139,Nag's Head,"Islington,Islington",LONDON,"N7,N7",51.556873,-0.117528,0,Pub,Café,Coffee Shop,Bus Stop,Fast Food Restaurant,46,30,11
140,Homerton,"Hackney,Hackney",LONDON,"E9,E9",51.552124,-0.047045,0,Coffee Shop,Café,Pub,Italian Restaurant,Bookstore,32,30,25
248,Stockwell,Lambeth,LONDON,"SW8, SW9",51.463436,-0.122816,0,Pub,Bar,Coffee Shop,Café,African Restaurant,46,5,11
266,Tufnell Park,Islington,LONDON,"N7, N19",51.553532,-0.133534,0,Pub,Coffee Shop,Café,Bus Stop,Indian Restaurant,46,23,17


Then I am going to show these neighborhoods on a map.

In [77]:
locations=cluster1_sub3[['Lat', 'Lng']]
locationlist=locations.values.tolist()
len(locationlist)
locationlist[7]

[51.552124, -0.047044617]

In [78]:

import folium
import pandas as pd
location = cluster1_sub3['Lat'].mean(), cluster1_sub3['Lng'].mean()
locationlist = cluster1_sub3[["Lat","Lng"]].values.tolist()
labels = cluster1_sub3["Neighborhood"].values.tolist()
m = folium.Map(location=location, zoom_start=12)

for point in range(len(locationlist)):
    popup = folium.Popup(labels[point], parse_html=True)
    folium.Marker(locationlist[point], popup=popup).add_to(m)

m.save("m.html")
m 

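As can be seen above, the map shows 8 distinct locations whose top 3 most common venues are all coffee shops, cafés, pubs, or bars. The table lists 10 neighborhoods because Clerkenwell and Finsbury share one set of coordinates, as do Holloway and Nag's Head.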

To be on the safe side for a new company, I will further narrow the selection to neighborhoods whose top 4 most common venues all fall into the aforementioned categories.

In [79]:
cluster1_sub4=cluster1_sub3[cluster1_sub3['4th Most Common Venue'].isin(['Coffee Shop', 'Café', 'Pub', 'Bar'])]
cluster1_sub4

Unnamed: 0,Neighborhood,Borough,Post town,PostalCode,Lat,Lng,Cluster Labels,toponeVenue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,count1,count2,count3
248,Stockwell,Lambeth,LONDON,"SW8, SW9",51.463436,-0.122816,0,Pub,Bar,Coffee Shop,Café,African Restaurant,46,5,11
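The top-3 and top-4 filters apply the same rule with a different cut-off, so they can be written as one function of n. The following is a minimal, pandas-free sketch under that assumption; the names BEVERAGE_CATEGORIES, RANK_COLUMNS, and top_n_all_beverage are hypothetical, the rank columns use their original names (before the rename to 'toponeVenue'), and the three sample rows are copied from the cluster 1 tables above.

```python
BEVERAGE_CATEGORIES = {"Coffee Shop", "Café", "Pub", "Bar"}
RANK_COLUMNS = [
    "1st Most Common Venue",
    "2nd Most Common Venue",
    "3rd Most Common Venue",
    "4th Most Common Venue",
    "5th Most Common Venue",
]


def top_n_all_beverage(rows, n):
    """Keep rows whose first n most common venues are all beverage venues."""
    columns = RANK_COLUMNS[:n]
    return [
        row for row in rows
        if all(row[column] in BEVERAGE_CATEGORIES for column in columns)
    ]


def make_row(name, venues):
    """Build a row dictionary from a neighborhood name and its ranked venues."""
    return {"Neighborhood": name, **dict(zip(RANK_COLUMNS, venues))}


# Three rows taken from the cluster 1 output tables
rows = [
    make_row("Hornsey", ["Pub", "Gym", "Pizza Place", "Gym / Fitness Center", "Hotel"]),
    make_row("Bethnal Green", ["Pub", "Café", "Coffee Shop", "Grocery Store", "Gym"]),
    make_row("Stockwell", ["Pub", "Bar", "Coffee Shop", "Café", "African Restaurant"]),
]

top3 = [row["Neighborhood"] for row in top_n_all_beverage(rows, 3)]
top4 = [row["Neighborhood"] for row in top_n_all_beverage(rows, 4)]
print(top3)
print(top4)
```

Keeping the category set in a single place also avoids small inconsistencies between the filters for the individual rank columns.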


The good news is that only one neighborhood, Stockwell, has all of its top 4 most common venues in the beverage-selling categories.

The next step is to find the names and locations of these beverage-selling places, which are the potential customers for the new company to target.

In [26]:
import foursquare
client = foursquare.Foursquare(client_id=CLIENT_ID, client_secret=CLIENT_SECRET, version=VERSION)

In [27]:
client.venues.search(params={'query': 'coffee', 'll': '51.463436,-0.122816'})

{'venues': [{'id': '4f4b8947e4b007d391e6b3ad',
   'name': 'Costa Coffee',
   'location': {'address': '458-460 Brixton Rd',
    'lat': 51.46293676351596,
    'lng': -0.11490881941917813,
    'labeledLatLngs': [{'label': 'display',
      'lat': 51.46293676351596,
      'lng': -0.11490881941917813}],
    'distance': 551,
    'postalCode': 'SW9 8EA',
    'cc': 'GB',
    'neighborhood': 'Brixton',
    'city': 'Brixton',
    'state': 'Greater London',
    'country': 'United Kingdom',
    'formattedAddress': ['458-460 Brixton Rd',
     'Brixton',
     'Greater London',
     'SW9 8EA',
     'United Kingdom']},
   'categories': [{'id': '4bf58dd8d48988d1e0931735',
     'name': 'Coffee Shop',
     'pluralName': 'Coffee Shops',
     'shortName': 'Coffee Shop',
     'icon': {'prefix': 'https://ss3.4sqi.net/img/categories_v2/food/coffeeshop_',
      'suffix': '.png'},
     'primary': True}],
   'referralId': 'v-1548640376',
   'hasPerk': False},
  {'id': '5a19c873356b4976e646ed4b',
   'name': 'The C

In [41]:
coffee=client.venues.search(params={'query': 'coffee', 'll': '51.463436,-0.122816'})
nearby_coffee=pd.DataFrame(coffee)
nearby_coffee

Unnamed: 0,venues
0,"{'id': '4f4b8947e4b007d391e6b3ad', 'name': 'Co..."
1,"{'id': '5a19c873356b4976e646ed4b', 'name': 'Th..."
2,"{'id': '53cb7a97498edc8c0e5162ee', 'name': 'Th..."
3,"{'id': '4f54937ce4b0e14ed9512da2', 'name': 'Oz..."
4,"{'id': '4ac518edf964a520c1ac20e3', 'name': 'Mo..."
5,"{'id': '4ad9a8acf964a5205f1a21e3', 'name': 'Mo..."
6,"{'id': '4ad21f70f964a52082df20e3', 'name': 'Co..."
7,"{'id': '4b2328d1f964a520fb5324e3', 'name': 'Co..."
8,"{'id': '5565c24d498e2885d1b9a54e', 'name': 'Co..."
9,"{'id': '5883913551d19e062a758a77', 'name': 'Ra..."
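Passing the raw response to pd.DataFrame leaves each venue as a single dictionary in the 'venues' column, which makes names and addresses hard to read. The following is a minimal sketch of flattening the response into one row per venue, assuming the response structure shown above; the helper name flatten_venues and the selection of fields are my own choices, and the sample is a trimmed copy of the first venue in the 'coffee' response.

```python
def flatten_venues(response):
    """Turn a venues-search response dict into a list of flat row dicts."""
    rows = []
    for venue in response.get("venues", []):
        location = venue.get("location", {})
        categories = venue.get("categories", [])
        rows.append({
            "id": venue.get("id"),
            "name": venue.get("name"),
            # Use the first listed category, if any
            "category": categories[0]["name"] if categories else None,
            "address": location.get("address"),
            "postalCode": location.get("postalCode"),
            "lat": location.get("lat"),
            "lng": location.get("lng"),
            "distance": location.get("distance"),
        })
    return rows


# Trimmed copy of the first venue in the 'coffee' response shown above
sample_response = {
    "venues": [
        {
            "id": "4f4b8947e4b007d391e6b3ad",
            "name": "Costa Coffee",
            "location": {
                "address": "458-460 Brixton Rd",
                "lat": 51.46293676351596,
                "lng": -0.11490881941917813,
                "distance": 551,
                "postalCode": "SW9 8EA",
            },
            "categories": [{"name": "Coffee Shop", "primary": True}],
        }
    ]
}

coffee_rows = flatten_venues(sample_response)
print(coffee_rows[0]["name"], coffee_rows[0]["category"], coffee_rows[0]["distance"])
```

The resulting list of dictionaries can be passed to pd.DataFrame to obtain one column per field instead of a single column of nested dictionaries.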


In [104]:
client.venues.search(params={'query': 'tea', 'll': '51.463436,-0.122816'})

{'venues': [{'id': '4fd4bdd5e4b0052fbc13a808',
   'name': 'Diamond Jubilee Tea Salon',
   'location': {'address': '181 Piccadilly',
    'crossStreet': 'Fortnum & Mason',
    'lat': 51.50818280532083,
    'lng': -0.13847303307163822,
    'labeledLatLngs': [{'label': 'display',
      'lat': 51.50818280532083,
      'lng': -0.13847303307163822}],
    'distance': 5098,
    'postalCode': 'W1J 9EH',
    'cc': 'GB',
    'city': 'London',
    'state': 'Greater London',
    'country': 'United Kingdom',
    'formattedAddress': ['181 Piccadilly (Fortnum & Mason)',
     'London',
     'Greater London',
     'W1J 9EH',
     'United Kingdom']},
   'categories': [{'id': '4bf58dd8d48988d1dc931735',
     'name': 'Tea Room',
     'pluralName': 'Tea Rooms',
     'shortName': 'Tea Room',
     'icon': {'prefix': 'https://ss3.4sqi.net/img/categories_v2/food/tearoom_',
      'suffix': '.png'},
     'primary': True}],
   'referralId': 'v-1548157874',
   'hasPerk': False},
  {'id': '53691b16498e84a13e31fce5',


In [52]:
tea=client.venues.search(params={'query': 'tea', 'll': '51.463436,-0.122816'})
nearby_tea=pd.DataFrame(tea)
nearby_tea

Unnamed: 0,venues
0,"{'id': '4fd4bdd5e4b0052fbc13a808', 'name': 'Di..."
1,"{'id': '53691b16498e84a13e31fce5', 'name': 'Af..."
2,"{'id': '4d8490f07e8ef04dde5b0dbe', 'name': 'Re..."
3,"{'id': '54d77a21498e92bf4aa0a917', 'name': '2L..."
4,"{'id': '4ebc109e30f8ec57ef37a0df', 'name': 'Th..."
5,"{'id': '5853cb1352a0510ab8171b85', 'name': 'Af..."
6,"{'id': '58278cba88a1a041b7abf009', 'name': 'Co..."
7,"{'id': '53d0d723498eda9618d44c62', 'name': 'Bi..."
8,"{'id': '4e7c4d38183853fb9f712dcb', 'name': '2 ..."
9,"{'id': '58d13c2a0b56560b9c30d4d4', 'name': 'Bi..."


In [44]:
client.venues.search(params={'query': 'pub', 'll': '51.463436,-0.122816'})

{'venues': [{'id': '4b09a2a4f964a520ba1a23e3',
   'name': 'CASK Pub And Kitchen',
   'location': {'address': '6 Charlwood St.',
    'crossStreet': 'at Tachbrook St.',
    'lat': 51.49110107756983,
    'lng': -0.13746538073485792,
    'labeledLatLngs': [{'label': 'display',
      'lat': 51.49110107756983,
      'lng': -0.13746538073485792}],
    'distance': 3242,
    'postalCode': 'SW1V 2EE',
    'cc': 'GB',
    'city': 'Pimlico',
    'state': 'Greater London',
    'country': 'United Kingdom',
    'formattedAddress': ['6 Charlwood St. (at Tachbrook St.)',
     'Pimlico',
     'Greater London',
     'SW1V 2EE',
     'United Kingdom']},
   'categories': [{'id': '4bf58dd8d48988d11b941735',
     'name': 'Pub',
     'pluralName': 'Pubs',
     'shortName': 'Pub',
     'icon': {'prefix': 'https://ss3.4sqi.net/img/categories_v2/nightlife/pub_',
      'suffix': '.png'},
     'primary': True}],
   'referralId': 'v-1548645433',
   'hasPerk': False},
  {'id': '4b9ebbf5f964a52092fd36e3',
   'name': 

In [50]:
pub=client.venues.search(params={'query': 'pub', 'll': '51.463436,-0.122816'})
nearby_pub=pd.DataFrame(pub)
nearby_pub

Unnamed: 0,venues
0,"{'id': '4b09a2a4f964a520ba1a23e3', 'name': 'CA..."
1,"{'id': '4b9ebbf5f964a52092fd36e3', 'name': 'Lo..."
2,"{'id': '4ac518c5f964a520c7a420e3', 'name': 'Pu..."
3,"{'id': '57fbefa8498e37e267a5df7d', 'name': 'Pu..."
4,"{'id': '4b45dcabf964a5204e1126e3', 'name': 'Wi..."
5,"{'id': '4d44995a1911a09363e9d9d8', 'name': 'Th..."
6,"{'id': '4d4dd376fe7fb1f7c1d36042', 'name': 'Fe..."
7,"{'id': '4b530727f964a520f48c27e3', 'name': 'Th..."
8,"{'id': '4f19a912e4b04692a0c6925d', 'name': 'Wa..."
9,"{'id': '4be6bc57bcef2d7f026505e5', 'name': 'Th..."


In [106]:
client.venues.search(params={'query': 'bar', 'll': '51.463436,-0.122816'})

{'venues': [{'id': '5073eaf9d63e722af9c7df92',
   'name': 'Buddha-Bar',
   'location': {'address': '145 Knightsbridge',
    'crossStreet': 'Brompton Rd',
    'lat': 51.50198418119578,
    'lng': -0.1619833371342229,
    'labeledLatLngs': [{'label': 'display',
      'lat': 51.50198418119578,
      'lng': -0.1619833371342229}],
    'distance': 5078,
    'postalCode': 'SW1X 7PA',
    'cc': 'GB',
    'city': 'London',
    'state': 'Greater London',
    'country': 'United Kingdom',
    'formattedAddress': ['145 Knightsbridge (Brompton Rd)',
     'London',
     'Greater London',
     'SW1X 7PA',
     'United Kingdom']},
   'categories': [{'id': '4bf58dd8d48988d142941735',
     'name': 'Asian Restaurant',
     'pluralName': 'Asian Restaurants',
     'shortName': 'Asian',
     'icon': {'prefix': 'https://ss3.4sqi.net/img/categories_v2/food/asian_',
      'suffix': '.png'},
     'primary': True}],
   'venuePage': {'id': '42749858'},
   'referralId': 'v-1548157897',
   'hasPerk': False},
  {'id'

In [51]:
bar=client.venues.search(params={'query': 'bar', 'll': '51.463436,-0.122816'})
nearby_bar=pd.DataFrame(bar)
nearby_bar

Unnamed: 0,venues
0,"{'id': '5073eaf9d63e722af9c7df92', 'name': 'Bu..."
1,"{'id': '4ae8cfe1f964a52083b221e3', 'name': 'SW..."
2,"{'id': '507e8ef152622f097e39c008', 'name': 'Ad..."
3,"{'id': '5a19c873356b4976e646ed4b', 'name': 'Th..."
4,"{'id': '527a8644498ec716be9d5129', 'name': 'Th..."
5,"{'id': '50b788d7e4b0c9f8dc77784d', 'name': 'Li..."
6,"{'id': '4ac518e3f964a520bbaa20e3', 'name': 'Ta..."
7,"{'id': '4ec4e0b349017fff08b684ef', 'name': 'Ca..."
8,"{'id': '4ac518c8f964a52081a520e3', 'name': 'Ku..."
9,"{'id': '5887ed25561ded45b3ae39b6', 'name': 'Th..."
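Two things stand out in the responses above. First, several results are far from Stockwell: the first 'tea' result is 5098 m away, the first 'pub' result 3242 m, and the first 'bar' result 5078 m. Second, the same venue can be returned by more than one query; for example, the id 5a19c873356b4976e646ed4b appears in both the 'coffee' and the 'bar' results. The following is a minimal sketch of merging flattened results, keeping each venue once and dropping those beyond a distance cut-off. The helper name nearby_unique and the 1000 m cut-off are hypothetical, and the repeated Costa Coffee entry under 'bar' is added only to exercise the de-duplication; the ids, names, and distances otherwise come from the outputs above.

```python
def nearby_unique(rows_by_query, max_distance_m):
    """Merge rows from several queries, one entry per venue id, within a cut-off."""
    seen = {}
    for query, rows in rows_by_query.items():
        for row in rows:
            if row["distance"] > max_distance_m:
                continue  # too far from the neighborhood center
            entry = seen.setdefault(row["id"], {**row, "queries": []})
            entry["queries"].append(query)
    # Closest venues first
    return sorted(seen.values(), key=lambda r: r["distance"])


costa = {"id": "4f4b8947e4b007d391e6b3ad", "name": "Costa Coffee", "distance": 551}

rows_by_query = {
    "coffee": [costa],
    "tea": [{"id": "4fd4bdd5e4b0052fbc13a808",
             "name": "Diamond Jubilee Tea Salon", "distance": 5098}],
    "pub": [{"id": "4b09a2a4f964a520ba1a23e3",
             "name": "CASK Pub And Kitchen", "distance": 3242}],
    "bar": [
        {"id": "5073eaf9d63e722af9c7df92", "name": "Buddha-Bar", "distance": 5078},
        dict(costa),  # hypothetical repeat, only to show de-duplication
    ],
}

MAX_DISTANCE_M = 1000  # hypothetical cut-off in meters

customers = nearby_unique(rows_by_query, MAX_DISTANCE_M)
for venue in customers:
    print(venue["name"], venue["distance"], venue["queries"])
```

With this cut-off only venues within about 1 km of the Stockwell coordinates remain, which seems closer to the goal of finding potential customers in the selected neighborhood; a different cut-off may suit other neighborhoods.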
