In [1]:
# Importing libraries
import numpy as np
import pandas as pd
import requests  # library to handle HTTP requests
from pandas import json_normalize  # transform JSON into a pandas DataFrame (pandas >= 1.0; older versions use pandas.io.json)

We have found a good source of information that lists the London boroughs and their coordinates:

https://en.wikipedia.org/wiki/List_of_London_boroughs

So let's scrape it from the web.

In [2]:
# read_html returns a list of DataFrames, one for each HTML table found at the URL
london = pd.read_html(r'https://en.wikipedia.org/wiki/List_of_London_boroughs')

In [3]:
# Check how many tables were found on the page
print(len(london))

4


In [4]:
# Let's check the tables one by one to find the one we need
london[0].head()

Unnamed: 0,0,1,2,3,4,5,6,7,8,9
0,Borough,Inner,Status,Local authority,Political control,Headquarters,Area (sq mi),Population (2013 est)[1],Co-ordinates,Nr. in map
1,Barking and Dagenham [note 1],,,Barking and Dagenham London Borough Council,Labour,"Town Hall, 1 Town Square",13.93,194352,".mw-parser-output .geo-default,.mw-parser-outp...",25
2,Barnet,,,Barnet London Borough Council,Conservative,"Barnet House, 2 Bristol Avenue, Colindale",33.49,369088,51°37′31″N 0°09′06″W﻿ / ﻿51.6252°N 0.1517°W,31
3,Bexley,,,Bexley London Borough Council,Conservative,"Civic Offices, 2 Watling Street",23.38,236687,51°27′18″N 0°09′02″E﻿ / ﻿51.4549°N 0.1505°E,23
4,Brent,,,Brent London Borough Council,Labour,"Brent Civic Centre, Engineers Way",16.70,317264,51°33′32″N 0°16′54″W﻿ / ﻿51.5588°N 0.2817°W,12


In [5]:
# We were lucky: it is the first one, so let's bring it into a dataframe
london_df = pd.DataFrame(london[0])
london_df.head()

Unnamed: 0,0,1,2,3,4,5,6,7,8,9
0,Borough,Inner,Status,Local authority,Political control,Headquarters,Area (sq mi),Population (2013 est)[1],Co-ordinates,Nr. in map
1,Barking and Dagenham [note 1],,,Barking and Dagenham London Borough Council,Labour,"Town Hall, 1 Town Square",13.93,194352,".mw-parser-output .geo-default,.mw-parser-outp...",25
2,Barnet,,,Barnet London Borough Council,Conservative,"Barnet House, 2 Bristol Avenue, Colindale",33.49,369088,51°37′31″N 0°09′06″W﻿ / ﻿51.6252°N 0.1517°W,31
3,Bexley,,,Bexley London Borough Council,Conservative,"Civic Offices, 2 Watling Street",23.38,236687,51°27′18″N 0°09′02″E﻿ / ﻿51.4549°N 0.1505°E,23
4,Brent,,,Brent London Borough Council,Labour,"Brent Civic Centre, Engineers Way",16.70,317264,51°33′32″N 0°16′54″W﻿ / ﻿51.5588°N 0.2817°W,12


In [6]:
# We may want to rename the columns later, but for now let's use row 0, as it contains the headers
london_df.columns = london_df.iloc[0]

# Now drop row 0, as we won't need it any longer
london_df.drop([0], inplace=True)

london_df.head()

Unnamed: 0,Borough,Inner,Status,Local authority,Political control,Headquarters,Area (sq mi),Population (2013 est)[1],Co-ordinates,Nr. in map
1,Barking and Dagenham [note 1],,,Barking and Dagenham London Borough Council,Labour,"Town Hall, 1 Town Square",13.93,194352,".mw-parser-output .geo-default,.mw-parser-outp...",25
2,Barnet,,,Barnet London Borough Council,Conservative,"Barnet House, 2 Bristol Avenue, Colindale",33.49,369088,51°37′31″N 0°09′06″W﻿ / ﻿51.6252°N 0.1517°W,31
3,Bexley,,,Bexley London Borough Council,Conservative,"Civic Offices, 2 Watling Street",23.38,236687,51°27′18″N 0°09′02″E﻿ / ﻿51.4549°N 0.1505°E,23
4,Brent,,,Brent London Borough Council,Labour,"Brent Civic Centre, Engineers Way",16.7,317264,51°33′32″N 0°16′54″W﻿ / ﻿51.5588°N 0.2817°W,12
5,Bromley,,,Bromley London Borough Council,Conservative,"Civic Centre, Stockwell Close",57.97,317899,51°24′14″N 0°01′11″E﻿ / ﻿51.4039°N 0.0198°E,20


In [7]:
# We only need the Borough and the coordinates, so let's keep just those two columns
london_df = london_df[['Borough','Co-ordinates']]

### Our client wants to be in a central area of London, so let's establish which areas are more central than others

For that we found a good source of information:

https://en.wikipedia.org/wiki/Inner_London

This tells us which boroughs are considered Inner London.

The City of London is part of Inner London but is not a London borough, so it is not in our dataframe; we will add it after some further wrangling.

In [8]:
# Create a list of the Inner London boroughs, spelled as in the scraped table (footnote markers included)
inner = ['Camden','Greenwich [note 2]','Hackney','Hammersmith and Fulham [note 4]','Islington','Kensington and Chelsea','Lambeth','Lewisham','Southwark','Tower Hamlets','Wandsworth','Westminster']

# Now we use our list to create a dataframe containing only the selected inner boroughs
inner_london = london_df[london_df['Borough'].isin(inner)]

# Let's also reset the index and drop the extra 'index' column that reset_index creates from the old index
inner_london.reset_index(inplace=True)
inner_london.drop(['index'],axis=1,inplace=True)

inner_london.head()

A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  errors=errors)


Unnamed: 0,Borough,Co-ordinates
0,Camden,51°31′44″N 0°07′32″W﻿ / ﻿51.5290°N 0.1255°W
1,Greenwich [note 2],51°29′21″N 0°03′53″E﻿ / ﻿51.4892°N 0.0648°E
2,Hackney,51°32′42″N 0°03′19″W﻿ / ﻿51.5450°N 0.0553°W
3,Hammersmith and Fulham [note 4],51°29′34″N 0°14′02″W﻿ / ﻿51.4927°N 0.2339°W
4,Islington,51°32′30″N 0°06′08″W﻿ / ﻿51.5416°N 0.1022°W


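The message above is a SettingWithCopyWarning, not an error: pandas warns because inner_london was created as a filtered slice of london_df, so it cannot tell whether in-place changes are meant to reach the original. It does not affect our result, so we can ignore it.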
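A minimal sketch of the usual way to avoid this warning, on a small hand-made stand-in for london_df (placeholder coordinates, illustrative rows only): take an explicit `.copy()` of the filtered rows, so later in-place changes act on an independent DataFrame. Using `reset_index(drop=True)` also removes the need to drop the 'index' column afterwards.

```python
import pandas as pd

# Hypothetical stand-in for london_df (a few rows, placeholder coordinates)
london_df = pd.DataFrame({
    "Borough": ["Barnet", "Camden", "Greenwich [note 2]", "Bexley"],
    "Co-ordinates": ["n/a", "n/a", "n/a", "n/a"],
})
inner = ["Camden", "Greenwich [note 2]"]

# .copy() makes inner_london an independent DataFrame rather than a slice of london_df
inner_london = london_df[london_df["Borough"].isin(inner)].copy()

# drop=True discards the old index instead of inserting it as an 'index' column
inner_london.reset_index(drop=True, inplace=True)

# Edits now apply to the copy only, with no SettingWithCopyWarning
inner_london["Borough"] = inner_london["Borough"].replace("Greenwich [note 2]", "Greenwich")
print(inner_london)
```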

Now let's replace a few values that contain characters we don't want (the Wikipedia footnote markers such as "[note 2]").

There are more general ways of doing this, but they are unnecessary complications for such a small dataframe, so let's do it the quick and dirty way, as the result is exactly what we need.
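For reference, one of the more general ways could be a single regex replace that removes every footnote marker at once. This is a sketch that assumes the markers always follow the pattern " [note N]", as they do in the scraped table; the Series below is hand-made.

```python
import pandas as pd

# Borough names as scraped, with Wikipedia footnote markers such as "[note 2]"
boroughs = pd.Series([
    "Camden",
    "Greenwich [note 2]",
    "Hammersmith and Fulham [note 4]",
])

# Remove any " [note N]" suffix in one vectorised, regex-based replace
clean = boroughs.str.replace(r"\s*\[note \d+\]", "", regex=True)
print(clean.tolist())
# ['Camden', 'Greenwich', 'Hammersmith and Fulham']
```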

In [9]:
inner_london["Borough"]= inner_london["Borough"].replace('Hammersmith and Fulham [note 4]', "Hammersmith and Fulham")
inner_london["Borough"]= inner_london["Borough"].replace('Greenwich [note 2]', "Greenwhich") 
inner_london.head(12)

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  """Entry point for launching an IPython kernel.
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  


Unnamed: 0,Borough,Co-ordinates
0,Camden,51°31′44″N 0°07′32″W﻿ / ﻿51.5290°N 0.1255°W
1,Greenwhich,51°29′21″N 0°03′53″E﻿ / ﻿51.4892°N 0.0648°E
2,Hackney,51°32′42″N 0°03′19″W﻿ / ﻿51.5450°N 0.0553°W
3,Hammersmith and Fulham,51°29′34″N 0°14′02″W﻿ / ﻿51.4927°N 0.2339°W
4,Islington,51°32′30″N 0°06′08″W﻿ / ﻿51.5416°N 0.1022°W
5,Kensington and Chelsea,51°30′07″N 0°11′41″W﻿ / ﻿51.5020°N 0.1947°W
6,Lambeth,51°27′39″N 0°06′59″W﻿ / ﻿51.4607°N 0.1163°W
7,Lewisham,51°26′43″N 0°01′15″W﻿ / ﻿51.4452°N 0.0209°W
8,Southwark,51°30′13″N 0°04′49″W﻿ / ﻿51.5035°N 0.0804°W
9,Tower Hamlets,51°30′36″N 0°00′21″W﻿ / ﻿51.5099°N 0.0059°W


Again, the warning above does not affect us, so we can proceed with more data wrangling.

In [10]:
# Rename 'Co-ordinates' to 'Coordinates': without the hyphen, the column can be accessed as an attribute (inner_london.Coordinates) in the next steps
inner_london = inner_london.rename(columns={'Co-ordinates':'Coordinates'})

We need to clean the Coordinates column, as we only need the decimal latitude and longitude, and as plain numbers.

In [11]:
# Split the column on ' / ': the first part holds the degrees-minutes-seconds (DMS) coordinates, which we will drop, and the second the decimal latitude and longitude
inner_london[['GPS coordinates','LatLon']] = inner_london.Coordinates.str.split(" / ",expand=True,)

#Now we select only the columns we want in our dataframe
inner_london = inner_london[['Borough', 'LatLon']]

# Split LatLon again on the space to separate it into two individual columns: Latitude and Longitude
inner_london[['Latitude','Longitude']] = inner_london.LatLon.str.split(' ',expand=True,)

#Again we select only the columns we want
inner_london = inner_london[['Borough', 'Latitude','Longitude']]

# Let's take a peek
inner_london.head()

Unnamed: 0,Borough,Latitude,Longitude
0,Camden,﻿51.5290°N,0.1255°W
1,Greenwhich,﻿51.4892°N,0.0648°E
2,Hackney,﻿51.5450°N,0.0553°W
3,Hammersmith and Fulham,﻿51.4927°N,0.2339°W
4,Islington,﻿51.5416°N,0.1022°W


In [12]:
# Now strip the trailing degree sign and hemisphere letter so only the number remains (rstrip removes any trailing characters in the given set)
inner_london['Latitude'] = inner_london['Latitude'].map(lambda x: x.rstrip('°N'))
inner_london['Longitude'] = inner_london['Longitude'].map(lambda x: x.rstrip('°W'))
inner_london['Longitude'] = inner_london['Longitude'].map(lambda x: x.rstrip('°E'))
inner_london.head()

Unnamed: 0,Borough,Latitude,Longitude
0,Camden,﻿51.5290,0.1255
1,Greenwhich,﻿51.4892,0.0648
2,Hackney,﻿51.5450,0.0553
3,Hammersmith and Fulham,﻿51.4927,0.2339
4,Islington,﻿51.5416,0.1022


Let's check the dtypes, because we will need the coordinates as floats for the Foursquare API.

In [13]:
inner_london.dtypes

0
Borough      object
Latitude     object
Longitude    object
dtype: object

We see these are object types, so we will need to transform them.

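Before doing so, we add a minus sign "-" in front of the longitudes: longitudes west of the Greenwich meridian are negative, and without the sign the points land in the wrong place on the map. Note that this prefixes every value, including Greenwich (0.0648°E), which lies east of the meridian and should stay positive.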
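A hemisphere-aware alternative, sketched under the assumption that every cell follows the "DMS / decimal" layout of the scraped Co-ordinates column: read the sign from the trailing letter, so W and S become negative while E and N stay positive. `parse_decimal_coords` is a hypothetical helper, not part of the notebook; with it, Greenwich keeps its positive 0.0648 longitude, and the invisible U+FEFF character is dropped along the way.

```python
def parse_decimal_coords(cell):
    """Return (lat, lon) as signed floats from a scraped 'Co-ordinates' cell.

    Assumes the cell looks like 'DMS / decimal', e.g.
    '51°29′21″N 0°03′53″E\ufeff / \ufeff51.4892°N 0.0648°E'.
    """
    decimal_part = cell.split(" / ")[1]
    # Drop the invisible zero-width no-break space (U+FEFF) left by the HTML
    decimal_part = decimal_part.replace("\ufeff", "").strip()
    lat_text, lon_text = decimal_part.split(" ")

    def to_signed(text):
        value = float(text[:-2])  # strip the trailing '°' and hemisphere letter
        hemisphere = text[-1]
        # South and West are negative; North and East stay positive
        return -value if hemisphere in ("S", "W") else value

    return to_signed(lat_text), to_signed(lon_text)


print(parse_decimal_coords("51°31′44″N 0°07′32″W\ufeff / \ufeff51.5290°N 0.1255°W"))  # Camden
print(parse_decimal_coords("51°29′21″N 0°03′53″E\ufeff / \ufeff51.4892°N 0.0648°E"))  # Greenwich
```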

In [14]:
inner_london['Longitude'] = ('-' + inner_london['Longitude'])

Before converting to float, let's not forget about the City of London, whose coordinates we found here:

https://latitudelongitude.org/gb/city-of-london/

So let's add it to our dataframe.

In [15]:
new_row = {'Borough':'City of London', 'Latitude':51.51279, 'Longitude':-0.09184}

# Append the row to the dataframe (DataFrame.append was removed in pandas 2.0, so we use pd.concat)
inner_london = pd.concat([inner_london, pd.DataFrame([new_row])], ignore_index=True)

# Now we can check our table again:
inner_london

Unnamed: 0,Borough,Latitude,Longitude
0,Camden,﻿51.5290,-0.1255
1,Greenwhich,﻿51.4892,-0.0648
2,Hackney,﻿51.5450,-0.0553
3,Hammersmith and Fulham,﻿51.4927,-0.2339
4,Islington,﻿51.5416,-0.1022
5,Kensington and Chelsea,﻿51.5020,-0.1947
6,Lambeth,﻿51.4607,-0.1163
7,Lewisham,﻿51.4452,-0.0209
8,Southwark,﻿51.5035,-0.0804
9,Tower Hamlets,﻿51.5099,-0.0059


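Now the conversion to float fails: the error we got, on the Westminster latitude, shows a hidden character at the start of the value. It is a zero-width no-break space (U+FEFF, '\ufeff') inherited from the Wikipedia markup.

Let's fix it simply by removing that character from the whole Latitude column with a regex replace: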
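A sketch of how to locate and then fix such values, on a hand-made sample rather than the real column: `pd.to_numeric(..., errors="coerce")` returns NaN for values it cannot parse instead of raising, so every failing row can be listed at once rather than only the first one in the error message. The final conversion mirrors the notebook's own `astype(str).astype(float)` step.

```python
import pandas as pd

# Hand-made sample: two scraped strings with a hidden U+FEFF prefix
# and one float, like the City of London row added by hand
lat = pd.Series(["\ufeff51.5290", "\ufeff51.4892", 51.51279], dtype=object)

# errors="coerce" turns unparseable values into NaN instead of raising,
# which lists every offending row at once rather than only the first
parsed = pd.to_numeric(lat, errors="coerce")
print("Rows that fail to convert:", lat[parsed.isna()].index.tolist())

# Remove the invisible character, then convert the whole column
clean = lat.astype(str).str.replace("\ufeff", "", regex=False).astype(float)
print(clean.tolist())
```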

In [16]:
inner_london["Latitude"]= inner_london["Latitude"].replace('\ufeff', "",regex=True) 

Now we can finally convert our two columns to float:

In [17]:
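# astype(str) first, because the City of London row holds floats while the others hold strings
inner_london['Latitude'] = inner_london['Latitude'].astype(str).astype(float)
inner_london['Longitude'] = inner_london['Longitude'].astype(str).astype(float)

# Let's re-check the dtypes: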
inner_london.dtypes

0
Borough       object
Latitude     float64
Longitude    float64
dtype: object

In [18]:
#And our final dataframe
inner_london.head()

Unnamed: 0,Borough,Latitude,Longitude
0,Camden,51.529,-0.1255
1,Greenwhich,51.4892,-0.0648
2,Hackney,51.545,-0.0553
3,Hammersmith and Fulham,51.4927,-0.2339
4,Islington,51.5416,-0.1022


Now let's visualize our inner London boroughs on a map, using their latitude and longitude values.

For that, and for the analysis we will do later, we need to install and import a few more libraries.

In [19]:
# Import the rest of the libraries we will need for the analysis
# (and show all columns and rows when displaying DataFrames)
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import json

#!conda install -c conda-forge geopy --yes (already installed in this environment)
from geopy.geocoders import Nominatim  # convert an address into latitude and longitude values

# requests and json_normalize were already imported in the first cell

import matplotlib.cm as cm
import matplotlib.colors as colors


from sklearn.cluster import KMeans

#!conda install -c conda-forge folium=0.5.0 --yes (already in the notebook)
import folium # map rendering library

print('Let´s Roll')

Let´s Roll


First, let's create a map of London.

In [20]:
#Locating the Latitude and the Longitude values of the city

address = 'London'

geolocator = Nominatim(user_agent="london_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of London are {}, {}.'.format(latitude, longitude))

The geographical coordinates of London are 51.5073219, -0.1276474.


In [21]:
# Now we create a map of the inner London boroughs, centred on the city coordinates, using the values in our dataframe
london_map = folium.Map(location=[latitude, longitude], zoom_start=10)

# add markers to map
for lat, lng, borough in zip(inner_london['Latitude'], inner_london['Longitude'], inner_london['Borough']):
    label = '{}'.format(borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='green',
        fill=True,
        fill_color='green',
        fill_opacity=0.7,
        parse_html=False).add_to(london_map)  
    
london_map

### Next, we are going to start utilizing the Foursquare API to explore the neighborhoods and segment them.

In [22]:
# Define the Foursquare credentials and API version
# Replace the placeholders with your own keys; don't print or publish the secret
CLIENT_ID = 'YOUR_CLIENT_ID'
CLIENT_SECRET = 'YOUR_CLIENT_SECRET'
VERSION = '20180605'
LIMIT = 100 # A default Foursquare API limit value


Let's start by exploring just one borough.

In [23]:
#get the first borough name
inner_london.loc[0, 'Borough']

'Camden'

In [24]:
# Get the Neighbourhood Latitude and Longitude value
neigh_lat = inner_london.loc[0, 'Latitude']
neigh_lon = inner_london.loc[0, 'Longitude']

neigh_name = inner_london.loc[0, 'Borough'] 

print('Latitude and longitude values of {} are {}, {}.'.format(neigh_name, 
                                                               neigh_lat, 
                                                               neigh_lon))

Latitude and longitude values of Camden are 51.529, -0.1255.


In [25]:
# Get the top 100 venues that are in Camden within a radius of 1000 meters

# First create GET request URL
radius = 1000
# LIMIT = 100 was already defined together with the credentials
url = 'https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&ll={},{}&v={}&radius={}&limit={}'.format(CLIENT_ID, CLIENT_SECRET, neigh_lat, neigh_lon, VERSION, radius, LIMIT)

# Sending the GET request and examining the results
results = requests.get(url).json()
results


{'meta': {'code': 200, 'requestId': '5fdf3647b4359f7d96451348'},
 'response': {'suggestedFilters': {'header': 'Tap to show:',
   'filters': [{'name': 'Open now', 'key': 'openNow'}]},
  'headerLocation': "King's Cross",
  'headerFullLocation': "King's Cross, London",
  'headerLocationGranularity': 'neighborhood',
  'totalResults': 164,
  'suggestedBounds': {'ne': {'lat': 51.538000009000015,
    'lng': -0.11106029796310865},
   'sw': {'lat': 51.51999999099999, 'lng': -0.13993970203689135}},
  'groups': [{'type': 'Recommended Places',
    'name': 'recommended',
    'items': [{'reasons': {'count': 0,
       'items': [{'summary': 'This spot is popular',
         'type': 'general',
         'reasonName': 'globalInteractionReason'}]},
      'venue': {'id': '4daeaa490437710b8137b098',
       'name': 'The Sir John Ritblat Gallery: Treasures of the British Library',
       'location': {'address': '96 Euston Road',
        'crossStreet': 'The British Library',
        'lat': 51.52966611994934,
  

### Note: in the Foursquare results, all the venue information is under the 'items' key, so let's use this in the definition below
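A small sketch of walking that structure on a hand-made dict shaped like the response printed above (trimmed; the second venue and its empty category list are invented to show the guard). `getNearbyVenues` below indexes `categories[0]` directly, which would raise an IndexError for a venue without categories; checking the list first, as `get_category_type` does, avoids that.

```python
# Hand-made response shaped like the Foursquare output shown above (trimmed)
results = {
    "response": {
        "groups": [{
            "items": [
                {"venue": {"name": "Pitted Olive",
                           "location": {"lat": 51.526369, "lng": -0.125623},
                           "categories": [{"name": "Turkish Restaurant"}]}},
                {"venue": {"name": "Unnamed spot",  # hypothetical venue
                           "location": {"lat": 51.53, "lng": -0.12},
                           "categories": []}},
            ]
        }]
    }
}

items = results["response"]["groups"][0]["items"]

rows = []
for item in items:
    venue = item["venue"]
    categories = venue.get("categories", [])
    # Guard against venues without a category instead of indexing [0] blindly
    category = categories[0]["name"] if categories else None
    rows.append((venue["name"], venue["location"]["lat"], venue["location"]["lng"], category))

for row in rows:
    print(row)
```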

In [26]:
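# Define a function that extracts the category of the venue
def get_category_type(row):
    # The column may be called 'categories' or 'venue.categories' depending on how the JSON was flattened
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']

    # Return the name of the first category, or None if the venue has none
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']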

In [27]:
# Now let´s clean the JSON and structure it into a pandas DataFrame
venues = results['response']['groups'][0]['items']
    
nearby_venues = json_normalize(venues) # flatten JSON

# filter columns
filtered_columns = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
nearby_venues = nearby_venues.loc[:, filtered_columns]

# keep only the first category name of each venue
nearby_venues['venue.categories'] = nearby_venues.apply(get_category_type, axis=1)

# clean columns
nearby_venues.columns = [col.split(".")[-1] for col in nearby_venues.columns]

nearby_venues.head()

Unnamed: 0,name,categories,lat,lng
0,The Sir John Ritblat Gallery: Treasures of the...,Museum,51.529666,-0.127541
1,St. Pancras Renaissance Hotel London,Hotel,51.529733,-0.125912
2,Pullman London St Pancras,Hotel,51.528668,-0.128191
3,Pitted Olive,Turkish Restaurant,51.526369,-0.125623
4,Pullman Hotel Breakfast Area,Breakfast Spot,51.528484,-0.128126


In [28]:
# Number of venues returned by Foursquare
print('{} venues were returned by Foursquare.'.format(nearby_venues.shape[0]))

100 venues were returned by Foursquare.


### Now let's do the same exploration for all the boroughs in our inner London DataFrame

In [29]:
# Create a function that repeats the same process for every borough, with a larger radius (2000 m)
def getNearbyVenues(names, latitudes, longitudes, radius=2000):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Borough', 
                  'Borough Latitude', 
                  'Borough Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)
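Both cells above build the venues/explore URL with `str.format`. As a sketch, the hypothetical helper `build_explore_url` below assembles the same query parameters with `urllib.parse.urlencode`, which takes care of separators and escaping (the comma in `ll` is encoded as `%2C`, which a standard server decodes back). It only builds the string and makes no request; the credentials are placeholders. With requests, passing the same dict as `requests.get(base_url, params=params)` has a similar effect.

```python
from urllib.parse import urlencode, urlparse, parse_qs


def build_explore_url(client_id, client_secret, version, lat, lng, radius=2000, limit=100):
    """Build a venues/explore request URL from its parameters (sketch only)."""
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "v": version,
        "ll": f"{lat},{lng}",
        "radius": radius,
        "limit": limit,
    }
    return "https://api.foursquare.com/v2/venues/explore?" + urlencode(params)


url = build_explore_url("YOUR_CLIENT_ID", "YOUR_CLIENT_SECRET", "20180605", 51.529, -0.1255)
print(url)
# Decoding the query string gives back the original values
print(parse_qs(urlparse(url).query)["ll"])
```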

In [30]:
# Create a new DataFrame with all the venue info by calling the function defined above

inner_london_venues = getNearbyVenues(names=inner_london['Borough'],
                                      latitudes = inner_london['Latitude'],
                                      longitudes = inner_london['Longitude']
                                     )

Camden
Greenwhich
Hackney
Hammersmith and Fulham
Islington
Kensington and Chelsea
Lambeth
Lewisham
Southwark
Tower Hamlets
Wandsworth
Westminster
City of London


In [31]:
#Checking the new dataframe
inner_london_venues.head(10)

Unnamed: 0,Borough,Borough Latitude,Borough Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Camden,51.529,-0.1255,The Sir John Ritblat Gallery: Treasures of the...,51.529666,-0.127541,Museum
1,Camden,51.529,-0.1255,Pitted Olive,51.526369,-0.125623,Turkish Restaurant
2,Camden,51.529,-0.1255,St. Pancras Renaissance Hotel London,51.529733,-0.125912,Hotel
3,Camden,51.529,-0.1255,Aux Pains de Papy,51.52934,-0.120303,Bakery
4,Camden,51.529,-0.1255,London St Pancras International Railway Statio...,51.531982,-0.126086,Train Station
5,Camden,51.529,-0.1255,M&S Simply Food,51.5328,-0.127123,Grocery Store
6,Camden,51.529,-0.1255,Granger & Co.,51.532606,-0.125275,Breakfast Spot
7,Camden,51.529,-0.1255,Pizza Union,51.530984,-0.119933,Pizza Place
8,Camden,51.529,-0.1255,Barry's Bootcamp,51.527075,-0.131056,Gym / Fitness Center
9,Camden,51.529,-0.1255,Gagosian Gallery,51.530125,-0.118065,Art Gallery


In [32]:
# Count of venues returned per borough
inner_london_venues.groupby(['Borough'])[['Venue']].count() 

Borough,Venue
Camden,100
City of London,100
Greenwhich,100
Hackney,100
Hammersmith and Fulham,100
Islington,100
Kensington and Chelsea,100
Lambeth,100
Lewisham,37
Southwark,100
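An equivalent way to get these counts is `value_counts` on the Borough column. The sketch below uses a small hand-made stand-in for inner_london_venues. `groupby(...).count()` counts the non-null Venue values, so the two agree as long as no Venue is missing.

```python
import pandas as pd

# Hand-made stand-in for inner_london_venues (illustrative rows only)
venues = pd.DataFrame({
    "Borough": ["Camden", "Camden", "Lewisham", "Camden", "Lewisham"],
    "Venue": ["A", "B", "C", "D", "E"],
})

# Number of rows per borough, sorted by borough name like the groupby result
counts = venues["Borough"].value_counts().sort_index()
print(counts)

# Same numbers as the groupby/count used above (Venue has no missing values here)
print(venues.groupby("Borough")["Venue"].count())
```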


In [33]:
# Amount of unique categories within the returned venues
print('There are {} unique categories.'.format(len(inner_london_venues['Venue Category'].unique())))

There are 189 unique categories.


#### Now let's analyse each borough: one-hot encoding

First, let's use pandas get_dummies, which converts a categorical variable into dummy/indicator variables and returns a DataFrame of dummy-coded data.

To distinguish one-hot encoding from dummy encoding, here is a bit of background:

[...] there is some redundancy in One-Hot encoding. For instance, if we know that a passenger’s flight ticket is not First Class and not Economy Class, then it must be Business Class. So we only need to use two of these three dummy-coded variables as a predictor. More generally, the number of dummy-coded variables needed is one less than the number of possible values, which is K-1. In statistics, this is called a dummy encoding variable, or dummy variable.
Dummy encoding is standard advice in statistics to avoid the dummy variable trap. However, in the world of machine learning, one-hot encoding is more often recommended, because the dummy variable trap is not really a problem when regularization is applied.
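The difference can be seen with the flight-class example from the quote, on a small hand-made DataFrame. `pd.get_dummies` produces one column per category (one-hot, K columns) by default, which is what the cell below uses; `drop_first=True` drops the first category to give the K-1 dummy encoding, where a row of all zeros stands for the dropped class.

```python
import pandas as pd

tickets = pd.DataFrame({"Class": ["First", "Economy", "Business", "Economy"]})

# One-hot encoding: one indicator column per category (K columns)
one_hot = pd.get_dummies(tickets["Class"], dtype=int)

# Dummy encoding: drop the first category to keep K-1 columns
dummy = pd.get_dummies(tickets["Class"], drop_first=True, dtype=int)

print(one_hot.columns.tolist())  # ['Business', 'Economy', 'First']
print(dummy.columns.tolist())    # ['Economy', 'First']
```

In the dummy-encoded frame, the Business ticket is the row where both Economy and First are 0.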

In [34]:
# one hot encoding
london_dummy = pd.get_dummies(inner_london_venues[['Venue Category']], prefix="", prefix_sep="")

# add the Borough column back to the dataframe
london_dummy['Borough'] = inner_london_venues['Borough']

# move the Borough column to the first position
fixed_columns = [london_dummy.columns[-1]] + list(london_dummy.columns[:-1])
london_dummy = london_dummy[fixed_columns]

london_dummy.head()

Unnamed: 0,Borough,Afghan Restaurant,African Restaurant,American Restaurant,Antique Shop,Argentinian Restaurant,Art Gallery,Art Museum,Arts & Crafts Store,Asian Restaurant,Australian Restaurant,BBQ Joint,Bakery,Bar,Beer Bar,Beer Store,Bike Shop,Bistro,Bookstore,Boutique,Boxing Gym,Brazilian Restaurant,Breakfast Spot,Brewery,Bridge,Bubble Tea Shop,Burger Joint,Burrito Place,Butcher,Café,Canal,Canal Lock,Cantonese Restaurant,Caribbean Restaurant,Castle,Chaat Place,Cheese Shop,Church,Climbing Gym,Clothing Store,Cocktail Bar,Coffee Shop,Comedy Club,Comic Shop,Concert Hall,Coworking Space,Creperie,Cupcake Shop,Cycle Studio,Deli / Bodega,Department Store,Diner,Distillery,Dumpling Restaurant,Eastern European Restaurant,Empanada Restaurant,English Restaurant,Escape Room,Ethiopian Restaurant,Event Space,Exhibit,Falafel Restaurant,Farmers Market,Fast Food Restaurant,Fish & Chips Shop,Fish Market,Flea Market,Food & Drink Shop,Food Court,Food Truck,Fountain,French Restaurant,Fruit & Vegetable Store,Garden,Garden Center,Gastropub,Gelato Shop,Gift Shop,Golf Driving Range,Gourmet Shop,Greek Restaurant,Grocery Store,Gym,Gym / Fitness Center,Herbs & Spices Store,Historic Site,History Museum,Hobby Shop,Hostel,Hotel,Hotel Bar,Ice Cream Shop,Indian Restaurant,Indie Movie Theater,Italian Restaurant,Japanese Restaurant,Jazz Club,Juice Bar,Korean Restaurant,Latin American Restaurant,Lebanese Restaurant,Lighthouse,Liquor Store,Lounge,Malay Restaurant,Market,Mediterranean Restaurant,Mexican Restaurant,Middle Eastern Restaurant,Mini Golf,Modern European Restaurant,Monument / Landmark,Movie Theater,Multiplex,Museum,Music Venue,Nail Salon,Nature Preserve,New American Restaurant,Nightclub,Okonomiyaki Restaurant,Opera House,Organic Grocery,Outdoor Sculpture,Outlet Mall,Palace,Paper / Office Supplies Store,Park,Pedestrian Plaza,Performing Arts Venue,Persian Restaurant,Pet Store,Pharmacy,Pie Shop,Pier,Pilates Studio,Pizza Place,Playground,Plaza,Polish Restaurant,Pool,Portuguese Restaurant,Pub,Ramen Restaurant,Record Shop,Recording Studio,Restaurant,Road,Roof Deck,Salad Place,Sandwich Place,Scandinavian Restaurant,Scenic Lookout,Science Museum,Seafood Restaurant,Shopping Mall,Shopping Plaza,Social Club,Souvlaki Shop,Spa,Spanish Restaurant,Speakeasy,Sporting Goods Shop,Sports Bar,Sports Club,Steakhouse,Street Art,Street Food Gathering,Supermarket,Sushi Restaurant,Taiwanese Restaurant,Tapas Restaurant,Tea Room,Tennis Court,Tennis Stadium,Thai Restaurant,Theater,Toy / Game Store,Trail,Train Station,Turkish Restaurant,University,Vegetarian / Vegan Restaurant,Vietnamese Restaurant,Whisky Bar,Windmill,Wine Bar,Wine Shop,Winery,Yoga Studio
0,Camden,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,Camden,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0
2,Camden,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3,Camden,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,Camden,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0


In [35]:
london_dummy.shape

(1237, 190)

#### Next, let's group the rows by borough and take the mean of each category column, i.e. the frequency of occurrence of each category
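Why the mean gives a frequency: each one-hot column holds 0s and 1s, so its mean within a borough is the share of that borough's venues that fall into the category. A small sketch with hand-made venues (illustrative categories only):

```python
import pandas as pd

venues = pd.DataFrame({
    "Borough": ["Camden", "Camden", "Camden", "Camden", "Lewisham", "Lewisham"],
    "Venue Category": ["Pub", "Café", "Pub", "Hotel", "Pub", "Park"],
})

# One-hot encode the categories and put the Borough label in front
onehot = pd.get_dummies(venues["Venue Category"], dtype=int)
onehot.insert(0, "Borough", venues["Borough"])

# The mean of a 0/1 column is the share of a borough's venues in that category
freq = onehot.groupby("Borough").mean()
print(freq)
```

Camden has 2 pubs out of 4 venues, so its Pub frequency is 0.5, and each borough's frequencies add up to 1.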

In [36]:
london_grouped = london_dummy.groupby('Borough').mean().reset_index()
london_grouped.head()

Unnamed: 0,Borough,Afghan Restaurant,African Restaurant,American Restaurant,Antique Shop,Argentinian Restaurant,Art Gallery,Art Museum,Arts & Crafts Store,Asian Restaurant,Australian Restaurant,BBQ Joint,Bakery,Bar,Beer Bar,Beer Store,Bike Shop,Bistro,Bookstore,Boutique,Boxing Gym,Brazilian Restaurant,Breakfast Spot,Brewery,Bridge,Bubble Tea Shop,Burger Joint,Burrito Place,Butcher,Café,Canal,Canal Lock,Cantonese Restaurant,Caribbean Restaurant,Castle,Chaat Place,Cheese Shop,Church,Climbing Gym,Clothing Store,Cocktail Bar,Coffee Shop,Comedy Club,Comic Shop,Concert Hall,Coworking Space,Creperie,Cupcake Shop,Cycle Studio,Deli / Bodega,Department Store,Diner,Distillery,Dumpling Restaurant,Eastern European Restaurant,Empanada Restaurant,English Restaurant,Escape Room,Ethiopian Restaurant,Event Space,Exhibit,Falafel Restaurant,Farmers Market,Fast Food Restaurant,Fish & Chips Shop,Fish Market,Flea Market,Food & Drink Shop,Food Court,Food Truck,Fountain,French Restaurant,Fruit & Vegetable Store,Garden,Garden Center,Gastropub,Gelato Shop,Gift Shop,Golf Driving Range,Gourmet Shop,Greek Restaurant,Grocery Store,Gym,Gym / Fitness Center,Herbs & Spices Store,Historic Site,History Museum,Hobby Shop,Hostel,Hotel,Hotel Bar,Ice Cream Shop,Indian Restaurant,Indie Movie Theater,Italian Restaurant,Japanese Restaurant,Jazz Club,Juice Bar,Korean Restaurant,Latin American Restaurant,Lebanese Restaurant,Lighthouse,Liquor Store,Lounge,Malay Restaurant,Market,Mediterranean Restaurant,Mexican Restaurant,Middle Eastern Restaurant,Mini Golf,Modern European Restaurant,Monument / Landmark,Movie Theater,Multiplex,Museum,Music Venue,Nail Salon,Nature Preserve,New American Restaurant,Nightclub,Okonomiyaki Restaurant,Opera House,Organic Grocery,Outdoor Sculpture,Outlet Mall,Palace,Paper / Office Supplies Store,Park,Pedestrian Plaza,Performing Arts Venue,Persian Restaurant,Pet Store,Pharmacy,Pie Shop,Pier,Pilates Studio,Pizza Place,Playground,Plaza,Polish Restaurant,Pool,Portuguese Restaurant,Pub,Ramen Restaurant,Record Shop,Recording Studio,Restaurant,Road,Roof Deck,Salad Place,Sandwich Place,Scandinavian Restaurant,Scenic Lookout,Science Museum,Seafood Restaurant,Shopping Mall,Shopping Plaza,Social Club,Souvlaki Shop,Spa,Spanish Restaurant,Speakeasy,Sporting Goods Shop,Sports Bar,Sports Club,Steakhouse,Street Art,Street Food Gathering,Supermarket,Sushi Restaurant,Taiwanese Restaurant,Tapas Restaurant,Tea Room,Tennis Court,Tennis Stadium,Thai Restaurant,Theater,Toy / Game Store,Trail,Train Station,Turkish Restaurant,University,Vegetarian / Vegan Restaurant,Vietnamese Restaurant,Whisky Bar,Windmill,Wine Bar,Wine Shop,Winery,Yoga Studio
0,Camden,0.0,0.0,0.01,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.04,0.02,0.02,0.0,0.0,0.0,0.04,0.0,0.0,0.0,0.03,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.01,0.01,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.01,0.09,0.01,0.01,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.01,0.0,0.0,0.0,0.0,0.02,0.01,0.0,0.0,0.02,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.02,0.01,0.03,0.0,0.0,0.02,0.0,0.0,0.05,0.01,0.01,0.01,0.0,0.01,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.01,0.01,0.0,0.02,0.0,0.01,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.01,0.0,0.0,0.0,0.01,0.01,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.04,0.0,0.02,0.0,0.0,0.0,0.01,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.01,0.0,0.0,0.01,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.01,0.01,0.03,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.01,0.02,0.0,0.0,0.01,0.0,0.0,0.02,0.0,0.0,0.0
1,City of London,0.0,0.0,0.0,0.0,0.0,0.01,0.02,0.0,0.0,0.0,0.0,0.0,0.01,0.01,0.0,0.01,0.01,0.0,0.0,0.02,0.0,0.01,0.0,0.0,0.0,0.02,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.01,0.0,0.01,0.0,0.0,0.0,0.04,0.09,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.01,0.01,0.01,0.0,0.0,0.0,0.0,0.03,0.01,0.0,0.0,0.0,0.01,0.0,0.0,0.03,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.03,0.0,0.05,0.0,0.0,0.02,0.0,0.0,0.1,0.0,0.0,0.01,0.01,0.04,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.01,0.0,0.0,0.01,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.01,0.01,0.0,0.0,0.0,0.01,0.0,0.01,0.0,0.0,0.0,0.06,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.02,0.0,0.01,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.03,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.02,0.0,0.0,0.0
2,Greenwhich,0.0,0.0,0.0,0.01,0.0,0.04,0.0,0.0,0.0,0.01,0.0,0.04,0.04,0.03,0.0,0.01,0.0,0.0,0.0,0.0,0.01,0.01,0.08,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.04,0.09,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.01,0.0,0.0,0.0,0.02,0.0,0.0,0.01,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.01,0.01,0.01,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.01,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.03,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.06,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.02,0.0,0.01,0.0,0.0,0.0,0.08,0.01,0.0,0.0,0.01,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.01,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.01,0.0,0.01,0.0,0.0,0.0,0.0,0.03,0.0,0.0,0.01,0.01,0.0,0.0
3,Hackney,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.07,0.0,0.0,0.01,0.0,0.0,0.03,0.0,0.0,0.0,0.0,0.05,0.0,0.0,0.0,0.0,0.02,0.09,0.01,0.01,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.08,0.0,0.0,0.0,0.01,0.01,0.0,0.0,0.01,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.01,0.01,0.01,0.0,0.0,0.01,0.0,0.01,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.01,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.01,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.02,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.04,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.01,0.0,0.12,0.01,0.01,0.0,0.02,0.0,0.01,0.0,0.01,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.02,0.0,0.0,0.01,0.02,0.0,0.03
4,Hammersmith and Fulham,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.01,0.02,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.01,0.01,0.0,0.06,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.01,0.07,0.0,0.0,0.01,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.04,0.0,0.0,0.01,0.04,0.0,0.02,0.0,0.0,0.01,0.02,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.02,0.02,0.0,0.03,0.04,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.03,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.03,0.0,0.02,0.01,0.0,0.0,0.12,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.01,0.01,0.0,0.01,0.02,0.02,0.0,0.01,0.0,0.03,0.0,0.01,0.01,0.0,0.0,0.0,0.0,0.0,0.0


In [37]:
london_grouped.shape

(13, 190)

#### Print each borough with its 10 most common venues

In [38]:
num_top_venues = 10

for hood in london_grouped['Borough']:
    print("----"+hood+"----")
    temp = london_grouped[london_grouped['Borough'] == hood].T.reset_index()  # .T transposes the one-row frame into a column
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----Camden----
                  venue  freq
0           Coffee Shop  0.09
1                 Hotel  0.05
2           Pizza Place  0.04
3                Bakery  0.04
4             Bookstore  0.04
5      Sushi Restaurant  0.03
6  Gym / Fitness Center  0.03
7        Breakfast Spot  0.03
8    Turkish Restaurant  0.02
9    English Restaurant  0.02


----City of London----
                  venue  freq
0                 Hotel  0.10
1           Coffee Shop  0.09
2        Scenic Lookout  0.06
3  Gym / Fitness Center  0.05
4          Cocktail Bar  0.04
5    Italian Restaurant  0.04
6               Theater  0.03
7    Falafel Restaurant  0.03
8            Food Truck  0.03
9         Grocery Store  0.03


----Greenwhich----
                   venue  freq
0            Coffee Shop  0.09
1                Brewery  0.08
2                    Pub  0.08
3                   Park  0.06
4           Cocktail Bar  0.04
5            Art Gallery  0.04
6                 Bakery  0.04
7                    Bar  0.04
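The loop above handles one borough at a time by transposing its row. A more compact alternative (a sketch, not the notebook's own approach) is to call `Series.nlargest` on every row in a single `apply`. The toy frame reuses a few of the Camden and City of London frequencies printed above, and `top_venues` is a hypothetical helper name.

```python
import pandas as pd

def top_venues(grouped, n=10):
    """Return a DataFrame with the n most frequent venue categories per borough."""
    freqs = grouped.set_index("Borough")
    # nlargest sorts each row in descending order; keep only the category names.
    # result_type="reduce" keeps the lists in a Series instead of expanding them.
    top = freqs.apply(lambda row: row.nlargest(n).index.tolist(),
                      axis=1, result_type="reduce")
    return pd.DataFrame(top.tolist(), index=top.index,
                        columns=[f"Top {i + 1}" for i in range(n)])

# Toy stand-in for london_grouped, using frequencies from the output above
toy = pd.DataFrame({
    "Borough": ["Camden", "City of London"],
    "Coffee Shop": [0.09, 0.09],
    "Hotel": [0.05, 0.10],
    "Scenic Lookout": [0.00, 0.06],
    "Pizza Place": [0.04, 0.00],
})
print(top_venues(toy, n=3))
```

In the notebook itself this would be called as `top_venues(london_grouped)`.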


#### Now let's put this into a DataFrame

In [39]:
# Define a function that returns the top N venue categories of a row, sorted by descending frequency
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

In [40]:
# Now let's create the new dataframe and display the top 10 venues for each borough
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Borough']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        # no special suffix defined for this position, so fall back to 'th'
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neigh_venues_sort = pd.DataFrame(columns=columns)
neigh_venues_sort['Borough'] = london_grouped['Borough']

for ind in np.arange(london_grouped.shape[0]):
    neigh_venues_sort.iloc[ind, 1:] = return_most_common_venues(london_grouped.iloc[ind, :], num_top_venues)

neigh_venues_sort.head()

Unnamed: 0,Borough,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Camden,Coffee Shop,Hotel,Bakery,Bookstore,Pizza Place,Breakfast Spot,Gym / Fitness Center,Sushi Restaurant,English Restaurant,Middle Eastern Restaurant
1,City of London,Hotel,Coffee Shop,Scenic Lookout,Gym / Fitness Center,Italian Restaurant,Cocktail Bar,Falafel Restaurant,Food Truck,Theater,Grocery Store
2,Greenwhich,Coffee Shop,Brewery,Pub,Park,Art Gallery,Cocktail Bar,Bar,Bakery,Italian Restaurant,Vietnamese Restaurant
3,Hackney,Pub,Café,Coffee Shop,Bakery,Brewery,Park,Yoga Studio,Bookstore,Restaurant,Farmers Market
4,Hammersmith and Fulham,Pub,Coffee Shop,Café,Japanese Restaurant,Gastropub,French Restaurant,Turkish Restaurant,Italian Restaurant,Park,Pizza Place
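The `indicators` list only covers 1st to 3rd; every other position falls through to "th". That is correct for 4 to 20, but would produce "21th" or "22th" if `num_top_venues` were ever raised above 20. A small helper (the name `ordinal` is hypothetical) that applies the general English rule:

```python
def ordinal(n):
    """Return n with its English ordinal suffix, e.g. 1 -> '1st', 12 -> '12th', 23 -> '23rd'."""
    # 11, 12 and 13 (and 111, 112, ...) always take 'th'
    if 10 <= n % 100 <= 20:
        suffix = "th"
    else:
        suffix = {1: "st", 2: "nd", 3: "rd"}.get(n % 10, "th")
    return f"{n}{suffix}"

# Build the same column names as the cell above, without try/except
num_top_venues = 10
columns = ["Borough"] + [f"{ordinal(i)} Most Common Venue" for i in range(1, num_top_venues + 1)]
print(columns)
```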


#### Clustering Boroughs
Clustering the boroughs by the frequency of their venue categories will help us analyse the information and reach an initial, approximate answer for our client.

In [41]:
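from sklearn.cluster import KMeans

# set number of clusters
kclust = 5

# k-means needs numeric features only, so drop the borough name column
london_clust = london_grouped.drop(columns='Borough')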

# run k-means clustering
kmeans = KMeans(n_clusters=kclust, random_state=0).fit(london_clust)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 

array([0, 4, 1, 1, 1, 1, 0, 1, 3, 4], dtype=int32)
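The notebook fixes `kclust = 5`. One common way to sanity-check that choice is the silhouette score from `sklearn.metrics.silhouette_score`. The sketch below is an assumption about how one might do this, not what the notebook did: `silhouette_by_k` is a hypothetical helper, and it runs on synthetic frequencies so that it is self-contained. In the notebook you would pass `london_clust` instead of `X`. With only 13 boroughs the scores are noisy, so treat them as a rough guide.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

def silhouette_by_k(X, ks):
    """Fit k-means for each k in ks and return {k: silhouette score}."""
    scores = {}
    for k in ks:
        # n_init is set explicitly because its default changed across sklearn versions
        labels = KMeans(n_clusters=k, random_state=0, n_init=10).fit_predict(X)
        scores[k] = silhouette_score(X, labels)
    return scores

# Synthetic stand-in for london_clust: 13 boroughs x 20 venue-category frequencies
rng = np.random.default_rng(0)
X = rng.random((13, 20))
X = X / X.sum(axis=1, keepdims=True)  # normalise each row to sum to 1

scores = silhouette_by_k(X, range(2, 7))
best_k = max(scores, key=scores.get)  # k with the highest silhouette score
print(scores, best_k)
```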

#### Let's create a new dataframe that includes the cluster as well as the top 10 venues for each borough.

In [42]:
# add clustering labels
neigh_venues_sort.insert(0, 'Cluster Labels', kmeans.labels_)

# join the cluster labels and top venues onto inner_london, which holds each borough's latitude/longitude
london_merged = inner_london.join(neigh_venues_sort.set_index('Borough'), on='Borough')

london_merged.head()

Unnamed: 0,Borough,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Camden,51.529,-0.1255,0,Coffee Shop,Hotel,Bakery,Bookstore,Pizza Place,Breakfast Spot,Gym / Fitness Center,Sushi Restaurant,English Restaurant,Middle Eastern Restaurant
1,Greenwhich,51.4892,-0.0648,1,Coffee Shop,Brewery,Pub,Park,Art Gallery,Cocktail Bar,Bar,Bakery,Italian Restaurant,Vietnamese Restaurant
2,Hackney,51.545,-0.0553,1,Pub,Café,Coffee Shop,Bakery,Brewery,Park,Yoga Studio,Bookstore,Restaurant,Farmers Market
3,Hammersmith and Fulham,51.4927,-0.2339,1,Pub,Coffee Shop,Café,Japanese Restaurant,Gastropub,French Restaurant,Turkish Restaurant,Italian Restaurant,Park,Pizza Place
4,Islington,51.5416,-0.1022,1,Pub,Coffee Shop,Cocktail Bar,Park,Mediterranean Restaurant,Café,Pizza Place,Italian Restaurant,Seafood Restaurant,Ice Cream Shop
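`DataFrame.join(..., on='Borough')` performs a left join: every row of `inner_london` is kept, and a borough missing from `neigh_venues_sort` would silently get NaN in the venue and cluster columns (turning the labels into floats). As an alternative sketch, `pd.merge` accepts `validate='one_to_one'`, which raises if either side has duplicate borough names, and `indicator=True`, which flags unmatched rows. The toy frames reuse coordinates and labels from the table above.

```python
import pandas as pd

coords = pd.DataFrame({
    "Borough": ["Camden", "Greenwhich", "Hackney"],
    "Latitude": [51.529, 51.4892, 51.545],
    "Longitude": [-0.1255, -0.0648, -0.0553],
})
venues = pd.DataFrame({
    "Borough": ["Camden", "Greenwhich", "Hackney"],
    "Cluster Labels": [0, 1, 1],
    "1st Most Common Venue": ["Coffee Shop", "Coffee Shop", "Pub"],
})

# validate raises MergeError on duplicate keys; indicator adds a '_merge' column
merged = pd.merge(coords, venues, on="Borough", how="left",
                  validate="one_to_one", indicator=True)
unmatched = merged.loc[merged["_merge"] != "both", "Borough"].tolist()
print(merged.drop(columns="_merge"))
print("Unmatched boroughs:", unmatched)
```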


#### Now let's visualize the clusters on the map

In [43]:
import folium
import matplotlib.cm as cm
import matplotlib.colors as colors

# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters: one evenly spaced rainbow color per cluster
colors_array = cm.rainbow(np.linspace(0, 1, kclust))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map; labels run from 0 to kclust-1, so they index rainbow directly
for lat, lon, poi, cluster in zip(london_merged['Latitude'], london_merged['Longitude'], london_merged['Borough'], london_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster],
        fill=True,
        fill_color=rainbow[cluster],
        fill_opacity=1).add_to(map_clusters)

map_clusters

#### Examining Clusters
Let's examine each cluster to determine its defining venue categories, so that we can give it a descriptive name.

### CLUSTER LABEL 0

In [44]:
london_merged.loc[london_merged['Cluster Labels'] == 0, 
                     london_merged.columns[[0] + list(range(4, london_merged.shape[1]))]]

Unnamed: 0,Borough,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Camden,Coffee Shop,Hotel,Bakery,Bookstore,Pizza Place,Breakfast Spot,Gym / Fitness Center,Sushi Restaurant,English Restaurant,Middle Eastern Restaurant
5,Kensington and Chelsea,Restaurant,Pub,Garden,Italian Restaurant,Hotel,Café,Gym / Fitness Center,Park,Bakery,Persian Restaurant
9,Tower Hamlets,Coffee Shop,Gym / Fitness Center,Hotel,Pub,Burger Joint,Bar,Park,Italian Restaurant,Plaza,Gym


Looking at this cluster, we can make a visual count of the venues likely to attract heavy pedestrian traffic:

- 9 restaurants

- 6 coffee shops/bakeries/breakfast spots

- 4 gyms/fitness centers

- 3 hotels

- 3 pubs/bars
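The visual count above can also be reproduced in code, which makes it easier to repeat for every cluster. The sketch below copies the cluster-0 rows from the table above; `venue_group` and its rules (for example, treating Pizza Place and Burger Joint as restaurants) are hypothetical and were chosen to mirror the manual count, so adjust them to your own categories.

```python
import pandas as pd

# Top-10 venues of the three cluster-0 boroughs, copied from the table above
rows = {
    "Camden": ["Coffee Shop", "Hotel", "Bakery", "Bookstore", "Pizza Place",
               "Breakfast Spot", "Gym / Fitness Center", "Sushi Restaurant",
               "English Restaurant", "Middle Eastern Restaurant"],
    "Kensington and Chelsea": ["Restaurant", "Pub", "Garden", "Italian Restaurant",
                               "Hotel", "Café", "Gym / Fitness Center", "Park",
                               "Bakery", "Persian Restaurant"],
    "Tower Hamlets": ["Coffee Shop", "Gym / Fitness Center", "Hotel", "Pub",
                      "Burger Joint", "Bar", "Park", "Italian Restaurant", "Plaza", "Gym"],
}
cluster0 = pd.DataFrame.from_dict(rows, orient="index")

def venue_group(venue):
    """Hypothetical grouping rules that mirror the manual count above."""
    if "Restaurant" in venue or venue in {"Pizza Place", "Burger Joint", "Steakhouse"}:
        return "Restaurants"
    if venue in {"Coffee Shop", "Café", "Bakery", "Breakfast Spot"}:
        return "Coffee shops/bakeries/breakfast spots"
    if "Gym" in venue:
        return "Gyms/fitness centers"
    if venue == "Hotel":
        return "Hotels"
    if venue in {"Pub", "Bar", "Cocktail Bar", "Gastropub"}:
        return "Pubs/bars"
    return None  # venue types we do not count (parks, bookstores, ...)

# Flatten all top-10 cells into one Series, map each to a group and count;
# value_counts drops the None (uncounted) entries by default
counts = pd.Series(cluster0.to_numpy().ravel()).map(venue_group).value_counts()
print(counts)
```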

### CLUSTER LABEL 1

In [45]:
london_merged.loc[london_merged['Cluster Labels'] == 1, 
                     london_merged.columns[[0] + list(range(4, london_merged.shape[1]))]]

Unnamed: 0,Borough,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
1,Greenwhich,Coffee Shop,Brewery,Pub,Park,Art Gallery,Cocktail Bar,Bar,Bakery,Italian Restaurant,Vietnamese Restaurant
2,Hackney,Pub,Café,Coffee Shop,Bakery,Brewery,Park,Yoga Studio,Bookstore,Restaurant,Farmers Market
3,Hammersmith and Fulham,Pub,Coffee Shop,Café,Japanese Restaurant,Gastropub,French Restaurant,Turkish Restaurant,Italian Restaurant,Park,Pizza Place
4,Islington,Pub,Coffee Shop,Cocktail Bar,Park,Mediterranean Restaurant,Café,Pizza Place,Italian Restaurant,Seafood Restaurant,Ice Cream Shop
6,Lambeth,Coffee Shop,Pub,Cocktail Bar,Pizza Place,Brewery,Café,Park,Market,Music Venue,Japanese Restaurant



Looking at this cluster, we can make a visual count of the venues likely to attract heavy pedestrian traffic:

- 14 restaurants

- 11 coffee shops/cafés/bakeries

- 10 pubs/bars (including cocktail bars and gastropubs)


There are also some markets (a Farmers Market in Hackney and a Market in Lambeth) that could help our client's business.

However, there are 3 breweries that may draw customers away.

### CLUSTER LABEL 2

In [46]:
london_merged.loc[london_merged['Cluster Labels'] == 2, 
                     london_merged.columns[[0] + list(range(4, london_merged.shape[1]))]]

Unnamed: 0,Borough,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
11,Westminster,Hotel,Plaza,Boutique,Park,Cocktail Bar,Café,Indian Restaurant,Lounge,Art Museum,Japanese Restaurant


This cluster is very small (Westminster only) and not very interesting for our client.

### CLUSTER LABEL 3

In [47]:
london_merged.loc[london_merged['Cluster Labels'] == 3, 
                     london_merged.columns[[0] + list(range(4, london_merged.shape[1]))]]

Unnamed: 0,Borough,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
7,Lewisham,Coffee Shop,Pub,Park,Café,Turkish Restaurant,Supermarket,Pizza Place,Bar,Garden Center,Cocktail Bar
10,Wandsworth,Pub,Coffee Shop,Park,Café,Pizza Place,Thai Restaurant,Supermarket,Grocery Store,Burger Joint,Cocktail Bar


This cluster is small as well. It has some pubs and restaurants, but also some supermarkets and grocery stores that may draw customers away.

Therefore it is not a very interesting one for our client.

### CLUSTER LABEL 4

In [48]:
london_merged.loc[london_merged['Cluster Labels'] == 4, 
                     london_merged.columns[[0] + list(range(4, london_merged.shape[1]))]]

Unnamed: 0,Borough,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
8,Southwark,Coffee Shop,Hotel,Italian Restaurant,Seafood Restaurant,Bakery,Tapas Restaurant,Steakhouse,Garden,Brewery,Scenic Lookout
12,City of London,Hotel,Coffee Shop,Scenic Lookout,Gym / Fitness Center,Italian Restaurant,Cocktail Bar,Falafel Restaurant,Food Truck,Theater,Grocery Store


This cluster has some potential, as it contains a certain concentration of restaurants, although they are not the most common venues.

In [49]:
# Name the clusters based on their potential for our client and the analysis of each cluster above
Cluster0 = "Eat"
Cluster1 = "Eat and drink"
Cluster2 = "Tourist"
Cluster3 = "Coffee and walking"
Cluster4 = "Restaurant potential"

In [50]:
# Define one boolean condition per cluster label; these drive the new column of cluster names
conditions = [
    london_merged['Cluster Labels'] == 0,
    london_merged['Cluster Labels'] == 1,
    london_merged['Cluster Labels'] == 2,
    london_merged['Cluster Labels'] == 3,
    london_merged['Cluster Labels'] == 4,
]

# Now define the outputs
outputs = [
    Cluster0,Cluster1, Cluster2, Cluster3, Cluster4
]

# Finally, use np.select: it is vectorised, so it is typically faster than a row-wise apply on large data sets
CNames = np.select(conditions, outputs, 'Other')
pd.Series(CNames).head()

0              Eat
1    Eat and drink
2    Eat and drink
3    Eat and drink
4    Eat and drink
dtype: object
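Because each label maps to exactly one name, `Series.map` with a dictionary is a shorter alternative to building one condition per cluster. Labels missing from the dictionary come back as NaN, which `fillna('Other')` turns into the same default `np.select` uses above. A sketch with a few labels from the output above plus one deliberately unknown label:

```python
import pandas as pd

cluster_names = {0: "Eat", 1: "Eat and drink", 2: "Tourist",
                 3: "Coffee and walking", 4: "Restaurant potential"}

labels = pd.Series([0, 1, 1, 3, 4, 7])  # 7 is an unknown label
names = labels.map(cluster_names).fillna("Other")
print(names.tolist())
```

In the notebook, the equivalent would be `london_merged['Cluster Labels'].map(cluster_names).fillna('Other')`.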

#### Now that we're happy with the result, let's add it to our dataframe

It could be done as below:

<code>london_merged['Cluster Name'] = CNames</code>

But we will do it as follows, so that the column is inserted at the front of the DataFrame (position 0) instead of at the end

In [51]:
london_merged.insert(0,'Cluster Name', CNames) 
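If you would rather place the new column immediately before 'Cluster Labels' than at the front, `Index.get_loc` returns that column's integer position, which can be passed straight to `DataFrame.insert`. A small sketch on a toy frame built from values in the tables above:

```python
import pandas as pd

df = pd.DataFrame({
    "Borough": ["Camden", "Greenwhich"],
    "Latitude": [51.529, 51.4892],
    "Longitude": [-0.1255, -0.0648],
    "Cluster Labels": [0, 1],
})
names = ["Eat", "Eat and drink"]

# insert() places the new column at the given integer position
df.insert(df.columns.get_loc("Cluster Labels"), "Cluster Name", names)
print(df.columns.tolist())
```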

In [52]:
london_merged.head(13)

Unnamed: 0,Cluster Name,Borough,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Eat,Camden,51.529,-0.1255,0,Coffee Shop,Hotel,Bakery,Bookstore,Pizza Place,Breakfast Spot,Gym / Fitness Center,Sushi Restaurant,English Restaurant,Middle Eastern Restaurant
1,Eat and drink,Greenwhich,51.4892,-0.0648,1,Coffee Shop,Brewery,Pub,Park,Art Gallery,Cocktail Bar,Bar,Bakery,Italian Restaurant,Vietnamese Restaurant
2,Eat and drink,Hackney,51.545,-0.0553,1,Pub,Café,Coffee Shop,Bakery,Brewery,Park,Yoga Studio,Bookstore,Restaurant,Farmers Market
3,Eat and drink,Hammersmith and Fulham,51.4927,-0.2339,1,Pub,Coffee Shop,Café,Japanese Restaurant,Gastropub,French Restaurant,Turkish Restaurant,Italian Restaurant,Park,Pizza Place
4,Eat and drink,Islington,51.5416,-0.1022,1,Pub,Coffee Shop,Cocktail Bar,Park,Mediterranean Restaurant,Café,Pizza Place,Italian Restaurant,Seafood Restaurant,Ice Cream Shop
5,Eat,Kensington and Chelsea,51.502,-0.1947,0,Restaurant,Pub,Garden,Italian Restaurant,Hotel,Café,Gym / Fitness Center,Park,Bakery,Persian Restaurant
6,Eat and drink,Lambeth,51.4607,-0.1163,1,Coffee Shop,Pub,Cocktail Bar,Pizza Place,Brewery,Café,Park,Market,Music Venue,Japanese Restaurant
7,Coffee and walking,Lewisham,51.4452,-0.0209,3,Coffee Shop,Pub,Park,Café,Turkish Restaurant,Supermarket,Pizza Place,Bar,Garden Center,Cocktail Bar
8,Restaurant potential,Southwark,51.5035,-0.0804,4,Coffee Shop,Hotel,Italian Restaurant,Seafood Restaurant,Bakery,Tapas Restaurant,Steakhouse,Garden,Brewery,Scenic Lookout
9,Eat,Tower Hamlets,51.5099,-0.0059,0,Coffee Shop,Gym / Fitness Center,Hotel,Pub,Burger Joint,Bar,Park,Italian Restaurant,Plaza,Gym


In [53]:
# Keep only the boroughs whose cluster name contains 'Eat', i.e. those with the most potential for our client
london_merged['Borough'][london_merged['Cluster Name'].str.contains('Eat')].tolist()

['Camden',
 'Greenwhich',
 'Hackney',
 'Hammersmith and Fulham',
 'Islington',
 'Kensington and Chelsea',
 'Lambeth',
 'Tower Hamlets']

Now let's plot them on a map to get a sense of how central they are:

In [59]:
# For that, let's create a new dataframe with only the selected boroughs (the same list as in the previous cell)
list_borough = london_merged.loc[london_merged['Cluster Name'].str.contains('Eat'), 'Borough'].tolist()
selection = inner_london.loc[inner_london['Borough'].isin(list_borough)]

selection.head(10)

Unnamed: 0,Borough,Latitude,Longitude
0,Camden,51.529,-0.1255
1,Greenwhich,51.4892,-0.0648
2,Hackney,51.545,-0.0553
3,Hammersmith and Fulham,51.4927,-0.2339
4,Islington,51.5416,-0.1022
5,Kensington and Chelsea,51.502,-0.1947
6,Lambeth,51.4607,-0.1163
9,Tower Hamlets,51.5099,-0.0059


In [55]:
# Create a map of inner London showing only the selected boroughs
selection_map = folium.Map(location=[latitude, longitude], zoom_start=12)

# add markers to map
for lat, lng, borough in zip(selection['Latitude'], selection['Longitude'], selection['Borough']):
    label = '{}'.format(borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='green',
        fill=True,
        fill_color='green',
        fill_opacity=0.7).add_to(selection_map)
    
selection_map

In [56]:
# Let's explore our selected boroughs a bit further by reducing the information
exploring_selection = london_merged.loc[london_merged['Borough'].isin(list_borough)]

# Select only some of the columns, up to the 5th most common venue
exploring_selection[['Cluster Name','Borough','1st Most Common Venue','2nd Most Common Venue','3rd Most Common Venue','4th Most Common Venue','5th Most Common Venue']]

Unnamed: 0,Cluster Name,Borough,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
0,Eat,Camden,Coffee Shop,Hotel,Bakery,Bookstore,Pizza Place
1,Eat and drink,Greenwhich,Coffee Shop,Brewery,Pub,Park,Art Gallery
2,Eat and drink,Hackney,Pub,Café,Coffee Shop,Bakery,Brewery
3,Eat and drink,Hammersmith and Fulham,Pub,Coffee Shop,Café,Japanese Restaurant,Gastropub
4,Eat and drink,Islington,Pub,Coffee Shop,Cocktail Bar,Park,Mediterranean Restaurant
5,Eat,Kensington and Chelsea,Restaurant,Pub,Garden,Italian Restaurant,Hotel
6,Eat and drink,Lambeth,Coffee Shop,Pub,Cocktail Bar,Pizza Place,Brewery
9,Eat,Tower Hamlets,Coffee Shop,Gym / Fitness Center,Hotel,Pub,Burger Joint


### Now let's bring in some information about income and population
https://data.london.gov.uk/dataset/average-income-tax-payers-borough

We will use the 2017-2018 data.

In [57]:
income = pd.read_excel('income-of-tax-payers.xlsx', index_col=0).reset_index() 
income = income.loc[income['Area'].isin(list_borough)]
income = income[['Area','Mean £']]
income = income.set_index('Area')
income.sort_values(by='Mean £',ascending=False,inplace = True)
#income['Mean £'] = income['Mean £'].astype(int)
income.head()

Unnamed: 0_level_0,Mean £
Area,Unnamed: 1_level_1
Kensington and Chelsea,177000.0
Camden,92600.0
Hammersmith and Fulham,73200.0
Islington,62100.0
Tower Hamlets,46600.0


In [58]:
population = pd.read_excel('income-of-tax-payers.xlsx', index_col=0).reset_index() 
population = population.loc[population['Area'].isin(list_borough)]
population = population[['Area','Number of Individuals']]
population = population.set_index('Area')
population.sort_values(by='Number of Individuals',ascending=False,inplace = True)
#population['Number of Individuals'] = population['Number of Individuals'].astype(int)
population

Unnamed: 0_level_0,Number of Individuals
Area,Unnamed: 1_level_1
Lambeth,162000.0
Tower Hamlets,136000.0
Hackney,112000.0
Islington,106000.0
Camden,101000.0
Hammersmith and Fulham,94000.0
Kensington and Chelsea,68000.0


# Conclusion

Our client wanted boroughs with a large population, plenty of commerce and high income, as well as a good number of on-trade businesses.

Our selection is already based not only on commercial areas (Inner London) but also on those with a good amount of on-trade commerce.

Based on population alone, we could choose the first 3 boroughs in the last table (Lambeth, Tower Hamlets and Hackney), but since our client needs good to high income, the 3 boroughs finally chosen are the top three by mean income:

- Kensington and Chelsea

- Camden

- Hammersmith and Fulham

This is an approximation of a broader project that could be carried out at a much more granular level with a different data set. For example, while researching online we found that the boroughs are further subdivided into districts and wards.

This way we could get clusters containing many more areas, giving a much more precise picture of each borough and its venues.

We could also use street names to obtain coordinates and explore specific streets within certain boroughs, so that we can point our client to an exact location.

So, if our client is happy with what we can offer and is willing to go ahead, we will dive deeper into London to give them a precise answer to their problem.

We have also shown the leverage that an API like Foursquare provides for answering real business questions.


Norma López-Sancho