# Description and discussion of the background

I wanted to analyse a big city in my neighbourhood, and I chose London. With such a large population it won't be hard to get proper clusters, and its unique transport system and location make it an interesting topic to explore in depth. We can cluster the data and check which cluster has the most venues or the lowest prices.


# Data Description

- The Foursquare API, to get the venues around each area of London
- Whatever real estate price data I can find, to support the analysis
- The Spatial Data Repository of NYU, which will allow the London data to be clustered

# Methodology

In [1]:
import pandas as pd 
import requests
from bs4 import BeautifulSoup
import numpy as np


### Installing necessary packages and downloading the data needed for further analysis


In [2]:
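# Download the Wikipedia list of London areas and pick the second table on the page
res = requests.get("https://en.wikipedia.org/wiki/List_of_areas_of_London")
soup = BeautifulSoup(res.content, 'lxml')
table = soup.find_all('table')[1]
# read_html returns a list of DataFrames, so concatenate them into a single frame
df = pd.read_html(str(table))
df = pd.concat(df)
df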

Unnamed: 0,Location,London borough,Post town,Postcode district,Dial code,OS grid ref
0,Abbey Wood,"Bexley, Greenwich [7]",LONDON,SE2,020,TQ465785
1,Acton,"Ealing, Hammersmith and Fulham[8]",LONDON,"W3, W4",020,TQ205805
2,Addington,Croydon[8],CROYDON,CR0,020,TQ375645
3,Addiscombe,Croydon[8],CROYDON,CR0,020,TQ345665
4,Albany Park,Bexley,"BEXLEY, SIDCUP","DA5, DA14",020,TQ478728
5,Aldborough Hatch,Redbridge[9],ILFORD,IG2,020,TQ455895
6,Aldgate,City[10],LONDON,EC3,020,TQ334813
7,Aldwych,Westminster[10],LONDON,WC2,020,TQ307810
8,Alperton,Brent[11],WEMBLEY,HA0,020,TQ185835
9,Anerley,Bromley[11],LONDON,SE20,020,TQ345695


### Dropping columns which are not needed and renaming the remaining columns

In [3]:
# Positions 4 and 5 are the 'Dial code' and 'OS grid ref' columns
cols_to_drop = [4, 5]
df = df.drop(df.columns[cols_to_drop], axis=1)

In [4]:
df.columns = ['Location', 'Borough', 'Posttown', 'Postcode']

### First the footnote markers (such as [8]) are stripped from the borough names. Then, where an area has more than one postcode (for example, Acton has two: W3 and W4), the postcodes are spread over multiple rows that repeat the values of the other columns.

In [5]:
# Remove Wikipedia footnote markers such as "[8]", including any space in front of them
df['Borough'] = df['Borough'].str.replace(r'\s*\[\d+\]', '', regex=True)

In [6]:
df

Unnamed: 0,Location,Borough,Posttown,Postcode
0,Abbey Wood,"Bexley, Greenwich",LONDON,SE2
1,Acton,"Ealing, Hammersmith and Fulham",LONDON,"W3, W4"
2,Addington,Croydon,CROYDON,CR0
3,Addiscombe,Croydon,CROYDON,CR0
4,Albany Park,Bexley,"BEXLEY, SIDCUP","DA5, DA14"
5,Aldborough Hatch,Redbridge,ILFORD,IG2
6,Aldgate,City,LONDON,EC3
7,Aldwych,Westminster,LONDON,WC2
8,Alperton,Brent,WEMBLEY,HA0
9,Anerley,Bromley,LONDON,SE20


In [7]:
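# One row per postcode; split(',') keeps the space after each comma (' W4'), so strip it if exact prefixes matter later
df0 = df.drop('Postcode', axis=1).join(df['Postcode'].str.split(',', expand=True).stack().reset_index(level=1, drop=True).rename('Postcode'))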


In [8]:
df0

Unnamed: 0,Location,Borough,Posttown,Postcode
0,Abbey Wood,"Bexley, Greenwich",LONDON,SE2
1,Acton,"Ealing, Hammersmith and Fulham",LONDON,W3
1,Acton,"Ealing, Hammersmith and Fulham",LONDON,W4
2,Addington,Croydon,CROYDON,CR0
3,Addiscombe,Croydon,CROYDON,CR0
4,Albany Park,Bexley,"BEXLEY, SIDCUP",DA5
4,Albany Park,Bexley,"BEXLEY, SIDCUP",DA14
5,Aldborough Hatch,Redbridge,ILFORD,IG2
6,Aldgate,City,LONDON,EC3
7,Aldwych,Westminster,LONDON,WC2
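
A possible variant of the postcode split, shown only as a sketch: it uses `DataFrame.explode` and strips whitespace so that the second postcode of an area is stored as `W4` rather than ` W4`. The toy frame and the names `toy` and `exploded` are illustrative assumptions; the values are copied from the table above.

```python
import pandas as pd

# Toy frame mirroring two rows of the table above.
toy = pd.DataFrame({
    "Location": ["Acton", "Addington"],
    "Postcode": ["W3, W4", "CR0"],
})

# Turn each comma-separated string into a list, then give every list element its own row.
exploded = toy.assign(Postcode=toy["Postcode"].str.split(",")).explode("Postcode")

# Remove the space that follows each comma so prefix checks behave as expected.
exploded["Postcode"] = exploded["Postcode"].str.strip()
exploded = exploded.reset_index(drop=True)

print(exploded)
```

With the whitespace removed, a later `str.startswith('W')` filter would also keep Acton's W4 row.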


### We only want London, so we keep the rows whose post town contains LONDON

In [9]:
df00 = df0[df0['Posttown'].str.contains('LONDON')]

In [10]:
df00

Unnamed: 0,Location,Borough,Posttown,Postcode
0,Abbey Wood,"Bexley, Greenwich",LONDON,SE2
1,Acton,"Ealing, Hammersmith and Fulham",LONDON,W3
1,Acton,"Ealing, Hammersmith and Fulham",LONDON,W4
6,Aldgate,City,LONDON,EC3
7,Aldwych,Westminster,LONDON,WC2
9,Anerley,Bromley,LONDON,SE20
10,Angel,Islington,LONDON,EC1
10,Angel,Islington,LONDON,N1
12,Archway,Islington,LONDON,N19
14,Arkley,Barnet,"BARNET, LONDON",EN5


### Keeping only west London (postcodes starting with W, which also matches WC) because of Foursquare limitations

In [11]:
df_w = df00[df00['Postcode'].str.startswith(('W'))].reset_index(drop=True)

In [12]:
df_w

Unnamed: 0,Location,Borough,Posttown,Postcode
0,Acton,"Ealing, Hammersmith and Fulham",LONDON,W3
1,Aldwych,Westminster,LONDON,WC2
2,Bayswater,Westminster,LONDON,W2
3,Bedford Park,Ealing,LONDON,W4
4,Bloomsbury,Camden,LONDON,WC1
5,Charing Cross,Westminster,LONDON,WC2
6,Chinatown,Westminster,LONDON,W1
7,Chiswick,"Hounslow, Ealing, Hammersmith and Fulham",LONDON,W4
8,Covent Garden,Westminster,LONDON,WC2
9,Ealing,Ealing,LONDON,W5


### Setting up the geocoder

In [13]:
!pip -q install geocoder
import geocoder
import time
!pip -q install folium
import folium
from geopy.geocoders import Nominatim

In [14]:
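def get_latlng(postal_code, max_tries=5):
    # Geocode a London postcode district with ArcGIS and return [latitude, longitude].
    # max_tries bounds the retries so a persistent failure cannot loop forever.
    for _ in range(max_tries):
        g = geocoder.arcgis('{}, London, United Kingdom'.format(postal_code))
        if g.latlng is not None:
            return g.latlng
        time.sleep(1)  # short pause before the next attempt
    raise RuntimeError('Could not geocode {}'.format(postal_code))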


### Getting the coordinates for the postcodes we have here 

In [15]:

postal_codes = df_w['Postcode']
coordinates = [get_latlng(postal_code) for postal_code in postal_codes.tolist()]
# df_w_loc is a second name for df_w (not a copy), so the new columns also appear in df_w
df_w_loc = df_w
# The obtained coordinates (latitude and longitude) are joined with the dataframe as shown
df_w_coordinates = pd.DataFrame(coordinates, columns = ['Latitude', 'Longitude'])
df_w_loc['Latitude'] = df_w_coordinates['Latitude']
df_w_loc['Longitude'] = df_w_coordinates['Longitude']
df_w_loc.head(5)

Unnamed: 0,Location,Borough,Posttown,Postcode,Latitude,Longitude
0,Acton,"Ealing, Hammersmith and Fulham",LONDON,W3,51.51324,-0.26746
1,Aldwych,Westminster,LONDON,WC2,51.51651,-0.11968
2,Bayswater,Westminster,LONDON,W2,51.51494,-0.18048
3,Bedford Park,Ealing,LONDON,W4,51.48944,-0.26194
4,Bloomsbury,Camden,LONDON,WC1,51.5245,-0.12273
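
Several areas share a postcode district (WC2 appears for Aldwych, Charing Cross and Covent Garden), so the same lookup is repeated. The sketch below shows one way to cache lookups with `functools.lru_cache`. `fake_geocode` is a hypothetical stand-in for the network call, and its coordinates are copied from the table above.

```python
from functools import lru_cache

calls = []  # records every lookup that actually reaches the "geocoder"


def fake_geocode(postcode):
    """Hypothetical stand-in for a network geocoder; not part of any library."""
    calls.append(postcode)
    table = {"W3": (51.51324, -0.26746), "WC2": (51.51651, -0.11968)}
    return table[postcode]


@lru_cache(maxsize=None)
def cached_latlng(postcode):
    # The first call per postcode hits the geocoder; repeats are served from the cache.
    return fake_geocode(postcode)


postcodes = ["W3", "WC2", "WC2", "WC2"]
coords = [cached_latlng(p) for p in postcodes]
print(len(calls), "lookups for", len(postcodes), "rows")
```

In the notebook, the same idea would wrap `get_latlng` instead of the stand-in function.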


In [19]:
LIMIT = 100 # limit of number of venues returned by Foursquare API
radius = 2000 # define radius
CLIENT_ID = "YOUR_FOURSQUARE_CLIENT_ID" # your Foursquare ID (placeholder; keep real credentials out of the notebook)
CLIENT_SECRET = "YOUR_FOURSQUARE_CLIENT_SECRET" # your Foursquare Secret (placeholder)
VERSION = '20180604'



def getNearbyVenues(names, latitudes, longitudes, radius=2000):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])
    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighbourhood', 
                  'Neighbourhood Latitude', 
                  'Neighbourhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)
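
The function above indexes straight into the response, so a venue without a category or a response without groups would raise an exception. The following is a minimal offline sketch of a more tolerant flattening step. `extract_venue_rows` and `sample_payload` are hypothetical names, and the payload only imitates the fields that the function above reads.

```python
def extract_venue_rows(name, lat, lng, payload):
    """Flatten one explore-style payload into row tuples, tolerating missing fields."""
    groups = payload.get("response", {}).get("groups", [])
    items = groups[0].get("items", []) if groups else []
    rows = []
    for item in items:
        venue = item["venue"]
        categories = venue.get("categories") or []
        # Use None when a venue carries no category instead of raising IndexError.
        category = categories[0]["name"] if categories else None
        rows.append((
            name, lat, lng,
            venue["name"],
            venue["location"]["lat"],
            venue["location"]["lng"],
            category,
        ))
    return rows


# Hypothetical payload: the first venue uses values from this notebook's Acton output,
# the second is made up to exercise the missing-category path.
sample_payload = {
    "response": {"groups": [{"items": [
        {"venue": {"name": "The Station House",
                   "location": {"lat": 51.508877, "lng": -0.263076},
                   "categories": [{"name": "Pub"}]}},
        {"venue": {"name": "Unnamed Venue",
                   "location": {"lat": 51.5, "lng": -0.26},
                   "categories": []}},
    ]}]}
}

rows = extract_venue_rows("Acton", 51.51324, -0.26746, sample_payload)
empty = extract_venue_rows("Acton", 51.51324, -0.26746, {"response": {}})
print(rows)
```

Keeping the parsing separate from the HTTP request also makes it possible to test the flattening logic without calling the API.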

In [21]:
w_venues = getNearbyVenues(names=df_w['Location'],
                                   latitudes=df_w['Latitude'],
                                   longitudes=df_w['Longitude']
                                  )

Acton
Aldwych
Bayswater
Bedford Park
Bloomsbury
Charing Cross
Chinatown
Chiswick
Covent Garden
Ealing
Fitzrovia
Grove Park
Gunnersbury
Hammersmith
Hanwell
Holborn
Holland Park
King's Cross
Little Venice
Maida Vale
Marylebone (also St Marylebone)
Mayfair
North Kensington
Notting Hill
Paddington
Shepherd's Bush
Soho
St Giles
St Pancras
West Ealing
West Kensington
White City
Wormwood Scrubs


In [29]:
w_venues.head(5)

Unnamed: 0,Neighbourhood,Neighbourhood Latitude,Neighbourhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Acton,51.51324,-0.26746,The Station House,51.508877,-0.263076,Pub
1,Acton,51.51324,-0.26746,London Star Hotel,51.509624,-0.272456,Hotel
2,Acton,51.51324,-0.26746,Everyone Active,51.506608,-0.266878,Gym / Fitness Center
3,Acton,51.51324,-0.26746,Acton Park,51.508595,-0.261573,Park
4,Acton,51.51324,-0.26746,Bake Me,51.508452,-0.268543,Creperie


In [30]:
w_venues.shape

(3264, 7)

In [31]:
w_venues.groupby('Neighbourhood').count()


Unnamed: 0_level_0,Neighbourhood Latitude,Neighbourhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Neighbourhood,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
Acton,100,100,100,100,100,100
Aldwych,100,100,100,100,100,100
Bayswater,100,100,100,100,100,100
Bedford Park,100,100,100,100,100,100
Bloomsbury,100,100,100,100,100,100
Charing Cross,100,100,100,100,100,100
Chinatown,100,100,100,100,100,100
Chiswick,100,100,100,100,100,100
Covent Garden,100,100,100,100,100,100
Ealing,100,100,100,100,100,100


In [32]:
w_venue_unique_count = w_venues['Venue Category'].value_counts().to_frame(name='Count')


In [33]:
w_venue_unique_count

Unnamed: 0,Count
Pub,195
Coffee Shop,163
Hotel,148
Bakery,95
Park,84
Gym / Fitness Center,82
Café,78
Indian Restaurant,74
Pizza Place,71
Theater,68


In [34]:
address = 'London, United Kingdom'
geolocator = Nominatim(user_agent="ln_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of London are {}, {}.'.format(latitude, longitude))

The geographical coordinates of London are 51.5073219, -0.1276474.


In [38]:
map_london = folium.Map(location = [latitude, longitude], zoom_start = 12)
for lat, lng, borough, loc in zip(df_w['Latitude'], 
                                  df_w['Longitude'],
                                  df_w['Borough'],
                                  df_w['Location']):
    label = '{} - {}'.format(loc, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7).add_to(map_london)  
    
display(map_london)

In [40]:
w_onehot = pd.get_dummies(w_venues[['Venue Category']], prefix = "", prefix_sep = "")
w_onehot['Neighbourhood'] = w_venues['Neighbourhood']
# move neighborhood column to the first column
fixed_columns = [w_onehot.columns[-1]] + list(w_onehot.columns[:-1])
w_onehot = w_onehot[fixed_columns]

In [41]:
w_onehot.head(5)

Unnamed: 0,Neighbourhood,American Restaurant,Antique Shop,Argentinian Restaurant,Art Gallery,Art Museum,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,Bakery,...,Train Station,Turkish Restaurant,University,Vegetarian / Vegan Restaurant,Vietnamese Restaurant,Warehouse Store,Wine Bar,Wine Shop,Women's Store,Yoga Studio
0,Acton,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,Acton,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,Acton,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,Acton,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,Acton,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [47]:
w_onehot.loc[w_onehot['Burger Joint'] != 0]

Unnamed: 0,Neighbourhood,American Restaurant,Antique Shop,Argentinian Restaurant,Art Gallery,Art Museum,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,Bakery,...,Train Station,Turkish Restaurant,University,Vegetarian / Vegan Restaurant,Vietnamese Restaurant,Warehouse Store,Wine Bar,Wine Shop,Women's Store,Yoga Studio
279,Bayswater,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
323,Bedford Park,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
416,Bloomsbury,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
611,Chinatown,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
656,Chinatown,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
723,Chiswick,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
922,Ealing,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
943,Ealing,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
976,Ealing,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1011,Fitzrovia,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [48]:
w_grouped = w_onehot.groupby('Neighbourhood').mean().reset_index()
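
To make the one-hot and group-mean steps concrete, here is a toy sketch with two made-up neighbourhoods, `A` and `B`. Each value in the grouped frame is the share of a neighbourhood's venues that fall into a category, which is what `w_grouped` holds for the real data. All names and values in the sketch are illustrative assumptions.

```python
import pandas as pd

# Six made-up venues in two made-up neighbourhoods.
toy = pd.DataFrame({
    "Neighbourhood": ["A", "A", "A", "A", "B", "B"],
    "Venue Category": ["Pub", "Pub", "Park", "Hotel", "Pub", "Hotel"],
})

# One indicator column per category (integer 0/1 so the mean is a plain fraction).
onehot = pd.get_dummies(toy["Venue Category"], dtype=int)
onehot.insert(0, "Neighbourhood", toy["Neighbourhood"])

# The mean of the indicators per neighbourhood is the relative frequency of each category.
grouped = onehot.groupby("Neighbourhood").mean().reset_index()
print(grouped)
```

Neighbourhood `A` has four venues, two of them pubs, so its `Pub` value is 0.5.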

In [49]:
num_top_venues = 10 # Top common venues needed
for hood in w_grouped['Neighbourhood']:
    print("----"+hood+"----")
    temp = w_grouped[w_grouped['Neighbourhood'] == hood].T.reset_index()
    temp.columns = ['venue', 'freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending = False).reset_index(drop = True).head(num_top_venues))
    print('\n')

----Acton----
                       venue  freq
0                Coffee Shop  0.07
1       Gym / Fitness Center  0.07
2                        Pub  0.06
3              Grocery Store  0.06
4                      Hotel  0.06
5  Middle Eastern Restaurant  0.04
6                       Park  0.04
7             Sandwich Place  0.03
8                  Gastropub  0.03
9             Breakfast Spot  0.02


----Aldwych----
            venue  freq
0         Theater  0.09
1           Hotel  0.06
2  Ice Cream Shop  0.06
3     Coffee Shop  0.04
4      Steakhouse  0.04
5        Wine Bar  0.03
6       Bookstore  0.03
7          Garden  0.03
8    Liquor Store  0.03
9    Dessert Shop  0.03


----Bayswater----
                  venue  freq
0                 Hotel  0.08
1                   Pub  0.06
2                Garden  0.05
3           Coffee Shop  0.05
4                  Café  0.05
5  Gym / Fitness Center  0.04
6                Bakery  0.03
7              Wine Bar  0.03
8             Gastropub  0.02

In [50]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending = False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

In [53]:
num_top_venues = 10
indicators = ['st', 'nd', 'rd']
# create columns according to number of top venues
columns = ['Neighbourhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:  # no special suffix beyond 'rd', so fall back to 'th'
        columns.append('{}th Most Common Venue'.format(ind+1))
# create a new dataframe
neighbourhoods_venues_sorted = pd.DataFrame(columns=columns)
neighbourhoods_venues_sorted['Neighbourhood'] = w_grouped['Neighbourhood']
for ind in np.arange(w_grouped.shape[0]):
    neighbourhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(w_grouped.iloc[ind, :], num_top_venues)
neighbourhoods_venues_sorted.head(5)

Unnamed: 0,Neighbourhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Acton,Coffee Shop,Gym / Fitness Center,Grocery Store,Pub,Hotel,Park,Middle Eastern Restaurant,Sandwich Place,Gastropub,Pizza Place
1,Aldwych,Theater,Ice Cream Shop,Hotel,Coffee Shop,Steakhouse,Liquor Store,Bookstore,Dessert Shop,Garden,Wine Bar
2,Bayswater,Hotel,Pub,Coffee Shop,Garden,Café,Gym / Fitness Center,Wine Bar,Bakery,Yoga Studio,Ice Cream Shop
3,Bedford Park,Pub,Coffee Shop,Park,Bakery,Café,Gastropub,Japanese Restaurant,Gym / Fitness Center,Indian Restaurant,Thai Restaurant
4,Bloomsbury,Coffee Shop,Theater,Hotel,Bookstore,Steakhouse,Beer Bar,Pizza Place,Exhibit,Dance Studio,Tapas Restaurant
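
The suffix lookup above is enough for ten columns, but it would label a 21st column as "21th". If more columns were ever needed, a small helper could build the labels instead. This is a sketch, and `ordinal` is a hypothetical helper name, not something the notebook defines.

```python
def ordinal(n):
    """Return n with its English ordinal suffix, e.g. 1 -> '1st', 11 -> '11th'."""
    if 10 <= n % 100 <= 20:
        suffix = "th"  # 11th, 12th, 13th and the rest of the teens
    else:
        suffix = {1: "st", 2: "nd", 3: "rd"}.get(n % 10, "th")
    return "{}{}".format(n, suffix)


num_top_venues = 10
columns = ["Neighbourhood"] + [
    "{} Most Common Venue".format(ordinal(i)) for i in range(1, num_top_venues + 1)
]
print(columns)
```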


In [54]:
# pass axis by keyword; the positional form is not accepted by newer pandas versions
w_grouped_clustering = w_grouped.drop('Neighbourhood', axis=1)

In [56]:
from sklearn.cluster import KMeans
# set number of clusters
kclusters = 5
# run k-means clustering
kmeans = KMeans(n_clusters = kclusters, random_state=0).fit(w_grouped_clustering)
# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10]

array([0, 1, 4, 3, 1, 1, 2, 3, 1, 3], dtype=int32)
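
The number of clusters is fixed at 5 here without a comparison. K-means minimises the within-cluster sum of squared distances, which scikit-learn exposes as `kmeans.inertia_`, so comparing that value across several choices of `kclusters` is one common way to judge the setting. The sketch below computes the same quantity by hand on made-up points to show what it measures. `within_cluster_ss` is a hypothetical helper name.

```python
import numpy as np


def within_cluster_ss(points, labels):
    """Sum of squared distances from each point to the mean of its assigned cluster."""
    points = np.asarray(points, dtype=float)
    labels = np.asarray(labels)
    total = 0.0
    for k in np.unique(labels):
        members = points[labels == k]
        total += ((members - members.mean(axis=0)) ** 2).sum()
    return float(total)


# Four made-up points forming two obvious pairs.
points = [[0, 0], [0, 2], [10, 0], [10, 2]]

good = within_cluster_ss(points, [0, 0, 1, 1])  # the nearby points are grouped together
bad = within_cluster_ss(points, [0, 1, 0, 1])   # the far-apart points are grouped together
print(good, bad)
```

A lower value means tighter clusters. The value always falls as the number of clusters grows, so the usual approach is to look for the point where the improvement levels off.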

In [68]:
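# add clustering labels; drop any existing column first so the cell can be re-run without a ValueError
neighbourhoods_venues_sorted = neighbourhoods_venues_sorted.drop(columns='Cluster Labels', errors='ignore')
neighbourhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)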

In [69]:

w_merged = df_w
# match/merge West London data with latitude/longitude for each neighbourhood
w_merged_latlong = w_merged.join(neighbourhoods_venues_sorted.set_index('Neighbourhood'), on = 'Location')
w_merged_latlong.head(5)

Unnamed: 0,Location,Borough,Posttown,Postcode,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Acton,"Ealing, Hammersmith and Fulham",LONDON,W3,51.51324,-0.26746,0,Coffee Shop,Gym / Fitness Center,Grocery Store,Pub,Hotel,Park,Middle Eastern Restaurant,Sandwich Place,Gastropub,Pizza Place
1,Aldwych,Westminster,LONDON,WC2,51.51651,-0.11968,1,Theater,Ice Cream Shop,Hotel,Coffee Shop,Steakhouse,Liquor Store,Bookstore,Dessert Shop,Garden,Wine Bar
2,Bayswater,Westminster,LONDON,W2,51.51494,-0.18048,4,Hotel,Pub,Coffee Shop,Garden,Café,Gym / Fitness Center,Wine Bar,Bakery,Yoga Studio,Ice Cream Shop
3,Bedford Park,Ealing,LONDON,W4,51.48944,-0.26194,3,Pub,Coffee Shop,Park,Bakery,Café,Gastropub,Japanese Restaurant,Gym / Fitness Center,Indian Restaurant,Thai Restaurant
4,Bloomsbury,Camden,LONDON,WC1,51.5245,-0.12273,1,Coffee Shop,Theater,Hotel,Bookstore,Steakhouse,Beer Bar,Pizza Place,Exhibit,Dance Studio,Tapas Restaurant


In [72]:
import matplotlib.cm as cm
import matplotlib.colors as colors

# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)
# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]
# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(w_merged_latlong['Latitude'], w_merged_latlong['Longitude'], w_merged_latlong['Location'], w_merged_latlong['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    # index the palette directly with the 0-based cluster label
    folium.CircleMarker(
        [lat, lon],
        radius=20,
        popup=label,
        color=rainbow[cluster],
        fill=True,
        fill_color=rainbow[cluster],
        fill_opacity=0.7).add_to(map_clusters)
display(map_clusters)

# Results

### Summarizing cluster one (label 0)


In [75]:
w_merged_latlong.loc[w_merged_latlong['Cluster Labels'] == 0, w_merged_latlong.columns[[1] + list(range(5, w_merged_latlong.shape[1]))]]

Unnamed: 0,Borough,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,"Ealing, Hammersmith and Fulham",-0.26746,0,Coffee Shop,Gym / Fitness Center,Grocery Store,Pub,Hotel,Park,Middle Eastern Restaurant,Sandwich Place,Gastropub,Pizza Place
13,Hammersmith and Fulham,-0.22935,0,Pub,Coffee Shop,Café,Italian Restaurant,Indian Restaurant,Pizza Place,Hotel,Thai Restaurant,Park,Bakery
25,Hammersmith and Fulham,-0.23691,0,Pub,Gym / Fitness Center,Middle Eastern Restaurant,Bakery,Clothing Store,Thai Restaurant,Indian Restaurant,Coffee Shop,Gastropub,Falafel Restaurant
30,Hammersmith and Fulham,-0.20993,0,Coffee Shop,Café,Pub,Pizza Place,Bakery,Middle Eastern Restaurant,Restaurant,Indian Restaurant,Gastropub,Italian Restaurant
31,Hammersmith and Fulham,-0.23691,0,Pub,Gym / Fitness Center,Middle Eastern Restaurant,Bakery,Clothing Store,Thai Restaurant,Indian Restaurant,Coffee Shop,Gastropub,Falafel Restaurant
32,Hammersmith and Fulham,-0.23691,0,Pub,Gym / Fitness Center,Middle Eastern Restaurant,Bakery,Clothing Store,Thai Restaurant,Indian Restaurant,Coffee Shop,Gastropub,Falafel Restaurant


### Cluster two (label 1)


In [76]:
w_merged_latlong.loc[w_merged_latlong['Cluster Labels'] == 1, w_merged_latlong.columns[[1] + list(range(5, w_merged_latlong.shape[1]))]]

Unnamed: 0,Borough,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
1,Westminster,-0.11968,1,Theater,Ice Cream Shop,Hotel,Coffee Shop,Steakhouse,Liquor Store,Bookstore,Dessert Shop,Garden,Wine Bar
4,Camden,-0.12273,1,Coffee Shop,Theater,Hotel,Bookstore,Steakhouse,Beer Bar,Pizza Place,Exhibit,Dance Studio,Tapas Restaurant
5,Westminster,-0.11968,1,Theater,Ice Cream Shop,Hotel,Coffee Shop,Steakhouse,Liquor Store,Bookstore,Dessert Shop,Garden,Wine Bar
8,Westminster,-0.11968,1,Theater,Ice Cream Shop,Hotel,Coffee Shop,Steakhouse,Liquor Store,Bookstore,Dessert Shop,Garden,Wine Bar
15,Camden,-0.12273,1,Coffee Shop,Theater,Hotel,Bookstore,Steakhouse,Beer Bar,Pizza Place,Exhibit,Dance Studio,Tapas Restaurant
17,Camden and Islington,-0.12273,1,Coffee Shop,Theater,Hotel,Bookstore,Steakhouse,Beer Bar,Pizza Place,Exhibit,Dance Studio,Tapas Restaurant
27,Camden,-0.11968,1,Theater,Ice Cream Shop,Hotel,Coffee Shop,Steakhouse,Liquor Store,Bookstore,Dessert Shop,Garden,Wine Bar
28,Camden,-0.12273,1,Coffee Shop,Theater,Hotel,Bookstore,Steakhouse,Beer Bar,Pizza Place,Exhibit,Dance Studio,Tapas Restaurant


### Cluster three (label 2)

In [77]:
w_merged_latlong.loc[w_merged_latlong['Cluster Labels'] == 2, w_merged_latlong.columns[[1] + list(range(5, w_merged_latlong.shape[1]))]]

Unnamed: 0,Borough,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
6,Westminster,-0.14816,2,Hotel,Clothing Store,Art Gallery,Steakhouse,Boutique,Cocktail Bar,Juice Bar,Coffee Shop,Indian Restaurant,Hotel Bar
10,Camden,-0.14816,2,Hotel,Clothing Store,Art Gallery,Steakhouse,Boutique,Cocktail Bar,Juice Bar,Coffee Shop,Indian Restaurant,Hotel Bar
20,Westminster,-0.14816,2,Hotel,Clothing Store,Art Gallery,Steakhouse,Boutique,Cocktail Bar,Juice Bar,Coffee Shop,Indian Restaurant,Hotel Bar
21,Westminster,-0.14816,2,Hotel,Clothing Store,Art Gallery,Steakhouse,Boutique,Cocktail Bar,Juice Bar,Coffee Shop,Indian Restaurant,Hotel Bar
26,Westminster,-0.14816,2,Hotel,Clothing Store,Art Gallery,Steakhouse,Boutique,Cocktail Bar,Juice Bar,Coffee Shop,Indian Restaurant,Hotel Bar


### Cluster four (label 3)

In [78]:
w_merged_latlong.loc[w_merged_latlong['Cluster Labels'] == 3, w_merged_latlong.columns[[1] + list(range(5, w_merged_latlong.shape[1]))]]

Unnamed: 0,Borough,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
3,Ealing,-0.26194,3,Pub,Coffee Shop,Park,Bakery,Café,Gastropub,Japanese Restaurant,Gym / Fitness Center,Indian Restaurant,Thai Restaurant
7,"Hounslow, Ealing, Hammersmith and Fulham",-0.26194,3,Pub,Coffee Shop,Park,Bakery,Café,Gastropub,Japanese Restaurant,Gym / Fitness Center,Indian Restaurant,Thai Restaurant
9,Ealing,-0.30073,3,Coffee Shop,Pub,Park,Hotel,Pizza Place,Italian Restaurant,Burger Joint,Café,Supermarket,Gym / Fitness Center
11,Hounslow,-0.26194,3,Pub,Coffee Shop,Park,Bakery,Café,Gastropub,Japanese Restaurant,Gym / Fitness Center,Indian Restaurant,Thai Restaurant
12,Hounslow,-0.26194,3,Pub,Coffee Shop,Park,Bakery,Café,Gastropub,Japanese Restaurant,Gym / Fitness Center,Indian Restaurant,Thai Restaurant
14,Ealing,-0.3363,3,Park,Pub,Train Station,Indian Restaurant,Grocery Store,Coffee Shop,Supermarket,Hotel,Gym,Café
29,Ealing,-0.31951,3,Pub,Coffee Shop,Park,Hotel,Café,Pizza Place,Italian Restaurant,Burger Joint,Supermarket,Bar


### Cluster five (label 4)

In [79]:
w_merged_latlong.loc[w_merged_latlong['Cluster Labels'] == 4, w_merged_latlong.columns[[1] + list(range(5, w_merged_latlong.shape[1]))]]

Unnamed: 0,Borough,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
2,Westminster,-0.18048,4,Hotel,Pub,Coffee Shop,Garden,Café,Gym / Fitness Center,Wine Bar,Bakery,Yoga Studio,Ice Cream Shop
16,Kensington and Chelsea,-0.19173,4,Italian Restaurant,Hotel,Garden,Gym / Fitness Center,Burger Joint,Indian Restaurant,Café,Exhibit,Bakery,Restaurant
18,Westminster,-0.19526,4,Pub,Café,Bakery,Restaurant,Italian Restaurant,Gym / Fitness Center,Pizza Place,Turkish Restaurant,Gym,Middle Eastern Restaurant
19,Westminster,-0.19526,4,Pub,Café,Bakery,Restaurant,Italian Restaurant,Gym / Fitness Center,Pizza Place,Turkish Restaurant,Gym,Middle Eastern Restaurant
22,Kensington and Chelsea,-0.21353,4,Pub,Italian Restaurant,Gym / Fitness Center,Bakery,Cocktail Bar,Restaurant,Café,Pizza Place,Park,Breakfast Spot
23,Kensington and Chelsea,-0.20639,4,Pub,Bakery,Italian Restaurant,Breakfast Spot,Pizza Place,Juice Bar,Record Shop,Restaurant,Café,Gym / Fitness Center
24,Westminster,-0.18048,4,Hotel,Pub,Coffee Shop,Garden,Café,Gym / Fitness Center,Wine Bar,Bakery,Yoga Studio,Ice Cream Shop


# Discussion

# Conclusion

In this analysis we provided data that can help with choosing a suitable place to open a new restaurant. Based on this data we can see that cluster 3 has very few restaurants. Of course, it is hard to draw any decisive conclusion from such a small amount of data, but it is a good starting point for further analysis and for the search for a location.
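
One rough way to check the restaurant observation is to count how many of a row's top-10 categories contain the word "Restaurant". The sketch below does this for two rows copied from the cluster tables above. `count_restaurant_slots` is a hypothetical helper, and matching on the category name is a simplification that leaves out places such as "Steakhouse" or "Pizza Place".

```python
def count_restaurant_slots(top_venues):
    """Count how many entries in a top-venues list name a restaurant category."""
    return sum("Restaurant" in category for category in top_venues)


# Top-10 row shared by the cluster three (label 2) entries.
cluster_three_row = [
    "Hotel", "Clothing Store", "Art Gallery", "Steakhouse", "Boutique",
    "Cocktail Bar", "Juice Bar", "Coffee Shop", "Indian Restaurant", "Hotel Bar",
]

# One Hammersmith and Fulham row from cluster one (label 0), for comparison.
cluster_one_row = [
    "Pub", "Coffee Shop", "Café", "Italian Restaurant", "Indian Restaurant",
    "Pizza Place", "Hotel", "Thai Restaurant", "Park", "Bakery",
]

three_count = count_restaurant_slots(cluster_three_row)
one_count = count_restaurant_slots(cluster_one_row)
print(three_count, one_count)
```

Applying the same count to every row of `w_merged_latlong` and averaging per cluster label would give a firmer basis for the comparison than two rows.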