According to Bloomberg News, the London housing market is in a rut. It faces several headwinds, including the prospect of higher taxes and a warning from the Bank of England that U.K. home values could fall by as much as 30 percent in the event of a disorderly exit from the European Union. More specifically, four overlooked cracks suggest that the London market may be in worse shape than many realize: hidden price falls, record-low sales, a homebuilder exodus, and tax hikes aimed at overseas buyers of homes in England and Wales.

### Business Problem
In this scenario, machine learning tools can help homebuyers in London make wise and effective decisions. The business problem we pose is therefore: how can we support homebuyers in purchasing suitable real estate in London in this uncertain economic and financial climate?

To address this business problem, we cluster London neighborhoods in order to recommend locations, together with the current average price of real estate, where homebuyers can invest. The recommendations take into account the amenities and essential facilities surrounding each location, i.e. elementary schools, high schools, hospitals, and grocery stores.

### Data section
Data on London properties and the prices paid for them were extracted from HM Land Registry (http://landregistry.data.gov.uk/). The following fields comprise the address data included in Price Paid Data:

- Postcode
- PAON (Primary Addressable Object Name): typically the house number or name
- SAON (Secondary Addressable Object Name): present where there is a sub-building, for example when the building is divided into flats
- Street
- Locality
- Town/City
- District
- County

To explore and target recommended locations according to the presence of amenities and essential facilities, we access data through the Foursquare API and arrange them in a dataframe for visualization. By merging the HM Land Registry price paid data for London properties with the Foursquare data on the amenities and essential facilities surrounding those properties, we will be able to recommend profitable real estate investments.
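As a rough illustration of the merge described above, the sketch below joins a per-street price table with a per-street venue count. The street names, prices, and the `Venue_Count` column are made-up stand-ins, not values from the Land Registry or Foursquare.

```python
import pandas as pd

# Toy stand-in for average prices per street (illustrative values only)
prices = pd.DataFrame({
    "Street": ["STREET A", "STREET B", "STREET C"],
    "Avg_Price": [2250000.0, 2400000.0, 2300000.0],
})

# Toy stand-in for venues found near each street
venues = pd.DataFrame({
    "Street": ["STREET A", "STREET A", "STREET B"],
    "Venue Category": ["Grocery Store", "Pub", "Hospital"],
})

# Count venues per street, then attach the counts to the price table.
venue_counts = venues.groupby("Street").size().reset_index(name="Venue_Count")
merged = prices.merge(venue_counts, on="Street", how="left")

# Streets with no venues get NaN from the left join; treat that as zero.
merged["Venue_Count"] = merged["Venue_Count"].fillna(0).astype(int)
print(merged)
```

A left join keeps every street from the price table, so streets without any nearby venue stay visible instead of silently dropping out.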


In [None]:
import os # Operating System
import numpy as np
import pandas as pd
import datetime as dt # Datetime
import json # library to handle JSON files

!conda install -c conda-forge geopy --yes
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

import requests # library to handle requests
from pandas import json_normalize # transform a JSON file into a pandas dataframe

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

import folium # map rendering library, needed for the maps drawn later

In [2]:
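# The Price Paid Data CSV ships without a header row; header=None keeps the first transaction as data
df_ppd = pd.read_csv("http://prod2.publicdata.landregistry.gov.uk.s3-website-eu-west-1.amazonaws.com/pp-2018.csv", header=None)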

In [4]:
# Assign meaningful column names
df_ppd.columns = ['TUID', 'Price', 'Date_Transfer', 'Postcode', 'Prop_Type', 'Old_New', 'Duration', 'PAON', \
                  'SAON', 'Street', 'Locality', 'Town_City', 'District', 'County', 'PPD_Cat_Type', 'Record_Status']

# Format the date column
df_ppd['Date_Transfer'] = pd.to_datetime(df_ppd['Date_Transfer'])

# Delete all obsolete transactions which were done before 2016
df_ppd.drop(df_ppd[df_ppd.Date_Transfer.dt.year < 2016].index, inplace=True)

# Sort by date of sale, most recent first
df_ppd.sort_values(by='Date_Transfer', ascending=False, inplace=True)

df_ppd_london = df_ppd.query("Town_City == 'LONDON'")

# Make a list of street names in LONDON
streets = df_ppd_london['Street'].unique().tolist()

df_grp_price = df_ppd_london.groupby(['Street'])['Price'].mean().reset_index()

# Give meaningful names to the columns
df_grp_price.columns = ['Street', 'Avg_Price']

# Set the lower and upper limits of your budget to find the streets in df_grp_price that fit it
df_affordable = df_grp_price.query("(Avg_Price >= 2200000) & (Avg_Price <= 2500000)")

# Display the dataframe
df_affordable

Unnamed: 0,Street,Avg_Price
196,ALBION SQUARE,2.450000e+06
390,ANHALT ROAD,2.435000e+06
405,ANSDELL TERRACE,2.250000e+06
422,APPLEGARTH ROAD,2.400000e+06
855,BARONSMEAD ROAD,2.375000e+06
981,BEAUCLERC ROAD,2.480000e+06
1102,BELVEDERE DRIVE,2.340000e+06
1215,BICKENHALL STREET,2.208500e+06
1253,BIRCHLANDS AVENUE,2.217000e+06
1553,BRAMPTON GROVE,2.456875e+06
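
The budget filter above hard-codes its limits in the query string. A minimal sketch of a reusable version follows; `filter_by_budget` is a hypothetical helper and `EXAMPLE ROAD` is an invented row, neither is part of the original notebook.

```python
import pandas as pd

def filter_by_budget(df_prices, lower, upper):
    """Return the streets whose average price lies in [lower, upper] (hypothetical helper)."""
    mask = df_prices["Avg_Price"].between(lower, upper)
    # .copy() returns an independent frame, so adding columns to the result
    # later does not raise pandas' SettingWithCopyWarning.
    return df_prices.loc[mask].copy()

sample = pd.DataFrame({
    "Street": ["ALBION SQUARE", "ANSDELL TERRACE", "EXAMPLE ROAD"],
    "Avg_Price": [2450000.0, 2250000.0, 900000.0],
})
affordable = filter_by_budget(sample, 2200000, 2500000)
print(affordable)
```

Returning a copy is a design choice: the later cells assign new columns to the filtered frame, which is the pattern that produces the copy-of-a-slice warnings shown in the cell outputs.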


In [5]:
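# The user_agent string is an arbitrary placeholder; geopy asks for an application-specific value
geolocator = Nominatim(user_agent="london_real_estate_explorer")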


In [6]:
df_affordable['city_coord'] = df_affordable['Street'].apply(geolocator.geocode).apply(lambda x: (x.latitude, x.longitude))


A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  if __name__ == '__main__':
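
The geocoding call above sends the bare street name and assumes a result always comes back. The table printed for `In [25]` lists ALBION SQUARE at latitude -41.27, which suggests some names were matched outside London. One possible refinement, sketched below, appends a city suffix to the query and tolerates missing results. `geocode_street`, `FakeLocation`, and `fake_geocode` are hypothetical names; the fake geocoder only lets the example run offline.

```python
def geocode_street(street, geocode, suffix=", London, UK"):
    """Return (lat, lon) for a street, or (None, None) when nothing is found.

    `geocode` is any callable that takes a query string and returns either
    None or an object with .latitude and .longitude attributes.
    """
    location = geocode(street + suffix)
    if location is None:
        return (None, None)
    return (location.latitude, location.longitude)


# Offline stand-in for a real geocoder, used only to demonstrate the helper.
class FakeLocation:
    def __init__(self, latitude, longitude):
        self.latitude = latitude
        self.longitude = longitude


def fake_geocode(query):
    known = {"ANHALT ROAD, London, UK": FakeLocation(51.480316, -0.166801)}
    return known.get(query)


found = geocode_street("ANHALT ROAD", fake_geocode)
missing = geocode_street("NO SUCH STREET", fake_geocode)
print(found, missing)
```

In the notebook, `geolocator.geocode` would be passed in place of `fake_geocode`, and rows with `None` coordinates could be dropped before mapping.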


In [7]:

df_affordable[['Latitude', 'Longitude']] = df_affordable['city_coord'].apply(pd.Series)

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  self[k1] = value[k2]


In [8]:
df = df_affordable.drop(columns=['city_coord'])

In [9]:
address = 'London, UK'

# The user_agent string is an arbitrary placeholder
geolocator = Nominatim(user_agent="london_real_estate_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of London are {}, {}.'.format(latitude, longitude))


The geographical coordinates of London are 51.5073219, -0.1276474.


In [10]:
# create map of London using latitude and longitude values
map_london = folium.Map(location=[latitude, longitude], zoom_start=11)

# add markers to map
for lat, lng, price, street in zip(df['Latitude'], df['Longitude'], df['Avg_Price'], df['Street']):
    label = '{}, {}'.format(street, price)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_london)  
    
map_london

In [11]:
# Read the Foursquare credentials from environment variables instead of hard-coding them.
# The variable names below are placeholders; use whichever names your environment defines.
CLIENT_ID = os.environ['FOURSQUARE_CLIENT_ID'] # Foursquare ID
CLIENT_SECRET = os.environ['FOURSQUARE_CLIENT_SECRET'] # Foursquare Secret
VERSION = '20200619' # Foursquare API version


In [12]:
def getNearbyVenues(names, latitudes, longitudes, radius=500, LIMIT=100):
    """Query the Foursquare explore endpoint around each location and collect the nearby venues."""
    venues_list = []
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)

        # build the API request
        url = 'https://api.foursquare.com/v2/venues/explore'
        params = {
            'client_id': CLIENT_ID,
            'client_secret': CLIENT_SECRET,
            'v': VERSION,
            'll': '{},{}'.format(lat, lng),
            'radius': radius,
            'limit': LIMIT,
        }

        # make the GET request
        results = requests.get(url, params=params).json()["response"]['groups'][0]['items']

        # keep only the relevant information for each nearby venue
        venues_list.append([(
            name,
            lat,
            lng,
            v['venue']['name'],
            v['venue']['location']['lat'],
            v['venue']['location']['lng'],
            v['venue']['categories'][0]['name']) for v in results])

    # flatten the per-location lists into a single dataframe
    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Street',
                  'Street Latitude',
                  'Street Longitude',
                  'Venue',
                  'Venue Latitude',
                  'Venue Longitude',
                  'Venue Category']

    return nearby_venues
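
The function above indexes `categories[0]` directly, which would raise an error for a venue that has no category. A more defensive way to flatten the response items is sketched below. `extract_venue_rows` and the sample items are hypothetical; the samples are hand-written to mirror only the fields the function reads, not a full Foursquare response.

```python
def extract_venue_rows(street, lat, lng, items):
    """Flatten Foursquare-style 'items' into tuples, tolerating missing fields."""
    rows = []
    for item in items:
        venue = item.get("venue", {})
        location = venue.get("location", {})
        categories = venue.get("categories") or []
        # Fall back to None when a venue carries no category at all.
        category = categories[0]["name"] if categories else None
        rows.append((street, lat, lng, venue.get("name"),
                     location.get("lat"), location.get("lng"), category))
    return rows


# Hand-written sample shaped like the fields read above (not real API output).
sample_items = [
    {"venue": {"name": "Cafe One",
               "location": {"lat": 51.5, "lng": -0.12},
               "categories": [{"name": "Café"}]}},
    {"venue": {"name": "Unlabelled Place",
               "location": {"lat": 51.51, "lng": -0.13},
               "categories": []}},
]
rows = extract_venue_rows("EXAMPLE STREET", 51.5, -0.12, sample_items)
for row in rows:
    print(row)
```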

In [13]:
# Run the above function on each location and create a new dataframe called location_venues and display it.
location_venues = getNearbyVenues(names=df['Street'],
                                   latitudes=df['Latitude'],
                                   longitudes=df['Longitude']
                                  )

ALBION SQUARE
ANHALT ROAD
ANSDELL TERRACE
APPLEGARTH ROAD
BARONSMEAD ROAD
BEAUCLERC ROAD
BELVEDERE DRIVE
BICKENHALL STREET
BIRCHLANDS AVENUE
BRAMPTON GROVE
BRIARDALE GARDENS
BROOKWAY
BURBAGE ROAD
BURY WALK
CALLCOTT STREET
CAMPDEN HILL ROAD
CAMPION ROAD
CANNING PLACE
CARLISLE ROAD
CARLTON GARDENS
CARLYLE COURT
CHALCOT SQUARE
CHARLES LANE
CHELSEA CRESCENT
CHESTER CLOSE NORTH
CHEYNE COURT
CHEYNE ROW
CHISWICK MALL
CITY ROAD
CLARENDON STREET
CLONCURRY STREET
COLBECK MEWS
COLLEGE CRESCENT
CORNWALL TERRACE MEWS
COURT LANE GARDENS
CRESCENT GROVE
DALEBURY ROAD
DEWHURST ROAD
DORIA ROAD
DOWNSHIRE HILL
DUCHESS WALK
ECCLESTON SQUARE MEWS
EGBERT STREET
EGERTON PLACE
ELM PARK ROAD
FLORAL STREET
FRANK DIXON WAY
FULTON MEWS
GERARD ROAD
GERRARD ROAD
GIRDLERS ROAD
GLOUCESTER CRESCENT
GORDON PLACE
GRAFTON SQUARE
GRAHAM TERRACE
HARMAN DRIVE
HARRIS STREET
HAVANNAH STREET
HAZLEWELL ROAD
HEREFORD MEWS
HERONDALE AVENUE
HIGHGATE HIGH STREET
HIGHWOOD HILL
HILLGATE PLACE
HOLLYCROFT AVENUE
HOLLYWOOD MEWS
HONEYWELL

In [14]:
location_venues.groupby('Street').count()

Street,Street Latitude,Street Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
ALBION SQUARE,26,26,26,26,26,26
ANHALT ROAD,16,16,16,16,16,16
ANSDELL TERRACE,49,49,49,49,49,49
APPLEGARTH ROAD,5,5,5,5,5,5
BARONSMEAD ROAD,15,15,15,15,15,15
BEAUCLERC ROAD,4,4,4,4,4,4
BELVEDERE DRIVE,13,13,13,13,13,13
BICKENHALL STREET,68,68,68,68,68,68
BIRCHLANDS AVENUE,11,11,11,11,11,11
BRAMPTON GROVE,3,3,3,3,3,3


In [15]:
# one hot encoding
venues_onehot = pd.get_dummies(location_venues[['Venue Category']], prefix="", prefix_sep="")

# add street column back to dataframe
venues_onehot['Street'] = location_venues['Street'] 

# move street column to the first column
fixed_columns = [venues_onehot.columns[-1]] + list(venues_onehot.columns[:-1])

#fixed_columns
venues_onehot = venues_onehot[fixed_columns]

venues_onehot.head()

Unnamed: 0,Street,Accessories Store,Adult Boutique,Afghan Restaurant,African Restaurant,American Restaurant,Antique Shop,Arcade,Argentinian Restaurant,Art Gallery,...,Vietnamese Restaurant,Warehouse Store,Waterfront,Weight Loss Center,Wine Bar,Wine Shop,Women's Store,Yoga Studio,Zoo,Zoo Exhibit
0,ALBION SQUARE,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,ALBION SQUARE,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,ALBION SQUARE,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,ALBION SQUARE,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,ALBION SQUARE,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [16]:
london_grouped = venues_onehot.groupby('Street').mean().reset_index()
london_grouped

Unnamed: 0,Street,Accessories Store,Adult Boutique,Afghan Restaurant,African Restaurant,American Restaurant,Antique Shop,Arcade,Argentinian Restaurant,Art Gallery,...,Vietnamese Restaurant,Warehouse Store,Waterfront,Weight Loss Center,Wine Bar,Wine Shop,Women's Store,Yoga Studio,Zoo,Zoo Exhibit
0,ALBION SQUARE,0.000000,0.0,0.0,0.0,0.000000,0.0,0.000000,0.000000,0.038462,...,0.000000,0.0,0.000000,0.0,0.000000,0.000000,0.000000,0.000000,0.000000,0.0
1,ANHALT ROAD,0.000000,0.0,0.0,0.0,0.000000,0.0,0.000000,0.000000,0.000000,...,0.000000,0.0,0.000000,0.0,0.000000,0.000000,0.000000,0.000000,0.000000,0.0
2,ANSDELL TERRACE,0.000000,0.0,0.0,0.0,0.000000,0.0,0.000000,0.000000,0.020408,...,0.000000,0.0,0.000000,0.0,0.000000,0.000000,0.000000,0.000000,0.000000,0.0
3,APPLEGARTH ROAD,0.000000,0.0,0.0,0.0,0.000000,0.0,0.000000,0.000000,0.000000,...,0.000000,0.0,0.000000,0.0,0.000000,0.000000,0.000000,0.000000,0.000000,0.0
4,BARONSMEAD ROAD,0.000000,0.0,0.0,0.0,0.000000,0.0,0.000000,0.000000,0.000000,...,0.000000,0.0,0.000000,0.0,0.000000,0.000000,0.000000,0.000000,0.000000,0.0
5,BEAUCLERC ROAD,0.000000,0.0,0.0,0.0,0.000000,0.0,0.000000,0.000000,0.000000,...,0.000000,0.0,0.000000,0.0,0.000000,0.000000,0.000000,0.000000,0.000000,0.0
6,BELVEDERE DRIVE,0.000000,0.0,0.0,0.0,0.000000,0.0,0.000000,0.000000,0.000000,...,0.000000,0.0,0.000000,0.0,0.000000,0.000000,0.000000,0.000000,0.000000,0.0
7,BICKENHALL STREET,0.000000,0.0,0.0,0.0,0.000000,0.0,0.000000,0.000000,0.000000,...,0.014706,0.0,0.000000,0.0,0.014706,0.000000,0.014706,0.014706,0.000000,0.0
8,BIRCHLANDS AVENUE,0.000000,0.0,0.0,0.0,0.000000,0.0,0.000000,0.000000,0.000000,...,0.000000,0.0,0.000000,0.0,0.000000,0.000000,0.000000,0.000000,0.000000,0.0
9,BRAMPTON GROVE,0.000000,0.0,0.0,0.0,0.000000,0.0,0.000000,0.000000,0.000000,...,0.000000,0.0,0.000000,0.0,0.000000,0.000000,0.000000,0.000000,0.000000,0.0
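
To make the two steps above concrete, the toy example below one-hot encodes a handful of venue categories and averages them per street. Each resulting value is the share of a street's venues that fall in that category. The street names and categories are invented for illustration.

```python
import pandas as pd

toy = pd.DataFrame({
    "Street": ["A ROAD", "A ROAD", "A ROAD", "A ROAD", "B ROAD", "B ROAD"],
    "Venue Category": ["Pub", "Pub", "Café", "Park", "Pub", "Hotel"],
})

# One column per category, 1 where the venue belongs to it
onehot = pd.get_dummies(toy["Venue Category"], dtype=int)
onehot.insert(0, "Street", toy["Street"])

# The mean of 0/1 indicators per street is the category's relative frequency
freq = onehot.groupby("Street").mean()
print(freq)
```

Because every venue has exactly one category here, each row of `freq` sums to 1. A street with only a few venues therefore gets large frequencies such as 0.5 or 1.0 from very little evidence.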


In [17]:
london_grouped.shape

(152, 354)

In [18]:
# What are the top 5 venues/facilities near profitable real estate investments?

num_top_venues = 5

for hood in london_grouped['Street']:
    print("----"+hood+"----")
    temp = london_grouped[london_grouped['Street'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----ALBION SQUARE----
               venue  freq
0               Café  0.19
1                Pub  0.08
2        Coffee Shop  0.08
3  Indian Restaurant  0.08
4         Restaurant  0.08


----ANHALT ROAD----
                venue  freq
0                 Pub  0.25
1       Grocery Store  0.12
2   French Restaurant  0.12
3  English Restaurant  0.06
4               Diner  0.06


----ANSDELL TERRACE----
            venue  freq
0       Juice Bar  0.08
1  Clothing Store  0.08
2      Restaurant  0.08
3           Hotel  0.06
4             Pub  0.06


----APPLEGARTH ROAD----
                  venue  freq
0                   Bar   0.4
1                Casino   0.2
2             Nightclub   0.2
3        Sandwich Place   0.2
4  Outdoor Supply Store   0.0


----BARONSMEAD ROAD----
                 venue  freq
0    Food & Drink Shop  0.13
1  Indie Movie Theater  0.07
2      Thai Restaurant  0.07
3     Community Center  0.07
4           Restaurant  0.07


----BEAUCLERC ROAD----
               venue  fre

                venue  freq
0               Hotel  0.16
1                 Pub  0.08
2         Coffee Shop  0.07
3  Chinese Restaurant  0.05
4              Garden  0.04


----GERARD ROAD----
                        venue  freq
0           Convenience Store   0.2
1               Grocery Store   0.2
2           Fish & Chips Shop   0.2
3            Business Service   0.2
4  Construction & Landscaping   0.2


----GERRARD ROAD----
                           venue  freq
0                            Pub   0.5
1           Fast Food Restaurant   0.5
2                     Print Shop   0.0
3                           Park   0.0
4  Paper / Office Supplies Store   0.0


----GIRDLERS ROAD----
                venue  freq
0                 Pub  0.15
1  Italian Restaurant  0.06
2           Gastropub  0.06
3               Hotel  0.06
4   Convention Center  0.06


----GORDON PLACE----
               venue  freq
0         Steakhouse  0.33
1             Resort  0.33
2        Pizza Place  0.33
3  Accessories

         venue  freq
0  Video Store   0.2
1         Food   0.2
2  Pizza Place   0.2
3  Golf Course   0.2
4   Smoke Shop   0.2


----OBSERVATORY GARDENS----
            venue  freq
0            Café  0.07
1             Pub  0.05
2  Clothing Store  0.05
3       Juice Bar  0.04
4      Restaurant  0.04


----OLD COURT PLACE----
            venue  freq
0           Hotel  0.10
1       Juice Bar  0.07
2  Clothing Store  0.07
3          Garden  0.07
4             Pub  0.05


----ONSLOW MEWS WEST----
                venue  freq
0               Hotel  0.10
1  Italian Restaurant  0.06
2              Bakery  0.05
3      Sandwich Place  0.04
4              Garden  0.04


----PALACE PLACE----
                           venue  freq
0        Health & Beauty Service   1.0
1              Accessories Store   0.0
2                    Pastry Shop   0.0
3                           Park   0.0
4  Paper / Office Supplies Store   0.0


----PANTON STREET----
         venue  freq
0          Pub  0.12
1  Coffee Sh

                venue  freq
0  Italian Restaurant  0.23
1      Ice Cream Shop  0.11
2               Hotel  0.08
3                Café  0.07
4         Pizza Place  0.04


----STAFFORD TERRACE----
                  venue  freq
0           Supermarket  0.43
1                   Pub  0.14
2   Rental Car Location  0.14
3           Pizza Place  0.14
4  Fast Food Restaurant  0.14


----SUTHERLAND PLACE----
                 venue  freq
0                  Bar  0.33
1                 Park  0.33
2                Hotel  0.33
3    Accessories Store  0.00
4  Outdoor Event Space  0.00


----SYDNEY STREET----
                           venue  freq
0                            Gym   0.5
1                    Men's Store   0.5
2                        Parking   0.0
3                           Park   0.0
4  Paper / Office Supplies Store   0.0


----THAMES BANK----
                  venue  freq
0  Gym / Fitness Center  0.25
1         Grocery Store  0.25
2          Burger Joint  0.25
3           Pizza Place 

In [19]:
# Define a function to return the most common venues/facilities near real estate investments

def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

In [20]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Street']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        # positions beyond the third all take the 'th' suffix
        columns.append('{}th Most Common Venue'.format(ind+1))

In [21]:
# create a new dataframe
venues_sorted = pd.DataFrame(columns=columns)
venues_sorted['Street'] = london_grouped['Street']

for ind in np.arange(london_grouped.shape[0]):
    venues_sorted.iloc[ind, 1:] = return_most_common_venues(london_grouped.iloc[ind, :], num_top_venues)
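
When a street has fewer distinct venue categories than `num_top_venues`, the ranking above is padded with categories whose frequency is zero. A possible variant that keeps only categories actually present is sketched here. `top_nonzero_categories` is a hypothetical helper, and the row values are illustrative.

```python
import pandas as pd

def top_nonzero_categories(row, n):
    """Return up to n category names, highest frequency first, skipping zeros."""
    # The first entry is the street name; the rest are category frequencies.
    freqs = row.iloc[1:].astype(float)
    freqs = freqs[freqs > 0]
    return freqs.sort_values(ascending=False).index[:n].tolist()

row = pd.Series({"Street": "EXAMPLE STREET", "Café": 0.19, "Pub": 0.08,
                 "Park": 0.0, "Hotel": 0.04})
top = top_nonzero_categories(row, 5)
print(top)  # ['Café', 'Pub', 'Hotel']
```

With this variant a street can return fewer than `n` names, so the fixed-width `venues_sorted` columns would need padding (for example with `None`) before assignment.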

In [22]:

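# Note: from here on london_grouped refers to df (Street, Avg_Price, Latitude, Longitude),
# so the clustering below runs on price and coordinates rather than on the venue frequencies
london_grouped = df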

In [24]:
# Distribute the streets into 5 clusters
from sklearn.cluster import KMeans

# set number of clusters
kclusters = 5

london_grouped_clustering = london_grouped.drop(columns='Street')

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(london_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:50]

array([1, 3, 0, 3, 2, 1, 2, 0, 0, 1, 3, 3, 3, 1, 2, 2, 1, 3, 0, 1, 4, 4,
       3, 1, 1, 0, 3, 4, 1, 0, 3, 2, 3, 2, 2, 4, 3, 3, 2, 0, 1, 2, 4, 0,
       4, 0, 0, 4, 0, 0], dtype=int32)
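
The clustering cell above fits k-means on the unscaled `Avg_Price`, `Latitude` and `Longitude` columns. Because k-means relies on Euclidean distance, a column measured in millions will likely dominate the coordinates. One common remedy, shown here only as a sketch, is to standardize the features first. The six rows are taken from the tables in this notebook, and the choice of two clusters is arbitrary for the small sample.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

features = pd.DataFrame({
    "Avg_Price": [2250000.0, 2208500.0, 2435000.0, 2456875.0, 2375000.0, 2217000.0],
    "Latitude": [51.49989, 51.521201, 51.480316, 51.589961, 51.477315, 51.448394],
    "Longitude": [-0.189103, -0.158908, -0.166801, -0.318525, -0.239457, -0.160468],
})

# Rescale each column to zero mean and unit variance so none dominates
scaled = StandardScaler().fit_transform(features)

# n_clusters=2 is an arbitrary choice for this tiny sample
kmeans_scaled = KMeans(n_clusters=2, n_init=10, random_state=0).fit(scaled)
print(kmeans_scaled.labels_)
```

Whether price should weigh as much as location is a modelling decision. Scaling simply makes that weighting explicit instead of leaving it to the units.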

In [25]:
# Dataframe to which the cluster labels will be added

london_grouped_clustering = df
london_grouped_clustering.head()

Unnamed: 0,Street,Avg_Price,Latitude,Longitude
196,ALBION SQUARE,2450000.0,-41.273758,173.289393
390,ANHALT ROAD,2435000.0,51.480316,-0.166801
405,ANSDELL TERRACE,2250000.0,51.49989,-0.189103
422,APPLEGARTH ROAD,2400000.0,53.748654,-0.32667
855,BARONSMEAD ROAD,2375000.0,51.477315,-0.239457
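
Some coordinates in the table above (for instance latitude -41.27 for ALBION SQUARE and 53.75 for APPLEGARTH ROAD) fall well outside London. A simple sanity check, sketched below, flags rows outside a rough bounding box. The box limits are approximate values assumed for illustration, not an official boundary.

```python
import pandas as pd

# Rough bounding box for Greater London (approximate, assumed values)
LAT_MIN, LAT_MAX = 51.28, 51.70
LON_MIN, LON_MAX = -0.52, 0.34

coords = pd.DataFrame({
    "Street": ["ALBION SQUARE", "ANHALT ROAD", "ANSDELL TERRACE", "APPLEGARTH ROAD"],
    "Latitude": [-41.273758, 51.480316, 51.49989, 53.748654],
    "Longitude": [173.289393, -0.166801, -0.189103, -0.32667],
})

# True where the point lies inside the box on both axes
in_london = (coords["Latitude"].between(LAT_MIN, LAT_MAX)
             & coords["Longitude"].between(LON_MIN, LON_MAX))

print(coords[~in_london])  # rows whose geocoding result looks suspicious
```

Rows that fail the check could be geocoded again with a more specific query or excluded before the venue lookup and clustering.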


In [26]:
london_grouped_clustering['Cluster Labels'] = kmeans.labels_

# join the most common venues onto the clustered streets, which already carry price and latitude/longitude
london_grouped_clustering = london_grouped_clustering.join(venues_sorted.set_index('Street'), on='Street')

london_grouped_clustering.head(30) # check the last columns!

Unnamed: 0,Street,Avg_Price,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
196,ALBION SQUARE,2450000.0,-41.273758,173.289393,1,Café,Pub,Restaurant,Indian Restaurant,Bar,Coffee Shop,Park,New American Restaurant,Supermarket,Beer Garden
390,ANHALT ROAD,2435000.0,51.480316,-0.166801,3,Pub,Grocery Store,French Restaurant,Garden,Plaza,English Restaurant,Gym / Fitness Center,Diner,Japanese Restaurant,Cocktail Bar
405,ANSDELL TERRACE,2250000.0,51.49989,-0.189103,0,Restaurant,Juice Bar,Clothing Store,Hotel,Pub,Italian Restaurant,Bakery,Indian Restaurant,Middle Eastern Restaurant,Breakfast Spot
422,APPLEGARTH ROAD,2400000.0,53.748654,-0.32667,3,Bar,Sandwich Place,Nightclub,Casino,Zoo Exhibit,Factory,Egyptian Restaurant,Electronics Store,English Restaurant,Ethiopian Restaurant
855,BARONSMEAD ROAD,2375000.0,51.477315,-0.239457,2,Food & Drink Shop,Indie Movie Theater,Thai Restaurant,Pizza Place,Pub,Coffee Shop,Restaurant,Park,Café,Farmers Market
981,BEAUCLERC ROAD,2480000.0,30.211452,-81.617981,1,Spa,Speakeasy,Pizza Place,Automotive Shop,Farmers Market,Electronics Store,English Restaurant,Ethiopian Restaurant,Event Space,Exhibit
1102,BELVEDERE DRIVE,2340000.0,41.529211,-72.771639,2,Hotel,Pharmacy,Sandwich Place,Gas Station,Basketball Court,Food Truck,Donut Shop,Burger Joint,Video Store,Bank
1215,BICKENHALL STREET,2208500.0,51.521201,-0.158908,0,Café,Coffee Shop,Gastropub,Italian Restaurant,Pizza Place,Restaurant,Hotel,Bakery,Garden,Greek Restaurant
1253,BIRCHLANDS AVENUE,2217000.0,51.448394,-0.160468,0,Pub,French Restaurant,Brewery,Bakery,Pizza Place,Coffee Shop,Lake,Chinese Restaurant,Train Station,Factory
1553,BRAMPTON GROVE,2456875.0,51.589961,-0.318525,1,Food Service,Construction & Landscaping,Home Service,Farm,Egyptian Restaurant,Electronics Store,English Restaurant,Ethiopian Restaurant,Event Space,Exhibit


In [27]:

map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters: one color per cluster label
colors_array = cm.rainbow(np.linspace(0, 1, kclusters))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
for lat, lon, poi, cluster in zip(london_grouped_clustering['Latitude'], london_grouped_clustering['Longitude'], london_grouped_clustering['Street'], london_grouped_clustering['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster],
        fill=True,
        fill_color=rainbow[cluster],
        fill_opacity=0.7).add_to(map_clusters)

map_clusters

In [28]:
london_grouped_clustering.loc[london_grouped_clustering['Cluster Labels'] == 0, london_grouped_clustering.columns[[1] + list(range(5, london_grouped_clustering.shape[1]))]].head()

Unnamed: 0,Avg_Price,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
405,2250000.0,Restaurant,Juice Bar,Clothing Store,Hotel,Pub,Italian Restaurant,Bakery,Indian Restaurant,Middle Eastern Restaurant,Breakfast Spot
1215,2208500.0,Café,Coffee Shop,Gastropub,Italian Restaurant,Pizza Place,Restaurant,Hotel,Bakery,Garden,Greek Restaurant
1253,2217000.0,Pub,French Restaurant,Brewery,Bakery,Pizza Place,Coffee Shop,Lake,Chinese Restaurant,Train Station,Factory
2225,2200000.0,,,,,,,,,,
2638,2250000.0,Bakery,Coffee Shop,Grocery Store,Supermarket,Pharmacy,Bookstore,Restaurant,Pizza Place,Flea Market,Hotel


In [29]:
london_grouped_clustering.loc[london_grouped_clustering['Cluster Labels'] == 1, london_grouped_clustering.columns[[1] + list(range(5, london_grouped_clustering.shape[1]))]].head()

Unnamed: 0,Avg_Price,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
196,2450000.0,Café,Pub,Restaurant,Indian Restaurant,Bar,Coffee Shop,Park,New American Restaurant,Supermarket,Beer Garden
981,2480000.0,Spa,Speakeasy,Pizza Place,Automotive Shop,Farmers Market,Electronics Store,English Restaurant,Ethiopian Restaurant,Event Space,Exhibit
1553,2456875.0,Food Service,Construction & Landscaping,Home Service,Farm,Egyptian Restaurant,Electronics Store,English Restaurant,Ethiopian Restaurant,Event Space,Exhibit
1980,2492500.0,Supermarket,English Restaurant,Park,Coffee Shop,Café,Dry Cleaner,Rental Car Location,Gym,Fast Food Restaurant,Pub
2136,2461000.0,Pub,Trail,Zoo Exhibit,Farm,Egyptian Restaurant,Electronics Store,English Restaurant,Ethiopian Restaurant,Event Space,Exhibit


In [30]:
london_grouped_clustering.loc[london_grouped_clustering['Cluster Labels'] == 2, london_grouped_clustering.columns[[1] + list(range(5, london_grouped_clustering.shape[1]))]].head()

Unnamed: 0,Avg_Price,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
855,2375000.0,Food & Drink Shop,Indie Movie Theater,Thai Restaurant,Pizza Place,Pub,Coffee Shop,Restaurant,Park,Café,Farmers Market
1102,2340000.0,Hotel,Pharmacy,Sandwich Place,Gas Station,Basketball Court,Food Truck,Donut Shop,Burger Joint,Video Store,Bank
2068,2375000.0,Pub,Grocery Store,Yoga Studio,Park,Hotel,Indian Restaurant,Sushi Restaurant,Thai Restaurant,Tennis Court,Sandwich Place
2129,2379652.7,Pub,Indian Restaurant,Hostel,Coffee Shop,Grocery Store,Hotel,Bakery,Yoga Studio,English Restaurant,Gastropub
2944,2367500.0,Hotel,Pub,Garden,Coffee Shop,Italian Restaurant,Café,Bar,Mediterranean Restaurant,Chinese Restaurant,Supermarket


In [31]:
london_grouped_clustering.loc[london_grouped_clustering['Cluster Labels'] == 3, london_grouped_clustering.columns[[1] + list(range(5, london_grouped_clustering.shape[1]))]].head()

Unnamed: 0,Avg_Price,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
390,2435000.0,Pub,Grocery Store,French Restaurant,Garden,Plaza,English Restaurant,Gym / Fitness Center,Diner,Japanese Restaurant,Cocktail Bar
422,2400000.0,Bar,Sandwich Place,Nightclub,Casino,Zoo Exhibit,Factory,Egyptian Restaurant,Electronics Store,English Restaurant,Ethiopian Restaurant
1632,2397132.0,Grocery Store,Convenience Store,Gym / Fitness Center,Italian Restaurant,Coffee Shop,Farm,Egyptian Restaurant,Electronics Store,English Restaurant,Ethiopian Restaurant
1797,2400000.0,Park,Community Center,Egyptian Restaurant,Electronics Store,English Restaurant,Ethiopian Restaurant,Event Space,Exhibit,Factory,Falafel Restaurant
1914,2445000.0,Dance Studio,Athletics & Sports,Bar,Grocery Store,Construction & Landscaping,Food,Farm,Electronics Store,English Restaurant,Ethiopian Restaurant


In [32]:
london_grouped_clustering.loc[london_grouped_clustering['Cluster Labels'] == 4, london_grouped_clustering.columns[[1] + list(range(5, london_grouped_clustering.shape[1]))]].head()

Unnamed: 0,Avg_Price,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
2242,2300000.0,Farm,Soup Place,Zoo Exhibit,Egyptian Restaurant,Electronics Store,English Restaurant,Ethiopian Restaurant,Event Space,Exhibit,Factory
2406,2286679.0,Café,Pub,Italian Restaurant,Convenience Store,Bar,Coffee Shop,Park,French Restaurant,Furniture / Home Store,Breakfast Spot
2686,2287500.0,Pub,Brewery,Gym / Fitness Center,Gift Shop,Art Museum,Farm,Electronics Store,English Restaurant,Ethiopian Restaurant,Event Space
3377,2298000.0,Hotel,Zoo Exhibit,Farm,Egyptian Restaurant,Electronics Store,English Restaurant,Ethiopian Restaurant,Event Space,Exhibit,Factory
4285,2265000.0,Gym / Fitness Center,American Restaurant,Trail,Gym,Dry Cleaner,Flea Market,Exhibit,Eastern European Restaurant,Egyptian Restaurant,Electronics Store
