# Applied Data Science Capstone

## Peer graded report

Andre Brink  
June, 2020

## Introduction

London, the capital of the United Kingdom, is one of the most expensive cities in the world, and finding an affordably priced home there is becoming increasingly challenging. In this project I attempted to find all neighborhoods in London with average house/apartment prices below £400,000 and, given 10 amenities specified by the user, to compare them with the venues within 500 meters of each location.

The target audience for this project would be anyone who is interested in moving to London without any prior knowledge of house prices and entertainment around the city.

## Data

The data used in this project comes from the following sources:  
- House price data was scraped from the following site  
https://www.kfh.co.uk/london/sold-data

- Location data was found using the Foursquare API  
https://foursquare.com/

## Table of Contents

1. Importing Libraries
2. Web scraping  
3. Defining and Cleaning Data  
3.1. Defining Dataframe from webscraping  
3.2. Remove House prices above £400,000  
4. Connect to Foursquare API
5. Location Data  
5.1. London Location Data  
5.2. Neighborhood Location Data  
6. Map of Neighborhoods  
7. Nearby Venues  
7.1. Function to find nearby venues  
7.2. Collecting Nearby Venues  
7.3. Group Venues by Neighborhood  
7.4. OneHot encoding venues  
8. Find 10 Most Common Venues by Neighborhood  
9. User's Top 10 Preferred Venues  
10. k-Means Clustering  
11. User's Ideal Cluster  
12. Create map of clusters  
13. Cluster Data  
13.1. Cluster 1 Data  
13.2. Cluster 2 Data  
13.3. Cluster 3 Data  
13.4. Cluster 4 Data  
14. Conclusion

## 1. Importing Libraries

In [None]:
import pandas as pd
import urllib.request
import numpy as np
import matplotlib.colors as colors
import matplotlib.cm as cm
import folium
import requests
import json
from bs4 import BeautifulSoup
from sklearn.cluster import KMeans
from geopy.geocoders import Nominatim
# json_normalize is exposed at the top level since pandas 1.0; the pandas.io.json path is deprecated
from pandas import json_normalize

## 2. Web Scraping

In [2]:
# URL containing the data
url = "https://www.kfh.co.uk/london/sold-data"

In [3]:
# open the url using urllib.request and put the HTML into the page variable
page = urllib.request.urlopen(url)

In [4]:
# parse the HTML from our URL into the BeautifulSoup parse tree format
soup = BeautifulSoup(page, "lxml")

In [5]:
# view html code (uncomment to view)
#BeautifulSoup.prettify(soup)

## 3. Defining and Cleaning Data
### 3.1 Defining Dataframe from webscraping

In [6]:
# use the 'find_all' function to bring back all instances of the 'table' tag in the HTML and store in 'all_tables' variable
all_tables=soup.find_all("table")
#all_tables # Uncomment to view all tables

In [7]:
tables = []
for table in soup.find_all("table"):
    tables.append(table.extract())

In [8]:
# Extract relevant data from table
area = []
avePrice = []
priceChange = []
percentChange = []

for table in tables:
    for row in table.findAll('tr'):
        cells = row.findAll('td')
        if len(cells) == 4:
            area.append(cells[0].find(text=True))
            avePrice.append(cells[1].find(text=True).replace('£','').replace(',',''))
            priceChange.append(cells[2].find(text=True).replace('£','').replace(',',''))
            percentChange.append(cells[3].find(text=True).replace('%',''))

In [9]:
# Create dataframe and add data
areaData=pd.DataFrame(area,columns=['area'])
areaData['avePrice']=avePrice
areaData['priceChange']=priceChange
areaData['percentChange']=percentChange

In [10]:
areaData = areaData[areaData['avePrice'] != 'No data']
areaData = areaData[areaData['priceChange'] != 'No data']
areaData = areaData[areaData['percentChange'] != 'No data']

In [11]:
areaData['avePrice'] = areaData['avePrice'].astype(float)
areaData['priceChange'] = areaData['priceChange'].astype(float)
areaData['percentChange'] = areaData['percentChange'].astype(float)
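
The cleaning above strips the currency symbol, thousands separators and percent sign, drops rows marked `No data`, and casts what remains to float. As a minimal standalone sketch of the same idea applied to one cell value at a time (the helper name `parse_price` and the sample strings are assumptions for illustration, not part of the notebook):

```python
from typing import Optional


def parse_price(text: str) -> Optional[float]:
    """Convert a scraped cell such as '£395,131' or '2.7%' to a float.

    Returns None for the 'No data' placeholder so the caller can drop the row.
    """
    cleaned = text.strip().replace('£', '').replace(',', '').replace('%', '')
    if cleaned == 'No data' or cleaned == '':
        return None
    return float(cleaned)


# Illustrative inputs in the same shape as the scraped table cells
samples = ['£395,131', '£10,645', '2.7%', 'No data']
parsed = [parse_price(s) for s in samples]
print(parsed)  # [395131.0, 10645.0, 2.7, None]
```

Handling the placeholder inside the parser keeps the filtering and the type conversion in one place, which may be easier to test than three separate column filters.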

### 3.2 Remove House prices above £400,000

In [12]:
areaData = areaData[areaData['avePrice'] <= 400000]

In [13]:
areaData.dtypes

area              object
avePrice         float64
priceChange      float64
percentChange    float64
dtype: object

In [14]:
areaData.shape

(32, 4)

In [15]:
areaData.head()

Unnamed: 0,area,avePrice,priceChange,percentChange
20,Enfield,395131.0,10645.0,2.7
115,Park Royal,383680.0,5201.0,1.4
122,Stonebridge,382704.0,6874.0,1.8
245,Mitcham,359114.0,4750.0,1.3
260,Putney Heath,343868.0,8346.0,2.4


## 4. Connect to Foursquare API

In [16]:
cred_df = pd.read_csv(r'C:\Courses\ddd\foursquare.csv')
C_ID = cred_df.iloc[0,0]
C_Secret = cred_df.iloc[1,0]

In [17]:
CLIENT_ID = C_ID
CLIENT_SECRET = C_Secret
VERSION = '20180604'

## 5. Location Data
### 5.1 London Location Data

In [18]:
address = 'London'

geolocator = Nominatim(user_agent="Coursera Project")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print(latitude, longitude)

51.5073219 -0.1276474


### 5.2 Neighborhood Location Data

In [19]:
areaLatitude = []
areaLongitude = []
# create the geocoder once and reuse it for every area
geolocator = Nominatim(user_agent="Coursera Project")
for area in areaData['area']:
    address = area + ',London'
    location = geolocator.geocode(address)
    # geocode() returns None when the address cannot be resolved
    if location is None:
        areaLatitude.append(np.nan)
        areaLongitude.append(np.nan)
    else:
        areaLatitude.append(location.latitude)
        areaLongitude.append(location.longitude)

In [20]:
areaData['Latitude'] = areaLatitude
areaData['Longitude'] = areaLongitude
areaData.dropna(inplace=True)

In [21]:
areaData.head()

Unnamed: 0,area,avePrice,priceChange,percentChange,Latitude,Longitude
20,Enfield,395131.0,10645.0,2.7,51.652085,-0.081018
115,Park Royal,383680.0,5201.0,1.4,51.526434,-0.283935
122,Stonebridge,382704.0,6874.0,1.8,51.54411,-0.276228
245,Mitcham,359114.0,4750.0,1.3,51.405801,-0.164079
260,Putney Heath,343868.0,8346.0,2.4,51.442842,-0.232207


## 6. Map of Neighborhoods

In [22]:
map_london = folium.Map(location=[latitude, longitude], zoom_start=10)

for lat, lng, area in zip(areaData['Latitude'],
                          areaData['Longitude'],
                          areaData['area']):
    
    label = '{}'.format(area)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_london)  
    
map_london

## 7. Nearby Venues
### 7.1 Function to find nearby venues

In [23]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
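        # make the GET request; skip this area if the response holds no venue items,
        # otherwise 'results' would be undefined or reused from the previous area
        try:
            results = requests.get(url).json()["response"]['groups'][0]['items']
        except (requests.RequestException, KeyError, IndexError, ValueError):
            print(name, 'No items added')
            continue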
            
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['area', 
                  'area Latitude', 
                  'area Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)

### 7.2 Collecting Nearby Venues

In [24]:
LIMIT = 100
london_venues = getNearbyVenues(names=areaData['area'],
                                latitudes=areaData['Latitude'],
                                longitudes=areaData['Longitude'],
                                radius = 500)

In [25]:
print(london_venues.shape)
london_venues.head()

(341, 7)


Unnamed: 0,area,area Latitude,area Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Enfield,51.652085,-0.081018,Goodlooking Optics,51.652223,-0.07949,Optical Shop
1,Enfield,51.652085,-0.081018,PizzaExpress,51.652475,-0.080832,Pizza Place
2,Enfield,51.652085,-0.081018,Waitrose & Partners,51.651602,-0.084114,Supermarket
3,Enfield,51.652085,-0.081018,Aksular Restaurant Enfield Town,51.652533,-0.080654,Turkish Restaurant
4,Enfield,51.652085,-0.081018,Enfield Town Park,51.649998,-0.083855,Park
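
The `radius=500` argument asks for venues within 500 meters of each neighborhood's coordinates. One hedged way to sanity-check a returned venue is to compute the great-circle distance between the two coordinate pairs. The sketch below uses the haversine formula with a mean Earth radius of 6,371 km; the helper name `haversine_m` is an assumption, and the notebook itself does not confirm how Foursquare measures distance.

```python
import math


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two (lat, lon) points in degrees."""
    earth_radius_m = 6371000.0  # mean Earth radius, an approximation
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * earth_radius_m * math.asin(math.sqrt(a))


# Coordinates copied from the london_venues rows shown above
enfield = (51.652085, -0.081018)
waitrose = (51.651602, -0.084114)

distance = haversine_m(*enfield, *waitrose)
print(distance <= 500)  # True
```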


### 7.3 Group Venues by Neighborhood

In [26]:
london_venues.groupby('area').count()

area,area Latitude,area Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Addiscombe,11,11,11,11,11,11
Anerley,6,6,6,6,6,6
Beddington,4,4,4,4,4,4
Bellingham,4,4,4,4,4,4
Biggin Hill,3,3,3,3,3,3
Blackfen,4,4,4,4,4,4
Catford,32,32,32,32,32,32
"Croydon, Surrey",34,34,34,34,34,34
Downham,5,5,5,5,5,5
Enfield,61,61,61,61,61,61


In [27]:
print('There are {} unique categories.'.format(len(london_venues['Venue Category'].unique())))

There are 98 unique categories.


### 7.4 OneHot encoding venues

In [28]:
london_onehot = pd.get_dummies(london_venues[['Venue Category']], prefix="", prefix_sep="")

london_onehot['area'] = london_venues['area'] 

fixed_columns = [london_onehot.columns[-1]] + list(london_onehot.columns[:-1])
london_onehot = london_onehot[fixed_columns]

In [29]:
london_onehot.shape

(341, 99)

In [30]:
london_grouped = london_onehot.groupby('area').mean().reset_index()

In [31]:
london_grouped.shape

(32, 99)
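
Because every venue contributes exactly one 1 to its one-hot row, taking the mean per area turns each category column into the share of that area's venues belonging to the category. A minimal sketch of that arithmetic with the standard library follows. The venue list is made up for illustration (11 venues, 2 of them bakeries), chosen so the result is consistent with the 0.181818 Bakery value reported for Addiscombe, which has 11 venues.

```python
from collections import Counter

# Hypothetical venue categories for one area: 11 venues, 2 of them bakeries
categories = ['Park', 'Park', 'Park', 'Grocery Store', 'Grocery Store',
              'Bakery', 'Bakery', 'Pub', 'Café', 'Forest', 'Discount Store']

counts = Counter(categories)
total = len(categories)

# Mean of a 0/1 indicator column == count of that category / number of venues
frequencies = {cat: n / total for cat, n in counts.items()}

print(round(frequencies['Bakery'], 6))  # 0.181818
```

A consequence worth keeping in mind is that each area's frequencies sum to 1, regardless of how many venues were found there.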

## 8. Find 10 Most Common Venues by Neighborhood

In [33]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

In [34]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['area']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
area_venues_sorted = pd.DataFrame(columns=columns)
area_venues_sorted['area'] = london_grouped['area']

for ind in np.arange(london_grouped.shape[0]):
    area_venues_sorted.iloc[ind, 1:] = return_most_common_venues(london_grouped.iloc[ind, :], num_top_venues)

area_venues_sorted

Unnamed: 0,area,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Addiscombe,Park,Grocery Store,Bakery,Fast Food Restaurant,Pub,Chinese Restaurant,Café,Women's Store,Forest,Discount Store
1,Anerley,Grocery Store,Park,Convenience Store,Supermarket,Fast Food Restaurant,Fried Chicken Joint,Diner,Discount Store,Eastern European Restaurant,Electronics Store
2,Beddington,Hardware Store,Indian Restaurant,Pub,Park,Gym,Grocery Store,Diner,Discount Store,Eastern European Restaurant,Electronics Store
3,Bellingham,Park,Gym,Grocery Store,Train Station,Fried Chicken Joint,Diner,Discount Store,Eastern European Restaurant,Electronics Store,Event Service
4,Biggin Hill,Airport,Airport Service,Pub,Furniture / Home Store,Discount Store,Eastern European Restaurant,Electronics Store,Event Service,Fast Food Restaurant,Fish & Chips Shop
5,Blackfen,Grocery Store,Convenience Store,Gastropub,Women's Store,Fried Chicken Joint,Discount Store,Eastern European Restaurant,Electronics Store,Event Service,Fast Food Restaurant
6,Catford,Supermarket,Grocery Store,Bus Stop,Platform,Italian Restaurant,Coffee Shop,Train Station,Theater,Sandwich Place,Pizza Place
7,"Croydon, Surrey",Clothing Store,Pub,Coffee Shop,Sushi Restaurant,Bookstore,Event Service,Caribbean Restaurant,Café,Italian Restaurant,Burger Joint
8,Downham,Park,Café,Gym / Fitness Center,Grocery Store,Women's Store,Discount Store,Eastern European Restaurant,Electronics Store,Event Service,Fast Food Restaurant
9,Enfield,Coffee Shop,Clothing Store,Café,Pub,Pharmacy,Optical Shop,Restaurant,Sandwich Place,Shopping Mall,Fish & Chips Shop
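
The ranking step sorts each area's category frequencies in descending order and labels the first ten with ordinal column names. The sketch below shows the same idea without pandas. The helpers `ordinal` and `top_categories` and the sample row are assumptions for illustration, and ties are broken alphabetically here, which may differ from the order pandas produces.

```python
def ordinal(n: int) -> str:
    """Return '1st', '2nd', '3rd', '4th', ... for a positive integer."""
    if 10 <= n % 100 <= 20:
        suffix = 'th'
    else:
        suffix = {1: 'st', 2: 'nd', 3: 'rd'}.get(n % 10, 'th')
    return f'{n}{suffix}'


def top_categories(freqs: dict, n: int) -> list:
    """Names of the n most frequent categories; ties broken alphabetically."""
    ranked = sorted(freqs.items(), key=lambda item: (-item[1], item[0]))
    return [name for name, _ in ranked[:n]]


# Hypothetical frequency row for one area
row = {'Park': 0.5, 'Gym': 0.25, 'Grocery Store': 0.25, 'Pub': 0.0}
labels = [f'{ordinal(i + 1)} Most Common Venue' for i in range(3)]
print(dict(zip(labels, top_categories(row, 3))))
```

Note that when an area has fewer distinct categories than the number of columns requested, the remaining slots are filled by categories with a frequency of zero, so the tail of a sparse area's top-10 list should be read with care.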


In [35]:
london_grouped.head()

Unnamed: 0,area,Airport,Airport Service,American Restaurant,Arts & Crafts Store,Asian Restaurant,Auto Garage,Bakery,Bar,Baseball Field,...,Stationery Store,Supermarket,Sushi Restaurant,Theater,Train Station,Tram Station,Turkish Restaurant,Video Game Store,Warehouse Store,Women's Store
0,Addiscombe,0.0,0.0,0.0,0.0,0.0,0.0,0.181818,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,Anerley,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.166667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,Beddington,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,Bellingham,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0
4,Biggin Hill,0.333333,0.333333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


## 9. User's Top 10 Preferred Venues

In [36]:
top10 = {
            "Park" : 0.2,
            "Train Station" : 0.125,
            "Pub" : 0.125,
            "Diner" : 0.1,
            "Coffee Shop" : 0.1,
            "Café" : 0.1,
            "Supermarket" : 0.075,
            "Clothing Store" : 0.75,
            "Pharmacy" : 0.05,
            "Bus Stop" : 0.05
           }

keys = list(top10.keys())
x = 0
myData1 = {}
myData1['area'] = 'MyIdeal'
for col in area_venues_sorted.columns:
    if col == 'area':
        continue
    myData1[col] = keys[x]
    x+=1
    
myTop10_sorted = pd.DataFrame(data=myData1, index=[0])
# DataFrame.append was removed in pandas 2.0; pd.concat gives the same result
area_venues_sorted = pd.concat([area_venues_sorted, myTop10_sorted], ignore_index=True)

myData2 = {}
myData2['area'] = 'MyIdeal'
for col in london_grouped.columns:
    if col == 'area':
        continue
    if col in top10:
        myData2[col] = top10[col]
    else:
        myData2[col] = 0.0

myTop10_grouped = pd.DataFrame(data=myData2, index=[0])
# DataFrame.append was removed in pandas 2.0; pd.concat gives the same result
london_grouped = pd.concat([london_grouped, myTop10_grouped], ignore_index=True)
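
Each row of `london_grouped` is a set of category frequencies that sums to 1, so a preference profile is most directly comparable to those rows when its weights also sum to 1. The sketch below checks this for the weights exactly as recorded in the cell above and shows one possible rescaling. The rescaling is an assumption for illustration and is not something the notebook does.

```python
import math

# Weights exactly as recorded in the cell above
top10 = {
    'Park': 0.2, 'Train Station': 0.125, 'Pub': 0.125, 'Diner': 0.1,
    'Coffee Shop': 0.1, 'Café': 0.1, 'Supermarket': 0.075,
    'Clothing Store': 0.75, 'Pharmacy': 0.05, 'Bus Stop': 0.05,
}

total = sum(top10.values())
print(round(total, 3))  # 1.675

# Optional rescaling so the profile sums to 1, like a row of london_grouped
normalized = {venue: weight / total for venue, weight in top10.items()}
print(math.isclose(sum(normalized.values()), 1.0))  # True
```

Since k-means is distance based, a profile whose weights sum to well above 1 may sit far from every neighborhood row, which is worth checking before interpreting the cluster it lands in.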

## 10. k-Means Clustering

In [37]:
# set number of clusters
kclusters = 4

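# keep only the numeric frequency columns (the positional axis argument was removed in pandas 2.0)
london_grouped_clustering = london_grouped.drop(columns='area')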

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(london_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10]

array([1, 1, 0, 1, 0, 1, 0, 0, 3, 0])
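
For readers unfamiliar with what `KMeans` does with the frequency rows, the following is a minimal pure-Python sketch of Lloyd's algorithm, the iteration at the core of k-means: assign each point to its nearest centroid, then move each centroid to the mean of its points. It is not scikit-learn's implementation. The starting centroids are fixed by hand here to keep the run deterministic, and the 2-D points are made up for illustration.

```python
import math


def kmeans(points, centroids, iterations=10):
    """Plain Lloyd's algorithm: assign to nearest centroid, then recompute means."""
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: index of the closest centroid for every point
        labels = [
            min(range(len(centroids)), key=lambda k: math.dist(p, centroids[k]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its assigned points
        new_centroids = []
        for k, old in enumerate(centroids):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                new_centroids.append(tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                new_centroids.append(old)  # keep an empty cluster's centroid in place
        if new_centroids == centroids:
            break  # converged: no centroid moved
        centroids = new_centroids
    return labels, centroids


# Toy 2-D data: two obvious groups (made up for illustration)
points = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (5.0, 5.0), (5.1, 4.9), (4.9, 5.2)]
labels, centroids = kmeans(points, centroids=[(0.0, 0.0), (5.0, 5.0)])
print(labels)  # [0, 0, 0, 1, 1, 1]
```

In the notebook each point is a row of category frequencies plus the `MyIdeal` profile, so the cluster that `MyIdeal` lands in depends on its distance to the other rows.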

In [38]:
# add clustering labels
area_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

london_merged = areaData

# join the sorted venue table onto the area data to add cluster labels and top venues for each neighborhood
london_merged = london_merged.join(area_venues_sorted.set_index('area'), on='area')

london_merged.head() # check the last columns!

Unnamed: 0,area,avePrice,priceChange,percentChange,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
20,Enfield,395131.0,10645.0,2.7,51.652085,-0.081018,0,Coffee Shop,Clothing Store,Café,Pub,Pharmacy,Optical Shop,Restaurant,Sandwich Place,Shopping Mall,Fish & Chips Shop
115,Park Royal,383680.0,5201.0,1.4,51.526434,-0.283935,0,Fast Food Restaurant,Hotel,Gym / Fitness Center,Movie Theater,Hookah Bar,Bowling Alley,Clothing Store,Metro Station,Portuguese Restaurant,Coffee Shop
122,Stonebridge,382704.0,6874.0,1.8,51.54411,-0.276228,0,Diner,Café,American Restaurant,Train Station,Platform,Discount Store,Eastern European Restaurant,Electronics Store,Event Service,Fast Food Restaurant
245,Mitcham,359114.0,4750.0,1.3,51.405801,-0.164079,1,Grocery Store,Supermarket,Park,Indian Restaurant,Discount Store,Fast Food Restaurant,Lake,Fried Chicken Joint,Diner,Eastern European Restaurant
260,Putney Heath,343868.0,8346.0,2.4,51.442842,-0.232207,3,Park,Baseball Field,Women's Store,Furniture / Home Store,Discount Store,Eastern European Restaurant,Electronics Store,Event Service,Fast Food Restaurant,Fish & Chips Shop


## 11. User's Ideal Cluster

In [51]:
ideal_cluster = area_venues_sorted.loc[area_venues_sorted['area'] == 'MyIdeal']['Cluster Labels']
print('Your ideal cluster is', list(ideal_cluster)[0])

Your ideal cluster is 2


## 12. Create map of clusters

In [40]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=10)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(london_merged['Latitude'],
                                  london_merged['Longitude'],
                                  london_merged['area'],
                                  london_merged['Cluster Labels']):
    
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster + 1), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

## 13. Cluster Data
### 13.1 Cluster 1 Data

In [41]:
london_merged[london_merged['Cluster Labels'] == 0]

Unnamed: 0,area,avePrice,priceChange,percentChange,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
20,Enfield,395131.0,10645.0,2.7,51.652085,-0.081018,0,Coffee Shop,Clothing Store,Café,Pub,Pharmacy,Optical Shop,Restaurant,Sandwich Place,Shopping Mall,Fish & Chips Shop
115,Park Royal,383680.0,5201.0,1.4,51.526434,-0.283935,0,Fast Food Restaurant,Hotel,Gym / Fitness Center,Movie Theater,Hookah Bar,Bowling Alley,Clothing Store,Metro Station,Portuguese Restaurant,Coffee Shop
122,Stonebridge,382704.0,6874.0,1.8,51.54411,-0.276228,0,Diner,Café,American Restaurant,Train Station,Platform,Discount Store,Eastern European Restaurant,Electronics Store,Event Service,Fast Food Restaurant
280,Thornton Heath,339672.0,6039.0,1.8,51.398871,-0.099602,0,Platform,Coffee Shop,Massage Studio,Juice Bar,Plaza,Sandwich Place,Mediterranean Restaurant,Pharmacy,Train Station,Gym Pool
308,Beddington,367562.0,14074.0,3.8,51.371988,-0.132393,0,Hardware Store,Indian Restaurant,Pub,Park,Gym,Grocery Store,Diner,Discount Store,Eastern European Restaurant,Electronics Store
313,Biggin Hill,379305.0,3829.0,1.0,51.33202,0.029026,0,Airport,Airport Service,Pub,Furniture / Home Store,Discount Store,Eastern European Restaurant,Electronics Store,Event Service,Fast Food Restaurant,Fish & Chips Shop
324,Catford,396618.0,25582.0,6.5,51.445321,-0.019753,0,Supermarket,Grocery Store,Bus Stop,Platform,Italian Restaurant,Coffee Shop,Train Station,Theater,Sandwich Place,Pizza Place
330,"Croydon, Surrey",382622.0,18187.0,4.8,51.372163,-0.10118,0,Clothing Store,Pub,Coffee Shop,Sushi Restaurant,Bookstore,Event Service,Caribbean Restaurant,Café,Italian Restaurant,Burger Joint
347,Green Street Green,341667.0,39167.0,11.5,51.350768,0.083876,0,Pub,Auto Garage,Gastropub,Gas Station,Women's Store,Fried Chicken Joint,Discount Store,Eastern European Restaurant,Electronics Store,Event Service
349,Grove Park,365924.0,28330.0,7.7,51.431897,0.021234,0,Platform,Grocery Store,Park,Indian Restaurant,Train Station,Coffee Shop,Pub,Fried Chicken Joint,Chinese Restaurant,Fish & Chips Shop


### 13.2. Cluster 2 Data

In [42]:
london_merged[london_merged['Cluster Labels'] == 1]

Unnamed: 0,area,avePrice,priceChange,percentChange,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
245,Mitcham,359114.0,4750.0,1.3,51.405801,-0.164079,1,Grocery Store,Supermarket,Park,Indian Restaurant,Discount Store,Fast Food Restaurant,Lake,Fried Chicken Joint,Diner,Eastern European Restaurant
302,Addiscombe,393435.0,20442.0,5.2,51.379692,-0.074282,1,Park,Grocery Store,Bakery,Fast Food Restaurant,Pub,Chinese Restaurant,Café,Women's Store,Forest,Discount Store
303,Anerley,374880.0,1996.0,0.5,51.407599,-0.061939,1,Grocery Store,Park,Convenience Store,Supermarket,Fast Food Restaurant,Fried Chicken Joint,Diner,Discount Store,Eastern European Restaurant,Electronics Store
310,Bellingham,333988.0,4646.0,1.4,51.431081,-0.024515,1,Park,Gym,Grocery Store,Train Station,Fried Chicken Joint,Diner,Discount Store,Eastern European Restaurant,Electronics Store,Event Service
315,Blackfen,377763.0,33489.0,8.9,51.450541,0.103062,1,Grocery Store,Convenience Store,Gastropub,Women's Store,Fried Chicken Joint,Discount Store,Eastern European Restaurant,Electronics Store,Event Service,Fast Food Restaurant
345,Forestdale,311157.0,5121.0,1.6,51.351328,-0.038756,1,Grocery Store,Indian Restaurant,Tram Station,Fish & Chips Shop,Women's Store,Furniture / Home Store,Discount Store,Eastern European Restaurant,Electronics Store,Event Service
368,New Addington,308612.0,913.0,0.3,51.342541,-0.016292,1,Convenience Store,Tram Station,Grocery Store,Gas Station,Women's Store,Fried Chicken Joint,Discount Store,Eastern European Restaurant,Electronics Store,Event Service
383,Plumstead,338503.0,5076.0,1.5,51.480463,0.092429,1,Social Club,Convenience Store,Kebab Restaurant,Lake,Park,Fried Chicken Joint,Discount Store,Eastern European Restaurant,Electronics Store,Event Service


### 13.3. Cluster 3 Data

In [43]:
london_merged[london_merged['Cluster Labels'] == 2]

Unnamed: 0,area,avePrice,priceChange,percentChange,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue


### 13.4. Cluster 4 Data

In [44]:
london_merged[london_merged['Cluster Labels'] == 3]

Unnamed: 0,area,avePrice,priceChange,percentChange,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
260,Putney Heath,343868.0,8346.0,2.4,51.442842,-0.232207,3,Park,Baseball Field,Women's Store,Furniture / Home Store,Discount Store,Eastern European Restaurant,Electronics Store,Event Service,Fast Food Restaurant,Fish & Chips Shop
336,Downham,324172.0,4882.0,1.5,51.426111,0.006457,3,Park,Café,Gym / Fitness Center,Grocery Store,Women's Store,Discount Store,Eastern European Restaurant,Electronics Store,Event Service,Fast Food Restaurant
417,Woodside,374902.0,23755.0,6.3,51.387077,-0.065331,3,Park,Tram Station,Chinese Restaurant,Fried Chicken Joint,Diner,Discount Store,Eastern European Restaurant,Electronics Store,Event Service,Fast Food Restaurant
418,Woolwich,362331.0,28947.0,8.0,51.48267,0.062334,3,Park,Pet Store,Child Care Service,Bus Stop,Grocery Store,Greek Restaurant,Diner,Discount Store,Eastern European Restaurant,Electronics Store


## 14. Conclusion

Based on the analysis, I think cluster 2, as determined by the model, is the best choice of cluster for the user's ideal neighborhood. A deeper analysis of the neighborhoods themselves is needed to find out which of the 8 neighborhoods in cluster 2 would be the best choice.