# Business Problem section

## Background

According to Bloomberg News, the London housing market is in poor condition. It now faces several headwinds, including the prospect of higher taxes, hidden price falls, record-low sales and an exodus of homebuilders.

## Business Problem

In this situation, it is crucial to adopt analytics techniques to assist homebuyers in London in making wise and effective decisions. The business problem is therefore: how can we help homebuyers and investors purchase a suitable house or property in London in this uncertain economic scenario?


## Data section

Data on London properties and the corresponding prices paid were extracted from the HM Land Registry Price Paid Data. The address fields included in the Price Paid Data are: Postcode; PAON (Primary Addressable Object Name, typically the house number or name); SAON (Secondary Addressable Object Name, present when there is a sub-building, for example when the building is divided into flats); Street; Locality; Town/City; District; and County.

To explore and target recommended locations according to the presence of amenities and essential facilities, we access venue data through the Foursquare API and arrange it as a dataframe for visualization. By merging the HM Land Registry price paid data with the Foursquare data on the amenities and essential facilities surrounding each property, we aim to recommend profitable real estate investments.
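
The Price Paid Data file has no header row, so each field has to be addressed by position or by a name we supply. The following is a minimal sketch, using only the standard library, of reading such rows by name. The column names are the ones this notebook assigns later, the two sample rows are copied from the output shown further down, and `read_price_paid` is a hypothetical helper rather than part of any Land Registry tooling.

```python
import csv
import io
from datetime import datetime

# Column names as assigned later in this notebook (the raw file has no header)
COLUMNS = ['TUID', 'Price', 'Date_Transfer', 'Postcode', 'Property_Type',
           'Old_New', 'Duration', 'PAON', 'SAON', 'Street', 'Locality',
           'Town_City', 'District', 'County', 'PPD_Cat_Type', 'Record_Status']

# Two sample rows copied from the notebook output
SAMPLE = (
    "{79A74E21-C934-1289-E053-6B04A8C01627},177000,2018-09-21 00:00,LE4 6EE,"
    "S,N,F,201,,BELPER STREET,,LEICESTER,LEICESTER,LEICESTER,A,A\n"
    "{79A74E21-C935-1289-E053-6B04A8C01627},90000,2018-10-01 00:00,LE18 2AE,"
    "F,N,L,27,,ELIZABETH COURT,,WIGSTON,OADBY AND WIGSTON,LEICESTERSHIRE,A,A\n"
)

def read_price_paid(handle):
    """Read header-less Price Paid rows into dicts with typed price and date."""
    rows = []
    for record in csv.DictReader(handle, fieldnames=COLUMNS):
        record['Price'] = int(record['Price'])
        record['Date_Transfer'] = datetime.strptime(record['Date_Transfer'], '%Y-%m-%d %H:%M')
        rows.append(record)
    return rows

rows = read_price_paid(io.StringIO(SAMPLE))
for row in rows:
    print(row['Street'], row['Town_City'], row['Price'], row['Date_Transfer'].date())
```

An empty SAON, as in both sample rows, simply comes back as an empty string.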

## Methodology section

The Methodology section describes the main components of our analysis and prediction system. It consists of the following stages:

   1. Collect Inspection Data
   2. Explore and Understand Data
   3. Data Wrangling and preprocessing
   4. Modeling


## 1. Collect Inspection Data

After importing the necessary libraries, we download the data from the HM Land Registry website as follows:

In [1]:
import numpy as np
import pandas as pd
import os
import datetime as dt
import json

# to get latitude and longitude from address
!conda install -c conda-forge geopy --yes
from geopy.geocoders import Nominatim 

# library to handle requests
import requests 
from pandas.io.json import json_normalize # transform JSON file into a pandas dataframe

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

#import library for visualization
!conda install -c conda-forge folium=0.5.0 --yes
import folium


In [2]:
# Reading the data and creating a DataFrame
Land_data = pd.read_csv("http://prod2.publicdata.landregistry.gov.uk.s3-website-eu-west-1.amazonaws.com/pp-2018.csv",header = None)

## 2. Explore and Understand Data
We read the dataset collected from the Land Registry website into a pandas data frame and display its first ten rows:

In [3]:
Land_data.head(10)

Unnamed: 0,0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15
0,{79A74E21-C934-1289-E053-6B04A8C01627},177000,2018-09-21 00:00,LE4 6EE,S,N,F,201,,BELPER STREET,,LEICESTER,LEICESTER,LEICESTER,A,A
1,{79A74E21-C935-1289-E053-6B04A8C01627},90000,2018-10-01 00:00,LE18 2AE,F,N,L,27,,ELIZABETH COURT,,WIGSTON,OADBY AND WIGSTON,LEICESTERSHIRE,A,A
2,{79A74E21-C936-1289-E053-6B04A8C01627},375000,2018-10-04 00:00,LE11 3HG,D,N,F,6,,GOLDFINCH CLOSE,,LOUGHBOROUGH,CHARNWOOD,LEICESTERSHIRE,A,A
3,{79A74E21-C937-1289-E053-6B04A8C01627},142500,2018-10-08 00:00,LE3 6UY,S,N,F,19,,PINEHURST CLOSE,,LEICESTER,LEICESTER,LEICESTER,A,A
4,{79A74E21-C938-1289-E053-6B04A8C01627},157500,2018-10-22 00:00,LE13 0JH,S,N,F,103,,WEST AVENUE,,MELTON MOWBRAY,MELTON,LEICESTERSHIRE,A,A
5,{79A74E21-C939-1289-E053-6B04A8C01627},192500,2018-10-02 00:00,LE12 7UT,T,N,F,26,,MELODY DRIVE,SILEBY,LOUGHBOROUGH,CHARNWOOD,LEICESTERSHIRE,A,A
6,{79A74E21-C93A-1289-E053-6B04A8C01627},290000,2018-10-22 00:00,LE13 0SZ,D,N,F,35,,CAVALRY CLOSE,,MELTON MOWBRAY,MELTON,LEICESTERSHIRE,A,A
7,{79A74E21-C93C-1289-E053-6B04A8C01627},130000,2018-10-10 00:00,LE11 5AN,F,N,L,39,,BARNSDALE CLOSE,,LOUGHBOROUGH,CHARNWOOD,LEICESTERSHIRE,B,A
8,{79A74E21-C93D-1289-E053-6B04A8C01627},177000,2018-10-12 00:00,LE3 2GJ,T,N,F,5,,FRANKSON AVENUE,,LEICESTER,BLABY,LEICESTERSHIRE,A,A
9,{79A74E21-C93E-1289-E053-6B04A8C01627},200000,2018-08-31 00:00,LE18 3QF,S,N,F,7,,YARWELL DRIVE,,WIGSTON,OADBY AND WIGSTON,LEICESTERSHIRE,A,A


In [4]:
Land_data.shape

(1017357, 16)

## 3. Data Wrangling and preprocessing

At this stage, we prepare our dataset for the modeling process. Accordingly, we perform the following steps:

   1. Rename the columns
   2. Format the date column
   3. Sort data by date of sale
   4. Select data only for the city of London
   5. Make a list of street names in London
   6. Calculate the street-wise average price of the property
   7. Read the street-wise coordinates into a data frame, eliminating the recurring word "London" from individual names
   8. Join the data to find the coordinates of locations which fit into the client's budget
   9. Plot recommended locations on a London map along with current market prices

In [5]:
# Assign column names
Land_data.columns = ['TUID', 'Price', 'Date_Transfer', 'Postcode', 'Property_Type', 'Old_New', 'Duration', 'PAON', 'SAON', 'Street', 'Locality', 'Town_City', 'District', 'County', 'PPD_Cat_Type', 'Record_Status']

In [6]:
# Convert the Date_Transfer column to datetime objects (vectorized, faster than a row-wise apply)
Land_data['Date_Transfer'] = pd.to_datetime(Land_data['Date_Transfer'])
Land_data.head()

Unnamed: 0,TUID,Price,Date_Transfer,Postcode,Property_Type,Old_New,Duration,PAON,SAON,Street,Locality,Town_City,District,County,PPD_Cat_Type,Record_Status
0,{79A74E21-C934-1289-E053-6B04A8C01627},177000,2018-09-21,LE4 6EE,S,N,F,201,,BELPER STREET,,LEICESTER,LEICESTER,LEICESTER,A,A
1,{79A74E21-C935-1289-E053-6B04A8C01627},90000,2018-10-01,LE18 2AE,F,N,L,27,,ELIZABETH COURT,,WIGSTON,OADBY AND WIGSTON,LEICESTERSHIRE,A,A
2,{79A74E21-C936-1289-E053-6B04A8C01627},375000,2018-10-04,LE11 3HG,D,N,F,6,,GOLDFINCH CLOSE,,LOUGHBOROUGH,CHARNWOOD,LEICESTERSHIRE,A,A
3,{79A74E21-C937-1289-E053-6B04A8C01627},142500,2018-10-08,LE3 6UY,S,N,F,19,,PINEHURST CLOSE,,LEICESTER,LEICESTER,LEICESTER,A,A
4,{79A74E21-C938-1289-E053-6B04A8C01627},157500,2018-10-22,LE13 0JH,S,N,F,103,,WEST AVENUE,,MELTON MOWBRAY,MELTON,LEICESTERSHIRE,A,A


In [7]:
# Drop the transactions that were made before 2016
Land_data.drop(Land_data[Land_data.Date_Transfer.dt.year<2016].index,inplace = True)


#Sorting by date
Land_data.sort_values(by=['Date_Transfer'],ascending=[False],inplace=True)
Land_data.head(5)

Unnamed: 0,TUID,Price,Date_Transfer,Postcode,Property_Type,Old_New,Duration,PAON,SAON,Street,Locality,Town_City,District,County,PPD_Cat_Type,Record_Status
307197,{85866A65-7FAA-143F-E053-6B04A8C06A15},289950,2018-12-31,LS27 8YF,D,Y,F,20,,BEDALE DRIVE,MORLEY,LEEDS,LEEDS,WEST YORKSHIRE,A,A
1011451,{8355F009-D307-55C5-E053-6B04A8C0D090},4600000,2018-12-31,OX2 9PH,O,N,F,171 - 173,,CUMNOR HILL,CUMNOR,OXFORD,VALE OF WHITE HORSE,OXFORDSHIRE,B,A
295703,{85866A65-8CC8-143F-E053-6B04A8C06A15},72500,2018-12-31,DL14 7NP,F,N,L,27A,,MARKET PLACE,,BISHOP AUCKLAND,COUNTY DURHAM,COUNTY DURHAM,B,A
149607,{8355F008-F025-55C5-E053-6B04A8C0D090},152000,2018-12-31,RM15 6AU,F,N,L,55,,CLAYBURN GARDENS,,SOUTH OCKENDON,THURROCK,THURROCK,A,A
954610,{80E1AA98-9BDC-7BF8-E053-6C04A8C00BF2},135000,2018-12-31,BS24 7DS,F,N,L,8,,WISLEY WALK,,WESTON-SUPER-MARE,NORTH SOMERSET,NORTH SOMERSET,A,A


In [8]:
# Filter the dataset to the rows whose Town_City is LONDON
Land_data.columns = [column.replace(" ","_") for column in Land_data.columns] # Replace any space in a column name with _
Land_data_london = Land_data.query("Town_City == 'LONDON'")
Land_data_london.head()

Unnamed: 0,TUID,Price,Date_Transfer,Postcode,Property_Type,Old_New,Duration,PAON,SAON,Street,Locality,Town_City,District,County,PPD_Cat_Type,Record_Status
303618,{85866A65-8EF7-143F-E053-6B04A8C06A15},671837,2018-12-31,N1 7JL,F,N,L,"ANGEL WHARF, 164",FLAT 7,SHEPHERDESS WALK,,LONDON,HACKNEY,GREATER LONDON,B,A
154358,{8355F009-40A8-55C5-E053-6B04A8C0D090},987972,2018-12-31,N1C 4PF,F,Y,L,98,FLAT 66,CAMLEY STREET,,LONDON,CAMDEN,GREATER LONDON,A,A
154357,{8355F009-40A7-55C5-E053-6B04A8C0D090},1032852,2018-12-31,N1C 4PF,F,Y,L,98,FLAT 69,CAMLEY STREET,,LONDON,CAMDEN,GREATER LONDON,A,A
154354,{8355F009-40A4-55C5-E053-6B04A8C0D090},1070000,2018-12-31,N1C 4PF,F,Y,L,98,FLAT 44,CAMLEY STREET,,LONDON,CAMDEN,GREATER LONDON,A,A
170512,{8355F009-6070-55C5-E053-6B04A8C0D090},370000,2018-12-31,SE25 6TX,T,N,F,13,,BROSTER GARDENS,,LONDON,CROYDON,GREATER LONDON,A,A


In [9]:
# Make a list of street names in LONDON
streets = Land_data_london['Street'].unique().tolist()
streets

['SHEPHERDESS WALK',
 'CAMLEY STREET',
 'BROSTER GARDENS',
 'ST JOSEPHS STREET',
 'HERMITAGE ROAD',
 'SYLVAN HILL',
 'VICTORIA STREET',
 'DERWENT ROAD',
 'THREE OAK LANE',
 'QUICKS ROAD',
 'GRAND DRIVE',
 'HANDYSIDE STREET',
 'TRINITY CRESCENT',
 'CHINGFORD MOUNT ROAD',
 'LANCASTER GATE',
 'ST GEORGES SQUARE',
 'APPLE YARD',
 'HYDE ESTATE ROAD',
 'MICHLEHAM DOWN',
 'RITHERDON ROAD',
 'WENTWORTH STREET',
 'BUCKINGHAM PALACE ROAD',
 'EXCHANGE GARDENS',
 'CLIFFORD STREET',
 'HIGHWOOD HILL',
 'HOWARD ROAD',
 'BLACKFRIARS ROAD',
 "ST JAMES'S PLACE",
 'HONOUR LEA AVENUE',
 'TURNHAM GREEN TERRACE',
 'ARCHWAY ROAD',
 'COLLENDALE ROAD',
 'PALACE COURT',
 'RYMILL STREET',
 'HARTFIELD ROAD',
 'SHAFTESBURY AVENUE',
 'ILDERTON ROAD',
 'KENSINGTON HIGH STREET',
 'KNATCHBULL ROAD',
 'LYNMOUTH ROAD',
 'LONDESBOROUGH ROAD',
 'OVEX CLOSE',
 'LAVENDER HILL',
 'CRIMSCOTT STREET',
 'BOUNDFIELD ROAD',
 'OLD KENT ROAD',
 'KENNINGTON ROAD',
 'ALDRINGTON ROAD',
 'WYKE ROAD',
 'DERNY AVENUE',
 'VILLIERS GARDENS

In [10]:
# Creating a New dataframe with Average price street wise
Land_data_street = Land_data_london.groupby(['Street'])['Price'].mean().reset_index()
Land_data_street = Land_data_street.rename(columns={'Price':'Avg_Price'})
Land_data_street.head()

Unnamed: 0,Street,Avg_Price
0,AARON HILL ROAD,295000.0
1,ABBERLEY MEWS,500000.0
2,ABBESS CLOSE,265000.0
3,ABBEVILLE ROAD,1365460.0
4,ABBEY GARDENS,2767950.0


In [11]:
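# User's budget, i.e. lower and upper limit: find the streets in Land_data_street whose average price fits it
# .copy() gives an independent frame, so columns can be added to it later without a SettingWithCopyWarning
Land_affordable = Land_data_street.query("(Avg_Price >= 1900000) & (Avg_Price <= 2800000)").copy()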
Land_affordable

Unnamed: 0,Street,Avg_Price
195,ALBION SQUARE,2.450000e+06
390,ANHALT ROAD,2.435000e+06
405,ANSDELL TERRACE,2.250000e+06
420,APPLEGARTH ROAD,2.400000e+06
698,AYLESTONE AVENUE,2.286667e+06
851,BARONSMEAD ROAD,2.375000e+06
975,BEAUCLERC ROAD,2.480000e+06
1096,BELVEDERE DRIVE,2.340000e+06
1209,BICKENHALL STREET,2.208500e+06
1247,BIRCHLANDS AVENUE,2.217000e+06
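
The budget limits above are hard-coded into the query string. As a hedged sketch of how the same filter could take the limits as parameters, the function below wraps it; `streets_within_budget` is a hypothetical helper, and the four toy rows reuse street averages shown in the outputs above.

```python
import pandas as pd

def streets_within_budget(street_prices, lower, upper):
    """Return rows whose Avg_Price lies in [lower, upper], cheapest first."""
    if lower > upper:
        raise ValueError("lower must not exceed upper")
    mask = street_prices["Avg_Price"].between(lower, upper)  # inclusive on both ends
    return street_prices[mask].sort_values("Avg_Price").copy()

# Toy frame reusing values from the notebook outputs
demo = pd.DataFrame({
    "Street": ["AARON HILL ROAD", "ABBEVILLE ROAD", "ABBEY GARDENS", "ALBION SQUARE"],
    "Avg_Price": [295000.0, 1365460.0, 2767950.0, 2450000.0],
})

affordable = streets_within_budget(demo, 1900000, 2800000)
print(affordable)
```

Returning a copy keeps later column assignments from touching the original frame.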


In [12]:
from geopy.geocoders import Nominatim 
from geopy.distance import geodesic # replaces vincenty, which was removed in geopy 2.0
# import k-means for the clustering stage
from sklearn.cluster import KMeans

In [13]:
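geolocator = Nominatim(user_agent="london_housing_explorer") # arbitrary application name; geopy requires a custom user_agent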



In [21]:
Land_affordable.head()

Unnamed: 0,Street,Avg_Price,Lattiude,Longitude
195,ALBION SQUARE,2450000.0,,
390,ANHALT ROAD,2435000.0,,
405,ANSDELL TERRACE,2250000.0,,
420,APPLEGARTH ROAD,2400000.0,,
698,AYLESTONE AVENUE,2286667.0,,


In [22]:
# Print the Street column once (looping over the rows would print the whole column on every pass)
print(Land_affordable['Street'])

195              ALBION SQUARE
390                ANHALT ROAD
405            ANSDELL TERRACE
420            APPLEGARTH ROAD
698           AYLESTONE AVENUE
851            BARONSMEAD ROAD
975             BEAUCLERC ROAD
1096           BELVEDERE DRIVE
1209         BICKENHALL STREET
1247         BIRCHLANDS AVENUE
1546            BRAMPTON GROVE
1625         BRIARDALE GARDENS
1790                  BROOKWAY
1906              BURBAGE ROAD
1972                 BURY WALK
2059           CALLCOTT STREET
2120         CAMPDEN HILL ROAD
2127              CAMPION ROAD
2149             CANNING PLACE
2216             CARLISLE ROAD
2221           CARLTON GARDENS
2233             CARLYLE COURT
2396            CHALCOT SQUARE
2474              CHARLES LANE
2552          CHELSEA CRESCENT
2596       CHESTER CLOSE NORTH
2627              CHEYNE COURT
2630                CHEYNE ROW
2675             CHISWICK MALL
2797          CLARENDON STREET
                 ...          
10878     RUSSELL GARDENS MEWS
11125   

In [58]:
loc = geolocator.geocode("Anhalt road",timeout =None)
print(loc.latitude,loc.longitude)

51.4803265 -0.1667607




In [None]:
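# geocode() returns None when a street is not found, so guard before reading the coordinates
Land_affordable['city_coord'] = Land_affordable['Street'].apply(geolocator.geocode).apply(lambda x: (x.latitude, x.longitude) if x is not None else (None, None))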
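
Geocoding a bare street name can match a street of the same name outside London, and the public Nominatim service asks clients to stay at or below one request per second. The sketch below shows one way to handle both. `geocode_streets`, the `", London, UK"` suffix and the `pause` parameter are assumptions made for illustration, and a stub stands in for the geocoder so that the example runs offline.

```python
import time
from collections import namedtuple

def geocode_streets(streets, geocode_fn, suffix=", London, UK", pause=0.0):
    """Map each street name to (lat, lon), or (None, None) when no match is found."""
    coords = {}
    for street in streets:
        location = geocode_fn(street + suffix)  # the suffix narrows the search to London
        if location is None:
            coords[street] = (None, None)
        else:
            coords[street] = (location.latitude, location.longitude)
        time.sleep(pause)  # throttle calls to a shared geocoding service
    return coords

# Offline stand-in for a geopy result and geocoder (hypothetical)
Location = namedtuple("Location", ["latitude", "longitude"])

def fake_geocode(query):
    known = {"ANHALT ROAD, London, UK": Location(51.4803265, -0.1667607)}
    return known.get(query)

result = geocode_streets(["ANHALT ROAD", "NO SUCH STREET"], fake_geocode)
print(result)
```

With a real geocoder, the call would look like `geocode_streets(Land_affordable['Street'], geolocator.geocode, pause=1.0)`.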

In [42]:
from bs4 import BeautifulSoup # to parse the data returned from the website 
import requests as rs # to handle requests and query the page
import pandas as pd # to create a dataframe
!conda install -c anaconda lxml --yes


In [43]:
link = 'https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M'
# NOTE: this page lists Toronto (Canada) postal codes, not London ones

# query the page and parse its first table into a dataframe
source = rs.get(link).text
soup = BeautifulSoup(source,'lxml')
table = soup.table
df_Neigh= pd.read_html(str(table), header=0)[0]

# Drop rows whose Borough is "Not assigned"
df_Neigh = df_Neigh[df_Neigh['Borough'] != "Not assigned"].copy()

# Where the Neighbourhood is "Not assigned", fall back to the Borough name
# (assigning to the row yielded by iterrows() would not modify the dataframe)
mask = df_Neigh['Neighbourhood'] == "Not assigned"
df_Neigh.loc[mask, 'Neighbourhood'] = df_Neigh.loc[mask, 'Borough']

# Combine the neighbourhoods that share a postcode into one comma-separated row
df_Neigh = df_Neigh.groupby(['Postcode','Borough'])['Neighbourhood'].apply(','.join)
df_Neigh = pd.DataFrame(df_Neigh).reset_index()

In [47]:
!wget -q -O 'Geospatial_Coordinates.csv' http://cocl.us/Geospatial_data

In [62]:
coordinates_df.head(5)

Unnamed: 0,Postal Code,Latitude,Longitude
0,M1B,43.806686,-79.194353
1,M1C,43.784535,-79.160497
2,M1E,43.763573,-79.188711
3,M1G,43.770992,-79.216917
4,M1H,43.773136,-79.239476


In [61]:
coordinates_df = pd.read_csv('Geospatial_Coordinates.csv')
# assign() returns a new dataframe, so the result must be stored back in df_Neigh
df_Neigh = df_Neigh.assign(Latitude = coordinates_df['Latitude'],Longitude = coordinates_df['Longitude'])
df_Neigh.shape

(103, 5)
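
The cell above attaches coordinates by row position, which is only correct while both frames hold the same postcodes in the same order. A hedged alternative is to join on the postcode key itself, sketched below on toy rows. The column names follow the two frames above, the coordinates are the ones printed for M1B and M1C, and the frames are deliberately in different orders.

```python
import pandas as pd

# Toy frames; note the different row order
neigh = pd.DataFrame({
    "Postcode": ["M1C", "M1B"],
    "Borough": ["Scarborough", "Scarborough"],
})
coords = pd.DataFrame({
    "Postal Code": ["M1B", "M1C"],
    "Latitude": [43.806686, 43.784535],
    "Longitude": [-79.194353, -79.160497],
})

# Join on the key instead of relying on row position;
# validate raises if a postcode appears more than once on either side
merged = neigh.merge(
    coords,
    left_on="Postcode",
    right_on="Postal Code",
    how="left",
    validate="one_to_one",
).drop(columns="Postal Code")

print(merged)
```

With `how="left"`, a postcode missing from the coordinates file yields NaN coordinates instead of silently receiving another row's values.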

In [50]:
address = 'London, UK'

geolocator = Nominatim(user_agent="london_housing_explorer") # arbitrary application name; geopy requires a custom user_agent
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of London are {}, {}.'.format(latitude, longitude))

The geographical coordinates of London are 51.5073219, -0.1276474.




In [None]:
# create map of London using latitude and longitude values
map_london = folium.Map(location=[latitude, longitude], zoom_start=11)

# add markers to map
for lat, lng, price, street in zip(df_Neigh['Latitude'], df_Neigh['Longitude'], Land_affordable['Avg_Price'], Land_affordable['Street']):
    label = '{}, {}'.format(street, price)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_london)  
    
map_london

In [55]:
#Define Foursquare Credentials and Version

CLIENT_ID = 'YOUR_CLIENT_ID' # Foursquare ID (placeholder; real credentials should not be published)
CLIENT_SECRET = 'YOUR_CLIENT_SECRET' # Foursquare Secret (placeholder)
VERSION = '20181206' # Foursquare API version

print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET: ' + CLIENT_SECRET)

Your credentials:
CLIENT_ID: YOUR_CLIENT_ID
CLIENT_SECRET: YOUR_CLIENT_SECRET


We can now proceed to the modeling phase. We will analyze neighborhoods to recommend properties in which home buyers can invest. We will then recommend promising locations according to the amenities and essential facilities surrounding them, e.g. elementary schools, high schools, hospitals and grocery stores.

## 4. Modeling
After exploring the dataset and gaining insights into it, we are ready to use clustering to analyze the properties. We use the k-means clustering technique because it is fast and computationally cheap, and because it can simply be re-run as the London real estate market changes. Note that k-means requires the number of clusters to be chosen in advance.

In [56]:
def getNearbyVenues(names, latitudes, longitudes, radius=500, LIMIT=100):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Street', 
                  'Street Latitude', 
                  'Street Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)
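
The list comprehension in `getNearbyVenues` indexes `v['venue']['categories'][0]`, which raises an `IndexError` for a venue with no category, and `["response"]['groups'][0]` fails on an empty response. The sketch below separates the parsing step and makes it tolerant of both cases. `parse_explore_items` and the `"Unknown"` label are assumptions, and the payload is a hand-written stand-in shaped like the structure the function above indexes.

```python
def parse_explore_items(street, lat, lng, payload):
    """Flatten one explore-style JSON payload into a list of row tuples."""
    groups = payload.get("response", {}).get("groups", [])
    items = groups[0].get("items", []) if groups else []
    rows = []
    for item in items:
        venue = item["venue"]
        categories = venue.get("categories") or []
        # "Unknown" is an assumed placeholder label for venues without a category
        category = categories[0]["name"] if categories else "Unknown"
        rows.append((street, lat, lng, venue["name"],
                     venue["location"]["lat"], venue["location"]["lng"], category))
    return rows

# Hypothetical payload mirroring the keys used in getNearbyVenues
fake_payload = {"response": {"groups": [{"items": [
    {"venue": {"name": "Cafe A", "location": {"lat": 51.48, "lng": -0.17},
               "categories": [{"name": "Coffee Shop"}]}},
    {"venue": {"name": "Mystery Spot", "location": {"lat": 51.49, "lng": -0.16},
               "categories": []}},
]}]}}

rows = parse_explore_items("ANHALT ROAD", 51.4803265, -0.1667607, fake_payload)
for row in rows:
    print(row)
```

Keeping the parsing in a pure function also lets it be tested without calling the API.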

In [None]:
# Run the above function on each location and create a new dataframe called location_venues and display it.
location_venues = getNearbyVenues(names=Land_affordable['Street'],
                                   latitudes=df_Neigh['Latitude'],
                                   longitudes=df_Neigh['Longitude']
                                  )

In [None]:
# get the List of Unique Categories
print('There are {} unique categories.'.format(len(location_venues['Venue Category'].unique())))

In [None]:
# one hot encoding
venues_onehot = pd.get_dummies(location_venues[['Venue Category']], prefix="", prefix_sep="")

# add street column back to dataframe
venues_onehot['Street'] = location_venues['Street'] 

# move street column to the first column
fixed_columns = [venues_onehot.columns[-1]] + list(venues_onehot.columns[:-1])

#fixed_columns
venues_onehot = venues_onehot[fixed_columns]

venues_onehot.head()

In [None]:
london_grouped = venues_onehot.groupby('Street').mean().reset_index()
london_grouped
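
Taking the mean of one-hot columns per street yields, for each category, the share of that street's venues falling in it. The toy example below makes this concrete with the same `get_dummies` and `groupby` calls; the venue rows are invented for illustration only.

```python
import pandas as pd

# Invented venue rows: four venues near one street, two near another
venues = pd.DataFrame({
    "Street": ["ALBION SQUARE"] * 4 + ["ANHALT ROAD"] * 2,
    "Venue Category": ["Pub", "Pub", "Park", "Cafe", "Park", "Park"],
})

# One column per category, with a 1/True where the venue belongs to it
onehot = pd.get_dummies(venues[["Venue Category"]], prefix="", prefix_sep="")
onehot.insert(0, "Street", venues["Street"])

# The per-street mean of each indicator column is the category's relative frequency
grouped = onehot.groupby("Street").mean().reset_index()
print(grouped)
```

Each row of `grouped` sums to 1 across the category columns, so streets with many venues and streets with few are put on the same scale before clustering.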

In [None]:
# Find the top 5 venues/facilities near each candidate street
num_top_venues = 5

for hood in london_grouped['Street']:
    print("----"+hood+"----")
    temp = london_grouped[london_grouped['Street'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

In [None]:
#a function to return the most common venues nearby real estate investments
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

In [None]:
num_top_venues = 10
indicators = ['st', 'nd', 'rd']
# create columns according to number of top venues
columns = ['Street']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        columns.append('{}th Most Common Venue'.format(ind+1))
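
The cell above relies on an exception to switch from the 'st', 'nd', 'rd' suffixes to 'th', which is enough for ten columns. As a possible alternative, a small helper can compute the suffix directly and also handle numbers such as 11 or 21; `ordinal` is a hypothetical helper, not part of the original notebook.

```python
def ordinal(n):
    """Return n with its English ordinal suffix, e.g. 1 -> '1st', 11 -> '11th'."""
    if 10 <= n % 100 <= 20:
        suffix = "th"  # 11th, 12th, 13th and the rest of the teens
    else:
        suffix = {1: "st", 2: "nd", 3: "rd"}.get(n % 10, "th")
    return "{}{}".format(n, suffix)

num_top_venues = 10
columns = ["Street"] + [
    "{} Most Common Venue".format(ordinal(i)) for i in range(1, num_top_venues + 1)
]
print(columns)
```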

In [None]:
# create a new dataframe
venues_sorted = pd.DataFrame(columns=columns)
venues_sorted['Street'] = london_grouped['Street']

for ind in np.arange(london_grouped.shape[0]):
    venues_sorted.iloc[ind, 1:] = return_most_common_venues(london_grouped.iloc[ind, :], num_top_venues)



## Setting the number of clusters

In [None]:
# set number of clusters
kclusters = 5

london_grouped_clustering = london_grouped.drop('Street', axis=1)

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(london_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:50]
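
The value `kclusters = 5` is set by hand. One common heuristic for choosing it is the elbow method: run k-means for several values of k and look for the point where the within-cluster sum of squares stops dropping sharply. The sketch below illustrates the idea with a small plain-Python version of Lloyd's algorithm on invented 2-D points. It is a stand-in for illustration, not scikit-learn's implementation, and `kmeans_inertia` is a hypothetical helper.

```python
import random

def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans_inertia(points, k, iterations=50, seed=0):
    """Run Lloyd's algorithm and return the final within-cluster sum of squares."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initialise with k distinct data points
    for _ in range(iterations):
        # Assignment step: attach every point to its nearest centroid
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: squared_distance(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its members
        new_centroids = []
        for i, members in enumerate(clusters):
            if members:
                new_centroids.append(tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                new_centroids.append(centroids[i])  # keep the centroid of an empty cluster
        if new_centroids == centroids:
            break  # converged
        centroids = new_centroids
    return sum(min(squared_distance(p, c) for c in centroids) for p in points)

# Invented points forming three well-separated groups
points = [(0, 0), (0, 1), (1, 0),
          (10, 10), (10, 11), (11, 10),
          (20, 0), (20, 1), (21, 0)]

inertias = {k: kmeans_inertia(points, k) for k in range(1, 6)}
for k, value in inertias.items():
    print(k, round(value, 2))
```

With scikit-learn, the same quantity is available as the `inertia_` attribute of a fitted `KMeans` object, so the loop could be run directly on `london_grouped_clustering`.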



In [None]:
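#Dataframe to include Clusters
# assumption: `df` was meant to hold one row per street with its coordinates, in the same (sorted) order as london_grouped
coord_cols = {'Street Latitude': 'Latitude', 'Street Longitude': 'Longitude'}
london_grouped_clustering = location_venues[['Street', *coord_cols]].drop_duplicates('Street').rename(columns=coord_cols).sort_values('Street').reset_index(drop=True)
london_grouped_clustering.head()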

In [None]:
# add clustering labels
london_grouped_clustering['Cluster Labels'] = kmeans.labels_

# merge london_grouped with london_data to add latitude/longitude for each neighborhood
london_grouped_clustering = london_grouped_clustering.join(venues_sorted.set_index('Street'), on='Street')

london_grouped_clustering.head(30) # check the last columns!



In [None]:
# Create Map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i+x+(i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(london_grouped_clustering['Latitude'], london_grouped_clustering['Longitude'], london_grouped_clustering['Street'], london_grouped_clustering['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters
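
In the cell above the `ys` list is only used for its length, and the markers are coloured with `rainbow[cluster-1]`, so cluster 0 wraps around to the last colour. A simpler scheme, sketched here with the standard library only, builds one colour per cluster and indexes it by the label itself. `cluster_colours` is a hypothetical helper, the saturation and brightness values are arbitrary choices, and the labels are invented.

```python
import colorsys

def cluster_colours(n_clusters):
    """Return n_clusters evenly spaced hues as '#rrggbb' strings."""
    colours = []
    for i in range(n_clusters):
        r, g, b = colorsys.hsv_to_rgb(i / n_clusters, 0.8, 0.9)
        colours.append("#{:02x}{:02x}{:02x}".format(int(r * 255), int(g * 255), int(b * 255)))
    return colours

palette = cluster_colours(5)

# Invented cluster labels; each label indexes the palette directly
labels = [0, 3, 1, 4, 2, 0]
marker_colours = [palette[label] for label in labels]
print(palette)
print(marker_colours)
```

The resulting strings can be passed as `color` and `fill_color` to `folium.CircleMarker`.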

## Results and Discussion section

In spite of the fact that the London housing market may be in a bad situation, it remains an "evergreen" for business.

We may discuss our results under two main perspectives.

First, we may examine them by neighborhood or area of London. West London (Notting Hill, Kensington, Chelsea, Marylebone) and North-West London (Hampstead) might be considered highly profitable places to purchase property, given the amenities and essential facilities that surround them, e.g. elementary schools, high schools, hospitals and grocery stores. Interestingly, however, South-West London (Wandsworth, Balham) and North London (Islington) are emerging as the next elite areas, with a wide range of amenities and facilities. Accordingly, one might target under-priced properties in these areas of London as an investment.

Second, we may analyze our results according to the five clusters we produced. Even though all clusters boast a good range of facilities and amenities, we found two main patterns. The first pattern, Clusters 0, 2 and 4, may suit home buyers who prefer to live in 'green' areas with parks and waterfronts. The second pattern, Clusters 1 and 3, may instead suit individuals who love pubs, theatres and soccer.

## Summary

To sum up, according to Bloomberg News, the London housing market is in a bad state. It now faces several headwinds, including the prospect of higher taxes and a warning from the Bank of England that U.K. home values could fall as much as 30 percent in the event of a disorderly exit from the European Union. In this scenario, it is urgent to adopt machine learning tools to assist homebuyers in London in making wise and effective decisions. The business problem we posed was therefore: how could we support homebuyers in purchasing suitable real estate in London in this uncertain economic and financial scenario?

To solve this business problem, we clustered London neighborhoods in order to recommend locations, together with the current average price of real estate, where homebuyers can invest. We recommended promising locations according to the amenities and essential facilities surrounding them, e.g. elementary schools, high schools, hospitals and grocery stores.

First, we gathered data on London properties and the corresponding prices paid from the HM Land Registry. Moreover, to explore and target recommended locations according to the presence of amenities and essential facilities, we accessed venue data through the Foursquare API and arranged it as a data frame for visualization. By merging the HM Land Registry price paid data with the Foursquare data on the amenities and essential facilities surrounding each property, we were able to recommend profitable real estate investments.

Second, the Methodology section comprised four stages:

   1. Collect Inspection Data
   2. Explore and Understand Data
   3. Data preparation and preprocessing
   4. Modeling

In particular, in the modeling section, we used the k-means clustering technique because it is fast and computationally cheap, and because it can simply be re-run as the London real estate market changes.

Finally, we drew the conclusion that even though the London housing market may be in a bad state, it remains an "evergreen" for business. We discussed our results under two main perspectives. First, we examined them by neighborhood or area of London. Although West London (Notting Hill, Kensington, Chelsea, Marylebone) and North-West London (Hampstead) might be considered highly profitable places to purchase property, given the amenities and essential facilities that surround them, e.g. elementary schools, high schools, hospitals and grocery stores, South-West London (Wandsworth, Balham) and North London (Islington) are emerging as the next elite areas, with a wide range of amenities and facilities. Accordingly, one might target under-priced properties in these areas of London as an investment. Second, we analyzed our results according to the five clusters we produced. While Clusters 0, 2 and 4 may suit home buyers who prefer to live in 'green' areas with parks and waterfronts, Clusters 1 and 3 may suit individuals who love pubs, theatres and soccer.