# The Battle of the Neighborhoods - London Breaks

## Business Problem

Background

According to Bloomberg News, the London housing market is in a rut. It currently faces several headwinds, including the prospect of higher taxes and a warning from the Bank of England that U.K. home values could fall by as much as 30 percent in the event of a disorderly exit from the European Union. More specifically, four overlooked cracks suggest that the London market may be in worse shape than many realize: hidden price falls, record-low sales, a homebuilder exodus, and tax hikes targeting overseas buyers of homes in England and Wales.

Business Problem

In this scenario, it is urgent to adopt machine learning tools to help homebuyers in London make wise and effective decisions. Accordingly, the business problem we pose is: how can we support homebuyers in purchasing a suitable property in London in this uncertain economic and financial climate?

To solve this business problem, we will cluster London neighborhoods in order to recommend venues and the current average price of real estate where homebuyers can make an investment. We will recommend profitable venues according to the amenities and essential facilities surrounding them, e.g. elementary schools, high schools, hospitals and grocery stores.

## Data

Data on London properties and the relative prices paid were extracted from the HM Land Registry (http://landregistry.data.gov.uk/). The following fields make up the address data included in Price Paid Data: Postcode; PAON (Primary Addressable Object Name), typically the house number or name; SAON (Secondary Addressable Object Name), present when there is a sub-building, for example when the building is divided into flats; Street; Locality; Town/City; District; County.

To explore and target recommended locations across different venues according to the presence of amenities and essential facilities, we will access data through the FourSquare API and arrange them as a dataframe for visualization. By merging the property and price paid data from the HM Land Registry with FourSquare data on the amenities and essential facilities surrounding those properties, we will be able to recommend profitable real estate investments.

## Methodology

The Methodology section describes the main components of our analysis and prediction system. It comprises four stages:
1. Collecting Inspection Data
2. Exploring and Understanding Data
3. Data preparation and preprocessing 
4. Modeling

1. Collecting Inspection Data

After importing the necessary libraries, we download the data from the HM Land Registry website as follows:

In [3]:
import os # Operating System
import numpy as np
import pandas as pd
import datetime as dt # Datetime
import json # library to handle JSON files

!conda install -c conda-forge geopy --yes
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

import requests # library to handle requests
from pandas.io.json import json_normalize # transform JSON file into a pandas dataframe

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

!conda install -c conda-forge folium=0.5.0 --yes
import folium # map rendering library

print('Libraries imported.')

Solving environment: done

# All requested packages already installed.

Solving environment: done

# All requested packages already installed.

Libraries imported.


In [4]:
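# Read the 2018 Price Paid Data for examination (Source: http://landregistry.data.gov.uk/)
# Note: the file has no header row, so the first record becomes the column labels (see the preview below)
df_ppd = pd.read_csv("http://prod2.publicdata.landregistry.gov.uk.s3-website-eu-west-1.amazonaws.com/pp-2018.csv")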

Now, let's explore and understand the data.

2. Explore and Understand Data

With the HM Land Registry dataset loaded into a pandas DataFrame, let's display its first five rows:

In [5]:
df_ppd.head(5)

Unnamed: 0,{7011B109-CFCA-8ED6-E053-6B04A8C075C1},280000,2018-06-04 00:00,IP4 5ES,S,N,F,3,Unnamed: 8,RANDWELL CLOSE,Unnamed: 10,IPSWICH,IPSWICH.1,SUFFOLK,A,A.1
0,{7011B109-CFCB-8ED6-E053-6B04A8C075C1},280000,2018-05-29 00:00,IP1 4BS,T,N,F,261,,NORWICH ROAD,,IPSWICH,IPSWICH,SUFFOLK,A,A
1,{7011B109-CFCC-8ED6-E053-6B04A8C075C1},170000,2018-04-27 00:00,IP4 4BH,T,N,F,31,,PARADE ROAD,,IPSWICH,IPSWICH,SUFFOLK,A,A
2,{7011B109-CFCD-8ED6-E053-6B04A8C075C1},246000,2018-05-25 00:00,IP1 6NB,S,N,F,42,,ELMCROFT ROAD,,IPSWICH,IPSWICH,SUFFOLK,A,A
3,{7011B109-CFCE-8ED6-E053-6B04A8C075C1},180000,2018-06-08 00:00,IP3 9LZ,T,N,F,48,,WYNTERTON CLOSE,,IPSWICH,IPSWICH,SUFFOLK,A,A
4,{7011B109-CFCF-8ED6-E053-6B04A8C075C1},245000,2018-05-11 00:00,IP1 4BU,T,N,F,235,,NORWICH ROAD,,IPSWICH,IPSWICH,SUFFOLK,A,A


In [6]:
df_ppd.shape

(1026571, 16)

Our dataset consists of more than one million rows (1,026,571) and 16 columns. We will now prepare and preprocess the data accordingly.
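The preview above shows the values of one transaction (for example `280000` and `IP4 5ES`) in the column-label row, which suggests the CSV has no header line. A minimal sketch of reading such a file with explicit column names follows. The two-row sample and the shortened column list are illustrative assumptions, not the real file.

```python
import io
import pandas as pd

# Two illustrative records in the Price Paid Data layout (truncated to 4 fields).
sample_csv = io.StringIO(
    "{ID-1},280000,2018-06-04 00:00,IP4 5ES\n"
    "{ID-2},170000,2018-04-27 00:00,IP4 4BH\n"
)

# Hypothetical shortened column list; the full file has 16 columns.
column_names = ["TUID", "Price", "Date_Transfer", "Postcode"]

# header=None tells pandas that the first line is data, not column labels.
df_sample = pd.read_csv(
    sample_csv,
    header=None,
    names=column_names,
    parse_dates=["Date_Transfer"],
)

print(df_sample.shape)
```

Read this way, the first record stays in the data and the later column-renaming step only changes labels.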

3. Data preparation and preprocessing

At this stage, we prepare our dataset for the modeling process, opting for the most suitable machine learning algorithm for our scope. Accordingly, we perform the following steps:

1. Rename the column names
2. Format the date column
3. Sort data by date of sale
4. Select data only for the city of London
5. Make a list of street names in London
6. Calculate the street-wise average price of the property
7. Read the street-wise coordinates into a data frame, eliminating recurring word London from individual names
8. Join the data to find the coordinates of locations which fit into client's budget
9. Plot recommended locations on London map along with current market prices

In [14]:
# Assign meaningful column names
df_ppd.columns = ['TUID', 'Price', 'Date_Transfer', 'Postcode', 'Prop_Type', 'Old_New', 'Duration', 'PAON', \
                  'SAON', 'Street', 'Locality', 'Town_City', 'District', 'County', 'PPD_Cat_Type', 'Record_Status']

In [15]:
# Format the date column
df_ppd['Date_Transfer'] = pd.to_datetime(df_ppd['Date_Transfer'])

# Drop transactions completed before 2016
df_ppd.drop(df_ppd[df_ppd.Date_Transfer.dt.year < 2016].index, inplace=True)

# Sort by Date of Sale
df_ppd.sort_values(by=['Date_Transfer'],ascending=[False],inplace=True)

In [16]:
df_ppd_london = df_ppd.query("Town_City == 'LONDON'")

# Make a list of street names in LONDON
streets = df_ppd_london['Street'].unique().tolist()

In [17]:
df_grp_price = df_ppd_london.groupby(['Street'])['Price'].mean().reset_index()

# Give meaningful names to the columns
df_grp_price.columns = ['Street', 'Avg_Price']

In [18]:
# Set your budget's lower and upper limits, then keep the streets in df_grp_price whose average price fits the budget
df_affordable = df_grp_price.query("(Avg_Price >= 2200000) & (Avg_Price <= 2500000)")

In [12]:
# Display the dataframe
df_affordable

Unnamed: 0,Street,Avg_Price
196,ALBION SQUARE,2.450000e+06
391,ANHALT ROAD,2.435000e+06
406,ANSDELL TERRACE,2.250000e+06
422,APPLEGARTH ROAD,2.400000e+06
855,BARONSMEAD ROAD,2.375000e+06
981,BEAUCLERC ROAD,2.480000e+06
1102,BELVEDERE DRIVE,2.340000e+06
1215,BICKENHALL STREET,2.208500e+06
1253,BIRCHLANDS AVENUE,2.217000e+06
1553,BRAMPTON GROVE,2.456875e+06


In [19]:
import pandas as pd
import numpy as np
import datetime as DT
import hmac
from geopy.geocoders import Nominatim
from geopy.distance import geodesic  # vincenty was removed in geopy 2.0; geodesic replaces it
# import k-means for the clustering stage
from sklearn.cluster import KMeans

In [20]:
for index, item in df_affordable.iterrows():
    print(f"index: {index}")
    print(f"item: {item}")
    print(f"item.Street only: {item.Street}")

index: 196
item: Street       ALBION SQUARE
Avg_Price         2.45e+06
Name: 196, dtype: object
item.Street only: ALBION SQUARE
index: 391
item: Street       ANHALT ROAD
Avg_Price      2.435e+06
Name: 391, dtype: object
item.Street only: ANHALT ROAD
index: 406
item: Street       ANSDELL TERRACE
Avg_Price           2.25e+06
Name: 406, dtype: object
item.Street only: ANSDELL TERRACE
index: 422
item: Street       APPLEGARTH ROAD
Avg_Price            2.4e+06
Name: 422, dtype: object
item.Street only: APPLEGARTH ROAD
index: 855
item: Street       BARONSMEAD ROAD
Avg_Price          2.375e+06
Name: 855, dtype: object
item.Street only: BARONSMEAD ROAD
index: 981
item: Street       BEAUCLERC ROAD
Avg_Price          2.48e+06
Name: 981, dtype: object
item.Street only: BEAUCLERC ROAD
index: 1102
item: Street       BELVEDERE DRIVE
Avg_Price           2.34e+06
Name: 1102, dtype: object
item.Street only: BELVEDERE DRIVE
index: 1215
item: Street       BICKENHALL STREET
Avg_Price           2.2085e+06
N

In [21]:
geolocator = Nominatim(user_agent="my-application")

In [22]:
# Work on an explicit copy so the new column is not assigned to a slice of df_grp_price
df_affordable = df_affordable.copy()
df_affordable['city_coord'] = df_affordable['Street'].apply(geolocator.geocode).apply(lambda x: (x.latitude, x.longitude))


In [23]:
df_affordable

Unnamed: 0,Street,Avg_Price,city_coord
196,ALBION SQUARE,2.450000e+06,"(-41.27375755, 173.289393239104)"
391,ANHALT ROAD,2.435000e+06,"(51.4803265, -0.1667607)"
406,ANSDELL TERRACE,2.250000e+06,"(51.4998899, -0.1891027)"
422,APPLEGARTH ROAD,2.400000e+06,"(53.7486539, -0.3266704)"
855,BARONSMEAD ROAD,2.375000e+06,"(51.4773147, -0.239457)"
981,BEAUCLERC ROAD,2.480000e+06,"(51.4995771, -0.2290331)"
1102,BELVEDERE DRIVE,2.340000e+06,"(38.201316, -84.623076)"
1215,BICKENHALL STREET,2.208500e+06,"(51.5211969, -0.1589341)"
1253,BIRCHLANDS AVENUE,2.217000e+06,"(51.4483941, -0.1604676)"
1553,BRAMPTON GROVE,2.456875e+06,"(51.5703648, -0.2833944)"
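Several coordinates in the table above fall far outside London (ALBION SQUARE at latitude -41.27, BELVEDERE DRIVE at longitude -84.62), which suggests the bare street name matched a place elsewhere in the world. One way to guard against this, sketched below, is to qualify the query string and check each result against a rough bounding box. The suffix `', London, UK'`, the helper names and the box limits are assumptions for illustration, not part of the original analysis.

```python
# Rough bounding box for Greater London (assumed limits, in degrees).
LONDON_BOUNDS = {"lat_min": 51.28, "lat_max": 51.70, "lon_min": -0.52, "lon_max": 0.34}


def build_query(street):
    """Qualify a bare street name so the geocoder searches within London."""
    return f"{street.title()}, London, UK"


def in_london(lat, lon, bounds=LONDON_BOUNDS):
    """Return True when the coordinate lies inside the assumed bounding box."""
    return (bounds["lat_min"] <= lat <= bounds["lat_max"]
            and bounds["lon_min"] <= lon <= bounds["lon_max"])


# Coordinates copied from the table above.
geocoded = {
    "ALBION SQUARE": (-41.27375755, 173.289393239104),
    "ANHALT ROAD": (51.4803265, -0.1667607),
    "APPLEGARTH ROAD": (53.7486539, -0.3266704),
    "BELVEDERE DRIVE": (38.201316, -84.623076),
}

# Streets whose geocoded position is not plausibly in London.
suspect = [street for street, (lat, lon) in geocoded.items() if not in_london(lat, lon)]
print(suspect)
print(build_query("ALBION SQUARE"))
```

The string returned by `build_query` could be passed to `geolocator.geocode` in place of the bare street name, and rows flagged as suspect could be re-geocoded or dropped before mapping.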


In [24]:
# Split the (latitude, longitude) tuples into two numeric columns
coords = pd.DataFrame(df_affordable['city_coord'].tolist(), index=df_affordable.index, columns=['Latitude', 'Longitude'])
df_affordable = df_affordable.join(coords)


In [25]:
df_affordable

Unnamed: 0,Street,Avg_Price,city_coord,Latitude,Longitude
196,ALBION SQUARE,2.450000e+06,"(-41.27375755, 173.289393239104)",-41.273758,173.289393
391,ANHALT ROAD,2.435000e+06,"(51.4803265, -0.1667607)",51.480326,-0.166761
406,ANSDELL TERRACE,2.250000e+06,"(51.4998899, -0.1891027)",51.499890,-0.189103
422,APPLEGARTH ROAD,2.400000e+06,"(53.7486539, -0.3266704)",53.748654,-0.326670
855,BARONSMEAD ROAD,2.375000e+06,"(51.4773147, -0.239457)",51.477315,-0.239457
981,BEAUCLERC ROAD,2.480000e+06,"(51.4995771, -0.2290331)",51.499577,-0.229033
1102,BELVEDERE DRIVE,2.340000e+06,"(38.201316, -84.623076)",38.201316,-84.623076
1215,BICKENHALL STREET,2.208500e+06,"(51.5211969, -0.1589341)",51.521197,-0.158934
1253,BIRCHLANDS AVENUE,2.217000e+06,"(51.4483941, -0.1604676)",51.448394,-0.160468
1553,BRAMPTON GROVE,2.456875e+06,"(51.5703648, -0.2833944)",51.570365,-0.283394


In [26]:
df = df_affordable.drop(columns=['city_coord'])

In [27]:
df

Unnamed: 0,Street,Avg_Price,Latitude,Longitude
196,ALBION SQUARE,2.450000e+06,-41.273758,173.289393
391,ANHALT ROAD,2.435000e+06,51.480326,-0.166761
406,ANSDELL TERRACE,2.250000e+06,51.499890,-0.189103
422,APPLEGARTH ROAD,2.400000e+06,53.748654,-0.326670
855,BARONSMEAD ROAD,2.375000e+06,51.477315,-0.239457
981,BEAUCLERC ROAD,2.480000e+06,51.499577,-0.229033
1102,BELVEDERE DRIVE,2.340000e+06,38.201316,-84.623076
1215,BICKENHALL STREET,2.208500e+06,51.521197,-0.158934
1253,BIRCHLANDS AVENUE,2.217000e+06,51.448394,-0.160468
1553,BRAMPTON GROVE,2.456875e+06,51.570365,-0.283394


In [28]:
address = 'London, UK'

geolocator = Nominatim(user_agent="my-application")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of London are {}, {}.'.format(latitude, longitude))

The geographical coordinates of London are 51.5073219, -0.1276474.


In [29]:
# create map of London using latitude and longitude values
map_london = folium.Map(location=[latitude, longitude], zoom_start=11)

# add markers to map
for lat, lng, price, street in zip(df['Latitude'], df['Longitude'], df['Avg_Price'], df['Street']):
    label = '{}, {}'.format(street, price)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_london)  
    
map_london

In [31]:
#Define Foursquare Credentials and Version

CLIENT_ID = 'YHE2FGAO2NLOUTHBLSCNCDBIY1Z34QBWXDZ5WPDAAVVN5GGM' # Foursquare ID
CLIENT_SECRET = 'SHKA0BZ1SJNN0UAJU0WARJTQ0RQ21IJVJQHRD1SJXGDYVRBD' # Foursquare Secret
VERSION = '20180602' # Foursquare API version

print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)

Your credentials:
CLIENT_ID: YHE2FGAO2NLOUTHBLSCNCDBIY1Z34QBWXDZ5WPDAAVVN5GGM
CLIENT_SECRET:SHKA0BZ1SJNN0UAJU0WARJTQ0RQ21IJVJQHRD1SJXGDYVRBD
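Hard-coding and printing API credentials makes them visible to anyone who can read the notebook. A hedged alternative is to read them from environment variables and to mask them in any output. The variable names `FOURSQUARE_CLIENT_ID` and `FOURSQUARE_CLIENT_SECRET` and both helper functions below are assumptions for this sketch.

```python
import os


def load_foursquare_credentials(env=os.environ):
    """Read the Foursquare ID and secret from environment variables (assumed names)."""
    try:
        return env["FOURSQUARE_CLIENT_ID"], env["FOURSQUARE_CLIENT_SECRET"]
    except KeyError as missing:
        raise RuntimeError(f"Missing environment variable: {missing}") from None


def mask(secret, visible=4):
    """Show only the first few characters so logs do not leak the full value."""
    return secret[:visible] + "*" * max(len(secret) - visible, 0)


# Demonstration with placeholder values instead of real credentials.
demo_env = {
    "FOURSQUARE_CLIENT_ID": "EXAMPLEID123",
    "FOURSQUARE_CLIENT_SECRET": "EXAMPLESECRET456",
}
client_id, client_secret = load_foursquare_credentials(demo_env)
print("CLIENT_ID:", mask(client_id))
```

In a real session the function would be called without arguments so that it reads `os.environ`.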


We can now proceed to the Modeling stage. We will analyze neighborhoods to recommend real estate where homebuyers can make an investment. We will then recommend profitable venues according to the amenities and essential facilities surrounding them, e.g. elementary schools, high schools, hospitals and grocery stores.

4. Modeling

After exploring the dataset and gaining insights into it, we are ready to use clustering to analyze real estate. We will use the k-means clustering technique because it is fast and efficient in terms of computational cost, is highly flexible in accounting for changes in the London real estate market, and is accurate.
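To make the procedure concrete, here is a minimal plain-Python sketch of the standard k-means iteration (Lloyd's algorithm): assign each point to its nearest centroid, move each centroid to the mean of its points, and repeat until the assignments stop changing. It is an illustration only, not scikit-learn's `KMeans` implementation used in the notebook. The function name, the naive initialisation from the first `k` points, and the toy data are assumptions.

```python
def kmeans(points, k, max_iter=100):
    """Cluster a list of equal-length numeric tuples into k groups."""
    centroids = [tuple(p) for p in points[:k]]  # naive initialisation (assumption)
    labels = None
    for _ in range(max_iter):
        # Assignment step: index of the nearest centroid by squared Euclidean distance.
        new_labels = [
            min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
            for p in points
        ]
        if new_labels == labels:
            break  # converged: no point changed cluster
        labels = new_labels
        # Update step: move each centroid to the mean of its assigned points.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return labels, centroids


# Toy data: two well-separated groups of 2-D points.
toy = [(0.0, 0.0), (10.0, 10.0), (0.5, 0.2), (9.5, 10.3), (0.1, 0.4)]
labels, centroids = kmeans(toy, k=2)
print(labels)
```

Each pass costs time proportional to the number of points times the number of clusters, which is why the method scales well to larger sets of streets.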

In [39]:
def getNearbyVenues(names, latitudes, longitudes, radius=500, LIMIT=100):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Street', 
                  'Street Latitude', 
                  'Street Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)
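The function above indexes straight into the JSON response, so a venue without a category, or a response without any groups, would raise an exception. A minimal sketch of a more defensive parser follows. It assumes the response layout used above (`response`, then `groups`, then `items`, then `venue`); the helper name `extract_venues` and the sample payload are hypothetical.

```python
def extract_venues(payload):
    """Return (name, lat, lng, category) tuples from an explore-style response dict."""
    groups = payload.get("response", {}).get("groups", [])
    items = groups[0].get("items", []) if groups else []
    rows = []
    for item in items:
        venue = item.get("venue", {})
        location = venue.get("location", {})
        categories = venue.get("categories") or []
        # Fall back to None when a venue carries no category.
        category = categories[0].get("name") if categories else None
        rows.append((venue.get("name"), location.get("lat"), location.get("lng"), category))
    return rows


# Hypothetical payload mimicking the structure parsed in getNearbyVenues.
sample = {"response": {"groups": [{"items": [
    {"venue": {"name": "Venue A", "location": {"lat": 51.5, "lng": -0.1},
               "categories": [{"name": "Pub"}]}},
    {"venue": {"name": "Venue B", "location": {"lat": 51.6, "lng": -0.2},
               "categories": []}},
]}]}}

print(extract_venues(sample))
print(extract_venues({"response": {}}))
```

A street with no venues then yields an empty list instead of stopping the whole loop.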

In [46]:
# Run the above function on each location and 
## create a new dataframe called location_venues and display it.
location_venues = getNearbyVenues(names=df['Street'],
                                   latitudes=df['Latitude'],
                                   longitudes=df['Longitude']
                                  )

ALBION SQUARE
ANHALT ROAD
ANSDELL TERRACE
APPLEGARTH ROAD
BARONSMEAD ROAD
BEAUCLERC ROAD
BELVEDERE DRIVE
BICKENHALL STREET
BIRCHLANDS AVENUE
BRAMPTON GROVE
BRIARDALE GARDENS
BROOKWAY
BURBAGE ROAD
BURY WALK
CALLCOTT STREET
CAMPDEN HILL ROAD
CAMPION ROAD
CANNING PLACE
CARLISLE ROAD
CARLTON GARDENS
CARLYLE COURT
CHALCOT SQUARE
CHARLES LANE
CHELSEA CRESCENT
CHESTER CLOSE NORTH
CHEYNE COURT
CHEYNE ROW
CHISWICK MALL
CITY ROAD
CLARENDON STREET
CLONCURRY STREET
COLBECK MEWS
COLLEGE CRESCENT
CORNWALL TERRACE MEWS
COURT LANE GARDENS
CRESCENT GROVE
DALEBURY ROAD
DEWHURST ROAD
DORIA ROAD
DOWNSHIRE HILL
DUCHESS WALK
ECCLESTON SQUARE MEWS
EGBERT STREET
EGERTON PLACE
ELM PARK ROAD
FRANK DIXON WAY
FULTON MEWS
GERARD ROAD
GERRARD ROAD
GIRDLERS ROAD
GLOUCESTER CRESCENT
GORDON PLACE
GRAFTON SQUARE
GRAHAM TERRACE
HARMAN DRIVE
HARRIS STREET
HAVANNAH STREET
HAZLEWELL ROAD
HEREFORD MEWS
HERONDALE AVENUE
HIGHGATE HIGH STREET
HIGHWOOD HILL
HILLGATE PLACE
HOLLYCROFT AVENUE
HOLLYWOOD MEWS
HONEYWELL ROAD
HORTENSI

In [49]:
location_venues

Unnamed: 0,Street,Street Latitude,Street Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,ALBION SQUARE,-41.273758,173.289393,The Free House,-41.273340,173.287364,Bar
1,ALBION SQUARE,-41.273758,173.289393,The Indian Cafe,-41.273308,173.286530,Indian Restaurant
2,ALBION SQUARE,-41.273758,173.289393,Queen's Gardens,-41.273671,173.291383,Park
3,ALBION SQUARE,-41.273758,173.289393,Deville Cafe,-41.271941,173.285535,Beer Garden
4,ALBION SQUARE,-41.273758,173.289393,Urban,-41.274355,173.286317,New American Restaurant
5,ALBION SQUARE,-41.273758,173.289393,Fish Stop,-41.276010,173.289592,Fish & Chips Shop
6,ALBION SQUARE,-41.273758,173.289393,Hopgood's,-41.274749,173.283831,Restaurant
7,ALBION SQUARE,-41.273758,173.289393,The Bridge Street Collective,-41.272520,173.285517,Café
8,ALBION SQUARE,-41.273758,173.289393,Burger Culture,-41.274750,173.284030,Burger Joint
9,ALBION SQUARE,-41.273758,173.289393,The Vic Mac's Brew Bar,-41.274757,173.283914,Pub


In [50]:
location_venues.groupby('Street').count()

Unnamed: 0_level_0,Street Latitude,Street Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Street,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
ALBION SQUARE,25,25,25,25,25,25
ANHALT ROAD,15,15,15,15,15,15
ANSDELL TERRACE,54,54,54,54,54,54
APPLEGARTH ROAD,4,4,4,4,4,4
BARONSMEAD ROAD,14,14,14,14,14,14
BEAUCLERC ROAD,32,32,32,32,32,32
BICKENHALL STREET,93,93,93,93,93,93
BIRCHLANDS AVENUE,9,9,9,9,9,9
BRAMPTON GROVE,4,4,4,4,4,4
BRIARDALE GARDENS,4,4,4,4,4,4


In [51]:
# Get the number of unique categories
print('There are {} unique categories.'.format(len(location_venues['Venue Category'].unique())))

There are 351 unique categories.


In [52]:
location_venues.shape

(5628, 7)

In [53]:
# one hot encoding
venues_onehot = pd.get_dummies(location_venues[['Venue Category']], prefix="", prefix_sep="")

# add street column back to dataframe
venues_onehot['Street'] = location_venues['Street'] 

# move street column to the first column
fixed_columns = [venues_onehot.columns[-1]] + list(venues_onehot.columns[:-1])

#fixed_columns
venues_onehot = venues_onehot[fixed_columns]

venues_onehot.head()

Unnamed: 0,Street,Accessories Store,Adult Boutique,Afghan Restaurant,African Restaurant,American Restaurant,Antique Shop,Arcade,Argentinian Restaurant,Art Gallery,...,Vietnamese Restaurant,Warehouse Store,Waterfront,Whisky Bar,Windmill,Wine Bar,Wine Shop,Women's Store,Yoga Studio,Zoo Exhibit
0,ALBION SQUARE,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,ALBION SQUARE,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,ALBION SQUARE,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,ALBION SQUARE,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,ALBION SQUARE,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [54]:
london_grouped = venues_onehot.groupby('Street').mean().reset_index()
london_grouped

Unnamed: 0,Street,Accessories Store,Adult Boutique,Afghan Restaurant,African Restaurant,American Restaurant,Antique Shop,Arcade,Argentinian Restaurant,Art Gallery,...,Vietnamese Restaurant,Warehouse Store,Waterfront,Whisky Bar,Windmill,Wine Bar,Wine Shop,Women's Store,Yoga Studio,Zoo Exhibit
0,ALBION SQUARE,0.000000,0.0,0.0,0.000000,0.000000,0.000000,0.0,0.000000,0.040000,...,0.000000,0.0,0.000000,0.000000,0.00,0.000000,0.000000,0.000000,0.000000,0.0
1,ANHALT ROAD,0.000000,0.0,0.0,0.000000,0.000000,0.000000,0.0,0.000000,0.000000,...,0.000000,0.0,0.000000,0.000000,0.00,0.000000,0.000000,0.000000,0.000000,0.0
2,ANSDELL TERRACE,0.000000,0.0,0.0,0.000000,0.000000,0.000000,0.0,0.000000,0.018519,...,0.000000,0.0,0.000000,0.000000,0.00,0.000000,0.000000,0.018519,0.000000,0.0
3,APPLEGARTH ROAD,0.000000,0.0,0.0,0.000000,0.000000,0.000000,0.0,0.000000,0.000000,...,0.000000,0.0,0.000000,0.000000,0.00,0.000000,0.000000,0.000000,0.000000,0.0
4,BARONSMEAD ROAD,0.000000,0.0,0.0,0.000000,0.000000,0.000000,0.0,0.000000,0.000000,...,0.000000,0.0,0.000000,0.000000,0.00,0.000000,0.000000,0.000000,0.000000,0.0
5,BEAUCLERC ROAD,0.000000,0.0,0.0,0.000000,0.000000,0.000000,0.0,0.000000,0.000000,...,0.000000,0.0,0.000000,0.000000,0.00,0.000000,0.000000,0.000000,0.000000,0.0
6,BICKENHALL STREET,0.000000,0.0,0.0,0.000000,0.000000,0.000000,0.0,0.000000,0.010753,...,0.000000,0.0,0.000000,0.000000,0.00,0.010753,0.000000,0.000000,0.010753,0.0
7,BIRCHLANDS AVENUE,0.000000,0.0,0.0,0.000000,0.000000,0.000000,0.0,0.000000,0.000000,...,0.000000,0.0,0.000000,0.000000,0.00,0.000000,0.000000,0.000000,0.000000,0.0
8,BRAMPTON GROVE,0.000000,0.0,0.0,0.000000,0.000000,0.000000,0.0,0.000000,0.000000,...,0.000000,0.0,0.000000,0.000000,0.00,0.000000,0.000000,0.000000,0.000000,0.0
9,BRIARDALE GARDENS,0.000000,0.0,0.0,0.000000,0.000000,0.000000,0.0,0.000000,0.000000,...,0.000000,0.0,0.000000,0.000000,0.00,0.000000,0.000000,0.000000,0.000000,0.0


In [55]:
london_grouped.shape

(149, 352)
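The grouped table holds, for each street, the share of nearby venues in each category: one-hot rows are 0/1 indicators, so their per-street mean equals the category count divided by the total venue count. The same quantity can be sketched in plain Python as a cross-check. The toy records below are made up for illustration, and this is not the pandas code used above.

```python
from collections import Counter, defaultdict

# Toy (street, venue category) records, made up for illustration.
records = [
    ("ANHALT ROAD", "Pub"), ("ANHALT ROAD", "Pub"),
    ("ANHALT ROAD", "Grocery Store"), ("ANHALT ROAD", "Diner"),
    ("BRAMPTON GROVE", "Bar"), ("BRAMPTON GROVE", "Lake"),
]

# Count how often each category occurs near each street.
counts = defaultdict(Counter)
for street, category in records:
    counts[street][category] += 1

# Frequency = venues of a category / all venues found near that street.
frequencies = {
    street: {cat: n / sum(cats.values()) for cat, n in cats.items()}
    for street, cats in counts.items()
}

print(frequencies["ANHALT ROAD"])
```

Because each row sums to 1, a street with only four venues (such as APPLEGARTH ROAD above) shows frequencies of 0.25 per venue, so small samples can look deceptively distinctive.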

In [56]:
# What are the top 5 venues/facilities nearby profitable real estate investments?#

num_top_venues = 5

for hood in london_grouped['Street']:
    print("----"+hood+"----")
    temp = london_grouped[london_grouped['Street'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----ALBION SQUARE----
               venue  freq
0               Café  0.20
1                Pub  0.08
2                Bar  0.08
3  Indian Restaurant  0.08
4         Restaurant  0.08


----ANHALT ROAD----
                  venue  freq
0                   Pub  0.27
1         Grocery Store  0.13
2  Gym / Fitness Center  0.07
3   Japanese Restaurant  0.07
4    English Restaurant  0.07


----ANSDELL TERRACE----
                venue  freq
0          Restaurant  0.07
1      Clothing Store  0.07
2  Italian Restaurant  0.06
3                 Pub  0.06
4               Hotel  0.06


----APPLEGARTH ROAD----
               venue  freq
0                Bar  0.25
1          Nightclub  0.25
2             Casino  0.25
3                Pub  0.25
4  Accessories Store  0.00


----BARONSMEAD ROAD----
                 venue  freq
0           Restaurant  0.07
1                  Pub  0.07
2  Indie Movie Theater  0.07
3     Community Center  0.07
4          Pizza Place  0.07


----BEAUCLERC ROAD----
       

In [57]:
# Define a function to return the most common venues/facilities nearby real estate investments#

def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

In [58]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Street']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        columns.append('{}th Most Common Venue'.format(ind+1))
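The loop above relies on a failed list lookup to switch from "st/nd/rd" to "th". That is enough for ten columns, but it would produce labels such as "21th" if `num_top_venues` were raised above 20. A small helper, sketched here under the assumption that English ordinal rules are wanted, builds the same column names without exception handling; the name `ordinal` is hypothetical.

```python
def ordinal(n):
    """Return n with its English ordinal suffix, e.g. 1 -> '1st', 11 -> '11th'."""
    if 10 <= n % 100 <= 20:
        suffix = "th"
    else:
        suffix = {1: "st", 2: "nd", 3: "rd"}.get(n % 10, "th")
    return f"{n}{suffix}"


num_top_venues = 10
columns = ["Street"] + [
    f"{ordinal(i)} Most Common Venue" for i in range(1, num_top_venues + 1)
]
print(columns[1], "|", columns[-1])
```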

In [59]:
# create a new dataframe
venues_sorted = pd.DataFrame(columns=columns)
venues_sorted['Street'] = london_grouped['Street']

for ind in np.arange(london_grouped.shape[0]):
    venues_sorted.iloc[ind, 1:] = return_most_common_venues(london_grouped.iloc[ind, :], num_top_venues)

In [60]:
venues_sorted.head()

Unnamed: 0,Street,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,ALBION SQUARE,Café,Pub,Bar,Restaurant,Indian Restaurant,Coffee Shop,Seafood Restaurant,Park,Fish & Chips Shop,French Restaurant
1,ANHALT ROAD,Pub,Grocery Store,Japanese Restaurant,Gym / Fitness Center,Diner,Plaza,Pizza Place,Cocktail Bar,English Restaurant,French Restaurant
2,ANSDELL TERRACE,Clothing Store,Restaurant,Pub,Italian Restaurant,Hotel,English Restaurant,French Restaurant,Garden,Juice Bar,Café
3,APPLEGARTH ROAD,Nightclub,Bar,Pub,Casino,Zoo Exhibit,Food,Fast Food Restaurant,Filipino Restaurant,Fish & Chips Shop,Fish Market
4,BARONSMEAD ROAD,Community Center,Café,Movie Theater,Food & Drink Shop,Pizza Place,Farmers Market,Breakfast Spot,Coffee Shop,Indie Movie Theater,Thai Restaurant


In [61]:
venues_sorted.shape

(149, 11)

In [62]:
london_grouped.shape

(149, 352)

In [63]:
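# Note: this replaces the venue-frequency table with df, so the k-means step below
# clusters on Avg_Price, Latitude and Longitude rather than on venue categories
london_grouped = df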

Having examined the venues, facilities and amenities near the most profitable real estate investments in London, we can begin clustering properties by the venues, facilities and amenities nearby.

In [64]:
#Distribute in 5 Clusters

# set number of clusters
kclusters = 5

london_grouped_clustering = london_grouped.drop(columns='Street')

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(london_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:50]

array([0, 3, 2, 3, 1, 0, 1, 2, 2, 0, 3, 3, 0, 0, 1, 1, 0, 3, 2, 0, 4, 4,
       3, 0, 0, 2, 3, 4, 0, 2, 3, 1, 3, 1, 1, 4, 3, 3, 1, 2, 0, 1, 4, 2,
       4, 2, 4, 2, 2, 3], dtype=int32)
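Note that the table passed to `KMeans` here contains `Avg_Price`, `Latitude` and `Longitude` in their raw units, so the price column (values around 2.2 to 2.5 million) dominates the Euclidean distances. A common remedy, sketched below with the standard library, is to standardise each column to zero mean and unit variance before clustering. The helper name is an assumption, and the three rows are copied from the table shown earlier.

```python
from statistics import mean, pstdev


def standardise(rows):
    """Scale each column of a list of numeric tuples to zero mean and unit variance."""
    columns = list(zip(*rows))
    means = [mean(col) for col in columns]
    stds = [pstdev(col) or 1.0 for col in columns]  # avoid division by zero
    return [
        tuple((value - m) / s for value, m, s in zip(row, means, stds))
        for row in rows
    ]


# (Avg_Price, Latitude, Longitude) for three streets from the notebook output.
rows = [
    (2435000.0, 51.480326, -0.166761),
    (2250000.0, 51.499890, -0.189103),
    (2375000.0, 51.477315, -0.239457),
]

scaled = standardise(rows)
for row in scaled:
    print(tuple(round(v, 2) for v in row))
```

In scikit-learn the same transformation is available as `StandardScaler` in `sklearn.preprocessing`; whether price should carry equal weight to location is a modelling choice the analysis would need to make.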

In [65]:
#Dataframe to include Clusters

london_grouped_clustering=df
london_grouped_clustering.head()

Unnamed: 0,Street,Avg_Price,Latitude,Longitude
196,ALBION SQUARE,2450000.0,-41.273758,173.289393
391,ANHALT ROAD,2435000.0,51.480326,-0.166761
406,ANSDELL TERRACE,2250000.0,51.49989,-0.189103
422,APPLEGARTH ROAD,2400000.0,53.748654,-0.32667
855,BARONSMEAD ROAD,2375000.0,51.477315,-0.239457


In [66]:
london_grouped_clustering.shape

(161, 4)

In [67]:
df.shape

(161, 4)

In [68]:
london_grouped_clustering.dtypes

Street        object
Avg_Price    float64
Latitude     float64
Longitude    float64
dtype: object

In [69]:
df.dtypes

Street        object
Avg_Price    float64
Latitude     float64
Longitude    float64
dtype: object

In [70]:
# add clustering labels
london_grouped_clustering['Cluster Labels'] = kmeans.labels_

# join venues_sorted to add the most common venue categories for each street
london_grouped_clustering = london_grouped_clustering.join(venues_sorted.set_index('Street'), on='Street')

london_grouped_clustering.head(30) # check the last columns!

Unnamed: 0,Street,Avg_Price,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
196,ALBION SQUARE,2450000.0,-41.273758,173.289393,0,Café,Pub,Bar,Restaurant,Indian Restaurant,Coffee Shop,Seafood Restaurant,Park,Fish & Chips Shop,French Restaurant
391,ANHALT ROAD,2435000.0,51.480326,-0.166761,3,Pub,Grocery Store,Japanese Restaurant,Gym / Fitness Center,Diner,Plaza,Pizza Place,Cocktail Bar,English Restaurant,French Restaurant
406,ANSDELL TERRACE,2250000.0,51.49989,-0.189103,2,Clothing Store,Restaurant,Pub,Italian Restaurant,Hotel,English Restaurant,French Restaurant,Garden,Juice Bar,Café
422,APPLEGARTH ROAD,2400000.0,53.748654,-0.32667,3,Nightclub,Bar,Pub,Casino,Zoo Exhibit,Food,Fast Food Restaurant,Filipino Restaurant,Fish & Chips Shop,Fish Market
855,BARONSMEAD ROAD,2375000.0,51.477315,-0.239457,1,Community Center,Café,Movie Theater,Food & Drink Shop,Pizza Place,Farmers Market,Breakfast Spot,Coffee Shop,Indie Movie Theater,Thai Restaurant
981,BEAUCLERC ROAD,2480000.0,51.499577,-0.229033,0,Pub,Coffee Shop,Hotel,Thai Restaurant,Chinese Restaurant,Grocery Store,Italian Restaurant,Street Food Gathering,Gym,Supermarket
1102,BELVEDERE DRIVE,2340000.0,38.201316,-84.623076,1,,,,,,,,,,
1215,BICKENHALL STREET,2208500.0,51.521197,-0.158934,2,Hotel,Café,Coffee Shop,Indian Restaurant,Chinese Restaurant,Restaurant,Pizza Place,Gastropub,Bar,Garden
1253,BIRCHLANDS AVENUE,2217000.0,51.448394,-0.160468,2,Pub,Chinese Restaurant,Brewery,Lake,French Restaurant,Bakery,Coffee Shop,Train Station,Zoo Exhibit,Flower Shop
1553,BRAMPTON GROVE,2456875.0,51.570365,-0.283394,0,Men's Store,Bar,Middle Eastern Restaurant,Lake,Food,Farmers Market,Fast Food Restaurant,Filipino Restaurant,Fish & Chips Shop,Fish Market


In [71]:
# Create Map

map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i+x+(i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(london_grouped_clustering['Latitude'], london_grouped_clustering['Longitude'], london_grouped_clustering['Street'], london_grouped_clustering['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

In [72]:
london_grouped_clustering.loc[london_grouped_clustering['Cluster Labels'] == 0, london_grouped_clustering.columns[[1] + list(range(5, london_grouped_clustering.shape[1]))]].head()

Unnamed: 0,Avg_Price,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
196,2450000.0,Café,Pub,Bar,Restaurant,Indian Restaurant,Coffee Shop,Seafood Restaurant,Park,Fish & Chips Shop,French Restaurant
981,2480000.0,Pub,Coffee Shop,Hotel,Thai Restaurant,Chinese Restaurant,Grocery Store,Italian Restaurant,Street Food Gathering,Gym,Supermarket
1553,2456875.0,Men's Store,Bar,Middle Eastern Restaurant,Lake,Food,Farmers Market,Fast Food Restaurant,Filipino Restaurant,Fish & Chips Shop,Fish Market
1914,2445000.0,Café,Cricket Ground,Bakery,Diner,Convenience Store,Pizza Place,Coffee Shop,Fish & Chips Shop,Garden Center,Gas Station
1980,2492500.0,Supermarket,English Restaurant,Rental Car Location,Café,Fast Food Restaurant,Park,Gym,Hardware Store,American Restaurant,Food Truck


In [73]:
london_grouped_clustering.loc[london_grouped_clustering['Cluster Labels'] == 1, london_grouped_clustering.columns[[1] + list(range(5, london_grouped_clustering.shape[1]))]].head()

Unnamed: 0,Avg_Price,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
855,2375000.0,Community Center,Café,Movie Theater,Food & Drink Shop,Pizza Place,Farmers Market,Breakfast Spot,Coffee Shop,Indie Movie Theater,Thai Restaurant
1102,2340000.0,,,,,,,,,,
2068,2375000.0,Pub,Park,Grocery Store,Pizza Place,Bakery,Indian Restaurant,Ice Cream Shop,Yoga Studio,Hotel,Coffee Shop
2129,2379652.7,Pub,Hotel,Coffee Shop,Bakery,Grocery Store,Yoga Studio,Indian Restaurant,Ice Cream Shop,Park,Pizza Place
2943,2367500.0,Hotel,Pub,Garden,Italian Restaurant,Café,Coffee Shop,Chinese Restaurant,Mediterranean Restaurant,Bar,Burger Joint


In [74]:
london_grouped_clustering.loc[london_grouped_clustering['Cluster Labels'] == 2, london_grouped_clustering.columns[[1] + list(range(5, london_grouped_clustering.shape[1]))]].head()

Unnamed: 0,Avg_Price,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
406,2250000.0,Clothing Store,Restaurant,Pub,Italian Restaurant,Hotel,English Restaurant,French Restaurant,Garden,Juice Bar,Café
1215,2208500.0,Hotel,Café,Coffee Shop,Indian Restaurant,Chinese Restaurant,Restaurant,Pizza Place,Gastropub,Bar,Garden
1253,2217000.0,Pub,Chinese Restaurant,Brewery,Lake,French Restaurant,Bakery,Coffee Shop,Train Station,Zoo Exhibit,Flower Shop
2225,2200000.0,Restaurant,Bar,Furniture / Home Store,Frozen Yogurt Shop,Farm,Farmers Market,Fast Food Restaurant,Filipino Restaurant,Fish & Chips Shop,Fish Market
2637,2250000.0,Gastropub,Zoo Exhibit,Food & Drink Shop,Farmers Market,Fast Food Restaurant,Filipino Restaurant,Fish & Chips Shop,Fish Market,Flea Market,Flower Shop


In [75]:
london_grouped_clustering.loc[london_grouped_clustering['Cluster Labels'] == 3, london_grouped_clustering.columns[[1] + list(range(5, london_grouped_clustering.shape[1]))]].head()

Unnamed: 0,Avg_Price,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
391,2435000.0,Pub,Grocery Store,Japanese Restaurant,Gym / Fitness Center,Diner,Plaza,Pizza Place,Cocktail Bar,English Restaurant,French Restaurant
422,2400000.0,Nightclub,Bar,Pub,Casino,Zoo Exhibit,Food,Fast Food Restaurant,Filipino Restaurant,Fish & Chips Shop,Fish Market
1632,2397132.0,Convenience Store,Coffee Shop,Gym / Fitness Center,Grocery Store,Zoo Exhibit,Food,Fast Food Restaurant,Filipino Restaurant,Fish & Chips Shop,Fish Market
1797,2400000.0,,,,,,,,,,
2158,2425000.0,Coffee Shop,Hotel,Italian Restaurant,Bar,Pub,Café,Cocktail Bar,Burger Joint,Clothing Store,Sushi Restaurant


In [76]:
london_grouped_clustering.loc[london_grouped_clustering['Cluster Labels'] == 4, london_grouped_clustering.columns[[1] + list(range(5, london_grouped_clustering.shape[1]))]].head()

Unnamed: 0,Avg_Price,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
2242,2300000.0,Campground,Trail,Food & Drink Shop,Farmers Market,Fast Food Restaurant,Filipino Restaurant,Fish & Chips Shop,Fish Market,Flea Market,Flower Shop
2405,2286679.0,Café,Italian Restaurant,Bar,Pub,Coffee Shop,Pizza Place,French Restaurant,Vegetarian / Vegan Restaurant,Cocktail Bar,Market
2685,2287500.0,Pub,Brewery,Gift Shop,Art Museum,Gym / Fitness Center,Zoo Exhibit,Food & Drink Shop,Fast Food Restaurant,Filipino Restaurant,Fish & Chips Shop
3376,2298000.0,Hotel,Zoo Exhibit,Falafel Restaurant,Farmers Market,Fast Food Restaurant,Filipino Restaurant,Fish & Chips Shop,Fish Market,Flea Market,Flower Shop
4284,2265000.0,Pub,Zoo Exhibit,Food & Drink Shop,Farmers Market,Fast Food Restaurant,Filipino Restaurant,Fish & Chips Shop,Fish Market,Flea Market,Flower Shop
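
The cells above repeat one `.loc` filter with a different cluster label each time. As a minimal sketch, the same selection can be wrapped in a helper and looped over. The helper name `cluster_preview` and the toy frame are hypothetical. Only the column positions (column 1 for `Avg_Price`, columns 5 onward for the ranked venues) follow the cells above.

```python
import pandas as pd

def cluster_preview(df, label, label_col="Cluster Labels", n=5):
    """Return Avg_Price plus the ranked-venue columns for one cluster."""
    # Same positional selection as the cells above: column 1, then columns 5 onward.
    cols = df.columns[[1] + list(range(5, df.shape[1]))]
    return df.loc[df[label_col] == label, cols].head(n)

# Hypothetical toy frame; only the column positions mirror the notebook.
toy = pd.DataFrame({
    "Street": ["A", "B", "C", "D"],
    "Avg_Price": [2450000.0, 2375000.0, 2340000.0, 2250000.0],
    "Latitude": [51.50, 51.51, 51.52, 51.53],
    "Longitude": [-0.10, -0.11, -0.12, -0.13],
    "Cluster Labels": [0, 1, 1, 2],
    "1st Most Common Venue": ["Café", "Pub", None, "Hotel"],
    "2nd Most Common Venue": ["Pub", "Park", None, "Café"],
})

# One preview per cluster label, keyed by plain Python ints.
previews = {
    int(label): cluster_preview(toy, label)
    for label in sorted(toy["Cluster Labels"].unique())
}
for label, frame in previews.items():
    print(f"Cluster {label}")
    print(frame)
```

Keeping the selection in one function means a change to the column layout only has to be made in one place.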


# Results and Discussion

First of all, even though the London housing market may be stuck, it remains an "evergreen" for business ventures.

We can discuss our results from two main perspectives.

First, we can examine them by neighborhood and London area. It is interesting to note that, although West London (Notting Hill, Kensington, Chelsea, Marylebone) and North-West London (Hampstead) may be considered highly profitable venues in which to buy real estate given the amenities and essential facilities surrounding them, i.e. elementary schools, high schools, hospitals and grocery stores, South-West London (Wandsworth, Balham) and North-West London (Islington) are emerging as the next elite venues, with a wide range of amenities and facilities. Accordingly, one might target under-priced real estate in these areas of London as an investment.

Second, we can analyze our results according to the five clusters we produced. Even though all clusters offer a good range of facilities and amenities, we found two main patterns. The first pattern, Clusters 0, 2 and 4, may suit home buyers inclined to live in 'green' areas with parks and waterfronts. The second pattern, Clusters 1 and 3, may instead suit individuals who love pubs, theaters and soccer.
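
One way to check cluster-level patterns like these is to count which venue categories most often rank first within each cluster. The following is a minimal sketch under stated assumptions: the helper name `top_venues_by_cluster` is hypothetical, the toy data is invented for illustration, and only the column names `Cluster Labels` and `1st Most Common Venue` come from the tables above.

```python
import pandas as pd

def top_venues_by_cluster(df, label_col="Cluster Labels",
                          venue_col="1st Most Common Venue", k=3):
    """Map each cluster label to its k most frequent top-ranked venue categories."""
    summary = {}
    for label, group in df.groupby(label_col):
        # Rows with no venue data (like the empty rows in the tables) are skipped.
        summary[int(label)] = group[venue_col].dropna().value_counts().head(k)
    return summary

# Hypothetical toy data, not taken from the notebook's results.
toy = pd.DataFrame({
    "Cluster Labels": [0, 0, 0, 1, 1, 1],
    "1st Most Common Venue": ["Park", "Park", "Café", "Pub", "Pub", None],
})

summary = top_venues_by_cluster(toy)
for label, counts in summary.items():
    print(label, counts.to_dict())
```

Running the same tally on the full clustered dataframe would show whether parks and waterfronts, or pubs and theaters, actually dominate a given cluster.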

# Conclusion

To sum up, according to Bloomberg News, the London housing market is stuck. It is now facing a number of headwinds, including the prospect of higher taxes and a warning from the Bank of England that U.K. home values could fall as much as 30 percent in the event of a disorderly exit from the European Union. In this scenario, it is urgent to adopt machine learning tools to help home buyers in London make wise and effective decisions. The business problem we posed was therefore: how can we support home buyers looking to purchase a suitable property in London in this uncertain economic and financial scenario?

To solve this business problem, we clustered London neighborhoods in order to recommend venues, along with the current average price of real estate, where home buyers can invest. We recommended profitable venues according to the amenities and essential facilities surrounding them, i.e. elementary schools, high schools, hospitals and grocery stores.

First, we gathered data on London properties and the relative prices paid, extracted from the HM Land Registry (http://landregistry.data.gov.uk/). In addition, to explore and target recommended locations across different venues according to the presence of amenities and essential facilities, we accessed data through the FourSquare API and arranged them as a dataframe for visualization. By merging the HM Land Registry data on London properties and prices paid with the FourSquare data on the amenities and essential facilities surrounding those properties, we were able to recommend profitable real estate investments.

Second, the Methodology section comprised four stages: 1. Collect Inspection Data; 2. Explore and Understand Data; 3. Data preparation and preprocessing; 4. Modelling. In particular, in the modelling section, we used the k-means clustering technique because it is fast and computationally efficient, flexible enough to account for changes in the London real estate market, and accurate.
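
As an illustration of the modelling stage, here is a minimal sketch of k-means with five clusters. It assumes scikit-learn's `KMeans` and uses a synthetic feature matrix in place of the venue-frequency data. The matrix shape and the `n_init` and `random_state` values are assumptions, not settings taken from the analysis.

```python
import numpy as np
from sklearn.cluster import KMeans

# Synthetic stand-in for a per-street venue-frequency matrix
# (rows: streets, columns: venue categories). Purely illustrative.
rng = np.random.default_rng(0)
features = rng.random((50, 8))
features /= features.sum(axis=1, keepdims=True)  # each row sums to 1, like category frequencies

# Five clusters, matching the number of clusters discussed in the text.
kmeans = KMeans(n_clusters=5, n_init=10, random_state=0).fit(features)
labels = kmeans.labels_  # one cluster label per row of `features`

print(labels[:10])
print(kmeans.cluster_centers_.shape)
```

Fixing `random_state` makes the cluster assignment reproducible between runs, which helps when cluster numbers are cited in the discussion.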

Finally, we concluded that even though the London housing market may be stuck, it remains an "evergreen" for business ventures. We discussed our results from two main perspectives. First, we examined them by neighborhood and London area. Although West London (Notting Hill, Kensington, Chelsea, Marylebone) and North-West London (Hampstead) may be considered highly profitable venues in which to buy real estate given the amenities and essential facilities surrounding them, i.e. elementary schools, high schools, hospitals and grocery stores, South-West London (Wandsworth, Balham) and North-West London (Islington) are emerging as the next elite venues, with a wide range of amenities and facilities. Accordingly, one might target under-priced real estate in these areas of London as an investment. Second, we analyzed our results according to the five clusters we produced. While Clusters 0, 2 and 4 may suit home buyers inclined to live in 'green' areas with parks and waterfronts, Clusters 1 and 3 may suit individuals who love pubs, theaters and soccer.