# Capstone Final Project
## Manchester housing market clustering
### By Andrei Staradubets

## Introduction: Business Problem <a name="introduction"></a>

In this project we will try to find all suitable wards within an affordable price range for a client who wants to buy a property. Specifically, this report is aimed at people who want to see the different kinds of offers on the Manchester housing market with a price at or below 175000.

Finding a good place to live in a new city is always a problem. This project helps property seekers in a new city to find interesting wards to consider within their budget.

We will use our data science powers to generate a map of different wards based on infrastructure criteria. The advantages of each kind of area will then be clearly expressed so that the client can choose the kind of ward that suits them best.

## Data <a name="data"></a>

Based on the definition of our problem, the factors that will influence our decision are:

__Type of prevailing infrastructure__ in the ward (a list of venues in the ward with their type; an example is given in the Analysis part)  
__Mean Price__ of wards (a list of wards with their mean price and coordinates; an example is given in the Total number of Manchester Wards part)  

The following data sources will be needed to extract/generate the required information:

The centers of ward areas will be generated algorithmically as the mean coordinates of the neighbourhoods (postcodes) in those wards. Neighbourhood coordinates can be found in the Office for National Statistics open dataset "__National Statistics Postcode Lookup UK__" (an example is given in the UK neighbourhood postcodes part)  

The mean price for each ward will be obtained from the __Land Registry 2019 open dataset__ (an example is given in the dataset of all price paid data transactions part)

In [1]:
import numpy as np
import pandas as pd
import datetime as dt # Datetime
import json # library to handle JSON files

#!conda install -c conda-forge geopy --yes
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

import requests # library to handle requests
from pandas import json_normalize # transform JSON file into a pandas dataframe (top-level import, pandas >= 1.0)

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

#!conda install -c conda-forge folium=0.5.0 --yes
import folium # map rendering library

print('Libraries imported.')

Libraries imported.


#### Let's download the dataset of *all price paid data transactions* received at HM Land Registry in 2019

In [2]:
df=pd.read_csv("http://prod.publicdata.landregistry.gov.uk.s3-website-eu-west-1.amazonaws.com/pp-2019.csv")
df.head()

Unnamed: 0,{8A78B2B0-5D07-5CB0-E053-6B04A8C0F504},780000,2019-05-20 00:00,SG5 1RT,O,N,F,7 - 11A,Unnamed: 8,BURY MEAD ROAD,Unnamed: 10,HITCHIN,NORTH HERTFORDSHIRE,HERTFORDSHIRE,B,A
0,{8A78B2B0-5D09-5CB0-E053-6B04A8C0F504},520000,2019-01-30 00:00,AL3 4GD,F,N,F,8,,CASSIUS DRIVE,KINGS PARK,ST ALBANS,ST ALBANS,HERTFORDSHIRE,B,A
1,{8A78B2B0-5D0A-5CB0-E053-6B04A8C0F504},385000,2019-05-17 00:00,WD25 0NF,T,N,F,14,,AURORA CLOSE,,WATFORD,WATFORD,HERTFORDSHIRE,B,A
2,{8A78B2B0-5D0C-5CB0-E053-6B04A8C0F504},600000,2019-05-02 00:00,WD7 7NN,O,N,F,44,,WATLING STREET,,RADLETT,HERTSMERE,HERTFORDSHIRE,B,A
3,{8A78B2B0-5D0D-5CB0-E053-6B04A8C0F504},178500,2019-05-08 00:00,SG6 4LU,F,N,L,PELICAN COURT,11.0,SOUTHFIELDS,,LETCHWORTH GARDEN CITY,NORTH HERTFORDSHIRE,HERTFORDSHIRE,B,A
4,{8A78B2B0-5D0E-5CB0-E053-6B04A8C0F504},425000,2019-04-24 00:00,SG9 9JH,S,N,F,103,,SNELLS MEAD,,BUNTINGFORD,EAST HERTFORDSHIRE,HERTFORDSHIRE,B,A
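In the output above, the first transaction appears in the header position, which suggests the file has no header row of its own. A minimal sketch of reading such a file without losing that record is shown below. It uses two records copied from the output as an inline sample instead of the full download, and the name `PPD_COLUMNS` is only illustrative.

```python
import io
import pandas as pd

# Column names follow the layout assigned later in this notebook.
PPD_COLUMNS = ['ID', 'Price', 'Date', 'Postcode', 'Prop_Type', 'Old_New', 'Duration', 'PAON',
               'SAON', 'Street', 'Locality', 'Town_City', 'District', 'County',
               'PPD_Cat_Type', 'Record_Status']

# Two records copied from the output above, standing in for the full file.
sample_csv = io.StringIO(
    '{8A78B2B0-5D07-5CB0-E053-6B04A8C0F504},780000,2019-05-20 00:00,SG5 1RT,O,N,F,'
    '7 - 11A,,BURY MEAD ROAD,,HITCHIN,NORTH HERTFORDSHIRE,HERTFORDSHIRE,B,A\n'
    '{8A78B2B0-5D09-5CB0-E053-6B04A8C0F504},520000,2019-01-30 00:00,AL3 4GD,F,N,F,'
    '8,,CASSIUS DRIVE,KINGS PARK,ST ALBANS,ST ALBANS,HERTFORDSHIRE,B,A\n'
)

# header=None stops pandas from consuming the first record as column labels.
sample = pd.read_csv(sample_csv, header=None, names=PPD_COLUMNS)
print(sample[['Price', 'Postcode', 'Town_City']])
```

Applied to the real file, the same `header=None, names=...` arguments would keep one extra transaction, so the figures shown in this notebook could shift slightly.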


In [3]:
# Assign meaningful column names
df.columns = ['ID', 'Price', 'Date', 'Postcode', 'Prop_Type', 'Old_New', 'Duration', 'PAON', \
                  'SAON', 'Street', 'Locality', 'Town_City', 'District', 'County', 'PPD_Cat_Type', 'Record_Status']
df=df[['Price','Postcode','Town_City']]
df.head()

Unnamed: 0,Price,Postcode,Town_City
0,520000,AL3 4GD,ST ALBANS
1,385000,WD25 0NF,WATFORD
2,600000,WD7 7NN,RADLETT
3,178500,SG6 4LU,LETCHWORTH GARDEN CITY
4,425000,SG9 9JH,BUNTINGFORD


Transactions in Manchester only

In [4]:
df_manch=df[df['Town_City']=='MANCHESTER']
df_manch_grouped=df_manch.groupby(by='Postcode')[['Price']].mean().reset_index()
df_manch_grouped.head()

Unnamed: 0,Postcode,Price
0,M1 1AL,225000.0
1,M1 1BA,306250.0
2,M1 1BY,205583.333333
3,M1 1EB,155000.0
4,M1 1EP,214166.666667


#### Let's download the dataset with all *UK neighbourhood postcodes*

In [5]:
postcode=pd.read_csv("https://opendata.camden.gov.uk/api/views/tr8t-gqz7/rows.csv?accessType=DOWNLOAD")

In [6]:
postcode=postcode[['Postcode 1','Latitude','Longitude','Ward Name']]
postcode=postcode.rename(columns={'Postcode 1':'Postcode'})
postcode.head()

Unnamed: 0,Postcode,Latitude,Longitude,Ward Name
0,LS248HR,53.886659,-1.259178,Tadcaster
1,BL8 4LA,53.626345,-2.363526,North Manor
2,B13 3PA,52.427533,-1.891091,Billesley
3,N22 5RE,51.600081,-0.105721,Woodside
4,WA150JZ,53.369049,-2.342589,Hale Barns


#### And keep only the postcodes in which transactions took place

In [7]:
neighbourhoods=df_manch_grouped.merge(postcode, on='Postcode', how='inner')
wards=neighbourhoods.groupby(by='Ward Name')[['Price','Latitude','Longitude']].mean().reset_index()
wards.head()

Unnamed: 0,Ward Name,Price,Latitude,Longitude
0,Ancoats & Beswick,233707.245614,53.477757,-2.203515
1,Ardwick,667150.110063,53.464674,-2.215311
2,Astley Mosley Common,212713.041126,53.502328,-2.450243
3,Atherleigh,133690.242537,53.520424,-2.504809
4,Atherton,229106.4,53.527912,-2.489754
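The postcode lookup sample above contains values such as `LS248HR` without a space, while the Land Registry file writes postcodes with a single space (`SG5 1RT`). An exact string merge may therefore silently drop rows whose formatting differs. The sketch below shows one way to build a common key on toy frames. `normalise_postcode` is a hypothetical helper, and the prices and the ward name `Ward A` are made up for illustration.

```python
import pandas as pd

def normalise_postcode(series: pd.Series) -> pd.Series:
    """Upper-case postcodes and remove all whitespace so both sources share one key."""
    return series.str.upper().str.replace(r'\s+', '', regex=True)

# Toy frames mimicking the two formats; values are illustrative, not real data.
prices = pd.DataFrame({'Postcode': ['M1 1AL', 'WA15 0JZ'],
                       'Price': [225000.0, 300000.0]})
lookup = pd.DataFrame({'Postcode': ['M1  1AL', 'WA150JZ'],
                       'Ward Name': ['Ward A', 'Hale Barns']})

prices['key'] = normalise_postcode(prices['Postcode'])
lookup['key'] = normalise_postcode(lookup['Postcode'])

# Merging on the raw strings finds nothing; merging on the normalised key finds both rows.
exact = prices.merge(lookup, on='Postcode', how='inner')
by_key = prices.merge(lookup, on='key', how='inner', suffixes=('', '_lookup'))
print(len(exact), len(by_key))
```

Comparing the row counts of the two merges on the real data is a quick way to check whether formatting differences matter here.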


### Total number of Manchester Wards:

In [8]:
wards.shape

(89, 4)

#### Wards with a mean price at or below 175000

In [9]:
wards=wards[wards['Price']<=175000]
wards.head()

Unnamed: 0,Ward Name,Price,Latitude,Longitude
3,Atherleigh,133690.242537,53.520424,-2.504809
5,Audenshaw,168528.72,53.473169,-2.128313
7,Barton,151272.531481,53.478252,-2.357463
8,Besses,157128.371795,53.547343,-2.281292
11,Bucklow-St Martins,156611.996528,53.418071,-2.427692


### Map of all possible wards:

In [10]:
address = 'Manchester, UK'

# Nominatim needs an identifying user_agent; the value used here is an arbitrary placeholder
geolocator = Nominatim(user_agent='manchester_housing_clustering')
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Manchester are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Manchester are 53.4794892, -2.2451148.


In [11]:
map_manch = folium.Map(location=[latitude, longitude], zoom_start=10)

# add markers to map
for lat, lng, ward, price in zip(wards['Latitude'], wards['Longitude'], wards['Ward Name'], wards['Price']):
    label = '{}, Mean price: {}'.format(ward, price)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        ).add_to(map_manch)
    
map_manch

## Methodology <a name="methodology"></a>

In this project we will focus on detecting the differences between the candidate wards in Manchester, UK.

In the first step we collect the required data: the location and type (category) of every venue within 500 m of each ward center (the `radius` used in the Analysis code).

The second step in our analysis is clustering the wards according to their venues: we will use the k-means method to divide the wards into 7 different clusters.

In the third and final step we will focus on what distinguishes those clusters and describe each of them by its content. We will also show a map with the location of the wards and their cluster.

## Analysis <a name="analysis"></a>

Get all venues for each ward

In [12]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Ward', 
                  'Ward Latitude', 
                  'Ward Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)
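The list comprehension in `getNearbyVenues` assumes that every response contains a `groups` entry and that every venue has at least one category; a response missing either would raise an exception and stop the loop. A minimal sketch of a more defensive parser follows. `extract_venues` is a hypothetical helper, not part of the notebook, and the payload is a made-up example with the same nesting as the code above expects.

```python
def extract_venues(payload, ward, ward_lat, ward_lng):
    """Flatten one explore-style response into rows, skipping incomplete items."""
    groups = payload.get('response', {}).get('groups') or []
    items = groups[0].get('items', []) if groups else []
    rows = []
    for item in items:
        venue = item.get('venue', {})
        location = venue.get('location', {})
        if 'name' not in venue or 'lat' not in location or 'lng' not in location:
            continue  # not enough information to place the venue
        categories = venue.get('categories') or []
        category = categories[0].get('name', 'Uncategorised') if categories else 'Uncategorised'
        rows.append((ward, ward_lat, ward_lng, venue['name'],
                     location['lat'], location['lng'], category))
    return rows

# Made-up payload: one complete venue, one without categories, one without a location.
sample_payload = {'response': {'groups': [{'items': [
    {'venue': {'name': 'Example Pub', 'location': {'lat': 53.52, 'lng': -2.50},
               'categories': [{'name': 'Pub'}]}},
    {'venue': {'name': 'Example Kiosk', 'location': {'lat': 53.53, 'lng': -2.51},
               'categories': []}},
    {'venue': {'name': 'No Location'}},
]}]}}

rows = extract_venues(sample_payload, 'Atherleigh', 53.520424, -2.504809)
empty = extract_venues({'response': {}}, 'Atherleigh', 53.520424, -2.504809)
print(len(rows), len(empty))
```

Inside the loop, `venues_list.append(extract_venues(requests.get(url).json(), name, lat, lng))` would be one way to plug such a helper in.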

In [13]:
# @hidden_cell
CLIENT_ID = 'YOUR_FOURSQUARE_CLIENT_ID' # your Foursquare ID
CLIENT_SECRET = 'YOUR_FOURSQUARE_CLIENT_SECRET' # your Foursquare Secret
VERSION = '20180605' # Foursquare API version
LIMIT = 100 # limit of number of venues returned by Foursquare API
radius = 500 # define radius
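Credentials are easier to keep out of a shared notebook when they are read from the environment. The sketch below is one possible approach; the variable names `FOURSQUARE_CLIENT_ID` and `FOURSQUARE_CLIENT_SECRET` and the helper `load_foursquare_credentials` are assumptions, not something Foursquare prescribes.

```python
import os

def load_foursquare_credentials(env=os.environ):
    """Return (client_id, client_secret) from a mapping, or raise a clear error."""
    try:
        return env['FOURSQUARE_CLIENT_ID'], env['FOURSQUARE_CLIENT_SECRET']
    except KeyError as missing:
        raise RuntimeError('Set {} before running the notebook'.format(missing)) from None

# Demonstration with an explicit mapping instead of the real environment.
demo_env = {'FOURSQUARE_CLIENT_ID': 'id-123', 'FOURSQUARE_CLIENT_SECRET': 'secret-456'}
demo_id, demo_secret = load_foursquare_credentials(demo_env)
print(demo_id)
```

In the notebook, `CLIENT_ID, CLIENT_SECRET = load_foursquare_credentials()` would then replace the hard-coded strings.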

In [14]:
manch_venues = getNearbyVenues(names=wards['Ward Name'],
                                   latitudes=wards['Latitude'],
                                   longitudes=wards['Longitude']
                                  )

Atherleigh
Audenshaw
Barton
Besses
Bucklow-St Martins
Castleton
Chadderton South
Charlestown
Clayton & Openshaw
Denton North East
Denton South
Denton West
Droylsden East
Droylsden West
East Middleton
Failsworth East
Failsworth West
Gorton & Abbey Hey
Hopwood Hall
Hulme
Irlam
Irwell Riverside
Kearsley
Little Hulton
Little Lever and Darcy Lever
Longsight
Moss Side
Moston
Pendlebury
Radcliffe East
Radcliffe North
Radcliffe West
Reddish North
Sharston
Swinton North
Tyldesley
Unsworth
Walkden North
West Middleton
Winton
Woodhouse Park


### Number of venues in each ward:

In [15]:
manch_venues.groupby('Ward')[['Venue']].count()

Unnamed: 0_level_0,Venue
Ward,Unnamed: 1_level_1
Atherleigh,4
Audenshaw,2
Barton,8
Besses,2
Bucklow-St Martins,4
Castleton,4
Chadderton South,6
Charlestown,5
Clayton & Openshaw,4
Denton North East,12


In [16]:
print('There are {} unique venue categories.'.format(len(manch_venues['Venue Category'].unique())))

There are 74 unique venue categories.


In [17]:
# one hot encoding
manch_onehot = pd.get_dummies(manch_venues[['Venue Category']], prefix="", prefix_sep="")

# add ward column back to dataframe
manch_onehot['Ward'] = manch_venues['Ward']

# move ward column to the first column
fixed_columns = [manch_onehot.columns[-1]] + list(manch_onehot.columns[:-1])
manch_onehot = manch_onehot[fixed_columns]

manch_onehot.head()

Unnamed: 0,Ward,Asian Restaurant,Athletics & Sports,Auto Garage,Auto Workshop,Bakery,Bar,Breakfast Spot,Bus Line,Bus Stop,...,Tailor Shop,Tanning Salon,Theater,Track,Trail,Train Station,Tram Station,Turkish Restaurant,Vegetarian / Vegan Restaurant,Women's Store
0,Atherleigh,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,Atherleigh,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,Atherleigh,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,Atherleigh,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,Audenshaw,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [18]:
manch_grouped = manch_onehot.groupby('Ward').mean().reset_index()
manch_grouped.head()

Unnamed: 0,Ward,Asian Restaurant,Athletics & Sports,Auto Garage,Auto Workshop,Bakery,Bar,Breakfast Spot,Bus Line,Bus Stop,...,Tailor Shop,Tanning Salon,Theater,Track,Trail,Train Station,Tram Station,Turkish Restaurant,Vegetarian / Vegan Restaurant,Women's Store
0,Atherleigh,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,Audenshaw,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,Barton,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,Besses,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,Bucklow-St Martins,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
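Averaging the one-hot rows per ward turns every column into the share of that ward's venues that fall into the category, so each row sums to 1. This is why Bucklow-St Martins, with 4 venues, shows 0.25 for Tailor Shop. A small worked example on made-up venues:

```python
import pandas as pd

# Toy data: ward A has four venues, ward B has two.
toy = pd.DataFrame({
    'Ward': ['A', 'A', 'A', 'A', 'B', 'B'],
    'Venue Category': ['Pub', 'Pub', 'Park', 'Gym', 'Pub', 'Park'],
})

# One indicator column per category, then the per-ward mean of each column.
onehot = pd.get_dummies(toy['Venue Category'], dtype=float)
onehot.insert(0, 'Ward', toy['Ward'])
shares = onehot.groupby('Ward').mean()
print(shares)
```

Because the features are shares rather than counts, a ward with 2 venues and a ward with 12 venues are compared on the same scale, although the shares of small wards are much noisier.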


### Let's find the most common venues for each ward

In [19]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Ward']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:  # beyond 'st', 'nd', 'rd' fall back to 'th'
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
wards_venues_sorted = pd.DataFrame(columns=columns)
wards_venues_sorted['Ward'] = manch_grouped['Ward']

for ind in np.arange(manch_grouped.shape[0]):
    wards_venues_sorted.iloc[ind, 1:] = return_most_common_venues(manch_grouped.iloc[ind, :], num_top_venues)

wards_venues_sorted.head()

Unnamed: 0,Ward,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Atherleigh,Park,Recreation Center,Gastropub,Food & Drink Shop,Discount Store,Dive Bar,Farm,Fast Food Restaurant,Fish & Chips Shop,Flower Shop
1,Audenshaw,Reservoir,Discount Store,Convenience Store,Dive Bar,Farm,Fast Food Restaurant,Fish & Chips Shop,Flower Shop,Food & Drink Shop,Food Truck
2,Barton,Chinese Restaurant,Park,Pub,Gym,Pizza Place,Sandwich Place,Fish & Chips Shop,Discount Store,Dive Bar,Farm
3,Besses,Shop & Service,Pub,Women's Store,Food & Drink Shop,Discount Store,Dive Bar,Farm,Fast Food Restaurant,Fish & Chips Shop,Flower Shop
4,Bucklow-St Martins,Grocery Store,IT Services,Tailor Shop,Gym / Fitness Center,Gym,Halal Restaurant,Gastropub,Garden Center,Furniture / Home Store,Convenience Store


### Clustering the candidate wards

In [20]:
from sklearn.cluster import KMeans
# set number of clusters
kclusters = 7

manch_grouped_clustering = manch_grouped.drop(columns='Ward')

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(manch_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 

array([4, 6, 4, 0, 1, 2, 5, 5, 1, 2])
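The value `kclusters = 7` is set by hand. One common way to sanity-check such a choice is to compare silhouette scores over a range of candidate values. The sketch below runs on synthetic two-dimensional data rather than on `manch_grouped_clustering`, and the candidate range 2 to 6 is an arbitrary assumption.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

rng = np.random.default_rng(0)
# Three well-separated synthetic blobs stand in for the ward feature matrix.
centers = np.array([[0.0, 0.0], [5.0, 5.0], [0.0, 5.0]])
X = np.vstack([c + 0.3 * rng.standard_normal((30, 2)) for c in centers])

# Fit k-means for each candidate k and record the mean silhouette score.
scores = {}
for k in range(2, 7):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    scores[k] = silhouette_score(X, labels)

best_k = max(scores, key=scores.get)
print(best_k)
```

On the real ward matrix (41 rows, 74 sparse columns) the scores may be much less clear-cut, so they are best read as a rough guide next to the qualitative inspection of the clusters.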

In [21]:
manch_grouped['Cluster']=kmeans.labels_
clusters=manch_grouped[['Ward','Cluster']].merge(wards,left_on='Ward',right_on='Ward Name')
clusters.head()

Unnamed: 0,Ward,Cluster,Ward Name,Price,Latitude,Longitude
0,Atherleigh,4,Atherleigh,133690.242537,53.520424,-2.504809
1,Audenshaw,6,Audenshaw,168528.72,53.473169,-2.128313
2,Barton,4,Barton,151272.531481,53.478252,-2.357463
3,Besses,0,Besses,157128.371795,53.547343,-2.281292
4,Bucklow-St Martins,1,Bucklow-St Martins,156611.996528,53.418071,-2.427692


### Map of clustered wards:

In [22]:
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters: one rainbow color per cluster label
colors_array = cm.rainbow(np.linspace(0, 1, kclusters))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map; popups use 1-based numbers to match the cluster headings below
for lat, lon, poi, cluster in zip(clusters['Latitude'], clusters['Longitude'], clusters['Ward'], clusters['Cluster']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster + 1), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster],
        fill=True,
        fill_color=rainbow[cluster],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

### Let's analyze each cluster:

In [24]:
clusters_analysis=wards_venues_sorted
clusters_analysis['Cluster']=kmeans.labels_

#### Cluster 1

In [25]:
clusters_analysis.loc[clusters_analysis['Cluster'] == 0, clusters_analysis.columns[[1] + list(range(2, clusters_analysis.shape[1]))]]

Unnamed: 0,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue,Cluster
3,Shop & Service,Pub,Women's Store,Food & Drink Shop,Discount Store,Dive Bar,Farm,Fast Food Restaurant,Fish & Chips Shop,Flower Shop,0
16,Pub,Sporting Goods Shop,Women's Store,Food & Drink Shop,Discount Store,Dive Bar,Farm,Fast Food Restaurant,Fish & Chips Shop,Flower Shop,0
22,Pub,Italian Restaurant,Food Truck,Discount Store,Dive Bar,Farm,Fast Food Restaurant,Fish & Chips Shop,Flower Shop,Food & Drink Shop,0
27,Soccer Stadium,Pub,Women's Store,Food & Drink Shop,Discount Store,Dive Bar,Farm,Fast Food Restaurant,Fish & Chips Shop,Flower Shop,0
40,Pub,Deli / Bodega,Discount Store,Dive Bar,Farm,Fast Food Restaurant,Fish & Chips Shop,Flower Shop,Food & Drink Shop,Women's Store,0


As we can see, Cluster 1 is the most "__pubbed__" (sorry) cluster, with a lot of venues for a good Friday evening

#### Cluster 2

In [26]:
clusters_analysis.loc[clusters_analysis['Cluster'] == 1, clusters_analysis.columns[[1] + list(range(2, clusters_analysis.shape[1]))]]

Unnamed: 0,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue,Cluster
4,Grocery Store,IT Services,Tailor Shop,Gym / Fitness Center,Gym,Halal Restaurant,Gastropub,Garden Center,Furniture / Home Store,Convenience Store,1
8,Supermarket,Discount Store,Sandwich Place,Grocery Store,Food & Drink Shop,Dive Bar,Farm,Fast Food Restaurant,Fish & Chips Shop,Flower Shop,1
10,Grocery Store,Convenience Store,Indian Restaurant,Auto Garage,Furniture / Home Store,Farm,Fast Food Restaurant,Fish & Chips Shop,Flower Shop,Food & Drink Shop,1
19,Grocery Store,Convenience Store,Tanning Salon,Pharmacy,Performing Arts Venue,Park,Café,Food Truck,Deli / Bodega,Garden Center,1
20,Grocery Store,Convenience Store,Stationery Store,Indian Restaurant,Fast Food Restaurant,Dive Bar,Farm,Fish & Chips Shop,Flower Shop,Food & Drink Shop,1
23,Asian Restaurant,Grocery Store,Indian Restaurant,Auto Garage,Furniture / Home Store,Farm,Fast Food Restaurant,Fish & Chips Shop,Flower Shop,Food & Drink Shop,1
24,Pub,Athletics & Sports,Grocery Store,Food & Drink Shop,Discount Store,Dive Bar,Farm,Fast Food Restaurant,Fish & Chips Shop,Flower Shop,1
28,Indian Restaurant,Chinese Restaurant,Grocery Store,Fish & Chips Shop,Food Truck,Dive Bar,Farm,Fast Food Restaurant,Flower Shop,Food & Drink Shop,1
34,Park,Grocery Store,Bar,Flower Shop,Women's Store,Dive Bar,Farm,Fast Food Restaurant,Fish & Chips Shop,Food & Drink Shop,1
38,Grocery Store,Women's Store,Food Truck,Dive Bar,Farm,Fast Food Restaurant,Fish & Chips Shop,Flower Shop,Food & Drink Shop,Furniture / Home Store,1


Cluster 2 is all about small stores that sell everything. Let's call it "__stored__"

#### Cluster 3

In [27]:
clusters_analysis.loc[clusters_analysis['Cluster'] == 2, clusters_analysis.columns[[1] + list(range(2, clusters_analysis.shape[1]))]]

Unnamed: 0,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue,Cluster
5,Women's Store,Canal Lock,Garden Center,Athletics & Sports,Home Service,Halal Restaurant,Gym / Fitness Center,Gym,Grocery Store,Gastropub,2
9,Supermarket,Pub,Mobile Phone Shop,Post Office,Pharmacy,Clothing Store,Italian Restaurant,Outlet Store,Candy Store,Shopping Plaza,2
11,Pub,Hotel,Intersection,Fast Food Restaurant,Supermarket,Gym / Fitness Center,Gym,Grocery Store,Gastropub,Garden Center,2
12,Supermarket,Tram Station,Pharmacy,Italian Restaurant,Soccer Stadium,Flower Shop,Discount Store,Dive Bar,Farm,Fast Food Restaurant,2
21,Supermarket,Food & Drink Shop,Café,Bus Stop,Food Truck,Dive Bar,Farm,Fast Food Restaurant,Fish & Chips Shop,Flower Shop,2
31,Indian Restaurant,Gastropub,Fast Food Restaurant,Furniture / Home Store,Gym / Fitness Center,Gym,Grocery Store,Halal Restaurant,Garden Center,Deli / Bodega,2
32,Asian Restaurant,Farm,Auto Workshop,Furniture / Home Store,Dive Bar,Fast Food Restaurant,Fish & Chips Shop,Flower Shop,Food & Drink Shop,Food Truck,2
33,Tram Station,Supermarket,Coffee Shop,Food Truck,Dive Bar,Farm,Fast Food Restaurant,Fish & Chips Shop,Flower Shop,Food & Drink Shop,2
36,Gym / Fitness Center,Bar,Sandwich Place,Park,Supermarket,Italian Restaurant,Pizza Place,Tram Station,Gym,Garden Center,2
37,Coffee Shop,Trail,Pharmacy,Shopping Mall,Fast Food Restaurant,Supermarket,Grocery Store,Gym / Fitness Center,Gastropub,Garden Center,2


This cluster suits a daily-routine lifestyle. Let's call it "__routine__"

#### Cluster 4

In [28]:
clusters_analysis.loc[clusters_analysis['Cluster'] == 3, clusters_analysis.columns[[1] + list(range(2, clusters_analysis.shape[1]))]]

Unnamed: 0,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue,Cluster
29,Auto Garage,Women's Store,Furniture / Home Store,Dive Bar,Farm,Fast Food Restaurant,Fish & Chips Shop,Flower Shop,Food & Drink Shop,Food Truck,3


For car owners, for whom the car matters more than anything else. __Driver__

#### Cluster 5

In [29]:
clusters_analysis.loc[clusters_analysis['Cluster'] == 4, clusters_analysis.columns[[1] + list(range(2, clusters_analysis.shape[1]))]]

Unnamed: 0,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue,Cluster
0,Park,Recreation Center,Gastropub,Food & Drink Shop,Discount Store,Dive Bar,Farm,Fast Food Restaurant,Fish & Chips Shop,Flower Shop,4
2,Chinese Restaurant,Park,Pub,Gym,Pizza Place,Sandwich Place,Fish & Chips Shop,Discount Store,Dive Bar,Farm,4
26,Halal Restaurant,Shop & Service,Pizza Place,Park,Flower Shop,Dive Bar,Farm,Fast Food Restaurant,Fish & Chips Shop,Food & Drink Shop,4
39,Track,Park,Miscellaneous Shop,Café,Women's Store,Food & Drink Shop,Dive Bar,Farm,Fast Food Restaurant,Fish & Chips Shop,4


__Recreation__ cluster

#### Cluster 6

In [30]:
clusters_analysis.loc[clusters_analysis['Cluster'] == 5, clusters_analysis.columns[[1] + list(range(2, clusters_analysis.shape[1]))]]

Unnamed: 0,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue,Cluster
6,Hotel,Pizza Place,Malay Restaurant,Pub,Food & Drink Shop,Dive Bar,Farm,Fast Food Restaurant,Fish & Chips Shop,Flower Shop,5
7,Construction & Landscaping,Train Station,Bus Line,Breakfast Spot,Convenience Store,Gym,Grocery Store,Gastropub,Gym / Fitness Center,Halal Restaurant,5
13,Business Service,Chinese Restaurant,Playground,Convenience Store,Pub,Grocery Store,Gastropub,Garden Center,Furniture / Home Store,Food Truck,5
14,Chinese Restaurant,Indian Restaurant,Home Service,Construction & Landscaping,Gym / Fitness Center,Food & Drink Shop,Farm,Fast Food Restaurant,Fish & Chips Shop,Flower Shop,5
15,Pub,Gym / Fitness Center,Construction & Landscaping,Bakery,Theater,Fast Food Restaurant,Chinese Restaurant,Food & Drink Shop,Dive Bar,Farm,5
17,Hotel,Gym / Fitness Center,Gym,Bakery,Sandwich Place,Market,Food & Drink Shop,Dive Bar,Farm,Fast Food Restaurant,5
18,Convenience Store,Locksmith,Home Service,Auto Garage,Furniture / Home Store,Farm,Fast Food Restaurant,Fish & Chips Shop,Flower Shop,Food & Drink Shop,5
25,Café,Construction & Landscaping,Bar,Women's Store,Farm,Fast Food Restaurant,Fish & Chips Shop,Flower Shop,Food & Drink Shop,Food Truck,5
30,Dive Bar,Photography Studio,Women's Store,Deli / Bodega,Farm,Fast Food Restaurant,Fish & Chips Shop,Flower Shop,Food & Drink Shop,Food Truck,5
35,Bus Stop,Gym,Bar,Playground,Turkish Restaurant,Gym / Fitness Center,Halal Restaurant,Grocery Store,Gastropub,Garden Center,5


If you want to work near your home and always be in the rush of city life, this is your choice. __Business__

#### Cluster 7

In [31]:
clusters_analysis.loc[clusters_analysis['Cluster'] == 6, clusters_analysis.columns[[1] + list(range(2, clusters_analysis.shape[1]))]]

Unnamed: 0,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue,Cluster
1,Reservoir,Discount Store,Convenience Store,Dive Bar,Farm,Fast Food Restaurant,Fish & Chips Shop,Flower Shop,Food & Drink Shop,Food Truck,6


Cozy cluster for cozy life. __Suburb__
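The nicknames above can be attached to the k-means labels to get a quick overview of how many wards fall into each kind. The sketch below rebuilds the label vector from the row indices shown in the cluster tables; the dictionary `cluster_names` simply restates the nicknames chosen in this section.

```python
import pandas as pd

# Nicknames given to the clusters above, keyed by k-means label.
cluster_names = {0: 'Pubbed', 1: 'Stored', 2: 'Routine', 3: 'Driver',
                 4: 'Recreation', 5: 'Business', 6: 'Suburb'}

# Row indices per label, transcribed from the cluster tables above.
members = {
    0: [3, 16, 22, 27, 40],
    1: [4, 8, 10, 19, 20, 23, 24, 28, 34, 38],
    2: [5, 9, 11, 12, 21, 31, 32, 33, 36, 37],
    3: [29],
    4: [0, 2, 26, 39],
    5: [6, 7, 13, 14, 15, 17, 18, 25, 30, 35],
    6: [1],
}

# One label per ward row, then the number of wards per nickname.
labels = pd.Series({idx: label for label, idxs in members.items() for idx in idxs}).sort_index()
summary = labels.map(cluster_names).value_counts()
print(summary)
```

In the notebook itself, `clusters_analysis['Cluster'].map(cluster_names).value_counts()` gives the same counts directly. Two of the seven clusters contain a single ward, which is worth keeping in mind when reading their descriptions.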

## Results and Discussion <a name="results"></a>

Our analysis shows that although there is a large number of wards in Manchester (89), only 41 of them have a mean price within our budget and could be chosen by the customer. The coordinates of each ward are the mean of the coordinates of all neighbourhoods in the ward.

After directing our attention to those wards, we built a k-means clustering model that splits them into 7 different types (defined by the mix of venue categories in each ward) and then put the result on a map for better visualization.

The result of all this is 7 zones containing different candidate wards for our customer. This, of course, does not imply that those zones are actually optimal locations! The purpose of this analysis was only to provide information on areas, not to rule out buying property in other wards: it is entirely possible that a very good offer will appear in any of those areas, for reasons that would make them suitable. The recommended zones should therefore be considered only as a starting point for a more detailed analysis, which could eventually lead to a location that not only has a good price but also meets all other relevant conditions.

## Conclusion <a name="conclusion"></a>

The purpose of this project was to divide Manchester wards by types of venues in order to help customers narrow down the search for the ward that suits them best. By calculating the mean price from open UK Government datasets we defined the starting set of wards for clustering. Then, from Foursquare data, we identified the venue content of each ward. These wards were then clustered to create major zones of interest, which were put on the map to be used as starting points for final exploration by clients.

The final decision on the optimal ward will be made by the customer based on the specific characteristics of each ward, taking into consideration additional factors such as the attractiveness of each location (proximity to a park or water), levels of noise / proximity to major roads, real estate availability, prices, and the social and economic dynamics of every neighbourhood.