# Introduction



In the 21st century, almost every process we rely on is automated, from ordering our favourite food or requesting a taxi to executing millions of transactions at the press of a button. All of this is thanks to major progress in information technology.

Nowadays, if we want to find a good place for dinner in a foreign country, we simply open a maps application; it uses our geolocation to suggest the highest-rated restaurants nearby in no time. This is not only user-friendly and reliable, it can also be extremely valuable commercially. For example, suppose we were asked to build a new venue (an office building, department store, grocery store, restaurant and so on) in a city we have never visited. Using machine learning algorithms, we can cluster the city's neighbourhood data and visualize the result on a map to estimate the most promising spot for the new business.

That is exactly what we will do here. Imagine that we have a business project to build a new department store in the City of London, London's historic financial centre, and we need to find the best neighbourhood for it. Please take a seat and let me take you through this journey, in which we will explore and cluster the neighbourhoods of the City of London to find the one that best suits our needs.

# Data

First things first: to analyze neighbourhoods in the City of London, we need a dataset that includes borough and neighbourhood names together with their coordinates, because we will need them later for data visualization. We will use the postcode dataset provided by doogal.co.uk, which is almost ideal for our project. We then only need to clean the data a little by dropping the columns we do not need and renaming the District column to Borough and the Ward column to Neighbourhood, so that the data is easier to read.

Afterwards, we will call the Foursquare API to find the top 10 venues in every neighbourhood, so that we can cluster the neighbourhoods and decide which one is most suitable for our project.
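To make the "top venues per neighbourhood" idea concrete, here is a minimal sketch of the ranking step in pandas. It assumes the venue search results have already been flattened into one row per venue; the sample data, the `Venue Category` column name and the `top_categories` helper are all hypothetical and only serve as an illustration.

```python
import pandas as pd

# Hypothetical venue records: one row per (neighbourhood, venue category).
# Real data would come from the venue search; these values are made up.
venues = pd.DataFrame({
    "Neighbourhood": ["Aldgate", "Aldgate", "Aldgate", "Portsoken", "Portsoken"],
    "Venue Category": ["Coffee Shop", "Coffee Shop", "Pub", "Hotel", "Coffee Shop"],
})


def top_categories(venues: pd.DataFrame, n: int = 10) -> pd.DataFrame:
    """Return the n most frequent venue categories for each neighbourhood."""
    counts = (
        venues.groupby(["Neighbourhood", "Venue Category"])
        .size()
        .reset_index(name="Count")
    )
    # Most common category first within each neighbourhood; the category
    # name breaks ties so the order is deterministic.
    counts = counts.sort_values(
        ["Neighbourhood", "Count", "Venue Category"],
        ascending=[True, False, True],
    )
    return counts.groupby("Neighbourhood").head(n).reset_index(drop=True)


top = top_categories(venues, n=2)
print(top)
```

With the real data, `n=10` would give the ten most common categories per neighbourhood, which can then serve as input for clustering.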

# Methodology

In this section, we will first clean our dataset to prepare it for visualization and clustering.

Secondly, once the data is clean, we will use the k-means clustering algorithm to group the neighbourhoods.

Lastly, we will plot the processed dataset on a map so that we can judge which location is most suitable for our project.
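As a rough illustration of the clustering step, the sketch below runs scikit-learn's `KMeans` on a small, made-up feature matrix. The feature values, the choice of two clusters and the variable names are assumptions for this toy example, not the settings used in the project.

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical feature matrix: one row per neighbourhood, one column per
# venue category, values are the share of venues in that category.
features = np.array([
    [0.8, 0.1, 0.1],  # mostly coffee shops
    [0.7, 0.2, 0.1],
    [0.1, 0.1, 0.8],  # mostly hotels
    [0.2, 0.1, 0.7],
])

# k=2 is an arbitrary choice for this toy data, not a recommendation.
# random_state makes the run reproducible; n_init sets the number of restarts.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(features)
labels = kmeans.labels_
print(labels)
```

Neighbourhoods with a similar venue mix end up with the same label. The label numbers themselves carry no meaning; only the grouping does.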

### Installing and importing all required libraries for our project.

In [53]:
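# Install third-party packages before importing them
!conda install -c conda-forge folium --yes
!conda install -c conda-forge geopy --yes

import numpy as np # library to handle data in a vectorized manner
import pandas as pd # library for data analysis
import json # library to handle JSON files
import requests # library to handle HTTP requests
# pandas >= 1.0; older versions expose this as pandas.io.json.json_normalize
from pandas import json_normalize # flattens nested JSON into a DataFrame
import matplotlib.cm as cm # colour maps
import matplotlib.colors as colors # colour conversion helpers
from sklearn.cluster import KMeans # k-means clustering
import folium # map rendering library
from geopy.geocoders import Nominatim # converts an address into latitude and longitude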

Solving environment: done

# All requested packages already installed.

Solving environment: done

# All requested packages already installed.



### Reading and examining size of our dataframe.

In [54]:
df = pd.read_csv('https://www.doogal.co.uk/UKPostcodesCSV.ashx?area=London')
df.head(10)

Unnamed: 0,Postcode,In Use?,Latitude,Longitude,Easting,Northing,Grid Ref,County,District,Ward,...,Quality,User Type,Last updated,Nearest station,Distance to station,Postcode area,Postcode district,Police force,Water company,Plus Code
0,BR1 1AA,Yes,51.401546,0.015415,540291,168873,TQ402688,Greater London,Bromley,Bromley Town,...,1,0,2019-11-23,Bromley South,0.218257,BR,BR1,Metropolitan Police,Thames Water,9F32C228+J5
1,BR1 1AB,Yes,51.406333,0.015208,540262,169405,TQ402694,Greater London,Bromley,Bromley Town,...,1,0,2019-11-23,Bromley North,0.253666,BR,BR1,Metropolitan Police,Thames Water,9F32C248+G3
2,BR1 1AD,No,51.400057,0.016715,540386,168710,TQ403687,Greater London,Bromley,Bromley Town,...,1,1,2019-11-23,Bromley South,0.044559,BR,BR1,Metropolitan Police,,9F32C228+2M
3,BR1 1AE,Yes,51.404543,0.014195,540197,169204,TQ401692,Greater London,Bromley,Bromley Town,...,1,0,2019-11-23,Bromley North,0.462939,BR,BR1,Metropolitan Police,Thames Water,9F32C237+RM
4,BR1 1AF,Yes,51.401392,0.014948,540259,168855,TQ402688,Greater London,Bromley,Bromley Town,...,1,0,2019-11-23,Bromley South,0.227664,BR,BR1,Metropolitan Police,Thames Water,9F32C227+HX
5,BR1 1AG,Yes,51.401392,0.014948,540259,168855,TQ402688,Greater London,Bromley,Bromley Town,...,1,0,2019-11-23,Bromley South,0.227664,BR,BR1,Metropolitan Police,Thames Water,9F32C227+HX
6,BR1 1AH,Yes,51.400441,0.01739,540432,168754,TQ404687,Greater London,Bromley,Bromley Town,...,1,0,2019-11-23,Bromley South,0.048906,BR,BR1,Metropolitan Police,Thames Water,9F32C228+5X
7,BR1 1AJ,Yes,51.400489,0.018833,540532,168762,TQ405687,Greater London,Bromley,Bromley Town,...,1,0,2019-11-23,Bromley South,0.115632,BR,BR1,Metropolitan Police,Thames Water,9F32C229+5G
8,BR1 1AL,Yes,51.406549,0.01313,540117,169425,TQ401694,Greater London,Bromley,Bromley Town,...,1,0,2019-11-23,Bromley North,0.332674,BR,BR1,Metropolitan Police,,9F32C247+J7
9,BR1 1AX,No,51.408226,0.017578,540421,169620,TQ404696,Greater London,Bromley,Bromley Town,...,1,1,2019-11-23,Bromley North,0.042067,BR,BR1,Metropolitan Police,,9F32C259+72


In [55]:
df.shape

(321375, 46)

### Cleaning our dataset and checking the size afterwards.

In [43]:
# Keep only the columns needed for mapping and clustering
df_lon = df[['District', 'Ward', 'Latitude', 'Longitude', 'Nearest station', 'Postcode']]
# Rename to the terms used in the rest of the analysis
df_lon = df_lon.rename(columns={'District': 'Borough', 'Ward': 'Neighbourhood'})
df_lon.head(10)

Unnamed: 0,Borough,Neighbourhood,Latitude,Longitude,Nearest station,Postcode
0,Bromley,Bromley Town,51.401546,0.015415,Bromley South,BR1 1AA
1,Bromley,Bromley Town,51.406333,0.015208,Bromley North,BR1 1AB
2,Bromley,Bromley Town,51.400057,0.016715,Bromley South,BR1 1AD
3,Bromley,Bromley Town,51.404543,0.014195,Bromley North,BR1 1AE
4,Bromley,Bromley Town,51.401392,0.014948,Bromley South,BR1 1AF
5,Bromley,Bromley Town,51.401392,0.014948,Bromley South,BR1 1AG
6,Bromley,Bromley Town,51.400441,0.01739,Bromley South,BR1 1AH
7,Bromley,Bromley Town,51.400489,0.018833,Bromley South,BR1 1AJ
8,Bromley,Bromley Town,51.406549,0.01313,Bromley North,BR1 1AL
9,Bromley,Bromley Town,51.408226,0.017578,Bromley North,BR1 1AX


In [56]:
df_lon.shape

(321375, 6)
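As an alternative, the column selection could happen while the file is read, which avoids loading all 46 columns into memory. The sketch below is only an illustration: it uses a tiny in-memory CSV (`sample_csv`) in place of the real download, and the names `wanted` and `df_small` are hypothetical. `usecols` is a standard `pd.read_csv` parameter.

```python
import io

import pandas as pd

# A tiny stand-in for the postcode CSV; the real file has 46 columns.
sample_csv = io.StringIO(
    "Postcode,In Use?,Latitude,Longitude,District,Ward,Nearest station\n"
    "BR1 1AA,Yes,51.401546,0.015415,Bromley,Bromley Town,Bromley South\n"
    "E1 7AA,Yes,51.515567,-0.075635,City of London,Portsoken,Aldgate\n"
)

wanted = ["District", "Ward", "Latitude", "Longitude", "Nearest station", "Postcode"]

# usecols makes pandas parse only the listed columns. The resulting column
# order follows the file, so we reindex with [wanted] to get our own order.
df_small = pd.read_csv(sample_csv, usecols=wanted)[wanted]
df_small = df_small.rename(columns={"District": "Borough", "Ward": "Neighbourhood"})
print(df_small.shape)  # (2, 6)
```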

### Creating the City of London dataset and checking its size

In [57]:
city_of_london_data = df_lon[df_lon['Borough'] == 'City of London'].reset_index(drop=True)

city_of_london_data.head(10)

Unnamed: 0,Borough,Neighbourhood,Latitude,Longitude,Nearest station,Postcode
0,City of London,Bishopsgate,51.518895,-0.078378,Liverpool Street,E1 6AN
1,City of London,Portsoken,51.515567,-0.075635,Aldgate,E1 7AA
2,City of London,Portsoken,51.515457,-0.076718,Aldgate,E1 7AD
3,City of London,Portsoken,51.515613,-0.076899,Aldgate,E1 7AE
4,City of London,Portsoken,51.515613,-0.076899,Aldgate,E1 7AF
5,City of London,Portsoken,51.51563,-0.076279,Aldgate,E1 7AW
6,City of London,Aldgate,51.515526,-0.078592,Aldgate,E1 7AX
7,City of London,Aldgate,51.515526,-0.078592,Aldgate,E1 7AY
8,City of London,Aldgate,51.515175,-0.07761,Aldgate,E1 7BH
9,City of London,Portsoken,51.515432,-0.076806,Aldgate,E1 7BS


In [58]:
city_of_london_data.shape

(6800, 6)
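Note that these 6,800 rows are individual postcodes, not neighbourhoods. If one row per neighbourhood is wanted, for example before querying venues, one possible approach is to average the postcode coordinates within each neighbourhood. This is an assumption for illustration, not necessarily the step taken in this project; the sketch uses a few rows copied from the table above, and the name `centroids` is hypothetical.

```python
import pandas as pd

# A few rows copied from the City of London table above.
rows = pd.DataFrame({
    "Neighbourhood": ["Bishopsgate", "Portsoken", "Portsoken", "Aldgate", "Aldgate"],
    "Latitude": [51.518895, 51.515567, 51.515457, 51.515526, 51.515175],
    "Longitude": [-0.078378, -0.075635, -0.076718, -0.078592, -0.07761],
})

# One row per neighbourhood, located at the mean of its postcode coordinates.
centroids = (
    rows.groupby("Neighbourhood", as_index=False)[["Latitude", "Longitude"]]
    .mean()
)
print(centroids)
```

A simple mean is a rough centre for a small area like a ward, and it keeps both the map and any later per-neighbourhood requests much lighter than working with every postcode.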

### Finding the latitude and longitude of the City of London

In [59]:
address = 'City of London'
geolocator = Nominatim(user_agent="lon_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of City of London are {}, {}'.format(latitude, longitude))

The geographical coordinates of City of London are 51.5156177, -0.0919983


### Visualizing the neighbourhoods of the City of London on a map.

In [61]:
city_of_london_map = folium.Map(location=[latitude, longitude], zoom_start=11)

# add one marker per postcode, labelled with its neighbourhood
for lat, lng, label in zip(city_of_london_data['Latitude'], city_of_london_data['Longitude'], city_of_london_data['Neighbourhood']):
    label = folium.Popup(label)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='purple',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(city_of_london_map)
    
city_of_london_map