<center>
    <img src='images.png' alt='IBM image' style="height:80px;">
</center>

> **Name:** Jeff Mac Osei<br>
    **Course:** Coursera - Applied Data Science Capstone <br>
    **Date:** 25/04/2021<br>

This is the capstone project for the IBM Data Science Professional Certificate on Coursera.

### TABLE OF CONTENTS

* [Introduction](#intro)
    * [Problem Statement](#ps)
    * [Project Goal](#pg)
* [Data Acquisition and Understanding](#da)
    * [Data Collected](#dco)
* [Methodology](#method)
    * [Data Collection](#dn)
    * [Data Preprocessing](#dp)
    * [Feature Engineering](#fe)
    * [London Map Visualization](#lmv)
    * [One-Hot Encoding](#ohe)
    * [Top 10 Venues in the neighbourhood](#t)
    * [Modelling](#md)
* [Results and Discussion](#re)
* [Conclusion](#co)

# Introduction <a class='anchor' id='intro'></a>

The first step in approaching a data science problem is understanding the problem. This step matters because it tells us which decisions we want to make, what information or data we need to inform those decisions, and what kind of analysis will lead to them. In short, developing a mental model of the problem lets us structure the information that is relevant to solving it.

We will begin the problem understanding with the problem statement.

### Problem Statement <a class='anchor' id='ps'></a>

<p style="text-align:justify;">London is one of the most popular destinations for tourists and holidaymakers, possibly because it is a diverse, multicultural city that offers a wide variety of experiences. In this project, we group the neighbourhoods of London and draw insights about what they offer today. The aim is to help tourists, and people who intend to move to London, make better-informed decisions.</p>

### Project Goal <a class='anchor' id='pg'></a>

<p style="text-align:justify;">The goal of this project is to explore the city of London and find out what makes it one of the most popular destinations for tourists.</p>

# Data Acquisition and Understanding<a class='anchor' id='da'></a>

<p style="text-align:justify;">This project faces several constraints. There are few well-known, readily available datasets for exploring London's neighbourhoods in the way described above. To complete the project, we need geographical location data for London. Postcode districts serve as the starting point: from them we can obtain the coordinates of each neighbourhood, and then the venues around it and their most common categories.</p>

<p style="text-align:justify;">For this project, we acquire the list of areas from the "List of areas of London" page on <a href="https://en.wikipedia.org/wiki/List_of_areas_of_London">Wikipedia</a>.</p>

<p style="text-align:justify;">Finally, we use the Foursquare API to gather information about the venues in each neighbourhood, with a search radius of 500 metres around each neighbourhood's coordinates.</p>

### Data Collected<a class='anchor' id='dco'></a>

The information acquired from this site includes:
* Location
* London Borough
* Post Town
* Postcode District
* Dial Code
* OS Grid Ref

This Wikipedia table does not provide latitude and longitude coordinates, so we obtain them with the ArcGIS API.

##### ArcGIS API

ArcGIS Online is Esri's web mapping platform. Among other services, it offers geocoding, which converts an address or place name into geographic coordinates.

We use the `geocode` function from the ArcGIS API for Python to get the coordinates of each London neighbourhood. This adds the following columns to our dataset:

* latitude : Latitude for Neighbourhood
* longitude : Longitude for Neighbourhood


# Methodology <a class="anchor" id="method"></a>

<p style="text-align:justify;">As mentioned earlier, the goal of this project is to explore the city of London and find out what makes it one of the most popular destinations for tourists. To that end, we map the neighbourhoods with folium, retrieve the most common venues around each one from Foursquare, and group similar neighbourhoods with k-means clustering to arrive at some plausible conclusions.</p>

We then summarize our findings in a short presentation.

To achieve this goal, we need to answer the following question:
* What features does this city possess?

We will start by importing the libraries we will need for the project.

In [1]:
```python
import pandas as pd
import requests
import numpy as np
import matplotlib.cm as cm
import matplotlib.colors as colors
import folium
from sklearn.cluster import KMeans
```

### Data Collection <a class="anchor" id="dn"></a>

To obtain the neighbourhoods in London, we start by scraping the list of areas of London from Wikipedia.

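In [2]:
```python
from io import StringIO

site_url = "https://en.wikipedia.org/wiki/List_of_areas_of_London"
london_handle = requests.get(site_url)
# read_html returns a list of DataFrames, one per <table>; index 1 is the list of areas
# (wrapping the HTML in StringIO avoids the literal-string deprecation in recent pandas)
london_data = pd.read_html(StringIO(london_handle.text))
london_data = london_data[1]
```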

In [3]:
```python
london_handle
```

```
<Response [200]>
```

In [4]:
```python
# let's have a view of our data
london_data.head()
```

| | Location | London borough | Post town | Postcode district | Dial code | OS grid ref |
|---|---|---|---|---|---|---|
| 0 | Abbey Wood | Bexley, Greenwich [7] | LONDON | SE2 | 20 | TQ465785 |
| 1 | Acton | Ealing, Hammersmith and Fulham[8] | LONDON | W3, W4 | 20 | TQ205805 |
| 2 | Addington | Croydon[8] | CROYDON | CR0 | 20 | TQ375645 |
| 3 | Addiscombe | Croydon[8] | CROYDON | CR0 | 20 | TQ345665 |
| 4 | Albany Park | Bexley | BEXLEY, SIDCUP | DA5, DA14 | 20 | TQ478728 |


### Data Preprocessing <a class="anchor" id="dp"></a>

Now we clean the data and make it suitable for our analysis.

The first cleaning step is to rename the columns to lower case, with spaces replaced by underscores, so they are easy to access. Then we remove the bracketed footnote markers (such as `[7]` and `[8]`) from the `london_borough` column.
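The renaming cell below hard-codes the six column names. Here is a minimal sketch that derives the same names from the headers themselves, assuming pandas 1.0 or later. It also uses a regular expression to strip footnote markers together with any space before them. The frame `raw` and its rows are made up for illustration.

```python
import pandas as pd

# Made-up frame mimicking the Wikipedia table's headers and footnote markers
raw = pd.DataFrame({
    "Location": ["Abbey Wood", "Addington"],
    "London borough": ["Bexley, Greenwich [7]", "Croydon[8]"],
    "Post town": ["LONDON", "CROYDON"],
})

# Lower-case every header and replace spaces with underscores
raw.columns = raw.columns.str.lower().str.replace(" ", "_")

# Drop bracketed numeric footnotes such as "[7]" (and the space before them), then trim
raw["london_borough"] = (
    raw["london_borough"].str.replace(r"\s*\[\d+\]", "", regex=True).str.strip()
)

print(list(raw.columns))             # ['location', 'london_borough', 'post_town']
print(raw["london_borough"].tolist())  # ['Bexley, Greenwich', 'Croydon']
```

A regex keyed on the `[digits]` pattern avoids leaving a trailing space behind. The chained `rstrip` calls alone do leave one for values such as `Bexley, Greenwich [7]`.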

In [5]:
```python
london_data.columns = ['location', 'london_borough', 'post_town', 'postcode_district', 'dial_code', 'os_grid_ref']
```

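In [7]:
```python
# strip footnote markers such as "[8]" from the end of each borough name;
# the final .strip() removes the space left before markers like " [7]"
london_data['london_borough'] = london_data['london_borough'].map(lambda x: x.rstrip(']').rstrip('0123456789').rstrip('[').strip())
```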

In [8]:
```python
london_data.head()
```

| | location | london_borough | post_town | postcode_district | dial_code | os_grid_ref |
|---|---|---|---|---|---|---|
| 0 | Abbey Wood | Bexley, Greenwich | LONDON | SE2 | 20 | TQ465785 |
| 1 | Acton | Ealing, Hammersmith and Fulham | LONDON | W3, W4 | 20 | TQ205805 |
| 2 | Addington | Croydon | CROYDON | CR0 | 20 | TQ375645 |
| 3 | Addiscombe | Croydon | CROYDON | CR0 | 20 | TQ345665 |
| 4 | Albany Park | Bexley | BEXLEY, SIDCUP | DA5, DA14 | 20 | TQ478728 |


Next, we perform feature selection, keeping only the columns needed for the analysis.

In [9]:
```python
london_data.columns
```

```
Index(['location', 'london_borough', 'post_town', 'postcode_district',
       'dial_code', 'os_grid_ref'],
      dtype='object')
```

In [11]:
```python
# selecting the right features for our analysis
london = london_data[['london_borough', 'post_town', 'postcode_district']]
```

### Feature Engineering <a class="anchor" id="fe"></a>

The dataset also contains areas whose post town is not London (for example CROYDON and BEXLEY), which we do not need. We therefore keep only the rows whose `post_town` contains `LONDON`.

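In [12]:
```python
# keep rows whose post town mentions LONDON; .copy() avoids a SettingWithCopyWarning
# when the latitude/longitude columns are added later
london = london[london['post_town'].str.contains('LONDON')].copy()
```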
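Note that `str.contains('LONDON')` is a substring test. Multi-town entries such as `BEXLEYHEATH, LONDON` and `LONDON, SIDCUP`, which appear in the cluster tables later on, are therefore kept as well. Here is a small illustration on made-up rows that contrasts it with an exact match:

```python
import pandas as pd

# Made-up post_town values in the style of the scraped table
towns = pd.DataFrame({"post_town": ["LONDON", "CROYDON", "BEXLEYHEATH, LONDON", "LONDON, SIDCUP"]})

# Substring match: keeps every row whose post_town mentions LONDON anywhere
contains = towns[towns["post_town"].str.contains("LONDON")]

# Exact match: keeps only rows whose post_town is exactly LONDON
exact = towns[towns["post_town"] == "LONDON"]

print(len(contains), len(exact))  # 3 1
```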

In [13]:
```python
london.head()
```

| | london_borough | post_town | postcode_district |
|---|---|---|---|
| 0 | Bexley, Greenwich | LONDON | SE2 |
| 1 | Ealing, Hammersmith and Fulham | LONDON | W3, W4 |
| 6 | City | LONDON | EC3 |
| 7 | Westminster | LONDON | WC2 |
| 9 | Bromley | LONDON | SE20 |


Now we will use the ArcGIS API to get the latitude and longitude of our London neighbourhood data.

In [17]:
```python
from arcgis.geocoding import geocode
from arcgis.gis import GIS
gis = GIS()
```

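In [18]:
```python
def get_x_y_uk(address1):
    """Geocode a London postcode district and return (latitude, longitude)."""
    results = geocode(address='{}, London, England, GBR'.format(address1))
    if not results:
        # no match: return NaN so the row can be dropped later
        return np.nan, np.nan
    location = results[0]['location']
    # ArcGIS returns x = longitude, y = latitude
    return location['y'], location['x']
```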

In [19]:
```python
coordinates_latlng_uk = london['postcode_district'].apply(get_x_y_uk)
```

In [20]:
```python
# unpack the (latitude, longitude) tuples into two new columns
london['latitude'] = [float(val[0]) for val in coordinates_latlng_uk]
london['longitude'] = [float(val[1]) for val in coordinates_latlng_uk]
```

In [21]:
```python
london.head()
```

| | london_borough | post_town | postcode_district | latitude | longitude |
|---|---|---|---|---|---|
| 0 | Bexley, Greenwich | LONDON | SE2 | 51.49245 | 0.12127 |
| 1 | Ealing, Hammersmith and Fulham | LONDON | W3, W4 | 51.51324 | -0.26746 |
| 6 | City | LONDON | EC3 | 51.512 | -0.08058 |
| 7 | Westminster | LONDON | WC2 | 51.51651 | -0.11968 |
| 9 | Bromley | LONDON | SE20 | 51.41009 | -0.05683 |


Now that the data is properly formatted, we can visualize the neighbourhoods on a map of London.

### London Map Visualization <a class="anchor" id="lmv"></a>

In [22]:
```python
import folium
```

In [23]:
```python
# geocode the city itself to centre the map
london1 = geocode(address='London, England, GBR')[0]
london_lng_coords = london1['location']['x']
london_lat_coords = london1['location']['y']
london_lng_coords
```

```
-0.1272099999999341
```

In [25]:
```python
# create the base map of London, centred on the geocoded city coordinates
map_London = folium.Map(location=[london_lat_coords, london_lng_coords], zoom_start=12)

# add one circle marker per neighbourhood, labelled "post town, borough"
for latitude, longitude, borough, town in zip(london['latitude'], london['longitude'], london['london_borough'], london['post_town']):
    label = folium.Popup('{}, {}'.format(town, borough), parse_html=True)
    folium.CircleMarker(
        [latitude, longitude],
        radius=5,
        popup=label,
        color='red',
        fill=True,
    ).add_to(map_London)

map_London
```

Now that we have visualized the neighbourhoods, we need to find out what each neighbourhood is like: which venues, and which venue categories, are most common within a 500 m radius.

This is where Foursquare comes into play. Using the Foursquare API, we define a function that collects, for each neighbourhood, its name, its coordinates, and the name and category of each nearby venue.
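The request URL in the function below is assembled by string formatting. This is a minimal sketch of building the same query with the standard library, so that values are percent-encoded automatically. The helper name `build_explore_url` is hypothetical and the credentials are placeholders. The endpoint and parameter names are simply those this notebook's `getNearbyVenues` uses.

```python
from urllib.parse import urlencode

def build_explore_url(client_id, client_secret, version, lat, lng, radius=500, limit=100):
    """Hypothetical helper: build a Foursquare v2 'explore' URL with the notebook's query parameters."""
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "v": version,
        "ll": f"{lat},{lng}",
        "radius": radius,
        "limit": limit,
    }
    # urlencode percent-escapes each value, e.g. the comma in "ll" becomes %2C
    return "https://api.foursquare.com/v2/venues/explore?" + urlencode(params)

url = build_explore_url("YOUR_CLIENT_ID", "YOUR_CLIENT_SECRET", "20201216", 51.49245, 0.12127)
print(url)
```

Passing the same dictionary as `requests.get(base_url, params=params)` has an equivalent effect.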

In [26]:
```python
# Foursquare credentials: replace the placeholders with your own keys,
# and avoid printing or committing real secrets
CLIENT_ID = 'YOUR_CLIENT_ID'
CLIENT_SECRET = 'YOUR_CLIENT_SECRET'
VERSION = '20201216'  # Foursquare API version
```


In [27]:
```python
LIMIT = 100  # maximum number of venues returned per neighbourhood

def getNearbyVenues(names, latitudes, longitudes, radius=500):
    """Query the Foursquare explore endpoint around each point; return one row per venue."""
    venues_list = []
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)

        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, CLIENT_SECRET, VERSION, lat, lng, radius, LIMIT)

        # make the GET request; on failure use an empty result instead of
        # silently reusing the previous neighbourhood's venues
        try:
            results = requests.get(url).json()['response']['groups'][0]['items']
        except (requests.RequestException, KeyError, IndexError, ValueError):
            results = []

        # keep only the relevant information for each nearby venue
        venues_list.append([(
            name,
            lat,
            lng,
            v['venue']['name'],
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighbourhood',
                             'Neighbourhood Latitude',
                             'Neighbourhood Longitude',
                             'Venue',
                             'Venue Category']

    return nearby_venues
```
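The list comprehension in `getNearbyVenues` assumes every item has a `venue` with at least one category. A venue with an empty `categories` list would raise an `IndexError`. Below is a minimal sketch of a more tolerant extraction, with the hypothetical helper name `extract_venues`. It runs on a hand-written sample payload whose nesting mirrors the keys the notebook indexes into; the second venue is made up.

```python
# Hand-written, heavily trimmed payload in the shape getNearbyVenues indexes into:
# response -> groups[0] -> items[] -> venue -> name / categories[0].name
sample = {
    "response": {
        "groups": [{
            "items": [
                {"venue": {"name": "Lesnes Abbey", "categories": [{"name": "Historic Site"}]}},
                {"venue": {"name": "Unnamed kiosk", "categories": []}},
            ]
        }]
    }
}

def extract_venues(payload):
    """Hypothetical helper: return (venue name, first category name) pairs, tolerating missing keys."""
    groups = payload.get("response", {}).get("groups", [])
    items = groups[0].get("items", []) if groups else []
    rows = []
    for item in items:
        venue = item.get("venue", {})
        categories = venue.get("categories", [])
        # venues without a category get None instead of raising IndexError
        category = categories[0]["name"] if categories else None
        rows.append((venue.get("name"), category))
    return rows

print(extract_venues(sample))
# [('Lesnes Abbey', 'Historic Site'), ('Unnamed kiosk', None)]
```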

In [28]:
```python
london_venues = getNearbyVenues(london['london_borough'], london['latitude'], london['longitude'])
```

```
Bexley, Greenwich 
Ealing, Hammersmith and Fulham
City
Westminster
Bromley
Islington
Islington
Barnet
Enfield
Wandsworth
Southwark
City
Richmond upon Thames
Barnet
Islington
Wandsworth
Westminster
Bromley
Newham
Ealing
Westminster
Lewisham
Camden
Southwark
Tower Hamlets
Bexley
City
Lewisham
Greenwich
Tower Hamlets
Camden
Haringey
Tower Hamlets
Haringey
Barnet
Brent
Lambeth
Lewisham
Tower Hamlets
Kensington and ChelseaHammersmith and Fulham
Brent
Barnet
Barnet
Southwark
Tower Hamlets
Camden
Tower Hamlets
Waltham Forest
Newham
Islington
Richmond upon Thames
Lewisham
Camden
Westminster
Greenwich
Kensington and Chelsea
Barnet
Westminster
Lewisham
Waltham Forest
Hounslow, Ealing, Hammersmith and Fulham
Brent
Barnet
Lambeth, Wandsworth
Islington
Barnet
Merton
Barnet
Westminster
Barnet, Brent, Camden
Lewisham
Bexley
Haringey
Bromley
Tower Hamlets
Newham
Hackney
Islington
Southwark
Lewisham
Brent
Southwark
Ealing
Kensington and Chelsea
Wandsworth
Southwark
Barnet
Newham
Richmond upon Thames
En
```

The first rows of the `london_venues` dataset look like this:

In [29]:
```python
# glimpse of the data set with venues and venue category included
london_venues.head()
```

| | Neighbourhood | Neighbourhood Latitude | Neighbourhood Longitude | Venue | Venue Category |
|---|---|---|---|---|---|
| 0 | Bexley, Greenwich | 51.49245 | 0.12127 | Lesnes Abbey | Historic Site |
| 1 | Bexley, Greenwich | 51.49245 | 0.12127 | Sainsbury's | Supermarket |
| 2 | Bexley, Greenwich | 51.49245 | 0.12127 | Lidl | Supermarket |
| 3 | Bexley, Greenwich | 51.49245 | 0.12127 | Abbey Wood Railway Station (ABW) | Train Station |
| 4 | Bexley, Greenwich | 51.49245 | 0.12127 | Bean @ Work | Coffee Shop |


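We now one-hot encode the `Venue Category` column, turning each category into a 0/1 indicator column. Averaging these indicators per neighbourhood gives the share of each venue category in that neighbourhood. That share lets us rank the 10 most common venue categories and, later, cluster the neighbourhoods on these features.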
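Here is a toy illustration of this encoding; the neighbourhoods `A` and `B` and their venues are made up. After one-hot encoding, the per-neighbourhood mean of each indicator column is the fraction of that neighbourhood's venues in the category.

```python
import pandas as pd

# Made-up venue rows: two neighbourhoods, a handful of categories
venues = pd.DataFrame({
    "Neighbourhood": ["A", "A", "A", "A", "B", "B"],
    "Venue Category": ["Pub", "Pub", "Café", "Park", "Park", "Café"],
})

# One column per category, 1 where the row's venue belongs to that category
onehot = pd.get_dummies(venues["Venue Category"], dtype=int)
onehot.insert(0, "Neighbourhood", venues["Neighbourhood"])

# Mean of the indicator columns = share of each category within the neighbourhood
freq = onehot.groupby("Neighbourhood").mean()
print(freq)
```

Each row of `freq` sums to 1. For `A`, half of the venues are pubs and a quarter each are cafés and parks.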

### One-Hot Encoding <a class="anchor" id="ohe"></a>

In [30]:
```python
# One-hot encode the venue categories: one 0/1 column per category
venue_cat = pd.get_dummies(london_venues[['Venue Category']], prefix="", prefix_sep="")

# Add the neighbourhood name back to each row
venue_cat['Neighbourhood'] = london_venues['Neighbourhood']

# Move the neighbourhood column to the front
fixed_columns = [venue_cat.columns[-1]] + list(venue_cat.columns[:-1])
venue_cat = venue_cat[fixed_columns]

# Group by neighbourhood; the mean of each indicator is that category's share of venues
london_grouped = venue_cat.groupby('Neighbourhood').mean().reset_index()
```

### Top 10 Venues in the neighbourhood <a class="anchor" id="t"></a>

In [31]:
```python
def return_most_common_venues(row, num_top_venues):
    # skip the neighbourhood name, then sort category frequencies in descending order
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)

    return row_categories_sorted.index.values[0:num_top_venues]
```
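To see what `return_most_common_venues` does, here is a sketch of the same logic on a single made-up row, using the hypothetical helper name `top_venues`. Ties, such as the two 0.25 values below, can come back in either order, so only the first-ranked category is certain here. The same mechanism plausibly explains why some rows in the top-10 table end with categories like Farmers Market or Filipino Restaurant. Once a neighbourhood's non-zero categories run out, the remaining slots are filled with zero-frequency categories.

```python
import pandas as pd

# One made-up row in london_grouped style: name first, then category frequencies
row = pd.Series({"Neighbourhood": "A", "Café": 0.25, "Park": 0.25, "Pub": 0.5})

def top_venues(row, n):
    """Hypothetical helper: names of the n most frequent categories in the row."""
    # skip the name column, sort frequencies descending, keep the n largest category names
    return row.iloc[1:].astype(float).sort_values(ascending=False).index[:n].tolist()

print(top_venues(row, 1))  # ['Pub']
```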

In [32]:
```python
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# build column names: 1st, 2nd, 3rd, then 4th to 10th Most Common Venue
columns = ['Neighbourhood']
for ind in np.arange(num_top_venues):
    suffix = indicators[ind] if ind < len(indicators) else 'th'
    columns.append('{}{} Most Common Venue'.format(ind + 1, suffix))
```

In [36]:
```python
# create a new dataframe for London
neighborhoods_venues_sorted_london = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted_london['Neighbourhood'] = london_grouped['Neighbourhood']

for ind in np.arange(london_grouped.shape[0]):
    neighborhoods_venues_sorted_london.iloc[ind, 1:] = return_most_common_venues(london_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted_london.head()
```

| | Neighbourhood | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue | 8th Most Common Venue | 9th Most Common Venue | 10th Most Common Venue |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Barnet | Coffee Shop | Café | Grocery Store | Italian Restaurant | Pub | Bus Stop | Supermarket | Sushi Restaurant | Pharmacy | Turkish Restaurant |
| 1 | Barnet, Brent, Camden | Bus Station | Bakery | Gym / Fitness Center | Clothing Store | Supermarket | Zoo Exhibit | Fish & Chips Shop | Farmers Market | Fast Food Restaurant | Filipino Restaurant |
| 2 | Bexley | Supermarket | Historic Site | Coffee Shop | Convenience Store | Train Station | Park | Bus Stop | Construction & Landscaping | Golf Course | Exhibit |
| 3 | Bexley, Greenwich | Sports Club | Convenience Store | Golf Course | Park | Construction & Landscaping | Historic Site | Bus Stop | Food & Drink Shop | Food Court | Flower Shop |
| 4 | Bexley, Greenwich | Supermarket | Historic Site | Train Station | Convenience Store | Coffee Shop | Fish Market | Farmers Market | Fast Food Restaurant | Filipino Restaurant | Film Studio |


### Modelling <a class="anchor" id="md"></a>

We use the k-means clustering algorithm to group similar neighbourhoods together, with the number of clusters set to 5.
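Here is a minimal sketch of the clustering step on made-up frequency vectors, assuming scikit-learn is installed. The vectors form two clearly separated groups, so the sketch uses `n_clusters=2`. `n_init` is set explicitly because its default has changed across scikit-learn versions.

```python
import numpy as np
from sklearn.cluster import KMeans

# Made-up category-frequency vectors (rows = neighbourhoods, columns = categories)
X = np.array([
    [0.6, 0.3, 0.1],   # pub-heavy
    [0.5, 0.4, 0.1],   # pub-heavy
    [0.0, 0.1, 0.9],   # park-heavy
    [0.1, 0.0, 0.9],   # park-heavy
])

km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)

# labels_ are 0-based; the notebook adds 1 so that clusters read 1..k
labels_one_based = km.labels_ + 1
print(labels_one_based)
```

Which group ends up numbered 1 is arbitrary. Only the grouping itself (rows 0 and 1 together, rows 2 and 3 together) is meaningful.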

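In [37]:
```python
# set number of clusters
k_num_clusters = 5

# drop the name column so that only the category frequencies are clustered
# (the positional axis argument drop('Neighbourhood', 1) no longer works in pandas 2.x)
London_grouped_clustering = london_grouped.drop(columns='Neighbourhood')

# run k-means clustering
kmeans_london = KMeans(n_clusters=k_num_clusters, random_state=0).fit(London_grouped_clustering)
kmeans_london
```

```
KMeans(n_clusters=5, random_state=0)
```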

In [38]:
```python
kmeans_london.labels_
```

```
array([0, 4, 3, 1, 3, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0,
       0, 0, 0, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0])
```

In [39]:
```python
# shift the 0-based k-means labels so that clusters are numbered 1 to 5
neighborhoods_venues_sorted_london.insert(0, 'Cluster Labels', kmeans_london.labels_ + 1)
```

In [41]:
```python
# attach each borough's cluster label and top venues to the geocoded table
london_data = london
london_data = london_data.join(neighborhoods_venues_sorted_london.set_index('Neighbourhood'), on='london_borough')
london_data.head()
```

| | london_borough | post_town | postcode_district | latitude | longitude | Cluster Labels | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue | 8th Most Common Venue | 9th Most Common Venue | 10th Most Common Venue |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Bexley, Greenwich | LONDON | SE2 | 51.49245 | 0.12127 | 4 | Supermarket | Historic Site | Train Station | Convenience Store | Coffee Shop | Fish Market | Farmers Market | Fast Food Restaurant | Filipino Restaurant | Film Studio |
| 1 | Ealing, Hammersmith and Fulham | LONDON | W3, W4 | 51.51324 | -0.26746 | 2 | Grocery Store | Indian Restaurant | Park | Breakfast Spot | Train Station | Fish & Chips Shop | Falafel Restaurant | Farmers Market | Fast Food Restaurant | Filipino Restaurant |
| 6 | City | LONDON | EC3 | 51.512 | -0.08058 | 1 | Coffee Shop | Hotel | Italian Restaurant | Gym / Fitness Center | Pub | Restaurant | French Restaurant | Sandwich Place | Cocktail Bar | Wine Bar |
| 7 | Westminster | LONDON | WC2 | 51.51651 | -0.11968 | 1 | Hotel | Coffee Shop | Café | Sandwich Place | Pub | Italian Restaurant | Theater | Restaurant | Burger Joint | Bakery |
| 9 | Bromley | LONDON | SE20 | 51.41009 | -0.05683 | 1 | Supermarket | Grocery Store | Convenience Store | Fast Food Restaurant | Hotel | Park | Historic Site | Gym / Fitness Center | Italian Restaurant | Golf Course |


In [42]:
```python
# drop rows without a cluster label (e.g. boroughs with no returned venues) before mapping
london_data_nonan = london_data.dropna(subset=['Cluster Labels'])
```

In [44]:
```python
map_clusters_london = folium.Map(location=[london_lat_coords, london_lng_coords], zoom_start=12)

# one colour per cluster, spread evenly across the rainbow colormap
colors_array = cm.rainbow(np.linspace(0, 1, k_num_clusters))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map; 'Cluster Labels' is already 1-based,
# so show it as-is and index the palette with cluster - 1
for lat, lon, poi, cluster in zip(london_data_nonan['latitude'], london_data_nonan['longitude'], london_data_nonan['london_borough'], london_data_nonan['Cluster Labels']):
    label = folium.Popup('Cluster ' + str(int(cluster)) + '\n' + str(poi), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[int(cluster) - 1],
        fill=True,
        fill_color=rainbow[int(cluster) - 1],
    ).add_to(map_clusters_london)

map_clusters_london
```

#### Analysing Clusters

We can examine each cluster by filtering on the `Cluster Labels` column:
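The cells below select columns by position: `[1] + list(range(5, ...))` picks `post_town` and everything from `Cluster Labels` onwards. Here is a sketch of the same selection by name, which keeps working if columns are added or reordered before `Cluster Labels`. It uses a trimmed, made-up frame with the same column order and the hypothetical helper name `cluster_table`.

```python
import pandas as pd

# Made-up, trimmed frame with the same column order as london_data_nonan
df = pd.DataFrame({
    "london_borough": ["City", "Westminster", "Bexley"],
    "post_town": ["LONDON", "LONDON", "LONDON, SIDCUP"],
    "postcode_district": ["EC3", "WC2", "DA14"],
    "latitude": [51.5, 51.5, 51.4],
    "longitude": [-0.08, -0.12, 0.1],
    "Cluster Labels": [1, 1, 4],
    "1st Most Common Venue": ["Coffee Shop", "Hotel", "Supermarket"],
})

def cluster_table(frame, label):
    """Hypothetical helper: rows of one cluster, with post_town plus every column from 'Cluster Labels' on."""
    start = frame.columns.get_loc("Cluster Labels")
    cols = ["post_town"] + list(frame.columns[start:])
    return frame.loc[frame["Cluster Labels"] == label, cols]

print(cluster_table(df, 1))
```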

##### Cluster 1

In [45]:
```python
london_data_nonan.loc[london_data_nonan['Cluster Labels'] == 1, london_data_nonan.columns[[1] +
                                        list(range(5, london_data_nonan.shape[1]))]]
```

| | post_town | Cluster Labels | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue | 8th Most Common Venue | 9th Most Common Venue | 10th Most Common Venue |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 6 | LONDON | 1 | Coffee Shop | Hotel | Italian Restaurant | Gym / Fitness Center | Pub | Restaurant | French Restaurant | Sandwich Place | Cocktail Bar | Wine Bar |
| 7 | LONDON | 1 | Hotel | Coffee Shop | Café | Sandwich Place | Pub | Italian Restaurant | Theater | Restaurant | Burger Joint | Bakery |
| 9 | LONDON | 1 | Supermarket | Grocery Store | Convenience Store | Fast Food Restaurant | Hotel | Park | Historic Site | Gym / Fitness Center | Italian Restaurant | Golf Course |
| 10 | LONDON | 1 | Coffee Shop | Pub | Café | Food Truck | Vietnamese Restaurant | Gym / Fitness Center | Italian Restaurant | Park | Cocktail Bar | Breakfast Spot |
| 12 | LONDON | 1 | Coffee Shop | Pub | Café | Food Truck | Vietnamese Restaurant | Gym / Fitness Center | Italian Restaurant | Park | Cocktail Bar | Breakfast Spot |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 522 | LONDON | 1 | Café | Pub | Grocery Store | Coffee Shop | Bakery | Bar | Metro Station | BBQ Joint | Seafood Restaurant | Park |
| 523 | LONDON, WOODFORD GREEN | 1 | Hotel | Café | Pub | Theater | Garden | Plaza | Monument / Landmark | Restaurant | Coffee Shop | Art Gallery |
| 526 | LONDON | 1 | Coffee Shop | Café | Grocery Store | Italian Restaurant | Pub | Bus Stop | Supermarket | Sushi Restaurant | Pharmacy | Turkish Restaurant |
| 527 | LONDON | 1 | Pub | Grocery Store | Bus Stop | Coffee Shop | Indian Restaurant | Historic Site | Park | Café | Clothing Store | Construction & Landscaping |


##### Cluster 2

In [46]:
```python
london_data_nonan.loc[london_data_nonan['Cluster Labels'] == 2, london_data_nonan.columns[[1] +
                                                    list(range(5, london_data_nonan.shape[1]))]]
```

| | post_town | Cluster Labels | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue | 8th Most Common Venue | 9th Most Common Venue | 10th Most Common Venue |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | LONDON | 2 | Grocery Store | Indian Restaurant | Park | Breakfast Spot | Train Station | Fish & Chips Shop | Falafel Restaurant | Farmers Market | Fast Food Restaurant | Filipino Restaurant |
| 167 | LONDON, WELLING | 2 | Sports Club | Convenience Store | Golf Course | Park | Construction & Landscaping | Historic Site | Bus Stop | Food & Drink Shop | Food Court | Flower Shop |
| 458 | LONDON, ERITH | 2 | Sports Club | Convenience Store | Golf Course | Park | Construction & Landscaping | Historic Site | Bus Stop | Food & Drink Shop | Food Court | Flower Shop |


##### Cluster 3

In [47]:
```python
london_data_nonan.loc[london_data_nonan['Cluster Labels'] == 3, london_data_nonan.columns[[1] +
                                                list(range(5, london_data_nonan.shape[1]))]]
```

| | post_town | Cluster Labels | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue | 8th Most Common Venue | 9th Most Common Venue | 10th Most Common Venue |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 378 | HARROW, STANMOREEDGWARE, LONDON | 3 | Gym | Metro Station | Bakery | Food Truck | Food Stand | Food Court | Food & Drink Shop | Flower Shop | Flea Market | Exhibit |


##### Cluster 4

In [48]:
```python
london_data_nonan.loc[london_data_nonan['Cluster Labels'] == 4, london_data_nonan.columns[[1] +
                                                list(range(5, london_data_nonan.shape[1]))]]
```

| | post_town | Cluster Labels | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue | 8th Most Common Venue | 9th Most Common Venue | 10th Most Common Venue |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | LONDON | 4 | Supermarket | Historic Site | Train Station | Convenience Store | Coffee Shop | Fish Market | Farmers Market | Fast Food Restaurant | Filipino Restaurant | Film Studio |
| 45 | BEXLEYHEATH, LONDON | 4 | Supermarket | Historic Site | Coffee Shop | Convenience Store | Train Station | Park | Bus Stop | Construction & Landscaping | Golf Course | Exhibit |
| 124 | LONDON | 4 | Supermarket | Historic Site | Coffee Shop | Convenience Store | Train Station | Park | Bus Stop | Construction & Landscaping | Golf Course | Exhibit |
| 291 | LONDON, SIDCUP | 4 | Supermarket | Historic Site | Coffee Shop | Convenience Store | Train Station | Park | Bus Stop | Construction & Landscaping | Golf Course | Exhibit |
| 506 | LONDON | 4 | Supermarket | Historic Site | Coffee Shop | Convenience Store | Train Station | Park | Bus Stop | Construction & Landscaping | Golf Course | Exhibit |


##### Cluster 5

In [49]:
```python
london_data_nonan.loc[london_data_nonan['Cluster Labels'] == 5, london_data_nonan.columns[[1] +
                                                    list(range(5, london_data_nonan.shape[1]))]]
```

| | post_town | Cluster Labels | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue | 8th Most Common Venue | 9th Most Common Venue | 10th Most Common Venue |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 121 | LONDON | 5 | Bus Station | Bakery | Gym / Fitness Center | Clothing Store | Supermarket | Zoo Exhibit | Fish & Chips Shop | Farmers Market | Fast Food Restaurant | Filipino Restaurant |


# Results and Discussion <a class="anchor" id="re"></a>

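The venue data suggests that London's neighbourhoods are very multicultural. The most common venues include many different cuisines, such as Indian, Italian, Turkish and Chinese, along with restaurants in general, bars, juice bars, coffee shops, fish and chips shops and breakfast spots. There are plenty of shopping options too, including flea markets, flower shops, fish markets, fishing stores and clothing stores. The transport venues that appear most often are bus stops, bus stations, train stations and metro stations. For leisure, the neighbourhoods offer many parks, golf courses, gyms and historic sites, and even a zoo.

The clustering is highly uneven: 44 of the 50 grouped neighbourhoods fall into Cluster 1, which is dominated by coffee shops, cafés, pubs, hotels and restaurants. The smaller clusters are characterized by sports clubs, golf courses and parks (Cluster 2), a gym and a metro station (Cluster 3), supermarkets and historic sites (Cluster 4), and a bus station (Cluster 5).

Overall, the city of London appears to offer a multicultural, diverse and entertaining experience.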

# Conclusion <a class="anchor" id="co"></a>

The purpose of this project was to explore the city of London and find out what makes it one of the most popular destinations for tourists and migrants. We explored the city's neighbourhoods based on their postcode districts, identified the most common venues in each neighbourhood, and finally clustered similar neighbourhoods together.

We found that the neighbourhoods of London offer a wide variety of experiences, each in its own way. The cultural diversity is evident, and it also conveys a sense of inclusion.