## **Introduction**

London is a popular tourist and vacation destination for people from all around the world. It is diverse and multicultural and offers a wide variety of experiences that are widely sought after. We group the neighbourhoods of London and draw insights into what they look like today.

## **Business Problem**

The aim is to help tourists choose their destinations based on the experiences that each neighbourhood offers and what they are looking for. The same analysis helps people who are thinking about migrating to London, or relocating between neighbourhoods within the city. Our findings will help stakeholders make informed decisions and address their concerns, including the kinds of cuisines, provision stores and other amenities the city has to offer.

## **Data Description**

We require geolocation data for London. The city's postal codes serve as a starting point. Using the postal codes, we can find the neighbourhoods, boroughs, venues and their most popular venue categories.


## **London**

To derive our solution, we scrape our data from https://en.wikipedia.org/wiki/List_of_areas_of_London

This Wikipedia page has information about all the neighbourhoods; we limit it to London.

1. borough : Name of Neighbourhood
2. town : Name of borough
3. post_code : Postal codes for London

This Wikipedia page lacks information about geographical locations. To solve this problem, we use the ArcGIS API.

## **ArcGIS API**

ArcGIS Online enables you to connect people, locations, and data using interactive maps. Work with smart, data-driven styles and intuitive analysis tools that deliver location intelligence. Share your insights with the world or specific groups.

More specifically, we use ArcGIS to get the geolocations of the neighbourhoods of London. The following columns are added to our initial dataset to prepare our data:

1. latitude : Latitude for Neighbourhood
2. longitude : Longitude for Neighbourhood
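A minimal sketch of how the two coordinate columns could be added is shown here. It assumes the dataset is a pandas DataFrame with a `borough` column and that the `arcgis` package is installed. The helper names `add_coordinates` and `make_arcgis_geocoder`, and the address suffix passed to the geocoder, are illustrative assumptions rather than the exact code used in this project.

```python
import pandas as pd


def add_coordinates(df, geocode_fn, address_col="borough"):
    """Return a copy of df with 'latitude' and 'longitude' columns added.

    geocode_fn takes an address string and returns (latitude, longitude).
    """
    coords = [geocode_fn(address) for address in df[address_col]]
    out = df.copy()
    out["latitude"] = [lat for lat, _ in coords]
    out["longitude"] = [lon for _, lon in coords]
    return out


def make_arcgis_geocoder():
    """Build a lookup function backed by the ArcGIS geocoding service."""
    # Imported lazily so add_coordinates can be used without the arcgis package.
    from arcgis.gis import GIS
    from arcgis.geocoding import geocode

    GIS()  # anonymous connection; assumed to be enough for simple lookups

    def lookup(address):
        # Assumption: appending the city and country narrows the search to London.
        best = geocode(f"{address}, London, England, GBR")[0]
        # ArcGIS reports x as longitude and y as latitude.
        return best["location"]["y"], best["location"]["x"]

    return lookup


# Usage (needs network access):
# london = add_coordinates(london, make_arcgis_geocoder())
```

Passing the geocoder in as a function keeps the DataFrame logic separate from the network call, so the column handling can be checked with a stand-in lookup before any requests are sent.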

Based on all the information collected for London, we have sufficient data to build our model. We cluster the neighbourhoods based on similar venue categories. We then present our observations and findings. Using this data, our stakeholders can make the necessary decisions.
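To make the clustering idea concrete, here is a small sketch under stated assumptions. The venue list is toy data invented for illustration, the column names `neighbourhood` and `venue_category` are hypothetical, and the number of clusters is arbitrary. Each neighbourhood is described by the share of its venues in each category, and k-means groups neighbourhoods with similar shares.

```python
import pandas as pd
from sklearn.cluster import KMeans

# Toy venue list, invented purely for illustration.
venues = pd.DataFrame({
    "neighbourhood": ["A", "A", "B", "B", "C", "C"],
    "venue_category": ["Pub", "Cafe", "Pub", "Cafe", "Park", "Museum"],
})


def cluster_neighbourhoods(venues, n_clusters, random_state=0):
    """Cluster neighbourhoods by the share of venues in each category."""
    # One row per neighbourhood, one column per category; each row sums to 1.
    profile = pd.crosstab(
        venues["neighbourhood"], venues["venue_category"], normalize="index"
    )
    model = KMeans(n_clusters=n_clusters, n_init=10, random_state=random_state)
    labels = model.fit_predict(profile)
    return pd.Series(labels, index=profile.index, name="cluster")


labels = cluster_neighbourhoods(venues, n_clusters=2)
print(labels)
```

In this toy example, neighbourhoods A and B have identical venue profiles and therefore land in the same cluster, while C is placed on its own.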

## **Methodology**

We will create our model in Python, so we start by importing all the required packages.

In [None]:
import pandas as pd
import requests
import numpy as np
import matplotlib.cm as cm
import matplotlib.colors as colors
import folium

# import k-means for the clustering stage
from sklearn.cluster import KMeans

Package breakdown:

1. pandas : To read the HTML tables into DataFrames and to manipulate and analyse the data
2. requests : To handle HTTP requests
3. matplotlib : To style the generated maps
4. folium : To generate maps of London
5. sklearn : To import KMeans, the machine learning model that we are using

The approach taken here is to explore the city and plot a map showing the neighbourhoods being considered. We then build our model by clustering similar neighbourhoods together, and plot a new map with the clustered neighbourhoods. Finally, we draw insights and discuss our findings.
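The final mapping step can be sketched as follows. This is an illustration only: the helper names `cluster_colours` and `plot_clusters`, the map centre (an approximate coordinate for central London), the zoom level and the marker styling are all assumptions. It shows one common way to give each cluster its own colour using the `matplotlib.cm` and `matplotlib.colors` modules imported above.

```python
import numpy as np
import matplotlib.cm as cm
import matplotlib.colors as colors


def cluster_colours(n_clusters):
    """Return one hex colour per cluster, spread evenly over the rainbow colour map."""
    rgba = cm.rainbow(np.linspace(0, 1, n_clusters))
    return [colors.rgb2hex(c) for c in rgba]


def plot_clusters(rows, n_clusters, centre=(51.5074, -0.1278), zoom_start=11):
    """Draw one circle marker per (name, latitude, longitude, cluster) row."""
    import folium  # imported lazily; only needed when a map is actually drawn

    palette = cluster_colours(n_clusters)
    london_map = folium.Map(location=list(centre), zoom_start=zoom_start)
    for name, lat, lon, cluster in rows:
        folium.CircleMarker(
            location=[lat, lon],
            radius=5,
            popup=f"{name} (cluster {cluster})",
            color=palette[cluster],
            fill=True,
            fill_color=palette[cluster],
            fill_opacity=0.7,
        ).add_to(london_map)
    return london_map
```

`plot_clusters` expects the cluster labels to be integers from 0 to `n_clusters - 1`, which is what `KMeans` produces.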

## **Exploring London**

## **Neighbourhoods of London**

We begin by collecting and refining the data needed for our business solution.

### **Data Collection**

To get the neighbourhoods in London, we start by scraping the Wikipedia page that lists the areas of London.

In [None]:
url_london = "https://en.wikipedia.org/wiki/List_of_areas_of_London"
wiki_london_url = requests.get(url_london)
wiki_london_url

<Response [200]>
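A `200` response means the page was fetched successfully. If the notebook is re-run later, it may be worth failing loudly when the request does not succeed. The sketch below shows one way to do that; the helper name `fetch_html`, the timeout value and the User-Agent string are illustrative assumptions, not part of the original notebook.

```python
import requests


def fetch_html(url, get=requests.get, timeout=30):
    """Return the page body as text, raising if the request did not succeed."""
    # Assumption: a descriptive User-Agent; some sites reject anonymous clients.
    headers = {"User-Agent": "london-neighbourhoods-notebook (educational use)"}
    response = get(url, headers=headers, timeout=timeout)
    response.raise_for_status()  # raises requests.HTTPError on 4xx/5xx responses
    return response.text


# Usage (needs network access):
# html = fetch_html("https://en.wikipedia.org/wiki/List_of_areas_of_London")
```

The `get` parameter exists only so that the function can be exercised with a stand-in during testing; in normal use it defaults to `requests.get`.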

In [None]:
from io import StringIO  # recent pandas versions deprecate passing raw HTML strings
wiki_london_data = pd.read_html(StringIO(wiki_london_url.text))
wiki_london_data

[                                                   0
 0  Map all coordinates in "Category:Areas of Lond...
 1                       Download coordinates as: KML,
             Location                     London borough  ... Dial code OS grid ref
 0         Abbey Wood              Bexley, Greenwich [7]  ...       020    TQ465785
 1              Acton  Ealing, Hammersmith and Fulham[8]  ...       020    TQ205805
 2          Addington                         Croydon[8]  ...       020    TQ375645
 3         Addiscombe                         Croydon[8]  ...       020    TQ345665
 4        Albany Park                             Bexley  ...       020    TQ478728
 ..               ...                                ...  ...       ...         ...
 526         Woolwich                          Greenwich  ...       020    TQ435795
 527   Worcester Park       Sutton, Kingston upon Thames  ...       020    TQ225655
 528  Wormwood Scrubs             Hammersmith and Fulham  ...       020    TQ2258

Scraping the web page gives us all the tables present on the page. We need the second table, which sits at index 1 of the returned list.

In [None]:
wiki_london_data = wiki_london_data[1]
wiki_london_data

| | Location | London borough | Post town | Postcode district | Dial code | OS grid ref |
|---|---|---|---|---|---|---|
| 0 | Abbey Wood | Bexley, Greenwich [7] | LONDON | SE2 | 020 | TQ465785 |
| 1 | Acton | Ealing, Hammersmith and Fulham[8] | LONDON | W3, W4 | 020 | TQ205805 |
| 2 | Addington | Croydon[8] | CROYDON | CR0 | 020 | TQ375645 |
| 3 | Addiscombe | Croydon[8] | CROYDON | CR0 | 020 | TQ345665 |
| 4 | Albany Park | Bexley | BEXLEY, SIDCUP | DA5, DA14 | 020 | TQ478728 |
| ... | ... | ... | ... | ... | ... | ... |
| 526 | Woolwich | Greenwich | LONDON | SE18 | 020 | TQ435795 |
| 527 | Worcester Park | Sutton, Kingston upon Thames | WORCESTER PARK | KT4 | 020 | TQ225655 |
| 528 | Wormwood Scrubs | Hammersmith and Fulham | LONDON | W12 | 020 | TQ225815 |
| 529 | Yeading | Hillingdon | HAYES | UB4 | 020 | TQ115825 |
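Selecting the table by position works for the page as scraped here, but the index could shift if the page layout changes. As a hedged alternative, the sketch below picks the first table that contains the expected column names. The helper name `find_table` is hypothetical, and the column names are taken from the output above.

```python
import pandas as pd


def find_table(tables, required_columns):
    """Return the first table whose columns include every required column."""
    required = set(required_columns)
    for table in tables:
        if required.issubset(table.columns):
            return table
    raise ValueError(f"No table has all of the columns: {sorted(required)}")


# Usage with the list returned by pd.read_html:
# wiki_london_data = find_table(wiki_london_data, ["Location", "Postcode district"])
```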


This is the dataset that we will be working with.
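The scraped table still carries Wikipedia footnote markers such as `[7]` and `[8]` in the `London borough` column. One possible way to strip them is sketched below; the helper name `strip_footnotes` and the choice of regular expression are assumptions for illustration.

```python
import pandas as pd


def strip_footnotes(df, columns):
    """Return a copy of df with markers like '[7]' removed from the given columns."""
    out = df.copy()
    for col in columns:
        # Remove a bracketed number together with any whitespace in front of it.
        out[col] = out[col].str.replace(r"\s*\[\d+\]", "", regex=True).str.strip()
    return out


# Usage:
# wiki_london_data = strip_footnotes(wiki_london_data, ["London borough"])
```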