### Data Section
  
The data used to identify the neighbourhoods in London can be scraped from the Wikipedia page [List of London boroughs](https://en.wikipedia.org/wiki/List_of_London_boroughs) using the BeautifulSoup library. Among other columns, the table includes:
1. Borough	
2. Local authority	
3. Political control
4. Area (sq mi)	
5. Population (2013 est)
6. Co-ordinates
  
For the analysis, only the boroughs and their coordinates are needed. These coordinates will later be fed into the Foursquare API to list the businesses within a given radius of each borough and to retrieve details on how many customers visit them. The frequency of visits will then be calculated, and the boroughs will be clustered with a clustering algorithm to identify those with high demand for certain businesses.
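The two steps described above can be sketched as follows. This assumes the legacy Foursquare v2 `venues/explore` endpoint (with its `client_id`, `client_secret`, `v`, `ll`, `radius` and `limit` parameters) and measures "frequency" as the share of each venue category per borough, which is one common choice rather than necessarily the one used in this project. The credentials, version date, radius and venue rows are hypothetical placeholders, and no request is actually sent.

```python
from urllib.parse import urlencode

import pandas as pd

# Hypothetical placeholders: replace with real credentials and settings
CLIENT_ID = "YOUR_CLIENT_ID"
CLIENT_SECRET = "YOUR_CLIENT_SECRET"
VERSION = "20180605"  # assumed Foursquare v2 API version date (YYYYMMDD)


def explore_url(lat, lng, radius=500, limit=100):
    """Build a Foursquare v2 venues/explore URL centred on one borough."""
    params = {
        "client_id": CLIENT_ID,
        "client_secret": CLIENT_SECRET,
        "v": VERSION,
        "ll": f"{lat},{lng}",  # latitude,longitude of the borough centre
        "radius": radius,      # search radius in metres
        "limit": limit,        # maximum number of venues returned
    }
    return "https://api.foursquare.com/v2/venues/explore?" + urlencode(params)


def category_frequencies(venues):
    """Share of each venue category per neighbourhood (each row sums to 1)."""
    onehot = pd.get_dummies(venues["Venue Category"], dtype=float)
    onehot["Neighbourhood"] = venues["Neighbourhood"]
    return onehot.groupby("Neighbourhood").mean()


# Hypothetical venues standing in for parsed API responses
venues = pd.DataFrame({
    "Neighbourhood": ["Barnet", "Barnet", "Barnet", "Brent"],
    "Venue Category": ["Coffee Shop", "Pub", "Coffee Shop", "Pub"],
})

print(explore_url(51.6252, -0.1517))
print(category_frequencies(venues))
```

The resulting frequency table (one row per borough, one column per category) is the kind of input a clustering algorithm such as k-means, for example `sklearn.cluster.KMeans`, would be fitted on.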

In [1]:
```python
from bs4 import BeautifulSoup
import requests
import numpy as np
import pandas as pd
```

In [2]:
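```python
# Download the page and parse it (the 'lxml' parser must be installed)
source = requests.get('https://en.wikipedia.org/wiki/List_of_London_boroughs').text
soup = BeautifulSoup(source, 'lxml')
# The boroughs are listed in the first table with class "wikitable sortable"
table = soup.find('table', class_='wikitable sortable')
```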

In [3]:
```python
rows = table.find_all('tr')  # all rows of the table, header included
columns = rows[0].find_all('th')
columns = [column.get_text(strip=True) for column in columns]
columns
```

```
['Borough',
 'Inner',
 'Status',
 'Local authority',
 'Political control',
 'Headquarters',
 'Area (sq mi)',
 'Population (2013 est)[1]',
 'Co-ordinates',
 'Nr. in map']
```

In [4]:
```python
# Collect the cell text of every data row (rows[0] is the header and has no <td> cells)
data = []
for row in rows[1:]:
    cells = [cell.get_text(strip=True) for cell in row.find_all('td')]
    data.append(cells)

# Columns are numbered 0..n-1 for now; they are renamed once the coordinates are extracted
table_pd = pd.DataFrame(data)
```
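The scraping steps in cells 2 to 4 can be tried offline on a small inline HTML table. This is a sketch, assuming the same `wikitable sortable` class and a header row made of `th` cells; the two-row table is a hypothetical stand-in for the Wikipedia page, and the hypothetical helper `table_to_frame` uses Python's built-in `html.parser` instead of `lxml` so no extra parser is required.

```python
from bs4 import BeautifulSoup
import pandas as pd

# Hypothetical miniature of the Wikipedia table (not the real page)
HTML = """
<table class="wikitable sortable">
  <tr><th>Borough</th><th>Area (sq mi)</th></tr>
  <tr><td>Barnet</td><td>33.49</td></tr>
  <tr><td>Brent</td><td>16.7</td></tr>
</table>
"""


def table_to_frame(html):
    """Parse the first 'wikitable sortable' table into a DataFrame of strings."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table", class_="wikitable sortable")
    rows = table.find_all("tr")
    # Header names come from the <th> cells of the first row
    header = [th.get_text(strip=True) for th in rows[0].find_all("th")]
    # One list of cell texts per data row, skipping the header row
    body = [[td.get_text(strip=True) for td in tr.find_all("td")] for tr in rows[1:]]
    return pd.DataFrame(body, columns=header)


df = table_to_frame(HTML)
print(df)
```

Every value comes back as a string, so numeric columns such as `Area (sq mi)` would still need `pd.to_numeric` before any arithmetic.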

### Customer Type 
Land prices for London can be obtained from the GOV.UK publication [Land value estimates for policy appraisal 2017](https://www.gov.uk/government/publications/land-value-estimates-for-policy-appraisal-2017). The downloaded Excel file can be loaded into a pandas DataFrame, from which the average land price per hectare for each borough can be extracted as shown below.

**Note:** Only the first sheet, which contains the residential land prices, has been used.

In [5]:
```python
# Sheet 0 holds the residential land prices (reading .xlsx files requires openpyxl)
df_prices = pd.read_excel('Land_value_estimates.xlsx', sheet_name=0)

# Extract the relevant columns: London rows only, excluding the City of London, then rename
is_london = (df_prices['Residential Land'] == 'London') & (df_prices['Unnamed: 2'] != 'City of London')
df_prices_london = df_prices.loc[is_london, ['Unnamed: 2', 'Unnamed: 3']].reset_index(drop=True)
df_prices_london = df_prices_london.rename(columns={'Unnamed: 2': 'Neighbourhood', 'Unnamed: 3': 'Price (Pounds)'})
df_prices_london.head()
```

|   | Neighbourhood | Price (Pounds) |
|---|---|---|
| 0 | Barking and Dagenham | 5400000 |
| 1 | Barnet | 24900000 |
| 2 | Bexley | 10300000 |
| 3 | Brent | 16800000 |
| 4 | Bromley London | 17700000 |


### Extracting Relevant Data
The table below can be derived from the Wikipedia table scraped above.

In [6]:
```python
# Extracting geographic co-ordinates
import re

Latitudes = []
Longitudes = []
for coord_text in table_pd[8]:                      # column 8 is 'Co-ordinates'
    # Raw string avoids invalid escape-sequence warnings for '\d' and '\.'
    extract = re.findall(r'/\d*\..*;.*\d*\.\d*', coord_text)
    lat_long = extract[0][1:].split('; ')           # drop the leading '/' and split "lat; lon"
    Latitudes.append(float(lat_long[0]))
    Longitudes.append(float(lat_long[1]))

table_pd['Latitude'] = Latitudes
table_pd['Longitude'] = Longitudes

# Replace the numeric column labels 0..n-1 with the header names scraped earlier
table_pd.rename(columns=dict(enumerate(columns)), inplace=True)
table_pd.head()
```

|   | Borough | Inner | Status | Local authority | Political control | Headquarters | Area (sq mi) | Population (2013 est)[1] | Co-ordinates | Nr. in map | Latitude | Longitude |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Barking and Dagenham[note 1] | | | Barking and Dagenham London Borough Council | Labour | Town Hall, 1 Town Square | 13.93 | 194352 | 51°33′39″N0°09′21″E / 51.5607°N 0.1557°E /5... | 25 | 51.5607 | 0.1557 |
| 1 | Barnet | | | Barnet London Borough Council | Conservative | North London Business Park, Oakleigh Road South | 33.49 | 369088 | 51°37′31″N0°09′06″W / 51.6252°N 0.1517°W /5... | 31 | 51.6252 | -0.1517 |
| 2 | Bexley | | | Bexley London Borough Council | Conservative | Civic Offices, 2 Watling Street | 23.38 | 236687 | 51°27′18″N0°09′02″E / 51.4549°N 0.1505°E /5... | 23 | 51.4549 | 0.1505 |
| 3 | Brent | | | Brent London Borough Council | Labour | Brent Civic Centre, Engineers Way | 16.7 | 317264 | 51°33′32″N0°16′54″W / 51.5588°N 0.2817°W /5... | 12 | 51.5588 | -0.2817 |
| 4 | Bromley | | | Bromley London Borough Council | Conservative | Civic Centre, Stockwell Close | 57.97 | 317899 | 51°24′14″N0°01′11″E / 51.4039°N 0.0198°E /5... | 20 | 51.4039 | 0.0198 |
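A more explicit alternative to the greedy pattern in cell 6 is to capture the two signed decimals of the `lat; lon` pair directly. This is a sketch, assuming the full `Co-ordinates` cell text contains a pair such as `51.6252; -0.1517`; because the output above is truncated, the sample string below is a hypothetical reconstruction, and `parse_lat_lon` is a hypothetical helper name.

```python
import re

# Two signed decimal numbers separated by a semicolon, e.g. "51.6252; -0.1517"
LAT_LON = re.compile(r"(-?\d+\.\d+);\s*(-?\d+\.\d+)")


def parse_lat_lon(text):
    """Return (latitude, longitude) as floats, or None if no pair is found."""
    match = LAT_LON.search(text)
    if match is None:
        return None
    return float(match.group(1)), float(match.group(2))


# Hypothetical cell text modelled on the truncated output above
sample = "51°37′31″N0°09′06″W / 51.6252°N 0.1517°W / 51.6252; -0.1517"
print(parse_lat_lon(sample))
```

Applied to the whole column, this would read `table_pd[8].map(parse_lat_lon)`; returning `None` instead of raising makes rows with an unexpected format easy to spot.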


In [7]:
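```python
# Extract the relevant columns (copy so the new frame can be modified safely)
london_data = table_pd[['Borough', 'Latitude', 'Longitude']].copy()
# Strip Wikipedia footnote markers such as "[note 1]" from the borough names
london_data['Borough'] = london_data['Borough'].str.replace(r'\[note \d+\]', '', regex=True)
london_data.rename(columns={'Borough': 'Neighbourhood'}, inplace=True)
# Attach the land prices by row position: this assumes both tables list the boroughs in the same order
london_df = pd.concat([london_data, df_prices_london['Price (Pounds)']], axis=1)

london_df.head()
```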

|   | Neighbourhood | Latitude | Longitude | Price (Pounds) |
|---|---|---|---|---|
| 0 | Barking and Dagenham | 51.5607 | 0.1557 | 5400000 |
| 1 | Barnet | 51.6252 | -0.1517 | 24900000 |
| 2 | Bexley | 51.4549 | 0.1505 | 10300000 |
| 3 | Brent | 51.5588 | -0.2817 | 16800000 |
| 4 | Bromley | 51.4039 | 0.0198 | 17700000 |
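`pd.concat(..., axis=1)` in cell 7 pairs rows by index position, so it silently relies on both tables listing the boroughs in the same order. A small check can catch a misalignment before the prices are attached. The sketch below assumes each land-price name starts with the corresponding Wikipedia name (as with `Bromley` and `Bromley London`); the helper `misaligned_rows` is hypothetical, and the names are taken from the outputs above.

```python
def misaligned_rows(wiki_names, price_names):
    """Return (position, wiki_name, price_name) for rows whose names disagree.

    Assumes the price table may append extra words (e.g. "Bromley London"),
    so a row counts as aligned when the price name starts with the wiki name.
    """
    if len(wiki_names) != len(price_names):
        raise ValueError(
            f"length mismatch: {len(wiki_names)} vs {len(price_names)} rows"
        )
    return [
        (i, w, p)
        for i, (w, p) in enumerate(zip(wiki_names, price_names))
        if not p.strip().startswith(w.strip())
    ]


wiki = ["Barking and Dagenham", "Barnet", "Bexley", "Brent", "Bromley"]
prices = ["Barking and Dagenham", "Barnet", "Bexley", "Brent", "Bromley London"]
print(misaligned_rows(wiki, prices))  # an empty list means every row lines up
```

In the notebook this would be called as `misaligned_rows(london_data['Neighbourhood'].tolist(), df_prices_london['Neighbourhood'].tolist())` just before the `pd.concat` call.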
