# Peer Graded Assignment
##  Battle of Neighbourhoods - Week 1

### Subject: Brexit: A Challenge or Opportunity for the Catering Industry - Data Analysis of London Restaurants
### By: Charles Tsang

#### Introduction
Twenty-twenty has a landmark ring to it. The start of a new decade prompts bigger-than-usual thoughts about the future. Britain’s reckoning with Brexit will have a significant impact on the hospitality and food industry. However, Brexit also offers the potential to reset the direction of the food business throughout the UK.
Throughout its history, London’s dominance has often been portrayed as a ‘problem’ for its host nation. At the same time, it is the country's political, economic, business, cultural and arts centre, which makes London very diverse. There are many different restaurants around London, including French, Italian, Asian, African, Middle Eastern and American ones.

#### Business Problem Description
As an investment agency, we are looking to take this opportunity to explore expansion after Brexit. Lacking confidence, some restaurant owners have decided to shut down their businesses and move back to their home countries. Together with the recent organic food trend, this means that many restaurants and cafés are available to rent at the moment. Traditionally, London is overwhelmingly seen as expensive and inaccessible. The preliminary target is to create a high-end, fine-dining restaurant with an organic and healthy menu. To survive in such a competitive market, a strategic plan is essential. The following factors will be explored in order to decide on the physical location in London:

1. London population
2. London demographics
3. Location of competitors
4. Cuisine and ingredients served by competitors
5. Segmentation of the market

Once all these factors are considered, we can create a map and an information chart showing where to establish the restaurant and which ingredients to use, with each district clustered according to its venue density.

#### Target Audience
London is highly diverse and multicultural. The target audience ranges from Londoners and tourists to anyone who is passionate about organic food. To recommend the right location, our company has appointed me to carry out a data science project to research the best choice. This research would help anyone who wants to start an organic catering business in London after Brexit.

#### Success Criteria
The project will be considered successful if it produces a good recommendation of a borough/neighborhood and of the nearest suppliers of ingredients for an organic restaurant.


#### Data Description
To address the problem, the following data will be used:

One city will be analyzed in this project: London. The project is based on public data from Wikipedia and Foursquare.
On Wikipedia, London is treated as synonymous with the “Greater London” area, which includes the areas within the London postcode area. In this project, we focus only on the neighborhoods within the London postcode area. The London area consists of 32 boroughs and the “City of London”. The dataset is freely available on the web at [https://en.wikipedia.org/wiki/List_of_areas_of_London](https://en.wikipedia.org/wiki/List_of_areas_of_London)

In [1]:
import requests
import pandas as pd
from bs4 import BeautifulSoup

In [5]:
wikipedia_link = 'https://en.wikipedia.org/wiki/List_of_areas_of_London'
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:64.0) Gecko/20100101 Firefox/64.0'}
wikipedia_page = requests.get(wikipedia_link, headers = headers)
#wikipedia_page

# Parses the HTML page
soup = BeautifulSoup(wikipedia_page.content, 'html.parser')
# This extracts the "tbody" within the table where class is "wikitable sortable"
table = soup.find('table', {'class':'wikitable sortable'}).tbody
# table

# Extracts all "tr" (table rows) within the table above
rows = table.find_all('tr')
# rows

# Extracts the column headers from the "th" tags of the first row and removes any '\n'
columns = [i.text.replace('\n', '')
           for i in rows[0].find_all('th')]
# columns

# Converts columns to pd dataframe
df = pd.DataFrame(columns = columns)
# df

In [6]:
records = []
for row in rows[1:]:
    tds = row.find_all('td')
    # Strip newlines and non-breaking spaces from every cell
    values = [td.text.replace('\n', '').replace('\xa0', '') for td in tds]
    # Keep only the rows whose cell count matches the header
    if len(values) == len(columns):
        records.append(values)

# DataFrame.append was removed in pandas 2.0, so the frame is built in one step
df = pd.DataFrame(records, columns = columns)

In [7]:
df.head()

|   | Location | London borough | Post town | Postcode district | Dial code | OS grid ref |
|---|---|---|---|---|---|---|
| 0 | Abbey Wood | Bexley, Greenwich [7] | LONDON | SE2 | 20 | TQ465785 |
| 1 | Acton | Ealing, Hammersmith and Fulham[8] | LONDON | W3, W4 | 20 | TQ205805 |
| 2 | Addington | Croydon[8] | CROYDON | CR0 | 20 | TQ375645 |
| 3 | Addiscombe | Croydon[8] | CROYDON | CR0 | 20 | TQ345665 |
| 4 | Albany Park | Bexley | BEXLEY, SIDCUP | DA5, DA14 | 20 | TQ478728 |


In [8]:
# Give the columns short names; some scraped headers contain non-breaking spaces (\xa0)
df = df.rename(index=str, columns = {'London\xa0borough': 'Borough', 'Post town': 'Post-town', 'Postcode\xa0district': 'Postcode', 'Dial\xa0code': 'Dial-code', 'OS grid ref': 'OSGridRef'})
# Drop trailing citation markers such as "[7]" and any leftover whitespace from the borough names
df['Borough'] = df['Borough'].map(lambda x: x.rstrip(']').rstrip('0123456789').rstrip('[').strip())
# Give each postcode its own row when a location spans several postcode districts
df0 = df.drop('Postcode', axis=1).join(df['Postcode'].str.split(',', expand=True).stack().str.strip().reset_index(level=1, drop=True).rename('Postcode'))
df1 = df0[['Location', 'Borough', 'Postcode', 'Post-town']].reset_index(drop=True)
df1.head()

|   | Location | Borough | Postcode | Post-town |
|---|---|---|---|---|
| 0 | Abbey Wood | Bexley, Greenwich | SE2 | LONDON |
| 1 | Acton | Ealing, Hammersmith and Fulham | W3 | LONDON |
| 2 | Acton | Ealing, Hammersmith and Fulham | W4 | LONDON |
| 3 | Angel | Islington | EC1 | LONDON |
| 4 | Angel | Islington | N1 | LONDON |


In [9]:
df2 = df1
df21 = df2[df2['Post-town'].str.contains('LONDON')]
df3 = df21[['Location', 'Borough', 'Postcode']].reset_index(drop=True)
df_london = df3

In [10]:
!pip -q install geocoder
import geocoder

In [11]:
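def get_latlng(postcode, max_attempts=5):
    # Initialize the location (lat. and long.) to None
    lat_lng_coords = None

    # Retry until the postcode is geocoded, but give up after max_attempts
    # so that a postcode ArcGIS cannot resolve does not loop forever
    attempts = 0
    while lat_lng_coords is None and attempts < max_attempts:
        g = geocoder.arcgis('{}, London, United Kingdom'.format(postcode))
        lat_lng_coords = g.latlng
        attempts += 1

    # Fall back to a pair of missing values so every result has two entries
    return lat_lng_coords if lat_lng_coords is not None else [None, None]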

In [None]:
postal_codes = df_london['Postcode']    
coordinates = [get_latlng(postal_code) for postal_code in postal_codes.tolist()]
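
Several locations can share one postcode, so the list comprehension above may geocode the same value more than once. As an optional refinement that is not part of the original notebook, the sketch below geocodes each distinct postcode once and reuses the result. The helper `geocode_unique` and the stand-in `fake_geocode` (with made-up coordinates) are hypothetical names used only for illustration.

```python
def geocode_unique(postcodes, geocode):
    """Geocode each distinct postcode once and reuse the cached result."""
    cache = {}
    for code in postcodes:
        key = code.strip()
        if key not in cache:
            cache[key] = geocode(key)
    # Return one coordinate pair per input row, in the original order
    return [cache[code.strip()] for code in postcodes]


# Stand-in geocoder with made-up coordinates, used only for illustration
calls = []

def fake_geocode(postcode):
    calls.append(postcode)
    return [51.0 + len(calls), -0.1]


coords = geocode_unique(['SE2', 'W3', ' W4', 'W3'], fake_geocode)
print(len(coords), len(calls))
```

In the notebook, `get_latlng` would take the place of `fake_geocode`, and `postal_codes.tolist()` would be the input list.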

In [None]:
# This will store the London dataframe with coordinates
df_london_loc = df_london

# The obtained coordinates (latitude and longitude) are joined with the dataframe as shown
df_london_coordinates = pd.DataFrame(coordinates, columns = ['Latitude', 'Longitude'])
df_london_loc['Latitude'] = df_london_coordinates['Latitude']
df_london_loc['Longitude'] = df_london_coordinates['Longitude']

In [None]:
df_london_loc.head()

In addition, the Foursquare API will be used to obtain geographical location data for the London area and venue information for each neighborhood. The venues then provide the categories needed for the analysis, which determines the viability of a selected location for the restaurant. The Foursquare API is available at [https://developer.foursquare.com/](https://developer.foursquare.com/).
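
To illustrate how venue categories could feed the analysis, here is a minimal sketch that counts venues per location and category and converts the counts into shares. The rows are made up for illustration, and the column name `Venue Category` is an assumption; real rows would come from the Foursquare response.

```python
import pandas as pd

# Hypothetical venue rows; real rows would come from the Foursquare response
venues = pd.DataFrame({
    'Location': ['Acton', 'Acton', 'Acton', 'Angel', 'Angel'],
    'Venue Category': ['Coffee Shop', 'Italian Restaurant', 'Coffee Shop',
                       'French Restaurant', 'Coffee Shop'],
})

# Count venues per location and category
counts = pd.crosstab(venues['Location'], venues['Venue Category'])

# Convert the counts to the share of each category within a location
shares = counts.div(counts.sum(axis=1), axis=0)
print(shares.round(2))
```

A table of shares like this is one possible input for comparing or clustering locations; other encodings would work as well.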

#### How the data will be used to solve the problem
Together, the two datasets describe the venues within each neighborhood of the London postcode area. Restaurants within a given radius of a location can be grouped by type of restaurant. Because of the Foursquare API constraints, each request is limited to 100 venues within a radius of 750 meters of the latitude and longitude given for each borough. The accessibility and ease of supply of ingredients are also considered, together with nearby amenities and transport connections.
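
The limit and radius described above could be expressed as request parameters roughly as follows. This is a sketch that assumes the legacy Foursquare v2 `venues/explore` endpoint and its query parameters; newer versions of the API use different URLs and authentication. The credentials and the version date are placeholders, and `build_explore_url` is a hypothetical helper name.

```python
from urllib.parse import urlencode

# Assumption: legacy Foursquare v2 "venues/explore" endpoint
EXPLORE_URL = 'https://api.foursquare.com/v2/venues/explore'


def build_explore_url(lat, lng, client_id, client_secret,
                      radius=750, limit=100, version='20180605'):
    """Build the request URL for venues around one latitude/longitude pair."""
    params = {
        'client_id': client_id,          # placeholder credential
        'client_secret': client_secret,  # placeholder credential
        'v': version,                    # API version date (placeholder value)
        'll': '{},{}'.format(lat, lng),
        'radius': radius,                # meters, as stated in the text
        'limit': limit,                  # maximum venues per request
    }
    return EXPLORE_URL + '?' + urlencode(params)


# Build the URL for a sample coordinate without calling the API
url = build_explore_url(51.5074, -0.1278, 'YOUR_ID', 'YOUR_SECRET')
print(url)
```

The resulting URL can then be fetched with `requests.get(url)`, and the JSON response inspected for venue names and categories.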
