### Battle of Neighbourhoods (Week 1)

This Week 1 assignment has two parts:
* A description of the problem and a discussion of the background.
* A description of the data and how it will be used to solve the problem.

### Part 1: A description of the problem and a discussion of the background

### Description of the Problem

A growing number of people in the UK are changing their lifestyle by switching diets and becoming vegan. This is especially true in London, which has more vegans than anywhere else in the UK.

Despite there being many fine restaurants in London, only a few of them specialise in vegan dishes. While many restaurants do have a vegan option, the menu is limited. Vegans can struggle to find a good place to dine.

### Veganism in the UK

* In 2018, the UK launched more vegan products than any other nation.
* Orders of vegan meals grew 388% between 2016 and 2018, and they are now the UK’s fastest-growing takeaway choice.
* Demand for meat-free food in the UK increased by 987% in 2017 and going vegan was predicted to be the biggest food trend in 2018. 
* The number of vegans in Great Britain quadrupled between 2014 and 2019. In 2019 there were 600,000 vegans, or 1.16% of the population; 276,000 (0.46%) in 2016; and 150,000 (0.25%) in 2014. Sources: Ipsos Mori surveys, commissioned by The Vegan Society, 2016 and 2019, and The Food & You surveys, organised by the Food Standards Agency (FSA) and the National Centre for Social Science Research (Natcen).
* The sign-ups for the Veganuary campaign - where people eat vegan for the month of January - hit record highs in 2020, with over 400,000 people signing up. In comparison, there were 250,000 participants in 2019, 168,500 in 2018; 59,500 in 2017; 23,000 in 2016; 12,800 in 2015; and 3,300 in 2014.
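
The growth implied by the figures above can be checked with a few lines of arithmetic. This is only a minimal sketch: the dictionaries copy the numbers quoted in the list, the 2020 Veganuary value uses 400,000 as a lower bound because the source says "over 400,000", and the helper names `growth_factor` and `period_growth` are illustrative.

```python
# Figures copied from the bullet list above.
vegans_gb = {2014: 150_000, 2016: 276_000, 2019: 600_000}
veganuary_signups = {
    2014: 3_300, 2015: 12_800, 2016: 23_000, 2017: 59_500,
    2018: 168_500, 2019: 250_000,
    2020: 400_000,  # lower bound: the source says "over 400,000"
}


def growth_factor(series, start, end):
    """Return how many times larger the value at `end` is than at `start`."""
    return series[end] / series[start]


def period_growth(series):
    """Return the growth factor between each pair of consecutive data points."""
    years = sorted(series)
    return {
        (earlier, later): series[later] / series[earlier]
        for earlier, later in zip(years, years[1:])
    }


# Vegans in Great Britain, 2014 to 2019
print(growth_factor(vegans_gb, 2014, 2019))

# Veganuary sign-ups, growth between consecutive years
for (earlier, later), factor in period_growth(veganuary_signups).items():
    print(f"{earlier} -> {later}: x{factor:.2f}")
```

The first value printed is the factor behind the "quadrupled" claim. The loop shows that Veganuary sign-ups grew every year in the quoted range.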

### Discussion of the Background

The most popular vegan dishes are South Indian vegan dishes. India hosts the largest number of vegetarians and vegans in the world. Indian restaurants specialise in vegan and vegetarian dishes, unlike some restaurants where meat dishes are merely altered to cater for vegans.

My client, a successful restaurant chain in India, is looking to expand its operations into London. They want to create a high-end vegan restaurant with an organic and healthy menu. Their target market is not only vegans but also anyone who favours organic food and healthy eating. They also want to show people who are not vegan that vegan dishes can taste better and be much healthier than what they normally eat. Because London's population is so large, my client needs deeper insight from the available data in order to decide where to establish the first restaurant.

### Part 2: A description of the data and how it will be used to solve the problem

### Description of Data

This project will rely on public data from Wikipedia and Foursquare.

#### The Dataset (1)

In this project, "London" is used as a synonym for the "Greater London Area". Within the Greater London Area, some areas fall within the London postcode area. This project focuses on the neighbourhoods that are within the London postcode area.

The London Area consists of 32 boroughs and the "City of London". The data comes from the Wikipedia page https://en.wikipedia.org/wiki/List_of_areas_of_London

A sample of the web scraping of the Wikipedia page for the Greater London Area data is provided below:

In [1]:
import requests
import pandas as pd
from bs4 import BeautifulSoup

In [3]:
wikipedia_link = 'https://en.wikipedia.org/wiki/List_of_areas_of_London'
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:64.0) Gecko/20100101 Firefox/64.0'}
wikipedia_page = requests.get(wikipedia_link, headers = headers)
wikipedia_page

<Response [200]>

In [4]:
# Parses the HTML and selects the body of the first sortable wikitable
soup = BeautifulSoup(wikipedia_page.content, 'html.parser')
table = soup.find('table', {'class':'wikitable sortable'}).tbody

In [5]:
# Extracts all "tr" (table rows) within the table above
rows = table.find_all('tr')

In [6]:
# Extracts the column headers from the "th" tags and strips any '\n' characters
columns = [i.text.replace('\n', '')
           for i in rows[0].find_all('th')]

In [7]:
# Converts columns to pd dataframe
df = pd.DataFrame(columns = columns)

In [8]:
# Extracts the cell values of every data row.
# Rows are collected in a list and the DataFrame is built once,
# because DataFrame.append was removed in pandas 2.0.
records = []
for row in rows[1:]:
    tds = row.find_all('td')
    # Strip newlines and non-breaking spaces from every cell
    values = [td.text.replace('\n', '').replace('\xa0', '') for td in tds]
    # Keep only rows whose cell count matches the header row
    if len(values) == len(columns):
        records.append(values)

df = pd.DataFrame(records, columns=columns)

In [9]:
df.head()

| | Location | London borough | Post town | Postcode district | Dial code | OS grid ref |
|---|---|---|---|---|---|---|
| 0 | Abbey Wood | Bexley, Greenwich [7] | LONDON | SE2 | 20 | TQ465785 |
| 1 | Acton | Ealing, Hammersmith and Fulham[8] | LONDON | W3, W4 | 20 | TQ205805 |
| 2 | Addington | Croydon[8] | CROYDON | CR0 | 20 | TQ375645 |
| 3 | Addiscombe | Croydon[8] | CROYDON | CR0 | 20 | TQ345665 |
| 4 | Albany Park | Bexley | BEXLEY, SIDCUP | DA5, DA14 | 20 | TQ478728 |


In [11]:
df = df.rename(index=str, columns = {'Location': 'Location', 'London\xa0borough': 'Borough', 'Post town': 'Post-town', 
                                     'Postcode\xa0district': 'Postcode', 'Dial\xa0code': 'Dial-code', 'OS grid ref': 'OSGridRef'})

In [12]:
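# Removes trailing footnote markers such as "[8]" and any leftover whitespace from the borough names
df['Borough'] = df['Borough'].map(lambda x: x.rstrip(']').rstrip('0123456789').rstrip('[').strip())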

In [13]:
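# Splits rows with several postcodes (e.g. "W3, W4") into one row per postcode, trimming the space after each comma
df0 = df.drop('Postcode', axis=1).join(df['Postcode'].str.split(',', expand=True).stack().str.strip().reset_index(level=1, drop=True).rename('Postcode'))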

In [15]:
df1 = df0[['Location', 'Borough', 'Postcode', 'Post-town']].reset_index(drop=True)

In [16]:
df1.head()

| | Location | Borough | Postcode | Post-town |
|---|---|---|---|---|
| 0 | Abbey Wood | Bexley, Greenwich | SE2 | LONDON |
| 1 | Acton | Ealing, Hammersmith and Fulham | W3 | LONDON |
| 2 | Acton | Ealing, Hammersmith and Fulham | W4 | LONDON |
| 3 | Angel | Islington | EC1 | LONDON |
| 4 | Angel | Islington | N1 | LONDON |


In [17]:
df2 = df1

In [18]:
df21 = df2[df2['Post-town'].str.contains('LONDON')]

In [19]:
df21.head(10)

| | Location | Borough | Postcode | Post-town |
|---|---|---|---|---|
| 0 | Abbey Wood | Bexley, Greenwich | SE2 | LONDON |
| 1 | Acton | Ealing, Hammersmith and Fulham | W3 | LONDON |
| 2 | Acton | Ealing, Hammersmith and Fulham | W4 | LONDON |
| 3 | Angel | Islington | EC1 | LONDON |
| 4 | Angel | Islington | N1 | LONDON |
| 5 | Church End | Brent | NW10 | LONDON |
| 6 | Church End | Barnet | N3 | LONDON |
| 7 | Clapham | Lambeth, Wandsworth | SW4 | LONDON |
| 8 | Clerkenwell | Islington | EC1 | LONDON |
| 10 | Colindale | Barnet | NW9 | LONDON |


In [20]:
df3 = df21[['Location', 'Borough', 'Postcode']].reset_index(drop=True)

In [21]:
df_london = df3

In [22]:
!pip -q install geocoder
import geocoder

In [27]:
def get_latlng(postal_code):
    """Returns [latitude, longitude] for a London postcode using the ArcGIS geocoder."""
    lat_lng_coords = None

    # The while loop keeps retrying until the geocoder returns coordinates for this postcode
    while lat_lng_coords is None:
        g = geocoder.arcgis('{}, London, United Kingdom'.format(postal_code))
        lat_lng_coords = g.latlng
    return lat_lng_coords

In [24]:
postal_codes = df_london['Postcode']    
coordinates = [get_latlng(postal_code) for postal_code in postal_codes.tolist()]

In [25]:
# This will store the London dataframe with coordinates
df_london_loc = df_london

# The obtained coordinates (latitude and longitude) are joined with the dataframe as shown
df_london_coordinates = pd.DataFrame(coordinates, columns = ['Latitude', 'Longitude'])
df_london_loc['Latitude'] = df_london_coordinates['Latitude']
df_london_loc['Longitude'] = df_london_coordinates['Longitude']

In [26]:
df_london_loc.head()

| | Location | Borough | Postcode | Latitude | Longitude |
|---|---|---|---|---|---|
| 0 | Abbey Wood | Bexley, Greenwich | SE2 | 51.49245 | 0.12127 |
| 1 | Acton | Ealing, Hammersmith and Fulham | W3 | 51.51324 | -0.26746 |
| 2 | Acton | Ealing, Hammersmith and Fulham | W4 | 51.48944 | -0.26194 |
| 3 | Angel | Islington | EC1 | 51.52361 | -0.09877 |
| 4 | Angel | Islington | N1 | 51.52969 | -0.08697 |


The data output df_london_loc shows the data format that will be used for further analysis in week 2.

#### The Dataset (2)

The Foursquare API will be used to obtain venue data for locations in the London Area. This data will be used to explore the venues in the neighbourhoods of London. The venues will provide the categories needed for the analysis, which will eventually be used to determine the viability of selected locations for the restaurant.
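
As a rough illustration of what such a request might look like, the sketch below builds a URL for Foursquare's legacy v2 `venues/explore` endpoint. It assumes that endpoint is the one used; newer versions of the Foursquare API use different endpoints and authentication. The credentials, the 500 m radius and the version date are placeholders, and `build_explore_url` is a hypothetical helper. No request is sent.

```python
from urllib.parse import urlencode

# Legacy v2 endpoint (an assumption; newer Foursquare APIs differ)
FOURSQUARE_EXPLORE_URL = 'https://api.foursquare.com/v2/venues/explore'


def build_explore_url(lat, lng, client_id, client_secret,
                      radius_m=500, limit=100, version='20180605'):
    """Build the request URL for venues around one latitude/longitude pair."""
    params = {
        'client_id': client_id,          # placeholder credential
        'client_secret': client_secret,  # placeholder credential
        'v': version,                    # API version date (placeholder)
        'll': '{},{}'.format(lat, lng),  # centre of the search
        'radius': radius_m,              # search radius in metres (placeholder)
        'limit': limit,                  # maximum number of venues to return
    }
    return FOURSQUARE_EXPLORE_URL + '?' + urlencode(params)


# Coordinates of Abbey Wood (SE2) taken from df_london_loc above
url = build_explore_url(51.49245, 0.12127, 'YOUR_CLIENT_ID', 'YOUR_CLIENT_SECRET')
print(url)
```

The resulting URL could then be fetched with `requests.get(url).json()` for each row of `df_london_loc`.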

### How data will be used to solve the problem

The data from datasets 1 and 2 will be explored by considering the venues within the neighbourhoods of the London postcode areas. For each area, the types of restaurants within a chosen radius (in miles) will be examined. Due to Foursquare restrictions, the number of venues will be limited to 100. Proximity to transport connections and other amenities will be correlated with these results. Accessibility and ease of supply of organic ingredients will also be considered.
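
The radius check described above can be sketched with the haversine formula, which estimates the great-circle distance between two latitude/longitude pairs. This is a minimal illustration rather than the analysis itself: `haversine_miles` and `venues_within` are hypothetical helpers, and the "venues" below are simply postcode coordinates from `df_london_loc` standing in for real Foursquare results.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius


def haversine_miles(lat1, lng1, lat2, lng2):
    """Great-circle distance in miles between two points given in degrees."""
    lat1, lng1, lat2, lng2 = map(radians, (lat1, lng1, lat2, lng2))
    a = (sin((lat2 - lat1) / 2) ** 2
         + cos(lat1) * cos(lat2) * sin((lng2 - lng1) / 2) ** 2)
    return 2 * EARTH_RADIUS_MILES * asin(sqrt(a))


def venues_within(centre, venues, radius_miles, limit=100):
    """Return up to `limit` venues no farther than `radius_miles` from `centre`.

    `centre` is a (lat, lng) tuple; `venues` is a list of (name, lat, lng).
    """
    nearby = [
        v for v in venues
        if haversine_miles(centre[0], centre[1], v[1], v[2]) <= radius_miles
    ]
    return nearby[:limit]


# Placeholder points: postcode coordinates from the table above
angel_ec1 = (51.52361, -0.09877)
points = [('N1', 51.52969, -0.08697), ('SE2', 51.49245, 0.12127)]
print(venues_within(angel_ec1, points, radius_miles=1.0))
```

The `limit` argument mirrors the cap of 100 venues mentioned above.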