# Capstone Project - The Battle of Neighborhoods (Week 1)

## Best London neighbourhood for a coffee shop

### Introduction/Business Problem

The UK’s coffee consumption soared to 95 million cups a day in 2018, up from 70 million in 2008, an increase of 25 million cups a day over 10 years. This popularity has translated into a big increase in the number of coffee shops and cafés in London, the UK capital and one of the world's global cities. Most of the popular cafés belong to big chains, such as the USA's Starbucks or the UK's own Costa Coffee and Caffè Nero.

But Londoners are growing tired of drinking the same chain coffee every day and are turning to independent and speciality coffee shops, which take great care over each cup and use single-origin, ethically sourced, organic coffee beans roasted locally by artisan roasters.

London’s obsession with coffee is showing no signs of slowing. Across the city, cafés are constantly popping up, serving up perfectly executed flat whites, espressos and cold-drip Americanos to the masses. 

The aim of this project is to find out which London neighbourhood would be the best place to open a new café. The target audience is an entrepreneur, or group of entrepreneurs, looking to set up a new independent café in London. The project will help them identify which neighbourhoods have the most and the fewest cafés, so that they can open theirs in the least saturated neighbourhood.


### Data

This section describes the data that will be used to solve the problem. First, we load the libraries needed:

In [3]:
# library to handle data in a vectorized manner
import numpy as np

# library for data analysis
import pandas as pd
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

# install the Geocoder library to get location data
!conda install -c conda-forge geocoder --yes
import geocoder

# library for processing XML and HTML (pd.read_html uses it as a parser)
!conda install -c conda-forge lxml --yes
import lxml


##### Dataset 1

London is divided into 32 London boroughs plus the City of London, which is the central part of London (its "downtown"). Each borough contains several neighbourhoods, although some neighbourhoods span more than one borough.

Our London neighbourhood and borough data comes from the Wikipedia page "List of areas of London": https://en.wikipedia.org/wiki/List_of_areas_of_London

The data is presented in a Wikipedia table, which we transform into a pandas data frame for our analysis:

In [4]:
# The Wikipedia table is extracted into a pandas data frame
# ([1] selects the second table that read_html finds on the page)
london_df = pd.read_html("https://en.wikipedia.org/wiki/List_of_areas_of_London")[1]

# Rename columns
london_df.columns = ['Neighbourhood', 'Borough', 'Post town', 'Postcode district', 'Dial code', 'OS grid ref']

# Remove Borough reference numbers such as [7]
london_df['Borough'] = london_df['Borough'].map(lambda x: x.rstrip(']').rstrip('0123456789').rstrip('['))

# Remove Neighbourhood text in between parentheses, then trim leftover whitespace
# (regex=True is required on newer pandas versions, where it is no longer the default)
london_df['Neighbourhood'] = london_df['Neighbourhood'].str.replace(r"\([^()]*\)", "", regex=True).str.strip()
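
To make the two cleaning steps easier to follow, here is a minimal sketch that applies the same logic to plain strings, outside pandas. The helper names `strip_reference` and `strip_parenthetical` and the sample inputs are hypothetical, chosen only for illustration. The final `.strip()` is an extra step added here to drop the space that removing a parenthetical leaves behind.

```python
import re


def strip_reference(borough: str) -> str:
    """Drop a trailing Wikipedia reference marker such as '[7]'."""
    # Peel off ']', then the digits, then '[' from the right-hand side.
    return borough.rstrip(']').rstrip('0123456789').rstrip('[')


def strip_parenthetical(name: str) -> str:
    """Remove any '(...)' group and trim leftover whitespace."""
    return re.sub(r"\([^()]*\)", "", name).strip()


# Hypothetical sample values, for illustration only
print(strip_reference("Bexley, Greenwich[7]"))
print(strip_reference("Croydon"))
print(strip_parenthetical("Bromley (also Bromley-by-Bow)"))
```

One caveat of the `rstrip` chain: a name that legitimately ended in a digit or bracket would also be trimmed, so it is only safe for columns where that cannot happen.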


Now we have our pandas data frame with all the London neighbourhoods and boroughs:

In [5]:
london_df

| | Neighbourhood | Borough | Post town | Postcode district | Dial code | OS grid ref |
|---|---|---|---|---|---|---|
| 0 | Abbey Wood | Bexley, Greenwich | LONDON | SE2 | 020 | TQ465785 |
| 1 | Acton | Ealing, Hammersmith and Fulham | LONDON | W3, W4 | 020 | TQ205805 |
| 2 | Addington | Croydon | CROYDON | CR0 | 020 | TQ375645 |
| 3 | Addiscombe | Croydon | CROYDON | CR0 | 020 | TQ345665 |
| 4 | Albany Park | Bexley | BEXLEY, SIDCUP | DA5, DA14 | 020 | TQ478728 |
| 5 | Aldborough Hatch | Redbridge | ILFORD | IG2 | 020 | TQ455895 |
| 6 | Aldgate | City | LONDON | EC3 | 020 | TQ334813 |
| 7 | Aldwych | Westminster | LONDON | WC2 | 020 | TQ307810 |
| 8 | Alperton | Brent | WEMBLEY | HA0 | 020 | TQ185835 |
| 9 | Anerley | Bromley | LONDON | SE20 | 020 | TQ345695 |


##### Dataset 2

To obtain the coordinates of the London neighbourhoods, the Geocoder package is used to look up the latitude and longitude of each neighbourhood, which the Foursquare API needs.

The Geocoder location data will be used to enrich the data frame of London neighbourhoods obtained from Wikipedia above.

In [6]:
# Define a function to get the coordinates of a single London neighbourhood
def get_latlng(neighbourhood):
    
    # Initialize the location (latitude and longitude) to None
    lat_lng_coords = None
    
    # Retry until the ArcGIS geocoder returns coordinates for this neighbourhood
    # (note: this loops indefinitely if the lookup never succeeds)
    while lat_lng_coords is None:
        g = geocoder.arcgis('{}, London, United Kingdom'.format(neighbourhood))
        lat_lng_coords = g.latlng
    return lat_lng_coords
# Geocoder ends here
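
Because the loop above retries until a lookup succeeds, a name that never resolves would make it run forever. The following is a minimal sketch of a bounded variant, not the notebook's own method. The function name `get_latlng_bounded`, its `lookup`, `max_attempts` and `pause_seconds` parameters, and the `flaky_lookup` stand-in are all assumptions introduced for illustration. The geocoding call is passed in as a function so the sketch runs without network access.

```python
import time
from typing import Callable, Optional, Sequence


def get_latlng_bounded(
    neighbourhood: str,
    lookup: Callable[[str], Optional[Sequence[float]]],
    max_attempts: int = 5,
    pause_seconds: float = 0.0,
) -> Optional[Sequence[float]]:
    """Return [lat, lng] for a neighbourhood, or None after max_attempts failures."""
    query = '{}, London, United Kingdom'.format(neighbourhood)
    for _ in range(max_attempts):
        coords = lookup(query)
        if coords is not None:
            return coords
        time.sleep(pause_seconds)  # optional pause between retries
    return None


# Hypothetical stand-in for the geocoder: fails twice, then succeeds
calls = {"count": 0}


def flaky_lookup(query):
    calls["count"] += 1
    return [51.513308, -0.077762] if calls["count"] >= 3 else None


result = get_latlng_bounded("Aldgate", flaky_lookup)
print(result, calls["count"])
```

With the Geocoder package from this notebook, the lookup could plausibly be passed as `lambda q: geocoder.arcgis(q).latlng`. Callers would then need to handle a `None` result, for example by substituting `[None, None]` before building the coordinates data frame.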

In [7]:
# Call the get_latlng function defined earlier for every London neighbourhood
london_neighbourhoods = london_df['Neighbourhood']

coordinates = [get_latlng(neighbourhood) for neighbourhood in london_neighbourhoods.tolist()]


In [9]:
# The obtained coordinates (latitude and longitude) are joined with the london dataframe
coordinates_df = pd.DataFrame(coordinates, columns = ['Latitude', 'Longitude'])

london_df['Latitude'] = coordinates_df['Latitude']
london_df['Longitude'] = coordinates_df['Longitude']

# Now the london_df data frame has Neighbourhood and Borough enriched with Latitude and Longitude data from Geocoder
london_df

| | Neighbourhood | Borough | Post town | Postcode district | Dial code | OS grid ref | Latitude | Longitude |
|---|---|---|---|---|---|---|---|---|
| 0 | Abbey Wood | Bexley, Greenwich | LONDON | SE2 | 020 | TQ465785 | 51.49245 | 0.12127 |
| 1 | Acton | Ealing, Hammersmith and Fulham | LONDON | W3, W4 | 020 | TQ205805 | 51.51324 | -0.26746 |
| 2 | Addington | Croydon | CROYDON | CR0 | 020 | TQ375645 | 51.428124 | -0.044685 |
| 3 | Addiscombe | Croydon | CROYDON | CR0 | 020 | TQ345665 | 51.472745 | -0.203324 |
| 4 | Albany Park | Bexley | BEXLEY, SIDCUP | DA5, DA14 | 020 | TQ478728 | 51.48511 | -0.08241 |
| 5 | Aldborough Hatch | Redbridge | ILFORD | IG2 | 020 | TQ455895 | 54.09199 | -1.38166 |
| 6 | Aldgate | City | LONDON | EC3 | 020 | TQ334813 | 51.513308 | -0.077762 |
| 7 | Aldwych | Westminster | LONDON | WC2 | 020 | TQ307810 | 51.513307 | -0.117092 |
| 8 | Alperton | Brent | WEMBLEY | HA0 | 020 | TQ185835 | 51.526871 | -0.20644 |
| 9 | Anerley | Bromley | LONDON | SE20 | 020 | TQ345695 | 51.41233 | -0.06539 |
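
Geocoded results are worth a quick plausibility check before they are passed to another API. The sketch below flags coordinates that fall outside a rough bounding box around Greater London. The box limits are approximate values assumed for illustration, and `in_london_bbox` is a hypothetical helper. The three sample rows are copied from the output above.

```python
# Rough bounding box for Greater London (approximate, assumed values)
LAT_MIN, LAT_MAX = 51.28, 51.70
LNG_MIN, LNG_MAX = -0.52, 0.34


def in_london_bbox(lat, lng):
    """Return True if the point lies inside the rough Greater London box."""
    return LAT_MIN <= lat <= LAT_MAX and LNG_MIN <= lng <= LNG_MAX


# (neighbourhood, latitude, longitude) rows taken from the table above
rows = [
    ("Abbey Wood", 51.49245, 0.12127),
    ("Aldborough Hatch", 54.09199, -1.38166),
    ("Aldgate", 51.513308, -0.077762),
]

suspect = [name for name, lat, lng in rows if not in_london_bbox(lat, lng)]
print(suspect)  # ['Aldborough Hatch']
```

A check like this only catches points that land outside London altogether. It cannot detect a result that falls inside the box but in the wrong part of the city.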


##### Dataset 3

The Foursquare API will be used to search for a specific venue category (in our case, cafés and coffee shops) around the coordinates of each London neighbourhood.

Using the Foursquare API to find venues will be part of the week 2 assignment.
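
As a preview of that step, here is a hedged sketch of how a venue search request might be assembled for one neighbourhood. It assumes the Foursquare v2 `venues/search` endpoint and its `ll`, `categoryId`, `radius`, `limit`, `client_id`, `client_secret` and `v` parameters. Foursquare has released newer API versions since, so the current documentation should be checked before relying on it. The function name `build_venue_search_url`, the default values and the placeholder credentials and category ID are hypothetical. The sketch only builds the URL and sends no request.

```python
from urllib.parse import parse_qs, urlencode, urlparse


def build_venue_search_url(lat, lng, category_id, client_id, client_secret,
                           version="20180605", radius=500, limit=50):
    """Build a v2 venues/search URL (assumed endpoint; no request is sent)."""
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "v": version,                       # API version date, YYYYMMDD
        "ll": "{},{}".format(lat, lng),     # latitude,longitude of the neighbourhood
        "categoryId": category_id,          # venue category to filter on
        "radius": radius,                   # search radius in metres
        "limit": limit,                     # maximum number of venues returned
    }
    return "https://api.foursquare.com/v2/venues/search?" + urlencode(params)


# Placeholder credentials and category ID, for illustration only
url = build_venue_search_url(51.513308, -0.077762,
                             "CATEGORY_ID", "CLIENT_ID", "CLIENT_SECRET")
query = parse_qs(urlparse(url).query)
print(query["ll"])  # ['51.513308,-0.077762']
```

Counting the venues returned for each neighbourhood would then give the saturation measure that the business problem calls for.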