<h1> The Battle of the Neighbourhoods</h1>
<p> The aim of this notebook is to outline the proposal for the Coursera Capstone Project, describe the data needed for the project, and present the report itself.</p>

<h2> Introduction </h2>
<p> London is the largest city in the UK and is renowned for its bustling nightlife and its many restaurants, cafés and bistros. For this reason, many people who want to build a successful business wish to move to London, and must come to terms with competing against thousands of other businesses just like their own.</p>

<p> This report has been produced to help someone who is aiming to open a restaurant in the London area work out which London borough would be most suitable. </p> <p> 'Suitability' in this report is defined by the ratio of demand to supply. A new restaurant needs demand, so that the owner gets customers and the business generates profit. However, it is also important to choose an area that is not too densely populated with restaurants similar to the one being opened. Hence, this report determines which London boroughs have restaurants among their top venues, and therefore show demand for places to eat, and also aims to identify which types of restaurant would be most successful in those boroughs.</p>

<h2> Data </h2>
<p> The list of London borough names and their coordinates will be scraped from Wikipedia using BeautifulSoup.</p><p> Foursquare will then be used to identify the top 10 venues for each borough, and the boroughs will be analysed to determine which have the most restaurants among those venues. </p><p> The boroughs with the most restaurants will then be analysed to see which types of restaurant they contain. This can be used to identify a gap in the market, and hence to suggest where someone should open a food place and which type of food it should serve. </p>

<h2> Methodology </h2>

In [1]:
# importing all of the necessary libraries

import requests
from bs4 import BeautifulSoup
import pandas as pd
import numpy as np
import re

In [2]:
# obtaining the url for the wikipedia page to get the name of the boroughs and the coordinates

page = requests.get("https://en.wikipedia.org/wiki/List_of_London_boroughs")
soup = BeautifulSoup(page.content, 'html.parser')

In [3]:
# using Beautiful Soup to scrape the information needed from the Wikipedia page and put it in a pandas dataframe

data = pd.DataFrame(columns=['Borough','Latitude','Longitude'], index=range(32))

# the first <tr> of the first table is the header row, so the data rows start at index 1
rows = soup.findAll('table')[0].tbody.findAll('tr')[1:]

for i, row in enumerate(rows):
    cells = row.findAll('td')

    # borough name: drop the trailing newline, or a trailing 'Note' footnote marker if present
    boro_name = cells[0].text
    if 'Note' in boro_name:
        data.iloc[i,0] = boro_name[:boro_name.find('Note')-1]
    else:
        data.iloc[i,0] = boro_name[:-1]

    # coordinates: latitude follows the second '/', longitude follows the first ';'
    coords = cells[8].text
    lat_start = [m.start() for m in re.finditer(r"/",coords)][1]+2
    lon_start = [m.start() for m in re.finditer(r";",coords)][0]+2
    data.iloc[i,1] = coords[lat_start:lat_start+7]
    data.iloc[i,2] = coords[lon_start:lon_start+7]
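
<p> The fixed-width slicing above depends on the exact layout of the coordinate cell. As a hedged alternative, the sketch below pulls the decimal pair out with a regular expression instead. The helper name <code>parse_decimal_coords</code> and the sample string are assumptions for illustration, modelled on the 'latitude; longitude' pattern that the slicing relies on; they are not the page's confirmed markup.</p>

```python
import re

# Matches a decimal pair such as "51.5607; 0.1557" or "51.6252; -0.1517".
_COORD_PAIR = re.compile(r"(-?\d+(?:\.\d+)?);\s*(-?\d+(?:\.\d+)?)")

def parse_decimal_coords(text):
    """Return (latitude, longitude) as floats, or None if no pair is found.

    Hypothetical helper: it assumes the cell text contains the decimal
    coordinates written as 'lat; lon', as the slicing code implies.
    """
    match = _COORD_PAIR.search(text)
    if match is None:
        return None
    return float(match.group(1)), float(match.group(2))

# Made-up sample in the assumed layout, ending with a stray '\ufeff' character.
sample = "51°33′39″N 0°09′21″E / 51.5607°N 0.1557°E / 51.5607; 0.1557\ufeff"
coords = parse_decimal_coords(sample)
print(coords)
```

<p> Because the pattern captures only digits, sign and decimal point, a trailing <code>'\ufeff'</code> never reaches <code>float()</code>, so no separate clean-up step would be needed under these assumptions.</p>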

In [4]:
# making the coordinates numerical

data['Latitude'] = data['Latitude'].astype(float)
# some longitude strings end with a stray '\ufeff' character; strip it from the
# whole column at once rather than assigning element by element
data['Longitude'] = data['Longitude'].str.rstrip('\ufeff').astype(float)


In [5]:
# looking at the dataframe created, containing the 32 boroughs of London

data

,Borough,Latitude,Longitude
0,Barking and Dagenham,51.5607,0.1557
1,Barnet,51.6252,-0.1517
2,Bexley,51.4549,0.1505
3,Brent,51.5588,-0.2817
4,Bromley,51.4039,0.0198
5,Camden,51.529,-0.1255
6,Croydon,51.3714,-0.0977
7,Ealing,51.513,-0.3089
8,Enfield,51.6538,-0.0799
9,Greenwich,51.4892,0.0648


In [6]:
# The code was removed by Watson Studio for sharing.

In [7]:
# creating a map of all of the boroughs of London

!conda install -c conda-forge folium=0.5.0
import folium

url = 'https://maps.googleapis.com/maps/api/geocode/json?key={}&address=London'.format(API_key)
response = requests.get(url).json()
geographical_data = response['results'][0]['geometry']['location'] # get geographical coordinates
lati =  geographical_data['lat']
long = geographical_data['lng']

map_london = folium.Map(location=[lati, long], zoom_start=10)

# add markers to map
for lat, lng, borough in zip(data['Latitude'], data['Longitude'], data['Borough']):
    label = '{}'.format(borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_london)  
    
map_london

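
<p> If no geocoding key is available, the map could instead be centred on the mean of the borough coordinates. This is a sketch under that assumption and not what the notebook does; <code>centre_of</code> is a hypothetical helper, and the three coordinate pairs are copied from the first rows of the borough dataframe.</p>

```python
from statistics import mean

def centre_of(points):
    """Return the arithmetic mean (lat, lon) of a list of (lat, lon) pairs.

    A plain average is a reasonable approximation only over a small area
    such as a single city.
    """
    lats = [lat for lat, _ in points]
    lons = [lon for _, lon in points]
    return mean(lats), mean(lons)

# Barking and Dagenham, Barnet and Bexley, as listed in the dataframe.
boroughs = [(51.5607, 0.1557), (51.6252, -0.1517), (51.4549, 0.1505)]
centre_lat, centre_lon = centre_of(boroughs)
print(round(centre_lat, 4), round(centre_lon, 4))
```

<p> The result could then be passed to <code>folium.Map</code> as <code>location=[centre_lat, centre_lon]</code>.</p>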


In [8]:
# The code was removed by Watson Studio for sharing.

In [89]:
# creating a function to get the venues near each borough's coordinates from the Foursquare API

def getNearbyVenues(names, latitudes, longitudes, radius=5000):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Borough', 
                  'Borough Latitude', 
                  'Borough Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)
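
<p> The request URL in <code>getNearbyVenues</code> is assembled with <code>str.format</code>. A hedged alternative is to let the standard library encode the query string, which avoids the stray <code>?&amp;</code> and escapes the values. <code>build_explore_url</code> is a hypothetical helper; the endpoint and parameter names are the ones used above, while the credentials, version string and default <code>limit</code> below are placeholder assumptions, since the real values sit in a cell that was removed for sharing.</p>

```python
from urllib.parse import urlencode, urlsplit, parse_qs

EXPLORE_ENDPOINT = "https://api.foursquare.com/v2/venues/explore"

def build_explore_url(client_id, client_secret, version, lat, lng,
                      radius=5000, limit=100):
    """Build the explore URL for one coordinate pair without sending a request."""
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "v": version,
        "ll": "{},{}".format(lat, lng),
        "radius": radius,
        "limit": limit,
    }
    return EXPLORE_ENDPOINT + "?" + urlencode(params)

# Placeholder credentials and version string, for illustration only.
url = build_explore_url("ID", "SECRET", "20180605", 51.5607, 0.1557, limit=10)
query = parse_qs(urlsplit(url).query)
print(query["ll"], query["radius"], query["limit"])
```

<p> Keeping URL construction separate from the request also makes it possible to check the query string without calling the API.</p>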

In [90]:
boro_venues = getNearbyVenues(names=data['Borough'],
                                   latitudes=data['Latitude'],
                                   longitudes=data['Longitude']
                                  )

In [91]:
print('There are {} unique categories.'.format(len(boro_venues['Venue Category'].unique())))

There are 237 unique categories.


In [92]:
# one hot encoding
boro_onehot = pd.get_dummies(boro_venues[['Venue Category']], prefix="", prefix_sep="")

# add Borough column back to dataframe
boro_onehot['Borough'] = boro_venues['Borough'] 

# move Borough column to the first column
fixed_columns = [boro_onehot.columns[-1]] + list(boro_onehot.columns[:-1])
boro_onehot = boro_onehot[fixed_columns]

In [93]:
boro_grouped = boro_onehot.groupby('Borough').mean().reset_index()

In [94]:
boro_grouped.shape

(32, 238)
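
<p> Taking the mean of the one-hot columns per borough gives, for each category, the fraction of that borough's venues that fall in it. The plain-Python sketch below reproduces that calculation on a few made-up rows; the function name <code>category_frequencies</code> and the sample venues are illustrative assumptions, not data from the notebook.</p>

```python
from collections import Counter, defaultdict

def category_frequencies(venues):
    """Map each borough to {category: share of that borough's venues}.

    `venues` is an iterable of (borough, category) pairs. For categories
    that occur, the shares equal the per-borough mean of one-hot columns;
    categories a borough lacks are simply absent rather than stored as 0.
    """
    counts = defaultdict(Counter)
    for borough, category in venues:
        counts[borough][category] += 1
    return {
        borough: {cat: n / sum(cats.values()) for cat, n in cats.items()}
        for borough, cats in counts.items()
    }

# Made-up sample rows, not real Foursquare results.
sample = [
    ("Barnet", "Italian Restaurant"),
    ("Barnet", "Park"),
    ("Barnet", "Italian Restaurant"),
    ("Barnet", "Pub"),
    ("Bexley", "Fast Food Restaurant"),
    ("Bexley", "Supermarket"),
]
freqs = category_frequencies(sample)
print(freqs["Barnet"]["Italian Restaurant"])
```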

<p> Here we display a table ranking the boroughs by the share of their returned venues that are restaurants, from the highest share to the lowest. This shows that Harrow has the highest share of restaurants (0.39).</p><p> However, we need to break this down further in order to see where a restaurant should be opened.</p>

In [101]:
frequencies = []
for boro in boro_grouped['Borough']:
    # transpose the borough's row into a two-column (venue, freq) table
    temp = boro_grouped[boro_grouped['Borough'] == boro].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]  # drop the row that holds the borough name
    temp['freq'] = temp['freq'].astype(float)
    # keep only the categories whose name ends in 'Restaurant' and total their frequencies
    boro_restaurants = temp[temp['venue'].apply(lambda x: x.endswith('Restaurant'))]
    frequencies.append(boro_restaurants['freq'].sum())

restaurants  = pd.DataFrame(index = boro_grouped['Borough'])
restaurants['restaurant_freq'] = frequencies
restaurants.sort_values('restaurant_freq', ascending=False)

Borough,restaurant_freq
Harrow,0.39
Merton,0.32
Brent,0.317647
Haringey,0.29
Hounslow,0.27
Barnet,0.27
Croydon,0.24
Hammersmith and Fulham,0.24
Redbridge,0.231707
Bromley,0.23
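
<p> The <code>restaurant_freq</code> column is the total share of a borough's venues whose category name ends in 'Restaurant'. A minimal sketch of the same measure, computed directly from a list of categories, is given below. The helper <code>restaurant_share</code> and the two sample boroughs are made up for illustration.</p>

```python
def restaurant_share(categories):
    """Fraction of venue categories whose name ends in 'Restaurant'."""
    if not categories:
        return 0.0
    hits = sum(1 for c in categories if c.endswith("Restaurant"))
    return hits / len(categories)

# Made-up venue categories for two hypothetical boroughs.
sample = {
    "Borough A": ["Italian Restaurant", "Pub", "Restaurant", "Park"],
    "Borough B": ["Café", "Indian Restaurant", "Gym", "Hotel", "Bakery"],
}
shares = {boro: restaurant_share(cats) for boro, cats in sample.items()}
ranking = sorted(shares, key=shares.get, reverse=True)
print(shares, ranking)
```

<p> Note that a suffix test of this kind counts only categories named '... Restaurant', so food venues such as cafés, pubs or bakeries are left out of the share.</p>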


<p> Here we break down the restaurants among each borough's top venues by type of restaurant, so that we are able to see where there is a gap in the market. </p>

In [102]:
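# boroughs are listed from the lowest restaurant frequency to the highest
for boro in restaurants.sort_values('restaurant_freq').index: 
    print('-----'+boro+'-----')
    temp = boro_grouped[boro_grouped['Borough'] == boro].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    # use a separate name so the `restaurants` summary table is not overwritten
    boro_restaurants = temp[temp['venue'].apply(lambda x: x.endswith('Restaurant'))]
    # show only the restaurant categories that actually occur in this borough
    rest_nonzero = boro_restaurants[boro_restaurants['freq']!=0.0]
    print(rest_nonzero.sort_values('freq',ascending=False),'\n')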

-----Tower Hamlets-----
                          venue  freq
122          Italian Restaurant  0.03
12        Australian Restaurant  0.01
106         Hawaiian Restaurant  0.01
116           Indian Restaurant  0.01
123         Japanese Restaurant  0.01
141  Modern European Restaurant  0.01
215            Sushi Restaurant  0.01
229       Vietnamese Restaurant  0.01 

-----Bexley-----
                        venue  freq
122        Italian Restaurant  0.05
72       Fast Food Restaurant  0.03
4         American Restaurant  0.01
66         English Restaurant  0.01
99           Greek Restaurant  0.01
135  Mediterranean Restaurant  0.01 

-----Southwark-----
                     venue  freq
183             Restaurant  0.03
116      Indian Restaurant  0.02
122     Italian Restaurant  0.02
215       Sushi Restaurant  0.02
12   Australian Restaurant  0.01
67    Ethiopian Restaurant  0.01
69      Falafel Restaurant  0.01
99        Greek Restaurant  0.01
229  Vietnamese Restaurant  0.01 

-----Gree
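
<p> One possible, and deliberately simple, reading of a 'gap' is a restaurant category that appears in other boroughs but not in the borough of interest. The sketch below illustrates that reading only; <code>missing_categories</code> is a hypothetical helper, the input is a small subset of the frequencies printed above for Tower Hamlets and Bexley, and the absence of a category does not by itself show that there is demand for it.</p>

```python
def missing_categories(freqs_by_borough, borough):
    """Restaurant categories seen in another borough but absent (or 0) here."""
    present_here = {
        cat for cat, freq in freqs_by_borough[borough].items() if freq > 0
    }
    seen_elsewhere = {
        cat
        for other, cats in freqs_by_borough.items() if other != borough
        for cat, freq in cats.items()
        if freq > 0 and cat.endswith("Restaurant")
    }
    return sorted(seen_elsewhere - present_here)

# A subset of the printed frequencies for two boroughs.
sample = {
    "Tower Hamlets": {
        "Italian Restaurant": 0.03,
        "Sushi Restaurant": 0.01,
        "Vietnamese Restaurant": 0.01,
    },
    "Bexley": {
        "Italian Restaurant": 0.05,
        "Fast Food Restaurant": 0.03,
        "Greek Restaurant": 0.01,
    },
}
bexley_gaps = missing_categories(sample, "Bexley")
print(bexley_gaps)
```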

<h2> Results </h2>
<p> From the above analysis, we are able to see that there is a relatively large range in the number of restaurants that are popular venues within each of the boroughs. However, delving deeper into the analysis, the breakdown of the types of restaurant in each borough fails to show which boroughs are saturated with which types of restaurant. Consequently, it is difficult to provide suggestions to a business person as to which borough they should open a restaurant in. </p>
<p> It can be seen from the results that Enfield, Havering and Bexley have the fewest different types of restaurant, although this count excludes the generic 'Restaurant' category, which provides no insight into the type of cuisine.</p>

<h2>Discussion</h2>
<p> From the above, it is unclear what recommendations should be drawn from the report. Although there is a clear difference in the number of restaurants in each of the boroughs, the breakdown by type of restaurant fails to provide a clear path for a potential business person.</p> <p> It could be argued that further analysis of particular neighbourhoods within each borough would provide more insight into specific areas.</p><p> The only firm recommendation that can be made from this report is that further research is needed in order to choose, with confidence, a borough or area in which to open a restaurant.</p>

<h2>  Conclusion </h2>
<p> The main conclusion of this report is that it is difficult to identify gaps in the market for business people to venture into. A lot of data would need to be taken from Foursquare, and without a business account this is difficult to obtain. </p>