<h4>Data reading and processing</h4>

The first step is to import all the necessary packages:

In [133]:
import pandas as pd 
import requests
import json
from pandas import json_normalize  # pandas >= 1.0; the pandas.io.json import path is deprecated

The second step is to download and read the CSV file available at https://www.doogal.co.uk/UKPostcodesCSV.ashx?area=London.

In [8]:
df = pd.read_csv('/Users/riccardo/Downloads/london postcodes.csv')

Let's display the first few lines:

In [12]:
df.head()

Unnamed: 0,Postcode,In Use?,Latitude,Longitude,Easting,Northing,Grid Ref,County,District,Ward,...,Quality,User Type,Last updated,Nearest station,Distance to station,Postcode area,Postcode district,Police force,Water company,Plus Code
0,BR1 1AA,Yes,51.401546,0.015415,540291,168873,TQ402688,Greater London,Bromley,Bromley Town,...,1,0,2019-11-23,Bromley South,0.218257,BR,BR1,Metropolitan Police,Thames Water,9F32C228+J5
1,BR1 1AB,Yes,51.406333,0.015208,540262,169405,TQ402694,Greater London,Bromley,Bromley Town,...,1,0,2019-11-23,Bromley North,0.253666,BR,BR1,Metropolitan Police,Thames Water,9F32C248+G3
2,BR1 1AD,No,51.400057,0.016715,540386,168710,TQ403687,Greater London,Bromley,Bromley Town,...,1,1,2019-11-23,Bromley South,0.044559,BR,BR1,Metropolitan Police,,9F32C228+2M
3,BR1 1AE,Yes,51.404543,0.014195,540197,169204,TQ401692,Greater London,Bromley,Bromley Town,...,1,0,2019-11-23,Bromley North,0.462939,BR,BR1,Metropolitan Police,Thames Water,9F32C237+RM
4,BR1 1AF,Yes,51.401392,0.014948,540259,168855,TQ402688,Greater London,Bromley,Bromley Town,...,1,0,2019-11-23,Bromley South,0.227664,BR,BR1,Metropolitan Police,Thames Water,9F32C227+HX


As you can see, there is a lot of information in this dataset, ranging from the postcode to the police force responsible for the area.

We don't need all of this information, so we need to clean the dataset and keep only the relevant records and columns. First, let's remove the records that refer to postcodes no longer in use.

In [13]:
df2 = df[df['In Use?']=='Yes']

Next, we noticed that a postcode is very fine-grained: it basically refers to a single building, which is too detailed for our purposes. Let's group by 'District' and use the average latitude and longitude as a proxy for a general area in the district.

In [27]:
df3 = df2.groupby('District').mean(numeric_only=True).reset_index()  # numeric_only avoids errors on text columns in recent pandas
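
To make the filter-then-average step easier to check, here is a minimal sketch on a toy table. The postcodes, district names and coordinates are made up for illustration and do not come from the real file; only the column names match the dataset.

```python
import pandas as pd

# Toy stand-in for the postcode file: all values are invented for illustration.
toy = pd.DataFrame({
    "Postcode": ["AA1 1AA", "AA1 1AB", "BB1 1AA", "BB1 1AB"],
    "In Use?": ["Yes", "Yes", "Yes", "No"],
    "District": ["Alpha", "Alpha", "Beta", "Beta"],
    "Latitude": [51.40, 51.42, 51.60, 51.90],
    "Longitude": [0.01, 0.03, -0.20, -0.50],
})

# Keep active postcodes only, then average the coordinates per district.
active = toy[toy["In Use?"] == "Yes"]
centroids = (
    active.groupby("District")[["Latitude", "Longitude"]]
    .mean()
    .reset_index()
)
print(centroids)
```

Selecting the two coordinate columns before calling `mean()` is one way to keep text columns out of the average. Note that the retired postcode in "Beta" does not influence that district's centroid.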

Finally, let's extract only the three columns we are interested in: District, Latitude and Longitude. Let's also display the dataset.

In [135]:
df4= df3[['District','Latitude','Longitude']]

In [136]:
df4

Unnamed: 0,District,Latitude,Longitude
0,Barking and Dagenham,51.546853,0.12662
1,Barnet,51.60868,-0.206189
2,Bexley,51.46,0.135774
3,Brent,51.555335,-0.259383
4,Bromley,51.391456,0.030233
5,Camden,51.536437,-0.14605
6,City of London,51.514622,-0.092233
7,Croydon,51.368931,-0.090923
8,Ealing,51.518961,-0.324769
9,Enfield,51.641387,-0.080402


To check that the dataset is correctly structured, let's extract the coordinates of one district (I will pick Kensington and Chelsea, just because it is one of my favourites).

In [101]:
latitude = df4.loc[19, 'Latitude'] # neighborhood latitude value
longitude = df4.loc[19, 'Longitude'] # neighborhood longitude value
neighborhood_name = df4.loc[19, 'District'] # neighborhood name
search_query ='coffee'
# print the variables defined above (the original cell referenced undefined names)
print('Latitude and longitude values of {} are {}, {}.'.format(neighborhood_name, 
                                                               latitude, 
                                                               longitude))

Latitude and longitude values of Kensington and Chelsea are 51.49935643452255, -0.18864078550057511.
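
The hard-coded index 19 only works as long as the row order never changes. As a rough alternative, the district can be looked up by name. The helper `coordinates_for` below is hypothetical (it is not part of the original notebook), and the small table reuses three rows from the output shown earlier.

```python
import pandas as pd

# Small stand-in for df4, built from three rows of the output shown earlier.
districts = pd.DataFrame({
    "District": ["Barnet", "Bexley", "Brent"],
    "Latitude": [51.60868, 51.46, 51.555335],
    "Longitude": [-0.206189, 0.135774, -0.259383],
})

def coordinates_for(frame, district):
    """Return (latitude, longitude) for a district name (hypothetical helper)."""
    match = frame.loc[frame["District"] == district]
    if match.empty:
        raise KeyError("district not found: {}".format(district))
    row = match.iloc[0]
    return float(row["Latitude"]), float(row["Longitude"])

lat, lon = coordinates_for(districts, "Bexley")
print(lat, lon)
```

With the real data, `coordinates_for(df4, 'Kensington and Chelsea')` should give the same pair as the index-based lookup, assuming the district is spelled exactly as in the file.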


<h4> Connection to Foursquare </h4>

Now it's time to set up the connection with the other data source we need, Foursquare. To do this I need:

 - technical details (client ID, client secret and API version)
 - query details (the query string as you would type it in a browser, the limit on the number of results, the search radius)

In [102]:
CLIENT_ID = '10P0RJRLTA05UIMMWSZH0GQ4NBMPIA1HL0ISZYWLGHVLT0LB' # your Foursquare ID
CLIENT_SECRET = 'ZZ5F1IOT0WSTZUIWOR4OYC3IZEA12ZAEZC5C523BK1CCBWSR' # your Foursquare Secret
VERSION = '20180605' # Foursquare API version
search_query ='coffee'
LIMIT = 30
radius = 500
num_results = 100
print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)

Your credentials:
CLIENT_ID: 10P0RJRLTA05UIMMWSZH0GQ4NBMPIA1HL0ISZYWLGHVLT0LB
CLIENT_SECRET:ZZ5F1IOT0WSTZUIWOR4OYC3IZEA12ZAEZC5C523BK1CCBWSR


Using the above information, we can build the URL to pass to Foursquare:

In [103]:
url = 'https://api.foursquare.com/v2/venues/search?client_id={}&client_secret={}&ll={},{}&v={}&query={}&radius={}&limit={}'.format(CLIENT_ID, CLIENT_SECRET, latitude, longitude, VERSION, search_query, radius, LIMIT)
url

'https://api.foursquare.com/v2/venues/search?client_id=10P0RJRLTA05UIMMWSZH0GQ4NBMPIA1HL0ISZYWLGHVLT0LB&client_secret=ZZ5F1IOT0WSTZUIWOR4OYC3IZEA12ZAEZC5C523BK1CCBWSR&ll=51.49935643452255,-0.18864078550057511&v=20180605&query=coffee&radius=500&limit=30'
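
The URL above is assembled with string formatting. As a sketch of an alternative, the same query string can be built from a dictionary with the standard library, which takes care of escaping. The function name `build_search_url` and the placeholder credentials are my own assumptions; the endpoint and the parameter names are the ones used in the URL above.

```python
from urllib.parse import urlencode

BASE_URL = "https://api.foursquare.com/v2/venues/search"

def build_search_url(client_id, client_secret, lat, lon, query,
                     version="20180605", radius=500, limit=30):
    """Build the venue search URL from its parts (hypothetical helper)."""
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "ll": "{},{}".format(lat, lon),
        "v": version,
        "query": query,
        "radius": radius,
        "limit": limit,
    }
    # safe="," keeps the comma in "ll" readable instead of encoding it as %2C
    return BASE_URL + "?" + urlencode(params, safe=",")

# Placeholder credentials: replace them with your own before calling the API.
example_url = build_search_url("YOUR_CLIENT_ID", "YOUR_CLIENT_SECRET",
                               51.499, -0.189, "coffee")
print(example_url)
```

A query containing spaces or special characters, such as "coffee shop", is escaped automatically this way, which the plain `format` call does not do.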

...and get the results of our query

In [104]:
results = requests.get(url).json()
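
Before using the results it helps to check that the decoded JSON actually contains a venue list. The sketch below assumes only the `response` / `venues` nesting that the analysis code relies on; `extract_venues` is a hypothetical helper and both payloads are invented stand-ins, so no network call is needed to try it.

```python
def extract_venues(payload):
    """Return the venue list from a decoded search response, or [] if absent."""
    if not isinstance(payload, dict):
        return []
    response = payload.get("response")
    if not isinstance(response, dict):
        return []
    venues = response.get("venues")
    return venues if isinstance(venues, list) else []

# Invented payloads: one with venues, one without (e.g. after a failed request).
ok_payload = {"response": {"venues": [{"name": "Starbucks"}, {"name": "Cafe A"}]}}
empty_payload = {"response": {}}

print(len(extract_venues(ok_payload)))
print(extract_venues(empty_payload))
```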

<h4> Main code for the analysis </h4>

The next step is the core of the analysis. The following code loops through all districts, calls Foursquare with the average coordinates computed before, and extracts both the number of coffee shop results and the number of Starbucks coffee shops in each district.

A few important remarks:

- the query to Foursquare is set up as above
- I inserted a try/except because I noticed that for some districts I sometimes got no answer
- the JSON results need to be cleaned and restructured using json_normalize
- the loop uses a continue statement to skip all districts that return 0 results for the query 'coffee'
- I am displaying the results using simple print statements
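
The flattening and counting described in the remarks above can be tried offline on a small hand-made venue list. The venue names and coordinates below are invented, and the nested `location` field is only an assumption about what a venue record may contain. The notebook's own match is case-sensitive, while this sketch ignores case.

```python
import pandas as pd

# Invented venue records that imitate the nested shape of an API result.
venues = [
    {"name": "Starbucks", "location": {"lat": 51.50, "lng": -0.19}},
    {"name": "Example Espresso Bar", "location": {"lat": 51.50, "lng": -0.18}},
    {"name": "Starbucks Coffee", "location": {"lat": 51.49, "lng": -0.19}},
]

# json_normalize flattens nested fields into dotted column names.
table = pd.json_normalize(venues)

# Case-insensitive match on the venue name; missing names count as no match.
is_starbucks = table["name"].str.contains("Starbucks", case=False, na=False)
starbucks_count = int(is_starbucks.sum())
total_count = len(table)
print(starbucks_count, total_count)
```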

In [132]:
CLIENT_ID = '10P0RJRLTA05UIMMWSZH0GQ4NBMPIA1HL0ISZYWLGHVLT0LB' # your Foursquare ID
CLIENT_SECRET = 'ZZ5F1IOT0WSTZUIWOR4OYC3IZEA12ZAEZC5C523BK1CCBWSR' # your Foursquare Secret
VERSION = '20180605' # Foursquare API version
LIMIT = 30

radius = 500
num_results = 100
search_query ='coffee'
print('Number of Starbucks registered on Foursquare in:')
print('')
for i in df4.index:
    latitude = df4.loc[i, 'Latitude'] # neighborhood latitude value
    longitude = df4.loc[i, 'Longitude'] # neighborhood longitude value
    neighborhood_name = df4.loc[i, 'District'] # neighborhood name
    url = 'https://api.foursquare.com/v2/venues/search?client_id={}&client_secret={}&ll={},{}&v={}&query={}&radius={}&limit={}'.format(CLIENT_ID, CLIENT_SECRET, latitude, longitude, VERSION, search_query, radius, LIMIT)
    try:
        # a failed request or an unexpected payload skips this district
        results = requests.get(url).json()
        venues = results['response']['venues']
    except Exception:
        continue
    dataframe = json_normalize(venues)
    if len(dataframe)<1:
        continue  # no results for the query 'coffee'
    starbucks = dataframe[dataframe['name'].str.contains("Starbucks")]
    print(' {},  {} Starbucks out of {} coffee shops.'.format(neighborhood_name, len(starbucks), len(dataframe)))


Number of Starbucks registered on Foursquare in:

 Camden,  2 Starbucks out of 19 coffee shops.
 City of London,  2 Starbucks out of 30 coffee shops.
 Croydon,  0 Starbucks out of 1 coffee shops.
 Greenwich,  0 Starbucks out of 1 coffee shops.
 Hackney,  0 Starbucks out of 5 coffee shops.
 Hammersmith and Fulham,  0 Starbucks out of 5 coffee shops.
 Haringey,  0 Starbucks out of 3 coffee shops.
 Harrow,  0 Starbucks out of 3 coffee shops.
 Hounslow,  0 Starbucks out of 2 coffee shops.
 Islington,  1 Starbucks out of 8 coffee shops.
 Kensington and Chelsea,  4 Starbucks out of 8 coffee shops.
 Lambeth,  0 Starbucks out of 3 coffee shops.
 Lewisham,  0 Starbucks out of 1 coffee shops.
 Merton,  0 Starbucks out of 3 coffee shops.
 Newham,  0 Starbucks out of 2 coffee shops.
 Sutton,  0 Starbucks out of 1 coffee shops.
 Tower Hamlets,  0 Starbucks out of 1 coffee shops.
 Waltham Forest,  0 Starbucks out of 2 coffee shops.
 Westminster,  1 Starbucks out of 30 coffee shops.


<h4> Conclusions </h4>

I will be brief and list my conclusions in a series of bullet points:

- The districts with the most coffee shops are the City of London and Westminster (30 results each, which is the LIMIT set in the query, so the real numbers may be higher)
- The district with the highest number of Starbucks is Kensington and Chelsea (4 out of 8 results)
- Choosing the best district for a new independent coffee shop is hard because:
    - Foursquare does not seem to have that many results for London
    - The number of Starbucks is not necessarily a good indicator of customers' taste
    - Out of 32 districts, many return no results, which seems suspicious
- All in all, however, we have some suggestions:
    - A new independent coffee shop should avoid central London postcodes (City, Westminster)
    - The best pick is a not-too-peripheral area where the number of Starbucks is limited but the number of coffee shops is relatively high, which indicates a potential client base
    - My choice would therefore be Islington, as it has all these characteristics
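
To compare districts on more than raw counts, one option is the share of Starbucks among the coffee shops returned. The sketch below copies the counts from the printed results above. The cutoff of at least 5 results is an arbitrary assumption of mine, used only to drop districts where a share would be meaningless.

```python
# (district, starbucks, coffee_shops) copied from the printed results above;
# only districts with at least 5 results are kept (arbitrary cutoff).
counts = [
    ("Camden", 2, 19),
    ("City of London", 2, 30),
    ("Hackney", 0, 5),
    ("Hammersmith and Fulham", 0, 5),
    ("Islington", 1, 8),
    ("Kensington and Chelsea", 4, 8),
    ("Westminster", 1, 30),
]

# Share of Starbucks among the returned coffee shops, highest first.
shares = sorted(((s / n, d) for d, s, n in counts), reverse=True)
for share, district in shares:
    print("{}: {:.1%}".format(district, share))
```

With so few results per district these shares are only indicative, but they put the Islington pick next to the alternatives on a single scale.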