# Newcastle Capstone Project: Find the best region in Newcastle Upon Tyne to set up a restaurant delivery serviceÂ¶
By Charlie Witty

## 1. Load In all of the data from the various sources

## 1.1 Newcastle Upon Tyne areas broken down by Postcode

In [1]:
# Import Necessary libraries
import numpy as np
import pandas as pd
from bs4 import BeautifulSoup
import requests

data = pd.read_csv("https://raw.githubusercontent.com/cwitty255/Coursera_Capstone/main/income_by_postcode.csv", error_bad_lines= False,encoding = 'unicode_escape')


# Preview the first 5 lines of the loaded data 
data.head()

Unnamed: 0,PostalCode,PostTown,NetIncome,Population,Coverage,Local Authority Area
0,NE1,NEWCASTLE UPON TYNE,31400,174894,City Centre,Newcastle upon Tyne
1,NE2,NEWCASTLE UPON TYNE,39400,296275,"Jesmond,Â Spital Tongues",Newcastle upon Tyne
2,NE3,NEWCASTLE UPON TYNE,24800,275023,"Gosforth,Â Fawdon,Â Kingston Park,Â Kenton",Newcastle upon Tyne
3,NE4,NEWCASTLE UPON TYNE,25100,109832,"Fenham,Â Arthurs Hill,Â Elswick,Â Wingrove,Â Benwell",Newcastle upon Tyne
4,NE5,NEWCASTLE UPON TYNE,39000,48390,"Blakelaw,Â Cowgate,Â DentonÂ andÂ Westerhope,",Newcastle upon Tyne


## 1.1.1 Newcastle Upon Tyne geospatial cooridinates merged to Newcastle Upon Tyne Postal Code Data

In [2]:
# Load and Read in Newcastle Upon Tyne geospatial cooridinates
  
geo_data = pd.read_csv("https://raw.githubusercontent.com/cwitty255/Coursera_Capstone/main/geo_data.csv", error_bad_lines= False,encoding = 'unicode_escape')

# Merge the Newcastle Upon Tyne data with geo cooridinate data
geo_data2 = pd.merge(data, geo_data, on='PostalCode', how='inner')

# display the new dataframe
geo_data2.head()

Unnamed: 0,PostalCode,PostTown,NetIncome,Population,Coverage,Local Authority Area,Latitude,Longitude
0,NE1,NEWCASTLE UPON TYNE,31400,174894,City Centre,Newcastle upon Tyne,54.967722,-1.615787
1,NE2,NEWCASTLE UPON TYNE,39400,296275,"Jesmond,Â Spital Tongues",Newcastle upon Tyne,54.97565,-1.597167
2,NE3,NEWCASTLE UPON TYNE,24800,275023,"Gosforth,Â Fawdon,Â Kingston Park,Â Kenton",Newcastle upon Tyne,55.004963,-1.619512
3,NE4,NEWCASTLE UPON TYNE,25100,109832,"Fenham,Â Arthurs Hill,Â Elswick,Â Wingrove,Â Benwell",Newcastle upon Tyne,54.975084,-1.640244
4,NE5,NEWCASTLE UPON TYNE,39000,48390,"Blakelaw,Â Cowgate,Â DentonÂ andÂ Westerhope,",Newcastle upon Tyne,54.991248,-1.713915


## 1.2 National Average Income in the UK after Tax

- Here we must also manually download this from Statista and load them. https://www.statista.com/statistics/1002964/average-full-time-annual-earnings-in-the-uk/ British families and individuals had a median after-tax income of Â£31,461 in 2020

## 1.3 List of Restaurants and Venues in Newcastle Upon Tyne which we can target for our delivery service

In [3]:
# At this point you will need to load your FourSquare Credentials

CLIENT_ID = 'XI4QDFVYVPSUYL3YDXWLM5RTJY3QSOC5Z4RGWSVDRHLFOKM5' # Foursquare ID

CLIENT_SECRET = 'DNOKVOIS2DKMZPF1T4NNPXYFB1REGHZ5MGFLMXQCJGYQFXFE' # Foursquare Secret

VERSION = '20180605' # API version

In [4]:
# Within our dataframe we want to start exploring the available areas

# Import pandas library to transform JSON files into a dataframe and handle requests

from pandas.io.json import json_normalize
import requests 

# Define radius covered by the API and a limit for the number of venues which are returned

radius = 500
VENUE_LIMIT = 200

In [5]:
def get_venues_near(names, latitudes, longitudes, radius=500):
    
    venues=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # Define API request URL then call a GET request
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, CLIENT_SECRET, VERSION, lat, lng, radius, VENUE_LIMIT)
        
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # Specify relevant information for each nearby venue to be returned
        venues.append([(name, lat, lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    near_venues = pd.DataFrame([item for venue_list in venues for item in venue_list])
    near_venues.columns = ['Coverage', 
                  'Coverage Latitude', 
                  'Coverage Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(near_venues)

In [6]:
# Newcastle Upon Tyne Areas
NE_data = geo_data2
NE_data.head()

Unnamed: 0,PostalCode,PostTown,NetIncome,Population,Coverage,Local Authority Area,Latitude,Longitude
0,NE1,NEWCASTLE UPON TYNE,31400,174894,City Centre,Newcastle upon Tyne,54.967722,-1.615787
1,NE2,NEWCASTLE UPON TYNE,39400,296275,"Jesmond,Â Spital Tongues",Newcastle upon Tyne,54.97565,-1.597167
2,NE3,NEWCASTLE UPON TYNE,24800,275023,"Gosforth,Â Fawdon,Â Kingston Park,Â Kenton",Newcastle upon Tyne,55.004963,-1.619512
3,NE4,NEWCASTLE UPON TYNE,25100,109832,"Fenham,Â Arthurs Hill,Â Elswick,Â Wingrove,Â Benwell",Newcastle upon Tyne,54.975084,-1.640244
4,NE5,NEWCASTLE UPON TYNE,39000,48390,"Blakelaw,Â Cowgate,Â DentonÂ andÂ Westerhope,",Newcastle upon Tyne,54.991248,-1.713915


### 1.3.1 Retrieve all venues in Newcastle Upon Tyne

In [7]:
# Call all venues
NE_venues = get_venues_near(names=NE_data['Coverage'], latitudes=NE_data['Latitude'],longitudes=NE_data['Longitude'])

City Centre
Jesmond,Â Spital Tongues
Gosforth,Â Fawdon,Â Kingston Park,Â Kenton
Fenham,Â Arthurs Hill,Â Elswick,Â Wingrove,Â Benwell
Blakelaw,Â Cowgate,Â DentonÂ andÂ Westerhope,
Walker,Â Byker,Â Heaton,Â Walkergate
Heaton, Benton
Gateshead,Â Bensham
Low Fell, Springwell
Felling, Whitehills Estate,Â Leam Lane,Â Pelaw,Â Bill Quay
Dunston, Metro Centre, Team Valley,Â Kibblesworth
Killingworth,Â Longbenton
Airport, Wideopen, Dinnington, Great Park (West),Â Woolsington
Whickham, Sunniside,Â Burnopfield
Chopwell, Western Chopwell Wood
Stamfordham
Byrness, Otterburn
Ponteland
Blaydon, Winlaton
Bedlington, Hartford Bridge
Cramlington,Â Seghill
Blyth, Newsham,Â Cowpen,Â Cambois
Monkseaton,Â Earsdon,Â New Hartley, Holywell,Â Seaton Delaval
Whitley Bay,Â Seaton Sluice
Shiremoor,Â West Allotment, Backworth, Holystone, Murton Village
Battle Hill,Â Willington,Â Wallsend, North Tyne Tunnel
North Shields, Royal Quays, Billy Mill,Â New York
Marden, Tynemouth,Â Cullercoats
Hebburn
Jarrow, Fellgate, Sou

In [8]:
NE_venues.groupby('Coverage').count()

Unnamed: 0_level_0,Coverage Latitude,Coverage Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Coverage,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
"Acomb,Â Hexhamshire",20,20,20,20,20,20
"Airport, Wideopen, Dinnington, Great Park (West),Â Woolsington",5,5,5,5,5,5
BT Group[6],12,12,12,12,12,12
"Battle Hill,Â Willington,Â Wallsend, North Tyne Tunnel",2,2,2,2,2,2
"Bedlington, Hartford Bridge",8,8,8,8,8,8
"Blakelaw,Â Cowgate,Â DentonÂ andÂ Westerhope,",4,4,4,4,4,4
"Blyth, Newsham,Â Cowpen,Â Cambois",6,6,6,6,6,6
Boldon Colliery,3,3,3,3,3,3
"Byrness, Otterburn",1,1,1,1,1,1
"Chopwell, Western Chopwell Wood",4,4,4,4,4,4


In [9]:
# Select restaurants from Venue Categories

print('Distinctive Venue Categories:')
list(NE_venues['Venue Category'].unique())

Distinctive Venue Categories:


['Coffee Shop',
 'Pub',
 'Burrito Place',
 'Indian Restaurant',
 'Performing Arts Venue',
 'Beer Bar',
 'Bar',
 'Hotel',
 'Pizza Place',
 'Thai Restaurant',
 'Fried Chicken Joint',
 'Music Venue',
 'Comic Shop',
 'English Restaurant',
 'Noodle House',
 'Steakhouse',
 'Ice Cream Shop',
 'Greek Restaurant',
 'Cocktail Bar',
 'Science Museum',
 'Bubble Tea Shop',
 'Bowling Alley',
 'Asian Restaurant',
 'Opera House',
 'Gay Bar',
 'Hookah Bar',
 'Italian Restaurant',
 'Furniture / Home Store',
 'Turkish Restaurant',
 'Dance Studio',
 'Nightclub',
 'Plaza',
 'Sports Bar',
 'Art Gallery',
 'Indie Movie Theater',
 'Museum',
 'Lounge',
 'Farm',
 'Fast Food Restaurant',
 'Grocery Store',
 'Skate Park',
 'CafÃ©',
 'Seafood Restaurant',
 'Supermarket',
 'Park',
 'Pharmacy',
 'Deli / Bodega',
 'Stationery Store',
 'Bed & Breakfast',
 'Sandwich Place',
 'Fish & Chips Shop',
 'Middle Eastern Restaurant',
 'Athletics & Sports',
 'Burger Joint',
 'Bus Stop',
 'Restaurant',
 'Chinese Restaurant',
 'Cli

## 1.3.2 Specifically add Restaurants as Venue Categories

In [10]:
# Here I have manually selected restaurants as features from the distinctive venue list which will be used for clustering similiarity # Note I have listed this information alphabetically for convenience

restaurant_list = ['Afghan Restaurant', 'Airport Food Court''American Restaurant', 'Arepa Restaurant', 'Asian Restaurant', 'Bar', 'Belgian Restaurant',
                   'Bistro', 'Brazilian Restaurant Gluten-free Restaurant', 'Breakfast Spot', 'Burger Joint', 'Burrito Place', 'CafÃ©', 'Cajun / Creole Restaurant',
                   'Caribbean Restaurant', 'Chinese Restaurant', 'Coffee Shop', 'Comfort Food Restaurant', 'Creperie', 'Cuban Restaurant', 'Deli / Bodega',
                   'Dim Sum Restaurant', 'Diner', 'Doner Restaurant', 'Dumpling Restaurant', 'Eastern European Restaurant', 'Empanada Restaurant',
                   'Ethiopian Restaurant', 'Falafel Restaurant', 'Fast Food Restaurant', 'Filipino Restaurant', 'Fish & Chips Shop', 'Food', 'Food & Drink Shop',
                   'Food Court', 'Food Truck', 'French Restaurant', 'Fried Chicken Joint', 'Gastropub', 'German Restaurant', 'Gourmet Shop', 'Greek Restaurant',
                   'Hakka Restaurant', 'Hotpot Restaurant', 'Ice Cream Shop', 'Indian Restaurant', 'Indonesian Restaurant', 'Irish Pub', 'Japanese Restaurant',
                   'Jewish Restaurant', 'Korean Restaurant', 'Latin American Restaurant', 'Mac & Cheese Joint', 'Malay Restaurant', 'Mediterranean Restaurant BBQ Joint',
                   'Mexican Restaurant', 'Middle Eastern Restaurant', 'Modern European Restaurant', 'Molecular Gastronomy Restaurant', 'Molecular Gastronomy Restaurant',
                   'New American Restaurant', 'Noodle House', 'Persian Restaurant', 'Pizza Place', 'Polish Restaurant', 'Portuguese Restaurant', 'Poutine Place',
                   'Pub Italian Restaurant', 'Ramen Restaurant', 'Restaurant', 'Sake Bar', 'Salad Place', 'Sandwich Place', 'Seafood Restaurant', 'Snack Place',
                   'Soup Place', 'South American Restaurant', 'Southern / Soul Food Restaurant', 'Sports Bar', 'Steakhouse', 'Sushi Restaurant', 'Taco Place',
                   'Taiwanese Restaurant', 'Tapas Restaurant', 'Thai Restaurant', 'Vegetarian / Vegan Restaurant', 'Vietnamese Restaurant', 'Wings Joint']


# Rename the columns so that they are in line with the naming scheme
# Join the dataframes

restaurant_pd = pd.DataFrame(restaurant_list)
restaurant_pd = restaurant_pd.rename(columns={0:'Venue Category'})
NE_new = pd.merge(NE_venues, restaurant_pd, on='Venue Category', how='right')

# Display the Dataframe using the count method 

NE_new.groupby('Coverage').count()


Unnamed: 0_level_0,Coverage Latitude,Coverage Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Coverage,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
"Acomb,Â Hexhamshire",5,5,5,5,5,5
BT Group[6],3,3,3,3,3,3
"Battle Hill,Â Willington,Â Wallsend, North Tyne Tunnel",1,1,1,1,1,1
"Bedlington, Hartford Bridge",2,2,2,2,2,2
"Blakelaw,Â Cowgate,Â DentonÂ andÂ Westerhope,",2,2,2,2,2,2
"Chopwell, Western Chopwell Wood",2,2,2,2,2,2
City Centre,18,18,18,18,18,18
Department for Work and Pensions (Central Office)[6],2,2,2,2,2,2
Department for Work and PensionsÂ (Earlsway)[6],1,1,1,1,1,1
"Dunston, Metro Centre, Team Valley,Â Kibblesworth",1,1,1,1,1,1


## 1.3.3 Count restaurants and Implement OneHot encoding

In [11]:
# one hot encoding
NE_new_onehot = pd.get_dummies(NE_new[['Venue Category']], prefix="", prefix_sep="")

# add Coverage column back to dataframe
NE_new_onehot['Coverage'] = NE_new['Coverage'] 

# move Coverage column to the first column
fixed_columns = [NE_new_onehot.columns[-1]] + list(NE_new_onehot.columns[:-1])
NE_new_onehot = NE_new_onehot[fixed_columns]

NE_new_onehot.head()

Unnamed: 0,Coverage,Afghan Restaurant,Airport Food CourtAmerican Restaurant,Arepa Restaurant,Asian Restaurant,Bar,Belgian Restaurant,Bistro,Brazilian Restaurant Gluten-free Restaurant,Breakfast Spot,...,Sports Bar,Steakhouse,Sushi Restaurant,Taco Place,Taiwanese Restaurant,Tapas Restaurant,Thai Restaurant,Vegetarian / Vegan Restaurant,Vietnamese Restaurant,Wings Joint
0,City Centre,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,City Centre,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,"Gosforth,Â Fawdon,Â Kingston Park,Â Kenton",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,"Gosforth,Â Fawdon,Â Kingston Park,Â Kenton",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,"Gateshead,Â Bensham",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [12]:
# Implement an analysis of each coverage area
# SHow first 5 items from the tuple of array dimensions.
NE_new_grouped = NE_new_onehot.groupby('Coverage').mean().reset_index()
NE_new_grouped.shape
NE_new_grouped.head()

Unnamed: 0,Coverage,Afghan Restaurant,Airport Food CourtAmerican Restaurant,Arepa Restaurant,Asian Restaurant,Bar,Belgian Restaurant,Bistro,Brazilian Restaurant Gluten-free Restaurant,Breakfast Spot,...,Sports Bar,Steakhouse,Sushi Restaurant,Taco Place,Taiwanese Restaurant,Tapas Restaurant,Thai Restaurant,Vegetarian / Vegan Restaurant,Vietnamese Restaurant,Wings Joint
0,"Acomb,Â Hexhamshire",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,BT Group[6],0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,"Battle Hill,Â Willington,Â Wallsend, North Tyne ...",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,"Bedlington, Hartford Bridge",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,"Blakelaw,Â Cowgate,Â DentonÂ andÂ Westerhope,",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


# 2. Clustering

## 2.1 Use silhouette score to find optimal number of clusters to segment the data

In [13]:
# Import libraries from sklearn for KMeans and Silhouette Score
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score # For interpreting and validating consistency within clusters of data. 
import numpy as np

# Silhouette score is used to find ideal number of clusters for segmenting data

NE_grouped_clustering = NE_new_grouped.drop('Coverage', 1)

kclusters = np.arange(2,10)
k_results = {}
for size in kclusters:
    model = KMeans(n_clusters = size).fit(NE_grouped_clustering)
    predictions = model.predict(NE_grouped_clustering)
    k_results[size] = silhouette_score(NE_grouped_clustering, predictions)

optimal_size = max(k_results, key=k_results.get)
optimal_size

7

## 2.2 Apply K-Means, segment data into clusters and generate labels

In [14]:
kclusters = optimal_size

# run k-means clustering
kmeans = KMeans(n_clusters=optimal_size, random_state=0).fit(NE_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10]

array([1, 1, 6, 1, 2, 1, 1, 4, 0, 2], dtype=int32)

In [15]:
def most_common_venues(row, num_top_venues):
    row_cat = row.iloc[1:]
    row_cat_sorted = row_cat.sort_values(ascending=False)
    return row_cat_sorted.index.values[0:num_top_venues]

num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Coverage']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
coverage_venues_sorted = pd.DataFrame(columns=columns)
coverage_venues_sorted['Coverage'] = NE_new_grouped['Coverage']

for ind in np.arange(NE_new_grouped.shape[0]):
    coverage_venues_sorted.iloc[ind, 1:] = most_common_venues(NE_new_grouped.iloc[ind, :], num_top_venues)

coverage_venues_sorted.head()

Unnamed: 0,Coverage,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,"Acomb,Â Hexhamshire",Coffee Shop,Pizza Place,CafÃ©,Doner Restaurant,Dumpling Restaurant,Eastern European Restaurant,Empanada Restaurant,Ethiopian Restaurant,Falafel Restaurant,Fast Food Restaurant
1,BT Group[6],Coffee Shop,Restaurant,Wings Joint,Diner,Dumpling Restaurant,Eastern European Restaurant,Empanada Restaurant,Ethiopian Restaurant,Falafel Restaurant,Fast Food Restaurant
2,"Battle Hill,Â Willington,Â Wallsend, North Tyne ...",Gastropub,Wings Joint,Food,Dumpling Restaurant,Eastern European Restaurant,Empanada Restaurant,Ethiopian Restaurant,Falafel Restaurant,Fast Food Restaurant,Filipino Restaurant
3,"Bedlington, Hartford Bridge",Coffee Shop,Fast Food Restaurant,Wings Joint,Diner,Dumpling Restaurant,Eastern European Restaurant,Empanada Restaurant,Ethiopian Restaurant,Falafel Restaurant,Filipino Restaurant
4,"Blakelaw,Â Cowgate,Â DentonÂ andÂ Westerhope,",Restaurant,Chinese Restaurant,Wings Joint,Diner,Dumpling Restaurant,Eastern European Restaurant,Empanada Restaurant,Ethiopian Restaurant,Falafel Restaurant,Fast Food Restaurant


## 2.3 Merge Newcastle Upon Tyne data with Long/Lat data

In [16]:
#Merge Newcastle Upon Tyne data with Long/Lat data
NE_labels = pd.merge(NE_data,NE_new_grouped, on='Coverage', how='right')
NE_labels.shape


NE_labels = NE_labels.drop(columns=['Afghan Restaurant', 'Airport Food Court''American Restaurant', 'Arepa Restaurant', 'Asian Restaurant', 'Bar', 'Belgian Restaurant',
                   'Bistro', 'Brazilian Restaurant Gluten-free Restaurant', 'Breakfast Spot', 'Burger Joint', 'Burrito Place', 'CafÃ©', 'Cajun / Creole Restaurant',
                   'Caribbean Restaurant', 'Chinese Restaurant', 'Coffee Shop', 'Comfort Food Restaurant', 'Creperie', 'Cuban Restaurant', 'Deli / Bodega',
                   'Dim Sum Restaurant', 'Diner', 'Doner Restaurant', 'Dumpling Restaurant', 'Eastern European Restaurant', 'Empanada Restaurant',
                   'Ethiopian Restaurant', 'Falafel Restaurant', 'Fast Food Restaurant', 'Filipino Restaurant', 'Fish & Chips Shop', 'Food', 'Food & Drink Shop',
                   'Food Court', 'Food Truck', 'French Restaurant', 'Fried Chicken Joint', 'Gastropub', 'German Restaurant', 'Gourmet Shop', 'Greek Restaurant',
                   'Hakka Restaurant', 'Hotpot Restaurant', 'Ice Cream Shop', 'Indian Restaurant', 'Indonesian Restaurant', 'Irish Pub', 'Japanese Restaurant',
                   'Jewish Restaurant', 'Korean Restaurant', 'Latin American Restaurant', 'Mac & Cheese Joint', 'Malay Restaurant', 'Mediterranean Restaurant BBQ Joint',
                   'Mexican Restaurant', 'Middle Eastern Restaurant', 'Modern European Restaurant', 'Molecular Gastronomy Restaurant', 'Molecular Gastronomy Restaurant',
                   'New American Restaurant', 'Noodle House', 'Persian Restaurant', 'Pizza Place', 'Polish Restaurant', 'Portuguese Restaurant', 'Poutine Place',
                   'Pub Italian Restaurant', 'Ramen Restaurant', 'Restaurant', 'Sake Bar', 'Salad Place', 'Sandwich Place', 'Seafood Restaurant', 'Snack Place',
                   'Soup Place', 'South American Restaurant', 'Southern / Soul Food Restaurant', 'Sports Bar', 'Steakhouse', 'Sushi Restaurant', 'Taco Place',
                   'Taiwanese Restaurant', 'Tapas Restaurant', 'Thai Restaurant', 'Vegetarian / Vegan Restaurant', 'Vietnamese Restaurant', 'Wings Joint'])
NE_labels.head()

Unnamed: 0,PostalCode,PostTown,NetIncome,Population,Coverage,Local Authority Area,Latitude,Longitude
0,NE1,NEWCASTLE UPON TYNE,31400,174894,City Centre,Newcastle upon Tyne,54.967722,-1.615787
1,NE2,NEWCASTLE UPON TYNE,39400,296275,"Jesmond,Â Spital Tongues",Newcastle upon Tyne,54.97565,-1.597167
2,NE3,NEWCASTLE UPON TYNE,24800,275023,"Gosforth,Â Fawdon,Â Kingston Park,Â Kenton",Newcastle upon Tyne,55.004963,-1.619512
3,NE4,NEWCASTLE UPON TYNE,25100,109832,"Fenham,Â Arthurs Hill,Â Elswick,Â Wingrove,Â Benwell",Newcastle upon Tyne,54.975084,-1.640244
4,NE5,NEWCASTLE UPON TYNE,39000,48390,"Blakelaw,Â Cowgate,Â DentonÂ andÂ Westerhope,",Newcastle upon Tyne,54.991248,-1.713915


## 2.4 Adding the K-Means labels

In [17]:
NE_merged = NE_labels

# Add clustering labels
NE_merged['Cluster Labels'] = kmeans.labels_

# merge the grouped NE data with NE data to introduce the lat/long for all overage
NE_merged = NE_merged.join(coverage_venues_sorted.set_index('Coverage'), on='Coverage')

NE_merged.head()

Unnamed: 0,PostalCode,PostTown,NetIncome,Population,Coverage,Local Authority Area,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,NE1,NEWCASTLE UPON TYNE,31400,174894,City Centre,Newcastle upon Tyne,54.967722,-1.615787,1,Bar,Indian Restaurant,Fried Chicken Joint,Coffee Shop,Burrito Place,Steakhouse,Pizza Place,Ice Cream Shop,Sports Bar,Noodle House
1,NE2,NEWCASTLE UPON TYNE,39400,296275,"Jesmond,Â Spital Tongues",Newcastle upon Tyne,54.97565,-1.597167,1,Bar,Fast Food Restaurant,Wings Joint,Food,Dumpling Restaurant,Eastern European Restaurant,Empanada Restaurant,Ethiopian Restaurant,Falafel Restaurant,Filipino Restaurant
2,NE3,NEWCASTLE UPON TYNE,24800,275023,"Gosforth,Â Fawdon,Â Kingston Park,Â Kenton",Newcastle upon Tyne,55.004963,-1.619512,6,CafÃ©,Coffee Shop,Fish & Chips Shop,Sandwich Place,Seafood Restaurant,Deli / Bodega,Indian Restaurant,Bar,Filipino Restaurant,Eastern European Restaurant
3,NE4,NEWCASTLE UPON TYNE,25100,109832,"Fenham,Â Arthurs Hill,Â Elswick,Â Wingrove,Â Benwell",Newcastle upon Tyne,54.975084,-1.640244,1,Indian Restaurant,Pizza Place,Middle Eastern Restaurant,Fast Food Restaurant,Burger Joint,Fish & Chips Shop,Dumpling Restaurant,Eastern European Restaurant,Empanada Restaurant,Ethiopian Restaurant
4,NE5,NEWCASTLE UPON TYNE,39000,48390,"Blakelaw,Â Cowgate,Â DentonÂ andÂ Westerhope,",Newcastle upon Tyne,54.991248,-1.713915,2,Restaurant,Chinese Restaurant,Wings Joint,Diner,Dumpling Restaurant,Eastern European Restaurant,Empanada Restaurant,Ethiopian Restaurant,Falafel Restaurant,Fast Food Restaurant


In [18]:
NE_merged_new1 = NE_merged.loc[NE_merged['Cluster Labels'] == 0, NE_merged.columns[[3, 4] + list(range(5, NE_merged.shape[1]))]]
NE_merged_new1.shape

(7, 16)

In [19]:

NE_merged_new2 = NE_merged.loc[NE_merged['Cluster Labels'] == 1, NE_merged.columns[[3, 4] + list(range(5, NE_merged.shape[1]))]]
NE_merged_new2.shape

(14, 16)

# 3. To identify the optimum location for our delivery service we need to find the geographic center for the cluster. The second cluster has the highest cluster density so we will use this one

In [20]:
# Geographic center of the clust which is most dense
Cluster_0_id = NE_merged_new2[['Latitude', 'Longitude']]
Cluster_0_id = list(Cluster_0_id.values) 
lat = []
long = []

for l in Cluster_0_id:
  lat.append(l[0])
  long.append(l[1])

secondary_latitude = sum(lat)/len(lat)
secondary_longitude = sum(long)/len(long)
print(secondary_latitude)
print(secondary_longitude)

55.0105722857143
-1.6759753


In [21]:
# Using the pip installer, we will use opencage (a 3rd party api to lookup what location a set of coordinates relate to)pip install opencage
!pip install opencage
from opencage.geocoder import OpenCageGeocode
from pprint import pprint

Collecting opencage
  Downloading opencage-1.2.2-py3-none-any.whl (6.1 kB)
Collecting backoff>=1.10.0
  Downloading backoff-1.10.0-py2.py3-none-any.whl (31 kB)
Installing collected packages: backoff, opencage
Successfully installed backoff-1.10.0 opencage-1.2.2


In [22]:
key = '95e75ae7122e4ecbb3c74728dcff3067'
geocoder = OpenCageGeocode(key)

results = geocoder.reverse_geocode(secondary_latitude, secondary_longitude)
pprint(results)

[{'annotations': {'DMS': {'lat': "55Â° 0' 38.29608'' N",
                          'lng': "1Â° 40' 33.45168'' W"},
                  'MGRS': '30UWF8467196776',
                  'Maidenhead': 'IO95da82vn',
                  'Mercator': {'x': -186566.879, 'y': 7328898.0},
                  'OSM': {'edit_url': 'https://www.openstreetmap.org/edit?way=159152726#map=17/55.01064/-1.67596',
                          'note_url': 'https://www.openstreetmap.org/note/new#map=17/55.01064/-1.67596&layers=N',
                          'url': 'https://www.openstreetmap.org/?mlat=55.01064&mlon=-1.67596#map=17/55.01064/-1.67596'},
                  'UN_M49': {'regions': {'EUROPE': '150',
                                         'GB': '826',
                                         'NORTHERN_EUROPE': '154',
                                         'WORLD': '001'},
                             'statistical_groupings': ['MEDC']},
                  'callingcode': 44,
                  'currency': {'alterna

# 4. Results

## 4.1 Retrieving the best location and their coordinates

In [23]:
bestloc = NE_data[NE_data['PostalCode'].str.contains('NE13')]

def str_join(*args):
    return ''.join(map(str, args))

bestloc_new = str_join('The ideal area to setup a Restaurant Delivery service is in: ', bestloc['Coverage'].values,  ' in ' ,  bestloc['PostTown'].values)


print(bestloc_new)

The ideal area to setup a Restaurant Delivery service is in: ['Airport, Wideopen, Dinnington, Great Park (West),\xa0Woolsington'] in ['NEWCASTLE UPON TYNE']


In [24]:
# Based on this information we will want to identify the coordinates of Newcastle Upon Tyne
from geopy.geocoders import Nominatim
address = 'Newcastle Upon Tyne, NE'

geolocator = Nominatim(user_agent="Coursera_Capstone")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geograpical coordinate of Newcastle Upon Tyne are {}, {}.'.format(latitude, longitude))

The geograpical coordinate of Newcastle Upon Tyne are 54.9738474, -1.6131572.


## 4.2 Plot the processed clusters onto a Map of Newcastle Upon Tyne

In [26]:
!pip install folium

Collecting folium
  Downloading folium-0.12.0-py2.py3-none-any.whl (94 kB)
[K     |â–ˆâ–ˆâ–ˆâ–ˆâ–ˆâ–ˆâ–ˆâ–ˆâ–ˆâ–ˆâ–ˆâ–ˆâ–ˆâ–ˆâ–ˆâ–ˆâ–ˆâ–ˆâ–ˆâ–ˆâ–ˆâ–ˆâ–ˆâ–ˆâ–ˆâ–ˆâ–ˆâ–ˆâ–ˆâ–ˆâ–ˆâ–ˆ| 94 kB 3.4 MB/s eta 0:00:011
Collecting branca>=0.3.0
  Downloading branca-0.4.2-py3-none-any.whl (24 kB)
Installing collected packages: branca, folium
Successfully installed branca-0.4.2 folium-0.12.0


In [33]:
# Firstly retrieve Matplotlib and folium, then generate the map
import folium 
import matplotlib.colors as colors
import matplotlib.cm as cm

map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i+x+(i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(NE_merged['Latitude'], NE_merged['Longitude'], NE_merged['Coverage'], NE_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
    
folium.CircleMarker([secondary_latitude, secondary_longitude],
                    radius=50,
                    popup='Newcastle Upon Tyne',
                    color='red',
                    ).add_to(map_clusters)

# Interactive marker
map_clusters.add_child(folium.ClickForMarker(popup=bestloc_new))
       
#map_clusters
map_clusters.save('map_clusters.html')

map_clusters


In [34]:
print('The ideal address to locate a delivery service would be: Warbeck Close, Newcastle-upon-Tyne, NE3 2FF, Newcastle-upon-Tyne England United Kingdom lat: 55.0106378, lng: -1.6759588') 

The ideal address to locate a delivery service would be: Warbeck Close, Newcastle-upon-Tyne, NE3 2FF, Newcastle-upon-Tyne England United Kingdom lat: 55.0106378, lng: -1.6759588


## 4.3 Discussing the results:

The key discovery when looking at the coverage that only includes restaurants is that we see most coverage produced similar results. We can also note that the most significant concentration of restaurants can be found within central Newcastle which is to be expected as it is the city centre. This also shows a correlation between the NE3 postcode being an affluent area (For the region) and a higher number of restaurants. This postcode would be a good place for us to set up a restaurant delivery service as it is close vicinity to an affluent area and a large number of restaurants.

Of the 66 Postcodes test, 43 areas or **68.2%** are above the median average income of the UK and therefore 23 areas or **31.8%** are below the median income of the UK.

I conducted a Silhouette analysis during the building of the K-Means dataset to identify the similarities between different coverages and the restaurants within those regions. There are a couple of clusters present however, the main cluster of restaurants appears to be within central Newcastle.



# 5. Conclusion

Based on the information I have gathered from the data analysis process, I believe that a suitable location for setting up a restaurant in and around Warbeck Close, Newcastle-upon-Tyne, NE3 2FF. The information collected also has an extended use and could be used to infer more conclusions for different situations 