# Newcastle Capstone Project Report: Finding the Best Region in Newcastle Upon Tyne to Set Up a Restaurant Delivery Service
By Charlie Witty

## Introduction:

A colleague and I have decided to set up a delivery service for restaurant takeaways in the centre of Newcastle Upon Tyne. To give the business the best chance of success, we first need to work out which area of the city has the highest concentration of restaurants.

The type of business we intend to create is similar to the following:
- https://www.just-eat.co.uk/
- https://deliveroo.co.uk/
- https://www.ubereats.com/gb

After conducting some market research, my colleague and I determined the following requirements:

#### Predicament:
1. In which area of Newcastle Upon Tyne should we open our new delivery service to ensure frequent, sustainable and lucrative business?

#### Initial Assumptions:
1. Potential customers of the service will be aware of its existence, and we assume that this will not affect our population statistics.
2. If all requirements are met, then the following can be assumed to be true:
    - The average cost of delivery will be kept to a minimum.
    - Overall travelling time is reduced for all delivery drivers.
    - Organisational costs are lower for my colleague and me.

#### Business Requirements:

1. The delivery service base must be located within the largest concentration of restaurants in Newcastle Upon Tyne.
2. Ensure that a suitable number of potential customers can use our service with sustained frequency.
    - Does this area have an average income which is close to the UK national average?
    - Does the area have a suitably sized population?
3. Validate and verify assumptions about the scenario by modelling and testing the data using visualised clusters of restaurants in Newcastle Upon Tyne.

#### Optional Requirements:
1. Identify which areas surrounding Newcastle Upon Tyne may have an increased number of restaurants in the near future.

## Data Sourcing:

The following is a breakdown of the data sets I used and where I sourced them:

#### 1. I initially needed a list of Newcastle Upon Tyne areas, which I decided to define by UK postcode. I used a list of postcodes for the NE postcode area which I sourced from Wikipedia.
- [List of NE Postcodes](https://en.wikipedia.org/wiki/NE_postcode_area)

#### 2. I also needed a list of the populations of each postcode area:
- [Population by Postcode](https://www.nomisweb.co.uk/census/2011/ks101ew)
- Note: this data has been sanitised and uploaded as [censuspop.csv](https://github.com/cwitty255/Coursera_Capstone/blob/main/censuspop.csv)

#### 3. I needed a list of average after-tax incomes for Newcastle Upon Tyne areas, broken down by postcode. **Note:** These statistics were downloaded manually and can be found in my git repo.
- [NE Income Statistics](https://www.ons.gov.uk/employmentandlabourmarket/peopleinwork/earningsandworkinghours/datasets/smallareaincomeestimatesformiddlelayersuperoutputareasenglandandwales)
- The data has been sanitised and can be found [here](https://github.com/cwitty255/Coursera_Capstone/blob/main/income_by_postcode.csv)

#### 4. I then needed the same median for the whole of the United Kingdom, to compare against each postcode.
This figure also had to be downloaded manually, in this case from Statista, and loaded:
https://www.statista.com/statistics/1002964/average-full-time-annual-earnings-in-the-uk/

British families and individuals had a median after-tax income of £31,461 in 2020.

**Note**: Of the 66 postcodes tested, 43 areas or 68.2% are above the UK median income and therefore 23 areas or 31.8% are below it.
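
The comparison against the national median can be sketched as follows. This is a minimal illustration, not the code used for the project: the function name `share_above` and the four sample incomes are hypothetical, and only the £31,461 threshold comes from the text above.

```python
def share_above(incomes, threshold):
    """Return (count_above, count_not_above, pct_above, pct_not_above)."""
    total = len(incomes)
    if total == 0:
        raise ValueError("incomes must not be empty")
    above = sum(1 for value in incomes.values() if value > threshold)
    not_above = total - above
    return (
        above,
        not_above,
        round(100 * above / total, 1),
        round(100 * not_above / total, 1),
    )


# Hypothetical incomes for four postcode districts (NOT the project's data).
sample_incomes = {"NE1": 28000, "NE2": 33000, "NE3": 36500, "NE4": 25000}
UK_MEDIAN = 31461  # national figure quoted in the text

summary = share_above(sample_incomes, UK_MEDIAN)
print(summary)
```

Running the same function over the full income table would reproduce the counts and percentages reported in the note.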


#### 5. Finally, I needed a list of all restaurant venues from all neighbourhoods in Newcastle Upon Tyne by postcode
For this I used the Foursquare API to download all venues from all neighbourhoods in Newcastle Upon Tyne:
https://api.foursquare.com
#### 5.1 Extract Restaurants and only include Restaurants in our Data Set.
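
A minimal sketch of this filtering step, assuming the downloaded venues have already been flattened into records with a `category` field. The field names and the sample venues are hypothetical and are not taken from the Foursquare response.

```python
def keep_restaurants(venues):
    """Keep only venues whose category name mentions 'restaurant'."""
    return [v for v in venues if "restaurant" in v["category"].lower()]


# Hypothetical venue records (NOT real API output).
venues = [
    {"postcode": "NE1", "name": "Venue A", "category": "Italian Restaurant"},
    {"postcode": "NE1", "name": "Venue B", "category": "Pub"},
    {"postcode": "NE3", "name": "Venue C", "category": "Indian Restaurant"},
    {"postcode": "NE3", "name": "Venue D", "category": "Coffee Shop"},
]

restaurants = keep_restaurants(venues)
print([v["name"] for v in restaurants])
```

Matching on the category name is a simple heuristic: it would miss venues whose category does not contain the word "restaurant".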

#### 6. One-hot encode and count restaurants
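
To illustrate what one-hot encoding and counting means here, the sketch below turns each restaurant into a row of 0/1 flags (one per category) and sums those rows per postcode. It uses plain Python so that it is self-contained; in a notebook this would more usually be done with pandas' `get_dummies` followed by a group-by. The function name and the sample rows are hypothetical.

```python
from collections import defaultdict


def one_hot_counts(restaurants):
    """Sum one-hot category rows per postcode.

    `restaurants` is a list of (postcode, category) pairs.
    """
    categories = sorted({category for _, category in restaurants})
    counts = defaultdict(lambda: dict.fromkeys(categories, 0))
    for postcode, category in restaurants:
        # One-hot row: 1 in this restaurant's category column, 0 elsewhere.
        one_hot = {name: int(name == category) for name in categories}
        for name, flag in one_hot.items():
            counts[postcode][name] += flag
    return dict(counts)


# Hypothetical (postcode, category) pairs, NOT the project's data.
sample = [
    ("NE1", "Italian Restaurant"),
    ("NE1", "Italian Restaurant"),
    ("NE1", "Indian Restaurant"),
    ("NE3", "Indian Restaurant"),
]

counts = one_hot_counts(sample)
print(counts)
```

Each postcode ends up with one count per restaurant category, which is the kind of feature vector the clustering step works on.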

#### 7. I combined all of this information into a suitable working data set, which was used for clustering and geospatial mapping of the results to show the best area in which to open a delivery service.

The compiled data set will clearly demonstrate:

- Which neighbourhoods in Newcastle Upon Tyne have clusters of similar restaurants
- The average net income of all areas
- The population of each area
- Which area we should target to set up our restaurant delivery service
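
Combining the separate sources into one working data set amounts to a join on postcode. The sketch below shows the idea with plain dictionaries; the function name, column names and all values are hypothetical, and keeping only postcodes present in every source (an inner join) is an assumption rather than something stated above.

```python
def combine_by_postcode(population, income, restaurant_counts):
    """Inner-join three postcode-keyed dicts into one row per postcode."""
    shared = sorted(set(population) & set(income) & set(restaurant_counts))
    return [
        {
            "postcode": postcode,
            "population": population[postcode],
            "income": income[postcode],
            "restaurants": restaurant_counts[postcode],
        }
        for postcode in shared
    ]


# Hypothetical inputs (NOT the project's data).
population = {"NE1": 1000, "NE3": 2000, "NE5": 1500}
income = {"NE1": 28000, "NE3": 36500}
restaurant_counts = {"NE1": 3, "NE3": 1, "NE4": 2}

rows = combine_by_postcode(population, income, restaurant_counts)
for row in rows:
    print(row)
```

Postcodes missing from any one source (here NE4 and NE5) are dropped, so it is worth checking how many rows survive the join.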

# Methodology

## K-Means Clustering

My choice of algorithm was the K-Means algorithm:

[K-Means](https://en.wikipedia.org/wiki/K-means_clustering)

K-means clustering is a simple unsupervised learning algorithm that is used to solve clustering problems. It classifies a given data set into a number of clusters, defined by the letter "k", which is fixed beforehand. Each cluster is represented by a central point (its centroid). Every observation is assigned to the nearest centroid, each centroid is then recomputed as the mean of the observations assigned to it, and the process repeats with the new centroids until the assignments stop changing.
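
The procedure described above can be sketched in plain Python. This is an illustration of the standard assign-and-update loop (Lloyd's algorithm), not the code used for the project, where a library implementation would normally be used. Seeding the centroids with the first `k` points is an assumption made here only to keep the example reproducible.

```python
import math


def kmeans(points, k, iterations=100):
    """Cluster 2-D points with Lloyd's algorithm; return (centroids, labels)."""
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    # Assumption: seed with the first k points (real code usually seeds randomly).
    centroids = [tuple(p) for p in points[:k]]
    labels = None
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return centroids, labels


# Hypothetical points forming two obvious groups.
points = [(0, 0), (0, 1), (10, 10), (10, 11), (1, 0), (11, 10)]
centroids, labels = kmeans(points, k=2)
print(labels)
print(centroids)
```

With a poor seed the algorithm can settle on a worse grouping, which is one reason library implementations run several initialisations.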

K-means clustering has uses in search engines, market segmentation, statistics and even astronomy.

To determine the ideal number of clusters, I used silhouette analysis:

Silhouette refers to a method of interpretation and validation of consistency within clusters of data. The technique provides a succinct graphical representation of how well each object has been classified. The silhouette value is a measure of how similar an object is to its own cluster compared to other clusters ([Ref](https://en.wikipedia.org/wiki/Silhouette_(clustering))).
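
As a rough illustration of how the silhouette value is computed, the sketch below implements the textbook definition: for each point, `a` is the mean distance to the other members of its own cluster, `b` is the smallest mean distance to any other cluster, and the score is `(b - a) / max(a, b)`. The function name and sample points are hypothetical, and Euclidean distance is an assumption.

```python
import math


def silhouette_score(points, labels):
    """Mean silhouette coefficient over all points (Euclidean distance)."""
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)
    if len(clusters) < 2:
        raise ValueError("need at least two clusters")
    scores = []
    for point, label in zip(points, labels):
        own = clusters[label]
        if len(own) == 1:
            scores.append(0.0)  # usual convention for single-member clusters
            continue
        # a: mean distance to the other members of the point's own cluster.
        a = sum(math.dist(point, q) for q in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster.
        b = min(
            sum(math.dist(point, q) for q in other) / len(other)
            for key, other in clusters.items()
            if key != label
        )
        denominator = max(a, b)
        scores.append((b - a) / denominator if denominator > 0 else 0.0)
    return sum(scores) / len(scores)


# Hypothetical points: a sensible labelling versus a deliberately bad one.
points = [(0, 0), (0, 1), (10, 10), (10, 11)]
good = silhouette_score(points, [0, 0, 1, 1])
bad = silhouette_score(points, [0, 1, 0, 1])
print(good, bad)
```

The score always lies between -1 and 1. To choose the number of clusters, the clustering is repeated for several values of k and the k with the highest mean score is kept.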

It is important to note that the highest silhouette score was obtained with **7** clusters.

### 2.1 Use silhouette score to find optimal number of clusters to segment the data

### 2.2 Apply K-Means, segment data into clusters and generate labels

This step was used to reshape the Newcastle Upon Tyne data so that its shape matched the shape of the clustered data.

### 2.3 Merge Newcastle Upon Tyne data with Long/Lat data


### 2.4 Adding the K-Means labels

I determined that the most significant cluster was the 2nd cluster, with a shape of (14, 16), i.e. 14 rows and 16 columns.

## 3. Finding the geographic centre of the cluster

To identify the optimum location for our delivery service, we need to find the geographic centre of the cluster. The second cluster has the highest cluster density, so we will use this one.


Here we take the average latitude and longitude to be the centroid.
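
Taking the average latitude and longitude can be sketched as below. The function name and the three coordinate pairs are hypothetical. A plain mean of the coordinates is a reasonable approximation over an area as small as a city, but would not be appropriate for widely separated points.

```python
def geographic_centre(coords):
    """Return the mean (latitude, longitude) of a list of coordinate pairs."""
    if not coords:
        raise ValueError("coords must not be empty")
    lats, lngs = zip(*coords)
    return sum(lats) / len(lats), sum(lngs) / len(lngs)


# Hypothetical coordinates for three areas in one cluster (NOT the project's data).
cluster_coords = [(55.00, -1.70), (55.02, -1.66), (54.98, -1.62)]

centre_lat, centre_lng = geographic_centre(cluster_coords)
print(centre_lat, centre_lng)
```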

### 3.1 Install opencage to reverse look up the coordinates

OpenCage allows me to reverse look up the geo coordinates.

Key observation: this is the optimum location for a new restaurant delivery service.
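
The reverse lookup mentioned above can be sketched without the `opencage` package by building the HTTP request directly. This is an assumption-laden illustration rather than the code used for the project: the endpoint and the `q`, `key` and `formatted` names reflect my understanding of the OpenCage geocoding API and should be checked against its documentation. The helper names, the placeholder key and the sample response are hypothetical, and the sketch does not make a network call.

```python
from urllib.parse import urlencode

# Assumed endpoint for the OpenCage geocoding API (verify against its docs).
OPENCAGE_ENDPOINT = "https://api.opencagedata.com/geocode/v1/json"


def reverse_lookup_url(lat, lng, api_key):
    """Build the request URL for a reverse lookup of (lat, lng)."""
    query = urlencode({"q": f"{lat},{lng}", "key": api_key})
    return f"{OPENCAGE_ENDPOINT}?{query}"


def formatted_address(payload):
    """Pull the first formatted address out of a decoded JSON response."""
    results = payload.get("results", [])
    return results[0]["formatted"] if results else None


url = reverse_lookup_url(55.0106378, -1.6759588, "YOUR-API-KEY")
print(url)

# Hypothetical response body, trimmed to the one field used here.
sample_payload = {"results": [{"formatted": "Example Street, Example Town"}]}
print(formatted_address(sample_payload))
```

Fetching `url` with a valid key and passing the decoded JSON to `formatted_address` would give the address string for the centroid.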

## 4. Results

### 4.1 Retrieving the best location and their coordinates

### 4.2 Plot the processed clusters onto a map of Newcastle Upon Tyne with the best location for a delivery service

### 4.3 Exact Address of desired Location

Using a reverse lookup tool, I identified that the ideal address at which to locate a delivery service would be: Warbeck Close, Newcastle-upon-Tyne, NE3 2FF, England, United Kingdom (lat: 55.0106378, lng: -1.6759588).

## 5. Discussing the results:

The key finding from the data that only includes restaurants is that most of the areas covered produced similar results. The most significant concentration of restaurants is within central Newcastle, which is to be expected as it is the city centre. The data also shows a correlation between the NE3 postcode being an affluent area (for the region) and a higher number of restaurants. This postcode would be a good place for us to set up a restaurant delivery service, as it is in close vicinity to both an affluent area and a large number of restaurants.

Of the 66 postcodes tested, 43 areas or **68.2%** are above the UK median income and therefore 23 areas or **31.8%** are below it.

I conducted a silhouette analysis while building the K-Means model to identify the similarities between different areas and the restaurants within them. Several clusters are present; however, the main cluster of restaurants appears to be within central Newcastle.

# 6. Conclusion

Based on the information I have gathered from the data analysis process, I believe that a suitable location for setting up a restaurant delivery service is in and around Warbeck Close, Newcastle-upon-Tyne, NE3 2FF. The information collected also has wider uses and could support further conclusions in different situations.