# Capstone Project - The Battle of the Neighborhoods
### Applied Data Science Capstone by IBM/Coursera

## Table of contents
* [Introduction: Business Problem](#introduction)
* [Data](#data)
* [Methodology](#methodology)
* [Analysis](#analysis)
* [Results and Discussion](#results)
* [Conclusion](#conclusion)

## 1. Introduction: Business Problem <a name="introduction"></a>

In this project we will try to find an optimal location for a restaurant. Specifically, this report will be targeted to stakeholders interested in opening an **Indonesian restaurant** in **Trafford**, Greater Manchester, England.

Since Trafford already has many restaurants, we will look for **locations that are not already crowded with restaurants**. We are particularly interested in **areas with no Indonesian restaurants in the vicinity**. Provided the first two conditions are met, we would also prefer locations **as close to the city center as possible**.

We will use data science methods to identify the few most promising neighborhoods based on these criteria. The advantages of each area will then be clearly described so that stakeholders can choose the best possible final location.

## 2. Data <a name="data"></a>

Based on the definition of our problem, the factors that will influence our decision are:
* number of existing restaurants in the neighborhood (any type of restaurant)
* number of and distance to Indonesian restaurants in the neighborhood, if any
* distance of neighborhood from city center

We use the neighborhoods of each borough listed on Wikipedia's Greater Manchester page as our candidate areas, each represented by its geocoded center point.

The following data sources are needed to extract/generate the required information:
* the list of boroughs and their neighborhoods is read from the **Wikipedia page on Greater Manchester**
* the coordinates of each neighborhood are obtained with the **Nominatim geocoder** (via `geopy`)
* the number of restaurants and their type and location in every neighborhood are obtained using the **Foursquare API**
* the coordinates of the Trafford center are also obtained with the **Nominatim geocoder**

### Neighborhood

Let's build the list of candidate neighborhoods and find the latitude and longitude of each one. We start by reading the table of metropolitan boroughs from Wikipedia.

In [1]:
import pandas as pd

In [2]:
url = 'https://en.wikipedia.org/wiki/Greater_Manchester'

table = pd.read_html(url)[3]

table

Unnamed: 0,Metropolitan borough,Metropolitan borough.1,Administrative centre,Other components
0,Bury,,Bury,"Prestwich, Radcliffe, Ramsbottom, Tottington, ..."
1,Bolton,,Bolton,"Blackrod, Farnworth, Horwich, Kearsley, Little..."
2,Manchester,,Manchester,"Blackley, Cheetham Hill, Chorlton-cum-Hardy, D..."
3,Oldham,,Oldham,"Chadderton, Shaw and Crompton, Failsworth, Lee..."
4,Rochdale,,Rochdale,"Heywood, Littleborough, Middleton, Milnrow, Ne..."
5,Salford,,Swinton,"Eccles, Clifton, Little Hulton, Walkden, Worsl..."
6,Stockport,,Stockport,"Bramhall, Bredbury, Cheadle, Gatley, Hazel Gro..."
7,Tameside,,Ashton-under-Lyne,"Audenshaw, Denton, Droylsden, Dukinfield, Hyde..."
8,Trafford,,Stretford,"Altrincham, Bowdon, Hale, Sale, Urmston, Parti..."
9,Wigan,,Wigan,"Abram, Ashton-in-Makerfield, Aspull, Astley, A..."


We do not need the column `Metropolitan borough.1`, so we remove it.

In [3]:
data = table.drop(['Metropolitan borough.1'], axis=1)

data

Unnamed: 0,Metropolitan borough,Administrative centre,Other components
0,Bury,Bury,"Prestwich, Radcliffe, Ramsbottom, Tottington, ..."
1,Bolton,Bolton,"Blackrod, Farnworth, Horwich, Kearsley, Little..."
2,Manchester,Manchester,"Blackley, Cheetham Hill, Chorlton-cum-Hardy, D..."
3,Oldham,Oldham,"Chadderton, Shaw and Crompton, Failsworth, Lee..."
4,Rochdale,Rochdale,"Heywood, Littleborough, Middleton, Milnrow, Ne..."
5,Salford,Swinton,"Eccles, Clifton, Little Hulton, Walkden, Worsl..."
6,Stockport,Stockport,"Bramhall, Bredbury, Cheadle, Gatley, Hazel Gro..."
7,Tameside,Ashton-under-Lyne,"Audenshaw, Denton, Droylsden, Dukinfield, Hyde..."
8,Trafford,Stretford,"Altrincham, Bowdon, Hale, Sale, Urmston, Parti..."
9,Wigan,Wigan,"Abram, Ashton-in-Makerfield, Aspull, Astley, A..."


We expand the column `Other components` into one row per neighborhood and name the new column `Neighboorhood`. We also rename the column `Metropolitan borough` to `Borough` and `Administrative centre` to `Centre`.

In [4]:
import numpy as np
from itertools import chain

In [5]:
def chainer(s):
    return list(chain.from_iterable(s.str.split(',')))

lens = data['Other components'].str.split(',').map(len)

df = pd.DataFrame({'Borough': np.repeat(data['Metropolitan borough'], lens),
                   'Centre': np.repeat(data['Administrative centre'], lens),
                   'Neighboorhood': chainer(data['Other components'])}).reset_index(drop=True)

df

Unnamed: 0,Borough,Centre,Neighboorhood
0,Bury,Bury,Prestwich
1,Bury,Bury,Radcliffe
2,Bury,Bury,Ramsbottom
3,Bury,Bury,Tottington
4,Bury,Bury,Whitefield
...,...,...,...
74,Wigan,Wigan,Orrell
75,Wigan,Wigan,Shevington
76,Wigan,Wigan,Standish
77,Wigan,Wigan,Tyldesley
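
The same expansion can also be sketched with pandas' built-in `DataFrame.explode` (available from pandas 0.25). The toy frame below is made up for illustration and is not the Wikipedia table; stripping the whitespace left by the comma split is an extra step that the cell above does not perform.

```python
import pandas as pd

# Toy stand-in for the Wikipedia table (illustrative values only)
toy = pd.DataFrame({
    'Metropolitan borough': ['Bury', 'Trafford'],
    'Administrative centre': ['Bury', 'Stretford'],
    'Other components': ['Prestwich, Radcliffe', 'Altrincham, Bowdon, Hale'],
})

expanded = (
    toy.rename(columns={'Metropolitan borough': 'Borough',
                        'Administrative centre': 'Centre',
                        'Other components': 'Neighboorhood'})
       # Turn each comma-separated string into a list of names
       .assign(Neighboorhood=lambda d: d['Neighboorhood'].str.split(','))
       # Emit one row per list element
       .explode('Neighboorhood')
       .reset_index(drop=True)
)

# Remove the leading space left behind by splitting on ','
expanded['Neighboorhood'] = expanded['Neighboorhood'].str.strip()

print(expanded)
```

Stripping the names matters mainly for display and for exact-match lookups later on; the geocoder is generally tolerant of a leading space.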


Let's find the latitude and longitude of each neighborhood.

In [6]:
import geocoder
from geopy.exc import GeocoderTimedOut
from geopy.geocoders import Nominatim

In [7]:
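latitude = []
longitude = []

# Create the geocoder once and reuse it for every lookup
geolocator = Nominatim(user_agent='greater_manchester')

def findGeocode(city, retries=3):
    # Retry on timeout a limited number of times instead of recursing forever
    for _ in range(retries):
        try:
            return geolocator.geocode(city)
        except GeocoderTimedOut:
            continue
    return None


for i in (df['Neighboorhood'].astype(str) + ', Greater Manchester'):
    # Geocode each neighborhood once and reuse the result
    loc = findGeocode(i)

    if loc is not None:
        latitude.append(loc.latitude)
        longitude.append(loc.longitude)
    else:
        # Keep the row aligned with df when no match is found
        latitude.append(np.nan)
        longitude.append(np.nan)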

In [8]:
df['Latitude'] = latitude
df['Longitude'] = longitude

In [9]:
df

Unnamed: 0,Borough,Centre,Neighboorhood,Latitude,Longitude
0,Bury,Bury,Prestwich,53.530427,-2.296019
1,Bury,Bury,Radcliffe,53.559338,-2.326155
2,Bury,Bury,Ramsbottom,53.648383,-2.315696
3,Bury,Bury,Tottington,53.612626,-2.343351
4,Bury,Bury,Whitefield,53.553368,-2.296902
...,...,...,...,...,...
74,Wigan,Wigan,Orrell,53.530179,-2.709059
75,Wigan,Wigan,Shevington,53.570987,-2.692607
76,Wigan,Wigan,Standish,53.586326,-2.663575
77,Wigan,Wigan,Tyldesley,53.514486,-2.466171


Because the Indonesian restaurant will be opened in Trafford, we keep only the neighborhoods in the Trafford borough.

In [10]:
# Keep only the rows whose borough is Trafford
data_trafford = df[df.Borough.str.contains('Trafford')].reset_index(drop=True)

data_trafford

Unnamed: 0,Borough,Centre,Neighboorhood,Latitude,Longitude
0,Trafford,Stretford,Altrincham,53.383966,-2.352546
1,Trafford,Stretford,Bowdon,53.376876,-2.371532
2,Trafford,Stretford,Hale,53.378522,-2.347497
3,Trafford,Stretford,Sale,53.424494,-2.318415
4,Trafford,Stretford,Urmston,53.448321,-2.353657
5,Trafford,Stretford,Partington,53.421478,-2.428249


### Foursquare

Now that we have our location candidates, let's use the Foursquare API to get information on restaurants in each neighborhood.

We're interested in venues in the 'food' category, but only those that are proper restaurants - coffee shops, pizza places, bakeries etc. are not direct competitors, so we don't care about those. We will therefore include in our list only venues that have 'restaurant' in the category name, and we will check whether any of them falls under the 'Indonesian restaurant' category, as we need information on Indonesian restaurants in the neighborhood.

In [11]:
import requests

In [12]:
CLIENT_ID = '4ZVZN5MKAUAAWYUF1JZ0E01KEKK4P1DTMG1DFODLMF53DB1Q'
CLIENT_SECRET = 'WFJ2L3BNY5LA4VRN5AJOGGT2QQIVJX02FGXNEOJ1BFQPN35G'
VERSION = '20180604'
LIMIT = 100

In [13]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)
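
The request cell above indexes straight into the JSON response, so a response without `groups`, or a venue without a category, would raise an error. Below is a hedged sketch of a more defensive parsing step. `parse_explore_items` is a hypothetical helper, not part of the notebook or of any Foursquare client, and it assumes the same nested layout that `getNearbyVenues` reads. The sample payload is made up for illustration.

```python
def parse_explore_items(payload, name, lat, lng):
    """Flatten one explore-style response into row tuples.

    Hypothetical helper: it assumes the same nested layout that
    getNearbyVenues reads (response -> groups -> items -> venue).
    """
    groups = payload.get('response', {}).get('groups', [])
    items = groups[0].get('items', []) if groups else []

    rows = []
    for item in items:
        venue = item.get('venue', {})
        location = venue.get('location', {})
        categories = venue.get('categories', [])
        # Fall back to None when a venue carries no category
        category = categories[0].get('name') if categories else None
        rows.append((name, lat, lng, venue.get('name'),
                     location.get('lat'), location.get('lng'), category))
    return rows


# Made-up payload for illustration only (not real API output)
sample = {'response': {'groups': [{'items': [
    {'venue': {'name': 'Venue A',
               'location': {'lat': 53.38, 'lng': -2.35},
               'categories': [{'name': 'Thai Restaurant'}]}},
    {'venue': {'name': 'Venue B',
               'location': {'lat': 53.39, 'lng': -2.36},
               'categories': []}},
]}]}}

rows = parse_explore_items(sample, 'Altrincham', 53.383966, -2.352546)
print(rows)
```

Rows with a missing category would need to be dropped or filled before filtering on the category name, since string matching on `None` values yields missing results.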

In [14]:
trafford_venues = getNearbyVenues(names=data_trafford['Neighboorhood'],
                                  latitudes=data_trafford['Latitude'],
                                  longitudes=data_trafford['Longitude']
                                 )

trafford_venues

Altrincham
 Bowdon
 Hale
 Sale
 Urmston
 Partington


Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Altrincham,53.383966,-2.352546,Oxford Road Café,53.383604,-2.351889,Café
1,Altrincham,53.383966,-2.352546,Altrincham Market & Market House,53.387479,-2.351873,Farmers Market
2,Altrincham,53.383966,-2.352546,Costellos Bar,53.385438,-2.350267,Bar
3,Altrincham,53.383966,-2.352546,Mort Subite,53.386939,-2.352087,Bar
4,Altrincham,53.383966,-2.352546,Yara,53.383209,-2.351076,Middle Eastern Restaurant
...,...,...,...,...,...,...,...
110,Urmston,53.448321,-2.353657,Urmston Mens Club,53.448200,-2.348143,Bar
111,Partington,53.421478,-2.428249,Co-op Food,53.418228,-2.425136,Grocery Store
112,Partington,53.421478,-2.428249,Figment Design,53.420965,-2.428064,IT Services
113,Partington,53.421478,-2.428249,Tesco Express,53.418544,-2.425578,Grocery Store


Looking good. We now have the venues within 500 m of the center of each Trafford neighborhood, so we also know exactly which restaurants are in the vicinity of every neighborhood.

This concludes the data gathering phase - we're now ready to use this data for analysis to produce the report on optimal locations for a new Indonesian restaurant!

## 3. Methodology <a name="methodology"></a>

In this project we will direct our efforts toward detecting areas of Trafford that have low restaurant density, particularly those with a low number of Indonesian restaurants.

In the first step we collected the required **data: location and type (category) of every restaurant**.

The second step in our analysis will be the calculation and exploration of '**restaurant density**' across different areas of Trafford - we will use a **map** to identify a few promising areas with a low number of restaurants in general (*and* no Indonesian restaurants in the vicinity) and focus our attention on those areas.

In the third and final step we will focus on the most promising areas. We will consider locations with **as few restaurants as possible** and **without Indonesian restaurants**. We will present a map of all such locations to identify the neighborhoods that should be the starting point for the stakeholders' final exploration and search for an optimal venue location.

## 4. Analysis <a name="analysis"></a>

Let's select the venues whose 'Venue Category' contains the word 'Restaurant'.

In [15]:
trafford_venues_restaurant = trafford_venues[trafford_venues['Venue Category'].str.contains(
    'Restaurant')].reset_index(drop=True)

trafford_venues_restaurant

Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Altrincham,53.383966,-2.352546,Yara,53.383209,-2.351076,Middle Eastern Restaurant
1,Altrincham,53.383966,-2.352546,Double Tree,53.384685,-2.351836,Indian Restaurant
2,Altrincham,53.383966,-2.352546,Tre Ciccio,53.386496,-2.348426,Italian Restaurant
3,Altrincham,53.383966,-2.352546,Coco's Bar & Restaurant,53.386106,-2.352417,Italian Restaurant
4,Altrincham,53.383966,-2.352546,The Con Club,53.387342,-2.351789,Restaurant
5,Altrincham,53.383966,-2.352546,Nando's,53.38706,-2.348962,Portuguese Restaurant
6,Altrincham,53.383966,-2.352546,Phanthong,53.385898,-2.352253,Thai Restaurant
7,Altrincham,53.383966,-2.352546,Bistrot Pierre,53.387196,-2.349877,French Restaurant
8,Altrincham,53.383966,-2.352546,Bem Brasil Altrincham,53.38397,-2.352464,Brazilian Restaurant
9,Altrincham,53.383966,-2.352546,Frankie & Benny's,53.384664,-2.349227,American Restaurant


From the data above, we can see that there are no Indonesian restaurants among the venues returned for Trafford.
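
To make "restaurant density" concrete, one possible approach is to count restaurants per neighborhood and to check explicitly for Indonesian venues. The sketch below uses a made-up frame with the same column names as `trafford_venues`; the rows are illustrative and are not the notebook's data.

```python
import pandas as pd

# Made-up rows using the same column names as trafford_venues
venues = pd.DataFrame({
    'Neighborhood': ['Altrincham', 'Altrincham', 'Altrincham', 'Hale', 'Partington'],
    'Venue Category': ['Thai Restaurant', 'Bar', 'Italian Restaurant',
                       'Indian Restaurant', 'Grocery Store'],
})

is_restaurant = venues['Venue Category'].str.contains('Restaurant')

# Restaurants per neighborhood; neighborhoods with only
# non-restaurant venues get a count of 0
restaurant_counts = (
    venues.assign(is_restaurant=is_restaurant)
          .groupby('Neighborhood')['is_restaurant']
          .sum()
          .astype(int)
          .sort_values(ascending=False)
)

# Number of venues whose category mentions Indonesian cuisine
indonesian_count = int(venues['Venue Category'].str.contains('Indonesian').sum())

print(restaurant_counts)
print(indonesian_count)
```

A neighborhood for which the API returned no venues at all would not appear in this count, so it would have to be added back from the list of candidate neighborhoods.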

Let's look at the distribution of restaurants on the map so that we can decide where to open an Indonesian restaurant.

In [16]:
from geopy.geocoders import Nominatim

address = 'Trafford, Greater Manchester'

geolocator = Nominatim(user_agent = 'trafford_explorer')
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Trafford are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Trafford are 53.41893605, -2.359297161165271.
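
The third criterion, distance from the center, is not computed in the cells of this notebook. A minimal sketch of how it could be estimated uses the haversine formula for great-circle distance. The function name `haversine_km` and the mean Earth radius of 6371 km are assumptions of this example; the coordinates are the Trafford center printed above and the Altrincham row from the neighborhood table.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, an approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two lat/lon points."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = radians(lat2 - lat1)
    dlambda = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


# Trafford center as printed above, and the Altrincham centroid from the data
trafford_center = (53.41893605, -2.359297161165271)
altrincham = (53.383966, -2.352546)

distance = haversine_km(*trafford_center, *altrincham)
print(round(distance, 2))
```

Applied to every row of `data_trafford`, this would give a distance column that could be weighed against the restaurant counts when ranking candidate areas.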


In [17]:
import folium

In [18]:
trafford_venues_map = folium.Map(location=[latitude, longitude], zoom_start=10)

folium.CircleMarker(
    [latitude, longitude],
    radius=10,
    color='black',
    popup='Stretford',
    fill = True,
    fill_color = 'black',
    fill_opacity = 0.6
).add_to(trafford_venues_map)

# Mark each neighborhood center once, labeled with the neighborhood name
for lat, lng, label in zip(data_trafford['Latitude'], data_trafford['Longitude'],
                           data_trafford['Neighboorhood']):
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        color='blue',
        popup=label,
        fill = True,
        fill_color='blue',
        fill_opacity=0.6
    ).add_to(trafford_venues_map)

for lat, lng, label in zip(trafford_venues_restaurant['Venue Latitude'], 
                           trafford_venues_restaurant['Venue Longitude'], 
                           trafford_venues_restaurant['Neighborhood']):
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        color='red',
        popup=label,
        fill = True,
        fill_color='red',
        fill_opacity=0.6
    ).add_to(trafford_venues_map)
    
trafford_venues_map

## 5. Results and Discussion <a name="results"></a>

Our analysis shows that there are a number of restaurants within a 500 m radius of the neighborhood centers in Trafford. None of them serves Indonesian cuisine.

The highest restaurant density is in Altrincham, followed by Hale. Sale and Urmston have the same density.

Interestingly, our analysis found no restaurants within 500 m of the centers of Partington and Bowdon.

Partington and Bowdon are therefore candidates for opening an Indonesian restaurant in Trafford.

However, Bowdon lies close to Altrincham and Hale, and the three areas combined have a fairly high restaurant density.

Therefore, based on the results of the analysis, we suggest opening an Indonesian restaurant in the **Partington** area.

## 6. Conclusion <a name="conclusion"></a>

The purpose of this project was to identify areas of Trafford close to the center with a low number of restaurants (particularly Indonesian restaurants), in order to help stakeholders narrow down the search for an optimal location for a new Indonesian restaurant. By examining the distribution of restaurants in the Foursquare data, we identified the neighborhoods that justify further analysis.

The final decision on the optimal restaurant location will be made by stakeholders based on the specific characteristics of neighborhoods and locations. Based on our analysis, we suggest opening an Indonesian restaurant in the **Partington** area.