# Hospitality Opportunities in HS2-Era Birmingham

5 March 2020 / W Carter

This notebook is part of a final submission for the IBM Data Science Professional certificate course.  The business problem is fictitious, and intended only to demonstrate competence with data science methods.  The notebook was produced on IBM Watson Studio, before being 'pushed' to this online repository.

## Introduction & Business Problem

A new rail infrastructure project, 'HS2', has recently been confirmed in the UK.  This ambitious high-speed rail network will connect the capital, London, to Birmingham, the second-most populous city in the UK (and the greater West Midlands conurbation), and later to northern England and Scotland.  Rail journey times between London and Birmingham will fall from 1 hr 24 min to just 49 min, at a time when environmental pressure to reduce air travel is growing.

Birmingham City plans to capitalise on the transport upgrade for massive economic regeneration, and is mobilising an urban development master plan that intends to create 36,000 new jobs, build 4,000 new homes, and provide a £1.4bn economic uplift.  The plan identifies six 'Places for Growth' across the City, for the purposes of tourism, retail, business, learning, research, and a new creative sector.

For the latter, the area/neighbourhood of Digbeth has been identified as the main area for the growing number of companies involved with digital technologies, design, TV production and arts.

Although development of the main infrastructure project has not yet started, a hospitality entrepreneur and an investor (the clients for this data science project) have requested a brief analysis of pre-existing popular venues and hotels around Digbeth and the likely site of the new railway station.  They are looking to strategically position 2-3 venues (1 hotel, and 1-2 restaurants) in the target area that would cater to both current popular demands and the future demands of the growing community of creative sector professionals living in or visiting the area.

In particular, the clients have three questions at this pre-concept stage, and only a limited budget:

- What are the most popular (commonly visited) restaurant cuisines in and around the Digbeth area (i.e. the cluster of neighbourhoods) of Birmingham City, UK?
- What are the top five most popular (highest rated by visitors) hotels in Birmingham, and how far are they from the railway station?
- What is the current estimated population of Digbeth and its surrounding neighbourhoods?

## Data

This project will primarily use location and venue (visit and rating) data available through Foursquare.  Foursquare is a location technology company that, in 2009, developed a crowd-sourced 'check-in' system that makes sense of where phones go, and offers a proprietary dataset (called Pilgrim) built on more than 13 billion check-ins.

The project also uses a composite dataset that contains the postcode districts for Birmingham, their respective latitudes and longitudes, and estimated population by postcode (based on 2011 UK census data).  This dataset was extracted from data curated by Chris Bell on www.dougal.co.uk (for which the author is extremely grateful), and is available under a public domain licence. The data has been loaded directly into the IBM Watson Studio project and connected to this notebook.

In addition to Foursquare, data from www.booking.com will be consulted to validate the hotel ratings, but excluded from the data analysis.  This cross-check is made because the representativeness of Foursquare rating (but not location) data for hotels in Birmingham, UK is unknown, and can be assessed by comparison with another crowd-sourced rating system.

Lastly, the contextual background for this project (i.e. information about HS2 and Birmingham City's master development plan) has been taken from the Birmingham City Council website (www.birmingham.gov.uk/); it is not used in the data analysis.

## Preliminaries

In [1]:
import requests # library to handle HTTP requests
import pandas as pd # library for data analysis
import numpy as np # library to handle data in a vectorized manner
import random # library for random number generation

!conda install -c conda-forge geopy --yes
from geopy.geocoders import Nominatim # module to convert an address into latitude and longitude values

# libraries for displaying images
from IPython.display import Image, HTML

# library for transforming JSON into a pandas DataFrame (pandas >= 1.0 also exposes pd.json_normalize)
from pandas.io.json import json_normalize

!conda install -c conda-forge folium=0.5.0 --yes
import folium # plotting library

from bs4 import BeautifulSoup # library for parsing HTML
import csv
import json
import xml

from sklearn.cluster import KMeans
import matplotlib.cm as cm
import matplotlib.colors as colors

print('Folium installed')
print('Libraries imported.')


In [160]:
CLIENT_ID = 'YOUR_CLIENT_ID' # your Foursquare ID
CLIENT_SECRET = 'YOUR_CLIENT_SECRET' # your Foursquare Secret
VERSION = '20180604' # Foursquare API version date
LIMIT = 1000 # maximum number of results requested per call
print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET: ' + CLIENT_SECRET)

Your credentials:
CLIENT_ID: YOUR_CLIENT_ID
CLIENT_SECRET: YOUR_CLIENT_SECRET
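The cell above hard-codes the Foursquare credentials in a notebook that is later pushed to an online repository. A minimal sketch of reading them from environment variables instead, assuming the hypothetical variable names `FOURSQUARE_CLIENT_ID` and `FOURSQUARE_CLIENT_SECRET`, with a helper that masks the secret if it needs to be printed as a sanity check:

```python
import os


def load_foursquare_credentials():
    """Return (client_id, client_secret) read from environment variables.

    The variable names are hypothetical. A clear error is raised if either is
    missing, rather than sending an unauthenticated request later on.
    """
    client_id = os.environ.get("FOURSQUARE_CLIENT_ID")
    client_secret = os.environ.get("FOURSQUARE_CLIENT_SECRET")
    if not client_id or not client_secret:
        raise RuntimeError(
            "Set FOURSQUARE_CLIENT_ID and FOURSQUARE_CLIENT_SECRET before running the notebook."
        )
    return client_id, client_secret


def mask(secret, visible=4):
    """Replace all but the last `visible` characters of a secret with '*'."""
    if len(secret) <= visible:
        return "*" * len(secret)
    return "*" * (len(secret) - visible) + secret[-visible:]
```

In the notebook this would replace the literal assignments with `CLIENT_ID, CLIENT_SECRET = load_foursquare_credentials()` and print `mask(CLIENT_SECRET)` rather than the secret itself.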


In [161]:
import types
import pandas as pd
from botocore.client import Config
import ibm_boto3

def __iter__(self): return 0

client_ce8198a121a849d89097425d2c3d8ebd = ibm_boto3.client(service_name='s3',
    ibm_api_key_id='YOUR_IBM_API_KEY',
    ibm_auth_endpoint="https://iam.eu-gb.bluemix.net/oidc/token",
    config=Config(signature_version='oauth'),
    endpoint_url='https://s3.eu-geo.objectstorage.service.networklayer.com')

body = client_ce8198a121a849d89097425d2c3d8ebd.get_object(Bucket='boutiquehotelsinhs2erabirmingham-donotdelete-pr-ylwa9fq70kklgv',Key='B-postcodes.csv')['Body']
# add missing __iter__ method, so pandas accepts body as file-like object
if not hasattr(body, "__iter__"): body.__iter__ = types.MethodType( __iter__, body )

df_data_1 = pd.read_csv(body)
df_data_1.head()


|   | Postcode | Latitude | Longitude | Town/Area | Region | Population | Households |
|---|---|---|---|---|---|---|---|
| 0 | B1 | 52.4796 | -1.90778 | Birmingham City Centre, Ladywood | Birmingham | 8514.0 | 4526.0 |
| 1 | B2 | 52.4863 | -1.89732 | Birmingham City Centre, Ladywood | Birmingham | 655.0 | 473.0 |
| 2 | B3 | 52.4823 | -1.90288 | Birmingham City Centre, Ladywood | Birmingham | 2226.0 | 1406.0 |
| 3 | B4 | 52.4838 | -1.89373 | Birmingham City Centre, Ladywood | Birmingham | 4337.0 | 465.0 |
| 4 | B5 | 52.4722 | -1.89683 | Digbeth, Highgate, Lee Bank | Birmingham | 12156.0 | 5139.0 |
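The third client question asks for the population of Digbeth and its surrounding neighbourhoods. A minimal sketch of that sum, using only the five rows printed by `head()` above and treating a hypothetical set of postcode districts (`DIGBETH_AND_SURROUNDS`, B1 to B5) as "Digbeth and its surroundings"; it uses the standard library's `csv` module so the arithmetic is explicit, whereas in the notebook the equivalent would be filtering `df_data_1` with `Postcode.isin(...)` and summing `Population`:

```python
import csv
import io

# The five rows shown by df_data_1.head() above, copied for illustration.
SAMPLE_CSV = """Postcode,Latitude,Longitude,Town/Area,Region,Population,Households
B1,52.4796,-1.90778,"Birmingham City Centre, Ladywood",Birmingham,8514.0,4526.0
B2,52.4863,-1.89732,"Birmingham City Centre, Ladywood",Birmingham,655.0,473.0
B3,52.4823,-1.90288,"Birmingham City Centre, Ladywood",Birmingham,2226.0,1406.0
B4,52.4838,-1.89373,"Birmingham City Centre, Ladywood",Birmingham,4337.0,465.0
B5,52.4722,-1.89683,"Digbeth, Highgate, Lee Bank",Birmingham,12156.0,5139.0
"""

# Hypothetical definition of "Digbeth and its surrounding neighbourhoods".
DIGBETH_AND_SURROUNDS = {"B1", "B2", "B3", "B4", "B5"}


def total_population(csv_text, postcodes):
    """Sum the Population and Households columns over the given postcode districts."""
    population = 0.0
    households = 0.0
    for row in csv.DictReader(io.StringIO(csv_text)):
        if row["Postcode"] in postcodes:
            population += float(row["Population"])
            households += float(row["Households"])
    return population, households


pop, hh = total_population(SAMPLE_CSV, DIGBETH_AND_SURROUNDS)
print(f"Estimated population: {pop:.0f} in {hh:.0f} households")
```

The answer depends entirely on which districts are counted as "surrounding", so that set is the main assumption to agree with the clients.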


In [162]:
address = 'Birmingham, UK'

geolocator = Nominatim(user_agent="birmingham_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Birmingham are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Birmingham are 52.4796992, -1.9026911.


In [163]:
df_data_1 = df_data_1[df_data_1['Region'] == 'Birmingham'].reset_index(drop=True)


In [164]:
# create map of Birmingham using latitude and longitude values
map_birmingham = folium.Map(location=[latitude, longitude], zoom_start=10)

# add markers to map
for lat, lng, area, region in zip(df_data_1['Latitude'], df_data_1['Longitude'], df_data_1['Town/Area'], df_data_1['Region']):
    label = '{}, {}'.format(area, region)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7).add_to(map_birmingham)
    
map_birmingham

In [165]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)
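`getNearbyVenues` builds its query string by hand and assumes every venue has at least one category (`v['venue']['categories'][0]`), which would raise an `IndexError` for an uncategorised venue. A minimal standard-library sketch of both steps: `urllib.parse.urlencode` builds the `venues/explore` URL with the same parameters the function uses, and the parser falls back to a placeholder category. The response shape (`response`, `groups[0]`, `items`, `venue`) is taken from the code above; the `'Uncategorised'` label is an assumption.

```python
from urllib.parse import urlencode

EXPLORE_ENDPOINT = "https://api.foursquare.com/v2/venues/explore"


def build_explore_url(client_id, client_secret, version, lat, lng, radius, limit):
    """Return a venues/explore URL with the same query parameters as getNearbyVenues."""
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "v": version,
        "ll": f"{lat},{lng}",
        "radius": radius,
        "limit": limit,
    }
    return EXPLORE_ENDPOINT + "?" + urlencode(params)


def parse_explore_items(payload, name, lat, lng):
    """Flatten an explore JSON payload into tuples matching the notebook's seven columns."""
    rows = []
    for item in payload["response"]["groups"][0]["items"]:
        venue = item["venue"]
        categories = venue.get("categories") or []
        # Fall back to a placeholder instead of failing on venues with no category.
        category = categories[0]["name"] if categories else "Uncategorised"
        rows.append((name, lat, lng, venue["name"],
                     venue["location"]["lat"], venue["location"]["lng"], category))
    return rows
```

Passing the same dictionary as `requests.get(EXPLORE_ENDPOINT, params=params)` would let `requests` do the encoding instead.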

In [166]:
birmingham_venues = getNearbyVenues(names=df_data_1['Town/Area'],
                                   latitudes=df_data_1['Latitude'],
                                   longitudes=df_data_1['Longitude']
                                  )

Digbeth, Highgate, Lee Bank
Aston
Nechells
Washwood Heath, Ward End, Saltley
Bordesley Green
Small Heath
Sparkhill, Tyseley
Balsall Heath, Sparkbrook, Highgate
Moseley, Billesley
Kings Heath, Yardley Wood, Brandwood, Druids Heath, Warstock
Edgbaston, Lee Bank
Edgbaston, Ladywood
Harborne
Winson Green, Hockley
Lozells, Newtown, Birchfield
Birchfield, Handsworth Wood Perry Barr
Handsworth
Kings Norton
Erdington, Short Heath
Erdington, Tyburn
Yardley
Sheldon, Yardley
Acocks Green
Hall Green
Selly Oak, Bournbrook, Selly Park, Weoley Castle, California
Bournville, Cotteridge, Stirchley
Northfield
Woodgate, Bartley Green, Quinton, California
Kitts Green, Stechford
Shard End, Buckland End
Castle Vale
Kings Norton
Perry Barr, Great Barr, Hamstead
Great Barr, Hamstead, Pheasey
Perry Barr, Kingstanding, Great Barr
Rednal, Rubery
Sutton Coldfield town centre, Maney, Wylde Green
Boldmere, New Oscott, Wylde Green
Four Oaks, Mere Green, Little Aston, Streetly
Sutton Trinity, Falcon Lodge, Rectory
Wa

In [167]:
print(birmingham_venues.shape)
birmingham_venues.head()

(208, 7)


|   | Neighborhood | Neighborhood Latitude | Neighborhood Longitude | Venue | Venue Latitude | Venue Longitude | Venue Category |
|---|---|---|---|---|---|---|---|
| 0 | Digbeth, Highgate, Lee Bank | 52.4722 | -1.89683 | Quarter Horse Coffee | 52.471454 | -1.899532 | Coffee Shop |
| 1 | Digbeth, Highgate, Lee Bank | 52.4722 | -1.89683 | Birmingham Hippodrome | 52.474471 | -1.897573 | Theater |
| 2 | Digbeth, Highgate, Lee Bank | 52.4722 | -1.89683 | The Diskery | 52.472253 | -1.899471 | Record Shop |
| 3 | Digbeth, Highgate, Lee Bank | 52.4722 | -1.89683 | Topokki (떡볶이) | 52.474059 | -1.896732 | Korean Restaurant |
| 4 | Digbeth, Highgate, Lee Bank | 52.4722 | -1.89683 | Vanguard Supermarket 萬佳超市 | 52.473924 | -1.896641 | Grocery Store |


In [168]:
birmingham_venues.groupby('Neighborhood').count()

| Neighborhood | Neighborhood Latitude | Neighborhood Longitude | Venue | Venue Latitude | Venue Longitude | Venue Category |
|---|---|---|---|---|---|---|
| Acocks Green | 10 | 10 | 10 | 10 | 10 | 10 |
| Aston | 4 | 4 | 4 | 4 | 4 | 4 |
| Balsall Heath, Sparkbrook, Highgate | 7 | 7 | 7 | 7 | 7 | 7 |
| Birchfield, Handsworth Wood Perry Barr | 1 | 1 | 1 | 1 | 1 | 1 |
| Boldmere, New Oscott, Wylde Green | 4 | 4 | 4 | 4 | 4 | 4 |
| Bordesley Green | 5 | 5 | 5 | 5 | 5 | 5 |
| Bournville, Cotteridge, Stirchley | 6 | 6 | 6 | 6 | 6 | 6 |
| Castle Vale | 5 | 5 | 5 | 5 | 5 | 5 |
| Digbeth, Highgate, Lee Bank | 39 | 39 | 39 | 39 | 39 | 39 |
| Edgbaston, Ladywood | 11 | 11 | 11 | 11 | 11 | 11 |


In [169]:
print('There are {} unique categories.'.format(birmingham_venues['Venue Category'].nunique()))

There are 87 unique categories.


In [170]:
# one hot encoding
birmingham_onehot = pd.get_dummies(birmingham_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
birmingham_onehot['Neighborhood'] = birmingham_venues['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [birmingham_onehot.columns[-1]] + list(birmingham_onehot.columns[:-1])
birmingham_onehot = birmingham_onehot[fixed_columns]

birmingham_onehot.head()

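|   | Neighborhood | Arts & Crafts Store | Asian Restaurant | Athletics & Sports | Auto Garage | Auto Workshop | Automotive Shop | Bakery | Bar | Beer Bar | ... | Sports Club | Stadium | Supermarket | Tapas Restaurant | Thai Restaurant | Theater | Train Station | Turkish Restaurant | Vietnamese Restaurant | Warehouse Store |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Digbeth, Highgate, Lee Bank | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 1 | Digbeth, Highgate, Lee Bank | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 |
| 2 | Digbeth, Highgate, Lee Bank | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 3 | Digbeth, Highgate, Lee Bank | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 4 | Digbeth, Highgate, Lee Bank | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |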


In [171]:
birmingham_onehot.shape

(208, 88)

In [205]:
birmingham_onehot.drop(columns='Neighborhood').sum()

Arts & Crafts Store                                                           1
Asian Restaurant                                                              3
Athletics & Sports                                                            1
Auto Garage                                                                   2
Auto Workshop                                                                 1
Automotive Shop                                                               2
Bakery                                                                        1
Bar                                                                           1
Beer Bar                                                                      1
Beer Store                                                                    1
Bookstore                                                                     1
Botanical Garden                        

In [172]:
birmingham_grouped = birmingham_onehot.groupby('Neighborhood').mean().reset_index()
birmingham_grouped

|   | Neighborhood | Arts & Crafts Store | Asian Restaurant | Athletics & Sports | Auto Garage | Auto Workshop | Automotive Shop | Bakery | Bar | Beer Bar | ... | Sports Club | Stadium | Supermarket | Tapas Restaurant | Thai Restaurant | Theater | Train Station | Turkish Restaurant | Vietnamese Restaurant | Warehouse Store |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Acocks Green | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.2 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.1 |
| 1 | Aston | 0.0 | 0.0 | 0.0 | 0.25 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 2 | Balsall Heath, Sparkbrook, Highgate | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 3 | Birchfield, Handsworth Wood Perry Barr | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 4 | Boldmere, New Oscott, Wylde Green | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 5 | Bordesley Green | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.2 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.4 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 6 | Bournville, Cotteridge, Stirchley | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.166667 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 7 | Castle Vale | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.2 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.2 |
| 8 | Digbeth, Highgate, Lee Bank | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.025641 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.025641 | 0.025641 | 0.0 | 0.0 | 0.025641 | 0.0 |
| 9 | Edgbaston, Ladywood | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
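One-hot encoding followed by `groupby('Neighborhood').mean()` turns each neighbourhood into a vector of category frequencies: the share of its returned venues that fall in each category. For Digbeth's 39 venues, a single Theater gives 1/39 ≈ 0.025641, the value in the table above. A minimal standard-library sketch of the same computation, on a hypothetical list of (neighbourhood, category) pairs:

```python
from collections import Counter, defaultdict


def category_frequencies(pairs):
    """Map each neighbourhood to {category: share of that neighbourhood's venues}.

    Equivalent in spirit to pd.get_dummies(...) followed by groupby(...).mean():
    each venue contributes a one-hot row, and the mean of those rows is the share.
    """
    counts = defaultdict(Counter)
    for neighbourhood, category in pairs:
        counts[neighbourhood][category] += 1
    result = {}
    for hood, cats in counts.items():
        total = sum(cats.values())  # number of venues returned for this neighbourhood
        result[hood] = {cat: n / total for cat, n in cats.items()}
    return result


# Hypothetical sample: four venues in one neighbourhood, two in another.
sample = [
    ("Acocks Green", "Pub"), ("Acocks Green", "Pub"),
    ("Acocks Green", "Supermarket"), ("Acocks Green", "Coffee Shop"),
    ("Aston", "Park"), ("Aston", "Auto Garage"),
]
freqs = category_frequencies(sample)
```

Unlike the dense DataFrame, absent categories are simply missing here rather than stored as 0.0, which is the same information in sparse form.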


In [173]:
birmingham_grouped.shape

(38, 88)

In [174]:
num_top_venues = 10

for hood in birmingham_grouped['Neighborhood']:
    print("----"+hood+"----")
    temp = birmingham_grouped[birmingham_grouped['Neighborhood'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----Acocks Green----
                    venue  freq
0                     Pub   0.2
1             Supermarket   0.2
2  Furniture / Home Store   0.1
3             Coffee Shop   0.1
4          Sandwich Place   0.1
5          Breakfast Spot   0.1
6           Bowling Alley   0.1
7         Warehouse Store   0.1
8        Tapas Restaurant   0.0
9               Pet Store   0.0


----Aston----
                   venue  freq
0        Motorcycle Shop  0.25
1            Auto Garage  0.25
2                   Park  0.25
3          Grocery Store  0.25
4    Arts & Crafts Store  0.00
5              Pet Store  0.00
6     Persian Restaurant  0.00
7  Performing Arts Venue  0.00
8   Pakistani Restaurant  0.00
9           Optical Shop  0.00


----Balsall Heath, Sparkbrook, Highgate----
                       venue  freq
0       Fast Food Restaurant  0.29
1          Electronics Store  0.14
2      Performing Arts Venue  0.14
3                       Café  0.14
4  Middle Eastern Restaurant  0.14
5             

In [175]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]
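Because most neighbourhoods return fewer than ten venue categories, `return_most_common_venues` fills the remaining slots with zero-frequency categories in arbitrary order (Aston, for example, returns only four venues, so its 5th to 10th "most common" venues all have frequency 0). A minimal sketch of an alternative ranking that drops zero frequencies and breaks ties alphabetically so reruns give the same order; both choices are assumptions, not the notebook's behaviour:

```python
def top_categories(freqs, n=10):
    """Return up to n (category, frequency) pairs, highest frequency first.

    Zero frequencies are dropped, and ties are broken alphabetically so that
    the order is reproducible.
    """
    nonzero = [(cat, f) for cat, f in freqs.items() if f > 0]
    nonzero.sort(key=lambda pair: (-pair[1], pair[0]))
    return nonzero[:n]


# Aston's frequencies from the per-neighbourhood listing (zeros truncated).
aston = {"Park": 0.25, "Auto Garage": 0.25, "Grocery Store": 0.25,
         "Motorcycle Shop": 0.25, "Warehouse Store": 0.0, "Gas Station": 0.0}
print(top_categories(aston))
```

To fill the fixed-width `neighborhoods_venues_sorted` frame, shorter lists would then need padding, for example with empty strings.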

In [176]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = birmingham_grouped['Neighborhood']

for ind in np.arange(birmingham_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(birmingham_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted.head()

|   | Neighborhood | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue | 8th Most Common Venue | 9th Most Common Venue | 10th Most Common Venue |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Acocks Green | Supermarket | Pub | Warehouse Store | Breakfast Spot | Bowling Alley | Furniture / Home Store | Sandwich Place | Coffee Shop | Golf Course | Food Service |
| 1 | Aston | Park | Auto Garage | Grocery Store | Motorcycle Shop | Warehouse Store | Gas Station | Deli / Bodega | Electronics Store | Farmers Market | Fast Food Restaurant |
| 2 | Balsall Heath, Sparkbrook, Highgate | Fast Food Restaurant | Middle Eastern Restaurant | Performing Arts Venue | Electronics Store | Grocery Store | Café | Warehouse Store | Deli / Bodega | Farmers Market | Food Service |
| 3 | Birchfield, Handsworth Wood Perry Barr | Grocery Store | Warehouse Store | Gastropub | Cricket Ground | Deli / Bodega | Electronics Store | Farmers Market | Fast Food Restaurant | Food Service | Furniture / Home Store |
| 4 | Boldmere, New Oscott, Wylde Green | Gym Pool | Coffee Shop | Park | Gas Station | Cricket Ground | Deli / Bodega | Electronics Store | Farmers Market | Fast Food Restaurant | Food Service |
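The `indicators` lookup above falls back to "th" once the list runs out, which is correct for 1 to 20 but would produce "21th", "22th" and "23th" if `num_top_venues` were raised above 20. A minimal sketch of a general English ordinal helper that builds the same column names:

```python
def ordinal(n):
    """Return n with its English ordinal suffix, e.g. 1 -> '1st', 12 -> '12th', 23 -> '23rd'."""
    if 10 <= n % 100 <= 20:
        suffix = "th"  # 10th to 20th, including the irregular 11th, 12th and 13th
    else:
        suffix = {1: "st", 2: "nd", 3: "rd"}.get(n % 10, "th")
    return f"{n}{suffix}"


num_top_venues = 10
columns = ["Neighborhood"] + [
    f"{ordinal(i)} Most Common Venue" for i in range(1, num_top_venues + 1)
]
```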


In [177]:
neighborhoods_venues_sorted

|   | Neighborhood | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue | 8th Most Common Venue | 9th Most Common Venue | 10th Most Common Venue |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Acocks Green | Supermarket | Pub | Warehouse Store | Breakfast Spot | Bowling Alley | Furniture / Home Store | Sandwich Place | Coffee Shop | Golf Course | Food Service |
| 1 | Aston | Park | Auto Garage | Grocery Store | Motorcycle Shop | Warehouse Store | Gas Station | Deli / Bodega | Electronics Store | Farmers Market | Fast Food Restaurant |
| 2 | Balsall Heath, Sparkbrook, Highgate | Fast Food Restaurant | Middle Eastern Restaurant | Performing Arts Venue | Electronics Store | Grocery Store | Café | Warehouse Store | Deli / Bodega | Farmers Market | Food Service |
| 3 | Birchfield, Handsworth Wood Perry Barr | Grocery Store | Warehouse Store | Gastropub | Cricket Ground | Deli / Bodega | Electronics Store | Farmers Market | Fast Food Restaurant | Food Service | Furniture / Home Store |
| 4 | Boldmere, New Oscott, Wylde Green | Gym Pool | Coffee Shop | Park | Gas Station | Cricket Ground | Deli / Bodega | Electronics Store | Farmers Market | Fast Food Restaurant | Food Service |
| 5 | Bordesley Green | Supermarket | Automotive Shop | Bus Station | Furniture / Home Store | Warehouse Store | Gastropub | Deli / Bodega | Electronics Store | Farmers Market | Fast Food Restaurant |
| 6 | Bournville, Cotteridge, Stirchley | Indian Restaurant | Grocery Store | Park | Beer Store | Beer Bar | Go Kart Track | Gym | Furniture / Home Store | Deli / Bodega | History Museum |
| 7 | Castle Vale | Warehouse Store | Gym Pool | Chinese Restaurant | Electronics Store | Supermarket | Gas Station | Cricket Ground | Deli / Bodega | Farmers Market | Fast Food Restaurant |
| 8 | Digbeth, Highgate, Lee Bank | Gay Bar | Chinese Restaurant | Hotel | Music Venue | Korean Restaurant | Japanese Restaurant | Grocery Store | Bookstore | Pizza Place | Latin American Restaurant |
| 9 | Edgbaston, Ladywood | Hotel | Convenience Store | Chinese Restaurant | Casino | Electronics Store | Fast Food Restaurant | Reservoir | Middle Eastern Restaurant | Sandwich Place | Persian Restaurant |


In [178]:
# set number of clusters
kclusters = 6

birmingham_grouped_clustering = birmingham_grouped.drop(columns='Neighborhood')

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(birmingham_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 

array([0, 1, 1, 1, 0, 0, 1, 0, 1, 1], dtype=int32)
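`KMeans` groups neighbourhoods whose category-frequency vectors are close in Euclidean distance. A minimal standard-library sketch of the underlying Lloyd iteration (assign each point to its nearest centroid, then move each centroid to the mean of its points), seeded with the first k points for reproducibility. This is a simplified illustration, not scikit-learn's implementation, which uses k-means++ initialisation by default, so its labels can differ:

```python
import math


def kmeans(points, k, max_iter=100):
    """Cluster equal-length numeric vectors into k groups with Lloyd's algorithm.

    Centroids are initialised from the first k points (a simplification chosen
    for reproducibility). Returns (labels, centroids).
    """
    centroids = [list(p) for p in points[:k]]
    labels = None
    for _ in range(max_iter):
        # Assignment step: index of the nearest centroid (ties go to the lower index).
        new_labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j])) for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so the centroids are stable
        labels = new_labels
        # Update step: move each centroid to the mean of the points assigned to it.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the previous centroid if a cluster ends up empty
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Two well-separated toy groups of 2-D points.
points = [(0, 0), (10, 10), (0, 1), (10, 11), (1, 0), (11, 10)]
labels, centroids = kmeans(points, k=2)
```

Applied to `birmingham_grouped_clustering.values.tolist()` with k = 6, this would be the analogue of the cell above, though the label numbering will not necessarily match scikit-learn's.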

In [179]:
# add clustering labels
neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

In [180]:
birmingham_merged = df_data_1

birmingham_merged = birmingham_merged.join(neighborhoods_venues_sorted.set_index('Neighborhood'), on='Town/Area')

birmingham_merged.head() # check the last columns!

|   | Postcode | Latitude | Longitude | Town/Area | Region | Population | Households | Cluster Labels | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue | 8th Most Common Venue | 9th Most Common Venue | 10th Most Common Venue |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | B5 | 52.4722 | -1.89683 | Digbeth, Highgate, Lee Bank | Birmingham | 12156.0 | 5139.0 | 1.0 | Gay Bar | Chinese Restaurant | Hotel | Music Venue | Korean Restaurant | Japanese Restaurant | Grocery Store | Bookstore | Pizza Place | Latin American Restaurant |
| 1 | B6 | 52.5025 | -1.88686 | Aston | Birmingham | 19507.0 | 5886.0 | 1.0 | Park | Auto Garage | Grocery Store | Motorcycle Shop | Warehouse Store | Gas Station | Deli / Bodega | Electronics Store | Farmers Market | Fast Food Restaurant |
| 2 | B7 | 52.4938 | -1.87437 | Nechells | Birmingham | 8554.0 | 3444.0 | 1.0 | Medical Center | Gym / Fitness Center | Grocery Store | Convenience Store | Gym Pool | Gym | Cricket Ground | Deli / Bodega | Electronics Store | Farmers Market |
| 3 | B8 | 52.49 | -1.84277 | Washwood Heath, Ward End, Saltley | Birmingham | 42278.0 | 11987.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 4 | B9 | 52.4781 | -1.85285 | Bordesley Green | Birmingham | 24915.0 | 7548.0 | 0.0 | Supermarket | Automotive Shop | Bus Station | Furniture / Home Store | Warehouse Store | Gastropub | Deli / Bodega | Electronics Store | Farmers Market | Fast Food Restaurant |


In [181]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters
colors_array = cm.rainbow(np.linspace(0, 1, kclusters))
rainbow = [colors.rgb2hex(i) for i in colors_array]
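The cell above samples `kclusters` evenly spaced colours from matplotlib's `rainbow` colormap. A dependency-free sketch of the same idea swaps in evenly spaced hues from the standard library's `colorsys.hsv_to_rgb` (so the exact colours differ from `cm.rainbow`), plus a lookup that gives postcodes with no venues, whose `Cluster Labels` value is NaN in `birmingham_merged`, a neutral grey. The saturation, value and grey are arbitrary choices:

```python
import colorsys


def cluster_palette(k, saturation=0.8, value=0.9):
    """Return k distinct '#rrggbb' colours with evenly spaced hues."""
    palette = []
    for i in range(k):
        r, g, b = colorsys.hsv_to_rgb(i / k, saturation, value)
        palette.append("#{:02x}{:02x}{:02x}".format(
            round(r * 255), round(g * 255), round(b * 255)))
    return palette


def colour_for(label, palette, missing="#808080"):
    """Colour for a cluster label; None or NaN labels get the `missing` grey."""
    if label is None or label != label:  # NaN is the only value not equal to itself
        return missing
    return palette[int(label)]
```

The float labels in `birmingham_merged` (for example `1.0`) are converted with `int()` before indexing the palette.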

In [182]:
birmingham_merged.loc[birmingham_merged['Cluster Labels'] == 0, birmingham_merged.columns[[1] + list(range(5, birmingham_merged.shape[1]))]]


|   | Latitude | Population | Households | Cluster Labels | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue | 8th Most Common Venue | 9th Most Common Venue | 10th Most Common Venue |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 4 | 52.4781 | 24915.0 | 7548.0 | 0.0 | Supermarket | Automotive Shop | Bus Station | Furniture / Home Store | Warehouse Store | Gastropub | Deli / Bodega | Electronics Store | Farmers Market | Fast Food Restaurant |
| 10 | 52.4679 | 17872.0 | 6220.0 | 0.0 | Coffee Shop | Hotel | Supermarket | Café | Pub | Restaurant | Botanical Garden | Gastropub | Asian Restaurant | Tapas Restaurant |
| 12 | 52.4615 | 25625.0 | 10999.0 | 0.0 | Coffee Shop | Indoor Play Area | Gym / Fitness Center | Deli / Bodega | Pub | Gas Station | Cricket Ground | Electronics Store | Farmers Market | Fast Food Restaurant |
| 20 | 52.4617 | 15861.0 | 5983.0 | 0.0 | Indian Restaurant | Supermarket | Optical Shop | Hotel | Gym | Food Service | Convenience Store | Cricket Ground | Deli / Bodega | Electronics Store |
| 21 | 52.4607 | 32458.0 | 13394.0 | 0.0 | Pub | Warehouse Store | Construction & Landscaping | Cricket Ground | Deli / Bodega | Electronics Store | Farmers Market | Fast Food Restaurant | Food Service | Furniture / Home Store |
| 22 | 52.4458 | 25331.0 | 10206.0 | 0.0 | Supermarket | Pub | Warehouse Store | Breakfast Spot | Bowling Alley | Furniture / Home Store | Sandwich Place | Coffee Shop | Golf Course | Food Service |
| 23 | 52.4278 | 30345.0 | 10982.0 | 0.0 | Indian Restaurant | Supermarket | Pub | Arts & Crafts Store | Theater | Coffee Shop | Convenience Store | Cricket Ground | Deli / Bodega | Electronics Store |
| 24 | 52.4377 | 42615.0 | 15306.0 | 0.0 | Pub | Post Office | Restaurant | Rugby Pitch | Warehouse Store | Coffee Shop | Convenience Store | Cricket Ground | Deli / Bodega | Electronics Store |
| 26 | 52.4091 | 53794.0 | 22747.0 | 0.0 | Pet Store | Gym Pool | Supermarket | Pub | Breakfast Spot | Furniture / Home Store | Convenience Store | Cricket Ground | Deli / Bodega | Electronics Store |
| 28 | 52.4807 | 34483.0 | 13699.0 | 0.0 | Train Station | Auto Garage | Supermarket | Furniture / Home Store | Warehouse Store | Gastropub | Cricket Ground | Deli / Bodega | Electronics Store | Farmers Market |


In [183]:
birmingham_merged.loc[birmingham_merged['Cluster Labels'] == 1, birmingham_merged.columns[[1] + list(range(5, birmingham_merged.shape[1]))]]


|   | Latitude | Population | Households | Cluster Labels | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue | 8th Most Common Venue | 9th Most Common Venue | 10th Most Common Venue |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 52.4722 | 12156.0 | 5139.0 | 1.0 | Gay Bar | Chinese Restaurant | Hotel | Music Venue | Korean Restaurant | Japanese Restaurant | Grocery Store | Bookstore | Pizza Place | Latin American Restaurant |
| 1 | 52.5025 | 19507.0 | 5886.0 | 1.0 | Park | Auto Garage | Grocery Store | Motorcycle Shop | Warehouse Store | Gas Station | Deli / Bodega | Electronics Store | Farmers Market | Fast Food Restaurant |
| 2 | 52.4938 | 8554.0 | 3444.0 | 1.0 | Medical Center | Gym / Fitness Center | Grocery Store | Convenience Store | Gym Pool | Gym | Cricket Ground | Deli / Bodega | Electronics Store | Farmers Market |
| 5 | 52.4701 | 26004.0 | 7108.0 | 1.0 | Sandwich Place | Café | Indian Restaurant | Turkish Restaurant | Gym / Fitness Center | Fast Food Restaurant | Middle Eastern Restaurant | Cricket Ground | Deli / Bodega | Electronics Store |
| 6 | 52.4554 | 44391.0 | 12602.0 | 1.0 | Convenience Store | Gas Station | Italian Restaurant | Pakistani Restaurant | Cricket Ground | Deli / Bodega | Electronics Store | Farmers Market | Fast Food Restaurant | Food Service |
| 7 | 52.4609 | 20548.0 | 6649.0 | 1.0 | Fast Food Restaurant | Middle Eastern Restaurant | Performing Arts Venue | Electronics Store | Grocery Store | Café | Warehouse Store | Deli / Bodega | Farmers Market | Food Service |
| 8 | 52.4351 | 37672.0 | 15392.0 | 1.0 | Convenience Store | Auto Workshop | Golf Course | Gastropub | Cricket Ground | Deli / Bodega | Electronics Store | Farmers Market | Fast Food Restaurant | Food Service |
| 9 | 52.4188 | 43099.0 | 18063.0 | 1.0 | Photography Studio | Gym / Fitness Center | Chinese Restaurant | Grocery Store | Cricket Ground | Restaurant | Food Service | Convenience Store | Deli / Bodega | Electronics Store |
| 11 | 52.4755 | 23619.0 | 10305.0 | 1.0 | Hotel | Convenience Store | Chinese Restaurant | Casino | Electronics Store | Fast Food Restaurant | Reservoir | Middle Eastern Restaurant | Sandwich Place | Persian Restaurant |
| 14 | 52.4963 | 20204.0 | 6606.0 | 1.0 | Soccer Field | Soccer Stadium | Food Service | Convenience Store | Cricket Ground | Deli / Bodega | Electronics Store | Farmers Market | Fast Food Restaurant | Furniture / Home Store |


In [184]:
birmingham_merged.loc[birmingham_merged['Cluster Labels'] == 2, birmingham_merged.columns[[1] + list(range(5, birmingham_merged.shape[1]))]]


|   | Latitude | Population | Households | Cluster Labels | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue | 8th Most Common Venue | 9th Most Common Venue | 10th Most Common Venue |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 35 | 52.3885 | 30273.0 | 12810.0 | 2.0 | Park | Warehouse Store | IT Services | Cricket Ground | Deli / Bodega | Electronics Store | Farmers Market | Fast Food Restaurant | Food Service | Furniture / Home Store |


In [185]:
birmingham_merged.loc[birmingham_merged['Cluster Labels'] == 3, birmingham_merged.columns[[1] + list(range(5, birmingham_merged.shape[1]))]]


|   | Latitude | Population | Households | Cluster Labels | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue | 8th Most Common Venue | 9th Most Common Venue | 10th Most Common Venue |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 27 | 52.4509 | 38357.0 | 16209.0 | 3.0 | Bakery | Construction & Landscaping | Warehouse Store | Gastropub | Cricket Ground | Deli / Bodega | Electronics Store | Farmers Market | Fast Food Restaurant | Food Service |
| 29 | 52.4957 | 18941.0 | 8112.0 | 3.0 | IT Services | Construction & Landscaping | Convenience Store | Cricket Ground | Deli / Bodega | Electronics Store | Farmers Market | Fast Food Restaurant | Food Service | Furniture / Home Store |


In [186]:
birmingham_merged.loc[birmingham_merged['Cluster Labels'] == 4, birmingham_merged.columns[[1] + list(range(5, birmingham_merged.shape[1]))]]


|   | Latitude | Population | Households | Cluster Labels | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue | 8th Most Common Venue | 9th Most Common Venue | 10th Most Common Venue |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 13 | 52.4915 | 16739.0 | 6389.0 | 4.0 | Bar | Warehouse Store | Gastropub | Cricket Ground | Deli / Bodega | Electronics Store | Farmers Market | Fast Food Restaurant | Food Service | Furniture / Home Store |


In [187]:
birmingham_merged.loc[birmingham_merged['Cluster Labels'] == 5, birmingham_merged.columns[[1] + list(range(5, birmingham_merged.shape[1]))]]


|   | Latitude | Population | Households | Cluster Labels | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue | 8th Most Common Venue | 9th Most Common Venue | 10th Most Common Venue |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 40 | 52.5447 | 21509.0 | 9175.0 | 5.0 | Rugby Pitch | Warehouse Store | Construction & Landscaping | Cricket Ground | Deli / Bodega | Electronics Store | Farmers Market | Fast Food Restaurant | Food Service | Furniture / Home Store |


In [188]:
birmingham_merged.loc[birmingham_merged['Cluster Labels'] == 6, birmingham_merged.columns[[1] + list(range(5, birmingham_merged.shape[1]))]]


Unnamed: 0,Latitude,Population,Households,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue


In [189]:
birmingham_merged.loc[birmingham_merged['Cluster Labels'] == 7, birmingham_merged.columns[[1] + list(range(5, birmingham_merged.shape[1]))]]


Unnamed: 0,Latitude,Population,Households,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue


In [190]:
birmingham_merged.loc[birmingham_merged['Cluster Labels'] == 8, birmingham_merged.columns[[1] + list(range(5, birmingham_merged.shape[1]))]]


Unnamed: 0,Latitude,Population,Households,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue


In [191]:
birmingham_merged.loc[birmingham_merged['Cluster Labels'] == 9, birmingham_merged.columns[[1] + list(range(5, birmingham_merged.shape[1]))]]


Unnamed: 0,Latitude,Population,Households,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
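The cells above repeat the same `.loc` selection once per cluster label, and clusters 6 to 9 come back empty. A minimal sketch of an alternative, assuming `birmingham_merged` keeps the layout used above (column 1 plus every column from index 5 onwards), is to group by `Cluster Labels` so that only clusters that actually contain rows are produced. The helper name `tables_by_cluster` and the toy frame are hypothetical illustrations, not part of the original analysis.

```python
import pandas as pd


def tables_by_cluster(df, label_col="Cluster Labels", keep=(1,), start=5):
    """Return {cluster label: rows in that cluster}, using the same column
    selection as the cells above: column 1 plus every column from index 5 on."""
    cols = df.columns[list(keep) + list(range(start, df.shape[1]))]
    # groupby only yields labels that occur in the data, so empty clusters are skipped
    return {label: group[cols] for label, group in df.groupby(label_col)}


# Hypothetical toy frame; column names and values are illustrative only
toy = pd.DataFrame({
    "Ward": ["w1", "w2", "w3", "w4"],
    "Latitude": [52.4509, 52.4915, 52.4957, 52.5447],
    "Longitude": [-1.90, -1.89, -1.88, -1.85],
    "Col3": [0, 0, 0, 0],
    "Col4": [0, 0, 0, 0],
    "Population": [38357.0, 16739.0, 18941.0, 21509.0],
    "Cluster Labels": [3.0, 4.0, 3.0, 5.0],
    "1st Most Common Venue": ["Bakery", "Bar", "IT Services", "Rugby Pitch"],
})

for label, table in tables_by_cluster(toy).items():
    print(f"Cluster {int(label)}: {len(table)} row(s)")
```

In the notebook itself, `display(table)` inside the loop would render each cluster's table in turn.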


In [192]:
address = 'Moor St Queensway, Birmingham, West Midlands, B4 7UL, UK'

geolocator = Nominatim(user_agent="foursquare_agent")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print(latitude, longitude)

52.4788762 -1.8929807


In [193]:
search_query = 'Hotel'
radius = 10000
print(search_query + ' .... OK!')

Hotel .... OK!


In [194]:
url = 'https://api.foursquare.com/v2/venues/search?client_id={}&client_secret={}&ll={},{}&v={}&query={}&radius={}&limit={}'.format(CLIENT_ID, CLIENT_SECRET, latitude, longitude, VERSION, search_query, radius, LIMIT)
url

'https://api.foursquare.com/v2/venues/search?client_id=<CLIENT_ID>&client_secret=<CLIENT_SECRET>&ll=52.4788762,-1.8929807&v=20180604&query=Hotel&radius=10000&limit=1000'
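Formatting the URL by hand works for a one-word query, but `urllib.parse.urlencode` escapes every parameter (a multi-word query such as 'Italian Restaurant' becomes `Italian+Restaurant`), and echoing the finished URL prints the client secret into the notebook output. A sketch under those assumptions: `build_search_url` and `redact` are hypothetical helpers that reproduce the parameters of the request above; only the endpoint and parameter names are taken from it.

```python
from urllib.parse import urlencode

FOURSQUARE_SEARCH = "https://api.foursquare.com/v2/venues/search"


def build_search_url(client_id, client_secret, lat, lng, version, query, radius, limit):
    """Build the v2 venues/search URL with each parameter properly escaped."""
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "ll": f"{lat},{lng}",  # urlencode escapes the comma as %2C
        "v": version,
        "query": query,
        "radius": radius,      # metres
        "limit": limit,
    }
    return FOURSQUARE_SEARCH + "?" + urlencode(params)


def redact(url, *secrets):
    """Return a printable copy of url with each secret string masked."""
    for secret in secrets:
        url = url.replace(secret, "***")
    return url


url = build_search_url("my-id", "my-secret", 52.4788762, -1.8929807,
                       "20180604", "Hotel", 10000, 1000)
print(redact(url, "my-id", "my-secret"))
```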

In [195]:
results = requests.get(url).json()

In [196]:
# assign relevant part of JSON to venues
venues = results['response']['venues']

# transform venues into a dataframe
dataframe = json_normalize(venues)
dataframe.head()

Unnamed: 0,categories,hasPerk,id,location.address,location.cc,location.city,location.country,location.crossStreet,location.distance,location.formattedAddress,location.labeledLatLngs,location.lat,location.lng,location.neighborhood,location.postalCode,location.state,name,referralId,venuePage.id
0,"[{'id': '4bf58dd8d48988d1fa931735', 'name': 'H...",False,570673e7498e909686bcccce,"160 Wharfside Street, The Mailbox",GB,Birmingham,United Kingdom,,979,"[160 Wharfside Street, The Mailbox, Birmingham...","[{'label': 'display', 'lat': 52.475198, 'lng':...",52.475198,-1.906108,,B1 1RL,West Midlands,AC Hotel by Marriott Birmingham,v-1583366449,422703144.0
1,"[{'id': '4bf58dd8d48988d1fa931735', 'name': 'H...",False,4b058828f964a5205db522e3,New St,GB,Birmingham,United Kingdom,,277,"[New St, Birmingham, West Midlands, B2 4RX, Un...","[{'label': 'display', 'lat': 52.47899629983553...",52.478996,-1.897068,,B2 4RX,West Midlands,Britannia Hotel,v-1583366449,
2,"[{'id': '4bf58dd8d48988d1fa931735', 'name': 'H...",False,4b058829f964a52077b522e3,"12 Holloway Circus, Queensway",GB,Birmingham,United Kingdom,,624,"[12 Holloway Circus, Queensway, Birmingham, We...","[{'label': 'display', 'lat': 52.47547677455434...",52.475477,-1.90031,,B1 1BT,West Midlands,"Radisson Blu Hotel, Birmingham",v-1583366449,
3,"[{'id': '4bf58dd8d48988d1fa931735', 'name': 'H...",False,4b058828f964a5203ab522e3,126 New St,GB,Birmingham,United Kingdom,btw Lower Temple St & Stephensons St,399,[126 New St (btw Lower Temple St & Stephensons...,"[{'label': 'display', 'lat': 52.47904101927383...",52.479041,-1.898868,,B2 4JQ,West Midlands,Macdonald Burlington Hotel,v-1583366449,
4,"[{'id': '4bf58dd8d48988d1fa931735', 'name': 'H...",False,4ba93479f964a5201a143ae3,21 Ladywell Walk,GB,Birmingham,United Kingdom,,486,"[21 Ladywell Walk, Birmingham, West Midlands, ...","[{'label': 'display', 'lat': 52.47504220011472...",52.475042,-1.89643,,B5 4ST,West Midlands,Ibis Hotel,v-1583366449,
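`json_normalize` flattens each nested venue dictionary into dotted column names such as `location.lat` and `location.distance`, but leaves list-valued fields such as `categories` as Python lists, which is why a separate step is needed to pull out a category name. A small sketch on a hypothetical two-venue sample; it uses the top-level `pd.json_normalize` (pandas 1.0 and later), while older notebooks import the same function from `pandas.io.json`.

```python
import pandas as pd

# Hypothetical sample shaped like results['response']['venues']
sample_venues = [
    {"id": "v1", "name": "Example Hotel",
     "categories": [{"id": "c1", "name": "Hotel"}],
     "location": {"lat": 52.4752, "lng": -1.9061, "distance": 979}},
    {"id": "v2", "name": "Example Café",
     "categories": [],
     "location": {"lat": 52.4791, "lng": -1.8993, "distance": 426}},
]

# Nested dicts become dotted columns; the 'categories' list is left as-is
flat = pd.json_normalize(sample_venues)

# Take the first-listed category name, or None when the list is empty
flat["primary_category"] = flat["categories"].apply(
    lambda cats: cats[0]["name"] if cats else None
)
print(flat[["name", "primary_category", "location.distance"]])
```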


In [197]:
# keep only the venue name, its categories, every location-related column, and the venue id
filtered_columns = ['name', 'categories'] + [col for col in dataframe.columns if col.startswith('location.')] + ['id']
# .copy() gives an independent frame, so the column assignment below does not trigger SettingWithCopyWarning
dataframe_filtered = dataframe.loc[:, filtered_columns].copy()

# function that extracts the primary (first-listed) category name of a venue
def get_category_type(row):
    try:
        categories_list = row['categories']
    except KeyError:
        # fall back to the column name produced when nested 'venue' records are normalised
        categories_list = row['venue.categories']

    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

# replace each row's list of category dicts with its primary category name
dataframe_filtered['categories'] = dataframe_filtered.apply(get_category_type, axis=1)

# clean column names by keeping only last term
dataframe_filtered.columns = [column.split('.')[-1] for column in dataframe_filtered.columns]

dataframe_filtered

Unnamed: 0,name,categories,address,cc,city,country,crossStreet,distance,formattedAddress,labeledLatLngs,lat,lng,neighborhood,postalCode,state,id
0,AC Hotel by Marriott Birmingham,Hotel,"160 Wharfside Street, The Mailbox",GB,Birmingham,United Kingdom,,979,"[160 Wharfside Street, The Mailbox, Birmingham...","[{'label': 'display', 'lat': 52.475198, 'lng':...",52.475198,-1.906108,,B1 1RL,West Midlands,570673e7498e909686bcccce
1,Britannia Hotel,Hotel,New St,GB,Birmingham,United Kingdom,,277,"[New St, Birmingham, West Midlands, B2 4RX, Un...","[{'label': 'display', 'lat': 52.47899629983553...",52.478996,-1.897068,,B2 4RX,West Midlands,4b058828f964a5205db522e3
2,"Radisson Blu Hotel, Birmingham",Hotel,"12 Holloway Circus, Queensway",GB,Birmingham,United Kingdom,,624,"[12 Holloway Circus, Queensway, Birmingham, We...","[{'label': 'display', 'lat': 52.47547677455434...",52.475477,-1.90031,,B1 1BT,West Midlands,4b058829f964a52077b522e3
3,Macdonald Burlington Hotel,Hotel,126 New St,GB,Birmingham,United Kingdom,btw Lower Temple St & Stephensons St,399,[126 New St (btw Lower Temple St & Stephensons...,"[{'label': 'display', 'lat': 52.47904101927383...",52.479041,-1.898868,,B2 4JQ,West Midlands,4b058828f964a5203ab522e3
4,Ibis Hotel,Hotel,21 Ladywell Walk,GB,Birmingham,United Kingdom,,486,"[21 Ladywell Walk, Birmingham, West Midlands, ...","[{'label': 'display', 'lat': 52.47504220011472...",52.475042,-1.89643,,B5 4ST,West Midlands,4ba93479f964a5201a143ae3
5,Hotel Chocolat,Café,117 New St,GB,Birmingham,United Kingdom,,426,"[117 New St, Birmingham, West Midlands, B2 4JH...","[{'label': 'display', 'lat': 52.479125, 'lng':...",52.479125,-1.899262,,B2 4JH,West Midlands,5a05d9d10457b70895825389
6,Copthorne Hotel Birmingham,Hotel,"Paradise Circus, Paradise Place",GB,Warwickshire,United Kingdom,Centenary Square,928,"[Paradise Circus, Paradise Place (Centenary Sq...","[{'label': 'display', 'lat': 52.48033191682519...",52.480332,-1.906459,,B3 3HJ,Warwickshire,51818544c84c71399407eca0
7,Ibis Budget Hotel,Hotel,1 Great Colmore Street,GB,Birmingham,United Kingdom,,971,"[1 Great Colmore Street, Birmingham, West Midl...","[{'label': 'display', 'lat': 52.47138548166894...",52.471385,-1.900328,,B15 2AP,West Midlands,501d14ade4b08947b51d0f04
8,Hotel du Vin & Bistro,Hotel,25 Church St,GB,Birmingham,United Kingdom,,641,"[25 Church St, Birmingham, West Midlands, B3 2...","[{'label': 'display', 'lat': 52.48239866190785...",52.482399,-1.900461,City Centre,B3 2NR,West Midlands,4badf914f964a520fd763be3
9,Hotel Indigo Birmingham,Hotel,"The Cube, Wharfside St",GB,Birmingham,United Kingdom,at Commercial St,1021,"[The Cube, Wharfside St (at Commercial St), Bi...","[{'label': 'display', 'lat': 52.4749228, 'lng'...",52.474923,-1.906575,,B1 1RS,West Midlands,4eee6a37003937534527c5ba
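The `distance` column is the distance in metres that Foursquare reports between each venue and the search point. As a sanity check, a haversine great-circle calculation, assuming a spherical Earth with a mean radius of 6,371 km, should land within a few metres of those values. `haversine_m` is a hypothetical helper, not part of the Foursquare response.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres (spherical approximation)


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two points given in decimal degrees."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = radians(lat2 - lat1)
    dlmb = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * asin(sqrt(a))


# Search centre (Moor St Queensway) to the AC Hotel by Marriott, coordinates from the table above
d = haversine_m(52.4788762, -1.8929807, 52.475198, -1.906108)
print(round(d))
```

For the AC Hotel this comes out at roughly 979 m, in line with the `distance` value in the table.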


In [198]:
dataframe_filtered.name

0                       AC Hotel by Marriott Birmingham
1                                       Britannia Hotel
2                        Radisson Blu Hotel, Birmingham
3                            Macdonald Burlington Hotel
4                                            Ibis Hotel
5                                        Hotel Chocolat
6                            Copthorne Hotel Birmingham
7                                     Ibis Budget Hotel
8                                 Hotel du Vin & Bistro
9                               Hotel Indigo Birmingham
10    Conference Aston - Aston Business School Hotel...
11                                BLOC Hotel Birmingham
12                                           Ibis Hotel
13                                    Ibis Styles Hotel
14                                        Clayton Hotel
15                                      Campanile Hotel
16                                       Hotel Chocolat
17                                 easyHotel Bir
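The name-based query also returns venues that are not hotels: 'Hotel Chocolat' (category Café in row 5) appears twice, and 'Ibis Hotel' also appears twice, presumably as separate branches. A minimal sketch, assuming `dataframe_filtered` has the `name`, `categories`, `distance` and `id` columns shown above, keeps only rows whose primary category is 'Hotel', de-duplicates by venue `id` rather than by name, and optionally restricts results to a given distance of the search centre. `hotels_only` and the 500 m cut-off are illustrative choices, not part of the original analysis.

```python
import pandas as pd


def hotels_only(df, max_distance_m=None):
    """Keep rows whose primary category is exactly 'Hotel', drop repeated venue ids,
    optionally keep only venues within max_distance_m metres, nearest first."""
    out = df[df["categories"] == "Hotel"].drop_duplicates(subset="id")
    if max_distance_m is not None:
        out = out[out["distance"] <= max_distance_m]
    return out.sort_values("distance").reset_index(drop=True)


# Rows copied from the filtered table above (subset of columns)
toy = pd.DataFrame({
    "name": ["AC Hotel by Marriott Birmingham", "Britannia Hotel",
             "Radisson Blu Hotel, Birmingham", "Macdonald Burlington Hotel",
             "Ibis Hotel", "Hotel Chocolat"],
    "categories": ["Hotel", "Hotel", "Hotel", "Hotel", "Hotel", "Café"],
    "distance": [979, 277, 624, 399, 486, 426],
    "id": ["570673e7498e909686bcccce", "4b058828f964a5205db522e3",
           "4b058829f964a52077b522e3", "4b058828f964a5203ab522e3",
           "4ba93479f964a5201a143ae3", "5a05d9d10457b70895825389"],
})

print(hotels_only(toy, max_distance_m=500)[["name", "distance"]])
```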

In [199]:
venues_map = folium.Map(location=[latitude, longitude], zoom_start=13)  # generate map centred on Moor St Queensway

# add a red circle marker to represent the search centre (Moor St Queensway)
folium.CircleMarker(
    [latitude, longitude],
    radius=10,
    color='red',
    popup='Moor St Queensway',
    fill=True,
    fill_color='red',
    fill_opacity=0.6
).add_to(venues_map)

# add the hotel search results as blue circle markers, labelled by category
for lat, lng, label in zip(dataframe_filtered.lat, dataframe_filtered.lng, dataframe_filtered.categories):
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        color='blue',
        popup=label,
        fill=True,
        fill_color='blue',
        fill_opacity=0.6
    ).add_to(venues_map)

# display map
venues_map