# **CAPSTONE PROJECT: BATTLE OF THE NEIGHBORHOODS** 

## **I. Purpose**

This document describes my final peer-reviewed assignment for the IBM Data Science Professional Certificate program – Coursera Capstone.

## **II. Introduction**

Glasgow is the most populous city in Scotland and the third most populous city in the United Kingdom, with an estimated city population of 621,020 as of 2017. Historically part of Lanarkshire, the city now forms the Glasgow City council area, one of the 32 council areas of Scotland. Glasgow is situated on the River Clyde in the country's West Central Lowlands. It is the fifth most visited city in the UK.

Glasgow has an estimated population of 596,000, little changed from 595,000 in 2012 and 593,000 in 2011. With a population density of 3,400 people per square kilometer, it is the most densely populated city in Scotland. The Greater Glasgow area has an estimated population of 1.2 million, while the region surrounding the conurbation has about 2.8 million residents. This represents about 42% of the population of Scotland. The city proper covers 175 square kilometers (approximately 68 square miles), while the metro area extends over 3,338 square kilometers (about 1,289 square miles).

Coffee is the most popular drink worldwide, with around two billion cups consumed every day. In the UK, we now drink approximately 95 million cups of coffee per day. The coffee industry creates over 210,000 UK jobs. The Gross Value Added contribution from the UK coffee industry to the economy is estimated to be £9.1 billion, while its output contribution, including indirect and induced multiplier impacts, was £17.7 billion in 2017.

In this project, we will attempt to use FourSquare and K-Means clustering to find the optimal location for opening a new cafe.

## **III. Data acquisition**

This demonstration will make use of the following data sources:

- Greater Glasgow & Clyde areas with their size and population density: https://en.wikipedia.org/wiki/List_of_places_in_Glasgow
- A list of current gyms and exercise facilities as found on Google Maps.
- Glasgow top venue recommendations from the FourSquare API (FourSquare website: www.foursquare.com)

I will be using the FourSquare API to explore areas in Glasgow. The Foursquare explore function will be used to get the locations of the gyms in each neighborhood, and this information will then be used to group the neighborhoods into clusters. The following information is retrieved in the first query:

- Venue ID
- Venue name
- Coordinates: latitude and longitude
- Category name

A second venue query will be performed to retrieve venue ratings for each location. Note that rating information is a paid service from FourSquare, and we are limited to 50 queries per day. Given this constraint, we limit the category analysis to only one venue type for this demo. I will try to retrieve as many ratings as possible for the retrieved venue IDs.
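
The first query described above can be sketched as follows. This is only a minimal sketch, assuming the legacy Foursquare v2 `venues/explore` endpoint and its usual response layout (`response` → `groups` → `items` → `venue`). The helper names, the placeholder credentials, the version date, and the sample payload are hypothetical and exist only to show how the four fields would be extracted.

```python
# Endpoint of the legacy Foursquare v2 explore call (an assumption; check the current API docs)
EXPLORE_URL = "https://api.foursquare.com/v2/venues/explore"


def build_explore_params(lat, lng, client_id, client_secret,
                         radius=500, limit=100, version="20180605"):
    """Build the query parameters for one explore request around (lat, lng)."""
    return {
        "client_id": client_id,
        "client_secret": client_secret,
        "v": version,              # API version date, YYYYMMDD (placeholder value)
        "ll": f"{lat},{lng}",      # latitude,longitude of the search centre
        "radius": radius,          # search radius in metres
        "limit": limit,            # maximum number of venues to return
    }


def parse_explore_response(payload):
    """Flatten an explore response into one dict per venue."""
    rows = []
    for group in payload.get("response", {}).get("groups", []):
        for item in group.get("items", []):
            venue = item["venue"]
            categories = venue.get("categories", [])
            rows.append({
                "id": venue["id"],
                "name": venue["name"],
                "lat": venue["location"]["lat"],
                "lng": venue["location"]["lng"],
                "category": categories[0]["name"] if categories else None,
            })
    return rows


# Hypothetical payload with the assumed response shape (not real data)
sample_payload = {
    "response": {
        "groups": [
            {
                "items": [
                    {
                        "venue": {
                            "id": "hypothetical-venue-id",
                            "name": "Example Cafe",
                            "location": {"lat": 55.87, "lng": -4.31},
                            "categories": [{"name": "Coffee Shop"}],
                        }
                    }
                ]
            }
        ]
    }
}

params = build_explore_params(55.8699211, -4.3094365, "YOUR_CLIENT_ID", "YOUR_CLIENT_SECRET")
venues = parse_explore_response(sample_payload)
print(venues)
```

With real credentials, `requests.get(EXPLORE_URL, params=params).json()` would supply the payload in place of `sample_payload`.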

## **IV. Methodology**

In [1]:
# Install (where needed) and import all necessary packages

import requests #handles HTTP requests
from bs4 import BeautifulSoup #parses through xml and html 
import csv
import json
import xml
import pandas as pd
import numpy as np
from pprint import pprint

#show all columns and rows when displaying dataframes
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

#transforms json into a pandas dataframe (pandas >= 1.0; on older versions use: from pandas.io.json import json_normalize)
from pandas import json_normalize

#visualising maps and data interpreted by Python
!conda install -c conda-forge folium=0.5.0 --yes 
import folium
from folium import plugins

!conda install -c conda-forge geopy --yes
from geopy.geocoders import Nominatim #converts address into latitude and longitude coordinates

#libraries for displaying images 
from IPython.display import Image
from IPython.core.display import HTML

#plotting
import matplotlib.cm as cm
import matplotlib.colors as colors 

#clustering
from sklearn.cluster import KMeans




In [2]:
#fetch the Wikipedia page listing places in Glasgow; note that `url` holds the page's HTML text, not the address
url = requests.get('https://en.wikipedia.org/wiki/List_of_places_in_Glasgow').text

In [3]:
soup = BeautifulSoup(url, 'lxml')
print(soup.prettify())

[Output truncated: prettified HTML of the page "List of places in Glasgow - Wikipedia"]

In [4]:
#locate the first sortable wikitable on the page and collect its rows
arrond_table = soup.find('table', class_='wikitable sortable')
arrond_table_rows = arrond_table.find_all('tr')

In [5]:
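#extract the text of each table row as a list of cell strings
information = []
for row in arrond_table_rows:
    #split the row text on newlines and drop the first and last fields (empty strings from the leading and trailing newline)
    info = row.text.split('\n')[1:-1]
    information.append(info)

information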

[['',
  '',
  'District',
  '',
  'Population',
  '',
  'Area (km²)',
  '',
  'Density (/km²)'],
 ['1', '', 'Govanhill', '', '9,725', '', '0.86', '', '11,308'],
 ['2', '', 'Pollokshields', '', '9,738', '', '1.59', '', '6,125'],
 ['3', '', 'Partick', '', '8,884', '', '0.85', '', '10,452'],
 ['4', '', 'Hillhead', '', '6,275', '', '0.96', '', '6,536'],
 ['5', '', 'Govan', '', '5,860', '', '1.63', '', '3,595'],
 ['6', '', 'Gorbals', '', '6,030', '', '0.83', '', '7,265'],
 ['7', '', 'Shawlands', '', '7,015', '', '0.52', '', '13,490'],
 ['8', '', 'Langside', '', '4,425', '', '0.46', '', '9,620'],
 ['Σ', '', 'Total', '', '57,952', '', '7.7', '', '7,526']]
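
The empty strings between the values come from blank lines in each row's text. The sketch below reproduces the slicing on a single row and shows one possible way to drop the blanks. The row text is reconstructed from the output above, so its exact whitespace is an assumption.

```python
# Text of the Govanhill row as BeautifulSoup would plausibly return it
# (reconstructed from the printed list, so the exact whitespace is an assumption)
row_text = "\n1\n\nGovanhill\n\n9,725\n\n0.86\n\n11,308\n"

# Same slicing as in the notebook: drop the first and last (empty) fields
raw_fields = row_text.split("\n")[1:-1]

# Alternative: keep only the non-empty fields, which avoids blank-named columns
clean_fields = [field for field in row_text.split("\n") if field]

print(raw_fields)
print(clean_fields)
```

One caveat with the filtered version: the header row has no label for the numbering column, so it would yield four fields against five in each data row, and a placeholder name would have to be added before building a dataframe.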

In [6]:
#turn the information into a dataframe, using the first row as column names (blank fields become blank column names)
arrond_df = pd.DataFrame(information[1:], columns=information[0])

arrond_df.head(25)


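| Unnamed: 0 | Unnamed: 1 | Unnamed: 2 | District | Unnamed: 4 | Population | Unnamed: 6 | Area (km²) | Unnamed: 8 | Density (/km²) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | 1 | | Govanhill | | 9725 | | 0.86 | | 11308 |
| 1 | 2 | | Pollokshields | | 9738 | | 1.59 | | 6125 |
| 2 | 3 | | Partick | | 8884 | | 0.85 | | 10452 |
| 3 | 4 | | Hillhead | | 6275 | | 0.96 | | 6536 |
| 4 | 5 | | Govan | | 5860 | | 1.63 | | 3595 |
| 5 | 6 | | Gorbals | | 6030 | | 0.83 | | 7265 |
| 6 | 7 | | Shawlands | | 7015 | | 0.52 | | 13490 |
| 7 | 8 | | Langside | | 4425 | | 0.46 | | 9620 |
| 8 | Σ | | Total | | 57952 | | 7.7 | | 7526 |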


In [34]:
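from geopy.geocoders import Nominatim
#recent geopy versions require an explicit user_agent; "glasgow_cafe_explorer" is a placeholder name
geolocator = Nominatim(user_agent="glasgow_cafe_explorer")
#geocode each district name; the 'Total' row is not a place, so its coordinates are meaningless
arrond_df['District_Coord']= arrond_df['District'].apply(geolocator.geocode).apply(lambda x: (x.latitude, x.longitude))

arrond_df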

  


| Unnamed: 0 | Unnamed: 1 | Unnamed: 2 | District | Unnamed: 4 | Population | Unnamed: 6 | Area (km²) | Unnamed: 8 | Density (/km²) | Major_Dist_Coord | District_Coord |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | 1 | | Govanhill | | 9725 | | 0.86 | | 11308 | (55.8363741, -4.2581531) | (55.8363741, -4.2581531) |
| 1 | 2 | | Pollokshields | | 9738 | | 1.59 | | 6125 | (55.8422663, -4.2849973) | (55.8422663, -4.2849973) |
| 2 | 3 | | Partick | | 8884 | | 0.85 | | 10452 | (55.8699211, -4.3094365) | (55.8699211, -4.3094365) |
| 3 | 4 | | Hillhead | | 6275 | | 0.96 | | 6536 | (55.8752091, -4.293281) | (55.8752091, -4.293281) |
| 4 | 5 | | Govan | | 5860 | | 1.63 | | 3595 | (55.860879, -4.3185273) | (55.860879, -4.3185273) |
| 5 | 6 | | Gorbals | | 6030 | | 0.83 | | 7265 | (55.851813, -4.2531625) | (55.851813, -4.2531625) |
| 6 | 7 | | Shawlands | | 7015 | | 0.52 | | 13490 | (55.8292301, -4.2924584) | (55.8292301, -4.2924584) |
| 7 | 8 | | Langside | | 4425 | | 0.46 | | 9620 | (55.8209413, -4.276069) | (55.8209413, -4.276069) |
| 8 | Σ | | Total | | 57952 | | 7.7 | | 7526 | (45.420063, 12.3751361) | (45.420063, 12.3751361) |
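
The coordinates returned for the 'Total' row are clearly not in Glasgow. A simple distance check can flag such results automatically. The sketch below is one possible approach: the centre coordinates for Glasgow are approximate and the 50 km threshold is an arbitrary assumption, and `haversine_km` is a helper written for this illustration.

```python
import math

# Approximate coordinates of central Glasgow (an assumption for this check)
GLASGOW_CENTRE = (55.8642, -4.2518)
EARTH_RADIUS_KM = 6371.0


def haversine_km(point_a, point_b):
    """Great-circle distance in kilometres between two (lat, lon) pairs given in degrees."""
    lat1, lon1 = map(math.radians, point_a)
    lat2, lon2 = map(math.radians, point_b)
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    h = math.sin(dlat / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


# Two geocoding results copied from the table above
coords = {
    "Govanhill": (55.8363741, -4.2581531),
    "Total": (45.420063, 12.3751361),
}

# Flag anything further than 50 km from the centre (threshold chosen arbitrarily)
suspicious = [
    name for name, point in coords.items()
    if haversine_km(point, GLASGOW_CENTRE) > 50
]
print(suspicious)
```

Rows flagged this way, such as the summary row, could be dropped before any mapping or clustering step.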


In [43]:
#columns[0] is the blank column name '', so this drops every blank-named column at once
arrond_new_df = arrond_df.drop(arrond_df.columns[0],axis=1)
arrond_new_df

| Unnamed: 0 | District | Population | Area (km²) | Density (/km²) | Major_Dist_Coord | District_Coord |
| --- | --- | --- | --- | --- | --- | --- |
| 0 | Govanhill | 9725 | 0.86 | 11308 | (55.8363741, -4.2581531) | (55.8363741, -4.2581531) |
| 1 | Pollokshields | 9738 | 1.59 | 6125 | (55.8422663, -4.2849973) | (55.8422663, -4.2849973) |
| 2 | Partick | 8884 | 0.85 | 10452 | (55.8699211, -4.3094365) | (55.8699211, -4.3094365) |
| 3 | Hillhead | 6275 | 0.96 | 6536 | (55.8752091, -4.293281) | (55.8752091, -4.293281) |
| 4 | Govan | 5860 | 1.63 | 3595 | (55.860879, -4.3185273) | (55.860879, -4.3185273) |
| 5 | Gorbals | 6030 | 0.83 | 7265 | (55.851813, -4.2531625) | (55.851813, -4.2531625) |
| 6 | Shawlands | 7015 | 0.52 | 13490 | (55.8292301, -4.2924584) | (55.8292301, -4.2924584) |
| 7 | Langside | 4425 | 0.46 | 9620 | (55.8209413, -4.276069) | (55.8209413, -4.276069) |
| 8 | Total | 57952 | 7.7 | 7526 | (45.420063, 12.3751361) | (45.420063, 12.3751361) |


In [44]:
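#drop the duplicate coordinate column by name rather than by position (errors='ignore' skips the drop if the column is absent)
arrond_new_df = arrond_new_df.drop(columns='Major_Dist_Coord', errors='ignore')
arrond_new_df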

| Unnamed: 0 | District | Population | Area (km²) | Density (/km²) | District_Coord |
| --- | --- | --- | --- | --- | --- |
| 0 | Govanhill | 9725 | 0.86 | 11308 | (55.8363741, -4.2581531) |
| 1 | Pollokshields | 9738 | 1.59 | 6125 | (55.8422663, -4.2849973) |
| 2 | Partick | 8884 | 0.85 | 10452 | (55.8699211, -4.3094365) |
| 3 | Hillhead | 6275 | 0.96 | 6536 | (55.8752091, -4.293281) |
| 4 | Govan | 5860 | 1.63 | 3595 | (55.860879, -4.3185273) |
| 5 | Gorbals | 6030 | 0.83 | 7265 | (55.851813, -4.2531625) |
| 6 | Shawlands | 7015 | 0.52 | 13490 | (55.8292301, -4.2924584) |
| 7 | Langside | 4425 | 0.46 | 9620 | (55.8209413, -4.276069) |
| 8 | Total | 57952 | 7.7 | 7526 | (45.420063, 12.3751361) |


In [45]:
arrond_new_df.dtypes

District          object
Population        object
Area (km²)        object
Density (/km²)    object
District_Coord    object
dtype: object
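
Every column is of type `object`, meaning the numbers are still stored as strings such as '9,725'. They need converting before any arithmetic; in pandas this could be done with `pd.to_numeric` after removing the thousands separators. The sketch below shows the same idea in plain Python on three rows copied from the scraped table, and uses the result to check that the density column equals population divided by area. The helper `to_number` is hypothetical.

```python
# Rows copied from the scraped table: (district, population, area in km², density per km²)
rows = [
    ("Govanhill", "9,725", "0.86", "11,308"),
    ("Pollokshields", "9,738", "1.59", "6,125"),
    ("Partick", "8,884", "0.85", "10,452"),
]


def to_number(text):
    """Convert a string such as '9,725' or '0.86' to an int or a float."""
    cleaned = text.replace(",", "")
    return float(cleaned) if "." in cleaned else int(cleaned)


for district, population, area, density in rows:
    # Density should be population divided by area, rounded to a whole number
    computed = round(to_number(population) / to_number(area))
    print(district, computed, computed == to_number(density))
```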

In [8]:
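#this cell was executed earlier than the cells above (execution count 8) and repeats the blank-column drop from In [43]; skip it when running top to bottom
arrond_new_df = arrond_df.drop(arrond_df.columns[1],axis=1)
arrond_df.update(arrond_new_df)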

In [46]:
arrond_new_df.head(25)

| Unnamed: 0 | District | Population | Area (km²) | Density (/km²) | District_Coord |
| --- | --- | --- | --- | --- | --- |
| 0 | Govanhill | 9725 | 0.86 | 11308 | (55.8363741, -4.2581531) |
| 1 | Pollokshields | 9738 | 1.59 | 6125 | (55.8422663, -4.2849973) |
| 2 | Partick | 8884 | 0.85 | 10452 | (55.8699211, -4.3094365) |
| 3 | Hillhead | 6275 | 0.96 | 6536 | (55.8752091, -4.293281) |
| 4 | Govan | 5860 | 1.63 | 3595 | (55.860879, -4.3185273) |
| 5 | Gorbals | 6030 | 0.83 | 7265 | (55.851813, -4.2531625) |
| 6 | Shawlands | 7015 | 0.52 | 13490 | (55.8292301, -4.2924584) |
| 7 | Langside | 4425 | 0.46 | 9620 | (55.8209413, -4.276069) |
| 8 | Total | 57952 | 7.7 | 7526 | (45.420063, 12.3751361) |


In [48]:
#split the (latitude, longitude) tuples into two separate numeric columns
arrond_new_df[['Latitude', 'Longitude']] = arrond_new_df['District_Coord'].apply(pd.Series)
arrond_new_df

| Unnamed: 0 | District | Population | Area (km²) | Density (/km²) | District_Coord | Latitude | Longitude |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | Govanhill | 9725 | 0.86 | 11308 | (55.8363741, -4.2581531) | 55.836374 | -4.258153 |
| 1 | Pollokshields | 9738 | 1.59 | 6125 | (55.8422663, -4.2849973) | 55.842266 | -4.284997 |
| 2 | Partick | 8884 | 0.85 | 10452 | (55.8699211, -4.3094365) | 55.869921 | -4.309437 |
| 3 | Hillhead | 6275 | 0.96 | 6536 | (55.8752091, -4.293281) | 55.875209 | -4.293281 |
| 4 | Govan | 5860 | 1.63 | 3595 | (55.860879, -4.3185273) | 55.860879 | -4.318527 |
| 5 | Gorbals | 6030 | 0.83 | 7265 | (55.851813, -4.2531625) | 55.851813 | -4.253163 |
| 6 | Shawlands | 7015 | 0.52 | 13490 | (55.8292301, -4.2924584) | 55.82923 | -4.292458 |
| 7 | Langside | 4425 | 0.46 | 9620 | (55.8209413, -4.276069) | 55.820941 | -4.276069 |
| 8 | Total | 57952 | 7.7 | 7526 | (45.420063, 12.3751361) | 45.420063 | 12.375136 |
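
With latitude and longitude available as numbers, the K-Means step mentioned in the introduction can be illustrated. The sketch below is a plain-Python version of the standard K-Means iteration (Lloyd's algorithm), not the `sklearn.cluster.KMeans` call the notebook imports. Clustering on raw coordinates alone, using Euclidean distance in degrees, two clusters, and Partick and Langside as starting centroids are all assumptions made for illustration; the project itself intends to cluster on venue information. The 'Total' row is left out because it is not a place.

```python
# District coordinates copied from the table above (the 'Total' row is left out)
districts = {
    "Govanhill": (55.8363741, -4.2581531),
    "Pollokshields": (55.8422663, -4.2849973),
    "Partick": (55.8699211, -4.3094365),
    "Hillhead": (55.8752091, -4.293281),
    "Govan": (55.860879, -4.3185273),
    "Gorbals": (55.851813, -4.2531625),
    "Shawlands": (55.8292301, -4.2924584),
    "Langside": (55.8209413, -4.276069),
}


def squared_distance(p, q):
    """Squared Euclidean distance between two (lat, lon) pairs, in squared degrees."""
    return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2


def k_means(points, initial_centroids, max_iter=100):
    """Alternate assignment and centroid update until the labels stop changing."""
    centroids = list(initial_centroids)
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid
        new_labels = [
            min(range(len(centroids)), key=lambda k: squared_distance(p, centroids[k]))
            for p in points
        ]
        if new_labels == labels:
            break
        labels = new_labels
        # Update step: each centroid moves to the mean of its members
        for k in range(len(centroids)):
            members = [p for p, label in zip(points, labels) if label == k]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[k] = (
                    sum(m[0] for m in members) / len(members),
                    sum(m[1] for m in members) / len(members),
                )
    return labels, centroids


# Starting centroids chosen by hand for reproducibility (an assumption)
initial = [districts["Partick"], districts["Langside"]]
labels, centroids = k_means(list(districts.values()), initial)
cluster_of = dict(zip(districts, labels))
print(cluster_of)
```

Because the starting centroids are fixed, the result is reproducible; a library implementation would normally try several random initialisations instead.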


**End of week 4 - part 1**