# IBM Data Science Capstone - Neighborhoods
### This notebook will be used for the capstone project in the Data Science Professional Certificate

In [1]:
import pandas as pd
import numpy as np

In [2]:
print('Hello Capstone Project Course!')

Hello Capstone Project Course!


## Introduction

You might be a seasoned traveller who has been to many places. You're a foodie and love theatres. You've visited many cities, and you don't really know where to go next. Wouldn't it be nice to get new travel ideas, to find new cities based on how similar they are to the ones you enjoyed on previous trips?

Or you might be a businessperson who travels around the world to sign deals. Everywhere you go, you need to take prospective clients out and entertain them so you can talk business in a relaxed setting. Wouldn't it be nice to know how similar your next destination is to the cities you already know?

Most of these trips start with landing in a major city, often a national capital or a financial center. The problem we're going to solve comes down to two questions: How similar is one city to the others? And which of the most common venues set these cities apart?

To answer these questions, we're going to compare and cluster capitals and financial centers using the venues returned by the Foursquare API's explore feature.
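As a rough illustration of what comparing cities by their venues can look like, the sketch below builds a venue-category count for each city and scores pairs of cities with cosine similarity. The city names, the categories, and the choice of cosine similarity are all assumptions made for this example; they are not Foursquare results, and they are not necessarily the method used later in this notebook.

```python
import math
from collections import Counter


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty profile is treated as similar to nothing
    return dot / (norm_a * norm_b)


# Invented venue categories, for illustration only (not real Foursquare data)
toy_venues = {
    'City A': ['Park', 'Theater', 'Cafe', 'Cafe'],
    'City B': ['Cafe', 'Theater', 'Park', 'Park'],
    'City C': ['Beach', 'Beach', 'Seafood Restaurant', 'Cafe'],
}
profiles = {city: Counter(cats) for city, cats in toy_venues.items()}

sim_ab = cosine_similarity(profiles['City A'], profiles['City B'])
sim_ac = cosine_similarity(profiles['City A'], profiles['City C'])
print(round(sim_ab, 3), round(sim_ac, 3))  # prints: 0.833 0.333
```

Cities A and B share most of their categories, so they score higher than A and C, which only have cafes in common.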

## Data

The datasets we need for this analysis are lists of the world's capital cities and financial centers. To build them, we will scrape two pages from Wikipedia.

Those pages are "List_of_national_capitals" and "Global_Financial_Centres_Index".
 
Next, we need the venues in every city. For this, we will use the Foursquare API with the maximum radius setting of 100,000 meters. This gives us a 100 km range to explore, covering an area that travellers and businesspeople could reach on a day trip by car or public transport.
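To get a feel for what a 100 km radius covers, the following sketch computes great-circle distances with the haversine formula and checks whether a point falls inside the search radius. The helper names `haversine_km` and `within_radius` are hypothetical, and the Earth is approximated as a sphere with a mean radius of 6371 km. The coordinates are taken from the geocoder and Foursquare outputs shown later in this notebook.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; a spherical approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def within_radius(center, point, radius_m=100000):
    """True if `point` lies within `radius_m` meters of `center`."""
    return haversine_km(*center, *point) * 1000 <= radius_m


hong_kong = (22.350627, 114.1849161)      # geocoded city center
nan_lian_garden = (22.339033, 114.204766)  # a venue returned for Hong Kong
shanghai = (31.2322758, 121.4692071)       # another geocoded city center

print(within_radius(hong_kong, nan_lian_garden))  # prints: True
print(within_radius(hong_kong, shanghai))         # prints: False
```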

To query the Foursquare API, we need the geographical coordinates of every city. We will get these from the Nominatim geocoder in the geopy library.

### Example of the capital cities dataset

In [3]:
url = 'https://en.wikipedia.org/wiki/List_of_national_capitals' 
dfs = pd.read_html(url)
print(dfs[1])

                                             City/Town     Country/Territory  \
0    Abidjan (former capital; still has many govern...           Ivory Coast   
1                              Yamoussoukro (official)           Ivory Coast   
2                                            Abu Dhabi  United Arab Emirates   
3                                                Abuja               Nigeria   
4                                                Accra                 Ghana   
..                                                 ...                   ...   
251                                           Windhoek               Namibia   
252                                            Yaoundé              Cameroon   
253                                   Yaren (de facto)                 Nauru   
254                                            Yerevan               Armenia   
255                                             Zagreb               Croatia   

                                       

In [4]:
dfCap = pd.DataFrame()
city = dfs[1]['City/Town'] 
country = dfs[1]['Country/Territory']  
dfCap['Capital'] = city
dfCap['Country'] = country
print(dfCap.shape)
dfCap.head(10)

(256, 2)


| | Capital | Country |
|---|---|---|
| 0 | Abidjan (former capital; still has many govern... | Ivory Coast |
| 1 | Yamoussoukro (official) | Ivory Coast |
| 2 | Abu Dhabi | United Arab Emirates |
| 3 | Abuja | Nigeria |
| 4 | Accra | Ghana |
| 5 | Adamstown | Pitcairn Islands |
| 6 | Addis Ababa | Ethiopia |
| 7 | Aden (de facto, temporary) | Yemen |
| 8 | Sana'a (de jure) | Yemen |
| 9 | Algiers | Algeria |
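Several capital names carry parenthetical annotations such as "(official)" or "(de facto, temporary)", which would likely confuse a geocoder. One possible cleanup, sketched here with a hypothetical helper `clean_city_name`, is to strip anything in parentheses before looking up coordinates. This is an assumption about how the names could be cleaned, not a step taken from the notebook.

```python
import re


def clean_city_name(raw: str) -> str:
    """Drop parenthetical annotations such as '(official)' and trim whitespace."""
    return re.sub(r'\s*\([^)]*\)', '', raw).strip()


# Sample values copied from the capital cities table above
samples = [
    'Yamoussoukro (official)',
    'Abu Dhabi',
    'Aden (de facto, temporary)',
    "Sana'a (de jure)",
]
cleaned = [clean_city_name(name) for name in samples]
print(cleaned)  # prints: ['Yamoussoukro', 'Abu Dhabi', 'Aden', "Sana'a"]
```

In the notebook, the same function could be applied to the whole column with `dfCap['Capital'].map(clean_city_name)`.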


### Example of the financial centers dataset

In [5]:
url2 = 'https://en.wikipedia.org/wiki/Global_Financial_Centres_Index' 
dff = pd.read_html(url2)
print(dff[1])

    Rank  Change            Centre  Rating  Change.1
0      1     NaN     New York City     770       1.0
1      2     NaN            London     766      24.0
2      3     1.0          Shanghai     748       8.0
3      4     1.0             Tokyo     747       6.0
4      5     1.0         Hong Kong     743       6.0
5      6     1.0         Singapore     742       4.0
6      7     NaN           Beijing     741       7.0
7      8     NaN     San Francisco     738       6.0
8      9     2.0          Shenzhen     732      10.0
9     10     4.0            Zurich     724       5.0
10    11     1.0       Los Angeles     720       3.0
11    12     6.0        Luxembourg     719       4.0
12    13     4.0         Edinburgh     718       2.0
13    14     5.0            Geneva     717      12.0
14    15    10.0            Boston     716       8.0
15    16     3.0         Frankfurt     715       5.0
16    17     5.0             Dubai     714       7.0
17    18     3.0             Paris     713    

In [7]:
dfFin = pd.DataFrame()
center1 = dff[1]['Centre']
center2 = dff[2]['Centre']
# Series.append was removed in pandas 2.0; pd.concat is the supported equivalent
dfFin['Financial Center'] = pd.concat([center1, center2], ignore_index=True)
print(dfFin.shape)
dfFin.tail(10)

(111, 1)


| | Financial Center |
|---|---|
| 101 | Tehran |
| 102 | Kuwait City |
| 103 | Saint Petersburg |
| 104 | Xi'an |
| 105 | Manila |
| 106 | Riyadh |
| 107 | Tianjin |
| 108 | Hangzhou |
| 109 | Dalian |
| 110 | Wuhan |
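Some cities, such as London and Tokyo, are both national capitals and financial centers, so the two lists overlap. A minimal sketch of one way to combine them without duplicates follows. The helper `merge_city_lists` is hypothetical, the short lists are a hand-picked subset for illustration, and exact string matching is assumed, so differently spelled names would not be merged.

```python
def merge_city_lists(capitals, financial_centers):
    """Merge two name lists, keeping first-seen order and recording each city's roles."""
    merged = {}
    for name in capitals:
        merged.setdefault(name, set()).add('capital')
    for name in financial_centers:
        merged.setdefault(name, set()).add('financial center')
    return merged


# Small hand-picked subset of the two datasets, for illustration
capitals = ['Abu Dhabi', 'Abuja', 'Accra', 'Tokyo', 'London']
financial_centers = ['New York City', 'London', 'Shanghai', 'Tokyo']

cities = merge_city_lists(capitals, financial_centers)
for name, roles in cities.items():
    print(name, sorted(roles))
```

Each city appears once in the result, and cities found in both lists carry both roles.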


### Example of geocoders

In [8]:
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

In [36]:
# Create the geocoder once and reuse it for every lookup
geolocator = Nominatim(user_agent="ny_explorer")
for x in range(0, 5):
    address = dfFin.loc[x, 'Financial Center']
    location = geolocator.geocode(address)
    latitude = location.latitude
    longitude = location.longitude
    print('The geographical coordinates of {} are {}, {}.'.format(address, latitude, longitude))

The geographical coordinates of New York City are 40.7127281, -74.0060152.
The geographical coordinates of London are 51.5073219, -0.1276474.
The geographical coordinates of Shanghai are 31.2322758, 121.4692071.
The geographical coordinates of Tokyo are 35.6828387, 139.7594549.
The geographical coordinates of Hong Kong are 22.350627, 114.1849161.
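When geocoding the full list rather than five cities, some lookups may return no result, since Nominatim's `geocode` returns `None` when it finds nothing. The sketch below shows one way to collect coordinates while recording the names that failed. The function `geocode_cities` is hypothetical, and a stand-in geocoder replaces the real network call so the example runs offline.

```python
from collections import namedtuple


def geocode_cities(names, geocode):
    """Look up each name with `geocode`; return (coords, missing)."""
    coords, missing = {}, []
    for name in names:
        location = geocode(name)
        if location is None:  # no match found for this name
            missing.append(name)
            continue
        coords[name] = (location.latitude, location.longitude)
    return coords, missing


# Stand-in geocoder (an assumption for this sketch) using values printed above
Location = namedtuple('Location', ['latitude', 'longitude'])
KNOWN = {
    'London': Location(51.5073219, -0.1276474),
    'Tokyo': Location(35.6828387, 139.7594549),
}


def fake_geocode(name):
    return KNOWN.get(name)


coords, missing = geocode_cities(['London', 'Tokyo', 'Not A Real City'], fake_geocode)
print(coords)
print(missing)
```

In the notebook, `geolocator.geocode` would be passed instead of `fake_geocode`. Because the public Nominatim service asks for at most one request per second, it may be worth wrapping the call in geopy's `RateLimiter` (from `geopy.extra.rate_limiter`) with `min_delay_seconds=1`.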


### Example of Foursquare API explore venues feature

In [31]:
# The code was removed by Watson Studio for sharing.
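The hidden cell above defines `CLIENT_ID` and `CLIENT_SECRET`. For readers reproducing the notebook, one common way to supply such credentials without writing them into the notebook is to read them from environment variables. The sketch below assumes hypothetical variable names `FOURSQUARE_CLIENT_ID` and `FOURSQUARE_CLIENT_SECRET`; it is not the content of the hidden cell.

```python
import os


def load_foursquare_credentials(env=os.environ):
    """Return (client_id, client_secret) read from a mapping of environment variables."""
    try:
        return env['FOURSQUARE_CLIENT_ID'], env['FOURSQUARE_CLIENT_SECRET']
    except KeyError as missing:
        raise RuntimeError(
            'Missing environment variable: {}'.format(missing.args[0])
        ) from None


# Demonstration with placeholder values; real code would call the function
# with no argument so that it reads os.environ.
demo_env = {
    'FOURSQUARE_CLIENT_ID': 'demo-id',
    'FOURSQUARE_CLIENT_SECRET': 'demo-secret',
}
CLIENT_ID, CLIENT_SECRET = load_foursquare_credentials(demo_env)
print(CLIENT_ID)  # prints: demo-id
```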

In [40]:
import json
import requests

# Reuse the last city geocoded in the loop above (Hong Kong)
city_latitude = latitude
city_longitude = longitude
city_name = dfFin.loc[x, 'Financial Center']

print('Latitude and longitude values of {} are {}, {}.'.format(city_name, city_latitude, city_longitude))

VERSION = '20180605'  # Foursquare API version
LIMIT = 5  # maximum number of venues returned by the Foursquare API
radius = 100000  # search radius in meters (100 km)

# Build the request URL; CLIENT_ID and CLIENT_SECRET come from the hidden cell above
url_template = 'https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'
url = url_template.format(CLIENT_ID, CLIENT_SECRET, VERSION, city_latitude, city_longitude, radius, LIMIT)
# Display the URL with the credentials masked so they are not shared with the notebook
print(url_template.format('<CLIENT_ID>', '<CLIENT_SECRET>', VERSION, city_latitude, city_longitude, radius, LIMIT))
results = requests.get(url).json()
results

Latitude and longitude values of Hong Kong are 22.350627, 114.1849161.
https://api.foursquare.com/v2/venues/explore?client_id=<CLIENT_ID>&client_secret=<CLIENT_SECRET>&v=20180605&ll=22.350627,114.1849161&radius=100000&limit=5


{'meta': {'code': 200, 'requestId': '5fbbf06d0cc1d6096f04b210'},
 'response': {'suggestedFilters': {'header': 'Tap to show:',
   'filters': [{'name': 'Open now', 'key': 'openNow'}]},
  'headerLocation': 'Hong Kong',
  'headerFullLocation': 'Hong Kong',
  'headerLocationGranularity': 'city',
  'totalResults': 240,
  'suggestedBounds': {'ne': {'lat': 23.2506279000009,
    'lng': 115.15620562749005},
   'sw': {'lat': 21.4506260999991, 'lng': 113.21362657250994}},
  'groups': [{'type': 'Recommended Places',
    'name': 'recommended',
    'items': [{'reasons': {'count': 0,
       'items': [{'summary': 'This spot is popular',
         'type': 'general',
         'reasonName': 'globalInteractionReason'}]},
      'venue': {'id': '4bb697b3ef159c74493d76f7',
       'name': 'Nan Lian Garden (南蓮園池)',
       'location': {'address': '60 Fung Tak Rd',
        'lat': 22.339033,
        'lng': 114.204766,
        'labeledLatLngs': [{'label': 'display',
          'lat': 22.339033,
          'lng': 114.2