## Coursera_IBM_Applied-Data-Science-Capstone
#### This notebook contains my work for the Coursera IBM Applied Data Science Capstone, one of the courses in the IBM Data Science Professional Certificate

### Peer-graded Assignment: Capstone Project - The Battle of Neighborhoods
##### (C) Ahmed Tealeb

## Business Density of Shanghai and Tokyo

### Description:

Suppose we have enough capital to open a restaurant but are unsure where to open it. Can we find a data-driven way to assess candidate locations? We selected <b>Shanghai</b> and <b>Tokyo</b> to compare the neighborhoods of the two cities and determine how similar or dissimilar they are: is <b>Shanghai</b> more like <b>Tokyo</b>, or vice versa?

### Background:

<b>Shanghai</b> is one of the four municipalities under the direct administration of the central government of China, the largest city in China by population, and the second most populous city proper in the world, with a population of more than 24 million as of 2017. It is a global financial centre and transport hub, with the world's busiest container port. Located in the Yangtze River Delta, it sits on the south edge of the estuary of the Yangtze in the middle portion of the East China coast. The municipality borders the provinces of Jiangsu and Zhejiang to the north, south and west, and is bounded to the east by the East China Sea.

<b>Tokyo</b>, officially <b>Tokyo Metropolis</b>, one of the 47 prefectures of Japan, has served as the Japanese capital since 1869. As of 2014, the Greater Tokyo Area ranked as the most populous metropolitan area in the world. The urban area houses the seat of the Emperor of Japan, of the Japanese government and of the National Diet. Tokyo forms part of the Kantō region on the southeastern side of Japan's main island, Honshu, and includes the Izu Islands and Ogasawara Islands. Tokyo was formerly named Edo when Shōgun Tokugawa Ieyasu made the city his headquarters in 1603. It became the capital after Emperor Meiji moved his seat to the city from Kyoto in 1868; at that time Edo was renamed Tokyo. Tokyo Metropolis formed in 1943 from the merger of the former Tokyo Prefecture and the city of Tokyo.

### Description of the data:

1. We use a geocoding service (geopy's Nominatim geocoder in the code below) to find each city's coordinates.
2. We use Foursquare venue data to build the samples.
3. We use the DBSCAN algorithm to label the sample matrix built from the Foursquare venue data.
4. We use folium maps for the visualization.

### How to solve the problem: 

### Import the libraries and define the helper functions

In [6]:
# -*- coding: utf-8 -*-
import os,sys
import urllib
import requests 
import json
from urllib.request import urlopen
import pandas as pd
import numpy as np
import matplotlib.cm as cm
import matplotlib.colors as colors
from bs4 import BeautifulSoup
from geopy.geocoders import Nominatim
from sklearn.cluster import KMeans
from sklearn.cluster import DBSCAN
import folium
import math

In [7]:
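# get the location of the city
def getlocation(address):
    # recent geopy versions require an explicit user_agent; the name used here is arbitrary
    geolocator = Nominatim(user_agent='battle_of_neighborhoods')
    location = geolocator.geocode(address)
    # geocode() returns None when the address cannot be resolved
    if location is None:
        raise ValueError('Could not geocode address: {}'.format(address))
    latitude = location.latitude
    longitude = location.longitude
    print('The geographical coordinates of {} are {}, {}.'.format(address, latitude, longitude))
    return latitude,longitude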

In [8]:
# get a dataframe of the venues near a location using the Foursquare API
def getNearbyVenues_new(name, latitudes, longitudes, radius=5000):
    # create the API request URL
    # (CLIENT_ID, CLIENT_SECRET and VERSION are defined in a later cell)
    url = 'https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
        CLIENT_ID, 
        CLIENT_SECRET, 
        VERSION, 
        latitudes, 
        longitudes, 
        radius, 
        500)

    # make the GET request
    results = requests.get(url).json()["response"]['groups'][0]['items']

    # keep only the relevant information for each nearby venue
    venues = [(
        name, 
        v['venue']['name'], 
        v['venue']['location']['lat'], 
        v['venue']['location']['lng'],  
        v['venue']['categories'][0]['name']) for v in results]

    nearby_venues = pd.DataFrame(venues, columns=['Location','Venue','Latitude','Longitude','Category'])
    return nearby_venues
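
The request and parsing steps in `getNearbyVenues_new` can be tried offline. The sketch below is only an illustration under stated assumptions: `build_explore_url` and `parse_venues` are hypothetical helpers (not part of the notebook), the credentials are placeholders, and `sample_payload` is a hand-written stand-in that mimics the response shape the notebook reads (`response -> groups[0] -> items -> venue`), not real Foursquare output.

```python
from urllib.parse import urlencode

# Same endpoint the notebook calls.
EXPLORE_ENDPOINT = "https://api.foursquare.com/v2/venues/explore"


def build_explore_url(client_id, client_secret, version, lat, lng,
                      radius=5000, limit=500):
    """Build the request URL; urlencode escapes every value."""
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "v": version,
        "ll": "{},{}".format(lat, lng),
        "radius": radius,
        "limit": limit,
    }
    return EXPLORE_ENDPOINT + "?" + urlencode(params)


def parse_venues(name, payload):
    """Flatten a response payload into (location, venue, lat, lng, category) rows."""
    rows = []
    for item in payload["response"]["groups"][0]["items"]:
        venue = item["venue"]
        # Guard against venues without a category instead of raising IndexError.
        categories = venue.get("categories") or []
        category = categories[0]["name"] if categories else None
        rows.append((name, venue["name"], venue["location"]["lat"],
                     venue["location"]["lng"], category))
    return rows


# Hand-written stand-in for an API response (assumption, not real output).
sample_payload = {
    "response": {"groups": [{"items": [
        {"venue": {"name": "Yu Garden (豫园)",
                   "location": {"lat": 31.228922, "lng": 121.487982},
                   "categories": [{"name": "Garden"}]}},
    ]}]}
}

url = build_explore_url("YOUR_CLIENT_ID", "YOUR_CLIENT_SECRET", "20180605",
                        31.2253441, 121.4888922)
rows = parse_venues("Shanghai", sample_payload)
print(url)
print(rows)
```

Separating URL construction from parsing makes each step testable without network access; the list of rows can be passed to `pd.DataFrame` as in the notebook.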

In [9]:
# draw cluster-colored venue markers on the global map `map_clusters`
def clusterMap(kclusters,dfs):
    # build one rainbow color per label (kclusters counts the noise label too)
    colors_array = cm.rainbow(np.linspace(0, 1, kclusters))
    rainbow = [colors.rgb2hex(i) for i in colors_array]
    # add markers to the map; DBSCAN labels start at -1 (noise), so shift by one
    # to index the palette from 0 instead of relying on negative indexing
    for lat, lon, poi, cluster in zip(dfs['Latitude'], dfs['Longitude'], dfs['Venue'], dfs['Cluster Labels']):
        label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
        folium.CircleMarker(
            [lat, lon],
            radius=5,
            popup=label,
            color=rainbow[cluster+1],
            fill=True,
            fill_color=rainbow[cluster+1],
            fill_opacity=0.7).add_to(map_clusters) 
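
Because DBSCAN labels start at -1 for noise, looking up a color by list index needs care. As an alternative sketch, the labels can be mapped to colors through a dictionary so that noise always gets its own neutral color. `label_colors` is a hypothetical helper, and it uses the standard-library `colorsys` module rather than the matplotlib colormap from the notebook.

```python
import colorsys


def label_colors(labels, noise_color="#808080"):
    """Map each distinct cluster label to a hex color; noise (-1) gets gray."""
    clusters = sorted(set(labels) - {-1})
    palette = {-1: noise_color}
    for k, label in enumerate(clusters):
        # Spread the hues evenly around the color wheel.
        hue = k / max(len(clusters), 1)
        r, g, b = colorsys.hsv_to_rgb(hue, 1.0, 1.0)
        palette[label] = "#{:02x}{:02x}{:02x}".format(
            round(r * 255), round(g * 255), round(b * 255))
    return palette


palette = label_colors([-1, 0, 1, 0])
print(palette)
```

A marker's color would then be `palette[cluster]`, which stays valid no matter how many clusters DBSCAN finds.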

In [10]:
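# estimate the eps value for the DBSCAN algorithm
def epsilon(data, MinPts):
    m, n = np.shape(data)  # m samples, n dimensions
    xMax = np.max(data, 0)
    xMin = np.min(data, 0)
    # choose eps so that an n-ball of radius eps holds MinPts points on average,
    # assuming the m samples are spread uniformly over their bounding box
    eps = ((np.prod(xMax - xMin) * MinPts * math.gamma(0.5 * n + 1)) / (m * math.sqrt(math.pi ** n))) ** (1.0 / n)
    return eps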
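
One way to read the `epsilon` formula: it solves for the radius at which a ball contains `MinPts` points on average, if the `m` samples were spread uniformly over their bounding box. The sketch below checks that reading numerically with plain Python. `epsilon_from_ranges` and `ball_volume` are hypothetical helpers, and the unit-square extents and point count are made-up inputs.

```python
import math


def epsilon_from_ranges(ranges, m, min_pts):
    """Same heuristic as the notebook's epsilon(), taking the per-dimension
    extents (max - min) and the number of samples m directly."""
    n = len(ranges)
    box_volume = math.prod(ranges)
    return (box_volume * min_pts * math.gamma(0.5 * n + 1)
            / (m * math.sqrt(math.pi ** n))) ** (1.0 / n)


def ball_volume(radius, n):
    """Volume of an n-dimensional ball of the given radius."""
    return math.pi ** (n / 2) / math.gamma(n / 2 + 1) * radius ** n


# Made-up example: 100 points in a unit square, MinPts = 5.
ranges = [1.0, 1.0]
m, min_pts = 100, 5

eps = epsilon_from_ranges(ranges, m, min_pts)
density = m / math.prod(ranges)  # points per unit area
expected_neighbors = density * ball_volume(eps, len(ranges))

print(eps)
print(expected_neighbors)
```

In two dimensions the formula reduces to `eps = sqrt(area * MinPts / (m * pi))`, so the expected neighbor count comes back as `MinPts`.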

In [11]:
# add a blue circle marker for each venue to the given map
def mapMarkers(map_name,dfs_data):  # add markers to map
    for lat, lng, label in zip(dfs_data['Latitude'], dfs_data['Longitude'], dfs_data['Venue']):
        label = folium.Popup(label, parse_html=True)
        folium.CircleMarker(
            [lat, lng],
            radius=5,
            popup=label,
            color='blue',
            fill=True,
            fill_color='#3186cc',
            fill_opacity=0.7).add_to(map_name) 

### Get the location of Shanghai and Tokyo

In [7]:
address_sh = 'Shanghai,CN'
address_tk = 'Tokyo,JP'
#location_sh = getlocation(address_sh)
latitude_sh = 31.2253441
longitude_sh = 121.4888922
#location_tk = getlocation(address_tk)
latitude_tk = 35.6828387
longitude_tk = 139.7594549
print('The geographical coordinates of',address_sh,'are {}, {}.'.format(latitude_sh,longitude_sh))
print('The geographical coordinates of',address_tk,'are {}, {}.'.format(latitude_tk,longitude_tk))

The geographical coordinates of Shanghai,CN are 31.2253441, 121.4888922.
The geographical coordinates of Tokyo,JP are 35.6828387, 139.7594549.


### Make the map to display the two cities

In [8]:
map_all = folium.Map(location=[(latitude_sh+latitude_tk)/2, (longitude_sh+longitude_tk)/2], tiles='Stamen Terrain',zoom_start=5)

folium.Marker(location=[latitude_sh, longitude_sh], popup='Shanghai City').add_to(map_all)
folium.CircleMarker(location=[latitude_sh, longitude_sh], radius=10,
popup='Shanghai City', color='#3186cc',fill_color='#3186cc').add_to(map_all)

folium.Marker(location=[latitude_tk, longitude_tk], popup='Tokyo City').add_to(map_all)
folium.CircleMarker(location=[latitude_tk, longitude_tk], radius=10,
popup='Tokyo City', color='#3186cc',fill_color='#3186cc').add_to(map_all)
map_all

In [9]:
map_all.save('map_all.html')

### Define the Foursquare API credentials

In [5]:
CLIENT_ID = '1DSRC5HBGQXKI2PVYIUZY2EDC2N0QXKLJ32YSJVNLXJNXP12' # your Foursquare ID
CLIENT_SECRET = 'M551JVRZT2OYTPWAGQQTFDZF4UBOHDQTRI4ACE1Y0T14SVIF' # your Foursquare Secret
VERSION = '20180605' # Foursquare API version
QUERY = 'food'
print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)

Your credentials:
CLIENT_ID: 1DSRC5HBGQXKI2PVYIUZY2EDC2N0QXKLJ32YSJVNLXJNXP12
CLIENT_SECRET:M551JVRZT2OYTPWAGQQTFDZF4UBOHDQTRI4ACE1Y0T14SVIF


### Get the venues of Shanghai

In [11]:
dfs_sh = getNearbyVenues_new('Shanghai',latitude_sh,longitude_sh)
dfs_sh.head()

| | Location | Venue | Latitude | Longitude | Category |
|---|---|---|---|---|---|
| 0 | Shanghai | Yu Garden (豫园) | 31.228922 | 121.487982 | Garden |
| 1 | Shanghai | CHAR Bar | 31.228209 | 121.495593 | Hotel Bar |
| 2 | Shanghai | Hotel Indigo Shanghai On The Bund (上海外灘英迪格酒店) | 31.228193 | 121.495571 | Hotel |
| 3 | Shanghai | City of God Temple (城隍庙) | 31.227859 | 121.487536 | Temple |
| 4 | Shanghai | Goodfellas | 31.234878 | 121.48673 | Italian Restaurant |


### Create the map of Shanghai with venues

In [12]:
# create map using latitude and longitude values
map_sh = folium.Map(location=[latitude_sh, longitude_sh], zoom_start=13)
mapMarkers(map_sh,dfs_sh)
map_sh

In [13]:
map_sh.save('map_sh.html')

### Get the location of each venue and use the resulting dataframe as the sample matrix

In [14]:
X_dfs_sh=dfs_sh.drop(['Venue','Location','Category'],axis=1)
X_dfs_sh.head()

| | Latitude | Longitude |
|---|---|---|
| 0 | 31.228922 | 121.487982 |
| 1 | 31.228209 | 121.495593 |
| 2 | 31.228193 | 121.495571 |
| 3 | 31.227859 | 121.487536 |
| 4 | 31.234878 | 121.48673 |
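
The sample matrix holds raw latitude and longitude in degrees, so a Euclidean distance over it (scikit-learn's default metric for `DBSCAN`) is also measured in degrees, and one degree of longitude covers less ground than one degree of latitude away from the equator. As a rough sanity check, the sketch below converts the gap between the first two venues in the table into metres with the haversine formula. `haversine_m` is a hypothetical helper, and the Earth radius is the usual mean-radius approximation.

```python
import math

EARTH_RADIUS_M = 6371000.0  # mean Earth radius in metres (approximation)


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


# First two rows of the Shanghai venue table.
yu_garden = (31.228922, 121.487982)
char_bar = (31.228209, 121.495593)

metres = haversine_m(*yu_garden, *char_bar)
degrees = math.dist(yu_garden, char_bar)  # what a Euclidean metric sees

print(metres)
print(degrees)
```

The two venues are roughly 0.7 km apart, which gives a feel for what an `eps` expressed in degrees means on the ground.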


### Use the DBSCAN model to cluster the venues

In [15]:
# Create the DBSCAN model; a label of -1 marks a venue as noise (not in any cluster)
eps_temp = epsilon(X_dfs_sh,5)
ydbscan_sh = DBSCAN(eps=eps_temp,min_samples=5).fit(X_dfs_sh)
dfs_sh['Cluster Labels'] = ydbscan_sh.labels_
ydbscan_sh.labels_

array([ 0, -1, -1, -1,  0,  0,  0,  0,  0, -1,  0,  0,  0,  0,  0,  0,  0,
        0,  0,  0, -1,  0,  0,  0,  0, -1,  0,  0,  0,  0,  0,  0,  0,  0,
        0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,
        0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0, -1,  0,  0,  0,
        0,  0,  0,  0, -1,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,
        0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0, -1, -1,  0, -1],
      dtype=int64)
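
To see where labels such as `0` and `-1` come from, here is a minimal pure-Python sketch of the DBSCAN idea. It is a teaching illustration, not scikit-learn's implementation. The names `dbscan` and `region_query` and the toy points are assumptions made for this example. As in scikit-learn, a point counts itself when its neighborhood is compared with `min_samples`.

```python
import math

NOISE = -1
UNVISITED = None


def region_query(points, i, eps):
    """Indices of all points within eps of points[i], including i itself."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]


def dbscan(points, eps, min_samples):
    """Return one label per point: 0, 1, ... for clusters and -1 for noise."""
    labels = [UNVISITED] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not UNVISITED:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_samples:
            labels[i] = NOISE  # may later become a border point of a cluster
            continue
        cluster += 1  # points[i] is a core point: start a new cluster
        labels[i] = cluster
        queue = [j for j in neighbors if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins but is not expanded
            if labels[j] is not UNVISITED:
                continue
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_samples:
                queue.extend(j_neighbors)  # j is a core point: keep growing
    return labels


# Toy data: five points packed together plus one far-away outlier.
toy_points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (0.05, 0.05),
              (5.0, 5.0)]
toy_labels = dbscan(toy_points, eps=0.5, min_samples=5)
print(toy_labels)
```

The packed points form cluster `0`, while the outlier has too few neighbors within `eps` and is labeled `-1`, which is the same convention as the label array above.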

### Check the labels

In [16]:
dfs_sh.groupby('Cluster Labels').size().sort_values()

Cluster Labels
-1    11
 0    89
dtype: int64

In [28]:
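# business density = share of venues assigned to a cluster (label != -1), here 89/100
business_density_of_shanghai = float((dfs_sh['Cluster Labels'] != -1).mean())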
business_density_of_shanghai

0.89
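
The density figure is the share of sampled venues that DBSCAN assigned to a cluster instead of labeling as noise. A small sketch of that calculation follows. `business_density` is a hypothetical helper, and the label list is rebuilt from the counts in the `groupby` output above (11 noise points, 89 points in cluster 0).

```python
from collections import Counter


def business_density(labels, noise_label=-1):
    """Fraction of points that belong to any cluster (i.e. are not noise)."""
    counts = Counter(labels)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("labels is empty")
    return (total - counts[noise_label]) / total


# Rebuilt from the groupby counts: 11 noise points and 89 points in cluster 0.
shanghai_labels = [-1] * 11 + [0] * 89
shanghai_density = business_density(shanghai_labels)
print(shanghai_density)
```

Computing the value from the labels avoids retyping the counts by hand and also works when DBSCAN returns more than one cluster.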

### Mark the clusters on the map

In [17]:
map_clusters = folium.Map(location=[latitude_sh, longitude_sh], zoom_start=13)
# set color scheme for the clusters
clusterMap(2,dfs_sh)
map_clusters

In [18]:
map_clusters.save('map_sh_density.html')

### Repeat the actions for Tokyo city.

In [19]:
dfs_tk = getNearbyVenues_new('Tokyo',latitude_tk,longitude_tk)
dfs_tk.head()

| | Location | Venue | Latitude | Longitude | Category |
|---|---|---|---|---|---|
| 0 | Tokyo | Palace Hotel Tokyo (パレスホテル東京) | 35.684644 | 139.761302 | Hotel |
| 1 | Tokyo | Imperial Palace East Garden (皇居東御苑) | 35.685797 | 139.756662 | Garden |
| 2 | Tokyo | KITTE Garden (屋上庭園 KITTEガーデン) | 35.679806 | 139.764872 | Garden |
| 3 | Tokyo | Aman Tokyo (アマン東京) | 35.685236 | 139.765401 | Hotel |
| 4 | Tokyo | Mitsubishi Ichigokan Museum (三菱一号館美術館) | 35.67842 | 139.76326 | Art Museum |


In [31]:
# create map of Tokyo using latitude and longitude values
map_tk = folium.Map(location=[latitude_tk, longitude_tk], zoom_start=13)
mapMarkers(map_tk,dfs_tk)
map_tk

In [21]:
map_tk.save('map_tk.html')

In [22]:
X_dfs_tk=dfs_tk.drop(['Venue','Location','Category'],axis=1)
X_dfs_tk.head()

| | Latitude | Longitude |
|---|---|---|
| 0 | 35.684644 | 139.761302 |
| 1 | 35.685797 | 139.756662 |
| 2 | 35.679806 | 139.764872 |
| 3 | 35.685236 | 139.765401 |
| 4 | 35.67842 | 139.76326 |


In [23]:
# Create the DBSCAN model for Tokyo; a label of -1 marks a venue as noise
eps_temp = epsilon(X_dfs_tk,5)
ydbscan_tk = DBSCAN(eps=eps_temp,min_samples=5).fit(X_dfs_tk)
dfs_tk['Cluster Labels'] = ydbscan_tk.labels_
ydbscan_tk.labels_

array([ 0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,
        0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  1,  0,
        0,  0,  1,  1,  0,  0,  0,  0,  0,  1,  0,  0,  1,  1,  1,  1,  0,
        1,  0,  0,  1,  0,  0,  0,  1,  0, -1, -1,  0,  0,  1,  0,  0,  0,
        0,  0,  0,  1,  0,  0, -1,  0, -1, -1,  0, -1, -1,  0,  0,  0,  0,
        0, -1, -1,  0, -1,  0, -1,  0,  0, -1, -1,  0,  0,  0, -1],
      dtype=int64)

In [24]:
dfs_tk.groupby('Cluster Labels').size().sort_values()

Cluster Labels
 1    13
-1    14
 0    73
dtype: int64

In [30]:
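# business density = share of venues assigned to a cluster (label != -1), here (100-14)/100
business_density_of_tokyo = float((dfs_tk['Cluster Labels'] != -1).mean())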
business_density_of_tokyo

0.86

In [25]:
map_clusters = folium.Map(location=[latitude_tk, longitude_tk], zoom_start=13)
# set color scheme for the clusters
clusterMap(3,dfs_tk)
map_clusters

In [26]:
map_clusters.save('map_tk_density.html')

### Result:

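From the labels, Shanghai's business density (the share of sampled venues that DBSCAN assigns to a cluster rather than labeling as noise) is 0.89, which is higher than Tokyo's business density of 0.86. Based on business density alone, I would choose Shanghai to open a restaurant.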

### Discussion

Business density is only one factor in deciding where to open a restaurant; other factors, such as population density and a consumption index, should also be considered. Because I could not obtain that data, I used only business density to describe the problem.

### Conclusion 

So, based on the Foursquare venue data and using business density as the criterion, I would open the restaurant in Shanghai, China.

### Thank you for completing this notebook!

This notebook was created by [Ahmed Tealeb](https://www.linkedin.com/in/ahmedtealeb/).

This notebook is part of an assignment on **Coursera** called *Applied Data Science Capstone*. 

<hr>
Copyright &copy; 2019