# IBM Coursera Capstone Project

## 1. Business Problem: Subway Business Opportunities
     - Select a few subway POIs (points of interest) located in different areas of Beijing, China.
     - Analyze the venue categories, counts, and distribution around each POI.
     - Use a machine learning algorithm, e.g. k-means, to analyze the venues.
     - Provide meaningful business suggestions based on the model's output (predictions), business types, locations, and so on. For example, it may be better to open a "hotpot" restaurant in Daxing district, but a "convenience store" in the Sihui area.
     - Besides venue information, other important factors affect the final business decision, but they are out of scope for this project:
       -> House prices in the area
       -> Whether the area has a subway station
       -> Population density of the area
     
## 2. Dataset
     - Select a few POIs from Baidu Maps.
     - Get the longitude and latitude of each POI in Baidu's coordinate system.
     - Save each POI's name, longitude, and latitude in a .csv file as the dataset.
     - Retrieve venue information for each POI using the Foursquare API.
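
The dataset file only needs three columns. The following is a minimal sketch of writing and re-reading such a file with the standard library, using two of the stations from this notebook. The in-memory buffer is an assumption standing in for the real .csv file, and the column names follow the dataframe loaded in section 3.1.

```python
import csv
import io

# Two sample POIs taken from the station table in this notebook.
pois = [
    {"Neighborhood": "GuoMao Subway Station", "Longitude": 116.468148, "Latitude": 39.914832},
    {"Neighborhood": "Xierqi Subway Station", "Longitude": 116.312409, "Latitude": 40.059146},
]

# Write the rows as CSV; io.StringIO stands in for a file on disk.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["Neighborhood", "Longitude", "Latitude"])
writer.writeheader()
writer.writerows(pois)

# Read the rows back and convert the coordinates to floats.
buffer.seek(0)
loaded = [
    {
        "Neighborhood": row["Neighborhood"],
        "Longitude": float(row["Longitude"]),
        "Latitude": float(row["Latitude"]),
    }
    for row in csv.DictReader(buffer)
]
print(loaded)
```

Keeping longitude and latitude as separate numeric columns makes the file directly usable by `pd.read_csv`.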

## 3. Project Implementation

### 3.1 Import All Libraries and Packages

In [7]:
import numpy as np # library to handle data in a vectorized manner

import pandas as pd # library for data analysis
pd.set_option("display.max_columns", None)
pd.set_option("display.max_rows", None)

import json # library to handle JSON files

from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

import requests # library to handle requests
from bs4 import BeautifulSoup # library to parse HTML and XML documents

from pandas import json_normalize # transform JSON data into a pandas dataframe (pandas >= 1.0)

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

import folium # map rendering library

print("Libraries imported.")

Libraries imported.


In [8]:
beijing_df = pd.read_csv('/root/Notebooks/coursera/coursera_capstone_project/Geospatial_Beijing_Coordinates.csv')

In [9]:
beijing_df.head(10)

Unnamed: 0,Neighborhood,Longitude,Latitude
0,GuoMao Subway Station,116.468148,39.914832
1,TuanJieHu Subway Station,116.468363,39.93951
2,WangJing Subway Station,116.474634,40.004785
3,Lishuiqiao South Subway Station,116.420987,40.048115
4,Xierqi Subway Station,116.312409,40.059146
5,Wukesong Subway Station,116.280681,39.91385
6,Fengtai Subway Station,116.311618,39.855966
7,Daxing Airport Subway Station,116.423278,39.51862
8,Yizhuang Bridge Subway Station,116.486949,39.808647
9,Liangxian Subway Station,116.163847,39.748704


In [10]:
beijing_df.shape

(10, 3)

### 3.2 Get the latitude and longitude of Beijing

In [11]:
address = 'Beijing,China'
geolocator = Nominatim(user_agent="bj_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Beijing are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Beijing are 39.9020803, 116.7185213.
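
The geocoded longitude (116.7185213) lies east of every station in the dataset, whose largest longitude is 116.486949. As an alternative map center, one could use the centroid of the stations themselves. The sketch below is an assumption about how that might be done, not part of the original workflow; `stations` and `centroid` are illustrative names, and the coordinates are copied from the station table.

```python
from statistics import mean

# (latitude, longitude) pairs copied from the station table in section 3.1.
stations = [
    (39.914832, 116.468148),  # GuoMao
    (39.93951, 116.468363),   # TuanJieHu
    (40.004785, 116.474634),  # WangJing
    (40.048115, 116.420987),  # Lishuiqiao South
    (40.059146, 116.312409),  # Xierqi
    (39.91385, 116.280681),   # Wukesong
    (39.855966, 116.311618),  # Fengtai
    (39.51862, 116.423278),   # Daxing Airport
    (39.808647, 116.486949),  # Yizhuang Bridge
    (39.748704, 116.163847),  # Liangxian
]


def centroid(points):
    """Return the arithmetic mean latitude and longitude of the points."""
    lats, lons = zip(*points)
    return mean(lats), mean(lons)


center_lat, center_lon = centroid(stations)
print(round(center_lat, 4), round(center_lon, 4))
```

A plain average is adequate here because all points lie within one city; it would not be appropriate for points spread across large distances.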


### 3.3 Create a map of Beijing using folium

In [6]:
map_beijing = folium.Map(location=[latitude, longitude], zoom_start=10)

# add markers to map
for lat, lng, neighborhood in zip(beijing_df['Latitude'], beijing_df['Longitude'],beijing_df['Neighborhood']):
    label = '{}, {}'.format(neighborhood, 'Beijing')
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7).add_to(map_beijing)  
    
map_beijing 

### 3.4 Explore the subway stations by using Foursquare APIs

In [12]:
CLIENT_ID = 'PVBTYHASBMH1NGZUITEIGVZO3FUTBPBL1WSYMA010JUN0QXV' # your Foursquare ID
CLIENT_SECRET = '3ZNVZWN4Y4C31WHVD2KOUNVQWJPGSNSR2QJU1XY2KY51W4AQ' # your Foursquare Secret
VERSION = '20180605' # Foursquare API version
LIMIT = 100 # A default Foursquare API limit value

print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)

Your credentials:
CLIENT_ID: PVBTYHASBMH1NGZUITEIGVZO3FUTBPBL1WSYMA010JUN0QXV
CLIENT_SECRET:3ZNVZWN4Y4C31WHVD2KOUNVQWJPGSNSR2QJU1XY2KY51W4AQ


In [14]:
radius = 5000
LIMIT = 200

In [18]:

def getNearbyVenues(names, latitudes, longitudes, radius=5000):
    """Return a DataFrame of venues found within `radius` meters of each named location."""
    venues_list = []
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return nearby_venues
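
The request URL can also be assembled from a parameter dictionary, which avoids hand-written `&` separators and takes care of escaping. The sketch below uses only the standard library; `build_explore_url` is a hypothetical helper, and the credentials passed to it are placeholders, not real keys.

```python
from urllib.parse import urlencode

BASE_URL = "https://api.foursquare.com/v2/venues/explore"


def build_explore_url(client_id, client_secret, version, lat, lng, radius, limit):
    """Build the explore URL with URL-encoded query parameters."""
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "v": version,
        "ll": "{},{}".format(lat, lng),
        "radius": radius,
        "limit": limit,
    }
    return BASE_URL + "?" + urlencode(params)


# Placeholder credentials (hypothetical values for illustration only).
url = build_explore_url("MY_ID", "MY_SECRET", "20180605", 39.914832, 116.468148, 5000, 100)
print(url)
```

Note that `urlencode` escapes the comma in the `ll` value as `%2C`, which is the standard encoding for that character in a query string.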

In [19]:
# fetch nearby venues for every subway station in the dataset
beijing_venues = getNearbyVenues(names=beijing_df['Neighborhood'],
                                 latitudes=beijing_df['Latitude'],
                                 longitudes=beijing_df['Longitude'])

GuoMao Subway Station
TuanJieHu Subway Station
WangJing Subway Station
Lishuiqiao South Subway Station
Xierqi Subway Station
Wukesong Subway Station
Fengtai Subway Station
Daxing Airport Subway Station
Yizhuang Bridge Subway Station
Liangxian Subway Station


In [25]:
print(beijing_venues.shape)
beijing_venues.head()

(522, 7)


Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,GuoMao Subway Station,39.914832,116.468148,Din Tai Fung (鼎泰丰),39.90921,116.473768,Dumpling Restaurant
1,GuoMao Subway Station,39.914832,116.468148,Rosewood Beijing (北京瑰丽酒店),39.918612,116.454916,Hotel
2,GuoMao Subway Station,39.914832,116.468148,Hotel Jen Beijing (新国贸饭店),39.911456,116.453576,Hotel
3,GuoMao Subway Station,39.914832,116.468148,Apple China Central Mall (Apple 华贸购物中心),39.909304,116.479679,Electronics Store
4,GuoMao Subway Station,39.914832,116.468148,Shangri-La China World Summit Wing (国贸大酒店),39.910893,116.452236,Hotel
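
As a sanity check, the distance between a station and each returned venue can be compared with the 5000 m search radius. The sketch below uses the haversine formula with a mean Earth radius of 6,371 km (an assumption of a spherical Earth); the coordinates are the five rows shown above, and `haversine_m` is an illustrative helper name.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_M = 6371000  # mean Earth radius in meters (spherical approximation)


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two (lat, lon) points in degrees."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = radians(lat2 - lat1)
    dlambda = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlambda / 2) ** 2
    return 2 * EARTH_RADIUS_M * asin(sqrt(a))


station = (39.914832, 116.468148)  # GuoMao Subway Station
venues = {
    "Din Tai Fung": (39.90921, 116.473768),
    "Rosewood Beijing": (39.918612, 116.454916),
    "Hotel Jen Beijing": (39.911456, 116.453576),
    "Apple China Central Mall": (39.909304, 116.479679),
    "Shangri-La China World Summit Wing": (39.910893, 116.452236),
}

distances = {name: haversine_m(*station, *coords) for name, coords in venues.items()}
for name, meters in distances.items():
    print("{}: {:.0f} m".format(name, meters))
```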


In [26]:
beijing_venues.groupby('Neighborhood').count()

Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Daxing Airport Subway Station,6,6,6,6,6,6
Fengtai Subway Station,35,35,35,35,35,35
GuoMao Subway Station,100,100,100,100,100,100
Liangxian Subway Station,10,10,10,10,10,10
Lishuiqiao South Subway Station,37,37,37,37,37,37
TuanJieHu Subway Station,100,100,100,100,100,100
WangJing Subway Station,70,70,70,70,70,70
Wukesong Subway Station,76,76,76,76,76,76
Xierqi Subway Station,63,63,63,63,63,63
Yizhuang Bridge Subway Station,25,25,25,25,25,25
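
The venue counts vary widely between stations, so category frequencies for sparsely covered stations rest on very few observations. The sketch below flags such stations; the counts are copied from the table above, while `MIN_VENUES = 30` is an arbitrary threshold chosen for illustration, not a value from the original analysis.

```python
# Venue counts per station, copied from the groupby output above.
venue_counts = {
    "Daxing Airport Subway Station": 6,
    "Fengtai Subway Station": 35,
    "GuoMao Subway Station": 100,
    "Liangxian Subway Station": 10,
    "Lishuiqiao South Subway Station": 37,
    "TuanJieHu Subway Station": 100,
    "WangJing Subway Station": 70,
    "Wukesong Subway Station": 76,
    "Xierqi Subway Station": 63,
    "Yizhuang Bridge Subway Station": 25,
}

MIN_VENUES = 30  # hypothetical cut-off for "enough data"

total = sum(venue_counts.values())
sparse = sorted(name for name, n in venue_counts.items() if n < MIN_VENUES)

print("Total venues:", total)
print("Stations with fewer than {} venues: {}".format(MIN_VENUES, sparse))
```

The total matches the 522 rows of `beijing_venues`, which confirms that every venue row belongs to exactly one station.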


### 3.5 Analyze Each Subway Station Neighborhood in Beijing

In [27]:
# one hot encoding
beijing_onehot = pd.get_dummies(beijing_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
beijing_onehot['Neighborhood'] = beijing_venues['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [beijing_onehot.columns[-1]] + list(beijing_onehot.columns[:-1])
beijing_onehot = beijing_onehot[fixed_columns]

beijing_onehot.head(10)

Unnamed: 0,Neighborhood,Airport,Airport Service,American Restaurant,Arepa Restaurant,Argentinian Restaurant,Art Gallery,Art Museum,Asian Restaurant,Athletics & Sports,BBQ Joint,Bakery,Bar,Beijing Restaurant,Bistro,Bookstore,Brewery,Bubble Tea Shop,Buffet,Bus Station,Bus Stop,Café,Camera Store,Cantonese Restaurant,Chinese Restaurant,Clothing Store,Cocktail Bar,Coffee Shop,Comfort Food Restaurant,Department Store,Dessert Shop,Dumpling Restaurant,Eastern European Restaurant,Electronics Store,Farmers Market,Fast Food Restaurant,Flower Shop,French Restaurant,Fruit & Vegetable Store,Furniture / Home Store,Gastropub,German Restaurant,Gift Shop,Go Kart Track,Golf Course,Gourmet Shop,Grocery Store,Gym,Gym / Fitness Center,Hainan Restaurant,Herbs & Spices Store,History Museum,Hong Kong Restaurant,Hot Spring,Hotel,Hotel Bar,Hotpot Restaurant,Hubei Restaurant,Hunan Restaurant,Indie Movie Theater,Italian Restaurant,Japanese Curry Restaurant,Japanese Restaurant,Karaoke Bar,Korean Restaurant,Massage Studio,Mediterranean Restaurant,Metro Station,Mexican Restaurant,Middle Eastern Restaurant,Mongolian Restaurant,Monument / Landmark,Movie Theater,Multiplex,Museum,New American Restaurant,Noodle House,Outdoor Sculpture,Park,Peking Duck Restaurant,Pizza Place,Planetarium,Pub,Public Art,Restaurant,Sandwich Place,Shanxi Restaurant,Shopping Mall,Shopping Plaza,Skating Rink,South American Restaurant,Southern / Soul Food Restaurant,Spanish Restaurant,Sporting Goods Shop,Stadium,Supermarket,Sushi Restaurant,Szechuan Restaurant,Thai Restaurant,Theater,Thrift / Vintage Store,Toll Plaza,Train Station,Vietnamese Restaurant,Warehouse Store,Water Park,Women's Store,Yunnan Restaurant
0,GuoMao Subway Station,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,GuoMao Subway Station,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,GuoMao Subway Station,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3,GuoMao Subway Station,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,GuoMao Subway Station,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
5,GuoMao Subway Station,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
6,GuoMao Subway Station,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
7,GuoMao Subway Station,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
8,GuoMao Subway Station,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
9,GuoMao Subway Station,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0


In [28]:
beijing_grouped = beijing_onehot.groupby('Neighborhood').mean().reset_index()

print(beijing_grouped.shape)
beijing_grouped.head(10)

(10, 108)


Unnamed: 0,Neighborhood,Airport,Airport Service,American Restaurant,Arepa Restaurant,Argentinian Restaurant,Art Gallery,Art Museum,Asian Restaurant,Athletics & Sports,BBQ Joint,Bakery,Bar,Beijing Restaurant,Bistro,Bookstore,Brewery,Bubble Tea Shop,Buffet,Bus Station,Bus Stop,Café,Camera Store,Cantonese Restaurant,Chinese Restaurant,Clothing Store,Cocktail Bar,Coffee Shop,Comfort Food Restaurant,Department Store,Dessert Shop,Dumpling Restaurant,Eastern European Restaurant,Electronics Store,Farmers Market,Fast Food Restaurant,Flower Shop,French Restaurant,Fruit & Vegetable Store,Furniture / Home Store,Gastropub,German Restaurant,Gift Shop,Go Kart Track,Golf Course,Gourmet Shop,Grocery Store,Gym,Gym / Fitness Center,Hainan Restaurant,Herbs & Spices Store,History Museum,Hong Kong Restaurant,Hot Spring,Hotel,Hotel Bar,Hotpot Restaurant,Hubei Restaurant,Hunan Restaurant,Indie Movie Theater,Italian Restaurant,Japanese Curry Restaurant,Japanese Restaurant,Karaoke Bar,Korean Restaurant,Massage Studio,Mediterranean Restaurant,Metro Station,Mexican Restaurant,Middle Eastern Restaurant,Mongolian Restaurant,Monument / Landmark,Movie Theater,Multiplex,Museum,New American Restaurant,Noodle House,Outdoor Sculpture,Park,Peking Duck Restaurant,Pizza Place,Planetarium,Pub,Public Art,Restaurant,Sandwich Place,Shanxi Restaurant,Shopping Mall,Shopping Plaza,Skating Rink,South American Restaurant,Southern / Soul Food Restaurant,Spanish Restaurant,Sporting Goods Shop,Stadium,Supermarket,Sushi Restaurant,Szechuan Restaurant,Thai Restaurant,Theater,Thrift / Vintage Store,Toll Plaza,Train Station,Vietnamese Restaurant,Warehouse Store,Water Park,Women's Store,Yunnan Restaurant
0,Daxing Airport Subway Station,0.166667,0.166667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.166667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.166667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.166667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.166667,0.0
1,Fengtai Subway Station,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.028571,0.0,0.028571,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.028571,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.028571,0.0,0.0,0.0,0.085714,0.0,0.057143,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.085714,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.085714,0.0,0.0,0.0,0.0,0.028571,0.0,0.114286,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.028571,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,GuoMao Subway Station,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.01,0.0,0.0,0.01,0.02,0.01,0.01,0.01,0.04,0.0,0.0,0.0,0.0,0.06,0.0,0.01,0.03,0.0,0.03,0.03,0.01,0.01,0.01,0.03,0.0,0.01,0.01,0.0,0.0,0.01,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.01,0.0,0.0,0.0,0.0,0.18,0.01,0.0,0.01,0.0,0.0,0.02,0.01,0.03,0.01,0.0,0.02,0.0,0.0,0.03,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.03,0.03,0.02,0.0,0.01,0.0,0.01,0.0,0.0,0.07,0.02,0.01,0.0,0.01,0.01,0.0,0.0,0.0,0.01,0.0,0.0,0.01,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0
3,Liangxian Subway Station,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.4,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.1,0.0,0.0,0.0,0.0,0.0,0.0,0.1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.1,0.0,0.0,0.0,0.0,0.0
4,Lishuiqiao South Subway Station,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.027027,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.081081,0.027027,0.0,0.189189,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.135135,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.027027,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.081081,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.027027,0.0,0.0,0.027027,0.0,0.0,0.0,0.0,0.027027,0.0,0.0,0.0,0.0,0.0,0.108108,0.0,0.054054,0.0,0.0,0.0,0.0,0.0,0.0,0.081081,0.0,0.0,0.0,0.0,0.0,0.054054,0.0,0.027027,0.0,0.0,0.0,0.0,0.0,0.027027,0.0,0.0,0.0,0.0,0.0,0.0
5,TuanJieHu Subway Station,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.01,0.0,0.0,0.01,0.02,0.01,0.01,0.02,0.05,0.0,0.0,0.0,0.0,0.03,0.0,0.0,0.02,0.0,0.01,0.02,0.0,0.0,0.0,0.03,0.0,0.02,0.01,0.0,0.01,0.02,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.01,0.0,0.02,0.01,0.0,0.0,0.0,0.01,0.18,0.0,0.0,0.01,0.0,0.0,0.04,0.01,0.07,0.01,0.0,0.03,0.0,0.0,0.03,0.01,0.0,0.0,0.02,0.0,0.0,0.0,0.01,0.0,0.03,0.01,0.02,0.0,0.01,0.0,0.0,0.0,0.01,0.06,0.01,0.0,0.01,0.01,0.01,0.0,0.0,0.0,0.01,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
6,WangJing Subway Station,0.0,0.0,0.042857,0.0,0.014286,0.042857,0.028571,0.0,0.0,0.014286,0.028571,0.014286,0.0,0.0,0.014286,0.0,0.0,0.014286,0.0,0.0,0.085714,0.0,0.028571,0.085714,0.0,0.0,0.028571,0.0,0.014286,0.0,0.0,0.0,0.0,0.0,0.014286,0.0,0.014286,0.014286,0.014286,0.0,0.0,0.014286,0.014286,0.0,0.014286,0.014286,0.014286,0.0,0.0,0.0,0.0,0.014286,0.0,0.071429,0.014286,0.028571,0.0,0.014286,0.0,0.042857,0.0,0.042857,0.0,0.0,0.0,0.0,0.0,0.0,0.014286,0.0,0.0,0.0,0.0,0.0,0.028571,0.0,0.014286,0.0,0.0,0.0,0.0,0.0,0.028571,0.014286,0.014286,0.0,0.028571,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.014286,0.0,0.0,0.0,0.0,0.014286,0.0,0.0,0.0,0.0,0.0,0.014286
7,Wukesong Subway Station,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.026316,0.0,0.0,0.0,0.0,0.013158,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.013158,0.0,0.052632,0.013158,0.0,0.157895,0.0,0.0,0.0,0.0,0.013158,0.0,0.0,0.184211,0.0,0.013158,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.013158,0.0,0.0,0.026316,0.0,0.0,0.144737,0.0,0.026316,0.0,0.0,0.013158,0.0,0.0,0.0,0.013158,0.0,0.0,0.0,0.013158,0.0,0.0,0.0,0.013158,0.0,0.0,0.013158,0.0,0.026316,0.0,0.039474,0.013158,0.052632,0.013158,0.0,0.0,0.0,0.0,0.0,0.065789,0.0,0.0,0.0,0.0,0.0,0.0,0.013158,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.013158,0.0,0.0
8,Xierqi Subway Station,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.031746,0.0,0.0,0.015873,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.015873,0.015873,0.0,0.031746,0.063492,0.015873,0.0,0.174603,0.0,0.015873,0.0,0.0,0.0,0.0,0.0,0.238095,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.079365,0.0,0.015873,0.0,0.015873,0.0,0.0,0.0,0.015873,0.0,0.0,0.0,0.0,0.015873,0.0,0.0,0.015873,0.0,0.015873,0.031746,0.0,0.0,0.015873,0.0,0.015873,0.0,0.031746,0.0,0.0,0.0,0.0,0.015873,0.0,0.063492,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.015873,0.0,0.0,0.015873,0.0,0.0,0.0,0.0,0.0,0.0,0.0
9,Yizhuang Bridge Subway Station,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.04,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.04,0.0,0.0,0.04,0.0,0.0,0.04,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.08,0.0,0.0,0.0,0.04,0.0,0.0,0.0,0.0,0.04,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.16,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.04,0.16,0.0,0.0,0.0,0.0,0.04,0.0,0.0,0.0,0.0,0.0,0.04,0.0,0.0,0.0,0.0,0.0,0.0,0.08,0.0,0.04,0.04,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.04,0.0,0.0,0.0,0.0,0.0,0.0,0.04,0.0,0.0,0.0
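
Taking the mean of one-hot columns within a group is equivalent to computing each category's share of that station's venues. The sketch below reproduces the Daxing Airport row without pandas, assuming the six categories listed for that station in the table above (one venue each).

```python
from collections import Counter

# Categories of the six venues returned for Daxing Airport Subway Station.
categories = [
    "Airport",
    "Airport Service",
    "Bubble Tea Shop",
    "Clothing Store",
    "Noodle House",
    "Women's Store",
]

counts = Counter(categories)
total = sum(counts.values())

# Share of venues per category; this equals the mean of the one-hot column.
frequencies = {category: count / total for category, count in counts.items()}

for category, freq in sorted(frequencies.items()):
    print("{}: {:.6f}".format(category, freq))
```

Each category accounts for one venue in six, which corresponds to the 0.166667 entries in the first row of the grouped table.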


In [30]:
num_top_venues = 10

for hood in beijing_grouped['Neighborhood']:
    print("----"+hood+"----")
    temp = beijing_grouped[beijing_grouped['Neighborhood'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----Daxing Airport Subway Station----
                     venue  freq
0                  Airport  0.17
1          Bubble Tea Shop  0.17
2            Women's Store  0.17
3          Airport Service  0.17
4           Clothing Store  0.17
5             Noodle House  0.17
6   Peking Duck Restaurant  0.00
7                     Park  0.00
8        Outdoor Sculpture  0.00
9  New American Restaurant  0.00


----Fengtai Subway Station----
                  venue  freq
0  Fast Food Restaurant  0.20
1           Coffee Shop  0.20
2         Shopping Mall  0.11
3                 Hotel  0.09
4           Pizza Place  0.09
5         Metro Station  0.09
6     Hotpot Restaurant  0.06
7               Theater  0.03
8           Bus Station  0.03
9                Bakery  0.03


----GuoMao Subway Station----
                 venue  freq
0                Hotel  0.18
1        Shopping Mall  0.07
2                 Café  0.06
3              Brewery  0.04
4                 Park  0.03
5  Dumpling Restaurant  0.03
6

### 3.6 Put the above data into a new dataframe

In [31]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]
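
When several categories share the same frequency, the order in which a sort returns them is not guaranteed unless a tie-breaker is specified. The sketch below shows one possible deterministic ranking that breaks ties alphabetically. `top_categories` is a hypothetical helper, not the function used in this notebook, and the frequencies are the rounded Fengtai values printed earlier.

```python
def top_categories(freqs, n):
    """Return the n most frequent categories; ties are broken alphabetically."""
    ranked = sorted(freqs.items(), key=lambda item: (-item[1], item[0]))
    return [name for name, _ in ranked[:n]]


# Rounded frequencies for Fengtai Subway Station, from the printout in section 3.5.
fengtai = {
    "Fast Food Restaurant": 0.20,
    "Coffee Shop": 0.20,
    "Shopping Mall": 0.11,
    "Hotel": 0.09,
    "Pizza Place": 0.09,
    "Metro Station": 0.09,
    "Hotpot Restaurant": 0.06,
}

top3 = top_categories(fengtai, 3)
print(top3)
```

Sorting on the pair `(-frequency, name)` keeps the highest frequencies first while making the result reproducible across runs.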

In [32]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = beijing_grouped['Neighborhood']

for ind in np.arange(beijing_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(beijing_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted.head(10)

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Daxing Airport Subway Station,Airport,Bubble Tea Shop,Noodle House,Clothing Store,Women's Store,Airport Service,Dessert Shop,Dumpling Restaurant,Eastern European Restaurant,Electronics Store
1,Fengtai Subway Station,Coffee Shop,Fast Food Restaurant,Shopping Mall,Metro Station,Pizza Place,Hotel,Hotpot Restaurant,Bus Station,Theater,Athletics & Sports
2,GuoMao Subway Station,Hotel,Shopping Mall,Café,Brewery,Japanese Restaurant,Park,Peking Duck Restaurant,Dumpling Restaurant,Chinese Restaurant,Mexican Restaurant
3,Liangxian Subway Station,Metro Station,Hotel,Shopping Mall,Fast Food Restaurant,Coffee Shop,Pizza Place,Train Station,Argentinian Restaurant,Fruit & Vegetable Store,Dessert Shop
4,Lishuiqiao South Subway Station,Coffee Shop,Fast Food Restaurant,Park,Hotpot Restaurant,Shopping Mall,Chinese Restaurant,Pizza Place,Sporting Goods Shop,Go Kart Track,Toll Plaza
5,TuanJieHu Subway Station,Hotel,Japanese Restaurant,Shopping Mall,Brewery,Italian Restaurant,Park,Café,Mexican Restaurant,Massage Studio,Dumpling Restaurant
6,WangJing Subway Station,Café,Chinese Restaurant,Hotel,Italian Restaurant,Japanese Restaurant,American Restaurant,Art Gallery,Art Museum,Cantonese Restaurant,Shopping Mall
7,Wukesong Subway Station,Fast Food Restaurant,Coffee Shop,Hotel,Shopping Mall,Pizza Place,Chinese Restaurant,Park,History Museum,Asian Restaurant,Hotpot Restaurant
8,Xierqi Subway Station,Fast Food Restaurant,Coffee Shop,Hotel,Shopping Mall,Chinese Restaurant,Asian Restaurant,Multiplex,Cantonese Restaurant,Pizza Place,Clothing Store
9,Yizhuang Bridge Subway Station,Hotel,Metro Station,Fast Food Restaurant,Sandwich Place,Beijing Restaurant,Golf Course,Furniture / Home Store,Mediterranean Restaurant,Movie Theater,Coffee Shop
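
The column labels above rely on a three-element suffix list plus a "th" fallback, which is correct for 1 to 10 but would yield labels such as "21th" for larger values. A more general suffix rule could look like the sketch below; `ordinal` is a hypothetical helper and is not part of the original notebook.

```python
def ordinal(n):
    """Return n with its English ordinal suffix, e.g. 1 -> '1st', 11 -> '11th'."""
    if 10 <= n % 100 <= 20:
        suffix = "th"  # 11th, 12th, 13th and the other teens
    else:
        suffix = {1: "st", 2: "nd", 3: "rd"}.get(n % 10, "th")
    return "{}{}".format(n, suffix)


# Column names equivalent to those used for the top-10 table.
columns = ["Neighborhood"] + [
    "{} Most Common Venue".format(ordinal(i)) for i in range(1, 11)
]
print(columns)
```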


### 3.7 Use k-means to group the Beijing subway station areas into 5 clusters

In [33]:
kclusters = 5
beijing_grouped_cluster = beijing_grouped.drop('Neighborhood', axis=1)
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(beijing_grouped_cluster)
kmeans.labels_[0:20] 

array([1, 0, 4, 3, 0, 4, 4, 0, 0, 2], dtype=int32)
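
The labels follow the row order of `beijing_grouped`, which is alphabetical by station name. To make the array easier to read, the sketch below groups the station names by label; the names and labels are copied from this notebook, and the shortened station names are for brevity only.

```python
from collections import defaultdict

# Station order as in beijing_grouped (alphabetical), with the labels printed above.
stations = [
    "Daxing Airport", "Fengtai", "GuoMao", "Liangxian", "Lishuiqiao South",
    "TuanJieHu", "WangJing", "Wukesong", "Xierqi", "Yizhuang Bridge",
]
labels = [1, 0, 4, 3, 0, 4, 4, 0, 0, 2]

# Collect the stations that share each cluster label.
clusters = defaultdict(list)
for station, label in zip(stations, labels):
    clusters[label].append(station)

for label in sorted(clusters):
    print(label, clusters[label])
```

With only ten stations and five clusters, three of the clusters contain a single station, so the grouping should be read as exploratory.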

In [34]:
# add clustering labels
neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

beijing_merged = beijing_df

# join beijing_df with neighborhoods_venues_sorted to add the cluster label and top venues for each neighborhood
beijing_merged = beijing_merged.join(neighborhoods_venues_sorted.set_index('Neighborhood'), on='Neighborhood')
print(beijing_merged.shape)
beijing_merged.head(10)

(10, 14)


Unnamed: 0,Neighborhood,Longitude,Latitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,GuoMao Subway Station,116.468148,39.914832,4,Hotel,Shopping Mall,Café,Brewery,Japanese Restaurant,Park,Peking Duck Restaurant,Dumpling Restaurant,Chinese Restaurant,Mexican Restaurant
1,TuanJieHu Subway Station,116.468363,39.93951,4,Hotel,Japanese Restaurant,Shopping Mall,Brewery,Italian Restaurant,Park,Café,Mexican Restaurant,Massage Studio,Dumpling Restaurant
2,WangJing Subway Station,116.474634,40.004785,4,Café,Chinese Restaurant,Hotel,Italian Restaurant,Japanese Restaurant,American Restaurant,Art Gallery,Art Museum,Cantonese Restaurant,Shopping Mall
3,Lishuiqiao South Subway Station,116.420987,40.048115,0,Coffee Shop,Fast Food Restaurant,Park,Hotpot Restaurant,Shopping Mall,Chinese Restaurant,Pizza Place,Sporting Goods Shop,Go Kart Track,Toll Plaza
4,Xierqi Subway Station,116.312409,40.059146,0,Fast Food Restaurant,Coffee Shop,Hotel,Shopping Mall,Chinese Restaurant,Asian Restaurant,Multiplex,Cantonese Restaurant,Pizza Place,Clothing Store
5,Wukesong Subway Station,116.280681,39.91385,0,Fast Food Restaurant,Coffee Shop,Hotel,Shopping Mall,Pizza Place,Chinese Restaurant,Park,History Museum,Asian Restaurant,Hotpot Restaurant
6,Fengtai Subway Station,116.311618,39.855966,0,Coffee Shop,Fast Food Restaurant,Shopping Mall,Metro Station,Pizza Place,Hotel,Hotpot Restaurant,Bus Station,Theater,Athletics & Sports
7,Daxing Airport Subway Station,116.423278,39.51862,1,Airport,Bubble Tea Shop,Noodle House,Clothing Store,Women's Store,Airport Service,Dessert Shop,Dumpling Restaurant,Eastern European Restaurant,Electronics Store
8,Yizhuang Bridge Subway Station,116.486949,39.808647,2,Hotel,Metro Station,Fast Food Restaurant,Sandwich Place,Beijing Restaurant,Golf Course,Furniture / Home Store,Mediterranean Restaurant,Movie Theater,Coffee Shop
9,Liangxian Subway Station,116.163847,39.748704,3,Metro Station,Hotel,Shopping Mall,Fast Food Restaurant,Coffee Shop,Pizza Place,Train Station,Argentinian Restaurant,Fruit & Vegetable Store,Dessert Shop


In [35]:
print(beijing_merged.shape)
beijing_merged=beijing_merged.dropna()
print(beijing_merged.shape)
beijing_merged.head(10)

(10, 14)
(10, 14)


Unnamed: 0,Neighborhood,Longitude,Latitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,GuoMao Subway Station,116.468148,39.914832,4,Hotel,Shopping Mall,Café,Brewery,Japanese Restaurant,Park,Peking Duck Restaurant,Dumpling Restaurant,Chinese Restaurant,Mexican Restaurant
1,TuanJieHu Subway Station,116.468363,39.93951,4,Hotel,Japanese Restaurant,Shopping Mall,Brewery,Italian Restaurant,Park,Café,Mexican Restaurant,Massage Studio,Dumpling Restaurant
2,WangJing Subway Station,116.474634,40.004785,4,Café,Chinese Restaurant,Hotel,Italian Restaurant,Japanese Restaurant,American Restaurant,Art Gallery,Art Museum,Cantonese Restaurant,Shopping Mall
3,Lishuiqiao South Subway Station,116.420987,40.048115,0,Coffee Shop,Fast Food Restaurant,Park,Hotpot Restaurant,Shopping Mall,Chinese Restaurant,Pizza Place,Sporting Goods Shop,Go Kart Track,Toll Plaza
4,Xierqi Subway Station,116.312409,40.059146,0,Fast Food Restaurant,Coffee Shop,Hotel,Shopping Mall,Chinese Restaurant,Asian Restaurant,Multiplex,Cantonese Restaurant,Pizza Place,Clothing Store
5,Wukesong Subway Station,116.280681,39.91385,0,Fast Food Restaurant,Coffee Shop,Hotel,Shopping Mall,Pizza Place,Chinese Restaurant,Park,History Museum,Asian Restaurant,Hotpot Restaurant
6,Fengtai Subway Station,116.311618,39.855966,0,Coffee Shop,Fast Food Restaurant,Shopping Mall,Metro Station,Pizza Place,Hotel,Hotpot Restaurant,Bus Station,Theater,Athletics & Sports
7,Daxing Airport Subway Station,116.423278,39.51862,1,Airport,Bubble Tea Shop,Noodle House,Clothing Store,Women's Store,Airport Service,Dessert Shop,Dumpling Restaurant,Eastern European Restaurant,Electronics Store
8,Yizhuang Bridge Subway Station,116.486949,39.808647,2,Hotel,Metro Station,Fast Food Restaurant,Sandwich Place,Beijing Restaurant,Golf Course,Furniture / Home Store,Mediterranean Restaurant,Movie Theater,Coffee Shop
9,Liangxian Subway Station,116.163847,39.748704,3,Metro Station,Hotel,Shopping Mall,Fast Food Restaurant,Coffee Shop,Pizza Place,Train Station,Argentinian Restaurant,Fruit & Vegetable Store,Dessert Shop


In [36]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map, colored by cluster label (labels start at 0)
for lat, lon, poi, cluster in zip(beijing_merged['Latitude'], beijing_merged['Longitude'], beijing_merged['Neighborhood'], beijing_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[int(cluster)],
        fill=True,
        fill_color=rainbow[int(cluster)],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters
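
Each cluster label needs its own marker color. As an illustration of that mapping, the sketch below builds one hex color per label from evenly spaced hues, using the standard-library `colorsys` module in place of matplotlib's `cm.rainbow`. This is a swapped-in technique for demonstration, and `cluster_colors` is a hypothetical helper.

```python
import colorsys


def cluster_colors(k):
    """Return k hex colors with evenly spaced hues, one per cluster label."""
    palette = []
    for i in range(k):
        r, g, b = colorsys.hsv_to_rgb(i / k, 1.0, 1.0)
        palette.append(
            "#{:02x}{:02x}{:02x}".format(round(r * 255), round(g * 255), round(b * 255))
        )
    return palette


kclusters = 5
palette = cluster_colors(kclusters)

# Map each zero-based cluster label directly to its color.
color_for_cluster = {label: palette[label] for label in range(kclusters)}
print(color_for_cluster)
```

Indexing the palette with the label itself keeps the legend easy to reason about: label 0 gets the first color, label 1 the second, and so on.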