# IBM DATA SCIENCE CAPSTONE PROJECT by Shuhao Constan Chang

## Find the Best Neighborhoods in Shanghai to Build Electric Vehicle Charging Stations

#### - Build a dataframe of neighborhoods in Shanghai, China by web scraping (BeautifulSoup) a Wikipedia category page
#### - Get the geographical coordinates (Geocoder) of the neighborhoods
#### - Obtain venue data for the neighborhoods from the Foursquare API
#### - Explore and cluster the neighborhoods by venue types grouped by typical parking duration
#### - Select the best cluster for building electric vehicle charging stations

## 1. Install and Import Libraries

In [1]:
pip install beautifulsoup4 


Collecting beautifulsoup4
  Downloading https://files.pythonhosted.org/packages/1a/b7/34eec2fe5a49718944e215fde81288eec1fa04638aa3fb57c1c6cd0f98c3/beautifulsoup4-4.8.0-py3-none-any.whl (97kB)
Collecting soupsieve>=1.2 (from beautifulsoup4)
  Downloading https://files.pythonhosted.org/packages/0b/44/0474f2207fdd601bb25787671c81076333d2c80e6f97e92790f8887cf682/soupsieve-1.9.3-py2.py3-none-any.whl
Installing collected packages: soupsieve, beautifulsoup4
Successfully installed beautifulsoup4-4.8.0 soupsieve-1.9.3
Note: you may need to restart the kernel to use updated packages.


In [2]:
pip install geocoder

Collecting geocoder
  Downloading https://files.pythonhosted.org/packages/4f/6b/13166c909ad2f2d76b929a4227c952630ebaf0d729f6317eb09cbceccbab/geocoder-1.38.1-py2.py3-none-any.whl (98kB)
Collecting ratelim (from geocoder)
  Downloading https://files.pythonhosted.org/packages/f2/98/7e6d147fd16a10a5f821db6e25f192265d6ecca3d82957a4fdd592cad49c/ratelim-0.1.6-py2.py3-none-any.whl
Collecting future (from geocoder)
  Downloading https://files.pythonhosted.org/packages/90/52/e20466b85000a181e1e144fd8305caf2cf475e2f9674e797b222f8105f5f/future-0.17.1.tar.gz (829kB)
Collecting click (from geocoder)
  Downloading https://files.pythonhosted.org/packages/fa/37/45185cb5abbc30d7257104c434fe0b07e5a195a6847506c074527aa599ec/Click-7.0-py2.py3-none-any.whl (81kB)
Building wheels 

In [3]:
import numpy as np # library to handle data in a vectorized manner

import pandas as pd # library for data analysis
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import json # library to handle JSON files

!conda install -c conda-forge geopy --yes # installs geopy; comment this line out if geopy is already installed
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

import requests # library to handle requests
# transform JSON into a pandas dataframe (from pandas 1.0 onward this is available as pd.json_normalize)
from pandas.io.json import json_normalize


# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

import folium # map rendering library

from bs4 import BeautifulSoup # parse the scraped HTML
import geocoder # to get coordinates

print('Libraries imported.')

Solving environment: done


  current version: 4.5.11
  latest version: 4.7.12

Please update conda by running

    $ conda update -n base -c defaults conda



## Package Plan ##

  environment location: /home/jupyterlab/conda/envs/python

  added / updated specs: 
    - geopy


The following packages will be downloaded:

    package                    |            build
    ---------------------------|-----------------
    geopy-1.20.0               |             py_0          57 KB  conda-forge
    certifi-2019.9.11          |           py36_0         147 KB  conda-forge
    geographiclib-1.49         |             py_0          32 KB  conda-forge
    ------------------------------------------------------------
                                           Total:         237 KB

The following NEW packages will be INSTALLED:

    geographiclib: 1.49-py_0        conda-forge
    geopy:         1.20.0-py_0      conda-forge

The following packages will be UPDATED:

    certifi:       2019.6.

## 2. Build a dataframe of neighborhoods in Shanghai, China by web scraping (BeautifulSoup) a Wikipedia category page

In [4]:
# download the Wikipedia category page that lists Shanghai neighborhoods
data = requests.get("https://en.wikipedia.org/wiki/Category:Neighbourhoods_of_Shanghai").text
soup = BeautifulSoup(data, 'html.parser')
# collect the text of every list item in the category listing
neighborhoodList = []
for row in soup.find_all("div", class_="mw-category")[0].find_all("li"):
    neighborhoodList.append(row.text)
kl_df = pd.DataFrame({"Neighborhood": neighborhoodList})

kl_df

Unnamed: 0,Neighborhood
0,Anting
1,Changshou Road Subdistrict
2,Fengjing
3,"Gaoqiao, Shanghai"
4,"Gubei, Shanghai"
5,"Koreatown, Shanghai"
6,Lujiazui
7,"Luodian, Shanghai"
8,Nanxiang
9,Qiantan International Business Zone (Shanghai)


In [5]:
kl_df.shape

(19, 1)
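
Several scraped titles carry Wikipedia disambiguation text, such as "Gaoqiao, Shanghai" or "Qiantan International Business Zone (Shanghai)". Because the geocoding step appends ", Shanghai, China" to each name, it may help to strip that text first. The following is a minimal sketch only; `clean_name` is a hypothetical helper that is not part of the original notebook.

```python
import re

def clean_name(title):
    """Drop a trailing ', Shanghai' or ' (Shanghai)' disambiguator (hypothetical helper)."""
    # remove a parenthesised suffix such as ' (Shanghai)'
    title = re.sub(r"\s*\(Shanghai\)\s*$", "", title)
    # remove a comma suffix such as ', Shanghai'
    title = re.sub(r",\s*Shanghai\s*$", "", title)
    return title.strip()

# three titles taken from the scraped list above
titles = ["Anting", "Gaoqiao, Shanghai", "Qiantan International Business Zone (Shanghai)"]
cleaned = [clean_name(t) for t in titles]
print(cleaned)
```

The cleaned names could be stored in a separate column, so the original Wikipedia titles stay available for reference.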

## 3. Get the geographical coordinates (Geocoder) of the neighborhoods and build a map of the neighborhoods

In [6]:
# define a function to get coordinates
def get_latlng(neighborhood, max_tries=5):
    # query ArcGIS until coordinates come back, but give up after max_tries
    # attempts instead of looping forever when the service keeps failing
    for _ in range(max_tries):
        g = geocoder.arcgis('{}, Shanghai, China'.format(neighborhood))
        if g.latlng:
            return g.latlng
    raise RuntimeError('No coordinates found for {}'.format(neighborhood))

In [7]:
coords = [ get_latlng(neighborhood) for neighborhood in kl_df["Neighborhood"].tolist() ]
coords

[[31.29890000000006, 121.15760000000012],
 [30.916040000000066, 121.15409000000011],
 [31.116700000000037, 121.12902000000008],
 [31.22222000000005, 121.45806000000005],
 [31.22222000000005, 121.45806000000005],
 [31.22222000000005, 121.45806000000005],
 [30.79141000000004, 121.34888000000001],
 [31.22222000000005, 121.45806000000005],
 [31.29979000000003, 121.31180000000006],
 [31.22222000000005, 121.45806000000005],
 [31.152670000000057, 121.35688000000005],
 [31.03595000000007, 121.21460000000002],
 [31.22222000000005, 121.45806000000005],
 [31.37566000000004, 121.49041000000011],
 [31.024740000000065, 121.67880000000002],
 [30.946420000000046, 121.0098200000001],
 [31.190000000000055, 121.43194000000005],
 [31.20861000000008, 121.60889000000009],
 [31.107570000000067, 121.05696000000012]]
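
The output above repeats the pair [31.22222, 121.45806] for several neighborhoods, which may mean the geocoder fell back to a city-level result for names it could not resolve. A quick way to flag such repeats is sketched below, using a few pairs copied from the output. The variable names are illustrative and not part of the original notebook.

```python
from collections import Counter

# a few coordinate pairs copied from the output above
coords_sample = [
    [31.29890000000006, 121.15760000000012],
    [31.22222000000005, 121.45806000000005],
    [31.22222000000005, 121.45806000000005],
    [30.79141000000004, 121.34888000000001],
    [31.22222000000005, 121.45806000000005],
]

# count how often each rounded (lat, lng) pair occurs
counts = Counter((round(lat, 5), round(lng, 5)) for lat, lng in coords_sample)

# pairs returned for more than one neighborhood deserve a manual check
repeated = {pair: n for pair, n in counts.items() if n > 1}
print(repeated)
```

Neighborhoods that share identical coordinates will also receive identical venue lists from the Foursquare query, so they are worth checking before clustering.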

In [8]:
# create temporary dataframe to populate the coordinates into Latitude and Longitude
df_coords = pd.DataFrame(coords, columns=['Latitude', 'Longitude'])

In [9]:
# merge the coordinates into the original dataframe
kl_df['Latitude'] = df_coords['Latitude']
kl_df['Longitude'] = df_coords['Longitude']

In [10]:
# check the neighborhoods and the coordinates
print(kl_df.shape)
kl_df

(19, 3)


Unnamed: 0,Neighborhood,Latitude,Longitude
0,Anting,31.2989,121.1576
1,Changshou Road Subdistrict,30.91604,121.15409
2,Fengjing,31.1167,121.12902
3,"Gaoqiao, Shanghai",31.22222,121.45806
4,"Gubei, Shanghai",31.22222,121.45806
5,"Koreatown, Shanghai",31.22222,121.45806
6,Lujiazui,30.79141,121.34888
7,"Luodian, Shanghai",31.22222,121.45806
8,Nanxiang,31.29979,121.3118
9,Qiantan International Business Zone (Shanghai),31.22222,121.45806


In [11]:
# save the DataFrame as CSV file
kl_df.to_csv("kl_df.csv", index=False)

In [12]:

# get the coordinates of Shanghai
address = 'Shanghai, China'

geolocator = Nominatim(user_agent="my-application")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Shanghai, China are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Shanghai, China are 31.2322735, 121.4691749.


In [13]:
# create map of Shanghai using latitude and longitude values
map_kl = folium.Map(location=[latitude, longitude], zoom_start=10)

# add markers to map
for lat, lng, neighborhood in zip(kl_df['Latitude'], kl_df['Longitude'], kl_df['Neighborhood']):
    label = '{}'.format(neighborhood)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=10,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7).add_to(map_kl)  
    
map_kl

In [14]:
# save the map as HTML file
map_kl.save('map_kl.html')

## 4. Obtain venue data for the neighborhoods from the Foursquare API and categorize the venues by parking duration

In [15]:
# define Foursquare credentials and version
import os
# read the credentials from environment variables (hypothetical variable names) instead of hard-coding them
CLIENT_ID = os.environ['FOURSQUARE_CLIENT_ID'] # my Foursquare ID
CLIENT_SECRET = os.environ['FOURSQUARE_CLIENT_SECRET'] # my Foursquare Secret
VERSION = '20190921' # Foursquare API version

# confirm the credentials are available without printing them
print('Credentials loaded.')

Credentials loaded.


In [16]:
radius = 2000
LIMIT = 100

venues = []

for lat, long, neighborhood in zip(kl_df['Latitude'], kl_df['Longitude'], kl_df['Neighborhood']):
    
    # create the API request URL
    url = "https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}".format(
        CLIENT_ID,
        CLIENT_SECRET,
        VERSION,
        lat,
        long,
        radius, 
        LIMIT)
    
    # make the GET request
    results = requests.get(url).json()["response"]['groups'][0]['items']
    
    # return only relevant information for each nearby venue
    for venue in results:
        venues.append((
            neighborhood,
            lat, 
            long, 
            venue['venue']['name'], 
            venue['venue']['location']['lat'], 
            venue['venue']['location']['lng'],  
            venue['venue']['categories'][0]['name']))
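
The loop above assumes that every response contains a group and that every venue has at least one category. A more defensive variant is sketched below under those same field names. `build_explore_params` and `parse_explore_items` are hypothetical helpers, and the payload is made up for illustration rather than a real API response.

```python
def build_explore_params(client_id, client_secret, version, lat, lng, radius, limit):
    """Query parameters for the venues/explore request used above (hypothetical helper)."""
    return {
        "client_id": client_id,
        "client_secret": client_secret,
        "v": version,
        "ll": "{},{}".format(lat, lng),
        "radius": radius,
        "limit": limit,
    }

def parse_explore_items(payload):
    """Extract (name, lat, lng, category) tuples, skipping malformed items."""
    groups = payload.get("response", {}).get("groups", [])
    items = groups[0].get("items", []) if groups else []
    rows = []
    for item in items:
        venue = item.get("venue", {})
        location = venue.get("location", {})
        categories = venue.get("categories", [])
        if not categories or "lat" not in location or "lng" not in location:
            continue  # skip venues without a category or coordinates
        rows.append((venue.get("name"), location["lat"], location["lng"], categories[0]["name"]))
    return rows

# made-up payload in the shape the loop above expects
fake_payload = {"response": {"groups": [{"items": [
    {"venue": {"name": "Example Cafe", "location": {"lat": 31.29, "lng": 121.16},
               "categories": [{"name": "Coffee Shop"}]}},
    {"venue": {"name": "No Category", "location": {"lat": 31.30, "lng": 121.15},
               "categories": []}},
]}]}}

params = build_explore_params("ID", "SECRET", "20190921", 31.2989, 121.1576, 2000, 100)
rows = parse_explore_items(fake_payload)
print(params["ll"], rows)
```

The parameter dictionary can be passed as `requests.get("https://api.foursquare.com/v2/venues/explore", params=params, timeout=10)`, which lets `requests` handle URL encoding and keeps a stalled request from blocking the loop.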

In [17]:
# convert the venues list into a new DataFrame
venues_df = pd.DataFrame(venues)

# define the column names
venues_df.columns = ['Neighborhood', 'Latitude', 'Longitude', 'VenueName', 'VenueLatitude', 'VenueLongitude', 'VenueCategory']

print(venues_df.shape)
venues_df.head()

(825, 7)


Unnamed: 0,Neighborhood,Latitude,Longitude,VenueName,VenueLatitude,VenueLongitude,VenueCategory
0,Anting,31.2989,121.1576,Alibaba,31.297209,121.162602,German Restaurant
1,Anting,31.2989,121.1576,Wirtshaus,31.291667,121.154532,Bar
2,Anting,31.2989,121.1576,Starbucks (星巴克),31.289777,121.157733,Coffee Shop
3,Anting,31.2989,121.1576,Life Hub (嘉亭荟城市生活广场),31.289792,121.157673,Shopping Mall
4,Anting,31.2989,121.1576,Biergarten Anting,31.297506,121.164596,German Restaurant


In [18]:
# count the venues in each category and list the 20 most frequent categories in descending order
venues_df.groupby(["VenueCategory"]).count().sort_values(by="Neighborhood",ascending=False).head(20)

VenueCategory,Neighborhood,Latitude,Longitude,VenueName,VenueLatitude,VenueLongitude
Coffee Shop,66,66,66,66,66,66
Hotel,60,60,60,60,60,60
Shopping Mall,35,35,35,35,35,35
Cocktail Bar,33,33,33,33,33,33
Spa,31,31,31,31,31,31
Dumpling Restaurant,30,30,30,30,30,30
Bakery,30,30,30,30,30,30
Chinese Restaurant,28,28,28,28,28,28
Café,23,23,23,23,23,23
Japanese Restaurant,20,20,20,20,20,20


In [19]:
print('There are {} unique categories.'.format(len(venues_df['VenueCategory'].unique())))

There are 110 unique categories.


In [20]:
# print out the first 50 categories
venues_df['VenueCategory'].unique()[:50]

array(['German Restaurant', 'Bar', 'Coffee Shop', 'Shopping Mall',
       'Fast Food Restaurant', 'Park', 'Bus Station', 'Metro Station',
       'Hotel', 'Shanghai Restaurant', 'Toll Plaza', 'Train Station',
       'Market', 'Asian Restaurant', 'Art Gallery', 'Chinese Restaurant',
       'Garden', 'Optical Shop', 'Cocktail Bar', 'Pizza Place',
       'Hunan Restaurant', 'Club House', 'Furniture / Home Store',
       'Turkish Restaurant', 'Japanese Restaurant', 'Other Nightlife',
       'Mexican Restaurant', 'Theme Restaurant', 'Spa', 'Noodle House',
       'Café', 'Dumpling Restaurant', 'Speakeasy', 'Hong Kong Restaurant',
       'Multiplex', 'Seafood Restaurant', 'Nail Salon',
       'Gym / Fitness Center', 'Brazilian Restaurant', 'Yoga Studio',
       'Gastropub', 'Clothing Store', 'Pedestrian Plaza', 'Wine Bar',
       'Electronics Store', 'Supermarket', 'Plaza', 'Bakery', 'Diner',
       'Ramen Restaurant'], dtype=object)

In [21]:
# one hot encoding
kl_onehot = pd.get_dummies(venues_df[['VenueCategory']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
kl_onehot['Neighborhoods'] = venues_df['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [kl_onehot.columns[-1]] + list(kl_onehot.columns[:-1])
kl_onehot = kl_onehot[fixed_columns]

print(kl_onehot.shape)
kl_onehot.head()

(825, 111)


Unnamed: 0,Neighborhoods,American Restaurant,Art Gallery,Art Museum,Asian Restaurant,Bakery,Bar,Basketball Court,Bed & Breakfast,Big Box Store,Bookstore,Brazilian Restaurant,Breakfast Spot,Brewery,Burger Joint,Bus Station,Café,Cantonese Restaurant,Cha Chaan Teng,Chinese Restaurant,Clothing Store,Club House,Cocktail Bar,Coffee Shop,Concert Hall,Convenience Store,Dim Sum Restaurant,Diner,Dongbei Restaurant,Dumpling Restaurant,Electronics Store,Fast Food Restaurant,French Restaurant,Furniture / Home Store,Garden,Gas Station,Gastropub,Gay Bar,General Travel,German Restaurant,Grocery Store,Gym,Gym / Fitness Center,Hawaiian Restaurant,Historic Site,History Museum,Hong Kong Restaurant,Hotel,Hotpot Restaurant,Hunan Restaurant,Ice Cream Shop,Indie Theater,Irish Pub,Italian Restaurant,Japanese Curry Restaurant,Japanese Restaurant,Jazz Club,Juice Bar,Lounge,Market,Massage Studio,Metro Station,Mexican Restaurant,Middle Eastern Restaurant,Mongolian Restaurant,Movie Theater,Moving Target,Multiplex,Nail Salon,New American Restaurant,Noodle House,Optical Shop,Other Nightlife,Park,Pastry Shop,Pedestrian Plaza,Pie Shop,Pizza Place,Plaza,Ramen Restaurant,Restaurant,Sandwich Place,Scandinavian Restaurant,Seafood Restaurant,Shanghai Restaurant,Shopping Mall,Spa,Spanish Restaurant,Speakeasy,Sports Club,Steakhouse,Supermarket,Sushi Restaurant,Szechuan Restaurant,Taiwanese Restaurant,Tea Room,Thai Restaurant,Theater,Theme Restaurant,Toll Booth,Toll Plaza,Train Station,Turkish Restaurant,Vegetarian / Vegan Restaurant,Vietnamese Restaurant,Water Park,Wine Bar,Xinjiang Restaurant,Yoga Studio,Yunnan Restaurant,Zhejiang Restaurant
0,Anting,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,Anting,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,Anting,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3,Anting,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,Anting,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0


In [22]:
kl_grouped = kl_onehot.groupby(["Neighborhoods"]).mean().reset_index()

print(kl_grouped.shape)
kl_grouped

(18, 111)


Unnamed: 0,Neighborhoods,American Restaurant,Art Gallery,Art Museum,Asian Restaurant,Bakery,Bar,Basketball Court,Bed & Breakfast,Big Box Store,Bookstore,Brazilian Restaurant,Breakfast Spot,Brewery,Burger Joint,Bus Station,Café,Cantonese Restaurant,Cha Chaan Teng,Chinese Restaurant,Clothing Store,Club House,Cocktail Bar,Coffee Shop,Concert Hall,Convenience Store,Dim Sum Restaurant,Diner,Dongbei Restaurant,Dumpling Restaurant,Electronics Store,Fast Food Restaurant,French Restaurant,Furniture / Home Store,Garden,Gas Station,Gastropub,Gay Bar,General Travel,German Restaurant,Grocery Store,Gym,Gym / Fitness Center,Hawaiian Restaurant,Historic Site,History Museum,Hong Kong Restaurant,Hotel,Hotpot Restaurant,Hunan Restaurant,Ice Cream Shop,Indie Theater,Irish Pub,Italian Restaurant,Japanese Curry Restaurant,Japanese Restaurant,Jazz Club,Juice Bar,Lounge,Market,Massage Studio,Metro Station,Mexican Restaurant,Middle Eastern Restaurant,Mongolian Restaurant,Movie Theater,Moving Target,Multiplex,Nail Salon,New American Restaurant,Noodle House,Optical Shop,Other Nightlife,Park,Pastry Shop,Pedestrian Plaza,Pie Shop,Pizza Place,Plaza,Ramen Restaurant,Restaurant,Sandwich Place,Scandinavian Restaurant,Seafood Restaurant,Shanghai Restaurant,Shopping Mall,Spa,Spanish Restaurant,Speakeasy,Sports Club,Steakhouse,Supermarket,Sushi Restaurant,Szechuan Restaurant,Taiwanese Restaurant,Tea Room,Thai Restaurant,Theater,Theme Restaurant,Toll Booth,Toll Plaza,Train Station,Turkish Restaurant,Vegetarian / Vegan Restaurant,Vietnamese Restaurant,Water Park,Wine Bar,Xinjiang Restaurant,Yoga Studio,Yunnan Restaurant,Zhejiang Restaurant
0,Anting,0.0,0.0,0.0,0.0,0.0,0.055556,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.055556,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.166667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.111111,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.111111,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.055556,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.111111,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.111111,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.055556,0.055556,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.055556,0.055556,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,Changshou Road Subdistrict,0.0,0.333333,0.0,0.333333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.333333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,Fengjing,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.5,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.5,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,"Gaoqiao, Shanghai",0.01,0.0,0.0,0.01,0.04,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.02,0.0,0.03,0.01,0.0,0.01,0.01,0.01,0.05,0.07,0.0,0.0,0.01,0.01,0.0,0.04,0.01,0.0,0.02,0.01,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.02,0.01,0.01,0.01,0.01,0.08,0.0,0.01,0.01,0.0,0.0,0.01,0.0,0.03,0.0,0.01,0.02,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.01,0.02,0.0,0.01,0.01,0.01,0.01,0.01,0.01,0.0,0.01,0.01,0.01,0.02,0.0,0.0,0.01,0.01,0.04,0.05,0.01,0.02,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.03,0.01,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.01,0.0,0.01,0.0,0.0
4,"Gubei, Shanghai",0.01,0.0,0.0,0.01,0.04,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.02,0.0,0.03,0.01,0.0,0.01,0.01,0.01,0.05,0.07,0.0,0.0,0.01,0.01,0.0,0.04,0.01,0.0,0.02,0.01,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.02,0.01,0.01,0.01,0.01,0.08,0.0,0.01,0.01,0.0,0.0,0.01,0.0,0.03,0.0,0.01,0.02,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.01,0.02,0.0,0.01,0.01,0.01,0.01,0.01,0.01,0.0,0.01,0.01,0.01,0.02,0.0,0.0,0.01,0.01,0.04,0.05,0.01,0.02,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.03,0.01,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.01,0.0,0.01,0.0,0.0
5,"Koreatown, Shanghai",0.01,0.0,0.0,0.01,0.04,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.02,0.0,0.03,0.01,0.0,0.01,0.01,0.01,0.05,0.07,0.0,0.0,0.01,0.01,0.0,0.04,0.01,0.0,0.02,0.01,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.02,0.01,0.01,0.01,0.01,0.08,0.0,0.01,0.01,0.0,0.0,0.01,0.0,0.03,0.0,0.01,0.02,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.01,0.02,0.0,0.01,0.01,0.01,0.01,0.01,0.01,0.0,0.01,0.01,0.01,0.02,0.0,0.0,0.01,0.01,0.04,0.05,0.01,0.02,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.03,0.01,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.01,0.0,0.01,0.0,0.0
6,Lujiazui,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.5,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.5,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
7,"Luodian, Shanghai",0.01,0.0,0.0,0.01,0.04,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.02,0.0,0.03,0.01,0.0,0.01,0.01,0.01,0.05,0.07,0.0,0.0,0.01,0.01,0.0,0.04,0.01,0.0,0.02,0.01,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.02,0.01,0.01,0.01,0.01,0.08,0.0,0.01,0.01,0.0,0.0,0.01,0.0,0.03,0.0,0.01,0.02,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.01,0.02,0.0,0.01,0.01,0.01,0.01,0.01,0.01,0.0,0.01,0.01,0.01,0.02,0.0,0.0,0.01,0.01,0.04,0.05,0.01,0.02,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.03,0.01,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.01,0.0,0.01,0.0,0.0
8,Nanxiang,0.0,0.0,0.0,0.0,0.083333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.083333,0.0,0.0,0.0,0.083333,0.0,0.0,0.0,0.0,0.0,0.083333,0.0,0.083333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.083333,0.0,0.0,0.0,0.0,0.083333,0.0,0.083333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.083333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.083333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.083333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.083333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
9,Qiantan International Business Zone (Shanghai),0.01,0.0,0.0,0.01,0.04,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.02,0.0,0.03,0.01,0.0,0.01,0.01,0.01,0.05,0.07,0.0,0.0,0.01,0.01,0.0,0.04,0.01,0.0,0.02,0.01,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.02,0.01,0.01,0.01,0.01,0.08,0.0,0.01,0.01,0.0,0.0,0.01,0.0,0.03,0.0,0.01,0.02,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.01,0.02,0.0,0.01,0.01,0.01,0.01,0.01,0.01,0.0,0.01,0.01,0.01,0.02,0.0,0.0,0.01,0.01,0.04,0.05,0.01,0.02,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.03,0.01,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.01,0.0,0.01,0.0,0.0
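
Taking the mean of the one-hot columns per neighborhood gives the share of that neighborhood's venues that fall in each category. The sketch below reproduces the idea with the standard library, using only the first five Anting venue categories shown in the `venues_df.head()` output, so the numbers differ from the full Anting row in the table.

```python
from collections import Counter

# categories of the first five Anting venues shown in venues_df.head()
categories = ["German Restaurant", "Bar", "Coffee Shop", "Shopping Mall", "German Restaurant"]

counts = Counter(categories)
# the mean of a one-hot column equals the share of venues in that category
frequencies = {cat: n / len(categories) for cat, n in counts.items()}
print(frequencies)
```

The shares for one neighborhood always sum to 1, so neighborhoods with very few venues (for example two venues at 0.5 each) get coarse, high-variance profiles.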


In [23]:
# Select ten venue types for the analysis (frequent categories from the count above, with "Park" in place of "Café")
# .copy() makes kl_select an independent DataFrame, so adding columns below does not raise SettingWithCopyWarning
kl_select = kl_grouped[["Neighborhoods","Shopping Mall","Hotel","Park","Spa","Dumpling Restaurant","Chinese Restaurant","Japanese Restaurant","Coffee Shop","Cocktail Bar","Bakery"]].copy()

In [24]:
# Among these venue types, "Shopping Mall, Hotel, Park" are venues with longer parking times
# "Spa, Dumpling Restaurant, Chinese Restaurant, Japanese Restaurant" are venues with medium parking times
# "Cocktail Bar, Bakery, Coffee Shop" are venues with shorter parking times or no parking spaces
kl_select["Shopping Mall, Hotel, Park"]=kl_select["Shopping Mall"]+kl_select["Hotel"]+kl_select["Park"]
# note: this sum leaves out "Japanese Restaurant" although the column label names it
kl_select["Spa, Dumpling Restaurant, Chinese Restaurant,Japanese Restaurant"]=kl_select["Spa"]+kl_select["Dumpling Restaurant"]+kl_select["Chinese Restaurant"]
kl_select["Cocktail Bar,Bakery,Coffee Shop"]=kl_select["Cocktail Bar"]+kl_select["Bakery"]+kl_select["Coffee Shop"]


In [25]:
# data after grouping the venue types by parking time
kl_categoried=kl_select[["Neighborhoods", "Shopping Mall, Hotel, Park","Spa, Dumpling Restaurant, Chinese Restaurant,Japanese Restaurant","Cocktail Bar,Bakery,Coffee Shop"]]
kl_categoried

Unnamed: 0,Neighborhoods,"Shopping Mall, Hotel, Park","Spa, Dumpling Restaurant, Chinese Restaurant,Japanese Restaurant","Cocktail Bar,Bakery,Coffee Shop"
0,Anting,0.222222,0.0,0.166667
1,Changshou Road Subdistrict,0.0,0.0,0.0
2,Fengjing,0.0,0.5,0.0
3,"Gaoqiao, Shanghai",0.13,0.1,0.16
4,"Gubei, Shanghai",0.13,0.1,0.16
5,"Koreatown, Shanghai",0.13,0.1,0.16
6,Lujiazui,0.0,0.5,0.0
7,"Luodian, Shanghai",0.13,0.1,0.16
8,Nanxiang,0.25,0.166667,0.166667
9,Qiantan International Business Zone (Shanghai),0.13,0.1,0.16
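
The three grouped scores can also be derived from a single mapping of group name to categories, which keeps the grouping rule in one place. The sketch below is an illustration only: `PARKING_GROUPS` and `group_scores` are hypothetical names, and the input is the non-zero part of the Anting row from the frequency table above.

```python
# parking-duration groups; the medium group lists the three categories that the
# cell above actually sums (add "Japanese Restaurant" here if it should count)
PARKING_GROUPS = {
    "long": ["Shopping Mall", "Hotel", "Park"],
    "medium": ["Spa", "Dumpling Restaurant", "Chinese Restaurant"],
    "short": ["Cocktail Bar", "Bakery", "Coffee Shop"],
}

# non-zero frequencies for Anting among the selected categories
anting = {"Shopping Mall": 0.055556, "Hotel": 0.055556, "Park": 0.111111, "Coffee Shop": 0.166667}

def group_scores(freqs, groups=PARKING_GROUPS):
    # categories missing from the row count as frequency 0
    return {g: sum(freqs.get(c, 0.0) for c in cats) for g, cats in groups.items()}

scores = group_scores(anting)
print(scores)
```

Up to rounding, the result matches the Anting row in the table above: about 0.222 for the long group, 0 for the medium group, and about 0.167 for the short group.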


In [26]:
# cluster the neighborhoods by the three parking-time features above
# set number of clusters
kclusters = 3

# keep only the numeric feature columns
kl_categoried_without_neighbor = kl_categoried.drop(columns=["Neighborhoods"])
# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(kl_categoried_without_neighbor)
# check cluster labels generated for the first 10 rows of the dataframe
kmeans.labels_[0:10]

array([2, 2, 1, 2, 2, 2, 1, 2, 2, 2], dtype=int32)

In [27]:
# create a new dataframe that includes the cluster label as well as the three parking-time scores for each neighborhood
kl_merged = kl_categoried.copy()
# add clustering labels
kl_merged["Cluster Labels"] = kmeans.labels_

In [28]:
kl_merged.head()

Unnamed: 0,Neighborhoods,"Shopping Mall, Hotel, Park","Spa, Dumpling Restaurant, Chinese Restaurant,Japanese Restaurant","Cocktail Bar,Bakery,Coffee Shop",Cluster Labels
0,Anting,0.222222,0.0,0.166667,2
1,Changshou Road Subdistrict,0.0,0.0,0.0,2
2,Fengjing,0.0,0.5,0.0,1
3,"Gaoqiao, Shanghai",0.13,0.1,0.16,2
4,"Gubei, Shanghai",0.13,0.1,0.16,2


In [29]:
# Add latitude and longitude for each neighborhood
kl_merged = kl_merged.join(kl_df.set_index("Neighborhood"), on="Neighborhoods")

print(kl_merged.shape)
kl_merged.head()

(18, 7)


Unnamed: 0,Neighborhoods,"Shopping Mall, Hotel, Park","Spa, Dumpling Restaurant, Chinese Restaurant,Japanese Restaurant","Cocktail Bar,Bakery,Coffee Shop",Cluster Labels,Latitude,Longitude
0,Anting,0.222222,0.0,0.166667,2,31.2989,121.1576
1,Changshou Road Subdistrict,0.0,0.0,0.0,2,30.91604,121.15409
2,Fengjing,0.0,0.5,0.0,1,31.1167,121.12902
3,"Gaoqiao, Shanghai",0.13,0.1,0.16,2,31.22222,121.45806
4,"Gubei, Shanghai",0.13,0.1,0.16,2,31.22222,121.45806


In [30]:
# sort the results by Cluster Labels
print(kl_merged.shape)
kl_merged.sort_values(["Cluster Labels"], inplace=True)
kl_merged

(18, 7)


Unnamed: 0,Neighborhoods,"Shopping Mall, Hotel, Park","Spa, Dumpling Restaurant, Chinese Restaurant,Japanese Restaurant","Cocktail Bar,Bakery,Coffee Shop",Cluster Labels,Latitude,Longitude
14,Xintiandi,0.5,0.0,0.0,0,31.02474,121.6788
10,Qibao,0.32,0.2,0.12,0,31.15267,121.35688
2,Fengjing,0.0,0.5,0.0,1,31.1167,121.12902
6,Lujiazui,0.0,0.5,0.0,1,30.79141,121.34888
0,Anting,0.222222,0.0,0.166667,2,31.2989,121.1576
15,Xujiahui,0.05,0.1,0.14,2,31.19,121.43194
13,Wusong,0.0,0.0,0.0,2,31.37566,121.49041
12,Tianzifang,0.13,0.1,0.16,2,31.22222,121.45806
11,Songjiang Town,0.222222,0.111111,0.222222,2,31.03595,121.2146
8,Nanxiang,0.25,0.166667,0.166667,2,31.29979,121.3118


In [31]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=10)

# set color scheme for the clusters: one evenly spaced rainbow color per cluster
colors_array = cm.rainbow(np.linspace(0, 1, kclusters))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map, colored by cluster label
for lat, lon, poi, cluster in zip(kl_merged['Latitude'], kl_merged['Longitude'], kl_merged['Neighborhoods'], kl_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' - Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=10,
        popup=label,
        color=rainbow[cluster],
        fill=True,
        fill_color=rainbow[cluster],
        fill_opacity=0.7).add_to(map_clusters)
map_clusters

In [32]:
map_clusters.save('map_clusters.html')

In [33]:

kl_merged.loc[kl_merged['Cluster Labels'] == 0]

Unnamed: 0,Neighborhoods,"Shopping Mall, Hotel, Park","Spa, Dumpling Restaurant, Chinese Restaurant,Japanese Restaurant","Cocktail Bar,Bakery,Coffee Shop",Cluster Labels,Latitude,Longitude
14,Xintiandi,0.5,0.0,0.0,0,31.02474,121.6788
10,Qibao,0.32,0.2,0.12,0,31.15267,121.35688


In [34]:
kl_merged.loc[kl_merged['Cluster Labels'] == 1]

Unnamed: 0,Neighborhoods,"Shopping Mall, Hotel, Park","Spa, Dumpling Restaurant, Chinese Restaurant,Japanese Restaurant","Cocktail Bar,Bakery,Coffee Shop",Cluster Labels,Latitude,Longitude
2,Fengjing,0.0,0.5,0.0,1,31.1167,121.12902
6,Lujiazui,0.0,0.5,0.0,1,30.79141,121.34888


In [35]:
kl_merged.loc[kl_merged['Cluster Labels'] == 2]

Unnamed: 0,Neighborhoods,"Shopping Mall, Hotel, Park","Spa, Dumpling Restaurant, Chinese Restaurant,Japanese Restaurant","Cocktail Bar,Bakery,Coffee Shop",Cluster Labels,Latitude,Longitude
0,Anting,0.222222,0.0,0.166667,2,31.2989,121.1576
15,Xujiahui,0.05,0.1,0.14,2,31.19,121.43194
13,Wusong,0.0,0.0,0.0,2,31.37566,121.49041
12,Tianzifang,0.13,0.1,0.16,2,31.22222,121.45806
11,Songjiang Town,0.222222,0.111111,0.222222,2,31.03595,121.2146
8,Nanxiang,0.25,0.166667,0.166667,2,31.29979,121.3118
16,Zhangjiang Town,0.175,0.225,0.2,2,31.20861,121.60889
7,"Luodian, Shanghai",0.13,0.1,0.16,2,31.22222,121.45806
5,"Koreatown, Shanghai",0.13,0.1,0.16,2,31.22222,121.45806
4,"Gubei, Shanghai",0.13,0.1,0.16,2,31.22222,121.45806
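
One way to compare the clusters is to average each parking-time score within a cluster. The sketch below does this with the standard library on rows copied from the tables above. It includes every displayed row for clusters 0 and 1 but only three cluster 2 rows, so the cluster 2 averages are indicative rather than exact.

```python
from collections import defaultdict

# (neighborhood, long, medium, short, cluster) rows copied from the tables above;
# only three of the cluster 2 rows are included to keep the sketch short
rows = [
    ("Xintiandi", 0.5, 0.0, 0.0, 0),
    ("Qibao", 0.32, 0.2, 0.12, 0),
    ("Fengjing", 0.0, 0.5, 0.0, 1),
    ("Lujiazui", 0.0, 0.5, 0.0, 1),
    ("Anting", 0.222222, 0.0, 0.166667, 2),
    ("Xujiahui", 0.05, 0.1, 0.14, 2),
    ("Tianzifang", 0.13, 0.1, 0.16, 2),
]

by_cluster = defaultdict(list)
for name, long_stay, medium_stay, short_stay, cluster in rows:
    by_cluster[cluster].append((long_stay, medium_stay, short_stay))

# mean of each score per cluster, in (long, medium, short) order
profiles = {
    cluster: tuple(sum(col) / len(col) for col in zip(*scores))
    for cluster, scores in by_cluster.items()
}

# cluster with the highest mean long-stay score
best_cluster = max(profiles, key=lambda c: profiles[c][0])
print(profiles, best_cluster)
```

On the full dataframe, a comparable summary could be obtained with `kl_merged.groupby("Cluster Labels").mean(numeric_only=True)`.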


## 5. Analyze the results and make a decision

### From the clustering result above, cluster 2 contains neighborhoods where venues with long, medium, and short parking times occur in roughly equal proportions. These are usually downtown neighborhoods: all kinds of businesses overlap, parking space is usually limited, and public transportation is usually the first choice for people in Shanghai. Cluster 1 neighborhoods are usually residential areas where the most common businesses are restaurants. These small restaurants are close to living areas and most of them do not provide parking spaces, so electric car owners there will prefer to charge their cars in their own parking spaces. Cluster 0 has a high occurrence of shopping malls, hotels, and parks. These neighborhoods are somewhat away from downtown and are not thoroughly covered by public transportation, so people tend to drive to them and usually stay for a rather long time. These neighborhoods are ideal places to build electric vehicle charging stations.
### In conclusion, the neighborhoods in which to build electric vehicle charging stations are Xintiandi and Qibao (cluster 0).