# Code for the project "*The Battle of Neighborhoods*"

## 1 - Required libraries for the project including:
* pandas 
* numpy 
* requests
* geopy
* sklearn 
* matplotlib 
* folium 

In [3]:
import pandas as pd # library for data analysis
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import json # library to handle JSON files
import numpy as np
#!conda install -c conda-forge geopy --yes # uncomment this line if you haven't completed the Foursquare API lab
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

import requests # library to handle requests
from pandas.io.json import json_normalize # transform JSON file into a pandas dataframe

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

!conda install -c conda-forge folium=0.5.0 --yes # comment out this line if folium is already installed
import folium # map rendering library

print('Libraries imported.')

## 2 - Borough and neighborhood data collection
### 2.1 - Data collection
* The data comes from the Wikipedia page listing the boroughs and neighborhoods of Amsterdam: 'https://en.wikipedia.org/wiki/Boroughs_of_Amsterdam#List_of_boroughs'.
* The table containing the relevant information is then scraped from that page.

In [13]:
# scrape and parse the wiki page 'https://en.wikipedia.org/wiki/Boroughs_of_Amsterdam#List_of_boroughs' to get the borough and neighborhoods
url = 'https://en.wikipedia.org/wiki/Boroughs_of_Amsterdam#List_of_boroughs'
df = pd.read_html(url)
df = df[1]

# clean the scraped table
df.drop('Location (in green)', axis = 1, inplace = True)
df.rename(columns = {'Area':'areaKm2','Population density' : 'densityPerkm2'}, inplace = True)
df['areaKm2'] =  df['areaKm2'].apply(lambda x: x.split(' ')[0])
df['densityPerkm2'] = df['densityPerkm2'].apply(lambda x: x.split('/')[0])
df['densityPerkm2'] = df['densityPerkm2'].apply(lambda x: ''.join(x.split(',')))
df['densityPerkm2'] = pd.to_numeric(df['densityPerkm2'])
df['areaKm2'] = pd.to_numeric(df['areaKm2'])
df.sort_values(by = ['densityPerkm2'], ascending = False, inplace = True)
df = df.reset_index(drop = True)
print(df)

                   Borough  areaKm2  Population  densityPerkm2  \
0                     West     9.89      143842          15252   
1         Centrum (Centre)     8.04       86422          13748   
2             Zuid (South)    17.41      144432           9349   
3              Oost (East)    30.56      135767           7635   
4     Nieuw-West(New West)    32.38      151677           4478   
5      Zuidoost(Southeast)    22.08       87854           4391   
6            Noord (North)    49.01       94766           2269   
7  Westpoort(West Gateway)    10.00         192             10   

                                      Neighbourhoods  
0  Frederik Hendrikbuurt, Houthaven, Spaarndammer...  
1  Binnenstad, Grachtengordel, Haarlemmerbuurt, J...  
2  Apollobuurt, Buitenveldert, Hoofddorppleinbuur...  
3  IJburg, Indische Buurt, Eastern Docklands, Oud...  
4  Geuzenveld, Nieuw Sloten, Oostoever, Osdorp, O...  
5    Bijlmermeer, Venserpolder, Gaasperdam, Driemond  
6  Banne Buiksloot, 
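
The cleaning step above strips the unit and the thousands separator before converting the columns to numbers. Here is a minimal stand-alone sketch of that parsing, assuming raw cell values shaped like `'9.89 km2'` and `'15,252/km2'` (the exact raw strings on the wiki page are an assumption, and `parse_area` and `parse_density` are hypothetical helper names that do not appear in the notebook):

```python
def parse_area(raw):
    """Return the leading number of a string such as '9.89 km2' as a float."""
    return float(raw.split(' ')[0])


def parse_density(raw):
    """Return the number before '/' in a string such as '15,252/km2' as an int."""
    return int(raw.split('/')[0].replace(',', ''))


print(parse_area('9.89 km2'))        # 9.89
print(parse_density('15,252/km2'))   # 15252
```

Keeping the parsing in small functions makes it easy to check a few values by hand before applying them to a whole column with `apply`.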

In [14]:
# split the comma-separated neighborhoods so that each one gets its own row
data_neighborhood = []
data_borough = []
for idx in range(df.shape[0]):
    neighborhoods = df.iloc[idx]['Neighbourhoods']
    neighborhoods = neighborhoods.split(',')
    neighborhoods_list = [i.strip() for i in neighborhoods]
    borough_tmp = len(neighborhoods_list)*[df.iloc[idx]['Borough']]
    data_neighborhood.extend(neighborhoods_list)
    data_borough.extend(borough_tmp)
df_data = pd.DataFrame([data_borough,data_neighborhood])
df_data = df_data.transpose()
df_data.columns = ['Borough', 'Neighborhoods']

print(df_data)

                    Borough                        Neighborhoods
0                      West                Frederik Hendrikbuurt
1                      West                            Houthaven
2                      West                    Spaarndammerbuurt
3                      West                    Staatsliedenbuurt
4                      West                       Zeeheldenbuurt
5                      West                           Westerpark
6                      West                          Kinkerbuurt
7                      West                        Overtoombuurt
8                      West                          De Baarsjes
9                      West                        Bos en Lommer
10                     West                        Kolenkitbuurt
11                     West                             Landlust
12                     West                           Sloterdijk
13         Centrum (Centre)                           Binnenstad
14         Centrum (Centr
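
The loop above builds one row per neighborhood by hand. As a possible alternative, the same reshaping can be sketched with `str.split` and `DataFrame.explode` (available in pandas 0.25 and later). The toy frame below is a stand-in for the scraped table, not the real data:

```python
import pandas as pd

# Toy stand-in for the scraped table (names reused from the outputs above)
boroughs = pd.DataFrame({
    'Borough': ['West', 'Zuidoost(Southeast)'],
    'Neighbourhoods': ['Houthaven, Westerpark', 'Bijlmermeer, Venserpolder'],
})

long_form = (
    boroughs
    .assign(Neighborhoods=boroughs['Neighbourhoods'].str.split(','))
    .explode('Neighborhoods')            # one row per list element
    .drop(columns='Neighbourhoods')
    .reset_index(drop=True)
)
# remove the space left over after each comma
long_form['Neighborhoods'] = long_form['Neighborhoods'].str.strip()
print(long_form)
```

The result has the same two columns, `Borough` and `Neighborhoods`, as `df_data`.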

In [8]:
# retrieve the coordinates of each neighborhood
geolocator = Nominatim(user_agent = 'Netherland_explorer') # create the geocoder once, outside the loop
coor_ll = []
for idx in range(df_data.shape[0]):
    try:
        coor = geolocator.geocode(df_data.loc[idx]['Neighborhoods'])
        coor_ll.append([coor.latitude, coor.longitude])
    except Exception:
        # geocode() returns None when nothing is found, so coor.latitude fails; store NaN instead
        coor_ll.append([np.nan,np.nan])
print(coor_ll)

[[52.376955699999996, 4.87408475121028], [52.39337645, 4.881680240481273], [52.389662599999994, 4.87936892609182], [52.3802865, 4.870950694196747], [52.389329849999996, 4.888242227776295], [52.387236349999995, 4.871777328438663], [52.3691672, 4.866649434878931], [nan, nan], [52.3689257, 4.8563825], [52.3785206, 4.8487385], [52.3796239, 4.8414043], [52.379851, 4.858608466297305], [52.3871325, 4.8465234], [50.8492705, 5.6887558], [52.370836999999995, 4.885478190638034], [52.382441299999996, 4.887193084850383], [nan, nan], [52.3754157, 4.8810958], [52.3677527, 4.919543395257523], [52.371250450000005, 4.905507717552577], [52.37659135, 4.907560405017876], [52.371869000000004, 4.922875349226905], [52.366405, 4.913728577285194], [52.155884, 4.4876151], [52.3706669, 4.905258231575102], [nan, nan], [52.3607227, 4.887778], [52.348072599999995, 4.875559011765657], [52.3286468, 4.8735234], [nan, nan], [52.355760950000004, 4.876834631189791], [52.3542396, 4.896946171061886], [nan, nan], [52.34404, 
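
Some of the coordinates printed above, for example `[50.8492705, 5.6887558]` and `[52.155884, 4.4876151]`, lie well outside Amsterdam, which suggests the geocoder matched a place with the same name somewhere else. A simple bounding-box check can flag such results together with the failed lookups. This is only a sketch: the box limits are rough values assumed for illustration, and `is_plausible` is a hypothetical helper.

```python
import math

# Rough bounding box around Amsterdam; these limits are an assumption for illustration
LAT_MIN, LAT_MAX = 52.25, 52.45
LON_MIN, LON_MAX = 4.70, 5.10


def is_plausible(coord):
    """True if coord = [lat, lon] is a pair of numbers inside the bounding box."""
    lat, lon = coord
    if math.isnan(lat) or math.isnan(lon):
        return False
    return LAT_MIN <= lat <= LAT_MAX and LON_MIN <= lon <= LON_MAX


# Three entries copied from the printed list above, plus a failed lookup
sample = [
    [52.376955699999996, 4.87408475121028],
    [50.8492705, 5.6887558],
    [52.155884, 4.4876151],
    [float('nan'), float('nan')],
]
suspect = [i for i, c in enumerate(sample) if not is_plausible(c)]
print(suspect)  # [1, 2, 3]
```

The flagged indices are the rows that need a manual lookup.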

In [15]:
# append the coordinates above to the neighborhood data frame
coor_ll = np.array(coor_ll)
df_coor = pd.DataFrame(coor_ll)
df_ams = pd.concat([df_data, df_coor], axis = 1)
print(df_ams)

                    Borough                        Neighborhoods          0  \
0                      West                Frederik Hendrikbuurt  52.376956   
1                      West                            Houthaven  52.393376   
2                      West                    Spaarndammerbuurt  52.389663   
3                      West                    Staatsliedenbuurt  52.380286   
4                      West                       Zeeheldenbuurt  52.389330   
5                      West                           Westerpark  52.387236   
6                      West                          Kinkerbuurt  52.369167   
7                      West                        Overtoombuurt        NaN   
8                      West                          De Baarsjes  52.368926   
9                      West                        Bos en Lommer  52.378521   
10                     West                        Kolenkitbuurt  52.379624   
11                     West                         

### 2.2 - Data cleaning
* geopy could not find the latitude and longitude of 6 neighborhoods, so I looked these up manually.
* Six missing coordinates are easy to fill by hand; the final data is shown below.
* I added the 6 missing coordinates and saved the file as 'Amsterdam.csv'.
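
A minimal sketch of how manually looked-up coordinates could be written back into the data frame. The neighborhood names and all numbers below are placeholders, not the real values, and `manual` is a hypothetical dictionary:

```python
import numpy as np
import pandas as pd

# Toy frame with one failed lookup; names and values are placeholders, not real data
df_demo = pd.DataFrame({
    'Neighborhood': ['A', 'B', 'C'],
    'Latitude': [52.37, np.nan, 52.39],
    'Longitude': [4.87, np.nan, 4.88],
})

# Manually looked-up coordinates keyed by neighborhood name (hypothetical values)
manual = {'B': (52.36, 4.86)}

for name, (lat, lon) in manual.items():
    mask = df_demo['Neighborhood'] == name
    df_demo.loc[mask, ['Latitude', 'Longitude']] = [lat, lon]

print(df_demo.isna().sum().sum())  # 0
```

When saving the result with `to_csv`, passing `index=False` avoids the extra `Unnamed: 0` column that otherwise appears when the file is read back.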

In [22]:
import types
import pandas as pd
from botocore.client import Config
import ibm_boto3

def __iter__(self): return 0

# @hidden_cell
# The following code accesses a file in your IBM Cloud Object Storage. It includes your credentials.
# You might want to remove those credentials before you share the notebook.
client_0a2bb30e1b3942749bfb3b1e3beacf65 = ibm_boto3.client(service_name='s3',
    ibm_api_key_id='YOUR_IBM_API_KEY',  # credential removed; supply your own key
    ibm_auth_endpoint="https://iam.ng.bluemix.net/oidc/token",
    config=Config(signature_version='oauth'),
    endpoint_url='https://s3-api.us-geo.objectstorage.service.networklayer.com')

body = client_0a2bb30e1b3942749bfb3b1e3beacf65.get_object(Bucket='myfirstproject-donotdelete-pr-bbk9kqij8opqwt',Key='Amsterdam.csv')['Body']
# add missing __iter__ method, so pandas accepts body as file-like object
if not hasattr(body, "__iter__"): body.__iter__ = types.MethodType( __iter__, body )

df = pd.read_csv(body)
df.drop(['Unnamed: 0'], axis = 1, inplace = True)
df.rename(columns = {'0' : 'Latitude', '1' : 'Longitude', 'Neighborhoods' : 'Neighborhood' }, inplace = True)
print(df)


                    Borough                         Neighborhood   Latitude  \
0                      West                Frederik Hendrikbuurt  52.376956   
1                      West                            Houthaven  52.393376   
2                      West                    Spaarndammerbuurt  52.389663   
3                      West                    Staatsliedenbuurt  52.380286   
4                      West                       Zeeheldenbuurt  52.389330   
5                      West                           Westerpark  52.387236   
6                      West                          Kinkerbuurt  52.369167   
7                      West                        Overtoombuurt  52.363518   
8                      West                          De Baarsjes  52.366647   
9                      West                        Bos en Lommer  52.378521   
10                     West                        Kolenkitbuurt  52.379624   
11                     West                         

## 3. Visualizing all the neighborhoods on the map

In [23]:
# plot a map of Amsterdam with the neighborhoods as markers
# latitude and longitude used to center the map
latitude = 52.3545362
longitude = 4.7638781

# create map of Amsterdam using latitude and longitude values
map_amsterdam = folium.Map(location=[latitude, longitude], zoom_start=10)

# add markers to map
for lat, lng, borough, neighborhood in zip(df['Latitude'], df['Longitude'], df['Borough'], df['Neighborhood']):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_amsterdam)  
map_amsterdam

## 4. Some pre-defined functions for extracting venues
* Foursquare credentials for making Foursquare API calls
* A function that handles the JSON response to extract venue categories
* Data frames for each borough
* Parameters for exploring a radius of about 600 meters around each neighborhood
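
As an illustration of the API call, the request URL can be assembled with `urllib.parse.urlencode` so that every parameter is escaped properly. This is a sketch only: `build_explore_url` is a hypothetical helper, and the endpoint and parameter names are the ones used in this notebook's `getNearbyVenues` function.

```python
from urllib.parse import urlencode

EXPLORE_ENDPOINT = 'https://api.foursquare.com/v2/venues/explore'


def build_explore_url(client_id, client_secret, version, lat, lng, radius, limit):
    """Return the explore URL with all query parameters URL-encoded."""
    params = {
        'client_id': client_id,
        'client_secret': client_secret,
        'v': version,
        'll': '{},{}'.format(lat, lng),
        'radius': radius,
        'limit': limit,
    }
    return EXPLORE_ENDPOINT + '?' + urlencode(params)


# 'ID' and 'SECRET' are dummy credentials for the example
url = build_explore_url('ID', 'SECRET', '20180605', 52.373128, 4.88808, 600, 150)
print(url)
```

Passing the radius explicitly, as done here, makes it visible which search radius a given request uses.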

In [24]:
# Define Foursquare API credentials
CLIENT_ID = 'YOUR_FOURSQUARE_CLIENT_ID' # Foursquare ID (redacted; supply your own)
CLIENT_SECRET = 'YOUR_FOURSQUARE_CLIENT_SECRET' # Foursquare Secret (redacted; supply your own)
VERSION = '20180605' # Foursquare API version

print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)

Your credentials:
CLIENT_ID: YOUR_FOURSQUARE_CLIENT_ID
CLIENT_SECRET:YOUR_FOURSQUARE_CLIENT_SECRET


In [25]:
# function to return nearby venues for any latitude and longitude
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return nearby_venues


# function to sort the venues in descending order of frequency
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]
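
`getNearbyVenues` indexes straight into the response with `v['venue']['categories'][0]['name']`, which raises an `IndexError` if a venue has no category. A more defensive extraction could look like the sketch below. The payload is hand-written to mimic the nesting the function relies on (the second venue is invented for the example), and `extract_venues` is a hypothetical helper:

```python
def extract_venues(payload):
    """Flatten an explore-style payload into (name, lat, lng, category) tuples."""
    items = payload['response']['groups'][0]['items']
    rows = []
    for item in items:
        venue = item['venue']
        categories = venue.get('categories') or []
        # Fall back to None instead of raising IndexError on an empty category list
        category = categories[0]['name'] if categories else None
        rows.append((venue['name'],
                     venue['location']['lat'],
                     venue['location']['lng'],
                     category))
    return rows


# Hand-written payload with the same nesting that getNearbyVenues indexes into
mock = {'response': {'groups': [{'items': [
    {'venue': {'name': 'De Bierkoning',
               'location': {'lat': 52.372404, 'lng': 4.889795},
               'categories': [{'name': 'Beer Store'}]}},
    {'venue': {'name': 'Uncategorized place',
               'location': {'lat': 52.0, 'lng': 4.0},
               'categories': []}},
]}]}}

print(extract_venues(mock))
```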

In [26]:
# create data for each Borough
center_data = df[df['Borough'] == 'Centrum (Centre)'].reset_index(drop=True)
west_data = df[df['Borough'] == 'West'].reset_index(drop=True)
south_data = df[df['Borough'] == 'Zuid (South)'].reset_index(drop=True)
east_data = df[df['Borough'] == 'Oost (East)'].reset_index(drop=True)
newwest_data = df[df['Borough'] == 'Nieuw-West(New West)'].reset_index(drop=True)
southeast_data = df[df['Borough'] == 'Zuidoost(Southeast)'].reset_index(drop=True)
North_data = df[df['Borough'] == 'Noord (North)'].reset_index(drop=True)
westgateway_data = df[df['Borough'] == 'Westpoort(West Gateway)'].reset_index(drop=True)
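
The eight filters above can also be written as a single `groupby`, which avoids retyping each borough name. A small sketch on a toy frame (`by_borough` is a hypothetical name):

```python
import pandas as pd

# Small stand-in for the neighborhood table (rows reused from the outputs above)
df_demo = pd.DataFrame({
    'Borough': ['West', 'West', 'Centrum (Centre)'],
    'Neighborhood': ['Houthaven', 'Westerpark', 'Binnenstad'],
})

# One sub-frame per borough, keyed by the borough name
by_borough = {
    name: group.reset_index(drop=True)
    for name, group in df_demo.groupby('Borough')
}

print(sorted(by_borough))
print(by_borough['West'])
```

With the real table, `by_borough['Centrum (Centre)']` would play the role of `center_data`.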

In [27]:
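# set the venue limit and the search radius (in meters)
# note: getNearbyVenues falls back to its default radius of 500 unless radius=radius is passed in the call
LIMIT = 150
radius = 600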

## 5. Get venues and segmentation in the center
* The center of Amsterdam (Centrum) is one of the most densely populated boroughs, second only to West in the table above, so we try to extract all the venues in this area.

In [28]:
# Center neighborhood data
center_venues = getNearbyVenues(names=center_data['Neighborhood'],
                                   latitudes=center_data['Latitude'],
                                   longitudes=center_data['Longitude']
                                  )

Binnenstad
Grachtengordel
Haarlemmerbuurt
Jodenbuurt
Jordaan
Kadijken
Lastage
Oosterdokseiland
Oostelijke Eilanden
Plantage
Rapenburg
Uilenburg
Westelijke Eilanden
Weteringschans


In [29]:
# check venues data frame and compute number of unique features 
print('* size of the resulting dataframe', center_venues.shape)
print(center_venues.head())
print('* how many venues were returned for each neighborhood', center_venues.groupby('Neighborhood').count())
print('* There are {} unique categories.'.format(len(center_venues['Venue Category'].unique())))

* size of the resulting dataframe (1109, 7)
  Neighborhood  Neighborhood Latitude  Neighborhood Longitude          Venue  \
0   Binnenstad              52.373128                 4.88808  De Bierkoning   
1   Binnenstad              52.373128                 4.88808     The Hoxton   
2   Binnenstad              52.373128                 4.88808    The Duchess   
3   Binnenstad              52.373128                 4.88808    W Amsterdam   
4   Binnenstad              52.373128                 4.88808    Hummingbird   

   Venue Latitude  Venue Longitude     Venue Category  
0       52.372404         4.889795         Beer Store  
1       52.371863         4.887487              Hotel  
2       52.372712         4.889253  French Restaurant  
3       52.372787         4.890006              Hotel  
4       52.371641         4.889550        Coffee Shop  
* how many venues were returned for each neighborhood                      Neighborhood Latitude  Neighborhood Longitude  Venue  \
Neighbor

In [30]:
# one-hot encode the venue categories
center_onehot = pd.get_dummies(center_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
center_onehot['Neighborhood'] = center_venues['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [center_onehot.columns[-1]] + list(center_onehot.columns[:-1])
center_onehot = center_onehot[fixed_columns]

#print(center_onehot.head())
print(center_onehot.shape)

# group rows by neighborhood and by taking the mean of the frequency of occurrence of each category
center_grouped = center_onehot.groupby('Neighborhood').mean().reset_index()
center_grouped

(1109, 183)


Unnamed: 0,Neighborhood,Zoo Exhibit,Afghan Restaurant,African Restaurant,Aquarium,Arcade,Art Gallery,Art Museum,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,BBQ Joint,Bagel Shop,Bakery,Bar,Bed & Breakfast,Beer Bar,Beer Garden,Beer Store,Bike Rental / Bike Share,Bistro,Boat or Ferry,Bookstore,Botanical Garden,Boutique,Bowling Alley,Breakfast Spot,Brewery,Bridge,Buffet,Burger Joint,Burrito Place,Bus Stop,Butcher,Café,Camera Store,Canal,Candy Store,Cheese Shop,Chinese Restaurant,Chocolate Shop,Church,Climbing Gym,Clothing Store,Cocktail Bar,Coffee Shop,Comedy Club,Comfort Food Restaurant,Comic Shop,Concert Hall,Cosmetics Shop,Coworking Space,Creperie,Cruise,Cupcake Shop,Deli / Bodega,Design Studio,Dessert Shop,Diner,Dive Bar,Dutch Restaurant,Electronics Store,Exhibit,Fish & Chips Shop,Fish Market,Flea Market,Food & Drink Shop,French Restaurant,Furniture / Home Store,Garden,Gastropub,Gay Bar,General Entertainment,German Restaurant,Gift Shop,Golf Course,Gourmet Shop,Greek Restaurant,Grocery Store,Gym,Gym / Fitness Center,Harbor / Marina,Historic Site,History Museum,Hostel,Hotel,Hotel Bar,Ice Cream Shop,Indian Restaurant,Indie Movie Theater,Indonesian Restaurant,Indoor Play Area,Irish Pub,Italian Restaurant,Japanese Restaurant,Jazz Club,Jewelry Store,Juice Bar,Latin American Restaurant,Lebanese Restaurant,Liquor Store,Lounge,Malay Restaurant,Marijuana Dispensary,Market,Massage Studio,Mediterranean Restaurant,Mexican Restaurant,Middle Eastern Restaurant,Miscellaneous Shop,Modern European Restaurant,Molecular Gastronomy Restaurant,Movie Theater,Multiplex,Museum,Music Venue,Office,Opera House,Optical Shop,Organic Grocery,Outdoor Sculpture,Palace,Park,Pastry Shop,Performing Arts Venue,Pet Store,Pier,Pizza Place,Planetarium,Plaza,Pool,Pool Hall,Pop-Up Shop,Pub,Public Art,Ramen Restaurant,Record Shop,Recording Studio,Rental Car Location,Restaurant,Salad Place,Salon / Barbershop,Sandwich Place,Scenic Lookout,Science Museum,Seafood Restaurant,Shoe Store,Shopping Mall,Smoke Shop,Smoothie Shop,Snack Place,Soup Place,South American Restaurant,Souvenir Shop,Spa,Spanish Restaurant,Stables,Steakhouse,Supermarket,Sushi Restaurant,Swiss Restaurant,Taco Place,Tapas Restaurant,Tea Room,Thai Restaurant,Theater,Theme Restaurant,Thrift / Vintage Store,Tibetan Restaurant,Toy / Game Store,Tram Station,Turkish Restaurant,Udon Restaurant,Vacation Rental,Vegetarian / Vegan Restaurant,Vietnamese Restaurant,Water Park,Whisky Bar,Windmill,Wine Bar,Women's Store,Yoga Studio,Zoo
0,Binnenstad,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.02,0.06,0.0,0.02,0.0,0.01,0.0,0.0,0.0,0.03,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.07,0.0,0.01,0.0,0.01,0.01,0.01,0.0,0.0,0.02,0.02,0.02,0.0,0.0,0.0,0.0,0.01,0.0,0.01,0.0,0.0,0.02,0.0,0.03,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.05,0.02,0.0,0.01,0.01,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.01,0.0,0.1,0.02,0.0,0.01,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.01,0.0,0.03,0.0,0.01,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.01,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.01,0.0,0.0,0.01,0.0,0.0,0.02,0.0,0.0,0.02,0.0,0.0,0.01,0.0,0.01,0.01,0.0,0.01,0.0,0.0,0.01,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.01,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,Grachtengordel,0.0,0.0,0.0,0.0,0.0,0.01,0.01,0.0,0.0,0.0,0.0,0.0,0.02,0.06,0.0,0.01,0.0,0.01,0.0,0.0,0.0,0.04,0.0,0.01,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.03,0.0,0.02,0.0,0.04,0.0,0.03,0.0,0.0,0.02,0.02,0.03,0.0,0.0,0.0,0.0,0.02,0.0,0.02,0.0,0.0,0.02,0.0,0.02,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.04,0.01,0.0,0.01,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.01,0.01,0.0,0.08,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.01,0.0,0.01,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.02,0.0,0.0,0.0,0.02,0.0,0.01,0.0,0.0,0.0,0.05,0.0,0.01,0.03,0.0,0.0,0.02,0.01,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.01,0.0,0.01,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,Haarlemmerbuurt,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.011905,0.011905,0.119048,0.0,0.011905,0.0,0.0,0.0,0.011905,0.0,0.0,0.0,0.0,0.0,0.011905,0.0,0.0,0.0,0.0,0.011905,0.0,0.0,0.059524,0.0,0.011905,0.0,0.011905,0.011905,0.0,0.0,0.0,0.0,0.0,0.011905,0.0,0.02381,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.035714,0.0,0.011905,0.011905,0.0,0.011905,0.0,0.0,0.0,0.011905,0.011905,0.0,0.011905,0.0,0.0,0.011905,0.0,0.0,0.0,0.0,0.0,0.0,0.011905,0.0,0.0,0.011905,0.0,0.0,0.0,0.0,0.011905,0.0,0.0,0.011905,0.011905,0.011905,0.0,0.0,0.059524,0.0,0.0,0.011905,0.0,0.0,0.0,0.0,0.0,0.0,0.047619,0.011905,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.011905,0.0,0.0,0.011905,0.02381,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02381,0.0,0.0,0.0,0.011905,0.0,0.011905,0.0,0.0,0.0,0.035714,0.011905,0.0,0.02381,0.0,0.0,0.011905,0.0,0.0,0.0,0.0,0.011905,0.0,0.0,0.0,0.0,0.011905,0.0,0.02381,0.02381,0.0,0.0,0.0,0.02381,0.0,0.011905,0.0,0.0,0.0,0.0,0.0,0.011905,0.0,0.0,0.0,0.011905,0.0,0.0,0.0,0.0,0.011905,0.0,0.035714,0.0
3,Jodenbuurt,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.01,0.01,0.0,0.0,0.02,0.01,0.07,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.02,0.01,0.0,0.01,0.0,0.0,0.0,0.03,0.0,0.0,0.01,0.01,0.01,0.0,0.0,0.0,0.0,0.01,0.04,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.01,0.01,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.02,0.01,0.0,0.0,0.01,0.0,0.0,0.01,0.01,0.01,0.0,0.0,0.0,0.02,0.0,0.08,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.04,0.0,0.0,0.01,0.01,0.0,0.0,0.0,0.0,0.01,0.1,0.0,0.0,0.01,0.01,0.0,0.0,0.0,0.01,0.01,0.01,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.01,0.01,0.0,0.02,0.0,0.0,0.01,0.0,0.0,0.01,0.0,0.0,0.01,0.01,0.0,0.0,0.0,0.0,0.02,0.0,0.02,0.01,0.0,0.0,0.01,0.01,0.03,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,Jordaan,0.0,0.01,0.0,0.0,0.0,0.0,0.01,0.01,0.01,0.0,0.0,0.0,0.01,0.13,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.06,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.01,0.05,0.01,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.01,0.01,0.0,0.01,0.0,0.01,0.0,0.0,0.0,0.01,0.0,0.0,0.01,0.01,0.03,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.01,0.01,0.0,0.01,0.0,0.01,0.05,0.01,0.01,0.02,0.0,0.02,0.0,0.0,0.02,0.0,0.0,0.0,0.01,0.01,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.02,0.01,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.01,0.0,0.0,0.0,0.02,0.0,0.01,0.01,0.0,0.0,0.0,0.0,0.01,0.02,0.0,0.0,0.02,0.02,0.01,0.01,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.01,0.0,0.0,0.0,0.01,0.0,0.0,0.02,0.0,0.0,0.01,0.0,0.01,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.01,0.02,0.0
5,Kadijken,0.294118,0.0,0.0,0.019608,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.019608,0.019608,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.019608,0.0,0.019608,0.0,0.0,0.0,0.0,0.019608,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.019608,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.019608,0.0,0.0,0.019608,0.0,0.0,0.019608,0.0,0.019608,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.019608,0.0,0.0,0.019608,0.0,0.019608,0.0,0.019608,0.0,0.0,0.0,0.0,0.0,0.039216,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.019608,0.019608,0.0,0.0,0.0,0.0,0.0,0.019608,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.019608,0.0,0.0,0.0,0.0,0.0,0.019608,0.039216,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.058824,0.0,0.0,0.0,0.0,0.019608,0.019608,0.0,0.0,0.0,0.0,0.019608,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.039216,0.0,0.0,0.019608,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.019608,0.0,0.0,0.019608,0.019608
6,Lastage,0.0,0.0,0.012658,0.0,0.012658,0.012658,0.0,0.0,0.0,0.012658,0.0,0.025316,0.012658,0.101266,0.012658,0.0,0.0,0.0,0.012658,0.012658,0.012658,0.0,0.012658,0.0,0.0,0.025316,0.012658,0.0,0.0,0.0,0.0,0.0,0.012658,0.025316,0.0,0.0,0.0,0.0,0.037975,0.0,0.0,0.0,0.0,0.025316,0.025316,0.0,0.0,0.012658,0.0,0.0,0.0,0.012658,0.0,0.0,0.012658,0.0,0.012658,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.012658,0.0,0.0,0.0,0.0,0.012658,0.0,0.0,0.025316,0.012658,0.0,0.012658,0.0,0.025316,0.037975,0.088608,0.0,0.0,0.0,0.0,0.012658,0.0,0.0,0.037975,0.012658,0.0,0.012658,0.0,0.0,0.0,0.0,0.0,0.012658,0.037975,0.0,0.0,0.012658,0.012658,0.0,0.0,0.0,0.0,0.0,0.0,0.012658,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.012658,0.0,0.0,0.0,0.0,0.025316,0.0,0.012658,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.012658,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.012658,0.012658,0.0,0.0,0.0,0.0,0.0,0.012658,0.012658,0.012658,0.0,0.0,0.0,0.012658,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
7,Oostelijke Eilanden,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.041667,0.041667,0.0,0.0,0.0,0.0,0.083333,0.0,0.083333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.041667,0.0,0.0,0.0,0.0,0.0,0.041667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.083333,0.0,0.0,0.041667,0.0,0.041667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.041667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.041667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.041667,0.0,0.0,0.0,0.0,0.0,0.0,0.083333,0.0,0.0,0.0,0.0,0.0,0.0,0.041667,0.0,0.0,0.0,0.041667,0.0,0.0,0.0,0.0,0.0,0.041667,0.0,0.0,0.0,0.0,0.0,0.041667,0.0,0.0,0.0,0.0,0.041667,0.0,0.0,0.0,0.041667,0.0,0.0,0.0,0.041667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
8,Oosterdokseiland,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.012048,0.0,0.0,0.0,0.048193,0.012048,0.012048,0.0,0.0,0.0,0.0,0.024096,0.0,0.0,0.0,0.012048,0.0,0.0,0.0,0.012048,0.0,0.0,0.0,0.0,0.012048,0.0,0.0,0.0,0.0,0.012048,0.0,0.012048,0.012048,0.0,0.0,0.024096,0.0,0.0,0.0,0.012048,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.012048,0.0,0.0,0.0,0.0,0.0,0.024096,0.0,0.0,0.012048,0.024096,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.012048,0.012048,0.0,0.0,0.036145,0.120482,0.048193,0.0,0.012048,0.0,0.0,0.0,0.0,0.036145,0.024096,0.012048,0.0,0.0,0.0,0.0,0.0,0.024096,0.012048,0.0,0.0,0.0,0.0,0.0,0.012048,0.0,0.012048,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.012048,0.0,0.012048,0.0,0.0,0.0,0.0,0.0,0.012048,0.036145,0.0,0.012048,0.0,0.0,0.012048,0.084337,0.0,0.0,0.012048,0.012048,0.012048,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.024096,0.012048,0.0,0.0,0.0,0.0,0.0,0.024096,0.0,0.0,0.0,0.0,0.0,0.012048,0.0,0.0,0.012048,0.0,0.012048,0.012048,0.012048,0.0,0.0,0.0,0.0,0.0
9,Plantage,0.090909,0.0,0.011364,0.0,0.011364,0.011364,0.011364,0.0,0.0,0.0,0.0,0.011364,0.011364,0.079545,0.0,0.011364,0.0,0.0,0.0,0.011364,0.0,0.0,0.011364,0.0,0.0,0.045455,0.0,0.0,0.0,0.011364,0.0,0.0,0.0,0.079545,0.0,0.011364,0.0,0.0,0.011364,0.0,0.0,0.0,0.0,0.011364,0.022727,0.0,0.0,0.0,0.0,0.0,0.011364,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.011364,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.011364,0.0,0.0,0.022727,0.011364,0.079545,0.0,0.011364,0.0,0.011364,0.011364,0.011364,0.0,0.022727,0.011364,0.0,0.0,0.0,0.0,0.0,0.011364,0.0,0.0,0.0,0.0,0.0,0.011364,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.022727,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.011364,0.0,0.0,0.0,0.0,0.056818,0.011364,0.011364,0.0,0.011364,0.0,0.0,0.0,0.0,0.0,0.011364,0.0,0.045455,0.0,0.0,0.011364,0.0,0.011364,0.0,0.0,0.0,0.011364,0.0,0.011364,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.011364,0.0,0.0,0.0,0.0,0.011364,0.0,0.011364,0.0,0.0,0.0,0.0,0.011364,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.011364
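
To make the numbers in the table above easier to read, here is the same one-hot and group-by-mean step on a tiny hand-made venue list. The rows are invented for the example; only the neighborhood and category names are reused from the notebook.

```python
import pandas as pd

# Toy venue list (invented rows, not Foursquare results)
venues = pd.DataFrame({
    'Neighborhood': ['Binnenstad', 'Binnenstad', 'Binnenstad', 'Binnenstad',
                     'Jordaan', 'Jordaan'],
    'Venue Category': ['Hotel', 'Hotel', 'Bar', 'Park', 'Bar', 'Bar'],
})

# One indicator column per category (dtype=float keeps the means numeric)
onehot = pd.get_dummies(venues['Venue Category'], dtype=float)
onehot.insert(0, 'Neighborhood', venues['Neighborhood'])

# The mean of a 0/1 column is the share of venues in that category
grouped = onehot.groupby('Neighborhood').mean().reset_index()
print(grouped)
```

Each value is the fraction of a neighborhood's venues that fall in that category, so every row sums to 1. In this toy data half of the Binnenstad venues are hotels, which gives 0.5 in the `Hotel` column.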


In [33]:
# create the new dataframe and display the top 10 venues for each neighborhood
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = center_grouped['Neighborhood']

for ind in np.arange(center_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(center_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Binnenstad,Hotel,Café,Bar,French Restaurant,Dessert Shop,Bookstore,Marijuana Dispensary,Thrift / Vintage Store,Steakhouse,Italian Restaurant
1,Grachtengordel,Hotel,Bar,Restaurant,Cheese Shop,Bookstore,French Restaurant,Coffee Shop,Chocolate Shop,Sandwich Place,Café
2,Haarlemmerbuurt,Bar,Café,Italian Restaurant,Marijuana Dispensary,Deli / Bodega,Restaurant,Yoga Studio,Organic Grocery,Tapas Restaurant,Sandwich Place
3,Jodenbuurt,Marijuana Dispensary,Hotel,Bar,Coffee Shop,Italian Restaurant,Theater,Café,Brewery,Bagel Shop,Steakhouse
4,Jordaan,Bar,Café,Coffee Shop,Hotel,Furniture / Home Store,Pizza Place,Thai Restaurant,Museum,Record Shop,Restaurant
5,Kadijken,Zoo Exhibit,Restaurant,Plaza,Supermarket,Italian Restaurant,Hotel,Science Museum,Planetarium,Park,Museum
6,Lastage,Bar,Hotel,Chinese Restaurant,Hostel,Italian Restaurant,Marijuana Dispensary,Cocktail Bar,Bagel Shop,Café,Coffee Shop
7,Oostelijke Eilanden,Park,Gym / Fitness Center,Café,Bus Stop,Breakfast Spot,Mediterranean Restaurant,Pub,Hotel,Restaurant,Seafood Restaurant
8,Oosterdokseiland,Hotel,Restaurant,Bar,Hotel Bar,Italian Restaurant,Hostel,Pub,Boat or Ferry,Steakhouse,Coffee Shop
9,Plantage,Zoo Exhibit,Bar,Hotel,Café,Pizza Place,Breakfast Spot,Restaurant,Museum,History Museum,Italian Restaurant
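
The column names above ('1st', '2nd', '3rd', '4th', ...) are built with a `try`/`except` around a three-item suffix list. An explicit rule is another option and also handles numbers past 10. This is a sketch; `ordinal` is a hypothetical helper that is not part of the notebook:

```python
def ordinal(n):
    """Return n with its English ordinal suffix, e.g. 1 -> '1st', 11 -> '11th'."""
    if 10 <= n % 100 <= 20:
        suffix = 'th'          # 11th, 12th and 13th are irregular
    else:
        suffix = {1: 'st', 2: 'nd', 3: 'rd'}.get(n % 10, 'th')
    return '{}{}'.format(n, suffix)


columns = ['Neighborhood'] + [
    '{} Most Common Venue'.format(ordinal(i)) for i in range(1, 11)
]
print(columns[1], columns[4], columns[10])
# 1st Most Common Venue 4th Most Common Venue 10th Most Common Venue
```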


## 6. Segmenting the center neighborhoods to find patterns
* K-means clustering is used to segment the neighborhoods.
* The number of clusters is set to 3 because of the small number of observations.
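
To show what the clustering step does, here is a minimal k-means sketch on synthetic data. The six rows are made up to form three obvious groups and are not the notebook's frequency table; `n_init=10` is set explicitly as an assumption so the run does not depend on the library default.

```python
import numpy as np
from sklearn.cluster import KMeans

# Synthetic frequency-like rows (not the notebook's data): three obvious groups
X = np.array([
    [0.9, 0.1, 0.0], [0.8, 0.2, 0.0],
    [0.1, 0.9, 0.0], [0.2, 0.8, 0.0],
    [0.0, 0.1, 0.9], [0.0, 0.2, 0.8],
])

kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
labels = kmeans.labels_

# Cluster ids are arbitrary, so compare which rows share a label, not the ids themselves
print(labels[0] == labels[1], labels[2] == labels[3], labels[4] == labels[5])
print(len(set(labels)))
```

Rows with similar category frequencies end up with the same label, which is the behavior the segmentation relies on.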

In [36]:
# segmentation
# set number of clusters
kclusters = 3

center_grouped_clustering = center_grouped.drop('Neighborhood', axis=1)

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(center_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_

array([1, 1, 1, 1, 1, 2, 1, 0, 1, 1, 1, 1, 1, 1], dtype=int32)
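
A quick count of the labels printed above shows how uneven the clusters are:

```python
from collections import Counter

# Labels copied from the kmeans.labels_ output above
labels = [1, 1, 1, 1, 1, 2, 1, 0, 1, 1, 1, 1, 1, 1]

sizes = Counter(labels)
print(sorted(sizes.items()))  # [(0, 1), (1, 12), (2, 1)]
```

Twelve of the 14 neighborhoods fall into cluster 1, and clusters 0 and 2 contain a single neighborhood each. If the labels follow the row order of `center_grouped` shown earlier, the two singletons would be Oostelijke Eilanden (cluster 0) and Kadijken (cluster 2).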

In [37]:
# add clustering labels
neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

center_merged = center_data

# merge center_data with neighborhoods_venues_sorted to add the cluster label and top venues for each neighborhood
center_merged = center_merged.join(neighborhoods_venues_sorted.set_index('Neighborhood'), on='Neighborhood')

center_merged.head() # check the last columns!

Unnamed: 0,Borough,Neighborhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Centrum (Centre),Binnenstad,52.373128,4.88808,1,Hotel,Café,Bar,French Restaurant,Dessert Shop,Bookstore,Marijuana Dispensary,Thrift / Vintage Store,Steakhouse,Italian Restaurant
1,Centrum (Centre),Grachtengordel,52.370837,4.885478,1,Hotel,Bar,Restaurant,Cheese Shop,Bookstore,French Restaurant,Coffee Shop,Chocolate Shop,Sandwich Place,Café
2,Centrum (Centre),Haarlemmerbuurt,52.382441,4.887193,1,Bar,Café,Italian Restaurant,Marijuana Dispensary,Deli / Bodega,Restaurant,Yoga Studio,Organic Grocery,Tapas Restaurant,Sandwich Place
3,Centrum (Centre),Jodenbuurt,52.36887,4.900223,1,Marijuana Dispensary,Hotel,Bar,Coffee Shop,Italian Restaurant,Theater,Café,Brewery,Bagel Shop,Steakhouse
4,Centrum (Centre),Jordaan,52.375416,4.881096,1,Bar,Café,Coffee Shop,Hotel,Furniture / Home Store,Pizza Place,Thai Restaurant,Museum,Record Shop,Restaurant
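
Because the join is keyed on `Neighborhood`, any neighborhood that has no row in the venue table ends up with a missing cluster label, which would later break the colour lookup on the map. The sketch below shows one way to check for that. The two small frames are hypothetical miniatures of `center_data` and `neighborhoods_venues_sorted`, and the third neighborhood is invented for illustration.

```python
import pandas as pd

# Hypothetical miniature of center_data; the last row is an invented
# neighborhood with made-up coordinates and no venue data.
center_data = pd.DataFrame({
    'Neighborhood': ['Binnenstad', 'Jordaan', 'No Venues (hypothetical)'],
    'Latitude': [52.373128, 52.375416, 52.370000],
    'Longitude': [4.88808, 4.881096, 4.900000],
})

# Hypothetical miniature of neighborhoods_venues_sorted
venues_sorted = pd.DataFrame({
    'Cluster Labels': [1, 1],
    'Neighborhood': ['Binnenstad', 'Jordaan'],
    '1st Most Common Venue': ['Hotel', 'Bar'],
})

merged = center_data.join(venues_sorted.set_index('Neighborhood'), on='Neighborhood')

# Rows that found no match carry NaN labels (and the column becomes float)
missing = merged[merged['Cluster Labels'].isna()]
print('neighborhoods without a cluster label:', missing['Neighborhood'].tolist())

# Drop unmatched rows and restore integer labels before plotting
merged = merged.dropna(subset=['Cluster Labels']).astype({'Cluster Labels': int})
print(merged[['Neighborhood', 'Cluster Labels']])
```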


In [38]:
# latitude and longitude of the center of Amsterdam
latitude = 52.3733713
longitude = 4.8689007

# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(center_merged['Latitude'], center_merged['Longitude'], center_merged['Neighborhood'], center_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster],
        fill=True,
        fill_color=rainbow[cluster],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters
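
The colour scheme in the cell above boils down to one evenly spaced colour per cluster. A more compact sketch of the same idea, assuming the labels are the integers 0 to `kclusters - 1`, builds the palette directly and keys it by label so that label 0 does not depend on negative-index wraparound. The names `palette` and `color_of` are illustrative, not part of the notebook.

```python
import numpy as np
import matplotlib.cm as cm
import matplotlib.colors as colors

kclusters = 3

# One evenly spaced colour from the rainbow colormap per cluster, as hex strings
palette = [colors.rgb2hex(c) for c in cm.rainbow(np.linspace(0, 1, kclusters))]

# Map each cluster label to its own colour
color_of = {label: palette[label] for label in range(kclusters)}
print(color_of)
```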

## 7. Examine the results after clustering and draw some conclusions about the center areas

In [39]:
# examine each classified group
center_merged.loc[center_merged['Cluster Labels'] == 0, center_merged.columns[[1] + list(range(5, center_merged.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
8,Oostelijke Eilanden,Park,Gym / Fitness Center,Café,Bus Stop,Breakfast Spot,Mediterranean Restaurant,Pub,Hotel,Restaurant,Seafood Restaurant


In [40]:
# examine each classified group
center_merged.loc[center_merged['Cluster Labels'] == 1, center_merged.columns[[1] + list(range(5, center_merged.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Binnenstad,Hotel,Café,Bar,French Restaurant,Dessert Shop,Bookstore,Marijuana Dispensary,Thrift / Vintage Store,Steakhouse,Italian Restaurant
1,Grachtengordel,Hotel,Bar,Restaurant,Cheese Shop,Bookstore,French Restaurant,Coffee Shop,Chocolate Shop,Sandwich Place,Café
2,Haarlemmerbuurt,Bar,Café,Italian Restaurant,Marijuana Dispensary,Deli / Bodega,Restaurant,Yoga Studio,Organic Grocery,Tapas Restaurant,Sandwich Place
3,Jodenbuurt,Marijuana Dispensary,Hotel,Bar,Coffee Shop,Italian Restaurant,Theater,Café,Brewery,Bagel Shop,Steakhouse
4,Jordaan,Bar,Café,Coffee Shop,Hotel,Furniture / Home Store,Pizza Place,Thai Restaurant,Museum,Record Shop,Restaurant
6,Lastage,Bar,Hotel,Chinese Restaurant,Hostel,Italian Restaurant,Marijuana Dispensary,Cocktail Bar,Bagel Shop,Café,Coffee Shop
7,Oosterdokseiland,Hotel,Restaurant,Bar,Hotel Bar,Italian Restaurant,Hostel,Pub,Boat or Ferry,Steakhouse,Coffee Shop
9,Plantage,Zoo Exhibit,Bar,Hotel,Café,Pizza Place,Breakfast Spot,Restaurant,Museum,History Museum,Italian Restaurant
10,Rapenburg,Bar,Hotel,Hostel,Coffee Shop,Italian Restaurant,Pizza Place,Breakfast Spot,History Museum,Boat or Ferry,Chinese Restaurant
11,Uilenburg,Bar,Hotel,Italian Restaurant,Marijuana Dispensary,Hostel,Bagel Shop,Café,Chinese Restaurant,Coffee Shop,Pizza Place


In [41]:
# examine each classified group
center_merged.loc[center_merged['Cluster Labels'] == 2, center_merged.columns[[1] + list(range(5, center_merged.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
5,Kadijken,Zoo Exhibit,Restaurant,Plaza,Supermarket,Italian Restaurant,Hotel,Science Museum,Planetarium,Park,Museum


As the frames above show, cluster 0 contains a single neighborhood, 'Oostelijke Eilanden', whose most common venues are outdoor and everyday amenities such as parks, gyms, and a bus stop. Cluster 1, by contrast, is dominated by food, drink, and lodging venues: bars, cafés, and hotels.
Cluster 2 contains only 'Kadijken', where attractions such as zoo exhibits, a science museum, and a planetarium rank among the most common venues.
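
The reading of cluster 1 can be backed by a simple tally. The sketch below counts the 1st most common venue for the ten cluster 1 neighborhoods, with the values copied by hand from the cluster 1 table above rather than read from `center_merged`.

```python
from collections import Counter

# 1st most common venue per cluster 1 neighborhood, copied from the table above
first_venue = {
    'Binnenstad': 'Hotel',
    'Grachtengordel': 'Hotel',
    'Haarlemmerbuurt': 'Bar',
    'Jodenbuurt': 'Marijuana Dispensary',
    'Jordaan': 'Bar',
    'Lastage': 'Bar',
    'Oosterdokseiland': 'Hotel',
    'Plantage': 'Zoo Exhibit',
    'Rapenburg': 'Bar',
    'Uilenburg': 'Bar',
}

# Count how often each venue category ranks first within the cluster
counts = Counter(first_venue.values())
for venue, n in counts.most_common():
    print(f'{venue}: {n}')
```

Bar ranks first in five of the ten neighborhoods and Hotel in three, which matches the food, drink, and lodging character of this cluster.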