# Capstone Project - The Battle of the Neighborhoods
### -- Osaka vs Manhattan --

## Purpose
This document describes my final peer-reviewed assignment for the IBM Data Science Professional Certificate program (Coursera Capstone). The project compares the neighborhoods of Osaka and Manhattan to determine how similar or dissimilar they are.


## Table of contents
* [Introduction: Business Problem](#introduction)
* [Data Acquisition](#data)
* [Methodology](#methodology)
* [Analysis](#analysis)
* [Results and Discussion](#results)
* [Conclusion](#conclusion)

## Introduction: Business Problem <a name="introduction"></a>

**Osaka** is one of the most famous cities in Japan. My friend is moving from **Manhattan**, NY to **Osaka** for a career change.

The goal is to help her find a place to live in Osaka where the environment is similar to her previous home in Manhattan. In this project, I will cluster the neighborhoods of both cities by their venues and compare them to understand their similarities and differences. I will also provide data-driven recommendations about where to eat or visit in Osaka.

## Data Acquisition <a name="data"></a>

#### Osaka neighborhood names
* Osaka district names are retrieved from [Wikipedia](https://en.wikipedia.org/wiki/Osaka).

#### Locations of Osaka, Manhattan, and their neighborhoods
* Coordinates of the Osaka and Manhattan neighborhoods are retrieved through a geocoding API (geopy's `Nominatim` geocoder in the code below).

#### Osaka top venue recommendations
(Foursquare categories: https://developer.foursquare.com/docs/resources/categories)
* The neighborhoods of Osaka and Manhattan are explored using the Foursquare API. The following information is retrieved:
  
  - Venue ID
  - Venue name
  - Coordinates: Latitude and Longitude
  - Category names 
  - Venue ratings (due to Foursquare access limitations, only 2 types of ratings were retrieved in this project)


## Methodology <a name="methodology"></a>

1. The Wikipedia page will be scraped using **Beautiful Soup**.

2. **Pandas** will be used to clean the scraped data and build a dataframe.

3. The coordinates of each place will be obtained through a **geocoding API** (geopy's Nominatim in the code below) and the locations marked on a map.

4. The **k-means clustering** algorithm will be used to analyze the similarity or dissimilarity between the two cities.

## Analysis <a name="analysis"></a>

Let's install and import all the dependencies that we will need.

In [1]:
!conda install -c conda-forge folium=0.5.0 --yes # comment/uncomment if not yet installed.
!conda install -c conda-forge geopy --yes        # comment/uncomment if not yet installed

import numpy as np # library to handle data in a vectorized manner
import pandas as pd # library for data analysis

# show all columns and rows when displaying dataframes
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import json # library to handle JSON files
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values
from pandas.io.json import json_normalize # transform JSON file into a pandas dataframe
# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors
# import k-means from clustering stage
from sklearn.cluster import KMeans
import folium # map rendering library

import requests # library to handle requests
import lxml.html as lh
import bs4 as bs
import urllib.request

print('Libraries imported.')

In [2]:
# @hidden_cell
CLIENT_ID = '<your Foursquare client ID>' # your Foursquare ID
CLIENT_SECRET = '<your Foursquare client secret>' # your Foursquare Secret
VERSION = '20180605' # Foursquare API version
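Hard-coded API keys are visible to anyone who reads the notebook. A minimal sketch of reading them from the environment instead; the variable names `FOURSQUARE_CLIENT_ID` and `FOURSQUARE_CLIENT_SECRET` and the helper name are assumptions for illustration, not names required by Foursquare:

```python
import os

def load_foursquare_credentials(env=os.environ):
    """Return (client_id, client_secret) read from a mapping such as os.environ.

    The variable names are assumptions made for this sketch.
    """
    try:
        return env["FOURSQUARE_CLIENT_ID"], env["FOURSQUARE_CLIENT_SECRET"]
    except KeyError as missing:
        # Fail early with a readable message instead of sending an empty key.
        raise RuntimeError("Missing environment variable: {}".format(missing)) from None
```

With this approach the notebook cell would only call `load_foursquare_credentials()` and never print the secret.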


### Exploring Osaka
Osaka has 24 neighborhoods. To segment and explore these neighborhoods, we need a dataset that contains them. I scraped the following Wikipedia page to get this information.

In [3]:
from bs4 import BeautifulSoup
source = requests.get('https://en.wikipedia.org/wiki/Osaka#Neighborhoods').text
soup = BeautifulSoup(source, 'html5lib')
table = soup.find_all('table')[4] 
df = pd.read_html(str(table))
#df[0]

In [8]:
# to clean up the list.
col_rename = {0:'index',1:'Neighborhood', 2:'Neighborhood (Kanji)'}
df_ward = df[0].drop([0,1,2,3]).rename(columns = col_rename).set_index('index')
Osaka_ku = df_ward.replace({'Kita-ku (administrative center)':'Kita-ku'})
Osaka_ku.head()

index,Neighborhood,Neighborhood (Kanji)
1,Abeno-ku,阿倍野区
2,Asahi-ku,旭区
3,Chūō-ku,中央区
4,Fukushima-ku,福島区
5,Higashinari-ku,東成区


In [9]:
# retrieve the coordinates of each neighborhood
Osaka_ku['Latitude'] = float(0)
Osaka_ku['Longitude'] = float(0)

geolocator = Nominatim(user_agent="nj_explorer")
for index, Place_name in Osaka_ku['Neighborhood'].items():
    location = geolocator.geocode(Place_name)
    # only fill in coordinates that are still unset
    if Osaka_ku.loc[index,'Latitude'] == 0:
        Osaka_ku.loc[index,'Latitude'] = location.latitude
    if Osaka_ku.loc[index,'Longitude'] == 0:
        Osaka_ku.loc[index,'Longitude'] = location.longitude

Osaka_ku.head()

index,Neighborhood,Neighborhood (Kanji),Latitude,Longitude
1,Abeno-ku,阿倍野区,34.627501,135.514095
2,Asahi-ku,旭区,35.476018,139.53192
3,Chūō-ku,中央区,35.666255,139.775565
4,Fukushima-ku,福島区,34.692104,135.474812
5,Higashinari-ku,東成区,34.672912,135.550567
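The Asahi-ku and Chūō-ku rows above sit near longitude 139.5 to 139.8, far from Osaka itself (about 34.69, 135.50), which suggests the bare ward names were matched to same-named wards in other cities. A minimal sketch of one way to guard against this, assuming a hypothetical helper that appends the city to each query and a rough bounding box around Osaka (the box limits are illustrative assumptions, not official boundaries):

```python
# Rough bounding box around Osaka city; the limits are illustrative assumptions.
OSAKA_BOUNDS = {"lat": (34.55, 34.80), "lon": (135.35, 135.65)}

def qualified_query(ward_name, city="Osaka", country="Japan"):
    """Build a geocoder query that names the city, e.g. 'Asahi-ku, Osaka, Japan'."""
    return "{}, {}, {}".format(ward_name, city, country)

def inside_bounds(lat, lon, bounds=OSAKA_BOUNDS):
    """Return True when the coordinate falls inside the bounding box."""
    lat_min, lat_max = bounds["lat"]
    lon_min, lon_max = bounds["lon"]
    return lat_min <= lat <= lat_max and lon_min <= lon <= lon_max

# Coordinates taken from the table above.
print(qualified_query("Asahi-ku"))
print(inside_bounds(34.627501, 135.514095))  # Abeno-ku
print(inside_bounds(35.476018, 139.53192))   # Asahi-ku as geocoded above
```

The string returned by `qualified_query` could be passed to `geolocator.geocode` in place of the bare ward name, and any result that fails `inside_bounds` flagged for manual review.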


In [85]:
#generate a basemap
geo = Nominatim(user_agent='nj_explorer')
address = 'Osaka'
location = geo.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Osaka are {}, {}.'.format(latitude, longitude))

# create map of Osaka using latitude and longitude values
map_osaka = folium.Map(location=[latitude, longitude],tiles="OpenStreetMap", zoom_start=10)

# add markers to map
for lat, lng, Place_name in zip(
    Osaka_ku['Latitude'],
    Osaka_ku['Longitude'],
    Osaka_ku['Neighborhood']):
    # label each marker with its own neighborhood name
    label = folium.Popup(Place_name, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=8,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.9).add_to(map_osaka)
map_osaka

The geographical coordinates of Osaka are 34.6937569, 135.5014539.


### Explore Neighborhoods in Osaka
Create a dataframe of the venue categories in each neighborhood.

In [12]:
LIMIT = 100 # limit of number of venues returned by Foursquare API
radius = 500 # define radius
 # create URL
url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
    CLIENT_ID, 
    CLIENT_SECRET, 
    VERSION, 
    latitude, 
    longitude, 
    radius, 
    LIMIT)
#url # display URL
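The query string can also be assembled from a dictionary, which keeps each parameter visible and handles URL escaping. A minimal sketch using the standard library's `urllib.parse.urlencode`; the function name `build_explore_url` and the placeholder credentials are assumptions for illustration:

```python
from urllib.parse import urlencode

EXPLORE_ENDPOINT = 'https://api.foursquare.com/v2/venues/explore'

def build_explore_url(client_id, client_secret, version, lat, lng, radius=500, limit=100):
    """Return an explore URL carrying the same parameters as the format string above."""
    params = {
        'client_id': client_id,
        'client_secret': client_secret,
        'v': version,
        'll': '{},{}'.format(lat, lng),
        'radius': radius,
        'limit': limit,
    }
    # safe=',' keeps the comma in the 'll' value unescaped
    return EXPLORE_ENDPOINT + '?' + urlencode(params, safe=',')

print(build_explore_url('ID', 'SECRET', '20180605', 34.6937569, 135.5014539))
```

Because the parameters live in one dictionary, adding or removing one (for example a category filter) does not require recounting `{}` placeholders.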

In [13]:
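def get_category_type(row):
    # the category list sits under a different key depending on the response shape
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']

    # return the first category name, or None when the venue has no category
    if len(categories_list) == 0:
        return None
    return categories_list[0]['name']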

In [14]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        #print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)
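The list comprehension in `getNearbyVenues` indexes `categories[0]` directly, so a venue returned without any category would raise an `IndexError`. A hedged sketch of a more defensive extraction step, assuming the response layout used above (`response`, then `groups`, then `items`) and a hypothetical helper name `extract_venue_rows`:

```python
def extract_venue_rows(payload, name, lat, lng):
    """Turn one explore response (already parsed from JSON) into row tuples."""
    groups = payload.get('response', {}).get('groups', [])
    items = groups[0].get('items', []) if groups else []
    rows = []
    for item in items:
        venue = item['venue']
        categories = venue.get('categories', [])
        # fall back to None when the venue carries no category
        category = categories[0]['name'] if categories else None
        rows.append((name, lat, lng, venue['name'],
                     venue['location']['lat'], venue['location']['lng'], category))
    return rows

# Hand-written sample payload for illustration, not real API output.
sample = {'response': {'groups': [{'items': [
    {'venue': {'name': 'A', 'location': {'lat': 1.0, 'lng': 2.0},
               'categories': [{'name': 'Coffee Shop'}]}},
    {'venue': {'name': 'B', 'location': {'lat': 3.0, 'lng': 4.0},
               'categories': []}},
]}]}}
rows = extract_venue_rows(sample, 'Abeno-ku', 34.627501, 135.514095)
print(rows)
```

An empty or error response yields an empty list here, so one failed request does not abort the loop over all neighborhoods.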

In [15]:
Osaka_venues = getNearbyVenues(names=Osaka_ku['Neighborhood'],
                                   latitudes=Osaka_ku['Latitude'],
                                   longitudes=Osaka_ku['Longitude'])                                 
print(Osaka_venues.shape)
print('There are {} unique categories in Osaka.'.format(len(Osaka_venues['Venue Category'].unique())))
Osaka_venues.head()

(550, 7)
There are 107 unique categories in Osaka.


Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Abeno-ku,34.627501,135.514095,Usagi to Boku (うさぎとぼく),34.62986,135.514996,Coffee Shop
1,Abeno-ku,34.627501,135.514095,ライフ セントラルスクエア 北畠店,34.626273,135.509131,Supermarket
2,Abeno-ku,34.627501,135.514095,FamilyMart (ファミリーマート 阿倍野昭和町店),34.627117,135.516143,Convenience Store
3,Abeno-ku,34.627501,135.514095,7-Eleven (セブンイレブン 大阪阪南町3丁目店),34.628868,135.514661,Convenience Store
4,Abeno-ku,34.627501,135.514095,モスバーガー 昭和町店,34.630623,135.516397,Fast Food Restaurant


### Analyze Each Neighborhood in Osaka

In [16]:
# one hot encoding
Osaka_onehot = pd.get_dummies(Osaka_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
Osaka_onehot['Neighborhood'] = Osaka_venues['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [Osaka_onehot.columns[-1]] + list(Osaka_onehot.columns[:-1])
Osaka_onehot = Osaka_onehot[fixed_columns]

Osaka_grouped = Osaka_onehot.groupby('Neighborhood').mean().reset_index()
Osaka_grouped.head()

Unnamed: 0,Neighborhood,Arcade,Art Gallery,Arts & Crafts Store,Athletics & Sports,BBQ Joint,Bakery,Bar,Baseball Field,Bed & Breakfast,Bookstore,Buddhist Temple,Burger Joint,Bus Station,Bus Stop,Café,Chinese Restaurant,Clothing Store,Coffee Shop,Concert Hall,Convenience Store,Cupcake Shop,Deli / Bodega,Dessert Shop,Diner,Discount Store,Donburi Restaurant,Donut Shop,Drugstore,Dumpling Restaurant,Electronics Store,Farmers Market,Fast Food Restaurant,Food Court,Food Service,French Restaurant,Fried Chicken Joint,Furniture / Home Store,German Restaurant,Golf Course,Gourmet Shop,Grocery Store,Gym,Halal Restaurant,Harbor / Marina,Hardware Store,History Museum,Hot Spring,Hotel,Ice Cream Shop,Indian Restaurant,Intersection,Italian Restaurant,Japanese Curry Restaurant,Japanese Restaurant,Kaiseki Restaurant,Karaoke Bar,Karaoke Box,Kebab Restaurant,Korean Restaurant,Kosher Restaurant,Liquor Store,Market,Metro Station,Mexican Restaurant,Mobile Phone Shop,Music Store,Music Venue,Noodle House,Okonomiyaki Restaurant,Park,Pet Store,Pharmacy,Pizza Place,Platform,Playground,Pub,Ramen Restaurant,Restaurant,Rock Club,Sake Bar,Sandwich Place,Seafood Restaurant,Shopping Mall,Snack Place,Soba Restaurant,Soccer Field,Spa,Spanish Restaurant,Sporting Goods Shop,Stadium,Steakhouse,Supermarket,Sushi Restaurant,Szechuan Restaurant,Takoyaki Place,Tea Room,Tempura Restaurant,Thai Restaurant,Tonkatsu Restaurant,Trail,Train Station,Udon Restaurant,Unagi Restaurant,Wagashi Place,Wine Bar,Wings Joint,Yakitori Restaurant
0,Abeno-ku,0.0,0.071429,0.0,0.0,0.0,0.071429,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.071429,0.0,0.214286,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.071429,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.071429,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.071429,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.142857,0.0,0.0,0.0,0.071429,0.0,0.0,0.0,0.071429,0.071429,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,Asahi-ku,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,Chūō-ku,0.0,0.0,0.0,0.0,0.010638,0.031915,0.0,0.0,0.021277,0.0,0.0,0.010638,0.0,0.0,0.010638,0.0,0.0,0.053191,0.0,0.085106,0.0,0.0,0.0,0.010638,0.0,0.031915,0.0,0.0,0.0,0.0,0.0,0.0,0.010638,0.010638,0.010638,0.0,0.0,0.010638,0.0,0.0,0.010638,0.0,0.0,0.0,0.0,0.0,0.0,0.010638,0.0,0.010638,0.0,0.031915,0.010638,0.138298,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.021277,0.0,0.0,0.0,0.0,0.0,0.010638,0.0,0.031915,0.0,0.0,0.0,0.0,0.0,0.010638,0.021277,0.0,0.0,0.0,0.010638,0.021277,0.0,0.0,0.042553,0.0,0.0,0.010638,0.0,0.0,0.010638,0.0,0.191489,0.0,0.0,0.010638,0.021277,0.0,0.010638,0.010638,0.0,0.0,0.021277,0.0,0.010638,0.0,0.010638
3,Fukushima-ku,0.0,0.0,0.0,0.0,0.021739,0.021739,0.0,0.0,0.021739,0.021739,0.0,0.0,0.0,0.0,0.021739,0.065217,0.0,0.0,0.0,0.152174,0.0,0.021739,0.0,0.021739,0.0,0.065217,0.021739,0.0,0.021739,0.0,0.021739,0.086957,0.0,0.0,0.021739,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.021739,0.0,0.043478,0.0,0.0,0.0,0.0,0.0,0.0,0.021739,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.021739,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.065217,0.021739,0.0,0.043478,0.0,0.0,0.043478,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.043478,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.043478,0.0,0.0,0.0,0.0,0.0,0.021739
4,Higashinari-ku,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.333333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.333333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.333333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
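Taking the mean of the one-hot columns per neighborhood yields the share of each category among that neighborhood's venues. The same arithmetic can be sketched with the standard library on a few hand-made rows (the sample rows are illustrative, not taken from the Foursquare results):

```python
from collections import Counter, defaultdict

# (neighborhood, venue category) pairs; illustrative sample data
venues = [
    ('Ward A', 'Café'), ('Ward A', 'Shopping Mall'), ('Ward A', 'Grocery Store'),
    ('Ward B', 'Park'), ('Ward B', 'Park'), ('Ward B', 'Playground'), ('Ward B', 'Golf Course'),
]

# count how often each category occurs in each neighborhood
counts = defaultdict(Counter)
for hood, category in venues:
    counts[hood][category] += 1

# relative frequency = category count / total venues in the neighborhood
frequencies = {
    hood: {cat: n / sum(c.values()) for cat, n in c.items()}
    for hood, c in counts.items()
}
print(frequencies['Ward B'])
```

This is why a neighborhood with only three venues in three different categories, such as Higashinari-ku above, shows 0.333333 for each of them.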


### Print each neighborhood along with the top 5 most common venues

In [17]:
num_top_venues = 5
for hood in Osaka_grouped['Neighborhood']:
    print("----"+hood+"----")
    temp = Osaka_grouped[Osaka_grouped['Neighborhood'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----Abeno-ku----
               venue  freq
0  Convenience Store  0.21
1      Shopping Mall  0.14
2       Intersection  0.07
3        Coffee Shop  0.07
4         Steakhouse  0.07


----Asahi-ku----
            venue  freq
0      Playground  0.25
1  Baseball Field  0.25
2     Golf Course  0.25
3            Park  0.25
4          Arcade  0.00


----Chūō-ku----
                 venue  freq
0     Sushi Restaurant  0.19
1  Japanese Restaurant  0.14
2    Convenience Store  0.09
3          Coffee Shop  0.05
4      Soba Restaurant  0.04


----Fukushima-ku----
                  venue  freq
0     Convenience Store  0.15
1  Fast Food Restaurant  0.09
2      Ramen Restaurant  0.07
3    Donburi Restaurant  0.07
4    Chinese Restaurant  0.07


----Higashinari-ku----
            venue  freq
0            Café  0.33
1   Shopping Mall  0.33
2   Grocery Store  0.33
3  Sandwich Place  0.00
4       Rock Club  0.00


----Higashisumiyoshi-ku----
                  venue  freq
0     Convenience Store  0.25
1   

### Put the results into a dataframe and display the top 5 most common venues

In [18]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

In [19]:
num_top_venues = 5

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    #print(indicators[ind])
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted_osaka = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted_osaka['Neighborhood'] = Osaka_grouped['Neighborhood']

for ind in np.arange(Osaka_grouped.shape[0]):
    neighborhoods_venues_sorted_osaka.iloc[ind, 1:] = return_most_common_venues(Osaka_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted_osaka

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
0,Abeno-ku,Convenience Store,Shopping Mall,Steakhouse,Liquor Store,Art Gallery
1,Asahi-ku,Golf Course,Playground,Baseball Field,Park,Yakitori Restaurant
2,Chūō-ku,Sushi Restaurant,Japanese Restaurant,Convenience Store,Coffee Shop,Soba Restaurant
3,Fukushima-ku,Convenience Store,Fast Food Restaurant,Donburi Restaurant,Chinese Restaurant,Ramen Restaurant
4,Higashinari-ku,Café,Shopping Mall,Grocery Store,Yakitori Restaurant,Dumpling Restaurant
5,Higashisumiyoshi-ku,Convenience Store,Pharmacy,Japanese Curry Restaurant,Szechuan Restaurant,Donut Shop
6,Higashiyodogawa-ku,Convenience Store,Japanese Restaurant,Chinese Restaurant,Ramen Restaurant,Sake Bar
7,Hirano-ku,Japanese Restaurant,Convenience Store,Liquor Store,Gourmet Shop,Electronics Store
8,Ikuno-ku,Convenience Store,Supermarket,Tempura Restaurant,Italian Restaurant,Diner
9,Jōtō-ku,Convenience Store,Japanese Restaurant,Metro Station,Sake Bar,Fast Food Restaurant


### Cluster Neighborhoods in Osaka

In [20]:
# set number of clusters
kclusters = 5
Osaka_grouped_clustering = Osaka_grouped.drop('Neighborhood', axis=1)
# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(Osaka_grouped_clustering)
# check cluster labels generated for each row in the dataframe
kmeans.labels_

array([0, 0, 0, 0, 3, 4, 4, 4, 4, 4, 4, 2, 0, 4, 0, 0, 4, 4, 0, 4, 4, 0,
       1, 4], dtype=int32)
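The label array above can be tallied to see how evenly the 24 neighborhoods are spread over the five clusters. A short sketch using the labels exactly as printed:

```python
from collections import Counter

# cluster labels copied from the k-means output above
labels = [0, 0, 0, 0, 3, 4, 4, 4, 4, 4, 4, 2, 0, 4, 0, 0, 4, 4, 0, 4, 4, 0, 1, 4]

cluster_sizes = Counter(labels)
for cluster, size in sorted(cluster_sizes.items()):
    print('Cluster {}: {} neighborhoods'.format(cluster, size))
```

Clusters 0 and 4 hold 21 of the 24 neighborhoods, while clusters 1, 2 and 3 contain a single neighborhood each, which is worth keeping in mind when reading the cluster map.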

In [21]:
neighborhoods_venues_sorted_osaka.insert(0, 'Cluster Labels', kmeans.labels_)

In [22]:
neighborhoods_venues_sorted_osaka.head()

Unnamed: 0,Cluster Labels,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
0,0,Abeno-ku,Convenience Store,Shopping Mall,Steakhouse,Liquor Store,Art Gallery
1,0,Asahi-ku,Golf Course,Playground,Baseball Field,Park,Yakitori Restaurant
2,0,Chūō-ku,Sushi Restaurant,Japanese Restaurant,Convenience Store,Coffee Shop,Soba Restaurant
3,0,Fukushima-ku,Convenience Store,Fast Food Restaurant,Donburi Restaurant,Chinese Restaurant,Ramen Restaurant
4,3,Higashinari-ku,Café,Shopping Mall,Grocery Store,Yakitori Restaurant,Dumpling Restaurant


In [25]:
# add clustering labels
# neighborhoods_venues_sorted_osaka.replace(0, 'Cluster Labels', kmeans.labels_, axis = 1, inplace = True)

#Osaka_merged = Osaka_ku
# merge the sorted venues with Osaka_ku to add latitude/longitude for each neighborhood
Osaka_merged = Osaka_ku.join(neighborhoods_venues_sorted_osaka.set_index('Neighborhood'), on='Neighborhood')
# astype returns a new Series, so assign the result back
Osaka_merged['Cluster Labels'] = Osaka_merged['Cluster Labels'].astype(np.int64)
Osaka_merged.head()

index,Neighborhood,Neighborhood (Kanji),Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
1,Abeno-ku,阿倍野区,34.627501,135.514095,0,Convenience Store,Shopping Mall,Steakhouse,Liquor Store,Art Gallery
2,Asahi-ku,旭区,35.476018,139.53192,0,Golf Course,Playground,Baseball Field,Park,Yakitori Restaurant
3,Chūō-ku,中央区,35.666255,139.775565,0,Sushi Restaurant,Japanese Restaurant,Convenience Store,Coffee Shop,Soba Restaurant
4,Fukushima-ku,福島区,34.692104,135.474812,0,Convenience Store,Fast Food Restaurant,Donburi Restaurant,Chinese Restaurant,Ramen Restaurant
5,Higashinari-ku,東成区,34.672912,135.550567,3,Café,Shopping Mall,Grocery Store,Yakitori Restaurant,Dumpling Restaurant


In [35]:
#generate a basemap
geo = Nominatim(user_agent='nj_explorer')
address = 'Osaka'
location = geo.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Osaka are {}, {}.'.format(latitude, longitude))

# create map
map_osaka_cluster = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(Osaka_merged['Latitude'], Osaka_merged['Longitude'], Osaka_merged['Neighborhood'], Osaka_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=10,
        popup=label,
        color='black',
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.4).add_to(map_osaka_cluster)

map_osaka_cluster

The geographical coordinates of Osaka are 34.6937569, 135.5014539.


### Best food restaurants in Osaka
Note: the Foursquare query quota was exceeded during this step, so ratings could not be retrieved for every venue.

In [86]:
FOURSQUARE_SEARCH_URL = 'https://api.foursquare.com/v2/venues/search?'
# SEARCH VENUES BY CATEGORY

# Dataframe : venue_id_recover
# - stores venue IDs so that failed score retrievals can be retried later if the Foursquare limit is exceeded.
venue_id_rcols = ['VenueID']
venue_id_recover = pd.DataFrame(columns=venue_id_rcols)

def getVenuesByCategory(names, latitudes, longitudes, categoryID, radius=500):
    global CLIENT_ID
    global CLIENT_SECRET
    global FOURSQUARE_EXPLORE_URL
    global FOURSQUARE_SEARCH_URL
    global VERSION
    global LIMIT
    venue_columns = ['Town','Town Latitude','Town Longitude','VenueID','VenueName','score','category','catID','latitude','longitude']
    venue_DF = pd.DataFrame(columns=venue_columns)
    print("[#Start getVenuesByCategory]")
    for name, lat, lng in zip(names, latitudes, longitudes):
        #cyclefsk2()
        #print(name,",",end='')
        #print('getVenuesByCategory',categoryID,name) ; # DEBUG: be quiet
        # create the API request URL
        url = '{}client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}&categoryId={}'.format(
            FOURSQUARE_SEARCH_URL,CLIENT_ID,CLIENT_SECRET,VERSION,lat,lng,radius,LIMIT,categoryID)
        # make the GET request
        results = requests.get(url).json()
        # Populate dataframe with the category venue results
        # Extracting JSON  data values
        
        for jsonSub in results['response']['venues']:
            #print(jsonSub)
            # JSON Results may not be in expected format or incomplete data, in that case, skip!
            ven_id = 0
            try:
                # If there are any issue with a restaurant, retry or ignore and continue
                # Get location details
                ven_id   = jsonSub['id']
                ven_cat  = jsonSub['categories'][0]['pluralName']
                ven_CID  = jsonSub['categories'][0]['id']
                ven_name = jsonSub['name']
                ven_lat  = jsonSub['location']['lat']
                ven_lng  = jsonSub['location']['lng']
                venue_DF = venue_DF.append({
                    'Town'      : name,
                    'Town Latitude' : lat,
                    'Town Longitude': lng,
                    'VenueID'   : ven_id,
                    'VenueName' : ven_name,
                    'score'     : float(0),
                    'category'  : ven_cat,
                    'catID'     : ven_CID,
                    'latitude'  : ven_lat,
                    'longitude' : ven_lng}, ignore_index=True)
            except (KeyError, IndexError):
                continue
    # END OF LOOP, return.
    print("\n[#Done getVenuesByCategory]")
    return(venue_DF)
            

In [89]:
categoryID_food= "4bf58dd8d48988d142941735" # food category                              
Osaka_food = getVenuesByCategory(names=Osaka_ku['Neighborhood'],
                                   latitudes=Osaka_ku['Latitude'],
                                   longitudes=Osaka_ku['Longitude'],categoryID = categoryID_food)                                 

print(Osaka_food.shape)
for idx, score in Osaka_food['score'].items():
    venue_id = Osaka_food.loc[idx,'VenueID']
    url = 'https://api.foursquare.com/v2/venues/{}?client_id={}&client_secret={}&v={}'.format(venue_id, CLIENT_ID, CLIENT_SECRET, VERSION)
    result = requests.get(url).json()
    if Osaka_food.loc[idx,'score'] == 0.00 :
        try:
            Osaka_food.loc[idx,'score'] = result['response']['venue']['rating']
        except:
            Osaka_food.loc[idx,'score'] = 'pass'
    
Osaka_food

[#Start getVenuesByCategory]

[#Done getVenuesByCategory]
(772, 10)


Unnamed: 0,Town,Town Latitude,Town Longitude,VenueID,VenueName,score,category,catID,latitude,longitude
0,Abeno-ku,34.627501,135.514095,5a2526c4c8b2fb5c62b99f57,中華厨房,pass,Chinese Restaurants,4bf58dd8d48988d145941735,34.625418,135.512938
1,Abeno-ku,34.627501,135.514095,5bc43e32898bdc002bb3a707,手打ちうどん めん処 しかだ,pass,Udon Restaurants,55a59bace4b013909087cb2a,34.6266,135.51616
2,Abeno-ku,34.627501,135.514095,4f9cc085e4b0ef2f71180d79,夢菜館,pass,Chinese Restaurants,4bf58dd8d48988d145941735,34.628438,135.516495
3,Abeno-ku,34.627501,135.514095,51387e2be4b019e1c4300cd8,地沢臨,pass,Japanese Restaurants,4bf58dd8d48988d111941735,34.621941,135.513947
4,Abeno-ku,34.627501,135.514095,53fedba8498efbf7ffe64ce4,泓源閣,pass,Szechuan Restaurants,52af3b773cf9994f4e043c03,34.622438,135.514291
5,Abeno-ku,34.627501,135.514095,52188d4011d29a1c1e41f3e6,旬菜旬魚 匠,pass,Japanese Restaurants,4bf58dd8d48988d111941735,34.627061,135.51594
6,Abeno-ku,34.627501,135.514095,4bee69e3e8c3c928b2ab9892,口口福,pass,Chinese Restaurants,4bf58dd8d48988d145941735,34.624171,135.515237
7,Abeno-ku,34.627501,135.514095,4cc509d1bc239521a712bd33,松寿司,pass,Sushi Restaurants,4bf58dd8d48988d1d2941735,34.627648,135.510144
8,Abeno-ku,34.627501,135.514095,4cde08d77e2e236a2ce77c1b,今日亭 昭和町店,pass,Noodle Houses,4bf58dd8d48988d1d1941735,34.631967,135.516488
9,Abeno-ku,34.627501,135.514095,4cc1accd5684a35d673dba0d,亭亭,pass,Noodle Houses,4bf58dd8d48988d1d1941735,34.622258,135.515138
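The `venue_id_recover` dataframe is described as a place to keep venue IDs whose score retrieval failed. A hedged sketch of that bookkeeping, assuming a hypothetical `fetch_rating` callable that returns a number, or `None` when no rating comes back (for example after the quota is exhausted):

```python
def collect_ratings(venue_ids, fetch_rating):
    """Return (ratings, failed_ids) for the given venue IDs.

    fetch_rating is a caller-supplied function (hypothetical here) that
    returns a numeric rating, or None when the rating is unavailable.
    """
    ratings = {}
    failed_ids = []
    for venue_id in venue_ids:
        rating = fetch_rating(venue_id)
        if rating is None:
            failed_ids.append(venue_id)   # keep for a later retry
        else:
            ratings[venue_id] = float(rating)
    return ratings, failed_ids

# Stand-in for the API call, using two venue IDs and one rating from this notebook.
known = {'4c2748b95c5ca593b48047fe': 6.2}
ratings, failed = collect_ratings(
    ['5a2526c4c8b2fb5c62b99f57', '4c2748b95c5ca593b48047fe'],
    known.get,
)
print(ratings)
print(failed)
```

Passing the fetch function in as an argument keeps the bookkeeping testable without network access; a later run only needs to call `collect_ratings` again with `failed`.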


In [108]:
a = Osaka_food[Osaka_food['score'] != 'pass']
a.nlargest(3, 'score')

Unnamed: 0,Town,Town Latitude,Town Longitude,VenueID,VenueName,score,category,catID,latitude,longitude
10,Abeno-ku,34.627501,135.514095,4c2748b95c5ca593b48047fe,CoCo Ichibanya (CoCo壱番屋 阿倍野昭和町店),6.2,Japanese Curry Restaurants,55a59bace4b013909087cb30,34.631863,135.516867
25,Abeno-ku,34.627501,135.514095,4bcae59c68f976b0a1206083,なか卯 西田辺店,5.7,Donburi Restaurants,55a59bace4b013909087cb0c,34.621789,135.515244
28,Abeno-ku,34.627501,135.514095,4bf4d2db98ac0f47db7964a8,餃子の王将 西田辺店,6.3,Chinese Restaurants,4bf58dd8d48988d145941735,34.622106,135.515176
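The three rows above are not listed in descending score order. Ranking is easier to reason about when missing ratings are separated from numeric ones before sorting. A sketch in plain Python so that it runs without the API; in pandas, a comparable conversion is `pd.to_numeric(..., errors='coerce')`. The helper name `top_rated` is an assumption:

```python
def top_rated(venues, n=3):
    """Return the n venues with the highest numeric score, best first."""
    rated = [v for v in venues if isinstance(v['score'], (int, float))]
    return sorted(rated, key=lambda v: v['score'], reverse=True)[:n]

# Names and scores copied from the outputs above; 'pass' marks a missing rating.
venues = [
    {'VenueName': '中華厨房', 'score': 'pass'},
    {'VenueName': 'CoCo Ichibanya (CoCo壱番屋 阿倍野昭和町店)', 'score': 6.2},
    {'VenueName': 'なか卯 西田辺店', 'score': 5.7},
    {'VenueName': '餃子の王将 西田辺店', 'score': 6.3},
]
for venue in top_rated(venues):
    print(venue['score'], venue['VenueName'])
```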


In [None]:
Osaka_food.nlargest(3,'score')

In [118]:
categoryID_Art= "4d4b7104d754a06370d81259" # Art category                              
Osaka_Art = getVenuesByCategory(names=Osaka_ku['Neighborhood'],
                                   latitudes=Osaka_ku['Latitude'],
                                   longitudes=Osaka_ku['Longitude'],categoryID = categoryID_Art)                                 

print(Osaka_Art.shape)
for idx, score in Osaka_Art['score'].items():
    venue_id = Osaka_Art.loc[idx,'VenueID']
    url = 'https://api.foursquare.com/v2/venues/{}?client_id={}&client_secret={}&v={}'.format(venue_id, CLIENT_ID, CLIENT_SECRET, VERSION)
    result = requests.get(url).json()
    if Osaka_Art.loc[idx,'score'] == 0.00 :
        try:
            Osaka_Art.loc[idx,'score'] = result['response']['venue']['rating']
        except:
            Osaka_Art.loc[idx,'score'] = 'pass'
    
Osaka_Art.head()

[#Start getVenuesByCategory]

[#Done getVenuesByCategory]
(284, 10)


Unnamed: 0,Town,Town Latitude,Town Longitude,VenueID,VenueName,score,category,catID,latitude,longitude
0,Abeno-ku,34.627501,135.514095,4d0f6e5ae2365481fdc271ea,旧制大阪高等学校跡地,pass,Historic Sites,4deefb944765f83613cdba6e,34.627081,135.510406
1,Abeno-ku,34.627501,135.514095,4bea587cb3352d7f491955d2,阿倍王子神社,pass,Shrines,4eb1d80a4b900d56c88a45ff,34.631045,135.509169
2,Abeno-ku,34.627501,135.514095,4fc069eae4b0a3bda063b9a8,阿倍野長屋,pass,Art Galleries,4bf58dd8d48988d1e2931735,34.625736,135.518135
3,Abeno-ku,34.627501,135.514095,4f98dac9bb3db8353a69ba21,カフェ華,pass,Cafés,4bf58dd8d48988d16d941735,34.629986,135.518847
4,Abeno-ku,34.627501,135.514095,4e71daa4483b926f07d96fb4,寅屋,pass,General Entertainment,4bf58dd8d48988d1f1931735,34.622285,135.51448


In [116]:
Osaka_Art[Osaka_Art['score'] != 'pass']
#a.nlargest(3, 'score')

Unnamed: 0,Town,Town Latitude,Town Longitude,VenueID,VenueName,score,category,catID,latitude,longitude


In [120]:
categoryID_Museum= "4bf58dd8d48988d181941735" # Museum category                              
Osaka_Museum = getVenuesByCategory(names=Osaka_ku['Neighborhood'],
                                   latitudes=Osaka_ku['Latitude'],
                                   longitudes=Osaka_ku['Longitude'],categoryID = categoryID_Museum)                                 

print(Osaka_Museum.shape)
for idx, score in Osaka_Museum['score'].items():
    venue_id = Osaka_Museum.loc[idx,'VenueID']
    url = 'https://api.foursquare.com/v2/venues/{}?client_id={}&client_secret={}&v={}'.format(venue_id, CLIENT_ID, CLIENT_SECRET, VERSION)
    result = requests.get(url).json()
    if Osaka_Museum.loc[idx,'score'] == 0.00 :
        try:
            Osaka_Museum.loc[idx,'score'] = result['response']['venue']['rating']
        except:
            Osaka_Museum.loc[idx,'score'] = 'pass'
    
Osaka_Museum.head()

[#Start getVenuesByCategory]

[#Done getVenuesByCategory]
(13, 10)


Unnamed: 0,Town,Town Latitude,Town Longitude,VenueID,VenueName,score,category,catID,latitude,longitude
0,Chūō-ku,35.666255,139.775565,4bcd60e2cc8cd13a4c84c2cf,タイムドーム明石 中央区立郷土天文館,pass,Museums,4bf58dd8d48988d181941735,35.666997,139.776927
1,Chūō-ku,35.666255,139.775565,4bd3c7749854d13adfacfe4d,Teusler Memorial House (トイスラー記念館),pass,History Museums,4bf58dd8d48988d190941735,35.667458,139.776058
2,Chūō-ku,35.666255,139.775565,4bda93123904a59339cf469e,かちどき 橋の資料館,pass,History Museums,4bf58dd8d48988d190941735,35.663189,139.773474
3,Chūō-ku,35.666255,139.775565,4cc3c6a7be40a35d28c07a4c,ノベルティミュージアム,pass,Museums,4bf58dd8d48988d181941735,35.669838,139.777207
4,Chūō-ku,35.666255,139.775565,4df19e0a52b100c2d7f2db68,築地よりみち館,pass,History Museums,4bf58dd8d48988d190941735,35.666583,139.770612
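
The rating lookup above mixes the HTTP request with the parsing of the response. A minimal sketch of how the parsing step could be isolated and checked without network access follows. `extract_rating` is a hypothetical helper name, and the nested `response` / `venue` / `rating` layout is assumed from the way the cell indexes the JSON, not taken from the Foursquare documentation.

```python
def extract_rating(result, missing='pass'):
    """Return the venue rating from a decoded venue-details response.

    `result` is assumed to be a dict shaped like
    {'response': {'venue': {'rating': 8.1}}}; any missing level
    yields the `missing` placeholder instead of raising.
    """
    venue = result.get('response', {}).get('venue', {})
    return venue.get('rating', missing)


# made-up responses: one that carries a rating and one that does not
with_rating = {'response': {'venue': {'id': 'abc', 'rating': 8.1}}}
without_rating = {'meta': {'code': 429}, 'response': {}}

print(extract_rating(with_rating))
print(extract_rating(without_rating))
```

Keeping the placeholder as a parameter makes it easy to switch from the string `'pass'` to `None` or `float('nan')`, which would keep the `score` column numeric.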


### Exploring Manhattan

In [36]:
!wget -q -O 'newyork_data.json' https://cocl.us/new_york_dataset
print('Data downloaded!')

Data downloaded!


In [37]:
with open('newyork_data.json') as json_data:
    newyork_data = json.load(json_data)
neighborhoods_data = newyork_data['features']

# define the dataframe columns
column_names = ['Borough', 'Neighborhood','Latitude', 'Longitude'] 

# collect one row per neighborhood, then build the dataframe in one step
# (DataFrame.append is no longer available in recent pandas releases)
rows = []
for data in neighborhoods_data:
    borough = data['properties']['borough']
    neighborhood_name = data['properties']['name']

    # GeoJSON stores coordinates as [longitude, latitude]
    neighborhood_latlon = data['geometry']['coordinates']
    neighborhood_lat = neighborhood_latlon[1]
    neighborhood_lon = neighborhood_latlon[0]

    rows.append({'Borough': borough,
                 'Neighborhood': neighborhood_name,
                 'Latitude': neighborhood_lat,
                 'Longitude': neighborhood_lon})

neighborhoods = pd.DataFrame(rows, columns=column_names)

In [38]:
manhattan_data = pd.DataFrame(neighborhoods[neighborhoods['Borough'] == 'Manhattan'].reset_index(drop=True))
#manhattan_data_x = pd.DataFrame(manhattan_data.loc[0:0,])
print (manhattan_data.shape)
manhattan_data.head()

(40, 4)


Unnamed: 0,Borough,Neighborhood,Latitude,Longitude
0,Manhattan,Marble Hill,40.876551,-73.91066
1,Manhattan,Chinatown,40.715618,-73.994279
2,Manhattan,Washington Heights,40.851903,-73.9369
3,Manhattan,Inwood,40.867684,-73.92121
4,Manhattan,Hamilton Heights,40.823604,-73.949688


In [39]:
geo = Nominatim(user_agent='nj_explorer')
address = 'Manhattan'
location = geo.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Manhattan are {}, {}.'.format(latitude, longitude))

# create map of Manhattan using latitude and longitude values
map_Manh = folium.Map(location=[latitude, longitude], zoom_start=10)

# add markers to map
for lat, lng, borough, neighborhood in zip(manhattan_data['Latitude'], manhattan_data['Longitude'], manhattan_data['Borough'], manhattan_data['Neighborhood']):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7).add_to(map_Manh)

map_Manh

The geographical coordinates of Manhattan are 40.7900869, -73.9598295.


### Explore Neighborhoods in Manhattan

In [40]:
M_venues = getNearbyVenues(names=manhattan_data['Neighborhood'],
                                   latitudes=manhattan_data['Latitude'],
                                   longitudes=manhattan_data['Longitude'])                                 
#print(Osaka_venues.shape)
#print('There are {} uniques categories in Osaka.'.format(len(Osaka_venues['Venue Category'].unique())))
print(M_venues.shape)
M_venues.head()

(3328, 7)


Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Marble Hill,40.876551,-73.91066,Arturo's,40.874412,-73.910271,Pizza Place
1,Marble Hill,40.876551,-73.91066,Bikram Yoga,40.876844,-73.906204,Yoga Studio
2,Marble Hill,40.876551,-73.91066,Tibbett Diner,40.880404,-73.908937,Diner
3,Marble Hill,40.876551,-73.91066,Dunkin' Donuts,40.877136,-73.906666,Donut Shop
4,Marble Hill,40.876551,-73.91066,Starbucks,40.877531,-73.905582,Coffee Shop


In [41]:
M_venues.groupby('Neighborhood').count()
print('There are {} unique categories.'.format(len(M_venues['Venue Category'].unique())))

There are 327 unique categories.


### Analyze Each Neighborhood of Manhattan

In [42]:
# one hot encoding
manhattan_onehot = pd.get_dummies(M_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
manhattan_onehot['Neighborhood'] = M_venues['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [manhattan_onehot.columns[-1]] + list(manhattan_onehot.columns[:-1])
manhattan_onehot = manhattan_onehot[fixed_columns]

print(manhattan_onehot.shape)
manhattan_onehot.head()

(3328, 328)


Unnamed: 0,Neighborhood,Accessories Store,Adult Boutique,Afghan Restaurant,African Restaurant,American Restaurant,Antique Shop,Arcade,Arepa Restaurant,Argentinian Restaurant,Art Gallery,Art Museum,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,Auditorium,Australian Restaurant,Austrian Restaurant,Auto Workshop,BBQ Joint,Baby Store,Bagel Shop,Bakery,Bank,Bar,Baseball Field,Basketball Court,Beer Bar,Beer Garden,Beer Store,Big Box Store,Bike Rental / Bike Share,Bike Shop,Bike Trail,Bistro,Board Shop,Boat or Ferry,Bookstore,Boutique,Boxing Gym,Brazilian Restaurant,Breakfast Spot,Bridal Shop,Bridge,Bubble Tea Shop,Building,Burger Joint,Burrito Place,Bus Line,Bus Station,Bus Stop,Business Service,Butcher,Cafeteria,Café,Cambodian Restaurant,Camera Store,Candy Store,Caribbean Restaurant,Caucasian Restaurant,Cheese Shop,Chinese Restaurant,Chocolate Shop,Circus,Climbing Gym,Clothing Store,Club House,Cocktail Bar,Coffee Shop,College Academic Building,College Bookstore,College Cafeteria,College Gym,College Theater,Comedy Club,Community Center,Concert Hall,Convenience Store,Cosmetics Shop,Creperie,Cuban Restaurant,Cultural Center,Cupcake Shop,Cycle Studio,Czech Restaurant,Dance Studio,Daycare,Deli / Bodega,Department Store,Design Studio,Dessert Shop,Dim Sum Restaurant,Diner,Discount Store,Dive Bar,Dog Run,Donut Shop,Drugstore,Dry Cleaner,Dumpling Restaurant,Duty-free Shop,Eastern European Restaurant,Electronics Store,English Restaurant,Ethiopian Restaurant,Event Space,Exhibit,Falafel Restaurant,Farmers Market,Fast Food Restaurant,Filipino Restaurant,Fish Market,Flea Market,Flower Shop,Food & Drink Shop,Food Court,Food Truck,Fountain,French Restaurant,Fried Chicken Joint,Frozen Yogurt Shop,Furniture / Home Store,Gaming Cafe,Garden,Garden Center,Gas Station,Gastropub,Gay Bar,General College & University,General Entertainment,German Restaurant,Gift Shop,Golf Course,Gourmet Shop,Greek Restaurant,Grocery Store,Gym,Gym / Fitness Center,Gym Pool,Gymnastics Gym,Harbor / Marina,Hardware Store,Hawaiian Restaurant,Health & Beauty Service,Health Food Store,Heliport,Herbs & Spices Store,High School,Himalayan Restaurant,Historic Site,History Museum,Hobby Shop,Hookah Bar,Hostel,Hot Dog Joint,Hotel,Hotel Bar,Hotpot Restaurant,Ice Cream Shop,Indian Restaurant,Indie Movie Theater,Indie Theater,Intersection,Irish Pub,Israeli Restaurant,Italian Restaurant,Japanese Curry Restaurant,Japanese Restaurant,Jazz Club,Jewelry Store,Jewish Restaurant,Juice Bar,Karaoke Bar,Kids Store,Korean Restaurant,Kosher Restaurant,Latin American Restaurant,Laundry Service,Leather Goods Store,Lebanese Restaurant,Library,Lingerie Store,Liquor Store,Lounge,Malay Restaurant,Market,Martial Arts Dojo,Massage Studio,Medical Center,Mediterranean Restaurant,Memorial Site,Men's Store,Mexican Restaurant,Middle Eastern Restaurant,Mini Golf,Miscellaneous Shop,Mobile Phone Shop,Modern European Restaurant,Molecular Gastronomy Restaurant,Monument / Landmark,Moroccan Restaurant,Movie Theater,Museum,Music School,Music Store,Music Venue,Nail Salon,New American Restaurant,Newsstand,Nightclub,Non-Profit,Noodle House,North Indian Restaurant,Office,Opera House,Optical Shop,Organic Grocery,Other Nightlife,Outdoor Sculpture,Outdoors & Recreation,Paella Restaurant,Pakistani Restaurant,Paper / Office Supplies Store,Park,Performing Arts Venue,Peruvian Restaurant,Pet Café,Pet Service,Pet Store,Pharmacy,Photography Studio,Piano Bar,Pie Shop,Pilates Studio,Pizza Place,Playground,Plaza,Poke Place,Pool,Portuguese Restaurant,Pub,Public Art,Ramen Restaurant,Record Shop,Rental Car Location,Residential Building (Apartment / Condo),Resort,Rest Area,Restaurant,River,Rock Climbing Spot,Rock Club,Roof Deck,Russian Restaurant,Sake Bar,Salad Place,Salon / Barbershop,Sandwich Place,Scenic Lookout,School,Sculpture Garden,Seafood Restaurant,Shanghai Restaurant,Shipping Store,Shoe Store,Shopping Mall,Skate Park,Smoke Shop,Snack Place,Soba Restaurant,Social Club,Soup Place,South American Restaurant,South Indian Restaurant,Southern / Soul Food Restaurant,Spa,Spanish Restaurant,Speakeasy,Spiritual Center,Sporting Goods Shop,Sports Bar,Sports Club,Steakhouse,Street Art,Strip Club,Supermarket,Supplement Shop,Sushi Restaurant,Swiss Restaurant,Szechuan Restaurant,Taco Place,Tailor Shop,Taiwanese Restaurant,Tapas Restaurant,Tattoo Parlor,Tea Room,Tech Startup,Temple,Tennis Court,Tennis Stadium,Thai Restaurant,Theater,Theme Park Ride / Attraction,Thrift / Vintage Store,Tiki Bar,Tourist Information Center,Toy / Game Store,Track,Trail,Tree,Turkish Restaurant,Udon Restaurant,Used Bookstore,Vegetarian / Vegan Restaurant,Venezuelan Restaurant,Veterinarian,Video Game Store,Video Store,Vietnamese Restaurant,Volleyball Court,Watch Shop,Waterfront,Whisky Bar,Wine Bar,Wine Shop,Wings Joint,Women's Store,Yoga Studio
0,Marble Hill,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,Marble Hill,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1
2,Marble Hill,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3,Marble Hill,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,Marble Hill,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
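
The one-hot step is easier to follow on a few rows. The sketch below uses made-up venues and hypothetical neighborhood names `A` and `B`; it shows the same `pd.get_dummies` call and one way to place the neighborhood label in the first column.

```python
import pandas as pd

# toy venue list: two neighborhoods, three venues (made-up data)
venues = pd.DataFrame({
    'Neighborhood': ['A', 'A', 'B'],
    'Venue Category': ['Cafe', 'Park', 'Cafe'],
})

# one indicator column per category, with no prefix on the column names
onehot = pd.get_dummies(venues[['Venue Category']], prefix='', prefix_sep='')

# put the neighborhood label in front of the indicator columns
onehot.insert(0, 'Neighborhood', venues['Neighborhood'])

print(onehot)
```

One design note: `DataFrame.insert` raises an error if a column with that name already exists, whereas plain assignment would silently overwrite an indicator column if some venue category happened to be called `Neighborhood`.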


In [43]:
# Next, group rows by neighborhood and take the mean of each category column (its frequency of occurrence)
manhattan_grouped = manhattan_onehot.groupby('Neighborhood').mean().reset_index()
print(manhattan_grouped.shape)
manhattan_grouped.head()

(40, 328)


Unnamed: 0,Neighborhood,Accessories Store,Adult Boutique,Afghan Restaurant,African Restaurant,American Restaurant,Antique Shop,Arcade,Arepa Restaurant,Argentinian Restaurant,Art Gallery,Art Museum,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,Auditorium,Australian Restaurant,Austrian Restaurant,Auto Workshop,BBQ Joint,Baby Store,Bagel Shop,Bakery,Bank,Bar,Baseball Field,Basketball Court,Beer Bar,Beer Garden,Beer Store,Big Box Store,Bike Rental / Bike Share,Bike Shop,Bike Trail,Bistro,Board Shop,Boat or Ferry,Bookstore,Boutique,Boxing Gym,Brazilian Restaurant,Breakfast Spot,Bridal Shop,Bridge,Bubble Tea Shop,Building,Burger Joint,Burrito Place,Bus Line,Bus Station,Bus Stop,Business Service,Butcher,Cafeteria,Café,Cambodian Restaurant,Camera Store,Candy Store,Caribbean Restaurant,Caucasian Restaurant,Cheese Shop,Chinese Restaurant,Chocolate Shop,Circus,Climbing Gym,Clothing Store,Club House,Cocktail Bar,Coffee Shop,College Academic Building,College Bookstore,College Cafeteria,College Gym,College Theater,Comedy Club,Community Center,Concert Hall,Convenience Store,Cosmetics Shop,Creperie,Cuban Restaurant,Cultural Center,Cupcake Shop,Cycle Studio,Czech Restaurant,Dance Studio,Daycare,Deli / Bodega,Department Store,Design Studio,Dessert Shop,Dim Sum Restaurant,Diner,Discount Store,Dive Bar,Dog Run,Donut Shop,Drugstore,Dry Cleaner,Dumpling Restaurant,Duty-free Shop,Eastern European Restaurant,Electronics Store,English Restaurant,Ethiopian Restaurant,Event Space,Exhibit,Falafel Restaurant,Farmers Market,Fast Food Restaurant,Filipino Restaurant,Fish Market,Flea Market,Flower Shop,Food & Drink Shop,Food Court,Food Truck,Fountain,French Restaurant,Fried Chicken Joint,Frozen Yogurt Shop,Furniture / Home Store,Gaming Cafe,Garden,Garden Center,Gas Station,Gastropub,Gay Bar,General College & University,General Entertainment,German Restaurant,Gift Shop,Golf Course,Gourmet Shop,Greek Restaurant,Grocery Store,Gym,Gym / Fitness Center,Gym Pool,Gymnastics Gym,Harbor / Marina,Hardware Store,Hawaiian Restaurant,Health & Beauty Service,Health Food Store,Heliport,Herbs & Spices Store,High School,Himalayan Restaurant,Historic Site,History Museum,Hobby Shop,Hookah Bar,Hostel,Hot Dog Joint,Hotel,Hotel Bar,Hotpot Restaurant,Ice Cream Shop,Indian Restaurant,Indie Movie Theater,Indie Theater,Intersection,Irish Pub,Israeli Restaurant,Italian Restaurant,Japanese Curry Restaurant,Japanese Restaurant,Jazz Club,Jewelry Store,Jewish Restaurant,Juice Bar,Karaoke Bar,Kids Store,Korean Restaurant,Kosher Restaurant,Latin American Restaurant,Laundry Service,Leather Goods Store,Lebanese Restaurant,Library,Lingerie Store,Liquor Store,Lounge,Malay Restaurant,Market,Martial Arts Dojo,Massage Studio,Medical Center,Mediterranean Restaurant,Memorial Site,Men's Store,Mexican Restaurant,Middle Eastern Restaurant,Mini Golf,Miscellaneous Shop,Mobile Phone Shop,Modern European Restaurant,Molecular Gastronomy Restaurant,Monument / Landmark,Moroccan Restaurant,Movie Theater,Museum,Music School,Music Store,Music Venue,Nail Salon,New American Restaurant,Newsstand,Nightclub,Non-Profit,Noodle House,North Indian Restaurant,Office,Opera House,Optical Shop,Organic Grocery,Other Nightlife,Outdoor Sculpture,Outdoors & Recreation,Paella Restaurant,Pakistani Restaurant,Paper / Office Supplies Store,Park,Performing Arts Venue,Peruvian Restaurant,Pet Café,Pet Service,Pet Store,Pharmacy,Photography Studio,Piano Bar,Pie Shop,Pilates Studio,Pizza Place,Playground,Plaza,Poke Place,Pool,Portuguese Restaurant,Pub,Public Art,Ramen Restaurant,Record Shop,Rental Car Location,Residential Building (Apartment / Condo),Resort,Rest Area,Restaurant,River,Rock Climbing Spot,Rock Club,Roof Deck,Russian Restaurant,Sake Bar,Salad Place,Salon / Barbershop,Sandwich Place,Scenic Lookout,School,Sculpture Garden,Seafood Restaurant,Shanghai Restaurant,Shipping Store,Shoe Store,Shopping Mall,Skate Park,Smoke Shop,Snack Place,Soba Restaurant,Social Club,Soup Place,South American Restaurant,South Indian Restaurant,Southern / Soul Food Restaurant,Spa,Spanish Restaurant,Speakeasy,Spiritual Center,Sporting Goods Shop,Sports Bar,Sports Club,Steakhouse,Street Art,Strip Club,Supermarket,Supplement Shop,Sushi Restaurant,Swiss Restaurant,Szechuan Restaurant,Taco Place,Tailor Shop,Taiwanese Restaurant,Tapas Restaurant,Tattoo Parlor,Tea Room,Tech Startup,Temple,Tennis Court,Tennis Stadium,Thai Restaurant,Theater,Theme Park Ride / Attraction,Thrift / Vintage Store,Tiki Bar,Tourist Information Center,Toy / Game Store,Track,Trail,Tree,Turkish Restaurant,Udon Restaurant,Used Bookstore,Vegetarian / Vegan Restaurant,Venezuelan Restaurant,Veterinarian,Video Game Store,Video Store,Vietnamese Restaurant,Volleyball Court,Watch Shop,Waterfront,Whisky Bar,Wine Bar,Wine Shop,Wings Joint,Women's Store,Yoga Studio
0,Battery Park City,0.0,0.0,0.0,0.0,0.010101,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.010101,0.010101,0.0,0.0,0.0,0.020202,0.0,0.0,0.010101,0.0,0.0,0.0,0.0,0.0,0.010101,0.0,0.0,0.0,0.0,0.0,0.010101,0.0,0.010101,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.010101,0.020202,0.010101,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.010101,0.0,0.0,0.0,0.010101,0.0,0.0,0.0,0.020202,0.0,0.0,0.070707,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.010101,0.0,0.0,0.0,0.020202,0.0,0.0,0.0,0.0,0.0,0.020202,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.010101,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.020202,0.020202,0.020202,0.0,0.0,0.0,0.0,0.0,0.010101,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.010101,0.0,0.010101,0.040404,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.010101,0.0,0.0,0.0,0.0,0.0,0.050505,0.0,0.0,0.020202,0.0,0.0,0.0,0.0,0.0,0.0,0.030303,0.0,0.0,0.0,0.0,0.0,0.010101,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.010101,0.0,0.0,0.0,0.010101,0.0,0.0,0.0,0.010101,0.020202,0.010101,0.010101,0.0,0.0,0.0,0.0,0.0,0.0,0.010101,0.0,0.010101,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.070707,0.010101,0.0,0.0,0.0,0.010101,0.0,0.0,0.0,0.0,0.0,0.020202,0.020202,0.020202,0.0,0.0,0.0,0.010101,0.010101,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.010101,0.0,0.020202,0.010101,0.0,0.010101,0.010101,0.0,0.0,0.0,0.030303,0.0,0.010101,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.010101,0.0,0.0,0.0,0.0,0.010101,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.010101,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.010101,0.030303,0.0,0.010101,0.0
1,Carnegie Hill,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.01,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.02,0.0,0.04,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.03,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.05,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.01,0.0,0.01,0.05,0.0,0.0,0.0,0.01,0.0,0.0,0.01,0.0,0.0,0.04,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.02,0.0,0.03,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.01,0.0,0.01,0.0,0.02,0.02,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.01,0.02,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.03,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.01,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.06,0.01,0.0,0.0,0.0,0.0,0.02,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.03,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.02,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.01,0.03,0.0,0.01,0.03
2,Central Harlem,0.0,0.0,0.0,0.068182,0.045455,0.0,0.0,0.0,0.0,0.022727,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.022727,0.0,0.022727,0.0,0.0,0.022727,0.0,0.0,0.022727,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.022727,0.022727,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.022727,0.0,0.0,0.0,0.0,0.0,0.0,0.022727,0.022727,0.0,0.0,0.0,0.022727,0.0,0.0,0.045455,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.045455,0.0,0.0,0.0,0.0,0.022727,0.0,0.0,0.0,0.0,0.0,0.0,0.022727,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.022727,0.022727,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.045455,0.022727,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.022727,0.045455,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.022727,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.022727,0.0,0.0,0.022727,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.022727,0.0,0.0,0.0,0.0,0.022727,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.022727,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.022727,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.022727,0.0,0.0,0.0,0.0,0.0,0.0,0.022727,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.022727,0.0,0.0,0.0,0.0,0.045455,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.022727,0.022727,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,Chelsea,0.0,0.0,0.0,0.0,0.04,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.04,0.0,0.01,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.01,0.01,0.0,0.0,0.0,0.0,0.0,0.01,0.07,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.02,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.01,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.01,0.01,0.01,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.03,0.0,0.0,0.05,0.01,0.0,0.01,0.0,0.0,0.01,0.06,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.01,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.04,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.01,0.0,0.01,0.0,0.0,0.0,0.01,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.01,0.0,0.0,0.03,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.01,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.03,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.01,0.0,0.01,0.0
4,Chinatown,0.0,0.0,0.0,0.0,0.04,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.02,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.03,0.0,0.03,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.01,0.01,0.0,0.0,0.0,0.0,0.0,0.03,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.09,0.0,0.0,0.0,0.0,0.0,0.04,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.04,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.01,0.0,0.01,0.0,0.0,0.01,0.01,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.03,0.03,0.0,0.01,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.01,0.01,0.01,0.0,0.0,0.0,0.03,0.0,0.0,0.0,0.01,0.01,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.01,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.03,0.02,0.0,0.0,0.0,0.0,0.01,0.0,0.01,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.04,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
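
Each value in the grouped table is the mean of a 0/1 indicator column, that is, (venues of that category) / (venues returned for the neighborhood). A value such as 0.010101 is therefore 1/99, and 0.022727 is 1/44. The sketch below reproduces this on made-up data and checks that every neighborhood's frequencies sum to 1; the column and neighborhood names are illustrative only.

```python
import pandas as pd

# toy one-hot table: neighborhood A has 4 venues, B has 2 (made-up data)
onehot = pd.DataFrame({
    'Neighborhood': ['A', 'A', 'A', 'A', 'B', 'B'],
    'Cafe': [1, 1, 0, 0, 1, 0],
    'Park': [0, 0, 1, 0, 0, 1],
    'Bar':  [0, 0, 0, 1, 0, 0],
})

# mean of each indicator column = share of venues in that category
grouped = onehot.groupby('Neighborhood').mean().reset_index()

# every venue falls in exactly one category, so each row sums to 1
row_sums = grouped.drop(columns='Neighborhood').sum(axis=1)

print(grouped)
print(row_sums.tolist())
```

Because the rows are shares rather than counts, a neighborhood with 4 venues and one with 100 venues are put on the same scale before clustering.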


### The top 5 most common venues in each neighborhood of Manhattan

In [45]:
num_top_venues = 5

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        # positions beyond the third use the 'th' suffix
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted_M= pd.DataFrame(columns=columns)
neighborhoods_venues_sorted_M['Neighborhood'] = manhattan_grouped['Neighborhood']

for ind in np.arange(manhattan_grouped.shape[0]):
    neighborhoods_venues_sorted_M.iloc[ind, 1:] = return_most_common_venues(manhattan_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted_M.head()

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
0,Battery Park City,Park,Coffee Shop,Hotel,Gym,Wine Shop
1,Carnegie Hill,Pizza Place,Café,Coffee Shop,Bar,Cosmetics Shop
2,Central Harlem,African Restaurant,Cosmetics Shop,American Restaurant,Gym / Fitness Center,Seafood Restaurant
3,Chelsea,Coffee Shop,Italian Restaurant,Ice Cream Shop,Bakery,American Restaurant
4,Chinatown,Chinese Restaurant,Cocktail Bar,Dim Sum Restaurant,American Restaurant,Vietnamese Restaurant
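
The cell above relies on `return_most_common_venues`, which is defined elsewhere in the notebook. The following is only a sketch of what such a ranking step might look like, written under the assumption that each grouped row starts with the neighborhood name followed by category frequencies. `top_categories` and `ordinal` are hypothetical names, not the notebook's own functions.

```python
import pandas as pd


def ordinal(n):
    """Return n with its English ordinal suffix, e.g. 1 -> '1st'."""
    if 10 <= n % 100 <= 20:
        suffix = 'th'
    else:
        suffix = {1: 'st', 2: 'nd', 3: 'rd'}.get(n % 10, 'th')
    return '{}{}'.format(n, suffix)


def top_categories(row, num_top_venues):
    """Names of the highest-frequency categories in one grouped row.

    `row` is assumed to be a Series whose first entry is the
    neighborhood name and whose remaining entries are frequencies.
    """
    freqs = row.iloc[1:].astype(float)
    return freqs.sort_values(ascending=False).index[:num_top_venues].tolist()


# toy grouped table with made-up frequencies
grouped = pd.DataFrame({
    'Neighborhood': ['A', 'B'],
    'Cafe': [0.5, 0.1],
    'Park': [0.3, 0.2],
    'Bar':  [0.2, 0.7],
})

columns = ['Neighborhood'] + ['{} Most Common Venue'.format(ordinal(i + 1)) for i in range(2)]
rows = [[r['Neighborhood']] + top_categories(r, 2) for _, r in grouped.iterrows()]
top = pd.DataFrame(rows, columns=columns)
print(top)
```

The `ordinal` helper avoids the try/except around the suffix list and stays correct for positions such as 11th or 21st, should `num_top_venues` ever be raised.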


### Cluster Neighborhoods in Manhattan

In [46]:
# set number of clusters
kclusters = 5
manhattan_grouped_clustering = manhattan_grouped.drop(columns='Neighborhood')
# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(manhattan_grouped_clustering)
# check cluster labels generated for each row in the dataframe
kmeans.labels_ 

array([3, 1, 0, 1, 1, 0, 0, 4, 1, 3, 0, 1, 0, 4, 0, 4, 1, 0, 1, 1, 1, 3,
       3, 0, 0, 3, 1, 1, 3, 0, 2, 1, 0, 4, 0, 1, 1, 4, 0, 1], dtype=int32)
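
The label array is easier to interpret alongside the cluster sizes: in the array printed above, label 2 occurs only once while label 1 occurs fifteen times. A minimal sketch of the same k-means call on made-up frequency vectors, followed by a size count, is given below. The toy matrix `X` and the choice of two clusters are assumptions for illustration.

```python
import numpy as np
from sklearn.cluster import KMeans

# toy frequency vectors: rows 0-1 are cafe-heavy, rows 2-3 are bar-heavy
X = np.array([
    [0.8, 0.1, 0.1],
    [0.7, 0.2, 0.1],
    [0.1, 0.1, 0.8],
    [0.2, 0.1, 0.7],
])

# n_init is set explicitly because its default differs between scikit-learn releases
kmeans = KMeans(n_clusters=2, random_state=0, n_init=10).fit(X)
labels = kmeans.labels_

# number of rows assigned to each cluster label
sizes = np.bincount(labels)

print(labels)
print(sizes)
```

The label integers are arbitrary identifiers; only the grouping they induce is meaningful, so the same data may come back with the labels swapped.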

In [47]:
# add clustering labels
neighborhoods_venues_sorted_M.insert(0, 'Cluster Labels', kmeans.labels_)
manhattan_merged = manhattan_data
# join the cluster labels and top venues onto manhattan_data to add latitude/longitude for each neighborhood
manhattan_merged = manhattan_merged.join(neighborhoods_venues_sorted_M.set_index('Neighborhood'), on='Neighborhood')
manhattan_merged.head() # check the last columns!

Unnamed: 0,Borough,Neighborhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
0,Manhattan,Marble Hill,40.876551,-73.91066,3,Coffee Shop,Discount Store,Sandwich Place,Yoga Studio,Big Box Store
1,Manhattan,Chinatown,40.715618,-73.994279,1,Chinese Restaurant,Cocktail Bar,Dim Sum Restaurant,American Restaurant,Vietnamese Restaurant
2,Manhattan,Washington Heights,40.851903,-73.9369,4,Café,Bakery,Mobile Phone Shop,Grocery Store,Pizza Place
3,Manhattan,Inwood,40.867684,-73.92121,4,Café,Mexican Restaurant,Pizza Place,Lounge,Frozen Yogurt Shop
4,Manhattan,Hamilton Heights,40.823604,-73.949688,4,Mexican Restaurant,Pizza Place,Café,Coffee Shop,Deli / Bodega


In [49]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(manhattan_merged['Latitude'], manhattan_merged['Longitude'], manhattan_merged['Neighborhood'], manhattan_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=10,
        popup=label,
        color='black',
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters
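
The colour scheme in the cell above only needs one distinct colour per cluster label. A shorter way to build such a palette is sketched here, assuming the same `matplotlib` imports (`cm`, `colors`) as the notebook; `palette` and `colour_for` are hypothetical names.

```python
import numpy as np
from matplotlib import cm, colors

kclusters = 5

# one evenly spaced colour from the rainbow colormap per cluster label
palette = [colors.rgb2hex(c) for c in cm.rainbow(np.linspace(0, 1, kclusters))]


def colour_for(cluster):
    """Hex colour for a cluster label in the range 0..kclusters-1."""
    return palette[int(cluster)]


print(palette)
print(colour_for(0), colour_for(kclusters - 1))
```

Indexing the palette directly by the label keeps label 0 on the first colour. The notebook's `rainbow[cluster-1]` also yields distinct colours, but shifted by one position because label 0 maps to the last entry.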

### Examining Clusters

In [39]:
#cluster1
manhattan_merged.loc[manhattan_merged['Cluster Labels'] == 0, manhattan_merged.columns[[1] + list(range(5, manhattan_merged.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
6,Central Harlem,African Restaurant,Cosmetics Shop,American Restaurant,Gym / Fitness Center,Seafood Restaurant
13,Lincoln Square,Theater,Café,Italian Restaurant,Gym / Fitness Center,Plaza
14,Clinton,Theater,Italian Restaurant,Gym / Fitness Center,Hotel,American Restaurant
15,Midtown,Hotel,Clothing Store,Cocktail Bar,Steakhouse,Theater
18,Greenwich Village,Italian Restaurant,French Restaurant,Sushi Restaurant,Clothing Store,Indian Restaurant
21,Tribeca,Café,Park,Italian Restaurant,Spa,American Restaurant
23,Soho,Clothing Store,Boutique,Women's Store,Men's Store,Mediterranean Restaurant
24,West Village,Italian Restaurant,New American Restaurant,Gastropub,Cosmetics Shop,Wine Bar
32,Civic Center,Gym / Fitness Center,Italian Restaurant,Bakery,French Restaurant,Yoga Studio
33,Midtown South,Korean Restaurant,Hotel,Hotel Bar,Cosmetics Shop,Japanese Restaurant


In [40]:
#cluster2
manhattan_merged.loc[manhattan_merged['Cluster Labels'] == 1, manhattan_merged.columns[[1] + list(range(5, manhattan_merged.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
1,Chinatown,Chinese Restaurant,Cocktail Bar,Dim Sum Restaurant,American Restaurant,Vietnamese Restaurant
8,Upper East Side,Italian Restaurant,Exhibit,Coffee Shop,Gym / Fitness Center,Juice Bar
9,Yorkville,Italian Restaurant,Coffee Shop,Bar,Gym,Pizza Place
10,Lenox Hill,Coffee Shop,Italian Restaurant,Sushi Restaurant,Pizza Place,Gym / Fitness Center
12,Upper West Side,Italian Restaurant,Bar,Coffee Shop,Indian Restaurant,Wine Bar
16,Murray Hill,Hotel,Coffee Shop,Japanese Restaurant,Sandwich Place,Gym
17,Chelsea,Coffee Shop,Italian Restaurant,Ice Cream Shop,Bakery,American Restaurant
19,East Village,Bar,Ice Cream Shop,Wine Bar,Mexican Restaurant,Pizza Place
20,Lower East Side,Café,Coffee Shop,Art Gallery,Ramen Restaurant,Park
22,Little Italy,Bakery,Café,Chinese Restaurant,Seafood Restaurant,Sandwich Place
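
Reading each cluster table by eye works for a handful of clusters. A compact summary per cluster can also be computed, as in the sketch below. It uses made-up rows with the same column names as `manhattan_merged`; the summary column names `n_neighborhoods` and `top_venue` are hypothetical.

```python
import pandas as pd

# toy stand-in for manhattan_merged (made-up rows)
merged = pd.DataFrame({
    'Neighborhood': ['A', 'B', 'C', 'D', 'E'],
    'Cluster Labels': [0, 0, 1, 1, 1],
    '1st Most Common Venue': ['Theater', 'Theater', 'Coffee Shop', 'Bar', 'Coffee Shop'],
})

# per cluster: how many neighborhoods it holds and which venue type
# most often ranks first among them
summary = (
    merged.groupby('Cluster Labels')['1st Most Common Venue']
          .agg(n_neighborhoods='count',
               top_venue=lambda s: s.value_counts().idxmax())
)

print(summary)
```

A summary like this gives each cluster a rough character (for example "theater-led" versus "coffee-led") before the clusters of the two cities are compared.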


### Osaka vs Manhattan

In [50]:
Osaka_venues.head()


Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Abeno-ku,34.627501,135.514095,Usagi to Boku (うさぎとぼく),34.62986,135.514996,Coffee Shop
1,Abeno-ku,34.627501,135.514095,ライフ セントラルスクエア 北畠店,34.626273,135.509131,Supermarket
2,Abeno-ku,34.627501,135.514095,FamilyMart (ファミリーマート 阿倍野昭和町店),34.627117,135.516143,Convenience Store
3,Abeno-ku,34.627501,135.514095,7-Eleven (セブンイレブン 大阪阪南町3丁目店),34.628868,135.514661,Convenience Store
4,Abeno-ku,34.627501,135.514095,モスバーガー 昭和町店,34.630623,135.516397,Fast Food Restaurant


In [51]:
M_venues.head()

Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Marble Hill,40.876551,-73.91066,Arturo's,40.874412,-73.910271,Pizza Place
1,Marble Hill,40.876551,-73.91066,Bikram Yoga,40.876844,-73.906204,Yoga Studio
2,Marble Hill,40.876551,-73.91066,Tibbett Diner,40.880404,-73.908937,Diner
3,Marble Hill,40.876551,-73.91066,Dunkin' Donuts,40.877136,-73.906666,Donut Shop
4,Marble Hill,40.876551,-73.91066,Starbucks,40.877531,-73.905582,Coffee Shop


In [54]:
# stack the Osaka and Manhattan venue tables and renumber the rows from 0
OandM_venues = pd.concat([Osaka_venues, M_venues], ignore_index=True)

In [55]:
OandM_venues.groupby('Neighborhood').count()

Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Abeno-ku,14,14,14,14,14,14
Asahi-ku,4,4,4,4,4,4
Battery Park City,99,99,99,99,99,99
Carnegie Hill,100,100,100,100,100,100
Central Harlem,44,44,44,44,44,44
Chelsea,100,100,100,100,100,100
Chinatown,100,100,100,100,100,100
Chūō-ku,94,94,94,94,94,94
Civic Center,100,100,100,100,100,100
Clinton,100,100,100,100,100,100
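
The counts above vary widely: Asahi-ku has only 4 venues while several Manhattan neighborhoods reach 100, so the frequency profile of a sparse neighborhood rests on very few observations. A minimal sketch for flagging such neighborhoods, assuming an illustrative cut-off (`MIN_VENUES` is a hypothetical name and value, not part of the original analysis):

```python
import pandas as pd

# Venue counts copied from the table above (first ten neighborhoods only).
venue_counts = pd.Series(
    {
        "Abeno-ku": 14,
        "Asahi-ku": 4,
        "Battery Park City": 99,
        "Carnegie Hill": 100,
        "Central Harlem": 44,
        "Chelsea": 100,
        "Chinatown": 100,
        "Chūō-ku": 94,
        "Civic Center": 100,
        "Clinton": 100,
    },
    name="Venue",
)

# Hypothetical cut-off: an assumption for illustration, not a notebook value.
MIN_VENUES = 10

# Neighborhoods whose venue sample is smaller than the cut-off.
sparse = venue_counts[venue_counts < MIN_VENUES]
print(sparse)
```

Whether to drop, keep, or merely annotate such neighborhoods before clustering is a judgment call that depends on the goal of the comparison.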


In [56]:
# one hot encoding
OandM_onehot = pd.get_dummies(OandM_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
OandM_onehot['Neighborhood'] = OandM_venues['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [OandM_onehot.columns[-1]] + list(OandM_onehot.columns[:-1])
OandM_onehot = OandM_onehot[fixed_columns]

print(OandM_onehot.shape)
OandM_onehot.head()

(3878, 349)


Unnamed: 0,Neighborhood,Accessories Store,Adult Boutique,Afghan Restaurant,African Restaurant,American Restaurant,Antique Shop,Arcade,Arepa Restaurant,Argentinian Restaurant,Art Gallery,Art Museum,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,Auditorium,Australian Restaurant,Austrian Restaurant,Auto Workshop,BBQ Joint,Baby Store,Bagel Shop,Bakery,Bank,Bar,Baseball Field,Basketball Court,Bed & Breakfast,Beer Bar,Beer Garden,Beer Store,Big Box Store,Bike Rental / Bike Share,Bike Shop,Bike Trail,Bistro,Board Shop,Boat or Ferry,Bookstore,Boutique,Boxing Gym,Brazilian Restaurant,Breakfast Spot,Bridal Shop,Bridge,Bubble Tea Shop,Buddhist Temple,Building,Burger Joint,Burrito Place,Bus Line,Bus Station,Bus Stop,Business Service,Butcher,Cafeteria,Café,Cambodian Restaurant,Camera Store,Candy Store,Caribbean Restaurant,Caucasian Restaurant,Cheese Shop,Chinese Restaurant,Chocolate Shop,Circus,Climbing Gym,Clothing Store,Club House,Cocktail Bar,Coffee Shop,College Academic Building,College Bookstore,College Cafeteria,College Gym,College Theater,Comedy Club,Community Center,Concert Hall,Convenience Store,Cosmetics Shop,Creperie,Cuban Restaurant,Cultural Center,Cupcake Shop,Cycle Studio,Czech Restaurant,Dance Studio,Daycare,Deli / Bodega,Department Store,Design Studio,Dessert Shop,Dim Sum Restaurant,Diner,Discount Store,Dive Bar,Dog Run,Donburi Restaurant,Donut Shop,Drugstore,Dry Cleaner,Dumpling Restaurant,Duty-free Shop,Eastern European Restaurant,Electronics Store,English Restaurant,Ethiopian Restaurant,Event Space,Exhibit,Falafel Restaurant,Farmers Market,Fast Food Restaurant,Filipino Restaurant,Fish Market,Flea Market,Flower Shop,Food & Drink Shop,Food Court,Food Service,Food Truck,Fountain,French Restaurant,Fried Chicken Joint,Frozen Yogurt Shop,Furniture / Home Store,Gaming Cafe,Garden,Garden Center,Gas Station,Gastropub,Gay Bar,General College & University,General Entertainment,German Restaurant,Gift Shop,Golf Course,Gourmet Shop,Greek Restaurant,Grocery 
Store,Gym,Gym / Fitness Center,Gym Pool,Gymnastics Gym,Halal Restaurant,Harbor / Marina,Hardware Store,Hawaiian Restaurant,Health & Beauty Service,Health Food Store,Heliport,Herbs & Spices Store,High School,Himalayan Restaurant,Historic Site,History Museum,Hobby Shop,Hookah Bar,Hostel,Hot Dog Joint,Hot Spring,Hotel,Hotel Bar,Hotpot Restaurant,Ice Cream Shop,Indian Restaurant,Indie Movie Theater,Indie Theater,Intersection,Irish Pub,Israeli Restaurant,Italian Restaurant,Japanese Curry Restaurant,Japanese Restaurant,Jazz Club,Jewelry Store,Jewish Restaurant,Juice Bar,Kaiseki Restaurant,Karaoke Bar,Karaoke Box,Kebab Restaurant,Kids Store,Korean Restaurant,Kosher Restaurant,Latin American Restaurant,Laundry Service,Leather Goods Store,Lebanese Restaurant,Library,Lingerie Store,Liquor Store,Lounge,Malay Restaurant,Market,Martial Arts Dojo,Massage Studio,Medical Center,Mediterranean Restaurant,Memorial Site,Men's Store,Metro Station,Mexican Restaurant,Middle Eastern Restaurant,Mini Golf,Miscellaneous Shop,Mobile Phone Shop,Modern European Restaurant,Molecular Gastronomy Restaurant,Monument / Landmark,Moroccan Restaurant,Movie Theater,Museum,Music School,Music Store,Music Venue,Nail Salon,New American Restaurant,Newsstand,Nightclub,Non-Profit,Noodle House,North Indian Restaurant,Office,Okonomiyaki Restaurant,Opera House,Optical Shop,Organic Grocery,Other Nightlife,Outdoor Sculpture,Outdoors & Recreation,Paella Restaurant,Pakistani Restaurant,Paper / Office Supplies Store,Park,Performing Arts Venue,Peruvian Restaurant,Pet Café,Pet Service,Pet Store,Pharmacy,Photography Studio,Piano Bar,Pie Shop,Pilates Studio,Pizza Place,Platform,Playground,Plaza,Poke Place,Pool,Portuguese Restaurant,Pub,Public Art,Ramen Restaurant,Record Shop,Rental Car Location,Residential Building (Apartment / Condo),Resort,Rest Area,Restaurant,River,Rock Climbing Spot,Rock Club,Roof Deck,Russian Restaurant,Sake Bar,Salad Place,Salon / Barbershop,Sandwich Place,Scenic Lookout,School,Sculpture 
Garden,Seafood Restaurant,Shanghai Restaurant,Shipping Store,Shoe Store,Shopping Mall,Skate Park,Smoke Shop,Snack Place,Soba Restaurant,Soccer Field,Social Club,Soup Place,South American Restaurant,South Indian Restaurant,Southern / Soul Food Restaurant,Spa,Spanish Restaurant,Speakeasy,Spiritual Center,Sporting Goods Shop,Sports Bar,Sports Club,Stadium,Steakhouse,Street Art,Strip Club,Supermarket,Supplement Shop,Sushi Restaurant,Swiss Restaurant,Szechuan Restaurant,Taco Place,Tailor Shop,Taiwanese Restaurant,Takoyaki Place,Tapas Restaurant,Tattoo Parlor,Tea Room,Tech Startup,Temple,Tempura Restaurant,Tennis Court,Tennis Stadium,Thai Restaurant,Theater,Theme Park Ride / Attraction,Thrift / Vintage Store,Tiki Bar,Tonkatsu Restaurant,Tourist Information Center,Toy / Game Store,Track,Trail,Train Station,Tree,Turkish Restaurant,Udon Restaurant,Unagi Restaurant,Used Bookstore,Vegetarian / Vegan Restaurant,Venezuelan Restaurant,Veterinarian,Video Game Store,Video Store,Vietnamese Restaurant,Volleyball Court,Wagashi Place,Watch Shop,Waterfront,Whisky Bar,Wine Bar,Wine Shop,Wings Joint,Women's Store,Yakitori Restaurant,Yoga Studio
0,Abeno-ku,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,Abeno-ku,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,Abeno-ku,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3,Abeno-ku,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,Abeno-ku,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0


In [57]:
# group rows by neighborhood and take the mean frequency of occurrence of each category
OandM_grouped = OandM_onehot.groupby('Neighborhood').mean().reset_index()
print(OandM_grouped.shape)
OandM_grouped.head()

(64, 349)


Unnamed: 0,Neighborhood,Accessories Store,Adult Boutique,Afghan Restaurant,African Restaurant,American Restaurant,Antique Shop,Arcade,Arepa Restaurant,Argentinian Restaurant,Art Gallery,Art Museum,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,Auditorium,Australian Restaurant,Austrian Restaurant,Auto Workshop,BBQ Joint,Baby Store,Bagel Shop,Bakery,Bank,Bar,Baseball Field,Basketball Court,Bed & Breakfast,Beer Bar,Beer Garden,Beer Store,Big Box Store,Bike Rental / Bike Share,Bike Shop,Bike Trail,Bistro,Board Shop,Boat or Ferry,Bookstore,Boutique,Boxing Gym,Brazilian Restaurant,Breakfast Spot,Bridal Shop,Bridge,Bubble Tea Shop,Buddhist Temple,Building,Burger Joint,Burrito Place,Bus Line,Bus Station,Bus Stop,Business Service,Butcher,Cafeteria,Café,Cambodian Restaurant,Camera Store,Candy Store,Caribbean Restaurant,Caucasian Restaurant,Cheese Shop,Chinese Restaurant,Chocolate Shop,Circus,Climbing Gym,Clothing Store,Club House,Cocktail Bar,Coffee Shop,College Academic Building,College Bookstore,College Cafeteria,College Gym,College Theater,Comedy Club,Community Center,Concert Hall,Convenience Store,Cosmetics Shop,Creperie,Cuban Restaurant,Cultural Center,Cupcake Shop,Cycle Studio,Czech Restaurant,Dance Studio,Daycare,Deli / Bodega,Department Store,Design Studio,Dessert Shop,Dim Sum Restaurant,Diner,Discount Store,Dive Bar,Dog Run,Donburi Restaurant,Donut Shop,Drugstore,Dry Cleaner,Dumpling Restaurant,Duty-free Shop,Eastern European Restaurant,Electronics Store,English Restaurant,Ethiopian Restaurant,Event Space,Exhibit,Falafel Restaurant,Farmers Market,Fast Food Restaurant,Filipino Restaurant,Fish Market,Flea Market,Flower Shop,Food & Drink Shop,Food Court,Food Service,Food Truck,Fountain,French Restaurant,Fried Chicken Joint,Frozen Yogurt Shop,Furniture / Home Store,Gaming Cafe,Garden,Garden Center,Gas Station,Gastropub,Gay Bar,General College & University,General Entertainment,German Restaurant,Gift Shop,Golf Course,Gourmet Shop,Greek Restaurant,Grocery 
Store,Gym,Gym / Fitness Center,Gym Pool,Gymnastics Gym,Halal Restaurant,Harbor / Marina,Hardware Store,Hawaiian Restaurant,Health & Beauty Service,Health Food Store,Heliport,Herbs & Spices Store,High School,Himalayan Restaurant,Historic Site,History Museum,Hobby Shop,Hookah Bar,Hostel,Hot Dog Joint,Hot Spring,Hotel,Hotel Bar,Hotpot Restaurant,Ice Cream Shop,Indian Restaurant,Indie Movie Theater,Indie Theater,Intersection,Irish Pub,Israeli Restaurant,Italian Restaurant,Japanese Curry Restaurant,Japanese Restaurant,Jazz Club,Jewelry Store,Jewish Restaurant,Juice Bar,Kaiseki Restaurant,Karaoke Bar,Karaoke Box,Kebab Restaurant,Kids Store,Korean Restaurant,Kosher Restaurant,Latin American Restaurant,Laundry Service,Leather Goods Store,Lebanese Restaurant,Library,Lingerie Store,Liquor Store,Lounge,Malay Restaurant,Market,Martial Arts Dojo,Massage Studio,Medical Center,Mediterranean Restaurant,Memorial Site,Men's Store,Metro Station,Mexican Restaurant,Middle Eastern Restaurant,Mini Golf,Miscellaneous Shop,Mobile Phone Shop,Modern European Restaurant,Molecular Gastronomy Restaurant,Monument / Landmark,Moroccan Restaurant,Movie Theater,Museum,Music School,Music Store,Music Venue,Nail Salon,New American Restaurant,Newsstand,Nightclub,Non-Profit,Noodle House,North Indian Restaurant,Office,Okonomiyaki Restaurant,Opera House,Optical Shop,Organic Grocery,Other Nightlife,Outdoor Sculpture,Outdoors & Recreation,Paella Restaurant,Pakistani Restaurant,Paper / Office Supplies Store,Park,Performing Arts Venue,Peruvian Restaurant,Pet Café,Pet Service,Pet Store,Pharmacy,Photography Studio,Piano Bar,Pie Shop,Pilates Studio,Pizza Place,Platform,Playground,Plaza,Poke Place,Pool,Portuguese Restaurant,Pub,Public Art,Ramen Restaurant,Record Shop,Rental Car Location,Residential Building (Apartment / Condo),Resort,Rest Area,Restaurant,River,Rock Climbing Spot,Rock Club,Roof Deck,Russian Restaurant,Sake Bar,Salad Place,Salon / Barbershop,Sandwich Place,Scenic Lookout,School,Sculpture 
Garden,Seafood Restaurant,Shanghai Restaurant,Shipping Store,Shoe Store,Shopping Mall,Skate Park,Smoke Shop,Snack Place,Soba Restaurant,Soccer Field,Social Club,Soup Place,South American Restaurant,South Indian Restaurant,Southern / Soul Food Restaurant,Spa,Spanish Restaurant,Speakeasy,Spiritual Center,Sporting Goods Shop,Sports Bar,Sports Club,Stadium,Steakhouse,Street Art,Strip Club,Supermarket,Supplement Shop,Sushi Restaurant,Swiss Restaurant,Szechuan Restaurant,Taco Place,Tailor Shop,Taiwanese Restaurant,Takoyaki Place,Tapas Restaurant,Tattoo Parlor,Tea Room,Tech Startup,Temple,Tempura Restaurant,Tennis Court,Tennis Stadium,Thai Restaurant,Theater,Theme Park Ride / Attraction,Thrift / Vintage Store,Tiki Bar,Tonkatsu Restaurant,Tourist Information Center,Toy / Game Store,Track,Trail,Train Station,Tree,Turkish Restaurant,Udon Restaurant,Unagi Restaurant,Used Bookstore,Vegetarian / Vegan Restaurant,Venezuelan Restaurant,Veterinarian,Video Game Store,Video Store,Vietnamese Restaurant,Volleyball Court,Wagashi Place,Watch Shop,Waterfront,Whisky Bar,Wine Bar,Wine Shop,Wings Joint,Women's Store,Yakitori Restaurant,Yoga Studio
0,Abeno-ku,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.071429,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.071429,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.071429,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.214286,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.071429,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.071429,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.071429,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.142857,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.071429,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.071429,0.0,0.0,0.071429,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,Asahi-ku,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,Battery Park City,0.0,0.0,0.0,0.0,0.010101,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.010101,0.010101,0.0,0.0,0.0,0.020202,0.0,0.0,0.010101,0.0,0.0,0.0,0.0,0.0,0.0,0.010101,0.0,0.0,0.0,0.0,0.0,0.010101,0.0,0.010101,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.010101,0.020202,0.010101,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.010101,0.0,0.0,0.0,0.010101,0.0,0.0,0.0,0.020202,0.0,0.0,0.070707,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.010101,0.0,0.0,0.0,0.020202,0.0,0.0,0.0,0.0,0.0,0.020202,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.010101,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.020202,0.0,0.020202,0.020202,0.0,0.0,0.0,0.0,0.0,0.010101,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.010101,0.0,0.010101,0.040404,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.010101,0.0,0.0,0.0,0.0,0.0,0.0,0.050505,0.0,0.0,0.020202,0.0,0.0,0.0,0.0,0.0,0.0,0.030303,0.0,0.0,0.0,0.0,0.0,0.010101,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.010101,0.0,0.0,0.0,0.010101,0.0,0.0,0.0,0.010101,0.020202,0.010101,0.0,0.010101,0.0,0.0,0.0,0.0,0.0,0.0,0.010101,0.0,0.010101,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.070707,0.010101,0.0,0.0,0.0,0.010101,0.0,0.0,0.0,0.0,0.0,0.020202,0.0,0.020202,0.020202,0.0,0.0,0.0,0.010101,0.010101,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.010101,0.0,0.020202,0.010101,0.0,0.010101,0.010101,0.0,0.0,0.0,0.030303,0.0,0.010101,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.010101,0.0,0.0,0.0,0.0,0.010101,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.010101,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.010101,0.030303,0.0,0.010101,0.0,0.0
3,Carnegie Hill,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.01,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.02,0.0,0.04,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.03,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.05,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.01,0.0,0.01,0.05,0.0,0.0,0.0,0.01,0.0,0.0,0.01,0.0,0.0,0.04,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.02,0.0,0.03,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.01,0.0,0.01,0.0,0.02,0.02,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.01,0.02,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.03,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.01,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.06,0.0,0.01,0.0,0.0,0.0,0.0,0.02,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.03,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.01,0.03,0.0,0.01,0.0,0.03
4,Central Harlem,0.0,0.0,0.0,0.068182,0.045455,0.0,0.0,0.0,0.0,0.022727,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.022727,0.0,0.022727,0.0,0.0,0.022727,0.0,0.0,0.0,0.022727,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.022727,0.022727,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.022727,0.0,0.0,0.0,0.0,0.0,0.0,0.022727,0.022727,0.0,0.0,0.0,0.022727,0.0,0.0,0.045455,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.045455,0.0,0.0,0.0,0.0,0.022727,0.0,0.0,0.0,0.0,0.0,0.0,0.022727,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.022727,0.022727,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.045455,0.022727,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.022727,0.045455,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.022727,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.022727,0.0,0.0,0.022727,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.022727,0.0,0.0,0.0,0.0,0.022727,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.022727,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.022727,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.022727,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.022727,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.022727,0.0,0.0,0.0,0.0,0.045455,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.022727,0.022727,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
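
To make the two steps above concrete, here is a small sketch on toy data: the five Abeno-ku venue categories shown by `Osaka_venues.head()`. It uses the same `pd.get_dummies` plus `groupby(...).mean()` idea; the variable names `toy`, `onehot` and `freq` are illustrative only.

```python
import pandas as pd

# Toy data: the five Abeno-ku venue categories shown by Osaka_venues.head().
toy = pd.DataFrame(
    {
        "Neighborhood": ["Abeno-ku"] * 5,
        "Venue Category": [
            "Coffee Shop",
            "Supermarket",
            "Convenience Store",
            "Convenience Store",
            "Fast Food Restaurant",
        ],
    }
)

# One indicator column per category, one row per venue.
onehot = pd.get_dummies(toy["Venue Category"], dtype=float)
onehot.insert(0, "Neighborhood", toy["Neighborhood"])

# The mean of 0/1 indicators is the share of venues in each category.
freq = onehot.groupby("Neighborhood").mean()
print(freq.round(2))
```

Because each venue contributes exactly one indicator, every row of the grouped frame sums to 1, which is why the values can be read as category shares.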


In [58]:
num_top_venues = 5
for hood in OandM_grouped['Neighborhood']:
    print("----"+hood+"----")
    temp = OandM_grouped[OandM_grouped['Neighborhood'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----Abeno-ku----
                  venue  freq
0     Convenience Store  0.21
1         Shopping Mall  0.14
2  Fast Food Restaurant  0.07
3           Supermarket  0.07
4            Steakhouse  0.07


----Asahi-ku----
                   venue  freq
0         Baseball Field  0.25
1            Golf Course  0.25
2                   Park  0.25
3             Playground  0.25
4  Outdoors & Recreation  0.00


----Battery Park City----
         venue  freq
0         Park  0.07
1  Coffee Shop  0.07
2        Hotel  0.05
3          Gym  0.04
4    Wine Shop  0.03


----Carnegie Hill----
            venue  freq
0     Pizza Place  0.06
1            Café  0.05
2     Coffee Shop  0.05
3             Bar  0.04
4  Cosmetics Shop  0.04


----Central Harlem----
                  venue  freq
0    African Restaurant  0.07
1    Seafood Restaurant  0.05
2  Gym / Fitness Center  0.05
3     French Restaurant  0.05
4        Cosmetics Shop  0.05


----Chelsea----
                 venue  freq
0          Coffee Shop  

In [59]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        # ranks beyond 3rd use the 'th' suffix
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted_OandM = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted_OandM['Neighborhood'] = OandM_grouped['Neighborhood']

for ind in np.arange(OandM_grouped.shape[0]):
    neighborhoods_venues_sorted_OandM.iloc[ind, 1:] = return_most_common_venues(OandM_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted_OandM = neighborhoods_venues_sorted_OandM.reset_index(drop=True)
neighborhoods_venues_sorted_OandM


Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Abeno-ku,Convenience Store,Shopping Mall,Supermarket,Intersection,Steakhouse,Fast Food Restaurant,Bakery,Coffee Shop,Liquor Store,Art Gallery
1,Asahi-ku,Baseball Field,Park,Golf Course,Playground,Yoga Studio,Food Court,Fast Food Restaurant,Filipino Restaurant,Fish Market,Flea Market
2,Battery Park City,Park,Coffee Shop,Hotel,Gym,Wine Shop,Italian Restaurant,Shopping Mall,Food Court,Burger Joint,Clothing Store
3,Carnegie Hill,Pizza Place,Café,Coffee Shop,Cosmetics Shop,Bar,Yoga Studio,Bookstore,Wine Shop,Japanese Restaurant,Spa
4,Central Harlem,African Restaurant,Gym / Fitness Center,Cosmetics Shop,American Restaurant,Seafood Restaurant,French Restaurant,Chinese Restaurant,Caribbean Restaurant,Salon / Barbershop,Jazz Club
5,Chelsea,Coffee Shop,Italian Restaurant,Ice Cream Shop,Nightclub,Bakery,American Restaurant,Seafood Restaurant,Hotel,Theater,French Restaurant
6,Chinatown,Chinese Restaurant,Dim Sum Restaurant,American Restaurant,Cocktail Bar,Vietnamese Restaurant,Bar,Hotpot Restaurant,Ice Cream Shop,Noodle House,Bakery
7,Chūō-ku,Sushi Restaurant,Japanese Restaurant,Convenience Store,Coffee Shop,Soba Restaurant,Bakery,Park,Donburi Restaurant,Italian Restaurant,Seafood Restaurant
8,Civic Center,Gym / Fitness Center,Italian Restaurant,Bakery,French Restaurant,Yoga Studio,Cocktail Bar,Sporting Goods Shop,Spa,Park,American Restaurant
9,Clinton,Theater,Italian Restaurant,Gym / Fitness Center,Hotel,American Restaurant,Gym,Spa,Wine Shop,Mediterranean Restaurant,French Restaurant
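
The helper `return_most_common_venues` used above is defined outside this section. As a rough sketch of what such a helper might do, the hypothetical `top_venues` below ranks the category columns of one grouped row by frequency; the real helper may differ in its details.

```python
import pandas as pd


def top_venues(row, num_top_venues):
    """Return the category names with the highest frequency in one grouped row.

    Hypothetical stand-in for return_most_common_venues, whose definition
    is not shown in this section.
    """
    freqs = row.iloc[1:].astype(float)  # skip the leading 'Neighborhood' entry
    ranked = freqs.sort_values(ascending=False, kind="mergesort")
    return ranked.index[:num_top_venues].tolist()


# Toy row built from the rounded Abeno-ku frequencies printed earlier.
row = pd.Series(
    {
        "Neighborhood": "Abeno-ku",
        "Fast Food Restaurant": 0.07,
        "Convenience Store": 0.21,
        "Shopping Mall": 0.14,
        "Yoga Studio": 0.0,
    }
)
print(top_venues(row, 2))
```

When a neighborhood has fewer distinct categories than `num_top_venues`, the remaining slots are filled with zero-frequency categories. This is visible for Asahi-ku, whose four venues leave ranks 5 to 10 occupied by categories it does not actually contain.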


In [60]:
# set number of clusters
kclusters = 5
OandM_grouped_clustering = OandM_grouped.drop('Neighborhood', axis=1)

# run k-means clustering
kmeans_OandM = KMeans(n_clusters=kclusters, random_state=0).fit(OandM_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans_OandM.labels_

array([1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 3, 1, 1, 1,
       0, 1, 0, 1, 1, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 1, 1, 2,
       2, 0, 0, 0, 0, 0, 2, 0, 1, 0, 0, 4, 0, 0, 0, 0, 0, 0, 1, 0],
      dtype=int32)
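
The label array above can be tallied to see how the 64 neighborhoods are spread over the five clusters. A minimal sketch using only the standard library, with the labels copied by hand from that output:

```python
from collections import Counter

# Cluster labels copied from the kmeans_OandM.labels_ output above.
labels = [
    1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 3, 1, 1, 1,
    0, 1, 0, 1, 1, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 1, 1, 2,
    2, 0, 0, 0, 0, 0, 2, 0, 1, 0, 0, 4, 0, 0, 0, 0, 0, 0, 1, 0,
]

# Count how many neighborhoods fall into each cluster.
sizes = Counter(labels)
for cluster in sorted(sizes):
    print(f"cluster {cluster}: {sizes[cluster]} neighborhoods")
```

In this output cluster 0 is by far the largest, while clusters 3 and 4 each contain a single neighborhood. That imbalance is worth keeping in mind when reading the per-cluster tables.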

In [62]:
O = Osaka_ku.drop('Neighborhood (Kanji)', axis = 1)

In [63]:
M = manhattan_data.drop('Borough', axis = 1)

In [64]:
OandM_data = pd.concat([O,M]).reset_index(drop = True)
OandM_data

Unnamed: 0,Neighborhood,Latitude,Longitude
0,Abeno-ku,34.627501,135.514095
1,Asahi-ku,35.476018,139.53192
2,Chūō-ku,35.666255,139.775565
3,Fukushima-ku,34.692104,135.474812
4,Higashinari-ku,34.672912,135.550567
5,Higashisumiyoshi-ku,34.615662,135.531096
6,Higashiyodogawa-ku,34.740212,135.517432
7,Hirano-ku,34.603715,135.559027
8,Ikuno-ku,34.653003,135.547722
9,Jōtō-ku,34.693887,135.547769
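
In the table above, Asahi-ku and Chūō-ku have longitudes near 139.5 to 139.8, while the other Osaka wards sit near 135.5. This suggests the geocoder may have matched same-named wards elsewhere in Japan. A minimal sketch for catching such rows with a rough bounding box follows; the box limits are assumptions chosen for illustration, not values from the notebook.

```python
import pandas as pd

# Rows copied from the OandM_data output above.
osaka = pd.DataFrame(
    {
        "Neighborhood": ["Abeno-ku", "Asahi-ku", "Chūō-ku", "Fukushima-ku"],
        "Latitude": [34.627501, 35.476018, 35.666255, 34.692104],
        "Longitude": [135.514095, 139.53192, 139.775565, 135.474812],
    }
)

# Rough bounding box around Osaka city. These limits are an assumption
# for illustration only.
LAT_MIN, LAT_MAX = 34.5, 34.8
LON_MIN, LON_MAX = 135.3, 135.7

inside = (
    osaka["Latitude"].between(LAT_MIN, LAT_MAX)
    & osaka["Longitude"].between(LON_MIN, LON_MAX)
)

# Neighborhoods whose coordinates fall outside the box deserve a second look.
suspect = osaka.loc[~inside, "Neighborhood"].tolist()
print(suspect)
```

Re-geocoding the flagged rows with a more specific query, such as the ward name followed by the city, is one possible remedy. Whether that resolves the ambiguity depends on the geocoder.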


In [67]:
# add clustering labels (guarded so the cell can be re-run without error)
if 'Cluster Labels' not in neighborhoods_venues_sorted_OandM.columns:
    neighborhoods_venues_sorted_OandM.insert(0, 'Cluster Labels', kmeans_OandM.labels_)

# join the venue summary onto OandM_data to add latitude/longitude for each neighborhood
OandM_merged = OandM_data.join(neighborhoods_venues_sorted_OandM.set_index('Neighborhood'), on='Neighborhood')
OandM_merged.head()  # check the last columns!

Unnamed: 0,Neighborhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Abeno-ku,34.627501,135.514095,1,Convenience Store,Shopping Mall,Supermarket,Intersection,Steakhouse,Fast Food Restaurant,Bakery,Coffee Shop,Liquor Store,Art Gallery
1,Asahi-ku,35.476018,139.53192,0,Baseball Field,Park,Golf Course,Playground,Yoga Studio,Food Court,Fast Food Restaurant,Filipino Restaurant,Fish Market,Flea Market
2,Chūō-ku,35.666255,139.775565,0,Sushi Restaurant,Japanese Restaurant,Convenience Store,Coffee Shop,Soba Restaurant,Bakery,Park,Donburi Restaurant,Italian Restaurant,Seafood Restaurant
3,Fukushima-ku,34.692104,135.474812,1,Convenience Store,Fast Food Restaurant,Chinese Restaurant,Ramen Restaurant,Donburi Restaurant,Train Station,Supermarket,Shopping Mall,Japanese Restaurant,Sake Bar
4,Higashinari-ku,34.672912,135.550567,3,Café,Shopping Mall,Grocery Store,Yoga Studio,Farmers Market,Fast Food Restaurant,Filipino Restaurant,Fish Market,Flea Market,Flower Shop
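
Since the goal is to compare the two cities, it can help to see how each cluster splits between Osaka and Manhattan. The sketch below cross-tabulates a handful of rows copied from the outputs in this section. The `City` column is hypothetical, because the notebook's frames do not carry one, and eight rows are far too few to draw conclusions from.

```python
import pandas as pd

# A few (neighborhood, cluster) pairs copied from the outputs in this section.
sample = pd.DataFrame(
    {
        "Neighborhood": [
            "Abeno-ku", "Asahi-ku", "Chūō-ku", "Fukushima-ku", "Higashinari-ku",
            "Marble Hill", "Chinatown", "Washington Heights",
        ],
        # Hypothetical column added for illustration.
        "City": ["Osaka"] * 5 + ["Manhattan"] * 3,
        "Cluster Labels": [1, 0, 0, 1, 3, 0, 0, 0],
    }
)

# Rows are clusters, columns are cities, cells are neighborhood counts.
mix = pd.crosstab(sample["Cluster Labels"], sample["City"])
print(mix)
```

Applied to the full merged frame, a table like this would show which clusters are shared by both cities and which are specific to one of them.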


In [70]:
# generate a basemap centred on Osaka
geo = Nominatim(user_agent='nj_explorer')
address = 'Osaka'
location = geo.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Osaka are {}, {}.'.format(latitude, longitude))

# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters: one rainbow color per cluster
colors_array = cm.rainbow(np.linspace(0, 1, kclusters))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add one marker per neighborhood, colored by its cluster
for lat, lon, poi, cluster in zip(OandM_merged['Latitude'], OandM_merged['Longitude'], OandM_merged['Neighborhood'], OandM_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=10,
        popup=label,
        color='black',
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)

map_clusters

The geographical coordinates of Osaka are 34.6937569, 135.5014539.


In [73]:
# generate a basemap centred on Manhattan
geo = Nominatim(user_agent='nj_explorer')
address = 'Manhattan'
location = geo.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Manhattan are {}, {}.'.format(latitude, longitude))

# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters: one rainbow color per cluster
colors_array = cm.rainbow(np.linspace(0, 1, kclusters))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add one marker per neighborhood, colored by its cluster
for lat, lon, poi, cluster in zip(OandM_merged['Latitude'], OandM_merged['Longitude'], OandM_merged['Neighborhood'], OandM_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=10,
        popup=label,
        color='black',
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)

map_clusters

The geographical coordinates of Manhattan are 40.7900869, -73.9598295.


In [80]:
#cluster 0
OandM_merged[OandM_merged['Cluster Labels']==0]

| Unnamed: 0 | Neighborhood | Latitude | Longitude | Cluster Labels | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue | 8th Most Common Venue | 9th Most Common Venue | 10th Most Common Venue |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | Asahi-ku | 35.476018 | 139.53192 | 0 | Baseball Field | Park | Golf Course | Playground | Yoga Studio | Food Court | Fast Food Restaurant | Filipino Restaurant | Fish Market | Flea Market |
| 2 | Chūō-ku | 35.666255 | 139.775565 | 0 | Sushi Restaurant | Japanese Restaurant | Convenience Store | Coffee Shop | Soba Restaurant | Bakery | Park | Donburi Restaurant | Italian Restaurant | Seafood Restaurant |
| 18 | Suminoe-ku | 34.614132 | 135.466545 | 0 | Sporting Goods Shop | Restaurant | Athletics & Sports | Japanese Restaurant | Intersection | Karaoke Bar | Hot Spring | History Museum | Italian Restaurant | Soccer Field |
| 21 | Tennōji-ku | 34.655043 | 135.51837 | 0 | Bus Stop | Wagashi Place | Intersection | Donut Shop | Convenience Store | Sake Bar | Bakery | Noodle House | Chinese Restaurant | Italian Restaurant |
| 24 | Marble Hill | 40.876551 | -73.91066 | 0 | Discount Store | Sandwich Place | Coffee Shop | Yoga Studio | Bank | Department Store | Diner | Pizza Place | Donut Shop | Seafood Restaurant |
| 25 | Chinatown | 40.715618 | -73.994279 | 0 | Chinese Restaurant | Dim Sum Restaurant | American Restaurant | Cocktail Bar | Vietnamese Restaurant | Bar | Hotpot Restaurant | Ice Cream Shop | Noodle House | Bakery |
| 26 | Washington Heights | 40.851903 | -73.9369 | 0 | Café | Mobile Phone Shop | Bakery | Pizza Place | Grocery Store | Park | Sandwich Place | Supermarket | Tapas Restaurant | Latin American Restaurant |
| 27 | Inwood | 40.867684 | -73.92121 | 0 | Mexican Restaurant | Café | Pizza Place | Lounge | Chinese Restaurant | Bakery | Restaurant | Park | Spanish Restaurant | Deli / Bodega |
| 28 | Hamilton Heights | 40.823604 | -73.949688 | 0 | Mexican Restaurant | Pizza Place | Coffee Shop | Café | Deli / Bodega | Yoga Studio | Bakery | Liquor Store | Sandwich Place | School |
| 29 | Manhattanville | 40.816934 | -73.957385 | 0 | Coffee Shop | Mexican Restaurant | Park | Italian Restaurant | Seafood Restaurant | Diner | Beer Garden | Lounge | Falafel Restaurant | Bike Trail |
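
The cluster 0 listing above mixes neighborhoods from both cities, and a per-city count makes the overlap easier to read. The following is a minimal sketch that uses only the names, longitudes, and labels from the rows shown above. The `city_of` helper and its longitude rule are assumptions for illustration, since this listing has no city column.

```python
from collections import Counter

# (neighborhood, longitude, cluster label) copied from the cluster 0 rows above
rows = [
    ('Asahi-ku', 139.53192, 0),
    ('Chūō-ku', 139.775565, 0),
    ('Suminoe-ku', 135.466545, 0),
    ('Tennōji-ku', 135.51837, 0),
    ('Marble Hill', -73.91066, 0),
    ('Chinatown', -73.994279, 0),
    ('Washington Heights', -73.9369, 0),
    ('Inwood', -73.92121, 0),
    ('Hamilton Heights', -73.949688, 0),
    ('Manhattanville', -73.957385, 0),
]


def city_of(longitude):
    # Assumption: eastern longitudes belong to the Osaka set, western ones to Manhattan
    return 'Osaka' if longitude > 0 else 'Manhattan'


# Count neighborhoods per (city, cluster label) pair
counts = Counter((city_of(lon), label) for _, lon, label in rows)
for (city, label), n in sorted(counts.items()):
    print(f'cluster {label}: {n} {city} neighborhoods')
```

For the ten rows listed, this counts 6 Manhattan and 4 Osaka neighborhoods in cluster 0.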


## Results and Discussion <a name="results"></a> 

1. After the venue data for Osaka and Manhattan were merged, the neighborhoods were clustered again, and the result was interesting. The cluster assignments changed for both cities. Compared with the Osaka neighborhoods, the Manhattan neighborhoods were much more similar to one another (they fell into a single cluster, shown as red circles). Among the Osaka neighborhoods, Tennōji-ku and Sumiyoshi-ku were the most similar to Manhattan. The other three neighborhoods that had shared a cluster with Tennōji-ku and Sumiyoshi-ku in the earlier clustering were assigned to different clusters this time.
2. The three best restaurants are all located in Abeno-ku.
(I researched several categories, such as food, art, museums, and cinemas, using the Foursquare API in order to build a recommendation list. Because of the Foursquare API access limits, only the restaurant results are shown here.)

## Conclusion <a name="conclusion"></a> 

The stakeholder's problem is resolved. This project aimed to help my friend find a neighborhood in Osaka that is similar to where she lived before. Compared with Manhattan, Osaka showed much more diversity among its neighborhoods. Tennōji-ku and Sumiyoshi-ku in Osaka were the most similar to Manhattan.
The data showed that Abeno-ku has the three best restaurants in Osaka. Sumiyoshi-ku ranked fourth among the neighborhoods by number of Japanese restaurants. Now that the ranking procedure is in place, I can quickly rank any category I want to recommend to my friend.
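
The ranking code itself is not shown in this section. As a rough sketch of the idea, assuming the Foursquare results are available as (neighborhood, category) pairs, neighborhoods can be ranked for any category by counting matching venues. The function name `rank_neighborhoods` and the sample venues are hypothetical and are not the project's data.

```python
from collections import Counter


def rank_neighborhoods(venues, category):
    """Rank neighborhoods by how many venues of `category` they contain."""
    counts = Counter(hood for hood, cat in venues if cat == category)
    # most_common() returns (neighborhood, count) pairs, highest count first
    return counts.most_common()


# Hypothetical sample, for illustration only
sample = [
    ('A-ku', 'Japanese Restaurant'),
    ('A-ku', 'Japanese Restaurant'),
    ('B-ku', 'Japanese Restaurant'),
    ('B-ku', 'Coffee Shop'),
    ('C-ku', 'Park'),
]

ranking = rank_neighborhoods(sample, 'Japanese Restaurant')
for rank, (hood, n) in enumerate(ranking, start=1):
    print(rank, hood, n)
```

Changing the `category` argument reuses the same function for cafés, museums, or any other Foursquare category name.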