# ANALYSIS OF VENUES FOR FOREIGN VISITORS IN TOKYO

### Coursera IBM Data Science Capstone project
Link to the full report : https://github.com/belanello/Coursera_Capstone/blob/master/full_report.pdf<br>
Link to the notebook with folium map : 
<br>https://nbviewer.jupyter.org/github/belanello/Coursera_Capstone/blob/master/data_preparation_modeling.ipynb

## (1) Data Collection

In this project, we want to find out which areas need to be better developed for foreign tourists staying in Tokyo. 
To answer this question, we need to know how many of the basic venues tourists can find within walking distance of whichever station in Tokyo they arrive at. I took the 4 steps below to collect the data.

1. Define the basic venues for foreign tourists. I chose 8 venues.
2. Collect the geographical coordinates of all the stations (all the lines) in Tokyo.
3. Search for each basic venue around every station using the Foursquare API.
4. Convert the dataframe to a CSV file.
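
The walking-distance criterion can be made concrete with a great-circle distance check. The sketch below is only an illustration and not the notebook's method (the notebook relies on the `radius` parameter of the Foursquare search). The helper names `haversine_m` and `within_walking_distance` are hypothetical, and the 1000 m default mirrors the radius used later in this README. The sample coordinates are copied from the outputs shown further down.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres

def haversine_m(lat1, lng1, lat2, lng2):
    """Great-circle distance in metres between two (lat, lng) points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lng2 - lng1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))

def within_walking_distance(station, venue, radius_m=1000):
    """True if the venue lies within radius_m of the station (hypothetical helper)."""
    return haversine_m(*station, *venue) <= radius_m

adachi_odai = (35.7548009, 139.77040381447955)  # Adachi-Odai Station
mos_burger = (35.755219, 139.768133)            # a restaurant found near that station
print(within_walking_distance(adachi_odai, mos_burger))
```

A check like this could be used to verify that the venues returned for a station really fall inside the requested radius.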

### 1. Define the basic venues for foreign tourists. I chose 8 venues.

>1. Hotels
>2. Restaurants (all categories)
>3. Convenience stores
>4. ATMs
>5. Café
>6. Parks
>7. Pharmacies
>8. Tourist Information Center

### 2. Collect the geographical coordinates of all the stations in Tokyo

In [302]:
# import basic libraries
from urllib.request import urlopen
from bs4 import BeautifulSoup
import numpy as np
import pandas as pd
from geopy.geocoders import Nominatim
from IPython.display import Image
from IPython.core.display import HTML

import folium
import requests
import re

In [188]:
# let's get the station names in Tokyo from Wikipedia (there are 646 stations on the pages)

urls = ['https://en.wikipedia.org/w/index.php?title=Category:Railway_stations_in_Tokyo&pageuntil=Kasai-Rinkai+Park+Station#mw-pages',
       'https://en.wikipedia.org/w/index.php?title=Category:Railway_stations_in_Tokyo&pagefrom=Kasai-Rinkai+Park+Station#mw-pages',
       'https://en.wikipedia.org/w/index.php?title=Category:Railway_stations_in_Tokyo&pagefrom=Oku-Tama+Station#mw-pages',
       'https://en.wikipedia.org/w/index.php?title=Category:Railway_stations_in_Tokyo&pagefrom=Tsukishima+Station#mw-pages']

stations_list = []
for url in urls:

    html = urlopen(url)
    bs = BeautifulSoup(html.read(),'html.parser')
    tags = bs.find('div',{'class':'mw-category'}).find_all('li')

    for tag in tags:
        stations_list.append(tag.get_text().strip())
    

In [189]:
len(stations_list)

646

In [190]:
# convert list to pd.Series to clean the data
stations_list = pd.Series(stations_list)

In [191]:
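# remove the parenthesized part '(...)' of each name
# regex=True is required because recent pandas versions treat the pattern as a literal string by default
stations_list = stations_list.str.replace(r'\(.*\)', '', regex=True)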
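
As a small standalone illustration of this clean-up step, the sketch below strips a trailing parenthesized suffix from a page title with the standard `re` module. The helper name `strip_disambiguator` and the sample titles are hypothetical. Unlike the notebook cell, the pattern also trims the surrounding whitespace, which the notebook handles later with `strip()`.

```python
import re

def strip_disambiguator(title):
    """Remove a parenthesized suffix such as ' (Tokyo Metro)' from a title."""
    return re.sub(r'\s*\(.*\)', '', title).strip()

# hypothetical titles in the style of Wikipedia category entries
titles = ['Asakusa Station (Tokyo Metro)', 'Akihabara Station']
print([strip_disambiguator(t) for t in titles])
```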

In [192]:
for i,station in enumerate(stations_list):
    print(i, station)

0 Adachi-Odai Station
1 Aihara Station
2 Akabane Station
3 Akabane-iwabuchi Station
4 Akabanebashi Station
5 Akado-shōgakkōmae Station
6 Akasaka-mitsuke Station
7 Akebonobashi Station
8 Akigawa Station
9 Akihabara Station
10 Akishima Station
11 Akitsu Station 
12 Anamori-inari Station
13 Aoi Station
14 Aomi Station
15 Aomono-yokochō Station
16 Aoto Station
17 Aoyama-itchōme Station
18 Araiyakushi-mae Station
19 Arakawa-itchūmae Station
20 Arakawa-kuyakushomae Station
21 Arakawa-nanachōme Station
22 Arakawa-nichōme Station
23 Arakawa-shakomae Station
24 Arakawa-yūenchimae Station
25 Ariake Station 
26 Ariake-Tennis-no-mori Station
27 Asagaya Station
28 Asakusa Station 
29 Asakusa Station 
30 Asakusabashi Station
31 Asukayama Station
32 Awajichō Station
33 Ayase Station
34 Azabu-juban Station
35 Bakuro-yokoyama Station
36 Bakurochō Station
37 Bubaigawara Station
38 Chidorichō Station
39 Chikatetsu-Akatsuka Station
40 Chikatetsu-Narimasu Station
41 Chitose-Funabashi Station
42 Chitose-kar

In [193]:
# fix some spellings so that the geocoder can resolve them
stations_list[52] = 'Ebara-machi Station'
stations_list[60] = 'Fuchūhonmachi Station'
stations_list[61] = 'Fuchūkeiba-seimonmae Station'
stations_list[132] = 'Higashiyamatoshi Station'
stations_list[415] = 'Ōsakihirokōji Station'
stations_list[285] = 'Meiji-jingumae \'Harajuku\' Station'
stations_list[330] = 'Odakyu Nagayama Station'

In [194]:
# delete the stations that have no coordinates in the geocoder; those still in use are added back later with their coordinates

del(stations_list[155]) # Iidamachi Station no longer exists >> delete
del(stations_list[171]) # Itabashi Kuyakushomae Station, 35.751809056628396, 139.7097175970459
del(stations_list[482]) # Shimo-Shimmei Station, 35.60887072676263, 139.72623176709013
del(stations_list[518]) # freight terminal station >> delete
del(stations_list[525]) # Shōin-Jinjamae Station, 35.64414268616708, 139.65526296820596
del(stations_list[624]) # Yaguchinowatashi Station, 35.562719924859366, 139.70029060995995

In [195]:
len(stations_list)

640

In [196]:
# define the function to get coordinates
def get_cordinates(address):
    geolocator = Nominatim(user_agent='explorer')
    location = geolocator.geocode(address)
    latitude = location.latitude
    longitude = location.longitude
    return latitude, longitude
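
To my knowledge, `geocode` returns `None` when it finds no match, so reading `location.latitude` then raises `AttributeError`. The sketch below shows one way to try several query strings in order and handle the no-match case explicitly. `geocode_with_fallback`, `FakeLocation`, and `fake_geocode` are hypothetical names; the fake geocoder only exists so the example runs offline.

```python
import time

def geocode_with_fallback(queries, geocode, pause_s=0.0):
    """Try each query in order; return (lat, lng) for the first hit, else None.

    `geocode` is any callable returning an object with .latitude/.longitude,
    or None when nothing is found (a stand-in for a real geocoder).
    """
    for query in queries:
        location = geocode(query)
        time.sleep(pause_s)  # pause between requests to be gentle on the service
        if location is not None:
            return location.latitude, location.longitude
    return None

class FakeLocation:
    """Minimal stand-in for a geocoder result (assumption for this sketch)."""
    def __init__(self, latitude, longitude):
        self.latitude, self.longitude = latitude, longitude

def fake_geocode(query):
    known = {'Tokyo Station': FakeLocation(35.6810912, 139.7671861)}
    return known.get(query)

result = geocode_with_fallback(['Tokyo Station, Tokyo, Japan', 'Tokyo Station'], fake_geocode)
print(result)
```

With a real geocoder, one would pass its `geocode` method instead of `fake_geocode` and set `pause_s` to about one second, since the public Nominatim service asks for at most one request per second.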

In [197]:
# make dataframe with station name, latitude and longitude
rows = []

for station in stations_list:
    
    try:
        address = station + ', Tokyo, Japan'
        lat, lng = get_cordinates(address)
    except Exception:
        # fall back to the bare station name if the full address is not found
        address = station
        lat, lng = get_cordinates(address)
    
    mapping = {'station': station.strip(), 'latitude': lat, 'longitude': lng}
    print(mapping)
    rows.append(mapping)

# build the dataframe once; DataFrame.append was removed in pandas 2.0
stations = pd.DataFrame(rows, columns=['station','latitude','longitude'])
    

{'station': 'Adachi-Odai Station', 'latitude': 35.7548009, 'longitude': 139.77040381447955}
{'station': 'Aihara Station', 'latitude': 35.60681905, 'longitude': 139.33168576980717}
{'station': 'Akabane Station', 'latitude': 35.7781394, 'longitude': 139.7207999}
{'station': 'Akabane-iwabuchi Station', 'latitude': 35.7829684, 'longitude': 139.7198532}
{'station': 'Akabanebashi Station', 'latitude': 35.654987, 'longitude': 139.7438912}
{'station': 'Akado-shōgakkōmae Station', 'latitude': 35.7428607, 'longitude': 139.7690418}
{'station': 'Akasaka-mitsuke Station', 'latitude': 35.6782157, 'longitude': 139.7356589}
{'station': 'Akebonobashi Station', 'latitude': 35.6924125, 'longitude': 139.7228564}
{'station': 'Akigawa Station', 'latitude': 35.728075, 'longitude': 139.2866763}
{'station': 'Akihabara Station', 'latitude': 35.69855685, 'longitude': 139.7731417779832}
{'station': 'Akishima Station', 'latitude': 35.70248, 'longitude': 139.350065}
{'station': 'Akitsu Station', 'latitude': 35.7785

{'station': 'Harajuku Station', 'latitude': 35.6709419, 'longitude': 139.7023936}
{'station': 'Hasune Station', 'latitude': 35.785341, 'longitude': 139.6793961}
{'station': 'Hasunuma Station', 'latitude': 35.5642988, 'longitude': 139.7071369}
{'station': 'Hatagaya Station', 'latitude': 35.6772317, 'longitude': 139.6768293}
{'station': 'Hatanodai Station', 'latitude': 35.6048647, 'longitude': 139.7026298}
{'station': 'Hatchōbori Station', 'latitude': 35.674006, 'longitude': 139.7764689}
{'station': 'Hatonosu Station', 'latitude': 35.8150523, 'longitude': 139.1287096}
{'station': 'Hatsudai Station', 'latitude': 35.6815487, 'longitude': 139.6867391}
{'station': 'Hazama Station', 'latitude': 35.6406875, 'longitude': 139.2932569}
{'station': 'Heiwadai Station', 'latitude': 35.7582769, 'longitude': 139.6531337}
{'station': 'Heiwajima Station', 'latitude': 35.5788329, 'longitude': 139.7353178}
{'station': 'Hibarigaoka Station', 'latitude': 35.7515231, 'longitude': 139.5455333}
{'station': 'Hi

{'station': 'Kanamechō Station', 'latitude': 35.7337025, 'longitude': 139.6994752}
{'station': 'Kanda Station', 'latitude': 35.6952811, 'longitude': 139.7704909}
{'station': 'Kanegafuchi Station', 'latitude': 35.7337353, 'longitude': 139.8205233}
{'station': 'Karakida Station', 'latitude': 35.6154702, 'longitude': 139.4118892}
{'station': 'Kasai Station', 'latitude': 35.6636282, 'longitude': 139.8726921}
{'station': 'Kasai-Rinkai Park Station', 'latitude': 35.642210649999996, 'longitude': 139.85948341156484}
{'station': 'Kasuga Station', 'latitude': 35.7084783, 'longitude': 139.7527482}
{'station': 'Kasumigaseki Station', 'latitude': 35.6740542, 'longitude': 139.7509719}
{'station': 'Katakura Station', 'latitude': 35.6398013, 'longitude': 139.3414342}
{'station': 'Kawai Station', 'latitude': 35.813361, 'longitude': 139.1629695}
{'station': 'Kayabachō Station', 'latitude': 35.6807188, 'longitude': 139.7779182}
{'station': 'Keikyū Kamata Station', 'latitude': 35.561334, 'longitude': 139.

{'station': 'Minami-ōsawa Station', 'latitude': 35.6141445, 'longitude': 139.3799374}
{'station': 'Minami-Senju Station', 'latitude': 35.73242555, 'longitude': 139.7988598351285}
{'station': 'Minami-Shinjuku Station', 'latitude': 35.6837333, 'longitude': 139.698905}
{'station': 'Minami-Sunamachi Station', 'latitude': 35.6683869, 'longitude': 139.8318701}
{'station': 'Minami-Tama Station', 'latitude': 35.6492461, 'longitude': 139.489904}
{'station': 'Minamidaira Station', 'latitude': 35.6551434, 'longitude': 139.3925488}
{'station': 'Minowa Station', 'latitude': 35.7294495, 'longitude': 139.7910987}
{'station': 'Minowabashi Station', 'latitude': 35.732138, 'longitude': 139.7916028}
{'station': 'Minumadai-shinsuikōen Station', 'latitude': 35.8140375, 'longitude': 139.7700688}
{'station': 'Mita Station', 'latitude': 35.6470962, 'longitude': 139.7482582}
{'station': 'Mitaka Station', 'latitude': 35.685227, 'longitude': 139.572916}
{'station': 'Mitakadai Station', 'latitude': 35.692157, 'lo

{'station': 'Ōgi-ōhashi Station', 'latitude': 35.7639694, 'longitude': 139.7708262}
{'station': 'Ogikubo Station', 'latitude': 35.7042661, 'longitude': 139.6200273}
{'station': 'Ohanajaya Station', 'latitude': 35.7476118, 'longitude': 139.840248990246}
{'station': 'Ōi Keibajō Mae Station', 'latitude': 35.5951172, 'longitude': 139.7470539}
{'station': 'Ōimachi Station', 'latitude': 35.6064857, 'longitude': 139.7339046}
{'station': 'Ōizumi-gakuen Station', 'latitude': 35.749494150000004, 'longitude': 139.58657634105236}
{'station': 'Ōji Station', 'latitude': 35.7538992, 'longitude': 139.7384535}
{'station': 'Ōji-ekimae Station', 'latitude': 35.7528392, 'longitude': 139.7381333}
{'station': 'Oji-kamiya Station', 'latitude': 35.7650552, 'longitude': 139.7357178}
{'station': 'Ojima Station', 'latitude': 35.6897695, 'longitude': 139.835076}
{'station': 'Okachimachi Station', 'latitude': 35.7069531, 'longitude': 139.7746303}
{'station': 'Oku Station', 'latitude': 35.747265, 'longitude': 139.7

{'station': 'Shin-Akitsu Station', 'latitude': 35.7779146, 'longitude': 139.493322}
{'station': 'Shin-egota Station', 'latitude': 35.7325995, 'longitude': 139.6703696}
{'station': 'Shin-itabashi Station', 'latitude': 35.748744, 'longitude': 139.7199957}
{'station': 'Shin-Kiba Station', 'latitude': 35.6461461, 'longitude': 139.8272864}
{'station': 'Shin-Kodaira Station', 'latitude': 35.731206, 'longitude': 139.4706846}
{'station': 'Shin-koenji Station', 'latitude': 35.6978447, 'longitude': 139.6486908}
{'station': 'Shin-Koganei Station', 'latitude': 35.6959822, 'longitude': 139.5267588}
{'station': 'Shin-Koiwa Station', 'latitude': 35.7168322, 'longitude': 139.858076}
{'station': 'Shin-koshinzuka Station', 'latitude': 35.7407605, 'longitude': 139.7312044}
{'station': 'Shin-Nakano Station', 'latitude': 35.6974599, 'longitude': 139.6686428}
{'station': 'Shin-Nihombashi Station', 'latitude': 35.6886078, 'longitude': 139.7733323}
{'station': 'Shin-Ochanomizu Station', 'latitude': 35.6972185

{'station': 'Tokyo Skytree Station', 'latitude': 35.71005425, 'longitude': 139.81071409992649}
{'station': 'Tokyo Station', 'latitude': 35.6810912, 'longitude': 139.7671861}
{'station': 'Tokyo Teleport Station', 'latitude': 35.6270943, 'longitude': 139.7797852}
{'station': 'Toneri Station', 'latitude': 35.8056634, 'longitude': 139.7701164}
{'station': 'Toneri-kōen Station', 'latitude': 35.7963713, 'longitude': 139.7701613}
{'station': 'Toranomon Hills Station', 'latitude': 35.6668644, 'longitude': 139.7491586731395}
{'station': 'Toranomon Station', 'latitude': 35.6701871, 'longitude': 139.750056}
{'station': 'Toritsu-Daigaku Station', 'latitude': 35.6179196, 'longitude': 139.67647533558412}
{'station': 'Toritsu-Kasei Station', 'latitude': 35.7223382, 'longitude': 139.6450194}
{'station': 'Toshimaen Station', 'latitude': 35.74478045, 'longitude': 139.64521873364365}
{'station': 'Tōyōchō Station', 'latitude': 35.669493, 'longitude': 139.8171938}
{'station': 'Toyoda Station', 'latitude': 

In [198]:
stations

Unnamed: 0,station,latitude,longitude
0,Adachi-Odai Station,35.754801,139.770404
1,Aihara Station,35.606819,139.331686
2,Akabane Station,35.778139,139.720800
3,Akabane-iwabuchi Station,35.782968,139.719853
4,Akabanebashi Station,35.654987,139.743891
...,...,...,...
635,Yūrakuchō Station,35.675844,139.763362
636,Yushima Station,35.707947,139.770056
637,Yūtenji Station,35.637575,139.691007
638,Zoshigaya Station,35.720165,139.714744


In [199]:
stations.shape

(640, 3)

In [200]:
# add the deleted stations back with the correct coordinates
missing_data = pd.DataFrame({'station': ['Itabashi Kuyakushomae Station','Shimo-Shimmei Station',
                                         'Shōin-Jinjamae Station','Yaguchinowatashi Station'],
                             'latitude':[35.751809056628396,35.60887072676263,35.64414268616708,35.562719924859366],
                             'longitude':[139.7097175970459,139.72623176709013,139.65526296820596,139.70029060995995]})

In [201]:
stations = pd.concat([stations, missing_data], ignore_index=True)

In [202]:
stations.shape

(644, 3)

In [203]:
# fix the coordinates of stations that were geocoded incorrectly
# (some station names return coordinates outside of Tokyo)
stations.loc[stations['station']=='Kitami Station','latitude'] = 35.6367473201698
stations.loc[stations['station']=='Kitami Station','longitude'] = 139.58726515779352

stations.loc[stations['station']=='Nagayama Station','latitude'] = 35.63004377163487
stations.loc[stations['station']=='Nagayama Station','longitude'] = 139.44805497428734

stations.loc[stations['station']=='Hino Station','latitude'] = 35.679415964597375
stations.loc[stations['station']=='Hino Station','longitude'] = 139.39403008355146

stations.loc[stations['station']=='Miyanosaka Station','latitude'] = 35.64800354817093 
stations.loc[stations['station']=='Miyanosaka Station','longitude'] = 139.64493159704216

stations.loc[stations['station']=='Kabe Station','latitude'] = 35.784645452875246
stations.loc[stations['station']=='Kabe Station','longitude'] = 139.2846639528668

# the name was changed to 'Ebara-machi Station' before geocoding, so match that spelling
stations.loc[stations['station']=='Ebara-machi Station','latitude'] = 35.60408998318122
stations.loc[stations['station']=='Ebara-machi Station','longitude'] = 139.70756737005686

stations.loc[stations['station']=='Hinode Station','latitude'] = 35.6493961729461
stations.loc[stations['station']=='Hinode Station','longitude'] = 139.75910498355026

stations.loc[stations['station']=='Okusawa Station','latitude'] = 35.60418851332801
stations.loc[stations['station']=='Okusawa Station','longitude'] = 139.67222078169632
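
One way to spot coordinates that were geocoded outside Tokyo is a simple bounding-box check. The sketch below is a rough illustration: the box limits in `TOKYO_BOUNDS` are assumed values chosen to cover mainland Tokyo and are not taken from the notebook, `outside_tokyo` is a hypothetical helper, and the third row is an invented example of a bad result.

```python
# Rough bounding box for mainland Tokyo (assumed values, not from the notebook).
TOKYO_BOUNDS = {'lat': (35.45, 35.95), 'lng': (138.90, 139.95)}

def outside_tokyo(lat, lng, bounds=TOKYO_BOUNDS):
    """True if the point falls outside the rough bounding box."""
    lat_min, lat_max = bounds['lat']
    lng_min, lng_max = bounds['lng']
    return not (lat_min <= lat <= lat_max and lng_min <= lng <= lng_max)

rows = [
    ('Tokyo Station', 35.6810912, 139.7671861),
    ('Hatonosu Station', 35.8150523, 139.1287096),
    ('Misplaced Station', 34.70, 135.50),  # hypothetical wrong geocoding result
]
suspects = [name for name, lat, lng in rows if outside_tokyo(lat, lng)]
print(suspects)
```

The box also covers parts of neighbouring prefectures, so a check like this would only catch gross errors; nearby mismatches still need manual inspection on the map.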


In [205]:
# check for duplicated rows
stations[stations.duplicated()]

Unnamed: 0,station,latitude,longitude
29,Asakusa Station,35.711344,139.798354
617,Waseda Station,35.705768,139.720078


In [206]:
stations.shape

(644, 3)

In [207]:
# drop duplicates
stations = stations.drop_duplicates()

In [208]:
stations.shape

(642, 3)

In [303]:
# we have 642 stations in total
stations

Unnamed: 0,station,latitude,longitude
0,Adachi-Odai Station,35.754801,139.770404
1,Aihara Station,35.606819,139.331686
2,Akabane Station,35.778139,139.720800
3,Akabane-iwabuchi Station,35.782968,139.719853
4,Akabanebashi Station,35.654987,139.743891
...,...,...,...
637,Zōshiki Station,35.550041,139.715202
638,Itabashi Kuyakushomae Station,35.751809,139.709718
639,Shimo-Shimmei Station,35.608871,139.726232
640,Shōin-Jinjamae Station,35.644143,139.655263


In [210]:
# renumber the rows from 0 and discard the old index
stations = stations.reset_index(drop=True)

In [212]:
stations.shape

(642, 3)

In [213]:
# visualize the locations on the map
map_tokyo = folium.Map(location=[35.6226771, 139.7226985],zoom_start=10,tiles='OpenStreetMap')

for station, lat, lng in zip(stations.station,stations.latitude,stations.longitude):
    label = '{},{} {}'.format(station,lat,lng)
    label = folium.Popup(label,parse_html=True)
    
    folium.CircleMarker(
        [lat,lng],
        radius=2,
        popup=label,
        color='#3186cc',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.3,
        parse_html=False).add_to(map_tokyo)
    
map_tokyo

In [214]:
# now the data for the 642 stations is re-indexed
stations.tail()

Unnamed: 0,station,latitude,longitude
637,Zōshiki Station,35.550041,139.715202
638,Itabashi Kuyakushomae Station,35.751809,139.709718
639,Shimo-Shimmei Station,35.608871,139.726232
640,Shōin-Jinjamae Station,35.644143,139.655263
641,Yaguchinowatashi Station,35.56272,139.700291


### 3. Collect the venues/facilities data within a 1000 m radius of each station

In [304]:
search = 'https://api.foursquare.com/v2/venues/search?client_id={}&client_secret={}&v={}'.format(
           CLIENT_ID,CLIENT_SECRET,VERSION)
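
As an alternative to concatenating query strings by hand, the request URL can be assembled with `urllib.parse.urlencode`, which also takes care of escaping. The sketch below is an illustration under assumptions: `build_search_url` is a hypothetical helper, and the credentials and version string are placeholders, not real values.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

BASE_URL = 'https://api.foursquare.com/v2/venues/search'

def build_search_url(lat, lng, category_id, client_id, client_secret, version,
                     radius=1000, limit=100):
    """Assemble the venue search URL for one station (hypothetical helper)."""
    params = {
        'client_id': client_id,
        'client_secret': client_secret,
        'v': version,
        'll': '{},{}'.format(lat, lng),
        'categoryId': category_id,
        'radius': radius,
        'limit': limit,
    }
    return BASE_URL + '?' + urlencode(params)

# placeholder credentials and version string
url = build_search_url(35.6810912, 139.7671861, '4bf58dd8d48988d1fa931735',
                       'YOUR_ID', 'YOUR_SECRET', '20200101')
query = parse_qs(urlsplit(url).query)
print(query['ll'], query['radius'])
```

The same dictionary could instead be handed to `requests.get` through its `params` argument, which performs the encoding itself.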

In [217]:
# define the function to search the venues by categoryId
def get_venues(df,categoryId):     
    
    columns = ['station','id','name','venue_lat','venue_lng','categories']
    rows = []
    
    for st,lat,lng in zip(df['station'],df['latitude'],df['longitude']):
        
        url = search + '&ll={},{}&categoryId={}&radius=1000&limit=100'.format(lat,lng,categoryId)
        try:
            results = requests.get(url).json()['response']['venues']
        except Exception:
            # print the station whose request failed and move on to the next one
            print(st)
            continue
        
        for result in results:
            try:
                category = result['categories'][0]['name']
            except (IndexError, KeyError):
                # skip venues that have no category assigned
                print(st,result['name']+'>> not categorized')
                continue
            rows.append({'station': st,
                         'id': result['id'],
                         'name': result['name'],
                         'venue_lat': result['location']['lat'],
                         'venue_lng': result['location']['lng'],
                         'categories': category})

    # build the dataframe once; DataFrame.append was removed in pandas 2.0
    return pd.DataFrame(rows, columns=columns)

In [218]:
# search the venues by keywords
def get_venues_by_name(df,query):    
    
    columns = ['station','id','name','venue_lat','venue_lng','categories']
    rows = []
    
    for st,lat,lng in zip(df['station'],df['latitude'],df['longitude']):
        
        url = search + '&ll={},{}&query={}&radius=1000&limit=100'.format(lat,lng,query)        
        try:
            results = requests.get(url).json()['response']['venues']
        except Exception:
            # skip this station; otherwise the previous station's results would be reused
            print(st)
            continue
        
        for result in results:
            try:
                category = result['categories'][0]['name']
            except (IndexError, KeyError):
                # skip venues that have no category assigned
                continue
            rows.append({'station': st,
                         'id': result['id'],
                         'name': result['name'],
                         'venue_lat': result['location']['lat'],
                         'venue_lng': result['location']['lng'],
                         'categories': category})

    # build the dataframe once; DataFrame.append was removed in pandas 2.0
    return pd.DataFrame(rows, columns=columns)

In [219]:
# 1. hotels data for each station
hotels = get_venues(stations, '4bf58dd8d48988d1fa931735')

In [220]:
hotels.shape

(10449, 6)

In [221]:
hotels.head()

Unnamed: 0,station,id,name,venue_lat,venue_lng,categories
0,Adachi-Odai Station,4d21b00af7a9a1437e48389f,Ryota Kuga's Guest House,35.746466,139.771546,Hostel
1,Adachi-Odai Station,59a382f5cad1b628d2f030f5,東京ゲストハウス2020,35.743168,139.771271,Hostel
2,Adachi-Odai Station,53f45d47498e66d089216755,Tokyo Guest House B&B Hostel (東京ゲストハウス B&B ホステル),35.743172,139.771286,Hostel
3,Aihara Station,4ce533fa5fce5481a53d5aaa,Laxio-Inn (ホテル ラクシオ・イン),35.610112,139.34474,Hotel
4,Aihara Station,5b6c2cf9e65d0c002ced1699,ホテル anniversary,35.611662,139.343505,Hotel


In [222]:
# check the categories so that venues wrongly assigned to the hotel category can be dropped
hotels['categories'].unique()

array(['Hostel', 'Hotel', 'Bed & Breakfast', 'Motel', 'Café',
       'Other Nightlife', 'Boarding House', 'Hotel Pool',
       'Residential Building (Apartment / Condo)', 'Conference Room',
       'Inn', 'Gym', 'Coworking Space', 'Resort', 'Vacation Rental',
       'Ballroom', 'Event Space', 'Boutique', 'Restaurant',
       'Japanese Restaurant', 'Meeting Room', 'Garden', 'Spa', 'Gay Bar',
       'Housing Development', 'Sauna / Steam Room', 'Pool', 'Office',
       'Other Great Outdoors', 'Trail'], dtype=object)

In [223]:
drop_list = ['Motel','Café','Other Nightlife', 'Boarding House', 'Hotel Pool', 'Conference Room','Gym','Coworking Space',
 'Ballroom','Event Space','Restaurant', 'Japanese Restaurant','Meeting Room','Gay Bar','Housing Development','Pool',
 'Other Great Outdoors', 'Garden','Trail']

In [224]:
for item in drop_list:
    indices = hotels[hotels['categories']==item].index
    hotels.drop(indices,axis=0,inplace=True)

In [225]:
hotels.shape

(9407, 6)

In [226]:
hotels['categories'].unique()

array(['Hostel', 'Hotel', 'Bed & Breakfast',
       'Residential Building (Apartment / Condo)', 'Inn', 'Resort',
       'Vacation Rental', 'Boutique', 'Spa', 'Sauna / Steam Room',
       'Office'], dtype=object)

In [227]:
# classify all the categories above into 'Hotels'
hotels['classes'] = 'Hotels'

In [228]:
hotels.head()

Unnamed: 0,station,id,name,venue_lat,venue_lng,categories,classes
0,Adachi-Odai Station,4d21b00af7a9a1437e48389f,Ryota Kuga's Guest House,35.746466,139.771546,Hostel,Hotels
1,Adachi-Odai Station,59a382f5cad1b628d2f030f5,東京ゲストハウス2020,35.743168,139.771271,Hostel,Hotels
2,Adachi-Odai Station,53f45d47498e66d089216755,Tokyo Guest House B&B Hostel (東京ゲストハウス B&B ホステル),35.743172,139.771286,Hostel,Hotels
3,Aihara Station,4ce533fa5fce5481a53d5aaa,Laxio-Inn (ホテル ラクシオ・イン),35.610112,139.34474,Hotel,Hotels
4,Aihara Station,5b6c2cf9e65d0c002ced1699,ホテル anniversary,35.611662,139.343505,Hotel,Hotels


In [229]:
# 2.Restaurants data for each station
restaurants = get_venues(stations, '4d4b7105d754a06374d81259')

In [230]:
restaurants.shape

(31212, 6)

In [231]:
restaurants.head()

Unnamed: 0,station,id,name,venue_lat,venue_lng,categories
0,Adachi-Odai Station,4ef831942c5b0445017356c3,MOS Burger (モスバーガー),35.755219,139.768133,Fast Food Restaurant
1,Adachi-Odai Station,4d423743c65bf04d8e7a52f4,らーめん蔵芸 足立小台店,35.75522,139.768331,Ramen Restaurant
2,Adachi-Odai Station,5bb1f718f4b525002c5282a2,スエヒロ館,35.764637,139.771255,BBQ Joint
3,Adachi-Odai Station,53abd2c4498e42e280bf66a5,Domino’s Pizza (ドミノ・ピザ西尾久小台店),35.749725,139.76235,Pizza Place
4,Adachi-Odai Station,4da3bf62b521224b37f024ee,中華酒房 湘香坊,35.749268,139.768645,Chinese Restaurant


In [232]:
restaurants['categories'].unique()

array(['Fast Food Restaurant', 'Ramen Restaurant', 'BBQ Joint',
       'Pizza Place', 'Chinese Restaurant', 'Soba Restaurant',
       'Tonkatsu Restaurant', 'Japanese Family Restaurant', 'Café',
       'French Restaurant', 'Sake Bar', 'Tea Room', 'Pastry Shop',
       'Unagi Restaurant', 'Japanese Restaurant', 'Yakitori Restaurant',
       'Candy Store', 'Deli / Bodega', 'Coffee Shop', 'Dessert Shop',
       'Yoshoku Restaurant', 'Italian Restaurant', 'Donburi Restaurant',
       'Sushi Restaurant', 'Indian Restaurant', 'Kebab Restaurant',
       'Bakery', 'Shabu-Shabu Restaurant', 'Fried Chicken Joint',
       'Ice Cream Shop', 'Wagashi Place', 'Burger Joint', 'Snack Place',
       'Burrito Place', 'Wings Joint', 'Restaurant', 'Steakhouse',
       'Noodle House', 'Korean Restaurant', 'Salad Place',
       'Bubble Tea Shop', 'Dumpling Restaurant', 'Event Space',
       'Asian Restaurant', 'Udon Restaurant', 'Sukiyaki Restaurant',
       'Teishoku Restaurant', 'Japanese Curry Restaurant

In [233]:
# drop the miscategorized data
drop_list = ['Art Gallery','Arts & Crafts Store','Assisted Living','Bed & Breakfast','Bike Shop','Bookstore','Boutique','Building',
'Coworking Space','Event Space','Factory','Farm','Flower Shop','Furniture / Home Store','Garden',
 'Gas Station','Gift Shop','Laundromat','Museum','Office','Outdoor Supply Store','Rock Climbing Spot',
 'Stationery Store']

In [234]:
for item in drop_list:
    indices = restaurants[restaurants['categories']==item].index
    restaurants.drop(indices,axis=0,inplace=True)

In [235]:
restaurants.shape

(31131, 6)

In [236]:
restaurants['categories'].unique()

array(['Fast Food Restaurant', 'Ramen Restaurant', 'BBQ Joint',
       'Pizza Place', 'Chinese Restaurant', 'Soba Restaurant',
       'Tonkatsu Restaurant', 'Japanese Family Restaurant', 'Café',
       'French Restaurant', 'Sake Bar', 'Tea Room', 'Pastry Shop',
       'Unagi Restaurant', 'Japanese Restaurant', 'Yakitori Restaurant',
       'Candy Store', 'Deli / Bodega', 'Coffee Shop', 'Dessert Shop',
       'Yoshoku Restaurant', 'Italian Restaurant', 'Donburi Restaurant',
       'Sushi Restaurant', 'Indian Restaurant', 'Kebab Restaurant',
       'Bakery', 'Shabu-Shabu Restaurant', 'Fried Chicken Joint',
       'Ice Cream Shop', 'Wagashi Place', 'Burger Joint', 'Snack Place',
       'Burrito Place', 'Wings Joint', 'Restaurant', 'Steakhouse',
       'Noodle House', 'Korean Restaurant', 'Salad Place',
       'Bubble Tea Shop', 'Dumpling Restaurant', 'Asian Restaurant',
       'Udon Restaurant', 'Sukiyaki Restaurant', 'Teishoku Restaurant',
       'Japanese Curry Restaurant', 'Vietnamese 

In [237]:
# classify all the categories above into 'Restaurants'
restaurants['classes'] = 'Restaurants'

In [238]:
restaurants.head()

Unnamed: 0,station,id,name,venue_lat,venue_lng,categories,classes
0,Adachi-Odai Station,4ef831942c5b0445017356c3,MOS Burger (モスバーガー),35.755219,139.768133,Fast Food Restaurant,Restaurants
1,Adachi-Odai Station,4d423743c65bf04d8e7a52f4,らーめん蔵芸 足立小台店,35.75522,139.768331,Ramen Restaurant,Restaurants
2,Adachi-Odai Station,5bb1f718f4b525002c5282a2,スエヒロ館,35.764637,139.771255,BBQ Joint,Restaurants
3,Adachi-Odai Station,53abd2c4498e42e280bf66a5,Domino’s Pizza (ドミノ・ピザ西尾久小台店),35.749725,139.76235,Pizza Place,Restaurants
4,Adachi-Odai Station,4da3bf62b521224b37f024ee,中華酒房 湘香坊,35.749268,139.768645,Chinese Restaurant,Restaurants


In [239]:
# 3. Convenience stores data for each station
convenience_stores = get_venues(stations, '4d954b0ea243a5684a65b473')

In [240]:
convenience_stores.head()

Unnamed: 0,station,id,name,venue_lat,venue_lng,categories
0,Adachi-Odai Station,4c07c4773f200f4780ff08d1,Lawson (ローソン 東尾久五丁目店),35.749275,139.767812,Convenience Store
1,Adachi-Odai Station,5ba39b0f4b78c5002c68bcfc,7-Eleven (セブンイレブン 荒川東尾久6丁目店),35.749547,139.774679,Convenience Store
2,Adachi-Odai Station,4c62b6cae1621b8d740d2453,7-Eleven (セブンイレブン 東尾久店),35.743864,139.769391,Convenience Store
3,Adachi-Odai Station,58c732d865ca0155761af652,7-Eleven (セブンイレブン 扇大橋インター店),35.761882,139.766965,Convenience Store
4,Adachi-Odai Station,4c42659ada3dc928e184cab9,7-Eleven (セブンイレブン 熊の前店),35.749516,139.768367,Convenience Store


In [241]:
convenience_stores.shape

(22850, 6)

In [242]:
convenience_stores['categories'].unique()

array(['Convenience Store', 'Shopping Mall', 'Supermarket',
       'Herbs & Spices Store', 'Grocery Store', 'Sake Bar',
       'College Bookstore', 'Miscellaneous Shop', 'Liquor Store',
       'Souvenir Shop', 'Farmers Market', 'Cafeteria', 'Deli / Bodega',
       'Café', 'Food & Drink Shop', 'Pharmacy'], dtype=object)

In [243]:
# classify all the categories above into 'Convenience stores'
convenience_stores['classes'] = 'Convenience stores'

In [244]:
convenience_stores.head()

Unnamed: 0,station,id,name,venue_lat,venue_lng,categories,classes
0,Adachi-Odai Station,4c07c4773f200f4780ff08d1,Lawson (ローソン 東尾久五丁目店),35.749275,139.767812,Convenience Store,Convenience stores
1,Adachi-Odai Station,5ba39b0f4b78c5002c68bcfc,7-Eleven (セブンイレブン 荒川東尾久6丁目店),35.749547,139.774679,Convenience Store,Convenience stores
2,Adachi-Odai Station,4c62b6cae1621b8d740d2453,7-Eleven (セブンイレブン 東尾久店),35.743864,139.769391,Convenience Store,Convenience stores
3,Adachi-Odai Station,58c732d865ca0155761af652,7-Eleven (セブンイレブン 扇大橋インター店),35.761882,139.766965,Convenience Store,Convenience stores
4,Adachi-Odai Station,4c42659ada3dc928e184cab9,7-Eleven (セブンイレブン 熊の前店),35.749516,139.768367,Convenience Store,Convenience stores


In [245]:
# 4. money exchange / ATM data for each station
# in Tokyo, international cards can be used only at particular ATMs
# they are reliably accepted at Post Office ATMs, 7-Eleven convenience stores, E.net ATMs at some other convenience stores,
# and Citibank and Prestia ATMs. Citibank and Prestia are excluded from this project as each has only around 10 branches.
# other major banks are progressively introducing machines that accept international cards

In [246]:
# E.net ATMs are located in some of the convenience stores.
# Let's scrape the locations from their website

html = urlopen('https://pkg.navitime.co.jp/enet/address/list?address=13&search=address')
bs = BeautifulSoup(html.read(),'html.parser')
tags = bs.find('ul',{'class':'list-unstyled'}).find_all('a')
temp = []
for tag in tags:
    for i in range(10):
        try:
            html2 = urlopen('https:' + tag.get('href',None) + '&limit=10&page={}'.format(i))
            bs2 = BeautifulSoup(html2.read(),'html.parser')
        except:
            print('no more pages')
            continue

        tags2 = bs2.find_all(['dt','dd'],{'class':['w_5_aroundshop_1_1-spotName',
                                               'w_5_aroundshop_1_1-spotAddress w_5_aroundshop_1_1-spotInfoRight']})
        for tag2 in tags2:
            temp.append(tag2.get_text().strip())      


In [247]:
len(temp)

4068

In [248]:
# each shop contributes two items (name, address); -1 lets NumPy infer the row count (2034 here)
temp = np.array(temp).reshape(-1,2)
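
Reshaping into two columns assumes that names and addresses alternate strictly in the scraped list. The sketch below shows the same pairing with plain Python and an explicit check on that assumption. `pair_up` is a hypothetical helper and the sample values are invented.

```python
def pair_up(flat):
    """Turn [name1, addr1, name2, addr2, ...] into [(name1, addr1), ...]."""
    if len(flat) % 2 != 0:
        raise ValueError('expected an even number of items, got {}'.format(len(flat)))
    return list(zip(flat[0::2], flat[1::2]))

flat = ['Shop A', 'Address A', 'Shop B', 'Address B']  # hypothetical scraped values
pairs = pair_up(flat)
print(pairs)
```

An even length does not prove that the alternation is intact, so spot-checking a few rows of the resulting dataframe is still worthwhile.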

In [249]:
enet = pd.DataFrame(temp,columns=['shops','address'])
enet.head(50)

Unnamed: 0,shops,address
0,生活彩家昭島病院　共同出張所,東京都昭島市中神町１２６０
1,エコスＴＡＩＲＡＹＡ中神　共同出張所,東京都昭島市中神町１３８０－５
2,エコス昭島　共同出張所,東京都昭島市中神町１１４９－１
3,カインズホーム昭島　共同出張所,東京都昭島市つつじが丘２丁目８－５５
4,ファミリーマート西武拝島　共同出張所,東京都昭島市美堀町５丁目２１－２
5,ファミリーマート中神駅前　共同出張所,東京都昭島市朝日町１丁目６－１
6,ファミリーマート昭島諏訪松中通り　共同出張所,東京都昭島市宮沢町４８４－３
7,ファミリーマート昭島東文化通り　共同出張所,東京都昭島市中神町１３８８－３
8,ファミリーマート宮沢町一丁目　共同出張所,東京都昭島市宮沢町１丁目２０－８
9,ファミリーマート昭島中神町　共同出張所,東京都昭島市中神町１２９４－１


In [250]:
# clean up the data from the website
enet['shops'] = enet['shops'].str.replace('　共同出張所','')
enet['shops'] = enet['shops'].str.replace('ファミリーマート','ファミリーマート ')
enet['shops'] = enet['shops'].str.replace('デイリーヤマザキ','デイリーヤマザキ ')
enet

Unnamed: 0,shops,address
0,生活彩家昭島病院,東京都昭島市中神町１２６０
1,エコスＴＡＩＲＡＹＡ中神,東京都昭島市中神町１３８０－５
2,エコス昭島,東京都昭島市中神町１１４９－１
3,カインズホーム昭島,東京都昭島市つつじが丘２丁目８－５５
4,ファミリーマート 西武拝島,東京都昭島市美堀町５丁目２１－２
...,...,...
2029,ファミリーマート 緑が丘駅前,東京都目黒区緑が丘１丁目１１
2030,ファミリーマート 目黒青葉台三丁目,東京都目黒区青葉台３丁目１７－１１
2031,ファミリーマート 小浦目黒青葉台,東京都目黒区青葉台１丁目２２－１０
2032,ファミリーマート 目黒三田通り,東京都目黒区目黒１丁目１－５


In [252]:
# find their locations in the convenience stores data
# if the name matches, collect the matching indices in a temporary list
# (regex=False treats the shop name literally, so special characters in a name cannot break the match)
temp = list()
for shop in enet['shops']:
    temp.append(convenience_stores[convenience_stores['name'].str.contains(shop, regex=False)].index)

In [253]:
enet_indices = []
for lst in temp:
    for index in lst:
        enet_indices.append(index)
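
The E.net names use full-width characters (for example `ＴＡＩＲＡＹＡ`), while venue names from other sources may use ASCII, so a plain substring test can miss matches. The sketch below illustrates one possible mitigation, Unicode NFKC normalization before comparing. `normalize` and `name_matches` are hypothetical helpers, and the second venue name in the example is invented for illustration.

```python
import unicodedata

def normalize(text):
    """NFKC-normalize so full-width letters, digits and spaces compare equal to ASCII ones."""
    return unicodedata.normalize('NFKC', text).strip()

def name_matches(shop, venue_name):
    """True if the normalized shop name occurs inside the normalized venue name."""
    return normalize(shop) in normalize(venue_name)

print(normalize('エコスＴＡＩＲＡＹＡ中神'))
print(name_matches('ファミリーマート 町屋八丁目', 'FamilyMart (ファミリーマート 町屋八丁目店)'))
# hypothetical venue name written with ASCII letters
print(name_matches('エコスＴＡＩＲＡＹＡ中神', 'エコスTAIRAYA中神店'))
```

This is not what the notebook does; it only shows how the matching could be made less sensitive to character width.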

In [254]:
# select the matching rows using the indices above
enet_shop = convenience_stores.loc[enet_indices]

In [255]:
enet_shop[enet_shop.duplicated()]

Unnamed: 0,station,id,name,venue_lat,venue_lng,categories,classes
389,Akishima Station,5950fa43bed48303eb487588,FamilyMart (ファミリーマート 昭島田中町団地入口店),35.703215,139.351975,Convenience Store,Convenience stores
793,Arakawa-nanachōme Station,4c2adacc8ef52d7fae9030ba,FamilyMart (ファミリーマート 町屋八丁目店),35.748861,139.786568,Convenience Store,Convenience stores
856,Arakawa-nichōme Station,4c2adacc8ef52d7fae9030ba,FamilyMart (ファミリーマート 町屋八丁目店),35.748861,139.786568,Convenience Store,Convenience stores
9573,Machiya Station,4c2adacc8ef52d7fae9030ba,FamilyMart (ファミリーマート 町屋八丁目店),35.748861,139.786568,Convenience Store,Convenience stores
9610,Machiya-nichōme Station,4c2adacc8ef52d7fae9030ba,FamilyMart (ファミリーマート 町屋八丁目店),35.748861,139.786568,Convenience Store,Convenience stores
...,...,...,...,...,...,...,...
2216,Fuchū Station,51e9145c498ea1fb03288d63,FamilyMart (ファミリーマート 府中南町二丁目店),35.658616,139.465873,Convenience Store,Convenience stores
11425,Nakagawara Station,51e9145c498ea1fb03288d63,FamilyMart (ファミリーマート 府中南町二丁目店),35.658616,139.465873,Convenience Store,Convenience stores
2579,Gaiemmae Station,4e0c3a60227100f4dfdf656a,FamilyMart (ファミリーマート 南青山三丁目店),35.664910,139.714015,Convenience Store,Convenience stores
14207,Omotesandō Station,4e0c3a60227100f4dfdf656a,FamilyMart (ファミリーマート 南青山三丁目店),35.664910,139.714015,Convenience Store,Convenience stores


In [256]:
enet_shop = enet_shop.drop_duplicates()

In [257]:
# also get the 7-Eleven data as they have international ATMs
atms = convenience_stores.loc[convenience_stores['name'].str.match('.*7-Eleven.*')]
atms.shape

(7803, 7)

In [258]:
# add them together
atms = pd.concat([atms,enet_shop],ignore_index=True)
atms.shape

(11394, 7)

In [259]:
# get the money exchange office data
exchanges = get_venues(stations, '5744ccdfe4b0c0459246b4be')
exchanges

Unnamed: 0,station,id,name,venue_lat,venue_lng,categories
0,Akabanebashi Station,5e7437513fe81d0008e93ffe,ワールドワイドマネー 六本木店,35.662678,139.733150,Currency Exchange
1,Akasaka-mitsuke Station,5052a8f3e4b0e7ceb59a27dc,Travelex,35.677289,139.735592,Currency Exchange
2,Akihabara Station,58b3e3d0e6160c515f6ca544,株式会社エーファクトリー,35.696531,139.773288,Office
3,Akihabara Station,4c62a438eb82d13ad9c805d6,Travelex,35.699136,139.774540,Currency Exchange
4,Akihabara Station,5e7b1078f6847800081f38c2,Adores Exchange Akihabara,35.698704,139.771550,Currency Exchange
...,...,...,...,...,...,...
588,Yushima Station,5858a59e2eb9792b4e89e52a,Ninja Money Exchange,35.699593,139.774920,Currency Exchange
589,Yushima Station,5e7b1078f6847800081f38c2,Adores Exchange Akihabara,35.698704,139.771550,Currency Exchange
590,Yushima Station,599e8b0cdb1d817a921ec903,インターバンク,35.699508,139.774818,Currency Exchange
591,Zoshigaya Station,5e2950390f291f0009aea87a,外貨両替マーケット,35.730305,139.710070,Currency Exchange


In [260]:
# check the categories to find wrongly categorized data
exchanges['categories'].unique()

array(['Currency Exchange', 'Office', 'Thrift / Vintage Store',
       'Gift Shop'], dtype=object)

In [261]:
# drop the venues wrongly categorized as 'Office'
wrong_items = exchanges[exchanges['categories']=='Office'].index
exchanges.drop(wrong_items, inplace=True)

In [262]:
# add them together with atms data
atms = pd.concat([atms,exchanges],ignore_index=True)
atms.shape

(11970, 7)

In [263]:
# get the post office data as their ATMs accept international cards
post_offices = get_venues(stations, '4bf58dd8d48988d172941735')
post_offices

Unnamed: 0,station,id,name,venue_lat,venue_lng,categories
0,Adachi-Odai Station,4d40e776c1d4721eba1015c7,熊野前郵便局,35.749403,139.769332,Post Office
1,Adachi-Odai Station,4cbbd3f74495721ebeb6577a,Arakawa Nishi-Oku 3 Post Office (荒川西尾久三郵便局),35.752648,139.762153,Post Office
2,Adachi-Odai Station,4ca93eaab0b8236acbe2bbe6,荒川西尾久二郵便局,35.747636,139.762890,Post Office
3,Adachi-Odai Station,4e6050441838ad3d0ddb794e,荒川東尾久二郵便局,35.743425,139.770816,Post Office
4,Adachi-Odai Station,4d520e80dcce224b05bee81b,荒川東尾久六郵便局,35.746467,139.774986,Post Office
...,...,...,...,...,...,...
5972,Yaguchinowatashi Station,4d92965862ad54816a17704b,大田西六郷郵便局,35.554209,139.706888,Post Office
5973,Yaguchinowatashi Station,4fe17fede4b0cc64d722fc1c,千鳥町駅前郵便局,35.572426,139.693435,Post Office
5974,Yaguchinowatashi Station,54798022498e3c86ae93468b,大田東矢口三郵便局,35.564042,139.708756,Post Office
5975,Yaguchinowatashi Station,4dfabef288775f8b52f71f15,大田池上六郵便局,35.567304,139.703972,Post Office


In [264]:
# drop the wrongly categorized data
wrong_items = post_offices[post_offices['categories']=='Building'].index
post_offices.drop(wrong_items,inplace=True)
post_offices['categories'].unique()

array(['Post Office', 'Shipping Store'], dtype=object)

In [265]:
wrong_items = post_offices[post_offices['categories']=='Shipping Store'].index
post_offices.drop(wrong_items,inplace=True)
post_offices['categories'].unique()

array(['Post Office'], dtype=object)

In [266]:
# add them together with atms data
atms = pd.concat([atms,post_offices],ignore_index=True)
atms.shape

(17944, 7)

In [267]:
atms.head()

Unnamed: 0,station,id,name,venue_lat,venue_lng,categories,classes
0,Adachi-Odai Station,5ba39b0f4b78c5002c68bcfc,7-Eleven (セブンイレブン 荒川東尾久6丁目店),35.749547,139.774679,Convenience Store,Convenience stores
1,Adachi-Odai Station,4c62b6cae1621b8d740d2453,7-Eleven (セブンイレブン 東尾久店),35.743864,139.769391,Convenience Store,Convenience stores
2,Adachi-Odai Station,58c732d865ca0155761af652,7-Eleven (セブンイレブン 扇大橋インター店),35.761882,139.766965,Convenience Store,Convenience stores
3,Adachi-Odai Station,4c42659ada3dc928e184cab9,7-Eleven (セブンイレブン 熊の前店),35.749516,139.768367,Convenience Store,Convenience stores
4,Adachi-Odai Station,5bd06eae65211f0039d78736,7-Eleven (セブンイレブン 扇大橋駅前店),35.76403,139.770513,Convenience Store,Convenience stores


In [268]:
# let's put all of them in the 'ATMs/Exchanges' class
atms['classes'] = 'ATMs/Exchanges'
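
The reason the `classes` column is overwritten at this point can be shown on a toy example. The frames below are hypothetical; the sketch only assumes, as in the outputs above, that some of the concatenated frames have a `classes` column and others do not.

```python
import pandas as pd

# One frame already carries a 'classes' column, the other does not (made-up rows).
with_classes = pd.DataFrame({'name': ['7-Eleven A'],
                             'categories': ['Convenience Store'],
                             'classes': ['Convenience stores']})
without_classes = pd.DataFrame({'name': ['Travelex'],
                                'categories': ['Currency Exchange']})

# pd.concat aligns on column names; the missing 'classes' values become NaN.
combined = pd.concat([with_classes, without_classes], ignore_index=True)
missing_before = combined['classes'].isna().sum()

# Assigning a scalar fills every row, replacing both the old label and the NaN.
combined['classes'] = 'ATMs/Exchanges'
print(missing_before, combined['classes'].unique())
```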

In [269]:
atms

Unnamed: 0,station,id,name,venue_lat,venue_lng,categories,classes
0,Adachi-Odai Station,5ba39b0f4b78c5002c68bcfc,7-Eleven (セブンイレブン 荒川東尾久6丁目店),35.749547,139.774679,Convenience Store,ATMs/Exchanges
1,Adachi-Odai Station,4c62b6cae1621b8d740d2453,7-Eleven (セブンイレブン 東尾久店),35.743864,139.769391,Convenience Store,ATMs/Exchanges
2,Adachi-Odai Station,58c732d865ca0155761af652,7-Eleven (セブンイレブン 扇大橋インター店),35.761882,139.766965,Convenience Store,ATMs/Exchanges
3,Adachi-Odai Station,4c42659ada3dc928e184cab9,7-Eleven (セブンイレブン 熊の前店),35.749516,139.768367,Convenience Store,ATMs/Exchanges
4,Adachi-Odai Station,5bd06eae65211f0039d78736,7-Eleven (セブンイレブン 扇大橋駅前店),35.764030,139.770513,Convenience Store,ATMs/Exchanges
...,...,...,...,...,...,...,...
17939,Yaguchinowatashi Station,4d92965862ad54816a17704b,大田西六郷郵便局,35.554209,139.706888,Post Office,ATMs/Exchanges
17940,Yaguchinowatashi Station,4fe17fede4b0cc64d722fc1c,千鳥町駅前郵便局,35.572426,139.693435,Post Office,ATMs/Exchanges
17941,Yaguchinowatashi Station,54798022498e3c86ae93468b,大田東矢口三郵便局,35.564042,139.708756,Post Office,ATMs/Exchanges
17942,Yaguchinowatashi Station,4dfabef288775f8b52f71f15,大田池上六郵便局,35.567304,139.703972,Post Office,ATMs/Exchanges


In [270]:
# 5. Cafes data for each station
cafes = get_venues(stations, '4bf58dd8d48988d16d941735')

Tabata Station


In [271]:
cafes.head()

Unnamed: 0,station,id,name,venue_lat,venue_lng,categories
0,Adachi-Odai Station,5ef8776118cbdf0008d52f81,Arakawa Ii Village,35.751899,139.765807,Souvenir Shop
1,Adachi-Odai Station,50a1bb43e4b0e39ceee652da,炭火煎珈琲 尾久珈琲亭,35.747942,139.765245,Café
2,Adachi-Odai Station,5666633a498e323a836a8db6,ナチュラルカフェ こひきや,35.746309,139.768221,Café
3,Adachi-Odai Station,5dd0cc8e0e96d2000826fbd1,生活茶屋,35.750257,139.762223,Café
4,Adachi-Odai Station,5f793f8a98df6d72aad9fa91,ワッゼカフェ Wazze Cafe,35.748853,139.763229,Café


In [272]:
cafes.shape

(22015, 6)

In [273]:
cafes['categories'].unique()

array(['Souvenir Shop', 'Café', 'Bakery', 'Coffee Shop', 'Tea Room',
       'Taiwanese Restaurant', 'Pastry Shop', 'Bistro', 'Bookstore',
       'Cupcake Shop', 'Dessert Shop', 'American Restaurant', 'Hostel',
       'Event Space', 'Bar', 'Vegetarian / Vegan Restaurant',
       'Ice Cream Shop', 'Italian Restaurant', 'Donut Shop', 'Dog Run',
       'French Restaurant', 'Hotel Bar', 'Restaurant', 'Salad Place',
       'Jazz Club', 'Theme Restaurant', 'Beer Bar', 'Rock Club',
       'Food Truck', 'Museum', 'Australian Restaurant', 'Pizza Place',
       'Japanese Curry Restaurant', 'Burger Joint', 'Breakfast Spot',
       'Bubble Tea Shop', 'Art Gallery', 'Japanese Restaurant',
       'Furniture / Home Store', 'Cafeteria', 'BBQ Joint', 'Juice Bar',
       'Teishoku Restaurant', 'Wagashi Place', 'Gourmet Shop',
       'Chocolate Shop', 'Coworking Space', 'Lounge', 'Food', 'Bike Shop',
       'Deli / Bodega', 'Wine Bar', 'Hookah Bar', 'Pie Shop',
       'Bagel Shop', 'Cosmetics Shop', 'Cand

In [274]:
# drop miscategorized data
drop_list = ['Souvenir Shop', 'Taiwanese Restaurant', 'Bistro', 
       'Bookstore', 'Hostel',
       'Event Space', 
       'Vegetarian / Vegan Restaurant', 
       'Jazz Club','Beer Bar', 'Rock Club',
       'Food Truck',  'Museum',
       'Japanese Curry Restaurant', 'Pizza Place',
       'Art Gallery', 'Japanese Restaurant', 'Furniture / Home Store', 
       'BBQ Joint', 'Gourmet Shop',
       'Teishoku Restaurant', 'Coworking Space',
       'Lounge', 'Food', 'Bike Shop', 'Deli / Bodega', 'Wine Bar',
       'Pie Shop',  'Cosmetics Shop',
       'Candy Store', 'Food & Drink Shop',
       'Gelato Shop',  'Wedding Hall', 'Antique Shop',
       'Hot Dog Joint', 'Yakitori Restaurant', 'Sports Bar', 'Boutique',
       'Gift Shop', 'Asian Restaurant', 'Wine Shop', 'Arcade',
       'Laundromat', 'Hotel', 'Golf Driving Range',
       'Soba Restaurant', 'Cocktail Bar',
       'Clothing Store', 'Turkish Restaurant', 'Nightclub',
       'Diner', 'Laundry Service', 'Music Venue', 'Gastropub',
       'Yoshoku Restaurant', 
       'Arts & Crafts Store', 'College Cafeteria',
       'Massage Studio', 'Farmers Market', 'Salon / Barbershop',
       'Assisted Living', 'Sake Bar', 'Beer Garden',
       'Mediterranean Restaurant', 'Organic Grocery', 'Brasserie',
       'Tapas Restaurant', 'Business Center', 
       'Fish & Chips Shop', 'Korean Restaurant', 'Flower Shop',
       'Recruiting Agency', 'New American Restaurant',
       'Corporate Cafeteria', 'Karaoke Bar', 
       'Grocery Store', 'Gym / Fitness Center', 'Soup Place',
       'Community College', 'Spanish Restaurant', 'Inn',
       'Rock Climbing Spot', 'Garden Center', 'Steakhouse', 'Food Court',
       "Women's Store", 'Seafood Restaurant', 'Gas Station', 'Food Stand',
       'Toy / Game Store']

In [275]:
for item in drop_list:
    indices = cafes[cafes['categories']==item].index
    cafes.drop(indices,axis=0,inplace=True)
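
The drop loop above removes one category at a time. As a rough sketch of an equivalent single-step filter, `Series.isin` can build one boolean mask for the whole list. The `venues` frame and `unwanted` list here are made up for illustration.

```python
import pandas as pd

# Made-up venue data with two categories we do not want to keep.
venues = pd.DataFrame({
    'name': ['A', 'B', 'C', 'D'],
    'categories': ['Café', 'Bookstore', 'Coffee Shop', 'Museum'],
})
unwanted = ['Bookstore', 'Museum']

# Keep the rows whose category is NOT in the unwanted list.
# .copy() gives an independent frame, so adding columns later is safe.
kept = venues[~venues['categories'].isin(unwanted)].copy()
print(kept)
```

Like the loop, this keeps the original row labels, so the first remaining row may not have index 0.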

In [276]:
cafes.shape

(21095, 6)

In [277]:
cafes['categories'].unique()

array(['Café', 'Bakery', 'Coffee Shop', 'Tea Room', 'Pastry Shop',
       'Cupcake Shop', 'Dessert Shop', 'American Restaurant', 'Bar',
       'Ice Cream Shop', 'Italian Restaurant', 'Donut Shop', 'Dog Run',
       'French Restaurant', 'Hotel Bar', 'Restaurant', 'Salad Place',
       'Theme Restaurant', 'Australian Restaurant', 'Burger Joint',
       'Breakfast Spot', 'Bubble Tea Shop', 'Cafeteria', 'Juice Bar',
       'Wagashi Place', 'Chocolate Shop', 'Hookah Bar', 'Bagel Shop',
       'Pub', 'Thai Restaurant', 'Sandwich Place', 'Ramen Restaurant',
       'Hawaiian Restaurant', 'Gaming Cafe', 'Fast Food Restaurant',
       'Snack Place', 'Caribbean Restaurant', 'Office',
       'Chinese Restaurant', 'Indian Restaurant', 'Creperie', 'Pet Café',
       'Internet Cafe'], dtype=object)

In [278]:
# classify all the categories above into 'Cafes'
cafes['classes'] = 'Cafes'

In [279]:
cafes.head()

Unnamed: 0,station,id,name,venue_lat,venue_lng,categories,classes
1,Adachi-Odai Station,50a1bb43e4b0e39ceee652da,炭火煎珈琲 尾久珈琲亭,35.747942,139.765245,Café,Cafes
2,Adachi-Odai Station,5666633a498e323a836a8db6,ナチュラルカフェ こひきや,35.746309,139.768221,Café,Cafes
3,Adachi-Odai Station,5dd0cc8e0e96d2000826fbd1,生活茶屋,35.750257,139.762223,Café,Cafes
4,Adachi-Odai Station,5f793f8a98df6d72aad9fa91,ワッゼカフェ Wazze Cafe,35.748853,139.763229,Café,Cafes
5,Adachi-Odai Station,4ef27de16c253bf123ed2d97,MIYOSHI,35.747773,139.765286,Café,Cafes


In [280]:
# 6. Parks data for each station
parks = get_venues(stations, '4bf58dd8d48988d163941735')

In [281]:
parks.head()

Unnamed: 0,station,id,name,venue_lat,venue_lng,categories
0,Adachi-Odai Station,4c0e8088cd8eb713899fef94,荒川区立 尾久第五児童遊園,35.750994,139.768675,Playground
1,Adachi-Odai Station,4d1de403d4fa2d43cef6fb4d,足立区立 小台東公園,35.754538,139.774551,Park
2,Adachi-Odai Station,4b55a323f964a520f5e927e3,Ogunohara Park (尾久の原公園),35.749925,139.776113,Park
3,Adachi-Odai Station,4cb52adde262b60c0dee65e0,荒川区立 尾久八幡公園,35.750818,139.764487,Park
4,Adachi-Odai Station,4bc12d1974a9a593a085d1f6,荒川区少年運動場野球場,35.755687,139.772325,Park


In [282]:
parks.shape

(19849, 6)

In [283]:
parks['categories'].unique()

array(['Playground', 'Park', 'Tree', 'Mountain', 'Event Space',
       'Monument / Landmark', 'Other Great Outdoors', 'Fountain', 'Plaza',
       'Bridge', 'Tennis Court', 'Field', 'Lake', 'Trail', 'Garden',
       'Waterfront', 'Historic Site', 'Castle', 'Track', 'Pier', 'Office',
       'Parking', 'Shopping Mall', 'Bike Trail', 'Athletics & Sports',
       'Soccer Field', 'Other Nightlife', 'Water Park', 'Dog Run',
       'Forest', 'Cemetery', 'Outdoor Event Space', 'Scenic Lookout',
       'Campground', 'Art Gallery', 'Sculpture Garden', 'Zoo'],
      dtype=object)

In [284]:
# drop some data as it is wrongly categorized
drop_list = ['Event Space','Office','Other Nightlife','Parking','Cemetery','Art Gallery']

In [285]:
for item in drop_list:
    indices = parks[parks['categories']==item].index
    parks.drop(indices,axis=0,inplace=True)

In [286]:
parks.shape

(19813, 6)

In [287]:
parks['categories'].unique()

array(['Playground', 'Park', 'Tree', 'Mountain', 'Monument / Landmark',
       'Other Great Outdoors', 'Fountain', 'Plaza', 'Bridge',
       'Tennis Court', 'Field', 'Lake', 'Trail', 'Garden', 'Waterfront',
       'Historic Site', 'Castle', 'Track', 'Pier', 'Shopping Mall',
       'Bike Trail', 'Athletics & Sports', 'Soccer Field', 'Water Park',
       'Dog Run', 'Forest', 'Outdoor Event Space', 'Scenic Lookout',
       'Campground', 'Sculpture Garden', 'Zoo'], dtype=object)

In [288]:
# classify all the categories above into 'Parks'
parks['classes'] = 'Parks'

In [289]:
parks.head()

Unnamed: 0,station,id,name,venue_lat,venue_lng,categories,classes
0,Adachi-Odai Station,4c0e8088cd8eb713899fef94,荒川区立 尾久第五児童遊園,35.750994,139.768675,Playground,Parks
1,Adachi-Odai Station,4d1de403d4fa2d43cef6fb4d,足立区立 小台東公園,35.754538,139.774551,Park,Parks
2,Adachi-Odai Station,4b55a323f964a520f5e927e3,Ogunohara Park (尾久の原公園),35.749925,139.776113,Park,Parks
3,Adachi-Odai Station,4cb52adde262b60c0dee65e0,荒川区立 尾久八幡公園,35.750818,139.764487,Park,Parks
4,Adachi-Odai Station,4bc12d1974a9a593a085d1f6,荒川区少年運動場野球場,35.755687,139.772325,Park,Parks


In [324]:
# 7. Pharmacy data for each station
pharmacies = get_venues(stations,'4bf58dd8d48988d10f951735')

In [291]:
pharmacies.head()

Unnamed: 0,station,id,name,venue_lat,venue_lng,categories
0,Adachi-Odai Station,5562da4a498e84500d10f8df,くまのまえ薬局,35.749271,139.769318,Pharmacy
1,Adachi-Odai Station,4efbe47c9a52109189216aad,尾久ミキ薬局,35.748506,139.765255,Pharmacy
2,Adachi-Odai Station,5f5b2f627ca2954a056c142b,西尾久薬局,35.74799,139.76525,Pharmacy
3,Adachi-Odai Station,50416013e4b0b9943deb8bc5,入江薬局 トダカコーポ店,35.747744,139.769613,Pharmacy
4,Adachi-Odai Station,4cb9086690c9a1430eae84d6,せきぐち薬局,35.749843,139.761833,Pharmacy


In [325]:
pharmacies.shape

(17513, 6)

In [326]:
pharmacies['categories'].unique()

array(['Pharmacy', 'Drugstore', 'Convenience Store', 'Supermarket',
       'Medical Center', 'Cosmetics Shop', 'Health & Beauty Service',
       'Hospital', 'Grocery Store', 'Optical Shop', 'Museum',
       'Miscellaneous Shop', "Doctor's Office"], dtype=object)

In [327]:
drop_list = ['Cosmetics Shop','Doctor\'s Office','Grocery Store','Optical Shop','Supermarket','Health & Beauty Service']

In [328]:
for item in drop_list:
    indices = pharmacies[pharmacies['categories']==item].index
    pharmacies.drop(indices,axis=0,inplace=True)

In [329]:
pharmacies.shape

(17423, 6)

In [330]:
pharmacies['categories'].unique()

array(['Pharmacy', 'Drugstore', 'Convenience Store', 'Medical Center',
       'Hospital', 'Museum', 'Miscellaneous Shop'], dtype=object)

In [331]:
# classify all the categories above into 'Pharmacies'
pharmacies['classes'] = 'Pharmacies'

In [332]:
pharmacies.head()

Unnamed: 0,station,id,name,venue_lat,venue_lng,categories,classes
0,Adachi-Odai Station,5562da4a498e84500d10f8df,くまのまえ薬局,35.749271,139.769318,Pharmacy,Pharmacies
1,Adachi-Odai Station,4efbe47c9a52109189216aad,尾久ミキ薬局,35.748506,139.765255,Pharmacy,Pharmacies
2,Adachi-Odai Station,5f5b2f627ca2954a056c142b,西尾久薬局,35.74799,139.76525,Pharmacy,Pharmacies
3,Adachi-Odai Station,588adb68b3cdc8644e2a3108,さくら薬局,35.74721,139.776403,Pharmacy,Pharmacies
4,Adachi-Odai Station,5b9e00159ef8ef00396c4fd9,スギ薬局 東尾久店,35.743057,139.768795,Pharmacy,Pharmacies


In [301]:
# 8. Tourist information centers for each station (search by category name)
tourist_info_centers = get_venues_by_name(stations,'Tourist Information Center')

In [305]:
tourist_info_centers.shape

(411, 6)

In [306]:
tourist_info_centers['categories'].unique()

array(['Tourist Information Center', 'Other Repair Shop', 'Art Gallery',
       'Office', 'Student Center', 'College Academic Building',
       'IT Services', 'Government Building'], dtype=object)

In [307]:
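# extract only the Tourist Information Center rows
# (.copy() makes an independent frame, so adding the 'classes' column afterwards does not raise SettingWithCopyWarning)
tourist_info_centers = tourist_info_centers[tourist_info_centers['categories']=='Tourist Information Center'].copy()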

In [308]:
tourist_info_centers['classes'] =  'Tourist Information Center'

In [309]:
tourist_info_centers.head()

Unnamed: 0,station,id,name,venue_lat,venue_lng,categories,classes
0,Akihabara Station,4d3598a9b5c78eec02b94dbf,秋葉原観光情報センター (Akihabara Tourist Information Cen...,35.69808,139.773717,Tourist Information Center,Tourist Information Center
1,Akihabara Station,5bc8152866fc65002c06ae49,Akihabara Tourist Information Center (秋葉原観光情報セ...,35.69794,139.77422,Tourist Information Center,Tourist Information Center
2,Asakusa Station,4e40a74ee4cd4324642f4914,Asakusa Culture Tourist Information Center (浅草...,35.71082,139.796405,Tourist Information Center,Tourist Information Center
3,Asakusabashi Station,4d3598a9b5c78eec02b94dbf,秋葉原観光情報センター (Akihabara Tourist Information Cen...,35.69808,139.773717,Tourist Information Center,Tourist Information Center
4,Asakusabashi Station,5bc8152866fc65002c06ae49,Akihabara Tourist Information Center (秋葉原観光情報セ...,35.69794,139.77422,Tourist Information Center,Tourist Information Center


In [333]:
# combine all the dataframes
data = pd.concat([hotels, restaurants,convenience_stores,atms,cafes,parks,pharmacies,tourist_info_centers],ignore_index=True)

In [334]:
data.shape

(139992, 7)

In [343]:
data

Unnamed: 0,station,id,name,venue_lat,venue_lng,categories,classes
0,Adachi-Odai Station,4d21b00af7a9a1437e48389f,Ryota Kuga's Guest House,35.746466,139.771546,Hostel,Hotels
1,Adachi-Odai Station,59a382f5cad1b628d2f030f5,東京ゲストハウス2020,35.743168,139.771271,Hostel,Hotels
2,Adachi-Odai Station,53f45d47498e66d089216755,Tokyo Guest House B&B Hostel (東京ゲストハウス B&B ホステル),35.743172,139.771286,Hostel,Hotels
3,Aihara Station,4ce533fa5fce5481a53d5aaa,Laxio-Inn (ホテル ラクシオ・イン),35.610112,139.344740,Hotel,Hotels
4,Aihara Station,5b6c2cf9e65d0c002ced1699,ホテル anniversary,35.611662,139.343505,Hotel,Hotels
...,...,...,...,...,...,...,...
139987,Yushima Station,5b0e7ba6a2a6ce002c79fc2b,Ueno Information Center,35.710779,139.775775,Tourist Information Center,Tourist Information Center
139988,Yushima Station,4ce14895c9a0a0903596246a,Tokyo Tourist Information Center (東京観光情報センター),35.710783,139.773482,Tourist Information Center,Tourist Information Center
139989,Yushima Station,57368551498ed9b3ec5509db,Park Information Center (公園案内所),35.714783,139.775900,Tourist Information Center,Tourist Information Center
139990,Yushima Station,5e4256344c4a85000888edc8,General Information Center (総合案内所),35.716144,139.772131,Tourist Information Center,Tourist Information Center
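
Since the goal is to know how many basic venues each station has within walking distance, one possible way to summarize a frame shaped like `data` is a station-by-class count table. This is only a sketch on made-up rows, and it is not necessarily how the later analysis does it.

```python
import pandas as pd

# Made-up rows using the same 'station' and 'classes' columns as `data`.
sample = pd.DataFrame({
    'station': ['Akihabara Station', 'Akihabara Station',
                'Akihabara Station', 'Yushima Station'],
    'classes': ['Cafes', 'Cafes', 'Parks', 'Cafes'],
})

# Rows are stations, columns are venue classes, cells are venue counts.
# Combinations that never occur are filled with 0.
counts = pd.crosstab(sample['station'], sample['classes'])
print(counts)
```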


### 4. Convert the dataframe to csv file 

In [336]:
# write the combined data to a csv file (index=False leaves out the row index)
data.to_csv('all_data.csv',index=False)

In [337]:
# convert stations data to csv file
stations.to_csv('stations.csv',index=False)
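
A quick way to check an export like the ones above is to read it back and compare columns and shape. The sketch below uses an in-memory buffer and a made-up one-row frame instead of the real `all_data.csv`.

```python
import io
import pandas as pd

# Made-up frame with a few of the columns used in this notebook.
frame = pd.DataFrame({'station': ['Adachi-Odai Station'],
                      'venue_lat': [35.746466],
                      'classes': ['Hotels']})

# Write to an in-memory text buffer instead of a file on disk.
buffer = io.StringIO()
frame.to_csv(buffer, index=False)

# Rewind and read it back; with index=False no extra index column is written.
buffer.seek(0)
restored = pd.read_csv(buffer)
print(restored.columns.tolist(), restored.shape)
```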