# Data
To carry out the idea of categorizing the neighborhoods of central Osaka and finding a good place to open a coffee shop, I used the following three data sources.

## Data sources  

- **Foursquare API**  
As we learned in this module, I used Foursquare location data via its API.

- **Address**   
A list of Osaka neighborhoods is required. It is available from an open database: [http://jusyo.jp/csv/new.php](http://jusyo.jp/csv/new.php).
This website provides every Japanese address with its postal code, as one ZIP file per prefecture (sorry, it is only available in Japanese!). I downloaded only the Osaka file using the `wget` command.

- **Geocoding**    
To get coordinates, I used the [geocoder](https://www.geocoding.jp/) API. It returns the latitude and longitude for a given address (again, addresses are written in Japanese, sorry!).
This [reference](https://qiita.com/paulxll/items/7bc4a5b0529a8d784673) helped me use the API.


## 1. Get addresses

First, let's get the addresses for all of Osaka Prefecture.

In [None]:
!wget -q -O csv_27osaka.zip http://jusyo.jp/downloads/new/csv/csv_27osaka.zip

# extract the Osaka Prefecture CSV from the downloaded archive
import zipfile
with zipfile.ZipFile('csv_27osaka.zip') as myzip:
    myzip.extract('27osaka.csv')
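The `!wget` line is IPython shell syntax and only works where `wget` is installed. A minimal pure-Python sketch of the same download-and-extract step, assuming (as above) that the archive contains a member named `27osaka.csv`; the helper names `download_zip` and `read_member` are hypothetical and not part of the original notebook.

```python
import io
import urllib.request
import zipfile


def download_zip(url: str, timeout: float = 30.0) -> bytes:
    """Fetch a ZIP archive over HTTP and return its raw bytes (hypothetical helper)."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read()


def read_member(zip_bytes: bytes, member: str) -> bytes:
    """Return the raw bytes of one file stored inside an in-memory ZIP archive."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        return archive.read(member)


# Usage (requires network access):
# raw = read_member(
#     download_zip('http://jusyo.jp/downloads/new/csv/csv_27osaka.zip'),
#     '27osaka.csv',
# )
# with open('27osaka.csv', 'wb') as f:
#     f.write(raw)
```

Keeping the archive in memory avoids writing the ZIP file to disk; only the extracted CSV needs to be saved.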

In [3]:
import pandas as pd
import codecs

In [4]:
# The file is Shift-JIS encoded, so pd.read_csv with its default (UTF-8) encoding fails.
# Decode it with codecs first, ignoring any bytes that cannot be decoded.
with codecs.open('27osaka.csv', "r", "Shift-JIS", "ignore") as file:
    df = pd.read_csv(file)

# drop addresses of specific business offices (rows where 事業所名, "office name", is set)
df = df[df['事業所名'].isna()]
# keep 郵便番号 (postal code), 市区町村 (city/ward) and 町域 (neighborhood)
df = df[['郵便番号', '市区町村', '町域']]
df.columns = ['PostalCode', 'Borough', 'Neighborhood']

# limit to central Osaka City: Kita, Chuo, Fukushima and Miyakojima wards
central_wards = ['大阪市北区', '大阪市中央区', '大阪市福島区', '大阪市都島区']
df = df[df['Borough'].str.match('|'.join(central_wards))].copy()

# clean up: build the full address, then drop duplicates and missing values
df['Address'] = df['Borough'] + df['Neighborhood']
df.drop_duplicates(subset=['Address'], inplace=True)
df.dropna(inplace=True)
df.reset_index(drop=True, inplace=True)

df.head()

|   | PostalCode | Borough | Neighborhood | Address |
|---|---|---|---|---|
| 0 | 534-0026 | 大阪市都島区 | 網島町 | 大阪市都島区網島町 |
| 1 | 534-0013 | 大阪市都島区 | 内代町 | 大阪市都島区内代町 |
| 2 | 534-0025 | 大阪市都島区 | 片町 | 大阪市都島区片町 |
| 3 | 534-0001 | 大阪市都島区 | 毛馬町 | 大阪市都島区毛馬町 |
| 4 | 534-0015 | 大阪市都島区 | 善源寺町 | 大阪市都島区善源寺町 |


In [5]:
df.shape

(148, 4)

148 neighborhood names were obtained.
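Why the address file has to be decoded explicitly: Shift-JIS bytes for Japanese text are not valid UTF-8, which is the encoding `pd.read_csv` assumes by default. A small stdlib-only sketch using a made-up two-line CSV (two of the real header names, and a sample row borrowed from the output above):

```python
import csv
import io

# A tiny CSV encoded the same way as the jusyo.jp file (content is illustrative)
sample = '郵便番号,町域\n534-0026,網島町\n'.encode('shift_jis')

# Decoding as UTF-8 (pandas' default) fails on the Shift-JIS lead bytes
try:
    sample.decode('utf-8')
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False

# Decoding as Shift-JIS recovers the original text, which csv can then split
rows = list(csv.reader(io.StringIO(sample.decode('shift_jis'))))
print(utf8_ok)  # False
print(rows)     # [['郵便番号', '町域'], ['534-0026', '網島町']]
```

In pandas, `pd.read_csv('27osaka.csv', encoding='shift_jis')` would also decode the file, but unlike the `codecs` call with `"ignore"` it raises an error on any byte that is not valid Shift-JIS.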

## 2. Get coordinates

To use the Foursquare API, we need the latitude and longitude of each neighborhood. I obtained them through the geocoder API.


In [6]:
import requests
from bs4 import BeautifulSoup
import time

URL = 'http://www.geocoding.jp/api/'

def coordinate(address):
    """Return [latitude, longitude] as floats for one address via the geocoding.jp API."""
    payload = {'q': address}
    response = requests.get(URL, params=payload)
    soup = BeautifulSoup(response.content, "html.parser")
    if soup.find('error'):
        raise ValueError(f"Invalid address submitted. {address}")
    latitude = float(soup.find('lat').string)
    longitude = float(soup.find('lng').string)
    return [latitude, longitude]


def coordinates(addresses):
    """Geocode every address; return a DataFrame with address, latitude and longitude columns."""
    rows = []
    for address in addresses:
        try:
            latitude, longitude = coordinate(address)
            rows.append({'address': address, 'latitude': latitude, 'longitude': longitude})
        except Exception as e:
            # report addresses that could not be geocoded and keep going
            print(address, e)
        # wait between requests so the free API is not flooded
        time.sleep(10)
    return pd.DataFrame(rows, columns=['address', 'latitude', 'longitude'])
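The geocoder replies with a small XML document, and `coordinate()` relies on only three tags: `error`, `lat` and `lng`. The sketch below parses a reply offline with the standard library. The `<result>`/`<coordinate>` wrapper and the values in `SAMPLE_REPLY` are assumptions for illustration; only the `lat`/`lng`/`error` tag names come from the code above, and `parse_coordinate` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET
from typing import Tuple

# Hypothetical reply: only the lat/lng/error tag names are taken from coordinate();
# the wrapper elements and the values are assumed for illustration.
SAMPLE_REPLY = """<result>
  <address>大阪市都島区網島町</address>
  <coordinate>
    <lat>34.695491</lat>
    <lng>135.524609</lng>
  </coordinate>
</result>"""


def parse_coordinate(xml_text: str) -> Tuple[float, float]:
    """Return (latitude, longitude) as floats; raise ValueError on an error reply."""
    root = ET.fromstring(xml_text)
    # iter() also visits the root itself, so a bare <error> reply is caught too
    if next(root.iter('error'), None) is not None:
        raise ValueError('geocoder returned an error')
    lat = next(root.iter('lat'), None)
    lng = next(root.iter('lng'), None)
    if lat is None or lng is None or lat.text is None or lng.text is None:
        raise ValueError('lat/lng missing from reply')
    return float(lat.text), float(lng.text)


print(parse_coordinate(SAMPLE_REPLY))  # (34.695491, 135.524609)
```

Testing the parsing against a canned reply like this avoids spending the 10-second pauses of the real API while debugging.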

In [None]:
osaka_address = df.Address.to_list()
osaka_coords = coordinates(osaka_address)

In [10]:
# save as CSV so the coordinates can be reused later without querying the API again
#osaka_coords.to_csv('osaka_coords.csv', index = False)
#osaka_coords = pd.read_csv('osaka_coords.csv')

In [11]:
print(osaka_coords.shape)
osaka_coords.head()

(148, 3)


|   | address | latitude | longitude |
|---|---|---|---|
| 0 | 大阪市都島区網島町 | 34.695491 | 135.524609 |
| 1 | 大阪市都島区内代町 | 34.711172 | 135.538168 |
| 2 | 大阪市都島区片町 | 34.69378 | 135.528146 |
| 3 | 大阪市都島区毛馬町 | 34.723028 | 135.52284 |
| 4 | 大阪市都島区善源寺町 | 34.712168 | 135.524609 |


Coordinates were obtained for all 148 neighborhoods.

## 3. Get location data

Using the Foursquare API, get venue data around each neighborhood.


In [12]:
import requests # library to handle requests
import pandas as pd # library for data analysis
import numpy as np # library to handle data in a vectorized manner
import random # library for random number generation

from geopy.geocoders import Nominatim # module to convert an address into latitude and longitude values

# libraries for displaying images and HTML
from IPython.display import Image, HTML, display_html

# transforming JSON into a pandas DataFrame
from pandas import json_normalize

import folium # plotting library
from bs4 import BeautifulSoup
from sklearn.cluster import KMeans
import matplotlib.cm as cm
import matplotlib.colors as colors

print('Folium installed')
print('Libraries imported.')

Folium installed
Libraries imported.


Using the Folium package, plot the geocoded neighborhoods on a map.

In [25]:
# create a map of Osaka centered on 西天満 (Nishitenma, Kita ward)
center = osaka_coords[osaka_coords['address'] == '大阪市北区西天満']
map_osaka = folium.Map(location=[center['latitude'].iloc[0], center['longitude'].iloc[0]], zoom_start=14)

# add a marker for each neighborhood
for lat, lng, neighborhood in zip(osaka_coords['latitude'], osaka_coords['longitude'], osaka_coords['address']):
    label = folium.Popup(neighborhood, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7).add_to(map_osaka)

map_osaka
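Centering the map on one hand-picked neighborhood (西天満) works, but the center could also be derived from the data. A minimal sketch that averages the coordinates, using the five rows printed earlier as sample input; `map_center` is a hypothetical helper. A plain arithmetic mean of latitudes and longitudes is only a reasonable approximation because the area covered is a few kilometres across.

```python
from statistics import fmean


def map_center(points):
    """Return [mean latitude, mean longitude] for an iterable of (lat, lng) pairs."""
    points = list(points)
    return [fmean(p[0] for p in points), fmean(p[1] for p in points)]


sample_points = [
    (34.695491, 135.524609),  # 大阪市都島区網島町
    (34.711172, 135.538168),  # 大阪市都島区内代町
    (34.69378, 135.528146),   # 大阪市都島区片町
    (34.723028, 135.52284),   # 大阪市都島区毛馬町
    (34.712168, 135.524609),  # 大阪市都島区善源寺町
]
print(map_center(sample_points))
```

In the notebook this would plug in as `folium.Map(location=map_center(zip(osaka_coords['latitude'], osaka_coords['longitude'])), zoom_start=14)`.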

In [26]:
# credentials for the Foursquare developer API (replace the placeholders with your own keys)
CLIENT_ID = 'YOUR_CLIENT_ID' # your Foursquare ID
CLIENT_SECRET = 'YOUR_CLIENT_SECRET' # your Foursquare Secret
ACCESS_TOKEN = 'YOUR_ACCESS_TOKEN' # your Foursquare Access Token
VERSION = '20180604' # API version date
LIMIT = 30 # maximum number of venues returned per neighborhood
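Rather than typing the keys into the notebook, where they end up in any shared copy, they could be read from environment variables. A minimal sketch; the variable names `FOURSQUARE_CLIENT_ID` and `FOURSQUARE_CLIENT_SECRET` and the helper `load_foursquare_credentials` are hypothetical choices, not anything Foursquare defines.

```python
import os


def load_foursquare_credentials(env=os.environ):
    """Read the client ID and secret from environment variables (hypothetical names)."""
    try:
        return env['FOURSQUARE_CLIENT_ID'], env['FOURSQUARE_CLIENT_SECRET']
    except KeyError as missing:
        raise RuntimeError(f'Set the environment variable {missing} first') from None


# Usage, after exporting both variables in the shell that starts Jupyter:
# CLIENT_ID, CLIENT_SECRET = load_foursquare_credentials()
```

Passing `env` as a parameter keeps the function easy to test with a plain dictionary.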

In [27]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)
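`getNearbyVenues` walks `response → groups[0] → items[] → venue` in the JSON reply. A stdlib-only sketch of that same extraction, run on a hand-written payload that contains only the keys the function reads (real replies carry many more fields), also guards against a venue with an empty `categories` list, where indexing `['categories'][0]` directly would raise `IndexError`. `parse_explore` is a hypothetical helper and the second venue is invented.

```python
def parse_explore(payload, name, lat, lng):
    """Flatten one /venues/explore reply into row tuples, as getNearbyVenues does."""
    items = payload['response']['groups'][0]['items']
    rows = []
    for item in items:
        venue = item['venue']
        categories = venue.get('categories') or []
        rows.append((
            name, lat, lng,
            venue['name'],
            venue['location']['lat'],
            venue['location']['lng'],
            # None instead of an IndexError when no category is listed
            categories[0]['name'] if categories else None,
        ))
    return rows


sample_payload = {
    'response': {'groups': [{'items': [
        {'venue': {'name': 'とん太',
                   'location': {'lat': 34.693324, 'lng': 135.526503},
                   'categories': [{'name': 'Tonkatsu Restaurant'}]}},
        {'venue': {'name': 'Unlabelled venue',  # hypothetical venue without a category
                   'location': {'lat': 34.6950, 'lng': 135.5250},
                   'categories': []}},
    ]}]}
}
rows = parse_explore(sample_payload, '大阪市都島区網島町', 34.695491, 135.524609)
for row in rows:
    print(row)
```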

In [None]:
venues = getNearbyVenues(names=osaka_coords['address'],
                         latitudes=osaka_coords['latitude'],
                         longitudes=osaka_coords['longitude'])

In [28]:
venues.head()

|   | Neighborhood | Neighborhood Latitude | Neighborhood Longitude | Venue | Venue Latitude | Venue Longitude | Venue Category |
|---|---|---|---|---|---|---|---|
| 0 | 大阪市都島区網島町 | 34.695491 | 135.524609 | とん太 | 34.693324 | 135.526503 | Tonkatsu Restaurant |
| 1 | 大阪市都島区網島町 | 34.695491 | 135.524609 | Fujita Museum (藤田美術館) | 34.69505 | 135.525421 | Art Museum |
| 2 | 大阪市都島区網島町 | 34.695491 | 135.524609 | 京橋Arc | 34.694584 | 135.527113 | Rock Club |
| 3 | 大阪市都島区網島町 | 34.695491 | 135.524609 | Mint Museum (造幣博物館) | 34.695739 | 135.521648 | Museum |
| 4 | 大阪市都島区網島町 | 34.695491 | 135.524609 | 藤田邸跡公園 | 34.695006 | 135.524321 | Garden |


In [29]:
print('Total ' + str(venues.shape[0])+ ' venues are found') 
print('There are {} unique categories'.format(len(venues['Venue Category'].unique())))

Total 4257 venues are found
There are 199 unique categories
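One caveat when reading the 4,257 figure: each neighborhood is searched within `radius=500` m of its centroid, and neighboring centroids can be well under 1,000 m apart, so search circles overlap and one venue may be counted under several neighborhoods. A quick check with the haversine formula on two centroids from the coordinates table (網島町 and 片町); `haversine_m` is a sketch that assumes a spherical Earth with a mean radius of 6,371 km.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_M = 6_371_000  # mean Earth radius; spherical approximation


def haversine_m(lat1, lng1, lat2, lng2):
    """Great-circle distance in metres between two (lat, lng) points given in degrees."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = radians(lat2 - lat1)
    dlmb = radians(lng2 - lng1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * asin(sqrt(a))


d = haversine_m(34.695491, 135.524609, 34.69378, 135.528146)  # 網島町 to 片町
print(round(d))     # about 375 (metres)
print(d < 2 * 500)  # True: the two 500 m search circles overlap
```

If unique venues matter more than per-neighborhood counts, duplicates could be removed with `venues.drop_duplicates(subset=['Venue', 'Venue Latitude', 'Venue Longitude'])`.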


Now I have all the data needed for the analysis.