# Expanding data for Open Rice restaurants via JSON API calls and visualization - PART 1

<img src="data/openrice.png">

This analysis starts with open data from Open Rice, Hong Kong's most popular dining guide, which helps people find places to eat based on restaurant reviews written by local people. In Part 1 of our analysis, we take that Open Rice data and use an API call to get the latitude and longitude of each restaurant. In Part 2, we will use the resulting JSON data for some analyses and data visualizations.



## Geocoding
The Hong Kong government provides an [API](https://data.gov.hk/en-data/dataset/hk-ogcio-st_div_02-als/resource/ac80cf7b-f1e8-40d1-8b1c-8ea344d6e4cf) for address lookup, which returns, among other things, the latitude and longitude of an address. Other services such as Google's or Bing's APIs also offer geocoding, but they don't always handle Hong Kong addresses well.

We'll start by sending a single POST request with a test address to make sure the API is working.

```python
import json
import requests

url = "https://www.als.ogcio.gov.hk/lookup"
address = "612-618 Nathan Rd"  # a test address

# These parameters are specific to this API: "q" is the address string to look up
# and "n" is the maximum number of suggested addresses to return for it
params = {
    "q": address,
    "n": 1,
}

# Headers tell the API what kind of response we want: JSON, in English.
# The accepted header values are listed on the API's documentation page.
headers = {
    "Accept": "application/json",
    "Accept-Language": "en",
}

# Make the request and keep the response object
resp = requests.post(url, data=params, headers=headers)

# Parse the JSON body of the response into a Python dict
json.loads(resp.text)
```

```
{'RequestAddress': {'AddressLine': ['612-618 Nathan Rd']},
 'SuggestedAddress': [{'Address': {'PremisesAddress': {'EngPremisesAddress': {'BuildingName': 'GOOD HOPE BUILDING',
      'EngStreet': {'StreetName': 'NATHAN ROAD',
       'BuildingNoFrom': '612',
       'BuildingNoTo': '618'},
      'EngDistrict': {'DcDistrict': 'YTM'},
      'Region': 'KLN'},
     'GeospatialInformation': [{'Northing': '819812',
       'Easting': '835573',
       'Latitude': '22.3173',
       'Longitude': '114.1701'}]}},
   'ValidationInformation': {'ValidationTime': None}}]}
```
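The coordinates sit several levels deep in the response. A minimal sketch of pulling them out, assuming the nesting shown in the output above (first suggested address, first `GeospatialInformation` entry); `extract_lat_lng` is a hypothetical helper name, not part of the API:

```python
def extract_lat_lng(response_json):
    """Return (latitude, longitude) as floats from an address-lookup response,
    or None if the response contains no suggested address.

    Assumes the nesting shown in the sample output: takes the first suggestion
    and its first GeospatialInformation entry."""
    suggestions = response_json.get("SuggestedAddress") or []
    if not suggestions:
        return None
    geo = suggestions[0]["Address"]["PremisesAddress"]["GeospatialInformation"]
    first = geo[0]
    # The API returns the coordinates as strings, so convert them to floats
    return float(first["Latitude"]), float(first["Longitude"])


# Trimmed copy of the test response above
sample = {
    "RequestAddress": {"AddressLine": ["612-618 Nathan Rd"]},
    "SuggestedAddress": [{"Address": {"PremisesAddress": {
        "GeospatialInformation": [{"Northing": "819812", "Easting": "835573",
                                   "Latitude": "22.3173", "Longitude": "114.1701"}]}}}],
}
print(extract_lat_lng(sample))  # (22.3173, 114.1701)
```

Returning `None` for an empty `SuggestedAddress` list lets a caller skip addresses the API could not match instead of crashing on an index error.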

### Current Data

Let's import the restaurant data we already have and take a look:

```python
import pandas as pd

df = pd.read_csv("data/open-rice.csv")

df.head()
```

| Unnamed: 0 | address | bookmarks | dislikes | food_type | likes | name | number_of_reviews | price_range |
|---|---|---|---|---|---|---|---|---|
| 0 | Shop J-K., 200 Hollywood Road, | 5838 | 6 | Hong Kong Style | 78 | For Kee Restaurant 科記咖啡餐室 | (133 Reviews) | Below $50 |
| 1 | G/F, 108 Hollywood Road, | 3492 | 2 | International | 20 | Blue · Butcher & Meat Specialist | (30 Reviews) | $201-400 |
| 2 | G/F, 206 Hollywood Road, | 5517 | 5 | Thai | 31 | Chachawan | (43 Reviews) | $201-400 |
| 3 | Shop 3018, 3/F, Shun Tak Centre, 168-200 Conna... | 1173 | 1 | Hong Kong Style | 23 | Capital Café 華星冰室 | (39 Reviews) | Below $50 |
| 4 | G/F, 38 Queens Road West, | 1064 | 1 | Indian | 50 | Namaste Kitchen 滋味廚房 | (57 Reviews) | $51-100 |


```python
df.shape
```

```
(26165, 8)
```

### Defining functions to look up the addresses of all 26,165 restaurants

Let's write a function that takes a session and an address and submits the lookup request for us.

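```python
# requests-futures wraps requests so that each call runs on a thread pool
# and returns a future immediately, letting many requests run concurrently
from requests_futures.sessions import FuturesSession
```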

```python
def make_request(session, address):
    '''Submit an address lookup to the HK government API using a requests_futures
    session. Returns a future; calling .result() on it gives the requests.Response.'''
    data = {"q": address, "n": 1}
    headers = {"Accept": "application/json"}
    api_url = "https://www.als.ogcio.gov.hk/lookup"
    future = session.post(api_url, data=data, headers=headers)
    return future
```
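`FuturesSession.post` returns straight away with a future instead of a response, while the request itself runs on a background thread (by default, requests-futures creates a `ThreadPoolExecutor` for this). The same submit-now, collect-later pattern can be sketched with the standard library alone. Here `fake_lookup` is a hypothetical stand-in for the HTTP call so the example runs offline:

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_lookup(address):
    """Hypothetical stand-in for the API call: waits briefly, then echoes the address."""
    time.sleep(0.1)
    return {"RequestAddress": {"AddressLine": [address]}}


addresses = ["200 Hollywood Road", "108 Hollywood Road", "206 Hollywood Road"]

with ThreadPoolExecutor(max_workers=16) as pool:
    # submit() returns a Future right away; the work runs on the pool's threads
    futures = [pool.submit(fake_lookup, a) for a in addresses]
    # result() blocks until that particular call has finished
    results = [f.result() for f in futures]

# Results come back in submission order, regardless of which call finished first
print([r["RequestAddress"]["AddressLine"][0] for r in results])
```

Because the three calls overlap, the batch waits roughly as long as one call rather than three, which is the same reason the notebook submits all addresses before collecting any results.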

Since the API is quite slow, looking up every address takes some time, so the following function prints the progress while the requests run.

```python
import time, sys
import numpy as np

def print_progress(futures):
    '''Accept an array of futures and print the percentage that have completed,
    refreshing once per second, until every request is done.'''

    # Future.done() returns True once a request has finished (successfully or not).
    # np.vectorize applies it element-wise to the array of futures; it is a convenience
    # wrapper around a Python loop, not a speed-up. otypes=[bool] fixes the output type.
    check_done = np.vectorize(lambda x: x.done(), otypes=[bool])

    # While at least one future is still running, wait a second and print the % completed
    while not check_done(futures).all():
        time.sleep(1)
        percent = check_done(futures).mean() * 100
        # "\r" moves the cursor back to the start of the line so the percentage updates in place
        sys.stdout.write("\r%d%%" % percent)
        sys.stdout.flush()
    print("\n")
```
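Polling every future once per second works, but progress can also be reported the moment each request finishes. A sketch of that alternative using `concurrent.futures.as_completed` (a swapped-in technique, not what the notebook uses), again with a hypothetical `fake_lookup` in place of the API. Since requests-futures hands back standard `concurrent.futures.Future` objects, the same function should also accept the notebook's `futures`:

```python
import sys
import time
from concurrent.futures import ThreadPoolExecutor, as_completed


def fake_lookup(address):
    """Hypothetical stand-in for the API call."""
    time.sleep(0.05)
    return address.upper()


def print_progress_as_completed(futures):
    """Print the completion percentage each time a future finishes; return how many finished."""
    total = len(futures)
    finished = 0
    # as_completed yields each future as soon as it is done, in completion order
    for _ in as_completed(futures):
        finished += 1
        sys.stdout.write("\r%d%%" % (finished * 100 // total))
        sys.stdout.flush()
    print()
    return finished


with ThreadPoolExecutor(max_workers=4) as pool:
    futures = [pool.submit(fake_lookup, "address %d" % i) for i in range(10)]
    n_done = print_progress_as_completed(futures)
```

The trade-off: `as_completed` updates the display on every completion with no fixed one-second delay, while the polling version prints at most once per second no matter how many requests finish.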

Let's make a request for each unique address from our imported data.

```python
%%time
# %%time is a Jupyter cell magic: it must be the first line of the cell and reports
# how long the whole cell takes (a lone %time line only times that single, empty line)

# Create a session backed by 16 worker threads, so up to 16 requests are in flight at once
session = FuturesSession(max_workers=16)

# Submit a request for every unique address in our data; each call returns a future immediately
futures = np.array([make_request(session, address) for address in df.address.unique()])

# Block until every future has completed, printing the percentage done along the way
print_progress(futures)
```

```
100%
```



## Parsing Response

Now that all of the requests have completed, we can parse each response body as JSON.

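```python
# For each future, .result() returns its requests.Response (waiting for it if needed)
# and .json() parses the response body into a Python dict
json_results = np.vectorize(lambda x: x.result().json(), otypes=[object])(futures)

# An equivalent way to write it:
# json_results = [f.result().json() for f in futures]
# np.vectorize is a convenience wrapper around a Python loop, so it is not meaningfully faster.
```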
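`f.result()` re-raises any exception raised while making the request (a timeout or connection error, for example), and `.json()` raises if the body is not valid JSON, so a single bad response stops the whole parse. A hedged sketch of a more forgiving version that records `None` for failures; `FakeResponse` and `safe_json` are hypothetical names used so the example runs without the network:

```python
from concurrent.futures import Future


class FakeResponse:
    """Hypothetical stand-in for requests.Response with a .json() method."""

    def __init__(self, payload):
        self.payload = payload

    def json(self):
        if self.payload is None:
            raise ValueError("body is not valid JSON")
        return self.payload


def safe_json(future):
    """Return the parsed JSON for a finished future, or None if the request or parsing failed."""
    try:
        return future.result().json()
    except Exception:
        # Deliberately broad: any failure for one address should not stop the others
        return None


# Three already-finished futures: one good, one with a bad body, one failed request
good, bad_body, failed = Future(), Future(), Future()
good.set_result(FakeResponse({"SuggestedAddress": []}))
bad_body.set_result(FakeResponse(None))
failed.set_exception(ConnectionError("timed out"))

json_results = [safe_json(f) for f in (good, bad_body, failed)]
print(json_results)  # [{'SuggestedAddress': []}, None, None]
```

Keeping a `None` placeholder, rather than dropping failures, preserves the one-to-one order between results and the unique addresses they were requested for.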

```python
# Inspect the parsed response for the first address
json_results[0]
```

```
{'RequestAddress': {'AddressLine': ['Shop J-K., 200 Hollywood Road,']},
 'SuggestedAddress': [{'Address': {'PremisesAddress': {'EngPremisesAddress': {'BuildingName': 'KEE ON BUILDING',
      'EngStreet': {'StreetName': 'HOLLYWOOD ROAD', 'BuildingNoFrom': '200'},
      'EngDistrict': {'DcDistrict': 'CW'},
      'Region': 'HK'},
     'ChiPremisesAddress': {'Region': '香港',
      'ChiDistrict': {'DcDistrict': 'CW'},
      'ChiStreet': {'StreetName': '荷李活道', 'BuildingNoFrom': '200'},
      'BuildingName': '祺安大廈'},
     'GeospatialInformation': [{'Northing': '816264',
       'Easting': '833279',
       'Latitude': '22.2852',
       'Longitude': '114.1478'},
      {'Northing': '816264',
       'Easting': '833281',
       'Latitude': '22.2852',
       'Longitude': '114.1478'}]}},
   'ValidationInformation': {'ValidationTime': None}}]}
```
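Because `futures` was built from `df.address.unique()`, the entries of `json_results` come back in that same order, so each result can be paired with its address and then spread back to every restaurant at that address. A minimal sketch on a made-up three-row frame (the restaurant names are placeholders; the coordinates are the ones from the two sample responses shown above). In the notebook, the pairing would presumably be `dict(zip(df.address.unique(), json_results))`:

```python
import pandas as pd

# Toy frame: two restaurants share an address, so there are only two unique addresses
restaurants = pd.DataFrame({
    "name": ["Restaurant A", "Restaurant B", "Restaurant C"],
    "address": ["200 Hollywood Road", "200 Hollywood Road", "612-618 Nathan Rd"],
})

# unique() keeps the order of first appearance, just like the list we sent to the API
unique_addresses = restaurants.address.unique()

# Stand-in for the parsed lookup results, in the same order as unique_addresses
coords = [(22.2852, 114.1478), (22.3173, 114.1701)]

# Pair each unique address with its result, then map it back onto every row
lookup = dict(zip(unique_addresses, coords))
restaurants["lat"] = restaurants.address.map(lambda a: lookup[a][0])
restaurants["lng"] = restaurants.address.map(lambda a: lookup[a][1])
print(restaurants)
```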

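Let's now write the JSON results to disk for later use.

```python
# Encode each response dict as a JSON string, then encode the list of those strings
# as one JSON array. Reading the file back therefore needs json.loads twice.
encoded = [json.dumps(r) for r in json_results]
with open("data/openrice_addresses.json", "w") as f:
    f.write(json.dumps(encoded))
```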
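The file is encoded twice: each response is first turned into a JSON string, and the list of those strings is then encoded again. Reading it back therefore takes two levels of `json.loads`. A minimal round-trip sketch, using a temporary file in place of `data/openrice_addresses.json` and a single trimmed-down response:

```python
import json
import os
import tempfile

responses = [{"RequestAddress": {"AddressLine": ["200 Hollywood Road"]}, "SuggestedAddress": []}]

# Write the same way as the cell above: encode each dict, then encode the list of strings
encoded = json.dumps([json.dumps(r) for r in responses])
path = os.path.join(tempfile.mkdtemp(), "openrice_addresses.json")
with open(path, "w") as f:
    f.write(encoded)

# Read back: the outer loads gives a list of strings, the inner loads turns each into a dict
with open(path) as f:
    outer = json.loads(f.read())
loaded = [json.loads(s) for s in outer]

print(type(outer[0]).__name__, loaded == responses)  # str True
```

Writing `json.dump(list(json_results), f)` instead would store plain nested JSON that a single `json.load` can read, at the cost of changing the file format that any later code expects.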

### Part 2 will use the data we gathered to perform some analyses and visualizations.