# Asynchronous Requests

If we want to make many requests, we could simply use a for loop. With thousands of requests, however, this takes a long time, because each request has to wait for the response to the previous one before it can be sent.
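To make the cost of waiting concrete, here is a minimal sketch that compares the two approaches. It does not touch the network: `fake_request` is a hypothetical stand-in that sleeps to simulate latency, and the URLs are placeholders.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def fake_request(url, delay=0.05):
    # Hypothetical stand-in for a network call: sleep to simulate latency
    time.sleep(delay)
    return f"response from {url}"

urls = [f"https://example.com/page/{i}" for i in range(8)]  # placeholder URLs

# Sequential: each call waits for the previous one to finish
start = time.perf_counter()
sequential = [fake_request(u) for u in urls]
sequential_time = time.perf_counter() - start

# Threaded: all calls are in flight at once in worker threads
start = time.perf_counter()
with ThreadPoolExecutor(max_workers=8) as pool:
    threaded = list(pool.map(fake_request, urls))
threaded_time = time.perf_counter() - start

print(f"sequential: {sequential_time:.2f}s, threaded: {threaded_time:.2f}s")
```

Both versions return the same responses in the same order. Only the waiting overlaps.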


In [63]:
from requests_futures.sessions import FuturesSession
from bs4 import BeautifulSoup
import pandas as pd

In [64]:
session = FuturesSession()

## Making Requests

Let's write a function that takes a session and an address and makes a request for us.

In [65]:
import numpy as np

In [74]:
def make_request(session, url):
    # session.get returns a Future immediately; the request itself runs in a worker thread
    future = session.get(url, headers={'User-Agent': 'Mozilla/5.0'})
    return future

Since the API is quite slow, this will take some time, so below is a function that prints the percentage progress.

In [75]:
import time,sys

def print_progress(futures):

    check_done = lambda x: x.done()
    check_done = np.vectorize(check_done)

    #basic percentage progress
    while not check_done(futures).all():
        time.sleep(1)
        percent = check_done(futures).mean() * 100
        sys.stdout.write("\r%d%%" % percent)
        sys.stdout.flush()    
    print("\n")
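As far as I know, the futures returned by `FuturesSession` are standard `concurrent.futures.Future` objects, so the share of finished requests can also be computed in plain Python, without `np.vectorize`. The sketch below assumes this. The helper name `fraction_done` is made up for illustration, and the demo uses placeholder tasks rather than real requests.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def fraction_done(futures):
    # Share of futures that have finished (successfully or with an error)
    futures = list(futures)
    if not futures:
        return 1.0
    return sum(f.done() for f in futures) / len(futures)

# Demo with placeholder tasks instead of real requests
with ThreadPoolExecutor(max_workers=4) as pool:
    demo_futures = [pool.submit(time.sleep, 0.01) for _ in range(10)]

# Leaving the with-block waits for every task to finish
final_fraction = fraction_done(demo_futures)
print(f"{final_fraction * 100:.0f}%")
```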


We can now read in the list of OpenRice URLs from a text file and make a request for each address.

In [24]:
with open("master_url.txt", "r") as f:
    masterurllist = [i.strip() for i in f.readlines()]
    
test = masterurllist
len(test)

27454

In [89]:
%%time
# create a session with 32 worker threads
session = FuturesSession(max_workers=32)
# make all of the requests
futures = np.array([make_request(session, url) for url in test])
print_progress(futures)
print(futures)

100%

[<Future at 0x213c295e0d0 state=finished returned Response>
 <Future at 0x213c2724760 state=finished returned Response>
 <Future at 0x213c31621c0 state=finished returned Response> ...
 <Future at 0x213cb188c40 state=finished returned Response>
 <Future at 0x213cb188d90 state=finished returned Response>
 <Future at 0x213cb188ee0 state=finished returned Response>]
Wall time: 17min 16s


It took over 17 minutes, even with asynchronous requests.

## Parsing Response

Now that all of the requests have been made, we can collect the response text of each page and parse the HTML.

In [90]:
results = [future.result().text for future in futures]
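Note that `future.result()` re-raises any exception that occurred during the request, so a single failed request would abort the list comprehension above. The following sketch shows one way to keep going and mark failures with `None`. `fetch` and `collect_text` are hypothetical names, and the demo futures return plain strings; with real responses the loop would append `future.result().text` instead.

```python
from concurrent.futures import ThreadPoolExecutor

def fetch(url):
    # Hypothetical stand-in for a request; fails for one URL
    if "bad" in url:
        raise ConnectionError(f"could not reach {url}")
    return f"<html>{url}</html>"

def collect_text(futures):
    # Return one entry per future; None marks a failed request
    texts = []
    for future in futures:
        try:
            texts.append(future.result())
        except Exception:
            texts.append(None)
    return texts

with ThreadPoolExecutor(max_workers=2) as pool:
    demo = [pool.submit(fetch, u) for u in ["a", "bad", "c"]]

texts = collect_text(demo)
print(texts)
```

Keeping one entry per future preserves the alignment between the URL list and the results.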

In [None]:
import json

# Save the raw HTML of every page as a single JSON list of strings
with open("save_results.json", "w", encoding="utf-8") as f:
    json.dump(results, f)
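As a small illustration of saving and restoring page texts, the sketch below round-trips a list of strings through `json.dump` and `json.load`. The data and the temporary file path are placeholders, not the notebook's actual results.

```python
import json
import os
import tempfile

pages = ["<html>first</html>", "<html>第二</html>"]  # placeholder page texts

path = os.path.join(tempfile.mkdtemp(), "pages.json")

# Write the list once; ensure_ascii=False keeps Chinese text readable in the file
with open(path, "w", encoding="utf-8") as f:
    json.dump(pages, f, ensure_ascii=False)

# Read it back into a Python list of strings
with open(path, "r", encoding="utf-8") as f:
    restored = json.load(f)
```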

In [None]:
openrice_page = []

for page in results:
    soup = BeautifulSoup(page, 'html.parser')
    try:
        name = soup.find("h1").find("span").text
    except:
        name = None

    try:
        name2 = soup.find("div", class_="smaller-font-name").text
    except:
        name2 = None

    try:
        stars = soup.find("div", class_="header-score-details-left-score").text.strip()
    except:
        stars = None

    try:
        review_count = soup.find("span", itemprop="reviewCount").text.strip()
    except:
        review_count = None

    try:
        bookmarks = soup.find("div", class_="header-bookmark-count js-header-bookmark-count").text.strip()
    except:
        bookmarks = None

    try:
        district = soup.find("div", class_="header-poi-district dot-separator").text.strip()
    except:
        district = None

    try:
        price_range = soup.find("div", itemprop="priceRange").text.strip()
    except:
        price_range = None

    try:
        food_type = soup.find("div", class_="header-poi-categories dot-separator").text.strip()
    except:
        food_type = None

    try:
        emoji = soup.find("div", class_="header-smile-section").text.strip().split("\n\n")
    except:
        emoji = None

    try:
        address_ch = soup.find("section", class_="address-section").find_all("div", class_="content")[0].find("a").text.strip()
    except:
        address_ch = None

    try:
        address_en = soup.find("section", class_="address-section").find_all("div", class_="content")[1].find("a").text.strip()
    except:
        address_en = None

    try:
        transport = soup.find("section", class_="transport-section").find("div", class_="content js-text-wrapper").text.strip()
    except:
        transport = None

    try:
        telephone = soup.find("section", class_="telephone-section").find("div", class_="content").text.strip()
    except:
        telephone = None

    try:
        introduction = soup.find("section", class_="introduction-section").find("div", class_="content js-text-wrapper").text.strip().replace("\r","").replace("\n","")
    except:
        introduction = None

    try:
        open_hours = soup.find("div", class_="opening-hours-section js-normal-and-special-opening-hours-section").text.replace("\n","")
    except:
        open_hours = None

    try:
        payment = soup.find("div", class_="comma-tags").text.strip()
    except:
        payment = None

    try:
        review = [i.text.strip() for i in soup.find_all("div", class_="text js-text")]
    except:
        review = None

    openrice_page.append({"Name" : name,
    "Name2" : name2,
    "Stars" : stars,
    "Review_count" : review_count,
    "Bookmarks" : bookmarks,
    "District" : district,
    "Price_range" : price_range,
    "Food_type" : food_type,
    "Emoji" : emoji,
    "Address_ch" : address_ch,
    "Address_en" : address_en,
    "Transport" : transport,
    "Telephone" : telephone,
    "Intro" : introduction,
    "Openhours" : open_hours,
    "Payment" : payment,
    "Review" : review,
    })

df = pd.DataFrame(openrice_page)
df.to_csv("openrice2021.csv",mode='w', header=True, index=False)	
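The repeated `try`/`except` blocks above could be condensed with a small helper. This is only a sketch: `safe` is a hypothetical name, and `FakeTag` stands in for a parsed element so the example runs without BeautifulSoup. It relies on `soup.find(...)` returning `None` when nothing matches, which raises `AttributeError` on `.text`, and on indexing an empty `find_all(...)` result raising `IndexError`.

```python
def safe(extract, default=None):
    # Run a zero-argument extractor; fall back to default when an element is missing
    try:
        return extract()
    except (AttributeError, IndexError):
        return default

class FakeTag:
    # Minimal stand-in for a parsed element, for illustration only
    text = "  KFC  "

found = FakeTag()
missing = None  # what soup.find(...) returns when nothing matches

name = safe(lambda: found.text.strip())
stars = safe(lambda: missing.text.strip())
first = safe(lambda: [][0].text)
```

Catching only `AttributeError` and `IndexError`, rather than using a bare `except`, lets unrelated errors such as a keyboard interrupt still surface.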
    

The parsed fields are written to a CSV file for future use. The resulting table looks like this:

In [80]:
df

Unnamed: 0,Name,Name2,Stars,Review_count,Bookmarks,District,Price_range,Food_type,Emoji,Address_ch,Address_en,Transport,Telephone,Intro,Openhours,Payment,Review
0,肯德基家鄉雞 (KFC),Kentucky Fried Chicken,3.5,7.0,140,大埔,$50以下,美國菜\n快餐店,"[0, 1, 0]",大埔安邦路9號大埔超級城C區地下501-504號舖,"Shop 501-504, G/F, Tai Po Mega Mall Zone C, 9 ...",,26676643,,星期一至日07:30 - 22:00,Visa\nMaster\n現金\n八達通\nAE\n銀聯\nApple Pay\nGoog...,[18:00左右用手機APP點完餐，選擇了「立即」送餐。原本以為等一個小時便會送到。怎知等了...
1,太平洋咖啡,Pacific Coffee,,,21,愉景灣,$50以下,美國菜\n沙律\n西式糕點 \n咖啡店,"[0, 0, 0]",愉景灣愉景灣碼頭2號舖,"Shop 02, Discovery Bay Pier, Discovery Bay",,29141128,,星期一至六06:00 - 22:00星期日07:30 - 22:00公眾假期07:30 - ...,Visa\nMaster\n現金\n八達通\nAE\n銀聯\nApple Pay\nGoog...,[]

