I created this to scrape the JCB Dining30 website (website for Platinum card holders who get 30% discounts at specific restaurants across Japan) and get all the restaurants.

I started this based on the code from https://towardsdatascience.com/how-to-web-scrape-with-python-in-4-minutes-bc49186a8460

The dining30 website: https://pr.gnavi.co.jp/promo/jcb-dining30/restaurant/list.php?page=1

To use:

In order for this to work for you, you must first login with a Google Chrome browser to the Dining30 dining list (via JCB login and hosted on https://pr.gnavi.co.jp/).  This will ensure Google Chrome has access to active cookies that can be used by the driver to scrap the website.

You'll also need a Google Cloud Account with billing enabled to get an API key to the necessary services.  Store your API key in a file called `gmap_api.key` in the same folder as this notebook. The API key should have the Google Maps, Google Places and Google Geocodes APIs enabled.  Check these resources for more information:
 - https://developers.google.com/maps/documentation/javascript/overview
 - https://developers.google.com/places/web-service/overview
 - https://developers.google.com/maps/documentation/geocoding/overview
 
This is an excellent source on how to interact with the Maps API within Python: https://buildmedia.readthedocs.org/media/pdf/jupyter-gmaps/latest/jupyter-gmaps.pdf

In [1]:
import time
from bs4 import BeautifulSoup
import browser_cookie3

# This next attempt is from https://www.pluralsight.com/guides/guide-scraping-dynamic-web-pages-python-selenium
# chromedriver-install has changed to pyderman: https://pypi.org/project/pyderman/
import pyderman as driver
path = driver.install(browser=driver.chrome, file_directory='c:\\data\\')
print('Installed geckodriver driver to path: %s' % path)

chromedriver is already installed.
Installed geckodriver driver to path: c:\data\chromedriver_84.0.4147.30.exe


This will open the Chrome webbrowser that will be used for webscrapping.  We'll setup our options to auto translate from Japanese to English.

In [2]:
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.chrome.options import Options

# Use Google Translate to translate webpage
tranOptionsOn = Options()
tranOptionsOff = Options()
prefsOn = {
  "translate_whitelists": {"ja":"en"},
  "translate":{"enabled":"true"}
}
prefsOff = {
  "translate_whitelists": {"ja":"en"},
  "translate":{"enabled":"false"}
}
tranOptionsOn.add_experimental_option("prefs", prefsOn)
tranOptionsOff.add_experimental_option("prefs", prefsOff)

jp_driver = webdriver.Chrome(path)
en_driver = webdriver.Chrome(path, options=tranOptionsOn)

We naviage to our target site to setup our domain (otherwise we'd get a domain error when we add the cookies).  Now we'll get the cookies based on our current session.  After that we'll navigate to the site get the page source before Google Translates the page.  We then wait 5 seconds which should allow enough time for Google to Translate the page.

Note: we have to have recently visited the site and that page may still need to be open

In [3]:
cookies = browser_cookie3.chrome()

url = 'https://pr.gnavi.co.jp/promo/jcb-dining30/restaurant/list.php?page=1'
jp_driver.get(url)
en_driver.get(url)

for c in cookies:
    cookie = {}
    if c.domain == 'pr.gnavi.co.jp':
        cookie['name'] = c.name
        cookie['value'] = c.value
        cookie['domain'] = c.domain
        print(cookie)
        jp_driver.add_cookie(cookie)
        en_driver.add_cookie(cookie)



{'name': 'PHPSESSID', 'value': 'efbo0961agtsj5122a6gla5i64', 'domain': 'pr.gnavi.co.jp'}


Now we'll get the source code for each page in japanese intially and again in English.

In [4]:
def build_restaurant_dict(r):
    d = {}
    d['name'] = r.find("div", class_="searchContent-tll-hld").find('a').text
    d['type'] = r.find("div", class_="searchContent-tll-hld").find('span').text
    d['pr'] = r.find("p", class_="searchContent-info-pr").text
    d['discount'] = r.find("div", class_="searchContent-ttl-off").text
    access = r.find("ul", class_="searchContent-info-access")
    if access != "None":
        d['access'] = '<br/>'.join([x.text for x in access.find_all('li')])
    else:
        d['access'] = ""
    extra = r.find("ul", class_="searchContent-info-ex")
    if extra is None:
        d['extra'] = ""
    else:
        d['extra'] = '<br/>'.join([x.text for x in extra.find_all('li')])
    dtls = r.find("ul", class_="search-info-detail").find_all('li')
    d['seats'] = dtls[0].find('dd').text
    d['budget'] = dtls[1].find('dd').text
    d['phone'] = dtls[2].find('dd').text
    d['address'] = dtls[3].find('dd').text
    return d

In [5]:
restaurants = {}

for page in range(1,11):  # pages 1-10 = range(1,11)
    url = f"https://pr.gnavi.co.jp/promo/jcb-dining30/restaurant/list.php?page={page}"
    
    # Go to the site and get the japanese source then wait for the english translation and get that
    jp_driver.get(url)
    jp_source = BeautifulSoup(jp_driver.page_source, "html.parser")
    
    # Now scroll through the page so it completely translates into English
    en_driver.get(url)
    for i in range(0,8):
        time.sleep(0.5)
        y = 500*i
        en_driver.execute_script(f"window.scrollTo(0, {y});")
    en_source = BeautifulSoup(en_driver.page_source, "html.parser")
    
    # Get Japanese Data
    jp_results = jp_source.find_all("li", class_="searchContent-list")
    for idx, r in enumerate(jp_results):
        restaurant_jp = build_restaurant_dict(r)
        new_rest = {}
        new_rest['jp'] = restaurant_jp
        restaurants[idx*page] = new_rest

    # Get English data
    en_results = en_source.find_all("li", class_="searchContent-list")
    for idx, r in enumerate(en_results):
        restaurant_en = build_restaurant_dict(r)
        restaurants[idx*page]['en'] = restaurant_en

    # We'll print a list of restaurants that were added
    for k, v in restaurants.items():
        print (v['en']['name'])

Magic Bar CUORE Shinjuku Store
Restaurant Perfumes
Spring and Autumn Tameike Sanno Store
JAM ORCHESTRA
Spring and Autumn
restaurant DOLCH
Mountain dome
Sentence
Hokukaien
PIZZERIA TRATTORIA NITTANA
Japanese Dining Kura ANA Crowne Plaza Hotel Niigata
Restaurant Perfumes
Fist Romantic
JAM ORCHESTRA
the ringo
restaurant DOLCH
Steak House Azuma
Sentence
Serge Gen's Nishiki store Japanese black beef yakiniku
PIZZERIA TRATTORIA NITTANA
Ozaki beef and Japanese beef yakiniku restaurant Masuo Shinjuku main store
Meatworker Kogia Banyan Gotanda store
Kamazuda
Beef tongue shabu-shabu Shabu-an
Spring and Autumn Tsugihagi Hibiya
Wagyu Shunsai Ippon
Restaurant Perfumes
Fist Romantic
Japanese cuisine Kaiseki cuisine Sushiichi
the ringo
restaurant DOLCH
Kyoto cuisine Tachijin
Sentence
Serge Gen's Nishiki store Japanese black beef yakiniku
First Penguin IL TEATRINO
Ozaki beef and Japanese beef yakiniku restaurant Masuo Shinjuku main store
Wine Lounge Cuvee ITOU
Kamazuda
Beef tongue shabu-shabu Shabu-an

In [6]:
# We will use Google Maps for display maps and geocoding the address data.  Both require API key and cost $$$
# API Billing is here: https://console.cloud.google.com/apis/credentials?project=arboreal-totem-281307

import gmaps

with open('gmap_api.key') as f:
    mygmap_key = f.readline()
    f.close
    
gmaps.configure(api_key=mygmap_key)

In [7]:
# Since the geocoding cost most, we'll save the geocode data in a JSON file for reference and add to it if we have a new address
import json
from geopy.geocoders import GoogleV3

def get_google_geocode(address):
    geolocator = GoogleV3(api_key=mygmap_key)
    location = geolocator.geocode(address, timeout=10)
    return (location.latitude, location.longitude)

def load_geocodes():
    try:
        with open('geocodes.dat') as json_file:
            return json.load(json_file)
    except OSError as e:
        return {}

def save_geocodes(geocodes):
    with open('geocodes.dat', 'w') as outfile:
        json.dump(geocodes, outfile)

def get_geocode(geocodes, address):
    if address in geocodes:
        return tuple(geocodes[address])
    else:
        geocodes[address] = get_google_geocode(address)
        save_geocodes(geocodes)
        return geocodes[address]
        
geocodes = load_geocodes()

# Load the geocodes for each restaurant
for k, r in restaurants.items():
    geocode = get_geocode(geocodes, r['jp']['address'])
    r['geocode'] = geocode
    print (f"{r['jp']['address']} -----> {r['geocode']}")

〒606-8413 京都府京都市左京区浄土寺下馬場町3  -----> (35.0234439, 135.7912442)
〒141-0022 東京都品川区東五反田4-7-29 NK五反田ビル1F -----> (35.6298694, 139.7272968)
〒730-0017 広島県広島市中区鉄砲町9-3 クレセントヒルズ1F -----> (34.394541, 132.464905)
〒670-0016 兵庫県姫路市坂元町35 -----> (34.833578, 134.686454)
〒330-0845 埼玉県さいたま市大宮区仲町1-94 2F -----> (35.9048123, 139.6266569)
〒106-0045 東京都港区麻布十番1-9-2 AZABU 10ビル5F -----> (35.656446, 139.735405)
〒730-0012 広島県広島市中区上八丁堀4-1 アーバンビューグランドタワー12F -----> (34.3994737, 132.4648936)
〒104-0061 東京都中央区銀座8-10-15 -----> (35.6678952, 139.762326)
〒650-0011 兵庫県神戸市中央区下山手通1-3-24 -----> (34.6945507, 135.1914395)
〒107-0062 東京都港区南青山2-12-15  -----> (35.6712439, 139.7204948)
〒659-0012 兵庫県芦屋市朝日ケ丘町28-27  -----> (34.7448965, 135.3102219)
〒460-0003 愛知県名古屋市中区錦3-19-30 第三錦ビル2F -----> (35.1702794, 136.9034986)
〒730-0016 広島県広島市中区幟町8-5 1F -----> (34.3942884, 132.4674398)
〒163-0252 東京都新宿区西新宿2-6-1 新宿住友三角ビル52F -----> (35.6914592, 139.6923553)
〒650-0012 兵庫県神戸市中央区北長狭通2-10-9 イトウビル2F -----> (34.6926678, 135.1906001)
〒739-0016 広島県東広島市西条岡町10-20

In [8]:
# This gets the google details like website, rating, place_id, and link to google maps.  This costs $$$.
import requests

def get_info_from_google(r, fields):
    print('Getting data from google...')
    name = r['jp']['name']
    address = r['jp']['address']
    url = f"https://maps.googleapis.com/maps/api/place/textsearch/json?query={name}{address}&key={mygmap_key}"
    response = requests.get(url)
    google_place = json.loads(response.text)['results'][0]
    place_id = google_place['place_id']
    
    url = f"https://maps.googleapis.com/maps/api/place/details/json?place_id={place_id}&fields={fields}&key={mygmap_key}"
    response = requests.get(url)
    return(json.loads(response.text)['result'])

def load_google_info():
    try:
        with open('google.dat') as json_file:
            return json.load(json_file)
    except OSError as e:
        return {}


def save_google_info(google_info, fields):
    # add missing keys if they were returned from google
    for k, v in google_info.items():
        for f in fields.split(','):
            if f not in v.keys():
                v[f] = ""
    with open('google.dat', 'w') as outfile:
        json.dump(google_info, outfile)

def get_google_info(google_info, r, fields):
    address = r['jp']['address']
    if address in google_info:
        hasAllFields = True
        for f in fields.split(','):
            if f not in google_info[address].keys():
                hasAllFields = False
        if hasAllFields is True:
            print('Already have the data...')
            return google_info[address]
        else:
            print('New fields...')
            google_info[address] = get_info_from_google(r, fields)
            save_google_info(google_info, fields)
            return google_info[address]
    else:
        print('New address...')
        google_info[address] = get_info_from_google(r, fields)
        save_google_info(google_info, fields)
        return google_info[address]

fields = 'rating,url,website'
google_info = load_google_info()
    
for k, r in restaurants.items():
    info = get_google_info(google_info, r, fields)
    r['mapsurl'] = info['url']
    if 'website' in info.keys():
        r['website'] = info['website']
    else:
        r['website'] = ""
    r['google_rating'] = info['rating']

Already have the data...
New address...
Getting data from google...
New address...
Getting data from google...
Already have the data...
Already have the data...
Already have the data...
Already have the data...
New address...
Getting data from google...
New address...
Getting data from google...
Already have the data...
Already have the data...
Already have the data...
Already have the data...
Already have the data...
Already have the data...
New address...
Getting data from google...
Already have the data...
Already have the data...
Already have the data...
Already have the data...
New address...
Getting data from google...
Already have the data...
Already have the data...
Already have the data...
New address...
Getting data from google...
Already have the data...
Already have the data...
Already have the data...
New address...
Getting data from google...
Already have the data...
Already have the data...
Already have the data...
Already have the data...
Already have the data...
Alread

In [9]:
# PDF: https://buildmedia.readthedocs.org/media/pdf/jupyter-gmaps/latest/jupyter-gmaps.pdf
# Tabs: https://www.w3schools.com/howto/howto_js_tabs.asp
# Tabs in CSS: https://medium.com/allenhwkim/how-to-build-tabs-only-with-css-844718d7de2f
def get_html(r):
    html = f"""
    <h3>{r['jp']['name']}</h3>
    <input type="radio" name="tabs" id="tab1" checked />
    <label for="tab1">Restaurant</label>
    <input type="radio" name="tabs" id="tab2" />
    <label for="tab2">Rating</label>
    <input type="radio" name="tabs" id="tab3" />
    <label for="tab3">Access</label>
    <div class="tab content1">
        <table>
        <tbody>
        <tr>
        <td>Name</td>
        <td>{r['en']['name']}</td>
        </tr>
        <tr>
        <td>Type</td>
        <td>{r['en']['type']}</td>
        </tr>
        <tr>
        <td>Discount</td>
        <td>{r['en']['discount']}</td>
        </tr>
        <tr>
        <tr>
        <td>Extra</td>
        <td>{r['en']['extra']}</td>
        </tr>
        <tr>
        <td>Seats</td>
        <td>{r['en']['seats']}</td>
        </tr>
        <tr>
        <td>Budget</td>
        <td>{r['jp']['budget']}</td>
        </tr>
        </tbody>
        </table>
    </div>
    <div class="tab content2">
        <table>
        <tbody>
        <tr>
        <td>Google Rating</td>
        <td>{r['google_rating']}</td>
        </tr>
        </tbody>
        </table>
    </div>
    <div class="tab content3">
        <p>{r['en']['access']}</p>
        <p><a href= "{r['mapsurl']}" target="_blank">{r['jp']['address']}</a></p>
        <p><a href= "{r['website']}" target="_blank">{r['website']}</a></p>
        <p>{r['jp']['phone']}</p>
    </div>
    <style type="text/css">
        /* Source: https://medium.com/allenhwkim/how-to-build-tabs-only-with-css-844718d7de2f */
        input {{ display: none; }}                /* hide radio buttons */
        input + label {{ display: inline-block }} /* show labels in line */
        input ~ .tab {{ display: none }}          /* hide contents */
        /* show contents only for selected tab */
        #tab1:checked ~ .tab.content1,
        #tab2:checked ~ .tab.content2,
        #tab3:checked ~ .tab.content3 {{ display: block; }}
        
        input + label {{             /* box with rounded corner */
          border: 1px solid #999;
          background: #EEE;
          padding: 4px 12px;
          border-radius: 4px 4px 0 0;
          position: relative;
          top: 1px;
        }}
        input:checked + label {{     /* white background for selected tab */
          background: #FFF;
          border-bottom: 1px solid transparent;
        }}
        input ~ .tab {{          /* grey line between tab and contents */
          border-top: 1px solid #999;
          padding: 12px;
        }}
    </style>
    """
    
    return html



In [10]:
# marker_locations = []
# marker_locations.append((location.latitude, location.longitude))
# print (marker_locations)
# fig = gmaps.figure(center=tokyo_coord, zoom_level = 13)
# markers = gmaps.marker_layer(marker_locations)
# fig.add_layer(markers)
# fig

In [11]:
restaurant_locations = [r['geocode'] for k, r in restaurants.items()]
restaurant_info = [get_html(r) for k, r in restaurants.items()]
#rint (restaurant_info[0])  # <-- need to fix this so it replaces the variable with the dyanmic text

In [12]:
print(len(restaurants))

37


In [13]:
# This generates a map (cost $$$)
marker_layer = gmaps.marker_layer(restaurant_locations, info_box_content=restaurant_info)
fig = gmaps.figure()
fig.add_layer(marker_layer)
fig

Figure(layout=FigureLayout(height='420px'))