<h1><center>Importing Libraries</center></h1>

<div class="alert alert-danger">If importing a library raises an error like the one below:<br>
<tt><font color=black>&emsp;ModuleNotFoundError: No module named 'EXAMPLE'</font></tt>

Install the library first by running the following command:<br>
<tt><font color=black>&emsp;pip install EXAMPLE</font></tt><br>or<br>
<tt><font color=black>&emsp;conda install -c anaconda EXAMPLE</font></tt></div>
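
To see which packages are missing before running the import cell, a check like the one below may help. It is a minimal sketch using only the standard library; the helper name `missing_modules` is hypothetical, and the list of names simply mirrors the third-party imports used in this notebook.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Third-party modules imported in this notebook.
required = ["numpy", "pandas", "dateparser", "serpapi"]

for name in missing_modules(required):
    # The name to install may differ from the import name.
    print(f"Missing module: {name}")
```

Note that the install name is not always the import name: the `serpapi` module used here is provided by the `google-search-results` package on PyPI.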

In [7]:
import numpy as np
import pandas as pd
import re
import json
import dateparser
import datetime

from serpapi import GoogleSearch
from urllib.parse import urlsplit, parse_qsl

<h1><center>Scraping Reviews for Each Branch Using SerpAPI</center></h1>

SerpAPI is a company that provides API services. It offers an efficient way to retrieve search results from various search engines such as Google, Bing, and Yahoo. In this case, we will use one of its APIs, the Google Maps API. According to the official [documentation](https://serpapi.com/google-maps-api):

<div class="alert alert-info" style="margin: 20px">Our Google Maps API allows you to scrape SERP results from a Google Maps search or places query. The API is accessed through the following endpoint: /search?engine=google_maps. A user may query the following: https://serpapi.com/search?engine=google_maps utilizing a GET request. Head to the <a href='https://serpapi.com/playground?engine=google_maps'>playground</a> for a live and interactive demo.

<br>

Some results may be inaccurate for the provided geographic location. Particularly near me keyword in query will show results not for provided coordinates. Also city, state and zip code can be added to query to refine search. Alternatively <a href='https://serpapi.com/local-results'>local results</a> can be used.</div>
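
The endpoint described in the quote can be illustrated by building the GET URL by hand. The sketch below only constructs the URL and does not send a request; the helper name `build_maps_search_url` is hypothetical, and the use of `q` as the search-query parameter is an assumption to verify against the documentation.

```python
from urllib.parse import urlencode

ENDPOINT = "https://serpapi.com/search"


def build_maps_search_url(query, api_key):
    """Build (but do not send) a GET URL for the google_maps engine."""
    params = {
        "engine": "google_maps",
        "q": query,          # assumed name of the search-query parameter
        "api_key": api_key,  # placeholder, not a real key
    }
    return f"{ENDPOINT}?{urlencode(params)}"


print(build_maps_search_url("Auto2000 Garuda", "INSERT API KEY HERE"))
```

In the rest of this notebook the request is sent through the `GoogleSearch` class instead, so the URL never needs to be assembled manually.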

Assign the URL of the branch whose reviews you want to scrape to the `place_url` variable to obtain its ID.

In [3]:
place_url = "https://www.google.com/maps/place/Auto2000+Garuda/@-6.1613082,106.8451228,17z/data=!4m8!3m7!1s0x2e69f5bbe41da311:0x90c253d6494645da!8m2!3d-6.1613135!4d106.8476977!9m1!1b1!16s%2Fg%2F1tcy13h3?entry=ttu"

place_id = re.search(r"1s(0x.*?)!8m", place_url).group(1)
place_id

'0x2e69f5bbe41da311:0x90c253d6494645da'
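
If the URL does not contain an ID, `re.search` returns `None` and calling `.group(1)` raises an `AttributeError`. The sketch below shows one way to handle that case. The helper name `extract_place_id` is hypothetical, and the pattern assumes the ID is always two hexadecimal numbers joined by a colon after `!1s`, as in the URL above.

```python
import re

# Assumed shape of the ID: "0x<hex>:0x<hex>" directly after "!1s".
PLACE_ID_PATTERN = re.compile(r"!1s(0x[0-9a-fA-F]+:0x[0-9a-fA-F]+)")


def extract_place_id(url):
    """Return the place ID embedded in a Google Maps URL, or None if absent."""
    match = PLACE_ID_PATTERN.search(url)
    return match.group(1) if match else None


example_url = (
    "https://www.google.com/maps/place/Auto2000+Garuda/@-6.1613082,106.8451228,17z/"
    "data=!4m8!3m7!1s0x2e69f5bbe41da311:0x90c253d6494645da!8m2!3d-6.1613135!4d106.8476977"
)
print(extract_place_id(example_url))                    # the ID found in the URL
print(extract_place_id("https://www.google.com/maps"))  # None, no ID present
```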

Enter your API key in the `api_key` parameter. You can find your API key [here](https://serpapi.com/manage-api-key). For example, if the API key is AbCd, write:

`"api_key": "AbCd"`

In [None]:
params = {
    "api_key": "INSERT API KEY HERE", 
    "engine": "google_maps_reviews",
    "hl": "en",
    "data_id": place_id,
    "sort_by": "newestFirst"
}

search = GoogleSearch(params)
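
To avoid saving the API key inside the notebook, the key can be read from an environment variable instead. This is a minimal sketch, not part of the original workflow: the helper name `build_review_params` and the variable name `SERPAPI_API_KEY` are assumptions chosen for illustration.

```python
import os


def build_review_params(place_id, env_var="SERPAPI_API_KEY"):
    """Build the request parameters, reading the API key from the environment."""
    api_key = os.environ.get(env_var)
    if not api_key:
        raise RuntimeError(f"Set the {env_var} environment variable first.")
    return {
        "api_key": api_key,
        "engine": "google_maps_reviews",
        "hl": "en",
        "data_id": place_id,
        "sort_by": "newestFirst",
    }
```

The returned dictionary has the same keys as the parameters defined above, so it could be passed to `GoogleSearch` in the same way.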

Set the earliest review date to collect in the `target_date` variable. Write it as a `datetime` value, in the order year, month, day. For example, to collect reviews from 19 July 2023 onward (inclusive), write:

`target_date = datetime.datetime(2023, 7, 19)`
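
The inclusive cutoff can be checked with a small standalone comparison. The sketch below uses only the standard library; the names `is_within_range` and `example_target` are hypothetical and chosen so that they do not overwrite variables used elsewhere in the notebook.

```python
import datetime


def is_within_range(review_date, target_date):
    """Return True if a "YYYY-MM-DD" review date is on or after target_date."""
    parsed = datetime.datetime.strptime(review_date, "%Y-%m-%d")
    return parsed >= target_date


example_target = datetime.datetime(2023, 7, 19)

print(is_within_range("2023-07-19", example_target))  # the cutoff date itself is kept
print(is_within_range("2023-07-18", example_target))  # one day earlier is dropped
```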

<font color=blue>**NOTE: If you want to modify the code below, please study it first [here](https://serpapi.com/blog/using-google-maps-reviews-api-from-serpapi/).**</font>

In [6]:
target_date = datetime.datetime(2023, 1, 1)
reviews = []
page_num = 0
reached_cutoff = False  # True once a review older than target_date is found


def to_date_string(value):
    """Convert a date string or Unix timestamp to "YYYY-MM-DD"; return None on failure."""
    if isinstance(value, str) and value:
        parsed = dateparser.parse(value)
        return parsed.strftime("%Y-%m-%d") if parsed else None
    if isinstance(value, int):
        return datetime.datetime.fromtimestamp(value).strftime("%Y-%m-%d")
    return None


while True:
    page_num += 1
    results = search.get_dict()

    print(f"Scraping Page {page_num}")

    if "reviews" not in results:
        print(results.get("error", "Unknown error"))
        break

    for result in results["reviews"]:
        review_date = to_date_string(result.get("date"))

        # Reviews are sorted newest first, so stop at the first review that is
        # older than target_date or whose date cannot be parsed.
        if not review_date or datetime.datetime.strptime(review_date, "%Y-%m-%d") < target_date:
            reached_cutoff = True
            break

        reply = result.get("response", {})
        reviews.append({
            "scraped_date": datetime.datetime.now().isoformat(),
            "name": result["user"]["name"],
            "link": result["user"]["link"],
            "thumbnail": result["user"]["thumbnail"],
            "rating": result["rating"],
            "review_date": review_date,
            "review": result.get("snippet"),
            "reply_date": to_date_string(reply.get("date")),
            "reply": reply.get("snippet"),
            "images": result.get("images"),
            "local_guide": result["user"].get("local_guide")
        })

    # Follow the pagination link only while reviews are still within range.
    pagination = results.get("serpapi_pagination", {})
    if not reached_cutoff and pagination.get("next") and pagination.get("next_page_token"):
        search.params_dict.update(dict(parse_qsl(urlsplit(pagination["next"]).query)))
    else:
        break

print(json.dumps(reviews, indent=2, ensure_ascii=False))

Scraping Page 1
Scraping Page 2
Scraping Page 3
Scraping Page 4
Scraping Page 5
Scraping Page 6
Scraping Page 7
Your searches for the month are exhausted. You can upgrade plans on SerpApi.com website.
[
  {
    "scraped_date": "2023-06-04T17:04:01.575563",
    "name": "Sonny Angga",
    "link": "https://www.google.com/maps/contrib/114008955936441678306?hl=en-US&sa=X&ved=2ahUKEwjaq9_uqqn_AhU4kYkEHT4uCEoQvvQBegQIARBB",
    "thumbnail": "https://lh3.googleusercontent.com/a-/AD_cMMQ9m7YpAfPfnLIu9BQ-iLu6KUtqWS0jIA1tQe_MAmM=s40-c-c0x00000000-cc-rp-mo-br100",
    "rating": 5.0,
    "review_date": "2023-05-28",
    "review": "(Translated by Google) Thank you Pa Ade for the good and extraordinary service.. 👍👍👍 Recommended sales (Original) Terima kasih Pa Ade atas pelayanan yang baik dan luar biasa.. 👍👍👍 Recommended sales",
    "reply_date": "2023-05-30",
    "reply": "(Translated by Google) Good afternoon Mr. Sonny Angga, Thank you for the 5 star review given, it's a pleasure to provide satisfy

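The pagination step in the scraping loop turns the query string of the `next` URL into a dictionary and merges it into the request parameters. The sketch below isolates that step using only the standard library. The URL and its token are placeholders written for illustration, not real API output, and the names `next_page_params` and `example_params` are hypothetical.

```python
from urllib.parse import urlsplit, parse_qsl

# Hypothetical "next" URL; the token is a placeholder, not a real value.
next_url = (
    "https://serpapi.com/search.json?engine=google_maps_reviews"
    "&data_id=0x2e69f5bbe41da311%3A0x90c253d6494645da"
    "&hl=en&next_page_token=PLACEHOLDER_TOKEN"
)


def next_page_params(url):
    """Return the query-string parameters of a URL as a dictionary."""
    return dict(parse_qsl(urlsplit(url).query))


example_params = {"engine": "google_maps_reviews", "hl": "en", "sort_by": "newestFirst"}
example_params.update(next_page_params(next_url))

print(example_params["next_page_token"])  # token taken from the URL
print(example_params["data_id"])          # percent-encoding is decoded by parse_qsl
```

Parameters that are absent from the URL, such as `sort_by` here, keep their previous values because `update` only overwrites the keys it receives.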
<h1><center>Cleaning the Data</center></h1>

Convert the scraping results from JSON to a pandas DataFrame so that they are easier to read.

In [9]:
columns = [
    "scraped_date", "name", "link", "rating", "review_date", "english_review", "indonesian_review",
    "reply_date", "english_reply", "indonesian_reply", "images", "local_guide"
]

df = pd.DataFrame(reviews)

# Treat anything other than an explicit True as "not a local guide".
df["local_guide"] = df["local_guide"].apply(lambda x: x is True)

# Normalize missing values: None and empty strings both become NaN.
df = df.fillna(np.nan)
df = df.replace("", np.nan)

# for col in ["scraped_date", "review_date", "reply_date"]:
#     df[col] = pd.to_datetime(df[col])

# Split each translated text into its English translation and its original.
# Texts without the "(Translated by Google)" marker are left as NaN.
for source in ["review", "reply"]:
    df[f"english_{source}"] = df[source].str.extract(
        r"\(Translated by Google\)\s+(.*?)\s+\(Original\)", expand=False
    ).str.strip()
    df[f"indonesian_{source}"] = df[source].str.extract(
        r"\(Original\)\s+(.*)", expand=False
    ).str.strip()

df = df[columns]

df.head()

| | scraped_date | name | link | rating | review_date | english_review | indonesian_review | reply_date | english_reply | indonesian_reply | images | local_guide |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2023-06-04T17:04:01.575563 | Sonny Angga | https://www.google.com/maps/contrib/1140089559... | 5.0 | 2023-05-28 | Thank you Pa Ade for the good and extraordinar... | Terima kasih Pa Ade atas pelayanan yang baik d... | 2023-05-30 | Good afternoon Mr. Sonny Angga, Thank you for ... | Selamat siang Pak Sonny Angga, Terima kasih at... | | False |
| 1 | 2023-06-04T17:04:01.579560 | Ade Pardana | https://www.google.com/maps/contrib/1059841397... | 5.0 | 2023-05-28 | | | 2023-05-30 | Good afternoon. Thank you for the 5 star revie... | Selamat siang. Terima kasih atas review bintan... | | False |
| 2 | 2023-06-04T17:04:01.584098 | Cahyadi Wijaya | https://www.google.com/maps/contrib/1178533912... | 5.0 | 2023-05-28 | Good service, fast, friendly, served by Mr. Ad... | Pelayanang Baik, Cepat , Ramah, dilayani Pak A... | 2023-05-30 | Good afternoon Mr. Cahyadi Wijaya, Thank you f... | Selamat siang Pak Cahyadi Wijaya, Terima kasih... | [https://lh5.googleusercontent.com/p/AF1QipOaO... | False |
| 3 | 2023-06-04T17:04:01.588076 | Tukang Review | https://www.google.com/maps/contrib/1044526187... | 5.0 | 2023-05-28 | Auto 2000 Garuda service is great. Thank you f... | Pelayanan Auto 2000 Garuda mantap. Terima kasi... | 2023-05-30 | Good afternoon AutoFamily. Thank you for the 5... | Selamat siang AutoFamily. Terima kasih atas re... | | False |
| 4 | 2023-06-04T17:04:01.594660 | Yesi Yoto | https://www.google.com/maps/contrib/1024983469... | 5.0 | 2023-05-28 | | | 2023-02-04 | Good afternoon Mr. Yesi Yoto. Thank you for th... | Selamat siang Pak Yesi Yoto. Terima kasih atas... | [https://lh5.googleusercontent.com/p/AF1QipMmE... | True |
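
The splitting of translated text can also be tried on a single string outside pandas. The sketch below uses only the standard library; the helper name `split_translation` is hypothetical. Unlike the cell above, it matches across line breaks and, as an assumption made for illustration, returns text without a translation marker as the original rather than as a missing value.

```python
import re

TRANSLATED = re.compile(
    r"\(Translated by Google\)\s*(?P<english>.*?)\s*\(Original\)\s*(?P<original>.*)",
    re.DOTALL,  # let "." match newlines so multi-line reviews are kept whole
)


def split_translation(text):
    """Split a review into (english, original); english is None if untranslated."""
    if not text:
        return None, None
    match = TRANSLATED.search(text)
    if match is None:
        # No translation marker: treat the whole text as the original.
        return None, text.strip()
    return match.group("english").strip(), match.group("original").strip()


sample = "(Translated by Google) Good service (Original) Pelayanan baik"
print(split_translation(sample))
```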


Save the scraping results as an Excel file.

In [12]:
df.to_excel("result.xlsx", index=False)