# Google Play Store Scraping - Indodax

---
## Project Prerequisites

### Data Extraction from Google Play Store with Google Play Scraper

This project fetches app reviews with the `google-play-scraper` library, which retrieves reviews, app details, and other data directly from the Google Play Store.

#### Installation
To install the `google-play-scraper` library, use the following command:

In [1]:
!pip install google-play-scraper




### Import Project Libraries

In [2]:
import os
import csv
from google_play_scraper import Sort, reviews_all

In [3]:
def ensure_directory_exists(directory):
    """
    Ensures that the specified directory exists.
    If the directory does not exist, it will be created.
    An empty path (meaning the current directory) is left untouched.
    """
    # os.makedirs('') raises FileNotFoundError, so skip empty paths such as
    # the result of os.path.dirname('review.csv').
    if directory:
        # exist_ok=True avoids an error when the directory is already present.
        os.makedirs(directory, exist_ok=True)

---
## Data Acquisition through Web Scraping

##### *Scraping from Google Play Store*

In [4]:
# Fetch all reviews for an app with a specific ID from the Google Play Store
app_reviews = reviews_all(
    'id.co.bitcoin',             # The app's ID from which reviews will be fetched
    lang='id',                   # Language of the reviews (Indonesian)
    country='id',                # Country of the reviews (Indonesia)
    sort=Sort.MOST_RELEVANT,     # Order the reviews based on relevance
)

The code above uses the `google-play-scraper` library to fetch all reviews for a single app on the Google Play Store.

- **`from google_play_scraper import Sort, reviews_all`**: Imports the `reviews_all` function and the `Sort` enum from the `google-play-scraper` library. The `reviews_all` function scrapes the reviews, while `Sort` selects the order in which they are retrieved.

- **`app_reviews = reviews_all(...)`**: Retrieves all reviews for the app identified by `app_id` and returns them as a list of dictionaries, one per review. The parameters are:
  - `app_id`: The app's ID on the Google Play Store, passed here as the first positional argument (`'id.co.bitcoin'`).
  - `lang`: The language of the reviews to request (`'id'` for Indonesian).
  - `country`: The Play Store country to query (`'id'` for Indonesia).
  - `sort`: The sort order of the reviews, in this case `Sort.MOST_RELEVANT` (most relevant first).
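
To make the shape of `app_reviews` more concrete, the sketch below walks over a few records and pulls out the date, score, and text. The records are invented for illustration, the field names (`reviewId`, `userName`, `content`, `score`, `thumbsUpCount`, `at`) are assumed to match what the library returns, and `summarize_reviews` is a hypothetical helper that is not part of the notebook.

```python
from datetime import datetime

# Invented sample records for illustration only. Real records come from
# reviews_all(); the field names below are assumed to match its output.
sample_reviews = [
    {"reviewId": "sample-1", "userName": "Pengguna A",
     "content": "Aplikasi mudah digunakan", "score": 5,
     "thumbsUpCount": 12, "at": datetime(2024, 1, 5, 9, 30)},
    {"reviewId": "sample-2", "userName": "Pengguna B",
     "content": "Sering gagal login", "score": 2,
     "thumbsUpCount": 4, "at": datetime(2024, 2, 11, 18, 0)},
    {"reviewId": "sample-3", "userName": "Pengguna C",
     "content": None, "score": 4,
     "thumbsUpCount": 0, "at": datetime(2024, 3, 2, 7, 15)},
]


def summarize_reviews(reviews):
    """Return (date, score, text) tuples for reviews that contain text."""
    rows = []
    for review in reviews:
        text = review.get("content")
        if not text:
            continue  # skip records whose text is missing or empty
        rows.append((review["at"].date().isoformat(), review["score"], text))
    return rows


summary = summarize_reviews(sample_reviews)
for row in summary:
    print(row)
```

With real data, the same helper could be called as `summarize_reviews(app_reviews)`, provided the returned records carry these fields.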

In [5]:
def save_review_content_to_csv(reviews, file_path):
    """
    Saves the 'content' field from each review to a CSV file using csv.writer.

    Parameters:
        reviews (list): A list of dictionaries, where each dictionary contains review data.
        file_path (str): The path to the CSV file where the content will be saved.

    Returns:
        None: This function writes data directly to a file and does not return a value.
    """

    # Ensure the directory for the file path exists
    ensure_directory_exists(os.path.dirname(file_path))

    # Open the file and write the data to it
    with open(file_path, mode='w', newline='', encoding='utf-8') as file:
        writer = csv.writer(file)
        
        # Write the header row
        writer.writerow(['content'])

        # Write the content of each review
        for rev in reviews:
            # Ensure rev is a dictionary and contains the 'content' key
            if isinstance(rev, dict) and 'content' in rev:
                writer.writerow([rev['content']])
            else:
                # Report records that do not match the expected format
                print(f"Unexpected data format: {rev}")

# Save the review content to a CSV file
save_review_content_to_csv(app_reviews, 'data/review.csv')
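
A quick way to check the saved file is to read it back and compare it with what was written. The sketch below is a minimal round trip under stated assumptions: it writes a few invented review texts to a temporary file in the same single-column layout used above, then reads them back with `csv.DictReader`. The helper `read_review_content` and the sample strings are hypothetical and not part of the original notebook.

```python
import csv
import os
import tempfile


def read_review_content(file_path):
    """Read the 'content' column back from a CSV written with csv.writer."""
    # newline='' lets the csv module handle line breaks inside quoted fields.
    with open(file_path, mode="r", newline="", encoding="utf-8") as file:
        reader = csv.DictReader(file)
        return [row["content"] for row in reader]


# Invented sample text containing a comma, quotes, and an embedded line break.
sample_contents = ["Bagus, cepat", 'Katanya "aman"', "Baris satu\nBaris dua"]

with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = os.path.join(tmp_dir, "review.csv")

    # Write the samples in the same layout: a 'content' header, one text per row.
    with open(demo_path, mode="w", newline="", encoding="utf-8") as file:
        writer = csv.writer(file)
        writer.writerow(["content"])
        for text in sample_contents:
            writer.writerow([text])

    restored = read_review_content(demo_path)

print(restored == sample_contents)
```

Opening the file with `newline=''` for both writing and reading matters here because review text can contain line breaks, which the `csv` module stores inside quoted fields. For the actual output, `read_review_content('data/review.csv')` should return one string per saved review.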