<a href="https://colab.research.google.com/github/ayowassup/fore-coffee-reviews-scraper/blob/main/Fore_Coffee_Google_Play_Reviews.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Scraping "Fore Coffee" App Reviews on Google Play Store**

---

## **Project Overview:**
This mini Python project scrapes user reviews of the "Fore Coffee" application from the Google Play Store. It fetches review data with the `google-play-scraper` library, filters the reviews by year, and saves the results in CSV format.

## **Project Steps:**

1. **Library Installation:** Installing the `google-play-scraper` library and importing all necessary libraries.
2. **Review Extraction:** Fetching all reviews for the "Fore Coffee" app from the Google Play Store.
3. **Date Filtering:** Filtering reviews to include only those from 2023.
4. **Data Transformation:** Converting the filtered reviews into a DataFrame and sorting them from oldest to newest.
5. **Export to CSV:** Saving the final results in CSV format to the `/content/generated_data` folder.

## **About Fore Coffee:**

**Fore Coffee** is a popular coffee chain in Indonesia that aims to bring quality coffee to a wider audience. Known for its innovative approach to coffee brewing and a strong focus on customer experience, Fore Coffee has gained a significant following. The [Fore Coffee App](https://play.google.com/store/apps/details?id=coffee.fore2.fore) allows customers to order coffee for delivery or pickup, explore menu options, and take advantage of various promotions.

In this case, the Fore Coffee app ID (`coffee.fore2.fore`) is used for scraping reviews from the Google Play Store. This ID uniquely identifies the app on the platform.
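
As a small illustration of where this ID comes from, the sketch below reads it from the `id` query parameter of the Play Store URL linked above. The helper name `extract_app_id` is hypothetical and not part of the original notebook.

```python
from urllib.parse import urlparse, parse_qs


def extract_app_id(play_store_url: str) -> str:
    """Return the value of the `id` query parameter of a Play Store URL."""
    query = parse_qs(urlparse(play_store_url).query)
    ids = query.get("id")
    if not ids:
        raise ValueError(f"No 'id' parameter found in URL: {play_store_url}")
    return ids[0]


# URL taken from the link in the section above
APP_URL = "https://play.google.com/store/apps/details?id=coffee.fore2.fore"
APP_ID = extract_app_id(APP_URL)
print(APP_ID)  # coffee.fore2.fore
```

The resulting string is the value passed as the first argument to the scraper later in the notebook.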

## **Used Libraries:**

1.   `google-play-scraper`: For fetching review data from the Google Play Store.
2.   `datetime`: For manipulating date and time.
3.   `pandas`: For data manipulation.
4.   `numpy`: For numerical computing support.

## **Project Output:**
The final output of this project is a CSV file containing the reviews of the "Fore Coffee" app from 2023. The CSV file includes four columns: name, date created, review, and rating. The file is saved in the `/content/generated_data` folder.

# **Library Installation**

The code below does two things:

1.   Installs the `google-play-scraper` library with the `pip` package manager so that data can be scraped from the Google Play Store.
2.   Imports the libraries the project needs: `app`, `Sort`, and `reviews_all` from `google_play_scraper`, `datetime` for handling dates and times, `pandas` for data manipulation, and `numpy` for numerical computing support.

These libraries are used to fetch and process the reviews of the "Fore Coffee" app from the Google Play Store.

In [1]:
pip install google-play-scraper # Installing google play scraper library

Collecting google-play-scraper
  Downloading google_play_scraper-1.2.7-py3-none-any.whl (28 kB)
Installing collected packages: google-play-scraper
Successfully installed google-play-scraper-1.2.7


In [2]:
# Importing libraries used on this project

from google_play_scraper import app, Sort, reviews_all
from datetime import datetime
import pandas as pd
import numpy as np
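
Because scraper behavior can change between releases, it may be useful to record which version was installed (the run above used 1.2.7). The following is a minimal sketch using only the standard library. The helper name `installed_version` is an assumption for illustration, not part of the notebook.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package_name: str) -> Optional[str]:
    """Return the installed version of a package, or None if it is not installed."""
    try:
        return version(package_name)
    except PackageNotFoundError:
        return None


# Prints the installed version string, or None if the package is missing
print(installed_version("google-play-scraper"))
```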

# **Date Filtering and Review Extraction**

The code below extracts the "Fore Coffee" reviews and keeps only those from 2023. It defines a date range covering the year, fetches all reviews with the `reviews_all` function from `google-play-scraper` (using Indonesian language and country settings, newest first), and then filters the result down to reviews whose `at` timestamp falls within that range.

In [3]:
# Set the year to filter reviews for
year = 2023

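# Set the start and end dates for the year 2023
# (the end date is the last second of 31 December, so reviews posted on that day are included)
start_date = datetime(year=year, month=1, day=1)
end_date = datetime(year=year, month=12, day=31, hour=23, minute=59, second=59)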

result = reviews_all(
    'coffee.fore2.fore',
    sleep_milliseconds=0, # defaults to 0
    lang='id', # defaults to 'en'
    country='id', # defaults to 'us'
    sort=Sort.NEWEST, # defaults to Sort.MOST_RELEVANT
)

# Filter the reviews by date range (review['at'] is already a datetime object)
reviews_filtered = [review for review in result if start_date <= review['at'] <= end_date]
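
One way to avoid choosing an exact end-of-day timestamp is a half-open range: keep every review from 1 January of the target year up to, but not including, 1 January of the next year. The sketch below shows the idea on made-up sample data. The function `filter_by_year` and the sample records are hypothetical, and the only assumption about real data is that each review dict carries a `datetime` under the `at` key, as in the output shown later.

```python
from datetime import datetime


def filter_by_year(reviews, year):
    """Keep reviews whose 'at' timestamp falls inside the given calendar year."""
    start = datetime(year, 1, 1)
    end = datetime(year + 1, 1, 1)  # exclusive upper bound
    return [r for r in reviews if start <= r["at"] < end]


# Made-up sample records, not real scraped data
sample_reviews = [
    {"userName": "a", "at": datetime(2022, 12, 31, 23, 59, 59)},
    {"userName": "b", "at": datetime(2023, 1, 1, 0, 0, 0)},
    {"userName": "c", "at": datetime(2023, 12, 31, 18, 30, 0)},
    {"userName": "d", "at": datetime(2024, 1, 1, 0, 0, 0)},
]

kept = filter_by_year(sample_reviews, 2023)
print([r["userName"] for r in kept])  # ['b', 'c']
```

Review "c", posted on the evening of 31 December, is kept, while the reviews just outside the year on either side are dropped.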


# **Data Transformation**

The code below converts the filtered reviews into a DataFrame. It first builds the DataFrame and previews it, then counts the number of reviews that passed the date filter. Next, it previews the key review fields (`userName`, `at`, `content`, and `score`), sorts the data by review date from oldest to newest, and renames the columns. Finally, a preview of the sorted data is shown to verify the transformation.

In [4]:
# Converting results into DataFrame

df_fore = pd.DataFrame(reviews_filtered) # Each review dict becomes one row
df_fore.head()

| | reviewId | userName | userImage | content | score | thumbsUpCount | reviewCreatedVersion | at | replyContent | repliedAt | appVersion |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 75e52b4f-803e-4182-b26d-871a03e61c6b | Zenita Hasanah | https://play-lh.googleusercontent.com/a-/ALV-U... | Pelayanan fore cofee Sunter mall buruk kalau p... | 1 | 0 | 4.1.8 | 2023-12-29 05:57:31 | Hai ka, kami mohon maaf atas ketidaknyamanan y... | 2023-12-30 01:05:59 | 4.1.8 |
| 1 | c7c93b66-10ca-4f68-9a08-40c7907c2173 | inez arindya | https://play-lh.googleusercontent.com/a/ACg8oc... | Banyak promonya, kopinya enak | 5 | 0 | 4.1.8 | 2023-12-28 02:38:25 | Hai kak, Terima kasih atas rating baik & ulasa... | 2023-12-28 03:29:17 | 4.1.8 |
| 2 | aa1b75ce-9c01-4a77-a371-c50a16eb141e | Ivo Paypal | https://play-lh.googleusercontent.com/a/ACg8oc... | ko bikin lah khusus pulau tertentu. masa estim... | 1 | 0 | 4.1.8 | 2023-12-26 23:28:26 | Hai ka, kami mohon maaf atas ketidaknyamanan y... | 2023-12-27 01:42:59 | 4.1.8 |
| 3 | ccd39619-6b21-4556-a088-c39805e5e4f3 | Ina Prameswary | https://play-lh.googleusercontent.com/a-/ALV-U... | terbaik | 5 | 0 | 4.1.8 | 2023-12-26 04:09:49 | Hai kak, Terima kasih atas rating baik & ulasa... | 2023-12-26 06:42:42 | 4.1.8 |
| 4 | 5ea46934-f256-4675-a377-4e6dd8002658 | Rico Konyek | https://play-lh.googleusercontent.com/a/ACg8oc... | andalan | 5 | 0 | 4.1.8 | 2023-12-25 05:12:29 | Hai kak, Terima kasih atas rating baik & ulasa... | 2023-12-26 06:42:37 | 4.1.8 |


In [5]:
# Count the number of fetched reviews

len(df_fore.index)

243

In [6]:
# Preview userName, date-time created, reviews, and score only

df_fore[['userName','at', 'content', 'score']].head()

| | userName | at | content | score |
|---|---|---|---|---|
| 0 | Zenita Hasanah | 2023-12-29 05:57:31 | Pelayanan fore cofee Sunter mall buruk kalau p... | 1 |
| 1 | inez arindya | 2023-12-28 02:38:25 | Banyak promonya, kopinya enak | 5 |
| 2 | Ivo Paypal | 2023-12-26 23:28:26 | ko bikin lah khusus pulau tertentu. masa estim... | 1 |
| 3 | Ina Prameswary | 2023-12-26 04:09:49 | terbaik | 5 |
| 4 | Rico Konyek | 2023-12-25 05:12:29 | andalan | 5 |


In [7]:
# Sort the data by date in ascending order

new_df = df_fore[['userName','at', 'content', 'score']]
sorted_df = new_df.sort_values(by='at', ascending=True) # Sort by oldest data
sorted_df.columns = ['Name', 'Date Created', 'Review', 'Rating']
sorted_df.head()

| | Name | Date Created | Review | Rating |
|---|---|---|---|---|
| 242 | Adisti Bastaman | 2023-01-02 08:35:39 | minuman terfavorit..selalu promo yaaa love for... | 5 |
| 241 | Deki Zulkifli | 2023-01-03 09:25:10 | enak | 5 |
| 240 | Julianto Budi | 2023-01-05 05:20:29 | Setelah update, pembayaran elektronik sll stac... | 1 |
| 239 | Mugiwara | 2023-01-08 14:29:34 | Banyak promo | 5 |
| 238 | maria defina | 2023-01-12 08:13:05 | aren latte nya ennaakkkk | 5 |


In [8]:
# Preview of the sorted data

sorted_df.head()

| | Name | Date Created | Review | Rating |
|---|---|---|---|---|
| 242 | Adisti Bastaman | 2023-01-02 08:35:39 | minuman terfavorit..selalu promo yaaa love for... | 5 |
| 241 | Deki Zulkifli | 2023-01-03 09:25:10 | enak | 5 |
| 240 | Julianto Budi | 2023-01-05 05:20:29 | Setelah update, pembayaran elektronik sll stac... | 1 |
| 239 | Mugiwara | 2023-01-08 14:29:34 | Banyak promo | 5 |
| 238 | maria defina | 2023-01-12 08:13:05 | aren latte nya ennaakkkk | 5 |
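
The transformation steps in this section (select four fields, sort by date, rename the columns) can also be gathered into one function. The sketch below is one possible arrangement, run on made-up sample records rather than real reviews. The function name `to_sorted_frame` is hypothetical. Unlike the cells above, it resets the row index after sorting, which is a design choice and not something the notebook does.

```python
from datetime import datetime

import pandas as pd

# Mapping from scraper field names to the column names used in this notebook
COLUMN_NAMES = {
    "userName": "Name",
    "at": "Date Created",
    "content": "Review",
    "score": "Rating",
}


def to_sorted_frame(reviews):
    """Build a DataFrame of the four key fields, sorted from oldest to newest."""
    df = pd.DataFrame(reviews, columns=list(COLUMN_NAMES))
    return (
        df.sort_values(by="at", ascending=True)
        .rename(columns=COLUMN_NAMES)
        .reset_index(drop=True)  # assumption: a fresh 0..n-1 index is wanted
    )


# Made-up sample records, not real scraped data
sample_reviews = [
    {"userName": "User B", "at": datetime(2023, 6, 1, 9, 0), "content": "ok", "score": 4, "thumbsUpCount": 0},
    {"userName": "User A", "at": datetime(2023, 1, 2, 8, 0), "content": "enak", "score": 5, "thumbsUpCount": 1},
]

frame = to_sorted_frame(sample_reviews)
print(frame)
```

Fields not listed in `COLUMN_NAMES`, such as `thumbsUpCount` in the sample, are dropped when the DataFrame is built.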


# **Exporting Results to CSV File**

The code below saves the processed reviews to a CSV file for later analysis. It first writes the CSV file to the current working directory, then creates a new directory, navigates to the Colab default directory (`/content`), and copies the CSV file into the new folder. Finally, it navigates to the new directory and lists its contents to verify that the file was copied.

In [9]:
# Save the generated reviews as CSV file

sorted_df.to_csv("fore_coffee_2023_reviews.csv", index=False)

In [10]:
# Create a new directory to save csv file
!mkdir -p /content/generated_data

# Navigate to colab default directory
%cd /content/

# Copy csv file to the new folder (/content/generated_data)
!cp fore_coffee_2023_reviews.csv /content/generated_data

/content


In [11]:
# Navigate to new directory just created
%cd /content/generated_data
!ls

/content/generated_data
fore_coffee_2023_reviews.csv
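
Outside Colab, where the `!` and `%` shell commands may not be available, the same export could be done in plain Python. The sketch below is one possible approach: it creates the target folder with `pathlib` and writes the CSV straight into it. The function name `export_reviews` and the sample row are hypothetical, and a temporary directory stands in for `/content` so the example can run anywhere.

```python
from pathlib import Path
import tempfile

import pandas as pd


def export_reviews(df, output_dir, filename="fore_coffee_2023_reviews.csv"):
    """Create output_dir if needed and write df there as CSV; return the file path."""
    out_dir = Path(output_dir)
    out_dir.mkdir(parents=True, exist_ok=True)  # same effect as `mkdir -p`
    out_path = out_dir / filename
    df.to_csv(out_path, index=False)
    return out_path


# Made-up sample row, not real scraped data
sample = pd.DataFrame(
    {
        "Name": ["Sample User"],
        "Date Created": ["2023-01-02 08:35:39"],
        "Review": ["enak"],
        "Rating": [5],
    }
)

with tempfile.TemporaryDirectory() as tmp:
    path = export_reviews(sample, Path(tmp) / "generated_data")
    written = path.exists()
    round_trip = pd.read_csv(path)  # read the file back to verify its contents

print(path.name, written)
print(list(round_trip.columns))
```

Reading the file back confirms that the four columns described in the project output survive the export.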
