# Project: Sentiment Analysis of DANA App Reviews
## Phase 1: Data Gathering (Web Scraping)

> **Author:** Azizah Adilah

> **Date:** January 2026

> **Project:** Data Science Portfolio - Sentiment Analysis

### Description
In this notebook, I perform web scraping to collect user reviews of the **DANA** application from the Google Play Store. This raw data will serve as the foundation for sentiment analysis and business insight discovery.

### **Import Libraries**

#### **Step 1: Import Libraries and Environment Setup**
In this first step, I import the libraries needed for web scraping. The third-party packages can be installed beforehand with `pip install google-play-scraper pandas`.
* **`google-play-scraper`**: A Python library for fetching app details and reviews from the Google Play Store.
* **`pandas`**: Used for data manipulation and converting the scraped data into a structured DataFrame.
* **`os`**: To manage directory paths and ensure the data is saved in the correct folder.

In [4]:
from google_play_scraper import Sort, reviews
import pandas as pd
import os

### **Scraping Execution** 

#### **Step 2: Scraping Execution**
This is the core of the data gathering phase. I define the target application using its ID (`id.dana`) and set specific parameters:
* **Language & Country**: Both set to `'id'` (the language code for Indonesian and the country code for Indonesia) to capture local user feedback.
* **Sort**: Using `Sort.NEWEST` to retrieve the most recent reviews first, so the data reflects current user experiences.
* **Count**: Collecting **1,000 reviews**, a sample large enough for the exploratory sentiment analysis in later phases.
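
To put a rough number on what a sample of 1,000 reviews offers, the sketch below computes the worst-case 95% margin of error for an estimated proportion (for example, the share of negative reviews). It assumes a simple random sample, which the newest 1,000 reviews are not, so the figure is a guide rather than a guarantee. The function name `margin_of_error` is illustrative and not part of the notebook.

```python
import math

def margin_of_error(n, p=0.5, z=1.96):
    """Normal-approximation margin of error for a proportion p estimated from n samples.

    p=0.5 gives the worst case; z=1.96 corresponds to a 95% confidence level.
    Assumption: the n reviews behave like a simple random sample.
    """
    return z * math.sqrt(p * (1 - p) / n)

moe = margin_of_error(1000)
print(f"n=1000 -> +/-{moe * 100:.1f} percentage points")
```

For n = 1000 this comes to roughly ±3.1 percentage points. Quadrupling the sample size would only halve that figure.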

In [5]:
# Application ID for DANA
app_id = 'id.dana'

# Scraping reviews
result, continuation_token = reviews(
    app_id,
    lang='id',      # Language: Indonesian
    country='id',   # Region: Indonesia
    sort=Sort.NEWEST, # Get the latest reviews
    count=1000      # Collecting 1000 reviews for a more robust analysis
)

# Convert the list of dictionaries into a DataFrame
df_reviews = pd.DataFrame(result)

# Display the first 5 rows to check the data
df_reviews.head()

| | reviewId | userName | userImage | content | score | thumbsUpCount | reviewCreatedVersion | at | replyContent | repliedAt | appVersion |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 4620172f-ab4c-4455-9a82-8417db3a4202 | Efannurdin Ipan | https://play-lh.googleusercontent.com/a/ACg8oc... | lumayan TPI kadang erorr sampah | 5 | 0 | | 2026-01-29 12:54:15 | Hi Kak Ipan, maaf kalau sempat bikin kurang ny... | 2026-01-29 13:08:12 | |
| 1 | ac777878-9577-44d5-a134-801fee4fe618 | Kuss kussaini | https://play-lh.googleusercontent.com/a-/ALV-U... | sangat buruk pelayananya baru beberapa hari ap... | 1 | 0 | 2.111.0 | 2026-01-29 12:54:10 | Hi Kak, mohon maaf buat Kakak merasa tidak nya... | 2026-01-29 13:08:51 | 2.111.0 |
| 2 | 6ce8bf94-43e3-4c78-8cbd-c0d3963b52c0 | Purkon | https://play-lh.googleusercontent.com/a/ACg8oc... | luar biasa berguna | 5 | 0 | 2.112.0 | 2026-01-29 12:52:09 | | NaT | 2.112.0 |
| 3 | 4a1debb7-8c14-4f47-bc6e-9ab59e433909 | Sumpena Tea | https://play-lh.googleusercontent.com/a-/ALV-U... | bagus | 5 | 0 | | 2026-01-29 12:45:31 | | NaT | |
| 4 | 31aa4bae-8a89-4dee-96e5-c06d19b0b26c | Linda Rahangmetan | https://play-lh.googleusercontent.com/a/ACg8oc... | sangat baik | 5 | 0 | 2.103.1 | 2026-01-29 12:45:07 | | NaT | 2.103.1 |
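
The scraping call also returns a `continuation_token`, which the cell above does not use. If more reviews are needed later, or a single large request proves unreliable, the token can drive batch-by-batch collection. The following is a minimal sketch of that loop, not the notebook's actual method: `collect_reviews`, `fetch_page`, and `fake_fetch_page` are hypothetical names, and the stand-in fetcher exists only so the example runs offline.

```python
def collect_reviews(fetch_page, total, batch_size=200):
    """Gather up to `total` reviews by calling `fetch_page(count, token)` repeatedly.

    `fetch_page` must return a (list_of_reviews, next_token) pair,
    the same shape that `reviews()` returns.
    """
    collected = []
    token = None  # None requests the first page
    while len(collected) < total:
        remaining = total - len(collected)
        batch, token = fetch_page(min(batch_size, remaining), token)
        if not batch:  # nothing more to fetch
            break
        collected.extend(batch)
    return collected[:total]


# Stand-in fetcher (hypothetical) so the sketch runs offline; it serves 450 fake reviews.
def fake_fetch_page(count, token):
    start = token or 0
    end = min(start + count, 450)
    batch = [{"reviewId": f"fake-{i}", "score": 5} for i in range(start, end)]
    return batch, end


sample = collect_reviews(fake_fetch_page, total=1000)
print(len(sample))  # 450, because the fake source runs dry before 1000
```

With the real library, `fetch_page` could be a small wrapper such as `lambda count, token: reviews(app_id, lang='id', country='id', sort=Sort.NEWEST, count=count, continuation_token=token)`. This assumes `reviews()` accepts `continuation_token=None` on the first call, which is worth confirming against the installed version.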


### **Saving The Data**

#### **Step 3: Exporting Raw Data**
To ensure reproducibility and keep a backup of the original dataset, I save the result into a CSV file. 
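* The code first checks if the `../data` directory exists, and creates it if it doesn't.
* The file is saved as `reviews_dana_raw.csv` with `index=False`, so the DataFrame's row index is not written as an extra column (which would otherwise reappear as `Unnamed: 0` when the file is reloaded).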

In [6]:
# Create the output folder if it doesn't already exist
os.makedirs('../data', exist_ok=True)

# Save to CSV
df_reviews.to_csv('../data/reviews_dana_raw.csv', index=False)

print(f"Successfully scraped {len(df_reviews)} reviews and saved to data/reviews_dana_raw.csv")

Successfully scraped 1000 reviews and saved to data/reviews_dana_raw.csv
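
Since the CSV is meant to be a reliable backup, it can be worth confirming that the file on disk holds as many records as the DataFrame. Counting raw lines would overcount, because a review's text may contain line breaks. The sketch below uses the standard library's `csv` module instead; `count_csv_records` is a hypothetical helper, and the demo file is made-up data written to a temporary folder.

```python
import csv
import os
import tempfile

def count_csv_records(path):
    """Count data rows in a CSV file, excluding the header line.

    csv.reader keeps a quoted multi-line field as one record, so a review
    containing line breaks is still counted once.
    """
    with open(path, newline="", encoding="utf-8") as f:
        reader = csv.reader(f)
        next(reader, None)  # skip the header row
        return sum(1 for _ in reader)


# Demo on a small made-up file; the second review spans two lines.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "demo.csv")
    with open(demo_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["reviewId", "content", "score"])
        writer.writerow(["a1", "bagus", 5])
        writer.writerow(["b2", "line one\nline two", 1])
    demo_count = count_csv_records(demo_path)

print(demo_count)  # 2 records, although the file has 4 physical lines
```

In this notebook, `count_csv_records('../data/reviews_dana_raw.csv')` would be expected to equal `len(df_reviews)`, assuming pandas wrote the file with its default quoting and UTF-8 encoding.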
