# Scraping Twitter App Reviews from the Google Play Store

This notebook scrapes user reviews of the **Twitter** app from the Google Play Store using the `google-play-scraper` library. Each review is labeled with a sentiment based on its star rating, and the review text is cleaned so it can be used to train a machine learning model.

## Steps:
- Install the library
- Scrape the reviews
- Label sentiment (`positif`, `netral`, `negatif`)
- Clean the text (preprocessing)
- Save the data to CSV


In [27]:
# Install library
!pip install google-play-scraper



In [28]:
# Import library
from google_play_scraper import reviews, Sort
import pandas as pd
import re

In [29]:
# Set variables
app_id = 'com.twitter.android'
total_reviews = 10000
batch_size = 100
all_reviews = []

In [30]:
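# Scraping loop: pass the continuation token back into each call so every
# batch continues after the previous one instead of refetching the newest reviews
continuation_token = None
for i in range(0, total_reviews, batch_size):
    print(f"Scraping {i + 1} - {i + batch_size} ...")
    result, continuation_token = reviews(
        app_id,
        lang='id',
        country='id',
        sort=Sort.NEWEST,
        count=batch_size,
        filter_score_with=None,
        continuation_token=continuation_token
    )
    if not result:  # no more reviews available
        break
    all_reviews.extend(result)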

Scraping 1 - 100 ...
Scraping 101 - 200 ...
Scraping 201 - 300 ...
Scraping 301 - 400 ...
Scraping 401 - 500 ...
Scraping 501 - 600 ...
Scraping 601 - 700 ...
Scraping 701 - 800 ...
Scraping 801 - 900 ...
Scraping 901 - 1000 ...
Scraping 1001 - 1100 ...
Scraping 1101 - 1200 ...
Scraping 1201 - 1300 ...
Scraping 1301 - 1400 ...
Scraping 1401 - 1500 ...
Scraping 1501 - 1600 ...
Scraping 1601 - 1700 ...
Scraping 1701 - 1800 ...
Scraping 1801 - 1900 ...
Scraping 1901 - 2000 ...
Scraping 2001 - 2100 ...
Scraping 2101 - 2200 ...
Scraping 2201 - 2300 ...
Scraping 2301 - 2400 ...
Scraping 2401 - 2500 ...
Scraping 2501 - 2600 ...
Scraping 2601 - 2700 ...
Scraping 2701 - 2800 ...
Scraping 2801 - 2900 ...
Scraping 2901 - 3000 ...
Scraping 3001 - 3100 ...
Scraping 3101 - 3200 ...
Scraping 3201 - 3300 ...
Scraping 3301 - 3400 ...
Scraping 3401 - 3500 ...
Scraping 3501 - 3600 ...
Scraping 3601 - 3700 ...
Scraping 3701 - 3800 ...
Scraping 3801 - 3900 ...
Scraping 3901 - 4000 ...
Scraping 4001 - 4100 
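
`reviews()` returns a pair `(result, continuation_token)`. Passing that token into the next call is what makes each batch continue where the previous one stopped; calling `reviews()` repeatedly without it starts from the newest review every time, so the same batch is fetched again. The pattern can be sketched without network access, assuming a hypothetical `fetch_page(count, token)` function that, like `reviews()`, returns an `(items, next_token)` pair. The fake source below uses integer offsets as tokens, which is an illustrative assumption; the real library's token is an opaque object.

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical page-fetcher signature: (count, token) -> (items, next_token)
FetchPage = Callable[[int, Optional[int]], Tuple[List[dict], Optional[int]]]


def paginate(fetch_page: FetchPage, total: int, batch_size: int) -> List[dict]:
    """Collect up to `total` items, feeding each returned token into the next call."""
    items: List[dict] = []
    token: Optional[int] = None
    while len(items) < total:
        batch, token = fetch_page(min(batch_size, total - len(items)), token)
        if not batch:  # source exhausted: stop instead of looping forever
            break
        items.extend(batch)
    return items


def make_fake_source(n: int) -> FetchPage:
    """Fake data source with n reviews; the token is simply the next offset."""
    data = [{"reviewId": f"r{i}", "score": i % 5 + 1} for i in range(n)]

    def fetch_page(count: int, token: Optional[int]) -> Tuple[List[dict], Optional[int]]:
        start = token or 0
        end = min(start + count, n)
        return data[start:end], end

    return fetch_page


# Ask for 1000 items from a source that only has 250
collected = paginate(make_fake_source(250), total=1000, batch_size=100)
print(len(collected), len({r["reviewId"] for r in collected}))  # 250 250
```

Stopping on an empty batch matters when the app has fewer reviews than `total_reviews`: the loop ends early instead of issuing requests that return nothing.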

In [31]:
# Store in a DataFrame
df = pd.DataFrame(all_reviews)[['userName', 'content', 'score']]
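
The DataFrame keeps only `userName`, `content`, and `score`. If the same review can end up in more than one batch, duplicates are easiest to remove before those columns are selected, using the `reviewId` field that each review dict from `google-play-scraper` is assumed to carry. A minimal sketch on made-up rows:

```python
import pandas as pd

# Made-up rows mimicking the dicts returned by reviews(); "reviewId" is assumed present
all_reviews = [
    {"reviewId": "a1", "userName": "user1", "content": "keren", "score": 5},
    {"reviewId": "b2", "userName": "user2", "content": "sering error", "score": 1},
    {"reviewId": "a1", "userName": "user1", "content": "keren", "score": 5},  # repeated batch
]

raw = pd.DataFrame(all_reviews)

# Drop repeated reviews by ID first, then keep only the columns used later
deduped = raw.drop_duplicates(subset="reviewId", keep="first")
df = deduped[["userName", "content", "score"]].reset_index(drop=True)

print(len(raw), "->", len(df))  # 3 -> 2
```

Deduplicating on `content` instead would also merge distinct users who wrote the same short text (for example "keren"), which is why an ID column is the safer key.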

In [32]:
# Sentiment labeling: 1-2 stars = negatif, 3 = netral, 4-5 = positif
def get_sentiment(score):
    if score <= 2:
        return 'negatif'
    elif score == 3:
        return 'netral'
    else:
        return 'positif'

df['label'] = df['score'].apply(get_sentiment)
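
The same rating-to-label mapping can be written as a vectorized `pd.cut` with right-closed bins `(0, 2]`, `(2, 3]`, `(3, 5]`, which avoids one Python function call per row. This is an alternative formulation, not the notebook's own step. Checking the label distribution right after labeling also shows whether the classes are imbalanced before training. A sketch on made-up scores:

```python
import pandas as pd

# Made-up scores covering the full 1-5 star range
df = pd.DataFrame({"score": [1, 2, 3, 4, 5, 5]})

# Right-closed bins reproduce get_sentiment: 1-2 negatif, 3 netral, 4-5 positif
df["label"] = pd.cut(
    df["score"],
    bins=[0, 2, 3, 5],
    labels=["negatif", "netral", "positif"],
).astype(str)  # convert from Categorical to plain strings

# Class balance check before training
print(df["label"].value_counts())
```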


In [35]:
# Preprocess review text
def clean_text(text):
    if not isinstance(text, str):              # missing review text -> empty string
        return ''
    text = text.lower()
    text = re.sub(r'http\S+', '', text)        # remove URLs
    text = re.sub(r'@\w+', '', text)           # remove mentions
    text = re.sub(r'#\w+', '', text)           # remove hashtags
    text = re.sub(r'[^a-zA-Z\s]', '', text)    # remove everything except letters and whitespace
    text = re.sub(r'\s+', ' ', text).strip()   # collapse extra whitespace
    return text

df['clean_content'] = df['content'].apply(clean_text)

# Show the first rows
df[['content', 'clean_content', 'score', 'label']].head()

|   | content | clean_content | score | label |
|---|---------|---------------|-------|-------|
| 0 | keren | keren | 5 | positif |
| 1 | hidup yg terpeedaya | hidup yg terpeedaya | 1 | negatif |
| 2 | Twitter skrng jlek skli, | twitter skrng jlek skli | 5 | positif |
| 3 | udh di update tp pas dibuka ga bisa di scroll | udh di update tp pas dibuka ga bisa di scroll | 2 | negatif |
| 4 | gua masuk ke akun gua sendiri kenapa di kira a... | gua masuk ke akun gua sendiri kenapa di kira a... | 1 | negatif |
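
Because the letter filter in `clean_text` keeps only ASCII letters and whitespace, reviews made up only of emoji, digits, or punctuation become empty strings after cleaning. Such rows carry no text for a model, so one option (an assumption, not a step in the original notebook) is to drop them once `clean_content` is computed. The sketch repeats the cleaning steps so it runs on its own, applied to made-up reviews:

```python
import re
import pandas as pd


def clean_text(text):
    # Same cleaning steps as above, plus a guard for missing content
    if not isinstance(text, str):
        return ""
    text = text.lower()
    text = re.sub(r"http\S+", "", text)       # remove URLs
    text = re.sub(r"@\w+", "", text)          # remove mentions
    text = re.sub(r"#\w+", "", text)          # remove hashtags
    text = re.sub(r"[^a-zA-Z\s]", "", text)   # keep only letters and whitespace
    text = re.sub(r"\s+", " ", text).strip()  # collapse whitespace
    return text


# Made-up reviews: one normal, one emoji-only, one digits/punctuation-only, one missing
df = pd.DataFrame({"content": ["Keren banget!", "👍👍", "10/10", None]})
df["clean_content"] = df["content"].apply(clean_text)

# Keep only rows that still contain text after cleaning
df = df[df["clean_content"] != ""].reset_index(drop=True)
print(df)
```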


In [36]:
# Save the final result
df.to_csv('twitter_data_scraping.csv', index=False)
print("✅ Done! Total data scraped:", len(df))

✅ Done! Total data scraped: 10000
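
When the saved CSV is read back with `pd.read_csv`, empty strings in `clean_content` are parsed as `NaN` by default, which can break later text processing. Passing `keep_default_na=False` keeps them as empty strings, and saving with `index=False` is what prevents an extra `Unnamed: 0` column on reload. A round-trip sketch using an in-memory buffer in place of `twitter_data_scraping.csv`, with made-up rows:

```python
import io
import pandas as pd

# Made-up rows; the second review became empty after cleaning
df = pd.DataFrame({
    "content": ["keren", "👍"],
    "clean_content": ["keren", ""],
    "score": [5, 5],
    "label": ["positif", "positif"],
})

buffer = io.StringIO()
df.to_csv(buffer, index=False)  # no index column, so no "Unnamed: 0" on reload

buffer.seek(0)
default_read = pd.read_csv(buffer)  # empty field becomes NaN by default
buffer.seek(0)
safe_read = pd.read_csv(buffer, keep_default_na=False)  # empty field stays ""

print(default_read["clean_content"].isna().sum(), safe_read["clean_content"].tolist())
```

Calling `fillna('')` on the text columns after a default read achieves the same result if the file has already been loaded.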
