# Google Play Store Reviews

Name: Austin Kane\
NIM: 2702229232\
Explanation Video Link: [Video](https://drive.google.com/drive/folders/1F6TGbxbSQFHubLkAtHVDbxzIiRKXrqYP?usp=sharing)\
GitHub Link: [GitHub](https://github.com/Tinnne/Play-Store-Ratings-Prediction)

## Import Libraries

In [20]:
# Basic Libraries
import pandas as pd
import os

# Scraper for Google Play Store reviews
from google_play_scraper import Sort, reviews_all

## Import Google Play Store Games Data

Data Source: [PlayStoreAppsData.csv](https://www.kaggle.com/datasets/gauthamp10/google-playstore-apps)

In [21]:
PlayStoreData = pd.read_csv("Data/PlayStoreAppsData.csv")
PlayStoreApps = PlayStoreData.sort_values(by='Rating Count', ascending=False)

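# Define non-game categories. str.contains treats this string as a regex and matches substrings,
# so an entry such as 'Education' also excludes any category that contains it (e.g. 'Educational').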
NotGamesCat = 'Art & Design|Auto & Vehicles|Beauty|Books & Reference|Business|Comics|Communication|Dating|Education|Entertainment|Events|Finance|Food & Drink|Health & Fitness|House & Home|Libraries & Demo|Lifestyle|Maps & Navigation|Medical|Music|Music & Audio|News & Magazines|Parenting|Personalization|Photography|Productivity|Shopping|Social|Tools|Travel & Local|Video Players & Editors|Weather'
PlayStoreApps = PlayStoreApps[~PlayStoreApps['Category'].str.contains(NotGamesCat)]

PlayStoreApps.head()

|  | App Name | App Id | Category | Rating | Rating Count | Installs | Minimum Installs | Maximum Installs | Free | Price | ... | Developer Website | Developer Email | Released | Last Updated | Content Rating | Privacy Policy | Ad Supported | In App Purchases | Editors Choice | Scraped Time |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 244319 | Garena Free Fire - Rampage | com.dts.freefireth | Action | 4.2 | 89177097.0 | 500,000,000+ | 500000000.0 | 976536041 | True | 0.0 | ... | https://ff.garena.com | freefire@garena.com | Dec 7, 2017 | Jun 04, 2021 | Mature 17+ | https://ff.garena.com/others/policy/en/ | False | True | True | 2021-06-16 00:32:12 |
| 423997 | Clash of Clans | com.supercell.clashofclans | Strategy | 4.5 | 56025424.0 | 500,000,000+ | 500000000.0 | 643789632 | True | 0.0 | ... | http://supercell.helpshift.com/a/clash-of-clans/ | gp-info@supercell.com | Sep 30, 2013 | Jun 09, 2021 | Everyone 10+ | http://www.supercell.net/privacy-policy | False | True | True | 2021-06-16 03:26:52 |
| 58082 | PUBG MOBILE - Traverse | com.tencent.ig | Action | 4.3 | 37479011.0 | 500,000,000+ | 500000000.0 | 505818718 | True | 0.0 | ... |  | PUBGMOBILE_CS@tencentgames.com | Mar 19, 2018 | May 10, 2021 | Teen | http://pubgmobile.proximabeta.com/privacy.html | True | True | True | 2021-06-15 21:17:17 |
| 286345 | Candy Crush Saga | com.king.candycrushsaga | Casual | 4.6 | 31476637.0 | 1,000,000,000+ | 1000000000.0 | 1208422684 | True | 0.0 | ... | http://candycrushsaga.com/help/ | candycrush.techhelp@king.com | Nov 15, 2012 | Jun 04, 2021 | Everyone | https://king.com/privacyPolicy | True | True | False | 2021-06-16 01:14:37 |
| 1433519 | Clash Royale | com.supercell.clashroyale | Strategy | 4.2 | 31018623.0 | 100,000,000+ | 100000000.0 | 405849099 | True | 0.0 | ... | http://supercell.helpshift.com/a/clash-royale/ | gp-info@supercell.com | Mar 1, 2016 | Jun 10, 2021 | Everyone 10+ | http://www.supercell.net/privacy-policy | False | True | True | 2021-06-16 00:21:39 |
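
The category filter in this cell relies on `str.contains`, which interprets its pattern as a regular expression and matches substrings, much like `re.search`. The sketch below uses only the standard library to show how that can differ from exact matching. The shortened `non_game_categories` list and the values in `sample_categories` are illustrative assumptions, not the notebook's full list or values read from the dataset.

```python
import re

# Shortened, illustrative list of non-game categories (the notebook uses a longer one)
non_game_categories = ["Education", "Music & Audio", "Tools"]

# Illustrative category values, not read from the dataset
sample_categories = ["Action", "Educational", "Education", "Music & Audio", "Strategy"]

# Substring matching: how a regex alternation behaves under re.search
pattern = "|".join(re.escape(name) for name in non_game_categories)
kept_by_substring = [c for c in sample_categories if not re.search(pattern, c)]

# Exact matching: a category is dropped only if it equals a listed name
kept_by_exact = [c for c in sample_categories if c not in set(non_game_categories)]

print(kept_by_substring)  # 'Educational' is dropped because it contains 'Education'
print(kept_by_exact)      # 'Educational' is kept

# Rough pandas equivalent of the exact match (assumes the notebook's DataFrame):
# PlayStoreApps = PlayStoreApps[~PlayStoreApps['Category'].isin(non_game_categories)]
```

Whether exact matching is preferable depends on which categories should count as games; the substring form is shorter but can exclude more than intended.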


## Save Games Description Data

In [25]:
GamesData = PlayStoreApps[['App Name', 'App Id', 'Category', 'Rating', 'Rating Count', 'Released', 'Content Rating']].copy()

# Clean game names: drop everything from ' - ' or ':' onward (e.g. subtitles)
GamesData['App Name'] = GamesData['App Name'].str.replace(r'(\s-\s.*|:.*)', '', regex=True)
GamesData.to_csv("Data/GamesDesc.csv", index=False)
GamesData.head()

|  | App Name | App Id | Category | Rating | Rating Count | Released | Content Rating |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 244319 | Garena Free Fire | com.dts.freefireth | Action | 4.2 | 89177097.0 | Dec 7, 2017 | Mature 17+ |
| 423997 | Clash of Clans | com.supercell.clashofclans | Strategy | 4.5 | 56025424.0 | Sep 30, 2013 | Everyone 10+ |
| 58082 | PUBG MOBILE | com.tencent.ig | Action | 4.3 | 37479011.0 | Mar 19, 2018 | Teen |
| 286345 | Candy Crush Saga | com.king.candycrushsaga | Casual | 4.6 | 31476637.0 | Nov 15, 2012 | Everyone |
| 1433519 | Clash Royale | com.supercell.clashroyale | Strategy | 4.2 | 31018623.0 | Mar 1, 2016 | Everyone 10+ |
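
To make the name-cleaning pattern easier to follow, here is a small standard-library sketch that applies the same regular expression with `re.sub`. The helper `clean_name` and the last two titles are hypothetical and exist only for illustration.

```python
import re

# Same pattern as in the cell above: remove a ' - subtitle' or ':subtitle' suffix
SUBTITLE_PATTERN = re.compile(r'(\s-\s.*|:.*)')

def clean_name(name: str) -> str:
    """Return the name with everything from ' - ' or ':' onward removed."""
    return SUBTITLE_PATTERN.sub('', name)

print(clean_name("Garena Free Fire - Rampage"))  # title taken from the dataset preview
print(clean_name("PUBG MOBILE - Traverse"))      # title taken from the dataset preview
print(clean_name("Example Game: Hypothetical"))  # made-up title with a colon
print(clean_name("Well-Known Example"))          # made-up title; a hyphen without spaces is kept
```

Because the pattern requires whitespace on both sides of the hyphen, hyphenated words inside a title are left untouched, while any colon truncates the name.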


## Collect Google Play Store Reviews Data

In [None]:
print('SOURCE |      GAME NAME      | REVIEWS')
print('======================================')
total_review = 0
mergeData = []
GamesData = GamesData[:20]  # Limit to the top 20 games by rating count

# Scrape settings, shared by the file names and the scraper call
language = 'en'
country = 'us'

for _, game in GamesData.iterrows():
    file_path = f"Data/GameReviewData{country.upper()}/{game['App Name'].replace(' ', '')}-{language.upper()}-{country.upper()}.csv"

    # Load reviews from the local file if it exists
    if os.path.exists(file_path):
        source = 'Local'
        data = pd.read_csv(file_path, low_memory=False)
        print(f"{source:<7}| {game['App Name']:<20}| {len(data):>7,}")

    # Otherwise, scrape reviews from the Google Play Store
    else:
        source = 'Scrape'
        print(f"{source:<7}| {game['App Name']:<20}| ", end="")

        # Scrape all reviews; increase sleep_milliseconds if requests get rate limited
        result = reviews_all(game['App Id'], sleep_milliseconds=0, lang=language, country=country, sort=Sort.MOST_RELEVANT)
        data = pd.DataFrame(result)
        os.makedirs(os.path.dirname(file_path), exist_ok=True)
        data.to_csv(file_path, index=False)

        print(f"{len(data):>7,}")

    data["game_name"] = game["App Name"]
    mergeData.append(data)

    # Update the running review count
    total_review += len(data)

print(f"\nSuccessfully loaded {total_review:,} reviews from {len(GamesData):,} games")

# Merge the per-game reviews, keep the columns of interest, and save
data = pd.concat(mergeData, ignore_index=True)[["game_name", "content", "score", "at"]]
data["year"] = pd.to_datetime(data["at"], errors="coerce").dt.year  # unparseable timestamps yield NaN
data.to_csv("Data/GamesData.csv", index=False)
print(f"Data saved to 'Data/GameReviewData{country.upper()}' folder and merged data to 'Data/GamesData.csv'")

data.head()

SOURCE |      GAME NAME      | REVIEWS
Local  | Garena Free Fire    |  72,000
Local  | Clash of Clans      |  63,000
Local  | PUBG MOBILE         |  85,500
Local  | Candy Crush Saga    |  81,000
Local  | Clash Royale        |  85,500
Local  | Mobile Legends      | 139,500
Local  | Roblox              |  63,000
Local  | 8 Ball Pool         | 126,000
Local  | Brawl Stars         |  49,500
Local  | My Talking Tom      |  85,500
Local  | Pokémon GO          |  94,500
Local  | Shadow Fight 2      | 108,000
Local  | Call of Duty®       |  85,500
Local  | Dream League Soccer | 144,000
Local  | My Talking Angela   |  76,500
Local  | Among Us            |  72,000
Local  | Pou                 | 117,000
Local  | Gardenscapes        | 103,500
Local  | Minion Rush         | 301,500
Local  | Homescapes          |  90,000

Successfully loaded 2,043,000 reviews from 20 games
Data saved to 'Data/GameReviewDataUS' folder and merged data to 'Data/GamesData.csv'


|  | game_name | content | score | at | year |
| --- | --- | --- | --- | --- | --- |
| 0 | Garena Free Fire | Gunslinger meets battle royale! The developers... | 5 | 2025-06-11 12:57:53 | 2025 |
| 1 | Garena Free Fire | Free Fire is a really fun and addictive battle... | 4 | 2025-09-26 11:04:50 | 2025 |
| 2 | Garena Free Fire | I've been aware of this game for quite a while... | 5 | 2025-07-20 19:12:10 | 2025 |
| 3 | Garena Free Fire | The game has the potential to be great. Howeve... | 3 | 2019-01-27 04:04:17 | 2019 |
| 4 | Garena Free Fire | I personally find this game to be very enterta... | 3 | 2019-01-01 00:37:56 | 2019 |
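
The collection loop follows a load-or-scrape pattern: reuse the cached CSV when it exists, otherwise fetch the reviews and write them to disk. The sketch below isolates that pattern using only the standard library. The `csv` module stands in for pandas, and `load_or_fetch`, `fake_fetch`, and the `ExampleGame` file name are hypothetical, so no network request is made.

```python
import csv
import os
import tempfile

def load_or_fetch(file_path, fetch):
    """Return (source, rows): read file_path if present, else call fetch() and cache the result."""
    if os.path.exists(file_path):
        with open(file_path, newline="", encoding="utf-8") as f:
            return "Local", list(csv.DictReader(f))

    rows = fetch()  # expected to return a list of dicts that share the same keys
    os.makedirs(os.path.dirname(file_path), exist_ok=True)
    with open(file_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=list(rows[0].keys()) if rows else [])
        writer.writeheader()
        writer.writerows(rows)
    return "Scrape", rows

# Hypothetical stand-in for the scraper call; it records each invocation
calls = []
def fake_fetch():
    calls.append(1)
    return [{"content": "Great game", "score": "5"}]

cache_dir = tempfile.mkdtemp()
path = os.path.join(cache_dir, "GameReviewDataUS", "ExampleGame-EN-US.csv")

first_source, first_rows = load_or_fetch(path, fake_fetch)    # file missing: fetch and cache
second_source, second_rows = load_or_fetch(path, fake_fetch)  # file present: read from disk
print(first_source, second_source, len(calls))
```

Keeping the fetch step behind a function argument makes the caching logic testable without touching the Play Store, which may help when tuning rate-limit settings separately.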
