# 📘 Task 1: Data Collection

**Project:** Fintech Mobile CX Analytics  
**Author:** Mifta Y  

---

## 🎯 Objective
Collect customer reviews from the Google Play Store for the following banks:
1. **Commercial Bank of Ethiopia (CBE)**
2. **Bank of Abyssinia (BOA)**
3. **Dashen Bank**

**Target:** ≥ 400 reviews per bank (total ≥ 1,200).

---

In [1]:
import logging
import pandas as pd
from pathlib import Path
from google_play_scraper import Sort, reviews
from datetime import datetime

# Configure Logging
logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(levelname)s - %(message)s')
logger = logging.getLogger(__name__)

## ⚙️ Configuration

In [2]:
APP_PACKAGES = {
    "CBE": "com.combanketh.mobilebanking",
    "BOA": "com.boa.boaMobileBanking",
    "Dashen": "com.dashen.dashensuperapp",
}

TARGET_COUNT = 500  # Reviews requested per bank (the objective's minimum is 400)
OUTPUT_DIR = Path("../data/raw")
OUTPUT_DIR.mkdir(parents=True, exist_ok=True)

## 🚀 Scraping Function

In [3]:
def fetch_reviews(bank_name, app_id, count=TARGET_COUNT):
    """Fetch up to `count` of the newest reviews for one app as a list of dicts."""
    logger.info(f"Fetching {count} reviews for {bank_name}...")
    try:
        result, _ = reviews(
            app_id,
            lang='en',
            country='et',
            sort=Sort.NEWEST,
            count=count
        )
        
        data = []
        for r in result:
            data.append({
                "source": "Google Play",
                "bank_name": bank_name,
                "app_id": app_id,
                "review_date": r["at"],
                "user_name": r["userName"],
                "rating": r["score"],
                "review_text": r["content"],
                "thumbs_up_count": r["thumbsUpCount"],
                "app_version": r["reviewCreatedVersion"],
            })
        return data
    except Exception as e:
        # Log the failure and return an empty list so the other banks still run
        logger.error(f"Error fetching {bank_name}: {e}")
        return []
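
`reviews` returns a continuation token alongside the results, which the function above discards. If a single call ever returned fewer reviews than requested, a batched loop would be one way to keep paging. The sketch below is an illustration under assumptions, not part of the original notebook: `fetch_in_batches`, `fetch_page`, and `fake_fetch_page` are hypothetical names, and `fetch_page` is assumed to accept `count` and `continuation_token` keyword arguments and return a `(batch, token)` pair. An offline stand-in replaces the real API so the example runs without network access.

```python
from typing import Any, Callable, Optional


def fetch_in_batches(
    fetch_page: Callable[..., tuple],
    total: int,
    batch_size: int = 200,
) -> list:
    """Collect up to `total` items by calling `fetch_page` repeatedly.

    `fetch_page` (hypothetical) must accept `count` and `continuation_token`
    keyword arguments and return a `(batch, token)` pair.
    """
    collected: list = []
    token: Optional[Any] = None
    while len(collected) < total:
        wanted = min(batch_size, total - len(collected))
        batch, token = fetch_page(count=wanted, continuation_token=token)
        if not batch:  # Nothing more to fetch; stop instead of looping forever
            break
        collected.extend(batch)
    return collected[:total]


# Stand-in for the real API so the sketch runs offline: serves 450 fake reviews.
def fake_fetch_page(count, continuation_token=None):
    start = continuation_token or 0
    end = min(start + count, 450)
    return [{"content": f"review {i}"} for i in range(start, end)], end


sample = fetch_in_batches(fake_fetch_page, total=500)
print(len(sample))  # 450: the fake source runs out before the target is reached
```

With the real library, `fetch_page` could plausibly be `functools.partial(reviews, app_id, lang='en', country='et', sort=Sort.NEWEST)`; that wiring is an assumption and was not run as part of this notebook.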

## 📦 Execution & Save

In [4]:
all_reviews = []

for bank, app_id in APP_PACKAGES.items():
    bank_data = fetch_reviews(bank, app_id)
    all_reviews.extend(bank_data)
    logger.info(f"Fetched {len(bank_data)} reviews for {bank}.")

df = pd.DataFrame(all_reviews)

# Save to CSV
timestamp = datetime.now().strftime("%Y-%m-%d")
filename = f"reviews_raw_{timestamp}.csv"
file_path = OUTPUT_DIR / filename

df.to_csv(file_path, index=False)
logger.info(f"Saved {len(df)} total reviews to {file_path}")

2025-12-02 23:31:29,899 - INFO - Fetching 500 reviews for CBE...
2025-12-02 23:31:33,226 - INFO - Fetched 500 reviews for CBE.
2025-12-02 23:31:33,229 - INFO - Fetching 500 reviews for BOA...
2025-12-02 23:31:36,319 - INFO - Fetched 500 reviews for BOA.
2025-12-02 23:31:36,324 - INFO - Fetching 500 reviews for Dashen...
2025-12-02 23:31:39,990 - INFO - Fetched 500 reviews for Dashen.
2025-12-02 23:31:40,160 - INFO - Saved 1500 total reviews to ..\data\raw\reviews_raw_2025-12-02.csv


## 📊 Validation

In [5]:
print("Total Reviews:", len(df))
print("\nPer Bank Count:")
print(df['bank_name'].value_counts())

Total Reviews: 1500

Per Bank Count:
bank_name
CBE       500
BOA       500
Dashen    500
Name: count, dtype: int64
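
The per-bank counts can also be compared against the 400-review minimum from the objective in code rather than by eye. A minimal sketch, assuming records shaped like the dictionaries in `all_reviews`; `check_targets` and `MIN_PER_BANK` are hypothetical names, and the sample records are synthetic.

```python
from collections import Counter

MIN_PER_BANK = 400  # Minimum from the objective; the constant name is hypothetical


def check_targets(records, banks, minimum=MIN_PER_BANK):
    """Return {bank: count} for banks whose review count is below `minimum`."""
    counts = Counter(r["bank_name"] for r in records)
    return {bank: counts[bank] for bank in banks if counts[bank] < minimum}


# Synthetic records for illustration only (not real scraped data)
records = (
    [{"bank_name": "CBE"}] * 500
    + [{"bank_name": "BOA"}] * 500
    + [{"bank_name": "Dashen"}] * 350
)
shortfalls = check_targets(records, banks=["CBE", "BOA", "Dashen"])
print(shortfalls)  # {'Dashen': 350}
```

In this notebook, `check_targets(all_reviews, APP_PACKAGES)` would return an empty dict for the run shown, since each bank has 500 reviews. Passing the bank names explicitly means a bank that returned no reviews at all is still reported, with a count of 0.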
