# Complete Pipeline: Google Play Reviews Trend Analysis

This notebook orchestrates the end-to-end daily pipeline for the Pulsegen assignment. Run the cells in order, from top to bottom.


## 1. Configure Inputs
Set the Google Play Store URL, the target report date (T, today in IST), the rolling window length, and the LLM provider.

In [None]:
import os
from datetime import datetime, timedelta, timezone

# Read credentials from the environment; never hard-code API keys in a notebook.
if 'MEGALLM_API_KEY' not in os.environ:
    raise EnvironmentError('Set MEGALLM_API_KEY before running the pipeline.')
os.environ.setdefault('MEGALLM_BASE_URL', 'https://ai.megallm.io/v1')
# Mirror the key so OpenAI-compatible clients that read OPENAI_API_KEY pick it up.
os.environ['OPENAI_API_KEY'] = os.environ['MEGALLM_API_KEY']

APP_STORE_URL = 'https://play.google.com/store/apps/details?id=in.swiggy.android'
# T is "today" in India Standard Time (UTC+05:30).
IST = timezone(timedelta(hours=5, minutes=30))
TARGET_DATE = datetime.now(IST).date()
ROLLING_WINDOW_DAYS = 30
LLM_PROVIDER = 'megallm'
LLM_MODEL = os.getenv('MEGALLM_MODEL', 'mega-1-chat')

print(f'App: {APP_STORE_URL}')
print(f'Report date (T): {TARGET_DATE}')
print(f'Rolling window: {ROLLING_WINDOW_DAYS} days')
print(f'LLM: {LLM_PROVIDER} / {LLM_MODEL}')
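
Two values that downstream steps probably need can be derived from this configuration: the app's package name (the `id=` query parameter of the store URL) and the list of dates in the rolling window. The sketch below assumes the window is inclusive of T, so a 30-day window runs from T-29 to T; the helper names `play_package_id` and `rolling_window` are hypothetical, and the trend notebook may define the window differently.

```python
from datetime import date, timedelta
from urllib.parse import parse_qs, urlparse


def play_package_id(store_url: str) -> str:
    """Extract the package name from a Play Store details URL (its id= parameter)."""
    ids = parse_qs(urlparse(store_url).query).get('id')
    if not ids:
        raise ValueError(f'No id= parameter in {store_url!r}')
    return ids[0]


def rolling_window(target: date, days: int) -> list[date]:
    """Return the `days` calendar dates ending at `target` (inclusive), oldest first."""
    if days < 1:
        raise ValueError('days must be at least 1')
    start = target - timedelta(days=days - 1)
    return [start + timedelta(days=offset) for offset in range(days)]
```

With the values above, `play_package_id(APP_STORE_URL)` returns `'in.swiggy.android'`, and `rolling_window(TARGET_DATE, ROLLING_WINDOW_DAYS)` returns 30 consecutive dates ending at T.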

## 2. Run Data Cleaning (June 2024 Onward)
Loads the raw review CSV, keeps reviews from June 2024 onward, and writes the filtered data to Parquet.
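
The core of this step is a date filter. The sketch below shows it with the standard library only, assuming the timestamp column is named `at` and holds naive ISO-8601 timestamps; both are assumptions, not confirmed details of `01_setup_and_clean.ipynb`. With pandas, the equivalent is typically `pd.read_csv(..., parse_dates=['at'])`, a boolean mask, and `DataFrame.to_parquet` (which needs `pyarrow` or `fastparquet` installed).

```python
import csv
import io
from datetime import datetime
from typing import TextIO

# Cutoff implied by the section title: keep reviews from 1 June 2024 onward.
CUTOFF = datetime(2024, 6, 1)


def filter_since_cutoff(csv_file: TextIO, date_column: str = 'at') -> list[dict]:
    """Return CSV rows whose `date_column` timestamp is on or after CUTOFF.

    Assumes naive ISO-8601 timestamps such as '2024-06-01 09:15:00'.
    """
    reader = csv.DictReader(csv_file)
    return [row for row in reader if datetime.fromisoformat(row[date_column]) >= CUTOFF]


sample = io.StringIO(
    'review_id,at,content\n'
    'r1,2024-05-31 23:59:59,Late order\n'
    'r2,2024-06-01 00:00:00,Great app\n'
)
kept = filter_since_cutoff(sample)
print([row['review_id'] for row in kept])  # ['r2']
```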

In [None]:
!jupyter nbconvert --to notebook --execute --inplace notebooks/01_setup_and_clean.ipynb
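
Two caveats apply to the `!jupyter nbconvert` cells in this and the following sections. First, in IPython a `!` command that exits with a non-zero status does not raise an exception, so a failed child notebook does not stop the later cells from running on stale data. Second, `--execute` runs each child notebook in a fresh kernel, so the Python variables set in section 1 are not visible there; only environment variables are inherited. A minimal sketch that addresses both, assuming the child notebooks were changed to read hypothetical `PIPELINE_*` environment variables:

```python
import os
import subprocess
from datetime import date


def nbconvert_command(notebook: str) -> list[str]:
    """The same nbconvert invocation the `!` cells use, as an argument list."""
    return ['jupyter', 'nbconvert', '--to', 'notebook', '--execute', '--inplace', notebook]


def pipeline_env(app_url: str, target: date, window_days: int) -> dict[str, str]:
    """Copy the current environment and add the pipeline settings.

    The PIPELINE_* names are hypothetical: the child notebooks would have to
    read them via os.environ for these values to take effect.
    """
    env = dict(os.environ)
    env.update({
        'PIPELINE_APP_URL': app_url,
        'PIPELINE_TARGET_DATE': target.isoformat(),
        'PIPELINE_WINDOW_DAYS': str(window_days),
    })
    return env


def run_notebook(notebook: str, env: dict[str, str]) -> None:
    """Execute `notebook` in place; raises CalledProcessError on a non-zero exit."""
    subprocess.run(nbconvert_command(notebook), env=env, check=True)
```

Usage: `run_notebook('notebooks/01_setup_and_clean.ipynb', pipeline_env(APP_STORE_URL, TARGET_DATE, ROLLING_WINDOW_DAYS))`. A child notebook could then recover T with `date.fromisoformat(os.environ['PIPELINE_TARGET_DATE'])`.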

## 3. Run Topic Routing by Day
Generates per-day parquet files with topic assignments.

In [None]:
!jupyter nbconvert --to notebook --execute --inplace notebooks/02_topic_router.ipynb
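
Splitting reviews into per-day files depends on which time zone defines a "day". Since T is computed in IST, the sketch below buckets reviews by their IST calendar date; that choice, the `at` key holding a timezone-aware datetime, and the `topics_<date>.parquet` file name are all assumptions rather than details of `02_topic_router.ipynb`. The topic assignment itself is not shown.

```python
from collections import defaultdict
from datetime import date, datetime, timedelta, timezone
from pathlib import Path

IST = timezone(timedelta(hours=5, minutes=30))


def partition_by_day(reviews: list[dict]) -> dict[date, list[dict]]:
    """Group reviews by IST calendar date, using the aware datetime under 'at'."""
    by_day = defaultdict(list)
    for review in reviews:
        by_day[review['at'].astimezone(IST).date()].append(review)
    return dict(by_day)


def day_file(out_dir: Path, day: date) -> Path:
    """Hypothetical per-day file name, e.g. topics_2024-06-02.parquet."""
    return out_dir / f'topics_{day.isoformat()}.parquet'


reviews = [
    {'id': 'r1', 'at': datetime(2024, 6, 1, 17, 0, tzinfo=timezone.utc)},  # 22:30 IST on 1 June
    {'id': 'r2', 'at': datetime(2024, 6, 1, 20, 0, tzinfo=timezone.utc)},  # 01:30 IST on 2 June
]
for day, items in sorted(partition_by_day(reviews).items()):
    print(day_file(Path('data'), day), [r['id'] for r in items])
```

Both reviews share the UTC date 1 June, but `r2` lands on 2 June in IST; bucketing by UTC instead would shift it into the previous day's file.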

## 4. Generate 30-Day Trend Report
Creates CSV and HTML trend reports under the relative `output/` directory.

In [None]:
!jupyter nbconvert --to notebook --execute --inplace notebooks/05_trend_analysis.ipynb
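
A 30-day trend report is essentially a topic-by-date count table over the window ending at T, with zeros on days a topic did not appear. The sketch below builds such a table from `(date, topic)` pairs and serialises it as CSV; the input shape and the rows-as-topics, columns-as-dates layout are assumptions, and the actual `topics_trend_*.csv` produced by `05_trend_analysis.ipynb` may be structured differently.

```python
import csv
import io
from collections import Counter
from datetime import date, timedelta
from typing import Iterable


def trend_table(assignments: Iterable[tuple[date, str]], target: date, days: int = 30):
    """Count reviews per (topic, day) over the `days`-day window ending at `target`.

    Returns (header, rows): header is ['topic', <ISO dates, oldest first>] and each
    row is [topic, count, ...] with 0 for days on which the topic did not appear.
    Topics with no reviews inside the window are omitted.
    """
    window = [target - timedelta(days=days - 1 - i) for i in range(days)]
    in_window = set(window)
    counts = Counter((topic, day) for day, topic in assignments if day in in_window)
    topics = sorted({topic for topic, _ in counts})
    header = ['topic'] + [d.isoformat() for d in window]
    # Counter returns 0 for missing keys, which fills the quiet days.
    rows = [[topic] + [counts[(topic, d)] for d in window] for topic in topics]
    return header, rows


def to_csv(header: list, rows: list) -> str:
    """Serialise the table with the csv module's default dialect."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(header)
    writer.writerows(rows)
    return buf.getvalue()


assignments = [
    (date(2024, 7, 28), 'Late delivery'),
    (date(2024, 7, 30), 'Late delivery'),
    (date(2024, 7, 30), 'Refund issue'),
    (date(2024, 7, 1), 'Refund issue'),  # outside a 3-day window ending 30 July
]
header, rows = trend_table(assignments, date(2024, 7, 30), days=3)
print(to_csv(header, rows))
```

Fixing the column set to the full window, rather than only the dates that have data, keeps every daily report the same width and makes zero-count days explicit.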

## 5. Verify Output Artifacts
Checks that a CSV trend report exists and reports whether its matching HTML report is present.

In [None]:
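from pathlib import Path

OUTPUT_DIR = Path('output')
# Sorting by file name picks the latest report only if the part after
# 'topics_trend_' sorts chronologically (e.g. an ISO date).
artifacts = sorted(OUTPUT_DIR.glob('topics_trend_*.csv'))
if not artifacts:
    raise FileNotFoundError(f'No CSV trend report found in {OUTPUT_DIR}/.')
latest_csv = artifacts[-1]
# The HTML report is expected next to the CSV, with the same stem.
latest_html = latest_csv.with_suffix('.html')
print(f'Latest CSV: {latest_csv}')
print(f'HTML report exists: {latest_html.exists()}')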
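
The check above only prints whether the HTML report exists and does not confirm that the newest CSV is the report for T. A stricter sketch, assuming a hypothetical naming convention of `topics_trend_<YYYY-MM-DD>.csv`, collects both problems so the cell can fail loudly:

```python
import re
from datetime import date
from pathlib import Path
from typing import Optional

# Hypothetical naming convention: topics_trend_<YYYY-MM-DD>.csv
REPORT_NAME = re.compile(r'topics_trend_(\d{4}-\d{2}-\d{2})\.csv')


def report_date(csv_path: Path) -> Optional[date]:
    """Parse the report date from the file name, or return None if it does not match."""
    match = REPORT_NAME.fullmatch(csv_path.name)
    return date.fromisoformat(match.group(1)) if match else None


def check_report(csv_path: Path, target: date) -> list[str]:
    """List problems with the report; an empty list means it looks complete."""
    problems = []
    found = report_date(csv_path)
    if found != target:
        problems.append(f'{csv_path.name} is dated {found}, expected {target}')
    if not csv_path.with_suffix('.html').exists():
        problems.append(f'no HTML report next to {csv_path.name}')
    return problems
```

Usage: `problems = check_report(latest_csv, TARGET_DATE)`, then `raise RuntimeError('; '.join(problems))` if the list is non-empty.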
