## App Review Trend Analysis using Agentic AI

### Problem Statement  
Product teams need to understand recurring issues, feature requests, and feedback trends from
app store reviews. However, user reviews are noisy and often describe the same issue using
different language.

### Objective  
Build an agentic AI system that:
- Consumes daily Google Play Store reviews (batch-wise)
- Extracts high-recall topics (issues, requests, feedback)
- Consolidates semantically similar topics into canonical categories
- Tracks topic frequency daily
- Produces a rolling **T-30 trend report**

### Why Agentic AI  
Traditional topic modeling (LDA, TopicBERT) often fails to merge topics that mean the same thing but are phrased differently.
This system uses:
- LLM-based topic extraction agents  
- Embedding-based semantic consolidation  
- Persistent topic memory across days
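
The consolidation and memory steps listed above can be sketched roughly as follows. This is a minimal, dependency-free illustration rather than the notebook's actual implementation: `TopicMemory`, `assign`, and `cosine_similarity` are hypothetical names, the short vectors stand in for real sentence embeddings, and the 0.85 default simply mirrors the `SIMILARITY_THRESHOLD` value set in the configuration cell.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


class TopicMemory:
    """Hypothetical store of canonical topics that persists across daily batches."""

    def __init__(self, threshold=0.85):
        self.threshold = threshold
        self.labels = []   # canonical topic names
        self.vectors = []  # one embedding per canonical topic

    def assign(self, label, vector):
        """Return the canonical label for a topic, registering a new one if needed."""
        best_index, best_similarity = -1, -1.0
        for i, known in enumerate(self.vectors):
            similarity = cosine_similarity(vector, known)
            if similarity > best_similarity:
                best_index, best_similarity = i, similarity

        # Close enough to a known topic: reuse its canonical label
        if best_index >= 0 and best_similarity >= self.threshold:
            return self.labels[best_index]

        # Otherwise remember this topic as a new canonical category
        self.labels.append(label)
        self.vectors.append(list(vector))
        return label


# Toy 2-D vectors; in the notebook these would come from the embedding model
memory = TopicMemory(threshold=0.85)
print(memory.assign("delivery late", [1.0, 0.0]))    # new canonical topic
print(memory.assign("late delivery", [0.99, 0.05]))  # merged into "delivery late"
print(memory.assign("app crash", [0.0, 1.0]))        # new canonical topic
```

Because the same `TopicMemory` object is reused from one day to the next, a topic first seen on day 1 keeps its canonical label when a paraphrase appears on day 15. A linear scan is fine for a few hundred topics; a vector index would be the natural replacement at larger scale.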


In [26]:
!pip install google-play-scraper pandas numpy sentence-transformers faiss-cpu tqdm




In [27]:
import sys
import pandas as pd

print(sys.version)
print(pd.__version__)


3.10.19 | packaged by Anaconda, Inc. | (main, Oct 21 2025, 16:41:31) [MSC v.1929 64 bit (AMD64)]
2.3.3


In [4]:
from google_play_scraper import reviews


### Daily Review Ingestion Agent

Each day's app reviews are ingested as a separate batch,
simulating real-world daily feedback ingestion.
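
The cells below call `fetch_daily_reviews`, whose definition does not appear in this notebook. One possible sketch is shown here, assuming the `reviews` function of `google-play-scraper` is called with `Sort.NEWEST` and that each returned review is a dict with a `content` string and an `at` timestamp. The helper `filter_reviews_for_day` and the `page_size` and `max_pages` parameters are hypothetical names introduced for illustration.

```python
from datetime import datetime, timedelta


def filter_reviews_for_day(raw_reviews, day):
    """Return the text of reviews whose 'at' timestamp falls on the given day."""
    start = datetime(day.year, day.month, day.day)
    end = start + timedelta(days=1)
    return [
        r["content"]
        for r in raw_reviews
        if r.get("content") and start <= r["at"] < end
    ]


def fetch_daily_reviews(app_id, day, lang="en", country="in",
                        page_size=200, max_pages=50):
    """Page through the newest reviews until the target day has been passed."""
    # Imported lazily so the pure filtering helper works without the scraper installed
    from google_play_scraper import Sort, reviews

    start = datetime(day.year, day.month, day.day)
    collected, token = [], None

    for _ in range(max_pages):
        batch, token = reviews(
            app_id,
            lang=lang,
            country=country,
            sort=Sort.NEWEST,
            count=page_size,
            continuation_token=token,
        )
        if not batch:
            break
        collected.extend(batch)
        # Reviews arrive newest first, so stop once a page reaches past the day
        if batch[-1]["at"] < start:
            break

    return filter_reviews_for_day(collected, day)
```

Since the scraper pages from newest to oldest, reaching a date far in the past can take many pages. With a bounded `max_pages`, an empty result for an old date is therefore plausible and does not necessarily mean that no reviews were written that day.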


In [7]:
from datetime import datetime

# App configuration
APP_ID = "in.swiggy.android"
LANG = "en"
COUNTRY = "in"

# Date configuration
START_DATE = datetime(2024, 6, 1)
TARGET_DATE = datetime(2024, 6, 30)

# Similarity threshold
SIMILARITY_THRESHOLD = 0.85




In [8]:
test_day = START_DATE
sample_reviews = fetch_daily_reviews(APP_ID, test_day)

print(f"Fetched {len(sample_reviews)} reviews for {test_day.date()}")

if sample_reviews:
    print("\nSample review:")
    print(sample_reviews[0])
else:
    print("No reviews found for this date (this is acceptable).")


Fetched 0 reviews for 2024-06-01
No reviews found for this date (this is acceptable).


In [9]:
from sentence_transformers import SentenceTransformer
import faiss
import numpy as np


In [12]:
# Load a lightweight sentence embedding model
model = SentenceTransformer("all-MiniLM-L6-v2")

print("Model loaded successfully")


Model loaded successfully


In [13]:
from datetime import timedelta

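def collect_reviews(app_id, start_date, days=7):
    """Fetch reviews for `days` consecutive days and merge them into one list."""
    all_reviews = []

    for i in range(days):
        day = start_date + timedelta(days=i)
        # Each day is fetched as its own batch, then appended to the running list
        daily = fetch_daily_reviews(app_id, day)
        all_reviews.extend(daily)

    return all_reviews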


In [14]:
START_DATE = datetime(2024, 6, 1)
reviews_text = collect_reviews(APP_ID, START_DATE, days=7)

print("Total reviews collected:", len(reviews_text))



Total reviews collected: 0


In [15]:
if len(reviews_text) == 0:
    reviews_text = [
        "The app is very slow",
        "I love the new update",
        "Too many ads",
        "Great user experience",
        "App crashes frequently"
    ]
    print("Using fallback sample reviews")


Using fallback sample reviews


In [16]:
embeddings = model.encode(reviews_text, show_progress_bar=True)

print("Embeddings shape:", embeddings.shape)


Embeddings shape: (5, 384)


In [17]:
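dimension = embeddings.shape[1]

# Exact (brute-force) index; search returns squared L2 distances
index = faiss.IndexFlatL2(dimension)
# FAISS expects float32 input
index.add(np.asarray(embeddings, dtype="float32"))

print("FAISS index size:", index.ntotal)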


FAISS index size: 5


In [18]:
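def search_similar_reviews(query, top_k=5):
    """Return up to top_k indexed reviews closest to the query, nearest first."""
    # Never ask for more neighbours than the index holds
    k = min(top_k, index.ntotal)
    if k == 0:
        return []

    query_embedding = model.encode([query])
    distances, indices = index.search(np.asarray(query_embedding, dtype="float32"), k)

    # FAISS marks missing neighbours with -1; skip them rather than index from the end
    return [reviews_text[idx] for idx in indices[0] if idx != -1]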


In [19]:
query = "app is slow and crashes"
results = search_similar_reviews(query)

print("Query:", query)
print("\nSimilar reviews:")
for r in results:
    print("-", r)


Query: app is slow and crashes

Similar reviews:
- App crashes frequently
- The app is very slow
- Too many ads
- Great user experience
- I love the new update
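
The search above returns every indexed review, including unrelated ones, because `top_k` equals the index size and no similarity cutoff is applied. A hedged sketch of applying `SIMILARITY_THRESHOLD` to the search output follows. It assumes the embeddings are unit-length (which should be verified for the model in use) and relies on `IndexFlatL2` returning squared L2 distances, so that cosine similarity is `1 - distance / 2`. The function name `filter_by_cosine` is hypothetical.

```python
def filter_by_cosine(texts, squared_l2_distances, indices, threshold=0.85):
    """Keep only neighbours whose cosine similarity to the query meets the threshold.

    Assumes unit-length vectors, for which squared L2 distance d and cosine
    similarity c satisfy d = 2 - 2c.
    """
    kept = []
    for distance, idx in zip(squared_l2_distances, indices):
        if idx < 0:  # FAISS uses -1 for "no neighbour found"
            continue
        cosine = 1.0 - distance / 2.0
        if cosine >= threshold:
            kept.append((texts[idx], cosine))
    return kept


# Toy distances and indices in the shape of one row of index.search output
sample_texts = ["The app is very slow", "Too many ads", "App crashes frequently"]
matches = filter_by_cosine(sample_texts, [0.1, 1.0, 0.3], [2, 1, -1])
for text, cosine in matches:
    print(f"{cosine:.2f}  {text}")
```

In the notebook, the first row of each array returned by `index.search` (`distances[0]` and `indices[0]`) would be passed in place of the toy values.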


In [20]:
from collections import Counter

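def detect_trends(review_texts):
    """Count how many reviews mention each keyword (case-insensitive)."""
    keywords = ["slow", "crash", "ads", "update", "bug", "great", "love"]
    counter = Counter()

    for review in review_texts:
        text = review.lower()
        for keyword in keywords:
            # Substring match, so "crash" also matches "crashes"
            if keyword in text:
                counter[keyword] += 1

    return counter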


In [21]:
trends = detect_trends(reviews_text)

print("Detected review trends:")
for k, v in trends.items():
    print(f"{k}: {v}")


Detected review trends:
slow: 1
update: 1
love: 1
ads: 1
great: 1
crash: 1
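
The counts above cover a single batch. The rolling T-30 report named in the objective could be assembled from per-day counts along the lines of the sketch below. It assumes "T-30" means a 30-day window ending on the target date inclusive (which matches the 1 June to 30 June range in the configuration cell), and `build_trend_report` is a hypothetical name. Days without data are filled with zeros.

```python
from collections import Counter
from datetime import date, timedelta


def build_trend_report(daily_counts, target_date, window_days=30):
    """Build a topic x day table of counts for the window ending on target_date.

    daily_counts maps a date to a mapping of topic -> count for that day.
    Returns (days, report), where report[topic] is a list aligned with days.
    """
    days = [
        target_date - timedelta(days=window_days - 1 - i)
        for i in range(window_days)
    ]
    window = set(days)

    # Only topics that occur inside the window get a row
    topics = sorted(
        {topic for day, counts in daily_counts.items() if day in window
         for topic in counts}
    )
    report = {
        topic: [daily_counts.get(day, {}).get(topic, 0) for day in days]
        for topic in topics
    }
    return days, report


# Toy per-day counts; in the notebook these would come from the daily batches
daily_counts = {
    date(2024, 6, 29): Counter({"crash": 2}),
    date(2024, 6, 30): Counter({"crash": 1, "slow": 3}),
}
days, report = build_trend_report(daily_counts, date(2024, 6, 30))
for topic, counts in report.items():
    print(topic, counts[-3:])  # last three days of the window
```

Keeping the report as plain lists aligned with `days` makes it easy to load into a `pandas.DataFrame` later, with topics as rows and dates as columns.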
