## App Review Trend Analysis using Agentic AI

### Problem Statement  
Product teams need to understand recurring issues, feature requests, and feedback trends from
app store reviews. However, user reviews are noisy and often describe the same issue using
different language.

### Objective  
Build an agentic AI system that:
- Consumes daily Google Play Store reviews (batch-wise)
- Extracts high-recall topics (issues, requests, feedback)
- Consolidates semantically similar topics into canonical categories
- Tracks topic frequency daily
- Produces a rolling **T-30 trend report**
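
The last two objectives amount to a topic-by-date count matrix over a trailing 30-day window. The following is a minimal sketch in plain Python, assuming every review has already been assigned a canonical topic. The function name `build_trend_report` and the `(date, topic)` input shape are illustrative and not part of the original notebook.

```python
from collections import Counter
from datetime import date, timedelta

def build_trend_report(labelled, end_date, window_days=30):
    """Count topic mentions per day over the window ending at end_date.

    labelled: iterable of (date, topic) pairs (hypothetical input shape).
    Returns (days, report), where report[topic] is a list of daily counts
    aligned with days, oldest day first.
    """
    # T-30 .. T inclusive gives window_days + 1 columns.
    days = [end_date - timedelta(days=offset)
            for offset in range(window_days, -1, -1)]
    in_window = set(days)
    counts = Counter((topic, day) for day, topic in labelled if day in in_window)
    topics = sorted({topic for topic, _ in counts})
    report = {topic: [counts[(topic, day)] for day in days] for topic in topics}
    return days, report

# Toy input; the last record falls outside the window and is ignored.
sample = [
    (date(2025, 12, 23), "Login Issue"),
    (date(2025, 12, 23), "Login Issue"),
    (date(2025, 12, 24), "Crash Issue"),
    (date(2025, 10, 1), "Crash Issue"),
]
days, report = build_trend_report(sample, end_date=date(2025, 12, 24))
```

Each row of `report` can then be rendered as one line of the trend table, with `days` as the column headers.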

### Why Agentic AI  
Traditional topic models (LDA, TopicBERT) often fail to merge topics that are semantically similar but worded differently.
This system uses:
- LLM-based topic extraction agents  
- Embedding-based semantic consolidation  
- Persistent topic memory across days
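
One way to combine the last two components is a small topic memory that maps each newly extracted topic onto the closest known canonical topic, or registers it as new. This is a sketch under assumptions: the class name `TopicMemory`, the 0.8 cosine threshold, and the toy 3-dimensional vectors are all hypothetical. In a real pipeline the vectors would come from an embedding model such as the one loaded later in this notebook.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

class TopicMemory:
    """Keeps canonical topics across days and maps new phrasings onto them."""

    def __init__(self, threshold=0.8):
        self.threshold = threshold  # assumed cut-off; tune on real data
        self.names = []
        self.vectors = []

    def consolidate(self, name, vector):
        """Return the canonical topic for (name, vector), registering it if new."""
        best_name, best_sim = None, -1.0
        for known_name, known_vec in zip(self.names, self.vectors):
            sim = cosine(vector, known_vec)
            if sim > best_sim:
                best_name, best_sim = known_name, sim
        if best_sim >= self.threshold:
            return best_name
        # Nothing similar enough is known yet, so this becomes a canonical topic.
        self.names.append(name)
        self.vectors.append(list(vector))
        return name

# Toy vectors standing in for real embeddings.
memory = TopicMemory(threshold=0.8)
a = memory.consolidate("delivery guy was rude", [0.9, 0.1, 0.0])
b = memory.consolidate("delivery partner behaved badly", [0.85, 0.2, 0.05])
c = memory.consolidate("app crashes on launch", [0.0, 0.1, 0.95])
```

Because the memory object outlives a single batch, a phrasing first seen on one day is merged into the same canonical topic when a variant appears on a later day. The linear scan is adequate for a few hundred topics; a vector index would be the natural replacement at larger scale.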


In [39]:
!pip install google-play-scraper pandas numpy sentence-transformers faiss-cpu tqdm




In [40]:
import pandas as pd
import numpy as np
from datetime import datetime
from google_play_scraper import reviews
from sentence_transformers import SentenceTransformer
import faiss



In [41]:
APP_ID = "in.swiggy.android"   # keep one app only
START_DATE = datetime(2024, 6, 1)
END_DATE = datetime.now()



### Daily Review Ingestion Agent

Each day's app reviews are ingested as a separate batch,
simulating real-world daily feedback ingestion.
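
The batching described above can be sketched as a generator that groups review records by date and yields one batch per day in chronological order. The helper name `iter_daily_batches` is hypothetical, and plain dictionaries are used here only to keep the example dependency-free; with a pandas DataFrame, `df.groupby("date")` gives an equivalent grouping.

```python
from collections import defaultdict
from datetime import date

def iter_daily_batches(records):
    """Yield (day, [review, ...]) pairs, oldest day first.

    records: iterable of dicts with "date" and "review" keys.
    """
    by_day = defaultdict(list)
    for rec in records:
        by_day[rec["date"]].append(rec["review"])
    for day in sorted(by_day):
        yield day, by_day[day]

# Toy records in arbitrary order.
records = [
    {"date": date(2025, 12, 24), "review": "good app"},
    {"date": date(2025, 12, 23), "review": "app is not working"},
    {"date": date(2025, 12, 24), "review": "very good service"},
]
batches = list(iter_daily_batches(records))
```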


In [42]:
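def fetch_reviews_with_dates(app_id, max_reviews=2000):
    """Fetch up to max_reviews reviews and keep those dated within [START_DATE, END_DATE]."""
    # reviews() returns (list_of_review_dicts, continuation_token); the token is unused here.
    result, _ = reviews(
        app_id,
        lang="en",
        country="in",
        count=max_reviews
    )

    data = []
    for r in result:
        # Skip reviews with no timestamp or no text, and those outside the analysis window.
        if r.get("at") and r.get("content") and START_DATE <= r["at"] <= END_DATE:
            data.append({
                "date": r["at"].date(),
                "review": r["content"]
            })

    # Explicit columns keep the frame usable even when no review passes the filter.
    return pd.DataFrame(data, columns=["date", "review"])

df = fetch_reviews_with_dates(APP_ID)
df.head()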






| | date | review |
|---|---|---|
| 0 | 2025-12-24 | very good service |
| 1 | 2025-12-24 | nice 👍🏻 |
| 2 | 2025-12-24 | good app |
| 3 | 2025-12-24 | wow delivering at 2:43 AM 😱 |
| 4 | 2025-12-24 | I loved swiggy Services till now. But today, I... |


In [43]:
model = SentenceTransformer("all-MiniLM-L6-v2")
print("Model loaded")



Model loaded


In [44]:
embeddings = model.encode(df["review"].tolist(), show_progress_bar=True)




In [45]:
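dimension = embeddings.shape[1]
# IndexFlatL2 performs exact search and reports squared L2 distances (smaller = closer).
index = faiss.IndexFlatL2(dimension)
# FAISS expects a contiguous float32 array.
index.add(np.ascontiguousarray(embeddings, dtype="float32"))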



In [46]:
queries = {
    "Crash Issue": "app crashes, stops working, keeps closing",
    "Login Issue": "cannot login, sign in problem, authentication failed",
    "Performance Issue": "app is slow, lagging, takes long time",
    "UI Feedback": "design, interface, layout, UI",
    "Positive Feedback": "great app, love it, amazing experience"
}


In [47]:
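results = []

for topic, query in queries.items():
    q_vec = model.encode([query])
    # D holds squared L2 distances, I the row positions of the 5 nearest reviews.
    D, I = index.search(np.asarray(q_vec, dtype="float32"), k=5)

    for score, idx in zip(D[0], I[0]):
        results.append({
            "date": df.iloc[idx]["date"],
            "review": df.iloc[idx]["review"],
            "detected_topic": topic,
            # Despite the column name, this is a distance: lower means a closer match.
            "similarity_score": float(score)
        })

final_df = pd.DataFrame(results)
final_df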




| | date | review | detected_topic | similarity_score |
|---|---|---|---|---|
| 0 | 2025-12-23 | app is not working bug create issues | Crash Issue | 0.750286 |
| 1 | 2025-12-22 | bad app | Crash Issue | 0.900621 |
| 2 | 2025-12-21 | bad app | Crash Issue | 0.900621 |
| 3 | 2025-12-21 | app not working properly new features are not ... | Crash Issue | 0.964294 |
| 4 | 2025-12-22 | bad app no customer service | Crash Issue | 0.993483 |
| 5 | 2025-12-21 | not able to login my account. it always shows ... | Login Issue | 0.553503 |
| 6 | 2025-12-23 | My account is not login, its get session time ... | Login Issue | 0.796147 |
| 7 | 2025-12-22 | Did not able to Loginn .Shiw error that tooman... | Login Issue | 0.912552 |
| 8 | 2025-12-21 | i am unable to login to my account due to mult... | Login Issue | 1.050485 |
| 9 | 2025-12-21 | I'm not able to login using any mobile number ... | Login Issue | 1.055312 |
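
The `similarity_score` values above come from `faiss.IndexFlatL2`, which reports squared L2 distances, so smaller numbers mean closer matches. If the embeddings are unit-length (this is believed to hold for `all-MiniLM-L6-v2`, but is an assumption worth checking on the actual vectors), a squared distance converts to cosine similarity through ||u - v||² = 2 - 2·cos(u, v). The sketch below checks that identity on toy vectors; the helper names are illustrative only.

```python
import math

def normalize(v):
    """Scale a vector to unit length."""
    n = math.sqrt(sum(a * a for a in v))
    return [a / n for a in v]

def l2_squared(u, v):
    """Squared Euclidean distance, the quantity IndexFlatL2 reports."""
    return sum((a - b) ** 2 for a, b in zip(u, v))

def cosine_from_l2_squared(d2):
    """Valid only for unit-length vectors: cos = 1 - d2 / 2."""
    return 1.0 - d2 / 2.0

# Toy unit vectors standing in for sentence embeddings.
u = normalize([3.0, 4.0])
v = normalize([4.0, 3.0])
d2 = l2_squared(u, v)
cos_direct = sum(a * b for a, b in zip(u, v))
cos_converted = cosine_from_l2_squared(d2)
```

Under that assumption, the first row's score of 0.750286 would correspond to a cosine similarity of roughly 0.625. Applying a minimum-similarity cut-off before assigning a topic would also help exclude generic reviews such as "bad app" from specific categories like Crash Issue.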
