# App Reviews AI - Demo Notebook

This notebook demonstrates how to use the App Reviews AI system to analyze app reviews from the Google Play Store.

## Setup

First, let's set up the environment and import the necessary modules.

In [None]:
# Import required libraries
import os
import sys
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from datetime import datetime
from dotenv import load_dotenv

# Add the project root to the path so we can import our modules
sys.path.insert(0, '..')

# Load environment variables from .env file
load_dotenv()

# Check if OpenAI API key is available
if not os.environ.get("OPENAI_API_KEY"):
    print("⚠️ WARNING: OPENAI_API_KEY not found in environment variables. Some features may not work.")

## Initialize the Runner

Now let's import and initialize our review analysis runner.

In [None]:
from src.runner import ReviewAnalysisRunner

# Initialize the runner
runner = ReviewAnalysisRunner()

# Initialize the modules
runner._initialize_modules()

## Fetch App Reviews

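Let's fetch reviews for an example app from the Google Play Store. `com.example.app` is a placeholder: replace it with the package name of the app you want to analyze, and adjust the date range and review limit as needed.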
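
The runner accepts human-readable bounds such as `"6 months ago"` and `"now"`, but this notebook does not show how it parses them. The sketch below is a hypothetical `resolve_date` helper that shows one way such strings could map to concrete datetimes. It assumes a month is approximated as 30 days and a year as 365 days.

```python
import re
from datetime import datetime, timedelta

# Hypothetical approximation: a month is treated as 30 days and a year as 365 days.
UNIT_DAYS = {"day": 1, "week": 7, "month": 30, "year": 365}


def resolve_date(expr, now=None):
    """Resolve "now", "<N> <unit>(s) ago", or an ISO date string to a datetime."""
    if now is None:
        now = datetime.now()
    key = expr.strip().lower()
    if key == "now":
        return now
    match = re.fullmatch(r"(\d+)\s+(day|week|month|year)s?\s+ago", key)
    if match:
        count, unit = int(match.group(1)), match.group(2)
        return now - timedelta(days=count * UNIT_DAYS[unit])
    # Anything else is treated as an absolute date such as "2024-01-31".
    return datetime.fromisoformat(expr.strip())


reference = datetime(2024, 7, 1)
print(resolve_date("6 months ago", now=reference))  # 2024-01-03 00:00:00
print(resolve_date("now", now=reference))           # 2024-07-01 00:00:00
```

The 30-day approximation keeps the arithmetic in the standard library. A calendar-exact version would use something like `dateutil.relativedelta` instead.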

In [ ]:
# Fetch reviews
app_id = "com.example.app"   # Placeholder: replace with the target app's package name
start_date = "6 months ago"  # Fetch reviews from 6 months ago...
end_date = "now"             # ...up to today
max_reviews = 1000           # Limit to 1000 reviews for demonstration purposes

reviews_df = runner.fetch_reviews(
    app_id=app_id,
    start_date=start_date,
    end_date=end_date,
    max_reviews=max_reviews
)

# Display app info
app_info = runner.pipeline_metadata.get("app_info", {})
print(f"App: {app_info.get('name')}")
print(f"Developer: {app_info.get('developer')}")
print(f"Total reviews: {app_info.get('total_reviews')}")
print(f"Average rating: {app_info.get('average_rating')}")

# Show the first few reviews
reviews_df.head()

## Preprocess Reviews

Now let's preprocess the reviews to clean and normalize the text.
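
The preprocessing step produces `cleaned_text` and `normalized_text` columns, but the notebook does not show which rules it applies. The sketch below illustrates one plausible two-stage split under that assumption:

- `clean_text` removes noise such as URLs and extra whitespace while keeping the text readable.
- `normalize_text` lowercases the text and strips punctuation so that words can be counted and matched.

Both helpers are hypothetical, and the project's actual rules may differ.

```python
import re
import unicodedata


def clean_text(text):
    """Hypothetical cleaning: drop URLs and collapse whitespace, keeping case and punctuation."""
    text = re.sub(r"https?://\S+", " ", text)
    return re.sub(r"\s+", " ", text).strip()


def normalize_text(text):
    """Hypothetical normalization: Unicode NFKC, lowercase, punctuation removed."""
    text = unicodedata.normalize("NFKC", text).lower()
    text = re.sub(r"[^\w\s]", " ", text)
    return re.sub(r"\s+", " ", text).strip()


raw = "Great app!!  Crashes on login though... see https://example.com/bug"
cleaned = clean_text(raw)
print(cleaned)                  # Great app!! Crashes on login though... see
print(normalize_text(cleaned))  # great app crashes on login though see
```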

In [None]:
# Preprocess reviews
processed_df = runner.preprocess_reviews(reviews_df)

# Show the preprocessed data
processed_df[["text", "cleaned_text", "normalized_text"]].head()

## Analyze Reviews

Let's run sentiment analysis, topic modeling, keyword extraction, and trend analysis on the preprocessed reviews.
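
The sentiment printout in the next cell divides each label's count by the number of reviews. The stand-alone sketch below performs the same calculation on a plain list of labels and returns 0.0 rather than dividing by zero when no reviews were fetched. It assumes exactly the three labels `positive`, `neutral`, and `negative`.

```python
from collections import Counter


def sentiment_percentages(labels):
    """Percentage of each sentiment label; an empty input yields 0.0 for every label."""
    counts = Counter(labels)
    total = len(labels)
    return {
        label: (counts[label] / total * 100 if total else 0.0)
        for label in ("positive", "neutral", "negative")
    }


print(sentiment_percentages(["positive", "negative", "positive", "neutral"]))
# {'positive': 50.0, 'neutral': 25.0, 'negative': 25.0}
```

With pandas, `analyzed_df["sentiment"].value_counts(normalize=True) * 100` gives the same shares directly.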

In [None]:
# Analyze reviews
analysis_output = runner.analyze_reviews(
    reviews_df=processed_df,
    analysis_types=["sentiment", "topics", "keywords", "trends"]
)

# Extract updated DataFrame and analysis results
analyzed_df = analysis_output["reviews_df"]
analysis_results = analysis_output["analysis_results"]

# Show sentiment analysis results
if "sentiment" in analysis_results:
    print("=== Sentiment Analysis ===\n")
    sentiment_counts = analyzed_df["sentiment"].value_counts()
    print(sentiment_counts)
    print(f"\nPositive percentage: {sentiment_counts.get('positive', 0) / len(analyzed_df) * 100:.2f}%")
    print(f"Neutral percentage: {sentiment_counts.get('neutral', 0) / len(analyzed_df) * 100:.2f}%")
    print(f"Negative percentage: {sentiment_counts.get('negative', 0) / len(analyzed_df) * 100:.2f}%")

# Show topic modeling results
if "topics" in analysis_results:
    print("\n=== Topic Modeling ===\n")
    _, topic_words = analysis_results["topics"]
    for topic_id, words in topic_words.items():
        print(f"Topic {topic_id}: {', '.join(words[:10])}")

# Show some analyzed reviews with sentiment and topics
analyzed_df[["text", "sentiment", "primary_topic", "rating"]].head()

## Keyword Analysis

Let's examine the top keywords extracted from the reviews.
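
The notebook does not show how the keyword extractor scores terms; it may use TF-IDF or a similar weighting. As a simple baseline for comparison, the sketch below counts non-stop-word tokens across reviews. The small stop-word list and the `top_keywords` helper are illustrative assumptions.

```python
import re
from collections import Counter

# Illustrative stop-word list; a real pipeline would use a much fuller one.
STOP_WORDS = {"the", "a", "an", "and", "is", "it", "to", "i", "this", "of", "on", "but"}


def top_keywords(texts, n=5):
    """Return the n most frequent tokens (3+ letters, not stop words) across all texts."""
    counts = Counter()
    for text in texts:
        tokens = re.findall(r"[a-z']+", text.lower())
        counts.update(t for t in tokens if t not in STOP_WORDS and len(t) > 2)
    return counts.most_common(n)


reviews = [
    "The app crashes on login",
    "Login is slow and the app crashes",
    "I love the new design but login fails",
]
print(top_keywords(reviews, n=3))  # [('login', 3), ('app', 2), ('crashes', 2)]
```

Raw counts favor words that are common everywhere. Weighting schemes such as TF-IDF down-weight those words, which is one reason a dedicated extractor can rank terms differently.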

In [None]:
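# Display keyword analysis results
from IPython.display import display

if "keywords" in analysis_results:
    keywords_df = analysis_results["keywords"]
    print("Top 20 Keywords:")
    # An expression inside an if-block is not rendered automatically, so call display()
    display(keywords_df.head(20))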

## Rating Distribution Visualization

Let's create a visualization of the rating distribution.
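
A rating distribution is a count of reviews per star value. The sketch below is independent of the project's visualizer and shows only the counting step. It fills in stars with no reviews so that every bar from 1 to 5 appears, and it assumes ratings are integers from 1 to 5.

```python
from collections import Counter


def rating_distribution(ratings):
    """Count reviews per star rating, including stars that received no reviews."""
    counts = Counter(ratings)
    return {star: counts.get(star, 0) for star in range(1, 6)}


def text_histogram(distribution, width=20):
    """Print one bar per star, scaled so the most common rating spans `width` characters."""
    peak = max(distribution.values()) or 1
    for star, count in distribution.items():
        bar = "#" * round(count / peak * width)
        print(f"{star} star: {bar:<{width}} {count}")


dist = rating_distribution([5, 5, 4, 1, 5, 3, 1])
print(dist)  # {1: 2, 2: 0, 3: 1, 4: 1, 5: 3}
text_histogram(dist)
```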

In [ ]:
# Create rating distribution visualization
rating_viz = runner.visualizer.plot_rating_distribution(
    data=analyzed_df,
    title="Mobile App - Rating Distribution",
    use_plotly=True,
    close_fig=False
)

# Display the figure
rating_viz["figure"].show()

## Rating Trend Over Time

Let's examine how ratings have changed over time.
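
The trend cell passes `freq="W"`. In pandas, that alias is shorthand for `"W-SUN"`: Monday-to-Sunday weeks labeled by their closing Sunday. Assuming the visualizer bins reviews the same way, the sketch below reproduces weekly average ratings with the standard library. `weekly_average_rating` is a hypothetical helper.

```python
from collections import defaultdict
from datetime import date, timedelta


def weekly_average_rating(reviews):
    """Average rating per week, keyed by the Sunday that closes each Monday-Sunday week."""
    buckets = defaultdict(list)
    for day, rating in reviews:
        # weekday() is 0 for Monday and 6 for Sunday, so this jumps forward to the week's Sunday
        week_end = day + timedelta(days=6 - day.weekday())
        buckets[week_end].append(rating)
    return {week: sum(r) / len(r) for week, r in sorted(buckets.items())}


reviews = [
    (date(2024, 7, 1), 5),  # Monday
    (date(2024, 7, 7), 3),  # Sunday of the same week
    (date(2024, 7, 8), 2),  # Monday of the following week
]
print(weekly_average_rating(reviews))
# {datetime.date(2024, 7, 7): 4.0, datetime.date(2024, 7, 14): 2.0}
```

Weeks with only a handful of reviews can swing sharply. When reading the plotted trend, it helps to check how many reviews each week contains.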

In [ ]:
# Create rating trend visualization
trend_viz = runner.visualizer.plot_rating_trend(
    data=analyzed_df,
    title="Mobile App - Rating Trend",
    freq="W",  # Weekly
    use_plotly=True,
    close_fig=False
)

# Display the figure
trend_viz["figure"].show()

## Sentiment Distribution

Let's visualize the sentiment distribution.

In [ ]:
# Create sentiment distribution visualization
sentiment_viz = runner.visualizer.plot_sentiment_distribution(
    data=analyzed_df,
    title="Mobile App - Sentiment Distribution",
    use_plotly=True,
    close_fig=False
)

# Display the figure
sentiment_viz["figure"].show()

## Word Cloud

Let's create a word cloud to visualize the most common terms in the reviews.

In [ ]:
# Create word cloud visualization
wordcloud_viz = runner.visualizer.plot_word_cloud(
    data=analyzed_df,
    title="Mobile App - Word Cloud",
    close_fig=False
)

# Display the figure
plt.figure(figsize=(12, 8))
plt.imshow(wordcloud_viz["figure"])
plt.axis("off")
plt.title("Mobile App - Word Cloud")
plt.show()

## Topic Distribution

Let's visualize the distribution of topics.
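
A topic distribution is easier to read when each topic is labeled by its top words rather than its numeric ID. The sketch below computes each topic's share of reviews and builds such labels. It assumes the values in a `primary_topic`-style column are the same IDs used as keys in `topic_words`, as the topic printout earlier suggests. `topic_shares` is a hypothetical helper.

```python
from collections import Counter


def topic_shares(primary_topics, topic_words, n_words=3):
    """Percentage of reviews per primary topic, labeled with that topic's top words."""
    counts = Counter(primary_topics)
    total = sum(counts.values())
    shares = {}
    for topic_id, count in counts.most_common():
        label = ", ".join(topic_words.get(topic_id, [])[:n_words]) or f"topic {topic_id}"
        shares[label] = round(count / total * 100, 1)
    return shares


topic_words = {0: ["login", "crash", "error", "update"], 1: ["design", "ui", "clean"]}
print(topic_shares([0, 1, 0, 0], topic_words))
# {'login, crash, error': 75.0, 'design, ui, clean': 25.0}
```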

In [ ]:
# Create topic distribution visualization
if "topics" in analysis_results:
    _, topic_words = analysis_results["topics"]
    
    topic_viz = runner.visualizer.plot_topic_distribution(
        data=analyzed_df,
        topic_words=topic_words,
        title="Mobile App - Topic Distribution",
        use_plotly=True,
        close_fig=False
    )
    
    # Display the figure
    topic_viz["figure"].show()

## Generate Insights with an LLM

Let's use an LLM to generate insights from the analyzed reviews.
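
`generate_insights` returns LLM-written analyses, but the prompts it sends are not shown. The hypothetical `build_insight_prompt` helper below sketches one way to pack aggregate statistics and a few sample reviews into a plain-text prompt. It makes no API call, and its wording and parameters are assumptions.

```python
import textwrap


def build_insight_prompt(app_name, avg_rating, sentiment_pct, sample_reviews, max_reviews=3):
    """Assemble a plain-text prompt that summarizes review statistics for an LLM."""
    lines = [
        f"You are analyzing Google Play reviews for {app_name}.",
        f"Average rating: {avg_rating:.2f} / 5.",
        "Sentiment breakdown: " + ", ".join(f"{k} {v:.0f}%" for k, v in sentiment_pct.items()),
        "Representative reviews:",
    ]
    # Shorten long reviews so the prompt stays compact
    for review in sample_reviews[:max_reviews]:
        lines.append(f"- {textwrap.shorten(review, width=120, placeholder='...')}")
    lines.append("List the three most common issues and one suggestion for each.")
    return "\n".join(lines)


prompt = build_insight_prompt(
    "Example App",
    3.4,
    {"positive": 50, "neutral": 20, "negative": 30},
    ["Crashes every time I open settings", "Love the new dark mode"],
)
print(prompt)
```

Sending summary statistics plus a small sample keeps the prompt short, rather than passing every review to the model.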

In [None]:
# Generate insights
insights = runner.generate_insights(
    reviews_df=analyzed_df,
    analysis_results=analysis_results,
    insight_types=["general", "issues", "suggestions"]
)

# Display general insights
if "general" in insights:
    print("=== General Insights ===\n")
    print(insights["general"]["analysis"])

# Display identified issues
if "issues" in insights:
    print("\n=== Identified Issues ===\n")
    print(insights["issues"]["analysis"])

# Display suggestions
if "suggestions" in insights:
    print("\n=== Suggestions for Improvement ===\n")
    print(insights["suggestions"]["analysis"])

## Save Reviews to Storage

Let's save the analyzed reviews to our storage system.
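
`store_reviews` returns only a success flag, and the storage backend is not shown. Assuming a relational store, the sketch below persists reviews to SQLite under a hypothetical schema keyed on a review ID. Because rows are replaced rather than appended, re-running the notebook does not create duplicates.

```python
import sqlite3

# Hypothetical schema; the project's actual storage layout is not shown in this notebook.
SCHEMA = """
CREATE TABLE IF NOT EXISTS reviews (
    review_id TEXT PRIMARY KEY,
    text      TEXT NOT NULL,
    rating    INTEGER,
    sentiment TEXT
)
"""


def store_reviews(conn, rows):
    """Upsert review rows and return the total number of stored reviews."""
    conn.execute(SCHEMA)
    with conn:  # commits on success, rolls back if an insert fails
        conn.executemany(
            "INSERT OR REPLACE INTO reviews (review_id, text, rating, sentiment) "
            "VALUES (:review_id, :text, :rating, :sentiment)",
            rows,
        )
    return conn.execute("SELECT COUNT(*) FROM reviews").fetchone()[0]


conn = sqlite3.connect(":memory:")
rows = [
    {"review_id": "r1", "text": "Great app", "rating": 5, "sentiment": "positive"},
    {"review_id": "r2", "text": "Keeps crashing", "rating": 1, "sentiment": "negative"},
]
print(store_reviews(conn, rows))  # 2
print(store_reviews(conn, rows))  # still 2: rows were replaced, not duplicated
```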

In [None]:
# Store reviews
success = runner.store_reviews(analyzed_df)
print(f"Stored reviews successfully: {success}")

## Index Reviews in Vector Database

Let's index the reviews in our vector database for semantic search.

In [None]:
# Index reviews in vector database
success = runner.index_reviews(analyzed_df)
print(f"Indexed reviews successfully: {success}")

# Get vector database stats
stats = runner.vector_db.get_collection_stats()
print("\nVector Database Stats:")
for key, value in stats.items():
    print(f"{key}: {value}")

## Semantic Search

Let's search for reviews semantically using the vector database.
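
The vector database's embedding model and scoring are not shown in this notebook. To illustrate the mechanics (embed each review, store it with metadata, rank by similarity), the sketch below substitutes a toy bag-of-words vector and cosine similarity. Unlike a learned embedding model, it only matches shared words, so it is a lexical stand-in rather than true semantic search. `ToyIndex` and its methods are hypothetical. They only echo the `search(query, n_results=...)` call and the `score`, `rating`, and `sentiment` fields used below.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy bag-of-words vector; a real vector database would use a learned embedding model."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class ToyIndex:
    def __init__(self):
        self.items = []  # (vector, record) pairs

    def add(self, text, **metadata):
        self.items.append((embed(text), {"text": text, **metadata}))

    def search(self, query, n_results=5):
        q = embed(query)
        scored = [{**rec, "score": round(cosine(q, vec), 3)} for vec, rec in self.items]
        return sorted(scored, key=lambda r: r["score"], reverse=True)[:n_results]


index = ToyIndex()
index.add("Checkout keeps failing with a payment error", rating=1, sentiment="negative")
index.add("Beautiful interface and smooth navigation", rating=5, sentiment="positive")
index.add("Support never answered my email", rating=2, sentiment="negative")

for hit in index.search("payment error at checkout", n_results=2):
    print(hit["score"], hit["rating"], hit["text"])
```

Here a higher score means more similar. Some vector databases report a distance instead, where lower is better, so check which convention the `score` field in the results follows.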

In [None]:
# Perform semantic searches.
# These example queries come from a travel-booking domain; adapt them to the app you analyze.
search_queries = [
    "Problems with booking flights",
    "Issues with customer service",
    "Compliments about the app's user interface",
    "Problems with check-in process"
]

for query in search_queries:
    print(f"\n=== Search: '{query}' ===\n")
    results = runner.vector_db.search(query, n_results=5)
    
    for i, result in enumerate(results):
        print(f"Result {i+1} (Score: {result.get('score', 'N/A')})")
        print(f"Rating: {result.get('rating', 'N/A')}")
        print(f"Sentiment: {result.get('sentiment', 'N/A')}")
        # Truncate long review text to 200 characters; fall back to "N/A" if text is missing or None
        text = result.get("text") or "N/A"
        print(f"Text: {text[:200]}..." if len(text) > 200 else f"Text: {text}")
        print()

## Create a Dashboard

Finally, let's create a comprehensive dashboard with all our visualizations.

In [ ]:
# Create dashboard
if "topics" in analysis_results:
    _, topic_words = analysis_results["topics"]
else:
    topic_words = None

dashboard = runner.visualizer.create_dashboard(
    data=analyzed_df,
    topic_words=topic_words,
    title="Mobile App Review Analysis Dashboard"
)

# Display dashboard file path
print(f"Dashboard created at: {dashboard['file_path']}")
print("\nOpen this file in a web browser to view the interactive dashboard.")

## Summary

In this notebook, we've demonstrated the capabilities of the App Reviews AI system:

1. Fetching reviews from the Google Play Store
2. Preprocessing and cleaning review text
3. Analyzing reviews for sentiment and topics
4. Extracting keywords and trends
5. Creating visualizations
6. Generating insights using an LLM
7. Storing and indexing reviews for semantic search
8. Creating a comprehensive dashboard

To run the full pipeline in one go, you can use the `run_pipeline()` method from the runner, or use the command-line interface by running `python src/runner.py` with the appropriate arguments.