# Installing Libraries:

1. ***google-play-scraper***: Scrapes data from the Google Play Store
2. ***pandas***: For Data Analysis
3. ***matplotlib***: For Data Visualization
4. ***gradio***: For deployment and demonstration
5. ***nltk***: For the stopword list used to find common words in reviews

In [2]:
!pip install google-play-scraper pandas matplotlib gradio nltk
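If a later cell fails with an import error, it can help to confirm which of these packages are installed and at what version (Gradio's API in particular changes between releases). Below is a minimal sketch using only the standard library's `importlib.metadata`; the helper name `installed_versions` is hypothetical, and `nltk` is listed because the second part of the notebook imports it:

```python
from importlib import metadata

# pip distribution names for the libraries this notebook uses
PACKAGES = ["google-play-scraper", "pandas", "matplotlib", "gradio", "nltk"]

def installed_versions(packages=PACKAGES):
    """Return a dict mapping each package name to its installed version, or None if missing."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions

for name, version in installed_versions().items():
    print(f"{name}: {version or 'not installed'}")
```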



# Sorting the Positive, Neutral and Negative Reviews and Putting them on a Graph

Note that this version only plots the number of reviews in each category. Improvements follow in the next section.

## Inputs:
*The app ID* (not the app name): the unique identifier of an app on the Play Store.

### Steps to get the app ID:
1. Open the Google Play Store in a browser on your laptop (I'm using Chrome).
2. Go to an app of your choice (say Instagram) and copy the link from the address bar. For Instagram, the Play Store link looks like this: https://play.google.com/store/apps/details?id=com.instagram.android&hl=en
3. The app ID is the part of the link after `id=` and before the next `&` (if there is one). For Instagram, it is 'com.instagram.android'.
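The three steps above can also be done in code. This is a small sketch that assumes a link in the format shown above; `app_id_from_url` is a hypothetical helper, not part of google-play-scraper. It reads the `id` query parameter with the standard library's `urllib.parse`, so other parameters such as `hl=en` are ignored:

```python
from urllib.parse import urlparse, parse_qs

def app_id_from_url(url: str) -> str:
    """Return the value of the 'id' query parameter from a Play Store app link."""
    query = parse_qs(urlparse(url).query)  # e.g. {'id': ['com.instagram.android'], 'hl': ['en']}
    ids = query.get("id")
    if not ids:
        raise ValueError(f"No 'id=' parameter found in {url!r}")
    return ids[0]

print(app_id_from_url("https://play.google.com/store/apps/details?id=com.instagram.android&hl=en"))
# → com.instagram.android
```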

## Outputs

1. A graph showing the number of positive, neutral and negative reviews
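The cell below labels each review by its 1-5 star rating (above 3 is positive, exactly 3 is neutral, below 3 is negative) and counts the labels. Here is a self-contained sketch of that rule that runs without fetching any reviews; `label_score`, `count_labels` and the sample ratings are made up for illustration. One detail worth noting: pandas' `value_counts()` sorts by frequency, so the sketch reindexes to a fixed order, which keeps a color list such as green, gray, red matched to the right labels:

```python
import pandas as pd

LABELS = ["Positive", "Neutral", "Negative"]

def label_score(score: int) -> str:
    """Map a 1-5 star rating to a sentiment label (same thresholds as the notebook)."""
    if score > 3:
        return "Positive"
    if score == 3:
        return "Neutral"
    return "Negative"

def count_labels(scores) -> pd.Series:
    """Count labels in a fixed Positive/Neutral/Negative order, filling absent labels with 0."""
    labels = pd.Series(scores).map(label_score)
    return labels.value_counts().reindex(LABELS, fill_value=0)

# Made-up ratings, for illustration only
print(count_labels([5, 4, 1, 3, 5, 2]).to_dict())
# → {'Positive': 3, 'Neutral': 1, 'Negative': 2}
```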

In [3]:
from google_play_scraper import Sort, reviews_all
import pandas as pd
import matplotlib.pyplot as plt
import gradio as gr

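# Function to fetch reviews and plot the graph
def get_review_graph(app_id):
    # Fetch all reviews. reviews_all() pages through every available review,
    # so this can take a long time for very popular apps
    result = reviews_all(
        app_id,
        sleep_milliseconds=0,  # no delay between page requests
        lang='en',  # language English
        country='in',  # country India
        sort=Sort.NEWEST  # Sort by newest reviews
    )

    # Convert to DataFrame
    df = pd.DataFrame(result)

    # Stop early if nothing came back (e.g. a mistyped app ID); the 'score' column would not exist
    if df.empty:
        raise gr.Error(f"No reviews found for '{app_id}'. Check the app ID.")

    # Classify reviews by star rating: 4-5 Positive, 3 Neutral, 1-2 Negative
    df['score_label'] = df['score'].apply(lambda x: 'Positive' if x > 3
                                          else ('Neutral' if x == 3
                                                else 'Negative'))

    # Count the reviews in each category. value_counts() sorts by frequency, so
    # reindex to a fixed order that lines up with the bar colors below
    review_counts = (df['score_label'].value_counts()
                     .reindex(['Positive', 'Neutral', 'Negative'], fill_value=0))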

    # Plot the bar chart
    plt.figure(figsize=(6, 4))
    review_counts.plot(kind='bar', color=['green', 'gray', 'red'])
    plt.title(f"Review Sentiment for {app_id}")
    plt.xlabel('Sentiment')
    plt.ylabel('Number of Reviews')
    plt.xticks(rotation=0)

    # Save plot as an image
    plt.savefig('review_graph.png')
    plt.close()

    return 'review_graph.png'

# Gradio interface
with gr.Blocks() as demo:
    app_id_input = gr.Textbox(label="Enter App ID", placeholder="e.g., com.zapmoney")
    output_image = gr.Image(label="Review Graph")

    submit_btn = gr.Button("Submit")
    submit_btn.click(get_review_graph, inputs=app_id_input, outputs=output_image)

# Launch the Gradio app
demo.launch()


Setting queue=True in a Colab notebook requires sharing enabled. Setting `share=True` (you can turn this off by setting `share=False` in `launch()` explicitly).

Colab notebook detected. To show errors in colab notebook, set debug=True in launch()
Running on public URL: https://de3a00d06b2f4c7eed.gradio.live

This share link expires in 72 hours. For free permanent hosting and GPU upgrades, run `gradio deploy` from Terminal to deploy to Spaces (https://huggingface.co/spaces)




# Finding the Frequently Raised Issues and Compliments

## Changes made to the Graph:
 Added data labels to the bars.
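One way to add data labels is matplotlib's `Axes.bar_label` (available since matplotlib 3.4), which writes each bar's height above it; the cell below does the same thing by hand with `plt.text`. This sketch, with a hypothetical `labelled_bar_chart` helper and made-up counts, also builds the chart on a `matplotlib.figure.Figure` instead of `pyplot`, which avoids pyplot's global state. That may matter if several Gradio users submit at once, but it is a design suggestion, not what the notebook does:

```python
from matplotlib.figure import Figure

def labelled_bar_chart(counts, colors=("green", "gray", "red")):
    """Draw one bar per label with its count on top; return the figure and the label Text objects."""
    fig = Figure(figsize=(6, 4))
    ax = fig.subplots()
    bars = ax.bar(list(counts.keys()), list(counts.values()), color=list(colors))
    labels = ax.bar_label(bars)  # writes each bar's height above it
    ax.set_title("Review Sentiment")
    ax.set_xlabel("Sentiment")
    ax.set_ylabel("Number of Reviews")
    return fig, labels

# Made-up counts, for illustration only
fig, labels = labelled_bar_chart({"Positive": 3, "Neutral": 1, "Negative": 2})
print([t.get_text() for t in labels])  # → ['3', '1', '2']
```

To use it in the app, the returned figure could be saved with `fig.savefig('review_graph.png')` in place of the `plt.savefig` call.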

## Additional Functionality
1. I've added a section that shows the most common words in both positive and negative reviews.
2. I've added the option to download the reviews as a CSV file, with the reviewer name, review date, sentiment label and keywords of each review attached.
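The word counts come down to splitting each review into words, dropping stopwords, and tallying what is left with `collections.Counter`. Below is a self-contained sketch of that idea; `STOP_WORDS` here is a small hand-picked set standing in for the NLTK list plus custom words used in the cell below, and `tokenize`/`top_words` are hypothetical names. It strips punctuation from the ends of each word before the `isalpha()` check, so "use!" counts as "use" rather than being discarded:

```python
import string
from collections import Counter

# Small hand-picked stopword set, for illustration only
STOP_WORDS = {"the", "a", "is", "and", "to", "it", "i", "very", "app", "but", "too", "this"}

def tokenize(text):
    """Lowercase, split on whitespace, strip surrounding punctuation, drop stopwords and non-alphabetic tokens."""
    if not isinstance(text, str):  # a review's text may be missing (None)
        return []
    words = (w.strip(string.punctuation) for w in text.lower().split())
    return [w for w in words if w.isalpha() and w not in STOP_WORDS]

def top_words(reviews, n=5):
    """Return the n most frequent non-stopword words across all reviews (ties keep first-seen order)."""
    counts = Counter(w for review in reviews for w in tokenize(review))
    return [word for word, _ in counts.most_common(n)]

# Made-up reviews, for illustration only
sample = ["Fast and easy to use!", "Easy signup, fast approval.", "It is slow, too slow."]
print(top_words(sample, n=3))  # → ['fast', 'easy', 'slow']
```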

In [6]:
from google_play_scraper import Sort, reviews_all
import pandas as pd
import matplotlib.pyplot as plt
import gradio as gr
from collections import Counter
import nltk
from nltk.corpus import stopwords

# Download stopwords list from NLTK
nltk.download('stopwords')
stop_words = set(stopwords.words('english'))

# Add custom words to omit
custom_stopwords = {'app', 'download', 'issue', 'problem', 'good', 'bad', 'loan', 'application'}
stop_words.update(custom_stopwords)  # Combine both stopwords and custom words

# Function to fetch reviews, plot the graph, extract feedback, and export CSV
def get_review_graph_and_feedback(app_id):
    # Fetch reviews
    result = reviews_all(
        app_id,
        sleep_milliseconds=0,  # no delay
        lang='en',  # language English
        country='in',  # country India
        sort=Sort.NEWEST  # Sort by newest reviews
    )

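    # Convert to DataFrame
    df = pd.DataFrame(result)

    # Stop early if nothing came back (e.g. a mistyped app ID); the columns used below would not exist
    if df.empty:
        raise gr.Error(f"No reviews found for '{app_id}'. Check the app ID.")

    # Some reviews may have no text; treat missing text as an empty string
    df['content'] = df['content'].fillna('')

    # Classify reviews based on score
    df['score_label'] = df['score'].apply(lambda x: 'Positive' if x > 3 else ('Neutral' if x == 3 else 'Negative'))

    # Ensure order: Positive, Neutral, Negative
    df['score_label'] = pd.Categorical(df['score_label'], categories=['Positive', 'Neutral', 'Negative'], ordered=True)

    # Extract keywords from reviews (removing stopwords)
    df['keywords'] = df['content'].apply(get_keywords)

    # Format date and time for readability
    df['review_datetime'] = pd.to_datetime(df['at']).dt.strftime('%Y-%m-%d %H:%M:%S')

    # Count the number of reviews in each category; sort=False keeps the category
    # order above instead of sorting by frequency, so the bar colors match
    review_counts = df['score_label'].value_counts(sort=False)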

    # Plot the bar chart
    plt.figure(figsize=(6, 4))
    review_counts.plot(kind='bar', color=['green', 'gray', 'red'])

    # Add labels on top of each bar
    for i, count in enumerate(review_counts):
        plt.text(i, count, str(count), ha='center', va='bottom')

    plt.title(f"Review Sentiment for {app_id}")
    plt.xlabel('Sentiment')
    plt.ylabel('Number of Reviews')
    plt.xticks(rotation=0)

    # Save plot as an image
    plt.savefig('review_graph.png')
    plt.close()

    # Extract feedback (liked best and common issues)
    positive_reviews = df[df['score_label'] == 'Positive']['content']
    negative_reviews = df[df['score_label'] == 'Negative']['content']

    liked_best = get_common_phrases(positive_reviews)
    common_issues = get_common_phrases(negative_reviews)

    # Export the DataFrame as a CSV file with sentiment, keywords, review date, and reviewer name
    export_path = f"{app_id}_reviews.csv"
    df[['userName', 'content', 'score_label', 'keywords', 'review_datetime']].to_csv(export_path, index=False)

    return 'review_graph.png', liked_best, common_issues, export_path

import string  # used to strip punctuation from the ends of words

# Helper: lowercase a review, split it into words, and drop stopwords and non-alphabetic words
def extract_words(review):
    if not isinstance(review, str):  # a review's text may be missing (None/NaN)
        return []
    # Strip surrounding punctuation so "great!" is kept as "great" instead of being dropped
    words = (word.strip(string.punctuation) for word in review.lower().split())
    return [word for word in words if word.isalpha() and word not in stop_words]

# Function to get the most common words in reviews, excluding stopwords
def get_common_phrases(reviews):
    counts = Counter(word for review in reviews for word in extract_words(review))
    common_phrases = counts.most_common(5)  # Get top 5 common words
    return ', '.join(word for word, count in common_phrases)

# Function to extract keywords for individual reviews
def get_keywords(review):
    return ' '.join(extract_words(review))

# Gradio interface
with gr.Blocks() as demo:
    app_id_input = gr.Textbox(label="Enter App ID", placeholder="e.g., com.zapmoney")
    output_image = gr.Image(label="Review Graph")
    liked_best_output = gr.Textbox(label="What users liked best")
    common_issues_output = gr.Textbox(label="Most common issues")
    csv_output = gr.File(label="Download CSV")

    submit_btn = gr.Button("Submit")
    submit_btn.click(get_review_graph_and_feedback,
                     inputs=app_id_input,
                     outputs=[output_image, liked_best_output, common_issues_output, csv_output])

# Launch the Gradio app
demo.launch()


[nltk_data] Downloading package stopwords to /root/nltk_data...
[nltk_data]   Package stopwords is already up-to-date!


Setting queue=True in a Colab notebook requires sharing enabled. Setting `share=True` (you can turn this off by setting `share=False` in `launch()` explicitly).

Colab notebook detected. To show errors in colab notebook, set debug=True in launch()
Running on public URL: https://7ea8ebdc26ce3f57e0.gradio.live

This share link expires in 72 hours. For free permanent hosting and GPU upgrades, run `gradio deploy` from Terminal to deploy to Spaces (https://huggingface.co/spaces)


