Skip to content

"Utilized Python with Pandas, NumPy, and TensorFlow for data scraping and sentiment analysis in Microsoft Azure Data Studio. Employed MS Excel for data cleaning and exploration, with analysis done in PostgreSQL. Utilized Microsoft Power BI for visualization, deriving actionable insights."

Notifications You must be signed in to change notification settings

Mariyajoseph24/SugarFit_Google_Play_Store_Review_Analysis_and_Power_BI_Reporting

Repository files navigation

Sugarfit Google Play Store Review Analysis and Power BI Reporting

Coding

Contents📖

Mobile applications heavily rely on user feedback to refine their offerings and ensure customer satisfaction. The Sugar.fit Android app, aimed at addressing diabetes-related concerns, stands to benefit greatly from an in-depth analysis of user reviews sourced from the Google Play Store. This project endeavors to leverage advanced analytics techniques to dissect these reviews, extracting valuable insights that can inform strategic decisions and drive improvements in app functionality and user experience.

Feel free to reach out for any questions or suggestions about this project. I'm open to discussions and eager to assist. Linkedln | Mariya Joseph

Don't forget to follow and star ⭐ the repository if you find it valuable.

    Tools Used🛠️:
  • Database:PostgreSQL
  • Programming Language: Python
  • Libraries: Pandas, Numpy, tensorflow
  • IDE: Microsoft Azure Data Studio

    Tools Used🛠️:
  • Programming Language: Python
  • Libraries: Pandas, Numpy, Tensorflow
  • IDE: Microsoft Azure Data Studio
  • Import Required Libraries
  • from google_play_scraper import app, Sort ,reviews_all
    from app_store_scraper import AppStore
    import pandas as pd 
    import numpy as np 
    import json,os,uuid
  • Fetching Google Play Store Reviews for Sugarfit Android app
  • g_reviews=reviews_all(
        'fit.sugar.android',
        sleep_milliseconds=0,
        lang='en',
        country='us',
        sort=Sort.NEWEST,
    )
    
  • Transforming and Cleaning Google Play Store Reviews Data for Analysis
  • g_df=pd.DataFrame(np.array(g_reviews),columns=['review'])
    g_df2=g_df.join(pd.DataFrame(g_df.pop('review').tolist()))
    
    g_df2.drop(columns={'userImage','reviewCreatedVersion'},inplace=True)
    g_df2.rename(columns={'reviewId':'ID','userName':'Username','content':'Review','score':'AppRating','thumbsUpCount':'ThumbsUpCount','at':'ReviewTime','replyContent':'CompanyReply','repliedAt':'ReplyTime','appVersion':'AppVersion'},inplace=True)
    g_df2
    
    

    Coding

  • Calculating the Mean App Rating of Sugar.fit Android App Reviews
  • g_df2['AppRating'].mean()
    

    Coding

  • Installing TensorFlow
  • !pip install tensorflow
    
    import tensorflow as tf
    
    !pip install ipykernel
    
    import tensorflow as tf
    print(tf.__version__)
    

    Coding

  • Importing and Configuring Sentiment Analysis Pipeline
  • from transformers import pipeline
    sentiment_analysis = pipeline("sentiment-analysis",model="siebert/sentiment-roberta-large-english")
    
    print(sentiment_analysis("I love this!"))
    

    Coding

  • Checking Data Types of DataFrame Columns
  • g_df2.dtypes
    

    Coding

  • Converting Review Column to String Data Type
  • g_df2['Review']=g_df2['Review'].astype('str')
    
  • Applying Sentiment Analysis on Review Column
  • g_df2['result']=g_df2['Review'].apply(lambda x: sentiment_analysis(x))
    
    g_df2.head()
    

    Coding

  • Extracting Sentiment Label and Score from Result Column
  • g_df2['sentiment']=g_df2['result'].apply(lambda x: (x[0]['label']))
    g_df2['score']=g_df2['result'].apply(lambda x: (x[0]['score']))
    
    g_df2.head()
    

    Coding

  • Calculating the Mean Sentiment Score
  • g_df2['score'].mean()
    

    Coding

    g_df2['sentiment'].value_counts()
    

    Coding

  • Calculating Normalized Sentiment Distribution
  • g_df2['sentiment'].value_counts(normalize=True)
    

    Coding

  • Visualizing Sentiment Distribution with Plotly Express
  • import plotly.express as px
    fig=px.histogram(g_df2,x='sentiment',color='sentiment', text_auto=True)
    fig.show()
    

    Coding
    Coding

  • Exporting DataFrame to CSV File and Reading it Back
  • g_df2.to_csv("C:\\Users\\hp\\Desktop\\newfile.CSV")
    g_df2=pd.read_csv("C:\\Users\\hp\\Desktop\\newfile.csv")
    g_df2
    

    • Tools Used🛠️:Microsoft Excel
    • Deleted unwanted columns that is not required for this analysis
    • Checked and formatted the cells with proper datatypes
    • Coding
    • Missing Values in each column
    1. used filter function in excel to identify missing/null values
    2. Coding
    3. conditional foramtting to identify and highlight the missing values
    Coding
    • Removing the duplicates
    • Coding
    • Handling the missing values by using find & select inbuilt function in excel
    1. Replaced the blank space with NULL for the column that is TEXT Datatype
    2. Replaced the blank space with a default date value for the column that is DATE Datatype
    Coding Coding ------------------------------------------------------------------------------------------------
    • Tools Used⚙️:PostgreSQL
    • Creating and importing dataset to postgreSQL
    CREATE TABLE SF1 (
      ID VARCHAR(50),
      Username VARCHAR(50),
      Review VARCHAR(5000),
      AppRating INT,
      ThumbsUpCount INT,
      ReviewTime DATE,
      CompanyReply VARCHAR(5000),
      ReplyTime DATE,
      sentiment VARCHAR(10),
      score NUMERIC );
    
    Coding
  • Selecting and viewing the dataset
  • SELECT * FROM SF1
    Coding
  • Total Number of reviews
  • SELECT COUNT(*) AS total_reviews
    FROM sf1;
    
    Coding
  • Total number of positive reviews
  • SELECT COUNT(*)
    FROM SF1
    WHERE sentiment='POSITIVE';
    
    Coding
  • Total number of negative reviews
  • SELECT COUNT(*)
    FROM SF1
    WHERE sentiment='NEGATIVE';
    
    Coding
  • Average App Rating
  • SELECT AVG(AppRating) AS average_rating
    FROM sf1
    
    
    Coding
  • Review with highest thump count
  • SELECT Review, ThumbsUpCount
    FROM sf1
    ORDER BY ThumbsUpCount DESC
    LIMIT 1;
    
    
    Coding Coding
  • Users with high number of reviews
  • SELECT Username, COUNT(*) AS review_count
    FROM sf1
    GROUP BY Username
    ORDER BY review_count DESC
    LIMIT 5;
    
    
    Coding
  • Average Sentiment Score
  • SELECT AVG(score) AS average_sentiment_score
    FROM sf1;
    
    
    Coding
  • Count of User Reviews that are not replied by company
  • SELECT COUNT(*)AS not_replied
    FROM SF1
    WHERE companyreply='NULL';
    
    
    Coding Coding
  • Total number of reviews that are replied by company
  • SELECT id,review
    FROM SF1
    WHERE companyreply='NULL';
  • Reviews that are not replied by company
  • SELECT COUNT(*)AS replied
    FROM sf1
    WHERE id NOT IN
    (SELECT id
    FROM SF1
    WHERE companyreply='NULL');
    
    
    Coding
  • Reviews with sentiment negative to analyze the issues
  • SELECT review
    FROM sf1
    WHERE sentiment = 'NEGATIVE';
    
    
    Coding
  • Distribution of ratngs and it's total count
  • SELECT AppRating, COUNT(*) AS rating_count
    FROM sf1
    GROUP BY AppRating
    ORDER BY rating_count DESC;
    
    Coding
  • Reviews with the text like 'Challenges'
  • SELECT id,review
    FROM sf1
    WHERE Review LIKE '%challenges%';
    
    Coding
  • Total number of reviews per year
  • SELECT EXTRACT(YEAR FROM ReviewTime) AS review_year, COUNT(*) AS review_count
    FROM sf1
    GROUP BY review_year
    ORDER BY review_year;
    
    
    Coding
  • Reviews with the text like 'Personalized diet'
  • SELECT id,review
    FROM sf1
    WHERE Review LIKE '%personalized diet%';
    
    
    Coding
  • Latest 5 reviews and date
  • SELECT review,reviewtime
    FROM sf1
    ORDER BY ReviewTime DESC
    LIMIT 5;
    
    
    Coding Coding
------------------------------------------------------------------------------------------------------
  • Tools Used⚙️:Microsoft Power BI
Coding

Based on the analysis conducted in the project, the following findings and suggestions can be derived:

  1. Understanding User Issues: By analyzing the negative reviews, we gained insights into user concerns and issues. This enables us to address them more effectively, leading to improved user satisfaction.
  2. Identifying Review Trends: The project helped in identifying trends in user reviews, such as common topics mentioned in negative feedback or frequently praised aspects in positive reviews. This information can guide product development and marketing strategies.
  3. Enhancing Customer Trust: Responding promptly to customer reviews demonstrates attentiveness and care towards users' experiences. It fosters trust and loyalty among customers, showing them that their feedback is valued and acted upon.
  4. Improving Customer Engagement: Engaging with customers through timely replies to reviews creates a positive interaction and strengthens the relationship between the company and its users. It encourages ongoing dialogue and fosters a sense of community around the product.
  5. Leveraging Feedback for Improvement: Analyzing user feedback provides valuable insights for product improvement and feature development. By addressing user concerns and implementing suggestions, the company can enhance the overall user experience and stay competitive in the market.
  6. Understanding Pain Points: Negative reviews often highlight areas where users are facing challenges or experiencing dissatisfaction with the product or service. By analyzing these reviews, we can pinpoint specific pain points and areas for improvement.
  7. Proactive Issue Resolution: Identifying and addressing negative feedback promptly demonstrates a proactive approach to problem-solving. By acknowledging user concerns and taking steps to resolve issues, the company can prevent further escalation and mitigate potential damage to its reputation.
  8. Opportunity for Improvement: Negative reviews present valuable opportunities for improvement. By listening to user feedback and incorporating suggestions for enhancement, the company can iteratively improve its products or services, ultimately leading to a better user experience and increased customer satisfaction.
  9. Building Trust and Credibility: Transparently addressing negative feedback shows users that their concerns are taken seriously and that the company is committed to delivering a high-quality product or service. This builds trust and credibility with users, fostering stronger relationships and brand loyalty over time.
  10. Continuous Feedback Loop: By monitoring and analyzing negative reviews on an ongoing basis, the company can establish a continuous feedback loop for product improvement. This allows for agile and responsive development, ensuring that user needs and preferences are always top of mind.

About

"Utilized Python with Pandas, NumPy, and TensorFlow for data scraping and sentiment analysis in Microsoft Azure Data Studio. Employed MS Excel for data cleaning and exploration, with analysis done in PostgreSQL. Utilized Microsoft Power BI for visualization, deriving actionable insights."

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published