- Introduction
- Data Scraping and Sentiment Analysis
- Data Cleaning and Exploration
- Data Analysis
- Data Visualization
- Findings
Mobile applications rely heavily on user feedback to refine their offerings and keep customers satisfied. The Sugar.fit Android app, which addresses diabetes-related concerns, stands to benefit from an in-depth analysis of its user reviews on the Google Play Store. This project scrapes those reviews, scores their sentiment, and analyzes the results to extract insights that can inform strategic decisions and improve app functionality and user experience.
Feel free to reach out for any questions or suggestions about this project. I'm open to discussions and eager to assist.
LinkedIn | Mariya Joseph
Don't forget to follow and star ⭐ the repository if you find it valuable.
- Tools Used🛠️:
- Database: PostgreSQL
- Programming Language: Python
- Libraries: Pandas, NumPy, TensorFlow
- IDE: Microsoft Azure Data Studio
- Tools Used🛠️:
- Programming Language: Python
- Libraries: Pandas, NumPy, TensorFlow
- IDE: Microsoft Azure Data Studio
- Import Required Libraries
- Fetching Google Play Store Reviews for the Sugar.fit Android App
- Transforming and Cleaning Google Play Store Reviews Data for Analysis
- Calculating the Mean App Rating of Sugar.fit Android App Reviews
- Installing TensorFlow
- Importing and Configuring Sentiment Analysis Pipeline
- Checking Data Types of DataFrame Columns
- Converting Review Column to String Data Type
- Applying Sentiment Analysis on Review Column
- Extracting Sentiment Label and Score from Result Column
- Calculating the Mean Sentiment Score
- Calculating Normalized Sentiment Distribution
- Visualizing Sentiment Distribution with Plotly Express
- Exporting DataFrame to CSV File and Reading it Back
- Tools Used🛠️: Microsoft Excel
- Deleted columns that are not required for this analysis
- Checked and formatted the cells with the proper data types
- Missing values in each column
- Used the Filter function in Excel to identify missing/null values
- Used conditional formatting to identify and highlight the missing values
- Removed duplicates
- Handled the missing values using Excel's built-in Find & Select function
- Replaced blank cells with NULL in columns of the TEXT data type
- Replaced blank cells with a default date value in columns of the DATE data type
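The same cleaning steps can also be scripted. The sketch below is a rough Python equivalent of the Excel workflow above, not the method used in this project: it removes exact duplicate rows, writes the text marker NULL into blank text cells, and fills blank date cells with a default date. The sample rows, the `clean_rows` helper, and the `1900-01-01` placeholder are all hypothetical; the project does not state which default date was used.

```python
import csv
import io

# Hypothetical sample in the shape of the exported review file (not real data).
RAW_CSV = """ID,Username,Review,CompanyReply,ReplyTime
r1,Asha,Great coaching,Thank you!,2023-05-02
r1,Asha,Great coaching,Thank you!,2023-05-02
r2,Ravi,Sensor stopped syncing,,
"""

DEFAULT_DATE = "1900-01-01"  # assumed placeholder; the source does not name the date used
TEXT_COLUMNS = ("Review", "CompanyReply")
DATE_COLUMNS = ("ReplyTime",)


def clean_rows(rows):
    """Drop exact duplicate rows, then fill blank text and date cells."""
    seen = set()
    cleaned = []
    for row in rows:
        key = tuple(row.items())
        if key in seen:  # exact duplicate of an earlier row: skip it
            continue
        seen.add(key)
        fixed = dict(row)
        for col in TEXT_COLUMNS:
            if not fixed[col].strip():
                fixed[col] = "NULL"  # literal text marker, as in the Excel step
        for col in DATE_COLUMNS:
            if not fixed[col].strip():
                fixed[col] = DEFAULT_DATE
        cleaned.append(fixed)
    return cleaned


rows = list(csv.DictReader(io.StringIO(RAW_CSV)))
cleaned = clean_rows(rows)
print(f"{len(rows)} rows read, {len(cleaned)} rows kept")
```

Because blank replies become the text NULL rather than a true database null, any later query has to match them as a string.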
- Tools Used⚙️: PostgreSQL
- Creating and importing the dataset into PostgreSQL
- Selecting and viewing the dataset
- Total number of reviews
- Total number of positive reviews
- Total number of negative reviews
- Average app rating
- Review with the highest thumbs-up count
- Users with the highest number of reviews
- Average sentiment score
- Count of user reviews the company has not replied to
- Total number of reviews the company has replied to
- Reviews the company has not replied to
- Reviews with negative sentiment, to analyze the issues
- Distribution of ratings and their total counts
- Reviews with text like 'Challenges'
- Total number of reviews per year
- Reviews with text like 'Personalized diet'
- Latest 5 reviews and their dates
# Import required libraries
from google_play_scraper import Sort, reviews_all
import pandas as pd
import numpy as np

# Fetch all Google Play Store reviews for the Sugar.fit Android app
g_reviews = reviews_all(
    'fit.sugar.android',
    sleep_milliseconds=0,  # no delay between requests
    lang='en',
    country='us',
    sort=Sort.NEWEST,
)

# Expand the list of review dicts into one column per field
g_df = pd.DataFrame(np.array(g_reviews), columns=['review'])
g_df2 = g_df.join(pd.DataFrame(g_df.pop('review').tolist()))

# Drop unused columns and rename the rest for analysis
g_df2.drop(columns=['userImage', 'reviewCreatedVersion'], inplace=True)
g_df2.rename(columns={'reviewId':'ID','userName':'Username','content':'Review','score':'AppRating','thumbsUpCount':'ThumbsUpCount','at':'ReviewTime','replyContent':'CompanyReply','repliedAt':'ReplyTime','appVersion':'AppVersion'}, inplace=True)
g_df2

# Mean app rating
g_df2['AppRating'].mean()
# Install and verify TensorFlow (notebook shell commands)
!pip install tensorflow
!pip install ipykernel
import tensorflow as tf
print(tf.__version__)

# Load the sentiment analysis pipeline and run a quick sanity check
from transformers import pipeline
sentiment_analysis = pipeline("sentiment-analysis", model="siebert/sentiment-roberta-large-english")
print(sentiment_analysis("I love this!"))

# Make sure every review is a string before scoring it
g_df2.dtypes
g_df2['Review'] = g_df2['Review'].astype(str)

# Each result is a one-element list: [{'label': ..., 'score': ...}]
g_df2['result'] = g_df2['Review'].apply(sentiment_analysis)
g_df2.head()

# Split the label and score into their own columns
g_df2['sentiment'] = g_df2['result'].apply(lambda x: x[0]['label'])
g_df2['score'] = g_df2['result'].apply(lambda x: x[0]['score'])
g_df2.head()

# Mean score, then absolute and normalized sentiment counts
g_df2['score'].mean()
g_df2['sentiment'].value_counts()
g_df2['sentiment'].value_counts(normalize=True)
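# Plot the number of reviews per sentiment label
import plotly.express as px
fig = px.histogram(g_df2, x='sentiment', color='sentiment', text_auto=True)
fig.show()

# Export to CSV and read it back; index=False keeps the row index out of the file
g_df2.to_csv("C:\\Users\\hp\\Desktop\\newfile.csv", index=False)
g_df2 = pd.read_csv("C:\\Users\\hp\\Desktop\\newfile.csv")
g_df2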
CREATE TABLE SF1 (
    ID VARCHAR(50),
    Username VARCHAR(50),
    Review VARCHAR(5000),
    AppRating INT,
    ThumbsUpCount INT,
    ReviewTime DATE,
    CompanyReply VARCHAR(5000),
    ReplyTime DATE,
    sentiment VARCHAR(10),
    score NUMERIC
);
SELECT * FROM SF1;
SELECT COUNT(*) AS total_reviews
FROM sf1;
SELECT COUNT(*)
FROM SF1
WHERE sentiment='POSITIVE';
SELECT COUNT(*)
FROM SF1
WHERE sentiment='NEGATIVE';
SELECT AVG(AppRating) AS average_rating
FROM sf1;
SELECT Review, ThumbsUpCount
FROM sf1
ORDER BY ThumbsUpCount DESC
LIMIT 1;
SELECT Username, COUNT(*) AS review_count
FROM sf1
GROUP BY Username
ORDER BY review_count DESC
LIMIT 5;
SELECT AVG(score) AS average_sentiment_score
FROM sf1;
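-- Unanswered reviews carry the text marker 'NULL' written during the Excel
-- cleaning step, so they are matched as a string (assuming the marker was
-- imported as literal text rather than converted to a true SQL NULL).
SELECT COUNT(*) AS not_replied
FROM sf1
WHERE companyreply = 'NULL';
SELECT id, review
FROM sf1
WHERE companyreply = 'NULL';
SELECT COUNT(*) AS replied
FROM sf1
WHERE companyreply <> 'NULL';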
SELECT review
FROM sf1
WHERE sentiment = 'NEGATIVE';
SELECT AppRating, COUNT(*) AS rating_count
FROM sf1
GROUP BY AppRating
ORDER BY rating_count DESC;
-- ILIKE makes the match case-insensitive, so 'Challenges' is found as well
SELECT id, review
FROM sf1
WHERE Review ILIKE '%challenges%';
SELECT EXTRACT(YEAR FROM ReviewTime) AS review_year, COUNT(*) AS review_count
FROM sf1
GROUP BY review_year
ORDER BY review_year;
-- ILIKE makes the match case-insensitive, so 'Personalized diet' is found as well
SELECT id, review
FROM sf1
WHERE Review ILIKE '%personalized diet%';
SELECT review,reviewtime
FROM sf1
ORDER BY ReviewTime DESC
LIMIT 5;
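The reply queries above depend on how missing replies are stored. The sketch below illustrates the difference between the text marker 'NULL' and a true SQL NULL. It is only an illustration under stated assumptions: it uses Python's built-in `sqlite3` module as a stand-in for PostgreSQL (the comparison semantics shown here are the same in both), and the three sample rows and the `count` helper are hypothetical.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for the PostgreSQL table sf1.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sf1 (id TEXT, review TEXT, sentiment TEXT, companyreply TEXT)"
)
conn.executemany(
    "INSERT INTO sf1 VALUES (?, ?, ?, ?)",
    [
        ("r1", "Helpful coach", "POSITIVE", "Thank you!"),
        ("r2", "App keeps crashing", "NEGATIVE", "NULL"),  # text marker from cleaning
        ("r3", "Too expensive", "NEGATIVE", None),         # a true SQL NULL
    ],
)


def count(where):
    """Return the number of rows in sf1 matching a WHERE clause."""
    return conn.execute(f"SELECT COUNT(*) FROM sf1 WHERE {where}").fetchone()[0]


text_marker = count("companyreply = 'NULL'")   # matches only the string 'NULL'
true_null = count("companyreply IS NULL")      # matches only real nulls
not_replied = count("companyreply IS NULL OR companyreply = 'NULL'")
replied = count("companyreply IS NOT NULL AND companyreply <> 'NULL'")

print(text_marker, true_null, not_replied, replied)
```

If the imported table could contain both forms, checking for both, as in `not_replied` and `replied` here, avoids miscounting unanswered reviews.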
- Tools Used⚙️: Microsoft Power BI
Based on the analysis conducted in the project, the following findings and suggestions can be derived:
- Understanding User Issues: By analyzing the negative reviews, we gained insights into user concerns and issues. This enables us to address them more effectively, leading to improved user satisfaction.
- Identifying Review Trends: The project helped in identifying trends in user reviews, such as common topics mentioned in negative feedback or frequently praised aspects in positive reviews. This information can guide product development and marketing strategies.
- Enhancing Customer Trust: Responding promptly to customer reviews demonstrates attentiveness and care towards users' experiences. It fosters trust and loyalty among customers, showing them that their feedback is valued and acted upon.
- Improving Customer Engagement: Engaging with customers through timely replies to reviews creates a positive interaction and strengthens the relationship between the company and its users. It encourages ongoing dialogue and fosters a sense of community around the product.
- Leveraging Feedback for Improvement: Analyzing user feedback provides valuable insights for product improvement and feature development. By addressing user concerns and implementing suggestions, the company can enhance the overall user experience and stay competitive in the market.
- Understanding Pain Points: Negative reviews often highlight areas where users are facing challenges or experiencing dissatisfaction with the product or service. By analyzing these reviews, we can pinpoint specific pain points and areas for improvement.
- Proactive Issue Resolution: Identifying and addressing negative feedback promptly demonstrates a proactive approach to problem-solving. By acknowledging user concerns and taking steps to resolve issues, the company can prevent further escalation and mitigate potential damage to its reputation.
- Opportunity for Improvement: Negative reviews present valuable opportunities for improvement. By listening to user feedback and incorporating suggestions for enhancement, the company can iteratively improve its products or services, ultimately leading to a better user experience and increased customer satisfaction.
- Building Trust and Credibility: Transparently addressing negative feedback shows users that their concerns are taken seriously and that the company is committed to delivering a high-quality product or service. This builds trust and credibility with users, fostering stronger relationships and brand loyalty over time.
- Continuous Feedback Loop: By monitoring and analyzing negative reviews on an ongoing basis, the company can establish a continuous feedback loop for product improvement. This allows for agile and responsive development, ensuring that user needs and preferences are always top of mind.
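One simple way to surface common topics in negative feedback, as mentioned in the findings above, is to count how many negative reviews mention each word. The following is a minimal sketch under stated assumptions, not the analysis performed in this project: the sample reviews, the stopword list, and the `top_terms` helper are all hypothetical.

```python
import re
from collections import Counter

# Small illustrative stopword list (an assumption, not a standard list).
STOPWORDS = {"the", "a", "an", "is", "and", "to", "it", "of", "my", "i",
             "not", "for", "in", "after"}


def top_terms(reviews, n=3):
    """Return the n words that appear in the most reviews."""
    counts = Counter()
    for text in reviews:
        # Use a set so each word is counted at most once per review.
        words = set(re.findall(r"[a-z']+", text.lower()))
        counts.update(w for w in words if w not in STOPWORDS)
    return counts.most_common(n)


# Hypothetical negative reviews, for illustration only.
negative_reviews = [
    "The sensor is not syncing",
    "Sensor stopped syncing after the update",
    "Coach did not reply and the sensor failed",
]

print(top_terms(negative_reviews, 2))
```

In practice the input would be the Review column filtered to rows where sentiment is NEGATIVE, and a fuller stopword list would give cleaner results.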