# **Project Name**    - Play Store App Review Analysis



# **Project Summary -**

The goal of the Play Store App Review Analysis project is to analyze user reviews from the Google Play Store to gain insights into user sentiment. Analyzing ratings shows whether users like an app, and exploratory analysis of app attributes such as category, price, genre, and number of installs gives a fuller picture of these apps. This analysis aims to help app developers, marketers, and stakeholders enhance app performance, improve user satisfaction, and elevate overall app quality.

**There will be two datasets for this project**

1. **Playstore data**


* App: The name of the app.

* Category: The category under which the app is listed in the Play Store (e.g.,
  Games, Productivity).

* Rating: The average user rating of the app (on a scale from 1 to 5).

* Reviews: The number of user reviews that the app has received.

* Size: The size of the app, usually measured in megabytes (MB).

* Installs: The number of times the app has been installed.

* Type: Whether the app is free or paid.

* Price: The cost of the app if it is a paid app. For free apps, this value is
  typically $0.

* Content Rating: The age group for which the app is deemed appropriate (e.g.,
  Everyone, Teen, Mature 17+).

* Genres: The genre(s) of the app (e.g., Action, Puzzle, Finance).

* Last Updated: The date when the app was last updated.

* Current Ver: The current version of the app.

* Android Ver: The minimum version of the Android operating system required to
  run the app.




2. **User Reviews**

* App: The name of the app to which the review belongs.

* Translated_Review: The text of the user review

* Sentiment: The overall sentiment of the review, categorized as positive,
  negative, or neutral.

* Sentiment_Polarity: A numerical score that indicates the positivity or
  negativity of the review, typically ranging from -1 (most negative) to 1 (most positive).

* Sentiment_Subjectivity: A numerical score that indicates the subjectivity of
  the review, typically ranging from 0 (most objective) to 1 (most subjective). Subjectivity refers to how much of the review is based on personal opinion, emotion, or bias as opposed to factual information.
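
To make the link between the polarity score and the sentiment label concrete, here is a minimal sketch that maps a score to a label. The function name `label_from_polarity` and the zero threshold are assumptions for illustration; the labels in the dataset may have been produced with a different rule.

```python
import math


def label_from_polarity(polarity: float) -> str:
    """Map a polarity score in [-1, 1] to a sentiment label (hypothetical rule)."""
    if math.isnan(polarity):
        return "Unknown"  # assumption: a missing score gets no sentiment label
    if polarity > 0:
        return "Positive"
    if polarity < 0:
        return "Negative"
    return "Neutral"


for score in (0.8, -0.35, 0.0):
    print(score, "->", label_from_polarity(score))
```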

# **Steps of Data Analysis -**

1. **Data Exploration**
Data exploration involves understanding the dataset's structure, the types of data it contains, and identifying any initial patterns or anomalies.

**Steps:**

* Load the Data: Import the dataset using a data manipulation library like Pandas.
* Understand the Structure: Use functions like head(), info(), and describe() to get a sense of the data.
* Initial Visualization: Plot initial graphs to understand the distribution of data, such as histograms for numeric columns and bar charts for categorical columns. <br><br><br>
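
The exploration steps can be sketched on a tiny stand-in table. The CSV text below is made up for illustration and only mimics a few of the Play Store columns; the real notebook loads the full files instead.

```python
import io

import pandas as pd

# Tiny stand-in for the Play Store CSV (values are invented for illustration).
csv_text = """App,Category,Rating,Reviews,Type
Alpha,GAME,4.5,120,Free
Beta,TOOLS,3.9,45,Paid
Gamma,GAME,,10,Free
"""

sample = pd.read_csv(io.StringIO(csv_text))

print(sample.head())      # first rows of the table
sample.info()             # column dtypes and non-null counts
print(sample.describe())  # summary statistics for numeric columns

# Frequency table that a bar chart of a categorical column would display.
category_counts = sample["Category"].value_counts()
print(category_counts)
```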






2. **Data Cleaning**
Data cleaning involves removing or fixing any errors or inconsistencies in the dataset to ensure the analysis is accurate and reliable.

**Steps:**

* Remove Duplicates: Ensure no duplicate entries are present.
* Handle Missing Values: Address missing data through removal, imputation, or other methods.<br><br><br>
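
As a rough illustration of these two cleaning steps, the sketch below removes duplicate rows and then shows both removal and median imputation for a missing rating. The toy table and the choice of the median are assumptions, not the project's prescribed method.

```python
import pandas as pd

raw = pd.DataFrame({
    "App": ["Alpha", "Alpha", "Beta", "Gamma"],
    "Rating": [4.5, 4.5, None, 3.0],
})

# 1. Remove exact duplicate rows (drop_duplicates returns a new DataFrame).
deduped = raw.drop_duplicates()

# 2a. Removal: keep only the rows that have a rating.
removed = deduped.dropna(subset=["Rating"])

# 2b. Imputation: fill missing ratings with the median of the known ratings.
median_rating = deduped["Rating"].median()
imputed = deduped.assign(Rating=deduped["Rating"].fillna(median_rating))

print(len(raw), len(deduped), len(removed))
print(imputed)
```

Removal is the simpler option, while imputation keeps every app in the table at the cost of introducing estimated values.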







3. **Analyzing the Data**
Data analysis involves deriving meaningful insights from the cleaned data, for example through sentiment analysis and trend identification.

**Steps:**

* Sentiment Analysis: Determine the sentiment of each review.
* Topic Modeling: Identify common themes in the reviews.
* Trend Analysis: Analyze trends over time.<br><br><br>





4. **Data Visualization**
Data visualization involves creating graphical representations of the data to make the insights easier to understand and communicate.

**Steps:**

* Sentiment Distribution: Visualize the distribution of sentiments.
* Word Cloud: Create word clouds for positive and negative reviews.
* Trend Analysis: Plot trends over time.
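
A word cloud is driven by word frequencies, so one way to prepare for it is to count words per sentiment. The sketch below uses only the standard library; the sample reviews and the helper name `word_counts` are hypothetical, and a real run would read the `Translated_Review` and `Sentiment` columns instead.

```python
import re
from collections import Counter

# Made-up (review text, sentiment label) pairs for illustration.
reviews = [
    ("Great game, great graphics", "Positive"),
    ("Love this game", "Positive"),
    ("Too many ads, game crashes", "Negative"),
]


def word_counts(rows, sentiment):
    """Count lowercase words in the reviews that carry the given sentiment label."""
    counts = Counter()
    for text, label in rows:
        if label == sentiment:
            counts.update(re.findall(r"[a-z']+", text.lower()))
    return counts


positive_words = word_counts(reviews, "Positive")
negative_words = word_counts(reviews, "Negative")

print(positive_words.most_common(3))
print(negative_words.most_common(3))
```

The resulting counters could then be passed to a word cloud library of your choice; common stop words would usually be removed first.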


# **GitHub Link -**

Provide your GitHub Link here.

# **Problem Statement**


**Write Problem Statement Here.**

#### **Define Your Business Objective?**

The main business objective of analyzing Play Store data is to extract actionable insights. By understanding user feedback, trends, and behaviors, app developers, marketers, and business stakeholders can enhance user satisfaction, increase app engagement, and ultimately drive revenue growth. The key business objectives of Play Store data analysis are:<br><br><br>

Sentiment Analysis: Assess user sentiment to understand their emotional responses to the app and make necessary adjustments to improve satisfaction.<br><br>

Optimal Pricing Strategies: Analyze user feedback on pricing to find the ideal price point for paid apps or in-app purchases.<br><br>

Competitor Benchmarking: Compare your app’s performance and user feedback with competitors to identify unique selling points and areas for improvement.

# **General Guidelines** : -  

1.   Well-structured, formatted, and commented code is required.
2.   Exception Handling, Production Grade Code & Deployment Ready Code will be a plus. Those students will be awarded some additional credits.
     
     The additional credits will have advantages over other students during Star Student selection.
       
             [ Note: - Deployment Ready Code is defined as, the whole .ipynb notebook should be executable in one go
                       without a single error logged. ]

3.   Each and every logic should have proper comments.
4. You may add as many charts as you want. Make sure that each chart answers the following format.
        

```
# Chart visualization code
```
            

*   Why did you pick the specific chart?
*   What is/are the insight(s) found from the chart?
* Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

5. You have to create at least 20 logical & meaningful charts having important insights.


[ Hints : - Do the Visualization in a structured way while following "UBM" Rule.

U - Univariate Analysis,

B - Bivariate Analysis (Numerical - Categorical, Numerical - Numerical, Categorical - Categorical)

M - Multivariate Analysis
 ]
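
The three levels of the UBM rule can be illustrated with plain summary tables before any chart is drawn. This is a minimal sketch on an invented table; the column names follow the Play Store dataset, but the values are made up.

```python
import pandas as pd

apps = pd.DataFrame({
    "Category": ["GAME", "GAME", "TOOLS", "TOOLS"],
    "Type": ["Free", "Paid", "Free", "Free"],
    "Rating": [4.4, 4.0, 3.8, 4.2],
})

# U: one variable at a time (how many apps per category).
univariate = apps["Category"].value_counts()

# B: a numerical variable against a categorical one (mean rating per category).
bivariate = apps.groupby("Category")["Rating"].mean()

# M: one numerical variable broken down by two categorical variables.
multivariate = apps.pivot_table(
    index="Category", columns="Type", values="Rating", aggfunc="mean"
)

print(univariate, bivariate, multivariate, sep="\n\n")
```

Each table maps naturally to a chart type: a bar chart for the first, a bar or box plot for the second, and a heatmap or grouped bar chart for the third.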





# ***Let's Begin !***

## ***1. Know Your Data***

### Import Libraries

In [None]:
# Import Libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

### Dataset Loading

In [None]:
# Load Dataset

data1 = pd.read_csv('/content/Play Store Data.csv')
data2 = pd.read_csv('/content/User Reviews.csv')

### Dataset First View

In [None]:
# Playstore dataset First Look
data1.head()


In [None]:
# User Review dataset first look

data2.head()

### Dataset Rows & Columns count

In [None]:
# Dataset Rows & Columns count

print('Playstore dataset - ',data1.shape)
print('\n')
print('User review dataset - ',data2.shape)

### Dataset Information

In [None]:
# Playstore Dataset Info
data1.info()


In [None]:
# User review Dataset Info
data2.info()

#### Duplicate Values

In [None]:
# Dataset Duplicate Value Count
data1.duplicated().sum()

In [None]:
data2.duplicated().sum()

In [None]:
data1.drop_duplicates(inplace=True)

#### Missing Values/Null Values

In [None]:
# Missing Values/Null Values Count

data1.isnull().sum()

In [None]:

data2.isnull().sum()


In [None]:
# Visualizing the missing values

plt.bar(data1.isnull().sum().index,data1.isnull().sum().values)
plt.xticks(rotation = 90)
plt.title('Playstore dataset missing values')
plt.show()

In [None]:
plt.bar(data2.isnull().sum().index,data2.isnull().sum().values)
plt.xticks(rotation = 90)
plt.title('User review dataset missing values')
plt.show()

In [None]:
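# dropna() returns a new DataFrame and leaves data1/data2 unchanged;
# assign the result (e.g. data1 = data1.dropna()) to actually drop the rows
data1.dropna()
data2.dropna()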

### What did you know about your dataset?

The Play Store dataset contains null values. The user reviews dataset also shows duplicate rows, but we do not treat them as duplicates because different users can leave the same review for an app.
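
When handling the null values, it helps to remember that `dropna()` returns a new DataFrame rather than changing the original. The sketch below demonstrates this on a made-up reviews table; restricting the drop to `Translated_Review` through `subset` is an illustrative choice, not a requirement of the project.

```python
import pandas as pd

reviews = pd.DataFrame({
    "App": ["Alpha", "Alpha", "Beta"],
    "Translated_Review": ["Nice", None, "Bad"],
    "Sentiment": ["Positive", None, "Negative"],
})

# Calling dropna() without assigning the result leaves the original untouched.
reviews.dropna()
rows_before = len(reviews)

# Assigning the result applies the change; subset limits the check to one column.
reviews_clean = reviews.dropna(subset=["Translated_Review"])
rows_after = len(reviews_clean)

print(rows_before, rows_after)
```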

## ***2. Understanding Your Variables***

In [None]:
# Dataset Columns
print(data1.columns)

print(data2.columns)

In [None]:
# Dataset Describe
print(data1.describe())
print('\n')
print(data2.describe())

### Variables Description

In [None]:
data1['Type'].unique()

App: The name of the app.

Category: The category under which the app is listed in the Play Store (e.g., Games, Productivity).

Rating: The average user rating of the app (on a scale from 1 to 5).

Reviews: The number of user reviews that the app has received.

Size: The size of the app, usually measured in megabytes (MB).

Installs: The number of times the app has been installed.

Type: Whether the app is free or paid.

Price: The cost of the app if it is a paid app. For free apps, this value is typically $0.

Content Rating: The age group for which the app is deemed appropriate (e.g., Everyone, Teen, Mature 17+).

Genres: The genre(s) of the app (e.g., Action, Puzzle, Finance).

Last Updated: The date when the app was last updated.

Current Ver: The current version of the app.

Android Ver: The minimum version of the Android operating system required to run the app.
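
Columns such as Installs and Price are often stored as text, which blocks numeric analysis. The sketch below assumes formats like `10,000+` for installs and `$4.99` for price; these formats are an assumption about the raw file and should be checked against the unique values before applying the conversion.

```python
import pandas as pd

# Invented sample values in the assumed text formats.
apps = pd.DataFrame({
    "Installs": ["10,000+", "500+", "1,000,000+"],
    "Price": ["0", "$4.99", "$0.99"],
})

# Strip the "+" and "," characters, then convert to integers.
apps["Installs_num"] = (
    apps["Installs"]
    .str.replace("+", "", regex=False)
    .str.replace(",", "", regex=False)
    .astype(int)
)

# Strip the "$" sign, then convert to floats.
apps["Price_num"] = apps["Price"].str.replace("$", "", regex=False).astype(float)

print(apps)
```

If the real columns contain malformed entries, `pd.to_numeric(..., errors="coerce")` is a more forgiving alternative to `astype`, since it turns unparseable values into NaN.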

### Check Unique Values for each variable.

In [None]:
# Check Unique Values for each variable.
for i in data1.columns:
  print(i,' unique values')
  print(data1[i].unique())
  print('\n\n')

In [None]:
for i in data2.columns:
  print(i,' unique values')
  print(data2[i].unique())
  print('\n\n')

## 3. ***Data Wrangling***

### Data Wrangling Code

In [None]:
# Write your code to make your dataset analysis ready.
data1.Category.value_counts()

In [None]:
data2.App.value_counts()

In [None]:
# Count of Positive Sentiments
data3 = data2[data2['Sentiment']=='Positive']

data3.App.value_counts()

In [None]:
# Count of Negative Sentiments
data4 = data2[data2['Sentiment']=='Negative']


data4.App.value_counts()

In [None]:
# Find the mean rating category wise
data1.groupby('Category').agg({'Rating':'mean'})


### What all manipulations have you done and insights you found?

The data manipulation covers counting the positive and negative sentiments for each app and computing the mean rating for each category.
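
The per-app sentiment counts can also be produced in a single table. The following is a minimal sketch using `pd.crosstab` on invented data; it is an alternative to filtering the reviews once per sentiment, not the approach used in the cells above.

```python
import pandas as pd

reviews = pd.DataFrame({
    "App": ["Alpha", "Alpha", "Alpha", "Beta", "Beta"],
    "Sentiment": ["Positive", "Positive", "Negative", "Negative", "Neutral"],
})

# Rows are apps, columns are sentiment labels, cells are review counts.
sentiment_counts = pd.crosstab(reviews["App"], reviews["Sentiment"])

# Share of positive reviews per app.
positive_share = sentiment_counts["Positive"] / sentiment_counts.sum(axis=1)

print(sentiment_counts)
print(positive_share)
```

A share is often easier to compare across apps than a raw count, because apps differ widely in how many reviews they receive.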

## ***4. Data Visualization, Storytelling & Experimenting with charts : Understand the relationships between variables***

#### Chart - 1

In [None]:
# Chart - 1 visualization code
data2['App'].value_counts()[0:5].plot.bar()
plt.title('Top 5 apps in terms of most reviews by users')
plt.show()




##### 1. Why did you pick the specific chart?

A bar chart is used to compare the top 5 apps with the most reviews.

##### 2. What is/are the insight(s) found from the chart?

As the chart shows, Angry bird Classic, 8 Ball pool bone master, and Blockpuzzle have the most reviews.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Here

#### Chart - 2

In [None]:
# Chart - 2 visualization code
sns.boxplot(data=data2,x='Sentiment_Polarity')
plt.show()



##### 1. Why did you pick the specific chart?

A box plot shows the range of the values in a column and also highlights the outliers in the data.

##### 2. What is/are the insight(s) found from the chart?

It shows that most of the sentiment polarity values lie between 0 and 0.4.
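
The box and the outlier points in a box plot come from quartiles. As a sketch, the numbers behind the plot can be computed directly; the polarity values below are invented, and the 1.5 × IQR rule is the default whisker setting in seaborn and matplotlib.

```python
import pandas as pd

# Invented polarity scores for illustration.
polarity = pd.Series([-0.6, 0.0, 0.1, 0.2, 0.3, 0.4, 1.0])

q1 = polarity.quantile(0.25)  # lower edge of the box
q3 = polarity.quantile(0.75)  # upper edge of the box
iqr = q3 - q1

# Points further than 1.5 * IQR from the box are drawn as outliers.
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outliers = polarity[(polarity < lower) | (polarity > upper)]

print(q1, q3, outliers.tolist())
```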

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Here

#### Chart - 3

In [None]:
# Chart - 3 visualization code
sns.histplot(data=data2,x='Sentiment_Polarity',kde=True)
plt.show()


##### 1. Why did you pick the specific chart?

Histogram is used to see the frequency distribution of data

##### 2. What is/are the insight(s) found from the chart?

As the chart shows, the data forms a roughly bell-shaped curve, which suggests the values cluster around a central value rather than being spread evenly.
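
The shape read off a histogram can be cross-checked with a couple of numbers. This sketch uses invented polarity values; `skew()` is close to 0 for a symmetric distribution, and the share of exact zeros shows whether neutral scores form a spike.

```python
import pandas as pd

# Invented polarity scores for illustration.
polarity = pd.Series([-0.5, 0.0, 0.0, 0.1, 0.2, 0.3, 0.5, 1.0])

skewness = polarity.skew()           # > 0 means a longer tail on the positive side
share_zero = (polarity == 0).mean()  # fraction of reviews scored exactly 0

print(round(skewness, 3), share_zero)
```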

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Here

#### Chart - 4

In [None]:
# Chart - 4 visualization code

data2.groupby('App').agg({'Sentiment_Polarity':'mean'}).sort_values(by='Sentiment_Polarity',ascending=False)[0:5].plot.bar()
plt.title('Top 5 apps with high sentiment polarity')
plt.show()



##### 1. Why did you pick the specific chart?

A bar chart is used to compare the top 5 apps with the highest mean sentiment polarity.

##### 2. What is/are the insight(s) found from the chart?

The chart gives the names of the apps with the highest polarity.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Here

#### Chart - 5

In [None]:
# Chart - 5 visualization code
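# Drop apps without any polarity score first; otherwise their NaN means sort to the end and fill the bottom 5
data2.groupby('App').agg({'Sentiment_Polarity':'mean'}).dropna().sort_values(by='Sentiment_Polarity',ascending=False)[-5:].plot.bar()
plt.title('Apps with the least sentiment polarity')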
plt.show()

##### 1. Why did you pick the specific chart?

A bar chart is used to compare the 5 apps with the lowest mean sentiment polarity.

##### 2. What is/are the insight(s) found from the chart?

The chart gives the names of the apps with the lowest polarity.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Here

#### Chart - 6

In [None]:
# Chart - 6 visualization code

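# Keep only the two valid labels so malformed or missing 'Type' values are not counted
data1[data1['Type'].isin(['Free','Paid'])]['Type'].value_counts().plot.bar(color='green')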
plt.title('Free and paid apps count on Playstore')
plt.show()

##### 1. Why did you pick the specific chart?

A bar plot is used to compare the number of free and paid apps.

##### 2. What is/are the insight(s) found from the chart?

There are more free apps than paid apps.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Here

#### Chart - 7

In [None]:
# Chart - 7 visualization code
data1['Category'].value_counts().sort_values(ascending=False)[0:5].plot.barh(color='yellow')
plt.title('Top 5 categories of apps')
plt.show()

##### 1. Why did you pick the specific chart?

A horizontal bar chart is used to compare the number of apps in each category.

##### 2. What is/are the insight(s) found from the chart?

The Family category has the most apps.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Here

#### Chart - 8

In [None]:
# Chart - 8 visualization code

data1['Genres'].value_counts()[0:5].plot.barh(color='cyan')
plt.title('Top 5 Genres')
plt.show()


##### 1. Why did you pick the specific chart?

A bar chart is used to compare the number of apps in each genre.

##### 2. What is/are the insight(s) found from the chart?

The Tools genre has the most apps.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Here

#### Chart - 9

In [None]:
# Chart - 9 visualization code
d2=data1['Genres'].value_counts()[-5:].plot.bar(color='cyan')
plt.title('Genre with least apps')
plt.show()

##### 1. Why did you pick the specific chart?

A bar chart is used to show the genres with the fewest apps.

##### 2. What is/are the insight(s) found from the chart?

The Arcade Pretend Play genre has the fewest apps.


##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Here

#### Chart - 10

In [None]:
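# Chart - 10 visualization code
# value_counts() sorts in descending order; [-5:-1] takes four of the smallest categories
# and skips the very last entry (presumably a malformed category value)
d2=data1['Category'].value_counts()[-5:-1].plot.bar(color='cyan')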
plt.title('Category with least apps')
plt.show()

##### 1. Why did you pick the specific chart?

A bar chart is used to show the categories with the fewest apps.

##### 2. What is/are the insight(s) found from the chart?

The Beauty, Comics, and Parenting categories have the fewest apps.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Here

#### Chart - 11

In [None]:
# Chart - 11 visualization code
data1['Content Rating'].value_counts().plot.bar()
plt.title('Count of apps by content rating')
plt.show()

##### 1. Why did you pick the specific chart?

A bar chart is used to compare the number of apps for each content rating.

##### 2. What is/are the insight(s) found from the chart?

The Everyone content rating has the most apps.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Here

#### Chart - 12

In [None]:
# Chart - 12 visualization code
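# Ratings are on a 1 to 5 scale, so keep only valid ratings;
# sort by mean rating (not by category name) so the first 5 rows are the highest-rated categories
data1[data1['Rating'].between(1,5)].groupby('Category').agg({'Rating':'mean'}).sort_values(by='Rating',ascending=False)[0:5].plot.bar()
plt.title('Top 5 categories with highest rating')
plt.show()


##### 1. Why did you pick the specific chart?

A bar chart is used to compare the highest-rated categories.

##### 2. What is/are the insight(s) found from the chart?

The chart shows the five categories with the highest mean rating. Weather, Video Players, and Sports lead only when the categories are ordered by name, so they should be confirmed against the rating-sorted bars before being reported as top rated.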

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Here

#### Chart - 13

In [None]:
# Chart - 13 visualization code
sns.boxplot(data=data1,x='Rating')
plt.show()


##### 1. Why did you pick the specific chart?

A box plot is used to see the range of values and to check for outliers.

##### 2. What is/are the insight(s) found from the chart?

Rating values mostly lie between 3.5 and 4, and the column also contains outliers.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Here

#### Chart - 14

In [None]:

plt.figure(figsize=(8,3))
sns.violinplot(data=data2 , x ='Sentiment_Subjectivity')
plt.show()

##### 1. Why did you pick the specific chart?

A violin plot, like a box plot, shows the range of values in the data, and it additionally shows their density.

##### 2. What is/are the insight(s) found from the chart?

As the chart shows, most Sentiment_Subjectivity values lie between 0.3 and 0.7.

#### Chart - 15 - Pair Plot

In [None]:
# Pair Plot visualization code

sns.pairplot(data2)

##### 1. Why did you pick the specific chart?

A pair plot is used to plot pair-wise relationships between all the variables of a dataset.

##### 2. What is/are the insight(s) found from the chart?

As the chart shows, sentiment polarity and subjectivity each form a roughly bell-shaped curve, which suggests the values cluster around a central value rather than being spread evenly.

## **5. Solution to Business Objective**

#### What do you suggest the client to achieve Business Objective ?
Explain Briefly.

1. The fewest apps fall under the Beauty, Comics, Parenting, and Events categories, which suggests there is not much competition in these categories. This is an opening to create more quality apps there.

2. In the same way, the fewest apps fall under the Arcade, Card: Brain Games, Lifestyle, and Strategy genres. We can take this as an opportunity to build more apps in these genres and gain more customer attention.

3. Bliter, CareonDemand, and Blockcraft are the lowest-rated apps. We can review customer comments and improve these apps on that basis.

4. Most of the apps on the Play Store are free, which suggests that people prefer free apps over paid ones. The average price of an app is $13.
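
An average price depends heavily on whether free apps are included. The sketch below, on invented data, computes the share of free and paid apps and contrasts the mean price over all apps with the mean over paid apps only; it assumes the Price column has already been converted to numbers.

```python
import pandas as pd

# Invented data; Price is assumed to be numeric already.
apps = pd.DataFrame({
    "Type": ["Free", "Free", "Free", "Paid", "Paid"],
    "Price": [0.0, 0.0, 0.0, 2.0, 8.0],
})

type_share = apps["Type"].value_counts(normalize=True)  # fraction of free vs paid apps
mean_price_all = apps["Price"].mean()                    # free apps pull this down
mean_price_paid = apps.loc[apps["Type"] == "Paid", "Price"].mean()

print(type_share)
print(mean_price_all, mean_price_paid)
```

Stating which of the two averages is being reported avoids ambiguity, and a median may be more robust if a few apps carry very high prices.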

# **Conclusion**


Popular Categories: Family, Medical, Business and Game apps are the most popular categories.

Average Ratings: The average rating across all apps is approximately 4.1 stars.

Review Insights: Apps with frequent updates and responsive developers tend to have higher ratings and positive reviews.

Free vs. Paid: The majority of apps are free, supported by ads or in-app purchases.

Price Distribution: Paid apps typically range from $0.99 to $4.99, with premium apps and services going higher.

The Play Store ecosystem is dynamic and highly competitive, with continuous growth in diverse categories. Success in this market relies on understanding user needs, providing regular updates, maintaining high quality, and effectively monetizing through ads, in-app purchases, or subscriptions. Keeping an eye on emerging trends and adapting to user feedback are crucial for long-term success.

### ***Hurrah! You have successfully completed your EDA Capstone Project !!!***