<a href="https://colab.research.google.com/github/Umesh2851997/Playstore_project/blob/main/EDA_Play_Store_App_Review_Analysis.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Project Name**    -  Play Store App Review Analysis



##### **Project Type**    - EDA (Exploratory Data Analysis)
##### **Contribution**    - Team
##### **Team Member 1 -**  Umesh Makkar
##### **Team Member 2 -**  Soni Verma
##### **Team Member 3 -**  Tanmay

# **Project Details -**

The Play Store apps data has enormous potential to drive app-making businesses to success. Actionable insights can be drawn for developers to work on and capture the Android market. Each app (row) has values for category, rating, size, and more. Another dataset contains customer reviews of the Android apps. Explore and analyse the data to discover key factors responsible for app engagement and success.

# **GitHub Link -**

https://github.com/Umesh2851997

---



# **Problem Summary**


"Many businesses and developers face challenges in understanding user sentiments, feedback, and preferences from the vast amount of reviews posted on the Google Play Store for their mobile applications. Extracting meaningful insights from these reviews manually is time-consuming and often lacks accuracy. As a result, there is a need to develop an automated solution using natural language processing (NLP) and sentiment analysis techniques to efficiently analyze app reviews. The objective is to gain actionable insights that can help improve app quality, enhance user satisfaction, prioritize feature development, and make data-driven decisions to boost user acquisition and retention. The solution should provide an efficient, scalable, and accurate means of processing and categorizing large volumes of reviews to aid in brand reputation management, competitive analysis, and product roadmap planning."

#### **Define Your Business Objective?**

The business objective of conducting Play Store app review analysis is to gain valuable insights and feedback from users who have used or are currently using a particular app. By analyzing the reviews and feedback left by users, businesses can achieve several goals:

1. Improve App Quality: App reviews often contain feedback about bugs, crashes, and other issues. Analyzing these reviews can help developers identify problems and make improvements to enhance the overall app quality.

2. User Satisfaction: Understanding user sentiments and preferences expressed in reviews can help businesses gauge user satisfaction levels. Positive reviews can highlight features users love, while negative reviews can shed light on areas that need improvement.

3. Feature Prioritization: Analyzing reviews can help businesses identify popular features and functionalities that users appreciate the most. This data can be used to prioritize future development efforts.

4. Competitive Analysis: Comparing app reviews with those of competitors can offer valuable insights into strengths and weaknesses. This can help businesses identify opportunities to differentiate themselves and improve their market position.

5. User Acquisition and Retention: Positive reviews and high ratings can attract more users to download the app, improving user acquisition. Additionally, addressing negative reviews and resolving user issues can lead to higher user retention rates.

# ***Introduction to the Data***

Mobile applications are widely available. They are simple to make and may be profitable. These two reasons have led to an increase in the number of apps being created. In this notebook, we'll compare more than 10,000 Google Play apps from various categories to conduct a thorough analysis of the Android app industry. In order to develop strategies to promote growth and retention, we will search the data for insights.

Let's examine the data, which consists of the following two files:

datasets/play_store_data.csv

This file contains all the details of the apps on Google Play. There are 13 features that describe a given app.

*   **App**: Name of the app
*   **Category**: Category of the app. Some examples are: ART_AND_DESIGN, FINANCE, COMICS, BEAUTY etc.
*   **Rating**: The current average rating (out of 5) of the app on Google Play
*   **Reviews**: Number of user reviews given on the app
*   **Size**: Size of the app in MB (megabytes)
*   **Installs**: Number of times the app was downloaded from Google Play
*   **Type**: Whether the app is paid or free
*   **Price**: Price of the app in US$
*   **Content Rating**: Maturity rating of the app, indicating which age group it is suitable for
*   **Genres**: Genre(s) of the app, which can be more specific than its main category
*   **Last Updated**: Date on which the app was last updated on Google Play
*   **Current Ver**: Version of the app currently published on Google Play
*   **Android Ver**: Minimum Android version required to run the app

datasets/user_reviews.csv

This file contains a sample of 100 user reviews for each app. The text of each review has been pre-processed and run through a sentiment analyzer.

* **App**: Name of the app on which the user review was provided. Matches the App column of the play_store_data.csv file
* **Translated Review**: The pre-processed user review text.
* **Sentiment**: Sentiment category of the user review - Positive, Negative or Neutral.
* **Sentiment Polarity**: Sentiment score of the user review.
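Because both files share the `App` column, the reviews can be joined to the app details. The following is a minimal sketch on made-up rows, not the project's actual data; the column names follow the descriptions above, and the headers in the real CSV files may be spelled differently.

```python
import pandas as pd

# Tiny made-up stand-ins for the two files (values are illustrative only)
apps = pd.DataFrame({
    "App": ["Alpha", "Beta", "Gamma"],
    "Category": ["TOOLS", "GAME", "FINANCE"],
    "Rating": [4.2, 3.9, 4.6],
})
reviews = pd.DataFrame({
    "App": ["Alpha", "Alpha", "Beta", "Delta"],
    "Sentiment": ["Positive", "Negative", "Positive", "Neutral"],
    "Sentiment Polarity": [0.8, -0.4, 0.5, 0.0],
})

# An inner join keeps only reviews whose App also appears in the app table
merged = reviews.merge(apps, on="App", how="inner")

# Average sentiment polarity per app category
avg_polarity = merged.groupby("Category")["Sentiment Polarity"].mean()
print(avg_polarity)
```

An inner join silently drops reviews for apps that are missing from the app table ("Delta" here), so comparing row counts before and after the merge is a cheap sanity check.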



### Import Libraries

In [None]:
# Import Libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from datetime import datetime

### Dataset Loading

In [None]:
# Mount the Drive
from google.colab import drive
drive.mount('/content/drive')

In [None]:
# Read the data
playstore_data = "/content/drive/MyDrive/My_new_numpy/Play Store Data.csv"
user_data = "/content/drive/MyDrive/My_new_numpy/User Reviews.csv"

playstoredf = pd.read_csv(playstore_data)
userdf = pd.read_csv(user_data)



### Datasets First View

In [None]:
# Display the first 5 rows of playstoredf
playstoredf.head()

In [None]:
# Display the first 5 rows of userdf
userdf.head()

### Explore Both Datasets

In [None]:
# Play Store app data
print("Total number of rows and columns in the Play Store data:", playstoredf.shape)
print("Total number of unique apps in the Play Store data:", playstoredf['App'].nunique())

In [None]:
# User reviews data
print("Total number of rows and columns in the user review data:", userdf.shape)
print("Total number of unique apps in the user review data:", userdf['App'].nunique())

In [None]:
# Summary of the playstore Dataset
playstoredf.info()

In [None]:
#summary of user review data
userdf.info()

# Data Cleaning

---

## 1. Check the Null values in dataset

The first step is to find the Missing/Null Values in the Playstore dataset.

In [None]:
playstoredf.isna().sum()
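Raw null counts are easier to judge as a share of all rows. A minimal sketch on a toy frame (the values are invented for illustration):

```python
import numpy as np
import pandas as pd

# Toy frame with a few missing values (illustrative only)
toy = pd.DataFrame({
    "App": ["A", "B", "C", "D"],
    "Rating": [4.1, np.nan, 3.8, np.nan],
    "Type": ["Free", "Free", None, "Paid"],
})

# isna().mean() gives the fraction of missing values in each column
missing_share = toy.isna().mean().sort_values(ascending=False)
print((missing_share * 100).round(1))
```

Columns with a large share of nulls are usually candidates for dropping rows or for a careful imputation, while columns with only a handful of nulls can often be filled with a simple default.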

In [None]:
playstoredf.sort_values(by = 'Rating', ascending = False)

In [None]:
#Drop the null Values
playstoredf.dropna(subset =['Rating'], inplace = True)


In [None]:
# Fill the columns that have only a few null values (< .05) with the column's most common value (mode)
playstoredf['Current Ver'] = playstoredf['Current Ver'].fillna(playstoredf['Current Ver'].mode()[0])
playstoredf['Android Ver'] = playstoredf['Android Ver'].fillna(playstoredf['Android Ver'].mode()[0])
playstoredf['Content Rating'] = playstoredf['Content Rating'].fillna(playstoredf['Content Rating'].mode()[0])
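The same mode-based fill can be wrapped in a small helper. This is a sketch only: `fill_with_mode` is a hypothetical name, not part of pandas, and the toy values are invented.

```python
import numpy as np
import pandas as pd

def fill_with_mode(series):
    """Return a copy of `series` with missing values replaced by its most frequent value."""
    modes = series.mode(dropna=True)
    if modes.empty:  # every value is missing, so there is nothing to impute with
        return series.copy()
    return series.fillna(modes.iloc[0])

# Toy column with one missing entry (illustrative only)
toy = pd.DataFrame({"Android Ver": ["4.1 and up", np.nan, "4.1 and up", "5.0 and up"]})
toy["Android Ver"] = fill_with_mode(toy["Android Ver"])
print(toy)
```

The guard for an empty mode matters because `mode()[0]` raises an error when a column contains no non-null values at all.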

In [None]:
playstoredf.isna().any()

## 2. Check Duplicates

In [None]:
## Drop the duplicates in the "App" column
playstoredf.drop_duplicates(subset='App', inplace=True)
print(playstoredf.shape)
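By default `drop_duplicates` keeps the first occurrence of each app, whichever row that happens to be. One possible alternative, sketched below on invented rows, keeps the duplicate with the most reviews; this assumes that a higher review count marks the more recent snapshot of an app, which may not hold for every dataset.

```python
import pandas as pd

# Toy frame where "Alpha" appears twice (illustrative only)
toy = pd.DataFrame({
    "App": ["Alpha", "Alpha", "Beta"],
    "Reviews": [120, 450, 30],
})

# Sort so the row with the most reviews comes first, then keep that row per app
deduped = (
    toy.sort_values("Reviews", ascending=False)
       .drop_duplicates(subset="App", keep="first")
       .sort_index()
)
print(deduped)
```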

## 2. Check the Outliers

In [None]:
playstoredf['Rating'].unique()

In [None]:
playstoredf[playstoredf['Rating']==19.0]

In [None]:
## Drop the row that has incorrect values
playstoredf.drop([10472], inplace = True)
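Dropping by a hard-coded index label works for this particular file, but a range check does not depend on where the bad row sits. A minimal sketch on invented rows, assuming valid ratings lie between 1 and 5:

```python
import pandas as pd

# Toy frame with one impossible rating (illustrative only)
toy = pd.DataFrame(
    {"App": ["A", "B", "C"], "Rating": [4.5, 19.0, 1.0]},
    index=[7, 10472, 12],
)

# between() is inclusive on both ends by default
valid = toy["Rating"].between(1, 5)
cleaned = toy[valid]
print(cleaned)
```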


## 3. Updating the Datatype for following columns for EDA

Installs

In [None]:
playstoredf['Installs'].unique()

In [None]:
# Remove the ',' and '+' characters from Installs so the values can be converted to numbers
playstoredf['Installs'] = playstoredf['Installs'].apply(lambda x: x.replace('+', '') if '+' in str(x) else x)  # strip the trailing '+'
playstoredf['Installs'] = playstoredf['Installs'].apply(lambda x: x.replace(',', '') if ',' in str(x) else x)  # strip the thousands separators
playstoredf['Installs'] = playstoredf['Installs'].apply(lambda x: int(x))  # convert the cleaned strings to int64

In [None]:
playstoredf['Installs'].dtype
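The same cleanup can also be written with pandas' vectorized string methods. A sketch on a few invented values in the raw `Installs` format:

```python
import pandas as pd

# A few raw values in the same format as the Installs column (illustrative only)
raw = pd.Series(["10,000+", "500,000+", "0", "1,000,000+"], name="Installs")

# Remove '+' and ',' in a single regex pass, then convert to integers
installs = (
    raw.str.replace(r"[+,]", "", regex=True)
       .astype("int64")
)
print(installs.tolist())
```

Unlike the `apply` version, this form expects the column to still contain strings, so it should run once on the raw data rather than on an already converted column.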

Size
  

In [None]:
playstoredf['Size'].unique()

In [None]:
# Clean the Size column: mark 'Varies with device' as missing, strip 'M' and ',', and convert KB ('k') to MB
playstoredf['Size'] = playstoredf['Size'].apply(lambda x: str(x).replace('Varies with device', 'NaN') if 'Varies with device' in str(x) else x)
playstoredf['Size'] = playstoredf['Size'].apply(lambda x: str(x).replace('M', '') if 'M' in str(x) else x)
playstoredf['Size'] = playstoredf['Size'].apply(lambda x: str(x).replace(',', '') if ',' in str(x) else x)
playstoredf['Size'] = playstoredf['Size'].apply(lambda x: float(str(x).replace('k', '')) / 1024 if 'k' in str(x) else x)  # convert KB to MB

In [None]:
playstoredf['Size'].dtype
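The Size rules above can be collected into one function, which makes them easier to test in isolation. This is a sketch; `size_to_mb` is a hypothetical helper name, and it uses the same factor of 1024 for KB to MB as the cell above.

```python
import math

def size_to_mb(value):
    """Convert a raw Size value such as '19M' or '201k' to megabytes (float)."""
    text = str(value).strip().replace(",", "")
    if text.endswith("M"):
        return float(text[:-1])
    if text.endswith("k"):
        return float(text[:-1]) / 1024  # kilobytes to megabytes
    try:
        return float(text)  # value is already numeric
    except ValueError:
        return float("nan")  # e.g. 'Varies with device'

print(size_to_mb("19M"), size_to_mb("512k"), size_to_mb("Varies with device"))
```

Applied to the column, this would look like `playstoredf['Size'].map(size_to_mb)`, which yields a float column with NaN for apps whose size varies with the device.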

Price & Reviews

In [None]:
playstoredf['Price'].unique()

In [None]:
#let's remove $ sign from the Price column
playstoredf['Price'] = playstoredf['Price'].apply(lambda x: str(x).replace('$', '') if '$' in str(x) else x)

In [None]:
# Convert the column types to numeric datatypes
playstoredf['Size'] = playstoredf['Size'].astype(float)
playstoredf['Installs'] = playstoredf['Installs'].astype(float)
playstoredf['Price'] = playstoredf['Price'].astype(float)
playstoredf['Reviews'] = playstoredf['Reviews'].astype(int)
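A direct conversion raises an error as soon as one value is not a valid number. A more forgiving option is `pd.to_numeric` with `errors="coerce"`, which turns unparseable entries into NaN so they can be inspected afterwards. A sketch on invented values:

```python
import pandas as pd

# Raw price strings, including one value that is not a price (illustrative only)
raw_price = pd.Series(["0", "$4.99", "Everyone", "$0.99"])

# '$' is a regex metacharacter, so replace it as a literal string
price = pd.to_numeric(raw_price.str.replace("$", "", regex=False), errors="coerce")

# Entries that could not be parsed become NaN and can be reviewed separately
bad_rows = raw_price[price.isna()]
print(price.tolist())
print(bad_rows.tolist())
```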

In [None]:
playstoredf.dtypes

In [None]:
playstoredf.describe(include = 'all')

### Identifying the Statistics from the Dataset

In [None]:
# finding numerical data

numeric_data = playstoredf.select_dtypes(include=np.number)
numeric_col = numeric_data.columns


# we will store the numeric features in a variable

print("Numeric Features:", numeric_data.shape)
numeric_data.describe(include='all').T

In [None]:
# Finding the Categorical data

cat_data = playstoredf.select_dtypes(exclude=np.number) #selects data with non-numeric features
cat_col = cat_data.columns

# non-numeric features in a variable

print("Non-Numeric Features:", cat_data.shape)
cat_data.describe(include='all').T

### What manipulations have you done and what insights did you find?

## ***4. Data Visualization, Storytelling & Experimenting with charts : Understand the relationships between variables***

#### Chart - 1

In [None]:
# Chart - 1 visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Here

#### Chart - 2

In [None]:
# Chart - 2 visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Here

#### Chart - 3

In [None]:
# Chart - 3 visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Here

#### Chart - 4

In [None]:
# Chart - 4 visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Here

#### Chart - 5

In [None]:
# Chart - 5 visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Here

#### Chart - 6

In [None]:
# Chart - 6 visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Here

#### Chart - 7

In [None]:
# Chart - 7 visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Here

#### Chart - 8

In [None]:
# Chart - 8 visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Here

#### Chart - 9

In [None]:
# Chart - 9 visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Here

#### Chart - 10

In [None]:
# Chart - 10 visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Here

#### Chart - 11

In [None]:
# Chart - 11 visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Here

#### Chart - 12

In [None]:
# Chart - 12 visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Here

#### Chart - 13

In [None]:
# Chart - 13 visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Here

#### Chart - 14 - Correlation Heatmap

In [None]:
# Correlation Heatmap visualization code
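As a rough illustration of how a correlation heatmap can be built, the sketch below computes a Pearson correlation matrix on synthetic data and passes it to seaborn. The data is randomly generated and says nothing about the real dataset; `plot_corr_heatmap` is a hypothetical helper name.

```python
import numpy as np
import pandas as pd

# Synthetic data only: Reviews is constructed to follow Installs, Price is independent
rng = np.random.default_rng(0)
n = 200
installs = rng.integers(1_000, 1_000_000, size=n).astype(float)
toy = pd.DataFrame({
    "Installs": installs,
    "Reviews": installs * 0.02 + rng.normal(0, 500, size=n),
    "Price": rng.uniform(0, 10, size=n),
})

# Pairwise Pearson correlation between the numeric columns
corr = toy.corr()

def plot_corr_heatmap(corr_matrix):
    """Draw an annotated heatmap of a correlation matrix."""
    import matplotlib.pyplot as plt
    import seaborn as sns
    ax = sns.heatmap(corr_matrix, annot=True, cmap="coolwarm", vmin=-1, vmax=1)
    ax.set_title("Correlation heatmap (synthetic data)")
    plt.show()

print(corr.round(2))
```

On the cleaned app data, the equivalent call would be `plot_corr_heatmap(playstoredf.select_dtypes(include=np.number).corr())`.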

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

#### Chart - 15 - Pair Plot

In [None]:
# Pair Plot visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

## **5. Solution to Business Objective**

#### What do you suggest the client do to achieve the business objective?
Explain briefly.

Answer Here.

# **Conclusion**

Write the conclusion here.

### ***Hurrah! You have successfully completed your EDA Capstone Project !!!***