# **Project Name**    -
### PLAY STORE APP REVIEW ANALYSIS


#### **Project Type**    - **EDA**
#### **Contribution**    - **Individual**


# **Project Summary -**

####The project involves exploring and analyzing data from the Play Store apps, which holds substantial potential for app-making businesses. The dataset includes information about various apps, such as category, rating, size, etc. Additionally, there is another dataset containing customer reviews of Android apps. The goal is to extract actionable insights that can drive success in the Android market.

####Key components of the project include:

####Dataset Description: The datasets consist of information about Play Store apps and customer reviews, offering a rich source for analysis. Attributes such as category, rating, size, etc., are available for each app.

####Main Libraries to be Used:

* Pandas for data manipulation and aggregation.
* Matplotlib and Seaborn for visualization, exploring at least 5 different visualizations to understand behavior with respect to the target variable.
* NumPy for computationally efficient operations.

####Project Architecture: The project follows a structured approach to data analysis: data cleaning, exploration, and visualization using Pandas, Matplotlib, and Seaborn. The aim is to uncover key factors that contribute to app engagement and success in the Android market.

####In summary, the project focuses on leveraging Play Store apps data to derive actionable insights for developers, with an emphasis on using key Python libraries for data manipulation and visualization to understand factors influencing app engagement and success in the Android market.







# **GitHub Link -**

Provide your GitHub Link here.

# **Problem Statement**


####In the dynamic landscape of the Android app market, extracting actionable insights from the Play Store apps data is crucial for app-making businesses to attain success. The dataset comprises valuable information about each app, encompassing attributes such as category, rating, size, and more. Additionally, a dataset containing customer reviews further enriches the available data.

####The primary challenge lies in systematically exploring and analyzing this extensive dataset to identify the key factors responsible for app engagement and overall success in the Android market.

#### **Define Your Business Objective?**

####My business objective is to leverage data-driven insights to guide app-making businesses in optimizing their strategies, enhancing their offerings, and ultimately achieving success in the Android app market. The project aims to bridge the gap between raw data and actionable intelligence, providing a foundation for informed decision-making and strategic positioning in the dynamic Android app ecosystem.

# **General Guidelines** : -  

1.   Well-structured, formatted, and commented code is required.
2.   Exception Handling, Production Grade Code & Deployment Ready Code will be a plus. Those students will be awarded some additional credits.
     
     The additional credits will have advantages over other students during Star Student selection.
       
             [ Note: - Deployment Ready Code is defined as, the whole .ipynb notebook should be executable in one go
                       without a single error logged. ]

3.   Each and every logic should have proper comments.
4. You may add as many charts as you want. Make sure that for each and every chart the following format is answered.
        

```
# Chart visualization code
```
            

*   Why did you pick the specific chart?
*   What is/are the insight(s) found from the chart?
* Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

5. You have to create at least 20 logical & meaningful charts having important insights.


[ Hints : - Do the Visualization in a structured way while following the "UBM" Rule.

U - Univariate Analysis,

B - Bivariate Analysis (Numerical - Categorical, Numerical - Numerical, Categorical - Categorical)

M - Multivariate Analysis
 ]





# ***Let's Begin !***

## ***1. Know Your Data***

### Import Libraries

In [None]:
# Import Libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from datetime import datetime
from datetime import date
%matplotlib inline

### For unwanted warnings

### Dataset Loading

In [None]:
# Load Dataset
# Mount Google Drive first (works only inside Google Colab)
from google.colab import drive
drive.mount('/content/drive')


# Load both datasets
path_play_store_data = '/content/Play Store Data final.csv'
df = pd.read_csv(path_play_store_data)


path_user_review = '/content/User Reviews (1).csv'
df2 = pd.read_csv(path_user_review)
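
The guidelines ask for exception handling, so the loading step could be wrapped as in the sketch below. The helper name `load_csv_safely` is hypothetical and not part of the original notebook; it only assumes that a missing or empty file should produce a message instead of stopping the run.

```python
import pandas as pd


def load_csv_safely(path):
    """Return a DataFrame read from `path`, or None if the file is missing or empty."""
    try:
        return pd.read_csv(path)
    except FileNotFoundError:
        print(f"File not found: {path}")
    except pd.errors.EmptyDataError:
        print(f"File has no data: {path}")
    return None
```

With this helper, `df = load_csv_safely(path_play_store_data)` would replace the direct `pd.read_csv` call, and the rest of the notebook could check `df is not None` before continuing.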

### Dataset First View

In [None]:
# Dataset First Look of play_store_data
df.head(7)

In [None]:
# now looking for user_review
df2.head(7)

### Dataset Rows & Columns count

In [None]:
# Dataset Rows & Columns count for play_store_data
df.shape

In [None]:
# Dataset Rows & Columns count for user_review
df2.shape

### Dataset Information

In [None]:
# Dataset Info for play_store_data
df.info()

In [None]:
# Dataset Info for user_review
df2.info()

#### Duplicate Values

In [None]:
# Dataset Duplicate Value Count for play_store_data
print(len(df[df.duplicated()]))

In [None]:
# Dataset Duplicate value Count for user_review
print(len(df2[df2.duplicated()]))

#### Missing Values/Null Values

In [None]:
# Missing Values/Null Values Count
# Checking the sum of null values for each column for play_store_data
df.isna().sum()

In [None]:
#Checking the sum of null values for each column for user_review
df2.isna().sum()

In [None]:
# Checking which columns of play_store_data contain missing values
df.isna().any()

In [None]:
# Checking which columns of user_review contain missing values
df2.isna().any()
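
The null checks above can be condensed into one table per dataset. The sketch below uses a small toy frame rather than the real data, and the helper name `missing_summary` is hypothetical.

```python
import numpy as np
import pandas as pd


def missing_summary(frame):
    """Count and percentage of missing values per column, largest first."""
    counts = frame.isna().sum()
    summary = pd.DataFrame({
        "missing": counts,
        "percent": (counts / len(frame) * 100).round(2),
    })
    return summary.sort_values("missing", ascending=False)


# Toy data standing in for the Play Store table
toy = pd.DataFrame({
    "App": ["A", "B", "C", "D"],
    "Rating": [4.5, np.nan, np.nan, 3.9],
    "Type": ["Free", "Paid", None, "Free"],
})
summary = missing_summary(toy)
print(summary)
```

Calling `missing_summary(df)["missing"].plot.barh()` would turn the same table into a simple chart of missing values.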

### What did you know about your dataset?

* For both datasets, we saw how many rows and columns are present with the help of the shape attribute.
* We also saw the column names, datatypes, and non-null counts of each dataset with info().
* Furthermore, we saw that 483 duplicate values are present in play_store_data and 33616 duplicate values are present in the user_review dataset.
* Also, we checked the missing/null values in each dataset and which columns contain them.

## ***2. Understanding Your Variables***

In [None]:
# Columns for play_store_data
df.columns

In [None]:
# Columns for user_review
df2.columns

In [None]:
# Describe for play_store_data
df.describe()

In [None]:
# Describe for user_review
df2.describe()

### Variables Description

We can find the data source and details here. The dataset consists of 2 tables. The first table, called ‘Play Store Data.csv’, consists of 13 columns (app, category, rating, reviews, size, installs, type, price, content rating, genres, last updated, current ver, android ver). Meanwhile, the second table, called ‘googleplaystore_User_Reviews.csv’, consists of 5 columns (app, translated review, sentiment, sentiment polarity, sentiment subjectivity).

Here is information on what columns represent in Play store Data :

App - Individual name of each application.

Category - The category each application belongs to on the Play Store.

Rating - Play Store ratings for apps showing the proportional number of 1-5 star reviews, are calculated based on the app's current quality ratings from user reviews, rather than the lifetime average value of user reviews, unless the app has very few ratings.

Reviews - App reviews are the individual comments users can leave under an app.

Size - The amount of space required to install the app.

Installs - The number of times the app was installed from the store, regardless of any events in the past.

Type - Indicates whether the app is paid or free.

Price - The amount to be paid for an app that is not available for free. With some apps, you can also buy additional content or services within the app.

Content Rating - Content ratings are used to describe the minimum maturity level of content in apps. However, content ratings don't tell you whether an app is designed for users of a specific age. Ratings are typically based on a number of factors, including sexual content, violence, drugs, gambling, and profane language.

Genres - Google Play Store app genres refer to the different categories or types of apps available in the Google Play Store.

Last Updated - It indicates the date and time when the app received its most recent update or release. App updates can include bug fixes, performance improvements, new features, design changes, or compatibility enhancements.

Current Ver - The most up-to-date version available for each app in the Google Play Store. App developers regularly release updates to improve the functionality, fix bugs, enhance security, and introduce new features. These updates ensure that users have access to the best possible experience with the app.

Android Ver - The Android versions an app supports, as specified by the app developer. Each Android app is designed to work on a specific range of Android operating system versions. The minimum Android version is the oldest version of the operating system that the app is compatible with, while the target Android version is the version that the app is primarily optimized for.

Here is information on what columns represent in User Reviews :

App - The name of the application that the review was written for. User reviews are the feedback and opinions shared by individuals who have used the app, and they play a significant role in helping other users make informed decisions about whether to download and use it.

Translated_Review - When we talk about "user reviews translated reviews" in the context of apps, it refers to the process of translating user reviews from one language to another. User reviews are an essential source of feedback and information for apps, and translating them allows app developers and users to understand and engage with reviews in different languages.

Sentiment - Sentiment analysis in the context of user reviews refers to the process of determining the emotional tone or sentiment expressed in the reviews. It involves analyzing the text of the reviews to determine whether the sentiment conveyed is positive, negative, or neutral. This analysis can provide valuable insights into how customers perceive and feel about a product, service, or brand.

Sentiment_Polarity - A numeric score for the sentiment expressed in the review. Negative values indicate negative sentiment, positive values indicate positive sentiment, and values close to zero indicate a neutral tone. It expresses as a number what the Sentiment column expresses as a label.

Sentiment_Subjectivity - Subjectivity in the context of sentiment analysis refers to the degree of personal opinion, emotion, or judgment expressed in a text. It indicates whether the text contains subjective information based on individual perspectives rather than objective facts. Subjective sentences often convey personal feelings, beliefs, or views, while objective sentences tend to present factual information.

### Check Unique Values for each variable.

In [None]:
# Check Unique Values for each variable of play_store_data dataset
df['App'].unique()

In [None]:
df['Category'].unique()

In [None]:
df['Rating'].unique()

In [None]:
df['Reviews'].unique()

In [None]:
df['Size'].unique()

In [None]:
df['Installs'].unique()

In [None]:
df['Type'].unique()

In [None]:
df['Price'].unique()

In [None]:
df['Content Rating'].unique()

In [None]:
df['Genres'].unique()

In [None]:
df['Last Updated'].unique()

In [None]:
df['Current Ver'].unique()

In [None]:
df['Android Ver'].unique()

## 3. ***Data Wrangling***

### Handle Missing Values

In [None]:
df.replace('Varies with device',np.nan,inplace = True)

In [None]:
df.info()

### Drop Duplicate Values

In [None]:
df.shape

In [None]:
df.drop_duplicates(inplace = True)

In [None]:
df.isna().sum()

### Drop Duplicate Values in the App variable

In [None]:
df.drop_duplicates(subset='App',inplace = True)
df['App'].duplicated().sum()

#### The unique values of Category include '1.9', which is not a valid category. So I look up which row '1.9' belongs to.

In [None]:
df[df['Category']=='1.9']

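#### In this row the Category value is missing, so every field sits one column too far to the left. I shift the row one column to the right with the shift() method of the Pandas library, restore the app name, and set Category to NaN to get clean data.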

In [None]:
# Shift the misaligned row one column to the right, then restore the app name
df.loc[10472] = df.loc[10472].shift()
df.loc[10472, 'App'] = df.loc[10472, 'Category']
df.loc[10472, 'Category'] = np.nan
df.loc[10472]
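
The effect of `shift()` on a misaligned row is easier to see on a toy frame. The sketch below uses made-up values and looks the row up by its invalid category instead of a hard-coded index; both are illustrative assumptions.

```python
import numpy as np
import pandas as pd

# Toy frame: in the second row the category is missing, so the rating sits
# in Category and the review count sits in Rating.
toy = pd.DataFrame(
    {
        "App": ["Good App", "Broken App"],
        "Category": ["TOOLS", "1.9"],
        "Rating": ["4.5", "19"],
        "Reviews": ["100", np.nan],
    },
    dtype=object,
)

bad = toy.index[toy["Category"] == "1.9"][0]  # locate the misaligned row
row = toy.loc[bad].shift()                    # move every value one column right
row["App"] = row["Category"]                  # the app name landed in Category
row["Category"] = np.nan                      # the true category is unknown
toy.loc[bad] = row
print(toy)
```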


In [None]:
df['Category'].unique()

### Fill the missing values of the Rating variable with the mean.

In [None]:
df['Rating'] = pd.to_numeric(df['Rating'], errors='coerce')  # make sure the column is numeric
df['Rating'] = df['Rating'].fillna(df['Rating'].mean())

### For the Reviews variable, convert the string datatype to float. Any value that is not a plain number becomes NaN.

In [None]:
# Reviews holds plain counts stored as strings; coerce anything non-numeric to NaN
df['Reviews'] = pd.to_numeric(df['Reviews'], errors='coerce')
df['Reviews'].dtype

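### For the Size variable, convert the values to kilobytes: a trailing 'M' multiplies the number by 1000 and a trailing 'k' is dropped. Also convert the string datatype to float.

In [None]:
# Express every size in kilobytes; values that are not sizes (for example NaN) stay NaN
size_text = df['Size'].astype(str)
size_number = pd.to_numeric(size_text.str.replace(r'[Mk]$', '', regex=True), errors='coerce')
df['Size'] = size_number.where(~size_text.str.endswith('M'), size_number * 1000)
df['Size'].dtype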
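
A minimal sketch of the size conversion on toy values, assuming a trailing 'M' means megabytes, a trailing 'k' means kilobytes, and 1 MB is counted as 1000 kB. The helper name `size_to_kb` is hypothetical.

```python
import numpy as np
import pandas as pd


def size_to_kb(size):
    """Convert strings such as '19M' or '201k' to kilobytes as floats."""
    text = size.astype(str).str.strip()
    # Drop the unit suffix; anything that is still not a number becomes NaN
    number = pd.to_numeric(text.str.replace(r"[Mk]$", "", regex=True), errors="coerce")
    # Megabyte values are scaled to kilobytes, everything else is left as is
    factor = np.where(text.str.endswith("M"), 1000.0, 1.0)
    return number * factor


sizes = pd.Series(["19M", "1.5M", "201k", "Varies with device"])
converted = size_to_kb(sizes)
print(converted)
```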

### For the Installs variable, replace ',' with '' and '+' with ''. Non-numeric values such as 'Free' become np.nan. Also convert the string datatype to float.

In [None]:
# Strip the thousands separators and the trailing '+', then convert to float
df['Installs'] = df['Installs'].str.replace(',', '', regex=False)
df['Installs'] = df['Installs'].str.replace('+', '', regex=False)
df['Installs'] = pd.to_numeric(df['Installs'], errors='coerce')
df['Installs'].dtype

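### For the Price variable, replace '$' with ''. Non-numeric values such as 'Everyone' become np.nan. Also convert the string datatype to float.

In [None]:
# Remove the currency sign first, then convert; to_numeric turns leftovers into NaN
df['Price'] = df['Price'].str.replace('$', '', regex=False)
df['Price'] = pd.to_numeric(df['Price'], errors='coerce')
df['Price'].dtype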
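
The Installs and Price clean-ups follow the same pattern: strip a few literal characters, then convert. The sketch below shows that pattern on toy values; the helper name `to_number` is hypothetical.

```python
import pandas as pd


def to_number(series, drop_chars):
    """Remove each character in `drop_chars`, then convert to float (NaN on failure)."""
    text = series.astype(str)
    for char in drop_chars:
        # regex=False treats '+' and '$' as literal characters
        text = text.str.replace(char, "", regex=False)
    return pd.to_numeric(text, errors="coerce")


installs = to_number(pd.Series(["1,000+", "500+", "Free"]), ",+")
prices = to_number(pd.Series(["$4.99", "0", "Everyone"]), "$")
print(installs.tolist(), prices.tolist())
```

Passing `regex=False` matters because `+` and `$` are special characters in regular expressions.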

### For the Last Updated variable, convert the string datatype to datetime.

In [None]:
df['Last Updated']=pd.to_datetime(df['Last Updated'])
df['Last Updated']

### Fill the missing values of Current Ver and Android Ver variables by calculating mode.

In [None]:
df['Current Ver']=df['Current Ver'].fillna(df['Current Ver'].mode()[0])
df['Android Ver']=df['Android Ver'].fillna(df['Android Ver'].mode()[0])

## This is our complete clean dataset of play_store_data dataframe.

In [None]:
df.info()

# Unique values of user_review

In [None]:
df2['App'].unique()

In [None]:
df2['Translated_Review'].unique()

In [None]:
df2['Sentiment_Polarity'].unique()

In [None]:
df2['Sentiment_Subjectivity'].unique()

# Data cleaning for user_review

### Fill the missing values of Translated Review and Sentiment by using mode value.

In [None]:
df2['Translated_Review']=df2['Translated_Review'].fillna(df2['Translated_Review'].mode()[0])
df2['Sentiment']=df2['Sentiment'].fillna(df2['Sentiment'].mode()[0])

### Fill the missing values of Sentiment Polarity and Sentiment Subjectivity with the median value and convert the string datatype to float.

In [None]:
# Assign the result of fillna back; combining assignment with inplace=True would store None
for col in ['Sentiment_Polarity', 'Sentiment_Subjectivity']:
    df2[col] = pd.to_numeric(df2[col], errors='coerce')
    df2[col] = df2[col].fillna(df2[col].median())
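
A minimal sketch of median imputation on toy values. It assigns the result of `fillna` back to the column, because `fillna(..., inplace=True)` returns `None` and must not be combined with an assignment. The toy values are made up.

```python
import pandas as pd

reviews = pd.DataFrame({"Sentiment_Polarity": ["0.5", None, "-0.1", "0.3"]})

# Convert to float first so the median is computed on numbers
polarity = pd.to_numeric(reviews["Sentiment_Polarity"], errors="coerce")

# median() is called with parentheses; the result of fillna is assigned back
reviews["Sentiment_Polarity"] = polarity.fillna(polarity.median())
print(reviews)
```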


## This is our complete clean dataset of User_review

In [None]:
df2.info()

### What all manipulations have you done and insights you found?

#### I filled all the null/missing values in both datasets, converted the variables to suitable datatypes, and dropped all duplicate values.

## ***4. Data Visualization, Storytelling & Experimenting with charts : Understand the relationships between variables***

#### Chart - 1 HOW MANY APPS ARE THERE IN EACH CATEGORY?

In [None]:
# Chart - 1 visualization code
plt.figure(figsize= (8,8))
sns.countplot(x='Category',data = df)
plt.title('No.Of Applications In Category')
plt.ylabel('No. Of APPS')
plt.xticks(rotation = 90)
plt.show()

##### 1. Why did you pick the specific chart?

'countplot' is specifically designed for counting the occurrences of unique values in a categorical variable. It's useful when you want to understand the distribution of categorical data. That's why I chose this chart.

##### 2. What is/are the insight(s) found from the chart?

With the help of this chart we get the insight that the Family category contains the largest number of apps.
The Game category contains the second largest number of apps.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Gained insights play a pivotal role in creating a positive business impact for a Play Store app. By understanding user behavior, preferences, and interactions, developers can enhance the overall user experience, prioritize and optimize popular features, and tailor marketing efforts to specific user segments. Swift identification and resolution of technical issues based on insights contribute to user satisfaction, preventing negative reviews. Moreover, insights guide strategic decisions, such as refining the app's monetization strategy and staying competitive in the market through continuous improvement. This data-driven approach ensures that the app evolves to meet user needs, ultimately fostering higher retention rates and positive engagement, crucial elements for long-term success on the Play Store.

#### Chart - 2    A pairwise plot between all the quantitative variables

In [None]:
# Chart - 2 visualization code
# Log-scale the heavily skewed columns; adding 1 avoids log(0) for apps with no installs or reviews
pair_data = df[['Rating', 'Size', 'Installs', 'Reviews', 'Price', 'Type']].copy()
pair_data['Installs'] = np.log10(pair_data['Installs'] + 1)
pair_data['Reviews'] = np.log10(pair_data['Reviews'] + 1)

p = sns.pairplot(pair_data, hue='Type')
p.fig.suptitle("Pairwise Plot - Rating, Size, Installs, Reviews, Price",x=0.5, y=1.0, fontsize=16)

##### 1. Why did you pick the specific chart?

'pairplot' is particularly useful for visualizing the relationships between multiple variables in a dataset. It creates scatterplots for all pairs of numerical features, with histograms or kernel density estimates of the individual features along the diagonal. That's why I chose this chart.

##### 2. What is/are the insight(s) found from the chart?

A pair plot, or scatterplot matrix, offers valuable insights into the relationships and patterns within a dataset. The plots on the diagonal show the distribution of each individual variable, which helps to detect skewed or non-normal distributions. The off-diagonal scatterplots provide a comprehensive view of the pairwise relationships, aiding in the identification of patterns, dependencies, and potential outliers. Pair plots are instrumental in assessing collinearity among variables, guiding decisions on variable transformation, and revealing potential data clusters. In essence, this visualization tool serves as a guide for understanding multivariate data, enabling data analysts to make informed decisions about variable relationships and overall data characteristics.


##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

The business impact gained from pair plots is multifaceted and pivotal for data-informed decision-making. Pair plots allow businesses to discern patterns, correlations, and outliers within their datasets, empowering them to make strategic decisions with a profound understanding of their data. By identifying relationships between variables, businesses can optimize marketing strategies, ensuring targeted and effective campaigns. Enhanced user experiences can be crafted by understanding how different features correlate with user interactions, leading to higher customer satisfaction and retention. Pair plots are instrumental in risk mitigation, allowing businesses to spot outliers early and take proactive measures. Moreover, the data-driven insights gained from pair plots guide product development efforts, ensuring that resources are directed towards features and attributes that have the most significant impact on user satisfaction and engagement. In essence, pair plots contribute to streamlined operations, improved customer experiences, and overall business success by fostering a deeper comprehension of the intricate relationships within the data.

#### Chart - 3 Correlation of variable for play store data.

In [None]:
# Chart - 3 visualization code
plt.figure(figsize = (10,10))
# Correlation is only defined for numeric columns
sns.heatmap(df.corr(numeric_only=True), annot= True)
plt.title('Correlation Heatmap for Playstore Data')
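
The matrix behind a correlation heatmap can be inspected on its own. The sketch below uses made-up values and selects the numeric columns explicitly with `select_dtypes`, which is one way to keep text columns such as Type out of the calculation.

```python
import pandas as pd

toy = pd.DataFrame({
    "Rating": [4.0, 4.5, 3.5, 5.0],
    "Installs": [100.0, 1000.0, 10.0, 10000.0],
    "Type": ["Free", "Paid", "Free", "Free"],
})

# Keep only the numeric columns before computing pairwise correlations
numeric = toy.select_dtypes(include="number")
corr = numeric.corr()
print(corr)
```

The resulting `corr` frame is what `sns.heatmap` colours cell by cell.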

##### 1. Why did you pick the specific chart?


A heatmap is a type of chart commonly used in data visualization, and it is effective for representing the magnitude of a phenomenon as colors in a two-dimensional space. That's why I chose this chart.

##### 2. What is/are the insight(s) found from the chart?

A correlation heatmap visually represents the correlation matrix of a dataset, offering insights into the relationships between variables. The heatmap provides a quick overview of the strength and direction of correlations, with darker colors indicating stronger associations. This tool is valuable for identifying significant positive or negative correlations, detecting multicollinearity among independent variables, and aiding in variable selection for predictive modeling. The heatmap can reveal patterns, guide further analysis, and highlight potential outliers or changes in correlation structures over time or across different groups. However, it's crucial to remember that correlation does not imply causation, and additional statistical methods and domain knowledge are necessary for a comprehensive understanding of the data.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

A positive business impact that can be derived from a correlation heatmap involves making informed and strategic decisions based on a deeper understanding of relationships within the data. For instance, identifying strong positive correlations between certain marketing strategies and increased sales could guide resource allocation towards those successful campaigns. The heatmap can aid in optimizing business processes by revealing correlations between different operational variables, leading to improved efficiency. In the context of customer satisfaction, finding positive correlations between specific service offerings and high customer ratings can inform targeted improvements, potentially enhancing overall customer experience. Ultimately, leveraging insights from a correlation heatmap empowers businesses to make data-driven decisions, allocate resources more effectively, and enhance overall performance and profitability.







#### Chart - 4 Top categories on Google Playstore?

In [None]:
# Count the apps in each category (value_counts sorts in descending order)
category_counts = df['Category'].value_counts()
x_list = category_counts.tolist()        # number of apps per category
y_list = category_counts.index.tolist()  # category names

In [None]:
#No. of apps belonging to each category in the playstore
plt.figure(figsize=(20,10))
graph = sns.barplot(y = x_list, x = y_list, palette= "tab10")
# The categories are on the x-axis and the counts on the y-axis
graph.set_xlabel('App Categories')
graph.set_ylabel('Number of Apps')
graph.set_title("Top categories on Playstore")
graph.set_xticklabels(graph.get_xticklabels(), rotation= 90, horizontalalignment='right')

##### 1. Why did you pick the specific chart?


Bar plots are effective for comparing the values of different categories. Each category is represented by a bar, and the height of the bar corresponds to the value of the variable.
Bar plots are useful for displaying the frequency distribution of categorical data. You can visualize how often each category appears in the dataset. That's why I chose this chart.

##### 2. What is/are the insight(s) found from the chart?

Bar plots offer valuable insights by visually representing data in a clear and concise manner. These visualizations facilitate the comparison of different categories, allowing for the identification of trends, differences, and outliers within the dataset. Bar plots are particularly effective for showcasing the distribution of data among categories, helping to reveal patterns and trends over time or across subgroups. They serve as powerful tools for communicating complex information to both technical and non-technical audiences, making it easy to understand and interpret the relative magnitudes of various data points. Whether assessing sales performance, analyzing survey responses, or benchmarking against goals, bar plots provide a versatile and intuitive means of deriving actionable insights for informed decision-making in business contexts.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Barplots can have a positive business impact by providing a clear and intuitive visualization of data, enabling effective decision-making and communication within an organization. These visual representations are particularly valuable for summarizing and comparing categorical data, such as sales performance across different products, regions, or time periods. Business leaders can quickly grasp trends, identify successful strategies, and pinpoint areas for improvement. Barplots enhance data-driven decision-making by making complex information accessible to a broader audience, facilitating discussions, and fostering a shared understanding of key business metrics. Additionally, the visual impact of barplots can be instrumental in presentations and reports, helping stakeholders absorb and retain crucial information, ultimately contributing to more informed and strategic business decisions.

#### Chart - 5 What is the ratio of number of Paid apps and Free apps?

In [None]:
# Chart - 5 visualization code
data = df['Type'].value_counts()
labels = data.index  # take the labels from the counts so they always match the slices

#pie chart
plt.figure(figsize=(7,7))
colors = ["orange","blue"]
explode=(0.01,0.1)
plt.pie(data, labels = labels, colors = colors, autopct='%.2f%%',explode=explode)
plt.title('Distribution of Paid and Free apps')
plt.legend()
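
The percentages printed on the pie can also be computed directly. This sketch uses made-up values and shows that `value_counts` supplies both the slice sizes and their labels, so the two cannot get out of step.

```python
import pandas as pd

types = pd.Series(["Free", "Free", "Free", "Paid", None])

counts = types.value_counts()                     # missing values are left out
shares = types.value_counts(normalize=True) * 100
labels = counts.index.tolist()                    # same order as the counts
print(labels, shares.round(2).tolist())
```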


##### 1. Why did you pick the specific chart?

Pie charts are used in data visualization for specific scenarios where you want to represent the distribution of a categorical variable as a whole. That's why I chose this chart.

##### 2. What is/are the insight(s) found from the chart?

Pie charts are useful for providing a visual representation of the composition of a whole, showcasing the distribution of parts as percentages of the total. They offer insights into the relative proportions of different categories within a dataset, making it easy to identify dominant or minority components. Pie charts are effective for conveying a snapshot of categorical data, such as market share, budget allocations, or demographic breakdowns. They allow viewers to quickly discern the significance of each category and understand the overall structure of the data set. However, it's important to note that pie charts are most suitable when dealing with a small number of categories, as complex datasets with numerous segments may lead to difficulties in interpretation. Despite potential limitations, pie charts offer a straightforward and accessible way to communicate proportional relationships in a visually compelling manner.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Pie charts can have a positive business impact by providing a visually engaging and easily understandable representation of the distribution of data. They are particularly effective in conveying the relative proportions of different components within a whole, making complex information more accessible to a broad audience. In business contexts, pie charts are often used to illustrate market share, budget allocations, or sales distribution across various products or regions. Their simplicity aids in quick comprehension, allowing decision-makers to identify key trends, allocate resources strategically, and make informed decisions. Additionally, pie charts can be powerful tools for presentations and reports, enhancing communication and fostering a shared understanding among stakeholders. Their visual appeal and intuitive nature contribute to more effective data communication and analysis in various business scenarios.

#### Chart - 6 Which category of Apps from the Content Rating column are found more on playstore ?

In [None]:
# Chart - 6 visualization code
data = df['Content Rating'].value_counts()
labels = data.index  # hard-coded labels can end up paired with the wrong slice

#create pie chart
plt.figure(figsize=(7,7))
explode=(0,0.1,0.1,0.1,0.0,1.3)  # assumes six content rating groups
colors = ['C1', 'red', 'skyblue', 'green', 'purple', 'black']
plt.pie(data, labels = labels, colors = colors, autopct='%.2f%%',explode=explode)
plt.title('Content Rating')
plt.legend(bbox_to_anchor=(0.9, 0, 0.5, 1))

##### 1. Why did you pick the specific chart?

Pie charts are used in data visualization for specific scenarios where you want to represent the distribution of a categorical variable as a whole. That's why I chose this chart.

##### 2. What is/are the insight(s) found from the chart?

Pie charts are useful for providing a visual representation of the composition of a whole, showcasing the distribution of parts as percentages of the total. They offer insights into the relative proportions of different categories within a dataset, making it easy to identify dominant or minority components. Pie charts are effective for conveying a snapshot of categorical data, such as market share, budget allocations, or demographic breakdowns. They allow viewers to quickly discern the significance of each category and understand the overall structure of the data set. However, it's important to note that pie charts are most suitable when dealing with a small number of categories, as complex datasets with numerous segments may lead to difficulties in interpretation. Despite potential limitations, pie charts offer a straightforward and accessible way to communicate proportional relationships in a visually compelling manner.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Pie charts can have a positive business impact by providing a visually engaging and easily understandable representation of the distribution of data. They are particularly effective in conveying the relative proportions of different components within a whole, making complex information more accessible to a broad audience. In business contexts, pie charts are often used to illustrate market share, budget allocations, or sales distribution across various products or regions. Their simplicity aids in quick comprehension, allowing decision-makers to identify key trends, allocate resources strategically, and make informed decisions. Additionally, pie charts can be powerful tools for presentations and reports, enhancing communication and fostering a shared understanding among stakeholders. Their visual appeal and intuitive nature contribute to more effective data communication and analysis in various business scenarios.

#### Chart - 7 App categories with the most installs.

In [None]:
# Chart - 7 visualization code
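# Total installs per category, sorted so the largest bar ends up at the top
a = df.groupby(['Category'])['Installs'].sum().sort_values()
a.plot.barh(figsize=(15,10), color = 'g')
# barh draws categories on the y-axis and install totals on the x-axis
plt.xlabel('Total app installs')
plt.ylabel('App categories')
plt.title('Total app installs in each category')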

##### 1. Why did you pick the specific chart?

Bar plots are effective for comparing values across categories. Each category is represented by a bar whose length corresponds to the value of the variable, which also makes bar plots useful for showing how often each category appears in a dataset. That is why I chose this chart.

##### 2. What is/are the insight(s) found from the chart?

Bar plots make it easy to compare categories and to spot trends, differences, and outliers for both technical and non-technical audiences. Here, the chart ranks categories by total installs: Game, Communication, Productivity, and Social account for the most installs, even though most apps in the dataset fall under Family, Game, and Tools. The category with the most apps is therefore not necessarily the category with the most installs.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Barplots can have a positive business impact by providing a clear and intuitive visualization of data, enabling effective decision-making and communication within an organization. These visual representations are particularly valuable for summarizing and comparing categorical data, such as sales performance across different products, regions, or time periods. Business leaders can quickly grasp trends, identify successful strategies, and pinpoint areas for improvement. Barplots enhance data-driven decision-making by making complex information accessible to a broader audience, facilitating discussions, and fostering a shared understanding of key business metrics. Additionally, the visual impact of barplots can be instrumental in presentations and reports, helping stakeholders absorb and retain crucial information, ultimately contributing to more informed and strategic business decisions.

#### Chart - 8 Average rating of the apps

In [None]:
# Chart - 8 visualization code
# Count apps per rating value and order the bars by rating rather than by count
df['Rating'].value_counts().sort_index().plot.bar(figsize=(20,8), color = 'b' )
plt.xlabel('Average rating')
plt.ylabel('Number of apps')
plt.title('Average rating of apps in the Play Store')

##### 1. Why did you pick the specific chart?

Bar plots are effective for comparing values across categories. Each category is represented by a bar whose length corresponds to the value of the variable, which also makes bar plots useful for showing how often each category appears in a dataset. That is why I chose this chart.

##### 2. What is/are the insight(s) found from the chart?

Bar plots make it easy to compare categories and to spot trends, differences, and outliers for both technical and non-technical audiences. Here, the chart shows how many apps share each rating value: most apps are rated between 4 and 5, and the most common rating is 4.3.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Barplots can have a positive business impact by providing a clear and intuitive visualization of data, enabling effective decision-making and communication within an organization. These visual representations are particularly valuable for summarizing and comparing categorical data, such as sales performance across different products, regions, or time periods. Business leaders can quickly grasp trends, identify successful strategies, and pinpoint areas for improvement. Barplots enhance data-driven decision-making by making complex information accessible to a broader audience, facilitating discussions, and fostering a shared understanding of key business metrics. Additionally, the visual impact of barplots can be instrumental in presentations and reports, helping stakeholders absorb and retain crucial information, ultimately contributing to more informed and strategic business decisions.
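
The rating counts plotted in Chart 8 can also be summarized numerically. The following is a minimal sketch on a small hypothetical sample (the `sample` frame and its values are made up for illustration and are not the project data); it assumes a numeric `Rating` column like the one used above.

```python
import pandas as pd

# Hypothetical sample standing in for the cleaned Play Store frame `df`.
sample = pd.DataFrame({"Rating": [4.3, 4.3, 4.5, 3.9, 4.1, 2.8, 4.3, 5.0]})

# Most frequent rating value: value_counts() counts each distinct rating,
# idxmax() returns the rating with the highest count.
rating_counts = sample["Rating"].value_counts()
most_common_rating = rating_counts.idxmax()

# Share of apps whose rating lies in the closed interval [4, 5].
share_4_to_5 = sample["Rating"].between(4, 5).mean()

print(most_common_rating, share_4_to_5)
```

Replacing `sample` with the real frame would give the most common rating and the share of apps rated between 4 and 5 for the actual dataset.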

#### Chart - 9 Top apps that are of free type.

In [None]:
# Chart - 9 visualization code
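free_df = df[df['Type'] == 'Free']

# creating top_free_df: free apps whose install count equals the maximum
top_free_df = free_df[free_df['Installs'] == free_df['Installs'].max()]
# all rows in top_free_df share the same install count, so this keeps the first 10 of them
top10free_apps=top_free_df.nlargest(10, 'Installs', keep='first')
top10free_apps.head(10)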


In [None]:
# TOP FREE APPS
top_free_df['App']

In [None]:
# Categories in which the top free apps belong to
top_free_df['Category'].value_counts().plot.bar(figsize=(20,6), color=['violet','blue'])
plt.xlabel('Category')
plt.ylabel('Number of apps')
# top_free_df holds every free app tied at the maximum install count
plt.title('Categories of the most-installed free apps')
plt.xticks(rotation=45)

##### 1. Why did you pick the specific chart?

Bar plots are effective for comparing values across categories. Each category is represented by a bar whose length corresponds to the value of the variable, which also makes bar plots useful for showing how often each category appears in a dataset. That is why I chose this chart.

##### 2. What is/are the insight(s) found from the chart?

Bar plots offer valuable insights by visually representing data in a clear and concise manner. These visualizations facilitate the comparison of different categories, allowing for the identification of trends, differences, and outliers within the dataset. Bar plots are particularly effective for showcasing the distribution of data among categories, helping to reveal patterns and trends over time or across subgroups. They serve as powerful tools for communicating complex information to both technical and non-technical audiences, making it easy to understand and interpret the relative magnitudes of various data points. Whether assessing sales performance, analyzing survey responses, or benchmarking against goals, bar plots provide a versatile and intuitive means of deriving actionable insights for informed decision-making in business contexts.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Barplots can have a positive business impact by providing a clear and intuitive visualization of data, enabling effective decision-making and communication within an organization. These visual representations are particularly valuable for summarizing and comparing categorical data, such as sales performance across different products, regions, or time periods. Business leaders can quickly grasp trends, identify successful strategies, and pinpoint areas for improvement. Barplots enhance data-driven decision-making by making complex information accessible to a broader audience, facilitating discussions, and fostering a shared understanding of key business metrics. Additionally, the visual impact of barplots can be instrumental in presentations and reports, helping stakeholders absorb and retain crucial information, ultimately contributing to more informed and strategic business decisions.

#### Chart - 10 Number of paid apps at each price.

In [None]:
# Chart - 10 visualization code
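paid_df=df[df['Type']=='Paid']

# Count paid apps at each price point, most common price first
paid_df.groupby('Price')['App'].count().sort_values(ascending= False).plot.bar(figsize = (20,6), color = 'lightcoral')
plt.xlabel('Price')
plt.ylabel('Number of paid apps')
plt.title('Number of paid apps at each price')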

##### 1. Why did you pick the specific chart?

Bar plots are effective for comparing values across categories. Each category is represented by a bar whose length corresponds to the value of the variable, which also makes bar plots useful for showing how often each category appears in a dataset. That is why I chose this chart.

##### 2. What is/are the insight(s) found from the chart?

Bar plots offer valuable insights by visually representing data in a clear and concise manner. These visualizations facilitate the comparison of different categories, allowing for the identification of trends, differences, and outliers within the dataset. Bar plots are particularly effective for showcasing the distribution of data among categories, helping to reveal patterns and trends over time or across subgroups. They serve as powerful tools for communicating complex information to both technical and non-technical audiences, making it easy to understand and interpret the relative magnitudes of various data points. Whether assessing sales performance, analyzing survey responses, or benchmarking against goals, bar plots provide a versatile and intuitive means of deriving actionable insights for informed decision-making in business contexts.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Barplots can have a positive business impact by providing a clear and intuitive visualization of data, enabling effective decision-making and communication within an organization. These visual representations are particularly valuable for summarizing and comparing categorical data, such as sales performance across different products, regions, or time periods. Business leaders can quickly grasp trends, identify successful strategies, and pinpoint areas for improvement. Barplots enhance data-driven decision-making by making complex information accessible to a broader audience, facilitating discussions, and fostering a shared understanding of key business metrics. Additionally, the visual impact of barplots can be instrumental in presentations and reports, helping stakeholders absorb and retain crucial information, ultimately contributing to more informed and strategic business decisions.

#### Chart - 11 Percentage of Review Sentiments.

In [None]:
# Chart - 11 visualization code
# Take the labels from value_counts() itself so each slice keeps its own label,
# instead of assuming a fixed Positive/Negative/Neutral order
counts = df2['Sentiment'].value_counts()
labels = [f'{sentiment} Reviews' for sentiment in counts.index]
plt.pie(counts, labels=labels, explode=[0.01, 0.05, 0.05], shadow=True, autopct="%.2f%%")
plt.title('Percentage of Review Sentiments')
plt.legend(bbox_to_anchor=(0.9, 0, 0.5, 1))
plt.show()

##### 1. Why did you pick the specific chart?

Pie charts are used in data visualization when you want to show how a categorical variable is distributed as parts of a whole. That is why I chose this chart.

##### 2. What is/are the insight(s) found from the chart?

Pie charts show the composition of a whole, with each part expressed as a percentage of the total, which makes dominant and minority categories easy to identify. Here, the chart shows that most reviews carry Positive sentiment, while Negative and Neutral reviews make up much smaller shares. Pie charts work best with a small number of categories, which is the case for the three sentiment labels.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Pie charts can have a positive business impact by providing a visually engaging and easily understandable representation of the distribution of data. They are particularly effective in conveying the relative proportions of different components within a whole, making complex information more accessible to a broad audience. In business contexts, pie charts are often used to illustrate market share, budget allocations, or sales distribution across various products or regions. Their simplicity aids in quick comprehension, allowing decision-makers to identify key trends, allocate resources strategically, and make informed decisions. Additionally, pie charts can be powerful tools for presentations and reports, enhancing communication and fostering a shared understanding among stakeholders. Their visual appeal and intuitive nature contribute to more effective data communication and analysis in various business scenarios.

#### Chart - 12 Apps with the highest number of positive reviews.

In [None]:
# Chart - 12 visualization code
positive_ur_df=df2[df2['Sentiment']=='Positive']
# Count positive reviews per app and keep the 10 apps with the most
positive_ur_df['App'].value_counts().nlargest(10).plot.barh(figsize=(10,8),color='salmon').invert_yaxis()
plt.title("Top 10 positive review apps")
plt.xlabel('Total number of positive reviews')
plt.ylabel('App')

##### 1. Why did you pick the specific chart?

Bar plots are effective for comparing values across categories. Each category is represented by a bar whose length corresponds to the value of the variable, which also makes bar plots useful for showing how often each category appears in a dataset. That is why I chose this chart.

##### 2. What is/are the insight(s) found from the chart?

Bar plots offer valuable insights by visually representing data in a clear and concise manner. These visualizations facilitate the comparison of different categories, allowing for the identification of trends, differences, and outliers within the dataset. Bar plots are particularly effective for showcasing the distribution of data among categories, helping to reveal patterns and trends over time or across subgroups. They serve as powerful tools for communicating complex information to both technical and non-technical audiences, making it easy to understand and interpret the relative magnitudes of various data points. Whether assessing sales performance, analyzing survey responses, or benchmarking against goals, bar plots provide a versatile and intuitive means of deriving actionable insights for informed decision-making in business contexts.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Barplots can have a positive business impact by providing a clear and intuitive visualization of data, enabling effective decision-making and communication within an organization. These visual representations are particularly valuable for summarizing and comparing categorical data, such as sales performance across different products, regions, or time periods. Business leaders can quickly grasp trends, identify successful strategies, and pinpoint areas for improvement. Barplots enhance data-driven decision-making by making complex information accessible to a broader audience, facilitating discussions, and fostering a shared understanding of key business metrics. Additionally, the visual impact of barplots can be instrumental in presentations and reports, helping stakeholders absorb and retain crucial information, ultimately contributing to more informed and strategic business decisions.

#### Chart - 13 Apps with the highest number of negative reviews.

In [None]:
# Chart - 13 visualization code
negative_ur_df=df2[df2['Sentiment']=='Negative']
# Count negative reviews per app and keep the 10 apps with the most
negative_ur_df['App'].value_counts().nlargest(10).plot.barh(figsize=(15,8),color='tomato').invert_yaxis()
plt.title("Top 10 negative review apps")
plt.xlabel('Total number of negative reviews')
plt.ylabel('App')

##### 1. Why did you pick the specific chart?

Bar plots are effective for comparing values across categories. Each category is represented by a bar whose length corresponds to the value of the variable, which also makes bar plots useful for showing how often each category appears in a dataset. That is why I chose this chart.

##### 2. What is/are the insight(s) found from the chart?

Bar plots offer valuable insights by visually representing data in a clear and concise manner. These visualizations facilitate the comparison of different categories, allowing for the identification of trends, differences, and outliers within the dataset. Bar plots are particularly effective for showcasing the distribution of data among categories, helping to reveal patterns and trends over time or across subgroups. They serve as powerful tools for communicating complex information to both technical and non-technical audiences, making it easy to understand and interpret the relative magnitudes of various data points. Whether assessing sales performance, analyzing survey responses, or benchmarking against goals, bar plots provide a versatile and intuitive means of deriving actionable insights for informed decision-making in business contexts.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Barplots can have a positive business impact by providing a clear and intuitive visualization of data, enabling effective decision-making and communication within an organization. These visual representations are particularly valuable for summarizing and comparing categorical data, such as sales performance across different products, regions, or time periods. Business leaders can quickly grasp trends, identify successful strategies, and pinpoint areas for improvement. Barplots enhance data-driven decision-making by making complex information accessible to a broader audience, facilitating discussions, and fostering a shared understanding of key business metrics. Additionally, the visual impact of barplots can be instrumental in presentations and reports, helping stakeholders absorb and retain crucial information, ultimately contributing to more informed and strategic business decisions.
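
Raw review counts favor apps that simply have many reviews. As a complementary view, the share of negative reviews per app can be computed as well. The sketch below uses a small hypothetical `reviews` frame (app names and sentiments are invented for illustration); it assumes `App` and `Sentiment` columns like those in `df2`.

```python
import pandas as pd

# Hypothetical reviews standing in for `df2`; not the project data.
reviews = pd.DataFrame({
    "App": ["A", "A", "A", "A", "B", "B", "C", "C", "C"],
    "Sentiment": ["Negative", "Positive", "Positive", "Positive",
                  "Negative", "Negative",
                  "Positive", "Neutral", "Negative"],
})

# Row-normalized cross-tabulation: each row (app) sums to 1, so the
# columns hold the share of each sentiment within that app's reviews.
sentiment_share = pd.crosstab(reviews["App"], reviews["Sentiment"], normalize="index")

# Apps ordered by their share of negative reviews, highest first.
negative_share = sentiment_share["Negative"].sort_values(ascending=False)
print(negative_share)
```

An app with few reviews can top this ranking, so in practice it may make sense to filter out apps below some minimum review count first.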

## **5. Solution to Business Objective**

#### What do you suggest the client to achieve Business Objective ?
Explain Briefly.



In my opinion, the following steps would have a positive impact:

1. Monitor Analytics:

Use analytics tools to track user behavior within the app. Understand where users drop off and make improvements to enhance the user experience.

2. Encourage Positive Reviews:

Encourage satisfied users to leave positive reviews. Positive reviews not only boost your app's credibility but also improve its visibility in the app store.

3. Utilize Social Media:

Leverage social media platforms to create awareness about your app. Share updates, features, and engage with your audience on platforms like Twitter, Facebook, and Instagram.

4. Run Targeted Marketing Campaigns:

Use digital marketing channels such as Google Ads, Facebook Ads, or other platforms to target specific user demographics and drive installations.

5. Focus on User Retention:

Regularly update your app with new features and improvements to keep users engaged.
Implement push notifications wisely to re-engage users without being intrusive.

6. Collaborate with Influencers:

Nowadays influencers attract a lot of attention from users, so partner with influencers or bloggers who can reach your target audience and promote your app.




# **Conclusion**

1. Rating

Most of the apps have a rating between 4 and 5.

The most common rating is 4.3.

App categories have an average rating above 4.

2. Size

Most of the applications in the dataset are small in size.

3. Installs

The majority of the apps fall into three categories: Family, Game, and Tools.

Although most apps in the Google Play Store fall under Family, Game, and Tools, installs tell a different story: the most-installed apps are in Game, Communication, Productivity, and Social.

Subway Surfers, Facebook, Messenger, and Google Drive are the most installed apps.

4. Type (Free/Paid)

About 92% of apps are free and 8% are paid.

The category ‘Family’ has the highest number of paid apps.

Free apps are installed more than paid apps.

The app “I’m Rich — Trump Edition” from the category ‘Lifestyle’ is the most expensive app, priced at $400.
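
A free/paid split like the one above can be computed with `value_counts(normalize=True)`. This is a minimal sketch on a hypothetical ten-row sample (not the project data), assuming a `Type` column holding `'Free'` and `'Paid'`.

```python
import pandas as pd

# Hypothetical sample standing in for the Play Store frame `df`.
sample = pd.DataFrame({"Type": ["Free"] * 9 + ["Paid"] * 1})

# normalize=True returns proportions instead of counts;
# multiplying by 100 turns them into percentages.
type_share = sample["Type"].value_counts(normalize=True) * 100

print(type_share.round(1))
```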

5. Content Rating

Apps with the ‘Everyone’ content rating have the most installs, while ‘Unrated’ and ‘Adults only 18+’ apps have the fewest.

6. Reviews

The number of installs is positively correlated with the number of reviews, with a correlation of 0.64.
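
A correlation between installs and reviews can be computed with pandas. The sketch below uses invented numbers purely for illustration and assumes numeric `Installs` and `Reviews` columns; `Series.corr` and `DataFrame.corr` both default to the Pearson coefficient.

```python
import pandas as pd

# Hypothetical values standing in for the cleaned Play Store frame `df`.
sample = pd.DataFrame({
    "Installs": [1_000, 10_000, 100_000, 1_000_000, 5_000_000],
    "Reviews": [12, 150, 900, 20_000, 45_000],
})

# Pearson correlation between the two columns.
pearson_r = sample["Installs"].corr(sample["Reviews"])

# The same value appears off the diagonal of the correlation matrix.
corr_matrix = sample[["Installs", "Reviews"]].corr()

print(round(pearson_r, 2))
print(corr_matrix)
```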

7. Sentiment

Most of the reviews have Positive sentiment, while Negative and Neutral reviews are far fewer.

8. Sentiment Polarity / Sentiment Subjectivity

The reviews show a wide range of subjectivity, and most of them fall within [-0.50, 0.75] on the polarity scale, implying that extremely negative or extremely positive sentiments are rare. Most of the reviews show a mid-range of negative and positive sentiment.

Sentiment subjectivity is not always proportional to sentiment polarity, but in most cases it shows proportional behavior when the variance is very high or very low.
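
The share of reviews inside a polarity interval can be checked with `Series.between`. This is a minimal sketch with invented polarity values; the column name `Sentiment_Polarity` is an assumption about the reviews frame and is not shown in this section.

```python
import pandas as pd

# Hypothetical polarity scores standing in for the reviews frame `df2`.
sample = pd.DataFrame({
    "Sentiment_Polarity": [-0.8, -0.4, 0.0, 0.2, 0.5, 0.7, 0.9, 1.0, 0.1, -0.1],
})

# Drop missing scores, then flag reviews inside the closed interval [-0.50, 0.75].
polarity = sample["Sentiment_Polarity"].dropna()
in_range = polarity.between(-0.50, 0.75)

# The mean of a boolean Series is the fraction of True values.
share_in_range = in_range.mean()
print(share_in_range)
```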

### ***Hurrah! You have successfully completed your EDA Capstone Project !!!***