# **Project Name**    -  **PLAY STORE DATA ANALYSIS**






##### **Project Type**    - EDA
##### **Contribution**    - Individual


# **Project Summary-**

The Google Play Store hosts millions of apps across various categories, serving diverse user needs. Understanding the dynamics of app performance and user engagement is crucial for developers, marketers, and business strategists. This project aims to analyze a dataset of 10,841 apps from the Google Play Store to uncover insights into factors affecting app success, user engagement, and market trends.

This project will leverage the Google Play Store dataset to provide comprehensive insights into the app ecosystem. The findings will help developers in enhancing app performance, optimizing revenue, and improving user satisfaction. By understanding market trends and user preferences, businesses can make informed decisions and stay competitive in the dynamic app market.

# **GitHub Link-**

https://github.com/Bhola90/Data-Science-Projects

# **Problem Statement**


**BUSINESS PROBLEM OVERVIEW**

For a Play Store data analysis project, I could consider various problem statements depending on the area of focus. Here are a few:

Analyze the factors affecting app ratings and reviews on the Google Play Store. Identify trends and patterns to provide actionable insights for app developers to improve their app quality and user satisfaction.

Segment the Google Play Store app market based on categories, user demographics, and download patterns to identify profitable niches and opportunities for new app development.

Develop a predictive model to forecast the success of an app on the Google Play Store based on its initial performance metrics such as downloads, ratings, and user reviews.

**These problem statements can lead to valuable insights and recommendations for app developers, marketers, and business strategists.**

#### **Define Your Business Objective?**

***

1.   To enhance app quality and user satisfaction.
2.   To identify profitable market segments.
3.   To forecast the potential success of an app.










***

# **General Guidelines** : -  

1.   Well-structured, formatted, and commented code is required.
2.   Exception Handling, Production Grade Code & Deployment Ready Code will be a plus. Those students will be awarded some additional credits.
     
     The additional credits will have advantages over other students during Star Student selection.
       
             [ Note: - Deployment Ready Code is defined as, the whole .ipynb notebook should be executable in one go
                       without a single error logged. ]

3.   Each and every logic should have proper comments.
4. You may add as many number of charts you want. Make Sure for each and every chart the following format should be answered.
        

```
# Chart visualization code
```
            

*   Why did you pick the specific chart?
*   What is/are the insight(s) found from the chart?
* Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.


5. You have to create at least 20 logical & meaningful charts having important insights.

[ Hints : - Do the Visualization in a structured way while following "UBM" Rule.

U - Univariate Analysis,

B - Bivariate Analysis (Numerical - Categorical, Numerical - Numerical, Categorical - Categorical)

M - Multivariate Analysis
 ]







# ***Let's Begin !***

## ***1. Know Your Data***

### Import Libraries

In [None]:
# Import Libraries
import numpy as np
import pandas as pd
from datetime import datetime
from datetime import timedelta
from datetime import date
import seaborn as sns
import matplotlib.pyplot as plt
from matplotlib import rcParams


### Dataset Loading

In [None]:
from google.colab import drive
drive.mount('/content/drive')

In [None]:
# Importing the dataset
dataset = pd.read_csv('/content/drive/MyDrive/Colab Notebooks/Play Store Data.csv')

In [None]:
dataset_original = dataset.copy()
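
The guidelines above ask for exception handling, so the loading step could be wrapped in a small guarded helper. This is only a sketch: `load_dataset` is a hypothetical name that does not appear in the original notebook, and it assumes the file is an ordinary CSV that `pd.read_csv` can parse.

```python
from pathlib import Path

import pandas as pd


def load_dataset(csv_path):
    """Return the CSV at csv_path as a DataFrame, or raise a clear error."""
    path = Path(csv_path)
    # Fail early with a readable message if the file is missing
    if not path.is_file():
        raise FileNotFoundError(f"Dataset not found at: {path}")
    try:
        return pd.read_csv(path)
    except pd.errors.EmptyDataError as exc:
        # pandas raises EmptyDataError when the file has no columns to parse
        raise ValueError(f"Dataset at {path} is empty") from exc
```

In this notebook the call would look like `dataset = load_dataset('/content/drive/MyDrive/Colab Notebooks/Play Store Data.csv')`.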

### Dataset First View

In [None]:
# Dataset First Look
dataset.head()

### Dataset Rows & Columns count

In [None]:
# Dataset Rows & Columns
dataset.shape

### Dataset Information

In [None]:
# Dataset Info
dataset.info()

#### Duplicate Values

In [None]:
# Dataset Duplicate Value Count
len(dataset[dataset.duplicated()])
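
Counting duplicates is only the first step; if they stay in the data, every later `sum()` per app counts the same app several times. The sketch below shows two possible ways to remove them on a tiny made-up frame. The `sample` data and the choice to keep the row with the most reviews are assumptions for illustration, not something the original notebook does.

```python
import pandas as pd

# Tiny illustrative frame; in the notebook this would be `dataset`
sample = pd.DataFrame({
    "App": ["Alpha", "Alpha", "Beta", "Beta"],
    "Reviews": [100, 100, 50, 80],
    "Installs": [1000, 1000, 500, 500],
})

# Option 1: drop rows that are identical in every column
exact_deduped = sample.drop_duplicates()

# Option 2: keep one row per app, preferring the row with the most reviews
one_per_app = (
    sample.sort_values("Reviews", ascending=False)
          .drop_duplicates(subset="App", keep="first")
          .sort_values("App")
          .reset_index(drop=True)
)

print(exact_deduped)
print(one_per_app)
```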

#### Missing Values/Null Values

In [None]:
# Missing Values/Null Values Count
print(dataset.isnull().sum())
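
Raw null counts are easier to judge next to the share of rows they represent. A minimal sketch on made-up data follows; the `sample` frame and the `missing_summary` name are illustrative assumptions.

```python
import numpy as np
import pandas as pd

# Illustrative data; in the notebook this would be `dataset`
sample = pd.DataFrame({
    "Rating": [4.1, np.nan, 3.9, np.nan],
    "Type": ["Free", "Paid", None, "Free"],
    "App": ["A", "B", "C", "D"],
})

# Count of missing values and their percentage of all rows, per column
missing_summary = pd.DataFrame({
    "missing": sample.isnull().sum(),
    "percent": (sample.isnull().mean() * 100).round(2),
}).sort_values("missing", ascending=False)

print(missing_summary)
```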

### What did you know about your dataset?

The dataset comes from the Google Play Store, and the goal is to analyse it and extract the insights behind it.

This dataset offers information that can be used to perform various types of analyses, such as understanding factors affecting app ratings, identifying market trends, and analyzing user engagement.

## ***2. Understanding Your Variables***

In [None]:
# Dataset Columns
dataset.columns

In [None]:
# Dataset Describe
dataset.describe()

### Variables Description

* **App:** Name of the application

* **Category:** The kind of application

* **Rating:** Rating given by users (out of 5)

* **Reviews:** Number of user reviews

* **Size:** The size of the app

* **Installs:** Number of app installs

* **Type:** Free or Paid app

* **Price:** Price of the application

* **Content Rating:** Age group the app is intended for

* **Genres:** The classification of the app

* **Last Updated:** Date of the last update

* **Current Version:** Latest version of the application

* **Android Version:** Minimum Android version required



### Check Unique Values for each variable.

In [None]:
# Number of unique values in each column
dataset.nunique()

## 3. ***Data Wrangling***

### Data Wrangling Codes

In [None]:
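# Fill missing ratings with 0.0 as a numeric placeholder (0 means "no rating", not a real score)
dataset['Rating'] = dataset['Rating'].fillna(0.0)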

In [None]:
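# Remove ',', '+', '$' and other non-numeric characters (regex=False keeps '+', '$' and '.' literal)

dataset['Installs'] = dataset['Installs'].str.replace(',', '', regex=False).str.replace('+', '', regex=False).str.replace('Free', '0', regex=False)
dataset['Reviews'] = dataset['Reviews'].str.replace(r'\D', '', regex=True)
dataset['Price'] = dataset['Price'].str.replace('$', '', regex=False).str.replace('Everyone', '0', regex=False)
# Size: 'Varies with device' becomes 0; values ending in 'k' are kilobytes and are converted to MB
size_text = dataset['Size'].str.replace(',', '', regex=False).str.replace('+', '', regex=False).str.replace('Varies with device', '0', regex=False)
size_is_kb = size_text.str.endswith('k', na=False)
size_value = pd.to_numeric(size_text.str.rstrip('Mk'), errors='coerce')
dataset['Size'] = size_value.where(~size_is_kb, size_value / 1024)
dataset['Category'] = dataset['Category'].str.replace('1.9', 'unknown', regex=False)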

In [None]:
# Convert 'object' column to 'integer' or 'Float'

dataset['Installs'] = dataset['Installs'].astype(int)
dataset['Reviews'] = dataset['Reviews'].astype(int)
dataset['Price'] = dataset['Price'].astype(float)
dataset['Rating'] = dataset['Rating'].astype(float)
dataset['Size'] = dataset['Size'].astype(float)
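
Because the `Size` column mixes megabyte and kilobyte suffixes, a unit-aware parser keeps every value on one scale. The function below is a hedged sketch: `parse_size_mb` is a hypothetical helper, and it assumes the only suffixes are `M` and `k` and that 1 MB equals 1024 kB.

```python
def parse_size_mb(raw):
    """Convert a size string such as '19M' or '512k' to megabytes.

    Returns None when the size is unknown (e.g. 'Varies with device').
    Assumption: 'M' means megabytes, 'k' means kilobytes, 1 MB = 1024 kB.
    """
    text = str(raw).strip().replace(",", "").rstrip("+")
    unit_factors = {"M": 1.0, "k": 1 / 1024}
    factor = unit_factors.get(text[-1:])
    if factor is None:
        return None
    try:
        return float(text[:-1]) * factor
    except ValueError:
        # The part before the unit was not a number
        return None


for example in ["19M", "512k", "Varies with device"]:
    print(example, "->", parse_size_mb(example))
```

Applied to the raw string column, this could be used as `dataset['Size'].map(parse_size_mb)`, which leaves unknown sizes as missing values instead of 0.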

In [None]:
# Convert string format to datetime

dataset['Last Updated'] = dataset['Last Updated'].str.replace(' ','/').str.replace(',','')
dataset['Last Updated'] = pd.to_datetime(dataset['Last Updated'], format='mixed', errors='coerce', dayfirst=True)
# Unparseable dates stay as NaT; filling them with 0 would turn the column into a mixed object dtype


In [None]:
# Drop the malformed row 10472 in place (with inplace=True, drop() returns None, so nothing is assigned)
dataset.drop(index=10472, inplace=True)

### What all manipulations have you done and insights you found?

In [None]:
# Which app has the highest number of installs?
Highest_installs = pd.DataFrame(dataset.groupby(['App','Category','Genres'])['Installs'].sum())
top_10_installs = Highest_installs.sort_values('Installs', ascending = False).head(10)
top_10_installs

In [None]:
# Which categories have the highest number of reviews?
Highest_reviews = pd.DataFrame(dataset.groupby('Category')['Reviews'].sum())
Top_reviews = Highest_reviews.sort_values(by = 'Reviews', ascending = False).head(5)
Top_reviews

In [None]:
# Average rating of the apps by category (missing ratings were filled with 0, which lowers these means)
Avg_ratings = pd.DataFrame(dataset.groupby('Category')['Rating'].mean())
Avg_Ratings = Avg_ratings.sort_values(by='Rating', ascending = False)
Avg_Ratings


In [None]:
# Which category has the most Free or Paid apps?
Type_app = dataset.groupby(['Category', 'Type']).size().unstack().fillna(0)
Type_app

In [None]:
# Top 20 prices of the Apps by Category
Top_prices = pd.DataFrame(dataset.groupby(['Category'])['Price'].max())
Top_prices.sort_values(by='Price', ascending = False).head(20)


In [None]:
# Calculate the app sizes
Highest_App_size = pd.DataFrame(dataset.groupby('App')['Size'].max())
Highest_App_size = Highest_App_size.sort_values(by='Size', ascending=False).head(10)
Highest_App_size

In [None]:
# Top 10 rows by app size
dataset_size = dataset.dropna(subset=['Size'])
top_10_sizes = dataset_size.sort_values(by='Size', ascending=False).head(10)
print(top_10_sizes[['App', 'Size']])

In [None]:
# Which Age Group is App for? Count of Content Rating (Audience/Age Group)
content_rating_counts = dataset['Content Rating'].value_counts()
content_rating_counts

In [None]:
# Which apps are the largest in size, and how many installs do they have?
installs_vs_size = pd.DataFrame(dataset.groupby(['App','Installs'])['Size'].sum())
installs_vs_size = installs_vs_size.sort_values(by='Size', ascending=False).head(10)
installs_vs_size

## ***4. Data Visualization, Storytelling & Experimenting with charts : Understand the relationships between variables***

#### Chart - 1 - Horizontal Bar Chart: Apps with Highest Number of Installs (Multivariate)

In [None]:
# DataFrame.plot creates its own figure, so the size is passed via figsize
top_10_installs.plot(kind = 'barh', figsize=(14, 8), color='yellow', edgecolor='black')
plt.xlabel('Number of Installs')
plt.ylabel('App-Category-Genres')
plt.title('Top 10 Apps with Highest Number of Installs')
plt.show()

##### 1. Why did you pick the specific chart?


A bar chart is used to show a distribution of data points or to compare metric values across different groups in the data. It also shows which groups are the highest or most common and how the other groups compare against them.

I therefore used a horizontal bar chart to show the total number of installs for each combination of app, category and genre.

##### 2. What is/are the insight(s) found from the chart?

Here I am looking for the app, together with its category and genre, that has the highest number of installs.
The result is Subway Surfers (App), Game (Category), Arcade (Genre), with 6,000,000,000 installs, the highest in this segment.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Yes, this insight helps in creating a positive business impact. Within the app ecosystem, it shows that the gaming industry ranks at the top. Many investors and businesses are entering this industry with creative ideas and addictive, viral games to reach a large number of people around the world.

#### Chart - 2 - Bar Chart: Number of Apps in Each Category (Univariate with Categorical)

In [None]:
category_counts = dataset['Category'].value_counts()
plt.figure(figsize=(12, 6))
category_counts.plot(kind='bar')
plt.title('Number of Apps in Each Category')
plt.xlabel('Category')
plt.ylabel('Number of Apps')
plt.xticks(rotation=90)
plt.show()

##### 1. Why did you pick the specific chart?


I picked a bar chart because it is one of the most effective ways to visualize the distribution of categorical data.


**Clear Comparison:**
It makes it easy to compare which categories have more or fewer apps.

**Visual Impact:**
It shows at a glance which categories dominate and which are underrepresented in the dataset.

**Scalability:**
Even with many categories, a bar chart remains readable.


##### 2. What is/are the insight(s) found from the chart?

From the above chart, categories such as **[ Family, Game, Tools, Medical and Business ]** have the largest number of apps in the Play Store data.


##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Yes, the insights gained from this bar plot can help create a positive business impact by informing strategic decisions in market analysis, product development, resource allocation, and investment. However, for a more comprehensive understanding, it should be combined with other analyses, such as market trends, customer preferences, and competitor actions.

#### Chart - 3 - Histogram: Show the distribution of App Rating (Univariate with Numerical)

In [None]:
plt.figure(figsize=(6, 4))
# Keep only real ratings; the 0 placeholder used for missing values would add a false spike at 0
Rating = dataset[dataset['Rating'] > 0]['Rating']
Rating.plot(kind = 'hist', color = 'green', bins = 10)
plt.title('Distribution of App Ratings')
plt.xlabel('Rating')
plt.ylabel('Frequency')
plt.show()

##### 1. Why did you pick the specific chart?

Choosing a histogram to visualize the distribution of app ratings is an effective choice for several reasons:

**Visualize Frequency:** A histogram allows me to see how frequently different rating values occur across the dataset.

**Identifying Patterns:** I can easily spot if there are certain ratings that are more common.

**Set Benchmarks:** If I have a target rating in mind (e.g. an average rating of 4.0), the histogram helps in visualizing how many apps meet that target.

##### 2. What is/are the insight(s) found from the chart?

Most apps have a rating between about 4.0 and 4.5.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Yes, the insights gained from this histogram of app ratings can contribute to a positive business impact.

*   It provides a clear overview of user satisfaction.

*   It tells whether to invest more in apps that are already performing well or to focus on improving lower-rated apps.

*   It guides businesses on maintaining and improving the elements that users appreciate to retain users.






#### Chart - 4 - Box Plot: Distribution of App Reviews (Univariate with Numerical variable)


In [None]:
plt.figure(figsize=(4, 10))
dataset['Reviews'].plot(kind = 'box')
plt.title('Distribution of App Reviews')
plt.ylabel('Number of Reviews')  # the box is vertical, so the values are on the y-axis
plt.show()

In [None]:
# Box plots comparing Installs with Reviews
plt.figure(figsize=(4, 10))
dataset.boxplot(column=['Reviews', 'Installs'])
plt.title('Box Plots of Various App Metrics')
plt.ylabel('Values')
plt.show()

##### 1. Why did you pick the specific chart?

***I chose a box plot for the following reasons:***

**Identifying Outliers:** Box plots help in identifying outliers in the dataset.

**Comparative Analysis:** If we have multiple groups or categories, box plots allow us to compare the distributions of these groups side by side.

##### 2. What is/are the insight(s) found from the chart?

**Outlier Detections:**

**High Installs with Low Reviews:** Outliers with high installs but few reviews could indicate that users are not engaging with the app after installation. This insight can prompt a review of the post-installation user experience.
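
The outliers a box plot draws can also be listed explicitly with the usual 1.5 × IQR rule. The sketch below uses made-up review counts; `iqr_outlier_mask` is a hypothetical helper name and `k=1.5` is the conventional whisker factor, taken here as an assumption.

```python
import pandas as pd


def iqr_outlier_mask(values, k=1.5):
    """Return a boolean mask marking values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1 = values.quantile(0.25)
    q3 = values.quantile(0.75)
    iqr = q3 - q1
    return (values < q1 - k * iqr) | (values > q3 + k * iqr)


# Illustrative review counts; in the notebook this would be dataset['Reviews']
reviews = pd.Series([10, 12, 15, 18, 20, 22, 25, 5000])
mask = iqr_outlier_mask(reviews)
outliers = reviews[mask]

print(outliers)
```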






##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

The insights from these box plots do not point to a clear positive business impact on their own.

**Negative or No Correlation:** A lack of correlation, or a negative correlation, between installs and reviews can indicate issues with app quality, marketing strategy, or post-installation user engagement.

**High Reviews with Low Installs:** Where an app has many reviews but few installs, this might suggest strong satisfaction among a small user base, indicating potential for growth.

**Negative Growth**

High installs with few reviews can also indicate that users are uninstalling the app. This can lead to negative growth if not addressed, as it suggests the app fails to meet user expectations.

Ignoring negative trends can result in user attrition and harm the app's reputation, leading to negative growth.


#### Chart - 5 - Scatter Plot: Relationship Between Reviews and Rating (Bivariate with Numerical and Numerical)

In [None]:
plt.figure(figsize=(8, 6))
plt.scatter(dataset['Reviews'], dataset['Rating'], alpha=0.5)
plt.title('Relationship Between Reviews and Rating')
plt.xlabel('Number of Reviews')
plt.ylabel('Rating')
plt.xscale('log')
plt.show()

##### 1. Why did you pick the specific chart?

I picked this chart because, first, it shows the relationship between two numerical variables (Reviews and Rating).

Secondly, it gives better insight into how ratings are distributed across review counts.

##### 2. What is/are the insight(s) found from the chart?

The darker region shows that most apps are rated above 4 stars out of 5.

Another insight is that the points thin out as the number of reviews grows: few apps have very large review counts.

Most rated apps have fewer than 1,000 reviews.

##### 3. Will the gained insights help creating a positive business impact?
**Resource Allocation:** If the number of reviews and the ratings were strongly correlated, investing resources in acquiring more reviews might be a good strategy to improve overall ratings and customer satisfaction.

**Quality Control:** If items with very few reviews tend to have extreme ratings (either very high or very low), it might indicate that these ratings are less reliable or that these items need more review data to stabilize their ratings.

Are there any insights that lead to negative growth? Justify with specific reason.

*The scatter plot provides valuable insights, but the pattern also depends on user behaviour. Some people give only ratings, some give only reviews, and very few give both. Its impact on business decisions will depend on how we interpret the data and use these insights to make strategic decisions.*

#### Chart - 6 - Pie Chart: Proportion of Apps in Each Category (Univariate with Categorical)

In [None]:
plt.figure(figsize=(6, 6))
Top_categories = dataset['Category'].value_counts().head(10)
Top_categories.plot(kind='pie', autopct='%1.1f%%')
plt.title('Market Share of Apps (in %) in Top 10 Categories')
plt.ylabel('')
plt.show()

##### 1. Why did you pick the specific chart?

Pie charts represent proportional or relative data in a single chart. Each slice shows the percentage of the whole that a particular group accounts for.

So, I used a pie chart to show each category's percentage share of apps, with a differently colored slice for each category.


##### 2. What is/are the insight(s) found from the chart?

**Market Share of the App**

The Family category has the highest share of apps. Apps such as YouTube Kids, Candy Bomb, Roblox and many more are among the most popular FAMILY apps.
 ***


##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Yes, the insights found will help create a positive business impact. Investors and businesses can target this category to succeed in the market.

#### Chart - 7 - Line Chart: Trend of Average Ratings Over Time. (Bivariate with Numerical and Numerical)

In [None]:
Avg_ratings = dataset.groupby('Last Updated')['Rating'].mean()
plt.figure(figsize=(12, 6))
Avg_ratings.plot( marker='o', color='b')
plt.xlabel('Last Updated')
plt.ylabel('Average Rating')
plt.title('Average App Ratings Over Time')
plt.grid(True)
plt.show()

##### 1. Why did you pick the specific chart?

The line chart I’ve chosen is a good choice for visualizing the trend of average ratings over time.

Here's Why-

**Trends:** A line chart effectively displays trends over time, making it easy to see how average ratings have changed across different time periods.

**Continuity:** Since time is a continuous variable, a line chart helps in providing a clear view of any upward or downward trends.

**Data Points:** Plotting average ratings against time with markers (e.g., circles) allows you to see individual data points.


##### 2. What is/are the insight(s) found from the chart?


The trend in the chart shows the following:

After 2015 there is a large number of ratings, which suggests that many more apps came to the market after that period and that people became more serious about giving ratings. Those apps ultimately benefit from this feedback.

The observation is that users are more serious about giving ratings. This can be beneficial for developers who use this feedback to enhance their apps.
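
Grouping by the exact update date gives one point per day, which can make the line noisy. One possible refinement is to average ratings per month. The sketch below uses made-up dates and ratings, and the choice of monthly buckets (`'MS'`, month start) is an assumption rather than the notebook's method.

```python
import pandas as pd

# Illustrative data; in the notebook these would be the 'Last Updated' and 'Rating' columns
sample = pd.DataFrame({
    "Last Updated": pd.to_datetime(
        ["2018-01-03", "2018-01-20", "2018-02-11", "2018-02-25", "2018-03-02"]
    ),
    "Rating": [4.0, 4.4, 3.8, 4.2, 4.5],
})

# Average rating per calendar month, indexed by the first day of each month
monthly_avg = (
    sample.set_index("Last Updated")["Rating"]
          .resample("MS")
          .mean()
)

print(monthly_avg)
```

The resulting series can be plotted with `monthly_avg.plot(marker='o')` in the same way as the daily series.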

##### 3. Will the gained insights help creating a positive business impact?



The insights drawn from the trend, which shows an increase in ratings after 2015, can be useful for creating a positive business impact.


**User Feedback:** Use the increased volume of user feedback to make data-driven decisions.

**Highlight Ratings:** High ratings and positive reviews build credibility and attract more downloads.

**Quality over Quantity:** While the quantity of ratings is increasing, ensure that the focus remains on improving the quality of the app.

#### Chart - 8- Horizontal Bar Chart: Average Ratings by Category (Bivariate with Numerical and Categorical)

In [None]:
plt.figure(figsize=(12, 10))
Avg_Ratings['Rating'].plot(kind='barh', color='skyblue', edgecolor='black')
plt.title('Average Ratings by Category')
plt.xlabel('Average Rating')
plt.ylabel('Category')
plt.gca().invert_yaxis()  # Largest rating on top
plt.show()

##### 1. Why did you pick the specific chart?

The horizontal bar chart is specifically chosen for its ability to clearly and effectively present comparative data with potentially long category names. Its design facilitates easy reading, quick comparison, and immediate insights into the average ratings of different categories.

**Ordered Presentation:** Sorting the bars in descending order and inverting the y-axis ensures that the highest average ratings are at the top.

**Visual Impact:** The visual difference between the lengths of the bars clearly shows the variation in average ratings between categories. It makes it easy to identify which categories are rated higher or lower.



##### 2. What is/are the insight(s) found from the chart?

Top Average Ratings by Category


1. EVENTS
2. EDUCATION
3. ART AND DESIGN
4. BOOKS AND REFERENCES
5. PERSONALISATION










##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

***Positive Business Impact***

**Focus on Top Rated Category:** Businesses can allocate more resources (e.g., marketing, development, support) to these categories to further capitalize on their popularity.

**User Feedback Analysis:** High average ratings typically come with positive user feedback. Analyzing this feedback can reveal what users love about these apps, guiding future product improvements and innovations.

***Possible Negative Impact on Business Growth***

**Customer Dissatisfaction:** Users of apps in lower-rated categories might feel neglected, leading to higher churn rates and negative reviews, further decreasing ratings and potentially harming the brand's overall reputation.

**Overinvestment in Already Successful Areas:** Investing too much in categories that are already performing well might not provide significant returns compared to investing in areas that need improvement. This can lead to diminishing returns and missed opportunities for growth in other areas.

#### Chart - 9 - Line Chart: Count of Apps by Age Group (Univariate with Categorical)



In [None]:
plt.figure(figsize=(10, 6))
content_rating_counts.plot(kind='line', marker = 's', linestyle = '-', color= 'maroon')
plt.title('Count of Apps by Content Rating')
plt.xlabel('Content Rating')
plt.ylabel('Number of Apps')
plt.xticks(rotation=45)
plt.show()

##### 1. Why did you pick the specific chart?

A line chart was selected for its ability to clearly represent ordered categorical data and trends across those categories, offering a smooth and easy visual that helps in understanding the distribution of apps by content rating.

**Sequential Data:** Even though content ratings are categorical, they often follow a natural order (e.g., from ratings suitable for all audiences to those suitable only for adults).

**Markers:** The use of markers (squares) on the line helps highlight individual data points, making it clear where the counts increase or decrease. This can help identify which content ratings have the most or fewest apps.

##### 2. What is/are the insight(s) found from the chart?

The largest number of apps falls under the **Everyone** content rating.
Approx. **9000 apps** target the Everyone segment, which is **80.4%** of the entire app count,
and a very limited number of apps are in the Adults only 18+ segment.
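
A percentage share like the one quoted above can be computed directly instead of being read off the chart. The sketch below uses a small made-up series; the labels and counts are illustrative only and are not the dataset's real figures.

```python
import pandas as pd

# Illustrative labels; in the notebook this would be dataset['Content Rating']
content_rating = pd.Series(
    ["Everyone"] * 8 + ["Teen", "Mature 17+"], name="Content Rating"
)

# normalize=True returns fractions of the total instead of raw counts
share_pct = (content_rating.value_counts(normalize=True) * 100).round(1)

print(share_pct)
```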

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

***Positive Business Impacts are:***

**Expanding Portfolio:** Investors and marketers who are concentrated in a specific content rating can consider diversifying by developing apps for different audience segments.

**Audience Behavior:** Combining this with user behaviour data helps to understand how different content ratings affect app downloads, user retention, and in-app purchases.

**Marketing Focus:** If the data shows a large number of apps in one content rating segment (Everyone), marketers can shift their focus to differentiating their app within that space.

#### Chart - 10 - Horizontal Bar Chart: Highest Apps by Sizes (Bivariate with Categorical and Numerical)





In [None]:
plt.figure(figsize=(10, 8))
Highest_App_size['Size'].plot(kind='barh', color='pink', edgecolor='red')
plt.title('Top 10 Apps by Size')
plt.xlabel('File Size (MB)')
plt.ylabel('App')
plt.gca().invert_yaxis()  # Invert the y-axis to have the largest app on top
plt.show()

##### 1. Why did you pick the specific chart?


The horizontal bar chart (barh) was chosen for this visualization for several reasons:

**Readable:** Horizontal bar charts are easier to read, especially when dealing with long category names, such as app names.

**Ranking:** Inverting the y-axis (with invert_yaxis()) places the largest app at the top, making it easier to quickly identify the largest apps.

**Comparison:** Bar charts make it easy to compare the sizes of different apps, as the lengths of the bars represent the values directly.

This is an excellent choice for displaying the sizes of the top 10 apps

##### 2. What is/are the insight(s) found from the chart?

The **World Search Tab 1 FR** app has the largest size in the Play Store data compared to all other apps.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

***Positive Business Impact:***

In my view, reducing the size of these apps can improve downloads, save storage space on users' devices, and potentially increase user satisfaction and retention.

**Marketing strategy:** If an app is significantly smaller than competitors' apps but offers similar features, businesses can use this as a marketing strategy to attract users with limited device storage or slower internet connections.

***Negative Growth Reasons:***

**User Frustration:** If the apps are too large, users with limited storage space or slower internet connections might avoid downloading them, leading to lower download rates.

**Performance Issues:** Large app sizes can sometimes indicate inefficient resource usage, which might lead to performance issues, higher maintenance costs, and potential user dissatisfaction.

#### Chart - 11 - Bar Chart: Highest Number of Reviews (Bivariate with Numerical and Categorical)

In [None]:
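# DataFrame.plot creates its own figure, so the size is passed via figsize; sorting puts the top categories first
Highest_reviews.sort_values(by='Reviews', ascending=False).plot(kind='bar', color='r', figsize=(10, 6), legend=False)
plt.title('Categories with the Highest Number of Reviews')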
plt.xlabel('Category')
plt.ylabel('Number of Reviews')
plt.show()

##### 1. Why did you pick the specific chart?

A bar chart is an effective choice for displaying the number of reviews across categories because it offers


*   Clear and Direct comparisons
*   Quantitative Analysis
*   Easy to interpret
*   Highlights key insights






##### 2. What is/are the insight(s) found from the chart?

**TOP REVIEWS BY CATEGORY ARE:**

*   GAME
*   COMMUNICATION
*   SOCIAL
*   FAMILY
*   TOOLS
*   PHOTOGRAPHY

##### 3. Will the gained insights help creating a positive business impact?

The insights identified are:

**Users Interest:** The chart shows where user interest is most concentrated.

**Market Opportunities:** Categories with significantly more reviews present opportunities for further development, marketing efforts, or monetization.

#### Chart - 12 - Grouped Bar Chart: Most Free or Paid Apps by Category (Bivariate with Categorical and Categorical)

In [None]:
Type_app.plot(kind='bar', figsize= (12,6), color=['lightgreen','red'], edgecolor='black')
plt.xlabel('Category')
plt.ylabel('Number of Apps')
plt.title('Distribution of Free vs Paid Apps by Category')
plt.legend(title='App Type')
plt.show()

##### 1. Why did you pick the specific chart?

The Grouped bar chart I've chosen to visualize the distribution of free vs. paid apps by category is a strong choice for several reasons-

**Categorical Comparison:** Bar charts are ideal for comparing quantities across different categories.

**Dual Representation:** By using different colors (light green for free apps and red for paid apps), the bar chart allows you to directly compare the distribution of free vs. paid apps within each category.

**Visual Clarity:** The distinct bars make it easy to see differences in the number of free and paid apps within each category.

**Color Combo:** The use of distinct colors helps in distinguishing between free and paid apps.

##### 2. What is/are the insight(s) found from the chart?

**TOP FREE APPS BY CATEGORY ARE:**

1. FAMILY
2. GAME
3. TOOLS

**TOP PAID APPS BY CATEGORY ARE:**

1. FAMILY
2. MEDICAL
3. GAME







##### 3. Will the gained insights help creating a positive business impact?

*Positive Business Impact:*

**Pricing Strategy:** If a category is dominated by free apps, you may consider offering a freemium model to attract users.

**Identifying Opportunity:** If paid apps are underrepresented in certain categories, there might be an opportunity to enter the market with a paid offering that provides unique value.
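
Raw counts favour large categories, so the share of paid apps within each category can be a useful companion view. The following is a sketch on made-up data using `pd.crosstab` with row-wise normalization; the `sample` frame and variable names are assumptions for illustration.

```python
import pandas as pd

# Illustrative data; in the notebook these would be the 'Category' and 'Type' columns
sample = pd.DataFrame({
    "Category": ["FAMILY", "FAMILY", "FAMILY", "FAMILY",
                 "MEDICAL", "MEDICAL", "GAME", "GAME"],
    "Type": ["Free", "Free", "Free", "Paid",
             "Free", "Paid", "Free", "Free"],
})

# normalize='index' makes each category's row sum to 1, so values become shares
type_share = pd.crosstab(sample["Category"], sample["Type"], normalize="index") * 100
paid_share = type_share["Paid"].sort_values(ascending=False)

print(paid_share)
```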

#### Chart - 13 - Heatmap: Correlation B/W different Variables (Multivariate)

In [None]:
correlation = dataset['Reviews'].corr(dataset['Rating'])
print(correlation)

The value is close to 0, which indicates almost no linear correlation between Reviews and Rating.
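
Pearson correlation only measures linear association, and review counts span several orders of magnitude, so it can help to compare it with a rank-based (Spearman) coefficient or with a log-transformed variable. The sketch below uses made-up numbers. Spearman is computed here as the Pearson correlation of the ranks, which is its definition, and the base-10 log transform is an assumption chosen for illustration.

```python
import numpy as np
import pandas as pd

# Illustrative values; in the notebook these would be the 'Reviews' and 'Rating' columns
sample = pd.DataFrame({
    "Reviews": [10, 100, 1_000, 10_000, 1_000_000],
    "Rating": [3.9, 4.0, 4.2, 4.3, 4.5],
})

# Pearson on the raw values (sensitive to the very large review counts)
pearson = sample["Reviews"].corr(sample["Rating"])

# Spearman: Pearson correlation computed on the ranks of both variables
spearman = sample["Reviews"].rank().corr(sample["Rating"].rank())

# Pearson after a log10 transform of the review counts
pearson_log = np.log10(sample["Reviews"]).corr(sample["Rating"])

print(pearson, spearman, pearson_log)
```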

In [None]:
dataset_corr = dataset[['Rating', 'Reviews', 'Size', 'Installs', 'Price']]

In [None]:
sns.heatmap(dataset_corr.corr(), cmap='coolwarm', annot=True)
plt.show()

##### 1. Why did you pick the specific chart?

A heatmap is a powerful tool for visualizing correlations between multiple variables because it provides both a visual and numerical representation of the relationships. Here’s why a heatmap is a suitable choice:

**Color Differences:** This makes it easy to quickly identify strong positive or negative relationships between variables.

**Clarity:** The grid-like structure of the heatmap makes it clear which variables are being compared.

**Detecting Patterns:** Strong colors identify strong relationships, and lighter colors indicate weak relationships between variables.

##### 2. What is/are the insight(s) found from the chart?


**Reviews** are directly correlated with **Installs**.

The value is **close to 1**, which indicates a **positive correlation**.

##### 3. Will the gained insights help creating a positive business impact?

Yes, the insights gained from a correlation heatmap can indeed help create a positive business impact. Here’s how:

**Product Improvements:** Understanding correlations between Rating and other variables (like Size or Price) can guide product development. For example, if there's a negative correlation between Size and Rating, efforts can be made to optimize app size to improve user satisfaction.

***The heatmap shows that Reviews and Installs are positively correlated.***

**Apps with more reviews also tend to have more installs, which can strongly support business growth.**

#### Chart - 14 - Horizontal Bar Chart: App Installs vs App Size (Multivariate)

In [None]:
installs_vs_size.plot(kind='barh', figsize=(12, 6), color='orange', edgecolor='black')
plt.xlabel('Size')
plt.ylabel('Number of Apps by installs')
plt.title('Installs vs Size')
plt.gca().invert_yaxis()
plt.show()

##### 1. Why did you pick the specific chart?

***Why I Chose a Horizontal Bar Chart***

**Easy Comparison:** Horizontal bar charts are particularly effective when we have long category names or when we need to display data with a wide range.

**Space Saver:** They make better use of horizontal space, which is often more abundant than vertical space on a screen.

**Highlighting Trends:** The horizontal layout can make it easier to spot trends and patterns.

##### 2. What is/are the insight(s) found from the chart?

The largest app by size and its install count:

**App- iBP Blood Pressure**

**Size- 1.4 GB**

**Installs- 10000**
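A lookup like this can be reproduced in a few lines. The sketch below assumes the `Size` column still holds raw strings such as `19M`, `201k`, or `Varies with device`; the helper `size_to_mb`, the column `Size_MB`, and the `toy` rows are hypothetical names and values used only for illustration.

```python
import pandas as pd


def size_to_mb(size):
    """Convert '19M' / '201k' style strings to megabytes; anything else becomes NaN."""
    text = str(size).strip()
    try:
        if text.endswith("M"):
            return float(text[:-1])
        if text.endswith("k"):
            return float(text[:-1]) / 1024  # kilobytes to megabytes
    except ValueError:
        pass  # e.g. a malformed value that happens to end in 'M' or 'k'
    return float("nan")


# Hypothetical rows, not taken from the project dataset.
toy = pd.DataFrame({
    "App": ["App A", "App B", "App C", "App D"],
    "Size": ["19M", "201k", "Varies with device", "98M"],
    "Installs": [10_000, 500_000, 1_000_000, 5_000],
})

toy["Size_MB"] = toy["Size"].map(size_to_mb)

# idxmax skips NaN, so apps without a fixed size are ignored.
largest = toy.loc[toy["Size_MB"].idxmax(), ["App", "Size_MB", "Installs"]]
print(largest)
```

Mapping unparseable sizes to NaN rather than 0 keeps them from distorting the minimum or the averages.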




##### 3. Will the gained insights help create a positive business impact?

***Positive Business Impact:***

**Optimal App Size:** Knowing which app sizes are most popular with users allows developers to optimize their app size, balancing features and performance with the download size that users prefer.

***Risks of Negative Growth:***

**Overlooking Insights:** Failing to consider the preferred app sizes and continuing to develop apps that are too large or too small could result in lower install rates and user dissatisfaction.
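One way to explore which app sizes users prefer is to group apps into size buckets and compare typical installs per bucket. The sketch below assumes `Size` has already been converted to megabytes; the bucket edges, labels, column names, and `toy` values are hypothetical choices for illustration rather than findings from the dataset.

```python
import pandas as pd

# Hypothetical data with size already expressed in megabytes.
toy = pd.DataFrame({
    "Size_MB": [3, 8, 25, 40, 75, 95],
    "Installs": [1_000, 5_000, 100_000, 500_000, 10_000, 50_000],
})

# Assumed bucket edges; intervals are right-inclusive: (0, 10], (10, 50], (50, 100].
bins = [0, 10, 50, 100]
labels = ["<=10 MB", "10-50 MB", "50-100 MB"]
toy["size_bucket"] = pd.cut(toy["Size_MB"], bins=bins, labels=labels)

# The median is less sensitive than the mean to a few apps with huge install counts.
median_installs = toy.groupby("size_bucket", observed=True)["Installs"].median()

print(median_installs)
print("Bucket with the highest median installs:", median_installs.idxmax())
```

The bucket edges strongly affect the result, so it is worth trying a few different splits before drawing conclusions.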

## **5. Solution to Business Objective**

#### What do you suggest the client do to achieve the Business Objective?
Explain briefly.

**Key recommendations for clients to achieve the business objectives:**

1. **Focus on High-Performing Categories:** Categories with high average ratings indicate strong user satisfaction. Clients should invest more in these categories to leverage their popularity and bring similar features and quality standards to lower-rated categories.

2. **User Reviews and Installations:** There is a strong positive correlation between the number of reviews and installs. Clients should encourage users to leave reviews through in-app prompts and incentives, since positive reviews attract new users.

3. **App Size Optimization:** App size and user ratings are negatively correlated. Clients should regularly update their apps to remove unnecessary files and improve loading times.

4. **Pricing Strategy:** The weak correlation between price and user ratings suggests that clients should focus on improving app quality and user experience rather than changing prices to influence ratings.

5. **User Feedback:** The volume of user feedback increased after 2015, which indicates that users are more willing to share their experiences. Clients should implement a system to regularly gather and analyze user feedback and use it to make continuous improvements.

# **Conclusion**

**Enhanced User Satisfaction:** By focusing on high-performing categories and optimizing app size and performance, user satisfaction and ratings can be improved.

**Strategic Marketing:** Leveraging positive reviews and user feedback in marketing campaigns can attract more users and increase installs.

**Efficient Resource Allocation:** Investing resources in areas that show strong positive correlations with key performance metrics ensures efficient use of resources and maximizes return on investment.

**Reduced Attrition:** Implementing targeted retention strategies based on data insights can significantly reduce attrition rates and enhance customer loyalty.

### ***Hurrah! You have successfully completed your EDA Capstone Project !!!***