<a href="https://colab.research.google.com/github/Sumit4085/Data-Science/blob/main/Google_Play_Store_EDA_Project.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Project Name**    - Google Play Store EDA



##### **Project Type**    - EDA
##### **Contribution**    - Individual



# **Project Summary -**

The Google Play Store is a massive digital marketplace where users can download a wide variety of applications for their smartphones and tablets. With millions of apps available, understanding the trends and preferences of both users and developers is crucial for optimizing the app ecosystem. Exploratory Data Analysis (EDA) offers a powerful approach to uncover insights and patterns within the vast dataset of the Google Play Store.

The Google Play Store EDA project aims to explore and analyze the extensive dataset of app listings available on the platform. By conducting thorough analysis, we seek to address several key questions and challenges:

1. App Categories and Popularity:
   - What are the most popular app categories on the Google Play Store?
   - How does the popularity of different categories vary?
   - Are there specific categories that dominate the market?

2. App Ratings and Reviews:
   - How do user ratings affect the popularity and success of an app?
   - Are there correlations between the number of reviews and app ratings?
   

3. Pricing Strategies and Revenue:
   - How do pricing strategies influence app downloads and revenue generation?
   - Are paid apps more successful than free apps, or vice versa?
   - What are the most common price ranges for apps across different categories?

4. User Demographics and Preferences:
   - Who are the primary users of the Google Play Store?
   - What are the demographics of users who download specific types of apps?
   - Are there regional or cultural differences in app preferences?

5. App Size and Performance:
   - How does app size impact user downloads and retention?
   - Are users more likely to download smaller apps over larger ones?
   - Is there a correlation between app size and user ratings/performance?

6. Developer Trends and Behavior:
   - What are the characteristics of successful app developers on the Google Play Store?
   - How do factors such as app updates, developer reputation, and engagement influence app success?
   - Are there patterns in the behavior of developers regarding app pricing, updates, and user feedback?

Approach:
To address these questions and challenges, we will perform exploratory data analysis on a comprehensive dataset extracted from the Google Play Store. This analysis will involve data cleaning, visualization, and statistical techniques to uncover meaningful insights. We will use the Python programming language along with libraries such as Pandas, Matplotlib, and Seaborn for data manipulation and visualization. Additionally, we will employ descriptive and inferential statistical methods to draw conclusions and make recommendations based on the findings.

Expected Outcome:
Through this EDA project, we aim to provide valuable insights into the dynamics of the Google Play Store ecosystem. By understanding app trends, user preferences, and developer behavior, stakeholders such as app developers, marketers, and platform administrators can make informed decisions to optimize their strategies and enhance the overall user experience on the Google Play Store.

# **GitHub Link -**

https://github.com/Sumit4085/Data-Science-Projects.git

# **Problem Statement**



Problem Statement:
Analyze the Google Play Store dataset to gain insights into app trends, user preferences, and factors influencing app ratings and downloads. The goal is to uncover patterns that can guide app developers and marketers in making informed decisions about app development, pricing strategies, and marketing campaigns. By conducting exploratory data analysis (EDA), we aim to identify correlations between various app attributes such as category, size, price, and rating, and understand their impact on user engagement and satisfaction. Additionally, we seek to explore geographical trends in app usage and popularity across different regions. Ultimately, the insights derived from this analysis will enable stakeholders to optimize their app offerings and maximize their success on the Google Play Store platform.

#### **Define Your Business Objective?**

"To leverage insights derived from the analysis of the Google Play Store dataset to inform strategic decision-making processes and optimize app offerings, pricing strategies, and marketing campaigns. The primary goal is to enhance user engagement, satisfaction, and overall success on the Google Play Store platform by identifying key trends, preferences, and factors influencing app ratings and downloads."

In summary, the business objective is to use data-driven insights to:

- Inform strategic decision-making.
- Optimize app offerings.
- Develop effective pricing strategies.
- Create targeted marketing campaigns.
- Enhance user engagement and satisfaction.
- Maximize success on the Google Play Store platform.






# **General Guidelines** : -  

1.   Well-structured, formatted, and commented code is required.
2.   Exception handling, production-grade code and deployment-ready code will be a plus. Students who deliver them will be awarded additional credits.

     These additional credits will give an advantage during Star Student selection.

     [ Note: - Deployment-ready code is defined as follows: the whole .ipynb notebook should be executable in one go without a single error logged. ]

3.   Every piece of logic should have proper comments.
4. You may add as many charts as you want. Make sure the following questions are answered for each chart.
        

```
# Chart visualization code
```
            

*   Why did you pick the specific chart?
*   What is/are the insight(s) found from the chart?
* Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with a specific reason.

5. You have to create at least 20 logical & meaningful charts having important insights.


[ Hints : - Do the visualization in a structured way while following the "UBM" rule.

U - Univariate Analysis,

B - Bivariate Analysis (Numerical - Categorical, Numerical - Numerical, Categorical - Categorical)

M - Multivariate Analysis
 ]





# ***Let's Begin !***

## ***1. Know Your Data***

### Import Libraries

In [None]:
# Importing all mandatory Libraries
import pandas as pd # Data manipulation and analysis
import numpy as np # Numerical operations on arrays
import matplotlib.pyplot as plt #  Data visualization
%matplotlib inline
import seaborn as sns # Statistical data visualization



### Dataset Loading

In [None]:
# mounting Google Drive so the dataset files can be read
from google.colab import drive
drive.mount('/content/drive')

In [None]:
# loading the Play Store dataset
df = pd.read_csv('/content/drive/MyDrive/Play Store Data.csv')

In [None]:
# loading the Play Store user reviews dataset
df1 = pd.read_csv('/content/drive/MyDrive/User Reviews.csv')

### Dataset First View

In [None]:
# overview of the first 5 rows of df
df.head()

In [None]:
# overview of the first 5 rows of df1
df1.head()

### Dataset Rows & Columns count

In [None]:
# getting Rows & Columns count of dataset df
df.shape

In [None]:
# getting Rows & Columns count of dataset df1
df1.shape

### Dataset Information

In [None]:
# detailed summary of the data structure (total rows and columns, non-null counts, dtypes) of dataset df
df.info()

In [None]:
# detailed summary of the data structure (total rows and columns, non-null counts, dtypes) of dataset df1
df1.info()

#### Duplicate Values

In [None]:
# finding duplicates in df
df.duplicated().sum()

In [None]:
# finding duplicates in df1
df1.duplicated().sum()

#### Missing Values/Null Values

In [None]:
# finding null values in df
df.isnull().sum()

In [None]:
# finding null values in df1
df1.isnull().sum()
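
Raw null counts are hard to compare between two datasets of different sizes, so it can help to express them as percentages as well. The sketch below is a minimal illustration assuming only pandas and NumPy; `missing_summary` is a hypothetical helper that is not part of the original notebook, and the toy frame only stands in for `df`/`df1`:

```python
import numpy as np
import pandas as pd


def missing_summary(frame: pd.DataFrame) -> pd.DataFrame:
    """Return per-column null counts and percentages, most-missing first."""
    counts = frame.isnull().sum()
    percent = (counts / len(frame) * 100).round(2)
    summary = pd.DataFrame({'null_count': counts, 'null_percent': percent})
    return summary.sort_values('null_count', ascending=False)


# Toy frame standing in for df / df1 (values are made up)
toy = pd.DataFrame({
    'App': ['a', 'b', 'c', 'd'],
    'Rating': [4.1, np.nan, np.nan, 3.9],
    'Type': ['Free', 'Paid', np.nan, 'Free'],
})
print(missing_summary(toy))
```

On the real data, `missing_summary(df)` and `missing_summary(df1)` would produce the same kind of table for each dataset.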

In [None]:
# Visualizing the missing values in df
# Checking Null Value by plotting Heatmap
sns.heatmap(df.isnull(), cbar=False)
plt.title('Missing Values in df')
plt.xlabel('Columns')
plt.ylabel('Row index')
plt.show()

In [None]:
# Visualizing the missing values in df1
# Checking Null Value by plotting Heatmap
sns.heatmap(df1.isnull(), cbar=False)
plt.title('Missing Values in df1')
plt.xlabel('Columns')
plt.ylabel('Row index')
plt.show()

### What did you know about your dataset?

We have two datasets: one of Google Play Store app listings (df) and one of user reviews for those apps (df1). The checks above show their size, column datatypes, duplicate rows and missing values, all of which must be handled before the analysis. Using exploratory data analysis (EDA), we aim to identify correlations between various app attributes such as category, size, price, and rating, and understand their impact on user engagement and satisfaction. Additionally, we seek to explore geographical trends in app usage and popularity across different regions. Ultimately, the insights derived from this analysis will enable stakeholders to optimize their app offerings and maximize their success on the Google Play Store platform.

## ***2. Understanding Your Variables***

In [None]:
# Dataset df Columns
df.columns

In [None]:
# Dataset df1 Columns
df1.columns

In [None]:
# Dataset df Describe
df.describe(include = 'all')

In [None]:
# Dataset df1 Describe
df1.describe(include = 'all')

### Variables Description

Below are the descriptions of all columns in both datasets (df and df1).

**Play Store dataset (df):**

- **App**: Name of the application.
- **Category**: Category or genre of the application.
- **Rating**: Average user rating of the application.
- **Reviews**: Number of user reviews for the application.
- **Size**: Size of the application in terms of storage space (converted to MB during data wrangling).
- **Installs**: Number of times the application has been installed.
- **Type**: Type of the application (e.g., Free or Paid).
- **Price**: Price of the application (if it's a paid app).
- **Content Rating**: Content rating of the application based on audience suitability.
- **Genres**: Sub-genres or additional categorizations of the application.
- **Last Updated**: Date when the application was last updated.
- **Current Ver**: Current version of the application.
- **Android Ver**: Minimum Android version required to run the application.

**User Reviews dataset (df1):**

- **App**: Name of the application (the same key as in df).
- **Translated_Review**: Text of the user review, translated into English.
- **Sentiment**: Overall sentiment of the translated review (e.g., Positive, Negative, Neutral).
- **Sentiment_Polarity**: Numeric value indicating the sentiment polarity (positive, negative, or neutral) of the translated review.
- **Sentiment_Subjectivity**: Numeric value indicating the subjectivity (opinionated vs. factual) of the translated review.

### Check Unique Values for each variable.

In [None]:
# Check Unique Values for each variable for dataset df
df.nunique()

In [None]:
# Check Unique Values for each variable for dataset df1
df1.nunique()

## 3. ***Data Wrangling***

### Data Wrangling Code

DATA CLEANING AND TRANSFORMATION.
--> Having identified the duplicates and null values above, we now perform the steps below to clean the data. After that, we will create new calculated columns where needed and remove columns that are of no use.

WE WILL CHECK THE FOLLOWING FOR EVERY COLUMN AND PERFORM THE NECESSARY ACTIONS.

--> FOR CLEANING
* CHECKING DATATYPE
* CHECKING REPEATED VALUES
* CHECKING ABNORMALITIES

--> DATA TRANSFORMATION
* COLUMN DELETING
* COLUMN ADDING (CALCULATED FIELD)

FIRST WE WILL CLEAN AND MODEL DATASET df, THEN DATASET df1.

CLEANING AND TRANSFORMING df

In [None]:
# finding null values
df.isnull().sum()

Insights for null values in df:

* A few columns (Type, Content Rating, Current Ver, Android Ver) have only a small number of null values, so those rows will be removed. The Rating column has a large number of null values, and since it is numerical, its nulls will be filled with the column's mean value.

In [None]:
# dropping rows with null values in the columns where nulls are few
df = df.dropna(axis=0, subset=['Content Rating', 'Type', 'Android Ver', 'Current Ver'])

In [None]:
# finding duplicates in df
df.duplicated().sum()

Insights for duplicate values in df:

* Most duplicates are in the 'App' column (the same app listed more than once), so we will inspect that column and then remove the duplicates.

In [None]:
# observing for 'App' column
df['App'].unique()

In [None]:
# finding duplicates in 'App' column as 'App' column needs to contains only unique values
df.duplicated(subset = 'App').sum()

In [None]:
# dropping duplicates from app column
df = df.drop_duplicates(subset = 'App')

In [None]:
# insights of column 'App'

# * abnormalities: none
# * repetition in data: yes, so duplicates are removed
# * dtype: will remain the same

In [None]:
# observing column Category
df.Category.unique()

In [None]:
# insights of column 'Category'

# * abnormalities: '_AND_' should be replaced with ' & '
# * repetition in data: none
# * dtype: will remain the same

In [None]:
# replacing '_AND_' with ' & ' in Category column
df['Category'] = df['Category'].str.replace('_AND_', ' & ')

In [None]:
# observing size column
df.Size.value_counts()

In [None]:
# insights of column 'Size'

# * abnormalities: remove the 'M' suffix, replace 'Varies with device' with NaN, and remove the 'k' suffix and divide by 1000 so that all values are in MB (megabytes)
# * repetition in data: none
# * dtype: will change to float

In [None]:
# replacing 'M' with ''
df['Size'] = df['Size'].apply(lambda x: x.replace('M','') if 'M' in x else x)

In [None]:
# removing 'k', converting to float and dividing by 1000 to convert the value to MB
df['Size'] = df['Size'].apply(lambda x: float(x.replace('k','')) / 1000 if 'k' in x else x)

In [None]:
# the column now mixes floats (converted 'k' values) and strings, so convert everything to str
# so that the substring check in the next cell does not fail on float values
df['Size'] = df['Size'].astype('str')

In [None]:
# now replacing 'Varies with device' with 'nan'
df['Size'] = df['Size'].apply(lambda x: x.replace('Varies with device','nan') if 'Varies with device' in x else x)

In [None]:
# changing the 'Size' column dtype back to float ('nan' strings become NaN)
df['Size'] = df['Size'].astype('float64')
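
The five Size cells above (strip 'M', convert 'k' to MB, cast to str, map 'Varies with device' to NaN, cast to float) can be collapsed into a single function applied once. This is a sketch under the same assumptions as the notebook (kB values divided by 1000, and anything that is not an 'M' or 'k' value treated as missing); `parse_size_mb` is a hypothetical name:

```python
import numpy as np
import pandas as pd


def parse_size_mb(value):
    """Convert a raw Play Store 'Size' string to megabytes (float).

    'M' suffix -> kept as MB, 'k' suffix -> divided by 1000 (as in the cells above),
    anything else (e.g. 'Varies with device') -> NaN.
    """
    if isinstance(value, str):
        if value.endswith('M'):
            return float(value[:-1])
        if value.endswith('k'):
            return float(value[:-1]) / 1000
    return np.nan


# Sample raw values in the column's original format
sizes = pd.Series(['19M', '201k', 'Varies with device', '8.7M'])
size_mb = sizes.map(parse_size_mb)
print(size_mb.tolist())  # [19.0, 0.201, nan, 8.7]
```

On the real data this would be `df['Size'] = df['Size'].map(parse_size_mb)`, run instead of (not after) the cells above, because it expects the original strings and would turn already-converted floats into NaN.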

In [None]:
# observing column Rating
df.Rating.unique()

In [None]:
# insights of Rating column

# * abnormalities: contains NaN values; we find the mean and fill all NaNs with it
# * repetition in data: none
# * dtype: remains the same

In [None]:
# finding mean values to replace all null values in Rating column
df['Rating'].mean()

In [None]:
# filling all NaN values with the mean rating, rounded to one decimal place
df['Rating'] = df['Rating'].fillna(round(df['Rating'].mean(), 1))

In [None]:
# observing column Reviews
df.Reviews.unique()

In [None]:
# insights of Reviews column

# * abnormalities: none
# * repetition in data: none
# * dtype: change to int

In [None]:
# changing dtype to int64
df['Reviews'] = df['Reviews'].astype('int64')

In [None]:
# observing Installs column
df.Installs.unique()

In [None]:
# insights of Installs column

# * abnormalities: '+' and ',' must be removed before the dtype can be changed
# * repetition in data: none
# * dtype: change to int

In [None]:
# replacing '+' with ''
df['Installs'] = df['Installs'].str.replace('+', '', regex=False)  # literal match: '+' is a regex metacharacter

In [None]:
# replacing ',' with ''
df['Installs'] = df['Installs'].str.replace(',', '', regex=False)

In [None]:
# changing dtype into int
df['Installs'] = df['Installs'].astype('int64')
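
The two `str.replace` calls for Installs can also be done in one pass with a regular-expression character class that matches either '+' or ','. A sketch on a few sample values in the same format as the column:

```python
import pandas as pd

# Sample raw 'Installs' values in the column's original format
installs = pd.Series(['10,000+', '500,000+', '0', '1,000,000,000+'])

# Strip both '+' and ',' in a single regex pass, then cast to integers
clean_installs = installs.str.replace(r'[+,]', '', regex=True).astype('int64')
print(clean_installs.tolist())  # [10000, 500000, 0, 1000000000]
```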

In [None]:
# observing Type column
df.Type.unique()

In [None]:
# insights of Type column

# * abnormalities: none
# * repetition in data: none
# * dtype: remains the same

In [None]:
# observing Price column
df.Price.unique()

In [None]:
# insights of Price column

# * abnormalities: '$' must be removed before the dtype can be changed
# * repetition in data: none
# * dtype: change to float

In [None]:
# removing '$' (literal match: '$' is a regex metacharacter)
df['Price'] = df['Price'].str.replace('$', '', regex=False)

In [None]:
# changing dtype into float
df['Price'] = df['Price'].astype('float64')
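
`astype('float64')` raises an error if any Price value is still non-numeric after '$' is removed. An alternative, sketched below, is `pd.to_numeric` with `errors='coerce'`, which turns such values into NaN so they can be inspected instead of stopping the notebook. The last sample value is made up for illustration:

```python
import pandas as pd

# Sample raw 'Price' values; 'not a price' is a hypothetical bad entry
prices = pd.Series(['0', '$4.99', '$0.99', 'not a price'])

# Remove '$' literally, then convert; unparseable values become NaN
clean_prices = pd.to_numeric(prices.str.replace('$', '', regex=False), errors='coerce')
print(clean_prices.tolist())  # [0.0, 4.99, 0.99, nan]
```

Applied to the real column, any rows that failed to parse could then be listed with `df[df['Price'].isna()]`.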

In [None]:
# observing Content Rating
df['Content Rating'].unique()

In [None]:
# insights of Content Rating column

# * abnormalities: none
# * repetition in data: none
# * dtype: remains the same

In [None]:
# observing 'Genres'
df.Genres.unique()

In [None]:
# insights of 'Genres' column

# * 'Genres' largely repeats the 'Category' column: it combines a category with an optional sub-genre,
#   and the sub-genre part is missing for most apps and has very low counts, so dropping this column
#   has little impact on the analysis


In [None]:
# removing 'Genres' column
df = df.drop(columns='Genres')

In [None]:
# observing Last Updated column
df['Last Updated'].unique()

In [None]:
# insights of Last Updated column

# * abnormalities: none
# * repetition in data: none
# * dtype: change to datetime

In [None]:
# changing dtype in datetime
df['Last Updated'] = pd.to_datetime(df['Last Updated'])
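
Passing an explicit `format` to `pd.to_datetime` makes parsing predictable and fails loudly if a value does not match. The sketch assumes the 'Last Updated' values look like 'January 7, 2018' (full month name, day, four-digit year); check `df['Last Updated'].head()` before relying on it:

```python
import pandas as pd

# Sample values in the assumed 'Month D, YYYY' format
last_updated = pd.Series(['January 7, 2018', 'August 1, 2018', 'June 20, 2018'])

# %B = full month name, %d = day (1 or 2 digits accepted), %Y = four-digit year
parsed = pd.to_datetime(last_updated, format='%B %d, %Y')
print(parsed.dt.strftime('%Y-%m-%d').tolist())  # ['2018-01-07', '2018-08-01', '2018-06-20']
```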

In [None]:
# observing Current Ver
df['Current Ver'].value_counts()

In [None]:
# insights of Current Ver column

# * abnormalities: none
# * repetition in data: none
# * dtype: remains the same

In [None]:
# observing Android Ver
df['Android Ver'].unique()

In [None]:
# insights of Android Ver column

# * abnormalities: remove the 'W' from '4.4W', and replace upper-bound ranges (e.g. '4.0.3 - 7.1.1') with 'and up'
# * repetition in data: none
# * dtype: remains the same

In [None]:
# removing 'W' (e.g. in '4.4W')
df['Android Ver'] = df['Android Ver'].str.replace('W', '', regex=False)

In [None]:
# replacing '- 7.1.1' with 'and up'
df['Android Ver'] = df['Android Ver'].str.replace('- 7.1.1', 'and up', regex=False)

In [None]:
# replacing '- 8.0' with 'and up'
df['Android Ver'] = df['Android Ver'].str.replace('- 8.0', 'and up', regex=False)

In [None]:
# replacing '- 6.0' with 'and up'
df['Android Ver'] = df['Android Ver'].str.replace('- 6.0', 'and up', regex=False)

In [None]:
# verifying that all changes took effect
df.info()

In [None]:
# taking a quick look at the first rows to confirm the changes
df.head()
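
The Android Ver replacements above handle only the upper bounds 7.1.1, 8.0 and 6.0. A regular expression can map any 'X - Y' range to 'X and up' in one call. This is a generalization of the notebook's steps rather than the original code, shown on sample values:

```python
import pandas as pd

# Sample raw 'Android Ver' values in the column's original format
android = pd.Series(['4.0.3 - 7.1.1', '4.4W and up', '5.0 - 8.0',
                     '4.1 and up', 'Varies with device'])

clean_android = (
    android
    .str.replace('W', '', regex=False)                            # '4.4W' -> '4.4'
    .str.replace(r'\s*-\s*[\d.]+$', ' and up', regex=True)        # ' - 7.1.1' at the end -> ' and up'
)
print(clean_android.tolist())
# ['4.0.3 and up', '4.4 and up', '5.0 and up', '4.1 and up', 'Varies with device']
```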

In [None]:
# CLEANING AND TRANSFORMING DATASET df1

In [None]:
# finding duplicates
df1.duplicated().sum()

In [None]:
# insights for duplicates in df1

# * df1 has 33616 duplicate rows, so all of them will be removed

In [None]:
# dropping duplicates
df1 = df1.drop_duplicates()

In [None]:
# finding null values
df1.isnull().sum()

In [None]:
# insights for null values in df1

# * 4 of the 5 columns contain null values, so all rows with nulls will be removed to get clean data

In [None]:
# dropping all null values
df1 = df1.dropna(axis = 0)

In [None]:
# observing App column of df1
df1['App'].unique()

In [None]:
# insights of App column

# * abnormalities: none
# * repetition in data: none
# * dtype: remains the same

In [None]:
# observing Translated_Review column of df1
df1['Translated_Review'].unique()

In [None]:
# insights of Translated_Review column

# * abnormalities: none
# * repetition in data: none
# * dtype: remains the same

In [None]:
# observing Sentiment column of df1
df1['Sentiment'].unique()

In [None]:
# checking the count of each sentiment class
df1['Sentiment'].value_counts()

In [None]:
# insights of Sentiment column

# * abnormalities: had a small number of null values, already removed by dropna above
# * repetition in data: none
# * dtype: remains the same

In [None]:
# observing Sentiment_Polarity column of df1
df1['Sentiment_Polarity'].unique()

In [None]:
# insights of Sentiment_Polarity column

# * abnormalities: none
# * repetition in data: none
# * dtype: remains the same

In [None]:
# observing Sentiment_Subjectivity column of df1
df1['Sentiment_Subjectivity'].unique()

In [None]:
# insights of Sentiment_Subjectivity column

# * abnormalities: none
# * repetition in data: none
# * dtype: remains the same

### What all manipulations have you done and insights you found?

I cleaned, transformed and analysed both datasets thoroughly before performing EDA (exploratory data analysis). First, I checked the data for problems such as null values, duplicates, wrong datatypes, abnormal values and unneeded columns. Then I cleaned it: I removed rows with null values where they were few, filled the missing Rating values with the mean, removed duplicate rows and duplicate apps, fixed abnormal values (for example the 'M'/'k' suffixes in Size, '+' and ',' in Installs, '$' in Price, '_AND_' in Category and the version ranges in Android Ver) using string replacement and lambda functions, dropped the redundant Genres column, and changed the datatypes of Size, Reviews, Installs, Price and Last Updated. After every manipulation I added a note describing the problem, the required manipulation and what I performed, and then re-checked each column to make sure the data was fully cleaned. For df1, I removed duplicate rows and rows with null values in the same way.

## ***4. Data Vizualization, Storytelling & Experimenting with charts : Understand the relationships between variables***

#### Chart - 1 Category wise apps

In [None]:
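# to find the number of apps in each category
plt.figure(figsize =(16,4))
dc = df.groupby(['Category']).count()['App'].sort_values(ascending = False)
sns.barplot(x=dc.index, y=dc.values)  # one bar per category, kept in sorted order
plt.xticks(rotation=90)
plt.title('Category-wise number of apps')
plt.xlabel('Category')
plt.ylabel('Number of apps')
plt.show()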



##### 1. Why did you pick the specific chart?

Category Comparison: Bar plots are excellent for comparing the frequency or count of items in different categories, such as the number of apps in each category in this case.

Readability: The vertical bars make it easy to see the differences in counts between categories, and rotating the x-axis labels with plt.xticks(rotation=90) keeps the category labels readable even though there are many categories.

Ordered Presentation: Sorting the categories by count (dc = df.groupby(['Category']).count()['App'].sort_values(ascending=False)) presents the data in a more organized and meaningful way. This allows viewers to quickly identify the categories with the most and the fewest apps.

Visual Impact: The wide figure (plt.figure(figsize=(16,4))) gives each category enough horizontal space, making the chart easier to read and interpret.

Overall, the bar plot effectively communicates the distribution of apps across different categories, making it a suitable choice for visualizing this type of data.

##### 2. What is/are the insight(s) found from the chart?

From the bar plot showing the number of apps in each category, you can derive several insights:

Most Popular Categories: You can easily identify the categories with the highest number of apps, which likely indicates the most popular or widely represented app categories.

Least Popular Categories: Conversely, you can identify the categories with the lowest number of apps, which may represent niche or less competitive segments of the app market.

Market Dominance: Categories with a significantly higher number of apps compared to others may indicate market dominance or strong competition within those categories.

Opportunities for Growth: Categories with a relatively low number of apps but still significant user demand may present opportunities for app developers to enter less saturated markets and target specific user needs.

Trends and Preferences: Comparing this distribution with snapshots of the store from other periods could reveal emerging trends or shifts in user preferences, which can inform strategic decisions for app development and marketing.

Overall, analyzing the distribution of apps across categories can provide valuable insights into market dynamics, the competitive landscape, and opportunities for growth within the app industry.

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with a specific reason.

Yes, the insights gained from analyzing the distribution of apps across categories can indeed help create a positive business impact. Understanding market trends, competitive dynamics, and areas of opportunity can inform strategic decision-making and potentially lead to:

Better Resource Allocation: By identifying the most popular categories with high demand and competition, businesses can allocate resources such as marketing budgets, development efforts, and talent recruitment to capitalize on these lucrative segments of the market.

Product Development: Insights into emerging trends and user preferences can guide product development efforts, enabling businesses to create apps that better align with market demand and user needs. This can result in higher user engagement, retention, and ultimately, revenue generation.

Market Positioning and Differentiation: Understanding the competitive landscape and niche segments of the market allows businesses to strategically position their apps and differentiate themselves from competitors. By targeting underserved or less competitive categories, businesses can carve out a unique value proposition and attract a specific audience.

Optimized Marketing Strategies: Insights into category popularity and user behavior can inform marketing strategies, including targeted advertising campaigns, partnerships, and promotions. By focusing resources on channels and audiences with the highest potential for engagement and conversion, businesses can maximize the effectiveness of their marketing efforts.

Regarding insights that may lead to negative growth, one potential scenario could be:

Saturated Markets: If the analysis reveals that certain categories are highly saturated with a large number of apps competing for market share, entering or investing heavily in these categories may lead to intense competition, diminishing returns, and potentially negative growth. In such cases, businesses may struggle to stand out among competitors, acquire users cost-effectively, or achieve sustainable revenue streams.
It's essential for businesses to carefully evaluate the competitive landscape, market dynamics, and their own capabilities before making strategic decisions based on insights derived from data analysis. While certain insights may present opportunities for growth, others may signal challenges or risks that require careful consideration and mitigation strategies.








#### Chart - 2 Top 10 categories by reviews

In [None]:
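# Chart - 2: total reviews of the top 10 categories
plt.figure(figsize=(16, 4))
top10 = df.groupby('Category')['Reviews'].sum().sort_values(ascending=False).head(10)
sns.barplot(x=top10.index, y=top10.values)  # bars kept in descending order of total reviews
plt.title('Top 10 categories by total number of reviews')
plt.xlabel('Category')
plt.ylabel('Total reviews')
plt.xticks(rotation=90)
plt.show()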

##### 1. Why did you pick the specific chart?

A bar chart is well suited to comparing a numeric total (here, the number of reviews) across discrete categories, and restricting it to the top 10 categories keeps the comparison readable.

Insights that can be derived from this chart include:
Category Popularity: Identifying which categories have the highest number of reviews shows which app categories users engage with most. A large number of reviews signals high user engagement, although on its own it says nothing about whether those reviews are positive.

User Feedback: Review counts serve as a proxy for the volume of user feedback. Categories with higher review counts have a larger pool of user-generated feedback, which can provide insights into user preferences, feature requests, and areas for improvement.

Competitive Analysis: Analyzing review counts across categories can help assess the competitive landscape. Categories with a high concentration of reviews may indicate intense competition, while categories with fewer reviews may present opportunities for market entry or differentiation.

Strategic Decision-Making: Insights from this analysis can inform strategic decision-making, such as resource allocation, product prioritization, and marketing strategies. Businesses can focus their efforts on categories with high user engagement and positive feedback, maximizing the potential for success.

By visualizing review counts across different app categories, businesses can gain valuable insights into user behavior, market trends, and competitive dynamics, enabling them to make data-driven decisions to drive growth and success in the app marketplace.


##### 2. What is/are the insight(s) found from the chart?


From the chart, some insights that can be derived include:

Category Engagement: The chart provides a visual representation of the distribution of reviews across different app categories. It allows us to identify which categories have the highest levels of user engagement based on the number of reviews.

Top Categories: By analyzing the bar heights, we can identify the top categories in terms of review counts. These categories are likely to be the most popular or widely used among users.

Relative Performance: Comparing the lengths of the bars allows us to gauge the relative performance of different categories in terms of user engagement. Categories with taller bars indicate higher review counts and potentially stronger user engagement.

User Preferences: The chart helps us understand user preferences and interests by highlighting the categories that attract the most attention and feedback from users. This information can be valuable for app developers and businesses seeking to tailor their offerings to meet user needs.

Overall, the chart provides valuable insights into user engagement and preferences across different app categories, which can inform strategic decision-making, marketing efforts, and product development initiatives.

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reasons.

The insights gained from the chart can indeed help create a positive business impact in several ways:

Informed Decision-Making: Understanding which app categories attract the most user engagement can guide businesses in allocating resources and investments more effectively. They can focus their efforts on developing and promoting apps in categories that have high user interest, potentially leading to increased downloads and revenue.

Targeted Marketing: Armed with knowledge about user preferences and popular app categories, businesses can tailor their marketing strategies to target relevant audiences more effectively. This targeted approach can result in higher conversion rates and customer satisfaction.

Product Development: Insights from the chart can inform product development initiatives, allowing businesses to prioritize features and functionalities that resonate with users in popular app categories. By aligning their offerings with user preferences, businesses can enhance user experience and loyalty.

However, there might be some insights that could potentially lead to negative growth if not addressed appropriately:

Neglected Categories: If certain app categories have consistently low review counts compared to others, it could indicate a lack of user interest or dissatisfaction with existing offerings. Ignoring these categories or failing to address user concerns within them could lead to negative growth over time as competitors may capitalize on unmet needs.

Poor User Experience: Categories with high review counts but predominantly negative feedback may indicate issues with user experience, functionality, or performance. Ignoring or neglecting these concerns could result in churn as dissatisfied users seek alternatives, leading to negative growth and reputation damage.

In summary, while the insights gained from the chart offer opportunities for positive business impact, it's essential for businesses to also address any potential pitfalls or areas of concern to avoid negative growth and maintain competitiveness in the market.







#### Chart - 3 - Number of Apps per Install Bracket

In [None]:
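# Chart - 3: number of apps in each install bracket
import matplotlib.pyplot as plt
import seaborn as sns
plt.figure(figsize=(14, 6))
sns.countplot(x='Installs', data=df)  # df is the cleaned DataFrame from earlier cells
plt.xticks(rotation=90)  # vertical bracket labels so they do not overlap
plt.show()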

##### 1. Why did you pick the specific chart?

I picked a count plot because Installs is effectively a categorical variable: the dataset records installs as brackets (lower bounds such as "1,000+" or "1,000,000+") rather than exact download numbers. Each bar therefore shows how many apps fall into one install bracket, which gives a quick overview of how downloads are distributed across the apps in the dataset. The rotation=90 argument to plt.xticks() turns the x-axis labels vertical so that the many bracket labels do not overlap.
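For string columns, `sns.countplot` orders the bars by the order in which values first appear in the data, so brackets such as "100+" and "1,000+" may not appear in numeric order. A small helper can sort the bracket labels by their numeric lower bound and pass the result to the plot's `order` parameter. This is a sketch assuming the `Installs` column holds strings like "10,000+" (or already-parsed integers); the helper names `bracket_floor` and `install_bracket_order` are hypothetical, and labels that cannot be parsed are simply placed first.

```python
def bracket_floor(label):
    """Convert an install bracket such as '1,000,000+' to its integer lower bound.

    Returns -1 for labels that do not look like a bracket, so they sort first.
    """
    digits = str(label).replace(",", "").replace("+", "").strip()
    return int(digits) if digits.isdigit() else -1


def install_bracket_order(labels):
    """Return the distinct bracket labels sorted by their numeric lower bound."""
    return sorted(set(labels), key=bracket_floor)


print(install_bracket_order(["1,000+", "100+", "10,000,000+", "0+", "1,000+"]))
```

In the notebook this would be used as `sns.countplot(x='Installs', data=df, order=install_bracket_order(df['Installs']))`, so the bars run from the smallest bracket to the largest.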

##### 2. What is/are the insight(s) found from the chart?

From the count plot of apps per install bracket:

Most Common Brackets: The tallest bars show the install brackets that contain the most apps, which indicates the level of reach a typical app achieves on the store.

Distribution: The overall shape of the plot shows how apps are spread between the low-install brackets and the high-install brackets (millions and above), and what share of apps reaches the top brackets.

Reach Benchmarks: Because the dataset is a single snapshot, the plot cannot show trends over time, but it does provide benchmarks: a developer can see what share of apps reaches a given bracket and set realistic download targets accordingly.

Competitive Analysis: On its own, the plot does not break installs down by category. Combining Installs with Category (for example, with a grouped count or a hue) is needed to see which categories dominate the high-install brackets and which are less saturated.

Strategic Decisions: These insights can inform resource allocation, marketing efforts, and development priorities. For example, if only a small share of apps reaches the top brackets, a new app should plan for sustained marketing rather than rely on organic discovery alone.
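To turn the bar heights into download benchmarks, one can compute, for each bracket, the share of apps whose bracket is at least that large (a complementary cumulative distribution). A minimal sketch, assuming each app's bracket has already been parsed to its integer lower bound (for example 1000 for "1,000+"); the function name `share_at_or_above` and the sample values are hypothetical.

```python
from collections import Counter


def share_at_or_above(install_floors):
    """Map each install lower bound to the fraction of apps at or above it.

    `install_floors` is a list of integer lower bounds, one per app
    (for example 1000 for an app in the '1,000+' bracket).
    """
    counts = Counter(install_floors)
    total = len(install_floors)
    shares = {}
    running = 0
    # Walk from the largest bracket down, accumulating how many apps are at or above it.
    for floor in sorted(counts, reverse=True):
        running += counts[floor]
        shares[floor] = running / total
    # Return the brackets in ascending order for readability.
    return dict(sorted(shares.items()))


# Hypothetical lower bounds for eight apps, for illustration only.
sample = [100, 1000, 1000, 10000, 10000, 100000, 1000000, 1000000]
for floor, share in share_at_or_above(sample).items():
    print(f"{floor:>9,}+ : {share:.1%} of apps")
```

Reading the result as "X% of apps reach at least this bracket" is usually easier to act on than raw bar heights.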







##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reasons.

The insights gained from the count plot of app installs can indeed help create a positive business impact:

Identifying Growth Opportunities: By combining install brackets with app categories, businesses can pinpoint categories whose apps often reach high install counts and focus their efforts on developing or enhancing apps there, potentially increasing user engagement and revenue.

Optimizing Marketing Strategies: Understanding which categories attract more users allows businesses to tailor their marketing strategies to target those specific demographics more effectively. This can lead to improved user acquisition and retention rates.

Resource Allocation: Insights from the plot can guide resource allocation decisions, such as investing in research and development for popular categories or reallocating resources from underperforming categories to those with higher growth potential.

Competitive Advantage: Businesses can gain a competitive advantage by leveraging insights to differentiate their offerings in popular categories, optimizing features and user experience to better meet consumer needs and preferences.

Regarding insights that may lead to negative growth:

Underperforming Categories: Categories with consistently low install counts may indicate a lack of demand or intense competition, making it challenging for new entrants or existing apps to gain traction. Investing resources in these categories without a thorough understanding of market dynamics could lead to negative growth.

Misaligned Strategies: If businesses misinterpret the insights or fail to adapt their strategies accordingly, they may allocate resources inefficiently, resulting in wasted investments and stagnant or declining performance.

Failure to Address User Needs: Focusing solely on categories with high install counts without considering user needs and preferences may lead to subpar app experiences, resulting in negative reviews, decreased user engagement, and ultimately, decreased installs over time.

In summary, while the insights gleaned from the count plot can certainly contribute to positive business outcomes, businesses must carefully analyze and interpret the data to make informed decisions and avoid potential pitfalls that could lead to negative growth.







#### Chart - 4

In [None]:
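# Chart - 4: number of Free vs. Paid apps
import matplotlib.pyplot as plt
import seaborn as sns
plt.figure(figsize=(14, 6))
sns.countplot(x='Type', data=df)  # df is the cleaned DataFrame from earlier cells
plt.show()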

##### 1. Why did you pick the specific chart?

A count plot is a natural choice for the categorical "Type" variable (Free vs. Paid), and it supports several kinds of analysis that help explain how app types are distributed in the dataset:

Understanding Distribution: The count plot allows us to visualize the distribution of different types of apps (e.g., Free vs. Paid) in the dataset. This helps in understanding the proportion of each type relative to the total number of apps.

Market Segmentation: By analyzing the count of each type of app, businesses can gain insights into market segmentation trends. For example, they can understand the popularity of free apps compared to paid apps and tailor their strategies accordingly.

Monetization Strategy: For developers or businesses considering app development, this plot can provide insights into the prevalent monetization strategies in the market. Understanding the distribution of free and paid apps can help in making informed decisions about pricing models.

Competitive Analysis: By examining the count of each app type, businesses can see which pricing model prevails in the market and how their own offering is positioned relative to it.

User Preferences: Insights from the count plot can also shed light on user preferences. For example, if free apps significantly outnumber paid apps, it may indicate a strong preference among users for free content or a reluctance to pay for apps.

Overall, the count plot provides a clear and concise visualization of the distribution of app types, enabling businesses to derive actionable insights that can inform their strategic decisions related to app development, marketing, and monetization.
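Bar heights make it hard to read off exact proportions, so it can be worth printing the share of each type and the free-to-paid ratio as numbers. A minimal standard-library sketch on a plain list of Type labels; `type_shares` and the sample list are hypothetical. In pandas the equivalent share calculation is `df['Type'].value_counts(normalize=True)`.

```python
from collections import Counter


def type_shares(types):
    """Return each app type's share of all apps as a fraction between 0 and 1."""
    counts = Counter(types)
    total = sum(counts.values())
    return {label: count / total for label, count in counts.items()}


# Hypothetical labels, not the real dataset: nine free apps and one paid app.
sample = ["Free"] * 9 + ["Paid"]
print(type_shares(sample))

# The free-to-paid ratio is computed from raw counts to avoid rounding noise.
counts = Counter(sample)
print(f"Free-to-paid ratio: {counts['Free'] / counts['Paid']:.1f} : 1")
```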

##### 2. What is/are the insight(s) found from the chart?

The insights gleaned from the count plot of the "Type" variable could include:

Prevalence of Free Apps: The plot shows that the majority of apps in the dataset are free. This suggests a highly competitive market in which many developers opt for a freemium or ad-supported model to attract users.

Variety of Paid Apps: While free apps dominate, there might still be a substantial number of paid apps in the dataset. This indicates that developers are leveraging different monetization strategies, targeting specific niches, or offering premium features for a fee.

Market Dynamics: Understanding the distribution of free and paid apps provides insights into market dynamics. For instance, if the number of paid apps is significantly lower compared to free apps, it could imply challenges in monetizing apps through upfront payments.

Monetization Potential: Businesses can assess the monetization potential of their apps based on the ratio of free to paid apps. A higher proportion of paid apps might suggest a market where users are more willing to pay for premium content or features.

Competitive Landscape: Analyzing the distribution of app types can help identify competitors' strategies. If certain categories have a higher concentration of paid apps, it may indicate opportunities for differentiation or gaps in the market that businesses can exploit.

User Behavior: The prevalence of free apps may reflect user preferences for cost-free options or reluctance to pay for apps. This insight can inform pricing strategies and the design of monetization models to align with user behavior.

Overall, the count plot offers insights into the composition of the app market, the preferences of developers and users, and the potential for monetization, which can guide strategic decisions in app development, marketing, and revenue generation.
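The overall count plot does not show whether paid apps are concentrated in particular categories. One way to check is to compute the paid share within each category. This is a sketch assuming (category, type) pairs where type is "Free" or "Paid"; `paid_share_by_category` and the placeholder category names are hypothetical. A pandas alternative would be `pd.crosstab(df['Category'], df['Type'], normalize='index')`, which gives the free and paid shares per category as rows.

```python
from collections import defaultdict


def paid_share_by_category(rows):
    """Return {category: fraction of that category's apps that are paid}, highest first."""
    totals = defaultdict(int)
    paid = defaultdict(int)
    for category, app_type in rows:
        totals[category] += 1
        if app_type == "Paid":
            paid[category] += 1
    shares = {category: paid[category] / totals[category] for category in totals}
    return dict(sorted(shares.items(), key=lambda item: item[1], reverse=True))


# Hypothetical rows with placeholder category names, for illustration only.
sample = [
    ("CATEGORY_A", "Paid"), ("CATEGORY_A", "Free"),
    ("CATEGORY_B", "Free"), ("CATEGORY_B", "Free"),
    ("CATEGORY_B", "Paid"), ("CATEGORY_B", "Free"),
    ("CATEGORY_C", "Free"),
]
for category, share in paid_share_by_category(sample).items():
    print(f"{category}: {share:.0%} paid")
```

Categories near the top of this list are where paid models are most common; small categories should be read with care, since a handful of apps can swing the percentage.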

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reasons.

The insights gained from the count plot of app types can indeed help create a positive business impact by providing valuable information for strategic decision-making. However, there might be potential insights that could raise concerns or indicate areas of negative growth:

Limited Monetization Opportunities: If the count plot reveals a disproportionately low number of paid apps compared to free apps, it could signal challenges in monetizing apps through upfront payments. This imbalance may limit revenue potential and profitability, especially if the market primarily relies on free offerings.

Competitive Pressures: A high concentration of free apps in certain categories might indicate intense competition, making it challenging for paid apps to stand out or gain traction. This competitive pressure could lead to reduced profitability and market saturation, negatively impacting growth prospects for developers relying on paid models.

Value Perception: If the majority of apps in lucrative categories are offered for free, it may influence users' perceptions of value and willingness to pay for premium features or content. This dynamic could erode the perceived value of paid apps and hinder efforts to monetize through app purchases or subscriptions.

Monetization Strategy Effectiveness: Disproportionate adoption of one type of monetization strategy (e.g., primarily free apps with in-app purchases or advertisements) may indicate over-reliance on a single revenue stream. This lack of diversification could pose risks if market conditions change or if users become resistant to certain monetization methods.

Market Fragmentation: A wide variation in the distribution of app types across categories could lead to market fragmentation, making it difficult for developers to target specific user segments effectively. Fragmentation may result in diluted market share, reduced brand visibility, and challenges in achieving economies of scale.

To address these potential challenges and leverage the insights gained, businesses may need to adapt their app monetization strategies, differentiate their offerings, and explore alternative revenue streams such as partnerships, subscriptions, or value-added services. Additionally, focusing on user engagement, retention, and providing unique value propositions can help mitigate the negative impacts of market dynamics and foster sustainable growth in the competitive app ecosystem.







#### Chart - 5

In [None]:
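# Chart - 5: rating vs. price, colored by app type (Free/Paid)
import matplotlib.pyplot as plt
import seaborn as sns
plt.figure(figsize=(14, 6))
sns.scatterplot(x='Price', y='Rating', hue='Type', data=df)  # assumes Price is already numeric
plt.show()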

##### 1. Why did you pick the specific chart?

The scatter plot comparing app ratings to their prices, differentiated by app type (free or paid), is chosen for several reasons:

Price Sensitivity Analysis: This chart allows for an examination of the relationship between app prices and user ratings. Understanding how pricing affects user perceptions of value and satisfaction is crucial for optimizing pricing strategies and maximizing revenue.

Comparison Across Types: Including both free and paid apps lets us compare the ratings of free apps, which all sit at a price of 0, with the ratings of paid apps across their price range. This comparison can reveal insights into the effectiveness of different monetization models and their implications for user satisfaction.

Identifying Pricing Sweet Spots: Patterns in the scatter plot may highlight price points that correspond to higher user ratings, indicating potential pricing "sweet spots" where apps strike a balance between affordability and perceived value. Identifying these sweet spots can inform pricing decisions and help maximize user satisfaction and profitability.

Detection of Outliers: The scatter plot can help identify outliers, such as high-priced apps with low ratings or free apps with exceptionally high ratings. Investigating these outliers can uncover factors contributing to extreme user perceptions and guide adjustments in pricing, features, or marketing strategies.

Type-Based Analysis: Using hue to distinguish free from paid apps makes the two groups easy to tell apart. Because every free app has a price of 0, free apps form a single vertical strip at x = 0, so the rating-price relationship itself can only be examined within the paid group; for free apps, the chart shows only how their ratings are distributed.

By analyzing the scatter plot, businesses can gain valuable insights into the relationship between pricing and user satisfaction, optimize their pricing strategies to maximize revenue and user satisfaction, and identify opportunities for product differentiation and competitive advantage in the app marketplace.
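To look for the pricing "sweet spots" mentioned above, one option is to group paid apps into price bands and compare the number of apps and the mean rating in each band. This is a sketch under stated assumptions: prices are numeric dollar amounts, ratings are on the 1 to 5 scale with missing ratings already dropped, and the band edges, labels, and helper name `mean_rating_by_band` are arbitrary illustrative choices. In pandas, `pd.cut` followed by a `groupby` achieves the same binning.

```python
from bisect import bisect_right

# Illustrative band edges in dollars (an arbitrary choice); the last band is open-ended.
BAND_EDGES = [0, 1, 5, 10, 20]
BAND_LABELS = ["under $1", "$1-4.99", "$5-9.99", "$10-19.99", "$20+"]


def mean_rating_by_band(apps):
    """Group paid apps into price bands and return {band: (app_count, mean_rating)}.

    `apps` is an iterable of (price, rating) pairs with numeric values.
    Free apps (price 0) are skipped because they all share a single price point.
    """
    sums, counts = {}, {}
    for price, rating in apps:
        if price <= 0:
            continue
        band = BAND_LABELS[bisect_right(BAND_EDGES, price) - 1]
        sums[band] = sums.get(band, 0.0) + rating
        counts[band] = counts.get(band, 0) + 1
    # Keep the bands in price order and drop empty ones.
    return {band: (counts[band], sums[band] / counts[band])
            for band in BAND_LABELS if band in counts}


# Hypothetical (price, rating) pairs, for illustration only.
sample = [(0, 4.5), (0.99, 4.4), (2.99, 4.2), (3.99, 4.6), (14.99, 4.0), (399.99, 3.6)]
for band, (n_apps, mean) in mean_rating_by_band(sample).items():
    print(f"{band:>10}: {n_apps} app(s), mean rating {mean:.2f}")
```

The app count is returned alongside each mean because bands containing only a handful of apps produce unstable averages.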







##### 2. What is/are the insight(s) found from the chart?

From the scatter plot comparing app ratings to their prices, differentiated by app type (free or paid), several insights can be gleaned:

Free Apps Predominate: There is a higher concentration of data points for free apps compared to paid apps, indicating that the majority of apps in the dataset are free to download.

Distribution of Ratings: The scatter plot shows a wide distribution of ratings across both free and paid apps. While most apps have ratings clustered around the higher end of the scale (above 4.0), there are also significant numbers of apps with lower ratings.

Price Range: Free apps all sit at a price of zero by definition, while paid apps vary in price, ranging from very low to much higher price points.

Impact of Price on Ratings: Higher-priced apps (especially those priced above $20) appear to receive somewhat lower ratings than lower-priced apps, although some high-priced apps still earn relatively high ratings. Since the high-price region of the plot contains relatively few points, this pattern should be treated with caution.

Type-Based Analysis: There appears to be a difference in the distribution of ratings between free and paid apps. While free apps exhibit a broader range of ratings, paid apps tend to cluster more around higher ratings, especially in the mid-price range.

Outliers: There are some outliers where apps, both free and paid, have exceptionally high or low ratings relative to their prices. These outliers may warrant further investigation to understand the factors influencing user perceptions and ratings.

Overall, the scatter plot provides insights into the complex relationship between app prices and user ratings. It suggests that while pricing may influence user perceptions to some extent, other factors such as app quality, features, and user experience also play significant roles in determining user satisfaction and ratings.
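A visual impression of a price-rating trend can be checked with a correlation coefficient. Because every free app shares a price of 0, the sketch below restricts the calculation to paid apps. It computes Pearson's r by hand so it runs on any Python 3 version (Python 3.10+ also provides `statistics.correlation`); the sample pairs are hypothetical. Pearson's r only measures linear association and can be dominated by a few very expensive apps, so a rank-based cross-check such as `df.loc[df['Type'] == 'Paid', ['Price', 'Rating']].corr(method='spearman')` in pandas is a sensible complement.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one variable is constant")
    return cov / sqrt(var_x * var_y)


# Hypothetical (price, rating) pairs for paid apps only.
paid_apps = [(0.99, 4.5), (1.99, 4.3), (4.99, 4.4), (9.99, 4.1), (29.99, 3.9)]
prices = [price for price, _ in paid_apps]
ratings = [rating for _, rating in paid_apps]
print(f"Pearson r (paid apps): {pearson_r(prices, ratings):.2f}")
```

A value near 0 would suggest that price explains little of the variation in ratings, which is consistent with the point that quality, features, and user experience also matter.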







##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reasons.

The insights gained from the scatter plot can indeed contribute to positive business impact by informing pricing strategies and product development decisions. Here's how:

Optimizing Pricing Strategy: Understanding the relationship between app prices and user ratings can help businesses optimize their pricing strategies. For instance, they can adjust prices based on the observed trends to maximize revenue without compromising user satisfaction.

Improving App Quality: Insights from the scatter plot can guide businesses in enhancing the quality of their apps. By analyzing the ratings of both free and paid apps, companies can identify areas for improvement and prioritize features that contribute to higher user satisfaction.

Differentiating Paid Apps: For paid apps, maintaining a balance between pricing and perceived value is crucial. Businesses can use the insights to ensure that paid apps offer sufficient value to justify their prices, thereby increasing customer retention and positive reviews.

However, there are also insights that could potentially lead to negative growth if not addressed:

Risk of Negative Reviews for High-Priced Apps: The scatter plot indicates that some high-priced apps receive lower ratings. If left unaddressed, this trend could lead to negative reviews, reduced downloads, and ultimately, decreased revenue.

Competitive Pricing Pressure: Businesses operating in highly competitive markets may face pressure to lower prices to remain competitive. While lowering prices may attract more users initially, it could impact profitability if not carefully managed.

Overall, while the insights from the scatter plot offer valuable guidance for improving app performance and pricing strategies, it's essential for businesses to carefully balance pricing decisions with maintaining high-quality user experiences to ensure sustained positive growth.








#### Chart - 6

In [None]:
# Chart - 6 visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reasons.

Answer Here

#### Chart - 7

In [None]:
# Chart - 7 visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reasons.

Answer Here

#### Chart - 8

In [None]:
# Chart - 8 visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reasons.

Answer Here

#### Chart - 9

In [None]:
# Chart - 9 visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reasons.

Answer Here

#### Chart - 10

In [None]:
# Chart - 10 visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reasons.

Answer Here

#### Chart - 11

In [None]:
# Chart - 11 visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reasons.

Answer Here

#### Chart - 12

In [None]:
# Chart - 12 visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reasons.

Answer Here

#### Chart - 13

In [None]:
# Chart - 13 visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reasons.

Answer Here

#### Chart - 14 - Correlation Heatmap

In [None]:
# Correlation Heatmap visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

#### Chart - 15 - Pair Plot

In [None]:
# Pair Plot visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

## **5. Solution to Business Objective**

#### What do you suggest the client do to achieve the business objective?
Explain briefly.

To achieve the business objectives, the client can take the following actions:

Optimize App Offerings: Utilize the insights gained from the analysis to optimize app offerings, including refining features, enhancing user experience, and ensuring alignment with user preferences and trends.

Pricing Strategies: Tailor pricing strategies based on factors such as app category, size, and user ratings. Consider offering freemium models, tiered pricing options, or promotional discounts to attract and retain users.

Marketing Campaigns: Develop targeted marketing campaigns informed by category-level trends and user preferences. Leverage data-driven strategies to reach the right audience with compelling messaging and offers.

Continuous Monitoring: Implement systems for continuous monitoring and analysis of app trends, user behaviors, and competitor activities. Stay agile and responsive to market changes to maintain competitiveness and drive sustained growth.

User Engagement: Focus on enhancing user engagement and satisfaction through regular updates, feature enhancements, and personalized experiences. Encourage user feedback and iterate based on user preferences and pain points.

Partnerships and Collaborations: Explore partnerships and collaborations with relevant stakeholders such as influencers, content creators, and other app developers to expand reach and drive user acquisition.

Data-Driven Decision Making: Foster a culture of data-driven decision-making within the organization. Invest in analytics tools and resources to empower stakeholders with actionable insights for informed decision-making at all levels.

Customer Support and Feedback: Prioritize customer support and feedback mechanisms to address user concerns promptly and proactively. Use feedback loops to identify areas for improvement and iterate on app offerings accordingly.

By implementing these strategies and leveraging the insights derived from the analysis, the client can effectively optimize their app offerings, drive user engagement and satisfaction, and maximize success on the Google Play Store platform.







# **Conclusion**

In conclusion, the analysis of the Google Play Store dataset has provided valuable insights into app trends, user preferences, and the factors associated with app ratings and downloads. Through exploratory data analysis (EDA), we examined the relationships between app attributes such as category, size, price, and rating, shedding light on how they are associated with user engagement and satisfaction.

These insights are instrumental in guiding app developers and marketers in making informed decisions about app development, pricing strategies, and marketing campaigns. By leveraging these findings, stakeholders can optimize their app offerings to better meet user needs and preferences, ultimately maximizing their success on the Google Play Store platform.

Moving forward, continuous monitoring and analysis of app trends and user behavior will be essential to keep pace with a fast-changing market and stay competitive. Ongoing, data-driven analysis will let stakeholders keep improving their apps and sustain growth on the Google Play Store.
