# **Project Name**    -



##### **Project Type**    - EDA
##### **Contribution**    - Individual
##### **Member 1 -**      - Aryan Varma


# **Project Summary -**

- This project analyzes Google Play Store data combined with User Reviews to understand what drives app success, user engagement, and overall performance. The Play Store dataset includes app details such as category, rating, reviews, installs, size, type, price, content rating, and genres. The User Reviews dataset adds deeper insights through translated reviews, sentiment labels, polarity scores, and subjectivity.

- After cleaning and merging both datasets, various charts were created to explore patterns in installs, categories, sentiments, ratings, and app characteristics. The analysis revealed that most apps are free and targeted toward a broad audience (“Everyone”). Categories like Communication, Social, Tools, and Games dominate with the highest installs. Ratings mostly fall between 4.0–4.5, indicating generally good-quality apps. A strong correlation was observed between installs and the number of reviews, meaning popular apps naturally receive more feedback.

# **GitHub Link -**

Provide your GitHub Link here.

# **Problem Statement**


- The Google Play Store hosts thousands of mobile applications across various categories, each competing for user attention, downloads, and positive ratings.

---

- However, app performance varies widely depending on factors like app size, category, pricing, user sentiment, and overall ratings. Without proper analysis, developers and businesses struggle to understand what influences app popularity and user satisfaction.

---


- This project aims to analyze Play Store data and user reviews to uncover meaningful patterns, trends, and relationships that determine app success.

#### **Define Your Business Objective?**

- The business objective is to help app developers and businesses make data-driven decisions to improve app performance, increase visibility, and maximize user engagement.

---


- By studying installs, ratings, sentiment patterns, and category-wise behavior, the goal is to identify what type of apps perform best, which factors influence user satisfaction, and how app characteristics can be optimized to achieve growth.


---


- The insights from this project will guide the client in designing better apps, improving user experience, targeting the right audience, and ultimately boosting downloads and business outcomes.

# **General Guidelines** : -  

1.   Well-structured, formatted, and commented code is required.
2.   Exception handling, production-grade code, and deployment-ready code will be a plus; students who provide them will be awarded additional credits.

     These additional credits give an advantage during Star Student selection.

     [Note: Deployment-ready code means the whole .ipynb notebook should execute in one go without a single error logged.]

3.   Every piece of logic should have proper comments.
4.   You may add as many charts as you want. Make sure that every chart answers the following questions:

```
# Chart visualization code
```

*   Why did you pick the specific chart?
*   What is/are the insight(s) found from the chart?
*   Will the gained insights help create a positive business impact? Are there any insights that lead to negative growth? Justify with specific reason.

5.   You have to create at least 20 logical and meaningful charts with important insights.


[Hints: Do the visualization in a structured way while following the "UBM" rule.

U - Univariate Analysis,

B - Bivariate Analysis (Numerical - Categorical, Numerical - Numerical, Categorical - Categorical)

M - Multivariate Analysis]





# ***Let's Begin !***

## ***1. Know Your Data***

### Import Libraries

In [None]:
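# Import Libraries
import pandas as pd              # data loading and wrangling
import numpy as np               # numeric helpers (NaN, log-spaced bins)
import matplotlib.pyplot as plt  # base plotting
import seaborn as sns            # statistical plots
import plotly.express as px      # interactive plots (px is the conventional alias)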

### Dataset Loading

In [None]:
# Load Dataset
playstore_data = pd.read_csv("Play Store Data.csv")
user_reviews = pd.read_csv("User Reviews.csv")
# playstore_data
user_reviews

### Dataset First View

In [None]:
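# Dataset First Look
print("Play Store data:", playstore_data.shape)
print("User Reviews data:", user_reviews.shape)
# playstore_data.info()
# user_reviews.info()

# Left-merge on App: every Play Store app is kept, and its row is repeated once
# per matching review (apps without reviews get NaN in the review columns)
merged_df = pd.merge(
    playstore_data,
    user_reviews,
    on="App",
    how="left"
)

# Only the last expression of a cell is rendered, so display both explicitly
from IPython.display import display
display(merged_df.head())
display(merged_df.tail())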


### Dataset Rows & Columns count

In [None]:
# Dataset Rows & Columns count
merged_df.shape  # Returns (rows, columns): 131971 rows and 17 columns after the merge


### Dataset Information

In [None]:
# Dataset Info
merged_df.info()

#### Duplicate Values

In [None]:
# Dataset Duplicate Value Count
merged_df.duplicated().sum()

#### Missing Values/Null Values

In [None]:
# Missing Values/Null Values Count
merged_df.isnull().sum()

### What did you know about your dataset?



I merged the Play Store and User Reviews datasets into a new DataFrame named merged_df.

#### What I learned from merged_df
* The merged dataset has 131971 rows and 17 columns. Because of the left merge, rows are review-level: an app appears once per matching review, or once with empty review columns if it has no reviews.
* It includes both app-level attributes (Category, Rating, Size, Installs, Price, etc.) and review-level attributes (Sentiment, Polarity, Subjectivity).
* Missing values appear in Rating (some apps have no rating in the Play Store data) and in Sentiment, Polarity, and Subjectivity (mainly because many apps have no matching reviews).
* Some columns such as Installs, Size, Price, and Reviews are stored as text and require cleaning before analysis.
* After merging, the dataset combines app performance, user sentiment, and engagement in one place, making it suitable for deeper EDA.
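The row count above is review-level rather than app-level. A minimal sketch on made-up toy tables (apps "A", "B", "C" and categories "CAT_A", "CAT_B" are hypothetical) shows how a left merge repeats an app's row once per matching review, and how `drop_duplicates(subset="App")` brings counts back to one row per app:

```python
import pandas as pd

# Hypothetical toy versions of the two source tables
apps = pd.DataFrame({
    "App": ["A", "B", "C"],
    "Category": ["CAT_A", "CAT_B", "CAT_A"],
})
reviews = pd.DataFrame({
    "App": ["A", "A", "A", "B"],
    "Sentiment": ["Positive", "Negative", "Neutral", "Positive"],
})

# Left merge keeps every app; app A is repeated once per review,
# and app C (no reviews) appears once with NaN sentiment
merged = pd.merge(apps, reviews, on="App", how="left")

# Review-weighted row counts vs. one-row-per-app counts
rows_per_category = merged["Category"].value_counts()
apps_per_category = merged.drop_duplicates(subset="App")["Category"].value_counts()

print(len(merged))                    # 5
print(rows_per_category.to_dict())    # {'CAT_A': 4, 'CAT_B': 1}
print(apps_per_category.to_dict())    # {'CAT_A': 2, 'CAT_B': 1}
```

The same one-row-per-app step is what app-level charts (category counts, Free vs Paid shares) need, while review-level charts such as sentiment counts can use the merged rows directly.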

## ***2. Understanding Your Variables***

In [None]:
# Dataset Columns
merged_df.columns

In [None]:
# Dataset Describe
merged_df.describe()

### Variables Description


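* App – Name of the application
* Category – App category (e.g., TOOLS, GAME, EDUCATION)
* Rating – Average user rating on the Play Store (1 to 5)
* Reviews – Number of user reviews received
* Size – App size as text, in MB ("M"), KB ("k"), or "Varies with device"
* Installs – Install bucket such as "10,000+", i.e., a lower bound on total installs
* Type – Free or Paid
* Price – Price of the app as text with a dollar sign (e.g., "$4.99"; "0" for free apps)
* Content Rating – Target age group (e.g., Everyone, Teen, Mature 17+)
* Genres – App genre(s); some values combine two genres separated by ";"
* Last Updated – Date of the last update
* Current Ver – Current app version
* Android Ver – Minimum Android version required
* Translated_Review – User review text (translated to English)
* Sentiment – Sentiment label (Positive, Negative, Neutral)
* Sentiment_Polarity – Sentiment polarity score, from -1 (negative) to +1 (positive)
* Sentiment_Subjectivity – Degree of subjectivity of the review, from 0 (objective) to 1 (subjective)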
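The ranges listed above can double as sanity checks before plotting. The sketch below assumes those bounds (Rating 1 to 5, polarity -1 to +1, subjectivity 0 to 1); `EXPECTED_RANGES` and `count_out_of_range` are hypothetical names, and the toy rows are invented, with 19.0 standing in for a mis-parsed rating:

```python
import numpy as np
import pandas as pd

# Assumed valid ranges for the numeric columns (inclusive bounds)
EXPECTED_RANGES = {
    "Rating": (1.0, 5.0),
    "Sentiment_Polarity": (-1.0, 1.0),
    "Sentiment_Subjectivity": (0.0, 1.0),
}

def count_out_of_range(frame: pd.DataFrame, ranges=EXPECTED_RANGES) -> dict:
    """Count non-missing values that fall outside each column's expected range."""
    counts = {}
    for col, (low, high) in ranges.items():
        values = pd.to_numeric(frame[col], errors="coerce").dropna()
        counts[col] = int((~values.between(low, high)).sum())
    return counts

# Hypothetical toy rows
toy = pd.DataFrame({
    "Rating": [4.5, 19.0, np.nan],
    "Sentiment_Polarity": [0.3, -1.0, 0.0],
    "Sentiment_Subjectivity": [0.5, 1.2, 0.0],
})
print(count_out_of_range(toy))
# {'Rating': 1, 'Sentiment_Polarity': 0, 'Sentiment_Subjectivity': 1}
```

Running the same helper on merged_df would show whether any rows need filtering before the charts.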

### Check Unique Values for each variable.

In [None]:

# Check the number of unique values in each column
for col in merged_df.columns:
    print(f"{col}: {merged_df[col].nunique()} unique values")

## 3. ***Data Wrangling***

### Data Wrangling Code

In [None]:
# Write your code to make your dataset analysis ready.
df = merged_df.copy()

# 1. Convert Reviews to numeric (non-numeric strings become NaN)
df["Reviews"] = pd.to_numeric(df["Reviews"], errors="coerce")

# 2. Clean Installs: strip "+" and "," (e.g., "10,000+" -> "10000"), then convert to numeric
df["Installs"] = df["Installs"].str.replace("+", "", regex=False)
df["Installs"] = df["Installs"].str.replace(",", "", regex=False)
df["Installs"] = pd.to_numeric(df["Installs"], errors="coerce")

# 3. Clean Price: remove the "$" sign, then convert to numeric
df["Price"] = df["Price"].str.replace("$", "", regex=False)
df["Price"] = pd.to_numeric(df["Price"], errors="coerce")

# 4. Clean Size: treat "Varies with device" as missing and express every size in MB
df["Size"] = df["Size"].replace("Varies with device", np.nan)

def convert_size(x):
    """Convert strings like '19M' or '512k' to MB; return NaN for anything else."""
    x = str(x)
    if x.endswith("M"):
        return float(x[:-1])
    if x.endswith("k"):
        return float(x[:-1]) / 1024  # KB -> MB
    return np.nan

df["Size_MB"] = df["Size"].apply(convert_size)

# 5. Convert Rating to numeric (invalid strings become NaN)
df["Rating"] = pd.to_numeric(df["Rating"], errors="coerce")

# 6. Drop rows without a rating, and keep only ratings in the valid 1-5 Play Store range
df = df.dropna(subset=["Rating"])
df = df[df["Rating"].between(1, 5)]

# 7. Reset index (non-inplace, to avoid modifying a filtered slice in place)
df = df.reset_index(drop=True)

df.head()


### What all manipulations have you done and insights you found?

* Converted “Reviews” to numeric format
→ Some values were non-numeric; converting them allows mathematical operations.

* Cleaned the “Installs” column by removing commas and plus signs
→ Installs can now be compared numerically.

* Cleaned the “Price” column by removing the dollar sign
→ Enables analysis of free vs paid apps.

* Created a new column “Size_MB” with every size expressed in MB
→ Values in KB were divided by 1024 and “Varies with device” was replaced with NaN, which lets us examine the impact of app size on installs and rating.

* Converted Rating to numeric and dropped rows with missing ratings
→ Results in a cleaner dataset and ensures correct visualization and statistical analysis.
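The Installs and Size parsing steps can be checked in isolation on a few sample strings. This is a sketch with hypothetical helper names (`parse_installs`, `parse_size_mb`); it applies the same conversion rules as the wrangling cell (strip "+" and ",", divide KB by 1024), but checks the unit with `endswith` so only a trailing unit letter counts:

```python
import numpy as np
import pandas as pd

def parse_installs(series: pd.Series) -> pd.Series:
    """'10,000+' -> 10000; anything non-numeric (e.g. 'Free') becomes NaN."""
    cleaned = (
        series.astype(str)
        .str.replace("+", "", regex=False)
        .str.replace(",", "", regex=False)
    )
    return pd.to_numeric(cleaned, errors="coerce")

def parse_size_mb(value) -> float:
    """'19M' -> 19.0, '512k' -> 0.5; 'Varies with device' or NaN -> NaN."""
    text = str(value)
    if text.endswith("M"):
        return float(text[:-1])
    if text.endswith("k"):
        return float(text[:-1]) / 1024
    return np.nan

installs = parse_installs(pd.Series(["10,000+", "0", "Free"]))
sizes = pd.Series(["19M", "512k", "Varies with device"]).apply(parse_size_mb)
print(installs.tolist())  # [10000.0, 0.0, nan]
print(sizes.tolist())     # [19.0, 0.5, nan]
```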


## ***4. Data Vizualization, Storytelling & Experimenting with charts : Understand the relationships between variables***

#### Chart - 1

In [None]:

# Chart - 1 App Count By Category
# merged_df has one row per review, so keep one row per app before counting
apps = merged_df.drop_duplicates(subset="App")

plt.figure(figsize=(12, 6))
sns.countplot(data=apps, y="Category", order=apps["Category"].value_counts().index)
plt.title("App Count By Category")
plt.xlabel("Number of Apps")
plt.ylabel("Category")
plt.show()


##### 1. Why did you pick the specific chart?

A count plot clearly shows which categories contain the most apps on the Play Store.

##### 2. What is/are the insight(s) found from the chart?

Some categories (like FAMILY, GAME, and TOOLS) have far more apps than niche categories, so market saturation varies heavily; COMICS is among the least represented categories.

##### 3. Will the gained insights help create a positive business impact? Are there any insights that lead to negative growth? Justify with specific reason.

- Positive: Helps identify crowded categories vs emerging opportunities.
- Negative: Highly crowded categories mean tougher competition.

#### Chart - 2

In [None]:
# Chart - 2 Apps By Content Rating
# Count unique apps (merged_df repeats each app once per review)
apps = merged_df.drop_duplicates(subset="App")
rating_counts = apps.groupby("Content Rating")["App"].count().reset_index(name="Count")
rating_counts = rating_counts.sort_values(by="Count")

sns.barplot(data=rating_counts, x="Content Rating", y="Count")
plt.title("Apps by Content Rating")
plt.xlabel("Content Rating")
plt.xticks(rotation=90)
plt.ylabel("Number of Apps")
plt.show()


##### 1. Why did you pick the specific chart?

Bar charts are ideal for comparing discrete categories.

##### 2. What is/are the insight(s) found from the chart?

Most apps target "Everyone." "Teen" and "Mature" categories are significantly smaller.

##### 3. Will the gained insights help create a positive business impact? Are there any insights that lead to negative growth? Justify with specific reason.

Positive: “Everyone” apps reach the widest audience, which makes it the safest market entry zone.
Negative: Niche content ratings have limited reach.

#### Chart - 3

In [None]:
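# Chart - 3 Rating Distribution
# Use the cleaned df (ratings in the 1-5 range) with one row per app
app_ratings = df.drop_duplicates(subset="App")["Rating"]
sns.histplot(app_ratings, bins=30, kde=True)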
plt.title("App Rating Distribution")
plt.xlabel("Rating")
plt.ylabel("Frequency")
plt.show()

##### 1. Why did you pick the specific chart?

A histogram with a KDE overlay shows both the spread of ratings and where they concentrate.

##### 2. What is/are the insight(s) found from the chart?

Ratings skew heavily toward 4–5, so most apps maintain decent user satisfaction.

##### 3. Will the gained insights help create a positive business impact? Are there any insights that lead to negative growth? Justify with specific reason.

Positive: Since most apps sit above 4, a new app likely needs a rating above 4 to stay visible against competitors.
Negative: Apps with poor ratings are likely to be lost in the competition.
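To put a number on “most apps maintain decent satisfaction”, one can compute the share of apps rated 4.0 or higher. A sketch on hypothetical ratings; on the notebook's data the input would be the one-row-per-app ratings, for example `df.drop_duplicates(subset="App")["Rating"]`:

```python
import pandas as pd

ratings = pd.Series([4.5, 4.1, 3.2, 4.0, 2.8])  # hypothetical app-level ratings

# The mean of a boolean Series is the fraction of True values
share_at_least_4 = (ratings >= 4.0).mean()
print(f"{share_at_least_4:.0%} of apps are rated 4.0 or higher")  # 60% of apps ...
```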

#### Chart - 4

In [None]:
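# Chart - 4 Top 10 Categories by Total Installs
# Use the cleaned, numeric Installs column with one row per app,
# sum installs per category, then keep the 10 largest
apps = df.drop_duplicates(subset="App")
top_installs = apps.groupby("Category")["Installs"].sum().sort_values(ascending=False).head(10)

plt.figure(figsize=(10, 5))
top_installs.plot(kind="bar")
plt.title("Top 10 Categories by Total Installs")
plt.ylabel("Total Installs")
plt.show()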

##### 1. Why did you pick the specific chart?

It shows which categories attract the largest user base.

##### 2. What is/are the insight(s) found from the chart?

GAME, COMMUNICATION, and SOCIAL are typically among the categories with the highest total installs.

##### 3. Will the gained insights help create a positive business impact? Are there any insights that lead to negative growth? Justify with specific reason.

Positive: These are attractive target categories.
Negative: They are also highly competitive.
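Counting install entries and summing installs answer different questions: `count()` returns how many non-missing rows a category has, while `sum()` totals its installs, and `groupby` returns categories in alphabetical order until the result is sorted. A toy comparison with hypothetical categories CAT_A and CAT_B:

```python
import pandas as pd

apps = pd.DataFrame({
    "Category": ["CAT_A", "CAT_A", "CAT_A", "CAT_B"],
    "Installs": [1_000, 5_000, 10_000, 1_000_000_000],
})  # hypothetical app-level rows

# count(): number of rows per category; sum(): total installs per category
by_count = apps.groupby("Category")["Installs"].count().sort_values(ascending=False)
by_sum = apps.groupby("Category")["Installs"].sum().sort_values(ascending=False)

print(by_count.index[0])  # CAT_A (most rows)
print(by_sum.index[0])    # CAT_B (most installs)
```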

#### Chart - 5

In [None]:
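# Chart - 5 Free vs Paid Apps (Pie Chart)
# Count each app once and ignore malformed Type values
apps = merged_df.drop_duplicates(subset="App")
type_counts = apps.loc[apps["Type"].isin(["Free", "Paid"]), "Type"].value_counts()
type_counts.plot(kind="pie", autopct="%1.1f%%", figsize=(7, 7))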
plt.title("Free vs Paid Apps")
plt.ylabel("")
plt.show()


##### 1. Why did you pick the specific chart?

A pie chart shows how a whole splits into a small number of groups, which suits the two-way Free vs Paid comparison.

##### 2. What is/are the insight(s) found from the chart?

Roughly 90% of apps are free, suggesting that ad-supported or freemium monetization dominates.

##### 3. Will the gained insights help create a positive business impact? Are there any insights that lead to negative growth? Justify with specific reason.

Positive: The free model reaches a larger user base.
Negative: Paid apps struggle unless they offer unique value.
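The Free vs Paid split can also be reported as percentages, ignoring any value that is neither “Free” nor “Paid”. A sketch on a hypothetical series; the “0” entry stands in for a malformed Type value:

```python
import pandas as pd

types = pd.Series(["Free", "Free", "Free", "Paid", "0"])  # hypothetical values

shares = (
    types[types.isin(["Free", "Paid"])]   # keep only valid labels
    .value_counts(normalize=True)         # fractions that sum to 1
    .mul(100)
    .round(1)
)
print(shares.to_dict())  # {'Free': 75.0, 'Paid': 25.0}
```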

#### Chart - 6

In [None]:
# Chart 6:- Sentiment Polarity Distribution
plt.figure(figsize=(8,5))
plt.hist(df["Sentiment_Polarity"].dropna(), bins=40)  # finer bins make the peak near 0 visible
plt.title("Sentiment Polarity Distribution")
plt.xlabel("Polarity")
plt.ylabel("Count")
plt.tight_layout()
plt.show()


##### 1. Why did you pick the specific chart?

A histogram is the most natural way to visualize polarity scores (ranging from –1 to +1). It captures the emotional tone behind user reviews.

##### 2. What is/are the insight(s) found from the chart?

Most polarity scores cluster in the positive zone (> 0).

Negative reviews exist but are fewer.

A peak at or near 0 suggests that many reviews use neutral wording.

##### 3. Will the gained insights help create a positive business impact? Are there any insights that lead to negative growth? Justify with specific reason.

Positive:

Comparing sentiment with ratings helps check whether issues mentioned in review text align with the star ratings.

A strong cluster of positive polarity confirms high user satisfaction.

Negative:

Where negative clusters appear, they point to pain points that hurt the product’s image.
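The polarity histogram can be summarized by binning scores into negative, neutral, and positive bands with `pd.cut`. The ±0.05 neutral band is an arbitrary assumption chosen for illustration, not a threshold defined by the dataset, and the scores are toy values:

```python
import pandas as pd

polarity = pd.Series([-0.8, -0.02, 0.0, 0.03, 0.4, 0.9])  # hypothetical scores

# Bins are right-closed: [-1, -0.05], (-0.05, 0.05], (0.05, 1]
# (include_lowest makes the first bin include -1 itself)
bands = pd.cut(
    polarity,
    bins=[-1.0, -0.05, 0.05, 1.0],
    labels=["Negative", "Neutral", "Positive"],
    include_lowest=True,
)
print(bands.value_counts().to_dict())
```

On real data, the proportions of these bands give a one-line summary to place next to the histogram.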

#### Chart - 7

In [None]:
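# Chart - 7 Size_MB vs Rating (one point per app)
apps = df.drop_duplicates(subset="App")
plt.figure(figsize=(15, 4))
plt.scatter(apps["Size_MB"], apps["Rating"], alpha=0.4)  # transparency reduces overplotting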
plt.title("App Size vs Rating")
plt.xlabel("Size (MB)")
plt.ylabel("Rating")
plt.tight_layout()
plt.show()

##### 1. Why did you pick the specific chart?

This chart shows whether larger or smaller apps tend to get better ratings.

##### 2. What is/are the insight(s) found from the chart?

Ratings are scattered without a strong upward or downward trend.

App size does not appear to significantly influence rating.

High-rating apps exist across all size ranges.

##### 3. Will the gained insights help create a positive business impact? Are there any insights that lead to negative growth? Justify with specific reason.

Positive:

Developers can focus on functionality without worrying that small, compact apps get lower ratings.

User satisfaction does not appear to depend on size.

Negative:

A very large app without clear benefits may hurt downloads even if its ratings don't suffer.
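The visual impression of “no strong trend” can be backed by a rank (Spearman) correlation, which is less sensitive to the long tail of app sizes than Pearson. This sketch computes it as the Pearson correlation of ranks on hypothetical app-level values; a result close to 0 on the real data would support the reading above:

```python
import pandas as pd

apps = pd.DataFrame({
    "Size_MB": [2.0, 15.0, 40.0, 80.0, 120.0],
    "Rating": [4.3, 4.1, 4.4, 4.0, 4.2],
})  # hypothetical app-level rows

# Spearman correlation = Pearson correlation of the ranks
rho = apps["Size_MB"].rank().corr(apps["Rating"].rank())
print(round(rho, 2))  # -0.3
```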

#### Chart - 8

In [None]:
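# Chart - 8 Reviews vs Rating (one point per app)
apps = df.drop_duplicates(subset="App")
apps = apps[apps["Reviews"] > 0]  # a log axis cannot show zero
plt.scatter(apps["Reviews"], apps["Rating"], alpha=0.4)
plt.xscale("log")  # review counts span several orders of magnitude
plt.title("Reviews vs Rating")
plt.xlabel("Number of Reviews (log scale)")
plt.ylabel("Rating")
plt.show()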


##### 1. Why did you pick the specific chart?

A scatter plot helps visualize how user engagement (number of reviews) relates to overall rating.

##### 2. What is/are the insight(s) found from the chart?

Apps with extremely high review counts usually maintain strong ratings.

Some apps have many reviews but still low ratings, indicating that frustration scales with user volume.

Smaller apps with niche audiences show scattered behaviour.

##### 3. Will the gained insights help create a positive business impact? Are there any insights that lead to negative growth? Justify with specific reason.

Positive: Identifies apps whose customer base is active and engaged.
Negative: High reviews + low ratings signal large-scale user dissatisfaction that must be addressed.
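Apps in the “many reviews, low rating” corner can be listed explicitly instead of read off the scatter plot. The 75th-percentile review cut-off and the 3.5 rating cut-off below are arbitrary assumptions, and the rows are hypothetical:

```python
import pandas as pd

apps = pd.DataFrame({
    "App": ["A", "B", "C", "D", "E"],
    "Reviews": [120, 3_500_000, 800, 2_900_000, 45_000],
    "Rating": [4.6, 3.1, 3.0, 4.5, 4.2],
})  # hypothetical app-level rows

review_cutoff = apps["Reviews"].quantile(0.75)  # assumed "high engagement" threshold
rating_cutoff = 3.5                             # assumed "low rating" threshold

at_risk = apps[(apps["Reviews"] >= review_cutoff) & (apps["Rating"] < rating_cutoff)]
print(at_risk["App"].tolist())  # ['B']
```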

#### Chart - 9

In [None]:
# Chart - 9 Average Rating by Category
# One row per app, so apps with many reviews are not over-weighted
avg_ratings = df.drop_duplicates(subset="App").groupby("Category")["Rating"].mean().sort_values()
plt.figure(figsize=(10,5))
avg_ratings.plot(kind="bar")
plt.title("Average Rating by App Category")
plt.xlabel("Category")
plt.ylabel("Average Rating")
plt.tight_layout()
plt.show()

##### 1. Why did you pick the specific chart?

Comparing averages across categories highlights which industries deliver better user satisfaction.

##### 2. What is/are the insight(s) found from the chart?

Some categories (e.g., productivity, education) appear to average higher ratings than others.

A bar of averages cannot show spread, so volatility within entertainment-heavy categories is better judged from the boxplot in Chart 16.

Categories with lower averages may indicate competitive pressure or UX issues.

##### 3. Will the gained insights help create a positive business impact? Are there any insights that lead to negative growth? Justify with specific reason.

Positive: Helps identify categories known for good user experience.
Negative: Low-rated categories may require higher development investment to stay competitive.
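The cleaned df still holds one row per review, so a plain `groupby(...).mean()` weights each app by how many reviews it has. A toy comparison (hypothetical apps “A” and “B” in a hypothetical category CAT_A) of the review-weighted and one-row-per-app averages:

```python
import pandas as pd

rows = pd.DataFrame({
    "App": ["A", "A", "A", "A", "B"],
    "Category": ["CAT_A"] * 5,
    "Rating": [4.8, 4.8, 4.8, 4.8, 3.0],
})  # app A has four review rows, app B has one

review_weighted = rows.groupby("Category")["Rating"].mean()
app_level = rows.drop_duplicates(subset="App").groupby("Category")["Rating"].mean()

print(round(review_weighted["CAT_A"], 2))  # 4.44
print(round(app_level["CAT_A"], 2))        # 3.9
```

The gap between the two numbers comes entirely from app A being counted four times.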

#### Chart - 10

In [None]:
# Chart - 10 Share of Reviews by Sentiment
plt.figure(figsize=(7,5))
df["Sentiment"].value_counts().plot(kind="pie", autopct="%1.1f%%")
plt.title("Sentiment Distribution")
plt.ylabel("")
plt.tight_layout()
plt.show()


##### 1. Why did you pick the specific chart?

A sentiment chart quickly shows what users emotionally feel about apps.

##### 2. What is/are the insight(s) found from the chart?

Positive reviews dominate the sentiment pool.

Neutral sentiment also makes up a large share, reflecting users who give factual comments.

##### 3. Will the gained insights help create a positive business impact? Are there any insights that lead to negative growth? Justify with specific reason.

Positive: Positive sentiment reflects overall satisfaction.
Negative: Even a small share of negative sentiment can highlight serious pain points that developers must fix.

#### Chart - 11

In [None]:
# Chart - 11 Top 15 Genres by Number of Apps (one row per app)
plt.figure(figsize=(10,5))
df.drop_duplicates(subset="App")["Genres"].value_counts().head(15).plot(kind="bar")
plt.title("Top 15 Genres by Number of Apps")
plt.xlabel("Genre")
plt.ylabel("Count")
plt.tight_layout()
plt.show()


##### 1. Why did you pick the specific chart?

Genres are more detailed than categories. Visualizing the top genres pinpoints user preferences at a granular level.

##### 2. What is/are the insight(s) found from the chart?

A few genres (e.g., Tools) account for far more apps than the rest.

Several genres appear in very small numbers, indicating niche markets.

The genre distribution shows where developers focus their efforts.

##### 3. Will the gained insights help create a positive business impact? Are there any insights that lead to negative growth? Justify with specific reason.

Positive: Helps identify popular genres where user demand is strong.
Negative: Over-saturated genres may require strong differentiation to succeed.
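Some Genres values combine two genres separated by a semicolon (for example “Art & Design;Pretend Play”), so a plain `value_counts()` treats each combination as a separate genre. A sketch that splits and explodes the values before counting, on toy data:

```python
import pandas as pd

genres = pd.Series([
    "Tools",
    "Art & Design;Pretend Play",
    "Art & Design",
    "Puzzle;Brain Games",
])  # hypothetical Genres values

# One row per individual genre, then count
single_genres = genres.str.split(";").explode().str.strip()
print(single_genres.value_counts().to_dict())
```

With this approach an app listed under two genres is counted once in each, so the counts describe genre coverage rather than the number of apps.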

#### Chart - 12

In [None]:
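# Chart - 12 Install Distribution (one row per app, log-scaled x-axis)
apps = df.drop_duplicates(subset="App")
installs = apps.loc[apps["Installs"] > 0, "Installs"]  # log bins start at 1
plt.figure(figsize=(9,5))
plt.hist(installs, bins=np.logspace(0, 10, 41))  # 40 log-spaced bins from 1 to 1e10
plt.xscale("log")
plt.title("Distribution of App Installs")
plt.xlabel("Number of Installs (log scale)")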
plt.ylabel("Frequency")
plt.tight_layout()
plt.show()


##### 1. Why did you pick the specific chart?

A histogram shows how installs are spread: whether most apps are small, mid-size, or massively downloaded.

##### 2. What is/are the insight(s) found from the chart?

Install counts are extremely right-skewed.

Most apps sit far below the top install buckets.

A small percentage of apps dominate total installs.

##### 3. Will the gained insights help create a positive business impact? Are there any insights that lead to negative growth? Justify with specific reason.

Positive: Identifies app groups with strong user reach.
Negative: Highlights how difficult it is for new apps to gain traction.
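The statement that a small percentage of apps dominate total installs can be quantified as the share of all installs held by the top 1% of apps. A sketch on hypothetical install counts (99 small apps and one very large one):

```python
import numpy as np
import pandas as pd

# Hypothetical app-level install counts
installs = pd.Series([10_000] * 99 + [1_000_000_000])

top_n = max(1, int(np.ceil(len(installs) * 0.01)))  # top 1% of apps, at least one
top_share = installs.nlargest(top_n).sum() / installs.sum()
print(f"Top {top_n} app(s) hold {top_share:.1%} of installs")  # Top 1 app(s) hold 99.9% ...
```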

#### Chart - 13

In [None]:
# Chart - 13 Avg Sentiment Polarity By Category
plt.figure(figsize=(10,5))
df.groupby("Category")["Sentiment_Polarity"].mean().sort_values().plot(kind="bar")
plt.title("Average Sentiment Polarity by Category")
plt.xlabel("Category")
plt.ylabel("Avg Sentiment Polarity")
plt.tight_layout()
plt.show()


##### 1. Why did you pick the specific chart?

Shows which app categories have the happiest or most frustrated users.

##### 2. What is/are the insight(s) found from the chart?

Some categories consistently generate positive sentiments.

Certain categories (like tools/communication) may show more negativity.

Helps compare user emotions across industries.

##### 3. Will the gained insights help create a positive business impact? Are there any insights that lead to negative growth? Justify with specific reason.

Positive: Identifies categories with high user trust.
Negative: Highlights which categories face the most user dissatisfaction.
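A category average built from only a few reviews can swing to the extremes, so it helps to report the review count next to each mean and drop thin categories. `MIN_REVIEWS = 2` only suits the hypothetical toy rows below; on real data the threshold is a judgment call and would be much higher:

```python
import pandas as pd

reviews = pd.DataFrame({
    "Category": ["CAT_A", "CAT_A", "CAT_A", "CAT_B"],
    "Sentiment_Polarity": [0.1, -0.2, 0.4, 1.0],
})  # hypothetical review rows

MIN_REVIEWS = 2  # assumed minimum sample size per category

summary = reviews.groupby("Category")["Sentiment_Polarity"].agg(["mean", "count"])
reliable = summary[summary["count"] >= MIN_REVIEWS].sort_values("mean")
print(reliable)  # CAT_B (a single review with polarity 1.0) is dropped
```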

#### Chart - 14 - Correlation Heatmap

In [None]:
# Correlation Heatmap visualization code
plt.figure(figsize=(12,8))

# Selecting only numeric columns
numeric_df = df.select_dtypes(include=['float64', 'int64'])

corr = numeric_df.corr()

sns.heatmap(corr, annot=True, cmap="coolwarm", fmt=".2f")
plt.title("Correlation Heatmap of Numeric Features")
plt.tight_layout()
plt.show()


##### 1. Why did you pick the specific chart?

A correlation heatmap is an effective way to see how numeric variables relate to each other. It quickly reveals:

- Positive relationships
- Negative relationships
- Variables with no meaningful connection
- Which features might influence rating or installs

##### 2. What is/are the insight(s) found from the chart?

Likely high correlations (confirm with the annotated values):

- Reviews ↔ Installs → Apps with more installs usually receive more reviews.
- Sentiment_Polarity ↔ Rating → More positive review text usually goes with higher ratings. Because polarity is measured per review while Rating is one average per app, this correlation may be weaker than Reviews ↔ Installs.

Likely medium correlations:

- Size_MB ↔ Price → Some heavy apps (games, premium tools) may have higher prices.

Low or no correlation:

- Size_MB ↔ Rating → App size doesn’t determine quality.
- Price ↔ Rating → Paid apps don’t always rate better.
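Reviews and Installs are heavily right-skewed, so a Pearson coefficient in the heatmap can be driven by a few very large apps. Comparing it with a Spearman (rank) coefficient is a quick robustness check. In this hypothetical toy example the relationship is perfectly monotonic, yet Pearson stays well below 1 because one app dwarfs the rest:

```python
import pandas as pd

apps = pd.DataFrame({
    "Installs": [100, 1_000, 10_000, 100_000, 1_000_000_000],
    "Reviews": [1, 2, 3, 4, 5],
})  # hypothetical app-level rows

pearson = apps["Installs"].corr(apps["Reviews"])                  # dominated by the 1B outlier
spearman = apps["Installs"].rank().corr(apps["Reviews"].rank())   # Pearson on ranks
print(round(pearson, 2), round(spearman, 2))
```

On the notebook's data, `numeric_df.corr(method="spearman")` would give the rank-based version of the whole heatmap.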

#### Chart - 15 - Pair Plot

In [None]:
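# Pair Plot visualization code
# sns.pairplot creates its own figure, so no plt.figure() call is needed

# Select the most meaningful numeric columns and aggregate to one row per app
# (Sentiment_Polarity becomes the average polarity of each app's reviews)
pairplot_cols = ["Rating", "Reviews", "Installs", "Sentiment_Polarity", "Size_MB"]
app_level = df.groupby("App")[pairplot_cols].mean()

sns.pairplot(app_level, diag_kind="kde")
plt.suptitle("Pairplot of Key Numeric Variables (one row per app)", y=1.02)
plt.show()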


##### 1. Why did you pick the specific chart?

A pairplot is one of the best ways to see:

- How all numeric features relate to each other
- Trend directions (positive/negative)
- Clusters, outliers, and linear patterns
- Distributions on the diagonal
- Correlations by eye, without calculating them

##### 2. What is/are the insight(s) found from the chart?

1. Rating vs Reviews

Highly reviewed apps tend to keep stable ratings.

A few outliers may show many reviews but low ratings, a sign of user dissatisfaction.

2. Installs vs Reviews

Positive diagonal trend: more installs → more reviews.

Some apps have high installs but very few reviews, which may indicate an inactive user base.

3. Sentiment Polarity vs Rating

Usually shows a positive cluster: happier review text goes with higher ratings.

4. Size_MB vs Installs

Small apps often have high installs.

Large apps cluster around mid-range installs.

5. Overall Distribution

Reviews and installs show heavy right skewness.

The rating distribution is concentrated between 3 and 5.

#### Chart - 16

In [None]:
# Chart - 16 Box Plot of Rating by Category (one row per app)
plt.figure(figsize=(14, 6))
sns.boxplot(data=df.drop_duplicates(subset="App"), x="Category", y="Rating")
plt.title("Rating By Category")
plt.xlabel("Category")
plt.ylabel("Rating")
plt.xticks(rotation=90)
plt.show()

##### 1. Why did you pick the specific chart?

A boxplot shows how ratings vary within each category, revealing spread, outliers, and consistency.

##### 2. What is/are the insight(s) found from the chart?

- Some categories have tightly packed rating ranges → consistent quality.
- Others show a wide spread → unstable user experience.
- Outliers reveal exceptionally poor-performing apps.

##### 3. Will the gained insights help create a positive business impact? Are there any insights that lead to negative growth? Justify with specific reason.

- Positive: Helps find categories where maintaining high ratings is easier.
- Negative: Categories with a wide spread require more effort to impress users.
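The spread that the boxplot shows can be tabulated as the interquartile range (IQR) per category, which turns “tightly packed” versus “wide spread” into numbers. A sketch with hypothetical categories and ratings; `iqr` is a helper defined here:

```python
import pandas as pd

apps = pd.DataFrame({
    "Category": ["CAT_A"] * 4 + ["CAT_B"] * 4,
    "Rating": [4.3, 4.4, 4.4, 4.5, 2.5, 3.4, 4.0, 4.6],
})  # hypothetical app-level ratings

def iqr(values: pd.Series) -> float:
    """Interquartile range: 75th percentile minus 25th percentile."""
    return values.quantile(0.75) - values.quantile(0.25)

# Smallest IQR first = most consistent category first
spread = apps.groupby("Category")["Rating"].agg(iqr).sort_values()
print(spread.round(3).to_dict())  # {'CAT_A': 0.05, 'CAT_B': 0.975}
```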

#### Chart - 17

In [None]:
# Chart - 17 Countplot of Sentiment by Content Rating (review-level rows)
plt.figure(figsize=(10, 5))
sns.countplot(data=df, x="Content Rating", hue="Sentiment")
plt.title("Sentiment Distribution Across Content Rating")
plt.xticks(rotation=90)
plt.show()

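##### 1. Why did you pick the specific chart?
- It shows how sentiment varies across age segments (Everyone, Teen, Mature 17+, Adults only 18+, etc.).

##### 2. What is/are the insight(s) found from the chart?
- “Everyone” apps may have more neutral/positive reviews.
- Teen/Mature apps may show stronger emotional reactions, with more negative or positive spikes.
- Helps understand user expectations by age group.

##### 3. Will the gained insights help create a positive business impact? Are there any insights that lead to negative growth? Justify with specific reason.
- Positive: Useful for targeting user segments.
- Negative: Segments with high negativity need redesign or UX fixes.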
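Raw counts mostly reflect how many reviews each content rating has, so the largest segment dominates every bar. Row-normalized shares from `pd.crosstab(..., normalize="index")` make segments comparable. Toy review rows (hypothetical):

```python
import pandas as pd

reviews = pd.DataFrame({
    "Content Rating": ["Everyone"] * 6 + ["Teen"] * 2,
    "Sentiment": ["Positive", "Positive", "Positive", "Neutral", "Negative", "Positive",
                  "Negative", "Positive"],
})  # hypothetical review rows

# normalize="index" turns each row into proportions that sum to 1
shares = pd.crosstab(reviews["Content Rating"], reviews["Sentiment"], normalize="index")
print(shares.round(2))
```

The normalized table can be plotted with `shares.plot(kind="bar", stacked=True)` to compare segments of very different sizes.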

#### Chart - 18

In [None]:
# Chart - 18 Average Installs by Category (one row per app, sorted)
avg_installs = df.drop_duplicates(subset="App").groupby("Category")["Installs"].mean().sort_values(ascending=False)
plt.figure(figsize=(12, 5))
avg_installs.plot(kind="bar")
plt.title("Average Installs by Category")
plt.xlabel("Category")
plt.ylabel("Average Installs")
plt.tight_layout()
plt.show()

##### 1. Why did you pick the specific chart?
- It shows which categories attract the most downloads on average.

##### 2. What is/are the insight(s) found from the chart?

- Some categories attract much higher average installs (e.g., Communication, Social).

- Low-install categories might be niche or lack user demand.

- Highlights categories with strong user interest.

##### 3. Will the gained insights help create a positive business impact? Are there any insights that lead to negative growth? Justify with specific reason.
- Positive: Points toward categories with strong growth potential.
- Negative: Entering saturated, high-install categories requires higher marketing effort.
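A few apps in the top install buckets can pull a category's mean far above what a typical app in it achieves, so the median is a useful companion statistic. A toy comparison with hypothetical categories:

```python
import pandas as pd

apps = pd.DataFrame({
    "Category": ["CAT_A"] * 5 + ["CAT_B"] * 5,
    "Installs": [1_000, 5_000, 10_000, 10_000, 1_000_000_000,
                 100_000, 500_000, 500_000, 1_000_000, 1_000_000],
})  # hypothetical app-level rows

stats = apps.groupby("Category")["Installs"].agg(["mean", "median"])
print(stats)  # CAT_A wins on mean, CAT_B wins on median
```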

#### Chart - 19

In [None]:
# Chart - 19 KDE Plot of Rating (one row per app)
sns.kdeplot(df.drop_duplicates(subset="App")["Rating"], fill=True)
plt.title("Rating Density Distribution")
plt.xlabel("Rating")
plt.ylabel("Density")
plt.tight_layout()
plt.show()

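##### 1. Why did you pick the specific chart?
- A KDE plot gives a smooth view of the rating distribution, which makes density peaks easier to spot than in a histogram.

##### 2. What is/are the insight(s) found from the chart?
- Clear peaks around the most common rating values.

- Shows whether ratings are skewed or uniform.

- Helps visually detect multimodal behaviour (multiple peaks).

##### 3. Will the gained insights help create a positive business impact? Are there any insights that lead to negative growth? Justify with specific reason.
- Positive: Helps developers see user satisfaction trends clearly.
- Negative: Clusters around low ratings reveal possible UX issues.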

#### Chart - 20

In [None]:
# Chart - 20 Top 20 Genres by Number of Apps (one row per app)
genres_counts = merged_df.drop_duplicates(subset="App")["Genres"].value_counts().head(20)

plt.figure(figsize=(12,6))
genres_counts.plot(kind='bar')
plt.title('Top 20 Genres by Number of Apps')
plt.xlabel('Genres')
plt.ylabel('Count')
plt.xticks(rotation=90)
plt.show()



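##### 1. Why did you pick the specific chart?

- It shows which genres dominate.

##### 2. What is/are the insight(s) found from the chart?

- Some genres like Tools, Entertainment, and Productivity appear frequently.

- Many genres have very few apps.

##### 3. Will the gained insights help create a positive business impact? Are there any insights that lead to negative growth? Justify with specific reason.
- High-count genres = more competition.

- Low-count genres = opportunities for niche apps.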

## **5. Solution to Business Objective**

#### What do you suggest the client to achieve Business Objective ?
Explain Briefly.

- To help the client achieve their business objective, the insights from the charts clearly show which categories, content ratings, and app types drive the most installs, engagement, and positive sentiment.

---


- The client should focus on building free, high-quality apps in categories with strong demand such as Communication, Social, Productivity, and Games. Optimizing app size, improving user ratings through regular updates, and targeting broad audiences like “Everyone” can help widen reach. Additionally, analyzing user sentiment helps identify and enhance the features users value most.

---

- Overall, using these insights, the client can prioritize high-impact categories, increase visibility, and design apps that align with user expectations to maximize growth and profitability.

# **Conclusion**

- The overall analysis highlights strong patterns in user behavior, app performance, and market demand on the Play Store. Most successful apps tend to be free, often lightweight, and targeted toward a broad audience.

---

- Categories like Communication, Social, Tools, and Games consistently attract the highest installs, while user sentiment is closely associated with ratings and long-term engagement.

---

- By understanding these trends and aligning app development with user expectations, the client can make informed decisions that improve visibility, increase downloads, and strengthen product-market fit. This data-driven approach ensures a more strategic roadmap for growth, competitiveness, and sustained business success.

### ***Hurrah! You have successfully completed your EDA Capstone Project !!!***