# **Project Name**    - Play Store Analysis



##### **Project Type**    - EDA
##### **Contribution**    - Individual
##### **Member -**  Roshan


# **Project Summary -**

Google Play Store is one of the largest and most widely used digital distribution platforms for apps, games, movies, books, and more. It is designed specifically for Android users and is powered by Google. Launched in 2008 as the "Android Market", the platform has grown tremendously, offering millions of apps to users around the world. Here is a detailed summary of its main features and functionality.
# **1. App types and variety**
The Play Store offers a huge variety of apps and games that cater to almost every imaginable need, from entertainment to business to education and health. These apps are categorized into social, communication, photography, music, health & fitness, business, and other categories. This classification makes it easier for users to search and find the right app for their needs. There are both free and paid apps, and many offer in-app purchases and subscriptions for additional services.

# **2. Games and entertainment**
One of the biggest sections of the Play Store is dedicated to gaming, ranging from casual titles such as puzzles and arcade games to immersive, high-end mobile games that offer console-like experiences, such as "PUBG Mobile", "Call of Duty Mobile" and "Fried Works". Many indie developers also publish games on the Play Store, where the platform's accessibility lets them reach a wider audience. In addition to games, the Play Store offers a wide variety of movies, TV shows and music, which users can buy or rent.

# **3. Books and e-books**
The Play Store also doubles as a bookstore with millions of e-books in genres including fiction, non-fiction, educational, and more. Users can buy or rent books and read them directly on their devices. The sync feature lets readers switch devices and pick up right where they left off.

   




The Google Play Store is a comprehensive digital marketplace that offers a wide range of products, from apps and games to books and movies. It emphasizes user protection through Play Protect, offers developers many monetization opportunities, and provides a user-friendly interface that improves the browsing and shopping experience. Its global reach and continuous improvement make it an essential part of the Android ecosystem.






# **GitHub Link -** https://github.com/Roshan6288

# **Problem Statement**


1. Which are the top app categories on the Play Store based on the number of installs and average ratings?
2. How important is an app's rating in determining its success (number of installs, reviews, etc.)?
3. Which app category has the highest number of total installations?
4. What percentage of Play Store apps are free compared to paid?
5. Which app categories are most favored by users in terms of both downloads and positive reviews?
6. How does the distribution of apps vary across different genres?
7. How are app ratings affected when the app is paid versus free?
8. What is the distribution of review sentiments (positive, neutral, negative) across the dataset?
9. What impact does the content rating (e.g., Everyone, Teen, Mature) have on app success (downloads, ratings)?
10. What is the distribution of app updates over the course of the year?

#### **Define Your Business Objective?**

Using Play Store data, we can conduct an in-depth analysis of app performance, user behavior, and market trends. Key insights come from analyzing categories, ratings, reviews, installs, and app attributes such as size and price. The analysis identifies the top apps, the features associated with high ratings, and how pricing (free vs. paid) affects users. Grouping apps into positive, neutral and negative sentiment by rating can reveal patterns in user satisfaction, and examining when apps were last updated shows how updates relate to app success. These insights help developers optimize app features, monetization strategies, and user experience to compete successfully in the market.
The analysis below aims to answer every question in the problem statement.

# **General Guidelines** : -  

1.   Well-structured, formatted, and commented code is required.
2.   Exception Handling, Production Grade Code & Deployment Ready Code will be a plus. Those students will be awarded some additional credits.
     
     The additional credits will have advantages over other students during Star Student selection.
       
             [ Note: - Deployment Ready Code is defined as, the whole .ipynb notebook should be executable in one go
                       without a single error logged. ]

3.   Each and every logic should have proper comments.
4. You may add as many charts as you want. Make sure every chart answers the following questions in this format:
        

```
# Chart visualization code
```
            

*   Why did you pick the specific chart?
*   What is/are the insight(s) found from the chart?
* Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

5. You have to create at least 20 logical & meaningful charts having important insights.


[ Hints : - Do the Visualization in a structured way while following "UBM" Rule.

U - Univariate Analysis,

B - Bivariate Analysis (Numerical - Categorical, Numerical - Numerical, Categorical - Categorical)

M - Multivariate Analysis
 ]





# ***Let's Begin !***

## ***1. Know Your Data***

### Import Libraries

In [None]:
# Import Libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from datetime import datetime
import warnings
warnings.filterwarnings('ignore')


### Dataset Loading

In [None]:
from google.colab import drive
drive.mount('/content/drive')

### Dataset First View

In [None]:
# Dataset First Look
file_path = '/content/drive/MyDrive/Pc Folder/Almabetter/Play Store Data.csv'
gpd_df = pd.read_csv(file_path)
gpd_df.head()

### Dataset Rows & Columns count

In [None]:
# Dataset Rows & Columns count

rows=gpd_df.shape[0]
columns=gpd_df.shape[1]
print(f"The number of rows is {rows} and the number of columns is {columns}")
print(gpd_df.columns)

### Dataset Information

In [None]:
# Dataset Info
gpd_df.info()

#### Duplicate Values

In [None]:
# Dataset Duplicate Value Count
duplicate_rows = gpd_df[gpd_df.duplicated()]
duplicate_count = len(duplicate_rows)

print(f"Number of duplicate rows: {duplicate_count}")

#### Missing Values/Null Values

In [None]:
# Missing Values/Null Values Count
missing_values_count = gpd_df.isnull().sum()
print("Missing/Null Values Count for each column:")
print(missing_values_count)

In [None]:
# Visualizing the missing values
plt.figure(figsize=(12, 8))
sns.heatmap(gpd_df.isnull(), cbar=False, cmap='viridis')
plt.title("Heatmap of Missing Values")
plt.show()

### What did you know about your dataset?

The dataset lists Play Store apps with columns such as App, Category, Rating, Installs, and Price. It includes numerical variables (e.g., Rating) and categorical variables (e.g., Category, Type); several columns that hold numbers (Reviews, Installs, Size, Price) are stored as text and must be converted before numerical analysis. Several columns contain missing values, most notably Rating, and Size uses the placeholder "Varies with device"; both need to be addressed. Duplicate rows may also be present and should be removed for accurate analysis. Descriptive statistics such as the mean, median, and standard deviation summarize the numerical distributions and help identify patterns, outliers, and skewness in the data.

## ***2. Understanding Your Variables***

In [None]:
# Dataset Columns

print(gpd_df['Category'].unique())

print(gpd_df.nunique())

print(gpd_df.isnull().sum())



In [None]:
# Dataset Describe

print(gpd_df.describe())
print(gpd_df['Category'].value_counts())


### Variables Description

The dataset contains variables describing Play Store apps. App is the app's name; Category is the type of app (e.g., GAME, EDUCATION); Rating is the average user rating out of 5; Reviews is the number of user reviews; Size is the size of the app; and Installs is the number of downloads, reported in thresholds such as 10,000+. Type distinguishes free from paid apps, and Price is the app's price (0 for free apps). Content Rating indicates the intended audience (e.g., Everyone, Teen). Genres provides a more detailed taxonomy than Category, Last Updated records the date of the app's most recent update, and Current Ver and Android Ver give the app version and the minimum supported Android version.

### Check Unique Values for each variable.

In [None]:
# Check Unique Values for each variable.
unique_values_count = gpd_df.nunique()
print("Number of unique values per column:")
print(unique_values_count)

## 3. ***Data Wrangling***

### Data Wrangling Code

In [None]:
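# Drop rows with missing values in these columns (simpler than imputing for this EDA)
gpd_df = gpd_df.dropna(subset=['Android Ver', 'Current Ver', 'Type', 'Price', 'Genres', 'Rating'])

# Remove the exact duplicate rows counted earlier
gpd_df = gpd_df.drop_duplicates()

# Convert text columns to numbers so they can be plotted and aggregated
# Installs: '10,000+' -> 10000
gpd_df['Installs'] = pd.to_numeric(gpd_df['Installs'].astype(str).str.replace('[+,]', '', regex=True), errors='coerce')
# Price: '$4.99' -> 4.99, '0' -> 0.0
gpd_df['Price'] = pd.to_numeric(gpd_df['Price'].astype(str).str.replace('$', '', regex=False), errors='coerce')
# Reviews: stored as text -> numeric counts
gpd_df['Reviews'] = pd.to_numeric(gpd_df['Reviews'], errors='coerce')
# Size: '19M' -> 19.0 MB, '512k' -> 0.5 MB, 'Varies with device' -> NaN
size_text = gpd_df['Size'].astype(str).str.replace(',', '', regex=False)
gpd_df['Size'] = pd.to_numeric(size_text.str.rstrip('Mk'), errors='coerce') / np.where(size_text.str.endswith('k'), 1024, 1)
# Last Updated: 'January 7, 2018' -> datetime
gpd_df['Last Updated'] = pd.to_datetime(gpd_df['Last Updated'])

# Confirm which columns still contain missing values
print(gpd_df.isnull().sum())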




### What all manipulations have you done and insights you found?

Rows with missing values in Rating, Type, Price, Genres, Current Ver, and Android Ver were dropped rather than imputed, and duplicate rows were removed. Text columns were converted to proper types: Installs and Price to numbers and Last Updated to datetime.

Insights:
* Most apps are free, and apps with higher ratings tend to have more installs.
* Categories such as Game and Tools are among the most common.
* Recently updated apps tend to have higher ratings.
* Free apps attract more downloads than paid apps.
Free apps attract more downloads than paid apps.

## ***4. Data Visualization, Storytelling & Experimenting with charts : Understand the relationships between variables***

#### Chart - Top Categories by App Count

In [None]:
# Chart - 1 visualization code

top_categories = gpd_df['Category'].value_counts().head(10)
plt.figure(figsize=(10, 6))
sns.barplot(x=top_categories.values, y=top_categories.index, palette="viridis")
plt.title('Top Categories by App Count')
plt.xlabel('App Count')
plt.ylabel('Categories')
plt.show()


##### 1. Why did you pick the specific chart?

Bar Chart – Bar charts are ideal for comparing the popularity of different categories based on discrete variables like app counts and installs.

##### 2. What is/are the insight(s) found from the chart?

The most common app categories by app count are Games, Tools, and Communication. These categories dominate the Play Store listings and attract large user bases.

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Targeting popular app categories such as Games, Tools, and Communication can increase the likelihood of downloads. Developers and businesses can focus on these categories to enhance visibility, as users are more inclined to explore highly popular sections. By innovating or filling gaps within these dominant categories, app creators can increase their chances of standing out and capturing a larger user base.

#### Chart -   Paid vs. Free Apps

In [None]:
# Chart - 2 visualization code
paid_free = gpd_df['Type'].value_counts()
plt.figure(figsize=(8, 8))
plt.pie(paid_free, labels=paid_free.index, autopct='%1.1f%%', colors=['#ff9999','#66b3ff'], startangle=90)
plt.title('Paid vs Free Apps')
plt.show()


##### 1. Why did you pick the specific chart?

 A Pie Chart illustrates the proportion of paid and free apps, effectively showing the relative sizes of categories.

##### 2. What is/are the insight(s) found from the chart?

The majority of Play Store apps are free. Only a small portion are paid apps, reflecting user preference for free content.

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

 Since the market is dominated by free apps, businesses should explore freemium models where the base app is free but offers in-app purchases or premium upgrades. This approach balances accessibility and monetization. Developing completely paid apps may work for niche markets but will face competition from free alternatives.
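To answer question 4 of the problem statement with exact figures rather than reading them off the pie chart, `value_counts(normalize=True)` returns each type's share. The sample values below are hypothetical stand-ins for `gpd_df['Type']`.

```python
import pandas as pd

# Hypothetical 'Type' values; in the notebook this would be gpd_df['Type']
app_types = pd.Series(['Free', 'Free', 'Free', 'Paid'])

# normalize=True returns proportions; multiply by 100 for percentages
type_share = app_types.value_counts(normalize=True) * 100
print(type_share.round(1))
```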


#### Chart - Ratings vs. Installs

In [None]:
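# Chart - 3 visualization code
# 'Installs' is text like '10,000+' in the raw data; convert it so the y-axis is ordered numerically
plot_df = gpd_df.assign(Installs=pd.to_numeric(gpd_df['Installs'].astype(str).str.replace('[+,]', '', regex=True), errors='coerce'))
plt.figure(figsize=(10, 6))
sns.scatterplot(x='Rating', y='Installs', data=plot_df, hue='Type')
plt.yscale('log')  # installs span several orders of magnitude
plt.title('Ratings vs Installs')
plt.xlabel('Rating')
plt.ylabel('Installs (log scale)')
plt.show()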


##### 1. Why did you pick the specific chart?

A Scatter Plot was used to visualize the relationship between ratings and installs. Rating is continuous, while Installs is reported in thresholds such as 10,000+, so the points form horizontal bands.


##### 2. What is/are the insight(s) found from the chart?

Higher-rated apps tend to have more installs. The relationship is correlational: ratings likely influence an app's success in the store, but popular apps may also attract more ratings.

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Ratings act as social proof, impacting user decisions. Businesses can prioritize improving user experience, addressing feedback, and fixing bugs to increase ratings. High ratings improve visibility and attract more users, leading to a positive feedback loop.
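The visual trend can be checked with a single number. A minimal sketch, assuming a numeric Installs column: Spearman's rank correlation suits Installs because the values are binned and heavily skewed. The sample data is hypothetical and perfectly monotonic, so rho comes out as 1.0; real data will give a weaker value, and a positive rho still shows association, not causation.

```python
import pandas as pd

# Hypothetical sample; in the notebook, use gpd_df['Rating'] and the numeric Installs column
sample = pd.DataFrame({
    'Rating': [3.9, 4.1, 4.3, 4.5, 4.7],
    'Installs': [1_000, 10_000, 100_000, 1_000_000, 10_000_000],
})

# Spearman's rho is Pearson's r computed on the ranks of each column
rho = sample.corr(method='spearman').loc['Rating', 'Installs']
print(f"Spearman rho: {rho:.2f}")
```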

#### Chart - App Categories by Audience

In [None]:
# Chart - 4 visualization code

audience_categories = gpd_df['Content Rating'].value_counts()
plt.figure(figsize=(10, 6))
sns.barplot(x=audience_categories.values, y=audience_categories.index, palette='Set2')
plt.title('App Count by Audience Category')
plt.xlabel('App Count')
plt.ylabel('Content Rating')
plt.show()

##### 1. Why did you pick the specific chart?

 A Bar Chart was used to visualize the count of apps targeting different audiences, making it easy to compare across groups.

##### 2. What is/are the insight(s) found from the chart?

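The chart compares app counts across content ratings (Everyone, Teen, Mature 17+, and so on). Apps rated Everyone, which include broad-appeal Family, Education and Lifestyle titles, form the largest group, reflecting their wide appeal.

##### 3. Will the gained insights help create a positive business impact?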
Are there any insights that lead to negative growth? Justify with specific reason.

Developing apps for broad audiences, particularly families and education, can result in higher user engagement. Apps designed for family or educational use often enjoy longer lifespans and frequent usage. Businesses focusing on creating content for these audiences can secure larger, more diverse user bases.

#### Chart -  Impact of Last Update on Ratings

In [None]:
# Chart - 5 visualization code
gpd_df['Last Updated'] = pd.to_datetime(gpd_df['Last Updated'])
gpd_df = gpd_df.sort_values(by='Last Updated')

plt.figure(figsize=(10, 6))
sns.lineplot(x='Last Updated', y='Rating', data=gpd_df)
plt.title('Last Update vs Ratings')
plt.xlabel('Last Updated')
plt.ylabel('Rating')
plt.show()


##### 1. Why did you pick the specific chart?

 A Line Plot shows how average ratings vary with the date of each app's most recent update.

##### 2. What is/are the insight(s) found from the chart?

Apps updated more recently tend to have better ratings, suggesting that users reward responsiveness and ongoing improvements. The dataset records only the latest update date, so the chart reflects update recency rather than update frequency.

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

 Regular updates not only enhance the app’s performance but also indicate that developers are responsive to user feedback. Businesses that maintain frequent update cycles can improve user retention and ratings. This helps apps stay relevant and compete effectively in the market.

#### Chart -  Paid Apps and Ratings

In [None]:
# Chart - 6 visualization code
plt.figure(figsize=(10, 6))
sns.boxplot(x='Type', y='Rating', data=gpd_df)
plt.title('Paid vs Free Apps - Rating Comparison')
plt.xlabel('App Type')
plt.ylabel('Rating')
plt.show()


##### 1. Why did you pick the specific chart?

 Box plots show the distribution of ratings for paid vs. free apps, highlighting median values and outliers.

##### 2. What is/are the insight(s) found from the chart?

Paid apps tend to receive higher ratings than free apps, although the paid group is much smaller.


##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

 Premium apps, while fewer in number, often receive more favorable ratings, possibly because of higher quality and fewer ads. Businesses developing paid apps should focus on delivering superior experiences to justify the cost. Offering high-quality features and excellent customer support helps ensure higher user satisfaction.
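The medians drawn in the box plot can be read exactly with a groupby. Reporting the group size alongside them matters here because paid apps form a small minority. The sample values below are hypothetical.

```python
import pandas as pd

# Hypothetical sample; in the notebook, group gpd_df instead
apps = pd.DataFrame({
    'Type': ['Free', 'Free', 'Free', 'Paid', 'Paid'],
    'Rating': [4.0, 4.4, 3.8, 4.5, 4.3],
})

# The box plot's centre line is the median; report it together with the group size
summary = apps.groupby('Type')['Rating'].agg(['median', 'count'])
print(summary)
```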

#### Chart - Top Genres by App Count

In [None]:
# Chart - 7 visualization code
genres_count = gpd_df['Genres'].value_counts().head(10)
plt.figure(figsize=(10, 6))
sns.barplot(x=genres_count.values, y=genres_count.index, palette="coolwarm")
plt.title('Top Genres by App Count')
plt.xlabel('App Count')
plt.ylabel('Genres')
plt.show()




##### 1. Why did you pick the specific chart?

A Bar Chart was used to compare the number of apps across different genres, highlighting the top ones.

##### 2. What is/are the insight(s) found from the chart?

Tools, Games, and Education are the most common genres. Some genres are saturated, while others present niche opportunities.

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Saturated genres like Tools and Games face more competition but also signal high demand. Niche genres with fewer apps may offer an opportunity to stand out. Businesses can enter competitive spaces with unique offerings or explore underrepresented genres for growth.
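Some entries in the Genres column list two genres separated by a semicolon (for example 'Art & Design;Creativity'), and the genre chart counts each combination as its own genre. Assuming that separator, splitting and exploding the strings credits each app to every genre it lists, which gives a cleaner view of saturated versus niche genres. The sample values are hypothetical.

```python
import pandas as pd

# Hypothetical Genres values; some apps list two genres separated by ';'
genres = pd.Series(['Tools', 'Education;Education', 'Art & Design;Creativity', 'Tools'])

# Split multi-genre strings, keep each genre once per app, then count
genre_counts = (genres.str.split(';')
                      .apply(lambda parts: sorted(set(parts)))
                      .explode()
                      .value_counts())
print(genre_counts)
```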

#### Chart - Reviews vs. Rating

In [None]:
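# Chart - 8 visualization code
# 'Reviews' is stored as text in the raw CSV; convert it so the x-axis is numeric
plot_df = gpd_df.assign(Reviews=pd.to_numeric(gpd_df['Reviews'], errors='coerce'))
plt.figure(figsize=(10, 6))
sns.scatterplot(x='Reviews', y='Rating', data=plot_df, hue='Type')
plt.xscale('log')  # review counts span several orders of magnitude
plt.title('Reviews vs Ratings')
plt.xlabel('Reviews (log scale)')
plt.ylabel('Rating')
plt.show()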


##### 1. Why did you pick the specific chart?

 A Scatter Plot was used to visualize the relationship between the number of reviews and ratings, which are both continuous variables.


##### 2. What is/are the insight(s) found from the chart?

 Apps with more reviews often have higher ratings. Reviews also add visibility and credibility, although the pattern is correlational: well-liked apps may simply attract more reviewers.

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

 Encouraging user reviews helps boost an app’s rating and overall credibility. More reviews help potential users gauge an app’s quality. Businesses can implement prompts for user reviews, encouraging feedback, which can lead to higher ratings and increased installs.

#### Chart - Effect of Last Update Date on Ratings

In [None]:
# Chart - 9 visualization code
plt.figure(figsize=(10, 6))
sns.lineplot(x='Last Updated', y='Rating', data=gpd_df.sort_values('Last Updated'))
plt.title('Effect of Last Update Date on Ratings')
plt.xlabel('Last Updated')
plt.ylabel('Rating')
plt.xticks(rotation=45)
plt.show()


##### 1. Why did you pick the specific chart?

A Line Plot was used to examine how ratings vary with how recently apps were updated.

##### 2. What is/are the insight(s) found from the chart?

Apps that were updated recently tend to have slightly higher ratings, consistent with regular updates improving performance and user satisfaction.

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

 Regular updates are crucial to maintaining high ratings. Businesses should ensure their apps are frequently updated to fix bugs, introduce new features, and stay relevant. This not only improves user experience but also signals to users that the app is well-maintained, driving higher retention and positive feedback.
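Because the dataset stores only each app's latest update date, recency is the quantity these charts can measure. A minimal sketch that buckets apps by days since their last update and averages ratings per bucket; the reference date (the newest date in the data), the bucket edges and the sample rows are all assumptions for illustration.

```python
import pandas as pd

# Hypothetical sample; in the notebook, use gpd_df['Last Updated'] (datetime) and gpd_df['Rating']
apps = pd.DataFrame({
    'Last Updated': pd.to_datetime(['2018-08-01', '2018-07-15', '2017-03-10', '2015-06-01']),
    'Rating': [4.5, 4.3, 4.0, 3.7],
})

# Assumption: measure recency relative to the newest date present in the data
reference_date = apps['Last Updated'].max()
apps['Days Since Update'] = (reference_date - apps['Last Updated']).dt.days

# Bucket recency so each group averages many apps instead of one point per date
bins = [-1, 30, 365, float('inf')]
labels = ['< 1 month', '1-12 months', '> 1 year']
apps['Recency'] = pd.cut(apps['Days Since Update'], bins=bins, labels=labels)

# observed=False keeps empty buckets (their mean is NaN)
mean_by_recency = apps.groupby('Recency', observed=False)['Rating'].mean()
print(mean_by_recency)
```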

#### Chart - Sentiment Distribution Based on Ratings

In [None]:
# Chart - 10 visualization code

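# Bucket each app's average rating into a sentiment label
def categorize_sentiment(rating):
    if rating >= 4.0:
        return 'Positive'
    elif rating >= 2.0:
        return 'Neutral'
    else:
        return 'Negative'

gpd_df['Sentiment'] = gpd_df['Rating'].apply(categorize_sentiment)
sentiment_counts = gpd_df['Sentiment'].value_counts()

# Fixed color per label, so colors do not depend on the order returned by value_counts()
sentiment_colors = {'Positive': '#99ff99', 'Neutral': '#66b3ff', 'Negative': '#ff9999'}
plt.figure(figsize=(8, 8))
plt.pie(sentiment_counts,
        labels=sentiment_counts.index,
        autopct='%1.1f%%',
        colors=[sentiment_colors[label] for label in sentiment_counts.index],
        startangle=90)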
plt.title('Distribution of Review Sentiments Based on Ratings')
plt.axis('equal')
plt.show()



##### 1. Why did you pick the specific chart?

A Pie Chart was used to show the proportions of positive, neutral, and negative sentiment, where each app's sentiment is derived from its average rating.

##### 2. What is/are the insight(s) found from the chart?

Most apps fall in the Positive group (average rating of 4.0 or higher), followed by Neutral, with only a small percentage Negative. Because these labels come from ratings rather than from the text of individual reviews, they describe overall satisfaction per app.

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Positive reviews indicate high user satisfaction, but focusing on neutral and negative feedback can uncover areas for improvement. Businesses can take action on negative reviews to improve the app's features or user experience, leading to better retention and conversion rates.
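The `categorize_sentiment` thresholds can also be applied without a Python-level loop using `pd.cut`. This sketch reproduces the same cut-offs (below 2.0 Negative, 2.0 up to 4.0 Neutral, 4.0 and above Positive); `right=False` makes each bin include its left edge, so ratings of exactly 2.0 and 4.0 fall into the higher bucket, matching the `>=` comparisons. The ratings are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical ratings; in the notebook, use gpd_df['Rating']
ratings = pd.Series([1.5, 2.0, 3.9, 4.0, 5.0])

# Bins [-inf, 2.0), [2.0, 4.0), [4.0, inf) mirror categorize_sentiment
sentiment = pd.cut(ratings,
                   bins=[-np.inf, 2.0, 4.0, np.inf],
                   labels=['Negative', 'Neutral', 'Positive'],
                   right=False)
print(sentiment.value_counts())
```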

#### Chart - Content Rating Impact on Installs

In [None]:
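# Chart - 11 visualization code
# 'Installs' is text like '10,000+' in the raw data; summing text would concatenate strings,
# so convert it to numbers first
installs = pd.to_numeric(gpd_df['Installs'].astype(str).str.replace('[+,]', '', regex=True), errors='coerce')
content_rating_installs = installs.groupby(gpd_df['Content Rating']).sum().sort_values(ascending=False)
plt.figure(figsize=(10, 6))
sns.barplot(x=content_rating_installs.index, y=content_rating_installs.values, palette='coolwarm')
plt.title('Impact of Content Rating on Installs')
plt.xlabel('Content Rating')
plt.ylabel('Total Installs')
plt.show()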


##### 1. Why did you pick the specific chart?

 A Bar Chart helps compare the number of installs based on content ratings.

##### 2. What is/are the insight(s) found from the chart?

Apps rated Everyone receive the most total downloads, while content ratings such as Teen and Mature have lower totals but may target specific audiences. Because the chart sums installs, part of the Everyone lead reflects the larger number of apps in that group.

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

 Businesses targeting a broad audience should focus on developing apps with a content rating of Everyone, as these tend to attract more users. However, apps targeting niche audiences (like Teen or Mature) can still be successful if they meet specific user needs, offering opportunities for tailored marketing and engagement.
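Total installs grow with the number of apps in a group, so comparing the mean per app alongside the sum separates popularity per app from group size. A sketch on hypothetical numbers, chosen so that the group with the largest total is not the one with the largest average:

```python
import pandas as pd

# Hypothetical sample; in the notebook, use gpd_df with numeric Installs
apps = pd.DataFrame({
    'Content Rating': ['Everyone', 'Everyone', 'Everyone', 'Teen', 'Mature 17+'],
    'Installs': [10_000, 20_000, 30_000, 50_000, 5_000],
})

# sum = total installs per group, mean = installs per app, count = apps in the group
by_rating = (apps.groupby('Content Rating')['Installs']
                 .agg(['sum', 'mean', 'count'])
                 .sort_values('sum', ascending=False))
print(by_rating)
```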

#### Chart - App Updates Over the Years

In [None]:
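# Chart - 12 visualization code
# Year of each app's most recent update (to_datetime makes this cell independent of Chart 5)
gpd_df['Year'] = pd.to_datetime(gpd_df['Last Updated']).dt.year
updates_over_years = gpd_df['Year'].value_counts().sort_index()
plt.figure(figsize=(10,6))
sns.lineplot(x=updates_over_years.index, y=updates_over_years.values)
plt.title('Number of Apps by Year of Last Update')
plt.xlabel('Year of Last Update')
plt.ylabel('Number of Apps')
plt.show()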


##### 1. Why did you pick the specific chart?

A line plot was chosen for visualizing the number of app updates over the years because it effectively shows trends and changes over time. This chart allows us to see how app update frequency has evolved, making it easier to identify patterns, peaks, and declines in update activity across different years.

##### 2. What is/are the insight(s) found from the chart?

The chart counts apps by the year of their most recent update, so it reflects how recently apps were maintained rather than the total number of updates released. A high count in recent years means many apps are actively maintained, which may reflect developers responding to user feedback, technological advancements, or competitive pressure; apps whose last update is several years old may be abandoned or losing engagement.

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

 Understanding the trends in app updates over the years can inform strategic planning for development teams. If there are years with significantly fewer updates, businesses may want to investigate potential causes, such as resource allocation or market dynamics. Additionally, identifying peaks in updates may encourage businesses to adopt similar practices or strategies to maintain high engagement. Regular updates can lead to improved user satisfaction and retention, directly contributing to a positive business impact by enhancing the app's overall quality and relevance in the market.

#### Chart -  App Size vs. Rating

In [None]:
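# Chart - 13 visualization code
# Size is text like '19M' or '512k' in the raw data: convert to MB ('Varies with device' -> NaN, skipped when plotting)
size_text = gpd_df['Size'].astype(str).str.replace(',', '', regex=False)
size_mb = pd.to_numeric(size_text.str.rstrip('Mk'), errors='coerce') / np.where(size_text.str.endswith('k'), 1024, 1)
plt.figure(figsize=(8,6))
sns.scatterplot(x=size_mb, y=gpd_df['Rating'])
plt.title('App Size vs. Rating')
plt.xlabel('Size (MB)')
plt.show()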



##### 1. Why did you pick the specific chart?

A scatter plot was chosen because it’s ideal for visualizing the relationship between two continuous variables—app size and app rating. The scatter plot allows us to see if there’s any correlation between the size of an app (in MB) and its user rating. This chart makes it easy to detect trends, clusters, or outliers.

##### 2. What is/are the insight(s) found from the chart?

The scatter plot reveals that there is no strong linear correlation between app size and rating. While some larger apps tend to have higher ratings, overall, apps of various sizes receive similar ratings, suggesting that size alone isn’t a determining factor for user satisfaction.

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

 The insights help developers understand that app size alone does not significantly impact ratings. This implies that instead of focusing solely on reducing app size, businesses should prioritize improving app features, performance, and user experience to achieve higher ratings. However, maintaining an optimal app size for performance efficiency can still enhance user experience, especially in regions with limited storage or slower internet connectivity.
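The "no strong linear correlation" reading can be quantified with Pearson's r, which pandas computes with `Series.corr`, skipping rows where either value is missing (for example sizes listed as 'Varies with device' once converted to NaN). The sample below is hypothetical and deliberately linear, so r is 1.0; values near 0 would indicate no linear relationship.

```python
import numpy as np
import pandas as pd

# Hypothetical sample; NaN stands for 'Varies with device' after conversion to MB
apps = pd.DataFrame({
    'Size': [10.0, 20.0, 30.0, np.nan],
    'Rating': [4.0, 4.2, 4.4, 3.0],
})

# Pearson's r (the default method); pairs containing NaN are excluded
r = apps['Size'].corr(apps['Rating'])
print(f"Pearson r: {r:.2f}")
```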

#### Chart - 14 - Heatmap of Total Installs by Category

In [None]:
# Heatmap visualization code: total installs per category (one row, one cell per category)
gpd_df['Installs'] = gpd_df['Installs'].astype(str).str.replace('[+,]', '', regex=True)
gpd_df['Installs'] = gpd_df['Installs'].replace('', '0').astype(int)
category_installs = gpd_df.groupby('Category')['Installs'].sum().sort_values(ascending=False)
category_installs_df = category_installs.reset_index()
plt.figure(figsize=(12, 8))
sns.heatmap(category_installs_df.set_index('Category').T, cmap='YlGnBu', annot=False, fmt='.0f', linewidths=0.5)
plt.title('Total Installs by Category', fontsize=16)
plt.yticks(rotation=0)
plt.xticks(rotation=45, ha='right')
plt.tight_layout()
plt.show()


##### 1. Why did you pick the specific chart?

A heatmap encodes each category's total installs as color intensity, so the most and least downloaded categories stand out at a glance.

##### 2. What is/are the insight(s) found from the chart?

The heatmap of total installs by category in the Google Play Store presents several insights:

Category Popularity: It shows which app categories have the highest total installs, helping developers identify trending and popular categories for new app development.

Install Distribution: The heatmap shows how installs are distributed across categories, revealing potential opportunities in underrepresented categories that could benefit from more app offerings.

These insights guide strategic decisions on app development and marketing in the Android ecosystem.
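A single-row heatmap shows relative intensity but not exact proportions. Converting the totals to percentage shares makes the popularity and distribution insights concrete; this sketch uses hypothetical totals in place of the `category_installs` Series built in the heatmap cell.

```python
import pandas as pd

# Hypothetical totals; in the notebook, use category_installs from the heatmap cell
category_installs = pd.Series({'GAME': 600, 'TOOLS': 300, 'EDUCATION': 100})

# Each category's share of all installs, in percent
install_share = (category_installs * 100 / category_installs.sum()).round(1)
print(install_share)
```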


#### Chart - 15 - Pair Plot

In [None]:
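# Pair Plot visualization code
# Build a numeric copy: Installs, Size, Price and Reviews are stored as text in the raw CSV
installs_text = gpd_df['Installs'].astype(str).str.replace('[+,]', '', regex=True)
size_text = gpd_df['Size'].astype(str).str.replace(',', '', regex=False)
numeric_df = pd.DataFrame({
    'Rating': gpd_df['Rating'],
    'Installs': pd.to_numeric(installs_text, errors='coerce'),
    # 'Varies with device' becomes NaN; sizes given in 'k' are converted to MB
    'Size': pd.to_numeric(size_text.str.rstrip('Mk'), errors='coerce') / np.where(size_text.str.endswith('k'), 1024, 1),
    'Price': pd.to_numeric(gpd_df['Price'].astype(str).str.replace('$', '', regex=False), errors='coerce'),
    'Reviews': pd.to_numeric(gpd_df['Reviews'], errors='coerce'),
})
sns.pairplot(numeric_df, diag_kind='kde')
plt.suptitle('Pair Plot of Google Play Store Apps', y=1.02)
plt.show()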

##### 1. Why did you pick the specific chart?

The pair plot is an effective tool for exploratory data analysis, giving a holistic view of the pairwise relationships between the numerical variables in the dataset.

##### 2. What is/are the insight(s) found from the chart?

The pair plot of the Google Play Store data reveals several key insights:

* Ratings and installs: Higher app ratings are often associated with more installs, suggesting that user satisfaction drives downloads.
* Review effect: Apps with more reviews tend to have better ratings, and their averages are less extreme than those of apps with only a handful of reviews.
* Size and ratings: Smaller apps may perform slightly better in ratings, although the App Size vs. Rating chart shows no strong overall relationship.
* Pricing: Free apps generally get far more downloads than paid apps.
* Outliers: The plot highlights outliers, such as apps with high ratings but few installs (or vice versa), which merit further investigation.
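Installs and Reviews span several orders of magnitude, which squeezes most points into one corner of the pair plot. A common remedy, sketched here on hypothetical values, is to plot `np.log1p` of these columns; log1p(x) = log(1 + x) keeps zero counts at zero while compressing large counts.

```python
import numpy as np
import pandas as pd

# Hypothetical values; Installs and Reviews are heavily right-skewed in the real data
sample = pd.DataFrame({'Installs': [0, 1_000, 1_000_000], 'Reviews': [0, 50, 20_000]})

# Element-wise log(1 + x); the result is still a DataFrame with the same columns
log_sample = np.log1p(sample)
print(log_sample.round(2))
```

The transformed frame could then be passed to `sns.pairplot` in place of the raw columns.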

## **5. Solution to Business Objective**

#### What do you suggest the client to achieve Business Objective ?
Explain Briefly.

The insights from this analysis have direct business implications for app development and marketing strategies. Knowing that categories like Games and Tools drive higher downloads helps businesses decide where to invest resources. Free apps dominate the market, but paid apps tend to receive higher ratings, so monetization strategies need careful consideration. Regular updates are important for maintaining high ratings and user satisfaction, and the rating-based sentiment breakdown helps businesses understand user needs and improve app features.

# **Conclusion**

In conclusion, the analysis of the Play Store dataset revealed many valuable insights for app developers and businesses aiming to succeed in the Android market. The dominance of free apps emphasizes the need to offer free options to attract downloads, while the higher ratings of paid apps suggest that users hold premium content to higher standards. Categories such as Games, Tools, and Communication have the highest number of installations, indicating their popularity among users, while Family and Education apps cater to a wider audience.

The association between recent updates and higher ratings underscores the need for continuous improvement in an app's functionality and features. Regularly updated apps tend to achieve higher user satisfaction, which is important for maintaining user loyalty. Furthermore, the sentiment breakdown suggests that most apps are rated positively, although negative feedback provides important insights for addressing user pain points and improving the app experience.

By leveraging these insights, businesses can create more targeted app development and marketing strategies. Focusing on users through regular updates, addressing negative reviews, and strategically building apps in high-demand categories will help them keep users satisfied and succeed over the long term in the competitive Play Store ecosystem.

### ***Hurrah! You have successfully completed your EDA Capstone Project !!!***