# **Project Name**    -



##### **Project Type**    - EDA
##### **Contribution**    - Individual

##### **Name**   - Gaurav Kumar



# **Project Summary -**

This Play Store app analysis project offers invaluable insights into the intricate workings of the mobile app ecosystem. These findings serve as a compass for developers, marketers, and publishers, empowering them to make informed decisions that amplify app performance, user engagement, and overall success.

The dataset consists predominantly of free apps, which suggests that developers should concentrate on free apps to build a large user base. For paid apps, keeping the size under 40MB appears important for accessibility. Categories such as Events, Beauty, and Parenting are popular but underexplored, which gives developers an opportunity to capitalize on their high installation rates.

Regular app updates are key to retaining customers. Targeting a universal audience (the Everyone content rating) broadens reach, while larger apps find a niche in categories such as Games and Family. For paid apps, a modest price combined with a size below 20MB appears to be the most successful combination.

A closer look at the Game and Family categories reveals a higher share of negative reviews, so apps in these categories call for especially careful development. These are only some of the conclusions drawn; together they give developers concrete points to evaluate before starting app development.

Conducting a preliminary exploratory data analysis (EDA) was instrumental in avoiding preventable risks and developing better app-building strategies. Before the analysis, the dataset was cleaned and preprocessed to ensure data integrity. Rows with missing values were dropped or imputed, duplicates were removed, and text columns such as Price and Installs were converted to numeric types.

The primary objectives encompassed identifying factors influencing app ratings and downloads, exploring correlations between app categories and user reviews, and elucidating insights into effective pricing strategies for top-rated apps. Leveraging my background knowledge in the mobile app industry, I shaped research questions that encapsulated the significance of user ratings, reviews, and app categories in app success.

Ethical considerations guided the handling of user-generated data, with attention to privacy and sensitivity. The analysis was carried out in Python using pandas, NumPy, Matplotlib, and seaborn, which together enabled a comprehensive exploration of the dataset.

This project sheds light on the pathways to app success within the Google Play Store, emphasizing the potency of data-driven analysis in the vibrant arena of mobile applications. It underscores the pivotal role of informed decision-making based on nuanced insights extracted from data analysis.

# **GitHub Link -**

Provide your GitHub Link here.

# **Problem Statement**



The goal is to identify the critical factors that influence app engagement on the Play Store. Framing the problem this way allows a focused, structured analysis of the data that yields actionable insights.


#### **Define Your Business Objective?**

**Unlocking App-Making Potential:** Leveraging the vast repository of Play Store app data to empower app-making businesses towards success by deriving actionable insights. These insights will guide developers in capturing the Android market effectively.

**Identifying App Success Factors:** The goal is to uncover the key elements that drive app engagement and overall success in the Android ecosystem. This is done through a comprehensive analysis of app attributes such as category, rating, and size, together with customer reviews.

**Strategic Insights for Developers:** By delving into the intricacies of the data, the objective is to unearth strategic insights that developers can utilize to refine their approaches, enhance app performance, and navigate the competitive Android landscape more effectively.

# **General Guidelines** : -  

1.   Well-structured, formatted, and commented code is required.
2.   Exception handling, production-grade code, and deployment-ready code will be a plus. Students who provide them will be awarded additional credits.
     
     The additional credits will give those students an advantage during Star Student selection.
       
     [ Note: Deployment-ready code means that the whole .ipynb notebook executes in one go without a single error being logged. ]

3.   Every piece of logic should have proper comments.
4. You may add as many charts as you want. For each and every chart, make sure the following format is answered.
        

```
# Chart visualization code
```
            

*   Why did you pick the specific chart?
*   What is/are the insight(s) found from the chart?
* Will the gained insights help create a positive business impact? Are there any insights that lead to negative growth? Justify with a specific reason.

5. You have to create at least 20 logical and meaningful charts with important insights.


[ Hints: Do the visualization in a structured way, following the "UBM" rule.

U - Univariate Analysis,

B - Bivariate Analysis (Numerical - Categorical, Numerical - Numerical, Categorical - Categorical)

M - Multivariate Analysis
 ]





# ***Let's Begin !***

## ***1. Know Your Data***

**App:** Indicates the name of the application within the dataset.

**Category:** Specifies the classification or genre to which an application belongs.

**Rating:** Represents the average user rating of the application on a 1 to 5 scale, indicating its overall quality or user satisfaction.

**Reviews:** Reflects the total count of user reviews given for the application.

**Size:** Indicates the storage space occupied by the application on a mobile device.

**Installs:** Specifies the approximate number of installs, recorded as a lower-bound bucket such as `10,000+`.

**Type:** Differentiates between free and paid applications within the dataset.

**Price:** Specifies the price of the application (0 for free apps).

**Content Rating:** Indicates the target audience or age group for which the application is suitable.

**Genres:** Lists one or more genres to which an application belongs, providing a finer classification than Category.

**Last Updated:** Specifies the date when the application was last updated.

**Current Ver:** Provides the current version of the application.

**Android Ver:** Specifies the minimum Android version required to run the application.

### Import Libraries

In [None]:
# Import Libraries

import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
%matplotlib inline

### Dataset Loading

In [None]:
# Load Dataset

from google.colab import drive
drive.mount('/content/drive')


filepath = "/content/drive/MyDrive/Alma Better Projects /Play Store Data.csv"
store_df= pd.read_csv(filepath)

filepath = "/content/drive/MyDrive/Alma Better Projects /User Reviews.csv"
reviews_df= pd.read_csv(filepath)


### Dataset First View

In [None]:
# Dataset First Look

df_1 =  store_df.copy()   #df_1 (Play Store Data)
df_2 =  reviews_df.copy()   #df_2 (User Reviews)

df_1

In [None]:
df_2

### Dataset Rows & Columns count

In [None]:
# Dataset Rows & Columns count


rows_df_1, columns_df_1 = df_1.shape
print(f"Rows in df_1: {rows_df_1}")
print(f"Columns in df_1: {columns_df_1}")


rows_df_2, columns_df_2 = df_2.shape
print(f"Rows in df_2: {rows_df_2}")
print(f"Columns in df_2: {columns_df_2}")



### Dataset Information

In [None]:
# Dataset Info

df_1.info()

In [None]:
df_2.info()

In [None]:
df_1.head()

In [None]:
df_1.tail()

#### Duplicate Values

In [None]:
# Dataset Duplicate Value Count

df_1.duplicated().sum()


In [None]:
df_2.duplicated().sum()

#### Missing Values/Null Values

In [None]:
# Missing Values/Null Values Count

df_1.isnull().sum().sort_values(ascending=False)    # Null values in descending order

In [None]:
df_2.isnull().sum().sort_values(ascending=False)    # Null values in descending order

In [None]:
# Visualizing the missing values
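The cell above is only a placeholder. One way to fill it is sketched below, assuming that a bar chart of the percentage of nulls per column is the intended visualization. `plot_missing_share` is a hypothetical helper, and the small sample frame stands in for `df_1`.

```python
import pandas as pd
import matplotlib.pyplot as plt


def plot_missing_share(df: pd.DataFrame, title: str = "Missing values per column"):
    """Plot the percentage of missing values for each column that has any.

    Returns the percentages (largest first) and the Axes, or (empty Series, None)
    when the frame has no missing values at all.
    """
    # Fraction of nulls per column, converted to a percentage and sorted descending
    missing_pct = df.isnull().mean().mul(100).sort_values(ascending=False)
    # Keep only columns that actually contain missing values
    missing_pct = missing_pct[missing_pct > 0]
    if missing_pct.empty:
        return missing_pct, None  # nothing to plot

    fig, ax = plt.subplots(figsize=(8, 4))
    missing_pct.plot(kind="bar", ax=ax, color="salmon")
    ax.set_title(title)
    ax.set_xlabel("Column")
    ax.set_ylabel("Missing (%)")
    fig.tight_layout()
    return missing_pct, ax


# Hypothetical sample standing in for df_1
sample = pd.DataFrame({
    "App": ["A", "B", "C", "D"],
    "Rating": [4.1, None, 4.5, None],
    "Type": ["Free", None, "Paid", "Free"],
})
pct, ax = plot_missing_share(sample)
print(pct)
```

In the notebook, calling `plot_missing_share(df_1)` followed by `plt.show()` would draw the chart for the Play Store data.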

In [None]:
# Drop the rows that have null values in Type, Content Rating, Current Ver, or Android Ver
df_1.dropna(subset=["Type","Content Rating", "Current Ver", "Android Ver"], inplace= True)


In [None]:
df_1.isnull().sum()

In [None]:
df_1.boxplot()

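The box plot shows a median Rating of about 4.3, which means half of the rated apps score 4.3 or higher. The ratings have a long tail of low values that pulls the mean down, so the median is a more robust value for filling in missing ratings.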


In [None]:
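# Fill null values in Rating
# Calculate the median of the 'Rating' column

rating_median = df_1['Rating'].median()

# Assign the result back; calling fillna(inplace=True) on a column selection
# may not modify df_1 under pandas Copy-on-Write
df_1['Rating'] = df_1['Rating'].fillna(rating_median)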

df_1.isnull().sum()

In [None]:
# Null Value in User Review

df_2.isnull().sum()

In [None]:
df_2=df_2.dropna()

In [None]:
# Drop duplicate rows in the Play Store data
df_1= df_1.drop_duplicates()


In [None]:
# Check for duplicate rows after dropping them
df_1.duplicated().sum()


In [None]:
# Drop duplicate rows in the User Reviews data
df_2=df_2.drop_duplicates()

In [None]:
df_2.duplicated().sum()

In [None]:
# Check the rows and columns of both datasets after cleaning
print('Rows & Columns of Play Store Dataset:', df_1.shape)
print('Rows & Columns of User Reviews Dataset:', df_2.shape)

### What did you know about your dataset?

In the Play Store app analysis, my first focus was a thorough understanding of the dataset. It catalogs a wide range of mobile applications available on the platform. Because it records when each app was last updated, it also offers a temporal perspective on how trends in the app ecosystem have evolved.

The dataset comprised the following pivotal variables:

- App Name.
- Category, spanning genres such as Games, Business, and Health.
- Rating, the average user evaluation on a 5-point scale.
- Reviews, quantifying user feedback.
- Installs, indicating download volumes.
- Price, in US dollars.
- Content Rating, for various audience groups.
- Genres, for more detailed categorization.
- Last Updated, the date of the most recent app update.

My analysis aimed at several primary objectives: delineating factors influencing app ratings and download metrics, uncovering correlations between app categories and user reviews, and deriving insights into pricing strategies adopted by top-rated apps.

Leveraging prior knowledge within the mobile app industry, I recognized the profound impact of user ratings, reviews, and app categorization on an app's trajectory to success. This pre-existing insight served as the cornerstone for formulating pertinent research inquiries, guiding my analytical approach throughout the project.


## ***2. Understanding Your Variables***

In [None]:
# Dataset Columns

df_1.columns    # For df_1 (Play Store Data)


In [None]:
df_2.columns      # For df_2 (User Reviews)

In [None]:
# Dataset Describe

df_1.describe()     # For df_1 (Play Store Data)


In [None]:
df_2.describe()       # For df_2 (User Reviews)

### Variables Description


**App Name:** The specific name of the mobile application as listed on the Play Store.

**Category:** The broader classification or genre to which the app belongs, encompassing diverse categories like Games, Business, Health & Fitness, etc.

**Rating:** User-assigned evaluation of the app, typically on a scale ranging from 1 to 5 stars.

**Reviews:** The total count of user reviews and ratings received by the app.

**Installs:** An estimated figure denoting the number of times the app has been downloaded and installed by users.

**Type:** Whether the app is Free or Paid.

**Price:** The monetary value assigned to the app if it is not offered for free. In this dataset, prices are stored as strings in US dollars (for example, `$4.99`), with `0` for free apps.

**Content Rating:** A classification system applied to the app indicating the appropriate audience group for the content, such as Everyone, Teen, or Mature.

**Genres:** A more detailed categorization beyond the main category, providing specific descriptors such as Action or Puzzle.

**Last Updated:** The date representing the most recent update made to the app on the Play Store.

**Size:** The digital footprint of the app in terms of storage space. In this dataset, sizes are stored as strings such as `19M` (megabytes) or `201k` (kilobytes), or as `Varies with device`.

**Current Version:** The specific version number of the app currently available on the Play Store.

**Android Version:** The minimum version of the Android operating system required to run the app.
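Size is stored as text, so it has to be converted to numbers before it can support size thresholds such as the 40MB and 20MB limits mentioned in the summary. The sketch below assumes that values end in `M` (megabytes) or `k` (kilobytes) and treats 1 MB as 1024 kB. `size_to_mb` is a hypothetical helper and is not part of the original notebook.

```python
import numpy as np
import pandas as pd


def size_to_mb(value) -> float:
    """Convert a Play Store size string such as '19M' or '201k' to megabytes.

    Anything else (for example 'Varies with device') becomes NaN.
    """
    text = str(value).strip()
    if text.endswith("M"):
        return float(text[:-1])
    if text.endswith("k"):
        return float(text[:-1]) / 1024  # assumption: 1 MB = 1024 kB
    return np.nan


sizes = pd.Series(["19M", "8.7M", "512k", "Varies with device"])
print(sizes.apply(size_to_mb))
```

Applied to the notebook's data, this could produce a numeric column, for example `df_1['Size_MB'] = df_1['Size'].apply(size_to_mb)`. The column name `Size_MB` is hypothetical.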

### Check Unique Values for each variable.

In [None]:
# Check Unique Values for each variable.
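The following cells inspect unique values one column at a time. A compact alternative is sketched here with a hypothetical helper, `unique_value_summary`. It builds a single table with the number of distinct values and a few examples per column. The sample frame is made up and stands in for `df_1`.

```python
import pandas as pd


def unique_value_summary(df: pd.DataFrame, max_examples: int = 5) -> pd.DataFrame:
    """One row per column: dtype, number of distinct values, and a few example values."""
    rows = []
    for col in df.columns:
        values = df[col].dropna().unique()
        rows.append({
            "column": col,
            "dtype": str(df[col].dtype),
            "n_unique": df[col].nunique(),  # NaN is not counted
            "examples": list(values[:max_examples]),
        })
    return pd.DataFrame(rows).set_index("column")


# Hypothetical sample standing in for df_1
sample = pd.DataFrame({
    "Type": ["Free", "Paid", "Free", None],
    "Price": ["0", "$4.99", "0", "0"],
    "Installs": ["10,000+", "500+", "10,000+", "1,000+"],
})
print(unique_value_summary(sample))
```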

In [None]:
df_1['Type'].unique()       #Unique values in Type

In [None]:

df_1['Installs'].unique()        #Unique values in Install


In [None]:
df_1['Size'].unique()      #Unique values in Size

In [None]:
df_1['Category'].unique()        #Unique values in Category

In [None]:
df_1['Price'].unique()         #Unique values in Price

In [None]:
# Flag values in text columns that contain unusual characters (@ # % ? + &)
string_columns_df1 = df_1.select_dtypes(include='object')  # Selecting string columns
for column in string_columns_df1.columns:
    unusual_values = df_1[column].loc[df_1[column].str.contains(r'[@#%?+&]', regex=True, na=False)]
    if not unusual_values.empty:
        print(f"Column '{column}' has unusual characters:")
        print('\n')
        print(unusual_values)
        print('\n')

In [None]:
# Print the unique values of every column to spot wrong entries (such as #, @, %, ?, +, &) in text or numeric columns that null-value detection cannot catch.
for num,col in enumerate(df_1.columns,1):
    print('\n')
    print(num,')\n','{} : {}'.format(col,df_1[col].unique().tolist()))

In [None]:
# Remove the $ symbol and convert Price to float
df_1['Price'] = df_1['Price'].astype(str).str.replace('$', '', regex=False).astype(float)
df_1['Price'].unique()

## 3. ***Data Wrangling***

### Data Wrangling Code

### What all manipulations have you done and insights you found?


In [None]:
# Write your code to make your dataset analysis ready.

In [None]:
# Remove all '+' and ',' characters and convert Installs to float (e.g. '10,000+' -> 10000.0)
df_1['Installs'] = (
    df_1['Installs'].astype(str)
    .str.replace('+', '', regex=False)
    .str.replace(',', '', regex=False)
    .astype(float)
)

In [None]:
df_1['Installs'].unique()

In [None]:
df_1['Last Updated'].value_counts().sort_values(ascending=False)


In [None]:
df_1['Android Ver'].value_counts().sort_values(ascending=False)

In [None]:
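# Strip the ' and up' suffix, e.g. '4.1 and up' -> '4.1'
df_1['Android Ver'] = df_1['Android Ver'].str.replace(' and up', '', regex=False)

In [None]:
# Replace 'Varies with device' with '4.1' (presumably chosen as the most frequent version)
df_1['Android Ver']=df_1['Android Ver'].replace('Varies with device','4.1')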

In [None]:
df_1['Android Ver'].unique()

In [None]:
df_1['Current Ver'].value_counts().sort_values(ascending=False)

In [None]:
# Replace 'Varies with device' with the string '1.0' so the column keeps a single (string) type
df_1['Current Ver']=df_1['Current Ver'].replace('Varies with device','1.0')

In [None]:
df_1['Last Updated'].unique()
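At this point `Last Updated` is still text (for example `January 7, 2018`). Converting it to a real date could help check the summary's claims about update frequency. The sketch below assumes every value follows the `Month day, Year` pattern; dates that do not match become `NaT`. `add_update_features` and the `Days Since Update` column are hypothetical names.

```python
import pandas as pd


def add_update_features(df: pd.DataFrame, reference=None) -> pd.DataFrame:
    """Return a copy with 'Last Updated' parsed as datetime and a days-since-update column.

    `reference` defaults to the most recent update date in the data.
    """
    out = df.copy()
    # 'January 7, 2018' -> Timestamp('2018-01-07'); unparseable text -> NaT
    out["Last Updated"] = pd.to_datetime(out["Last Updated"], format="%B %d, %Y", errors="coerce")
    if reference is None:
        reference = out["Last Updated"].max()
    out["Days Since Update"] = (pd.Timestamp(reference) - out["Last Updated"]).dt.days
    return out


# Hypothetical sample; the last value is deliberately malformed
sample = pd.DataFrame({"Last Updated": ["January 7, 2018", "August 8, 2018", "not a date"]})
print(add_update_features(sample))
```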

## ***4. Data Visualization, Storytelling & Experimenting with charts : Understand the relationships between variables***

#### **Chart - 1  To visualize total installations according to content rating and the count of apps per content rating.**

In [None]:
# Chart - 1 visualization code

# Total Apps according to Content Rating Count.
content_rating_counts = df_1['Content Rating'].value_counts()
content_rating_counts


In [None]:
#Total Installation According to Content Rating Wise.
content_apps_intallations=df_1.groupby('Content Rating')['Installs'].sum().astype('int64')
content_apps_intallations.sort_values(ascending=False)

In [None]:
#Chart Visualization:

fig=plt.figure(figsize=(20,4))
plt.subplot(1, 2, 1)

# Total Apps according to Content Rating Count
plt.barh(content_rating_counts.index, content_rating_counts.values, color='skyblue')
plt.title("Content Ratings Distribution")
plt.xlabel("Number of Apps")
plt.ylabel("Content Rating")


#Total Installation According to Content Rating Wise
plt.subplot(1,2,2)
plt.barh(content_apps_intallations.index, content_apps_intallations.values, color='skyblue')
plt.title("Number of Installations by Content Rating")
plt.xlabel("Total Installs")
plt.ylabel("Content Rating")
plt.show()

##### 1. Why did you pick the specific chart?

Side-by-side horizontal bar charts make it easy to compare two things for each content rating: the number of apps and their total installations.

##### 2. What is/are the insight(s) found from the chart?

Apps rated Everyone outnumber those rated Everyone 10+. Everyone 10+ has fewer apps, yet it records more installations than Mature 17+, which indicates better engagement relative to its app count.

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with a specific reason.

Yes. The chart shows a positive business signal: apps rated Everyone make up the largest share of the dataset, and no negative growth trend is visible. A strong presence in the Everyone category likely broadens an app's appeal and user reach.

#### **Chart - 2 To determine the top apps category count and top category apps installations from the dataset.**

In [None]:
# Chart - 2 visualization code

# Top apps Category count
top10_catg=df_1['Category'].value_counts().sort_values(ascending=False).head(10)
top10_catg

In [None]:
# Top Category apps installs
catg_apps_ins=df_1.groupby('Category')['Installs'].sum().sort_values(ascending=False).astype('int64').head(10)
catg_apps_ins

In [None]:
#Chart Visualization:

# Plotting top app categories by count
plt.figure(figsize=(10, 6))
top10_catg.plot(kind='bar', color='skyblue')
plt.title('Top 10 App Categories by Count')
plt.xlabel('Category')
plt.ylabel('Count')
plt.xticks(rotation=45)
plt.grid(axis='y')
plt.tight_layout()
plt.show()


# Plotting top categories by total installs
plt.figure(figsize=(10, 6))
catg_apps_ins.plot(kind='bar', color='lightgreen')
plt.title('Top 10 App Categories by Total Installs')
plt.xlabel('Category')
plt.ylabel('Total Installs')
plt.xticks(rotation=45)
plt.grid(axis='y')
plt.tight_layout()
plt.show()

##### 1. Why did you pick the specific chart?

Bar charts rank the ten largest categories by app count and by total installations, which shows where app supply and user demand differ.

##### 2. What is/are the insight(s) found from the chart?

1. The Family category has the highest number of apps, with Game following closely as the second-highest category by app count.

2. Game apps rank first in installations, while Communication apps hold the second position. Family apps rank sixth in installations among the categories.








##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with a specific reason.

Yes. Game apps lead in total installations, which points to strong demand in that category and an opportunity for developers. The Family category has the most apps but ranks only sixth in installations, so developers entering it face heavy competition. This is a caution rather than a sign of negative growth.

#### **Chart - 3 To determine the number of installations in paid and free apps.**

In [None]:
# Chart - 3 visualization code

# Total number of apps.
T_Apps_count=df_1['App'].count()
T_Apps_count


In [None]:
# Number of free and Paid apps
N_apps_Count=df_1.groupby('Type')['App'].count()
N_apps_Count

In [None]:
#Total number of Installation in paid & Free apps
T_num_apps=df_1.groupby('Type')['Installs'].sum().reset_index()
T_num_apps

In [None]:
# Convert values shown in exponential notation to integers
T_num_apps['Installs'] =T_num_apps['Installs'].astype('int64')
T_num_apps

In [None]:
#Chart Visualization:

fig=plt.figure(figsize=(14,4))
plt.subplot(1, 2, 1)

# percentage of free and Paid apps
N_apps_Count.plot(kind='pie', colors=['Blue', 'Red'], autopct='%1.1f%%',startangle =180, textprops={'fontsize': 14},explode = (0, 0.2))
plt.title('Total Percentage of Free and Paid Apps',fontsize=10,fontweight="bold")
plt.ylabel('')
plt.legend()

#Total number of Installation in paid & Free apps
plt.subplot(1, 2, 2)
sns.barplot(x='Type',y='Installs', data=T_num_apps, palette="Set2")
plt.title('Total Installations of Free and Paid Apps',fontsize=10,fontweight="bold")
plt.ylabel('Total Installs')


plt.show()

##### 1. Why did you pick the specific chart?

A pie chart shows the share of free and paid apps, and a bar chart compares their total installations.

##### 2. What is/are the insight(s) found from the chart?


The dataset contains a total of 10,346 apps: 9,584 free and 762 paid. About 92.6% of the apps are therefore free and 7.4% are paid. Free apps also account for far more installations. Total installations for paid apps are so small in comparison that their bar is barely visible in the bar chart.
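The percentages quoted above follow directly from the app counts (9,584 / 10,346 is about 92.6%). The sketch below shows one way to compute the counts, shares, and installation totals in a single table. `type_share` is a hypothetical helper, and the sample rows are made up.

```python
import pandas as pd


def type_share(df: pd.DataFrame) -> pd.DataFrame:
    """Number of apps, percentage share, and total installs for each Type."""
    counts = df["Type"].value_counts()
    share = (counts / counts.sum() * 100).round(1)
    installs = df.groupby("Type")["Installs"].sum()
    return pd.DataFrame({"apps": counts, "share_pct": share, "installs": installs})


# Hypothetical sample standing in for df_1
sample = pd.DataFrame({
    "Type": ["Free", "Free", "Free", "Paid"],
    "Installs": [100.0, 200.0, 300.0, 10.0],
})
print(type_share(sample))
```

Running `type_share(df_1)` on the cleaned data should reproduce the counts and shares quoted above.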

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with a specific reason.

Yes. Free apps dominate both in number and in installations, which suggests that offering an app for free is the most effective way to maximize user reach. The charts do not show any negative impact; the prevalence of free apps is a favorable scenario for business.

#### **Chart - 4 To identify the top apps by number of installations and the top apps by category.**


In [None]:
# Chart - 4 visualization code

#Top Apps Install
Top_app=df_1.groupby('App')['Installs'].sum().sort_values(ascending=False).head(10).astype('int64')
Top_app

In [None]:

# Top Apps category Wise

top_catagery_apps=df_1.groupby(['Category','App'])['Installs'].sum().sort_values(ascending=False).head(10).astype('int64')
top_catagery_apps

In [None]:
#Chart Visualization:

fig=plt.figure(figsize=(18,4))
plt.subplot(1, 2, 1)

#Top Apps Install
Top_app.plot(kind='pie', startangle=280, pctdistance=0.85,autopct='%1.1f%%')
plt.title('Top apps Install',fontsize=15,fontweight='bold')
plt.ylabel('')

centre_circle = plt.Circle((0,0),0.70,fc='white')
fig = plt.gcf()
fig.gca().add_artist(centre_circle)
fig.patch.set_facecolor('#95a3a6')

#Top Apps category Wise
plt.subplot(1, 2, 2)
top_catagery_apps.plot(kind='pie', startangle=280, pctdistance=0.85,autopct='%1.1f%%')
plt.title('Top Category apps Install',fontsize=15,fontweight='bold')
plt.ylabel('')

centre_circle = plt.Circle((0,0),0.70,fc='white')
fig = plt.gcf()
fig.gca().add_artist(centre_circle)
fig.patch.set_facecolor('#95a3a6')

plt.show()

##### 1. Why did you pick the specific chart?

The donut charts show how installations are distributed in two groups: the ten most-installed apps overall, and the ten most-installed (category, app) pairs.
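The second donut chart ranks the ten largest (category, app) pairs overall, so one category can appear several times while others do not appear at all. If the goal is literally the top app within each category, one approach is sketched below with a hypothetical helper, `top_app_per_category`. It takes the maximum rather than the sum of Installs per (category, app). This assumes that repeated rows for the same app are duplicate listings and should not be added together. The sample rows are made up.

```python
import pandas as pd


def top_app_per_category(df: pd.DataFrame) -> pd.DataFrame:
    """Return the most-installed app in each category, sorted by installs."""
    # One row per (Category, App); max avoids double-counting repeated listings
    totals = df.groupby(["Category", "App"], as_index=False)["Installs"].max()
    # Index label of the highest-install row within each category
    best_idx = totals.groupby("Category")["Installs"].idxmax()
    return (totals.loc[best_idx]
            .sort_values("Installs", ascending=False)
            .reset_index(drop=True))


# Hypothetical sample standing in for df_1
sample = pd.DataFrame({
    "Category": ["GAME", "GAME", "COMMUNICATION", "COMMUNICATION", "TOOLS"],
    "App": ["Game A", "Game B", "Chat A", "Chat A", "Tool A"],
    "Installs": [1e9, 5e8, 5e8, 5e8, 1e6],
})
print(top_app_per_category(sample))
```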

##### 2. What is/are the insight(s) found from the chart?

Among the top 10 installed apps, three gaming apps (Subway Surfers, Candy Crush Saga, and Temple Run) rank prominently.

Among the top 10 (category, app) pairs by installations, two are gaming apps, which underlines the popularity of games.

Gaming apps stand out with the highest installations among Play Store categories.

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with a specific reason.

Yes, these insights suggest a positive business impact. Game apps such as Subway Surfers, Candy Crush Saga, and Temple Run stand out among the most-installed apps. Gaming apps also hold prominent positions among the top (category, app) pairs. This highlights the strong popularity of gaming apps and points to a positive trend rather than a negative one.

#### **Chart - 5 Analyzing the relationship between categories and ratings.**

In [None]:
# Chart - 5 visualization code

plt.figure(figsize=(12, 6))
sns.boxplot(x='Category', y='Rating', data=df_1)
plt.xticks(rotation=90)
plt.title('Category vs Rating')
plt.xlabel('Category')
plt.ylabel('Rating')
plt.show()

##### 1. Why did you pick the specific chart?

A box plot per category compares the median, spread, and outliers of ratings across all categories in a single view.

##### 2. What is/are the insight(s) found from the chart?


The Family category, which has the most apps, shows ratings concentrated roughly between 4.2 and 4.6. Games shows ratings concentrated around 4.3 to 4.4 and also leads in installations. Together these suggest that consumers are more interested in apps with strong ratings.

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with a specific reason.

Yes. The category-rating analysis supports a positive business impact by guiding app development toward well-rated categories. None of the identified insights points to a risk of negative growth.

#### **Chart - 6 To explore the correlation between 'Installations', 'Reviews', 'Rating', and 'Price'.**

In [None]:
# Chart - 6 visualization code

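# Selecting relevant columns for correlation analysis.
# Reviews may still be stored as text, so convert every column to numeric first
# (values that cannot be parsed become NaN).
correlation_data = df_1[['Installs', 'Reviews', 'Rating', 'Price']].apply(pd.to_numeric, errors='coerce')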

In [None]:
# Calculating the correlation matrix
correlation_matrix = correlation_data.corr()


In [None]:
#Chart Visualization:

# Plotting the heatmap
plt.figure(figsize=(8, 6))
heatmap = sns.heatmap(correlation_matrix, annot=True, cmap='coolwarm', fmt=".2f")
plt.title('Correlation Heatmap between Installations, Reviews, Rating, and Price')
plt.show()


##### 1. Why did you pick the specific chart?


To determine the correlation between variables in the dataset.

##### 2. What is/are the insight(s) found from the chart?



Reviews and ratings appear to be the factors most associated with app installations, and they play an important role in shaping user engagement. Correlation alone does not show that they drive installations, however; popular apps naturally accumulate more reviews.
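Installs is a bucketed, heavily skewed value, and Reviews spans several orders of magnitude. As a result, a handful of very popular apps can dominate the Pearson coefficients in the heatmap. A rank-based Spearman correlation could be computed alongside it as a robustness check; this is an addition, not part of the original analysis. `rank_correlation` is a hypothetical helper, and the sample values are made up.

```python
import pandas as pd


def rank_correlation(df: pd.DataFrame, columns=("Installs", "Reviews", "Rating", "Price")) -> pd.DataFrame:
    """Spearman (rank) correlation matrix of the given columns, coerced to numeric."""
    numeric = df[list(columns)].apply(pd.to_numeric, errors="coerce")
    return numeric.corr(method="spearman")


# Hypothetical sample; Reviews is stored as text here on purpose
sample = pd.DataFrame({
    "Installs": [1_000, 10_000, 100_000, 1_000_000],
    "Reviews": ["10", "50", "300", "9000"],
    "Rating": [3.9, 4.1, 4.0, 4.5],
    "Price": [2.99, 0.0, 0.0, 0.0],
})
print(rank_correlation(sample).round(2))
```

The result of `rank_correlation(df_1)` could be passed to the same `sns.heatmap` call used above for side-by-side comparison.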

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with a specific reason.

Yes. The correlations point developers toward building review volume and maintaining good ratings, both of which are associated with higher installations. No negative insights surfaced that could adversely affect business decisions or growth strategies.

#### **Chart - 7 To visualize the distribution of rating.**

In [None]:
# Chart - 7 visualization code

plt.figure(figsize=(8, 6))
plt.hist(df_1['Rating'], bins=20, color='skyblue', edgecolor='black')  # Adjust the number of bins as needed
plt.xlabel('Ratings')
plt.ylabel('Frequency')
plt.title('Distribution of Ratings')
plt.grid(axis='y', alpha=0.5)
plt.show()

##### 1. Why did you pick the specific chart?


A histogram of the Rating column gives a visual picture of its distribution, showing the density and spread of app ratings.

##### 2. What is/are the insight(s) found from the chart?

Most app ratings fall between 4.0 and 4.5, which is a positive sign for business. A substantial share of apps receive favorable user feedback, which suggests good user sentiment and potential for sustained engagement. Part of the peak at the median (about 4.3) comes from filling missing ratings with the median, so the concentration is somewhat overstated.
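Missing ratings are filled with a single value (the median), so every filled row lands in the same histogram bin. The sketch below keeps track of which rows were imputed, so the distribution can also be plotted from the original ratings only. `fill_with_median` is a hypothetical helper and would have to run in place of the original fill step.

```python
import pandas as pd


def fill_with_median(ratings: pd.Series):
    """Fill NaN ratings with the median and return (filled, imputed_mask)."""
    imputed = ratings.isna()                   # True where the value was missing
    filled = ratings.fillna(ratings.median())  # median ignores NaN by default
    return filled, imputed


ratings = pd.Series([4.0, None, 4.6, 4.3, None])
filled, imputed = fill_with_median(ratings)
print(filled.tolist(), int(imputed.sum()))
```

In the notebook, this could be used as `df_1['Rating'], rating_imputed = fill_with_median(df_1['Rating'])`, followed by `plt.hist(df_1.loc[~rating_imputed, 'Rating'], bins=20)` to plot only the original ratings.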

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with a specific reason.

Yes. Insights about highly rated categories promise a positive business impact by guiding resources toward engaging apps. However, a prevalence of lower-rated apps may hurt business success by limiting user engagement and market reach.

## **5. Solution to Business Objective**

#### What do you suggest the client to achieve Business Objective ?
Explain Briefly.


### **Based on the project observations:**

**App Distribution:** 92.6% are free, 7.4% are paid, with higher installations for free apps.

**Content Ratings:** Majority belong to 'Everyone' category; 'Everyone 10+' apps, despite fewer in count, gather higher installations than 'Mature 17+'.

**Category Insights:** Family and gaming categories dominate in app count; gaming apps lead in installations.

**Top Installations:** Gaming apps dominate the top installations, led by Subway Surfers, Candy Crush Saga, and Temple Run.

**Ratings:** Most apps hover around a 4.3 rating; games tend to have higher ratings and installations.

**Consumer Engagement:** Higher ratings correlate with increased consumer interest and app downloads.

**Sentiment Analysis:** Facebook shows fewer installations but positive sentiment polarity in reviews.

### **Client Recommendations:**

**Decision-Making:** Leverage these insights for informed decisions in app development and marketing strategies.

**Performance Review:** Use visualizations for performance reviews, goal tracking, and identifying underperforming areas.

**Customer Experience:** Tailor marketing strategies based on ratings and sentiment to enhance customer experiences.

**Anomaly Detection:** Utilize visualizations to swiftly identify anomalies or outliers for prompt intervention.


# **Conclusion**


In the dynamic realm of mobile applications, this Play Store app analysis delved deep into the ecosystem, offering invaluable insights for developers, marketers, and publishers. The prevalence of free apps suggests a strategic focus on capturing a larger user base. For paid apps, adherence to a 40MB size limit emerges as a crucial factor for success.

Through meticulous data cleansing, preprocessing, and insightful visualizations, this analysis not only mitigated risks but also sparked innovative app-building ideas. It underscores the transformative power of data-driven strategies, steering businesses away from avoidable pitfalls and toward enhanced performance and user engagement.

Data visualization emerges as a pivotal tool, enabling clients to harness insights for strategic decision-making, competitive edge, and overall performance improvement. It illuminates avenues for growth, aids in identifying challenges, and facilitates impactful communication within and beyond organizations.

In summation, this project highlights the roadmap to app triumph within the Google Play Store, showcasing the transformative impact of data-driven analyses in the ever-evolving landscape of mobile applications.

### ***Hurrah! You have successfully completed your EDA Capstone Project !!!***