<a href="https://colab.research.google.com/github/nikesh11xx/eda-/blob/main/playstore_app_eda.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Project Name**    - Playstore App EDA

---





##### **Project Type**    - EDA



# **Project Summary -**



**Playstore App Dataset Summary**

The Playstore app dataset is a comprehensive collection of over 10,000 Android applications available on the Google Play Store. It provides valuable insights into the app ecosystem, including app categories, ratings, reviews, and installation counts.

**Key Findings:**

* **App Categories:** The dataset covers a wide range of app categories, including games, productivity tools, social networking apps, and more.
* **Ratings and Reviews:** The dataset includes both user ratings and review text for each app, providing valuable feedback on app quality and user experience.
* **Installation Counts:** The dataset provides information on the number of times each app has been installed, indicating its popularity among users.
* **App Size and Price:** The dataset includes data on app size and price, providing insights into the resource requirements and monetization strategies of different apps.

**Data Analysis and Visualization:**

The dataset can be analyzed using various techniques to uncover valuable insights about the Playstore app ecosystem. Data visualization tools can be used to explore app categories, ratings, and installation counts over time. Sentiment analysis can be performed on app reviews to gauge user sentiment towards specific apps or categories.

**Potential Applications:**

The Playstore app dataset has several potential applications, including:

* **App Recommendation:** The dataset can be used to build recommendation systems that suggest apps to users based on their preferences and past behavior.
* **App Market Analysis:** The dataset can be used to analyze trends and patterns in the app market, identifying emerging categories and popular app features.
* **App Optimization:** The dataset can be used to identify areas for improvement in existing apps, such as improving ratings, increasing installations, or optimizing app size.



# **GitHub Link -**

Provide your GitHub Link here.

# **Problem Statement**



**Problem Statement:**

Given the Playstore app dataset, the goal of this project is to analyze and understand the characteristics and trends of Android applications on the Google Play Store. The analysis should focus on identifying insights that can be used to improve app recommendations, optimize app market strategies, and enhance existing apps.

**Specific objectives include:**

* Identifying the most popular app categories and their growth over time.
* Analyzing user ratings and reviews to understand user sentiment towards different apps and categories.
* Investigating the relationship between app size, price, and installation counts.
* Exploring patterns and trends in app releases and updates.
* Identifying potential opportunities for app developers and publishers.

**The project aims to provide valuable insights that can be used to:**

* Improve the user experience and satisfaction with Playstore apps.
* Optimize app recommendations and discovery mechanisms.
* Inform app developers and publishers about market trends and user preferences.
* Drive innovation and growth in the Android app ecosystem.


# ***Let's Begin !***

## ***1. Know Your Data***

### Import Libraries

In [None]:
# Import Libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import warnings
warnings.filterwarnings('ignore')

### Dataset Loading

In [None]:
# Load Dataset
from google.colab import drive
drive.mount('/content/drive')

In [None]:
df = pd.read_csv('/content/drive/MyDrive/csv data/playstore app  dataset/Play Store Data (2).csv')
df_review = pd.read_csv('/content/drive/MyDrive/csv data/playstore app  dataset/User Reviews (2).csv')

In [None]:
# Show all rows and columns when displaying DataFrames
pd.set_option('display.max_rows',None)
pd.set_option('display.max_columns',None)

### Dataset First View

In [None]:
# Dataset First Look
df.head(5)

In [None]:
df_review.sample(5)

### Dataset Rows & Columns count

In [None]:
# Dataset Rows & Columns count

print(f'The total no of rows {df.shape[0]} \nThe total no of columns {df.shape[1]}')

### Dataset Information

In [None]:
# Dataset Info
df.info()

#### Duplicate Values

In [None]:
# Dataset Duplicate Value Count
df.duplicated().sum()

#### Missing Values/Null Values

In [None]:
# Missing Values/Null Values Count
print(f'Missing values in all columns \n\n{df.isnull().sum()}')
print('*' * 50)
print(f'Missing value in percentage \n\n{(df.isnull().sum()/len(df))*100}')

In [None]:
# Visualizing the missing values
plt.figure(figsize=(6,4))
sns.heatmap(df.isnull(),cmap='viridis',cbar=False,yticklabels=False)
plt.title('Missing Values Heatmap')

### What did you know about your dataset?

- The Rating column has 1474 missing values.
- The Current Ver column has 8 missing values and the Android Ver column has 3.


## ***2. Understanding Your Variables***

In [None]:
# Dataset Columns
print(df.columns)

In [None]:
# Dataset Describe
df.describe(include='all')

### Variables Description

**Column Name** : **Data Type** : **Description**

- App : object : Name of the app
- Category : object : Category of the app
- Rating : float : Overall rating of the app
- Reviews : object : Number of reviews for the app (converted to int during data wrangling)
- Size : object : Size of the app
- Installs : object : Number of times the app has been installed
- Type : object : Type of the app (Free, Paid, etc.)
- Price : object : Price of the app
- Content Rating : object : Content rating of the app
- Genres : object : Genres of the app
- Last Updated : object : Date when the app was last updated
- Current Ver : object : Current version of the app
- Android Ver : object : Minimum Android version required to install the app

## 3. ***Data Wrangling***

### Data Wrangling Code

In [None]:
df.head(1)

## Handling the Size column

In [None]:
df['Size'].unique()

In [None]:
df['Size'].isnull().sum()

### Observation:
The Size column contains three kinds of values:
1. Values ending in `M` (megabytes)
2. Values ending in `k` (kilobytes)
3. 'Varies with device'

There are no missing values in the Size column.

Verify the count of each of the three kinds of values:

In [None]:
def cnt_vl(lst):
  count_M = 0
  count_k =0
  count_Varies_with_device = 0
  for i in lst:
    if 'M'in i:
      count_M +=1
    elif 'k' in i:
      count_k +=1
    else:
      count_Varies_with_device +=1
  return f'count_M is {count_M}, count_k is {count_k}, count_Varies_with_device is {count_Varies_with_device}'

cnt_vl(df['Size'])

Converting the whole Size column into bytes ('Varies with device' becomes a null value)

In [None]:
# creating a helper function
def convert_size(size):
  if isinstance(size,str):
    if 'M'in size:
      return float(size.replace('M','')) * 1024 * 1024
    elif 'k' in size:
      return float(size.replace('k',''))* 1024
    else:
      return np.nan
  return size
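The same conversion can also be done without a Python-level loop. The following is a minimal sketch, assuming every sized entry is a number followed by `M` or `k`; the sample values and the names `sizes`, `parts`, and `size_in_bytes` are illustrative and not part of the notebook.

```python
import numpy as np
import pandas as pd

# Hypothetical sample mirroring the three kinds of values seen in 'Size'
sizes = pd.Series(['19M', '201k', 'Varies with device', '8.7M'])

# Split each entry into a numeric part and a unit suffix (M or k);
# rows that do not match the pattern get NaN in both columns
parts = sizes.str.extract(r'^(?P<number>[\d.]+)(?P<unit>[Mk])$')

# Map the unit to a byte multiplier and multiply; unmatched rows stay NaN
multiplier = parts['unit'].map({'M': 1024 * 1024, 'k': 1024})
size_in_bytes = parts['number'].astype(float) * multiplier

print(size_in_bytes)
```

On the real data this would replace the `apply(convert_size)` call; the row-wise helper is easier to read, while the vectorized form tends to be faster on large columns.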



In [None]:
df['Size']

In [None]:
df['Size'] = df['Size'].apply(convert_size)

In [None]:
df['Size']

In [None]:
# renaming the Size column to Size_in_bytes

df.rename(columns={'Size':'Size_in_bytes'},inplace=True)

In [None]:
df['Size_in_Mb'] = df['Size_in_bytes'].apply(lambda x: x/(1024*1024))

In [None]:
# df['Size_in_kb'] = df['Size_in_bytes'].apply(lambda x :x/(1024))

In [None]:
df.head(1)

## Installs column
- Remove the `+` sign
- Remove the `,` separator
- Convert the column's data type to int
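The three steps above can be sketched on a small sample with pandas string methods. This is only an illustration: the sample values and the names `installs` and `cleaned` are assumptions, not taken from the dataset.

```python
import pandas as pd

# Hypothetical sample of raw 'Installs' strings
installs = pd.Series(['10,000+', '500,000+', '0', '1,000,000,000+'])

# Strip the '+' sign and the ',' separators as literal text, then cast to int
cleaned = (
    installs.str.replace('+', '', regex=False)
            .str.replace(',', '', regex=False)
            .astype('int64')
)

print(cleaned.tolist())
```

Passing `regex=False` matters here because `+` is a special character in regular expressions.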

In [None]:
df['Installs'].unique()

In [None]:
df['Installs'].value_counts()

In [None]:
df['Installs'].isnull().sum()

In [None]:
df['Installs'] = df['Installs'].replace('Free','1,000+')

In [None]:
df['Installs'].value_counts()

In [None]:
# helper function to remove the + and , signs and convert the value to int

def convert_in_int(value):
  '''Remove the + and , signs from the value
  and convert it to int.
  '''

  value = value.replace('+','').replace(',','')
  # converting to integer
  return int(value)

In [None]:
# calling the function

df['Installs'] = df['Installs'].apply(convert_in_int)

In [None]:
df['Installs']

In [None]:
# the same change can also be made with lambda functions
# (a no-op at this point, because the column is already int)

df['Installs'] = df['Installs'].apply(lambda x : x.replace('+','') if '+' in str(x) else x)
df['Installs'] = df['Installs'].apply(lambda x : x.replace(',','') if ',' in str(x) else x)
df['Installs'] = df['Installs'].apply(lambda x : int(x))

In [None]:
# making a new column called 'Install_category' which will  have the category of installs
bins = [-1,0,10,1000,10000,100000,1000000,10000000,10000000000]
labels = ['no','Very low','Low','Moderate','More than moderate','High','Very High','Top Notch']
df['Install_category'] = pd.cut(df['Installs'],bins=bins,labels=labels)
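To see how these bins assign labels, here is a small sketch on made-up install counts. `pd.cut` uses right-closed intervals by default, so a value equal to a bin edge falls into the lower bin (for example, 10 lands in 'Very low', not 'Low'). The `sample` values are illustrative assumptions.

```python
import pandas as pd

bins = [-1, 0, 10, 1000, 10000, 100000, 1000000, 10000000, 10000000000]
labels = ['no', 'Very low', 'Low', 'Moderate', 'More than moderate',
          'High', 'Very High', 'Top Notch']

# Hypothetical install counts, including values that sit exactly on bin edges
sample = pd.Series([0, 5, 10, 1000, 5000, 1000000, 1000000000])

# Each interval is (left, right], so edge values belong to the lower bin
categories = pd.cut(sample, bins=bins, labels=labels)

print(pd.DataFrame({'Installs': sample, 'Install_category': categories}))
```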

In [None]:
df['Install_category'].value_counts()

## Price column

In [None]:
df['Price'].dtype

In [None]:
df['Price'].value_counts()

In [None]:
count_with_dollar = len([i for i in df['Price'] if i.startswith('$')])
count_without_dollar = len([i for i  in df['Price'] if  not i.startswith('$')])
print('start with $ ',count_with_dollar)
print('start without $ ', count_without_dollar)

### Observation
- The `$` sign has to be removed from 800 rows
- The data type of the Price column has to be changed from object to float
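Both steps can be sketched in one pass with `pd.to_numeric`. This is a hedged illustration on made-up values: with `errors='coerce'`, any entry that is not a number (such as a stray 'Everyone') becomes NaN instead of raising an error, so it can be inspected or dropped afterwards.

```python
import pandas as pd

# Hypothetical sample of raw 'Price' strings, including one malformed entry
prices = pd.Series(['0', '$4.99', '$399.99', 'Everyone'])

# Remove the literal '$' and convert to float; non-numeric text becomes NaN
price_float = pd.to_numeric(prices.str.replace('$', '', regex=False),
                            errors='coerce')

print(price_float)
```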

In [None]:
# removing the $ sign from the price column
df['Price'] = df['Price'].apply(lambda x : x.replace('$','')if '$' in str(x) else x)

In [None]:
# removing the row which has the value 'Everyone' in the Price column
df.drop(df[df['Price']=='Everyone'].index,inplace=True)


In [None]:
df['Price'].unique()

In [None]:
# changing the data type in to float
df['Price'] = df['Price'].apply(lambda x : float(x))

In [None]:
df['Price']

## Handling the Reviews column

In [None]:
df['Reviews'].dtype

In [None]:
# converting the data type of the Reviews column from object to int
df['Reviews']= df['Reviews'].astype(int)

## Last Updated column

In [None]:
print(df['Last Updated'])
df['Last Updated'] = pd.to_datetime(df['Last Updated'])

Creating year and month from 'Last Updated' column


In [None]:
df['year'] = df['Last Updated'].dt.year
df['month'] = df['Last Updated'].dt.month
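As a small illustration of the date handling, the sketch below parses a few sample strings and derives the year and month. The date format `'%B %d, %Y'` (for example 'January 7, 2018') is an assumption about how 'Last Updated' is written; adjust it if the column looks different.

```python
import pandas as pd

# Hypothetical 'Last Updated' strings in an assumed "Month day, year" format
last_updated = pd.Series(['January 7, 2018', 'August 1, 2018', 'June 20, 2018'])

# An explicit format avoids ambiguous parsing and fails loudly on bad rows
parsed = pd.to_datetime(last_updated, format='%B %d, %Y')

year = parsed.dt.year
month = parsed.dt.month

print(pd.DataFrame({'parsed': parsed, 'year': year, 'month': month}))
```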

## Checking missing values after the column changes

In [None]:
df.isnull().sum()

In [None]:
print(f'missing value in dataset \n\n{(df.isnull().sum()/len(df))*100}')


In [None]:
# plotting a heatmap to visualize the missing values
plt.figure(figsize=(6,4))
sns.heatmap(df.isnull(),cmap='viridis',yticklabels=False,cbar=False)

### Observation:
- The Rating column has 1474 missing values, which is 13.59% of its rows
- The Size_in_bytes column has 1695 missing values, which is 15.63% of its rows
- The Current Ver column has 8 missing values, which is 0.07% of its rows
- The Android Ver column has 2 missing values, which is 0.01% of its rows
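Counts and percentages like these can be collected in a single table. The sketch below uses a tiny made-up frame; `sample` and `missing` are illustrative names only.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with a few missing values
sample = pd.DataFrame({
    'App': ['a', 'b', 'c', 'd'],
    'Rating': [4.1, np.nan, 4.5, np.nan],
    'Size_in_bytes': [1.0, 2.0, np.nan, 4.0],
})

# isnull().sum() counts missing cells per column;
# isnull().mean() is the fraction of rows that are missing in that column
missing = pd.DataFrame({
    'missing_count': sample.isnull().sum(),
    'missing_pct': sample.isnull().mean() * 100,
})

print(missing)
```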

## Dealing with missing value

Finding the correlation between the Rating and Installs columns

In [None]:
# makes a correlation matrix of number columns
plt.figure(figsize=(6,4))
numeric_cols = [i for i in df.columns if df[i].dtype == 'int64' or df[i].dtype == 'float64']

corr = df[numeric_cols].corr()
sns.heatmap(corr,annot=True,cmap='viridis')

In [None]:
df[numeric_cols].corr()

### Observation:
- The Rating column depends on the Installs column: an app with 0 installations cannot have received a rating, so missing ratings should not simply be filled with an overall median or mode

In [None]:
df.isnull().sum()

In [None]:
df.dropna(subset=['Type','Current Ver','Android Ver'],inplace= True)

In [None]:
df.isnull().sum()

- We have removed 11 rows that had null values in the Type, Current Ver, or Android Ver columns

### Observations:
- Only the Rating and Size_in_bytes columns are left with missing values
- The Rating column is directly linked with the Installs column
- In the Size column, the 'Varies with device' values were already converted to null values. We do not need to impute them at the moment, because every app has a different size and it cannot be predicted reliably

In [None]:
df.columns

In [None]:
# using the groupby function to find the trend of Rating in each Install_category
df.groupby('Install_category')['Rating'].describe()

In [None]:
df['Install_category'].loc[df['Rating'].isnull()].value_counts()

In [None]:
# df[df['Rating'].isnull()].groupby('Install_category').size()


In [None]:
# plotting a boxplot of Rating in each Install_category
plt.figure(figsize=(12,6))
sns.boxplot(y=df['Rating'],x=df['Install_category'],hue=df['Install_category'])

### Observation:
- The rating is increasing with an increase in the number of installations
- For the 'Top Notch' category, the median rating is around 4.5
- For the 'No' category, the median rating is around 4.0
- For the 'Very low' category, the median rating is around 4.1
- For the 'Low' category, the median rating is around 4.2
- For the 'Moderate' category, the median rating is around 4.3
- For the 'More than moderate' category, the median rating is around 4.4
- For the 'High' category, the median rating is around 4.4
- For the 'Very High' category, the median rating is around 4.5
- The rating is not normally distributed in any of the categories
- There are outliers in all categories
- The rating is skewed to the left in all categories: most ratings are high, with a tail of low ratings
- The rating is positively correlated with the number of installations
- Within the 'Top Notch' category, the rating is not affected by the number of installations
- In every other category ('No', 'Very low', 'Low', 'Moderate', 'More than moderate', 'High', and 'Very High'), the rating is affected by the number of installations

In [None]:
# Calculate the mean of the 'Rating' column for each Install_category

df.groupby(['Install_category'])['Rating'].mean()


In [None]:
df['Install_category'].loc[df['Rating'].isnull()].value_counts()

In [None]:
# Calculate the mean rating for each category
mean_rating_by_category = df.groupby(['Install_category'])['Rating'].mean()

# Fill missing values in 'Rating' column with the mean rating for each category
# (assign the result back instead of using inplace=True on a column selection)
df['Rating'] = df['Rating'].fillna(df['Install_category'].map(mean_rating_by_category).astype(float))


In [None]:
df['Install_category'].loc[df['Rating'].isnull()].value_counts()

In [None]:
m = df[df['Rating'].isnull()].index
df.loc[m,'Rating']= df['Rating'].mean()
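The two-stage fill used above (group mean first, overall mean for whatever is left) can be sketched on a tiny made-up frame. This version uses `groupby(...).transform('mean')`, which returns the group mean aligned to every row; it is one possible way to express the idea, not the notebook's exact code.

```python
import numpy as np
import pandas as pd

# Hypothetical data: the 'no' group has no ratings at all
sample = pd.DataFrame({
    'Install_category': ['Low', 'Low', 'High', 'High', 'no'],
    'Rating': [4.0, np.nan, 4.4, np.nan, np.nan],
})

# Stage 1: fill each missing rating with the mean of its install category
group_mean = sample.groupby('Install_category')['Rating'].transform('mean')
sample['Rating'] = sample['Rating'].fillna(group_mean)

# Stage 2: groups with no ratings are still NaN, so fall back to the overall mean
sample['Rating'] = sample['Rating'].fillna(sample['Rating'].mean())

print(sample)
```

The fallback is needed because a category in which every rating is missing has no mean of its own.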

## Checking duplicates

In [None]:
df.duplicated().sum()

In [None]:
# removing the duplicates
df.drop_duplicates(inplace=True)

In [None]:
df.duplicated().sum()

In [None]:
df.shape


### What all manipulations have you done and insights you found?

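- Converted Size to bytes (Size_in_bytes) and added Size_in_Mb; 'Varies with device' became a null value
- Removed the `+` and `,` signs from Installs, converted it to int, and binned it into Install_category
- Removed the `$` sign from Price, dropped the row whose Price was 'Everyone', and converted the column to float
- Converted Reviews to int and Last Updated to datetime, then derived the year and month columns
- Dropped the 11 rows with null values in Type, Current Ver, or Android Ver
- Filled missing Rating values with the mean rating of each Install_category, then with the overall mean for any that remained
- Removed duplicate rows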

## ***4. Data Visualization, Storytelling & Experimenting with charts : Understand the relationships between variables***

#### Chart - 1

In [None]:

# This bar plot displays the top 10 app categories based on the count of apps.
x=df['Category'].value_counts().nlargest(10)
# Create a bar plot using seaborn
sns.barplot(y=x.index,x=x.values,palette='winter')
# Add text labels to each bar indicating the count of each category
for i in range(len(x)):
   # Place the count text at the corresponding position on the bar
  plt.text(x.values[i],i,str(x.values[i]),va='center',ha='left')

##### What is/are the insight(s) found from the chart?

- The Family category has the highest app count, 1938
- The Game category has the 2nd highest app count, 1121
- The Tools category has the 3rd highest app count, 841

#### Chart - 2

In [None]:


# Filter the DataFrame to include only rows where the 'Type' column is 'Paid'
paid_df = df[df['Type'] == 'Paid']

# Group the filtered DataFrame by 'Category' and calculate the mean of the 'Price' column for each group
mean_price_by_category = paid_df.groupby('Category')['Price'].mean().sort_values()

# Plotting
plt.figure(figsize=(10, 10))  # Adjust figure size if needed
mean_price_by_category.plot(kind='barh', color='lightgreen')

# Adding labels and title
plt.xlabel('Average Price')
plt.ylabel('Category')
plt.title('Average Price of Paid Items by Category')

# Add value labels on each bar
for index, value in enumerate(mean_price_by_category):
    plt.text(value, index, str(round(value, 2)))

# Display the plot
plt.show()



##### What is/are the insight(s) found from the chart?

- The average price of a paid Finance app is 170.64
- The average price of a paid Lifestyle app is 124.26
- The average price of a paid Events app is 109.99

#### Chart - 3

In [None]:
# Chart - 3 visualization code
# the top 10 categories with the highest summed ratings
top_rating = df.groupby(df['Category'])['Rating'].sum().sort_values(ascending=False).nlargest(10)

# Create a bar plot using seaborn
ax = sns.barplot(x=top_rating.values, y=top_rating.index, palette='Set1')

# Adding labels and title
plt.xlabel('Summed Ratings')  # X-axis label
plt.ylabel('Category')  # Y-axis label
plt.title('Top 10 Categories by Summed Ratings')  # Title

# Add labels on the bars
for i, v in enumerate(top_rating.values):
    ax.text(v + 0.1, i, str(round(v, 2)), color='black', va='center')  # Adjust the position and color of labels

# Display the plot
plt.show()



#####  What is/are the insight(s) found from the chart?

- Family category has the highest summed rating which is 8147.64
- Game category has the 2nd highest summed rating which is 4800.04
- Tools category has the 3rd highest summed rating which is 3432.11

#### Chart - 4

In [None]:
# Chart - 4 visualization code
# plotting the rating distribution (histplot replaces the deprecated distplot)
plt.figure(figsize=(8,4))
sns.histplot(df['Rating'],kde=True,stat='density',color='blue')

##### 2. What is/are the insight(s) found from the chart?

- Most ratings fall in the range 3.8 to 4.7
- A few apps have a rating of 1

#### Chart - 5

In [None]:
# Chart - 5 visualization code
import matplotlib.pyplot as plt

# counting the number of paid and free apps
type_counts = df['Type'].value_counts()

# Plotting the pie chart
plt.figure(figsize=(10, 6))
explode = [0.1] * len(type_counts)  # Explode slices for emphasis, adjust as needed
plt.pie(type_counts, labels=type_counts.index, autopct='%1.1f%%', startangle=140, explode=explode, shadow=True, pctdistance=0.85)
plt.axis('equal')  # Equal aspect ratio ensures that pie is drawn as a circle.
plt.title('Distribution of Types')
plt.show()


##### 2. What is/are the insight(s) found from the chart?

- 7.4% of the apps are paid and 92.6% are free
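The percentages shown on the pie chart can also be computed directly, which makes them easy to quote in the text. A minimal sketch on made-up data, where `app_types` and `share` are illustrative names:

```python
import pandas as pd

# Hypothetical 'Type' column: nine free apps and one paid app
app_types = pd.Series(['Free'] * 9 + ['Paid'])

# normalize=True returns fractions instead of counts
share = (app_types.value_counts(normalize=True) * 100).round(1)

print(share)
```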

#### Chart - 6

In [None]:
# Chart - 6 visualization code
paid_app = df[df['Type']=='Paid']
# calculating the average price of paid apps
average_price = paid_app['Price'].mean()
print(f'The average price of a paid app is {average_price}')

In [None]:
# Counting the occurrences of each category in the 'Content Rating'
cn = df['Content Rating'].value_counts().sort_values(ascending=False)
sns.barplot(x = cn.values,y=cn.index,palette='viridis')

# adding count labels on top of each bar
for i in range(len(cn)):
  plt.text(cn.values[i],i,str(cn.values[i]),va='center',ha='left')

plt.xticks(rotation=45)
plt.show()

##### 2. What is/are the insight(s) found from the chart?

Answer Here

#### Chart - 7

In [None]:
# Chart - 7 visualization code
# Get the counts of the top 10 genres
gn = df['Genres'].value_counts().head(10)

# Plotting the horizontal bar plot
plt.figure(figsize=(10, 6))
sns.barplot(x=gn.values, y=gn.index, palette='winter')

# Adding count labels on the right side of each bar
for i in range(len(gn)):
    plt.text(gn.values[i], i, str(gn.values[i]), ha='left', va='center')

plt.xlabel('Count')
plt.ylabel('Genres')
plt.title('Top 10 Genres')
plt.show()


##### 2. What is/are the insight(s) found from the chart?

On the basis of the genres:
- Tools has the highest app count, 840
- Entertainment has the 2nd highest app count, 587
- Education has the 3rd highest app count, 526

#### Chart - 8

In [None]:
df.pivot_table(index='year', columns='month', aggfunc='size', fill_value=0)


In [None]:
# Chart - 8 visualization code
plt.figure(figsize=(10,3))
over_time = df.groupby(['year','month']).size()
over_time.plot(color='red', linestyle='-')
plt.title('Trend of App Updates Over Time')
plt.xlabel('Year-Month')
plt.ylabel('Number of Updates')
plt.xticks(rotation=45)
plt.grid(True)
plt.show()

##### 1. Why did you pick the specific chart?

- The number of updates increases gradually from 2018
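An alternative way to build the monthly series is to convert each date to a monthly period, which gives a single, chronologically sorted year-month index. The sketch below uses made-up dates; `dates` and `updates_per_month` are illustrative names.

```python
import pandas as pd

# Hypothetical 'Last Updated' values, already parsed as datetimes
dates = pd.to_datetime(pd.Series(['2018-01-07', '2018-01-15',
                                  '2018-08-01', '2017-12-30']))

# Collapse each date to its month, count rows per month, sort by time
updates_per_month = dates.dt.to_period('M').value_counts().sort_index()

print(updates_per_month)
```

Calling `updates_per_month.plot()` would then draw the trend with a proper time axis.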

#### Chart - 9

In [None]:
# Chart - 9 visualization code

# plotting the distribution of ratings for each category, one subplot per category

fig,axes = plt.subplots(nrows=len(df['Category'].unique()),figsize=(8,4*len(df['Category'].unique())))

# Iterate through each category and draw a histogram with a KDE curve
# (histplot replaces the deprecated distplot)
for i , category in enumerate(df['Category'].unique()):
  sns.histplot(df[df['Category']==category]['Rating'].dropna(),kde=True,stat='density',ax=axes[i])
  axes[i].set_title(f'Distribution of Ratings for {category}')
  axes[i].set_xlabel('Rating')
  axes[i].set_ylabel('Density')

plt.tight_layout()
plt.show()

In [None]:
# Create a FacetGrid for each category
g = sns.FacetGrid(df, col="Category", col_wrap=2, height=3, aspect=1.5)

# Map a histplot of Ratings for each category
g.map(sns.histplot, "Rating", kde=True)

# Set titles
g.set_titles("Distribution of Ratings for {col_name}")

plt.show()

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here


#### Chart - 10

In [None]:
# Chart - 10 visualization code


# Count the apps of each type once, so the bars and labels share the same order
type_counts = df['Type'].value_counts()

# Create the barplot
plt.figure(figsize=(8, 6))
sns.barplot(x=type_counts.index, y=type_counts.values, palette='Set1')

# Add labels on top of the bars
for index, value in enumerate(type_counts.values):
    plt.text(index, value + 100, f'{value}', ha='center')

plt.xlabel('App Type')
plt.ylabel('Number of Apps')
plt.title('Number of Apps by App Type')
plt.show()


- There are 9584 free apps and 762 paid apps

#### Chart - 11

In [None]:
# Chart - 11 visualization code
plt.figure(figsize=(12,8))
sns.barplot(x=df['Install_category'],y=df['Size_in_Mb'])



- Apps in the 'Top Notch' install category have the largest average size

#### Chart - 12

In [None]:
# Chart - 12 visualization code
# how many apps fall under each content rating

# Count the apps per content rating once, so the bars and labels share the same order
content_counts = df['Content Rating'].value_counts()

# Create the barplot
plt.figure(figsize=(12, 8))
sns.barplot(x=content_counts.index, y=content_counts.values)

# Add labels on the bars with the count of apps in each content rating
for i, value in enumerate(content_counts.values):
    plt.text(i, value, f'{value}', ha='center', va='bottom')

# Set labels and title
plt.xlabel('Content Rating')
plt.ylabel('Number of Apps')
plt.title('Number of Apps by Content Rating')

plt.show()



- Apps with the 'Everyone' content rating are the most numerous, with 8372 apps
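The chart above counts apps per content rating. To compare content ratings by installs as well, the count and the total installs can be computed side by side. This is a sketch on made-up numbers; `sample` and `summary` are illustrative names.

```python
import pandas as pd

# Hypothetical rows with a content rating and an install count
sample = pd.DataFrame({
    'Content Rating': ['Everyone', 'Everyone', 'Teen', 'Mature 17+'],
    'Installs': [1000, 5000, 100000, 500],
})

# Named aggregation: number of apps and summed installs per content rating
summary = (
    sample.groupby('Content Rating')['Installs']
          .agg(app_count='count', total_installs='sum')
          .sort_values('total_installs', ascending=False)
)

print(summary)
```

In this toy sample the rating with the most apps is not the one with the most installs, which is why the two measures are worth keeping apart.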

#### Chart - 13

In [None]:
# Chart - 13 visualization code
category_counts = df['Category'].loc[df['Content Rating']=='Everyone'].value_counts()

plt.figure(figsize=(12,4))
# Plotting the counts in a line plot
category_counts.plot(kind='line')

# Adding title and labels
plt.title('Category Counts for Content Rating: Everyone')
plt.xlabel('Category')
plt.ylabel('Count')

# Display the plot
plt.show()


Answer Here.

#### Chart - 14 - Correlation Heatmap

In [None]:
# Correlation Heatmap visualization code
plt.figure(figsize=(6,4))
numeric_cols = [i for i in df.columns if df[i].dtype == 'int64' or df[i].dtype == 'float64']

corr = df[numeric_cols].corr()
sns.heatmap(corr,annot=True,cmap='viridis')

**Key Observations:**

- There is a strong positive correlation between 'Rating' and 'Reviews', indicating that apps with higher ratings tend to have more reviews.
- There is a moderate positive correlation between 'Installs' and 'Reviews', suggesting that apps with more installations tend to have more reviews.
- There is a weak positive correlation between 'Price' and 'Installs', indicating that apps with higher prices tend to have more installations.
- There is a weak negative correlation between 'Size_in_Mb' and 'Installs', suggesting that apps with larger sizes tend to have fewer installations.
- There is a weak positive correlation between 'year' and 'Rating', indicating that apps last updated in recent years tend to have higher ratings.
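Statements like these are easier to check against a ranked list of correlation pairs than against the heatmap alone. The sketch below keeps each pair once (the upper triangle of the matrix) and sorts by absolute strength. The data is made up, and `sample` and `pairs` are illustrative names.

```python
import numpy as np
import pandas as pd

# Hypothetical numeric columns; Installs is exactly 10x Reviews here
sample = pd.DataFrame({
    'Rating': [4.0, 4.5, 3.5, 4.2],
    'Reviews': [10, 20, 30, 40],
    'Installs': [100, 200, 300, 400],
})

corr = sample.corr()

# Keep only the cells above the diagonal so each pair appears once
mask = np.triu(np.ones(corr.shape, dtype=bool), k=1)
pairs = corr.where(mask).stack().dropna()

# Strongest relationships first, regardless of sign
pairs = pairs.sort_values(key=abs, ascending=False)

print(pairs)
```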


#### Chart - 15 - Pair Plot

In [None]:
# Pair Plot visualization code
# Drop non-numeric columns if necessary
numeric_df = df.select_dtypes(include=['float64', 'int64'])

# Plotting pairplot
sns.pairplot(numeric_df)
plt.show()


#### Chart - 16 - Wordcloud

In [None]:
from wordcloud import WordCloud
text = " ".join(df['App'].values)
wordcloud = WordCloud(max_font_size=150, max_words=100, background_color="white")
wordcloud.generate(text)
plt.figure(figsize=(12,8))
plt.imshow(wordcloud, interpolation="bilinear")
plt.axis("off")
plt.show()


## **5. Solution to Business Objective**

#### What do you suggest the client to achieve Business Objective ?
Explain Briefly.



- **Focus on user reviews and ratings:**
    - Encourage users to leave reviews and ratings by providing incentives or making the process easier.
    - Monitor and respond to reviews promptly, especially negative ones.
    - Use reviews and ratings to identify areas for improvement in your app.
- **Optimize app size:**
    - Keep app size as small as possible to improve installation rates.
    - Consider offering a "lite" version of your app for devices with limited storage.
- **Target the right audience:**
    - Use app store optimization (ASO) to ensure your app is visible to relevant users.
    - Consider running targeted advertising campaigns to reach your ideal customers.
- **Offer a compelling value proposition:**
    - Highlight the unique features and benefits of your app.
    - Consider offering a free trial or freemium model to attract new users.
- **Monitor and track key metrics:**
    - Track key metrics such as installs, reviews, and ratings to measure your app's performance.
    - Use this data to make informed decisions about how to improve your app and marketing strategy.

# **Conclusion**



In conclusion, the analysis of the app store data provided valuable insights into the factors that influence app success. By understanding the key trends and correlations, we can make informed decisions to improve app quality, increase visibility, and ultimately drive more installs and revenue.

The findings highlight the importance of user reviews and ratings, app size optimization, targeting the right audience, offering a compelling value proposition, and continuously monitoring and tracking key metrics. By leveraging these insights, app developers and marketers can significantly improve their chances of success in the competitive app market.

By implementing the recommended strategies, we can enhance the overall user experience, attract and retain a larger user base, and ultimately achieve the desired business objectives.

### ***Nikesh singh !!!***