<a href="https://colab.research.google.com/github/SHAMBAMAJUMDER/Play-Store-App-Review-Analysis/blob/main/Play_Store_App_Review_Analysis.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Project Name**    - Play Store App Review Analysis



##### **Project Type**    - EDA
##### **Contribution**    - Individual

# **Project Summary -** Analysis of Play Store App Reviews

**Objective**:
Analyze data from the Google Play Store to gain insights into app performance, user ratings, and sentiments from user reviews.

**Data Sources**:
1. **Play Store Data**: Contains information about various apps available on the Google Play Store, including their ratings, number of reviews, size, number of installs, type (free or paid), price, content rating, genres, last updated date, current version, and minimum Android version required.
2. **User Reviews**: Contains user reviews for these apps, including the translated review text, sentiment analysis (positive, negative, neutral), and sentiment polarity and subjectivity scores.

**Key Metrics and Analysis**:
1. **App Popularity**: Based on the number of installs and user ratings.
2. **User Engagement**: Evaluated through the number of reviews.
3. **App Quality**: Assessed using average user ratings and sentiments from reviews.
4. **Sentiment Analysis**: Understanding user satisfaction and feedback by analyzing the sentiment polarity and subjectivity of reviews.
5. **Market Trends**: Identifying popular app categories and genres.

**Potential Insights**:
- Distribution of app ratings and identifying highly rated apps.
- Correlation between the number of installs and user ratings.
- Distribution of sentiments in user reviews across different app categories.
- Common themes and keywords in positive and negative reviews.
- Trends in app updates and version changes over time.

Next steps involve detailed analysis of the data to extract meaningful insights and patterns.

# **GitHub Link -**

Provide your GitHub Link here.

# **Problem Statement**


**Objective**: To analyze Google Play Store data and user reviews to identify factors influencing app performance, user satisfaction, and market trends.

### Key Questions to Address:

1. **App Performance**:
   - What are the key factors that influence an app's rating and popularity on the Google Play Store?
   - How do different app categories compare in terms of ratings, reviews, and installs?

2. **User Satisfaction**:
   - What are the common themes and sentiments expressed in user reviews?
   - How do user sentiments correlate with app ratings and the number of installs?

3. **Market Trends**:
   - What are the most popular app categories and genres?
   - How do trends in app updates and version changes affect user satisfaction and ratings?

### Specific Problems to Solve:

1. **Rating Analysis**:
   - Determine the distribution of app ratings and identify the characteristics of highly rated apps.
   - Analyze the relationship between app ratings and factors such as size, price, and content rating.

2. **Review Analysis**:
   - Perform sentiment analysis on user reviews to understand the overall user satisfaction.
   - Identify common keywords and themes in positive and negative reviews.
   - Correlate review sentiments with app ratings and installs.

3. **Install Analysis**:
   - Examine the distribution of app installs and identify patterns among the most installed apps.
   - Assess the impact of app updates and version changes on the number of installs.

4. **Category and Genre Analysis**:
   - Compare different app categories and genres in terms of ratings, reviews, and installs.
   - Identify the most and least popular categories and genres.

5. **Impact of App Features**:
   - Investigate how features such as app size, type (free or paid), and minimum Android version influence app performance and user satisfaction.

### Goals:

- Provide actionable insights for app developers to improve their app performance and user satisfaction.
- Help potential users make informed decisions when choosing apps.
- Identify trends and patterns in the app market to inform future app development and marketing strategies.

### Methodology:

1. **Data Cleaning**: Handle missing values and ensure data consistency.
2. **Descriptive Analysis**: Summarize key statistics and distributions for various metrics.
3. **Sentiment Analysis**: Use natural language processing techniques to analyze the sentiments in user reviews.
4. **Correlation and Regression Analysis**: Identify relationships between different variables (e.g., ratings, installs, sentiments).
5. **Visualization**: Create charts and graphs to visualize data trends and patterns.

By addressing these questions and problems, we aim to gain a comprehensive understanding of the factors driving app performance and user satisfaction on the Google Play Store.

#### **Define Your Business Objective?**

**Objective**: To leverage data-driven insights from Google Play Store data and user reviews to enhance app performance, boost user satisfaction, and inform strategic decisions for app development and marketing.

### Specific Business Goals:

1. **Improve App Performance**:
   - Identify key factors that contribute to high app ratings and user engagement.
   - Provide recommendations for optimizing app features (e.g., size, version updates, compatibility) to enhance user experience.

2. **Enhance User Satisfaction**:
   - Understand user sentiments and feedback to address common pain points.
   - Implement changes based on user feedback to improve app functionality and overall satisfaction.

3. **Increase App Visibility and Popularity**:
   - Identify trends and patterns in app categories and genres to inform targeted marketing strategies.
   - Enhance app store optimization (ASO) by understanding factors that drive installs and positive reviews.

4. **Inform Strategic Decision-Making**:
   - Use data insights to guide product development, feature enhancements, and prioritization of updates.
   - Provide benchmarks and performance metrics to set realistic goals and track progress over time.

5. **Market Positioning**:
   - Identify competitive advantages and areas for improvement by comparing app performance with industry peers.
   - Tailor marketing campaigns based on insights into user preferences and market trends.

### Implementation Strategy:

1. **Data Analysis and Insights**:
   - Conduct thorough analysis of app ratings, reviews, installs, and other key metrics.
   - Use sentiment analysis to extract meaningful insights from user reviews.

2. **Actionable Recommendations**:
   - Provide specific, data-driven recommendations for app enhancements and updates.
   - Suggest strategies for addressing common user complaints and improving overall app experience.

3. **Performance Monitoring**:
   - Develop dashboards and reports to continuously monitor app performance and user feedback.
   - Track the impact of implemented changes on app ratings, reviews, and installs.

4. **Marketing and ASO**:
   - Use insights to optimize app store listings, including app descriptions, keywords, and visual assets.
   - Tailor marketing efforts to target user demographics and preferences identified through data analysis.

By achieving these business goals, app developers and marketers can create more successful and user-centric apps, leading to higher user retention, increased downloads, and improved market positioning.

# **General Guidelines** : -  

1.   Well-structured, formatted, and commented code is required.
2.   Exception Handling, Production Grade Code & Deployment Ready Code will be a plus. Those students will be awarded some additional credits.
     
     The additional credits will have advantages over other students during Star Student selection.
       
             [ Note: - Deployment Ready Code is defined as, the whole .ipynb notebook should be executable in one go
                       without a single error logged. ]

3.   Each and every logic should have proper comments.
4. You may add as many charts as you want. Make sure the following format is answered for each and every chart.
        

```
# Chart visualization code
```
            

*   Why did you pick the specific chart?
*   What is/are the insight(s) found from the chart?
* Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

5. You have to create at least 20 logical & meaningful charts having important insights.


[ Hints : - Do the Visualization in a structured way while following "UBM" Rule.

U - Univariate Analysis,

B - Bivariate Analysis (Numerical - Categorical, Numerical - Numerical, Categorical - Categorical)

M - Multivariate Analysis
 ]





# ***Let's Begin !***

## ***1. Know Your Data***

### Import Libraries

In [None]:
# Import Libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

### Dataset Loading

In [None]:
# Load Dataset
from google.colab import drive
drive.mount('/content/drive')

### Dataset First View

In [None]:
# Dataset First Look
df1 = pd.read_csv('/content/Play Store Data.csv')
df2 = pd.read_csv('/content/User Reviews.csv')
merged_df = pd.merge(df1, df2, on='App')
merged_df.head()

### Dataset Rows & Columns count

In [None]:
# Dataset Rows & Columns count
merged_df.shape

### Dataset Information

In [None]:
# Dataset Info
merged_df.info()

#### Duplicate Values

In [None]:
# Dataset Duplicate Value Count
merged_df.duplicated().sum()

#### Missing Values/Null Values

In [None]:
# Missing Values/Null Values Count
merged_df.isnull().sum()

In [None]:
# Visualizing the missing values
sns.heatmap(merged_df.isnull())

### What did you know about your dataset?

### Understanding the Dataset

#### 1. **Play Store Data**
- **Size**: 10841 entries
- **Columns**:
  - **App**: Name of the application.
  - **Category**: Application category (e.g., ART_AND_DESIGN, GAME, etc.).
  - **Rating**: Average user rating of the app (0 to 5).
  - **Reviews**: Number of user reviews.
  - **Size**: Size of the application (e.g., 19M for 19 Megabytes).
  - **Installs**: Number of installations (e.g., 10,000+).
  - **Type**: Free or Paid.
  - **Price**: Price of the application (0 for free apps).
  - **Content Rating**: Age group suitability (e.g., Everyone, Teen).
  - **Genres**: App genres (e.g., Art & Design;Pretend Play).
  - **Last Updated**: Date when the app was last updated.
  - **Current Ver**: Current version of the app.
  - **Android Ver**: Minimum Android version required.

- **Key Insights**:
  - Provides comprehensive information about app characteristics, user ratings, and popularity.
  - Essential for analyzing app performance, user engagement, and market trends.

#### 2. **User Reviews Data**
- **Size**: 64295 entries
- **Columns**:
  - **App**: Name of the application.
  - **Translated_Review**: Translated user review text.
  - **Sentiment**: Sentiment of the review (Positive, Negative, or Neutral).
  - **Sentiment_Polarity**: Polarity score of the sentiment (-1 to 1).
  - **Sentiment_Subjectivity**: Subjectivity score of the sentiment (0 to 1).

- **Key Insights**:
  - Provides detailed user feedback and sentiments.
  - Allows for sentiment analysis to understand user satisfaction and common themes in reviews.

### Data Characteristics

1. **Missing Values**:
   - **Play Store Data**: Missing values in columns such as `Rating`, `Type`, `Content Rating`, `Current Ver`, and `Android Ver`.
   - **User Reviews Data**: Missing values in columns such as `Translated_Review`, `Sentiment`, `Sentiment_Polarity`, and `Sentiment_Subjectivity`.

2. **Data Types**:
   - Mostly categorical and object data types, with a few numerical columns like `Rating`, `Sentiment_Polarity`, and `Sentiment_Subjectivity`.

3. **Data Consistency**:
   - Requires cleaning and preprocessing to handle missing values and ensure data consistency (e.g., converting `Installs` and `Price` to numerical formats).
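
The missing-value observations above can be quantified per column. The following is a minimal sketch, assuming a helper named `missing_value_summary` (hypothetical, not part of the notebook) and a tiny made-up frame rather than the real dataset:

```python
import pandas as pd


def missing_value_summary(df: pd.DataFrame) -> pd.DataFrame:
    """Return the count and percentage of missing values per column."""
    missing_count = df.isnull().sum()
    missing_pct = (missing_count / len(df) * 100).round(2)
    summary = pd.DataFrame({'missing_count': missing_count,
                            'missing_pct': missing_pct})
    # Keep only columns that actually have gaps, worst first
    return summary[summary['missing_count'] > 0].sort_values(
        'missing_count', ascending=False)


# Made-up rows for illustration only (not real Play Store data)
sample = pd.DataFrame({
    'App': ['A', 'B', 'C', 'D'],
    'Rating': [4.5, None, 4.0, None],
    'Type': ['Free', 'Paid', None, 'Free'],
})
summary = missing_value_summary(sample)
print(summary)
```

Calling the same helper on `df1` and `df2` separately, before merging, would show which source table each gap comes from.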

### Potential Analysis

1. **Descriptive Statistics**:
   - Summary statistics for numerical columns (e.g., `Rating`, `Reviews`, `Sentiment_Polarity`).
   - Distribution analysis for categorical columns (e.g., `Category`, `Content Rating`).

2. **Correlation Analysis**:
   - Correlation between `Rating` and other numerical features like `Reviews`, `Size`, and `Installs`.

3. **Sentiment Analysis**:
   - Sentiment distribution across different apps and categories.
   - Common keywords and themes in positive, negative, and neutral reviews.

4. **Trend Analysis**:
   - Trends in app updates and version changes.
   - Popular app categories and genres over time.

By understanding the structure and characteristics of these datasets, we can perform targeted analyses to extract meaningful insights and support the business objectives.
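
One simple way to approach the "common keywords" item listed above is a word-frequency count per sentiment label. The sketch below is only illustrative: the `top_keywords` helper, the short stop-word set, and the sample reviews are all assumptions made for this example, and a real analysis would likely use a fuller stop-word list.

```python
import re
from collections import Counter

import pandas as pd

# Small hand-picked stop-word set; an assumption for this sketch only
STOP_WORDS = {'the', 'a', 'an', 'is', 'it', 'this', 'and', 'to', 'i', 'of', 'on'}


def top_keywords(reviews: pd.DataFrame, sentiment: str, n: int = 5) -> list:
    """Return the n most frequent non-stop-words for one sentiment label."""
    texts = reviews.loc[reviews['Sentiment'] == sentiment,
                        'Translated_Review'].dropna()
    counter = Counter()
    for text in texts:
        # Lowercase and keep alphabetic tokens only
        words = re.findall(r"[a-z']+", text.lower())
        counter.update(w for w in words if w not in STOP_WORDS)
    return counter.most_common(n)


# Made-up reviews for illustration only
sample_reviews = pd.DataFrame({
    'Translated_Review': ['Great app, love it', 'Love the design',
                          'Crashes a lot', None, 'App crashes on start'],
    'Sentiment': ['Positive', 'Positive', 'Negative', None, 'Negative'],
})
positive_keywords = top_keywords(sample_reviews, 'Positive')
negative_keywords = top_keywords(sample_reviews, 'Negative')
print(positive_keywords)
print(negative_keywords)
```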

## ***2. Understanding Your Variables***

In [None]:
# Dataset Columns
merged_df.columns

In [None]:
# Dataset Describe
merged_df.describe()

### Variables Description

#### **Play Store Data Variables**

1. **App**:
   - **Description**: Name of the application.
   - **Type**: Categorical

2. **Category**:
   - **Description**: Application category (e.g., ART_AND_DESIGN, GAME).
   - **Type**: Categorical

3. **Rating**:
   - **Description**: Average user rating of the app (scale from 0 to 5).
   - **Type**: Numerical (float)
   - **Note**: Contains missing values.

4. **Reviews**:
   - **Description**: Number of user reviews.
   - **Type**: Numerical (int)

5. **Size**:
   - **Description**: Size of the application (e.g., 19M for 19 Megabytes).
   - **Type**: Categorical
   - **Note**: Size values need to be converted to a consistent numerical format for analysis.

6. **Installs**:
   - **Description**: Number of installations (e.g., 10,000+).
   - **Type**: Categorical
   - **Note**: Contains formatting (e.g., commas, plus sign) that needs to be cleaned for numerical analysis.

7. **Type**:
   - **Description**: Type of the app (Free or Paid).
   - **Type**: Categorical
   - **Note**: Contains missing values.

8. **Price**:
   - **Description**: Price of the application (0 for free apps).
   - **Type**: Categorical
   - **Note**: Needs to be converted to numerical format for analysis.

9. **Content Rating**:
   - **Description**: Age group suitability (e.g., Everyone, Teen).
   - **Type**: Categorical
   - **Note**: Contains missing values.

10. **Genres**:
    - **Description**: App genres (e.g., Art & Design;Pretend Play).
    - **Type**: Categorical

11. **Last Updated**:
    - **Description**: Date when the app was last updated.
    - **Type**: Categorical (Date)

12. **Current Ver**:
    - **Description**: Current version of the app.
    - **Type**: Categorical
    - **Note**: Contains missing values.

13. **Android Ver**:
    - **Description**: Minimum Android version required.
    - **Type**: Categorical
    - **Note**: Contains missing values.

#### **User Reviews Data Variables**

1. **App**:
   - **Description**: Name of the application.
   - **Type**: Categorical

2. **Translated_Review**:
   - **Description**: Translated user review text.
   - **Type**: Categorical (Text)
   - **Note**: Contains missing values.

3. **Sentiment**:
   - **Description**: Sentiment of the review (Positive, Negative, or Neutral).
   - **Type**: Categorical
   - **Note**: Contains missing values.

4. **Sentiment_Polarity**:
   - **Description**: Polarity score of the sentiment (range from -1 to 1).
   - **Type**: Numerical (float)
   - **Note**: Contains missing values.

5. **Sentiment_Subjectivity**:
   - **Description**: Subjectivity score of the sentiment (range from 0 to 1).
   - **Type**: Numerical (float)
   - **Note**: Contains missing values.

### Summary

The datasets provide detailed information about apps on the Google Play Store and user reviews, including ratings, reviews, installations, and sentiments. Some variables require cleaning and preprocessing for numerical analysis, such as `Size`, `Installs`, and `Price`. Missing values are present in several columns, and handling these appropriately will be crucial for accurate analysis.

### Check Unique Values for each variable.

In [None]:
# Check Unique Values for each variable.
merged_df.nunique()

## 3. ***Data Wrangling***

### Data Wrangling Code

In [None]:
# Write your code to make your dataset analysis ready.
merged_df.drop_duplicates(inplace=True)
merged_df.dropna(inplace=True)
merged_df.info()

### What all manipulations have you done and insights you found?

### Step 1: Cleaning and Preprocessing Play Store Data

1. **Convert `Size` to numerical values.**
2. **Convert `Installs` to numerical values.**
3. **Convert `Price` to numerical values.**
4. **Handle missing values in `Rating`.**
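
The four cleaning steps above could be sketched as follows. This is a hedged illustration, not the notebook's actual wrangling code: the `clean_play_store` helper is hypothetical, the `k` size suffix, the `Varies with device` value, and the `$` price prefix are assumptions about the raw text format, and filling `Rating` with the median is just one of several reasonable choices.

```python
import numpy as np
import pandas as pd


def clean_play_store(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy with Size (MB), Installs, and Price as numbers."""
    out = df.copy()

    # Size: '19M' -> 19.0, '512k' -> 0.5; anything else becomes NaN
    parts = out['Size'].astype(str).str.extract(r'^([\d.]+)([kM])$')
    size_value = pd.to_numeric(parts[0], errors='coerce')
    out['Size'] = np.where(parts[1] == 'k', size_value / 1024, size_value)

    # Installs: '10,000+' -> 10000; Price: '$4.99' -> 4.99 (assumed format)
    for col in ['Installs', 'Price']:
        stripped = out[col].astype(str).str.replace(r'[^\d.]', '', regex=True)
        out[col] = pd.to_numeric(stripped, errors='coerce')

    # Rating: fill gaps with the median (one option among several)
    out['Rating'] = out['Rating'].fillna(out['Rating'].median())
    return out


# Made-up rows for illustration only
sample = pd.DataFrame({
    'Size': ['19M', '512k', 'Varies with device'],
    'Installs': ['10,000+', '500+', '1,000,000+'],
    'Price': ['0', '$4.99', '0'],
    'Rating': [4.0, None, 5.0],
})
cleaned = clean_play_store(sample)
print(cleaned)
```

Returning a cleaned copy instead of modifying the frame in place keeps the raw text columns available for comparison.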

### Step 2: Exploratory Data Analysis (EDA)

1. **Descriptive statistics for numerical columns.**
2. **Distribution of app ratings.**
3. **Correlation between numerical variables.**
4. **Top categories by number of apps, reviews, and average rating.**

### Step 3: Sentiment Analysis of User Reviews

1. **Sentiment distribution.**
2. **Correlation between app ratings and sentiment polarity.**
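
To relate ratings to sentiment, it may help to average polarity per app first, so that apps with many reviews do not dominate the result. A minimal sketch under that assumption, using made-up rows and hypothetical variable names (`apps`, `reviews`, `per_app`):

```python
import pandas as pd

# Made-up rows for illustration only
apps = pd.DataFrame({'App': ['A', 'B', 'C'], 'Rating': [4.8, 3.9, 3.0]})
reviews = pd.DataFrame({
    'App': ['A', 'A', 'B', 'B', 'C', 'C'],
    'Sentiment_Polarity': [0.8, 0.6, 0.3, 0.1, -0.2, -0.4],
})

# Average polarity per app, so every app contributes a single point
app_polarity = reviews.groupby('App', as_index=False)['Sentiment_Polarity'].mean()

# Inner join keeps only apps that have both a rating and at least one review
per_app = apps.merge(app_polarity, on='App', how='inner')

# Pearson correlation between rating and mean polarity
correlation = per_app['Rating'].corr(per_app['Sentiment_Polarity'])
print(per_app)
print(round(correlation, 3))
```

With the real tables, `apps` would correspond to `df1` and `reviews` to `df2`. The sample values here are constructed to be perfectly linear, so the coefficient they produce says nothing about the actual data.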

### Insights

1. **App Performance**:
   - The majority of apps have ratings between 4.0 and 4.5.
   - Categories like "GAME" and "FAMILY" have the highest number of apps.
   - Apps in categories like "PERSONALIZATION" and "BOOKS_AND_REFERENCE" have higher average ratings.

2. **User Engagement**:
   - The number of installs and reviews are highly correlated, indicating that popular apps tend to receive more reviews.

3. **User Sentiments**:
   - Positive sentiments dominate user reviews.
   - There is a positive correlation between app ratings and sentiment polarity, suggesting that higher-rated apps tend to have more positive reviews.

These insights can help in making data-driven decisions to improve app development and marketing strategies.

## ***4. Data Visualization, Storytelling & Experimenting with charts : Understand the relationships between variables***

#### Chart - 1

In [None]:
# Chart - 1 visualization code
# Load your data
play_store_data1 = pd.read_csv('/content/Play Store Data.csv')
play_store_data2 = pd.read_csv('/content/User Reviews.csv')
play_store_data = pd.merge(play_store_data1, play_store_data2, on='App')

# Histogram - Distribution of App Ratings
plt.figure(figsize=(15, 6))
sns.histplot(play_store_data['Rating'], bins=20, kde=True)
plt.title('Distribution of App Ratings')
plt.xlabel('Rating')
plt.ylabel('Frequency')
plt.show()

##### 1. Why did you pick the specific chart?

To understand the distribution of ratings across all apps.

##### 2. What is/are the insight(s) found from the chart?

Most apps have ratings between 4.0 and 4.5.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Helps in setting benchmarks for new app ratings. High ratings correlate with higher user satisfaction.

#### Chart - 2

In [None]:
# Chart - 2 visualization code
# Bar Chart - Number of Apps by Category
# Count apps per category from the Play Store table (df1) rather than the merged table,
# where each app is repeated once per review
top_categories = df1['Category'].value_counts()

plt.figure(figsize=(15, 6))
top_categories.plot(kind='bar')
plt.title('Number of Apps by Category')
plt.xlabel('Category')
plt.ylabel('Number of Apps')
plt.show()

##### 1. Why did you pick the specific chart?

To identify the most common app categories.

##### 2. What is/are the insight(s) found from the chart?

Categories like "GAME" and "FAMILY" have the highest number of apps.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Indicates high competition in these categories. Developers may consider niche categories to avoid competition.

#### Chart - 3

In [None]:
# Chart - 3 visualization code
# Bar Chart - Average Rating by Category
category_ratings = merged_df.groupby('Category')['Rating'].mean()
top_categories_rating = category_ratings.nlargest(10)

# Plot The bar chart
plt.figure(figsize=(15, 6))
top_categories_rating.plot(kind='bar', color='green')
plt.title('Average Rating by Category')
plt.xlabel('Category')
plt.ylabel('Average Rating')
plt.show()

##### 1. Why did you pick the specific chart?

To compare average ratings across different categories.

##### 2. What is/are the insight(s) found from the chart?

Categories like "PERSONALIZATION" and "BOOKS_AND_REFERENCE" have higher average ratings.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

High average ratings in certain categories may attract developers to create apps in these areas.

#### Chart - 4

In [None]:
# Chart - 4 visualization code
# Bar Chart - Number of Reviews by Category
# Sum review counts per category from the Play Store table (df1);
# 'Reviews' may be read as text, so coerce it to numbers first
reviews_numeric = pd.to_numeric(df1['Reviews'], errors='coerce')
top_categories_reviews = reviews_numeric.groupby(df1['Category']).sum().sort_values(ascending=False)

# Plot the bar chart
plt.figure(figsize=(12, 6))
top_categories_reviews.plot(kind='bar', color='orange')
plt.title('Number of Reviews by Category')
plt.xlabel('Category')
plt.ylabel('Number of Reviews')
plt.show()

##### 1. Why did you pick the specific chart?

To see which categories receive the most user reviews.

##### 2. What is/are the insight(s) found from the chart?

Categories like "GAME" and "FAMILY" receive the most reviews.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Indicates high user engagement in these categories. Important for understanding user feedback volume.

#### Chart - 5

In [None]:
# Chart - 5 visualization code
# Bar Chart - Sentiment Distribution of User Reviews
sentiment_counts = merged_df['Sentiment'].value_counts()
plt.figure(figsize=(15, 6))
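# Map colors by label so they stay correct regardless of the count order
sentiment_colors = {'Positive': 'green', 'Negative': 'red', 'Neutral': 'blue'}
plt.bar(sentiment_counts.index, sentiment_counts.values, color=[sentiment_colors.get(s, 'gray') for s in sentiment_counts.index])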
plt.title('Sentiment Distribution of User Reviews')
plt.xlabel('Sentiment')
plt.ylabel('Number of Reviews')
plt.show()

##### 1. Why did you pick the specific chart?

To analyze the distribution of review sentiments.

##### 2. What is/are the insight(s) found from the chart?

Positive reviews dominate, followed by neutral and negative.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Indicates overall user satisfaction. Focus on maintaining positive sentiment and addressing negative feedback.

#### Chart - 6

In [None]:
# Chart - 6 visualization code
# Scatter Plot - Correlation between App Ratings and Sentiment Polarity
app_sentiment_corr = merged_df[['Rating', 'Sentiment_Polarity']]
plt.figure(figsize=(15, 6))
sns.scatterplot(data=app_sentiment_corr, x='Rating', y='Sentiment_Polarity')
plt.title('Correlation between App Ratings and Sentiment Polarity')
plt.xlabel('App Rating')
plt.ylabel('Sentiment Polarity')
plt.show()

##### 1. Why did you pick the specific chart?

To visualize the relationship between app ratings and sentiment polarity.

##### 2. What is/are the insight(s) found from the chart?

Positive correlation; higher-rated apps have more positive sentiments.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Validates the importance of positive reviews for higher app ratings.

#### Chart - 7

In [None]:
# Chart - 7 visualization code
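# Histogram - Distribution of App Sizes
# 'Size' is stored as text (e.g. '19M'), so convert it to megabytes first;
# values without an 'M' or 'k' suffix become NaN
size_parts = play_store_data['Size'].astype(str).str.extract(r'^([\d.]+)([kM])$')
size_mb = pd.to_numeric(size_parts[0], errors='coerce')
size_mb = size_mb.where(size_parts[1] != 'k', size_mb / 1024)
plt.figure(figsize=(15, 6))
sns.histplot(size_mb.dropna(), bins=20, kde=True)
plt.title('Distribution of App Sizes')
plt.xlabel('Size (MB)')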
plt.ylabel('Frequency')
plt.show()

##### 1. Why did you pick the specific chart?

To understand the distribution of app sizes.

##### 2. What is/are the insight(s) found from the chart?

Most apps are under 100MB.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Guides developers on optimal app size to avoid large downloads which may deter users.

#### Chart - 8

In [None]:
# Chart - 8 visualization code
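# Scatter Plot - Rating vs. Number of Reviews
# 'Reviews' may be read as text, so coerce it to numbers before plotting
reviews_numeric = pd.to_numeric(play_store_data['Reviews'], errors='coerce')
plt.figure(figsize=(15, 6))
sns.scatterplot(x=reviews_numeric, y=play_store_data['Rating'])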
plt.title('Rating vs. Number of Reviews')
plt.xlabel('Number of Reviews')
plt.ylabel('Rating')
plt.show()

##### 1. Why did you pick the specific chart?

To explore the relationship between ratings and the number of reviews.

##### 2. What is/are the insight(s) found from the chart?

Apps with more reviews tend to have higher ratings.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Encourages soliciting more reviews to potentially boost ratings.

#### Chart - 9

In [None]:
# Chart - 9 visualization code
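# Scatter Plot - Rating vs. Size
# 'Size' is stored as text (e.g. '19M'), so convert it to megabytes first;
# values without an 'M' or 'k' suffix become NaN
size_parts = play_store_data['Size'].astype(str).str.extract(r'^([\d.]+)([kM])$')
size_mb = pd.to_numeric(size_parts[0], errors='coerce')
size_mb = size_mb.where(size_parts[1] != 'k', size_mb / 1024)
plt.figure(figsize=(15, 6))
sns.scatterplot(x=size_mb, y=play_store_data['Rating'])
plt.title('Rating vs. Size')
plt.xlabel('Size (MB)')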
plt.ylabel('Rating')
plt.show()

##### 1. Why did you pick the specific chart?

To examine if app size affects user ratings.

##### 2. What is/are the insight(s) found from the chart?

No strong correlation between app size and rating.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Suggests that size optimization may not directly influence ratings, but other factors do.

#### Chart - 10

In [None]:
# Chart - 10 visualization code
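# Scatter Plot - Price vs. Rating
# 'Price' may be stored as text (e.g. with a currency symbol), so strip non-numeric characters
price_numeric = pd.to_numeric(play_store_data['Price'].astype(str).str.replace(r'[^\d.]', '', regex=True), errors='coerce')
plt.figure(figsize=(15, 6))
sns.scatterplot(x=price_numeric, y=play_store_data['Rating'])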
plt.title('Rating vs. Price')
plt.xlabel('Price')
plt.ylabel('Rating')
plt.show()

##### 1. Why did you pick the specific chart?

To see if there is a relationship between app price and rating.

##### 2. What is/are the insight(s) found from the chart?

Weak correlation; higher prices do not necessarily mean higher ratings.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Pricing strategy should consider value proposition rather than assuming higher prices yield better ratings.

#### Chart - 11

In [None]:
# Chart - 11 visualization code
# Bar Chart - Average Sentiment Polarity by Category
play_store_data2 = pd.read_csv('/content/User Reviews.csv')
category_sentiment = play_store_data2.groupby('App')['Sentiment_Polarity'].mean().reset_index()
# Use one row per app so apps with many reviews are not counted repeatedly
category_sentiment = category_sentiment.merge(play_store_data[['App', 'Category']].drop_duplicates(), on='App')
avg_sentiment_polarity = category_sentiment.groupby('Category')['Sentiment_Polarity'].mean().sort_values(ascending=False).head(10)

plt.figure(figsize=(15, 6))
avg_sentiment_polarity.plot(kind='bar', color='skyblue')
plt.title('Average Sentiment Polarity by Category')
plt.xlabel('Category')
plt.ylabel('Average Sentiment Polarity')
plt.show()

##### 1. Why did you pick the specific chart?

To see which categories have the most positive or negative sentiments.

##### 2. What is/are the insight(s) found from the chart?

Categories like "PERSONALIZATION" have higher positive sentiment polarity.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Positive sentiment in certain categories indicates high user satisfaction, attracting more developers to these areas.

#### Chart - 12

In [None]:
# Chart - 12 visualization code
# Box Plot - Rating Distribution by Content Rating
plt.figure(figsize=(15, 6))
sns.boxplot(x=play_store_data['Content Rating'], y=play_store_data['Rating'])
plt.title('Rating Distribution by Content Rating')
plt.xlabel('Content Rating')
plt.ylabel('Rating')
plt.show()

##### 1. Why did you pick the specific chart?

To compare app ratings across different content ratings.

##### 2. What is/are the insight(s) found from the chart?

Apps rated "Everyone" have a slightly higher median rating.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Targeting a broader audience (e.g., "Everyone") could potentially lead to higher ratings.

#### Chart - 13

In [None]:
# Chart - 13 visualization code
# Line Chart - Trend of App Updates Over Time
# Unparseable dates become NaT instead of raising an error
play_store_data['Last Updated'] = pd.to_datetime(play_store_data['Last Updated'], errors='coerce')
updates_trend = play_store_data['Last Updated'].dt.year.value_counts().sort_index()

plt.figure(figsize=(15, 6))
updates_trend.plot(kind='line', marker='o')
plt.title('Trend of App Updates Over Time')
plt.xlabel('Year')
plt.ylabel('Number of Updates')
plt.show()

##### 1. Why did you pick the specific chart?

To see how frequently apps are updated over time.

##### 2. What is/are the insight(s) found from the chart?

Regular updates can indicate active development.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Regular updates can improve user satisfaction. Irregular updates might indicate a lack of support.

#### Chart - 14 - Correlation Heatmap

In [None]:
# Correlation Heatmap visualization code
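# Reviews, Installs, and Price may be stored as text, so strip symbols such as ',', '+', '$'
# and convert them to numbers before selecting numerical columns
numerical_data = play_store_data.copy()
for col in ['Reviews', 'Installs', 'Price']:
    numerical_data[col] = pd.to_numeric(
        numerical_data[col].astype(str).str.replace(r'[^\d.]', '', regex=True), errors='coerce')
numerical_data = numerical_data.select_dtypes(include=['number'])

# Calculate correlation matrix on numerical data only
corr_matrix = numerical_data.corr()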

plt.figure(figsize=(15, 6))
sns.heatmap(corr_matrix, annot=True, cmap='coolwarm', linewidths=0.5)
plt.title('Correlation Matrix of Numerical Variables')
plt.show()

##### 1. Why did you pick the specific chart?

To understand the relationships between numerical variables.

##### 2. What is/are the insight(s) found from the chart?

Positive correlation between installs and reviews. Weak correlation between price and rating.

#### Chart - 15 - Pair Plot

In [None]:
# Pair Plot visualization code
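# Select relevant numerical columns for the pair plot
numerical_cols = ['Rating', 'Reviews', 'Size', 'Installs', 'Price']
pair_data = play_store_data[numerical_cols].copy()

# Size is text such as '19M'; convert to megabytes (unrecognized values become NaN)
size_parts = pair_data['Size'].astype(str).str.extract(r'^([\d.]+)([kM])$')
size_mb = pd.to_numeric(size_parts[0], errors='coerce')
pair_data['Size'] = size_mb.where(size_parts[1] != 'k', size_mb / 1024)
# Reviews, Installs, and Price: strip symbols such as ',', '+', '$' and convert to numbers
for col in ['Reviews', 'Installs', 'Price']:
    pair_data[col] = pd.to_numeric(
        pair_data[col].astype(str).str.replace(r'[^\d.]', '', regex=True), errors='coerce')

# Create the pair plot (sns.pairplot creates its own figure)
pair_plot = sns.pairplot(pair_data)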
pair_plot.fig.suptitle('Pair Plot of Play Store Data', y=1.02)
plt.show()

##### 1. Why did you pick the specific chart?

A pair plot (or scatter plot matrix) is a powerful tool for exploring the relationships between multiple numerical variables at once. It provides a matrix of scatter plots for each pair of variables, along with histograms or KDE plots for the individual distributions. This helps in visualizing correlations, patterns, and potential outliers.

##### 2. What is/are the insight(s) found from the chart?

- **Correlation between variables**: We can identify which variables are strongly correlated. For example, we might see a strong positive correlation between the number of installs and the number of reviews.
- **Distribution of variables**: The diagonal plots show the distribution of individual variables, helping us understand their skewness and spread.
- **Potential outliers**: Scatter plots can reveal outliers that may need further investigation.
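The skewness and outlier observations above can also be checked numerically. The sketch below uses a made-up, heavy-tailed series standing in for a column such as `Reviews`. The `iqr_outliers` helper and the 1.5 × IQR fence are common conventions assumed here for illustration, not something the notebook itself applies.

```python
import numpy as np
import pandas as pd

# Made-up values standing in for a heavy-tailed column such as `Reviews`.
reviews = pd.Series(
    [12, 40, 55, 80, 150, 300, 900, 2_500, 40_000, 1_200_000], name="Reviews"
)

def iqr_outliers(s: pd.Series, k: float = 1.5) -> pd.Series:
    """Return the values lying outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, q3 = s.quantile(0.25), s.quantile(0.75)
    iqr = q3 - q1
    return s[(s < q1 - k * iqr) | (s > q3 + k * iqr)]

raw_skew = reviews.skew()            # skewness on the original scale
log_skew = np.log1p(reviews).skew()  # skewness after a log(1 + x) transform
outliers = iqr_outliers(reviews)

print(f"skew (raw): {raw_skew:.2f}, skew (log1p): {log_skew:.2f}")
print(outliers)
```

If the real columns are as heavy-tailed as this toy series, plotting them on a log scale may make the pair plot easier to read.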

## **5. Solution to Business Objective**

#### What do you suggest the client do to achieve the business objective?
Explain briefly.

To help the client achieve their business objective of improving app performance, user satisfaction, and overall market success, here are some strategic suggestions based on the insights derived from the data analysis and visualizations:

1. **Encourage User Reviews and Feedback**:
   - **Action**: Implement in-app prompts and incentives for users to leave reviews and feedback.
   - **Reason**: There is a strong positive correlation between the number of reviews and app ratings. More reviews can improve app visibility, though the correlation by itself does not show that more reviews cause higher ratings.
   - **Impact**: Higher ratings can attract more users, leading to increased installs and engagement.

2. **Focus on Quality Improvement**:
   - **Action**: Regularly update the app to fix bugs, introduce new features, and improve performance.
   - **Reason**: Apps with higher ratings tend to have more installs. Regular updates signal active development and commitment to quality.
   - **Impact**: Improved app quality can lead to better user experiences, higher ratings, and increased user retention.

3. **Optimize App Size**:
   - **Action**: Optimize the app's size without compromising on features and performance.
   - **Reason**: While there is no strong correlation between app size and installs, optimizing size can enhance download speeds and performance, especially for users with limited storage or data plans.
   - **Impact**: A well-optimized app can lead to better user experiences and potentially higher ratings.

4. **Targeted Marketing and User Acquisition**:
   - **Action**: Invest in targeted marketing campaigns to increase app visibility and attract new users.
   - **Reason**: There is a positive correlation between the number of installs and the number of reviews. More installs can lead to higher engagement and feedback.
   - **Impact**: Increased installs can boost the app’s position in app store rankings, attracting even more users.

5. **Address Negative Feedback**:
   - **Action**: Analyze negative reviews to identify common issues and areas for improvement.
   - **Reason**: Understanding and addressing the reasons behind negative reviews can help improve user satisfaction.
   - **Impact**: Resolving issues can turn dissatisfied users into satisfied ones, leading to better ratings and reviews.

6. **Leverage Positive Feedback for Marketing**:
   - **Action**: Use positive reviews and high ratings in marketing materials to build trust and credibility.
   - **Reason**: Highlighting positive user experiences can attract new users who are looking for highly rated and trusted apps.
   - **Impact**: Effective marketing can drive more installs and user engagement.

7. **Explore Less Competitive Categories**:
   - **Action**: Consider developing apps in less crowded categories with high average ratings.
   - **Reason**: Categories with fewer apps but high user satisfaction can offer growth opportunities with less competition.
   - **Impact**: Entering a less saturated market can provide better visibility and higher chances of success.
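As an illustration of the last suggestion, less crowded but well-rated categories can be shortlisted with a simple group-by. This is a minimal sketch on made-up rows. It assumes the data has `Category` and `Rating` columns, and the `niche_categories` helper and its thresholds are hypothetical choices that would need tuning on the real data.

```python
import pandas as pd

# Made-up rows; the column names `Category` and `Rating` are assumed.
apps = pd.DataFrame({
    "Category": ["GAME"] * 5 + ["EVENTS"] * 2 + ["FAMILY"] * 4 + ["ART_AND_DESIGN"] * 2,
    "Rating":   [4.3, 4.1, 4.4, 3.9, 4.2,  4.5, 4.6,  4.0, 4.2, 3.8, 4.1,  4.4, 3.6],
})

def niche_categories(df: pd.DataFrame, max_apps: int, min_mean_rating: float) -> pd.DataFrame:
    """Categories with at most `max_apps` rated apps and a mean rating of at least `min_mean_rating`."""
    summary = df.groupby("Category")["Rating"].agg(app_count="count", mean_rating="mean")
    keep = (summary["app_count"] <= max_apps) & (summary["mean_rating"] >= min_mean_rating)
    return summary[keep].sort_values("mean_rating", ascending=False)

# Thresholds are illustrative only.
candidates = niche_categories(apps, max_apps=3, min_mean_rating=4.3)
print(candidates)
```

A mean taken over very few apps is noisy, so any shortlisted category should be inspected before acting on it.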

### Implementation Plan

1. **User Engagement Strategy**:
   - Implement in-app review prompts.
   - Create a feedback system for users to easily report issues and suggest features.
   - Run periodic surveys to gather user insights.

2. **Quality Assurance**:
   - Establish a regular update schedule.
   - Invest in thorough testing before releases.
   - Monitor app performance metrics and user feedback continuously.

3. **Marketing Campaigns**:
   - Run targeted ads on social media and app platforms.
   - Collaborate with influencers and bloggers to review and promote the app.
   - Offer limited-time promotions and discounts to attract new users.

4. **Optimization Efforts**:
   - Conduct regular audits to identify and remove unnecessary app bloat.
   - Use efficient coding practices to enhance app performance.
   - Offer a lite version of the app for users with lower-end devices.

5. **Negative Feedback Management**:
   - Set up a dedicated team to monitor and respond to reviews promptly.
   - Implement a tracking system for reported issues and resolutions.
   - Use analytics to identify patterns in negative feedback and prioritize fixes.
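One simple way to start identifying patterns in negative feedback is to count the most frequent terms in reviews labelled negative. The sketch below runs on made-up reviews. The column names `Translated_Review` and `Sentiment`, the tiny `STOPWORDS` set, and the `top_negative_terms` helper are all assumptions for illustration.

```python
import re
from collections import Counter

import pandas as pd

# Made-up reviews; the column names are assumed to match the user reviews dataset.
reviews = pd.DataFrame({
    "Translated_Review": [
        "App crashes after the latest update",
        "Love the new design",
        "Too many ads and it crashes on login",
        "Ads everywhere, cannot use the app",
        None,
    ],
    "Sentiment": ["Negative", "Positive", "Negative", "Negative", None],
})

# A deliberately tiny, hand-picked stopword list for this toy example.
STOPWORDS = {"the", "and", "it", "on", "after", "too", "many", "app", "cannot", "use", "latest"}

def top_negative_terms(df: pd.DataFrame, n: int = 2) -> list:
    """Return the n most frequent non-stopword terms in negative reviews."""
    negative = df.loc[df["Sentiment"] == "Negative", "Translated_Review"].dropna()
    counts = Counter()
    for text in negative:
        words = re.findall(r"[a-z']+", text.lower())
        counts.update(w for w in words if w not in STOPWORDS)
    return counts.most_common(n)

top_terms = top_negative_terms(reviews)
print(top_terms)
```

Frequent terms such as these can then be mapped to issue categories and prioritised alongside the tracking system described above.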

# **Conclusion**

Based on the analysis of the Play Store data and user reviews, several key insights and actionable strategies have been identified to help the client achieve their business objectives of improving app performance, user satisfaction, and overall market success.

1. **Encouraging User Reviews and Feedback**: By implementing in-app prompts and offering incentives for users to leave reviews, the client can increase the number of reviews, which is positively correlated with higher app ratings. This strategy can improve app visibility and attract more users.

2. **Focusing on Quality Improvement**: Regular updates, bug fixes, and new features are essential to maintain high app ratings and user satisfaction. Active development signals a commitment to quality, leading to better user experiences and increased retention.

3. **Optimizing App Size**: Ensuring that the app size is optimized for performance without compromising on features can enhance download speeds and overall user experience, especially for users with limited storage or data plans.

4. **Investing in Targeted Marketing and User Acquisition**: Investing in targeted marketing campaigns can increase app visibility and attract new users. A higher number of installs can lead to more user engagement and feedback, further boosting app performance and ratings.

5. **Addressing Negative Feedback**: Analyzing negative reviews to identify common issues and areas for improvement is crucial. Addressing these issues can turn dissatisfied users into satisfied ones, leading to better ratings and reviews.

6. **Leveraging Positive Feedback for Marketing**: Using positive reviews and high ratings in marketing materials can build trust and credibility, attracting new users who are looking for highly rated and trusted apps.

7. **Exploring Less Competitive Categories**: Developing apps in less crowded categories with high average ratings can offer growth opportunities with less competition, providing better visibility and higher chances of success.

### Final Recommendations

By focusing on these strategies, the client can improve app quality, user satisfaction, and overall market performance. Continuous monitoring of app performance metrics, user feedback, and market trends will ensure sustained growth and a competitive edge in the market. Implementing these strategies will not only help in achieving the business objectives but also foster long-term success in the highly competitive app market.

### ***Hurrah! You have successfully completed your EDA Capstone Project !!!***