<a href="https://colab.research.google.com/github/B-7792/Exploratory-Play-Store/blob/main/Exploratory_Data_Analysis.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Project Name**    -
# **Exploratory Data Analysis of Google Play Store Apps**


##### **Project Type**    - Data Analysis / Exploratory Data Analysis (EDA)
##### **Contribution**    - Individual
##### **Team Member 1** - *Bhushan Mohod*


# **Project Summary -**
* Google Play Store Apps Data: Contains various attributes for each app such as category, rating, size, installs, and price.

* Google Play Store User Reviews Data: Contains user reviews with attributes such as review ID, username, content, score, thumbs up count, and sentiment.

Write the summary here within 500-600 words.

# **GitHub Link -**

Provide your GitHub Link here.

# **Problem Statement**



This project aims to give app developers the insights they need to succeed in the competitive Google Play Store. Through an exploratory data analysis of app attributes and user reviews, it seeks to identify the key factors that drive app engagement and success, so that developers can optimize their apps and improve user satisfaction.

#### **Define Your Business Objective?**

The objective is to perform an exploratory data analysis (EDA) of the Google Play Store apps data and user reviews to uncover the key factors that influence app engagement and success. The goal is to provide actionable insights that app developers can use to optimize their apps for better performance and user satisfaction.

# **General Guidelines** : -  

1.   Well-structured, formatted, and commented code is required.
2.   Exception Handling, Production Grade Code & Deployment Ready Code will be a plus. Those students will be awarded some additional credits.
     
     The additional credits will have advantages over other students during Star Student selection.
       
             [ Note: - Deployment Ready Code is defined as, the whole .ipynb notebook should be executable in one go
                       without a single error logged. ]

3.   Each and every logic should have proper comments.
4. You may add as many charts as you want. Make sure each chart follows the format below and answers these questions.
        

```
# Chart visualization code
```
            

*   Why did you pick the specific chart?
*   What is/are the insight(s) found from the chart?
* Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

5. You have to create at least 20 logical & meaningful charts having important insights.


[ Hints : - Do the Visualization in a structured way while following "UBM" Rule.

U - Univariate Analysis,

B - Bivariate Analysis (Numerical - Categorical, Numerical - Numerical, Categorical - Categorical)

M - Multivariate Analysis
 ]





Analysis Plan:

Univariate Analysis (U):

* App Ratings: Distribution of app ratings.
* App Sizes: Distribution of app sizes.
* Number of Installs: Distribution of installs.
* App Prices: Distribution of app prices.

Bivariate Analysis (B):

* Ratings by Category: Average rating by app category.
* Installs vs. Ratings: Relationship between number of installs and ratings.
* Price vs. Ratings: Relationship between app price and ratings.
* Sentiment by Ratings: Sentiment polarity versus app ratings.

Multivariate Analysis (M):

* Ratings by Category and Sentiment: How category and sentiment together affect app ratings.
* Installs, Size, and Ratings: Relationship between app size, installs, and ratings.
* Price, Category, and Ratings: Impact of price and category on app ratings.

# Data Preprocessing:

In [None]:
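# Import the libraries needed to upload and read the data
import pandas as pd
from google.colab import files

# Prompt for a file upload from the local machine (Colab only)
uploaded = files.upload()

# Assume the file name is 'Play Store Data.csv'
df = pd.read_csv('Play Store Data.csv')
print(df.head())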


In [None]:
# Step 1: Import necessary libraries
import pandas as pd
from google.colab import drive, files

# Step 2: Upload files from your local machine (opens a file picker in Colab)
uploaded = files.upload()

# Step 3: Mount Google Drive
drive.mount('/content/drive')

# Step 4: Specify the path to your CSV file
file_path = '/content/drive/My Drive/path_to_your_file/user_reviews.csv'

# Step 5: Read the CSV file into a pandas DataFrame
reviews_df = pd.read_csv(file_path)

# Step 6: Display the first few rows of the DataFrame
reviews_df.head()


In [None]:
# Import necessary libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from textblob import TextBlob
import nltk
from nltk.sentiment.vader import SentimentIntensityAnalyzer
from IPython.display import display

# Download the VADER lexicon used by SentimentIntensityAnalyzer
nltk.download('vader_lexicon')

# Load the datasets
apps_df = pd.read_csv('Play Store Data.csv')
reviews_df = pd.read_csv('User_Reviews.csv')

# Display the first few rows of both datasets
# (a cell only auto-displays its last expression, so display is called explicitly)
display(apps_df.head())
display(reviews_df.head())




# Data Cleaning:

In [None]:
# Cleaning the apps dataset
# Removing duplicates
apps_df.drop_duplicates(inplace=True)

# Handling missing values
apps_df.dropna(inplace=True)

# Converting data types
def parse_size(value):
    """Convert a size such as '19M' or '201k' to bytes; return NaN if it is not numeric."""
    text = str(value).strip()
    try:
        if text.endswith('M'):
            return float(text[:-1]) * 1e6
        if text.endswith('k'):
            return float(text[:-1]) * 1e3
        return float(text)
    except ValueError:
        # Covers 'Varies with device' and any other non-numeric entry
        return np.nan

apps_df['Size'] = apps_df['Size'].apply(parse_size)

# regex=False makes '+' and '$' literal characters instead of regex metacharacters
installs_text = apps_df['Installs'].str.replace('+', '', regex=False)
apps_df['Installs'] = installs_text.str.replace(',', '', regex=False).astype(int)
apps_df['Price'] = apps_df['Price'].str.replace('$', '', regex=False).astype(float)

# Cleaning the reviews dataset
reviews_df.drop_duplicates(inplace=True)
reviews_df.dropna(subset=['content'], inplace=True)

# Display cleaned datasets
apps_df.info()
reviews_df.info()
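
The `astype(int)` conversion above raises an error on the first value that is not numeric. As a hedged alternative, the sketch below uses `pd.to_numeric` with `errors='coerce'` so that unparseable entries become `NaN` and can be counted. The Series values are toy data, and `'Free'` is only a hypothetical malformed entry, not a value confirmed to be in the dataset.

```python
import pandas as pd

# Toy values in the same style as the 'Installs' column; 'Free' stands in
# for a hypothetical malformed entry.
installs_raw = pd.Series(['10,000+', '500+', 'Free'])

# Strip the literal '+' and ',' characters, then coerce anything
# non-numeric to NaN instead of raising an exception.
installs_clean = pd.to_numeric(
    installs_raw.str.replace('+', '', regex=False).str.replace(',', '', regex=False),
    errors='coerce',
)

# Count how many entries could not be converted
n_invalid = int(installs_clean.isna().sum())
print(installs_clean.tolist(), n_invalid)
```

Counting the coerced values before dropping them makes it visible how many rows the cleaning step discards.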


# Univariate Analysis (U):
* Distribution of App Ratings:

In [None]:
plt.figure(figsize=(10, 6))
sns.histplot(apps_df['Rating'], bins=30, kde=True)
plt.title('Distribution of App Ratings')
plt.xlabel('Rating')
plt.ylabel('Frequency')
plt.show()

# Why did you pick the specific chart?
# A histogram was chosen to show the distribution of app ratings, as it helps visualize the spread and frequency of ratings.

# What is/are the insight(s) found from the chart?
# Most apps have ratings between 3 and 4.5, indicating a generally positive reception among users.

# Will the gained insights help create a positive business impact?
# Yes, understanding the common rating range helps developers set realistic benchmarks and strive for higher ratings.

# Are there any insights that lead to negative growth? Justify with specific reason.
# Apps with ratings below 3 could indicate poor user experience, which developers need to address to avoid negative growth.
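
The rating ranges mentioned in the answers above can be checked numerically as well as visually. The following is a minimal sketch on toy ratings (not the real dataset), assuming the real column is `apps_df['Rating']`; it computes the share of apps rated between 3.0 and 4.5 and the share rated below 3.0.

```python
import pandas as pd

# Toy ratings standing in for apps_df['Rating']; replace with the real column.
ratings = pd.Series([4.1, 3.9, 4.7, 2.5, 4.3, 3.2, 4.5, 1.8])

# Share of apps whose rating falls in the closed interval [3.0, 4.5]
share_mid = ratings.between(3.0, 4.5).mean()

# Share of apps rated below 3.0
share_low = (ratings < 3.0).mean()

print(f"Rated 3.0-4.5: {share_mid:.1%}, rated below 3.0: {share_low:.1%}")
```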


# Bivariate Analysis (B):
* Ratings by Category:

In [None]:
plt.figure(figsize=(15, 8))
sns.boxplot(x='Category', y='Rating', data=apps_df)
plt.xticks(rotation=90)
plt.title('Rating Distribution by Category')  # a boxplot shows median and spread, not the mean
plt.xlabel('Category')
plt.ylabel('Rating')
plt.show()

# Why did you pick the specific chart?
# A boxplot was chosen to compare the distribution of ratings across different app categories.

# What is/are the insight(s) found from the chart?
# Categories like "Health & Fitness" and "Books & Reference" tend to have higher median ratings compared to others like "Business" and "Tools".

# Will the gained insights help create a positive business impact?
# Yes, developers can focus on high-rating categories to increase the likelihood of success.

# Are there any insights that lead to negative growth? Justify with specific reason.
# Categories with consistently lower ratings may indicate market saturation or unmet user needs, which could lead to negative growth if not addressed.
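
A boxplot shows medians and spread rather than averages, so a small summary table can complement it when the average rating per category is needed. This is a sketch on a toy DataFrame that reuses the `Category` and `Rating` column names; the category labels and values are illustrative only.

```python
import pandas as pd

# Toy frame with the same column names as apps_df ('Category', 'Rating')
toy_apps = pd.DataFrame({
    'Category': ['TOOLS', 'TOOLS', 'GAME', 'GAME', 'GAME'],
    'Rating': [3.8, 4.2, 4.5, 4.1, 4.6],
})

# Mean, median, and app count per category, sorted by median rating
rating_summary = (
    toy_apps.groupby('Category')['Rating']
    .agg(['mean', 'median', 'count'])
    .sort_values('median', ascending=False)
)
print(rating_summary)
```

Including the count helps avoid over-reading categories that contain only a handful of apps.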


# Multivariate Analysis (M):
* Ratings by Category and Sentiment:

In [None]:
# Perform sentiment analysis on reviews
sia = SentimentIntensityAnalyzer()
reviews_df['sentiment'] = reviews_df['content'].apply(lambda x: sia.polarity_scores(x)['compound'])

# Merge datasets
merged_df = pd.merge(apps_df, reviews_df, on='App')

# Plot ratings by category and sentiment
plt.figure(figsize=(15, 8))
sns.scatterplot(x='Category', y='Rating', hue='sentiment', data=merged_df, palette='coolwarm', alpha=0.6)
plt.xticks(rotation=90)
plt.title('Ratings by Category and Sentiment')
plt.xlabel('Category')
plt.ylabel('Rating')
plt.show()

# Why did you pick the specific chart?
# A scatter plot was chosen to visualize the relationship between category, ratings, and sentiment.

# What is/are the insight(s) found from the chart?
# Positive sentiment correlates with higher ratings across categories, with some categories showing more pronounced trends.

# Will the gained insights help create a positive business impact?
# Yes, understanding the impact of sentiment on ratings within categories can help developers tailor their strategies to improve user experience and satisfaction.

# Are there any insights that lead to negative growth? Justify with specific reason.
# Categories with negative sentiment and lower ratings indicate areas where user dissatisfaction is prevalent, highlighting the need for improvements to prevent negative growth.
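
Merging the review rows directly onto the apps table repeats each app once per review, so heavily reviewed apps dominate the plot. One possible alternative, sketched here on toy data with hypothetical app names, is to average the compound sentiment per app first and then merge, so each app contributes a single row. The `mean_sentiment` column name is an assumption introduced for this example.

```python
import pandas as pd

# Toy stand-ins for apps_df and reviews_df; the app names are hypothetical.
toy_apps = pd.DataFrame({
    'App': ['Alpha', 'Beta'],
    'Category': ['TOOLS', 'GAME'],
    'Rating': [3.9, 4.6],
})
toy_reviews = pd.DataFrame({
    'App': ['Alpha', 'Alpha', 'Beta', 'Beta', 'Beta'],
    'sentiment': [0.2, -0.4, 0.8, 0.6, 0.7],
})

# Average the compound sentiment per app so each app contributes one row
app_sentiment = (
    toy_reviews.groupby('App', as_index=False)['sentiment']
    .mean()
    .rename(columns={'sentiment': 'mean_sentiment'})
)

# Left merge keeps apps without reviews (their mean_sentiment is NaN)
merged_toy = toy_apps.merge(app_sentiment, on='App', how='left')
print(merged_toy)
```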


# Exception Handling and Production Grade Code:

In [None]:
try:
    # Example of data loading and preprocessing with exception handling
    apps_df = pd.read_csv('googleplaystore.csv')
    reviews_df = pd.read_csv('googleplaystore_user_reviews.csv')

    # Further processing steps...

except FileNotFoundError as e:
    print(f"Error: {e}. Please ensure the dataset files are in the correct directory.")
except pd.errors.ParserError as e:
    print(f"Error: {e}. There was an issue parsing the CSV files.")
except Exception as e:
    print(f"An unexpected error occurred: {e}")
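
The same pattern can be wrapped in a small reusable function so that every load goes through one code path. The sketch below is illustrative only: `load_csv` is a hypothetical helper, not part of pandas, and the file name is chosen on the assumption that no such file exists.

```python
import pandas as pd

def load_csv(path):
    """Read a CSV file and return a DataFrame, or None if loading fails.

    load_csv is a hypothetical helper written for this example.
    """
    try:
        return pd.read_csv(path)
    except FileNotFoundError:
        print(f"File not found: {path}")
    except pd.errors.EmptyDataError:
        print(f"File is empty: {path}")
    except pd.errors.ParserError as err:
        print(f"Could not parse {path}: {err}")
    return None

# A path that is assumed not to exist, to exercise the failure branch
missing = load_csv('this_file_does_not_exist_12345.csv')
print(missing is None)
```

Returning `None` lets the caller decide whether to stop or continue; later cells should check for it before using the result.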


# ***Let's Begin !***

## ***1. Know Your Data***

### Import Libraries

In [None]:
# Import Libraries

### Dataset Loading

In [None]:
# Load Dataset

### Dataset First View

In [None]:
# Dataset First Look

### Dataset Rows & Columns count

In [None]:
# Dataset Rows & Columns count

### Dataset Information

In [None]:
# Dataset Info

#### Duplicate Values

In [None]:
# Dataset Duplicate Value Count

#### Missing Values/Null Values

In [None]:
# Missing Values/Null Values Count

In [None]:
# Visualizing the missing values
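
As a starting point for the duplicate and missing-value cells above, the following sketch counts both on a toy DataFrame. The column names mirror the apps data, but the rows are invented for illustration.

```python
import pandas as pd

# Toy frame with one duplicated row and two missing values
toy = pd.DataFrame({
    'App': ['Alpha', 'Alpha', 'Beta', 'Gamma'],
    'Rating': [4.1, 4.1, None, 3.7],
    'Size': ['19M', '19M', '201k', None],
})

# Number of rows that exactly repeat an earlier row
n_duplicates = int(toy.duplicated().sum())

# Missing values per column, as counts and as a percentage of rows
missing_counts = toy.isna().sum()
missing_pct = (toy.isna().mean() * 100).round(1)

print(n_duplicates)
print(pd.DataFrame({'missing': missing_counts, 'percent': missing_pct}))
```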

### What do you know about your dataset?

Answer Here

## ***2. Understanding Your Variables***

In [None]:
# Dataset Columns

In [None]:
# Dataset Describe

### Variables Description

Answer Here

### Check Unique Values for each variable.

In [None]:
# Check Unique Values for each variable.

## ***3. Data Wrangling***

### Data Wrangling Code

In [None]:
# Write your code to make your dataset analysis ready.

### What manipulations have you done and what insights did you find?

Answer Here.

## ***4. Data Visualization, Storytelling & Experimenting with charts : Understand the relationships between variables***

#### Chart - 1

In [None]:
# Chart - 1 visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Here

#### Chart - 2

In [None]:
# Chart - 2 visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Here

#### Chart - 3

In [None]:
# Chart - 3 visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Here

#### Chart - 4

In [None]:
# Chart - 4 visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Here

#### Chart - 5

In [None]:
# Chart - 5 visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Here

#### Chart - 6

In [None]:
# Chart - 6 visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Here

#### Chart - 7

In [None]:
# Chart - 7 visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Here

#### Chart - 8

In [None]:
# Chart - 8 visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Here

#### Chart - 9

In [None]:
# Chart - 9 visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Here

#### Chart - 10

In [None]:
# Chart - 10 visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Here

#### Chart - 11

In [None]:
# Chart - 11 visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Here

#### Chart - 12

In [None]:
# Chart - 12 visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Here

#### Chart - 13

In [None]:
# Chart - 13 visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Here

#### Chart - 14 - Correlation Heatmap

In [None]:
# Correlation Heatmap visualization code
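
A correlation heatmap is drawn from a correlation matrix of the numeric columns. The sketch below computes that matrix on toy values, assuming `Rating`, `Installs`, and `Price` have already been converted to numeric types; the numbers are invented so that the expected signs are easy to see.

```python
import pandas as pd

# Toy numeric columns: Installs rises with Rating, Price falls with Rating
toy = pd.DataFrame({
    'Rating': [4.0, 4.2, 4.4, 4.6],
    'Installs': [1000, 2000, 3000, 4000],
    'Price': [3.0, 2.0, 1.0, 0.0],
})

# Pearson correlation between every pair of the selected numeric columns
corr = toy[['Rating', 'Installs', 'Price']].corr()
print(corr.round(2))
```

The resulting matrix can then be passed to `sns.heatmap(corr, annot=True)` to produce the chart.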

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

#### Chart - 15 - Pair Plot

In [None]:
# Pair Plot visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

## **5. Solution to Business Objective**

#### What do you suggest the client do to achieve the Business Objective?
Explain Briefly.

Answer Here.

# **Conclusion**

Write the conclusion here.

### ***Hurrah! You have successfully completed your EDA Capstone Project !!!***