<a href="https://colab.research.google.com/github/Prajwal3440/EDA/blob/main/Play_Store.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>


# **Project Name**  -  Play Store



##### **Project Type**    - EDA
##### **Contribution**    - Individual/Team


**Project Summary -**

The main goal of this project is to analyze the Play Store app data, including attributes such as app ratings, size, category, price, content rating, and more. By performing this analysis, we aim to uncover key insights that can help developers, marketers, or analysts better understand the app market dynamics, identify successful app characteristics, and make data-driven decisions.

**Dataset:**

The dataset used in this project contains information about thousands of Android apps available on the Play Store. It may include attributes like:

* **App:** The name of the app.
* **Category:** The category to which the app belongs (e.g., Education, Games, Business).
* **Rating:** The average user rating (usually on a scale from 1 to 5).
* **Reviews:** The number of reviews the app has received.
* **Size:** The size of the app, stored as text (for example values ending in `M` or `k`, or `Varies with device`).
* **Installs:** The number of downloads the app has.
* **Price:** The price of the app, if it is paid.
* **Content Rating:** The content rating (e.g., Everyone, Teen, Mature 17+).
* **Last Updated:** The date of the last update.
* **Current Ver:** The version of the app.

**Steps Involved:**

**Data Cleaning:**

* Handle missing data, remove duplicates, and address any inconsistencies (e.g., apps without ratings or categories).

**Exploratory Data Analysis (EDA):**

* Perform basic descriptive statistics to get a sense of the data distribution.
* Visualize key attributes, such as the relationship between app ratings and the number of installs, or the distribution of apps across different categories.
* Analyze correlations between numerical variables (e.g., rating vs. price).

**Data Visualization:**

* Use libraries like Matplotlib, Seaborn, or Plotly for creating visualizations such as bar charts, scatter plots, histograms, and heatmaps.
* Create visualizations to understand the distribution of app categories, app ratings, or price versus installs.

**Insights and Conclusion:**

* Highlight any trends, such as paid apps vs. free apps, or trends based on the content rating.

**Advanced Analysis (Optional):**

* Sentiment analysis of app reviews to understand user satisfaction.
* Predictive modeling using machine learning (e.g., predicting app success based on various features).

**Tools & Libraries Used:**

* **Python Libraries:** Pandas, NumPy, Matplotlib, Seaborn, Plotly
* **Data Cleaning:** Handling missing values, removing outliers, etc.
* **Data Visualization:** Matplotlib, Seaborn
* **Advanced Analysis (Optional):** Scikit-learn, Natural Language Toolkit (NLTK)

**Outcome:**


# **GitHub Link -**

Provide your GitHub Link here.

# **Problem Statement**


**Write Problem Statement Here.**

#### **Define Your Business Objective?**

Answer Here.

# **General Guidelines** : -  

1.   Well-structured, formatted, and commented code is required.
2.   Exception Handling, Production Grade Code & Deployment Ready Code will be a plus. Those students will be awarded some additional credits.
     
     The additional credits will have advantages over other students during Star Student selection.
       
             [ Note: - Deployment Ready Code is defined as, the whole .ipynb notebook should be executable in one go
                       without a single error logged. ]

3.   Each and every logic should have proper comments.
4. You may add as many charts as you want. Make sure the following format is answered for each chart.
        

```
# Chart visualization code
```
            

*   Why did you pick the specific chart?
*   What is/are the insight(s) found from the chart?
* Will the gained insights help create a positive business impact? Are there any insights that lead to negative growth? Justify with a specific reason.

5. You have to create at least 20 logical & meaningful charts having important insights.


[ Hints : - Do the Visualization in a structured way while following the "UBM" Rule.

U - Univariate Analysis,

B - Bivariate Analysis (Numerical - Categorical, Numerical - Numerical, Categorical - Categorical)

M - Multivariate Analysis
 ]
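
As a rough illustration of the "UBM" rule, the sketch below computes the kind of summary that typically sits behind each chart type. The toy data frame and its values are made up for this example and are not taken from the Play Store dataset.

```python
import pandas as pd

# Made-up toy data, only to illustrate the three levels of analysis
toy = pd.DataFrame({
    "Category": ["GAME", "GAME", "TOOLS", "TOOLS", "GAME"],
    "Type": ["Free", "Paid", "Free", "Free", "Free"],
    "Rating": [4.5, 4.0, 3.5, 4.5, 5.0],
})

# U - Univariate: one variable at a time (e.g. a bar chart of counts)
univariate = toy["Category"].value_counts()

# B - Bivariate: numerical vs. categorical (e.g. mean rating per category)
bivariate = toy.groupby("Category")["Rating"].mean()

# M - Multivariate: rating broken down by two categorical variables
multivariate = toy.pivot_table(
    index="Category", columns="Type", values="Rating", aggfunc="mean"
)

print(univariate, bivariate, multivariate, sep="\n\n")
```

Each of these summaries can be plotted with `.plot(kind='bar')` or passed to a Seaborn function.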





# ***Let's Begin !***

## ***1. Know Your Data***

### Import Libraries

In [None]:
# Import Libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

### Dataset Loading

In [None]:
from google.colab import drive
drive.mount('/content/drive')

In [None]:
# Load Dataset
path='/content/drive/MyDrive/Data set / Play Store Data.csv'
df=pd.read_csv(path)
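
Since the guidelines ask for exception handling, the load step could be wrapped as in the minimal sketch below. The helper name `load_csv_safely` is hypothetical and not part of the original notebook; it only covers the most common read failures.

```python
import pandas as pd

def load_csv_safely(csv_path):
    """Hypothetical helper: return a DataFrame, or None if the file cannot be read."""
    try:
        return pd.read_csv(csv_path)
    except FileNotFoundError:
        print(f"File not found: {csv_path}")
    except pd.errors.EmptyDataError:
        print(f"File is empty: {csv_path}")
    except pd.errors.ParserError as err:
        print(f"Could not parse {csv_path}: {err}")
    return None

# A path that does not exist returns None instead of stopping the notebook
result = load_csv_safely("/nonexistent/folder/missing_file.csv")
print(result)
```

Calling `load_csv_safely(path)` with the Drive path above would return the same data frame as `pd.read_csv(path)` when the file is readable.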

### Dataset First View

In [None]:
# Dataset First Look
df.head()


### Dataset Rows & Columns count

In [None]:
# Dataset Rows & Columns count
df.shape

### Dataset Information

In [None]:
# Dataset Info
df.info()

#### Duplicate Values

In [None]:
# Dataset Duplicate Value Count
df.duplicated().sum()

#### Missing Values/Null Values

In [None]:
# Missing Values/Null Values Count
df.isnull().sum()

In [None]:
df[df['Rating'].isnull()]

In [None]:
# Visualizing the missing values
sns.heatmap(df.isnull(),cbar=False)
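
The heatmap shows where values are missing, but not how much of each column is affected. A minimal sketch of a percentage summary follows; the helper `missing_summary` and the toy data are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

def missing_summary(frame):
    """Hypothetical helper: count and percentage of missing values per column."""
    counts = frame.isnull().sum()
    percent = (counts / len(frame) * 100).round(2)
    summary = pd.DataFrame({"missing": counts, "percent": percent})
    # Keep only columns that actually have missing values, largest first
    return summary[summary["missing"] > 0].sort_values("missing", ascending=False)

# Made-up toy data standing in for the real data frame
toy = pd.DataFrame({
    "App": ["a", "b", "c", "d"],
    "Rating": [4.1, np.nan, np.nan, 3.9],
    "Type": ["Free", "Paid", None, "Free"],
})
summary = missing_summary(toy)
print(summary)
```

On the real data this would be called as `missing_summary(df)`.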

### What did you know about your dataset?

Answer Here

## ***2. Understanding Your Variables***

In [None]:
# Dataset Columns
df.columns

In [None]:
# Dataset Describe
df.describe()

### Variables Description

Answer Here

### Check Unique Values for each variable.

In [None]:
# Check Unique Values for each variable.
df.nunique()


## 3. ***Data Wrangling***

### Data Wrangling Code

In [None]:
# Ratings above 5 are out of range; replace them with the median rating
df.loc[df['Rating']>5,'Rating'] = df['Rating'].median()
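
One possible variation is to compute the median from the in-range ratings only, so that the out-of-range values do not influence their own replacement. The sketch below uses made-up values and assumes a valid range of 1 to 5.

```python
import numpy as np
import pandas as pd

# Made-up ratings; 19.0 is out of range and NaN is a missing rating
toy = pd.DataFrame({"Rating": [4.0, 4.5, 19.0, 3.5, np.nan]})

# Median of the ratings that fall inside the assumed valid range 1..5
valid = toy["Rating"].between(1, 5)
median_valid = toy.loc[valid, "Rating"].median()

# Replace only the out-of-range values; missing ratings stay missing
toy.loc[toy["Rating"] > 5, "Rating"] = median_valid
print(toy)
```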

In [None]:
# Write your code to make your dataset analysis ready.
df=df.drop_duplicates(subset='App')
df
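
Note that `df.duplicated().sum()` counts rows that are identical in every column, while `drop_duplicates(subset='App')` also removes rows that share an app name but differ elsewhere. A small sketch with made-up data shows the difference:

```python
import pandas as pd

# Made-up data: app "A" is an exact duplicate, app "B" differs in Reviews
toy = pd.DataFrame({"App": ["A", "A", "B", "B"], "Reviews": [10, 10, 5, 7]})

full_row_dupes = toy.duplicated().sum()              # identical rows only
same_app_dupes = toy.duplicated(subset="App").sum()  # repeated app names

# Keeps the first occurrence of each app name
deduped = toy.drop_duplicates(subset="App")
print(full_row_dupes, same_app_dupes)
print(deduped)
```

Because the first occurrence is kept by default, the surviving row for a repeated app is whichever one appears first in the file.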

In [None]:
# Parse the date text; values that cannot be parsed become NaT
df['Last Updated'] = pd.to_datetime(df['Last Updated'],errors='coerce')
df['Last Updated'].iloc[0]

In [None]:
df['Last Updated Month']=df['Last Updated'].dt.month
df['Last Updated Year']=df['Last Updated'].dt.year
df
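
With `errors='coerce'`, any value that cannot be parsed silently becomes `NaT`, so it can help to count how many rows were affected. The sketch below uses made-up values and assumes dates written like `January 7, 2018`; the explicit `format` is an assumption for this example.

```python
import pandas as pd

# Made-up values; the last one is deliberately not a date
toy = pd.DataFrame({"Last Updated": ["January 7, 2018", "August 1, 2018", "not a date"]})

toy["Last Updated"] = pd.to_datetime(
    toy["Last Updated"], format="%B %d, %Y", errors="coerce"
)
toy["Last Updated Month"] = toy["Last Updated"].dt.month
toy["Last Updated Year"] = toy["Last Updated"].dt.year

# Number of rows whose date could not be parsed
unparsed = toy["Last Updated"].isna().sum()
print(unparsed)
print(toy)
```

Rows with `NaT` also get missing month and year values, which is why those columns can show up as floats.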

In [None]:
df.isnull().sum()

### What all manipulations have you done and insights you found?

Answer Here.

## ***4. Data Visualization, Storytelling & Experimenting with charts : Understand the relationships between variables***

#### Chart - 1

In [None]:
df['Last Updated Month'].value_counts().sort_index()

In [None]:
# Chart - 1 visualization code
plt.figure(figsize=(10,5))
plt.title('Last Updated Month Wise count')
plt.xlabel('Last Updated Month')
plt.ylabel('Count')
df['Last Updated Month'].value_counts().sort_index().plot(kind='bar')

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Here

#### Chart - 2

In [None]:
df['Last Updated Year'].value_counts().sort_index()

In [None]:
# Chart - 2 visualization code
x=df['Last Updated Year'].value_counts().sort_index().index
y=df['Last Updated Year'].value_counts().sort_index().values
plt.figure(figsize=(10,5))
plt.title('Last Updated Year Wise count')
plt.xlabel('Last Updated Year')
plt.ylabel('Count')
sns.barplot(x=x,y=y,palette=sns.color_palette('Set2'))

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Here

#### Chart - 3

In [None]:
df['Category'].value_counts()

In [None]:
# Chart - 3 visualization code
plt.figure(figsize=(10,5))
plt.title('Category Wise count')
plt.xlabel('Category')
plt.ylabel('Count')
df['Category'].value_counts().plot(kind='bar')

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Here

#### Chart - 4

In [None]:
category_rating=df.groupby('Category')['Rating'].mean().sort_values(ascending=False)
category_rating

In [None]:
# Chart - 4 visualization code
plt.figure(figsize=(10,5))
plt.title('Category Wise Rating')
plt.xlabel('Category')
plt.ylabel('Rating')
category_rating.plot(kind='bar')

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Here

#### Chart - 5

In [None]:
most_rewiewsapps=df.groupby('App')['Reviews'].max().sort_values(ascending=False).head()
most_rewiewsapps

In [None]:
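# Remove row 10472 before the numeric conversion; errors='ignore' makes the cell safe to re-run
df.drop(index=10472,inplace=True,errors='ignore')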

In [None]:
df['Reviews']=pd.to_numeric(df['Reviews'],errors='coerce')

In [None]:
most_rewiewsapps=df.groupby('App')['Reviews'].max().sort_values(ascending=False).head()
most_rewiewsapps=most_rewiewsapps.astype(int)
most_rewiewsapps
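
Before coercing a text column to numbers, it can be useful to list the values that will not convert, so that problem rows are found by inspection rather than by index. A minimal sketch with made-up values:

```python
import pandas as pd

# Made-up data; "3.0M" stands in for a value that is not a plain number
toy = pd.DataFrame({"App": ["A", "B", "C"], "Reviews": ["159", "3.0M", "87510"]})

as_number = pd.to_numeric(toy["Reviews"], errors="coerce")

# Rows that had a value but could not be converted
bad_rows = toy[as_number.isna() & toy["Reviews"].notna()]
print(bad_rows)

toy["Reviews"] = as_number
```

The same check on the real data would use `df['Reviews']` in place of `toy["Reviews"]`.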

In [None]:
# Chart - 5 visualization code
plt.figure(figsize=(10,5))
plt.title('Most Reviewed Apps')
plt.xlabel('App')
plt.ylabel('Reviews')
most_rewiewsapps.plot(kind='bar')

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Here

#### Chart - 6

In [None]:
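# Assumes 'Installs' is stored as text such as '1,000,000+'; strip ',' and '+' before converting
df['Installs']=pd.to_numeric(df['Installs'].astype(str).str.replace(r'[,+]','',regex=True),errors='coerce')
top_installed_apps=df.groupby('App')['Installs'].max().sort_values(ascending=False).head()
top_installed_apps

In [None]:
top_installed_apps=top_installed_apps.astype(int)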


In [None]:
# Chart - 6 visualization code
plt.figure(figsize=(10,5))
plt.title('Top Installs Apps')
plt.xlabel('App')
plt.ylabel('Installs')
top_installed_apps.plot(kind='line')

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Here

#### Chart - 7

In [None]:
maxsize_app=df.groupby('App')['Size'].max().sort_values(ascending=False).head()
maxsize_app

In [None]:
# Convert 'Size' to kilobytes: '19M' -> 19000.0, '201k' -> 201.0, 'Varies with device' -> NaN
size_text=df['Size'].astype(str)
size_num=pd.to_numeric(size_text.str.replace(r'[Mk]$','',regex=True),errors='coerce')
df['size']=size_num*np.where(size_text.str.endswith('M'),1000,1)

In [None]:
# Use the numeric 'size' column so the largest apps are ranked by value, not by text
maxsize_app=df.groupby('App')['size'].max().sort_values(ascending=False).head()
maxsize_app
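
Sizes with a decimal point, such as `1.5M`, are worth checking explicitly when converting the text to numbers. The sketch below is one possible conversion to kilobytes; the helper name `size_to_kb` is hypothetical and the sample values are made up.

```python
import numpy as np
import pandas as pd

def size_to_kb(size_text):
    """Hypothetical helper: convert text like '19M' or '201k' to kilobytes as floats."""
    text = size_text.astype(str).str.strip()
    # Drop the trailing unit letter; anything that is still not a number becomes NaN
    number = pd.to_numeric(text.str.replace(r"[Mk]$", "", regex=True), errors="coerce")
    # Values given in megabytes are scaled by 1000, kilobyte values are kept as they are
    factor = np.where(text.str.endswith("M"), 1000, 1)
    return number * factor

sizes = pd.Series(["19M", "1.5M", "201k", "Varies with device"])
converted = size_to_kb(sizes)
print(converted.tolist())
```

Applied to the notebook, this would read `df['size'] = size_to_kb(df['Size'])`.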

In [None]:
# Chart - 7 visualization code
plt.figure(figsize=(10,5))
plt.title('Maximum Size Apps')
plt.xlabel('App')
plt.ylabel('')
maxsize_app.plot(kind='pie')

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Here

#### Chart - 8

In [None]:
# Chart - 8 visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Here

#### Chart - 9

In [None]:
# Chart - 9 visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Here

#### Chart - 10

In [None]:
# Chart - 10 visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Here

#### Chart - 11

In [None]:
# Chart - 11 visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Here

#### Chart - 12

In [None]:
# Chart - 12 visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Here

#### Chart - 13

In [None]:
# Chart - 13 visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Here

#### Chart - 14 - Correlation Heatmap

In [None]:
# Correlation Heatmap visualization code
plt.figure(figsize=(10,5))
sns.heatmap(df.select_dtypes(include='number').corr(),annot=True)
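
A correlation matrix is symmetric, so the upper triangle repeats the lower one. As a sketch under that observation, a boolean mask can hide the repeated half; the toy data below is made up, and the resulting `mask` could be passed to `sns.heatmap(..., mask=mask)`.

```python
import numpy as np
import pandas as pd

# Made-up numeric data plus one text column that select_dtypes will skip
toy = pd.DataFrame({
    "Rating": [4.0, 4.5, 3.5, 5.0],
    "Reviews": [100, 200, 50, 400],
    "label": ["a", "b", "c", "d"],
})

corr = toy.select_dtypes(include="number").corr()

# True marks cells to hide: everything strictly above the diagonal
mask = np.triu(np.ones_like(corr, dtype=bool), k=1)
print(corr)
print(mask)
```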

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

#### Chart - 15 - Pair Plot

In [None]:
# Pair Plot visualization code
sns.pairplot(df)

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

## **5. Solution to Business Objective**

#### What do you suggest the client to achieve Business Objective ?
Explain Briefly.

Answer Here.

# **Conclusion**

Write the conclusion here.

### ***Hurrah! You have successfully completed your EDA Capstone Project !!!***