# Project name: Google Play Store Apps Analysis & Visualization

## **About the project**
In this project, you will work with a real-world dataset from the Google Play Store, one of the most widely used platforms for downloading Android apps. The project aims to clean the dataset, analyze it, and extract high-quality insights from it. It also involves visualizing the data so that trends and differences between app categories are easier to understand.

# **Project Description**
This project will help you understand how a real-world database is analyzed using SQL: how to get the most insight out of a dataset, how to pre-process the data in Python so that later analysis runs on clean input, how structured query language helps retrieve useful information from a database, and how to visualize the data with Power BI.

# **Module 1: Pre-processing and Analyzing Data Using Python and SQL**
In this module, the raw data is first cleaned in Python with pandas, and you will then query the dataset using SQL to gain insights from the database. The problem statements will be provided to you, and you need to solve them using your own logic. Several SQL concepts are used in this process, such as aggregating, grouping, and ordering data. Module 1 consists of the following subtasks:

# **Cleaning playstore_apps**

In [None]:
# importing libraries

import numpy as np
import pandas as pd

In [None]:
# import the playstore_apps dataset, using the App column as the index

app=pd.read_csv("playstore_apps.csv",index_col='App')

In [None]:
app.shape

(10841, 12)

In [None]:
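# drop duplicated rows; keep=False removes every copy (not just the extras),
# and because App is the index it is not part of the comparison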

app.drop_duplicates(keep=False,inplace=True)
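The toy example below (hypothetical apps and values, not rows from playstore_apps.csv) is a minimal sketch of how `drop_duplicates` behaves on a frame indexed by `App`. The pandas documentation states that the index is ignored when looking for duplicates, and `keep=False` discards every copy of a duplicated row. The alternative shown resets the index first so that the app name is compared too, and keeps the first copy; whether that is preferable depends on the analysis, and using it would change the row counts reported below.

```python
import pandas as pd

# Hypothetical toy data, indexed by App like the notebook's `app` frame
toy = pd.DataFrame(
    {
        "App": ["A", "A", "B", "C"],
        "Category": ["GAME", "GAME", "TOOLS", "TOOLS"],
        "Rating": [4.5, 4.5, 4.0, 4.0],
    }
).set_index("App")

# As in the notebook: the App index is ignored and keep=False drops every copy.
# B and C differ only by name, so they are treated as duplicates as well.
all_copies_dropped = toy.drop_duplicates(keep=False)

# Alternative: bring App back as a column so it is compared,
# then keep the first occurrence of each fully duplicated row.
first_kept = toy.reset_index().drop_duplicates(keep="first").set_index("App")

print(len(all_copies_dropped))  # 0
print(list(first_kept.index))   # ['A', 'B', 'C']
```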

In [None]:
app.head()

| App | Category | Rating | Reviews | Size | Installs | Type | Price | Content Rating | Genres | Last Updated | Current Ver | Android Ver |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Photo Editor & Candy Camera & Grid & ScrapBook | ART_AND_DESIGN | 4.1 | 159.0 | 19M | 10000.0 | Free | 0.0 | Everyone | Art & Design | 07-01-2018 | 1.0.0 | 4.0.3 and up |
| Coloring book moana | ART_AND_DESIGN | 3.9 | 967.0 | 14M | 500000.0 | Free | 0.0 | Everyone | Art & Design;Pretend Play | 15-01-2018 | 2.0.0 | 4.0.3 and up |
| U Launcher Lite – FREE Live Cool Themes, Hide Apps | ART_AND_DESIGN | 4.7 | 87510.0 | 8.7M | 5000000.0 | Free | 0.0 | Everyone | Art & Design | 01-08-2018 | 1.2.4 | 4.0.3 and up |
| Sketch - Draw & Paint | ART_AND_DESIGN | 4.5 | 215644.0 | 25M | 50000000.0 | Free | 0.0 | Teen | Art & Design | 08-06-2018 | Varies with device | 4.2 and up |
| Pixel Draw - Number Art Coloring Book | ART_AND_DESIGN | 4.3 | 967.0 | 2.8M | 100000.0 | Free | 0.0 | Everyone | Art & Design;Creativity | 20-06-2018 | 1.1 | 4.4 and up |


In [None]:
app.info()

<class 'pandas.core.frame.DataFrame'>
Index: 9930 entries, Photo Editor & Candy Camera & Grid & ScrapBook to iHoroscope - 2018 Daily Horoscope & Astrology
Data columns (total 12 columns):
 #   Column          Non-Null Count  Dtype  
---  ------          --------------  -----  
 0   Category        9930 non-null   object 
 1   Rating          8487 non-null   float64
 2   Reviews         9929 non-null   float64
 3   Size            9930 non-null   object 
 4   Installs        9929 non-null   float64
 5   Type            9929 non-null   object 
 6   Price           9929 non-null   float64
 7   Content Rating  9929 non-null   object 
 8   Genres          9930 non-null   object 
 9   Last Updated    9929 non-null   object 
 10  Current Ver     9922 non-null   object 
 11  Android Ver     9927 non-null   object 
dtypes: float64(4), object(8)
memory usage: 1008.5+ KB


In [None]:
# checking for null values
app.isnull().sum()

Category             0
Rating            1443
Reviews              1
Size                 0
Installs             1
Type                 1
Price                1
Content Rating       1
Genres               0
Last Updated         1
Current Ver          8
Android Ver          3
dtype: int64
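To judge how much each column is affected, the counts can be set next to their share of all rows. A minimal sketch on a hypothetical toy frame; applied to the counts printed above, `Rating` is missing in 1443 of 9930 rows (roughly 14.5%), while every other column is missing at most 8 values.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame standing in for `app`
toy = pd.DataFrame(
    {
        "Rating": [4.1, np.nan, 3.9, np.nan],
        "Type": ["Free", "Free", None, "Paid"],
    }
)

# Number and percentage of missing values per column
missing = pd.DataFrame(
    {
        "count": toy.isnull().sum(),
        "percent": toy.isnull().mean() * 100,
    }
)
print(missing)
```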

In [None]:
# find the row whose Category is '1.9' (a malformed row whose values are shifted one column to the left) and drop it
print(app[app['Category'] == '1.9'])

                                        Category  Rating  Reviews    Size  \
App                                                                         
Life Made WI-Fi Touchscreen Photo Frame      1.9    19.0      NaN  1,000+   

                                         Installs Type  Price Content Rating  \
App                                                                            
Life Made WI-Fi Touchscreen Photo Frame       NaN    0    NaN            NaN   

                                                    Genres Last Updated  \
App                                                                       
Life Made WI-Fi Touchscreen Photo Frame  February 11, 2018          NaN   

                                        Current Ver Android Ver  
App                                                              
Life Made WI-Fi Touchscreen Photo Frame  4.0 and up         NaN  


In [None]:
app.drop("Life Made WI-Fi Touchscreen Photo Frame",inplace=True)
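Dropping by the app's name works here because only one row is affected. Below is a sketch of condition-based alternatives on hypothetical toy data: the first removes any row whose `Category` is `'1.9'`, the second relies on the fact that Play Store ratings cannot exceed 5 (the shifted row carries `19.0` in `Rating`). The second filter keeps apps with missing ratings, since `NaN > 5` evaluates to `False`.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame; "Shifted App" mimics the misaligned row
toy = pd.DataFrame(
    {
        "Category": ["ART_AND_DESIGN", "1.9", "GAME"],
        "Rating": [4.1, 19.0, np.nan],
    },
    index=pd.Index(["App A", "Shifted App", "App B"], name="App"),
)

# Filter on the bad Category value instead of hard-coding the app name
by_category = toy[toy["Category"] != "1.9"]

# Filter on impossible ratings; NaN > 5 is False, so unrated apps are kept
by_rating = toy[~(toy["Rating"] > 5)]
```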

In [None]:
print(app['Category'].unique())

['ART_AND_DESIGN' 'AUTO_AND_VEHICLES' 'BEAUTY' 'BOOKS_AND_REFERENCE'
 'BUSINESS' 'COMICS' 'COMMUNICATION' 'DATING' 'EDUCATION' 'ENTERTAINMENT'
 'EVENTS' 'FINANCE' 'FOOD_AND_DRINK' 'HEALTH_AND_FITNESS' 'HOUSE_AND_HOME'
 'LIBRARIES_AND_DEMO' 'LIFESTYLE' 'GAME' 'FAMILY' 'MEDICAL' 'SOCIAL'
 'SHOPPING' 'PHOTOGRAPHY' 'SPORTS' 'TRAVEL_AND_LOCAL' 'TOOLS'
 'PERSONALIZATION' 'PRODUCTIVITY' 'PARENTING' 'WEATHER' 'VIDEO_PLAYERS'
 'NEWS_AND_MAGAZINES' 'MAPS_AND_NAVIGATION']


In [None]:
# replace missing ratings with 0; 0 then means "not rated",
# so exclude these rows when averaging ratings
app["Rating"] = app["Rating"].fillna(0)
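Filling with 0 keeps every app in the table but changes any average computed afterwards. The hypothetical three-value series below shows the effect, and one way to exclude the filled zeros again. This relies on no real rating being 0, which holds for Play Store ratings since they range from 1 to 5.

```python
import numpy as np
import pandas as pd

# Hypothetical ratings: one app has no rating yet
ratings = pd.Series([4.0, np.nan, 5.0], name="Rating")

filled = ratings.fillna(0)

filled_mean = filled.mean()                   # (4 + 0 + 5) / 3 = 3.0
skipna_mean = ratings.mean()                  # NaN skipped: (4 + 5) / 2 = 4.5
rated_only_mean = filled[filled > 0].mean()   # filled zeros excluded again: 4.5
```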

In [None]:
# drop the remaining rows that contain NaN values
app.dropna(inplace=True)
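`dropna()` uses `how="any"` by default, so a row is removed as soon as any one of its columns is missing. Since `Rating` was filled beforehand, the rows removed at this step are those missing other fields such as `Current Ver` or `Android Ver`. A sketch on hypothetical data contrasting the default with `subset`, which only inspects the listed columns:

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame with gaps in the version columns
toy = pd.DataFrame(
    {
        "Rating": [4.1, 0.0, 3.9],
        "Current Ver": ["1.0.0", np.nan, "2.0"],
        "Android Ver": ["4.0.3 and up", "4.4 and up", np.nan],
    }
)

any_missing = toy.dropna()                     # default how="any": only row 0 survives
ver_only = toy.dropna(subset=["Current Ver"])  # rows 0 and 2 survive
```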

In [None]:
# rename columns so that their names contain no spaces
app=app.rename(columns={"Current Ver": "current_Ver","Android Ver":"Android_Ver","Last Updated":"Last_Updated","Content Rating":"Content_Rating"})
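The same renaming can be applied to every column at once by replacing spaces with underscores. A sketch on an empty hypothetical frame; note that it yields `Current_Ver` (capital C) rather than the notebook's `current_Ver`, so later SQL queries would need to use whichever name was actually exported.

```python
import pandas as pd

# Hypothetical empty frame with the original, space-containing column names
toy = pd.DataFrame(columns=["Content Rating", "Last Updated", "Current Ver", "Android Ver"])

# Replace spaces with underscores in all column names
toy.columns = toy.columns.str.replace(" ", "_", regex=False)

print(list(toy.columns))
# ['Content_Rating', 'Last_Updated', 'Current_Ver', 'Android_Ver']
```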

In [None]:
app.info()

<class 'pandas.core.frame.DataFrame'>
Index: 9918 entries, Photo Editor & Candy Camera & Grid & ScrapBook to iHoroscope - 2018 Daily Horoscope & Astrology
Data columns (total 12 columns):
 #   Column          Non-Null Count  Dtype  
---  ------          --------------  -----  
 0   Category        9918 non-null   object 
 1   Rating          9918 non-null   float64
 2   Reviews         9918 non-null   float64
 3   Size            9918 non-null   object 
 4   Installs        9918 non-null   float64
 5   Type            9918 non-null   object 
 6   Price           9918 non-null   float64
 7   Content_Rating  9918 non-null   object 
 8   Genres          9918 non-null   object 
 9   Last_Updated    9918 non-null   object 
 10  current_Ver     9918 non-null   object 
 11  Android_Ver     9918 non-null   object 
dtypes: float64(4), object(8)
memory usage: 1007.3+ KB


In [None]:
app.shape

(9918, 12)

In [None]:
# exporting the cleaned data
app.to_csv("cleaned_apps_data.csv")
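Because `App` is the index, `to_csv` writes it as the first column by default (`index=True`), and it can be restored by reading the file back with `index_col='App'`. A minimal round-trip sketch that uses an in-memory buffer instead of a file and a single row taken from the `head()` output above:

```python
import io

import pandas as pd

toy = pd.DataFrame(
    {"Category": ["ART_AND_DESIGN"], "Rating": [3.9]},
    index=pd.Index(["Coloring book moana"], name="App"),
)

buffer = io.StringIO()
toy.to_csv(buffer)   # header line is "App,Category,Rating"
buffer.seek(0)       # rewind before reading

restored = pd.read_csv(buffer, index_col="App")
```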

# **Cleaning playstore_reviews**

In [None]:
# read the playstore_reviews dataset, using the App column as the index
reviews=pd.read_csv("playstore_reviews.csv",index_col='App')

In [None]:
reviews.head()

| App | Translated_Review | Sentiment | Sentiment_Polarity | Sentiment_Subjectivity |
|---|---|---|---|---|
| 10 Best Foods for You | I like eat delicious food. That's I'm cooking ... | Positive | 1.0 | 0.533333 |
| 10 Best Foods for You | This help eating healthy exercise regular basis | Positive | 0.25 | 0.288462 |
| 10 Best Foods for You | NaN | NaN | NaN | NaN |
| 10 Best Foods for You | Works great especially going grocery store | Positive | 0.4 | 0.875 |
| 10 Best Foods for You | Best idea us | Positive | 1.0 | 0.3 |


In [None]:
reviews.shape

(44260, 4)

In [None]:
reviews.info()

<class 'pandas.core.frame.DataFrame'>
Index: 44260 entries, 10 Best Foods for You to Fantasy Football Manager (FPL)
Data columns (total 4 columns):
 #   Column                  Non-Null Count  Dtype  
---  ------                  --------------  -----  
 0   Translated_Review       25609 non-null  object 
 1   Sentiment               25612 non-null  object 
 2   Sentiment_Polarity      25612 non-null  float64
 3   Sentiment_Subjectivity  25612 non-null  float64
dtypes: float64(2), object(2)
memory usage: 1.7+ MB


In [None]:
# checking for null values
reviews.isnull().sum()

Translated_Review         18651
Sentiment                 18648
Sentiment_Polarity        18648
Sentiment_Subjectivity    18648
dtype: int64
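The counts above differ slightly between columns: `Translated_Review` is missing in 18651 rows and the sentiment columns in 18648, so a few rows are only partly empty, and `dropna(axis=0)` removes those too. A sketch, on hypothetical toy data, for listing such partial rows before deciding whether to drop them:

```python
import numpy as np
import pandas as pd

# Hypothetical toy reviews frame
toy = pd.DataFrame(
    {
        "Translated_Review": ["Great app", np.nan, np.nan, "Too many ads"],
        "Sentiment": ["Positive", np.nan, "Neutral", np.nan],
    }
)

# Rows with at least one missing value that are not entirely empty
partial = toy[toy.isnull().any(axis=1) & toy.notnull().any(axis=1)]
print(partial)
```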

In [None]:
# drop rows with NaN values
reviews.dropna(axis=0,inplace=True)

In [None]:
reviews.info()

<class 'pandas.core.frame.DataFrame'>
Index: 25608 entries, 10 Best Foods for You to Fantasy Football Manager (FPL)
Data columns (total 4 columns):
 #   Column                  Non-Null Count  Dtype  
---  ------                  --------------  -----  
 0   Translated_Review       25608 non-null  object 
 1   Sentiment               25608 non-null  object 
 2   Sentiment_Polarity      25608 non-null  float64
 3   Sentiment_Subjectivity  25608 non-null  float64
dtypes: float64(2), object(2)
memory usage: 1000.3+ KB


In [None]:
# getting all columns
reviews.columns

Index(['Translated_Review', 'Sentiment', 'Sentiment_Polarity',
       'Sentiment_Subjectivity'],
      dtype='object')

In [None]:
# checking for null values again
reviews.isnull().sum()

Translated_Review         0
Sentiment                 0
Sentiment_Polarity        0
Sentiment_Subjectivity    0
dtype: int64

In [None]:
# checking for unique values
print(reviews['Sentiment'].unique())

['Positive' 'Neutral' 'Negative']
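With only three sentiment classes, a natural follow-up check is how the reviews are distributed across them. A sketch using `value_counts` on a hypothetical five-review series; on the real data the call would be `reviews['Sentiment'].value_counts()`.

```python
import pandas as pd

# Hypothetical sentiment labels
sentiment = pd.Series(["Positive", "Positive", "Neutral", "Negative", "Positive"])

counts = sentiment.value_counts()                      # absolute numbers per class
shares = sentiment.value_counts(normalize=True) * 100  # percentage per class

print(counts)
print(shares)
```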


In [None]:
# checking for unique values 
print(reviews['Sentiment_Polarity'].unique())

[ 1.          0.25        0.4        ... -0.25520833  0.25965608
  0.11714286]


In [None]:
# checking for unique values
print(reviews['Sentiment_Subjectivity'].unique())

[0.53333333 0.28846154 0.875      ... 0.29380952 0.54280303 0.67063492]
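Both sentiment columns are continuous scores. Assuming they follow the common TextBlob-style scales (polarity from -1 to 1, subjectivity from 0 to 1; this notebook does not state the scales), a quick range check can flag malformed values that survived cleaning. A sketch on hypothetical values:

```python
import pandas as pd

# Hypothetical scores in the same layout as `reviews`
toy = pd.DataFrame(
    {
        "Sentiment_Polarity": [1.0, 0.25, -0.26],
        "Sentiment_Subjectivity": [0.53, 0.29, 0.88],
    }
)

# Assumed valid ranges; between() is inclusive on both ends by default
polarity_ok = toy["Sentiment_Polarity"].between(-1, 1).all()
subjectivity_ok = toy["Sentiment_Subjectivity"].between(0, 1).all()

# Summary statistics show the observed min/max at a glance
print(toy.describe().loc[["min", "max"]])
```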


In [None]:
# getting dimensions of dataset after performing data cleaning
reviews.shape

(25608, 4)

In [None]:
# exporting the cleaned data
reviews.to_csv("cleaned_reviews_data.csv")