# Google Play Store Apps Analysis


## Introduction

>Google Play Store is a digital distribution service developed and operated by Google. It is an official app store that provides a variety of content such as apps, books, magazines, music, movies and television programs. It serves as a platform that allows users with 'Google certified' Android devices to download applications published on the platform, either for a fee or free of charge. With the rapid growth of Android devices and apps, analysing this data can yield valuable insights into the app market.

>The dataset used here is 'Google Play Store Apps' from Kaggle. It contains about 10k web-scraped Play Store app records for analysing the Android market.

>You can get the dataset from https://www.kaggle.com/datasets/lava18/google-play-store-apps

### Description of App Dataset columns
 1- App : The name of the app

 2- Category : The category of the app

 3- Rating : The rating of the app in the Play Store

 4- Reviews : The number of reviews of the app

 5- Size : The size of the app

 6- Installs : The number of installs of the app

 7- Type : The type of the app (Free/Paid)

 8- Price : The price of the app (0 if it is free)

 9- Content Rating : The appropriate target audience of the app

 10- Genres: The genre of the app

 11- Last Updated : The date when the app was last updated

 12- Current Ver : The current version of the app

 13- Android Ver : The minimum Android version required to run the app

## Data Preparation & Exploration  


```python
# Import the necessary libraries
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns
%matplotlib inline
```

```python
# Upgrade pandas to use the DataFrame.explode() function
!pip install --upgrade pandas
```

```python
# Load the Apps dataset
df = pd.read_csv('GooglePlayData/apps.csv')
```

```python
# Look at 5 randomly sampled records of the apps DataFrame
df.sample(5)
```

```python
# Number of rows and columns
print(df.shape)
```

```python
# Unique values of the App column
df['App'].unique()
```

```python
# Unique values of the Type column (Free/Paid)
df['Type'].unique()
```

```python
# Unique values of the Category column
df['Category'].unique()
```

```python
# Unique values of the Content Rating column
df['Content Rating'].unique()
```

```python
# Column data types and non-null counts
df.info()
```

```python
# Check for fully duplicated rows (computed once, then reported)
n_duplicates = df.duplicated().sum()
print(f"DataFrame has {n_duplicates} duplicate values")
```

```
DataFrame has 0 duplicate values
```
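
`df.duplicated()` only flags rows that are identical in every column. In scraped data the same app may appear more than once with slightly different values (for example a different review count), so a check on the `App` column alone may be worth running as well. A minimal sketch on a small hypothetical DataFrame; the "keep the entry with the most reviews" policy is an assumption for illustration, not a step of this notebook:

```python
import pandas as pd

# Hypothetical toy data: the same app scraped twice with a different review count
toy = pd.DataFrame({
    'App': ['Photo Editor', 'Photo Editor', 'Chess'],
    'Reviews': [159, 167, 20],
})

full_row_dupes = toy.duplicated().sum()            # rows identical in every column
name_dupes = toy.duplicated(subset=['App']).sum()  # rows whose App name was already seen

print(f"Full-row duplicates: {full_row_dupes}")
print(f"Duplicate app names: {name_dupes}")

# Assumed policy: keep the entry with the most reviews for each app name
deduped = (toy.sort_values('Reviews', ascending=False)
              .drop_duplicates(subset=['App'], keep='first'))
print(deduped)
```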


```python
# Check for null values in each column
df.isnull().sum()
```
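
`isnull().sum()` gives absolute counts. Taking the mean of the boolean mask instead gives the share of missing values per column, which can help decide whether to fill or drop a column. A sketch on hypothetical data:

```python
import numpy as np
import pandas as pd

# Hypothetical toy data with some missing values
toy = pd.DataFrame({
    'Rating': [4.1, np.nan, 3.9, np.nan],
    'Size': [19.0, 14.0, np.nan, 2.8],
    'App': ['A', 'B', 'C', 'D'],
})

# Mean of the boolean null mask = fraction of missing values; scale to percent
missing_pct = toy.isnull().mean().mul(100).sort_values(ascending=False)
print(missing_pct)
```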

```python
# Explore the descriptive statistics of the Apps DataFrame
df.describe()
```

## Data Cleaning  

>In this section I will clean the data by:

 1- Handling null values in (Rating, Size, Current Ver, Android Ver)

 2- Editing incorrect values in (Installs, Price)

 3- Changing the data type of columns (Installs, Price, Last Updated)

 4- Renaming columns

### 1- Handle null values

```python
# Fill missing Rating and Size values with the column mean
def replace_with_mean(series):
    """
    Given a series, replace the null values
    with the mean of the non-null values
    """
    return series.fillna(series.mean())

df['Rating'] = replace_with_mean(df['Rating'])
df['Size'] = replace_with_mean(df['Size'])
```
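
Filling with the mean leaves the column mean unchanged but shrinks its spread, because every filled value sits exactly at the mean. A small check of that effect on a hypothetical Series, reusing the same helper:

```python
import numpy as np
import pandas as pd

def replace_with_mean(series):
    """Return a copy of `series` with nulls replaced by the mean of the non-null values."""
    return series.fillna(series.mean())

# Hypothetical ratings with one missing value
ratings = pd.Series([4.0, np.nan, 3.0, 5.0])
filled = replace_with_mean(ratings)

print(filled.tolist())                        # [4.0, 4.0, 3.0, 5.0]
print(ratings.mean(), filled.mean())          # mean is unchanged: 4.0 4.0
print(ratings.std(), round(filled.std(), 4))  # spread shrinks: 1.0 0.8165
```

If the rating distribution is skewed, the median (`series.median()`) is a common alternative fill value.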

```python
# Fill missing Current Ver and Android Ver values with zero.
# These are string (object) columns, so the string '0' is used to avoid mixing ints and strings
def replace_with_zero(series):
    return series.fillna('0')

df['Current Ver'] = replace_with_zero(df['Current Ver'])
df['Android Ver'] = replace_with_zero(df['Android Ver'])
```

```python
df.isnull().sum()
```

```
Unnamed: 0        0
App               0
Category          0
Rating            0
Reviews           0
Size              0
Installs          0
Type              0
Price             0
Content Rating    0
Genres            0
Last Updated      0
Current Ver       0
Android Ver       0
dtype: int64
```

### 2- Editing incorrect values

```python
# Remove the '+', ',' and '$' characters from the Installs and Price columns

# Characters to remove
chars_to_remove = ['+', ',', '$']
# Columns to edit
cols_to_edit = ['Installs', 'Price']

for col in cols_to_edit:
    for char in chars_to_remove:
        # regex=False treats each character literally ('+' and '$' are regex metacharacters)
        df[col] = df[col].str.replace(char, '', regex=False)
```
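
The nested loop makes one pass per character. An alternative is a single `str.replace` call with a regular-expression character class: inside `[...]`, `+` and `$` lose their special meaning, so `r'[+,$]'` matches any of the three characters literally. A sketch on hypothetical raw values:

```python
import pandas as pd

# Hypothetical raw values in the scraped format
raw = pd.DataFrame({
    'Installs': ['10,000+', '1,000,000+', '0'],
    'Price': ['0', '$4.99', '$399.99'],
})

cleaned = raw.copy()
for col in ['Installs', 'Price']:
    # One regex pass removes every '+', ',' and '$' character
    cleaned[col] = cleaned[col].str.replace(r'[+,$]', '', regex=True)

print(cleaned)
```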

```python
df['Installs'].unique()
```

```
array(['10000', '500000', '5000000', '50000000', '100000', '50000',
       '1000000', '10000000', '5000', '100000000', '1000000000', '1000',
       '500000000', '50', '100', '500', '10', '1', '5', '0'], dtype=object)
```

```python
df['Price'].unique()
```

```
array(['0', '4.99', '3.99', '6.99', '1.49', '2.99', '7.99', '5.99',
       '3.49', '1.99', '9.99', '7.49', '0.99', '9.00', '5.49', '10.00',
       '24.99', '11.99', '79.99', '16.99', '14.99', '1.00', '29.99',
       '12.99', '2.49', '10.99', '1.50', '19.99', '15.99', '33.99',
       '74.99', '39.99', '3.95', '4.49', '1.70', '8.99', '2.00', '3.88',
       '25.99', '399.99', '17.99', '400.00', '3.02', '1.76', '4.84',
       '4.77', '1.61', '2.50', '1.59', '6.49', '1.29', '5.00', '13.99',
       '299.99', '379.99', '37.99', '18.99', '389.99', '19.90', '8.49',
       '1.75', '14.00', '4.85', '46.99', '109.99', '154.99', '3.08',
       '2.59', '4.80', '1.96', '19.40', '3.90', '4.59', '15.46', '3.04',
       '4.29', '2.60', '3.28', '4.60', '28.99', '2.95', '2.90', '1.97',
       '200.00', '89.99', '2.56', '30.99', '3.61', '394.99', '1.26',
       '1.20', '1.04'], dtype=object)
```

### 3- Changing the data types

```python
# Convert Price and Installs to float
for col in ['Price', 'Installs']:
    df[col] = df[col].astype(float)

# Convert Last Updated to datetime
df['Last Updated'] = pd.to_datetime(df['Last Updated'])
```
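
`astype(float)` raises a `ValueError` if any value still contains a non-numeric character. `pd.to_numeric(..., errors='coerce')` turns such values into `NaN` instead, so the offending rows can be inspected afterwards, and passing an explicit `format` to `pd.to_datetime` avoids relying on format inference. A sketch on hypothetical data; the format string assumes dates written like `January 7, 2018` and should be checked against the real `Last Updated` values:

```python
import pandas as pd

# Hypothetical values: one Price entry was not cleaned correctly
raw = pd.DataFrame({
    'Price': ['0', '4.99', 'unknown'],
    'Last Updated': ['January 7, 2018', 'August 1, 2018', 'June 20, 2018'],
})

# Values that cannot be parsed become NaN instead of raising an error
raw['Price'] = pd.to_numeric(raw['Price'], errors='coerce')
bad_rows = raw[raw['Price'].isnull()]
print(f"{len(bad_rows)} row(s) could not be parsed as numbers")

# Assumed date format: full month name, day, comma, four-digit year
raw['Last Updated'] = pd.to_datetime(raw['Last Updated'], format='%B %d, %Y')
print(raw.dtypes)
```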

```python
df.dtypes
```

```
Unnamed: 0                 int64
App                       object
Category                  object
Rating                   float64
Reviews                    int64
Size                     float64
Installs                 float64
Type                      object
Price                    float64
Content Rating            object
Genres                    object
Last Updated      datetime64[ns]
Current Ver               object
Android Ver               object
dtype: object
```

### 4- Rename columns

```python
# Rename columns, replacing spaces with underscores
df.rename(columns={'Content Rating': 'Content_Rating',
                   'Last Updated': 'Last_Updated',
                   'Current Ver': 'Current_Ver',
                   'Android Ver': 'Android_Ver'},
          inplace=True)

# Display the new column names
df.head(0)
```

```
Unnamed: 0.1,Unnamed: 0,App,Category,Rating,Reviews,Size,Installs,Type,Price,Content_Rating,Genres,Last_Updated,Current_Ver,Android_Ver
```
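
The rename mapping lists each column by hand. Another approach, assuming every space in a column name should become an underscore, rewrites all labels at once with the `.str` accessor on the column Index. Note that on the full DataFrame this would also turn `Unnamed: 0` into `Unnamed:_0`, so it is not an exact drop-in for the explicit mapping. A sketch on a hypothetical empty frame:

```python
import pandas as pd

# Hypothetical empty frame carrying the original column names that contain spaces
toy = pd.DataFrame(columns=['App', 'Content Rating', 'Last Updated', 'Current Ver', 'Android Ver'])

# Replace every space in the column labels with an underscore
toy.columns = toy.columns.str.replace(' ', '_', regex=False)
print(list(toy.columns))
```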


## Saving Data to CSV 

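```python
# Save the cleaned data; index=False keeps pandas from writing the row index as an extra column
df.to_csv('GooglePlayData/apps_Updated.csv', index=False)
```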
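
By default `to_csv` also writes the DataFrame index as an unnamed first column, which comes back as an extra `Unnamed: 0` column when the file is read again; the `Unnamed: 0.1` column in the header output of the rename step is likely the result of reading back a file saved this way. A self-contained check using an in-memory buffer instead of a file on disk:

```python
import io
import pandas as pd

toy = pd.DataFrame({'App': ['Chess', 'Photo Editor'], 'Rating': [4.5, 4.1]})

# Default: the index is written as an extra, unnamed first column
with_index = io.StringIO()
toy.to_csv(with_index)
with_index.seek(0)
reloaded_default = pd.read_csv(with_index)

# index=False: only the real columns are written
without_index = io.StringIO()
toy.to_csv(without_index, index=False)
without_index.seek(0)
reloaded_clean = pd.read_csv(without_index)

print(list(reloaded_default.columns))  # ['Unnamed: 0', 'App', 'Rating']
print(list(reloaded_clean.columns))    # ['App', 'Rating']
```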