# 📊 Exploratory Data Analysis on Google Play Store Dataset

- This notebook performs exploratory data analysis (EDA) on a dataset of over 10,000 Android apps listed on the Google Play Store.
- The goal is to understand app trends, clean the data, and uncover patterns in ratings, installs, and other features.

In [223]:
import pandas as pd
import numpy as np

In [224]:
# load the dataset
data = pd.read_csv('google_playstore_dataset_raw.csv') 

In [225]:
data.head(10)

Unnamed: 0,App,Category,Rating,Reviews,Size,Installs,Type,Price,Content Rating,Genres,Last Updated,Current Ver,Android Ver
0,Photo Editor & Candy Camera & Grid & ScrapBook,ART_AND_DESIGN,4.1,159,19M,"10,000+",Free,0,Everyone,Art & Design,"January 7, 2018",1.0.0,4.0.3 and up
1,Coloring book moana,ART_AND_DESIGN,3.9,967,14M,"500,000+",Free,0,Everyone,Art & Design;Pretend Play,"January 15, 2018",2.0.0,4.0.3 and up
2,"U Launcher Lite – FREE Live Cool Themes, Hide ...",ART_AND_DESIGN,4.7,87510,8.7M,"5,000,000+",Free,0,Everyone,Art & Design,"August 1, 2018",1.2.4,4.0.3 and up
3,Sketch - Draw & Paint,ART_AND_DESIGN,4.5,215644,25M,"50,000,000+",Free,0,Teen,Art & Design,"June 8, 2018",Varies with device,4.2 and up
4,Pixel Draw - Number Art Coloring Book,ART_AND_DESIGN,4.3,967,2.8M,"100,000+",Free,0,Everyone,Art & Design;Creativity,"June 20, 2018",1.1,4.4 and up
5,Paper flowers instructions,ART_AND_DESIGN,4.4,167,5.6M,"50,000+",Free,0,Everyone,Art & Design,"March 26, 2017",1.0,2.3 and up
6,Smoke Effect Photo Maker - Smoke Editor,ART_AND_DESIGN,3.8,178,19M,"50,000+",Free,0,Everyone,Art & Design,"April 26, 2018",1.1,4.0.3 and up
7,Infinite Painter,ART_AND_DESIGN,4.1,36815,29M,"1,000,000+",Free,0,Everyone,Art & Design,"June 14, 2018",6.1.61.1,4.2 and up
8,Garden Coloring Book,ART_AND_DESIGN,4.4,13791,33M,"1,000,000+",Free,0,Everyone,Art & Design,"September 20, 2017",2.9.2,3.0 and up
9,Kids Paint Free - Drawing Fun,ART_AND_DESIGN,4.7,121,3.1M,"10,000+",Free,0,Everyone,Art & Design;Creativity,"July 3, 2018",2.8,4.0.3 and up


In [226]:
data['App']

0           Photo Editor & Candy Camera & Grid & ScrapBook
1                                      Coloring book moana
2        U Launcher Lite – FREE Live Cool Themes, Hide ...
3                                    Sketch - Draw & Paint
4                    Pixel Draw - Number Art Coloring Book
                               ...                        
10836                                     Sya9a Maroc - FR
10837                     Fr. Mike Schmitz Audio Teachings
10838                               Parkinson Exercices FR
10839                        The SCP Foundation DB fr nn5n
10840        iHoroscope - 2018 Daily Horoscope & Astrology
Name: App, Length: 10841, dtype: object

In [227]:
data.isnull()

Unnamed: 0,App,Category,Rating,Reviews,Size,Installs,Type,Price,Content Rating,Genres,Last Updated,Current Ver,Android Ver
0,False,False,False,False,False,False,False,False,False,False,False,False,False
1,False,False,False,False,False,False,False,False,False,False,False,False,False
2,False,False,False,False,False,False,False,False,False,False,False,False,False
3,False,False,False,False,False,False,False,False,False,False,False,False,False
4,False,False,False,False,False,False,False,False,False,False,False,False,False
...,...,...,...,...,...,...,...,...,...,...,...,...,...
10836,False,False,False,False,False,False,False,False,False,False,False,False,False
10837,False,False,False,False,False,False,False,False,False,False,False,False,False
10838,False,False,True,False,False,False,False,False,False,False,False,False,False
10839,False,False,False,False,False,False,False,False,False,False,False,False,False


In [228]:
data.isnull().sum()

App                  0
Category             0
Rating            1474
Reviews              0
Size                 0
Installs             0
Type                 1
Price                0
Content Rating       1
Genres               0
Last Updated         0
Current Ver          8
Android Ver          3
dtype: int64

In [229]:
df = data.dropna()

In [230]:
df.isnull().sum()

App               0
Category          0
Rating            0
Reviews           0
Size              0
Installs          0
Type              0
Price             0
Content Rating    0
Genres            0
Last Updated      0
Current Ver       0
Android Ver       0
dtype: int64
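
The null-handling cells above can be sketched on a small made-up frame. The column values here are illustrative assumptions, not rows from the dataset. The sketch shows how `isnull().sum()` reports gaps per column and how many rows `dropna()` discards.

```python
import numpy as np
import pandas as pd

# Toy frame standing in for the Play Store data (values are hypothetical).
toy = pd.DataFrame({
    "App": ["A", "B", "C", "D"],
    "Rating": [4.1, np.nan, 3.9, 4.7],
    "Type": ["Free", "Paid", None, "Free"],
})

# Count missing values in each column.
missing_per_column = toy.isnull().sum()

# Drop every row that has at least one missing value.
complete = toy.dropna()
rows_dropped = len(toy) - len(complete)

print(missing_per_column)
print("rows dropped:", rows_dropped)
```

Comparing the row count before and after `dropna()` is a quick way to see how much data the cleaning step costs.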

# Removing Null Values – Numeric

## Step 1: Handling Missing Values in Numeric Columns
Dropping every row that contains a null value so that the statistics on numeric fields below are computed on complete records.

In [231]:
data = pd.read_csv('google_playstore_dataset_raw.csv').dropna()
data

Unnamed: 0,App,Category,Rating,Reviews,Size,Installs,Type,Price,Content Rating,Genres,Last Updated,Current Ver,Android Ver
0,Photo Editor & Candy Camera & Grid & ScrapBook,ART_AND_DESIGN,4.1,159,19M,"10,000+",Free,0,Everyone,Art & Design,"January 7, 2018",1.0.0,4.0.3 and up
1,Coloring book moana,ART_AND_DESIGN,3.9,967,14M,"500,000+",Free,0,Everyone,Art & Design;Pretend Play,"January 15, 2018",2.0.0,4.0.3 and up
2,"U Launcher Lite – FREE Live Cool Themes, Hide ...",ART_AND_DESIGN,4.7,87510,8.7M,"5,000,000+",Free,0,Everyone,Art & Design,"August 1, 2018",1.2.4,4.0.3 and up
3,Sketch - Draw & Paint,ART_AND_DESIGN,4.5,215644,25M,"50,000,000+",Free,0,Teen,Art & Design,"June 8, 2018",Varies with device,4.2 and up
4,Pixel Draw - Number Art Coloring Book,ART_AND_DESIGN,4.3,967,2.8M,"100,000+",Free,0,Everyone,Art & Design;Creativity,"June 20, 2018",1.1,4.4 and up
...,...,...,...,...,...,...,...,...,...,...,...,...,...
10834,FR Calculator,FAMILY,4.0,7,2.6M,500+,Free,0,Everyone,Education,"June 18, 2017",1.0.0,4.1 and up
10836,Sya9a Maroc - FR,FAMILY,4.5,38,53M,"5,000+",Free,0,Everyone,Education,"July 25, 2017",1.48,4.1 and up
10837,Fr. Mike Schmitz Audio Teachings,FAMILY,5.0,4,3.6M,100+,Free,0,Everyone,Education,"July 6, 2018",1.0,4.1 and up
10839,The SCP Foundation DB fr nn5n,BOOKS_AND_REFERENCE,4.5,114,Varies with device,"1,000+",Free,0,Mature 17+,Books & Reference,"January 19, 2015",Varies with device,Varies with device


## Step 2: Calculating the Average App Rating
Computed the overall average user rating across all apps in the dataset.

In [232]:
data['Rating']

0        4.1
1        3.9
2        4.7
3        4.5
4        4.3
        ... 
10834    4.0
10836    4.5
10837    5.0
10839    4.5
10840    4.5
Name: Rating, Length: 9360, dtype: float64

In [233]:
# Divide the untruncated sum by the actual row count instead of a hard-coded 9360
round(sum(data['Rating'])/len(data['Rating']),2)

4.19

In [234]:
s = 0
for i in data['Rating']:
    s += i
s = int(s)
print(s)

39235


In [235]:
int(sum(data['Rating']))

39235

In [236]:
len(data['Rating'])

9360

In [237]:
round(sum(data['Rating'])/len(data['Rating']),2)

4.19

In [238]:
print("Avg rating of these apps : ", round(sum(data['Rating'])/len(data['Rating']),2))

Avg rating of these apps :  4.19
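
As a minimal sketch, the same average can be obtained with `Series.mean()`. The ratings below are hypothetical values chosen to show that truncating the sum with `int()` before dividing can shift the result.

```python
import pandas as pd

# Hypothetical ratings, not taken from the dataset.
ratings = pd.Series([4.1, 3.9, 4.7, 4.5, 4.3], name="Rating")

# Mean computed directly on the float values.
avg_rating = round(float(ratings.mean()), 2)

# Same calculation, but with the sum truncated to an integer first.
truncated_avg = round(int(ratings.sum()) / len(ratings), 2)

print("mean():", avg_rating)
print("int(sum)/len:", truncated_avg)
```

On the full dataset the two approaches happen to round to the same value, but on small samples the difference becomes visible.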


## Step 3: Counting Apps with Perfect Ratings
Determined the total number of apps that received a perfect rating of 5.0 from users.

In [239]:
data['Rating']

0        4.1
1        3.9
2        4.7
3        4.5
4        4.3
        ... 
10834    4.0
10836    4.5
10837    5.0
10839    4.5
10840    4.5
Name: Rating, Length: 9360, dtype: float64

In [240]:
c = 0
for i in data['Rating']:
    if i == 5.0:
        c += 1
print("There are", c, "applications with a rating of 5.0")

There are 274 applications with a rating of 5.0
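
The counting loop above can also be written as a boolean mask. This is a sketch on hypothetical ratings: comparing the Series to `5.0` yields `True`/`False` per row, and summing the mask counts the `True` entries.

```python
import pandas as pd

# Hypothetical ratings for illustration only.
ratings = pd.Series([5.0, 4.5, 5.0, 3.8, 4.0])

# True where the rating is exactly 5.0; sum() counts the True values.
perfect_count = int((ratings == 5.0).sum())

print("apps rated 5.0:", perfect_count)
```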


## Step 4: Analyzing Rating Distributions Between 4.0–4.5 and 4.0–5.0
Filtered and counted apps with ratings within specified ranges to assess popularity and trustworthiness.

In [241]:
c = 0
for i in data['Rating']:
    if(i>=4.0 and i<=5.0):
        c +=1
print(c)


7363


In [242]:
c = 0
for i in data['Rating']:
    if(i>=4.0 and i<=4.5):
        c +=1
print(c)


5446
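
A range filter like the two loops above can be sketched with `Series.between()`, which includes both endpoints by default. The ratings are hypothetical.

```python
import pandas as pd

# Hypothetical ratings for illustration only.
ratings = pd.Series([3.9, 4.0, 4.3, 4.5, 4.6, 5.0])

# between() is inclusive on both ends, matching i >= low and i <= high.
count_4_to_5 = int(ratings.between(4.0, 5.0).sum())
count_4_to_4_5 = int(ratings.between(4.0, 4.5).sum())

print(count_4_to_5, count_4_to_4_5)
```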


## Step 5: Calculating the Average Number of User Reviews
Computed the average number of user-submitted reviews per app.

In [243]:
s = 0
for i in data['Reviews']:
    s += int(i)
print(int(s/len(data['Reviews'])))

514376
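
The loop above calls `int()` on each value, which suggests the `Reviews` column is stored as text. A hedged sketch of a vectorized alternative is `pd.to_numeric` with `errors='coerce'`, which turns unparseable entries into `NaN` instead of raising. The sample values, including the malformed one, are assumptions for illustration.

```python
import pandas as pd

# Hypothetical review counts stored as strings; the last entry is malformed.
reviews = pd.Series(["159", "967", "87510", "3.0M"])

# Convert to numbers; values that cannot be parsed become NaN.
numeric_reviews = pd.to_numeric(reviews, errors="coerce")

# mean() skips NaN by default.
avg_reviews = int(numeric_reviews.mean())

print("unparseable entries:", int(numeric_reviews.isna().sum()))
print("average reviews:", avg_reviews)
```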


# Removing Null Values – Categorical

In [244]:
df = pd.read_csv('google_playstore_dataset_raw.csv').dropna()

In [245]:
df.head(5)

Unnamed: 0,App,Category,Rating,Reviews,Size,Installs,Type,Price,Content Rating,Genres,Last Updated,Current Ver,Android Ver
0,Photo Editor & Candy Camera & Grid & ScrapBook,ART_AND_DESIGN,4.1,159,19M,"10,000+",Free,0,Everyone,Art & Design,"January 7, 2018",1.0.0,4.0.3 and up
1,Coloring book moana,ART_AND_DESIGN,3.9,967,14M,"500,000+",Free,0,Everyone,Art & Design;Pretend Play,"January 15, 2018",2.0.0,4.0.3 and up
2,"U Launcher Lite – FREE Live Cool Themes, Hide ...",ART_AND_DESIGN,4.7,87510,8.7M,"5,000,000+",Free,0,Everyone,Art & Design,"August 1, 2018",1.2.4,4.0.3 and up
3,Sketch - Draw & Paint,ART_AND_DESIGN,4.5,215644,25M,"50,000,000+",Free,0,Teen,Art & Design,"June 8, 2018",Varies with device,4.2 and up
4,Pixel Draw - Number Art Coloring Book,ART_AND_DESIGN,4.3,967,2.8M,"100,000+",Free,0,Everyone,Art & Design;Creativity,"June 20, 2018",1.1,4.4 and up


# Q1: How many unique app categories are present in the dataset?

In [246]:
df['Category'].unique()

array(['ART_AND_DESIGN', 'AUTO_AND_VEHICLES', 'BEAUTY',
       'BOOKS_AND_REFERENCE', 'BUSINESS', 'COMICS', 'COMMUNICATION',
       'DATING', 'EDUCATION', 'ENTERTAINMENT', 'EVENTS', 'FINANCE',
       'FOOD_AND_DRINK', 'HEALTH_AND_FITNESS', 'HOUSE_AND_HOME',
       'LIBRARIES_AND_DEMO', 'LIFESTYLE', 'GAME', 'FAMILY', 'MEDICAL',
       'SOCIAL', 'SHOPPING', 'PHOTOGRAPHY', 'SPORTS', 'TRAVEL_AND_LOCAL',
       'TOOLS', 'PERSONALIZATION', 'PRODUCTIVITY', 'PARENTING', 'WEATHER',
       'VIDEO_PLAYERS', 'NEWS_AND_MAGAZINES', 'MAPS_AND_NAVIGATION'],
      dtype=object)

In [247]:
# for i in df['Category'].unique():
#     print(i)

In [248]:
len(df['Category'].unique())

33
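
As a small sketch on made-up labels, `Series.nunique()` returns the same number as `len(...unique())` without building the intermediate array.

```python
import pandas as pd

# Hypothetical category labels for illustration only.
categories = pd.Series(["GAME", "FAMILY", "GAME", "TOOLS"])

# Number of distinct values in the column.
n_categories = categories.nunique()

print(n_categories)
```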

# Q2: How many applications belong to the "ART_AND_DESIGN" category?

In [249]:
c = 0
for i in df['Category']:
    if(i == 'ART_AND_DESIGN'):
        c +=1
print(c)

61


# Q3: What types of apps are available on the Play Store?

In [250]:
data = df

In [251]:
data['Type'].unique()

array(['Free', 'Paid'], dtype=object)

# Q4: What is the distribution of Free and Paid applications?

In [252]:
f = 0
for i in data['Type']:
    if i == 'Free':
        f += 1

p = 0
for i in data['Type']:
    if i == 'Paid':
        p += 1
print("there are", f, "free and", p, "paid applications")

there are 8715 free and 645 paid applications


# Q5: What percentage of apps in the dataset are free?

In [253]:
print(int(f/(f + p)*100),"% applications are free")

93 % applications are free
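
The Free/Paid split and the percentage can be sketched in one step with `value_counts(normalize=True)`, which returns shares instead of raw counts. The labels below are hypothetical.

```python
import pandas as pd

# Hypothetical app types for illustration only.
types = pd.Series(["Free", "Free", "Free", "Paid"])

# Raw counts per type.
type_counts = types.value_counts()

# Shares per type, scaled to percentages.
type_pct = types.value_counts(normalize=True) * 100
free_pct = round(float(type_pct["Free"]), 1)

print(type_counts)
print("free share (%):", free_pct)
```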


# Q6: What are the different content rating classifications in the dataset?

In [254]:
data['Content Rating'].unique()

array(['Everyone', 'Teen', 'Everyone 10+', 'Mature 17+',
       'Adults only 18+', 'Unrated'], dtype=object)

In [255]:
for i in data['Content Rating'].unique():
    print(i)

Everyone
Teen
Everyone 10+
Mature 17+
Adults only 18+
Unrated


# Exploring Categories Automatically

Instead of manually checking the number of apps in each category by filtering them one at a time (e.g., "ART_AND_DESIGN", "GAME"), I used a more scalable approach that summarizes all categories at once: looping over every unique value in the Category column and counting its rows. The pandas value_counts() method returns the same counts in a single call.

In [256]:
df

Unnamed: 0,App,Category,Rating,Reviews,Size,Installs,Type,Price,Content Rating,Genres,Last Updated,Current Ver,Android Ver
0,Photo Editor & Candy Camera & Grid & ScrapBook,ART_AND_DESIGN,4.1,159,19M,"10,000+",Free,0,Everyone,Art & Design,"January 7, 2018",1.0.0,4.0.3 and up
1,Coloring book moana,ART_AND_DESIGN,3.9,967,14M,"500,000+",Free,0,Everyone,Art & Design;Pretend Play,"January 15, 2018",2.0.0,4.0.3 and up
2,"U Launcher Lite – FREE Live Cool Themes, Hide ...",ART_AND_DESIGN,4.7,87510,8.7M,"5,000,000+",Free,0,Everyone,Art & Design,"August 1, 2018",1.2.4,4.0.3 and up
3,Sketch - Draw & Paint,ART_AND_DESIGN,4.5,215644,25M,"50,000,000+",Free,0,Teen,Art & Design,"June 8, 2018",Varies with device,4.2 and up
4,Pixel Draw - Number Art Coloring Book,ART_AND_DESIGN,4.3,967,2.8M,"100,000+",Free,0,Everyone,Art & Design;Creativity,"June 20, 2018",1.1,4.4 and up
...,...,...,...,...,...,...,...,...,...,...,...,...,...
10834,FR Calculator,FAMILY,4.0,7,2.6M,500+,Free,0,Everyone,Education,"June 18, 2017",1.0.0,4.1 and up
10836,Sya9a Maroc - FR,FAMILY,4.5,38,53M,"5,000+",Free,0,Everyone,Education,"July 25, 2017",1.48,4.1 and up
10837,Fr. Mike Schmitz Audio Teachings,FAMILY,5.0,4,3.6M,100+,Free,0,Everyone,Education,"July 6, 2018",1.0,4.1 and up
10839,The SCP Foundation DB fr nn5n,BOOKS_AND_REFERENCE,4.5,114,Varies with device,"1,000+",Free,0,Mature 17+,Books & Reference,"January 19, 2015",Varies with device,Varies with device


# Q1: What is the total number of apps in each category?

In [257]:
for name in df['Category'].unique():
    ct = 0
    for i in df['Category']:
        if(i == name):
            ct +=1
    print(name, ':' ,ct)

ART_AND_DESIGN : 61
AUTO_AND_VEHICLES : 73
BEAUTY : 42
BOOKS_AND_REFERENCE : 178
BUSINESS : 303
COMICS : 58
COMMUNICATION : 328
DATING : 195
EDUCATION : 155
ENTERTAINMENT : 149
EVENTS : 45
FINANCE : 323
FOOD_AND_DRINK : 109
HEALTH_AND_FITNESS : 297
HOUSE_AND_HOME : 76
LIBRARIES_AND_DEMO : 64
LIFESTYLE : 314
GAME : 1097
FAMILY : 1746
MEDICAL : 350
SOCIAL : 259
SHOPPING : 238
PHOTOGRAPHY : 317
SPORTS : 319
TRAVEL_AND_LOCAL : 226
TOOLS : 733
PERSONALIZATION : 312
PRODUCTIVITY : 351
PARENTING : 50
WEATHER : 75
VIDEO_PLAYERS : 160
NEWS_AND_MAGAZINES : 233
MAPS_AND_NAVIGATION : 124


In [258]:
# In Dictionary
categories = {}

for name in df['Category'].unique():
    ct = 0
    for i in df['Category']:
        if(i == name):
            ct +=1
    categories[name] = ct

In [259]:
categories

{'ART_AND_DESIGN': 61,
 'AUTO_AND_VEHICLES': 73,
 'BEAUTY': 42,
 'BOOKS_AND_REFERENCE': 178,
 'BUSINESS': 303,
 'COMICS': 58,
 'COMMUNICATION': 328,
 'DATING': 195,
 'EDUCATION': 155,
 'ENTERTAINMENT': 149,
 'EVENTS': 45,
 'FINANCE': 323,
 'FOOD_AND_DRINK': 109,
 'HEALTH_AND_FITNESS': 297,
 'HOUSE_AND_HOME': 76,
 'LIBRARIES_AND_DEMO': 64,
 'LIFESTYLE': 314,
 'GAME': 1097,
 'FAMILY': 1746,
 'MEDICAL': 350,
 'SOCIAL': 259,
 'SHOPPING': 238,
 'PHOTOGRAPHY': 317,
 'SPORTS': 319,
 'TRAVEL_AND_LOCAL': 226,
 'TOOLS': 733,
 'PERSONALIZATION': 312,
 'PRODUCTIVITY': 351,
 'PARENTING': 50,
 'WEATHER': 75,
 'VIDEO_PLAYERS': 160,
 'NEWS_AND_MAGAZINES': 233,
 'MAPS_AND_NAVIGATION': 124}
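
The nested loops above scan the column once per category. A minimal sketch of the same dictionary, built with `value_counts()` on hypothetical labels, is shown here. Note that `value_counts()` orders the result by frequency rather than by first appearance.

```python
import pandas as pd

# Hypothetical category labels for illustration only.
categories = pd.Series(["GAME", "FAMILY", "GAME", "TOOLS", "FAMILY", "GAME"])

# Count rows per category in a single pass and convert to a plain dict.
category_counts = categories.value_counts().to_dict()

print(category_counts)
print(category_counts["GAME"])
```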

# Q2: How many applications belong to the "ART_AND_DESIGN" category?

In [260]:
categories['ART_AND_DESIGN']

61

In [261]:
for i in df['Category'].unique():
    print(i,categories[i])

ART_AND_DESIGN 61
AUTO_AND_VEHICLES 73
BEAUTY 42
BOOKS_AND_REFERENCE 178
BUSINESS 303
COMICS 58
COMMUNICATION 328
DATING 195
EDUCATION 155
ENTERTAINMENT 149
EVENTS 45
FINANCE 323
FOOD_AND_DRINK 109
HEALTH_AND_FITNESS 297
HOUSE_AND_HOME 76
LIBRARIES_AND_DEMO 64
LIFESTYLE 314
GAME 1097
FAMILY 1746
MEDICAL 350
SOCIAL 259
SHOPPING 238
PHOTOGRAPHY 317
SPORTS 319
TRAVEL_AND_LOCAL 226
TOOLS 733
PERSONALIZATION 312
PRODUCTIVITY 351
PARENTING 50
WEATHER 75
VIDEO_PLAYERS 160
NEWS_AND_MAGAZINES 233
MAPS_AND_NAVIGATION 124


# Q3: What is the total number of apps by type (Free vs Paid)?

In [262]:
types = {}
for name in df['Type'].unique():
    ct = 0
    for i in df['Type']:
        if(i == name):
            ct +=1
    print(name, ':' ,ct)

Free : 8715
Paid : 645


In [263]:
# In Dictionary 
types = {}
for name in df['Type'].unique():
    ct = 0
    for i in df['Type']:
        if(i == name):
            ct +=1
    types[name] = ct
print(types)

{'Free': 8715, 'Paid': 645}


# Q4: What is the total number of apps for each content rating classification?

In [264]:
content_rating = {}
for name in df['Content Rating'].unique():
    ct = 0
    for i in df['Content Rating']:
        if(i == name):
            ct +=1
    print(name, ':' ,ct)

Everyone : 7414
Teen : 1084
Everyone 10+ : 397
Mature 17+ : 461
Adults only 18+ : 3
Unrated : 1


In [265]:
# In Dictionary 
content_rating = {}
for name in df['Content Rating'].unique():
    ct = 0
    for i in df['Content Rating']:
        if(i == name):
            ct +=1
    content_rating[name] = ct
print(content_rating)

{'Everyone': 7414, 'Teen': 1084, 'Everyone 10+': 397, 'Mature 17+': 461, 'Adults only 18+': 3, 'Unrated': 1}


In [266]:
# Rating Distribution / Summary Statistics for App Ratings
df['Rating'].describe()

count    9360.000000
mean        4.191838
std         0.515263
min         1.000000
25%         4.000000
50%         4.300000
75%         4.500000
max         5.000000
Name: Rating, dtype: float64

In [267]:
# Summary Statistics for App Type
df['Type'].describe()

count     9360
unique       2
top       Free
freq      8715
Name: Type, dtype: object

In [268]:
# Summary Statistics for Content Rating
df['Content Rating'].describe()

count         9360
unique           6
top       Everyone
freq          7414
Name: Content Rating, dtype: object

In [269]:
# Summary Statistics for App Categories
df['Category'].describe()

count       9360
unique        33
top       FAMILY
freq        1746
Name: Category, dtype: object

# Handling Missing (Null) Values

In [270]:
import pandas as pd
import numpy as np

from sklearn.impute import SimpleImputer

In [271]:
df = pd.read_csv('google_playstore_dataset_raw.csv')
df

Unnamed: 0,App,Category,Rating,Reviews,Size,Installs,Type,Price,Content Rating,Genres,Last Updated,Current Ver,Android Ver
0,Photo Editor & Candy Camera & Grid & ScrapBook,ART_AND_DESIGN,4.1,159,19M,"10,000+",Free,0,Everyone,Art & Design,"January 7, 2018",1.0.0,4.0.3 and up
1,Coloring book moana,ART_AND_DESIGN,3.9,967,14M,"500,000+",Free,0,Everyone,Art & Design;Pretend Play,"January 15, 2018",2.0.0,4.0.3 and up
2,"U Launcher Lite – FREE Live Cool Themes, Hide ...",ART_AND_DESIGN,4.7,87510,8.7M,"5,000,000+",Free,0,Everyone,Art & Design,"August 1, 2018",1.2.4,4.0.3 and up
3,Sketch - Draw & Paint,ART_AND_DESIGN,4.5,215644,25M,"50,000,000+",Free,0,Teen,Art & Design,"June 8, 2018",Varies with device,4.2 and up
4,Pixel Draw - Number Art Coloring Book,ART_AND_DESIGN,4.3,967,2.8M,"100,000+",Free,0,Everyone,Art & Design;Creativity,"June 20, 2018",1.1,4.4 and up
...,...,...,...,...,...,...,...,...,...,...,...,...,...
10836,Sya9a Maroc - FR,FAMILY,4.5,38,53M,"5,000+",Free,0,Everyone,Education,"July 25, 2017",1.48,4.1 and up
10837,Fr. Mike Schmitz Audio Teachings,FAMILY,5.0,4,3.6M,100+,Free,0,Everyone,Education,"July 6, 2018",1.0,4.1 and up
10838,Parkinson Exercices FR,MEDICAL,,3,9.5M,"1,000+",Free,0,Everyone,Medical,"January 20, 2017",1.0,2.2 and up
10839,The SCP Foundation DB fr nn5n,BOOKS_AND_REFERENCE,4.5,114,Varies with device,"1,000+",Free,0,Mature 17+,Books & Reference,"January 19, 2015",Varies with device,Varies with device


In [272]:
# missing values
df.isnull().sum()

App                  0
Category             0
Rating            1474
Reviews              0
Size                 0
Installs             0
Type                 1
Price                0
Content Rating       1
Genres               0
Last Updated         0
Current Ver          8
Android Ver          3
dtype: int64

In [273]:
df.iloc[ : , 2:3].values

array([[4.1],
       [3.9],
       [4.7],
       ...,
       [nan],
       [4.5],
       [4.5]])

In [274]:
#  create imputer to replace NAN with mean of a column 
impute = SimpleImputer(missing_values = np.nan, strategy = 'mean')
impute.fit(df.iloc[ : , 2:3].values) # calculates mean

In [275]:
# Replace missing values in the Rating column with the mean value computed above
df.iloc[ : , 2:3] = impute.transform(df.iloc[ : , 2:3].values)

In [276]:
df

Unnamed: 0,App,Category,Rating,Reviews,Size,Installs,Type,Price,Content Rating,Genres,Last Updated,Current Ver,Android Ver
0,Photo Editor & Candy Camera & Grid & ScrapBook,ART_AND_DESIGN,4.100000,159,19M,"10,000+",Free,0,Everyone,Art & Design,"January 7, 2018",1.0.0,4.0.3 and up
1,Coloring book moana,ART_AND_DESIGN,3.900000,967,14M,"500,000+",Free,0,Everyone,Art & Design;Pretend Play,"January 15, 2018",2.0.0,4.0.3 and up
2,"U Launcher Lite – FREE Live Cool Themes, Hide ...",ART_AND_DESIGN,4.700000,87510,8.7M,"5,000,000+",Free,0,Everyone,Art & Design,"August 1, 2018",1.2.4,4.0.3 and up
3,Sketch - Draw & Paint,ART_AND_DESIGN,4.500000,215644,25M,"50,000,000+",Free,0,Teen,Art & Design,"June 8, 2018",Varies with device,4.2 and up
4,Pixel Draw - Number Art Coloring Book,ART_AND_DESIGN,4.300000,967,2.8M,"100,000+",Free,0,Everyone,Art & Design;Creativity,"June 20, 2018",1.1,4.4 and up
...,...,...,...,...,...,...,...,...,...,...,...,...,...
10836,Sya9a Maroc - FR,FAMILY,4.500000,38,53M,"5,000+",Free,0,Everyone,Education,"July 25, 2017",1.48,4.1 and up
10837,Fr. Mike Schmitz Audio Teachings,FAMILY,5.000000,4,3.6M,100+,Free,0,Everyone,Education,"July 6, 2018",1.0,4.1 and up
10838,Parkinson Exercices FR,MEDICAL,4.193338,3,9.5M,"1,000+",Free,0,Everyone,Medical,"January 20, 2017",1.0,2.2 and up
10839,The SCP Foundation DB fr nn5n,BOOKS_AND_REFERENCE,4.500000,114,Varies with device,"1,000+",Free,0,Mature 17+,Books & Reference,"January 19, 2015",Varies with device,Varies with device
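
For a single column, mean imputation can also be sketched with pandas alone. This is an alternative to the `SimpleImputer` cells above, not a replacement for them, and the frame below is hypothetical. `fillna()` with the column mean fills each gap with the average of the observed values.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with one missing rating.
toy = pd.DataFrame({
    "App": ["A", "B", "C", "D"],
    "Rating": [4.0, np.nan, 5.0, 3.0],
})

# mean() ignores NaN, so this is the average of the observed ratings.
mean_rating = toy["Rating"].mean()

# Replace the missing entries with that mean.
toy["Rating"] = toy["Rating"].fillna(mean_rating)

print(toy)
```

Selecting the column by name avoids depending on its position, which `iloc[:, 2:3]` does.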


In [277]:
# impute = SimpleImputer(missing_values = np.nan, strategy = 'mean')
# impute.fit(df.iloc[ : , 2:3].values)
# df.iloc[ : , 2:3] = impute.transform(df.iloc[ : , 2:3].values)
# df.head()

In [278]:
# Remove any rows that still contain NaN values (from other columns)
df = df.dropna()

In [279]:
# final check of missing values in each column.
df.isnull().sum()

App               0
Category          0
Rating            0
Reviews           0
Size              0
Installs          0
Type              0
Price             0
Content Rating    0
Genres            0
Last Updated      0
Current Ver       0
Android Ver       0
dtype: int64

# Exporting the Cleaned Dataset to CSV

In [280]:
# Export df, the frame cleaned in the section above (data still refers to the earlier dropna() copy)
df.to_csv('google_play_store_data_cleaned.csv', index=False)
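
A quick round-trip check can confirm that an export wrote what was intended. The sketch below uses a hypothetical two-row frame and a temporary directory, so the file name is an assumption and nothing is left on disk.

```python
import os
import tempfile

import pandas as pd

# Hypothetical cleaned frame for illustration only.
cleaned = pd.DataFrame({"App": ["A", "B"], "Rating": [4.1, 3.9]})

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "cleaned_example.csv")
    # index=False keeps the row index out of the file.
    cleaned.to_csv(path, index=False)
    reloaded = pd.read_csv(path)

print(reloaded.shape)
print(list(reloaded.columns))
```

With `index=False`, reloading the file does not produce an extra `Unnamed: 0` column.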