# Capstone Two - googleplaystore Data

This step focuses on collecting your data, organizing it, and making sure it's well defined. Paying attention to these tasks will pay off greatly later on. Some data cleaning can be done at this stage, but it's important not to be overzealous in your cleaning before you've explored the data to better understand it.

# 1.1 Imports

In [29]:
# Import pandas, numpy, matplotlib.pyplot, and os
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import os

# 1.2 Load the Google Play Store CSV Data

In [60]:
playstore_data = pd.read_csv('./data/googleplaystore.csv')

In [61]:
#Call the info method on playstore_data to see a summary of the data

playstore_data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10841 entries, 0 to 10840
Data columns (total 13 columns):
 #   Column          Non-Null Count  Dtype  
---  ------          --------------  -----  
 0   App             10841 non-null  object 
 1   Category        10841 non-null  object 
 2   Rating          9367 non-null   float64
 3   Reviews         10841 non-null  object 
 4   Size            10841 non-null  object 
 5   Installs        10841 non-null  object 
 6   Type            10840 non-null  object 
 7   Price           10841 non-null  object 
 8   Content Rating  10840 non-null  object 
 9   Genres          10841 non-null  object 
 10  Last Updated    10841 non-null  object 
 11  Current Ver     10833 non-null  object 
 12  Android Ver     10838 non-null  object 
dtypes: float64(1), object(12)
memory usage: 1.1+ MB


In [62]:
# Call the head method on playstore_data to print the first several rows of the data
playstore_data.head()

Unnamed: 0,App,Category,Rating,Reviews,Size,Installs,Type,Price,Content Rating,Genres,Last Updated,Current Ver,Android Ver
0,Photo Editor & Candy Camera & Grid & ScrapBook,ART_AND_DESIGN,4.1,159,19M,"10,000+",Free,0,Everyone,Art & Design,"January 7, 2018",1.0.0,4.0.3 and up
1,Coloring book moana,ART_AND_DESIGN,3.9,967,14M,"500,000+",Free,0,Everyone,Art & Design;Pretend Play,"January 15, 2018",2.0.0,4.0.3 and up
2,"U Launcher Lite – FREE Live Cool Themes, Hide ...",ART_AND_DESIGN,4.7,87510,8.7M,"5,000,000+",Free,0,Everyone,Art & Design,"August 1, 2018",1.2.4,4.0.3 and up
3,Sketch - Draw & Paint,ART_AND_DESIGN,4.5,215644,25M,"50,000,000+",Free,0,Teen,Art & Design,"June 8, 2018",Varies with device,4.2 and up
4,Pixel Draw - Number Art Coloring Book,ART_AND_DESIGN,4.3,967,2.8M,"100,000+",Free,0,Everyone,Art & Design;Creativity,"June 20, 2018",1.1,4.4 and up


# 1.3 Data Cleaning

In [63]:
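# Data cleaning for "Size" column: express every size in megabytes
# Strip the trailing 'M' from sizes already given in megabytes
playstore_data['Size'] = playstore_data['Size'].map(lambda x: x.rstrip('M'))
# Convert sizes given in kilobytes ('k' suffix) to megabytes, rounded to one decimal
playstore_data['Size'] = playstore_data['Size'].map(
    lambda x: str(round(float(x.rstrip('k')) / 1024, 1)) if x[-1] == 'k' else x)
# Replace 'Varies with device' with NaN; remaining values stay as strings (dtype object)
playstore_data['Size'] = playstore_data['Size'].map(lambda x: np.nan if x.startswith('Varies') else x)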
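
The three `map` calls above can also be folded into one helper, which may be easier to test in isolation. This is only a sketch: `size_to_mb` and the sample Series are hypothetical names, not part of the notebook. Unlike the cell above, it returns floats, so the resulting column would be `float64` instead of `object`, and any unexpected value becomes NaN.

```python
import numpy as np
import pandas as pd


def size_to_mb(value):
    """Convert a size string such as '19M' or '201k' to megabytes (hypothetical helper)."""
    if not isinstance(value, str):
        return np.nan
    text = value.strip()
    if text.endswith('M'):
        return float(text[:-1])
    if text.endswith('k'):
        # Same 1024 divisor and one-decimal rounding as the cell above
        return round(float(text[:-1]) / 1024, 1)
    # 'Varies with device' and any other unexpected value become missing
    return np.nan


# Small illustrative sample, not the real data
sizes = pd.Series(['19M', '8.7M', '201k', 'Varies with device'])
sizes_mb = sizes.map(size_to_mb)
```

On the real data this would be applied as `playstore_data['Size'].map(size_to_mb)`, in place of the cell above.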

In [64]:
# Data cleaning for "Installs" column
playstore_data['Installs'] = playstore_data['Installs'].map(lambda x: x.replace('+','').replace(',',''))


In [65]:
# Data cleaning for "Price" column
playstore_data['Price'] = playstore_data['Price'].map(lambda x: x.replace('$',''))
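
The Installs and Price cells above strip symbols one value at a time with `map`. A vectorized alternative uses the pandas `.str.replace` accessor. The sketch below runs on a small hand-made frame (`sample` is a hypothetical name, and the price values are made up for illustration), so treat it as one possible approach, not the notebook's method.

```python
import pandas as pd

# Hand-made sample; the Installs strings mirror the format shown in head()
sample = pd.DataFrame({
    'Installs': ['10,000+', '5,000,000+', '0'],
    'Price': ['0', '$4.99', '$399.99'],
})

# regex=False treats '+' and '$' as literal characters, not regex operators
installs = (sample['Installs']
            .str.replace('+', '', regex=False)
            .str.replace(',', '', regex=False))
price = sample['Price'].str.replace('$', '', regex=False)

# Once the symbols are gone the strings parse cleanly as numbers
installs = pd.to_numeric(installs)
price = pd.to_numeric(price)
```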

In [66]:
# Row 10472 removed due to missing value of Category
playstore_data.drop(playstore_data.index[10472], inplace=True)
# Rows 7312 and 8266 removed due to "Unrated" value in Content Rating
playstore_data.drop(playstore_data.index[[7312,8266]], inplace=True)
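
The rows above are dropped by hard-coded position, which only works while the file keeps its current row order. A condition-based filter states the intent directly. The sketch below uses a toy frame (`toy`, `unrated`, and `cleaned` are hypothetical names) and assumes the unwanted rows are exactly those whose Content Rating equals 'Unrated'.

```python
import pandas as pd

# Toy data standing in for the real frame
toy = pd.DataFrame({
    'App': ['A', 'B', 'C', 'D'],
    'Content Rating': ['Everyone', 'Unrated', 'Teen', 'Unrated'],
})

# Boolean mask marking the rows to remove
unrated = toy['Content Rating'] == 'Unrated'

# Index labels of the matching rows, useful for a quick sanity check
unrated_labels = toy.index[unrated].tolist()

# Keep everything that does not match the mask
cleaned = toy[~unrated]
```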

In [67]:
# Sort by "Category"
playstore_data.sort_values("Category", inplace = True)

In [68]:
# Convert "Last Updated" (e.g. "January 7, 2018") to datetime
playstore_data['Last Updated'] = playstore_data['Last Updated'].apply(lambda x: x.replace(",", "").replace(' ','-'))
playstore_data['Last Updated'] = pd.to_datetime(playstore_data['Last Updated'],format='%B-%d-%Y')
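
The date conversion above can also be written with a format string that matches the raw text directly, which avoids rewriting the strings first. The sketch below uses a small sample Series (`dates` and `parsed` are hypothetical names, and the last value is deliberately invalid). `errors='coerce'` is an optional choice that turns unparseable values into `NaT` instead of raising.

```python
import pandas as pd

# Two dates in the raw format shown by head(), plus one deliberately bad value
dates = pd.Series(['January 7, 2018', 'August 1, 2018', 'not a date'])

# '%B %d, %Y' matches "MonthName day, year"; bad values become NaT
parsed = pd.to_datetime(dates, format='%B %d, %Y', errors='coerce')
```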

In [69]:
# Convert Reviews, Installs, and Price to numeric dtypes
playstore_data['Reviews'] = pd.to_numeric(playstore_data['Reviews'])
playstore_data['Installs'] = pd.to_numeric(playstore_data['Installs'])
playstore_data['Price'] = pd.to_numeric(playstore_data['Price'])
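
`pd.to_numeric` raises on the first value it cannot parse, which is why malformed rows have to be removed before this cell runs. One way to locate such rows beforehand is to coerce failures to NaN and inspect what failed. This is a minimal sketch on made-up values; `reviews`, `as_numbers`, and `bad_rows` are hypothetical names, and the malformed entry is invented for illustration.

```python
import pandas as pd

# Made-up sample with one value that is not a plain number
reviews = pd.Series(['159', '967', '3.0M'])

# errors='coerce' turns unparseable entries into NaN instead of raising
as_numbers = pd.to_numeric(reviews, errors='coerce')

# The original strings that failed to parse
bad_rows = reviews[as_numbers.isna()]
```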

In [70]:
playstore_data.head()

Unnamed: 0,App,Category,Rating,Reviews,Size,Installs,Type,Price,Content Rating,Genres,Last Updated,Current Ver,Android Ver
0,Photo Editor & Candy Camera & Grid & ScrapBook,ART_AND_DESIGN,4.1,159,19.0,10000,Free,0.0,Everyone,Art & Design,2018-01-07,1.0.0,4.0.3 and up
35,How to draw Ladybug and Cat Noir,ART_AND_DESIGN,3.8,564,9.2,100000,Free,0.0,Everyone,Art & Design,2018-07-11,2.1,4.1 and up
36,UNICORN - Color By Number & Pixel Art Coloring,ART_AND_DESIGN,4.7,8145,24.0,500000,Free,0.0,Everyone,Art & Design;Creativity,2018-08-02,1.0.9,4.4 and up
37,Floor Plan Creator,ART_AND_DESIGN,4.1,36639,,5000000,Free,0.0,Everyone,Art & Design,2018-07-14,Varies with device,2.3.3 and up
38,PIP Camera - PIP Collage Maker,ART_AND_DESIGN,4.7,158,11.0,10000,Free,0.0,Everyone,Art & Design,2017-11-29,1.3,4.0.3 and up


In [73]:
playstore_data.dtypes

App                       object
Category                  object
Rating                   float64
Reviews                    int64
Size                      object
Installs                   int64
Type                      object
Price                    float64
Content Rating            object
Genres                    object
Last Updated      datetime64[ns]
Current Ver               object
Android Ver               object
dtype: object

In [74]:
playstore_data.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 10838 entries, 0 to 3635
Data columns (total 13 columns):
 #   Column          Non-Null Count  Dtype         
---  ------          --------------  -----         
 0   App             10838 non-null  object        
 1   Category        10838 non-null  object        
 2   Rating          9365 non-null   float64       
 3   Reviews         10838 non-null  int64         
 4   Size            9143 non-null   object        
 5   Installs        10838 non-null  int64         
 6   Type            10837 non-null  object        
 7   Price           10838 non-null  float64       
 8   Content Rating  10838 non-null  object        
 9   Genres          10838 non-null  object        
 10  Last Updated    10838 non-null  datetime64[ns]
 11  Current Ver     10830 non-null  object        
 12  Android Ver     10836 non-null  object        
dtypes: datetime64[ns](1), float64(2), int64(2), object(8)
memory usage: 1.2+ MB
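
The `info()` output above shows that Rating (9365 of 10838 non-null) and Size (9143 of 10838 non-null) carry most of the missing values. A compact per-column summary of missing counts and percentages could look like the sketch below. It runs on a toy frame (`toy` and `missing` are hypothetical names), so the numbers it produces are not those of the real data.

```python
import numpy as np
import pandas as pd

# Toy frame with a few gaps, standing in for the real data
toy = pd.DataFrame({
    'Rating': [4.1, np.nan, 4.7, np.nan],
    'Size': ['19.0', '14.0', np.nan, '25.0'],
    'Type': ['Free', 'Free', 'Paid', 'Free'],
})

# Count and percentage of missing values per column, largest first
missing = pd.concat(
    [toy.isna().sum(), 100 * toy.isna().mean()],
    axis=1, keys=['count', 'percent'],
).sort_values('count', ascending=False)
```

Replacing `toy` with `playstore_data` would give the same summary for the cleaned frame.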



