# Team 2 - Google Play Store

![](https://www.brandnol.com/wp-content/uploads/2019/04/Google-Play-Store-Search.jpg)

_For more information about the dataset, read [here](https://www.kaggle.com/lava18/google-play-store-apps)._

## Your tasks
- Name your team!
- Read the source and do some quick research to understand more about the dataset and its topic
- Clean the data
- Perform Exploratory Data Analysis on the dataset
- Analyze the data more deeply and extract insights
- Visualize your analysis on Google Data Studio
- Present your works in front of the class and guests next Monday

## Submission Guide
- Create a Github repository for your project
- Upload the dataset (.csv file) and the Jupyter Notebook to your Github repository. In the Jupyter Notebook, **include the link to your Google Data Studio report**.
- Submit your works through this [Google Form](https://forms.gle/oxtXpGfS8JapVj3V8).

## Tips for Data Cleaning, Manipulation & Visualization
- Here are some of our tips for Data Cleaning, Manipulation & Visualization. [Click here](https://hackmd.io/cBNV7E6TT2WMliQC-GTw1A)

_____________________________

## Some Hints for This Dataset:
- There are lots of null values. How should we handle them?
- Column `Installs` and `Size` have some strange values. Can you identify them?
- Values in `Size` column are currently in different format: `M`, `k`. And how about the value `Varies with device`?
- `Price` column is not in the right data type
- And more...


### Đọc dữ liệu

In [1]:
import pandas as pd


In [69]:
google_play=pd.read_csv('../data/02-google-play-store/google-play-store.csv')

In [70]:
google_play.head(2)

Unnamed: 0,App,Category,Rating,Reviews,Size,Installs,Type,Price,Content Rating,Genres,Last Updated,Current Ver,Android Ver
0,Photo Editor & Candy Camera & Grid & ScrapBook,ART_AND_DESIGN,4.1,159,19M,"10,000+",Free,0,Everyone,Art & Design,"January 7, 2018",1.0.0,4.0.3 and up
1,Coloring book moana,ART_AND_DESIGN,3.9,967,14M,"500,000+",Free,0,Everyone,Art & Design;Pretend Play,"January 15, 2018",2.0.0,4.0.3 and up


In [71]:
google_play.tail(2)

Unnamed: 0,App,Category,Rating,Reviews,Size,Installs,Type,Price,Content Rating,Genres,Last Updated,Current Ver,Android Ver
10839,The SCP Foundation DB fr nn5n,BOOKS_AND_REFERENCE,4.5,114,Varies with device,"1,000+",Free,0,Mature 17+,Books & Reference,"January 19, 2015",Varies with device,Varies with device
10840,iHoroscope - 2018 Daily Horoscope & Astrology,LIFESTYLE,4.5,398307,19M,"10,000,000+",Free,0,Everyone,Lifestyle,"July 25, 2018",Varies with device,Varies with device


In [None]:
google_play.dtypes

In [74]:
def transform_Reviews(s):
    number=float(s.strip('M'))
    return number*1000000

def transform_Installs(s):
    s=s.replace(',','')
    number=int(s.strip('+'))
    return number    

In [75]:
google_play.Reviews=google_play.Reviews.apply(transform_Reviews)

In [81]:
google_play[google_play.Installs=='Free']

Unnamed: 0,App,Category,Rating,Reviews,Size,Installs,Type,Price,Content Rating,Genres,Last Updated,Current Ver,Android Ver
10472,Life Made WI-Fi Touchscreen Photo Frame,1.9,19.0,3000000.0,"1,000+",Free,0,Everyone,,"February 11, 2018",1.0.19,4.0 and up,


Vì cột Installs có giá trị 'Free' và các cột khác thì có vẻ như bị lệch nên ta quyết định xóa nó

In [82]:
google_play=google_play[google_play.Installs!='Free']

In [83]:
google_play.Installs=google_play.Installs.apply(transform_Installs)

In [94]:
google_play.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 10840 entries, 0 to 10840
Data columns (total 13 columns):
App               10840 non-null object
Category          10840 non-null object
Rating            9366 non-null float64
Reviews           10840 non-null float64
Size              10840 non-null object
Installs          10840 non-null int64
Type              10839 non-null object
Price             10840 non-null object
Content Rating    10840 non-null object
Genres            10840 non-null object
Last Updated      10840 non-null object
Current Ver       10832 non-null object
Android Ver       10838 non-null object
dtypes: float64(2), int64(1), object(10)
memory usage: 1.2+ MB


In [103]:
print(google_play.Size.unique())

['19M' '14M' '8.7M' '25M' '2.8M' '5.6M' '29M' '33M' '3.1M' '28M' '12M'
 '20M' '21M' '37M' '2.7M' '5.5M' '17M' '39M' '31M' '4.2M' '7.0M' '23M'
 '6.0M' '6.1M' '4.6M' '9.2M' '5.2M' '11M' '24M' 'Varies with device'
 '9.4M' '15M' '10M' '1.2M' '26M' '8.0M' '7.9M' '56M' '57M' '35M' '54M'
 '201k' '3.6M' '5.7M' '8.6M' '2.4M' '27M' '2.5M' '16M' '3.4M' '8.9M'
 '3.9M' '2.9M' '38M' '32M' '5.4M' '18M' '1.1M' '2.2M' '4.5M' '9.8M' '52M'
 '9.0M' '6.7M' '30M' '2.6M' '7.1M' '3.7M' '22M' '7.4M' '6.4M' '3.2M'
 '8.2M' '9.9M' '4.9M' '9.5M' '5.0M' '5.9M' '13M' '73M' '6.8M' '3.5M'
 '4.0M' '2.3M' '7.2M' '2.1M' '42M' '7.3M' '9.1M' '55M' '23k' '6.5M' '1.5M'
 '7.5M' '51M' '41M' '48M' '8.5M' '46M' '8.3M' '4.3M' '4.7M' '3.3M' '40M'
 '7.8M' '8.8M' '6.6M' '5.1M' '61M' '66M' '79k' '8.4M' '118k' '44M' '695k'
 '1.6M' '6.2M' '18k' '53M' '1.4M' '3.0M' '5.8M' '3.8M' '9.6M' '45M' '63M'
 '49M' '77M' '4.4M' '4.8M' '70M' '6.9M' '9.3M' '10.0M' '8.1M' '36M' '84M'
 '97M' '2.0M' '1.9M' '1.8M' '5.3M' '47M' '556k' '526k' '76M' '7.6M'

In [102]:
google_play[google_play['Type'].isna()]

Unnamed: 0,App,Category,Rating,Reviews,Size,Installs,Type,Price,Content Rating,Genres,Last Updated,Current Ver,Android Ver


Vì cột Type trống và các cột khác cũng mất nhiều cột nên ta quyết định xóa nó

In [101]:
google_play=google_play.dropna(subset=['Type'])

Ta thấy cột Size có hai đơn vị là k (kB) và M (MB) nên ta sẽ chuyển về cùng đơn vị là M

In [123]:
def format_bytes(s):
    if s[-1:]=='k':
        return float(s[:-1])*1000
    else:
        return float(s[:-1])*1000000

#Thay thế các biến '' tại cột Size bằng 9999 vì đây là thông số tùy thuộc vào thiết bị
google_play.Size=[x.strip().replace('Varies with device','9999') for x in google_play.Size]

google_play.Size=google_play.Size.apply(format_bytes)

In [96]:
google_play[google_play['Current Ver'].isna()]

Unnamed: 0,App,Category,Rating,Reviews,Size,Installs,Type,Price,Content Rating,Genres,Last Updated,Current Ver,Android Ver
15,Learn To Draw Kawaii Characters,ART_AND_DESIGN,3.2,55000000.0,2.7M,5000,Free,0,Everyone,Art & Design,"June 6, 2018",,4.2 and up
1553,Market Update Helper,LIBRARIES_AND_DEMO,4.1,20145000000.0,11k,1000000,Free,0,Everyone,Libraries & Demo,"February 12, 2013",,1.5 and up
6322,Virtual DJ Sound Mixer,TOOLS,4.2,4010000000.0,8.7M,500000,Free,0,Everyone,Tools,"May 10, 2017",,4.0 and up
6803,BT Master,FAMILY,,0.0,222k,100,Free,0,Everyone,Education,"November 6, 2016",,1.6 and up
7333,Dots puzzle,FAMILY,4.0,179000000.0,14M,50000,Paid,$0.99,Everyone,Puzzle,"April 18, 2018",,4.0 and up
7407,Calculate My IQ,FAMILY,,44000000.0,7.2M,10000,Free,0,Everyone,Entertainment,"April 3, 2017",,2.3 and up
7730,UFO-CQ,TOOLS,,1000000.0,237k,10,Paid,$0.99,Everyone,Tools,"July 4, 2016",,2.0 and up
10342,La Fe de Jesus,BOOKS_AND_REFERENCE,,8000000.0,658k,1000,Free,0,Everyone,Books & Reference,"January 31, 2017",,3.0 and up
