# Exploration

The dataset has 13 features describing each Google application: name, category, rating, reviews, size, installs, type, price, content rating, genres, last updated date, current version, and Android version.
- App: The application name.
- Category: The category the app belongs to.
- Rating: Overall user rating of the app.
- Reviews: Number of user reviews for the app.
- Size: The size of the app.
- Installs: Number of user installs for the app.
- Type: Either "Paid" or "Free".
- Price: The price of the app.
- Content Rating: The age group the app is targeted at, e.g. "Everyone" or "Teen" in the sample rows below.
- Genres: Possibly multiple genres the app belongs to.
- Last Updated: The date the app was last updated.
- Current Ver: The current version of the app.
- Android Ver: The minimum Android version required by the app.

In [7]:
import pandas as pd
from utils import data

In [9]:
data.head()

Unnamed: 0.1,Unnamed: 0,App,Category,Rating,Reviews,Size,Installs,Type,Price,Content Rating,Genres,Last Updated,Current Ver,Android Ver
0,0,Photo Editor & Candy Camera & Grid & ScrapBook,ART_AND_DESIGN,4.1,159,19M,"10,000+",Free,0,Everyone,Art & Design,7-Jan-18,1.0.0,4.0.3 and up
1,1,Coloring book moana,ART_AND_DESIGN,3.9,967,14M,"500,000+",Free,0,Everyone,Art & Design;Pretend Play,15-Jan-18,2.0.0,4.0.3 and up
2,2,"U Launcher Lite – FREE Live Cool Themes, Hide ...",ART_AND_DESIGN,4.7,87510,8.7M,"5,000,000+",Free,0,Everyone,Art & Design,1-Aug-18,1.2.4,4.0.3 and up
3,3,Sketch - Draw & Paint,ART_AND_DESIGN,4.5,215644,25M,"50,000,000+",Free,0,Teen,Art & Design,8-Jun-18,Varies with device,4.2 and up
4,4,Pixel Draw - Number Art Coloring Book,ART_AND_DESIGN,4.3,967,2.8M,"100,000+",Free,0,Everyone,Art & Design;Creativity,20-Jun-18,1.1,4.4 and up


In [19]:
# Note that the first column has no name.
data.rename(columns={'Unnamed: 0': 'id'}, inplace=True)
data.columns

Index(['id', 'App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs',
       'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated',
       'Current Ver', 'Android Ver'],
      dtype='object')

In [20]:
data.shape

(10841, 14)

## Data integrity check

In [16]:
# Are there any null values?
data.isnull().sum()

id                   0
App                  0
Category             0
Rating            1474
Reviews              0
Size                 0
Installs             0
Type                 1
Price                0
Content Rating       1
Genres               0
Last Updated         0
Current Ver          8
Android Ver          3
dtype: int64
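
Raw null counts are easier to compare as a share of rows: the 1474 missing `Rating` values, for instance, are roughly 13.6% of the 10841 rows. The sketch below shows one way to get that view. It runs on a small hypothetical frame (`toy`, with made-up values) rather than on `data`, and the names `null_share` and `null_report` are illustrative only.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame standing in for `data`; the values are illustrative only.
toy = pd.DataFrame({
    "App": ["a", "b", "c", "d"],
    "Rating": [4.1, np.nan, 3.9, np.nan],
    "Type": ["Free", "Paid", None, "Free"],
})

# isnull() yields a boolean frame; the mean of booleans is the fraction of True values.
null_share = toy.isnull().mean().mul(100).round(1)

# Keep only the columns that actually contain nulls, largest share first.
null_report = null_share[null_share > 0].sort_values(ascending=False)
print(null_report)
```

Applied to the real frame, the same two lines would presumably work with `data` in place of `toy`. Filtering out the zero rows keeps the report short when most columns are complete, as is the case here.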

In [30]:
# How many unique values does each column have?
for col in data.columns:
    print(f"{col}:")
    unique = data[col].unique()
    unique_length = unique.size
    print(f"\t Unique count: {unique_length}")
    # Print the values themselves only for columns with at most 50 unique values
    if unique_length > 50:
        print("\t Too Many Values")
    else:
        print(f"\t {unique}")
    print()
    

id:
	 Unique count: 10841
	 Too Many Values

App:
	 Unique count: 9660
	 Too Many Values

Category:
	 Unique count: 34
	 ['ART_AND_DESIGN' 'AUTO_AND_VEHICLES' 'BEAUTY' 'BOOKS_AND_REFERENCE'
 'BUSINESS' 'COMICS' 'COMMUNICATION' 'DATING' 'EDUCATION' 'ENTERTAINMENT'
 'EVENTS' 'FINANCE' 'FOOD_AND_DRINK' 'HEALTH_AND_FITNESS' 'HOUSE_AND_HOME'
 'LIBRARIES_AND_DEMO' 'LIFESTYLE' 'GAME' 'FAMILY' 'MEDICAL' 'SOCIAL'
 'SHOPPING' 'PHOTOGRAPHY' 'SPORTS' 'TRAVEL_AND_LOCAL' 'TOOLS'
 'PERSONALIZATION' 'PRODUCTIVITY' 'PARENTING' 'WEATHER' 'VIDEO_PLAYERS'
 'NEWS_AND_MAGAZINES' 'MAPS_AND_NAVIGATION' '1.9']

Rating:
	 Unique count: 41
	 [ 4.1  3.9  4.7  4.5  4.3  4.4  3.8  4.2  4.6  3.2  4.   nan  4.8  4.9
  3.6  3.7  3.3  3.4  3.5  3.1  5.   2.6  3.   1.9  2.5  2.8  2.7  1.
  2.9  2.3  2.2  1.7  2.   1.8  2.4  1.6  2.1  1.4  1.5  1.2 19. ]

Reviews:
	 Unique count: 6002
	 Too Many Values

Size:
	 Unique count: 462
	 Too Many Values

Installs:
	 Unique count: 22
	 ['10,000+' '500,000+' '5,000,000+' '50,000,