## **Introduction:**

This data analysis project focuses on exploring and understanding the Google Play Store Apps dataset. 

The dataset is loaded into a Pandas DataFrame using Python, and various data analysis tasks are performed to gain insights into the characteristics of the apps available on the Google Play Store.

## **Dataset Overview:**

The dataset consists of 10,841 entries and 13 columns, each representing different attributes of the apps. 

These attributes include the app name, category, rating, number of reviews, size, number of installs, type (free or paid), price, content rating, genres, last updated information, current version, and Android version compatibility.

## **Basic Data Exploration:**

The initial steps involve displaying the top 5 and last 3 rows of the dataset to get a glimpse of the data. 

Additionally, the shape of the dataset is explored, revealing that it contains 10,841 rows and 13 columns.

In [1]:
# Import all required libraries

import numpy as np
import pandas as pd

In [2]:
df=pd.read_csv("googleplaystore.csv")
df.head()

Unnamed: 0,App,Category,Rating,Reviews,Size,Installs,Type,Price,Content Rating,Genres,Last Updated,Current Ver,Android Ver
0,Photo Editor & Candy Camera & Grid & ScrapBook,ART_AND_DESIGN,4.1,159,19M,"10,000+",Free,0,Everyone,Art & Design,"January 7, 2018",1.0.0,4.0.3 and up
1,Coloring book moana,ART_AND_DESIGN,3.9,967,14M,"500,000+",Free,0,Everyone,Art & Design;Pretend Play,"January 15, 2018",2.0.0,4.0.3 and up
2,"U Launcher Lite – FREE Live Cool Themes, Hide ...",ART_AND_DESIGN,4.7,87510,8.7M,"5,000,000+",Free,0,Everyone,Art & Design,"August 1, 2018",1.2.4,4.0.3 and up
3,Sketch - Draw & Paint,ART_AND_DESIGN,4.5,215644,25M,"50,000,000+",Free,0,Teen,Art & Design,"June 8, 2018",Varies with device,4.2 and up
4,Pixel Draw - Number Art Coloring Book,ART_AND_DESIGN,4.3,967,2.8M,"100,000+",Free,0,Everyone,Art & Design;Creativity,"June 20, 2018",1.1,4.4 and up


In [3]:
# Shape of dataset

df.shape

(10841, 13)
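The cells above show `head()` and `shape`, but not the last three rows mentioned earlier. A minimal sketch of all three calls, using a small made-up DataFrame (`toy` and its values are hypothetical stand-ins, not the Play Store data):

```python
import pandas as pd

# Hypothetical stand-in for the real dataset; names and values are made up.
toy = pd.DataFrame({
    "App": ["A", "B", "C", "D", "E", "F"],
    "Rating": [4.1, 3.9, 4.7, 4.5, 4.3, 4.0],
})

first_rows = toy.head()      # first 5 rows (the default)
last_rows = toy.tail(3)      # last 3 rows
n_rows, n_cols = toy.shape   # (number of rows, number of columns)

print(first_rows)
print(last_rows)
print(n_rows, n_cols)
```

On the real data the equivalent call would be `df.tail(3)`.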

## Get complete info about the dataset

In [4]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10841 entries, 0 to 10840
Data columns (total 13 columns):
 #   Column          Non-Null Count  Dtype  
---  ------          --------------  -----  
 0   App             10841 non-null  object 
 1   Category        10841 non-null  object 
 2   Rating          9367 non-null   float64
 3   Reviews         10841 non-null  object 
 4   Size            10841 non-null  object 
 5   Installs        10841 non-null  object 
 6   Type            10840 non-null  object 
 7   Price           10841 non-null  object 
 8   Content Rating  10840 non-null  object 
 9   Genres          10841 non-null  object 
 10  Last Updated    10841 non-null  object 
 11  Current Ver     10833 non-null  object 
 12  Android Ver     10838 non-null  object 
dtypes: float64(1), object(12)
memory usage: 1.1+ MB


## Get overall statistics about the dataframe

In [5]:
df.describe()

Unnamed: 0,Rating
count,9367.0
mean,4.193338
std,0.537431
min,1.0
25%,4.0
50%,4.3
75%,4.5
max,19.0


In [6]:
df.describe(include = 'all')

Unnamed: 0,App,Category,Rating,Reviews,Size,Installs,Type,Price,Content Rating,Genres,Last Updated,Current Ver,Android Ver
count,10841,10841,9367.0,10841.0,10841,10841,10840,10841.0,10840,10841,10841,10833,10838
unique,9660,34,,6002.0,462,22,3,93.0,6,120,1378,2832,33
top,ROBLOX,FAMILY,,0.0,Varies with device,"1,000,000+",Free,0.0,Everyone,Tools,"August 3, 2018",Varies with device,4.1 and up
freq,9,1972,,596.0,1695,1579,10039,10040.0,8714,842,326,1459,2451
mean,,,4.193338,,,,,,,,,,
std,,,0.537431,,,,,,,,,,
min,,,1.0,,,,,,,,,,
25%,,,4.0,,,,,,,,,,
50%,,,4.3,,,,,,,,,,
75%,,,4.5,,,,,,,,,,


## Total number of app titles containing Astrology

In [7]:
df.columns

Index(['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type',
       'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver',
       'Android Ver'],
      dtype='object')

In [8]:
len(df[df['App'].str.contains('Astrology', case=False)])

3
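The same count can also be taken directly from the boolean mask, since `True` values sum as 1. The sketch below assumes a small made-up Series (`titles` is hypothetical) and passes `na=False` so that a missing title counts as a non-match instead of propagating a missing value:

```python
import pandas as pd

# Hypothetical app titles, including one missing value.
titles = pd.Series(
    ["Astrology Daily", "Horoscope & astrology", "Photo Editor", None],
    name="App",
)

# Case-insensitive substring match; missing titles are treated as False.
mask = titles.str.contains("astrology", case=False, na=False)

# Summing a boolean mask counts the True entries.
count = int(mask.sum())
print(count)
```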

## Find the average App Rating

In [9]:
df.columns

Index(['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type',
       'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver',
       'Android Ver'],
      dtype='object')

In [12]:
df['Rating'].mean()

np.float64(4.193338315362443)

## Find the total number of unique categories

In [13]:
df['Category'].nunique()

34

## Which category has the highest average rating?

In [15]:
df.groupby('Category')['Rating'].mean().sort_values(ascending=False)

Category
1.9                    19.000000
EVENTS                  4.435556
EDUCATION               4.389032
ART_AND_DESIGN          4.358065
BOOKS_AND_REFERENCE     4.346067
PERSONALIZATION         4.335987
PARENTING               4.300000
GAME                    4.286326
BEAUTY                  4.278571
HEALTH_AND_FITNESS      4.277104
SHOPPING                4.259664
SOCIAL                  4.255598
WEATHER                 4.244000
SPORTS                  4.223511
PRODUCTIVITY            4.211396
HOUSE_AND_HOME          4.197368
FAMILY                  4.192272
PHOTOGRAPHY             4.192114
AUTO_AND_VEHICLES       4.190411
MEDICAL                 4.189143
LIBRARIES_AND_DEMO      4.178462
FOOD_AND_DRINK          4.166972
COMMUNICATION           4.158537
COMICS                  4.155172
NEWS_AND_MAGAZINES      4.132189
FINANCE                 4.131889
ENTERTAINMENT           4.126174
BUSINESS                4.121452
TRAVEL_AND_LOCAL        4.109292
LIFESTYLE               4.094904
V
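The top entry, category `1.9` with a rating of 19.0, is not a real category: Play Store ratings range from 1 to 5, so that row is malformed. One possible way to keep such rows out of the ranking is to filter on the valid rating range before grouping. This is a sketch on made-up data (`toy` is hypothetical), not a change to the analysis above:

```python
import pandas as pd

# Hypothetical data with one malformed row (Category "1.9", Rating 19.0).
toy = pd.DataFrame({
    "Category": ["EVENTS", "EVENTS", "EDUCATION", "1.9"],
    "Rating": [4.4, 4.6, 4.3, 19.0],
})

# Keep rows whose rating is missing or inside the valid 1-5 range.
valid = toy[toy["Rating"].isna() | toy["Rating"].between(1, 5)]

ranking = (
    valid.groupby("Category")["Rating"]
    .mean()
    .sort_values(ascending=False)
)
best_category = ranking.idxmax()
print(ranking)
print(best_category)
```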

## Find the total number of apps with a 5-star rating

In [16]:
len(df[df['Rating']==5.0])

274

## Find the average number of reviews

In [17]:
df['Reviews'].dtypes

dtype('O')

### Convert the data type from Object to Int or Float

In [18]:
df['Reviews'].astype(float)

ValueError: could not convert string to float: '3.0M'

In [20]:
# The conversion failed on the value '3.0M'; find the row that contains it

df[df['Reviews']=='3.0M']

Unnamed: 0,App,Category,Rating,Reviews,Size,Installs,Type,Price,Content Rating,Genres,Last Updated,Current Ver,Android Ver
10472,Life Made WI-Fi Touchscreen Photo Frame,1.9,19.0,3.0M,"1,000+",Free,0,Everyone,,"February 11, 2018",1.0.19,4.0 and up,


In [21]:
df['Reviews'] = df['Reviews'].replace('3.0M', 3.0)

In [22]:
# Now convert it to float data type

df['Reviews'] = df['Reviews'].astype('float')

In [23]:
df['Reviews'].dtypes

dtype('float64')

In [24]:
df['Reviews'].mean()

np.float64(444111.9265750392)
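An alternative to waiting for `astype` to fail is `pd.to_numeric` with `errors="coerce"`, which turns unparseable strings into `NaN` and makes it easy to list every offending value at once. A minimal sketch on a made-up Series (`reviews` is hypothetical):

```python
import pandas as pd

# Hypothetical review counts stored as strings, with one bad value.
reviews = pd.Series(["159", "967", "3.0M", "87510"], name="Reviews")

# Unparseable strings become NaN instead of raising ValueError.
numeric = pd.to_numeric(reviews, errors="coerce")

# Rows that failed to parse, shown with their original text.
bad_values = reviews[numeric.isna()]

# mean() skips NaN by default.
mean_reviews = numeric.mean()
print(bad_values)
print(mean_reviews)
```

Note that this leaves the bad value as missing, whereas the cells above replace `'3.0M'` with `3.0`; the two approaches give slightly different means.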

## Find total number of Free and Paid apps

In [25]:
df['Type'].value_counts()

Type
Free    10039
Paid      800
0           1
Name: count, dtype: int64

## Which app has maximum reviews?

In [26]:
df[df['Reviews'].max()==df['Reviews']]['App']

2544    Facebook
Name: App, dtype: object

## Display Top 5 Apps Having Highest Reviews

In [27]:
index=df['Reviews'].sort_values(ascending=False).head(5).index

In [28]:
df.loc[index, 'App']

2544              Facebook
3943              Facebook
381     WhatsApp Messenger
336     WhatsApp Messenger
3904    WhatsApp Messenger
Name: App, dtype: object
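The output lists Facebook and WhatsApp Messenger more than once because the dataset contains duplicate app entries. If distinct apps are wanted, one option is to drop duplicate names after sorting. The sketch below uses made-up review counts (`toy` is hypothetical):

```python
import pandas as pd

# Hypothetical data with duplicated app names; the counts are made up.
toy = pd.DataFrame({
    "App": ["Facebook", "Facebook", "WhatsApp Messenger",
            "WhatsApp Messenger", "Instagram", "Messenger"],
    "Reviews": [900.0, 890.0, 800.0, 790.0, 700.0, 600.0],
})

# Sort by reviews, keep the first (highest) row per app, then take the top 3.
top = (
    toy.sort_values("Reviews", ascending=False)
    .drop_duplicates(subset="App")
    .head(3)
)
top_apps = top["App"].tolist()
print(top_apps)
```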

## Find Average Rating of Free and Paid Apps

In [29]:
df.groupby('Type')['Rating'].mean()

Type
0       19.000000
Free     4.186203
Paid     4.266615
Name: Rating, dtype: float64

## Display Top 5 Apps Having Maximum Installs

In [30]:
df.head(1)

Unnamed: 0,App,Category,Rating,Reviews,Size,Installs,Type,Price,Content Rating,Genres,Last Updated,Current Ver,Android Ver
0,Photo Editor & Candy Camera & Grid & ScrapBook,ART_AND_DESIGN,4.1,159.0,19M,"10,000+",Free,0,Everyone,Art & Design,"January 7, 2018",1.0.0,4.0.3 and up


In [31]:
df['Installs'].dtype

dtype('O')

In [32]:
df['Installs_1']=df['Installs'].str.replace(',','')

In [34]:
df.tail(1)

Unnamed: 0,App,Category,Rating,Reviews,Size,Installs,Type,Price,Content Rating,Genres,Last Updated,Current Ver,Android Ver,Installs_1
10840,iHoroscope - 2018 Daily Horoscope & Astrology,LIFESTYLE,4.5,398307.0,19M,"10,000,000+",Free,0,Everyone,Lifestyle,"July 25, 2018",Varies with device,Varies with device,10000000+


In [35]:
# regex=False treats '+' as a literal character on every pandas version
df['Installs_1']=df['Installs_1'].str.replace('+','', regex=False)

In [36]:
df.tail(1)

Unnamed: 0,App,Category,Rating,Reviews,Size,Installs,Type,Price,Content Rating,Genres,Last Updated,Current Ver,Android Ver,Installs_1
10840,iHoroscope - 2018 Daily Horoscope & Astrology,LIFESTYLE,4.5,398307.0,19M,"10,000,000+",Free,0,Everyone,Lifestyle,"July 25, 2018",Varies with device,Varies with device,10000000


In [38]:
df['Installs_1'].astype('int')

ValueError: invalid literal for int() with base 10: 'Free'

In [39]:
df[df['Installs_1']=='Free']

Unnamed: 0,App,Category,Rating,Reviews,Size,Installs,Type,Price,Content Rating,Genres,Last Updated,Current Ver,Android Ver,Installs_1
10472,Life Made WI-Fi Touchscreen Photo Frame,1.9,19.0,3.0,"1,000+",Free,0,Everyone,,"February 11, 2018",1.0.19,4.0 and up,,Free


In [40]:
df['Installs_1']=df['Installs_1'].str.replace('Free','0')

In [41]:
df['Installs_1']=df['Installs_1'].astype('int')

The column is now converted to int, and the result is assigned back to `df['Installs_1']`.

In [42]:
df['Installs_1'].dtype

dtype('int64')

In [43]:
index=df['Installs_1'].sort_values(ascending=False).head(5).index

In [44]:
df.loc[index, 'App']

5856    Google Play Games
5395        Google Photos
2853        Google Photos
2884        Google Photos
4170         Google Drive
Name: App, dtype: object
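The cleaning steps for `Installs` can also be condensed: strip commas and plus signs with one character-class pattern, then let `pd.to_numeric` flag anything that is still not a number. A sketch under the assumption of a small made-up Series (`installs` is hypothetical):

```python
import pandas as pd

# Hypothetical install counts, including the stray 'Free' value.
installs = pd.Series(["10,000+", "5,000,000+", "Free", "0"], name="Installs")

# Remove every ',' and '+' in one pass using a regex character class.
cleaned = installs.str.replace(r"[,+]", "", regex=True)

# Non-numeric leftovers such as 'Free' become NaN instead of raising.
numeric = pd.to_numeric(cleaned, errors="coerce")

# Nullable integer dtype keeps whole numbers while allowing missing values.
installs_int = numeric.astype("Int64")
print(installs_int)
```

Unlike the cells above, this leaves `'Free'` as a missing value instead of turning it into 0.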

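pandas' `iloc` indexer selects rows and columns by integer position, while `loc` selects them by index label. Because this DataFrame uses the default `RangeIndex`, the labels returned by `sort_values(...).index` coincide with row positions, so either indexer returns the same rows here.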
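The difference between position-based and label-based selection only shows up when the index is not the default 0, 1, 2, and so on. The sketch below uses a made-up DataFrame with a custom index (`toy` and its values are hypothetical) to show that labels taken from `sort_values(...).index` belong with `loc`:

```python
import pandas as pd

# Hypothetical data with a non-default index, so labels differ from positions.
toy = pd.DataFrame(
    {"App": ["A", "B", "C"], "Installs": [10, 30, 20]},
    index=[100, 200, 300],
)

# .index holds labels (200 and 300 here), not positions.
top_labels = toy["Installs"].sort_values(ascending=False).head(2).index

# Label-based selection works with those labels.
by_label = toy.loc[top_labels, "App"]

# Position-based selection fails: 200 and 300 are out of bounds for 3 rows.
try:
    toy.iloc[top_labels]
    iloc_failed = False
except IndexError:
    iloc_failed = True

print(by_label.tolist())
print(iloc_failed)
```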

### Reference

https://www.youtube.com/watch?v=qBOw_kcTLpU&list=PL_1pt6K-CLoDMEbYy2PcZuITWEjqMfyoA&index=8