# Google Apps Analysis

<a id="table_of_contents"></a>

### Table of contents

<ol>
  <li><a href="#overview">Situation Overview</a>
  <ul>
   <li><a href='#dataset'>Examine the overall dataset
    </a></li>
    <li><a href='#cleaning'>Execute cleaning procedures
    </a></li>
    
  </ul>
  </li>
  
  <li><a href="#eda">Exploratory analysis</a>
  <ul>
    <li><a href='#unique_values'>Begin by computing distinct values
    </a></li>
    <li><a href='#null_values'>Identify any null values within the dataset
    </a></li>
    <li><a href='#apps_by_category'>Find out the number of apps by category
    </a></li>
    <li><a href='#apps_by_rating'>Find out the app ratings: minimum, maximum, and average
    </a></li>
    </ul>
  </li>  
  
  <li><a href="#analysis">Data analysis</a>
  <ul>
    <li><a href='#free_paid_apps'>Comparing ratings between free and paid apps
    </a></li>
    </ul>
  </li>  
</ol>

<a id="overview"></a>
## Situation Overview
An application developer wants to build a successful Android app. To make an informed decision, the developer wants to answer the following questions:

1. Which app categories are the most popular?
2. What is the optimal price?
3. What is the best way to maximize user ratings?

<a href="#table_of_contents">Navigate to contents</a>


<a id="dataset"></a>
Examine the overall dataset
<br>
<a href="#table_of_contents">Navigate to contents</a>

In [10]:
#load sql extension
%load_ext sql

In [11]:
#connect to mysql database
%sql mysql://root:NilArj_21@localhost:3306/project

In [12]:
%%sql 
select * from googleplaystore 
limit 3

 * mysql://root:***@localhost:3306/project
3 rows affected.


App,Category,Rating,Reviews,Size,Installs,Type,Price,Content Rating,Genres,Last Updated,Current Ver,Android Ver
Photo Editor & Candy Camera & Grid & ScrapBook,ART_AND_DESIGN,4.1,159,19M,"10,000+",Free,0,Everyone,Art & Design,"January 7, 2018",1.0.0,4.0.3 and up
Coloring book moana,ART_AND_DESIGN,3.9,967,14M,"500,000+",Free,0,Everyone,Art & Design;Pretend Play,"January 15, 2018",2.0.0,4.0.3 and up
"U Launcher Lite – FREE Live Cool Themes, Hide Apps",ART_AND_DESIGN,4.7,87510,8.7M,"5,000,000+",Free,0,Everyone,Art & Design,"August 1, 2018",1.2.4,4.0.3 and up


In [13]:
%%sql 
select * from googleplaystore_user_reviews
limit 3

 * mysql://root:***@localhost:3306/project
3 rows affected.


App,Translated_Review,Sentiment,Sentiment_Polarity,Sentiment_Subjectivity
10 Best Foods for You,"I like eat delicious food. That's I'm cooking food myself, case ""10 Best Foods"" helps lot, also ""Best Before (Shelf Life)""",Positive,1.0,0.5333333333333333
10 Best Foods for You,This help eating healthy exercise regular basis,Positive,0.25,0.2884615384615384
10 Best Foods for You,,,,


In [14]:
%%sql
select column_name, data_type
from information_schema.columns
where table_schema = "project"
    and table_name = "googleplaystore"

 * mysql://root:***@localhost:3306/project
13 rows affected.


COLUMN_NAME,DATA_TYPE
Android Ver,text
App,text
Category,text
Content Rating,text
Current Ver,text
Genres,text
Installs,text
Last Updated,text
Price,text
Rating,double


In [15]:
%%sql
select column_name, data_type
from information_schema.columns
where table_schema = "project"
    and table_name = "googleplaystore_user_reviews"

 * mysql://root:***@localhost:3306/project
5 rows affected.


COLUMN_NAME,DATA_TYPE
App,text
Sentiment,text
Sentiment_Polarity,double
Sentiment_Subjectivity,double
Translated_Review,text


<a id="cleaning"></a>
Execute cleaning procedures
<br>
<a href="#table_of_contents">Navigate to contents</a>

In [16]:
%%sql 
alter table googleplaystore
modify column Reviews float

 * mysql://root:***@localhost:3306/project
(MySQLdb.DataError) (1265, "Data truncated for column 'Reviews' at row 10473")
[SQL: alter table googleplaystore
modify column Reviews float]
(Background on this error at: https://sqlalche.me/e/20/9h9h)


In [17]:
%%sql 
select * 
from googleplaystore
where Reviews is not null and Reviews regexp "[a-zA-Z]+"

 * mysql://root:***@localhost:3306/project
1 rows affected.


App,Category,Rating,Reviews,Size,Installs,Type,Price,Content Rating,Genres,Last Updated,Current Ver,Android Ver
Life Made WI-Fi Touchscreen Photo Frame,1.9,19.0,3.0M,"1,000+",Free,0,Everyone,,"February 11, 2018",1.0.19,4.0 and up,
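
The row returned above looks shifted one column to the left: `Category` holds 1.9 and `Reviews` holds "3.0M", which is why the conversion to a numeric type fails. As a rough illustration of the same check outside the database, the sketch below flags values that do not parse as numbers. The helper name `is_numeric` and the sample list are hypothetical and not part of the notebook.

```python
def is_numeric(value):
    """Return True when the text parses as a number."""
    try:
        float(value)
        return True
    except (TypeError, ValueError):
        return False

# Hypothetical sample of the Reviews column; "3.0M" stands in for the shifted row.
reviews = ["159", "967", "87510", "3.0M"]

# Keep the position and value of every entry that cannot be converted.
bad_rows = [(i, v) for i, v in enumerate(reviews) if not is_numeric(v)]
print(bad_rows)
```

Running a check like this before altering a column type shows which rows need repair first.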


In [22]:
%%sql
update googleplaystore
set Category = null, Rating = 1.9, Reviews = 19.0, Size = "3.0M",
    Installs = "1,000+", Type = "Free", Price = "0",
    `Content Rating` = "Everyone", Genres = null,
    `Last Updated` = "February 11, 2018", `Current Ver` = "1.0.19",
    `Android Ver` = "4.0 and up"
where App = "Life Made WI-Fi Touchscreen Photo Frame"


 * mysql://root:***@localhost:3306/project
1 rows affected.


[]

In [23]:
%%sql 
alter table googleplaystore
modify column Reviews float

 * mysql://root:***@localhost:3306/project
10841 rows affected.


[]

In [24]:
%%sql
alter table googleplaystore
add column updated_price float

 * mysql://root:***@localhost:3306/project
0 rows affected.


[]

In [25]:
%%sql
update googleplaystore
-- strip the "$" sign (if present), then convert the remaining text to a number
set updated_price = cast(replace(Price, "$", "") as float)

 * mysql://root:***@localhost:3306/project
10841 rows affected.


[]
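
The price conversion can be sketched in plain Python to make the intended logic explicit: drop a leading "$" when there is one, then convert the rest to a number. The sketch also shows that in a regular expression an unescaped `$` is an end-of-string anchor, so the pattern `^$` matches only empty strings, while `^\$` matches a leading dollar sign. The function name `parse_price` and the sample values are assumptions made for illustration.

```python
import re

def parse_price(text):
    """Strip a leading "$" (if any) and convert the remainder to float."""
    return float(text.strip().lstrip("$"))

prices = ["0", "$4.99", "$0.99"]  # hypothetical sample values

# "^$" matches only the empty string; r"^\$" matches a leading dollar sign.
anchor_hits = [p for p in prices if re.search("^$", p)]
dollar_hits = [p for p in prices if re.search(r"^\$", p)]

parsed = [parse_price(p) for p in prices]

print(anchor_hits)
print(dollar_hits)
print(parsed)
```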

In [26]:
%%sql
select column_name, data_type
from information_schema.columns
where table_schema = "project"
    and table_name = "googleplaystore"

 * mysql://root:***@localhost:3306/project
14 rows affected.


COLUMN_NAME,DATA_TYPE
Android Ver,text
App,text
Category,text
Content Rating,text
Current Ver,text
Genres,text
Installs,text
Last Updated,text
Price,text
Rating,double


In [27]:
%%sql 
select * from googleplaystore 
limit 3

 * mysql://root:***@localhost:3306/project
3 rows affected.


App,Category,Rating,Reviews,Size,Installs,Type,Price,Content Rating,Genres,Last Updated,Current Ver,Android Ver,updated_price
Photo Editor & Candy Camera & Grid & ScrapBook,ART_AND_DESIGN,4.1,159.0,19M,"10,000+",Free,0,Everyone,Art & Design,"January 7, 2018",1.0.0,4.0.3 and up,0.0
Coloring book moana,ART_AND_DESIGN,3.9,967.0,14M,"500,000+",Free,0,Everyone,Art & Design;Pretend Play,"January 15, 2018",2.0.0,4.0.3 and up,0.0
"U Launcher Lite – FREE Live Cool Themes, Hide Apps",ART_AND_DESIGN,4.7,87510.0,8.7M,"5,000,000+",Free,0,Everyone,Art & Design,"August 1, 2018",1.2.4,4.0.3 and up,0.0


<a id="eda"></a>
## Exploratory analysis

<a id="unique_values"></a>
Begin by computing distinct values
<br>
<a href="#table_of_contents">Navigate to contents</a>

In [28]:
%%sql
select count(distinct App) as uniqueApps
from googleplaystore

 * mysql://root:***@localhost:3306/project
1 rows affected.


uniqueApps
9638


In [29]:
%%sql
select count(distinct App) as uniqueAppsReviews
from googleplaystore_user_reviews

 * mysql://root:***@localhost:3306/project
1 rows affected.


uniqueAppsReviews
1074
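
The `googleplaystore` table holds 10841 rows (per the earlier `alter table` output) but only 9638 distinct app names, so some names appear more than once. One way to list the repeated names is a `group by` with a `having` filter. The sketch below runs that query against a small in-memory SQLite table. The table name `apps` and its rows are made up for the example, and SQLite stands in for MySQL only so the snippet runs without a server.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table apps (App text, Category text)")
conn.executemany(
    "insert into apps values (?, ?)",
    [("Alpha", "TOOLS"), ("Beta", "GAME"), ("Alpha", "TOOLS"), ("Gamma", "FAMILY")],
)

# Names that occur in more than one row, most repeated first.
duplicates = conn.execute(
    """
    select App, count(*) as copies
    from apps
    group by App
    having count(*) > 1
    order by copies desc
    """
).fetchall()
conn.close()

print(duplicates)
```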


<a id="null_values"></a>
Identify any null values within the dataset
<br>
<a href="#table_of_contents">Navigate to contents</a>

In [31]:
%%sql
select count(*) as nullValues
from googleplaystore
where App is null

 * mysql://root:***@localhost:3306/project
1 rows affected.


nullValues
0


In [32]:
%%sql
select count(*) as nullValuesReviews
from googleplaystore_user_reviews
where App is null

 * mysql://root:***@localhost:3306/project
1 rows affected.


nullValuesReviews
0
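
When counting nulls, note that a comparison written with `= null` evaluates to unknown for every row and therefore never matches. `is null` is the predicate that actually tests for missing values. The sketch below shows the difference on a tiny in-memory SQLite table. The table `reviews` and its three rows are hypothetical, and SQLite is used only so the example is self-contained.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table reviews (App text)")
conn.executemany(
    "insert into reviews values (?)",
    [("Alpha",), (None,), ("Beta",)],  # one row has a null App
)

# "= null" matches nothing, even though a null row exists.
equals_null = conn.execute(
    "select count(*) from reviews where App = null"
).fetchone()[0]

# "is null" finds the row with the missing value.
is_null = conn.execute(
    "select count(*) from reviews where App is null"
).fetchone()[0]
conn.close()

print(equals_null, is_null)
```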


<a id="apps_by_category"></a>
Find out the number of apps by category
<br>
<a href="#table_of_contents">Navigate to contents</a>

In [33]:
%%sql 
select Category, Count(*) as catCount
from googleplaystore 
group by Category
order by catCount desc

 * mysql://root:***@localhost:3306/project
34 rows affected.


Category,catCount
FAMILY,1972
GAME,1144
TOOLS,843
MEDICAL,463
BUSINESS,460
PRODUCTIVITY,424
PERSONALIZATION,392
COMMUNICATION,387
SPORTS,384
LIFESTYLE,382


<a id="apps_by_rating"></a>
Find out the app ratings: minimum, maximum, and average
<br>
<a href="#table_of_contents">Navigate to contents</a>

In [34]:
%%sql
select min(Rating) as minRating,
       max(Rating) as maxRating,
       round(avg(Rating),2) as avgRating
from googleplaystore

 * mysql://root:***@localhost:3306/project
1 rows affected.


minRating,maxRating,avgRating
1.0,5.0,4.19


<a id="analysis"></a>
## Data analysis