# Data Analysis
## Profitable App Profiles for the App Store and Google Play Markets

The goal of this project is to find mobile app profiles that are profitable for the App Store and Google Play markets. We work as data analysts for a company that builds Android and iOS mobile apps, and our job is to give our developer team reliable information so they can make data-driven decisions about the kind of apps they build.

We will focus on free apps, that is, apps that are free to download and install. Our main source of revenue is in-app ads, so our profits depend mostly on the number of users who use our apps and engage with the ads. We will analyze the data to figure out which kinds of apps are likely to attract more users.

| Column name | Description |
| --- | --- |
| "id" | App ID |
| "track_name" | App Name |
| "size_bytes" | Size (in Bytes) |
| "currency" | Currency Type |
| "price" | Price amount |
| "rating_count_tot" | User Rating counts (for all versions) |
| "rating_count_ver" | User Rating counts (for current version) |
| "user_rating" | Average User Rating value (for all versions) |
| "user_rating_ver" | Average User Rating value (for current version) |
| "ver" | Latest version code |
| "cont_rating" | Content Rating |
| "prime_genre" | Primary Genre |
| "sup_devices.num" | Number of supporting devices |
| "ipadSc_urls.num" | Number of screenshots shown for display |
| "lang.num" | Number of supported languages |
| "vpp_lic" | Vpp Device Based Licensing Enabled |

In [2]:
from csv import reader

# Accessing the Google Play file
with open('googleplaystore.csv', encoding='utf8') as myfileg:
    open_fileg = list(reader(myfileg))
googleh = open_fileg[0]   # header row
googles = open_fileg[1:]  # data rows

# Accessing the App Store file
with open('AppleStore.csv', encoding='utf8') as myfilea:
    open_filea = list(reader(myfilea))
appleh = open_filea[0]    # header row
apples = open_filea[1:]   # data rows

def explore_data(dataset, start, end, rows_and_columns=False):
    dataset_slice = dataset[start:end]    
    for row in dataset_slice:
        print(row)
        print('\n') # adds a new (empty) line after each row

    if rows_and_columns:
        print('Number of rows:', len(dataset))
        print('Number of columns:', len(dataset[0]))

print('GOOGLE STORE DATA')
explore_data(open_fileg,0,5,True)
print ('\n')
print('APPLE STORE DATA')
explore_data(open_filea,0,5,True)


GOOGLE STORE DATA
['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']


['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['Coloring book moana', 'ART_AND_DESIGN', '3.9', '967', '14M', '500,000+', 'Free', '0', 'Everyone', 'Art & Design;Pretend Play', 'January 15, 2018', '2.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 and up']


Number of rows: 10842
Number of columns: 13


APPLE STORE DATA
['id', 'track_name', 'size_byt
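
The files above are parsed with `csv.reader` rather than by splitting each line on commas. A minimal sketch of why that matters, using a small in-memory CSV string (the sample text below is made up for illustration and is not taken from the real files): quoted fields that contain commas, such as app names or install counts, stay in one piece.

```python
import csv
import io

# Hypothetical sample text; two fields contain commas inside quotes.
csv_text = (
    'App,Category,Installs\n'
    '"Themes, Hide Apps",ART_AND_DESIGN,"5,000,000+"\n'
    'Sketch,ART_AND_DESIGN,"50,000,000+"\n'
)

# csv.reader respects the quotes, so each row keeps 3 fields.
rows = list(csv.reader(io.StringIO(csv_text)))
header, data = rows[0], rows[1:]

# A naive split on ',' breaks the quoted fields apart.
naive = csv_text.splitlines()[1].split(',')

print(len(header), len(data[0]), len(naive))
```

With the naive split, the first data row ends up with more fields than the header, which would shift every column index used later in the analysis.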

## The code below shows a problem with row 10472 ('Life Made WI-Fi Touchscreen Photo Frame'): it has no value for 'Category', so every later value is shifted one column to the left.

In [3]:
print(googles[10472])
print('\n')
print(googleh)
print('\n')
print(open_fileg[1])

['Life Made WI-Fi Touchscreen Photo Frame', '1.9', '19', '3.0M', '1,000+', 'Free', '0', 'Everyone', '', 'February 11, 2018', '1.0.19', '4.0 and up']


['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']


['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


## We will proceed to delete that row.

In [4]:
print(len(googles))
del googles[10472]
print(len(googles))

10841
10840
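
The faulty row above was located by its known index. As a rough sketch of a more general check, rows with a missing value can be found by comparing each row's length with the header's. The function name `find_malformed_rows` and the toy data are hypothetical, introduced only for this illustration.

```python
def find_malformed_rows(header, rows):
    """Return the indices of rows whose length differs from the header's."""
    expected = len(header)
    return [i for i, row in enumerate(rows) if len(row) != expected]


# Toy data (hypothetical, not taken from the real files)
sample_header = ['App', 'Category', 'Rating']
sample_rows = [
    ['App A', 'TOOLS', '4.1'],
    ['App B', '3.9'],          # 'Category' is missing, so the row is short
    ['App C', 'GAME', '4.5'],
]

bad_indices = find_malformed_rows(sample_header, sample_rows)
print(bad_indices)
```

On the real data, the equivalent call would be `find_malformed_rows(googleh, googles)`. When deleting several rows by index, deleting from the highest index down avoids shifting the positions of the rows still to be removed.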


## Now we will check whether there are any duplicated entries in the Google Play file.

In [5]:
lst_dupl = []
lst_sing = []
dicc_dupl = {}

for regi in googles:
    name = regi[0]
    if name in lst_sing:
        lst_dupl.append(name)
        if name in dicc_dupl:
            dicc_dupl[name] += 1
        else:
            dicc_dupl[name] = 1
    else:
        lst_sing.append(name)

print('Number of Duplicated Apps in Google Store is ' + str(len(lst_dupl)) + '')
print('\n')
print('Names and number of replicas of Duplicated Apps ', dicc_dupl)
print('\n')
print('Number of NOT Duplicated Apps is ' + str(len(lst_sing)) + '')

Number of Duplicated Apps in Google Store is 1181


Names and number of replicas of Duplicated Apps  {'Quick PDF Scanner + OCR FREE': 2, 'Box': 2, 'Google My Business': 2, 'ZOOM Cloud Meetings': 1, 'join.me - Simple Meetings': 2, 'Zenefits': 1, 'Google Ads': 2, 'Slack': 2, 'FreshBooks Classic': 1, 'Insightly CRM': 1, 'QuickBooks Accounting: Invoicing & Expenses': 2, 'HipChat - Chat Built for Teams': 1, 'Xero Accounting Software': 1, 'MailChimp - Email, Marketing Automation': 1, 'Crew - Free Messaging and Scheduling': 1, 'Asana: organize team projects': 1, 'Google Analytics': 1, 'AdWords Express': 1, 'Accounting App - Zoho Books': 1, 'Invoice & Time Tracking - Zoho': 1, 'Invoice 2go — Professional Invoices and Estimates': 1, 'SignEasy | Sign and Fill PDF and other Documents': 1, 'Genius Scan - PDF Scanner': 1, 'Tiny Scanner - PDF Scanner App': 1, 'Fast Scanner : Free PDF Scan': 1, 'Mobile Doc Scanner (MDScan) Lite': 1, 'TurboScan: scan documents and receipts in PDF': 1, 'Tiny Scanner Pr

## Now we will do the same with duplicated entries on the Apple Store File.

In [6]:
lst_dupl0 = []
lst_sing0 = []
dicc_dupl0 = {}

for regi in apples:
    name0 = regi[1]
    if name0 in lst_sing0:
        lst_dupl0.append(name0)
        if name0 in dicc_dupl0:
            dicc_dupl0[name0] += 1
        else:
            dicc_dupl0[name0] = 1
    else:
        lst_sing0.append(name0)

print('Number of Duplicated Apps in Apple Store is ' + str(len(lst_dupl0)) + '')
print('\n')
print('Names and number of replicas of Duplicated Apps ', dicc_dupl0)
print('\n')
print('Number of NOT Duplicated Apps is ' + str(len(lst_sing0)) + '')

Number of Duplicated Apps in Apple Store is 2


Names and number of replicas of Duplicated Apps  {'Mannequin Challenge': 1, 'VR Roller Coaster': 1}


Number of NOT Duplicated Apps is 7195
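
The two cells above test membership in a list for every row, which gets slow as the data grows. A possible alternative, sketched here with `collections.Counter` on made-up rows, counts every name in one pass. The helper name `count_duplicates` is an assumption of this example, not part of the notebook.

```python
from collections import Counter


def count_duplicates(rows, name_index):
    """Return ({name: extra copies}, number of unique names)."""
    counts = Counter(row[name_index] for row in rows)
    # Keep only names seen more than once; store the copies beyond the first,
    # which matches how the duplicate dictionaries above are counted.
    extras = {name: n - 1 for name, n in counts.items() if n > 1}
    return extras, len(counts)


# Toy rows (hypothetical): name, number of reviews
sample_rows = [['Box', '10'], ['Slack', '5'], ['Box', '12'], ['Box', '11']]

extra_copies, unique_names = count_duplicates(sample_rows, 0)
print('Duplicated entries:', sum(extra_copies.values()))
print('Extra copies per app:', extra_copies)
print('Unique apps:', unique_names)
```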


## Below we will check if a few specific names are included inside the Data Set using the same lists from the code above.

In [7]:
app_g = lst_sing
app_a = lst_sing0

print('Exist in Google Store?')
print('Instagram')
print('Instagram' in app_g)
print('Facebook')
print('Facebook' in app_g)
print('Snapchat')
print('Snapchat' in app_g)
print('Made Up App')
print('Made Up App' in app_g)
print('Bible')
print('Bible' in app_g)
print('\n')
print('Exist in Apple Store?')
print('Instagram')
print('Instagram' in app_a)
print('Facebook')
print('Facebook' in app_a)
print('Snapchat')
print('Snapchat' in app_a)
print('Made Up App')
print('Made Up App' in app_a)
print('Bible')
print('Bible' in app_a)

Exist in Google Store?
Instagram
True
Facebook
True
Snapchat
True
Made Up App
False
Bible
True


Exist in Apple Store?
Instagram
True
Facebook
True
Snapchat
True
Made Up App
False
Bible
True


## Now we will delete the duplicated entries in both files, keeping only the entry with the highest number of reviews for each app.
Let's start by computing how many rows we expect to remain in each data set once the duplicates are removed.

In [8]:
regi_unicg = len(googles) - len(lst_dupl)
print('Expected length of the file with no duplicates for Google Store is', regi_unicg)

regi_unica = len(apples) - len(lst_dupl0)
print('Expected length of the file with no duplicates for Apple Store is', regi_unica)

Expected length of the file with no duplicates for Google Store is 9659
Expected length of the file with no duplicates for Apple Store is 7195


## Let's check the headers for both data sets to know the positions of each field.

In [9]:
print(googleh)
print('\n')
print(appleh)

['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']


['id', 'track_name', 'size_bytes', 'currency', 'price', 'rating_count_tot', 'rating_count_ver', 'user_rating', 'user_rating_ver', 'ver', 'cont_rating', 'prime_genre', 'sup_devices.num', 'ipadSc_urls.num', 'lang.num', 'vpp_lic']


## We will create a dictionary that maps each app name in the Google Store to its highest number of reviews.
After creating the dictionary, we will use two lists to build the cleaned data set: one for the rows we keep and one for the names already added.

In [10]:
dicc_g = {}

for regi in googles:
    key = str(regi[0])
    rev = float(regi[3])
    if key in dicc_g:
        if  dicc_g[key]<rev:
            dicc_g[key] = rev
    else:
        dicc_g[key] = rev

print ('Length of dictionary not Duplicated and with Higher Number of Reviews', len(dicc_g))
print ('\n')

google_clean =[]
google_added =[]

for regi in googles:
    key = str(regi[0])
    rev = float(regi[3])
    if (dicc_g[key] == rev) and (key not in google_added):
        google_clean.append(regi)
        google_added.append(key)

explore_data(google_clean, 0, 3, True)

Length of dictionary not Duplicated and with Higher Number of Reviews 9659


['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 and up']


Number of rows: 9659
Number of columns: 13


## We will create a dictionary that maps each app name in the Apple Store to its highest total rating count.
After creating the dictionary, we will use two lists to build the cleaned data set: one for the rows we keep and one for the names already added.

In [11]:
dicc_a = {}

for regi in apples:
    key = str(regi[1])
    rev = float(regi[5])
    if key in dicc_a:
        if  dicc_a[key] < rev:
            dicc_a[key] = rev
    else:
        dicc_a[key] = rev

print ('Length of dictionary not Duplicated and with Higher Number of Reviews', len(dicc_a))
print ('\n')

apple_clean =[]
apple_added =[]

for regi in apples:
    key = str(regi[1])
    rev = float(regi[5])
    if (dicc_a[key] == rev) and (key not in apple_added):
        apple_clean.append(regi)
        apple_added.append(key)

explore_data(apple_clean, 0, 3, True)

Length of dictionary not Duplicated and with Higher Number of Reviews 7195


['284882215', 'Facebook', '389879808', 'USD', '0.0', '2974676', '212', '3.5', '3.5', '95.0', '4+', 'Social Networking', '37', '1', '29', '1']


['389801252', 'Instagram', '113954816', 'USD', '0.0', '2161558', '1289', '4.5', '4.0', '10.23', '12+', 'Photo & Video', '37', '0', '29', '1']


['529479190', 'Clash of Clans', '116476928', 'USD', '0.0', '2130805', '579', '4.5', '4.5', '9.24.12', '9+', 'Games', '38', '5', '18', '1']


Number of rows: 7195
Number of columns: 16
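
The two-pass approach above (first find the maximum, then collect matching rows) can also be written as a single pass that stores the best row seen so far for each name. This is only a sketch on toy data. The function name `keep_most_reviewed` is hypothetical, and it relies on dictionaries preserving insertion order (Python 3.7+).

```python
def keep_most_reviewed(rows, name_index, reviews_index):
    """Keep one row per name: the one with the highest review count.

    On a tie, the first row encountered is kept.
    """
    best = {}
    for row in rows:
        name = row[name_index]
        reviews = float(row[reviews_index])
        if name not in best or reviews > float(best[name][reviews_index]):
            best[name] = row
    return list(best.values())


# Toy rows (hypothetical): name, reviews, note
sample_rows = [
    ['Box', '10', 'old'],
    ['Slack', '5', 'only'],
    ['Box', '12', 'first'],
    ['Box', '12', 'second'],   # same count as the previous row
]

cleaned = keep_most_reviewed(sample_rows, 0, 1)
for row in cleaned:
    print(row)
```

One difference from the cells above: the result here is ordered by the first appearance of each name, not by the position of the row that was kept.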


## Now we will remove non-English apps by scanning each app name for characters outside the ASCII range (code points above 127).
We will start by creating a function that returns False if a string has more than 3 non-ASCII characters and True otherwise.

In [12]:
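def non_engl(chain):
    # Despite its name, this returns True when the name looks English
    # (at most 3 characters outside the ASCII range) and False otherwise.
    nume_char = 0
    for character in chain:
        if ord(character) > 127:
            nume_char += 1
    return nume_char <= 3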

Let's use the function defined above to filter out app names with 4 or more non-ASCII characters from both data sets.

In [13]:
google_step2 = []
google_nonengl = []

for regi in google_clean:
    name = regi[0]    
    if non_engl(name) == True:
        google_step2.append(regi)
    else:
        google_nonengl.append(name)

print('Google English Named Apps: ', len(google_step2))
print('Google Non English Named Apps: ', len(google_nonengl))

Google English Named Apps:  9614
Google Non English Named Apps:  45


In [14]:
apple_step2 = []
apple_nonengl = []

for regi in apple_clean:
    name = regi[1]    
    if non_engl(name) == True:
        apple_step2.append(regi)
    else:
        apple_nonengl.append(name)

print('Apple English Named Apps: ', len(apple_step2))
print('Apple Non English Named Apps: ', len(apple_nonengl))

Apple English Named Apps:  6181
Apple Non English Named Apps:  1014


+ So far we have:
    - Removed inaccurate data
    - Removed duplicate app entries
    - Removed non-English apps

We will proceed to extract only the 'Free' apps from our data sets.

In [15]:
google_free =[]
google_nofree =[]
#Cost is in column 7 as a String

for regi in google_step2:
    cost = regi[7]
    if cost == '0':
        google_free.append(regi)
    else:
        google_nofree.append(regi)

google_total = google_step2

print('ANDROID APPS')
print(" Remaining Apps on Google Store =", len(google_step2))
print("Total Paid Apps on Google Store = ", len(google_nofree))
print('--------------------------------------')
print("Total Free Apps on Google Store =", len(google_free))

print("\n")

apple_free =[]
apple_nofree =[]
#Cost is in column 4; we convert it to a Float

for regi in apple_step2:
    cost = float(regi[4])
    if cost == 0.0:
        apple_free.append(regi)
        #print("GRATIS ", regi)
    else:
        apple_nofree.append(regi)
        #print("PAGA ", regi)

apple_total = apple_step2

print('iOS APPS')
print(" Remaining Apps on Apple Store =", len(apple_step2))
print("Total Paid Apps on Apple Store =", len(apple_nofree))
print('--------------------------------------')
print("Total Free Apps on Apple Store =", len(apple_free))

ANDROID APPS
 Remaining Apps on Google Store = 9614
Total Paid Apps on Google Store =  750
--------------------------------------
Total Free Apps on Google Store = 8864


iOS APPS
 Remaining Apps on Apple Store = 6181
Total Paid Apps on Apple Store = 2961
--------------------------------------
Total Free Apps on Apple Store = 3220


+ At this point our data sets are clean compared with their original versions. So far we have:
    - Removed inaccurate data
    - Removed duplicate app entries
    - Removed non-English apps
    - Isolated the free apps

Because our developer team mainly wants to know which types of apps are the most popular among users, we will now compute the frequency of each genre across the apps from both stores (**'Google'** & **'Apple'**) to get an idea of the most common ones.

Below is the function to obtain the percentage of each genre for the apps for **'Android'** and **'iOS'**.

In [16]:
def freq_table(dataset, index):
    table = {}
    total = 0
    
    for regi in dataset:
        total += 1
        value = regi[index]
        if value in table:
            table[value] += 1
        else:
            table[value] = 1
            
    tabl_perc = {}
    for row in table:
        percent = ((table[row] / total) * 100)
        new_perc = round(percent, 2)
        tabl_perc[row] = new_perc
    return tabl_perc

Below we call the function on the genre column of each data set to display the percentage of each genre for **'Android'** and **'iOS'**.

## Google Store

In [17]:
freq_table(google_free, 9)  # index 9 is the 'Genres' column

{'Art & Design': 0.6,
 'Art & Design;Creativity': 0.07,
 'Auto & Vehicles': 0.93,
 'Beauty': 0.6,
 'Books & Reference': 2.14,
 'Business': 4.59,
 'Comics': 0.61,
 'Comics;Creativity': 0.01,
 'Communication': 3.24,
 'Dating': 1.86,
 'Education': 5.35,
 'Education;Creativity': 0.05,
 'Education;Education': 0.34,
 'Education;Pretend Play': 0.06,
 'Education;Brain Games': 0.03,
 'Entertainment': 6.07,
 'Entertainment;Brain Games': 0.08,
 'Entertainment;Creativity': 0.03,
 'Entertainment;Music & Video': 0.17,
 'Events': 0.71,
 'Finance': 3.7,
 'Food & Drink': 1.24,
 'Health & Fitness': 3.08,
 'House & Home': 0.82,
 'Libraries & Demo': 0.94,
 'Lifestyle': 3.89,
 'Lifestyle;Pretend Play': 0.01,
 'Card': 0.45,
 'Arcade': 1.85,
 'Puzzle': 1.13,
 'Racing': 0.99,
 'Sports': 3.46,
 'Casual': 1.76,
 'Simulation': 2.04,
 'Adventure': 0.68,
 'Trivia': 0.42,
 'Action': 3.1,
 'Word': 0.26,
 'Role Playing': 0.94,
 'Strategy': 0.91,
 'Board': 0.38,
 'Music': 0.2,
 'Action;Action & Adventure': 0.1,
 'Casu

## Apple Store

In [18]:
freq_table(apple_free, 11)

{'Social Networking': 3.29,
 'Photo & Video': 4.97,
 'Games': 58.14,
 'Music': 2.05,
 'Reference': 0.56,
 'Health & Fitness': 2.02,
 'Weather': 0.87,
 'Utilities': 2.52,
 'Travel': 1.24,
 'Shopping': 2.61,
 'News': 1.34,
 'Navigation': 0.19,
 'Lifestyle': 1.58,
 'Entertainment': 7.89,
 'Food & Drink': 0.81,
 'Sports': 2.14,
 'Book': 0.43,
 'Finance': 1.12,
 'Education': 3.66,
 'Productivity': 1.74,
 'Business': 0.53,
 'Catalogs': 0.12,
 'Medical': 0.19}
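
As a quick check of what the frequency table computes, the sketch below applies the same idea to four made-up rows, where the result is easy to verify by hand. The function name `freq_percentages` is hypothetical, and it uses `dict.get` in place of the explicit `if`/`else` in 'freq_table'.

```python
def freq_percentages(rows, index):
    """Return {value: percentage of rows with that value}, rounded to 2 places."""
    counts = {}
    for row in rows:
        value = row[index]
        counts[value] = counts.get(value, 0) + 1
    total = len(rows)
    return {value: round(n / total * 100, 2) for value, n in counts.items()}


# Toy rows (hypothetical): name, genre
sample_rows = [['A', 'Games'], ['B', 'Games'], ['C', 'Music'], ['D', 'Games']]

table = freq_percentages(sample_rows, 1)
print(table)
```

Three of the four rows are 'Games', so that genre gets 75.0 and 'Music' gets 25.0. On real data the rounded percentages may not add up to exactly 100.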

Now we will sort the percentages in descending order for both data sets.

In [19]:
def order_table(dataset, index):
    table = freq_table(dataset, index)
    table_display = []
    for key in table:
        key_val_as_tuple = (table[key], key)
        table_display.append(key_val_as_tuple)
        
    table_sorted = sorted(table_display, reverse = True)
    for entry in table_sorted:
        print(entry[1], ':', entry[0])

## Order data set in Google Store

In [20]:
order_table(google_free,1)

FAMILY : 18.91
GAME : 9.72
TOOLS : 8.46
BUSINESS : 4.59
LIFESTYLE : 3.9
PRODUCTIVITY : 3.89
FINANCE : 3.7
MEDICAL : 3.53
SPORTS : 3.4
PERSONALIZATION : 3.32
COMMUNICATION : 3.24
HEALTH_AND_FITNESS : 3.08
PHOTOGRAPHY : 2.94
NEWS_AND_MAGAZINES : 2.8
SOCIAL : 2.66
TRAVEL_AND_LOCAL : 2.34
SHOPPING : 2.25
BOOKS_AND_REFERENCE : 2.14
DATING : 1.86
VIDEO_PLAYERS : 1.79
MAPS_AND_NAVIGATION : 1.4
FOOD_AND_DRINK : 1.24
EDUCATION : 1.16
ENTERTAINMENT : 0.96
LIBRARIES_AND_DEMO : 0.94
AUTO_AND_VEHICLES : 0.93
HOUSE_AND_HOME : 0.82
WEATHER : 0.8
EVENTS : 0.71
PARENTING : 0.65
ART_AND_DESIGN : 0.64
COMICS : 0.62
BEAUTY : 0.6


## Order data set in Apple Store

In [21]:
order_table(apple_free,11)

Games : 58.14
Entertainment : 7.89
Photo & Video : 4.97
Education : 3.66
Social Networking : 3.29
Shopping : 2.61
Utilities : 2.52
Sports : 2.14
Music : 2.05
Health & Fitness : 2.02
Productivity : 1.74
Lifestyle : 1.58
News : 1.34
Travel : 1.24
Finance : 1.12
Weather : 0.87
Food & Drink : 0.81
Reference : 0.56
Business : 0.53
Book : 0.43
Navigation : 0.19
Medical : 0.19
Catalogs : 0.12


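Below we show how Python's built-in sorted() orders a list, a dictionary (only its keys are sorted), and lists of tuples (sorted by their first element), which is the technique we used in the 'order_table' function above.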

In [22]:
lista1 = ['Eddy', 'Lorena', 'Victoria', 'EddyJr.']
lista2 = {'Eddy':42, 'Lorena':39, 'Victoria':8, 'EddyJr.':6}
lista3 = [('Eddy',42), ('Lorena',39), ('Victoria',8), ('EddyJr.', 6)]
lista4 = [(42, 'Eddy'), (39, 'Lorena'), (8, 'Victoria'), (6, 'EddyJr.')]

print(sorted(lista1, reverse = True))
print(sorted(lista2, reverse = True))
print(sorted(lista3, reverse = False))
print(sorted(lista4, reverse = False))

['Victoria', 'Lorena', 'EddyJr.', 'Eddy']
['Victoria', 'Lorena', 'EddyJr.', 'Eddy']
[('Eddy', 42), ('EddyJr.', 6), ('Lorena', 39), ('Victoria', 8)]
[(6, 'EddyJr.'), (8, 'Victoria'), (39, 'Lorena'), (42, 'Eddy')]
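
Swapping each pair into a (value, key) tuple is one way to sort a dictionary by its values. Another option, sketched here with the same sample ages, is to pass a `key` function to `sorted` so the pairs can stay in (key, value) order.

```python
ages = {'Eddy': 42, 'Lorena': 39, 'Victoria': 8, 'EddyJr.': 6}

# Sort the (name, age) pairs by age, highest first.
by_age_desc = sorted(ages.items(), key=lambda item: item[1], reverse=True)

for name, age in by_age_desc:
    print(name, ':', age)
```

The two approaches can differ when values are tied: tuple comparison falls back to the second element, while `sorted` with a `key` keeps tied items in their original order.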


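Below we will use the function 'freq_table' to get the list of categories, and then compute the average number of reviews per app in each category of the Google Store data set.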

In [23]:
genresT = freq_table(google_free,1)

for regi in genresT:
    tota_rati = 0
    leng_rati = 0
    for row in google_free:
        genr_goog = row[1]
        if genr_goog == regi:
            regi_rati = float(row[3])
            tota_rati += regi_rati
            leng_rati += 1
            
    aver_rati = round((tota_rati / leng_rati),2)
    total = int(tota_rati)
    print (regi, '(', total,'/',leng_rati, ')', '=', aver_rati)

ART_AND_DESIGN ( 1407867 / 57 ) = 24699.42
AUTO_AND_VEHICLES ( 1159503 / 82 ) = 14140.28
BEAUTY ( 396240 / 53 ) = 7476.23
BOOKS_AND_REFERENCE ( 16719063 / 190 ) = 87995.07
BUSINESS ( 9865569 / 407 ) = 24239.73
COMICS ( 2342209 / 55 ) = 42585.62
COMMUNICATION ( 285739629 / 287 ) = 995608.46
DATING ( 3622290 / 165 ) = 21953.27
EDUCATION ( 5798189 / 103 ) = 56293.1
ENTERTAINMENT ( 25648941 / 85 ) = 301752.25
EVENTS ( 161018 / 63 ) = 2555.84
FINANCE ( 12639775 / 328 ) = 38535.9
FOOD_AND_DRINK ( 6322667 / 110 ) = 57478.79
HEALTH_AND_FITNESS ( 21319927 / 273 ) = 78094.97
HOUSE_AND_HOME ( 1929789 / 73 ) = 26435.47
LIBRARIES_AND_DEMO ( 906842 / 83 ) = 10925.81
LIFESTYLE ( 11736951 / 346 ) = 33921.82
GAME ( 589197554 / 862 ) = 683523.84
FAMILY ( 189627665 / 1676 ) = 113143.0
MEDICAL ( 1167538 / 313 ) = 3730.15
SOCIAL ( 227936113 / 236 ) = 965830.99
SHOPPING ( 44553582 / 199 ) = 223887.35
PHOTOGRAPHY ( 105465239 / 261 ) = 404081.38
SPORTS ( 35198523 / 301 ) = 116938.61
TRAVEL_AND_LOCAL ( 2680327

We will repeat the previous step with the Apple Store data set, using the total rating count of each app.

In [24]:
genresT = freq_table(apple_free,-5)

for regi in genresT:
    tota_rati = 0
    leng_rati = 0
    for row in apple_free:
        genr_appl = row[-5]
        if genr_appl == regi:
            regi_rati = float(row[5])
            tota_rati += regi_rati
            leng_rati += 1
    
    aver_rati = round((tota_rati / leng_rati),2)
    total = int(tota_rati)
    print (regi, '(', total,'/',leng_rati, ')', '=', aver_rati)

Social Networking ( 7584125 / 106 ) = 71548.35
Photo & Video ( 4550647 / 160 ) = 28441.54
Games ( 42705795 / 1872 ) = 22812.92
Music ( 3783551 / 66 ) = 57326.53
Reference ( 1348958 / 18 ) = 74942.11
Health & Fitness ( 1514371 / 65 ) = 23298.02
Weather ( 1463837 / 28 ) = 52279.89
Utilities ( 1513441 / 81 ) = 18684.46
Travel ( 1129752 / 40 ) = 28243.8
Shopping ( 2261254 / 84 ) = 26919.69
News ( 913665 / 43 ) = 21248.02
Navigation ( 516542 / 6 ) = 86090.33
Lifestyle ( 840774 / 51 ) = 16485.76
Entertainment ( 3563577 / 254 ) = 14029.83
Food & Drink ( 866682 / 26 ) = 33333.92
Sports ( 1587614 / 69 ) = 23008.9
Book ( 556619 / 14 ) = 39758.5
Finance ( 1132846 / 36 ) = 31467.94
Education ( 826470 / 118 ) = 7003.98
Productivity ( 1177591 / 56 ) = 21028.41
Business ( 127349 / 17 ) = 7491.12
Catalogs ( 16016 / 4 ) = 4004.0
Medical ( 3672 / 6 ) = 612.0
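
The two cells above loop over the whole data set once for every genre. As a sketch of an alternative, the per-group totals and counts can be accumulated in a single pass with `collections.defaultdict`. The function name `average_by_group` and the toy rows are assumptions of this example.

```python
from collections import defaultdict


def average_by_group(rows, group_index, value_index):
    """Return {group: average of the numeric column}, rounded to 2 places."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for row in rows:
        group = row[group_index]
        totals[group] += float(row[value_index])
        counts[group] += 1
    return {group: round(totals[group] / counts[group], 2) for group in totals}


# Toy rows (hypothetical): name, genre, number of ratings
sample_rows = [
    ['A', 'Games', '100'],
    ['B', 'Games', '300'],
    ['C', 'Music', '50'],
]

averages = average_by_group(sample_rows, 1, 2)
print(averages)
```

On the real data, the equivalent calls would be `average_by_group(google_free, 1, 3)` and `average_by_group(apple_free, -5, 5)`.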


Now we will repeat the previous technique, but this time using the number of downloads for each app's main category. We will start with our Google data set. The 'Installs' values are strings such as '10,000+', so we first have to remove the ',' and '+' characters before converting them to integers.
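
The cleaning step can be isolated in a small helper, sketched below. The name `parse_installs` is hypothetical, and the sample strings follow the format seen in the 'Installs' column of the rows printed earlier.

```python
def parse_installs(text):
    """Convert an install string such as '10,000+' to an integer."""
    return int(text.replace(',', '').replace('+', ''))


print(parse_installs('10,000+'))
print(parse_installs('5,000,000+'))
```

The trailing '+' suggests these values are lower bounds, not exact counts, so the averages computed from them should be read as rough estimates.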

In [42]:
downG = freq_table(google_free,1)
  
for regi in downG:
    tota_down = 0
    leng_down = 0
    for row in google_free:
        category = row[1]
        downs = row[5]
        if regi == category:
            # Strip the ',' and '+' characters so '10,000+' becomes 10000
            valu_inst = downs.replace(',', '').replace('+', '')
            tota_down += int(valu_inst)
            leng_down += 1
    # Compute the average once per category, after all its apps are summed
    aver_down = round((tota_down / leng_down), 2)
    print (regi, ': Total Downloads=', tota_down, '- Number of Apps=', leng_down, 'Average Downloads=', aver_down)



ART_AND_DESIGN : Total Downloads= 113221100 - Number of Apps= 57 Average Downloads= 1986335.09
AUTO_AND_VEHICLES : Total Downloads= 53080061 - Number of Apps= 82 Average Downloads= 647317.82
BEAUTY : Total Downloads= 27197050 - Number of Apps= 53 Average Downloads= 513151.89
BOOKS_AND_REFERENCE : Total Downloads= 1665884260 - Number of Apps= 190 Average Downloads= 8767811.89
BUSINESS : Total Downloads= 696902090 - Number of Apps= 407 Average Downloads= 1712290.15
COMICS : Total Downloads= 44971150 - Number of Apps= 55 Average Downloads= 817657.27
COMMUNICATION : Total Downloads= 11036906201 - Number of Apps= 287 Average Downloads= 38456119.17
DATING : Total Downloads= 140914757 - Number of Apps= 165 Average Downloads= 854028.83
EDUCATION : Total Downloads= 188850000 - Number of Apps= 103 Average Downloads= 1833495.15
ENTERTAINMENT : Total Downloads= 989460000 - Number of Apps= 85 Average Downloads= 11640705.88
EVENTS : Total Downloads= 15973160 - Number of Apps= 63 Average Downloads= 2

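Let's repeat the steps above for the Apple data set. It has no installs column, so we use the total number of user ratings ('rating_count_tot') as a proxy for popularity.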

In [45]:
downA = freq_table(apple_free,-5)
  
for regi in downA:
    tota_down = 0 
    leng_down = 0
    for row in apple_free:
        category = row[-5]
        downs = row[5]
        if regi == category:
            data_inst = downs
            valu_inst = downs 
            valu_inst = valu_inst.replace(',','')
            valu_inst = valu_inst.replace('+','')
            tota_down = tota_down + int(valu_inst)
            leng_down += 1
            aver_down = round((tota_down / leng_down), 2)
    print (regi, ': Total Ratings=', tota_down, '- Number of Apps=', leng_down, 'Average Ratings=', aver_down)

Social Networking : Total Ratings= 7584125 - Number of Apps= 106 Average Ratings= 71548.35
Photo & Video : Total Ratings= 4550647 - Number of Apps= 160 Average Ratings= 28441.54
Games : Total Ratings= 42705795 - Number of Apps= 1872 Average Ratings= 22812.92
Music : Total Ratings= 3783551 - Number of Apps= 66 Average Ratings= 57326.53
Reference : Total Ratings= 1348958 - Number of Apps= 18 Average Ratings= 74942.11
Health & Fitness : Total Ratings= 1514371 - Number of Apps= 65 Average Ratings= 23298.02
Weather : Total Ratings= 1463837 - Number of Apps= 28 Average Ratings= 52279.89
Utilities : Total Ratings= 1513441 - Number of Apps= 81 Average Ratings= 18684.46
Travel : Total Ratings= 1129752 - Number of Apps= 40 Average Ratings= 28243.8
Shopping : Total Ratings= 2261254 - Number of Apps= 84 Average Ratings= 26919.69
News : Total Ratings= 913665 - Number of Apps= 43 Average Ratings= 21248.02
Navigation : Total Ratings= 516542 - Number of Apps= 6 Average Ratings= 86090.33
Lifestyle : To

## Conclusions
In this project, we analyzed data about the App Store and Google Play mobile apps with the goal of recommending an app profile that can be profitable for both markets.

We concluded that the most popular app categories on both stores are 'Games', 'Social Network', and 'Family and Entertainment'.