# Analyzing Mobile Data

We will look at mobile app data from both google and apple to help our developers understand what type of apps are likely to attract more users on Google Play and the App Store.  The company we are working for designs apps for an English-speaking audience and all apps are free.

### Exploring Data

We will open 2 seperate data sets, `apple_data` and `google_data` and explore the variables.  The headers of each dataset will be stored seperately in `google_header` and `apple_header`. A summary of both datasets can be found below.

Data Source: [google data](https://www.kaggle.com/datasets/lava18/google-play-store-apps)  [apple data](https://www.kaggle.com/datasets/ramamet4/app-store-apple-data-set-10k-apps)

|Apple|||||Google|||
|---|---|:---|---|---|---|---|:---|
|index|variable|Description|||index|variable|Description|
|0|"id"|App ID|||0|App|Application name|
|1|"track_name"|App Name|||1|Category|Category the app belongs to|
|2|"size_bytes"|Size (in Bytes)|||2|Rating|Overall user rating of the app|
|3|"currency"|Currency Type|||3|Reviews|Number of user reviews for the app|
|4|"price"|Price amount|||4|Size|Size of the app|
|5|"rating_count_tot"|User Rating counts (for all version)|||5|Installs|Number of user downloads/installs|
|6|"rating_count_ver"|User Rating counts (for current version)|||6|Type|Paid or Free|
|7|"user_rating"|Average User Rating value (for all version)|||7|Price|---|
|8|"user_rating_ver"|Average User Rating value (for current version)|||8|Content Rating| Children / Mature 21+ / Adult|
|9|"ver"|Latest version code|||9|Genres|can belong to multiple genres|
|10|"cont_rating"|Content Rating|||10|Last Updated|Date when the app was last updated on Play Store|
|11|"prime_genre"|Primary Genre|||11|Current Ver|Current version of the app available on Play Store|
|12|"sup_devices.num"|Number of supporting devices|||12|Android Ver|Min required Android version|
|13|"ipadSc_urls.num"|Number of screenshots showed for display|||---|---|---|
|14|"lang.num"|Number of supported languages|||---|---|---|
|15|"vpp_lic"|Vpp Device Based Licensing Enabled|||---|---|---|

In [1]:
from csv import reader
opened_file=open('AppleStore.csv')
read_file = reader(opened_file)
data = list(read_file)
apple_header = data[0]
apple_data=data[1:]

opened_file=open('googleplaystore.csv')
read_file = reader(opened_file)
data = list(read_file)
google_header = data[0]
google_data=data[1:]

In [2]:
def explore_data(dataset, start, end, rows_and_columns=False):
    dataset_slice = dataset[start:end]    
    for row in dataset_slice:
        print(row)
        print('\n') # adds a new (empty) line after each row

    if rows_and_columns:
        print('Number of rows:', len(dataset))
        print('Number of columns:', len(dataset[0]))

In [3]:
explore_data(google_data, 0, 3, True)

['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['Coloring book moana', 'ART_AND_DESIGN', '3.9', '967', '14M', '500,000+', 'Free', '0', 'Everyone', 'Art & Design;Pretend Play', 'January 15, 2018', '2.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


Number of rows: 10841
Number of columns: 13


In [4]:
explore_data(apple_data, 0, 3, True)

['284882215', 'Facebook', '389879808', 'USD', '0.0', '2974676', '212', '3.5', '3.5', '95.0', '4+', 'Social Networking', '37', '1', '29', '1']


['389801252', 'Instagram', '113954816', 'USD', '0.0', '2161558', '1289', '4.5', '4.0', '10.23', '12+', 'Photo & Video', '37', '0', '29', '1']


['529479190', 'Clash of Clans', '116476928', 'USD', '0.0', '2130805', '579', '4.5', '4.5', '9.24.12', '9+', 'Games', '38', '5', '18', '1']


Number of rows: 7197
Number of columns: 16


## Cleaning the Data

Entry 10472 has missing category.  All data has been shifted by one.  Entry will be deleted

In [5]:
print(google_data[10472])

['Life Made WI-Fi Touchscreen Photo Frame', '1.9', '19', '3.0M', '1,000+', 'Free', '0', 'Everyone', '', 'February 11, 2018', '1.0.19', '4.0 and up']


In [6]:
del google_data[10472]

### Remvoing Duplicates from Google Data

There are several duplicate entries in the google data that need to be removed.

The code below sorts the apps into 2 lists. `duplicate_apps` stores only the duplicate app names.  `unique_apps` stores all app names without repeating the name.  

In [7]:
duplicate_apps = []
unique_apps = []

for row in google_data:
    name = row[0]
    if name in unique_apps:
        duplicate_apps.append(name)
    else :
        unique_apps.append(name)

print('duplicate apps: ', len(duplicate_apps))
print('\n')
print('Examples of duplicate apps: \n ', duplicate_apps[:10])

duplicate apps:  1181


Examples of duplicate apps: 
  ['Quick PDF Scanner + OCR FREE', 'Box', 'Google My Business', 'ZOOM Cloud Meetings', 'join.me - Simple Meetings', 'Box', 'Zenefits', 'Google Ads', 'Google My Business', 'Slack']


#### Taking a closer look at the duplicates

We took a look at some of the duplicates.  We can see that some of the duplicates are exactly the same in each category and others have some differences.  For those that are the same, we just need to keep one of the entries.  For the apps that have differences, it looks like the number of reviews would be the most relevant way to determine what entries we keep.  We would like the most current data available for each app so we would keep the entry with the most reviews

In [8]:
for row in google_data :
    name = row[0]
    if name == 'Box' :
        print(row)
        
print('\n')

for row in google_data :
    name = row[0]
    if name == 'Instagram' :
        print(row)

['Box', 'BUSINESS', '4.2', '159872', 'Varies with device', '10,000,000+', 'Free', '0', 'Everyone', 'Business', 'July 31, 2018', 'Varies with device', 'Varies with device']
['Box', 'BUSINESS', '4.2', '159872', 'Varies with device', '10,000,000+', 'Free', '0', 'Everyone', 'Business', 'July 31, 2018', 'Varies with device', 'Varies with device']
['Box', 'BUSINESS', '4.2', '159872', 'Varies with device', '10,000,000+', 'Free', '0', 'Everyone', 'Business', 'July 31, 2018', 'Varies with device', 'Varies with device']


['Instagram', 'SOCIAL', '4.5', '66577313', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']
['Instagram', 'SOCIAL', '4.5', '66577446', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']
['Instagram', 'SOCIAL', '4.5', '66577313', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', '

#### Process for removing duplicates

When we are done removing duplicates, we can expect to have 9,659 entries remaining

We will create an empty dictionary that stores all of the unique apps with the highest number of reviews.  That dictionary should match our expected length of 9,659.

We will then create 2 lists, one to track the apps that have been added and one to add the unique rows of data.  We need to keep track of the apps since some of the rows have exactly the same number of reviews.

We will store are clean data in `google_data2`.

In [9]:
print('Expected Length: ', len(google_data)-1181)

Expected Length:  9659


In [10]:
most_reviews = {}
for row in google_data:
    name = row[0]
    reviews = int(row[3])
    if name in most_reviews and reviews > most_reviews[name]:
        most_reviews[name]=reviews
    if name not in most_reviews :
        most_reviews[name]=reviews

print(len(most_reviews))
        

9659


In [11]:
added=[]
google_data2=[]

for row in google_data :
    name = row[0]
    reviews = int(row[3])
    if name not in added and reviews == most_reviews[name]:
        google_data2.append(row)
        added.append(name)
        
print(len(google_data2))


9659


#### Exploring the google_data2

Now that duplicates have been removed.  We need to explore the data to check and see if any other possible issues with the data could occur.  At this time, no other issues have been found regarding duplicates.

In [12]:
explore_data(google_data2, 175, 184, True)

['Visual Voicemail by MetroPCS', 'BUSINESS', '4.1', '16129', 'Varies with device', '10,000,000+', 'Free', '0', 'Everyone', 'Business', 'July 30, 2018', 'Varies with device', 'Varies with device']


['Indeed Job Search', 'BUSINESS', '4.3', '674730', 'Varies with device', '50,000,000+', 'Free', '0', 'Everyone', 'Business', 'May 21, 2018', 'Varies with device', 'Varies with device']


['Uber Driver', 'BUSINESS', '4.4', '1254730', 'Varies with device', '10,000,000+', 'Free', '0', 'Everyone', 'Business', 'August 3, 2018', 'Varies with device', 'Varies with device']


['ADP Mobile Solutions', 'BUSINESS', '4.3', '85185', '29M', '5,000,000+', 'Free', '0', 'Everyone', 'Business', 'July 17, 2018', '3.4.2', '5.0 and up']


['Snag - Jobs Hiring Now', 'BUSINESS', '4.3', '32584', 'Varies with device', '1,000,000+', 'Free', '0', 'Everyone', 'Business', 'May 4, 2018', 'Varies with device', 'Varies with device']


['Docs To Go™ Free Office Suite', 'BUSINESS', '4.1', '217730', 'Varies with device', '50,

## Removing Duplicates from Apple Data

The apps in `apple_data` have a unique ID number.  To see if there are duplicates we need to check and see if there are duplicate ID numbers.  As we can see below, there are no duplicate ID numbers.

In [13]:
apple_unique = []
apple_duplicate = []
for row in apple_data:
    ID = row[0]
    if ID in apple_unique:
        apple_duplicate.append(ID)
    else :
        apple_unique.append(ID)

print('duplicate apps: ', len(apple_duplicate))
print('\n')
print('Examples of duplicate apps: \n ', apple_duplicate[:10])

duplicate apps:  0


Examples of duplicate apps: 
  []


## Removing Non-English Apps

Since our customer only designs apps in English, we are only interested in the apps designed for English speakers.  As a result, we need to remove the non-English apps from our dataset.  To do this, we will use the ASCII system and remove any app containing a character greater than 127.  However, this will remove apps that use emoji's in their name or other logos like (C) and TM.  We will only remove those entries that have more than 3 non-English Characters.

In [14]:
def English_String(a_string):
    non_English_count=0
    for char in a_string :
        if ord(char) > 127 :
            non_English_count +=1
    if non_English_count > 3 :
        return False
    return True

In [15]:
print(English_String('爱奇艺PPS -《欢乐颂2》电视剧热播'))
print(English_String('Docs To Go™ Free Office Suite'))
print(English_String('Instachat 😜'))

False
True
True


The function above determines if a string does not contain no more than 3 non-English characters.  When we use this function on both the Google and Apple data, we will have apps whose name is mostly in English.  We might lose a few apps as a result of this, however, There should not be many that have more than 3 English characters.

The code below will remove all of the non-English apps by first determing whether or not our app name is in English.  We will then split the data into 2 seperate data sets.  For google they will be `google_data_eng` and `google_data_non_eng` and for apple they will be `apple_data_eng` and `apple_data_non_eng`.

In [16]:
google_data_eng = []
google_data_non_eng = []

for row in google_data2 :
    name = row[0]
    is_English = English_String(name)
    if is_English :
        google_data_eng.append(row)
    else :
        google_data_non_eng.append(row)
        
print('Google English Apps')
explore_data(google_data_eng, 15, 20, True)
print('\n')
print('Google Non-English Apps')
explore_data(google_data_non_eng, 5, 10, True)

Google English Apps
['Photo Designer - Write your name with shapes', 'ART_AND_DESIGN', '4.7', '3632', '5.5M', '500,000+', 'Free', '0', 'Everyone', 'Art & Design', 'July 31, 2018', '3.1', '4.1 and up']


['350 Diy Room Decor Ideas', 'ART_AND_DESIGN', '4.5', '27', '17M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'November 7, 2017', '1.0', '2.3 and up']


['FlipaClip - Cartoon animation', 'ART_AND_DESIGN', '4.3', '194216', '39M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 3, 2018', '2.2.5', '4.0.3 and up']


['ibis Paint X', 'ART_AND_DESIGN', '4.6', '224399', '31M', '10,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'July 30, 2018', '5.5.4', '4.1 and up']


['Logo Maker - Small Business', 'ART_AND_DESIGN', '4.0', '450', '14M', '100,000+', 'Free', '0', 'Everyone', 'Art & Design', 'April 20, 2018', '4.0', '4.1 and up']


Number of rows: 9614
Number of columns: 13


Google Non-English Apps
['RMEduS - 음성인식을 활용한 R 프로그래밍 실습 시스템', 'FAMILY', 'NaN', '4', '64M', '

In [17]:
apple_data_eng = []
apple_data_non_eng = []

for row in apple_data :
    name = row[1]
    is_English = English_String(name)
    if is_English :
        apple_data_eng.append(row)
    else :
        apple_data_non_eng.append(row)
        
print('Apple English Apps')
explore_data(apple_data_eng, 15, 20, True)
print('\n')
print('Apple Non-English Apps')
explore_data(apple_data_non_eng, 5, 10, True)

Apple English Apps
['575658129', 'Injustice: Gods Among Us', '1829599232', 'USD', '0.0', '612532', '410', '4.5', '4.5', '2.15.1', '12+', 'Games', '39', '5', '13', '1']


['506627515', 'Hay Day', '113641472', 'USD', '0.0', '567344', '4887', '4.5', '4.5', '1.33.134', '4+', 'Games', '38', '5', '18', '1']


['500116670', 'Clear Vision (17+)', '37879808', 'USD', '0.99', '541693', '69225', '4.5', '4.5', '1.1.3', '17+', 'Games', '43', '5', '1', '1']


['479516143', 'Minecraft: Pocket Edition', '147787776', 'USD', '6.99', '522012', '1148', '4.5', '4.5', '1.1', '9+', 'Games', '37', '1', '11', '1']


['293778748', 'PAC-MAN', '100849664', 'USD', '0.0', '508808', '99', '3.0', '4.5', '6.3.5', '4+', 'Games', '38', '5', '10', '1']


Number of rows: 6183
Number of columns: 16


Apple Non-English Apps
['414430589', '搜狐视频HD-欢乐颂2 全网首播', '79696896', 'USD', '0.0', '3768', '5', '4.0', '3.0', '6.6', '17+', 'Entertainment', '24', '5', '1', '1']


['895670960', '阴阳师-全区互通现世集结', '1040563200', 'USD', '0.0', '3478

## Isolating Free Apps

Our company is only makes free apps.  Therefore, we do not want to include data on apps that are paid.  We will remove all paid apps from our dataset.

We will do this by by looking at price fore apple and type for google since the google data specifically tells us if it is free or not.

In [18]:
google_data_eng_free = []
google_data_eng_paid = []
for row in google_data_eng :
    free = row[6]
    if free == 'Free' :
        google_data_eng_free.append(row)
    else :
        google_data_eng_paid.append(row)
        
print('Google Free Apps')
explore_data(google_data_eng_free, 12, 17, True)
print('\n')
print('Google Paid Apps')
explore_data(google_data_eng_paid, 7, 12, True)

Google Free Apps
['Mandala Coloring Book', 'ART_AND_DESIGN', '4.6', '4326', '21M', '100,000+', 'Free', '0', 'Everyone', 'Art & Design', 'June 26, 2018', '1.0.4', '4.4 and up']


['3D Color Pixel by Number - Sandbox Art Coloring', 'ART_AND_DESIGN', '4.4', '1518', '37M', '100,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 3, 2018', '1.2.3', '2.3 and up']


['Learn To Draw Kawaii Characters', 'ART_AND_DESIGN', '3.2', '55', '2.7M', '5,000+', 'Free', '0', 'Everyone', 'Art & Design', 'June 6, 2018', 'NaN', '4.2 and up']


['Photo Designer - Write your name with shapes', 'ART_AND_DESIGN', '4.7', '3632', '5.5M', '500,000+', 'Free', '0', 'Everyone', 'Art & Design', 'July 31, 2018', '3.1', '4.1 and up']


['350 Diy Room Decor Ideas', 'ART_AND_DESIGN', '4.5', '27', '17M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'November 7, 2017', '1.0', '2.3 and up']


Number of rows: 8863
Number of columns: 13


Google Paid Apps
['Moco+ - Chat, Meet People', 'DATING', '4.2', '1546', 'Varies

In [19]:
apple_data_eng_free = []
apple_data_eng_paid = []
for row in apple_data_eng :
    price = float(row[4])
    if price > 0 :
        apple_data_eng_paid.append(row)
    else :
        apple_data_eng_free.append(row)
        
print('Apple Free Apps')
explore_data(apple_data_eng_free, 12, 17, True)
print('\n')
print('Apple Paid Apps')
explore_data(apple_data_eng_paid, 7, 12, True)

Apple Free Apps
['469369175', 'CSR Racing', '524803072', 'USD', '0.0', '677247', '2029', '4.5', '4.5', '4.0.1', '4+', 'Games', '37', '5', '10', '1']


['924373886', 'Crossy Road - Endless Arcade Hopper', '165471232', 'USD', '0.0', '669079', '1087', '4.5', '4.5', '1.5.4', '9+', 'Games', '38', '5', '13', '1']


['575658129', 'Injustice: Gods Among Us', '1829599232', 'USD', '0.0', '612532', '410', '4.5', '4.5', '2.15.1', '12+', 'Games', '39', '5', '13', '1']


['506627515', 'Hay Day', '113641472', 'USD', '0.0', '567344', '4887', '4.5', '4.5', '1.33.134', '4+', 'Games', '38', '5', '18', '1']


['293778748', 'PAC-MAN', '100849664', 'USD', '0.0', '508808', '99', '3.0', '4.5', '6.3.5', '4+', 'Games', '38', '5', '10', '1']


Number of rows: 3222
Number of columns: 16


Apple Paid Apps
['625334537', 'Geometry Dash', '83931136', 'USD', '1.99', '266440', '6263', '5.0', '4.5', '2.10', '4+', 'Games', '37', '4', '1', '1']


['417817520', 'Tiny Wings', '30252032', 'USD', '0.99', '219418', '328', '4.5

# Finding an App Profile


We want to determine what app profile will work best on both the Google Play store and Apple Store.  We test our apps on the google play store and then only develop it further if it does well.  We Then transfer it to the apple store if it is profitable after six months.

In [20]:
def freq_table(dataset, index) :
    dictionary = {}
    for row in dataset :
        item = row[index]
        if item in dictionary :
            dictionary[item] += 1
        else :
            dictionary[item] = 1
    return dictionary

In [21]:
def display_table(dataset, index):
    table = freq_table(dataset, index)
    table_display = []
    for key in table:
        key_val_as_tuple = (table[key], key)
        table_display.append(key_val_as_tuple)

    table_sorted = sorted(table_display, reverse = True)
    for entry in table_sorted:
        print(entry[1], ':', entry[0])

When looking at the Google Data by Genre and by Category, we can see that android apps are typically used as tools or for business and education.  Games and entertainement ranks high in both lists as well.

In [22]:
print('Google Data by Genre')
display_table(google_data_eng_free, 9)

Google Data by Genre
Tools : 749
Entertainment : 538
Education : 474
Business : 407
Productivity : 345
Lifestyle : 345
Finance : 328
Medical : 313
Sports : 307
Personalization : 294
Communication : 287
Action : 275
Health & Fitness : 273
Photography : 261
News & Magazines : 248
Social : 236
Travel & Local : 206
Shopping : 199
Books & Reference : 190
Simulation : 181
Dating : 165
Arcade : 164
Video Players & Editors : 157
Casual : 156
Maps & Navigation : 124
Food & Drink : 110
Puzzle : 100
Racing : 88
Role Playing : 83
Libraries & Demo : 83
Auto & Vehicles : 82
Strategy : 80
House & Home : 73
Weather : 71
Events : 63
Adventure : 60
Comics : 54
Beauty : 53
Art & Design : 53
Parenting : 44
Card : 40
Casino : 38
Trivia : 37
Educational;Education : 35
Board : 34
Educational : 33
Education;Education : 30
Word : 23
Casual;Pretend Play : 21
Music : 18
Racing;Action & Adventure : 15
Puzzle;Brain Games : 15
Entertainment;Music & Video : 15
Casual;Brain Games : 12
Casual;Action & Adventure : 12
A

In [23]:
print('Google Data by Category')
display_table(google_data_eng_free, 1)

Google Data by Category
FAMILY : 1675
GAME : 862
TOOLS : 750
BUSINESS : 407
LIFESTYLE : 346
PRODUCTIVITY : 345
FINANCE : 328
MEDICAL : 313
SPORTS : 301
PERSONALIZATION : 294
COMMUNICATION : 287
HEALTH_AND_FITNESS : 273
PHOTOGRAPHY : 261
NEWS_AND_MAGAZINES : 248
SOCIAL : 236
TRAVEL_AND_LOCAL : 207
SHOPPING : 199
BOOKS_AND_REFERENCE : 190
DATING : 165
VIDEO_PLAYERS : 159
MAPS_AND_NAVIGATION : 124
FOOD_AND_DRINK : 110
EDUCATION : 103
ENTERTAINMENT : 85
LIBRARIES_AND_DEMO : 83
AUTO_AND_VEHICLES : 82
HOUSE_AND_HOME : 73
WEATHER : 71
EVENTS : 63
PARENTING : 58
ART_AND_DESIGN : 57
COMICS : 55
BEAUTY : 53


Looking at the apple data, we can see that the primary genre of these apps are by far Games.  The second and third biggest Genre is Entertainment and photo & video.  It seems most of the apps are used for recreation.

In [24]:
print('Apple Data by Primary Genre')
display_table(apple_data_eng_free, 11)

Apple Data by Primary Genre
Games : 1874
Entertainment : 254
Photo & Video : 160
Education : 118
Social Networking : 106
Shopping : 84
Utilities : 81
Sports : 69
Music : 66
Health & Fitness : 65
Productivity : 56
Lifestyle : 51
News : 43
Travel : 40
Finance : 36
Weather : 28
Food & Drink : 26
Reference : 18
Business : 17
Book : 14
Navigation : 6
Medical : 6
Catalogs : 4


### Looking for the most popular apps by Genre

We want to know which of the categories above are most popular.  For the apple data, we do not have information on number of installs so we will use the number of user ratings to see which Genre's are more popular.

We will get the average number of ratings by cycling through the `apple_data_eng_free` dataset and look for unique Genres.  Once a unique genre is found we will then cycle through the data set again so that we can sum the number of ratings and store that in `rating_sum`.  We then use this and the `count` variable to calculate the average rating count and store it in `average_rating_count`.  The genre and average rating count is then added to the dictionary `avg_num_rating`.

In [32]:
# apple_data_eng_free
# rating count tot is index 5
# genre is index 11

avg_num_rating = {}

for row in apple_data_eng_free :
    genre = row[11]
    if genre not in avg_num_rating :
        rating_sum = 0
        count = 0
        for row2 in apple_data_eng_free :
            gen = row2[11]
            if gen == genre :
                rating = float(row[5])
                rating_sum += rating
                count += 1
        average_rating_count = rating_sum/count
        avg_num_rating[genre]= average_rating_count
        


#### Apple Data Profile

The table below shows that the most popular genres of apps are social networking, photo & video, and games.  If we were to make an apple app, I would recommend our company make apps in those Genres, perhaps combineing a phote & video app in a game like manner.

In [31]:
table_display = []
for key in avg_num_rating:
        key_val_as_tuple = (avg_num_rating[key], key)
        table_display.append(key_val_as_tuple)

table_sorted = sorted(table_display, reverse = True)
for entry in table_sorted:
        print(entry[1], ':', entry[0])

Social Networking : 2974676.0
Photo & Video : 2161558.0
Games : 2130805.0
Music : 1126879.0
Reference : 985920.0
Health & Fitness : 507706.0
Weather : 495626.0
Utilities : 479440.0
Travel : 446185.0
Shopping : 417779.0
News : 354058.0
Navigation : 345046.0
Lifestyle : 342969.0
Entertainment : 308844.0
Food & Drink : 303856.0
Sports : 290996.0
Book : 252076.0
Finance : 233270.0
Education : 162701.0
Productivity : 161065.0
Business : 38681.0
Catalogs : 13345.0
Medical : 1341.0


#### Google Data

For the google data, we will use the number of installs data to determine which Genre is more popular.  This data is not that precise as it uses results such as `100,000+`.  For the purpose of our analysis we will treat this as 100,000 installs and all other instances will follow the same convention.

The code below loops through the `google_data_eng_free` looking for each unique genre.  Once a unique genre has been found, it will loop through itself again to get the sum of all the installs for that genre and stores that sum in `installs_sum`.  Before it can get the number of installs '+' and commas must be removed.  It also counts how many times that genre appears and stores that value in `count`.  Finally, it uses the previous variables to calculate the average installs in stores it in `average_installs`.  Then the genre and average installs are stored in the dictionary `g_avg_num_installs`.

In [35]:
# google_data_eng_free
# number of installs is index 5
# category is index 1

g_avg_num_installs = {}

for row in google_data_eng_free :
    genre = row[1]
    if genre not in g_avg_num_installs :
        installs_sum = 0
        count = 0
        for row2 in google_data_eng_free :
            gen = row2[1]
            if gen == genre :
                n_installs = row[5]
                n_installs = n_installs.replace("+", "")
                n_installs = n_installs.replace(",", "")
                n_installs = float(n_installs)
                installs_sum += n_installs
                count += 1
        average_installs = installs_sum/count
        g_avg_num_installs[genre]= average_installs

#### Google App Profile

The table below shows that the most popular apps  for android involve apps that play videos, are used as tools, for social interaction or communication.  If we were to make an Android app, I would recommend making something that plays videos and allows the user to communicate with others.

In [36]:
table_display = []
for key in g_avg_num_installs:
        key_val_as_tuple = (g_avg_num_installs[key], key)
        table_display.append(key_val_as_tuple)

table_sorted = sorted(table_display, reverse = True)
for entry in table_sorted:
        print(entry[1], ':', entry[0])

VIDEO_PLAYERS : 1000000000.0
TOOLS : 1000000000.0
SOCIAL : 1000000000.0
COMMUNICATION : 1000000000.0
PRODUCTIVITY : 500000000.0
MAPS_AND_NAVIGATION : 100000000.0
WEATHER : 50000000.0
TRAVEL_AND_LOCAL : 50000000.0
PERSONALIZATION : 50000000.0
SHOPPING : 10000000.0
NEWS_AND_MAGAZINES : 10000000.0
HOUSE_AND_HOME : 10000000.0
GAME : 10000000.0
FINANCE : 10000000.0
DATING : 10000000.0
BUSINESS : 10000000.0
LIFESTYLE : 5000000.0
PHOTOGRAPHY : 1000000.0
MEDICAL : 1000000.0
FAMILY : 1000000.0
ENTERTAINMENT : 1000000.0
HEALTH_AND_FITNESS : 500000.0
COMICS : 500000.0
BEAUTY : 500000.0
SPORTS : 100000.0
EVENTS : 100000.0
EDUCATION : 100000.0
AUTO_AND_VEHICLES : 100000.0
LIBRARIES_AND_DEMO : 50000.0
FOOD_AND_DRINK : 50000.0
BOOKS_AND_REFERENCE : 50000.0
PARENTING : 10000.0
ART_AND_DESIGN : 10000.0


# Conclusion

We looked at data from both the Google play store and the Apple Store to determine which kind of app does well on both platforms.  We know we will be implementing our app on android before we switch to ios.  With that said, there are a few app profiles that both android and apple share in common.  On both platforms, education and food and drink apps were not that popular.  The most popular apps on both platforms are ones that play video, or are used for social networking/communication.  I would recommend our company pursue an app that falls into one or both of those categories.