# Profitable App Profiles for the App Store and Google Play Store

### By: Timothy Han

### Our goal for this project is to determine which mobile app profile is the most sucessful and the most profitable in both the Apple App Store and the Google Play Store (for Android).
### Since our company is focused on developing apps that are free to download and generating its revenue from in-app ads, our role as data anlayst is to analyze data and base on our analysis of data is to help our developers know what kind of app are likely to attract the most users.

[Link to Apple App Store Data](https://www.kaggle.com/datasets/ramamet4/app-store-apple-data-set-10k-apps)

[Link to Android App Store Data](https://www.kaggle.com/datasets/lava18/google-play-store-apps)

### Importing the CSV datasets and converting it to a list of lists datastructures

In [1]:
open_file_apple = open('AppleStore.csv', encoding="utf8")
open_file_google = open('googleplaystore.csv', encoding="utf8")
from csv import reader
read_file_apple = reader(open_file_apple)
read_file_google = reader(open_file_google)
apple_data_list = list(read_file_apple)
google_data_list = list(read_file_google)

In [2]:
apple_header = apple_data_list[0]
apple = apple_data_list[1:]

In [3]:
google_header = google_data_list[0]
google = google_data_list[1:]

### Creating a function that will allow us to explore the datasets by rows.

In [4]:
def explore_data(dataset, start, end, rows_and_cols=False):
    data_slice = dataset[start:end]
    for row in data_slice:
        print(row)
        print('\n')
        
    if rows_and_cols:
        print('Number of rows: ', len(dataset))
        print('Number of columns: ', len(dataset[0]))

In [5]:
explore_data(google_header, 0, 13, True)
explore_data(google, 10472, 10473, True)

App


Category


Rating


Reviews


Size


Installs


Type


Price


Content Rating


Genres


Last Updated


Current Ver


Android Ver


Number of rows:  13
Number of columns:  3
['Life Made WI-Fi Touchscreen Photo Frame', '1.9', '19', '3.0M', '1,000+', 'Free', '0', 'Everyone', '', 'February 11, 2018', '1.0.19', '4.0 and up']


Number of rows:  10841
Number of columns:  13


In [6]:
explore_data(apple_header, 0, 13, True)
explore_data(apple, 0, 5, True)

id


track_name


size_bytes


currency


price


rating_count_tot


rating_count_ver


user_rating


user_rating_ver


ver


cont_rating


prime_genre


sup_devices.num


Number of rows:  16
Number of columns:  2
['284882215', 'Facebook', '389879808', 'USD', '0.0', '2974676', '212', '3.5', '3.5', '95.0', '4+', 'Social Networking', '37', '1', '29', '1']


['389801252', 'Instagram', '113954816', 'USD', '0.0', '2161558', '1289', '4.5', '4.0', '10.23', '12+', 'Photo & Video', '37', '0', '29', '1']


['529479190', 'Clash of Clans', '116476928', 'USD', '0.0', '2130805', '579', '4.5', '4.5', '9.24.12', '9+', 'Games', '38', '5', '18', '1']


['420009108', 'Temple Run', '65921024', 'USD', '0.0', '1724546', '3842', '4.5', '4.0', '1.6.2', '9+', 'Games', '40', '5', '1', '1']


['284035177', 'Pandora - Music & Radio', '130242560', 'USD', '0.0', '1126879', '3594', '4.0', '4.5', '8.4.1', '12+', 'Music', '37', '4', '1', '1']


Number of rows:  7197
Number of columns:  16


### Row at index 10472 in the Android dataset has some empty datapoints in certain columns.  Delete row 10472

In [7]:
print(len(google))
del google[10472]
print(len(google))

10841
10840


### In the Android apps store there are duplicate data with the app name, but only differ by the number of user "Reviews" (4th Column). 

In [8]:
for app_google in google:
    name = app_google[0]
    if name == 'Facebook':
        print(app_google)

['Facebook', 'SOCIAL', '4.1', '78158306', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'August 3, 2018', 'Varies with device', 'Varies with device']
['Facebook', 'SOCIAL', '4.1', '78128208', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'August 3, 2018', 'Varies with device', 'Varies with device']


In [9]:
for app_google in google:
    name = app_google[0]
    if name == 'Instagram':
        print(app_google)

['Instagram', 'SOCIAL', '4.5', '66577313', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']
['Instagram', 'SOCIAL', '4.5', '66577446', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']
['Instagram', 'SOCIAL', '4.5', '66577313', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']
['Instagram', 'SOCIAL', '4.5', '66509917', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']


### In the Android (Google Playstore) app store has duplicate app entries for Facebook and Instagram.  The only difference in data is the fourh position (Reviews).

In [10]:
for app_apple in apple:
    name = app_apple[0]
    if name == "Instagram":
        print(app_apple)

### No duplicate entry found in the Apple app store.

In [11]:
duplicate_apps = []
unique_apps = []

for app in google:
    name = app[0]
    if name in unique_apps:
        duplicate_apps.append(name)
        
    else:
        unique_apps.append(name)
        
print('Number of duplicate apps: ', len(duplicate_apps))
print('\n')
print('Examples of duplicate apps: ', duplicate_apps[:15])
print('\n')
print('Number of unique apps: ', len(unique_apps))
print('\n')
print('Examples of unqiue apps: ', unique_apps[:15])

Number of duplicate apps:  1181


Examples of duplicate apps:  ['Quick PDF Scanner + OCR FREE', 'Box', 'Google My Business', 'ZOOM Cloud Meetings', 'join.me - Simple Meetings', 'Box', 'Zenefits', 'Google Ads', 'Google My Business', 'Slack', 'FreshBooks Classic', 'Insightly CRM', 'QuickBooks Accounting: Invoicing & Expenses', 'HipChat - Chat Built for Teams', 'Xero Accounting Software']


Number of unique apps:  9659


Examples of unqiue apps:  ['Photo Editor & Candy Camera & Grid & ScrapBook', 'Coloring book moana', 'U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'Sketch - Draw & Paint', 'Pixel Draw - Number Art Coloring Book', 'Paper flowers instructions', 'Smoke Effect Photo Maker - Smoke Editor', 'Infinite Painter', 'Garden Coloring Book', 'Kids Paint Free - Drawing Fun', 'Text on Photo - Fonteee', 'Name Art Photo Editor - Focus n Filters', 'Tattoo Name On My Photo Editor', 'Mandala Coloring Book', '3D Color Pixel by Number - Sandbox Art Coloring']


### Determination of the of highest number of of reviews in the Android app store.

In [12]:
reviews_max = {}

for rows in google:
    name = rows[0]
    n_reviews = float(rows[3])
    if name in reviews_max and reviews_max[name] < n_reviews:
        reviews_max[name] = n_reviews
        
    elif name not in reviews_max:
        reviews_max[name] = n_reviews
        
print(len(reviews_max))

9659


### Removing Duplicate Rows

In [13]:
android_clean = []
already_added = []

for rows in google:
    name = rows[0]
    n_reviews = float(rows[3])
    
    if (reviews_max[name] == n_reviews) and (name not in already_added):
        android_clean.append(rows)
        already_added.append(name)

print('Number of rows of cleaned Android data: ', len(android_clean))
print('\n')
# print('Cleaned Anroid data: ', android_clean)

Number of rows of cleaned Android data:  9659




In [14]:
explore_data(android_clean, 0, 3, True)

['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 and up']


Number of rows:  9659
Number of columns:  13


### Detection of non-English base apps

In [15]:
def english_lang_det(string):
    for char in string:
        letters = ord(char)
        
        non_english_count = 0
        
        if letters > 127:
            non_english_count += 1
        
        if non_english_count >= 3:
            return False
        else:
            return True

In [16]:
english_lang_det('Instagram')

True

In [17]:
english_lang_det('爱奇艺PPS -《欢乐颂2》电视剧热播')

True

In [18]:
english_lang_det('Docs To Go™ Free Office Suite')

True

In [19]:
english_lang_det('Instachat 😜')

True

In [20]:
apple_english_apps = []
android_english_apps = []

for rows in apple:
    if english_lang_det(rows[1]):
        apple_english_apps.append(rows)
        
for rows in android_clean:
    if english_lang_det(rows[0]):
        android_english_apps.append(rows)
        
explore_data(apple_english_apps, 0, 5, True)

explore_data(android_english_apps, 0, 5, True)
        
# print(android_english_apps)
# print('\n')
# print(apple_english_apps)

['284882215', 'Facebook', '389879808', 'USD', '0.0', '2974676', '212', '3.5', '3.5', '95.0', '4+', 'Social Networking', '37', '1', '29', '1']


['389801252', 'Instagram', '113954816', 'USD', '0.0', '2161558', '1289', '4.5', '4.0', '10.23', '12+', 'Photo & Video', '37', '0', '29', '1']


['529479190', 'Clash of Clans', '116476928', 'USD', '0.0', '2130805', '579', '4.5', '4.5', '9.24.12', '9+', 'Games', '38', '5', '18', '1']


['420009108', 'Temple Run', '65921024', 'USD', '0.0', '1724546', '3842', '4.5', '4.0', '1.6.2', '9+', 'Games', '40', '5', '1', '1']


['284035177', 'Pandora - Music & Radio', '130242560', 'USD', '0.0', '1126879', '3594', '4.0', '4.5', '8.4.1', '12+', 'Music', '37', '4', '1', '1']


Number of rows:  7197
Number of columns:  16
['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 

### Isolating the free apps from the non-free apps

In [21]:
apple_free_apps = []
android_free_apps = []

for rows in apple_english_apps:
    apple_price = rows[4]
    
    if apple_price == '0.0':
        apple_free_apps.append(rows)
        
print(len(apple_free_apps))

for rows in android_english_apps:
    google_price = rows[7]
    
    if google_price == '0':
        android_free_apps.append(rows)
    
        
print(len(android_free_apps))
    

4056
8905


### Analyze previous app store datasets (both Apple and Android) to determine the most widely used and succesful apps in the app stores.  This analysis will inform us what type of apps are likely to attract more users, and it will help guide our app development to be more successful.


### Our validation strategy for our new will be three-folds:
#### 1. Build a lite version of the app and put it into the Google Play store.
#### 2. If the app has good feedback from user, then we will develop the app further.
#### 3. If the app is profitable after 6 months, then we will add build an iOS version and add it to the Apple Store.


### Genres of apps in both datasets: Apple and Android  
#### Family, Books, Sports, Entertainment, Games, Food, Travel, Finance, Health & Fitness, Photo & Video, Medical, Maps & Navigation, Productivity, News, Weather. Tools or Utilities.

In [22]:
def freq_table(dataset, index):
    
    genre = {}
    total = 0
    
    for row in dataset:
        
        total += 1
        
        freq_value = row[index]
        
        if freq_value in genre:
            genre[freq_value] += 1
        else:
            genre[freq_value] = 1
        
        
    table_percentages = {}
    
    for key in genre:
        percentages_value = (genre[key]/total)*100
        table_percentages[key] = percentages_value
        
    return table_percentages

def display_table(dataset, index):
    table = freq_table(dataset, index)
    table_display = []
    for key in table:
        key_value_tuple = (table[key], key)
        table_display.append(key_value_tuple)
        
    table_sorted = sorted(table_display, reverse = True)
    for entry in table_sorted:
        print(entry[1], ':', entry[0])

In [23]:
display_table(android_free_apps, -4) #Frequency table for "Genre" column in Android

Tools : 8.422234699606962
Entertainment : 6.086468276249298
Education : 5.390230207748456
Business : 4.581695676586187
Lifestyle : 3.9191465468837734
Productivity : 3.885457608085345
Finance : 3.6833239752947784
Medical : 3.5148792813026386
Sports : 3.4475014037057834
Personalization : 3.312745648512072
Communication : 3.2341381246490735
Action : 3.0881527231892196
Health & Fitness : 3.065693430656934
Photography : 2.9421673217293653
News & Magazines : 2.829870859067939
Social : 2.6501965188096577
Travel & Local : 2.313307130825379
Shopping : 2.2459292532285233
Books & Reference : 2.1785513756316677
Simulation : 2.0662549129702414
Dating : 1.8528916339135317
Arcade : 1.8416619876473892
Video Players & Editors : 1.7742841100505335
Casual : 1.7518248175182483
Maps & Navigation : 1.4149354295339696
Food & Drink : 1.235261089275688
Puzzle : 1.1229646266142617
Racing : 0.9882088714205502
Role Playing : 0.9320606400898372
Libraries & Demo : 0.9320606400898372
Strategy : 0.9208309938236946
Au

In [24]:
display_table(android_free_apps, 1)  #Frequency table for "Category" column in Android

FAMILY : 18.97810218978102
GAME : 9.70241437394722
TOOLS : 8.433464345873105
BUSINESS : 4.581695676586187
LIFESTYLE : 3.9303761931499155
PRODUCTIVITY : 3.885457608085345
FINANCE : 3.6833239752947784
MEDICAL : 3.5148792813026386
SPORTS : 3.3801235261089273
PERSONALIZATION : 3.312745648512072
COMMUNICATION : 3.2341381246490735
HEALTH_AND_FITNESS : 3.065693430656934
PHOTOGRAPHY : 2.9421673217293653
NEWS_AND_MAGAZINES : 2.829870859067939
SOCIAL : 2.6501965188096577
TRAVEL_AND_LOCAL : 2.3245367770915215
SHOPPING : 2.2459292532285233
BOOKS_AND_REFERENCE : 2.1785513756316677
DATING : 1.8528916339135317
VIDEO_PLAYERS : 1.7967434025828188
MAPS_AND_NAVIGATION : 1.4149354295339696
FOOD_AND_DRINK : 1.235261089275688
EDUCATION : 1.167883211678832
ENTERTAINMENT : 0.9545199326221224
LIBRARIES_AND_DEMO : 0.9320606400898372
AUTO_AND_VEHICLES : 0.9208309938236946
HOUSE_AND_HOME : 0.8197641774284109
WEATHER : 0.7973048848961257
EVENTS : 0.7074677147669848
PARENTING : 0.6513194834362718
ART_AND_DESIGN : 0

### The most common genre in the Google Play store is "Family" genre (~19%) and second is the "Game" genre (9.7%) within the scope of free-apps in English.  The third most widely common genre is the "Tools" category (8.4%).  This indicates that most consumers whom shop in the Google Play Store are parents who purchases apps for their kids either in the "Family" or "Games" genre.

In [25]:
display_table(apple_free_apps,-5)  #Frequency table for "Prime Genre" column in Apple

Games : 55.64595660749507
Entertainment : 8.234714003944774
Photo & Video : 4.117357001972387
Social Networking : 3.5256410256410255
Education : 3.2544378698224854
Shopping : 2.983234714003945
Utilities : 2.687376725838264
Lifestyle : 2.3175542406311638
Finance : 2.0710059171597637
Sports : 1.947731755424063
Health & Fitness : 1.8737672583826428
Music : 1.6518737672583828
Book : 1.6272189349112427
Productivity : 1.5285996055226825
News : 1.4299802761341223
Travel : 1.3806706114398422
Food & Drink : 1.0601577909270217
Weather : 0.7642998027613412
Reference : 0.4930966469428008
Navigation : 0.4930966469428008
Business : 0.4930966469428008
Catalogs : 0.22189349112426035
Medical : 0.19723865877712032


### In the Apple app store the most common genre is "Games" (53.7%), the second most common genre in the Apple app store is "Entertainment" (7.4%), and the third most common genre of apps  "Education" with 6.3% within the scope of the free-and-English base apps only.  Less than 5% of the consumers in the Apple app store use apps in other areas.   Overall within the free English base apps most of the user-base appears are interested in apps that are games and entertainment.

In [26]:
freq_table(apple_free_apps, -5)

{'Social Networking': 3.5256410256410255,
 'Photo & Video': 4.117357001972387,
 'Games': 55.64595660749507,
 'Music': 1.6518737672583828,
 'Reference': 0.4930966469428008,
 'Health & Fitness': 1.8737672583826428,
 'Weather': 0.7642998027613412,
 'Utilities': 2.687376725838264,
 'Travel': 1.3806706114398422,
 'Shopping': 2.983234714003945,
 'News': 1.4299802761341223,
 'Navigation': 0.4930966469428008,
 'Lifestyle': 2.3175542406311638,
 'Entertainment': 8.234714003944774,
 'Food & Drink': 1.0601577909270217,
 'Sports': 1.947731755424063,
 'Book': 1.6272189349112427,
 'Finance': 2.0710059171597637,
 'Education': 3.2544378698224854,
 'Productivity': 1.5285996055226825,
 'Business': 0.4930966469428008,
 'Catalogs': 0.22189349112426035,
 'Medical': 0.19723865877712032}

In [27]:
genre = freq_table(apple_free_apps, -5)

for key in genre:
    total = 0      #Total user ratings for each specific genre
    len_genre = 0  #Total number of apps for each specific genre
    
    for rows in apple_free_apps:
        genre_app = rows[-5]
        if genre_app == key:
            user_ratings = float(rows[5])
            total += user_ratings
            len_genre += 1
            
    avg_number_users = total/len_genre
            
    print(key, ':', avg_number_users)

Social Networking : 53078.195804195806
Photo & Video : 27249.892215568863
Games : 18924.68896765618
Music : 56482.02985074627
Reference : 67447.9
Health & Fitness : 19952.315789473683
Weather : 47220.93548387097
Utilities : 14010.100917431193
Travel : 20216.01785714286
Shopping : 18746.677685950413
News : 15892.724137931034
Navigation : 25972.05
Lifestyle : 8978.308510638299
Entertainment : 10822.961077844311
Food & Drink : 20179.093023255813
Sports : 20128.974683544304
Book : 8498.333333333334
Finance : 13522.261904761905
Education : 6266.333333333333
Productivity : 19053.887096774193
Business : 6367.8
Catalogs : 1779.5555555555557
Medical : 459.75


### The "Reference" genre apps has the highest number of installs.

In [28]:
for apps in apple_free_apps:
    if apps[-5] == 'Reference':
        print(apps[1], ':', apps[5] )

Bible : 985920
Dictionary.com Dictionary & Thesaurus : 200047
Dictionary.com Dictionary & Thesaurus for iPad : 54175
Google Translate : 26786
Muslim Pro: Ramadan 2017 Prayer Times, Azan, Quran : 18418
New Furniture Mods - Pocket Wiki & Game Tools for Minecraft PC Edition : 17588
Merriam-Webster Dictionary : 16849
Night Sky : 12122
City Maps for Minecraft PE - The Best Maps for Minecraft Pocket Edition (MCPE) : 8535
LUCKY BLOCK MOD ™ for Minecraft PC Edition - The Best Pocket Wiki & Mods Installer Tools : 4693
GUNS MODS for Minecraft PC Edition - Mods Tools : 1497
Guides for Pokémon GO - Pokemon GO News and Cheats : 826
WWDC : 762
Horror Maps for Minecraft PE - Download The Scariest Maps for Minecraft Pocket Edition (MCPE) Free : 718
VPN Express : 14
Real Bike Traffic Rider Virtual Reality Glasses : 8
教えて!goo : 0
彩库宝典-【官方版】 : 0
Jishokun-Japanese English Dictionary & Translator : 0
無料で音楽や写真・カメラの裏技アプリ for iPhone7 : 0


### Upon closer examination of the "reference" genre of apps, the Bible and Dictionary.com apps saturates the number of reviews and skews the results.  This makes the "reference" genre more popular than it seems.  For a more accurate assesment removal of the Bible and Dictionary.com app will be necessary.

In [29]:
display_table(android_free_apps, 5)

1,000,000+ : 15.687815833801237
100,000+ : 11.577765300393038
10,000,000+ : 10.499719258843346
10,000+ : 10.252667040988209
1,000+ : 8.422234699606962
100+ : 6.917462099943853
5,000,000+ : 6.816395283548568
500,000+ : 5.53621560920831
50,000+ : 4.817518248175182
5,000+ : 4.525547445255475
10+ : 3.537338573834924
500+ : 3.2341381246490735
50,000,000+ : 2.2908478382930935
100,000,000+ : 2.1224031443009546
50+ : 1.9090398652442448
5+ : 0.7860752386299831
1+ : 0.5165637282425604
500,000,000+ : 0.26951151038742277
1,000,000,000+ : 0.22459292532285235
0+ : 0.044918585064570464
0 : 0.011229646266142616


In [30]:
freq_table(android_free_apps, 1)

{'ART_AND_DESIGN': 0.6513194834362718,
 'AUTO_AND_VEHICLES': 0.9208309938236946,
 'BEAUTY': 0.5951712521055587,
 'BOOKS_AND_REFERENCE': 2.1785513756316677,
 'BUSINESS': 4.581695676586187,
 'COMICS': 0.6288601909039866,
 'COMMUNICATION': 3.2341381246490735,
 'DATING': 1.8528916339135317,
 'EDUCATION': 1.167883211678832,
 'ENTERTAINMENT': 0.9545199326221224,
 'EVENTS': 0.7074677147669848,
 'FINANCE': 3.6833239752947784,
 'FOOD_AND_DRINK': 1.235261089275688,
 'HEALTH_AND_FITNESS': 3.065693430656934,
 'HOUSE_AND_HOME': 0.8197641774284109,
 'LIBRARIES_AND_DEMO': 0.9320606400898372,
 'LIFESTYLE': 3.9303761931499155,
 'GAME': 9.70241437394722,
 'FAMILY': 18.97810218978102,
 'MEDICAL': 3.5148792813026386,
 'SOCIAL': 2.6501965188096577,
 'SHOPPING': 2.2459292532285233,
 'PHOTOGRAPHY': 2.9421673217293653,
 'SPORTS': 3.3801235261089273,
 'TRAVEL_AND_LOCAL': 2.3245367770915215,
 'TOOLS': 8.433464345873105,
 'PERSONALIZATION': 3.312745648512072,
 'PRODUCTIVITY': 3.885457608085345,
 'PARENTING': 0.6

In [31]:
category = freq_table(android_free_apps, 1) 

for key in category:
    
    total = 0  #This variable will store the sum of installs specific to each genre
    len_category = 0 #This variable will store the number of apps specific to each genre.
    
    for row in android_free_apps:
        category_app = row[1]
        if category_app == key:
            n_installs = row[5]
            n_installs = n_installs.replace('+', '')
            n_installs = n_installs.replace(',', '')
            total += float(n_installs) 
            len_category += 1
            
    avg_number_installs = total/len_category
    
    print(key, ':', avg_number_installs)
        
        

ART_AND_DESIGN : 1952105.1724137932
AUTO_AND_VEHICLES : 647317.8170731707
BEAUTY : 513151.88679245283
BOOKS_AND_REFERENCE : 8587351.855670104
BUSINESS : 1708215.906862745
COMICS : 803234.8214285715
COMMUNICATION : 38322625.697916664
DATING : 854028.8303030303
EDUCATION : 1825480.7692307692
ENTERTAINMENT : 11640705.88235294
EVENTS : 253542.22222222222
FINANCE : 1387692.475609756
FOOD_AND_DRINK : 1924897.7363636363
HEALTH_AND_FITNESS : 4188821.9853479853
HOUSE_AND_HOME : 1331540.5616438356
LIBRARIES_AND_DEMO : 638503.734939759
LIFESTYLE : 1436126.94
GAME : 15551995.891203703
FAMILY : 3668870.823076923
MEDICAL : 120550.61980830671
SOCIAL : 23253652.127118643
SHOPPING : 7001693.425
PHOTOGRAPHY : 17772018.759541985
SPORTS : 3638640.1428571427
TRAVEL_AND_LOCAL : 13984077.710144928
TOOLS : 10787009.952063914
PERSONALIZATION : 5183850.806779661
PRODUCTIVITY : 16738957.554913295
PARENTING : 542603.6206896552
WEATHER : 5074486.197183099
VIDEO_PLAYERS : 24573948.25
NEWS_AND_MAGAZINES : 9401635.95

### In the Google Play Store the communciation apps have the most installs: 38,322,626.  Closer examination of the communciations category is needed.  Examine the highest three most installs.

In [32]:
for app in android_free_apps:
    if app[1] == 'COMMUNICATION' and (app[5] == '1,000,000,000+'or app[5] =='500,000,000+' or app[5] == '100,000,000+'):
        print(app[0], ':', app[5])

WhatsApp Messenger : 1,000,000,000+
imo beta free calls and text : 100,000,000+
Android Messages : 100,000,000+
Google Duo - High Quality Video Calls : 500,000,000+
Messenger – Text and Video Chat for Free : 1,000,000,000+
imo free video calls and chat : 500,000,000+
Skype - free IM & video calls : 1,000,000,000+
Who : 100,000,000+
GO SMS Pro - Messenger, Free Themes, Emoji : 100,000,000+
LINE: Free Calls & Messages : 500,000,000+
Google Chrome: Fast & Secure : 1,000,000,000+
Firefox Browser fast & private : 100,000,000+
UC Browser - Fast Download Private & Secure : 500,000,000+
Gmail : 1,000,000,000+
Hangouts : 1,000,000,000+
Messenger Lite: Free Calls & Messages : 100,000,000+
Kik : 100,000,000+
KakaoTalk: Free Calls & Text : 100,000,000+
Opera Mini - fast web browser : 100,000,000+
Opera Browser: Fast and Secure : 100,000,000+
Telegram : 100,000,000+
Truecaller: Caller ID, SMS spam blocking & Dialer : 100,000,000+
UC Browser Mini -Tiny Fast Private & Secure : 100,000,000+
Viber Mess

### The communications category is skewed by apps like Whatsapp, Skype, Googler Chrome, Gmail, Hangouts, and etc.  This makes the communications category more popular than it seems comapred to other categories in the Google Play Store.  

### Removal of of these apps with 100 million installs from the communications category, the average would be spproximately reduced by ten times.

In [33]:
under_100m = []

for app in android_free_apps:
    
    n_installs = app[5]
    n_installs = n_installs.replace(',','')
    n_installs = n_installs.replace('+','')
    
    if (app[1] == 'COMMUNICATION') and (float(n_installs) < 100000000):
        under_100m.append(float(n_installs))

avg_installs_comm = sum(under_100m)/len(under_100m)

print('average communciation installs after filtering: ', avg_installs_comm)

average communciation installs after filtering:  3589717.245210728


### After filtering out apps that has over 100 million installs that skews the communications, we reduced the average almost ten-folds from to 3,589,717

### The video player category has the same pattern, which has the second highest installs of 24,573,948.  However, like the communciations category apps this result is skewed by a few apps like Youtube, Google Play Movies & TV, or MX Player.  So are the other categories like social, photography, and productivity apps.

### All these category of apps doesn't give us an accurate representation of the most popular apps base on the number of installs

### The books and reference category looks fairly popular, with an average installs of 8,587,351.  This might potentially work well and be profitable in both Apple App Store and the Google Play Store. 

### Closer analysis of the books and reference category of apps is needed to narrow the focus specifically kind of book or reference app is needed for development for our company app.

In [34]:
for app in android_free_apps:
    if app[1] == 'BOOKS_AND_REFERENCE':
        print(app[0], ':', app[5])

E-Book Read - Read Book for free : 50,000+
Download free book with green book : 100,000+
Wikipedia : 10,000,000+
Cool Reader : 10,000,000+
Free Panda Radio Music : 100,000+
Book store : 1,000,000+
FBReader: Favorite Book Reader : 10,000,000+
English Grammar Complete Handbook : 500,000+
Free Books - Spirit Fanfiction and Stories : 1,000,000+
Google Play Books : 1,000,000,000+
AlReader -any text book reader : 5,000,000+
Offline English Dictionary : 100,000+
Offline: English to Tagalog Dictionary : 500,000+
FamilySearch Tree : 1,000,000+
Cloud of Books : 1,000,000+
Recipes of Prophetic Medicine for free : 500,000+
ReadEra – free ebook reader : 1,000,000+
Anonymous caller detection : 10,000+
Ebook Reader : 5,000,000+
Litnet - E-books : 100,000+
Read books online : 5,000,000+
English to Urdu Dictionary : 500,000+
eBoox: book reader fb2 epub zip : 1,000,000+
English Persian Dictionary : 500,000+
Flybook : 500,000+
All Maths Formulas : 1,000,000+
Ancestry : 5,000,000+
HTC Help : 10,000,000+
E

### A closer examination of the books and references category it does appear that there are a few apps that are extremely popular that skews the overall results that affects actual number of installs in this category. 

In [36]:
for app in android_free_apps:
    if app[1] == 'BOOKS_AND_REFERENCE' and (app[5] == '1,000,000,000+' or app[5]=='500,000,000+' or app[5]=='100,000,000+'):
        print(app[0], ':', app[5])

Google Play Books : 1,000,000,000+
Bible : 100,000,000+
Amazon Kindle : 100,000,000+
Wattpad 📖 Free Books : 100,000,000+
Audiobooks from Audible : 100,000,000+


### So it appears that there are only five popular apps that skews the overall results compared to the other categores of apps.  This category of apps still has potential for our company to develop an app in this category.  

### We need to narrow are search for the most profitable and popular app in this category further.  We will search somehwere in the "middle" in terms of popularity (between 1,000,000 and 100,000,000 downloads).

In [37]:
for app in android_free_apps:
    if app[1] == 'BOOKS_AND_REFERENCE' and (app[5]=='1,000,000+' or app[5]=='5,000,000+' or app[5]=='10,000,000+' or app[5]=='50,000,000+'):
        print(app[0], ':', app[5])

Wikipedia : 10,000,000+
Cool Reader : 10,000,000+
Book store : 1,000,000+
FBReader: Favorite Book Reader : 10,000,000+
Free Books - Spirit Fanfiction and Stories : 1,000,000+
AlReader -any text book reader : 5,000,000+
FamilySearch Tree : 1,000,000+
Cloud of Books : 1,000,000+
ReadEra – free ebook reader : 1,000,000+
Ebook Reader : 5,000,000+
Read books online : 5,000,000+
eBoox: book reader fb2 epub zip : 1,000,000+
All Maths Formulas : 1,000,000+
Ancestry : 5,000,000+
HTC Help : 10,000,000+
Moon+ Reader : 10,000,000+
English-Myanmar Dictionary : 1,000,000+
Golden Dictionary (EN-AR) : 1,000,000+
All Language Translator Free : 1,000,000+
Aldiko Book Reader : 10,000,000+
Dictionary - WordWeb : 5,000,000+
50000 Free eBooks & Free AudioBooks : 5,000,000+
Al-Quran (Free) : 10,000,000+
Al Quran Indonesia : 10,000,000+
Al'Quran Bahasa Indonesia : 10,000,000+
Al Quran Al karim : 1,000,000+
Al Quran : EAlim - Translations & MP3 Offline : 5,000,000+
Koran Read &MP3 30 Juz Offline : 1,000,000+
H

### In terms of of the "middle" popularity (between 1,000,000 and 100,000,000 downloads) it is dominated by software for processing and reading ebooks, as well as various collections of libraries and dictionaries, so it's probably not a good idea to develop an app similar to these since there will be signifficant competitons. 

### Base on the results of the "middle" popularity there seems to be quite a few number of apps base around the book Quran.  This indicates that this book is popular, and maybe buiding an app around a popular book might potentially be profitable in both the Apple App Store and the Google Play Store.

### Also, the makrets is full of libraries, so for our book app we can add some special features like daily quotes from the book, an audio version of the book, quizzes on the book, a forum where people can host discussion on the book, and etc., rather than simply developing a raw version of the book itself.

## Conclusion

### In this project we analyze two datasets: Apple App Store and Google Play Store with the objective of recommending an app profile that can be profitable for our company in both app stores.

### From our analysis we conclude that taking a popular book and turning it into an app could potentially be profitable in both the Android (Google) and the Apple 