# Profitable Mobile Apps for Google Play and App Store
This project aims to show developers which types of apps attract the most users. We are only interested in free apps, so we look for the apps that attract the largest number of users: the more users an app has, the more people its advertisements reach.

My goal as the data analyst on this project is to make clear what users demand.

## Opening the datasets

Below, we open the two datasets and then write a function named **explore_data** that explores the data. The function takes four parameters: the dataset, the starting row index to show, the ending row index (exclusive), and a flag, `rows_and_columns`, that controls whether the number of rows and columns is printed.

In [39]:
from csv import reader

# App Store dataset
opened_data=open('AppleStore.csv')
read_file=reader(opened_data)
apple_data=list(read_file)
apple_header=apple_data[0]

# Google Play dataset
opened_data2=open('googleplaystore.csv')
read_file2=reader(opened_data2)
google_data=list(read_file2)
google_header=google_data[0]

def explore_data(dataset, start, end, rows_and_columns=False):
    dataset_slice = dataset[start:end]    
    for row in dataset_slice:
        print(row)
        print('\n') # adds a new (empty) line after each row

    if rows_and_columns:
        print('Number of rows:', len(dataset))
        print('Number of columns:', len(dataset[0]))
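
The loading step can also be wrapped in a helper that closes the file once it has been read. The sketch below is only an illustration: the name `load_csv` and the `utf8` encoding are assumptions, not part of the original notebook, and the demo uses a tiny temporary file so that it runs without the real datasets.

```python
import csv
import os
import tempfile

def load_csv(path, encoding="utf8"):
    """Return (header, rows) from a CSV file; the file is closed on exit.

    `load_csv` is a hypothetical helper and `utf8` is an assumed encoding.
    """
    with open(path, encoding=encoding, newline="") as f:
        rows = list(csv.reader(f))
    return rows[0], rows[1:]

# Write a two-line demo file, read it back, then delete it.
with tempfile.NamedTemporaryFile("w", suffix=".csv", delete=False,
                                 encoding="utf8", newline="") as tmp:
    csv.writer(tmp).writerows([["App", "Price"], ["Demo App", "0"]])
    demo_path = tmp.name

demo_header, demo_rows = load_csv(demo_path)
os.remove(demo_path)
print(demo_header, demo_rows)
```

Returning the header separately from the data rows avoids having to remember the `[1:]` slice in later loops.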
        

## Explore Datasets
Below are the first two rows of the App Store dataset (the header and the first app). The dataset has 7198 rows, including the header, and 16 columns. For more details about this dataset, see [here](https://www.kaggle.com/ramamet4/app-store-apple-data-set-10k-apps). 
The columns of the **Apple Dataset**:
1. id
2. track_name
3. size_bytes
4. currency
5. price
6. rating_count_tot
7. rating_count_ver
8. user_rating
9. user_rating_ver
10. ver
11. cont_rating
12. prime_genre
13. sup_devices.num
14. ipadSc_urls.num
15. lang.num
16. vpp_lic




In [3]:
explore_data(apple_data,0,2,True)

['id', 'track_name', 'size_bytes', 'currency', 'price', 'rating_count_tot', 'rating_count_ver', 'user_rating', 'user_rating_ver', 'ver', 'cont_rating', 'prime_genre', 'sup_devices.num', 'ipadSc_urls.num', 'lang.num', 'vpp_lic']


['284882215', 'Facebook', '389879808', 'USD', '0.0', '2974676', '212', '3.5', '3.5', '95.0', '4+', 'Social Networking', '37', '1', '29', '1']


Number of rows: 7198
Number of columns: 16


The first two rows of the Google Play Store dataset (the header and the first app) are shown below. The dataset has 10842 rows, including the header, and 13 columns. For more details about this dataset, see [here](https://www.kaggle.com/lava18/google-play-store-apps).
The columns of the **Google Dataset**:
1. App
2. Category
3. Rating
4. Reviews
5. Size
6. Installs
7. Type
8. Price
9. Content Rating
10. Genres
11. Last Updated
12. Current Ver
13. Android Ver

In [4]:
explore_data(google_data,0,2, True)

['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']


['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


Number of rows: 10842
Number of columns: 13


## Data Cleaning
Now we need to make sure that our data is accurate. To do this, we will:
1. Detect inaccurate data, and correct or remove it.
2. Detect duplicate data, and remove the duplicates

## Delete the Inaccurate Data
According to the discussion section of the dataset's page, one row of the Google dataset contains wrong information: the row at index 10473. As the output below shows, that row has only 12 values instead of 13. The Category value is missing, so every later value is shifted one column to the left. We print the row and then **delete it**. 

In [5]:
print(google_data[10473])

['Life Made WI-Fi Touchscreen Photo Frame', '1.9', '19', '3.0M', '1,000+', 'Free', '0', 'Everyone', '', 'February 11, 2018', '1.0.19', '4.0 and up']


In [6]:
del (google_data[10473])

In [7]:
explore_data(google_data,0,2,True)

['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']


['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


Number of rows: 10841
Number of columns: 13


As we can see above, after deleting the row, the Google dataset has 10841 rows (the first row being the header). 

We didn't find any reports of rows with wrong data in the discussion section of the Apple dataset.
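
The row printed above has 12 values while the header has 13, so the same kind of error can be searched for directly instead of relying on the discussion section. The sketch below is a minimal illustration on toy data; `find_malformed_rows` is a hypothetical helper, not part of the original notebook.

```python
def find_malformed_rows(dataset):
    """Return (index, row) pairs whose length differs from the header row."""
    expected = len(dataset[0])
    return [(i, row) for i, row in enumerate(dataset) if len(row) != expected]

# Toy data: the last row is missing its Category value.
toy_data = [
    ["App", "Category", "Rating"],
    ["Good App", "TOOLS", "4.2"],
    ["Shifted App", "1.9"],
]

for index, row in find_malformed_rows(toy_data):
    print(index, row)
```

Applied to `google_data` before the deletion, a check like this should flag index 10473, assuming no other row has a missing or extra column.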

## Delete the Duplicate Entries
We also noticed that the Google dataset contains duplicate entries. Below are two examples.

In [8]:
for app in google_data:
    name=app[0]
    if name=='Instagram':
        print(app)

['Instagram', 'SOCIAL', '4.5', '66577313', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']
['Instagram', 'SOCIAL', '4.5', '66577446', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']
['Instagram', 'SOCIAL', '4.5', '66577313', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']
['Instagram', 'SOCIAL', '4.5', '66509917', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']


In [9]:
for app in google_data:
    name=app[0]
    if name=='Subway Surfers':
        print(app)

['Subway Surfers', 'GAME', '4.5', '27722264', '76M', '1,000,000,000+', 'Free', '0', 'Everyone 10+', 'Arcade', 'July 12, 2018', '1.90.0', '4.1 and up']
['Subway Surfers', 'GAME', '4.5', '27723193', '76M', '1,000,000,000+', 'Free', '0', 'Everyone 10+', 'Arcade', 'July 12, 2018', '1.90.0', '4.1 and up']
['Subway Surfers', 'GAME', '4.5', '27724094', '76M', '1,000,000,000+', 'Free', '0', 'Everyone 10+', 'Arcade', 'July 12, 2018', '1.90.0', '4.1 and up']
['Subway Surfers', 'GAME', '4.5', '27725352', '76M', '1,000,000,000+', 'Free', '0', 'Everyone 10+', 'Arcade', 'July 12, 2018', '1.90.0', '4.1 and up']
['Subway Surfers', 'GAME', '4.5', '27725352', '76M', '1,000,000,000+', 'Free', '0', 'Everyone 10+', 'Arcade', 'July 12, 2018', '1.90.0', '4.1 and up']
['Subway Surfers', 'GAME', '4.5', '27711703', '76M', '1,000,000,000+', 'Free', '0', 'Everyone 10+', 'Arcade', 'July 12, 2018', '1.90.0', '4.1 and up']


We collect the duplicate entries and the unique entries in separate lists below.

In [10]:
duplicate_apps =[]
unique_apps=[]
for app in google_data[1:]:
    name=app[0]
    if name in unique_apps:
        duplicate_apps.append(name)
    else:
        unique_apps.append(name)
print('Number of duplicate apps: ', len(duplicate_apps))
print('\n')
print('Examples of duplicate apps: ', duplicate_apps[:15])
print('\n')
print('Number of unique apps: ', len(unique_apps)) 

Number of duplicate apps:  1181


Examples of duplicate apps:  ['Quick PDF Scanner + OCR FREE', 'Box', 'Google My Business', 'ZOOM Cloud Meetings', 'join.me - Simple Meetings', 'Box', 'Zenefits', 'Google Ads', 'Google My Business', 'Slack', 'FreshBooks Classic', 'Insightly CRM', 'QuickBooks Accounting: Invoicing & Expenses', 'HipChat - Chat Built for Teams', 'Xero Accounting Software']


Number of unique apps:  9659


So there are **9659 unique apps** and a total of 1181 duplicate entries of those apps. 
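
The same two counts can be obtained without repeated list lookups (`name in unique_apps` scans the whole list each time). The sketch below uses `collections.Counter` on toy rows; `count_duplicates` is a hypothetical helper name.

```python
from collections import Counter

def count_duplicates(rows, name_index=0):
    """Return (number of unique names, number of duplicate entries)."""
    counts = Counter(row[name_index] for row in rows)
    n_unique = len(counts)
    # Every occurrence beyond the first one of a name is a duplicate entry.
    n_duplicates = sum(counts.values()) - n_unique
    return n_unique, n_duplicates

toy_rows = [
    ["Instagram", "66577313"],
    ["Box", "159"],
    ["Instagram", "66577446"],
    ["Instagram", "66509917"],
]
print(count_duplicates(toy_rows))
```

Here the two numbers add up to the number of rows, just as 9659 + 1181 equals the 10840 data rows of the Google dataset.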

We will remove the duplicate rows, keeping only the most recent entry for each app. To identify it, we use the **'Reviews'** column: the number of reviews grows over time, so the row with the highest number of reviews should be the most recent one. For each app, we therefore keep the row with the highest number of reviews. 

**Removing Duplicate Apps**

Below we create a dictionary in which the names and max number of reviews of unique apps are paired. 

In [11]:
reviews_max={} #name and max reviews
for app in google_data[1:]:
    name=app[0]
    n_reviews=float(app[3])
    if name in reviews_max and n_reviews>reviews_max[name]:
        reviews_max[name]=n_reviews
    elif name not in reviews_max:
        reviews_max.update({name:n_reviews})
print(len(reviews_max))



9659


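Now we remove the unwanted duplicates. To do this, we create a new dataset (a list of lists) named android_clean that stores only the rows we want to keep. The already_added list prevents an app from being added twice when several of its rows share the same maximum number of reviews, as is the case for Subway Surfers above.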

In [12]:
android_clean=[]
already_added=[]
for app in google_data[1:]:
    name=app[0]
    n_reviews=float(app[3])
    if n_reviews == reviews_max[name] and name not in already_added:
        android_clean.append(app)
        already_added.append(name)
explore_data(android_clean,0,2,True)        


['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


Number of rows: 9659
Number of columns: 13
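
The two passes above (first the maximum per name, then the matching rows) can be condensed into a single pass that stores the best row per name. This is only a sketch on toy rows, and `keep_most_reviewed` is a hypothetical helper. One difference from the notebook code: the result is ordered by the first appearance of each name rather than by the position of the kept row.

```python
def keep_most_reviewed(rows, name_index=0, reviews_index=3):
    """Keep, for each app name, the row with the highest review count."""
    best = {}
    for row in rows:
        name = row[name_index]
        # Strict comparison: on a tie, the earlier row is kept.
        if name not in best or float(row[reviews_index]) > float(best[name][reviews_index]):
            best[name] = row
    return list(best.values())

toy_rows = [
    ["Instagram", "SOCIAL", "4.5", "66577313"],
    ["Box", "BUSINESS", "4.2", "159"],
    ["Instagram", "SOCIAL", "4.5", "66577446"],
    ["Instagram", "SOCIAL", "4.5", "66509917"],
]
cleaned = keep_most_reviewed(toy_rows)
for row in cleaned:
    print(row)
```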


## Removing the apps that are not for English speakers
We filter the apps by name: an app is removed if its name contains more than three non-ASCII characters (characters whose code point is above 127). This is not a perfect filter, but it should be fairly effective. 

The function below, named *lang_check*, implements this rule.

In [13]:
def lang_check(string):
    # Returns False as soon as more than 3 non-ASCII characters are found,
    # e.g. 'çağrı' passes, '爱奇艺PPS -《欢乐颂2》电视剧热播' does not.
    countt=0
    for character in string:
        if ord(character) > 127:
            countt+=1
            if countt>3:
                return False
    return True
lang_check('hello')


True
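
A few more names show where the three-character tolerance matters. The sketch below restates the rule in a self-contained form; `is_probably_english` is a hypothetical name used here so the block runs on its own, and the sample names are illustrative.

```python
def is_probably_english(name, max_non_ascii=3):
    """True if the name has at most `max_non_ascii` characters above code point 127."""
    non_ascii = sum(1 for ch in name if ord(ch) > 127)
    return non_ascii <= max_non_ascii

samples = [
    "Instagram",                      # no non-ASCII characters
    "Docs To Go™ Free Office Suite",  # one symbol, still accepted
    "Instachat 😜",                   # one emoji, still accepted
    "çağrı",                          # exactly three non-ASCII characters
    "爱奇艺PPS -《欢乐颂2》电视剧热播",  # many non-ASCII characters, rejected
]
for name in samples:
    print(name, "->", is_probably_english(name))
```

The tolerance keeps English apps whose names contain a trademark sign or an emoji, at the cost of letting through some short non-English names.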

Below we filter both datasets with *lang_check*, keeping only the apps whose names look English.

In [14]:
android_checked=[]
apple_checked=[]

for app in android_clean:
    if lang_check(app[0]):
        android_checked.append(app)

for app in apple_data[1:]:
    if lang_check(app[1]):
        apple_checked.append(app)


explore_data(android_checked, 0,2,True)
print('\n')
explore_data(apple_checked, 0,2,True)
       

['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


Number of rows: 9614
Number of columns: 13


['284882215', 'Facebook', '389879808', 'USD', '0.0', '2974676', '212', '3.5', '3.5', '95.0', '4+', 'Social Networking', '37', '1', '29', '1']


['389801252', 'Instagram', '113954816', 'USD', '0.0', '2161558', '1289', '4.5', '4.0', '10.23', '12+', 'Photo & Video', '37', '0', '29', '1']


Number of rows: 6183
Number of columns: 16


After removing the non-English apps, we have 9614 unique apps in the Google dataset and 6183 apps in the Apple dataset. 

## Remove the non-free apps
Next, we remove the apps that are not free from both lists. 

In [15]:
android=[]
apple=[]

for app in android_checked:
    price=app[7]
    if price=='0':  # 'price' is a string in this dataset and has a dollar sign ($) if not 0
        android.append(app)
        
for app in apple_checked:
    price=float(app[4])
    if price==0:
        apple.append(app)

        
explore_data(android, 0,2,True)
print('\n')
explore_data(apple, 0,2,True) 


    

['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


Number of rows: 8864
Number of columns: 13


['284882215', 'Facebook', '389879808', 'USD', '0.0', '2974676', '212', '3.5', '3.5', '95.0', '4+', 'Social Networking', '37', '1', '29', '1']


['389801252', 'Instagram', '113954816', 'USD', '0.0', '2161558', '1289', '4.5', '4.0', '10.23', '12+', 'Photo & Video', '37', '0', '29', '1']


Number of rows: 3222
Number of columns: 16
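
The two loops above test the price differently because the datasets store it differently. A single helper could normalize both. The sketch below assumes paid Google Play prices look like `'$4.99'` (a dollar sign followed by a number, as the code comment above suggests); `parse_price` and `is_free` are hypothetical names.

```python
def parse_price(price_text):
    """Convert a price string such as '0', '0.0' or '$4.99' to a float."""
    return float(price_text.replace("$", "").strip())

def is_free(price_text):
    """True when the parsed price is zero."""
    return parse_price(price_text) == 0.0

for text in ["0", "0.0", "$4.99"]:
    print(repr(text), "->", is_free(text))
```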


So far, we have cleaned the data in four steps: 
1. Removed inaccurate data
2. Removed duplicate app entries
3. Removed non-English apps
4. Isolated the free apps

We are left with **8864 apps in the Google dataset** (android) and **3222 apps in the Apple dataset** (apple). 

Now, we will start analyzing the datasets: android and apple. 

Our aim is to determine the kinds of apps that are likely to attract more users because our revenue is highly influenced by the number of people using our apps.

We want to find an app profile that fits both the App Store and Google Play. To minimize risk and overhead, we validate an app idea in three steps:
1. We first build an Android version of the app, and add it to Google Play.
2. If the app has a good response from users, we develop it further.
3. If the app is profitable after six months, we build an iOS version of the app and add it to the App Store.

## Exploring Datasets - Frequency Tables

We start by exploring which kinds of apps are most common in each market.

We first build frequency tables for the genre columns (prime_genre in the App Store dataset, Category and Genres in the Google Play dataset). Each table gives the percentage of apps that belong to each genre. 





In [16]:
def freq_table(dataset, index):  # returns a dictionary
    freq_dict={}
    totall=0
    for app in dataset:
        totall+=1
        rr= app[index]
        if rr in freq_dict:
            freq_dict[rr]+=1
        else:
            freq_dict[rr]=1
            
    percentages={}
    for key in freq_dict:
        perc=(freq_dict[key]/totall)*100
        percentages[key]=perc
            
    return percentages

In [17]:
freq_table(apple,11)

{'Book': 0.4345127250155183,
 'Business': 0.5276225946617008,
 'Catalogs': 0.12414649286157665,
 'Education': 3.662321539416512,
 'Entertainment': 7.883302296710118,
 'Finance': 1.1173184357541899,
 'Food & Drink': 0.8069522036002483,
 'Games': 58.16263190564867,
 'Health & Fitness': 2.0173805090006205,
 'Lifestyle': 1.5828677839851024,
 'Medical': 0.186219739292365,
 'Music': 2.0484171322160147,
 'Navigation': 0.186219739292365,
 'News': 1.3345747982619491,
 'Photo & Video': 4.9658597144630665,
 'Productivity': 1.7380509000620732,
 'Reference': 0.5586592178770949,
 'Shopping': 2.60707635009311,
 'Social Networking': 3.2898820608317814,
 'Sports': 2.1415270018621975,
 'Travel': 1.2414649286157666,
 'Utilities': 2.5139664804469275,
 'Weather': 0.8690254500310366}

In [18]:
def display_table(dataset, index):
    table=freq_table(dataset, index)
    table_display=[]
    for key in table:
        key_val_as_tuple=(table[key],key)
        table_display.append(key_val_as_tuple)

    # sort by percentage, highest first
    table_sorted=sorted(table_display, reverse=True)
    for entry in table_sorted:
        print(entry[1],':',entry[0])
            


In [19]:
display_table(apple, 11)  # prime_genre

Games : 58.16263190564867
Entertainment : 7.883302296710118
Photo & Video : 4.9658597144630665
Education : 3.662321539416512
Social Networking : 3.2898820608317814
Shopping : 2.60707635009311
Utilities : 2.5139664804469275
Sports : 2.1415270018621975
Music : 2.0484171322160147
Health & Fitness : 2.0173805090006205
Productivity : 1.7380509000620732
Lifestyle : 1.5828677839851024
News : 1.3345747982619491
Travel : 1.2414649286157666
Finance : 1.1173184357541899
Weather : 0.8690254500310366
Food & Drink : 0.8069522036002483
Reference : 0.5586592178770949
Business : 0.5276225946617008
Book : 0.4345127250155183
Navigation : 0.186219739292365
Medical : 0.186219739292365
Catalogs : 0.12414649286157665


For the App Store apps, the most common genre by far is **Games** (about 58% of the free English apps), followed by **Entertainment** (about 8%). The list mostly continues with other entertainment and social networking kinds of apps. Apps for practical purposes, such as education, shopping, and utilities, are less common than apps for fun. 
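
A sorted percentage table like the one above can also be built with `collections.Counter`, whose `most_common()` method returns entries from most to least frequent. The sketch below runs on toy rows; `freq_table_sorted` is a hypothetical name and does not replace the notebook's `freq_table` and `display_table`.

```python
from collections import Counter

def freq_table_sorted(dataset, index):
    """Return (value, percentage) pairs, most frequent value first."""
    counts = Counter(row[index] for row in dataset)
    total = sum(counts.values())
    return [(value, count / total * 100) for value, count in counts.most_common()]

toy_apps = [["a", "Games"], ["b", "Games"], ["c", "Games"], ["d", "Music"]]
for genre, percentage in freq_table_sorted(toy_apps, 1):
    print(genre, ":", percentage)
```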

In [20]:
display_table(android,1 )

FAMILY : 18.907942238267147
GAME : 9.724729241877256
TOOLS : 8.461191335740072
BUSINESS : 4.591606498194946
LIFESTYLE : 3.9034296028880866
PRODUCTIVITY : 3.892148014440433
FINANCE : 3.7003610108303246
MEDICAL : 3.531137184115524
SPORTS : 3.395758122743682
PERSONALIZATION : 3.3167870036101084
COMMUNICATION : 3.2378158844765346
HEALTH_AND_FITNESS : 3.0798736462093865
PHOTOGRAPHY : 2.944494584837545
NEWS_AND_MAGAZINES : 2.7978339350180503
SOCIAL : 2.6624548736462095
TRAVEL_AND_LOCAL : 2.33528880866426
SHOPPING : 2.2450361010830324
BOOKS_AND_REFERENCE : 2.1435018050541514
DATING : 1.861462093862816
VIDEO_PLAYERS : 1.7937725631768955
MAPS_AND_NAVIGATION : 1.3989169675090252
FOOD_AND_DRINK : 1.2409747292418771
EDUCATION : 1.1620036101083033
ENTERTAINMENT : 0.9589350180505415
LIBRARIES_AND_DEMO : 0.9363718411552346
AUTO_AND_VEHICLES : 0.9250902527075812
HOUSE_AND_HOME : 0.8235559566787004
WEATHER : 0.8009927797833934
EVENTS : 0.7107400722021661
PARENTING : 0.6543321299638989
ART_AND_DESIGN : 

In [21]:
display_table(android,9)

Tools : 8.449909747292418
Entertainment : 6.069494584837545
Education : 5.347472924187725
Business : 4.591606498194946
Productivity : 3.892148014440433
Lifestyle : 3.892148014440433
Finance : 3.7003610108303246
Medical : 3.531137184115524
Sports : 3.463447653429603
Personalization : 3.3167870036101084
Communication : 3.2378158844765346
Action : 3.1024368231046933
Health & Fitness : 3.0798736462093865
Photography : 2.944494584837545
News & Magazines : 2.7978339350180503
Social : 2.6624548736462095
Travel & Local : 2.3240072202166067
Shopping : 2.2450361010830324
Books & Reference : 2.1435018050541514
Simulation : 2.0419675090252705
Dating : 1.861462093862816
Arcade : 1.8501805054151623
Video Players & Editors : 1.7712093862815883
Casual : 1.7599277978339352
Maps & Navigation : 1.3989169675090252
Food & Drink : 1.2409747292418771
Puzzle : 1.128158844765343
Racing : 0.9927797833935018
Role Playing : 0.9363718411552346
Libraries & Demo : 0.9363718411552346
Auto & Vehicles : 0.9250902527075

Google Play apps show a more balanced distribution of genres. Apps for practical purposes, such as Tools, Business, and Lifestyle, are nearly as frequent as apps for fun. By genre, the most common is Tools and the second is Entertainment. By category, the most common is Family and the runner-up is Game, a big contrast with the picture in the App Store. 

## Most Popular Apps by Genre on the App Store
The frequency tables show how many apps each genre has, not how many users those apps have. We start with the App Store dataset and use **rating_count_tot**, the total number of user ratings, as a proxy for the number of users. 

In [22]:
unique_genres=freq_table(apple, 11)  # prime_genre

In [23]:
for genre in unique_genres:
    total=0
    len_genre=0
    for app in apple:
        genre_app=app[11]
        if genre_app ==genre:
            user_ratings=float(app[5]) # rating_count_total index: 5
            total+=user_ratings
            len_genre+=1
    avg_user_rating=total/len_genre
    print(genre,': ',avg_user_rating)
        

Health & Fitness :  23298.015384615384
News :  21248.023255813954
Productivity :  21028.410714285714
Finance :  31467.944444444445
Medical :  612.0
Games :  22788.6696905016
Utilities :  18684.456790123455
Lifestyle :  16485.764705882353
Photo & Video :  28441.54375
Weather :  52279.892857142855
Navigation :  86090.33333333333
Food & Drink :  33333.92307692308
Reference :  74942.11111111111
Catalogs :  4004.0
Sports :  23008.898550724636
Business :  7491.117647058823
Education :  7003.983050847458
Entertainment :  14029.830708661417
Music :  57326.530303030304
Social Networking :  71548.34905660378
Travel :  28243.8
Shopping :  26919.690476190477
Book :  39758.5
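
The loop above rescans the whole dataset once per genre. The same averages can be computed in a single pass by accumulating a running total and a count per genre. The sketch below is an illustration on toy rows; `average_by_group` is a hypothetical helper.

```python
from collections import defaultdict

def average_by_group(dataset, group_index, value_index):
    """Return {group: average of the numeric column} in one pass over the rows."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for row in dataset:
        group = row[group_index]
        totals[group] += float(row[value_index])
        counts[group] += 1
    return {group: totals[group] / counts[group] for group in totals}

toy_apps = [
    ["Waze", "Navigation", "345046"],
    ["Google Maps", "Navigation", "154911"],
    ["Bible", "Reference", "985920"],
]
for genre, average in average_by_group(toy_apps, 1, 2).items():
    print(genre, ":", average)
```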


The average number of ratings shows that games are not the only popular apps among App Store users. Navigation apps have the highest average number of ratings, followed by Reference, Social Networking, Music, and Weather. Book apps also average more ratings than Games. Let's examine these genres in more detail. Navigation seems to be popular because of two big apps: Waze and Google Maps. 

In [24]:
for app in apple:
    if app[11]=='Navigation':
        print(app[1],': ', app[5])

Waze - GPS Navigation, Maps & Real-time Traffic :  345046
Google Maps - Navigation & Transit :  154911
Geocaching® :  12811
CoPilot GPS – Car Navigation & Offline Maps :  3582
ImmobilienScout24: Real Estate Search in Germany :  187
Railway Route Search :  5
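
One way to quantify how much a genre depends on its biggest apps is the share of all ratings held by the top few. The sketch below uses the six Navigation figures printed above; `top_share` is a hypothetical helper, and the choice of the top two apps is an assumption made for illustration.

```python
# Rating counts copied from the Navigation output above.
navigation_ratings = {
    "Waze - GPS Navigation, Maps & Real-time Traffic": 345046,
    "Google Maps - Navigation & Transit": 154911,
    "Geocaching®": 12811,
    "CoPilot GPS – Car Navigation & Offline Maps": 3582,
    "ImmobilienScout24: Real Estate Search in Germany": 187,
    "Railway Route Search": 5,
}

def top_share(counts, n=2):
    """Fraction of the total held by the n largest values."""
    values = sorted(counts.values(), reverse=True)
    return sum(values[:n]) / sum(values)

share = top_share(navigation_ratings)
print(f"Top 2 apps hold {share:.1%} of all Navigation ratings")
```

With these figures, Waze and Google Maps together account for roughly 97% of the ratings in the genre, which is why the genre average says little about a typical Navigation app.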


Let's look at the social networking apps. 

In [25]:
for app in apple:
    if app[11]=='Social Networking':
        print(app[1],': ', app[5])

Facebook :  2974676
Pinterest :  1061624
Skype for iPhone :  373519
Messenger :  351466
Tumblr :  334293
WhatsApp Messenger :  287589
Kik :  260965
ooVoo – Free Video Call, Text and Voice :  177501
TextNow - Unlimited Text + Calls :  164963
Viber Messenger – Text & Call :  164249
Followers - Social Analytics For Instagram :  112778
MeetMe - Chat and Meet New People :  97072
We Heart It - Fashion, wallpapers, quotes, tattoos :  90414
InsTrack for Instagram - Analytics Plus More :  85535
Tango - Free Video Call, Voice and Chat :  75412
LinkedIn :  71856
Match™ - #1 Dating App. :  60659
Skype for iPad :  60163
POF - Best Dating App for Conversations :  52642
Timehop :  49510
Find My Family, Friends & iPhone - Life360 Locator :  43877
Whisper - Share, Express, Meet :  39819
Hangouts :  36404
LINE PLAY - Your Avatar World :  34677
WeChat :  34584
Badoo - Meet New People, Chat, Socialize. :  34428
Followers + for Instagram - Follower Analytics :  28633
GroupMe :  28260
Marco Polo Video Walki

Social networking apps have a high average number of ratings because of a few very popular apps such as Facebook, Pinterest, and WhatsApp. So the high average is misleading for our purpose. 

Let's check the Reference genre:


In [26]:
for app in apple:
    if app[11]=='Reference':
        print(app[1],': ', app[5])

Bible :  985920
Dictionary.com Dictionary & Thesaurus :  200047
Dictionary.com Dictionary & Thesaurus for iPad :  54175
Google Translate :  26786
Muslim Pro: Ramadan 2017 Prayer Times, Azan, Quran :  18418
New Furniture Mods - Pocket Wiki & Game Tools for Minecraft PC Edition :  17588
Merriam-Webster Dictionary :  16849
Night Sky :  12122
City Maps for Minecraft PE - The Best Maps for Minecraft Pocket Edition (MCPE) :  8535
LUCKY BLOCK MOD ™ for Minecraft PC Edition - The Best Pocket Wiki & Mods Installer Tools :  4693
GUNS MODS for Minecraft PC Edition - Mods Tools :  1497
Guides for Pokémon GO - Pokemon GO News and Cheats :  826
WWDC :  762
Horror Maps for Minecraft PE - Download The Scariest Maps for Minecraft Pocket Edition (MCPE) Free :  718
VPN Express :  14
Real Bike Traffic Rider Virtual Reality Glasses :  8
教えて!goo :  0
Jishokun-Japanese English Dictionary & Translator :  0


The Reference genre seems promising because its apps are popular. Two apps dominate, Bible and Dictionary.com, and it should be possible to create a new app about books and references that is not an alternative to the Bible or a dictionary. So there seems to be potential in choosing a popular book and building an app around it. App Store users are more interested in apps for fun, so the book could be chosen accordingly.  

## Most Popular Apps on Google Play
Now let's look at the Google Play dataset and come up with an app profile recommendation for Google Play. We should keep in mind that our recommendation should work in both markets. 


In [29]:
unique_cat=freq_table(android,1)

for category in unique_cat:
    total=0
    len_category=0
    for app in android:
        category_app=app[1]
        if category_app==category:
            installs=app[5]  # Installs index: 5
            installs= installs.replace('+','')
            installs=float(installs.replace(',',''))
            total+=installs
            len_category+=1
    avg_installs=total/len_category
    print(category,': ', avg_installs)

ENTERTAINMENT :  11640705.88235294
HOUSE_AND_HOME :  1331540.5616438356
FAMILY :  3695641.8198090694
FINANCE :  1387692.475609756
EVENTS :  253542.22222222222
ART_AND_DESIGN :  1986335.0877192982
HEALTH_AND_FITNESS :  4188821.9853479853
TRAVEL_AND_LOCAL :  13984077.710144928
BEAUTY :  513151.88679245283
WEATHER :  5074486.197183099
TOOLS :  10801391.298666667
SPORTS :  3638640.1428571427
EDUCATION :  1833495.145631068
BUSINESS :  1712290.1474201474
SHOPPING :  7036877.311557789
LIFESTYLE :  1437816.2687861272
FOOD_AND_DRINK :  1924897.7363636363
PERSONALIZATION :  5201482.6122448975
PHOTOGRAPHY :  17840110.40229885
GAME :  15588015.603248259
VIDEO_PLAYERS :  24727872.452830188
BOOKS_AND_REFERENCE :  8767811.894736841
MAPS_AND_NAVIGATION :  4056941.7741935486
MEDICAL :  120550.61980830671
NEWS_AND_MAGAZINES :  9549178.467741935
SOCIAL :  23253652.127118643
COMICS :  817657.2727272727
PARENTING :  542603.6206896552
LIBRARIES_AND_DEMO :  638503.734939759
PRODUCTIVITY :  16787331.344927534
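
The Installs values are strings such as `'10,000+'`, so the loop above strips the plus sign and the commas before converting them. That cleaning step could be factored into a small helper and reused in later cells. The sketch below uses toy rows with only two columns; `parse_installs` and `average_installs` are hypothetical names.

```python
def parse_installs(installs_text):
    """Convert an Installs string such as '1,000,000+' to an integer."""
    return int(installs_text.replace("+", "").replace(",", ""))

def average_installs(rows, category, category_index=1, installs_index=5):
    """Average parsed installs over the rows of one category."""
    values = [parse_installs(row[installs_index])
              for row in rows if row[category_index] == category]
    return sum(values) / len(values)

# Toy rows: [category, installs]; the real dataset uses indexes 1 and 5.
toy_rows = [["TOOLS", "1,000+"], ["TOOLS", "5,000+"], ["GAME", "10,000+"]]
print(parse_installs("1,000,000,000+"))
print(average_installs(toy_rows, "TOOLS", category_index=0, installs_index=1))
```

Note that a value like `'10,000+'` only gives a lower bound, so any average computed this way is an approximation.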

Communication apps have the highest average number of installs: about 38M. Let's take a closer look. Below we see that this category is dominated by big messaging apps such as Messenger and WhatsApp. 

In [28]:
for app in android:
    if app[1]=='COMMUNICATION':
        print(app[0],': ', app[5])

WhatsApp Messenger :  1,000,000,000+
Messenger for SMS :  10,000,000+
My Tele2 :  5,000,000+
imo beta free calls and text :  100,000,000+
Contacts :  50,000,000+
Call Free – Free Call :  5,000,000+
Web Browser & Explorer :  5,000,000+
Browser 4G :  10,000,000+
MegaFon Dashboard :  10,000,000+
ZenUI Dialer & Contacts :  10,000,000+
Cricket Visual Voicemail :  10,000,000+
TracFone My Account :  1,000,000+
Xperia Link™ :  10,000,000+
TouchPal Keyboard - Fun Emoji & Android Keyboard :  10,000,000+
Skype Lite - Free Video Call & Chat :  5,000,000+
My magenta :  1,000,000+
Android Messages :  100,000,000+
Google Duo - High Quality Video Calls :  500,000,000+
Seznam.cz :  1,000,000+
Antillean Gold Telegram (original version) :  100,000+
AT&T Visual Voicemail :  10,000,000+
GMX Mail :  10,000,000+
Omlet Chat :  10,000,000+
My Vodacom SA :  5,000,000+
Microsoft Edge :  5,000,000+
Messenger – Text and Video Chat for Free :  1,000,000,000+
imo free video calls and chat :  500,000,000+
Calls & Tex

Below we exclude the giant communication apps, those with at least 100M installs, and recompute the average. The average number of installs for the remaining apps is around 3.6M. 

In [31]:
under_100M=[]

for app in android:
    installs=app[5]  # Installs index: 5
    installs= installs.replace('+','')
    installs=float(installs.replace(',',''))
    if (app[1] == 'COMMUNICATION') and (installs < 100000000):
        under_100M.append(installs)
avg_installs=sum(under_100M)/len(under_100M)
print(avg_installs)

3603485.3884615386
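
Excluding apps above a fixed threshold is one way to limit the influence of a few giants. Another option, not used in this notebook, is to look at the median, which is far less sensitive to extreme values than the mean. The sketch below compares the two on made-up install counts.

```python
from statistics import mean, median

# Hypothetical install counts for one category: two giants and five smaller apps.
toy_installs = [1_000_000_000, 100_000_000, 10_000_000,
                1_000_000, 1_000_000, 500_000, 100_000]

threshold = 100_000_000
without_giants = [n for n in toy_installs if n < threshold]

print("mean of all apps:", mean(toy_installs))
print("mean without giants:", mean(without_giants))
print("median of all apps:", median(toy_installs))
```

In this toy example the mean of all apps is more than a hundred times the median, while the median stays close to what a typical app in the list achieves.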


The category with the second-highest average number of installs is Video Players.


In [32]:
for app in android:
    if app[1]=='VIDEO_PLAYERS':
        print(app[0],': ', app[5])

YouTube :  1,000,000,000+
All Video Downloader 2018 :  1,000,000+
Video Downloader :  10,000,000+
HD Video Player :  1,000,000+
Iqiyi (for tablet) :  1,000,000+
Video Player All Format :  10,000,000+
Motorola Gallery :  100,000,000+
Free TV series :  100,000+
Video Player All Format for Android :  500,000+
VLC for Android :  100,000,000+
Code :  10,000,000+
Vote for :  50,000,000+
XX HD Video downloader-Free Video Downloader :  1,000,000+
OBJECTIVE :  1,000,000+
Music - Mp3 Player :  10,000,000+
HD Movie Video Player :  1,000,000+
YouCut - Video Editor & Video Maker, No Watermark :  5,000,000+
Video Editor,Crop Video,Movie Video,Music,Effects :  1,000,000+
YouTube Studio :  10,000,000+
video player for android :  10,000,000+
Vigo Video :  50,000,000+
Google Play Movies & TV :  1,000,000,000+
HTC Service － DLNA :  10,000,000+
VPlayer :  1,000,000+
MiniMovie - Free Video and Slideshow Editor :  50,000,000+
Samsung Video Library :  50,000,000+
OnePlus Gallery :  1,000,000+
LIKE – Magic Vi

"Video Players" is also dominated by a few big apps such as YouTube and Google Play Movies & TV. The same trend appears in categories such as Social and Photography: a few apps owned by big companies attract most of the users. Recommending an app type to compete with those giants is outside the scope of this analysis. 

Let's look at other categories. Books and Reference is worth checking. 


for app in android:
    if app[1] == 'BOOKS_AND_REFERENCE':
        print(app[0], ':', app[5])

Dictionaries, religious book apps, some book-reading apps, and Wikipedia seem to have the most installs. Let's first list the few highly installed apps and then look at the rest.

In [34]:
for app in android:
    if app[1] == 'BOOKS_AND_REFERENCE' and (app[5] == '1,000,000,000+' or app[5] == '500,000,000+' or app[5] == '100,000,000+'):
        print(app[0], ':', app[5])

Google Play Books : 1,000,000,000+
Bible : 100,000,000+
Amazon Kindle : 100,000,000+
Wattpad 📖 Free Books : 100,000,000+
Audiobooks from Audible : 100,000,000+


Let's see the rest of the apps.

In [38]:
for app in android:
    if app[1] == 'BOOKS_AND_REFERENCE' and (app[5] != '1,000,000,000+' and app[5] != '500,000,000+' and app[5] != '100,000,000+'):
        print(app[0], ':', app[5])

E-Book Read - Read Book for free : 50,000+
Download free book with green book : 100,000+
Wikipedia : 10,000,000+
Cool Reader : 10,000,000+
Free Panda Radio Music : 100,000+
Book store : 1,000,000+
FBReader: Favorite Book Reader : 10,000,000+
English Grammar Complete Handbook : 500,000+
Free Books - Spirit Fanfiction and Stories : 1,000,000+
AlReader -any text book reader : 5,000,000+
Offline English Dictionary : 100,000+
Offline: English to Tagalog Dictionary : 500,000+
FamilySearch Tree : 1,000,000+
Cloud of Books : 1,000,000+
Recipes of Prophetic Medicine for free : 500,000+
ReadEra – free ebook reader : 1,000,000+
Anonymous caller detection : 10,000+
Ebook Reader : 5,000,000+
Litnet - E-books : 100,000+
Read books online : 5,000,000+
English to Urdu Dictionary : 500,000+
eBoox: book reader fb2 epub zip : 1,000,000+
English Persian Dictionary : 500,000+
Flybook : 500,000+
All Maths Formulas : 1,000,000+
Ancestry : 5,000,000+
HTC Help : 10,000,000+
English translation from Bengali : 1

Building an app around a popular book seems promising for attracting users. Our analysis of the App Store data also showed the potential of this category. So we recommend choosing a popular book and building an app around it. It would be a good idea to be creative and add some new features to the app.   