# Profitable App Profiles for the Google Play Store and the App Store

The aim of this project is to find app profiles that are profitable for both the App Store and the Google Play Store.

We only deal with apps that are free to download and install, and our main source of revenue is in-app ads. This means that the revenue from any given app is mostly influenced by the number of people who use it. Our goal for this project is to analyze data to help our developers understand what kinds of apps are likely to attract more users.


In [1]:
from csv import reader

# Read both data sets into lists of rows (the first row holds the column names).
# The files contain non-English app names, so they are opened as UTF-8.
with open('AppleStore.csv', encoding='utf8') as opened_file1:
    appstoredata = list(reader(opened_file1))
with open('googleplaystore.csv', encoding='utf8') as opened_file2:
    gstoredata = list(reader(opened_file2))

In [7]:
def explore_data(dataset, start, end, rows_and_columns=True):
    dataset_slice = dataset[start:end]    
    for row in dataset_slice:
        print(row)
        print('\n') # adds a new (empty) line after each row

    if rows_and_columns:
        print('Number of rows:', len(dataset))
        print('Number of columns:', len(dataset[0]))
        


In [2]:
explore_data(gstoredata[1:],0,4)


In [3]:
explore_data(appstoredata[1:],0,4)

Column names of both data sets

In [13]:
print(gstoredata[0])
print('\n')
print(appstoredata[0])

['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']


['id', 'track_name', 'size_bytes', 'currency', 'price', 'rating_count_tot', 'rating_count_ver', 'user_rating', 'user_rating_ver', 'ver', 'cont_rating', 'prime_genre', 'sup_devices.num', 'ipadSc_urls.num', 'lang.num', 'vpp_lic']
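Because both files have a header row, one alternative (not used in the rest of this notebook) is the standard library's `csv.DictReader`, which maps each row to the column names listed above, so that `row['prime_genre']` can replace positional indices such as `row[11]` or `app[-5]`. The sketch below reads a tiny in-memory sample with made-up values and a subset of the App Store columns instead of the real file.

```python
import csv
import io

# A tiny, made-up sample using a subset of the AppleStore.csv column names.
sample = io.StringIO(
    "id,track_name,price,rating_count_tot,prime_genre\n"
    "1,Example Game,0.0,100,Games\n"
    "2,Example Bible,0.0,250,Reference\n"
)

# DictReader consumes the header row and yields one dict per data row.
rows = list(csv.DictReader(sample))

for row in rows:
    # Columns are accessed by name rather than by position.
    print(row['track_name'], ':', row['prime_genre'])
```

With the real file, passing the handle from `open('AppleStore.csv', encoding='utf8')` to `csv.DictReader` should work the same way; note that every value is still a string and needs `float()` before arithmetic.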


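Data cleaning: inaccurate data should be corrected or removed, and duplicate entries should be removed. First, we look for Google Play rows whose number of columns differs from the header.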

In [15]:
rowlen = len(gstoredata[0])  # number of columns in the header
# Print every row (and its index) whose length differs from the header.
for index, row in enumerate(gstoredata):
    if len(row) != rowlen:
        print(row)
        print(index)

['Life Made WI-Fi Touchscreen Photo Frame', '1.9', '19', '3.0M', '1,000+', 'Free', '0', 'Everyone', '', 'February 11, 2018', '1.0.19', '4.0 and up']
10473


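Row 10473 has only 12 columns instead of 13: its Category value is missing, so every later value shifts one column to the left and the Rating column reads 19, which is impossible on a 5-point scale. We delete this row.

In [16]:
# Run this cell only once: a second run would delete a valid row.
del gstoredata[10473]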

In [17]:
print(gstoredata[10473])

['osmino Wi-Fi: free WiFi', 'TOOLS', '4.2', '134203', '4.1M', '10,000,000+', 'Free', '0', 'Everyone', 'Tools', 'August 7, 2018', '6.06.14', '4.4 and up']


The Google Play data set also contains duplicate entries, which have to be removed. First, let's count how many there are.

In [18]:
duplicate_rows = []
unique_rows = set()  # a set makes the membership test fast
for row in gstoredata[1:]:  # skip the header row
    name = row[0]
    if name in unique_rows:
        duplicate_rows.append(name)
    else:
        unique_rows.add(name)
len(duplicate_rows)

1181

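However, we should not remove duplicates at random. Instead, we keep one entry per app based on a clear criterion: the entry with the highest number of reviews, since it is likely the most recent snapshot of the data. For example, the Instagram entries below differ only in their review counts (column 3).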
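The "keep the most-reviewed entry" rule can be sketched on a handful of simplified `[name, reviews]` rows. The Instagram counts match the duplicates printed below; `Example App` is hypothetical. Unlike the notebook, which first stores the maximum count and then makes a second pass, this sketch stores the whole winning row in one dictionary.

```python
# Simplified rows: [app name, number of reviews]. The Instagram counts match
# the data set; 'Example App' is a hypothetical exact duplicate.
rows = [
    ['Instagram', '66577313'],
    ['Instagram', '66577446'],
    ['Instagram', '66509917'],
    ['Example App', '10'],
    ['Example App', '10'],
]

best_row = {}  # app name -> row with the highest review count seen so far
for row in rows:
    name, n_reviews = row[0], float(row[1])
    # Replace the stored row only if this one has strictly more reviews,
    # so ties keep the first occurrence.
    if name not in best_row or float(best_row[name][1]) < n_reviews:
        best_row[name] = row

deduplicated = list(best_row.values())
print(deduplicated)  # [['Instagram', '66577446'], ['Example App', '10']]
```

Because updating an existing dictionary key keeps its original position, the output preserves the order in which each app first appeared.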

In [25]:
for row in gstoredata[1:]:
    if row[0] == 'Instagram':
        print(row)

['Instagram', 'SOCIAL', '4.5', '66577313', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']
['Instagram', 'SOCIAL', '4.5', '66577446', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']
['Instagram', 'SOCIAL', '4.5', '66577313', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']
['Instagram', 'SOCIAL', '4.5', '66509917', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']


In [23]:
# Map each app name to the highest number of reviews seen for it.
reviews_max = {}
for row in gstoredata[1:]:
    name = row[0]
    n_reviews = float(row[3])
    if name not in reviews_max or reviews_max[name] < n_reviews:
        reviews_max[name] = n_reviews
print(reviews_max['Instagram'])

66577446.0


The code above records, for each app, the highest number of reviews among its duplicate entries. The Instagram example confirms that the largest count (66,577,446) is the one kept.

In [26]:
len(reviews_max)

9659

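The dictionary has 9,659 entries, one per unique app (9,659 unique apps plus 1,181 duplicates account for all 10,840 rows). We now add the cleaned rows to a separate list named `android_clean`, keeping for each app the row whose review count equals the maximum. The `already_added` check ensures that an app whose duplicates share the same maximum is added only once.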

In [77]:
android_clean = []
already_added = set()  # names already copied into android_clean
for row in gstoredata[1:]:
    name = row[0]
    n_reviews = float(row[3])
    # Keep the row with the maximum review count, once per app name.
    if n_reviews == reviews_max[name] and name not in already_added:
        android_clean.append(row)
        already_added.add(name)
len(android_clean)

9659

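We'd like to analyze only the apps that are directed toward an English-speaking audience. However, if we explore the data long enough, we'll find that both data sets have apps with names that suggest they are not. English text is mostly made of ASCII characters (code points 0 to 127, as returned by `ord()`), while emojis and symbols such as ™ fall outside that range. To avoid discarding English names that contain a few such symbols, `check_eng` flags a name as non-English only if it has more than three non-ASCII characters.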

In [55]:
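def check_eng(strng):
    """Return False if strng has more than three non-ASCII characters."""
    asci = []  # local list, so characters don't accumulate across calls
    for char in strng:
        if ord(char) > 127:
            asci.append(char)
    if len(asci) > 3:
        return False
    return True

s = '爱奇艺PPS -《欢乐颂2》电视剧热播'
check_eng(s)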

False
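The same heuristic can be written more compactly by counting non-ASCII characters directly. This is a sketch: the name `is_english` and the parameter `max_non_ascii` are illustrative, and the default of 3 matches `check_eng`, so a few emojis or symbols are tolerated.

```python
def is_english(name, max_non_ascii=3):
    """Return False if `name` has more than `max_non_ascii` non-ASCII characters."""
    # Characters with code points above 127 fall outside the ASCII range.
    non_ascii = sum(1 for char in name if ord(char) > 127)
    return non_ascii <= max_non_ascii

print(is_english('Instagram'))                      # True: plain ASCII
print(is_english('Docs To Go™ Free Office Suite'))  # True: one symbol is tolerated
print(is_english('Instachat 😜'))                   # True: one emoji is tolerated
print(is_english('爱奇艺PPS -《欢乐颂2》电视剧热播'))   # False: 13 non-ASCII characters
```

The `filt` and `filtg` functions later in the notebook use a slightly stricter cut-off (they keep a name only if it has fewer than three non-ASCII characters), so names with exactly three such characters are dropped there.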

The same cleaning steps (checking row lengths and removing duplicates) are repeated for the App Store data.

In [61]:
rowlen = len(appstoredata[0])  # number of columns in the header
# Print every row (and its index) whose length differs from the header.
for index, row in enumerate(appstoredata):
    if len(row) != rowlen:
        print(row)
        print(index)

In [60]:
# Map each App Store app name (track_name, column 1) to its highest
# rating count (rating_count_tot, column 5).
reviewsapp_max = {}
for row in appstoredata[1:]:
    name = row[1]
    n_reviews = float(row[5])
    if name not in reviewsapp_max or reviewsapp_max[name] < n_reviews:
        reviewsapp_max[name] = n_reviews
len(reviewsapp_max)

7195

In [62]:
apple_clean = []
apple_added = set()  # names already copied into apple_clean
for row in appstoredata[1:]:
    name = row[1]
    n_reviews = float(row[5])
    # Keep the row with the maximum rating count, once per app name.
    if n_reviews == reviewsapp_max[name] and name not in apple_added:
        apple_clean.append(row)
        apple_added.add(name)
len(apple_clean)

7195

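Next, we keep only the App Store apps whose names appear to be in English, for the reason explained above. Here a name is kept only if it contains fewer than three non-ASCII characters, a slightly stricter cut-off than the one in `check_eng`.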

In [76]:
def filt(dataset):
    """Keep rows whose app name (column 1) has fewer than 3 non-ASCII characters."""
    filter_app = []
    for row in dataset:
        name = row[1]  # track_name
        non_ascii = [char for char in name if ord(char) > 127]
        if len(non_ascii) < 3:
            filter_app.append(row)
    return filter_app

filter_appdata = filt(apple_clean)
len(filter_appdata)


6153

The same English-name filter is applied to the Google Play apps, whose names are in column 0.

In [78]:
def filtg(dataset):
    """Keep rows whose app name (column 0) has fewer than 3 non-ASCII characters."""
    filter_app = []
    for row in dataset:
        name = row[0]  # App
        non_ascii = [char for char in name if ord(char) > 127]
        if len(non_ascii) < 3:
            filter_app.append(row)
    return filter_app

filter_gstoredata = filtg(android_clean)
len(filter_gstoredata)

9597

Isolating the free apps in the App Store data (those with a price of 0):

In [93]:
free_apstore=[]
for row in filter_appdata:
    if float(row[4])==0:
        free_apstore.append(row)
len(free_apstore)
    

3201

Isolating the free apps in the Google Play data (those with a price of '0'; paid apps have prices such as '$4.99'):

In [94]:
free_gstore = []
for row in filter_gstoredata:
    if row[7] == '0':  # Price column; paid apps look like '$4.99'
        free_gstore.append(row)
len(free_gstore)

8848
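The two free-app filters read prices in different formats: App Store prices are numeric strings such as `'0.0'`, while Google Play prices are strings such as `'0'` or `'$4.99'`. A small helper, hypothetical and not part of the original notebook, could hide that difference:

```python
def is_free(price):
    """Return True if a price string from either store represents a free app.

    Handles App Store values such as '0.0' and Google Play values such as
    '0' or '$4.99'.
    """
    # Strip the dollar sign used by Google Play before converting to float.
    return float(price.replace('$', '')) == 0

prices = ['0.0', '2.99', '0', '$4.99']
print([is_free(p) for p in prices])  # [True, False, True, False]
```

With it, both cells reduce to list comprehensions such as `[row for row in filter_gstoredata if is_free(row[7])]` and `[row for row in filter_appdata if is_free(row[4])]`.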

So far, we have spent a good amount of time cleaning the data. We:

- removed inaccurate data
- removed duplicate app entries
- removed non-English apps
- isolated the free apps

Because our end goal is to add the app to both Google Play and the App Store, we need to find app profiles that are successful in both markets. For instance, a profile that works well for both markets might be a productivity app.

Let's begin the analysis by getting a sense of what the most common genres are in each market.

For this, we build frequency tables of the genres of the free apps in both the App Store and the Google Play Store.

Calculating the frequencies of genres in the App Store:

In [102]:
def freq_gen_ap(dataset, ind):
    """Return a dict mapping each value of column `ind` to its count."""
    gen_freq_ap = {}
    for row in dataset:
        element = row[ind]
        if element in gen_freq_ap:
            gen_freq_ap[element] += 1
        else:
            gen_freq_ap[element] = 1
    return gen_freq_ap

In [104]:
freq_gen_ap(free_apstore,11)

{'Book': 12,
 'Business': 17,
 'Catalogs': 4,
 'Education': 118,
 'Entertainment': 251,
 'Finance': 35,
 'Food & Drink': 26,
 'Games': 1864,
 'Health & Fitness': 65,
 'Lifestyle': 50,
 'Medical': 6,
 'Music': 66,
 'Navigation': 6,
 'News': 43,
 'Photo & Video': 160,
 'Productivity': 56,
 'Reference': 17,
 'Shopping': 83,
 'Social Networking': 106,
 'Sports': 69,
 'Travel': 40,
 'Utilities': 79,
 'Weather': 28}

Calculating frequencies of genres in Google Play store.

In [105]:
def freq_gen_gp(dataset, ind):
    """Return a dict mapping each value of column `ind` to its count."""
    gen_freq_gp = {}
    for row in dataset:
        element = row[ind]
        if element in gen_freq_gp:
            gen_freq_gp[element] += 1
        else:
            gen_freq_gp[element] = 1
    return gen_freq_gp

In [107]:
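# Only a cell's last expression is displayed, so this shows the Category
# column (index 1); pass 9 instead to count the Genres column.
freq_gen_gp(free_gstore, 1)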


{'ART_AND_DESIGN': 57,
 'AUTO_AND_VEHICLES': 82,
 'BEAUTY': 53,
 'BOOKS_AND_REFERENCE': 189,
 'BUSINESS': 407,
 'COMICS': 54,
 'COMMUNICATION': 286,
 'DATING': 165,
 'EDUCATION': 103,
 'ENTERTAINMENT': 85,
 'EVENTS': 63,
 'FAMILY': 1676,
 'FINANCE': 328,
 'FOOD_AND_DRINK': 110,
 'GAME': 858,
 'HEALTH_AND_FITNESS': 273,
 'HOUSE_AND_HOME': 71,
 'LIBRARIES_AND_DEMO': 83,
 'LIFESTYLE': 344,
 'MAPS_AND_NAVIGATION': 123,
 'MEDICAL': 313,
 'NEWS_AND_MAGAZINES': 248,
 'PARENTING': 58,
 'PERSONALIZATION': 294,
 'PHOTOGRAPHY': 261,
 'PRODUCTIVITY': 345,
 'SHOPPING': 199,
 'SOCIAL': 236,
 'SPORTS': 300,
 'TOOLS': 748,
 'TRAVEL_AND_LOCAL': 207,
 'VIDEO_PLAYERS': 159,
 'WEATHER': 70}

In [108]:
def display_table(dataset, ind):
    table = freq_gen_ap(dataset, ind)
    table_display = []
    for key in table:
        key_val_as_tuple = (table[key], key)
        table_display.append(key_val_as_tuple)

    table_sorted = sorted(table_display, reverse = True)
    for entry in table_sorted:
        print(entry[1], ':', entry[0])

In [4]:
display_table(free_apstore,11)

In [5]:
display_table(free_gstore,9)

In [6]:
display_table(free_gstore,1)
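The standard library's `collections.Counter` performs the counting done by `freq_gen_ap` and `freq_gen_gp`, and its `most_common()` method returns `(value, count)` pairs already sorted by count, which `display_table` builds by hand. A minimal sketch, using a hypothetical `show_frequencies` helper and made-up rows, that also prints each value's share as a percentage:

```python
from collections import Counter

def show_frequencies(dataset, index):
    """Print each value in column `index` with its count and percentage share."""
    counts = Counter(row[index] for row in dataset)
    total = len(dataset)
    # most_common() sorts by count, highest first.
    for value, count in counts.most_common():
        print(f'{value} : {count} ({count / total:.1%})')
    return counts

# Made-up rows where column 0 holds a genre.
sample = [['Games'], ['Games'], ['Games'], ['Education']]
counts = show_frequencies(sample, 0)
# Games : 3 (75.0%)
# Education : 1 (25.0%)
```

For the notebook data, `show_frequencies(free_apstore, 11)` would cover the App Store genres and `show_frequencies(free_gstore, 1)` the Google Play categories. One difference: `most_common()` breaks ties by first appearance, whereas `display_table` breaks them by reverse alphabetical order.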

In [112]:
freqap=freq_gen_ap(free_apstore,11)

In [114]:
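# The App Store data has no install counts, so the total number of user
# ratings (rating_count_tot, column 5) is used as a proxy for popularity.
for genre in freqap:
    total = 0
    len_genre = 0
    for app in free_apstore:
        genre_app = app[-5]  # prime_genre (column 11 of 16)
        if genre_app == genre:
            n_ratings = float(app[5])
            total += n_ratings
            len_genre += 1
    avg_n_ratings = total / len_genre
    print(genre, ':', avg_n_ratings)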

Utilities : 19156.493670886077
News : 21248.023255813954
Book : 46384.916666666664
Finance : 32367.02857142857
Photo & Video : 28441.54375
Education : 7003.983050847458
Music : 57326.530303030304
Weather : 52279.892857142855
Health & Fitness : 23298.015384615384
Catalogs : 4004.0
Games : 22910.83100858369
Sports : 23008.898550724636
Lifestyle : 16815.48
Shopping : 27230.734939759037
Food & Drink : 33333.92307692308
Medical : 612.0
Navigation : 86090.33333333333
Social Networking : 71548.34905660378
Business : 7491.117647058823
Reference : 79350.4705882353
Travel : 28243.8
Entertainment : 14195.358565737051
Productivity : 21028.410714285714
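The nested loop above scans every app once per genre. The same averages can be computed in a single pass by accumulating a running total and count per group. In this sketch the helper name `average_by_group` and the sample rows are hypothetical; for the App Store data the group column would be 11 (`prime_genre`) and the value column 5 (`rating_count_tot`).

```python
from collections import defaultdict

def average_by_group(dataset, group_index, value_index):
    """Return {group: mean of column value_index} in one pass over dataset."""
    totals = defaultdict(float)  # group -> sum of values
    counts = defaultdict(int)    # group -> number of rows
    for row in dataset:
        group = row[group_index]
        totals[group] += float(row[value_index])
        counts[group] += 1
    return {group: totals[group] / counts[group] for group in totals}

# Made-up rows: [genre, rating_count_tot]
sample = [['Navigation', '300'], ['Navigation', '100'], ['Weather', '50']]
print(average_by_group(sample, 0, 1))  # {'Navigation': 200.0, 'Weather': 50.0}
```

Called as `average_by_group(free_apstore, 11, 5)`, it should give the same per-genre values as the cell above. Keep in mind that a mean is sensitive to a few very popular apps: the Navigation average of about 86,000 ratings comes mostly from Waze and Google Maps.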


In [115]:
display_table(free_gstore,5)

1,000,000+ : 1394
100,000+ : 1021
10,000,000+ : 935
10,000+ : 902
1,000+ : 743
100+ : 613
5,000,000+ : 604
500,000+ : 492
50,000+ : 422
5,000+ : 397
10+ : 313
500+ : 287
50,000,000+ : 202
100,000,000+ : 189
50+ : 170
5+ : 70
1+ : 45
500,000,000+ : 24
1,000,000,000+ : 20
0+ : 4
0 : 1
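The install counts above are open-ended bins such as `'1,000,000+'`, so they must be converted to numbers before averaging. Removing the commas and the plus sign gives the lower bound of each bin, which means any average computed from them is a rough estimate. A sketch of a reusable parser (the name `parse_installs` is hypothetical) that also lets a popularity filter compare numbers instead of listing bin strings:

```python
def parse_installs(installs):
    """Convert an install bin such as '1,000,000+' to its lower bound as an int."""
    return int(installs.replace(',', '').replace('+', ''))

print(parse_installs('1,000,000+'))  # 1000000
print(parse_installs('0'))           # 0

# Made-up rows: [app name, install bin]
sample = [['A', '100,000,000+'], ['B', '10,000,000+'], ['C', '500,000+']]
# Keep bins from 1,000,000+ up to 50,000,000+, excluding the giants.
mid_popularity = [name for name, installs in sample
                  if 1_000_000 <= parse_installs(installs) <= 50_000_000]
print(mid_popularity)  # ['B']
```

The range condition `1_000_000 <= parse_installs(app[5]) <= 50_000_000` selects exactly the four bins (1,000,000+ through 50,000,000+) that the BOOKS_AND_REFERENCE cell below lists by hand.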


In [118]:
freq_gs = freq_gen_gp(free_gstore, 1)

# Install counts are bins such as '1,000,000+'. Removing ',' and '+' keeps the
# lower bound of each bin, so these averages are approximate.
for category in freq_gs:
    total = 0
    len_category = 0
    for app in free_gstore:
        category_app = app[1]
        if category_app == category:
            n_installs = app[5].replace(',', '').replace('+', '')
            total += float(n_installs)
            len_category += 1
    avg_n_installs = total / len_category
    print(category, ':', avg_n_installs)

TRAVEL_AND_LOCAL : 13984077.710144928
HOUSE_AND_HOME : 1360598.042253521
SHOPPING : 7036877.311557789
PARENTING : 542603.6206896552
MAPS_AND_NAVIGATION : 4049274.6341463416
SPORTS : 3650602.276666667
SOCIAL : 23253652.127118643
PHOTOGRAPHY : 17840110.40229885
BEAUTY : 513151.88679245283
BOOKS_AND_REFERENCE : 8814199.78835979
FOOD_AND_DRINK : 1924897.7363636363
TOOLS : 10830251.970588235
FINANCE : 1387692.475609756
EDUCATION : 1833495.145631068
AUTO_AND_VEHICLES : 647317.8170731707
WEATHER : 5145550.285714285
PRODUCTIVITY : 16787331.344927534
BUSINESS : 1712290.1474201474
FAMILY : 3695641.8198090694
ART_AND_DESIGN : 1986335.0877192982
VIDEO_PLAYERS : 24727872.452830188
HEALTH_AND_FITNESS : 4188821.9853479853
PERSONALIZATION : 5201482.6122448975
NEWS_AND_MAGAZINES : 9549178.467741935
DATING : 854028.8303030303
COMICS : 832613.8888888889
EVENTS : 253542.22222222222
ENTERTAINMENT : 11640705.88235294
LIBRARIES_AND_DEMO : 638503.734939759
LIFESTYLE : 1446158.2238372094
COMMUNICATION : 385905

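In the App Store, Navigation has the highest average number of user ratings. Let's see which apps drive that average.

In [119]: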
for app in free_apstore:
    if app[-5] == 'Navigation':
        print(app[1], ':', app[5])

Waze - GPS Navigation, Maps & Real-time Traffic : 345046
Google Maps - Navigation & Transit : 154911
Geocaching® : 12811
CoPilot GPS – Car Navigation & Offline Maps : 3582
ImmobilienScout24: Real Estate Search in Germany : 187
Railway Route Search : 5


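Waze and Google Maps account for about 500,000 of these ratings, while the other four Navigation apps have fewer than 17,000 combined, so the high average hides a niche dominated by two giants. Reference, which has the second-highest average, is examined next.

In [121]: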
for app in free_apstore:
    if app[-5] == 'Reference':
        print(app[1], ':', app[5])

Bible : 985920
Dictionary.com Dictionary & Thesaurus : 200047
Dictionary.com Dictionary & Thesaurus for iPad : 54175
Google Translate : 26786
Muslim Pro: Ramadan 2017 Prayer Times, Azan, Quran : 18418
New Furniture Mods - Pocket Wiki & Game Tools for Minecraft PC Edition : 17588
Merriam-Webster Dictionary : 16849
Night Sky : 12122
City Maps for Minecraft PE - The Best Maps for Minecraft Pocket Edition (MCPE) : 8535
LUCKY BLOCK MOD ™ for Minecraft PC Edition - The Best Pocket Wiki & Mods Installer Tools : 4693
GUNS MODS for Minecraft PC Edition - Mods Tools : 1497
Guides for Pokémon GO - Pokemon GO News and Cheats : 826
WWDC : 762
Horror Maps for Minecraft PE - Download The Scariest Maps for Minecraft Pocket Edition (MCPE) Free : 718
VPN Express : 14
Real Bike Traffic Rider Virtual Reality Glasses : 8
Jishokun-Japanese English Dictionary & Translator : 0


The Reference genre seems to show some potential, even though the Bible and the dictionaries attract most of its ratings. One thing we could do is take another popular book and turn it into an app with features beyond the raw text of the book, such as daily quotes from the book, an audio version, and quizzes about the book. On top of that, we could embed a dictionary within the app, so users don't need to leave it to look up words in an external app.

Next, let's look at the most popular apps in some Google Play categories, starting with COMMUNICATION.

In [122]:
for app in free_gstore:
    if app[1] == 'COMMUNICATION' and app[5] in ('1,000,000,000+',
                                                 '500,000,000+',
                                                 '100,000,000+'):
        print(app[0], ':', app[5])

WhatsApp Messenger : 1,000,000,000+
imo beta free calls and text : 100,000,000+
Android Messages : 100,000,000+
Google Duo - High Quality Video Calls : 500,000,000+
Messenger – Text and Video Chat for Free : 1,000,000,000+
imo free video calls and chat : 500,000,000+
Skype - free IM & video calls : 1,000,000,000+
Who : 100,000,000+
GO SMS Pro - Messenger, Free Themes, Emoji : 100,000,000+
LINE: Free Calls & Messages : 500,000,000+
Google Chrome: Fast & Secure : 1,000,000,000+
Firefox Browser fast & private : 100,000,000+
UC Browser - Fast Download Private & Secure : 500,000,000+
Gmail : 1,000,000,000+
Hangouts : 1,000,000,000+
Messenger Lite: Free Calls & Messages : 100,000,000+
Kik : 100,000,000+
KakaoTalk: Free Calls & Text : 100,000,000+
Opera Mini - fast web browser : 100,000,000+
Opera Browser: Fast and Secure : 100,000,000+
Telegram : 100,000,000+
Truecaller: Caller ID, SMS spam blocking & Dialer : 100,000,000+
UC Browser Mini -Tiny Fast Private & Secure : 100,000,000+
Viber Mess

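The COMMUNICATION average is inflated by a few giants with 100 million or more installs, such as WhatsApp, Skype, and Gmail. The next cell averages only the COMMUNICATION apps with fewer than 100 million installs.

In [124]: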
under_100_m = []

for app in free_gstore:
    n_installs = app[5]
    n_installs = n_installs.replace(',', '')
    n_installs = n_installs.replace('+', '')
    if (app[1] == 'COMMUNICATION') and (float(n_installs) < 100000000):
        under_100_m.append(float(n_installs))
        
sum(under_100_m) / len(under_100_m)

3617398.420849421

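Without those giants, the remaining COMMUNICATION apps average about 3.6 million installs. Next, we look at BOOKS_AND_REFERENCE, the Google Play category closest to the App Store Reference genre discussed above.

In [125]: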
for app in free_gstore:
    if app[1] == 'BOOKS_AND_REFERENCE':
        print(app[0], ':', app[5])

E-Book Read - Read Book for free : 50,000+
Download free book with green book : 100,000+
Wikipedia : 10,000,000+
Cool Reader : 10,000,000+
Free Panda Radio Music : 100,000+
Book store : 1,000,000+
FBReader: Favorite Book Reader : 10,000,000+
English Grammar Complete Handbook : 500,000+
Free Books - Spirit Fanfiction and Stories : 1,000,000+
Google Play Books : 1,000,000,000+
AlReader -any text book reader : 5,000,000+
Offline English Dictionary : 100,000+
Offline: English to Tagalog Dictionary : 500,000+
FamilySearch Tree : 1,000,000+
Cloud of Books : 1,000,000+
Recipes of Prophetic Medicine for free : 500,000+
ReadEra – free ebook reader : 1,000,000+
Anonymous caller detection : 10,000+
Ebook Reader : 5,000,000+
Litnet - E-books : 100,000+
Read books online : 5,000,000+
English to Urdu Dictionary : 500,000+
eBoox: book reader fb2 epub zip : 1,000,000+
English Persian Dictionary : 500,000+
Flybook : 500,000+
All Maths Formulas : 1,000,000+
Ancestry : 5,000,000+
HTC Help : 10,000,000+
E

In [126]:
for app in free_gstore:
    if app[1] == 'BOOKS_AND_REFERENCE' and app[5] in ('1,000,000+',
                                                      '5,000,000+',
                                                      '10,000,000+',
                                                      '50,000,000+'):
        print(app[0], ':', app[5])

Wikipedia : 10,000,000+
Cool Reader : 10,000,000+
Book store : 1,000,000+
FBReader: Favorite Book Reader : 10,000,000+
Free Books - Spirit Fanfiction and Stories : 1,000,000+
AlReader -any text book reader : 5,000,000+
FamilySearch Tree : 1,000,000+
Cloud of Books : 1,000,000+
ReadEra – free ebook reader : 1,000,000+
Ebook Reader : 5,000,000+
Read books online : 5,000,000+
eBoox: book reader fb2 epub zip : 1,000,000+
All Maths Formulas : 1,000,000+
Ancestry : 5,000,000+
HTC Help : 10,000,000+
Moon+ Reader : 10,000,000+
English-Myanmar Dictionary : 1,000,000+
Golden Dictionary (EN-AR) : 1,000,000+
All Language Translator Free : 1,000,000+
Aldiko Book Reader : 10,000,000+
Dictionary - WordWeb : 5,000,000+
50000 Free eBooks & Free AudioBooks : 5,000,000+
Al-Quran (Free) : 10,000,000+
Al Quran Indonesia : 10,000,000+
Al'Quran Bahasa Indonesia : 10,000,000+
Al Quran Al karim : 1,000,000+
Al Quran : EAlim - Translations & MP3 Offline : 5,000,000+
Koran Read &MP3 30 Juz Offline : 1,000,000+
H

This niche seems to be dominated by software for processing and reading ebooks, as well as various collections of libraries and dictionaries, so it's probably not a good idea to build similar apps since there'll be some significant competition.

# Conclusion

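In this project, we analyzed data about free App Store and Google Play apps to find an app profile that could be profitable in both markets. We concluded that turning a popular book into an app could work in both markets, provided the app adds features beyond the raw text of the book, such as daily quotes, an audio version, quizzes, and an embedded dictionary. Because the Google Play BOOKS_AND_REFERENCE category is already crowded with ebook readers, libraries, and dictionaries, these extra features are what would set the app apart.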

