# Profitable Apps in the Google Play Store and Apple App Store Markets

The aim of this project is to find the kinds of mobile apps that are likely to be profitable on the Google Play Store and the Apple App Store. I work as a data analyst for a company that develops Android and iOS mobile apps, and my job is to help our developers make data-driven decisions about the kinds of apps they build.

Our apps are free to download and install, so the company's only source of revenue is in-app ads. This means that the revenue from any given app depends mostly on the number of people who use it.

Our goal for this project is therefore to analyze data on free apps in order to help developers understand which kinds (genres) of apps attract the most users.

## Opening and exploring the datasets

In [1]:
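from csv import reader

def open_dataset(filename):
    # Read the whole CSV into a list of rows; the file is closed automatically.
    # encoding='utf8' is an assumption about how the CSV files were saved.
    with open(filename, encoding='utf8') as opened_file:
        return list(reader(opened_file))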

In [2]:
def explore_data(dataset,start,end,rows_and_cols=False):
    sliced_data=dataset[start:end]
    
    for i in sliced_data:
        print(i)
        print('\n')
        
    if rows_and_cols:
        print('Number of rows:',len(dataset))
        print('Number of columns:',len(dataset[0]))

In [3]:
ios_dataset=open_dataset("AppleStore.csv")
ios_apps_data_header=ios_dataset[0]
ios_apps_data=ios_dataset[1:]
print(ios_apps_data_header)
print('\n')
explore_data(ios_apps_data,0,3,True)

['id', 'track_name', 'size_bytes', 'currency', 'price', 'rating_count_tot', 'rating_count_ver', 'user_rating', 'user_rating_ver', 'ver', 'cont_rating', 'prime_genre', 'sup_devices.num', 'ipadSc_urls.num', 'lang.num', 'vpp_lic']


['284882215', 'Facebook', '389879808', 'USD', '0.0', '2974676', '212', '3.5', '3.5', '95.0', '4+', 'Social Networking', '37', '1', '29', '1']


['389801252', 'Instagram', '113954816', 'USD', '0.0', '2161558', '1289', '4.5', '4.0', '10.23', '12+', 'Photo & Video', '37', '0', '29', '1']


['529479190', 'Clash of Clans', '116476928', 'USD', '0.0', '2130805', '579', '4.5', '4.5', '9.24.12', '9+', 'Games', '38', '5', '18', '1']


Number of rows: 7197
Number of columns: 16


In [4]:
and_dataset=open_dataset("googleplaystore.csv")

del and_dataset[10473] # delete the row with a missing value (the index counts the header row)

and_apps_data_header=and_dataset[0]
and_apps_data=and_dataset[1:]
print(and_apps_data_header)
print('\n')

explore_data(and_apps_data,0,3,True)
print(len(and_dataset))

['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']


['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['Coloring book moana', 'ART_AND_DESIGN', '3.9', '967', '14M', '500,000+', 'Free', '0', 'Everyone', 'Art & Design;Pretend Play', 'January 15, 2018', '2.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


Number of rows: 10840
Number of columns: 13
10841


## Removing the duplicate rows

In [5]:
duplicate_apps=[]
unique_apps=[]

for i in and_apps_data:
    name=i[0]
    if name in unique_apps:
        duplicate_apps.append(name)
    else:
        unique_apps.append(name)
        
print(len(duplicate_apps))
print(duplicate_apps[:10])

1181
['Quick PDF Scanner + OCR FREE', 'Box', 'Google My Business', 'ZOOM Cloud Meetings', 'join.me - Simple Meetings', 'Box', 'Zenefits', 'Google Ads', 'Google My Business', 'Slack']


In [6]:
reviews_max={}

for i in and_apps_data:
    name,reviews=i[0],int(i[3])
    if name in reviews_max:
        if reviews > reviews_max[name]:
            reviews_max[name]=reviews
    else:
        reviews_max[name]=reviews
    
print(len(reviews_max))

9659


In [7]:
print('Expected length:', len(and_apps_data) - len(duplicate_apps))
print('Actual length:', len(reviews_max))

Expected length: 9659
Actual length: 9659


In [8]:
android_clean=[]
already_added=[]

for i in and_apps_data:
    name,reviews=i[0],int(i[3])
    if reviews==reviews_max[name] and name not in already_added:
        android_clean.append(i)
        already_added.append(name)       

In [9]:
explore_data(android_clean,0,3,True)

['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 and up']


Number of rows: 9659
Number of columns: 13
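
The de-duplication above can also be done in a single pass. The sketch below is only an illustration on made-up rows: the app names and review counts are hypothetical, and `keep_most_reviewed` is not part of the notebook. For each name it keeps the first row that holds the highest review count. Unlike the cells above, it orders the result by each name's first appearance rather than by the position of the kept row, so the row order can differ from `android_clean`.

```python
# Made-up rows in the Play Store layout: name at index 0, review count at index 3.
sample_rows = [
    ['App A', 'BUSINESS', '4.2', '100'],
    ['App B', 'BUSINESS', '4.4', '50'],
    ['App A', 'BUSINESS', '4.2', '120'],
    ['App B', 'BUSINESS', '4.4', '50'],
]

def keep_most_reviewed(rows, name_idx=0, reviews_idx=3):
    """Keep one row per app name: the first row with the highest review count."""
    best = {}
    for row in rows:
        name, reviews = row[name_idx], int(row[reviews_idx])
        # Replace the stored row only when this one has strictly more reviews.
        if name not in best or reviews > int(best[name][reviews_idx]):
            best[name] = row
    return list(best.values())

deduped = keep_most_reviewed(sample_rows)
for row in deduped:
    print(row[0], ':', row[3])
# App A : 120
# App B : 50
```

A dictionary lookup takes roughly constant time, whereas `name not in already_added` scans a list, so this variant should scale better on large datasets.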


## Removing Non-English apps

In [10]:
def remove_non_english(lst,indx):
    # Keep rows whose name (at position indx) has at most 3 non-ASCII characters
    cleansed_list=[]

    for i in lst:
        name=i[indx]
        non_ascii_count=0

        for j in name:
            if ord(j)>127:
                non_ascii_count+=1

        if non_ascii_count<=3:
            cleansed_list.append(i)

    return cleansed_list
    

In [11]:
android_english=remove_non_english(android_clean,0)
print(len(android_english))
ios_english=remove_non_english(ios_apps_data,1)
print(len(ios_english))

9614
6183
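
To make the filter's behavior concrete, here is a small sketch of the same heuristic applied to single names. `is_probably_english` and its `max_non_ascii` parameter are names I introduce for illustration only, and the last sample name is made up. The rule tolerates up to three characters outside the ASCII range (code points above 127), so names with an emoji or a trademark sign survive. It is only a rough filter: a mostly English name with a couple of non-Latin characters also passes.

```python
def is_probably_english(name, max_non_ascii=3):
    """Return True when the name has at most max_non_ascii non-ASCII characters."""
    non_ascii = sum(1 for ch in name if ord(ch) > 127)
    return non_ascii <= max_non_ascii

samples = [
    'Instagram',                              # no non-ASCII characters
    '漫咖 Comics - Manga,Novel and Stories',  # 2 non-ASCII characters, still kept
    '中文应用名称示例',                        # made-up name, 8 non-ASCII characters
]

for name in samples:
    print(name, '->', is_probably_english(name))
# Instagram -> True
# 漫咖 Comics - Manga,Novel and Stories -> True
# 中文应用名称示例 -> False
```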


## Isolating the free apps

In [12]:
ios_final=[]
android_final=[]

for i in ios_english:
    price=float(i[4])
    if price==0.0:
        ios_final.append(i)


for i in android_english:
    price=i[7]
    if price=='0':
        android_final.append(i)

print(len(ios_final))
print(len(android_final))        

3222
8864


## Most Common Apps by Genre

In [13]:
def genre_freq(lst, indx):
    
    freq_dict={}
    percent_dict={}
    total=0
    
    for i in lst:
        total+=1
        genre=i[indx]
        
        if genre in freq_dict:
            freq_dict[genre]+=1
        else:
            freq_dict[genre]=1
    
    for i in freq_dict:
        percent_dict[i]=(freq_dict[i]/total)*100
    
    return percent_dict

In [14]:
def display_table(dataset,indx):
    freq_table=genre_freq(dataset,indx)
    list_of_tuples=[]
    
    for i in freq_table:
        freq_table_as_tuple=(freq_table[i],i)
        list_of_tuples.append(freq_table_as_tuple)
        
    sorted_list_of_tuples=sorted(list_of_tuples,reverse=True)    
        
    for i in sorted_list_of_tuples:
        print(i[1],':',i[0])
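
The two helpers above can be condensed with `collections.Counter` from the standard library. This is a minimal sketch on made-up rows, not a replacement for the cells above; `percent_table` is a hypothetical name. `Counter.most_common()` returns the genres sorted from most to least frequent, which removes the need to build and sort tuples by hand.

```python
from collections import Counter

def percent_table(rows, idx):
    """Return (genre, percentage) pairs, most common genre first."""
    counts = Counter(row[idx] for row in rows)
    total = sum(counts.values())
    return [(genre, count / total * 100) for genre, count in counts.most_common()]

# Made-up rows: app name at index 0, genre at index 1.
toy_rows = [
    ['App A', 'Games'],
    ['App B', 'Games'],
    ['App C', 'Education'],
    ['App D', 'Games'],
]

table = percent_table(toy_rows, 1)
for genre, pct in table:
    print(genre, ':', pct)
# Games : 75.0
# Education : 25.0
```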

In [15]:
display_table(ios_final,11)
print('\n')
display_table(android_final,1)

Games : 58.16263190564867
Entertainment : 7.883302296710118
Photo & Video : 4.9658597144630665
Education : 3.662321539416512
Social Networking : 3.2898820608317814
Shopping : 2.60707635009311
Utilities : 2.5139664804469275
Sports : 2.1415270018621975
Music : 2.0484171322160147
Health & Fitness : 2.0173805090006205
Productivity : 1.7380509000620732
Lifestyle : 1.5828677839851024
News : 1.3345747982619491
Travel : 1.2414649286157666
Finance : 1.1173184357541899
Weather : 0.8690254500310366
Food & Drink : 0.8069522036002483
Reference : 0.5586592178770949
Business : 0.5276225946617008
Book : 0.4345127250155183
Navigation : 0.186219739292365
Medical : 0.186219739292365
Catalogs : 0.12414649286157665


FAMILY : 18.907942238267147
GAME : 9.724729241877256
TOOLS : 8.461191335740072
BUSINESS : 4.591606498194946
LIFESTYLE : 3.9034296028880866
PRODUCTIVITY : 3.892148014440433
FINANCE : 3.7003610108303246
MEDICAL : 3.531137184115524
SPORTS : 3.395758122743682
PERSONALIZATION : 3.3167870036101084
C

## Most Popular Apps by Genre

In [16]:
freq_dict={}
apps_by_genre_dict={}
avg_installs_dict={}
temp_list_of_tuples=[]
    
for i in ios_final:
    genre=i[11]
    n_installs=float(i[5]) # rating_count_tot: the App Store data has no install column, so total ratings serve as a proxy
        
    if genre in freq_dict:
        freq_dict[genre]+=1
        apps_by_genre_dict[genre]+=n_installs
    else:
        freq_dict[genre]=1
        apps_by_genre_dict[genre]=n_installs
            
for i in freq_dict:
    avg_installs_dict[i]=apps_by_genre_dict[i]/freq_dict[i]
    temp_tuple=(avg_installs_dict[i],i)
    temp_list_of_tuples.append(temp_tuple)
    
templist_sorted=sorted(temp_list_of_tuples, reverse=True)    
    
for i in templist_sorted:
    print(i[1],':',i[0])

Navigation : 86090.33333333333
Reference : 74942.11111111111
Social Networking : 71548.34905660378
Music : 57326.530303030304
Weather : 52279.892857142855
Book : 39758.5
Food & Drink : 33333.92307692308
Finance : 31467.944444444445
Photo & Video : 28441.54375
Travel : 28243.8
Shopping : 26919.690476190477
Health & Fitness : 23298.015384615384
Sports : 23008.898550724636
Games : 22788.6696905016
News : 21248.023255813954
Productivity : 21028.410714285714
Utilities : 18684.456790123455
Lifestyle : 16485.764705882353
Entertainment : 14029.830708661417
Business : 7491.117647058823
Education : 7003.983050847458
Catalogs : 4004.0
Medical : 612.0


In [17]:
for i in ios_final:
    if i[11]=="Productivity":
        print(i[1],':',i[5])

Evernote - stay organized : 161065
Gmail - email by Google: secure, fast & organized : 135962
iTranslate - Language Translator & Dictionary : 123215
Yahoo Mail - Keeps You Organized! : 113709
Google Docs : 64259
Google Drive - free online storage : 59255
Dropbox : 49578
Microsoft Word : 47999
Microsoft OneNote : 39638
Microsoft Outlook - email and calendar : 32807
Hotspot Shield Free VPN Proxy & Wi-Fi Privacy : 32499
Documents 6 - File manager, PDF reader and browser : 29110
Google Sheets : 24602
Microsoft Excel : 24430
Inbox by Gmail : 21561
T-Mobile : 19977
Paper by FiftyThree - Sketch, Diagram, Take Notes : 18219
MyScript Calculator - Handwriting calculator : 16555
VPN Proxy Master - Unlimited WiFi security VPN : 13674
Microsoft OneDrive – File & photo cloud storage : 12797
Ever - Capture Your Memories : 12755
Speak & Translate － Voice and Text Translator : 12062
Tayasui Sketches : 11505
Drawing Desk - Draw, Paint, Doodle & Sketch board : 11040
Microsoft PowerPoint : 10939
Email - F

In [18]:
freq_dict2={}
apps_by_genre_dict2={}
avg_installs_dict2={}
temp_list_of_tuples2=[]
    
for i in android_final:
    genre=i[1]
    n_installs=i[5]
    n_installs=n_installs.replace(',','')
    n_installs=n_installs.replace('+','')
    n_installs=float(n_installs)
        
    if genre in freq_dict2:
        freq_dict2[genre]+=1
        apps_by_genre_dict2[genre]+=n_installs
    else:
        freq_dict2[genre]=1
        apps_by_genre_dict2[genre]=n_installs
            
for i in freq_dict2:
    avg_installs_dict2[i]=apps_by_genre_dict2[i]/freq_dict2[i]
    temp_tuple2=(avg_installs_dict2[i],i)
    temp_list_of_tuples2.append(temp_tuple2)
    
templist_sorted2=sorted(temp_list_of_tuples2, reverse=True)    
    
for i in templist_sorted2:
    print(i[1],':',i[0])   

COMMUNICATION : 38456119.167247385
VIDEO_PLAYERS : 24727872.452830188
SOCIAL : 23253652.127118643
PHOTOGRAPHY : 17840110.40229885
PRODUCTIVITY : 16787331.344927534
GAME : 15588015.603248259
TRAVEL_AND_LOCAL : 13984077.710144928
ENTERTAINMENT : 11640705.88235294
TOOLS : 10801391.298666667
NEWS_AND_MAGAZINES : 9549178.467741935
BOOKS_AND_REFERENCE : 8767811.894736841
SHOPPING : 7036877.311557789
PERSONALIZATION : 5201482.6122448975
WEATHER : 5074486.197183099
HEALTH_AND_FITNESS : 4188821.9853479853
MAPS_AND_NAVIGATION : 4056941.7741935486
FAMILY : 3695641.8198090694
SPORTS : 3638640.1428571427
ART_AND_DESIGN : 1986335.0877192982
FOOD_AND_DRINK : 1924897.7363636363
EDUCATION : 1833495.145631068
BUSINESS : 1712290.1474201474
LIFESTYLE : 1437816.2687861272
FINANCE : 1387692.475609756
HOUSE_AND_HOME : 1331540.5616438356
DATING : 854028.8303030303
COMICS : 817657.2727272727
AUTO_AND_VEHICLES : 647317.8170731707
LIBRARIES_AND_DEMO : 638503.734939759
PARENTING : 542603.6206896552
BEAUTY : 51315
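
The Play Store `Installs` column holds open-ended buckets such as `1,000,000+`, so every average above is built from lower bounds rather than exact counts. The sketch below isolates that parsing step and the per-category average on made-up rows. `parse_installs` and `average_by_key` are hypothetical helper names, not part of the notebook.

```python
from collections import defaultdict

def parse_installs(text):
    """Turn a bucket such as '1,000,000+' into the integer 1000000."""
    return int(text.replace(',', '').replace('+', ''))

def average_by_key(rows, key_idx, value_idx):
    """Average the parsed install counts for each distinct key."""
    totals = defaultdict(int)
    counts = defaultdict(int)
    for row in rows:
        totals[row[key_idx]] += parse_installs(row[value_idx])
        counts[row[key_idx]] += 1
    return {key: totals[key] / counts[key] for key in totals}

# Made-up rows: name, category, installs bucket.
toy_rows = [
    ['App A', 'COMICS', '500,000+'],
    ['App B', 'COMICS', '1,000,000+'],
    ['App C', 'BEAUTY', '10,000+'],
]

averages = average_by_key(toy_rows, 1, 2)
print(averages['COMICS'])  # 750000.0
print(averages['BEAUTY'])  # 10000.0
```

Because each bucket is a floor, a category dominated by a few apps in the highest buckets can pull its average far above what a typical app in that category reaches, which is why listing the individual apps in a category is a useful follow-up.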

In [20]:
for i in android_final:
    if i[1]=="COMICS":
        print(i[0],':',i[5])

Manga Master - Best manga & comic reader : 500,000+
GANMA! - All original stories free of charge for all original comics : 1,000,000+
Röhrich Werner Soundboard : 500,000+
Unicorn Pokez - Color By Number : 50,000+
MangaToon - Comics updated Daily : 50,000+
Manga Net – Best Online Manga Reader : 50,000+
Manga Rock - Best Manga Reader : 1,000,000+
Manga - read Thai translation : 10,000+
The Vietnam Story - Fun Stories : 10,000+
Dragon Ball Wallpaper - Ringtones : 10,000+
Funny Jokes Photos : 10,000+
Truyện Vui Tý Quậy : 10,000+
Comic Es - Shojo manga / love comics free of charge ♪ ♪ : 100,000+
comico Popular Original Cartoon Updated Everyday Comico : 5,000,000+
漫咖 Comics - Manga,Novel and Stories : 1,000,000+
Emmanuella Funny Videos 2018 : 100,000+
Manga Zero - Japanese cartoon and comic reader : 1,000,000+
Marvel Unlimited : 1,000,000+
Tapas – Comics, Novels, and Stories : 1,000,000+
Children's cartoons (Mithu-Mina-Raju) : 100,000+
Narrator's Voice : 5,000,000+
【Ranobbe complete free】 No