# Analysing Free Profitable iOS and Android apps 

In this project, we analyse a set of iOS and Android apps that are free to download and install and whose only sources of revenue are in-app ads or purchases.

The goal is to identify patterns in the data that point to what makes an app engaging and encourages continued use, thereby increasing revenue.



The datasets used in this project are the following : 


[Google Play Store Apps](https://www.kaggle.com/lava18/google-play-store-apps/home)

Credits : Lavanya Gupta

[iOS App Store](https://www.kaggle.com/ramamet4/app-store-apple-data-set-10k-apps/home)

Credits : Ramanathan

Here are the column descriptions for both the datasets : 

Google Play Store Apps:

| Column Name    	| Column Description                                                                                                                            	|
|----------------	|-----------------------------------------------------------------------------------------------------------------------------------------------	|
| App            	| Application name                                                                                                                              	|
| Category       	| Category the app belongs to                                                                                                                   	|
| Rating         	| Overall user rating of the app (as when scraped)                                                                                              	|
| Reviews        	| Number of user reviews for the app (as when scraped)                                                                                          	|
| Size           	| Size of the app (as when scraped)                                                                                                             	|
| Installs       	| Number of user downloads/installs for the app (as when scraped)                                                                               	|
| Type           	| Paid or Free                                                                                                                                  	|
| Price          	| Price of the app (as when scraped)                                                                                                            	|
| Content Rating 	| Age group the app is targeted at - Children / Mature 21+ / Adult                                                                              	|
| Genres         	| An app can belong to multiple genres (apart from its main category). For eg, a musical family game will belong to Music, Game, Family genres. 	|
| Last Updated   	| Date when the app was last updated on Play Store (as when scraped)                                                                            	|
| Current Ver    	| Current version of the app available on Play Store (as when scraped)                                                                          	|
| Android Ver    	| Min required Android version (as when scraped)                                                                                                	|                       

iOS App Store:


| Column Name        | Description                                     |
|--------------------|-------------------------------------------------|
|        "id"        | App ID                                          |
| "track_name"       | App Name                                        |
| "size_bytes"       | Size (in Bytes)                                 |
| "currency"         | Currency Type                                   |
| "price"            | Price amount                                    |
| "rating_count_tot" | User Rating counts (for all versions)           |
| "rating_count_ver" | User Rating counts (for current version)        |
| "user_rating"      | Average User Rating value (for all versions)    |
| "user_rating_ver"  | Average User Rating value (for current version) |
| "ver"              | Latest version code                             |
| "cont_rating"      | Content Rating                                  |
| "prime_genre"      | Primary Genre                                   |
| "sup_devices.num"  | Number of supporting devices                    |
| "ipadSc_urls.num"  | Number of screenshots shown for display         |
| "lang.num"         | Number of supported languages                   |
| "vpp_lic"          | Vpp Device Based Licensing Enabled              |

For our study, the primary columns we focus on are the price of the application, the total number of user ratings, the average user rating and the genre.

In [6]:
apple = open("AppleStore.csv",encoding="utf8")
google = open("googleplaystore.csv",encoding="utf8")
from csv import reader
readfile1 = reader(apple)
readfile2 = reader(google)
apple_data = list(readfile1)
google_data = list(readfile2)

#Storing the header containing the columns names in a separate list
apple_header = apple_data[0]
google_header = google_data[0]

# Assigning only the data values to the lists as the header has been separated
apple_data = apple_data[1:]
google_data = google_data[1:]
print('iOS Store Apps')
explore_data(apple_data,0,3,True)
print(' \n Play Store Apps')
explore_data(google_data,0,3,True)



iOS Store Apps
['1', '281656475', 'PAC-MAN Premium', '100788224', 'USD', '3.99', '21292', '26', '4', '4.5', '6.3.5', '4+', 'Games', '38', '5', '10', '1']


['2', '281796108', 'Evernote - stay organized', '158578688', 'USD', '0', '161065', '26', '4', '3.5', '8.2.2', '4+', 'Productivity', '37', '5', '23', '1']


['3', '281940292', 'WeatherBug - Local Weather, Radar, Maps, Alerts', '100524032', 'USD', '0', '188583', '2822', '3.5', '4.5', '5.0.0', '4+', 'Weather', '37', '5', '3', '1']


Number of rows: 7197
Number of columns: 17
 
 Play Store Apps
['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['Coloring book moana', 'ART_AND_DESIGN', '3.9', '967', '14M', '500,000+', 'Free', '0', 'Everyone', 'Art & Design;Pretend Play', 'January 15, 2018', '2.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510

The **explore_data** function used above (defined in the next cell) prints a slice of rows and, optionally, the number of rows and columns in the dataset.

In [5]:
def explore_data(dataset, start, end, rows_and_columns=False):
    dataset_slice = dataset[start:end]    
    for row in dataset_slice:
        print(row)
        print('\n') # adding new line for a cleaner output

    if rows_and_columns:
        print('Number of rows:', len(dataset))
        print('Number of columns:', len(dataset[0]))
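
The loading cell earlier opens both files but never closes them. As a minimal sketch, assuming the same UTF-8 CSV files, the loading step could be wrapped in a helper that closes the file and returns the header and the data rows separately. The names `read_table` and `load_csv` are hypothetical and not part of the original notebook.

```python
from csv import reader


def read_table(lines):
    """Parse an iterable of CSV lines and return (header, rows)."""
    rows = list(reader(lines))
    return rows[0], rows[1:]


def load_csv(path, encoding="utf8"):
    """Open a CSV file, parse it and close it again."""
    # newline="" is what the csv module documentation recommends for file objects
    with open(path, encoding=encoding, newline="") as f:
        return read_table(f)


# Hypothetical usage with the files from this project:
# apple_header, apple_data = load_csv("AppleStore.csv")
# google_header, google_data = load_csv("googleplaystore.csv")
```

Splitting the parsing from the file handling keeps `read_table` usable on any iterable of lines, which makes it easy to try out on a few hand-written rows.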

# Checking for incorrect data or fields

According to the discussion section of the [Google Play Store dataset](https://www.kaggle.com/lava18/google-play-store-apps/discussion), the row at index 10472 has a rating of 19, which is not possible for a Play Store app because the maximum rating is 5.

`print (google_data[10472])`


`['Life Made WI-Fi Touchscreen Photo Frame', '1.9', '19', '3.0M', '1,000+', 'Free', '0', 'Everyone', ' ', 'February 11, 2018', '1.0.19', '4.0 and up']`


Accordingly, we delete this row.

In [7]:
#del google_data[10472] -> Commented as we do not want to accidentally run this again

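Upon inspecting the deleted row further, we see that the error crept in because the category value is missing: every following value is shifted one column to the left, so the real rating (1.9) sits in the Category column and the review count (19) sits in the Rating column.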
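
Rows like this one can also be found programmatically rather than through the discussion forum. The sketch below assumes that a malformed row shows up as a row with a different number of fields than the header; the helper name `find_malformed_rows` is hypothetical.

```python
def find_malformed_rows(header, rows):
    """Return (index, length) for every row whose length differs from the header."""
    expected = len(header)
    return [(i, len(row)) for i, row in enumerate(rows) if len(row) != expected]


# The 13 Play Store column names from the column table
header = ['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type',
          'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver',
          'Android Ver']
rows = [
    ['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1',
     '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design',
     'January 7, 2018', '1.0.0', '4.0.3 and up'],
    # The faulty row: the category is missing, so it has only 12 fields
    ['Life Made WI-Fi Touchscreen Photo Frame', '1.9', '19', '3.0M', '1,000+',
     'Free', '0', 'Everyone', ' ', 'February 11, 2018', '1.0.19', '4.0 and up'],
]
print(find_malformed_rows(header, rows))  # [(1, 12)]
```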

On the Play Store page for [Life Made WI-Fi Touchscreen Photo Frame](https://play.google.com/store/apps/details?id=com.lifemade.internetPhotoframe), the category is listed as:
>  Lifestyle

Since we now have this information, we can add a new record with the correct category value while keeping the rest of the data unchanged.

In [11]:
#Appending this new row
life = ['Life Made WI-Fi Touchscreen Photo Frame', 'LIFESTYLE', '1.9', '19', '3.0M', '1,000+', 'Free', '0', 'Everyone', 'Lifestyle', 'February 11, 2018', '1.0.19', '4.0 and up']
#google_data.append(life)  # Commented out so that the record isn't added again
print (google_data[-1])


['Life Made WI-Fi Touchscreen Photo Frame', 'LIFESTYLE', '1.9', '19', '3.0M', '1,000+', 'Free', '0', 'Everyone', 'Lifestyle', 'February 11, 2018', '1.0.19', '4.0 and up']


Next we check for duplicates in our datasets:

In [12]:
# Checking for duplicates in play store apps
dup_android = []  # duplicate android apps
uni_android = []  # unique android apps
for app in google_data:
    name = app[0]
    if name in uni_android:
        dup_android.append(name)
    else:
        uni_android.append(name)
print('Number of duplicate play store apps :', len(dup_android))
print('\n Examples of duplicate  play store apps :', dup_android[:5])
print('\n')

# Checking for duplicates in iOS app store
# Note: index 1 holds the app name only in the 16-column layout described in
# the column table. In the 17-column file printed earlier (extra leading index
# column), index 1 is the app ID and the name is at index 2.
dup_ios = []  # duplicate iOS apps
uni_ios = []  # unique iOS apps
for apps in apple_data:
    names = apps[1]
    if names in uni_ios:
        dup_ios.append(names)
    else:
        uni_ios.append(names)
print('Number of duplicate iOS store apps :', len(dup_ios))
print('\n Examples of duplicate iOS store apps :', dup_ios[:5])
print('\n')

Number of duplicate play store apps : 1181

 Examples of duplicate  play store apps : ['Quick PDF Scanner + OCR FREE', 'Box', 'Google My Business', 'ZOOM Cloud Meetings', 'join.me - Simple Meetings']


Number of duplicate iOS store apps : 0

 Examples of duplicate iOS store apps : []




In [18]:
#Checking the duplicate Play store apps
print ('Some Duplicate play store apps \n')
for name in dup_android[:7]:
    print(name)                     

Some Duplicate play store apps 

Quick PDF Scanner + OCR FREE
Box
Google My Business
ZOOM Cloud Meetings
join.me - Simple Meetings
Box
Zenefits


Upon checking the duplicate Play Store apps, we see that the repeated entries share the same size, genre, price and last-updated date.

We can therefore conclude that these entries are duplicates and will have to be cleaned.
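
The membership test `name in uni_android` scans a list, which becomes slow as the list grows. As a sketch of an alternative, assuming we only need the names that occur more than once, `collections.Counter` can count every name in a single pass. The helper name `duplicate_names` and the sample rows are illustrative only.

```python
from collections import Counter


def duplicate_names(rows, name_index):
    """Return a dict mapping each repeated name to its number of occurrences."""
    counts = Counter(row[name_index] for row in rows)
    return {name: n for name, n in counts.items() if n > 1}


# Illustrative single-column rows, not the real dataset
sample = [['Box'], ['Zenefits'], ['Box'], ['Box'], ['Google My Business']]
dups = duplicate_names(sample, 0)
print(dups)  # {'Box': 3}
# Number of surplus rows, which is the quantity the duplicate-checking loop reports
print(sum(n - 1 for n in dups.values()))  # 2
```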

In [12]:
print ('\n \n Duplicate iOS apps \n')
#Checking the duplicate iOS store apps
for apps in apple_data:
    if (apps[1] == 'Mannequin Challenge' or apps[1]=='VR Roller Coaster'):
        print(apps)


 
 Duplicate iOS apps 

['1173990889', 'Mannequin Challenge', '109705216', 'USD', '0.0', '668', '87', '3.0', '3.0', '1.4', '9+', 'Games', '37', '4', '1', '1']
['952877179', 'VR Roller Coaster', '169523200', 'USD', '0.0', '107', '102', '3.5', '3.5', '2.0.0', '4+', 'Games', '37', '5', '1', '1']
['1178454060', 'Mannequin Challenge', '59572224', 'USD', '0.0', '105', '58', '4.0', '4.5', '1.0.1', '4+', 'Games', '38', '5', '1', '1']
['1089824278', 'VR Roller Coaster', '240964608', 'USD', '0.0', '67', '44', '3.5', '4.0', '0.81', '4+', 'Games', '38', '0', '1', '1']


Upon checking the iOS apps that share a name, we see that they have significantly different sizes, ratings and version numbers.
 
We can therefore conclude that these are different apps that merely have the same name.

# Removing the duplicate data entries

We now remove the duplicates by building a dictionary whose keys are the unique app names and whose values are the highest number of reviews recorded for each app.

We do this because the only difference among the duplicate Play Store entries is the number of reviews, and the entry with the most reviews is most likely the most recent one.

In [13]:
reviews_max = {}
for app in google_data:
    name = app[0]
    n_reviews = float(app[3])
    # Keep the highest review count seen so far for each app name
    if name in reviews_max and reviews_max[name] < n_reviews:
        reviews_max[name] = n_reviews
    elif name not in reviews_max:
        reviews_max[name] = n_reviews
print(len(reviews_max))

        

9660


We know that there were 1181 duplicate entries, so the size of the full dataset minus 1181 should equal the length of our dictionary.

In [14]:
print ( 'Expected length:' , len(google_data) - 1181)
print ( 'Actual length:' , len(reviews_max))

Expected length: 9660
Actual length: 9660


We now create two lists, one to hold the cleaned data and the other to hold the names of the apps already added.

We then loop over the Play Store dataset and, for every row:
1. We assign the app name and the number of reviews to their respective variables.
2. We add the current row to the clean data list if its number of reviews equals the maximum for that app in the reviews_max dictionary.
3. We also check the list of already-added app names, so that we do not add an app twice when several of its entries share the same highest number of reviews.

After the loop, we print the length of the clean Android app list and confirm that it is 9660.

In [15]:
android_clean = []
already_added = []

for app in google_data:
    name = app[0]
    n_reviews = float(app[3])
    
    if (reviews_max[name] == n_reviews) and (name not in already_added):
        android_clean.append(app)
        already_added.append(name) 
print(len(android_clean))
        
        
        

9660
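
The two passes above can also be folded into one. The following is a sketch rather than the notebook's method: it keeps, for each name, the first row that reaches the highest review count, using a dictionary keyed by app name. `dedupe_by_max_reviews` is a hypothetical helper, and the rows come out in order of each name's first appearance, which may differ from the order produced by the loop above.

```python
def dedupe_by_max_reviews(rows, name_index=0, reviews_index=3):
    """Keep one row per app name: the one with the most reviews."""
    best = {}
    for row in rows:
        name = row[name_index]
        n_reviews = float(row[reviews_index])
        # Strict comparison keeps the earliest row when review counts tie
        if name not in best or n_reviews > float(best[name][reviews_index]):
            best[name] = row
    return list(best.values())


# Illustrative rows (not taken from the dataset): name, category, rating, reviews
sample = [
    ['Box', 'BUSINESS', '4.2', '159872'],
    ['Zenefits', 'BUSINESS', '4.2', '296'],
    ['Box', 'BUSINESS', '4.2', '159880'],
]
for row in dedupe_by_max_reviews(sample):
    print(row)
```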


# Removing Non-English apps

In this study, we are looking for apps directed towards an English-speaking audience.

Non-English apps therefore have to be removed from the datasets so that they do not distort our results.

The following function checks each character of the app name with the built-in [ord()](https://docs.python.org/2/library/functions.html#ord) function, which returns the character's numeric code point. A value of at most 127 means the character belongs to the ASCII set, which covers the characters commonly used in English text.

Here is the [ASCII table](http://www.asciitable.com/index/asciifull.gif) for reference.



In [16]:
def checkeng(strings):
    for ch in strings:
        if ord(ch) > 127:
            return False
    return True


In [17]:
print(checkeng('Instagram'))
print(checkeng('爱奇艺PPS -《欢乐颂2》电视剧热播'))
print(checkeng('Docs To Go™ Free Office Suite'))
print(checkeng('Instachat 😜'))

True
False
False
False


Here we see that even though "Docs To Go™ Free Office Suite" and "Instachat 😜" are English apps, they are classified as non-English because the ™ and emoji characters have code points above our limit of 127.

To minimize this data loss, we can alter the function so that it flags an app as non-English only if its name contains more than 3 characters above the limit of 127.

While this function is still not perfect, it should wrongly discard fewer English apps than the previous one.

The new function is as follows:

In [18]:
def checkengnew(strings):
    count = 0 #to count the non ascii characters
    for ch in strings:
        if ord(ch) > 127:
            count+= 1
            if(count>3):
                return False
    return True
print(checkengnew('Instagram'))
print(checkengnew('爱奇艺PPS -《欢乐颂2》电视剧热播'))
print(checkengnew('Docs To Go™ Free Office Suite'))
print(checkengnew('Instachat 😜'))

True
False
True
True
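
The cut-off of 3 is a judgment call. A sketch of the same check with the threshold exposed as a parameter makes it easy to try other limits; the name `is_probably_english` and its default value are assumptions for illustration.

```python
def is_probably_english(name, max_non_ascii=3):
    """Return True if name has at most max_non_ascii characters above code point 127."""
    non_ascii = sum(1 for ch in name if ord(ch) > 127)
    return non_ascii <= max_non_ascii


print(is_probably_english('Docs To Go™ Free Office Suite'))  # True
print(is_probably_english('爱奇艺PPS -《欢乐颂2》电视剧热播'))  # False
# With a limit of 0 the behaviour matches the stricter first version
print(is_probably_english('Instachat 😜', max_non_ascii=0))  # False
```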


Now that this function works, we will apply this to both our data sets in order to filter out the Non-English apps

In [19]:
andeng = []  # android english apps
ioseng = []  # ios english apps
for app in android_clean:
    name = app[0]
    if checkengnew(name):
        andeng.append(app)
print("The number of English Android apps are :", len(andeng))
for app in apple_data:
    name1 = app[1]
    if checkengnew(name1):
        ioseng.append(app)
print("The number of English iOS apps are :", len(ioseng))

The number of English Android apps are : 9615
The number of English iOS apps are : 6183


# Removing Paid Apps

As mentioned in the beginning, we only want to analyse free apps whose only source of revenue is in-app purchases or ads.

The cell below keeps only the free apps from each dataset:

In [20]:
# Keeping only the free Android apps
andfinal = []
print('Free English Android apps: \n')
for app in andeng:
    price = app[7]
    if price == '0':
        andfinal.append(app)
explore_data(andfinal, 0, 5, True)
print('\n \n Free iOS Apps : \n')
# Keeping only the free iOS apps
iosfinal = []
for app in ioseng:
    price = app[4]
    if price == '0.0':
        iosfinal.append(app)
explore_data(iosfinal, 0, 5, True)
        
        

Free English Android apps: 

['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 and up']


['Pixel Draw - Number Art Coloring Book', 'ART_AND_DESIGN', '4.3', '967', '2.8M', '100,000+', 'Free', '0', 'Everyone', 'Art & Design;Creativity', 'June 20, 2018', '1.1', '4.4 and up']


['Paper flowers instructions', 'ART_AND_DESIGN', '4.4', '167', '5.6M', '50,000+', 'Free', '0', 'Everyone', 'Art & Design', 'March 26, 2017', '1.0', '2.3 and up']


Number of rows: 8865
Number of columns: 13

 
 Free iOS Apps :
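
The two price comparisons in the cell above depend on the exact text of the price field ('0' for the Play Store rows, '0.0' for the iOS rows). A slightly more tolerant sketch parses the price as a number first. It assumes that paid prices may carry a leading currency symbol such as '$', which does not appear in the output shown here; `is_free` is a hypothetical helper.

```python
def is_free(price):
    """Return True if a price string such as '0', '0.0' or '$0.00' is zero."""
    cleaned = price.strip().lstrip('$')
    try:
        return float(cleaned) == 0.0
    except ValueError:
        # Unparseable prices are treated as not free rather than guessed at
        return False


print(is_free('0'))     # True
print(is_free('0.0'))   # True
print(is_free('3.99'))  # False
```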

# Determining the most common and popular apps

As mentioned in the introduction, we are looking for patterns and app profiles that work well on both Google Play and the App Store.

For instance, a profile that works well in both markets might be a productivity app that makes use of elements that are addictive, fun and promote daily use.

Upon inspecting the datasets, we can infer the following:

For the Google play store dataset, we can use the columns : 
1. Category
2. Installs
3. Genres

For the iOS app store dataset, we can use the columns : 
1. Rating count total
2. Prime genre

The function below takes in a dataset and index for the column and gives the frequency table in percentages for it.

In [21]:
def freq_table(dataset,index):
    dicts = {}
    tot = 0
    for row in dataset:
        tot +=1
        ele = row[index]
        if ele in dicts:
            dicts[ele]+=1
        else:
            dicts[ele]=1
    dictperc = {}
    for key in dicts:
        percentage = (dicts[key]/tot) * 100
        dictperc[key] = percentage
    return dictperc

The function below takes a dataset and a column index and generates a frequency table using the function above.

It then converts the frequency table into a list of (percentage, key) tuples so that it can be sorted easily, and prints the entries in descending order.

In [22]:
def display_table(dataset, index):
    table = freq_table(dataset, index)
    table_display = []
    for key in table:
        key_val_as_tuple = (table[key], key)
        table_display.append(key_val_as_tuple)

    table_sorted = sorted(table_display, reverse = True)
    for entry in table_sorted:
        print(entry[1], ':', entry[0])
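
For comparison, here is a sketch of the same frequency table built with `collections.Counter`, whose `most_common()` method returns the entries sorted by count. The function name `freq_table_sorted` and the sample rows are illustrative. Entries with equal counts keep their first-seen order here, whereas `display_table` orders ties by key.

```python
from collections import Counter


def freq_table_sorted(dataset, index):
    """Return [(value, percentage), ...] sorted from most to least frequent."""
    counts = Counter(row[index] for row in dataset)
    total = sum(counts.values())
    return [(value, count / total * 100) for value, count in counts.most_common()]


# Illustrative rows: (name, category)
sample = [['a', 'GAME'], ['b', 'FAMILY'], ['c', 'GAME'], ['d', 'TOOLS']]
for value, pct in freq_table_sorted(sample, 1):
    print(value, ':', pct)
```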

## Analysing free English Play Store apps :

In [23]:
print("Sorted by category in descending order (%) \n \n")
display_table(andfinal,1)
print(" \n \nSorted by genre in descending order (%) \n \n")
display_table(andfinal,9)

Sorted by category in descending order (%) 
 

FAMILY : 18.905809362662154
GAME : 9.723632261703328
TOOLS : 8.460236886632826
BUSINESS : 4.591088550479413
LIFESTYLE : 3.914269599548787
PRODUCTIVITY : 3.8917089678511
FINANCE : 3.699943598420756
MEDICAL : 3.5307388606880994
SPORTS : 3.395375070501974
PERSONALIZATION : 3.3164128595600673
COMMUNICATION : 3.2374506486181613
HEALTH_AND_FITNESS : 3.0795262267343486
PHOTOGRAPHY : 2.9441624365482233
NEWS_AND_MAGAZINES : 2.7975183305132543
SOCIAL : 2.662154540327129
TRAVEL_AND_LOCAL : 2.33502538071066
SHOPPING : 2.2447828539199097
BOOKS_AND_REFERENCE : 2.143260011280316
DATING : 1.8612521150592216
VIDEO_PLAYERS : 1.793570219966159
MAPS_AND_NAVIGATION : 1.3987591652566271
FOOD_AND_DRINK : 1.2408347433728144
EDUCATION : 1.161872532430908
ENTERTAINMENT : 0.9588268471517203
LIBRARIES_AND_DEMO : 0.9362662154540328
AUTO_AND_VEHICLES : 0.924985899605189
HOUSE_AND_HOME : 0.8234630569655951
WEATHER : 0.8009024252679076
EVENTS : 0.7106598984771574
PARENTI

From the genre table, we see that 19.86% of apps belong to the Tools, Entertainment or Education genres.

Furthermore, the Family, Game and Tools categories together account for about 37.09% of all apps.

An app profile that aims to reach the largest number of users should belong to:

Genres : Tools , Entertainment or Education

Categories : Family, Game or Tools


## Analysing free English iOS apps

In [24]:
print ("Sorted by genre in descending order (%) \n \n")
display_table(iosfinal,11)

Sorted by genre in descending order (%) 
 

Games : 58.16263190564867
Entertainment : 7.883302296710118
Photo & Video : 4.9658597144630665
Education : 3.662321539416512
Social Networking : 3.2898820608317814
Shopping : 2.60707635009311
Utilities : 2.5139664804469275
Sports : 2.1415270018621975
Music : 2.0484171322160147
Health & Fitness : 2.0173805090006205
Productivity : 1.7380509000620732
Lifestyle : 1.5828677839851024
News : 1.3345747982619491
Travel : 1.2414649286157666
Finance : 1.1173184357541899
Weather : 0.8690254500310366
Food & Drink : 0.8069522036002483
Reference : 0.5586592178770949
Business : 0.5276225946617008
Book : 0.4345127250155183
Navigation : 0.186219739292365
Medical : 0.186219739292365
Catalogs : 0.12414649286157665


Among the free English iOS apps, 58.16% belong to the Games genre, followed at a distance by Entertainment with 7.88%.

We can further find out which genres have the most users by calculating the average number of installs for each app genre.

The Google Play dataset has an Installs column that can be used for this, but the App Store dataset has no such column.

Instead, as a proxy, we take the total number of user ratings for each genre and divide it by the number of apps in that genre.

In [25]:
gdict = freq_table(iosfinal,11)
for genre in gdict:
    total=0
    len_genre=0
    for app in iosfinal:
        genre_app = app[11]
        if genre_app == genre:
            usr = float(app[5])
            total += usr
            len_genre += 1
    avgusr = total / len_genre
    print(genre,":" ,avgusr)
        
    

Shopping : 26919.690476190477
Photo & Video : 28441.54375
Travel : 28243.8
Utilities : 18684.456790123455
Navigation : 86090.33333333333
Social Networking : 71548.34905660378
Business : 7491.117647058823
Book : 39758.5
Catalogs : 4004.0
Finance : 31467.944444444445
Music : 57326.530303030304
Games : 22788.6696905016
Entertainment : 14029.830708661417
Productivity : 21028.410714285714
Medical : 612.0
Food & Drink : 33333.92307692308
Health & Fitness : 23298.015384615384
Education : 7003.983050847458
News : 21248.023255813954
Sports : 23008.898550724636
Weather : 52279.892857142855
Lifestyle : 16485.764705882353
Reference : 74942.11111111111
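
The nested loop above rescans the whole dataset once per genre. As a sketch, the same averages can be computed in a single pass by accumulating a running total and a count per genre. `average_by_group` is a hypothetical helper and the sample rows are illustrative.

```python
from collections import defaultdict


def average_by_group(rows, group_index, value_index):
    """Return {group: mean of the value column} computed in one pass."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for row in rows:
        group = row[group_index]
        totals[group] += float(row[value_index])
        counts[group] += 1
    return {group: totals[group] / counts[group] for group in totals}


# Illustrative rows: (name, rating count, genre)
sample = [
    ['App A', '100', 'Games'],
    ['App B', '300', 'Games'],
    ['App C', '50', 'Weather'],
]
print(average_by_group(sample, group_index=2, value_index=1))
# {'Games': 200.0, 'Weather': 50.0}
```

With the column layout used in this notebook, the equivalent call would be `average_by_group(iosfinal, 11, 5)`.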


From these averages, we see that Navigation apps have the highest average number of user ratings per app (about 86,090).

We now list the navigation apps to see the rating counts of the individual apps.


In [26]:
for app in iosfinal:
    gen = app[11]
    if gen == 'Navigation':
        print(app[1],":",app[5])

Waze - GPS Navigation, Maps & Real-time Traffic : 345046
Google Maps - Navigation & Transit : 154911
Geocaching® : 12811
CoPilot GPS – Car Navigation & Offline Maps : 3582
ImmobilienScout24: Real Estate Search in Germany : 187
Railway Route Search : 5


We see that two apps, Waze and Google Maps, account for about 97% of all ratings in this genre (499,957 of 516,542), so the genre average is heavily skewed by them.
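
One way to see the effect of such outliers is to compare the mean with the median, which is far less sensitive to a few very large values. The sketch below uses the six Navigation rating counts printed above and Python's `statistics` module.

```python
from statistics import mean, median

# Rating counts of the six Navigation apps listed above
navigation_ratings = [345046, 154911, 12811, 3582, 187, 5]

print(mean(navigation_ratings))    # about 86090.33, the genre average
print(median(navigation_ratings))  # 8196.5

# Share of all ratings contributed by the two largest apps
top_two = sum(sorted(navigation_ratings, reverse=True)[:2])
top_two_share = top_two / sum(navigation_ratings)
print(round(top_two_share * 100, 1))  # 96.8
```

The median is roughly a tenth of the mean, which suggests that a typical navigation app has far fewer ratings than the genre average implies.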

The Social Networking genre also has a high average number of ratings, as these apps usually encourage repeated visits and therefore attract more users.

Let us check these apps as well.

In [27]:
for app in iosfinal:
    gen = app[11]
    if gen == 'Social Networking':
        print(app[1],":",app[5])

Facebook : 2974676
Pinterest : 1061624
Skype for iPhone : 373519
Messenger : 351466
Tumblr : 334293
WhatsApp Messenger : 287589
Kik : 260965
ooVoo – Free Video Call, Text and Voice : 177501
TextNow - Unlimited Text + Calls : 164963
Viber Messenger – Text & Call : 164249
Followers - Social Analytics For Instagram : 112778
MeetMe - Chat and Meet New People : 97072
We Heart It - Fashion, wallpapers, quotes, tattoos : 90414
InsTrack for Instagram - Analytics Plus More : 85535
Tango - Free Video Call, Voice and Chat : 75412
LinkedIn : 71856
Match™ - #1 Dating App. : 60659
Skype for iPad : 60163
POF - Best Dating App for Conversations : 52642
Timehop : 49510
Find My Family, Friends & iPhone - Life360 Locator : 43877
Whisper - Share, Express, Meet : 39819
Hangouts : 36404
LINE PLAY - Your Avatar World : 34677
WeChat : 34584
Badoo - Meet New People, Chat, Socialize. : 34428
Followers + for Instagram - Follower Analytics : 28633
GroupMe : 28260
Marco Polo Video Walkie Talkie : 27662
Miitomo : 2

Once again, we see that apps from big companies, such as Facebook, Pinterest and Skype, have an unusually high number of user ratings, which skews the average.

Checking the Reference apps leads to a similar conclusion, with the Bible and Dictionary.com apps accounting for the vast majority of ratings.


In [28]:
for app in iosfinal:
    if app[-5] == 'Reference':
        print(app[1], ':', app[5])

Bible : 985920
Dictionary.com Dictionary & Thesaurus : 200047
Dictionary.com Dictionary & Thesaurus for iPad : 54175
Google Translate : 26786
Muslim Pro: Ramadan 2017 Prayer Times, Azan, Quran : 18418
New Furniture Mods - Pocket Wiki & Game Tools for Minecraft PC Edition : 17588
Merriam-Webster Dictionary : 16849
Night Sky : 12122
City Maps for Minecraft PE - The Best Maps for Minecraft Pocket Edition (MCPE) : 8535
LUCKY BLOCK MOD ™ for Minecraft PC Edition - The Best Pocket Wiki & Mods Installer Tools : 4693
GUNS MODS for Minecraft PC Edition - Mods Tools : 1497
Guides for Pokémon GO - Pokemon GO News and Cheats : 826
WWDC : 762
Horror Maps for Minecraft PE - Download The Scariest Maps for Minecraft Pocket Edition (MCPE) Free : 718
VPN Express : 14
Real Bike Traffic Rider Virtual Reality Glasses : 8
教えて!goo : 0
Jishokun-Japanese English Dictionary & Translator : 0


For the most part, as the share of games shows, free English iOS apps are predominantly apps meant for fun.

There seems to be significant potential for apps that combine gamification with one of the other genres.

Since our goal is to create a free app that earns revenue through in-app purchases or ads, we need an app that users spend significant time on. This rules out weather apps and transportation apps such as flight or train information, as people generally do not spend much time in such apps.

One idea is a reference/education app in which users learn something by playing a game, possibly with daily rewards or tips. This type of app suits our goal, as people will generally spend more time in it, which makes in-app ads and purchases more rewarding.

## Analysing installs of free English Play Store apps

For the Play Store apps, we have an Installs column that we can use in our study. The drawback of this column is that the number of installs is not exact but an open-ended lower bound such as 1,000,000+.

However, since we only need a rough idea of how many users each category of app attracts, this information should suffice.
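
Because the Installs values are strings such as '1,000,000+', they have to be converted to numbers before they can be averaged. A minimal sketch of that conversion, treating each value as its lower bound (`parse_installs` is a hypothetical helper name):

```python
def parse_installs(installs):
    """Convert an install bucket such as '1,000,000+' to its lower bound as an int."""
    return int(installs.replace('+', '').replace(',', ''))


print(parse_installs('1,000,000+'))  # 1000000
print(parse_installs('0'))           # 0
```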

In [29]:
display_table(andfinal,5)

1,000,000+ : 15.724760293288211
100,000+ : 11.551043429216017
10,000,000+ : 10.547095318668923
10,000+ : 10.197405527354766
1,000+ : 8.403835307388608
100+ : 6.91483361534123
5,000,000+ : 6.824591088550479
500,000+ : 5.561195713479977
50,000+ : 4.771573604060913
5,000+ : 4.512126339537507
10+ : 3.542019176536943
500+ : 3.248730964467005
50,000,000+ : 2.3011844331641287
100,000,000+ : 2.131979695431472
50+ : 1.9176536943034406
5+ : 0.7896221094190639
1+ : 0.5076142131979695
500,000,000+ : 0.2707275803722504
1,000,000,000+ : 0.2256063169768754
0+ : 0.04512126339537507
0 : 0.011280315848843767


In [36]:
gcat = freq_table(andfinal,1)
for cat in gcat:
    tot=0
    uniqc=0
    for key in andfinal:
        cate=key[1]
        if cat==cate:
            installs=key[5]
            installs=installs.replace('+','')
            installs=installs.replace(',','')
            tot += float(installs)
            uniqc += 1
    avgins= tot / uniqc
    print(cat,':',avgins)
            

GAME : 15588015.603248259
MEDICAL : 120550.61980830671
FOOD_AND_DRINK : 1924897.7363636363
HOUSE_AND_HOME : 1331540.5616438356
ENTERTAINMENT : 11640705.88235294
BOOKS_AND_REFERENCE : 8767811.894736841
LIFESTYLE : 1433675.5878962537
BUSINESS : 1712290.1474201474
WEATHER : 5074486.197183099
EVENTS : 253542.22222222222
VIDEO_PLAYERS : 24727872.452830188
MAPS_AND_NAVIGATION : 4056941.7741935486
EDUCATION : 1833495.145631068
HEALTH_AND_FITNESS : 4188821.9853479853
BEAUTY : 513151.88679245283
PRODUCTIVITY : 16787331.344927534
COMICS : 817657.2727272727
FINANCE : 1387692.475609756
PERSONALIZATION : 5201482.6122448975
FAMILY : 3695641.8198090694
SHOPPING : 7036877.311557789
ART_AND_DESIGN : 1986335.0877192982
TOOLS : 10801391.298666667
DATING : 854028.8303030303
PARENTING : 542603.6206896552
COMMUNICATION : 38456119.167247385
AUTO_AND_VEHICLES : 647317.8170731707
SOCIAL : 23253652.127118643
LIBRARIES_AND_DEMO : 638503.734939759
PHOTOGRAPHY : 17840110.40229885
TRAVEL_AND_LOCAL : 13984077.710144

We see that the Communication category has the highest average number of installs per app, at around 38,456,119.

We can check this category further:


In [52]:
for row in andfinal:
    if row[1]=='COMMUNICATION' and (row[5] == '1,000,000,000+'
                                      or row[5] == '500,000,000+'
                                      or row[5] == '100,000,000+'):
        print (row[0]," ",row[5])

WhatsApp Messenger   1,000,000,000+
imo beta free calls and text   100,000,000+
Android Messages   100,000,000+
Google Duo - High Quality Video Calls   500,000,000+
Messenger – Text and Video Chat for Free   1,000,000,000+
imo free video calls and chat   500,000,000+
Skype - free IM & video calls   1,000,000,000+
Who   100,000,000+
GO SMS Pro - Messenger, Free Themes, Emoji   100,000,000+
LINE: Free Calls & Messages   500,000,000+
Google Chrome: Fast & Secure   1,000,000,000+
Firefox Browser fast & private   100,000,000+
UC Browser - Fast Download Private & Secure   500,000,000+
Gmail   1,000,000,000+
Hangouts   1,000,000,000+
Messenger Lite: Free Calls & Messages   100,000,000+
Kik   100,000,000+
KakaoTalk: Free Calls & Text   100,000,000+
Opera Mini - fast web browser   100,000,000+
Opera Browser: Fast and Secure   100,000,000+
Telegram   100,000,000+
Truecaller: Caller ID, SMS spam blocking & Dialer   100,000,000+
UC Browser Mini -Tiny Fast Private & Secure   100,000,000+
Viber Mess

From the above results, we see that a few extremely popular apps (such as Skype - free IM & video calls, Google Chrome: Fast & Secure, Messenger – Text and Video Chat for Free and WhatsApp Messenger) have 1,000,000,000+ installs.

These apps skew our analysis; removing them would greatly reduce the average number of installs for the category.
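
To gauge how much these giants inflate the average, we can recompute it while leaving out every app at or above a chosen install threshold. This is a sketch under assumptions: the helper name `average_installs_below` and the thresholds are illustrative, and the three sample rows are a small subset of the list above rather than the whole category.

```python
def average_installs_below(rows, category, cap, cat_index=1, installs_index=5):
    """Average installs in a category, ignoring apps with at least `cap` installs."""
    values = []
    for row in rows:
        if row[cat_index] != category:
            continue
        installs = int(row[installs_index].replace('+', '').replace(',', ''))
        if installs < cap:
            values.append(installs)
    # Avoid dividing by zero when every app is filtered out
    return sum(values) / len(values) if values else 0.0


# Only the name, category and installs columns are filled in here
sample = [
    ['WhatsApp Messenger', 'COMMUNICATION', '', '', '', '1,000,000,000+'],
    ['Telegram', 'COMMUNICATION', '', '', '', '100,000,000+'],
    ['Kik', 'COMMUNICATION', '', '', '', '100,000,000+'],
]
print(average_installs_below(sample, 'COMMUNICATION', cap=10**10))  # 400000000.0
print(average_installs_below(sample, 'COMMUNICATION', cap=10**9))   # 100000000.0
```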

Since we were interested in an education or training app, we can check these categories as well. This is done below:

In [58]:
for row in andfinal:
    if (row[1]=='BOOKS_AND_REFERENCE' or row[1]=='EDUCATION') and (row[5] == '1,000,000,000+'
                                      or row[5] == '500,000,000+'
                                      or row[5] == '100,000,000+'):
        print (row[0]," ",row[5])

Google Play Books   1,000,000,000+
Bible   100,000,000+
Amazon Kindle   100,000,000+
Wattpad 📖 Free Books   100,000,000+
Audiobooks from Audible   100,000,000+


As we can see here, the apps with the highest number of installs mostly come from big companies, such as Google Play Books, Amazon Kindle and Audible. Another app with a high number of installs is the Bible, which we also saw earlier in the iOS data.

With so few dominant apps, this segment has considerable potential in both the Play Store and the iOS App Store.

Furthermore, it can be seen that in both stores, games and entertainment are the most commonly installed/reviewed apps.

An idea for an app that has the potential to be successful is an education/reference app, one that teaches the user a skill or provides a resource. It can be improved further by adding a mini-game aspect, which gives the app entertainment value as well.

# Conclusion

From our study, we can draw the following conclusions:

1. In both stores, games and entertainment are the most commonly installed/reviewed apps.

2. Social Networking and Communication apps have the most users; however, creating apps in these categories is unlikely to be rewarding, as they are dominated by apps that already have a billion or more installs.

3. The most promising category for a profitable app seems to be the Education/Reference category. 
This category has very few apps with a high number of installs, and these apps serve a very distinct purpose.

An idea for an app that has the potential to be successful is an education/reference app, one that teaches the user a skill or provides a resource. It can be improved further by adding a mini-game aspect, which gives the app entertainment value as well.


