# Profiles of profitable applications

Our goal in this project is to find mobile app profiles that are profitable on the App Store and Google Play markets. We work as data analysts for a company that builds Android and iOS mobile apps, and our job is to enable our team of developers to make data-driven decisions about what kinds of apps they build.

In our company, we only develop apps that are free to download and install, and our main source of revenue is from in-app ads. This means that our revenue for a given application is mainly influenced by the number of users who use our application. Our goal for this project is to analyze data to help our developers understand what types of apps are likely to attract more users.

## Exploring the datasets

In [1]:
from csv import reader

### The App Store data set ###
opened_file = open('AppleStore.csv', encoding="utf8")
read_file = reader(opened_file)
ios = list(read_file)
ios_header = ios[0]
ios = ios[1:]

### The Google Play data set ###
opened_file = open('googleplaystore.csv', encoding="utf8")
read_file = reader(opened_file)
android = list(read_file)
android_header = android[0]
android = android[1:]

In [2]:
#creating a function named explore_data() to print rows in a readable way

def explore_data(dataset, start, end, rows_and_columns=False):
    # dataset is a list of rows (each row is a list), so it can be sliced directly.
    # A file object cannot be sliced, which is why the data was loaded with list(reader(...)).
    dataset_slice = dataset[start:end]

    for row in dataset_slice:
        print(row)
        print('\n') # to add blank lines after each row

    if rows_and_columns:
        print('Number of rows:', len(dataset))
        print('Number of columns:', len(dataset[0]))

In [3]:
#exploring our datasets using the function above
print("The Apple Store's Data: \n")
explore_data(ios, 0, 5, False)

The Apple Store's Data: 

['284882215', 'Facebook', '389879808', 'USD', '0.0', '2974676', '212', '3.5', '3.5', '95.0', '4+', 'Social Networking', '37', '1', '29', '1']


['389801252', 'Instagram', '113954816', 'USD', '0.0', '2161558', '1289', '4.5', '4.0', '10.23', '12+', 'Photo & Video', '37', '0', '29', '1']


['529479190', 'Clash of Clans', '116476928', 'USD', '0.0', '2130805', '579', '4.5', '4.5', '9.24.12', '9+', 'Games', '38', '5', '18', '1']


['420009108', 'Temple Run', '65921024', 'USD', '0.0', '1724546', '3842', '4.5', '4.0', '1.6.2', '9+', 'Games', '40', '5', '1', '1']


['284035177', 'Pandora - Music & Radio', '130242560', 'USD', '0.0', '1126879', '3594', '4.0', '4.5', '8.4.1', '12+', 'Music', '37', '4', '1', '1']




In [4]:
print("The Google Play Data: \n")
explore_data(android, 0, 5, False)

The Google Play Data: 

['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['Coloring book moana', 'ART_AND_DESIGN', '3.9', '967', '14M', '500,000+', 'Free', '0', 'Everyone', 'Art & Design;Pretend Play', 'January 15, 2018', '2.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 and up']


['Pixel Draw - Number Art Coloring Book', 'ART_AND_DESIGN', '4.3', '967', '2.8M', '100,000+', 'Free', '0', 'Everyone', 'Art & Design;Creativity', 'June 20, 2018', '1.1', '4.4 and up']




In [5]:
#The column names (the header rows were saved in ios_header and android_header)
print("App Store column names: \n")
print(ios_header, '\n')

print("Google Play column names: \n")
print(android_header)

App Store column names: 

['id', 'track_name', 'size_bytes', 'currency', 'price', 'rating_count_tot', 'rating_count_ver', 'user_rating', 'user_rating_ver', 'ver', 'cont_rating', 'prime_genre', 'sup_devices.num', 'ipadSc_urls.num', 'lang.num', 'vpp_lic'] 

Google Play column names: 

['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']


# Removing duplicate items

In [6]:
#Example of duplicate apps
for app in android:
    name=app[0]
    if name == 'Twitter':
        print(app, '\n')

['Twitter', 'NEWS_AND_MAGAZINES', '4.3', '11667403', 'Varies with device', '500,000,000+', 'Free', '0', 'Mature 17+', 'News & Magazines', 'August 6, 2018', 'Varies with device', 'Varies with device'] 

['Twitter', 'NEWS_AND_MAGAZINES', '4.3', '11667403', 'Varies with device', '500,000,000+', 'Free', '0', 'Mature 17+', 'News & Magazines', 'August 6, 2018', 'Varies with device', 'Varies with device'] 

['Twitter', 'NEWS_AND_MAGAZINES', '4.3', '11657972', 'Varies with device', '500,000,000+', 'Free', '0', 'Mature 17+', 'News & Magazines', 'July 30, 2018', 'Varies with device', 'Varies with device'] 



In [7]:
unique_apps = []
duplicate_apps = []

for app in android:
    name = app[0]
    if name in unique_apps:
        duplicate_apps.append(name)
    else:
        unique_apps.append(name)

In [8]:
print('Number of duplicate apps:', len(duplicate_apps))
print('Number of unique apps:', len(unique_apps))
print('\n')
print('Examples of duplicate apps:', duplicate_apps[:5])

Number of duplicate apps: 1181
Number of unique apps: 9660


Examples of duplicate apps: ['Quick PDF Scanner + OCR FREE', 'Box', 'Google My Business', 'ZOOM Cloud Meetings', 'join.me - Simple Meetings']
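
The loop in cell [7] checks membership in a Python list, which scans the whole list for every row. A set, or collections.Counter, does the same bookkeeping with constant-time lookups. A minimal sketch; find_duplicates is a hypothetical helper name and the sample names are made up rather than taken from the data set:

```python
from collections import Counter

def find_duplicates(names):
    """Return (number of extra copies, sorted names that occur more than once)."""
    counts = Counter(names)
    # A name seen k times contributes k - 1 duplicate entries,
    # which is how the loop in cell [7] fills duplicate_apps.
    n_extra = sum(count - 1 for count in counts.values())
    repeated = sorted(name for name, count in counts.items() if count > 1)
    return n_extra, repeated

# Hypothetical sample names
sample = ['Twitter', 'Box', 'Twitter', 'Slack', 'Twitter', 'Box']
print(find_duplicates(sample))  # (3, ['Box', 'Twitter'])
print(len(set(sample)))         # 3 unique names
```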


In [9]:
# The header row was already removed when the data was loaded, so len(android) counts only app rows
print('Expected length:', len(android) - len(duplicate_apps))

Expected length: 9660


Below we build reviews_max, a dictionary where each key is a unique app name and the corresponding value is
the highest number of reviews of that app.

In [10]:
reviews_max = {}

for row in android:
    name = row[0]
    n_reviews = float(row[2])
    
    if name in reviews_max and n_reviews > reviews_max[name]:
        reviews_max[name] = n_reviews
        
    elif name not in reviews_max:
        reviews_max[name] = n_reviews

In [11]:
print('reviews_max: ', len(reviews_max))
print('unique_apps: ', len(unique_apps))

reviews_max:  9660
unique_apps:  9660


Removing duplicate data using the dictionary reviews_max

In [12]:
android_clean = []
already_added = []

for row in android:
    name = row[0]
    n_reviews = float(row[2])
    if n_reviews == reviews_max[name] and name not in already_added:
        android_clean.append(row)
        already_added.append(name)

In [13]:
explore_data(android_clean, 0, 3, True)

['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['Coloring book moana', 'ART_AND_DESIGN', '3.9', '967', '14M', '500,000+', 'Free', '0', 'Everyone', 'Art & Design;Pretend Play', 'January 15, 2018', '2.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


Number of rows: 8197
Number of columns: 13
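
Cells [10] and [12] make two passes over the data: one to find each app's maximum, one to keep the matching row. The same rule fits in a single pass that remembers the best row seen so far for each name. Below is a minimal sketch on hypothetical two-column rows; dedupe_by_max is an illustrative name, and the column indices are parameters. In the Google Play layout the review count is the Reviews column (index 3), while index 2 is Rating; cells [10] and [12] read index 2. Since some Rating values are the string 'NaN' and float('NaN') never compares equal to itself, an app whose stored maximum is NaN never passes the == test in cell [12], which likely explains why android_clean has 8,197 rows rather than 9,660.

```python
def dedupe_by_max(rows, name_idx, reviews_idx):
    """Keep one row per app name: the earliest row with the highest review count."""
    best = {}  # name -> (review count, row)
    for row in rows:
        name = row[name_idx]
        n_reviews = float(row[reviews_idx])
        # Strict '>' means a later row with an equal count does not replace the
        # stored one, so ties resolve to the earliest row, as already_added does in cell [12].
        if name not in best or n_reviews > best[name][0]:
            best[name] = (n_reviews, row)
    # Dicts keep insertion order, so names appear in order of first occurrence.
    return [row for _, row in best.values()]

# Hypothetical two-column rows (name, review count); the Twitter counts
# are the values printed earlier for the duplicate Twitter entries.
rows = [['Twitter', '11657972'], ['Box', '5'],
        ['Twitter', '11667403'], ['Twitter', '11667403']]
print(dedupe_by_max(rows, 0, 1))  # [['Twitter', '11667403'], ['Box', '5']]
```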


Let's also check the App Store data set for duplicate app IDs:

In [14]:
#Total duplicate apps in Apple store
duplicate_apps = []
unique_apps = []

for app in ios:
    app_id = app[0]  # app_id avoids shadowing the built-in id()
    if app_id in unique_apps:
        duplicate_apps.append(app_id)
    else:
        unique_apps.append(app_id)
print('Number of duplicate apps:', len(duplicate_apps))
print('Example of duplicate apps', duplicate_apps[:15])

Number of duplicate apps: 0
Example of duplicate apps []


# Removing Non-English Apps

We'd like to analyze only the apps that are designed for an English-speaking audience. We'll write a function that takes in a string and returns False if any character in it falls outside the set of common English characters (ASCII, code points 0 to 127); otherwise, the function returns True.

In [15]:
def is_english(string):
    for letter in string:
        if ord(letter) > 127:
            return False          
    return True

In [16]:
print(is_english('Instagram'))
print(is_english('爱奇艺PPS -《欢乐颂2》电视剧热播'))
print(is_english('Docs To Go™ Free Office Suite'))
print(is_english('Instachat 😜'))
print(ord('™'))
print(ord('😜'))

True
False
False
False
8482
128540


To minimize the impact of data loss, we will only remove an app if its name has more than three non-ASCII characters:

In [17]:
def is_english(string):
    n=0
    for letter in string:
        if ord(letter) > 127:
            n += 1
    if n > 3:
        return False
    return True

In [18]:
print(is_english('Instagram'))
print(is_english('爱奇艺PPS -《欢乐颂2》电视剧热播'))
print(is_english('Docs To Go™ Free Office Suite'))
print(is_english('Instachat 😜'))

True
False
True
True
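
The threshold rule can also be written as a single count. The sketch below is equivalent to the is_english() version above; is_mostly_ascii and its max_non_ascii parameter are hypothetical names. It also makes the cost of the threshold visible: a hypothetical English name with four emoji is rejected.

```python
def is_mostly_ascii(name, max_non_ascii=3):
    """True if name has at most max_non_ascii characters with a code point above 127."""
    n_non_ascii = sum(1 for ch in name if ord(ch) > 127)
    return n_non_ascii <= max_non_ascii

print(is_mostly_ascii('Docs To Go™ Free Office Suite'))  # True: one non-ASCII character
print(is_mostly_ascii('爱奇艺PPS -《欢乐颂2》电视剧热播'))  # False: 13 non-ASCII characters
print(is_mostly_ascii('Instachat 😜😜😜😜'))              # False: four emoji (hypothetical name)
```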


In [19]:
#filter out non-English apps from both data sets
english_android = []
english_ios = []

for row in android_clean:
    name = row[0]
    if is_english(name):
        english_android.append(row)
        
for row in ios:
    name = row[1]
    if is_english(name):
        english_ios.append(row)

In [20]:
explore_data(english_android, 0, 3, True)
print('\n')
explore_data(english_ios, 0, 3, True)

['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['Coloring book moana', 'ART_AND_DESIGN', '3.9', '967', '14M', '500,000+', 'Free', '0', 'Everyone', 'Art & Design;Pretend Play', 'January 15, 2018', '2.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


Number of rows: 8167
Number of columns: 13


['284882215', 'Facebook', '389879808', 'USD', '0.0', '2974676', '212', '3.5', '3.5', '95.0', '4+', 'Social Networking', '37', '1', '29', '1']


['389801252', 'Instagram', '113954816', 'USD', '0.0', '2161558', '1289', '4.5', '4.0', '10.23', '12+', 'Photo & Video', '37', '0', '29', '1']


['529479190', 'Clash of Clans', '116476928', 'USD', '0.0', '2130805', '579', '4.5', '4.5', '9.24.12', '9+', 'Games', '38', '5', '18', '1']

# Isolating the Free Apps

As mentioned in the introduction, we only develop apps that are free to download and install, and our main source of revenue is in-app ads. Our data sets contain both free and non-free apps, so we need to isolate the free apps for our analysis. Below, we do this for both data sets.

In [21]:
#isolating free apps in separate lists: 
free_android = []
free_ios = []

for row in english_android:
    price = row[7]
    if price == '0':
        free_android.append(row)

for row in english_ios:
    price = row[4]
    if price == '0.0':
        free_ios.append(row)

In [22]:
explore_data(free_android, 0, 5, True)

['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['Coloring book moana', 'ART_AND_DESIGN', '3.9', '967', '14M', '500,000+', 'Free', '0', 'Everyone', 'Art & Design;Pretend Play', 'January 15, 2018', '2.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 and up']


['Pixel Draw - Number Art Coloring Book', 'ART_AND_DESIGN', '4.3', '967', '2.8M', '100,000+', 'Free', '0', 'Everyone', 'Art & Design;Creativity', 'June 20, 2018', '1.1', '4.4 and up']


Number of rows: 7565
Number of columns: 13


In [23]:
explore_data(free_ios,0, 5, True)

['284882215', 'Facebook', '389879808', 'USD', '0.0', '2974676', '212', '3.5', '3.5', '95.0', '4+', 'Social Networking', '37', '1', '29', '1']


['389801252', 'Instagram', '113954816', 'USD', '0.0', '2161558', '1289', '4.5', '4.0', '10.23', '12+', 'Photo & Video', '37', '0', '29', '1']


['529479190', 'Clash of Clans', '116476928', 'USD', '0.0', '2130805', '579', '4.5', '4.5', '9.24.12', '9+', 'Games', '38', '5', '18', '1']


['420009108', 'Temple Run', '65921024', 'USD', '0.0', '1724546', '3842', '4.5', '4.0', '1.6.2', '9+', 'Games', '40', '5', '1', '1']


['284035177', 'Pandora - Music & Radio', '130242560', 'USD', '0.0', '1126879', '3594', '4.0', '4.5', '8.4.1', '12+', 'Music', '37', '4', '1', '1']


Number of rows: 3222
Number of columns: 16
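
Cell [21] matches prices by exact string: '0' for Google Play and '0.0' for the App Store. Converting to a number first makes the check independent of how each file writes zero. A minimal sketch; is_free is a hypothetical helper, and stripping a leading '$' assumes paid Google Play prices look like '$4.99', which is worth verifying against the data:

```python
def is_free(price):
    """Treat a price string as free if it parses to zero.

    Assumption: a leading '$' may appear on paid prices (e.g. '$4.99').
    """
    return float(price.strip().lstrip('$')) == 0.0

for price in ['0', '0.0', '$4.99', '2.99']:
    print(price, is_free(price))
```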


# Most Common Apps by Genre

So far, we've spent a lot of time cleaning the data:

- Removing inaccurate data
- Removing duplicate app entries
- Removing non-English apps
- Isolating the free apps

As we mentioned in the introduction, our goal is to determine the types of apps that are likely to attract more users, because our revenue is strongly influenced by the number of people using our apps.

Since our end goal is to publish the apps on both the App Store and Google Play, we need to find app profiles that are successful in both markets. For example, one profile that might work well in both markets is a productivity app that uses gamification.

Let's start the analysis by getting an idea of the most common genres in each market. To do this, we'll build frequency tables for the prime_genre column of the App Store data set and for the Genres and Category columns of the Google Play data set.

We'll create two functions to analyze the frequency tables:

- A function that generates frequency tables showing percentages
- A function that displays the percentages in descending order

In [24]:
def freq_table(dataset, index):
    freq_dict = {}
    for row in dataset:
        value = row[index]
        if value in freq_dict:
            freq_dict[value] += 1
        else:
            freq_dict[value] = 1
    for key in freq_dict:
        freq_dict[key]= freq_dict[key]/len(dataset) * 100
    return freq_dict
    

In [25]:
def display_table(dataset, index):
    table = freq_table(dataset, index)
    table_display = []
    for key in table:
        key_val_as_tuple = (table[key], key)
        table_display.append(key_val_as_tuple)

    table_sorted = sorted(table_display, reverse = True)
    for entry in table_sorted:
        print(entry[1], ':', entry[0])
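
The two functions above can also be built on collections.Counter, which does the counting, and sorted() with a key, which avoids swapping each pair into a (percentage, value) tuple. A minimal sketch on hypothetical rows; freq_table_pct and sorted_table are illustrative names:

```python
from collections import Counter

def freq_table_pct(dataset, index):
    """Percentage of rows taking each value in column `index`."""
    counts = Counter(row[index] for row in dataset)
    total = len(dataset)
    return {value: count / total * 100 for value, count in counts.items()}

def sorted_table(dataset, index):
    """(value, percentage) pairs, largest percentage first."""
    table = freq_table_pct(dataset, index)
    return sorted(table.items(), key=lambda item: item[1], reverse=True)

# Hypothetical rows: (name, genre)
rows = [['a', 'Games'], ['b', 'Games'], ['c', 'Music'], ['d', 'Games']]
for genre, pct in sorted_table(rows, 1):
    print(genre, ':', pct)
# Games : 75.0
# Music : 25.0
```

One behavioral difference: display_table() breaks ties by comparing the values themselves (reverse alphabetical order), while sorted() with a key is stable and keeps tied values in the order they were first seen.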

In [26]:
display_table(free_ios, 11)

Games : 58.16263190564867
Entertainment : 7.883302296710118
Photo & Video : 4.9658597144630665
Education : 3.662321539416512
Social Networking : 3.2898820608317814
Shopping : 2.60707635009311
Utilities : 2.5139664804469275
Sports : 2.1415270018621975
Music : 2.0484171322160147
Health & Fitness : 2.0173805090006205
Productivity : 1.7380509000620732
Lifestyle : 1.5828677839851024
News : 1.3345747982619491
Travel : 1.2414649286157666
Finance : 1.1173184357541899
Weather : 0.8690254500310366
Food & Drink : 0.8069522036002483
Reference : 0.5586592178770949
Business : 0.5276225946617008
Book : 0.4345127250155183
Navigation : 0.186219739292365
Medical : 0.186219739292365
Catalogs : 0.12414649286157665


We can see that among the free English apps, more than half (58.16%) are games. Entertainment apps make up close to 8%, followed by photo and video apps at close to 5%. Only 3.66% of the apps are designed for education, followed by social networking apps at 3.29% of the apps in our data set.

The general impression is that the App Store (at least the part containing free English apps) is dominated by apps designed for fun (games, entertainment, photo and video, social networking, sports, music, etc.), while apps with practical purposes (education, shopping, utilities, productivity, lifestyle, etc.) are rarer. However, the fact that fun apps are the most numerous does not mean they also have the most users: demand may not match supply.

In [27]:
display_table(free_android, 1)  #Category

FAMILY : 19.07468605419696
GAME : 11.037673496364839
TOOLS : 8.67151354923992
FINANCE : 3.8202247191011236
PRODUCTIVITY : 3.727693324520819
LIFESTYLE : 3.688037012557832
BUSINESS : 3.344348975545274
PHOTOGRAPHY : 3.2782551222736287
SPORTS : 3.146067415730337
COMMUNICATION : 3.0931923331130204
PERSONALIZATION : 3.0799735624586915
HEALTH_AND_FITNESS : 3.0667547918043625
MEDICAL : 3.0138797091870457
SOCIAL : 2.6569729015201586
NEWS_AND_MAGAZINES : 2.6173165895571713
TRAVEL_AND_LOCAL : 2.3661599471249173
SHOPPING : 2.3529411764705883
BOOKS_AND_REFERENCE : 2.1017845340383343
VIDEO_PLAYERS : 1.903502974223397
DATING : 1.7316589557171185
EDUCATION : 1.4937210839391937
MAPS_AND_NAVIGATION : 1.4805023132848645
ENTERTAINMENT : 1.3218770654329148
FOOD_AND_DRINK : 1.2161269001982815
AUTO_AND_VEHICLES : 0.9517514871116985
WEATHER : 0.8592200925313946
LIBRARIES_AND_DEMO : 0.8460013218770654
HOUSE_AND_HOME : 0.8195637805684072
ART_AND_DESIGN : 0.7534699272967614
COMICS : 0.7005948446794449
PARENTING 

The situation looks noticeably different on Google Play: there aren't that many apps designed for fun, and a fair number of apps seem to be designed for practical purposes (family, tools, business, lifestyle, productivity, etc.). However, if we dig deeper, we can see that the FAMILY category (which accounts for almost 19% of the apps) mostly means games for children.

Even so, practical apps seem to be better represented on Google Play than on the App Store. The frequency table for the Genres column supports this picture:

In [28]:
display_table(free_android, 9)  #Genre

Tools : 8.658294778585592
Entertainment : 6.014540647719762
Education : 5.393258426966292
Finance : 3.8202247191011236
Productivity : 3.727693324520819
Lifestyle : 3.674818241903503
Action : 3.5426305353602117
Business : 3.344348975545274
Photography : 3.2782551222736287
Sports : 3.2253800396563124
Communication : 3.0931923331130204
Personalization : 3.0799735624586915
Health & Fitness : 3.0667547918043625
Medical : 3.0138797091870457
Social : 2.6569729015201586
News & Magazines : 2.6173165895571713
Travel & Local : 2.3529411764705883
Shopping : 2.3529411764705883
Simulation : 2.313284864507601
Books & Reference : 2.1017845340383343
Arcade : 2.0356906807666886
Casual : 1.982815598149372
Video Players & Editors : 1.890284203569068
Dating : 1.7316589557171185
Maps & Navigation : 1.4805023132848645
Food & Drink : 1.2161269001982815
Racing : 1.1103767349636484
Puzzle : 1.0971579643093192
Role Playing : 1.0575016523463316
Strategy : 1.0178453403833443
Auto & Vehicles : 0.9517514871116985
We

The difference between the Genres and Category columns is unclear, but one thing we can notice is that the Genres column is much more granular (it has more categories). Since we only want an overview at the moment, we'll work with the Category column from here on.

So far, we've found that the App Store is dominated by apps designed for fun, while Google Play presents a more balanced landscape of practical and fun apps. We would now like to have an idea of the type of apps that have the most users.

# Most popular apps by genre on the App Store

One way to find out which genres are the most popular (that is, have the most users) is to calculate the average number of installs for each app genre. For the Google Play data set, we can find this information in the Installs column, but it is missing from the App Store data set. As a workaround, we'll use the total number of user ratings as a proxy, which we can find in the rating_count_tot column.

Below we calculate the average number of user ratings per app genre on the App Store:

In [29]:
for genre in freq_table(free_ios, 11):  # index 11 is prime_genre
    total = 0
    len_genre = 0
    for row in free_ios:
        genre_app = row[11]
        if genre_app == genre:
            n_ratings = float(row[5])  # index 5 is rating_count_tot
            total += n_ratings
            len_genre += 1
    avg_n_ratings = total / len_genre
    print(genre, ': ', avg_n_ratings)

Social Networking :  71548.34905660378
Photo & Video :  28441.54375
Games :  22788.6696905016
Music :  57326.530303030304
Reference :  74942.11111111111
Health & Fitness :  23298.015384615384
Weather :  52279.892857142855
Utilities :  18684.456790123455
Travel :  28243.8
Shopping :  26919.690476190477
News :  21248.023255813954
Navigation :  86090.33333333333
Lifestyle :  16485.764705882353
Entertainment :  14029.830708661417
Food & Drink :  33333.92307692308
Sports :  23008.898550724636
Book :  39758.5
Finance :  31467.944444444445
Education :  7003.983050847458
Productivity :  21028.410714285714
Business :  7491.117647058823
Catalogs :  4004.0
Medical :  612.0
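
The cell above rescans free_ios once for every genre. Accumulating a running sum and count per genre gives the same averages in a single pass. A minimal sketch; average_by_group is a hypothetical helper, and for the App Store data it would be called with the indices used above (11 for prime_genre, 5 for rating_count_tot):

```python
from collections import defaultdict

def average_by_group(dataset, group_idx, value_idx):
    """Mean of column value_idx for each distinct value of column group_idx, in one pass."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for row in dataset:
        group = row[group_idx]
        totals[group] += float(row[value_idx])
        counts[group] += 1
    return {group: totals[group] / counts[group] for group in totals}

# Hypothetical rows: (name, genre, rating count)
rows = [['App A', 'Games', '100'], ['App B', 'Music', '30'], ['App C', 'Games', '300']]
print(average_by_group(rows, 1, 2))  # {'Games': 200.0, 'Music': 30.0}
```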


We notice that navigation apps have the highest average number of user ratings, but this figure is heavily influenced by Waze and Google Maps, which together have close to half a million user ratings:

In [30]:
for app in free_ios:
    if app[11] == 'Navigation':  # index 11 is prime_genre
        print(app[1], ':', app[5])

Waze - GPS Navigation, Maps & Real-time Traffic : 345046
Google Maps - Navigation & Transit : 154911
Geocaching® : 12811
CoPilot GPS – Car Navigation & Offline Maps : 3582
ImmobilienScout24: Real Estate Search in Germany : 187
Railway Route Search : 5


The same pattern applies to social networking apps, where the average is heavily influenced by a few giants like Facebook, Pinterest, Skype, etc. The same goes for music apps, where a few big players like Pandora, Spotify, and Shazam heavily influence the average.

Our goal is to find popular genres, but navigation, social networking, or music apps may seem more popular than they really are. The average number of ratings seems to be skewed by very few apps with hundreds of thousands of user ratings, while other apps may struggle to get past the 10,000 mark. We might get a better picture by removing these extremely popular apps from each genre and then recomputing the averages, but we'll leave that level of detail for later.
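
Two standard ways to reduce the influence of a few giants are the median and a mean computed after dropping the largest values. A minimal sketch using the six Navigation rating counts printed above; mean_without_top is a hypothetical helper name:

```python
from statistics import mean, median

# rating_count_tot for the six free English Navigation apps printed above
navigation = [345046, 154911, 12811, 3582, 187, 5]

def mean_without_top(values, n_top):
    """Mean after dropping the n_top largest values (assumes n_top < len(values))."""
    kept = sorted(values)[:len(values) - n_top]
    return sum(kept) / len(kept)

print(mean(navigation))                 # about 86090.33, the average reported above
print(median(navigation))               # 8196.5
print(mean_without_top(navigation, 2))  # 4146.25, without Waze and Google Maps
```

The median only looks at the middle of the sorted list, so it ignores how large the top values are; the trimmed mean needs a choice of how many apps to drop, which is a judgment call per genre.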

Reference apps have an average of 74,942 user ratings, but it's actually the Bible and Dictionary.com that skew the average:

In [31]:
for app in free_ios:
    if app[11] == 'Reference':  # index 11 is prime_genre
        print(app[1], ':', app[5])

Bible : 985920
Dictionary.com Dictionary & Thesaurus : 200047
Dictionary.com Dictionary & Thesaurus for iPad : 54175
Google Translate : 26786
Muslim Pro: Ramadan 2017 Prayer Times, Azan, Quran : 18418
New Furniture Mods - Pocket Wiki & Game Tools for Minecraft PC Edition : 17588
Merriam-Webster Dictionary : 16849
Night Sky : 12122
City Maps for Minecraft PE - The Best Maps for Minecraft Pocket Edition (MCPE) : 8535
LUCKY BLOCK MOD ™ for Minecraft PC Edition - The Best Pocket Wiki & Mods Installer Tools : 4693
GUNS MODS for Minecraft PC Edition - Mods Tools : 1497
Guides for Pokémon GO - Pokemon GO News and Cheats : 826
WWDC : 762
Horror Maps for Minecraft PE - Download The Scariest Maps for Minecraft Pocket Edition (MCPE) Free : 718
VPN Express : 14
Real Bike Traffic Rider Virtual Reality Glasses : 8
教えて!goo : 0
Jishokun-Japanese English Dictionary & Translator : 0


One thing we could do is take another popular book and turn it into an app where we could add different features on top of the raw version of the book. This can include daily quotes from the book, an audio version of the book, quizzes about the book, etc. Besides that, we can also embed a dictionary in the app, so users don't need to leave our app to look up words in an external app.

This idea seems to fit well with the fact that the App Store is dominated by apps designed for fun. This suggests that the market might be a bit saturated with fun apps, which means a practical app might have a better chance of standing out among the huge number of apps on the App Store.

Other genres that seem popular include weather, book, food and drink, and finance. The book genre seems to overlap a bit with the app idea we described above, but the other genres don't seem too interesting to us:

Weather apps - people generally don't spend much time in the app, so the chances of earning from in-app ads are low. Also, getting reliable live weather data may require us to connect our apps to non-free APIs.

Food and drink - examples here include Starbucks, Dunkin' Donuts, McDonald's, etc. Making a popular food and drink app therefore requires actual cooking and a delivery service, which is outside the scope of our company.

Finance apps - these apps involve banking, bill paying, money transfers, etc. Building a finance app requires domain knowledge, and we don't want to hire a finance expert just to build an app.

Let's now analyze the Google Play market a bit.

# Most popular apps by genre on Google Play

For the Google Play market, we actually have data on the number of installs, so we should be able to get a clearer picture of how popular each genre is. However, the install numbers are not very precise: most of the values are open-ended (100+, 1,000+, 5,000+, etc.):

In [32]:
for category in freq_table(free_android, 1):
    total = 0
    len_category = 0
    for row in free_android:
        category_app = row[1]
        if category_app == category:
            n_install = row[5]
            n_install = n_install.replace('+', '')
            n_install = n_install.replace(',', '')
            n_install = float(n_install)
            total += n_install
            len_category += 1
    avg_number_install = total / len_category
    print(category, ': ', avg_number_install)

ART_AND_DESIGN :  2003791.2280701755
AUTO_AND_VEHICLES :  737219.4444444445
BEAUTY :  640861.9047619047
BOOKS_AND_REFERENCE :  10476157.264150944
BUSINESS :  2753974.1501976284
COMICS :  847567.9245283019
COMMUNICATION :  47166160.384615384
DATING :  1075582.5190839695
EDUCATION :  3108407.079646018
ENTERTAINMENT :  21134600.0
EVENTS :  354431.3333333333
FINANCE :  1574833.2179930797
FOOD_AND_DRINK :  2300192.934782609
HEALTH_AND_FITNESS :  4885919.051724138
HOUSE_AND_HOME :  1565838.7096774194
LIBRARIES_AND_DEMO :  813796.875
LIFESTYLE :  1782802.9032258065
GAME :  16655938.269461079
FAMILY :  3045982.508662509
MEDICAL :  168882.35087719298
SOCIAL :  27302664.05472637
SHOPPING :  7866974.382022472
PHOTOGRAPHY :  18738970.201612905
SPORTS :  4601628.844537815
TRAVEL_AND_LOCAL :  16171381.56424581
TOOLS :  12344508.658536585
PERSONALIZATION :  6562636.9527897
PRODUCTIVITY :  20537621.879432622
PARENTING :  647208.5416666666
WEATHER :  5542846.153846154
VIDEO_PLAYERS :  27268931.94444444
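
Stripping '+' and ',' turns each Installs bucket into its lower bound, so every average above is a lower bound on the true average. A minimal sketch; parse_installs is a hypothetical helper that returns an int:

```python
def parse_installs(installs):
    """Return the lower bound of an open-ended install bucket, e.g. '1,000,000+' -> 1000000."""
    return int(installs.replace(',', '').replace('+', ''))

# Two hypothetical apps in the same bucket: one may really have 1.2M installs
# and the other 4.9M, but both parse to the same lower bound.
bucket = ['1,000,000+', '1,000,000+']
lower_bounds = [parse_installs(b) for b in bucket]
print(sum(lower_bounds) / len(lower_bounds))  # 1000000.0
```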

We notice that communication apps have the highest average number of installs: 47,166,160. This number is heavily skewed by a few apps with over one billion installs (WhatsApp, Facebook Messenger, Skype, Google Chrome, Gmail, and Hangouts) and several others with over 100 million or 500 million installs:

In [33]:
for app in free_android:
    if app[1] == 'COMMUNICATION' and (app[5] == '1,000,000,000+'
                                      or app[5] == '500,000,000+'
                                      or app[5] == '100,000,000+'):
        print(app[0], ':', app[5])

Messenger – Text and Video Chat for Free : 1,000,000,000+
WhatsApp Messenger : 1,000,000,000+
Google Chrome: Fast & Secure : 1,000,000,000+
Messenger Lite: Free Calls & Messages : 100,000,000+
Gmail : 1,000,000,000+
Hangouts : 1,000,000,000+
Viber Messenger : 500,000,000+
Firefox Browser fast & private : 100,000,000+
Yahoo Mail – Stay Organized : 100,000,000+
imo beta free calls and text : 100,000,000+
imo free video calls and chat : 500,000,000+
Opera Mini - fast web browser : 100,000,000+
Opera Browser: Fast and Secure : 100,000,000+
Who : 100,000,000+
WeChat : 100,000,000+
UC Browser Mini -Tiny Fast Private & Secure : 100,000,000+
Android Messages : 100,000,000+
Telegram : 100,000,000+
Google Duo - High Quality Video Calls : 500,000,000+
UC Browser - Fast Download Private & Secure : 500,000,000+
Skype - free IM & video calls : 1,000,000,000+
GO SMS Pro - Messenger, Free Themes, Emoji : 100,000,000+
LINE: Free Calls & Messages : 500,000,000+
BBM - Free Calls & Messages : 100,000,000+

If we remove all communication apps with 100 million or more installs, the average drops roughly tenfold:

In [34]:
under_100_m = []

for app in free_android:
    n_installs = app[5]
    n_installs = n_installs.replace(',', '')
    n_installs = n_installs.replace('+', '')
    if (app[1] == 'COMMUNICATION') and (float(n_installs) < 100000000):
        under_100_m.append(float(n_installs))
        
sum(under_100_m) / len(under_100_m)

4525997.729468599

We see the same pattern for the video players category, which averages 27,268,932 installs, nearly tied with social apps (27,302,664). The market is dominated by apps such as YouTube, Google Play Movies & TV, or MX Player. The pattern repeats for social apps (where we have giants like Facebook, Instagram, Google+, etc.), photography apps (Google Photos and other popular photo editors), and productivity apps (Microsoft Word, Dropbox, Google Calendar, Evernote, etc.).

Again, the main concern is that these kinds of apps may seem more popular than they actually are. Moreover, these niches seem to be dominated by a few giants that are hard to compete with.

The game genre seems quite popular, but we previously discovered that this part of the market seems a bit saturated, so we'd like to offer a different app recommendation if possible.

The books and reference genre also seems quite popular, with an average of 10,476,157 installs. It's worth exploring in more depth, since we found that this genre has some potential to do well on the App Store, and our goal is to recommend an app genre with earning potential on both the App Store and Google Play.

Let's take a look at some of the apps of this genre and their number of installs:

In [35]:
for app in free_android:
    if app[1] == 'BOOKS_AND_REFERENCE':
        print(app[0], ':', app[5])

Wattpad 📖 Free Books : 100,000,000+
E-Book Read - Read Book for free : 50,000+
Download free book with green book : 100,000+
Wikipedia : 10,000,000+
Amazon Kindle : 100,000,000+
Cool Reader : 10,000,000+
Dictionary - Merriam-Webster : 10,000,000+
NOOK: Read eBooks & Magazines : 10,000,000+
Free Panda Radio Music : 100,000+
Book store : 1,000,000+
FBReader: Favorite Book Reader : 10,000,000+
English Grammar Complete Handbook : 500,000+
Free Books - Spirit Fanfiction and Stories : 1,000,000+
Google Play Books : 1,000,000,000+
AlReader -any text book reader : 5,000,000+
Offline English Dictionary : 100,000+
Oxford Dictionary of English : Free : 10,000,000+
Offline: English to Tagalog Dictionary : 500,000+
Spanish English Translator : 10,000,000+
FamilySearch Tree : 1,000,000+
Cloud of Books : 1,000,000+
Recipes of Prophetic Medicine for free : 500,000+
ReadEra – free ebook reader : 1,000,000+
NOOK App for NOOK Devices : 500,000+
Ebook Reader : 5,000,000+
Litnet - E-books : 100,000+
Read b

However, there seem to be only a few very popular apps, so this market still shows potential. Let's try to come up with app ideas based on the kinds of apps that fall somewhere in the middle in terms of popularity (between 1,000,000 and 100,000,000 installs):

In [36]:
for app in free_android:
    if app[1] == 'BOOKS_AND_REFERENCE' and (app[5] == '1,000,000+'
                                            or app[5] == '5,000,000+'
                                            or app[5] == '10,000,000+'
                                            or app[5] == '50,000,000+'):
        print(app[0], ':', app[5])

Wikipedia : 10,000,000+
Cool Reader : 10,000,000+
Dictionary - Merriam-Webster : 10,000,000+
NOOK: Read eBooks & Magazines : 10,000,000+
Book store : 1,000,000+
FBReader: Favorite Book Reader : 10,000,000+
Free Books - Spirit Fanfiction and Stories : 1,000,000+
AlReader -any text book reader : 5,000,000+
Oxford Dictionary of English : Free : 10,000,000+
Spanish English Translator : 10,000,000+
FamilySearch Tree : 1,000,000+
Cloud of Books : 1,000,000+
ReadEra – free ebook reader : 1,000,000+
Ebook Reader : 5,000,000+
Read books online : 5,000,000+
eBoox: book reader fb2 epub zip : 1,000,000+
All Maths Formulas : 1,000,000+
Ancestry : 5,000,000+
HTC Help : 10,000,000+
Moon+ Reader : 10,000,000+
English-Myanmar Dictionary : 1,000,000+
Golden Dictionary (EN-AR) : 1,000,000+
All Language Translator Free : 1,000,000+
English Dictionary - Offline : 10,000,000+
Aldiko Book Reader : 10,000,000+
Dictionary - WordWeb : 5,000,000+
50000 Free eBooks & Free AudioBooks : 5,000,000+
Al-Quran (Free) :
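
Cell [36] spells out the four install buckets as string literals. Comparing each bucket's parsed lower bound against numeric limits selects the same buckets without listing every string. A minimal sketch; in_install_range is a hypothetical helper, and the half-open range [1,000,000, 100,000,000) is chosen to match the buckets used in that cell:

```python
def in_install_range(installs, low, high):
    """True if the install bucket's lower bound lies in [low, high)."""
    n = int(installs.replace(',', '').replace('+', ''))
    return low <= n < high

buckets = ['100,000+', '1,000,000+', '5,000,000+', '10,000,000+', '50,000,000+', '100,000,000+']
middle = [b for b in buckets if in_install_range(b, 1_000_000, 100_000_000)]
print(middle)  # ['1,000,000+', '5,000,000+', '10,000,000+', '50,000,000+']
```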

This niche seems to be dominated by software for processing and reading e-books, as well as various collections of libraries and dictionaries, so it's probably not a good idea to build similar apps, since there will be significant competition.

We also notice that there are many apps built around the Quran, which suggests that building an app around a popular book can be profitable. Taking a popular book (perhaps a more recent one) and turning it into an app could work on both the Google Play and App Store marketplaces.

However, it seems that the market is already full of libraries, so we need to add some special features besides the raw version of the book. This can include daily quotes from the book, an audio version of the book, quizzes about the book, a forum where people can discuss the book, etc.

# Conclusion

In this project, we analyzed data on the App Store and Google Play mobile apps with the aim of recommending an app profile that can be profitable for both markets.

We concluded that taking a popular book (perhaps a newer book) and turning it into an app could be profitable for the Google Play and App Store marketplaces. The marketplaces are already full of libraries, so we need to add special features on top of the raw version of the book. This can include daily quotes from the book, an audio version of the book, quizzes about the book, a forum where people can discuss the book, etc.