# Successful App Models for the App Store and Google Play Markets

1. This data analysis project focuses on understanding the key factors that attract users to our free mobile apps within the Android and iOS ecosystems. As our company relies on in-app advertisements for revenue, maximizing our user base is essential. By diving into app usage data, we aim to identify the types of apps that consistently gain popularity and draw in larger audiences.
2. The goal of this project is straightforward: we want to provide our developers with data-driven insights into the characteristics of successful apps. These insights will help guide future development decisions, enabling us to create apps that are more likely to capture a wider user base and ultimately increase our advertising revenue.

# 1. Opening & Exploring the Data

As of September 2018, the App Store hosted around 2 million iOS applications, while Google Play had about 2.1 million Android applications. Given the extensive time and financial investment required to gather data on over 4 million applications, it’s more practical to examine a subset of the data. To conserve resources, we will explore available data that can be accessed at no cost before considering the collection of new data. We have two datasets that align with our objectives:

A dataset detailing approximately 10,000 Android apps from Google Play, collected in August 2018.
- Application name: The name of the application
- Category: The category the app belongs to
- Rating: Overall user rating of the app (as when scraped)
- Reviews: Number of user reviews for the app (as when scraped)
- Size: Size of the app (as when scraped)
- Installs: Number of user downloads/installs for the app (as when scraped)
- Type: Whether the app is Paid or Free
- Price: Price of the app (as when scraped)
- Content Rating: Age group the app is targeted at - Children / Mature 21+ / Adult
- Genres: Genres the app belongs to (apart from its main category)

A dataset with information on roughly 7,000 iOS apps from the App Store, gathered in July 2017.
- id: App ID
- track_name: App Name
- size_bytes: Size (in Bytes)
- currency: Currency Type
- price: Price amount
- rating_count_tot: User Rating counts (for all versions)
- rating_count_ver: User Rating counts (for current version)
- user_rating: Average User Rating value (for all versions)
- user_rating_ver: Average User Rating value (for current version)
- ver: Latest version code
- cont_rating: Content Rating
- prime_genre: Primary Genre
- sup_devices.num: Number of supporting devices
- ipadSc_urls.num: Number of screenshots shown for display
- lang.num: Number of supported languages
- vpp_lic: Vpp Device Based Licensing Enabled

In [35]:
from csv import reader

In [86]:
# The google play dataset:
opened_file = open('C:\\Users\\PC\\Desktop\\DA PROJECT\\Mobile Apps Data\\googleplaystore.csv',encoding="utf8")
read_file = reader(opened_file)
android = list(read_file)
android_header = android[0]
android = android[1:]

# The App Store dataset:
opened_file = open('C:\\Users\\PC\\Desktop\\DA PROJECT\\Mobile Apps Data\\AppleStore.csv',encoding="utf8")
read_file = reader(opened_file)
ios = list(read_file)
ios_header = ios[0]
ios = ios[1:]
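
The two loading steps repeat the same pattern, so they could be wrapped in a small helper. The sketch below is one way to do it, not the notebook's own code: `load_dataset` is a hypothetical name, and it uses a `with` block so the file is closed as soon as it has been read.

```python
from csv import reader

def load_dataset(path, encoding="utf8"):
    """Return (header, rows) for a CSV file whose first row is the header."""
    # newline="" is the setting the csv module recommends for files passed to reader().
    with open(path, encoding=encoding, newline="") as opened_file:
        rows = list(reader(opened_file))
    return rows[0], rows[1:]
```

It would be called once per file, for example `android_header, android = load_dataset(path_to_googleplaystore_csv)`, where the path variable is whatever location the CSV is stored at.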

To simplify the examination of the two datasets, we define a function called explore_data(). It prints a slice of rows in a readable way and, optionally, the number of rows and columns in the dataset.

In [87]:
def explore_data(dataset, start, end, rows_and_columns = False):
    data_slice = dataset[start:end]
    for row in data_slice:
        print(row)
        print('\n')
    if rows_and_columns:
        print('Number of rows:', len(dataset))
        print('Number of columnns', len(dataset[0]))

Let's start with exploring the iOS data.

In [89]:
print(ios_header) 
print('\n')
explore_data(ios,0,3,rows_and_columns=True)

['id', 'track_name', 'size_bytes', 'currency', 'price', 'rating_count_tot', 'rating_count_ver', 'user_rating', 'user_rating_ver', 'ver', 'cont_rating', 'prime_genre', 'sup_devices.num', 'ipadSc_urls.num', 'lang.num', 'vpp_lic']


['284882215', 'Facebook', '389879808', 'USD', '0', '2974676', '212', '3.5', '3.5', '95', '4+', 'Social Networking', '37', '1', '29', '1']


['389801252', 'Instagram', '113954816', 'USD', '0', '2161558', '1289', '4.5', '4', '10.23', '12+', 'Photo & Video', '37', '0', '29', '1']


['529479190', 'Clash of Clans', '116476928', 'USD', '0', '2130805', '579', '4.5', '4.5', '9.24.12', '9+', 'Games', '38', '5', '18', '1']


Number of rows: 7197
Number of columnns 16


Based on the above snapshot, the most interesting columns in the iOS dataset seem to be track_name, currency, price, rating_count_tot, rating_count_ver, and prime_genre.

We continue with exploring Android data.

In [91]:
print(android_header) 
print('\n')
explore_data(android,0,3,rows_and_columns=True)

['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']


['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', '7-Jan-18', '1.0.0', '4.0.3 and up']


['Coloring book moana', 'ART_AND_DESIGN', '3.9', '967', '14M', '500,000+', 'Free', '0', 'Everyone', 'Art & Design;Pretend Play', '15-Jan-18', '2.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', '1-Aug-18', '1.2.4', '4.0.3 and up']


Number of rows: 10840
Number of columnns 13


In the Android dataset, the most interesting columns for the analysis seem to be App, Category, Rating, Reviews, Price, and Content Rating.

# 2. Delete Wrong Data

The Android dataset includes a discrepancy in row 10472. Let's examine this particular row, assess it alongside the header, and evaluate it against another row that is known to be accurate.

In [38]:
print(android[10472])
print('\n')
print(android_header)
print('\n')
print(android[0])
print('\n')

['osmino Wi-Fi: free WiFi', 'TOOLS', '4.2', '134203', '4.1M', '10,000,000+', 'Free', '0', 'Everyone', 'Tools', '7-Aug-18', '6.06.14', '4.4 and up']


['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']


['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', '7-Jan-18', '1.0.0', '4.0.3 and up']




Row 10472 corresponds to the app Life Made WI-Fi Touchscreen Photo Frame, and its rating is recorded as 19. This is clearly wrong because the maximum rating for a Google Play app is 5 (as mentioned in the discussions section, this problem is caused by a missing value in the 'Category' column). As a consequence, we'll delete this row.

In [39]:
print("The number of rows before deleting wrong data:", len(android))
del android[10472]
print("The number of rows after deleting wrong data:", len(android))

The number of rows before deleting wrong data: 10840
The number of rows after deleting wrong data: 10839


In [40]:
def row_check(data_set_list, store):
    # Expected number of columns in each store's dataset.
    expected_lengths = {'google': 13, 'apple': 16}

    if store not in expected_lengths:
        print('Wrong App Store Name')
        return

    length = expected_lengths[store]
    # Print every row whose column count differs from the expected one.
    for row in data_set_list:
        if len(row) != length:
            print(row)


row_check(android, 'google')
row_check(ios, 'apple')
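
Deleting a row by a hard-coded index is fragile: running the deletion cell a second time would remove a different, valid row. A more defensive sketch is shown below. It assumes that malformed rows can be recognised by a column count that differs from the header's, and `split_by_length` is a hypothetical helper, not part of the notebook.

```python
def split_by_length(rows, header):
    """Separate rows that match the header's column count from those that do not."""
    expected = len(header)
    good, bad = [], []
    for row in rows:
        (good if len(row) == expected else bad).append(row)
    return good, bad

# Toy data standing in for the real dataset; the second row is missing a column.
header = ['App', 'Category', 'Rating']
rows = [
    ['App A', 'TOOLS', '4.2'],
    ['App B', '1.9'],
    ['App C', 'GAME', '4.5'],
]
clean_rows, malformed_rows = split_by_length(rows, header)
print(len(clean_rows), 'rows kept;', len(malformed_rows), 'malformed')
```

Because the function returns new lists instead of deleting in place, re-running it has no further effect on the data.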

# 3. Removing Duplicate Entries

The Google Play dataset contains duplicate entries. For example, Instagram has 4 entries.

In [41]:
for app in android:
    name = app[0]
    if name == 'Instagram':
        print(app)

['Instagram', 'SOCIAL', '4.5', '66577313', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', '31-Jul-18', 'Varies with device', 'Varies with device']
['Instagram', 'SOCIAL', '4.5', '66577446', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', '31-Jul-18', 'Varies with device', 'Varies with device']
['Instagram', 'SOCIAL', '4.5', '66577313', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', '31-Jul-18', 'Varies with device', 'Varies with device']
['Instagram', 'SOCIAL', '4.5', '66509917', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', '31-Jul-18', 'Varies with device', 'Varies with device']


There may be many more duplicate entries. Let’s begin by identifying the affected records.

In [42]:
# Check the duplicates for android:
duplicate_apps_android = []
unique_apps_android = []

for app in android:
    name = app[0]
    if name in unique_apps_android:
        duplicate_apps_android.append(name)
    else:
        unique_apps_android.append(name)

print("Number of duplicate apps:" , len(duplicate_apps_android))
print('\n')
print("Examples of duplicate apps:" , duplicate_apps_android[:10])
print('\n')

Number of duplicate apps: 1180


Examples of duplicate apps: ['Quick PDF Scanner + OCR FREE', 'Box', 'Google My Business', 'ZOOM Cloud Meetings', 'join.me - Simple Meetings', 'Box', 'Zenefits', 'Google Ads', 'Google My Business', 'Slack']
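
Testing membership in a list scans the whole list, so the loop above slows down as the dataset grows. As a rough alternative, the same count can be obtained with `collections.Counter`. This is a sketch on toy rows, and `count_duplicates` is an illustrative name rather than part of the notebook.

```python
from collections import Counter

def count_duplicates(rows, name_index=0):
    """Return (number of surplus entries, {name: occurrences}) for repeated names."""
    counts = Counter(row[name_index] for row in rows)
    repeated = {name: n for name, n in counts.items() if n > 1}
    # Every occurrence after the first counts as one duplicate, as in the loop above.
    surplus = sum(n - 1 for n in repeated.values())
    return surplus, repeated

# Toy rows: only the name column matters here.
toy_rows = [['Box'], ['Slack'], ['Box'], ['Box'], ['Zenefits'], ['Slack']]
surplus, repeated = count_duplicates(toy_rows)
print(surplus, repeated)
```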




We don't want to count certain apps more than once when we analyze data, so we need to remove the duplicate entries and keep only one entry per app. One thing we could do is remove the duplicate rows randomly, but we could probably find a better way.

If you examine the rows we printed for the Instagram app, the main difference is in the fourth position of each row, which corresponds to the number of reviews. The different numbers show that the data was collected at different times. We can use this to build a criterion for keeping rows: rather than removing rows randomly, we'll keep the row with the highest number of reviews, because the higher the number of reviews, the more reliable the ratings.

To remove the duplicates, we will do the following:

- Create a dictionary, where each dictionary key is a unique app name and the corresponding dictionary value is the highest number of reviews of that app.
- Use the information stored in the dictionary and create a new dataset, which will have only one entry per app (and for each app, we'll only select the entry with the highest number of reviews).

We start by creating a dictionary that maps each app name to its highest number of reviews. It will be used later as the condition for building the deduplicated list.

In [43]:
reviews_max = {}

for app in android:
    name = app[0]
    n_reviews = float(app[3])
    
    if name in reviews_max and reviews_max[name] < n_reviews:
        reviews_max[name] = n_reviews
        
    elif name not in reviews_max:
        reviews_max[name] = n_reviews
        
print("Expected Length:", len(android) - 1180)
print("Actual Length:", len(reviews_max))

Expected Length: 9659
Actual Length: 9659


Now, let's use the reviews_max dictionary to remove the duplicates. For the duplicate cases, we'll keep only the entry with the highest number of reviews.
We also need to account for cases where more than one entry of a duplicate app shares the highest number of reviews (for example, the Box app has three entries with the same number of reviews). The already_add list handles this by making sure each app name is added only once.

As the next step, we will generate a new unique list of Android apps.

In [92]:
android_clean = []
already_add = []

for app in android:
    name = app[0]
    n_reviews = float(app[3])
    if reviews_max[name] == n_reviews and name not in already_add:
        android_clean.append(app)
        already_add.append(name)

explore_data(android_clean, 0, 3, True)

['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', '7-Jan-18', '1.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', '1-Aug-18', '1.2.4', '4.0.3 and up']


['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', '8-Jun-18', 'Varies with device', '4.2 and up']


Number of rows: 9659
Number of columnns 13


The cleaned dataset includes 9659 records, as expected.
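
The two passes above (build reviews_max, then filter) can also be folded into a single pass that stores the best row per app directly. The sketch below runs on toy rows: `keep_most_reviewed` is a hypothetical name, and the Box review counts are made up for illustration.

```python
def keep_most_reviewed(rows, name_index=0, reviews_index=3):
    """Keep one row per app name: the first row seen with the highest review count."""
    best = {}
    for row in rows:
        name = row[name_index]
        n_reviews = float(row[reviews_index])
        # Strict '>' means ties keep the earlier row, as in the two-pass version.
        if name not in best or n_reviews > float(best[name][reviews_index]):
            best[name] = row
    return list(best.values())

toy = [
    ['Instagram', 'SOCIAL', '4.5', '66577313'],
    ['Box', 'BUSINESS', '4.2', '159872'],
    ['Instagram', 'SOCIAL', '4.5', '66577446'],
    ['Box', 'BUSINESS', '4.2', '159872'],
]
for row in keep_most_reviewed(toy):
    print(row)
```

One difference from the two-pass version: the result is ordered by each app's first appearance, not by the position of the row that was kept.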

In [46]:
# Check the duplicates for iOS (column 0 holds the app id):
duplicate_apps_IOS = []
unique_apps_IOS = []

for app in ios:
    name = app[0]
    if name in unique_apps_IOS:
        duplicate_apps_IOS.append(name)
    else:
        unique_apps_IOS.append(name)

print("Number of duplicate apps:" , len(duplicate_apps_IOS))
print('\n')
print("Examples of duplicate apps:" , duplicate_apps_IOS[:10])
print('\n')

Number of duplicate apps: 0


Examples of duplicate apps: []




# 4. Removing Non-English Apps:

# Part One

Next, we exclude non-English apps from our analysis. We start by writing a function that checks whether the characters in a string fall within the ASCII range (code points 0 to 127), which covers the characters commonly used in English text.

In [47]:
def is_english(string):
    non = 0
    for character in string:
        if ord(character) > 127:
            non +=1
    if non >=3:
        return False
    else:
        return True

In [48]:
print(is_english('Instagram'))
print(is_english('爱奇艺PPS -《欢乐颂2》电视剧热播'))
print(is_english('Docs To Go™ Free Office Suite'))
print(is_english('Instachat 😜'))

True
False
True
True


The function behaves as intended on these examples. Some English app names contain emojis or symbols such as ™ or a dash (–) that fall outside the ASCII range, so is_english() tolerates up to two such characters and rejects a name only at the third. Even so, an English name with three or more special characters is still rejected, so we may inadvertently exclude useful apps.

# Part Two

Because is_english() rejects a name at its third non-ASCII character, it can still misclassify English apps as non-English, and we risk losing valuable data. To reduce this risk, we relax the rule: we will exclude an app only if its name contains more than three characters whose code points fall outside the ASCII range. English apps with up to three emojis or other special characters will then still be identified as English. While this filter isn’t perfect, it should be reasonably effective.

In [93]:
def isenglish_check(string):
    non_ascii = 0
    
    for character in string:
        if ord(character) > 127:
            non_ascii += 1
            
        if non_ascii > 3:
            return False
    
    return True

print(isenglish_check('Docs To Go™ Free Office Suite'))
print(isenglish_check('Instachat 😜'))
print(isenglish_check('爱奇艺PPS -《欢乐颂2》电视剧热播')) 

True
True
False
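
The two filters in this section differ only in their threshold, which a single parameter can capture. A minimal sketch follows; `count_non_ascii` and `looks_english` are illustrative names, not functions from the notebook.

```python
def count_non_ascii(name):
    """Number of characters whose code point lies outside the ASCII range (0-127)."""
    return sum(1 for character in name if ord(character) > 127)

def looks_english(name, max_non_ascii=3):
    """True when the name has at most max_non_ascii non-ASCII characters."""
    return count_non_ascii(name) <= max_non_ascii

names = [
    'Docs To Go™ Free Office Suite',
    'Instachat 😜',
    '爱奇艺PPS -《欢乐颂2》电视剧热播',
]
for name in names:
    print(name, '->', count_non_ascii(name), looks_english(name))
```

With `max_non_ascii=3` this gives the same answers as isenglish_check(); with `max_non_ascii=2` it gives the same answers as is_english().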


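Let's isolate English apps into separate lists. The loops below call is_english() from Part One, so a name is kept only if it has fewer than three non-ASCII characters.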

In [95]:
english_androids = []
english_ios = []

for row in android_clean:
    name = row[0]
    if is_english(name):
        english_androids.append(row)
        
for row in ios:
    name = row[1]
    if is_english(name):
        english_ios.append(row)
        
explore_data(english_androids, 0, 3, True)   
print('\n')
explore_data(english_ios, 0, 3, True)        

['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', '7-Jan-18', '1.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', '1-Aug-18', '1.2.4', '4.0.3 and up']


['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', '8-Jun-18', 'Varies with device', '4.2 and up']


Number of rows: 9597
Number of columnns 13


['284882215', 'Facebook', '389879808', 'USD', '0', '2974676', '212', '3.5', '3.5', '95', '4+', 'Social Networking', '37', '1', '29', '1']


['389801252', 'Instagram', '113954816', 'USD', '0', '2161558', '1289', '4.5', '4', '10.23', '12+', 'Photo & Video', '37', '0', '29', '1']


['529479190', 'Clash of Clans', '116476928', 'USD', '0', '2130805', '579', '4.5', '4.5', '9.24.12', '9+', 'Games', '38', '5', '18', '1']


Number of rows: 6155
Number of columnns 16

Upon excluding non-English apps, we are left with 9597 records in the Android dataset and 6155 records in the iOS dataset.

# 5. Isolate the Free Apps

In our initial overview, we emphasized that our app development exclusively focuses on creating free-to-download and install applications. Our primary revenue stream relies on in-app advertisements. Our data sets encompass both free and non-free apps. For our analysis, we will specifically extract and isolate the free apps from both data sets.

In [98]:
android_english_free=[]
ios_english_free=[]

for app in english_androids:
    price = app[7]
    if price == '0':
        android_english_free.append(app)

for app in english_ios:
    price = app[4]
    if price == '0':
        ios_english_free.append(app)


print('Android free apps: ',len(android_english_free))
print('iOS free apps: ', len(ios_english_free))

Android free apps:  8848
iOS free apps:  3203


After isolating the free apps, we are left with 8848 records from the Android dataset and 3203 records from the iOS dataset.
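
The comparison `price == '0'` works only when a free app's price is stored as exactly that string. If a dataset wrote prices as '0.0' or with a currency symbol (an assumption, since no such value appears in the rows shown here), a numeric check would be more robust. `is_free` below is a hypothetical helper sketched under that assumption.

```python
def is_free(price):
    """True when a price string represents zero, e.g. '0', '0.0' or '$0.00'."""
    cleaned = price.strip().lstrip('$')
    try:
        return float(cleaned) == 0.0
    except ValueError:
        # Unparseable prices are treated as not free rather than guessed at.
        return False

for example in ['0', '0.0', '$0.00', '$4.99', 'Everyone']:
    print(repr(example), is_free(example))
```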

# 6. Most Common Apps by Genre

# Part One

Our objective, as stated in the introduction, is to identify app types that are likely to attract a larger user base. Our revenue heavily depends on the number of users who engage with our apps. To minimize risks and overhead, we follow a three-step validation strategy for app ideas:

1. Build a Minimal Android Version: We create a basic Android version of the app and release it on Google Play.
2. User Response Evaluation: If the app receives positive feedback from users, we proceed with further development.
3. Profitability Assessment: After six months, if the app proves profitable, we develop an iOS version and add it to the App Store.

Since our ultimate goal is to launch the app on both Google Play and the App Store, we seek app profiles that succeed in both markets. For instance, a profile that combines productivity features with gamification could be a promising choice.

We start our analysis by identifying the most common genres in both markets. We will build frequency tables for the prime_genre column of the App Store dataset and for the Genres and Category columns of the Google Play dataset.

# Part Two

As a next step we develop two functions that will help in the analysis of frequency tables:

- A function that creates frequency tables expressed as percentages
- A function that sorts these percentages and displays them in descending order

In [52]:
def freq_table(dataset, index):
    table = {}
    total = 0
    for row in dataset:
        total += 1
        value = row[index]
        if value in table:
            table[value] += 1
        else:
            table[value] = 1

    percentaged_table = {}
    for key in table:
        percentage = (table[key] / total)*100
        percentaged_table[key] = percentage
    return percentaged_table

print(freq_table(android_english_free, 1))

{'ART_AND_DESIGN': 0.6442133815551537, 'AUTO_AND_VEHICLES': 0.9267631103074141, 'BEAUTY': 0.599005424954792, 'BOOKS_AND_REFERENCE': 2.1360759493670884, 'BUSINESS': 4.599909584086799, 'COMICS': 0.6103074141048824, 'COMMUNICATION': 3.2323688969258586, 'DATING': 1.8648282097649187, 'EDUCATION': 1.164104882459313, 'ENTERTAINMENT': 0.9606690777576853, 'EVENTS': 0.7120253164556962, 'FINANCE': 3.7070524412296564, 'FOOD_AND_DRINK': 1.2432188065099457, 'HEALTH_AND_FITNESS': 3.0854430379746836, 'HOUSE_AND_HOME': 0.8024412296564195, 'LIBRARIES_AND_DEMO': 0.9380650994575045, 'LIFESTYLE': 3.887884267631103, 'GAME': 9.697106690777577, 'FAMILY': 18.942133815551536, 'MEDICAL': 3.5375226039783, 'SOCIAL': 2.667269439421338, 'SHOPPING': 2.2490958408679926, 'PHOTOGRAPHY': 2.949819168173599, 'SPORTS': 3.390596745027125, 'TRAVEL_AND_LOCAL': 2.3395117540687163, 'TOOLS': 8.453887884267631, 'PERSONALIZATION': 3.322784810126582, 'PRODUCTIVITY': 3.899186256781193, 'PARENTING': 0.6555153707052441, 'WEATHER': 0.79

In [53]:
def display_table(dataset, index):
    table = freq_table(dataset, index)
    table_display = []
    for key in table:
        key_val_as_tuple = (table[key], key)
        table_display.append(key_val_as_tuple)
        
    table_sorted = sorted(table_display,reverse = True )
    for entry in table_sorted:
        print(entry[1],':',entry[0])      
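
For comparison, the same percentage table can be built with `collections.Counter` and sorted with a key function instead of swapped tuples. This is a sketch on a toy list of genres, and `percentage_table` is an illustrative name, not part of the notebook.

```python
from collections import Counter

def percentage_table(values):
    """Map each distinct value to its share of the total, in percent."""
    counts = Counter(values)
    total = sum(counts.values())
    return {value: count / total * 100 for value, count in counts.items()}

# Toy column of genres: 3 of 5 entries are Games.
genres = ['Games', 'Games', 'Games', 'Education', 'Music']
table = percentage_table(genres)
for genre, share in sorted(table.items(), key=lambda item: item[1], reverse=True):
    print(genre, ':', share)
```

To apply it to a dataset, pass one column, for example `percentage_table(row[11] for row in ios_english_free)`.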

# Part Three

We start with analyzing prime_genre in the iOS data set.

In [54]:
display_table(ios_english_free,11)

Games : 58.25788323446769
Entertainment : 7.836403371838902
Photo & Video : 4.995316890415236
Education : 3.6840462066812365
Social Networking : 3.3093974399000934
Shopping : 2.5913206369029034
Utilities : 2.466437714642523
Sports : 2.1542304089915705
Music : 2.0605682172962845
Health & Fitness : 2.0293474867311896
Productivity : 1.7483609116453322
Lifestyle : 1.5610365282547611
News : 1.3424914142990947
Travel : 1.248829222603809
Finance : 1.0927255697783327
Weather : 0.8741804558226661
Food & Drink : 0.8117389946924758
Reference : 0.5307524196066188
Business : 0.5307524196066188
Book : 0.3746487667811427
Navigation : 0.18732438339057134
Medical : 0.18732438339057134
Catalogs : 0.1248829222603809


The prime_genre distribution in the iOS dataset is dominated by Games, which make up 58.26% of the free English apps. Entertainment apps follow at 7.84% and Photo & Video apps at 5.00%, so leisure and media apps account for the bulk of the catalogue. Education and Social Networking apps are also visible, at 3.68% and 3.31% respectively.

Shopping, Utilities, and Sports each account for between 2% and 3% of the apps. Music, Health & Fitness, and Productivity are close behind, at roughly 1.7% to 2.1% each.

Lifestyle, News, Travel, and Finance each represent roughly 1% to 1.6%. The least represented genres are Weather, Food & Drink, Reference, Business, Book, Navigation, Medical, and Catalogs, each comprising less than 1% of the dataset.

In summary, the free English segment of the App Store is heavily skewed towards entertainment: games and media-related apps make up most of the offering, while practical genres are comparatively rare. Note that these figures describe how many apps exist in each genre, not how many users those apps have.

Let's continue with analyzing Category and Genres in the Android data set.

In [101]:
print('Android Category')
display_table(android_english_free, 1)
print('---------------')
print('Android Genres')
display_table(android_english_free, 9)

Android Category
FAMILY : 18.942133815551536
GAME : 9.697106690777577
TOOLS : 8.453887884267631
BUSINESS : 4.599909584086799
PRODUCTIVITY : 3.899186256781193
LIFESTYLE : 3.887884267631103
FINANCE : 3.7070524412296564
MEDICAL : 3.5375226039783
SPORTS : 3.390596745027125
PERSONALIZATION : 3.322784810126582
COMMUNICATION : 3.2323688969258586
HEALTH_AND_FITNESS : 3.0854430379746836
PHOTOGRAPHY : 2.949819168173599
NEWS_AND_MAGAZINES : 2.802893309222423
SOCIAL : 2.667269439421338
TRAVEL_AND_LOCAL : 2.3395117540687163
SHOPPING : 2.2490958408679926
BOOKS_AND_REFERENCE : 2.1360759493670884
DATING : 1.8648282097649187
VIDEO_PLAYERS : 1.7970162748643763
MAPS_AND_NAVIGATION : 1.3901446654611211
FOOD_AND_DRINK : 1.2432188065099457
EDUCATION : 1.164104882459313
ENTERTAINMENT : 0.9606690777576853
LIBRARIES_AND_DEMO : 0.9380650994575045
AUTO_AND_VEHICLES : 0.9267631103074141
HOUSE_AND_HOME : 0.8024412296564195
WEATHER : 0.7911392405063291
EVENTS : 0.7120253164556962
PARENTING : 0.6555153707052441
ART_AND_DESIGN : 0.6442133815551537
COMICS : 0.6103074141048824
BEAUTY : 0.599005424954792

In the Android dataset, the Category distribution is led by family apps at 18.94%. Games follow as the second most common category at 9.70%, and tools are not far behind at 8.45%. Business apps also hold a notable share at 4.60%.

Productivity and lifestyle apps are about equally common, each holding around 3.9%. Finance, medical, and sports are well represented too, at between 3.4% and 3.7% each. Personalization and communication apps follow at roughly 3.2% to 3.3%.

Health and fitness and photography apps each make up around 3% of the apps, while news and magazines and social apps are slightly less common at about 2.7% to 2.8%. Travel and local and shopping apps each account for just over 2%.

The Genres column is more granular. Tools is the most prevalent genre at 8.45%, followed by Entertainment and Education at 6.07% and 5.35% respectively. Other genres such as business, productivity, lifestyle, finance, medical, and sports range from 3% to 4.5% each.

In summary, the Android app market is diverse, with a strong focus on family-oriented and game apps, followed by practical tools and business applications. There is also a wide range of other genres that cater to various interests and needs, from personalization to health and fitness, indicating a well-rounded selection for users.

# Most Popular Apps by Genre on the App Store

To determine the most popular genres (those with the highest user base), we can compute the mean number of installations per genre. This data is readily available in the Installs column of the Google Play data set, but not in the App Store data set. We will use the total user ratings count as an alternative measure, which is accessible in the rating_count_tot field.

In [57]:
genres_ios = freq_table(ios_english_free, 11)

In [68]:
for genre in genres_ios:
    total = 0
    len_genre = 0
    for app in ios_english_free:
        genre_app = app[11]
        if genre_app == genre:
            ratings = float(app[5])
            total += ratings
            len_genre += 1
    avg_number = total/len_genre
    print(genre, ':', avg_number)

Social Networking : 71548.34905660378
Photo & Video : 28441.54375
Games : 22886.36709539121
Music : 57326.530303030304
Reference : 79350.4705882353
Health & Fitness : 23298.015384615384
Weather : 52279.892857142855
Utilities : 19156.493670886077
Travel : 28243.8
Shopping : 27230.734939759037
News : 21248.023255813954
Navigation : 86090.33333333333
Lifestyle : 16815.48
Entertainment : 14195.358565737051
Food & Drink : 33333.92307692308
Sports : 23008.898550724636
Book : 46384.916666666664
Finance : 32367.02857142857
Education : 7003.983050847458
Productivity : 21028.410714285714
Business : 7491.117647058823
Catalogs : 4004.0
Medical : 612.0


On the App Store, Navigation apps have the highest average number of user ratings at 86,090.33, followed by Reference and Social Networking apps with 79,350.47 and 71,548.35 respectively. Music and Weather apps also perform well, averaging 57,326.53 and 52,279.89. The Games genre, despite being the most common, averages 22,886.37, while Photo & Video, Health & Fitness, and Travel sit in a similar range of roughly 23,000 to 28,500. Food & Drink and Finance apps average above 30,000, in contrast with Education and Business apps, which are among the lowest at 7,003.98 and 7,491.12. The Medical genre has the lowest average, at just 612 ratings per app. Because the rating count serves here as a proxy for the number of users, these averages reflect audience size rather than user satisfaction.
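
The per-genre averages above are computed with a nested loop that rescans the whole dataset once per genre. As a possible alternative, a single pass with running totals gives the same means. The sketch below uses toy rows, and `average_by_group` is a hypothetical helper, not code from the notebook.

```python
from collections import defaultdict

def average_by_group(rows, group_index, value_index):
    """Return {group: mean of the numeric column}, computed in one pass over rows."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for row in rows:
        totals[row[group_index]] += float(row[value_index])
        counts[row[group_index]] += 1
    return {group: totals[group] / counts[group] for group in totals}

# Toy rows (name, genre, rating count); the values are made up for illustration.
toy = [['A', 'Games', '100'], ['B', 'Games', '300'], ['C', 'Music', '50']]
print(average_by_group(toy, 1, 2))
```

For the iOS data the call would be `average_by_group(ios_english_free, 11, 5)`.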

In [103]:
for row in ios_english_free:
    if row[11] == 'Navigation':
        print(row[1],':', row[5])

Waze - GPS Navigation, Maps & Real-time Traffic : 345046
Google Maps - Navigation & Transit : 154911
Geocaching® : 12811
CoPilot GPS – Car Navigation & Offline Maps : 3582
ImmobilienScout24: Real Estate Search in Germany : 187
Railway Route Search : 5


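However, a closer look at Navigation shows that its high average number of ratings is driven mainly by the Waze app (345,046 ratings), with Google Maps (154,911) a distant second. The remaining four apps have fewer than 13,000 ratings each.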
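
One way to quantify this skew is to compare the mean with the median, which is far less sensitive to a few very large values. The sketch below uses the six rating counts printed above; choosing the median as the robust measure is a suggestion, not part of the original analysis.

```python
from statistics import mean, median

# Rating counts of the six free English Navigation apps printed above.
navigation_ratings = [345046, 154911, 12811, 3582, 187, 5]

print('mean  :', mean(navigation_ratings))
print('median:', median(navigation_ratings))
```

The mean reproduces the 86,090.33 reported for Navigation, while the median is 8,196.5, roughly a tenth of that.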

In [105]:
for row in ios_english_free:
    if row[11] == 'Social Networking':
        print(row[1], ':', row[5])

Facebook : 2974676
Pinterest : 1061624
Skype for iPhone : 373519
Messenger : 351466
Tumblr : 334293
WhatsApp Messenger : 287589
Kik : 260965
ooVoo – Free Video Call, Text and Voice : 177501
TextNow - Unlimited Text + Calls : 164963
Viber Messenger – Text & Call : 164249
Followers - Social Analytics For Instagram : 112778
MeetMe - Chat and Meet New People : 97072
We Heart It - Fashion, wallpapers, quotes, tattoos : 90414
InsTrack for Instagram - Analytics Plus More : 85535
Tango - Free Video Call, Voice and Chat : 75412
LinkedIn : 71856
Match™ - #1 Dating App. : 60659
Skype for iPad : 60163
POF - Best Dating App for Conversations : 52642
Timehop : 49510
Find My Family, Friends & iPhone - Life360 Locator : 43877
Whisper - Share, Express, Meet : 39819
Hangouts : 36404
LINE PLAY - Your Avatar World : 34677
WeChat : 34584
Badoo - Meet New People, Chat, Socialize. : 34428
Followers + for Instagram - Follower Analytics : 28633
GroupMe : 28260
Marco Polo Video Walkie Talkie : 27662
Miitomo : 2

Facebook leads with a staggering 2,974,676 ratings, indicating its vast popularity. Pinterest and Skype for iPhone follow with 1,061,624 and 373,519 ratings respectively, showcasing their significant but lesser user engagement compared to Facebook. The data reflects a wide disparity in user interaction across different social platforms.

In [107]:
for row in ios_english_free:
    if row[11] == 'Reference':
        print(row[1],':', row[5])

Bible : 985920
Dictionary.com Dictionary & Thesaurus : 200047
Dictionary.com Dictionary & Thesaurus for iPad : 54175
Google Translate : 26786
Muslim Pro: Ramadan 2017 Prayer Times, Azan, Quran : 18418
New Furniture Mods - Pocket Wiki & Game Tools for Minecraft PC Edition : 17588
Merriam-Webster Dictionary : 16849
Night Sky : 12122
City Maps for Minecraft PE - The Best Maps for Minecraft Pocket Edition (MCPE) : 8535
LUCKY BLOCK MOD ™ for Minecraft PC Edition - The Best Pocket Wiki & Mods Installer Tools : 4693
GUNS MODS for Minecraft PC Edition - Mods Tools : 1497
Guides for Pokémon GO - Pokemon GO News and Cheats : 826
WWDC : 762
Horror Maps for Minecraft PE - Download The Scariest Maps for Minecraft Pocket Edition (MCPE) Free : 718
VPN Express : 14
Real Bike Traffic Rider Virtual Reality Glasses : 8
Jishokun-Japanese English Dictionary & Translator : 0


Among the Reference apps, the Bible app stands out with 985,920 ratings, reflecting its widespread use and engagement. Educational and utility apps like Dictionary.com and Google Translate also show strong user interaction, with 200,047 and 26,786 ratings respectively.

# Most Popular Apps by Genre on Google Play

The install counts in the Android dataset are not exact. They are grouped into buckets and stored as strings such as '1,000+' and '10,000+', so we strip the commas and plus signs and treat each bucket's lower bound as the app's install count.
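
The string clean-up can be isolated in a small helper and checked on its own. This is a minimal sketch, and `parse_installs` is an illustrative name, not a function from the notebook.

```python
def parse_installs(bucket):
    """Convert an install bucket such as '1,000,000+' to its lower bound as an int."""
    return int(bucket.replace(',', '').replace('+', ''))

for bucket in ['1,000+', '10,000+', '1,000,000,000+']:
    print(bucket, '->', parse_installs(bucket))
```

Because every value is a lower bound, averages built on these numbers understate the true install counts.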

In [69]:
genres_android = freq_table(android_english_free, 1)

In [83]:
for genre in genres_android:
    total = 0
    len_genre = 0
    for app in android_english_free:
        genre_app = app[1]
        if genre_app == genre:
            n_installs = app[5]
            n_installs = n_installs.replace('+', '')
            n_installs = n_installs.replace(',', '')
            total += float(n_installs)
            len_genre += 1
    avg_number = total/len_genre
    print(genre, ':', avg_number)

ART_AND_DESIGN : 1986335.0877192982
AUTO_AND_VEHICLES : 647317.8170731707
BEAUTY : 513151.88679245283
BOOKS_AND_REFERENCE : 8814199.78835979
BUSINESS : 1712290.1474201474
COMICS : 832613.8888888889
COMMUNICATION : 38590581.08741259
DATING : 854028.8303030303
EDUCATION : 1833495.145631068
ENTERTAINMENT : 11640705.88235294
EVENTS : 253542.22222222222
FINANCE : 1387692.475609756
FOOD_AND_DRINK : 1924897.7363636363
HEALTH_AND_FITNESS : 4188821.9853479853
HOUSE_AND_HOME : 1360598.042253521
LIBRARIES_AND_DEMO : 638503.734939759
LIFESTYLE : 1446158.2238372094
GAME : 15544014.51048951
FAMILY : 3695641.8198090694
MEDICAL : 120550.61980830671
SOCIAL : 23253652.127118643
SHOPPING : 7036877.311557789
PHOTOGRAPHY : 17840110.40229885
SPORTS : 3650602.276666667
TRAVEL_AND_LOCAL : 13984077.710144928
TOOLS : 10830251.970588235
PERSONALIZATION : 5201482.6122448975
PRODUCTIVITY : 16787331.344927534
PARENTING : 542603.6206896552
WEATHER : 5145550.285714285
VIDEO_PLAYERS : 24727872.452830188
NEWS_AND_MAGAZ

Average installs vary widely across categories. Communication apps lead with about 38.6 million installs on average, followed by video players (about 24.7 million) and social apps (about 23.3 million), indicating a strong preference for connectivity and media consumption. Photography, productivity, and game apps also average more than 15 million installs each, reflecting their utility and entertainment value. In contrast, medical, events, and beauty apps have the lowest averages, all under 600,000, suggesting more niche markets. Overall, the data points to strong demand for apps that facilitate social interaction, information access, and personal entertainment.
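
The printed averages are easier to compare once they are sorted. The sketch below ranks a subset of the values copied from the output above. The dictionary name `avg_installs` is an assumption for illustration; in the notebook these values are only printed, not stored.

```python
# A subset of the averages printed above, copied as-is for illustration
avg_installs = {
    'COMMUNICATION': 38590581.08741259,
    'VIDEO_PLAYERS': 24727872.452830188,
    'SOCIAL': 23253652.127118643,
    'PHOTOGRAPHY': 17840110.40229885,
    'PRODUCTIVITY': 16787331.344927534,
    'GAME': 15544014.51048951,
    'BEAUTY': 513151.88679245283,
    'EVENTS': 253542.22222222222,
    'MEDICAL': 120550.61980830671,
}

# Sort (genre, average) pairs from highest to lowest average
ranked = sorted(avg_installs.items(), key=lambda item: item[1], reverse=True)

for genre, avg in ranked:
    print(f'{genre:<15} {avg:>15,.0f}')
```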

To dig deeper, we look at the individual apps in the most popular categories.

In [109]:
for app in android_english_free:
    category_app = app[1]
    n_installs = app[5]
    app_name = app[0]
    # Keep only apps in the three largest install buckets
    if category_app == 'COMMUNICATION' and n_installs in ('1,000,000,000+',
                                                          '500,000,000+',
                                                          '100,000,000+'):
        print(app_name, ':', n_installs)

WhatsApp Messenger : 1,000,000,000+
imo beta free calls and text : 100,000,000+
Android Messages : 100,000,000+
Google Duo - High Quality Video Calls : 500,000,000+
Messenger – Text and Video Chat for Free : 1,000,000,000+
imo free video calls and chat : 500,000,000+
Skype - free IM & video calls : 1,000,000,000+
Who : 100,000,000+
GO SMS Pro - Messenger, Free Themes, Emoji : 100,000,000+
LINE: Free Calls & Messages : 500,000,000+
Google Chrome: Fast & Secure : 1,000,000,000+
Firefox Browser fast & private : 100,000,000+
UC Browser - Fast Download Private & Secure : 500,000,000+
Gmail : 1,000,000,000+
Hangouts : 1,000,000,000+
Messenger Lite: Free Calls & Messages : 100,000,000+
Kik : 100,000,000+
KakaoTalk: Free Calls & Text : 100,000,000+
Opera Mini - fast web browser : 100,000,000+
Opera Browser: Fast and Secure : 100,000,000+
Telegram : 100,000,000+
Truecaller: Caller ID, SMS spam blocking & Dialer : 100,000,000+
UC Browser Mini -Tiny Fast Private & Secure : 100,000,000+
Viber Mess

Communication apps like WhatsApp, Messenger, and Gmail dominate the Android app market with over a billion installs each.

In [110]:
for app in android_english_free:
    category_app = app[1]
    n_installs = app[5]
    app_name = app[0]
    # Keep only apps in the three largest install buckets
    if category_app == 'VIDEO_PLAYERS' and n_installs in ('1,000,000,000+',
                                                          '500,000,000+',
                                                          '100,000,000+'):
        print(app_name, ':', n_installs)

YouTube : 1,000,000,000+
Motorola Gallery : 100,000,000+
VLC for Android : 100,000,000+
Google Play Movies & TV : 1,000,000,000+
MX Player : 500,000,000+
Dubsmash : 100,000,000+
VivaVideo - Video Editor & Photo Movie : 100,000,000+
VideoShow-Video Editor, Video Maker, Beauty Camera : 100,000,000+
Motorola FM Radio : 100,000,000+


YouTube and Google Play Movies & TV lead the video players category with over a billion installs each, underscoring their dominance in video streaming and content sharing.

In [111]:
for app in android_english_free:
    category_app = app[1]
    n_installs = app[5]
    app_name = app[0]
    # Keep only apps in the three largest install buckets
    if category_app == 'SOCIAL' and n_installs in ('1,000,000,000+',
                                                   '500,000,000+',
                                                   '100,000,000+'):
        print(app_name, ':', n_installs)

Facebook : 1,000,000,000+
Facebook Lite : 500,000,000+
Tumblr : 100,000,000+
Pinterest : 100,000,000+
Google+ : 1,000,000,000+
Badoo - Free Chat & Dating App : 100,000,000+
Tango - Live Video Broadcast : 100,000,000+
Instagram : 1,000,000,000+
Snapchat : 500,000,000+
LinkedIn : 100,000,000+
Tik Tok - including musical.ly : 100,000,000+
BIGO LIVE - Live Stream : 100,000,000+
VK : 100,000,000+


Facebook, Instagram, and Google+ clearly lead the Social category with over a billion installs each.
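
The category listings above repeat the same loop with a different category name each time. One way to avoid the repetition is a small helper, sketched below. The function name `apps_in_category` and the rows in `sample_rows` are assumptions for illustration. The rows only mimic the column layout used in this notebook (name at index 0, category at index 1, installs at index 5) and are not the real data set.

```python
def apps_in_category(dataset, category, min_installs):
    """Return (name, installs) pairs for apps in `category` with at least `min_installs`."""
    matches = []
    for app in dataset:
        # Convert a bucket such as '100,000,000+' to its lower bound
        installs = int(app[5].replace('+', '').replace(',', ''))
        if app[1] == category and installs >= min_installs:
            matches.append((app[0], installs))
    return matches


# Placeholder rows: only columns 0 (name), 1 (category) and 5 (installs) are filled in.
sample_rows = [
    ['Facebook', 'SOCIAL', '', '', '', '1,000,000,000+'],
    ['Snapchat', 'SOCIAL', '', '', '', '500,000,000+'],
    ['YouTube', 'VIDEO_PLAYERS', '', '', '', '1,000,000,000+'],
    ['Hypothetical Small App', 'SOCIAL', '', '', '', '10,000+'],
]

for name, installs in apps_in_category(sample_rows, 'SOCIAL', 100_000_000):
    print(name, ':', f'{installs:,}+')
```

Comparing numeric lower bounds rather than exact strings also means the threshold can be changed without listing every bucket by hand.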

In [112]:
for app in android_english_free:
    category_app = app[1]
    n_installs = float(app[5].replace("+",'').replace(',',''))
    app_name = app[0]
    if category_app == 'PRODUCTIVITY' and n_installs > 10000000:
        print(app_name,':',n_installs)


Microsoft Word : 500000000.0
Microsoft Outlook : 100000000.0
Microsoft OneDrive : 100000000.0
Calculator - unit converter : 50000000.0
Microsoft OneNote : 100000000.0
Google Keep : 100000000.0
ES File Explorer File Manager : 100000000.0
Dropbox : 500000000.0
Google Docs : 100000000.0
Advanced Task Killer : 50000000.0
Microsoft PowerPoint : 100000000.0
My Airtel-Online Recharge, Pay Bill, Wallet, UPI : 50000000.0
Samsung Notes : 100000000.0
Do It Later: Tasks & To-Dos : 50000000.0
Verizon Cloud : 50000000.0
myAT&T : 50000000.0
SwiftKey Keyboard : 100000000.0
MEGA : 50000000.0
File Browser by Astro (File Manager) : 50000000.0
Google Drive : 1000000000.0
Adobe Acrobat Reader : 100000000.0
Google Sheets : 100000000.0
Microsoft Excel : 100000000.0
WPS Office - Word, Docs, PDF, Note, Slide & Sheet : 100000000.0
Google Slides : 100000000.0
ColorNote Notepad Notes : 100000000.0
Evernote – Organizer, Planner for Notes & Memos : 100000000.0
Google Calendar : 500000000.0
Cloud Print : 500000000.0

In [113]:
for app in android_english_free:
    category_app = app[1]
    n_installs = float(app[5].replace("+",'').replace(',',''))
    app_name = app[0]
    if category_app == 'PRODUCTIVITY' and (n_installs < 10000000
                                           and n_installs > 1000000):
        print(app_name,':',n_installs)

My Claro Peru : 5000000.0
Archos File Manager : 5000000.0
lifebox : 5000000.0
Yandex.Disk : 5000000.0
Content Transfer : 5000000.0
Microsoft Translator : 5000000.0
Trello : 5000000.0
The Maner : 5000000.0
Note Everything : 5000000.0
Keep My Notes - Notepad & Memo : 5000000.0
Ultimate Notepad : 5000000.0
Calendar Widget Month + Agenda : 5000000.0
DigiCal Calendar Agenda : 5000000.0
Business Calendar 2 : 5000000.0
G Cloud Backup : 5000000.0
MediaFire : 5000000.0
EBookDroid - PDF & DJVU Reader : 5000000.0
PDF Viewer & Book Reader : 5000000.0
To Do List : 5000000.0
Clean My Android : 5000000.0
Ginger Keyboard - Emoji, GIFs, Themes & Games : 5000000.0


The productivity category is also popular on the iOS App Store. Within it, calendars, note-taking apps, and to-do lists stand out as areas worth exploring for app development: they appear repeatedly in the 5,000,000+ install bucket, which suggests they are popular but not yet oversaturated.
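
A rough way to check how often these themes occur is to count keyword matches in app names. The sketch below does this for a handful of names copied from the 5,000,000+ listing above. The `keywords` mapping is an assumption for illustration, and simple substring matching will miss apps whose names do not mention their function.

```python
# A few app names copied from the 5,000,000+ productivity listing above
names = [
    'Trello',
    'Note Everything',
    'Keep My Notes - Notepad & Memo',
    'Ultimate Notepad',
    'Calendar Widget Month + Agenda',
    'DigiCal Calendar Agenda',
    'Business Calendar 2',
    'To Do List',
    'MediaFire',
]

# Hypothetical theme -> substrings to look for (matched case-insensitively)
keywords = {
    'calendar': ['calendar'],
    'notes': ['note'],
    'to-do': ['to do', 'to-do'],
}

counts = {}
for theme, needles in keywords.items():
    # Count names containing at least one of the theme's substrings
    counts[theme] = sum(
        any(needle in name.lower() for needle in needles) for name in names
    )

print(counts)
```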

# Summary

When examining app preferences across the Android and iOS platforms, we observe both distinct differences and notable similarities. On Android, family-oriented apps lead with 18.91% of apps, followed by games and tools. In contrast, the iOS market is heavily skewed towards games, which account for 58.16% of apps, with entertainment and photo & video apps also prominent.

Both platforms demonstrate a clear interest in entertainment-related apps. Games are the most popular category on iOS and the second most popular on Android. Photo & video apps on iOS and tools on Android indicate that users on both platforms value apps that provide leisure and enhance personal efficiency.

The major difference lies in the degree of concentration: the iOS market is concentrated around games and entertainment, whereas the Android market is more evenly distributed across categories, including family, tools, and business apps. This suggests that Android users have a broader range of interests when it comes to app usage, while iOS users are more focused on entertainment and media creation.

In summary, while there are differences in the concentration of app categories between Android and iOS, with Android offering a more diverse range and iOS focusing on entertainment, both platforms cater to users’ desires for entertainment and productivity.