# Analysis of free applications from 2018 in the App Store and Google Play to build a profitable, ad-supported free application
### Alejandro Hernández Martín 【龙乐杰】.

The project’s primary aim is to identify mobile app profiles with high profit potential in the App Store and Google Play markets. Operating as data analysts within a company specializing in Android and iOS development, our task is to provide insight that enables developers to make decisions based on quantifiable patterns, rather than assumption.

The company's development model focuses exclusively on applications that are free to download and install. Revenue is primarily generated through in-app advertisements. As a result, profitability is closely tied to user volume. Therefore, our analytical objective is to examine existing data to determine which categories or features are more likely to attract and retain users.

## Introduction

 As of September 2018, the App Store hosted approximately 2 million iOS applications, while Google Play offered around 2.1 million Android applications. The acquisition and processing of data from over 4 million apps represents a substantial investment of time and financial resources. Therefore, a sample-based analysis is both pragmatic and efficient. To minimize resource expenditure, our initial step will involve identifying pre-existing, publicly available datasets.

Two such datasets have been located: one for Android, containing data on ~10,000 apps from Google Play, collected in August 2018; and one for iOS, with ~7,000 App Store entries. These datasets will serve as the foundation for developing a methodological framework aimed at identifying viable market opportunities—specifically, for ad-supported apps with high profit potential.

Publicly available datasets from Kaggle were used for both Android and iOS platforms:

L. Gupta, Google Play Store Apps, Feb 2019. [Online]. Available: https://www.kaggle.com/lava18/google-play-store-apps

Ramanathan Perumal, Mobile App Store (7200 apps), Jul 2017. [Online]. Available: https://www.kaggle.com/ramamet4/app-store-apple-data-set-10k-apps/data

----------------------------------------------------------------------------------
# 2018年 App Store 和 Android 的免费应用分析。

这个项目的主要目标是找出在 App Store 和 Google Play 市场中有较高盈利可能的手机应用类型。我们在一家公司工作，这家公司专门开发 Android 和 iOS 应用，我们的工作是做数据分析，帮助开发者根据可以计算的数据模式来做决定，而不是只依靠猜测。

公司的开发模式只关注可以免费下载和安装的应用。收入主要来自应用中的广告。因此，盈利情况和用户数量有很大的关系。我们的分析目标是检查已有的数据，看看哪些类型或者功能更容易吸引用户并让他们继续使用。

# 序言：

   截至2018年9月，App Store上大约有200万个iOS应用，而Google Play上有大约210万个Android应用。收集并处理超过400万个应用的数据需要花费大量的时间和金钱。因此，使用样本进行分析是比较实际也更高效的方法。为了节省资源，我们的第一步是寻找已经公开的可用数据。

我们已经找到了两个这样的数据集：一个是关于Android的，包含大约一万个来自Google Play的应用数据，收集时间是2018年8月；另一个是关于iOS的，包含大约七千个App Store上的应用。这些数据将作为我们建立分析方法的基础，目标是发现市场中有可能赚钱的机会，特别是那些通过广告获得高收入的应用。

# Explore Data Function

To streamline inspection of both datasets, we shall define a reusable function 'explore_data()' to display rows in a format optimized for readability. An optional parameter will be included to reveal the total number of rows and columns, enhancing structural comprehension of any dataset processed.

为了更方便地查看这两个数据集，我们先写一个叫 explore_data() 的函数，这样可以反复使用，更清楚地看每一行的内容。这个函数还有一个选项，可以显示数据集的行数和列数，帮助我们更好地了解数据的结构。

In [3]:
# The function 'explore_data' enables controlled visualization of dataset content. 
# 函数 'explore_data' 允许对数据集内容进行有控制的可视化。

def explore_data(dataset, start, end, rows_and_columns=False):
    dataset_slice = dataset[start:end]
    for row in dataset_slice:
        print(row)
        print('\\n')
        
    if rows_and_columns:
        print('Number of rows:', len(dataset))
        print('Number of columns:', len(dataset[0]))

In [4]:
from csv import reader

# Open Android dataset. | 打开 Android 数据集。
opened_file = open('googleplaystore.csv', encoding = 'utf8')
read_file = reader(opened_file)
android = list(read_file) # Create a list from the dataset. | 从数据集中创建一个列表。 
android_header = android[0] # Slice the header from the list. | 从列表中切下表头。
android = android[1:] # Slice the dataset without the header. | 切下不包含表头的数据集。

# Open Apple Store dataset. | 打开 Apple Store 数据集。
opened_file = open('AppleStore.csv', encoding = 'utf8')
read_file = reader(opened_file)
ios = list(read_file) # Create a list from the dataset. | 从数据集中创建一个列表。
ios_header = ios[0] # Slice the header from the list. | 从列表中切下表头。
ios = ios[1:] # Slice the dataset without the header. | 切下不包含表头的数据集。

print(android_header) 
explore_data(android, 10472, 10473)

print(ios_header)

['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']
['Life Made WI-Fi Touchscreen Photo Frame', '1.9', '19', '3.0M', '1,000+', 'Free', '0', 'Everyone', '', 'February 11, 2018', '1.0.19', '4.0 and up']
\n
['id', 'track_name', 'size_bytes', 'currency', 'price', 'rating_count_tot', 'rating_count_ver', 'user_rating', 'user_rating_ver', 'ver', 'cont_rating', 'prime_genre', 'sup_devices.num', 'ipadSc_urls.num', 'lang.num', 'vpp_lic']
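
The loading cell above opens each file and never closes it. A minimal sketch of the same step with a context manager follows; `load_csv` is a hypothetical helper name, not part of the original notebook, and it assumes the first row of the file is the header.

```python
from csv import reader


def load_csv(path, encoding='utf8'):
    """Return (header, rows) for a CSV file. Hypothetical helper for illustration."""
    # The 'with' block closes the file even if reading raises an error.
    with open(path, encoding=encoding, newline='') as opened_file:
        rows = list(reader(opened_file))
    # Assumption: the first row is the header, as in both Kaggle files.
    return rows[0], rows[1:]


# Usage in the notebook would look like this (file names taken from the cell above):
# android_header, android = load_csv('googleplaystore.csv')
# ios_header, ios = load_csv('AppleStore.csv')
```

The result is the same pair of lists the notebook builds by hand, so the rest of the analysis would not need to change.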


# Deleting Wrong Data. | 删除错误的数据。

The Kaggle page for the Google Play dataset has a discussion section that reports an error in row 10472 (counting from 0 and excluding the header). This row belongs to the app 'Life Made WI-Fi Touchscreen Photo Frame'. As the output above shows, the row has 12 values instead of 13: the 'Category' value is missing, so every later value is shifted one column to the left and the 'Rating' column holds 19. Since the maximum rating for a Google Play app is 5, the value is clearly wrong, and we remove the row to keep the data consistent.

在 Google Play 的数据集中，有一个讨论区，其中一条讨论指出了第 10472 行的数据有问题。这一行对应的应用是'Life Made WI-Fi Touchscreen Photo Frame'，它的评分是 19。因为 Google Play 应用的最高评分是 5，所以这个评分明显不正确。为了保证数据的正确性，我们必须删除这一行

In [6]:
# Delete wrong data. | 删除错误的数据。

print(len(android))
del android[10472]  # Positively do not run this more than once. | 千万不要运行超过一次。
print(len(android)) # Must be 10840. | 结果必须是10840。  

10841
10840
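
Deleting by index is only safe when the cell runs exactly once. As a hedged alternative, the sketch below removes rows by a property instead of a position: it keeps only rows that have one value per header column, which is what distinguishes the faulty row printed earlier (12 values against 13 columns). `drop_malformed_rows` is a hypothetical name, and the sample rows are invented for illustration. Note the assumption that every row with the wrong length should be dropped, which may be stricter than the notebook intends.

```python
def drop_malformed_rows(rows, header):
    """Return only the rows that have exactly one value per header column."""
    expected = len(header)
    return [row for row in rows if len(row) == expected]


# Invented sample data: the second row is missing one column.
header = ['App', 'Category', 'Rating']
rows = [
    ['Good App', 'TOOLS', '4.5'],
    ['Shifted App', '1.9'],
]

clean = drop_malformed_rows(rows, header)
print(len(rows), '->', len(clean))
```

Because the filter is idempotent, running it a second time on `clean` changes nothing.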


# Removing duplicate datapoints | 删除重复的数据

The Google Play dataset contains duplicate entries: 1,181 rows repeat an app name that has already appeared. We keep only one record per app. The review counts differ between copies of the same app, which suggests the copies were collected at different times. By keeping the copy with the highest number of reviews, we most likely keep the most recent one and discard the rest as outdated duplicates.

为了保证数据的准确性，我们要删除重复的记录，只保留每个应用的一条数据。评论数量的不同说明这些数据是在不同时间收集的。通过选择评论数最多的记录，我们可以保留最新的数据，删除其他重复的条目。

In [8]:
# Separates replicated data from unique data. | 把重复的数据和唯一的数据分开 

unique_data = []
duplicate_data = []

for app in android:
    name = app[0]
    if name in unique_data:
        duplicate_data.append(name)
    else:
        unique_data.append(name)

print('Duplicate elements: ', len(duplicate_data))
print('Examples of duplicate data found: \n\n', duplicate_data[:15])

Duplicate elements:  1181
Examples of duplicate data found: 

 ['Quick PDF Scanner + OCR FREE', 'Box', 'Google My Business', 'ZOOM Cloud Meetings', 'join.me - Simple Meetings', 'Box', 'Zenefits', 'Google Ads', 'Google My Business', 'Slack', 'FreshBooks Classic', 'Insightly CRM', 'QuickBooks Accounting: Invoicing & Expenses', 'HipChat - Chat Built for Teams', 'Xero Accounting Software']


In [9]:
# Map each app name to its highest review count (dictionary). | 用字典记录每个应用名称的最高评论数。

reviews_max = {}

for i in android:
    name = i[0]
    n_reviews = float(i[3])

    if name in reviews_max and reviews_max[name] < n_reviews:
        reviews_max[name] = n_reviews
    elif name not in reviews_max:
        reviews_max[name] = n_reviews

print(len(reviews_max))

9659


In [10]:
# Build a list with one row per app (duplicates removed). | 把所有不重复的数据放进一个列表中。

android_clean = []
already_added = []

for app in android:
    name = app[0]
    n_reviews = float(app[3]) # Column index 3 holds the review count. | 索引 3 的列是评论数。
    
    if (reviews_max[name] == n_reviews) and (name not in already_added):
        android_clean.append(app)
        already_added.append(name)

explore_data(android_clean, 0, 3, True)

['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']
\n
['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']
\n
['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 and up']
\n
Number of rows: 9659
Number of columns: 13
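
The two cells above scan the dataset twice and look names up in a list, which gets slow as the data grows. The sketch below does the same selection in one pass with a dictionary. `keep_most_reviewed` is a hypothetical name, and the default indexes assume the Google Play layout (name at 0, reviews at 3). It picks the same row per app as the notebook, although the order of the resulting list can differ.

```python
def keep_most_reviewed(rows, name_index=0, reviews_index=3):
    """Keep one row per app name: the row with the highest review count."""
    best = {}
    for row in rows:
        name = row[name_index]
        n_reviews = float(row[reviews_index])
        # Strict '>' keeps the first row seen when review counts tie.
        if name not in best or n_reviews > float(best[name][reviews_index]):
            best[name] = row
    return list(best.values())


# Invented sample rows in the Google Play column order (truncated to 4 columns).
sample = [
    ['Box', 'BUSINESS', '4.2', '100'],
    ['Slack', 'BUSINESS', '4.4', '50'],
    ['Box', 'BUSINESS', '4.2', '159'],
    ['Box', 'BUSINESS', '4.3', '159'],
]
deduplicated = keep_most_reviewed(sample)
print(len(deduplicated))
```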


# Removing Non-English Entries | 删除非英文应用

Because the project targets the English-speaking market, we remove apps whose names are not in English. The rule flags a name as non-English when it contains more than three characters outside the ASCII range (code points above 127). This heuristic is not perfect, but it works well here: it filters out most non-English entries while keeping English titles that include a few special characters such as '™'.

因为这个项目的目标是面向讲英语的市场，所以我们需要删除名字不是英文的应用。我们使用的规则是：如果一个应用的名字中有三个以上不是英文的字母，就认为它不是英文。这个方法很有效，它可以帮助我们删除真正的非英文应用，但不会误删一些包含特殊符号但仍是英文的应用名字。

In [12]:
# This function checks whether the app name contains non-English (non-ASCII) characters.
# 函数检查应用的名字里有没有非英语的字符。
def is_english(string):
    non_ascii = 0

    for character in string:
        if ord(character) > 127:
            non_ascii += 1 # Count the number of non english characters detected.  | 统计检测到的非英语字符的数量。
            
    if non_ascii > 3: # If there's more than 3 non english characters return False. | 如果有超过三个非英语字符，就返回 False。
        return False
    else:
        return True

print(is_english('Docs To Go™ Free Office Suite'))
print(is_english('爱奇艺PPS -《欢乐颂2》电视剧热播'))

True
False
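
To see why the rule tolerates up to three non-ASCII characters, the sketch below makes the threshold a parameter. `is_english_name` and `max_non_ascii` are hypothetical names. With a threshold of 0, any title with a trademark sign or an emoji would be rejected, which is why the notebook allows a small margin.

```python
def is_english_name(name, max_non_ascii=3):
    """Return True when the name has at most 'max_non_ascii' characters above code point 127."""
    non_ascii = sum(1 for character in name if ord(character) > 127)
    return non_ascii <= max_non_ascii


title_with_emoji = 'Instachat \U0001F61C'  # invented example: one emoji, one non-ASCII character

print(is_english_name('Docs To Go™ Free Office Suite'))       # one non-ASCII character
print(is_english_name(title_with_emoji))                       # still accepted with the default margin
print(is_english_name(title_with_emoji, max_non_ascii=0))      # rejected when no margin is allowed
print(is_english_name('爱奇艺PPS -《欢乐颂2》电视剧热播'))         # many non-ASCII characters
```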


In [13]:
# Create two empty lists to hold only the apps with English names.  | 创建两个空列表，用来添加只有英语名字的应用。
android_english = []
ios_english = []

# Loops to create a 'clean' list of android apps with only english names. | 在Android中用循环创建一个只有英语名字的安卓应用的“干净”列表。
for app in android_clean: 
    name = app[0]
    if is_english(name):
        android_english.append(app) # Here we are adding 'app' into our english list  | 在这里我们把'app'添加进英语列表中
        
# Loops to create a 'clean' list of apple apps with only english names. | 用循环创建一个只有英语名字的苹果应用的“干净”列表。
for app in ios:
    name = app[1]

    if is_english(name):
        ios_english.append(app)

print(len(android_english)) # Total: 9614 | 总数： 9614
print(len(ios_english)) # Total: 6183 | 总数：6183

9614
6183


# Isolating Free Apps. | 隔离免费应用程序。

The company only builds apps that are free to download and install, and its main source of revenue is in-app ads. Both datasets contain free and paid apps, so we need to isolate the free apps for our analysis. We do this below for both datasets.

我们的目标是只开发可以免费下载和安装的应用程序，主要收入来源是应用内广告。数据集中包含了免费和非免费的应用程序，我们需要将免费应用程序单独挑出来进行分析。下面我们把两个数据集中的免费应用程序都挑选出来了

In [15]:
# Create two empty lists to hold only the free apps. | 创建两个空列表，只加入免费的应用程序。
android_final = []
ios_final = []

for app in android_english: # Loops through the 'Price' column (index 7) to append free apps.  | 遍历 'Price' 列（索引 7）来添加免费的应用
    free = app[7]
    if free == '0':
        android_final.append(app)

for app in ios_english: # Loops through the 'price' column to append free apps. | 遍历 'price' 列来添加免费的应用。
    free = app[4]
    if free == '0.0':
        ios_final.append(app)

print(len(android_final)) # Total: 8864  | 总数：8864
print(len(ios_final)) # Total: 3222 | 总数：3222

8864
3222
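
The cell above compares price strings directly ('0' for Google Play, '0.0' for the App Store), so it depends on the exact text each dataset uses. A hedged alternative is to parse the price as a number first. `is_free` is a hypothetical helper, and the assumption that paid prices may carry a '$' prefix is not confirmed by the output shown in this notebook.

```python
def is_free(price_text):
    """Return True when the price text parses to zero; unparseable text counts as not free."""
    cleaned = price_text.strip().replace('$', '')  # assumption: paid prices may look like '$4.99'
    try:
        return float(cleaned) == 0.0
    except ValueError:
        # Covers empty strings and shifted values such as 'Everyone'.
        return False


for text in ['0', '0.0', '$4.99', 'Everyone', '']:
    print(repr(text), is_free(text))
```

With this helper, both loops could use the same test, for example `if is_free(app[7])` for Google Play and `if is_free(app[4])` for the App Store.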


# Frequency Table: Most Common Apps Per Genre | 各类别中最常见应用的频率表

To inspect the cleaned datasets, we define two helper functions:

- 'freq_table()': Generates a frequency table for one column and stores it in a dictionary, with each distinct value of the column (for example, a genre) as the key and its percentage of all apps as the value.
- 'display_table()': Converts the dictionary into a list of tuples sorted in descending order and prints it with 'print()'.

为了检查清理后的数据集结果，下面是自定义的两个函数：

freq_table()：生成一个频率表，并将其存储在字典中，应用名称作为键，百分比作为值。
display_table()：将字典转换成列表，方便通过 print() 函数查看。

In [17]:
# Frequency table functions: share of apps per value of a column (shows results in percentage). | 按类型分类，显示最常见的应用程序（以百分比表示）。

def freq_table(dataset, index):
    table = {} # Stores the count of each value. | 这里保存数据的数量统计。
    total = 0 # Counts the total number of rows in order to calculate percentages. | 这里计算总数，用来算百分比。

    for row in dataset:
        total += 1
        value = row[index]
        if value in table: # Increment the count of a known value, or start a new value at 1. | 已有的值加一，新的值从一开始。
            table[value] += 1
        else:
            table[value] = 1

    table_percentages = {}
    
    for key in table: # Calculates the percentage of each value. | 计算每个类型的百分比。
        percentage = (table[key] / total) * 100 
        table_percentages[key] = percentage

    return table_percentages

def display_table(dataset, index): # Displays the dictionary table as a list of sorted tuples. | 把字典表显示为排序好的元组列表
    table = freq_table(dataset, index)
    table_display = []

    for key in table:              # Stores each (percentage, key) pair in the 'table_display' list. | 把（百分比，键）放到 'table_display' 列表中。
        key_tuple = (table[key], key) 
        table_display.append(key_tuple)

    table_sorted = sorted(table_display, reverse = True) # Sorts 'table_display' in descending order. | 把 'table_display' 的值按从大到小的顺序排序。
    for entry in table_sorted:
        print(entry[1], ' : ', round(entry[0], 2), '%')
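
For comparison, the same percentage table can be sketched with `collections.Counter` from the standard library. `freq_table_counter` is a hypothetical name, and the sample rows are invented. It is meant to return the same dictionary as 'freq_table()' for the same input.

```python
from collections import Counter


def freq_table_counter(dataset, index):
    """Return {value: percentage of rows} for the column at 'index'."""
    counts = Counter(row[index] for row in dataset)
    total = sum(counts.values())
    return {value: count / total * 100 for value, count in counts.items()}


# Invented sample rows: (name, genre).
sample_rows = [
    ['a', 'Games'],
    ['b', 'Games'],
    ['c', 'Book'],
    ['d', 'Games'],
]
sample_table = freq_table_counter(sample_rows, 1)
print(sample_table)
```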

# Analysis of Results: App Store

## Top 3 most common genres:

- Games  :  58.16 %
- Entertainment  :  7.88 %
- Photo & Video  :  4.97 %

## What other patterns are observable?

Games dominate the free English-language apps in the App Store (58.16%), and Entertainment is a distant second (7.88%). These percentages measure how many apps exist in each genre, not how many users they have. The sheer volume and variety of gaming applications also make that sector extremely competitive. To introduce a viable product into this space, further research is essential.

Popularity within app markets is typically measured by user count. The App Store dataset has no install counts, so we estimate popularity through the average number of user ratings per genre (the 'rating_count_tot' column). However, the dataset is skewed: most apps struggle to surpass 10,000 ratings, while a few disproportionately successful apps inflate the category averages. These outliers create a visibility bias and limit meaningful insights unless we adjust for them.

Once the dataset is updated to reduce the weight of such outliers, an interesting niche appears: book-related applications, particularly within religious content.

## Recommendation for an app profile for the App Store market

The expanded answer and the step-by-step process are available in the following cells.

A book-based app—preferably with religious content—presents a strategic opportunity. For example, an application centered around the Qur'an may include:

- Daily inspirational quotes from the text.
- A complete audio version.
- Interactive quizzes for engagement.
- An integrated dictionary to assist comprehension.

These features are designed not only to provide value but to retain users within the app ecosystem.

------------------------------------------------------------------------------------------------------
# 结果分析：App Store
## 最常见的前三个类别：
- 游戏：58.16%
- 娱乐：7.88%
- 照片与视频：4.97%

## 还能看出什么样的规律？
在所有应用类别中，娱乐类的受欢迎程度仅次于游戏类。但是，由于游戏的数量多、种类广，这个市场的竞争非常激烈。想要推出一个能被接受的产品，还需要进一步的研究。

应用市场中的受欢迎程度通常是根据用户数量来判断的。这可以通过每个类别的平均下载量来估算。但是，这个数据有偏差——大多数应用很难超过一万条评论，而少数特别成功的应用则把平均值拉高了。这些“特别的应用”会造成“看起来很受欢迎”的错觉，影响真实的理解，除非对数据进行调整。

一旦更新数据，减小这些“特别应用”的影响，一个有意思的小领域就会出现：与书籍有关的应用，特别是宗教类内容的。

## 对 App Store 市场的应用建议（详细说明在下面的表格中）：
一个以书籍为基础的应用，最好是宗教内容的，有很好的发展机会。例如，一个以《古兰经》为主题的应用，可以包括以下内容：

- 每日一句启发性的名言。
- 完整的音频版本。
- 用来加强记忆的互动测试题。
- 一个帮助理解的内置字典。

这些功能不只是为了提供有用的内容，还能帮助用户持续使用这个应用。

In [19]:
# Visualize most common genres | 可视化最常见的应用类型。

print(ios_header) 
print("Most common applications by genre in App Store:\n")
display_table(ios_final, -5) # Displays Frequency Table of 'prime_genre' column | 显示 'prime_genre' 列的频率表

['id', 'track_name', 'size_bytes', 'currency', 'price', 'rating_count_tot', 'rating_count_ver', 'user_rating', 'user_rating_ver', 'ver', 'cont_rating', 'prime_genre', 'sup_devices.num', 'ipadSc_urls.num', 'lang.num', 'vpp_lic']
Most common applications by genre in App Store:

Games  :  58.16 %
Entertainment  :  7.88 %
Photo & Video  :  4.97 %
Education  :  3.66 %
Social Networking  :  3.29 %
Shopping  :  2.61 %
Utilities  :  2.51 %
Sports  :  2.14 %
Music  :  2.05 %
Health & Fitness  :  2.02 %
Productivity  :  1.74 %
Lifestyle  :  1.58 %
News  :  1.33 %
Travel  :  1.24 %
Finance  :  1.12 %
Weather  :  0.87 %
Food & Drink  :  0.81 %
Reference  :  0.56 %
Business  :  0.53 %
Book  :  0.43 %
Navigation  :  0.19 %
Medical  :  0.19 %
Catalogs  :  0.12 %


# Recommendation Analysis: Know-How Process | 推荐分析：操作过程

Upon examination, navigation apps exhibit the highest average number of user reviews. However, this metric is significantly biased by the presence of two dominant entities—Waze and Google Maps—which collectively contribute nearly half a million reviews. Their disproportionate influence elevates the category average, and thus, must be considered an outlier factor in strategic assessment.

经过分析后可以看出，导航类应用的平均用户评价数最高。不过，这个数据受到两个主要产品——Waze 和 Google 地图——的明显影响。这两个应用一共占了将近五十万个评价，对整个类别的平均数有很大的提高。因此，在进行策略评估时，这种不平衡的情况应被当作一个特殊因素来看待。

In [21]:
for app in ios_final:
    if app[-5] == 'Navigation':
        print(app[1], ':', app[5]) # print name and number of ratings | 打印名称和评级数量 

Waze - GPS Navigation, Maps & Real-time Traffic : 345046
Google Maps - Navigation & Transit : 154911
Geocaching® : 12811
CoPilot GPS – Car Navigation & Offline Maps : 3582
ImmobilienScout24: Real Estate Search in Germany : 187
Railway Route Search : 5
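
One way to quantify how much the two largest apps pull the average is to compare the mean with the median for the six Navigation apps listed above. This is only an illustrative check with the standard `statistics` module, not a step from the original analysis.

```python
from statistics import mean, median

# Rating counts of the six Navigation apps printed above.
navigation_ratings = [345046, 154911, 12811, 3582, 187, 5]

print('mean  :', round(mean(navigation_ratings), 2))   # about 86090.33
print('median:', median(navigation_ratings))           # 8196.5
```

The mean matches the genre average of roughly 86,090 ratings, while the median is about ten times smaller, which is a sign that the average describes Waze and Google Maps more than a typical navigation app.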


The same statistical distortion is observable within the social networking and music categories. In the former, platforms such as Facebook, Pinterest, and Skype raise the average number of ratings disproportionately. In the latter, apps like Pandora, Spotify, and Shazam have the same effect.

The objective is to identify genuinely popular genres. However, categories like navigation, social networking, and music may appear more successful than they are in aggregate. This illusion arises from a small number of apps accumulating hundreds of thousands of reviews, while the majority struggle to surpass 10,000.

A more accurate assessment would involve removing these outlier apps and recalculating the category averages. Reference apps report an average of 74,942 reviews, but this figure is inflated by outliers such as the Bible and Dictionary.com, which bias the genre's perceived popularity.

---------------------------------------------------
在社交和音乐这两个类别中，也可以看到同样的数据问题。前者中，像 Facebook、Pinterest 和 Skype 这样的应用让评分的平均数大大提高；后者中，像 Pandora、Spotify 和 Shazam 的作用也是一样。

我们的目标是找出真正受欢迎的应用类型。但是，像导航、社交和音乐这些类别，可能看起来比实际更受欢迎。这是因为少数几个应用有几十万条用户评价，而大多数应用却很难超过一万条。

为了有更准确的结果，我们可以去掉这些特别有名的应用，然后重新计算每个类别的平均数。例如，参考类（Reference）应用的平均评价数量是 74,942，但这个数字是被一些非常特别的应用，比如 Bible 和 Dictionary.com 提高了，它们让这个类别看起来比实际更受欢迎。

In [23]:
for app in ios_final:
    if app[-5] == 'Reference':
        print(app[1], ':', app[5])

Bible : 985920
Dictionary.com Dictionary & Thesaurus : 200047
Dictionary.com Dictionary & Thesaurus for iPad : 54175
Google Translate : 26786
Muslim Pro: Ramadan 2017 Prayer Times, Azan, Quran : 18418
New Furniture Mods - Pocket Wiki & Game Tools for Minecraft PC Edition : 17588
Merriam-Webster Dictionary : 16849
Night Sky : 12122
City Maps for Minecraft PE - The Best Maps for Minecraft Pocket Edition (MCPE) : 8535
LUCKY BLOCK MOD ™ for Minecraft PC Edition - The Best Pocket Wiki & Mods Installer Tools : 4693
GUNS MODS for Minecraft PC Edition - Mods Tools : 1497
Guides for Pokémon GO - Pokemon GO News and Cheats : 826
WWDC : 762
Horror Maps for Minecraft PE - Download The Scariest Maps for Minecraft Pocket Edition (MCPE) Free : 718
VPN Express : 14
Real Bike Traffic Rider Virtual Reality Glasses : 8
教えて!goo : 0
Jishokun-Japanese English Dictionary & Translator : 0
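
The effect of removing the outliers can be checked directly on the Reference ratings printed above. In the sketch below, `average_below` is a hypothetical helper and the cap of 100,000 ratings is an assumed cut-off chosen for illustration, not a value from the original analysis. With that cap, only Bible and the first Dictionary.com entry are excluded.

```python
def average_below(values, cap):
    """Average of the values strictly below 'cap'; None when nothing is left."""
    kept = [value for value in values if value < cap]
    return sum(kept) / len(kept) if kept else None


# Rating counts of the 18 Reference apps printed above.
reference_ratings = [
    985920, 200047, 54175, 26786, 18418, 17588, 16849, 12122, 8535,
    4693, 1497, 826, 762, 718, 14, 8, 0, 0,
]

overall = sum(reference_ratings) / len(reference_ratings)
without_outliers = average_below(reference_ratings, 100000)  # assumed cap

print(round(overall, 2))   # about 74942.11
print(without_outliers)    # 10186.9375
```

Under this assumed cap, the Reference average falls from roughly 74,942 to roughly 10,187 ratings, so the genre still looks reasonably popular but far less so than the raw average suggests.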


While limited in scope, this niche reveals strategic potential. By selecting another widely recognized book and developing it into an app enhanced with auxiliary features—beyond the raw text—we increase user engagement. These features may include:

- Daily quotes from the book.
- An audio version.
- Interactive quizzes.
- An integrated dictionary to assist comprehension.

This approach aligns with a broader observation: the App Store is heavily populated by entertainment-driven ("for-fun") applications. Market saturation in this category implies that practical, utility-based apps may differentiate more effectively and thus possess greater potential to gain visibility.

Other genres also show moderate popularity, but most of them fit the project poorly:

- Weather apps: Low in-session time limits ad revenue potential. Furthermore, reliable real-time weather data often requires access to paid APIs, raising cost barriers.

- Food & Drink apps: Dominated by major food chains (e.g., Starbucks, McDonald’s). Developing competitive alternatives would require physical service infrastructure—logistically and financially beyond project capacity.

- Finance apps: Involve specialized functions like payments, transfers, and account management. These require domain-specific expertise. Hiring qualified personnel exceeds project budget.

Therefore, a book-based application—enhanced with educational or reflective features—remains a viable and resource-aligned development path.

-----------------
虽然这个方向不大，但还是有一定的战略可能。我们可以选择另一本有名的书，把它做成一个应用，并加上一些特别的功能，不只是提供书的内容，这样可以让更多用户喜欢使用这个应用。这些功能可以包括：

- 每天的名言。
- 有声音的版本。
- 可以回答的问题。
- 帮助理解的词典。

这种方法和我们看到的一个现象一致：App Store 里大部分都是为了娱乐的应用。这种类型的应用已经非常多了，所以一个更实用的应用可能更容易被人注意，也更有机会成功。

还有一些类别也有中等的欢迎程度，但大多数不太适合我们的项目目标：

- 天气类应用：人们使用时间很短，很难通过广告赚钱。另外，要得到可靠的天气信息常常需要连接到要收费的 API，增加了开发的花费。
- 食物和饮料类应用：很多是大公司做的，比如 Starbucks 和 McDonald’s。如果想做一个有竞争力的应用，需要有自己的餐厅和送餐服务，这不在我们的能力范围内。
- 金融类应用：和付款、转账、管理钱有关系。这类应用需要特别的知识，找专业的人来做会花很多钱，也超过了我们的预算。

所以，一个以书为中心的应用，加上一些学习或思考的功能，还是一个可以继续做的方向，也在我们能做到的范围内。

In [25]:
genres_ios = freq_table(ios_final, -5)

for genre in genres_ios:
    total = 0
    len_genre = 0
    for app in ios_final:
        genre_app = app[-5]
        if genre_app == genre:
            n_ratings = float(app[5])
            total += n_ratings
            len_genre += 1
            
    avg_n_ratings = total / len_genre
    print(genre, ' : ', avg_n_ratings)


Social Networking  :  71548.34905660378
Photo & Video  :  28441.54375
Games  :  22788.6696905016
Music  :  57326.530303030304
Reference  :  74942.11111111111
Health & Fitness  :  23298.015384615384
Weather  :  52279.892857142855
Utilities  :  18684.456790123455
Travel  :  28243.8
Shopping  :  26919.690476190477
News  :  21248.023255813954
Navigation  :  86090.33333333333
Lifestyle  :  16485.764705882353
Entertainment  :  14029.830708661417
Food & Drink  :  33333.92307692308
Sports  :  23008.898550724636
Book  :  39758.5
Finance  :  31467.944444444445
Education  :  7003.983050847458
Productivity  :  21028.410714285714
Business  :  7491.117647058823
Catalogs  :  4004.0
Medical  :  612.0
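
The cell above loops over the whole dataset once per genre. A single-pass version can be sketched with two dictionaries that accumulate totals and counts. `average_by_group` is a hypothetical name, and the sample rows are invented.

```python
def average_by_group(dataset, group_index, value_index):
    """Return {group: average of the numeric column} using one pass over the rows."""
    totals = {}
    counts = {}
    for row in dataset:
        group = row[group_index]
        totals[group] = totals.get(group, 0.0) + float(row[value_index])
        counts[group] = counts.get(group, 0) + 1
    return {group: totals[group] / counts[group] for group in totals}


# Invented sample rows: (name, rating count, genre).
sample_apps = [
    ['a', '10', 'Book'],
    ['b', '30', 'Book'],
    ['c', '5', 'Games'],
]
sample_averages = average_by_group(sample_apps, 2, 1)
print(sample_averages)

# In the notebook this would be called as: average_by_group(ios_final, -5, 5)
```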


# Analysis of Results: Android

## Top 3 most popular genres:
Based on the average number of installs per app:
- COMMUNICATION : 38,456,119
- VIDEO_PLAYERS : 24,727,872
- SOCIAL : 23,253,652

## Books and reference app installs:

- BOOKS_AND_REFERENCE : 8,767,811

## What other patterns are observable?

The *communication*, *video players*, and *social* categories surpass all others in observed popularity. However, these averages are distorted by a few dominant applications, some with over a billion installs: for example, WhatsApp and Skype in communication, YouTube in video players, and Facebook in social. The presence of such dominant platforms biases the dataset. As such, entering these markets would require further investigation to assess viability under realistic competitive conditions.

The game genre also demonstrates high popularity. However, prior analysis revealed significant saturation in this sector, reducing the likelihood of success without substantial innovation or marketing investment. Therefore, we shift our focus toward alternative recommendations.

The books and reference genre offers promising metrics, averaging 8,767,811 installs. This segment warrants deeper exploration, particularly given its potential compatibility with the App Store—a platform where utility-based and informational apps can thrive.

Our goal remains to identify an app category with cross-platform profitability potential. In this context, books and reference stand out as a viable candidate for development on both Google Play and the App Store.

## Recommendation for an app profile for the Google Play market

The expanded answer and the step-by-step process are available in the following cells.

A book-based app—preferably with religious content—presents a strategic opportunity. For example, an application centered around the Qur'an may include:

- Daily inspirational quotes from the text.
- A complete audio version.
- Interactive quizzes for engagement.
- An integrated dictionary to assist comprehension.

These features are designed not only to provide value but to retain users within the app ecosystem.

-------------------------
# 分析结果：安卓

## 最受欢迎的前三个类型：
根据安装数量：

- COMMUNICATION ：38,456,119
- VIDEO_PLAYERS ：24,727,872
- SOCIAL ：23,253,652

## 图书类应用安装数量：
BOOKS_AND_REFERENCE ：8,767,811

## 可以看到的其他情况：
*Communication*、*video players* 和 *social media* 这些类型比其他的更受欢迎。不过，因为这些类型中有一些非常大的应用（比如：WhatsApp 属于社交类，YouTube 属于视频类，Skype 属于通信类），所以它们的数据有一定的偏差。这些特别受欢迎的应用影响了整体的平均值。如果想进入这些市场，需要进一步分析，看在现在的情况中有没有真正的机会。

游戏类的应用也很受欢迎，但之前的分析已经发现，这个市场已经很满了。如果没有特别的想法或者很强的市场推广，很难成功。因此，我们应该考虑其他类型。

图书与参考类的应用有不错的表现，平均安装量是 8,767,811。这个类型可以继续研究，特别是它可能在 App Store 上也有好的表现。因为 App Store 更适合那种有用的、和学习有关的应用。

我们的目标是找出一种可以在两个市场（Google Play 和 App Store）都能赚钱的应用类型。从目前来看，图书类是一个很好的选择。

## 推荐的 App Store 应用类型：完整回答和分析过程请看下面的单元格。
一个以图书为中心的应用（最好是和宗教有关的）有一定的战略意义。比如，可以做一个关于 Qur’an 的应用，里面可以包括：

- 每天的名言句子
- 完整的有声音的版本
- 互动的问答题，提高使用兴趣
- 一个帮助理解的词典

这些功能不仅有用，也可以让用户在应用里停留更久。

# Recommendation Analysis: Know-How Process

On average, communication apps report the highest install count—38,456,119 per app. However, this figure is significantly skewed by a small subset of applications with over one billion installs—notably WhatsApp, Facebook Messenger, Skype, Google Chrome, Gmail, and Hangouts. Additional distortion is introduced by several other apps within the category, each exceeding 100 million or even 500 million installs, further elevating the mean beyond a representative value.

-----------------

# 推荐分析：方法过程

平均来看，communication 类型的应用安装数量是最高的——每个应用大约有 38,456,119 次安装。但是，这个数字被一些非常大的应用影响很大，这些应用的安装次数超过了十亿，比如 WhatsApp、Facebook Messenger、Skype、Google Chrome、Gmail 和 Hangouts。另外，在这个类型里，还有几个应用的安装数超过了一亿，甚至五亿，这些也让平均数变得更高，不太能代表这个类型的真实情况。

In [56]:
categories_android = freq_table(android_final, 1)

# List to store tuples of (category, average installs).
avg_installs_by_category = []

for category in categories_android:
    total = 0
    len_category = 0
    for app in android_final:
        category_app = app[1]
        if category_app == category:
            n_installs = app[5]
            n_installs = n_installs.replace(',', '').replace('+', '')
            total += float(n_installs)
            len_category += 1

    avg_n_installs = total / len_category
    avg_installs_by_category.append((category, avg_n_installs))

# Sort the list by average installs in descending order.
sorted_avg_installs = sorted(avg_installs_by_category, key=lambda x: x[1], reverse=True)

# Print the results.
for category, avg in sorted_avg_installs:
    print(category, ':', avg)


COMMUNICATION : 38456119.167247385
VIDEO_PLAYERS : 24727872.452830188
SOCIAL : 23253652.127118643
PHOTOGRAPHY : 17840110.40229885
PRODUCTIVITY : 16787331.344927534
GAME : 15588015.603248259
TRAVEL_AND_LOCAL : 13984077.710144928
ENTERTAINMENT : 11640705.88235294
TOOLS : 10801391.298666667
NEWS_AND_MAGAZINES : 9549178.467741935
BOOKS_AND_REFERENCE : 8767811.894736841
SHOPPING : 7036877.311557789
PERSONALIZATION : 5201482.6122448975
WEATHER : 5074486.197183099
HEALTH_AND_FITNESS : 4188821.9853479853
MAPS_AND_NAVIGATION : 4056941.7741935486
FAMILY : 3695641.8198090694
SPORTS : 3638640.1428571427
ART_AND_DESIGN : 1986335.0877192982
FOOD_AND_DRINK : 1924897.7363636363
EDUCATION : 1833495.145631068
BUSINESS : 1712290.1474201474
LIFESTYLE : 1437816.2687861272
FINANCE : 1387692.475609756
HOUSE_AND_HOME : 1331540.5616438356
DATING : 854028.8303030303
COMICS : 817657.2727272727
AUTO_AND_VEHICLES : 647317.8170731707
LIBRARIES_AND_DEMO : 638503.734939759
PARENTING : 542603.6206896552
BEAUTY : 51315
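
The 'Installs' column stores text such as '1,000,000+', so every cell that uses it repeats the same clean-up. A small helper keeps that step in one place. `parse_installs` is a hypothetical name. The trailing '+' means the value is a lower bound for an open-ended bucket, so averages built on it are approximate.

```python
def parse_installs(text):
    """Convert an install bucket such as '1,000,000+' to its lower bound as an integer."""
    return int(text.replace(',', '').replace('+', ''))


for bucket in ['50,000+', '1,000,000+', '1,000,000,000+']:
    print(bucket, '->', parse_installs(bucket))
```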

If we remove all the 'communication' apps that have 100 million installs or more, the average drops roughly tenfold:

 --------------
 如果去掉所有安装数超过一亿的 communication 应用，平均数大概会变小十倍左右:

In [59]:
under_100_m = []

for app in android_final:
    n_installs = app[5]
    n_installs = n_installs.replace(',', '')
    n_installs = n_installs.replace('+', '')
    if (app[1] == 'COMMUNICATION') and (float(n_installs) < 100000000):
        under_100_m.append(float(n_installs))
        
sum(under_100_m) / len(under_100_m)

3603485.3884615386

The video players category exhibits a similar pattern, ranking second with an average of 24,727,872 installs. This metric is primarily driven by a small set of dominant applications, including YouTube, Google Play Movies & TV, and MX Player. The pattern replicates across other genres:

- Social apps, led by giants such as Facebook, Instagram, and Google+.
- Photography apps, featuring Google Photos and high-use editing tools.
- Productivity apps, where Microsoft Word, Dropbox, Google Calendar, and Evernote define the landscape.

In each case, the average install count is inflated by a handful of large-scale platforms. The implication: these genres may appear more viable than they are, due to outlier influence. Additionally, market entry is constrained by the entrenched dominance of these established entities.

The game genre also reports high engagement, yet prior analysis indicates market saturation, reducing the likelihood of successful entry without significant differentiation. As such, alternate recommendations are warranted. One notable candidate is the books and reference category, with an average of 8,767,811 installs per app. This genre is promising, particularly given its compatibility with the App Store's ecosystem, which favors practical, information-based applications.

Proceeding forward, we will review select apps from this genre and evaluate their install counts to refine strategic positioning.

----------------------------
视频类应用也有类似的情况，排在第二位，平均安装数量为 24,727,872。这个数字主要是因为一些非常受欢迎的应用，比如 YouTube、Google Play Movies & TV 和 MX Player。

这种情况在其他类型中也出现：

- 社交类应用，由一些大的公司主导，比如 Facebook、Instagram 和 Google+。
- 拍照类应用，比如 Google Photos 和一些常用的图片编辑工具。
- 办公类应用，有 Microsoft Word、Dropbox、Google Calendar 和 Evernote 等。

在这些类型中，平均安装数都被少数几个非常大的应用提高了。结果是：这些类型看起来好像更有市场，其实是受到了特别应用的影响。另外，因为这些公司已经有了很多用户，所以新的应用很难进入这个市场。

游戏类应用也有很多用户，但是我们之前的分析已经发现，这个市场已经非常满了。如果没有特别的想法或者新的设计，成功的机会不高。因此，我们需要找别的推荐方向。一个值得注意的类型是 图书与参考类，它的平均安装数量比较稳定，为 870万。这个类型很有发展机会，特别是在 App Store 上，因为这个平台更喜欢实用、有知识内容的应用。

接下来，我们会选择一些这个类型的应用，看看它们的安装数量，以帮助我们更好地做出战略决定。

In [62]:
for app in android_final:
    if app[1] == 'BOOKS_AND_REFERENCE':
        print(app[0], ':', app[5])

E-Book Read - Read Book for free : 50,000+
Download free book with green book : 100,000+
Wikipedia : 10,000,000+
Cool Reader : 10,000,000+
Free Panda Radio Music : 100,000+
Book store : 1,000,000+
FBReader: Favorite Book Reader : 10,000,000+
English Grammar Complete Handbook : 500,000+
Free Books - Spirit Fanfiction and Stories : 1,000,000+
Google Play Books : 1,000,000,000+
AlReader -any text book reader : 5,000,000+
Offline English Dictionary : 100,000+
Offline: English to Tagalog Dictionary : 500,000+
FamilySearch Tree : 1,000,000+
Cloud of Books : 1,000,000+
Recipes of Prophetic Medicine for free : 500,000+
ReadEra – free ebook reader : 1,000,000+
Anonymous caller detection : 10,000+
Ebook Reader : 5,000,000+
Litnet - E-books : 100,000+
Read books online : 5,000,000+
English to Urdu Dictionary : 500,000+
eBoox: book reader fb2 epub zip : 1,000,000+
English Persian Dictionary : 500,000+
Flybook : 500,000+
All Maths Formulas : 1,000,000+
Ancestry : 5,000,000+
HTC Help : 10,000,000+
E
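
A long list of apps is hard to read at a glance. As a hedged complement, the sketch below counts how many apps of a category fall into each install bucket. `installs_distribution` is a hypothetical name, the default indexes assume the Google Play layout (category at 1, installs at 5), and the sample rows are invented.

```python
from collections import Counter


def installs_distribution(dataset, category, category_index=1, installs_index=5):
    """Return [(install bucket, number of apps)] for one category, largest bucket first."""
    tiers = Counter(
        row[installs_index] for row in dataset if row[category_index] == category
    )

    def bucket_size(item):
        # Sort by the numeric lower bound of the bucket, e.g. '1,000,000+' -> 1000000.
        return int(item[0].replace(',', '').replace('+', ''))

    return sorted(tiers.items(), key=bucket_size, reverse=True)


# Invented sample rows with the category at index 1 and installs at index 5.
sample_android = [
    ['Wikipedia', 'BOOKS_AND_REFERENCE', '', '', '', '10,000,000+'],
    ['Cool Reader', 'BOOKS_AND_REFERENCE', '', '', '', '10,000,000+'],
    ['Book store', 'BOOKS_AND_REFERENCE', '', '', '', '1,000,000+'],
    ['Slack', 'BUSINESS', '', '', '', '10,000,000+'],
]
book_tiers = installs_distribution(sample_android, 'BOOKS_AND_REFERENCE')
print(book_tiers)

# In the notebook this would be called as:
# installs_distribution(android_final, 'BOOKS_AND_REFERENCE')
```

A table like this shows whether a category average comes from a few very large apps or from many mid-sized ones.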

The niche appears to be dominated by ebook readers, digital libraries, and dictionary collections. Due to the existing saturation and functionality overlap, developing another standard reading or reference app would likely face substantial competition—thus, a low strategic yield.

However, an interesting pattern emerges: several apps in this category are centered around the Qur'an. This suggests that building an application around a single, widely recognized book, especially one with cultural or spiritual resonance, can be profitable. Extending this logic, adapting a popular or contemporary book into an application may offer revenue potential across both Google Play and the App Store.

That said, the core market for basic "book viewer" apps is already saturated. To differentiate, the app must deliver augmented value beyond raw content. Recommended features may include:

- Daily curated quotes from the book.
- Audio narration of the full text.
- Interactive quizzes to encourage engagement.
- A discussion forum to foster community.
- Built-in dictionary to enhance reading flow.

These features convert the app from a passive reader into an interactive ecosystem, thereby increasing retention and monetization potential.

----------------------------------------------------

这个小市场主要是被电子书阅读器、数字图书馆和词典类应用占据的。因为这些功能已经很常见，市场也比较满，如果我们再做一个普通的阅读或参考类应用，很可能会遇到很多竞争，效果也不会很好。

但是，我们发现一个有意思的现象：这个类型中有一些应用是围绕 Qur'an 这个书做的。这说明，如果选择一本大家都知道的书，特别是有文化或宗教意义的书，做成一个应用，是有可能赚钱的。如果我们用这个想法，也可以选择一本流行的或新的书来开发应用，可能会在 Google Play 和 App Store 两个市场都有机会。

不过，现在的“阅读器”类应用已经太多了。为了让我们的应用不一样，必须加上一些特别的功能，不只是文字内容。可以加入的功能有：

- 每天从书中选出的句子
- 整本书的有声阅读
- 互动的问题，增加使用兴趣
- 一个可以讨论的地方，让用户交流
- 应用里有词典，帮助用户理解，不用换到别的应用

这些功能可以让一个普通的阅读应用变成一个互动的学习环境，这样用户会更愿意留下来，也更可能带来收入。

# Conclusions

Upon analyzing datasets from both the App Store and Google Play, with the objective of recommending a viable and profitable app profile for both platforms, the following insight has emerged:

The data supports a clear direction: transforming a popular book, ideally a contemporary title or religious book, into a dedicated mobile application presents a promising opportunity for monetization across both markets. However, due to the high saturation of basic library-style apps, success will depend on value-added features beyond static text.

To differentiate, the app should include:

- Daily quotes extracted from the book
- A complete audio version
- Interactive quizzes to enhance engagement
- A discussion forum for user-generated dialogue
- An embedded dictionary to minimize exit behavior

These features shift the product from a passive reading tool to an interactive, self-contained experience designed to maximize retention and user satisfaction.

# Recommendations For Further Research and Project Evolution
To refine and sustain the proposal:

- Web scraping tools should be deployed to keep the dataset current and detect shifting trends in app popularity.
- Additional research is recommended, particularly focused on English speakers who are also foreign-language learners, to assess the demand for integrated multi-language dictionary tools.
- Consider including a multi-language toggle, expanding the app’s reach to a global audience and increasing cross-market viability.

This approach ensures the application is not only data-driven but also adaptable to evolving user expectations.

---------------------
# 结论
我们分析了 App Store 和 Google Play 上的数据，希望推荐一个可以在两个平台都成功并赚钱的应用类型。分析后发现：

数据表明，把一本受欢迎的书（最好是现代书或宗教书）做成专门的手机应用，是一个很有前景的选择，可以在两个市场都赚钱。但因为现在“图书馆”类的普通应用已经非常多，想要成功，必须加入比单纯文字内容更多的特色功能。

为了让应用有区别，可以加上：

- 每天从书中选出的名言
- 整本书的有声版
- 互动问题，增加兴趣
- 可以让用户讨论的地方
- 应用内的词典，减少用户换到别的应用的情况

这些功能让产品不只是被动的阅读工具，而是一个互动的、完整的体验，这样用户会更愿意留下，也会更满意。

# 后续研究和项目发展的建议
为了让建议更完善，也让项目能继续发展：

- 建议使用 web scraping 工具，保持数据的最新，发现应用受欢迎程度的变化
- 推荐对英语为母语但也学外语的用户做进一步研究，了解多语言词典工具的需求
- 可以考虑加入多语言的选择，让应用能被更多国家的用户使用，增加全球市场的机会

这样的方法，不但保证应用以数据为基础，还能适应用户不断变化的需求。