# Profitable App Profiles for the App Store and Google Play Markets

Our aim in this project is to find mobile app profiles that are profitable for the App Store and Google Play markets. We're working as data analysts for a company that builds Android and iOS mobile apps, and our job is to enable our team of developers to make data-driven decisions with respect to the kind of apps they build.

At our company, we only build apps that are free to download and install, and our main source of revenue consists of in-app ads. This means that our revenue for any given app is mostly influenced by the number of users that use our app. Our goal for this project is to analyze data to help our developers understand what kinds of apps are likely to attract more users.

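In short, the main goal of this project is to analyze free English-language apps on the App Store and Google Play, identify the types of apps that suit our company, and earn revenue from the ads shown in those apps.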

## Opening and Exploring the Data

As of September 2018, there were approximately 2 million iOS apps available on the App Store, and 2.1 million Android apps on Google Play.

Collecting data for over four million apps requires a significant amount of time and money, so we'll analyze a sample of the data instead. To avoid spending resources on collecting new data ourselves, we should first check whether any relevant existing data is available at no cost. Luckily, there are two data sets that seem suitable for our purpose: one with data on Android apps from Google Play (googleplaystore.csv) and one with data on iOS apps from the App Store (AppleStore.csv).


In [1]:
from csv import reader

### The Google Play data set ###
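# The r prefix makes this a raw string, so the backslashes in the Windows path are not read as escape sequences.
# encoding='utf8' is needed because the file contains non-ASCII characters (for example, app names in other languages).
opened_file = open(r'C:\Users\Administrator\Desktop\test\googleplaystore.csv', encoding='utf8')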
read_file = reader(opened_file)
android = list(read_file)
android_header = android[0]
android = android[1:]

### The App Store data set ###
opened_file = open(r'C:\Users\Administrator\Desktop\test\AppleStore.csv', encoding='utf8')
read_file = reader(opened_file)
ios = list(read_file)
ios_header = ios[0]
ios = ios[1:]
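
The two loading steps above repeat the same pattern, so they could be folded into a small helper. The sketch below is one possible version, not the notebook's own code: the name load_csv is hypothetical, and it uses a with block so that the file is closed automatically.

```python
from csv import reader

def load_csv(path, encoding='utf8'):
    """Return (header, rows) for a CSV file. Hypothetical helper."""
    # newline='' is the setting the csv module recommends for files it reads
    with open(path, encoding=encoding, newline='') as opened_file:
        rows = list(reader(opened_file))
    return rows[0], rows[1:]
```

With this helper, each data set would be loaded in one line, for example android_header, android = load_csv(path_to_googleplaystore_csv), where the path variable is whatever location the file has on your machine.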

To make it easier to explore the two data sets, we'll first write a function named explore_data() that we can use repeatedly to explore rows in a more readable way. We'll also add an option for our function to show the number of rows and columns for any data set.

In [2]:
def explore_data(dataset, start, end, rows_and_columns=False):
    # Print the rows from index start up to (but not including) end
    data_slice = dataset[start:end]
    for row in data_slice:
        print(row)
        print('\n')  # blank lines between rows for readability
    # Optionally report the size of the data set
    if rows_and_columns:
        print('Number of Rows:', len(dataset))
        print('Number of Columns:', len(dataset[0]))

print(android_header)
print('\n')

explore_data(android, 0, 3, True)


['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']


['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['Coloring book moana', 'ART_AND_DESIGN', '3.9', '967', '14M', '500,000+', 'Free', '0', 'Everyone', 'Art & Design;Pretend Play', 'January 15, 2018', '2.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


Number of Rows: 10841
Number of Columns: 13


We can see that the Google Play data set has 10841 apps and 13 columns. At a quick glance, the columns that might be useful are: App, Category, Rating, Type, Price, and Content Rating.

Now let's look at the App Store data set.

In [3]:
print(ios_header)
print('\n')
explore_data(ios,0,3,True)

['id', 'track_name', 'size_bytes', 'currency', 'price', 'rating_count_tot', 'rating_count_ver', 'user_rating', 'user_rating_ver', 'ver', 'cont_rating', 'prime_genre', 'sup_devices.num', 'ipadSc_urls.num', 'lang.num', 'vpp_lic']


['284882215', 'Facebook', '389879808', 'USD', '0.0', '2974676', '212', '3.5', '3.5', '95.0', '4+', 'Social Networking', '37', '1', '29', '1']


['389801252', 'Instagram', '113954816', 'USD', '0.0', '2161558', '1289', '4.5', '4.0', '10.23', '12+', 'Photo & Video', '37', '0', '29', '1']


['529479190', 'Clash of Clans', '116476928', 'USD', '0.0', '2130805', '579', '4.5', '4.5', '9.24.12', '9+', 'Games', '38', '5', '18', '1']


Number of Rows: 7197
Number of Columns: 16


We have 7197 apps in the App Store data set, and the columns that seem interesting are: track_name, currency, price, rating_count_tot, rating_count_ver, and prime_genre.

## Deleting Wrong Data

Check whether the Android data set contains rows with the wrong number of fields:

In [4]:
for row in android:
    if len(row) != len(android_header):
        print(row)
        erro_row_index = android.index(row)
        print('erro row index:'+ str(erro_row_index))

['Life Made WI-Fi Touchscreen Photo Frame', '1.9', '19', '3.0M', '1,000+', 'Free', '0', 'Everyone', '', 'February 11, 2018', '1.0.19', '4.0 and up']
erro row index:10472
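
The same check can be sketched with enumerate, which yields each row's index directly and avoids a second search with list.index (which would report the first match if two identical rows existed). The function name find_malformed_rows and the sample rows are assumptions made for illustration.

```python
def find_malformed_rows(dataset, header):
    """Return the indexes of rows whose length differs from the header's."""
    return [index for index, row in enumerate(dataset)
            if len(row) != len(header)]

# Tiny made-up sample: the second row is missing one field.
sample_header = ['App', 'Category', 'Rating']
sample = [['A', 'TOOLS', '4.1'], ['B', '1.9'], ['C', 'GAME', '3.0']]
print(find_malformed_rows(sample, sample_header))  # prints [1]
```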


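Remove the erroneous row from the Android data set. Its Category value is missing, so the row has 12 fields instead of 13 and every later value is shifted one column to the left. Run this cell only once, because each run deletes whatever row currently sits at that index.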

In [5]:
del android[erro_row_index]
print(android[10472])


['osmino Wi-Fi: free WiFi', 'TOOLS', '4.2', '134203', '4.1M', '10,000,000+', 'Free', '0', 'Everyone', 'Tools', 'August 7, 2018', '6.06.14', '4.4 and up']


Check whether the App Store data set contains a similar error. The loop prints nothing, which means every row has the expected number of fields.

In [6]:
for row in ios:
    if len(row) != len(ios_header):
        print(row)
        print(ios.index(row))

## Removing Duplicate Entries

### Part 1

Find the duplicate entries

In [7]:
for app in android:
    name = app[0]
    if name == 'Facebook':
        print(app)

['Facebook', 'SOCIAL', '4.1', '78158306', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'August 3, 2018', 'Varies with device', 'Varies with device']
['Facebook', 'SOCIAL', '4.1', '78128208', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'August 3, 2018', 'Varies with device', 'Varies with device']


In [8]:
duplicate_apps = []
unique_apps = []
for app in android:
    name = app[0]
    if name in unique_apps:
        duplicate_apps.append(name)
    else:
        unique_apps.append(name)

print('Number of Duplicate Apps:', len(duplicate_apps))
print('Examples of Duplicate Apps:', duplicate_apps[0:5])
print('Number of Unique Apps:', len(unique_apps))

Number of Duplicate Apps: 1181
Examples of Duplicate Apps: ['Quick PDF Scanner + OCR FREE', 'Box', 'Google My Business', 'ZOOM Cloud Meetings', 'join.me - Simple Meetings']
Number of Unique Apps: 9659
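
Membership tests on a list scan the whole list each time, which gets slow as the data grows. A sketch of the same count using collections.Counter from the standard library follows. The name count_duplicates is hypothetical and the sample rows are made up.

```python
from collections import Counter

def count_duplicates(dataset, name_index):
    """Return (number of duplicate rows, number of unique names)."""
    counts = Counter(row[name_index] for row in dataset)
    n_unique = len(counts)
    # Every row beyond the first occurrence of a name counts as a duplicate
    n_duplicates = sum(counts.values()) - n_unique
    return n_duplicates, n_unique

sample = [['Box'], ['Box'], ['Facebook'], ['Box']]
print(count_duplicates(sample, 0))  # prints (2, 2)
```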


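A first attempt to wrap this logic in a function did not work and kept reporting the Android results. There were three problems: it tested name in dataset (a list of rows) instead of the list of names already seen, the print calls sat outside the function and so read the global duplicate_apps list, and the name column was hard-coded to index 0 (in the App Store data the name is at index 1). A corrected version:

def explore_duplicate_entries(dataset, name_index):
    duplicate_apps = []
    unique_apps = []
    for app in dataset:
        name = app[name_index]
        if name in unique_apps:
            duplicate_apps.append(name)
        else:
            unique_apps.append(name)
    print('Number of Duplicate Apps:', len(duplicate_apps))
    print('Examples of Duplicate Apps:', duplicate_apps[0:5])

explore_duplicate_entries(ios, 1)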

In [9]:
for app in ios:
    name = app[1]
    if name == 'Facebook':
        print(app)

['284882215', 'Facebook', '389879808', 'USD', '0.0', '2974676', '212', '3.5', '3.5', '95.0', '4+', 'Social Networking', '37', '1', '29', '1']


Duplicates are rare in the App Store data set, so few that they are negligible.

In [10]:
duplicate_apps_ios = []
unique_apps_ios = []
for app in ios:
    name = app[1]
    if name in unique_apps_ios:
        duplicate_apps_ios.append(name)
    else:
        unique_apps_ios.append(name)

print('Number of Duplicate Apps:', len(duplicate_apps_ios))
print('Examples of Duplicate Apps:', duplicate_apps_ios[0:5])
print('Number of Unique Apps:', len(unique_apps_ios))

Number of Duplicate Apps: 2
Examples of Duplicate Apps: ['Mannequin Challenge', 'VR Roller Coaster']
Number of Unique Apps: 7195


### Part 2

Rather than removing duplicate rows at random, we'll keep only the row with the highest number of reviews for each app and remove the others. To do this, we first build a dictionary, reviews_max, that maps each app name to the highest review count found among its rows.

In [11]:
reviews_max = {}
for app in android:
    name = app[0]
    n_reviews = float(app[3])
    # Update the stored value when this row has more reviews than any seen so far
    if name in reviews_max and reviews_max[name] < n_reviews:
        reviews_max[name] = n_reviews
    # First time this app name appears
    elif name not in reviews_max:
        reviews_max[name] = n_reviews

print(len(reviews_max))


9659


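Using the reviews_max dictionary built above, we copy each app's row into a new list, android_clean, so that every app appears only once. A row is added only when its review count equals the maximum recorded for that app. The extra condition name not in already_added is needed because duplicate rows of the same app can share the same maximum review count, and without it such an app would be added more than once.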

In [12]:

android_clean = []
already_added = []

for app in android:
    name = app[0]
    n_reviews = float(app[3])
    
    if (reviews_max[name] == n_reviews) and (name not in already_added):
        android_clean.append(app)
        already_added.append(name) # make sure this is inside the if block
        
print(len(android_clean))

9659
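
The two passes above can also be combined: a dictionary keyed by app name that stores the whole row keeps the entry with the most reviews in a single loop. This is only a sketch under the same assumptions as the notebook (name in column 0, review count in column 3). The name dedupe_keep_max is hypothetical, and the Box rows in the sample are made up.

```python
def dedupe_keep_max(dataset, name_index=0, reviews_index=3):
    """Keep, for each app name, the row with the highest review count."""
    best = {}
    for row in dataset:
        name = row[name_index]
        n_reviews = float(row[reviews_index])
        # Strict '>' keeps the first row seen when review counts tie
        if name not in best or n_reviews > float(best[name][reviews_index]):
            best[name] = row
    return list(best.values())

sample = [
    ['Facebook', 'SOCIAL', '4.1', '78158306'],
    ['Facebook', 'SOCIAL', '4.1', '78128208'],
    ['Box', 'BUSINESS', '4.2', '159872'],
    ['Box', 'BUSINESS', '4.2', '159872'],
]
for row in dedupe_keep_max(sample):
    print(row[0], row[3])
```

One difference from the two-pass version: the result is ordered by the first appearance of each app name, not by the position of the row that was kept.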


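A puzzling observation: a variable name that is never defined in the visible code does not always raise an error. For example:

reviews_max = {}
for app in android:
    name = app[0]
    n_reviews = float(app[3])
    if name in reviews_max and reviews_max[name] < n_reviews:
        reviews_max[name] = n_reviews
    elif name not in reviews_max:
        reviews_max[name] = n_rating

Here n_rating on the last line is never defined in this cell, yet the Jupyter notebook ran it without a NameError. The most likely explanation is that n_rating was assigned in a cell executed earlier in the same session: the kernel keeps every global name in memory until it is restarted, even after the cell that created it has been edited or deleted. Restarting the kernel and running all cells from the top should make the NameError appear.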

In [13]:
explore_data(android_clean, 0, 3, True)

['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 and up']


Number of Rows: 9659
Number of Columns: 13


In [14]:
print(ios[813][1])
print(ios[6731][1])
print('\n')
print(android_clean[4412][0])   
print(android_clean[7940][0]) 

爱奇艺PPS -《欢乐颂2》电视剧热播
【脱出ゲーム】絶対に最後までプレイしないで 〜謎解き＆ブロックパズル〜


中国語 AQリスニング
لعبة تقدر تربح DZ


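Next, we remove the non-English apps. The ord() function returns the Unicode code point of a character, and the characters commonly used in English text fall in the ASCII range (0-127), so we can use it to detect names that are not English.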

In [15]:
def is_English(string):
    for character in string:
        # Any character outside the ASCII range (0-127) marks the name as non-English
        if ord(character) > 127:
            return False
    return True

print(is_English('爱奇艺PPS'))
print(is_English('english 学习'))

False
False


There is a problem, however: English app names that contain other symbols or emoji will also be removed.

In [16]:
print(is_English('Docs To Go™ Free Office Suite'))
print(is_English('Instachat 😜'))

print(ord('™'))
print(ord('😜'))

False
False
8482
128540


To minimize data loss, we'll only remove an app if its name has more than three characters outside the ASCII range (0-127).

In [17]:
def is_English(string):
    not_english_count = 0
    for character in string:
        if ord(character) > 127:
            not_english_count += 1
    # Treat the name as English unless it has more than three non-ASCII characters
    return not_english_count <= 3
        
print(is_English('Docs To Go™ Free Office Suite'))
print(is_English('Instachat 😜'))
print(is_English('爱奇艺视频PPS'))
print(is_English('english 学习'))

True
True
False
True
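
The counting loop can be written more compactly with a generator expression. In this sketch the threshold is a parameter. The names is_english_name and max_non_ascii are hypothetical, and the default of 3 mirrors the rule stated above.

```python
def is_english_name(name, max_non_ascii=3):
    """Return True if name has at most max_non_ascii characters outside ASCII (0-127)."""
    non_ascii = sum(1 for character in name if ord(character) > 127)
    return non_ascii <= max_non_ascii

print(is_english_name('Docs To Go™ Free Office Suite'))  # True
print(is_english_name('Instachat 😜'))                    # True
print(is_english_name('爱奇艺视频PPS'))                    # False
print(is_english_name('english 学习'))                     # True
```

Setting max_non_ascii=0 reproduces the stricter first version of the check, which rejects any name containing a non-ASCII character.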


Use the is_English function to filter the non-English apps out of both data sets:

In [18]:
android_clean_english = []
ios_english = []

for app in android_clean:
    if is_English(app[0]):
        android_clean_english.append(app)

for app in ios:
    if is_English(app[1]):
        ios_english.append(app)

explore_data(android_clean_english, 0, 3, True)
print('\n')
explore_data(ios_english, 0, 3, True)

['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 and up']


Number of Rows: 9614
Number of Columns: 13


['284882215', 'Facebook', '389879808', 'USD', '0.0', '2974676', '212', '3.5', '3.5', '95.0', '4+', 'Social Networking', '37', '1', '29', '1']


['389801252', 'Instagram', '113954816', 'USD', '0.0', '2161558', '1289', '4.5', '4.0', '10.23', '12+', 'Photo & Video', '37', '0', '29', '1']


['529479190', 'Clash of Clans', '116476928', 'USD', '0.0', '2130805', '579', '4.5', '4.5', '9.24.12', '9+', 

## Isolating the Free Apps

Because our revenue comes from ads shown in free apps, we now need to isolate the free apps by removing every non-free app from both data sets.

In [19]:
android_final =[]
ios_final=[]
print(android_clean_english[1][7])
print(ios_english[1][4])
for app in android_clean_english:
    price = app[7]
    if price == '0':
        android_final.append(app)
for app in ios_english:
    price = app[4]
    if price == '0.0':
        ios_final.append(app)
print(len(ios_final))
print(len(android_final))

0
0.0

## Most Common Apps by Genre

### Part 1

As we mentioned in the introduction, our aim is to determine the kinds of apps that are likely to attract more users because our revenue is highly influenced by the number of people using our apps.

To minimize risks and overhead, our validation strategy for an app idea is comprised of three steps:

1. Build a minimal Android version of the app, and add it to Google Play.
2. If the app has a good response from users, we then develop it further.
3. If the app is profitable after six months, we also build an iOS version of the app and add it to the App Store.

Because our end goal is to add the app on both the App Store and Google Play, we need to find app profiles that are successful on both markets. For instance, a profile that might work well for both markets might be a productivity app that makes use of gamification.

Let's begin the analysis by getting a sense of the most common genres for each market. For this, we'll build a frequency table for the prime_genre column of the App Store data set, and the Genres and Category columns of the Google Play data set.

### Part 2

We'll build two functions we can use to analyze the frequency tables:

1. One function to generate frequency tables that show percentages.
2. Another function that we can use to display the percentages in descending order.

In [None]:
def freq_table(dataset, index):
    table = {}
    total = 0
    for row in dataset:
        total+=1
        value = row[index]
        if value in table:
            table[value]+=1
        else:
            table[value]=1
            
    table_percentages = {}
    for key in table:
        percentage = (table[key]/total)*100
        table_percentages[key] = percentage
    return table_percentages

def display_table(dataset, index):
    table = freq_table(dataset, index)
    show_table = []
    for key in table:
        # Passing a dictionary to sorted() orders only its keys, so each entry
        # is stored as a (percentage, key) tuple with the percentage first
        key_val_tuple = (table[key], key)
        show_table.append(key_val_tuple)

    # Tuples are compared by percentage first, so this sorts from highest to lowest
    table_sorted = sorted(show_table, reverse=True)
    # print(table_sorted)  # shows the raw list of (percentage, key) tuples, which is hard to read
    for entry in table_sorted:
        print(entry[1], ':', entry[0])  # swap back to key : percentage for display
  

### Part 3

Let's look at the percentage of apps in each genre on the App Store:

In [None]:
display_table(ios_final, 11)

We can see that among the free English apps, more than half (about 58%) are games. Entertainment apps are close to 8%, followed by Photo & Video apps, which are close to 5%. Only 3.6% of the apps are designed for education, followed by Social Networking apps, which account for 3.29% of the apps in our data set.

The general impression is that the App Store (at least the part containing free English apps) is dominated by apps designed for fun (games, entertainment, photo and video, social networking, sports, music, etc.), while apps with practical purposes (education, shopping, utilities, productivity, lifestyle, etc.) are rarer. However, the fact that fun apps are the most numerous doesn't imply that they also have the greatest number of users: the demand might not be the same as the offer.

Next, let's analyze the Google Play apps:

In [None]:
display_table(android_final, 1)

The landscape seems significantly different on Google Play: there are not that many apps designed for fun, and a good number of apps are designed for practical purposes (family, tools, business, lifestyle, etc.). However, if we investigate further, we find that the family category, which accounts for close to 19% of the apps, consists mostly of games for kids.

![title](img/picture.png)

In [None]:
display_table(android_final, -4)

The Genres column on Google Play is similar to the Category column, but we'll mainly work with Category in the rest of the analysis.
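
For comparison, the percentage tables used in this section can also be sketched with collections.Counter, whose most_common() method already returns entries sorted from most to least frequent. The function name genre_percentages and the sample rows are made up for illustration.

```python
from collections import Counter

def genre_percentages(dataset, index):
    """Return (value, percentage) pairs sorted by descending frequency."""
    counts = Counter(row[index] for row in dataset)
    total = sum(counts.values())
    return [(value, count / total * 100)
            for value, count in counts.most_common()]

sample = [['a', 'Games'], ['b', 'Games'], ['c', 'Education'], ['d', 'Games']]
for genre, percentage in genre_percentages(sample, 1):
    print(genre, ':', percentage)
```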

## Most Popular Apps by Genre on the App Store

One way to find out what genres are the most popular (have the most users) is to calculate the average number of installs for each app genre. For the Google Play data set, we can find this information in the Installs column, but for the App Store data set this information is missing. As a workaround, we'll take the total number of user ratings as a proxy, which we can find in the rating_count_tot column.

Below, we calculate the average number of user ratings per app genre on the App Store:

1. Isolate the apps of each genre.
2. Sum up the user ratings for the apps of that genre.
3. Divide the sum by the number of apps belonging to that genre (not by the total number of apps).

In [None]:
genres_ios = freq_table(ios_final, -5)
# print(genres_ios)

for genre in genres_ios:
    total_rating = 0
    len_genre = 0
    for app in ios_final:
        genre_app = app[-5]
        if genre_app == genre:
            total_rating = total_rating + float(app[5])
            len_genre += 1
    avg_rating = total_rating/len_genre
    print(genre,':', avg_rating)
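
The loop above rescans the full data set once per genre. A single-pass sketch that accumulates a running sum and count per genre is shown below. The helper average_by_key is hypothetical, and the sample rows are invented, not taken from the data set.

```python
from collections import defaultdict

def average_by_key(dataset, key_index, value_index):
    """Return {key: mean of the values at value_index} in one pass."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for row in dataset:
        key = row[key_index]
        totals[key] += float(row[value_index])
        counts[key] += 1
    return {key: totals[key] / counts[key] for key in totals}

sample = [['Games', '100'], ['Games', '300'], ['Navigation', '50']]
print(average_by_key(sample, 0, 1))
```

With the notebook's data, the equivalent call would be average_by_key(ios_final, -5, 5).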

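On average, Navigation apps have the highest number of user ratings, but this figure is heavily influenced by Waze and Google Maps, which together have close to half a million ratings: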

In [None]:
for app in ios_final:
    if app[-5] == 'Navigation':
        print(app[1], ':', app[5])
    

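The same pattern applies to social networking apps, where the average is heavily influenced by a few giants such as Facebook, Pinterest, and Skype. The same is true for music apps, where giants such as Pandora, Spotify, and Shazam dominate.

Among reference apps, the Bible and Dictionary.com likewise skew the average heavily: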

In [None]:
for app in ios_final:
    if app[-5] == 'Reference':
        print(app[1], ':', app[5])

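This niche looks promising. We could take a popular book and turn it into an app. Besides the raw text of the book, we could add extra features such as an audio version, daily quotes from the book, and quizzes about the book. We could also embed a dictionary so that users can look up words without leaving the app.

This idea seems to fit well with the fact that the App Store is dominated by for-fun apps. The entertainment side of the market looks saturated, which suggests that a practical app has a better chance of standing out on the App Store.

Other genres that seem popular include weather, book, food and drink, and finance. The book genre overlaps with the idea described above, but the other genres are not a good fit for us:
1. Weather apps: to provide reliable weather data, our app would need to connect to paid APIs.
2. Food and drink apps: examples here include Starbucks and McDonald's. A popular food and drink app requires actual restaurants and a delivery service.
3. Finance apps: these require domain expertise and integration with banks, payments, and billing, which is beyond our current capabilities.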

## Most Popular Apps by Genre on Google Play

For Google Play, we can look at the number of installs directly:

In [None]:
display_table(android_final, 5)

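The output above exposes a limitation: the install numbers are open-ended ranges. For an app listed with 100,000+ installs, we don't know whether it has 100,000, 200,000, or 350,000 installs. However, we don't need precise numbers for our purposes. We only want to get an idea of which app genres attract the most users.

We'll leave the numbers as they are, which means that we'll treat an app with 100,000+ installs as having 100,000 installs, an app with 1,000,000+ installs as having 1,000,000 installs, and so on.

To perform computations, we need to convert each install number to float: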

In [None]:
androids_category = freq_table(android_final, 1)

for category in androids_category:
    total = 0
    len_catogery = 0
    for app in android_final:
        category_app = app[1]
        if category_app == category:
            installs_str =  app[5]
            installs = installs_str.replace('+','')
            installs = installs.replace(',','')
            installs = float(installs)
            total += installs
            len_catogery += 1
    avg_installs = total/len_catogery
    print(category,':',avg_installs)
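
The string clean-up inside the loop can be isolated in a small function, which makes it easy to test on its own. The name parse_installs is hypothetical. As in the text, a value such as '100,000+' is treated as exactly 100,000 installs.

```python
def parse_installs(installs_str):
    """Convert a Google Play installs string such as '1,000,000+' to a float."""
    return float(installs_str.replace('+', '').replace(',', ''))

print(parse_installs('100,000+'))        # 100000.0
print(parse_installs('1,000,000,000+'))  # 1000000000.0
```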


On average, communication apps have the most installs: 38,456,119. This number is heavily skewed by a few giants (WhatsApp, Facebook Messenger, Skype, Google Chrome, Gmail, and Hangouts) and a few others with over 100 and 500 million installs:

In [None]:
for app in android_final:
    if app[1]=='COMMUNICATION' and(app[5] == '1,000,000,000+'or app[5] == '500,000,000+' or app[5] == '100,000,000+'):
        print(app[0],',',app[5])

If we removed all the communication apps with 100 million or more installs, the average would be reduced roughly ten times:

In [None]:
for_under_100M = []
for app in android_final:
    installs = app[5]
    installs = installs.replace(',','')
    installs = installs.replace('+','')
    installs = float(installs)
    if app[1]=='COMMUNICATION' and installs<100000000:
        for_under_100M.append(installs)

avg_installs = sum(for_under_100M)/len(for_under_100M)   
print(avg_installs)

We see the same pattern for the video players category, which is the runner-up with 24,727,872 installs. The market is dominated by apps like YouTube, Google Play Movies & TV, or MX Player. The pattern is repeated for social apps (where we have giants like Facebook, Instagram, Google+, etc.), photography apps (Google Photos and other popular photo editors), or productivity apps (Microsoft Word, Dropbox, Google Calendar, Evernote, etc.).

Again, the main concern is that these app genres might seem more popular than they really are. Moreover, these niches seem to be dominated by a few giants who are hard to compete against.
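
One way to check how strongly a few giants pull an average upward is to compare the mean with the median, which is far less sensitive to extreme values. The sketch below uses the standard library's statistics module on invented install counts. This comparison is an addition for illustration, not part of the original analysis.

```python
from statistics import mean, median

# Invented install counts: four ordinary apps and one giant.
installs = [10_000, 50_000, 100_000, 500_000, 1_000_000_000]

average_installs = mean(installs)   # pulled up to roughly 200 million by the giant
typical_installs = median(installs)  # the middle value, 100,000

print(average_installs)
print(typical_installs)
```

A large gap between the two values, as in this sample, is a sign that the mean mostly reflects a handful of outliers.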

The game genre seems pretty popular, but previously we found out this part of the market seems a bit saturated, so we'd like to come up with a different app recommendation if possible.

The books and reference genre looks fairly popular as well, with an average number of installs of 8,767,811. It's interesting to explore this in more depth, since we found this genre has some potential to work well on the App Store, and our aim is to recommend an app genre that shows potential for being profitable on both the App Store and Google Play.

Let's take a look at some of the apps from this genre and their number of installs:

In [None]:
for app in android_final:
    if app[1]=='BOOKS_AND_REFERENCE':
        print(app[0],',',app[5])

The book and reference genre includes a variety of apps: software for processing and reading ebooks, various collections of libraries, dictionaries, tutorials on programming or languages, etc. It seems there's still a small number of extremely popular apps that skew the average:

In [None]:
for app in android_final:
    if app[1]=='BOOKS_AND_REFERENCE' and(app[5] == '1,000,000,000+'or app[5] == '500,000,000+' or app[5] == '100,000,000+'):
        print(app[0],',',app[5])

However, it looks like there are only a few very popular apps, so this market still shows potential. Let's try to get some app ideas based on the kind of apps that are somewhere in the middle in terms of popularity (between 1,000,000 and 100,000,000 downloads):

In [None]:
for app in android_final:
    if app[1]=='BOOKS_AND_REFERENCE' and(app[5] == '1,000,000+'or app[5] == '5,000,000+'or app[5] == '10,000,000+'or
                                         app[5] == '50,000,000+'):
        print(app[0],',',app[5])

This niche seems to be dominated by software for processing and reading ebooks, as well as various collections of libraries and dictionaries, so it's probably not a good idea to build similar apps since there'll be some significant competition.

We also notice there are quite a few apps built around the book Quran, which suggests that building an app around a popular book can be profitable. It seems that taking a popular book (perhaps a more recent book) and turning it into an app could be profitable for both the Google Play and the App Store markets.

However, it looks like the market is already full of libraries, so we need to add some special features besides the raw version of the book. This might include daily quotes from the book, an audio version of the book, quizzes on the book, a forum where people can discuss the book, etc.

## Conclusions

In this project, we analyzed data about the App Store and Google Play mobile apps with the goal of recommending an app profile that can be profitable for both markets.

We concluded that taking a popular book (perhaps a more recent book) and turning it into an app could be profitable for both the Google Play and the App Store markets. The markets are already full of libraries, so we need to add some special features besides the raw version of the book. This might include daily quotes from the book, an audio version of the book, quizzes on the book, a forum where people can discuss the book, etc.

We could also embed a dictionary in the app so that users can look up words without leaving it.
