# Profitable App Profiles for the App Store and Google Play Markets


We'll pretend we're working as data analysts for a company that builds Android and iOS mobile apps. We make our apps available on Google Play and the App Store.

Our company only builds apps that are free to download and install, and its main source of revenue consists of in-app ads.

We'll analyze the existing **data about mobile apps** available on Google Play and the App Store to discover what types of apps attract more users.

We'll focus on finding the **genre** of the app to be developed.

To find this, we need to know:
- Which genre has the most apps developed
- Which genre has the most apps installed


## Exploring the data sets

As of September 2018, there were approximately 2 million iOS apps available on the App Store, and 2.1 million Android apps on Google Play.

Source: [Statista](https://www.statista.com/statistics/276623/number-of-apps-available-in-leading-app-stores/)


These are two data sets that seem suitable for our goals:
- [A data set](https://www.kaggle.com/lava18/google-play-store-apps) containing data about approximately 10,000 Android apps from Google Play; the data was collected in August 2018. You can download the data set directly from this [link](https://dq-content.s3.amazonaws.com/350/googleplaystore.csv).
- [A data set](https://www.kaggle.com/ramamet4/app-store-apple-data-set-10k-apps) containing data about approximately 7,000 iOS apps from the App Store; the data was collected in July 2017. You can download the data set directly from this [link](https://dq-content.s3.amazonaws.com/350/AppleStore.csv).

We make a helper function to open the two datasets.

In [1]:
from csv import reader

# Opens a .csv file and returns its rows as a list of lists
def open_data_set(file_name, encoding=None):
    with open(file_name, encoding=encoding) as file:
        file_reader = reader(file)

        # Convert the file reader into a list of lists
        dataset = list(file_reader)
        return dataset

In [2]:
ios_full = open_data_set('AppleStore.csv')
android_full =open_data_set('googleplaystore.csv')
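
The same reading pattern can be tried on a small file without downloading anything. The sketch below is only an illustration: the helper name `read_csv_rows`, the sample file and its contents are assumptions, not part of the notebook. It passes `newline=""` to `open`, as the `csv` module documentation recommends, and an explicit encoding so that non-ASCII app names are decoded the same way on every platform.

```python
import csv
import os
import tempfile

def read_csv_rows(path, encoding="utf8"):
    # newline="" lets the csv module handle line endings itself,
    # which matters for quoted fields that contain line breaks.
    with open(path, encoding=encoding, newline="") as file:
        return list(csv.reader(file))

# Hypothetical two-row file, used only to exercise the helper.
with tempfile.TemporaryDirectory() as folder:
    sample_path = os.path.join(folder, "sample.csv")
    with open(sample_path, "w", encoding="utf8", newline="") as file:
        csv.writer(file).writerows([["App", "Price"], ["Docs To Go™", "0"]])
    sample_rows = read_csv_rows(sample_path)

print(sample_rows)
```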

We'll write a function, `explore_data`, to start exploring the data sets. It prints a slice of a data set and, optionally, the number of rows and columns.

In [3]:
# Params
# data_set: list of lists
# start, end: int, the indices of the slice to print from the data set
# rows_and_columns: bool, whether to print the number of rows and columns
def explore_data(data_set, start, end, rows_and_columns=False):
    for row in data_set[start:end]:
        print(row)
        print('\n')

    
    if rows_and_columns:
        print(f'Number of rows {len(data_set)}')
        print(f'Number of columns {len(data_set[0])}')

In [4]:
print('iOS')
explore_data(ios_full, 0, 3, True)
print('\n')
print('Android.')
explore_data(android_full, 0, 3, True)

iOS
['id', 'track_name', 'size_bytes', 'currency', 'price', 'rating_count_tot', 'rating_count_ver', 'user_rating', 'user_rating_ver', 'ver', 'cont_rating', 'prime_genre', 'sup_devices.num', 'ipadSc_urls.num', 'lang.num', 'vpp_lic']


['284882215', 'Facebook', '389879808', 'USD', '0.0', '2974676', '212', '3.5', '3.5', '95.0', '4+', 'Social Networking', '37', '1', '29', '1']


['389801252', 'Instagram', '113954816', 'USD', '0.0', '2161558', '1289', '4.5', '4.0', '10.23', '12+', 'Photo & Video', '37', '0', '29', '1']


Number of rows 7198
Number of columns 16


Android.
['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']


['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['Coloring book moana', 'ART_AND_DESIGN', '3.9', '967', '14M', '500,000+', 'Free', '

## Deleting inconsistent data

The Google Play data set has a dedicated [discussion section](https://www.kaggle.com/lava18/google-play-store-apps/discussion), and we can see that one of the discussions outlines an error for row 10472. 

We'll write a function that compares the length of every row with the length of the header row and returns the indices of the rows that don't match.

In [5]:
# The dataset needs to have a header in the first row.
# Returns a list with the indices of the rows with inconsistent data.
def find_rows_with_missing_data(data_set):
    header = data_set[0]
    header_len = len(header)
    row_indexes = []
    index = 1
    for row in data_set[1:]:
        # If the length of the row differs from the length of the header,
        # its index is included in the returned list
        if len(row) != header_len:
            row_indexes.append(index)
        index += 1
    return row_indexes

In [6]:
android_missing_data = find_rows_with_missing_data(android_full)
# Checking if there are more rows with inconsistent information
for row in android_missing_data:
    print(f'row: {row}, data: {android_full[row]}', '\n')

row: 10473, data: ['Life Made WI-Fi Touchscreen Photo Frame', '1.9', '19', '3.0M', '1,000+', 'Free', '0', 'Everyone', '', 'February 11, 2018', '1.0.19', '4.0 and up'] 



We'll use the `del` statement to remove the rows with inconsistent data.

In [37]:
# We'll iterate over the rows with inconsistent data and delete them.
# Deleting from the highest index down keeps the remaining indices valid.
for row in sorted(android_missing_data, reverse=True):
    del android_full[row]
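
Deleting by index is sensitive to order: once a row is removed, every later row shifts down by one position. With a single bad row this does not matter, but with several it can delete the wrong rows. A minimal sketch, using hypothetical rows, of deleting from the highest index down so that the remaining indices stay valid:

```python
# Hypothetical rows: a header plus four data rows, two of them too short.
rows = [
    ["name", "rating"],
    ["a", "4.1"],
    ["b"],
    ["c"],
    ["d", "3.9"],
]
bad_indices = [i for i, row in enumerate(rows) if len(row) != len(rows[0])]

# Deleting from the highest index down keeps the remaining indices valid.
for index in sorted(bad_indices, reverse=True):
    del rows[index]

print(rows)
```

Deleting in ascending order here would remove `['b']` first, after which index 3 would point at the valid row `['d', '3.9']`.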

For readability, each time we modify either data set we'll reassign the result to the `android` or `ios` variable as appropriate.

We'll copy the full data sets without their header rows.

In [8]:
android = android_full[1:]
ios = ios_full[1:]

## Removing duplicate data

If you explore the Google Play data set long enough or look at the [discussions](https://www.kaggle.com/lava18/google-play-store-apps/discussion) section, you'll notice some apps have duplicate entries.

We'll write a function that finds duplicate rows by comparing the value of a specific cell.

In [9]:
def find_duplicates_values(data_set, cell):
    # A set makes the membership test fast even for thousands of rows
    uniques = set()
    duplicates = []
    for row in data_set:
        value = row[cell]
        if value in uniques:
            duplicates.append(value)
        else:
            uniques.add(value)
    return duplicates

In [10]:
android_duplicates = find_duplicates_values(android, 0)
print(f'Number of duplicate apps: {len(android_duplicates)}')
print('\n')
print('Examples of duplicate apps: ', android_duplicates[:5])

Number of duplicate apps: 1181


Examples of duplicate apps:  ['Quick PDF Scanner + OCR FREE', 'Box', 'Google My Business', 'ZOOM Cloud Meetings', 'join.me - Simple Meetings']


We shouldn't delete duplicate rows at random, because the duplicates of an app can differ in some values, such as the number of reviews. Instead, for each app we'll keep the row with the highest number of reviews and delete the rows with fewer reviews.

For example, the app Quick PDF Scanner + OCR FREE has 3 different rows in the data set.

In [11]:
for app in android:
    name = app[0]
    if name == 'Quick PDF Scanner + OCR FREE':
        print(f'Reviews = {app[3]}')

Reviews = 80805
Reviews = 80805
Reviews = 80804


To remove the duplicates, we will:

- Create a dictionary, where each dictionary key is a unique app name and the corresponding dictionary value is the highest number of reviews of that app.
- Use the information stored in the dictionary and create a new data set, which will have only one entry per app (and for each app, we'll only select the entry with the highest number of reviews).

In [12]:
reviews_max = {}
for app in android:
    review = float(app[3])
    name = app[0]
    # comparison to find the app with max review value
    if review > reviews_max.get(name, -1): 
        reviews_max[name] = review

print(f'Expected length: {len(reviews_max)}')

Expected length: 9659


In [13]:
android_clean = []
already_added = []
for app in android:
    review = float(app[3])   
    name = app[0]
    if name in reviews_max and review == reviews_max[name]:
        if name not in already_added:
            already_added.append(name)
            android_clean.append(app)
        
explore_data(android_clean, 0, 5, True)
    

['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 and up']


['Pixel Draw - Number Art Coloring Book', 'ART_AND_DESIGN', '4.3', '967', '2.8M', '100,000+', 'Free', '0', 'Everyone', 'Art & Design;Creativity', 'June 20, 2018', '1.1', '4.4 and up']


['Paper flowers instructions', 'ART_AND_DESIGN', '4.4', '167', '5.6M', '50,000+', 'Free', '0', 'Everyone', 'Art & Design', 'March 26, 2017', '1.0', '2.3 and up']


Number of rows 9659
Number of columns 13
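
The two passes above can also be condensed into a single pass that stores the best row seen so far for each app name. This is only a sketch of an alternative, not the notebook's method: the function name `keep_max_reviews` and the sample rows are hypothetical, and the result is ordered by the first appearance of each name, which may differ from the order of `android_clean`.

```python
def keep_max_reviews(rows, name_index=0, reviews_index=3):
    best = {}
    for row in rows:
        name = row[name_index]
        reviews = float(row[reviews_index])
        # Strict '>' keeps the first row seen when review counts tie.
        if name not in best or reviews > float(best[name][reviews_index]):
            best[name] = row
    return list(best.values())

# Hypothetical rows in the shape (name, category, rating, reviews).
sample = [
    ["Box", "BUSINESS", "4.2", "159971"],
    ["Zoom", "BUSINESS", "4.4", "31614"],
    ["Box", "BUSINESS", "4.2", "159872"],
]
print(keep_max_reviews(sample))
```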


## Removing Non-English Apps
Remember we use English for the apps we develop at our company, and we'd like to analyze only the apps that are directed toward an English-speaking audience. However, if we explore the data long enough, we'll find that both data sets have apps with names that suggest they are not directed toward an English-speaking audience.

We'll remove each app whose name contains symbols that are not commonly used in English text.

To check this, we'll use the built-in `ord` function to get the number assigned to each character in the app name. The characters commonly used in English text fall in the ASCII range, with values from 0 to 127.

Some app names in the data sets contain characters outside that range that are unrelated to the language of the app, such as emojis (`😜`) or symbols like `™`.

To keep these apps in our analysis, we'll allow an app name to contain up to three characters that are not commonly used in English.

In [14]:
def is_common_english(name):
    non_ascii = 0
    for character in name:
        if ord(character) > 127:
            non_ascii += 1
            # If we have more than 3 non ascii,
            # we can break the for loop
            if non_ascii > 3: 
                return False
    return True

We'll use the function `is_common_english` to check whether these app names are detected as English or non-English:

In [15]:
print(is_common_english('Instagram'))
print(is_common_english('爱奇艺PPS -《欢乐颂2》电视剧热播'))
print(is_common_english('Docs To Go™ Free Office Suite'))
print(is_common_english('Instachat 😜'))

True
False
True
True
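
To see why the threshold of three lets these names through, it can help to count the non-ASCII characters directly. A small sketch; the helper name `count_non_ascii` is an assumption and is not used elsewhere in the notebook:

```python
def count_non_ascii(name):
    # ord() returns the Unicode code point; ASCII covers 0 through 127.
    return sum(1 for character in name if ord(character) > 127)

for name in ['Instagram', 'Docs To Go™ Free Office Suite', 'Instachat 😜']:
    print(name, ':', count_non_ascii(name))
```

The trademark sign and the emoji each count as a single character, so both names stay well under the limit.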


We'll explore the data sets to find the apps that are not detected as English.


In [16]:
def find_non_common_english(data_set, cell):
    non_common_english = []
    for row in data_set:
        if not is_common_english(row[cell]):
            non_common_english.append(row)
            
    return non_common_english


In [17]:
ios_non_english = find_non_common_english(ios, 1)
explore_data(ios_non_english, 0, 2)
print('\n')
android_non_english = find_non_common_english(android_clean, 0)
explore_data(android_non_english,0,2)

['445375097', '爱奇艺PPS -《欢乐颂2》电视剧热播', '224617472', 'USD', '0.0', '14844', '0', '4.0', '0.0', '6.3.3', '17+', 'Entertainment', '38', '5', '3', '1']


['405667771', '聚力视频HD-人民的名义,跨界歌王全网热播', '90725376', 'USD', '0.0', '7446', '8', '4.0', '4.5', '5.0.8', '12+', 'Entertainment', '24', '4', '1', '1']




['Flame - درب عقلك يوميا', 'EDUCATION', '4.6', '56065', '37M', '1,000,000+', 'Free', '0', 'Everyone', 'Education', 'July 26, 2018', '3.3', '4.1 and up']


['သိင်္ Astrology - Min Thein Kha BayDin', 'LIFESTYLE', '4.7', '2225', '15M', '100,000+', 'Free', '0', 'Everyone', 'Lifestyle', 'July 26, 2018', '4.2.1', '4.0.3 and up']




We'll now keep only the apps that are detected as English.

In [18]:
def find_common_english(data_set, cell):
    common_english = []
    for row in data_set:
        if is_common_english(row[cell]):
            common_english.append(row)
            
    return common_english

In [19]:
android_english = find_common_english(android_clean, 0)
explore_data(android_english, 0, 1, True)
print('\n')
ios_english = find_common_english(ios, 1)
explore_data(ios_english, 0, 1, True)


['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


Number of rows 9614
Number of columns 13


['284882215', 'Facebook', '389879808', 'USD', '0.0', '2974676', '212', '3.5', '3.5', '95.0', '4+', 'Social Networking', '37', '1', '29', '1']


Number of rows 6183
Number of columns 16


In [20]:
android = android_english
ios = ios_english

## Isolating the free apps

We'll loop through each data set to isolate the free apps in separate lists.

In the Google Play data set, prices are stored as strings (`0`, `$0.99`, `$2.99`).

In [21]:
android_free = []
for app in android:
    price = app[7]
    # comparing to the string '0'
    # instead of the int 0 or the float 0.0
    if price == '0':
        android_free.append(app)
        
explore_data(android_free, 0,3, True)

['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 and up']


Number of rows 8864
Number of columns 13


In the App Store data set, prices are also stored as strings (`0.0`, `0.99`, `2.99`).

In [22]:
ios_free = []
for app in ios:
    price = app[4]
    if price == '0.0':
        ios_free.append(app)

explore_data(ios_free, 0, 3, True)

['284882215', 'Facebook', '389879808', 'USD', '0.0', '2974676', '212', '3.5', '3.5', '95.0', '4+', 'Social Networking', '37', '1', '29', '1']


['389801252', 'Instagram', '113954816', 'USD', '0.0', '2161558', '1289', '4.5', '4.0', '10.23', '12+', 'Photo & Video', '37', '0', '29', '1']


['529479190', 'Clash of Clans', '116476928', 'USD', '0.0', '2130805', '579', '4.5', '4.5', '9.24.12', '9+', 'Games', '38', '5', '18', '1']


Number of rows 3222
Number of columns 16


In [23]:
android = android_free
ios = ios_free
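
Because the two stores format a free price differently (`'0'` versus `'0.0'`), one option is to convert the price to a number before comparing. A hedged sketch of that idea; the helper name `is_free` is an assumption, and it presumes every price is a plain number with at most a leading `$`:

```python
def is_free(price):
    # Strip an optional leading '$' and compare numerically, so that
    # '0', '0.0' and '$0.00' are all treated as free.
    return float(price.strip().lstrip('$')) == 0.0

print([is_free(p) for p in ['0', '0.0', '$0.99', '2.99']])
```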

## Most Common Apps by Genre
Because our end goal is to add the app on both Google Play and the App Store, we need to find app profiles that are successful on both markets.

Let's begin the analysis by getting a sense of the most common genres in each market.

We'll build three functions to analyze the frequency tables:

- One function to generate frequency tables that show percentages.
- Another function to display the percentages in descending order.
- A last function that calls the other two, so that `freq_table` and `display_freq_table` can be reused together.

In [24]:
# Returns a frequency table (in percentages) for the column at row_index
def freq_table(dataset, row_index):
    table = {}
    for row in dataset:
        cell = row[row_index]
        if cell in table:
            table[cell] += 1
        else:
            table[cell] = 1
    
    total = len(dataset)
    
    for key in table:
        percentage = (table[key] / total) * 100
        table[key] = percentage
        
    return table

In [25]:
# Displays the first `top` entries of the table in descending order (10 by default).
def display_freq_table(table, top=10):
    table_display = []
    for k, v in table.items():
        table_display.append((v, k))
    
    table_sorted = sorted(table_display, reverse=True)
    table_slice = table_sorted[:top]
    for row in table_slice:
        print(row[1], ' : ', row[0])

In [26]:
def display_table(dataset, index, top=10):
    table = freq_table(dataset, index)
    # Pass `top` along so the caller's choice is respected
    display_freq_table(table, top)
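
For comparison, the standard library's `collections.Counter` can build the same kind of percentage table and return it already sorted. This is a sketch of an alternative rather than the notebook's approach; the function name `freq_table_counter` and the sample rows are hypothetical.

```python
from collections import Counter

def freq_table_counter(dataset, column_index):
    counts = Counter(row[column_index] for row in dataset)
    total = len(dataset)
    # most_common() returns (value, count) pairs sorted by count, descending.
    return [(value, count / total * 100) for value, count in counts.most_common()]

sample = [['a', 'Games'], ['b', 'Games'], ['c', 'Music'], ['d', 'Games']]
for genre, percentage in freq_table_counter(sample, 1):
    print(genre, ' : ', percentage)
```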

For the App Store, these are the top ten genres. The genres with the highest percentages are `Games`, `Entertainment`, and `Photo & Video`.

In [27]:
display_table(ios, 11)

Games  :  58.16263190564867
Entertainment  :  7.883302296710118
Photo & Video  :  4.9658597144630665
Education  :  3.662321539416512
Social Networking  :  3.2898820608317814
Shopping  :  2.60707635009311
Utilities  :  2.5139664804469275
Sports  :  2.1415270018621975
Music  :  2.0484171322160147
Health & Fitness  :  2.0173805090006205


Most of the apps are designed for entertainment.

To recommend an app profile we need more data, such as the number of downloads or reviews in the store.

For the Play Store, these are the top ten values of `Genres`, followed by the top ten values of `Category`.

There is no clear pattern in the distribution of app genres.

Compared with the App Store, the most common genres on the Play Store are more varied.

In [28]:
display_table(android, 9)
print('\n')
display_table(android, 1)

Tools  :  8.449909747292418
Entertainment  :  6.069494584837545
Education  :  5.347472924187725
Business  :  4.591606498194946
Productivity  :  3.892148014440433
Lifestyle  :  3.892148014440433
Finance  :  3.7003610108303246
Medical  :  3.531137184115524
Sports  :  3.463447653429603
Personalization  :  3.3167870036101084


FAMILY  :  18.907942238267147
GAME  :  9.724729241877256
TOOLS  :  8.461191335740072
BUSINESS  :  4.591606498194946
LIFESTYLE  :  3.9034296028880866
PRODUCTIVITY  :  3.892148014440433
FINANCE  :  3.7003610108303246
MEDICAL  :  3.531137184115524
SPORTS  :  3.395758122743682
PERSONALIZATION  :  3.3167870036101084


## Most Popular Apps by Genre on the App Store

One way to find out what genres are the most popular (have the most users) is to calculate the average number of installs for each app genre. The App Store data set has no installs column, so for iOS we'll use the total number of user ratings (`rating_count_tot`, index 5) as a proxy.

We'll write a function that calculates the average number of installs per app for each genre.

The Android dataset contains special characters in the `Installs` column, for example `100,000+`, so we need to remove the `,` and `+` characters before converting the value to a float. To remove those characters we'll use the string method `replace`.

In [29]:
# index_genre is the index of the cell that holds the genre.
# index_install is the index of the cell that holds the installs value.
# Returns a dictionary mapping each genre to its average number of installs.
def freq_app_installs(dataset, index_genre, index_install):
    totals = {}
    counts = {}

    # A single pass over the dataset accumulates the sum and count per genre
    for row in dataset:
        genre = row[index_genre]
        # For the Android dataset we need to remove the ',' and '+' characters.
        installs = row[index_install].replace(',', '')
        installs = installs.replace('+', '')
        installs = float(installs)

        totals[genre] = totals.get(genre, 0) + installs
        counts[genre] = counts.get(genre, 0) + 1

    averages = {}
    for genre in totals:
        averages[genre] = totals[genre] / counts[genre]
    return averages
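
The cleaning step used for the `Installs` column can be checked in isolation. A minimal sketch; the helper name `parse_installs` is an assumption and not part of the notebook:

```python
def parse_installs(value):
    # '100,000+' -> '100000' -> 100000.0
    return float(value.replace(',', '').replace('+', ''))

print(parse_installs('100,000+'))
print(parse_installs('2974676'))
```

Values without `,` or `+`, such as the App Store rating counts, pass through unchanged, which is why the same function can serve both data sets.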

In [30]:
ios_freq_app_installs = freq_app_installs(ios, 11, 5)
display_freq_table(ios_freq_app_installs)

Navigation  :  86090.33333333333
Reference  :  74942.11111111111
Social Networking  :  71548.34905660378
Music  :  57326.530303030304
Weather  :  52279.892857142855
Book  :  39758.5
Food & Drink  :  33333.92307692308
Finance  :  31467.944444444445
Photo & Video  :  28441.54375
Travel  :  28243.8


For the App Store, the genres `Navigation`, `Reference` and `Social Networking` have the highest average number of user ratings per app, which we use here as a proxy for installs.

We'll write a function to explore a data set by `genre`: it loops through the data set and keeps only the apps of a specific `genre`. Once the apps are filtered, it sorts them in descending order and prints the top ten, which makes the market trends easier to see.

In [31]:
def display_appp_by_genre(dataset, index_name, index_genre, index_installs, genre):
    apps = []
    for app in dataset:
        if app[index_genre] == genre:
            installs = app[index_installs].replace(',','')
            installs = installs.replace('+', '')
            installs = float(installs)
            name = app[index_name]
            apps.append((installs, name))
    
    apps_sorted = sorted(apps, reverse=True)
    
    for app in apps_sorted[:10]:
        print(f'{app[1]} : {app[0]}')
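
When only the top few entries are needed, `heapq.nlargest` from the standard library gives the same result as sorting in descending order and slicing. A short sketch with hypothetical `(installs, name)` tuples; the function name `top_apps` is an assumption:

```python
import heapq

def top_apps(apps, n=10):
    # apps is a list of (installs, name) tuples; tuples compare by installs first.
    return heapq.nlargest(n, apps)

sample = [(500.0, 'b'), (1000.0, 'a'), (10.0, 'c')]
print(top_apps(sample, 2))
```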

In [32]:
print('Navigation')
display_appp_by_genre(ios,1, 11, 5, 'Navigation')
print('\n','Reference')
display_appp_by_genre(ios,1, 11, 5, 'Reference')
print('\n', 'Social Networking')
display_appp_by_genre(ios,1, 11, 5, 'Social Networking')

Navigation
Waze - GPS Navigation, Maps & Real-time Traffic : 345046.0
Google Maps - Navigation & Transit : 154911.0
Geocaching® : 12811.0
CoPilot GPS – Car Navigation & Offline Maps : 3582.0
ImmobilienScout24: Real Estate Search in Germany : 187.0
Railway Route Search : 5.0

 Reference
Bible : 985920.0
Dictionary.com Dictionary & Thesaurus : 200047.0
Dictionary.com Dictionary & Thesaurus for iPad : 54175.0
Google Translate : 26786.0
Muslim Pro: Ramadan 2017 Prayer Times, Azan, Quran : 18418.0
New Furniture Mods - Pocket Wiki & Game Tools for Minecraft PC Edition : 17588.0
Merriam-Webster Dictionary : 16849.0
Night Sky : 12122.0
City Maps for Minecraft PE - The Best Maps for Minecraft Pocket Edition (MCPE) : 8535.0
LUCKY BLOCK MOD ™ for Minecraft PC Edition - The Best Pocket Wiki & Mods Installer Tools : 4693.0

 Social Networking
Facebook : 2974676.0
Pinterest : 1061624.0
Skype for iPhone : 373519.0
Messenger : 351466.0
Tumblr : 334293.0
WhatsApp Messenger : 287589.0
Kik : 260965.0
ooV

The `Navigation` and `Social Networking` genres are clearly dominated by large tech companies: `Waze Mobile`, `Google` with apps like `Google Maps`, and `Facebook` with apps like `Facebook`, `Messenger` and `WhatsApp`. This makes it very difficult to compete in these genres. On the other hand, in the `Reference` genre no large tech company appears in the top 3 apps.

## Most Popular Apps by Genre on the Play Store


For the Play Store we'll show the average number of installs by `Category` and by `Genres`.

In [33]:
android_freq_app_installs = freq_app_installs(android, 1, 5)
display_freq_table(android_freq_app_installs)
print('\n')
android_freq_app_installs = freq_app_installs(android, 9, 5)
display_freq_table(android_freq_app_installs)

COMMUNICATION  :  38456119.167247385
VIDEO_PLAYERS  :  24727872.452830188
SOCIAL  :  23253652.127118643
PHOTOGRAPHY  :  17840110.40229885
PRODUCTIVITY  :  16787331.344927534
GAME  :  15588015.603248259
TRAVEL_AND_LOCAL  :  13984077.710144928
ENTERTAINMENT  :  11640705.88235294
TOOLS  :  10801391.298666667
NEWS_AND_MAGAZINES  :  9549178.467741935


Communication  :  38456119.167247385
Adventure;Action & Adventure  :  35333333.333333336
Video Players & Editors  :  24947335.796178345
Social  :  23253652.127118643
Arcade  :  22888365.48780488
Casual  :  19569221.602564104
Puzzle;Action & Adventure  :  18366666.666666668
Photography  :  17840110.40229885
Educational;Action & Adventure  :  17016666.666666668
Productivity  :  16787331.344927534


For the Play Store, `Communication`, `Video players`, `Adventure;Action & Adventure` and `Social` have the highest average number of installs. Let's explore these genres.

In [34]:
print('Communication')
display_appp_by_genre(android, 0, 1, 5, 'COMMUNICATION')
print('\n', 'Video players')
display_appp_by_genre(android, 0, 1, 5, 'VIDEO_PLAYERS')
print('\n', 'Adventure;Action & Adventure')
display_appp_by_genre(android, 0, 9, 5, 'Adventure;Action & Adventure')
print('\n', 'Social')
display_appp_by_genre(android, 0, 1, 5, 'SOCIAL')

Communication
WhatsApp Messenger : 1000000000.0
Skype - free IM & video calls : 1000000000.0
Messenger – Text and Video Chat for Free : 1000000000.0
Hangouts : 1000000000.0
Google Chrome: Fast & Secure : 1000000000.0
Gmail : 1000000000.0
imo free video calls and chat : 500000000.0
Viber Messenger : 500000000.0
UC Browser - Fast Download Private & Secure : 500000000.0
LINE: Free Calls & Messages : 500000000.0

 Video players
YouTube : 1000000000.0
Google Play Movies & TV : 1000000000.0
MX Player : 500000000.0
VivaVideo - Video Editor & Photo Movie : 100000000.0
VideoShow-Video Editor, Video Maker, Beauty Camera : 100000000.0
VLC for Android : 100000000.0
Motorola Gallery : 100000000.0
Motorola FM Radio : 100000000.0
Dubsmash : 100000000.0
Vote for : 50000000.0

 Adventure;Action & Adventure
ROBLOX : 100000000.0
Transformers Rescue Bots: Hero Adventures : 5000000.0
Leo and Tig : 1000000.0

 Social
Instagram : 1000000000.0
Google+ : 1000000000.0
Facebook : 1000000000.0
Snapchat : 50000000

The top genres we have explored are clearly dominated by large tech companies: `Google` with apps like `Google Chrome`, `Gmail` and `YouTube`, and `Facebook` with apps like `Facebook` and `Instagram`. This makes it very difficult to compete in these genres.

Next, we'll look at the Play Store data to see whether there is any behavior similar to the one found in the App Store data set.

First we have to find a genre that is similar to `Reference`.

In [35]:
genres = []
for app in android:
    genre = app[1]
    if genre not in genres:
        genres.append(genre)

print(genres)

['ART_AND_DESIGN', 'AUTO_AND_VEHICLES', 'BEAUTY', 'BOOKS_AND_REFERENCE', 'BUSINESS', 'COMICS', 'COMMUNICATION', 'DATING', 'EDUCATION', 'ENTERTAINMENT', 'EVENTS', 'FINANCE', 'FOOD_AND_DRINK', 'HEALTH_AND_FITNESS', 'HOUSE_AND_HOME', 'LIBRARIES_AND_DEMO', 'LIFESTYLE', 'GAME', 'FAMILY', 'MEDICAL', 'SOCIAL', 'SHOPPING', 'PHOTOGRAPHY', 'SPORTS', 'TRAVEL_AND_LOCAL', 'TOOLS', 'PERSONALIZATION', 'PRODUCTIVITY', 'PARENTING', 'WEATHER', 'VIDEO_PLAYERS', 'NEWS_AND_MAGAZINES', 'MAPS_AND_NAVIGATION']


We found the `BOOKS_AND_REFERENCE` genre, so let's explore it.

In [36]:
display_appp_by_genre(android, 0, 1, 5, 'BOOKS_AND_REFERENCE')

Google Play Books : 1000000000.0
Wattpad 📖 Free Books : 100000000.0
Bible : 100000000.0
Audiobooks from Audible : 100000000.0
Amazon Kindle : 100000000.0
Wikipedia : 10000000.0
Spanish English Translator : 10000000.0
Quran for Android : 10000000.0
Oxford Dictionary of English : Free : 10000000.0
NOOK: Read eBooks & Magazines : 10000000.0


In the `BOOKS_AND_REFERENCE` genre we again find the Bible among the top apps, as in the App Store data set. We can take this finding as a clue to develop an app built around a popular book or a new publication.

## Conclusions
In this project, we analyzed data about the App Store and Google Play mobile apps with the goal of recommending an app profile that can be profitable for both markets.

We conclude that taking a popular book and turning it into an app could be profitable for both the Google Play and the App Store markets. To make the app more attractive, we could include interactive author's notes or an audio version of the book.