# Profitable App Profiles for the App Store and Google Play Markets

Our goal is to find app profiles that are profitable on both Android and iOS, through the Google Play store and the App Store.

We build free mobile apps, so our main source of revenue is in-app ads. The more users an app attracts, the more ad revenue it can generate.

We will analyze the data to understand what kinds of apps are likely to attract more users.


## Gathering the Data
The two data sets we will be using are:

* [Google data set](https://www.kaggle.com/lava18/google-play-store-apps), containing data about approximately ten thousand Android apps from Google Play.
* [Apple data set](https://www.kaggle.com/ramamet4/app-store-apple-data-set-10k-apps), containing data about approximately seven thousand iOS apps from the App Store.

Let's start by opening the two data sets and then continue with exploring the data.

## Opening the file to explore

The `open_file` function imports `reader` from the `csv` module and takes two parameters: `file_name` and `remove_header`. It opens the file at a hard-coded path using UTF-8 encoding, reads its contents with `reader`, and converts the result to a list of lists.

If `remove_header` is `True` (the default), the function treats the first row as the header and returns two values: the header as a list, and the remaining rows as a list of lists. If `remove_header` is `False`, it returns the full list of lists with the header still included.

In [1]:
from csv import reader

def open_file(file_name, remove_header = True):
    path = '/DataQuest_Projects/data_files/'
    # the with-block closes the file once its contents have been read
    with open(path+file_name, encoding='utf8') as opened_file:
        read_file = reader(opened_file)
        dataset = list(read_file)
    
    if remove_header:
        # split the header row from the data rows
        header = dataset[0]
        body = dataset[1:]
        return header, body
    else:
        return dataset
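
To see the header/body split without needing the CSV files on disk, here is a minimal sketch that applies the same logic to an in-memory string. The helper name `split_header` and the sample rows are made up for illustration; they are not part of the project. The example also shows one reason to prefer `csv.reader` over splitting lines on commas: quoted fields that contain commas stay in one piece.

```python
from csv import reader
from io import StringIO

def split_header(lines, remove_header=True):
    """Parse CSV lines and optionally separate the first row as the header."""
    dataset = list(reader(lines))
    if remove_header:
        return dataset[0], dataset[1:]
    return dataset

# illustrative data only, not taken from the real files
csv_text = 'id,track_name,price\n1,"Maps, Offline",0.0\n2,Chess,1.99\n'

header, body = split_header(StringIO(csv_text))
print(header)
print(body)
```

Note that every value comes back as a string, which is why later cells convert review counts and prices with `float()`.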


## Exploring the Dataset
The `explore_data` function takes four parameters: `dataset`, `start`, `end`, and `rows_and_columns`. It slices the dataset using the `start` and `end` indices, then prints each row of the slice followed by blank lines.

If `rows_and_columns` is `True`, the function also prints the total number of rows and columns in the dataset, using `len()`. With the default of `False`, this part is skipped.

The column count is taken from the first row, so the function assumes the dataset is not empty and that every row has the same length.

In [2]:
def explore_data(dataset, start, end, rows_and_columns=False):
    dataset_slice = dataset[start:end]
    for row in dataset_slice:
        print(row)
        print('\n')
    
    if rows_and_columns:
        print('Number of rows:', len(dataset))
        print('Number of columns:', len(dataset[0]))


## Exploring Apple Dataset
This code reads the `AppleStore.csv` file, prints the header, and then uses the `explore_data` function to print the first five rows along with the number of rows and columns. Because `open_file` returns the header and the data rows as a pair, `apple_data[0]` is the header and `apple_data[1]` is the list of rows.

In [3]:
apple_data = open_file('AppleStore.csv')

# print out the header
print(apple_data[0])
print('\n')
#print out the first 5 rows of the data & show the # rows and columns
explore_data(apple_data[1], 0, 5, rows_and_columns=True)


['id', 'track_name', 'size_bytes', 'currency', 'price', 'rating_count_tot', 'rating_count_ver', 'user_rating', 'user_rating_ver', 'ver', 'cont_rating', 'prime_genre', 'sup_devices.num', 'ipadSc_urls.num', 'lang.num', 'vpp_lic']


['284882215', 'Facebook', '389879808', 'USD', '0.0', '2974676', '212', '3.5', '3.5', '95.0', '4+', 'Social Networking', '37', '1', '29', '1']


['389801252', 'Instagram', '113954816', 'USD', '0.0', '2161558', '1289', '4.5', '4.0', '10.23', '12+', 'Photo & Video', '37', '0', '29', '1']


['529479190', 'Clash of Clans', '116476928', 'USD', '0.0', '2130805', '579', '4.5', '4.5', '9.24.12', '9+', 'Games', '38', '5', '18', '1']


['420009108', 'Temple Run', '65921024', 'USD', '0.0', '1724546', '3842', '4.5', '4.0', '1.6.2', '9+', 'Games', '40', '5', '1', '1']


['284035177', 'Pandora - Music & Radio', '130242560', 'USD', '0.0', '1126879', '3594', '4.0', '4.5', '8.4.1', '12+', 'Music', '37', '4', '1', '1']


Number of rows: 7197
Number of columns: 16


## Exploring Google Dataset
This code reads the `googleplaystore.csv` file, prints the header, and then uses the `explore_data` function to print the first five rows along with the number of rows and columns. As with the Apple data, `google_data[0]` is the header and `google_data[1]` is the list of rows.

In [4]:
google_data = open_file('googleplaystore.csv')

# print out the header
print(google_data[0])
print('\n')
#print out the first 5 rows of the data & show the # rows and columns
explore_data(google_data[1], 0, 5, rows_and_columns=True)


['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']


['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['Coloring book moana', 'ART_AND_DESIGN', '3.9', '967', '14M', '500,000+', 'Free', '0', 'Everyone', 'Art & Design;Pretend Play', 'January 15, 2018', '2.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 and up']


['Pixel Draw - Number Art Coloring Book', 'ART_AND_DESIGN', '4.3', '967', '2.8M', '100,000+', 'Free', '0', 'Eve

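## Deleting Incorrect Data

Row 10472 of the Google data (counting from the first data row) is missing its `Category` value, so every later value is shifted one column to the left. We print it next to the header to compare.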

In [5]:
print(google_data[1][10472]) # incorrect row 
print('\n')
print(google_data[0]) # header 


['Life Made WI-Fi Touchscreen Photo Frame', '1.9', '19', '3.0M', '1,000+', 'Free', '0', 'Everyone', '', 'February 11, 2018', '1.0.19', '4.0 and up']


['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']


## Confirming the Wrong Number of Entries
We check the length of the incorrect row and compare it to the number of columns in the header row.

In [6]:
print('Row 10472 has',len(google_data[1][10472]), 'entries')

print('There should be',len(google_data[0]), 'entries based on the header columns')

Row 10472 has 12 entries
There should be 13 entries based on the header columns


As a consequence, we'll delete this row.

In [7]:
print(len(google_data[1]))
del google_data[1][10472] # deleting the wrong row
print(len(google_data[1]))


10841
10840
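
The length check above can be generalized into a scan over the whole data set, which is useful when the index of a bad row is not known in advance. This is only a sketch: `find_malformed_rows` is a hypothetical helper, and the rows below are toy data rather than rows from the real file.

```python
def find_malformed_rows(header, rows):
    """Return (index, row_length) pairs for rows whose length differs from the header's."""
    expected = len(header)
    return [(i, len(row)) for i, row in enumerate(rows) if len(row) != expected]

# toy data for illustration only
header = ['App', 'Category', 'Rating']
rows = [
    ['App A', 'TOOLS', '4.1'],
    ['App B', '3.9'],            # 'Category' is missing
    ['App C', 'GAME', '4.5'],
]

malformed = find_malformed_rows(header, rows)
print(malformed)

# delete from the end so that earlier indices stay valid
for index, _ in reversed(malformed):
    del rows[index]
print(len(rows))
```

Deleting in reverse order matters when more than one row is malformed, because removing a row shifts the index of every row after it.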


## Removing Duplicate Entries
The `check_for_duplicates` function looks for duplicate entries by comparing the first column of each row, and prints the total number of duplicate and unique values. For the Google data the first column is the app name (`App`); for the Apple data it is the `id`.

Setting `return_dublicate_names` to `True` makes the function return a list of the duplicated values.


In [8]:
# check if the datasets have duplicate rows 
def check_for_duplicates(data, print_info = True, return_dublicate_names = False):
    '''
    By default this function prints the total number of
    duplicate apps and the total number of unique apps.
    If return_dublicate_names is set to True,
    a list of the duplicate names is returned.
    '''
    duplicate_apps = []
    unique_apps = []
    for app in data:
        name = app[0]
        if name in unique_apps:
            duplicate_apps.append(name)
        else:
            unique_apps.append(name)
    if print_info:
        print('Number of Duplicate Apps', len(duplicate_apps))
        print('Number of Unique Apps', len(unique_apps))
    if return_dublicate_names:
        return duplicate_apps

In [9]:
# lets check for duplicates in google dataset
print('Google Data')
check_for_duplicates(google_data[1])
print('\n')
print('Apple Data')
check_for_duplicates(apple_data[1])

Google Data
Number of Duplicate Apps 1181
Number of Unique Apps 9659


Apple Data
Number of Duplicate Apps 0
Number of Unique Apps 7197
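
Checking membership in a list rescans the list on every lookup, which gets slow as the data grows. A possible variation, sketched here under the assumption that only the names matter, tracks seen names in a `set` instead. The function name `find_duplicates` and the sample rows are illustrative and not part of the project.

```python
def find_duplicates(rows, name_index=0):
    """Return (duplicate_names, unique_names), using a set for fast membership tests."""
    seen = set()
    duplicates = []
    for row in rows:
        name = row[name_index]
        if name in seen:
            duplicates.append(name)
        else:
            seen.add(name)
    return duplicates, seen

# toy rows for illustration only
rows = [['App A', '10'], ['App B', '5'], ['App A', '12'], ['App A', '12']]

duplicates, unique = find_duplicates(rows)
print('Number of Duplicate Apps', len(duplicates))
print('Number of Unique Apps', len(unique))
```

The `name_index` parameter makes it possible to check a column other than the first, for example `track_name` (index 1) in the Apple data.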


## Confirming That the Duplicates Exist Before Deleting


In [10]:
# get the list of duplicates
list_of_google_duplicates = check_for_duplicates(google_data[1], print_info=False, return_dublicate_names=True)

# print out the first element in our list of duplicates to confirm that there are duplicates
for app in google_data[1]:
    name = app[0]
    if name == list_of_google_duplicates[0]:
        print(app)

['Quick PDF Scanner + OCR FREE', 'BUSINESS', '4.2', '80805', 'Varies with device', '5,000,000+', 'Free', '0', 'Everyone', 'Business', 'February 26, 2018', 'Varies with device', '4.0.3 and up']
['Quick PDF Scanner + OCR FREE', 'BUSINESS', '4.2', '80805', 'Varies with device', '5,000,000+', 'Free', '0', 'Everyone', 'Business', 'February 26, 2018', 'Varies with device', '4.0.3 and up']
['Quick PDF Scanner + OCR FREE', 'BUSINESS', '4.2', '80804', 'Varies with device', '5,000,000+', 'Free', '0', 'Everyone', 'Business', 'February 26, 2018', 'Varies with device', '4.0.3 and up']


The three rows above are nearly identical; the only difference is the number of reviews (80805 vs. 80804). Rather than dropping duplicates at random, we keep the row with the highest number of reviews for each app.

To do this, we first build a dictionary called `reviews_max`. The loop iterates through the rows of the Google data (`google_data[1]`, which excludes the header), extracts the name and number of reviews of each app, and stores the highest number of reviews seen for each name. If the name is already in the dictionary, the value is updated only when the current row has more reviews; otherwise the name is added with its number of reviews.

Finally, we check that the dictionary has 9659 entries, the number of unique apps found earlier.


In [11]:
reviews_max = {}

for row in google_data[1]:
    name = row[0]
    n_reviews = float(row[3])
    if name in reviews_max and reviews_max[name] < n_reviews:
        reviews_max[name] = n_reviews
    elif name not in reviews_max:
        reviews_max[name] = n_reviews

#check that the length of reviews_max dictionary is the expected length of 9659 from our unique dictionary        
print('length of reviews max dictionary',len(reviews_max), 'rows')  
print('Expected length is 9659')


length of reviews max dictionary 9659 rows
Expected length is 9659


Let's use the `reviews_max` dictionary to remove the duplicates.

For duplicated apps, we keep only the row with the highest number of reviews.

* We start by initializing two empty lists, `google_data_clean` and `already_added`
* We loop through the Google data set
    * We isolate the `name` and `n_reviews` of the current row
    * We add the current row to `google_data_clean`, and its name to `already_added`, if:
        * the number of reviews of the current row matches the number of reviews in the `reviews_max` dictionary
        * the name is not already in the `already_added` list

The second condition matters when several duplicates share the same highest number of reviews, as with the two `80805` rows shown earlier: without it, both would be kept.
    


In [12]:
google_data_clean = []
already_added = []

for row in google_data[1]:
    name = row[0]
    n_reviews = float(row[3])
    if n_reviews == reviews_max[name] and name not in already_added:
        google_data_clean.append(row)
        already_added.append(name)
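
As a possible alternative, the two passes (building `reviews_max`, then filtering) can be collapsed into one by storing the best row per name directly. This is a sketch rather than the method used in this project: `keep_most_reviewed` is a hypothetical helper and the rows are toy data. It relies on dictionaries preserving insertion order (Python 3.7+), so each app stays at the position where its name first appeared, which can differ slightly from the order produced by the loop above.

```python
def keep_most_reviewed(rows, name_index=0, reviews_index=3):
    """Keep one row per name: the one with the highest review count."""
    best = {}
    for row in rows:
        name = row[name_index]
        n_reviews = float(row[reviews_index])
        # strict '>' keeps the first row when review counts tie
        if name not in best or n_reviews > float(best[name][reviews_index]):
            best[name] = row
    return list(best.values())

# toy rows for illustration only
rows = [
    ['App A', 'TOOLS', '4.2', '80805'],
    ['App A', 'TOOLS', '4.2', '80805'],
    ['App A', 'TOOLS', '4.2', '80804'],
    ['App B', 'GAME', '4.0', '12'],
]

clean = keep_most_reviewed(rows)
print(len(clean))
```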
        


Now let's explore the new data set, and confirm that the number of rows is 9659 

In [13]:
explore_data(google_data_clean, 0, 3, True)


['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 and up']


Number of rows: 9659
Number of columns: 13


We have 9659 rows as expected

The `is_english_name` function takes a single parameter, `a_string`, which is the name to check.

It loops over the characters of the string and uses the built-in `ord` function to get each character's Unicode code point. Characters commonly used in English text fall in the ASCII range (0 to 127), so a code point above 127 marks a character as non-English. Because English app names can still contain an emoji or a symbol such as `™`, one such character is not enough to reject a name: the function counts them and returns `False` only once it has seen three.

If the name contains at most two characters above 127, the function returns `True`.

Overall, the function decides whether a name is (mostly) English, tolerating up to two non-ASCII characters.

In [14]:
def is_english_name(a_string):
    '''
    Check whether a name is (mostly) English.
    Returns False once three characters fall outside the ASCII range,
    and True otherwise.
    '''
    counter = 0 # used for counting non-English characters
    for char in a_string:
        # code points <= 127 are ASCII, which covers standard English text
        if ord(char) > 127:
            counter += 1
            if counter >= 3:
                return False
    return True    


The code below filters `google_data_clean` into a new list, `google_data_clean_english`, which contains only the rows whose app name passes the `is_english_name` check (at most two non-ASCII characters). For the Google data, the app name is in the first column (index 0).

In [15]:
# filter out google dataset
google_data_clean_english = []

for row in google_data_clean:
    name = row[0]
    if is_english_name(name):
        google_data_clean_english.append(row)


The same filter is applied to the Apple rows in `apple_data[1]`, producing `apple_data_clean_english`. Here the app name is in the second column, `track_name` (index 1).

In [16]:
# filter out apple dataset
apple_data_clean_english = []

for row in apple_data[1]:
    name = row[1]
    if is_english_name(name):
        apple_data_clean_english.append(row)

The code below keeps only the free apps from `google_data_clean_english` and `apple_data_clean_english`, creating two new lists: `google_data_clean_english_free` and `apple_data_clean_english_free`.

For the Google data, the app type (`Free` or `Paid`) is in the 7th column (index 6). A row is kept if its type is `free`, compared case-insensitively.

For the Apple data, the price is in the 5th column (index 4). It is converted to a float, and a row is kept if the price equals 0.

The result is two data sets that contain only free apps with English names, one for each store.

In [17]:
# filter out only the free apps
google_data_clean_english_free = []
apple_data_clean_english_free = []

# isolating google free apps
for row in google_data_clean_english:
    app_type = row[6]
    if app_type.lower() == 'free':
        google_data_clean_english_free.append(row)
        
# isolating apple free apps
for row in apple_data_clean_english:
    price = float(row[4])
    if price == 0:
        apple_data_clean_english_free.append(row)


## Validation Strategy Explained

* Build a minimal Android version of the app, and add it to Google Play.
* If the app has a good response from users, we develop it further.
* If the app is profitable after six months, we build an iOS version of the app and add it to the App Store.

Some common game design elements used in gamification include points, badges, levels, leaderboards, challenges, rewards, feedback, and social features such as sharing and collaboration. These elements are designed to create a sense of achievement, competition, and community among the users, which can help to increase their engagement, motivation, and retention.

We will be using Kivy to develop game apps.

Kivy is an open-source Python library that allows developers to create multi-touch applications with a natural user interface for desktop and mobile devices. It provides a cross-platform framework for building user interfaces and supports various input devices like mouse, keyboard, and touchscreen. Kivy also includes built-in support for graphics, audio, and video, making it easy to create visually appealing apps.

Kivy runs on Windows, macOS, Linux, iOS, and Android, and Kivy apps are written in Python.

By looking at the data headers we can determine which columns will be best to create frequency tables that might be useful for determining the most common genres in each market.


In [18]:
print('Google Data Headers')
print(google_data[0]) # 'Category' index[1] and 'Genres' index[9]
print('\n')
print('Apple Data Headers')
print(apple_data[0]) # 'prime_genre' index[11]

Google Data Headers
['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']


Apple Data Headers
['id', 'track_name', 'size_bytes', 'currency', 'price', 'rating_count_tot', 'rating_count_ver', 'user_rating', 'user_rating_ver', 'ver', 'cont_rating', 'prime_genre', 'sup_devices.num', 'ipadSc_urls.num', 'lang.num', 'vpp_lic']


We'll build two functions we can use to analyze the frequency tables:

* One function to generate frequency tables that show percentages
* Another function we can use to display the percentages in a descending order


The `display_table()` function below does the following:

* Takes in two parameters: `dataset` and `index`. `dataset` is a list of lists, and `index` is an integer
* Generates a frequency table using the `freq_table()` function (defined in the cell after it)
* Transforms the frequency table into a list of tuples, then sorts the list in descending order
* Prints the entries of the frequency table in descending order


In [36]:
def display_table(dataset, index):
    table = freq_table(dataset, index)
    table_display = []
    for key in table:
        key_val_as_tuple = (table[key], key)
        table_display.append(key_val_as_tuple)

    table_sorted = sorted(table_display, reverse = True)
    for entry in table_sorted:
        print(entry[1], ':', entry[0])

The `freq_table` function takes two parameters: `dataset`, a list of lists, and `index`, the index of the column to tabulate.

It starts with an empty dictionary called `table` and a row counter called `total`. It then loops through the rows of the dataset and isolates the value at the given index. If the value is already a key in `table`, its count is incremented by 1; otherwise it is added with a count of 1.

Next, the function converts each count to a percentage of the total number of rows, rounded to two decimal places.

Finally, it returns the dictionary, which maps each unique value in the column to its percentage. Nothing is printed here; printing is left to `display_table`.

In [37]:
def freq_table(dataset, index):
    '''
    Loop through a given column of a dataset and return
    each value's share of the rows as a percentage.
    '''
    table = {}
    total = 0
    # loop through the dataset and count each value in the column
    for row in dataset:
        total += 1
        a_data_point = row[index]
        if a_data_point in table:
            table[a_data_point] += 1
        else:
            table[a_data_point] = 1
    
    # convert the counts to percentages
    table_percentages = {}
    for key in table:
        percentage = (table[key] / total) * 100
        table_percentages[key] = round(percentage, 2)
    return table_percentages
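
For comparison, the same percentage table can be sketched with `collections.Counter` from the standard library, sorting by value with a `key` function instead of building a list of tuples. The names `freq_table_counter` and `display_sorted` and the sample rows are hypothetical and used only for this illustration.

```python
from collections import Counter

def freq_table_counter(dataset, index):
    """Map each value in column `index` to its percentage share of the rows."""
    counts = Counter(row[index] for row in dataset)
    total = sum(counts.values())
    return {value: round(count / total * 100, 2) for value, count in counts.items()}

def display_sorted(table):
    """Print entries from the highest to the lowest percentage."""
    for value, percentage in sorted(table.items(), key=lambda item: item[1], reverse=True):
        print(value, ':', percentage)

# toy rows for illustration only
rows = [['a', 'Games'], ['b', 'Games'], ['c', 'Music'], ['d', 'Games']]

table = freq_table_counter(rows, 1)
display_sorted(table)
```

One difference from the tuple-based sort: when two values have the same percentage, this version keeps them in the order they were first seen, whereas sorting `(percentage, key)` tuples in reverse breaks ties by key in reverse alphabetical order.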


## Displaying the Frequency Tables

We now display the frequency tables for the following columns:

* Apple data: `prime_genre` (index 11)
* Google data: `Genres` (index 9)
* Google data: `Category` (index 1)


In [21]:
# apple data [Column] prime_genre index[11]  
display_table(apple_data_clean_english_free, 11)

Games : 58.26
Entertainment : 7.84
Photo & Video : 5.0
Education : 3.68
Social Networking : 3.31
Shopping : 2.59
Utilities : 2.47
Sports : 2.15
Music : 2.06
Health & Fitness : 2.03
Productivity : 1.75
Lifestyle : 1.56
News : 1.34
Travel : 1.25
Finance : 1.09
Weather : 0.87
Food & Drink : 0.81
Reference : 0.53
Business : 0.53
Book : 0.37
Navigation : 0.19
Medical : 0.19
Catalogs : 0.12


In [22]:
# google data [Column] Genres index[9]
# display_table(google_data_clean_english_free, 9) # not used: the values are too fine-grained to group the apps usefully

In [23]:
# google data [Column] Category index[1]
display_table(google_data_clean_english_free, 1)

FAMILY : 18.93
GAME : 9.7
TOOLS : 8.45
BUSINESS : 4.6
PRODUCTIVITY : 3.9
LIFESTYLE : 3.89
FINANCE : 3.71
MEDICAL : 3.54
SPORTS : 3.39
PERSONALIZATION : 3.32
COMMUNICATION : 3.23
HEALTH_AND_FITNESS : 3.09
PHOTOGRAPHY : 2.95
NEWS_AND_MAGAZINES : 2.8
SOCIAL : 2.67
TRAVEL_AND_LOCAL : 2.34
SHOPPING : 2.25
BOOKS_AND_REFERENCE : 2.14
DATING : 1.87
VIDEO_PLAYERS : 1.8
MAPS_AND_NAVIGATION : 1.39
FOOD_AND_DRINK : 1.24
EDUCATION : 1.16
ENTERTAINMENT : 0.96
LIBRARIES_AND_DEMO : 0.94
AUTO_AND_VEHICLES : 0.93
HOUSE_AND_HOME : 0.8
WEATHER : 0.79
EVENTS : 0.71
PARENTING : 0.66
ART_AND_DESIGN : 0.64
COMICS : 0.61
BEAUTY : 0.6


The `Genres` column in the Google data set is far more granular than the `Category` column.

So we work with the `Category` column for the rest of the analysis.

## Most Popular Apps by Genre on the App Store

One way to find out which genres are the most popular (have the most users) is to calculate the average number of installs for each app genre. For the Google Play data set, we can find this information in the `Installs` column, but the App Store data set has no such column. As a workaround, we'll use the total number of user ratings as a proxy, which we can find in the `rating_count_tot` column.

Below, we calculate the average number of user ratings per app genre on the App Store:


In [24]:
genres_apple = freq_table(apple_data_clean_english_free, 11)

for genre in genres_apple:
    total = 0
    len_genre = 0
    for app in apple_data_clean_english_free:
        genre_app = app[11]
        if genre_app == genre:
            n_ratings = float(app[5])
            total += n_ratings
            len_genre += 1
    avg_n_ratings = total /len_genre
    print(genre, ':', round(avg_n_ratings, 2))
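
The nested loop above rescans the whole data set once per genre. A possible single-pass version, sketched here with `collections.defaultdict`, accumulates a running total and count per genre instead. The helper name `average_by_group` and the rows are illustrative assumptions, not part of the project.

```python
from collections import defaultdict

def average_by_group(rows, group_index, value_index):
    """Average the numeric column `value_index` for each distinct value in `group_index`."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for row in rows:
        group = row[group_index]
        totals[group] += float(row[value_index])
        counts[group] += 1
    return {group: round(totals[group] / counts[group], 2) for group in totals}

# toy rows for illustration only: id, name, genre, rating count
rows = [
    ['1', 'App A', 'Games', '100'],
    ['2', 'App B', 'Games', '300'],
    ['3', 'App C', 'Music', '50'],
]

averages = average_by_group(rows, group_index=2, value_index=3)
for genre, avg in sorted(averages.items(), key=lambda item: item[1], reverse=True):
    print(genre, ':', avg)
```

Sorting the result before printing also makes the leading genres easier to spot than in the unsorted output.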
            


Social Networking : 71548.35
Photo & Video : 28441.54
Games : 22886.37
Music : 57326.53
Reference : 79350.47
Health & Fitness : 23298.02
Weather : 52279.89
Utilities : 19156.49
Travel : 28243.8
Shopping : 27230.73
News : 21248.02
Navigation : 86090.33
Lifestyle : 16815.48
Entertainment : 14195.36
Food & Drink : 33333.92
Sports : 23008.9
Book : 46384.92
Finance : 32367.03
Education : 7003.98
Productivity : 21028.41
Business : 7491.12
Catalogs : 4004.0
Medical : 612.0


On average, `Navigation` leads with `86,090` ratings per app, followed by `Reference` with `79,350` and `Social Networking` with `71,548`. To see what drives these averages, we list the apps in each of these three genres.

In [25]:
for app in apple_data_clean_english_free:
    if app[11] == 'Navigation':
        print(app[1], ":", app[5])
        

Waze - GPS Navigation, Maps & Real-time Traffic : 345046
Google Maps - Navigation & Transit : 154911
Geocaching® : 12811
CoPilot GPS – Car Navigation & Offline Maps : 3582
ImmobilienScout24: Real Estate Search in Germany : 187
Railway Route Search : 5


In [26]:
for app in apple_data_clean_english_free:
    if app[11] == 'Reference':
        print(app[1], ":", app[5])

Bible : 985920
Dictionary.com Dictionary & Thesaurus : 200047
Dictionary.com Dictionary & Thesaurus for iPad : 54175
Google Translate : 26786
Muslim Pro: Ramadan 2017 Prayer Times, Azan, Quran : 18418
New Furniture Mods - Pocket Wiki & Game Tools for Minecraft PC Edition : 17588
Merriam-Webster Dictionary : 16849
Night Sky : 12122
City Maps for Minecraft PE - The Best Maps for Minecraft Pocket Edition (MCPE) : 8535
LUCKY BLOCK MOD ™ for Minecraft PC Edition - The Best Pocket Wiki & Mods Installer Tools : 4693
GUNS MODS for Minecraft PC Edition - Mods Tools : 1497
Guides for Pokémon GO - Pokemon GO News and Cheats : 826
WWDC : 762
Horror Maps for Minecraft PE - Download The Scariest Maps for Minecraft Pocket Edition (MCPE) Free : 718
VPN Express : 14
Real Bike Traffic Rider Virtual Reality Glasses : 8
Jishokun-Japanese English Dictionary & Translator : 0


In [27]:
for app in apple_data_clean_english_free:
    if app[11] == 'Social Networking':
        print(app[1], ":", app[5])

Facebook : 2974676
Pinterest : 1061624
Skype for iPhone : 373519
Messenger : 351466
Tumblr : 334293
WhatsApp Messenger : 287589
Kik : 260965
ooVoo – Free Video Call, Text and Voice : 177501
TextNow - Unlimited Text + Calls : 164963
Viber Messenger – Text & Call : 164249
Followers - Social Analytics For Instagram : 112778
MeetMe - Chat and Meet New People : 97072
We Heart It - Fashion, wallpapers, quotes, tattoos : 90414
InsTrack for Instagram - Analytics Plus More : 85535
Tango - Free Video Call, Voice and Chat : 75412
LinkedIn : 71856
Match™ - #1 Dating App. : 60659
Skype for iPad : 60163
POF - Best Dating App for Conversations : 52642
Timehop : 49510
Find My Family, Friends & iPhone - Life360 Locator : 43877
Whisper - Share, Express, Meet : 39819
Hangouts : 36404
LINE PLAY - Your Avatar World : 34677
WeChat : 34584
Badoo - Meet New People, Chat, Socialize. : 34428
Followers + for Instagram - Follower Analytics : 28633
GroupMe : 28260
Marco Polo Video Walkie Talkie : 27662
Miitomo : 2

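## Most Popular Apps by Genre on the Google Play Store

For the Google Play data we can use the `Installs` column directly. Its values are open-ended ranges such as `100,000+` rather than exact counts, as the frequency table below shows.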


In [28]:
display_table(google_data_clean_english_free, 5)

1,000,000+ : 15.76
100,000+ : 11.54
10,000,000+ : 10.57
10,000+ : 10.2
1,000+ : 8.4
100+ : 6.93
5,000,000+ : 6.83
500,000+ : 5.56
50,000+ : 4.77
5,000+ : 4.49
10+ : 3.54
500+ : 3.24
50,000,000+ : 2.28
100,000,000+ : 2.14
50+ : 1.92
5+ : 0.79
1+ : 0.51
500,000,000+ : 0.27
1,000,000,000+ : 0.23
0+ : 0.05


In [45]:
category_google = freq_table(google_data_clean_english_free, 1)

for category in category_google:
    total = 0
    len_category = 0
    for app in google_data_clean_english_free:
        category_app = app[1]
        if category_app == category:
            # turn a string such as '1,000,000+' into the number 1000000.0
            n_installs = app[5]
            n_installs = n_installs.replace('+', '')
            n_installs = n_installs.replace(',', '')
            total += float(n_installs)
            len_category += 1
    avg_n_installs = total / len_category
    print(category, ':', round(avg_n_installs, 2))
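
The string clean-up inside the loop can be isolated into a small helper, which makes the assumption explicit: each install range is treated as its lower bound, so `1,000,000+` counts as exactly one million installs. The function name `parse_installs` is hypothetical.

```python
def parse_installs(installs):
    """Convert an install range such as '1,000,000+' to its lower bound as an int."""
    return int(installs.replace(',', '').replace('+', ''))

buckets = ['0+', '500+', '10,000+', '1,000,000,000+']
parsed = [parse_installs(bucket) for bucket in buckets]
print(parsed)
```

Since the true install counts are at least these values, the per-category averages computed this way are best read as rough lower estimates.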
        

ART_AND_DESIGN : 1986335.09
AUTO_AND_VEHICLES : 647317.82
BEAUTY : 513151.89
BOOKS_AND_REFERENCE : 8814199.79
BUSINESS : 1712290.15
COMICS : 832613.89
COMMUNICATION : 38590581.09
DATING : 854028.83
EDUCATION : 1833495.15
ENTERTAINMENT : 11640705.88
EVENTS : 253542.22
FINANCE : 1387692.48
FOOD_AND_DRINK : 1924897.74
HEALTH_AND_FITNESS : 4188821.99
HOUSE_AND_HOME : 1360598.04
LIBRARIES_AND_DEMO : 638503.73
LIFESTYLE : 1446158.22
GAME : 15544014.51
FAMILY : 3697848.17
MEDICAL : 120550.62
SOCIAL : 23253652.13
SHOPPING : 7036877.31
PHOTOGRAPHY : 17840110.4
SPORTS : 3650602.28
TRAVEL_AND_LOCAL : 13984077.71
TOOLS : 10830251.97
PERSONALIZATION : 5201482.61
PRODUCTIVITY : 16787331.34
PARENTING : 542603.62
WEATHER : 5145550.29
VIDEO_PLAYERS : 24727872.45
NEWS_AND_MAGAZINES : 9549178.47
MAPS_AND_NAVIGATION : 4049274.63


On average, `COMMUNICATION` leads with `38,590,581.09` installs per app, followed by `VIDEO_PLAYERS` with `24,727,872.45` and `SOCIAL` with `23,253,652.13`.




In [52]:
for app in google_data_clean_english_free:
    if app[1] == 'COMMUNICATION':
        print(app[0], ":", app[5])


WhatsApp Messenger : 1,000,000,000+
Messenger for SMS : 10,000,000+
My Tele2 : 5,000,000+
imo beta free calls and text : 100,000,000+
Contacts : 50,000,000+
Call Free – Free Call : 5,000,000+
Web Browser & Explorer : 5,000,000+
Browser 4G : 10,000,000+
MegaFon Dashboard : 10,000,000+
ZenUI Dialer & Contacts : 10,000,000+
Cricket Visual Voicemail : 10,000,000+
TracFone My Account : 1,000,000+
Xperia Link™ : 10,000,000+
TouchPal Keyboard - Fun Emoji & Android Keyboard : 10,000,000+
Skype Lite - Free Video Call & Chat : 5,000,000+
My magenta : 1,000,000+
Android Messages : 100,000,000+
Google Duo - High Quality Video Calls : 500,000,000+
Seznam.cz : 1,000,000+
Antillean Gold Telegram (original version) : 100,000+
AT&T Visual Voicemail : 10,000,000+
GMX Mail : 10,000,000+
Omlet Chat : 10,000,000+
My Vodacom SA : 5,000,000+
Microsoft Edge : 5,000,000+
Messenger – Text and Video Chat for Free : 1,000,000,000+
imo free video calls and chat : 500,000,000+
Calls & Text by Mo+ : 5,000,000+
free 

In [53]:
for app in google_data_clean_english_free:
    if app[1] == 'VIDEO_PLAYERS':
        print(app[0], ":", app[5])

YouTube : 1,000,000,000+
All Video Downloader 2018 : 1,000,000+
Video Downloader : 10,000,000+
HD Video Player : 1,000,000+
Iqiyi (for tablet) : 1,000,000+
Video Player All Format : 10,000,000+
Motorola Gallery : 100,000,000+
Free TV series : 100,000+
Video Player All Format for Android : 500,000+
VLC for Android : 100,000,000+
Code : 10,000,000+
Vote for : 50,000,000+
XX HD Video downloader-Free Video Downloader : 1,000,000+
OBJECTIVE : 1,000,000+
Music - Mp3 Player : 10,000,000+
HD Movie Video Player : 1,000,000+
YouCut - Video Editor & Video Maker, No Watermark : 5,000,000+
Video Editor,Crop Video,Movie Video,Music,Effects : 1,000,000+
YouTube Studio : 10,000,000+
video player for android : 10,000,000+
Vigo Video : 50,000,000+
Google Play Movies & TV : 1,000,000,000+
HTC Service － DLNA : 10,000,000+
VPlayer : 1,000,000+
MiniMovie - Free Video and Slideshow Editor : 50,000,000+
Samsung Video Library : 50,000,000+
OnePlus Gallery : 1,000,000+
LIKE – Magic Video Maker & Community : 50,

In [54]:
for app in google_data_clean_english_free:
    if app[1] == 'SOCIAL':
        print(app[0], ":", app[5])

Facebook : 1,000,000,000+
Facebook Lite : 500,000,000+
Tumblr : 100,000,000+
Social network all in one 2018 : 100,000+
Pinterest : 100,000,000+
TextNow - free text + calls : 10,000,000+
Google+ : 1,000,000,000+
The Messenger App : 1,000,000+
Messenger Pro : 1,000,000+
Free Messages, Video, Chat,Text for Messenger Plus : 1,000,000+
Telegram X : 5,000,000+
The Video Messenger App : 100,000+
Jodel - The Hyperlocal App : 1,000,000+
Hide Something - Photo, Video : 5,000,000+
Love Sticker : 1,000,000+
Web Browser & Fast Explorer : 5,000,000+
LiveMe - Video chat, new friends, and make money : 10,000,000+
VidStatus app - Status Videos & Status Downloader : 5,000,000+
Love Images : 1,000,000+
Web Browser ( Fast & Secure Web Explorer) : 500,000+
SPARK - Live random video chat & meet new people : 5,000,000+
Golden telegram : 50,000+
Facebook Local : 1,000,000+
Meet – Talk to Strangers Using Random Video Chat : 5,000,000+
MobilePatrol Public Safety App : 1,000,000+
💘 WhatsLov: Smileys of love, sti

## Conclusions

In this project, we analyzed data about the App Store and Google Play mobile apps with the goal of recommending an app profile that can be profitable for both markets.

We concluded that taking a popular book (perhaps a more recent book) and turning it into an app could be profitable for both the Google Play and the App Store markets. The markets are already full of libraries, so we need to add some special features besides the raw version of the book. 

This might include daily quotes from the book, an audio version of the book, quizzes on the book, a forum where people can discuss the book, etc.
