# Profitable App Profiles for the App Store and Google Play Markets

**About the company** 

Mobile Free is a company that builds mobile apps for Android and iOS users. All the company's apps are in English and are free to download on the App Store and on Google Play. 

**About this project**

Since the company's main source of revenue is in-app ads, its profit is proportional to the number of users who engage with those ads. It is therefore desirable for more users to see and engage with the in-app ads. The purpose of this project is to determine which types of apps are likely to attract the most users. 

This project was accomplished as part of [Dataquest's](https://www.dataquest.io/) Data Analyst Path.

**Results**

A recommendation was made for a Travel app profile. It was determined that apps under this genre are quite popular and are able to attract a substantial number of users. The specific features of the recommended app, which include showing local tourist destinations and generating itinerary options, were selected because apps offering them face minimal competition in both the App Store and Google Play markets. This can improve the app's chances of attracting users and, subsequently, generating revenue.

# Exploring the datasets

Two samples of data, collected from apps available on the App Store and on Google Play, will be used. 

* The data set containing about seven thousand iOS apps from the App Store can be downloaded from [this link](https://www.kaggle.com/ramamet4/app-store-apple-data-set-10k-apps/home). 

* The data set containing about ten thousand Android apps from Google Play can be downloaded from [this link](https://www.kaggle.com/lava18/google-play-store-apps). 

Below, the data sets are opened and are converted to two separate lists of lists, one for Android apps and one for iOS apps. 

**NOTE**: The headers for each data set are separated. This is to avoid any errors later on in the data cleaning and data analysis process. The headers are printed after both data sets are converted. 

In [1]:
from csv import reader

# Android apps data set
open_ggl = open('googleplaystore.csv', encoding='utf8')
read_ggl = reader(open_ggl)
ggl_apps = list(read_ggl)
ggl_header = ggl_apps[0] # header for the Android apps data set
ggl_apps = ggl_apps[1:] 


# iOS apps data set
open_apl = open('AppleStore.csv', encoding='utf8')
read_apl = reader(open_apl)
apl_apps = list(read_apl)
apl_header = apl_apps[0] # header for the iOS apps data set
apl_apps = apl_apps[1:]

# Printing the headers of both data sets
print('Android apps header')
print('\n') # This creates a space between two lines for better readability
print(ggl_header)
print('\n') 
print('iOS apps header')
print('\n')
print(apl_header)

Android apps header


['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']


iOS apps header


['id', 'track_name', 'size_bytes', 'currency', 'price', 'rating_count_tot', 'rating_count_ver', 'user_rating', 'user_rating_ver', 'ver', 'cont_rating', 'prime_genre', 'sup_devices.num', 'ipadSc_urls.num', 'lang.num', 'vpp_lic']
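
The file handles opened above are never closed explicitly. As a minimal sketch, the same loading step could be wrapped in a helper that uses a `with` block, so that each file is closed automatically once it has been read. The helper name `load_csv` and the small temporary demo file are assumptions made for illustration and are not part of the original notebook.

```python
import os
import tempfile
from csv import reader

def load_csv(path):
    """Return (header, rows) for a CSV file, closing the file when done."""
    # load_csv is a hypothetical helper name used only for this sketch.
    with open(path, encoding='utf8', newline='') as f:
        rows = list(reader(f))
    return rows[0], rows[1:]

# Demo on a small temporary file (made-up data, not the real data sets)
with tempfile.NamedTemporaryFile('w', suffix='.csv', delete=False,
                                 encoding='utf8', newline='') as tmp:
    tmp.write('App,Price\nApp A,0\nApp B,1.99\n')

demo_header, demo_rows = load_csv(tmp.name)
os.remove(tmp.name)

print(demo_header)
print(demo_rows)
```

With the real files, the call would look like `ggl_header, ggl_apps = load_csv('googleplaystore.csv')`.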


To explore the converted data sets (headers not included), a function is created that prints the rows specified by the arguments taken in by the `start` and `end` parameters. The total number of rows and columns are printed when the `rows_and_columns` parameter takes in an argument of Boolean `True`. A default argument of `False` is specified in the function. 

The first three rows of each data set are printed to demonstrate how the function works. The total number of rows and columns for each data set are also printed. 

In [2]:
def explore_data(dataset, start, end, rows_and_columns=False):
    dataset_slice = dataset[start:end]    
    for row in dataset_slice:
        print(row)
        
    if rows_and_columns:
        print('\n')
        print('Number of rows:', len(dataset))
        print('Number of columns:', len(dataset[0]))

print('Android apps data set')   
print('\n')
explore_data(ggl_apps, 0, 3, True) 
print('\n')
print('iOS apps data set') 
print('\n')
explore_data(apl_apps, 0, 3, True)        

Android apps data set


['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']
['Coloring book moana', 'ART_AND_DESIGN', '3.9', '967', '14M', '500,000+', 'Free', '0', 'Everyone', 'Art & Design;Pretend Play', 'January 15, 2018', '2.0.0', '4.0.3 and up']
['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


Number of rows: 10841
Number of columns: 13


iOS apps data set


['284882215', 'Facebook', '389879808', 'USD', '0.0', '2974676', '212', '3.5', '3.5', '95.0', '4+', 'Social Networking', '37', '1', '29', '1']
['389801252', 'Instagram', '113954816', 'USD', '0.0', '2161558', '1289', '4.5', '4.0', '10.23', '12+', 'Photo & Video', '37', '0', '29', '1']
['529479190', 'Clash of Clans', '116476928', 'USD', '0.0', '2130805',

## Cleaning the data

In order to conduct an analysis that will yield accurate results, it is necessary to "clean" the data set of entries that may contribute to erroneous conclusions. Data cleaning for this project will involve:

- Removing inaccurate data
- Removing duplicate entries
- Removing non-English apps
- Isolating free apps

### Removing inaccurate data

When the data set of Android apps is explored extensively, an error can be observed with the row of index 10472. 

In [3]:
print(ggl_header)
print('\n')
print(ggl_apps[10472])

['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']


['Life Made WI-Fi Touchscreen Photo Frame', '1.9', '19', '3.0M', '1,000+', 'Free', '0', 'Everyone', '', 'February 11, 2018', '1.0.19', '4.0 and up']


From the printed lines above, the row with index 10472 has only 12 fields against the 13 columns in the header: the entry for the **Category** column is missing, which shifts every later value one position to the left. The **Genres** entry is also empty. Since no information is on hand to fill in the missing entries, this row will be deleted using the `del` statement. Removing this single row will not have a significant impact on the analysis later on. 

After deleting the row, the row that now occupies index 10472 is printed to show the change. Note that the cell should be run only once, since each run deletes whatever row is currently at that index. 

In [4]:
del(ggl_apps[10472])
print(ggl_apps[10472])

['osmino Wi-Fi: free WiFi', 'TOOLS', '4.2', '134203', '4.1M', '10,000,000+', 'Free', '0', 'Everyone', 'Tools', 'August 7, 2018', '6.06.14', '4.4 and up']
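
Rather than spotting such a row by eye, rows with a shifted or missing column could be found by comparing each row's length with the header's length. The following is a minimal sketch on toy data; the function name `find_malformed_rows` is hypothetical and the toy rows only stand in for `ggl_header` and `ggl_apps`.

```python
def find_malformed_rows(header, rows):
    """Return (index, row) pairs whose field count differs from the header's."""
    expected = len(header)
    return [(i, row) for i, row in enumerate(rows) if len(row) != expected]

# Toy data standing in for ggl_header and ggl_apps
toy_header = ['App', 'Category', 'Rating']
toy_rows = [
    ['App A', 'TOOLS', '4.2'],
    ['App B', '1.9'],          # Category is missing, so the row is one field short
    ['App C', 'GAME', '4.5'],
]

malformed = find_malformed_rows(toy_header, toy_rows)
print(malformed)  # [(1, ['App B', '1.9'])]
```

This check only catches rows with the wrong number of fields. A row with the right length but wrong values would still need a separate inspection.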


### Removing duplicate entries

Since the data set for iOS apps does not contain inaccurate data, the next step is to check whether the data sets have duplicate entries. 

The cell below scans the Android data set for duplicates using a **for loop**. Each app name (column index 0) is appended to the `unique_apps_ggl` list the first time it is encountered. If the loop comes across a name that is already in that list, the name is appended to the `duplicate_apps_ggl` list instead. 

Once the loop is executed, the number of duplicate entries and some examples of them are printed. 

In [5]:
unique_apps_ggl = []
duplicate_apps_ggl = []

for row in ggl_apps:
    name = row[0]
    if name in unique_apps_ggl:
        duplicate_apps_ggl.append(name)
    else:
        unique_apps_ggl.append(name)
        
print('Number of unique entries:', len(unique_apps_ggl)) 
print('Number of duplicate entries:', len(duplicate_apps_ggl)) 
print('\n')

print('Some duplicate app names:')
print(duplicate_apps_ggl[0:10])

Number of unique entries: 9659
Number of duplicate entries: 1181


Some duplicate app names:
['Quick PDF Scanner + OCR FREE', 'Box', 'Google My Business', 'ZOOM Cloud Meetings', 'join.me - Simple Meetings', 'Box', 'Zenefits', 'Google Ads', 'Google My Business', 'Slack']
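
Checking `name in unique_apps_ggl` against a list gets slower as the list grows. As an alternative sketch, not the approach used in this notebook, the same counts could be obtained in one pass with `collections.Counter`. The `names` list below is toy data standing in for the `row[0]` values.

```python
from collections import Counter

# Toy stand-in for the app names taken from row[0] of each entry
names = ['Box', 'Slack', 'Box', 'Zenefits', 'Slack', 'Box']

name_counts = Counter(names)

n_unique = len(name_counts)                                  # distinct names
n_duplicates = sum(c - 1 for c in name_counts.values())      # extra copies beyond the first
repeated = [name for name, c in name_counts.items() if c > 1]

print('Number of unique entries:', n_unique)
print('Number of duplicate entries:', n_duplicates)
print('Names that appear more than once:', repeated)
```

Here every occurrence after the first counts as one duplicate, which matches how the loop above fills its duplicates list.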


It can be seen that there are quite a number of duplicate entries: 1181. To examine these entries and find a possible reason for their presence in the data set, the entries with the app names 'Slack' and 'Google Ads' are printed. 

In [6]:
print('Examples of duplicate entries:')
print('\n')

for row in ggl_apps:
    name = row[0]
    if name == 'Slack':
        print(row)

print('\n')
        
for row in ggl_apps:
    name = row[0]
    if name == 'Google Ads':
        print(row)

Examples of duplicate entries:


['Slack', 'BUSINESS', '4.4', '51507', 'Varies with device', '5,000,000+', 'Free', '0', 'Everyone', 'Business', 'August 2, 2018', 'Varies with device', 'Varies with device']
['Slack', 'BUSINESS', '4.4', '51507', 'Varies with device', '5,000,000+', 'Free', '0', 'Everyone', 'Business', 'August 2, 2018', 'Varies with device', 'Varies with device']
['Slack', 'BUSINESS', '4.4', '51510', 'Varies with device', '5,000,000+', 'Free', '0', 'Everyone', 'Business', 'August 2, 2018', 'Varies with device', 'Varies with device']


['Google Ads', 'BUSINESS', '4.3', '29313', '20M', '5,000,000+', 'Free', '0', 'Everyone', 'Business', 'July 30, 2018', '1.12.0', '4.0.3 and up']
['Google Ads', 'BUSINESS', '4.3', '29313', '20M', '5,000,000+', 'Free', '0', 'Everyone', 'Business', 'July 30, 2018', '1.12.0', '4.0.3 and up']
['Google Ads', 'BUSINESS', '4.3', '29331', '20M', '5,000,000+', 'Free', '0', 'Everyone', 'Business', 'July 30, 2018', '1.12.0', '4.0.3 and up']


The duplicate entries above show that the only difference among the entries is in the **number of reviews** for the app. A possible reason for this difference is that the entries could have been included in the data set at different times. 

As a criterion, only the entry with the **highest** number of reviews among the duplicates will be retained. This is because this entry was most likely the latest to be included in the data set. 

In order to do this, a **dictionary** is first created in which the app name is the key and the highest number of reviews is the corresponding value. A **for loop** goes through the entire data set and stores the necessary data in the dictionary. In-line comments explain the code. 

The number of app names in the dictionary is printed afterwards. This number needs to be the same as the **Number of unique entries**, which is 9659. Note that non-duplicate apps are also included in the dictionary. 

In [7]:
reviews_max = {}

for row in ggl_apps:
    name = row[0]
    n_reviews = float(row[3]) # The figures need to be converted to the float data type since they are stored as strings in the data set
    if name in reviews_max and reviews_max[name] < n_reviews:
        reviews_max[name] = n_reviews
    elif name not in reviews_max: # An else statement cannot be used here since it can lead to the wrong value being assigned
        reviews_max[name] = n_reviews
        
print(len(reviews_max))
 

9659


After the dictionary is created, an empty list, `ggl_clean`, which will contain the entries to be retained, is made. This list will be used in the next step of the data cleaning process.

The dictionary serves as the reference for the entries to be retained. A **for loop** iterates through the Android apps data set and appends the pertinent entries to the empty list. 

The list `already_added` is needed to avoid appending the duplicate entries.

The number of entries in the `ggl_clean` list is printed and should be equal to 9659.

In [8]:
ggl_clean = []
already_added = []

for row in ggl_apps:
    name = row[0]
    n_reviews = float(row[3])
    if n_reviews == reviews_max[name] and name not in already_added:
        ggl_clean.append(row)
        already_added.append(name)
        
print(len(ggl_clean))

9659
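
The two steps above (building `reviews_max`, then filtering with `already_added`) could also be combined into a single pass by storing the whole row in the dictionary instead of only the review count. This is a sketch of that alternative on toy rows, with the hypothetical function name `keep_most_reviewed`. It is not the method used in this notebook, and the order of the resulting rows may differ from `ggl_clean`.

```python
def keep_most_reviewed(rows, name_index=0, reviews_index=3):
    """Keep one row per app name: the first row with the highest review count."""
    best = {}  # app name -> row with the most reviews seen so far
    for row in rows:
        name = row[name_index]
        n_reviews = float(row[reviews_index])
        # Replace the stored row only when the new one has strictly more reviews
        if name not in best or n_reviews > float(best[name][reviews_index]):
            best[name] = row
    return list(best.values())

# Toy rows shaped like the first four columns of the Android data set
toy_rows = [
    ['Slack', 'BUSINESS', '4.4', '51507'],
    ['Slack', 'BUSINESS', '4.4', '51510'],
    ['Google Ads', 'BUSINESS', '4.3', '29313'],
    ['Slack', 'BUSINESS', '4.4', '51507'],
]

deduped = keep_most_reviewed(toy_rows)
for row in deduped:
    print(row)
```

Because dictionary lookups do not slow down as more names are added, this also avoids the repeated `name not in already_added` scans over a growing list.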


For the iOS apps data set, it can be observed that there are two duplicate app names. Note that the index for the app names is 1.

In [9]:
unique_apps_apl = []
duplicate_apps_apl = []

for row in apl_apps:
    name = row[1]
    if name in unique_apps_apl:
        duplicate_apps_apl.append(name)
    else:
        unique_apps_apl.append(name)
        
print('Number of unique entries:', len(unique_apps_apl)) 
print('Number of duplicate entries:', len(duplicate_apps_apl)) 
print('\n')

print('Some duplicate app names:')
print(duplicate_apps_apl[0:10])

Number of unique entries: 7195
Number of duplicate entries: 2


Some duplicate app names:
['Mannequin Challenge', 'VR Roller Coaster']


In [10]:
print('Examples of duplicate entries:')
print('\n')

for row in apl_apps:
    name = row[1]
    if name == 'Mannequin Challenge':
        print(row)

print('\n')
        
for row in apl_apps:
    name = row[1]
    if name == 'VR Roller Coaster':
        print(row)

Examples of duplicate entries:


['1173990889', 'Mannequin Challenge', '109705216', 'USD', '0.0', '668', '87', '3.0', '3.0', '1.4', '9+', 'Games', '37', '4', '1', '1']
['1178454060', 'Mannequin Challenge', '59572224', 'USD', '0.0', '105', '58', '4.0', '4.5', '1.0.1', '4+', 'Games', '38', '5', '1', '1']


['952877179', 'VR Roller Coaster', '169523200', 'USD', '0.0', '107', '102', '3.5', '3.5', '2.0.0', '4+', 'Games', '37', '5', '1', '1']
['1089824278', 'VR Roller Coaster', '240964608', 'USD', '0.0', '67', '44', '3.5', '4.0', '0.81', '4+', 'Games', '38', '0', '1', '1']


However, the rows with these app names differ in multiple columns, including the app `id`, rather than in just one. From this, it can be deduced that these are not duplicate entries but different apps that happen to share a name, so no rows are removed from the iOS data set.

### Removing Non-English apps

Since Mobile Free produces only English apps, Non-English apps need to be removed as they are irrelevant for this project.

To detect Non-English apps in the data set, a function needs to be created that checks whether or not a string contains characters not typically used in the English language. Characters have corresponding numbers associated with them and to check for these numbers, the built-in `ord()` function is used. 

The use of the function is demonstrated below. 

In [11]:
print(ord('a'))
print(ord('4'))
print(ord('#'))

97
52
35


In the American Standard Code for Information Interchange (ASCII), the characters commonly used in English text have numbers in the range 0 to 127.

However, some characters, such as emojis like 😜 or symbols like ™, have numbers greater than 127. This complicates things, as English app names do make use of such characters. 

To minimize data loss, a criterion can be set: names with more than three (3) characters whose numbers are greater than 127 are considered Non-English. Although this criterion is not perfect, it will suffice for the purposes of this project. 

With this in mind, a function below checks whether the following texts are English or not by taking in a string as an argument:

- 'Docs To Go™ Free Office Suite'
- 'Instachat 😜'
- '爱奇艺PPS -《欢乐颂2》电视剧热播'

Since strings are iterable, **for loops** can be used on them. 

In [12]:
def english_apps(string):
    count = 0
    for char in string:
        if ord(char) > 127:
            count += 1
            if count > 3:                
                print('Not English: ', string)
                break
    if count <= 3:            
        print('English: ', string)        

english_apps('Docs To Go™ Free Office Suite')
english_apps('Instachat 😜')
english_apps('爱奇艺PPS -《欢乐颂2》电视剧热播')

English:  Docs To Go™ Free Office Suite
English:  Instachat 😜
Not English:  爱奇艺PPS -《欢乐颂2》电视剧热播
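
The function above prints its verdict, which is convenient for a demonstration but cannot be reused directly in a filter. A minimal sketch of the same rule as a function that returns `True` or `False` is given below. The name `is_english` and the `max_non_ascii` parameter are assumptions introduced for illustration.

```python
def is_english(string, max_non_ascii=3):
    """Return True if the string has at most max_non_ascii characters above 127."""
    non_ascii = sum(1 for char in string if ord(char) > 127)
    return non_ascii <= max_non_ascii

print(is_english('Docs To Go™ Free Office Suite'))      # True
print(is_english('Instachat 😜'))                        # True
print(is_english('爱奇艺PPS -《欢乐颂2》电视剧热播'))       # False
```

A function of this shape could be used in a list comprehension, for example `[row for row in rows if is_english(row[0])]`, where `rows` is any list of app entries.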


The function above can now be used to filter out non-English apps from both data sets. Two empty lists, `apl_english` and `ggl_english`, will be created and will contain the English apps. 

The previous function is slightly modified to take in the following arguments:

- Cleaned data sets from the previous step (which removed duplicates)
- Index number of the column containing the app names
- The new data set to contain the English apps

The modified function is intended to work for both data sets. It loops through the cleaned data sets and appends entries for English apps to the initially empty lists. 

Note that the `apl_apps` dataset is used since no new, cleaned data set was created from the previous step for iOS apps.

In [13]:
apl_english = []
ggl_english = []

def english_apps(clean_dataset, name_index, eng_dataset):
    for row in clean_dataset:
        count = 0
        for char in row[name_index]:
            if ord(char) > 127:
                count += 1
                if count > 3:
                    break 
        if count <= 3:
            eng_dataset.append(row)

english_apps(ggl_clean, 0, ggl_english)
english_apps(apl_apps, 1, apl_english) # Note that the apl_apps dataset is used 

print('English Google Play apps:',len(ggl_english))
print('English App Store apps:',len(apl_english))

English Google Play apps: 9614
English App Store apps: 6183


The numbers of entries remaining in both data sets after removing Non-English apps are shown above. 

### Isolating free apps

The final step of the data cleaning process is to isolate the apps that are free. The cleaned data set resulting from this final step will be used for the succeeding analysis. 

Two empty lists, `apl_free` and `ggl_free`, are created below. Once these lists are filled, they will serve as the cleaned data sets just mentioned. 

The function below takes in three arguments:

- The data set from the previous step containing only English apps 
- Index number of the column containing the price
- The new data set to contain the free apps

The function loops over the English apps data set and appends the row to the pertinent free apps list when the corresponding price is zero (0). 

In [14]:
apl_free = []
ggl_free = []

def free_apps(eng_apps, price_index, free_dataset):
    for row in eng_apps:
        price = row[price_index]
        if price == '0' or price == '0.0': 
            free_dataset.append(row)
            
free_apps(ggl_english, 7, ggl_free)            
free_apps(apl_english, 4, apl_free)            

print('Free English Google Play apps:',len(ggl_free))
print('Free English App Store apps:',len(apl_free))

Free English Google Play apps: 8864
Free English App Store apps: 3222


The numbers of apps remaining in both data sets are shown above. 
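
The comparison against the strings `'0'` and `'0.0'` works for the two formats seen in these data sets, but it would miss other spellings of zero such as `'0.00'`. As a hedged sketch, the price could instead be parsed as a number. The function name `is_free` is hypothetical, and the handling of a leading `$` assumes that paid prices may carry a currency symbol.

```python
def is_free(price):
    """Return True when a price string represents zero, e.g. '0', '0.0', '$0.00'."""
    # Assumption: prices may have surrounding spaces or a leading dollar sign
    cleaned = price.strip().lstrip('$')
    try:
        return float(cleaned) == 0.0
    except ValueError:
        return False  # prices that cannot be parsed are treated as not free

for price in ['0', '0.0', '0.00', '$4.99', '2.99', 'Everyone']:
    print(repr(price), '->', is_free(price))
```

Treating unparseable values as not free is a design choice made for this sketch. Depending on the data, it may be preferable to report such rows instead of silently dropping them.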

## Analyzing the data

Now that the data have been cleaned, it's time to move on to the analysis portion of the project. The results of the analysis will help Mobile Free in determining the kinds of apps that need to be focused on for development.

### Validation Strategy

The goal of the analysis is to find the app profiles that will be successful in both Google Play and the App Store. In order to do this, the following steps will comprise the validation strategy:

- Add to Google Play an initial version of an Android app. This version will contain minimal and essential features. 
- Develop further the app if user response is positive. Additional features that will enhance user experience may be added. Any bugs will be fixed.
- If, after six months, the app is profitable, an iOS version will be built and added to the App Store. 

### Most Popular Apps by Genre

To start the analysis that will serve as the basis for carrying out the above strategy, the most common genres of apps for both data sets will be determined. This will be done through the creation of frequency tables for each. 

Two functions are created to aid in this. The first function creates the actual frequency tables by taking in the pertinent final data set and the genre index as arguments. The `count` dictionary stores the genre type as the dictionary key and the number of instances it appears in the data set as the corresponding value. The values are then converted to percentages, for easier analysis, and stored in the `percent` dictionary. The `percent` dictionary is then returned when the `freq_table()` function is called.

In [15]:
def freq_table(dataset, index):
    tot_count = 0
    count = {}
    for row in dataset:
        tot_count += 1
        if row[index] in count:
            count[row[index]] += 1
        elif row[index] not in count:
            count[row[index]] = 1
            
    percent = {}
    for entry in count:
        percent[entry] = (count[entry]/tot_count) * 100
        
    return percent

The second function's purpose is to display the contents of the `percent` dictionary in a readable manner, ordered by percentage from highest to lowest. After taking in the same arguments as the `freq_table()` function, the `display_table()` function calls the former, loops through the `percent` dictionary, and stores the entries as tuples. In each tuple, the percentage value comes first, followed by the dictionary key. The tuples are then appended to the `table_display` list. 

The tuple conversion is used because calling `sorted()` directly on a dictionary sorts only its keys. The keys of the `percent` dictionary are the genre names, so sorting the dictionary directly would order the entries alphabetically by genre rather than by percentage. Placing the percentage first in each tuple makes `sorted()` compare the percentages. 

Once the `table_display` list is filled, the `sorted()` function arranges the entries in descending order. A **for loop** prints each entry in the intended order. 
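
As an alternative to building `(value, key)` tuples, `sorted()` also accepts a `key` argument that states what to sort by. The sketch below shows this on a small dictionary with illustrative values. It is not the approach used in this notebook.

```python
# Illustrative values only, standing in for the percent dictionary
percent = {'Education': 3.66, 'Games': 58.16, 'Entertainment': 7.88}

# Sort the (genre, share) pairs by the share, highest first
ranked = sorted(percent.items(), key=lambda item: item[1], reverse=True)

for genre, share in ranked:
    print(genre, ':', share)
```

One difference is worth noting: when two genres have exactly the same percentage, the tuple approach falls back to comparing the genre names, while the `key` approach keeps them in their original dictionary order.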

For the cleaned iOS apps data set, the index number of the `prime_genre` column is 11. For the cleaned Android apps data set, there are two columns that can be analyzed: the `Category` column, with index 1, and the `Genres` column, with index 9. The `Genres` column is more specific than the `Category` column. As this project is concerned with a more general perspective of which kinds of apps are popular, the `Category` column will be analyzed.

In [16]:
def display_table(dataset, index):
    
    table = freq_table(dataset, index)
    table_display = []
    for key in table:
        key_val_as_tuple = (table[key], key)
        table_display.append(key_val_as_tuple)

    table_sorted = sorted(table_display, reverse = True)
    for entry in table_sorted:
        print(entry[1], ':', entry[0])

print('App Store apps:')
display_table(apl_free, 11)
print('\n')
print('Google Play apps (Category):')
display_table(ggl_free, 1)

App Store apps:
Games : 58.16263190564867
Entertainment : 7.883302296710118
Photo & Video : 4.9658597144630665
Education : 3.662321539416512
Social Networking : 3.2898820608317814
Shopping : 2.60707635009311
Utilities : 2.5139664804469275
Sports : 2.1415270018621975
Music : 2.0484171322160147
Health & Fitness : 2.0173805090006205
Productivity : 1.7380509000620732
Lifestyle : 1.5828677839851024
News : 1.3345747982619491
Travel : 1.2414649286157666
Finance : 1.1173184357541899
Weather : 0.8690254500310366
Food & Drink : 0.8069522036002483
Reference : 0.5586592178770949
Business : 0.5276225946617008
Book : 0.4345127250155183
Navigation : 0.186219739292365
Medical : 0.186219739292365
Catalogs : 0.12414649286157665


Google Play apps (Category):
FAMILY : 18.907942238267147
GAME : 9.724729241877256
TOOLS : 8.461191335740072
BUSINESS : 4.591606498194946
LIFESTYLE : 3.9034296028880866
PRODUCTIVITY : 3.892148014440433
FINANCE : 3.7003610108303246
MEDICAL : 3.531137184115524
SPORTS : 3.395758122

As can be seen from the results above, the majority of the free English apps in the App Store data set fall under the Games genre, at 58.16%. This is followed by Entertainment at 7.88% and Photo & Video at 4.97%. It can be deduced that most of these apps are geared toward leisure and entertainment. More practical apps, such as those under Utilities, Productivity, and Finance, make up a much smaller portion. The rest of the list corroborates this, as most genres in the lower half, which have small percentages, serve practical purposes.

For the Google Play apps (Category) list, the results are more evenly distributed rather than dominated by a single genre. There appears to be a more balanced mix between apps for leisure and entertainment and apps for practical purposes. Although fun apps still hold the top two places, their percentages are not as large as those in the App Store list. 

Knowing the number of apps per genre in the data sets is still not enough to determine which genres attract the most users. It may be that the App Store is saturated with gaming apps but these apps do not attract as many users as apps in other genres. To figure this out, further analysis of the data sets will be done.

### Most Popular Apps by Genre: App Store

In order to determine the kinds of apps that attract the most users, the actual number of people that use such apps needs to be determined. The Google Play data set makes this easier through the `Installs` column. As no such column is present in the App Store data set, another column will be used as a substitute. The closest that can be used is the `rating_count_tot` column, which contains the total number of user ratings per app. 

As each app per genre contains varied amounts for the user ratings, the average per app per genre will be computed. This will then be used to determine the popularity of the apps by genre.  

In the `ratings_per_genre()` function below, the following arguments are taken:

- The pertinent final data set
- The index of the `prime_genre` column
- The index of the `rating_count_tot` column

A frequency table for the genres is first generated. 

A **nested for loop** is then created to compute the average number of user ratings per app per genre. The *outer* **for loop** loops through the `genre_list` dictionary. Two variables, `total` and `len_genre`, are then initialized with values of 0. In-line comments describe the variables. 

While the *outer* **for loop** is at a specific genre in the `genre_list` dictionary, the inner **for loop** loops through the data set and checks whether the genre of a specific app is equal to the current genre the *outer* loop is at. If this checks out, the values of the `total` and `len_genre` variables are increased accordingly. Please note that the `num_ratings` variable needs to be converted to a *float* type since it is stored as a string in the CSV file. 

Once the *inner* loop has finished its task, the average number of user ratings per app per genre is computed by dividing the total number of user ratings per genre by the total number of apps per genre. The genre name and the average number of user ratings are added to the `genre_ave` dictionary. The process is repeated until all genres in the `genre_list` dictionary have been looped through.

To show the contents of the `genre_ave` dictionary in descending order, the `display_table()` function is slightly modified to take in the same arguments as the `ratings_per_genre()` function. The name is also changed to `display_table_genre()` to avoid confusion and for correct execution of the code. 

The results of the operation are shown.

In [17]:
def ratings_per_genre(dataset, genre_index, ratings_index):
    genre_list = {}
    for row in dataset:
        if row[genre_index] in genre_list:
            genre_list[row[genre_index]] += 1
        elif row[genre_index] not in genre_list:
            genre_list[row[genre_index]] = 1
    
    # nested for loop
    genre_ave = {}   
    for genre in genre_list:
        total = 0 # This variable will store the sum of user ratings specific to each genre.
        len_genre = 0 # This variable will store the number of apps specific to each genre
        for row in dataset:
            genre_app = row[genre_index]
            if genre_app == genre:
                num_ratings = row[ratings_index]
                total += float(num_ratings)
                len_genre += 1
        ave = total/len_genre
        genre_ave[genre] = ave
    return genre_ave

def display_table_genre(dataset, genre_index, ratings_index):
    
    table = ratings_per_genre(dataset, genre_index, ratings_index)
    table_display = []
    for key in table:
        key_val_as_tuple = (table[key], key)
        table_display.append(key_val_as_tuple)

    table_sorted = sorted(table_display, reverse = True)
    for entry in table_sorted:
        print(entry[1], ':', entry[0])        
        
display_table_genre(apl_free, 11, 5)      

Navigation : 86090.33333333333
Reference : 74942.11111111111
Social Networking : 71548.34905660378
Music : 57326.530303030304
Weather : 52279.892857142855
Book : 39758.5
Food & Drink : 33333.92307692308
Finance : 31467.944444444445
Photo & Video : 28441.54375
Travel : 28243.8
Shopping : 26919.690476190477
Health & Fitness : 23298.015384615384
Sports : 23008.898550724636
Games : 22788.6696905016
News : 21248.023255813954
Productivity : 21028.410714285714
Utilities : 18684.456790123455
Lifestyle : 16485.764705882353
Entertainment : 14029.830708661417
Business : 7491.117647058823
Education : 7003.983050847458
Catalogs : 4004.0
Medical : 612.0
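
The nested loop in `ratings_per_genre()` rescans the whole data set once for every genre. As a sketch of an alternative, the same averages could be accumulated in a single pass by keeping a running total and a running count per genre. The function name `average_ratings_by_genre` and the toy rows are assumptions made for illustration.

```python
def average_ratings_by_genre(rows, genre_index, ratings_index):
    """Return {genre: average number of ratings per app} in one pass over rows."""
    totals = {}  # genre -> sum of rating counts
    counts = {}  # genre -> number of apps
    for row in rows:
        genre = row[genre_index]
        totals[genre] = totals.get(genre, 0.0) + float(row[ratings_index])
        counts[genre] = counts.get(genre, 0) + 1
    return {genre: totals[genre] / counts[genre] for genre in totals}

# Toy rows: [app name, genre, number of ratings]
toy_rows = [
    ['App A', 'Navigation', '10'],
    ['App B', 'Navigation', '30'],
    ['App C', 'Games', '5'],
]

averages = average_ratings_by_genre(toy_rows, 1, 2)
print(averages)  # {'Navigation': 20.0, 'Games': 5.0}
```

For data sets of a few thousand rows the difference in running time is small, so the nested loop remains a reasonable choice for readability.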


As can be seen from the results above, Navigation apps have the highest average number of user ratings. Gaming apps, which were at the top of the list for their percent share in the App Store, fall in 14th place in the list above. It is impressive to note that even at only a 0.186% share in the App Store, Navigation apps manage to get an average of 86,090.33 ratings per app. Although this appears striking at first, it would be good to see what apps comprise this genre.

To see the apps under the Navigation genre and the corresponding number of ratings, a **for loop** is created below. 

In [18]:
for row in apl_free:
    if row[11] == 'Navigation':
        print(row[1], ':', row[5])

Waze - GPS Navigation, Maps & Real-time Traffic : 345046
Google Maps - Navigation & Transit : 154911
Geocaching® : 12811
CoPilot GPS – Car Navigation & Offline Maps : 3582
ImmobilienScout24: Real Estate Search in Germany : 187
Railway Route Search : 5


The results show that there are only six (6) apps under the Navigation genre. The high average number of user ratings is driven by the two top apps, Waze and Google Maps. Navigation apps might therefore not be that popular in general, since only two apps get significant attention.  
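
The effect of the two largest apps on the average can be checked with a short calculation. The sketch below uses the six rating counts printed above and compares the mean with the median, which is less sensitive to extreme values. Using the median here is an illustration and not part of the original analysis.

```python
from statistics import mean, median

# Rating counts of the six Navigation apps, copied from the output above
navigation_ratings = [345046, 154911, 12811, 3582, 187, 5]

nav_mean = mean(navigation_ratings)
nav_median = median(navigation_ratings)

# Share of all Navigation ratings that belongs to the two largest apps
top_two_share = (345046 + 154911) / sum(navigation_ratings)

print('Mean:', nav_mean)
print('Median:', nav_median)  # 8196.5
print('Share held by the top two apps:', top_two_share)
```

The mean of about 86,090 is more than ten times the median of 8,196.5, and the top two apps account for roughly 97% of all ratings in the genre.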

Based on the results, it would not be practical for Mobile Free to enter the market for Navigation apps due to the stiff competition already present. With only a few apps under Navigation, it would be resource-intensive to gain a substantial user base in a market dominated by only two apps. Moreover, since revenue comes only from in-app ads, it would take a significant amount of time before a return on investment happens given the initial effort required. That time could be spent on more productive efforts. 

The same scenario of a few apps dominating the market can be seen in the genres of Reference, Social Networking, and Music with apps like Bible, Facebook, and Pandora topping the lists, respectively. 

In [19]:
print('iOS Travel apps')
for row in apl_free:
    if row[11] == 'Travel':
        print(row[1], ':', row[5])

iOS Travel apps
Google Earth : 446185
Yelp - Nearby Restaurants, Shopping & Services : 223885
GasBuddy : 145549
TripAdvisor Hotels Flights Restaurants : 56194
Uber : 49466
Lyft : 46922
HotelTonight - Great Deals on Last Minute Hotels : 32341
Hotels & Vacation Rentals by Booking.com : 31261
Southwest Airlines : 30552
Airbnb : 22302
Expedia Hotels, Flights & Vacation Package Deals : 10278
Fly Delta : 8094
Hopper - Predict, Watch & Book Flights : 6944
United Airlines : 5748
Skiplagged — Actually Cheap Flights & Hotels : 1851
Viator Tours & Activities : 1839
iExit Interstate Exit Guide : 1798
Gogo Entertainment : 1482
Google Street View : 1450
Webcams – EarthCam : 912
HISTORY Here : 685
DB Navigator : 512
Mobike - Dockless Bike Share : 494
MiFlight™ – Airport security line wait times at checkpoints for domestic and international travelers : 493
BlaBlaCar - Trusted Carpooling : 397
Six Flags : 353
Google Trips – Travel planner : 329
Voyages-sncf.com : book train and bus tickets : 268
Trainl

Checking the apps under the Travel genre, Google Earth tops with 446,185 user ratings. When a closer look is taken though, it can be observed that a variety of other apps with different purposes are present. There are apps for checking, booking, and managing reservations for hotels, restaurants, and flights. Ride-hailing services like Uber and Lyft are also present in the list. There are also apps focused on specific regions, such as Trainline UK and FlixBus which focus on train and bus travel. 

From the different user ratings, it can be said that Travel apps are popular enough to have a substantial user base. Aside from the top three apps on the list, the number of user ratings seems fairly evenly distributed among the rest of the apps. With the variety of Travel apps, there is potential in building an app that might stand out. Even the top three apps serve different purposes from each other.
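
As a rough check on how much the top apps dominate, the mean and median of the rating counts can be compared. The sketch below uses only the ten most-rated apps from the output above, not the full list, so the numbers are illustrative. The variable names are assumptions made for this example.

```python
from statistics import mean, median

# Rating counts of the ten most-rated free iOS Travel apps,
# copied from the output above (the full list is longer).
top_ten_ratings = [446185, 223885, 145549, 56194, 49466,
                   46922, 32341, 31261, 30552, 22302]

avg = mean(top_ten_ratings)
mid = median(top_ten_ratings)

# Share of all ratings in this sample that belong to the top three apps.
top_three_share = sum(top_ten_ratings[:3]) / sum(top_ten_ratings)

print('mean   :', round(avg))
print('median :', mid)
print('top-three share of ratings:', round(top_three_share, 2))
```

A mean well above the median indicates that a few heavily rated apps pull the average up, which is the pattern described for the top three positions.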

A potential app that Mobile Free can build is one that allows travelers to check local tourist destinations for a specific area. Users can input a location or select from those available in the app, say Honolulu, Hawaii. The app can then recommend popular attractions in the locale, complete with a description, reviews, pictures, and things to do per destination. Filters such as budget or price, popularity ratings, and proximity to the location, among others, can be used to tailor the results to the specific needs of the traveler. An automatic itinerary generator can then show various itinerary options based on the user input. 

If user response is positive, additional features can be added. Instead of having to rely on third party apps, users can use Mobile Free's app for recommendations on accommodation and travel packages while also accounting for filters. 

Since the App Store is saturated with apps for leisure and entertainment, this app profile recommendation can have a better chance of standing out.

# Most Popular Apps by genre: Google Play 

As previously mentioned, the Google Play data set makes determining popular apps easier through the `Installs` column. Looking at this column though, it can be noticed that the values are not exact. 

Below, it can be seen that the install counts per app are open-ended brackets (for example, `10,000+`) rather than exact figures. Accuracy of the values is not of primary concern though, since the focus is on the popularity of an app genre. The numbers will be left as is.

In [20]:
count = 0
for row in ggl_free:
    count += 1
    print(row[0], ':', row[5])       
    if count == 10:
        break

Photo Editor & Candy Camera & Grid & ScrapBook : 10,000+
U Launcher Lite – FREE Live Cool Themes, Hide Apps : 5,000,000+
Sketch - Draw & Paint : 50,000,000+
Pixel Draw - Number Art Coloring Book : 100,000+
Paper flowers instructions : 50,000+
Smoke Effect Photo Maker - Smoke Editor : 50,000+
Infinite Painter : 1,000,000+
Garden Coloring Book : 1,000,000+
Kids Paint Free - Drawing Fun : 10,000+
Text on Photo - Fonteee : 1,000,000+


Below, two functions are created. They are practically the same as those created for the iOS apps data set for determining and displaying the average number of user ratings. The only differences lie in the function names, the parameter names, some variable names, and the content of the `installs_per_genre()` function. Since the values under the `Installs` column contain non-numeric characters (`+` and `,`), these characters need to be removed first for the `float()` function to work properly. Everything else is the same as in the previous functions for the iOS apps data set. 
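
The cleaning step can be looked at in isolation first. This is a minimal sketch; the helper name `clean_installs` is an assumption made for this example and is not part of the notebook's functions.

```python
def clean_installs(raw):
    """Convert an install string such as '1,000,000+' to a float."""
    # Strip the trailing '+' and the thousands separators before converting.
    return float(raw.replace('+', '').replace(',', ''))

# Install strings as they appear in the Google Play data set.
for raw in ['10,000+', '5,000,000+', '50,000,000+']:
    print(raw, '->', clean_installs(raw))
```

Calling `float()` directly on a string like `'10,000+'` would raise a `ValueError`, which is why both characters are removed first.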

In [21]:
def installs_per_genre(dataset, category_index, installs_index):
    category_list = {}
    for row in dataset:
        if row[category_index] in category_list:
            category_list[row[category_index]] += 1
        else:
            category_list[row[category_index]] = 1
            
    category_ave = {}    
    for category in category_list:
        total = 0 # This variable will store the sum of installs specific to each genre.
        len_category = 0 # This variable will store the number of apps specific to each genre
        for row in dataset:
            category_app = row[category_index]
            if category_app == category:
                num_installs = row[installs_index]
                num_installs = num_installs.replace('+','')
                num_installs = num_installs.replace(',','')
                total += float(num_installs)
                len_category += 1
        ave = total/len_category     
        category_ave[category] = ave         
    return category_ave
        
def display_table_installs(dataset, category_index, installs_index):
    
    table = installs_per_genre(dataset, category_index, installs_index)
    table_display = []
    for key in table:
        key_val_as_tuple = (table[key], key)
        table_display.append(key_val_as_tuple)

    table_sorted = sorted(table_display, reverse = True)
    for entry in table_sorted:
        print(entry[1], ':', entry[0])        
        
display_table_installs(ggl_free, 1, 5)      

COMMUNICATION : 38456119.167247385
VIDEO_PLAYERS : 24727872.452830188
SOCIAL : 23253652.127118643
PHOTOGRAPHY : 17840110.40229885
PRODUCTIVITY : 16787331.344927534
GAME : 15588015.603248259
TRAVEL_AND_LOCAL : 13984077.710144928
ENTERTAINMENT : 11640705.88235294
TOOLS : 10801391.298666667
NEWS_AND_MAGAZINES : 9549178.467741935
BOOKS_AND_REFERENCE : 8767811.894736841
SHOPPING : 7036877.311557789
PERSONALIZATION : 5201482.6122448975
WEATHER : 5074486.197183099
HEALTH_AND_FITNESS : 4188821.9853479853
MAPS_AND_NAVIGATION : 4056941.7741935486
FAMILY : 3695641.8198090694
SPORTS : 3638640.1428571427
ART_AND_DESIGN : 1986335.0877192982
FOOD_AND_DRINK : 1924897.7363636363
EDUCATION : 1833495.145631068
BUSINESS : 1712290.1474201474
LIFESTYLE : 1437816.2687861272
FINANCE : 1387692.475609756
HOUSE_AND_HOME : 1331540.5616438356
DATING : 854028.8303030303
COMICS : 817657.2727272727
AUTO_AND_VEHICLES : 647317.8170731707
LIBRARIES_AND_DEMO : 638503.734939759
PARENTING : 542603.6206896552
BEAUTY : 51315
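
The function above scans the whole data set once per category. A possible alternative, sketched here rather than taken from the notebook, keeps a running total and count per category and needs only one pass. The function name, the `make_row` helper, and the `EXAMPLE` category are assumptions made for illustration; the three Travel rows reuse install values shown elsewhere in this notebook.

```python
def installs_per_genre_single_pass(dataset, category_index, installs_index):
    """Return {category: average installs} using one pass over the rows."""
    totals = {}  # category -> [sum of installs, number of apps]
    for row in dataset:
        category = row[category_index]
        installs = float(row[installs_index].replace('+', '').replace(',', ''))
        running = totals.setdefault(category, [0.0, 0])
        running[0] += installs
        running[1] += 1
    return {category: total / count
            for category, (total, count) in totals.items()}


def make_row(name, category, installs):
    """Hypothetical helper: build a row with the category at index 1
    and the installs at index 5, padding the other columns."""
    return [name, category, '', '', '', installs]


sample_rows = [
    make_row('trivago: Hotels & Travel', 'TRAVEL_AND_LOCAL', '50,000,000+'),
    make_row('Hopper - Watch & Book Flights', 'TRAVEL_AND_LOCAL', '5,000,000+'),
    make_row('TripIt: Travel Organizer', 'TRAVEL_AND_LOCAL', '1,000,000+'),
    make_row('Placeholder app A', 'EXAMPLE', '10,000+'),  # made-up row
    make_row('Placeholder app B', 'EXAMPLE', '30,000+'),  # made-up row
]

averages = installs_per_genre_single_pass(sample_rows, 1, 5)
for category, average in averages.items():
    print(category, ':', average)
```

Both approaches should produce the same averages; the single-pass version mainly matters for larger data sets.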

From the list above, Communication apps come first with an average of 38,456,119.17 installs. Since the app profile recommended for the App Store is a Travel app, the apps under the TRAVEL_AND_LOCAL category of Google Play are analyzed next. The goal of the project is to determine an app profile for both the App Store and Google Play, so this step is consistent with that objective. Analyzing this category will show whether a Travel app can be profitable for Mobile Free among both Android and iOS users.   

In [22]:
print('Android Travel apps')
for row in ggl_free:
    if row[1] == 'TRAVEL_AND_LOCAL':
        print(row[0], ':', row[5])

Android Travel apps
trivago: Hotels & Travel : 50,000,000+
Hopper - Watch & Book Flights : 5,000,000+
TripIt: Travel Organizer : 1,000,000+
Trip by Skyscanner - City & Travel Guide : 500,000+
CityMaps2Go Plan Trips Travel Guide Offline Maps : 1,000,000+
KAYAK Flights, Hotels & Cars : 10,000,000+
World Travel Guide by Triposo : 500,000+
Booking.com Travel Deals : 100,000,000+
Hostelworld: Hostels & Cheap Hotels Travel App : 1,000,000+
Google Trips - Travel Planner : 5,000,000+
GPS Map Free : 5,000,000+
GasBuddy: Find Cheap Gas : 10,000,000+
Southwest Airlines : 5,000,000+
AT&T Navigator: Maps, Traffic : 10,000,000+
VZ Navigator : 50,000,000+
KakaoMap - Map / Navigation : 10,000,000+
AirAsia : 10,000,000+
Expedia Hotels, Flights & Car Rental Travel Deals : 10,000,000+
Goibibo - Flight Hotel Bus Car IRCTC Booking App : 10,000,000+
Allegiant : 1,000,000+
Amtrak : 1,000,000+
JAL (Domestic and international flights) : 1,000,000+
Flight & Hotel Booking App - ixigo : 5,000,000+
VZ Navigator fo

One observation from the list of Android travel apps is that quite a number of them have a significant number of installs. This demonstrates the popularity of such apps, similar to those in the App Store. Compared to the App Store list though, there are many more apps in this category on Google Play. 
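
One way to make the install observation concrete is to tally how many apps fall into each install bracket. The sketch below uses only the first ten rows of the output above, so it illustrates the approach rather than summarizing the whole category. The names `first_ten_installs` and `as_number` are assumptions made for this example.

```python
from collections import Counter

# Install brackets of the first ten Android Travel apps listed above.
first_ten_installs = [
    '50,000,000+', '5,000,000+', '1,000,000+', '500,000+', '1,000,000+',
    '10,000,000+', '500,000+', '100,000,000+', '1,000,000+', '5,000,000+',
]


def as_number(bracket):
    """Turn a bracket such as '5,000,000+' into its integer lower bound."""
    return int(bracket.replace('+', '').replace(',', ''))


bracket_counts = Counter(first_ten_installs)

# Print the brackets from largest to smallest with their app counts.
for bracket in sorted(bracket_counts, key=as_number, reverse=True):
    print(bracket, ':', bracket_counts[bracket])

# Number of apps in the sample with at least one million installs.
at_least_one_million = sum(
    1 for bracket in first_ten_installs if as_number(bracket) >= 1_000_000
)
print('apps with 1,000,000+ installs or more:', at_least_one_million)
```

Running the same tally over every row in `ggl_free` whose category is `TRAVEL_AND_LOCAL` would give the distribution for the full category.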

Just like the App Store list, Google Play also has apps for checking, booking, and managing reservations for hotels, restaurants, and flights (trivago, Hopper, and Hostelworld to name a few). Apps focused on specific locations are also on the list, such as Florida Travel Guide and Santa Fe Map and Walks. 

After scanning through the list, it looks like the app profile recommended for the App Store is applicable here as well. Despite the greater number of travel apps on Google Play, few of them compete directly with the recommended profile, World Travel Guide by Triposo being one of those few. The recommendation also has a chance of working since, as observed, Google Play has a good mix of apps for leisure and entertainment and apps for practical purposes.

# Conclusion  

Based on the analysis conducted on apps for both the App Store and Google Play, a recommendation for a Travel app profile was made. It appeared to be a viable option as apps for Travel are quite popular for both iOS and Android users. The specific recommendation also has a chance of standing out as there is not much competition for its particular niche. 

Mobile Free can build an app that allows travelers to check local tourist destinations for a specific area. They can input a certain location or select from available locations in the app. The app can then recommend popular attractions in the specified location, complete with a description, reviews, pictures, and things to do per destination. Filters, such as budget or price, popularity ratings, proximity from the location, among others, can be used to cater the results towards the specific needs of the traveler. An automatic itinerary generator can then show various itinerary options based on the user input. 

If user response is positive, additional features can be added. Instead of having to rely on third party apps, Mobile Free's app can also recommend accommodation and travel packages for the user while also accounting for filters. 

As stated in the Validation Strategy, the app will first be tested on Google Play. Depending on user response and profitability, the app can be improved and eventually added to the App Store. 
