# What App Genres Attract Users?

For this project, I'm working as a data analyst for a company that builds Android and iOS mobile apps. We make our apps available on Google Play and in the App Store.

We only build apps that are free to download and install, and our main source of revenue consists of in-app ads. This means that the number of users of our apps determines our revenue for any given app — the more users who see and engage with the ads, the better. Our goal for this project is to analyze data to help our developers understand **what type of apps are likely to attract more users**. We will do so using data from the [Google Play Store](https://www.kaggle.com/datasets/lava18/google-play-store-apps) and from the [Apple Store](https://www.kaggle.com/datasets/ramamet4/app-store-apple-data-set-10k-apps), given that these existing samples should be able to tell us generalizable information relevant to our mission.

## Importing the Data Sets

In [1]:
opened_file = open('C:/Users/Curtiss Chapman/Desktop/Data Science Training/Dataquest/Data Scientist in Python/Datasets/AppleStore.csv', encoding='utf8')
from csv import reader
read_file = reader(opened_file)
apple_data = list(read_file)

opened_file = open('C:/Users/Curtiss Chapman/Desktop/Data Science Training/Dataquest/Data Scientist in Python/Datasets/googleplaystore.csv', encoding='utf8')
read_file = reader(opened_file)
google_data = list(read_file)


## Defining Key Functions

We'll want a function to look easily at our data sets, checking a particular slice of data and displaying the number of rows and columns.

In [2]:
def explore_data(dataset, start, end, rows_and_columns=False):
    dataset_slice = dataset[start:end]    
    for row in dataset_slice:
        print(row)
        print('\n') # adds a new (empty) line after each row

    if rows_and_columns:
        print('Number of rows:', len(dataset))
        print('Number of columns:', len(dataset[0]))

## Exploring the Data

First, here are keys to the dataset variables:

| Store | Column name | Description |
| --- | --- | --- |
| Apple | "id" | App ID |
| Apple | "track name" | App Name |
| Apple | "size_bytes" | Size (Bytes) |
| Apple | "currency" | Currency Type |
| Apple | "price" | Price amount |
| Apple | "rating_count_tot" | User Rating counts (for all version) |
| Apple | "rating_count_ver" | User Rating counts (for current version) |
| Apple | "user_rating" | Average User Rating value (for all version) |
| Apple | "user_rating_ver" | Average User Rating value (for current version) |
| Apple | "ver" | Latest version code |
| Apple | "cont_rating" | Content Rating |
| Apple | "prime_genre" | Primary Genre |
| Apple | "sup_devices.num" | Number of supporting devices |
| Apple | "ipadSc_urls.num" | Number of screenshots showed for display |
| Apple | "lang.num" | Number of supported languages |
| Apple | "vpp_lic" | Vpp Device Based Licensing Enabled |
| Google | "App" | App ID |
| Google | "Category" | Primary Genre |
| Google | "Rating" | Average User Rating |
| Google | "Reviews" | Number of Reviews |
| Google | "Size" | Size |
| Google | "Installs" | Number of downloads/installs |
| Google | "Type" | Paid or Free |
| Google | "Price" | Price |
| Google | "Content Rating" | Content Rating |
| Google | "Genres" | All Genres |
| Google | "Last Updated" | Date Last Updated |
| Google | "Current Ver" | Current Version |
| Google | "Android Ver" | Android Version |

In [3]:
print('\n--- Google Data --- ')
explore_data(google_data, 0, 5, rows_and_columns=True)
print('--- Apple Data --- ')
explore_data(apple_data, 0, 5, rows_and_columns=True)


--- Google Data --- 
['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']


['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['Coloring book moana', 'ART_AND_DESIGN', '3.9', '967', '14M', '500,000+', 'Free', '0', 'Everyone', 'Art & Design;Pretend Play', 'January 15, 2018', '2.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 and up']


Number of rows: 10842
Number of columns: 13
--- Apple Data --- 
['', 'id', 'track_name', 

Quite a few variables seem intersting for looking into what drives app popularity. 

In the Google data set, the most interesting are `Category`, `Rating`, `Reviews`, `Installs`, and `Genres`. 

In the Apple data set, the most interesting are `rating_count_tot`, `user_rating`, `prime_genre`, and `sup_devices.num`. 

Of these interesting variables, those that seem comparable across data sets are `Category` with `prime_genre` and `Reviews` or `Installs` with `rating_count_tot`.

## Cleaning the Data

### First, find app with missing data (from Google dataset discussion) and delete it.

In [4]:
explore_data(google_data, 10469, 10475)
print(google_data[10473])


['Tassa.fi Finland', 'LIFESTYLE', '3.6', '346', '7.5M', '50,000+', 'Free', '0', 'Everyone', 'Lifestyle', 'May 22, 2018', '5.5', '4.0 and up']


['TownWiFi | Wi-Fi Everywhere', 'COMMUNICATION', '3.9', '2372', '58M', '500,000+', 'Free', '0', 'Everyone', 'Communication', 'August 2, 2018', '4.2.1', '4.2 and up']


['Jazz Wi-Fi', 'COMMUNICATION', '3.4', '49', '4.0M', '10,000+', 'Free', '0', 'Everyone', 'Communication', 'February 10, 2017', '0.1', '2.3 and up']


['Xposed Wi-Fi-Pwd', 'PERSONALIZATION', '3.5', '1042', '404k', '100,000+', 'Free', '0', 'Everyone', 'Personalization', 'August 5, 2014', '3.0.0', '4.0.3 and up']


['Life Made WI-Fi Touchscreen Photo Frame', '1.9', '19', '3.0M', '1,000+', 'Free', '0', 'Everyone', '', 'February 11, 2018', '1.0.19', '4.0 and up']


['osmino Wi-Fi: free WiFi', 'TOOLS', '4.2', '134203', '4.1M', '10,000,000+', 'Free', '0', 'Everyone', 'Tools', 'August 7, 2018', '6.06.14', '4.4 and up']


['Life Made WI-Fi Touchscreen Photo Frame', '1.9', '19', '3.0M', '1

Row 10473 ('Life Made WI-Fi Touchscreen Photo Frame') is missing a category. Because we can't recover the category label, let's delete it.

In [5]:
del google_data[10473]

And then we'll check the data set to make sure it's gone.

In [6]:
explore_data(google_data, 10469, 10475)

['Tassa.fi Finland', 'LIFESTYLE', '3.6', '346', '7.5M', '50,000+', 'Free', '0', 'Everyone', 'Lifestyle', 'May 22, 2018', '5.5', '4.0 and up']


['TownWiFi | Wi-Fi Everywhere', 'COMMUNICATION', '3.9', '2372', '58M', '500,000+', 'Free', '0', 'Everyone', 'Communication', 'August 2, 2018', '4.2.1', '4.2 and up']


['Jazz Wi-Fi', 'COMMUNICATION', '3.4', '49', '4.0M', '10,000+', 'Free', '0', 'Everyone', 'Communication', 'February 10, 2017', '0.1', '2.3 and up']


['Xposed Wi-Fi-Pwd', 'PERSONALIZATION', '3.5', '1042', '404k', '100,000+', 'Free', '0', 'Everyone', 'Personalization', 'August 5, 2014', '3.0.0', '4.0.3 and up']


['osmino Wi-Fi: free WiFi', 'TOOLS', '4.2', '134203', '4.1M', '10,000,000+', 'Free', '0', 'Everyone', 'Tools', 'August 7, 2018', '6.06.14', '4.4 and up']


['Sat-Fi Voice', 'COMMUNICATION', '3.4', '37', '14M', '1,000+', 'Free', '0', 'Everyone', 'Communication', 'November 21, 2014', '2.2.1.5', '2.2 and up']




It's gone.

### Are all rows the same length in both data sets?

Now we'll create a function that creates a frequency table for number of columns in the data set, which we can use for both data sets.


In [7]:
def ft_ncols(dataset):
    ncols_ft = {}

    for row in dataset:
        ncols = str(len(row))
        #print(ncols)
        if ncols in ncols_ft:
            ncols_ft[ncols] += 1
        elif ncols not in ncols_ft:
            ncols_ft[ncols] = 1
        
    print(ncols_ft)


We can use this function to take a look at whether there is more than one variable representing the number of columns in the data set.

In [8]:
print('Google')
ft_ncols(google_data)
print('Apple')
ft_ncols(apple_data)

Google
{'13': 10841}
Apple
{'17': 7198}


Because there is only one length of each row, we know that all rows are the same length in each data set.

### Are there duplicate apps in either data set?

In any data set, we want to be wary that there are no duplicated entries. Here, we check each data set for applications that appear more than once.

In [9]:
unique_apps_apple = [] 
duplicate_apps_apple = [] 

for app in apple_data: 
    app_name = app[1] 

    if app_name not in unique_apps_apple:
        unique_apps_apple.append(app_name)
    else:
        duplicate_apps_apple.append(app_name)
            
unique_apps_google = [] 
duplicate_apps_google = [] 

for app in google_data: 
    app_name = app[0] 

    if app_name not in unique_apps_google:
        unique_apps_google.append(app_name)
    else:
        duplicate_apps_google.append(app_name)
        
print('Apple')      
print(len(duplicate_apps_apple))
print('Google')
print(len(duplicate_apps_google))


Apple
0
Google
1181


There are no duplicates in the Apple store, but there are lots of duplicates in the Google app store. Let's look at a few in the context of the data to see what we might use to distinguish them and trim them.

In [10]:
for app in google_data:
    if app[0] == 'Quick PDF Scanner + OCR FREE':
        print(app)
    
for app in google_data:
    if app[0] == 'Box':
        print(app)
        
for app in google_data:
    if app[0] == 'Google My Business':
        print(app)
    

['Quick PDF Scanner + OCR FREE', 'BUSINESS', '4.2', '80805', 'Varies with device', '5,000,000+', 'Free', '0', 'Everyone', 'Business', 'February 26, 2018', 'Varies with device', '4.0.3 and up']
['Quick PDF Scanner + OCR FREE', 'BUSINESS', '4.2', '80805', 'Varies with device', '5,000,000+', 'Free', '0', 'Everyone', 'Business', 'February 26, 2018', 'Varies with device', '4.0.3 and up']
['Quick PDF Scanner + OCR FREE', 'BUSINESS', '4.2', '80804', 'Varies with device', '5,000,000+', 'Free', '0', 'Everyone', 'Business', 'February 26, 2018', 'Varies with device', '4.0.3 and up']
['Box', 'BUSINESS', '4.2', '159872', 'Varies with device', '10,000,000+', 'Free', '0', 'Everyone', 'Business', 'July 31, 2018', 'Varies with device', 'Varies with device']
['Box', 'BUSINESS', '4.2', '159872', 'Varies with device', '10,000,000+', 'Free', '0', 'Everyone', 'Business', 'July 31, 2018', 'Varies with device', 'Varies with device']
['Box', 'BUSINESS', '4.2', '159872', 'Varies with device', '10,000,000+', 'Fr

We want to remove duplicates so that we're not counting apps more than once, but don't want to remove the duplicates randomly. The main variable that seems to differ between the duplicates is the number of reviews. Given that we want to know what features draw more users to apps, it may be helpful to keep entries with more reviews. More reviews should give a more representative overall rating variable, and we expect overall rating to relate to the draw of the app.

To do so, we'll first make a dictionary of the maximum number of reviews for each app.

In [11]:
g_reviews_max = {}

for row in google_data[1:]:
    name = row[0]
    reviews = float(row[3])
    if (name in g_reviews_max and g_reviews_max[name] <= reviews):
        g_reviews_max[name] = reviews
    else:
        g_reviews_max[name] = reviews
        
#print(g_reviews_max)

Then, we'll use that dictionary to create a new data set, where rows from the old dataset are only included if they match the maximum number of reviews for a given app and have not yet been added to the data set.

In [12]:
google_clean = []
already_added = []

for row in google_data[1:]:
    name = row[0]
    reviews = float(row[3])
    if reviews == g_reviews_max[name] and name not in already_added:
        google_clean.append(row)
        already_added.append(name)
        
print(len(google_clean))

9659


### Remove non-English entries from the data set

Looking through the data, we find that some app names suggest they're not direct toward an English-speaking audience. Because our target audience will be English-speaking, we don't want these apps to count toward our strategy. We'll remove them from the data set.

Commonly used English text characters range from 0 to 127 in the ASCII system, so we can use this ASCII range to filter the datasets further with the function `ord()`.

First, we'll write a function that determines if any character in a string is a non-English character. If it encounters a non-English character, the loop stops and returns "False". If it finishes processing the word and no character is non-English, it returns "True".

In [13]:
def string_eng(string):
    for i in string:
        char_ascii = ord(i)
        if char_ascii > 127:
            return False
        # print(char_ascii) # check on whether the return statement is stopping the loop
    return True

Now, let's check whether the function works.

In [14]:
print(string_eng('Instagram'))
print(string_eng('爱奇艺PPS -《欢乐颂2》电视剧热播'))
print(string_eng('Docs To Go™ Free Office Suite'))
print(string_eng('Instachat 😜'))

True
False
False
False


The last two app names throw flags because emojis and 'TM' are not in ASCII 0-127. Because we don't want to filter out app names like the last two, we'll change the criterion of exclusion in the function we just defined. Instead of any single non-English character excluding an app, we'll say if there are more than 3 characters outside of the range, exclude it.

To do so, we'll increment a counter of non-English characters and use that as the filter in the 'if' statement.

In [15]:
def string_eng(string):
    nonEng_chars = 0
    for i in string:
        char_ascii = ord(i)
        if char_ascii > 127:
            nonEng_chars +=1
    
    if nonEng_chars <= 3:
        return True
    elif nonEng_chars > 3:
        return False

Now, let's test it again.

In [16]:
print(string_eng('Instagram'))
print(string_eng('爱奇艺PPS -《欢乐颂2》电视剧热播'))
print(string_eng('Docs To Go™ Free Office Suite'))
print(string_eng('Instachat 😜'))

True
False
True
True


Now it's working how we want it to. Let's create new data sets removing apps with non-English names.

In [17]:
google_clean_2 = []

for row in google_clean:
    name = row[0]
    if string_eng(name):
        google_clean_2.append(row)
        
apple_clean = []

for row in apple_data[1:]:
    name = row[2]
    if string_eng(name):
        apple_clean.append(row)
    #print(name)

Now let's see how much it reduced our data sets in size.

In [18]:
print(len(google_clean))
print(len(google_clean_2))
print(len(apple_data))
print(len(apple_clean))

9659
9614
7198
6183


About 45 apps were removed from the Google data set, roughly 1000 from the Apple data set.

### Remove Non-Free Apps

Because our company will make a free app, we want our strategy to be based on other free apps. Therefore, we'll want to remove non-free apps.

Before we do so, we'll look at what the unique values of prices are and how they're formatted. We'll do this by creating a frequency table of prices for each data set.

In [19]:
g_price_ft = {}

for row in google_clean_2:
    price = row[7]
    if price in g_price_ft:
        g_price_ft[price] += 1
    elif price not in g_price_ft:
        g_price_ft[price] = 1

print('Google')
print(g_price_ft)

ap_price_ft = {}

for row in apple_clean:
    price = row[5]
    if price in ap_price_ft:
        ap_price_ft[price] += 1
    elif price not in ap_price_ft:
        ap_price_ft[price] = 1

print('Apple')
print(ap_price_ft)


Google
{'0': 8864, '$4.99': 70, '$3.99': 56, '$1.49': 45, '$2.99': 124, '$7.99': 7, '$5.99': 26, '$3.49': 7, '$1.99': 73, '$6.99': 10, '$9.99': 19, '$7.49': 2, '$0.99': 145, '$9.00': 1, '$5.49': 5, '$10.00': 2, '$11.99': 3, '$79.99': 1, '$16.99': 2, '$14.99': 9, '$1.00': 3, '$29.99': 5, '$2.49': 25, '$24.99': 3, '$10.99': 1, '$1.50': 1, '$19.99': 5, '$15.99': 1, '$33.99': 1, '$74.99': 1, '$39.99': 2, '$4.49': 9, '$1.70': 2, '$8.99': 5, '$2.00': 3, '$3.88': 1, '$25.99': 1, '$399.99': 11, '$17.99': 2, '$400.00': 1, '$3.02': 1, '$1.76': 1, '$4.84': 1, '$4.77': 1, '$1.61': 1, '$2.50': 1, '$1.59': 1, '$6.49': 5, '$1.29': 1, '$5.00': 1, '$13.99': 2, '$299.99': 1, '$379.99': 1, '$37.99': 1, '$18.99': 1, '$389.99': 1, '$19.90': 1, '$8.49': 2, '$1.75': 1, '$14.00': 1, '$4.85': 1, '$46.99': 1, '$109.99': 1, '$3.95': 1, '$154.99': 1, '$3.08': 1, '$2.59': 1, '$4.80': 1, '$1.96': 1, '$19.40': 1, '$3.90': 1, '$4.59': 1, '$15.46': 1, '$3.04': 1, '$12.99': 3, '$4.29': 1, '$2.60': 1, '$3.28': 1, '$4.60

Prices are strings in both data sets, and free apps are marked as '0' in both. There should be 8864 free Google apps and 4056 free Apple apps. Let's create new data sets by filtering with the string '0'.

In [20]:
google_clean_3 = []

for row in google_clean_2:
    price = row[7]
    if price == '0':
        google_clean_3.append(row)
    

apple_clean_2 = []

for row in apple_clean:
    price = row[5]
    if price == '0':
        apple_clean_2.append(row)

Now, let's see if the data sets are the right size.

In [21]:
print(len(google_clean_3))
print(len(apple_clean_2))

8864
3222


Bingo.

## Build Frequency Tables for Variables

To minimize risks and overhead, our validation strategy for an app idea has three steps:

1) Build a minimal Android version of the app, and add it to Google Play.

2) If the app has a good response from users, we develop it further.

3) If the app is profitable after six months, we build an iOS version of the app and add it to the App Store.

Because our end goal is to add the app on both Google Play and the App Store, we need to find app profiles that are successful in both markets.

To begin looking into important factors for app popularity, let's create a function that can create frequency tables for a variable in a data set, given a data set and column number, since we'll be doing that pretty often. This will also express frequencies as a percentage of the total rather than just a count.

In [22]:
def freq_table(dataset, col):
    ft = {}

    for row in dataset:
        var = row[col]
        
        if var in ft:
            ft[var] += 1
        elif var not in ft:
            ft[var] = 1
    
    sum_ft = 0
    for key in ft:
        sum_ft = sum_ft + ft[key]
    
    for key in ft:
        val_perc = (ft[key] / sum_ft) * 100
        val_perc_round = round(val_perc, 2)
        ft[key] = val_perc_round
        
    return ft

Now, let's look at the most common genres in each app store's market.

In [23]:
genre_ft_g = freq_table(google_clean_3, 1)
genre_ft_ap = freq_table(apple_clean_2, 12)
print(genre_ft_g)
print(genre_ft_ap)

{'ART_AND_DESIGN': 0.64, 'AUTO_AND_VEHICLES': 0.93, 'BEAUTY': 0.6, 'BOOKS_AND_REFERENCE': 2.14, 'BUSINESS': 4.58, 'COMICS': 0.62, 'COMMUNICATION': 3.25, 'DATING': 1.86, 'EDUCATION': 1.13, 'ENTERTAINMENT': 0.88, 'EVENTS': 0.71, 'FINANCE': 3.7, 'FOOD_AND_DRINK': 1.24, 'HEALTH_AND_FITNESS': 3.07, 'HOUSE_AND_HOME': 0.82, 'LIBRARIES_AND_DEMO': 0.94, 'LIFESTYLE': 3.9, 'GAME': 9.51, 'FAMILY': 19.22, 'MEDICAL': 3.54, 'SOCIAL': 2.66, 'SHOPPING': 2.25, 'PHOTOGRAPHY': 2.94, 'SPORTS': 3.42, 'TRAVEL_AND_LOCAL': 2.34, 'TOOLS': 8.46, 'PERSONALIZATION': 3.32, 'PRODUCTIVITY': 3.89, 'PARENTING': 0.65, 'WEATHER': 0.8, 'VIDEO_PLAYERS': 1.78, 'NEWS_AND_MAGAZINES': 2.8, 'MAPS_AND_NAVIGATION': 1.4}
{'Productivity': 1.74, 'Weather': 0.87, 'Shopping': 2.61, 'Reference': 0.56, 'Finance': 1.12, 'Music': 2.05, 'Utilities': 2.51, 'Travel': 1.24, 'Social Networking': 3.29, 'Sports': 2.14, 'Health & Fitness': 2.02, 'Games': 58.16, 'Food & Drink': 0.81, 'News': 1.33, 'Book': 0.43, 'Photo & Video': 4.97, 'Entertainmen

This isn't very easy to look at, so let's build a function to sort these genres by number of entries and embed the freq table function in it.

In [24]:
def display_table(dataset, index):
    table = freq_table(dataset, index)
    table_display = []
    for key in table:
        key_val_as_tuple = (table[key], key)
        table_display.append(key_val_as_tuple)

    table_sorted = sorted(table_display, reverse = True)
    for entry in table_sorted:
        print(entry[1], ':', entry[0])

Now, let's display some tables of apps by category.

In [25]:
print('--- Google Category ---')
display_table(google_clean_3, 1)
print('--- Apple Prime Genre ---')
display_table(apple_clean_2, 12)
print('--- Google Genres ---')
display_table(google_clean_3, 9)

--- Google Category ---
FAMILY : 19.22
GAME : 9.51
TOOLS : 8.46
BUSINESS : 4.58
LIFESTYLE : 3.9
PRODUCTIVITY : 3.89
FINANCE : 3.7
MEDICAL : 3.54
SPORTS : 3.42
PERSONALIZATION : 3.32
COMMUNICATION : 3.25
HEALTH_AND_FITNESS : 3.07
PHOTOGRAPHY : 2.94
NEWS_AND_MAGAZINES : 2.8
SOCIAL : 2.66
TRAVEL_AND_LOCAL : 2.34
SHOPPING : 2.25
BOOKS_AND_REFERENCE : 2.14
DATING : 1.86
VIDEO_PLAYERS : 1.78
MAPS_AND_NAVIGATION : 1.4
FOOD_AND_DRINK : 1.24
EDUCATION : 1.13
LIBRARIES_AND_DEMO : 0.94
AUTO_AND_VEHICLES : 0.93
ENTERTAINMENT : 0.88
HOUSE_AND_HOME : 0.82
WEATHER : 0.8
EVENTS : 0.71
PARENTING : 0.65
ART_AND_DESIGN : 0.64
COMICS : 0.62
BEAUTY : 0.6
--- Apple Prime Genre ---
Games : 58.16
Entertainment : 7.88
Photo & Video : 4.97
Education : 3.66
Social Networking : 3.29
Shopping : 2.61
Utilities : 2.51
Sports : 2.14
Music : 2.05
Health & Fitness : 2.02
Productivity : 1.74
Lifestyle : 1.58
News : 1.33
Travel : 1.24
Finance : 1.12
Weather : 0.87
Food & Drink : 0.81
Reference : 0.56
Business : 0.53
Book


For the Apple data set, games takes a far lead (55.7%), followed by entertainment (8.2%) and photo & video apps (4.1%). These are followed by social networking, education, shopping, utilities, lifestyle, and finance apps.

For the Google data set, family apps (19.2%), games (9.5%), and tools (8.5%) are the most common categories. These are followed by business, lifestyle, productivity, finance, and medical apps (all 3.5-4.6%).

When considering multiple genres in the Google set, the top set are tools (8.45%), entertainment (6.1%), and education (5.4%). However, because many of the genres are subgenres, this variable diffuses the impact of larger categories. For example, many genre labels include what seem to be styles of game apps: puzzle, racing, strategy, role playing, casino, casual;action & adventure, etc. 

As a whole, games and other entertainment, including social media, take up a fair share of the percentage of apps, with games being a top percentage in both Google and Apple stores. However, the trends in the two stores differ for the top set of apps represented. The Google store predominantly has practical apps, whereas the Apple store predominantly has entertainment apps.

The fact that a large number of apps of a certain kind exist in the store may be an indicator that there is demand for that kind of app, but this is not certain without measuring the number of users for a given app genre. For example, it could be that although there are fewer social media and communication apps available in the store, but there are far more users of these apps than family apps or photo & video apps. 

## Determine Rough Number of Users by App Genre

For for Apple data set, we'll use total ratings count; for the Google data set, installs.

### Apple Store

We'll start by getting a frequency table for Apple's prime_genre variable.

In [26]:
a_ft = freq_table(apple_clean_2, 12)

Now, we'll loop over the unique genres in each data set, getting the average number of installs or ratings per app of a given genre. We'll also use the same method from the previous display_table function to organize the results.

We'll start with the Apple data set.

In [27]:
a_table_display = []
for genre in a_ft:
    total = 0
    len_genre = 0
    for row in apple_clean_2:
        genre_app = row[12]
        if genre_app == genre:
            n_ratings = float(row[6])
            total += n_ratings
            len_genre += 1
    avg_ratings = round(total / len_genre, 2)
    #print(genre + ": " + str(avg_ratings))
    
    genre_val_as_tuple = (avg_ratings, genre)
    a_table_display.append(genre_val_as_tuple)

a_table_sorted = sorted(a_table_display, reverse = True)
for entry in a_table_sorted:
    print(entry[1], ':', entry[0])

Navigation : 86090.33
Reference : 74942.11
Social Networking : 71548.35
Music : 57326.53
Weather : 52279.89
Book : 39758.5
Food & Drink : 33333.92
Finance : 31467.94
Photo & Video : 28441.54
Travel : 28243.8
Shopping : 26919.69
Health & Fitness : 23298.02
Sports : 23008.9
Games : 22788.67
News : 21248.02
Productivity : 21028.41
Utilities : 18684.46
Lifestyle : 16485.76
Entertainment : 14029.83
Business : 7491.12
Education : 7003.98
Catalogs : 4004.0
Medical : 612.0


Based on the average number of user ratings per app in the Apple store, four app types stand head and shoulders above the rest: Reference (67K+), Music (56K+), Social Networking (53K+), and Weather (47K+). Before making a recommendation to the company, however, it would be useful to see how much the top apps in these categories influence the average.

We'll do so by looking at the first few entries with each genre label.

In [28]:
for app in apple_clean_2:
    if app[12] == 'Reference':
        print(app[2], ':', app[6]) # print name and number of ratings

Bible : 985920
Dictionary.com Dictionary & Thesaurus : 200047
Dictionary.com Dictionary & Thesaurus for iPad : 54175
Muslim Pro: Ramadan 2017 Prayer Times, Azan, Quran : 18418
Merriam-Webster Dictionary : 16849
Google Translate : 26786
Night Sky : 12122
WWDC : 762
Jishokun-Japanese English Dictionary & Translator : 0
教えて!goo : 0
VPN Express : 14
New Furniture Mods - Pocket Wiki & Game Tools for Minecraft PC Edition : 17588
LUCKY BLOCK MOD ™ for Minecraft PC Edition - The Best Pocket Wiki & Mods Installer Tools : 4693
Guides for Pokémon GO - Pokemon GO News and Cheats : 826
Horror Maps for Minecraft PE - Download The Scariest Maps for Minecraft Pocket Edition (MCPE) Free : 718
City Maps for Minecraft PE - The Best Maps for Minecraft Pocket Edition (MCPE) : 8535
GUNS MODS for Minecraft PC Edition - Mods Tools : 1497
Real Bike Traffic Rider Virtual Reality Glasses : 8


The bible and dictionaries dominate reference, so a reference site isn't a great idea--unless you can build a better dictionary!

In [29]:
for app in apple_clean_2:
    if app[12] == 'Music':
        print(app[2], ':', app[6]) # print name and number of ratings

Pandora - Music & Radio : 1126879
Shazam - Discover music, artists, videos & lyrics : 402925
iHeartRadio – Free Music & Radio Stations : 293228
Deezer - Listen to your Favorite Music & Playlists : 4677
Sonos Controller : 48905
NRJ Radio : 38
radio.de - Der Radioplayer : 64
Spotify Music : 878563
SoundCloud - Music & Audio : 135744
Sing Karaoke Songs Unlimited with StarMaker : 26227
SoundHound Song Search & Music Player : 82602
Ringtones for iPhone & Ringtone Maker : 25403
Coach Guitar - Lessons & Easy Tabs For Beginners : 2416
TuneIn Radio - MLB NBA Audiobooks Podcasts Music : 110420
Magic Piano by Smule : 131695
QQ音乐HD : 224
The Singing Machine Mobile Karaoke App : 130
Bandsintown Concerts : 30845
PetitLyrics : 0
edjing Mix:DJ turntable to remix and scratch music : 13580
Smule Sing! : 119316
Amazon Music : 106235
AutoRap by Smule : 18202
My Mixtapez Music : 26286
Certified Mixtapes - Hip Hop Albums & Mixtapes : 9975
Karaoke - Sing Karaoke, Unlimited Songs! : 28606
Napster - Top Music 

There are a fair number of popular music apps--karaoke, music players, etc. A music app looks like a promising investment.

In [30]:
for app in apple_clean_2:
    if app[12] == 'Social Networking':
        print(app[2], ':', app[6]) # print name and number of ratings

Facebook : 2974676
LinkedIn : 71856
Skype for iPhone : 373519
Tumblr : 334293
Match™ - #1 Dating App. : 60659
WhatsApp Messenger : 287589
TextNow - Unlimited Text + Calls : 164963
Grindr - Gay and same sex guys chat, meet and date : 23201
imo video calls and chat : 18841
Ameba : 269
Weibo : 7265
Badoo - Meet New People, Chat, Socialize. : 34428
Kik : 260965
Qzone : 1649
Fake-A-Location Free ™ : 354
Tango - Free Video Call, Voice and Chat : 75412
MeetMe - Chat and Meet New People : 97072
SimSimi : 23530
Viber Messenger – Text & Call : 164249
Find My Family, Friends & iPhone - Life360 Locator : 43877
Weibo HD : 16772
POF - Best Dating App for Conversations : 52642
GroupMe : 28260
Lobi : 36
WeChat : 34584
ooVoo – Free Video Call, Text and Voice : 177501
Pinterest : 1061624
知乎 : 397
Qzone HD : 458
Skype for iPad : 60163
LINE : 11437
QQ : 9109
LOVOO - Dating Chat : 1985
QQ HD : 5058
Messenger : 351466
eHarmony™ Dating App - Meet Singles : 11124
YouNow: Live Stream Video Chat : 12079
Cougar 

Social networking also includes a lot of promisingly large-follower app types, especially tools for enhancing the Instagram experience, but also including communication apps, and dating/meetup apps.

In [31]:
for app in apple_clean_2:
    if app[12] == 'Weather':
        print(app[2], ':', app[6]) # print name and number of ratings

WeatherBug - Local Weather, Radar, Maps, Alerts : 188583
The Weather Channel: Forecast, Radar & Alerts : 495626
AccuWeather - Weather for Life : 144214
MyRadar NOAA Weather Radar Forecast : 150158
The Weather Channel App for iPad – best local forecast, radar map, and storm tracking : 208648
Météo-France : 24
Yurekuru Call : 53
QuakeFeed Earthquake Map, Alerts, and News : 6081
Moji Weather - Free Weather Forecast : 2333
FEMA : 128
Weather Underground: Custom Forecast & Local Radar : 49192
JaxReady : 22
Hurricane Tracker WESH 2 Orlando, Central Florida : 203
Hurricane by American Red Cross : 1158
Weather & Radar : 37
WRAL Weather Alert : 25
Yahoo Weather : 112603
Weather Live Free - Weather Forecast & Alerts : 35702
NOAA Weather Radar - Weather Forecast & HD Radar : 45696
iWeather - World weather forecast : 80
Almanac Long-Range Weather Forecast : 12
TodayAir : 0
Weather - Radar - Storm with Morecast App : 78
Storm Radar : 22792
WarnWetter : 0
wetter.com : 0
Forecast Bar : 375
Freddy the

Weather apps seem to be dominated by a small few: Accuweather, The Weather Channel, and MyRadar. Probably not a good choice.

### Recommendation based on Apple Store apps

Based on the average number of user ratings per app and the available options for app types, it seems that our company should make a Music or Social Networking app to maximize the number of app users. For music, making a breakthrough karaoke, podcast/audiobook player, or music editing app could be  popular. For social networking, an app for meetups, communication, or a novel use of the Instagram platform could be popular.

### Google Store

We'll start by getting a frequency table for Google's category variable.

In [32]:
g_ft = freq_table(google_clean_3, 1)

Next, we'll display the average number of installs per category. Installs in this data set are marked with a range, such as 1,000,000+ or 10,000+, making the numbers imprecise. However, for the purposes of estimation, we can just use the number range marker (e.g., 1,000,000 or 10,000), removing the commas and pluses from the strings and making these numbers into floats.

In [33]:
g_table_display = []
for category in g_ft:
    total = 0
    len_category = 0
    for row in google_clean_3:
        category_app = row[1]
        if category_app == category:
            n_installs = row[5]
            n_installs = n_installs.replace('+', '')
            n_installs = n_installs.replace(',', '')
            n_installs = float(n_installs)
            
            total += n_installs
            len_category += 1
    avg_ratings = round(total / len_category, 2)
    
    cat_val_as_tuple = (avg_ratings, category)
    g_table_display.append(cat_val_as_tuple)

g_table_sorted = sorted(g_table_display, reverse = True)
for entry in g_table_sorted:
    print(entry[1], ':', entry[0])

COMMUNICATION : 38326063.2
VIDEO_PLAYERS : 24790074.18
SOCIAL : 23253652.13
PHOTOGRAPHY : 17840110.4
PRODUCTIVITY : 16772838.59
TRAVEL_AND_LOCAL : 13984077.71
GAME : 12914435.88
TOOLS : 10801391.3
NEWS_AND_MAGAZINES : 9549178.47
ENTERTAINMENT : 9146923.08
BOOKS_AND_REFERENCE : 8767811.89
SHOPPING : 7036877.31
PERSONALIZATION : 5201482.61
FAMILY : 5180161.79
WEATHER : 5074486.2
SPORTS : 4274688.72
HEALTH_AND_FITNESS : 4167457.36
MAPS_AND_NAVIGATION : 4056941.77
ART_AND_DESIGN : 1986335.09
FOOD_AND_DRINK : 1924897.74
EDUCATION : 1768500.0
BUSINESS : 1704192.34
LIFESTYLE : 1437816.27
FINANCE : 1387692.48
HOUSE_AND_HOME : 1331540.56
DATING : 854028.83
COMICS : 817657.27
AUTO_AND_VEHICLES : 647317.82
LIBRARIES_AND_DEMO : 638503.73
PARENTING : 542603.62
BEAUTY : 513151.89
EVENTS : 253542.22
MEDICAL : 123064.79


### Recommendation based on Google Store apps

Based on the average number of installs per app in the Google app store, communication apps tower above the rest (38.3M+), followed by video players (24.8M+) and social (23.3M+). After these categories, the average installs drops substantially (17.8M and below). Once again, before making a recommendation to the company, it would be useful to see how much the top apps in these categories influence the average.

We'll do so once more by looking at the first few entries with each genre label.

In [34]:
for app in google_clean_3:
    if app[1] == 'COMMUNICATION':
        print(app[0], ':', app[5]) # print name and number installs

Messenger – Text and Video Chat for Free : 1,000,000,000+
Messenger for SMS : 10,000,000+
Gmail : 1,000,000,000+
My Tele2 : 5,000,000+
imo beta free calls and text : 100,000,000+
imo free video calls and chat : 500,000,000+
Contacts : 50,000,000+
Call Free – Free Call : 5,000,000+
Web Browser & Explorer : 5,000,000+
Browser 4G : 10,000,000+
MegaFon Dashboard : 10,000,000+
ZenUI Dialer & Contacts : 10,000,000+
Cricket Visual Voicemail : 10,000,000+
TracFone My Account : 1,000,000+
Xperia Link™ : 10,000,000+
TouchPal Keyboard - Fun Emoji & Android Keyboard : 10,000,000+
Skype Lite - Free Video Call & Chat : 5,000,000+
My magenta : 1,000,000+
Android Messages : 100,000,000+
Google Duo - High Quality Video Calls : 500,000,000+
Seznam.cz : 1,000,000+
Antillean Gold Telegram (original version) : 100,000+
AT&T Visual Voicemail : 10,000,000+
GMX Mail : 10,000,000+
Omlet Chat : 10,000,000+
UC Browser - Fast Download Private & Secure : 500,000,000+
My Vodacom SA : 5,000,000+
Calls & Text by Mo+ 

The top apps in communication are very popular, but many other apps in this category are also quite popular, including web browsers, messaging apps, and keyboards. Therefore, communication apps offer potential for our purposes.

In [35]:
for app in google_clean_3:
    if app[1] == 'VIDEO_PLAYERS':
        print(app[0], ':', app[5]) # print name and number installs

All Video Downloader 2018 : 1,000,000+
HD Video Player : 1,000,000+
Iqiyi (for tablet) : 1,000,000+
Motorola Gallery : 100,000,000+
Free TV series : 100,000+
Video Player All Format for Android : 500,000+
Code : 10,000,000+
Vote for : 50,000,000+
XX HD Video downloader-Free Video Downloader : 1,000,000+
OBJECTIVE : 1,000,000+
Music - Mp3 Player : 10,000,000+
HD Movie Video Player : 1,000,000+
YouCut - Video Editor & Video Maker, No Watermark : 5,000,000+
Video Editor,Crop Video,Movie Video,Music,Effects : 1,000,000+
video player for android : 10,000,000+
Google Play Movies & TV : 1,000,000,000+
HTC Service － DLNA : 10,000,000+
VPlayer : 1,000,000+
MiniMovie - Free Video and Slideshow Editor : 50,000,000+
Samsung Video Library : 50,000,000+
OnePlus Gallery : 1,000,000+
LIKE – Magic Video Maker & Community : 50,000,000+
HTC Service—Video Player : 5,000,000+
Play Tube : 1,000,000+
Droid Zap by Motorola : 5,000,000+
video player : 1,000,000+
G Guide Program Guide (SOFTBANK EMOBILE WILLCOM 

Among apps in the category 'Video Players', there are quite a few popular video editors and downloaders. This kind of app could be a very valuable choice.

In [36]:
for app in google_clean_3:
    if app[1] == 'SOCIAL':
        print(app[0], ':', app[5]) # print name and number installs

Social network all in one 2018 : 100,000+
TextNow - free text + calls : 10,000,000+
The Messenger App : 1,000,000+
Messenger Pro : 1,000,000+
Free Messages, Video, Chat,Text for Messenger Plus : 1,000,000+
The Video Messenger App : 100,000+
Jodel - The Hyperlocal App : 1,000,000+
Hide Something - Photo, Video : 5,000,000+
Love Sticker : 1,000,000+
Web Browser & Fast Explorer : 5,000,000+
VidStatus app - Status Videos & Status Downloader : 5,000,000+
Love Images : 1,000,000+
Web Browser ( Fast & Secure Web Explorer) : 500,000+
SPARK - Live random video chat & meet new people : 5,000,000+
Golden telegram : 50,000+
Facebook Local : 1,000,000+
Meet – Talk to Strangers Using Random Video Chat : 5,000,000+
MobilePatrol Public Safety App : 1,000,000+
💘 WhatsLov: Smileys of love, stickers and GIF : 1,000,000+
HTC Social Plugin - Facebook : 10,000,000+
Kate Mobile for VK : 10,000,000+
Family GPS tracker KidControl + GPS by SMS Locator : 1,000,000+
Moment : 1,000,000+
Text Me: Text Free, Call Fr

As on the Apple store, social apps including messaging, meetups, and dating are fairly popular, and they are not mainly coming from one provider. This suggests that creating one of these app types would be profitable.

Together, the results above suggest that there are many potential choices among communication, video, and social apps. The category that is least concentrated in a single popular app at present seems to be social. If our company created a dating or meetup app, it would be likely to maximize the number of app users.

## Conclusions

Both the Apple and Google store numbers suggest that a communication app could be the best choice for a new, free app that could draw many users. The breakdown of apps in this category strongly suggests that a dating or meetup app would be a wise choice, given that the market is not too concentrated in a single app.