# App download analysis
This notebook documents an analysis of two app data sets: lists of apps from the Apple App Store and the Google Play Store.

## Goals
The goal of this analysis is to determine what type of apps are likely to attract the most users.

### Background
This notebook is being prepared to satisfy the Dataquest "Python - Data Science Fundamentals" course. Page references (e.g. *Ref. page 2*) indicate where each project requirement appears in the course.

###### *Ref. page 2*

The first step is to load the supplied data.

The cell below does this and also creates the supplied "explore_data" function.  

In [1]:
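# Library requirements
import csv

# Import data. Each `with` block closes its file once it has been read, and an
# explicit UTF-8 encoding keeps non-ASCII app names from being decoded
# differently on different platforms.
with open('AppleStore.csv', encoding='utf8') as apple_opened:
    apple = list(csv.reader(apple_opened))

with open('googleplaystore.csv', encoding='utf8') as google_opened:
    google = list(csv.reader(google_opened))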

# Initialise explore_data function
def explore_data(dataset, start, end, rows_and_columns=False):
    dataset_slice = dataset[start:end]    
    for row in dataset_slice:
        print(row)
        print('\n') # adds a new (empty) line after each row

    if rows_and_columns:
        print('Number of rows:', len(dataset))
        print('Number of columns:', len(dataset[0]))


In [2]:
# Explore data - Apple store
explore_data(apple, 0, 3, True)

['id', 'track_name', 'size_bytes', 'currency', 'price', 'rating_count_tot', 'rating_count_ver', 'user_rating', 'user_rating_ver', 'ver', 'cont_rating', 'prime_genre', 'sup_devices.num', 'ipadSc_urls.num', 'lang.num', 'vpp_lic']


['284882215', 'Facebook', '389879808', 'USD', '0.0', '2974676', '212', '3.5', '3.5', '95.0', '4+', 'Social Networking', '37', '1', '29', '1']


['389801252', 'Instagram', '113954816', 'USD', '0.0', '2161558', '1289', '4.5', '4.0', '10.23', '12+', 'Photo & Video', '37', '0', '29', '1']


Number of rows: 7198
Number of columns: 16


The Apple data has 7,197 records excluding the header.

Let's review the attributes and take a guess at those that may help in our goal of determining what type of apps are likely to attract more users.  

`track_name` is the name of the app. On initial inspection, `user_rating` and `rating_count_tot` look useful because they measure user feedback, and `prime_genre` lets us group apps by type.

Moving on to the Google data.

In [3]:
# Explore data - Google play
explore_data(google, 0, 3, True)

['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']


['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['Coloring book moana', 'ART_AND_DESIGN', '3.9', '967', '14M', '500,000+', 'Free', '0', 'Everyone', 'Art & Design;Pretend Play', 'January 15, 2018', '2.0.0', '4.0.3 and up']


Number of rows: 10842
Number of columns: 13


The Google data has 10,841 records excluding the header. It has 13 attributes, three fewer than the Apple data set.

What attributes may be useful for determining user attractiveness? All attributes except `Type` (which can be inferred from `Price`) and `Size` appear to be relevant.

###### *Ref. page 3*
## Data errors
We have been informed ([link](https://www.kaggle.com/lava18/google-play-store-apps/discussion/66015)) that some data may be incorrect. Row 10473 of the Google data is suspect: it is claimed that the `Rating` column is missing. Let's have a look.
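Rather than checking only the one row named in the discussion, we could scan every row for a length that differs from the header. The sketch below is a minimal version of that check; `find_malformed_rows` is a hypothetical helper, and it assumes the first row of the list is the header, as in the lists loaded with `csv.reader` above. If the discussion is right, running it on `google` before any deletion should report index 10473.

```python
def find_malformed_rows(dataset):
    """Return (index, length) for every row whose length differs from the header.

    Assumes dataset[0] is the header row.
    """
    expected = len(dataset[0])
    malformed = []
    # start=1 so the reported index matches the position in `dataset`
    for index, row in enumerate(dataset[1:], start=1):
        if len(row) != expected:
            malformed.append((index, len(row)))
    return malformed


# Usage on a small hypothetical data set: row 2 is one value short
sample = [
    ['App', 'Category', 'Rating'],
    ['App A', 'TOOLS', '4.1'],
    ['App B', '3.9'],
]
print(find_malformed_rows(sample))  # [(2, 2)]
```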

In [4]:
# Extract row 10473 - Google play
explore_data(google, 0, 1)
explore_data(google, 10473, 10474)

['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']


['Life Made WI-Fi Touchscreen Photo Frame', '1.9', '19', '3.0M', '1,000+', 'Free', '0', 'Everyone', '', 'February 11, 2018', '1.0.19', '4.0 and up']




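The values after the app name appear to be shifted one column to the left (`1.9` sits under `Category`, `19` under `Rating`), which suggests the missing value may actually be `Category` rather than `Rating`. Either way, the row looks one value short. Let's double check the length of row 10473.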

In [5]:
# Length of suspect row
len(google[10473])

12

In [6]:
# OK, 12 is not 13. Let's remove it....
del google[10473]

# ...and check the length of the resultant list
len(google)

10841

That looks good: the list is one row shorter than the original 10,842.

Moving on to the App Store data, this [link](https://www.kaggle.com/ramamet4/app-store-apple-data-set-10k-apps/discussion/90680) reports an extra column containing an empty string. Looking back at our initial "explore_data" extract, the header and the rows we printed each have 16 values with no trailing empty column, so this does not appear to affect our data and we will move on.

###### *Ref. page 4*
## Additional data cleaning
In the next block of code, we will look for duplicates across app name in the Google data set. This code loops through the list of apps and populates a "dupes" list should an app occur more than once.
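As an alternative view of the same question, `collections.Counter` can count how many times each app name occurs. The sketch below is a minimal version; `duplicate_counts` is a hypothetical helper that skips the header row. Note that it reports occurrences per name, whereas the `dupes` list in the next cell records each extra copy, so an app listed three times (such as 'Box' in the output below) contributes two entries to `dupes`.

```python
from collections import Counter


def duplicate_counts(dataset, name_index=0):
    """Map each app name that appears more than once to its number of occurrences.

    Skips the header row (dataset[0]).
    """
    counts = Counter(row[name_index] for row in dataset[1:])
    return {name: n for name, n in counts.items() if n > 1}


# Hypothetical rows: 'Box' appears three times
sample = [
    ['App', 'Reviews'],
    ['Box', '100'],
    ['Slack', '50'],
    ['Box', '101'],
    ['Box', '101'],
]
print(duplicate_counts(sample))  # {'Box': 3}
```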

In [7]:
dupes = []
unique = set()  # a set gives fast membership tests; only its size is used below

for i in google:
    app_name = i[0]
    if app_name in unique:
        dupes.append(app_name)
    else:
        unique.add(app_name)

print('Number of unique :', len(unique))
print('\n')
print('Number of duplicates :', len(dupes))
print('\n')
print('First 10 duplicates :', dupes[:10])

Number of unique : 9660


Number of duplicates : 1181


First 10 duplicates : ['Quick PDF Scanner + OCR FREE', 'Box', 'Google My Business', 'ZOOM Cloud Meetings', 'join.me - Simple Meetings', 'Box', 'Zenefits', 'Google Ads', 'Google My Business', 'Slack']


Let's take a look at the duplicates for the first app returned, 'Quick PDF Scanner + OCR FREE'.

In [8]:
print(google[0:1])

for i in google:
    app_name = i[0]
    if app_name == 'Quick PDF Scanner + OCR FREE':
        print('\n')
        print(i)

[['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']]


['Quick PDF Scanner + OCR FREE', 'BUSINESS', '4.2', '80805', 'Varies with device', '5,000,000+', 'Free', '0', 'Everyone', 'Business', 'February 26, 2018', 'Varies with device', '4.0.3 and up']


['Quick PDF Scanner + OCR FREE', 'BUSINESS', '4.2', '80805', 'Varies with device', '5,000,000+', 'Free', '0', 'Everyone', 'Business', 'February 26, 2018', 'Varies with device', '4.0.3 and up']


['Quick PDF Scanner + OCR FREE', 'BUSINESS', '4.2', '80804', 'Varies with device', '5,000,000+', 'Free', '0', 'Everyone', 'Business', 'February 26, 2018', 'Varies with device', '4.0.3 and up']


Three records are returned for this app. The first two are identical in every attribute, while the third has one fewer review.

###### *Ref. page 5*

### Removal of duplicates

We want to keep the most recent record for each app, since it is the most relevant to our analysis. Review counts generally only increase over time, so the record with the highest number of reviews is presumably the most recent, and we will delete the others. In the case of ties, as in the example above, we keep the first matching record found; because the tied records here are identical, which one we keep has no impact on the resulting data set.
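For comparison, the same keep-the-most-reviews policy can be sketched as a single pass over the rows. This is a minimal sketch rather than the notebook's own method: `dedupe_keep_max_reviews` is a hypothetical helper, it assumes the rows have no header and that the review count sits in column 3 (as in the Google data), and the order of the returned rows may differ from the two-step approach used in the cells below.

```python
def dedupe_keep_max_reviews(rows, name_index=0, reviews_index=3):
    """Keep one row per app name: the one with the most reviews.

    On ties the first row seen is kept, because a later row only replaces
    the stored one when its review count is strictly greater.
    Assumes `rows` has no header row.
    """
    best = {}  # app name -> row with the highest review count seen so far
    for row in rows:
        name = row[name_index]
        n_reviews = float(row[reviews_index])
        if name not in best or n_reviews > float(best[name][reviews_index]):
            best[name] = row
    # dicts keep insertion order, so apps appear in order of first occurrence
    return list(best.values())


# Hypothetical rows: 'Box' appears three times with 100, 101 and 101 reviews
sample = [
    ['Box', 'BUSINESS', '4.2', '100'],
    ['Slack', 'BUSINESS', '4.4', '50'],
    ['Box', 'BUSINESS', '4.2', '101'],
    ['Box', 'BUSINESS', '4.2', '101'],
]
result = dedupe_keep_max_reviews(sample)
print(result)  # [['Box', 'BUSINESS', '4.2', '101'], ['Slack', 'BUSINESS', '4.4', '50']]
```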

The cell below loops through the Google data set (skipping the header) and records each app's review count in the `reviews_max` dictionary. When an app appears more than once, only its highest review count is kept.

In [9]:
# Create dictionary: app name -> highest review count seen
reviews_max = {}

# Loop through the list (skipping the header) to capture max reviews
for i in google[1:]:
    name = i[0]
    n_reviews = float(i[3])
    if name not in reviews_max or reviews_max[name] < n_reviews:
        reviews_max[name] = n_reviews

Let's check that the duplicate identified above has been assigned the correct maximum review count.

In [10]:
reviews_max['Quick PDF Scanner + OCR FREE']

80805.0

That looks good.  The highest review count is 80,805 per cell 8 above. 

Let's check how many entries are in our dictionary.

In [11]:
len(reviews_max)

9659

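9,659 is the expected result: cell 7 found 9,660 unique names, but that count included the header row (`App`), which `reviews_max` skips.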

Next we will remove duplicate records from the original Google data set, creating a new list called `android_clean`.

In [12]:
# New list
android_clean = []

# Names of apps already added (a set for fast membership tests)
already_added = set()

# Keep one record per app: the first one whose review count equals the maximum
for i in google[1:]:
    name = i[0]
    n_reviews = float(i[3])
    if reviews_max[name] == n_reviews and name not in already_added:
        android_clean.append(i)
        already_added.add(name)


Let's check that the length of the `android_clean` list is as expected.  We expect a length of 9,659.

In [13]:
explore_data(android_clean, 0, 2, True)

['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


Number of rows: 9659
Number of columns: 13


That looks good. We no longer have the header row in the `android_clean` data, because the loop above started from `google[1:]`. The cell below re-inserts the header row (`google[0]`) and checks the result.

In [14]:
android_clean.insert(0, google[0])

explore_data(android_clean, 0, 2, True)

['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']


['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


Number of rows: 9660
Number of columns: 13
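When re-inserting a header, the difference between slicing and indexing matters. A slice such as `google[0:1]` returns a new list containing the header row, while an index such as `google[0]` returns the header row itself. Inserting the slice into another list therefore adds a nested one-element list, and `len(dataset[0])` then reports a single column. The toy example below uses hypothetical rows to show the difference.

```python
rows = [
    ['App', 'Category'],   # header
    ['App A', 'TOOLS'],
]

print(rows[0:1])  # [['App', 'Category']]  (a list containing the header row)
print(rows[0])    # ['App', 'Category']    (the header row itself)

cleaned = [['App A', 'TOOLS']]
cleaned.insert(0, rows[0])  # inserts the header row, not a nested list
print(len(cleaned[0]))      # 2
```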


###### *Ref. page 6*
We have learned that both data sets may contain apps aimed at non-English-speaking users, and these are to be excluded from our analysis.

Let's develop functionality to detect them. English text is generally made up of ASCII characters, whose code points (as returned by `ord()`) run from 0 to 127. The functions in the cell below return False if the input string contains any character with a code point above 127. I've included a non-looped variant, but we will work with the looped implementation in line with the project requirements.

In [15]:
def new_fun(and_str = 'Instachat 😜'):
    for i in and_str:
        ch_code = ord(i)
        if ch_code > 127:
            return False
    return True
print(new_fun())

# Alternate implementation
def new_fun_1(and_str = 'Instachat 😜'):
    if ord(max(and_str)) > 127:
        return False
    return True
print(new_fun_1())

False
False


OK, let's test the function. The cell below should print True, False, False, False.

In [16]:
print(new_fun(and_str = 'Instagram'))
print(new_fun(and_str = '爱奇艺PPS -《欢乐颂2》电视剧热播'))
print(new_fun(and_str = 'Docs To Go™ Free Office Suite'))
print(new_fun(and_str = 'Instachat 😜'))

True
False
False
False


That looks good. Let's proceed.

###### *Ref. page 7*

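The function above returns False if even one character is non-English, so English apps whose names contain a symbol such as ™ or an emoji would be excluded as well. We have now been asked to flag as non-English only app names with more than three non-English characters. The function below attempts this for a list of names.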
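The counting rule can be expressed compactly by summing over the characters and comparing against a threshold. The sketch below is one way to write it; `is_probably_english` and its `max_non_ascii` parameter are hypothetical names, with the default of 3 taken from the project rule. Like the original check, it is a heuristic: a non-English name with three or fewer non-ASCII characters would still be kept.

```python
def is_probably_english(name, max_non_ascii=3):
    """Return False if `name` has more than `max_non_ascii` non-ASCII characters."""
    non_ascii = sum(1 for ch in name if ord(ch) > 127)
    return non_ascii <= max_non_ascii


print(is_probably_english('Docs To Go™ Free Office Suite'))    # True
print(is_probably_english('Instachat 😜'))                     # True
print(is_probably_english('爱奇艺PPS -《欢乐颂2》电视剧热播'))  # False
```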

In [17]:
test = [['Docs To Go™ Free Office Suite'], 
        ['Instachat 😜'], 
        ['爱奇艺PPS -《欢乐颂2》电视剧热播']]

def non_eng(name_str = test):
    for j in name_str:
        char_str = j[0]
        non_eng_char = 0
        for i in char_str:
            ASCII = ord(i)
            if ASCII > 127:
                non_eng_char += 1
        if non_eng_char > 3:
            #return False    # comment in
            print('False')   # comment out
        else:                # comment out
            print('True')    # comment out
    #return True             # comment in

            
# Call function
non_eng()

True
True
False


The function above contains a nested loop because its input is a list of lists. As written, it prints its results rather than returning them. Switching the active lines of code per the comments in the cell above makes the function return a single True or False, although three results are expected. This is because a function call returns only once: a `return` statement exits the function, and with it the loop, as soon as it runs.
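If one result per name is wanted from a single call, the usual pattern is to collect the results in a list and return that list once, after the loop has processed every name. A minimal sketch of this pattern follows; `non_eng_all` is a hypothetical name and the threshold of three non-ASCII characters matches the rule above.

```python
def non_eng_all(names, max_non_ascii=3):
    """Return one True/False result per name instead of returning early."""
    results = []
    for name in names:
        non_ascii = 0
        for ch in name:
            if ord(ch) > 127:
                non_ascii += 1
        results.append(non_ascii <= max_non_ascii)
    return results  # a single return after every name has been processed


test = [['Docs To Go™ Free Office Suite'],
        ['Instachat 😜'],
        ['爱奇艺PPS -《欢乐颂2》电视剧热播']]
print(non_eng_all(row[0] for row in test))  # [True, True, False]
```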

Let's amend the function to assess one string at a time.

In [18]:
test1 = 'Docs To Go™ Free Office Suite'
test2 = 'Instachat 😜'
test3 = '爱奇艺PPS -《欢乐颂2》电视剧热播'

def non_engl(name_string = 'Docs To Go™ Free Office Suite'):
    non_engl_char = 0
    for i in name_string:
        ASCII = ord(i)
        if ASCII > 127:
            non_engl_char += 1
    if non_engl_char > 3:
        return False
    else:
        return True

            
# Call function
non_engl(test1)

True

In [19]:
non_engl(test2)

True

In [20]:
non_engl(test3)

False

This works as intended.  The test evidence is in the cells above.  

We will now use this function to filter out non-English apps using the same method used above to filter out duplicates.

In [21]:
# List for English names
android_english = []
apple_english = []

# List for non-English names
android_non_english = []
apple_non_english = []

# Loop through the android list (skipping the header) to split English and non-English apps
for i in android_clean[1:]:
    name = i[0]
    if non_engl(name):
        android_english.append(i)
    else:
        android_non_english.append(i)

# Loop through the apple list (skipping the header) to split English and non-English apps
for i in apple[1:]:
    name = i[1]
    if non_engl(name):
        apple_english.append(i)
    else:
        apple_non_english.append(i)

In [22]:
print('Android English count:', len(android_english))
print('Android non-English count:', len(android_non_english))
print('Apple English count:', len(apple_english))
print('Apple non-English count:', len(apple_non_english))

Android English count: 9614
Android non-English count: 45
Apple English count: 6183
Apple non-English count: 1014


Let's quickly review some of the non-English Android apps. The cell below calls the "explore_data" function to review the first 3 records.

In [23]:
explore_data(android_non_english, 0, 3, True)

['Flame - درب عقلك يوميا', 'EDUCATION', '4.6', '56065', '37M', '1,000,000+', 'Free', '0', 'Everyone', 'Education', 'July 26, 2018', '3.3', '4.1 and up']


['သိင်္ Astrology - Min Thein Kha BayDin', 'LIFESTYLE', '4.7', '2225', '15M', '100,000+', 'Free', '0', 'Everyone', 'Lifestyle', 'July 26, 2018', '4.2.1', '4.0.3 and up']


['РИА Новости', 'NEWS_AND_MAGAZINES', '4.5', '44274', '8.0M', '1,000,000+', 'Free', '0', 'Everyone', 'News & Magazines', 'August 6, 2018', '4.0.6', '4.4 and up']


Number of rows: 45
Number of columns: 13


This looks as expected, so we will move on. Note that these lists do not include a header row, because the loops above started from the `[1:]` slice.

###### *Ref. page 8*

We have been asked to look at only free apps.  This is because we only build apps that are free to download and install.  The next cell creates new lists for free and paid apps for each of the Apple and Google lists.  The length of these lists is returned for review.
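The free/paid split boils down to parsing the price column and comparing it with zero. The sketch below factors that into two hypothetical helpers, `parse_price` and `split_free_paid`; it assumes prices look like `'0'`, `'0.0'` or `'$4.99'`, as in the rows printed earlier, and that the price column index is passed in (7 for Google, 4 for Apple, per the headers).

```python
def parse_price(price_str):
    """Convert a price string such as '$4.99' or '0.0' to a float."""
    return float(price_str.replace('$', ''))


def split_free_paid(rows, price_index):
    """Split rows into (free, paid) lists based on the price column."""
    free, paid = [], []
    for row in rows:
        if parse_price(row[price_index]) == 0:
            free.append(row)
        else:
            paid.append(row)
    return free, paid


# Hypothetical rows with the price in column 1
free, paid = split_free_paid([['a', '0'], ['b', '$2.99'], ['c', '0.0']], 1)
print(len(free), len(paid))  # 2 1
```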

In [24]:
# Lists for free English apps
android_free_english = []
apple_free_english = []

# Lists for paid English apps
android_paid_english = []
apple_paid_english = []

# Loop through android list to isolate free apps
for i in android_english:
    price = float(i[7].replace('$', ''))
    if price == 0:
        android_free_english.append(i)
    else:
        android_paid_english.append(i)
        
# Loop through apple list to isolate free apps
for i in apple_english:
    price = float(i[4].replace('$', ''))
    if price == 0:
        apple_free_english.append(i)
    else:
        apple_paid_english.append(i)

In [25]:
print('Android free count:', len(android_free_english))
print('Android paid count:', len(android_paid_english))
print('Apple free count:', len(apple_free_english))
print('Apple paid count:', len(apple_paid_english))

Android free count: 8864
Android paid count: 750
Apple free count: 3222
Apple paid count: 2961


The free and paid counts sum to the size of each source list (Android: 8,864 + 750 = 9,614; Apple: 3,222 + 2,961 = 6,183). With that check confirmed we will move on.

###### *Ref. page 9*

We now want to find the profile of apps that are popular on both the Google Play Store and the Apple App Store. Apps on both platforms are likely to have more users, and this fits with our strategy of developing for one platform before expanding to the other.

Let's have a look at the attributes provided in both datasets in order to determine if there is a "genre" attribute common to both. 

In [26]:
print(google[0:1])
print('\n')
print(apple[0:1])

[['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']]


[['id', 'track_name', 'size_bytes', 'currency', 'price', 'rating_count_tot', 'rating_count_ver', 'user_rating', 'user_rating_ver', 'ver', 'cont_rating', 'prime_genre', 'sup_devices.num', 'ipadSc_urls.num', 'lang.num', 'vpp_lic']]


It looks like "Category", "Genres" and "prime_genre" will do the trick. 

In [27]:
print(google[0][1])
print(google[0][9])
print(apple[0][11])

Category
Genres
prime_genre
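Instead of hard-coding the indices 1, 9 and 11, the column positions can be looked up by name with `list.index`, which makes the later cells easier to read and less fragile. The sketch below copies the two header rows printed above into standalone lists; in the notebook itself the same lookup would be `google[0].index('Category')`. The constant names are hypothetical.

```python
google_header = ['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs',
                 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated',
                 'Current Ver', 'Android Ver']
apple_header = ['id', 'track_name', 'size_bytes', 'currency', 'price',
                'rating_count_tot', 'rating_count_ver', 'user_rating',
                'user_rating_ver', 'ver', 'cont_rating', 'prime_genre',
                'sup_devices.num', 'ipadSc_urls.num', 'lang.num', 'vpp_lic']

# Look up column positions by name rather than hard-coding them
CATEGORY = google_header.index('Category')
GENRES = google_header.index('Genres')
PRIME_GENRE = apple_header.index('prime_genre')

print(CATEGORY, GENRES, PRIME_GENRE)  # 1 9 11
```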


###### *Ref. page 10*

In the series of cells below, we write a function that builds a dictionary giving the percentage of apps taking each value in a specified column (`index`) of a specified dataset (`dataset`).

This function is then used in the supplied function, "display_table()" to view the `prime_genre`, `Genres`, and `Category` columns. 

The proportion or frequency of the genre will potentially provide information about the popularity of apps.
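For comparison with the hand-written loops below, the same percentage table and sorted display can be sketched with `collections.Counter` and `sorted` with a `key` function. `freq_table_pct` and `display_sorted` are hypothetical names; this is an alternative formulation, not the supplied `display_table` function. One minor difference: ties are kept in first-seen order here, rather than ordered by key.

```python
from collections import Counter


def freq_table_pct(dataset, index):
    """Percentage of rows taking each value in column `index`."""
    counts = Counter(row[index] for row in dataset)
    total = len(dataset)
    return {value: count / total * 100 for value, count in counts.items()}


def display_sorted(table):
    """Print key : value pairs, largest value first."""
    for key, value in sorted(table.items(), key=lambda item: item[1], reverse=True):
        print(key, ':', value)


# Hypothetical rows with the genre in column 1
sample = [['a', 'Games'], ['b', 'Games'], ['c', 'News'], ['d', 'Games']]
display_sorted(freq_table_pct(sample, 1))
# Games : 75.0
# News : 25.0
```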

In [28]:
def freq_table(dataset, index):
    freq_tble = {}
    length = len(dataset)
    for row in dataset:
        col = row[index]
        if col in freq_tble:
            freq_tble[col] += 1
        else:
            freq_tble[col] = 1
    
    # Convert to proportions
    for i in freq_tble:
        freq_tble[i] /= length
        freq_tble[i] = freq_tble[i] * 100
    
    return freq_tble

In [29]:
# Test the function on the Apple 'cont_rating' column
freq_table(apple_free_english, 10)

{'12+': 17.070142768466788,
 '17+': 6.424581005586592,
 '4+': 65.85971446306642,
 '9+': 10.645561762880199}

In [30]:
def display_table(dataset, index):
    table = freq_table(dataset, index)
    table_display = []
    for key in table:
        key_val_as_tuple = (table[key], key)
        table_display.append(key_val_as_tuple)

    table_sorted = sorted(table_display, reverse = True)
    for entry in table_sorted:
        print(entry[1], ':', entry[0])

In [31]:
# Display the results
print('Android:', google[0][1])
display_table(android_free_english, 1)
print('\n')
print('Android:', google[0][9])
display_table(android_free_english, 9)
print('\n')
print('Apple:', apple[0][11])
display_table(apple_free_english, 11)

Android: Category
FAMILY : 18.907942238267147
GAME : 9.724729241877256
TOOLS : 8.461191335740072
BUSINESS : 4.591606498194946
LIFESTYLE : 3.9034296028880866
PRODUCTIVITY : 3.892148014440433
FINANCE : 3.7003610108303246
MEDICAL : 3.531137184115524
SPORTS : 3.395758122743682
PERSONALIZATION : 3.3167870036101084
COMMUNICATION : 3.2378158844765346
HEALTH_AND_FITNESS : 3.0798736462093865
PHOTOGRAPHY : 2.944494584837545
NEWS_AND_MAGAZINES : 2.7978339350180503
SOCIAL : 2.6624548736462095
TRAVEL_AND_LOCAL : 2.33528880866426
SHOPPING : 2.2450361010830324
BOOKS_AND_REFERENCE : 2.1435018050541514
DATING : 1.861462093862816
VIDEO_PLAYERS : 1.7937725631768955
MAPS_AND_NAVIGATION : 1.3989169675090252
FOOD_AND_DRINK : 1.2409747292418771
EDUCATION : 1.1620036101083033
ENTERTAINMENT : 0.9589350180505415
LIBRARIES_AND_DEMO : 0.9363718411552346
AUTO_AND_VEHICLES : 0.9250902527075812
HOUSE_AND_HOME : 0.8235559566787004
WEATHER : 0.8009927797833934
EVENTS : 0.7107400722021661
PARENTING : 0.6543321299638989

###### *Ref. page 11*

###### Observations on the Apple: prime_genre data
By far the most common genre in the Apple data set is games.  Game apps account for 58% of all apps in our sample, with Entertainment in second place with 8%. 

Most apps fall into genres that would be considered entertainment or leisure. The most common non-entertainment genre is education, which accounts for less than 4%.

Although game apps account for 58% of our sample, we should be careful not to equate this share with popularity. We do not know how often these apps are installed and used.

###### Observations on the Android: Category & Android: Genres
Android app genres are more diverse than Apple's. The highest share is family with 19%, followed by games with 10%. Most apps would be considered practical rather than entertainment-oriented, which contrasts with the entertainment-focused Apple apps.


Once again we shouldn't equate the proportion of genre with popularity.

At this stage, we should not recommend an app profile for development, as we have not yet analyzed installation or user ratings data.

###### *Ref. page 12*

We cannot equate the proportion of apps in each data set with popularity. Popularity is better measured by the number of users and/or the ratings those users give.

We will now look at the number of users for the Apple data set. Installation data is not available, so we will use the number of ratings submitted (`rating_count_tot`) as a proxy for the number of users/installations.

The cell below calculates the average number of ratings submitted for apps in each genre.
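The nested loop in the next cell scans the whole list once per genre. The same averages can be computed in a single pass by accumulating a running total and count for each group. The sketch below is a minimal version of that idea; `average_by_group` is a hypothetical helper, and in this notebook it would be called as `average_by_group(apple_free_english, 11, 5)` (genre in column 11, rating count in column 5).

```python
from collections import defaultdict


def average_by_group(rows, group_index, value_index):
    """Average of a numeric column per group, computed in a single pass."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for row in rows:
        group = row[group_index]
        totals[group] += float(row[value_index])
        counts[group] += 1
    return {group: totals[group] / counts[group] for group in totals}


# Hypothetical rows: id, genre, rating count
sample = [
    ['1', 'Games', '100'],
    ['2', 'Games', '300'],
    ['3', 'News', '50'],
]
print(average_by_group(sample, 1, 2))  # {'Games': 200.0, 'News': 50.0}
```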

In [32]:
# Column 11 is prime_genre; column 5 is rating_count_tot
print('Genre : Average number of ratings')
for genre in freq_table(apple_free_english, 11):
    total = 0
    len_genre = 0
    for app in apple_free_english:
        genre_app = app[11]
        if genre_app == genre:
            n_ratings = float(app[5])
            total += n_ratings
            len_genre += 1
    avg_n_ratings = total / len_genre
    print(genre, ':', avg_n_ratings)

Genre : Average number of ratings
Medical : 612.0
News : 21248.023255813954
Finance : 31467.944444444445
Productivity : 21028.410714285714
Social Networking : 71548.34905660378
Shopping : 26919.690476190477
Games : 22788.6696905016
Education : 7003.983050847458
Lifestyle : 16485.764705882353
Sports : 23008.898550724636
Catalogs : 4004.0
Health & Fitness : 23298.015384615384
Book : 39758.5
Entertainment : 14029.830708661417
Reference : 74942.11111111111
Photo & Video : 28441.54375
Weather : 52279.892857142855
Travel : 28243.8
Business : 7491.117647058823
Utilities : 18684.456790123455
Music : 57326.530303030304
Navigation : 86090.33333333333
Food & Drink : 33333.92307692308


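Navigation, Reference and Social Networking have the highest average numbers of user ratings submitted. Reference and Social Networking are among the genres with the lowest proportion of apps. This suggests that these genres contain a small number of very popular apps, although an average can be pulled up by a few very large apps (Facebook alone, in Social Networking, has 2,974,676 ratings).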

We recommend that an app be developed for one of these genres.  This recommendation is on the basis that market share can be taken from an incumbent in genres where there is not a lot of competition. 
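Because a genre average can be dominated by one or two very large apps (Facebook alone has 2,974,676 ratings in the Apple data), it may be worth comparing the mean with the median, which is far less sensitive to outliers. The sketch below uses the standard `statistics` module on hypothetical rating counts for a single genre; the numbers are illustrative, not taken from the data set.

```python
from statistics import mean, median

# Hypothetical rating counts for one genre: one very large app and several small ones
rating_counts = [2_000_000, 1_500, 900, 400, 200]

print(mean(rating_counts))    # 400600  (pulled up by the single large app)
print(median(rating_counts))  # 900     (the typical app in this genre)
```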

###### *Ref. page 13*

We now turn to the Google data set. This data set contains an `Installs` attribute, but its values are binned ranges such as `10,000+` rather than exact installation totals.

The frequency table output of the cell below demonstrates this.  

In [33]:
display_table(android_free_english, 5)

1,000,000+ : 15.726534296028879
100,000+ : 11.552346570397113
10,000,000+ : 10.548285198555957
10,000+ : 10.198555956678701
1,000+ : 8.393501805054152
100+ : 6.915613718411552
5,000,000+ : 6.825361010830325
500,000+ : 5.561823104693141
50,000+ : 4.7721119133574
5,000+ : 4.512635379061372
10+ : 3.5424187725631766
500+ : 3.2490974729241873
50,000,000+ : 2.3014440433213
100,000,000+ : 2.1322202166064983
50+ : 1.917870036101083
5+ : 0.78971119133574
1+ : 0.5076714801444043
500,000,000+ : 0.2707581227436823
1,000,000,000+ : 0.22563176895306858
0+ : 0.04512635379061372
0 : 0.01128158844765343


An approximate estimate of user installations is sufficient for our purposes, so we will treat each install band's lower bound as the number of installations.

For example, when we see 10,000+, we will assume the total installations are 10,000.

The cell below calculates the average number of installations for apps in each genre (the `Category` column), using the same method that was applied to the Apple data. Each install band is converted to a number by stripping the `+` and `,` characters with `str.replace` and passing the result to `float`.
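The conversion from install band to number can be isolated in a small function, which makes it easy to test on its own. The sketch below is a minimal version; `installs_lower_bound` is a hypothetical name, and it assumes the band strings look like those in the frequency table above (digits with `,` separators and an optional trailing `+`).

```python
def installs_lower_bound(installs_str):
    """Convert an install band such as '10,000+' to its lower bound, 10000.0."""
    return float(installs_str.replace('+', '').replace(',', ''))


print(installs_lower_bound('10,000+'))         # 10000.0
print(installs_lower_bound('1,000,000,000+'))  # 1000000000.0
print(installs_lower_bound('0'))               # 0.0
```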

In [34]:
print('Genre : Average number of installations')
for genre in freq_table(android_free_english, 1):
    total = 0
    len_genre = 0
    for app in android_free_english:
        genre_app = app[1]
        if genre_app == genre:
            # '10,000+' -> '10000' -> 10000.0
            installs = float(app[5].replace('+', '').replace(',', ''))
            total += installs
            len_genre += 1
    avg_installs = total / len_genre
    print(genre, ':', avg_installs)

Genre : Average number of installations
ENTERTAINMENT : 11640705.88235294
MAPS_AND_NAVIGATION : 4056941.7741935486
PARENTING : 542603.6206896552
EDUCATION : 1833495.145631068
FAMILY : 3695641.8198090694
SPORTS : 3638640.1428571427
WEATHER : 5074486.197183099
TRAVEL_AND_LOCAL : 13984077.710144928
FOOD_AND_DRINK : 1924897.7363636363
DATING : 854028.8303030303
GAME : 15588015.603248259
HEALTH_AND_FITNESS : 4188821.9853479853
BOOKS_AND_REFERENCE : 8767811.894736841
TOOLS : 10801391.298666667
PRODUCTIVITY : 16787331.344927534
SOCIAL : 23253652.127118643
COMICS : 817657.2727272727
BEAUTY : 513151.88679245283
AUTO_AND_VEHICLES : 647317.8170731707
SHOPPING : 7036877.311557789
EVENTS : 253542.22222222222
BUSINESS : 1712290.1474201474
LIFESTYLE : 1437816.2687861272
LIBRARIES_AND_DEMO : 638503.734939759
PHOTOGRAPHY : 17840110.40229885
FINANCE : 1387692.475609756
VIDEO_PLAYERS : 24727872.452830188
PERSONALIZATION : 5201482.6122448975
NEWS_AND_MAGAZINES : 9549178.467741935
COMMUNICATION : 38456119.

We will keep with the theme of developing apps for genres that are not crowded, i.e. that have a small number of apps but a high number of users. The "Books and reference" genre sits in the mid-to-high range of average installations (about 8.8 million) and makes up only about 2% of free English apps. This is consistent with its counterpart genre in the Apple data set, "Reference".

In conclusion, our advice is to choose the "Books and reference" genre for development. This genre has a high number of users in both the Apple and Google app stores, suggesting it is a popular genre for download. The "Books and reference" genre does not have a large number of apps, which we think will make breaking into this genre easier.