# <center><span style="color:green"> Got Profit? Identifying the Most Profitable App Profiles for the App Store and Google Play Markets </span></center>
--------------
The goal of this project is to demonstrate how a data analyst working for a company that builds Android and iOS mobile apps could identify which types of mobile app profiles are most profitable in the App Store and Google Play markets.

At this company, we build apps that are free to download and install, and our primary source of revenue is in-app advertising. Our goal is therefore to analyze data that helps our developers understand which types of apps attract more users.

## <span style="color:green">  Getting and Diving into the Data </span>
The growth of the app space has been phenomenal, and there is quite a bit of data for us to walk through. As of September 2018, there were approximately 2.1 million Android apps available on Google Play and 2.0 million iOS apps available on the App Store (Source: [Statista](https://www.statista.com/statistics/276623/number-of-apps-available-in-leading-app-stores/)).

Since the acquisition of data can be a costly endeavor and this is a practice exercise, we first determined if there was any relevant existing data available at no cost. Fortunately, there are two preexisting datasets that are suitable for our purpose:

1. A dataset containing data on approximately 10k Android apps from Google Play, collected in August 2018. (Source: [Kaggle](https://www.kaggle.com/lava18/google-play-store-apps/home))
2. A dataset containing data on about 7k iOS apps from the App Store, collected in July 2017. (Source: [Kaggle](https://www.kaggle.com/ramamet4/app-store-apple-data-set-10k-apps/home))

We will start first by opening up the datasets:

In [2]:
from csv import reader

# Import and open the Google Play dataset, which has been saved locally
with open('googleplaystore.csv', encoding='utf8', newline='') as gp_opened_file:
    android = list(reader(gp_opened_file))
android_header = android[0]  # keep the header row separately
android = android[1:]        # data rows only

# Import and open the App Store (iOS) dataset
with open('AppleStore.csv', encoding='utf8', newline='') as i_opened_file:
    ios = list(reader(i_opened_file))
ios_header = ios[0]
ios = ios[1:]
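Positional indices such as `app[-5]` (for `prime_genre`) are used throughout this notebook and are easy to get wrong. As a hedged alternative, not the approach used here, the standard library's `csv.DictReader` returns one dictionary per row keyed by the header names. The sketch below uses a small, hypothetical in-memory sample (via `io.StringIO`, with columns trimmed for brevity) instead of the real file:

```python
import csv
import io

# Hypothetical two-row sample in the same layout as AppleStore.csv (columns trimmed)
sample_csv = """id,track_name,price,rating_count_tot,prime_genre
284882215,Facebook,0.0,2974676,Social Networking
389801252,Instagram,0.0,2161558,Photo & Video
"""

# DictReader consumes the header row itself and yields one dict per data row
rows = list(csv.DictReader(io.StringIO(sample_csv)))

for row in rows:
    # Columns are accessed by name instead of by position
    print(row['track_name'], '|', row['prime_genre'], '|', row['rating_count_tot'])
```

With the real file, `csv.DictReader(f)` would take the place of `reader(f)` inside the same `with open(...)` block. The trade-off is memory: every row becomes a dictionary rather than a plain list.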

### <span style="color:green">Exploring the data </span>
To explore the datasets more easily, we will create a function called `explore_data()` that prints a slice of rows from a list of lists in a readable way, so that it can be reused on either dataset.
The `explore_data()` function:
- **Input:** This function takes in four parameters:
    - `dataset`: Expects a list of lists with **<span style="color:red"> no header </span>**.
    - `start` and `end`: Integers that represent the starting and ending indices
      of a slice from the data set. 
    - `rows_and_columns`: A Boolean, `False` by default. If set to `True`, the
       function also prints the number of rows and columns in the dataset.

In [3]:
def explore_data(dataset, start, end, rows_and_columns =False):
    dataset_slice = dataset[start:end]
    for row in dataset_slice:
        print(row)
        print('\n') # Add an empty line between each row
    
    if rows_and_columns:
        print('Number of rows:', len(dataset))
        print('Number of columns:', len(dataset[0]))
        
### Let's open up our datasets with our new function ###
print("Here is the start of the Android dataset:")
print ('----------------------------------------')
print(android_header) # remember we had separated this out
print('\n')
explore_data (android, 0, 2, True)
print('\n')
print ('Here is the start of the iOS dataset:')
print ('-------------------------------------')
print('\n')
print (ios_header)
print('\n')  
explore_data (ios, 0, 2, True)

Here is the start of the Android dataset:
----------------------------------------
['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']


['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['Coloring book moana', 'ART_AND_DESIGN', '3.9', '967', '14M', '500,000+', 'Free', '0', 'Everyone', 'Art & Design;Pretend Play', 'January 15, 2018', '2.0.0', '4.0.3 and up']


Number of rows: 10841
Number of columns: 13


Here is the start of the iOS dataset:
-------------------------------------


['id', 'track_name', 'size_bytes', 'currency', 'price', 'rating_count_tot', 'rating_count_ver', 'user_rating', 'user_rating_ver', 'ver', 'cont_rating', 'prime_genre', 'sup_devices.num', 'ipadSc_urls.num', 'lang.num', 'vpp_lic']


['284882215', 'Facebook', '389879808', 'USD',

Both datasets are large and appear to contain broadly similar information. A few columns look useful for our purposes: *'App', 'Category', 'Reviews', 'Installs', 'Type', 'Price',* and *'Genres'* in the Android dataset, and *'track_name', 'currency', 'cont_rating', 'prime_genre',* and *'user_rating'* in the iOS dataset.

-----------------------------------------
## <span style = "color:green"> Let's clean it up! </span>


In the previous section, we found that the two datasets appear to contain data of value for our analysis. However, they don't fit our purposes exactly. Remember that we are doing this for a very specific audience, so we need data that meets the following criteria:

* The apps need to be *free* to download and install
* The apps need to be geared to *English-speakers*

Beyond this, we need to clean up both datasets so that they are easy to use and contain no errors and no duplicate entries. This section walks through how this was achieved.

### <span style="color:green"> Removing Data Errors </span>

In the online [discussion board](https://www.kaggle.com/lava18/google-play-store-apps/discussion) for the Google Play dataset, one of the [discussions](https://www.kaggle.com/lava18/google-play-store-apps/discussion/66015) describes an error on row 10472.

First, let's explore the nature of the error by printing the faulty row:


In [4]:
print("Here is some correct data:")
print("---------------------------")
print('\n')
print(android_header)
print('\n')
print(android[1])
print('\n')
print("Here is the faulty data:")
print("------------------------")
print('\n')
print(android[10472])

Here is some correct data:
---------------------------


['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']


['Coloring book moana', 'ART_AND_DESIGN', '3.9', '967', '14M', '500,000+', 'Free', '0', 'Everyone', 'Art & Design;Pretend Play', 'January 15, 2018', '2.0.0', '4.0.3 and up']


Here is the faulty data:
------------------------


['Life Made WI-Fi Touchscreen Photo Frame', '1.9', '19', '3.0M', '1,000+', 'Free', '0', 'Everyone', '', 'February 11, 2018', '1.0.19', '4.0 and up']


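We can now see what the discussion board was describing. This row is missing its 'Category' value, so every remaining value has shifted one column to the left: the Category column now holds the rating (1.9), and the Rating column holds 19, which is impossible on a 5-point scale.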

We will now remove this line:

In [5]:
print("Before:", len(android))
del android[10472]  # remove the malformed row (run this cell only once)
print("After:", len(android))


Before: 10841
After: 10840


The erroneous row has now been removed. Reading through the [discussion section](https://www.kaggle.com/ramamet4/app-store-apple-data-set-10k-apps/discussion) of the iOS dataset, we found no reports of incorrect data, so we will move on to our next task.
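Hard-coding index 10472 works once, but re-running the deletion cell would silently remove a second, valid row. A more defensive sketch, shown here on a hypothetical miniature dataset, keeps only rows whose length matches the header. The faulty Google Play row above has 12 values against a 13-column header, so it would be caught; a row with the right length but wrong content would not.

```python
# Hypothetical miniature dataset: a header plus three rows, one of which is missing a value
header = ['App', 'Category', 'Rating', 'Reviews']
rows = [
    ['Example App A', 'BUSINESS', '4.4', '51507'],
    ['Example App B', '1.9', '19'],            # missing 'Category', so later values shifted left
    ['Example App C', 'TOOLS', '4.2', '1200'],
]

clean_rows = []   # rows with the expected number of values
bad_indices = []  # positions of rows whose length differs from the header's
for index, row in enumerate(rows):
    if len(row) == len(header):
        clean_rows.append(row)
    else:
        bad_indices.append(index)

print('Malformed row indices:', bad_indices)
print('Rows kept:', len(clean_rows))
```

Applied to the real data, the same idea would read `android = [row for row in android if len(row) == len(android_header)]`; if the faulty row is the only malformed one, the result matches the deletion above, and running it twice changes nothing.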

### <span style="color:green"> Removing Duplicate Entries </span>

If we walk through both datasets, we may find duplicate entries for the same app. We need to find and remove these.

Let's first see what apps might be duplicated in the Google Play list:

In [6]:
# Initialize empty lists
duplicate_android_apps = []
unique_apps = []

# Create list of duplicate apps:
for app in android:
    name = app[0]
    if name in unique_apps:
        duplicate_android_apps.append(name)
    else:
        unique_apps.append(name)

print('Number of duplicate Android apps:', len(duplicate_android_apps))
print('\n')
print('Examples of duplicate Android apps:', duplicate_android_apps[:10])

Number of duplicate Android apps: 1181


Examples of duplicate Android apps: ['Quick PDF Scanner + OCR FREE', 'Box', 'Google My Business', 'ZOOM Cloud Meetings', 'join.me - Simple Meetings', 'Box', 'Zenefits', 'Google Ads', 'Google My Business', 'Slack']
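A side note on the cell above: `name in unique_apps` scans the whole list on every iteration, so the loop does quadratic work. A set performs each membership test in constant time on average. A minimal sketch of the same logic (the function name and the input list are hypothetical):

```python
def find_duplicates(names):
    """Return every name seen before, in the order the repeats occur."""
    seen = set()      # set membership tests are O(1) on average
    duplicates = []
    for name in names:
        if name in seen:
            duplicates.append(name)
        else:
            seen.add(name)
    return duplicates


# Hypothetical input; with the real data you would pass [app[0] for app in android]
print(find_duplicates(['Slack', 'Box', 'Slack', 'Zoom', 'Slack']))
```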


There are 1181 duplicate entries among the Android apps. Let's look at an example using 'Slack':


In [7]:
print(android_header)
print('\n')
for app in android:
    name = app[0]
    
    if name == "Slack":
        print(app)

['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']


['Slack', 'BUSINESS', '4.4', '51507', 'Varies with device', '5,000,000+', 'Free', '0', 'Everyone', 'Business', 'August 2, 2018', 'Varies with device', 'Varies with device']
['Slack', 'BUSINESS', '4.4', '51507', 'Varies with device', '5,000,000+', 'Free', '0', 'Everyone', 'Business', 'August 2, 2018', 'Varies with device', 'Varies with device']
['Slack', 'BUSINESS', '4.4', '51510', 'Varies with device', '5,000,000+', 'Free', '0', 'Everyone', 'Business', 'August 2, 2018', 'Varies with device', 'Varies with device']


From the above output, the only difference between these rows is the value in the `'Reviews'` column, which holds the number of reviews. This suggests the rows were collected at different times. Since an app's review count should generally grow over time, the row with the highest number of reviews is most likely the most recent. Therefore, we will remove the duplicates but keep, for each app, the row with the highest number of reviews.

In order to do this efficiently, we will:

1.) Create a dictionary where each key is a unique app name and the
    corresponding value is the highest number of reviews for that app.
    
2.) Use the stored information in the dictionary to create a new dataset with
    only one entry per app.

In [8]:
# Initialize a dictionary and two lists
reviews_max = {}  # app name -> highest number of reviews
android_clean = []  # contains the clean data
already_added = []  # contains the names of the apps already added to android_clean

#***********************************
#* Create a for loop that runs through and finds the maximum number of reviews
#***********************************
for app in android:
    name = app[0]
    n_reviews = float(app[3])
    
    if name in reviews_max and reviews_max[name] < n_reviews:
        reviews_max[name] = n_reviews
    
    if name not in reviews_max:
        reviews_max[name]=n_reviews



We will now use our new dictionary for step 2: removing the duplicate rows.

In [9]:
for app in android:
    name =app[0]
    n_reviews = float(app[3])
    
    if reviews_max[name] == n_reviews and name not in already_added:
        android_clean.append(app)
        already_added.append(name)
        
print('The original number of apps was:', len(android))
print ('the number of non-duplicated apps is expected to be:',len(android)-1181)
print('Our new dictionary has:', len(reviews_max), 'values')        
print('The cleaned data has ', len(android_clean), 'rows')
print('\n')
explore_data(android_clean, 0,3, True)

The original number of apps was: 10840
the number of non-duplicated apps is expected to be: 9659
Our new dictionary has: 9659 values
The cleaned data has  9659 rows


['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 and up']


Number of rows: 9659
Number of columns: 13
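The two cells above make two passes over the data and track `already_added` in a list. As a hedged single-pass alternative (a sketch, not the notebook's method; the function name is hypothetical), a dictionary can store the best row for each app directly:

```python
def dedupe_keep_most_reviews(rows, name_index=0, reviews_index=3):
    """Return one row per app name, keeping the row with the most reviews.

    On ties, the earliest row is kept, as in the two-pass approach above.
    """
    best = {}  # app name -> row with the highest review count seen so far
    for row in rows:
        name = row[name_index]
        n_reviews = float(row[reviews_index])
        if name not in best or n_reviews > float(best[name][reviews_index]):
            best[name] = row
    # Dicts keep insertion order (Python 3.7+), so apps appear in first-seen order
    return list(best.values())


# Hypothetical sample rows: App, Category, Rating, Reviews
sample = [
    ['Slack', 'BUSINESS', '4.4', '51507'],
    ['Example App', 'TOOLS', '4.0', '10'],
    ['Slack', 'BUSINESS', '4.4', '51510'],
]
for row in dedupe_keep_most_reviews(sample):
    print(row)
```

Called as `dedupe_keep_most_reviews(android)`, it should keep the same 9659 rows (one per unique name), though possibly in a different order, because each app stays at the position where its name first appeared.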


### <span style = "color: green">Remove the Non-English Apps </span>

Some of the apps do not appear to be geared toward English speakers. For example:


In [10]:
print('iOS App: ',ios[813][1])
print('Android App:', android_clean[4412][0])

iOS App:  爱奇艺PPS -《欢乐颂2》电视剧热播
Android App: 中国語 AQリスニング


For this study, we aren't interested in keeping these apps, so we need a way to remove them. There are many ways to do this; one is to remove any app whose name contains a character that is not commonly found in English text. English text commonly contains:

* Numbers composed of digits between 0 to 9
* Punctuation marks (`.`, `,`, `!`, `?`, `;`)
* Other symbols (`+`, `*`, `/`)
* Upper and lower case letters of the English alphabet

Behind the scenes in Python, each character in a string corresponds to a number, its Unicode code point. For example, the code point of the character <span style= "color: red"> `'a'`</span> is 97, and the code point of the character <span style= "color: red"> `'爱'`</span> is 29,233. We can get each character's code point with the built-in `ord()` function.

In [11]:
print(ord('a'))
print(ord('A'))
print(ord('爱'))
print(ord('5'))
print(ord('+'))

97
65
29233
53
43


The numbers and characters that we commonly use in English text all fall in the range 0 to 127, according to the American Standard Code for Information Interchange (ASCII). Using this range, we can build a function that detects whether a character belongs to the set of common English characters.
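Python 3.7 and later also provide the built-in string method `str.isascii()`, which returns `True` when every character in a string has a code point of 127 or lower (and also for an empty string). The sketch below applies it to a few names from this section; it performs the same test as the strict, character-by-character check written next:

```python
# str.isascii() requires Python 3.7+
names = ['Instagram', 'Docs To Go™ Free Office Suite', 'Instachat 😜']

for name in names:
    # True only if every character's code point is in the ASCII range 0-127
    print(name, '->', name.isascii())
```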

If an app name includes a character with a code point greater than 127, it most likely has a non-English name. Since our names are strings, we can loop over them one character at a time.

To start things off, we will write a function that checks whether a given string includes a non-ASCII character and returns `False` if the name is probably non-English.

In [12]:
def english_check(string):
    for character in string:
        if ord(character) > 127:
            return False  # found a character outside the ASCII range
    return True  # every character is in the ASCII range
    
print('Verifying that our English detection app works:')
print('Checking "爱奇艺PPS -《欢乐颂2》电视剧热播":')
print( english_check('爱奇艺PPS -《欢乐颂2》电视剧热播'))
print('Checking "Docs To Go™ Free Office Suite":')
print( english_check('Docs To Go™ Free Office Suite'))
print('Checking "Instagram":', english_check('Instagram'))
print('Checking "Instachat 😜":', english_check('Instachat 😜'))

Verifying that our English detection app works:
Checking "爱奇艺PPS -《欢乐颂2》电视剧热播":
False
Checking "Docs To Go™ Free Office Suite":
False
Checking "Instagram": True
Checking "Instachat 😜": False


Oops! The strict check also excluded English names that happen to contain a single special character, such as '™' or an emoji. Let's try again to keep as much data as possible by modifying the function to exclude only apps whose names contain more than three non-ASCII characters.

In [13]:
def english_check(string):
    non_ascii = 0
    for character in string:
        if ord(character) > 127:
            non_ascii += 1
    if non_ascii > 3:
        return False
    else:
        return True
    
# Let's recheck those values!
print('Verifying that our English detection app works:')
print('Checking "爱奇艺PPS -《欢乐颂2》电视剧热播":')
print( english_check('爱奇艺PPS -《欢乐颂2》电视剧热播'))
print('Checking "Docs To Go™ Free Office Suite":')
print( english_check('Docs To Go™ Free Office Suite'))
print('Checking "Instagram":', english_check('Instagram'))
print('Checking "Instachat 😜":', english_check('Instachat 😜'))
print('Checking "Hola Mi Amo":', english_check('Hola Mi Amo'))

Verifying that our English detection app works:
Checking "爱奇艺PPS -《欢乐颂2》电视剧热播":
False
Checking "Docs To Go™ Free Office Suite":
True
Checking "Instagram": True
Checking "Instachat 😜": True
Checking "Hola Mi Amo": True


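Still not perfect: Spanish-language names written only with ASCII letters, such as "Hola Mi Amo", still pass through, and an English name containing more than three special characters would be removed. It should, however, be good enough for our analysis.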
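The cutoff of three characters is a judgment call. A hedged sketch that makes it an explicit parameter (the name `is_probably_english` and the `max_non_ascii` parameter are hypothetical) lets you check how sensitive the filter is; with `max_non_ascii=0` it reduces to the strict version.

```python
def is_probably_english(name, max_non_ascii=3):
    """Return True if `name` contains at most `max_non_ascii` non-ASCII characters."""
    # Count characters whose code point lies outside the ASCII range (above 127)
    non_ascii = sum(1 for character in name if ord(character) > 127)
    return non_ascii <= max_non_ascii


for name in ['爱奇艺PPS -《欢乐颂2》电视剧热播', 'Docs To Go™ Free Office Suite', 'Instachat 😜']:
    print(name, '->', is_probably_english(name), '| strict:', is_probably_english(name, 0))
```

Counting how many apps each threshold keeps, for example `sum(is_probably_english(app[0], t) for app in android_clean)` for a few values of `t`, is one way to see what the choice of cutoff costs.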

Let's now use our `english_check()` function to filter out the non-English apps in each dataset:

In [14]:
android_english = []
ios_english = []

for app in android_clean:
    name = app[0]
    if english_check(name):
        android_english.append(app)

for app in ios:
    name = app[1]
    if english_check(name):
        ios_english.append(app)
        
explore_data(android_english, 0,3, True)
print('\n')
explore_data(ios_english, 0,3, True)

['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 and up']


Number of rows: 9614
Number of columns: 13


['284882215', 'Facebook', '389879808', 'USD', '0.0', '2974676', '212', '3.5', '3.5', '95.0', '4+', 'Social Networking', '37', '1', '29', '1']


['389801252', 'Instagram', '113954816', 'USD', '0.0', '2161558', '1289', '4.5', '4.0', '10.23', '12+', 'Photo & Video', '37', '0', '29', '1']


['529479190', 'Clash of Clans', '116476928', 'USD', '0.0', '2130805', '579', '4.5', '4.5', '9.24.12', '9+', 

From the above we can see that we were left with 9614 Android apps and 6183 iOS apps. This should be plenty of data for our initial run.

### <span style="color: green"> Isolating the Free Apps </span>

Since we only build apps that are free to download and install, our datasets should contain only apps that cost 0.

In [15]:
android_final = []
ios_final = []

for app in android_english:
    price = app[7]
    if price == '0':
        android_final.append(app)
        
for app in ios_english:
    price = app[4]
    if price == '0.0':
        ios_final.append(app)
        
print('Final Android rows:', len(android_final))
print('Final iOS rows:', len(ios_final))

Final Android rows: 8864
Final iOS rows: 3222


We are left with 8864 Android apps and 3222 iOS apps, just over 12,000 apps in total. This should still be enough for our analysis.
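The string comparisons above work because each dataset writes a zero price in one consistent way (`'0'` on Google Play, `'0.0'` on the App Store). A hedged, format-agnostic sketch parses the price instead. It assumes that paid Google Play prices carry a leading `$` (for example `'$4.99'`), so verify that against the data before relying on it; the function name `is_free` is hypothetical.

```python
def is_free(price):
    """Return True if a price string such as '0', '0.0' or '$4.99' represents zero cost."""
    # Assumption: a leading '$' may appear on Google Play prices; App Store prices are plain numbers
    return float(price.strip().lstrip('$')) == 0.0


for price in ['0', '0.0', '$4.99', '2.99']:
    print(repr(price), '->', is_free(price))
```

With this helper, the two filtering loops could share one test, e.g. `is_free(app[7])` for Android and `is_free(app[4])` for iOS.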

## <span style="color:green"> An Analysis of the Most Common Apps by Genre </span>

Our end goal is to build a successful app on both Google Play and the App Store. To minimize overall work and cost, we will accomplish this in three stages:

1.) Build the app on Google Play.
2.) Evaluate its success in six months. 
3.) If the app works well on Google Play and generates interest, the developers will begin developing it for iOS.

Since our end goal is an app that succeeds in both markets, we will begin by getting a sense of the most common genres in each market. To do this, we will build frequency tables for our datasets.




In [16]:
print(android_header)
print('\n')
print(ios_header)


['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']


['id', 'track_name', 'size_bytes', 'currency', 'price', 'rating_count_tot', 'rating_count_ver', 'user_rating', 'user_rating_ver', 'ver', 'cont_rating', 'prime_genre', 'sup_devices.num', 'ipadSc_urls.num', 'lang.num', 'vpp_lic']


To make short work of walking through the data, we define two functions:

1. `frequency_table`: Counts how often each value appears in the column at
    `index` of the dataset and converts each count into the percentage of
    rows that have that value.
2. `display_table`: Builds the frequency table for a given dataset and column
    index, sorts it from the greatest to the least percentage, and prints it.

In [17]:
def frequency_table (dataset, index):
    table = {}
    total = 0
    
    for row in dataset:
        total += 1
        value = row[index]
        if value in table:
            table[value] += 1
        else:
            table[value] = 1
            
    table_percentage = {}    
    for key in table:
        percentage = (table[key]/total)*100
        table_percentage[key]=percentage
    
    return table_percentage

def display_table (dataset, index):
    table = frequency_table (dataset, index)
    table_display = []
    for key in table:
        val_as_tuple = (table[key], key)
        table_display.append(val_as_tuple)
        
    table_sorted = sorted(table_display, reverse = True)
    for item in table_sorted:
        print(item[1],':', round(item[0],2),'%')
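The counting loop in `frequency_table` is exactly what the standard library's `collections.Counter` does. A hedged sketch of both functions built on it (the `_counter` names and the sample rows are hypothetical) could look like this:

```python
from collections import Counter


def frequency_table_counter(dataset, index):
    """Percentage of rows taking each value in column `index` (Counter-based sketch)."""
    counts = Counter(row[index] for row in dataset)
    total = len(dataset)
    return {value: count / total * 100 for value, count in counts.items()}


def display_table_counter(dataset, index):
    table = frequency_table_counter(dataset, index)
    # Sort by percentage, largest first
    for value, percentage in sorted(table.items(), key=lambda item: item[1], reverse=True):
        print(value, ':', round(percentage, 2), '%')


# Hypothetical rows with the genre in the last column
sample = [['a', 'Games'], ['b', 'Games'], ['c', 'Music'], ['d', 'Games']]
display_table_counter(sample, -1)
```

One small difference: the original sorts `(percentage, key)` tuples, so ties are broken by key in reverse alphabetical order, while this version keeps tied values in first-seen order because `sorted` is stable.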
    

We will start with the genre column, beginning with the iOS dataset (`prime_genre`, which is at index -5).

In [18]:
display_table(ios_final, -5)

Games : 58.16 %
Entertainment : 7.88 %
Photo & Video : 4.97 %
Education : 3.66 %
Social Networking : 3.29 %
Shopping : 2.61 %
Utilities : 2.51 %
Sports : 2.14 %
Music : 2.05 %
Health & Fitness : 2.02 %
Productivity : 1.74 %
Lifestyle : 1.58 %
News : 1.33 %
Travel : 1.24 %
Finance : 1.12 %
Weather : 0.87 %
Food & Drink : 0.81 %
Reference : 0.56 %
Business : 0.53 %
Book : 0.43 %
Navigation : 0.19 %
Medical : 0.19 %
Catalogs : 0.12 %


In [19]:
display_table(android_final, 1)

FAMILY : 18.91 %
GAME : 9.72 %
TOOLS : 8.46 %
BUSINESS : 4.59 %
LIFESTYLE : 3.9 %
PRODUCTIVITY : 3.89 %
FINANCE : 3.7 %
MEDICAL : 3.53 %
SPORTS : 3.4 %
PERSONALIZATION : 3.32 %
COMMUNICATION : 3.24 %
HEALTH_AND_FITNESS : 3.08 %
PHOTOGRAPHY : 2.94 %
NEWS_AND_MAGAZINES : 2.8 %
SOCIAL : 2.66 %
TRAVEL_AND_LOCAL : 2.34 %
SHOPPING : 2.25 %
BOOKS_AND_REFERENCE : 2.14 %
DATING : 1.86 %
VIDEO_PLAYERS : 1.79 %
MAPS_AND_NAVIGATION : 1.4 %
FOOD_AND_DRINK : 1.24 %
EDUCATION : 1.16 %
ENTERTAINMENT : 0.96 %
LIBRARIES_AND_DEMO : 0.94 %
AUTO_AND_VEHICLES : 0.93 %
HOUSE_AND_HOME : 0.82 %
WEATHER : 0.8 %
EVENTS : 0.71 %
PARENTING : 0.65 %
ART_AND_DESIGN : 0.64 %
COMICS : 0.62 %
BEAUTY : 0.6 %


Compared with the App Store, where games make up 58.16% of the free English-language apps, the free English-language apps on Google Play are spread more evenly across categories. Family apps make up the largest share (18.91%), and games are second (9.72%). Practical apps make up the next four most common categories (Tools, Business, Lifestyle, and Productivity).

## <span style="color:green"> App Popularity by Download </span>

In the previous section, we identified the most common genres in each store. We still need a way of measuring app popularity, though. One way is to calculate the average number of installs for each app genre. This is relatively easy for the Google Play dataset, which already stores this information in the `Installs` column. However, there is no equivalent column in the iOS dataset.

To address app popularity in the App Store dataset, we will need to be a bit more clever. We will use the `rating_count_tot` column (the total number of user ratings) as a proxy. To do this, we will:

1.) Separate the apps by genre, as we did above.

2.) Sum the number of user ratings for the apps in each genre.

3.) Divide that sum by the number of apps belonging to the genre (NOT the total
    number of apps).
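The three steps above can be sketched as a reusable helper that makes a single pass over the rows, accumulating a sum and a count per genre in two dictionaries. The notebook's own cell instead loops over the data once per genre; both approaches give the same averages. The helper name and the sample rows are hypothetical:

```python
def average_by_group(rows, group_index, value_index):
    """Average of float(row[value_index]) for each distinct row[group_index], in one pass."""
    sums = {}    # group -> running total of the values
    counts = {}  # group -> number of rows in the group
    for row in rows:
        group = row[group_index]
        sums[group] = sums.get(group, 0.0) + float(row[value_index])
        counts[group] = counts.get(group, 0) + 1
    # Divide by the size of each group, not by the total number of rows
    return {group: sums[group] / counts[group] for group in sums}


# Hypothetical rows: name, number of ratings, genre
sample = [['A', '100', 'Navigation'], ['B', '300', 'Navigation'], ['C', '50', 'Reference']]
for genre, avg in average_by_group(sample, 2, 1).items():
    print(genre, ':', round(avg, 2))
```

With the real data, `average_by_group(ios_final, -5, 5)` would average `rating_count_tot` (index 5) by `prime_genre` (index -5).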

In [20]:
# Build a frequency table; its keys are the unique genres
prime_genre = frequency_table(ios_final, -5)


for genre in prime_genre:
    total =0 # Will store the sum of user ratings for each genre
    len_genre =0 # Will store the number of apps in each genre
    
    for app in ios_final:
        genre_app = app [-5]
        
        if genre_app == genre:
            rating_number = float(app[5])
            total += rating_number
            len_genre +=1
        
    avg_ratings = total/len_genre
    print(genre,':', round(avg_ratings,2))
   

Utilities : 18684.46
Education : 7003.98
Sports : 23008.9
Business : 7491.12
Games : 22788.67
Medical : 612.0
Photo & Video : 28441.54
Navigation : 86090.33
News : 21248.02
Lifestyle : 16485.76
Music : 57326.53
Catalogs : 4004.0
Finance : 31467.94
Health & Fitness : 23298.02
Weather : 52279.89
Food & Drink : 33333.92
Book : 39758.5
Travel : 28243.8
Social Networking : 71548.35
Shopping : 26919.69
Reference : 74942.11
Productivity : 21028.41
Entertainment : 14029.83


Navigation has by far the highest average number of ratings, at about 86k. Let's see why.

In [21]:
for app in ios_final:
    if app[-5] == 'Navigation':
        print(app[1],':',app[5])#Print out name and number of ratings

Waze - GPS Navigation, Maps & Real-time Traffic : 345046
Google Maps - Navigation & Transit : 154911
Geocaching® : 12811
CoPilot GPS – Car Navigation & Offline Maps : 3582
ImmobilienScout24: Real Estate Search in Germany : 187
Railway Route Search : 5


Navigation is dominated by 'Waze' and 'Google Maps', which is not too surprising since many phone owners use one of these apps day to day. This category, as well as Music and Social Networking, is likely dominated by the popularity of a few giants. These areas are most likely not the best ones for our small company to break into. Let's look at the next most popular genre, 'Reference', at about 75k ratings on average.
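To put a number on "dominated by a few giants", here is a small sketch that uses only the six Navigation rating counts printed above. It compares the mean (the genre average reported earlier) with the median and with the mean of the remaining apps once the two largest are removed:

```python
from statistics import mean, median

# Rating counts for the six free English-language Navigation apps listed above
navigation_ratings = {
    'Waze - GPS Navigation, Maps & Real-time Traffic': 345046,
    'Google Maps - Navigation & Transit': 154911,
    'Geocaching®': 12811,
    'CoPilot GPS – Car Navigation & Offline Maps': 3582,
    'ImmobilienScout24: Real Estate Search in Germany': 187,
    'Railway Route Search': 5,
}

counts = list(navigation_ratings.values())
top_two = sorted(counts, reverse=True)[:2]

print('Mean:', round(mean(counts), 2))  # matches the Navigation average above
print('Median:', median(counts))
print(f'Share of ratings held by the top two apps: {sum(top_two) / sum(counts):.1%}')
print('Mean without the top two:', mean(sorted(counts)[:-2]))
```

Waze and Google Maps account for roughly 97% of all Navigation ratings, and without them the genre average falls from about 86k to roughly 4k, which is why the mean alone overstates how popular a typical Navigation app is.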

In [22]:
for app in ios_final:
    if app[-5] == 'Reference':
        print(app[1],':',app[5])#Print out name and number of ratings

Bible : 985920
Dictionary.com Dictionary & Thesaurus : 200047
Dictionary.com Dictionary & Thesaurus for iPad : 54175
Google Translate : 26786
Muslim Pro: Ramadan 2017 Prayer Times, Azan, Quran : 18418
New Furniture Mods - Pocket Wiki & Game Tools for Minecraft PC Edition : 17588
Merriam-Webster Dictionary : 16849
Night Sky : 12122
City Maps for Minecraft PE - The Best Maps for Minecraft Pocket Edition (MCPE) : 8535
LUCKY BLOCK MOD ™ for Minecraft PC Edition - The Best Pocket Wiki & Mods Installer Tools : 4693
GUNS MODS for Minecraft PC Edition - Mods Tools : 1497
Guides for Pokémon GO - Pokemon GO News and Cheats : 826
WWDC : 762
Horror Maps for Minecraft PE - Download The Scariest Maps for Minecraft Pocket Edition (MCPE) Free : 718
VPN Express : 14
Real Bike Traffic Rider Virtual Reality Glasses : 8
教えて!goo : 0
Jishokun-Japanese English Dictionary & Translator : 0


This genre draws on two of the most popular books of all time: the Bible and the dictionary. It also features reference material for popular games such as Minecraft and Pokémon GO. This might work well for our company. However, we should look at the Android market too.

### <span style="color:green"> Popular Apps by Genre on Google Play </span>

For the Google Play apps, we will use the `Installs` column to identify the more popular apps. Before we can do this, though, we need to convert the install strings (such as '10,000+') into numbers by removing the commas and the '+' sign.

In [23]:
categories_android = frequency_table(android_final, 1)

for category in categories_android:
    total = 0
    len_category = 0
    for app in android_final:
        category_app = app[1]
        if category_app == category:            
            n_installs = app[5]
            n_installs = n_installs.replace(',', '')
            n_installs = n_installs.replace('+', '')
            total += float(n_installs)
            len_category += 1
    avg_n_installs = total / len_category
    print(category, ':', round(avg_n_installs,2))

PRODUCTIVITY : 16787331.34
TOOLS : 10801391.3
PERSONALIZATION : 5201482.61
ENTERTAINMENT : 11640705.88
EVENTS : 253542.22
COMICS : 817657.27
ART_AND_DESIGN : 1986335.09
HEALTH_AND_FITNESS : 4188821.99
MAPS_AND_NAVIGATION : 4056941.77
BUSINESS : 1712290.15
COMMUNICATION : 38456119.17
TRAVEL_AND_LOCAL : 13984077.71
NEWS_AND_MAGAZINES : 9549178.47
AUTO_AND_VEHICLES : 647317.82
MEDICAL : 120550.62
PHOTOGRAPHY : 17840110.4
WEATHER : 5074486.2
EDUCATION : 1833495.15
BOOKS_AND_REFERENCE : 8767811.89
LIBRARIES_AND_DEMO : 638503.73
FAMILY : 3695641.82
SOCIAL : 23253652.13
SHOPPING : 7036877.31
FOOD_AND_DRINK : 1924897.74
FINANCE : 1387692.48
HOUSE_AND_HOME : 1331540.56
GAME : 15588015.6
DATING : 854028.83
PARENTING : 542603.62
SPORTS : 3638640.14
LIFESTYLE : 1437816.27
VIDEO_PLAYERS : 24727872.45
BEAUTY : 513151.89
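The string cleaning inside the loop can be pulled out into a small helper, sketched below (the name `parse_installs` is hypothetical). Because the `+` marks a lower bound ('10,000+' means at least 10,000 installs), the resulting averages are approximate, which is worth keeping in mind when comparing categories.

```python
def parse_installs(installs):
    """Convert a Google Play install string such as '10,000+' into an integer.

    The trailing '+' means the value is a lower bound, so the result is a
    minimum install count, not an exact one.
    """
    return int(installs.replace(',', '').replace('+', ''))


for value in ['10,000+', '5,000,000+', '1,000+']:
    print(value, '->', parse_installs(value))
```

Inside the loop above, `total += parse_installs(app[5])` would replace the three string-cleaning lines.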
