# Mobile Application Market Research Project

The ACME Corporation is seeking to improve its mobile applications to engage better with its customers.  The goal of this project is to find a mobile application profile that will attract more customers and make the apps more profitable.  We are working as data science consultants, helping the ACME app developers make data-driven decisions about what types of apps they should build.

ACME only builds apps that are free to install, and its main source of revenue is in-app ads.  The number of users therefore reflects how much revenue a particular app generates.  By analyzing how users engage with apps, we can identify the kinds of apps that are likely to attract more users.

## Obtaining the Data

We have chosen two of the most popular mobile app platforms to do research on, Google Play and Apple's App Store.  As of September 2018, there were approximately 2 million iOS apps in Apple's App Store and 2.1 million Android apps in the Google Play Store.  Since compiling all the data for both of these platforms would take a tremendous amount of time and effort, we have chosen to analyze only a sample of data.  We have also decided not to collect the data ourselves because it would cost a significant amount of additional resources.  We are deriving the basis of our analysis from the following sources:

   * A [dataset](https://www.kaggle.com/lava18/google-play-store-apps/home) of approximately 10,000 Android apps from the Google Play store.
   * A [dataset](https://www.kaggle.com/ramamet4/app-store-apple-data-set-10k-apps/home) of approximately 7,200 iOS apps from Apple's App Store.

We'll begin by opening both data sets and exploring how the data is formatted for a small sample of the data.

In [1]:
from csv import reader

def explore_data(dataset, start, end, rows_and_columns=False):
    dataset_slice = dataset[start:end]
    
    for row in dataset_slice:
        print(row)
        print("\n")
    
    if rows_and_columns:
        print("Number of rows:", len(dataset))
        print("Number of columns:", len(dataset[0]))

The explore_data() function takes a dataset and prints the rows between the starting index (inclusive) and the ending index (exclusive).  If rows_and_columns is True, it also prints the number of rows and columns in the dataset.
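As a small illustration of the loading step, the sketch below reads a CSV file into a list of rows using a `with` block, so the file is closed once it has been read.  The `load_csv` helper and the two-app sample file are assumptions made for this example and are not part of the original notebook.

```python
import csv
import os
import tempfile


def load_csv(path, encoding="utf8"):
    """Read a CSV file and return its rows as a list of lists (header included)."""
    with open(path, encoding=encoding, newline="") as f:
        return list(csv.reader(f))


# Hypothetical two-app file, written to a temporary location for the demo.
sample_text = "App,Rating\nExample App,4.1\nAnother App,3.9\n"
with tempfile.NamedTemporaryFile(
    "w", suffix=".csv", delete=False, encoding="utf8", newline=""
) as tmp:
    tmp.write(sample_text)
    sample_path = tmp.name

sample_rows = load_csv(sample_path)
os.remove(sample_path)  # clean up the temporary file

print(sample_rows[0])    # header row
print(sample_rows[1:])   # data rows
print("Number of rows:", len(sample_rows))
print("Number of columns:", len(sample_rows[0]))
```

The returned list has the same shape as the datasets used in this notebook (header at index 0), so it can be passed straight to explore_data().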

### Apple App Store Data Fields

In [2]:
appleStoreData = list(reader(open("AppleStore.csv", encoding="UTF8")))
print(appleStoreData[0])
explore_data(appleStoreData, 1, 11, True)


['', 'id', 'track_name', 'size_bytes', 'currency', 'price', 'rating_count_tot', 'rating_count_ver', 'user_rating', 'user_rating_ver', 'ver', 'cont_rating', 'prime_genre', 'sup_devices.num', 'ipadSc_urls.num', 'lang.num', 'vpp_lic']
['1', '281656475', 'PAC-MAN Premium', '100788224', 'USD', '3.99', '21292', '26', '4', '4.5', '6.3.5', '4+', 'Games', '38', '5', '10', '1']


['2', '281796108', 'Evernote - stay organized', '158578688', 'USD', '0', '161065', '26', '4', '3.5', '8.2.2', '4+', 'Productivity', '37', '5', '23', '1']


['3', '281940292', 'WeatherBug - Local Weather, Radar, Maps, Alerts', '100524032', 'USD', '0', '188583', '2822', '3.5', '4.5', '5.0.0', '4+', 'Weather', '37', '5', '3', '1']


['4', '282614216', 'eBay: Best App to Buy, Sell, Save! Online Shopping', '128512000', 'USD', '0', '262241', '649', '4', '4.5', '5.10.0', '12+', 'Shopping', '37', '5', '9', '1']


['5', '282935706', 'Bible', '92774400', 'USD', '0', '985920', '5320', '4.5', '5', '7.5.1', '4+', 'Reference', '37', 

For the App Store, the file has 7198 rows (including the header row) and 17 columns: an unnamed index column followed by 16 data columns.

The description of the fields:
* id : numeric app id
* track_name : app name
* size_bytes : size (in bytes)
* currency : currency type
* price : price amount
* rating_count_tot : user rating counts (for all versions)
* rating_count_ver : user rating counts (for current version)
* user_rating : average user rating (for all versions)
* user_rating_ver : average user rating (for current version)
* ver : app version number
* cont_rating : content rating
* prime_genre : primary genre
* sup_devices.num : number of supported devices
* ipadSc_urls.num : number of screenshots on app store page
* lang.num : number of supported languages
* vpp_lic : Vpp device-based licensing enabled

During initial analysis of the data, the fields that may be of use for our analysis are:  user_rating, prime_genre, cont_rating, price.

### Google Play Store Data Fields

In [3]:
googlePlayData = list(reader(open("GooglePlayStore.csv", encoding="UTF8")))
print(googlePlayData[0])
explore_data(googlePlayData, 1, 11, True)

['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']
['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['Coloring book moana', 'ART_AND_DESIGN', '3.9', '967', '14M', '500,000+', 'Free', '0', 'Everyone', 'Art & Design;Pretend Play', 'January 15, 2018', '2.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 and up']


['Pixel Draw - Number Art Coloring Book', 'ART_AND_DESIGN', '4.3', '967', '2.8M', '100,000+', 'Free', '0', 'Every

The Google Play store data contains 10,842 rows (including the header row) with 13 data fields.

The description of the fields:
* App : application name
* Category : category the app belongs to
* Rating : overall user rating of the app
* Reviews : number of user reviews for the app
* Size : size of the app
* Installs : number of user downloads/installs for the app
* Type : paid or free app
* Price : price of the app
* Content Rating : age group the app is targeted at:
    * Children
    * Mature 21+
    * Adult
* Genres : an app can belong to multiple genres (apart from its main category)
* Last Updated : date when the app was updated on Play Store
* Current Ver : current version of the app available on Play Store
* Android Ver : minimum Android version required

Initially, the following fields may be of use for our analysis:  Category, Rating, Reviews, Installs, Content Rating, Price, Genres, Type.

## Clean Up Bad Data

There is a [discussion](https://www.kaggle.com/lava18/google-play-store-apps/discussion/66015) associated with the Google Play dataset that states there is an error in record # 10472 (without counting the header row).

In [4]:
print(googlePlayData[0])      # Header
print(googlePlayData[10473])  # incorrect row
print(googlePlayData[1])      # compare with a correct row

['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']
['Life Made WI-Fi Touchscreen Photo Frame', '1.9', '19', '3.0M', '1,000+', 'Free', '0', 'Everyone', '', 'February 11, 2018', '1.0.19', '4.0 and up']
['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


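Row 10473 refers to the app *Life Made Wi-Fi Touchscreen Photo Frame*.  The row has only 12 fields instead of 13: the Category value is missing, so every later value is shifted one column to the left and the Rating column holds 19.  Since the range of Google Play ratings is from 1 to 5, this is clearly incorrect data.  To make our data as clean as possible, we'll just delete this data entry.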

In [5]:
del googlePlayData[10473]     # delete row to correct error
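
A more general check, sketched below, flags any row whose field count differs from the header's instead of relying on a known row number.  The function name `find_malformed_rows` and the miniature dataset are hypothetical and only illustrate the idea.

```python
def find_malformed_rows(dataset):
    """Return (index, row) pairs for rows whose length differs from the header's."""
    expected = len(dataset[0])
    return [
        (i, row)
        for i, row in enumerate(dataset[1:], start=1)
        if len(row) != expected
    ]


# Hypothetical miniature dataset: the last row is missing its Category value.
sample = [
    ["App", "Category", "Rating"],
    ["Good App", "TOOLS", "4.1"],
    ["Broken App", "19"],
]

for index, row in find_malformed_rows(sample):
    print(index, row)
```

The indices returned count the header as row 0, which matches the indexing used in the cells above.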

## Remove Duplicate Google Play data

We need to investigate whether there is duplicate data.  We create two lists: one that tracks the unique app names found, and one that collects the names of duplicate entries.
As part of our analysis, we'll display the number of duplicate app entries and print a small sample of the data to decide which entry to keep and which to remove from the dataset.

In [6]:
duplicate_apps = []
unique_apps = []

for app in googlePlayData:
    appName = app[0]
    if appName in unique_apps:
        duplicate_apps.append(appName)
    else:
        unique_apps.append(appName)

print("Number of duplicate apps: ", len(duplicate_apps))
print("\n")
print("Examples of duplicate apps: ", duplicate_apps[:15])

Number of duplicate apps:  1181


Examples of duplicate apps:  ['Quick PDF Scanner + OCR FREE', 'Box', 'Google My Business', 'ZOOM Cloud Meetings', 'join.me - Simple Meetings', 'Box', 'Zenefits', 'Google Ads', 'Google My Business', 'Slack', 'FreshBooks Classic', 'Insightly CRM', 'QuickBooks Accounting: Invoicing & Expenses', 'HipChat - Chat Built for Teams', 'Xero Accounting Software']


According to our analysis, there are 1181 duplicate apps in our Google Play Store database.
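
The same count can be sketched with `collections.Counter`, which avoids searching a list for every row.  The helper name `count_duplicates` and the sample rows are assumptions for illustration; like the cell above, it counts every repeated occurrence after the first.

```python
from collections import Counter


def count_duplicates(dataset, name_index=0):
    """Return (number of extra entries, {name: occurrences}) for repeated names."""
    counts = Counter(row[name_index] for row in dataset)
    repeated = {name: n for name, n in counts.items() if n > 1}
    extra_entries = sum(n - 1 for n in repeated.values())
    return extra_entries, repeated


# Hypothetical sample: "Slack" appears three times, so it has two extra entries.
sample = [["Slack"], ["Box"], ["Slack"], ["Slack"], ["Zoom"]]
extra, repeated = count_duplicates(sample)
print("Number of duplicate entries:", extra)
print(repeated)
```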

To illustrate an example, we'll print the duplicate entries for "Slack" to discover which entry to keep:

In [7]:
for app in googlePlayData:
    appName = app[0]
    if appName == "Slack":
        print(app)

['Slack', 'BUSINESS', '4.4', '51507', 'Varies with device', '5,000,000+', 'Free', '0', 'Everyone', 'Business', 'August 2, 2018', 'Varies with device', 'Varies with device']
['Slack', 'BUSINESS', '4.4', '51507', 'Varies with device', '5,000,000+', 'Free', '0', 'Everyone', 'Business', 'August 2, 2018', 'Varies with device', 'Varies with device']
['Slack', 'BUSINESS', '4.4', '51510', 'Varies with device', '5,000,000+', 'Free', '0', 'Everyone', 'Business', 'August 2, 2018', 'Varies with device', 'Varies with device']


The entries differ only in the Reviews field, which holds the number of reviews associated with the app.  Since the number of reviews generally grows over time, the entry with the most reviews is likely the newest one.  We'll keep the entry with the largest number of reviews and remove the others.

## Process to Remove Duplicate Entries
To delete duplicate entries, we create a dictionary that maps each unique app name to its highest number of reviews.  As indicated previously, the entry with the highest number of reviews is likely the latest one.
We'll use the information stored in the dictionary to create a new dataset with no duplicates.


In [8]:
reviews_max = {}

#Loop through Google Play data
for appData in googlePlayData[1:]:
    name = appData[0]
    n_reviews = float(appData[3])
    
    if (name in reviews_max) and (reviews_max[name] < n_reviews):
        reviews_max[name] = n_reviews
        
    if name not in reviews_max:
        reviews_max[name] = n_reviews

print("Number of expected unique apps: ")
print(len(reviews_max))

Number of expected unique apps: 
9659


First, we create a dictionary (reviews_max) that only contains the name of Google Play apps along with the highest number of reviews for that app.  This dictionary will be used as a filter for the new Google Play Store database without duplicate app entries.

We loop through the Google Play Store app database and for each app entry:
* if the app name is already in the reviews_max dictionary and the number of reviews is more than the one stored in reviews_max, update the number of reviews in the reviews_max dictionary.
* if the app name is not in the reviews_max dictionary, then add it.

In [9]:
android_clean = []
already_added = []

for appData in googlePlayData[1:]:
    name = appData[0]
    n_reviews = float(appData[3])
    
    if (n_reviews == reviews_max[name]) and (name not in already_added):
        android_clean.append(appData)
        already_added.append(name)

print("Number of apps in corrected app database: ")
print(len(android_clean))

Number of apps in corrected app database: 
9659


Next, we use the reviews_max dictionary to create a new database (android_clean) that contains one entry per app name (the one with the highest number of reviews), with all the fields associated with that entry in the source database.

We loop through the Google Play Store app database again.  For each app entry, we'll add the app entry to the new non-duplicate database if:
* The number of reviews is the same as the one in the reviews_max dictionary; and
* The app entry has not already been added to the non-duplicate database.  This prevents the code from adding duplicate entries with the same number of reviews as the highest one.
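
The two loops above can also be sketched as a single pass that stores the best row per app name in a dictionary.  This is an alternative under stated assumptions, not the notebook's method: the function name `dedupe_by_max_reviews` and the second sample app are hypothetical, and the output order follows the first appearance of each name rather than the position of the kept row.

```python
def dedupe_by_max_reviews(rows, name_index=0, reviews_index=3):
    """Keep one row per app name: the first row seen with the highest review count."""
    best = {}
    for row in rows:
        name = row[name_index]
        n_reviews = float(row[reviews_index])
        # Strict comparison, so a tie keeps the row that was seen first.
        if name not in best or n_reviews > float(best[name][reviews_index]):
            best[name] = row
    return list(best.values())


# Data rows only (no header); "Example App" is a made-up entry.
sample = [
    ["Slack", "BUSINESS", "4.4", "51507"],
    ["Example App", "TOOLS", "4.0", "10"],
    ["Slack", "BUSINESS", "4.4", "51510"],
    ["Slack", "BUSINESS", "4.4", "51507"],
]

for row in dedupe_by_max_reviews(sample):
    print(row)
```

As with the cells above, the header row should be excluded before calling the function, for example by passing `googlePlayData[1:]`.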

## Remove Non-English Apps

After additional exploration of the data set, there seem to be a few apps that are targeted towards a non-English speaking audience.

ACME is only interested in the English speaking market, so we'll write some code to remove any non-English apps in the database.  One approach is to remove apps whose names contain characters that aren't commonly used in English text.  English text typically uses letters from the English alphabet, numbers, punctuation marks, and other symbols (e.g. +, -, *, /).

Each character has a corresponding number (its code point).  The characters commonly used in English text fall into the ASCII range, between 0 and 127.

Below, we built the function IsEnglishWord() to determine if a string is made up of mostly English characters.  The function uses the built-in ord() function to convert each character to its number so we can determine if it's in the ASCII range.

However, rejecting every name that contains a character outside of ASCII 0-127 is too strict for our needs.  It would also remove English apps whose names contain emoji or symbols such as ™, which fall outside this range.

To minimize unnecessary data loss, we'll only remove an app if its name has more than three non-ASCII characters:

In [10]:
def IsEnglishWord(aString):
    numNonEnglishCharacters = 0
    for aChar in aString:
        if ord(aChar) > 127:
            numNonEnglishCharacters += 1

    if numNonEnglishCharacters > 3:
        return False
    else:
        return True

print(IsEnglishWord("Instagram"))
print(IsEnglishWord("爱奇艺PPS -《欢乐颂2》电视剧热播"))
print(IsEnglishWord("Docs To Go™ Free Office Suite"))
print(IsEnglishWord("Instachat 😜"))

True
False
True
True


Researching the entire range of non-ASCII (Unicode) characters is a time-consuming process and may not be worth the effort for our purposes.  The heuristic above isn't perfect, but it should be sufficient for our analysis.
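
For comparison, the same heuristic can be written more compactly with the threshold exposed as a parameter.  The name `is_probably_english` and the `max_non_ascii` parameter are assumptions introduced for this sketch; the default of 3 mirrors the rule described above.

```python
def is_probably_english(text, max_non_ascii=3):
    """Return True when text has at most max_non_ascii characters outside ASCII (0-127)."""
    non_ascii = sum(1 for ch in text if ord(ch) > 127)
    return non_ascii <= max_non_ascii


names = [
    "Instagram",
    "爱奇艺PPS -《欢乐颂2》电视剧热播",
    "Docs To Go™ Free Office Suite",
    "Instachat 😜",
]
for name in names:
    print(name, "->", is_probably_english(name))
```

Setting `max_non_ascii=0` reproduces the stricter rule, which would reject the last two names even though they are English.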
 
Below, we will filter out non-English apps from the Apple App Store and Google Play Store databases:

In [11]:
englishOnlyAndroidApps = []

for app in android_clean:
    if IsEnglishWord(app[0]):
        englishOnlyAndroidApps.append(app)

explore_data(englishOnlyAndroidApps, 1, 11, True)
print(len(englishOnlyAndroidApps))

englishOnlyIOSApps = []
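for app in appleStoreData[1:]:
    # Column 0 of this file is an unnamed index and column 1 is the numeric id,
    # so the app name (track_name) is at index 2.
    if IsEnglishWord(app[2]):
        englishOnlyIOSApps.append(app)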

explore_data(englishOnlyIOSApps, 1, 11, True)
print(len(englishOnlyIOSApps))

['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 and up']


['Pixel Draw - Number Art Coloring Book', 'ART_AND_DESIGN', '4.3', '967', '2.8M', '100,000+', 'Free', '0', 'Everyone', 'Art & Design;Creativity', 'June 20, 2018', '1.1', '4.4 and up']


['Paper flowers instructions', 'ART_AND_DESIGN', '4.4', '167', '5.6M', '50,000+', 'Free', '0', 'Everyone', 'Art & Design', 'March 26, 2017', '1.0', '2.3 and up']


['Smoke Effect Photo Maker - Smoke Editor', 'ART_AND_DESIGN', '3.8', '178', '19M', '50,000+', 'Free', '0', 'Everyone', 'Art & Design', 'April 26, 2018', '1.1', '4.0.3 and up']


['Infinite Painter', 'ART_AND_DESIGN', '4.1', '36815', '29M', '1,000,000+', 'Free', '0', 'Everyone', 

We can see there are 9614 Android and 6183 iOS apps targeted towards an English speaking audience.

## Isolate Free from Non-Free Apps

ACME is only interested in building apps that are free to download with their main source of revenue coming from in-app ads.  Our data set contains both free and paid apps, so we need to filter out the paid apps to continue with our analysis.

In [12]:
freeIOSApps = []
freeAndroidApps = []

for app in englishOnlyAndroidApps:
    price = app[7]
    if price == "0":
        freeAndroidApps.append(app)

for app in englishOnlyIOSApps:
    price = app[5]    # price column; free apps are stored as "0" in this file
    if price == "0":
        freeIOSApps.append(app)

print(len(freeAndroidApps))
print(len(freeIOSApps))

8864
4056


There are 8864 free Android Apps and 4056 free Apple Apps in the app store database.
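
The string comparison above depends on free apps being stored exactly as "0".  A slightly more tolerant check is sketched below; the `is_free` helper is hypothetical, and the handling of a leading `$` is an assumption about how paid prices might be formatted rather than something shown in the rows above.

```python
def is_free(price_text):
    """Return True when a price string such as '0', '0.0', or '$0.00' represents zero."""
    cleaned = price_text.strip().lstrip("$")
    try:
        return float(cleaned) == 0.0
    except ValueError:
        # Prices that cannot be parsed are treated as not free, so they are excluded.
        return False


for price in ["0", "0.0", "3.99", "$4.99"]:
    print(repr(price), "->", is_free(price))
```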

## Analysis of the First Candidate App by Genre

ACME wants to determine what kinds of apps will likely attract the most users, as the revenue from the app comes from in-app ads.

To minimize risk, ACME's strategy for its first app idea comprises three steps:

1. Build an Android version of the app and add it to the Google Play Store.
2. Iterate on the Android app based on feedback from users.
3. Once the Android version becomes profitable, build an iOS version of the app and add it to the Apple App Store.

ACME's vision is to create a presence in the Google Play and Apple App stores.  We need to determine which apps will be successful in both markets.

We start by looking at which genres are most common among the free apps in each store.
The fields that may be of importance are: prime_genre in the App Store data, and Genres and Category in the Google Play data.

Two functions build and display the frequency tables:
* freq_table() will generate frequency tables along with the percentages of each genre
* display_table() displays the percentages in descending order

In [13]:
def freq_table(dataset, index):
    aFreqTable = {}
    totalRows = len(dataset)
    
    for record in dataset:
        key = record[index]
        if key in aFreqTable:
            aFreqTable[key] += 1
        else:
            aFreqTable[key] = 1
            
    # express as a percentage
    for key in aFreqTable:
        aFreqTable[key] = aFreqTable[key] / totalRows * 100
        
    return aFreqTable

def display_table(dataset, index):
    table = freq_table(dataset, index)
    table_display = []
    for key in table:
        key_val_as_tuple = (table[key], key)
        table_display.append(key_val_as_tuple)
    
    table_sorted = sorted(table_display, reverse = True)
    for entry in table_sorted:
        print(entry[1], ":", entry[0])
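
The same percentage table can be sketched with `collections.Counter`.  The function names `freq_table_percent` and `sorted_table` and the four-row sample are assumptions for illustration.  Unlike display_table(), this version sorts on the percentage only, so genres with equal percentages stay in the order they were first seen.

```python
from collections import Counter


def freq_table_percent(dataset, index):
    """Return {value: percentage of rows} for the column at the given index."""
    counts = Counter(row[index] for row in dataset)
    total = len(dataset)
    return {key: count / total * 100 for key, count in counts.items()}


def sorted_table(dataset, index):
    """Return (value, percentage) pairs sorted by percentage, highest first."""
    table = freq_table_percent(dataset, index)
    return sorted(table.items(), key=lambda item: item[1], reverse=True)


# Hypothetical sample: three of four apps are in the Games genre.
sample = [["a", "Games"], ["b", "Games"], ["c", "Tools"], ["d", "Games"]]
for genre, pct in sorted_table(sample, 1):
    print(genre, ":", pct)
```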

Below we examine the frequency table for the prime_genre column of the Apple App store data set.  In deciding what type of app to build, we'll take into account the most popular genres of apps in Apple's App Store.

In [14]:
print("Free IOS Apps - prime_genre:\n")
display_table(freeIOSApps, 12)

Free IOS Apps - prime_genre:

Games : 55.64595660749507
Entertainment : 8.234714003944774
Photo & Video : 4.117357001972387
Social Networking : 3.5256410256410255
Education : 3.2544378698224854
Shopping : 2.983234714003945
Utilities : 2.687376725838264
Lifestyle : 2.3175542406311638
Finance : 2.0710059171597637
Sports : 1.947731755424063
Health & Fitness : 1.8737672583826428
Music : 1.6518737672583828
Book : 1.6272189349112427
Productivity : 1.5285996055226825
News : 1.4299802761341223
Travel : 1.3806706114398422
Food & Drink : 1.0601577909270217
Weather : 0.7642998027613412
Reference : 0.4930966469428008
Navigation : 0.4930966469428008
Business : 0.4930966469428008
Catalogs : 0.22189349112426035
Medical : 0.19723865877712032


For Apple's App Store, a majority of the free apps are Games, followed at a distance by Entertainment and Photo & Video.

Games make up approximately 56% of the free apps, while Entertainment makes up about 8% and Photo & Video about 4%.  There is a huge disparity between Games and all of the other genres in the free app space.

A brand new app in the Games genre may look like a safe bet for the App Store.  However, the popularity of that genre also means there could be a lot of competition, since many other companies are creating apps in the Games space.



In [15]:
print("\nFree Android Apps - Genres:\n")
display_table(freeAndroidApps, 9)


Free Android Apps - Genres:

Tools : 8.449909747292418
Entertainment : 6.069494584837545
Education : 5.347472924187725
Business : 4.591606498194946
Productivity : 3.892148014440433
Lifestyle : 3.892148014440433
Finance : 3.7003610108303246
Medical : 3.531137184115524
Sports : 3.463447653429603
Personalization : 3.3167870036101084
Communication : 3.2378158844765346
Action : 3.1024368231046933
Health & Fitness : 3.0798736462093865
Photography : 2.944494584837545
News & Magazines : 2.7978339350180503
Social : 2.6624548736462095
Travel & Local : 2.3240072202166067
Shopping : 2.2450361010830324
Books & Reference : 2.1435018050541514
Simulation : 2.0419675090252705
Dating : 1.861462093862816
Arcade : 1.8501805054151623
Video Players & Editors : 1.7712093862815883
Casual : 1.7599277978339352
Maps & Navigation : 1.3989169675090252
Food & Drink : 1.2409747292418771
Puzzle : 1.128158844765343
Racing : 0.9927797833935018
Role Playing : 0.9363718411552346
Libraries & Demo : 0.9363718411552346
Aut

For the Google Play free apps sorted by genre, the genres are more evenly distributed than their App Store counterparts.

The most common genres are: Tools at about 8.4%, Entertainment at about 6.1%, Education at about 5.3%, and Business at about 4.6%.

We can't yet recommend a genre for the Google Play Store because no genre is clearly more common than the others.



In [16]:
print("\nFree Android Apps - Category:\n")
display_table(freeAndroidApps, 1)


Free Android Apps - Category:

FAMILY : 18.907942238267147
GAME : 9.724729241877256
TOOLS : 8.461191335740072
BUSINESS : 4.591606498194946
LIFESTYLE : 3.9034296028880866
PRODUCTIVITY : 3.892148014440433
FINANCE : 3.7003610108303246
MEDICAL : 3.531137184115524
SPORTS : 3.395758122743682
PERSONALIZATION : 3.3167870036101084
COMMUNICATION : 3.2378158844765346
HEALTH_AND_FITNESS : 3.0798736462093865
PHOTOGRAPHY : 2.944494584837545
NEWS_AND_MAGAZINES : 2.7978339350180503
SOCIAL : 2.6624548736462095
TRAVEL_AND_LOCAL : 2.33528880866426
SHOPPING : 2.2450361010830324
BOOKS_AND_REFERENCE : 2.1435018050541514
DATING : 1.861462093862816
VIDEO_PLAYERS : 1.7937725631768955
MAPS_AND_NAVIGATION : 1.3989169675090252
FOOD_AND_DRINK : 1.2409747292418771
EDUCATION : 1.1620036101083033
ENTERTAINMENT : 0.9589350180505415
LIBRARIES_AND_DEMO : 0.9363718411552346
AUTO_AND_VEHICLES : 0.9250902527075812
HOUSE_AND_HOME : 0.8235559566787004
WEATHER : 0.8009927797833934
EVENTS : 0.7107400722021661
PARENTING : 0.65

For Google Play Store apps sorted by category, the largest share of apps is in the Family category at just under 19%, followed by Game at just under 10%.  No category holds a majority.

So far, we've discovered that the Apple App Store is dominated by games and entertainment apps, but the Google Play Store apps are more diverse.

Next, we'll examine the install base for each classification of app.

## Analyze the Number of Installs for Apple's App Store

Another way to find out which app genres are the most popular is to calculate the average number of installs per genre.

The App Store data set doesn't include the number of installs for each app, so we use the number of user ratings as a proxy.  For each genre, we compute the average number of user ratings by dividing the total number of user ratings (rating_count_tot) across the genre's apps by the number of apps in that genre.

In [17]:
uniqueAppStoreGenres = freq_table(freeIOSApps, 12)

for genre in uniqueAppStoreGenres:
    total = 0
    len_genre = 0
    for app in freeIOSApps:
        genre_app = app[12]
        if genre_app == genre:
            user_ratings = float(app[6])
            total += user_ratings
            len_genre += 1
            
    averageNumUserRatings = total / len_genre
    print(genre, ":", averageNumUserRatings)

Productivity : 19053.887096774193
Weather : 47220.93548387097
Shopping : 18746.677685950413
Reference : 67447.9
Finance : 13522.261904761905
Music : 56482.02985074627
Utilities : 14010.100917431193
Travel : 20216.01785714286
Social Networking : 53078.195804195806
Sports : 20128.974683544304
Health & Fitness : 19952.315789473683
Games : 18924.68896765618
Food & Drink : 20179.093023255813
News : 15892.724137931034
Book : 8498.333333333334
Photo & Video : 27249.892215568863
Entertainment : 10822.961077844311
Business : 6367.8
Lifestyle : 8978.308510638299
Education : 6266.333333333333
Navigation : 25972.05
Medical : 459.75
Catalogs : 1779.5555555555557


Based on the average number of user ratings per genre (our proxy for installs), Reference, Music, and Social Networking have the highest values in the Apple App Store, followed by Weather.
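
The per-genre average can also be computed in a single pass over the apps instead of one pass per genre.  The sketch below assumes a hypothetical helper named `average_by_group` and a made-up three-row sample; it sorts the result so the highest averages are printed first.

```python
from collections import defaultdict


def average_by_group(rows, group_index, value_index):
    """Return {group: mean of the numeric column} computed in one pass over rows."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for row in rows:
        group = row[group_index]
        totals[group] += float(row[value_index])
        counts[group] += 1
    return {group: totals[group] / counts[group] for group in totals}


# Hypothetical sample: (genre, number of user ratings).
sample = [["Games", "100"], ["Games", "300"], ["Music", "50"]]
averages = average_by_group(sample, 0, 1)
for genre, avg in sorted(averages.items(), key=lambda item: item[1], reverse=True):
    print(genre, ":", avg)
```

With the App Store data in this notebook, the call would use index 12 (prime_genre) for the group and index 6 (rating_count_tot) for the value.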

## Analyze Number of Installs for Google Play

Analyzing the number of installs for the Google Play Store is a little easier as there is a field named Installs.

However, the values in the Installs field are not precise.  Rather, they indicate a range or minimum value (for example, 10,000+).  A precise value isn't necessary for our comparison; we are just looking for a rough estimate of which genres have the highest number of installs.

In addition to being imprecise, the values in the Installs field contain symbols such as commas and plus signs.  To compare these values, we need to "clean" the data by converting it into numeric values (floats, specifically).
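
The cleaning step can be isolated in a small helper, sketched below.  The name `parse_installs` is hypothetical; the sample strings use the same format as the Installs values shown earlier.

```python
def parse_installs(text):
    """Convert an Installs string such as '10,000+' to a float (10000.0)."""
    return float(text.replace("+", "").replace(",", ""))


for raw in ["10,000+", "5,000,000+", "0"]:
    print(raw, "->", parse_installs(raw))
```

Because "10,000+" means at least 10,000 installs, the parsed number is a lower bound rather than an exact count.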

In [18]:
categoryFreq = freq_table(freeAndroidApps, 1)

for category in categoryFreq:
    total = 0
    len_category = 0
    
    for app in freeAndroidApps:
        category_app = app[1]
        if category_app == category:
            numInstalls = app[5]
            numInstalls = numInstalls.replace('+', '')
            numInstalls = numInstalls.replace(',', '')
            numInstalls = float(numInstalls)
            total += numInstalls
            len_category += 1
            
    avgNumInstalls = total / len_category
    print (category, "-", avgNumInstalls)

ART_AND_DESIGN - 1986335.0877192982
AUTO_AND_VEHICLES - 647317.8170731707
BEAUTY - 513151.88679245283
BOOKS_AND_REFERENCE - 8767811.894736841
BUSINESS - 1712290.1474201474
COMICS - 817657.2727272727
COMMUNICATION - 38456119.167247385
DATING - 854028.8303030303
EDUCATION - 1833495.145631068
ENTERTAINMENT - 11640705.88235294
EVENTS - 253542.22222222222
FINANCE - 1387692.475609756
FOOD_AND_DRINK - 1924897.7363636363
HEALTH_AND_FITNESS - 4188821.9853479853
HOUSE_AND_HOME - 1331540.5616438356
LIBRARIES_AND_DEMO - 638503.734939759
LIFESTYLE - 1437816.2687861272
GAME - 15588015.603248259
FAMILY - 3695641.8198090694
MEDICAL - 120550.61980830671
SOCIAL - 23253652.127118643
SHOPPING - 7036877.311557789
PHOTOGRAPHY - 17840110.40229885
SPORTS - 3638640.1428571427
TRAVEL_AND_LOCAL - 13984077.710144928
TOOLS - 10801391.298666667
PERSONALIZATION - 5201482.6122448975
PRODUCTIVITY - 16787331.344927534
PARENTING - 542603.6206896552
WEATHER - 5074486.197183099
VIDEO_PLAYERS - 24727872.452830188
NEWS_AND_

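Based on the average number of installs per category in the output shown above, Communication (about 38.5 million), Video Players (about 24.7 million), and Social (about 23.3 million) have the highest averages in the Google Play Store.  Game (about 15.6 million) and Entertainment (about 11.6 million) rank lower.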

## Conclusions

The goal of this project was to analyze the app genres in the Apple App Store and the Google Play Store in order to make recommendations to the ACME Corporation regarding building profitable mobile apps.

Based on our analysis, we recommend ACME create a social networking app, as social apps rank among the highest for average installs in the Google Play Store and for average number of user ratings (our proxy for installs) in the Apple App Store.  In addition, games and entertainment are the most common genres among free apps in the Apple App Store, and games are the second most common category in the Google Play Store.  We therefore also recommend that ACME add some entertainment or game functionality to its social networking app.