# Analyzing Mobile App Data

To run a successful business in any field, a company must process, organize, and analyze its data accurately. Recognizing what has been profitable in the past, what should be discontinued, and how new additions are performing are all part of a company's growth, and each requires careful attention. As a company acquires more data, the need for organization becomes more urgent. In this project, we will simulate a data analysis for a mobile app company.

```python
from csv import reader

# Load the AppleStore data: the first row is the header, the rest are apps
with open('AppleStore.csv', encoding='utf8') as opened_file:
    apple_store = list(reader(opened_file))
apple_store_header = apple_store[0]
apple_store = apple_store[1:]

# Load the Google Play data the same way
with open('googleplaystore.csv', encoding='utf8') as opened_file:
    google_play = list(reader(opened_file))
google_play_header = google_play[0]
google_play = google_play[1:]
```

In the code above, we opened our two data files and separated each header row from the data, so they are now ready to use. Before we start organizing the data, it helps to get a feel for what the files contain and how the data is structured. In the code below, we define a function that prints a slice of rows, which lets us inspect the data repeatedly in an orderly fashion. The function can also optionally print the number of rows and columns in a dataset.
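The two loading blocks above are nearly identical, so they could be folded into one reusable function. Below is a minimal sketch of that idea. The helper name `load_csv` is hypothetical and does not come from the original project. The function opens the file with a `with` statement, so the file is closed automatically, and it splits off the header row. It assumes the files are UTF-8 encoded.

```python
from csv import reader


def load_csv(path, encoding='utf8'):
    """Return (header, rows) for a CSV file. Hypothetical helper."""
    # The with-block closes the file even if reading fails
    with open(path, encoding=encoding, newline='') as opened_file:
        data = list(reader(opened_file))
    # The first row holds the column names; every other row is one app
    return data[0], data[1:]


# Usage with this project's files:
# apple_store_header, apple_store = load_csv('AppleStore.csv')
# google_play_header, google_play = load_csv('googleplaystore.csv')
```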

```python
def explore_data(dataset, start, end, rows_and_columns=False):
    dataset_slice = dataset[start:end]
    for row in dataset_slice:
        print(row)
        print('\n') # prints blank lines to separate the rows

    if rows_and_columns:
        print('Number of rows:', len(dataset))
        print('Number of columns:', len(dataset[0]))

print(apple_store_header)
print('\n')
explore_data(apple_store, 0, 5, rows_and_columns=True)

print(google_play_header)
print('\n')
explore_data(google_play, 0, 5, rows_and_columns=True)
```

['id', 'track_name', 'size_bytes', 'currency', 'price', 'rating_count_tot', 'rating_count_ver', 'user_rating', 'user_rating_ver', 'ver', 'cont_rating', 'prime_genre', 'sup_devices.num', 'ipadSc_urls.num', 'lang.num', 'vpp_lic']


['284882215', 'Facebook', '389879808', 'USD', '0.0', '2974676', '212', '3.5', '3.5', '95.0', '4+', 'Social Networking', '37', '1', '29', '1']


['389801252', 'Instagram', '113954816', 'USD', '0.0', '2161558', '1289', '4.5', '4.0', '10.23', '12+', 'Photo & Video', '37', '0', '29', '1']


['529479190', 'Clash of Clans', '116476928', 'USD', '0.0', '2130805', '579', '4.5', '4.5', '9.24.12', '9+', 'Games', '38', '5', '18', '1']


['420009108', 'Temple Run', '65921024', 'USD', '0.0', '1724546', '3842', '4.5', '4.0', '1.6.2', '9+', 'Games', '40', '5', '1', '1']


['284035177', 'Pandora - Music & Radio', '130242560', 'USD', '0.0', '1126879', '3594', '4.0', '4.5', '8.4.1', '12+', 'Music', '37', '4', '1', '1']


Number of rows: 7197
Number of columns: 16
['App', 'Catego

Using our function, we can see that the AppleStore file has 7197 rows and 16 columns, while the Google Play file has 10841 rows and 13 columns. Each column holds a single attribute, such as price, user ratings, reviews, name, or size, and each row holds the information for one app. Now that we understand how the data files are structured, we can process and organize our data.

The first step in organizing our data is cleaning it. Data cleaning includes removing duplicate data, inaccurate data, and data that doesn't fit our criteria. Our criteria are that an app must be free and in English. Any app that is not free, or not in English, needs to be removed. As data analysts, we need to be detailed and careful when cleaning data, because a proper analysis cannot be achieved with incorrect data.

To clean our data, we will first loop through each dataset and look for rows whose number of values does not match the number of columns in the header. A mismatch like this is a sign of missing data. Then we will use the helpful discussion pages for both the [AppleStore](https://www.kaggle.com/datasets/ramamet4/app-store-apple-data-set-10k-apps/discussion) dataset and the [Google Play](https://www.kaggle.com/datasets/lava18/google-play-store-apps/discussion) dataset to see whether other users have found more problems in these files.
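The row-length check can also be written as a small reusable function. In this sketch, the name `find_malformed_rows` is hypothetical. Instead of printing each problem row, the function returns the index of each suspicious row together with the row itself.

```python
def find_malformed_rows(header, rows):
    """Return (index, row) pairs whose length differs from the header's."""
    return [(index, row) for index, row in enumerate(rows)
            if len(row) != len(header)]


# Usage: print(find_malformed_rows(google_play_header, google_play))
```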

```python
# Flag every row whose number of values differs from the header's
for index, app in enumerate(apple_store): # Looping through AppleStore dataset
    if len(app) != len(apple_store_header):
        print("Error found in AppleStore dataset:")
        print(apple_store_header)
        print("\n")
        print(app)
        print(index)

for index, app in enumerate(google_play): # Looping through Google Play dataset
    if len(app) != len(google_play_header):
        print("Error found in Google Play dataset:")
        print(google_play_header)
        print("\n")
        print(app)
        print(index)
```

Error found in Google Play dataset:
['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']


['Life Made WI-Fi Touchscreen Photo Frame', '1.9', '19', '3.0M', '1,000+', 'Free', '0', 'Everyone', '', 'February 11, 2018', '1.0.19', '4.0 and up']
10472


Our loops found an error in the Google Play dataset. The row for the app "Life Made WI-Fi Touchscreen Photo Frame" has only 12 values instead of 13. The 'Category' value appears to be missing, which shifts every later value one column to the left. For example, the rating becomes 19, which is impossible on a 5-point scale. We also checked the discussion pages for both datasets, and the only error reported there was the one we just found ourselves! We will now delete this app from the Google Play dataset. Note that the code below must be run only once. It deletes whatever row sits at index 10472, so running it again would delete a valid app.
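The `del` statement below depends on the row's position. An alternative that is safe to rerun is to keep only the rows that have the expected number of values. This is a sketch, not the project's original code, and the helper name `drop_malformed_rows` is hypothetical. It assumes that every row with the wrong length should be discarded. That assumption holds here because the row-length check found only one such row.

```python
def drop_malformed_rows(header, rows):
    """Keep only rows whose length matches the header; safe to run repeatedly."""
    return [row for row in rows if len(row) == len(header)]


# Usage: google_play = drop_malformed_rows(google_play_header, google_play)
```

Running it a second time returns the same list, because the malformed row is already gone.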

```python
# Run this cell only once: each run deletes whatever row is at index 10472
del google_play[10472]
```

Now we need to check our datasets for duplicates. In the code below, we loop through the names of all the apps. Whenever a name has already been seen, we add it to a duplicate list. Once the duplicate lists are complete, we print the number of duplicates and a preview of the first 15. We repeat the same code for each dataset.
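The standard library's `collections.Counter` offers a compact way to count how often each name occurs. The helper name `count_duplicates` in this sketch is hypothetical. It reports the number of extra copies, where every occurrence after the first counts as one duplicate. That is the same count the duplicate-list approach produces. It also lists each duplicated name once.

```python
from collections import Counter


def count_duplicates(rows, name_index):
    """Return (number of extra copies, names that occur more than once)."""
    counts = Counter(row[name_index] for row in rows)
    duplicated = [name for name, n in counts.items() if n > 1]
    # An app that appears n times contributes n - 1 duplicates
    extra_copies = sum(n - 1 for n in counts.values())
    return extra_copies, duplicated


# Usage: print(count_duplicates(google_play, 0)[0])
```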

```python
duplicate_as = [] # 'as' stands for AppleStore; same in next line
unique_as = set() # a set makes the "seen before?" check fast

for app in apple_store:
    name = app[1]
    if name in unique_as:
        duplicate_as.append(name)
    else:
        unique_as.add(name)

print('Number of duplicate apps in the AppleStore dataset:', len(duplicate_as))
print('Examples of duplicate apps in the AppleStore dataset:', duplicate_as[:15])
print('\n')

duplicate_gp = [] # 'gp' stands for Google Play; same in next line
unique_gp = set()

for app in google_play:
    name = app[0]
    if name in unique_gp:
        duplicate_gp.append(name)
    else:
        unique_gp.add(name)

print('Number of duplicate apps in the Google Play dataset:', len(duplicate_gp))
print('Examples of duplicate apps in the Google Play dataset:', duplicate_gp[:15])
```



Number of duplicate apps in the AppleStore dataset: 2
Examples of duplicate apps in the AppleStore dataset: ['Mannequin Challenge', 'VR Roller Coaster']


Number of duplicate apps in the Google Play dataset: 1181
Examples of duplicate apps in the Google Play dataset: ['Quick PDF Scanner + OCR FREE', 'Box', 'Google My Business', 'ZOOM Cloud Meetings', 'join.me - Simple Meetings', 'Box', 'Zenefits', 'Google Ads', 'Google My Business', 'Slack', 'FreshBooks Classic', 'Insightly CRM', 'QuickBooks Accounting: Invoicing & Expenses', 'HipChat - Chat Built for Teams', 'Xero Accounting Software']


Our output shows that both datasets contain duplicates. The AppleStore dataset has only two, while the Google Play dataset has 1181. However, we will not remove duplicate rows at random, because some copies of an app may hold more up-to-date information than others. Instead, we will keep the copy with the highest number of reviews, on the assumption that the row with the most reviews was collected most recently. To find that copy, we build a dictionary with one key per app name. The value for each key is the highest number of reviews seen so far for that app. Every time the app reappears, we compare its number of reviews with the stored value. If the new number is higher, it replaces the stored value.
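The two-step process of building the dictionary and then looping again to pick rows can be condensed into a single pass. The trick is to store the whole row with the highest review count for each name. This is a hypothetical variant with an illustrative name, `dedupe_by_max_reviews`. When two copies tie on reviews, it keeps the first one seen, which matches the original approach. The output lists apps in the order their names first appear, which may differ slightly from the original loop's order.

```python
def dedupe_by_max_reviews(rows, name_index, reviews_index):
    """Keep, for each app name, the row with the most reviews (first wins ties)."""
    best = {}  # app name -> row with the highest review count seen so far
    for row in rows:
        name = row[name_index]
        n_reviews = float(row[reviews_index])
        # Replace only on a strictly higher count, so ties keep the earlier row
        if name not in best or float(best[name][reviews_index]) < n_reviews:
            best[name] = row
    return list(best.values())


# Usage: gp_clean = dedupe_by_max_reviews(google_play, 0, 3)
#        as_clean = dedupe_by_max_reviews(apple_store, 1, 5)
```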

```python
print('Expected length:', len(google_play) - len(duplicate_gp))
print('\n')

# Map each app name to the highest number of reviews seen for it
reviews_max_gp = {}
for app in google_play:
    name = app[0]
    n_reviews = float(app[3])
    if name not in reviews_max_gp or reviews_max_gp[name] < n_reviews:
        reviews_max_gp[name] = n_reviews

print("Google Play Dictionary length: ", len(reviews_max_gp))
```

Expected length: 9659


Google Play Dictionary length:  9659


```python
print('Expected length:', len(apple_store) - len(duplicate_as))
print('\n')

# Map each app name to the highest number of ratings seen for it
reviews_max_as = {}
for app in apple_store:
    name = app[1]
    n_reviews = float(app[5])
    if name not in reviews_max_as or reviews_max_as[name] < n_reviews:
        reviews_max_as[name] = n_reviews

print("AppleStore Dictionary length: ", len(reviews_max_as))
```

Expected length: 7195


AppleStore Dictionary length:  7195


Now that we have our dictionaries, we can start using them to remove the duplicates in our datasets.

```python
gp_clean = [] # 'gp' stands for Google Play
already_added_gp = []

for app in google_play:
    name = app[0]
    n_reviews = float(app[3])
    # Keep the row only if it has the most reviews and the app isn't already kept
    if (reviews_max_gp[name] == n_reviews) and (name not in already_added_gp):
        gp_clean.append(app)
        already_added_gp.append(name)

explore_data(gp_clean, 0, 3, True)
```

['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 and up']


Number of rows: 9659
Number of columns: 13


```python
as_clean = [] # 'as' stands for AppleStore
already_added_as = []

for app in apple_store:
    name = app[1]
    n_reviews = float(app[5])
    # Keep the row only if it has the most ratings and the app isn't already kept
    if (reviews_max_as[name] == n_reviews) and (name not in already_added_as):
        as_clean.append(app)
        already_added_as.append(name)

explore_data(as_clean, 0, 3, True)
```

['284882215', 'Facebook', '389879808', 'USD', '0.0', '2974676', '212', '3.5', '3.5', '95.0', '4+', 'Social Networking', '37', '1', '29', '1']


['389801252', 'Instagram', '113954816', 'USD', '0.0', '2161558', '1289', '4.5', '4.0', '10.23', '12+', 'Photo & Video', '37', '0', '29', '1']


['529479190', 'Clash of Clans', '116476928', 'USD', '0.0', '2130805', '579', '4.5', '4.5', '9.24.12', '9+', 'Games', '38', '5', '18', '1']


Number of rows: 7195
Number of columns: 16


In the code above, we loop through both of our datasets. For each app, we store its name and its number of reviews in variables. We then use our dictionary to check whether that row has the maximum number of reviews for that app. We also check whether the app has already been added. Otherwise, duplicates that tie for the highest number of reviews would all end up in our gp_clean or as_clean list. Finally, we use our explore_data function to confirm that the cleaned datasets have the expected number of rows.

Now that we have removed the duplicates from our datasets, we want to remove apps whose names are not in English, since these are likely aimed at non-English-speaking audiences. Before removing these apps, however, we will write a function that checks whether a string is English. The function looks up each character's numeric code with Python's built-in ord() function. The characters commonly used in English text fall in the ASCII range of 0 to 127.
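To make the idea concrete, the snippet below prints the code points of a few characters that appear in app names. Since Python 3.7, the built-in `str.isascii()` method performs exactly the "every character below 128" check. That means the first version of `is_english` could also be written as shown here. The name `is_english_strict` is hypothetical.

```python
# Code points of a few characters that appear in app names
print(ord('a'))   # 97: inside the ASCII range
print(ord('™'))   # 8482: outside the ASCII range
print(ord('😜'))  # 128540: outside the ASCII range


def is_english_strict(string):
    """True only if every character is ASCII (code point 0 to 127)."""
    # str.isascii() requires Python 3.7 or later
    return string.isascii()
```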

Note: The print calls below the function pass test strings to it to make sure it runs correctly and returns the expected value.

```python
def is_english(string):
    # ASCII characters have code points 0 to 127
    for character in string:
        if ord(character) > 127:
            return False

    return True

print(is_english('Instagram'))
print(is_english('爱奇艺PPS -《欢乐颂2》电视剧热播'))
print(is_english('Docs To Go™ Free Office Suite'))
print(is_english('Instachat 😜'))
```

True
False
False
False


The function returns True when every character is in the English range. However, the characters '™' and '😜' fall outside the [ASCII](https://en.wikipedia.org/wiki/ASCII) (American Standard Code for Information Interchange) range of 0 to 127, so those strings return False.

Although the function works as written, it labeled the last two strings as not English because each contains a single character outside the ASCII range. If we keep the function this way, we stand to lose a lot of useful data. In the code below, we rewrite the function to preserve as much useful data as possible. A string is now labeled as not English only if it has more than three non-ASCII characters, so a string with up to three such characters still counts as English. This filter is still imperfect, but it should keep the majority of the data we need.
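The same rule can be expressed with a configurable threshold. In this sketch, the function name `is_english_with_limit` and its `max_non_ascii` parameter are hypothetical. The default of 3 reproduces the rule just described. Unlike the loop version, this sketch counts every character instead of stopping early. It gives the same result but does slightly more work on long non-English names.

```python
def is_english_with_limit(string, max_non_ascii=3):
    """Treat a string as English unless it has more than max_non_ascii non-ASCII characters."""
    non_ascii = sum(1 for character in string if ord(character) > 127)
    return non_ascii <= max_non_ascii


print(is_english_with_limit('Docs To Go™ Free Office Suite'))  # True
print(is_english_with_limit('Docs To Go™', max_non_ascii=0))   # False
```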

```python
def is_english(string):
    non_english = 0
    for character in string:
        if ord(character) > 127:
            non_english += 1
        if non_english > 3:
            return False
    return True

print(is_english('Instagram'))
print(is_english('爱奇艺PPS -《欢乐颂2》电视剧热播'))
print(is_english('Docs To Go™ Free Office Suite'))
print(is_english('Instachat 😜'))
```

True
False
True
True


We will now use our function to clean both the AppleStore and Google Play datasets by looping through each one. If an app's name has three or fewer non-ASCII characters, the app is added to a new, clean list.

```python
google_clean = []
apple_clean = []

for app in gp_clean:
    name = app[0]
    if is_english(name):
        google_clean.append(app)

for app in as_clean:
    name = app[1]
    if is_english(name):
        apple_clean.append(app)

print("Clean Google Play dataset sample:")
explore_data(google_clean, 0, 3, True)
print("\n")
print("Clean AppleStore dataset sample:")
explore_data(apple_clean, 0, 3, True)
```

Clean Google Play dataset sample:
['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 and up']


Number of rows: 9614
Number of columns: 13


Clean AppleStore dataset sample:
['284882215', 'Facebook', '389879808', 'USD', '0.0', '2974676', '212', '3.5', '3.5', '95.0', '4+', 'Social Networking', '37', '1', '29', '1']


['389801252', 'Instagram', '113954816', 'USD', '0.0', '2161558', '1289', '4.5', '4.0', '10.23', '12+', 'Photo & Video', '37', '0', '29', '1']


['529479190', 'Clash of Clans', '11647692

So far, we have removed inaccurate data, duplicate apps, and non-English apps. Our final data cleaning step is to remove all apps that are not free, since our analysis covers only free apps.

In the code below, we loop through both datasets and check whether each app's price is 0. The price is stored as the string '0' in the Google Play dataset and as '0.0' in the AppleStore dataset. If the app is free, we add it to a new, clean list.
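Comparing against the exact strings '0' and '0.0' works only as long as every free app uses exactly those spellings. A more tolerant option is a helper that converts the price to a number before checking it. The name `is_free` in this sketch is hypothetical. It assumes prices may carry a leading '$', as paid Google Play prices such as '$4.99' do, and strips that sign before converting.

```python
def is_free(price):
    """Return True if a price string such as '0', '0.0' or '$0.00' is zero."""
    return float(price.lstrip('$')) == 0.0


# Usage: google_clean_final = [app for app in google_clean if is_free(app[7])]
#        apple_clean_final = [app for app in apple_clean if is_free(app[4])]
```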

```python
google_clean_final = []
apple_clean_final = []

for app in google_clean:
    price = app[7]
    if price == '0':
        google_clean_final.append(app)

for app in apple_clean:
    price = app[4]
    if price == '0.0':
        apple_clean_final.append(app)

print("Clean Google Play dataset sample:")
explore_data(google_clean_final, 0, 3, True)
print("\n")
print("Clean AppleStore dataset sample:")
explore_data(apple_clean_final, 0, 3, True)
```

Clean Google Play dataset sample:
['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 and up']


Number of rows: 8864
Number of columns: 13


Clean AppleStore dataset sample:
['284882215', 'Facebook', '389879808', 'USD', '0.0', '2974676', '212', '3.5', '3.5', '95.0', '4+', 'Social Networking', '37', '1', '29', '1']


['389801252', 'Instagram', '113954816', 'USD', '0.0', '2161558', '1289', '4.5', '4.0', '10.23', '12+', 'Photo & Video', '37', '0', '29', '1']


['529479190', 'Clash of Clans', '11647692

Now that we have cleaned our data, it's time to analyze it. One of our jobs as data analysts is to determine what has been successful and popular with users, and what has not. In this project, our goal is to create an app profile (an app template) that is likely to be profitable. By analyzing which kinds of apps are most common and most popular, we will recommend a profile for an app that could be added to both Google Play and the AppleStore.

To build this profile, we will create several frequency tables for different attributes of the apps in our datasets to see which kinds of apps are most common.

Looking at the headers of our datasets, the most useful columns for this analysis seem to be 'prime_genre' in the AppleStore dataset, and 'Genres' and 'Category' in the Google Play dataset. To create and analyze these frequency tables, we will build two functions. The first function builds a frequency table that maps each value in a column to the percentage of apps with that value. The second function displays those percentages in descending order.
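For comparison, the standard library's `collections.Counter` can handle the counting and sorting that `freq_table` and `display_table` perform. In this sketch, the name `freq_table_sorted` is hypothetical. It returns (value, percentage) pairs already sorted from most to least common instead of printing them. Values with equal percentages may be listed in a different order than `display_table` would use.

```python
from collections import Counter


def freq_table_sorted(dataset, index):
    """Return [(value, percentage), ...] sorted by percentage, highest first."""
    counts = Counter(row[index] for row in dataset)
    total = len(dataset)
    # most_common() with no argument lists every value, most frequent first
    return [(value, count / total * 100) for value, count in counts.most_common()]


# Usage: for genre, pct in freq_table_sorted(apple_clean_final, 11):
#            print(genre, ':', pct)
```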

```python
def freq_table(dataset, index):
    frequency_table = {}
    total = 0
    for row in dataset:
        total += 1
        a_data_point = row[index]
        if a_data_point in frequency_table:
            frequency_table[a_data_point] += 1
        else:
            frequency_table[a_data_point] = 1

    frequency_percentages = {}
    for key in frequency_table:
        percentage = (frequency_table[key] / total) * 100
        frequency_percentages[key] = percentage

    return frequency_percentages


def display_table(dataset, index):
    table = freq_table(dataset, index)
    table_display = []
    for key in table:
        key_val_as_tuple = (table[key], key)
        table_display.append(key_val_as_tuple)

    table_sorted = sorted(table_display, reverse=True)
    for entry in table_sorted:
        print(entry[1], ':', entry[0])


display_table(apple_clean_final, 11)
print('\n')
display_table(google_clean_final, 1)
print('\n')
display_table(google_clean_final, 9)
```

Games : 58.13664596273293
Entertainment : 7.888198757763975
Photo & Video : 4.968944099378882
Education : 3.6645962732919255
Social Networking : 3.291925465838509
Shopping : 2.608695652173913
Utilities : 2.515527950310559
Sports : 2.142857142857143
Music : 2.049689440993789
Health & Fitness : 2.018633540372671
Productivity : 1.7391304347826086
Lifestyle : 1.5838509316770186
News : 1.3354037267080745
Travel : 1.2422360248447204
Finance : 1.1180124223602486
Weather : 0.8695652173913043
Food & Drink : 0.8074534161490683
Reference : 0.5590062111801243
Business : 0.5279503105590062
Book : 0.43478260869565216
Navigation : 0.18633540372670807
Medical : 0.18633540372670807
Catalogs : 0.12422360248447205


FAMILY : 18.907942238267147
GAME : 9.724729241877256
TOOLS : 8.461191335740072
BUSINESS : 4.591606498194946
LIFESTYLE : 3.9034296028880866
PRODUCTIVITY : 3.892148014440433
FINANCE : 3.7003610108303246
MEDICAL : 3.531137184115524
SPORTS : 3.395758122743682
PERSONALIZATION : 3.3167870036101084


Now that the percentages are in front of us, let's analyze them, starting with the 'prime_genre' column in the AppleStore dataset. The majority of the apps are "Games", at 58.1%, followed by "Entertainment" at 7.9% and "Photo & Video" at 5.0%. The order of these genres suggests that most of these apps are designed for entertainment, while a smaller share is designed for practical purposes. The profile we would recommend for the App Store market is an app designed for entertainment, preferably in or near the "Games" and "Entertainment" genres. However, a large number of apps in a genre does not necessarily mean that the genre has the most users. The supply of apps in a genre is not the same as the demand for them.

Turning to the two tables from the Google Play dataset, 'FAMILY' ranks highest in the 'Category' column at 18.9%, followed by 'GAME' at 9.7%. In the 'Genres' column, 'Tools' is the most common genre, followed by 'Entertainment' at 6.0%. Overall, these two columns suggest that the apps in this dataset lean toward practical uses, with a smaller share devoted to games or entertainment. Games still rank high on Google Play, but unlike in the AppleStore dataset, they do not dominate it. These tables therefore point to a practical genre, rather than an entertainment-based one, for the Google Play market. If we were to recommend a single profile based on both datasets, it would be a genre that is useful while still entertaining, such as an educational game.

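After analyzing both datasets, we also want to determine which types of apps have the most users. The Google Play dataset has an 'Installs' column, so there we will calculate the average number of installs for each category. The AppleStore dataset has no install counts, so we will use the total number of user ratings ('rating_count_tot') as a proxy for the number of users. This analysis shows which genres are popular among users, rather than which genres simply have the most apps, as in the previous step.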

In the code below, we start with the AppleStore dataset. Using nested for loops, we isolate the apps of each genre, add up the number of user ratings for that genre, and divide the total by the number of apps in that genre. The result is the average number of user ratings per app in each genre, which gives us an estimate of which genres are most popular.
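The nested loops scan the whole dataset once per genre. A single-pass alternative keeps a running total and count for each group. The function `average_by_group` and its parameters in this sketch are hypothetical names. For the AppleStore data, `group_index=11` is the 'prime_genre' column and `value_index=5` is the 'rating_count_tot' column.

```python
from collections import defaultdict


def average_by_group(dataset, group_index, value_index):
    """Return {group: average of float(row[value_index])} in one pass."""
    totals = defaultdict(float)  # group -> running sum of values
    counts = defaultdict(int)    # group -> number of rows seen
    for row in dataset:
        group = row[group_index]
        totals[group] += float(row[value_index])
        counts[group] += 1
    return {group: totals[group] / counts[group] for group in totals}


# Usage, printing genres from most to least popular:
# averages = average_by_group(apple_clean_final, 11, 5)
# for genre, avg in sorted(averages.items(), key=lambda item: item[1], reverse=True):
#     print(genre, ':', avg)
```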

```python
genres_as = freq_table(apple_clean_final, 11) #'as' stands for AppleStore

for genre in genres_as:
    total = 0
    len_genre = 0
    for app in apple_clean_final:
        genre_app = app[11]
        if genre_app == genre:
            ratings = float(app[5])
            total += ratings
            len_genre += 1
    average = total/len_genre
    print('App Genre: ' + str(genre) + '; Average Number of User ratings: ' + str(average))
    print('\n')
```

App Genre: Social Networking; Average Number of User ratings: 71548.34905660378


App Genre: Photo & Video; Average Number of User ratings: 28441.54375


App Genre: Games; Average Number of User ratings: 22812.92467948718


App Genre: Music; Average Number of User ratings: 57326.530303030304


App Genre: Reference; Average Number of User ratings: 74942.11111111111


App Genre: Health & Fitness; Average Number of User ratings: 23298.015384615384


App Genre: Weather; Average Number of User ratings: 52279.892857142855


App Genre: Utilities; Average Number of User ratings: 18684.456790123455


App Genre: Travel; Average Number of User ratings: 28243.8


App Genre: Shopping; Average Number of User ratings: 26919.690476190477


App Genre: News; Average Number of User ratings: 21248.023255813954


App Genre: Navigation; Average Number of User ratings: 86090.33333333333


App Genre: Lifestyle; Average Number of User ratings: 16485.764705882353


App Genre: Entertainment; Average Number of Us

This output shows that the 'Navigation' genre has the highest average at 86,090 user ratings, followed by 'Reference' at 74,942 and 'Social Networking' at 71,548. This result suggests that navigation apps are the most popular in the AppleStore market. Keep in mind, though, that according to our frequency table only about 0.19% of the apps (6 apps) are in the Navigation genre, so this average rests on very few apps. With that caveat, we recommend a popular app profile for the AppleStore market that belongs to a practical genre, specifically leaning toward navigation.

Moving on to the Google Play dataset, the code below is very similar to the AppleStore code above. Once we have the average number of installs for each app category, we will analyze the results.
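The Google Play install counts are stored as strings such as '1,000,000+', so they need cleaning before they can be averaged. The hypothetical helper below does the same cleaning as the loop. Each value is an open-ended bucket, so '1,000,000+' means "at least 1,000,000". The parsed numbers are therefore lower bounds, and the averages built from them are approximate.

```python
def parse_installs(installs):
    """Convert an install-count string such as '1,000,000+' to a float."""
    return float(installs.replace('+', '').replace(',', ''))
```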

```python
category_gp = freq_table(google_clean_final, 1) #'gp' stands for Google Play

for category in category_gp:
    total = 0
    len_category = 0
    for app in google_clean_final:
        category_app = app[1]
        if category_app == category:
            installs = app[5]
            installs = installs.replace('+', '')
            installs = installs.replace(',', '')
            installs = float(installs)
            total += installs
            len_category += 1
    average = total/len_category
    print('App Category: ' + str(category) + '; Average Number of Installs: ' + str(average))
    print('\n')
```

App Category: ART_AND_DESIGN; Average Number of Installs: 1986335.0877192982


App Category: AUTO_AND_VEHICLES; Average Number of Installs: 647317.8170731707


App Category: BEAUTY; Average Number of Installs: 513151.88679245283


App Category: BOOKS_AND_REFERENCE; Average Number of Installs: 8767811.894736841


App Category: BUSINESS; Average Number of Installs: 1712290.1474201474


App Category: COMICS; Average Number of Installs: 817657.2727272727


App Category: COMMUNICATION; Average Number of Installs: 38456119.167247385


App Category: DATING; Average Number of Installs: 854028.8303030303


App Category: EDUCATION; Average Number of Installs: 1833495.145631068


App Category: ENTERTAINMENT; Average Number of Installs: 11640705.88235294


App Category: EVENTS; Average Number of Installs: 253542.22222222222


App Category: FINANCE; Average Number of Installs: 1387692.475609756


App Category: FOOD_AND_DRINK; Average Number of Installs: 1924897.7363636363


App Category: HEALTH_AND

We now have each category paired with its average number of installs. This output shows that 'COMMUNICATION' has the highest average at 38,456,119 installs, followed by 'VIDEO_PLAYERS' at 24,727,872 and 'SOCIAL' at 23,253,652. These three categories share a social media theme, where people communicate, share videos, and socialize. We therefore recommend a category leaning toward social media for a popular app profile in the Google Play market.

To recap, our criteria for the mobile app profile were that the app should be free, in English, and belong to a popular genre. With clean datasets containing only free, English apps, we built several frequency tables and averages and made recommendations from them. Now that we have analyzed the most popular app genres in both datasets, we can create a unified app profile.

According to the AppleStore dataset, the 'Navigation' genre is the most popular. In the Google Play dataset, the most popular categories point to social media. Combining elements of the two, we arrive at 'Community Navigation' as our final genre. An app in this genre could be a navigation app that connects the community of people using it, which adds the social media aspect. For example, one user could report a new road blockage or accident to other users, making their trips more efficient. In another example, one user could alert others to flight cancellations, delays, or sales, helping travelers plan around them. With the final genre determined, we can recommend a mobile app profile suited to both the AppleStore and Google Play markets. The app should be free, available in English, and belong to the "Community Navigation" genre.