# Popular Google and Apple mobile apps
This is an analysis of the most popular Android mobile apps on the Google Play Store.

The goal is to show the marketing and development teams which types of free apps are most likely to be downloaded, so that they can focus on building those types of apps.

The data for this analysis can be downloaded from [here](https://www.kaggle.com/lava18/google-play-store-apps).

In [1]:
# open and load the data set; the with-block closes the file once it has been read
from csv import reader

with open('googleplaystore.csv', encoding='utf8') as opened_file:
    read_file = reader(opened_file)
    app_data = list(read_file)
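The csv module is needed here because some fields contain commas inside quotes (for example the Installs value "10,000+"), which a plain `str.split(',')` would break apart. A minimal sketch on a hypothetical two-line sample shaped like googleplaystore.csv:

```python
import csv
import io

# Hypothetical sample: the Installs field contains a comma, so it is wrapped in quotes.
sample = (
    'App,Installs,Type\n'
    'Example App,"10,000+",Free\n'
)

# Naive splitting breaks the quoted field into two columns.
naive_rows = [line.split(',') for line in sample.splitlines()]

# csv.reader understands quoting and keeps the field intact.
parsed_rows = list(csv.reader(io.StringIO(sample)))
header, data_rows = parsed_rows[0], parsed_rows[1:]

print(naive_rows[1])  # ['Example App', '"10', '000+"', 'Free']
print(data_rows[0])   # ['Example App', '10,000+', 'Free']
```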

In [2]:
# function to print a range of rows of the data set (data[start:end])
# to explore its contents.
def explore_data(data, start, end, rows_and_columns=False):
    list_data = data[start:end]
    for row in list_data:
        print(row)
        print('\n')
    if rows_and_columns:
        # these counts describe the printed slice, not the whole data set
        print('Number of rows:', len(list_data))
        print('Number of columns:', len(list_data[0]))

In [3]:
# look at the first 5 rows of the data set (the header row plus 4 apps)
explore_data(app_data, 0, 5, rows_and_columns=True)

['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']


['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['Coloring book moana', 'ART_AND_DESIGN', '3.9', '967', '14M', '500,000+', 'Free', '0', 'Everyone', 'Art & Design;Pretend Play', 'January 15, 2018', '2.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 and up']


Number of rows: 5
Number of columns: 13


Print the header row to identify the columns to use in the analysis.

In [4]:
print(app_data[0])

['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']
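The later cells refer to columns by position (row[1], row[3], row[5], row[6]). A small lookup built from this header makes those positions explicit. This is a sketch only; `col` is a hypothetical helper name that the rest of the notebook does not use.

```python
# Header row as printed above.
header = ['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type',
          'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver',
          'Android Ver']

# Map each column name to its position so rows can be indexed by name.
col = {name: index for index, name in enumerate(header)}

row = ['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1',
       '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design',
       'January 7, 2018', '1.0.0', '4.0.3 and up']

print(col['Reviews'], col['Installs'], col['Type'], col['Price'])  # 3 5 6 7
print(row[col['Installs']])  # 10,000+
```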


## Cleaning data
- Detect rows with missing columns.
- Detect duplicate rows.
- Remove non-English apps.
- Remove apps that aren't free.

In [5]:
# loop through the list to find any rows that are missing columns.
# enumerate gives the true position of each row, even if identical rows exist.
for index, row in enumerate(app_data):
    if len(row) != len(app_data[0]):
        print(row)
        print(index)

['Life Made WI-Fi Touchscreen Photo Frame', '1.9', '19', '3.0M', '1,000+', 'Free', '0', 'Everyone', '', 'February 11, 2018', '1.0.19', '4.0 and up']
10473


In [6]:
# remove the row at the index printed above. This row is missing the 'Category' column.
del app_data[10473]
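Deleting by a hard-coded index works, but that cell must be run exactly once: running it again deletes a different, valid row. A sketch of an alternative that keeps only rows whose length matches the header; `drop_malformed` and the sample rows are hypothetical.

```python
def drop_malformed(data):
    """Return the header plus only the rows that have as many fields as the header."""
    header = data[0]
    return [header] + [row for row in data[1:] if len(row) == len(header)]

# Hypothetical data: 'Broken App' is missing one field.
sample = [
    ['App', 'Category', 'Rating'],
    ['Good App', 'TOOLS', '4.5'],
    ['Broken App', '1.9'],
    ['Other App', 'GAME', '4.0'],
]

cleaned = drop_malformed(sample)
print(len(cleaned))  # 3
```

Because the filter is based on row length rather than position, running it a second time leaves the data unchanged.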

In [7]:
# check for duplicate apps
duplicates = []
unique_apps = set()  # a set makes the membership check fast
for row in app_data:
    app = row[0]
    if app in unique_apps:
        duplicates.append(app)
    else:
        unique_apps.add(app)
print('Number of duplicate apps:', len(duplicates))
print('\n')
print('Examples of duplicate apps:', duplicates[:15])

Number of duplicate apps: 1181


Examples of duplicate apps: ['Quick PDF Scanner + OCR FREE', 'Box', 'Google My Business', 'ZOOM Cloud Meetings', 'join.me - Simple Meetings', 'Box', 'Zenefits', 'Google Ads', 'Google My Business', 'Slack', 'FreshBooks Classic', 'Insightly CRM', 'QuickBooks Accounting: Invoicing & Expenses', 'HipChat - Chat Built for Teams', 'Xero Accounting Software']
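`collections.Counter` gives the same count of extra entries as the loop above and also shows how many times each app appears. A sketch on hypothetical app names:

```python
from collections import Counter

# Hypothetical app names: 'Box' appears three times and 'Slack' twice.
names = ['Box', 'Slack', 'Box', 'Gmail', 'Box', 'Slack']

counts = Counter(names)

# Each occurrence after the first is one duplicate entry,
# which matches how the loop above fills the duplicates list.
n_duplicates = sum(count - 1 for count in counts.values())
repeated_apps = {name: count for name, count in counts.items() if count > 1}

print(n_duplicates)   # 3
print(repeated_apps)  # {'Box': 3, 'Slack': 2}
```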


In [8]:
# inspect the duplicate entries for one app
for app in app_data:
    name = app[0]
    if name == 'Google Ads':
        print(app)

['Google Ads', 'BUSINESS', '4.3', '29313', '20M', '5,000,000+', 'Free', '0', 'Everyone', 'Business', 'July 30, 2018', '1.12.0', '4.0.3 and up']
['Google Ads', 'BUSINESS', '4.3', '29313', '20M', '5,000,000+', 'Free', '0', 'Everyone', 'Business', 'July 30, 2018', '1.12.0', '4.0.3 and up']
['Google Ads', 'BUSINESS', '4.3', '29331', '20M', '5,000,000+', 'Free', '0', 'Everyone', 'Business', 'July 30, 2018', '1.12.0', '4.0.3 and up']


Rather than removing duplicates at random, keep only the entry with the highest number of reviews, since a higher review count suggests that entry was collected most recently.
As seen above, 'Google Ads' has three entries, and one of them has a higher number of reviews (the 4th column, index 3).

- Create a dictionary, where each key is an app name and the value is the highest number of reviews.
- Use this dictionary to filter the app_data into a new data set.

In [9]:
# highest number of reviews for each app name (skipping the header row)
reviews_max = {}
for row in app_data[1:]:
    name = row[0]
    n_reviews = float(row[3])
    if name in reviews_max and n_reviews > reviews_max[name]:
        reviews_max[name] = n_reviews
    elif name not in reviews_max:
        reviews_max[name] = n_reviews

In [10]:
print('Expected length', len(app_data[1:]) - len(duplicates))
print('Actual length', len(reviews_max))

Expected length 9659
Actual length 9659


To remove the duplicates:

- Initialize two empty lists, android_clean and already_added.
- Loop through the Android data set (skipping the header row), and for every iteration:
    - Isolate the name of the app and the number of reviews.
    - Add the current row to android_clean, and the app name to already_added, if:
        - the number of reviews of the current row matches the highest number of reviews for that app, as stored in the reviews_max dictionary; and
        - the name of the app is not already in already_added. This extra condition handles cases where more than one entry of a duplicate app shares the highest number of reviews (for example, the Box app has three entries with the same number of reviews). Checking only reviews_max[name] == n_reviews would still leave duplicate entries for those apps.
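To see why the already_added check matters, here is a sketch on hypothetical rows where one app has two entries tied for the highest review count, as described for Box. Only the name and review columns are kept for brevity.

```python
# Hypothetical rows: [name, reviews]. 'Box' has two entries tied at 200 reviews.
rows = [
    ['Box', '150'],
    ['Box', '200'],
    ['Box', '200'],
    ['Slack', '50'],
]

# Highest review count seen for each app name.
reviews_max = {}
for name, reviews in rows:
    n_reviews = float(reviews)
    if name not in reviews_max or n_reviews > reviews_max[name]:
        reviews_max[name] = n_reviews

# Without the already_added check, both tied 'Box' rows survive.
without_check = [row for row in rows if float(row[1]) == reviews_max[row[0]]]

# With the check, only the first row matching the maximum is kept.
with_check = []
already_added = set()
for name, reviews in rows:
    if float(reviews) == reviews_max[name] and name not in already_added:
        with_check.append([name, reviews])
        already_added.add(name)

print(len(without_check))  # 3
print(len(with_check))     # 2
```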


In [11]:
# remove duplicate rows
android_clean = []
already_added = []

for row in app_data[1:]:
    name = row[0]
    n_reviews = float(row[3])
    if (n_reviews == reviews_max[name]) and (name not in already_added):
        android_clean.append(row)
        already_added.append(name)
print(len(android_clean))

9659


In [12]:
print(len(already_added))

9659


In [13]:
print(android_clean[0:5])

[['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up'], ['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up'], ['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 and up'], ['Pixel Draw - Number Art Coloring Book', 'ART_AND_DESIGN', '4.3', '967', '2.8M', '100,000+', 'Free', '0', 'Everyone', 'Art & Design;Creativity', 'June 20, 2018', '1.1', '4.4 and up'], ['Paper flowers instructions', 'ART_AND_DESIGN', '4.4', '167', '5.6M', '50,000+', 'Free', '0', 'Everyone', 'Art & Design', 'March 26, 2017', '1.0', '2.3 and up']]


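Remove non-English apps from the android_clean data. An app name is treated as non-English if it contains more than three characters outside the ASCII range (code points above 127). Allowing up to three such characters keeps English apps whose names include emoji or symbols such as ™.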

In [14]:
# create a function that returns False if the app name contains more than three non-ASCII characters (likely a non-English name), otherwise it returns True.
def English_app(app_name):
    non_ascii = 0
    for character in app_name:
        if ord(character) > 127:
            non_ascii += 1
    if non_ascii > 3:
        return False
    else:
        return True


In [15]:
English_app('Docs To Go™ Free Office Suite')

True

In [16]:
# Use the function to filter out non-English apps from the cleaned data.
android_apps = []

for row in android_clean:
    name = str(row[0])
    if English_app(name):
        android_apps.append(row)

In [17]:
explore_data(android_apps, 0, 2, True)
print('Length of filtered Android apps:', len(android_apps))

['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


Number of rows: 2
Number of columns: 13
Length of filtered Android apps: 9614


## Isolating Free Apps
Isolate the free apps from the cleaned data (android_apps). The code below checks the Type column (the 7th column, index 6) for the value 'Free'.
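The Type column (index 6) and the Price column (index 7) both describe whether an app costs money, so a quick consistency check can confirm they agree before filtering on one of them. A sketch on hypothetical rows trimmed to those two columns; the '$4.99' price format is an assumption about the data.

```python
# Hypothetical rows trimmed to [Type, Price]; the real rows keep these at index 6 and 7.
rows = [
    ['Free', '0'],
    ['Paid', '$4.99'],
    ['Free', '0'],
]

TYPE, PRICE = 0, 1  # positions within the trimmed rows above

# Rows where Type and Price disagree: 'Free' with a nonzero price, or 'Paid' with '0'.
mismatches = [
    row for row in rows
    if (row[TYPE] == 'Free') != (row[PRICE] == '0')
]

free_rows = [row for row in rows if row[TYPE] == 'Free']
print(len(free_rows), len(mismatches))  # 2 0
```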

In [18]:
Android_apps_free = []
for row in android_apps:
    app_type = row[6]  # Type column: 'Free' for free apps
    if app_type == 'Free':
        Android_apps_free.append(row)

print('Number of free apps:', len(Android_apps_free))
print('Number of non-free apps:', len(android_apps)-len(Android_apps_free))

Number of free apps: 8863
Number of non-free apps: 751


Out of these free apps, which genres or categories are the most common?
This will narrow down the types of apps to develop.

In [19]:
# function that builds a frequency table, in percentages, for the column at position `index` of the dataset.
def freq_table(dataset, index):
    freq = {}
    total = 0
    for row in dataset:
        total += 1
        cat = row[index]
        if cat in freq:
            freq[cat] += 1
        else:
            freq[cat] = 1
    
    freq_percent = {}        
    for key in freq:
        percentage = (freq[key]/total)*100
        freq_percent[key] = percentage
    
    return freq_percent    

In [20]:
# function that uses freq_table to build (percentage, value) tuples, sorts them in descending order and prints them.
def display_table(dataset, index):
    table = freq_table(dataset, index)
    table_display = []
    for key in table:
        key_val_as_tuple = (table[key], key)
        table_display.append(key_val_as_tuple)

    table_sorted = sorted(table_display, reverse = True)
    for entry in table_sorted:
        print(entry[1], ':', entry[0])
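The two functions above can be condensed with `collections.Counter`, whose `most_common()` method returns (value, count) pairs sorted from most to least frequent. A sketch on hypothetical rows; `category_percentages` is an illustrative name. One difference: ties are listed in first-seen order, whereas `display_table` breaks ties in reverse alphabetical order.

```python
from collections import Counter

def category_percentages(dataset, index):
    """Return (value, percentage) pairs for column `index`, most frequent first."""
    counts = Counter(row[index] for row in dataset)
    total = len(dataset)
    return [(value, count / total * 100) for value, count in counts.most_common()]

# Hypothetical rows: [App, Category]
rows = [
    ['App A', 'FAMILY'],
    ['App B', 'GAME'],
    ['App C', 'FAMILY'],
    ['App D', 'TOOLS'],
]

for category, percentage in category_percentages(rows, 1):
    print(category, ':', percentage)
# FAMILY : 50.0
# GAME : 25.0
# TOOLS : 25.0
```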

In [21]:
# display the categories
display_table(Android_apps_free, 1)

FAMILY : 18.898792733837304
GAME : 9.725826469592688
TOOLS : 8.462146000225657
BUSINESS : 4.592124562789123
LIFESTYLE : 3.9038700214374367
PRODUCTIVITY : 3.8925871601038025
FINANCE : 3.7007785174320205
MEDICAL : 3.5315355974275078
SPORTS : 3.396141261423897
PERSONALIZATION : 3.317161232088458
COMMUNICATION : 3.2381812027530184
HEALTH_AND_FITNESS : 3.0802211440821394
PHOTOGRAPHY : 2.944826808078529
NEWS_AND_MAGAZINES : 2.798149610741284
SOCIAL : 2.6627552747376737
TRAVEL_AND_LOCAL : 2.335552296062281
SHOPPING : 2.245289405393208
BOOKS_AND_REFERENCE : 2.1437436533904997
DATING : 1.8616721200496444
VIDEO_PLAYERS : 1.7939749520478394
MAPS_AND_NAVIGATION : 1.399074805370642
FOOD_AND_DRINK : 1.241114746699763
EDUCATION : 1.1621347173643235
ENTERTAINMENT : 0.9590432133589079
LIBRARIES_AND_DEMO : 0.9364774906916393
AUTO_AND_VEHICLES : 0.9251946293580051
HOUSE_AND_HOME : 0.8236488773552973
WEATHER : 0.8010831546880289
EVENTS : 0.7108202640189552
PARENTING : 0.6544059573507841
ART_AND_DESIGN : 0

The FAMILY category accounts for the largest share of free apps (about 19%), followed by GAME and TOOLS. However, having many apps in a category does not mean that the category is popular with users. The average number of installs per category is a better measure of demand.
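One way to compute the average installs per category is a single pass over the data, keeping a running total and count per category in dictionaries. This is a sketch; `parse_installs` and `average_installs` are hypothetical names, and the column positions (1 for category, 5 for installs) are taken from the notebook. As in the notebook, stripping ',' and '+' turns an open-ended value such as '10,000+' into its lower bound, so the averages are approximate.

```python
def parse_installs(value):
    """Convert an Installs string such as '10,000+' into a float (its lower bound)."""
    return float(value.replace(',', '').replace('+', ''))

def average_installs(dataset, category_index=1, installs_index=5):
    """Return {category: mean installs} computed in a single pass over the rows."""
    totals = {}
    counts = {}
    for row in dataset:
        category = row[category_index]
        totals[category] = totals.get(category, 0) + parse_installs(row[installs_index])
        counts[category] = counts.get(category, 0) + 1
    return {category: totals[category] / counts[category] for category in totals}

# Hypothetical rows: [App, Category, Rating, Reviews, Size, Installs]
rows = [
    ['A', 'GAME', '4.5', '10', '5M', '1,000+'],
    ['B', 'GAME', '4.0', '20', '7M', '3,000+'],
    ['C', 'TOOLS', '4.2', '5', '2M', '500+'],
]

print(average_installs(rows))  # {'GAME': 2000.0, 'TOOLS': 500.0}
```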

In [36]:
# create a table using the category column [1] and the installs column [5] to display the average number of installs for each category.

categories = freq_table(Android_apps_free, 1)

for cat in categories:
    total = 0
    len_cat = 0
    for row in Android_apps_free:
        cat_app = row[1]
        if cat_app == cat:
            # take out the ',' and '+' characters from the installs value and turn it into a float;
            # '10,000+' becomes 10000.0, so each average is based on lower bounds
            n_installs = row[5]
            n_installs = n_installs.replace(',', '')
            n_installs = n_installs.replace('+', '')
            total += float(n_installs)
            len_cat += 1
    avg_installs = total / len_cat
    print(cat, ':', avg_installs)
    

ART_AND_DESIGN : 1986335.0877192982
AUTO_AND_VEHICLES : 647317.8170731707
BEAUTY : 513151.88679245283
BOOKS_AND_REFERENCE : 8767811.894736841
BUSINESS : 1712290.1474201474
COMICS : 817657.2727272727
COMMUNICATION : 38456119.167247385
DATING : 854028.8303030303
EDUCATION : 1833495.145631068
ENTERTAINMENT : 11640705.88235294
EVENTS : 253542.22222222222
FINANCE : 1387692.475609756
FOOD_AND_DRINK : 1924897.7363636363
HEALTH_AND_FITNESS : 4188821.9853479853
HOUSE_AND_HOME : 1331540.5616438356
LIBRARIES_AND_DEMO : 638503.734939759
LIFESTYLE : 1437816.2687861272
GAME : 15588015.603248259
FAMILY : 3697848.1731343283
MEDICAL : 120550.61980830671
SOCIAL : 23253652.127118643
SHOPPING : 7036877.311557789
PHOTOGRAPHY : 17840110.40229885
SPORTS : 3638640.1428571427
TRAVEL_AND_LOCAL : 13984077.710144928
TOOLS : 10801391.298666667
PERSONALIZATION : 5201482.6122448975
PRODUCTIVITY : 16787331.344927534
PARENTING : 542603.6206896552
WEATHER : 5074486.197183099
VIDEO_PLAYERS : 24727872.452830188
NEWS_AND_

Above, the categories with the highest average number of installs are:

- COMMUNICATION
- VIDEO_PLAYERS
- SOCIAL
- PHOTOGRAPHY
- PRODUCTIVITY
- GAME
- TRAVEL_AND_LOCAL
- ENTERTAINMENT
- TOOLS
- NEWS_AND_MAGAZINES
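A few apps with 500,000,000+ or 1,000,000,000+ installs can pull a category's average far above what a typical app in it receives, so the ranking above may be worth cross-checking with medians. A sketch using Python's statistics module on hypothetical install counts:

```python
from statistics import mean, median

# Hypothetical install counts for one category: one giant app and several small ones.
installs = [1_000_000_000, 1_000_000, 500_000, 100_000, 100_000, 10_000]

print(mean(installs))    # pulled up by the single 1,000,000,000 app
print(median(installs))  # 300000.0
```

A category whose median stays high is one where many apps, not just one or two, reach large install counts.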

What sorts of apps dominate in these categories?

In [40]:
# category is in column [1] and installs are in column [5]
# below displays the COMMUNICATION apps with 500,000,000+ or 1,000,000,000+ installs.

for row in Android_apps_free:
    if row[1] == "COMMUNICATION" and (row[5] == '1,000,000,000+'
                                     or row[5] == '500,000,000+'):
        print(row[0], ':', row[5])


WhatsApp Messenger : 1,000,000,000+
Google Duo - High Quality Video Calls : 500,000,000+
Messenger – Text and Video Chat for Free : 1,000,000,000+
imo free video calls and chat : 500,000,000+
Skype - free IM & video calls : 1,000,000,000+
LINE: Free Calls & Messages : 500,000,000+
Google Chrome: Fast & Secure : 1,000,000,000+
UC Browser - Fast Download Private & Secure : 500,000,000+
Gmail : 1,000,000,000+
Hangouts : 1,000,000,000+
Viber Messenger : 500,000,000+
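The same filter is repeated for each category in the next cells. A small helper, sketched here with the hypothetical name `top_apps`, takes the category and the install tiers to show; it assumes the category is at index 1 and installs at index 5, as in the notebook.

```python
def top_apps(dataset, category, tiers):
    """Return (name, installs) for apps in `category` whose Installs value is in `tiers`."""
    return [
        (row[0], row[5])
        for row in dataset
        if row[1] == category and row[5] in tiers
    ]

BIG_TIERS = ('1,000,000,000+', '500,000,000+', '100,000,000+')

# Hypothetical rows: [App, Category, Rating, Reviews, Size, Installs]
rows = [
    ['Game A', 'GAME', '4.5', '10', '5M', '100,000,000+'],
    ['Game B', 'GAME', '4.1', '20', '7M', '10,000+'],
    ['Tool A', 'TOOLS', '4.2', '5', '2M', '500,000,000+'],
]

for name, installs in top_apps(rows, 'GAME', BIG_TIERS):
    print(name, ':', installs)  # Game A : 100,000,000+
```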


In [61]:
# looking at games
for row in Android_apps_free:
    if row[1] == 'GAME' and row[5] in ('1,000,000,000+', '500,000,000+', '100,000,000+'):
        print(row[0], ':', row[5])

Sonic Dash : 100,000,000+
PAC-MAN : 100,000,000+
Roll the Ball® - slide puzzle : 100,000,000+
Piano Tiles 2™ : 100,000,000+
Pokémon GO : 100,000,000+
Extreme Car Driving Simulator : 100,000,000+
Trivia Crack : 100,000,000+
Angry Birds 2 : 100,000,000+
Candy Crush Saga : 500,000,000+
8 Ball Pool : 100,000,000+
Subway Surfers : 1,000,000,000+
Candy Crush Soda Saga : 100,000,000+
Clash Royale : 100,000,000+
Clash of Clans : 100,000,000+
Plants vs. Zombies FREE : 100,000,000+
Pou : 500,000,000+
Flow Free : 100,000,000+
My Talking Angela : 100,000,000+
slither.io : 100,000,000+
Cooking Fever : 100,000,000+
Yes day : 100,000,000+
Score! Hero : 100,000,000+
Dream League Soccer 2018 : 100,000,000+
My Talking Tom : 500,000,000+
Sniper 3D Gun Shooter: Free Shooting Games - FPS : 100,000,000+
Zombie Tsunami : 100,000,000+
Helix Jump : 100,000,000+
Crossy Road : 100,000,000+
Temple Run 2 : 500,000,000+
Talking Tom Gold Run : 100,000,000+
Agar.io : 100,000,000+
Bus Rush: Subway Edition : 100,000,00

In [65]:
# looking at tools
for row in Android_apps_free:
    if row[1] == 'TOOLS' and row[5] in ('1,000,000,000+', '500,000,000+', '100,000,000+'):
        print(row[0], ':', row[5])

Google : 1,000,000,000+
Google Translate : 500,000,000+
Calculator : 100,000,000+
Device Help : 100,000,000+
Account Manager : 100,000,000+
SHAREit - Transfer & Share : 500,000,000+
Samsung Calculator : 100,000,000+
Gboard - the Google Keyboard : 500,000,000+
Google Korean Input : 100,000,000+
Share Music & Transfer Files - Xender : 100,000,000+
Tiny Flashlight + LED : 100,000,000+
GO Keyboard - Cute Emojis, Themes and GIFs : 100,000,000+
Speedtest by Ookla : 100,000,000+
CM Locker - Security Lockscreen : 100,000,000+
Applock : 100,000,000+
Clean Master- Space Cleaner & Antivirus : 500,000,000+
Lookout Security & Antivirus : 100,000,000+
Google Now Launcher : 100,000,000+
360 Security - Free Antivirus, Booster, Cleaner : 100,000,000+
Samsung Smart Switch Mobile : 100,000,000+
Avast Mobile Security 2018 - Antivirus & App Lock : 100,000,000+
AppLock : 100,000,000+
AVG AntiVirus 2018 for Android Security : 100,000,000+
Security Master - Antivirus, VPN, AppLock, Booster : 500,000,000+
Batt

## Conclusion
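Among the categories inspected, many of the most-installed apps are made by Google (for example Gmail, Google Chrome and Hangouts in COMMUNICATION, and Google, Google Translate and Gboard in TOOLS) or by other large companies (WhatsApp, Messenger, Skype). Among games, most of the top titles sit in the 100,000,000+ tier and only a few reach 500,000,000+ or 1,000,000,000+, which suggests the category is saturated and a new app is unlikely to get noticed. Within the top 10 categories, the best candidate is TOOLS: it has a wide variety of different apps with 100,000,000+ installs.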