# **App Usage Density** #

Our objective is to analyze consumer density (how users are spread across different kinds of apps) and to use the data to decide which type of app we should develop.

In [5]:
# Apple apps source - https://www.kaggle.com/ramamet4/app-store-apple-data-set-10k-apps/home
# Google apps source - https://www.kaggle.com/lava18/google-play-store-apps/home

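from csv import reader

# Read each CSV into a list of rows; the header row stays at index 0.
# The context managers close the files once they have been read.
with open('AppleStore copy.csv', encoding='utf8') as opened_file:
    apple_data = list(reader(opened_file))

with open('googleplaystore copy.csv', encoding='utf8') as opened_file2:
    google_data = list(reader(opened_file2))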

In [6]:
def explore_data(dataset, start, end, rows_and_columns=False):
    dataset_slice = dataset[start:end]    
    for row in dataset_slice:
        print(row)
        print('\n') # adds a new (empty) line after each row

    if rows_and_columns:
        print('Number of rows:', len(dataset))
        print('Number of columns:', len(dataset[0]))

In [8]:
explore_data(apple_data, 1, 5, rows_and_columns=True)
print()
explore_data(google_data, 1, 5, rows_and_columns=True)

['1', '281656475', 'PAC-MAN Premium', '100788224', 'USD', '3.99', '21292', '26', '4', '4.5', '6.3.5', '4+', 'Games', '38', '5', '10', '1']


['2', '281796108', 'Evernote - stay organized', '158578688', 'USD', '0', '161065', '26', '4', '3.5', '8.2.2', '4+', 'Productivity', '37', '5', '23', '1']


['3', '281940292', 'WeatherBug - Local Weather, Radar, Maps, Alerts', '100524032', 'USD', '0', '188583', '2822', '3.5', '4.5', '5.0.0', '4+', 'Weather', '37', '5', '3', '1']


['4', '282614216', 'eBay: Best App to Buy, Sell, Save! Online Shopping', '128512000', 'USD', '0', '262241', '649', '4', '4.5', '5.10.0', '12+', 'Shopping', '37', '5', '9', '1']


Number of rows: 7198
Number of columns: 17

['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['Coloring book moana', 'ART_AND_DESIGN', '3.9', '967', '14M', '500,000+', 'Free', '0', 'Everyone', 'Art & Design;Pr

Here we determine which columns are relevant to our objective.

We want to analyze only free, English-language apps, since those are the only apps our company intends to make.

In [11]:
print(apple_data[0:1])
# Potentially useful columns (index): size_bytes (3), rating_count_tot (6), user_rating (8),
# user_rating_ver (9), cont_rating (11), prime_genre (12), sup_devices.num (13)

print()

print(google_data[0:1])
# Potentially useful columns (index): Category (1), Rating (2), Reviews (3), Size (4), Installs (5),
# Content Rating (8), Genres (9), Last Updated (10)

[['', 'id', 'track_name', 'size_bytes', 'currency', 'price', 'rating_count_tot', 'rating_count_ver', 'user_rating', 'user_rating_ver', 'ver', 'cont_rating', 'prime_genre', 'sup_devices.num', 'ipadSc_urls.num', 'lang.num', 'vpp_lic']]

[['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']]


The Google Play data set contains duplicate entries for many apps. To keep the comparison between apps as accurate as possible, we remove these duplicates so that each app is counted only once.

In [70]:
for app in google_data:
    name = app[0]
    if name == 'Facebook':
        print(app)
     
print()   
    
for app in google_data:
    name = app[0]
    if name == 'Instagram':
        print(app)

['Facebook', 'SOCIAL', '4.1', '78158306', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'August 3, 2018', 'Varies with device', 'Varies with device']
['Facebook', 'SOCIAL', '4.1', '78128208', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'August 3, 2018', 'Varies with device', 'Varies with device']

['Instagram', 'SOCIAL', '4.5', '66577313', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']
['Instagram', 'SOCIAL', '4.5', '66577446', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']
['Instagram', 'SOCIAL', '4.5', '66577313', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']
['Instagram', 'SOCIAL', '4.5', '66509917', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 

In [68]:
duplicate = []
unique = []

for app in google_data:
    name = app[0]
    if name in unique:
        duplicate.append(name)
    else:
        unique.append(name)

print('Duplicates:', len(duplicate))

Duplicates: 1176


If we look closely at the duplicate rows for 'Facebook' and 'Instagram', we'll notice that although *most* of the data is identical, there is **one** field that varies between rows: the number of reviews. This suggests that the duplicate entries were collected at different times. Although removing duplicates at random is an option, it's better to keep the **most recent** row. Since review counts generally grow over time, we treat the row with the most reviews among its duplicates as the most recent one.

In [87]:
reviews_max = {}

# Record the highest review count seen for each app name (header row skipped).
for app in google_data[1:]:
    name = app[0]
    n_reviews = float(app[3])

    if name in reviews_max and reviews_max[name] < n_reviews:
        reviews_max[name] = n_reviews

    if name not in reviews_max:
        reviews_max[name] = n_reviews

In [98]:
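# Expected size after de-duplication: all data rows minus the duplicates counted earlier.
print('Total number of apps:', len(google_data[1:]) - len(duplicate))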


Total number of apps: 9659


In [106]:
google_clean = []
already_added = []

for app in google_data[1:]:
    name = app[0]
    n_reviews = float(app[3])

    # Keep the row with the highest review count; already_added guards against
    # ties, where several duplicates share the same maximum.
    if reviews_max[name] == n_reviews and name not in already_added:
        google_clean.append(app)
        already_added.append(name)

In [107]:
# Verify that the loop produced the accurate number of apps in the data set.

len(google_clean)

9659
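
The two-pass approach above (build `reviews_max`, then filter) could also be written as a single pass that keeps the best row per name in a dictionary. The following is only a sketch: the helper name `dedupe_by_max_reviews` and its parameters are hypothetical, and the sample rows are shortened versions of the Facebook and Instagram rows shown earlier.

```python
def dedupe_by_max_reviews(rows, name_idx=0, reviews_idx=3):
    """Keep one row per app name: the first row with the highest review count.

    Hypothetical helper, not part of the original notebook.
    """
    best = {}  # app name -> row with the most reviews seen so far
    for row in rows:
        name = row[name_idx]
        n_reviews = float(row[reviews_idx])
        # Strict '>' means that on a tie the earliest row is kept.
        if name not in best or n_reviews > float(best[name][reviews_idx]):
            best[name] = row
    return list(best.values())


# Shortened sample rows: [App, Category, Rating, Reviews]
sample = [
    ['Instagram', 'SOCIAL', '4.5', '66577313'],
    ['Instagram', 'SOCIAL', '4.5', '66577446'],
    ['Instagram', 'SOCIAL', '4.5', '66509917'],
    ['Facebook', 'SOCIAL', '4.1', '78158306'],
    ['Facebook', 'SOCIAL', '4.1', '78128208'],
]

deduped = dedupe_by_max_reviews(sample)
print(len(deduped))
for row in deduped:
    print(row[0], row[3])
```

One difference from the loop above: the result is ordered by the first appearance of each app name rather than by the position of the retained row. The set of retained rows should be the same.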

In [134]:
def english(string):
    # Treat a name as non-English if any character falls outside the ASCII range (0-127).
    for value in string:
        if ord(value) > 127:
            return False

    return True

In [140]:
english('Instagram')

True

In [136]:
english('爱奇艺PPS -《欢乐颂2》电视剧热播')

False

In [137]:
english('Docs To Go™ Free Office Suite')

False

In [138]:
english('Instachat 😜')

False

In [189]:
def english(string):
    # Allow up to three non-ASCII characters so that names containing emoji or
    # symbols such as '™' are still treated as English.
    count = 0

    for value in string:
        if ord(value) > 127:
            count += 1

    return count <= 3

print(english('Docs To Go™ Free Office Suite'))
print(english('Instachat 😜'))
print(english('爱奇艺PPS -《欢乐颂2》电视剧热播'))

True
True
False
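
The tolerance of three non-ASCII characters is a heuristic. As a sketch, the same check can take the tolerance as a parameter, which makes it easy to compare the strict rule with the tolerant one on the names tested above. The name `is_probably_english` and the `max_non_ascii` parameter are assumptions introduced for illustration.

```python
def is_probably_english(name, max_non_ascii=3):
    """Return True if `name` has at most `max_non_ascii` non-ASCII characters.

    Hypothetical variant of english(); the default of 3 mirrors the cell above.
    """
    non_ascii = sum(1 for ch in name if ord(ch) > 127)
    return non_ascii <= max_non_ascii


names = [
    'Instagram',
    'Docs To Go™ Free Office Suite',
    'Instachat 😜',
    '爱奇艺PPS -《欢乐颂2》电视剧热播',
]

for name in names:
    # First flag: strict rule (no non-ASCII allowed). Second flag: tolerant rule.
    print(name, is_probably_english(name, 0), is_probably_english(name))
```

The check is still approximate: an English name with more than three emoji would be rejected, and a non-English name with only a few non-ASCII characters would pass.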


In [216]:
google_english = []
google_not_english = []
apple_english = []
apple_not_english = []

for apps in google_clean:
    name = apps[0]
    if english(name):
        google_english.append(apps)
    else:
        google_not_english.append(apps)

for apps in apple_data[1:]:
    name = apps[2]
    if english(name):
        apple_english.append(apps)
    else:
        apple_not_english.append(apps)

explore_data(google_english, 0, 3, True)
print('\n')
explore_data(apple_english, 0, 3, True)

['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 and up']


Number of rows: 9614
Number of columns: 13


['1', '281656475', 'PAC-MAN Premium', '100788224', 'USD', '3.99', '21292', '26', '4', '4.5', '6.3.5', '4+', 'Games', '38', '5', '10', '1']


['2', '281796108', 'Evernote - stay organized', '158578688', 'USD', '0', '161065', '26', '4', '3.5', '8.2.2', '4+', 'Productivity', '37', '5', '23', '1']


['3', '281940292', 'WeatherBug - Local Weather, Radar, Maps, Alerts', '100524032', 'USD', '0', '188

In [229]:
google_free = []
apple_free = []

for apps in google_english:
    cost = apps[6]
    if cost == 'Free':
        google_free.append(apps)

for apps in apple_english:
    cost = float(apps[5])
    if cost == 0.0: 
        apple_free.append(apps)

print(len(google_free))
print(len(apple_free))

8863
3222


To maximize profitability, we want to build an app that works on both Google Play and the App Store. If an app is profitable within its first 6 months on Google Play, we can be optimistic about making it available on the App Store as well.

In [251]:
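# First attempt at a frequency table. Note: `number` (and `number2` below) is a
# single counter shared by every genre, so the stored values are running totals,
# not per-genre counts. The freq_table function defined further down counts
# each genre separately.
google_genres = {}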
number = 0

for row in google_free:
    genres = row[9]
    if genres in google_genres:
        number += 1
        google_genres[genres] = number
        
    else:    
        google_genres[genres] = number

apple_genres = {}
number2 = 0

for row2 in apple_free:
    genres = row2[12]
    if genres in apple_genres:
        number2 += 1
        apple_genres[genres] = number2
        
    else:    
        apple_genres[genres] = number2

print(google_genres)
print()
print(apple_genres)

{'Art & Design': 6959, 'Art & Design;Creativity': 8387, 'Auto & Vehicles': 8720, 'Beauty': 8238, 'Books & Reference': 8748, 'Business': 8744, 'Comics': 8737, 'Comics;Creativity': 239, 'Communication': 8700, 'Dating': 8119, 'Education': 8746, 'Education;Creativity': 1385, 'Education;Education': 7319, 'Education;Pretend Play': 6853, 'Education;Brain Games': 1489, 'Entertainment': 8730, 'Entertainment;Brain Games': 5804, 'Entertainment;Creativity': 5313, 'Entertainment;Music & Video': 7422, 'Events': 8680, 'Finance': 8727, 'Food & Drink': 8296, 'Health & Fitness': 8561, 'House & Home': 8297, 'Libraries & Demo': 8645, 'Lifestyle': 8749, 'Lifestyle;Pretend Play': 1134, 'Arcade': 8689, 'Card': 8713, 'Puzzle': 8581, 'Racing': 8688, 'Sports': 8687, 'Casual': 8704, 'Simulation': 8458, 'Adventure': 8454, 'Trivia': 8702, 'Action': 8712, 'Word': 8701, 'Role Playing': 8691, 'Strategy': 8718, 'Board': 8316, 'Music': 6871, 'Action;Action & Adventure': 6405, 'Casual;Brain Games': 7714, 'Educational;Cr
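
The values printed above are running totals rather than per-genre counts, because one counter is shared by all genres. A per-genre tally can be sketched with `collections.Counter` from the standard library. The function name `count_by_column` and the sample rows are illustrative assumptions, not data from the real sets.

```python
from collections import Counter


def count_by_column(rows, index):
    """Count how many rows carry each distinct value in column `index`.

    Hypothetical helper for illustration.
    """
    return Counter(row[index] for row in rows)


# Made-up rows with the genre in column 1 (not real data).
sample_rows = [
    ['App A', 'Tools'],
    ['App B', 'Education'],
    ['App C', 'Tools'],
    ['App D', 'Tools'],
]

genre_counts = count_by_column(sample_rows, 1)
# most_common() yields (value, count) pairs from most to least frequent.
for genre, count in genre_counts.most_common():
    print(genre, ':', count)
```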

In [308]:
def freq_table(dataset, index):
    # Return a dictionary mapping each value found in column `index`
    # to the percentage of rows in which it appears.
    freq_count = 0
    app_count = {}

    for row in dataset:
        freq_count += 1
        observation = row[index]
        if observation in app_count:
            app_count[observation] += 1
        else:
            app_count[observation] = 1

    table_percentages = {}
    for key in app_count:
        percentage = (app_count[key] / freq_count) * 100
        table_percentages[key] = percentage

    return table_percentages


def display_table(dataset, index):
    # Print the percentages from freq_table in descending order.
    app_count = freq_table(dataset, index)
    table_display = []
    for key in app_count:
        key_val_as_tuple = (app_count[key], key)
        table_display.append(key_val_as_tuple)

    table_sorted = sorted(table_display, reverse = True)
    for entry in table_sorted:
        print(entry[1], ':', entry[0])

In [298]:
# prime-genre, Genres and Category 

display_table(apple_free, -5)
print()
display_table(google_free, -4)
print()
display_table(google_free, 1)

Games : 58.16263190564867
Entertainment : 7.883302296710118
Photo & Video : 4.9658597144630665
Education : 3.662321539416512
Social Networking : 3.2898820608317814
Shopping : 2.60707635009311
Utilities : 2.5139664804469275
Sports : 2.1415270018621975
Music : 2.0484171322160147
Health & Fitness : 2.0173805090006205
Productivity : 1.7380509000620732
Lifestyle : 1.5828677839851024
News : 1.3345747982619491
Travel : 1.2414649286157666
Finance : 1.1173184357541899
Weather : 0.8690254500310366
Food & Drink : 0.8069522036002483
Reference : 0.5586592178770949
Business : 0.5276225946617008
Book : 0.4345127250155183
Navigation : 0.186219739292365
Medical : 0.186219739292365
Catalogs : 0.12414649286157665

Tools : 8.450863138892023
Entertainment : 6.070179397495204
Education : 5.348076272142616
Business : 4.592124562789123
Productivity : 3.8925871601038025
Lifestyle : 3.8925871601038025
Finance : 3.7007785174320205
Medical : 3.5315355974275078
Sports : 3.463838429425702
Personalization : 3.317161

In [270]:
google_data[0:1]

[['App',
  'Category',
  'Rating',
  'Reviews',
  'Size',
  'Installs',
  'Type',
  'Price',
  'Content Rating',
  'Genres',
  'Last Updated',
  'Current Ver',
  'Android Ver']]

In [271]:
apple_data[0:1]

[['',
  'id',
  'track_name',
  'size_bytes',
  'currency',
  'price',
  'rating_count_tot',
  'rating_count_ver',
  'user_rating',
  'user_rating_ver',
  'ver',
  'cont_rating',
  'prime_genre',
  'sup_devices.num',
  'ipadSc_urls.num',
  'lang.num',
  'vpp_lic']]

In [272]:
google_free[0:1]

[['Photo Editor & Candy Camera & Grid & ScrapBook',
  'ART_AND_DESIGN',
  '4.1',
  '159',
  '19M',
  '10,000+',
  'Free',
  '0',
  'Everyone',
  'Art & Design',
  'January 7, 2018',
  '1.0.0',
  '4.0.3 and up']]

In [316]:
freq_table(apple_free, -5)

{'Productivity': 1.7380509000620732,
 'Weather': 0.8690254500310366,
 'Shopping': 2.60707635009311,
 'Reference': 0.5586592178770949,
 'Finance': 1.1173184357541899,
 'Music': 2.0484171322160147,
 'Utilities': 2.5139664804469275,
 'Travel': 1.2414649286157666,
 'Social Networking': 3.2898820608317814,
 'Sports': 2.1415270018621975,
 'Health & Fitness': 2.0173805090006205,
 'Games': 58.16263190564867,
 'Food & Drink': 0.8069522036002483,
 'News': 1.3345747982619491,
 'Book': 0.4345127250155183,
 'Photo & Video': 4.9658597144630665,
 'Entertainment': 7.883302296710118,
 'Business': 0.5276225946617008,
 'Lifestyle': 1.5828677839851024,
 'Education': 3.662321539416512,
 'Navigation': 0.186219739292365,
 'Medical': 0.186219739292365,
 'Catalogs': 0.12414649286157665}

In [338]:
apple_popular = freq_table(apple_free, -5)

for genre in apple_popular:
    total = 0
    len_genre = 0
    for app in apple_free:
        genre_app = app[-5]
        if genre_app == genre:
            num = float(app[6])  # rating_count_tot
            total += num
            len_genre += 1

    # Average number of user ratings per app in this genre.
    avg = total / len_genre
    print(genre, ':', avg)
            

Productivity : 21028.410714285714
Weather : 52279.892857142855
Shopping : 26919.690476190477
Reference : 74942.11111111111
Finance : 31467.944444444445
Music : 57326.530303030304
Utilities : 18684.456790123455
Travel : 28243.8
Social Networking : 71548.34905660378
Sports : 23008.898550724636
Health & Fitness : 23298.015384615384
Games : 22788.6696905016
Food & Drink : 33333.92307692308
News : 21248.023255813954
Book : 39758.5
Photo & Video : 28441.54375
Entertainment : 14029.830708661417
Business : 7491.117647058823
Lifestyle : 16485.764705882353
Education : 7003.983050847458
Navigation : 86090.33333333333
Medical : 612.0
Catalogs : 4004.0
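
The nested loop above rescans `apple_free` once per genre. As a sketch, the same per-genre average can be accumulated in one pass with two dictionaries. `average_by_group` is a hypothetical helper, and the sample rows below are made up to keep the example self-contained.

```python
from collections import defaultdict


def average_by_group(rows, group_idx, value_idx):
    """Return {group: mean of the numeric column `value_idx`} in a single pass.

    Hypothetical helper for illustration.
    """
    totals = defaultdict(float)  # group -> sum of values
    counts = defaultdict(int)    # group -> number of rows
    for row in rows:
        group = row[group_idx]
        totals[group] += float(row[value_idx])
        counts[group] += 1
    return {group: totals[group] / counts[group] for group in totals}


# Made-up rows: [name, genre, number of ratings]
sample_rows = [
    ['App A', 'Weather', '100'],
    ['App B', 'Weather', '300'],
    ['App C', 'Medical', '50'],
]

averages = average_by_group(sample_rows, 1, 2)
for genre, avg in averages.items():
    print(genre, ':', avg)
```

With the notebook's data, a call such as `average_by_group(apple_free, -5, 6)` should correspond to the loop above, assuming the column layout shown in the header row.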






**Decide which type of App Store app is best here**





In [352]:
google_popular = freq_table(google_free, 1)

for category in google_popular:
    total = 0
    len_category = 0
    for app in google_free:
        category_app = app[1]
        if category_app == category:
            # Installs are strings such as '1,000,000+'; strip '+' and ','
            # before converting to a number.
            num = app[5]
            num = num.replace('+', '')
            num = num.replace(',', '')
            total += float(num)
            len_category += 1

    avg = total / len_category
    print(category, ":", avg)
        

ART_AND_DESIGN : 1986335.0877192982
AUTO_AND_VEHICLES : 647317.8170731707
BEAUTY : 513151.88679245283
BOOKS_AND_REFERENCE : 8767811.894736841
BUSINESS : 1712290.1474201474
COMICS : 817657.2727272727
COMMUNICATION : 38456119.167247385
DATING : 854028.8303030303
EDUCATION : 1833495.145631068
ENTERTAINMENT : 11640705.88235294
EVENTS : 253542.22222222222
FINANCE : 1387692.475609756
FOOD_AND_DRINK : 1924897.7363636363
HEALTH_AND_FITNESS : 4188821.9853479853
HOUSE_AND_HOME : 1331540.5616438356
LIBRARIES_AND_DEMO : 638503.734939759
LIFESTYLE : 1437816.2687861272
GAME : 15588015.603248259
FAMILY : 3697848.1731343283
MEDICAL : 120550.61980830671
SOCIAL : 23253652.127118643
SHOPPING : 7036877.311557789
PHOTOGRAPHY : 17840110.40229885
SPORTS : 3638640.1428571427
TRAVEL_AND_LOCAL : 13984077.710144928
TOOLS : 10801391.298666667
PERSONALIZATION : 5201482.6122448975
PRODUCTIVITY : 16787331.344927534
PARENTING : 542603.6206896552
WEATHER : 5074486.197183099
VIDEO_PLAYERS : 24727872.452830188
NEWS_AND_
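
The two `replace` calls in the loop above can be pulled into a small helper, which also makes the conversion easy to check on its own. This is a sketch: `parse_installs` is a hypothetical name, and it assumes every value in the Installs column consists of digits, commas and an optional trailing '+'.

```python
def parse_installs(text):
    """Convert an Installs string such as '1,000,000+' to a float.

    Hypothetical helper; assumes the value contains only digits, commas
    and an optional trailing '+'.
    """
    return float(text.replace('+', '').replace(',', ''))


for raw in ['10,000+', '500,000+', '1,000,000,000+', '0']:
    print(raw, '->', parse_installs(raw))
```

Because values such as '10,000+' are open-ended buckets rather than exact counts, averages computed from them are best read as rough lower bounds.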





**Decide which type of Google Play app is best here**



