# Profitable App Profiles for iOS and Android

Our aim in this project is to find mobile app profiles that are profitable for the App Store and Google Play markets. We're working as data analysts for a company that builds Android and iOS mobile apps, and our job is to enable our team of developers to make data-driven decisions with respect to the kind of apps they build.

At our company, we only build apps that are free to download and install, and our main source of revenue consists of in-app ads. This means that our revenue for any given app is mostly influenced by the number of users that use our app. Our goal for this project is to analyze data to help our developers understand what kinds of apps are likely to attract more users.

## Opening and Exploring the Data

In [1]:
from csv import reader

# Google Play Store data set
with open('googleplaystore.csv', encoding='utf8') as opened_file:
    android = list(reader(opened_file))
android_header = android[0]
android = android[1:]

# App Store data set
with open('AppleStore.csv', encoding='utf8') as opened_file:
    ios = list(reader(opened_file))
ios_header = ios[0]
ios = ios[1:]

In [2]:
ios_header

['id',
 'track_name',
 'size_bytes',
 'currency',
 'price',
 'rating_count_tot',
 'rating_count_ver',
 'user_rating',
 'user_rating_ver',
 'ver',
 'cont_rating',
 'prime_genre',
 'sup_devices.num',
 'ipadSc_urls.num',
 'lang.num',
 'vpp_lic']

In [3]:
android_header

['App',
 'Category',
 'Rating',
 'Reviews',
 'Size',
 'Installs',
 'Type',
 'Price',
 'Content Rating',
 'Genres',
 'Last Updated',
 'Current Ver',
 'Android Ver']

In [4]:
def explore_data(dataset, start, end, rows_and_columns=False):
    dataset_slice = dataset[start:end]    
    for row in dataset_slice:
        print(row)
        print('\n') # adds a new (empty) line after each row

    if rows_and_columns:
        print('Number of rows:', len(dataset))
        print('Number of columns:', len(dataset[0]))

In [5]:
explore_data(ios,1,5)

['389801252', 'Instagram', '113954816', 'USD', '0.0', '2161558', '1289', '4.5', '4.0', '10.23', '12+', 'Photo & Video', '37', '0', '29', '1']


['529479190', 'Clash of Clans', '116476928', 'USD', '0.0', '2130805', '579', '4.5', '4.5', '9.24.12', '9+', 'Games', '38', '5', '18', '1']


['420009108', 'Temple Run', '65921024', 'USD', '0.0', '1724546', '3842', '4.5', '4.0', '1.6.2', '9+', 'Games', '40', '5', '1', '1']


['284035177', 'Pandora - Music & Radio', '130242560', 'USD', '0.0', '1126879', '3594', '4.0', '4.5', '8.4.1', '12+', 'Music', '37', '4', '1', '1']




In [6]:
explore_data(ios,6,8,True)

['282935706', 'Bible', '92774400', 'USD', '0.0', '985920', '5320', '4.5', '5.0', '7.5.1', '4+', 'Reference', '37', '5', '45', '1']


['553834731', 'Candy Crush Saga', '222846976', 'USD', '0.0', '961794', '2453', '4.5', '4.5', '1.101.0', '4+', 'Games', '43', '5', '24', '1']


Number of rows: 7197
Number of columns: 16


In [7]:
print(ios_header)

['id', 'track_name', 'size_bytes', 'currency', 'price', 'rating_count_tot', 'rating_count_ver', 'user_rating', 'user_rating_ver', 'ver', 'cont_rating', 'prime_genre', 'sup_devices.num', 'ipadSc_urls.num', 'lang.num', 'vpp_lic']


In [8]:
print(android_header)

['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']


In [9]:
explore_data(android,1,5,True)

['Coloring book moana', 'ART_AND_DESIGN', '3.9', '967', '14M', '500,000+', 'Free', '0', 'Everyone', 'Art & Design;Pretend Play', 'January 15, 2018', '2.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 and up']


['Pixel Draw - Number Art Coloring Book', 'ART_AND_DESIGN', '4.3', '967', '2.8M', '100,000+', 'Free', '0', 'Everyone', 'Art & Design;Creativity', 'June 20, 2018', '1.1', '4.4 and up']


Number of rows: 10841
Number of columns: 13


In [10]:
explore_data(android,10471,10473,True)

['Xposed Wi-Fi-Pwd', 'PERSONALIZATION', '3.5', '1042', '404k', '100,000+', 'Free', '0', 'Everyone', 'Personalization', 'August 5, 2014', '3.0.0', '4.0.3 and up']


['Life Made WI-Fi Touchscreen Photo Frame', '1.9', '19', '3.0M', '1,000+', 'Free', '0', 'Everyone', '', 'February 11, 2018', '1.0.19', '4.0 and up']


Number of rows: 10841
Number of columns: 13


In [11]:
android[10472]

['Life Made WI-Fi Touchscreen Photo Frame',
 '1.9',
 '19',
 '3.0M',
 '1,000+',
 'Free',
 '0',
 'Everyone',
 '',
 'February 11, 2018',
 '1.0.19',
 '4.0 and up']

In [12]:
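# Row 10472 has only 12 values (the 'Category' value is missing), so we drop it.
# Run this cell only once; running it again would delete a valid row.
del android[10472]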

The following code finds duplicate app entries in the Google Play data. We keep two lists, unique_apps and duplicate_apps. For each row, if the app name is already in unique_apps, we append it to duplicate_apps; otherwise we append it to unique_apps. The length of duplicate_apps then gives the number of duplicate entries.

In [13]:
duplicate_apps=[]
unique_apps=[]

for app in android:
    name=app[0]
    if name in unique_apps:
        duplicate_apps.append(name)
    else:
        unique_apps.append(name)
print('Number of duplicate apps:', len(duplicate_apps))
print('\n')
print('Examples of duplicate apps:', duplicate_apps[:15])

Number of duplicate apps: 1181


Examples of duplicate apps: ['Quick PDF Scanner + OCR FREE', 'Box', 'Google My Business', 'ZOOM Cloud Meetings', 'join.me - Simple Meetings', 'Box', 'Zenefits', 'Google Ads', 'Google My Business', 'Slack', 'FreshBooks Classic', 'Insightly CRM', 'QuickBooks Accounting: Invoicing & Expenses', 'HipChat - Chat Built for Teams', 'Xero Accounting Software']
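
The loop above checks membership in a list for every row, which gets slow as the data grows. As a minimal sketch of an alternative, `collections.Counter` can count the names in one pass. The `rows` sample is made-up toy data, not rows from googleplaystore.csv, and `count_duplicates` is a hypothetical helper name.

```python
from collections import Counter

def count_duplicates(rows, name_index=0):
    """Return (number of duplicate entries, names that occur more than once)."""
    counts = Counter(row[name_index] for row in rows)
    # Every occurrence beyond the first counts as one duplicate entry,
    # which is the same way the duplicate_apps list is filled.
    n_duplicates = sum(c - 1 for c in counts.values())
    repeated_names = sorted(name for name, c in counts.items() if c > 1)
    return n_duplicates, repeated_names

# Toy rows (hypothetical): app name first, review count second.
rows = [['Box', '10'], ['Slack', '5'], ['Box', '12'], ['Box', '12'], ['Slack', '7']]
n_duplicates, repeated_names = count_duplicates(rows)
print(n_duplicates, repeated_names)
```

On the real data this would be called as `count_duplicates(android)`, assuming the app name sits at index 0 as in `android_header`.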


To remove the duplicates, I will use the same criterion of keeping, for each app, the entry with the maximum number of reviews.

In [15]:
reviews_max={}
for app in android:
    name=app[0]
    n_reviews=float(app[3])
    if name in reviews_max and reviews_max[name]<n_reviews:
        reviews_max[name]=n_reviews
    elif name not in reviews_max:
        reviews_max[name]=n_reviews

In [18]:
reviews_max

{'Code': 239242.0,
 'The PCH App': 234971.0,
 'BC MVA Fines': 5.0,
 'Bullshite!': 48.0,
 'Sona - Nær við allastaðni': 31.0,
 'EY Catalyst Reader': 1.0,
 'My Cancer Coach': 44.0,
 'Free Radar Detector': 15680.0,
 'Titanic App - Feminist Dating Application': 120.0,
 'CF Church': 1.0,
 'ChatVideo Meet new people': 63986.0,
 '100+ C Programs': 20.0,
 'DZ PROMOS - Promotions & Sale Alerts in Algeria': 331.0,
 'CV maker for Job Applications and Resume Maker': 75.0,
 'Commonwealth CU Go Mobile': 231.0,
 'My Movies Pro - Movie & TV Collection Library': 6477.0,
 "DH Texas Poker - Texas Hold'em": 562345.0,
 'RadPad: Apartment Finder App': 6896.0,
 'BL PowerPoint Remote': 33.0,
 'Fishdom': 2158580.0,
 'Brainf*ck Interpreter': 24.0,
 'Morse Player Free': 268.0,
 'IP address BW': 7.0,
 'DW Espanol': 0.0,
 'Viva Decora - Decoration, Photos, Architecture, House': 3167.0,
 'E.G. Chess Free': 505.0,
 'Retro Camera': 125616.0,
 'MAIN': 7081.0,
 'bz Basel E-Paper': 72.0,
 'BW Taxi': 67.0,
 'Where Am I?':

In [19]:
print('Actual Length', len(android))
print('Expected Length', len(reviews_max))

Actual Length 10840
Expected Length 9659


In [20]:
android_clean=[]
already_added=[]
for app in android:
    name=app[0]
    n_reviews=float(app[3])
    if (n_reviews==reviews_max[name]) and (name not in already_added):
        android_clean.append(app)
        already_added.append(name)
    

In [21]:
explore_data(android_clean, 0, 3, True)

['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 and up']


Number of rows: 9659
Number of columns: 13
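
The two-step removal (first find the maximum review count per name, then keep one matching row) can also be sketched as a single pass that stores the best row per name. This is only an illustration on made-up rows; `dedupe_by_max_reviews` is a hypothetical name. One difference to be aware of: the rows come out in order of each name's first appearance, which may not match the order of `android_clean`.

```python
def dedupe_by_max_reviews(rows, name_index=0, reviews_index=3):
    """Keep one row per app name: the first row that has the highest review count."""
    best = {}  # name -> row with the most reviews seen so far
    for row in rows:
        name = row[name_index]
        n_reviews = float(row[reviews_index])
        # Strict '>' keeps the earliest row when review counts tie.
        if name not in best or n_reviews > float(best[name][reviews_index]):
            best[name] = row
    # Dicts preserve insertion order (Python 3.7+).
    return list(best.values())

# Toy rows (hypothetical): name, category, rating, reviews.
rows = [
    ['Box', 'BUSINESS', '4.2', '100'],
    ['Slack', 'BUSINESS', '4.4', '50'],
    ['Box', 'BUSINESS', '4.2', '120'],
    ['Box', 'BUSINESS', '4.2', '120'],
]
clean = dedupe_by_max_reviews(rows)
print(clean)
```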


In [22]:
def isitenglish (strng):
    for cr in strng:
        if ord(cr)>127:
            return False
    return True

In [27]:
isitenglish ('爱奇艺PPS -《欢乐颂2》电视剧热播')

False

In [28]:
isitenglish('Instagram')

True

In [29]:
isitenglish('Docs To Go™ Free Office Suite')

False

In [30]:
isitenglish('Instachat 😜')

False

In [32]:
def isitenglish_smil (strng):
    n=0
    for cr in strng:
        if ord(cr)>127:
            n=n+1
            if n>3:
                return False
    return True

In [33]:
isitenglish_smil('Instachat 😜')

True

In [34]:
isitenglish_smil('Docs To Go™ Free Office Suite')

True

In [35]:
isitenglish_smil('爱奇艺PPS -《欢乐颂2》电视剧热播')

False

In [36]:
reviews_max_ios={}
for app in ios:
    name=app[1]
    n_reviews=float(app[5])
    if name in reviews_max_ios and reviews_max_ios[name]<n_reviews:
        reviews_max_ios[name]=n_reviews
    elif name not in reviews_max_ios:
        reviews_max_ios[name]=n_reviews


In [37]:
ios_clean=[]
ios_already_added=[]
for app in ios:
    name=app[1]
    n_reviews=float(app[5])
    if (n_reviews==reviews_max_ios[name]) and (name not in ios_already_added):
        ios_clean.append(app)
        ios_already_added.append(name)


In [38]:
android_final=[]
ios_final=[]
for app in android_clean:
    if isitenglish_smil(app[0]) and app[7]=='0':
        android_final.append(app)
for app in ios_clean:
    if isitenglish_smil(app[1]) and app[4]=='0.0':
        ios_final.append(app)

In [39]:
print(len(android_final))

8864


In [40]:
print(len(ios_final))

3220


In [42]:
ios_english = []
for app in ios:
    name = app[1]
    if isitenglish_smil(name):
        ios_english.append(app)

In [43]:
print(len(ios_english))

6183


I built the ios_english list to check whether my logic is correct. The android_final count of 8864 tallies with the solution, but ios_final has 3220 apps compared to the solution's 3222. It seems like we are missing 2 apps, but the gap is explained by duplicate names.
In the App Store data, the names 'Mannequin Challenge' and 'VR Roller Coaster' each appear twice. They look like repeat entries, but they are from different developers.

My duplicate removal kept only one entry per name, so I ended up with two fewer apps. You can see the discussion here: https://www.kaggle.com/ramamet4/app-store-apple-data-set-10k-apps/discussion/90409
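
One way to tell same-name apps apart from true duplicates is to look at the `id` column. The sketch below lists names that appear under more than one id. It assumes that `id` (index 0) identifies an app uniquely, which has not been verified here; the ids in the toy rows are made up, and `names_with_multiple_ids` is a hypothetical helper.

```python
from collections import defaultdict

def names_with_multiple_ids(rows, id_index=0, name_index=1):
    """Map each app name that appears under more than one id to its sorted ids."""
    ids_by_name = defaultdict(set)
    for row in rows:
        ids_by_name[row[name_index]].add(row[id_index])
    return {name: sorted(ids) for name, ids in ids_by_name.items() if len(ids) > 1}

# Toy rows (hypothetical ids): id first, track_name second.
rows = [
    ['111', 'Mannequin Challenge'],
    ['222', 'Mannequin Challenge'],
    ['333', 'Instagram'],
]
shared_names = names_with_multiple_ids(rows)
print(shared_names)
```

If `names_with_multiple_ids(ios)` returned only the two names mentioned above, removing duplicates by `id` instead of by name would keep both entries of each.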

In [44]:
ios_header[-5]

'prime_genre'

In [45]:
android_header

['App',
 'Category',
 'Rating',
 'Reviews',
 'Size',
 'Installs',
 'Type',
 'Price',
 'Content Rating',
 'Genres',
 'Last Updated',
 'Current Ver',
 'Android Ver']

## Frequency Tables

In [46]:
def freq_table(dataset, index):
    # Count how many rows have each value in the given column
    freq_table_dic={}
    for row in dataset:
        name=row[index]
        if name in freq_table_dic:
            freq_table_dic[name]+=1
        else:
            freq_table_dic[name]=1
    # Convert the counts to percentages of the total number of rows
    freq_table_per={}
    for key in freq_table_dic:
        freq_table_per[key]=freq_table_dic[key]*100/len(dataset)

    return freq_table_per

In [47]:
freq_table(android,1)

{'ART_AND_DESIGN': 0.5996309963099631,
 'AUTO_AND_VEHICLES': 0.7841328413284133,
 'BEAUTY': 0.488929889298893,
 'BOOKS_AND_REFERENCE': 2.1309963099630997,
 'BUSINESS': 4.243542435424354,
 'COMICS': 0.5535055350553506,
 'COMMUNICATION': 3.5701107011070112,
 'DATING': 2.158671586715867,
 'EDUCATION': 1.4391143911439115,
 'ENTERTAINMENT': 1.3745387453874538,
 'EVENTS': 0.5904059040590406,
 'FAMILY': 18.19188191881919,
 'FINANCE': 3.376383763837638,
 'FOOD_AND_DRINK': 1.1715867158671587,
 'GAME': 10.55350553505535,
 'HEALTH_AND_FITNESS': 3.1457564575645756,
 'HOUSE_AND_HOME': 0.8118081180811808,
 'LIBRARIES_AND_DEMO': 0.7841328413284133,
 'LIFESTYLE': 3.5239852398523985,
 'MAPS_AND_NAVIGATION': 1.2638376383763839,
 'MEDICAL': 4.271217712177122,
 'NEWS_AND_MAGAZINES': 2.61070110701107,
 'PARENTING': 0.5535055350553506,
 'PERSONALIZATION': 3.6162361623616235,
 'PHOTOGRAPHY': 3.0904059040590406,
 'PRODUCTIVITY': 3.911439114391144,
 'SHOPPING': 2.3985239852398523,
 'SOCIAL': 2.7214022140221403

In [48]:
freq_table(android_final,1)

{'ART_AND_DESIGN': 0.6430505415162455,
 'AUTO_AND_VEHICLES': 0.9250902527075813,
 'BEAUTY': 0.5979241877256317,
 'BOOKS_AND_REFERENCE': 2.1435018050541514,
 'BUSINESS': 4.591606498194946,
 'COMICS': 0.6204873646209387,
 'COMMUNICATION': 3.237815884476534,
 'DATING': 1.861462093862816,
 'EDUCATION': 1.1620036101083033,
 'ENTERTAINMENT': 0.9589350180505415,
 'EVENTS': 0.7107400722021661,
 'FAMILY': 18.907942238267147,
 'FINANCE': 3.700361010830325,
 'FOOD_AND_DRINK': 1.2409747292418774,
 'GAME': 9.724729241877256,
 'HEALTH_AND_FITNESS': 3.079873646209386,
 'HOUSE_AND_HOME': 0.8235559566787004,
 'LIBRARIES_AND_DEMO': 0.9363718411552346,
 'LIFESTYLE': 3.9034296028880866,
 'MAPS_AND_NAVIGATION': 1.3989169675090252,
 'MEDICAL': 3.5311371841155235,
 'NEWS_AND_MAGAZINES': 2.7978339350180503,
 'PARENTING': 0.6543321299638989,
 'PERSONALIZATION': 3.3167870036101084,
 'PHOTOGRAPHY': 2.9444945848375452,
 'PRODUCTIVITY': 3.892148014440433,
 'SHOPPING': 2.2450361010830324,
 'SOCIAL': 2.6624548736462

In [49]:
freq_table(ios_final,-5)

{'Book': 0.43478260869565216,
 'Business': 0.5279503105590062,
 'Catalogs': 0.12422360248447205,
 'Education': 3.6645962732919255,
 'Entertainment': 7.888198757763975,
 'Finance': 1.1180124223602483,
 'Food & Drink': 0.8074534161490683,
 'Games': 58.13664596273292,
 'Health & Fitness': 2.018633540372671,
 'Lifestyle': 1.5838509316770186,
 'Medical': 0.18633540372670807,
 'Music': 2.049689440993789,
 'Navigation': 0.18633540372670807,
 'News': 1.3354037267080745,
 'Photo & Video': 4.968944099378882,
 'Productivity': 1.7391304347826086,
 'Reference': 0.5590062111801242,
 'Shopping': 2.608695652173913,
 'Social Networking': 3.2919254658385095,
 'Sports': 2.142857142857143,
 'Travel': 1.2422360248447204,
 'Utilities': 2.515527950310559,
 'Weather': 0.8695652173913043}

In [50]:
d=freq_table(ios_final,5)
sorted_d = sorted((value, key) for (key,value) in d.items())

In [51]:
sorted_d

[(0.031055900621118012, '100035'),
 (0.031055900621118012, '1002'),
 (0.031055900621118012, '100477'),
 (0.031055900621118012, '1005'),
 (0.031055900621118012, '1009'),
 (0.031055900621118012, '1010'),
 (0.031055900621118012, '10107'),
 (0.031055900621118012, '10109'),
 (0.031055900621118012, '10118'),
 (0.031055900621118012, '10122'),
 (0.031055900621118012, '101222'),
 (0.031055900621118012, '1015'),
 (0.031055900621118012, '10159'),
 (0.031055900621118012, '101599'),
 (0.031055900621118012, '101679'),
 (0.031055900621118012, '1017'),
 (0.031055900621118012, '1018'),
 (0.031055900621118012, '10263'),
 (0.031055900621118012, '10275'),
 (0.031055900621118012, '10298'),
 (0.031055900621118012, '1033'),
 (0.031055900621118012, '1034'),
 (0.031055900621118012, '1035'),
 (0.031055900621118012, '10358'),
 (0.031055900621118012, '103655'),
 (0.031055900621118012, '10368'),
 (0.031055900621118012, '1037'),
 (0.031055900621118012, '104060'),
 (0.031055900621118012, '104502'),
 (0.0310559006211

In [52]:
d=freq_table(ios_final,-5)
sorted_d = sorted((value, key) for (key,value) in d.items())

In [53]:
sorted_d

[(0.12422360248447205, 'Catalogs'),
 (0.18633540372670807, 'Medical'),
 (0.18633540372670807, 'Navigation'),
 (0.43478260869565216, 'Book'),
 (0.5279503105590062, 'Business'),
 (0.5590062111801242, 'Reference'),
 (0.8074534161490683, 'Food & Drink'),
 (0.8695652173913043, 'Weather'),
 (1.1180124223602483, 'Finance'),
 (1.2422360248447204, 'Travel'),
 (1.3354037267080745, 'News'),
 (1.5838509316770186, 'Lifestyle'),
 (1.7391304347826086, 'Productivity'),
 (2.018633540372671, 'Health & Fitness'),
 (2.049689440993789, 'Music'),
 (2.142857142857143, 'Sports'),
 (2.515527950310559, 'Utilities'),
 (2.608695652173913, 'Shopping'),
 (3.2919254658385095, 'Social Networking'),
 (3.6645962732919255, 'Education'),
 (4.968944099378882, 'Photo & Video'),
 (7.888198757763975, 'Entertainment'),
 (58.13664596273292, 'Games')]

In [55]:
d=freq_table(ios_final,-5)
sorted_d = sorted(((value, key) for (key,value) in d.items()), reverse=True)

In [56]:
sorted_d

[(58.13664596273292, 'Games'),
 (7.888198757763975, 'Entertainment'),
 (4.968944099378882, 'Photo & Video'),
 (3.6645962732919255, 'Education'),
 (3.2919254658385095, 'Social Networking'),
 (2.608695652173913, 'Shopping'),
 (2.515527950310559, 'Utilities'),
 (2.142857142857143, 'Sports'),
 (2.049689440993789, 'Music'),
 (2.018633540372671, 'Health & Fitness'),
 (1.7391304347826086, 'Productivity'),
 (1.5838509316770186, 'Lifestyle'),
 (1.3354037267080745, 'News'),
 (1.2422360248447204, 'Travel'),
 (1.1180124223602483, 'Finance'),
 (0.8695652173913043, 'Weather'),
 (0.8074534161490683, 'Food & Drink'),
 (0.5590062111801242, 'Reference'),
 (0.5279503105590062, 'Business'),
 (0.43478260869565216, 'Book'),
 (0.18633540372670807, 'Navigation'),
 (0.18633540372670807, 'Medical'),
 (0.12422360248447205, 'Catalogs')]

In [57]:
d=freq_table(ios_final,-5)
sorted_d = sorted(((key, value) for (key,value) in d.items()), reverse=True)

In [58]:
sorted_d

[('Weather', 0.8695652173913043),
 ('Utilities', 2.515527950310559),
 ('Travel', 1.2422360248447204),
 ('Sports', 2.142857142857143),
 ('Social Networking', 3.2919254658385095),
 ('Shopping', 2.608695652173913),
 ('Reference', 0.5590062111801242),
 ('Productivity', 1.7391304347826086),
 ('Photo & Video', 4.968944099378882),
 ('News', 1.3354037267080745),
 ('Navigation', 0.18633540372670807),
 ('Music', 2.049689440993789),
 ('Medical', 0.18633540372670807),
 ('Lifestyle', 1.5838509316770186),
 ('Health & Fitness', 2.018633540372671),
 ('Games', 58.13664596273292),
 ('Food & Drink', 0.8074534161490683),
 ('Finance', 1.1180124223602483),
 ('Entertainment', 7.888198757763975),
 ('Education', 3.6645962732919255),
 ('Catalogs', 0.12422360248447205),
 ('Business', 0.5279503105590062),
 ('Book', 0.43478260869565216)]

In [61]:
type(sorted_d)

list

In [62]:
type(sorted_d[0])

tuple

In [63]:
d=freq_table(android_final,1)
sorted_d = sorted(((key, value) for (key,value) in d.items()), reverse=True)

In [64]:
sorted_d

[('WEATHER', 0.8009927797833934),
 ('VIDEO_PLAYERS', 1.7937725631768953),
 ('TRAVEL_AND_LOCAL', 2.33528880866426),
 ('TOOLS', 8.461191335740072),
 ('SPORTS', 3.395758122743682),
 ('SOCIAL', 2.6624548736462095),
 ('SHOPPING', 2.2450361010830324),
 ('PRODUCTIVITY', 3.892148014440433),
 ('PHOTOGRAPHY', 2.9444945848375452),
 ('PERSONALIZATION', 3.3167870036101084),
 ('PARENTING', 0.6543321299638989),
 ('NEWS_AND_MAGAZINES', 2.7978339350180503),
 ('MEDICAL', 3.5311371841155235),
 ('MAPS_AND_NAVIGATION', 1.3989169675090252),
 ('LIFESTYLE', 3.9034296028880866),
 ('LIBRARIES_AND_DEMO', 0.9363718411552346),
 ('HOUSE_AND_HOME', 0.8235559566787004),
 ('HEALTH_AND_FITNESS', 3.079873646209386),
 ('GAME', 9.724729241877256),
 ('FOOD_AND_DRINK', 1.2409747292418774),
 ('FINANCE', 3.700361010830325),
 ('FAMILY', 18.907942238267147),
 ('EVENTS', 0.7107400722021661),
 ('ENTERTAINMENT', 0.9589350180505415),
 ('EDUCATION', 1.1620036101083033),
 ('DATING', 1.861462093862816),
 ('COMMUNICATION', 3.2378158844

In [65]:
d=freq_table(android_final,1)
sorted_d = sorted(((value, key) for (key,value) in d.items()), reverse=True)

In [66]:
sorted_d

[(18.907942238267147, 'FAMILY'),
 (9.724729241877256, 'GAME'),
 (8.461191335740072, 'TOOLS'),
 (4.591606498194946, 'BUSINESS'),
 (3.9034296028880866, 'LIFESTYLE'),
 (3.892148014440433, 'PRODUCTIVITY'),
 (3.700361010830325, 'FINANCE'),
 (3.5311371841155235, 'MEDICAL'),
 (3.395758122743682, 'SPORTS'),
 (3.3167870036101084, 'PERSONALIZATION'),
 (3.237815884476534, 'COMMUNICATION'),
 (3.079873646209386, 'HEALTH_AND_FITNESS'),
 (2.9444945848375452, 'PHOTOGRAPHY'),
 (2.7978339350180503, 'NEWS_AND_MAGAZINES'),
 (2.6624548736462095, 'SOCIAL'),
 (2.33528880866426, 'TRAVEL_AND_LOCAL'),
 (2.2450361010830324, 'SHOPPING'),
 (2.1435018050541514, 'BOOKS_AND_REFERENCE'),
 (1.861462093862816, 'DATING'),
 (1.7937725631768953, 'VIDEO_PLAYERS'),
 (1.3989169675090252, 'MAPS_AND_NAVIGATION'),
 (1.2409747292418774, 'FOOD_AND_DRINK'),
 (1.1620036101083033, 'EDUCATION'),
 (0.9589350180505415, 'ENTERTAINMENT'),
 (0.9363718411552346, 'LIBRARIES_AND_DEMO'),
 (0.9250902527075813, 'AUTO_AND_VEHICLES'),
 (0.82355595

In [67]:
def display_table(dataset,index):
    d=freq_table(dataset,index)
    sorted_d = sorted(((value, key) for (key,value) in d.items()), reverse=True)
    return sorted_d

In [68]:
display_table(android_final,1)

[(18.907942238267147, 'FAMILY'),
 (9.724729241877256, 'GAME'),
 (8.461191335740072, 'TOOLS'),
 (4.591606498194946, 'BUSINESS'),
 (3.9034296028880866, 'LIFESTYLE'),
 (3.892148014440433, 'PRODUCTIVITY'),
 (3.700361010830325, 'FINANCE'),
 (3.5311371841155235, 'MEDICAL'),
 (3.395758122743682, 'SPORTS'),
 (3.3167870036101084, 'PERSONALIZATION'),
 (3.237815884476534, 'COMMUNICATION'),
 (3.079873646209386, 'HEALTH_AND_FITNESS'),
 (2.9444945848375452, 'PHOTOGRAPHY'),
 (2.7978339350180503, 'NEWS_AND_MAGAZINES'),
 (2.6624548736462095, 'SOCIAL'),
 (2.33528880866426, 'TRAVEL_AND_LOCAL'),
 (2.2450361010830324, 'SHOPPING'),
 (2.1435018050541514, 'BOOKS_AND_REFERENCE'),
 (1.861462093862816, 'DATING'),
 (1.7937725631768953, 'VIDEO_PLAYERS'),
 (1.3989169675090252, 'MAPS_AND_NAVIGATION'),
 (1.2409747292418774, 'FOOD_AND_DRINK'),
 (1.1620036101083033, 'EDUCATION'),
 (0.9589350180505415, 'ENTERTAINMENT'),
 (0.9363718411552346, 'LIBRARIES_AND_DEMO'),
 (0.9250902527075813, 'AUTO_AND_VEHICLES'),
 (0.82355595

In [69]:
android_header

['App',
 'Category',
 'Rating',
 'Reviews',
 'Size',
 'Installs',
 'Type',
 'Price',
 'Content Rating',
 'Genres',
 'Last Updated',
 'Current Ver',
 'Android Ver']

In [70]:
ios_header

['id',
 'track_name',
 'size_bytes',
 'currency',
 'price',
 'rating_count_tot',
 'rating_count_ver',
 'user_rating',
 'user_rating_ver',
 'ver',
 'cont_rating',
 'prime_genre',
 'sup_devices.num',
 'ipadSc_urls.num',
 'lang.num',
 'vpp_lic']

In [71]:
display_table(android_final,-4)

[(8.449909747292418, 'Tools'),
 (6.069494584837545, 'Entertainment'),
 (5.347472924187725, 'Education'),
 (4.591606498194946, 'Business'),
 (3.892148014440433, 'Productivity'),
 (3.892148014440433, 'Lifestyle'),
 (3.700361010830325, 'Finance'),
 (3.5311371841155235, 'Medical'),
 (3.463447653429603, 'Sports'),
 (3.3167870036101084, 'Personalization'),
 (3.237815884476534, 'Communication'),
 (3.1024368231046933, 'Action'),
 (3.079873646209386, 'Health & Fitness'),
 (2.9444945848375452, 'Photography'),
 (2.7978339350180503, 'News & Magazines'),
 (2.6624548736462095, 'Social'),
 (2.3240072202166067, 'Travel & Local'),
 (2.2450361010830324, 'Shopping'),
 (2.1435018050541514, 'Books & Reference'),
 (2.041967509025271, 'Simulation'),
 (1.861462093862816, 'Dating'),
 (1.8501805054151625, 'Arcade'),
 (1.7712093862815885, 'Video Players & Editors'),
 (1.759927797833935, 'Casual'),
 (1.3989169675090252, 'Maps & Navigation'),
 (1.2409747292418774, 'Food & Drink'),
 (1.128158844765343, 'Puzzle'),
 

In [76]:
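prime_genre=freq_table(ios_final,-5)
for genre in prime_genre:
    total=0        # sum of rating counts for this genre
    len_genre=0    # number of apps in this genre
    for app in ios_final:
        genre_app=app[-5]
        if genre_app==genre:
            n_rat=float(app[5])   # rating_count_tot
            total+=n_rat
            len_genre+=1
    # Average number of user ratings per app (not the average rating score)
    avg_n_ratings=total/len_genre
    print(genre, ':', avg_n_ratings)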

Lifestyle : 16485.764705882353
Social Networking : 71548.34905660378
Games : 22812.92467948718
Finance : 31467.944444444445
Weather : 52279.892857142855
Music : 57326.530303030304
Food & Drink : 33333.92307692308
Catalogs : 4004.0
Photo & Video : 28441.54375
Business : 7491.117647058823
Entertainment : 14029.830708661417
Productivity : 21028.410714285714
Navigation : 86090.33333333333
Health & Fitness : 23298.015384615384
News : 21248.023255813954
Shopping : 26919.690476190477
Book : 39758.5
Travel : 28243.8
Utilities : 18684.456790123455
Sports : 23008.898550724636
Education : 7003.983050847458
Reference : 74942.11111111111
Medical : 612.0


In [78]:
cat=freq_table(android_final,1)
for category in cat:
    total=0          # sum of installs for this category
    len_category=0   # number of apps in this category
    for app in android_final:
        category_app=app[1]
        if category_app==category:
            # 'Installs' looks like '1,000,000+': strip '+' and ',' before converting
            n_installs=app[5]
            n_installs=n_installs.replace('+','')
            n_installs=n_installs.replace(',','')
            total+=float(n_installs)
            len_category+=1
    avg_installs=total/len_category
    print(category,':', avg_installs)

EVENTS : 253542.22222222222
BEAUTY : 513151.88679245283
BOOKS_AND_REFERENCE : 8767811.894736841
WEATHER : 5074486.197183099
FAMILY : 3695641.8198090694
GAME : 15588015.603248259
HEALTH_AND_FITNESS : 4188821.9853479853
PHOTOGRAPHY : 17840110.40229885
SHOPPING : 7036877.311557789
ART_AND_DESIGN : 1986335.0877192982
PRODUCTIVITY : 16787331.344927534
EDUCATION : 1833495.145631068
LIFESTYLE : 1437816.2687861272
COMICS : 817657.2727272727
DATING : 854028.8303030303
MAPS_AND_NAVIGATION : 4056941.7741935486
HOUSE_AND_HOME : 1331540.5616438356
MEDICAL : 120550.61980830671
AUTO_AND_VEHICLES : 647317.8170731707
SOCIAL : 23253652.127118643
PERSONALIZATION : 5201482.6122448975
FINANCE : 1387692.475609756
PARENTING : 542603.6206896552
FOOD_AND_DRINK : 1924897.7363636363
BUSINESS : 1712290.1474201474
SPORTS : 3638640.1428571427
TOOLS : 10801391.298666667
NEWS_AND_MAGAZINES : 9549178.467741935
TRAVEL_AND_LOCAL : 13984077.710144928
VIDEO_PLAYERS : 24727872.452830188
LIBRARIES_AND_DEMO : 638503.73493975