# Customer Analysis for App Store and Google Play Markets

The goal of this project is to analyze Google Play and App Store data to help developers understand which types of apps are likely to attract more users and could therefore be profitable. We only analyze apps that are free to download and install in both markets. The analysis uses two data sets:
- [A data set](https://bit.ly/2rQv6lb) containing data about approximately ten thousand Android apps from Google Play
- [A data set](https://bit.ly/32T3Mjj) containing data about approximately seven thousand iOS apps from the App Store


### Import libraries

In [1]:
import numpy as np
from csv import reader

### Get datasets


In [2]:
# read Apple App Store data (the with-block closes the file once it has been read)
with open("AppleStore.csv", encoding="utf8") as file:
    ios_data = list(reader(file))
ios_data_header = ios_data[0]
del ios_data[0]

In [3]:
# read Google Play Store data (the with-block closes the file once it has been read)
with open("googleplaystore.csv", encoding="utf8") as file:
    android_data = list(reader(file))
android_data_header = android_data[0]
del android_data[0]
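
Both loading cells follow the same pattern, so they could be folded into one helper. The sketch below is one possible version, not part of the original notebook: `load_csv` is a hypothetical name, and it assumes the files are UTF-8 encoded with the header in the first row.

```python
from csv import reader

def load_csv(path, encoding="utf8"):
    """Return (header, rows) for a CSV file whose first row is the header."""
    # newline="" is the setting the csv module documentation recommends for reader()
    with open(path, encoding=encoding, newline="") as f:
        rows = list(reader(f))
    return rows[0], rows[1:]
```

With this helper, loading would look like `ios_data_header, ios_data = load_csv("AppleStore.csv")`.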

### Explore data

In [4]:
def explore_data(dataset, start=0, end=0, rows_and_columns=False):
    dataset_slice = dataset[start:end]    
    for row in dataset_slice:
        print(row)
        print('\n') # adds a new (empty) line after each row

    if rows_and_columns:
        print('Number of rows:', len(dataset))
        print('Number of columns:', len(dataset[0]))

In [5]:
explore_data(ios_data, 0,3)

['284882215', 'Facebook', '389879808', 'USD', '0.0', '2974676', '212', '3.5', '3.5', '95.0', '4+', 'Social Networking', '37', '1', '29', '1']


['389801252', 'Instagram', '113954816', 'USD', '0.0', '2161558', '1289', '4.5', '4.0', '10.23', '12+', 'Photo & Video', '37', '0', '29', '1']


['529479190', 'Clash of Clans', '116476928', 'USD', '0.0', '2130805', '579', '4.5', '4.5', '9.24.12', '9+', 'Games', '38', '5', '18', '1']




In [6]:
explore_data(android_data, 0,3)

['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['Coloring book moana', 'ART_AND_DESIGN', '3.9', '967', '14M', '500,000+', 'Free', '0', 'Everyone', 'Art & Design;Pretend Play', 'January 15, 2018', '2.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']




In [7]:
# number of rows and columns in apple store data
explore_data(ios_data, rows_and_columns=True)

Number of rows: 7197
Number of columns: 16


In [8]:
# number of rows and columns in google play store data
explore_data(android_data, rows_and_columns=True)

Number of rows: 10841
Number of columns: 13


In [9]:
# remove the malformed observation at index 10472
# (run this cell only once: a second run would delete a valid row)
del android_data[10472]
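
The cell above deletes the row by a hard-coded index. A check like the following could be used to locate such rows first. It is only a sketch, and it assumes the bad row shows up as a column count that differs from the header's. `find_malformed_rows` and the sample data are hypothetical.

```python
def find_malformed_rows(dataset, header):
    """Return (index, row) pairs whose column count differs from the header's."""
    expected = len(header)
    return [(i, row) for i, row in enumerate(dataset) if len(row) != expected]

# Tiny illustrative data (not taken from the real file)
sample_header = ["App", "Category", "Rating"]
sample_rows = [
    ["A", "TOOLS", "4.1"],
    ["B", "4.5"],            # category is missing, so the row is one column short
    ["C", "GAME", "3.9"],
]
print(find_malformed_rows(sample_rows, sample_header))  # [(1, ['B', '4.5'])]
```

Calling `find_malformed_rows(android_data, android_data_header)` before deleting anything would show whether index 10472 is the only affected row.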

In [10]:
# exploring duplicate values
print(android_data_header)
for item in android_data:
    if item[0] == "Instagram":
        print(item)

['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']
['Instagram', 'SOCIAL', '4.5', '66577313', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']
['Instagram', 'SOCIAL', '4.5', '66577446', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']
['Instagram', 'SOCIAL', '4.5', '66577313', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']
['Instagram', 'SOCIAL', '4.5', '66509917', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']


In [11]:
# exploring duplicate values
print(android_data_header)
for item in android_data:
    if item[0] == "Box":
        print(item)

['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']
['Box', 'BUSINESS', '4.2', '159872', 'Varies with device', '10,000,000+', 'Free', '0', 'Everyone', 'Business', 'July 31, 2018', 'Varies with device', 'Varies with device']
['Box', 'BUSINESS', '4.2', '159872', 'Varies with device', '10,000,000+', 'Free', '0', 'Everyone', 'Business', 'July 31, 2018', 'Varies with device', 'Varies with device']
['Box', 'BUSINESS', '4.2', '159872', 'Varies with device', '10,000,000+', 'Free', '0', 'Everyone', 'Business', 'July 31, 2018', 'Varies with device', 'Varies with device']


In [12]:
# find total duplicate values
duplicate_app, unique_app = [], []
for item in android_data:
    app_name = item[0]
    if app_name in unique_app:
        duplicate_app.append(app_name)
    else:
        unique_app.append(app_name)
print("duplicate apps:",len(duplicate_app))
print("unique apps:",len(unique_app))

duplicate apps: 1181
unique apps: 9659


In [13]:
duplicate_app[:5]

['Quick PDF Scanner + OCR FREE',
 'Box',
 'Google My Business',
 'ZOOM Cloud Meetings',
 'join.me - Simple Meetings']
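
The counting loop above tests membership in a list, which scans the list on every lookup. As a sketch of an alternative, `collections.Counter` gives the same two totals in one pass. `count_duplicates` and the sample rows are hypothetical.

```python
from collections import Counter

def count_duplicates(dataset, name_idx=0):
    """Return (n_duplicate_rows, n_unique_names) for the name column."""
    counts = Counter(row[name_idx] for row in dataset)
    n_unique = len(counts)
    # every occurrence beyond the first one counts as a duplicate row
    n_duplicates = sum(counts.values()) - n_unique
    return n_duplicates, n_unique

sample = [["Box"], ["Box"], ["Box"], ["Instagram"], ["Slack"]]
print(count_duplicates(sample))  # (2, 3)
```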

## Cleaning data
### Remove duplicate observations

In [14]:
# Remove duplicate observations, keeping the entry with the highest number of reviews.
# First, find the maximum review count for each app name.
max_review_app = {}  # app name -> highest review count seen so far
for item in android_data:
    app_name = item[0]
    n_reviews = float(item[3])
    # update the stored maximum, or add the app if it has not been seen yet
    if app_name in max_review_app:
        max_review_app[app_name] = max(max_review_app[app_name], n_reviews)
    else:
        max_review_app[app_name] = n_reviews

In [15]:
max_review_app["Instagram"]

66577446.0

In [16]:
# build the clean data set: one row per app, the row with the most reviews
android_data_clean = []
already_added = []

for item in android_data:
    app_name = item[0]
    max_reviews = max_review_app[app_name]
    # already_added guards against duplicates that share the same maximum review count
    if float(item[3]) == max_reviews and app_name not in already_added:
        android_data_clean.append(item)
        already_added.append(app_name)

In [17]:
explore_data(android_data_clean, rows_and_columns=True)

Number of rows: 9659
Number of columns: 13
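
The two-step approach above (find the maximum, then filter) can also be written as a single pass that remembers the best row per app name. The following is a sketch under the same assumptions as the notebook (name in column 0, review count in column 3). `dedupe_keep_max_reviews` and the sample rows are hypothetical, and the row order of the result may differ from `android_data_clean`.

```python
def dedupe_keep_max_reviews(dataset, name_idx=0, reviews_idx=3):
    """Keep one row per app name: the first row with the highest review count."""
    best = {}
    for row in dataset:
        name = row[name_idx]
        # strict '>' keeps the earlier row when two rows tie on review count
        if name not in best or float(row[reviews_idx]) > float(best[name][reviews_idx]):
            best[name] = row
    return list(best.values())

sample = [
    ["Instagram", "SOCIAL", "4.5", "66577313"],
    ["Instagram", "SOCIAL", "4.5", "66577446"],
    ["Box", "BUSINESS", "4.2", "159872"],
    ["Box", "BUSINESS", "4.2", "159872"],
]
for row in dedupe_keep_max_reviews(sample):
    print(row[0], row[3])
```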


### Remove non-English apps

In [18]:
def is_english_chars(chars):
    """
        chars: input string
        return: boolean; False if the string contains more than three non-ASCII
                characters (code point > 127), else True. Allowing up to three
                keeps English names that include symbols such as ™ or an emoji.
    """
    non_english_cnt = 0
    for char in chars:
        if ord(char) > 127:
            if non_english_cnt >= 3:
                return False
            non_english_cnt += 1
    return True

In [19]:
#test function
test_input = ['Instagram', '爱奇艺PPS -《欢乐颂2》电视剧热播', 'Docs To Go™ Free Office Suite', 'Instachat 😜']

print([i + ': ' + str(is_english_chars(i)) for i in test_input])

['Instagram: True', '爱奇艺PPS -《欢乐颂2》电视剧热播: False', 'Docs To Go™ Free Office Suite: True', 'Instachat 😜: True']
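
The function tolerates up to three non-ASCII characters, which is why the names with ™ and an emoji pass while the Chinese title does not. As a sketch, the same rule can be written with the threshold exposed as a parameter. `is_mostly_ascii` and `max_non_ascii` are hypothetical names, not part of the original notebook.

```python
def is_mostly_ascii(text, max_non_ascii=3):
    """Return True if text has at most max_non_ascii characters outside ASCII."""
    non_ascii = sum(1 for ch in text if ord(ch) > 127)
    return non_ascii <= max_non_ascii

names = ['Instagram', '爱奇艺PPS -《欢乐颂2》电视剧热播',
         'Docs To Go™ Free Office Suite', 'Instachat 😜']
print([is_mostly_ascii(name) for name in names])  # [True, False, True, True]
```

Setting `max_non_ascii=0` would reproduce a strict ASCII-only filter, at the cost of dropping names such as 'Docs To Go™ Free Office Suite'.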


In [20]:
# keep only the Android apps whose names pass the English check
android_english = []
for item in android_data_clean:
    app_name = item[0]
    if is_english_chars(app_name):
        android_english.append(item)

In [21]:
len(android_english)

9614

In [22]:
ios_english = []
for item in ios_data:
    app_name = item[1]
    if is_english_chars(app_name):
        ios_english.append(item)

In [23]:
len(ios_english)

6183

### Filter free apps

In [24]:
# isolating free apps from ios and android datasets
android_free = []
for app in android_english:
    if app[6] == 'Free':
        android_free.append(app)

ios_free = []
for app in ios_english:
    if app[4] == '0.0':
        ios_free.append(app)

In [25]:
print("Android Free apps:",len(android_free))
print("IOS Free Apps:", len(ios_free))

Android Free apps: 8863
IOS Free Apps: 3222
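
The filter above relies on two exact string matches: `'Free'` in the Android `Type` column and `'0.0'` in the iOS price column (index 4). A numeric comparison is one way to make the price check less dependent on formatting. The sketch below assumes prices are plain decimals, optionally prefixed with `$`; `is_free` and the sample prices are hypothetical.

```python
def is_free(price_text):
    """Return True if a price string such as '0', '0.0' or '$0.00' is zero."""
    # Assumption: prices are plain decimals, optionally prefixed with '$'.
    return float(price_text.strip().lstrip("$")) == 0.0

sample_prices = ["0", "0.0", "$0.00", "4.99", "$2.99"]
print([p for p in sample_prices if is_free(p)])  # ['0', '0.0', '$0.00']
```

The iOS filter could then be written as `ios_free = [app for app in ios_english if is_free(app[4])]`.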


### Data analysis

In [26]:
def freq_table(dataset, idx):
    """
    Generate a frequency table (in percentages) for the given column.
    param:
          dataset: list of rows, each row a list.
          idx: column index.
    return: dict mapping each value to its share of rows, in percent.
    """
    counts = {}

    for row in dataset:
        value = row[idx]
        if value in counts:
            counts[value] += 1
        else:
            counts[value] = 1

    table_percentages = {}
    for key in counts:
        percentage = (counts[key] / len(dataset)) * 100
        table_percentages[key] = percentage

    return table_percentages

def display_table(dataset, index):
    """
    Display the frequency table sorted by percentage, highest first.
    param:
          dataset: list of rows, each row a list.
          index: column index.
    """
    table = freq_table(dataset, index)
    table_sorted = sorted(table.items(), key=lambda x: x[1], reverse=True)
    for entry in table_sorted:
        print(entry[0], ':', entry[1])
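
`freq_table` counts values with an explicit dictionary. For comparison, here is a sketch of the same percentage table built on `collections.Counter`, whose `most_common()` method already returns values from most to least frequent. `freq_table_counter` and the sample rows are hypothetical.

```python
from collections import Counter

def freq_table_counter(dataset, idx):
    """Return {value: percentage of rows} for column idx, highest share first."""
    counts = Counter(row[idx] for row in dataset)
    n_rows = len(dataset)
    return {value: count / n_rows * 100 for value, count in counts.most_common()}

sample = [["a", "Games"], ["b", "Games"], ["c", "Music"], ["d", "Games"]]
print(freq_table_counter(sample, 1))  # {'Games': 75.0, 'Music': 25.0}
```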

In [27]:
# for prime_genre in apple data
display_table(ios_free, -5)

Games : 58.16263190564867
Entertainment : 7.883302296710118
Photo & Video : 4.9658597144630665
Education : 3.662321539416512
Social Networking : 3.2898820608317814
Shopping : 2.60707635009311
Utilities : 2.5139664804469275
Sports : 2.1415270018621975
Music : 2.0484171322160147
Health & Fitness : 2.0173805090006205
Productivity : 1.7380509000620732
Lifestyle : 1.5828677839851024
News : 1.3345747982619491
Travel : 1.2414649286157666
Finance : 1.1173184357541899
Weather : 0.8690254500310366
Food & Drink : 0.8069522036002483
Reference : 0.5586592178770949
Business : 0.5276225946617008
Book : 0.4345127250155183
Medical : 0.186219739292365
Navigation : 0.186219739292365
Catalogs : 0.12414649286157665


- Games are by far the most common free apps in the App Store: about 58% of them, more than half. Entertainment is the runner-up at about 8%, followed by Photo & Video (about 5%) and Education (about 4%). Each of the remaining genres accounts for less than 3.5%.
- In general, most free apps are built for entertainment (games, photo and video, social networking, sports, music) rather than for practical purposes (education, shopping, utilities, productivity, lifestyle).
- Games have a larger share of apps than any other genre, but we can only recommend a genre as profitable if it also has a large user base.

In [28]:
# for Category in google data
display_table(android_free,1)

FAMILY : 18.898792733837304
GAME : 9.725826469592688
TOOLS : 8.462146000225657
BUSINESS : 4.592124562789123
LIFESTYLE : 3.9038700214374367
PRODUCTIVITY : 3.8925871601038025
FINANCE : 3.7007785174320205
MEDICAL : 3.5315355974275078
SPORTS : 3.396141261423897
PERSONALIZATION : 3.317161232088458
COMMUNICATION : 3.2381812027530184
HEALTH_AND_FITNESS : 3.0802211440821394
PHOTOGRAPHY : 2.944826808078529
NEWS_AND_MAGAZINES : 2.798149610741284
SOCIAL : 2.6627552747376737
TRAVEL_AND_LOCAL : 2.335552296062281
SHOPPING : 2.245289405393208
BOOKS_AND_REFERENCE : 2.1437436533904997
DATING : 1.8616721200496444
VIDEO_PLAYERS : 1.7939749520478394
MAPS_AND_NAVIGATION : 1.399074805370642
FOOD_AND_DRINK : 1.241114746699763
EDUCATION : 1.1621347173643235
ENTERTAINMENT : 0.9590432133589079
LIBRARIES_AND_DEMO : 0.9364774906916393
AUTO_AND_VEHICLES : 0.9251946293580051
HOUSE_AND_HOME : 0.8236488773552973
WEATHER : 0.8010831546880289
EVENTS : 0.7108202640189552
PARENTING : 0.6544059573507841
ART_AND_DESIGN : 0

- The most common categories are Family, Game and Tools. Free apps on Google Play are spread more evenly across categories, whereas free apps in the App Store are dominated by the Games genre alone.

In [29]:
# Average number of user ratings per genre (column 5 holds the rating count, not the star rating)
genres_ios = freq_table(ios_free, -5)
ratings_by_genre = {}

for genre in genres_ios:
    total = 0
    len_genre = 0
    for app in ios_free:
        genre_app = app[-5]
        if genre_app == genre:            
            n_ratings = float(app[5])
            total += n_ratings
            len_genre += 1
    avg = total / len_genre
    ratings_by_genre[genre]=avg

In [30]:
# Sort genres by average number of user ratings, in descending order
sorted(ratings_by_genre.items(), key=lambda x:x[1], reverse=True)

[('Navigation', 86090.33333333333),
 ('Reference', 74942.11111111111),
 ('Social Networking', 71548.34905660378),
 ('Music', 57326.530303030304),
 ('Weather', 52279.892857142855),
 ('Book', 39758.5),
 ('Food & Drink', 33333.92307692308),
 ('Finance', 31467.944444444445),
 ('Photo & Video', 28441.54375),
 ('Travel', 28243.8),
 ('Shopping', 26919.690476190477),
 ('Health & Fitness', 23298.015384615384),
 ('Sports', 23008.898550724636),
 ('Games', 22788.6696905016),
 ('News', 21248.023255813954),
 ('Productivity', 21028.410714285714),
 ('Utilities', 18684.456790123455),
 ('Lifestyle', 16485.764705882353),
 ('Entertainment', 14029.830708661417),
 ('Business', 7491.117647058823),
 ('Education', 7003.983050847458),
 ('Catalogs', 4004.0),
 ('Medical', 612.0)]
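
The averaging cell above loops over all apps once per genre. A single pass that accumulates a total and a count per genre gives the same averages. The following is a sketch only; `average_by_group` and the sample rows are hypothetical.

```python
from collections import defaultdict

def average_by_group(dataset, group_idx, value_idx):
    """Return {group: mean of float(value)} computed in one pass over the rows."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for row in dataset:
        totals[row[group_idx]] += float(row[value_idx])
        counts[row[group_idx]] += 1
    return {group: totals[group] / counts[group] for group in totals}

sample = [["Games", "100"], ["Games", "300"], ["Music", "50"]]
print(average_by_group(sample, 0, 1))  # {'Games': 200.0, 'Music': 50.0}
```

For the iOS data the call would be `average_by_group(ios_free, -5, 5)`.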

In [31]:
def ios_get_genre_apps(genre, num_records=15):
    """Print up to num_records free iOS apps of the given genre with their rating counts."""
    n_record = 0
    for app in ios_free:
        if app[-5] == genre:
            print(app[1], ':', app[5])
            n_record += 1
        if n_record >= num_records:
            break

In [32]:
ios_get_genre_apps("Navigation")

Waze - GPS Navigation, Maps & Real-time Traffic : 345046
Google Maps - Navigation & Transit : 154911
Geocaching® : 12811
CoPilot GPS – Car Navigation & Offline Maps : 3582
ImmobilienScout24: Real Estate Search in Germany : 187
Railway Route Search : 5


In [33]:
ios_get_genre_apps("Reference")

Bible : 985920
Dictionary.com Dictionary & Thesaurus : 200047
Dictionary.com Dictionary & Thesaurus for iPad : 54175
Google Translate : 26786
Muslim Pro: Ramadan 2017 Prayer Times, Azan, Quran : 18418
New Furniture Mods - Pocket Wiki & Game Tools for Minecraft PC Edition : 17588
Merriam-Webster Dictionary : 16849
Night Sky : 12122
City Maps for Minecraft PE - The Best Maps for Minecraft Pocket Edition (MCPE) : 8535
LUCKY BLOCK MOD ™ for Minecraft PC Edition - The Best Pocket Wiki & Mods Installer Tools : 4693
GUNS MODS for Minecraft PC Edition - Mods Tools : 1497
Guides for Pokémon GO - Pokemon GO News and Cheats : 826
WWDC : 762
Horror Maps for Minecraft PE - Download The Scariest Maps for Minecraft Pocket Edition (MCPE) Free : 718
VPN Express : 14


In [34]:
ios_get_genre_apps("Social Networking")

Facebook : 2974676
Pinterest : 1061624
Skype for iPhone : 373519
Messenger : 351466
Tumblr : 334293
WhatsApp Messenger : 287589
Kik : 260965
ooVoo – Free Video Call, Text and Voice : 177501
TextNow - Unlimited Text + Calls : 164963
Viber Messenger – Text & Call : 164249
Followers - Social Analytics For Instagram : 112778
MeetMe - Chat and Meet New People : 97072
We Heart It - Fashion, wallpapers, quotes, tattoos : 90414
InsTrack for Instagram - Analytics Plus More : 85535
Tango - Free Video Call, Voice and Chat : 75412


In [35]:
ios_get_genre_apps("Book")

Kindle – Read eBooks, Magazines & Textbooks : 252076
Audible – audio books, original series & podcasts : 105274
Color Therapy Adult Coloring Book for Adults : 84062
OverDrive – Library eBooks and Audiobooks : 65450
HOOKED - Chat Stories : 47829
BookShout: Read eBooks & Track Your Reading Goals : 879
Dr. Seuss Treasury — 50 best kids books : 451
Green Riding Hood : 392
Weirdwood Manor : 197
MangaZERO - comic reader : 9
ikouhoushi : 0
MangaTiara - love comic reader : 0
謎解き : 0
謎解き2016 : 0


- After exploring a few of the top genres, one app idea we can suggest is a companion app for social networking: status videos, message quotes, and motivational quote images that users can share on social networks. Social networking apps have a large user base, so such an app is more likely to gain installs quickly.

In [36]:
sorted(freq_table(android_free,5).items(),key=lambda x:x[1],reverse=True)

[('1,000,000+', 15.728308699086089),
 ('100,000+', 11.55365000564143),
 ('10,000,000+', 10.549475346947986),
 ('10,000+', 10.199706645605326),
 ('1,000+', 8.394448832223853),
 ('100+', 6.916393997517771),
 ('5,000,000+', 6.826131106848697),
 ('500,000+', 5.562450637481666),
 ('50,000+', 4.772650344127271),
 ('5,000+', 4.513144533453684),
 ('10+', 3.542818458761142),
 ('500+', 3.2494640640866526),
 ('50,000,000+', 2.3017037120613786),
 ('100,000,000+', 2.1324607920568655),
 ('50+', 1.9180864267178157),
 ('5+', 0.7898002933543946),
 ('1+', 0.5077287600135394),
 ('500,000,000+', 0.270788672007221),
 ('1,000,000,000+', 0.2256572266726842),
 ('0+', 0.045131445334536835)]

In [37]:
# Average installations by categories
categories_android = freq_table(android_free, 1)
installations_by_category = {}

for category in categories_android:
    total = 0
    len_category = 0
    for app in android_free:
        category_app = app[1]
        if category_app == category:            
            n_installs = float(app[5].replace(',','').replace('+',''))
            total += n_installs
            len_category += 1
    avg = total / len_category
    installations_by_category[category]=avg

In [38]:
sorted(installations_by_category.items(),key=lambda x:x[1],reverse=True)

[('COMMUNICATION', 38456119.167247385),
 ('VIDEO_PLAYERS', 24727872.452830188),
 ('SOCIAL', 23253652.127118643),
 ('PHOTOGRAPHY', 17840110.40229885),
 ('PRODUCTIVITY', 16787331.344927534),
 ('GAME', 15588015.603248259),
 ('TRAVEL_AND_LOCAL', 13984077.710144928),
 ('ENTERTAINMENT', 11640705.88235294),
 ('TOOLS', 10801391.298666667),
 ('NEWS_AND_MAGAZINES', 9549178.467741935),
 ('BOOKS_AND_REFERENCE', 8767811.894736841),
 ('SHOPPING', 7036877.311557789),
 ('PERSONALIZATION', 5201482.6122448975),
 ('WEATHER', 5074486.197183099),
 ('HEALTH_AND_FITNESS', 4188821.9853479853),
 ('MAPS_AND_NAVIGATION', 4056941.7741935486),
 ('FAMILY', 3697848.1731343283),
 ('SPORTS', 3638640.1428571427),
 ('ART_AND_DESIGN', 1986335.0877192982),
 ('FOOD_AND_DRINK', 1924897.7363636363),
 ('EDUCATION', 1833495.145631068),
 ('BUSINESS', 1712290.1474201474),
 ('LIFESTYLE', 1437816.2687861272),
 ('FINANCE', 1387692.475609756),
 ('HOUSE_AND_HOME', 1331540.5616438356),
 ('DATING', 854028.8303030303),
 ('COMICS', 81765

In [39]:
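def android_get_category_apps(category, installations=None, num_records=15):
    """
    Print up to num_records free Android apps in the given category.
    If installations is a non-empty list of install buckets (e.g. '1,000,000+'),
    only apps in those buckets are printed.
    """
    n_record = 0
    for app in android_free:
        # an empty or missing filter means "any install bucket"
        if app[1] == category and (not installations or app[5] in installations):
            print(app[0], ':', app[5])
            n_record += 1

        if n_record >= num_records:
            break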

In [40]:
android_get_category_apps("COMMUNICATION", ['1,000,000,000+','500,000,000+','100,000,000+'])

WhatsApp Messenger : 1,000,000,000+
imo beta free calls and text : 100,000,000+
Android Messages : 100,000,000+
Google Duo - High Quality Video Calls : 500,000,000+
Messenger – Text and Video Chat for Free : 1,000,000,000+
imo free video calls and chat : 500,000,000+
Skype - free IM & video calls : 1,000,000,000+
Who : 100,000,000+
GO SMS Pro - Messenger, Free Themes, Emoji : 100,000,000+
LINE: Free Calls & Messages : 500,000,000+
Google Chrome: Fast & Secure : 1,000,000,000+
Firefox Browser fast & private : 100,000,000+
UC Browser - Fast Download Private & Secure : 500,000,000+
Gmail : 1,000,000,000+
Hangouts : 1,000,000,000+


In [41]:
# average installs for COMMUNICATION apps with fewer than 100 million installs
under_100_m = []

for app in android_free:
    n_installs = app[5]
    n_installs = n_installs.replace(',', '')
    n_installs = n_installs.replace('+', '')
    if (app[1] == 'COMMUNICATION') and (float(n_installs) < 100000000):
        under_100_m.append(float(n_installs))
        
sum(under_100_m) / len(under_100_m)

3603485.3884615386
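
Dropping the apps with 100 million or more installs brings the COMMUNICATION average down from about 38 million to about 3.6 million, which shows how strongly a few giants pull the mean. A median is one way to summarize a category without picking a cutoff by hand. The sketch below is an addition, not part of the original analysis; `parse_installs`, `median_installs` and the sample rows are hypothetical.

```python
from statistics import median

def parse_installs(text):
    """Convert an install bucket such as '1,000,000+' to a float."""
    return float(text.replace(",", "").replace("+", ""))

def median_installs(dataset, category, category_idx=1, installs_idx=5):
    """Median install bucket of the rows that belong to one category."""
    values = [parse_installs(row[installs_idx])
              for row in dataset if row[category_idx] == category]
    return median(values)

# Three illustrative rows: (name, category, installs)
sample = [
    ["A", "COMMUNICATION", "1,000,000,000+"],
    ["B", "COMMUNICATION", "1,000,000+"],
    ["C", "COMMUNICATION", "100,000+"],
]
print(median_installs(sample, "COMMUNICATION", installs_idx=2))  # 1000000.0
```

On the notebook's data the call would be `median_installs(android_free, "COMMUNICATION")`, using the default column indices.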

In [42]:
android_get_category_apps("BOOKS_AND_REFERENCE",[])

E-Book Read - Read Book for free : 50,000+
Download free book with green book : 100,000+
Wikipedia : 10,000,000+
Cool Reader : 10,000,000+
Free Panda Radio Music : 100,000+
Book store : 1,000,000+
FBReader: Favorite Book Reader : 10,000,000+
English Grammar Complete Handbook : 500,000+
Free Books - Spirit Fanfiction and Stories : 1,000,000+
Google Play Books : 1,000,000,000+
AlReader -any text book reader : 5,000,000+
Offline English Dictionary : 100,000+
Offline: English to Tagalog Dictionary : 500,000+
FamilySearch Tree : 1,000,000+
Cloud of Books : 1,000,000+


In [43]:
android_get_category_apps("BOOKS_AND_REFERENCE",['1,000,000,000+','500,000,000+','100,000,000+'])

Google Play Books : 1,000,000,000+
Bible : 100,000,000+
Amazon Kindle : 100,000,000+
Wattpad 📖 Free Books : 100,000,000+
Audiobooks from Audible : 100,000,000+


In [44]:
android_get_category_apps("BOOKS_AND_REFERENCE",['1,000,000+','5,000,000+','10,000,000+','50,000,000+'])

Wikipedia : 10,000,000+
Cool Reader : 10,000,000+
Book store : 1,000,000+
FBReader: Favorite Book Reader : 10,000,000+
Free Books - Spirit Fanfiction and Stories : 1,000,000+
AlReader -any text book reader : 5,000,000+
FamilySearch Tree : 1,000,000+
Cloud of Books : 1,000,000+
ReadEra – free ebook reader : 1,000,000+
Ebook Reader : 5,000,000+
Read books online : 5,000,000+
eBoox: book reader fb2 epub zip : 1,000,000+
All Maths Formulas : 1,000,000+
Ancestry : 5,000,000+
HTC Help : 10,000,000+


In [45]:
android_get_category_apps("SOCIAL",[])

Facebook : 1,000,000,000+
Facebook Lite : 500,000,000+
Tumblr : 100,000,000+
Social network all in one 2018 : 100,000+
Pinterest : 100,000,000+
TextNow - free text + calls : 10,000,000+
Google+ : 1,000,000,000+
The Messenger App : 1,000,000+
Messenger Pro : 1,000,000+
Free Messages, Video, Chat,Text for Messenger Plus : 1,000,000+
Telegram X : 5,000,000+
The Video Messenger App : 100,000+
Jodel - The Hyperlocal App : 1,000,000+
Hide Something - Photo, Video : 5,000,000+
Love Sticker : 1,000,000+


We reach the same app recommendation for Google Play as for the App Store. Both stores have a large number of users in social and communication apps, categories dominated by social networking and messaging platforms such as Facebook and WhatsApp. These apps are owned by big companies and hold a large fraction of users, so rather than competing with them we can build apps that serve as tools for them or add something to them. Such an app could offer status text, short status videos, motivational quotes, and images, so that people can easily find content they would like to share on their social media feeds.