# Determining App Production by Analyzing Popularity in both iOS and Android Markets

We seek to answer which apps attract the most customers so that we can bolster our revenue by adjusting strategy accordingly.

The current plan is to introduce apps on Google Play first, gauge consumer response for further development, and create a version for iOS if the app is profitable after 6 months.

To this end, we need to understand what types of apps are lucrative in both markets.

In [1]:
from csv import reader

### Apple
# `with` closes the file once the rows are read.
# encoding='utf8' assumes the CSV files are UTF-8 encoded.
with open('AppleStore.csv', encoding='utf8') as open_apple:
    apple = list(reader(open_apple))
apple_header = apple[0]
apple_data = apple[1:]

### Google
with open('googleplaystore.csv', encoding='utf8') as open_google:
    google = list(reader(open_google))
google_header = google[0]
google_data = google[1:]

In [2]:
def explore_data(dataset, start, end, rows_and_columns=False):
    dataset_slice = dataset[start:end]    
    for row in dataset_slice:
        print(row)
        print('\n') # adds a new (empty) line after each row

    if rows_and_columns:
        print('Number of rows:', len(dataset))
        print('Number of columns:', len(dataset[0]))

## Explore Apple iOS data

In [3]:
explore_data(apple_data,1,3, rows_and_columns = True)

['389801252', 'Instagram', '113954816', 'USD', '0.0', '2161558', '1289', '4.5', '4.0', '10.23', '12+', 'Photo & Video', '37', '0', '29', '1']


['529479190', 'Clash of Clans', '116476928', 'USD', '0.0', '2130805', '579', '4.5', '4.5', '9.24.12', '9+', 'Games', '38', '5', '18', '1']


Number of rows: 7197
Number of columns: 16


In [4]:
print(apple_header)

['id', 'track_name', 'size_bytes', 'currency', 'price', 'rating_count_tot', 'rating_count_ver', 'user_rating', 'user_rating_ver', 'ver', 'cont_rating', 'prime_genre', 'sup_devices.num', 'ipadSc_urls.num', 'lang.num', 'vpp_lic']


## Explore Google Play data

In [5]:
explore_data(google_data,1,3, rows_and_columns = True)

['Coloring book moana', 'ART_AND_DESIGN', '3.9', '967', '14M', '500,000+', 'Free', '0', 'Everyone', 'Art & Design;Pretend Play', 'January 15, 2018', '2.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


Number of rows: 10841
Number of columns: 13


In [6]:
print(google_header)

['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']


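In [7]:
# Row 10472 has only 12 values instead of 13: 'Category' is missing,
# so every later value sits one column to the left of its header.
print(google_data[10472])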

['Life Made WI-Fi Touchscreen Photo Frame', '1.9', '19', '3.0M', '1,000+', 'Free', '0', 'Everyone', '', 'February 11, 2018', '1.0.19', '4.0 and up']


In [8]:
print(google_header)

['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']


In [9]:
print(len(google_data))

10841


In [10]:
# Remove the malformed row. Run this cell only once, or a valid row is deleted too.
del google_data[10472]

In [11]:
print(len(google_data))

10840
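
Malformed rows like the one removed above can also be located by comparing each row's length with the header's, rather than by knowing the index in advance. The following is a minimal sketch under that assumption; `find_bad_rows` and the toy rows are hypothetical and not part of the original notebook.

```python
def find_bad_rows(header, rows):
    """Return (index, row) pairs whose length differs from the header's."""
    expected = len(header)
    return [(i, row) for i, row in enumerate(rows) if len(row) != expected]


# Toy data for illustration only (not taken from googleplaystore.csv).
toy_header = ['App', 'Category', 'Rating']
toy_rows = [
    ['App A', 'TOOLS', '4.1'],
    ['App B', '1.9'],            # Category is missing, so the row is short
    ['App C', 'GAME', '4.5'],
]

bad_rows = find_bad_rows(toy_header, toy_rows)
print(bad_rows)
```

A length check only catches rows with missing or extra values; a row with the right number of columns but wrong contents would still need a separate check.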


## Find duplicate apps

Are there duplicate apps in the datasets? We iterate through each dataset to find out.

In [12]:
unique_google = []
duplicate_google = []

for app in google_data:
    name = app[0]
    if name not in unique_google:
        unique_google.append(name)
    elif name in unique_google:
        duplicate_google.append(name)
len(duplicate_google)

1181

In [13]:
unique_apple = []
duplicate_apple = []

for app in apple_data:
    name = app[0]
    if name not in unique_apple:
        unique_apple.append(name)
    elif name in unique_apple:
        duplicate_apple.append(name)
len(duplicate_apple)

0
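
The loops above test membership in a list, which rescans the list for every app. As a hedged alternative, the same duplicate count can be obtained with `collections.Counter`; the helper name `count_duplicates` and the toy rows are assumptions made for illustration.

```python
from collections import Counter


def count_duplicates(rows, key_index):
    """Count rows whose key (e.g. app name) already appeared in an earlier row."""
    counts = Counter(row[key_index] for row in rows)
    # A key seen n times contributes n - 1 duplicate rows.
    return sum(n - 1 for n in counts.values())


# Toy rows for illustration only.
toy_rows = [['App A'], ['App B'], ['App A'], ['App A']]
n_duplicates = count_duplicates(toy_rows, 0)
print(n_duplicates)
```

This counts every repeat after the first occurrence, which matches what the `duplicate_google` and `duplicate_apple` lists collect.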

## Clean Google data

For each duplicated app, we want to keep the record with the highest number of reviews, since it is likely the most current. To this end, we store each app's highest review count in the dictionary ``reviews_max`` and reference this dictionary while copying app data into a clean list.

Next, we determine which app names are written mostly in English characters, as we only want to analyze apps aimed at English-speaking users. A name passes the check if it contains at most three non-ASCII characters. The apps that pass go into one list each for Google and Apple.

In [14]:
reviews_max = {}
for app in google_data:
    name = app[0]
    n_reviews = float(app[3])
    if name in reviews_max and reviews_max[name] < n_reviews:
        reviews_max[name] = n_reviews
    elif name not in reviews_max:
        reviews_max[name] = n_reviews
len(reviews_max)

9659

In [15]:
android_clean = []
already_added = []

for app in google_data:
    name = app[0]
    n_reviews = float(app[3])
    if (reviews_max[name] == n_reviews) and (name not in already_added):
        android_clean.append(app)
        already_added.append(name)
len(android_clean)

9659
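
To make the two-pass cleaning easier to follow, here is a small self-contained sketch of the same idea on toy rows. The function name `keep_max_reviews`, its parameters, and the sample data are hypothetical; unlike the cells above, it tracks added names in a set rather than a list.

```python
def keep_max_reviews(rows, name_index=0, reviews_index=1):
    """Keep one row per app name: the first row with the highest review count."""
    # Pass 1: record the highest review count seen for each name.
    reviews_max = {}
    for row in rows:
        name = row[name_index]
        n_reviews = float(row[reviews_index])
        if name not in reviews_max or reviews_max[name] < n_reviews:
            reviews_max[name] = n_reviews

    # Pass 2: keep the first row that matches the recorded maximum.
    clean, already_added = [], set()
    for row in rows:
        name = row[name_index]
        if float(row[reviews_index]) == reviews_max[name] and name not in already_added:
            clean.append(row)
            already_added.add(name)
    return clean


# Toy rows for illustration only.
toy_rows = [
    ['App A', '100'],
    ['App A', '250'],
    ['App A', '250'],   # ties with the previous row; only the first is kept
    ['App B', '7'],
]
clean_rows = keep_max_reviews(toy_rows)
print(clean_rows)
```

The `already_added` check is what handles ties: without it, two rows sharing the maximum review count would both survive.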

In [16]:
def english_check(string):
    check = 0
    for char in string:
        if ord(char) > 127:
            check+=1
    if check > 3:
        return False
    else:
        return True



In [17]:
print(english_check('Docs To Go™ Free Office Suite'))

True
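
The threshold of three non-ASCII characters is a heuristic, and a few examples show where it draws the line. This sketch restates the check in compact form; `is_english`, its `max_non_ascii` parameter, and the sample strings are illustrative assumptions rather than part of the notebook.

```python
def is_english(name, max_non_ascii=3):
    """Return True when at most `max_non_ascii` characters fall outside ASCII."""
    non_ascii = sum(1 for char in name if ord(char) > 127)
    return non_ascii <= max_non_ascii


print(is_english('Docs To Go™ Free Office Suite'))  # one non-ASCII character
print(is_english('爱奇艺PPS'))                        # three non-ASCII characters
print(is_english('爱奇艺PPS影音'))                     # five non-ASCII characters
```

The second string passes even though it is mostly not English, so the filter should be read as a rough screen rather than a language detector.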


In [18]:
google_english = []

for app in android_clean:
    name = app[0]
    if english_check(name):
        google_english.append(app)

In [19]:
len(google_english)

9614

In [20]:
apple_english = []

for app in apple_data:
    name = app[1]
    if english_check(name):
        apple_english.append(app)

In [21]:
len(apple_english)

6183

In [22]:
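# Keep only free apps: the price is the string '0' in the Google data
# and '0.0' in the Apple data.
free_google = []
free_apple = []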

for app in google_english:
    price = app[7]
    if price == '0':
        free_google.append(app)

for app in apple_english:
    price = app[4]
    if price == '0.0':
        free_apple.append(app)

print(len(free_google))
print(len(free_apple))

8864
3222


## Analysis

Now, we determine the most common genres for Google and Apple. We look for overlap between the two, since we plan to develop apps for both markets: first launching and further developing on Google Play, then launching on iOS if we see profits after 6 months on Google Play.

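We will examine the genre columns: ``Genres`` at index 9 in the Google dataset and ``prime_genre`` at index 11 in the Apple dataset. The cells below address them from the end of each row, as -4 and -5.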

In [23]:
def freq_table(dataset, index):
    dct = {}
    total= 0
    
    for row in dataset:
        total+=1
        column = row[index]
        if column not in dct:
            dct[column] = 1
        elif column in dct:
            dct[column]+=1
            
    table_percentages = {}
    for key in dct:
        percentage = (dct[key]/total)*100
        table_percentages[key] = percentage
    
    return table_percentages

In [24]:
def display_table(dataset, index):
    table = freq_table(dataset, index)
    table_display = []
    for key in table:
        key_val_as_tuple = (table[key], key)
        table_display.append(key_val_as_tuple)

    table_sorted = sorted(table_display, reverse = True)
    for entry in table_sorted:
        print(entry[1], ':', entry[0])
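
As a worked illustration of what the two functions above compute, the following sketch builds a percentage table for a toy dataset and prints it from largest to smallest share. It uses `collections.Counter` instead of a hand-written counting loop; `freq_percentages` and the toy rows are hypothetical names and data.

```python
from collections import Counter


def freq_percentages(rows, index):
    """Map each value in column `index` to its share of rows, in percent."""
    counts = Counter(row[index] for row in rows)
    total = len(rows)
    return {value: count / total * 100 for value, count in counts.items()}


# Toy rows for illustration only: three of the four apps are games.
toy_rows = [['a', 'Games'], ['b', 'Games'], ['c', 'Tools'], ['d', 'Games']]
table = freq_percentages(toy_rows, 1)

# Print the largest share first.
for genre, pct in sorted(table.items(), key=lambda item: item[1], reverse=True):
    print(genre, ':', pct)
```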

### Run a table for Google's Genre column

Tools is the most common free app genre at 8.4%, followed by Entertainment at 6.1%. Practical genres such as Education, Business, and Productivity dominate the top of the table, with Entertainment as the exception.

It seems that games have been split into very specific genres and that there is no overall games genre to observe.

In [25]:
display_table(free_google, -4)


Tools : 8.449909747292418
Entertainment : 6.069494584837545
Education : 5.347472924187725
Business : 4.591606498194946
Productivity : 3.892148014440433
Lifestyle : 3.892148014440433
Finance : 3.7003610108303246
Medical : 3.531137184115524
Sports : 3.463447653429603
Personalization : 3.3167870036101084
Communication : 3.2378158844765346
Action : 3.1024368231046933
Health & Fitness : 3.0798736462093865
Photography : 2.944494584837545
News & Magazines : 2.7978339350180503
Social : 2.6624548736462095
Travel & Local : 2.3240072202166067
Shopping : 2.2450361010830324
Books & Reference : 2.1435018050541514
Simulation : 2.0419675090252705
Dating : 1.861462093862816
Arcade : 1.8501805054151623
Video Players & Editors : 1.7712093862815883
Casual : 1.7599277978339352
Maps & Navigation : 1.3989169675090252
Food & Drink : 1.2409747292418771
Puzzle : 1.128158844765343
Racing : 0.9927797833935018
Role Playing : 0.9363718411552346
Libraries & Demo : 0.9363718411552346
Auto & Vehicles : 0.9250902527075

### Run a table for Apple's Prime Genre column

Games dominate Apple's free app genres by far, making up 58% of free iOS apps, followed at a distance by Entertainment (7.9%), Photo & Video (5.0%), Education (3.7%), and Social Networking (3.3%).

This is very different from Google's frequency tables, where Family and Tools lead and games make up a far smaller share. Additionally, Apple's apps are not as evenly distributed as Google's.

In [26]:
display_table(free_apple, -5)


Games : 58.16263190564867
Entertainment : 7.883302296710118
Photo & Video : 4.9658597144630665
Education : 3.662321539416512
Social Networking : 3.2898820608317814
Shopping : 2.60707635009311
Utilities : 2.5139664804469275
Sports : 2.1415270018621975
Music : 2.0484171322160147
Health & Fitness : 2.0173805090006205
Productivity : 1.7380509000620732
Lifestyle : 1.5828677839851024
News : 1.3345747982619491
Travel : 1.2414649286157666
Finance : 1.1173184357541899
Weather : 0.8690254500310366
Food & Drink : 0.8069522036002483
Reference : 0.5586592178770949
Business : 0.5276225946617008
Book : 0.4345127250155183
Navigation : 0.186219739292365
Medical : 0.186219739292365
Catalogs : 0.12414649286157665


### Run a table for Google's Category column

Family leads Google's free app categories at 18.9%, followed by Game at 9.7% and Tools at 8.5%.

In [27]:
display_table(free_google, 1)


FAMILY : 18.907942238267147
GAME : 9.724729241877256
TOOLS : 8.461191335740072
BUSINESS : 4.591606498194946
LIFESTYLE : 3.9034296028880866
PRODUCTIVITY : 3.892148014440433
FINANCE : 3.7003610108303246
MEDICAL : 3.531137184115524
SPORTS : 3.395758122743682
PERSONALIZATION : 3.3167870036101084
COMMUNICATION : 3.2378158844765346
HEALTH_AND_FITNESS : 3.0798736462093865
PHOTOGRAPHY : 2.944494584837545
NEWS_AND_MAGAZINES : 2.7978339350180503
SOCIAL : 2.6624548736462095
TRAVEL_AND_LOCAL : 2.33528880866426
SHOPPING : 2.2450361010830324
BOOKS_AND_REFERENCE : 2.1435018050541514
DATING : 1.861462093862816
VIDEO_PLAYERS : 1.7937725631768955
MAPS_AND_NAVIGATION : 1.3989169675090252
FOOD_AND_DRINK : 1.2409747292418771
EDUCATION : 1.1620036101083033
ENTERTAINMENT : 0.9589350180505415
LIBRARIES_AND_DEMO : 0.9363718411552346
AUTO_AND_VEHICLES : 0.9250902527075812
HOUSE_AND_HOME : 0.8235559566787004
WEATHER : 0.8009927797833934
EVENTS : 0.7107400722021661
PARENTING : 0.6543321299638989
ART_AND_DESIGN : 

### Calculate average number of ratings per genre for Apple iOS

First, we save a frequency table for Apple's Prime Genre column. Then, for each genre, we iterate through the free iOS app data to compute the average number of ratings per app, using ``rating_count_tot`` (index 5).

In [30]:
genres_ios = freq_table(free_apple,-5)
print(genres_ios)

{'Social Networking': 3.2898820608317814, 'Photo & Video': 4.9658597144630665, 'Games': 58.16263190564867, 'Music': 2.0484171322160147, 'Reference': 0.5586592178770949, 'Health & Fitness': 2.0173805090006205, 'Weather': 0.8690254500310366, 'Utilities': 2.5139664804469275, 'Travel': 1.2414649286157666, 'Shopping': 2.60707635009311, 'News': 1.3345747982619491, 'Navigation': 0.186219739292365, 'Lifestyle': 1.5828677839851024, 'Entertainment': 7.883302296710118, 'Food & Drink': 0.8069522036002483, 'Sports': 2.1415270018621975, 'Book': 0.4345127250155183, 'Finance': 1.1173184357541899, 'Education': 3.662321539416512, 'Productivity': 1.7380509000620732, 'Business': 0.5276225946617008, 'Catalogs': 0.12414649286157665, 'Medical': 0.186219739292365}


In [35]:
for genre in genres_ios:
    total = 0
    len_genre = 0
    for app in free_apple:
        genre_app = app[-5]
        if genre_app == genre:
            ratings = float(app[5])
            total = total+ratings
            len_genre+=1
    avg_ratings = total/len_genre
    print(genre, avg_ratings)

Social Networking 71548.34905660378
Photo & Video 28441.54375
Games 22788.6696905016
Music 57326.530303030304
Reference 74942.11111111111
Health & Fitness 23298.015384615384
Weather 52279.892857142855
Utilities 18684.456790123455
Travel 28243.8
Shopping 26919.690476190477
News 21248.023255813954
Navigation 86090.33333333333
Lifestyle 16485.764705882353
Entertainment 14029.830708661417
Food & Drink 33333.92307692308
Sports 23008.898550724636
Book 39758.5
Finance 31467.944444444445
Education 7003.983050847458
Productivity 21028.410714285714
Business 7491.117647058823
Catalogs 4004.0
Medical 612.0
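
The nested loop above rescans every app once per genre. The same averages can be computed in a single pass by accumulating a running total and a count per genre, as in this hedged sketch; `average_by_group` and the toy rows are hypothetical and only illustrate the technique.

```python
from collections import defaultdict


def average_by_group(rows, group_index, value_index):
    """Average the numeric column `value_index` within each group, in one pass."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for row in rows:
        group = row[group_index]
        totals[group] += float(row[value_index])
        counts[group] += 1
    return {group: totals[group] / counts[group] for group in totals}


# Toy rows for illustration only: [name, genre, rating count].
toy_rows = [
    ['App A', 'Games', '100'],
    ['App B', 'Games', '300'],
    ['App C', 'Weather', '50'],
]
averages = average_by_group(toy_rows, 1, 2)
print(averages)
```

Because a group only appears in `totals` after at least one row has been counted, the division never sees a zero count.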


### Observing popular iOS genres by average review count

First, we looked at free Business apps. A handful of apps, led by Indeed Job Search, account for the highest rating counts, and there is not a wide variety of other apps.

Next, we looked at free Education apps. While a few giants have very high rating counts, there are also more apps for users to choose from, with rating counts spread over a wider range.

In [36]:
for app in free_apple:
    if app[-5] == 'Business':
        print(app[1],':',app[5])

Indeed Job Search : 38681
Flashlight ◎ : 24744
Adobe Acrobat Reader: View, Create, & Convert PDFs : 20069
Scanner App - PDF Document Scan : 11696
SayHi Translate : 8623
ADP Mobile Solutions : 8324
Sideline - 2nd Phone Number : 7907
Uber Driver : 3289
AirWatch Agent : 1150
VPN Go - Safe Fast & Stable VPN Proxy : 881
Cisco AnyConnect : 825
GreenVPN - Free & fast VPN with unlimited traffic : 464
iPlum Business Phone Number for Calling & Texting : 392
OPEN Forum : 200
Pulse Secure : 53
DingTalk : 40
Mon Espace - Pôle emploi : 11


In [37]:
for app in free_apple:
    if app[-5] == 'Education':
        print(app[1],':',app[5])

Duolingo - Learn Spanish, French and more : 162701
Guess My Age  Math Magic : 123190
Lumosity - Brain Training : 96534
Elevate - Brain Training and Games : 58092
Fit Brains Trainer : 46363
ClassDojo : 35440
Memrise: learn languages : 20383
Peak - Brain Training : 20322
Canvas by Instructure : 19981
ABCmouse.com - Early Learning Academy : 18749
Quizlet: Study Flashcards, Languages & Vocabulary : 16683
Photomath - Camera Calculator : 16523
iTunes U : 15801
Blackboard Mobile Learn™ : 13567
Star Chart : 13482
Remind: Fast, Efficient School Messaging : 9796
PBS KIDS Video : 8651
Toca Kitchen Monsters : 8062
Toca Hair Salon - Christmas Gift : 8049
Edmodo : 7197
Prodigy Math Game : 6683
Epic! - Unlimited Books for Kids : 6676
ChineseSkill -Learn Mandarin Chinese Language Free : 6077
Google Classroom : 5942
TED : 5782
Khan Academy: you can learn anything : 5459
Got It - Homework Help Math, Chem, Physics Solver : 4903
PowerSchool Mobile : 4547
SkyView® Free - Explore the Universe : 4188
Hopsco

### iOS app recommendation

Education is the fourth-largest genre among free iOS apps (3.7%). Its average rating count (about 7,000) is lower than that of most genres, but the ratings are spread across many apps rather than concentrated in a few giants, which to me indicates that users in this category are willing to explore new apps. To this end, I would recommend building an Education app for iOS.

### Calculate average number of installs per category for Google Android

In [39]:
genres_android = freq_table(free_google,1)

In [40]:
display_table(free_google, 5)

1,000,000+ : 15.726534296028879
100,000+ : 11.552346570397113
10,000,000+ : 10.548285198555957
10,000+ : 10.198555956678701
1,000+ : 8.393501805054152
100+ : 6.915613718411552
5,000,000+ : 6.825361010830325
500,000+ : 5.561823104693141
50,000+ : 4.7721119133574
5,000+ : 4.512635379061372
10+ : 3.5424187725631766
500+ : 3.2490974729241873
50,000,000+ : 2.3014440433213
100,000,000+ : 2.1322202166064983
50+ : 1.917870036101083
5+ : 0.78971119133574
1+ : 0.5076714801444043
500,000,000+ : 0.2707581227436823
1,000,000,000+ : 0.22563176895306858
0+ : 0.04512635379061372
0 : 0.01128158844765343


In [44]:
for genre in genres_android:
    total = 0
    len_category = 0
    for app in free_google:
        category_app = app[1]
        if category_app == genre:
            installs = app[5]
            installs = installs.replace('+','')
            installs = installs.replace(',','')
            total = total + int(installs)
            len_category += 1
    avg_installs = total/len_category
    print(genre, ':', avg_installs)

ART_AND_DESIGN : 1986335.0877192982
AUTO_AND_VEHICLES : 647317.8170731707
BEAUTY : 513151.88679245283
BOOKS_AND_REFERENCE : 8767811.894736841
BUSINESS : 1712290.1474201474
COMICS : 817657.2727272727
COMMUNICATION : 38456119.167247385
DATING : 854028.8303030303
EDUCATION : 1833495.145631068
ENTERTAINMENT : 11640705.88235294
EVENTS : 253542.22222222222
FINANCE : 1387692.475609756
FOOD_AND_DRINK : 1924897.7363636363
HEALTH_AND_FITNESS : 4188821.9853479853
HOUSE_AND_HOME : 1331540.5616438356
LIBRARIES_AND_DEMO : 638503.734939759
LIFESTYLE : 1437816.2687861272
GAME : 15588015.603248259
FAMILY : 3695641.8198090694
MEDICAL : 120550.61980830671
SOCIAL : 23253652.127118643
SHOPPING : 7036877.311557789
PHOTOGRAPHY : 17840110.40229885
SPORTS : 3638640.1428571427
TRAVEL_AND_LOCAL : 13984077.710144928
TOOLS : 10801391.298666667
PERSONALIZATION : 5201482.6122448975
PRODUCTIVITY : 16787331.344927534
PARENTING : 542603.6206896552
WEATHER : 5074486.197183099
VIDEO_PLAYERS : 24727872.452830188
NEWS_AND_
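
The averages above depend on turning install strings such as `1,000,000+` into integers. Isolated in a helper, that conversion might look like the sketch below; `parse_installs` is a hypothetical name. The values in the Installs column are open-ended buckets, so the parsed number is a lower bound and the averages are best read as rough comparisons between categories.

```python
def parse_installs(text):
    """Convert an install bucket such as '1,000,000+' to the integer 1000000."""
    return int(text.replace('+', '').replace(',', ''))


# Sample bucket labels as they appear in the Installs column.
for bucket in ['1,000,000+', '500+', '0']:
    print(bucket, '->', parse_installs(bucket))
```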

In [45]:
for app in free_google:
    if app[1] == 'MAPS_AND_NAVIGATION':
        print(app[0], ':', app[5])

Waze - GPS, Maps, Traffic Alerts & Live Navigation : 100,000,000+
T map (te map, T map, navigation) : 5,000,000+
MapQuest: Directions, Maps, GPS & Navigation : 10,000,000+
Yahoo! transit guide free timetable, operation information, transfer search : 10,000,000+
乗換NAVITIME　Timetable & Route Search in Japan Tokyo : 5,000,000+
Transit: Real-Time Transit App : 5,000,000+
Mapy.cz - Cycling & Hiking offline maps : 1,000,000+
Uber : 100,000,000+
GPS Navigation & Offline Maps Sygic : 50,000,000+
Map and Router Badge : 500,000+
Yandex.Transport : 10,000,000+
Air Traffic : 1,000,000+
Speed Cameras Radar : 1,000,000+
Atlan3D Navigation: Korea navigator : 1,000,000+
Compass : 10,000,000+
Mappy - Plan, route comparison, GPS : 1,000,000+
Gps Route Finder : 100,000+
My Location: GPS Maps, Share & Save Places : 5,000,000+
Yanosik: "antyradar", traffic jams, navigation, camera : 5,000,000+
NAVITIME - Map & Transfer Navi : 5,000,000+
Sygic Car Navigation : 5,000,000+
Czech Public Transport IDOS : 1,000,

In [47]:
for app in free_google:
    if app[1] == 'PRODUCTIVITY':
        print(app[0], ':', app[5])

Microsoft Word : 500,000,000+
All-In-One Toolbox: Cleaner, Booster, App Manager : 10,000,000+
AVG Cleaner – Speed, Battery & Memory Booster : 10,000,000+
QR Scanner & Barcode Scanner 2018 : 10,000,000+
Chrome Beta : 10,000,000+
Microsoft Outlook : 100,000,000+
Google PDF Viewer : 10,000,000+
My Claro Peru : 5,000,000+
Power Booster - Junk Cleaner & CPU Cooler & Boost : 1,000,000+
Google Assistant : 10,000,000+
Microsoft OneDrive : 100,000,000+
Calculator - unit converter : 50,000,000+
Microsoft OneNote : 100,000,000+
Metro name iD : 10,000,000+
Google Keep : 100,000,000+
Archos File Manager : 5,000,000+
ES File Explorer File Manager : 100,000,000+
ASUS SuperNote : 10,000,000+
HTC File Manager : 10,000,000+
MyMTN : 1,000,000+
Dropbox : 500,000,000+
ASUS Quick Memo : 10,000,000+
HTC Calendar : 10,000,000+
Google Docs : 100,000,000+
ASUS Calling Screen : 10,000,000+
lifebox : 5,000,000+
Yandex.Disk : 5,000,000+
Content Transfer : 5,000,000+
HTC Mail : 10,000,000+
Advanced Task Killer : 50

In [48]:
for app in free_google:
    if app[1] == 'PHOTOGRAPHY':
        print(app[0], ':', app[5])

TouchNote: Cards & Gifts : 1,000,000+
FreePrints – Free Photos Delivered : 1,000,000+
Groovebook Photo Books & Gifts : 500,000+
Moony Lab - Print Photos, Books & Magnets ™ : 50,000+
LALALAB prints your photos, photobooks and magnets : 1,000,000+
Snapfish : 1,000,000+
Motorola Camera : 50,000,000+
HD Camera - Best Cam with filters & panorama : 5,000,000+
LightX Photo Editor & Photo Effects : 10,000,000+
Sweet Snap - live filter, Selfie photo edit : 10,000,000+
HD Camera - Quick Snap Photo & Video : 1,000,000+
B612 - Beauty & Filter Camera : 100,000,000+
Waterfall Photo Frames : 1,000,000+
Photo frame : 100,000+
Huji Cam : 5,000,000+
Unicorn Photo : 1,000,000+
HD Camera : 5,000,000+
Makeup Editor -Beauty Photo Editor & Selfie Camera : 1,000,000+
Makeup Photo Editor: Makeup Camera & Makeup Editor : 1,000,000+
Moto Photo Editor : 5,000,000+
InstaBeauty -Makeup Selfie Cam : 50,000,000+
Garden Photo Frames - Garden Photo Editor : 500,000+
Photo Frame : 10,000,000+
Selfie Camera - Photo Edito

### Google Android app recommendation

I would recommend a Productivity app, since this category averages about 16.8 million installs per app and includes many apps with high install counts.