# Profitable App Profiles for iOS and Android

This project analyzes App Store and Google Play data to find app profiles that are likely to attract users in both stores.

The following cells open the iOS and Android data files and display an overview of each.

In [1]:
# Creating a function to explore the datasets we have.
def explore_data(dataset, start, end, rows_and_columns=False):
    dataset_slice = dataset[start:end]    
    for row in dataset_slice:
        print(row)
        print('\n') # adds a new (empty) line after each row

    if rows_and_columns:
        print('Number of rows:', len(dataset))
        print('Number of columns:', len(dataset[0]))

In [2]:
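# Opening and displaying the datasets
from csv import reader

# Read each file fully and close it afterwards.
# utf8 is assumed here because the app names contain non-ASCII characters.
with open('AppleStore.csv', encoding='utf8') as open_ios:
    data_ios = list(reader(open_ios))
with open('googleplaystore.csv', encoding='utf8') as open_android:
    data_android = list(reader(open_android))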

explore_data(data_ios, 0, 5, True)
print('\n')
explore_data(data_android, 0, 5, True)

['', 'id', 'track_name', 'size_bytes', 'currency', 'price', 'rating_count_tot', 'rating_count_ver', 'user_rating', 'user_rating_ver', 'ver', 'cont_rating', 'prime_genre', 'sup_devices.num', 'ipadSc_urls.num', 'lang.num', 'vpp_lic']


['1', '281656475', 'PAC-MAN Premium', '100788224', 'USD', '3.99', '21292', '26', '4', '4.5', '6.3.5', '4+', 'Games', '38', '5', '10', '1']


['2', '281796108', 'Evernote - stay organized', '158578688', 'USD', '0', '161065', '26', '4', '3.5', '8.2.2', '4+', 'Productivity', '37', '5', '23', '1']


['3', '281940292', 'WeatherBug - Local Weather, Radar, Maps, Alerts', '100524032', 'USD', '0', '188583', '2822', '3.5', '4.5', '5.0.0', '4+', 'Weather', '37', '5', '3', '1']


['4', '282614216', 'eBay: Best App to Buy, Sell, Save! Online Shopping', '128512000', 'USD', '0', '262241', '649', '4', '4.5', '5.10.0', '12+', 'Shopping', '37', '5', '9', '1']


Number of rows: 7198
Number of columns: 17


['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 

In [3]:
# Checking for the error reported in row 10472 of the Android data
# (not counting the header) and looking for similar rows in iOS

for index, row in enumerate(data_android):
    # A row whose length differs from the header has missing or shifted columns
    if len(row) != len(data_android[0]):
        print(row)
        print(index)

for index, row in enumerate(data_ios):
    if len(row) != len(data_ios[0]):
        print(row)
        print(index)

['Life Made WI-Fi Touchscreen Photo Frame', '1.9', '19', '3.0M', '1,000+', 'Free', '0', 'Everyone', '', 'February 11, 2018', '1.0.19', '4.0 and up']
10473


In [4]:
# The check above confirms the malformed row at index 10473 (row 10472 when the header is not counted), so we remove it

del data_android[10473]

Next, we'll find duplicate apps in the Google Play data.

In [7]:
duplicate_android_apps = []
unique_android_apps = []

for app in data_android[1:]:  # skip the header row
    name = app[0]
    if name in unique_android_apps:
        duplicate_android_apps.append(name)
    else:
        unique_android_apps.append(name)
        
print('Number of duplicate apps: ', len(duplicate_android_apps))

Number of duplicate apps:  1181


We want to remove these duplicates but keep, for each app, the entry with the highest number of reviews. To do this, we'll first create a dictionary that stores only the highest review total for each app, and then verify that its length matches what we expect based on the 1181 duplicates found in the previous step.

In [13]:
print('Expected length: ', len(data_android[1:]) - 1181)

reviews_max = {}

for app in data_android[1:]:
    name = app[0]
    reviews = float(app[3])
    if name in reviews_max and reviews_max[name] < reviews:
        reviews_max[name] = reviews
        
    elif name not in reviews_max:
        reviews_max[name] = reviews
        
print(len(reviews_max))

Expected length:  9659
9659


Next we'll create a clean list without duplicates.

In [15]:
android_clean = []
android_already = []

for app in data_android[1:]:
    name = app[0]
    reviews = float(app[3])
    if (reviews_max[name] == reviews) and (name not in android_already):
        android_clean.append(app)
        android_already.append(name)
        
print(len(android_clean))

9659
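The two passes above (build `reviews_max`, then filter) could also be folded into a single pass that stores the best row per name directly. The following is only a sketch under stated assumptions: `keep_most_reviewed` and `sample_rows` are hypothetical names, and the rows are made-up data that mimic the Google Play layout (name at index 0, review count at index 3).

```python
# Made-up rows for illustration only: name at index 0, review count at index 3.
sample_rows = [
    ['App A', 'TOOLS', '4.1', '100'],
    ['App B', 'GAME', '4.5', '2500'],
    ['App A', 'TOOLS', '4.1', '180'],
    ['App B', 'GAME', '4.5', '2500'],
]

def keep_most_reviewed(rows, name_index=0, reviews_index=3):
    """Return one row per app name, keeping the row with the most reviews."""
    best = {}
    for row in rows:
        name = row[name_index]
        reviews = float(row[reviews_index])
        # Strictly greater, so when review counts tie the first row seen is kept.
        if name not in best or reviews > float(best[name][reviews_index]):
            best[name] = row
    return list(best.values())

deduplicated = keep_most_reviewed(sample_rows)
print(len(deduplicated))
```

One difference from the cell above: the result is ordered by the first appearance of each app name rather than by the position of the retained row.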


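Next we'll create a function to detect non-English app names. We discovered that flagging every name containing a non-ASCII character (such as an emoji or ™) would wrongly reject some English apps, so the function allows up to three such characters.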

In [42]:
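def detectEnglish(string):
    # Count characters outside the ASCII range (code points above 127)
    non_ascii_count = 0

    for character in string:
        if ord(character) > 127:
            non_ascii_count += 1

    # Allow up to three such characters so emoji or symbols like ™ do not disqualify a name
    return non_ascii_count <= 3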

name1 = 'Instagram'
name2 = '爱奇艺PPS -《欢乐颂2》电视剧热播'
name3 = ' Docs To Go  Free Office Suite ™'
name4 = 'Instachat 😜'

print(detectEnglish(name1))
print(detectEnglish(name2))
print(detectEnglish(name3))
print(detectEnglish(name4))

True
False
True
True


We'll use the detectEnglish function to filter out non-English apps from the iOS and Android data sets.

In [44]:
android_english = []
android_non_english = []

for row in android_clean:
    name = row[0]
    if detectEnglish(name):
        android_english.append(row)
    else:
        android_non_english.append(row)
        
print(len(android_english))
print(len(android_non_english))

9614
45


In [52]:
ios_english = []
ios_non_english = []

for row in data_ios[1:]:
    name = row[2]
    if detectEnglish(name):
        ios_english.append(row)
    else:
        ios_non_english.append(row)
        
print(len(ios_english))
print(len(ios_non_english))

6183
1014


We'll isolate only the free apps.

In [60]:
free_android = []
free_ios = []

for row in android_english:
    price = row[7]
    if price == '0':
        free_android.append(row)
    
    
for row in ios_english:
    price = row[5]
    if price == '0':
        free_ios.append(row)
        
print(len(free_android))
print(len(free_ios))

8864
3222


Our goal is to discover a successful app profile so that we can develop an app with a chance of attracting users in both app stores. We'll analyze which genres have been successful, using the number of ratings by category and genre, to decide what type of app to develop. The end goal is to test the popularity of a minimally developed app in the Google Play store and use its success to decide whether to develop it further and eventually launch an iOS version.

We'll create two functions to analyze the data: one builds a dictionary of the values in a specified column (each as a percentage of the total), and the other sorts that dictionary in descending order to show the most common values first.

In [64]:
def freq_table(dataset, index):
    dataDictionary = {}
    total = 0
    for row in dataset:
        total += 1
        value = row[index]
        if value in dataDictionary:
            dataDictionary[value] += 1
        else:
            dataDictionary[value] = 1
            
    percentDictionary = {}
    for key in dataDictionary:
        percent = (dataDictionary[key] / total) * 100
        percentDictionary[key] =  percent
        
    return percentDictionary

def display_table(dataset, index):
    table = freq_table(dataset, index)
    table_display = []
    for key in table:
        key_val_as_tuple = (table[key], key)
        table_display.append(key_val_as_tuple)

    table_sorted = sorted(table_display, reverse = True)
    for entry in table_sorted:
        print(entry[1], ':', entry[0])
        
print('iOS Genres:')
display_table(free_ios, 12)
print('\n')
print('Android Categories:')
display_table(free_android, 1)
print('\n')
print('Android Genres')
display_table(free_android, 9)

iOS Genres:
Games : 58.16263190564867
Entertainment : 7.883302296710118
Photo & Video : 4.9658597144630665
Education : 3.662321539416512
Social Networking : 3.2898820608317814
Shopping : 2.60707635009311
Utilities : 2.5139664804469275
Sports : 2.1415270018621975
Music : 2.0484171322160147
Health & Fitness : 2.0173805090006205
Productivity : 1.7380509000620732
Lifestyle : 1.5828677839851024
News : 1.3345747982619491
Travel : 1.2414649286157666
Finance : 1.1173184357541899
Weather : 0.8690254500310366
Food & Drink : 0.8069522036002483
Reference : 0.5586592178770949
Business : 0.5276225946617008
Book : 0.4345127250155183
Navigation : 0.186219739292365
Medical : 0.186219739292365
Catalogs : 0.12414649286157665


Android Categories:
FAMILY : 18.907942238267147
GAME : 9.724729241877256
TOOLS : 8.461191335740072
BUSINESS : 4.591606498194946
LIFESTYLE : 3.9034296028880866
PRODUCTIVITY : 3.892148014440433
FINANCE : 3.7003610108303246
MEDICAL : 3.531137184115524
SPORTS : 3.395758122743682
PERSON

**Top Ten iOS app Genres:**
1. Games : 58.16%
2. Entertainment : 7.88%
3. Photo & Video : 4.97%
4. Education : 3.66%
5. Social Networking : 3.29%
6. Shopping : 2.61%
7. Utilities : 2.51%
8. Sports : 2.14%
9. Music : 2.05%
10. Health & Fitness : 2.02%

**Top Ten Android apps by Category**
1. FAMILY : 18.91%
2. GAME : 9.72%
3. TOOLS : 8.46%
4. BUSINESS : 4.59%
5. LIFESTYLE : 3.90%
6. PRODUCTIVITY : 3.89%
7. FINANCE : 3.70%
8. MEDICAL : 3.53%
9. SPORTS : 3.40%
10. PERSONALIZATION : 3.32%

**Top Ten Android apps by Genre**
1. Tools : 8.45%
2. Entertainment : 6.07%
3. Education : 5.35%
4. Business : 4.59%
5. Productivity : 3.89%
6. Lifestyle : 3.89%
7. Finance : 3.70%
8. Medical : 3.53%
9. Sports : 3.46%
10. Personalization : 3.32%

Conclusion: 
* Games are by far the most common genre on iOS, followed by Entertainment and Photo & Video.
* Android has a more even spread across genres.
  * Family and Game have the highest shares by category.
  * Tools and Productivity are also high on the list.
  * Games disappear from the genre breakdown because the data splits games into many separate genres (otherwise they would rank higher).
* Final conclusion: If we're looking for a genre that is well represented in both stores, games and entertainment are the safest bet. However, these tables only show which genres have the most apps, not necessarily which are most popular with users.

Next we'll isolate the genres of iOS apps and use the rating count to find out which genres are most popular.

In [79]:
genreTable = freq_table(free_ios, 12)

for genre in genreTable:
    total = 0
    genreLength = 0
    
    for row in free_ios:
        genre_app = row[12]
        if genre_app == genre:
            ratings = float(row[6])
            total += ratings
            genreLength += 1
                
    avg_ratings = total / genreLength
    print(genre, ' : ', avg_ratings)

Productivity  :  21028.410714285714
Weather  :  52279.892857142855
Shopping  :  26919.690476190477
Reference  :  74942.11111111111
Finance  :  31467.944444444445
Music  :  57326.530303030304
Utilities  :  18684.456790123455
Travel  :  28243.8
Social Networking  :  71548.34905660378
Sports  :  23008.898550724636
Health & Fitness  :  23298.015384615384
Games  :  22788.6696905016
Food & Drink  :  33333.92307692308
News  :  21248.023255813954
Book  :  39758.5
Photo & Video  :  28441.54375
Entertainment  :  14029.830708661417
Business  :  7491.117647058823
Lifestyle  :  16485.764705882353
Education  :  7003.983050847458
Navigation  :  86090.33333333333
Medical  :  612.0
Catalogs  :  4004.0


**Sort the categories to find the top ten:**
1. Navigation: 86,090
  * Navigation is dominated by Google and would require competing with Apple's native Maps.
2. **Reference: 74,942**
  * Could be an interesting genre to explore.
3. Social Networking: 71,548
  * Dominated by Facebook and Twitter.
4. Music: 57,327
  * Dominated by Apple (native) and Spotify.
5. Weather: 52,280
  * Would have to compete with the native Apple app.
6. **Book: 39,759**
  * Could be interesting.
7. **Food & Drink: 33,334**
  * Could be interesting.
8. **Finance: 31,468**
  * Could be interesting.
9. Photo & Video: 28,442
  * Native Apple app.
10. Travel: 28,244
  * Shopping (26,920, dominated by Amazon) falls just outside the top ten.
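
The ranking can also be produced in code rather than by eye. This is a minimal sketch, assuming the averages are collected into a dictionary; `avg_ratings_by_genre` is a hypothetical name, and its values are copied (rounded to two decimals) from the output printed above.

```python
# Average rating counts per genre, copied from the printed output and rounded.
avg_ratings_by_genre = {
    'Productivity': 21028.41, 'Weather': 52279.89, 'Shopping': 26919.69,
    'Reference': 74942.11, 'Finance': 31467.94, 'Music': 57326.53,
    'Utilities': 18684.46, 'Travel': 28243.80, 'Social Networking': 71548.35,
    'Sports': 23008.90, 'Health & Fitness': 23298.02, 'Games': 22788.67,
    'Food & Drink': 33333.92, 'News': 21248.02, 'Book': 39758.50,
    'Photo & Video': 28441.54, 'Entertainment': 14029.83, 'Business': 7491.12,
    'Lifestyle': 16485.76, 'Education': 7003.98, 'Navigation': 86090.33,
    'Medical': 612.00, 'Catalogs': 4004.00,
}

# Sort (genre, average) pairs by the average, largest first, and keep ten.
ranked = sorted(avg_ratings_by_genre.items(), key=lambda item: item[1], reverse=True)
top_ten = ranked[:10]

for position, (genre, average) in enumerate(top_ten, start=1):
    print(f'{position}. {genre}: {average:,.0f}')
```

Sorting on the numeric value avoids misplacing genres whose averages are close together.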

Let's explore the competitors in the highlighted genres.

In [81]:
for app in free_ios:
    if app[12] == 'Reference':
        print(app[2],":", app[6])

Bible : 985920
Dictionary.com Dictionary & Thesaurus : 200047
Dictionary.com Dictionary & Thesaurus for iPad : 54175
Muslim Pro: Ramadan 2017 Prayer Times, Azan, Quran : 18418
Merriam-Webster Dictionary : 16849
Google Translate : 26786
Night Sky : 12122
WWDC : 762
Jishokun-Japanese English Dictionary & Translator : 0
教えて!goo : 0
VPN Express : 14
New Furniture Mods - Pocket Wiki & Game Tools for Minecraft PC Edition : 17588
LUCKY BLOCK MOD ™ for Minecraft PC Edition - The Best Pocket Wiki & Mods Installer Tools : 4693
Guides for Pokémon GO - Pokemon GO News and Cheats : 826
Horror Maps for Minecraft PE - Download The Scariest Maps for Minecraft Pocket Edition (MCPE) Free : 718
City Maps for Minecraft PE - The Best Maps for Minecraft Pocket Edition (MCPE) : 8535
GUNS MODS for Minecraft PC Edition - Mods Tools : 1497
Real Bike Traffic Rider Virtual Reality Glasses : 8
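
The listing above suggests that a couple of very large apps pull the Reference average upward. One way to check this is to compare the mean with the median. The sketch below assumes the rating counts are typed in by hand from the listing; `reference_ratings` is a hypothetical name.

```python
from statistics import mean, median

# Rating counts of the free English Reference apps, copied from the listing above.
reference_ratings = [
    985920, 200047, 54175, 18418, 16849, 26786, 12122, 762, 0,
    0, 14, 17588, 4693, 826, 718, 8535, 1497, 8,
]

genre_mean = mean(reference_ratings)
genre_median = median(reference_ratings)

# Share of all ratings in the genre that belongs to the two largest apps.
two_largest = sorted(reference_ratings, reverse=True)[:2]
top_two_share = sum(two_largest) / sum(reference_ratings)

print('Mean:', round(genre_mean, 2))
print('Median:', genre_median)
print('Share held by the two largest apps:', round(top_two_share, 2))
```

A large gap between mean and median indicates that the genre average mostly reflects its biggest apps rather than a typical one.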


In [82]:
for app in free_ios:
    if app[12] == 'Book':
        print(app[2],":", app[6])

Kindle – Read eBooks, Magazines & Textbooks : 252076
OverDrive – Library eBooks and Audiobooks : 65450
Audible – audio books, original series & podcasts : 105274
BookShout: Read eBooks & Track Your Reading Goals : 879
ikouhoushi : 0
Dr. Seuss Treasury — 50 best kids books : 451
Weirdwood Manor : 197
Green Riding Hood : 392
HOOKED - Chat Stories : 47829
Color Therapy Adult Coloring Book for Adults : 84062
MangaTiara - love comic reader : 0
MangaZERO - comic reader : 9
謎解き2016 : 0
謎解き : 0


In [83]:
for app in free_ios:
    if app[12] == 'Food & Drink':
        print(app[2],":", app[6])

OpenTable - Restaurant Reservations : 113936
Allrecipes Dinner Spinner : 109349
McDo France : 22
Starbucks : 303856
Lieferando.de : 29
Domino's Pizza USA : 258624
Lieferheld - Delicious food delivery service : 29
Bon Appetit : 750
Chefkoch - Rezepte, Kochen, Backen & Kochbuch : 20
Chick-fil-A : 5665
Postmates - Food Delivery, Faster : 9519
Open Food Facts : 1
7-Eleven, Inc. : 1356
Nowait Guest : 1625
DoorDash - Food Delivery : 25947
SONIC Drive-In : 1645
Youmiam : 9
McDonald's : 4050
Deliveroo: Restaurant Delivery - Order Food Nearby : 1702
Outback : 805
Dunkin' Donuts - Get Offers, Coupons & Rewards : 9068
UberEATS: Uber for Food Delivery : 17865
Delish Eatmoji Keyboard : 154
Marmiton Twist : 2
Starbucks Keyboard : 457
Whataburger : 197


In [84]:
for app in free_ios:
    if app[12] == 'Finance':
        print(app[2],":", app[6])

PayPal - Send and request money safely : 119487
Bank of America - Mobile Banking : 119773
Chase Mobile℠ : 233270
Mint: Personal Finance, Budget, Bills & Money : 232940
Citi Mobile® : 48822
Wells Fargo Mobile : 43064
USAA Mobile : 19946
Sparkasse - Your mobile branch : 77
TaxCaster – Free tax refund calculator : 17516
Venmo : 21090
Amex Mobile : 11421
Chase Mobile : 34322
Ma Banque : 17
Capital One Mobile : 56110
La Banque Postale : 8
IRS2Go : 1329
Bank of America - Mobile Banking for iPad : 7569
Zaim : 44
Lloyds Bank Mobile Banking : 17
Halifax Mobile Banking : 8
Impots.gouv : 0
KeyBank Mobile : 623
Credit Karma: Free Credit Scores, Reports & Alerts : 101679
Wells Fargo for iPad : 2207
Capital One for iPad : 21858
Square Cash - Send Money for Free : 23775
TurboTax Tax Return App - File 2016 income taxes : 9635
Paribus - Rebates When Prices Drop : 768
U by BB&T : 790
Capital One CreditWise - Credit score and report : 1019
Digit: Save Money Without Thinking About It : 1506
Stash Invest: 

Now let's examine the Google Play Store.

In [88]:
CatTable = freq_table(free_android, 1)

for category in CatTable:
    total = 0
    CatLength = 0
    
    for row in free_android:
        AppCategory = row[1]
        if AppCategory == category:
            downloads = row[5]
            downloads = downloads.replace('+', '')
            downloads = downloads.replace(',', '')
            total += float(downloads)
            CatLength += 1
    avg_downloads = total / CatLength
    print(category, ': ', avg_downloads)

ART_AND_DESIGN :  1986335.0877192982
AUTO_AND_VEHICLES :  647317.8170731707
BEAUTY :  513151.88679245283
BOOKS_AND_REFERENCE :  8767811.894736841
BUSINESS :  1712290.1474201474
COMICS :  817657.2727272727
COMMUNICATION :  38456119.167247385
DATING :  854028.8303030303
EDUCATION :  1833495.145631068
ENTERTAINMENT :  11640705.88235294
EVENTS :  253542.22222222222
FINANCE :  1387692.475609756
FOOD_AND_DRINK :  1924897.7363636363
HEALTH_AND_FITNESS :  4188821.9853479853
HOUSE_AND_HOME :  1331540.5616438356
LIBRARIES_AND_DEMO :  638503.734939759
LIFESTYLE :  1437816.2687861272
GAME :  15588015.603248259
FAMILY :  3695641.8198090694
MEDICAL :  120550.61980830671
SOCIAL :  23253652.127118643
SHOPPING :  7036877.311557789
PHOTOGRAPHY :  17840110.40229885
SPORTS :  3638640.1428571427
TRAVEL_AND_LOCAL :  13984077.710144928
TOOLS :  10801391.298666667
PERSONALIZATION :  5201482.6122448975
PRODUCTIVITY :  16787331.344927534
PARENTING :  542603.6206896552
WEATHER :  5074486.197183099
VIDEO_PLAYERS 

The Books and Reference category looks like a good crossover. Let's examine the most downloaded apps in that category.

In [90]:
# Install brackets to include, exactly as they appear in the data
popular_installs = ('1,000,000+', '5,000,000+', '10,000,000+', '50,000,000+')

for app in free_android:
    if app[1] == 'BOOKS_AND_REFERENCE' and app[5] in popular_installs:
        print(app[0],': ', app[5])

Wikipedia :  10,000,000+
Cool Reader :  10,000,000+
Book store :  1,000,000+
FBReader: Favorite Book Reader :  10,000,000+
Free Books - Spirit Fanfiction and Stories :  1,000,000+
AlReader -any text book reader :  5,000,000+
FamilySearch Tree :  1,000,000+
Cloud of Books :  1,000,000+
ReadEra – free ebook reader :  1,000,000+
Ebook Reader :  5,000,000+
Read books online :  5,000,000+
eBoox: book reader fb2 epub zip :  1,000,000+
All Maths Formulas :  1,000,000+
Ancestry :  5,000,000+
HTC Help :  10,000,000+
Moon+ Reader :  10,000,000+
English-Myanmar Dictionary :  1,000,000+
Golden Dictionary (EN-AR) :  1,000,000+
All Language Translator Free :  1,000,000+
Aldiko Book Reader :  10,000,000+
Dictionary - WordWeb :  5,000,000+
50000 Free eBooks & Free AudioBooks :  5,000,000+
Al-Quran (Free) :  10,000,000+
Al Quran Indonesia :  10,000,000+
Al'Quran Bahasa Indonesia :  10,000,000+
Al Quran Al karim :  1,000,000+
Al Quran : EAlim - Translations & MP3 Offline :  5,000,000+
Koran Read &MP3 30

The category is dominated by general reference and language dictionaries. A unique reference app would have potential for success in both stores. 
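
The filter on install counts above compares strings such as `'1,000,000+'` one by one. An alternative is to convert the install bracket to a number and compare against bounds. This is a sketch under stated assumptions: `parse_installs` and `apps_in_install_range` are hypothetical helpers, and the sample rows are made up to mimic the Google Play layout (name at index 0, category at index 1, installs at index 5).

```python
def parse_installs(installs):
    """Convert an install bracket such as '1,000,000+' to the integer 1000000."""
    return int(installs.replace(',', '').replace('+', ''))

def apps_in_install_range(rows, category, minimum, maximum):
    """Return (name, installs) pairs for one category within the given bounds."""
    matches = []
    for row in rows:
        installs = parse_installs(row[5])
        if row[1] == category and minimum <= installs <= maximum:
            matches.append((row[0], installs))
    return matches

# Made-up rows for illustration; only the columns used here carry values.
sample_rows = [
    ['Sample Encyclopedia', 'BOOKS_AND_REFERENCE', '', '', '', '10,000,000+'],
    ['Sample Book Store', 'BOOKS_AND_REFERENCE', '', '', '', '1,000,000+'],
    ['Sample Dictionary', 'BOOKS_AND_REFERENCE', '', '', '', '50,000+'],
    ['Sample Game', 'GAME', '', '', '', '5,000,000+'],
]

popular = apps_in_install_range(sample_rows, 'BOOKS_AND_REFERENCE',
                                1_000_000, 50_000_000)
for name, installs in popular:
    print(name, ':', installs)
```

Numeric bounds make it easy to widen or narrow the range without listing every bracket by hand.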