**Profitable App Profiles for the App Store and Google Play Markets**
================================
Our aim in this project is to find mobile app profiles that are profitable for the App Store and Google Play markets. We're working as data analysts for a company that builds Android and iOS mobile apps. We make our apps available on Google Play and the App Store.

We only build apps that are free to download and install, and our main source of revenue consists of in-app ads. This means our revenue for any given app is mostly influenced by the number of users who use our app — the more users that see and engage with the ads, the better.

**Goal**
--------------------------------------------------------------
Our goal for this project is to analyze data to help our developers understand which types of apps are likely to attract more users.

In [1]:
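from csv import reader

def open_file(dataset, header = True):
    # Read the whole CSV into a list of rows; the with-block closes the file.
    with open(dataset, encoding='utf8', newline='') as opened_file:
        data = list(reader(opened_file))
    if header:
        # Split the header row from the data rows.
        return data[0], data[1:]
    # No header row: every row is data.
    return data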

In [2]:
android_header, android = open_file('googleplaystore.csv')
ios_header, ios = open_file('AppleStore.csv')

In [3]:
def explore_data(dataset, start, end, rows_columns = False):
    data = dataset[start:end]
    for row in data:
        print(row)
        print('\n')
        
    if rows_columns:
        print('Number of rows: ', len(dataset))
        print('Number of columns: ', len(dataset[0]))
        

In [4]:
print(android_header)
print('\n')
explore_data(android, 0, 1, True)

['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']


['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


Number of rows:  10841
Number of columns:  13


In [5]:
print(ios_header)
print('\n')
explore_data(ios, 0, 1, True)

['id', 'track_name', 'size_bytes', 'currency', 'price', 'rating_count_tot', 'rating_count_ver', 'user_rating', 'user_rating_ver', 'ver', 'cont_rating', 'prime_genre', 'sup_devices.num', 'ipadSc_urls.num', 'lang.num', 'vpp_lic']


['284882215', 'Facebook', '389879808', 'USD', '0.0', '2974676', '212', '3.5', '3.5', '95.0', '4+', 'Social Networking', '37', '1', '29', '1']


Number of rows:  7197
Number of columns:  16


**Deleting Wrong Data**
---------------------------------------
The entry in row 10472 is missing its 'Rating' value, which caused a column shift in the columns that follow.
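
A row like this can be spotted programmatically by comparing each row's length with the header's length. The following is a minimal sketch on made-up rows; `find_malformed_rows` is a hypothetical helper and is not part of the notebook's own code.

```python
def find_malformed_rows(header, rows):
    """Return (index, row) pairs whose column count differs from the header's."""
    expected = len(header)
    return [(i, row) for i, row in enumerate(rows) if len(row) != expected]


# Toy data (invented for illustration): the second row is missing one value.
toy_header = ['App', 'Category', 'Rating']
toy_rows = [
    ['App A', 'TOOLS', '4.1'],
    ['App B', '1.9'],
    ['App C', 'SOCIAL', '3.8'],
]

malformed = find_malformed_rows(toy_header, toy_rows)
print(malformed)
```

Calling the same helper as `find_malformed_rows(android_header, android)` should flag any row whose columns shifted.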


In [6]:
print(len(android))
del android[10472]  # don't run this more than once
print(len(android))

10841
10840


Removing Duplicate Entries: Part One
-----------------------------------------------


In [7]:
for app in android:
    name = app[0]
    if name == 'Facebook':
        print(app)

['Facebook', 'SOCIAL', '4.1', '78158306', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'August 3, 2018', 'Varies with device', 'Varies with device']
['Facebook', 'SOCIAL', '4.1', '78128208', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'August 3, 2018', 'Varies with device', 'Varies with device']


In [8]:
def duplicate(dataset):
    duplicate_apps = []
    unique_apps_name = []
    unique_apps = []
    for apps in dataset:
        name = apps[0]
        if name in unique_apps_name:
            duplicate_apps.append(apps)
        else:
            unique_apps.append(apps)
            unique_apps_name.append(name)
    print('No. of duplicate apps are: ',len(duplicate_apps))
    #print('The duplicate apps are: \n',duplicate_apps[:2])
    print('\n')
    return duplicate_apps, unique_apps
    
duplicate_apps, unique_apps = duplicate(android)

No. of duplicate apps are:  1181




Removing Duplicate Entries: Part Two
----------------------------------------------

To remove the duplicates, we will:

1. Create a dictionary, where each dictionary key is a unique app name and the corresponding dictionary value is the highest number of reviews of that app.
2. Use the information stored in the dictionary and create a new data set, which will have only one entry per app (and for each app, we'll only select the entry with the highest number of reviews).

In [9]:
def duplicate_removal(dataset):
    reviews_max = {}
    for apps in dataset:
        name = apps[0]
        n_reviews = float(apps[3])
        if name in reviews_max:
            if reviews_max[name] < n_reviews:
                reviews_max[name] = n_reviews
        else:
            reviews_max[name] = n_reviews
    return reviews_max

In [10]:
print(len(android) - len(duplicate_apps))
review_max = duplicate_removal(android)
print(len(review_max))

9659
9659


In [11]:
def remove_duplicate(dataset):
    android_clean = []
    already_added = []
    for app in dataset:
        name = app[0]
        n_reviews = float(app[3])
        if (name not in already_added) and (review_max[name] == n_reviews):
            android_clean.append(app)
            already_added.append(name)
    return android_clean
android_clean = remove_duplicate(android) 
len(android_clean)

9659
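
The two-step approach above can also be sketched as a single pass that stores the best row per app name directly. This is an alternative, not the notebook's method: `keep_most_reviewed` is a hypothetical name, the toy rows are invented, and the result is ordered by each app's first appearance rather than by the position of its most-reviewed row.

```python
def keep_most_reviewed(rows, name_index=0, reviews_index=3):
    """Keep one row per app name: the one with the highest review count."""
    best = {}  # app name -> row with the most reviews seen so far
    for row in rows:
        name = row[name_index]
        n_reviews = float(row[reviews_index])
        # Strict '>' keeps the first row when review counts tie.
        if name not in best or n_reviews > float(best[name][reviews_index]):
            best[name] = row
    return list(best.values())


# Toy rows (invented): only the name and review columns matter here.
toy = [
    ['Chat', 'SOCIAL', '4.1', '100'],
    ['Chat', 'SOCIAL', '4.1', '250'],
    ['Maps', 'TRAVEL', '4.3', '80'],
]
deduped = keep_most_reviewed(toy)
print(deduped)
```

Dictionary lookups avoid the repeated list membership checks, which may matter on larger data sets.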

Removing Non-English Apps: Part One
-----------------------------------------------



In [12]:
def test_non_english(string):
    for char in string:
        if ord(char)> 127:
            return False
    return True
print(test_non_english('Instagram'))
print(test_non_english('爱奇艺PPS -《欢乐颂2》电视剧热播'))
print(test_non_english('Docs To Go™ Free Office Suite'))
print(test_non_english('Instachat 😜'))


True
False
False
False


Removing Non-English Apps: Part Two
----------------------------------------------

In [13]:


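def test_non_english(string):
    # Count characters outside the ASCII range (code points above 127).
    non_ascii = 0
    for char in string:
        if ord(char) > 127:
            non_ascii += 1
            # Allow up to three such characters (emoji, ™, etc.).
            if non_ascii > 3:
                return False
    return True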

print(test_non_english('爱奇艺PPS -《欢乐颂2》电视剧热播'))
print(test_non_english('Docs To Go™ Free Office Suite'))
print(test_non_english('Instachat 😜'))

False
True
True
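
The tolerance of three characters matters because otherwise-English names can contain a few characters outside the ASCII range. As a small sketch (the helper name `count_non_ascii` is hypothetical), we can count those characters for the same test strings:

```python
def count_non_ascii(string):
    """Count characters whose code point falls outside the ASCII range (0-127)."""
    return sum(1 for char in string if ord(char) > 127)


names = [
    'Instagram',
    'Docs To Go™ Free Office Suite',
    'Instachat 😜',
    '爱奇艺PPS -《欢乐颂2》电视剧热播',
]
counts = {name: count_non_ascii(name) for name in names}
for name, count in counts.items():
    print(name, ':', count)
```

The first three names stay at or below three non-ASCII characters, so they pass the filter; the last one exceeds it.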


In [14]:
def remove_non_eng(dataset, name_index):
    android_eng = []
    for app in dataset:
        name = app[name_index]
        eng = test_non_english(name)
        if eng:
            android_eng.append(app)
    return android_eng

android_english = remove_non_eng(android_clean, 0)
ios_english = remove_non_eng(ios, 1)          

In [15]:
print(len(android_english))
print(len(ios_english))

9614
6183


Isolating the Free Apps
--------------------------------------------



In [16]:

def free_app(dataset, index):
    free_app = []
    for app in dataset:
        price = app[index]
        if price == '0' or price == '0.0':
            free_app.append(app)
    return free_app

android_final = free_app(android_english, 7)
ios_final = free_app(ios_english, 4)          

print(len(android_final))
print(len(ios_final))

8864
3222


Most Common Apps by Genre: Part One
----------------------------------------------------

As we mentioned in the introduction, our aim is to determine the kinds of apps that are likely to attract more users because our revenue is highly influenced by the number of people using our apps.

To minimize risks and overhead, our validation strategy for an app idea consists of three steps:

1. Build a minimal Android version of the app, and add it to Google Play.
2. If the app has a good response from users, we then develop it further.
3. If the app is profitable after six months, we also build an iOS version of the app and add it to the App Store.

Because our end goal is to add the app on both the App Store and Google Play, we need to find app profiles that are successful on both markets. For instance, a profile that might work well for both markets might be a productivity app that makes use of gamification.

Let's begin the analysis by getting a sense of the most common genres for each market. For this, we'll build a frequency table for the prime_genre column of the App Store data set, and the Genres and Category columns of the Google Play data set.
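
Raw counts are hard to compare across two data sets of different sizes, so a frequency table expressed in percentages can be useful. Here is a minimal sketch using `collections.Counter` on invented rows; `freq_table_percent` is a hypothetical helper, separate from the functions defined in this notebook.

```python
from collections import Counter


def freq_table_percent(rows, index):
    """Map each value in column `index` to its share of rows, in percent."""
    counts = Counter(row[index] for row in rows)
    total = sum(counts.values())
    return {value: count / total * 100 for value, count in counts.items()}


# Toy rows (invented): the genre sits in the second column.
toy = [['a', 'Games'], ['b', 'Games'], ['c', 'Games'], ['d', 'Music']]
table = freq_table_percent(toy, 1)
for genre, pct in sorted(table.items(), key=lambda item: item[1], reverse=True):
    print(genre, ':', pct)
```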

Most Common Apps by Genre: Part Two
------------------------------------------------

In [17]:
def freq_table(dataset, index):
    frequency_dict = {}
    for app in dataset:
        name = app[index]
        if name in frequency_dict:
            frequency_dict[name] += 1
        else:
            frequency_dict[name] = 1
    return frequency_dict

def display_table(dataset, index):
    table = freq_table(dataset, index)
    table_display = []
    for key in table:
        key_val_as_tuple = (table[key], key)
        table_display.append(key_val_as_tuple)

    table_sorted = sorted(table_display, reverse = True)
    for entry in table_sorted:
        print(entry[1], ':', entry[0])
    return table_sorted
        
android_genre = display_table(android_final, -4)

Tools : 749
Entertainment : 538
Education : 474
Business : 407
Productivity : 345
Lifestyle : 345
Finance : 328
Medical : 313
Sports : 307
Personalization : 294
Communication : 287
Action : 275
Health & Fitness : 273
Photography : 261
News & Magazines : 248
Social : 236
Travel & Local : 206
Shopping : 199
Books & Reference : 190
Simulation : 181
Dating : 165
Arcade : 164
Video Players & Editors : 157
Casual : 156
Maps & Navigation : 124
Food & Drink : 110
Puzzle : 100
Racing : 88
Role Playing : 83
Libraries & Demo : 83
Auto & Vehicles : 82
Strategy : 81
House & Home : 73
Weather : 71
Events : 63
Adventure : 60
Comics : 54
Beauty : 53
Art & Design : 53
Parenting : 44
Card : 40
Casino : 38
Trivia : 37
Educational;Education : 35
Board : 34
Educational : 33
Education;Education : 30
Word : 23
Casual;Pretend Play : 21
Music : 18
Racing;Action & Adventure : 15
Puzzle;Brain Games : 15
Entertainment;Music & Video : 15
Casual;Brain Games : 12
Casual;Action & Adventure : 12
Arcade;Action & Advent

In [18]:
ios_genre = display_table(ios_final, -5)

Games : 1874
Entertainment : 254
Photo & Video : 160
Education : 118
Social Networking : 106
Shopping : 84
Utilities : 81
Sports : 69
Music : 66
Health & Fitness : 65
Productivity : 56
Lifestyle : 51
News : 43
Travel : 40
Finance : 36
Weather : 28
Food & Drink : 26
Reference : 18
Business : 17
Book : 14
Navigation : 6
Medical : 6
Catalogs : 4


android_category = display_table(android_final, 1)

In [19]:
print('Most common genre: ',android_genre[0][-1])
print('Most common genre runner-up : ',android_genre[1][-1])

Most common genre:  Tools
Most common genre runner-up :  Entertainment


Most Popular Apps by Genre on the App Store
-----------------------------------------------------------

One way to find out which genres are the most popular (have the most users) is to calculate the average number of installs for each app genre. For the Google Play data set, we can find this information in the Installs column, but this information is missing for the App Store data set. As a workaround, we'll take the total number of user ratings as a proxy, which we can find in the rating_count_tot column.

Let's start with calculating the average number of user ratings per app genre on the App Store. To do that, we'll need to:

1. Isolate the apps of each genre.
2. Sum up the user ratings for the apps of that genre.
3. Divide the sum by the number of apps belonging to that genre (not by the total number of apps).
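
The three steps above can be sketched in a single pass that keeps a running sum and a running count per genre. This is an illustration on invented rows, and `average_by_genre` is a hypothetical name; column positions are passed in as parameters rather than assumed.

```python
def average_by_genre(rows, genre_index, count_index):
    """Return {genre: mean of the numeric column} using one pass over the rows."""
    totals = {}   # genre -> sum of the counts
    sizes = {}    # genre -> number of apps in that genre
    for row in rows:
        genre = row[genre_index]
        totals[genre] = totals.get(genre, 0.0) + float(row[count_index])
        sizes[genre] = sizes.get(genre, 0) + 1
    # Divide by the number of apps in the genre, not by the total number of apps.
    return {genre: totals[genre] / sizes[genre] for genre in totals}


toy = [
    ['App A', '100', 'Games'],
    ['App B', '300', 'Games'],
    ['App C', '50', 'Music'],
]
averages = average_by_genre(toy, genre_index=2, count_index=1)
print(averages)
```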

In [20]:
def popular_apps(dataset, index_genre, index_install):
    genre_popularity = {}
    genre_frequency = freq_table(dataset, index_genre)
    popularity_display = []
    for genre in genre_frequency:
        total = 0
        len_genre = 0
        for app in dataset:
            if app[index_genre] == genre:
                install_count = float(app[index_install])
                total += install_count
                len_genre += 1
        avg_n_ratings = total/len_genre
        genre_popularity[genre] = avg_n_ratings
        #print(genre, ':', avg_n_ratings)
        
        key_val_as_tuple = (avg_n_ratings, genre)
        popularity_display.append(key_val_as_tuple)
        
    popularity_sorted = sorted(popularity_display, reverse = True)
    for entry in popularity_sorted:
        print(entry[1], ':', entry[0])
    return popularity_sorted
    #print(genre_popularity)
    #return genre_popularity    

In [21]:
#popular_apps(ios_final, -5, 5)

In [22]:
ios_popularity = popular_apps(ios_final, -5, 5)

Navigation : 86090.33333333333
Reference : 74942.11111111111
Social Networking : 71548.34905660378
Music : 57326.530303030304
Weather : 52279.892857142855
Book : 39758.5
Food & Drink : 33333.92307692308
Finance : 31467.944444444445
Photo & Video : 28441.54375
Travel : 28243.8
Shopping : 26919.690476190477
Health & Fitness : 23298.015384615384
Sports : 23008.898550724636
Games : 22788.6696905016
News : 21248.023255813954
Productivity : 21028.410714285714
Utilities : 18684.456790123455
Lifestyle : 16485.764705882353
Entertainment : 14029.830708661417
Business : 7491.117647058823
Education : 7003.983050847458
Catalogs : 4004.0
Medical : 612.0


In [34]:
print('Most popular genre: ',ios_popularity[0][1])
print('\n')
print('Most popular apps: ')
for app in ios_final:
    if app[-5] == ios_popularity[0][1]:
        print(app[1], ':', app[5])

Most popular genre:  Navigation


Most popular apps: 
Waze - GPS Navigation, Maps & Real-time Traffic : 345046
Google Maps - Navigation & Transit : 154911
Geocaching® : 12811
CoPilot GPS – Car Navigation & Offline Maps : 3582
ImmobilienScout24: Real Estate Search in Germany : 187
Railway Route Search : 5


In [36]:
print('Most popular genre runner up: ',ios_popularity[1][1])
print('\n')
print('Most popular apps runner up: ')
for app in ios_final:
    if app[-5] == ios_popularity[1][1]:
        print(app[1], ':', app[5])

Most popular genre runner up:  Reference


Most popular apps runner up: 
Bible : 985920
Dictionary.com Dictionary & Thesaurus : 200047
Dictionary.com Dictionary & Thesaurus for iPad : 54175
Google Translate : 26786
Muslim Pro: Ramadan 2017 Prayer Times, Azan, Quran : 18418
New Furniture Mods - Pocket Wiki & Game Tools for Minecraft PC Edition : 17588
Merriam-Webster Dictionary : 16849
Night Sky : 12122
City Maps for Minecraft PE - The Best Maps for Minecraft Pocket Edition (MCPE) : 8535
LUCKY BLOCK MOD ™ for Minecraft PC Edition - The Best Pocket Wiki & Mods Installer Tools : 4693
GUNS MODS for Minecraft PC Edition - Mods Tools : 1497
Guides for Pokémon GO - Pokemon GO News and Cheats : 826
WWDC : 762
Horror Maps for Minecraft PE - Download The Scariest Maps for Minecraft Pocket Edition (MCPE) Free : 718
VPN Express : 14
Real Bike Traffic Rider Virtual Reality Glasses : 8
教えて!goo : 0
Jishokun-Japanese English Dictionary & Translator : 0


Most Popular Apps by Genre on Google Play
-------------------------------------------------------

For the Google Play market, we actually have data about the number of installs, so we should be able to get a clearer picture about genre popularity. However, the install numbers don't seem precise enough — we can see that most values are open-ended (100+, 1,000+, 5,000+, etc.)

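To convert these strings to numbers, we first remove the commas and plus signs with the str.replace(old, new) method. Each value is then treated as its lower bound, so 1,000+ counts as 1,000 installs.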
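
As a small illustration of str.replace on install strings (the helper `parse_installs` is hypothetical and the sample values are only examples of the format):

```python
def parse_installs(text):
    """Turn an install string such as '1,000,000+' into a float (its lower bound)."""
    return float(text.replace(',', '').replace('+', ''))


examples = ['100+', '1,000+', '1,000,000,000+']
parsed = [parse_installs(s) for s in examples]
print(parsed)
```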

In [29]:
def popular_apps(dataset, index_genre, index_install, google = False):
    genre_frequency = freq_table(dataset, index_genre)
    popularity_display = []
    for genre in genre_frequency:
        total = 0
        len_genre = 0
        for app in dataset:
            if app[index_genre] == genre:
                installs = app[index_install]
                if google:
                    # Google Play install counts look like '1,000,000+'.
                    installs = installs.replace(',', '')
                    installs = installs.replace('+', '')
                total += float(installs)
                len_genre += 1
        avg_n_ratings = total/len_genre
        popularity_display.append((avg_n_ratings, genre))

    # Sort by average, highest first, and print one genre per line.
    popularity_sorted = sorted(popularity_display, reverse = True)
    for entry in popularity_sorted:
        print(entry[1], ':', entry[0])
    return popularity_sorted

android_popular = popular_apps(android_final, 1, 5, True)

COMMUNICATION : 38456119.167247385
VIDEO_PLAYERS : 24727872.452830188
SOCIAL : 23253652.127118643
PHOTOGRAPHY : 17840110.40229885
PRODUCTIVITY : 16787331.344927534
GAME : 15588015.603248259
TRAVEL_AND_LOCAL : 13984077.710144928
ENTERTAINMENT : 11640705.88235294
TOOLS : 10801391.298666667
NEWS_AND_MAGAZINES : 9549178.467741935
BOOKS_AND_REFERENCE : 8767811.894736841
SHOPPING : 7036877.311557789
PERSONALIZATION : 5201482.6122448975
WEATHER : 5074486.197183099
HEALTH_AND_FITNESS : 4188821.9853479853
MAPS_AND_NAVIGATION : 4056941.7741935486
FAMILY : 3695641.8198090694
SPORTS : 3638640.1428571427
ART_AND_DESIGN : 1986335.0877192982
FOOD_AND_DRINK : 1924897.7363636363
EDUCATION : 1833495.145631068
BUSINESS : 1712290.1474201474
LIFESTYLE : 1437816.2687861272
FINANCE : 1387692.475609756
HOUSE_AND_HOME : 1331540.5616438356
DATING : 854028.8303030303
COMICS : 817657.2727272727
AUTO_AND_VEHICLES : 647317.8170731707
LIBRARIES_AND_DEMO : 638503.734939759
PARENTING : 542603.6206896552
BEAUTY : 51315

In [39]:
print('Most popular genre: ',android_popular[0][1])
print('\n')
print('Most popular apps: ')
for app in android_final:
    if app[1] == android_popular[0][1]:
        print(app[1], ':', app[5])

Most popular genre:  COMMUNICATION


Most popular apps: 
COMMUNICATION : 1,000,000,000+
COMMUNICATION : 10,000,000+
COMMUNICATION : 5,000,000+
COMMUNICATION : 100,000,000+
COMMUNICATION : 50,000,000+
COMMUNICATION : 5,000,000+
COMMUNICATION : 5,000,000+
COMMUNICATION : 10,000,000+
COMMUNICATION : 10,000,000+
COMMUNICATION : 10,000,000+
COMMUNICATION : 10,000,000+
COMMUNICATION : 1,000,000+
COMMUNICATION : 10,000,000+
COMMUNICATION : 10,000,000+
COMMUNICATION : 5,000,000+
COMMUNICATION : 1,000,000+
COMMUNICATION : 100,000,000+
COMMUNICATION : 500,000,000+
COMMUNICATION : 1,000,000+
COMMUNICATION : 100,000+
COMMUNICATION : 10,000,000+
COMMUNICATION : 10,000,000+
COMMUNICATION : 10,000,000+
COMMUNICATION : 5,000,000+
COMMUNICATION : 5,000,000+
COMMUNICATION : 1,000,000,000+
COMMUNICATION : 500,000,000+
COMMUNICATION : 5,000,000+
COMMUNICATION : 50,000,000+
COMMUNICATION : 1,000,000,000+
COMMUNICATION : 100,000,000+
COMMUNICATION : 100,000,000+
COMMUNICATION : 1,000,000+
COMMUNICATION : 10

Conclusion
-------------------------------------------------

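On the App Store, Navigation and Reference apps have the highest average number of user ratings, while on Google Play the COMMUNICATION category has the highest average number of installs. I have tried to write the functions so that they can be reused in future analyses.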