# Researching Profitable iOS and Android Free Apps 
This project analyzes data on mobile apps from the Apple App Store and the Google Play Store.

The goal is to identify the kinds of apps that generate high user engagement and therefore increase company revenue.

Collecting data on the more than 4 million apps available across both stores would be costly, so we use sizable samples collected and published on Kaggle by Ramanathan and Lavanya Gupta: the [Google Play data](https://www.kaggle.com/lava18/google-play-store-apps#googleplaystore.csv) and the [Apple App Store data](https://www.kaggle.com/ramamet4/app-store-apple-data-set-10k-apps).


# Open and Explore Android Apps Data

To explore the Google Play data, we want to know:
* the description of each column
* some example data points
* the number of entries in the dataset
* the number of columns




In [1]:
from csv import reader

# Open the file and read every row into a list of lists.
# The utf8 encoding is an assumption; it avoids garbled non-English app names.
with open("googleplaystore.csv", encoding="utf8") as opened_file:
    android_data = list(reader(opened_file))

# Helper to get a working understanding of the data
def explore_data(dataset, start_row, end_row, length_rows_columns=True):
    """Print rows start_row..end_row-1 of a header-less dataset, then its dimensions."""
    for each_row in dataset[start_row:end_row]:
        print(each_row)
        print("\n")
    if length_rows_columns:
        print("Number of Rows:", len(dataset))
        print("\n")
        print("Number of columns:", len(dataset[0]))


android_no_header = android_data[1:]

# Column names
print(android_data[0])

print("\n")

# First few data points, plus the number of rows and columns in the Google Play data
explore_data(android_no_header, 0, 3)





['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']


['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['Coloring book moana', 'ART_AND_DESIGN', '3.9', '967', '14M', '500,000+', 'Free', '0', 'Everyone', 'Art & Design;Pretend Play', 'January 15, 2018', '2.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


Number of Rows: 10841


Number of columns: 13


The exploration shows that the columns fall into two groups, both relevant to our project goals. Company decisions are captured by columns such as App, Category, Type, Price, Genres, Content Rating, Last Updated and Current Ver, while user engagement is captured by the Rating, Reviews and Installs columns.

# Open and Explore iOS Apps Data

To explore the iOS apps data, we want to know:
* the description of each column
* some example data points
* the number of entries in the dataset
* the number of columns

In [2]:
from csv import reader

# Open the file and read every row into a list of lists (the utf8 encoding is an assumption)
with open("AppleStore.csv", encoding="utf8") as opened_file:
    ios_data = list(reader(opened_file))

# explore_data() is the helper defined in the Google Play section above

ios_no_header = ios_data[1:]

# Column names
print(ios_data[0])

print("\n")

# First few data points, plus the number of rows and columns in the iOS data
explore_data(ios_no_header, 0, 3)



['id', 'track_name', 'size_bytes', 'currency', 'price', 'rating_count_tot', 'rating_count_ver', 'user_rating', 'user_rating_ver', 'ver', 'cont_rating', 'prime_genre', 'sup_devices.num', 'ipadSc_urls.num', 'lang.num', 'vpp_lic']


['284882215', 'Facebook', '389879808', 'USD', '0.0', '2974676', '212', '3.5', '3.5', '95.0', '4+', 'Social Networking', '37', '1', '29', '1']


['389801252', 'Instagram', '113954816', 'USD', '0.0', '2161558', '1289', '4.5', '4.0', '10.23', '12+', 'Photo & Video', '37', '0', '29', '1']


['529479190', 'Clash of Clans', '116476928', 'USD', '0.0', '2130805', '579', '4.5', '4.5', '9.24.12', '9+', 'Games', '38', '5', '18', '1']


Number of Rows: 7197


Number of columns: 16


Some of the App Store column names are vague, so we may need to consult [the data source on Kaggle](https://www.kaggle.com/ramamet4/app-store-apple-data-set-10k-apps) for full descriptions. As with the Google Play data, however, the iOS columns can be split into two groups, company actions and user reactions, which makes the dataset suitable for our objectives.

# Cleaning the Google Apps Data

* Correct or remove wrong or missing data 
* Remove duplicates
* Modify data to fit project objectives

The dataset's documentation reports that entry 10472 is missing a data point and that some rows are duplicated. We confirm both problems, correct what we can and delete what we cannot.
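One way to confirm a missing data point without relying on the documentation is to compare the length of each row with the length of the header. The sketch below assumes a dataset shaped like `android_data` (a list of lists whose first row is the header). The function name `find_malformed_rows` and the toy data are hypothetical and only serve the illustration.

```python
def find_malformed_rows(dataset_with_header):
    """Return (index, row) pairs for data rows whose field count differs from the header's.

    Indices are relative to the data rows, so they match a header-less list.
    """
    expected = len(dataset_with_header[0])
    return [
        (index, row)
        for index, row in enumerate(dataset_with_header[1:])
        if len(row) != expected
    ]


# Toy data (hypothetical): the second data row is missing one field.
sample = [
    ["App", "Category", "Rating"],
    ["App A", "TOOLS", "4.1"],
    ["App B", "1.9"],
    ["App C", "GAME", "4.5"],
]
malformed = find_malformed_rows(sample)
print(malformed)
```

Because the returned indices are relative to the data rows, they can be used directly with a header-less list such as `android_no_header`.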


In [3]:
# Delete the row with the missing data point.
# Run this cell only once: a second run would delete a valid row.
del android_no_header[10472]

### Check for Duplicates

In [4]:
# Check for app names that appear in more than one row

duplicate_items = []
unique_items = []
for each_list in android_no_header:
    app_name = each_list[0]
    if app_name in unique_items:
        duplicate_items.append(app_name)
    else:
        unique_items.append(app_name)
print(duplicate_items[:5])
print(len(duplicate_items))


# Check for rows that are identical in every column

duplicate_row = []
unique_row = []
for each_list in android_no_header:
    if each_list in unique_row:
        duplicate_row.append(each_list)
    else:
        unique_row.append(each_list)
print(len(duplicate_row))
print(len(unique_row))

['Quick PDF Scanner + OCR FREE', 'Box', 'Google My Business', 'ZOOM Cloud Meetings', 'join.me - Simple Meetings']
1181
483
10357


### Remove Duplicates

* The output above shows 1,181 rows whose app name has already appeared in an earlier row. "ZOOM Cloud Meetings", for example, is listed more than once.

* It also shows that 483 rows are exact copies of an earlier row, leaving 10,357 unique rows.

* We therefore delete the duplicates, keeping only one entry per app. Inspecting the number of reviews of the duplicated entries suggested the criterion: the entry with the highest review count is taken to be the most recent, so that is the one we keep.
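The criterion above can also be sketched as a single pass that remembers, for each app name, the row with the most reviews seen so far. This is a minimal illustration under the assumption that the name is in column 0 and the review count in column 3, as in the Google Play data. The function name `keep_most_reviewed` and the toy rows are hypothetical.

```python
def keep_most_reviewed(rows, name_index=0, reviews_index=3):
    """Keep one row per app name: the first row that has the highest review count."""
    best = {}  # app name -> row with the most reviews seen so far
    for row in rows:
        name = row[name_index]
        reviews = float(row[reviews_index])
        if name not in best or reviews > float(best[name][reviews_index]):
            best[name] = row
    return list(best.values())


# Toy rows (hypothetical): "App A" appears three times with different review counts.
toy_rows = [
    ["App A", "TOOLS", "4.0", "10"],
    ["App B", "GAME", "4.5", "5"],
    ["App A", "TOOLS", "4.0", "12"],
    ["App A", "TOOLS", "4.0", "12"],
]
deduplicated = keep_most_reviewed(toy_rows)
print(deduplicated)
```

The result is ordered by the first appearance of each app name, which may differ slightly from the row order produced by a two-pass approach.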

In [5]:
# Map each unique app name to its highest number of reviews

unique_apps = {}
for each_list in android_no_header:
    apps = each_list[0]
    reviews = float(each_list[3])
    if apps not in unique_apps or unique_apps[apps] < reviews:
        unique_apps[apps] = reviews
print(len(unique_apps)) # Sanity check: the expected number of entries is 9659

# Use the dictionary to keep, for each app, the first row that has the highest review count

android_clean = []
apps_already_added = []
for each_row in android_no_header:
    apps = each_row[0]
    reviews = float(each_row[3])
    if (unique_apps[apps] == reviews) and (apps not in apps_already_added):
        android_clean.append(each_row)
        apps_already_added.append(apps)
explore_data(android_clean, 0, 3)
print(len(android_clean))


    


    

9659
['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 and up']


Number of Rows: 9659


Number of columns: 13
9659


# Removing Non-English Apps 

Further exploration shows that both the Google Play and iOS data contain some non-English apps. Since our apps are developed for an English-speaking audience, keeping non-English apps in the analysis would not be beneficial, so we remove them.

### The Criteria for Removal
Each character in a string has a corresponding number (its code point). The characters commonly used in English text (letters, digits, punctuation marks and other common symbols) all fall in the American Standard Code for Information Interchange ([ASCII](http://www.asciitable.com/)) range, 0 - 127. Strings, like lists, are indexed and iterable, so we can loop through each app name and check whether any character falls outside that range.
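As a small illustration of the idea, the sketch below prints the number behind a few characters using Python's built-in `ord()` and checks it against the 0 - 127 range. The helper name `in_ascii_range` and the sample characters are chosen for this example only.

```python
def in_ascii_range(character):
    """True when the character's code point lies in the ASCII range 0 - 127."""
    return ord(character) <= 127


# A few sample characters: letters, a digit, a symbol, and two non-ASCII characters
for character in ["a", "Z", "7", "+", "™", "爱"]:
    print(repr(character), ord(character), in_ascii_range(character))
```

Letters, digits and common punctuation stay within the range, while the trademark sign and the Chinese character do not.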



In [6]:
# Evidence that some of the apps are non-English
print(ios_no_header[813][1])
print(android_clean[4412][0])

# We write a function to detect non-English Apps. An in-built Ord function can help us detect the corresponding number of a string charater
def english_apps(string):
    for character in string:
        if ord(character) > 127:
            return False
        
    return True
    
print(english_apps("Docs To Go‚Ñ¢ Free Office Suite")) # This is an English App. We expect True
print("\n")
print(english_apps("Instachat üòú")) # This is an English App. We expect True
print("\n")
print(english_apps("Áà±Â•áËâ∫PPS -„ÄäÊ¨¢‰πêÈ¢Ç2„ÄãÁîµËßÜÂâßÁÉ≠Êí≠")) # This is a non-English App. We expect False
print("\n")
print(english_apps("Instagram")) # This is an English App. We expect True
    
    

Áà±Â•áËâ∫PPS -„ÄäÊ¨¢‰πêÈ¢Ç2„ÄãÁîµËßÜÂâßÁÉ≠Êí≠
‰∏≠ÂõΩË™û AQ„É™„Çπ„Éã„É≥„Ç∞
False


False


False


True


### Analysis of the above output

The output above shows that our function is not perfect: it flags some English apps as non-English because their names contain special characters, such as ™ or an emoji, that fall outside the 0 - 127 range. Left unchecked, this would cause significant data loss.

### Modification of Criterion

In view of the above, we modify our removal criterion: an app is regarded as non-English, and removed, only if its name contains 4 or more characters outside the 0 - 127 range.

In [7]:
def english_apps(string):
    outlier_character = 0
    for character in string:
        if ord(character) > 127:
            outlier_character = outlier_character + 1
    if outlier_character > 3:
        return False
    else:
         return True

# we check the efficacy of the modified function

print(english_apps("Docs To Go‚Ñ¢ Free Office Suite")) # This is an English App. We expect True
print("\n")
print(english_apps("Instachat üòú")) # This is an English App. We expect True
print("\n")
print(english_apps("Áà±Â•áËâ∫PPS -„ÄäÊ¨¢‰πêÈ¢Ç2„ÄãÁîµËßÜÂâßÁÉ≠Êí≠")) # This is a non-English App. We expect False
print("\n")
print(english_apps("Instagram")) # This is an English App. We expect True

True


True


False


True


* The results above show that the modified function is fairly accurate, so we proceed to identify and remove the non-English apps.

### Removing non-English Apps from Google Dataset

In [8]:
googleapps_english = []
googleapps_non_english = []
for each_row in android_clean:
    name_app = each_row[0]
    if english_apps(name_app):
        googleapps_english.append(each_row)
    else:
        googleapps_non_english.append(each_row)

print(len(googleapps_english))
print(len(googleapps_non_english))
print(len(android_clean))
print("\n")
explore_data(googleapps_english, 0, 5) # A quick look at the cleaned data

9614
45
9659


['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 and up']


['Pixel Draw - Number Art Coloring Book', 'ART_AND_DESIGN', '4.3', '967', '2.8M', '100,000+', 'Free', '0', 'Everyone', 'Art & Design;Creativity', 'June 20, 2018', '1.1', '4.4 and up']


['Paper flowers instructions', 'ART_AND_DESIGN', '4.4', '167', '5.6M', '50,000+', 'Free', '0', 'Everyone', 'Art & Design', 'March 26, 2017', '1.0', '2.3 and up']


Number of Rows: 9614


Number of columns: 13


### Removing non-English Apps from Apple Dataset

In [9]:
iosapps_english = []
iosapps_non_english = []
for each_row in ios_no_header:
    name_app = each_row[1]
    if english_apps(name_app):
        iosapps_english.append(each_row)
    else:
        iosapps_non_english.append(each_row)

print(len(iosapps_english))
print(len(iosapps_non_english))
print(len(ios_no_header))
print("\n")
explore_data(iosapps_english, 0, 5) # A quick look at the cleaned data

6183
1014
7197


['284882215', 'Facebook', '389879808', 'USD', '0.0', '2974676', '212', '3.5', '3.5', '95.0', '4+', 'Social Networking', '37', '1', '29', '1']


['389801252', 'Instagram', '113954816', 'USD', '0.0', '2161558', '1289', '4.5', '4.0', '10.23', '12+', 'Photo & Video', '37', '0', '29', '1']


['529479190', 'Clash of Clans', '116476928', 'USD', '0.0', '2130805', '579', '4.5', '4.5', '9.24.12', '9+', 'Games', '38', '5', '18', '1']


['420009108', 'Temple Run', '65921024', 'USD', '0.0', '1724546', '3842', '4.5', '4.0', '1.6.2', '9+', 'Games', '40', '5', '1', '1']


['284035177', 'Pandora - Music & Radio', '130242560', 'USD', '0.0', '1126879', '3594', '4.0', '4.5', '8.4.1', '12+', 'Music', '37', '4', '1', '1']


Number of Rows: 6183


Number of columns: 16


After removing the non-English apps from both datasets, we now have:
* 9614 rows in the Android data
* 6183 rows in the iOS data

# Data Slice of Free Apps - Google Data

In [10]:
free_google_apps = []
for each_row in googleapps_english:
    price = each_row[7]
    if price == "0":
        free_google_apps.append(each_row)
print(len(free_google_apps))
    

8864


# Data Slice of Free Apps - iOS Data

In [11]:
free_ios_apps = []
for each_row in iosapps_english:
    price = each_row[4]
    if price == "0.0":
        free_ios_apps.append(each_row)
print(len(free_ios_apps))

3222


# Google and iOS Datasets - Ready for Analysis

We have cleaned our datasets and made them ready for analysis in line with our project objectives by:
* Removing identified inaccurate and missing data
* Removing duplicate data entries
* Removing non-English apps
* Removing paid apps

# Market Penetration Strategy

* The more users our apps have, the more revenue we generate, so we target both the Google Play and App Store markets. To minimize risk, we first develop an app for Google Play and assess how users respond to it. If the response is positive, we develop the app further and launch it on iOS.

* To that end, we look for the most common app genres (social, games, etc.) in both markets.



### Google Dataset - Common App Genres Measured in Percentages

* We use a dictionary to build a frequency table (count) for each app genre
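The same kind of table can also be built with `collections.Counter` from the standard library. The sketch below is an alternative illustration, not the approach used in this notebook; the function name `percentage_table` and the toy rows are hypothetical.

```python
from collections import Counter


def percentage_table(rows, index):
    """Return (value, percentage) pairs for one column, most frequent first."""
    counts = Counter(row[index] for row in rows)
    total = sum(counts.values())
    return [(value, count / total * 100) for value, count in counts.most_common()]


# Toy rows (hypothetical): app name in column 0, category in column 1
toy_rows = [
    ["App A", "GAME"],
    ["App B", "GAME"],
    ["App C", "TOOLS"],
    ["App D", "GAME"],
]
table = percentage_table(toy_rows, 1)
for value, percentage in table:
    print(value, ":", percentage)
```

`Counter.most_common()` already returns the values sorted by count, so no separate sorting step is needed.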

In [12]:
# App genres are described by two columns: Category (index 1) and Genres (index 9).
# freq_table() builds a percentage frequency table for any column. It takes a dataset (list of rows) and a column index.

def freq_table(dataset, index):
    table = {}
    total = 0
    for each_list in dataset:
        total = total + 1
        key = each_list[index]
        if key in table:
            table[key] = table[key] + 1
        else:
            table[key] = 1
    for each_key in table:
        table[each_key] = (table[each_key]/total) * 100
    return table

# analyse_freq_table() turns the frequency table into a list of tuples, sorts it in descending order and prints it.

def analyse_freq_table(dataset, index):
    dictionary = freq_table(dataset, index)
    dict_to_list = []
    for each_key in dictionary:
        dict_to_tuple = (dictionary[each_key], each_key)
        dict_to_list.append(dict_to_tuple)
    sorted_table = sorted(dict_to_list, reverse = True)
    for each_item in sorted_table:
        print(each_item[1], ":", each_item[0])

# The function prints its results and returns nothing, so we call it directly
analyse_freq_table(free_google_apps, 1)
analyse_freq_table(free_google_apps, 9)

    

   



FAMILY : 18.907942238267147
GAME : 9.724729241877256
TOOLS : 8.461191335740072
BUSINESS : 4.591606498194946
LIFESTYLE : 3.9034296028880866
PRODUCTIVITY : 3.892148014440433
FINANCE : 3.7003610108303246
MEDICAL : 3.531137184115524
SPORTS : 3.395758122743682
PERSONALIZATION : 3.3167870036101084
COMMUNICATION : 3.2378158844765346
HEALTH_AND_FITNESS : 3.0798736462093865
PHOTOGRAPHY : 2.944494584837545
NEWS_AND_MAGAZINES : 2.7978339350180503
SOCIAL : 2.6624548736462095
TRAVEL_AND_LOCAL : 2.33528880866426
SHOPPING : 2.2450361010830324
BOOKS_AND_REFERENCE : 2.1435018050541514
DATING : 1.861462093862816
VIDEO_PLAYERS : 1.7937725631768955
MAPS_AND_NAVIGATION : 1.3989169675090252
FOOD_AND_DRINK : 1.2409747292418771
EDUCATION : 1.1620036101083033
ENTERTAINMENT : 0.9589350180505415
LIBRARIES_AND_DEMO : 0.9363718411552346
AUTO_AND_VEHICLES : 0.9250902527075812
HOUSE_AND_HOME : 0.8235559566787004
WEATHER : 0.8009927797833934
EVENTS : 0.7107400722021661
PARENTING : 0.6543321299638989
ART_AND_DESIGN : 

### iOS Dataset - Common App Genres Measured in Percentages

In [13]:
# The app genre column (prime_genre) is at index 11.
# analyse_freq_table() prints its results and returns nothing, so we call it directly
analyse_freq_table(free_ios_apps, 11)

Games : 58.16263190564867
Entertainment : 7.883302296710118
Photo & Video : 4.9658597144630665
Education : 3.662321539416512
Social Networking : 3.2898820608317814
Shopping : 2.60707635009311
Utilities : 2.5139664804469275
Sports : 2.1415270018621975
Music : 2.0484171322160147
Health & Fitness : 2.0173805090006205
Productivity : 1.7380509000620732
Lifestyle : 1.5828677839851024
News : 1.3345747982619491
Travel : 1.2414649286157666
Finance : 1.1173184357541899
Weather : 0.8690254500310366
Food & Drink : 0.8069522036002483
Reference : 0.5586592178770949
Business : 0.5276225946617008
Book : 0.4345127250155183
Navigation : 0.186219739292365
Medical : 0.186219739292365
Catalogs : 0.12414649286157665


### Google Dataset - Analysis of Results

The most common category among free English apps is FAMILY, with almost 19% of the apps in our Google Play sample, followed by GAME with almost 10%. Beyond these two, most of the remaining categories (TOOLS, BUSINESS, LIFESTYLE, PRODUCTIVITY, FINANCE and so on) are aimed at solving practical, real-life problems, although apps that are mainly for fun are also present.

The percentage frequency distribution of the Genres column supports this: Tools is the largest genre with about 8%, followed by Entertainment with about 6%. The Genres column is more granular than the Category column, and it also shows a lot of apps with practical uses. We conclude that Google Play has more practical apps than fun apps, although the gap is not very large.

Note that this conclusion says nothing about whether free English apps with practical uses have more users than other kinds of apps.

### iOS Dataset - Analysis of Result

Entertainment-oriented apps dominate the free English apps on the App Store. They make up over 77% of the free English apps, with Games leading by far at about 58%.

Note, however, that this only tells us which genres are most frequent. It does not yet tell us which genres have the most users, which is the objective of this project.

### Google Dataset - Apps with Highest Number of Users

For the Google Play dataset, our proxy for user engagement is the "Installs" column. Its values, however, are open-ended ranges rather than exact counts:

In [42]:
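# Preview the first ten values of the Installs column (index 5)
installs_preview = []
for each_list in free_google_apps:
    installs_preview.append(each_list[5])
print(installs_preview[:10])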

['10,000+', '5,000,000+', '50,000,000+', '100,000+', '50,000+', '50,000+', '1,000,000+', '1,000,000+', '10,000+', '1,000,000+']


The values are open-ended ranges (for example, 10,000+). We disregard the "+" and treat each value as an exact count. Since the values are strings, we remove the commas and the plus sign and then convert the result to a float. We proceed as follows:

* First, we isolate the unique app categories on Google Play
* Second, we sum the installs of all apps in each category
* Third, we count the number of apps in each category
* Finally, we divide the total by the count to get the average number of installs per app
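The steps above can be sketched as a single pass over the rows that accumulates a running total and a count per category. This is a minimal illustration, assuming the category is in column 1 and the install string in column 5 as in the Google Play data. The names `parse_installs` and `average_installs_by_category` and the toy rows are hypothetical.

```python
from collections import defaultdict


def parse_installs(text):
    """Convert a string such as '1,000,000+' to the float 1000000.0."""
    return float(text.replace(",", "").replace("+", ""))


def average_installs_by_category(rows, category_index=1, installs_index=5):
    """Return {category: average installs per app} using one pass over the rows."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for row in rows:
        category = row[category_index]
        totals[category] += parse_installs(row[installs_index])
        counts[category] += 1
    return {category: totals[category] / counts[category] for category in totals}


# Toy rows (hypothetical); only columns 1 and 5 matter here
toy_rows = [
    ["App A", "TOOLS", "4.1", "159", "19M", "10,000+"],
    ["App B", "TOOLS", "3.9", "967", "14M", "50,000+"],
    ["App C", "GAME", "4.7", "875", "8.7M", "1,000,000+"],
]
averages = average_installs_by_category(toy_rows)
print(averages)
```

A single pass avoids rescanning the whole dataset once per category, which matters more as the number of categories grows.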


In [53]:
# Return the unique values of a column as a list
def analyse_freq_table_1(dataset, index):
    dictionary = freq_table(dataset, index)
    dict_to_list = []
    for each_key in dictionary:
        dict_to_list.append(each_key)
    return dict_to_list

app_list = analyse_freq_table_1(free_google_apps, 1)
print(app_list)
for app_kind in app_list:
    total = 0
    number_app_kind = 0
    for each_list in free_google_apps:
        app_category = each_list[1]
        if app_kind == app_category:
            # Strip the commas and the plus sign, e.g. "10,000+" -> 10000.0
            n_installs = each_list[5]
            n_installs = n_installs.replace(",", "")
            n_installs = n_installs.replace("+", "")
            total = total + float(n_installs)
            number_app_kind = number_app_kind + 1
    avg_installs = total / number_app_kind
    print(app_kind, ":", avg_installs)
    print("\n")


['ENTERTAINMENT', 'ART_AND_DESIGN', 'LIBRARIES_AND_DEMO', 'COMMUNICATION', 'EDUCATION', 'WEATHER', 'DATING', 'MEDICAL', 'BOOKS_AND_REFERENCE', 'PERSONALIZATION', 'LIFESTYLE', 'HEALTH_AND_FITNESS', 'SOCIAL', 'NEWS_AND_MAGAZINES', 'AUTO_AND_VEHICLES', 'PHOTOGRAPHY', 'FAMILY', 'MAPS_AND_NAVIGATION', 'SHOPPING', 'TRAVEL_AND_LOCAL', 'FINANCE', 'SPORTS', 'EVENTS', 'GAME', 'TOOLS', 'BUSINESS', 'PARENTING', 'BEAUTY', 'PRODUCTIVITY', 'VIDEO_PLAYERS', 'COMICS', 'FOOD_AND_DRINK', 'HOUSE_AND_HOME']
ENTERTAINMENT : 11640705.88235294


ART_AND_DESIGN : 1986335.0877192982


LIBRARIES_AND_DEMO : 638503.734939759


COMMUNICATION : 38456119.167247385


EDUCATION : 1833495.145631068


WEATHER : 5074486.197183099


DATING : 854028.8303030303


MEDICAL : 120550.61980830671


BOOKS_AND_REFERENCE : 8767811.894736841


PERSONALIZATION : 5201482.6122448975


LIFESTYLE : 1437816.2687861272


HEALTH_AND_FITNESS : 4188821.9853479853


SOCIAL : 23253652.127118643


NEWS_AND_MAGAZINES : 9549178.467741935


AUTO_AND

### App Recommendation Criteria

Because a few top apps in some categories may skew the average number of installs, which is our proxy for user engagement, we set out criteria for recommending an app in line with the objectives of our project.

* We want a category with as few apps as possible but a fairly high average number of installs. This will increase our chances of successfully penetrating the market
* We want a category that requires little or no domain knowledge. This will reduce the cost of developing the app
* We want a category with a low variance in the number of installs across its apps. This will make our proxy a better measure of user engagement in that category
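The first and third criteria can be checked numerically. The sketch below is one possible way to do so, not part of the original analysis: it uses the standard `statistics` module to report the number of apps, the mean, the median and the population standard deviation of installs per category. The function name `summarise_installs` and the toy figures are hypothetical.

```python
from statistics import mean, median, pstdev


def summarise_installs(installs_by_category):
    """Return {category: (app_count, mean, median, population_std_dev)}."""
    summary = {}
    for category, installs in installs_by_category.items():
        summary[category] = (
            len(installs),
            mean(installs),
            median(installs),
            pstdev(installs),
        )
    return summary


# Hypothetical install counts, not taken from the dataset
toy = {
    "CATEGORY_A": [100_000.0, 100_000.0, 500_000.0],
    "CATEGORY_B": [10_000.0, 10_000.0, 1_000_000_000.0],
}
summary = summarise_installs(toy)
for category, (count, avg, mid, spread) in summary.items():
    print(category, count, round(avg), round(mid), round(spread))
```

A category whose standard deviation is small relative to its mean, and whose median is close to its mean, is less likely to be dominated by one or two very large apps.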


In [59]:
# List the apps in one category with their install counts, to check the category against our criteria
total = 0
for each_list in free_google_apps:
    if each_list[1] == "BOOKS_AND_REFERENCE":
        total = total + 1
        print(each_list[0], ":", each_list[5])
print("\n")
print("total number of apps", ":", total)

E-Book Read - Read Book for free : 50,000+
Download free book with green book : 100,000+
Wikipedia : 10,000,000+
Cool Reader : 10,000,000+
Free Panda Radio Music : 100,000+
Book store : 1,000,000+
FBReader: Favorite Book Reader : 10,000,000+
English Grammar Complete Handbook : 500,000+
Free Books - Spirit Fanfiction and Stories : 1,000,000+
Google Play Books : 1,000,000,000+
AlReader -any text book reader : 5,000,000+
Offline English Dictionary : 100,000+
Offline: English to Tagalog Dictionary : 500,000+
FamilySearch Tree : 1,000,000+
Cloud of Books : 1,000,000+
Recipes of Prophetic Medicine for free : 500,000+
ReadEra – free ebook reader : 1,000,000+
Anonymous caller detection : 10,000+
Ebook Reader : 5,000,000+
Litnet - E-books : 100,000+
Read books online : 5,000,000+
English to Urdu Dictionary : 500,000+
eBoox: book reader fb2 epub zip : 1,000,000+
English Persian Dictionary : 500,000+
Flybook : 500,000+
All Maths Formulas : 1,000,000+
Ancestry : 5,000,000+
HTC Help : 10,000,000+

Although the number of apps in the "BOOKS_AND_REFERENCE" category seems relatively high, the listing suggests that, apart from a few outliers such as Google Play Books, install counts vary comparatively little from app to app. Having iterated this check over different categories, we find that "BOOKS_AND_REFERENCE" satisfies our criteria.

### iOS Dataset - Apps with Highest Number of Users

Here we use the rating_count_tot column (the total number of user ratings) as a proxy for the number of users.

* First, we isolate the unique genres on the App Store
* Second, we sum the number of ratings of all apps in each genre
* Third, we count the number of apps in each genre
* Finally, we divide the total by the count to get the average number of ratings per app

In [14]:
# analyse_freq_table_1() is the helper defined in the Google Play section above.
# It returns the unique values of a column as a list.

app_list = analyse_freq_table_1(free_ios_apps, 11)
print(app_list)
for app_kind in app_list:
    total = 0
    number_app_kind = 0
    for genre in free_ios_apps:
        app_genre = genre[11]
        if app_kind == app_genre:
            ratings_count = float(genre[5])
            total = total + ratings_count
            number_app_kind = number_app_kind + 1
    # Average number of user ratings per app in this genre
    avg_ratings = total / number_app_kind
    print(app_kind, ":", avg_ratings)
    print("\n")


['Finance', 'Education', 'Music', 'Lifestyle', 'Reference', 'Catalogs', 'News', 'Travel', 'Games', 'Utilities', 'Productivity', 'Book', 'Shopping', 'Medical', 'Weather', 'Photo & Video', 'Health & Fitness', 'Entertainment', 'Food & Drink', 'Social Networking', 'Sports', 'Business', 'Navigation']
Finance : 31467.944444444445


Education : 7003.983050847458


Music : 57326.530303030304


Lifestyle : 16485.764705882353


Reference : 74942.11111111111


Catalogs : 4004.0


News : 21248.023255813954


Travel : 28243.8


Games : 22788.6696905016


Utilities : 18684.456790123455


Productivity : 21028.410714285714


Book : 39758.5


Shopping : 26919.690476190477


Medical : 612.0


Weather : 52279.892857142855


Photo & Video : 28441.54375


Health & Fitness : 23298.015384615384


Entertainment : 14029.830708661417


Food & Drink : 33333.92307692308


Social Networking : 71548.34905660378


Sports : 23008.898550724636


Business : 7491.117647058823


Navigation : 86090.33333333333




The averages above may be heavily influenced by a few very popular apps in some genres, such as Google Maps in Navigation or Facebook in Social Networking.
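A quick way to see this effect is to compare the mean with the median, which is far less sensitive to a few very large values. The sketch below is an illustration rather than part of the original analysis. It uses the 14 rating counts that this notebook lists for the free English Book apps further down; the variable names are chosen for the example.

```python
from statistics import mean, median

# Rating counts listed in this notebook for the 14 free English apps in the Book genre
book_rating_counts = [252076, 105274, 84062, 65450, 47829, 879, 451, 392, 197, 9, 0, 0, 0, 0]

book_mean = mean(book_rating_counts)
book_median = median(book_rating_counts)
print("mean:", book_mean)      # 39758.5
print("median:", book_median)  # 421.5
```

The mean reproduces the Book average reported above (39758.5), while the median is far lower, which shows how strongly a handful of popular apps can pull an average up.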

### App Recommendation Criteria

Because a few top apps in some genres may skew the average number of user ratings, which is our proxy for user engagement, we set out criteria for recommending an app in line with the objectives of our project.

* We want a genre with as few apps as possible but a fairly high average number of user ratings. This will increase our chances of successfully penetrating the market
* We want a genre that requires little or no domain knowledge. This will reduce the cost of developing the app
* We want a genre with a low variance in the number of user ratings across its apps. This will make our proxy a better measure of user engagement in that genre


In [39]:
# List the apps in one genre with their rating counts, to check the genre against our criteria
total = 0
for genre in free_ios_apps:
    if genre[11] == "Book":
        total = total + 1
        print(genre[1], ":", genre[5])
print("\n")
print("total number of apps", ":", total)

Kindle – Read eBooks, Magazines & Textbooks : 252076
Audible – audio books, original series & podcasts : 105274
Color Therapy Adult Coloring Book for Adults : 84062
OverDrive – Library eBooks and Audiobooks : 65450
HOOKED - Chat Stories : 47829
BookShout: Read eBooks & Track Your Reading Goals : 879
Dr. Seuss Treasury — 50 best kids books : 451
Green Riding Hood : 392
Weirdwood Manor : 197
MangaZERO - comic reader : 9
ikouhoushi : 0
MangaTiara - love comic reader : 0
謎解き : 0
謎解き2016 : 0


total number of apps : 14


Having iterated this check over different genres, we find that the "Book" genre satisfies our criteria.

### Conclusions
In this project, we analyzed data about the App Store and Google Play mobile apps with the goal of recommending an app profile that can be profitable for both markets.

We concluded that taking a popular book (perhaps a more recent book) and turning it into an app could be profitable for both the Google Play and the App Store markets. The markets are already full of libraries, so we need to add some special features besides the raw version of the book. This might include daily quotes from the book, an audio version of the book, quizzes on the book, a forum where people can discuss the book, etc.