# Profitable App Profiles for the App Store and Google Play Markets
------------------------------------------------------------------
## What this project is about

1. Helping our developers understand what type of apps are likely to attract more users on Google Play and the App Store.

2. Collecting and analyzing data about mobile apps available on Google Play and the App Store.


## Now, we'll open up and explore two datasets:

### As of September 2018, there were approximately 2 million iOS apps available on the App Store, and 2.1 million Android apps on Google Play.

#### Collecting data for over 4 million apps requires a significant amount of time and money, so we'll try to analyze a sample of the data instead.

- 'AppleStore.csv' contains data about approximately 7,000 iOS apps from the App Store; the data was collected in July 2017.

- 'googleplaystore.csv' contains data about approximately 10,000 Android apps from Google Play; the data was collected in August 2018.


To make opening and exploring these two data sets easier, we've created two helper functions, `open_dataset()` and `explore_data()`. The `explore_data()` function can be used repeatedly to print rows in a readable way.



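The `open_dataset()` function:
- Takes in two parameters:
    - `file_name`, the path to a CSV file, given as a string.
    - `header`, a Boolean that defaults to `True`; when it is `True`, the first row is treated as a header and left out.
- Returns the dataset as a list of lists, with or without the header row.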

In [1]:
%config IPCompleter.greedy=True

In [2]:
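from csv import reader

def open_dataset(file_name, header=True):
    """Read a CSV file and return its rows as a list of lists.

    If header is True (the default), the first row is treated as a
    header and left out of the returned data.
    """
    # 'with' closes the file automatically; utf8 handles non-ASCII app names,
    # and newline='' is what the csv module expects for file objects
    with open(file_name, encoding='utf8', newline='') as opened_file:
        file_data = list(reader(opened_file))

    if header:
        return file_data[1:]  # drop the header row
    return file_data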
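To see what the `header` flag does without the real CSV files, here is a small sketch that reads an in-memory file built with `io.StringIO`. The function `open_dataset_from_file()` is a hypothetical variant of `open_dataset()` that takes an already-open file object instead of a path, and the CSV text is made up for illustration.

```python
import io
from csv import reader


def open_dataset_from_file(file_obj, header=True):
    """Hypothetical variant of open_dataset() that reads an already-open file.

    Same rule as the notebook: when header is True, the first row is dropped.
    """
    rows = list(reader(file_obj))
    return rows[1:] if header else rows


# Made-up CSV text. The quoted "10,000+" shows why csv.reader is used:
# splitting on commas would break that value into two fields.
sample = io.StringIO(
    'App,Category,Installs\n'
    'Example App,TOOLS,"10,000+"\n'
)

without_header = open_dataset_from_file(sample, header=True)
print(without_header)  # [['Example App', 'TOOLS', '10,000+']]

sample.seek(0)  # rewind the in-memory file before reading it again
with_header = open_dataset_from_file(sample, header=False)
print(with_header[0])  # ['App', 'Category', 'Installs']
```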

<font color=blue>Inspect the first few rows of the App Store (iOS) data:</font>

In [3]:
ios_apps_data = open_dataset('AppleStore.csv')
print(ios_apps_data[0])
print('\n')
print(ios_apps_data[1])
print('\n')
print(ios_apps_data[2])
print('\n')
print(ios_apps_data[3])
print('\n')
print(ios_apps_data[4])

['284882215', 'Facebook', '389879808', 'USD', '0.0', '2974676', '212', '3.5', '3.5', '95.0', '4+', 'Social Networking', '37', '1', '29', '1']


['389801252', 'Instagram', '113954816', 'USD', '0.0', '2161558', '1289', '4.5', '4.0', '10.23', '12+', 'Photo & Video', '37', '0', '29', '1']


['529479190', 'Clash of Clans', '116476928', 'USD', '0.0', '2130805', '579', '4.5', '4.5', '9.24.12', '9+', 'Games', '38', '5', '18', '1']


['420009108', 'Temple Run', '65921024', 'USD', '0.0', '1724546', '3842', '4.5', '4.0', '1.6.2', '9+', 'Games', '40', '5', '1', '1']


['284035177', 'Pandora - Music & Radio', '130242560', 'USD', '0.0', '1126879', '3594', '4.0', '4.5', '8.4.1', '12+', 'Music', '37', '4', '1', '1']


<font color=blue>Inspect the first few rows of the Google Play (Android) data:</font>

In [4]:
android_apps_data = open_dataset('googleplaystore.csv')
print(android_apps_data[0])
print('\n')
print(android_apps_data[1])
print('\n')
print(android_apps_data[2])
print('\n')
print(android_apps_data[3])
print('\n')
print(android_apps_data[4])

['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['Coloring book moana', 'ART_AND_DESIGN', '3.9', '967', '14M', '500,000+', 'Free', '0', 'Everyone', 'Art & Design;Pretend Play', 'January 15, 2018', '2.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 and up']


['Pixel Draw - Number Art Coloring Book', 'ART_AND_DESIGN', '4.3', '967', '2.8M', '100,000+', 'Free', '0', 'Everyone', 'Art & Design;Creativity', 'June 20, 2018', '1.1', '4.4 and up']


<font color=blue>The `explore_data()` function:</font>
- Takes in four parameters:
    - `dataset`, which is expected to be a list of lists.
    - `start` and `end`, which are both expected to be integers and represent the starting and ending indices of a slice of the data set.
    - `rows_and_columns`, which is expected to be a Boolean and defaults to `False`.
- Slices the data set using `dataset[start:end]`.
- Loops through the slice and, on each iteration, prints a row followed by `print('\n')`.
    - The `\n` in `print('\n')` is a special character and isn't printed literally. It adds a new line, so `print('\n')` leaves some blank space between rows.
- Prints the number of rows and columns if `rows_and_columns` is `True`.

`dataset` shouldn't include a header row; otherwise the function will report one row more than the data actually has.

In [5]:
def explore_data(dataset, start, end, rows_and_columns=False):
    dataset_slice = dataset[start:end]    
    for row in dataset_slice:
        print(row)
        print('\n') # adds a new (empty) line after each row

    if rows_and_columns:
        print('Number of rows:', len(dataset))
        print('Number of columns:', len(dataset[0]))

In [6]:
explore_data(ios_apps_data, 0, 4, True)

['284882215', 'Facebook', '389879808', 'USD', '0.0', '2974676', '212', '3.5', '3.5', '95.0', '4+', 'Social Networking', '37', '1', '29', '1']


['389801252', 'Instagram', '113954816', 'USD', '0.0', '2161558', '1289', '4.5', '4.0', '10.23', '12+', 'Photo & Video', '37', '0', '29', '1']


['529479190', 'Clash of Clans', '116476928', 'USD', '0.0', '2130805', '579', '4.5', '4.5', '9.24.12', '9+', 'Games', '38', '5', '18', '1']


['420009108', 'Temple Run', '65921024', 'USD', '0.0', '1724546', '3842', '4.5', '4.0', '1.6.2', '9+', 'Games', '40', '5', '1', '1']


Number of rows: 7197
Number of columns: 16


In [7]:
explore_data(android_apps_data, 0, 4, True)

['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['Coloring book moana', 'ART_AND_DESIGN', '3.9', '967', '14M', '500,000+', 'Free', '0', 'Everyone', 'Art & Design;Pretend Play', 'January 15, 2018', '2.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 and up']


Number of rows: 10841
Number of columns: 13


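<font color=blue>Load the App Store data into a pandas DataFrame, this time keeping the header row (`header=False`) so the column names appear in row 0, and use them to identify the columns that could help with our analysis.</font>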

In [8]:
import pandas as pd
df_ios_apps_data = pd.DataFrame(open_dataset('AppleStore.csv', False))
df_ios_apps_data

|  | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | id | track_name | size_bytes | currency | price | rating_count_tot | rating_count_ver | user_rating | user_rating_ver | ver | cont_rating | prime_genre | sup_devices.num | ipadSc_urls.num | lang.num | vpp_lic |
| 1 | 284882215 | Facebook | 389879808 | USD | 0.0 | 2974676 | 212 | 3.5 | 3.5 | 95.0 | 4+ | Social Networking | 37 | 1 | 29 | 1 |
| 2 | 389801252 | Instagram | 113954816 | USD | 0.0 | 2161558 | 1289 | 4.5 | 4.0 | 10.23 | 12+ | Photo & Video | 37 | 0 | 29 | 1 |
| 3 | 529479190 | Clash of Clans | 116476928 | USD | 0.0 | 2130805 | 579 | 4.5 | 4.5 | 9.24.12 | 9+ | Games | 38 | 5 | 18 | 1 |
| 4 | 420009108 | Temple Run | 65921024 | USD | 0.0 | 1724546 | 3842 | 4.5 | 4.0 | 1.6.2 | 9+ | Games | 40 | 5 | 1 | 1 |
| 5 | 284035177 | Pandora - Music & Radio | 130242560 | USD | 0.0 | 1126879 | 3594 | 4.0 | 4.5 | 8.4.1 | 12+ | Music | 37 | 4 | 1 | 1 |
| 6 | 429047995 | Pinterest | 74778624 | USD | 0.0 | 1061624 | 1814 | 4.5 | 4.0 | 6.26 | 12+ | Social Networking | 37 | 5 | 27 | 1 |
| 7 | 282935706 | Bible | 92774400 | USD | 0.0 | 985920 | 5320 | 4.5 | 5.0 | 7.5.1 | 4+ | Reference | 37 | 5 | 45 | 1 |
| 8 | 553834731 | Candy Crush Saga | 222846976 | USD | 0.0 | 961794 | 2453 | 4.5 | 4.5 | 1.101.0 | 4+ | Games | 43 | 5 | 24 | 1 |
| 9 | 324684580 | Spotify Music | 132510720 | USD | 0.0 | 878563 | 8253 | 4.5 | 4.5 | 8.4.3 | 12+ | Music | 37 | 5 | 18 | 1 |


<font color=blue>Load the Google Play data into a DataFrame as well. Here `header=True` drops the header row, so the columns are labeled only by position (0 to 12).</font>

In [9]:
df_android_apps_data = pd.DataFrame(open_dataset('googleplaystore.csv', True))
df_android_apps_data

|  | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Photo Editor & Candy Camera & Grid & ScrapBook | ART_AND_DESIGN | 4.1 | 159 | 19M | 10,000+ | Free | 0 | Everyone | Art & Design | January 7, 2018 | 1.0.0 | 4.0.3 and up |
| 1 | Coloring book moana | ART_AND_DESIGN | 3.9 | 967 | 14M | 500,000+ | Free | 0 | Everyone | Art & Design;Pretend Play | January 15, 2018 | 2.0.0 | 4.0.3 and up |
| 2 | U Launcher Lite – FREE Live Cool Themes, Hide ... | ART_AND_DESIGN | 4.7 | 87510 | 8.7M | 5,000,000+ | Free | 0 | Everyone | Art & Design | August 1, 2018 | 1.2.4 | 4.0.3 and up |
| 3 | Sketch - Draw & Paint | ART_AND_DESIGN | 4.5 | 215644 | 25M | 50,000,000+ | Free | 0 | Teen | Art & Design | June 8, 2018 | Varies with device | 4.2 and up |
| 4 | Pixel Draw - Number Art Coloring Book | ART_AND_DESIGN | 4.3 | 967 | 2.8M | 100,000+ | Free | 0 | Everyone | Art & Design;Creativity | June 20, 2018 | 1.1 | 4.4 and up |
| 5 | Paper flowers instructions | ART_AND_DESIGN | 4.4 | 167 | 5.6M | 50,000+ | Free | 0 | Everyone | Art & Design | March 26, 2017 | 1.0 | 2.3 and up |
| 6 | Smoke Effect Photo Maker - Smoke Editor | ART_AND_DESIGN | 3.8 | 178 | 19M | 50,000+ | Free | 0 | Everyone | Art & Design | April 26, 2018 | 1.1 | 4.0.3 and up |
| 7 | Infinite Painter | ART_AND_DESIGN | 4.1 | 36815 | 29M | 1,000,000+ | Free | 0 | Everyone | Art & Design | June 14, 2018 | 6.1.61.1 | 4.2 and up |
| 8 | Garden Coloring Book | ART_AND_DESIGN | 4.4 | 13791 | 33M | 1,000,000+ | Free | 0 | Everyone | Art & Design | September 20, 2017 | 2.9.2 | 3.0 and up |
| 9 | Kids Paint Free - Drawing Fun | ART_AND_DESIGN | 4.7 | 121 | 3.1M | 10,000+ | Free | 0 | Everyone | Art & Design;Creativity | July 3, 2018 | 2.8 | 4.0.3 and up |


In [10]:
df_ios_apps_data.describe()

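|  | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| count | 7198 | 7198 | 7198 | 7198 | 7198.0 | 7198 | 7198 | 7198.0 | 7198.0 | 7198.0 | 7198 | 7198 | 7198 | 7198 | 7198 | 7198 |
| unique | 7198 | 7196 | 7108 | 2 | 37.0 | 3186 | 1139 | 11.0 | 11.0 | 1591.0 | 5 | 24 | 21 | 7 | 58 | 3 |
| top | 998276924 | VR Roller Coaster | 31624192 | USD | 0.0 | 0 | 0 | 4.5 | 4.5 | 1.0 | 4+ | Games | 37 | 5 | 1 | 1 |
| freq | 1 | 2 | 3 | 7197 | 4056.0 | 929 | 1443 | 2663.0 | 2205.0 | 317.0 | 4433 | 3862 | 3263 | 4503 | 3767 | 7147 |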


In [11]:
df_android_apps_data.describe()

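|  | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| count | 10841 | 10841 | 10841.0 | 10841 | 10841 | 10841 | 10841 | 10841 | 10841 | 10841 | 10841 | 10841 | 10840 |
| unique | 9660 | 34 | 41.0 | 6002 | 462 | 22 | 4 | 93 | 7 | 120 | 1378 | 2834 | 34 |
| top | ROBLOX | FAMILY |  | 0 | Varies with device | 1,000,000+ | Free | 0 | Everyone | Tools | August 3, 2018 | Varies with device | 4.1 and up |
| freq | 9 | 1972 | 1474.0 | 596 | 1695 | 1579 | 10039 | 10040 | 8714 | 842 | 326 | 1459 | 2451 |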


In [12]:
data_slice = df_android_apps_data[10470:10474]
data_slice

|  | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 10470 | Jazz Wi-Fi | COMMUNICATION | 3.4 | 49 | 4.0M | 10,000+ | Free | 0 | Everyone | Communication | February 10, 2017 | 0.1 | 2.3 and up |
| 10471 | Xposed Wi-Fi-Pwd | PERSONALIZATION | 3.5 | 1042 | 404k | 100,000+ | Free | 0 | Everyone | Personalization | August 5, 2014 | 3.0.0 | 4.0.3 and up |
| 10472 | Life Made WI-Fi Touchscreen Photo Frame | 1.9 | 19.0 | 3.0M | 1,000+ | Free | 0 | Everyone |  | February 11, 2018 | 1.0.19 | 4.0 and up |  |
| 10473 | osmino Wi-Fi: free WiFi | TOOLS | 4.2 | 134203 | 4.1M | 10,000,000+ | Free | 0 | Everyone | Tools | August 7, 2018 | 6.06.14 | 4.4 and up |


In [13]:
df_android_apps_data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10841 entries, 0 to 10840
Data columns (total 13 columns):
0     10841 non-null object
1     10841 non-null object
2     10841 non-null object
3     10841 non-null object
4     10841 non-null object
5     10841 non-null object
6     10841 non-null object
7     10841 non-null object
8     10841 non-null object
9     10841 non-null object
10    10841 non-null object
11    10841 non-null object
12    10840 non-null object
dtypes: object(13)
memory usage: 1.1+ MB


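<font color=blue>Row 10472 ('Life Made WI-Fi Touchscreen Photo Frame') is malformed: its category value is missing, so the rest of the row is shifted one column to the left (the category column holds the rating, `1.9`) and the last column is empty. That is also why the `info()` output shows only 10,840 non-null values in column 12. We drop this row from the DataFrame:</font>

In [14]:
# Remove the malformed row by its index label
df_android_apps_data.drop(index=10472, inplace=True)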

In [15]:
data_slice = df_android_apps_data[10470:10474]
data_slice

|  | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 10470 | Jazz Wi-Fi | COMMUNICATION | 3.4 | 49 | 4.0M | 10,000+ | Free | 0 | Everyone | Communication | February 10, 2017 | 0.1 | 2.3 and up |
| 10471 | Xposed Wi-Fi-Pwd | PERSONALIZATION | 3.5 | 1042 | 404k | 100,000+ | Free | 0 | Everyone | Personalization | August 5, 2014 | 3.0.0 | 4.0.3 and up |
| 10473 | osmino Wi-Fi: free WiFi | TOOLS | 4.2 | 134203 | 4.1M | 10,000,000+ | Free | 0 | Everyone | Tools | August 7, 2018 | 6.06.14 | 4.4 and up |
| 10474 | Sat-Fi Voice | COMMUNICATION | 3.4 | 37 | 14M | 1,000+ | Free | 0 | Everyone | Communication | November 21, 2014 | 2.2.1.5 | 2.2 and up |


<font color=blue>Check the data sets for duplicate app records.
For example, the Google Play data contains four entries for 'Instagram':</font>

In [16]:
for app in android_apps_data:
    name = app[0]
    if name == 'Instagram':
        print(app)

['Instagram', 'SOCIAL', '4.5', '66577313', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']
['Instagram', 'SOCIAL', '4.5', '66577446', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']
['Instagram', 'SOCIAL', '4.5', '66577313', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']
['Instagram', 'SOCIAL', '4.5', '66509917', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']


<font color=blue>Now, let's find all the duplicate app rows in `android_apps_data`:</font>

In [17]:
duplicate_apps = []
unique_apps = []

for app in android_apps_data:
    name = app[0]
    if name in unique_apps:
        duplicate_apps.append(name)
    else:
        unique_apps.append(name)
        
print('No. of duplicate apps in Android App data set: ', len(duplicate_apps))
print('\n')
print('Examples of duplicate apps in Android App data set: ', duplicate_apps[:10])


No. of duplicate apps in Android App data set:  1181


Examples of duplicate apps in Android App data set:  ['Quick PDF Scanner + OCR FREE', 'Box', 'Google My Business', 'ZOOM Cloud Meetings', 'join.me - Simple Meetings', 'Box', 'Zenefits', 'Google Ads', 'Google My Business', 'Slack']
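A more compact way to count duplicates is `collections.Counter`, which tallies how many times each name occurs in a single pass (the loop above checks membership in a list, which gets slower as the list grows). The helper `count_duplicates()` below is hypothetical and the toy rows are made up; with the real data you would pass `name_index=0` for `android_apps_data` and `name_index=1` for `ios_apps_data`, whose name column (`track_name`) is at index 1.

```python
from collections import Counter


def count_duplicates(rows, name_index):
    """Return (number of duplicate rows, {name: occurrences}) for repeated names.

    A row counts as a duplicate if its name already appeared in an earlier row,
    which matches how the loops in this notebook count duplicates.
    """
    counts = Counter(row[name_index] for row in rows)
    repeated = {name: n for name, n in counts.items() if n > 1}
    n_duplicate_rows = sum(n - 1 for n in repeated.values())
    return n_duplicate_rows, repeated


# Made-up toy rows: [name, reviews]
toy_rows = [['Instagram', '66577313'], ['Box', '1'], ['Instagram', '66577446'], ['Instagram', '66509917']]
n_dupes, repeated = count_duplicates(toy_rows, name_index=0)
print(n_dupes)   # 2
print(repeated)  # {'Instagram': 3}
```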


<font color=blue>Let's run the same check on `ios_apps_data`:</font>

In [18]:
duplicate_apps = []
unique_apps = []

for app in ios_apps_data:
    name = app[0]  # in the App Store data, index 0 is the app id; the name (track_name) is at index 1
    if name in unique_apps:
        duplicate_apps.append(name)
    else:
        unique_apps.append(name)
        
print('No. of duplicate apps in iOS App data set: ', len(duplicate_apps))
print('\n')
print('Examples of duplicate apps in iOS App data set: ', duplicate_apps[:10])


No. of duplicate apps in iOS App data set:  0


Examples of duplicate apps in iOS App data set:  []


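<font color=blue>No `id` appears twice, so the iOS data set has no duplicate records by `id`. Note, however, that the loop above reads `app[0]`, which in the App Store data is the `id` column rather than the app name (`track_name`, index 1). The `describe()` output above shows 7,196 unique `track_name` values across 7,198 rows (header row included), so two iOS rows repeat a name that already appears in the data; 'VR Roller Coaster', for example, appears twice. We leave those rows in place.
Next, let's clean the Android data set by removing its duplicates. For each app we keep the row with the highest number of reviews (column index 3): since the review count generally grows over time, that row is most likely the most recent and most meaningful record of the app.</font>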

In [19]:
data_slice = df_android_apps_data[10460:10484]
data_slice

|  | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 10460 | Wi-Fi Rabbit Unlock Key | TOOLS | 4.5 | 142 | 26k | 5,000+ | Paid | $1.00 | Everyone | Tools | June 26, 2011 | 1.0.0 | 2.1 and up |
| 10461 | Wi-Fi settings shortcut | TOOLS | 4.1 | 479 | 29k | 50,000+ | Free | 0 | Everyone | Tools | December 13, 2013 | 1.4 | 1.6 and up |
| 10462 | Wi-Fi Master | TOOLS | 4.5 | 3673 | 5.8M | 500,000+ | Free | 0 | Everyone | Tools | July 24, 2018 | 1.2.1 | 4.1 and up |
| 10463 | Micro Fi | FAMILY | 3.7 | 39 | 2.8M | 5,000+ | Free | 0 | Everyone | Education | February 12, 2014 | 2.4 | 2.3.3 and up |
| 10464 | fi | FAMILY | 2.9 | 691 | 5.8M | 100,000+ | Free | 0 | Everyone | Strategy | June 8, 2016 | 1.0.3 | 4.0 and up |
| 10465 | Portable Wi-Fi hotspot Free | COMMUNICATION | 4.0 | 1711 | 2.1M | 100,000+ | Free | 0 | Everyone | Communication | January 11, 2018 | 1.2.5.4-11 | 2.3 and up |
| 10466 | Wi-Fi Networks | PRODUCTIVITY | 4.2 | 115 | 1.4M | 10,000+ | Free | 0 | Everyone | Productivity | October 5, 2015 | 1.4 | 4.1 and up |
| 10467 | FI CFL | FINANCE | 3.7 | 112 | 3.9M | 10,000+ | Free | 0 | Everyone | Finance | July 5, 2018 | 1.1.1 | 5.0 and up |
| 10468 | Tassa.fi Finland | LIFESTYLE | 3.6 | 346 | 7.5M | 50,000+ | Free | 0 | Everyone | Lifestyle | May 22, 2018 | 5.5 | 4.0 and up |
| 10469 | TownWiFi \| Wi-Fi Everywhere | COMMUNICATION | 3.9 | 2372 | 58M | 500,000+ | Free | 0 | Everyone | Communication | August 2, 2018 | 4.2.1 | 4.2 and up |


In [20]:
print(android_apps_data[10472])

['Life Made WI-Fi Touchscreen Photo Frame', '1.9', '19', '3.0M', '1,000+', 'Free', '0', 'Everyone', '', 'February 11, 2018', '1.0.19', '4.0 and up']
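Row 10472 was spotted by inspection. A more systematic sketch, assuming every well-formed row should have the same number of fields as the header (13 for the Google Play data, per the `info()` output above), flags every row whose length differs. The function name and toy rows are hypothetical; called as `find_malformed_rows(android_apps_data, 13)` it is expected to flag index 10472, but it has not been run against the real file here.

```python
def find_malformed_rows(rows, expected_columns):
    """Return the indices of rows whose number of fields differs from expected_columns."""
    return [i for i, row in enumerate(rows) if len(row) != expected_columns]


# Made-up, shortened rows; the last one is missing its category field,
# like row 10472 in the Google Play data.
toy_rows = [
    ['Jazz Wi-Fi', 'COMMUNICATION', '3.4'],
    ['Xposed Wi-Fi-Pwd', 'PERSONALIZATION', '3.5'],
    ['Life Made WI-Fi Touchscreen Photo Frame', '1.9'],
]
print(find_malformed_rows(toy_rows, expected_columns=3))  # [2]
```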


In [21]:
del android_apps_data[10472]  # remove the malformed row; re-running this cell would delete a different row

In [22]:
# Let's create a dictionary to store the 'keeper' rows.
highest_num_of_reviews = {}

for app in android_apps_data:
    name = app[0]
    num_of_reviews = float(app[3])
    
    if name in highest_num_of_reviews and highest_num_of_reviews[name] < num_of_reviews:
        highest_num_of_reviews[name] = num_of_reviews
    elif name not in highest_num_of_reviews:
        highest_num_of_reviews[name] = num_of_reviews

In [23]:
print('Length expected: ', len(android_apps_data) - 1181)  # 1181 = duplicate rows counted in In [17]
print('Length reported: ', len(highest_num_of_reviews))


Length expected:  9659
Length reported:  9659


In [24]:
android_clean = []
already_been_cleaned = []

for app in android_apps_data:
    name = app[0]
    num_of_reviews = float(app[3])
    
    if (highest_num_of_reviews[name] == num_of_reviews) and (name not in already_been_cleaned):
        android_clean.append(app)
        already_been_cleaned.append(name)

In [25]:
explore_data(android_clean, 0, 3, True)

['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 and up']


Number of rows: 9659
Number of columns: 13
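The two passes above (first record the highest review count per name, then keep the first row that matches it) can also be folded into a single pass that stores the best row seen so far for each name. This is a sketch with a hypothetical function name and made-up rows; like the notebook's version it keeps the first row when two rows tie, but the order of the returned rows may differ.

```python
def dedupe_keep_most_reviews(rows, name_index=0, reviews_index=3):
    """Keep one row per app name: the row with the highest review count.

    On a tie, the first row seen is kept (a later row only replaces it
    if its review count is strictly higher).
    """
    best = {}  # app name -> row with the most reviews seen so far
    for row in rows:
        name = row[name_index]
        reviews = float(row[reviews_index])
        if name not in best or reviews > float(best[name][reviews_index]):
            best[name] = row
    return list(best.values())


# Made-up rows in the Google Play layout (name at index 0, reviews at index 3)
toy_rows = [
    ['Instagram', 'SOCIAL', '4.5', '66577313'],
    ['Box', 'BUSINESS', '4.2', '159872'],
    ['Instagram', 'SOCIAL', '4.5', '66577446'],
    ['Instagram', 'SOCIAL', '4.5', '66509917'],
]

deduped = dedupe_keep_most_reviews(toy_rows)
for row in deduped:
    print(row)
# ['Instagram', 'SOCIAL', '4.5', '66577446']
# ['Box', 'BUSINESS', '4.2', '159872']
```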


<font color=blue>For this analysis, we are only interested in apps aimed at an English-speaking audience, so we will remove non-English apps from the data sets. Here are some examples of rows we will discard:</font>

In [26]:
print(ios_apps_data[813][1])

爱奇艺PPS -《欢乐颂2》电视剧热播


In [27]:
print(android_clean[4412][0])

中国語 AQリスニング


<font color=blue>Let's create a function that tells us whether a string is likely to be English. If an app name contains a character whose code point (as returned by `ord()`) is greater than 127, i.e. outside the ASCII range, the app probably has a non-English name.</font>

In [28]:
def is_english(string):
    for character in string:
        if ord(character) > 127:
            return False
    return True

print(is_english('Facebook'))
print(is_english('爱奇艺PPS -《欢乐颂2》电视剧热播'))

True
False


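<font color=blue>Emoji and symbols such as ™ are also outside the ASCII range, so the strict check above would discard English apps like 'Docs To Go™ Free Office Suite'. To reduce the chance of discarding good data, we will only discard an app if its name contains more than three non-ASCII characters.</font>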

In [29]:
def is_english(string):
    non_ascii = 0
    
    for character in string:
        if ord(character) > 127:
            non_ascii += 1
            
    if non_ascii > 3:
        return False
    else:
        return True

print(is_english('Docs To Go™ Free Office Suite'))
print(is_english('Instachat 😜'))

True
True
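The counting rule above can be written more compactly with a generator expression, and the threshold can be made a parameter. `is_mostly_ascii()` is a hypothetical name for this sketch; with the default `max_non_ascii=3` it applies the same rule as the second `is_english()`, and with `max_non_ascii=0` it matches the strict first version.

```python
def is_mostly_ascii(string, max_non_ascii=3):
    """Return True if `string` has at most `max_non_ascii` non-ASCII characters.

    A character counts as non-ASCII when its code point (ord) is above 127.
    """
    non_ascii = sum(1 for character in string if ord(character) > 127)
    return non_ascii <= max_non_ascii


print(is_mostly_ascii('Docs To Go™ Free Office Suite'))                   # True  (1 non-ASCII char)
print(is_mostly_ascii('Docs To Go™ Free Office Suite', max_non_ascii=0))  # False (strict rule)
print(is_mostly_ascii('爱奇艺PPS -《欢乐颂2》电视剧热播'))                 # False
```

Keep in mind that this is a heuristic: a short non-English name can still pass. For example, '教えて!goo' has exactly three non-ASCII characters, so it is kept, and it does show up later among the App Store Reference apps.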


<font color=blue>Now, use the `is_english()` function to filter out the non-English apps from both data sets:</font>

In [30]:
android_is_english = []
ios_is_english = []

for app in android_clean:
    name = app[0]
    if is_english(name):
        android_is_english.append(app)
        
for app in ios_apps_data:
    name = app[1]
    if is_english(name):
        ios_is_english.append(app)
        
explore_data(android_is_english, 0, 3, True)
print('\n')
explore_data(ios_is_english, 0, 3, True)

['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 and up']


Number of rows: 9614
Number of columns: 13


['284882215', 'Facebook', '389879808', 'USD', '0.0', '2974676', '212', '3.5', '3.5', '95.0', '4+', 'Social Networking', '37', '1', '29', '1']


['389801252', 'Instagram', '113954816', 'USD', '0.0', '2161558', '1289', '4.5', '4.0', '10.23', '12+', 'Photo & Video', '37', '0', '29', '1']


['529479190', 'Clash of Clans', '116476928', 'USD', '0.0', '2130805', '579', '4.5', '4.5', '9.24.12', '9+', 

<font color=blue>Finally, let's keep only the free English apps. The price is at index 7 in the Google Play data and at index 4 in the App Store data:</font>

In [31]:
android_free_english = []
ios_free_english = []

for app in android_is_english:
    price = app[7]
    if price == '0':
        android_free_english.append(app)
        
for app in ios_is_english:
    price = app[4]
    if price == '0.0':
        ios_free_english.append(app)
        
print(len(android_free_english))
print(len(ios_free_english))

8864
3222
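The cell above compares price strings exactly ('0' for Google Play, '0.0' for the App Store). Below is a sketch of an alternative that converts the price to a number, so that formats such as '0.00' or '$0.00' would also count as free; the helper names are hypothetical. The trade-off is that `float()` raises `ValueError` on a non-numeric value (such as 'Everyone', which sat in the price position of the malformed row removed earlier), while the exact string comparison simply skips it.

```python
def is_free(price):
    """Return True if a price string such as '0', '0.0' or '$0.00' is zero."""
    return float(price.replace('$', '')) == 0.0


def keep_free(rows, price_index):
    """Return only the rows whose price at price_index is zero."""
    return [row for row in rows if is_free(row[price_index])]


# Made-up rows: [name, price]
toy_rows = [['A', '0'], ['B', '$2.99'], ['C', '0.0'], ['D', '$0.00']]
print([row[0] for row in keep_free(toy_rows, 1)])  # ['A', 'C', 'D']
```

With the real data this would be called as `keep_free(android_is_english, 7)` and `keep_free(ios_is_english, 4)`; it is intended to select the same rows as the loops above, but that has not been verified here.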


## <font color=blue>Most Common Apps by Genre</font>

In [32]:
def freq_table(dataset, index):
    table = {}
    total = 0
    
    for row in dataset:
        total += 1
        value = row[index]
        if value in table:
            table[value] += 1
        else:
            table[value] = 1
    
    table_percentages = {}
    for key in table:
        percentage = (table[key] / total) * 100
        table_percentages[key] = percentage 
    
    return table_percentages


def display_table(dataset, index):
    table = freq_table(dataset, index)
    table_display = []
    for key in table:
        key_val_as_tuple = (table[key], key)
        table_display.append(key_val_as_tuple)
        
    table_sorted = sorted(table_display, reverse = True)
    for entry in table_sorted:
        print(entry[1], ':', entry[0])

In [33]:
display_table(ios_free_english, -5)

Games : 58.16263190564867
Entertainment : 7.883302296710118
Photo & Video : 4.9658597144630665
Education : 3.662321539416512
Social Networking : 3.2898820608317814
Shopping : 2.60707635009311
Utilities : 2.5139664804469275
Sports : 2.1415270018621975
Music : 2.0484171322160147
Health & Fitness : 2.0173805090006205
Productivity : 1.7380509000620732
Lifestyle : 1.5828677839851024
News : 1.3345747982619491
Travel : 1.2414649286157666
Finance : 1.1173184357541899
Weather : 0.8690254500310366
Food & Drink : 0.8069522036002483
Reference : 0.5586592178770949
Business : 0.5276225946617008
Book : 0.4345127250155183
Navigation : 0.186219739292365
Medical : 0.186219739292365
Catalogs : 0.12414649286157665


In [34]:
display_table(android_free_english, 1)

FAMILY : 18.907942238267147
GAME : 9.724729241877256
TOOLS : 8.461191335740072
BUSINESS : 4.591606498194946
LIFESTYLE : 3.9034296028880866
PRODUCTIVITY : 3.892148014440433
FINANCE : 3.7003610108303246
MEDICAL : 3.531137184115524
SPORTS : 3.395758122743682
PERSONALIZATION : 3.3167870036101084
COMMUNICATION : 3.2378158844765346
HEALTH_AND_FITNESS : 3.0798736462093865
PHOTOGRAPHY : 2.944494584837545
NEWS_AND_MAGAZINES : 2.7978339350180503
SOCIAL : 2.6624548736462095
TRAVEL_AND_LOCAL : 2.33528880866426
SHOPPING : 2.2450361010830324
BOOKS_AND_REFERENCE : 2.1435018050541514
DATING : 1.861462093862816
VIDEO_PLAYERS : 1.7937725631768955
MAPS_AND_NAVIGATION : 1.3989169675090252
FOOD_AND_DRINK : 1.2409747292418771
EDUCATION : 1.1620036101083033
ENTERTAINMENT : 0.9589350180505415
LIBRARIES_AND_DEMO : 0.9363718411552346
AUTO_AND_VEHICLES : 0.9250902527075812
HOUSE_AND_HOME : 0.8235559566787004
WEATHER : 0.8009927797833934
EVENTS : 0.7107400722021661
PARENTING : 0.6543321299638989
ART_AND_DESIGN : 

In [35]:
display_table(android_free_english, -4)

Tools : 8.449909747292418
Entertainment : 6.069494584837545
Education : 5.347472924187725
Business : 4.591606498194946
Productivity : 3.892148014440433
Lifestyle : 3.892148014440433
Finance : 3.7003610108303246
Medical : 3.531137184115524
Sports : 3.463447653429603
Personalization : 3.3167870036101084
Communication : 3.2378158844765346
Action : 3.1024368231046933
Health & Fitness : 3.0798736462093865
Photography : 2.944494584837545
News & Magazines : 2.7978339350180503
Social : 2.6624548736462095
Travel & Local : 2.3240072202166067
Shopping : 2.2450361010830324
Books & Reference : 2.1435018050541514
Simulation : 2.0419675090252705
Dating : 1.861462093862816
Arcade : 1.8501805054151623
Video Players & Editors : 1.7712093862815883
Casual : 1.7599277978339352
Maps & Navigation : 1.3989169675090252
Food & Drink : 1.2409747292418771
Puzzle : 1.128158844765343
Racing : 0.9927797833935018
Role Playing : 0.9363718411552346
Libraries & Demo : 0.9363718411552346
Auto & Vehicles : 0.9250902527075

#### <font color=blue>Calculate the average number of user ratings per app genre on the App Store:</font>

In [36]:
genres_ios = freq_table(ios_free_english, -5)

for genre in genres_ios:
    total = 0
    len_genre = 0
    for app in ios_free_english:
        genre_app = app[-5]
        if genre_app == genre:            
            n_ratings = float(app[5])
            total += n_ratings
            len_genre += 1
    avg_n_ratings = total / len_genre
    print(genre, ':', avg_n_ratings)

Reference : 74942.11111111111
Photo & Video : 28441.54375
Health & Fitness : 23298.015384615384
Games : 22788.6696905016
Navigation : 86090.33333333333
Music : 57326.530303030304
Weather : 52279.892857142855
Book : 39758.5
Productivity : 21028.410714285714
Sports : 23008.898550724636
Education : 7003.983050847458
Catalogs : 4004.0
Shopping : 26919.690476190477
Food & Drink : 33333.92307692308
Social Networking : 71548.34905660378
Utilities : 18684.456790123455
Business : 7491.117647058823
Finance : 31467.944444444445
Lifestyle : 16485.764705882353
Travel : 28243.8
Entertainment : 14029.830708661417
Medical : 612.0
News : 21248.023255813954
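The cell above loops over every app once per genre. A single-pass sketch instead keeps a running total and count for each genre in `defaultdict`s and divides at the end. `average_by_group()` is a hypothetical helper; calling `average_by_group(ios_free_english, -5, 5)` should give the same averages as the cell above, though the genres may be listed in a different order.

```python
from collections import defaultdict


def average_by_group(rows, group_index, value_index):
    """Average of float(row[value_index]) for each distinct row[group_index], in one pass."""
    totals = defaultdict(float)  # group -> running sum
    counts = defaultdict(int)    # group -> number of rows
    for row in rows:
        group = row[group_index]
        totals[group] += float(row[value_index])
        counts[group] += 1
    return {group: totals[group] / counts[group] for group in totals}


# Made-up rows: [genre, number of ratings]
toy_rows = [['Games', '100'], ['Music', '10'], ['Games', '300']]
print(average_by_group(toy_rows, 0, 1))  # {'Games': 200.0, 'Music': 10.0}
```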


In [37]:
for app in ios_free_english:
    if app[-5] == 'Navigation':
        print(app[1], ':', app[5]) # print name and number of ratings

Waze - GPS Navigation, Maps & Real-time Traffic : 345046
Google Maps - Navigation & Transit : 154911
Geocaching® : 12811
CoPilot GPS – Car Navigation & Offline Maps : 3582
ImmobilienScout24: Real Estate Search in Germany : 187
Railway Route Search : 5


In [38]:
for app in ios_free_english:
    if app[-5] == 'Reference':
        print(app[1], ':', app[5])

Bible : 985920
Dictionary.com Dictionary & Thesaurus : 200047
Dictionary.com Dictionary & Thesaurus for iPad : 54175
Google Translate : 26786
Muslim Pro: Ramadan 2017 Prayer Times, Azan, Quran : 18418
New Furniture Mods - Pocket Wiki & Game Tools for Minecraft PC Edition : 17588
Merriam-Webster Dictionary : 16849
Night Sky : 12122
City Maps for Minecraft PE - The Best Maps for Minecraft Pocket Edition (MCPE) : 8535
LUCKY BLOCK MOD ™ for Minecraft PC Edition - The Best Pocket Wiki & Mods Installer Tools : 4693
GUNS MODS for Minecraft PC Edition - Mods Tools : 1497
Guides for Pokémon GO - Pokemon GO News and Cheats : 826
WWDC : 762
Horror Maps for Minecraft PE - Download The Scariest Maps for Minecraft Pocket Edition (MCPE) Free : 718
VPN Express : 14
Real Bike Traffic Rider Virtual Reality Glasses : 8
教えて!goo : 0
Jishokun-Japanese English Dictionary & Translator : 0


#### <font color=blue>Most Popular Apps by Genre on Google Play</font>

In [39]:
display_table(android_free_english, 5)

1,000,000+ : 15.726534296028879
100,000+ : 11.552346570397113
10,000,000+ : 10.548285198555957
10,000+ : 10.198555956678701
1,000+ : 8.393501805054152
100+ : 6.915613718411552
5,000,000+ : 6.825361010830325
500,000+ : 5.561823104693141
50,000+ : 4.7721119133574
5,000+ : 4.512635379061372
10+ : 3.5424187725631766
500+ : 3.2490974729241873
50,000,000+ : 2.3014440433213
100,000,000+ : 2.1322202166064983
50+ : 1.917870036101083
5+ : 0.78971119133574
1+ : 0.5076714801444043
500,000,000+ : 0.2707581227436823
1,000,000,000+ : 0.22563176895306858
0+ : 0.04512635379061372
0 : 0.01128158844765343


In [40]:
categories_android = freq_table(android_free_english, 1)

for category in categories_android:
    total = 0
    len_category = 0
    for app in android_free_english:
        category_app = app[1]
        if category_app == category:            
            n_installs = app[5]
            n_installs = n_installs.replace(',', '')
            n_installs = n_installs.replace('+', '')
            total += float(n_installs)
            len_category += 1
    avg_n_installs = total / len_category
    print(category, ':', avg_n_installs)

LIBRARIES_AND_DEMO : 638503.734939759
BEAUTY : 513151.88679245283
BOOKS_AND_REFERENCE : 8767811.894736841
NEWS_AND_MAGAZINES : 9549178.467741935
TOOLS : 10801391.298666667
LIFESTYLE : 1437816.2687861272
ART_AND_DESIGN : 1986335.0877192982
FINANCE : 1387692.475609756
PHOTOGRAPHY : 17840110.40229885
PRODUCTIVITY : 16787331.344927534
EDUCATION : 1833495.145631068
VIDEO_PLAYERS : 24727872.452830188
SOCIAL : 23253652.127118643
DATING : 854028.8303030303
TRAVEL_AND_LOCAL : 13984077.710144928
COMICS : 817657.2727272727
SPORTS : 3638640.1428571427
HEALTH_AND_FITNESS : 4188821.9853479853
FAMILY : 3695641.8198090694
MEDICAL : 120550.61980830671
ENTERTAINMENT : 11640705.88235294
GAME : 15588015.603248259
AUTO_AND_VEHICLES : 647317.8170731707
FOOD_AND_DRINK : 1924897.7363636363
EVENTS : 253542.22222222222
SHOPPING : 7036877.311557789
HOUSE_AND_HOME : 1331540.5616438356
WEATHER : 5074486.197183099
PERSONALIZATION : 5201482.6122448975
BUSINESS : 1712290.1474201474
MAPS_AND_NAVIGATION : 4056941.77419
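Install counts on Google Play are open-ended buckets such as '1,000,000+', so the cell above strips the commas and the plus sign before converting to a number. Wrapping that step in a small hypothetical helper makes the assumption explicit: each bucket is treated as its lower bound, so the resulting averages are rough and tend to understate the true figures.

```python
def installs_to_number(installs):
    """Convert a Google Play install bucket such as '1,000,000+' to a float.

    The result is the bucket's lower bound: '1,000,000+' means "at least
    one million installs", so it understates apps above that threshold.
    """
    return float(installs.replace(',', '').replace('+', ''))


for bucket in ['0', '10+', '1,000,000+', '1,000,000,000+']:
    print(bucket, '->', installs_to_number(bucket))
# 0 -> 0.0
# 10+ -> 10.0
# 1,000,000+ -> 1000000.0
# 1,000,000,000+ -> 1000000000.0
```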

In [41]:
for app in android_free_english:
    if app[1] == 'COMMUNICATION' and (app[5] == '1,000,000,000+'
                                      or app[5] == '500,000,000+'
                                      or app[5] == '100,000,000+'):
        print(app[0], ':', app[5])

WhatsApp Messenger : 1,000,000,000+
imo beta free calls and text : 100,000,000+
Android Messages : 100,000,000+
Google Duo - High Quality Video Calls : 500,000,000+
Messenger – Text and Video Chat for Free : 1,000,000,000+
imo free video calls and chat : 500,000,000+
Skype - free IM & video calls : 1,000,000,000+
Who : 100,000,000+
GO SMS Pro - Messenger, Free Themes, Emoji : 100,000,000+
LINE: Free Calls & Messages : 500,000,000+
Google Chrome: Fast & Secure : 1,000,000,000+
Firefox Browser fast & private : 100,000,000+
UC Browser - Fast Download Private & Secure : 500,000,000+
Gmail : 1,000,000,000+
Hangouts : 1,000,000,000+
Messenger Lite: Free Calls & Messages : 100,000,000+
Kik : 100,000,000+
KakaoTalk: Free Calls & Text : 100,000,000+
Opera Mini - fast web browser : 100,000,000+
Opera Browser: Fast and Secure : 100,000,000+
Telegram : 100,000,000+
Truecaller: Caller ID, SMS spam blocking & Dialer : 100,000,000+
UC Browser Mini -Tiny Fast Private & Secure : 100,000,000+
Viber Mess

<font color=blue>If we exclude the communication apps with 100 million or more installs, the average drops roughly tenfold:</font>

In [42]:
under_100_m = []

for app in android_free_english:
    n_installs = app[5]
    n_installs = n_installs.replace(',', '')
    n_installs = n_installs.replace('+', '')
    if (app[1] == 'COMMUNICATION') and (float(n_installs) < 100000000):
        under_100_m.append(float(n_installs))
        
sum(under_100_m) / len(under_100_m)

3603485.3884615386
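The comparison above can be wrapped in a hypothetical helper that computes one category's average installs, optionally ignoring apps at or above a cutoff. The column positions default to the Google Play layout used in this notebook (category at index 1, installs at index 5); the toy rows in the example are made up and use a shorter layout. `category_average(android_free_english, 'COMMUNICATION', cutoff=100_000_000)` runs the same computation as the cell above.

```python
def category_average(rows, category, cutoff=None, category_index=1, installs_index=5):
    """Average installs for one category, optionally ignoring apps at or above `cutoff`.

    Returns None if no app in the category passes the filter.
    """
    values = []
    for row in rows:
        if row[category_index] != category:
            continue
        # '1,000,000+' -> 1000000.0 (lower bound of the install bucket)
        n_installs = float(row[installs_index].replace(',', '').replace('+', ''))
        if cutoff is None or n_installs < cutoff:
            values.append(n_installs)
    return sum(values) / len(values) if values else None


# Made-up rows in a short layout: [name, category, installs]
toy_rows = [
    ['A', 'COMMUNICATION', '1,000,000,000+'],
    ['B', 'COMMUNICATION', '1,000+'],
    ['C', 'COMMUNICATION', '5,000+'],
    ['D', 'TOOLS', '10+'],
]
print(category_average(toy_rows, 'COMMUNICATION', installs_index=2))
print(category_average(toy_rows, 'COMMUNICATION', cutoff=100_000_000, installs_index=2))  # 3000.0
```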

<font color=blue>Now let's list the apps in the `BOOKS_AND_REFERENCE` category together with their number of installs:</font>

In [43]:
for app in android_free_english:
    if app[1] == 'BOOKS_AND_REFERENCE':
        print(app[0], ':', app[5])

E-Book Read - Read Book for free : 50,000+
Download free book with green book : 100,000+
Wikipedia : 10,000,000+
Cool Reader : 10,000,000+
Free Panda Radio Music : 100,000+
Book store : 1,000,000+
FBReader: Favorite Book Reader : 10,000,000+
English Grammar Complete Handbook : 500,000+
Free Books - Spirit Fanfiction and Stories : 1,000,000+
Google Play Books : 1,000,000,000+
AlReader -any text book reader : 5,000,000+
Offline English Dictionary : 100,000+
Offline: English to Tagalog Dictionary : 500,000+
FamilySearch Tree : 1,000,000+
Cloud of Books : 1,000,000+
Recipes of Prophetic Medicine for free : 500,000+
ReadEra – free ebook reader : 1,000,000+
Anonymous caller detection : 10,000+
Ebook Reader : 5,000,000+
Litnet - E-books : 100,000+
Read books online : 5,000,000+
English to Urdu Dictionary : 500,000+
eBoox: book reader fb2 epub zip : 1,000,000+
English Persian Dictionary : 500,000+
Flybook : 500,000+
All Maths Formulas : 1,000,000+
Ancestry : 5,000,000+
HTC Help : 10,000,000+

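Scrolling through a long printed list makes it hard to see how installs are distributed. A frequency table of install buckets summarizes the same information. This is a minimal sketch using `collections.Counter` from the standard library; the function name `install_bucket_table` and the sample rows are hypothetical, and the column layout (genre at index 1, installs at index 5) is assumed from the cells above.

```python
from collections import Counter


def install_bucket_table(apps, genre):
    """Count apps per install bucket for one genre, largest bucket first."""
    # app[1] is the genre and app[5] the install bucket (assumed column layout)
    counts = Counter(app[5] for app in apps if app[1] == genre)
    # Sort numerically: string comparison would treat '5,000,000+' as larger than '10,000,000+'
    return sorted(
        counts.items(),
        key=lambda item: float(item[0].replace(',', '').replace('+', '')),
        reverse=True,
    )


# Made-up rows for illustration
sample = [
    ['Reader X', 'BOOKS_AND_REFERENCE', '', '', '', '1,000,000+'],
    ['Reader Y', 'BOOKS_AND_REFERENCE', '', '', '', '10,000,000+'],
    ['Reader Z', 'BOOKS_AND_REFERENCE', '', '', '', '1,000,000+'],
    ['Chat App', 'COMMUNICATION', '', '', '', '100,000,000+'],
]
for bucket, count in install_bucket_table(sample, 'BOOKS_AND_REFERENCE'):
    print(bucket, ':', count)
```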
In [44]:
for app in android_free_english:
    if app[1] == 'BOOKS_AND_REFERENCE' and (app[5] == '1,000,000,000+'
                                            or app[5] == '500,000,000+'
                                            or app[5] == '100,000,000+'):
        print(app[0], ':', app[5])

Google Play Books : 1,000,000,000+
Bible : 100,000,000+
Amazon Kindle : 100,000,000+
Wattpad 📖 Free Books : 100,000,000+
Audiobooks from Audible : 100,000,000+
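The chained `or` comparisons in the two cells around this output can also be written as a set-membership test, which keeps the bucket lists in one place. A minimal sketch: the names `POPULAR`, `MID_RANGE` and `apps_in_buckets` are hypothetical, the bucket strings are copied from the comparisons in the notebook, and the sample rows are made up.

```python
# Hypothetical bucket sets, copied from the string comparisons in the notebook cells
POPULAR = {'1,000,000,000+', '500,000,000+', '100,000,000+'}
MID_RANGE = {'1,000,000+', '5,000,000+', '10,000,000+', '50,000,000+'}


def apps_in_buckets(apps, genre, buckets):
    """Return (name, installs) pairs for one genre whose install bucket is in `buckets`."""
    # app[0] is the name, app[1] the genre and app[5] the install bucket (assumed layout)
    return [(app[0], app[5]) for app in apps
            if app[1] == genre and app[5] in buckets]


# Made-up rows for illustration
sample = [
    ['Big Reader', 'BOOKS_AND_REFERENCE', '', '', '', '100,000,000+'],
    ['Small Reader', 'BOOKS_AND_REFERENCE', '', '', '', '5,000,000+'],
    ['Tiny Reader', 'BOOKS_AND_REFERENCE', '', '', '', '10,000+'],
]
for name, installs in apps_in_buckets(sample, 'BOOKS_AND_REFERENCE', POPULAR):
    print(name, ':', installs)
```

A set lookup with `in` reads more clearly than a chain of `==` checks and makes it easy to reuse the same bucket groups for other genres.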


In [45]:
for app in android_free_english:
    if app[1] == 'BOOKS_AND_REFERENCE' and (app[5] == '1,000,000+'
                                            or app[5] == '5,000,000+'
                                            or app[5] == '10,000,000+'
                                            or app[5] == '50,000,000+'):
        print(app[0], ':', app[5])

Wikipedia : 10,000,000+
Cool Reader : 10,000,000+
Book store : 1,000,000+
FBReader: Favorite Book Reader : 10,000,000+
Free Books - Spirit Fanfiction and Stories : 1,000,000+
AlReader -any text book reader : 5,000,000+
FamilySearch Tree : 1,000,000+
Cloud of Books : 1,000,000+
ReadEra – free ebook reader : 1,000,000+
Ebook Reader : 5,000,000+
Read books online : 5,000,000+
eBoox: book reader fb2 epub zip : 1,000,000+
All Maths Formulas : 1,000,000+
Ancestry : 5,000,000+
HTC Help : 10,000,000+
Moon+ Reader : 10,000,000+
English-Myanmar Dictionary : 1,000,000+
Golden Dictionary (EN-AR) : 1,000,000+
All Language Translator Free : 1,000,000+
Aldiko Book Reader : 10,000,000+
Dictionary - WordWeb : 5,000,000+
50000 Free eBooks & Free AudioBooks : 5,000,000+
Al-Quran (Free) : 10,000,000+
Al Quran Indonesia : 10,000,000+
Al'Quran Bahasa Indonesia : 10,000,000+
Al Quran Al karim : 1,000,000+
Al Quran : EAlim - Translations & MP3 Offline : 5,000,000+
Koran Read &MP3 30 Juz Offline : 1,000,000+

This niche appears to be dominated by apps for reading and processing ebooks, along with various library and dictionary collections, so building a similar app is probably not a good idea: it would face significant competition.

We also notice quite a few apps built around the Quran, which suggests that an app built around a popular book can attract a large audience. Taking a popular book (perhaps a more recent one) and turning it into an app could therefore be profitable in both the Google Play and App Store markets.
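To check how many apps are built around a single book, one could count titles containing related keywords. This is a minimal sketch, assuming the same row layout as `android_free_english`; `find_title_matches` and the keyword list are hypothetical, and plain substring matching may miss spelling variants or pick up unrelated titles, so the result is only a rough count.

```python
def find_title_matches(apps, genre, keywords):
    """Return names of apps in `genre` whose title contains any keyword (case-insensitive)."""
    lowered = [keyword.lower() for keyword in keywords]
    # app[0] is the app name and app[1] the genre (assumed column layout)
    return [app[0] for app in apps
            if app[1] == genre and any(k in app[0].lower() for k in lowered)]


# Sample rows reusing a few titles from the output above; other columns are placeholders
sample = [
    ['Al-Quran (Free)', 'BOOKS_AND_REFERENCE', '', '', '', '10,000,000+'],
    ['Koran Read &MP3 30 Juz Offline', 'BOOKS_AND_REFERENCE', '', '', '', '1,000,000+'],
    ['Wikipedia', 'BOOKS_AND_REFERENCE', '', '', '', '10,000,000+'],
]
matches = find_title_matches(sample, 'BOOKS_AND_REFERENCE', ['quran', 'koran'])
print(len(matches), matches)
```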

However, the market already seems full of libraries, so the app would need special features beyond the raw text of the book, such as daily quotes from the book, an audio version, quizzes, or a forum where readers can discuss it.

## Conclusions
In this project, we analyzed data about mobile apps on the App Store and Google Play with the goal of recommending an app profile that can be profitable in both markets.

We concluded that turning a popular book (perhaps a more recent one) into an app could be profitable in both the Google Play and App Store markets. Because these markets are already full of libraries, the app would need special features beyond the raw text of the book, such as daily quotes, an audio version, quizzes, or a forum where readers can discuss the book.