# Profitable App Profiles for Google Play Store and iOS App Store
In this project, we analyze apps from the Google Play Store and the iOS App Store to find profitable app profiles. We are working as data analysts for a company whose apps are free to download and make money through in-app advertisements, so revenue depends mostly on how many people use an app.

Our goal is to find out which types of apps attract the most users, so that we can build an app with the best chance of being profitable.

# Opening Our Files
We must open our CSV files to access the data.
1. Import `reader` from the `csv` module by using `from csv import reader`.
2. Use `open('AppleStore.csv')` to open the file and save it to the variable `opened_app_store`. Passing `encoding='utf8'` as well is a sensible precaution, since some app names contain non-ASCII characters.
3. Use `reader(opened_app_store)` to read the file and save it to the variable `read_app_store`.
4. Use `list(read_app_store)` to create a list of the data and save it to the variable `app_store_data`.

Repeat steps 2-4 for `googleplaystore.csv`.

In [1]:
# Open AppleStore.csv
from csv import reader
# An explicit encoding avoids platform-dependent decoding errors on non-ASCII app names
opened_app_store = open('AppleStore.csv', encoding='utf8')
read_app_store = reader(opened_app_store)
app_store_data = list(read_app_store)

# Open googleplaystore.csv
opened_play_store = open('googleplaystore.csv', encoding='utf8')
read_play_store = reader(opened_play_store)
play_store_data = list(read_play_store)
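
The four steps above can also be wrapped in a small helper. The sketch below is one possible variant, not the code this notebook ran: `load_csv` is a hypothetical name, the `with` block is there so the file is closed once it has been read, and `encoding='utf8'` is an assumption about how the files were saved.

```python
from csv import reader


def load_csv(path, encoding='utf8'):
    """Return every row of the CSV file at `path` as a list of lists."""
    # newline='' is the setting the csv module documentation recommends
    # for file objects that are passed to reader().
    with open(path, encoding=encoding, newline='') as opened_file:
        return list(reader(opened_file))
```

With this helper, loading a data set becomes a single call, for example `app_store_data = load_csv('AppleStore.csv')`.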

# Exploring the CSV for Relevant Data
We print the first few rows of the data sets to see what columns we will be using for our analysis. 

The header row tells us which columns are available, and the first data rows show what the values in each column look like.

We then print the number of rows and number of columns.

In [2]:
# Print the first few rows
def explore_data(dataset, start, end, rows_and_columns=False):
    dataset_slice = dataset[start:end]    
    for row in dataset_slice:
        print(row)
        print('\n')

    if rows_and_columns:
        rows = ('Number of Rows', len(dataset))
        columns = ('Number of Columns', len(dataset[0]))
        print(rows)
        return columns

print('App Store Data')
print('--------------------------------------------------------------')
print('\n')
print(explore_data(app_store_data, 0, 3, True))
print('\n')
print('\n')
print('Play Store Data')
print('--------------------------------------------------------------')
print('\n')
print(explore_data(play_store_data, 0, 3, True))

App Store Data
--------------------------------------------------------------


['', 'id', 'track_name', 'size_bytes', 'currency', 'price', 'rating_count_tot', 'rating_count_ver', 'user_rating', 'user_rating_ver', 'ver', 'cont_rating', 'prime_genre', 'sup_devices.num', 'ipadSc_urls.num', 'lang.num', 'vpp_lic']


['1', '281656475', 'PAC-MAN Premium', '100788224', 'USD', '3.99', '21292', '26', '4', '4.5', '6.3.5', '4+', 'Games', '38', '5', '10', '1']


['2', '281796108', 'Evernote - stay organized', '158578688', 'USD', '0', '161065', '26', '4', '3.5', '8.2.2', '4+', 'Productivity', '37', '5', '23', '1']


('Number of Rows', 7198)
('Number of Columns', 17)




Play Store Data
--------------------------------------------------------------


['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']


['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+'

# Columns We Will Be Using
* [App Store Documentation](https://www.kaggle.com/ramamet4/app-store-apple-data-set-10k-apps)
* [Google Play Store Documentation](https://www.kaggle.com/lava18/google-play-store-apps)

From `AppleStore.csv`, we will be using:
* track_name - App Name
* currency - Currency Type
* price - Price of App
* rating_count_tot - Total Ratings (all versions)
* rating_count_ver - Total Ratings (current version)
* user_rating - Avg User Rating (all versions)
* prime_genre - Primary Genre

From `googleplaystore.csv`, we will be using: 
* App - App Name
* Category - App Category
* Rating - Avg User Rating
* Installs - Number of Installs
* Type - Paid or Free
* Price - Price of App
* Genres - App Genre

# Deleting Incorrect Data
Row `10473` in our Google Play Store data set is missing its `Category`, which shifts every later value one column to the left. As a result, `19` lands in the `Rating` column, which is impossible because ratings only go up to 5. We delete the row by executing `del play_store_data[10473]`. The deletion should be run only once, since running the cell again would delete a valid row.

In [3]:
print(play_store_data[0])
print('\n')
print(play_store_data[10473])
del play_store_data[10473]
print('\n')
print(play_store_data[10473])

['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']


['Life Made WI-Fi Touchscreen Photo Frame', '1.9', '19', '3.0M', '1,000+', 'Free', '0', 'Everyone', '', 'February 11, 2018', '1.0.19', '4.0 and up']


['osmino Wi-Fi: free WiFi', 'TOOLS', '4.2', '134203', '4.1M', '10,000,000+', 'Free', '0', 'Everyone', 'Tools', 'August 7, 2018', '6.06.14', '4.4 and up']


We print the header row to compare it with row `10473`. Since the `Category` value is not present, all values to its right are shifted left. After the deletion, we print index `10473` again to show the row that now occupies it. This row has a `Category` value (`TOOLS`), so its columns line up with the header.
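
A shifted row like this one also has fewer values than the header, which suggests a simple way to look for similar problems. The sketch below is an illustration rather than part of the original analysis; `find_malformed_rows` is a hypothetical helper name.

```python
def find_malformed_rows(dataset):
    """Return (index, row) pairs whose length differs from the header row.

    The first row of `dataset` is assumed to be the header.
    """
    expected = len(dataset[0])
    return [
        (index, row)
        for index, row in enumerate(dataset[1:], start=1)
        if len(row) != expected
    ]


# Tiny made-up example: the second data row is missing one value.
sample = [
    ['App', 'Category', 'Rating'],
    ['App A', 'TOOLS', '4.2'],
    ['App B', '1.9'],
]
print(find_malformed_rows(sample))  # [(2, ['App B', '1.9'])]
```

On the Play Store data, this check would be expected to flag the row discussed above, because that row has 12 values while the header has 13.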

# Removing Duplicate Entries: Part One
The Google Play Store data set contains duplicate entries for some apps. As you can see below, `Instagram` appears four separate times.

In [4]:
for row in play_store_data[1:]:
    if row[0] == 'Instagram':
        print(row)

['Instagram', 'SOCIAL', '4.5', '66577313', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']
['Instagram', 'SOCIAL', '4.5', '66577446', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']
['Instagram', 'SOCIAL', '4.5', '66577313', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']
['Instagram', 'SOCIAL', '4.5', '66509917', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']


In total, 1,181 rows in the Google Play Store data set repeat an app name that has already appeared.

In [5]:
def find_duplicates(data_set):
    duplicates = []
    unique = []
    for row in data_set[1:]:
        if row[0] in unique:
            duplicates.append(row[0])
        else:
            unique.append(row[0])
            
    print('Number of Unique Apps: ', len(unique))
    print('Number of Duplicate Apps: ', len(duplicates))

print('Google Play Store Data:')
play_store = find_duplicates(play_store_data)
print('\n')
print('App Store Data:')
app_store = find_duplicates(app_store_data)

Google Play Store Data:
Number of Unique Apps:  9659
Number of Duplicate Apps:  1181


App Store Data:
Number of Unique Apps:  7197
Number of Duplicate Apps:  0


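Using `Instagram` as an example, we will keep the row with the highest number of reviews and delete the duplicates with fewer reviews. Review counts generally grow over time, so the row with the most reviews is most likely the most recent one.

Note that `find_duplicates` compares index 0 of each row. In the App Store data, that column holds the row number rather than the app name (`track_name` is at index 2), so the result of 0 duplicates above does not actually test App Store app names.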

# Removing Duplicate Entries: Part Two
We want to create a dictionary, where each key is a unique app, and each value is the app's number of reviews.

### Play Store Expected Length

In [6]:
# Create a new dictionary
reviews_max = {}
# Iterate through data set
for row in play_store_data[1:]:
    name = row[0]
    n_reviews = float(row[3])
    
    if (name in reviews_max) and (reviews_max[name] < n_reviews):
        reviews_max[name] = n_reviews
    elif name not in reviews_max:
        reviews_max[name] = n_reviews

print('Expected Length of Play Store List: ', len(reviews_max))

Expected Length of Play Store List:  9659


We iterate through the data set and, for each row, look up the app name in the dictionary.

If the name is already in the dictionary **and** the current row has more reviews than the stored value, we replace the stored value with the bigger one. If the name is not in the dictionary yet, we add it with the current row's number of reviews.

### Play Store Length Without Duplicates

In [7]:
android_no_duplicates = []
already_added = []

for row in play_store_data[1:]:
    name = row[0]
    n_reviews = float(row[3])
    
    if (n_reviews == reviews_max[name]) and (name not in already_added):
        android_no_duplicates.append(row)
        already_added.append(name)
        
print('Length of Play Store List Without Duplicates: ', len(android_no_duplicates))
print('\n')
explore_data(android_no_duplicates, 0, 3, True)

Length of Play Store List Without Duplicates:  9659


['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 and up']


('Number of Rows', 9659)


('Number of Columns', 13)

We create two lists: `android_no_duplicates` for our clean data set and `already_added` for apps that already exist in `android_no_duplicates`.

We then iterate through our original data set, `play_store_data`, and grab the `app name` and `number of reviews`.

We check `if` the `n_reviews` is equal to the number of reviews in our dictionary `reviews_max` for that app, **and** the `app name` is not in the `already_added` list. If this is **True**, then we append the `row` to `android_no_duplicates` and the `app name` to the `already_added` list.

Finally, we print the length of the `android_no_duplicates` list, `9,659`, which is the expected length for our data set.
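
The two passes described above can also be combined into one. The sketch below is an alternative under stated assumptions, not the notebook's method: `keep_most_reviewed` is a hypothetical name, and it stores the whole best row per app instead of only the review count.

```python
def keep_most_reviewed(rows, name_index, reviews_index):
    """Keep one row per app name: the row with the highest review count.

    When several rows tie for the highest count, the first one is kept.
    `rows` must not include the header row.
    """
    best = {}
    for row in rows:
        name = row[name_index]
        n_reviews = float(row[reviews_index])
        if name not in best or n_reviews > float(best[name][reviews_index]):
            best[name] = row
    return list(best.values())


# Shortened copies of the four Instagram rows shown earlier, plus one other app.
sample = [
    ['Instagram', 'SOCIAL', '4.5', '66577313'],
    ['Instagram', 'SOCIAL', '4.5', '66577446'],
    ['Instagram', 'SOCIAL', '4.5', '66577313'],
    ['Instagram', 'SOCIAL', '4.5', '66509917'],
    ['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644'],
]
for row in keep_most_reviewed(sample, 0, 3):
    print(row)
```

Because a dictionary lookup does not get slower as the data grows, this avoids the repeated `name not in already_added` list scans. The result is ordered by each app's first appearance, which may differ from the order produced by the two-pass version.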

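We can now repeat this process for our `App Store` data set, starting with the expected length.

Note that the cells below reuse the Play Store indexes. In the App Store data, index 0 is the row number and index 3 is `size_bytes` (see the header row printed earlier), so every key is unique and no rows are removed. Deduplicating by app would require keying on `id` (index 1) or `track_name` (index 2) and comparing `rating_count_tot` (index 6).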

### App Store Expected Length

In [8]:
# Create a new dictionary
reviews_max = {}
# Iterate through data set
for row in app_store_data[1:]:
    name = row[0]
    n_reviews = float(row[3])
    
    if (name in reviews_max) and (reviews_max[name] < n_reviews):
        reviews_max[name] = n_reviews
    elif name not in reviews_max:
        reviews_max[name] = n_reviews

print('Expected Length of App Store List: ', len(reviews_max))

Expected Length of App Store List:  7197


Then we find the length after removing the duplicates.

### App Store Length Without Duplicates

In [9]:
ios_no_duplicates = []
already_added = []

for row in app_store_data[1:]:
    name = row[0]
    n_reviews = float(row[3])
    
    if (n_reviews == reviews_max[name]) and (name not in already_added):
        ios_no_duplicates.append(row)
        already_added.append(name)
        
print('Length of App Store List Without Duplicates: ', len(ios_no_duplicates))
print('\n')
explore_data(ios_no_duplicates, 0, 3, True)

Length of App Store List Without Duplicates:  7197


['1', '281656475', 'PAC-MAN Premium', '100788224', 'USD', '3.99', '21292', '26', '4', '4.5', '6.3.5', '4+', 'Games', '38', '5', '10', '1']


['2', '281796108', 'Evernote - stay organized', '158578688', 'USD', '0', '161065', '26', '4', '3.5', '8.2.2', '4+', 'Productivity', '37', '5', '23', '1']


['3', '281940292', 'WeatherBug - Local Weather, Radar, Maps, Alerts', '100524032', 'USD', '0', '188583', '2822', '3.5', '4.5', '5.0.0', '4+', 'Weather', '37', '5', '3', '1']


('Number of Rows', 7197)


('Number of Columns', 17)

Now we have our cleaned `App Store` list along with our cleaned `Play Store` list.

# Removing Non-English Apps: Part One
Our company only develops apps in English, so we only want to analyze apps directed towards an English-speaking audience. Both data sets contain apps whose names suggest a different audience, as the examples below show.

In [10]:
print(app_store_data[815][2])
print(app_store_data[820][2])
print('\n')
print(android_no_duplicates[4412][0])
print(android_no_duplicates[7940][0])

搜狐新闻—新闻热点资讯掌上阅读软件
聚力视频-蓝光电视剧电影在线热播


中国語 AQリスニング
لعبة تقدر تربح DZ


Every character has a numeric code, which Python's built-in `ord()` function returns. The letters of the English alphabet, the digits, and common punctuation are ASCII characters, whose codes fall between 0 and 127. A character with a code greater than 127 is not part of this common English character set.

In [11]:
print(ord('a'))
print(ord('V'))
print(ord('搜'))
print(ord('7'))
print(ord('='))

97
86
25628
55
61


We can build a function that detects whether a character belongs to the set of common English characters, which will assist us in finding and removing all the non-English apps.

In [12]:
def is_english(string):
    for char in string:
        if ord(char) > 127:
            return False
        
    return True

print(is_english('Instagram'))
print(is_english('爱奇艺PPS -《欢乐颂2》电视剧热播'))
print(is_english('Docs To Go™ Free Office Suite'))
print(is_english('Instachat 😜'))

True
False
False
False


# Removing Non-English Apps: Part Two
We can see that the function works well for `Instagram` and `爱奇艺PPS -《欢乐颂2》电视剧热播`, returning `True` and `False` respectively. However, it also returns `False` for English app names that contain characters like `™` or emojis, so those apps would be removed by mistake.

This is because the codes of these characters fall outside the ASCII range of 0-127, as shown below.

In [13]:
print(ord('™'))
print(ord('😜'))

8482
128540


To minimize the data loss that would be caused by this function, we can change our function to return `False` if the `app name` has more than **three** characters that fall outside the ASCII range.

The function is not perfect, but it will reduce the amount of data loss from the previous version of the function.

In [14]:
def is_english(string):
    count = 0
    for char in string:
        if ord(char) > 127:
            count += 1
        if count > 3:
            return False
        
    return True

print(is_english('Instagram'))
print(is_english('爱奇艺PPS -《欢乐颂2》电视剧热播'))
print(is_english('Docs To Go™ Free Office Suite'))
print(is_english('Instachat 😜'))

True
False
True
True


We use a `count` to track the number of characters that fall outside the ASCII range:

* If the character's value is greater than 127, we add 1 to `count`.
* If `count > 3`, we return `False`.
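
To see how the threshold of three behaves, it can help to count the non-ASCII characters in a few names directly. This is a small illustrative sketch; `count_non_ascii` is a hypothetical helper and is not used elsewhere in the notebook.

```python
def count_non_ascii(string):
    """Count the characters whose code point lies outside the ASCII range 0-127."""
    return sum(1 for char in string if ord(char) > 127)


names = [
    'Instagram',
    'Docs To Go™ Free Office Suite',
    'Instachat 😜',
    '爱奇艺PPS -《欢乐颂2》电视剧热播',
    '知乎',
]
for name in names:
    print(name, '->', count_non_ascii(name))
```

The first three names contain at most one such character and are kept. The long Chinese title is well above three and is removed. A short non-English name such as `知乎` has only two non-ASCII characters, so it still passes the filter, which is one way in which the function is not perfect.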

Now that we have improved our function, we can use it on our data sets. 

### Removing Non-English Apps From App Store
We will now use the function defined above on the `App Store` data set. 

We append the English apps to the `ios_english` list.

In [15]:
ios_english = []
for row in ios_no_duplicates:
    if is_english(row[2]):
        ios_english.append(row)

print(explore_data(ios_english, 0, 3, True))

['1', '281656475', 'PAC-MAN Premium', '100788224', 'USD', '3.99', '21292', '26', '4', '4.5', '6.3.5', '4+', 'Games', '38', '5', '10', '1']


['2', '281796108', 'Evernote - stay organized', '158578688', 'USD', '0', '161065', '26', '4', '3.5', '8.2.2', '4+', 'Productivity', '37', '5', '23', '1']


['3', '281940292', 'WeatherBug - Local Weather, Radar, Maps, Alerts', '100524032', 'USD', '0', '188583', '2822', '3.5', '4.5', '5.0.0', '4+', 'Weather', '37', '5', '3', '1']


('Number of Rows', 6183)
('Number of Columns', 17)


### Removing Non-English Apps From Play Store
We will now use the function defined above on the `Play Store` data set.

We append the English apps to the `android_english` list.

In [16]:
android_english = []
for row in android_no_duplicates:
    if is_english(row[0]):
        android_english.append(row)

print(explore_data(android_english, 0, 3, True))

['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 and up']


('Number of Rows', 9614)
('Number of Columns', 13)


# Isolating Free Apps
Now, we have two lists of apps with no inaccurate data, no duplicates, and no apps with predominantly non-English names.

However, our lists contain both paid and free apps. We only want to analyze the free apps. Therefore, we must isolate these free apps in our list.

### Isolating Free Apps From App Store

In [17]:
# Price is at index 5
ios_clean = []
for row in ios_english:
    if row[5] == '0':
        ios_clean.append(row)
        
print('Number of Free Apps in the App Store: ', len(ios_clean))

Number of Free Apps in the App Store:  3222


### Isolating Free Apps From Play Store

In [18]:
# Price is at index 7
android_clean = []
for row in android_english:
    if row[7] == '0':
        android_clean.append(row)
        
print('Number of Free Apps in the Play Store: ', len(android_clean))

Number of Free Apps in the Play Store:  8864


As we can see above, the two lists containing all English apps were narrowed down to just `Free` apps.

In the `App Store List`, we have **3,222** free apps.

In the `Play Store List`, we have **8,864** free apps.
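
The comparisons `row[5] == '0'` and `row[7] == '0'` depend on the price being stored as exactly the string `'0'`. If a copy of the data wrote free prices as `'0.0'` or prefixed prices with `$` (both are assumptions here, not something the output above shows), a numeric comparison would be more forgiving. A minimal sketch, with `is_free` as a hypothetical helper name:

```python
def is_free(price):
    """Return True when a price string such as '0', '0.0' or '$0.00' means zero."""
    cleaned = price.strip().lstrip('$')
    try:
        return float(cleaned) == 0.0
    except ValueError:
        # Anything that is not a number (for example a shifted column value)
        # is treated as "not free" instead of raising an error.
        return False


for price in ['0', '0.0', '3.99', '$4.99', 'Everyone']:
    print(repr(price), '->', is_free(price))
```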

# Most Common Apps by Genre: Part One
The goal of this project is to determine what kind of apps are likely to attract more users since our revenue is influenced by the number of people using our app.

Our main strategy is:

1. Develop a minimal Android app for the `Google Play Store`. 
2. Then, if the app is doing well, we will develop it further. 
3. Finally, if the app is profitable after six months, we will build an iOS app for the `iOS App Store`. 

The way we do this is by first analyzing what app profiles are the most successful on both app stores.

We will be creating a frequency table using the `prime_genre` column for the `App Store Data Set`, and the `Genres` and `Category` columns for the `Google Play Store Data Set`.

# Most Common Apps by Genre: Part Two
We need two functions:

1. One function to generate a frequency table that shows percentages.
2. A second function that displays the percentages in descending order.

In [19]:
# Function to generate a frequency table that shows percentages
def freq_table(dataset, index):
    frequency_table = {}
    length = 0
    for row in dataset:
        length += 1
        value = row[index]
        if value in frequency_table:
            frequency_table[value] += 1
        else:
            frequency_table[value] = 1
    
    frequency_percentages = {}
    for key in frequency_table:
        percentage = (frequency_table[key] / length) * 100
        frequency_percentages[key] = percentage
    
    return frequency_percentages

# Function that displays the percentages in descending order
def display_table(dataset, index):
    table = freq_table(dataset, index)
    table_display = []
    for key in table:
        key_val_as_tuple = (table[key], key)
        table_display.append(key_val_as_tuple)

    table_sorted = sorted(table_display, reverse = True)
    for entry in table_sorted:
        print(entry[1], ':', entry[0])
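
For comparison, the standard library's `collections.Counter` can build the same kind of percentage table in a few lines. This is an optional alternative sketch; `freq_table_counter` is a hypothetical name and is not used in the rest of the notebook.

```python
from collections import Counter


def freq_table_counter(dataset, index):
    """Return {value: percentage} for one column, most frequent value first."""
    counts = Counter(row[index] for row in dataset)
    total = sum(counts.values())
    return {key: count / total * 100 for key, count in counts.most_common()}


# Tiny made-up data set: three games and one music app.
sample = [
    ['App A', 'Games'],
    ['App B', 'Games'],
    ['App C', 'Music'],
    ['App D', 'Games'],
]
print(freq_table_counter(sample, 1))  # {'Games': 75.0, 'Music': 25.0}
```

One difference to keep in mind: `most_common()` keeps tied values in the order they were first seen, while `display_table` sorts `(percentage, key)` tuples in reverse, so tied entries can appear in a different order.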

Now that we have the two functions we need, we can begin to calculate the percentages for our data sets.

### Percentages for App Store (prime_genre)

In [20]:
display_table(ios_clean, -5)

Games : 58.16263190564867
Entertainment : 7.883302296710118
Photo & Video : 4.9658597144630665
Education : 3.662321539416512
Social Networking : 3.2898820608317814
Shopping : 2.60707635009311
Utilities : 2.5139664804469275
Sports : 2.1415270018621975
Music : 2.0484171322160147
Health & Fitness : 2.0173805090006205
Productivity : 1.7380509000620732
Lifestyle : 1.5828677839851024
News : 1.3345747982619491
Travel : 1.2414649286157666
Finance : 1.1173184357541899
Weather : 0.8690254500310366
Food & Drink : 0.8069522036002483
Reference : 0.5586592178770949
Business : 0.5276225946617008
Book : 0.4345127250155183
Navigation : 0.186219739292365
Medical : 0.186219739292365
Catalogs : 0.12414649286157665


We can see that **58%** of the free English apps in our `App Store` data are `Games`. About **8%** of the apps are `Entertainment` apps, and about **5%** are `Photo & Video`.

It's safe to say that a majority of the free, English apps on the `App Store` are apps that are for fun (games, entertainment, photo & video, social networking, etc). However, a large number of apps in a genre doesn't mean that the genre has the greatest number of users.

### Percentages for Play Store (Genres)

In [21]:
display_table(android_clean, -4)

Tools : 8.449909747292418
Entertainment : 6.069494584837545
Education : 5.347472924187725
Business : 4.591606498194946
Productivity : 3.892148014440433
Lifestyle : 3.892148014440433
Finance : 3.7003610108303246
Medical : 3.531137184115524
Sports : 3.463447653429603
Personalization : 3.3167870036101084
Communication : 3.2378158844765346
Action : 3.1024368231046933
Health & Fitness : 3.0798736462093865
Photography : 2.944494584837545
News & Magazines : 2.7978339350180503
Social : 2.6624548736462095
Travel & Local : 2.3240072202166067
Shopping : 2.2450361010830324
Books & Reference : 2.1435018050541514
Simulation : 2.0419675090252705
Dating : 1.861462093862816
Arcade : 1.8501805054151623
Video Players & Editors : 1.7712093862815883
Casual : 1.7599277978339352
Maps & Navigation : 1.3989169675090252
Food & Drink : 1.2409747292418771
Puzzle : 1.128158844765343
Racing : 0.9927797833935018
Role Playing : 0.9363718411552346
Libraries & Demo : 0.9363718411552346
Auto & Vehicles : 0.9250902527075

The first major difference you may notice is that no single genre dominates the `Google Play Store`: the largest genre, `Tools`, accounts for only about **8%** of the apps. There are also more apps that have a practical use, such as `Tools`, `Education`, `Productivity`, etc.

### Percentages for Play Store (Category)

In [22]:
display_table(android_clean, 1)

FAMILY : 18.907942238267147
GAME : 9.724729241877256
TOOLS : 8.461191335740072
BUSINESS : 4.591606498194946
LIFESTYLE : 3.9034296028880866
PRODUCTIVITY : 3.892148014440433
FINANCE : 3.7003610108303246
MEDICAL : 3.531137184115524
SPORTS : 3.395758122743682
PERSONALIZATION : 3.3167870036101084
COMMUNICATION : 3.2378158844765346
HEALTH_AND_FITNESS : 3.0798736462093865
PHOTOGRAPHY : 2.944494584837545
NEWS_AND_MAGAZINES : 2.7978339350180503
SOCIAL : 2.6624548736462095
TRAVEL_AND_LOCAL : 2.33528880866426
SHOPPING : 2.2450361010830324
BOOKS_AND_REFERENCE : 2.1435018050541514
DATING : 1.861462093862816
VIDEO_PLAYERS : 1.7937725631768955
MAPS_AND_NAVIGATION : 1.3989169675090252
FOOD_AND_DRINK : 1.2409747292418771
EDUCATION : 1.1620036101083033
ENTERTAINMENT : 0.9589350180505415
LIBRARIES_AND_DEMO : 0.9363718411552346
AUTO_AND_VEHICLES : 0.9250902527075812
HOUSE_AND_HOME : 0.8235559566787004
WEATHER : 0.8009927797833934
EVENTS : 0.7107400722021661
PARENTING : 0.6543321299638989
ART_AND_DESIGN : 

Once again, we see that the `Google Play Store` contains more apps with a practical use. However, `FAMILY` is the largest category at about **19%** of the apps, followed by `GAME` at about **10%**.

We can see that there are many more Genres than Categories. This is because the `Genres` column is more granular: for example, the output above lists `Action`, `Arcade`, and `Puzzle` as separate genres. Since we only want the overall picture of the `Google Play Store`, we will be using the `Category` column for our data.

# Most Popular Apps by Genre on the App Store
We now need to find which genres are the most popular. 

Neither data set records the number of users directly, so we use a proxy for each: for the `Google Play Store` data set, we have the `Installs` column, and for the `App Store` data set, we have the `rating_count_tot` column.

We need to calculate the averages of these two columns to find the most popular genres.
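
A general way to compute such per-group averages in a single pass over the data is sketched below. It is an illustration rather than the code used in this notebook; `average_by_group` is a hypothetical name, and the values in the chosen column are assumed to be plain numeric strings.

```python
def average_by_group(dataset, group_index, value_index):
    """Return {group: average of the value column} using one pass over the rows."""
    totals = {}
    counts = {}
    for row in dataset:
        group = row[group_index]
        totals[group] = totals.get(group, 0.0) + float(row[value_index])
        counts[group] = counts.get(group, 0) + 1
    return {group: totals[group] / counts[group] for group in totals}


# Tiny made-up data set: name, genre, number of ratings.
sample = [
    ['App A', 'Games', '10'],
    ['App B', 'Games', '30'],
    ['App C', 'Music', '5'],
]
averages = average_by_group(sample, 1, 2)
# Print the groups from highest to lowest average.
for group, avg in sorted(averages.items(), key=lambda item: item[1], reverse=True):
    print(group, ':', avg)
```

Under these assumptions, a call such as `average_by_group(ios_clean, -5, 6)` would give the average `rating_count_tot` per `prime_genre` without looping over the data set once per genre.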

We start by first generating a frequency table for the `prime_genre` column in the `App Store` data set. 

In [23]:
ios_table = freq_table(ios_clean, -5)

for genre in ios_table:
    # Store sum of user ratings
    total = 0
    # Store number of apps specific to each genre
    len_genre = 0
    
    for row in ios_clean:
        genre_app = row[-5]
        if genre_app == genre:
            ratings = float(row[6])
            total += ratings
            len_genre += 1
            
    avg = total / len_genre
    print(genre, ': ', avg)

Productivity :  21028.410714285714
Weather :  52279.892857142855
Shopping :  26919.690476190477
Reference :  74942.11111111111
Finance :  31467.944444444445
Music :  57326.530303030304
Utilities :  18684.456790123455
Travel :  28243.8
Social Networking :  71548.34905660378
Sports :  23008.898550724636
Health & Fitness :  23298.015384615384
Games :  22788.6696905016
Food & Drink :  33333.92307692308
News :  21248.023255813954
Book :  39758.5
Photo & Video :  28441.54375
Entertainment :  14029.830708661417
Business :  7491.117647058823
Lifestyle :  16485.764705882353
Education :  7003.983050847458
Navigation :  86090.33333333333
Medical :  612.0
Catalogs :  4004.0


The genre with the highest average number of ratings is the `Navigation` genre, with about **86,090 ratings per app**.

Let's examine this genre further:

In [27]:
for row in ios_clean:
    if row[-5] == 'Navigation':
        print(row[2], ':', row[6])

Waze - GPS Navigation, Maps & Real-time Traffic : 345046
Geocaching® : 12811
ImmobilienScout24: Real Estate Search in Germany : 187
Railway Route Search : 5
CoPilot GPS – Car Navigation & Offline Maps : 3582
Google Maps - Navigation & Transit : 154911


The `Navigation` average is skewed by a few very popular apps such as `Waze` and `Google Maps`. With rating counts in the hundreds of thousands, these two apps make the `Navigation` genre appear more popular than it really is.
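
One way to make this skew visible is to compare the mean with the median, which is much less sensitive to a few very large values. The sketch below uses the six `Navigation` rating counts printed above together with the standard library's `statistics` module; it is an added illustration, not part of the original analysis.

```python
from statistics import mean, median

# Rating counts copied from the Navigation listing above.
navigation_ratings = {
    'Waze - GPS Navigation, Maps & Real-time Traffic': 345046,
    'Geocaching®': 12811,
    'ImmobilienScout24: Real Estate Search in Germany': 187,
    'Railway Route Search': 5,
    'CoPilot GPS – Car Navigation & Offline Maps': 3582,
    'Google Maps - Navigation & Transit': 154911,
}

values = list(navigation_ratings.values())
print('Mean:  ', mean(values))    # about 86,090, the average reported earlier
print('Median:', median(values))  # 8196.5
```

The median is roughly a tenth of the mean, which supports the observation that two apps account for most of the genre's ratings.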

The `Reference` and `Social Networking` genres are the next runners-up. 

Let's examine the `Reference` genre below: 

In [30]:
for row in ios_clean:
    if row[-5] == 'Reference':
        print(row[2], ':', row[6])

Bible : 985920
Dictionary.com Dictionary & Thesaurus : 200047
Dictionary.com Dictionary & Thesaurus for iPad : 54175
Muslim Pro: Ramadan 2017 Prayer Times, Azan, Quran : 18418
Merriam-Webster Dictionary : 16849
Google Translate : 26786
Night Sky : 12122
WWDC : 762
Jishokun-Japanese English Dictionary & Translator : 0
教えて!goo : 0
VPN Express : 14
New Furniture Mods - Pocket Wiki & Game Tools for Minecraft PC Edition : 17588
LUCKY BLOCK MOD ™ for Minecraft PC Edition - The Best Pocket Wiki & Mods Installer Tools : 4693
Guides for Pokémon GO - Pokemon GO News and Cheats : 826
Horror Maps for Minecraft PE - Download The Scariest Maps for Minecraft Pocket Edition (MCPE) Free : 718
City Maps for Minecraft PE - The Best Maps for Minecraft Pocket Edition (MCPE) : 8535
GUNS MODS for Minecraft PC Edition - Mods Tools : 1497
Real Bike Traffic Rider Virtual Reality Glasses : 8


In the `Reference` genre, we can see that the `Bible` and `Dictionary.com` apps have a very large rating count. These apps also skew the data, making the genre appear more popular than it may be.

We examine the `Social Networking` genre below:

In [29]:
for row in ios_clean:
    if row[-5] == 'Social Networking':
        print(row[2], ':', row[6])

Facebook : 2974676
LinkedIn : 71856
Skype for iPhone : 373519
Tumblr : 334293
Match™ - #1 Dating App. : 60659
WhatsApp Messenger : 287589
TextNow - Unlimited Text + Calls : 164963
Grindr - Gay and same sex guys chat, meet and date : 23201
imo video calls and chat : 18841
Ameba : 269
Weibo : 7265
Badoo - Meet New People, Chat, Socialize. : 34428
Kik : 260965
Qzone : 1649
Fake-A-Location Free ™ : 354
Tango - Free Video Call, Voice and Chat : 75412
MeetMe - Chat and Meet New People : 97072
SimSimi : 23530
Viber Messenger – Text & Call : 164249
Find My Family, Friends & iPhone - Life360 Locator : 43877
Weibo HD : 16772
POF - Best Dating App for Conversations : 52642
GroupMe : 28260
Lobi : 36
WeChat : 34584
ooVoo – Free Video Call, Text and Voice : 177501
Pinterest : 1061624
知乎 : 397
Qzone HD : 458
Skype for iPad : 60163
LINE : 11437
QQ : 9109
LOVOO - Dating Chat : 1985
QQ HD : 5058
Messenger : 351466
eHarmony™ Dating App - Meet Singles : 11124
YouNow: Live Stream Video Chat : 12079
Cougar 

Unsurprisingly, the biggest apps, such as `Facebook`, `Pinterest`, `Skype`, and `Messenger`, have the highest numbers of ratings.

These apps dominate the genre and are very difficult to compete with. They also skew the average upward.

Our goal is to make money through in-app advertisements. 

Advertising on `Social Networking` apps is very competitive and expensive. Also, you do not want to distract people in the `Navigation` genre with advertisements if they are driving or travelling.

An app in the `Reference` genre may prove to be profitable, as it is not too expensive, competitive, or distracting to the user. 

### Average Installs by Category in Google Play Store

In [24]:
android_table = freq_table(android_clean, 1)

for category in android_table:
    # Store sum of installs
    total = 0
    # Store number of apps specific to each category
    len_category = 0
    
    for row in android_clean:
        category_app = row[1]
        if category_app == category:
            installs = row[5]
            installs = installs.replace(',', '')
            installs = installs.replace('+', '')
            total += float(installs)
            len_category += 1
            
    avg = total / len_category
    print(category, ': ', avg)

ART_AND_DESIGN :  1986335.0877192982
AUTO_AND_VEHICLES :  647317.8170731707
BEAUTY :  513151.88679245283
BOOKS_AND_REFERENCE :  8767811.894736841
BUSINESS :  1712290.1474201474
COMICS :  817657.2727272727
COMMUNICATION :  38456119.167247385
DATING :  854028.8303030303
EDUCATION :  1833495.145631068
ENTERTAINMENT :  11640705.88235294
EVENTS :  253542.22222222222
FINANCE :  1387692.475609756
FOOD_AND_DRINK :  1924897.7363636363
HEALTH_AND_FITNESS :  4188821.9853479853
HOUSE_AND_HOME :  1331540.5616438356
LIBRARIES_AND_DEMO :  638503.734939759
LIFESTYLE :  1437816.2687861272
GAME :  15588015.603248259
FAMILY :  3695641.8198090694
MEDICAL :  120550.61980830671
SOCIAL :  23253652.127118643
SHOPPING :  7036877.311557789
PHOTOGRAPHY :  17840110.40229885
SPORTS :  3638640.1428571427
TRAVEL_AND_LOCAL :  13984077.710144928
TOOLS :  10801391.298666667
PERSONALIZATION :  5201482.6122448975
PRODUCTIVITY :  16787331.344927534
PARENTING :  542603.6206896552
WEATHER :  5074486.197183099
VIDEO_PLAYERS 

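The `BOOKS_AND_REFERENCE` category has **8,767,811 average installations**. This is not the highest figure in the output: `COMMUNICATION` (about 38.5 million), `SOCIAL` (about 23.3 million), and `PHOTOGRAPHY` (about 17.8 million), among others, rank above it. Those categories are likely dominated by a few very large apps that are hard to compete with, much like `Social Networking` on the `App Store`.

`Reference` was one of the runners-up on the `App Store`, and the matching category also does well here in the `Play Store`. However, we want to verify that the data is not skewed here.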

In [34]:
for row in android_clean:
    if row[1] == 'BOOKS_AND_REFERENCE':
        print(row[0], ':', row[5])

E-Book Read - Read Book for free : 50,000+
Download free book with green book : 100,000+
Wikipedia : 10,000,000+
Cool Reader : 10,000,000+
Free Panda Radio Music : 100,000+
Book store : 1,000,000+
FBReader: Favorite Book Reader : 10,000,000+
English Grammar Complete Handbook : 500,000+
Free Books - Spirit Fanfiction and Stories : 1,000,000+
Google Play Books : 1,000,000,000+
AlReader -any text book reader : 5,000,000+
Offline English Dictionary : 100,000+
Offline: English to Tagalog Dictionary : 500,000+
FamilySearch Tree : 1,000,000+
Cloud of Books : 1,000,000+
Recipes of Prophetic Medicine for free : 500,000+
ReadEra – free ebook reader : 1,000,000+
Anonymous caller detection : 10,000+
Ebook Reader : 5,000,000+
Litnet - E-books : 100,000+
Read books online : 5,000,000+
English to Urdu Dictionary : 500,000+
eBoox: book reader fb2 epub zip : 1,000,000+
English Persian Dictionary : 500,000+
Flybook : 500,000+
All Maths Formulas : 1,000,000+
Ancestry : 5,000,000+
HTC Help : 10,000,000+
E

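In the `BOOKS_AND_REFERENCE` category, there are many popular apps such as dictionary apps, religious text apps, and reading apps with millions of installations or more. A few very large apps, such as `Google Play Books` with 1,000,000,000+ installations, pull the average up.

However, many of the lesser-known apps in this category still have hundreds of thousands to millions of installations, so the category is not carried by one or two apps alone. An app in the `BOOKS_AND_REFERENCE` category on the `Play Store` may prove to be profitable.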
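
To put rough numbers on this, the install tiers can be converted to integers and summarized. The sketch below uses only the first ten rows of the listing above, so it illustrates the method rather than describing the whole category. `parse_installs` is a hypothetical helper, and because a tier such as `10,000+` is a lower bound, every figure is approximate.

```python
from statistics import mean, median


def parse_installs(installs):
    """Convert a Play Store install tier such as '10,000+' to the integer 10000."""
    return int(installs.replace(',', '').replace('+', ''))


# Install tiers of the first ten BOOKS_AND_REFERENCE rows listed above.
sample_tiers = [
    '50,000+', '100,000+', '10,000,000+', '10,000,000+', '100,000+',
    '1,000,000+', '10,000,000+', '500,000+', '1,000,000+', '1,000,000,000+',
]
numbers = [parse_installs(tier) for tier in sample_tiers]

print('Mean:  ', mean(numbers))    # about 103 million, driven by Google Play Books
print('Median:', median(numbers))  # 1 million

# The same mean without the single largest app in the sample.
without_largest = sorted(numbers)[:-1]
print('Mean without the largest app:', mean(without_largest))  # about 3.6 million
```

In this small sample, the typical app sits around one million installs, while the mean is inflated roughly a hundredfold by a single app.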

# Conclusion
Using the information we have gathered, it seems that an app in the `BOOKS_AND_REFERENCE` category may fit **Step 1** of our strategy, by developing a minimal app for the `Play Store`.

If the app is doing well, we develop it further, fulfilling **Step 2** of our strategy.

Finally, if the app is still profitable after six months, we can develop it for iOS on the `App Store`, which fulfills **Step 3** of our strategy.


### Our Profitable App Profile:
If we attempt to imitate these popular apps, a profitable app profile may be a `Book` or `Reference` app that supports quotes, definitions, reading, etc. Advertisements may be shown during use, and a paid membership could be offered to remove advertisements while maintaining profits.