# App Data Project Ikkanda

*I will be analyzing data collected from two app stores, Google Play and the Apple App Store.  The goal of the project is to understand which apps are most attractive to users and to make recommendations based on the data.*

In [1]:
opened_file = open('googleplaystore.csv', encoding="utf8")
from csv import reader
read_file = reader(opened_file)
android = list(read_file)
android_header = android[0]
android = android[1:]

In [2]:
opened_file = open('AppleStore.csv', encoding="utf8")
from csv import reader
read_file = reader(opened_file)
ios = list(read_file)
ios_header = ios[0]
ios = ios[1:]
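The two loading cells above repeat the same steps, so they could be folded into one helper. What follows is only a minimal sketch: the name `load_csv` and the in-memory sample are assumptions made for illustration and are not part of the original notebook.

```python
import csv
import io


def load_csv(file_obj):
    """Return (header, rows) from an open text-mode file object."""
    rows = list(csv.reader(file_obj))
    return rows[0], rows[1:]


# Tiny in-memory stand-in for a CSV file, so the sketch runs without the datasets.
sample = io.StringIO("App,Category\nPhoto Editor,ART_AND_DESIGN\nBox,BUSINESS\n")
sample_header, sample_rows = load_csv(sample)
print(sample_header)
print(len(sample_rows))

# With the real files, a `with` block also closes the file when loading is done:
# with open('googleplaystore.csv', encoding='utf8', newline='') as f:
#     android_header, android = load_csv(f)
```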

In [3]:
def explore_data(dataset, start, end, rows_and_columns=False):
    dataset_slice = dataset[start:end]    
    for row in dataset_slice:
        print(row)
        print('\n')

    if rows_and_columns:
        print('Number of rows:', len(dataset))
        print('Number of columns:', len(dataset[0]))

In [4]:
print(android_header)
print('\n')
explore_data(android,0,3,True)

['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']


['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', '7-Jan-18', '1.0.0', '4.0.3 and up']


['Coloring book moana', 'ART_AND_DESIGN', '3.9', '967', '14M', '500,000+', 'Free', '0', 'Everyone', 'Art & Design;Pretend Play', '15-Jan-18', '2.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', '1-Aug-18', '1.2.4', '4.0.3 and up']


Number of rows: 10841
Number of columns: 13


*The Google Play dataset has 10,841 apps and 13 fields.  Initial viewing seems to show that the most useful fields for our analysis are* ```'App'```, ```'Rating'```, ```'Category'```, ```'Reviews'```, ```'Installs'```, ```'Type'```, ```'Price'```, *and* ```'Genres'```.

## iOS Data

In [5]:
print(ios_header)
print('\n')
explore_data(ios,0,3,True)

['id', 'track_name', 'size_bytes', 'currency', 'price', 'rating_count_tot', 'rating_count_ver', 'user_rating', 'user_rating_ver', 'ver', 'cont_rating', 'prime_genre', 'sup_devices.num', 'ipadSc_urls.num', 'lang.num', 'vpp_lic']


['284882215', 'Facebook', '389879808', 'USD', '0.0', '2974676', '212', '3.5', '3.5', '95.0', '4+', 'Social Networking', '37', '1', '29', '1']


['389801252', 'Instagram', '113954816', 'USD', '0.0', '2161558', '1289', '4.5', '4.0', '10.23', '12+', 'Photo & Video', '37', '0', '29', '1']


['529479190', 'Clash of Clans', '116476928', 'USD', '0.0', '2130805', '579', '4.5', '4.5', '9.24.12', '9+', 'Games', '38', '5', '18', '1']


Number of rows: 7197
Number of columns: 16


*The App Store dataset has 7,197 apps and 16 fields.  Initial viewing seems to show that the most useful fields for our analysis are* ```'track_name'```, ```'price'```, ```'rating_count_tot'```, ```'rating_count_ver'```, ```'user_rating'```, *and* ```'prime_genre'```.  For more information about the dataset, see the additional [documentation](https://www.kaggle.com/datasets/ramamet4/app-store-apple-data-set-10k-apps).

# Data Cleaning

Before working with the data, it must be cleaned up by removing rows with errors, removing duplicates, and removing data that falls outside our desired parameters.

## Cleaning Part 1: Column Shift

One of the errors in the dataset is a row that is missing the ```'Category'``` field, and therefore there is a column shift.  This row will be deleted.

In [6]:
print(android_header)

['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']


In [7]:
print(android[10472])

['Life Made WI-Fi Touchscreen Photo Frame', '1.9', '19', '3.0M', '1,000+', 'Free', '0', 'Everyone', '', '11-Feb-18', '1.0.19', '4.0 and up', '']


In [8]:
del android[10472]

In [9]:
print(android[10472])

['osmino Wi-Fi: free WiFi', 'TOOLS', '4.2', '134203', '4.1M', '10,000,000+', 'Free', '0', 'Everyone', 'Tools', '7-Aug-18', '6.06.14', '4.4 and up']


## Cleaning Part 2: Duplicates

### Identifying duplicates in data
One way to see if there are duplicate entries is by creating a list of unique apps.  Then you can compare its length with the entire list.

In [10]:
android_duplicate_apps = []
android_unique_apps = []

for app in android:
    app_name = app[0]
    if app_name not in android_unique_apps:
        android_unique_apps.append(app_name)
    else:
        android_duplicate_apps.append(app_name)

In [11]:
print('Number of unique android apps: ' + str(len(android_unique_apps)))
print('Total number of android apps: ' + str(len(android)))
print('Number of duplicate android apps: ' + str(len(android) - len(android_unique_apps)))

Number of unique android apps: 9659
Total number of android apps: 10840
Number of duplicate android apps: 1181


In [12]:
print(android_duplicate_apps[:10])

['Quick PDF Scanner + OCR FREE', 'Box', 'Google My Business', 'ZOOM Cloud Meetings', 'join.me - Simple Meetings', 'Box', 'Zenefits', 'Google Ads', 'Google My Business', 'Slack']
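The `not in` test on a list rescans the list for every app, which gets slow as the data grows. As an alternative, here is a minimal sketch that counts names with `collections.Counter`. The `sample_rows` list is a hypothetical stand-in for the real dataset.

```python
from collections import Counter

# Hypothetical stand-in rows; only the name column (index 0) matters here.
sample_rows = [
    ['Box', 'BUSINESS'],
    ['Slack', 'BUSINESS'],
    ['Box', 'BUSINESS'],
    ['Box', 'BUSINESS'],
]

# Count how many times each app name appears.
name_counts = Counter(row[0] for row in sample_rows)

n_unique = len(name_counts)
n_duplicates = len(sample_rows) - n_unique
repeated_names = [name for name, count in name_counts.items() if count > 1]

print(n_unique, n_duplicates, repeated_names)
```

The counts also show how many copies each repeated app has, which the two-list version does not record directly.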


In [13]:
ios_unique_apps = []
ios_duplicate_apps = []

for app in ios:
    app_name = app[1]
    if app_name not in ios_unique_apps:
        ios_unique_apps.append(app_name)
    else:
        ios_duplicate_apps.append(app_name)

In [14]:
print('Number of unique iOS apps: ' + str(len(ios_unique_apps)))
print('Total number of iOS apps: ' + str(len(ios)))
print('Number of duplicate iOS apps: ' + str(len(ios) - len(ios_unique_apps)))

Number of unique iOS apps: 7195
Total number of iOS apps: 7197
Number of duplicate iOS apps: 2


In [15]:
print(ios_duplicate_apps)

['Mannequin Challenge', 'VR Roller Coaster']


A quick glance at the duplicate apps will enable us to figure out how to approach deleting the data.

In [16]:
for app in android:
    name = app[0]
    if name == 'Quick PDF Scanner + OCR FREE':
        print(app)

['Quick PDF Scanner + OCR FREE', 'BUSINESS', '4.2', '80805', 'Varies with device', '5,000,000+', 'Free', '0', 'Everyone', 'Business', '26-Feb-18', 'Varies with device', '4.0.3 and up']
['Quick PDF Scanner + OCR FREE', 'BUSINESS', '4.2', '80805', 'Varies with device', '5,000,000+', 'Free', '0', 'Everyone', 'Business', '26-Feb-18', 'Varies with device', '4.0.3 and up']
['Quick PDF Scanner + OCR FREE', 'BUSINESS', '4.2', '80804', 'Varies with device', '5,000,000+', 'Free', '0', 'Everyone', 'Business', '26-Feb-18', 'Varies with device', '4.0.3 and up']


One can notice that the differences occur in the 4th field, which is the number of reviews.  So it makes sense to keep the entry with the most reviews and delete all the others (rather than deleting entries at random).

### Deleting duplicates based on specific criteria

First we will determine the length of the data set once we have deleted the duplicates as a way to verify that we deleted just the duplicates.

In [17]:
print('Length of dataset after deleting duplicates: ', len(android) - 1181)

Length of dataset after deleting duplicates:  9659


To delete the specific data we will:
1.  create a dictionary where the key is a unique app name and the corresponding dictionary value is the highest number of reviews for that app.
2.  create a new dataset from the dictionary information where we will only select the entry with the highest number of reviews.

In [18]:
reviews_max = {}

for app in android:
    name = app[0]
    n_reviews = float(app[3])
    
    if name in reviews_max and reviews_max[name] < n_reviews:
        reviews_max[name] = n_reviews
    elif name not in reviews_max:
        reviews_max[name] = n_reviews
        

In [19]:
print(reviews_max['osmino Wi-Fi: free WiFi'])

134203.0


In [20]:
print('Actual length of dataset after deleting duplicates: ',len(reviews_max))

Actual length of dataset after deleting duplicates:  9659


Now that we have the dictionary with the right number of entries, we will create a new dataset with just the unique entries, dropping the duplicates that have fewer reviews.

In [21]:
android_clean = []
already_added = []

for app in android:
    name = app[0]
    n_reviews = float(app[3])
    
    if (reviews_max[name] == n_reviews) and (name not in already_added):
        android_clean.append(app)
        already_added.append(name)
    

Just to double check, I'm going to explore the new cleaned up dataset.

In [22]:
explore_data(android_clean,0,3,True)

['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', '7-Jan-18', '1.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', '1-Aug-18', '1.2.4', '4.0.3 and up']


['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', '8-Jun-18', 'Varies with device', '4.2 and up']


Number of rows: 9659
Number of columns: 13


Great! We have the expected number of rows.
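The dictionary-then-filter approach above can also be collapsed into one pass that stores the best row per app name. This is a sketch only: `keep_most_reviewed` is a hypothetical helper, and the sample rows are shortened stand-ins that reuse the review counts printed earlier for 'Quick PDF Scanner + OCR FREE'.

```python
def keep_most_reviewed(rows, name_index=0, reviews_index=3):
    """Keep one row per app name: the first row that has the highest review count."""
    best = {}
    for row in rows:
        name = row[name_index]
        n_reviews = float(row[reviews_index])
        # Strict '>' means ties keep the earlier row, as in the two-loop version.
        if name not in best or n_reviews > float(best[name][reviews_index]):
            best[name] = row
    return list(best.values())


# Shortened stand-in rows: [name, category, rating, reviews].
sample_rows = [
    ['Quick PDF Scanner + OCR FREE', 'BUSINESS', '4.2', '80804'],
    ['Quick PDF Scanner + OCR FREE', 'BUSINESS', '4.2', '80805'],
    ['Quick PDF Scanner + OCR FREE', 'BUSINESS', '4.2', '80805'],
    ['Box', 'BUSINESS', '4.2', '159'],
]

sample_clean = keep_most_reviewed(sample_rows)
print(len(sample_clean))
print(sample_clean[0][3])
```

One difference to keep in mind: the result is ordered by the first appearance of each name, so the row order may not match the two-loop version even though the same rows are kept.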

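Now we remove the duplicates from the App Store dataset.  Because there are only two, we will find the positions of the duplicate entries and delete them individually.  Note that the loop below counts from 1, so the list index of each entry is one less than the printed number, and deleting a row shifts every later row down by one.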

In [23]:
count = 0
for app in ios:
    count += 1  # 1-based position; the list index is count - 1
    name = app[1]
    if name == 'Mannequin Challenge' or name == 'VR Roller Coaster':
        print(count,app)

2949 ['1173990889', 'Mannequin Challenge', '109705216', 'USD', '0.0', '668', '87', '3.0', '3.0', '1.4', '9+', 'Games', '37', '4', '1', '1']
4443 ['952877179', 'VR Roller Coaster', '169523200', 'USD', '0.0', '107', '102', '3.5', '3.5', '2.0.0', '4+', 'Games', '37', '5', '1', '1']
4464 ['1178454060', 'Mannequin Challenge', '59572224', 'USD', '0.0', '105', '58', '4.0', '4.5', '1.0.1', '4+', 'Games', '38', '5', '1', '1']
4832 ['1089824278', 'VR Roller Coaster', '240964608', 'USD', '0.0', '67', '44', '3.5', '4.0', '0.81', '4+', 'Games', '38', '0', '1', '1']


In [24]:
print(ios[4441:4444])

[['1080345135', 'Heavy Metal Tennis Training', '64642048', 'USD', '0.0', '107', '12', '3.0', '4.0', '1.2.2', '9+', 'Games', '40', '5', '1', '1'], ['952877179', 'VR Roller Coaster', '169523200', 'USD', '0.0', '107', '102', '3.5', '3.5', '2.0.0', '4+', 'Games', '37', '5', '1', '1'], ['353573707', 'Coaster VR, Extreme Endless 3D Stereograph', '30977024', 'USD', '0.99', '107', '0', '2.5', '0.0', '1.1.5', '4+', 'Entertainment', '37', '5', '1', '1']]


In [25]:
del ios[4442]
print(ios[4441:4444])

[['1080345135', 'Heavy Metal Tennis Training', '64642048', 'USD', '0.0', '107', '12', '3.0', '4.0', '1.2.2', '9+', 'Games', '40', '5', '1', '1'], ['353573707', 'Coaster VR, Extreme Endless 3D Stereograph', '30977024', 'USD', '0.99', '107', '0', '2.5', '0.0', '1.1.5', '4+', 'Entertainment', '37', '5', '1', '1'], ['1148053146', 'Osteya: Adventures', '65756160', 'USD', '0.0', '107', '5', '3.5', '5.0', '2.0.2', '9+', 'Games', '37', '5', '1', '1']]


In [26]:
print(ios[4461:4464])

[['625411864', 'Sproggiwood', '438601728', 'USD', '4.99', '105', '37', '4.5', '4.5', '1.2.10', '12+', 'Games', '40', '5', '1', '1'], ['1178454060', 'Mannequin Challenge', '59572224', 'USD', '0.0', '105', '58', '4.0', '4.5', '1.0.1', '4+', 'Games', '38', '5', '1', '1'], ['1041406978', 'DOFUS Touch', '3366912', 'USD', '0.0', '104', '3', '4.0', '4.0', '1.9.28', '12+', 'Games', '37', '5', '6', '1']]


In [27]:
del ios[4462]
print(ios[4461:4464])

[['625411864', 'Sproggiwood', '438601728', 'USD', '4.99', '105', '37', '4.5', '4.5', '1.2.10', '12+', 'Games', '40', '5', '1', '1'], ['1041406978', 'DOFUS Touch', '3366912', 'USD', '0.0', '104', '3', '4.0', '4.0', '1.9.28', '12+', 'Games', '37', '5', '6', '1'], ['1095681176', "Ninja Dude vs Zombies - endless tap 'n' slash zombie arcade game", '83969024', 'USD', '0.0', '104', '17', '4.0', '4.0', '1.2.1', '12+', 'Games', '38', '5', '10', '1']]
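Deleting rows one at a time by hand means accounting for the index shift after each deletion. A possible alternative is to collect the indexes first and delete from the end of the list. This is a sketch under assumptions: the rows and ids below are made up, and the sketch keeps the first occurrence of each name, which is not necessarily the copy kept in the cells above.

```python
# Hypothetical stand-in rows: [id, track_name]; the ids are made up.
sample_rows = [
    ['1', 'Mannequin Challenge'],
    ['2', 'Facebook'],
    ['3', 'VR Roller Coaster'],
    ['4', 'Mannequin Challenge'],
    ['5', 'VR Roller Coaster'],
]

seen = set()
duplicate_indexes = []
for index, row in enumerate(sample_rows):
    name = row[1]
    if name in seen:
        duplicate_indexes.append(index)  # a later occurrence of a known name
    else:
        seen.add(name)

# Delete from the end so that earlier indexes are not shifted by each deletion.
for index in sorted(duplicate_indexes, reverse=True):
    del sample_rows[index]

print(duplicate_indexes)
print([row[1] for row in sample_rows])
```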


## Cleaning Part 3: Removing non-English entries

Some of the entries have titles in other languages (examples below), so we will identify and remove those entries by first identifying titles that contain characters not commonly used in English.

In [28]:
print(ios[813])
print('\n')
print(android_clean[4412])

['445375097', '爱奇艺PPS -《欢乐颂2》电视剧热播', '224617472', 'USD', '0.0', '14844', '0', '4.0', '0.0', '6.3.3', '17+', 'Entertainment', '38', '5', '3', '1']


['中国語 AQリスニング', 'FAMILY', 'NaN', '21', '17M', '5,000+', 'Free', '0', 'Everyone', 'Education', '22-Jun-16', '2.4.0', '4.0 and up']


### Step 1: Identifying entries with non-English characters

In [29]:
def eng_char(string):
    for character in string:
        if ord(character) > 127:
            return False

    return True

print(eng_char('Instagram'))
    

True


In [30]:
print(eng_char('Instagram'))
print(eng_char('爱奇艺PPS -《欢乐颂2》电视剧热播'))

True
False


In [31]:
print(eng_char('Docs To Go™ Free Office Suite'))
print(eng_char('Instachat 😜'))

print(ord('™'))
print(ord('😜'))

False
False
8482
128540


### Step 2: Keeping entries that are English but include non-ASCII characters.

We will try to keep mostly English titles by keeping entries that have no more than three non-ASCII characters (characters whose code point is above 127).

In [32]:
def eng_char(string):
    non_ascii = 0
    
    for character in string:
        if ord(character) > 127:
            non_ascii += 1
    
    if non_ascii > 3:
        return False
    else:
        return True

print(eng_char('Docs To Go™ Free Office Suite'))
print(eng_char('Instachat 😜'))

True
True
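The limit of three is a judgment call, so it can help to make it a parameter and try a few values. Below is a minimal sketch of the same check. The name `is_mostly_english` and the `max_non_ascii` parameter are assumptions introduced for this illustration.

```python
def is_mostly_english(name, max_non_ascii=3):
    """Return True if `name` has at most `max_non_ascii` characters outside ASCII."""
    non_ascii = sum(1 for character in name if ord(character) > 127)
    return non_ascii <= max_non_ascii


print(is_mostly_english('Docs To Go™ Free Office Suite'))
print(is_mostly_english('Instachat 😜'))
print(is_mostly_english('爱奇艺PPS -《欢乐颂2》电视剧热播'))

# A limit of 0 reproduces the stricter Step 1 behaviour.
print(is_mostly_english('Docs To Go™ Free Office Suite', max_non_ascii=0))
```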


In [33]:
android_english = []
ios_english = []

for app in android_clean:
    name = app[0]
    if eng_char(name):
        android_english.append(app)

for app in ios:
    name = app[1]
    if eng_char(name):
        ios_english.append(app)
        
explore_data(android_english, 0,3,True)
print('\n')
explore_data(ios_english, 0,3,True)

['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', '7-Jan-18', '1.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', '1-Aug-18', '1.2.4', '4.0.3 and up']


['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', '8-Jun-18', 'Varies with device', '4.2 and up']


Number of rows: 9614
Number of columns: 13


['284882215', 'Facebook', '389879808', 'USD', '0.0', '2974676', '212', '3.5', '3.5', '95.0', '4+', 'Social Networking', '37', '1', '29', '1']


['389801252', 'Instagram', '113954816', 'USD', '0.0', '2161558', '1289', '4.5', '4.0', '10.23', '12+', 'Photo & Video', '37', '0', '29', '1']


['529479190', 'Clash of Clans', '116476928', 'USD', '0.0', '2130805', '579', '4.5', '4.5', '9.24.12', '9+', 'Games', '38', '5

We see that we are now left with 9614 Android apps and 6181 iOS apps.

## Cleaning Part 4: Isolating Free apps

In [34]:
android_final = []
ios_final = []

for app in android_english:
    price = app[7]
    if price == '0':
        android_final.append(app)
        
for app in ios_english:
    price = app[4]
    if price == '0.0':
        ios_final.append(app)

print(len(android_final))
print(len(ios_final))

8864
3220


We are now left with datasets that have 8864 Android apps and 3220 iOS apps, which is enough for our analysis.
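The filter above compares price strings exactly (`'0'` for Google Play, `'0.0'` for the App Store), so it depends on how each file happens to write a zero price. A more tolerant check could parse the number instead. This is only a sketch: `is_free` is a hypothetical helper, and the handling of a leading `$` assumes that paid prices may carry a currency symbol.

```python
def is_free(price_text):
    """Return True when a price string such as '0', '0.0' or '$0.00' means free."""
    cleaned = price_text.strip().lstrip('$')
    try:
        return float(cleaned) == 0.0
    except ValueError:
        # Unparseable prices are treated as not free in this sketch.
        return False


print(is_free('0'))
print(is_free('0.0'))
print(is_free('$4.99'))
```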

# Analysis: Most Popular Apps by Genre
Now that we have cleaned up data sets, we want to analyze the data sets to determine which apps are likely to attract more users.

## Rationale
The general validation strategy for an app idea moves in three steps:
1.  Start off with a minimal Android version of the app that is then added to Google Play store.
2.  If the app has attracted a significant number of users, the app will be developed further.
3.  After 6 months, if the app is profitable, then an iOS version of the app will be built and added to the App Store.

Recognizing that a successful app will be one that is popular on both Google Play and the App Store, we will try to identify characteristics of apps that fit those criteria.  So first, we will look at the genres of apps that are popular on each market.

To begin, we will keep this in mind as we explore the genres for each market.  In the App Store data set, we will explore the ```prime_genre``` column and in the Google Play data set, we will explore the ```Genres``` and ```Category``` columns.

## Building functions for analysis

Two functions are built so that we can easily create frequency tables.
- Frequency table generating function that shows percentages
- Displaying percentages in descending order

In [35]:
def freq_table(dataset, index):
    table = {}
    total = 0
    
    for row in dataset:
        total += 1
        value = row[index]
        if value in table:
            table[value] += 1
        else:
            table[value] = 1
            
    table_percentages = {}
    for key in table:
        percentage = (table[key] / total) *100
        table_percentages[key] = percentage
        
    return table_percentages

In [36]:
def display_table(dataset, index):
    table = freq_table(dataset, index)
    table_display = []
    for key in table:
        key_val_as_tuple = (table[key], key)
        table_display.append(key_val_as_tuple)

    table_sorted = sorted(table_display, reverse = True)
    for entry in table_sorted:
        print(entry[1], ':', round(entry[0],2),'%')
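For comparison, the same percentage table can be built with `collections.Counter`, whose `most_common()` method already returns entries from most to least frequent. This is a sketch rather than a replacement for the functions above; `freq_table_counter` and the sample rows are hypothetical.

```python
from collections import Counter


def freq_table_counter(dataset, index):
    """Return {value: percentage} for one column, most frequent value first."""
    counts = Counter(row[index] for row in dataset)
    total = sum(counts.values())
    return {value: count / total * 100 for value, count in counts.most_common()}


# Hypothetical stand-in rows: [name, genre].
sample_rows = [
    ['a', 'Games'],
    ['b', 'Games'],
    ['c', 'Education'],
    ['d', 'Games'],
]

sample_table = freq_table_counter(sample_rows, 1)
for genre, percentage in sample_table.items():
    print(genre, ':', round(percentage, 2), '%')
```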

## Creating a frequency table for iOS
Examining the frequency table for the ```prime_genre``` column of the App Store data set.

In [37]:
display_table(ios_final,11)

Games : 58.14 %
Entertainment : 7.89 %
Photo & Video : 4.97 %
Education : 3.66 %
Social Networking : 3.29 %
Shopping : 2.61 %
Utilities : 2.52 %
Sports : 2.14 %
Music : 2.05 %
Health & Fitness : 2.02 %
Productivity : 1.74 %
Lifestyle : 1.58 %
News : 1.34 %
Travel : 1.24 %
Finance : 1.12 %
Weather : 0.87 %
Food & Drink : 0.81 %
Reference : 0.56 %
Business : 0.53 %
Book : 0.43 %
Navigation : 0.19 %
Medical : 0.19 %
Catalogs : 0.12 %


Observing the list of genres, we see that games dominate by far at 58.14%, while the genre with the next largest share is Entertainment at 7.89%.  Additionally, the fewest apps are found in specialized fields like 'Business' or 'Medical' and in genres made for reference or reading like 'Reference', 'Book', 'Navigation' and 'Catalogs', which may not have that many variations on the apps in general.

## Creating a frequency table for Android using the category and genre fields

In [38]:
display_table(android_final,1)

FAMILY : 18.91 %
GAME : 9.72 %
TOOLS : 8.46 %
BUSINESS : 4.59 %
LIFESTYLE : 3.9 %
PRODUCTIVITY : 3.89 %
FINANCE : 3.7 %
MEDICAL : 3.53 %
SPORTS : 3.4 %
PERSONALIZATION : 3.32 %
COMMUNICATION : 3.24 %
HEALTH_AND_FITNESS : 3.08 %
PHOTOGRAPHY : 2.94 %
NEWS_AND_MAGAZINES : 2.8 %
SOCIAL : 2.66 %
TRAVEL_AND_LOCAL : 2.34 %
SHOPPING : 2.25 %
BOOKS_AND_REFERENCE : 2.14 %
DATING : 1.86 %
VIDEO_PLAYERS : 1.79 %
MAPS_AND_NAVIGATION : 1.4 %
FOOD_AND_DRINK : 1.24 %
EDUCATION : 1.16 %
ENTERTAINMENT : 0.96 %
LIBRARIES_AND_DEMO : 0.94 %
AUTO_AND_VEHICLES : 0.93 %
HOUSE_AND_HOME : 0.82 %
WEATHER : 0.8 %
EVENTS : 0.71 %
PARENTING : 0.65 %
ART_AND_DESIGN : 0.64 %
COMICS : 0.62 %
BEAUTY : 0.6 %


Taking a look at this data, the family category has the most apps at 18.91%, while games is next at 9.72%.  The category list seems to be more broad than the genres list.  But games does seem to be consistently common relative to the other categories.

In [39]:
display_table(android_final,9)

Tools : 8.45 %
Entertainment : 6.07 %
Education : 5.35 %
Business : 4.59 %
Productivity : 3.89 %
Lifestyle : 3.89 %
Finance : 3.7 %
Medical : 3.53 %
Sports : 3.46 %
Personalization : 3.32 %
Communication : 3.24 %
Action : 3.1 %
Health & Fitness : 3.08 %
Photography : 2.94 %
News & Magazines : 2.8 %
Social : 2.66 %
Travel & Local : 2.32 %
Shopping : 2.25 %
Books & Reference : 2.14 %
Simulation : 2.04 %
Dating : 1.86 %
Arcade : 1.85 %
Video Players & Editors : 1.77 %
Casual : 1.76 %
Maps & Navigation : 1.4 %
Food & Drink : 1.24 %
Puzzle : 1.13 %
Racing : 0.99 %
Role Playing : 0.94 %
Libraries & Demo : 0.94 %
Auto & Vehicles : 0.93 %
Strategy : 0.91 %
House & Home : 0.82 %
Weather : 0.8 %
Events : 0.71 %
Adventure : 0.68 %
Comics : 0.61 %
Beauty : 0.6 %
Art & Design : 0.6 %
Parenting : 0.5 %
Card : 0.45 %
Casino : 0.43 %
Trivia : 0.42 %
Educational;Education : 0.39 %
Board : 0.38 %
Educational : 0.37 %
Education;Education : 0.34 %
Word : 0.26 %
Casual;Pretend Play : 0.24 %
Music : 0.2 %
R

Observing the Genres field, 'Tools' has the most apps at 8.45% while 'Entertainment' is next at 6.07% of the total apps.  However, the Google Play store seems to be more evenly spread across genres (whether entertainment/games or practical/educational).

So overall, there is still a good number of apps created for games and entertainment, but to find out whether those actually attract the most users, we'll need to analyze the data more.  Additionally, these data sets are restricted to free applications aimed at English-speaking users, so we must keep that in mind when drawing conclusions.

## Analysis on Number of Reviews/Installs of Each App

One can assess the popularity of an app, to a certain extent, by analyzing its number of installs.  So for Google Play, we will analyze the ```Installs``` column, but since that information is not in the App Store data set, we will use ```rating_count_tot``` as a substitute measure of popularity there.

In [40]:
genres_ios = freq_table(ios_final,11)

In [41]:
for genre in genres_ios:
    total = 0
    len_genre = 0
    for app in ios_final:
        genre_app = app[11]
        if genre_app == genre:
            n_ratings = float(app[5])
            total += n_ratings
            len_genre += 1
    avg_n_ratings = total / len_genre
    print(genre, ':', round(avg_n_ratings))

Social Networking : 71548
Photo & Video : 28442
Games : 22813
Music : 57327
Reference : 74942
Health & Fitness : 23298
Weather : 52280
Utilities : 18684
Travel : 28244
Shopping : 26920
News : 21248
Navigation : 86090
Lifestyle : 16486
Entertainment : 14030
Food & Drink : 33334
Sports : 23009
Book : 39758
Finance : 31468
Education : 7004
Productivity : 21028
Business : 7491
Catalogs : 4004
Medical : 612


According to the average number of ratings for the App Store by genre, the apps with the top number of ratings are the Navigation apps (86,090 ratings) and reference apps (74,942 ratings).  The games and entertainment apps did not have as many ratings.
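The nested loop used for these averages rescans the whole dataset once per genre. A single-pass version is sketched below. `average_by_group` is a hypothetical helper and the sample rows are made up, so the printed numbers are not results from the datasets.

```python
from collections import defaultdict


def average_by_group(rows, group_index, value_index):
    """Return {group: mean of the numeric column} using one pass over the rows."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for row in rows:
        group = row[group_index]
        totals[group] += float(row[value_index])
        counts[group] += 1
    return {group: totals[group] / counts[group] for group in totals}


# Made-up stand-in rows: [name, genre, rating count].
sample_rows = [
    ['a', 'Games', '10'],
    ['b', 'Games', '30'],
    ['c', 'Education', '5'],
]

sample_averages = average_by_group(sample_rows, 1, 2)
print(sample_averages)
```

On the real data the call would presumably be `average_by_group(ios_final, 11, 5)`, matching the column indexes used in the cell above.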

In [42]:
for app in ios_final:
    if app[11] == 'Navigation':
        print(app[1],':', app[5])

Waze - GPS Navigation, Maps & Real-time Traffic : 345046
Google Maps - Navigation & Transit : 154911
Geocaching® : 12811
CoPilot GPS – Car Navigation & Offline Maps : 3582
ImmobilienScout24: Real Estate Search in Germany : 187
Railway Route Search : 5


It looks like the ratings are heavily concentrated in the Waze and Google Maps apps, which gives the Navigation genre a much higher average rating count.  However, this is a very narrow market to enter because there does not seem to be much variety among the apps in this category.  Therefore, it may be difficult to create a popular app in this field.
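One way to make this skew visible is to put the median next to the mean. The sketch below uses the six Navigation rating counts printed above. Choosing the median as the comparison is a suggestion made here, not something the original analysis does.

```python
from statistics import mean, median

# Rating counts of the six Navigation apps listed above.
navigation_ratings = [345046, 154911, 12811, 3582, 187, 5]

print(round(mean(navigation_ratings)))   # 86090, the genre average reported earlier
print(median(navigation_ratings))        # 8196.5

# Share of all Navigation ratings held by the two largest apps.
top_two_share = sum(sorted(navigation_ratings, reverse=True)[:2]) / sum(navigation_ratings)
print(round(top_two_share, 3))
```

The median is roughly a tenth of the mean, which is consistent with two apps carrying most of the genre's ratings.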

In [43]:
for app in ios_final:
    if app[11] == 'Reference':
        print(app[1],':', app[5])

Bible : 985920
Dictionary.com Dictionary & Thesaurus : 200047
Dictionary.com Dictionary & Thesaurus for iPad : 54175
Google Translate : 26786
Muslim Pro: Ramadan 2017 Prayer Times, Azan, Quran : 18418
New Furniture Mods - Pocket Wiki & Game Tools for Minecraft PC Edition : 17588
Merriam-Webster Dictionary : 16849
Night Sky : 12122
City Maps for Minecraft PE - The Best Maps for Minecraft Pocket Edition (MCPE) : 8535
LUCKY BLOCK MOD ™ for Minecraft PC Edition - The Best Pocket Wiki & Mods Installer Tools : 4693
GUNS MODS for Minecraft PC Edition - Mods Tools : 1497
Guides for Pokémon GO - Pokemon GO News and Cheats : 826
WWDC : 762
Horror Maps for Minecraft PE - Download The Scariest Maps for Minecraft Pocket Edition (MCPE) Free : 718
VPN Express : 14
Real Bike Traffic Rider Virtual Reality Glasses : 8
教えて!goo : 0
Jishokun-Japanese English Dictionary & Translator : 0


Within the Reference genre, there is more variety, but again a few apps have a very large number of ratings while many others have very few, so it is difficult to say whether we can break into this market.

Let's take a look at a different genre to get a feel for the distribution of ratings.  For example, Education.

In [44]:
for app in ios_final:
    if app[11] == 'Education':
        print(app[1],':', app[5])

Duolingo - Learn Spanish, French and more : 162701
Guess My Age  Math Magic : 123190
Lumosity - Brain Training : 96534
Elevate - Brain Training and Games : 58092
Fit Brains Trainer : 46363
ClassDojo : 35440
Memrise: learn languages : 20383
Peak - Brain Training : 20322
Canvas by Instructure : 19981
ABCmouse.com - Early Learning Academy : 18749
Quizlet: Study Flashcards, Languages & Vocabulary : 16683
Photomath - Camera Calculator : 16523
iTunes U : 15801
Blackboard Mobile Learn™ : 13567
Star Chart : 13482
Remind: Fast, Efficient School Messaging : 9796
PBS KIDS Video : 8651
Toca Kitchen Monsters : 8062
Toca Hair Salon - Christmas Gift : 8049
Edmodo : 7197
Prodigy Math Game : 6683
Epic! - Unlimited Books for Kids : 6676
ChineseSkill -Learn Mandarin Chinese Language Free : 6077
Google Classroom : 5942
TED : 5782
Khan Academy: you can learn anything : 5459
Got It - Homework Help Math, Chem, Physics Solver : 4903
PowerSchool Mobile : 4547
SkyView® Free - Explore the Universe : 4188
Hopsco

The Education genre has ratings spread more evenly across a variety of apps, so this may be an arena to enter without having to compete against a few giants in the field.  However, these conclusions are based on the number of ratings, not on the number of installs or on whether the ratings are good or bad.  So we will take a look at the Google Play data, which has more information to help us draw clearer conclusions.

## Google Play data

We will explore some of the Google Play data based on the number of installs.

In [45]:
display_table(android_final,5)

1,000,000+ : 15.73 %
100,000+ : 11.55 %
10,000,000+ : 10.55 %
10,000+ : 10.2 %
1,000+ : 8.39 %
100+ : 6.92 %
5,000,000+ : 6.83 %
500,000+ : 5.56 %
50,000+ : 4.77 %
5,000+ : 4.51 %
10+ : 3.54 %
500+ : 3.25 %
50,000,000+ : 2.3 %
100,000,000+ : 2.13 %
50+ : 1.92 %
5+ : 0.79 %
1+ : 0.51 %
500,000,000+ : 0.27 %
1,000,000,000+ : 0.23 %
0+ : 0.05 %
0 : 0.01 %


Because the number of installs are in ranges and not specific numbers, we are going to simplify the number ranges to just the numbers themselves.  For example, instead of 5+ installs, we are just going to say 5 installs so we can do calculations.
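The conversion just described can be sketched as a small helper. `parse_installs` is a hypothetical name; it returns the lower bound of each range, so the averages computed from it are lower-bound estimates rather than exact install counts.

```python
def parse_installs(text):
    """Convert a range label such as '1,000,000+' to its lower bound as a float."""
    return float(text.replace(',', '').replace('+', ''))


print(parse_installs('5+'))           # 5.0
print(parse_installs('1,000,000+'))   # 1000000.0
print(parse_installs('0'))            # 0.0
```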

In [46]:
categories_android = freq_table(android_final,1)

for category in categories_android:
    total = 0
    len_category = 0
    for app in android_final:
        category_app = app[1]
        if category_app == category:
            n_install = app[5]
            n_install = n_install.replace(',','')
            n_install = n_install.replace('+','')
            total += float(n_install)
            len_category += 1
    avg_installs_per_category = total / len_category
    print(category, ':', round(avg_installs_per_category))

ART_AND_DESIGN : 1986335
AUTO_AND_VEHICLES : 647318
BEAUTY : 513152
BOOKS_AND_REFERENCE : 8767812
BUSINESS : 1712290
COMICS : 817657
COMMUNICATION : 38456119
DATING : 854029
EDUCATION : 1833495
ENTERTAINMENT : 11640706
EVENTS : 253542
FINANCE : 1387692
FOOD_AND_DRINK : 1924898
HEALTH_AND_FITNESS : 4188822
HOUSE_AND_HOME : 1331541
LIBRARIES_AND_DEMO : 638504
LIFESTYLE : 1437816
GAME : 15588016
FAMILY : 3695642
MEDICAL : 120551
SOCIAL : 23253652
SHOPPING : 7036877
PHOTOGRAPHY : 17840110
SPORTS : 3638640
TRAVEL_AND_LOCAL : 13984078
TOOLS : 10801391
PERSONALIZATION : 5201483
PRODUCTIVITY : 16787331
PARENTING : 542604
WEATHER : 5074486
VIDEO_PLAYERS : 24727872
NEWS_AND_MAGAZINES : 9549178
MAPS_AND_NAVIGATION : 4056942


Looking at these averages, COMMUNICATION apps have the most installs at 38,456,119 on average, while SOCIAL, VIDEO_PLAYERS and PRODUCTIVITY also have a high number of installs.  Looking into the data further, we want to see which apps have the most installs.
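The per-category listings in the following cells repeat the same loop with a different category name each time. One possible way to avoid the repetition is a helper like the one sketched here. `installs_for_category` is hypothetical, and the sample rows are padded stand-ins that reuse install labels shown in this notebook's outputs.

```python
def installs_for_category(rows, category, category_index=1, name_index=0, installs_index=5):
    """Return (installs, name) tuples for one category, largest first."""
    result = []
    for row in rows:
        if row[category_index] == category:
            # Strip ',' and '+' so that '1,000+' becomes 1000.0.
            installs = float(row[installs_index].replace(',', '').replace('+', ''))
            result.append((installs, row[name_index]))
    return sorted(result, reverse=True)


# Stand-in rows padded with empty strings so that installs sit at index 5.
sample_rows = [
    ['BA SALES', 'COMMUNICATION', '', '', '', '1+'],
    ['WhatsApp Messenger', 'COMMUNICATION', '', '', '', '1,000,000,000+'],
    ['Instagram', 'SOCIAL', '', '', '', '1,000,000,000+'],
]

for installs, name in installs_for_category(sample_rows, 'COMMUNICATION'):
    print(name, ':', installs)
```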

In [47]:
android_com_installs = []
android_com_apps = []
android_com_tuples = []

for app in android_final:
    if app[1] == 'COMMUNICATION':
        n_installs = app[5]
        n_installs = n_installs.replace(',', '')
        n_installs = n_installs.replace('+', '')
        n_installs = float(n_installs)
        mytuple = (n_installs, app[0])
        android_com_tuples.append(mytuple)
        android_com_apps.append(app[0])
        android_com_installs.append(n_installs)

android_com_table = []
android_com_table.append(android_com_apps)
android_com_table.append(android_com_installs)

sorted_android_com_table = sorted(android_com_tuples, reverse = True)

for entry in sorted_android_com_table:
    print(entry[1], ':', entry[0])


WhatsApp Messenger : 1000000000.0
Skype - free IM & video calls : 1000000000.0
Messenger – Text and Video Chat for Free : 1000000000.0
Hangouts : 1000000000.0
Google Chrome: Fast & Secure : 1000000000.0
Gmail : 1000000000.0
imo free video calls and chat : 500000000.0
Viber Messenger : 500000000.0
UC Browser - Fast Download Private & Secure : 500000000.0
LINE: Free Calls & Messages : 500000000.0
Google Duo - High Quality Video Calls : 500000000.0
imo beta free calls and text : 100000000.0
Yahoo Mail – Stay Organized : 100000000.0
Who : 100000000.0
WeChat : 100000000.0
UC Browser Mini -Tiny Fast Private & Secure : 100000000.0
Truecaller: Caller ID, SMS spam blocking & Dialer : 100000000.0
Telegram : 100000000.0
Opera Mini - fast web browser : 100000000.0
Opera Browser: Fast and Secure : 100000000.0
Messenger Lite: Free Calls & Messages : 100000000.0
Kik : 100000000.0
KakaoTalk: Free Calls & Text : 100000000.0
GO SMS Pro - Messenger, Free Themes, Emoji : 100000000.0
Firefox Browser fast &

FO PSA Sept-Fons : 100.0
FO OP St-Nazaire : 100.0
FO Interim : 100.0
FO AIRBUS Nantes : 100.0
EP RSS Reader : 100.0
EHiN-FH conferenceapp : 100.0
DG Card : 100.0
Cy Messenger : 100.0
Carpooling FH Hagenberg : 100.0
CW Bluetooth SPP : 100.0
CJ DVD Rentals : 100.0
CJ Browser - Fast & Private : 100.0
CF Chat: Connecting Friends : 100.0
C W Browser : 100.0
Bee'ah Employee App : 100.0
BV : 100.0
Amadeus GR & CY : 100.0
Aj.Petra : 100.0
Hyundai CX Conference : 50.0
DK TEL Dialer : 50.0
Cb browser : 50.0
BS-Mobile : 50.0
AC-BL : 50.0
ei : 10.0
Oklahoma Ag Co-op Council : 10.0
Hlášenírozhlasu.cz : 10.0
FP Live : 10.0
FNH Payment Info : 10.0
FN Web Radio : 10.0
Ek IRA : 10.0
EO Mumbai : 10.0
EJ messenger : 10.0
DM - The Offical Messaging App : 10.0
DK Browser : 10.0
CK Call NEW : 10.0
Best Browser BD social networking : 10.0
BJ - Confidential : 10.0
Test Server SMS FA : 5.0
Of the wall Arapaho bk : 5.0
BA SALES : 1.0


In [48]:
android_ent_installs = []
android_ent_apps = []
android_ent_tuples = []

for app in android_final:
    if app[1] == 'SOCIAL':
        n_installs = app[5]
        n_installs = n_installs.replace(',', '')
        n_installs = n_installs.replace('+', '')
        n_installs = float(n_installs)
        mytuple = (n_installs, app[0])
        android_ent_tuples.append(mytuple)
        android_ent_apps.append(app[0])
        android_ent_installs.append(n_installs)

android_ent_table = []
android_ent_table.append(android_ent_apps)
android_ent_table.append(android_ent_installs)

sorted_android_ent_table = sorted(android_ent_tuples, reverse = True)

for entry in sorted_android_ent_table:
    print(entry[1], ':', entry[0])

Instagram : 1000000000.0
Google+ : 1000000000.0
Facebook : 1000000000.0
Snapchat : 500000000.0
Facebook Lite : 500000000.0
VK : 100000000.0
Tumblr : 100000000.0
Tik Tok - including musical.ly : 100000000.0
Tango - Live Video Broadcast : 100000000.0
Pinterest : 100000000.0
LinkedIn : 100000000.0
Badoo - Free Chat & Dating App : 100000000.0
BIGO LIVE - Live Stream : 100000000.0
ooVoo Video Calls, Messaging & Stories : 50000000.0
Zello PTT Walkie Talkie : 50000000.0
SKOUT - Meet, Chat, Go Live : 50000000.0
POF Free Dating App : 50000000.0
MeetMe: Chat & Meet New People : 50000000.0
textPlus: Free Text & Calls : 10000000.0
magicApp Calling & Messaging : 10000000.0
YouNow: Live Stream Video Chat : 10000000.0
We Heart It : 10000000.0
Waplog - Free Chat, Dating App, Meet Singles : 10000000.0
TextNow - free text + calls : 10000000.0
Text free - Free Text + Call : 10000000.0
Text Me: Text Free, Call Free, Second Phone Number : 10000000.0
Tapatalk - 100,000+ Forums : 10000000.0
Tagged - Meet, Ch

In [49]:
# Same listing as the previous cell, this time for the VIDEO_PLAYERS category.
android_ent_installs = []
android_ent_apps = []
android_ent_tuples = []

for app in android_final:
    if app[1] == 'VIDEO_PLAYERS':
        n_installs = app[5]
        n_installs = n_installs.replace(',', '')
        n_installs = n_installs.replace('+', '')
        n_installs = float(n_installs)
        mytuple = (n_installs, app[0])
        android_ent_tuples.append(mytuple)
        android_ent_apps.append(app[0])
        android_ent_installs.append(n_installs)

android_ent_table = []
android_ent_table.append(android_ent_apps)
android_ent_table.append(android_ent_installs)

# Most-installed apps first; ties are ordered by app name, descending.
sorted_android_ent_table = sorted(android_ent_tuples, reverse = True)

for entry in sorted_android_ent_table:
    print(entry[1], ':', entry[0])

YouTube : 1000000000.0
Google Play Movies & TV : 1000000000.0
MX Player : 500000000.0
VivaVideo - Video Editor & Photo Movie : 100000000.0
VideoShow-Video Editor, Video Maker, Beauty Camera : 100000000.0
VLC for Android : 100000000.0
Motorola Gallery : 100000000.0
Motorola FM Radio : 100000000.0
Dubsmash : 100000000.0
Vote for : 50000000.0
Vigo Video : 50000000.0
VMate : 50000000.0
Samsung Video Library : 50000000.0
Ringdroid : 50000000.0
MiniMovie - Free Video and Slideshow Editor : 50000000.0
LIKE – Magic Video Maker & Community : 50000000.0
KineMaster – Pro Video Editor : 50000000.0
HD Video Downloader : 2018 Best video mate : 50000000.0
DU Recorder – Screen Recorder, Video Editor, Live : 50000000.0
video player for android : 10000000.0
iMediaShare – Photos & Music : 10000000.0
YouTube Studio : 10000000.0
Video Player All Format : 10000000.0
Video Downloader - for Instagram Repost App : 10000000.0
Video Downloader : 10000000.0
Ustream : 10000000.0
Quik – Free Video Editor for photos

The communication category appears to be dominated by a few very widely used apps, while most of its other apps have far fewer installs.  Those few giants pull the category average up, which makes the category as a whole seem more popular than it is for a typical app.  The entertainment category, by contrast, shows installs spread more evenly across its apps, so one strategy may be to enter that market with new apps that have room to grow in popularity.
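
The effect of a few very large apps on an average can be seen by comparing the mean with the median. The sketch below is only an illustration: it uses a handful of install counts copied from the communication listing above rather than the full category, and the names `communication_sample` and `summarize` are introduced here for the example.

```python
from statistics import mean, median

# A handful of install counts copied from the communication listing above.
# This is an illustrative subset, not the whole category.
communication_sample = {
    'Telegram': 100_000_000,
    'Kik': 100_000_000,
    'KakaoTalk: Free Calls & Text': 100_000_000,
    'EP RSS Reader': 100,
    'DG Card': 100,
    'DK Browser': 10,
    'BA SALES': 1,
}

def summarize(installs):
    """Return (mean, median) for an iterable of install counts."""
    values = list(installs)
    return mean(values), median(values)

sample_mean, sample_median = summarize(communication_sample.values())
print('mean installs  :', round(sample_mean))
print('median installs:', sample_median)
```

In this subset the mean lands in the tens of millions while the median is only 100, which is the kind of gap that suggests an average is being driven by a few apps.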

Let's drill down further and compute the average number of installs per genre, since the genre list is more granular than the category list and may provide more insight.
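
As a side note, the per-genre average can also be computed in a single pass over the data instead of rescanning every app for each genre. The sketch below is one possible way to do that, not the notebook's own method: `parse_installs`, `average_installs_by_genre`, and the sample rows are hypothetical names and made-up data, and the column positions (9 for genres, 5 for installs) are assumed to match the Google Play header shown earlier.

```python
from collections import defaultdict

def parse_installs(raw):
    """Turn a string such as '10,000+' into the float 10000.0."""
    return float(raw.replace(',', '').replace('+', ''))

def average_installs_by_genre(rows, genre_index=9, installs_index=5):
    """Return (average installs, genre) tuples, largest average first."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for row in rows:
        genre = row[genre_index]
        totals[genre] += parse_installs(row[installs_index])
        counts[genre] += 1
    averages = [(totals[genre] / counts[genre], genre) for genre in totals]
    return sorted(averages, reverse=True)

def make_row(name, genre, installs):
    """Build a 13-column row with only the columns this example needs."""
    row = [''] * 13
    row[0], row[5], row[9] = name, installs, genre
    return row

# Made-up rows, used only to exercise the function.
sample_rows = [
    make_row('App A', 'Puzzle', '1,000+'),
    make_row('App B', 'Puzzle', '3,000+'),
    make_row('App C', 'Arcade', '10,000+'),
]

for average, genre in average_installs_by_genre(sample_rows):
    print(genre, ':', round(average))
```

Calling `average_installs_by_genre(android_final)` would be the equivalent usage in this notebook, assuming `android_final` has the same column layout.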

In [50]:
# Frequency table of the 'Genres' column (index 9); iterating over it yields each distinct genre string.
genres_android = freq_table(android_final,9)
android_genre_tuples = []

for genre in genres_android:
    total = 0        # sum of installs for this genre
    len_genre = 0    # number of apps in this genre
    for app in android_final:
        genres_app = app[9]
        if genres_app == genre:
            n_install = app[5]
            n_install = n_install.replace(',','')
            n_install = n_install.replace('+','')
            total += float(n_install)
            len_genre += 1
    avg_installs_per_genre = total / len_genre
    mytuple = (avg_installs_per_genre, genre)
    android_genre_tuples.append(mytuple)

# Highest average installs first.
sorted_android_genre_table = sorted(android_genre_tuples, reverse = True)

for entry in sorted_android_genre_table:
    print(entry[1], ':', round(entry[0]))

Communication : 38456119
Adventure;Action & Adventure : 35333333
Video Players & Editors : 24947336
Social : 23253652
Arcade : 22888365
Casual : 19569222
Puzzle;Action & Adventure : 18366667
Photography : 17840110
Educational;Action & Adventure : 17016667
Productivity : 16787331
Racing : 15910646
Travel & Local : 14051476
Casual;Action & Adventure : 12916667
Action : 12603589
Strategy : 11199903
Tools : 10802461
Tools;Education : 10000000
Role Playing;Brain Games : 10000000
Lifestyle;Pretend Play : 10000000
Casual;Music & Video : 10000000
Card;Action & Adventure : 10000000
Adventure;Education : 10000000
News & Magazines : 9549178
Music : 9445583
Educational;Pretend Play : 9375000
Puzzle;Brain Games : 9280667
Word : 9094459
Racing;Action & Adventure : 8816667
Books & Reference : 8767812
Puzzle : 8302862
Video Players & Editors;Music & Video : 7500000
Shopping : 7036877
Role Playing;Action & Adventure : 7000000
Casual;Pretend Play : 6957143
Entertainment;Music & Video : 6413333
Action;Ac

The Games category showed evidence of high popularity earlier, and the genre-level averages in the Google Play store seem to confirm that trend.  The "Action & Adventure" genre looks especially popular: it appears alongside many different primary genres (Adventure, Puzzle, Educational, Casual, and others), so it covers a wide variety of apps.

In [51]:
# Recompute the per-genre averages, then print only the genres that include 'Action & Adventure'.
genres_android = freq_table(android_final,9)
android_genre_tuples = []

for genre in genres_android:
    total = 0
    len_genre = 0
    for app in android_final:
        genres_app = app[9]
        if genres_app == genre:
            n_install = app[5]
            n_install = n_install.replace(',','')
            n_install = n_install.replace('+','')
            total += float(n_install)
            len_genre += 1
    avg_installs_per_genre = total / len_genre
    mytuple = (avg_installs_per_genre, genre)
    android_genre_tuples.append(mytuple)

sorted_android_genre_table = sorted(android_genre_tuples, reverse = True)

for entry in sorted_android_genre_table:
    if 'Action & Adventure' in entry[1]:
        print(entry[1], ':', round(entry[0]))

Adventure;Action & Adventure : 35333333
Puzzle;Action & Adventure : 18366667
Educational;Action & Adventure : 17016667
Casual;Action & Adventure : 12916667
Card;Action & Adventure : 10000000
Racing;Action & Adventure : 8816667
Role Playing;Action & Adventure : 7000000
Action;Action & Adventure : 5888889
Sports;Action & Adventure : 5050000
Simulation;Action & Adventure : 4857143
Arcade;Action & Adventure : 3190909
Board;Action & Adventure : 3000000
Entertainment;Action & Adventure : 2333333
Strategy;Action & Adventure : 1000000
Health & Fitness;Action & Adventure : 1000000
Education;Action & Adventure : 1000000
Travel & Local;Action & Adventure : 100000
Art & Design;Action & Adventure : 100000
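
Each line above averages one exact genre string, and some of those strings may describe only a few apps. A complementary view is to split the genre string on `;` and pool every app that carries a given tag, reporting the number of apps next to the average. The sketch below is a minimal illustration under those assumptions: `average_installs_by_tag` and the sample rows are hypothetical, and the column positions are assumed to match the Google Play header.

```python
def average_installs_by_tag(rows, genre_index=9, installs_index=5):
    """Map each genre tag to (average installs, number of apps with that tag)."""
    totals = {}
    counts = {}
    for row in rows:
        installs = float(row[installs_index].replace(',', '').replace('+', ''))
        # 'Puzzle;Action & Adventure' contributes to both 'Puzzle' and 'Action & Adventure'.
        for tag in row[genre_index].split(';'):
            totals[tag] = totals.get(tag, 0.0) + installs
            counts[tag] = counts.get(tag, 0) + 1
    return {tag: (totals[tag] / counts[tag], counts[tag]) for tag in totals}

def make_row(name, genre, installs):
    """Build a 13-column row with only the columns this example needs."""
    row = [''] * 13
    row[0], row[5], row[9] = name, installs, genre
    return row

# Made-up rows, used only to exercise the function.
sample_rows = [
    make_row('App A', 'Adventure;Action & Adventure', '1,000,000+'),
    make_row('App B', 'Puzzle;Action & Adventure', '100,000+'),
    make_row('App C', 'Puzzle', '10,000+'),
]

by_tag = average_installs_by_tag(sample_rows)
for tag, (average, n_apps) in sorted(by_tag.items(), key=lambda item: item[1], reverse=True):
    print(tag, ':', round(average), '(', n_apps, 'apps )')
```

Showing the app count beside each average makes it easier to tell a broadly popular tag from one whose average rests on a couple of apps.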


Based on this genre breakdown, if we want to enter a specific app market, an "Action & Adventure" app that also fits within the Games genre in the App Store looks like a good candidate: it is an arena where new apps seem to have a reasonable chance of eventually becoming popular.