# Profitable App Profiles for the App Store and Google Play Markets

Our company builds free Android and iOS mobile apps which are available on Google Play and in the App Store. Our main source of revenue is from in-app ads. The goal of this project is to analyze Google Play and App Store data to help developers determine what type of apps is most likely to attract users and therefore revenue from in-app ads.

Our validation strategy for app development has three steps: (1) Build a minimal Android version of the app, and add it to Google Play. (2) If the app has a good response from users, we develop it further. (3) If the app is profitable after six months, we build an iOS version of the app and add it to the App Store. We are interested in developing apps that can be used on both platforms.

To perform this analysis, we sampled ~10,000 Android apps on Google Play in August 2018 and ~7,000 iOS apps on the App Store in July 2017. You can find the information about and a download of the Android app data [here](https://www.kaggle.com/lava18/google-play-store-apps) and the same for the iOS app data [here](https://www.kaggle.com/ramamet4/app-store-apple-data-set-10k-apps).



## -+-+-+-+-+-Using pandas-+-+-+-+-+-

In [1]:
# import packages and initialize variables

import pandas as pd

already_deleted = False

In [2]:
# read in datasets and explore 
ios_df = pd.read_csv('AppleStore.csv')
android_df = pd.read_csv('googleplaystore.csv')

print('\niOS')
print(ios_df.info())
print(ios_df.head())
print('\nAndroid')
print(android_df.info())
print(android_df.head())


iOS
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 7197 entries, 0 to 7196
Data columns (total 16 columns):
 #   Column            Non-Null Count  Dtype  
---  ------            --------------  -----  
 0   id                7197 non-null   int64  
 1   track_name        7197 non-null   object 
 2   size_bytes        7197 non-null   int64  
 3   currency          7197 non-null   object 
 4   price             7197 non-null   float64
 5   rating_count_tot  7197 non-null   int64  
 6   rating_count_ver  7197 non-null   int64  
 7   user_rating       7197 non-null   float64
 8   user_rating_ver   7197 non-null   float64
 9   ver               7197 non-null   object 
 10  cont_rating       7197 non-null   object 
 11  prime_genre       7197 non-null   object 
 12  sup_devices.num   7197 non-null   int64  
 13  ipadSc_urls.num   7197 non-null   int64  
 14  lang.num          7197 non-null   int64  
 15  vpp_lic           7197 non-null   int64  
dtypes: float64(3), int64(8), object(5)
me

## Cleaning the data

* Only free apps
* Only apps designed for English speakers
* Remove duplicate apps
* Assume bucketed number of installs is actual number of installs
* Android row 10472 missing Category per [Kaggle discussion](https://www.kaggle.com/lava18/google-play-store-apps/discussion/164101)
* (iOS Kaggle discussion is [here](https://www.kaggle.com/ramamet4/app-store-apple-data-set-10k-apps/discussion))

In [3]:
# remove android row missing category

print(len(android_df))
print(android_df[10471:10474])

if not(already_deleted):
    android_df.drop([10472], inplace = True)
    already_deleted = True

print('\n\n\n')
print(len(android_df))
print(android_df[10471:10474])

10841
                                           App         Category  Rating  \
10471                         Xposed Wi-Fi-Pwd  PERSONALIZATION     3.5   
10472  Life Made WI-Fi Touchscreen Photo Frame              1.9    19.0   
10473                  osmino Wi-Fi: free WiFi            TOOLS     4.2   

      Reviews    Size     Installs  Type     Price Content Rating  \
10471    1042    404k     100,000+  Free         0       Everyone   
10472    3.0M  1,000+         Free     0  Everyone            NaN   
10473  134203    4.1M  10,000,000+  Free         0       Everyone   

                  Genres    Last Updated Current Ver   Android Ver  
10471    Personalization  August 5, 2014       3.0.0  4.0.3 and up  
10472  February 11, 2018          1.0.19  4.0 and up           NaN  
10473              Tools  August 7, 2018     6.06.14    4.4 and up  




10840
                           App         Category  Rating Reviews  Size  \
10471         Xposed Wi-Fi-Pwd  PERSONALIZATION     3.5  

In [4]:
# remove duplicate apps

def remove_duplicate_apps(app_df, name_col_name, reviews_col_name):
    
    # sort by the number of reviews
    app_df.sort_values(by=reviews_col_name, inplace=True)
    app_df.reset_index(drop=True, inplace=True)
    
    # keep only the version with the largest number of reviews (since that's the most recent record)
    app_df.drop_duplicates(subset=[name_col_name], keep='last', inplace=True)
    
    return app_df

ios_df = remove_duplicate_apps(ios_df, 'id', 'rating_count_tot')
android_df = remove_duplicate_apps(android_df, 'App', 'Reviews')

In [5]:
# convert bucketed installs to floats

android_df['Installs'] = android_df['Installs'].str.replace('+', '').str.replace(',', '').astype(float)

  android_df['Installs'] = android_df['Installs'].str.replace('+', '').str.replace(',', '').astype(float)


In [6]:
# remove non-free apps

def remove_non_free_apps(app_df, price_col_name):
    # convert column to float - may only work for certain rows or certain datasets
    try:
        app_df[price_col_name] = app_df[price_col_name].str.replace('$', '').astype(float)
    except:
        pass
    
    # filter to only free apps
    app_df = app_df[app_df[price_col_name] == 0.0]
    
    return app_df
    
ios_df = remove_non_free_apps(ios_df, 'price')
android_df = remove_non_free_apps(android_df, 'Price')

  app_df[price_col_name] = app_df[price_col_name].str.replace('$', '').astype(float)


In [7]:
# remove apps whose titles contain 3 or more non-english characters

# count the number of non-english characters in a string
def count_non_eng_char(string):
    cnt = 0
    for char in string:
        if ord(char) > 127:
            cnt += 1
            
    return cnt


# remove apps with a certain number of non-english characters in the name
def remove_non_eng_apps(app_df, name_col_name):
    
    app_df['num non-eng chars'] = app_df[name_col_name].apply(lambda x: count_non_eng_char(x))
    
    app_df = app_df[app_df['num non-eng chars'] < 3]
    
    return app_df


ios_df = remove_non_eng_apps(ios_df, 'track_name')
android_df = remove_non_eng_apps(android_df, 'App')


## Analyzing the data

We get the number of each type of app in the app store then analyze these results. On iOS, the most common type of app is **Games** and there is a wide gap between that and the next most common. On Android, the most common category is **Family**, which has double the number of apps as the next most common categories: **Game** and **Tools**. The most common genres on Android support this as well: **Tools**, followed by **Entertainment**.

However, the number of apps available in each genre does not indicate which apps are most popular. We look to see which Android apps are the most frequently installed and which iOS apps have the largest number of ratings (since the number of installs is not available). On iOS, practical apps have the largest number of reviews: **Navigation** and **Reference** apps top the list. On Android, **Communication** apps have the most installs, followed by **Video Players** and **Social** apps, but **Action/Adventure** is the second most popular subgenre.

Digging in deeper, we find that the top categories on iOS are heavily skewed by a few apps: Waze and Google Maps for **Navigation** and the Bible for **Reference**. The categories with the next largest number of ratings are **Music** (dominated by Pandora and Spotify) and **Weather**, which has more evenly distributed rating numbers among its apps. Although the **Reference** and **Music** categories are skewed, the apps that don't top the list still have a large number of ratings.

Android's top categories include a few with over a billion downloads (e.g. Whatsapp and Skype under **Communication**, Youtube and Google Play under **Video Players**, and Instagram and Facebook under **Social**). The **Action/Adventure** subgenre is heavily skewed, dominated by ROBLOX with over 100,000,000,000 installs. The apps in all the top categories on Android all seem to have a large number of installs. 

(What category does music fall in on Android? It's difficult to tell. It's all over the board and Spotify and Pandora don't exist, making the accuracy of this data suspect.)

In [8]:
# get the number of apps that exist in each category

print('iOS - prime_genre')
print(ios_df['prime_genre'].value_counts())

print('\nAndroid - Category')
print(android_df['Category'].value_counts())

print('\nAndroid - Genres')
print(android_df['Genres'].value_counts())

iOS - prime_genre
Games                1866
Entertainment         251
Photo & Video         160
Education             118
Social Networking     106
Shopping               83
Utilities              79
Sports                 69
Music                  66
Health & Fitness       65
Productivity           56
Lifestyle              50
News                   43
Travel                 40
Finance                35
Weather                28
Food & Drink           26
Reference              17
Business               17
Book                   12
Medical                 6
Navigation              6
Catalogs                4
Name: prime_genre, dtype: int64

Android - Category
FAMILY                 1680
GAME                    855
TOOLS                   747
BUSINESS                407
PRODUCTIVITY            345
LIFESTYLE               344
FINANCE                 328
MEDICAL                 312
SPORTS                  300
PERSONALIZATION         294
COMMUNICATION           286
HEALTH_AND_FITNESS      

In [9]:
ios_num_installs_df = ios_df[['prime_genre', 'rating_count_tot']].groupby('prime_genre').mean().reset_index().sort_values(by='rating_count_tot', ascending=False)
android_num_installs_cat_df = android_df[['Category', 'Installs']].groupby('Category').mean().reset_index().sort_values(by='Installs', ascending=False)
android_num_installs_gen_df = android_df[['Genres', 'Installs']].groupby('Genres').mean().reset_index().sort_values(by='Installs', ascending=False)

print('iOS - prime_genre')
print(ios_num_installs_df)
print('\nAndroid - Category')
print(android_num_installs_cat_df)
print('\nAndroid - Genres')
print(android_num_installs_gen_df)



iOS - prime_genre
          prime_genre  rating_count_tot
12         Navigation      86090.333333
16          Reference      79350.470588
18  Social Networking      71548.349057
11              Music      57326.530303
22            Weather      52279.892857
0                Book      46384.916667
6        Food & Drink      33333.923077
5             Finance      32367.028571
14      Photo & Video      28441.543750
20             Travel      28243.800000
17           Shopping      27230.734940
8    Health & Fitness      23298.015385
19             Sports      23008.898551
7               Games      22886.367095
13               News      21248.023256
15       Productivity      21028.410714
21          Utilities      19156.493671
9           Lifestyle      16815.480000
4       Entertainment      14195.358566
1            Business       7491.117647
3           Education       7003.983051
2            Catalogs       4004.000000
10            Medical        612.000000

Android - Category
  

In [10]:
print('iOS - Navigation')
print(ios_df[ios_df['prime_genre'] == 'Navigation'][['track_name', 'rating_count_tot']].sort_values(by='rating_count_tot', ascending=False))
print('iOS - Reference')
print(ios_df[ios_df['prime_genre'] == 'Reference'][['track_name', 'rating_count_tot']].sort_values(by='rating_count_tot', ascending=False))
print('\niOS - Music')
print(ios_df[ios_df['prime_genre'] == 'Music'][['track_name', 'rating_count_tot']].sort_values(by='rating_count_tot', ascending=False))
print('\niOS - Weather')
print(ios_df[ios_df['prime_genre'] == 'Weather'][['track_name', 'rating_count_tot']].sort_values(by='rating_count_tot', ascending=False))

print('\nAndroid - COMMUNICATION')
print(android_df[android_df['Category'] == 'COMMUNICATION'][['App', 'Installs']].sort_values(by='Installs', ascending=False))
print('\nAndroid - VIDEO_PLAYERS')
print(android_df[android_df['Category'] == 'VIDEO_PLAYERS'][['App', 'Installs']].sort_values(by='Installs', ascending=False))
print('\nAndroid - SOCIAL')
print(android_df[android_df['Category'] == 'SOCIAL'][['App', 'Installs']].sort_values(by='Installs', ascending=False))
print('\nAndroid - Adventure;Action & Adventure')
print(android_df[android_df['Genres'] == 'Adventure;Action & Adventure'][['App', 'Installs']].sort_values(by='Installs', ascending=False))


iOS - Navigation
                                            track_name  rating_count_tot
7147   Waze - GPS Navigation, Maps & Real-time Traffic            345046
7066                Google Maps - Navigation & Transit            154911
6315                                       Geocaching®             12811
5563       CoPilot GPS – Car Navigation & Offline Maps              3582
3207  ImmobilienScout24: Real Estate Search in Germany               187
1136                              Railway Route Search                 5
iOS - Reference
                                             track_name  rating_count_tot
7190                                              Bible            985920
7106              Dictionary.com Dictionary & Thesaurus            200047
6861     Dictionary.com Dictionary & Thesaurus for iPad             54175
6645                                   Google Translate             26786
6481  Muslim Pro: Ramadan 2017 Prayer Times, Azan, Q...             18418
6458  New Fu

In [11]:
android_music_apps = android_df[(android_df['App'].str.upper().str.contains('MUSIC|SPOTIFY|PANDORA', regex=True))]
print(android_music_apps[['App', 'Category', 'Genres']].sort_values(by=['Category', 'Genres']))

                                                    App             Category  \
1183  Install images with music to make video withou...       ART_AND_DESIGN   
7161                             Free Panda Radio Music  BOOKS_AND_REFERENCE   
174                            JoJo Siwa Top Hits Music               DATING   
839            Ost. Zombies Cast - New Music and Lyrics               DATING   
860                     Soy Luna - Top Music And Lyrics               DATING   
9018                       Kpop Music Quiz (K-pop Game)               FAMILY   
21                           AJ Tracey Music and Lyrics               FAMILY   
25                               AJ Rafael Music Lyrics               FAMILY   
228                                          music (CG)               FAMILY   
421                                            EP Music               FAMILY   
638                             AJ Rafael Music Lessons               FAMILY   
2527        FD VR Music Videos - MTV Pop

## Conclusions

What type of app should our company develop in order to get the largest possible number of users on both iOS and Android? iOS users tend to use practical apps for navigation, reference, and weather, while Android users tend to use their apps for communication and social media. However, iOS users also enjoy music apps while Android users also enjoy video apps. Neither of these markets is overly saturated; there are 160 photo & video apps and 66 music apps on iOS, and 159 video apps on Android. Something like a music video app could be popular on both systems.

<br/>
<br/>
<br/>
<br/>
<br/>
<br/>
<br/>

## -+-+-+-+-+-Using the Dataquest instructions-+-+-+-+-+-

In [12]:
# import packages and initialize variables

from csv import reader

already_deleted = False


In [13]:
# function to explore datasets

# dataset = list of lists
# start, end = integers representing the start and end indices of a slice of the dataset
# rows_and_columns = should it display the number of rows and columns?

def explore_data(dataset, start, end, rows_and_columns=False):
    dataset_slice = dataset[start:end]    
    for row in dataset_slice:
        print(row)
        print('\n')

    if rows_and_columns:
        print('Number of rows:', len(dataset))
        print('Number of columns:', len(dataset[0]))

In [14]:
# read in the datasets and explore

def read_file_as_lists(csv_file):
    opened_file = open(csv_file)
    read_file = reader(opened_file)
    apps_data = list(read_file)
    
    return apps_data

ios_apps = read_file_as_lists('AppleStore.csv')
android_apps = read_file_as_lists('googleplaystore.csv')

print('\niOS')
explore_data(ios_apps, 0, 5, True)
print('\nAndroid')
explore_data(android_apps, 0, 5, True)

# potentially useful ios columns: price, rating_count_tot, rating_count_ver, user_rating, user_rating_ver, prime_genre, 
# potentially useful android columns: category, rating, reviews, installs, type, price, genres



iOS
['id', 'track_name', 'size_bytes', 'currency', 'price', 'rating_count_tot', 'rating_count_ver', 'user_rating', 'user_rating_ver', 'ver', 'cont_rating', 'prime_genre', 'sup_devices.num', 'ipadSc_urls.num', 'lang.num', 'vpp_lic']


['284882215', 'Facebook', '389879808', 'USD', '0.0', '2974676', '212', '3.5', '3.5', '95.0', '4+', 'Social Networking', '37', '1', '29', '1']


['389801252', 'Instagram', '113954816', 'USD', '0.0', '2161558', '1289', '4.5', '4.0', '10.23', '12+', 'Photo & Video', '37', '0', '29', '1']


['529479190', 'Clash of Clans', '116476928', 'USD', '0.0', '2130805', '579', '4.5', '4.5', '9.24.12', '9+', 'Games', '38', '5', '18', '1']


['420009108', 'Temple Run', '65921024', 'USD', '0.0', '1724546', '3842', '4.5', '4.0', '1.6.2', '9+', 'Games', '40', '5', '1', '1']


Number of rows: 7198
Number of columns: 16

Android
['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver'

## Cleaning the data

* Only free apps
* Only apps designed for English speakers
* Remove duplicate apps
* Assume bucketed number of installs is actual number of installs
* Android row 10472 missing Category per [Kaggle discussion](https://www.kaggle.com/lava18/google-play-store-apps/discussion/164101)
* (iOS Kaggle discussion is [here](https://www.kaggle.com/ramamet4/app-store-apple-data-set-10k-apps/discussion))

**Remove Android row 10472, per the Kaggle discussion**

Instead, remove all rows that have fewer than 13 entries - that indicates a field is missing. Only one row was removed: 10473.

In [15]:
# remove android row missing category

if not(already_deleted):
    print('\n')
    print(android_apps[0])
    
    row_num = 0
    for app in android_apps[1:]:
        row_num += 1
        if len(app) < 13:
            print(row_num)
            del android_apps[row_num]
            already_deleted = True
    
print(len(android_apps))
# should be 10841
    



['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']
10473
10841


**Transform bucketed number of installs**

Number of installs is formatted like "10,000+", bucketing the actual numbers. Transform this field, removing the "+" and assuming the minimum number in the bucket is the actual number of installs

In [16]:
# convert number of ratings to numbers in android

for app in android_apps[1:]:
    num_installs = app[5]
    app[5] = float(num_installs.replace('+', '').replace(',', ''))

**Remove duplicates**

First, find the number of duplicates and print some examples. Then, loop through the data to find the duplicate with the largest number of reviews, as this indicates that it's the most recent record. Finally, create a new, clean dataset that contains only the unique apps and the most recent duplicates (i.e. the duplicate with the largest number of reviews.)

In [17]:
# remove duplicates

# get the number of dupes
def get_dupe_info(app_data, name_col):
    duplicate_apps = []
    unique_apps = []
    
    for app in app_data[1:]:
        name = app[name_col]
        if name in unique_apps:
            duplicate_apps.append(name)
        else:
            unique_apps.append(name)
    
    print('Number of duplicate app rows: ' + str(len(duplicate_apps)))
    print('Examples of duplicate apps: ')
    print(duplicate_apps[:5])

print('\niOS')
get_dupe_info(ios_apps, 0)
# no dupes in ios
print('\nAndroid')
get_dupe_info(android_apps, 0)


# get the version of each dupes that should be kept
def get_dupes_to_keep(app_data, name_col, review_col):
    
    # get the dupe app with the largest number of reviews
    reviews_max = {}
    for app in app_data[1:]:
        name = app[name_col]
        n_reviews = float(app[review_col])
        
        # if the app isn't in the dict add it; if the app is in the dict with fewer reviews than the current row, add the current row instead
        if name in reviews_max:
            if reviews_max[name] < n_reviews:
                reviews_max[name] = n_reviews
        else:
            reviews_max[name] = n_reviews
        
    # check that it's correct
    print(len(reviews_max))   # should be 9659 for android
    
    return reviews_max
        
# print('\niOS')
# ios_reviews_max = get_dupes_to_keep(ios_apps, 0, 5)
print('\nAndroid')
android_reviews_max = get_dupes_to_keep(android_apps, 0, 3)


# get a dataset with no dupes
def remove_dupes(app_data, name_col, review_col, reviews_max):
    apps_clean = []
    already_added = []
    for app in app_data[1:]:
        name = app[name_col]
        n_reviews = float(app[review_col])
        
        # if the app has the max number of reviews, add it to the clean dataset
        if (n_reviews == reviews_max[name]) and (name not in already_added):
            apps_clean.append(app)
            already_added.append(name)
            
    return apps_clean
            
# print('\niOS')
# ios_clean = remove_dupes(ios_apps, 0, 5, ios_reviews_max)
ios_clean = ios_apps.copy()
explore_data(ios_clean, 1, 5, True)
print('\nAndroid')
android_clean = remove_dupes(android_apps, 0, 3, android_reviews_max)
explore_data(android_clean, 1, 5, True)



iOS
Number of duplicate app rows: 0
Examples of duplicate apps: 
[]

Android
Number of duplicate app rows: 1181
Examples of duplicate apps: 
['Quick PDF Scanner + OCR FREE', 'Box', 'Google My Business', 'ZOOM Cloud Meetings', 'join.me - Simple Meetings']

Android
9659
['284882215', 'Facebook', '389879808', 'USD', '0.0', '2974676', '212', '3.5', '3.5', '95.0', '4+', 'Social Networking', '37', '1', '29', '1']


['389801252', 'Instagram', '113954816', 'USD', '0.0', '2161558', '1289', '4.5', '4.0', '10.23', '12+', 'Photo & Video', '37', '0', '29', '1']


['529479190', 'Clash of Clans', '116476928', 'USD', '0.0', '2130805', '579', '4.5', '4.5', '9.24.12', '9+', 'Games', '38', '5', '18', '1']


['420009108', 'Temple Run', '65921024', 'USD', '0.0', '1724546', '3842', '4.5', '4.0', '1.6.2', '9+', 'Games', '40', '5', '1', '1']


Number of rows: 7198
Number of columns: 16

Android
['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', 5000000.0, 'Free',

**Remove non-English apps**

If more than three characters in the app name are not English characters, remove it from our list.

In [18]:
# determine if a string has fewer than 3 non-english characters
def is_english(string):
    cnt = 0
    for char in string:
        if ord(char) > 127:
            cnt += 1
            
        if cnt >= 3:
            return False
            
    return True

# # test
# print(is_english('Instagram'))
# print(is_english('爱奇艺PPS -《欢乐颂2》电视剧热播'))
# print(is_english('Docs To Go™ Free Office Suite'))
# print(is_english('Instachat 😜'))

# add only apps with english names to new dataset
def remove_non_english_apps(app_data, name_col):
    apps_eng = []
    for app in app_data[1:]:
        name = app[name_col]
        if is_english(name):
            apps_eng.append(app)
    
    return apps_eng

print('\niOS')
ios_clean = remove_non_english_apps(ios_clean, 1)
explore_data(ios_clean, 1, 5, True)
print('\nAndroid')
android_clean = remove_non_english_apps(android_clean, 0)
explore_data(android_clean, 1, 5, True)


iOS
['389801252', 'Instagram', '113954816', 'USD', '0.0', '2161558', '1289', '4.5', '4.0', '10.23', '12+', 'Photo & Video', '37', '0', '29', '1']


['529479190', 'Clash of Clans', '116476928', 'USD', '0.0', '2130805', '579', '4.5', '4.5', '9.24.12', '9+', 'Games', '38', '5', '18', '1']


['420009108', 'Temple Run', '65921024', 'USD', '0.0', '1724546', '3842', '4.5', '4.0', '1.6.2', '9+', 'Games', '40', '5', '1', '1']


['284035177', 'Pandora - Music & Radio', '130242560', 'USD', '0.0', '1126879', '3594', '4.0', '4.5', '8.4.1', '12+', 'Music', '37', '4', '1', '1']


Number of rows: 6155
Number of columns: 16

Android
['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', 50000000.0, 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 and up']


['Pixel Draw - Number Art Coloring Book', 'ART_AND_DESIGN', '4.3', '967', '2.8M', 100000.0, 'Free', '0', 'Everyone', 'Art & Design;Creativity', 'June 20, 2018', '1.1', '4.4 and up']


['Paper flowers inst

**Remove non-free apps**

If an app costs money, it's not included in our dataset.

In [19]:
# keep only apps with prices of 0
def remove_non_free_apps(app_data, price_col):
    apps_free = []
    for app in app_data[1:]:
        price = float(app[price_col].replace('$', '').replace(',', ''))
        if price == 0:
            apps_free.append(app)
            
    return apps_free
        
print('\niOS')
ios_clean = remove_non_free_apps(ios_clean, 4)
explore_data(ios_clean, 1, 5, True)
print('\nAndroid')
android_clean = remove_non_free_apps(android_clean, 7)
explore_data(android_clean, 1, 5, True)


iOS
['529479190', 'Clash of Clans', '116476928', 'USD', '0.0', '2130805', '579', '4.5', '4.5', '9.24.12', '9+', 'Games', '38', '5', '18', '1']


['420009108', 'Temple Run', '65921024', 'USD', '0.0', '1724546', '3842', '4.5', '4.0', '1.6.2', '9+', 'Games', '40', '5', '1', '1']


['284035177', 'Pandora - Music & Radio', '130242560', 'USD', '0.0', '1126879', '3594', '4.0', '4.5', '8.4.1', '12+', 'Music', '37', '4', '1', '1']


['429047995', 'Pinterest', '74778624', 'USD', '0.0', '1061624', '1814', '4.5', '4.0', '6.26', '12+', 'Social Networking', '37', '5', '27', '1']


Number of rows: 3202
Number of columns: 16

Android
['Pixel Draw - Number Art Coloring Book', 'ART_AND_DESIGN', '4.3', '967', '2.8M', 100000.0, 'Free', '0', 'Everyone', 'Art & Design;Creativity', 'June 20, 2018', '1.1', '4.4 and up']


['Paper flowers instructions', 'ART_AND_DESIGN', '4.4', '167', '5.6M', 50000.0, 'Free', '0', 'Everyone', 'Art & Design', 'March 26, 2017', '1.0', '2.3 and up']


['Smoke Effect Photo Maker 

## Analyzing the data

We get the number of each type of app in the app store then analyze these results. On iOS, the most common type of app is **Games** and there is a wide gap between that and the next most common. On Android, the most common category is **Family**, which has double the number of apps as the next most common categories: **Game** and **Tools**. The most common genres on Android support this as well: **Tools**, followed by **Entertainment**.

However, the number of apps available in each genre does not indicate which apps are most popular. We look to see which Android apps are the most frequently installed and which iOS apps have the largest number of ratings (since the number of installs is not available). On iOS, practical apps have the largest number of reviews: **Navigation** and **Reference** apps top the list. On Android, **Communication** apps have the most installs, followed by **Video Players** and **Social** apps, but **Action/Adventure** is the second most popular subgenre.

Digging in deeper, we find that the top categories on iOS are heavily skewed by a few apps: Waze and Google Maps for **Navigation** and the Bible for **Reference**. The categories with the next largest number of ratings are **Music** (dominated by Pandora and Spotify) and **Weather**, which has more evenly distributed rating numbers among its apps. Although the **Reference** and **Music** categories are skewed, the apps that don't top the list still have a large number of ratings.

Android's top categories include a few with over a billion downloads (e.g. Whatsapp and Skype under **Communication**, Youtube and Google Play under **Video Players**, and Instagram and Facebook under **Social**). The **Action/Adventure** subgenre is heavily skewed, dominated by ROBLOX with over 100,000,000,000 installs. The apps in all the top categories on Android all seem to have a large number of installs. 

(What category does music fall in on Android? It's difficult to tell. It's all over the board and Spotify and Pandora don't exist, making the accuracy of this data suspect.)

In [20]:
# create a frequency table for a chosen column
def freq_table(app_data, col):
    freq_dict = {}
    for app in app_data:
        if app[col] not in freq_dict:
            freq_dict[app[col]] = 1
        else:
            freq_dict[app[col]] += 1
            
    return freq_dict
            

def display_table(table, max_rows = 1000000):
    table_display = []
    
    #transform the dictionary into a list of tuples
    for key in table:
        key_val_as_tuple = (table[key], key)
        table_display.append(key_val_as_tuple)

    # sort the tuples in descending order and print
    table_sorted = sorted(table_display, reverse = True)
    cnt = 0
    for entry in table_sorted:
        print(entry[1], ':', entry[0])
        cnt += 1
        
        if cnt >= max_rows:
            break
        
print('\niOS - prime_genre')
ios_table = freq_table(ios_clean, 11)
display_table(ios_table)
print('\nAndroid - Category')
android_cat_table = freq_table(android_clean, 1)
display_table(android_cat_table)
print('\nAndroid - Genres')
android_genre_table = freq_table(android_clean, 9)
display_table(android_genre_table)


iOS - prime_genre
Games : 1866
Entertainment : 251
Photo & Video : 160
Education : 118
Social Networking : 105
Shopping : 83
Utilities : 79
Sports : 69
Music : 66
Health & Fitness : 65
Productivity : 56
Lifestyle : 50
News : 43
Travel : 40
Finance : 35
Weather : 28
Food & Drink : 26
Reference : 17
Business : 17
Book : 12
Navigation : 6
Medical : 6
Catalogs : 4

Android - Category
FAMILY : 1676
GAME : 858
TOOLS : 748
BUSINESS : 407
PRODUCTIVITY : 345
LIFESTYLE : 344
FINANCE : 328
MEDICAL : 313
SPORTS : 300
PERSONALIZATION : 294
COMMUNICATION : 286
HEALTH_AND_FITNESS : 273
PHOTOGRAPHY : 261
NEWS_AND_MAGAZINES : 248
SOCIAL : 236
TRAVEL_AND_LOCAL : 207
SHOPPING : 199
BOOKS_AND_REFERENCE : 189
DATING : 165
VIDEO_PLAYERS : 159
MAPS_AND_NAVIGATION : 123
FOOD_AND_DRINK : 110
EDUCATION : 103
ENTERTAINMENT : 85
LIBRARIES_AND_DEMO : 83
AUTO_AND_VEHICLES : 82
HOUSE_AND_HOME : 71
WEATHER : 70
EVENTS : 63
PARENTING : 58
ART_AND_DESIGN : 55
COMICS : 54
BEAUTY : 53

Android - Genres
Tools : 747
Enter

In [21]:
def get_avg_ratings(app_data, genre_col, ratings_col):
    app_freq_dict = freq_table(app_data, genre_col)
    
    genre_rating_dict = {}
    
    for genre in app_freq_dict:
        total = 0
        len_genre = 0
        
        for app in app_data:
            genre_app = app[genre_col]
            if genre_app == genre:
                num_rating = float(app[ratings_col])
                total += num_rating
                len_genre += 1
                
        avg_num_rating = total / len_genre
        
        genre_rating_dict[genre] = round(avg_num_rating, 2)
        
    display_table(genre_rating_dict)
                
print('\niOS - prime_genre avg # ratings')        
get_avg_ratings(ios_clean, 11, 5)
print('\nAndroid - Category')
get_avg_ratings(android_clean, 1, 5)
print('\nAndroid - Genres')
get_avg_ratings(android_clean, 9, 5)


iOS - prime_genre avg # ratings
Navigation : 86090.33
Reference : 79350.47
Music : 57326.53
Weather : 52279.89
Book : 46384.92
Social Networking : 43899.51
Food & Drink : 33333.92
Finance : 32367.03
Photo & Video : 28441.54
Travel : 28243.8
Shopping : 27230.73
Health & Fitness : 23298.02
Sports : 23008.9
Games : 22886.37
News : 21248.02
Productivity : 21028.41
Utilities : 19156.49
Lifestyle : 16815.48
Entertainment : 14195.36
Business : 7491.12
Education : 7003.98
Catalogs : 4004.0
Medical : 612.0

Android - Category
COMMUNICATION : 38590581.09
VIDEO_PLAYERS : 24727872.45
SOCIAL : 23253652.13
PHOTOGRAPHY : 17840110.4
PRODUCTIVITY : 16787331.34
GAME : 15544014.51
TRAVEL_AND_LOCAL : 13984077.71
ENTERTAINMENT : 11640705.88
TOOLS : 10830251.97
NEWS_AND_MAGAZINES : 9549178.47
BOOKS_AND_REFERENCE : 8814199.79
SHOPPING : 7036877.31
PERSONALIZATION : 5201482.61
WEATHER : 5145550.29
HEALTH_AND_FITNESS : 4188821.99
MAPS_AND_NAVIGATION : 4049274.63
FAMILY : 3695641.82
SPORTS : 3650602.28
ART_AND

In [22]:
# get the apps with the largest number of installs to check for skew
def convert_app_to_table(app_data, cat, cat_col, name_col, sort_col):
    app_dict = {}
    
    for app in app_data:
        if app[cat_col] == cat:
            app_dict[app[name_col]] = float(app[sort_col])
            
    display_table(app_dict, 20)
    
print('\niOS - Navigation')
convert_app_to_table(ios_clean, 'Navigation', 11, 1, 5)
print('\niOS - Reference')
convert_app_to_table(ios_clean, 'Reference', 11, 1, 5)
print('\niOS - Music')
convert_app_to_table(ios_clean, 'Music', 11, 1, 5)
print('\niOS - Weather')
convert_app_to_table(ios_clean, 'Weather', 11, 1, 5)
print('\nAndroid - COMMUNICATION')
convert_app_to_table(android_clean, 'COMMUNICATION', 1, 0, 5)
print('\nAndroid - VIDEO_PLAYERS')
convert_app_to_table(android_clean, 'VIDEO_PLAYERS', 1, 0, 5)
print('\nAndroid - SOCIAL')
convert_app_to_table(android_clean, 'SOCIAL', 1, 0, 5)
print('\nAndroid - Communication')
convert_app_to_table(android_clean, 'Communication', 9, 0, 5)
print('\nAndroid - Adventure;Action & Adventure')
convert_app_to_table(android_clean, 'Adventure;Action & Adventure', 9, 0, 5)
print('\nAndroid - Video Players & Editors')
convert_app_to_table(android_clean, 'Video Players & Editors', 9, 0, 5)


iOS - Navigation
Waze - GPS Navigation, Maps & Real-time Traffic : 345046.0
Google Maps - Navigation & Transit : 154911.0
Geocaching® : 12811.0
CoPilot GPS – Car Navigation & Offline Maps : 3582.0
ImmobilienScout24: Real Estate Search in Germany : 187.0
Railway Route Search : 5.0

iOS - Reference
Bible : 985920.0
Dictionary.com Dictionary & Thesaurus : 200047.0
Dictionary.com Dictionary & Thesaurus for iPad : 54175.0
Google Translate : 26786.0
Muslim Pro: Ramadan 2017 Prayer Times, Azan, Quran : 18418.0
New Furniture Mods - Pocket Wiki & Game Tools for Minecraft PC Edition : 17588.0
Merriam-Webster Dictionary : 16849.0
Night Sky : 12122.0
City Maps for Minecraft PE - The Best Maps for Minecraft Pocket Edition (MCPE) : 8535.0
LUCKY BLOCK MOD ™ for Minecraft PC Edition - The Best Pocket Wiki & Mods Installer Tools : 4693.0
GUNS MODS for Minecraft PC Edition - Mods Tools : 1497.0
Guides for Pokémon GO - Pokemon GO News and Cheats : 826.0
WWDC : 762.0
Horror Maps for Minecraft PE - Downlo

In [23]:
# what is music on android?
for app in android_clean:
    if ('MUSIC' in app[0].upper() or 'SPOTIFY' in app[0].upper() or 'PANDORA' in app[0].upper()):
        print('\n')
        print(app[0])
        print(app[1])

print('done')



Install images with music to make video without Net - 2018
ART_AND_DESIGN


Free Panda Radio Music
BOOKS_AND_REFERENCE


Ost. Zombies Cast - New Music and Lyrics
DATING


Soy Luna - Top Music And Lyrics
DATING


JoJo Siwa Top Hits Music
DATING


Cookpad - FREE recipe search makes fun cooking · musical making!
FOOD_AND_DRINK


Meditation Music - Relax, Yoga
HEALTH_AND_FITNESS


Piano Kids - Music & Songs
FAMILY


Video Editor Music,Cut,No Crop
PHOTOGRAPHY


Share Music & Transfer Files - Xender
TOOLS


Classical music for baby
PARENTING


Baby Panda Musical Genius
PARENTING


Music - Mp3 Player
VIDEO_PLAYERS


Video Editor,Crop Video,Movie Video,Music,Effects
VIDEO_PLAYERS


Tik Tok - including musical.ly
SOCIAL


Kpop Music Quiz (K-pop Game)
FAMILY


K-pop Music
FAMILY


Rock n Roll Music Quiz Game
GAME


DJMAX TECHNIKA Q - Music Game
GAME


AJ Rafael Music Lyrics
FAMILY


AJ Tracey Music and Lyrics
FAMILY


AJ Rafael Music Lessons
FAMILY


iMediaShare – Photos & Music
VIDEO_PLAYERS


## Conclusions

What type of app should our company develop in order to get the largest possible number of users on both iOS and Android? iOS users tend to use practical apps for navigation, reference, and weather, while Android users tend to use their apps for communication and social media. However, iOS users also enjoy music apps while Android users also enjoy video apps. Neither of these markets is overly saturated; there are 160 photo & video apps and 66 music apps on iOS, and 159 video apps on Android. Something like a music video app could be popular on both systems.