# Guided Project (Dataquest)
## Profitable App Profiles for the App Store and Google Play Markets

For this project, we'll pretend we're working as data analysts for a company that builds Android and iOS mobile apps. 
We make our apps available on Google Play and in the App Store.

We only build apps that are free to download and install and our main source of revenue consists of in-app ads. This means that the number of users of our apps determines our revenue for any given app — the more users who see and engage with the ads, the better. Our goal for this project is to analyze data to help our developers understand what type of apps are likely to attract more users.

### 1. Extract data
We need to import the datasets for Android Apps and iOS Apps

In [1]:
#Data from Android and iOS apps

path_android = 'googleplaystore.csv'
path_ios = 'AppleStore.csv'

#extract data from a dataset
def getData(dataset):
    from csv import reader
    open_file = open(dataset, encoding='utf8')
    read_file = reader(open_file)
    data = list(read_file)
    return data

google_apps = getData(path_android)
apple_apps = getData(path_ios)
    

### 2. Explore data
A first approach to see how data is structured

In [2]:
#Explore data
def explore_data(dataset, start, end, rows_and_columns=False):
    dataset_slice = dataset[start:end]    
    for row in dataset_slice:
        print(row)
        print('\n') # adds a new (empty) line after each row

    if rows_and_columns:
        print('Number of rows:', len(dataset))
        print('Number of columns:', len(dataset[0]))

print('Exploring Android Apps data: \n')        
explore_data(google_apps,0,4,True)
print('\nExploring iOS Apps data: \n') 
explore_data(apple_apps,0,4,True)

Exploring Android Apps data: 

['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']


['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['Coloring book moana', 'ART_AND_DESIGN', '3.9', '967', '14M', '500,000+', 'Free', '0', 'Everyone', 'Art & Design;Pretend Play', 'January 15, 2018', '2.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


Number of rows: 10842
Number of columns: 13

Exploring iOS Apps data: 

['id', 'track_name', 'size_bytes', 'currency', 'price', 'rating_count_tot', 'rating_count_ver', 'user_rating', 'user_rating_ver', 'ver', 'cont_rating', 'prime_genre', 'sup_devices.nu

### 3. Cleaning data
Filtering data which is not interesting for this analysis. We'll only study free apps for English speakers. 
We'll remove non-English apps like 爱奇艺PPS -《欢乐颂2》电视剧热播 and apps that aren't free.

In [3]:
#There is an error in row 10473 for Google. There is not Category info:
print(google_apps[0])
print(google_apps[10472])
print(google_apps[10473])
print(google_apps[10474])

['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']
['Xposed Wi-Fi-Pwd', 'PERSONALIZATION', '3.5', '1042', '404k', '100,000+', 'Free', '0', 'Everyone', 'Personalization', 'August 5, 2014', '3.0.0', '4.0.3 and up']
['Life Made WI-Fi Touchscreen Photo Frame', '1.9', '19', '3.0M', '1,000+', 'Free', '0', 'Everyone', '', 'February 11, 2018', '1.0.19', '4.0 and up']
['osmino Wi-Fi: free WiFi', 'TOOLS', '4.2', '134203', '4.1M', '10,000,000+', 'Free', '0', 'Everyone', 'Tools', 'August 7, 2018', '6.06.14', '4.4 and up']


In [4]:
#We decided to remove this row
del google_apps[10473]
print(google_apps[10473])

['osmino Wi-Fi: free WiFi', 'TOOLS', '4.2', '134203', '4.1M', '10,000,000+', 'Free', '0', 'Everyone', 'Tools', 'August 7, 2018', '6.06.14', '4.4 and up']


In [5]:
#There are rows duplicated in Android and iOS dataset
def checkFreq(dataset, typeofApp):
    freq = {}
    if typeofApp == 'android': 
        for row in dataset:
            name=row[0]
            if name in freq:
                freq[name] +=1
            else:
                freq[name] = 1
    elif typeofApp == 'ios':
        for row in dataset:
            name=row[1]
            if name in freq:
                freq[name] +=1
            else:
                freq[name] = 1
    return freq

def checkDuplicated (dataset,typeofApp):
    duplicated = {}
    freq = checkFreq (dataset,typeofApp)
    for element in freq:
        if freq[element] > 1:
            duplicated[element] = freq[element]
    return duplicated
            

dup_google = checkDuplicated(google_apps,'android')
dup_apple = checkDuplicated(apple_apps,'ios')

print ('Android apps duplicated: '+str(len(dup_google))+'. Total Android Apps: '+str(len(google_apps[1:])))
print(dup_google)
print('\n')
print ('iOS apps duplicated: '+str(len(dup_apple))+'. Total Apple Apps: '+str(len(apple_apps[1:])))
print(dup_apple)


Android apps duplicated: 798. Total Android Apps: 10840
{'Coloring book moana': 2, 'Mcqueen Coloring pages': 2, 'UNICORN - Color By Number & Pixel Art Coloring': 2, 'Textgram - write on photos': 2, 'Wattpad 📖 Free Books': 2, 'Amazon Kindle': 2, 'Dictionary - Merriam-Webster': 2, 'NOOK: Read eBooks & Magazines': 2, 'Oxford Dictionary of English : Free': 2, 'Spanish English Translator': 2, 'NOOK App for NOOK Devices': 2, 'Ebook Reader': 2, 'English Dictionary - Offline': 2, 'Docs To Go™ Free Office Suite': 2, 'Google My Business': 3, 'OfficeSuite : Free Office + PDF Editor': 2, 'Curriculum vitae App CV Builder Free Resume Maker': 2, 'Facebook Pages Manager': 2, 'Box': 3, 'Call Blocker': 2, 'ZOOM Cloud Meetings': 2, 'Facebook Ads Manager': 2, 'Quick PDF Scanner + OCR FREE': 3, 'SignEasy | Sign and Fill PDF and other Documents': 2, 'Genius Scan - PDF Scanner': 2, 'Tiny Scanner - PDF Scanner App': 2, 'Fast Scanner : Free PDF Scan': 2, 'Mobile Doc Scanner (MDScan) Lite': 2, 'TurboScan: scan 

In [6]:
#We will remove this duplicated rows following the criteria of maximum number of reviews in Google

##Searching for the index of max rating
def maxRatingPosition(dataset,name,typeofApp):
    
    maxRevIndex = 0
    maxRev = 0
    index = 0
    
    if typeofApp == 'android':
        for row in dataset[1:]:
            index += 1
            if row[0] == name:
                tempRev = int(row[3])
                if maxRev < tempRev:
                    maxRev = tempRev
                    maxRevIndex = index
    elif typeofApp == 'ios':
        for row in dataset[1:]:
            index += 1
            if row[1] == name:
                tempRev = int(row[5])
                if maxRev < tempRev:
                    maxRev = tempRev
                    maxRevIndex = index
                    
    return maxRevIndex
    

#create a new dataset without duplicated app
def removeDuplicated(dataset,typeofApp):
    
    dataNotDup = [dataset[0]]
    index = 0
    
    if typeofApp == 'android':
        for row in dataset[1:]:
            name=row[0]
            index += 1
            if name in dup_google:
                maxIndex = maxRatingPosition(dataset,name,typeofApp)
                if maxIndex == index:
                    dataNotDup.append(row)
            else:
                dataNotDup.append(row)
            
    elif typeofApp == 'ios':
         for row in dataset[1:]:
            name=row[1]
            index += 1
            if name in dup_apple:
                maxIndex = maxRatingPosition(dataset,name,typeofApp)
                if maxIndex == index:
                    dataNotDup.append(row)
            else:
                dataNotDup.append(row)
            
                            
    return dataNotDup

#Datasets without duplicated apps:
google_apps = removeDuplicated(google_apps,'android')
apple_apps = removeDuplicated(apple_apps,'ios')


In [7]:
#Check duplicated apps again:
dup_google = checkDuplicated(google_apps,'android')
dup_apple = checkDuplicated(apple_apps,'ios')

print ('Android apps duplicated: '+str(len(dup_google))+'. Total Android Apps: '+str(len(google_apps[1:])))
print(dup_google)
print('\n')
print ('iOS apps duplicated: '+str(len(dup_apple))+'. Total Apple Apps: '+str(len(apple_apps[1:])))
print(dup_apple)

Android apps duplicated: 0. Total Android Apps: 9656
{}


iOS apps duplicated: 0. Total Apple Apps: 7195
{}


In [8]:
#English speakers

#Determine if a string is English or not thanks to ASCII code (0-127)
def isEnglish_v1(str):
    for i in str:
        if ord(i)<0 or ord(i)>=127:
            return False
    
    return True

#Check function

print(isEnglish_v1('Instagram'))
print(isEnglish_v1('爱奇艺PPS -《欢乐颂2》电视剧热播'))
print(isEnglish_v1('Docs To Go™ Free Office Suite'))
print(isEnglish_v1('Instachat 😜'))          

True
False
False
False


In [9]:
#Not really useful cause we are loosing information due to ™ or emojis
#We are going to redefine the function with a 3 letters non-english threshold

def isEnglish(str):
    nbr_false = 0
    for i in str:
        if ord(i)<0 or ord(i)>=127:
            nbr_false += 1
            if nbr_false > 3: 
                return False
    
    return True

#Check function

print(isEnglish('Instagram'))
print(isEnglish('爱奇艺PPS -《欢乐颂2》电视剧热播'))
print(isEnglish('Docs To Go™ Free Office Suite'))
print(isEnglish('Instachat 😜'))          

True
False
True
True


In [10]:
#English Apps Filer

def englisgApps (dataset, typeofApp):
    e_apps = []
    for row in dataset[1:]:
        if typeofApp == 'ios':
            name = row[1]
        elif typeofApp == 'android':
            name = row[0]          
        
        if isEnglish(name):
            e_apps.append(row)               
    return e_apps

#Datasets with English apps:
google_apps = englisgApps(google_apps,'android')
apple_apps = englisgApps(apple_apps,'ios')

print ('English Android apps: '+str(len(google_apps)))
print ('English iOS apps: '+str(len(apple_apps)))  

English Android apps: 9611
English iOS apps: 6181


In [11]:
#Free Apps Filter

def freeApps(dataset,typeofApp):
    f_apps = []
    if typeofApp == 'ios':
        for row in dataset[1:]:
            price = float(row[4])
            if price == 0.0:
                f_apps.append(row)
    elif typeofApp == 'android':
        for row in dataset[1:]:
            price = row[6]
            if price == 'Free':
                f_apps.append(row)
                
    return f_apps


#Datasets for Free & English apps:
google_apps = freeApps(google_apps,'android')
apple_apps = freeApps(apple_apps,'ios')

print ('English & Free Android apps: '+str(len(google_apps)))
print ('English & Free iOS apps: '+str(len(apple_apps)))  

English & Free Android apps: 8859
English & Free iOS apps: 3219


## 4. Which apps attract more users?
Our goal is to determine the kinds of apps that are likely to attract more users because the number of people using our apps affect our revenue.

We will determine the most common genres for each market. For this, we'll need to build frequency tables for a few columns in our datasets. 

We start identifying which columns are more interesting for our analysis. Our conclusion is that we'll need to build a frequency table for the `prime_genre` column of the App Store data set, and for the `Genres` and `Category` columns of the Google Play data set.

In [12]:
#We are going to modify the function checkFreq we previously defined to admit any column (index). With the index data, typeofApp parameter is not useful anymore. 

def checkFreq(dataset, index):
    freq = {}
    for row in dataset:
        column = row[index]
        if column in freq:
            freq[column] +=1
        else:
            freq[column] = 1
    return freq

#iOS: prime_genre column is in index 11
#Android: Category column is in index 1 and Genres is index 9

f_genre_apple = checkFreq(apple_apps,11)
f_category_google = checkFreq(google_apps,1)
f_genre_google = checkFreq(google_apps,9)

print('Frequency Genre in iOS:')
print(f_genre_apple)
print('Frequency Category in Android:')
print(f_category_google)
print('Frequency Genre in Android:')
print(f_genre_google)

Frequency Genre in iOS:
{'Photo & Video': 160, 'Games': 1872, 'Music': 66, 'Social Networking': 105, 'Reference': 18, 'Health & Fitness': 65, 'Weather': 28, 'Utilities': 81, 'Travel': 40, 'Shopping': 84, 'News': 43, 'Navigation': 6, 'Lifestyle': 51, 'Entertainment': 254, 'Food & Drink': 26, 'Sports': 69, 'Book': 14, 'Finance': 36, 'Education': 118, 'Productivity': 56, 'Business': 17, 'Catalogs': 4, 'Medical': 6}
Frequency Category in Android:
{'ART_AND_DESIGN': 56, 'AUTO_AND_VEHICLES': 82, 'BEAUTY': 53, 'BOOKS_AND_REFERENCE': 190, 'BUSINESS': 407, 'COMICS': 55, 'COMMUNICATION': 287, 'DATING': 165, 'EDUCATION': 103, 'ENTERTAINMENT': 85, 'EVENTS': 63, 'FINANCE': 328, 'FOOD_AND_DRINK': 110, 'HEALTH_AND_FITNESS': 273, 'HOUSE_AND_HOME': 73, 'LIBRARIES_AND_DEMO': 83, 'LIFESTYLE': 346, 'GAME': 862, 'FAMILY': 1675, 'MEDICAL': 310, 'SOCIAL': 236, 'SHOPPING': 199, 'PHOTOGRAPHY': 261, 'SPORTS': 301, 'TRAVEL_AND_LOCAL': 207, 'TOOLS': 750, 'PERSONALIZATION': 294, 'PRODUCTIVITY': 345, 'PARENTING': 5

In [13]:
#We are going to check last data in % 

def checkPerc(dataset,freq):
    total = len(dataset)
    for element in freq:
        freq[element] = round((100 * freq[element] / total),2)
    return freq

p_genre_apple = checkPerc(apple_apps,f_genre_apple)
p_category_google = checkPerc(google_apps,f_category_google)
p_genre_google = checkPerc(google_apps,f_genre_google)

print('% Genre in iOS:')
print(p_genre_apple)
print('% Category in Android:')
print(p_category_google)
print('% Genre in Android:')
print(p_genre_google)

% Genre in iOS:
{'Photo & Video': 4.97, 'Games': 58.15, 'Music': 2.05, 'Social Networking': 3.26, 'Reference': 0.56, 'Health & Fitness': 2.02, 'Weather': 0.87, 'Utilities': 2.52, 'Travel': 1.24, 'Shopping': 2.61, 'News': 1.34, 'Navigation': 0.19, 'Lifestyle': 1.58, 'Entertainment': 7.89, 'Food & Drink': 0.81, 'Sports': 2.14, 'Book': 0.43, 'Finance': 1.12, 'Education': 3.67, 'Productivity': 1.74, 'Business': 0.53, 'Catalogs': 0.12, 'Medical': 0.19}
% Category in Android:
{'ART_AND_DESIGN': 0.63, 'AUTO_AND_VEHICLES': 0.93, 'BEAUTY': 0.6, 'BOOKS_AND_REFERENCE': 2.14, 'BUSINESS': 4.59, 'COMICS': 0.62, 'COMMUNICATION': 3.24, 'DATING': 1.86, 'EDUCATION': 1.16, 'ENTERTAINMENT': 0.96, 'EVENTS': 0.71, 'FINANCE': 3.7, 'FOOD_AND_DRINK': 1.24, 'HEALTH_AND_FITNESS': 3.08, 'HOUSE_AND_HOME': 0.82, 'LIBRARIES_AND_DEMO': 0.94, 'LIFESTYLE': 3.91, 'GAME': 9.73, 'FAMILY': 18.91, 'MEDICAL': 3.5, 'SOCIAL': 2.66, 'SHOPPING': 2.25, 'PHOTOGRAPHY': 2.95, 'SPORTS': 3.4, 'TRAVEL_AND_LOCAL': 2.34, 'TOOLS': 8.47, '

In [14]:
#We need to show last data in a descending order
#To do this, it is needed to transforms the frequency table into a list of tuples in order to sort the list in a descending order

def transformInTuple (dictionary):
    table_tuple = []
    for k in dictionary:
        key_value_tuple = (dictionary[k],k)
        table_tuple.append(key_value_tuple)
    return table_tuple

def descOrder(listTuple):
    table_sorted = sorted(listTuple,reverse=True)
    return table_sorted

def showTable (dictionary):
    listTuple = transformInTuple (dictionary)
    sortedRevList = descOrder(listTuple)
    for i in sortedRevList:
        print(i[1], ':', i[0])

print('% Genre in iOS:')
showTable(p_genre_apple)
print('\n% Category in Android:')
showTable(p_category_google)
print('\n% Genre in Android:')
showTable(p_genre_google)

% Genre in iOS:
Games : 58.15
Entertainment : 7.89
Photo & Video : 4.97
Education : 3.67
Social Networking : 3.26
Shopping : 2.61
Utilities : 2.52
Sports : 2.14
Music : 2.05
Health & Fitness : 2.02
Productivity : 1.74
Lifestyle : 1.58
News : 1.34
Travel : 1.24
Finance : 1.12
Weather : 0.87
Food & Drink : 0.81
Reference : 0.56
Business : 0.53
Book : 0.43
Navigation : 0.19
Medical : 0.19
Catalogs : 0.12

% Category in Android:
FAMILY : 18.91
GAME : 9.73
TOOLS : 8.47
BUSINESS : 4.59
LIFESTYLE : 3.91
PRODUCTIVITY : 3.89
FINANCE : 3.7
MEDICAL : 3.5
SPORTS : 3.4
PERSONALIZATION : 3.32
COMMUNICATION : 3.24
HEALTH_AND_FITNESS : 3.08
PHOTOGRAPHY : 2.95
NEWS_AND_MAGAZINES : 2.8
SOCIAL : 2.66
TRAVEL_AND_LOCAL : 2.34
SHOPPING : 2.25
BOOKS_AND_REFERENCE : 2.14
DATING : 1.86
VIDEO_PLAYERS : 1.79
MAPS_AND_NAVIGATION : 1.4
FOOD_AND_DRINK : 1.24
EDUCATION : 1.16
ENTERTAINMENT : 0.96
LIBRARIES_AND_DEMO : 0.94
AUTO_AND_VEHICLES : 0.93
HOUSE_AND_HOME : 0.82
WEATHER : 0.8
EVENTS : 0.71
PARENTING : 0.65
ART

## 5. Conclusions
The **most common genre** in free apps for iOS is Games - 58.15% - (followed by Entertainment - 7.89%) but Tools - 8.45% - (closely followed by Entertainment - 6.07%) in Android, both for an English audience. Android users prefer the Family category.

Both markets would rather propose apps designed for entertainment (games, photo and video, social networking, sports, music) than apps designed for practical purposes (education, shopping, utilities, productivity, lifestyle).

Although we can see a clearly app profile in the App Store market (Games), it does not imply that these apps always have a large number of users.

Categories and genres in Google Play market are more diverse. We can find a category more popular than other as Family but if we look at Genres, we can find Tools, Entertainment, Education, Business, Productivity... apps in a similar proportion.


## 6. More analysis

Let's start with calculating the average number of user ratings per app genre on the App Store. To do that, we'll need to do the following:

In [15]:
#Isolate the apps of each genre and average rating

def checkRatingIOS (dataset):
    
    freq = {}
    indexGen = 11
    indexRat = 5
    
    for row1 in dataset:
        genre = row1[indexGen]
        nbrApp = 0
        totUsers = 0
        if genre not in freq:
            for row2 in dataset:
                tempGenre = row2[indexGen]
                if genre == tempGenre:
                    totUsers += int(row2[indexRat])
                    nbrApp += 1
                    freq[genre] = round(totUsers/nbrApp)
    return freq

f_rating_apple = checkRatingIOS(apple_apps)
showTable(f_rating_apple)         
            

Navigation : 86090
Reference : 74942
Music : 57327
Weather : 52280
Social Networking : 43900
Book : 39758
Food & Drink : 33334
Finance : 31468
Photo & Video : 28442
Travel : 28244
Shopping : 26920
Health & Fitness : 23298
Sports : 23009
Games : 22813
News : 21248
Productivity : 21028
Utilities : 18684
Lifestyle : 16486
Entertainment : 14030
Business : 7491
Education : 7004
Catalogs : 4004
Medical : 612


As we can see, Navigation apps (average users: 86090) in iOS are the most popular followed by Reference (74942) and Music (57327) apps. Games, which was the genre that includes more apps in the Apple market, is not so popular (22813) as we could have thought.

What happen with Google market? The `install` column show data the following way:

In [16]:
print(checkFreq(google_apps, 5))

{'5,000,000+': 605, '50,000,000+': 204, '100,000+': 1024, '50,000+': 423, '1,000,000+': 1394, '10,000+': 903, '10,000,000+': 935, '5,000+': 400, '500,000+': 493, '1,000,000,000+': 20, '100,000,000+': 189, '1,000+': 744, '500,000,000+': 24, '500+': 288, '100+': 612, '50+': 169, '10+': 313, '1+': 45, '5+': 70, '0+': 4}


We need to remove '+' character to be able to change the type from `str` to `int`

In [17]:
def removeChar(string,char):
    newString = ''
    for c in string:
        if c!=char:
            newString += c
    return newString

print(removeChar('10000+','+'))
            
        

10000


In [18]:
def checkRatingAndroid (dataset):
    
    freq = {}
    indexCat = 1
    indexRat = 5
    
    for row1 in dataset:
        genre = row1[indexCat]
        nbrApp = 0
        totUsers = 0
        if genre not in freq:
            for row2 in dataset:
                tempGenre = row2[indexCat]
                if genre == tempGenre:
                    nbrInstall = removeChar(removeChar(row2[indexRat],'+'),',')
                    totUsers += int(nbrInstall)
                    nbrApp += 1
                    freq[genre] = round(totUsers/nbrApp)
                    
    return freq

f_rating_google = checkRatingAndroid(google_apps)
showTable(f_rating_google)         

COMMUNICATION : 38456119
VIDEO_PLAYERS : 24727872
SOCIAL : 23253652
PHOTOGRAPHY : 17840110
PRODUCTIVITY : 16787331
GAME : 15588016
TRAVEL_AND_LOCAL : 13984078
ENTERTAINMENT : 11640706
TOOLS : 10801391
NEWS_AND_MAGAZINES : 9549178
BOOKS_AND_REFERENCE : 8767812
SHOPPING : 7036877
PERSONALIZATION : 5201483
WEATHER : 5074486
HEALTH_AND_FITNESS : 4188822
MAPS_AND_NAVIGATION : 4056942
FAMILY : 3697848
SPORTS : 3638640
ART_AND_DESIGN : 2021627
FOOD_AND_DRINK : 1924898
EDUCATION : 1833495
BUSINESS : 1712290
LIFESTYLE : 1437816
FINANCE : 1387692
HOUSE_AND_HOME : 1331541
DATING : 854029
COMICS : 817657
AUTO_AND_VEHICLES : 647318
LIBRARIES_AND_DEMO : 638504
PARENTING : 542604
BEAUTY : 513152
EVENTS : 253542
MEDICAL : 121717


On average, COMMUNICATION apps have the most installs: 38456119, followed by VIDEO_PLAYERS (24727872) and SOCIAL (23253652). BEAUTY, EVENTS and MEDICAL apps are not so popular and show less than 600K installs.

In [19]:
def checkRatingAndroidGenre (dataset):
    
    freq = {}
    indexCat = 9
    indexRat = 5
    
    for row1 in dataset:
        genre = row1[indexCat]
        nbrApp = 0
        totUsers = 0
        if genre not in freq:
            for row2 in dataset:
                tempGenre = row2[indexCat]
                if genre == tempGenre:
                    nbrInstall = removeChar(removeChar(row2[indexRat],'+'),',')
                    totUsers += int(nbrInstall)
                    nbrApp += 1
                    freq[genre] = round(totUsers/nbrApp)
                    
    return freq

f_rating_google = checkRatingAndroidGenre(google_apps)
showTable(f_rating_google)

Communication : 38456119
Adventure;Action & Adventure : 35333333
Video Players & Editors : 24947336
Social : 23253652
Arcade : 22888365
Casual : 19569222
Puzzle;Action & Adventure : 18366667
Photography : 17840110
Educational;Action & Adventure : 17016667
Productivity : 16787331
Racing : 15910646
Travel & Local : 14051476
Casual;Action & Adventure : 12916667
Action : 12603589
Strategy : 11339901
Tools : 10802461
Tools;Education : 10000000
Role Playing;Brain Games : 10000000
Lifestyle;Pretend Play : 10000000
Casual;Music & Video : 10000000
Card;Action & Adventure : 10000000
Adventure;Education : 10000000
News & Magazines : 9549178
Music : 9445583
Educational;Pretend Play : 9375000
Puzzle;Brain Games : 9280667
Word : 9094459
Racing;Action & Adventure : 8816667
Books & Reference : 8767812
Puzzle : 8302862
Video Players & Editors;Music & Video : 7500000
Shopping : 7036877
Role Playing;Action & Adventure : 7000000
Casual;Pretend Play : 6957143
Entertainment;Music & Video : 6413333
Action;Ac

In [23]:
def visAppPopular (dataset,genre, typeofApp):
    if typeofApp == 'android':
        indexName = 0
        indexCat = 1
        indexRat = 5
    if typeofApp == 'ios':
        indexName = 1
        indexCat = 11
        indexRat = 5
        
    for app in dataset:
        if app[indexCat] == genre and (app[indexRat] == '1,000,000,000+'
                                      or app[indexRat] == '500,000,000+'
                                      or app[indexRat] == '100,000,000+'):
            print(app[indexName], ':', app[indexRat])

visAppPopular(google_apps,'COMMUNICATION','android')


WhatsApp Messenger : 1,000,000,000+
imo beta free calls and text : 100,000,000+
Android Messages : 100,000,000+
Google Duo - High Quality Video Calls : 500,000,000+
Messenger – Text and Video Chat for Free : 1,000,000,000+
imo free video calls and chat : 500,000,000+
Skype - free IM & video calls : 1,000,000,000+
Who : 100,000,000+
GO SMS Pro - Messenger, Free Themes, Emoji : 100,000,000+
LINE: Free Calls & Messages : 500,000,000+
Google Chrome: Fast & Secure : 1,000,000,000+
Firefox Browser fast & private : 100,000,000+
UC Browser - Fast Download Private & Secure : 500,000,000+
Gmail : 1,000,000,000+
Hangouts : 1,000,000,000+
Messenger Lite: Free Calls & Messages : 100,000,000+
Kik : 100,000,000+
KakaoTalk: Free Calls & Text : 100,000,000+
Opera Mini - fast web browser : 100,000,000+
Opera Browser: Fast and Secure : 100,000,000+
Telegram : 100,000,000+
Truecaller: Caller ID, SMS spam blocking & Dialer : 100,000,000+
UC Browser Mini -Tiny Fast Private & Secure : 100,000,000+
Viber Mess

Apps as Whatsapp, Messenger, Skype, Google Chrome, Gmail have over 1000 M or 500M installs making the communication category very popular. But is a communucation app a really good profile for this market?
We'll filter these apps to see if communication category can be profitable.

In [26]:
def under100m (dataset,genre,typeofApp):
    under_100_m = []
    if typeofApp == 'android':
        indexName = 0
        indexCat = 1
        indexRat = 5
    if typeofApp == 'ios':
        indexName = 1
        indexCat = 11
        indexRat = 5
        
    for app in dataset:
        n_installs = app[indexRat]
        if typeofApp == 'android':
            n_installs = n_installs.replace(',', '')
            n_installs = n_installs.replace('+', '')
        n_installs = int(n_installs)
        if (app[indexCat] == genre) and (n_installs < 100000000):
            under_100_m.append(n_installs)
    
    return round(sum(under_100_m) / len(under_100_m))

avg_com = under100m(google_apps,'COMMUNICATION','android')
print('Average installs under 100M installs apps for COMMUNICATION category: ',avg_com)
avg_vid = under100m(google_apps,'VIDEO_PLAYERS','android')
print('Average installs under 100M installs apps for VIDEO_PLAYERS category: ',avg_vid)
avg_soc = under100m(google_apps,'SOCIAL','android')
print('Average installs under 100M installs apps for SOCIAL category: ',avg_soc)
avg_pho = under100m(google_apps,'PHOTOGRAPHY','android')
print('Average installs under 100M installs apps for PHOTOGRAPHY category: ',avg_pho)

Average installs under 100M installs apps for COMMUNICATION category:  3603485
Average installs under 100M installs apps for VIDEO_PLAYERS category:  5544878
Average installs under 100M installs apps for SOCIAL category:  3084583
Average installs under 100M installs apps for PHOTOGRAPHY category:  7670532


The categories of COMMUNICATION, VIDEO_PLAYERS, SOCIAL or PHOTOGRAPHY with had an average number of installs of 38M+, 24M+, 23M+ and 17M+ respectively, they show now less than 8M when we remove the apps with more than 100M installs.
Are they that popular?

Let's check a part of the market for BOOKS_AND_REFERENCE category (average installs: 8767812 but only 2.14% of the market).

In [27]:
visAppPopular(google_apps,'BOOKS_AND_REFERENCE','android') #apps over 100M
avg_bo = under100m(google_apps,'BOOKS_AND_REFERENCE','android') #apps under 100M
print('Average installs under 100M installs apps for COMMUNICATION category: ',avg_bo)

Google Play Books : 1,000,000,000+
Bible : 100,000,000+
Amazon Kindle : 100,000,000+
Wattpad 📖 Free Books : 100,000,000+
Audiobooks from Audible : 100,000,000+
Average installs under 100M installs apps for COMMUNICATION category:  1437212


In [28]:
def visAppLessPopular (dataset,genre, typeofApp):
    if typeofApp == 'android':
        indexName = 0
        indexCat = 1
        indexRat = 5
    if typeofApp == 'ios':
        indexName = 1
        indexCat = 11
        indexRat = 5
        
    for app in dataset:
        if app[indexCat] == genre and (app[indexRat] == '1,000,000+'
                                      or app[indexRat] == '5,000,000+'
                                      or app[indexRat] == '10,000,000+'
                                      or app[indexRat] == '50,000,000+'):
            print(app[indexName], ':', app[indexRat])

visAppLessPopular(google_apps,'COMMUNICATION','android')


Messenger for SMS : 10,000,000+
My Tele2 : 5,000,000+
Contacts : 50,000,000+
Call Free – Free Call : 5,000,000+
Web Browser & Explorer : 5,000,000+
Browser 4G : 10,000,000+
MegaFon Dashboard : 10,000,000+
ZenUI Dialer & Contacts : 10,000,000+
Cricket Visual Voicemail : 10,000,000+
TracFone My Account : 1,000,000+
Xperia Link™ : 10,000,000+
TouchPal Keyboard - Fun Emoji & Android Keyboard : 10,000,000+
Skype Lite - Free Video Call & Chat : 5,000,000+
My magenta : 1,000,000+
Seznam.cz : 1,000,000+
AT&T Visual Voicemail : 10,000,000+
GMX Mail : 10,000,000+
Omlet Chat : 10,000,000+
My Vodacom SA : 5,000,000+
Microsoft Edge : 5,000,000+
Calls & Text by Mo+ : 5,000,000+
free video calls and chat : 50,000,000+
Messaging+ SMS, MMS Free : 1,000,000+
chomp SMS : 10,000,000+
Glide - Video Chat Messenger : 10,000,000+
Text SMS : 10,000,000+
Talkray - Free Calls & Texts : 10,000,000+
GroupMe : 10,000,000+
mysms SMS Text Messaging Sync : 1,000,000+
2ndLine - Second Phone Number : 1,000,000+
Ninesky 

There are just a few apps with more than 100M installs and a lot between 1M and 50M installs. That makes the Book category interesting for the market growth due to its big popularity.
