# PROFITABLE ENGLISH APPLICATIONS ON GOOGLE PLAY AND APPLE STORE

- In this project, we examine two data sets that list mobile applications available for download on Google Play and the Apple Store.


- We are an English company, so our target customers are English speakers. We will therefore analyse the data to identify and extract only the applications that our target customers can use.


- After extracting the English apps, we will select those that are profitable and in high demand among our target customers. To get viable results, we need to look for apps that perform best on both Google Play and the Apple Store.

In [None]:
from csv import reader

# Load the Google Play data; the first row is the header.
with open('googleplaystore.csv', encoding='utf8') as open_google_file:
    android = list(reader(open_google_file))
android_header = android[0]
android = android[1:]

# Load the Apple Store data in the same way.
with open('AppleStore.csv', encoding='utf8') as open_ios_file:
    ios = list(reader(open_ios_file))
ios_header = ios[0]
ios = ios[1:]

In [None]:
def explore_data(dataset, start, end, rows_and_columns=False):
    dataset_slice = dataset[start:end]    
    for row in dataset_slice:
        print(row)
        print('\n') # adds a new (empty) line after each row

    if rows_and_columns:
        print('Number of rows:', len(dataset))
        print('Number of columns:', len(dataset[0]))

In [None]:
print(android_header)
print('\n')
# explore_data prints its output and returns None, so we call it directly.
explore_data(android, 0, 3, True)
print('\n')
print(ios_header)
print('\n')
explore_data(ios, 0, 3, True)

In [None]:
faulty_app = android[10472]
print (faulty_app)
print ('\n')
print (android_header)
print ('\n')
print (android[0])


In [None]:
print (len(android))
del (android[10472])
print (len(android))
print (android[10472])

# REMOVING DUPLICATE ENTRIES

- When cleaning the data, we need to handle duplicate entries. To do this, we write a for loop that identifies and collects the duplicate app names.

- We count the total number of duplicates using the `len()` function below.

- Because duplicate entries for the same app can differ in some of their values, we will not simply delete duplicates at random.

- Rather, I propose that we make use of a value that differs between the duplicated rows.

- Based on some of the rows printed, most duplicates differ in their **Reviews** value, so the deletion is based on this criterion. We assume that the number of reviews for an app only grows over time, so the row with the most reviews is the most recent one.

- So, we delete the duplicate entries with fewer reviews and retain the one with the highest count, as we assume it represents the most recent picture of the app's description in the store.
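
The rule described above can be illustrated on a few made-up rows. This is only a minimal sketch: the sample rows and the helper name `keep_most_reviewed` are assumptions for illustration, and only the positions of the app name (index 0) and the review count (index 3) mirror the Google Play layout used in this notebook.

```python
# Toy rows in the form [App, Category, Rating, Reviews]; all values are invented.
sample_rows = [
    ['Chat App', 'COMMUNICATION', '4.3', '100'],
    ['Chat App', 'COMMUNICATION', '4.3', '250'],
    ['Notes', 'PRODUCTIVITY', '4.0', '40'],
    ['Chat App', 'COMMUNICATION', '4.3', '250'],
]

def keep_most_reviewed(rows, name_index=0, reviews_index=3):
    """Return one row per app name, keeping the row with the most reviews."""
    best = {}
    for row in rows:
        name = row[name_index]
        n_reviews = float(row[reviews_index])
        # The strict '>' keeps the first row seen when review counts tie.
        if name not in best or n_reviews > float(best[name][reviews_index]):
            best[name] = row
    return list(best.values())

deduplicated = keep_most_reviewed(sample_rows)
for row in deduplicated:
    print(row)
```

Storing the best row per name in a dictionary makes each lookup independent of how many names have been seen so far, whereas checking membership in a list gets slower as the list grows.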

In [None]:
duplicate_apps =[]
unique_apps = []
for app in android:
    name = app[0]
    if name in unique_apps:
        duplicate_apps.append(name)
    else:
        unique_apps.append(name)

duplicate_ios_apps = []
unique_ios_apps = []
for app in ios:
    # Index 1 holds the app name in the iOS data (the same index is used
    # for names later in this notebook); index 0 is not the name.
    title = app[1]
    if title in unique_ios_apps:
        duplicate_ios_apps.append(title)
    else:
        unique_ios_apps.append(title)
print('Total Duplicate Android apps:', len(duplicate_apps))
print('Total Unique Android apps:', len(unique_apps))
print('Total Duplicate ios apps:', len(duplicate_ios_apps))
print('Total Unique ios apps:', len(unique_ios_apps))

In [None]:
print (len(duplicate_apps))
print ('\n')
print (duplicate_apps)

In [None]:
for app in android:
    name = app[0]
    if name == 'Viber Messenger':
        print (app)

In [None]:
# Map each app name to its highest review count.
# The header row was already removed when the file was loaded,
# so we loop over every row of `android`.
reviews_max = {}
for app in android:
    name = app[0]
    n_reviews = float(app[3])
    if name in reviews_max and reviews_max[name] < n_reviews:
        reviews_max[name] = n_reviews
    elif name not in reviews_max:
        reviews_max[name] = n_reviews
print('Actual Value:', len(reviews_max))
print('\n')
print('Expected Value:', len(android) - 1181)

In [None]:
# Keep one row per app: the row whose review count equals the maximum.
# `already_added` guards against duplicates that share the same maximum.
android_clean = []
already_added = []
for app in android:
    name = app[0]
    n_reviews = float(app[3])
    if reviews_max[name] == n_reviews and name not in already_added:
        android_clean.append(app)
        already_added.append(name)
# explore_data prints the rows itself and reports the number of rows.
explore_data(android_clean, 0, 3, True)

### DELETING NON-ENGLISH APPS

- We are an English company and only want to deal with apps that are profitable in our target market (i.e. English-speaking users).

- As a result, we need to eliminate the non-English apps from our datasets.

- We cannot inspect every app name by hand to decide whether it is English, so we need an automated check.

- The good news is that, behind the scenes, each character we use in a string has a corresponding number associated with it.

- According to the ASCII (American Standard Code for Information Interchange) system, the numbers corresponding to the characters we commonly use in an English text are all in the range 0 to 127.

- Thus, based on this range, we can build a function that detects whether a character belongs to the set of common English characters or not.
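
As a quick illustration of the idea above, the built-in `ord()` function returns the number associated with a character. The sample characters and the `code_points` dictionary are chosen here purely for illustration.

```python
# ord() returns the Unicode code point of a single character.
# For ASCII characters this value lies in the range 0 to 127.
sample_characters = ['a', 'Z', '5', '+', '™', '爱']

code_points = {character: ord(character) for character in sample_characters}

for character, code_point in code_points.items():
    # The last column shows whether the character falls inside the ASCII range.
    print(repr(character), code_point, code_point <= 127)
```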

In [None]:
def english_character(string):
    
    for character in string:
        if ord(character) > 127:
            return False
    return True
print (english_character('Viber'))
print (english_character('爱奇艺PPS -《欢乐颂2》电视剧热播'))
print (english_character('Docs To Go™ Free Office Suite'))
print (english_character('Instachat 😜'))

### Re-Adjusting Our English Character Function

- The function created above to filter out non-English apps turned out to be too strict for our needs.

- English apps whose names contain an emoji or a symbol such as ™ were marked as non-English. We want to make the function flexible enough to accommodate such names so that we don't eliminate relevant applications.

- To minimize the impact of data loss, we'll only remove an app if its name has more than three characters with corresponding numbers falling outside the ASCII range. 

- This implies that all English apps with up to three emoji or other special characters will still be labeled as English.



In [None]:
def english_character(string):
    non_english = 0
    
    for character in string:
        if ord(character) > 127:
            non_english += 1
    if non_english > 3:
        return False
    else:
        return True
print (english_character('Viber'))
print (english_character('爱奇艺PPS -《欢乐颂2》电视剧热播'))
print (english_character('Docs To Go™ Free Office Suite'))
print (english_character('Instachat 😜'))

In [None]:
android_english_apps = []
ios_english_apps = []

for app in android_clean:
    name = app[0]
    if english_character(name):
        android_english_apps.append(app)

for app in ios:
    name = app[1]
    if english_character(name):
        ios_english_apps.append(app)

# explore_data prints its output and returns None, so we call it directly.
explore_data(android_english_apps, 0, 7, True)
print('\n')
explore_data(ios_english_apps, 0, 7, True)

### Categorising the sorted English apps by Genres

- As mentioned earlier, the goal is to determine the types of apps that are likely to attract more users, because our income grows with user traction.


- To reduce our risks and overhead, we need to create a validation strategy and a feasible plan to execute that strategy.


- Our end goal is to have our English apps on both the Google Play Store and the Apple Store. To achieve this, we follow these steps:
    
    - Build a minimal Android version of the app, and add it to Google Play.
    
    - If the app has a good response from users, we develop it further.
    
    - If the app is profitable after six months, we build an iOS version of the app and add it to the App Store.
 


- Based on our end goal, we need to find apps that are successful on both the Android and iOS stores. To begin with, let us determine which app genres are most common in each market.



- On the Google Play Store, the columns best suited for analysing the most common genres are *Category* and *Genres*, while for the App Store we can use the *prime_genre* column.
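
To make the idea of a genre frequency table concrete, here is a small sketch on made-up data. It uses `collections.Counter` from the standard library instead of the hand-written loop used in this notebook; the sample genre list and the helper name `percentage_table` are assumptions for illustration only.

```python
from collections import Counter

# Invented genre values standing in for one column of a dataset.
sample_genres = ['Games', 'Games', 'Education', 'Games', 'Finance']

def percentage_table(values):
    """Return a dict mapping each distinct value to its share in percent."""
    counts = Counter(values)
    total = len(values)
    return {value: count * 100 / total for value, count in counts.items()}

genre_shares = percentage_table(sample_genres)

# Print the genres from most to least common.
for genre, share in sorted(genre_shares.items(), key=lambda item: item[1], reverse=True):
    print(genre, ':', round(share, 2))
```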

In [None]:
def freq_table(dataset, index):
    table = {}
    total = 0
    
    for row in dataset:
        total +=1
        value = row[index]
        if value in table:
            table[value] +=1
        else:
            table[value] = 1
    
    table_percentages = {}
    for key in table: 
        percentage = (table[key]/total) * 100
        table_percentages[key] = percentage
    
    return table_percentages

def display_table(dataset, index):
    table = freq_table(dataset, index)
    table_display = []
    for key in table:
        key_val_as_tuple = (table[key], key)
        table_display.append(key_val_as_tuple)

    table_sorted = sorted(table_display, reverse = True)
    for entry in table_sorted:
        print(entry[1], ':', entry[0])

In [None]:
# display_table prints its output and returns None, so we call it directly.
display_table(android_english_apps, 1)   # Category column
print('\n')
print('\n')
display_table(android_english_apps, 9)   # Genres column
print('\n')
print('\n')
display_table(ios_english_apps, -5)      # prime_genre column

### Analysis of the Frequency Tables

- From the tables generated above, we see that among the English Android apps:

    - **FAMILY** apps have the largest share, at 19.33%.
    - **GAME** apps come second, at 9.82%.
    - **TOOLS** apps come third, at 8.61%.


- Among the iOS apps, the following can be observed:

    - **Games** apps have the largest share, at 54.86%.
    - **Entertainment** apps come second, at 7.26%.
    - **Education** apps come third, at 6.63%.



- The following more general patterns were noticed:

    - Most of the categories that rank higher on the Google Play Store do not top the list on the App Store. Examples include **Finance**, **Business** and **Medical** apps.
    - More apps on the Google Play Store are designed for practical purposes (with a balance between fun and practical apps), while those on the App Store are designed more for entertainment.

### Most Popular Apps by Genre

- Building on the frequency analysis above, we would now like to know which English app genres have the highest number of users.


- One way to go about this is to **calculate the average number of installs for each app genre**.


- To do this, we need to find the column in each of our two datasets that lets us derive this information.

    - For the Android Store, we can use the *Installs* column.
    
    - For the iOS store, since we don't have a similar column containing the number of installs, we use the *rating_count_tot* column as a proxy.
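
The *Installs* values are stored as text such as `1,000,000+`, so they have to be converted to numbers before averaging. The sketch below shows one way to do that on invented data; the helper name `parse_installs` and the sample pairs are assumptions for illustration, not part of the datasets.

```python
def parse_installs(installs_text):
    """Convert a string such as '1,000,000+' into a float."""
    return float(installs_text.replace(',', '').replace('+', ''))

# Invented (category, installs) pairs standing in for dataset rows.
sample_apps = [
    ('COMMUNICATION', '1,000,000+'),
    ('COMMUNICATION', '500,000+'),
    ('TOOLS', '10,000+'),
]

# Accumulate the total installs and the number of apps per category in one pass.
totals = {}
counts = {}
for category, installs_text in sample_apps:
    totals[category] = totals.get(category, 0.0) + parse_installs(installs_text)
    counts[category] = counts.get(category, 0) + 1

average_installs = {category: totals[category] / counts[category] for category in totals}
print(average_installs)
```

Because a value like `1,000,000+` means "at least one million", the averages computed this way are lower bounds rather than exact figures.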

In [None]:
ios_genre_frequency_table = freq_table(ios_english_apps, -5)

for genre in ios_genre_frequency_table :
    total = 0
    len_genre = 0
    for app in ios_english_apps:
        genre_app = app[-5]
        if genre_app == genre:            
            n_ratings = float(app[5])
            total += n_ratings
            len_genre += 1
    avg_n_ratings = total / len_genre
    print(genre, ':', avg_n_ratings)

In [None]:
android_genres = freq_table(android_english_apps, 1)

for category in android_genres:
    total = 0
    len_category = 0
    for app in android_english_apps:
        category_app = app[1]
        if category_app == category:            
            android_installs = app[5]
            android_installs = android_installs.replace(',', '')
            android_installs = android_installs.replace('+', '')
            total += float(android_installs)
            len_category += 1
    avg_android_installs = total / len_category
    print(category, ':', avg_android_installs)

In [None]:
for app in android_english_apps:
    if app[1] == 'COMMUNICATION' and (app[5] == '1,000,000,000+'
                                      or app[5] == '500,000,000+'
                                      or app[5] == '100,000,000+'):
        print(app[0], ':', app[5])

In [None]:
for app in ios_english_apps:
    if app[-5] == 'Social Networking':
        print(app[1], ':', app[5])

### ANALYSIS AND RECOMMENDATION FOR PROFITABLE APPS

- Based on the analysis of the apps on the Google Play Store, we found that the apps with the highest install counts fall under the **COMMUNICATION** category.


- The following are the top applications that show potential for being profitable on the Google Play Store:

    1. WhatsApp Messenger : 1,000,000,000+

    2. Messenger – Text and Video Chat for Free : 1,000,000,000+

    3. Skype - free IM & video calls : 1,000,000,000+

    4. Google Chrome: Fast & Secure : 1,000,000,000+

    5. Gmail : 1,000,000,000+

    6. Hangouts : 1,000,000,000+
    

- These are the potentially profitable apps on the App Store, listed with their total rating counts:

    1. Facebook : 2974676

    2. Pinterest : 1061624

    3. Skype for iPhone : 373519

    4. Messenger : 351466

    5. Tumblr : 334293

    6. WhatsApp Messenger : 287589
