# Profitable App Profiles for Mobile App Markets (iOS & Android)

This project analyzes the apps available in the iOS and Android app markets, with the objective of determining which features or attributes affect profitability. As analysts, our task is to give the development team insight into which app profiles are best suited for revenue generation in the free apps category.

Since the apps to be developed will be free, the primary source of revenue will be in-app ads. Because ad revenue depends on the number of users and how actively they engage with the ads, our goal is to give the development team the information needed to build apps that appeal to the largest number of users.
    

# Exploring the first three rows, the header row, and the number of rows and columns of the iOS and Android app datasets

In [1]:
from csv import reader

def open_dataset(file_name, has_header=True):
    # Read the whole CSV into a list of rows; the file is closed on exit.
    with open(file_name, encoding='utf8') as opened_file:
        data_set = list(reader(opened_file))

    # Drop the header row when the file has one.
    if has_header:
        return data_set[1:]
    return data_set

apps_data_iOS = open_dataset('AppleStore.csv')
apps_data_android = open_dataset('googleplaystore.csv')



def explore_data(data_set,start,end,rows_and_columns=False):
    
    data_set_portion = data_set[start:end];
    
    for row in data_set_portion:
        print(row);
        print('\n');
        
    if rows_and_columns:
        print("Number of rows:\t\t",len(data_set));
        print("Number of columns:\t",len(data_set[0]));
        print('\n')
        
explore_data(apps_data_iOS,0,3,rows_and_columns=True);
explore_data(apps_data_android,0,3,rows_and_columns=True);

apps_data_iOS_hdr = open_dataset('AppleStore.csv',has_header = False);
apps_data_android_hdr = open_dataset('googleplaystore.csv',has_header = False);

print("apps_data_iOS_hdr:\n",apps_data_iOS_hdr[0]);print('\n');
print("apps_data_android_hdr:\n",apps_data_android_hdr[0]);print('\n');



['284882215', 'Facebook', '389879808', 'USD', '0.0', '2974676', '212', '3.5', '3.5', '95.0', '4+', 'Social Networking', '37', '1', '29', '1']


['389801252', 'Instagram', '113954816', 'USD', '0.0', '2161558', '1289', '4.5', '4.0', '10.23', '12+', 'Photo & Video', '37', '0', '29', '1']


['529479190', 'Clash of Clans', '116476928', 'USD', '0.0', '2130805', '579', '4.5', '4.5', '9.24.12', '9+', 'Games', '38', '5', '18', '1']


Number of rows:		 7197
Number of columns:	 16


['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['Coloring book moana', 'ART_AND_DESIGN', '3.9', '967', '14M', '500,000+', 'Free', '0', 'Everyone', 'Art & Design;Pretend Play', 'January 15, 2018', '2.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August

# Fixing an error: Removing an app and its information from the Android app dataset

In [2]:
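# The row for "Life Made WI-Fi Touchscreen Photo Frame" has 12 fields instead of 13
# (its Category value appears to be missing), so it is removed.
app_name = "Life Made WI-Fi Touchscreen Photo Frame"
error_index = None
# Locate the row first: deleting from a list while iterating over it can skip rows.
for i, row in enumerate(apps_data_android):
    if row[0] == app_name:
        error_index = i
        print("error index:\t", error_index)
        print(row)
        break

del apps_data_android[error_index]

print('\n'); print('index of row deleted:\t', error_index)
print('Current row:')
print(apps_data_android[error_index]); print('\n')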

print('Current # of rows (android):\t\t',len(apps_data_android))


 


error index:	 10472
['Life Made WI-Fi Touchscreen Photo Frame', '1.9', '19', '3.0M', '1,000+', 'Free', '0', 'Everyone', '', 'February 11, 2018', '1.0.19', '4.0 and up']


index of row deleted:	 10472
Current row:
['osmino Wi-Fi: free WiFi', 'TOOLS', '4.2', '134203', '4.1M', '10,000,000+', 'Free', '0', 'Everyone', 'Tools', 'August 7, 2018', '6.06.14', '4.4 and up']


Current # of rows (android):		 10840
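
The row removed above has 12 fields, while the other Android rows shown earlier have 13. A minimal sketch of how such rows could be found in general is to compare each row's length with the header's. The helper name `find_malformed_rows` and the sample rows are hypothetical and not part of the original notebook.

```python
def find_malformed_rows(header, rows):
    """Return (index, row) pairs whose field count differs from the header's."""
    expected = len(header)
    return [(i, row) for i, row in enumerate(rows) if len(row) != expected]


# Tiny made-up sample: the second row is missing its category field.
sample_header = ['App', 'Category', 'Rating']
sample_rows = [
    ['App A', 'TOOLS', '4.2'],
    ['App B', '1.9'],
    ['App C', 'GAME', '3.9'],
]

malformed = find_malformed_rows(sample_header, sample_rows)
print(malformed)
```

With the real data, the header would come from `apps_data_android_hdr[0]` and the rows from `apps_data_android`.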


**Links to documentation:**
- [**iOS**][id1]  
- [**Android**][id2]
    
   [id1]: https://www.kaggle.com/ramamet4/app-store-apple-data-set-10k-apps
   [id2]: https://www.kaggle.com/lava18/google-play-store-apps
   
   
   
   
  
    

 > **Concerns raised in the dataset's discussion section indicate that the Android app store dataset contains duplicate entries. Some of these duplicate entries are shown in the code below.**
   

# Counts of duplicate entries for apps in the Android app dataset:

In [3]:

def freq_table(data_set):
    duplicate_temp_count = {};
    duplicate_count = {}
    unique_count = {};
    for row in data_set:
        name = row[0];
        if name in unique_count:
            duplicate_temp_count[name] += 1;
        else:
            unique_count[name] = 1;
            duplicate_temp_count[name] = 0;
            
    for key in duplicate_temp_count:
        if duplicate_temp_count[key] != 0:
            duplicate_count[key] = duplicate_temp_count[key];
    
    
    return duplicate_count, unique_count

(duplicate_tally,unique_tally) = freq_table(apps_data_android);

print("Instances of counts for duplicate apps:");
count = 1;
for key in duplicate_tally:
    print("%s: %d"%(key,duplicate_tally[key]));
    if count >15:
        break;
    count +=1;
    
 

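def tot_freq_table(table):
    # Sum every count stored in the table (parameter renamed so it no longer shadows the built-in dict).
    tot = 0
    for key in table:
        tot += table[key]
    return tot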

print("\nThe total number of duplicates:\t%d"%tot_freq_table(duplicate_tally))

    




Instances of counts for duplicate apps:
Free Dating App - Meet Local Singles - Flirt Chat: 1
Cartoon Network App: 1
Moto File Manager: 1
GO Keyboard - Emoticon keyboard, Free Theme, GIF: 1
Sago Mini Friends: 2
Textgram - write on photos: 1
Cricbuzz - Live Cricket Scores & News: 1
HBO GO: Stream with TV Package: 1
join.me - Simple Meetings: 2
eBay: Buy & Sell this Summer - Discover Deals Now!: 4
Amazon Kindle: 1
FastMeet: Chat, Dating, Love: 1
Robinhood - Investing, No Fees: 1
HBO NOW: Stream TV & Movies: 3
QuickPic - Photo Gallery with Google Drive Support: 2
Calorie Counter - Macros: 1

The total number of duplicates:	1181
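
The same tally can be sketched more compactly with `collections.Counter` from the standard library. The helper name `count_duplicates` and the sample rows below are illustrative assumptions, not part of the original notebook.

```python
from collections import Counter


def count_duplicates(rows, name_index=0):
    """Map each repeated app name to its number of extra entries."""
    counts = Counter(row[name_index] for row in rows)
    # An app that appears n times has n - 1 duplicate entries.
    return {name: n - 1 for name, n in counts.items() if n > 1}


# Made-up rows: [name, reviews]
sample_rows = [
    ['App A', '10'],
    ['App B', '5'],
    ['App A', '12'],
    ['App A', '12'],
]

duplicates = count_duplicates(sample_rows)
print(duplicates)
print("Total number of duplicates:", sum(duplicates.values()))
```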


> **For an app with multiple entries (an original and one or more duplicates), we keep only the entry with the highest number of reviews. To achieve this we use a function, sorting_function, that takes the app store dataset, apps_data_android, as input and returns a dataset, apps_data_android_clean, with a single entry per app: the one with the highest number of reviews.**

# Create a new Android apps dataset, apps_data_android_clean, with unique entries corresponding to the highest number of reviews

In [4]:
def sorting_function(data_set):
    reviews_max = {};
    android_clean = [];
    already_added = [];
    
    for row in data_set:
        name = row[0];
        n_reviews = float(row[3]);
        if name not in reviews_max:
            reviews_max[name] = n_reviews;
        elif (name in reviews_max) and (n_reviews > reviews_max[name]):
            reviews_max[name] = n_reviews;
        
    for row in data_set:
        name = row[0];
        n_reviews = float(row[3]);
        if (name not in already_added) and (n_reviews == reviews_max[name]):
            android_clean.append(row);
            already_added.append(name);
    
    print("len(android_clean):\t",len(android_clean));  
    return android_clean
    
apps_data_android_clean = sorting_function(apps_data_android)
        
        

len(android_clean):	 9659


Two steps were taken above to create the new dataset, apps_data_android_clean, which holds one entry per app, the one with the highest number of reviews:

1. Loop through the dataset, apps_data_android, to create a dictionary, reviews_max, where each key is the name of a unique app and the corresponding value is the highest number of reviews for that app.
2. Loop through apps_data_android again and use the key-value pairs in reviews_max to build the new dataset, apps_data_android_clean.
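
As an alternative sketch, the two steps can be folded into one pass by storing the best row itself, rather than only its review count, in a dictionary. This is not the notebook's method: `keep_most_reviewed` and the sample rows are hypothetical, and the resulting row order can differ from that of `sorting_function` (rows come out in order of each name's first appearance).

```python
def keep_most_reviewed(rows, name_index=0, reviews_index=3):
    """Keep one row per app name: the first row with the highest review count."""
    best = {}
    for row in rows:
        name = row[name_index]
        n_reviews = float(row[reviews_index])
        # Strict '>' keeps the earlier row when review counts tie.
        if name not in best or n_reviews > float(best[name][reviews_index]):
            best[name] = row
    return list(best.values())


# Made-up rows: [name, category, rating, reviews]
sample_rows = [
    ['App A', 'TOOLS', '4.0', '10'],
    ['App B', 'GAME', '4.5', '7'],
    ['App A', 'TOOLS', '4.1', '25'],
]

clean = keep_most_reviewed(sample_rows)
print(clean)
```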


# Removing non-English apps from the iOS and Android app datasets

In [5]:
def has_english_name(name):
    ct_non_en_char = 0;
    for char in name:
        if ord(char)>127:
            ct_non_en_char += 1;
            if ct_non_en_char > 3:
                return False
    return True

app1_has_eng =  has_english_name('Instagram');
print("app1_has_eng:\t",app1_has_eng);

app2_has_eng =  has_english_name('爱奇艺PPS -《欢乐颂2》电视剧热播');
print("app2_has_eng:\t",app2_has_eng);

app3_has_eng =  has_english_name('Docs To Go™ Free Office Suite');
print("app3_has_eng:\t",app3_has_eng);

app4_has_eng =  has_english_name('Instachat 😜');
print("app4_has_eng:\t",app4_has_eng);



app1_has_eng:	 True
app2_has_eng:	 False
app3_has_eng:	 True
app4_has_eng:	 True
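
To see why the function tolerates up to three characters outside the ASCII range, it can help to look at the code points involved: symbols such as ™ and emoji are above 127 even in otherwise English names. The snippet below is only an illustration, and `non_ascii_chars` is a hypothetical helper.

```python
def non_ascii_chars(name):
    """Return the characters of `name` whose code point is above 127."""
    return [char for char in name if ord(char) > 127]


for name in ['Instagram', 'Docs To Go™ Free Office Suite', 'Instachat 😜']:
    extras = non_ascii_chars(name)
    # Show each non-ASCII character next to its code point.
    print(name, '->', [(char, ord(char)) for char in extras])
```

A name with a single trademark sign or emoji stays under the threshold, while a name written mostly in non-Latin script exceeds it quickly.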


In [6]:
def english_app_selector_iOS(dataset):
    new_dataset = [];
    for row in dataset:
        name = row[1];
        if has_english_name(name):
            new_dataset.append(row);
    return new_dataset

def english_app_selector_andr(dataset):
    new_dataset = [];
    for row in dataset:
        name = row[0];
        if has_english_name(name):
            new_dataset.append(row);
    return new_dataset


# Two datasets to be evaluated:
# 1. apps_data_iOS
# 2. apps_data_android_clean

apps_data_iOS_eng = english_app_selector_iOS(apps_data_iOS);
print("len(apps_data_iOS_eng):\t\t\t",len(apps_data_iOS_eng));

apps_data_android_clean_eng = english_app_selector_andr(apps_data_android_clean);
print("len(apps_data_android_clean_eng):\t",len(apps_data_android_clean_eng));


len(apps_data_iOS_eng):			 6183
len(apps_data_android_clean_eng):	 9614


# Removing non-free apps from the iOS and Android app datasets

In [7]:
def free_app_selector_iOS(dataset):
    new_dataset = []
    for row in dataset:
        price = float(row[4]);
        if price == 0.0:
            new_dataset.append(row);
    return new_dataset

def free_app_selector_andr(dataset):
    new_dataset = []
    for row in dataset:
        price = row[7];
        if price == '0':
            new_dataset.append(row);
    return new_dataset

apps_data_iOS_eng_free = free_app_selector_iOS(apps_data_iOS_eng);
print("len(apps_data_iOS_eng_free):\t\t",len(apps_data_iOS_eng_free));


apps_data_android_clean_eng_free = free_app_selector_andr(apps_data_android_clean_eng);
print("len(apps_data_android_clean_eng_free):\t",len(apps_data_android_clean_eng_free));

len(apps_data_iOS_eng_free):		 3222
len(apps_data_android_clean_eng_free):	 8864
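
The selector functions used so far share one pattern: keep a row when a given column passes a check. A minimal sketch of that pattern as a single helper follows. `select_rows` and the sample rows are hypothetical and not part of the original notebook.

```python
def select_rows(rows, column, keep):
    """Return the rows for which keep(row[column]) is true."""
    return [row for row in rows if keep(row[column])]


# Made-up rows in the iOS layout used above: price is at index 4.
sample_ios = [
    ['1', 'App A', '100', 'USD', '0.0'],
    ['2', 'App B', '100', 'USD', '1.99'],
]

free_ios = select_rows(sample_ios, 4, lambda price: float(price) == 0.0)
print(free_ios)
```

The same helper could take `has_english_name` as the check, with the name column index for each dataset.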


The aim of our analysis is to determine the app profiles best suited to attracting the largest number of potential users in the iOS and Android app markets.

Our strategy for accomplishing this consists of three steps:
1. Build an Android app that meets the minimal requirements of the best-suited app profiles.
2. Develop the Android app further if it appeals to a large and growing user base.
3. If the Android app begins to generate profits after 6 months, build an app for the iOS app market.

# Frequency percentage tables for the iOS & Android app datasets, based on the number of apps in a given genre or category

In [8]:
# Genre columns for iOS and android:
# 1. iOS_prime_genre = row[11] or row[-5];
# 2. andr_Genres = row[9] or row[-4];
# 3. andr_Category = row[1];

# Note: this definition replaces the freq_table defined in In [3].
def freq_table(dataset,index):
    genre_tally = {}
    genre_tally_p = {}
    total = 0    # named total to avoid shadowing the built-in sum()
    for row in dataset:
        genre = row[index]
        if genre in genre_tally:
            genre_tally[genre] += 1
        else:
            genre_tally[genre] = 1

    for key in genre_tally:
        total += genre_tally[key]

    # Convert each count to a percentage of all rows.
    for key in genre_tally:
        genre_tally_p[key] = genre_tally[key]*(100/total)

    return genre_tally_p

def display_table(dataset,index):
    table = freq_table(dataset,index);
    tuple_list = [];
    for key in table:
        val_key_tuple = (table[key],key);
        tuple_list.append(val_key_tuple);
    
    sorted_tuple_list = sorted(tuple_list, reverse = True);
    for row in sorted_tuple_list:
        print(row[1],': ',row[0])


In [9]:
#fq_table_iOS_prime_genre:
print('fq_table_iOS_prime_genre:\n');
display_table(apps_data_iOS_eng_free,-5);
print('\n');

fq_table_iOS_prime_genre:

Games :  58.16263190564866
Entertainment :  7.883302296710117
Photo & Video :  4.9658597144630665
Education :  3.6623215394165114
Social Networking :  3.2898820608317814
Shopping :  2.60707635009311
Utilities :  2.513966480446927
Sports :  2.1415270018621975
Music :  2.0484171322160147
Health & Fitness :  2.0173805090006205
Productivity :  1.7380509000620732
Lifestyle :  1.5828677839851024
News :  1.3345747982619491
Travel :  1.2414649286157666
Finance :  1.1173184357541899
Weather :  0.8690254500310366
Food & Drink :  0.8069522036002482
Reference :  0.5586592178770949
Business :  0.5276225946617008
Book :  0.4345127250155183
Navigation :  0.186219739292365
Medical :  0.186219739292365
Catalogs :  0.12414649286157665




The frequency percentage table for the prime_genre column of the iOS apps dataset, apps_data_iOS_eng_free, indicates the following:

* The most common genre is Games (58.16 %), followed by Entertainment (7.88 %).
* The entertainment-oriented genres (Games, Entertainment, Photo & Video, Social Networking, Sports, Music) account for most (~78 %) of the apps in apps_data_iOS_eng_free.
* The remaining genres, most of them of practical utility (Education, Shopping, Utilities, Productivity, Lifestyle, ...), account for a little over a fifth (~21.5 %) of the apps in apps_data_iOS_eng_free.
* An app profile cannot be recommended from this table alone, because the number of apps in a genre does not necessarily indicate how many users that genre attracts.
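
A frequency percentage table of this kind can also be sketched with `collections.Counter`, whose `most_common()` method returns entries from most to least frequent. `percentage_table` and the sample rows are hypothetical. Because the percentage is computed here as `100 * n / total`, the last decimal digits may differ slightly from the output above.

```python
from collections import Counter


def percentage_table(rows, index):
    """Return (value, percentage) pairs for column `index`, most frequent first."""
    counts = Counter(row[index] for row in rows)
    total = sum(counts.values())
    return [(value, 100 * n / total) for value, n in counts.most_common()]


# Made-up rows: [name, genre]
sample_rows = [
    ['App A', 'Games'],
    ['App B', 'Games'],
    ['App C', 'Music'],
    ['App D', 'Games'],
]

table = percentage_table(sample_rows, 1)
for genre, pct in table:
    print(genre, ': ', pct)
```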

In [10]:
#fq_table_andr_Genres:
print('fq_table_andr_Genres:\n')
display_table(apps_data_android_clean_eng_free,-4);
print('\n');

fq_table_andr_Genres:

Tools :  8.44990974729242
Entertainment :  6.069494584837545
Education :  5.347472924187726
Business :  4.591606498194946
Productivity :  3.8921480144404335
Lifestyle :  3.8921480144404335
Finance :  3.700361010830325
Medical :  3.531137184115524
Sports :  3.4634476534296033
Personalization :  3.3167870036101084
Communication :  3.2378158844765346
Action :  3.1024368231046933
Health & Fitness :  3.0798736462093865
Photography :  2.9444945848375452
News & Magazines :  2.7978339350180508
Social :  2.6624548736462095
Travel & Local :  2.3240072202166067
Shopping :  2.2450361010830324
Books & Reference :  2.143501805054152
Simulation :  2.041967509025271
Dating :  1.861462093862816
Arcade :  1.8501805054151625
Video Players & Editors :  1.7712093862815885
Casual :  1.7599277978339352
Maps & Navigation :  1.3989169675090254
Food & Drink :  1.2409747292418774
Puzzle :  1.128158844765343
Racing :  0.9927797833935019
Role Playing :  0.9363718411552348
Libraries & Demo : 

In [11]:
#fq_table_andr_Category:
print('fq_table_andr_Category:\n')
display_table(apps_data_android_clean_eng_free,1);
print('\n');


fq_table_andr_Category:

FAMILY :  18.90794223826715
GAME :  9.724729241877258
TOOLS :  8.461191335740073
BUSINESS :  4.591606498194946
LIFESTYLE :  3.903429602888087
PRODUCTIVITY :  3.8921480144404335
FINANCE :  3.700361010830325
MEDICAL :  3.531137184115524
SPORTS :  3.3957581227436826
PERSONALIZATION :  3.3167870036101084
COMMUNICATION :  3.2378158844765346
HEALTH_AND_FITNESS :  3.0798736462093865
PHOTOGRAPHY :  2.9444945848375452
NEWS_AND_MAGAZINES :  2.7978339350180508
SOCIAL :  2.6624548736462095
TRAVEL_AND_LOCAL :  2.3352888086642603
SHOPPING :  2.2450361010830324
BOOKS_AND_REFERENCE :  2.143501805054152
DATING :  1.861462093862816
VIDEO_PLAYERS :  1.7937725631768955
MAPS_AND_NAVIGATION :  1.3989169675090254
FOOD_AND_DRINK :  1.2409747292418774
EDUCATION :  1.1620036101083033
ENTERTAINMENT :  0.9589350180505416
LIBRARIES_AND_DEMO :  0.9363718411552348
AUTO_AND_VEHICLES :  0.9250902527075813
HOUSE_AND_HOME :  0.8235559566787004
WEATHER :  0.8009927797833936
EVENTS :  0.7107400722

The frequency percentage tables for the Genres and Category columns of the Android apps dataset, apps_data_android_clean_eng_free, indicate the following:

* The most common genre is Tools (8.4 %), followed by Entertainment (6.1 %) and Education (5.3 %).
* The app categories of practical utility (Family, Tools, Business, Productivity, ...) account for over two-thirds (66+ %) of the apps in apps_data_android_clean_eng_free.
* The entertainment-oriented categories (Game, Lifestyle, Sports, Photography, Social, Dating, Entertainment, Art and Design, Comics) account for over a quarter (25+ %) of the apps in apps_data_android_clean_eng_free.
* Up to this point, we have found that the iOS apps dataset is dominated by entertainment apps, while the Android apps dataset shows a more balanced mix of practical and entertainment apps.
* An app profile cannot be recommended from these tables alone, because the number of apps in a genre does not necessarily indicate how many users that genre attracts.

* Most frequent app categories/genres in the Android and iOS app datasets:
    * iOS: Games, Entertainment, Photo & Video, Social Networking, Sports, Music
    * Android: Family, Tools, Business, Productivity

# Average number of ratings per genre for the iOS app dataset

In [12]:
def freq_table_gen_rct(dataset):
    gen_rct = {};
    
    for row in dataset:
        genre = row[-5];
        total = 0;
        len_genre = 0;

        if genre not in gen_rct:
            for row in dataset:
                genre_app = row[-5];
                rct = float(row[5]);
                if genre_app == genre:
                    total += rct;
                    len_genre += 1;
            
            gen_rct[genre] = total/len_genre;
            
    return gen_rct

def freq_table_sorted_gen_rct(dataset):
    gen_rct = freq_table_gen_rct(dataset);
    tuple_gen_rct = [];
    sorted_gen_rct = [];
    
    for key in gen_rct:
        val_key_gen_rct = (gen_rct[key],key);
        tuple_gen_rct.append(val_key_gen_rct);
    
    sorted_tuple_gen_rct = sorted(tuple_gen_rct, reverse = True);
    
    for row in sorted_tuple_gen_rct:
        key_gen = row[1];
        val_rct = row[0]; 
        row_c = [key_gen,val_rct];
        sorted_gen_rct.append(row_c);
        print(key_gen, ": ", val_rct)
    
    
    return sorted_gen_rct

dataset_gen_rct = freq_table_sorted_gen_rct(apps_data_iOS_eng_free)

print('\n');
print(dataset_gen_rct[:5])



Navigation :  86090.33333333333
Reference :  74942.11111111111
Social Networking :  71548.34905660378
Music :  57326.530303030304
Weather :  52279.892857142855
Book :  39758.5
Food & Drink :  33333.92307692308
Finance :  31467.944444444445
Photo & Video :  28441.54375
Travel :  28243.8
Shopping :  26919.690476190477
Health & Fitness :  23298.015384615384
Sports :  23008.898550724636
Games :  22788.6696905016
News :  21248.023255813954
Productivity :  21028.410714285714
Utilities :  18684.456790123455
Lifestyle :  16485.764705882353
Entertainment :  14029.830708661417
Business :  7491.117647058823
Education :  7003.983050847458
Catalogs :  4004.0
Medical :  612.0


[['Navigation', 86090.33333333333], ['Reference', 74942.11111111111], ['Social Networking', 71548.34905660378], ['Music', 57326.530303030304], ['Weather', 52279.892857142855]]


> Based on the average number of user ratings (rating_count_tot) per genre (prime_genre) shown above, the following genres are recommended for app profiles for the iOS App Store:
>
> 1. Navigation
> 2. Reference/Social Networking
> 3. Music
> 4. Weather
>
> E.g.: Waze + Wikipedia/Reddit + Spotify + Weatherapp
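
The per-genre averages above are computed by rescanning the dataset once per genre. A minimal sketch of a single-pass alternative keeps a running total and count per group. `average_by_group` and the sample rows are hypothetical and not the notebook's implementation.

```python
from collections import defaultdict


def average_by_group(rows, group_index, value_index):
    """Average the numeric column `value_index` within each group, in one pass."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for row in rows:
        group = row[group_index]
        totals[group] += float(row[value_index])
        counts[group] += 1
    return {group: totals[group] / counts[group] for group in totals}


# Made-up rows: [name, genre, rating count]
sample_rows = [
    ['App A', 'Navigation', '300'],
    ['App B', 'Navigation', '100'],
    ['App C', 'Music', '50'],
]

averages = average_by_group(sample_rows, 1, 2)
# Print genres from highest to lowest average.
for genre, avg in sorted(averages.items(), key=lambda item: item[1], reverse=True):
    print(genre, ': ', avg)
```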


# Average number of installs per category for the Android app dataset

In [13]:
def freq_table_categ_ninst(dataset):
    categ_ninst = {}
    for row in dataset:
        categ = row[1]
        total = 0
        len_category = 0
        if categ not in categ_ninst:
            # Average the installs over every app in this category.
            for app in dataset:
                categ_app = app[1]
                if categ_app == categ:
                    # Installs look like '10,000+': strip '+' and ',' before converting.
                    str_install = app[5]
                    str_install = str_install.replace('+','')
                    str_install = str_install.replace(',','')
                    n_install = float(str_install)
                    total += n_install
                    len_category += 1

            # Guard against dividing by zero.
            if len_category != 0:
                categ_ninst[categ] = total/len_category

    return categ_ninst

def dataset_sorted_categ_ninst(dataset):
    sorted_dataset_categ_ninst = [];
    tuple_categ_ninst = [];
    table = freq_table_categ_ninst(dataset);
    for key in table:
        key_val_tup = (table[key],key);
        tuple_categ_ninst.append(key_val_tup);
    
    sorted_tuple_categ_ninst = sorted(tuple_categ_ninst, reverse = True);
    
    for row in sorted_tuple_categ_ninst:
        key_categ = row[1];
        val_ninst = row[0];
        print(key_categ,': ',val_ninst);
        row_c = [key_categ,val_ninst];
        sorted_dataset_categ_ninst.append(row_c);
    
    return sorted_dataset_categ_ninst

dataset_categ_ninst = dataset_sorted_categ_ninst(apps_data_android_clean_eng_free);
print('\n');
#print(dataset_categ_ninst[:5])
    

COMMUNICATION :  38456119.167247385
VIDEO_PLAYERS :  24727872.452830188
SOCIAL :  23253652.127118643
PHOTOGRAPHY :  17840110.40229885
PRODUCTIVITY :  16787331.344927534
GAME :  15588015.603248259
TRAVEL_AND_LOCAL :  13984077.710144928
ENTERTAINMENT :  11640705.88235294
TOOLS :  10801391.298666667
NEWS_AND_MAGAZINES :  9549178.467741935
BOOKS_AND_REFERENCE :  8767811.894736841
SHOPPING :  7036877.311557789
PERSONALIZATION :  5201482.6122448975
WEATHER :  5074486.197183099
HEALTH_AND_FITNESS :  4188821.9853479853
MAPS_AND_NAVIGATION :  4056941.7741935486
FAMILY :  3695641.8198090694
SPORTS :  3638640.1428571427
ART_AND_DESIGN :  1986335.0877192982
FOOD_AND_DRINK :  1924897.7363636363
EDUCATION :  1833495.145631068
BUSINESS :  1712290.1474201474
LIFESTYLE :  1437816.2687861272
FINANCE :  1387692.475609756
HOUSE_AND_HOME :  1331540.5616438356
DATING :  854028.8303030303
COMICS :  817657.2727272727
AUTO_AND_VEHICLES :  647317.8170731707
LIBRARIES_AND_DEMO :  638503.734939759
PARENTING :  54

> Based on the average number of installs (Installs) per category (Category) shown above, the following categories are recommended for app profiles for the Android app store:
>
> 1. Communication
> 2. Video_Players
> 3. Social
> 4. Photography
>
> E.g.: Wire + VLC + Reddit + Adobe Lightroom
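
The install counts used above are stored as strings such as '10,000+', so they have to be cleaned before any arithmetic. The conversion can be sketched in isolation as follows. `parse_installs` is a hypothetical helper, and it returns an `int` where the notebook uses `float`. Because each value is an open-ended bucket, the parsed number is a lower bound on the real install count.

```python
def parse_installs(text):
    """Convert a string such as '10,000+' to the integer 10000."""
    return int(text.replace('+', '').replace(',', ''))


print(parse_installs('10,000+'))     # 10000
print(parse_installs('5,000,000+'))  # 5000000
```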

# Average ratings per genre for the Android app dataset

In [14]:
#dataset = apps_data_android_clean_eng_free; genre = row[-4]; rating = row[2]; 

def freq_table_andr_genre_rating(dataset,in1,in2):
    import math    # the standard library's isnan is enough; numpy is not needed
    fq_table = {}
    for row in dataset:
        genre = row[in1]

        if genre not in fq_table:
            total = 0
            len_genre = 0
            for app in dataset:
                genre_app = app[in1]
                if genre_app == genre:
                    rating = float(app[in2])
                    # Skip apps whose rating is missing (NaN).
                    if not math.isnan(rating):
                        total += rating
                        len_genre += 1
            # Genres with no rated apps are left out to avoid dividing by zero.
            if len_genre != 0:
                fq_table[genre] = total/len_genre

    return fq_table


def sorted_tuple(dataset,in1,in2):
    tup_list = [];           
    table = freq_table_andr_genre_rating(dataset,in1,in2)
                
    for key in table:
        key_val_tup = (table[key],key);
        tup_list.append(key_val_tup);

    sorted_tup_list = sorted(tup_list,reverse = True);
                 
    for tup in sorted_tup_list:
        key = tup[1];
        val = tup[0];
        print(key,": ",val)        

           
                 
dataset = apps_data_android_clean_eng_free; in1 = -4; in2 = 2; 
sorted_tuple(dataset,in1,in2)                

Comics;Creativity :  4.8
Health & Fitness;Education :  4.7
Strategy;Action & Adventure :  4.6
Puzzle;Education :  4.6
Simulation;Pretend Play :  4.55
Entertainment;Creativity :  4.533333333333333
Tools;Education :  4.5
Strategy;Education :  4.5
Sports;Action & Adventure :  4.5
Racing;Pretend Play :  4.5
Arcade;Pretend Play :  4.5
Casual;Brain Games :  4.475
Music;Music & Video :  4.449999999999999
Events :  4.435555555555557
Education;Brain Games :  4.433333333333334
Strategy;Creativity :  4.4
Simulation;Education :  4.4
Puzzle;Creativity :  4.4
Entertainment;Education :  4.4
Arcade;Action & Adventure :  4.4
Adventure;Action & Adventure :  4.3999999999999995
Parenting :  4.3921052631578945
Education;Creativity :  4.375
Puzzle :  4.355421686746987
Casual;Creativity :  4.35
Art & Design;Creativity :  4.35
Books & Reference :  4.347798742138364
Simulation;Action & Adventure :  4.3428571428571425
Art & Design :  4.3352941176470585
Role Playing;Action & Adventure :  4.333333333333333
Parent

# Average ratings per category for the Android app dataset


In [15]:
#dataset = apps_data_android_clean_eng_free; category = row[1]; rating = row[2]; 

dataset = apps_data_android_clean_eng_free; in1 = 1; in2 = 2; 
sorted_tuple(dataset,in1,in2)            


EVENTS :  4.435555555555557
BOOKS_AND_REFERENCE :  4.347798742138364
EDUCATION :  4.3401960784313705
PARENTING :  4.3395833333333345
ART_AND_DESIGN :  4.338181818181818
PERSONALIZATION :  4.300000000000001
BEAUTY :  4.278571428571428
SOCIAL :  4.252736318407958
HEALTH_AND_FITNESS :  4.236051502145922
GAME :  4.2320341047503085
WEATHER :  4.229230769230768
SHOPPING :  4.227528089887643
SPORTS :  4.212605042016807
AUTO_AND_VEHICLES :  4.184722222222223
PRODUCTIVITY :  4.1819148936170265
LIBRARIES_AND_DEMO :  4.178125
COMICS :  4.177358490566039
FAMILY :  4.171361185983833
FOOD_AND_DRINK :  4.1673913043478255
PHOTOGRAPHY :  4.164516129032258
MEDICAL :  4.147807017543858
HOUSE_AND_HOME :  4.140983606557378
FINANCE :  4.128373702422146
COMMUNICATION :  4.126923076923076
ENTERTAINMENT :  4.118823529411763
NEWS_AND_MAGAZINES :  4.1045454545454545
BUSINESS :  4.10395256916996
LIFESTYLE :  4.082078853046592
TRAVEL_AND_LOCAL :  4.068156424581004
VIDEO_PLAYERS :  4.043448275862069
MAPS_AND_NAVIGA

> Based on the average rating (Rating) per category (Category) shown above, the following categories are recommended for app profiles for the Android app store, since their average ratings leave room for the development of better-rated apps:
>
> 1. Communication
> 2. Video_Players
> 3. Social
> 4. Photography
>
> E.g.: Wire + VLC + Reddit + Adobe Lightroom
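
The rating averages above skip values that parse to NaN. A small sketch of that step on its own, assuming missing ratings are stored as the string 'NaN' (which `float()` accepts), could look like this. `mean_ignoring_nan` is a hypothetical helper.

```python
import math


def mean_ignoring_nan(values):
    """Average the numeric strings in `values`, skipping NaN entries."""
    numbers = [float(v) for v in values]
    valid = [x for x in numbers if not math.isnan(x)]
    # Return None when nothing is left to average.
    return sum(valid) / len(valid) if valid else None


print(mean_ignoring_nan(['4.0', 'NaN', '5.0']))  # 4.5
print(mean_ignoring_nan(['NaN']))                # None
```

Skipping NaN values matters because any sum that includes a NaN is itself NaN, which would blank out the whole group's average.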

# Picking the top three apps in each of the above categories based on number of installs for an app

In [16]:
#dataset = apps_data_android_clean_eng_free; category = row[1]; name = row[0]; n_installs = row[5]; 

def table_keyword(dataset,in1,in2,in3,keyword):
    # in1: category column, in2: app name column, in3: installs column
    keyword_dict = {}
    for row in dataset:
        categ = row[in1]
        if categ == keyword:
            name = row[in2]
            # Installs look like '10,000+': strip '+' and ',' before converting.
            str_install = row[in3]
            str_install = str_install.replace("+","")
            str_install = str_install.replace(",","")
            n_install   = float(str_install)
            keyword_dict[name] = n_install
    return keyword_dict

 
def sorted_table(table):
    
    tup_list = [];
    
    for key in table:
        key_val_tup = (table[key],key);
        tup_list.append(key_val_tup);
    
    sorted_tup_list = sorted(tup_list,reverse = True);
    
    count = 0;
    for key_val in sorted_tup_list:
        if count>=3:
            break;
        name_ = key_val[1]; ninst = key_val[0];
        print(name_,': ',ninst);
        count += 1;
    
    
        
dataset = apps_data_android_clean_eng_free; 
in1 = 1; in2 = 0; in3 = 5; keyword = "COMMUNICATION"
print(keyword)
table = table_keyword(dataset,in1,in2,in3,keyword); 
sorted_table(table)

COMMUNICATION
WhatsApp Messenger :  1000000000.0
Skype - free IM & video calls :  1000000000.0
Messenger – Text and Video Chat for Free :  1000000000.0


In [17]:
dataset = apps_data_android_clean_eng_free; 
in1 = 1; in2 = 0; in3 = 5; keyword = "VIDEO_PLAYERS"
print(keyword)
table = table_keyword(dataset,in1,in2,in3,keyword); 
sorted_table(table)

VIDEO_PLAYERS
YouTube :  1000000000.0
Google Play Movies & TV :  1000000000.0
MX Player :  500000000.0


In [18]:
dataset = apps_data_android_clean_eng_free; 
in1 = 1; in2 = 0; in3 = 5; keyword = "SOCIAL";
print(keyword)
table = table_keyword(dataset,in1,in2,in3,keyword); 
sorted_table(table)

SOCIAL
Instagram :  1000000000.0
Google+ :  1000000000.0
Facebook :  1000000000.0


In [19]:
dataset = apps_data_android_clean_eng_free; 
in1 = 1; in2 = 0; in3 = 5; keyword = "PHOTOGRAPHY";
print(keyword)
table = table_keyword(dataset,in1,in2,in3,keyword); 
sorted_table(table)

PHOTOGRAPHY
Google Photos :  1000000000.0
Z Camera - Photo Editor, Beauty Selfie, Collage :  100000000.0
YouCam Perfect - Selfie Photo Editor :  100000000.0


> The above information indicates that a multi-purpose app offering the services of a communication app, a video player app, a social networking app, and a photography app will likely appeal to the largest number of potential users on the Android app store.
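
The top-three listings above are produced by sorting `(installs, name)` tuples in descending order, which is why apps tied on installs appear in reverse alphabetical order. As a hedged alternative, `heapq.nlargest` from the standard library selects the top entries by value without sorting the whole table. `top_n` and the sample data are hypothetical.

```python
import heapq


def top_n(table, n=3):
    """Return the n (name, value) pairs with the largest values."""
    return heapq.nlargest(n, table.items(), key=lambda item: item[1])


# Made-up install counts.
sample = {'App A': 500.0, 'App B': 1000.0, 'App C': 10.0, 'App D': 100.0}

for name, installs in top_n(sample):
    print(name, ': ', installs)
```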

# Average rating per genre for the iOS app dataset

In [20]:
def freq_table_genre_rating(dataset):
    genre_rating = {};
    for row in dataset:
        genre = row[-5];
        total = 0;
        len_genre = 0;
        if genre not in genre_rating:
            for row in dataset:
                genre_app = row[-5];
                if genre_app == genre:
                    rating = float(row[7]);
                    total += rating;
                    len_genre += 1;
            
            genre_rating[genre] = total/len_genre;
    
    return genre_rating

def dataset_sorted_genre_rating(dataset):
    sorted_dataset_genre_rating = [];
    tuple_genre_rating = [];
    table = freq_table_genre_rating(dataset);
    for key in table:
        key_val_tup = (table[key],key);
        tuple_genre_rating.append(key_val_tup);
    
    sorted_tuple_genre_rating = sorted(tuple_genre_rating, reverse = True);
    
    for row in sorted_tuple_genre_rating:
        key_genre = row[1];
        val_rating = row[0];
        print(key_genre,': ',val_rating);
        row_c = [key_genre,val_rating];
        sorted_dataset_genre_rating.append(row_c);
    
    return sorted_dataset_genre_rating

dataset_genre_rating = dataset_sorted_genre_rating(apps_data_iOS_eng_free);
print('\n');
#print(dataset_genre_rating[:5])
    

Catalogs :  4.125
Games :  4.037086446104589
Productivity :  4.0
Business :  3.9705882352941178
Shopping :  3.9702380952380953
Music :  3.946969696969697
Photo & Video :  3.903125
Navigation :  3.8333333333333335
Health & Fitness :  3.769230769230769
Reference :  3.6666666666666665
Education :  3.635593220338983
Food & Drink :  3.6346153846153846
Social Networking :  3.5943396226415096
Entertainment :  3.5393700787401574
Utilities :  3.5308641975308643
Travel :  3.4875
Weather :  3.482142857142857
Lifestyle :  3.411764705882353
Finance :  3.375
News :  3.244186046511628
Book :  3.0714285714285716
Sports :  3.0652173913043477
Medical :  3.0




> Based on the average user rating (user_rating) per genre (prime_genre) shown above, the following genres are recommended for app profiles for the iOS App Store, since their average ratings leave room for the development of better-rated apps:
>
> 1. Navigation
> 2. Social Networking
> 3. Music
> 4. Weather
>
> E.g.: Waze + Reddit + Spotify + Weatherapp

# Picking the top three apps in each of the above categories based on total number of ratings for an app

In [21]:
#dataset = apps_data_iOS_eng_free:
#prime_genre = row[-5];
#track_name = row[1];
#rating_count_tot = row[5];


def dict_keyword(dataset,in1,in2,in3,keyword):
    keyword_dict = {};
    for row in dataset:
        genre = row[in1];
        if genre == keyword:
            name = row[in2];
            r_ct = float(row[in3]);
            keyword_dict[name] = r_ct;
            
    return keyword_dict

def dataset_sorted(dataset,in1,in2,in3,keyword):
    
    table = dict_keyword(dataset,in1,in2,in3,keyword);
    tup_list = [];
    for key in table: 
        key_val_tup = (table[key], key);
        tup_list.append(key_val_tup);
    
    sorted_tup_list = sorted(tup_list,reverse = True);
    
    count = 0;
    for row in sorted_tup_list:
        name = row[1];
        rct = row[0];
        print(name,": ",rct);
        count += 1;
        if count>=3:
            break;

in1 = -5; 
in2 = 1; 
in3 = 5; 
keyword = 'Navigation';
dataset = apps_data_iOS_eng_free
print(keyword)
dataset_sorted(dataset,in1,in2,in3,keyword)
        
        

Navigation
Waze - GPS Navigation, Maps & Real-time Traffic :  345046.0
Google Maps - Navigation & Transit :  154911.0
Geocaching® :  12811.0


In [22]:
#dataset = apps_data_iOS_eng_free:
#prime_genre = row[-5];
#track_name = row[1];
#rating_count_tot = row[5];

in1 = -5; 
in2 = 1; 
in3 = 5; 
keyword = 'Social Networking';
dataset = apps_data_iOS_eng_free
print(keyword)
dataset_sorted(dataset,in1,in2,in3,keyword)
        

Social Networking
Facebook :  2974676.0
Pinterest :  1061624.0
Skype for iPhone :  373519.0


In [23]:
#dataset = apps_data_iOS_eng_free:
#prime_genre = row[-5];
#track_name = row[1];
#rating_count_tot = row[5];

in1 = -5; 
in2 = 1; 
in3 = 5; 
keyword = 'Music';
dataset = apps_data_iOS_eng_free
print(keyword)
dataset_sorted(dataset,in1,in2,in3,keyword)

Music
Pandora - Music & Radio :  1126879.0
Spotify Music :  878563.0
Shazam - Discover music, artists, videos & lyrics :  402925.0


In [24]:
#dataset = apps_data_iOS_eng_free:
#prime_genre = row[-5];
#track_name = row[1];
#rating_count_tot = row[5];

in1 = -5; 
in2 = 1; 
in3 = 5; 
keyword = 'Weather';
dataset = apps_data_iOS_eng_free
print(keyword)
dataset_sorted(dataset,in1,in2,in3,keyword)

Weather
The Weather Channel: Forecast, Radar & Alerts :  495626.0
The Weather Channel App for iPad – best local forecast, radar map, and storm tracking :  208648.0
WeatherBug - Local Weather, Radar, Maps, Alerts :  188583.0


> The above information indicates that a multi-purpose app offering the services of a navigation app, a social networking app, a music streaming service, and a weather app will likely appeal to the largest number of potential users on the iOS App Store.

> Overall, an app that offers social networking/communication services will likely appeal to the largest number of potential users on both the Android and iOS app stores.