# Profitable App Profiles for the App Store and Google Play Markets
Our aim is to find the profiles of mobile apps that are profitable on both the App Store and Google Play, which will help developers understand what kinds of apps are likely to attract more users.

At our company, we only build apps that are free to download and install, and our main source of revenue consists of in-app ads. This means that our revenue for any given app is mostly influenced by the number of users that use our app. Our goal for this project is to analyze data to help our developers understand what kinds of apps are likely to attract more users.



# Opening and exploring data files
We have one data file for each market:

- A data set containing data about approximately 10,000 Android apps from Google Play. You can download the data set directly from this <a href="https://www.kaggle.com/lava18/google-play-store-apps#googleplaystore.csv">Google Playstore dataset</a>.

- A data set containing data about approximately 7,000 iOS apps from the App Store; the data was collected in July 2017. You can download the data set directly from this link.

Are you excited to open these files and explore? Our journey begins now.

In [1]:
from csv import reader

# Opening the Google Play data set (the with block closes the file for us)
with open('googleplaystore.csv', encoding='utf8') as opened_file:
    android = list(reader(opened_file))
android_header = android[0]
android = android[1:]

# Opening the App Store data set
with open('AppleStore.csv', encoding='utf8') as opened_file:
    ios = list(reader(opened_file))
ios_header = ios[0]
ios = ios[1:]

Let's now start exploring the rows. Our explore_data() function prints a slice of a data set and, optionally, its row and column counts. Below we print
- the column names and
- the first 5 rows of googleplaystore.csv.

The description of each column can be read here: <a href="https://www.kaggle.com/lava18/google-play-store-apps">Link to column description</a>

googleplaystore.csv - row and column count excluding header.
- no_of_cols:  13
- no_of_rows:  10841 

In [2]:
def explore_data(dataset, start, end, rows_and_columns = False):
    dataset_slice = dataset[start:end]
    for row in dataset_slice:
        print(row)
        print("\n")
    if rows_and_columns:
        print("no_of_cols: ", len(dataset[0]))
        print("no_of_rows: ", len(dataset))
print(android_header)
print("\n")
    
explore_data(android,0,5,True)
        

['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']


['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['Coloring book moana', 'ART_AND_DESIGN', '3.9', '967', '14M', '500,000+', 'Free', '0', 'Everyone', 'Art & Design;Pretend Play', 'January 15, 2018', '2.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 and up']


['Pixel Draw - Number Art Coloring Book', 'ART_AND_DESIGN', '4.3', '967', '2.8M', '100,000+', 'Free', '0', 'Eve

Next we print the iOS header and the first five rows of AppleStore.csv.

AppleStore.csv -  row and column count excluding header
- no_of_cols:  16
- no_of_rows:  7197

In [3]:
print(ios_header)
explore_data(ios,0,5, True)

['id', 'track_name', 'size_bytes', 'currency', 'price', 'rating_count_tot', 'rating_count_ver', 'user_rating', 'user_rating_ver', 'ver', 'cont_rating', 'prime_genre', 'sup_devices.num', 'ipadSc_urls.num', 'lang.num', 'vpp_lic']
['284882215', 'Facebook', '389879808', 'USD', '0.0', '2974676', '212', '3.5', '3.5', '95.0', '4+', 'Social Networking', '37', '1', '29', '1']


['389801252', 'Instagram', '113954816', 'USD', '0.0', '2161558', '1289', '4.5', '4.0', '10.23', '12+', 'Photo & Video', '37', '0', '29', '1']


['529479190', 'Clash of Clans', '116476928', 'USD', '0.0', '2130805', '579', '4.5', '4.5', '9.24.12', '9+', 'Games', '38', '5', '18', '1']


['420009108', 'Temple Run', '65921024', 'USD', '0.0', '1724546', '3842', '4.5', '4.0', '1.6.2', '9+', 'Games', '40', '5', '1', '1']


['284035177', 'Pandora - Music & Radio', '130242560', 'USD', '0.0', '1126879', '3594', '4.0', '4.5', '8.4.1', '12+', 'Music', '37', '4', '1', '1']


no_of_cols:  16
no_of_rows:  7197


# Data Cleaning
Let's remove
- inaccurate data
- duplicate app entries
- non-English apps

Let's dive into the inaccurate data first.
googleplaystore.csv has an incorrect row with fewer columns than the other rows: its 'Category' value is missing, so the remaining values are shifted one column to the left.
Let's delete the incorrect row.

In [4]:
print(android[10472]) #incorrect row
print("\n")
print("cols of incorrect data - ", len(android[10472]))
print("\n")
print(android_header) #prints header
print("\n")
print("cols of android header - ", len(android_header))
print("\n")
print(android[0]) #prints correct row
print("\n")
print("cols of normal row - ", len(android[0]))

['Life Made WI-Fi Touchscreen Photo Frame', '1.9', '19', '3.0M', '1,000+', 'Free', '0', 'Everyone', '', 'February 11, 2018', '1.0.19', '4.0 and up']


cols of incorrect data -  12


['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']


cols of android header -  13


['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


cols of normal row -  13


# Remove incorrect data

function removeIncorrectData(android):
- error_dataset is a list that stores each malformed row.
- Takes the android data set as a parameter and removes every row where len(row) != len(android_header).
- The function can be run any number of times. If there are no malformed rows, it doesn't delete anything.



In [5]:
def removeIncorrectData(android):
    error_dataset = []
    print("Initial count",len(android))
    
    for app in android:
        if  len(android_header) != len(app):
            error_dataset.append(app)
             
    for i in error_dataset:
        android.remove(i)
    print("latest count",len(android))
    print("length of header" , len(android_header))
    print("length of app", len(app))
    
removeIncorrectData(android)


Initial count 10841
latest count 10840
length of header 13
length of app 13
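
As an alternative to removing rows in place, the same length check can be written as a filter that returns new lists. This is only a sketch: `drop_malformed_rows` and the sample rows are hypothetical and are not part of the notebook above.

```python
def drop_malformed_rows(rows, header):
    """Split rows into those matching the header length and the rest."""
    expected = len(header)
    good = [row for row in rows if len(row) == expected]
    bad = [row for row in rows if len(row) != expected]
    return good, bad

# Made-up sample data, for illustration only.
sample_header = ['App', 'Category', 'Rating']
sample_rows = [
    ['App A', 'TOOLS', '4.1'],
    ['App B', '1.9'],            # the Category value is missing
    ['App C', 'GAME', '4.5'],
]

good, bad = drop_malformed_rows(sample_rows, sample_header)
print(len(good), "rows kept,", len(bad), "rows rejected")
```

Because the input list is left untouched, the rejected rows stay available for inspection, and running the function twice gives the same result.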


# Removing duplicate entries

First, let's find out the total number of duplicate apps.

We use 2 lists:
- duplicate_android_apps = [] stores app names that repeat
- unique_android_apps = [] stores unique app names

Let's count the duplicate apps in the Android data set. The first column in googleplaystore.csv holds the app name.


In [6]:
duplicate_android_apps = []
unique_android_apps = []
for app in android[0:]:
    name = app[0] # app name is stored at index 0
    if name in unique_android_apps:
        duplicate_android_apps.append(name)
    else:
        unique_android_apps.append(name)
print("Total no. of duplicate apps- ", len(duplicate_android_apps))
print("Total no. of unique apps- ", len(unique_android_apps))
print("Examples of duplicate apps- ", duplicate_android_apps[:10])

Total no. of duplicate apps-  1181
Total no. of unique apps-  9659
Examples of duplicate apps-  ['Quick PDF Scanner + OCR FREE', 'Box', 'Google My Business', 'ZOOM Cloud Meetings', 'join.me - Simple Meetings', 'Box', 'Zenefits', 'Google Ads', 'Google My Business', 'Slack']
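
Checking `name in unique_android_apps` scans a list on every iteration, which gets slow as the list grows. A possible alternative, sketched here with the standard library's `collections.Counter`, counts every name in one pass. The helper name `count_duplicates` and the sample rows are assumptions made for illustration.

```python
from collections import Counter

def count_duplicates(rows, name_index):
    """Return (number of unique names, number of repeated occurrences)."""
    counts = Counter(row[name_index] for row in rows)
    n_unique = len(counts)
    # Every occurrence beyond the first one counts as a duplicate,
    # which matches how the loop above fills duplicate_android_apps.
    n_duplicates = sum(counts.values()) - n_unique
    return n_unique, n_duplicates

# Made-up sample rows: 'Box' appears three times.
sample_rows = [['Box'], ['Slack'], ['Box'], ['Box'], ['Zenefits']]
n_unique, n_duplicates = count_duplicates(sample_rows, 0)
print("unique:", n_unique, "duplicates:", n_duplicates)
```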


Now, let's see whether the iOS data set has any duplicate apps.
It has only 2 duplicate app names.

In [7]:
duplicate_ios_apps =[]
unique_ios_apps =[]
for app in ios[0:]:
    name = app[1]  # app name is stored at index 1
    if name in unique_ios_apps:
        duplicate_ios_apps.append(name)
    else:
        unique_ios_apps.append(name)
print("Total no. of duplicate ios apps- ", len(duplicate_ios_apps))
print("Total no. of unique ios apps- ", len(unique_ios_apps))
print("The duplicate ios apps are- ", duplicate_ios_apps)

Total no. of duplicate ios apps-  2
Total no. of unique ios apps-  7195
The duplicate ios apps are-  ['Mannequin Challenge', 'VR Roller Coaster']


Let's create a dictionary reviews_max = {} where each key is a unique app name and the value is the highest number of reviews recorded for that app.
We then use the dictionary to create a new data set with only one entry per app, keeping the entry with the highest number of reviews.
The output below confirms that reviews_max has exactly one key per unique app name.

In [8]:
reviews_max ={}
for app in android[0:]:
    name = app[0]
    n_reviews = float(app[3])
    if name in reviews_max and reviews_max[name] < n_reviews:
        reviews_max[name] = n_reviews
    elif name not in reviews_max:
         reviews_max[name] = n_reviews
print("len of reviews_max:", len(reviews_max))
print("len of unique_apps:" , len(unique_android_apps))

len of reviews_max: 9659
len of unique_apps: 9659


Using the dictionary created above, let's now remove the duplicate rows.
We use 2 lists:
- android_clean stores the new, cleaned data set
- already_added stores just the app names, so that an app whose duplicate rows share the same maximum review count is added only once

In [9]:
android_clean = []
already_added = []
for app in android[0:]:
    name = app[0]
    n_reviews = float(app[3])
    if n_reviews == reviews_max[name] and name not in already_added:
        android_clean.append(app)
        already_added.append(name)
explore_data(android_clean,0,5, True)

['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 and up']


['Pixel Draw - Number Art Coloring Book', 'ART_AND_DESIGN', '4.3', '967', '2.8M', '100,000+', 'Free', '0', 'Everyone', 'Art & Design;Creativity', 'June 20, 2018', '1.1', '4.4 and up']


['Paper flowers instructions', 'ART_AND_DESIGN', '4.4', '167', '5.6M', '50,000+', 'Free', '0', 'Everyone', 'Art & Design', 'March 26, 2017', '1.0', '2.3 and up']


no_of_cols:  13
no_of_rows:  9659
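
The two passes above (build reviews_max, then filter) can also be folded into a single pass that remembers the best row per app name. The sketch below is a hypothetical variant, not the notebook's code; `keep_most_reviewed` and the sample rows are invented for illustration. Note that it returns rows in the order each name was first seen, which may differ from the order in android_clean.

```python
def keep_most_reviewed(rows, name_index, reviews_index):
    """Keep one row per app name: the one with the most reviews."""
    best = {}  # app name -> row with the highest review count seen so far
    for row in rows:
        name = row[name_index]
        reviews = float(row[reviews_index])
        # A strict '>' keeps the first row when review counts are tied.
        if name not in best or reviews > float(best[name][reviews_index]):
            best[name] = row
    return list(best.values())

# Made-up sample rows: [name, category, reviews]
sample_rows = [
    ['Box', 'BUSINESS', '159'],
    ['Slack', 'BUSINESS', '51507'],
    ['Box', 'BUSINESS', '160'],
    ['Slack', 'BUSINESS', '51507'],
]
deduplicated = keep_most_reviewed(sample_rows, 0, 2)
for row in deduplicated:
    print(row)
```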


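# Removing non-English apps
Our goal as a company is to build free English apps which are likely to attract many users.
Let's see whether there are any non-English apps in the Android and iOS data sets.

Let's write a function isEnglish to incorporate this logic. It counts the characters in an app name that fall outside the ASCII range (ord(i) > 127) and treats the name as non-English only if there are more than 3 of them, so names containing an emoji or a ™ symbol are still kept.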

In [10]:
def isEnglish(inputStr):
    non_char_count = 0
    for i in inputStr:
        if ord(i) > 127:
            non_char_count += 1
    # more than 3 non-ASCII characters: treat the name as non-English
    return non_char_count <= 3
print(isEnglish('Instagram'))
print(isEnglish('爱奇艺PPS -《欢乐颂2》电视剧热播'))
print(isEnglish('Docs To Go™ Free Office Suite'))
print(isEnglish('Instachat 😜'))
    

True
False
True
True
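
The cut-off of 3 non-ASCII characters is a heuristic, so it can be useful to expose it as a parameter and see how the result changes. The following is a minimal sketch of that idea; the name `is_english` and the `max_non_ascii` parameter are assumptions and do not appear in the notebook.

```python
def is_english(name, max_non_ascii=3):
    """Heuristic: allow at most max_non_ascii characters outside ASCII."""
    non_ascii = sum(1 for ch in name if ord(ch) > 127)
    return non_ascii <= max_non_ascii

print(is_english('Instagram'))
print(is_english('爱奇艺PPS -《欢乐颂2》电视剧热播'))
print(is_english('Docs To Go™ Free Office Suite'))
# A stricter threshold rejects names that contain even one emoji.
print(is_english('Instachat 😜', max_non_ascii=0))
```

With the default threshold this agrees with isEnglish on the four examples above; lowering the threshold trades fewer false positives for more English apps being dropped.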


Now, let's run both data sets through isEnglish and count the English apps.

In [11]:
android_English = []
ios_English = []

for app in android_clean:
    name = app[0]
    if isEnglish(name):
        android_English.append(app)
        
for app in ios:
    name = app[1]
    if isEnglish(name):
        ios_English.append(app)
print("count of Android English app - ", len(android_English))
print("count of ios English app - ", len(ios_English))
print("\n")

# explore_data prints its own output and returns None, so it is not wrapped in print()
explore_data(android_English,0,3,True)
explore_data(ios_English,0,3,True)
        


count of Android English app -  9614
count of ios English app -  6183


['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 and up']


no_of_cols:  13
no_of_rows:  9614
['284882215', 'Facebook', '389879808', 'USD', '0.0', '2974676', '212', '3.5', '3.5', '95.0', '4+', 'Social Networking', '37', '1', '29', '1']


['389801252', 'Instagram', '113954816', 'USD', '0.0', '2161558', '1289', '4.5', '4.0', '10.23', '12+', 'Photo & Video', '37', '0', '29', '1']


['529479190', 'Clash of Clans', '116476928

# Isolating Free apps
Let's focus on isolating the apps that are free.
- We iterate through both data sets and keep the apps whose price is 0.
- If a price starts with $, we strip the symbol before converting it to a number.


In [12]:
free_android_apps = []
free_ios_apps =[]

for app in android_English:
    priceStr = app[7] #stores price as string
    if priceStr.startswith('$',0): # stripping '$', if price starts with '$'
        price = float(priceStr.strip('$'))
        if price == 0.0:
            free_android_apps.append(app)  
    elif float(priceStr) == 0:
        free_android_apps.append(app)
        
for app in ios_English:
    price = float(app[4])
    if price == 0:
        free_ios_apps.append(app)
        
print("free android app count", len(free_android_apps))
print("free ios app count", len(free_ios_apps))
    

free android app count 8864
free ios app count 3222
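
The two branches of the Android loop do the same thing once the price string has been normalised, so the parsing can be pulled into a small helper. This is a sketch under the assumption that prices look like '0', '0.0' or '$4.99'; `parse_price` is a hypothetical name.

```python
def parse_price(price_str):
    """Convert a price string such as '0' or '$4.99' to a float."""
    return float(price_str.strip().lstrip('$'))

# Made-up price strings in the formats seen in the two data sets.
for raw in ['0', '0.0', '$4.99']:
    price = parse_price(raw)
    print(raw, '->', price, '(free)' if price == 0 else '(paid)')
```

With such a helper, both loops reduce to a single test of the form `parse_price(...) == 0`.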


# Most common apps by Genre
function freq_table
- builds a frequency dictionary for the column whose name we pass as a parameter.
- returns a dictionary with the percentage of each value.

function display_table
- sorts the dictionary by its values in descending order. To achieve this, we first build a list of (value, key) tuples from the dictionary.


In [13]:
def freq_table(dataset,columnname):
    header_length = len(dataset[0])
    if header_length == len(ios_header): # header_length = 16 for AppleStore.csv
        index = ios_header.index(columnname) 
    else:
        index = android_header.index(columnname) 
    total_rows = len(dataset) # count the rows of the data set actually passed in
    print("index of "+columnname +" is", index)
    print("total rows: ",total_rows)
    print("\n")
    
    table_dict={}
    for app in dataset:
        key = app[index]
        if key in  table_dict:
             table_dict[key] += 1
        else:
             table_dict[key] = 1
    print("dictionary of "+columnname+" - ",table_dict)
    print("\n")
    table_percentage ={}
    for key in table_dict:
        percentage = (table_dict[key]/total_rows)*100
        table_percentage[key] = percentage
    print("percentage dict of "+columnname+" - ",table_percentage)
    print("\n")
    return table_percentage


def display_table(dataset, columnname):
    table = freq_table(dataset, columnname)
    tuple_table = []
    for key in table:
        tuple_row = (table[key], key)
        tuple_table.append(tuple_row)

    sorted_table = sorted(tuple_table, reverse = True)
    print("sorted "+columnname +" table for display")
    print("\n")
    for row in sorted_table:
        print(row[1], ':', row[0])
display_table(free_ios_apps,'prime_genre')

index of prime_genre is 11
total rows:  3222


dictionary of prime_genre -  {'Food & Drink': 26, 'News': 43, 'Book': 14, 'Finance': 36, 'Weather': 28, 'Travel': 40, 'Photo & Video': 160, 'Business': 17, 'Sports': 69, 'Games': 1874, 'Reference': 18, 'Health & Fitness': 65, 'Entertainment': 254, 'Catalogs': 4, 'Shopping': 84, 'Music': 66, 'Productivity': 56, 'Lifestyle': 51, 'Social Networking': 106, 'Utilities': 81, 'Navigation': 6, 'Education': 118, 'Medical': 6}


percentage dict of prime_genre -  {'Shopping': 2.60707635009311, 'News': 1.3345747982619491, 'Finance': 1.1173184357541899, 'Weather': 0.8690254500310366, 'Travel': 1.2414649286157666, 'Photo & Video': 4.9658597144630665, 'Business': 0.5276225946617008, 'Sports': 2.1415270018621975, 'Games': 58.16263190564867, 'Reference': 0.5586592178770949, 'Navigation': 0.186219739292365, 'Health & Fitness': 2.0173805090006205, 'Entertainment': 7.883302296710118, 'Catalogs': 0.12414649286157665, 'Food & Drink': 0.8069522036002483, 'Music'
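
For comparison, the counting and sorting done by freq_table and display_table can be sketched with `collections.Counter`, whose `most_common()` method already returns entries ordered from most to least frequent. The helper name `percentage_table` and the sample rows are assumptions for illustration only.

```python
from collections import Counter

def percentage_table(rows, index):
    """Return [(value, percentage), ...] sorted by descending frequency."""
    counts = Counter(row[index] for row in rows)
    total = len(rows)
    return [(value, count / total * 100)
            for value, count in counts.most_common()]

# Made-up sample rows: [app name, genre]
sample_rows = [
    ['App A', 'Games'],
    ['App B', 'Music'],
    ['App C', 'Games'],
    ['App D', 'Games'],
]
table = percentage_table(sample_rows, 1)
for genre, pct in table:
    print(genre, ':', pct)
```

Taking the column index as a parameter, rather than inferring it from the header length, keeps the helper independent of the two global headers.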

In [14]:
display_table(free_android_apps,'Genres')

index of Genres is 9
total rows:  8864


dictionary of Genres -  {'Auto & Vehicles': 82, 'Libraries & Demo': 83, 'Lifestyle': 345, 'Educational;Brain Games': 6, 'Dating': 165, 'Food & Drink': 110, 'Role Playing;Pretend Play': 4, 'Casual;Brain Games': 12, 'Video Players & Editors': 157, 'Weather': 71, 'Sports': 307, 'Tools;Education': 1, 'Board;Action & Adventure': 2, 'Strategy': 81, 'House & Home': 73, 'Travel & Local;Action & Adventure': 1, 'Strategy;Action & Adventure': 1, 'Entertainment;Creativity': 3, 'Shopping': 199, 'Parenting;Brain Games': 1, 'Casino': 38, 'Card;Action & Adventure': 1, 'Role Playing': 83, 'Puzzle;Creativity': 2, 'Casual': 156, 'Simulation': 181, 'Entertainment;Pretend Play': 2, 'Role Playing;Action & Adventure': 3, 'Education': 474, 'Health & Fitness;Action & Adventure': 1, 'Health & Fitness': 273, 'Educational;Pretend Play': 8, 'Personalization': 294, 'Books & Reference;Education': 1, 'Educational;Creativity': 3, 'Education;Action & Adventure': 3, 'Art & Design

In [15]:
display_table(free_android_apps,'Category')

index of Category is 1
total rows:  8864


dictionary of Category -  {'PERSONALIZATION': 294, 'COMICS': 55, 'WEATHER': 71, 'SPORTS': 301, 'VIDEO_PLAYERS': 159, 'MEDICAL': 313, 'SOCIAL': 236, 'ART_AND_DESIGN': 57, 'TRAVEL_AND_LOCAL': 207, 'GAME': 862, 'EVENTS': 63, 'EDUCATION': 103, 'LIBRARIES_AND_DEMO': 83, 'FAMILY': 1676, 'NEWS_AND_MAGAZINES': 248, 'COMMUNICATION': 287, 'FINANCE': 328, 'HEALTH_AND_FITNESS': 273, 'BEAUTY': 53, 'HOUSE_AND_HOME': 73, 'LIFESTYLE': 346, 'BOOKS_AND_REFERENCE': 190, 'PHOTOGRAPHY': 261, 'TOOLS': 750, 'ENTERTAINMENT': 85, 'PRODUCTIVITY': 345, 'PARENTING': 58, 'AUTO_AND_VEHICLES': 82, 'BUSINESS': 407, 'FOOD_AND_DRINK': 110, 'MAPS_AND_NAVIGATION': 124, 'DATING': 165, 'SHOPPING': 199}


percentage dict of Category -  {'PERSONALIZATION': 3.3167870036101084, 'COMICS': 0.6204873646209386, 'WEATHER': 0.8009927797833934, 'SPORTS': 3.395758122743682, 'MEDICAL': 3.531137184115524, 'SOCIAL': 2.6624548736462095, 'ART_AND_DESIGN': 0.6430505415162455, 'TRAVEL_AND_LOCAL': 2.

Analysis of the frequency tables

Games and Entertainment apps dominate the App Store. On Google Play, the most common apps are Family and Games when grouped by Category, and Tools and Entertainment when grouped by Genres.

# Most popular apps on the App Store by installs

The Google Play data set has a column with the number of installs, whereas the App Store data set doesn't.

As a proxy for installs, let's use the number of user ratings (rating_count_tot):

1. Find the total number of user ratings for each genre.
2. Divide the total obtained in step 1 by the number of apps in that genre to get the average number of user ratings per app.


In [16]:
dict_table = freq_table(free_ios_apps,'prime_genre')
index = ios_header.index('prime_genre')
index_rating_count=ios_header.index('rating_count_tot')
for key in dict_table:
    total_user_rating = 0
    genre_len = 0
    for app in free_ios_apps:
        app_genre = app[index]
        if app_genre == key:
            total_user_rating += float(app[index_rating_count])
            genre_len += 1
            
    avg_user_rating = total_user_rating/genre_len
    print(key, ":", avg_user_rating)
        



index of prime_genre is 11
total rows:  3222


dictionary of prime_genre -  {'Food & Drink': 26, 'News': 43, 'Book': 14, 'Finance': 36, 'Weather': 28, 'Travel': 40, 'Photo & Video': 160, 'Business': 17, 'Sports': 69, 'Games': 1874, 'Reference': 18, 'Health & Fitness': 65, 'Entertainment': 254, 'Catalogs': 4, 'Shopping': 84, 'Music': 66, 'Productivity': 56, 'Lifestyle': 51, 'Social Networking': 106, 'Utilities': 81, 'Navigation': 6, 'Education': 118, 'Medical': 6}


percentage dict of prime_genre -  {'Shopping': 2.60707635009311, 'News': 1.3345747982619491, 'Finance': 1.1173184357541899, 'Weather': 0.8690254500310366, 'Travel': 1.2414649286157666, 'Photo & Video': 4.9658597144630665, 'Business': 0.5276225946617008, 'Sports': 2.1415270018621975, 'Games': 58.16263190564867, 'Reference': 0.5586592178770949, 'Navigation': 0.186219739292365, 'Health & Fitness': 2.0173805090006205, 'Entertainment': 7.883302296710118, 'Catalogs': 0.12414649286157665, 'Food & Drink': 0.8069522036002483, 'Music'
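
The nested loop above rescans the whole data set once per genre. The same per-genre average can be computed in a single pass by accumulating a running total and a count for each genre. The sketch below assumes a hypothetical helper `average_by_group` and uses made-up rows.

```python
from collections import defaultdict

def average_by_group(rows, group_index, value_index):
    """Average the numeric column value_index for each value of group_index."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for row in rows:
        group = row[group_index]
        totals[group] += float(row[value_index])
        counts[group] += 1
    return {group: totals[group] / counts[group] for group in totals}

# Made-up sample rows: [genre, rating count]
sample_rows = [
    ['Games', '100'],
    ['Games', '300'],
    ['Music', '50'],
]
averages = average_by_group(sample_rows, 0, 1)
for genre, avg in averages.items():
    print(genre, ':', avg)
```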

In [23]:
for apps in free_ios_apps:
    if apps[11] == 'Photo & Video':
        print(apps[1], ":", apps[5])

Instagram : 2161558
Snapchat : 323905
YouTube - Watch Videos, Music, and Live Streams : 278166
Pic Collage - Picture Editor & Photo Collage Maker : 123433
Funimate video editor: add cool effects to videos : 123268
musical.ly - your video social network : 105429
Photo Collage Maker & Photo Editor - Live Collage : 93781
Vine Camera : 90355
Google Photos - unlimited photo and video storage : 88742
Flipagram : 79905
Mixgram - Picture Collage Maker - Pic Photo Editor : 54282
Shutterfly: Prints, Photo Books, Cards Made Easy : 51427
Pic Jointer – Photo Collage, Camera Effects Editor : 51330
Color Pop Effects - Photo Editor & Picture Editing : 45320
Photo Grid - photo collage maker & photo editor : 40531
iSwap Faces LITE : 39722
MOLDIV - Photo Editor, Collage & Beauty Camera : 39501
Photo Editor by Aviary : 39501
Photo Lab: Picture Editor, effects & fun face app : 34585
Rookie Cam - Photo Editor & Filter Camera : 33921
FotoRus -Camera & Photo Editor & Pic Collage Maker : 32558
PicsArt Photo St

In [24]:
for apps in free_ios_apps:
    if apps[11] == 'Shopping':
        print(apps[1], ":", apps[5])

Groupon - Deals, Coupons & Discount Shopping App : 417779
eBay: Best App to Buy, Sell, Save! Online Shopping : 262241
Wish - Shopping Made Fun : 141960
shopkick - Shopping Rewards & Discounts : 130823
Amazon App: shop, scan, compare, and read reviews : 126312
Target : 108131
Zappos: shop shoes & clothes, fast free shipping : 103655
Walgreens – Pharmacy, Photo, Coupons and Shopping : 88885
Best Buy : 80424
Walmart: Free 2-Day Shipping,* Easy Store Shopping : 70286
OfferUp - Buy. Sell. Simple. : 57348
Apple Store : 55171
Shop Savvy Barcode Scanner - Price Compare & Deals : 54630
Ibotta: Cash Back App, Grocery Coupons & Shopping : 44313
letgo: Buy & Sell Second Hand Stuff : 38424
CVS Pharmacy : 35981
Victoria’s Secret – The Sexiest Bras & Lingerie : 34507
Etsy: Shop Handmade, Vintage & Creative Goods : 30434
Gilt : 26353
Mercari: Shopping Marketplace to Buy & Sell Stuff : 24244
Shopular Coupons, Weekly Deals for Target, Walmart : 22729
RetailMeNot Shopping Deals, Coupons, Savings : 18544


# Most popular apps on Google Play by installs


In [19]:
display_table(free_android_apps,'Installs')

index of Installs is 5
total rows:  8864


dictionary of Installs -  {'500+': 288, '1+': 45, '100+': 613, '5,000,000+': 605, '10,000,000+': 935, '100,000+': 1024, '10+': 314, '1,000,000+': 1394, '500,000,000+': 24, '1,000+': 744, '500,000+': 493, '1,000,000,000+': 20, '10,000+': 904, '50,000,000+': 204, '50+': 170, '0+': 4, '0': 1, '100,000,000+': 189, '5,000+': 400, '50,000+': 423, '5+': 70}


percentage dict of Installs -  {'500+': 3.2490974729241873, '0+': 0.04512635379061372, '100+': 6.915613718411552, '5,000,000+': 6.825361010830325, '10,000,000+': 10.548285198555957, '50+': 1.917870036101083, '100,000+': 11.552346570397113, '10+': 3.5424187725631766, '1,000,000+': 15.726534296028879, '1,000+': 8.393501805054152, '1,000,000,000+': 0.22563176895306858, '10,000+': 10.198555956678701, '500,000,000+': 0.2707581227436823, '50,000,000+': 2.3014440433213, '5,000+': 4.512635379061372, '0': 0.01128158844765343, '1+': 0.5076714801444043, '100,000,000+': 2.1322202166064983, '500,000+': 5.561

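Let's find out which categories are installed most on Google Play by computing the average number of installs per app for each category. The install counts are open-ended strings such as '1,000,000+', so we strip the '+' and ',' characters before converting them to numbers.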

In [20]:
android_dict = freq_table(free_android_apps,'Category')
index = android_header.index('Category')
index_installs = android_header.index('Installs')
for key in android_dict:
    total_installs = 0
    len_category = 0
    for app in free_android_apps:
        category_app = app[index]
        if category_app == key:
            n_installs = app[index_installs]
            n_installs = n_installs.replace('+','')
            n_installs = n_installs.replace(',','')
            total_installs += float(n_installs)
            len_category += 1
    avg = total_installs/ len_category
    print(key,":",avg)
      
        

index of Category is 1
total rows:  8864


dictionary of Category -  {'PERSONALIZATION': 294, 'COMICS': 55, 'WEATHER': 71, 'SPORTS': 301, 'VIDEO_PLAYERS': 159, 'MEDICAL': 313, 'SOCIAL': 236, 'ART_AND_DESIGN': 57, 'TRAVEL_AND_LOCAL': 207, 'GAME': 862, 'EVENTS': 63, 'EDUCATION': 103, 'LIBRARIES_AND_DEMO': 83, 'FAMILY': 1676, 'NEWS_AND_MAGAZINES': 248, 'COMMUNICATION': 287, 'FINANCE': 328, 'HEALTH_AND_FITNESS': 273, 'BEAUTY': 53, 'HOUSE_AND_HOME': 73, 'LIFESTYLE': 346, 'BOOKS_AND_REFERENCE': 190, 'PHOTOGRAPHY': 261, 'TOOLS': 750, 'ENTERTAINMENT': 85, 'PRODUCTIVITY': 345, 'PARENTING': 58, 'AUTO_AND_VEHICLES': 82, 'BUSINESS': 407, 'FOOD_AND_DRINK': 110, 'MAPS_AND_NAVIGATION': 124, 'DATING': 165, 'SHOPPING': 199}


percentage dict of Category -  {'PERSONALIZATION': 3.3167870036101084, 'COMICS': 0.6204873646209386, 'WEATHER': 0.8009927797833934, 'SPORTS': 3.395758122743682, 'MEDICAL': 3.531137184115524, 'SOCIAL': 2.6624548736462095, 'ART_AND_DESIGN': 0.6430505415162455, 'TRAVEL_AND_LOCAL': 2.
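
The string clean-up applied to the Installs column can be isolated in a small function, which makes it easy to check against the formats that appear in the frequency table ('0', '0+', '1,000,000+'). This is an illustrative sketch; `parse_installs` is a hypothetical name.

```python
def parse_installs(installs_str):
    """Convert a string such as '1,000,000+' to the integer 1000000."""
    return int(installs_str.replace('+', '').replace(',', ''))

# Formats taken from the Installs frequency table.
for raw in ['0', '0+', '500+', '1,000,000+']:
    print(raw, '->', parse_installs(raw))
```

Keep in mind that the parsed number is only a lower bound: an app listed as '1,000,000+' may have anywhere between one and five million installs, so the averages are approximate.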

In [21]:

for app in free_android_apps :
    if app[1] == 'PHOTOGRAPHY' and (app[5] == '1,000,000+'
                                            or app[5] == '5,000,000+'
                                            or app[5] == '10,000,000+'
                                            or app[5] == '50,000,000+'):
        print(app[0], ':', app[5])

TouchNote: Cards & Gifts : 1,000,000+
FreePrints – Free Photos Delivered : 1,000,000+
LALALAB prints your photos, photobooks and magnets : 1,000,000+
Snapfish : 1,000,000+
Motorola Camera : 50,000,000+
HD Camera - Best Cam with filters & panorama : 5,000,000+
LightX Photo Editor & Photo Effects : 10,000,000+
Sweet Snap - live filter, Selfie photo edit : 10,000,000+
HD Camera - Quick Snap Photo & Video : 1,000,000+
Waterfall Photo Frames : 1,000,000+
Huji Cam : 5,000,000+
Unicorn Photo : 1,000,000+
HD Camera : 5,000,000+
Makeup Editor -Beauty Photo Editor & Selfie Camera : 1,000,000+
Makeup Photo Editor: Makeup Camera & Makeup Editor : 1,000,000+
Moto Photo Editor : 5,000,000+
InstaBeauty -Makeup Selfie Cam : 50,000,000+
Photo Frame : 10,000,000+
Selfie Camera - Photo Editor & Filter & Sticker : 50,000,000+
Selfie Camera: Beauty Camera, Photo Editor,Collage : 1,000,000+
Night Photo Frame : 1,000,000+
Selfie Photo Editor : 1,000,000+
ASUS Gallery : 50,000,000+
Kids Photo Frames : 1,000,0

In [22]:
for app in free_android_apps :
    if app[1] == 'SHOPPING' and (app[5] == '1,000,000+'
                                            or app[5] == '5,000,000+'
                                            or app[5] == '10,000,000+'
                                            or app[5] == '50,000,000+'):
        print(app[0], ':', app[5])

Amazon for Tablets : 10,000,000+
OfferUp - Buy. Sell. Offer Up : 10,000,000+
Shopee - No. 1 Online Shopping : 10,000,000+
Shopee: No.1 Online Shopping : 10,000,000+
Kroger : 5,000,000+
Walmart : 10,000,000+
letgo: Buy & Sell Used Stuff, Cars & Real Estate : 50,000,000+
Lazada - Online Shopping & Deals : 50,000,000+
OLX - Buy and Sell : 50,000,000+
The wall : 1,000,000+
Flipp - Weekly Shopping : 10,000,000+
Shrimp skin shopping: spend less, buy better : 5,000,000+
Lotte Home Shopping LOTTE Homeshopping : 5,000,000+
Horn, free country requirements : 1,000,000+
Jiji.ng : 1,000,000+
GS SHOP : 10,000,000+
The birth : 50,000,000+
Home & Shopping - Only in apps. 10% off + 10% off : 10,000,000+
EHS Dongsen Shopping : 1,000,000+
bigbasket - online grocery : 5,000,000+
Bukalapak - Buy and Sell Online : 10,000,000+
Life market : 1,000,000+
Jabong Online Shopping App : 10,000,000+
Family Dollar : 1,000,000+
Jumia online shopping : 10,000,000+
Mercado Libre: Find your favorite brands : 50,000,000+
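
Instead of chaining several `or` comparisons on the install strings, the filter can compare parsed numbers against a range. The sketch below is one possible way to do it; `installs_between`, its default column indexes (1 for Category, 5 for Installs, as in android_header) and the sample rows are assumptions for illustration.

```python
def installs_between(rows, category, low, high,
                     category_index=1, installs_index=5):
    """Return rows of one category whose install count is in [low, high]."""
    selected = []
    for row in rows:
        n_installs = int(row[installs_index].replace('+', '').replace(',', ''))
        if row[category_index] == category and low <= n_installs <= high:
            selected.append(row)
    return selected

# Made-up rows with the same column layout as the Android data set
# (only the App, Category and Installs columns are filled in).
sample_rows = [
    ['Shop A', 'SHOPPING', '', '', '', '1,000,000+'],
    ['Shop B', 'SHOPPING', '', '', '', '100,000,000+'],
    ['Shop C', 'SHOPPING', '', '', '', '10,000+'],
    ['Cam A', 'PHOTOGRAPHY', '', '', '', '5,000,000+'],
]
mid_sized = installs_between(sample_rows, 'SHOPPING', 1_000_000, 50_000_000)
for row in mid_sized:
    print(row[0], ':', row[5])
```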


# Conclusion

If we build an app that combines photography and shopping, it could be a hit in both markets. For example, when you shop for a new dress online, you could add an image of the dress along with a picture of yourself to our app, which would combine the two into a new picture of you wearing that dress, together with a hairstyle and accessories that match it.