# Profitable App Profiles for Mobile Markets
- Find mobile app profiles that are profitable on the App Store and Google Play markets
- Recommend which kind of app to build, based on historical data
- In a real-life scenario the app should be a source of revenue, so the goal is to discover which app genres attract the most users

## Data source

* [GooglePlay](https://www.kaggle.com/lava18/google-play-store-apps): ~10,000 apps
* [AppStore](https://www.kaggle.com/ramamet4/app-store-apple-data-set-10k-apps): ~7,000 apps

## Imports

In [43]:
import sys
!{sys.executable} -m pip install nltk



In [48]:
import pandas as pd
import nltk
nltk.download('words')
from nltk.corpus import words

[nltk_data] Downloading package words to /home/dq/nltk_data...
[nltk_data]   Unzipping corpora/words.zip.


## Data Explore and Wrangle

In [26]:
googleplay = pd.read_csv("googleplaystore.csv")
print(googleplay.shape)
googleplay.head(2)

(10841, 13)


Unnamed: 0,App,Category,Rating,Reviews,Size,Installs,Type,Price,Content Rating,Genres,Last Updated,Current Ver,Android Ver
0,Photo Editor & Candy Camera & Grid & ScrapBook,ART_AND_DESIGN,4.1,159,19M,"10,000+",Free,0,Everyone,Art & Design,"January 7, 2018",1.0.0,4.0.3 and up
1,Coloring book moana,ART_AND_DESIGN,3.9,967,14M,"500,000+",Free,0,Everyone,Art & Design;Pretend Play,"January 15, 2018",2.0.0,4.0.3 and up


In [10]:
appstore = pd.read_csv("AppleStore.csv")
print(appstore.shape)
appstore.head(2)

(7197, 16)


Unnamed: 0,id,track_name,size_bytes,currency,price,rating_count_tot,rating_count_ver,user_rating,user_rating_ver,ver,cont_rating,prime_genre,sup_devices.num,ipadSc_urls.num,lang.num,vpp_lic
0,284882215,Facebook,389879808,USD,0.0,2974676,212,3.5,3.5,95.0,4+,Social Networking,37,1,29,1
1,389801252,Instagram,113954816,USD,0.0,2161558,1289,4.5,4.0,10.23,12+,Photo & Video,37,0,29,1


GooglePlay | 10841 apps and 13 columns
* useful features at a glance: 'App', 'Category', 'Reviews', 'Installs', 'Type', 'Price', 'Genres'

AppStore | 7197 iOS apps and 16 columns
* useful features at a glance: 'track_name', 'currency', 'price', 'rating_count_tot', 'rating_count_ver', 'prime_genre'

In [18]:
# The Kaggle discussion section reports that row 10472 contains bad data
googleplay.iloc[10472]  # compare the values against the column names

App               Life Made WI-Fi Touchscreen Photo Frame
Category                                              1.9
Rating                                                 19
Reviews                                              3.0M
Size                                               1,000+
Installs                                             Free
Type                                                    0
Price                                            Everyone
Content Rating                                        NaN
Genres                                  February 11, 2018
Last Updated                                       1.0.19
Current Ver                                    4.0 and up
Android Ver                                           NaN
Name: 10472, dtype: object

In [27]:
googleplay.drop(10472, inplace=True)  # remove the malformed row (one row fewer)
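
A shifted row like this one can also be found without knowing its index in advance. The following is a minimal sketch on a made-up two-row frame (the `sample` data and the `shifted` name are illustrative assumptions, not part of the notebook): real `Category` values are upper-case labels, so any value that parses as a number is suspect.

```python
import pandas as pd

# Toy stand-in for the Google Play data; the second row mimics the shifted record.
sample = pd.DataFrame({
    "App": ["Sketch - Draw & Paint", "Life Made WI-Fi Touchscreen Photo Frame"],
    "Category": ["ART_AND_DESIGN", "1.9"],
})

# A Category that converts cleanly to a number cannot be a real category label.
as_number = pd.to_numeric(sample["Category"], errors="coerce")
shifted = sample[as_number.notna()]
print(shifted.index.tolist())
```

The same check could be run on the full frame before deciding which rows to drop.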

In [34]:
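# Drop duplicate apps, keeping the copy with the highest review count.
# Reviews is stored as text, so sort on its integer value rather than on the string.
googleplay = googleplay.sort_values('Reviews', key=lambda s: s.astype(int), ascending=False).drop_duplicates('App').sort_index()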
googleplay.head(2)

Unnamed: 0,App,Category,Rating,Reviews,Size,Installs,Type,Price,Content Rating,Genres,Last Updated,Current Ver,Android Ver
0,Photo Editor & Candy Camera & Grid & ScrapBook,ART_AND_DESIGN,4.1,159,19M,"10,000+",Free,0,Everyone,Art & Design,"January 7, 2018",1.0.0,4.0.3 and up
2,"U Launcher Lite – FREE Live Cool Themes, Hide ...",ART_AND_DESIGN,4.7,87510,8.7M,"5,000,000+",Free,0,Everyone,Art & Design,"August 1, 2018",1.2.4,4.0.3 and up


In [33]:
googleplay.shape

(9659, 13)
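
Because `Reviews` is read from the CSV as text, sorting on it compares strings rather than numbers. The sketch below, on made-up rows, shows the difference and one possible way to keep the copy with the numerically highest review count; the `toy` frame and its values are assumptions for illustration only.

```python
import pandas as pd

toy = pd.DataFrame({
    "App": ["Demo", "Demo", "Other"],
    "Reviews": ["967", "1200", "15"],  # stored as strings, as in the CSV
})

# String sort ranks "967" above "1200" because "9" > "1".
by_text = toy.sort_values("Reviews", ascending=False).drop_duplicates("App")

# Converting to integers first keeps the truly most-reviewed copy.
toy["Reviews"] = toy["Reviews"].astype(int)
by_number = toy.sort_values("Reviews", ascending=False).drop_duplicates("App")

kept_text = by_text.loc[by_text["App"] == "Demo", "Reviews"].iloc[0]
kept_number = by_number.loc[by_number["App"] == "Demo", "Reviews"].iloc[0]
print(kept_text, kept_number)
```

The number of rows left after deduplication is the same either way; only the choice of which duplicate survives differs.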

In [70]:
# Collect the apps whose names look English.
# Alternative (disabled): match names against the nltk word list
#Word = list(set(words.words()))
#googleplay = googleplay[googleplay['App'].str.contains('|'.join(Word))]

def is_en(string):
    """Return True if the name contains fewer than 3 non-ASCII characters."""
    non_ascii = 0
    for c in string:
        if ord(c) > 127:
            non_ascii += 1
    return non_ascii < 3  # allow up to 2 non-ASCII characters to minimize data loss

android_eng = [name for name in googleplay.App if is_en(name)]
ios_eng = [name for name in appstore.track_name if is_en(name)]

In [71]:
print(len(android_eng))
print(len(ios_eng))

9597
6155


In [75]:
# Preview the English-named rows (the result is not assigned, so googleplay is unchanged)
googleplay[googleplay['App'].isin(android_eng)]

Unnamed: 0,App,Category,Rating,Reviews,Size,Installs,Type,Price,Content Rating,Genres,Last Updated,Current Ver,Android Ver
0,Photo Editor & Candy Camera & Grid & ScrapBook,ART_AND_DESIGN,4.1,159,19M,"10,000+",Free,0,Everyone,Art & Design,"January 7, 2018",1.0.0,4.0.3 and up
2,"U Launcher Lite – FREE Live Cool Themes, Hide ...",ART_AND_DESIGN,4.7,87510,8.7M,"5,000,000+",Free,0,Everyone,Art & Design,"August 1, 2018",1.2.4,4.0.3 and up
3,Sketch - Draw & Paint,ART_AND_DESIGN,4.5,215644,25M,"50,000,000+",Free,0,Teen,Art & Design,"June 8, 2018",Varies with device,4.2 and up
4,Pixel Draw - Number Art Coloring Book,ART_AND_DESIGN,4.3,967,2.8M,"100,000+",Free,0,Everyone,Art & Design;Creativity,"June 20, 2018",1.1,4.4 and up
5,Paper flowers instructions,ART_AND_DESIGN,4.4,167,5.6M,"50,000+",Free,0,Everyone,Art & Design,"March 26, 2017",1.0,2.3 and up
...,...,...,...,...,...,...,...,...,...,...,...,...,...
10836,Sya9a Maroc - FR,FAMILY,4.5,38,53M,"5,000+",Free,0,Everyone,Education,"July 25, 2017",1.48,4.1 and up
10837,Fr. Mike Schmitz Audio Teachings,FAMILY,5.0,4,3.6M,100+,Free,0,Everyone,Education,"July 6, 2018",1.0,4.1 and up
10838,Parkinson Exercices FR,MEDICAL,,3,9.5M,"1,000+",Free,0,Everyone,Medical,"January 20, 2017",1.0,2.2 and up
10839,The SCP Foundation DB fr nn5n,BOOKS_AND_REFERENCE,4.5,114,Varies with device,"1,000+",Free,0,Mature 17+,Books & Reference,"January 19, 2015",Varies with device,Varies with device


In [76]:
# Preview the English-named rows (the result is not assigned, so appstore is unchanged)
appstore[appstore['track_name'].isin(ios_eng)]

Unnamed: 0,id,track_name,size_bytes,currency,price,rating_count_tot,rating_count_ver,user_rating,user_rating_ver,ver,cont_rating,prime_genre,sup_devices.num,ipadSc_urls.num,lang.num,vpp_lic
0,284882215,Facebook,389879808,USD,0.00,2974676,212,3.5,3.5,95.0,4+,Social Networking,37,1,29,1
1,389801252,Instagram,113954816,USD,0.00,2161558,1289,4.5,4.0,10.23,12+,Photo & Video,37,0,29,1
2,529479190,Clash of Clans,116476928,USD,0.00,2130805,579,4.5,4.5,9.24.12,9+,Games,38,5,18,1
3,420009108,Temple Run,65921024,USD,0.00,1724546,3842,4.5,4.0,1.6.2,9+,Games,40,5,1,1
4,284035177,Pandora - Music & Radio,130242560,USD,0.00,1126879,3594,4.0,4.5,8.4.1,12+,Music,37,4,1,1
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
7182,1070854722,Be-be-bears!,480781312,USD,2.99,0,0,0.0,0.0,1.0.2.5,4+,Games,35,5,13,1
7186,1169971902,Hey Duggee: We Love Animals,136347648,USD,2.99,0,0,0.0,0.0,1.2,4+,Games,40,5,1,1
7192,1170406182,Shark Boom - Challenge Friends with your Pet,245415936,USD,0.00,0,0,0.0,0.0,1.0.9,4+,Games,38,5,1,1
7194,1070052833,Go!Go!Cat!,91468800,USD,0.00,0,0,0.0,0.0,1.1.2,12+,Games,37,2,2,1


In [85]:
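# Isolate free apps; the goal is to find app profiles that can still generate revenue.
# Price is stored as text in the Google Play data, hence the comparison with '0'.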
googleplay = googleplay[googleplay['Price'] == '0']
appstore = appstore[appstore['price'] == 0.00]

In [86]:
print(googleplay.shape)
print(appstore.shape)

(8901, 13)
(4056, 16)
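
Comparing `Price` with the string `'0'` works for isolating free apps, but a numeric column is easier to reuse later. The sketch below assumes paid prices are written with a leading dollar sign (for example `$4.99`); that format and the sample values are assumptions, not taken from the output above.

```python
import pandas as pd

# Illustrative price strings; "0" marks a free app.
prices = pd.Series(["0", "$4.99", "0", "$0.99"], name="Price")

# Strip the currency symbol, then convert to floats.
numeric_price = pd.to_numeric(prices.str.replace("$", "", regex=False))
is_free = numeric_price == 0
print(is_free.tolist())
```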


## Discover Common Apps by Genre

In [90]:
googleplay.Category.value_counts(ascending=False)

FAMILY                 1693
GAME                    861
TOOLS                   750
BUSINESS                408
LIFESTYLE               350
PRODUCTIVITY            346
FINANCE                 328
MEDICAL                 312
SPORTS                  301
PERSONALIZATION         295
COMMUNICATION           288
HEALTH_AND_FITNESS      273
PHOTOGRAPHY             262
NEWS_AND_MAGAZINES      250
SOCIAL                  236
TRAVEL_AND_LOCAL        207
SHOPPING                200
BOOKS_AND_REFERENCE     194
DATING                  165
VIDEO_PLAYERS           160
MAPS_AND_NAVIGATION     126
FOOD_AND_DRINK          110
EDUCATION               105
ENTERTAINMENT            84
LIBRARIES_AND_DEMO       83
AUTO_AND_VEHICLES        82
HOUSE_AND_HOME           73
WEATHER                  71
EVENTS                   63
PARENTING                58
ART_AND_DESIGN           58
COMICS                   56
BEAUTY                   53
Name: Category, dtype: int64

FAMILY, the largest category, is mostly apps for kids. Apart from GAME, the other common categories are practical ones: Tools, Business, Lifestyle, Productivity, etc.
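
Raw counts are easier to compare between the two stores when expressed as percentages. A minimal sketch using `value_counts(normalize=True)` on a made-up column (the four sample values are assumptions for illustration):

```python
import pandas as pd

categories = pd.Series(["FAMILY", "FAMILY", "GAME", "TOOLS"], name="Category")

# normalize=True returns proportions instead of counts; multiply for percentages.
share = categories.value_counts(normalize=True) * 100
print(share.round(1))
```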

In [93]:
googleplay.Genres.value_counts(ascending=False)

Tools                              749
Entertainment                      542
Education                          480
Business                           408
Lifestyle                          349
                                  ... 
Art & Design;Action & Adventure      1
Simulation;Education                 1
Lifestyle;Pretend Play               1
Lifestyle;Education                  1
Strategy;Creativity                  1
Name: Genres, Length: 115, dtype: int64

Compared with Category (and with prime_genre on the App Store), the Genres column is more granular, with 115 distinct values, but its most common entries are still broad ones such as Tools, Entertainment, Education, etc.
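
Many of the rare `Genres` values are compound labels joined by a semicolon, such as `Art & Design;Pretend Play`. One possible way to count each component separately is sketched below on three illustrative values; the `genres` series is an assumption, not the full column.

```python
import pandas as pd

genres = pd.Series([
    "Art & Design",
    "Art & Design;Pretend Play",
    "Education;Pretend Play",
])

# Split compound labels, give each component its own row, then count.
single = genres.str.split(";").explode()
counts = single.value_counts()
print(counts)
```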

In [88]:
appstore.prime_genre.value_counts(ascending=False)

Games                2257
Entertainment         334
Photo & Video         167
Social Networking     143
Education             132
Shopping              121
Utilities             109
Lifestyle              94
Finance                84
Sports                 79
Health & Fitness       76
Music                  67
Book                   66
Productivity           62
News                   58
Travel                 56
Food & Drink           43
Weather                31
Business               20
Reference              20
Navigation             20
Catalogs                9
Medical                 8
Name: prime_genre, dtype: int64

The App Store is dominated by apps made for fun: Games, Entertainment, Photo & Video, Social Networking, Sports, Music, etc.

## Popular genre by users

In [152]:
def print_full(x):
    pd.set_option('display.max_rows', None)
    pd.set_option('display.max_columns', None)
    pd.set_option('display.width', 2000)
    pd.set_option('display.float_format', '{:20,.2f}'.format)
    pd.set_option('display.max_colwidth', None)
    print(x)
    pd.reset_option('display.max_rows')
    pd.reset_option('display.max_columns')
    pd.reset_option('display.width')
    pd.reset_option('display.float_format')
    pd.reset_option('display.max_colwidth')
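
The set-then-reset pattern can also be written with `pd.option_context`, which restores the previous settings automatically, even if printing raises an error. A sketch of that variant follows; the name `print_full_ctx` is hypothetical and chosen only to avoid clashing with the function above.

```python
import pandas as pd

def print_full_ctx(x):
    # Options apply only inside the with-block and are restored on exit.
    with pd.option_context(
        "display.max_rows", None,
        "display.max_columns", None,
        "display.width", 2000,
        "display.float_format", "{:20,.2f}".format,
        "display.max_colwidth", None,
    ):
        print(x)

print_full_ctx(pd.DataFrame({"mean": [126044.0, 11121.33]}))
```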

In [154]:
apptemp = appstore.groupby(['prime_genre', 'user_rating'])['rating_count_tot'].agg(['mean'])

In [155]:
print_full(apptemp)

                                              mean
prime_genre       user_rating                     
Book              0.00                        0.00
                  1.00                        1.00
                  3.00                      180.00
                  3.50                  126,044.00
                  4.00                   11,121.33
                  4.50                   17,118.00
                  5.00                   14,638.50
Business          0.00                        0.00
                  2.00                       53.00
                  2.50                      578.50
                  3.00                    3,289.00
                  3.50                      512.50
                  4.00                   16,778.50
                  4.50                    8,971.00
                  5.00                      446.00
Catalogs          0.00                        0.00
                  3.50                      213.00
                  4.00         

In [159]:
googleplay.dropna(inplace=True)

In [162]:
gtemp = googleplay.groupby(['Category', 'Reviews'])['Rating'].agg(['mean'])

In [163]:
print_full(gtemp)

                                             mean
Category            Reviews                      
ART_AND_DESIGN      1                        5.00
                    1015                     4.20
                    1070                     4.60
                    1120                     4.20
                    117                      4.20
                    118                      4.70
                    121                      4.70
                    13                       4.40
                    132                      4.30
                    136                      3.90
                    13791                    4.40
                    13880                    4.40
                    1518                     4.40
                    158                      4.70
                    159                      4.10
                    167                      4.40
                    174531                   4.70
                    175                      4.20
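
Install counts are another possible popularity signal for Google Play, but `Installs` is stored as text such as `10,000+`. The sketch below converts three values seen in the tables above into integers and averages them per category; the `toy` frame and the `installs_num` column name are assumptions for illustration.

```python
import pandas as pd

toy = pd.DataFrame({
    "Category": ["ART_AND_DESIGN", "ART_AND_DESIGN", "FAMILY"],
    "Installs": ["10,000+", "500,000+", "5,000+"],
})

# Remove the thousands separators and the trailing plus sign, then convert.
toy["installs_num"] = toy["Installs"].str.replace(r"[+,]", "", regex=True).astype(int)

avg_installs = (
    toy.groupby("Category")["installs_num"].mean().sort_values(ascending=False)
)
print(avg_installs)
```

Since the values are open-ended buckets, the averages are lower bounds rather than exact install counts.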


Music and Food & Drink apps appear most attractive on the App Store, whereas Books and Reference apps are popular on Google Play.
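
A single ranking per genre can be easier to read than a two-level table. The sketch below averages `rating_count_tot` per `prime_genre` using only the first five App Store rows shown earlier, so the resulting order illustrates the technique and is not a finding about the full dataset.

```python
import pandas as pd

# First five App Store rows from the preview above.
top_rows = pd.DataFrame({
    "track_name": ["Facebook", "Instagram", "Clash of Clans", "Temple Run",
                   "Pandora - Music & Radio"],
    "prime_genre": ["Social Networking", "Photo & Video", "Games", "Games", "Music"],
    "rating_count_tot": [2974676, 2161558, 2130805, 1724546, 1126879],
})

# Mean number of ratings per genre, highest first.
by_genre = (
    top_rows.groupby("prime_genre")["rating_count_tot"]
    .mean()
    .sort_values(ascending=False)
)
print(by_genre)
```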