
# Profitable Apps Profiles for the Apple Store and Google Play Markets

The purpose of this project is to analyze data to help our developers understand what types of apps are likely to attract more users. We build apps that are free to download and install, so our main source of revenue is in-app ads. Because ad revenue depends mostly on the number of users, our goal is to identify the types of apps that attract the most users.

**Data Sets**
- A data set containing Android apps from Google Play
- A data set containing iOS apps from Apple Store


In [35]:
# define a function to explore data sets
def explore_data(dataset, start, end, rows_and_columns=False):
    dataset_slice = dataset[start:end]
    for row in dataset_slice:
        print(row)
        print('\n') # adds a new (empty) line after each row
    if rows_and_columns:
        print('Number of rows:', len(dataset))
        print('Number of columns:', len(dataset[0]))

In [36]:
from google.colab import files
uploaded = files.upload()

Saving AppleStore.csv to AppleStore (1).csv
Saving googleplaystore.csv to googleplaystore (1).csv


In [37]:
from csv import reader

# The Google Play data set
with open('googleplaystore.csv', encoding='utf8') as opened_file:
    android = list(reader(opened_file))
android_header = android[0]
android = android[1:]  # data rows without the header

# The Apple Store data set
with open('AppleStore.csv', encoding='utf8') as opened_file:
    ios = list(reader(opened_file))
ios_header = ios[0]
# the data rows (without the header) are stored as `iso`; the rest of the notebook uses this name
iso = ios[1:]

In [38]:
# explore the Google Play dataset
# print first few rows and print numbers of rows and columns
explore_data(android, 0, 3, True)

['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['Coloring book moana', 'ART_AND_DESIGN', '3.9', '967', '14M', '500,000+', 'Free', '0', 'Everyone', 'Art & Design;Pretend Play', 'January 15, 2018', '2.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


Number of rows: 10841
Number of columns: 13


In [39]:
# explore the Apple Store data set
# print first few rows and print numbers of rows and columns
explore_data(iso, 0, 3, True)

['284882215', 'Facebook', '389879808', 'USD', '0.0', '2974676', '212', '3.5', '3.5', '95.0', '4+', 'Social Networking', '37', '1', '29', '1']


['389801252', 'Instagram', '113954816', 'USD', '0.0', '2161558', '1289', '4.5', '4.0', '10.23', '12+', 'Photo & Video', '37', '0', '29', '1']


['529479190', 'Clash of Clans', '116476928', 'USD', '0.0', '2130805', '579', '4.5', '4.5', '9.24.12', '9+', 'Games', '38', '5', '18', '1']


Number of rows: 7197
Number of columns: 16


In [40]:
# print the column names for the Google Play data set
print(android_header)

['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']


In [41]:
# print the column names for the Apple Store data set
print(ios_header)

['id', 'track_name', 'size_bytes', 'currency', 'price', 'rating_count_tot', 'rating_count_ver', 'user_rating', 'user_rating_ver', 'ver', 'cont_rating', 'prime_genre', 'sup_devices.num', 'ipadSc_urls.num', 'lang.num', 'vpp_lic']




*   The Google Play data set has 10841 apps and 13 columns. The columns that might be useful for our analysis are 'App', 'Category', 'Reviews', 'Installs', 'Type', 'Price', and 'Genres'. Details about each column can be found in the [documentation](https://www.kaggle.com/lava18/google-play-store-apps).
*   The Apple Store data set has 7197 apps and 16 columns. The columns that might be useful for our analysis are 'track_name', 'currency', 'price', 'rating_count_tot', 'rating_count_ver', and 'prime_genre'. Details about each column can be found in the [documentation](https://www.kaggle.com/ramamet4/app-store-apple-data-set-10k-apps).



**Data Cleaning: Deleting Wrong Data**

[One of the discussions](https://www.kaggle.com/lava18/google-play-store-apps/discussion/66015) reports a wrong rating for entry 10472 in the Google Play data set. We will inspect this entry and delete it if the error is confirmed.



In [42]:
print(android_header)
print('\n')
print(android[10472]) # row 10472
print(android[0]) # correct row

['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']


['Life Made WI-Fi Touchscreen Photo Frame', '1.9', '19', '3.0M', '1,000+', 'Free', '0', 'Everyone', '', 'February 11, 2018', '1.0.19', '4.0 and up']
['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


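Comparing row 10472 with a correct row shows that its category is missing, so every later value is shifted one column to the left. As a result, its rating reads 19, which is impossible on a scale with a maximum of 5.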
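Row 10472 was located through a discussion post. A quick cross-check is to list every row whose number of columns differs from the header, since a missing field shifts the remaining values exactly as seen above. The sketch below uses a hypothetical helper, `find_malformed_rows`, on toy data; in this notebook it could be called as `find_malformed_rows(android, android_header)` before deleting anything.

```python
def find_malformed_rows(dataset, header):
    """Return the indices of rows whose column count differs from the header."""
    expected = len(header)
    return [index for index, row in enumerate(dataset) if len(row) != expected]

# Toy data for illustration only: the second row is missing its category
header = ['App', 'Category', 'Rating']
rows = [['A', 'GAME', '4.1'], ['B', '1.9'], ['C', 'TOOLS', '3.0']]
print(find_malformed_rows(rows, header))  # [1]
```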

In [43]:
print(len(android))
# delete the row 10472
del android[10472]
# make sure that only one row is deleted
print(len(android))

10841
10840


**Data Cleaning: Removing Duplicate Entries**

Some apps appear more than once in the Google Play data set. For example, there are four entries for "Instagram".

In [44]:
for app in android:
  name = app[0]
  if name == 'Instagram':
    print(app)

['Instagram', 'SOCIAL', '4.5', '66577313', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']
['Instagram', 'SOCIAL', '4.5', '66577446', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']
['Instagram', 'SOCIAL', '4.5', '66577313', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']
['Instagram', 'SOCIAL', '4.5', '66509917', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']


Next, we will count the duplicate entries and print the names of some duplicated apps.

In [45]:
duplicated_apps = []
unique_apps = []

for app in android:
  name = app[0]
  if name in unique_apps:
    duplicated_apps.append(name)
  else:
    unique_apps.append(name)

print('Number of duplicated apps:', len(duplicated_apps))
print('\n')
print('Examples of duplicated app:', duplicated_apps[:15])

Number of duplicated apps: 1181


Examples of duplicated app: ['Quick PDF Scanner + OCR FREE', 'Box', 'Google My Business', 'ZOOM Cloud Meetings', 'join.me - Simple Meetings', 'Box', 'Zenefits', 'Google Ads', 'Google My Business', 'Slack', 'FreshBooks Classic', 'Insightly CRM', 'QuickBooks Accounting: Invoicing & Expenses', 'HipChat - Chat Built for Teams', 'Xero Accounting Software']


From the duplicated Instagram entries, we can see that the number of reviews differs between entries. A higher number of reviews suggests more recent data, so instead of removing duplicates randomly, we will keep the entry with the highest number of reviews.

In [46]:
print('Expected length:', len(android) - 1181)

Expected length: 9659


There are 1181 duplicated apps and the expected number of apps in the data set after removing the duplicated apps is 9659.
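The two cells that follow build the `reviews_max` dictionary and then the cleaned list in two passes. As a minimal alternative sketch, the same rule (keep, for each app name, the first row with the highest number of reviews) can be applied in a single pass by storing the best row seen so far. `dedupe_by_max_reviews` is a hypothetical helper and the rows are toy data; the result keeps apps in the order of their first appearance, which may differ slightly from the notebook's order.

```python
def dedupe_by_max_reviews(dataset, name_index=0, reviews_index=3):
    """Keep one row per app name: the first row with the highest review count."""
    best = {}  # app name -> row with the most reviews seen so far
    for row in dataset:
        name = row[name_index]
        n_reviews = float(row[reviews_index])
        # strictly greater keeps the first row among ties, like the already_added check
        if name not in best or n_reviews > float(best[name][reviews_index]):
            best[name] = row
    return list(best.values())

# Toy rows shaped like [App, Category, Rating, Reviews]
rows = [
    ['Instagram', 'SOCIAL', '4.5', '66577313'],
    ['Instagram', 'SOCIAL', '4.5', '66577446'],
    ['Instagram', 'SOCIAL', '4.5', '66509917'],
    ['Example App', 'TOOLS', '4.0', '12'],  # hypothetical app
]
result = dedupe_by_max_reviews(rows)
print(len(result))     # 2
print(result[0][3])    # 66577446
```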


In [47]:
reviews_max = {}

for app in android:
    name = app[0]
    n_reviews = float(app[3])
    if name in reviews_max and reviews_max[name] < n_reviews:
        reviews_max[name] = n_reviews
    elif name not in reviews_max:
        reviews_max[name] = n_reviews

print('Length of the reviews_max dictionary:', len(reviews_max))

Length of the reviews_max dictionary: 9659


In [48]:
android_clean = [] # new cleaned Google Play data set
already_added = [] # names of apps already added to android_clean

for app in android:
  name = app[0]
  n_reviews = float(app[3])
  # keep only the first row that has the maximum number of reviews for this app
  if (reviews_max[name] == n_reviews) and (name not in already_added):
    android_clean.append(app)
    already_added.append(name)


In [49]:
# explore the cleaned Google Play data set
explore_data(android_clean, 0, 3, True)

['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 and up']


Number of rows: 9659
Number of columns: 13


**Data Cleaning: Removing Non-English Apps**

Both data sets have apps with names that suggest they are not directed toward an English-speaking audience.  We will remove those apps since our target is an English-speaking audience.


In [50]:
# define a function to return False if there are non-English characters
def is_english(string):
  for character in string:
    if ord(character) > 127: # common English characters fall within the ASCII range 0-127
      return False
  # only reached after every character has been checked
  return True

In [51]:
print(is_english('Instagram'))
print(is_english('爱奇艺PPS -《欢乐颂2》电视剧热播'))
print(is_english('Docs To Go™ Free Office Suite'))
print(is_english('Instachat 😜'))
print(is_english('™'))
print(is_english('😜'))

True
False
False
False
False
False


The results above show that `is_english` misclassifies English app names that contain special symbols such as ™ or emoji such as 😜, because these characters fall outside the ASCII range.

We will only remove an app if its name contains more than three characters whose code points fall outside the ASCII range (above 127). The new function is still not perfect, but it should be effective enough for our purposes.

In [52]:
# re-define the function
def is_english(string):
  non_ascii = 0
  
  for character in string:
    if ord(character) > 127:
      non_ascii += 1

  if non_ascii >3:
    return False
  else:
    return True


In [53]:
print(is_english('Instagram'))
print(is_english('爱奇艺PPS -《欢乐颂2》电视剧热播'))
print(is_english('Docs To Go™ Free Office Suite'))
print(is_english('Instachat 😜'))

True
False
True
True
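The threshold rule still misclassifies some names in both directions. The sketch below restates the same rule compactly with `sum()` and tries it on two made-up names (they are not from either data set): an English name with four emoji is rejected, while a German name with a single accented letter is accepted.

```python
def is_english(string):
    """Same rule as above: at most three characters outside the ASCII range."""
    # count characters whose code point is above 127 (outside ASCII)
    non_ascii = sum(1 for character in string if ord(character) > 127)
    return non_ascii <= 3

# Made-up app names, chosen only to illustrate the two failure modes
print(is_english('Emoji Party 😜😜😜😜'))  # False: English name rejected (4 emoji)
print(is_english('Wörterbuch'))           # True: German name accepted (1 non-ASCII letter)
```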


We will use the new `is_english` function to remove non-English apps from both data sets.

In [54]:
android_english = []
iso_english = []

for app in android_clean:
  name = app[0]
  if is_english(name):
    android_english.append(app)

for app in iso:
  name = app[1]
  if is_english(name):
    iso_english.append(app)

In [55]:
# explore the new Google Play data set
explore_data(android_english, 0, 3, True)

['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 and up']


Number of rows: 9614
Number of columns: 13


In [56]:
# explore the new Apple Store data set
explore_data(iso_english, 0, 3, True)

['284882215', 'Facebook', '389879808', 'USD', '0.0', '2974676', '212', '3.5', '3.5', '95.0', '4+', 'Social Networking', '37', '1', '29', '1']


['389801252', 'Instagram', '113954816', 'USD', '0.0', '2161558', '1289', '4.5', '4.0', '10.23', '12+', 'Photo & Video', '37', '0', '29', '1']


['529479190', 'Clash of Clans', '116476928', 'USD', '0.0', '2130805', '579', '4.5', '4.5', '9.24.12', '9+', 'Games', '38', '5', '18', '1']


Number of rows: 6183
Number of columns: 16


Now we have 9614 apps in the Google Play data set and 6183 apps in the Apple Store data set.

**Data Cleaning: Isolating Free Apps**

We only build free apps, so we will isolate the free apps for our analysis.

In [57]:
android_free = []
iso_free = []

for app in android_english:
  app_type = app[6]  # 'Type' column (index 6), e.g. 'Free'
  if app_type == 'Free':
    android_free.append(app)

for app in iso_english:
  price = app[4]
  if price == '0.0':
    iso_free.append(app)
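Google Play marks free apps in the `Type` column and stores a price of '0' in `Price`, while the Apple Store stores prices such as '0.0'. Comparing strings works for these rows, but a minimal numeric check would treat both stores the same way. `is_free` is a hypothetical helper, and it assumes that paid Google Play prices carry a leading '$' (for example '$4.99'), a format not shown in the rows above.

```python
def is_free(price):
    """Return True if a price string represents zero cost.

    Assumes prices look like '0', '0.0' or '$4.99' (a leading '$' is stripped).
    """
    cleaned = price.strip().lstrip('$')
    return float(cleaned) == 0.0

print(is_free('0'))      # True  (Google Play 'Price' column)
print(is_free('0.0'))    # True  (Apple Store 'price' column)
print(is_free('$4.99'))  # False (assumed paid-price format)
```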

In [58]:
# explore the new Google Play data set
explore_data(android_free, 0, 3, True)

['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 and up']


Number of rows: 8863
Number of columns: 13


In [59]:
# explore the new Apple Store data set
explore_data(iso_free, 0, 3, True)

['284882215', 'Facebook', '389879808', 'USD', '0.0', '2974676', '212', '3.5', '3.5', '95.0', '4+', 'Social Networking', '37', '1', '29', '1']


['389801252', 'Instagram', '113954816', 'USD', '0.0', '2161558', '1289', '4.5', '4.0', '10.23', '12+', 'Photo & Video', '37', '0', '29', '1']


['529479190', 'Clash of Clans', '116476928', 'USD', '0.0', '2130805', '579', '4.5', '4.5', '9.24.12', '9+', 'Games', '38', '5', '18', '1']


Number of rows: 3222
Number of columns: 16


Finally, we are left with 8863 Google Play apps and 3222 Apple Store apps.

**Analysis: Frequency Tables by Using Category and Genre**

Our revenue is highly influenced by the number of people using our apps.  In order to determine the kinds of apps that are likely to attract more users, we will find the most common genres in Google Play and Apple Store by building frequency tables for a few columns in our data sets.

We will build frequency tables for the `Category` and `Genres` columns of the Google Play data set and the `prime_genre` column of the Apple Store data set.

In [60]:
# define a function to generate frequency tables that show percentages
def freq_table(dataset, index):
  table = {}
  total = 0

  for row in dataset:
    total += 1
    value = row[index]
    if value in table:
      table[value] += 1
    else:
      table[value] = 1

  table_percentages = {}
  for key in table:
    percentage = (table[key]/total)*100
    table_percentages[key] = percentage

  return table_percentages

The `freq_table` function defined above returns a dictionary. To see which kinds of apps are most frequent, we need to sort the table in descending order. To do so, we will define a function that turns the frequency table into a list of (percentage, key) tuples, sorts the list in descending order, and prints each entry.

In [61]:
# define a function to transform a dictionary into a list of tuples and sort the list in a descending order
def display_table(dataset, index):
    table = freq_table(dataset, index)
    table_display = []
    for key in table:
        key_val_as_tuple = (table[key], key)
        table_display.append(key_val_as_tuple)
        
    table_sorted = sorted(table_display, reverse = True)
    for entry in table_sorted:
        print(entry[1], ':', entry[0])
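`freq_table` and `display_table` could also be combined using `collections.Counter` from the standard library, whose `most_common()` method already returns values sorted by count. The sketch below (the name `sorted_freq_table` and the toy rows are hypothetical) returns the sorted list instead of printing it, so the result can be reused. Ties may come out in a different order than in `display_table`, which breaks ties in reverse alphabetical order.

```python
from collections import Counter

def sorted_freq_table(dataset, index):
    """Return (value, percentage) pairs sorted from most to least common."""
    counts = Counter(row[index] for row in dataset)
    total = sum(counts.values())
    return [(value, count / total * 100) for value, count in counts.most_common()]

# Toy rows shaped like [name, genre]
rows = [['a', 'Games'], ['b', 'Games'], ['c', 'News'], ['d', 'Games']]
for value, percentage in sorted_freq_table(rows, 1):
    print(value, ':', percentage)
# Games : 75.0
# News : 25.0
```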

We will create a frequency table for the `prime_genre` column in the Apple Store data set.

In [62]:
display_table(iso_free, 11)  # display_table prints the table itself and returns None

Games : 58.16263190564867
Entertainment : 7.883302296710118
Photo & Video : 4.9658597144630665
Education : 3.662321539416512
Social Networking : 3.2898820608317814
Shopping : 2.60707635009311
Utilities : 2.5139664804469275
Sports : 2.1415270018621975
Music : 2.0484171322160147
Health & Fitness : 2.0173805090006205
Productivity : 1.7380509000620732
Lifestyle : 1.5828677839851024
News : 1.3345747982619491
Travel : 1.2414649286157666
Finance : 1.1173184357541899
Weather : 0.8690254500310366
Food & Drink : 0.8069522036002483
Reference : 0.5586592178770949
Business : 0.5276225946617008
Book : 0.4345127250155183
Navigation : 0.186219739292365
Medical : 0.186219739292365
Catalogs : 0.12414649286157665


About 58% of the free English apps in the Apple Store data set are Games, followed by Entertainment (7.9%), Photo & Video (5.0%), and Education (3.7%). Most apps are designed for entertainment, while apps for practical use are comparatively rare. However, the fact that entertainment apps are the most numerous doesn't imply that they also have the greatest number of users.

We will create frequency tables for the `Category` and `Genres` columns in the Google Play data set.

In [63]:
display_table(android_free, 1) # Category

FAMILY : 18.898792733837304
GAME : 9.725826469592688
TOOLS : 8.462146000225657
BUSINESS : 4.592124562789123
LIFESTYLE : 3.9038700214374367
PRODUCTIVITY : 3.8925871601038025
FINANCE : 3.7007785174320205
MEDICAL : 3.5315355974275078
SPORTS : 3.396141261423897
PERSONALIZATION : 3.317161232088458
COMMUNICATION : 3.2381812027530184
HEALTH_AND_FITNESS : 3.0802211440821394
PHOTOGRAPHY : 2.944826808078529
NEWS_AND_MAGAZINES : 2.798149610741284
SOCIAL : 2.6627552747376737
TRAVEL_AND_LOCAL : 2.335552296062281
SHOPPING : 2.245289405393208
BOOKS_AND_REFERENCE : 2.1437436533904997
DATING : 1.8616721200496444
VIDEO_PLAYERS : 1.7939749520478394
MAPS_AND_NAVIGATION : 1.399074805370642
FOOD_AND_DRINK : 1.241114746699763
EDUCATION : 1.1621347173643235
ENTERTAINMENT : 0.9590432133589079
LIBRARIES_AND_DEMO : 0.9364774906916393
AUTO_AND_VEHICLES : 0.9251946293580051
HOUSE_AND_HOME : 0.8236488773552973
WEATHER : 0.8010831546880289
EVENTS : 0.7108202640189552
PARENTING : 0.6544059573507841
ART_AND_DESIGN : 0

In [64]:
display_table(android_free, 9) # Genres

Tools : 8.450863138892023
Entertainment : 6.070179397495204
Education : 5.348076272142616
Business : 4.592124562789123
Productivity : 3.8925871601038025
Lifestyle : 3.8925871601038025
Finance : 3.7007785174320205
Medical : 3.5315355974275078
Sports : 3.463838429425702
Personalization : 3.317161232088458
Communication : 3.2381812027530184
Action : 3.102786866749408
Health & Fitness : 3.0802211440821394
Photography : 2.944826808078529
News & Magazines : 2.798149610741284
Social : 2.6627552747376737
Travel & Local : 2.324269434728647
Shopping : 2.245289405393208
Books & Reference : 2.1437436533904997
Simulation : 2.042197901387792
Dating : 1.8616721200496444
Arcade : 1.8503892587160102
Video Players & Editors : 1.771409229380571
Casual : 1.7601263680469368
Maps & Navigation : 1.399074805370642
Food & Drink : 1.241114746699763
Puzzle : 1.128286133363421
Racing : 0.9928917973598104
Role Playing : 0.9364774906916393
Libraries & Demo : 0.9364774906916393
Auto & Vehicles : 0.9251946293580051
S

In the frequency table for the `Category` column of the Google Play data set, about 19% of apps are categorized as FAMILY, followed by GAME (9.7%), TOOLS (8.5%), and BUSINESS (4.6%).

In the frequency table for the `Genres` column of the Google Play data set, about 8.5% of apps are categorized as Tools, followed by Entertainment (6.1%), Education (5.3%), and Business (4.6%).

The difference between `Category` and `Genres` is not clearly documented, but `Genres` appears to be more granular: genres such as Action, Simulation, Arcade, Casual, and Puzzle appear in the `Genres` table, whereas the `Category` table groups games under GAME, and an app can list several genres separated by a semicolon (for example, 'Art & Design;Pretend Play'). Overall, Google Play appears to have a more balanced mix of practical and entertainment apps than the Apple Store.

Still, these frequency tables only show how many apps exist in each genre; they do not tell us which genres attract the most users.

**Analysis: Most Popular Genres by Average Number of User Ratings in the Apple Store Data Set**

The Apple Store data set has no column for the number of installs, so we will use the total number of user ratings (`rating_count_tot`) as a proxy for the number of users and calculate its average for each genre.



In [65]:
genre_iso = freq_table(iso_free, 11)

for genre in genre_iso:
  total = 0
  len_genre = 0
  for app in iso_free:
    genre_app = app[11]
    if genre_app == genre:
      n_ratings = float(app[5])
      total += n_ratings
      len_genre += 1
  ave_n_ratings = total / len_genre
  print(genre, ':', ave_n_ratings)

Social Networking : 71548.34905660378
Photo & Video : 28441.54375
Games : 22788.6696905016
Music : 57326.530303030304
Reference : 74942.11111111111
Health & Fitness : 23298.015384615384
Weather : 52279.892857142855
Utilities : 18684.456790123455
Travel : 28243.8
Shopping : 26919.690476190477
News : 21248.023255813954
Navigation : 86090.33333333333
Lifestyle : 16485.764705882353
Entertainment : 14029.830708661417
Food & Drink : 33333.92307692308
Sports : 23008.898550724636
Book : 39758.5
Finance : 31467.944444444445
Education : 7003.983050847458
Productivity : 21028.410714285714
Business : 7491.117647058823
Catalogs : 4004.0
Medical : 612.0
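The cell above loops over all apps once per genre. A minimal single-pass sketch, assuming the same row layout, accumulates a sum and a count per genre and then sorts the averages so the most-rated genres come first. `average_by_group` is a hypothetical helper; in this notebook it could be called as `average_by_group(iso_free, 11, 5)`. The rows below are toy data.

```python
from collections import defaultdict

def average_by_group(dataset, group_index, value_index):
    """Average of a numeric column for each group, computed in a single pass."""
    totals = defaultdict(float)  # group -> running sum of values
    counts = defaultdict(int)    # group -> number of rows seen
    for row in dataset:
        group = row[group_index]
        totals[group] += float(row[value_index])
        counts[group] += 1
    return {group: totals[group] / counts[group] for group in totals}

# Toy rows shaped like [name, genre, rating_count]
rows = [['a', 'Games', '10'], ['b', 'News', '4'], ['c', 'Games', '20']]
averages = average_by_group(rows, 1, 2)

# print genres from highest to lowest average
for genre, average in sorted(averages.items(), key=lambda item: item[1], reverse=True):
    print(genre, ':', average)
# Games : 15.0
# News : 4.0
```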


In the Apple Store, the "Navigation" apps have the highest average number of ratings followed by "Reference" and "Social Networking".

In [66]:
# print name and number of ratings for the navigation apps
for app in iso_free:
    if app[11] == 'Navigation':
        print(app[1], ':', app[5])

Waze - GPS Navigation, Maps & Real-time Traffic : 345046
Google Maps - Navigation & Transit : 154911
Geocaching® : 12811
CoPilot GPS – Car Navigation & Offline Maps : 3582
ImmobilienScout24: Real Estate Search in Germany : 187
Railway Route Search : 5


In [67]:
# print name and number of ratings for the reference apps
for app in iso_free:
    if app[11] == 'Reference':
        print(app[1], ':', app[5])

Bible : 985920
Dictionary.com Dictionary & Thesaurus : 200047
Dictionary.com Dictionary & Thesaurus for iPad : 54175
Google Translate : 26786
Muslim Pro: Ramadan 2017 Prayer Times, Azan, Quran : 18418
New Furniture Mods - Pocket Wiki & Game Tools for Minecraft PC Edition : 17588
Merriam-Webster Dictionary : 16849
Night Sky : 12122
City Maps for Minecraft PE - The Best Maps for Minecraft Pocket Edition (MCPE) : 8535
LUCKY BLOCK MOD ™ for Minecraft PC Edition - The Best Pocket Wiki & Mods Installer Tools : 4693
GUNS MODS for Minecraft PC Edition - Mods Tools : 1497
Guides for Pokémon GO - Pokemon GO News and Cheats : 826
WWDC : 762
Horror Maps for Minecraft PE - Download The Scariest Maps for Minecraft Pocket Edition (MCPE) Free : 718
VPN Express : 14
Real Bike Traffic Rider Virtual Reality Glasses : 8
教えて!goo : 0
Jishokun-Japanese English Dictionary & Translator : 0


In [68]:
# print name and number of ratings for the social networking apps
for app in iso_free:
    if app[11] == 'Social Networking':
        print(app[1], ':', app[5])

Facebook : 2974676
Pinterest : 1061624
Skype for iPhone : 373519
Messenger : 351466
Tumblr : 334293
WhatsApp Messenger : 287589
Kik : 260965
ooVoo – Free Video Call, Text and Voice : 177501
TextNow - Unlimited Text + Calls : 164963
Viber Messenger – Text & Call : 164249
Followers - Social Analytics For Instagram : 112778
MeetMe - Chat and Meet New People : 97072
We Heart It - Fashion, wallpapers, quotes, tattoos : 90414
InsTrack for Instagram - Analytics Plus More : 85535
Tango - Free Video Call, Voice and Chat : 75412
LinkedIn : 71856
Match™ - #1 Dating App. : 60659
Skype for iPad : 60163
POF - Best Dating App for Conversations : 52642
Timehop : 49510
Find My Family, Friends & iPhone - Life360 Locator : 43877
Whisper - Share, Express, Meet : 39819
Hangouts : 36404
LINE PLAY - Your Avatar World : 34677
WeChat : 34584
Badoo - Meet New People, Chat, Socialize. : 34428
Followers + for Instagram - Follower Analytics : 28633
GroupMe : 28260
Marco Polo Video Walkie Talkie : 27662
Miitomo : 2

The Reference average is driven mostly by the Bible app (985,920 ratings) and dictionary apps, so building yet another Bible or dictionary app may not be a good opportunity. In the Navigation and Social Networking genres, most of the ratings come from a few apps built by large companies, such as Waze and Google Maps (Navigation) or Facebook and Pinterest (Social Networking), which inflates the genre averages.
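Because a few very large apps dominate these genres, the mean can be a misleading summary. Using the Navigation rating counts printed above, a short check with the standard `statistics` module compares the mean with the median and computes how much of the genre's ratings come from its two most-rated apps.

```python
from statistics import mean, median

# Rating counts (rating_count_tot) of the six free Navigation apps listed above
navigation_ratings = [345046, 154911, 12811, 3582, 187, 5]

print(round(mean(navigation_ratings), 2))  # 86090.33, the genre average reported above
print(median(navigation_ratings))          # 8196.5

# share (in %) of all Navigation ratings that belong to the two most-rated apps
top_two = sum(sorted(navigation_ratings)[-2:])
print(round(top_two / sum(navigation_ratings) * 100, 1))  # 96.8
```

The median is roughly a tenth of the mean, which suggests that a typical Navigation app receives far fewer ratings than the genre average implies.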

**Analysis: Most Popular Genre by Using Average Number of Installs in Google Play Data Set**


In the Google Play data set, we will look at the `Installs` column. However, install counts are not precise; they are open-ended bins such as 100+, 1,000+, 5,000+, and 10,000+. Precise numbers are not needed for our analysis, so we will treat each bin as its lower bound: for example, an app with 10,000+ installs is assumed to have 10,000 installs. To do the calculation, we remove the "+" and "," characters and convert the string to a float.
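The conversion just described can be wrapped in a small helper. `parse_installs` is a hypothetical name; it applies the same two `replace()` calls used in the next cell and returns the lower bound of the install bin as a float.

```python
def parse_installs(installs):
    """Turn an install bin such as '10,000+' into its lower bound as a float."""
    # drop the '+' suffix and the thousands separators, then convert
    return float(installs.replace('+', '').replace(',', ''))

print(parse_installs('10,000+'))         # 10000.0
print(parse_installs('1,000,000,000+'))  # 1000000000.0
```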

In [70]:
cat_android = freq_table(android_free, 1)

for category in cat_android:
  total = 0
  len_category = 0
  for app in android_free:
    category_app = app[1]
    if category_app == category:
      n_installs = app[5]
      n_installs = n_installs.replace('+', '')
      n_installs = n_installs.replace(',', '')
      total += float(n_installs)
      len_category += 1
  ave_n_installs = total / len_category
  print(category, ':', ave_n_installs)

ART_AND_DESIGN : 1986335.0877192982
AUTO_AND_VEHICLES : 647317.8170731707
BEAUTY : 513151.88679245283
BOOKS_AND_REFERENCE : 8767811.894736841
BUSINESS : 1712290.1474201474
COMICS : 817657.2727272727
COMMUNICATION : 38456119.167247385
DATING : 854028.8303030303
EDUCATION : 1833495.145631068
ENTERTAINMENT : 11640705.88235294
EVENTS : 253542.22222222222
FINANCE : 1387692.475609756
FOOD_AND_DRINK : 1924897.7363636363
HEALTH_AND_FITNESS : 4188821.9853479853
HOUSE_AND_HOME : 1331540.5616438356
LIBRARIES_AND_DEMO : 638503.734939759
LIFESTYLE : 1437816.2687861272
GAME : 15588015.603248259
FAMILY : 3697848.1731343283
MEDICAL : 120550.61980830671
SOCIAL : 23253652.127118643
SHOPPING : 7036877.311557789
PHOTOGRAPHY : 17840110.40229885
SPORTS : 3638640.1428571427
TRAVEL_AND_LOCAL : 13984077.710144928
TOOLS : 10801391.298666667
PERSONALIZATION : 5201482.6122448975
PRODUCTIVITY : 16787331.344927534
PARENTING : 542603.6206896552
WEATHER : 5074486.197183099
VIDEO_PLAYERS : 24727872.452830188
NEWS_AND_

Among the Google Play categories, COMMUNICATION has the highest average number of installs.

In [73]:
# print name and number of installs for the communication apps
for app in android_free:
    if app[1] == 'COMMUNICATION':
        print(app[0], ':', app[5])

WhatsApp Messenger : 1,000,000,000+
Messenger for SMS : 10,000,000+
My Tele2 : 5,000,000+
imo beta free calls and text : 100,000,000+
Contacts : 50,000,000+
Call Free – Free Call : 5,000,000+
Web Browser & Explorer : 5,000,000+
Browser 4G : 10,000,000+
MegaFon Dashboard : 10,000,000+
ZenUI Dialer & Contacts : 10,000,000+
Cricket Visual Voicemail : 10,000,000+
TracFone My Account : 1,000,000+
Xperia Link™ : 10,000,000+
TouchPal Keyboard - Fun Emoji & Android Keyboard : 10,000,000+
Skype Lite - Free Video Call & Chat : 5,000,000+
My magenta : 1,000,000+
Android Messages : 100,000,000+
Google Duo - High Quality Video Calls : 500,000,000+
Seznam.cz : 1,000,000+
Antillean Gold Telegram (original version) : 100,000+
AT&T Visual Voicemail : 10,000,000+
GMX Mail : 10,000,000+
Omlet Chat : 10,000,000+
My Vodacom SA : 5,000,000+
Microsoft Edge : 5,000,000+
Messenger – Text and Video Chat for Free : 1,000,000,000+
imo free video calls and chat : 500,000,000+
Calls & Text by Mo+ : 5,000,000+
free 

**Conclusion**

We wanted to find the profiles of free apps that generate the most revenue from ads. Using the Apple Store and Google Play data sets, we looked at category and genre frequencies, the number of user ratings, and the number of installs to find the profiles of apps that attract high user traffic.

In both data sets, social networking and communication apps are very popular. However, the apps that drive this high traffic were mostly created by large companies. There may still be an opportunity to build a social networking app with high revenue potential if we target a more specific user group.

As a next step, we could repeat the analysis with the apps from large companies excluded. We would also need more specific information about in-app ads, such as how many users clicked on ads and how much money users spent in each genre of apps.
