# **Apps that attract users on Google Play and the App Store**

**Our aim in this project is to find mobile app profiles that are profitable for the App Store and Google Play markets.** 
- We're working as data analysts for a company that builds Android and iOS mobile apps, and our job is to provide information to our team of developers so that they can build apps that are more attractive to users.

- At our company, we only build apps that are free to download and install, and our main source of revenue consists of in-app ads.
- This means that our revenue for any given app is mostly influenced by the number of users that use our app.
- Our goal for this project is to analyze data to help our developers understand what kinds of apps are likely to attract more users.



## Opening and Exploring the Data

There are approximately 2 million iOS apps available on the App Store, and 2.1 million Android apps on Google Play.

Instead of collecting data on all of these apps, which would take considerable time and money, this project uses two sample data sets that are available at no cost:

- A [data set](https://www.kaggle.com/lava18/google-play-store-apps/home) containing data about approximately ten thousand Android apps from Google Play
- A [data set](https://www.kaggle.com/ramamet4/app-store-apple-data-set-10k-apps/home) containing data about approximately seven thousand iOS apps from the App Store

1. **Opening the data and converting each data set to a list of lists**
- Open the data using the open(), reader() and list() commands
  - Store the header row and the app rows of each data set in separate variables:
    - android_header
    - android
    - ios_header
    - ios
2. **Exploring the data by creating an explore_data() function**
- A function makes it easier to explore any data set and to slice out only the rows we need
  - Create the explore_data() function with the following parameters:
    - dataset
    - start (start of the slice)
    - end (end of the slice)
    - row_and_column, a default argument set to False
  - Loop over the given slice, printing each row followed by a new line
  - If row_and_column is True, also print the number of rows and columns
  

In [1]:
from csv import reader  # imported once, used for both files

### The Google Play data set ###
opened_file = open('googleplaystore.csv', encoding='utf8')
read_file = reader(opened_file)
android = list(read_file)
opened_file.close()
android_header = android[0]  # first row holds the column names
android = android[1:]        # remaining rows hold the apps

### The App Store data set ###
opened_file = open('AppleStore.csv', encoding='utf8')
read_file = reader(opened_file)
ios = list(read_file)
opened_file.close()
ios_header = ios[0]
ios = ios[1:]
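
The two loading blocks follow the same steps, so they could be folded into one helper. The following is a minimal sketch rather than the notebook's own code: `load_csv` is a hypothetical name, and the `with` block is used so that the file is closed automatically.

```python
from csv import reader

def load_csv(path, encoding='utf8'):
    """Return (header, rows) for the CSV file at `path` (hypothetical helper)."""
    # newline='' is what the csv module documentation recommends for files given to reader()
    with open(path, encoding=encoding, newline='') as opened_file:
        rows = list(reader(opened_file))
    return rows[0], rows[1:]

# Usage, assuming the two CSV files sit next to the notebook:
# android_header, android = load_csv('googleplaystore.csv')
# ios_header, ios = load_csv('AppleStore.csv')
```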

In [2]:
def explore_data(dataset, start, end, row_and_column=False):
    dataset_slice = dataset[start:end]
    for row in dataset_slice:  # print only the requested slice
        print(row)
        print('\n')
    if row_and_column:
        print('Number of rows:', len(dataset))
        print('Number of columns:', len(dataset[0]))
        
print(android_header)
print('\n')
print(ios_header)
    

['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']


['', 'id', 'track_name', 'size_bytes', 'currency', 'price', 'rating_count_tot', 'rating_count_ver', 'user_rating', 'user_rating_ver', 'ver', 'cont_rating', 'prime_genre', 'sup_devices.num', 'ipadSc_urls.num', 'lang.num', 'vpp_lic']


 - **In the Google Play data set, the columns that might be useful for our analysis are 'App', 'Category', 'Reviews', 'Installs', 'Type', 'Price', and 'Genres'.**
 - **In the App Store data set, the columns that seem interesting are 'track_name', 'currency', 'price', 'rating_count_tot', 'rating_count_ver', and 'prime_genre'.**

## **DATA CLEANING**

- Delete the wrong entry
  
- Delete duplicate entries

- Remove unwanted information

- Remove non-English apps

- Remove non-free apps

### Deleting the wrong entry

- In the uncleaned Android data set (header excluded), the entry at index 10472 is wrong.
  - Row 10472 is missing its Category value, which shifts every following value one column to the left (for example, '1.9' appears where the category should be).
  - So we delete row 10472.
- Print the row to check it first, then remove it with the del statement. Run the del statement only once, because a second run would delete a different, valid row.
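
One way to spot such rows without reading the whole file is to compare each row's length with the header's length. The following is a minimal sketch on made-up rows; the helper name `find_malformed_rows` and the toy data are assumptions for illustration only.

```python
def find_malformed_rows(dataset, header):
    """Return the indices of rows whose length differs from the header's."""
    return [i for i, row in enumerate(dataset) if len(row) != len(header)]

# Toy data: the second row is missing its 'Category' value
toy_header = ['App', 'Category', 'Rating']
toy_rows = [
    ['Good App', 'TOOLS', '4.1'],
    ['Broken App', '1.9'],
    ['Other App', 'GAME', '4.5'],
]

print(find_malformed_rows(toy_rows, toy_header))  # [1]
```

On the real data the call would be `find_malformed_rows(android, android_header)`; row 10472 has 12 values against 13 header columns, so a check of this kind would flag it.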

In [3]:
print(android_header)
print(android[10472])
print(android[0])

['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']
['Life Made WI-Fi Touchscreen Photo Frame', '1.9', '19', '3.0M', '1,000+', 'Free', '0', 'Everyone', '', 'February 11, 2018', '1.0.19', '4.0 and up']
['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


**Checking the length of the data set before and after deleting the entry**

In [4]:
print(len(android))
print('\n')
del android[10472]
print(len(android))

10841


10840


### Delete Duplicate entries

- We don't want to count any app more than once when we analyze the data, so we need to remove the duplicate entries and keep only one entry per app
  - The data set's discussion section points out that Instagram has duplicate entries
  - Using a loop, we print every entry whose app name is Instagram
  - This shows how many duplicate entries that app has
- **Loop over the data by app name and build the lists duplicate_apps and unique_apps**
- Remove the duplicates with a criterion rather than at random. Here I keep, for each app, the entry with the highest number of reviews, assuming that it is the most recently updated one.

In [5]:
for app in android:
    name=app[0]
    if name=='Instagram':
        print(app)

['Instagram', 'SOCIAL', '4.5', '66577313', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']
['Instagram', 'SOCIAL', '4.5', '66577446', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']
['Instagram', 'SOCIAL', '4.5', '66577313', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']
['Instagram', 'SOCIAL', '4.5', '66509917', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']


In [6]:
duplicate_apps=[]
unique_apps=[]

for app in android:
    name=app[0]
    if name in unique_apps:
        duplicate_apps.append(name)
    else:
        unique_apps.append(name)
        
        
print('Number of duplicate apps:',len(duplicate_apps))
print('\n')
print('Number of unique apps:',len(unique_apps))

Number of duplicate apps: 1181


Number of unique apps: 9659


#### Creating a dictionary with unique apps and corresponding highest number of reviews
 
- If name already exists as a key in the *reviews_max dictionary and reviews_max[name] < n_reviews*, update the number of reviews for that entry in the reviews_max dictionary.

- If *name is not in the reviews_max dictionary as a key*, create a new entry in the dictionary where the key is the app name, and the value is the number of reviews. Make sure you don't use an else clause here, otherwise the number of reviews will be incorrectly updated whenever *reviews_max[name] < n_reviews* evaluates to False

In [7]:
reviews_max={}

for app in android:
    name=app[0]
    n_reviews= float(app[3])
    if (name in reviews_max) and reviews_max[name]< n_reviews:
        reviews_max[name]=n_reviews
    elif name not in reviews_max:
        reviews_max[name]=n_reviews
        
print(len(reviews_max))

9659


#### Remove the duplicate entries using above dictionary 

- Create two empty lists, android_clean and already_added
  - Loop over the data set and check that the app name is not in already_added and that n_reviews equals the maximum stored for that name in the reviews_max dictionary
  - If both conditions hold, append the row to android_clean and the name to already_added
- Use a print statement to check that the duplicate entries were removed

In [8]:
android_clean =[]
already_added =[]
for app in android:
    name=app[0]
    n_reviews=float(app[3])
    if n_reviews==reviews_max[name] and (name not in already_added):
        android_clean.append(app)
        already_added.append(name)
        
print('Cleaned android dataset length:', len(android_clean))
        

Cleaned android dataset length: 9659
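
To see the two-step logic at a small scale, here is a sketch that applies the same idea to a few made-up rows with the layout [name, category, rating, reviews]. The function name `keep_most_reviewed` and its index parameters are assumptions, and it tracks the names already added in a set instead of a list, which makes the membership checks faster.

```python
def keep_most_reviewed(dataset, name_index=0, review_index=3):
    """Keep one row per app name: the first row that has the highest review count."""
    # Step 1: highest review count seen for each name
    reviews_max = {}
    for app in dataset:
        name = app[name_index]
        n_reviews = float(app[review_index])
        if name not in reviews_max or reviews_max[name] < n_reviews:
            reviews_max[name] = n_reviews

    # Step 2: keep the first row that matches that maximum
    clean = []
    already_added = set()  # set membership checks are faster than list checks
    for app in dataset:
        name = app[name_index]
        n_reviews = float(app[review_index])
        if n_reviews == reviews_max[name] and name not in already_added:
            clean.append(app)
            already_added.add(name)
    return clean

toy_rows = [
    ['Instagram', 'SOCIAL', '4.5', '66577313'],
    ['Instagram', 'SOCIAL', '4.5', '66577446'],
    ['Instagram', 'SOCIAL', '4.5', '66577446'],  # tie: only the first of these is kept
    ['Notes', 'PRODUCTIVITY', '4.0', '120'],
]
for row in keep_most_reviewed(toy_rows):
    print(row)
```

The already_added check matters because of ties: without it, both rows with 66577446 reviews would survive.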


### Removing non-English apps

- Note: Python stores every character as a number (its Unicode code point). The letters, digits and punctuation commonly used in English text fall in the range 0 to 127, according to the ASCII system. We can get the number corresponding to each character with the built-in **ord()** function.

- If an app name contains a character whose number is **greater than 127**, it probably means that the **app has a non-English name**.

- In Python, strings are indexable and iterable, which means we can use indexing to select an individual character or loop over the characters one by one.
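
As a quick illustration of these points, the sketch below prints the code point of a few sample characters (chosen here only as examples) and counts the characters above 127 in one app name.

```python
# ord() returns the code point of a single character
for character in ['a', 'Z', '7', '™', '爱', '😜']:
    print(repr(character), ord(character), ord(character) <= 127)

# Strings are iterable, so we can count the non-ASCII characters in a name
name = 'Instachat 😜'
non_ascii = sum(1 for character in name if ord(character) > 127)
print(non_ascii)  # 1
```

The letters and the digit stay at or below 127, while the trademark sign, the Chinese character and the emoji are all well above it.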


**If we explore the data sets, we notice that the names of some of the apps are not in English. Below are a couple of examples from both data sets**

In [9]:
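# Note: index 1 of an iOS row is the 'id' column, so these two lines print app ids;
# the app name (track_name) is at index 2
print(ios[813][1])
print(ios[6731][1])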

print(android_clean[4412][0])
print(android_clean[7940][0])

436672029
1144164707
中国語 AQリスニング
لعبة تقدر تربح DZ



- Write a function that takes in a string and returns False if any character in the string doesn't belong to the set of common English characters; otherwise it returns True:

      def is_english(string):
          for character in string:
              if ord(character) > 127:
                  return False
          return True

- If we use the function above, we will lose a lot of data. For example, 'Instachat 😜' is an English app whose name contains an emoji. To avoid losing such apps, I decided to allow a name to contain at most 3 characters greater than 127.
        


In [10]:

def is_english(string_a):
    non_ascii = 0
    
    for character in string_a:
        if ord(character) > 127:
            non_ascii += 1
    
    if non_ascii > 3:
        return False
    else:
        return True
        
print(is_english('Instagram'))

print(is_english('爱奇艺PPS -《欢乐颂2》电视剧热播'))

print(is_english('欢乐颂 视剧  Docs To Go™ Free Office Suite 欢乐颂 视剧  '))
              
print(is_english('欢乐颂 视剧 😜 Instachat'))
        
        

True
False
False
False


#### Google Play data set

app name index number is 0

price is index 7

#### App Store data set

app name index number is 2

price is index 5

In [11]:
android_english=[]
ios_english=[]

for app in android_clean:
    name=app[0]
    if is_english(name):
        android_english.append(app)

        
print('android length:', len(android_english))
        
       
for app in ios:
    name=app[2]
    if is_english(name):
        ios_english.append(app)
print('\n')

print('ios length:', len(ios_english))

android length: 9614


ios length: 6183


### Isolate Free Apps

In [12]:
android_final = []
ios_final = []

for app in android_english:
    price = app[7]
    if price == '0':
        android_final.append(app)
print(ios[0:2])
print(ios_header)

for app in ios_english:
    price = app[5]
    if price == '0':
        ios_final.append(app)
        
print('Final android dataset length:',len(android_final))
print('Final ios dataset length:',len(ios_final))

[['1', '281656475', 'PAC-MAN Premium', '100788224', 'USD', '3.99', '21292', '26', '4', '4.5', '6.3.5', '4+', 'Games', '38', '5', '10', '1'], ['2', '281796108', 'Evernote - stay organized', '158578688', 'USD', '0', '161065', '26', '4', '3.5', '8.2.2', '4+', 'Productivity', '37', '5', '23', '1']]
['', 'id', 'track_name', 'size_bytes', 'currency', 'price', 'rating_count_tot', 'rating_count_ver', 'user_rating', 'user_rating_ver', 'ver', 'cont_rating', 'prime_genre', 'sup_devices.num', 'ipadSc_urls.num', 'lang.num', 'vpp_lic']
Final android dataset length: 8864
Final ios dataset length: 3222
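
Comparing the price with the string '0' works for the rows shown here, but it would miss values written as '0.0' or '$0'. A more tolerant check, sketched below under the assumption that prices are plain numbers with an optional leading '$', converts the text to a number first. `is_free` is a hypothetical helper name, not part of the notebook.

```python
def is_free(price):
    """Return True when a price string such as '0', '0.0' or '$0.00' means free."""
    cleaned = price.strip().replace('$', '')
    try:
        return float(cleaned) == 0.0
    except ValueError:
        # Anything that is not a number is treated as "not free" in this sketch
        return False

for price in ['0', '0.0', '$0.00', '3.99', '$4.99', 'Everyone']:
    print(price, '->', is_free(price))
```

Treating unparseable values as not free is a design choice: it keeps malformed rows out of the final data sets instead of raising an error halfway through the loop.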


###  Analysis for Most Common Apps by Genre
#### Part One
As we mentioned in the introduction, our aim is to determine the kinds of apps that are likely to attract more users because our revenue is highly influenced by the number of people using our apps.

To minimize risks and overhead, our validation strategy for an app idea is comprised of three steps:

- Build a minimal Android version of the app, and add it to Google Play.
- If the app has a good response from users, we then develop it further.
- If the app is profitable after six months, we also build an iOS version of the app and add it to the App Store.

Because our end goal is to add the app on both the App Store and Google Play, we need to find app profiles that are successful on both markets. For instance, a profile that might work well for both markets might be a productivity app that makes use of gamification.

**Let's begin the analysis by getting a sense of the most common genres for each market. For this, we'll build a frequency table for the prime_genre column of the App Store data set, and the Genres and Category columns of the Google Play data set.**

#### Part Two
We'll build two functions we can use to analyze the frequency tables:

- One function to generate frequency tables that show percentages 
- Another function that we can use to display the percentages in a descending order

In [13]:
print(android_header)
print(android_final[0:2])
print('\n')
print(ios_header)
print(ios_final[0])

['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']
[['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up'], ['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']]


['', 'id', 'track_name', 'size_bytes', 'currency', 'price', 'rating_count_tot', 'rating_count_ver', 'user_rating', 'user_rating_ver', 'ver', 'cont_rating', 'prime_genre', 'sup_devices.num', 'ipadSc_urls.num', 'lang.num', 'vpp_lic']
['2', '281796108', 'Evernote - stay organized', '158578688', 'USD', '0', '161065', '26', '4', '3.5', '8.2.2', '4+', 'Productivity', '37', '5', '23', '1']


##### Build two functions we can use to analyze the frequency tables:

Dictionaries are not designed for sorted display, which makes the frequency tables hard to read as they are. Using the **built-in sorted() function** in the second function helps us put the data in ascending or descending order
- Called directly on a dictionary, sorted() is of little help because it only considers and returns the dictionary keys.
- However, the sorted() function works well if we transform the dictionary into a list of tuples, where each tuple contains a dictionary value along with its corresponding key
- The function has a parameter that lets us choose the order.
  - sorted(a_list, reverse=True). Here, the reverse parameter controls the order of the output
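
The behaviour described in these points can be checked on a tiny dictionary. The values below are rounded, illustrative percentages, not results from the data sets.

```python
table = {'Games': 58.2, 'Entertainment': 7.9, 'Education': 3.7}

# Called on a dictionary, sorted() only looks at (and returns) the keys
print(sorted(table))  # ['Education', 'Entertainment', 'Games']

# A list of (value, key) tuples is ordered by the value first
as_tuples = [(table[key], key) for key in table]
print(sorted(as_tuples, reverse=True))
# [(58.2, 'Games'), (7.9, 'Entertainment'), (3.7, 'Education')]

# Alternative: keep (key, value) pairs and tell sorted() what to compare
print(sorted(table.items(), key=lambda item: item[1], reverse=True))
# [('Games', 58.2), ('Entertainment', 7.9), ('Education', 3.7)]
```

The `key=` variant avoids swapping keys and values, at the cost of introducing a lambda; both give the same ordering.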

CODE DESCRIPTION
- One function, **freq_table(dataset, index)**, generates frequency tables that show percentages
  - It can build a frequency table for any column
  - The variable **count** holds the total number of rows in the data set
- Another function, **display_table(dataset, index)**, displays the percentages in descending order
  - Convert the dictionary to a list of tuples by looping over its keys
    - Once the list of tuples is built, use sorted() to put it in ascending or descending order
      - the dictionary {key1: value1, key2: value2} becomes the list of tuples [(value1, key1), (value2, key2)]
      - sorted() compares the tuples by their first element, the value
      - start a loop over table_sorted
        - for each entry, print the key and then the value



In [14]:
def freq_table(dataset, index):
    frequency_table={}
    count=0 # Total number of rows in the dataset
    for app in dataset:
        value=app[index]
        count= count+1
        if value in frequency_table:
            frequency_table[value]+=1
        else:
            frequency_table[value]=1
    
    frequency_table_percentages={}
    for key in frequency_table:
        percentage= (frequency_table[key]/count)*100
        frequency_table_percentages[key]= percentage
    return frequency_table_percentages

# Second function

def display_table(dataset,index):
    table = freq_table(dataset, index)
    table_display=[]
    for key in table:
        table_tuple = (table[key], key)
        table_display.append(table_tuple)
        
    table_sorted = sorted(table_display, reverse=True)
    for entry in table_sorted:
        print(entry[1], ':', entry[0])

In [15]:
display_table(ios_final, -5)

Games : 58.16263190564867
Entertainment : 7.883302296710118
Photo & Video : 4.9658597144630665
Education : 3.662321539416512
Social Networking : 3.2898820608317814
Shopping : 2.60707635009311
Utilities : 2.5139664804469275
Sports : 2.1415270018621975
Music : 2.0484171322160147
Health & Fitness : 2.0173805090006205
Productivity : 1.7380509000620732
Lifestyle : 1.5828677839851024
News : 1.3345747982619491
Travel : 1.2414649286157666
Finance : 1.1173184357541899
Weather : 0.8690254500310366
Food & Drink : 0.8069522036002483
Reference : 0.5586592178770949
Business : 0.5276225946617008
Book : 0.4345127250155183
Navigation : 0.186219739292365
Medical : 0.186219739292365
Catalogs : 0.12414649286157665



We can see that among the free English apps, more than half (58.16%) are games. Entertainment apps are close to 8%, followed by photo and video apps, which are close to 5%. Only 3.66% of the apps are designed for education, followed by social networking apps, which account for 3.29% of the apps in our data set.

The general impression is that the App Store (at least the part containing free English apps) is dominated by apps that are designed for fun (games, entertainment, photo and video, social networking, sports, music, etc.), while apps with practical purposes (education, shopping, utilities, productivity, lifestyle, etc.) are rarer. However, the fact that fun apps are the most numerous doesn't imply that they also have the greatest number of users; the demand might not be the same as the offer.

In [16]:
display_table(android_final, 1) # Category

FAMILY : 18.907942238267147
GAME : 9.724729241877256
TOOLS : 8.461191335740072
BUSINESS : 4.591606498194946
LIFESTYLE : 3.9034296028880866
PRODUCTIVITY : 3.892148014440433
FINANCE : 3.7003610108303246
MEDICAL : 3.531137184115524
SPORTS : 3.395758122743682
PERSONALIZATION : 3.3167870036101084
COMMUNICATION : 3.2378158844765346
HEALTH_AND_FITNESS : 3.0798736462093865
PHOTOGRAPHY : 2.944494584837545
NEWS_AND_MAGAZINES : 2.7978339350180503
SOCIAL : 2.6624548736462095
TRAVEL_AND_LOCAL : 2.33528880866426
SHOPPING : 2.2450361010830324
BOOKS_AND_REFERENCE : 2.1435018050541514
DATING : 1.861462093862816
VIDEO_PLAYERS : 1.7937725631768955
MAPS_AND_NAVIGATION : 1.3989169675090252
FOOD_AND_DRINK : 1.2409747292418771
EDUCATION : 1.1620036101083033
ENTERTAINMENT : 0.9589350180505415
LIBRARIES_AND_DEMO : 0.9363718411552346
AUTO_AND_VEHICLES : 0.9250902527075812
HOUSE_AND_HOME : 0.8235559566787004
WEATHER : 0.8009927797833934
EVENTS : 0.7107400722021661
PARENTING : 0.6543321299638989
ART_AND_DESIGN : 


The landscape seems significantly different on Google Play: there are not that many apps designed for fun, and it seems that a good number of apps are designed for practical purposes (family, tools, business, lifestyle, productivity, etc.). However, if we investigate this further, we can see that the family category (which accounts for almost 19% of the apps) means mostly games for kids.

Even so, practical apps seem to have a better representation on Google Play compared to App Store. This picture is also confirmed by the frequency table we see for the Genres column:

In [17]:

display_table(android_final, -4) # genre

Tools : 8.449909747292418
Entertainment : 6.069494584837545
Education : 5.347472924187725
Business : 4.591606498194946
Productivity : 3.892148014440433
Lifestyle : 3.892148014440433
Finance : 3.7003610108303246
Medical : 3.531137184115524
Sports : 3.463447653429603
Personalization : 3.3167870036101084
Communication : 3.2378158844765346
Action : 3.1024368231046933
Health & Fitness : 3.0798736462093865
Photography : 2.944494584837545
News & Magazines : 2.7978339350180503
Social : 2.6624548736462095
Travel & Local : 2.3240072202166067
Shopping : 2.2450361010830324
Books & Reference : 2.1435018050541514
Simulation : 2.0419675090252705
Dating : 1.861462093862816
Arcade : 1.8501805054151623
Video Players & Editors : 1.7712093862815883
Casual : 1.7599277978339352
Maps & Navigation : 1.3989169675090252
Food & Drink : 1.2409747292418771
Puzzle : 1.128158844765343
Racing : 0.9927797833935018
Role Playing : 0.9363718411552346
Libraries & Demo : 0.9363718411552346
Auto & Vehicles : 0.9250902527075

The difference between the Genres and the Category columns is not crystal clear, but one thing we can notice is that the Genres column is much more granular (it has more categories). We're only looking for the bigger picture at the moment, so we'll only work with the Category column moving forward.

Up to this point, we found that the App Store is dominated by apps designed for fun, while Google Play shows a more balanced landscape of both practical and for-fun apps. Now we'd like to get an idea about the kind of apps that have most users.

## Most popular apps by genres

- One way to find out what genres are the most popular (have the most users) is to calculate the average number of installs for each app genre. 
- For the Google Play data set, we can find this information in the Installs column, but this information is missing for the App Store data set.
- As a workaround, we'll take the total number of user ratings as a proxy, which we can find in the rating_count_tot column (index 6).

Let's start with calculating the average number of user ratings per app genre on the App Store. To do that, we'll need to:

- Isolate the apps of each genre.
- Sum up the user ratings for the apps of that genre.
- Divide the sum by the number of apps belonging to that genre (not by the total number of apps).
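
The three steps above can also be done in a single pass over the rows by keeping a running total and a running count per genre. The sketch below is an alternative to nested loops, shown on made-up rows; `average_by_group` is a hypothetical name.

```python
def average_by_group(dataset, group_index, value_index):
    """Return {group: mean of the values at value_index} in one pass over the rows."""
    totals = {}
    counts = {}
    for row in dataset:
        group = row[group_index]
        totals[group] = totals.get(group, 0.0) + float(row[value_index])
        counts[group] = counts.get(group, 0) + 1
    # Divide each sum by the number of apps in that group, not by the total number of apps
    return {group: totals[group] / counts[group] for group in totals}

# Toy rows with the layout [name, genre, number_of_ratings]
toy_rows = [
    ['App A', 'Games', '100'],
    ['App B', 'Games', '300'],
    ['App C', 'Weather', '50'],
]
print(average_by_group(toy_rows, 1, 2))  # {'Games': 200.0, 'Weather': 50.0}
```

With the iOS data the call would be `average_by_group(ios_final, -5, 6)`, using the genre and rating-count indexes named in this section.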

CODE DESCRIPTION
- rating_count_tot (index 6) holds the number of user ratings for each app
- build a frequency table for prime_genre (index -5) to get the unique genres
- start a loop over the unique iOS genres
  - variable named **total** with a **value of 0**. This variable will store the **sum of user ratings**
  - variable named **len_genre** with a **value of 0**. This variable will store the **number of apps specific to each genre**.

In [18]:
ios_unique_genre = freq_table(ios_final,-5)
print(ios_header)
for genre in ios_unique_genre:
    total=0
    len_genre =0 
    for app in ios_final:
        genre_app=app[-5]
        if genre==genre_app:
            user_rating =float(app[6])
            total+=user_rating
            len_genre+=1
    average_user_rating= total/len_genre
    print(genre, ':',average_user_rating)

['', 'id', 'track_name', 'size_bytes', 'currency', 'price', 'rating_count_tot', 'rating_count_ver', 'user_rating', 'user_rating_ver', 'ver', 'cont_rating', 'prime_genre', 'sup_devices.num', 'ipadSc_urls.num', 'lang.num', 'vpp_lic']
Productivity : 21028.410714285714
Weather : 52279.892857142855
Shopping : 26919.690476190477
Reference : 74942.11111111111
Finance : 31467.944444444445
Music : 57326.530303030304
Utilities : 18684.456790123455
Travel : 28243.8
Social Networking : 71548.34905660378
Sports : 23008.898550724636
Health & Fitness : 23298.015384615384
Games : 22788.6696905016
Food & Drink : 33333.92307692308
News : 21248.023255813954
Book : 39758.5
Photo & Video : 28441.54375
Entertainment : 14029.830708661417
Business : 7491.117647058823
Lifestyle : 16485.764705882353
Education : 7003.983050847458
Navigation : 86090.33333333333
Medical : 612.0
Catalogs : 4004.0


In [19]:
for app in ios_final:
    if app[-5] == 'Reference':
        print(app[2], ':', app[6]) # print name index 2 and number of ratings index 6

Bible : 985920
Dictionary.com Dictionary & Thesaurus : 200047
Dictionary.com Dictionary & Thesaurus for iPad : 54175
Muslim Pro: Ramadan 2017 Prayer Times, Azan, Quran : 18418
Merriam-Webster Dictionary : 16849
Google Translate : 26786
Night Sky : 12122
WWDC : 762
Jishokun-Japanese English Dictionary & Translator : 0
教えて!goo : 0
VPN Express : 14
New Furniture Mods - Pocket Wiki & Game Tools for Minecraft PC Edition : 17588
LUCKY BLOCK MOD ™ for Minecraft PC Edition - The Best Pocket Wiki & Mods Installer Tools : 4693
Guides for Pokémon GO - Pokemon GO News and Cheats : 826
Horror Maps for Minecraft PE - Download The Scariest Maps for Minecraft Pocket Edition (MCPE) Free : 718
City Maps for Minecraft PE - The Best Maps for Minecraft Pocket Edition (MCPE) : 8535
GUNS MODS for Minecraft PC Edition - Mods Tools : 1497
Real Bike Traffic Rider Virtual Reality Glasses : 8


- **For the Reference genre, religious apps and dictionaries seem to contribute the most ratings**.
    - I believe there is scope to tap the potential of this niche market with more religious apps and dictionaries for every language with English translation.

In [20]:
print(android_header)

['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']


In [21]:
display_table(android_final, 5) # the Installs columns

1,000,000+ : 15.726534296028879
100,000+ : 11.552346570397113
10,000,000+ : 10.548285198555957
10,000+ : 10.198555956678701
1,000+ : 8.393501805054152
100+ : 6.915613718411552
5,000,000+ : 6.825361010830325
500,000+ : 5.561823104693141
50,000+ : 4.7721119133574
5,000+ : 4.512635379061372
10+ : 3.5424187725631766
500+ : 3.2490974729241873
50,000,000+ : 2.3014440433213
100,000,000+ : 2.1322202166064983
50+ : 1.917870036101083
5+ : 0.78971119133574
1+ : 0.5076714801444043
500,000,000+ : 0.2707581227436823
1,000,000,000+ : 0.22563176895306858
0+ : 0.04512635379061372
0 : 0.01128158844765343


#### For the Google Play data, we use the number of installs to find the most popular apps

- First, remove the '+' and ',' characters from the Installs values using the **str.replace(old, new)** method
- Then convert the resulting string to a float
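
The two cleaning steps can be wrapped in a small helper, sketched here with a hypothetical name, `parse_installs`. Because values such as '1,000,000+' are open-ended buckets, the converted number is best read as a lower bound rather than an exact count.

```python
def parse_installs(text):
    """Convert an Installs value such as '1,000,000+' to a float."""
    return float(text.replace('+', '').replace(',', ''))

for text in ['1,000,000+', '500+', '0']:
    print(text, '->', parse_installs(text))
```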

In [22]:
android_unique_category= freq_table(android_final,1)

for category in android_unique_category:
    total=0
    len_category  =0 
    for app in android_final:
        category_app=app[1]
        if category==category_app:
            n_installs=app[5]
            n_installs = n_installs.replace('+', '')
            n_installs = n_installs.replace(',', '')
            n_installs=float(n_installs)
            total+=n_installs
            len_category +=1
    average_n_installs= total/len_category 
    print(category, ':',average_n_installs)

ART_AND_DESIGN : 1986335.0877192982
AUTO_AND_VEHICLES : 647317.8170731707
BEAUTY : 513151.88679245283
BOOKS_AND_REFERENCE : 8767811.894736841
BUSINESS : 1712290.1474201474
COMICS : 817657.2727272727
COMMUNICATION : 38456119.167247385
DATING : 854028.8303030303
EDUCATION : 1833495.145631068
ENTERTAINMENT : 11640705.88235294
EVENTS : 253542.22222222222
FINANCE : 1387692.475609756
FOOD_AND_DRINK : 1924897.7363636363
HEALTH_AND_FITNESS : 4188821.9853479853
HOUSE_AND_HOME : 1331540.5616438356
LIBRARIES_AND_DEMO : 638503.734939759
LIFESTYLE : 1437816.2687861272
GAME : 15588015.603248259
FAMILY : 3695641.8198090694
MEDICAL : 120550.61980830671
SOCIAL : 23253652.127118643
SHOPPING : 7036877.311557789
PHOTOGRAPHY : 17840110.40229885
SPORTS : 3638640.1428571427
TRAVEL_AND_LOCAL : 13984077.710144928
TOOLS : 10801391.298666667
PERSONALIZATION : 5201482.6122448975
PRODUCTIVITY : 16787331.344927534
PARENTING : 542603.6206896552
WEATHER : 5074486.197183099
VIDEO_PLAYERS : 24727872.452830188
NEWS_AND_


On average, communication apps have the most installs: 38,456,119. This number is heavily skewed upward by a few apps that have over one billion installs (WhatsApp, Facebook Messenger, Skype, Google Chrome, Gmail, and Hangouts), and a few others with over 100 and 500 million installs.

# Conclusions
In this project, we analyzed data about the App Store and Google Play mobile apps with the goal of recommending an app profile that can be profitable for both markets.

We concluded that taking a popular book (perhaps a more recent book) and turning it into an app could be profitable for both the Google Play and the App Store markets. The markets are already full of libraries, so we need to add some special features besides the raw version of the book. This might include daily quotes from the book, an audio version of the book, quizzes on the book, a forum where people can discuss the book, etc.