# Guided Project: Profitable App Profiles for the App Store and Google Play Markets

As data analysts working for a company that builds free apps and earns its revenue from in-app ads, our goal in this project is to help our developers understand what types of apps are likely to attract more users on Google Play and the App Store, because the number of people using our apps affects our revenue.

## Definition of the function explore_data()
This function helps us print the rows of our data in a readable way.

In [1]:
def explore_data(dataset, start, end, rows_and_columns=False):
    dataset_slice = dataset[start:end]    
    for row in dataset_slice:
        print(row)
        print('\n') # adds two empty lines after each row ('\n' plus print's own newline)

    if rows_and_columns:
        print('Number of rows:', len(dataset))
        print('Number of columns:', len(dataset[0]))

In [2]:
from csv import reader

# Read each CSV file into a list of rows; the with block closes the file afterwards.
with open('./assets/AppleStore.csv', encoding='utf8') as open_apple:
    apple_data = list(reader(open_apple))

with open('./assets/googleplaystore.csv', encoding='utf8') as open_google:
    google_data = list(reader(open_google))

## Import data
We open the two data files needed for this project and store their contents in two lists.

## Explore the data
We use the explore_data function to print the first few rows of each data set, along with the number of rows and columns.

In [3]:
rows_apple = apple_data[1:]
rows_google = google_data[1:]

explore_data(rows_apple, 0, 5, rows_and_columns=True)
print()
explore_data(rows_google, 0, 5, rows_and_columns=True)

['284882215', 'Facebook', '389879808', 'USD', '0.0', '2974676', '212', '3.5', '3.5', '95.0', '4+', 'Social Networking', '37', '1', '29', '1']


['389801252', 'Instagram', '113954816', 'USD', '0.0', '2161558', '1289', '4.5', '4.0', '10.23', '12+', 'Photo & Video', '37', '0', '29', '1']


['529479190', 'Clash of Clans', '116476928', 'USD', '0.0', '2130805', '579', '4.5', '4.5', '9.24.12', '9+', 'Games', '38', '5', '18', '1']


['420009108', 'Temple Run', '65921024', 'USD', '0.0', '1724546', '3842', '4.5', '4.0', '1.6.2', '9+', 'Games', '40', '5', '1', '1']


['284035177', 'Pandora - Music & Radio', '130242560', 'USD', '0.0', '1126879', '3594', '4.0', '4.5', '8.4.1', '12+', 'Music', '37', '4', '1', '1']


Number of rows: 7197
Number of columns: 16

['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['Coloring book moana', 'ART_AND_DESIGN', '3.9', '967', 

## Columns identification
We identify the columns in each data set that could help us with our analysis.
For more information about the column names, see the following links:
* [Apple Store data](https://www.kaggle.com/datasets/ramamet4/app-store-apple-data-set-10k-apps)
* [Google Play Store data](https://www.kaggle.com/datasets/lava18/google-play-store-apps)

**Identified columns for the App Store**: track_name, price, rating_count_tot, user_rating, prime_genre.

**Identified columns for Google Play**: App, Category, Rating, Installs, Type, Genres.

In [4]:
print(apple_data[0])
print(google_data[0])

['id', 'track_name', 'size_bytes', 'currency', 'price', 'rating_count_tot', 'rating_count_ver', 'user_rating', 'user_rating_ver', 'ver', 'cont_rating', 'prime_genre', 'sup_devices.num', 'ipadSc_urls.num', 'lang.num', 'vpp_lic']
['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']


## Data cleaning
### Remove the inaccurate data
According to this [discussion](https://www.kaggle.com/datasets/lava18/google-play-store-apps/discussion/66015), one row in the Google Play data is missing a column value.
First, we use a **for loop** to find that row.
Second, we print that row.
Third, we delete that row.

In [5]:
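# Find rows whose length differs from the header, then delete them after the
# loop so the list is not modified while it is being iterated over.
header_length = len(google_data[0])
bad_indexes = []
for i, row in enumerate(rows_google):
    if len(row) != header_length:
        print(f"Error occurred in row number {i}\n")
        print(row, "\n")
        bad_indexes.append(i)

# Delete from the end so the remaining indexes stay valid.
for i in reversed(bad_indexes):
    del rows_google[i]
    print("row deleted...")

Error occurred in row number 10472

['Life Made WI-Fi Touchscreen Photo Frame', '1.9', '19', '3.0M', '1,000+', 'Free', '0', 'Everyone', '', 'February 11, 2018', '1.0.19', '4.0 and up'] 

row deleted...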


Now let's check whether our Google Play data set contains duplicate entries. To do that we will:
- Create two lists: one for the unique app names and another for the duplicates;
- Loop over the Google Play data set;
- Check whether the current app name already exists in the list of unique names. If not, we append it to that list; otherwise we append it to the list of duplicates.

In [6]:
unique_datas = []
duplicate_datas = []

for row in rows_google:
    name = row[0]
    if name in unique_datas:
        duplicate_datas.append(name)
    else:
        unique_datas.append(name)

print("From our loop we see that, for example, the following app is duplicated:")

for row in rows_google:
    name = row[0]
    if name == 'Slack':
        print(row)

print("\nAll in all we have", len(duplicate_datas), "duplicated entries")

From our loop we see that, for example, the following app is duplicated:
['Slack', 'BUSINESS', '4.4', '51507', 'Varies with device', '5,000,000+', 'Free', '0', 'Everyone', 'Business', 'August 2, 2018', 'Varies with device', 'Varies with device']
['Slack', 'BUSINESS', '4.4', '51507', 'Varies with device', '5,000,000+', 'Free', '0', 'Everyone', 'Business', 'August 2, 2018', 'Varies with device', 'Varies with device']
['Slack', 'BUSINESS', '4.4', '51510', 'Varies with device', '5,000,000+', 'Free', '0', 'Everyone', 'Business', 'August 2, 2018', 'Varies with device', 'Varies with device']

All in all we have 1181 duplicated entries
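
The same duplicate count can also be sketched with collections.Counter, which avoids searching a growing list for every row. This is only a minimal sketch: sample_rows is a hypothetical stand-in for rows_google, and the non-Slack values in it are illustrative rather than taken from the data set.

```python
from collections import Counter

# Hypothetical sample rows in the Google Play layout (app name in column 0).
sample_rows = [
    ['Slack', 'BUSINESS', '4.4', '51507'],
    ['Slack', 'BUSINESS', '4.4', '51507'],
    ['Slack', 'BUSINESS', '4.4', '51510'],
    ['Some Other App', 'TOOLS', '4.0', '120'],
]

# Count how many rows each app name has.
name_counts = Counter(row[0] for row in sample_rows)

# Every occurrence after the first one counts as a duplicate.
n_duplicates = sum(count - 1 for count in name_counts.values())

print('Unique apps:', len(name_counts))
print('Duplicate rows:', n_duplicates)
```

Applied to rows_google, len(name_counts) would play the role of len(unique_datas) and n_duplicates the role of len(duplicate_datas).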


For our analysis we will remove the duplicate entries and keep only the most recent row for each app. After the malformed row was deleted, the Google Play data has 10840 rows; 1181 of them are duplicates, so 9659 rows should be left after cleaning.
We do this by keeping, for each app, the row with the highest number of reviews: since reviews are cumulative, the highest count corresponds to the most recent entry.

### Now let's remove the duplicated entries
This operation is done in two steps.
First: using a dictionary, we build a table that maps each app name to its maximum number of reviews.
Second: we loop over the Google Play data and check whether the number of reviews in the current row equals the maximum stored for that app in the first table. If so, we add the row to a new list and add the app name to a list of apps already added (because some duplicate rows share the same maximum number of reviews).

In [7]:
reviews_max = {}

for row in rows_google:
    name = row[0]
    n_reviews = float(row[3])

    if name in reviews_max and reviews_max[name] < n_reviews:
        reviews_max[name] = n_reviews
    elif name not in reviews_max:
        reviews_max[name] = n_reviews

print(len(reviews_max)) # to verify that all went ok !
        

9659


In [8]:
google_clean = []
already_added = []

for row in rows_google:
    name = row[0]
    n_reviews = float(row[3])

    if n_reviews == reviews_max[name] and name not in already_added:
        google_clean.append(row)
        already_added.append(name)

print(len(google_clean)) # to verify that all went ok !

9659
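
As a possible alternative to the two passes above, the deduplication can be sketched in a single pass by storing the best row seen so far for each app. This is a sketch under assumptions: sample_rows is a hypothetical stand-in for rows_google, and review counts are assumed to be plain integer strings.

```python
# Hypothetical sample rows: app name in column 0, review count in column 3.
sample_rows = [
    ['Slack', 'BUSINESS', '4.4', '51507'],
    ['Slack', 'BUSINESS', '4.4', '51507'],
    ['Slack', 'BUSINESS', '4.4', '51510'],
    ['Some Other App', 'TOOLS', '4.0', '120'],
]

best_rows = {}  # app name -> row with the highest review count seen so far
for row in sample_rows:
    name = row[0]
    n_reviews = int(row[3])
    # Strict '>' keeps the first row when two rows tie on review count.
    if name not in best_rows or n_reviews > int(best_rows[name][3]):
        best_rows[name] = row

deduplicated = list(best_rows.values())
print(len(deduplicated))
```

Because a dictionary holds one row per name, no separate already_added list is needed in this variant.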


### Remove non-English apps
Every character used in ordinary English text has a code in the ASCII range (0 to 127). To check whether an app name is in English, we create a function that checks the code of each character in the name. If more than 3 characters fall outside the ASCII range, we consider the app to be non-English. Allowing up to 3 such characters keeps English apps whose names contain an emoji or a symbol such as ™.

In [9]:
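def is_english_app(app_name):
    # Count the characters that fall outside the ASCII range (0-127).
    non_ascii = 0
    for character in app_name:
        if ord(character) > 127:
            non_ascii += 1

            # More than 3 non-ASCII characters: treat the name as non-English.
            if non_ascii > 3:
                return False

    return True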

# We test our function
print(is_english_app('爱奇艺PPS -《欢乐颂2》电视剧热播'))
print(is_english_app('Docs To Go™ Free Office Suite'))
print(is_english_app('Instachat 😜'))

False
True
True
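
To see why a threshold is used instead of rejecting any non-ASCII character, the following sketch counts the non-ASCII characters in the three test names. The helper count_non_ascii is a hypothetical name introduced here for illustration only.

```python
def count_non_ascii(text):
    """Return how many characters in text fall outside the ASCII range (0-127)."""
    return sum(1 for character in text if ord(character) > 127)


test_names = [
    'Docs To Go™ Free Office Suite',
    'Instachat 😜',
    '爱奇艺PPS -《欢乐颂2》电视剧热播',
]

for name in test_names:
    print(name, '->', count_non_ascii(name))
```

The two English names contain only an isolated symbol or emoji and stay under the limit of 3, while the third name is well above it.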


In [10]:
filtered_apple = []
filtered_google = []

for row in google_clean:
    name = row[0]

    if is_english_app(name):
        filtered_google.append(row)

for row in rows_apple:
    name = row[1]

    if is_english_app(name):
        filtered_apple.append(row)

print(len(filtered_google))
print(len(filtered_apple))


9614
6183


### Remove the non-free apps

In [11]:
cleaned_apple = []
cleaned_google = []

for row in filtered_apple:
    price = row[4]

    if price == '0.0':
        cleaned_apple.append(row)

for row in filtered_google:
    price = row[7]

    if price == '0':
        cleaned_google.append(row)


print(len(cleaned_apple))
print(len(cleaned_google))

3222
8864
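
The checks above compare the price text with '0.0' and '0' exactly, which works for these two files. A slightly more tolerant check could convert the text to a number first. This is only a sketch: is_free is a hypothetical helper, and the '$4.99' format for paid Google Play prices is an assumption.

```python
def is_free(price_text):
    """Return True when a price string represents zero.

    Assumes prices look like '0', '0.0' or '$4.99'; other formats raise ValueError.
    """
    return float(price_text.strip().lstrip('$')) == 0.0


print(is_free('0'))      # Google Play style free price
print(is_free('0.0'))    # App Store style free price
print(is_free('$4.99'))  # assumed format of a paid Google Play price
```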


## Let's Determine the most common genre
As already mentioned, our company's revenue depends on the number of people who use our apps, since we make money from ads. It is therefore important that the apps we publish on the App Store and Google Play get a good response from users.

So we will build a first (test) version of our app and publish it on Google Play. If users like it, we will develop it further, and if the app is profitable after six months, we will also publish it on the App Store.

To start building our first test app, we have to find genres that are common in each market. For that we will use the ***prime_genre*** column of the App Store data and the ***Genres*** and ***Category*** columns of the Google Play data.

In [12]:
def freq_table(dataset, index):
    freq_table =  {}
    total = 0

    for row in dataset:
        key = row[index]
        total += 1

        if key in freq_table:
            freq_table[key] += 1
        else:
            freq_table[key] = 1
    
    table_percentages = {}
    for key in freq_table:
        percentage = (freq_table[key] / total) * 100
        table_percentages[key] = percentage 
    
    return table_percentages


def display_table(dataset, index):
    table = freq_table(dataset, index)
    table_display = []
    for key in table:
        key_val_as_tuple = (table[key], key)
        table_display.append(key_val_as_tuple)

    table_sorted = sorted(table_display, reverse = True)
    for entry in table_sorted:
        print(entry[1], ':', entry[0])


# display_table prints the table itself and returns None, so it is called directly.
display_table(cleaned_apple, 11)
print()
display_table(cleaned_google, 1)
print()
display_table(cleaned_google, 9)

Games : 58.16263190564867
Entertainment : 7.883302296710118
Photo & Video : 4.9658597144630665
Education : 3.662321539416512
Social Networking : 3.2898820608317814
Shopping : 2.60707635009311
Utilities : 2.5139664804469275
Sports : 2.1415270018621975
Music : 2.0484171322160147
Health & Fitness : 2.0173805090006205
Productivity : 1.7380509000620732
Lifestyle : 1.5828677839851024
News : 1.3345747982619491
Travel : 1.2414649286157666
Finance : 1.1173184357541899
Weather : 0.8690254500310366
Food & Drink : 0.8069522036002483
Reference : 0.5586592178770949
Business : 0.5276225946617008
Book : 0.4345127250155183
Navigation : 0.186219739292365
Medical : 0.186219739292365
Catalogs : 0.12414649286157665

FAMILY : 18.907942238267147
GAME : 9.724729241877256
TOOLS : 8.461191335740072
BUSINESS : 4.591606498194946
LIFESTYLE : 3.9034296028880866
PRODUCTIVITY : 3.892148014440433
FINANCE : 3.7003610108303246
MEDICAL : 3.531137184115524
SPORTS : 3.395758122743682
PERSONALIZATION : 3.31678700361010

#### Analysis of the frequency table for the App Store

We can see that among the free English apps, more than half (58.16%) are games. Entertainment apps are close to 8%, followed by photo and video apps, which are close to 5%. Only 3.66% of the apps are designed for education, followed by social networking apps, which account for 3.29% of the apps in our data set.

The general impression is that the App Store (at least the part containing free English apps) is dominated by apps that are designed for fun (games, entertainment, photo and video, social networking, sports, music, etc.), while apps with practical purposes (education, shopping, utilities, productivity, lifestyle, etc.) are rarer. However, the fact that fun apps are the most numerous doesn't imply that they also have the greatest number of users; the demand might not be the same as the offer.

#### Analysis of the frequency table for Google Play
The landscape seems significantly different on Google Play: there are not that many apps designed for fun, and it seems that a good number of apps are designed for practical purposes (family, tools, business, lifestyle, productivity, etc.). However, if we investigate this further, we can see that the family category (which accounts for almost 19% of the apps) means mostly games for kids.
Even so, practical apps seem to have a better representation on Google Play compared to the App Store. This picture is also confirmed by the frequency table we see for the Genres column.

The difference between the Genres and the Category columns is not crystal clear, but one thing we can notice is that the Genres column is much more granular (it has more categories). We're only looking for the bigger picture at the moment, so we'll only work with the Category column moving forward.
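
One way to check the granularity claim is to count the distinct values in each column. The sketch below uses hypothetical sample rows with illustrative values; distinct_values and make_row are helper names introduced here, and the column positions follow the Google Play header printed earlier.

```python
def distinct_values(dataset, index):
    """Return the set of distinct values found in one column."""
    return {row[index] for row in dataset}


CATEGORY, GENRES = 1, 9  # column positions in the Google Play header


def make_row(name, category, genres):
    # Hypothetical helper: builds a 13-column row with only three fields filled in.
    row = [''] * 13
    row[0], row[CATEGORY], row[GENRES] = name, category, genres
    return row


# Illustrative rows, not taken from the data set.
sample_rows = [
    make_row('App A', 'FAMILY', 'Education'),
    make_row('App B', 'FAMILY', 'Casual;Pretend Play'),
    make_row('App C', 'FAMILY', 'Puzzle'),
    make_row('App D', 'TOOLS', 'Tools'),
]

print('Categories:', len(distinct_values(sample_rows, CATEGORY)))
print('Genres:', len(distinct_values(sample_rows, GENRES)))
```

Running distinct_values on cleaned_google for indexes 1 and 9 would give the actual counts for the two columns.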

Up to this point, we found that the App Store is dominated by apps designed for fun, while Google Play shows a more balanced landscape of both practical and for-fun apps. Now we'd like to get an idea about the kind of apps that have the most users.

## Let's find the most popular apps by genre on the App Store

In [18]:
prime_genre_freq_table = freq_table(cleaned_apple, 11)

for genre in prime_genre_freq_table:
    total = 0
    len_genre = 0

    for app in cleaned_apple:
        genre_app = app[11]

        if genre_app == genre:
            total += float(app[5])
            len_genre += 1
    
    rating_count_average = total / len_genre
    print(f"{genre}: {rating_count_average}")

Social Networking: 71548.34905660378
Photo & Video: 28441.54375
Games: 22788.6696905016
Music: 57326.530303030304
Reference: 74942.11111111111
Health & Fitness: 23298.015384615384
Weather: 52279.892857142855
Utilities: 18684.456790123455
Travel: 28243.8
Shopping: 26919.690476190477
News: 21248.023255813954
Navigation: 86090.33333333333
Lifestyle: 16485.764705882353
Entertainment: 14029.830708661417
Food & Drink: 33333.92307692308
Sports: 23008.898550724636
Book: 39758.5
Finance: 31467.944444444445
Education: 7003.983050847458
Productivity: 21028.410714285714
Business: 7491.117647058823
Catalogs: 4004.0
Medical: 612.0


On average, navigation apps have the highest number of user reviews, but this figure is heavily influenced by Waze and Google Maps, which have close to half a million user reviews together.

In [35]:
for app in cleaned_apple:
    if app[11] == 'Book':
        print(f"{app[1]}: {app[5]}")

Kindle – Read eBooks, Magazines & Textbooks: 252076
Audible – audio books, original series & podcasts: 105274
Color Therapy Adult Coloring Book for Adults: 84062
OverDrive – Library eBooks and Audiobooks: 65450
HOOKED - Chat Stories: 47829
BookShout: Read eBooks & Track Your Reading Goals: 879
Dr. Seuss Treasury — 50 best kids books: 451
Green Riding Hood: 392
Weirdwood Manor: 197
MangaZERO - comic reader: 9
ikouhoushi: 0
MangaTiara - love comic reader: 0
謎解き: 0
謎解き2016: 0


Our aim is to find popular genres, but navigation, social networking or music apps might seem more popular than they really are. The average number of ratings seems to be skewed by very few apps which have hundreds of thousands of user ratings, while the other apps may struggle to get past the 10,000 threshold. We could get a better picture by removing these extremely popular apps for each genre and then rework the averages, but we'll leave this level of detail for later.
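
As a rough illustration of how a few very popular apps can skew an average, the sketch below compares the mean, the median, and the mean without the largest apps for the Book genre. The rating counts are copied from the output above; the cutoff of 100,000 ratings is an assumption chosen only for this example.

```python
from statistics import mean, median

# Rating counts of the free English Book apps, copied from the output above.
book_ratings = [252076, 105274, 84062, 65450, 47829, 879, 451,
                392, 197, 9, 0, 0, 0, 0]

# Assumed cutoff for an "extremely popular" app; not a value from the analysis.
cutoff = 100_000
without_outliers = [n for n in book_ratings if n <= cutoff]

print('Mean:', mean(book_ratings))
print('Median:', median(book_ratings))
print('Mean without apps above the cutoff:', mean(without_outliers))
```

The mean (39758.5) matches the Book average printed earlier, while the median is only 421.5, which shows how far the typical app sits below the genre average.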

Reference apps have 74,942 user ratings on average, but it's actually the Bible and Dictionary.com that skew the average upward:

In [36]:
for app in cleaned_apple:
    if app[11] == 'Reference':
        print(f"{app[1]}: {app[5]}")

Bible: 985920
Dictionary.com Dictionary & Thesaurus: 200047
Dictionary.com Dictionary & Thesaurus for iPad: 54175
Google Translate: 26786
Muslim Pro: Ramadan 2017 Prayer Times, Azan, Quran: 18418
New Furniture Mods - Pocket Wiki & Game Tools for Minecraft PC Edition: 17588
Merriam-Webster Dictionary: 16849
Night Sky: 12122
City Maps for Minecraft PE - The Best Maps for Minecraft Pocket Edition (MCPE): 8535
LUCKY BLOCK MOD ™ for Minecraft PC Edition - The Best Pocket Wiki & Mods Installer Tools: 4693
GUNS MODS for Minecraft PC Edition - Mods Tools: 1497
Guides for Pokémon GO - Pokemon GO News and Cheats: 826
WWDC: 762
Horror Maps for Minecraft PE - Download The Scariest Maps for Minecraft Pocket Edition (MCPE) Free: 718
VPN Express: 14
Real Bike Traffic Rider Virtual Reality Glasses: 8
教えて!goo: 0
Jishokun-Japanese English Dictionary & Translator: 0


However, this niche seems to show some potential. One thing we could do is take another popular book and turn it into an app where we could add different features besides the raw version of the book. This might include daily quotes from the book, an audio version of the book, quizzes about the book, etc. On top of that, we could also embed a dictionary within the app, so users don't need to exit our app to look up words in an external app.

This idea seems to fit well with the fact that the App Store is dominated by for-fun apps. This suggests the market might be a bit saturated with for-fun apps, which means a practical app might have more of a chance to stand out among the huge number of apps on the App Store.

In [37]:
prime_genre_freq_table = freq_table(cleaned_google, 1)

for category in prime_genre_freq_table:
    total = 0
    len_category = 0

    for app in cleaned_google:
        category_app = app[1]

        if category_app == category:
            n_installs = app[5]
            n_installs = n_installs.replace("+", "").replace(",", "")
            total += int(n_installs)
            len_category += 1
    
    install_count_average = total / len_category
    print(f"{category}: {install_count_average}")

ART_AND_DESIGN: 1986335.0877192982
AUTO_AND_VEHICLES: 647317.8170731707
BEAUTY: 513151.88679245283
BOOKS_AND_REFERENCE: 8767811.894736841
BUSINESS: 1712290.1474201474
COMICS: 817657.2727272727
COMMUNICATION: 38456119.167247385
DATING: 854028.8303030303
EDUCATION: 1833495.145631068
ENTERTAINMENT: 11640705.88235294
EVENTS: 253542.22222222222
FINANCE: 1387692.475609756
FOOD_AND_DRINK: 1924897.7363636363
HEALTH_AND_FITNESS: 4188821.9853479853
HOUSE_AND_HOME: 1331540.5616438356
LIBRARIES_AND_DEMO: 638503.734939759
LIFESTYLE: 1437816.2687861272
GAME: 15588015.603248259
FAMILY: 3695641.8198090694
MEDICAL: 120550.61980830671
SOCIAL: 23253652.127118643
SHOPPING: 7036877.311557789
PHOTOGRAPHY: 17840110.40229885
SPORTS: 3638640.1428571427
TRAVEL_AND_LOCAL: 13984077.710144928
TOOLS: 10801391.298666667
PERSONALIZATION: 5201482.6122448975
PRODUCTIVITY: 16787331.344927534
PARENTING: 542603.6206896552
WEATHER: 5074486.197183099
VIDEO_PLAYERS: 24727872.452830188
NEWS_AND_MAGAZINES: 9549178.467741935
MA

As with the App Store, we see that building an app around a popular book can be profitable.

In [38]:
for app in cleaned_google:
    if app[1] == 'BOOKS_AND_REFERENCE' and (app[5] == '1,000,000+'
                                            or app[5] == '5,000,000+'
                                            or app[5] == '10,000,000+'
                                            or app[5] == '50,000,000+'):
        print(app[0], ':', app[5])

Wikipedia : 10,000,000+
Cool Reader : 10,000,000+
Book store : 1,000,000+
FBReader: Favorite Book Reader : 10,000,000+
Free Books - Spirit Fanfiction and Stories : 1,000,000+
AlReader -any text book reader : 5,000,000+
FamilySearch Tree : 1,000,000+
Cloud of Books : 1,000,000+
ReadEra – free ebook reader : 1,000,000+
Ebook Reader : 5,000,000+
Read books online : 5,000,000+
eBoox: book reader fb2 epub zip : 1,000,000+
All Maths Formulas : 1,000,000+
Ancestry : 5,000,000+
HTC Help : 10,000,000+
Moon+ Reader : 10,000,000+
English-Myanmar Dictionary : 1,000,000+
Golden Dictionary (EN-AR) : 1,000,000+
All Language Translator Free : 1,000,000+
Aldiko Book Reader : 10,000,000+
Dictionary - WordWeb : 5,000,000+
50000 Free eBooks & Free AudioBooks : 5,000,000+
Al-Quran (Free) : 10,000,000+
Al Quran Indonesia : 10,000,000+
Al'Quran Bahasa Indonesia : 10,000,000+
Al Quran Al karim : 1,000,000+
Al Quran : EAlim - Translations & MP3 Offline : 5,000,000+
Koran Read &MP3 30 Juz Offline : 1,000,000+
H

In this project, we analyzed data about the App Store and Google Play mobile apps with the goal of recommending an app profile that can be profitable for both markets.

We concluded that taking a popular book (perhaps a more recent book) and turning it into an app could be profitable for both the Google Play and the App Store markets. The markets are already full of libraries, so we need to add some special features besides the raw version of the book. This might include daily quotes from the book, an audio version of the book, quizzes on the book, a forum where people can discuss the book, etc.