# Maximizing App Profits on the App Store and Google Play
This project aims to identify profitable mobile app profiles on the App Store and Google Play store in English-speaking markets. As a data analyst, I support our company in developing Android and iOS apps, empowering developers to make data-driven decisions. Our focus is on creating free apps that generate revenue through in-app advertisements, making user attraction crucial for success. Through comprehensive data analysis, we seek valuable insights into the app types with a higher potential to captivate a larger user base.


## I.   Gathering the Data
We save time and reduce cost while still providing valuable insights by focusing on a representative sample rather than collecting data for over four million apps available on the App Store and Google Play as of September 2018.  We have identified two suitable datasets that align with our project objectives, eliminating the need for extensive data collection. These datasets will enable us to conduct an in-depth analysis and draw meaningful conclusions.

* A [dataset](https://dq-content.s3.amazonaws.com/350/googleplaystore.csv) containing data about approximately 10,000 Android apps from Google Play (data collected August 2018).
* A [dataset](https://dq-content.s3.amazonaws.com/350/AppleStore.csv) containing data about approximately 7,000 iOS apps from the App Store (data collected July 2017).


###      A.  Open Data Files

In [1]:
# Import the CSV reader once for both files
from csv import reader

# Open the App Store CSV file (explicit encoding avoids platform-dependent defaults)
Apple_opened_file = open('AppleStore.csv', encoding='utf8')
Apple_read_file = reader(Apple_opened_file)
Apple_data = list(Apple_read_file)
Apple_opened_file.close()
Apple_header = Apple_data[0]
Apple = Apple_data[1:]

# Open the Google Play CSV file
Google_opened_file = open('googleplaystore.csv', encoding='utf8')
Google_read_file = reader(Google_opened_file)
Google_data = list(Google_read_file)
Google_opened_file.close()
Google_header = Google_data[0]
Google = Google_data[1:]
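
The two loading steps above follow the same pattern, so they could be wrapped in a small helper. The sketch below is one possible way to do that, not part of the original notebook: `load_csv` is a hypothetical name, and the demo writes a tiny made-up CSV to a temporary file so the example runs without the real datasets.

```python
import csv
import os
import tempfile

def load_csv(path, encoding='utf8'):
    """Return (header, rows) for a CSV file; the file is closed automatically."""
    with open(path, encoding=encoding, newline='') as f:
        rows = list(csv.reader(f))
    return rows[0], rows[1:]

# Demo on a tiny temporary file (made-up data, not the real datasets)
with tempfile.NamedTemporaryFile('w', suffix='.csv', delete=False,
                                 encoding='utf8', newline='') as tmp:
    tmp.write('App,Category\nExample App,TOOLS\n')
    demo_path = tmp.name

demo_header, demo_rows = load_csv(demo_path)
os.remove(demo_path)

print(demo_header)
print(demo_rows)
```

With the real files, the calls would look like `Apple_header, Apple = load_csv('AppleStore.csv')`, assuming the files sit in the working directory.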

###      B.   Exploring the Datasets
We will print the header row of each dataset to help identify the columns we want to use for analysis, print the first three rows of each dataset as a sample, and report the number of rows and columns in each.

In [2]:
# Create a function that prints a slice of a dataset and, optionally, its dimensions

def explore_data(dataset, start, end, rows_and_columns=False):
    dataset_slice = dataset[start:end]
    for row in dataset_slice:
        print(row)
        print('\n') # adds a new (empty) line after each row

    # Print number of rows and columns
    if rows_and_columns:
        print('Number of rows:', len(dataset))
        print('Number of columns:', len(dataset[0]))

print('\n')
print(str('Apple App Store Data, Rows 1 - 3')) #Print Apple dataset title
print('\n')
print(Apple_header) #This is the Apple Header Row
print('\n')
explore_data(Apple, 0, 3, rows_and_columns=True)
print('\n')
print('\n')

print(str('Google Play Store Data, Rows 1 - 3')) #Print Google Play dataset title
print('\n')
print(Google_header) #This is the Google Play Header Row   
print('\n')
explore_data(Google, 0, 3, rows_and_columns=True)
print('\n')
       



Apple App Store Data, Rows 1 - 3


['id', 'track_name', 'size_bytes', 'currency', 'price', 'rating_count_tot', 'rating_count_ver', 'user_rating', 'user_rating_ver', 'ver', 'cont_rating', 'prime_genre', 'sup_devices.num', 'ipadSc_urls.num', 'lang.num', 'vpp_lic']


['284882215', 'Facebook', '389879808', 'USD', '0.0', '2974676', '212', '3.5', '3.5', '95.0', '4+', 'Social Networking', '37', '1', '29', '1']


['389801252', 'Instagram', '113954816', 'USD', '0.0', '2161558', '1289', '4.5', '4.0', '10.23', '12+', 'Photo & Video', '37', '0', '29', '1']


['529479190', 'Clash of Clans', '116476928', 'USD', '0.0', '2130805', '579', '4.5', '4.5', '9.24.12', '9+', 'Games', '38', '5', '18', '1']


Number of rows: 7197
Number of columns: 16




Google Play Store Data, Rows 1 - 3


['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']


['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN

###      C.   Determine Relevant Data

After reviewing the datasets, the relevant columns in each dataset for our analysis include:
* **Apple App Store Data:**
'track_name', 'currency', 'price', 'rating_count_tot', 'rating_count_ver', 'user_rating', 'prime_genre'

 *For detailed information regarding header definitions, see* [documentation](https://www.kaggle.com/datasets/ramamet4/app-store-apple-data-set-10k-apps).

* **Google Play Store Data:**
'App', 'Category', 'Rating', 'Reviews', 'Installs', 'Type', 'Price', 'Genres'

 *For detailed information regarding header definitions, see* [documentation](https://www.kaggle.com/datasets/lava18/google-play-store-apps).
 

## II.  Cleaning the Data

### A.  Validate suspected data errors and delete row if error found

In [3]:
# Check whether row 10472 has the same number of columns as the header
print(Google[10472])
print(len(Google[10472]))
print(Google_header)
print(len(Google_header))


['Life Made WI-Fi Touchscreen Photo Frame', '1.9', '19', '3.0M', '1,000+', 'Free', '0', 'Everyone', '', 'February 11, 2018', '1.0.19', '4.0 and up']
12
['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']
13


In [4]:
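# Delete the erroneous row: it has 12 fields instead of 13 because the Category value is missing.
# Run this cell only once; re-running it would delete a valid row.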

del Google[10472]
print(len(Google)) # to confirm number of rows are correct after deletion


10840
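
The check above inspects one row that was already suspected to be wrong. A more general scan can flag every row whose length differs from the header's. The sketch below uses a hypothetical helper, `find_malformed_rows`, and made-up toy rows; it is one way to do the check, not the method used in this notebook.

```python
def find_malformed_rows(header, rows):
    """Return (index, row_length) for every row whose length differs from the header's."""
    expected = len(header)
    return [(i, len(row)) for i, row in enumerate(rows) if len(row) != expected]

# Toy data (made up): the second row is missing a field
toy_header = ['App', 'Category', 'Rating']
toy_rows = [
    ['App One', 'TOOLS', '4.1'],
    ['App Two', '4.5'],
    ['App Three', 'GAME', '3.9'],
]

malformed = find_malformed_rows(toy_header, toy_rows)
print(malformed)  # [(1, 2)]
```

If several rows are flagged, deleting them from the highest index to the lowest keeps the remaining indices valid.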


### B.  Removing Duplicate Entries

The Google data contains duplicate entries for some apps. We first count how many entries are duplicates and collect the names of the affected apps:


In [5]:
# Count duplicate entries (nothing is removed yet)

duplicate_apps = []
unique_apps = []

for app in Google:
    name = app[0]
    if name in unique_apps:
        duplicate_apps.append(name)
    else:
        unique_apps.append(name)
print('Number of duplicate apps:', len(duplicate_apps))
print('\n')
print('Names of Apps with Duplicates:', duplicate_apps[:15])
print('\n')

# Print Instagram Duplicates as example
for app in Google:
    name = app[0]
    if name == 'Instagram':
        print(app)
        

Number of duplicate apps: 1181


Names of Apps with Duplicates: ['Quick PDF Scanner + OCR FREE', 'Box', 'Google My Business', 'ZOOM Cloud Meetings', 'join.me - Simple Meetings', 'Box', 'Zenefits', 'Google Ads', 'Google My Business', 'Slack', 'FreshBooks Classic', 'Insightly CRM', 'QuickBooks Accounting: Invoicing & Expenses', 'HipChat - Chat Built for Teams', 'Xero Accounting Software']


['Instagram', 'SOCIAL', '4.5', '66577313', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']
['Instagram', 'SOCIAL', '4.5', '66577446', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']
['Instagram', 'SOCIAL', '4.5', '66577313', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']
['Instagram', 'SOCIAL', '4.5', '66509917', 'Varies with device', '1,000,000,000+', 

Subtracting the number of duplicate entries from the number of rows in the Google dataset gives the number of rows we expect after removing duplicates.

In [6]:
# Print the number of rows expected once duplicates are removed

print('Expected_length:', len(Google) - len(duplicate_apps))

Expected_length: 9659
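
The expected length is simply the number of distinct app names, so it can be cross-checked with a set. The sketch below runs on made-up toy rows and is only an illustration; the variable names are not from the notebook.

```python
# Toy rows (made up); the app name is in column 0, as in the Google data
toy_google = [
    ['Instagram', 'SOCIAL'],
    ['Box', 'BUSINESS'],
    ['Instagram', 'SOCIAL'],
    ['Slack', 'BUSINESS'],
    ['Box', 'BUSINESS'],
    ['Instagram', 'SOCIAL'],
]

names = [row[0] for row in toy_google]
n_unique = len(set(names))             # distinct app names
n_duplicates = len(names) - n_unique   # entries beyond the first for each name

print('Unique apps:', n_unique)
print('Duplicate entries:', n_duplicates)
```

Set membership tests are also much faster than the `name in unique_apps` list lookups used above, which matters more as the dataset grows.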


Rather than removing duplicates arbitrarily, we keep the entry with the highest number of reviews for each app, on the assumption that a higher review count reflects a more recent snapshot, and remove the others.

First we create a dictionary, `reviews_max`, that maps each app name to the highest review count seen for that app. As we iterate over the data, we assign the app name to the variable `name` and store the number of reviews as a float in `n_reviews`.

In [7]:
# Create a dictionary holding the maximum review count for each app
# Iterate over the rows, reading the app name and its review count
# The number of reviews is stored as a float

reviews_max = {}

for app in Google:
    name = app[0]
    n_reviews = float(app[3])
    
    if name in reviews_max and reviews_max[name] < n_reviews:
        reviews_max[name] = n_reviews
        
    elif name not in reviews_max:
        reviews_max[name] = n_reviews        

We also need two lists: `Google_data_clean` for the de-duplicated rows and `already_added` to track the names already kept. The second list matters when more than one entry for an app has the maximum number of reviews: without it, each of those entries would be added.

In [8]:
# Create clean data lists and already added lists to prevent duplicate entries 

Google_data_clean = []
already_added = []

for app in Google:
    name = app[0]
    n_reviews = float(app[3])
    
    if (reviews_max[name] == n_reviews) and (name not in already_added):
        Google_data_clean.append(app)
        already_added.append(name)

print('\n')
explore_data(Google_data_clean, 0, 3, True)
print('\n')



['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 and up']


Number of rows: 9659
Number of columns: 13




We now have the expected number of rows confirmed.
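
The two-pass approach above (build `reviews_max`, then filter) could also be written as a single pass that stores the best row per app in a dictionary. The sketch below is an alternative under stated assumptions, not the notebook's method: `keep_most_reviewed` is a hypothetical helper, and the Box row is made up for the demo.

```python
def keep_most_reviewed(rows, name_idx=0, reviews_idx=3):
    """Keep, for each app name, the first row with the highest review count."""
    best = {}
    for row in rows:
        name = row[name_idx]
        n_reviews = float(row[reviews_idx])
        # Strict '>' means ties keep the earlier row
        if name not in best or n_reviews > float(best[name][reviews_idx]):
            best[name] = row
    return list(best.values())

toy_rows = [
    ['Instagram', 'SOCIAL', '4.5', '66577313'],
    ['Box', 'BUSINESS', '4.2', '159872'],        # made-up row
    ['Instagram', 'SOCIAL', '4.5', '66577446'],
    ['Instagram', 'SOCIAL', '4.5', '66577313'],
]

clean = keep_most_reviewed(toy_rows)
for row in clean:
    print(row)
```

One difference to keep in mind: the result is ordered by each app's first appearance, whereas the two-pass version orders rows by where the kept entry sits in the original data.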



### C.  Removing Non-English Apps

We are targeting an English-speaking audience, so we want to remove apps aimed at non-English-speaking audiences.

We can do this with a function that checks app names for non-ASCII characters. The characters commonly used in English text fall in the ASCII range, with code points 0 to 127, so a character whose `ord()` value is greater than 127 is treated as non-English.

Rejecting every name that contains a single non-ASCII character would also filter out English apps that use emojis or symbols such as ™ in their names. Instead, we only reject names with more than 3 non-ASCII characters. This heuristic catches all but the shortest-named non-English apps.

In [9]:
# Create a function that flags a name as non-English if it has more than 3 non-ASCII characters (code point > 127)

def is_english(string):
    non_ascii = 0
    for character in string:
        if ord(character) > 127:
            non_ascii += 1
    if non_ascii > 3:       
        return False
    else:
        return True


Now we can create our English-only app lists for each platform, and count them.


In [10]:
# Create lists and assign apps to relevant lists, 
# print first 4 rows of each list 
# return total number of rows and columns for each.

Google_english = []
Apple_english = []

for app in Google_data_clean:
    name = app[0]
    if is_english(name):
        Google_english.append(app)
        
for app in Apple:
    name = app[1]
    if is_english(name):
        Apple_english.append(app)

print('\n')
print('Google English Apps')        
explore_data(Google_english, 0, 4, True)
print('\n')
print('\n')
print('Apple English Apps')
explore_data(Apple_english, 0, 4, True)
print('\n')



Google English Apps
['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 and up']


['Pixel Draw - Number Art Coloring Book', 'ART_AND_DESIGN', '4.3', '967', '2.8M', '100,000+', 'Free', '0', 'Everyone', 'Art & Design;Creativity', 'June 20, 2018', '1.1', '4.4 and up']


Number of rows: 9614
Number of columns: 13




Apple English Apps
['284882215', 'Facebook', '389879808', 'USD', '0.0', '2974676', '212', '3.5', '3.5', '95.0', '4+', 'Social Networking', '37', '1', '29', '1']


['389801252', 'Instagra

### D.  Isolate the Free Apps

Now we create lists that include only the free apps from the English lists just created, using a loop over each list that checks the price column.

In [11]:
# Create free English app lists by filtering on price, then print the totals

Google_free_english_apps = []
Apple_free_english_apps = []

for app in Google_english:
    price = app[7]
    if price == '0':
        Google_free_english_apps.append(app)
        
for app in Apple_english:
    price = app[4]
    if price == '0.0':
        Apple_free_english_apps.append(app)
        
print('\n')
print('Google Free English Apps:')        
print(len(Google_free_english_apps))
print('\n')
print('Apple Free English Apps:')
print(len(Apple_free_english_apps))
print('\n')



Google Free English Apps:
8864


Apple Free English Apps:
3222
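
The filter above compares price strings exactly (`'0'` for Google Play, `'0.0'` for the App Store), so it depends on how each file formats a zero price. A numeric comparison is less fragile. The sketch below is a hypothetical helper, and it assumes paid Google Play prices look like `'$4.99'`, which is not shown in the excerpts above.

```python
def is_free(price):
    """Return True if a price string such as '0', '0.0' or '$4.99' equals zero."""
    cleaned = price.strip().lstrip('$')   # drop whitespace and a leading dollar sign
    return float(cleaned) == 0.0

checks = {p: is_free(p) for p in ['0', '0.0', '$4.99', '2.99']}
for price, verdict in checks.items():
    print(repr(price), '->', verdict)
```

An empty or non-numeric price would raise a `ValueError` here, which surfaces bad rows instead of silently dropping them.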




## III.  Analyzing the Data

Goal: Build successful and profitable applications by aligning with user preferences in both markets.

Objective: Determine app profiles with high user appeal and revenue potential.

Validation Strategy:
  * Build minimal Android app version and add it to Google Play.
  * Further develop app if it receives positive user response.
  * Create iOS version and add it to App Store if app proves profitable after six months.

Analysis Approach:
  * Generate frequency tables for prime_genre column in App Store dataset and Genres/Category columns in Google Play dataset.
  * Identify most common genres in each market.

Insights and Actionable Steps:
  * Focus on app profiles that have potential for success on both App Store and Google Play.
  * Optimize app development efforts based on user preferences in each market.


### A.  Determine Most Common Genres Across Each Platform

We want to target apps that are profitable on both platforms, so we need to identify common profiles of the most popular apps for each platform within our datasets. 

We start by identifying the most common genres.




### 1.   Determine relevant column headings 

In [12]:
# Identify relevant column headings in each dataset

print('\n')
print(Google_data[0])
print('\n')
print(Apple_data[0])



['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']


['id', 'track_name', 'size_bytes', 'currency', 'price', 'rating_count_tot', 'rating_count_ver', 'user_rating', 'user_rating_ver', 'ver', 'cont_rating', 'prime_genre', 'sup_devices.num', 'ipadSc_urls.num', 'lang.num', 'vpp_lic']


From the headers, the columns to analyze are `Category` and `Genres` in the Google Play data and `prime_genre` in the App Store data.


### 2.  Create functions to generate frequency tables with values as percentages


In [13]:
# Creating frequency table function that shows results as a percentage 

def freq_table(dataset, index):
    table = {}
    total = 0

    for row in dataset:
        total += 1
        value = row[index]
        if value in table:
            table[value] += 1
        else:
            table[value] = 1
    
    table_percentages = {}      # convert numbers to percentages
    for key in table:
        percentage = (table[key] / total) * 100
        table_percentages[key] = percentage
        
    return table_percentages

def display_table(dataset, index):
    table = freq_table(dataset, index)
    table_display = []
    for key in table:
        key_val_as_tuple = (table[key], key)
        table_display.append(key_val_as_tuple)

    table_sorted = sorted(table_display, reverse = True)
    for entry in table_sorted:
        print(entry[1], ':', entry[0])  
print('\n')        
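
For comparison, the same percentage table can be built with `collections.Counter` from the standard library. The sketch below is an alternative to `freq_table`, not a replacement the notebook uses; `freq_table_counter` is a hypothetical name and the rows are made-up toy data.

```python
from collections import Counter

def freq_table_counter(dataset, index):
    """Return {value: percentage of rows} for the column at `index`."""
    counts = Counter(row[index] for row in dataset)
    total = sum(counts.values())
    return {key: count / total * 100 for key, count in counts.items()}

# Toy rows (made up): genre in column 1
toy = [['a', 'Games'], ['b', 'Games'], ['c', 'Music'], ['d', 'Games']]

table = freq_table_counter(toy, 1)
# Sort by percentage, largest first, as display_table does
for genre, pct in sorted(table.items(), key=lambda kv: kv[1], reverse=True):
    print(genre, ':', pct)
```

Sorting with a `key` function avoids building the intermediate list of `(percentage, key)` tuples.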





In [14]:
# Display table data for each platform

print('\n')
print('Apple Genre and Percentage')
print('\n')
display_table(Apple_free_english_apps, -5)
print('\n')



Apple Genre and Percentage


Games : 58.16263190564867
Entertainment : 7.883302296710118
Photo & Video : 4.9658597144630665
Education : 3.662321539416512
Social Networking : 3.2898820608317814
Shopping : 2.60707635009311
Utilities : 2.5139664804469275
Sports : 2.1415270018621975
Music : 2.0484171322160147
Health & Fitness : 2.0173805090006205
Productivity : 1.7380509000620732
Lifestyle : 1.5828677839851024
News : 1.3345747982619491
Travel : 1.2414649286157666
Finance : 1.1173184357541899
Weather : 0.8690254500310366
Food & Drink : 0.8069522036002483
Reference : 0.5586592178770949
Business : 0.5276225946617008
Book : 0.4345127250155183
Navigation : 0.186219739292365
Medical : 0.186219739292365
Catalogs : 0.12414649286157665




In [15]:
print('\n')
print('Google Category and Percentage')
print('\n')
display_table(Google_free_english_apps, 1) #category column
print('\n')
print('Google Genre and Percentage')# Genre Column
display_table(Google_free_english_apps, -4)
print('\n')



Google Category and Percentage


FAMILY : 18.907942238267147
GAME : 9.724729241877256
TOOLS : 8.461191335740072
BUSINESS : 4.591606498194946
LIFESTYLE : 3.9034296028880866
PRODUCTIVITY : 3.892148014440433
FINANCE : 3.7003610108303246
MEDICAL : 3.531137184115524
SPORTS : 3.395758122743682
PERSONALIZATION : 3.3167870036101084
COMMUNICATION : 3.2378158844765346
HEALTH_AND_FITNESS : 3.0798736462093865
PHOTOGRAPHY : 2.944494584837545
NEWS_AND_MAGAZINES : 2.7978339350180503
SOCIAL : 2.6624548736462095
TRAVEL_AND_LOCAL : 2.33528880866426
SHOPPING : 2.2450361010830324
BOOKS_AND_REFERENCE : 2.1435018050541514
DATING : 1.861462093862816
VIDEO_PLAYERS : 1.7937725631768955
MAPS_AND_NAVIGATION : 1.3989169675090252
FOOD_AND_DRINK : 1.2409747292418771
EDUCATION : 1.1620036101083033
ENTERTAINMENT : 0.9589350180505415
LIBRARIES_AND_DEMO : 0.9363718411552346
AUTO_AND_VEHICLES : 0.9250902527075812
HOUSE_AND_HOME : 0.8235559566787004
WEATHER : 0.8009927797833934
EVENTS : 0.7107400722021661
PARENTING : 0


### 3.   Reviewing the Tables


The largest genres on the App Store are Games at 58% and Entertainment at almost 8%. The rest of the top five are Photo & Video, Education, and Social Networking, so four of the top five genres are oriented toward entertainment and social networking. The smallest genres are practical helper apps.

The largest categories on Google Play are Family at 19% and Games at almost 10%. Browsing the Family category on [Google Play](https://play.google.com/store/search?q=family&c=apps) shows that it consists largely of children's games. Counting Family together with Games gives a combined share of roughly 29%, followed by Tools at 8.5%.

The Genres column is broken down into much more detailed subgenres. Its largest entry is Tools at 8.5%, matching the Tools share in the Category column, followed by Entertainment at 6%. Other genres that appear near the top of the three lists include photography, entertainment, lifestyle, and business apps.

As the table below shows, the App Store appears to offer a substantially larger share of entertainment apps, while Google Play offers many more practical apps.


|                         | Apple Genre | Google Category | Google Genre |
|-------------------------|-------------:|----------------:|--------------:|
| % Practical Genres      |         3.66|           32.52|         26.29|
| % General Entertainment |        74.30|           13.05|          6.10|
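
The notebook does not state which genres were grouped to produce the figures in this table. As an illustration only, the sketch below sums the App Store percentages printed earlier for one possible grouping. This grouping is an inference: it happens to reproduce the Apple column, and the groupings behind the two Google columns are not reconstructed here. `group_share` is a hypothetical helper.

```python
# Percentages copied from the 'Apple Genre and Percentage' output above
apple_genre_pct = {
    'Games': 58.16263190564867,
    'Entertainment': 7.883302296710118,
    'Photo & Video': 4.9658597144630665,
    'Education': 3.662321539416512,
    'Social Networking': 3.2898820608317814,
}

def group_share(pct_table, genres):
    """Sum the percentage shares of the listed genres."""
    return sum(pct_table[genre] for genre in genres)

# Assumed groupings (not confirmed by the source)
entertainment_group = ['Games', 'Entertainment', 'Photo & Video', 'Social Networking']
practical_group = ['Education']

entertainment_share = round(group_share(apple_genre_pct, entertainment_group), 2)
practical_share = round(group_share(apple_genre_pct, practical_group), 2)
print('General entertainment:', entertainment_share)
print('Practical:', practical_share)
```

Writing the grouping down explicitly like this makes the comparison reproducible and easy to adjust if a genre is reclassified.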


However, the share of apps offered in a genre is not the same as the popularity of that genre with users. To find out which genres attract the most users, we need further analysis.



### B.   Determining Most Popular Apps on Each Platform


### 1.   Determine relevant columns for each App

We need to measure the popularity of the apps in each store, so we need comparable data in each dataset. Google Play has an `Installs` column, but the App Store data does not. It does include the total number of user ratings (`rating_count_tot`), which we can use as a proxy for installs.

We need to calculate the average number of user ratings for each genre.


### 2.   Calculate Average Number of User Ratings Per App Genre in Apple Store

In [16]:
# calculate average number of user ratings per app genre in Apple store

Pop_Apple_genres = freq_table(Apple_free_english_apps, -5)
avg_ratings = [] # Create a list to store tuples (average_rating, genre)

for genre in Pop_Apple_genres:
    total = 0 # sum of number of user ratings per genre
    len_genre = 0 # number of apps per genre
    
    for app in Apple_free_english_apps:
        genre_app = app[-5]
        if genre_app == genre:
            n_ratings = float(app[5])
            total += n_ratings
            len_genre += 1
    avg_n_ratings = total / len_genre
    avg_ratings.append((avg_n_ratings, genre)) # add a tuple to the list

sorted_avg_ratings = sorted(avg_ratings, reverse=True) # Sort in descending order

for avg_n_ratings, genre in sorted_avg_ratings:
    print(genre, ':', avg_n_ratings)
print('\n')

Navigation : 86090.33333333333
Reference : 74942.11111111111
Social Networking : 71548.34905660378
Music : 57326.530303030304
Weather : 52279.892857142855
Book : 39758.5
Food & Drink : 33333.92307692308
Finance : 31467.944444444445
Photo & Video : 28441.54375
Travel : 28243.8
Shopping : 26919.690476190477
Health & Fitness : 23298.015384615384
Sports : 23008.898550724636
Games : 22788.6696905016
News : 21248.023255813954
Productivity : 21028.410714285714
Utilities : 18684.456790123455
Lifestyle : 16485.764705882353
Entertainment : 14029.830708661417
Business : 7491.117647058823
Education : 7003.983050847458
Catalogs : 4004.0
Medical : 612.0





Navigation, Reference, and Social Networking have the highest average number of user ratings per app. Based on this alone, I recommend building apps within any of these top 3 genres.

However, our goal is to find a popular cross-platform genre, so we need to look at the Google Play data and find a highly-rated shared category between the App Store and Google Play store.
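
The per-genre averages above were computed with a nested loop that rescans the full app list once per genre. A single pass that accumulates a running total and count per genre gives the same averages. The sketch below shows the idea on made-up toy rows; `average_by_group` is a hypothetical helper, not part of the notebook.

```python
def average_by_group(rows, group_idx, value_idx):
    """Return {group: mean of the numeric column at value_idx} in one pass."""
    totals = {}
    counts = {}
    for row in rows:
        group = row[group_idx]
        totals[group] = totals.get(group, 0.0) + float(row[value_idx])
        counts[group] = counts.get(group, 0) + 1
    return {group: totals[group] / counts[group] for group in totals}

# Toy rows (made up): name, genre, number of ratings
toy_apps = [
    ['App A', 'Games', '100'],
    ['App B', 'Games', '300'],
    ['App C', 'Music', '50'],
]

averages = average_by_group(toy_apps, 1, 2)
for genre, avg in sorted(averages.items(), key=lambda kv: kv[1], reverse=True):
    print(genre, ':', avg)
```

For the App Store data the call would be `average_by_group(Apple_free_english_apps, -5, 5)`, using the same column indices as the cell above.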


### 3.  Calculate Average Number of Installs Per App Category in Google Play Store

The `Installs` column gives only open-ended ranges such as `1,000+`, not exact counts, so we use the lower bound of each range. To do this, we remove the `+` and `,` characters from each string, convert the result to a float, and then compute the average number of installs per category.
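
The string cleanup described above can be isolated in a small helper, which also removes the repetition between the category and genre calculations. This is a minimal sketch; `parse_installs` is a hypothetical name, and the sample strings are taken from rows shown earlier in this notebook.

```python
def parse_installs(installs):
    """Convert an install string such as '10,000+' to a float lower bound."""
    return float(installs.replace('+', '').replace(',', ''))

samples = ['10,000+', '5,000,000+', '1,000,000,000+']
parsed = [parse_installs(s) for s in samples]
for raw, value in zip(samples, parsed):
    print(raw, '->', value)
```

Because each value is a lower bound, the resulting averages understate real install counts, but they remain usable for ranking categories against each other.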

In [17]:
# Calculate Google Play Installs Per App Category

Pop_Google_Categories = freq_table(Google_free_english_apps, 1)
avg_installs = [] # Create a list to store tuples (average_installs, category)

for category in Pop_Google_Categories:
    total = 0    # sum of number of user installs per category
    len_category = 0    # number of apps per category
    
    for app in Google_free_english_apps:
        category_app = app[1]
        if category_app == category:
            n_installs = app[5]
            n_installs = n_installs.replace('+', '')   # Remove the plus sign
            n_installs = n_installs.replace(',', '')   # Remove the thousands separators
            n_installs = float(n_installs)     # Convert the cleaned string to a float
            total += n_installs                # Add to the running total for this category
            len_category += 1
    avg_n_installs = total / len_category
    avg_installs.append((avg_n_installs, category))   # add a tuple to the list

sorted_avg_installs = sorted(avg_installs, reverse=True)   # Sort in descending order

for avg_n_installs, category in sorted_avg_installs:
    print(category, ':', avg_n_installs )
   


COMMUNICATION : 38456119.167247385
VIDEO_PLAYERS : 24727872.452830188
SOCIAL : 23253652.127118643
PHOTOGRAPHY : 17840110.40229885
PRODUCTIVITY : 16787331.344927534
GAME : 15588015.603248259
TRAVEL_AND_LOCAL : 13984077.710144928
ENTERTAINMENT : 11640705.88235294
TOOLS : 10801391.298666667
NEWS_AND_MAGAZINES : 9549178.467741935
BOOKS_AND_REFERENCE : 8767811.894736841
SHOPPING : 7036877.311557789
PERSONALIZATION : 5201482.6122448975
WEATHER : 5074486.197183099
HEALTH_AND_FITNESS : 4188821.9853479853
MAPS_AND_NAVIGATION : 4056941.7741935486
FAMILY : 3695641.8198090694
SPORTS : 3638640.1428571427
ART_AND_DESIGN : 1986335.0877192982
FOOD_AND_DRINK : 1924897.7363636363
EDUCATION : 1833495.145631068
BUSINESS : 1712290.1474201474
LIFESTYLE : 1437816.2687861272
FINANCE : 1387692.475609756
HOUSE_AND_HOME : 1331540.5616438356
DATING : 854028.8303030303
COMICS : 817657.2727272727
AUTO_AND_VEHICLES : 647317.8170731707
LIBRARIES_AND_DEMO : 638503.734939759
PARENTING : 542603.6206896552
BEAUTY : 51315

Let's look at the detailed *Installs Per Genre* data as well now.

In [18]:
# Calculate Google Play Installs Per App Genre

Pop_Google_Genres = freq_table(Google_free_english_apps, 9)
avg_installs_genre = [] # Create a list to store tuples (average_installs, genre)

for genre in Pop_Google_Genres:
    total = 0 # sum of number of user installs per genre
    len_genre = 0 # number of apps per genre
    
    for app in Google_free_english_apps:
        genre_app = app[9]
        if genre_app == genre:
            n_installs = app[5]
            n_installs = n_installs.replace('+', '')
            n_installs = n_installs.replace(',', '')
            n_installs = float(n_installs)
            total += n_installs
            len_genre += 1
    avg_n_genre_installs = total / len_genre
    avg_installs_genre.append((avg_n_genre_installs, genre)) # add a tuple to the list

sorted_avg_installs_genre = sorted(avg_installs_genre, reverse=True) # Sort in descending order

for avg_n_genre_installs, genre in sorted_avg_installs_genre:
    print(genre, ':', avg_n_genre_installs )

Communication : 38456119.167247385
Adventure;Action & Adventure : 35333333.333333336
Video Players & Editors : 24947335.796178345
Social : 23253652.127118643
Arcade : 22888365.48780488
Casual : 19569221.602564104
Puzzle;Action & Adventure : 18366666.666666668
Photography : 17840110.40229885
Educational;Action & Adventure : 17016666.666666668
Productivity : 16787331.344927534
Racing : 15910645.681818182
Travel & Local : 14051476.145631067
Casual;Action & Adventure : 12916666.666666666
Action : 12603588.872727273
Strategy : 11199902.530864198
Tools : 10802461.246995995
Tools;Education : 10000000.0
Role Playing;Brain Games : 10000000.0
Lifestyle;Pretend Play : 10000000.0
Casual;Music & Video : 10000000.0
Card;Action & Adventure : 10000000.0
Adventure;Education : 10000000.0
News & Magazines : 9549178.467741935
Music : 9445583.333333334
Educational;Pretend Play : 9375000.0
Puzzle;Brain Games : 9280666.666666666
Word : 9094458.695652174
Racing;Action & Adventure : 8816666.666666666
Books & R



## IV.   Conclusions

Based on this analysis, we recommend developing a niche social networking app, such as one that caters to music lovers or musicians. Social networking ranks near the top on both platforms: Social Networking has the third-highest average number of user ratings on the App Store, and SOCIAL has the third-highest average number of installs on Google Play. However, the genre faces strong competition from established cornerstone apps like Facebook, Pinterest, Skype, Messenger, and WhatsApp. By focusing on a specialized category within social networking, such as music, the app has the potential to not only attract users from popular platforms like Facebook but also tap into the music genre's user base, thereby maximizing revenue opportunities.