# Profitable App Profiles for the App Store and Google Play Markets

As data analysts at a company that **only builds free apps** to download and install from Google Play and the App Store, we know that our main source of revenue is in-app ads.

This means that the revenue of any given app is determined by its number of users: the more users who see and engage with the ads, the better.

![header.jpg](header.jpg)



### **Our goal**  

Collect and analyze data from each of the online stores to understand what type of apps are likely to attract more users.

* * *

The datasets are stored in two CSV (**Comma-Separated Values**) files. To read their content, we import the `reader` function from Python's `csv` module.

In [1]:
from csv import reader

## Exploring datasets:

We are going to explore the two datasets, but first we want to know which [character encoding](https://en.wikipedia.org/wiki/Character_encoding) each file uses.

The [file](https://www.man7.org/linux/man-pages/man1/file.1.html) command reports the type of a file regardless of its extension, and with the `-i` option it also prints the charset. Knowing the encoding in advance lets us open the files correctly and avoid a `UnicodeDecodeError`.

In [2]:
! file -i AppleStore.csv

AppleStore.csv: application/csv; charset=utf-8


In [3]:
! file -i googleplaystore.csv

googleplaystore.csv: application/csv; charset=utf-8


Both files use the same charset: [UTF-8](https://en.wikipedia.org/wiki/UTF-8).
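If the `file` command is not available (for example on Windows), a rough Python-only check is to try decoding the whole file as UTF-8 and catch `UnicodeDecodeError`. The helper name `is_valid_utf8` is hypothetical, and this is only a sketch: it confirms that a file *can* be read as UTF-8, but it does not identify which encoding was actually used (a pure-ASCII file passes too).

```python
def is_valid_utf8(path):
    """Return True if the whole file decodes as UTF-8, False otherwise."""
    try:
        with open(path, encoding='utf-8') as f:
            for _ in f:  # read line by line so large files are not loaded at once
                pass
    except UnicodeDecodeError:
        return False
    return True
```

For example, `is_valid_utf8('AppleStore.csv')` should return `True` for a file that `file -i` reports as `charset=utf-8`.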


Quick distinction:

- **charset:** the set of characters you can use.
- **encoding:** the way these characters are stored in memory (as bytes).

[source](https://stackoverflow.com/questions/2281646/whats-the-difference-between-encoding-and-charset)
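To make the distinction concrete, the sketch below takes one character, 爱, whose number in the Unicode character set is 29233, and stores it with two different encodings. UTF-16 is used here only for comparison; our files are UTF-8.

```python
# One character (one code point in the Unicode character set)...
char = '爱'
print(ord(char))  # its code point number

# ...stored as different byte sequences depending on the encoding
utf8_bytes = char.encode('utf-8')
utf16_bytes = char.encode('utf-16-le')  # little-endian, no byte-order mark
print(utf8_bytes, len(utf8_bytes))
print(utf16_bytes, len(utf16_bytes))

# Decoding each byte sequence with its own encoding gives back the same character
assert utf8_bytes.decode('utf-8') == utf16_bytes.decode('utf-16-le') == char
```

The character is the same in both cases; only the bytes differ (three bytes in UTF-8, two in UTF-16). Reading a file with the wrong encoding is what produces a `UnicodeDecodeError` or garbled text.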

In [4]:
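# Open the file, parse it with csv.reader and keep all rows as a list of lists
with open('AppleStore.csv', encoding='utf8') as apple_file:
    AppleStore = list(reader(apple_file))
header_apple = AppleStore[0]  # the first row holds the column names
header_apple  # columns of the App Store dataset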

['id',
 'track_name',
 'size_bytes',
 'currency',
 'price',
 'rating_count_tot',
 'rating_count_ver',
 'user_rating',
 'user_rating_ver',
 'ver',
 'cont_rating',
 'prime_genre',
 'sup_devices.num',
 'ipadSc_urls.num',
 'lang.num',
 'vpp_lic']

In [5]:
dataset_apple = AppleStore[1:] # dataset apple no headers

In [6]:
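# Open the file, parse it with csv.reader and keep all rows as a list of lists
with open('googleplaystore.csv', encoding='utf8') as google_file:
    GooglePlay = list(reader(google_file))
header_google = GooglePlay[0]  # the first row holds the column names
header_google  # columns of the Google Play dataset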

['App',
 'Category',
 'Rating',
 'Reviews',
 'Size',
 'Installs',
 'Type',
 'Price',
 'Content Rating',
 'Genres',
 'Last Updated',
 'Current Ver',
 'Android Ver']

In [7]:
dataset_google = GooglePlay[1:] # dataset google no header

### Function to explore datasets

To make the datasets easier to explore, we define a function named `explore_data()` that we can reuse to print rows in a readable way. If `rows_and_columns` is `True`, it also prints the number of rows and columns.

In [8]:
def explore_data(dataset, start, end, rows_and_columns=False):
    # Print the rows from index `start` (inclusive) to `end` (exclusive)
    dataset_slice = dataset[start:end]
    for row in dataset_slice:
        print(row)
        print('\n')  # adds a new (empty) line after each row
    # Optionally print the dimensions of the whole dataset
    if rows_and_columns:
        print('Number of rows:', len(dataset))
        print('Number of columns:', len(dataset[0]))

In [9]:
%%html
<style>
table {float:left}
</style>

### Data dictionary:

Below are the column names and their descriptions for both datasets. The numbers in the first column start at 1, while Python indexes start at 0 (for example, `prime_genre` is `row[11]`).


#### AppleStore.csv

 - `header_apple`: column names

 - `dataset_apple`: data rows (without the header)

|#|Column|Description|
|:--|:---|:--|
|1|`id`|App ID|
|2|`track_name`|App name|
|3|`size_bytes`|Size (in bytes)|
|4|`currency`|Currency type|
|5|`price`|Price amount|
|6|`rating_count_tot`|User rating counts (for all versions)|
|7|`rating_count_ver`|User rating counts (for the current version)|
|8|`user_rating`|Average user rating value (for all versions)|
|9|`user_rating_ver`|Average user rating value (for the current version)|
|10|`ver`|Latest version code|
|11|`cont_rating`|Content rating|
|12|`prime_genre`|Primary genre|
|13|`sup_devices.num`|Number of supported devices|
|14|`ipadSc_urls.num`|Number of screenshots shown for display|
|15|`lang.num`|Number of supported languages|
|16|`vpp_lic`|VPP device-based licensing enabled|


#### googleplaystore.csv

 - `header_google`: column names

 - `dataset_google`: data rows (without the header)


|#|Column|Description|
|:--|:---|:--|
|1|`App`|Application name|
|2|`Category`|Category the app belongs to|
|3|`Rating`|Overall user rating of the app (at the time of scraping)|
|4|`Reviews`|Number of user reviews for the app (at the time of scraping)|
|5|`Size`|Size of the app (at the time of scraping)|
|6|`Installs`|Number of user downloads/installs for the app (at the time of scraping)|
|7|`Type`|Paid or Free|
|8|`Price`|Price of the app (at the time of scraping)|
|9|`Content Rating`|Age group the app is targeted at: Children / Mature 21+ / Adult|
|10|`Genres`|An app can belong to multiple genres (apart from its main category). For example, a musical family game will belong to the Music, Game and Family genres.|
|11|`Last Updated`|Date the app was last updated (at the time of scraping)|
|12|`Current Ver`|Current version of the app|
|13|`Android Ver`|Minimum required Android version (e.g. "4.0.3 and up")|

### Size of both datasets

In [10]:
explore_data(dataset_apple, 2, 4, True)

['529479190', 'Clash of Clans', '116476928', 'USD', '0.0', '2130805', '579', '4.5', '4.5', '9.24.12', '9+', 'Games', '38', '5', '18', '1']


['420009108', 'Temple Run', '65921024', 'USD', '0.0', '1724546', '3842', '4.5', '4.0', '1.6.2', '9+', 'Games', '40', '5', '1', '1']


Number of rows: 7197
Number of columns: 16


In [11]:
explore_data(dataset_google, 2, 4, True)

['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 and up']


Number of rows: 10841
Number of columns: 13


### Summary

|`dataset_apple`|`dataset_google`|
|:----|:----|
|Number of rows: 7197| Number of rows: 10841|
|Number of columns: 16|Number of columns: 13|

In [12]:
### Columns that can help us in our analysis:

## Data cleaning

Before the analysis, we need to:

1. **Detect inaccurate data, and correct or remove it**.
2. **Detect duplicate data, and remove the duplicates**.
3. **Remove non-English apps like 爱奇艺PPS -《欢乐颂2》电视剧热播**.
4. **Remove apps that aren't free**.


The Google Play dataset has a dedicated [discussion section](https://www.kaggle.com/datasets/lava18/google-play-store-apps/discussion?sort=votes), and we can see that [one of the discussions](https://www.kaggle.com/datasets/lava18/google-play-store-apps/discussion/66015) describes an error for a certain row.

### 1. Detect inaccurate data, and correct or remove it

Every row should contain as many fields as there are columns in the header, so a row with a different number of fields is probably missing data.


In [13]:
def field_number(dataset, header):
    # Print every row whose number of fields differs from the number of
    # header columns, together with its index in the dataset
    for index_number, row in enumerate(dataset):
        if len(row) != len(header):
            print(row)
            print("\n")
            print("error in index number: ", index_number)

In [14]:
field_number(dataset_apple, header_apple)

In [15]:
field_number(dataset_google, header_google)

['Life Made WI-Fi Touchscreen Photo Frame', '1.9', '19', '3.0M', '1,000+', 'Free', '0', 'Everyone', '', 'February 11, 2018', '1.0.19', '4.0 and up']


error in index number:  10472


In [16]:
len(dataset_google[10472]) #discussion row

12

We ran both datasets through the function: `dataset_apple` has no problematic rows, but `dataset_google` has one row (index 10472, the row reported in the Kaggle discussion) with only 12 fields instead of 13, so we are going to remove it from the dataset.

In [17]:
del dataset_google[10472]

Let's check that the row has been removed properly:

In [18]:
dataset_google[10472]

['osmino Wi-Fi: free WiFi',
 'TOOLS',
 '4.2',
 '134203',
 '4.1M',
 '10,000,000+',
 'Free',
 '0',
 'Everyone',
 'Tools',
 'August 7, 2018',
 '6.06.14',
 '4.4 and up']

Index 10472 now holds the next app in the file, so the faulty row has been removed.
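One caveat: `del dataset_google[10472]` removes whatever row sits at that index, so running the cell a second time would silently delete a valid app (here, `osmino Wi-Fi: free WiFi`). A sketch of an alternative that keeps only rows with the expected number of fields, so re-running it changes nothing; `keep_complete_rows` is a hypothetical helper name.

```python
def keep_complete_rows(dataset, header):
    """Return only the rows whose number of fields matches the header."""
    return [row for row in dataset if len(row) == len(header)]
```

Applied to our data, it would be used as `dataset_google = keep_complete_rows(dataset_google, header_google)` instead of the `del` statement.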

### 2. Detect duplicate data, and remove the duplicates.

In [19]:
def detect_duplicated(dataset, name_index=0):
    # name_index: column checked for repeated values. Column 0 is 'App' in the
    # Google Play data, but 'id' in the App Store data (names are in column 1)
    unique_apps = set()  # a set makes the membership check fast
    duplicate_apps = []

    for row in dataset:
        name = row[name_index]
        if name in unique_apps:
            duplicate_apps.append(name)
        else:
            unique_apps.add(name)

    if len(duplicate_apps) == 0:
        print("In this dataset there are no duplicate apps")
    else:
        print("There are {repeated} repeated apps".format(repeated=len(duplicate_apps)))
        print("\n")
        print('Examples of duplicate apps:', "\n" + "\n", duplicate_apps[:5])

In [20]:
detect_duplicated(dataset_apple)

In this dataset there are no duplicate apps


In [21]:
detect_duplicated(dataset_google)

There are 1181 repeated apps


Examples of duplicate apps: 

 ['Quick PDF Scanner + OCR FREE', 'Box', 'Google My Business', 'ZOOM Cloud Meetings', 'join.me - Simple Meetings']


These are just 5 examples of the repeated apps in **dataset_google**.

* * * 

### What differentiates duplicate applications?

Taking a repeated app as an example, in this case `Instagram`, we see that of all the fields in the row **only one differs**.

In [22]:
for app in dataset_google:
    name = app[0]
    if name == 'Instagram':
        print(app)

['Instagram', 'SOCIAL', '4.5', '66577313', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']
['Instagram', 'SOCIAL', '4.5', '66577446', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']
['Instagram', 'SOCIAL', '4.5', '66577313', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']
['Instagram', 'SOCIAL', '4.5', '66509917', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']


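That field is the number of reviews (`'Reviews'`, column index 3).

The number of reviews generally grows over time, so the row with the highest number of reviews is most likely the most recent one. This is the criterion we use to eliminate duplicates: for each app, we keep only the row with the highest number of reviews.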
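The same criterion can be sketched as a single pass that stores the best row per app name directly, instead of first building `reviews_max` and then scanning the dataset again as the cells below do. `dedupe_by_max_reviews` is a hypothetical helper; it should keep the same set of rows as the two-pass approach, although the order of the apps may differ slightly.

```python
def dedupe_by_max_reviews(dataset, name_index=0, reviews_index=3):
    """Keep one row per app name: the row with the highest number of reviews.

    When several rows tie for the maximum, the first one encountered is kept.
    """
    best_rows = {}  # app name -> row with the most reviews seen so far
    for row in dataset:
        name = row[name_index]
        n_reviews = float(row[reviews_index])
        if name not in best_rows or n_reviews > float(best_rows[name][reviews_index]):
            best_rows[name] = row
    # dicts keep insertion order (Python 3.7+), so apps appear in the order first seen
    return list(best_rows.values())
```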

In [23]:
reviews_max = {}  # key: app name, value: highest number of reviews found so far

for row in dataset_google:
    name = row[0]
    n_reviews = float(row[3])

    # Already seen with fewer reviews: update to the higher count
    if name in reviews_max and reviews_max[name] < n_reviews:
        reviews_max[name] = n_reviews
    # Seen for the first time: store its number of reviews
    elif name not in reviews_max:
        reviews_max[name] = n_reviews

print('length of the dictionary:', len(reviews_max))

length of the dictionary: 9659


This is the expected length of our dataset once the repeated apps are removed.

It equals the original length of the dataset minus the number of repeated apps:

In [24]:
print('expected length', len(dataset_google) - 1181)

expected length 9659


In [25]:
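android_clean = []     # deduplicated dataset
already_added = set()  # names already kept, in case several rows share the maximum

for row in dataset_google:  # original dataset
    name = row[0]
    n_reviews = float(row[3])

    # Keep the row only if it has the maximum number of reviews for this app
    # and no row for this app has been kept yet
    if n_reviews == reviews_max[name] and name not in already_added:
        android_clean.append(row)
        already_added.add(name)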

`android_clean` is our **new, deduplicated dataset**.

In [26]:
explore_data(android_clean, 1, 2, True)

['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


Number of rows: 9659
Number of columns: 13


It is not necessary to do the same with the App Store data because it has no duplicates.

### 3. Removing Non-English Apps

Our company develops its apps in English, so we would like to analyze only apps aimed at an English-speaking audience.


However, if we explore the data enough, we will find that **both datasets** have applications with names that suggest they are not aimed at an English-speaking audience.

In [27]:
print(AppleStore[813][1])
print(AppleStore[6731][1])
print('\n')
print(android_clean[4412][0])
print(android_clean[7940][0])

BATTLE BEARS -1
Beast Poker


中国語 AQリスニング
لعبة تقدر تربح DZ


**One way to do this is to remove all applications whose name contains a symbol that is not commonly used in English text**. 

English text typically includes letters of the English alphabet, digits from 0 to 9, punctuation marks (., !, ?, ;) and other symbols (+, *, /).

Each character we use in a string has a corresponding number associated with it. 

For example, the number corresponding to the character "a" is 97, for "A" it is 65, and for "爱" it is 29,233.

We can get the number corresponding to each character using the built-in function **`ord()`**.

In [28]:
print(ord('a'))
print(ord('A'))
print(ord('爱'))
print(ord('5'))
print(ord('+'))

97
65
29233
53
43


The numbers corresponding to the characters that we usually use in an English text are all in the range of 0 to 127, according to the ASCII system (American Standard Code for Information Interchange). 

Based on this range of numbers, we can construct a function that detects whether a character belongs to the common English character set or not. **If the number is equal to or less than 127, the character belongs to the set of common English characters. If an app's name contains a character greater than 127, it probably means that the app has a non-English name.** 
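The strict version of this rule (reject a name if any character is above 127) fits in one line with `all()`. The helper name `all_ascii` is hypothetical; Python 3.7+ also provides the built-in `str.isascii()` method, which performs the same check.

```python
def all_ascii(text):
    """True if every character's code point is in the ASCII range 0-127."""
    return all(ord(character) <= 127 for character in text)

# Compare the hand-written check with the built-in string method
for name in ['Instagram', 'Docs To Go™ Free Office Suite', '中国語 AQリスニング']:
    print(name, all_ascii(name), name.isascii())
```

As the notebook shows next, this strict rule mislabels English names that contain a symbol such as ™ or an emoji, which is why a tolerance threshold is introduced afterwards.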

**However, the names of our applications are stored as strings, so how could we take each individual character in a string and check its corresponding number?**

In [29]:
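def english_speak(string):
    # Check every character; a single one outside ASCII (0-127) marks the name as non-English
    for character in string:
        valor_character = ord(character)
        if valor_character > 127:
            return False  # not English
    # Only reached after all characters have been checked
    return True  # is English

In [30]:
english_speak('Instagram')

True

In [31]:
english_speak('爱奇艺PPS -《欢乐颂2》电视剧热播')

False

In [32]:
english_speak('Docs To Go™ Free Office Suite')  # English name, but labeled non-English

False

In [33]:
english_speak('Instachat 😜')  # English name, but labeled non-English

False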

In [34]:
print(ord('™'))
print(ord('😜'))

8482
128540


If we use this function as it is, we will lose useful data: it labels English app names such as **'Docs To Go™ Free Office Suite'** and **'Instachat 😜'** as non-English because they contain characters outside the ASCII range, and many other English apps would be mislabeled the same way.


To **minimize the impact of data loss**, we need a simple screening criterion: **we will only delete an app if its name contains three or more characters outside the ASCII range**.

This means that all English apps with up to two emoji or other special characters will still be labeled as English.

In [35]:
def english_speak(text):
    non_validchar = []
    for character in text:
        valor_character = ord(character)
        if valor_character > 127:
            non_validchar.append(valor_character)
    if len(non_validchar) >= 3:
        return False
    else:
        return True

In [36]:
english_speak('Docs To Go™ Free Office Suite')

True

In [37]:
english_speak('爱奇艺PPS -《欢乐颂2》电视剧热播')

False

In [38]:
english_speak('Instachat 😜 😜')

True

In [39]:
english_speak('Instachat 😜 😜 😜')

False
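The cutoff is a judgment call, so it can help to expose it as a parameter and compare how many rows each value keeps. A minimal sketch with hypothetical names (`is_probably_english`, `max_non_ascii`); the default of 2 reproduces the `>= 3` rule used in `english_speak()` above.

```python
def is_probably_english(text, max_non_ascii=2):
    """Label a name as English if it has at most `max_non_ascii` non-ASCII characters.

    With the default of 2, names with 3 or more characters above 127
    are treated as non-English, like english_speak().
    """
    non_ascii_count = sum(1 for character in text if ord(character) > 127)
    return non_ascii_count <= max_non_ascii
```

For instance, `sum(is_probably_english(row[1], 3) for row in dataset_apple)` would count how many App Store apps a looser cutoff keeps.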

At this point it helps to keep track of the dataset names to avoid mistakes.

|Dataset name|Length|
|:----|:----|
|`dataset_apple`|7197|
|`android_clean`|9659, 8757|

### Filter non-English applications from both datasets. 

If an app's name is identified as English, then we add the entire row to a separate list.

In [40]:
# AppleStore: app names are in column 1 ('track_name')

list_apple_clean = []

for row in dataset_apple:
    name = row[1]
    if english_speak(name):
        list_apple_clean.append(row)

In [41]:
## android_clean: app names are in column 0 ('App')

list_android_clean = []

for row in android_clean:
    name = row[0]
    if english_speak(name):
        list_android_clean.append(row)

### Exploring the datasets to see how many rows are left in each

Each cleaning step stored its result under a new variable name, so it is important to keep track of them.

In [42]:
explore_data(list_apple_clean, 2, 4, True)

['529479190', 'Clash of Clans', '116476928', 'USD', '0.0', '2130805', '579', '4.5', '4.5', '9.24.12', '9+', 'Games', '38', '5', '18', '1']


['420009108', 'Temple Run', '65921024', 'USD', '0.0', '1724546', '3842', '4.5', '4.0', '1.6.2', '9+', 'Games', '40', '5', '1', '1']


Number of rows: 6155
Number of columns: 16


In [43]:
explore_data(list_android_clean, 2, 4, True)

['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 and up']


['Pixel Draw - Number Art Coloring Book', 'ART_AND_DESIGN', '4.3', '967', '2.8M', '100,000+', 'Free', '0', 'Everyone', 'Art & Design;Creativity', 'June 20, 2018', '1.1', '4.4 and up']


Number of rows: 9597
Number of columns: 13


**Now we are working with this:**


|Dataset name|Length|
|:----|:----|
|`list_apple_clean`|6155|
|`list_android_clean`|9597, 8698|

### 4. Isolating the Free Apps

Our datasets contain both **free and non-free apps**, and we need to isolate the free ones for our analysis.

After filtering, we check the length of each dataset to see how many apps are left.
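The cells below compare the price string against the exact values `'0.0'` and `'0'`. A slightly more tolerant sketch converts the price to a number first. It assumes that paid Google Play prices carry a leading `$` (e.g. `'$4.99'`), which is not shown in this notebook, and it will raise `ValueError` for any other non-numeric value; `is_free` is a hypothetical helper.

```python
def is_free(price):
    """Return True if a price string represents a free app.

    Accepts App Store prices such as '0.0' and Google Play prices such as
    '0' or (assumed format) '$4.99'.
    """
    return float(price.replace('$', '')) == 0.0
```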

In [44]:
header_apple

['id',
 'track_name',
 'size_bytes',
 'currency',
 'price',
 'rating_count_tot',
 'rating_count_ver',
 'user_rating',
 'user_rating_ver',
 'ver',
 'cont_rating',
 'prime_genre',
 'sup_devices.num',
 'ipadSc_urls.num',
 'lang.num',
 'vpp_lic']

In [45]:
free_apple_clean = []

for row in list_apple_clean:
    price = row[4]  # 'price'
    if price == '0.0' or price == '0':
        free_apple_clean.append(row)

len(free_apple_clean)

3203

In [46]:
header_google

['App',
 'Category',
 'Rating',
 'Reviews',
 'Size',
 'Installs',
 'Type',
 'Price',
 'Content Rating',
 'Genres',
 'Last Updated',
 'Current Ver',
 'Android Ver']

In [47]:
free_android_clean = []

for row in list_android_clean:
    price = row[7]  # 'Price'
    if price == '0.0' or price == '0':
        free_android_clean.append(row)

len(free_android_clean)

8848

**Updating names and lengths**


|Dataset name|Length|
|:----|:----|
|`free_apple_clean`|3203|
|`free_android_clean`|8848|

Our end goal is to publish the app on both Google Play and the App Store, so we need to **find profiles of apps that are successful in both markets**.


Let's start the analysis by getting a sense of the most common genres in each market. To do this, we will build frequency tables for some columns of our datasets.

The columns that can give us the information we need are:

`free_apple_clean`, column index 11 ---> `prime_genre`

`free_android_clean`, column index 9 ---> `Genres`
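The notebook's `freq_table()` and `display_table()` below build a percentage table and then sort it. As a point of comparison, a sketch of the same idea with `collections.Counter`, whose `most_common()` method already returns the values sorted by count; `freq_table_counter` is a hypothetical name.

```python
from collections import Counter


def freq_table_counter(dataset, column):
    """Percentage of rows for each distinct value in `column`, most common first."""
    counts = Counter(row[column] for row in dataset)
    total = len(dataset)
    return [(value, count / total * 100) for value, count in counts.most_common()]
```

On our data it would be called as `freq_table_counter(free_apple_clean, 11)` or `freq_table_counter(free_android_clean, 9)`.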

In [48]:
def freq_table(dataset, column):
    table = {}
    total = 0
    
    for row in dataset:
        total +=1
        value = row[column]
        if value in table:
            table[value] +=1
        else:
            table[value] = 1
        
    table_percentages = {}
    for key in table:
        percentage = (table[key] / total ) * 100
        table_percentages[key]=percentage
            
    return table_percentages

In [51]:
def display_table(dataset, column):
    table = freq_table(dataset, column)
    table_display = []
    for key in table:
        key_val_as_tuple = (table[key], key)
        table_display.append(key_val_as_tuple)

    table_sorted = sorted(table_display, reverse = True)
    for entry in table_sorted:
        print(entry[1], ':', entry[0])

In [52]:
display_table(free_apple_clean, 11)  # App Store 'prime_genre' column

Games : 58.25788323446769
Entertainment : 7.836403371838902
Photo & Video : 4.995316890415236
Education : 3.6840462066812365
Social Networking : 3.3093974399000934
Shopping : 2.5913206369029034
Utilities : 2.466437714642523
Sports : 2.1542304089915705
Music : 2.0605682172962845
Health & Fitness : 2.0293474867311896
Productivity : 1.7483609116453322
Lifestyle : 1.5610365282547611
News : 1.3424914142990947
Travel : 1.248829222603809
Finance : 1.0927255697783327
Weather : 0.8741804558226661
Food & Drink : 0.8117389946924758
Reference : 0.5307524196066188
Business : 0.5307524196066188
Book : 0.3746487667811427
Navigation : 0.18732438339057134
Medical : 0.18732438339057134
Catalogs : 0.1248829222603809


In [53]:
display_table(free_android_clean, 9)  # Google Play 'Genres' column

Tools : 8.44258589511754
Entertainment : 6.080470162748644
Education : 5.357142857142857
Business : 4.599909584086799
Productivity : 3.899186256781193
Lifestyle : 3.8765822784810124
Finance : 3.7070524412296564
Medical : 3.5375226039783
Sports : 3.4584086799276674
Personalization : 3.322784810126582
Communication : 3.2323688969258586
Action : 3.096745027124774
Health & Fitness : 3.0854430379746836
Photography : 2.949819168173599
News & Magazines : 2.802893309222423
Social : 2.667269439421338
Travel & Local : 2.328209764918626
Shopping : 2.2490958408679926
Books & Reference : 2.1360759493670884
Simulation : 2.0456600361663653
Dating : 1.8648282097649187
Arcade : 1.842224231464738
Video Players & Editors : 1.7744122965641953
Casual : 1.763110307414105
Maps & Navigation : 1.3901446654611211
Food & Drink : 1.2432188065099457
Puzzle : 1.1301989150090417
Racing : 0.9945750452079566
Role Playing : 0.9380650994575045
Libraries & Demo : 0.9380650994575045
Auto & Vehicles : 0.9267631103074141
St

In [None]:
# Entertainment makes sense

In [None]:
def apps_more_users(dataset):
    apps_users = {}
    
    for row in dataset:
        genre = row[9]     # 'Genres' column
        installs = row[5]  # 'Installs' column, e.g. '5,000,000+'

.....

## Most Popular Apps by Genre on the App Store

The frequency tables we analyzed above showed that the App Store is dominated by apps designed for fun, while Google Play shows a more balanced landscape of both practical and fun apps.

Now, **we would like to get an idea of the kind of apps that have the most users**.

**One way to find out which genres are the most popular (have the most users) is to calculate the average number of installs for each app genre.**

For the **Google Play dataset, we can find this information in the `Installs` column**, but the App Store dataset has no such column.
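The `Installs` values are strings such as `'5,000,000+'`, so they must be converted to numbers before averaging. A minimal sketch with hypothetical helper names; the `+` means "at least", so the converted numbers (and the averages) are lower bounds rather than exact counts. Index 5 is `Installs` in `header_google`; the genre column can be 1 (`Category`) or 9 (`Genres`).

```python
def installs_to_int(installs):
    """Convert an Installs string such as '5,000,000+' to the integer 5000000."""
    return int(installs.replace(',', '').replace('+', ''))


def average_installs_by(dataset, genre_index, installs_index=5):
    """Average number of installs for each distinct value in column `genre_index`."""
    totals = {}  # genre -> [sum of installs, number of apps]
    for row in dataset:
        entry = totals.setdefault(row[genre_index], [0, 0])
        entry[0] += installs_to_int(row[installs_index])
        entry[1] += 1
    return {genre: total / count for genre, (total, count) in totals.items()}
```

For example, `average_installs_by(free_android_clean, 1)` would give the average installs per `Category`.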


**For the App Store dataset, we will take the total number of user ratings as a proxy, which we can find in the `rating_count_tot` column.**

Let's start by calculating the average number of user ratings per app genre on the App Store.


To do this, we will need to:

- Isolate the apps of each genre.
- Add up the number of user ratings of the apps in that genre.
- Divide the sum by the number of apps that belong to that genre (not by the total number of apps).

To calculate the average number of user ratings for each genre, we will use a for loop inside another for loop. This is an example of a for loop used inside another for loop:

In [None]:
some_strings = ['FIRST','SECOND']
some_integers = [1,2,3,4,5]

for string in some_strings:
    print(string)
    
    for integer in some_integers:
        print(integer)

In [None]:
apps_gen = {}  # how many App Store apps have each average user rating ('user_rating', column 7)

for row in free_apple_clean:
    user_rating = row[7]
    
    if user_rating in apps_gen:
        apps_gen[user_rating] +=1
    else:
        apps_gen[user_rating] =1

In [None]:
apps_gen

In the nested-loop example above, we can see that:

- We first iterate over the `some_strings` list, and for each iteration:
  - We print the string (the iteration variable).
  - We start another iteration over the `some_integers` list.
  - For each iteration over this list, we print the integer (the iteration variable).

For each of the two iterations over the `some_strings` list (there are two iterations because `some_strings` only has two elements), another inner iteration runs over the `some_integers` list.

The second iteration over `some_strings` starts only when the iteration over `some_integers` has completed. Note that all the elements of `some_integers` are printed for each of the two iterations over `some_strings`.

A loop inside another loop is called a nested loop. We will use a nested loop to calculate the averages mentioned earlier.



- Start by generating a frequency table for the `prime_genre` column to get the unique app genres (we will need to loop over the unique genres next). You can use the `freq_table()` function written earlier.

- Loop over the unique genres of the App Store dataset. For each iteration (below, we assume the iteration variable is called `genre`):

    - Initialize a variable named `total` with a value of 0. This variable will store the sum of user ratings (the number of ratings, not the actual ratings) specific to each genre.

    - Initialize a variable named `len_genre` with a value of 0. This variable will store the number of apps specific to each genre.

    - Loop over the App Store dataset, and for each iteration:

        - Save the app genre in a variable named `genre_app`.

        - If `genre_app` is the same as `genre` (the iteration variable of the main loop), then:

            - Save the number of user ratings of the app as a float.

            - Add the number of user ratings to the `total` variable.

            - Increment the `len_genre` variable by 1.

    - Compute the average number of user ratings by dividing `total` by `len_genre`. This should be done outside the nested loop.

    - Print the app genre and the average number of user ratings. This should also be done outside the nested loop.

- Analyze the results and try to come up with at least one app profile recommendation for the App Store. Keep in mind that there is no fixed answer here, and it is fine if the app profile you recommend differs from the one recommended in the solution notebook.
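A sketch of the nested loop described in the steps above, wrapped in a hypothetical `average_ratings_by_genre` function so it can be tested on toy rows. Returning a dictionary instead of printing is a choice made here; the defaults assume the App Store layout (column 11 is `prime_genre`, column 5 is `rating_count_tot`), and the unique genres come from a set rather than from `freq_table()`.

```python
def average_ratings_by_genre(dataset, genre_index=11, ratings_index=5):
    """Average number of user ratings per genre, computed with a nested loop."""
    # Unique genres (the keys of freq_table() would work the same way)
    genres = {row[genre_index] for row in dataset}
    averages = {}
    for genre in genres:              # outer loop: one pass per genre
        total = 0
        len_genre = 0
        for row in dataset:           # inner loop: scan every app
            genre_app = row[genre_index]
            if genre_app == genre:
                total += float(row[ratings_index])
                len_genre += 1
        averages[genre] = total / len_genre  # computed outside the inner loop
    return averages
```

On the real data, `for genre, avg in average_ratings_by_genre(free_apple_clean).items(): print(genre, ':', avg)` would print one line per genre, as the steps ask.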

In [None]:
display_table(free_apple_clean, 11)  # 11 = 'prime_genre' column