# Data analysis to help developers understand what type of apps are likely to attract more users.

<br>

**Introduction to the Project**



In today's mobile development landscape, it is crucial to understand which types of applications attract more users in both Android and Apple ecosystems. With the goal of providing valuable information to developers, this project focuses on data analysis to identify common characteristics among the applications that receive a higher number of downloads and engagement.

1. **What Does the Project Entail?**

    This analysis combines information from sources such as the Google Play Store and the iTunes App Store, along with usage and performance metrics. Using statistical techniques and machine learning algorithms, it identifies patterns and trends that indicate which types of applications perform better on each platform.

2. **Project Objective**

    The primary objective of this project is to provide developers with precise information about the characteristics that resonate best with users on Android and iOS. By better understanding which types of applications perform well in terms of downloads, engagement, and retention, developers can make more informed decisions about the design and optimization of their own apps.

The first step is to collect and analyze data about the mobile apps available on Google Play and the App Store.

To avoid spending resources on collecting new data, we should first check whether relevant existing data is available at no cost. Fortunately, two datasets seem suitable for our purposes:

<br>

One dataset contains data about approximately 10,000 **Android apps from Google Play**; the data was collected in August 2018.

One dataset contains data about approximately 7,000 **iOS apps from the App Store**; the data was collected in July 2017.

We will open and explore these datasets with the `explore_data()` function.

Parameters:

- `dataset`: A list of lists representing the dataset.

- `start`: An integer indicating the starting index of the slice to display.

- `end`: An integer indicating the ending index (exclusive) of the slice to display.

- `rows_and_columns` (optional): A Boolean that defaults to `False`. If `True`, the number of rows and columns is printed.

* * *

The datasets are stored in two CSV (**Comma-Separated Values**) files. To read their content, we need to import the `reader` function from Python's `csv` module.

In [1]:
! ls -ls

total 440
268 -rw-rw-r-- 1 ion ion 272175 ene 23 18:02  00_For_Loops_Conditionals.ipynb
 52 -rw-rw-r-- 1 ion ion  49612 ene 24 19:55 'Analyzing Mobile App Data_copy.ipynb'
104 -rw-rw-r-- 1 ion ion 104441 ene 27 18:27 'Analyzing Mobile App Data.ipynb'
  4 drwxrwxr-x 2 ion ion   4096 ene 23 18:07  datasets
  4 -rw-rw-r-- 1 ion ion    159 ene 23 18:00  enlace_DQ
  4 -rw-rw-r-- 1 ion ion    538 ene 23 18:01  README.md
  4 -rw-rw-r-- 1 ion ion    618 ene 23 16:02  Untitled.ipynb


In [2]:
! ls datasets

 AppleStore.csv  'data dictionary.txt'	 googleplaystore.csv


In [3]:
from csv import reader

## 1. Exploring datasets:

We are going to explore the two datasets, but first we need to know which [character encoding](https://en.wikipedia.org/wiki/Character_encoding) each file uses.

The [file](https://www.man7.org/linux/man-pages/man1/file.1.html) command reports the type of a file regardless of its extension, which helps us pick the right encoding and avoid a `UnicodeDecodeError`.


```
FILE(1)                                         BSD General Commands Manual                                         FILE(1)

NAME
     file — determine file type

SYNOPSIS
     file [-bcdEhiklLNnprsSvzZ0] [--apple] [--exclude-quiet] [--extension] [--mime-encoding] [--mime-type] [-e testname]
          [-F separator] [-f namefile] [-m magicfiles] [-P name=value] file ...
     file -C [-m magicfiles]
     file [--help]

DESCRIPTION
     This manual page documents version 5.41 of the file command.

     file tests each argument in an attempt to classify it.  There are three sets of tests, performed in this order:
     filesystem tests, magic tests, and language tests.  The first test that succeeds causes the file type to be printed.

     The type printed will usually contain one of the words text (the file contains only printing characters and a few com‐
     mon control characters and is probably safe to read on an ASCII terminal), executable (the file contains the result of
     compiling a program in a form understandable to some UNIX kernel or another), or data meaning anything else (data is
     usually “binary” or non-printable).  Exceptions are well-known file formats (core files, tar archives) that are known
     to contain binary data.  When modifying magic files or the program itself, make sure to preserve these keywords.
     Users depend on knowing that all the readable files in a directory have the word “text” printed.  Don't do as Berkeley
     did and change “shell commands text” to “shell script”.

```

In [4]:
! file -i datasets/AppleStore.csv

datasets/AppleStore.csv: text/csv; charset=utf-8


In [5]:
! file -i datasets/googleplaystore.csv

datasets/googleplaystore.csv: text/csv; charset=utf-8


Both files use the same charset, [UTF-8](https://en.wikipedia.org/wiki/UTF-8).


Quick idea:

- **charset:** the set of characters you can use.
- **encoding:** the way these characters are stored in memory.

[source](https://stackoverflow.com/questions/2281646/whats-the-difference-between-encoding-and-charset)
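
As a small illustration of the distinction, and of where a `UnicodeDecodeError` comes from, the sketch below encodes the same text with two codecs and then decodes the bytes with the wrong one. The sample string is made up for the example and is not taken from the datasets.

```python
text = "Café"

utf8_bytes = text.encode("utf-8")      # 'é' takes two bytes in UTF-8
latin1_bytes = text.encode("latin-1")  # 'é' takes one byte in Latin-1

print(len(utf8_bytes), len(latin1_bytes))  # 5 4

# Decoding bytes with a codec they were not written in can fail.
try:
    latin1_bytes.decode("utf-8")
    decoded_ok = True
except UnicodeDecodeError:
    decoded_ok = False

print(decoded_ok)  # False
```

Passing the encoding reported by `file -i` to `open()` avoids this kind of mismatch.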

In [6]:
opened_ios = open('datasets/AppleStore.csv', encoding = 'utf8')
opened_android = open('datasets/googleplaystore.csv', encoding = 'utf8')

readed_ios = reader(opened_ios)
readed_android = reader(opened_android)

# The datasets we are going to work with
ios = list(readed_ios) 
android = list(readed_android)
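
The cell above leaves both files open. A minimal sketch of an alternative, assuming a hypothetical helper named `read_csv_rows` (not part of the original notebook), reads the rows and splits off the header in one step. An in-memory sample stands in for the real file so that the example runs on its own.

```python
import csv
import io

def read_csv_rows(file_obj):
    """Return (header, rows) from an open text file containing CSV data."""
    rows = list(csv.reader(file_obj))
    return rows[0], rows[1:]

# In-memory sample standing in for a real file (illustrative data only).
sample = io.StringIO('App,Category\n"Notes, Lists & Tasks",PRODUCTIVITY\n')
header, data = read_csv_rows(sample)
print(header)  # ['App', 'Category']
print(data)    # [['Notes, Lists & Tasks', 'PRODUCTIVITY']]

# With a real file, a `with` block closes the file automatically:
# with open('datasets/AppleStore.csv', encoding='utf8', newline='') as f:
#     header_ios, data_ios = read_csv_rows(f)
```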

Separating the row corresponding to the headers from the rest of the data in both datasets

In [7]:
header_ios = [x for x in ios[0]]
header_ios #Columns on AppleStore

['id',
 'track_name',
 'size_bytes',
 'currency',
 'price',
 'rating_count_tot',
 'rating_count_ver',
 'user_rating',
 'user_rating_ver',
 'ver',
 'cont_rating',
 'prime_genre',
 'sup_devices.num',
 'ipadSc_urls.num',
 'lang.num',
 'vpp_lic']

In [8]:
header_android = [x for x in android[0]]
header_android  # Columns in googleplaystore.csv

['App',
 'Category',
 'Rating',
 'Reviews',
 'Size',
 'Installs',
 'Type',
 'Price',
 'Content Rating',
 'Genres',
 'Last Updated',
 'Current Ver',
 'Android Ver']

Creating the Datasets `data_ios` and `data_android`

In [9]:
data_ios = ios[1:]
data_android = android[1:]

In [10]:
def explore_data(dataset, start, end, rows_and_columns=False):
    # offset shifts the slice by one row, so dataset[0] is never printed as a row
    offset = 1
    cnt = 1  # running label for the printed rows
    columns = len(dataset[0])

    if rows_and_columns:
        print("\n")

        for row in dataset[offset+start:end+offset]:
            print("row:", cnt, row)
            print("\n")
            cnt += 1

        # dataset[0] is printed next to the column count
        print("columns ", columns, dataset[0])
        print("\n")
        print('Number of rows:', len(dataset))

    else:
        for row in dataset[offset+start:end+offset]:
            print(row)
            print("\n")
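
Because `data_ios` and `data_android` no longer contain the header row, a preview can also slice the data directly from `start`. The sketch below is one possible variant with hypothetical helper names (`preview_rows`, `describe`); it returns values instead of printing them, and the sample rows are placeholders.

```python
def preview_rows(dataset, start, end):
    """Return dataset[start:end]; `end` is exclusive, as in Python slicing."""
    return dataset[start:end]

def describe(dataset):
    """Return (number_of_rows, number_of_columns) for a list of lists."""
    n_columns = len(dataset[0]) if dataset else 0
    return len(dataset), n_columns

sample = [['a', '1'], ['b', '2'], ['c', '3']]  # placeholder rows, not real data
for row in preview_rows(sample, 0, 2):
    print(row)
print(describe(sample))  # (3, 2)
```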

In [11]:
explore_data(data_ios, 0, 10, True)



row: 1 ['389801252', 'Instagram', '113954816', 'USD', '0.0', '2161558', '1289', '4.5', '4.0', '10.23', '12+', 'Photo & Video', '37', '0', '29', '1']


row: 2 ['529479190', 'Clash of Clans', '116476928', 'USD', '0.0', '2130805', '579', '4.5', '4.5', '9.24.12', '9+', 'Games', '38', '5', '18', '1']


row: 3 ['420009108', 'Temple Run', '65921024', 'USD', '0.0', '1724546', '3842', '4.5', '4.0', '1.6.2', '9+', 'Games', '40', '5', '1', '1']


row: 4 ['284035177', 'Pandora - Music & Radio', '130242560', 'USD', '0.0', '1126879', '3594', '4.0', '4.5', '8.4.1', '12+', 'Music', '37', '4', '1', '1']


row: 5 ['429047995', 'Pinterest', '74778624', 'USD', '0.0', '1061624', '1814', '4.5', '4.0', '6.26', '12+', 'Social Networking', '37', '5', '27', '1']


row: 6 ['282935706', 'Bible', '92774400', 'USD', '0.0', '985920', '5320', '4.5', '5.0', '7.5.1', '4+', 'Reference', '37', '5', '45', '1']


row: 7 ['553834731', 'Candy Crush Saga', '222846976', 'USD', '0.0', '961794', '2453', '4.5', '4.5', '1.101.0'

In [12]:
explore_data(data_android, 0, 10, True)



row: 1 ['Coloring book moana', 'ART_AND_DESIGN', '3.9', '967', '14M', '500,000+', 'Free', '0', 'Everyone', 'Art & Design;Pretend Play', 'January 15, 2018', '2.0.0', '4.0.3 and up']


row: 2 ['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


row: 3 ['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 and up']


row: 4 ['Pixel Draw - Number Art Coloring Book', 'ART_AND_DESIGN', '4.3', '967', '2.8M', '100,000+', 'Free', '0', 'Everyone', 'Art & Design;Creativity', 'June 20, 2018', '1.1', '4.4 and up']


row: 5 ['Paper flowers instructions', 'ART_AND_DESIGN', '4.4', '167', '5.6M', '50,000+', 'Free', '0', 'Everyone', 'Art & Design', 'March 26, 2017', '1.0', '2.3 and up']


row: 6 ['Smoke Effect Photo Maker - Smoke Editor', 'ART_AND_DESIGN

Everything appears to be in order.

### Dataset features

|`data_ios`|`data_android`|
|:----|:----|
|Number of rows: 7197|Number of rows: 10841|
|Number of columns: 16|Number of columns: 13|

### Data dictionary (`data_ios`)

|#|Name|Description|
|:--|:---|:--|
|1|`id`|App ID|
|2|`track_name`|App name|
|3|`size_bytes`|Size (in bytes)|
|4|`currency`|Currency type|
|5|`price`|Price amount|
|6|`rating_count_tot`|User rating count (for all versions)|
|7|`rating_count_ver`|User rating count (for the current version)|
|8|`user_rating`|Average user rating (for all versions)|
|9|`user_rating_ver`|Average user rating (for the current version)|
|10|`ver`|Latest version code|
|11|`cont_rating`|Content rating|
|12|`prime_genre`|Primary genre|
|13|`sup_devices.num`|Number of supported devices|
|14|`ipadSc_urls.num`|Number of screenshots shown for display|
|15|`lang.num`|Number of supported languages|
|16|`vpp_lic`|VPP device-based licensing enabled|

### Data dictionary (`data_android`)

|Name|Description|
|:--|:--|
|**App**| Application name|
|**Category**| Category the app belongs to|
|**Rating**| Overall user rating of the app (as when scraped)|
|**Reviews**| Number of user reviews for the app (as when scraped)|
|**Size**| Size of the app (as when scraped)|
|**Installs**| Number of user downloads/installs for the app (as when scraped)|
|**Type**| Paid or Free|
|**Price**| Price of the app (as when scraped)|
|**Content Rating**| Age group the app is targeted at - Children / Mature 21+ / Adult|
|**Genres**| An app can belong to multiple genres (apart from its main category). For example, a musical family game will belong to the Music, Game, and Family genres.|
|**Last Updated**| Date when the app was last updated on Play Store (as when scraped)|
|**Current Ver**| Current version of the app available on Play Store (as when scraped)|
|**Android Ver**| Min required Android version (as when scraped)|

#### Googleplaystore.csv


|#|Name|Description|
|:--|:---|:--|
|1|App|Application name|
|2|Category|Category the app belongs to|
|3|Rating|Overall user rating of the app (as when scraped)|
|4|Reviews|Number of user reviews for the app (as when scraped)|
|5|Size|Size of the app (as when scraped)|
|6|Installs|Number of user downloads/installs for the app (as when scraped)|
|7|Type|Paid or Free|
|8|Price|Price of the app (as when scraped)|
|9|Content Rating|Age group the app is targeted at - Children / Mature 21+ / Adult|
|10|Genres|An app can belong to multiple genres (apart from its main category). For example, a musical family game will belong to the Music, Game, and Family genres.|

The columns listed above are the ones that might help us understand what kind of apps are likely to attract the most users.

## 2. Data cleaning

Now that we have explored the datasets, recall that we build free apps in English that are simple to download and install. We therefore need to remove the apps that are not free and those that are not in English. Before the analysis we perform data cleaning, which involves:

1. **Detecting inaccurate data, and correcting or removing it**.

2. **Detecting duplicate data, and removing the duplicates**.

3. **Removing non-English apps like 爱奇艺PPS -《欢乐颂2》电视剧热播**.

4. **Removing apps that aren't free**.

<br>

**Detect inaccurate data... remove it.**

The Google Play dataset has a dedicated [discussion section](https://www.kaggle.com/datasets/lava18/google-play-store-apps/discussion?sort=votes), and we can see that [one of the discussions](https://www.kaggle.com/datasets/lava18/google-play-store-apps/discussion/66015) describes an error for a certain row.

If a row has a different number of columns than the others, some data is missing or misplaced in that row.

<br>

The function `field_number` takes a single parameter, `dataset`, and uses the length of the first row as the expected number of fields.

It compares the number of fields in each row with that expected number; if they do not match, **some data is missing** in that row.

In [13]:
def field_number(dataset):
    # The first row sets the expected number of fields
    expected = len(dataset[0])
    for index_number, row in enumerate(dataset):
        if len(row) != expected:
            print(row)
            print("\n")
            print("error in index number: ", index_number)
            print("number of columns has an error: ", len(row))
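
A possible variant, sketched here under the assumption that returning the problems is more convenient than printing them, collects the index and length of every row that does not match the expected number of columns. The name `find_malformed_rows` and the sample rows are hypothetical.

```python
def find_malformed_rows(dataset, expected_columns):
    """Return (index, column_count) for each row whose length differs."""
    return [
        (index, len(row))
        for index, row in enumerate(dataset)
        if len(row) != expected_columns
    ]

sample = [
    ['App A', 'TOOLS', '4.2'],
    ['App B', '4.0'],            # one field missing
    ['App C', 'GAMES', '3.9'],
]
print(find_malformed_rows(sample, expected_columns=3))  # [(1, 2)]
```

If several rows were reported, deleting them from the highest index to the lowest would keep the remaining indices valid.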

In [14]:
field_number(data_ios)

The `data_ios` dataset has no problems with its columns.

In [15]:
data_android[10472]

['Life Made WI-Fi Touchscreen Photo Frame',
 '1.9',
 '19',
 '3.0M',
 '1,000+',
 'Free',
 '0',
 'Everyone',
 '',
 'February 11, 2018',
 '1.0.19',
 '4.0 and up']

In [16]:
field_number(data_android)

['Life Made WI-Fi Touchscreen Photo Frame', '1.9', '19', '3.0M', '1,000+', 'Free', '0', 'Everyone', '', 'February 11, 2018', '1.0.19', '4.0 and up']


error in index number:  10472
number of columns has an error:  12


In [17]:
len(data_android[10472])

12

We can see that row `10472` has fewer columns than the header (12 instead of 13), so let's delete this row from our dataset.

In [18]:
del data_android[10472]

Checking that the row has been removed properly.

In [19]:
data_android[10472]

['osmino Wi-Fi: free WiFi',
 'TOOLS',
 '4.2',
 '134203',
 '4.1M',
 '10,000,000+',
 'Free',
 '0',
 'Everyone',
 'Tools',
 'August 7, 2018',
 '6.06.14',
 '4.4 and up']

#### Yes, it has been removed

### Detect duplicate data, and remove the duplicates

We have already checked that every row has a consistent number of columns in both datasets, so the next step is to check whether any application name is repeated. Let's check both datasets.

The `for` loop runs through each row and extracts the name of the application:

- It checks whether the name has already been seen.

- If it has not, the name is added to `unique_apps`; otherwise it is added to `duplicate_apps`.

In [20]:
def detect_duplicated(dataset):
    duplicate_apps = []
    unique_apps = []

    for app in dataset:
        name = app[0]  # first column: `App` in the Android data, `id` in the iOS data
        if name in unique_apps:
            duplicate_apps.append(name)
        else:
            unique_apps.append(name)
    if len(duplicate_apps) == 0:
        print("In this dataset there are no duplicate apps")
    else:
        times = len(duplicate_apps)
        print(f"There are {times} repeated apps")
        print("\n")
        print('Examples of duplicate apps:', "\n"+"\n", duplicate_apps[:25])
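
Checking `name in unique_apps` scans a list on every iteration. As a sketch of an alternative, `collections.Counter` from the standard library can count every name in a single pass. The helper name `count_duplicates` and the sample rows are hypothetical.

```python
from collections import Counter

def count_duplicates(dataset, name_index=0):
    """Map each repeated name to the number of times it appears."""
    counts = Counter(row[name_index] for row in dataset)
    return {name: n for name, n in counts.items() if n > 1}

sample = [['Box'], ['Slack'], ['Box'], ['Box'], ['Zoom']]  # placeholder rows
repeated = count_duplicates(sample)
extra_entries = sum(n - 1 for n in repeated.values())

print(repeated)       # {'Box': 3}
print(extra_entries)  # 2
```

Here `extra_entries` counts every occurrence after the first one, which is the same quantity that `len(duplicate_apps)` reports.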

In [21]:
detect_duplicated(data_ios)

In this dataset there are no duplicate apps


In [22]:
detect_duplicated(data_android)

There are 1181 repeated apps


Examples of duplicate apps: 

 ['Quick PDF Scanner + OCR FREE', 'Box', 'Google My Business', 'ZOOM Cloud Meetings', 'join.me - Simple Meetings', 'Box', 'Zenefits', 'Google Ads', 'Google My Business', 'Slack', 'FreshBooks Classic', 'Insightly CRM', 'QuickBooks Accounting: Invoicing & Expenses', 'HipChat - Chat Built for Teams', 'Xero Accounting Software', 'MailChimp - Email, Marketing Automation', 'Crew - Free Messaging and Scheduling', 'Asana: organize team projects', 'Google Analytics', 'AdWords Express', 'Accounting App - Zoho Books', 'Invoice & Time Tracking - Zoho', 'join.me - Simple Meetings', 'Invoice 2go — Professional Invoices and Estimates', 'SignEasy | Sign and Fill PDF and other Documents']


The list above shows a sample of the `1181` repeated entries in the dataset. They are not all simple pairs: some apps appear three or even four times.

<br>

Printing the rows of some of the apps flagged as repeated in the previous cell gives us an idea of what the duplicates look like.

### 2.3 What differentiates duplicate applications?

If we take one of these repeated applications as an example, we see that of all the fields in the row, **at most one differs**: the number of reviews.


In [23]:
for app in data_android:
    if app[0] == 'Quick PDF Scanner + OCR FREE':
        print(app)

['Quick PDF Scanner + OCR FREE', 'BUSINESS', '4.2', '80805', 'Varies with device', '5,000,000+', 'Free', '0', 'Everyone', 'Business', 'February 26, 2018', 'Varies with device', '4.0.3 and up']
['Quick PDF Scanner + OCR FREE', 'BUSINESS', '4.2', '80805', 'Varies with device', '5,000,000+', 'Free', '0', 'Everyone', 'Business', 'February 26, 2018', 'Varies with device', '4.0.3 and up']
['Quick PDF Scanner + OCR FREE', 'BUSINESS', '4.2', '80804', 'Varies with device', '5,000,000+', 'Free', '0', 'Everyone', 'Business', 'February 26, 2018', 'Varies with device', '4.0.3 and up']


In [24]:
for app in data_android:
    if app[0] == 'MailChimp - Email, Marketing Automation':
        print(app)

['MailChimp - Email, Marketing Automation', 'BUSINESS', '4.1', '5448', '12M', '500,000+', 'Free', '0', 'Everyone', 'Business', 'July 25, 2018', '4.9.1', '5.0 and up']
['MailChimp - Email, Marketing Automation', 'BUSINESS', '4.1', '5448', '12M', '500,000+', 'Free', '0', 'Everyone', 'Business', 'July 25, 2018', '4.9.1', '5.0 and up']


Having verified that many applications appear more than once, the next step is to decide which entry to keep and which ones to remove.

In this case, the criterion is:

- **The higher the number of reviews, the more recent the data should be!**

<br>

Instead of removing duplicates at random, we keep only the entry with the highest number of reviews for each repeated application and remove the others.

This function takes a dataset (usually a list of lists) and creates a dictionary where the key is the name of the app and the value is the maximum number of reviews associated with that app.

In [25]:
def uniq_apps_top_reviews(dataset):  # dictionary mapping each app name to its highest review count
    reviews_max = {}  # key = name (index 0), value = reviews (index 3)

    # The dataset must not include the header row: converting the
    # 'Reviews' label to float would raise an error.
    for app in dataset:
        name = app[0]
        n_reviews = float(app[3])

        if name in reviews_max and reviews_max[name] < n_reviews:
            reviews_max[name] = n_reviews

        elif name not in reviews_max:
            reviews_max[name] = n_reviews

    print('length of the dictionary:', len(reviews_max))
    return reviews_max

We create a dictionary named `dicty` that maps each app name to its highest number of reviews.

In [26]:
dicty = uniq_apps_top_reviews(data_android)

length of the dictionary: 9659


This function takes a dataset and a dictionary as inputs. It filters the dataset to include only those entries where the number of reviews matches the maximum number of reviews for that application, based on the dictionary. 

In [27]:
def rev_apps(dataset, reviews_max):
    android_clean = []
    already_added = []
    for app in dataset:
        name = app[0]
        n_reviews = float(app[3])

        # keep only the first row whose review count equals the app's maximum
        if reviews_max[name] == n_reviews and name not in already_added:
            android_clean.append(app)
            already_added.append(name)

    return android_clean
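
The two functions above can also be combined into a single pass. The sketch below is one way to do it, not the notebook's method: a hypothetical `keep_most_reviewed` helper stores, for each name, the row with the highest review count seen so far. The sample rows and their review counts are invented for illustration.

```python
def keep_most_reviewed(dataset, name_index=0, reviews_index=3):
    """Keep one row per app name: the one with the most reviews."""
    best = {}
    for row in dataset:
        name = row[name_index]
        reviews = float(row[reviews_index])
        # On a tie, the row that was seen first is kept.
        if name not in best or reviews > float(best[name][reviews_index]):
            best[name] = row
    return list(best.values())

sample = [  # invented rows: name, category, rating, reviews
    ['Box', 'BUSINESS', '4.2', '100'],
    ['Slack', 'BUSINESS', '4.4', '50'],
    ['Box', 'BUSINESS', '4.2', '120'],
]
cleaned = keep_most_reviewed(sample)
print(cleaned)
```

Because a dictionary lookup replaces the `name not in already_added` list scan, this version does less work per row on a dataset of this size.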

In [28]:
android_clean = rev_apps(data_android,dicty) # cleaning all dataset

Once the dataset is cleaned, we check that no repeated applications remain, just to be sure.

In [29]:
detect_duplicated(android_clean)

In this dataset there are no duplicate apps


In [30]:
print(len(android_clean))

9659


In [31]:
print(len(data_ios))

7197


We update the status of the datasets:

|`data_ios`|`android_clean`|
|:----|:----|
|Number of rows: 7197| Number of rows: 9659|
|Number of columns: 16|Number of columns: 13|

We now have a new Android dataset with **9659** unique apps in different languages, so the next step is to filter out the apps that are not in English.

### Removing Non-English Apps

1. **General Objective**: Our company develops applications in English and we are interested in analyzing only those applications aimed at an English-speaking audience.

2. **Identified Problem**: Upon examining the data, we have found that some application names suggest they are not intended for an English-speaking audience. This is often due to the presence of symbols or characters that are not commonly used in English text.

3. **Proposed Strategy**: To address this problem, we will remove applications whose names contain characters that are not commonly used in English text, such as letters from other alphabets. Punctuation marks (., !, ?, ;) and symbols such as (+, *, /) are common in English app names, so they must not cause an app to be removed.

4. **Detailed Method**:
   - Each character (letter, number, dot, comma, etc.) has a corresponding number that can be obtained using the built-in function `ord()`.
   - For example, the character "a" corresponds to 97, "A" corresponds to 65, and a character from another writing system has a much larger value (for instance, "爱" is 29,233).

   - We will use the `ord()` function to get the numerical value of each character in a string.
   - The characters commonly used in English text (letters, digits, spaces, and punctuation) fall within the ASCII range, 0 to 127. A character with a value above 127 counts as a non-English character.
   - Some English app names contain a few such characters (for example, "™" or emoji), so we remove an application only when its name contains three or more of them.

In summary, our strategy is:
- Use `ord()` to get the numerical value of each character in the names of the applications.
- Count the characters whose value is greater than 127.
- Remove any application whose name contains three or more such characters.

This strategy keeps the applications whose names are predominantly in English.
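
To make the numbers concrete, the sketch below prints the `ord()` value of a few characters that appear in this notebook and counts the characters above 127 in two of the app names tested later. The helper name `non_ascii_count` is hypothetical; `str.isascii()` is available in Python 3.7 and later.

```python
def non_ascii_count(text):
    """Count the characters whose code point is above 127 (outside ASCII)."""
    return sum(1 for ch in text if ord(ch) > 127)

for ch in ['a', 'A', '™', '爱', '😜']:
    print(repr(ch), ord(ch), ch.isascii())

print(non_ascii_count('Docs To Go™ Free Office Suite'))  # 1
print(non_ascii_count('Instachat 😜 😜'))                 # 2
```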

### Code Explanation

The function `english_speak(text)` checks if a given text is primarily composed of English characters. It does this by:
1. **Checking Each Character**: For each character in the input text.
2. **Determining Ordinal Value**: Using the `ord()` function to get the ordinal value of each character.
3. **Marking Non-English Characters**: If the ordinal value is greater than 127, it marks that character as non-English.
4. **Returning Result**:
   - Returns `False` if there are more than two non-English characters.
   - Returns `True` otherwise.

### Examples

#### Example 1: "hello"
- Each character is checked: `'h'`, `'e'`, `'l'`, `'l'`, `'o'`.
- All characters have ordinal values of 127 or less, so it returns `True`.

#### Example 2: "¡hola"
- Each character is checked: `'¡'`, `'h'`, `'o'`, `'l'`, `'a'`.
- `'¡'` has an ordinal value of 161 (greater than 127), so it is marked as non-English.
- Only one character is marked, so it returns `True`.

#### Example 3: "我爱你"
- Each character is checked: `'我'`, `'爱'`, `'你'`.
- All three characters have ordinal values greater than 127, so they are marked as non-English.
- More than two characters are marked, so it returns `False`.

### Summary

- The function checks if there are more than two non-English characters in the text.
- If there are more than two, it returns `False`.
- Otherwise, it returns `True`.

This way, the function ensures that only texts with primarily English characters are considered.

In [32]:
def english_speak(text):
    non_validchar = [] #aux list
    for character in text:
        valor_character = ord(character)
        if valor_character > 127:
            non_validchar.append(valor_character)
    if len(non_validchar) >= 3:
        return False
    else:
        return True

Recall which index holds the app name in each dataset: index 0 (`App`) for Android and index 1 (`track_name`) for iOS.

In [33]:
english_speak('Docs To Go™ Free Office Suite')

True

In [34]:
english_speak('Instachat 😜 😜')

True

In [35]:
english_speak('爱奇艺PPS -《欢乐颂2》电视剧热播')

False

In [36]:
english_speak('Instachat 😜 😜 😜')

False

### Filter non-English applications from both datasets

In [37]:
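def cleaning_dataset(dataset):
    list_clean = []
    # The loop starts at index 1, so the first row of `dataset` is not evaluated
    for row in dataset[1:]:
        # Index 1 is `track_name` in the iOS data and `Category` in the Android data
        name = row[1]
        if english_speak(name):
            list_clean.append(row)
    return list_clean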
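
The name column sits at a different index in each dataset (index 0 for Android, index 1 for iOS, according to the headers shown earlier). A sketch of a filter that takes that index as a parameter could look like this. `is_mostly_ascii` and `keep_english_named` are hypothetical names, the ids and categories in the sample rows are placeholders, and the threshold mirrors `english_speak` (three or more characters above 127 reject the name).

```python
def is_mostly_ascii(text, max_non_ascii=2):
    """True when at most `max_non_ascii` characters fall outside ASCII."""
    return sum(1 for ch in text if ord(ch) > 127) <= max_non_ascii

def keep_english_named(dataset, name_index):
    """Keep the rows whose name column passes the ASCII check."""
    return [row for row in dataset if is_mostly_ascii(row[name_index])]

# Placeholder rows: only the names come from this notebook.
android_sample = [
    ['Instachat 😜 😜', 'PLACEHOLDER'],
    ['爱奇艺PPS -《欢乐颂2》电视剧热播', 'PLACEHOLDER'],
]
ios_sample = [
    ['1', 'Docs To Go™ Free Office Suite'],
    ['2', '我爱你'],
]

android_kept = keep_english_named(android_sample, name_index=0)
ios_kept = keep_english_named(ios_sample, name_index=1)
print(len(android_kept), len(ios_kept))  # 1 1
```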

In [38]:
# Android

clean_android = cleaning_dataset(android_clean)

In [39]:
# AppleStore

clean_ios = cleaning_dataset(data_ios)

### 2.6 Exploring the datasets to see how many rows are left in each

Each cleaning step has stored its result under a different variable name, so it is important to keep track of them.

In [40]:
explore_data(clean_ios, 0, 5, True)



row: 1 ['529479190', 'Clash of Clans', '116476928', 'USD', '0.0', '2130805', '579', '4.5', '4.5', '9.24.12', '9+', 'Games', '38', '5', '18', '1']


row: 2 ['420009108', 'Temple Run', '65921024', 'USD', '0.0', '1724546', '3842', '4.5', '4.0', '1.6.2', '9+', 'Games', '40', '5', '1', '1']


row: 3 ['284035177', 'Pandora - Music & Radio', '130242560', 'USD', '0.0', '1126879', '3594', '4.0', '4.5', '8.4.1', '12+', 'Music', '37', '4', '1', '1']


row: 4 ['429047995', 'Pinterest', '74778624', 'USD', '0.0', '1061624', '1814', '4.5', '4.0', '6.26', '12+', 'Social Networking', '37', '5', '27', '1']


row: 5 ['282935706', 'Bible', '92774400', 'USD', '0.0', '985920', '5320', '4.5', '5.0', '7.5.1', '4+', 'Reference', '37', '5', '45', '1']


columns  16 ['389801252', 'Instagram', '113954816', 'USD', '0.0', '2161558', '1289', '4.5', '4.0', '10.23', '12+', 'Photo & Video', '37', '0', '29', '1']


Number of rows: 6154


In [41]:
explore_data(clean_android, 0, 5, True)



row: 1 ['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 and up']


row: 2 ['Pixel Draw - Number Art Coloring Book', 'ART_AND_DESIGN', '4.3', '967', '2.8M', '100,000+', 'Free', '0', 'Everyone', 'Art & Design;Creativity', 'June 20, 2018', '1.1', '4.4 and up']


row: 3 ['Paper flowers instructions', 'ART_AND_DESIGN', '4.4', '167', '5.6M', '50,000+', 'Free', '0', 'Everyone', 'Art & Design', 'March 26, 2017', '1.0', '2.3 and up']


row: 4 ['Smoke Effect Photo Maker - Smoke Editor', 'ART_AND_DESIGN', '3.8', '178', '19M', '50,000+', 'Free', '0', 'Everyone', 'Art & Design', 'April 26, 2018', '1.1', '4.0.3 and up']


row: 5 ['Infinite Painter', 'ART_AND_DESIGN', '4.1', '36815', '29M', '1,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'June 14, 2018', '6.1.61.1', '4.2 and up']


columns  13 ['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510'

We update the status of the datasets:

|`clean_ios`|`clean_android`|
|:----|:----|
|Number of rows: 6154|Number of rows: 9658|
|Number of columns: 16|Number of columns: 13|

### 3. Isolating Free Apps


To isolate the free apps in our datasets and check how many apps are left for analysis, we need to determine **which column index** indicates whether an app is free or paid. Then we check the length of each filtered dataset to get the exact number of free apps.

In [42]:
def column_number(dataset):
    return [f"{x} = {i}" for i, x in enumerate(dataset)]

In [43]:
column_number(header_ios)

['id = 0',
 'track_name = 1',
 'size_bytes = 2',
 'currency = 3',
 'price = 4',
 'rating_count_tot = 5',
 'rating_count_ver = 6',
 'user_rating = 7',
 'user_rating_ver = 8',
 'ver = 9',
 'cont_rating = 10',
 'prime_genre = 11',
 'sup_devices.num = 12',
 'ipadSc_urls.num = 13',
 'lang.num = 14',
 'vpp_lic = 15']

In [44]:
column_number(header_android)

['App = 0',
 'Category = 1',
 'Rating = 2',
 'Reviews = 3',
 'Size = 4',
 'Installs = 5',
 'Type = 6',
 'Price = 7',
 'Content Rating = 8',
 'Genres = 9',
 'Last Updated = 10',
 'Current Ver = 11',
 'Android Ver = 12']

In [45]:
def free_apps(dataset, number_col):  # the index of the price column depends on the dataset
    free_clean = []

    for row in dataset:
        prix = row[number_col]
        if prix == '0.0' or prix == '0':  # prices are stored as strings, not numbers
            free_clean.append(row)
    print(f"Free Apps are {len(free_clean)}")
    return free_clean
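
Comparing strings works when free prices are always written as `'0'` or `'0.0'`. As a hedged sketch, assuming paid prices might carry a leading `$` (an assumption, not something shown in the outputs of this notebook), a hypothetical `is_free` helper could compare numerically instead.

```python
def is_free(price_text):
    """Return True when the price text represents zero."""
    cleaned = price_text.strip().lstrip('$')  # assumed currency prefix
    try:
        return float(cleaned) == 0.0
    except ValueError:
        # Text that is not a number is treated as "not free".
        return False

for value in ['0', '0.0', '$4.99', '2.99', 'Everyone']:
    print(value, is_free(value))
```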

In [46]:
free_apple_clean = free_apps(clean_ios,4)

Free Apps are 3202


In [47]:
free_android_clean = free_apps(clean_android,7)

Free Apps are 8904


**Updating names and lengths**


|Dataset name|Length|
|:----|:----|
|`free_apple_clean`|3202|
|`free_android_clean`|8904|

### 3.1 Our ultimate goal

As we mentioned in the introduction, our goal is to determine the types of apps that are likely to attract more users because the number of people who use our apps affects our revenue. 

To minimize risks and overhead, our validation strategy for an app idea has three steps: 

- Build a minimal Android version of the app and add it to Google Play.
- If the app has a good response from users, we develop it further.
- If the app is profitable after six months, we build an iOS version of the app and add it to the App Store.


<br>

Because our ultimate goal is to add the app on both Google Play and the App Store, we need to find app profiles that are successful in both markets. For example, a profile that works well for both markets could be a productivity app that makes use of gamification.


Let's start the analysis by getting an idea of **the most common genres in each market**. To do this, we will build frequency tables for some columns of our datasets.

The columns that can give us the information we need are:

Column `11` of `free_apple_clean` ---> `prime_genre`

Column `9` of `free_android_clean` ---> `Genres`

In [48]:
def freq_table(dataset, column):
    table = {}
    total = 0

    for row in dataset:
        total += 1  # count the rows so we can compute percentages
        value = row[column]
        if value in table:
            table[value] += 1
        else:
            table[value] = 1

    table_percentages = {}

    for key in table:  # key: column value, table[key]: number of occurrences
        percentage = (table[key] / total) * 100
        table_percentages[key] = percentage

    return table_percentages  # {column value: percentage of rows}
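For comparison, the same frequency table can be sketched with `collections.Counter` from the standard library, which handles the counting step. The name `freq_table_counter` and the toy rows are hypothetical; this is only an illustration of the idea, not a replacement for the function used in the rest of the notebook.

```python
from collections import Counter


def freq_table_counter(dataset, column):
    """Return {value: percentage of rows} for one column."""
    counts = Counter(row[column] for row in dataset)
    total = sum(counts.values())
    return {value: count / total * 100 for value, count in counts.items()}


# Toy rows (hypothetical data): [name, genre]
sample = [['A', 'Games'], ['B', 'Games'], ['C', 'Music'], ['D', 'Games']]
print(freq_table_counter(sample, 1))  # {'Games': 75.0, 'Music': 25.0}
```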

In [49]:
print(freq_table(free_apple_clean, 11))

{'Photo & Video': 4.996876951905059, 'Games': 58.276077451592755, 'Music': 2.061211742660837, 'Social Networking': 3.2792004996876947, 'Reference': 0.5309181761399125, 'Health & Fitness': 2.0299812617114306, 'Weather': 0.8744534665833853, 'Utilities': 2.467207995003123, 'Travel': 1.2492192379762648, 'Shopping': 2.5921299188007496, 'News': 1.3429106808244848, 'Navigation': 0.18738288569643974, 'Lifestyle': 1.5615240474703311, 'Entertainment': 7.838850718301062, 'Food & Drink': 0.8119925046845722, 'Sports': 2.1549031855090566, 'Book': 0.3747657713928795, 'Finance': 1.0930668332292317, 'Education': 3.685196752029981, 'Productivity': 1.7489069331667706, 'Business': 0.5309181761399125, 'Catalogs': 0.12492192379762648, 'Medical': 0.18738288569643974}


In [50]:
print(freq_table(free_android_clean, 9))

{'Art & Design': 0.5952380952380952, 'Art & Design;Creativity': 0.06738544474393532, 'Auto & Vehicles': 0.9209344115004492, 'Beauty': 0.5952380952380952, 'Books & Reference': 2.178796046720575, 'Business': 4.5709793351302785, 'Comics': 0.6176999101527404, 'Comics;Creativity': 0.011230907457322553, 'Communication': 3.2457322551662173, 'Dating': 1.853099730458221, 'Education': 5.3908355795148255, 'Education;Creativity': 0.04492362982929021, 'Education;Education': 0.3481581311769991, 'Education;Pretend Play': 0.05615453728661276, 'Education;Brain Games': 0.03369272237196766, 'Entertainment': 6.087151841868823, 'Entertainment;Brain Games': 0.07861635220125787, 'Entertainment;Creativity': 0.03369272237196766, 'Entertainment;Music & Video': 0.1684636118598383, 'Events': 0.7075471698113208, 'Finance': 3.6837376460017968, 'Food & Drink': 1.2353998203054808, 'Health & Fitness': 3.054806828391734, 'House & Home': 0.8198562443845463, 'Libraries & Demo': 0.9321653189577718, 'Lifestyle': 3.91958670

The dictionaries contain the information we need, but this format is hard to read, so let's display each dictionary as a table sorted by percentage from highest to lowest.

In [51]:
def display_table(dataset, column):
    table = freq_table(dataset, column)  # returns a dictionary {value: percentage}
    table_display = []
    for key in table:
        key_val_as_tuple = (table[key], key)    # (percentage, value) so sorting uses the percentage
        table_display.append(key_val_as_tuple)  # collect the tuples in a list

    table_sorted = sorted(table_display, reverse=True)  # highest percentage first
    for entry in table_sorted:  # print one line per value
        print(entry[1], ':', round(entry[0], 2), '%')
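`display_table` sorts by building `(percentage, value)` tuples. An alternative sketch, assuming we would rather keep the sorted pairs for later use, sorts the dictionary items directly with a `key` function. The helper name `sorted_percentages` and the toy dictionary are hypothetical.

```python
def sorted_percentages(table):
    """Return (value, percentage) pairs ordered from highest to lowest percentage."""
    return sorted(table.items(), key=lambda item: item[1], reverse=True)


# Toy frequency table (hypothetical data)
table = {'Music': 25.0, 'Games': 60.0, 'Weather': 15.0}
for name, pct in sorted_percentages(table):
    print(name, ':', round(pct, 2), '%')
```

Returning the list instead of only printing it makes the ranking available to later cells, for example to take the top five entries.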

In [52]:
column_number(header_ios)

['id = 0',
 'track_name = 1',
 'size_bytes = 2',
 'currency = 3',
 'price = 4',
 'rating_count_tot = 5',
 'rating_count_ver = 6',
 'user_rating = 7',
 'user_rating_ver = 8',
 'ver = 9',
 'cont_rating = 10',
 'prime_genre = 11',
 'sup_devices.num = 12',
 'ipadSc_urls.num = 13',
 'lang.num = 14',
 'vpp_lic = 15']

In [53]:
column_number(header_android)

['App = 0',
 'Category = 1',
 'Rating = 2',
 'Reviews = 3',
 'Size = 4',
 'Installs = 5',
 'Type = 6',
 'Price = 7',
 'Content Rating = 8',
 'Genres = 9',
 'Last Updated = 10',
 'Current Ver = 11',
 'Android Ver = 12']

We will use the `display_table` function to show the frequency tables for the `prime_genre`, `Genres`, and `Category` columns.

In [54]:
display_table(free_apple_clean, 11)  # App Store prime_genre

Games : 58.28 %
Entertainment : 7.84 %
Photo & Video : 5.0 %
Education : 3.69 %
Social Networking : 3.28 %
Shopping : 2.59 %
Utilities : 2.47 %
Sports : 2.15 %
Music : 2.06 %
Health & Fitness : 2.03 %
Productivity : 1.75 %
Lifestyle : 1.56 %
News : 1.34 %
Travel : 1.25 %
Finance : 1.09 %
Weather : 0.87 %
Food & Drink : 0.81 %
Reference : 0.53 %
Business : 0.53 %
Book : 0.37 %
Navigation : 0.19 %
Medical : 0.19 %
Catalogs : 0.12 %


In [55]:
display_table(free_android_clean, 9)  # Google Play Genres

Tools : 8.42 %
Entertainment : 6.09 %
Education : 5.39 %
Business : 4.57 %
Lifestyle : 3.92 %
Productivity : 3.89 %
Finance : 3.68 %
Medical : 3.53 %
Sports : 3.45 %
Personalization : 3.31 %
Communication : 3.25 %
Action : 3.09 %
Health & Fitness : 3.05 %
Photography : 2.94 %
News & Magazines : 2.83 %
Social : 2.65 %
Travel & Local : 2.31 %
Shopping : 2.25 %
Books & Reference : 2.18 %
Simulation : 2.07 %
Dating : 1.85 %
Arcade : 1.85 %
Video Players & Editors : 1.79 %
Casual : 1.74 %
Maps & Navigation : 1.42 %
Food & Drink : 1.24 %
Puzzle : 1.12 %
Racing : 0.99 %
Role Playing : 0.93 %
Libraries & Demo : 0.93 %
Strategy : 0.92 %
Auto & Vehicles : 0.92 %
House & Home : 0.82 %
Weather : 0.8 %
Events : 0.71 %
Adventure : 0.69 %
Comics : 0.62 %
Beauty : 0.6 %
Art & Design : 0.6 %
Parenting : 0.49 %
Card : 0.44 %
Trivia : 0.43 %
Casino : 0.43 %
Educational;Education : 0.39 %
Board : 0.38 %
Educational : 0.37 %
Education;Education : 0.35 %
Word : 0.26 %
Casual;Pretend Play : 0.24 %
Music : 0.

In [56]:
display_table(free_android_clean, 1)  # Google Play Category

FAMILY : 19.29 %
GAME : 9.49 %
TOOLS : 8.43 %
BUSINESS : 4.57 %
LIFESTYLE : 3.93 %
PRODUCTIVITY : 3.89 %
FINANCE : 3.68 %
MEDICAL : 3.53 %
SPORTS : 3.4 %
PERSONALIZATION : 3.31 %
COMMUNICATION : 3.25 %
HEALTH_AND_FITNESS : 3.05 %
PHOTOGRAPHY : 2.94 %
NEWS_AND_MAGAZINES : 2.83 %
SOCIAL : 2.65 %
TRAVEL_AND_LOCAL : 2.32 %
SHOPPING : 2.25 %
BOOKS_AND_REFERENCE : 2.18 %
DATING : 1.85 %
VIDEO_PLAYERS : 1.79 %
MAPS_AND_NAVIGATION : 1.42 %
FOOD_AND_DRINK : 1.24 %
EDUCATION : 1.13 %
LIBRARIES_AND_DEMO : 0.93 %
AUTO_AND_VEHICLES : 0.92 %
ENTERTAINMENT : 0.88 %
HOUSE_AND_HOME : 0.82 %
WEATHER : 0.8 %
EVENTS : 0.71 %
PARENTING : 0.65 %
ART_AND_DESIGN : 0.64 %
COMICS : 0.63 %
BEAUTY : 0.6 %


## Summary

| Dataset name         | Genre             | Genre percentage | Category  | Category percentage |
|:---------------------|:------------------|-----------------:|:----------|--------------------:|
| `free_apple_clean`   | Games             | 58.28%           |           |                     |
|                      | **Entertainment** | 7.84%            |           |                     |
|                      | Photo & Video     | 5.00%            |           |                     |
|                      | Education         | 3.69%            |           |                     |
|                      | Social Networking | 3.28%            |           |                     |
| `free_android_clean` | Tools             | 8.42%            | FAMILY    | 19.29%              |
|                      | **Entertainment** | 6.09%            | GAME      | 9.49%               |
|                      | Education         | 5.39%            | TOOLS     | 8.43%               |
|                      | Business          | 4.57%            | BUSINESS  | 4.57%               |
|                      | Lifestyle         | 3.92%            | LIFESTYLE | 3.93%               |


Analyze the frequency table that you generated for the `prime_genre` column of the App Store dataset.

- What is the most common genre?
- What is the next most common?
- What other patterns do you see? What's the overall impression: are most apps designed for practical purposes (education, shopping, utilities, productivity, lifestyle) or for entertainment (games, photo and video, social networking, sports, music)?


Entertainment is the second most common genre in both stores. Interestingly, the distribution of genres on Android is much more balanced than on iOS, where Games ranks first and alone accounts for more than half of the free apps in the dataset (58.28%).
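To check which genres appear near the top of both stores, the two rankings can be intersected. The sketch below copies the five most frequent genres from the `prime_genre` and `Genres` outputs above; the cut-off of five and the variable names are arbitrary choices made for illustration.

```python
# Top five genres, copied from the frequency tables printed above
top_ios = ['Games', 'Entertainment', 'Photo & Video', 'Education', 'Social Networking']
top_android = ['Tools', 'Entertainment', 'Education', 'Business', 'Lifestyle']

# Keep the App Store order and retain only genres that also rank on Google Play
shared = [genre for genre in top_ios if genre in top_android]
print(shared)  # ['Entertainment', 'Education']
```

The genre labels differ between the stores (for example `Photo & Video` versus `Photography`), so an exact string match like this one can miss equivalent genres.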



- Can you recommend an app profile for the App Store marketplace based on this frequency chart alone?
- If there are a large number of apps for a particular genre, does that also imply that apps in that genre usually have a large number of users?

- Analyze the frequency tables that you generated for the `Category` and `Genres` columns of the Google Play dataset.
- What are the most common genres?
- What other patterns do you see?


Compare the patterns you see for the Google Play marketplace with the ones you saw for the App Store marketplace. 
- Can you recommend an app profile based on what you've found so far?
- Do the frequency tables you generated reveal the most frequent app genres or which genres have the most users?

### We would like to get an idea about the type of apps with more users.

One way to find out which genres are the most popular (have the most users) is to calculate the **average number of installs for each app genre**.

<br>

- For the **Google Play** dataset, we can find this information in the `Installs` column.
- For the **App Store** dataset, there is no installs column, so we will take the total number of user ratings as a proxy, which we can find in the `rating_count_tot` column.
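The per-genre average described above can be sketched as a single pass over the rows that accumulates a running total and a count for each group. The function name `average_by_group` and the toy rows are hypothetical, and the sketch assumes the value column already holds plain numeric strings.

```python
from collections import defaultdict


def average_by_group(dataset, group_col, value_col):
    """Return {group: mean of value_col} computed in one pass over the rows."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for row in dataset:
        totals[row[group_col]] += float(row[value_col])
        counts[row[group_col]] += 1
    return {group: totals[group] / counts[group] for group in totals}


# Toy rows (hypothetical data): [name, genre, rating count]
sample = [['A', 'Games', '100'], ['B', 'Games', '300'], ['C', 'Music', '50']]
print(average_by_group(sample, 1, 2))  # {'Games': 200.0, 'Music': 50.0}
```

Because every group in the result has at least one row, the division can never hit a zero count.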

## Most Popular Apps by Genre on the App Store

In [59]:
genres_ios = freq_table(free_apple_clean, 11)  # prime_genre
list_order = {}  # {genre: average number of ratings}

for genre in genres_ios:
    total = 0
    len_genre = 0
    for app in free_apple_clean:
        genre_app = app[11]  # prime_genre
        if genre_app == genre:
            n_ratings = float(app[5])  # rating_count_tot
            total += n_ratings
            len_genre += 1

    avg_n_ratings = total / len_genre
    print(genre, ':', avg_n_ratings)

    list_order[genre] = avg_n_ratings

Photo & Video : 28441.54375
Games : 22886.36709539121
Music : 57326.530303030304
Social Networking : 43899.514285714286
Reference : 79350.4705882353
Health & Fitness : 23298.015384615384
Weather : 52279.892857142855
Utilities : 19156.493670886077
Travel : 28243.8
Shopping : 27230.734939759037
News : 21248.023255813954
Navigation : 86090.33333333333
Lifestyle : 16815.48
Entertainment : 14195.358565737051
Food & Drink : 33333.92307692308
Sports : 23008.898550724636
Book : 46384.916666666664
Finance : 32367.02857142857
Education : 7003.983050847458
Productivity : 21028.410714285714
Business : 7491.117647058823
Catalogs : 4004.0
Medical : 612.0


### Sorting genres on the App Store

In [60]:
marklist = sorted(list_order.items(), key=lambda x:x[1],reverse = True)
sortdict = dict(marklist)
sortdict

{'Navigation': 86090.33333333333,
 'Reference': 79350.4705882353,
 'Music': 57326.530303030304,
 'Weather': 52279.892857142855,
 'Book': 46384.916666666664,
 'Social Networking': 43899.514285714286,
 'Food & Drink': 33333.92307692308,
 'Finance': 32367.02857142857,
 'Photo & Video': 28441.54375,
 'Travel': 28243.8,
 'Shopping': 27230.734939759037,
 'Health & Fitness': 23298.015384615384,
 'Sports': 23008.898550724636,
 'Games': 22886.36709539121,
 'News': 21248.023255813954,
 'Productivity': 21028.410714285714,
 'Utilities': 19156.493670886077,
 'Lifestyle': 16815.48,
 'Entertainment': 14195.358565737051,
 'Business': 7491.117647058823,
 'Education': 7003.983050847458,
 'Catalogs': 4004.0,
 'Medical': 612.0}

In [61]:
for app in free_apple_clean:
    if app[11] == 'Navigation':     # prime_genre with the highest average
        print(app[1], ':', app[5])  # print name and number of ratings

Waze - GPS Navigation, Maps & Real-time Traffic : 345046
Google Maps - Navigation & Transit : 154911
Geocaching® : 12811
CoPilot GPS – Car Navigation & Offline Maps : 3582
ImmobilienScout24: Real Estate Search in Germany : 187
Railway Route Search : 5


Navigation apps have the highest average number of user ratings, but the figure is dominated by Waze and Google Maps, which together account for close to half a million ratings. The same pattern applies to social networking apps, where the average is heavily influenced by a few giants like Facebook, Pinterest, and Skype, and to music apps, where a few big players like Pandora, Spotify, and Shazam dominate.

**Our aim** is to find popular genres, but navigation, social networking or music apps might seem more popular than they really are. 

**The average number of ratings seems to be skewed by a handful of apps with hundreds of thousands of user ratings, while the other apps may struggle to get past the 10,000 threshold**.


We could get a **better picture by removing these extremely popular apps** for each genre and then rework the averages, but we'll leave this level of detail for later.
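One lightweight way to see how strongly the outliers pull the average, without removing any apps, is to compare the mean with the median. The sketch below uses the six Navigation rating counts printed above and the standard `statistics` module; using the median here is a suggestion for illustration, not a step of the original analysis.

```python
from statistics import mean, median

# Rating counts of the six free Navigation apps listed above
navigation_ratings = [345046, 154911, 12811, 3582, 187, 5]

print(round(mean(navigation_ratings), 2))  # 86090.33
print(median(navigation_ratings))          # 8196.5
```

The mean is roughly ten times the median, which confirms that two apps account for most of the genre's apparent popularity.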

Reference apps have 79,350 user ratings on average, but the Bible and Dictionary.com apps skew the average upward:

### What applications are there in the second most popular genre?

In [62]:
for app in free_apple_clean:
    if app[-5] == 'Reference': # number two
        print(app[1], ':', app[5])

Bible : 985920
Dictionary.com Dictionary & Thesaurus : 200047
Dictionary.com Dictionary & Thesaurus for iPad : 54175
Google Translate : 26786
Muslim Pro: Ramadan 2017 Prayer Times, Azan, Quran : 18418
New Furniture Mods - Pocket Wiki & Game Tools for Minecraft PC Edition : 17588
Merriam-Webster Dictionary : 16849
Night Sky : 12122
City Maps for Minecraft PE - The Best Maps for Minecraft Pocket Edition (MCPE) : 8535
LUCKY BLOCK MOD ™ for Minecraft PC Edition - The Best Pocket Wiki & Mods Installer Tools : 4693
GUNS MODS for Minecraft PC Edition - Mods Tools : 1497
Guides for Pokémon GO - Pokemon GO News and Cheats : 826
WWDC : 762
Horror Maps for Minecraft PE - Download The Scariest Maps for Minecraft Pocket Edition (MCPE) Free : 718
VPN Express : 14
Real Bike Traffic Rider Virtual Reality Glasses : 8
Jishokun-Japanese English Dictionary & Translator : 0


## Most Popular Apps by Genre on Google Play

In [63]:
categories_android = freq_table(free_android_clean, 5)
categories_android

{'5,000,000+': 6.8171608265947885,
 '50,000,000+': 2.2911051212938007,
 '100,000+': 11.590296495956872,
 '50,000+': 4.818059299191375,
 '1,000,000+': 15.71203953279425,
 '10,000+': 10.25381850853549,
 '10,000,000+': 10.455974842767295,
 '5,000+': 4.526055705300988,
 '500,000+': 5.536837376460018,
 '1,000,000,000+': 0.22461814914645103,
 '100,000,000+': 2.1226415094339623,
 '1,000+': 8.423180592991914,
 '500,000,000+': 0.2695417789757413,
 '500+': 3.234501347708895,
 '100+': 6.918238993710692,
 '50+': 1.9092542677448336,
 '10+': 3.5377358490566038,
 '1+': 0.5166217430368374,
 '5+': 0.7861635220125787,
 '0+': 0.04492362982929021,
 '0': 0.011230907457322553}
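The `Installs` values above are strings describing open-ended brackets such as `'1,000,000+'`, so they need to be converted before any arithmetic. A minimal sketch of that conversion follows; the helper name `parse_installs` is hypothetical, and treating each bracket as its lower bound is a simplifying assumption (an app listed as `'100,000+'` may have many more installs).

```python
def parse_installs(raw):
    """Convert an install bracket such as '1,000,000+' to its integer lower bound."""
    return int(raw.replace(',', '').replace('+', ''))


print(parse_installs('1,000,000+'))  # 1000000
print(parse_installs('0'))           # 0
```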

In [64]:
categories_android = freq_table(free_android_clean, 1)  # Category column
list_order = {}  # start fresh so the App Store genres are not mixed in

for category in categories_android:
    total = 0
    len_category = 0
    for app in free_android_clean:
        category_app = app[1]  # Category
        if category_app == category:
            n_installs = app[5]  # Installs, e.g. '1,000,000+'
            n_installs = n_installs.replace(',', '')
            n_installs = n_installs.replace('+', '')
            total += float(n_installs)
            len_category += 1
    avg_n_installs = total / len_category
    print(category, ':', avg_n_installs)

    list_order[category] = avg_n_installs

In [None]:
marklist = sorted(list_order.items(), key=lambda x:x[1],reverse = True)
sortdict_android = dict(marklist)
sortdict_android

On average, communication apps have the most installs: 38590581.08741259. 

**This number is heavily skewed up by a few apps that have over one billion installs (WhatsApp, Facebook Messenger, Skype, Google Chrome, Gmail, and Hangouts)**, and a few others with over 100 and 500 million installs:


In [None]:
for app in free_android_clean:
    if app[1] == 'COMMUNICATION' and (app[5] == '1,000,000,000+' or 
                                      app[5] == '500,000,000+' or 
                                      app[5] == '100,000,000+'):
        print(app[0], ':', app[5])

### 5. Most Popular Apps by Genre on Google Play

In [None]:
for app in free_android_clean:
    if app[9] == 'Art & Design;Creativity':  # Genres column
        print(app[0], ':', app[5])           # print name and number of installs

### What applications are there in the second most popular genre?

In [None]:
for app in free_android_clean:
    if app[9] == 'Comics;Creativity':  # Genres column
        print(app[0], ':', app[5])     # print name and number of installs