# Profitable App Profiles for the App Store and Google Play Markets

Our aim in this project is to find mobile app profiles that are profitable for the App Store and Google Play markets. We’re working as data analysts for a company that builds Android and iOS mobile apps, and our job is to enable our team of developers to make data-driven decisions about the kinds of apps they build.

At our company, we only build apps that are free to download and install, and our main source of revenue is in-app ads. This means that the revenue for any given app depends mostly on the number of people who use it. Our goal for this project is to analyze data to help our developers understand what kinds of apps are likely to attract more users.

Let's start by opening the two data sets and then continue with exploring the data.

In [4]:
from csv import reader

### The Google Play data set ###
# The with block closes the file as soon as its rows have been read
with open('datasets/googleplaystore.csv', encoding='utf8') as opened_file:
    read_file = reader(opened_file)
    android = list(read_file)
android_header = android[0]  # first row holds the column names
android = android[1:]        # remaining rows are the apps

### The App Store data set ###
with open('datasets/AppleStore.csv', encoding='utf8') as opened_file:
    read_file = reader(opened_file)
    ios = list(read_file)
ios_header = ios[0]
ios = ios[1:]
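
Both loads follow the same pattern (open the file, read every row, split off the header), so they could be factored into a small helper. The sketch below is one possible way to do that. `load_dataset` is a hypothetical name that does not appear in the original notebook, and the demo writes a tiny throwaway CSV with made-up rows so that it runs without the real files.

```python
import csv
import os
import tempfile


def load_dataset(path, encoding='utf8'):
    """Hypothetical helper: return (header, rows) for a CSV file."""
    # newline='' is what the csv module documentation recommends for file objects
    with open(path, encoding=encoding, newline='') as f:
        rows = list(csv.reader(f))
    return rows[0], rows[1:]


# Demo on a temporary file with invented data, not the real data sets
with tempfile.NamedTemporaryFile('w', suffix='.csv', delete=False,
                                 encoding='utf8', newline='') as tmp:
    tmp.write('App,Rating\nDemo App,4.1\n"Other, App",3.9\n')
    demo_path = tmp.name

demo_header, demo_rows = load_dataset(demo_path)
os.remove(demo_path)

print(demo_header)
print(demo_rows)
```

With the real files, a call such as `android_header, android = load_dataset('datasets/googleplaystore.csv')` would stand in for the first half of the cell above.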

To make it easier to explore the two data sets, we'll first write a function named `explore_data()` that we can use repeatedly to explore rows in a more readable way. We'll also add an option for our function to show the number of rows and columns for any data set.

The `explore_data()` function:

* Takes in four parameters:
    * `dataset`, which is expected to be a list of lists.
    * `start` and `end`, which are both expected to be integers and represent the starting and the ending indices of a slice from the data set.
    * `rows_and_columns`, which is expected to be a Boolean and has `False` as a default argument.
* Slices the data set using `dataset[start:end]`.
* Loops through the slice, and for each iteration, prints a row and adds a new line after that row using `print('\n')`.
    * The `\n` in `print('\n')` is the newline escape sequence: it is displayed as a line break rather than as a backslash followed by `n`. Because `print()` also appends its own newline, `print('\n')` leaves two blank lines between rows.
* Prints the number of rows and columns if `rows_and_columns` is `True`.
    * `dataset` shouldn't include a header row; otherwise the reported number of rows will be one more than the actual number of data rows.
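
To see exactly what `print('\n')` writes, the short sketch below captures standard output in a string and shows its raw form. The `row 1` and `row 2` strings are placeholders, not rows from the data sets.

```python
import io
from contextlib import redirect_stdout

buffer = io.StringIO()
with redirect_stdout(buffer):
    print('row 1')
    print('\n')   # writes the '\n' argument plus print's own trailing newline
    print('row 2')

captured = buffer.getvalue()
print(repr(captured))  # 'row 1\n\n\nrow 2\n'
```

The three consecutive `\n` characters after `row 1` are the end of that line followed by two empty lines, which is the spacing visible in the function's output.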

In [5]:
def explore_data(dataset, start, end, rows_and_columns=False):
    dataset_slice = dataset[start:end]
    for row in dataset_slice:
        print(row)
        print('\n') # adds a new (empty) line after each row
        
    if rows_and_columns:
        print('Number of rows: ', len(dataset))
        print('Number of columns: ', len(dataset[0]))

In [6]:
print(android_header)
print('\n')
explore_data(android, 0, 3, True)

['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']


['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['Coloring book moana', 'ART_AND_DESIGN', '3.9', '967', '14M', '500,000+', 'Free', '0', 'Everyone', 'Art & Design;Pretend Play', 'January 15, 2018', '2.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


Number of rows:  10841
Number of columns:  13
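
The row count above is only meaningful because the header was split off before calling `explore_data()`. As a minimal illustration of the off-by-one that a header row would cause, the sketch below uses a toy data set with made-up values (the `toy_*` names are hypothetical).

```python
# Toy data set with invented values; the first row is the header
toy_with_header = [
    ['App', 'Rating'],
    ['Demo App A', '4.1'],
    ['Demo App B', '3.9'],
]
toy_header = toy_with_header[0]
toy_rows = toy_with_header[1:]

print('Rows counted with header:', len(toy_with_header))   # one too many
print('Rows counted without header:', len(toy_rows))       # actual number of apps
print('Columns:', len(toy_header))
```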
