# Apps That Attract the Most Users

As a company that builds apps that are free to download and install, we rely on a large user base that sees and engages with ads. The goal of this analysis is to help developers understand which types of apps are likely to attract the most users.

As of September 2018, there were approximately 2 million iOS apps available on the App Store, and 2.1 million Android apps on Google Play.


![image](https://s3.amazonaws.com/dq-content/350/py1m8_statista.png) 


Source: [Statista](https://www.statista.com/statistics/276623/number-of-apps-available-in-leading-app-stores/)


Collecting data for over 4 million apps requires a significant amount of time and money, so we'll analyze a sample of the data instead. To avoid spending resources on collecting new data ourselves, we should first check whether any relevant existing data is available at no cost. Luckily, there are two datasets that seem suitable for our goals:


1. [A dataset](https://www.kaggle.com/datasets/lava18/google-play-store-apps) containing data about approximately 10,000 Android apps from Google Play; the data was collected in August 2018. You can download the dataset directly from [this link.](https://dq-content.s3.amazonaws.com/350/googleplaystore.csv)

2. [A dataset](https://www.kaggle.com/ramamet4/app-store-apple-data-set-10k-apps) containing data about approximately 7,000 iOS apps from the App Store; the data was collected in July 2017. You can download the dataset directly from [this link.](https://dq-content.s3.amazonaws.com/350/AppleStore.csv)


Let's open the two datasets and explore them.

In [12]:
from csv import reader


### The App Store Data Set
opened_file = open('AppleStore.csv')
read_file = reader(opened_file)
ios_apps_data = list(read_file)
opened_file.close()

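# Drop the first column from every row, header included,
# so that each row starts at the 'id' column (see the printed header).
for row in ios_apps_data:
    row.pop(0)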

print(ios_apps_data[0])


### The Google Play Data Set
opened_file = open('googleplaystore.csv')
read_file = reader(opened_file)
android_apps_data = list(read_file)
opened_file.close()



['id', 'track_name', 'size_bytes', 'currency', 'price', 'rating_count_tot', 'rating_count_ver', 'user_rating', 'user_rating_ver', 'ver', 'cont_rating', 'prime_genre', 'sup_devices.num', 'ipadSc_urls.num', 'lang.num', 'vpp_lic']
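The two loading steps above follow the same pattern, so they could be folded into one helper. The following is only a sketch: `load_csv` and its `drop_first_column` parameter are hypothetical names, and the `utf8` encoding is an assumption about the files. The demo runs on a tiny temporary file so that it does not depend on the real datasets being present.

```python
import csv
import os
import tempfile


def load_csv(path, drop_first_column=False):
    """Read a CSV file into a list of rows (each row is a list of strings)."""
    # newline='' is what the csv module documentation recommends for file objects;
    # encoding='utf8' is an assumption about how the files are encoded.
    with open(path, encoding='utf8', newline='') as f:
        rows = list(csv.reader(f))
    if drop_first_column:
        # Slice off the leading column of every row, header included.
        rows = [row[1:] for row in rows]
    return rows


# Demo on a tiny temporary file (made-up layout with an unnamed first column).
with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = os.path.join(tmp_dir, 'demo.csv')
    with open(demo_path, 'w', encoding='utf8', newline='') as f:
        f.write(',id,track_name\n1,281656475,PAC-MAN Premium\n')
    demo_rows = load_csv(demo_path, drop_first_column=True)

print(demo_rows[0])  # ['id', 'track_name']
```

With the real files in the working directory, the equivalent calls would be `load_csv('AppleStore.csv', drop_first_column=True)` and `load_csv('googleplaystore.csv')`. The `with` block closes the file even if reading fails partway through.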


In [13]:
def explore_data(dataset, start, end, rows_and_columns=False):
    dataset_slice = dataset[start:end]
    for row in dataset_slice:
        print(row)
        print('\n')

    if rows_and_columns:
        print(f'Number of rows: {len(dataset[1:])}')
        print(f'Number of columns: {len(dataset[0])}')
            

## Exploring the Google Play Dataset

In [14]:
explore_data(android_apps_data, 0, 3, True)

['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']


['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['Coloring book moana', 'ART_AND_DESIGN', '3.9', '967', '14M', '500,000+', 'Free', '0', 'Everyone', 'Art & Design;Pretend Play', 'January 15, 2018', '2.0.0', '4.0.3 and up']


Number of rows: 10841
Number of columns: 13


The output above shows that the dataset has 10,841 rows (one per app) and 13 columns. The columns most relevant to this analysis are `App`, `Category`, `Reviews`, `Type`, `Price`, and `Genres`.
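Because each row is a plain list, the columns of interest have to be reached by position. As a small sketch (the names `android_header`, `columns_of_interest`, and `column_index` are illustrative, and the header and sample row are copied from the output above), the positions can be looked up from the header row rather than counted by hand. Note that the header spells the genre column `Genres`.

```python
# Header row and first data row, copied from the explore_data output above.
android_header = ['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs',
                  'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated',
                  'Current Ver', 'Android Ver']
sample_row = ['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN',
              '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone',
              'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']

columns_of_interest = ['App', 'Category', 'Reviews', 'Type', 'Price', 'Genres']

# Map each column name to its position in the header row.
column_index = {name: android_header.index(name) for name in columns_of_interest}
print(column_index)

# Pull only the columns of interest out of one row.
selected = {name: sample_row[i] for name, i in column_index.items()}
print(selected)
```

Looking positions up by name keeps later code readable and avoids hard-coded numbers such as `row[9]`.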

## Exploring the iOS App Store Dataset 

In [15]:
explore_data(ios_apps_data, 0, 3, True)

['id', 'track_name', 'size_bytes', 'currency', 'price', 'rating_count_tot', 'rating_count_ver', 'user_rating', 'user_rating_ver', 'ver', 'cont_rating', 'prime_genre', 'sup_devices.num', 'ipadSc_urls.num', 'lang.num', 'vpp_lic']


['281656475', 'PAC-MAN Premium', '100788224', 'USD', '3.99', '21292', '26', '4', '4.5', '6.3.5', '4+', 'Games', '38', '5', '10', '1']


['281796108', 'Evernote - stay organized', '158578688', 'USD', '0', '161065', '26', '4', '3.5', '8.2.2', '4+', 'Productivity', '37', '5', '23', '1']


Number of rows: 7197
Number of columns: 16


The iOS dataset has 7,197 apps and 16 columns. The following columns may be of interest: `track_name`, `currency`, `price`, `rating_count_tot`, `rating_count_ver`, `prime_genre`.
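Every value read by `csv.reader` is a string, so columns such as `price` and `rating_count_tot` need converting before they can be compared or summed. The sketch below uses the two sample rows printed above; the `summarize` helper and the variable names are hypothetical, and treating a price of `0` as "free" is an assumption about how the dataset encodes free apps.

```python
# Header and the two sample rows, copied from the explore_data output above.
ios_header = ['id', 'track_name', 'size_bytes', 'currency', 'price',
              'rating_count_tot', 'rating_count_ver', 'user_rating',
              'user_rating_ver', 'ver', 'cont_rating', 'prime_genre',
              'sup_devices.num', 'ipadSc_urls.num', 'lang.num', 'vpp_lic']
sample_rows = [
    ['281656475', 'PAC-MAN Premium', '100788224', 'USD', '3.99', '21292', '26',
     '4', '4.5', '6.3.5', '4+', 'Games', '38', '5', '10', '1'],
    ['281796108', 'Evernote - stay organized', '158578688', 'USD', '0', '161065',
     '26', '4', '3.5', '8.2.2', '4+', 'Productivity', '37', '5', '23', '1'],
]

name_idx = ios_header.index('track_name')
price_idx = ios_header.index('price')
ratings_idx = ios_header.index('rating_count_tot')


def summarize(row):
    """Convert the string fields of one row into typed values."""
    return {
        'name': row[name_idx],
        # Assumption: a price of 0 marks a free app.
        'is_free': float(row[price_idx]) == 0.0,
        'rating_count': int(row[ratings_idx]),
    }


summaries = [summarize(row) for row in sample_rows]
for summary in summaries:
    print(summary)
```

On these two rows, PAC-MAN Premium comes out as paid with 21,292 ratings and Evernote as free with 161,065 ratings.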