# iOS and Android App Research
* I am putting together datasets to better understand the statistics that matter for app development. We'll be looking at data collected from Google Play and the App Store.

For this project, we'll pretend we're working as data analysts for a company that builds Android and iOS mobile apps. We make our apps available on Google Play and in the App Store.

We only build apps that are free to download and install, and our main source of revenue consists of in-app ads. This means that the number of users of our apps determines our revenue for any given app — the more users who see and engage with the ads, the better. Our goal for this project is to analyze data to help our developers understand what type of apps are likely to attract more users.

In [1]:
def explore_data(dataset, start, end, rows_and_columns=False):
    dataset_slice = dataset[start:end]
    
    if rows_and_columns:
        print(f'Number of rows: {len(dataset)}')
        print(f'Number of columns:{len(dataset[0])}\n')
        
    for row in dataset_slice:
        print(row)
        print('\n')  # adds a new blank line after each row.

In [2]:
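from csv import reader

# Read each CSV into a list of rows; the first row of each list is the header.
# The with-blocks close the files once they have been read.
with open('AppleStore.csv', encoding='utf8') as opened_file_ios:
    ios_all_data = list(reader(opened_file_ios))

with open('googleplaystore.csv', encoding='utf8') as opened_file_android:
    droid_all_data = list(reader(opened_file_android))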

To find free, user-driven apps funded by ad revenue, I believe the relevant columns will be:
* name
* price
* user ratings 
* prime genre
* category
* reviews
* genre
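
The columns listed above can be tied to positions in each header row. The sketch below is only illustrative: the helper name `column_indexes` is hypothetical, the header lists are copied from the header output shown in this notebook, and the mapping of "user ratings" to `rating_count_tot` is an assumption about which column is meant.

```python
# Header rows copied from the explore_data output in this notebook.
ios_header = ['id', 'track_name', 'size_bytes', 'currency', 'price',
              'rating_count_tot', 'rating_count_ver', 'user_rating',
              'user_rating_ver', 'ver', 'cont_rating', 'prime_genre',
              'sup_devices.num', 'ipadSc_urls.num', 'lang.num', 'vpp_lic']
droid_header = ['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs',
                'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated',
                'Current Ver', 'Android Ver']


def column_indexes(header, wanted):
    """Map each wanted column name to its position in the header row.

    Hypothetical helper; raises ValueError if a name is not in the header.
    """
    return {name: header.index(name) for name in wanted}


# Assumption: "user ratings" refers to the total rating count on iOS.
ios_cols = column_indexes(
    ios_header, ['track_name', 'price', 'rating_count_tot', 'prime_genre'])
droid_cols = column_indexes(
    droid_header, ['App', 'Category', 'Reviews', 'Price', 'Genres'])

print(ios_cols)
print(droid_cols)
```

Looking the positions up by name avoids hard-coding indexes such as `row[4]` in later cells.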

In [3]:
print('iOS Header...')
explore_data(ios_all_data, 0, 1)
print('iOS Data...')
explore_data(ios_all_data, 1, 3, True)

print('Android Header...')
explore_data(droid_all_data, 0, 1)
print('Android Data...')
explore_data(droid_all_data, 1, 3, True)

iOS Header...
['id', 'track_name', 'size_bytes', 'currency', 'price', 'rating_count_tot', 'rating_count_ver', 'user_rating', 'user_rating_ver', 'ver', 'cont_rating', 'prime_genre', 'sup_devices.num', 'ipadSc_urls.num', 'lang.num', 'vpp_lic']


iOS Data...
Number of rows: 7198
Number of columns:16

['284882215', 'Facebook', '389879808', 'USD', '0.0', '2974676', '212', '3.5', '3.5', '95.0', '4+', 'Social Networking', '37', '1', '29', '1']


['389801252', 'Instagram', '113954816', 'USD', '0.0', '2161558', '1289', '4.5', '4.0', '10.23', '12+', 'Photo & Video', '37', '0', '29', '1']


Android Header...
['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']


Android Data...
Number of rows: 10842
Number of columns:13

['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and 

In [4]:
def extract_data(dataset: list, start: int, end=None, header=False):
    """Return the rows dataset[start:end], skipping the header row when header=True."""
    if header:
        start += 1  # the header sits at index 0, so shift the slice past it
    return dataset[start:end]

# Leaving end as None slices through the last row (an end of -1 would drop it).
ios = extract_data(ios_all_data, 0, header=True)
print(ios[:3])
print()
droid = extract_data(droid_all_data, 0, header=True)
print(droid[:3])

[['284882215', 'Facebook', '389879808', 'USD', '0.0', '2974676', '212', '3.5', '3.5', '95.0', '4+', 'Social Networking', '37', '1', '29', '1'], ['389801252', 'Instagram', '113954816', 'USD', '0.0', '2161558', '1289', '4.5', '4.0', '10.23', '12+', 'Photo & Video', '37', '0', '29', '1'], ['529479190', 'Clash of Clans', '116476928', 'USD', '0.0', '2130805', '579', '4.5', '4.5', '9.24.12', '9+', 'Games', '38', '5', '18', '1']]

[['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up'], ['Coloring book moana', 'ART_AND_DESIGN', '3.9', '967', '14M', '500,000+', 'Free', '0', 'Everyone', 'Art & Design;Pretend Play', 'January 15, 2018', '2.0.0', '4.0.3 and up'], ['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']]


In [5]:
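droid_header = droid_all_data[0]  # header row, used for the expected column count

# Print every row whose column count differs from the header's.
for index, row in enumerate(droid):
    if len(row) != len(droid_header):
        print(row)
        print('\n')
        print(f"Index position is {index}")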

In [None]:
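row_1 = droid[0]
row_2 = droid[10472]  # droid has no header row, so this is droid_all_data[10473]
for col_row_1, col_row_2 in zip(row_1, row_2):
    print(f"{col_row_1}: {col_row_2}")

This app appears to have a rating of 19, which is not possible because Google Play ratings top out at 5. The row is missing its entry for the `Category` column, so every later value is shifted one column to the left. Let's `delete` that row.

In [None]:
del droid[10472]  # run this cell only once, or a valid row will be deleted too

In [None]:
row_1 = droid[0]
row_2 = droid[10472]
for x, y in zip(row_1, row_2):
    print(x, y)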
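
The manual inspection above could also be automated. This is a minimal sketch, not part of the original analysis: `find_malformed_rows` is a hypothetical helper, the sample rows are made up, and the defaults assume the rating sits at index 2 and cannot exceed 5.

```python
def find_malformed_rows(rows, expected_len, rating_idx=2, max_rating=5.0):
    """Return (index, reason) pairs for rows that look malformed."""
    problems = []
    for i, row in enumerate(rows):
        if len(row) != expected_len:
            problems.append((i, 'wrong column count'))
            continue
        try:
            rating = float(row[rating_idx])
        except ValueError:
            problems.append((i, 'non-numeric rating'))
            continue
        # 'NaN' (a missing rating) fails both comparisons, so it is not flagged.
        if rating > max_rating or rating < 0:
            problems.append((i, 'rating out of range'))
    return problems


# Made-up rows that imitate the kinds of problems discussed above.
sample = [
    ['App A', 'TOOLS', '4.1'],
    ['App B', '1.9', '19'],   # category missing, so values shifted left
    ['App C', 'GAMES'],       # too few columns
]
print(find_malformed_rows(sample, expected_len=3))
```

On the real data, a call such as `find_malformed_rows(droid, len(droid_all_data[0]))` should come back empty once the bad row has been deleted.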

Recall that at our company, we only build apps that are free to download and install, and we design them for an English-speaking audience. This means that we'll need to do the following:

* Remove non-English apps like 爱奇艺PPS -《欢乐颂2》电视剧热播.
* Remove apps that aren't free.
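
One possible way to express the two filters is sketched below. Both helpers are hypothetical, and the approach is a heuristic rather than a settled method: a name counts as English if it has at most three characters outside the ASCII range (so that names containing a dash or an emoji still pass), and a price counts as free if it parses to zero after stripping a leading `$`, which assumes paid Android prices are written like `$4.99`.

```python
def is_english(name, max_non_ascii=3):
    """Heuristic: treat a name as English if few characters fall outside ASCII."""
    non_ascii = sum(1 for ch in name if ord(ch) > 127)
    return non_ascii <= max_non_ascii


def is_free(price):
    """True when a price string such as '0', '0.0' or '$0' equals zero.

    Assumption: paid prices may carry a leading '$' (for example '$4.99').
    """
    return float(price.lstrip('$')) == 0.0


print(is_english('Instagram'))
print(is_english('爱奇艺PPS -《欢乐颂2》电视剧热播'))
print(is_english('U Launcher Lite – FREE Live Cool Themes, Hide Apps'))
print(is_free('0'), is_free('0.0'), is_free('$4.99'))
```

With these in place, the Android rows could be filtered with something like `[app for app in droid if is_english(app[0]) and is_free(app[7])]`, using the `App` and `Price` positions from the header.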

## Let's investigate duplicate apps.

In [None]:
duplicate_apps = []
unique_apps = []

for app in droid:
    name = app[0]
    if name in unique_apps:
        duplicate_apps.append(name)
    else:
        unique_apps.append(name)

print(f'Number of duplicate apps: {len(duplicate_apps)}\n')
print(f'Examples of duplicate apps: {duplicate_apps[:15]}')

**That's 1181 duplicate apps. Let's see if we can find some discrepancies between the entries.**

In [None]:
for app in droid:
    name = app[0]
    if name == 'Instagram':
        print(app)

**The `Instagram` entries seem to have been collected at different points in time, since their rating totals differ. We can use this number to decide which entry to keep and delete the other entries.**
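
A sketch of how such a cleanup might look, assuming the entry with the highest review count is the most recent one and therefore the one to keep. The helper `dedupe_by_max_reviews` and the sample rows are hypothetical; the default indexes follow the Android header (`App` at 0, `Reviews` at 3).

```python
def dedupe_by_max_reviews(rows, name_idx=0, reviews_idx=3):
    """Keep one row per app name: the row with the largest review count."""
    best = {}
    for row in rows:
        name = row[name_idx]
        reviews = float(row[reviews_idx])
        # Replace the stored row only when this one has strictly more reviews.
        if name not in best or reviews > float(best[name][reviews_idx]):
            best[name] = row
    # Dicts keep insertion order, so apps stay in order of first appearance.
    return list(best.values())


# Made-up rows in the Android column order (App, Category, Rating, Reviews).
sample = [
    ['Instagram', 'SOCIAL', '4.5', '100'],
    ['Other App', 'TOOLS', '4.0', '5'],
    ['Instagram', 'SOCIAL', '4.5', '250'],
]
print(dedupe_by_max_reviews(sample))
```

Choosing by review count rather than keeping the first or last entry gives a reproducible rule for deciding which duplicate survives.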

In [None]:
# Length we expect once the duplicate entries counted above are removed.
print(f'Expected length: {len(droid) - len(duplicate_apps)}')