## Profitable App Profiles for the App Store and Google Play Markets

App Mechanics, Inc. is an Atlanta-based app development company that makes free gaming apps for the App Store (iOS) and Google Play. The company's main revenue stream is in-app ads. Advertisers pay App Mechanics by the number of **gross impressions** (a measure of contact with one or more of a brand's advertisements), so revenue grows when the gaming apps drive user contact and engagement with those advertisements.[<sup>1</sup>](#fn1)

The goal of this project is to analyze data to help our developers understand what kinds of apps are likely to attract more users. 

## Opening and Exploring the Datasets

In [21]:
from csv import reader

# The iOS dataset
with open('AppleStore.csv', encoding='utf8') as opened_file:
    ios = list(reader(opened_file))
ios = [row[1:] for row in ios]  # drop the first column, which comes before 'id'
ios_header = ios[0]
ios = ios[1:]

# The Google Play dataset
with open('googleplaystore.csv', encoding='utf8') as opened_file:
    android = list(reader(opened_file))
android_header = android[0]
android = android[1:]

To make it easier to explore the two data sets, we'll first write a function named `explore_data()` that we can use repeatedly to explore rows in a more readable way. We'll also add an option for our function to show the number of rows and columns for any data set.

In [22]:
# Define the explore_data function
def explore_data(dataset, start, end, rows_and_columns=False):
    dataset_slice = dataset[start:end]    
    for row in dataset_slice:
        print(row)
        print('\n') # adds a new (empty) line after each row

    if rows_and_columns:
        print('Number of rows:', len(dataset))
        print('Number of columns:', len(dataset[0]))
        
print(android_header)
print('\n')
explore_data(android, 0, 3, True)

['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']


['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['Coloring book moana', 'ART_AND_DESIGN', '3.9', '967', '14M', '500,000+', 'Free', '0', 'Everyone', 'Art & Design;Pretend Play', 'January 15, 2018', '2.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


Number of rows: 10841
Number of columns: 13


In [23]:
print(ios_header)
print('\n')
explore_data(ios, 0, 3, True)

['id', 'track_name', 'size_bytes', 'currency', 'price', 'rating_count_tot', 'rating_count_ver', 'user_rating', 'user_rating_ver', 'ver', 'cont_rating', 'prime_genre', 'sup_devices.num', 'ipadSc_urls.num', 'lang.num', 'vpp_lic']


['281656475', 'PAC-MAN Premium', '100788224', 'USD', '3.99', '21292', '26', '4', '4.5', '6.3.5', '4+', 'Games', '38', '5', '10', '1']


['281796108', 'Evernote - stay organized', '158578688', 'USD', '0', '161065', '26', '4', '3.5', '8.2.2', '4+', 'Productivity', '37', '5', '23', '1']


['281940292', 'WeatherBug - Local Weather, Radar, Maps, Alerts', '100524032', 'USD', '0', '188583', '2822', '3.5', '4.5', '5.0.0', '4+', 'Weather', '37', '5', '3', '1']


Number of rows: 7197
Number of columns: 16
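Because the header and the rows are separate lists, it can be hard to tell which value belongs to which column when reading the printed rows. The following is a minimal sketch of one way to label the values. `describe_row` is a hypothetical helper that is not part of the original notebook, and the sample header is shortened to three columns for illustration.

```python
def describe_row(header, row):
    """Pair each column name with the matching value in a row.

    Hypothetical helper for illustration; not part of the original notebook.
    """
    if len(header) != len(row):
        raise ValueError('row length does not match header length')
    return dict(zip(header, row))


# Shortened sample taken from the Google Play rows printed earlier
sample_header = ['App', 'Category', 'Rating']
sample_row = ['Coloring book moana', 'ART_AND_DESIGN', '3.9']

labeled = describe_row(sample_header, sample_row)
for column, value in labeled.items():
    print(f'{column}: {value}')
```

With the real data, the same call would look like `describe_row(android_header, android[0])`.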


## Data Cleaning

### Deleting Incorrect Data

The Google Play Store Apps dataset on [Kaggle](https://www.kaggle.com/lava18/google-play-store-apps) has a dedicated discussion [forum](https://www.kaggle.com/lava18/google-play-store-apps/discussion). A user there reported an error in the row at index `10472`. Let's print that row, along with the header, to check whether the data is indeed wrong.

In [27]:
explore_data(android, 10472, 10473)
print('\n')
print(android_header)

['Life Made WI-Fi Touchscreen Photo Frame', '1.9', '19', '3.0M', '1,000+', 'Free', '0', 'Everyone', '', 'February 11, 2018', '1.0.19', '4.0 and up']




['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']


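As shown above, row `10472` has only 12 values instead of 13: the `Category` value is missing, so every later value is shifted one column to the left. As a result, `Rating` reads `19` (the maximum rating on Google Play is 5) and `Content Rating` is blank. We can take care of this by deleting the row.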

In [29]:
print(len(android))
del android[10472]  # don't run this more than once
print(len(android))

10841
10840
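A row like this can also be located without relying on a forum report, by comparing each row's length with the header's length. The sketch below uses a small made-up data set so that it runs on its own. `find_malformed_rows` is a hypothetical helper, not something from the original notebook.

```python
def find_malformed_rows(dataset, header):
    """Return the indexes of rows whose length differs from the header's.

    Hypothetical helper for illustration; not part of the original notebook.
    """
    return [index for index, row in enumerate(dataset)
            if len(row) != len(header)]


# Made-up sample data: the second row is missing its Category value
sample_header = ['App', 'Category', 'Rating']
sample_data = [
    ['App A', 'TOOLS', '4.1'],
    ['App B', '1.9'],
    ['App C', 'SOCIAL', '4.5'],
]

bad_indexes = find_malformed_rows(sample_data, sample_header)
print('Malformed row indexes:', bad_indexes)

# Keep only well-formed rows. Unlike `del`, this filter is safe to re-run.
cleaned = [row for row in sample_data if len(row) == len(sample_header)]
print('Rows kept:', len(cleaned))
```

Filtering into a new list has one practical advantage over `del android[10472]`: running the cell a second time leaves the result unchanged instead of deleting a valid row.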


### Removing Duplicate Entries

If we explore the Google Play data set or look at the discussion section, we'll notice that some apps have duplicate entries. For instance, Instagram has four entries:

In [32]:
for app in android:
    name = app[0]
    if name == 'Instagram':
        print(app)

['Instagram', 'SOCIAL', '4.5', '66577313', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']
['Instagram', 'SOCIAL', '4.5', '66577446', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']
['Instagram', 'SOCIAL', '4.5', '66577313', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']
['Instagram', 'SOCIAL', '4.5', '66509917', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']


In total, there are 1,181 duplicate entries, counting every occurrence of an app after its first:

In [33]:
duplicate_apps = []
unique_apps = []

for app in android:
    name = app[0]
    if name in unique_apps:
        duplicate_apps.append(name)
    else:
        unique_apps.append(name)
    
print('Number of duplicate apps:', len(duplicate_apps))
print('\n')
print('Examples of duplicate apps:', duplicate_apps[:15])

Number of duplicate apps: 1181


Examples of duplicate apps: ['Quick PDF Scanner + OCR FREE', 'Box', 'Google My Business', 'ZOOM Cloud Meetings', 'join.me - Simple Meetings', 'Box', 'Zenefits', 'Google Ads', 'Google My Business', 'Slack', 'FreshBooks Classic', 'Insightly CRM', 'QuickBooks Accounting: Invoicing & Expenses', 'HipChat - Chat Built for Teams', 'Xero Accounting Software']


Above, we:

- Created two lists: one for storing the names of duplicate apps, and one for storing the names of unique apps.
- Looped through the `android` data set (the Google Play data set), and for each iteration:
    - Saved the app name to a variable named `name`.
    - If `name` was already in the `unique_apps` list, appended `name` to the `duplicate_apps` list.
    - Otherwise (if `name` wasn't already in the `unique_apps` list), appended `name` to the `unique_apps` list.

As a side note, the code above uses the `in` operator to check for membership in a list. The `in` operator is not limited to dictionaries; it also works with lists.
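To illustrate the side note on `in`, here is a short sketch on a made-up list of app names. It also shows a variant of the duplicate search that stores the names already seen in a `set` instead of a list. That swap is an assumption on our part rather than the notebook's method: membership checks on a set do not scan every element, which matters when the data set has thousands of rows.

```python
# Made-up sample of app names (not taken from the real data set)
apps = ['Instagram', 'Box', 'Slack', 'Box', 'Instagram', 'Instagram']

# `in` checks membership in a list
print('Box' in apps)
print('Zoom' in apps)

# Same duplicate-finding logic as above, with a set for the names seen so far
seen = set()
duplicates = []
for name in apps:
    if name in seen:
        duplicates.append(name)
    else:
        seen.add(name)

print('Number of duplicate entries:', len(duplicates))
print('Duplicate entries:', duplicates)
```

As in the original loop, an app that appears three times contributes two entries to `duplicates`, so the count reflects extra occurrences rather than distinct app names.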

## References

<span id="fn1"> [1] Geskey, Ronald D., Sr. (2013). Media Planning & Buying in the 21st Century.</span>