# Profitable App Profiles for the App Store and Google Play Markets

For this project, we'll pretend we're working as data analysts for a company that builds Android and iOS mobile apps. We make our apps available on Google Play and the App Store.

We *only build apps that are free to download and install*, and our main source of revenue consists of in-app ads. This means our revenue for any given app is mostly influenced by the number of users who use our app — the more users that see and engage with the ads, the better. Our goal for this project is to analyze data to help our developers understand what type of apps are likely to attract more users.

In [1]:
# Importing the libraries

from csv import reader

In [2]:
# Opening the files and reading them into lists of rows
# (an explicit encoding avoids platform-dependent decoding errors)

with open('AppleStore.csv', encoding='utf8') as open_file_apple:
    apps_data_apple = list(reader(open_file_apple))

with open('googleplaystore.csv', encoding='utf8') as open_file_google:
    apps_data_google = list(reader(open_file_google))



In [3]:
# Function to explore the dataset

def explore_dataset(dataset, start, end, rows_and_columns=False):
    '''
    Print a slice of the dataset and, optionally, its dimensions.

    Parameters:
    dataset: the dataset, as a list of lists
    start: starting point for slicing
    end: ending point for slicing (exclusive)
    rows_and_columns: Boolean, False by default

    Output:
    The dataset rows after slicing
    Number of rows and columns (only if rows_and_columns is True)
    '''
    dataset_slice = dataset[start:end]

    for row in dataset_slice:
        print(row)
        print('\n')

    if rows_and_columns:
        print('Number of Rows: ', len(dataset))
        print('Number of Columns: ', len(dataset[0]))


In [4]:
# Let's explore the dataset
print('Android Header:', apps_data_google[0])
explore_dataset(apps_data_google, 1, 6, True)

Android Header: ['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']
['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['Coloring book moana', 'ART_AND_DESIGN', '3.9', '967', '14M', '500,000+', 'Free', '0', 'Everyone', 'Art & Design;Pretend Play', 'January 15, 2018', '2.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 and up']


['Pixel Draw - Number Art Coloring Book', 'ART_AND_DESIGN', '4.3', '967', '2.8M', '100,000+', 'Fr

# Data Cleaning

The Google Play data contains duplicate entries for some apps.
We will remove them, but not at random.
For each app, we will keep the entry with the highest number of reviews, since that entry is the most recent.

In [5]:
# Example of duplicate data

for app in apps_data_google:
    name = app[0]
    if name == 'Instagram':
        print(app)

['Instagram', 'SOCIAL', '4.5', '66577313', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']
['Instagram', 'SOCIAL', '4.5', '66577446', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']
['Instagram', 'SOCIAL', '4.5', '66577313', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']
['Instagram', 'SOCIAL', '4.5', '66509917', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']


In [10]:
# Let's count the number of duplicate apps

duplicate_apps = []
unique_apps = []

for app in apps_data_google:
    name = app[0]
    if name in unique_apps:
        duplicate_apps.append(name)
    else:
        unique_apps.append(name)

print('Number of Unique apps: ', len(unique_apps))
print('Number of Duplicate apps: ', len(duplicate_apps))


Number of Unique apps:  9661
Number of Duplicate apps:  1181
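
Checking membership in a list gets slower as the list grows. As an alternative sketch, `collections.Counter` can tally the app names in a single pass. The `count_duplicates` helper and the sample rows below are made up for illustration and are not taken from the dataset.

```python
from collections import Counter

def count_duplicates(rows):
    """Return (number of unique names, number of duplicate entries)."""
    # Tally how many times each app name (first column) appears
    name_counts = Counter(row[0] for row in rows)
    n_unique = len(name_counts)
    # Every occurrence beyond the first one counts as a duplicate
    n_duplicates = sum(name_counts.values()) - n_unique
    return n_unique, n_duplicates

# Hypothetical sample rows: [name, reviews]
sample_rows = [
    ['App A', '10'],
    ['App B', '20'],
    ['App A', '15'],
    ['App A', '12'],
]
print(count_duplicates(sample_rows))  # (2, 2)
```

This counts the same way as the loop above: the first occurrence of a name is unique, and each later occurrence is a duplicate.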


In [14]:
# In the review count column, we found a value that is not a number
# It was raising an error, so we checked it manually
# Now we have to remove it

for app in apps_data_google:
    name = app[0]
    n_reviews = app[3]
    if '3.0M' in n_reviews:
        print(app)


['Life Made WI-Fi Touchscreen Photo Frame', '1.9', '19', '3.0M', '1,000+', 'Free', '0', 'Everyone', '', 'February 11, 2018', '1.0.19', '4.0 and up']
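
The row printed above has 12 fields, while the header has 13. The category is missing, so every later value sits one column to the left and '3.0M' lands in the reviews position. A minimal sketch of how such rows could be found automatically, by comparing each row's length to the header's, is shown here. The `find_malformed_rows` helper and the sample data are hypothetical.

```python
def find_malformed_rows(dataset):
    """Return (index, row) pairs for rows whose length differs from the header's."""
    expected_length = len(dataset[0])
    return [
        (index, row)
        for index, row in enumerate(dataset)
        if len(row) != expected_length
    ]

# Hypothetical sample: a header, one valid row, and one row missing a field
sample_dataset = [
    ['App', 'Category', 'Rating', 'Reviews'],
    ['App A', 'TOOLS', '4.1', '159'],
    ['App B', '1.9', '19'],
]
print(find_malformed_rows(sample_dataset))  # [(2, ['App B', '1.9', '19'])]
```

The returned index can then be passed to `del` to drop the row from the list.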


In [8]:
# Removing the duplicate apps

'''
We will create a dictionary where each key is the name of a
unique app and each value is its highest number of reviews.

'''

reviews_max = {}
for app in apps_data_google[1:]:
    name = app[0]
    n_reviews = float(app[3])
    if name in reviews_max and reviews_max[name] < n_reviews:
        reviews_max[name] = n_reviews
    elif name not in reviews_max:
        reviews_max[name] = n_reviews
        
print(reviews_max)

ValueError: could not convert string to float: '3.0M'
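
The conversion fails because one row is missing a field, so the value at index 3 is '3.0M' instead of a review count. One possible way forward, sketched here on made-up rows rather than the real data, is to skip any row whose length differs from the header's and then keep, for each app, the entry with the most reviews. The `remove_duplicates` helper and the sample rows are assumptions for illustration.

```python
def remove_duplicates(dataset):
    """Drop malformed rows, then keep one row per app: the one with most reviews."""
    header, rows = dataset[0], dataset[1:]
    # Keep only rows that have as many fields as the header
    valid_rows = [row for row in rows if len(row) == len(header)]

    # Highest review count seen for each app name
    reviews_max = {}
    for row in valid_rows:
        name = row[0]
        n_reviews = float(row[3])
        if name not in reviews_max or reviews_max[name] < n_reviews:
            reviews_max[name] = n_reviews

    # Keep the first row that matches the highest count for each app
    clean_rows = []
    already_added = set()
    for row in valid_rows:
        name = row[0]
        if float(row[3]) == reviews_max[name] and name not in already_added:
            clean_rows.append(row)
            already_added.add(name)
    return clean_rows

# Hypothetical sample: header, three duplicates, one malformed row, one unique app
sample_dataset = [
    ['App', 'Category', 'Rating', 'Reviews'],
    ['Instagram', 'SOCIAL', '4.5', '100'],
    ['Instagram', 'SOCIAL', '4.5', '150'],
    ['Broken App', '1.9', '3.0M'],
    ['Instagram', 'SOCIAL', '4.5', '100'],
    ['Notes', 'TOOLS', '4.0', '7'],
]
for row in remove_duplicates(sample_dataset):
    print(row)
# ['Instagram', 'SOCIAL', '4.5', '150']
# ['Notes', 'TOOLS', '4.0', '7']
```

The `already_added` set matters when several duplicates share the same highest review count: without it, each of them would be kept.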