<u1><h1>Mobile App Profiles for the App Store (iOS) and Google Play Store (Android)</h1></u1>

<h2>Introduction</h2>
<p>We work with two datasets of mobile apps, one from the App Store and one from the Google Play Store. Each dataset contains important information about every app (e.g. rating, category, description, number of downloads).</p>

<h2>Goal of Analysis</h2>
<p>The goal of our analysis is to identify what drives the success of certain mobile apps in both markets and how that success can be replicated. The pandas package is deliberately not used: we work with plain Python data structures instead, to explore more fundamental methods of analysis.</p>

<h2> Data Collection </h2>
<p>The data comes from the following Kaggle datasets:
<ul>
  <li>About 10,000 Android apps on the Google Play Store: <a href="https://www.kaggle.com/lava18/google-play-store-apps/home">Dataset</a>
</li>
  <li>About 7,000 iOS apps from the App Store: <a href="https://www.kaggle.com/ramamet4/app-store-apple-data-set-10k-apps/home">Dataset</a>
</li>
</ul></p>

<h2>Importing Data</h2>

In [2]:
# import statements
from csv import reader

# retrieve a dataset from a CSV file as a list of lists
def retrieve_data(dataset, header):
    # the with-block closes the file once it has been read
    with open(dataset, encoding="utf8") as open_file:
        list_data = list(reader(open_file))

    # return the header separately if the file has one
    if header:
        return list_data[0], list_data[1:]

    return list_data

# convert the dataset CSV files into lists of lists, with the header split off
android_header, android_apps = retrieve_data('googleplaystore.csv', True)
ios_header, ios_apps = retrieve_data('AppleStore.csv', True)

<h2> Exploring the dataset </h2>
<p>Before we get too excited, let's explore what variables we are working with by printing the first 3 rows of both datasets, along with the header to clarify what each column represents.</p>
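
One way to see what each column represents is to pair the header with a row directly. The sketch below is only an illustration: `label_row` is a hypothetical helper, not part of the notebook's own code, and the sample header and row are shortened to three columns.

```python
# Hypothetical helper (not part of the original notebook):
# pair each column name with the matching value in a row.
def label_row(header, row):
    return dict(zip(header, row))

# Shortened sample: the first three Android columns only
sample_header = ['App', 'Category', 'Rating']
sample_row = ['Coloring book moana', 'ART_AND_DESIGN', '3.9']

labelled = label_row(sample_header, sample_row)
for name, value in labelled.items():
    print(name, '->', value)
```

Because `zip` stops at the shorter of its two inputs, a row with missing fields would be labelled silently rather than raising an error, so this is best used for quick inspection only.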

In [7]:
# print rows start_row through end_row (inclusive, zero-based) of the dataset
def print_data(dataset, start_row, end_row):
    subset = dataset[start_row:end_row+1]
    for row in subset:
        print(row, '\n')
        
# get the number of rows and columns of the dataset
def dimensions(dataset, header):
    return len(dataset), len(header)

# explore android dataset
print("-----Android data (Header)-----")
print(android_header, '\n')

print("-----Android data (Dimensions)-----")
rows, cols = dimensions(android_apps, android_header)
print("Number of rows:", rows, 
      "\nNumber of columns:", cols, 
      "\n")

print("-----Android data (First 3 rows)-----")
print_data(android_apps,1,3)

# explore ios dataset
print("-----IOS data (Header)-----")
print(ios_header, '\n')

print("-----IOS data (Dimensions)-----")
rows, cols = dimensions(ios_apps, ios_header)
print("Number of rows:", rows, 
      "\nNumber of columns:", cols, 
      "\n")

print("-----IOS data (First 3 rows)-----")
print_data(ios_apps,1,3)

-----Android data (Header)-----
['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver'] 

-----Android data (Dimensions)-----
Number of rows: 10841 
Number of columns: 13 

-----Android data (First 3 rows)-----
['Coloring book moana', 'ART_AND_DESIGN', '3.9', '967', '14M', '500,000+', 'Free', '0', 'Everyone', 'Art & Design;Pretend Play', 'January 15, 2018', '2.0.0', '4.0.3 and up'] 

['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up'] 

['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 and up'] 

-----IOS data (Header)-----
['id', 'track_name', 'size_bytes', 'currency', 'price', 'rating_count_tot', 'rating_count_ver', 'user_rating', 'user

<b>Comment:</b> The Android dataset has 13 columns ('App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver') and contains 10841 rows of Android app data.

The iOS dataset has 16 columns ('id', 'track_name', 'size_bytes', 'currency', 'price', 'rating_count_tot', 'rating_count_ver', 'user_rating', 'user_rating_ver', 'ver', 'cont_rating', 'prime_genre', 'sup_devices.num', 'ipadSc_urls.num', 'lang.num', 'vpp_lic') and contains 7197 rows of iOS app data.
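
The column counts above come from the header alone, so it can be worth checking that every data row has the same number of fields. The following is a minimal sketch under stated assumptions: `find_ragged_rows` is a hypothetical helper, and the toy rows are made up for illustration rather than taken from either dataset.

```python
# Hypothetical helper (not part of the original notebook):
# return (index, row length) for each row whose length differs from the header.
def find_ragged_rows(dataset, header):
    expected = len(header)
    return [(i, len(row)) for i, row in enumerate(dataset) if len(row) != expected]

# Made-up toy data, for illustration only
toy_header = ['App', 'Category', 'Rating']
toy_rows = [
    ['A', 'TOOLS', '4.1'],
    ['B', '4.5'],            # one field is missing here
    ['C', 'GAME', '3.9'],
]

ragged = find_ragged_rows(toy_rows, toy_header)
print(ragged)
```

The same call could be made with `android_apps` and `android_header`, or with `ios_apps` and `ios_header`; an empty list would mean every row matches the header length.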