# Profitable App Profiles for the App Store and Google Play Markets

![Image](https://encrypted-tbn0.gstatic.com/images?q=tbn:ANd9GcTZMG2LTP5xD1lZ9SxeiWuql2IJdUG9ct4Ukg&usqp=CAU)

Table of Contents
=================

   * [1. Project Overview](#chapter1)
   * [2. Getting Started](#chapter2)
       * [2.1 Importing relevant libraries and checking their versions](#chapter2.1)
       * [2.2 Reading in datasets as lists of lists](#chapter2.2)
       * [2.3 Exploring the datasets](#chapter2.3)
       
       
   * [3. Reading In Data from Local Folder](#-3.-Reading-In-Data-from-Local-Folder)
       * [3.1 Reading in a CSV file](##-3.1-Reading-in-a-CSV-file)
       * [3.2 Reading in an Excel file](##-3.2-Reading-in-an-Excel-file)
       * [3.3 Reading in a JSON file](##-3.3-Reading-in-a-JSON-file)
       
       
   * [4. Conclusion](#-4.-Conclusion)

## 1. Project Overview <a class="anchor" id="chapter1"></a>

This is a basic Python project that covers data cleaning and exploratory data analysis (EDA). *We deliberately avoid the NumPy and pandas libraries here and work with plain Python instead.*

For this project, we'll pretend we're working as data analysts for a company that builds Android and iOS mobile apps. We make our apps available on Google Play and in the App Store.

We only build apps that are free to download and install, and our main source of revenue consists of in-app ads. This means that the number of users of our apps determines our revenue for any given app — the more users who see and engage with the ads, the better. Our goal for this project is to analyze data to help our developers understand what type of apps are likely to attract more users.



## 2. Getting Started <a class="anchor" id="chapter2"></a>

### 2.1 Importing relevant libraries and checking their versions <a class="anchor" id="chapter2.1"></a>

In [86]:
# Importing libraries
import requests
from csv import reader

In [87]:
# Printing versions of Python modules and packages with watermark, an IPython magic extension.
%load_ext watermark

%watermark -v -p requests,csv

Python implementation: CPython
Python version       : 3.7.10
IPython version      : 7.29.0

requests: 2.26.0
csv     : 1.0
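
If the watermark extension is not available, a rough equivalent can be put together with the standard library. The sketch below is only an illustration: `describe_environment` is a hypothetical helper name, not part of watermark, and it assumes a module exposes its version through a `__version__` attribute, which not every module does.

```python
import csv
import platform

def describe_environment(modules):
    """Return one line for the interpreter details, then one per module."""
    lines = [
        f"Python implementation: {platform.python_implementation()}",
        f"Python version       : {platform.python_version()}",
    ]
    for module in modules:
        # Not every module defines __version__, so fall back to a placeholder.
        version = getattr(module, "__version__", "unknown")
        lines.append(f"{module.__name__}: {version}")
    return lines

for line in describe_environment([csv]):
    print(line)
```

Third-party modules such as `requests` can be passed in the same list once they are imported.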



### 2.2 Reading in datasets as lists of lists <a class="anchor" id="chapter2.2"></a>

As of the first quarter of 2021, there were approximately 2 million iOS apps available on the App Store and 3.5 million Android apps on Google Play[<sup>1</sup>](#fn1).

Collecting data for over 5 million apps would require a significant amount of time and money, so we'll analyze a sample of the data instead. To avoid spending resources on collecting new data ourselves, we should first check whether relevant data already exists at no cost. Luckily, there are two datasets that seem suitable for our goals:

* [A dataset](https://www.kaggle.com/lava18/google-play-store-apps) containing data about approximately 10,000 Android apps from Google Play; the data was collected in August 2018. You can download the data set directly from this [link](https://dq-content.s3.amazonaws.com/350/googleplaystore.csv).
* [A dataset](https://www.kaggle.com/ramamet4/app-store-apple-data-set-10k-apps) containing data about approximately 7,000 iOS apps from the App Store; the data was collected in July 2017. You can download the data set directly from this [link](https://dq-content.s3.amazonaws.com/350/AppleStore.csv).

Next, we define a function, `open_data`, that downloads the two datasets mentioned above and stores each one as a *list of lists*.

In [88]:
# Saving links as variables
url_android = 'https://dq-content.s3.amazonaws.com/350/googleplaystore.csv'
url_ios = 'https://dq-content.s3.amazonaws.com/350/AppleStore.csv'

# Defining the `open_data` function
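def open_data(url):
    """
    Takes a link to a CSV file as input and
    returns the dataset as a list of lists.
    """
    response = requests.get(url)
    response.raise_for_status()  # stop early if the download failed
    # Decode as UTF-8 explicitly; for text responses without a declared charset,
    # requests may fall back to ISO-8859-1 and garble non-ASCII characters.
    response.encoding = 'utf-8'
    text = response.iter_lines(decode_unicode=True)
    read_file = reader(text, delimiter=',')
    return list(read_file)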

# Loading both datasets
android = open_data(url_android)   
android_header = android[0]
android = android[1:]

ios = open_data(url_ios) 
ios_header = ios[0]
ios = ios[1:]
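
To see what the parsing step does without touching the network, here is a minimal offline sketch. It assumes the file is UTF-8 encoded and uses a two-line sample built from one of the Android rows; the variable names `raw`, `lines`, `rows`, and `garbled` are illustrative only.

```python
from csv import reader

# One header line and one data row, encoded as UTF-8 bytes the way a server would send them.
raw = (
    'App,Installs\n'
    '"U Launcher Lite – FREE Live Cool Themes, Hide Apps","5,000,000+"\n'
).encode('utf-8')

# Decoding with the right codec keeps the en dash intact.
lines = raw.decode('utf-8').splitlines()

# csv.reader accepts any iterable of strings and respects quoted commas.
rows = list(reader(lines))
print(rows[0])
print(rows[1])

# Decoding the same bytes as Latin-1 turns the en dash into three stray characters.
garbled = raw.decode('latin-1')
print('â\x80\x93' in garbled)
```

Each parsed row has exactly two fields even though both values contain commas, because the quotes tell `reader` to treat them as part of the field. The last line prints `True`, which shows how a wrong codec produces garbled app names.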

### 2.3 Exploring the datasets <a class="anchor" id="chapter2.3"></a>

To make it easier to explore the two datasets, we'll first write a function named `explore_data()` that we can use repeatedly to explore rows in a more readable way. We'll also add an option for our function to show the number of rows and columns for any dataset.

In [89]:
# Defining the `explore_data` function
def explore_data(dataset, start, end, rows_and_columns=False):
    """
    This function loops through the slice of a dataset, 
    and for each iteration, prints a row and adds a new line after that row, 
    and prints the number of rows and columns if `rows_and_columns` is `True`
    """
    dataset_slice = dataset[start:end]    
    for row in dataset_slice:
        print(row)
        print('\n') # adds a new (empty) line after each row

    if rows_and_columns:
        print('Number of rows:', len(dataset))
        print('Number of columns:', len(dataset[0]))
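
Two details of this approach are worth keeping in mind: slicing past the end of a list is safe in Python, but `dataset[0]` raises an `IndexError` on an empty dataset. The sketch below illustrates both with a toy dataset; `summarize_shape` is a hypothetical helper, not part of the project code.

```python
def summarize_shape(dataset):
    """Return (number of rows, number of columns), tolerating an empty dataset."""
    n_rows = len(dataset)
    # Guard against IndexError when there are no rows to inspect.
    n_cols = len(dataset[0]) if dataset else 0
    return n_rows, n_cols

sample = [['a', 'b', 'c'], ['d', 'e', 'f']]

# Asking for more rows than exist simply returns what is available.
print(sample[0:5])

print(summarize_shape(sample))
print(summarize_shape([]))
```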

In [90]:
# Exploring the column names
print(android_header)
print('\n')
print(ios_header)

['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']


['id', 'track_name', 'size_bytes', 'currency', 'price', 'rating_count_tot', 'rating_count_ver', 'user_rating', 'user_rating_ver', 'ver', 'cont_rating', 'prime_genre', 'sup_devices.num', 'ipadSc_urls.num', 'lang.num', 'vpp_lic']


In [91]:
# Exploring the first three rows of each dataset
explore_data(android, 0, 3, True)
print('\n')
explore_data(ios, 0, 3, True)

['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['Coloring book moana', 'ART_AND_DESIGN', '3.9', '967', '14M', '500,000+', 'Free', '0', 'Everyone', 'Art & Design;Pretend Play', 'January 15, 2018', '2.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


Number of rows: 10855
Number of columns: 13


['284882215', 'Facebook', '389879808', 'USD', '0.0', '2974676', '212', '3.5', '3.5', '95.0', '4+', 'Social Networking', '37', '1', '29', '1']


['389801252', 'Instagram', '113954816', 'USD', '0.0', '2161558', '1289', '4.5', '4.0', '10.23', '12+', 'Photo & Video', '37', '0', '29', '1']


['529479190', 'Clash of Clans', '116476928', 'USD', '0.0', '2130805', '579', '4.5', '4.5', '9.24.

At a quick glance, the iOS columns that might be useful for our analysis are *'track_name'*, *'price'*, *'rating_count_tot'*, *'user_rating'*, and *'prime_genre'*.

For the Android dataset, the columns that might be useful are *'App'*, *'Category'*, *'Rating'*, *'Reviews'*, *'Installs'*, *'Type'*, *'Price'*, and *'Genres'*.
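
Because each dataset is a plain list of lists, a column is reached by its position in the row. One way to avoid hard-coding those positions is to look them up in the header, as in the sketch below. The header lists and the sample row are copied from the output above; `column_indexes` is a hypothetical helper name.

```python
android_header = ['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type',
                  'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver',
                  'Android Ver']
ios_header = ['id', 'track_name', 'size_bytes', 'currency', 'price', 'rating_count_tot',
              'rating_count_ver', 'user_rating', 'user_rating_ver', 'ver', 'cont_rating',
              'prime_genre', 'sup_devices.num', 'ipadSc_urls.num', 'lang.num', 'vpp_lic']

def column_indexes(header, names):
    """Map each column name to its position in the header row."""
    return {name: header.index(name) for name in names}

android_cols = column_indexes(
    android_header,
    ['App', 'Category', 'Rating', 'Reviews', 'Installs', 'Type', 'Price', 'Genres'],
)
ios_cols = column_indexes(
    ios_header,
    ['track_name', 'price', 'rating_count_tot', 'user_rating', 'prime_genre'],
)

# First Android row from the exploration output.
first_android_row = ['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN',
                     '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone',
                     'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']

print(android_cols)
print(ios_cols)
print(first_android_row[android_cols['Installs']])
```

Note that `list.index` raises a `ValueError` for a name that is not in the header, which makes typos in column names easy to catch.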

## Footnotes

<span id="fn1"> 1. [Number of apps available in leading app stores as of 1st quarter 2021, Statista](https://www.statista.com/statistics/276623/number-of-apps-available-in-leading-app-stores/)</span>