# Analysis of mobile app sales

The data files contain information on apps listed in the [Google Play](https://www.kaggle.com/lava18/google-play-store-apps) and [Apple App Store](https://dq-content.s3.amazonaws.com/350/AppleStore.csv). The objective of this project is to identify the types of apps that generate the most user downloads. To keep the analysis simple, only free apps are considered; paid apps would need to be segmented separately. Developers can use the results to decide which free apps to build next and so increase revenue.
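Since only free apps are considered, one way to apply that filter is sketched below. It assumes the column positions documented in the column tables further down (the Apple `price` column at index 4, the Google `Price` column at index 7) and that paid Google prices carry a `$` prefix. The helper name `free_apps` and the sample rows are hypothetical.

```python
def free_apps(rows, price_index):
    """Return only the rows whose price column parses to zero.

    price_index is the 0-based position of the price column:
    4 for the Apple data, 7 for the Google data.
    """
    free = []
    for row in rows:
        # Assumption: paid Google prices look like '$4.99', so strip the '$'.
        price = float(row[price_index].replace('$', ''))
        if price == 0:
            free.append(row)
    return free


# Hypothetical, shortened rows (only the columns up to the price are kept).
apple_sample = [
    ['1', 'Free app', '1000', 'USD', '0.0'],
    ['2', 'Paid app', '1000', 'USD', '2.99'],
]
google_sample = [
    ['Free app', 'GAME', '4.1', '159', '19M', '10,000+', 'Free', '0'],
    ['Paid app', 'GAME', '4.5', '20', '5M', '1,000+', 'Paid', '$4.99'],
]

print([row[1] for row in free_apps(apple_sample, 4)])   # ['Free app']
print([row[0] for row in free_apps(google_sample, 7)])  # ['Free app']
```

Filtering by the parsed price rather than the Google `Type` column keeps the same function usable for both stores.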

```python
def explore_data(dataset, start, end, rows_and_columns=False):
    """Print dataset[start:end] row by row, optionally followed by its shape."""
    dataset_slice = dataset[start:end]
    for row in dataset_slice:
        print(row)
        print('\n')  # print('\n') emits two newlines, leaving blank lines between rows

    if rows_and_columns:
        print('Number of rows:', len(dataset))
        print('Number of columns:', len(dataset[0]))
```

The app data is saved as a list of lists in the variables `apple` and `google`.
The header rows are stored separately as `apple_header` and `google_header`.

```python
from csv import reader

# Apple file: 'with' closes the file after reading; newline='' is what the csv module expects
with open('C:/Users/User/Documents/my_datasets/AppleStore.csv', encoding='utf8', newline='') as opened_file:
    apple = list(reader(opened_file))
apple_header = apple[0]
apple = apple[1:]

# Android file
with open('C:/Users/User/Documents/my_datasets/googleplaystore.csv', encoding='utf8', newline='') as opened_file:
    google = list(reader(opened_file))
google_header = google[0]
google = google[1:]
```
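The same read-and-split steps are repeated for each file, so they could be wrapped in a small helper. The sketch below (the name `load_rows` is hypothetical) accepts any open file object, which lets it be demonstrated on an in-memory `io.StringIO` with made-up CSV text instead of the real files.

```python
import io
from csv import reader


def load_rows(file_obj):
    """Read a CSV file object and return (header, data_rows) as lists."""
    rows = list(reader(file_obj))
    return rows[0], rows[1:]


# Hypothetical three-column sample standing in for a real CSV file.
sample = io.StringIO(
    'id,track_name,price\n'
    '284882215,Facebook,0.0\n'
    '389801252,Instagram,0.0\n'
)
header, data = load_rows(sample)
print(header)     # ['id', 'track_name', 'price']
print(len(data))  # 2
```

With a real file, the call would sit inside the `with open(..., encoding='utf8', newline='') as f:` block, e.g. `apple_header, apple = load_rows(f)`.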

```python
# Explore the first three rows of the Apple Store data.
explore_data(apple, 0, 3, True)
```

```
['284882215', 'Facebook', '389879808', 'USD', '0.0', '2974676', '212', '3.5', '3.5', '95.0', '4+', 'Social Networking', '37', '1', '29', '1']


['389801252', 'Instagram', '113954816', 'USD', '0.0', '2161558', '1289', '4.5', '4.0', '10.23', '12+', 'Photo & Video', '37', '0', '29', '1']


['529479190', 'Clash of Clans', '116476928', 'USD', '0.0', '2130805', '579', '4.5', '4.5', '9.24.12', '9+', 'Games', '38', '5', '18', '1']


Number of rows: 7197
Number of columns: 16
```


|Column name (1-based position)|Description|
|---|---|
| `id` (1) | App ID |
| `track_name` (2) | App name |
| `size_bytes` (3) | Size (in bytes) |
| `currency` (4) | Currency type |
| `price` (5) | Price amount |
| `rating_count_tot` (6) | User rating count (all versions) |
| `rating_count_ver` (7) | User rating count (current version) |
| `user_rating` (8) | Average user rating (all versions) |
| `user_rating_ver` (9) | Average user rating (current version) |
| `ver` (10) | Latest version code |
| `cont_rating` (11) | Content rating |
| `prime_genre` (12) | Primary genre |
| `sup_devices.num` (13) | Number of supported devices |
| `ipadSc_urls.num` (14) | Number of screenshots shown for display |
| `lang.num` (15) | Number of supported languages |
| `vpp_lic` (16) | VPP device-based licensing enabled |
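The positions in the table are 1-based, while Python list indices start at 0, so `user_rating` (8) is read as `row[7]`. Looking columns up by name avoids that off-by-one trap. A minimal sketch, assuming the header list below matches the CSV header row stored in `apple_header` (here it is written out by hand):

```python
# Assumption: this mirrors the Apple CSV header row (apple_header in the notebook).
apple_header = ['id', 'track_name', 'size_bytes', 'currency', 'price',
                'rating_count_tot', 'rating_count_ver', 'user_rating',
                'user_rating_ver', 'ver', 'cont_rating', 'prime_genre',
                'sup_devices.num', 'ipadSc_urls.num', 'lang.num', 'vpp_lic']

# Map each column name to its 0-based list index.
col = {name: i for i, name in enumerate(apple_header)}

# First data row from the output above.
facebook = ['284882215', 'Facebook', '389879808', 'USD', '0.0', '2974676',
            '212', '3.5', '3.5', '95.0', '4+', 'Social Networking',
            '37', '1', '29', '1']

print(col['user_rating'])            # 7
print(facebook[col['prime_genre']])  # Social Networking
```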

```python
# Explore the first three rows of the Google Play data.
explore_data(google, 0, 3, True)
```

```
['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['Coloring book moana', 'ART_AND_DESIGN', '3.9', '967', '14M', '500,000+', 'Free', '0', 'Everyone', 'Art & Design;Pretend Play', 'January 15, 2018', '2.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


Number of rows: 10841
Number of columns: 13
```


|Column name (1-based position)|Description|
|---|---|
|`App` (1)| Name of app|
|`Category` (2)| App category|
|`Rating` (3)| Overall app rating (0-5)|
|`Reviews` (4)| Count of total reviews|
|`Size` (5)| App size as text, e.g. '19M'|
|`Installs` (6)| Install count bracket, e.g. '10,000+'|
|`Type` (7)| Paid vs. Free|
|`Price` (8)| Price of app ('0' for free apps)|
|`Content Rating` (9)| Targeted age group|
|`Genres` (10)| Semicolon-separated list of genres|
|`Last Updated` (11)| Date of last update|
|`Current Ver` (12)| Current version number|
|`Android Ver` (13)| Minimum required Android version|
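`Installs` holds a bracket string such as '10,000+' rather than a number, so ranking apps by downloads needs a conversion first. A minimal sketch that reads each bracket as its lower bound; the helper name `installs_to_int` is hypothetical, and the app names in the sample are shortened from the output above.

```python
def installs_to_int(installs):
    """Convert an install bracket like '10,000+' to its lower bound as an int."""
    return int(installs.replace(',', '').replace('+', ''))


print(installs_to_int('10,000+'))     # 10000
print(installs_to_int('5,000,000+'))  # 5000000

# Rank (name, installs) pairs by their lower-bound install count, highest first.
apps = [('Photo Editor', '10,000+'),
        ('Coloring book moana', '500,000+'),
        ('U Launcher Lite', '5,000,000+')]
ranked = sorted(apps, key=lambda app: installs_to_int(app[1]), reverse=True)
print([name for name, _ in ranked])
```

Because the brackets only give lower bounds, apps within the same bracket cannot be told apart; this is a limitation of the data, not of the conversion.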

```python
# Ratings must lie between 0 and 5. Search for out-of-range values in the Google data.
error_index = []
for index, row in enumerate(google):
    rating = float(row[2])  # 'Rating' is column 3, i.e. index 2
    if rating < 0 or rating > 5:
        error_index.append(index)
print(error_index)
```

```
[10472]
```
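The check above only flags values that parse as numbers outside 0-5. A missing rating stored as the string 'NaN' would convert to `float('nan')`, which fails both comparisons and is never flagged, and a non-numeric string would stop the loop with a `ValueError`. A slightly more defensive sketch (the helper name `invalid_ratings` and the sample rows are hypothetical) reports the three cases separately:

```python
import math


def invalid_ratings(rows, col, low=0.0, high=5.0):
    """Return (out_of_range, unparseable, missing) lists of row indices."""
    out_of_range, unparseable, missing = [], [], []
    for index, row in enumerate(rows):
        try:
            rating = float(row[col])
        except ValueError:
            unparseable.append(index)  # e.g. text shifted into the rating column
            continue
        if math.isnan(rating):
            missing.append(index)      # 'NaN' parses, but is not a real rating
        elif not low <= rating <= high:
            out_of_range.append(index)
    return out_of_range, unparseable, missing


# Hypothetical rows with the rating at index 2, as in the Google data.
rows = [['a', 'X', '4.1'], ['b', 'X', '19'], ['c', 'X', 'NaN'], ['d', 'X', 'n/a']]
print(invalid_ratings(rows, 2))  # ([1], [3], [2])
```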


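```python
# Delete the erroneous row. Run this cell only once: re-running it would
# delete the valid row that has moved into index 10472.
del google[10472]
```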
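Because `del` shifts every later row down by one, the deletion is not safe to repeat. An alternative sketch keeps the raw list intact and derives a cleaned copy from the flagged indices (for example the `error_index` list computed above), so re-running the cell always gives the same result. The helper name `drop_rows` and the sample data are hypothetical.

```python
def drop_rows(rows, bad_indices):
    """Return a new list without the rows at bad_indices; rows is not modified."""
    bad = set(bad_indices)
    return [row for index, row in enumerate(rows) if index not in bad]


raw = [['ok 1'], ['broken'], ['ok 2']]  # hypothetical data
clean = drop_rows(raw, [1])
clean = drop_rows(raw, [1])             # re-running gives the same result
print(clean)     # [['ok 1'], ['ok 2']]
print(len(raw))  # 3
```

In the notebook this would read `google_clean = drop_rows(google, error_index)`, with later cells working on `google_clean`.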

```python
# Ratings must lie between 0 and 5. Search for out-of-range values in the Apple data.
error_index1 = []  # rows with an invalid user_rating (index 7)
error_index2 = []  # rows with an invalid user_rating_ver (index 8)

for index, row in enumerate(apple):
    rating = float(row[7])
    if rating < 0 or rating > 5:
        error_index1.append(index)

    rating_ver = float(row[8])
    if rating_ver < 0 or rating_ver > 5:
        error_index2.append(index)

print(error_index1, error_index2)
```

```
[] []
```
