# Analysis of mobile app data from the Apple and Android Stores

The intent of this analysis is to understand what types of apps are most likely to attract users on both Google Play and the App Store. The analysis also exercises the skills learned so far in this Python course.

Further information about the datasets used in the analysis can be found here: 

| Google Play Store | Apple App Store |
|-------- | ---------|
|[Kaggle Link](https://www.kaggle.com/datasets/lava18/google-play-store-apps) | [Kaggle Link](https://www.kaggle.com/datasets/ramamet4/app-store-apple-data-set-10k-apps)|



In [2]:
from csv import reader

### Google Play Store ###
with open('googleplaystore.csv', encoding='utf8') as opened_file:
    android = list(reader(opened_file))
android_header = android[0]
android = android[1:]

### Apple App Store ###
with open('AppleStore.csv', encoding='utf8') as opened_file:
    ios = list(reader(opened_file))
ios_header = ios[0]
ios = ios[1:]

def explore_data(dataset, start, end, rows_and_columns=False):
    dataset_slice = dataset[start:end]    
    for row in dataset_slice:
        print(row)
        print('\n') # adds a new (empty) line after each row

    if rows_and_columns:
        print('Number of rows:', len(dataset))
        print('Number of columns:', len(dataset[0]))

print('Google Play Store:')        
explore_data(android,0,4,True)

print('Apple App Store:')
explore_data(ios,0,4,True)

Google Play Store:
['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['Coloring book moana', 'ART_AND_DESIGN', '3.9', '967', '14M', '500,000+', 'Free', '0', 'Everyone', 'Art & Design;Pretend Play', 'January 15, 2018', '2.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 and up']


Number of rows: 10841
Number of columns: 13
Apple App Store:
['284882215', 'Facebook', '389879808', 'USD', '0.0', '2974676', '212', '3.5', '3.5', '95.0', '4+', 'Social Networking', '37', '1', '29', '1']


['389801252', 'Instagram', '113954816', '

In [3]:
print(android[9148])

['Command & Conquer: Rivals', 'FAMILY', 'NaN', '0', 'Varies with device', '0', 'NaN', '0', 'Everyone 10+', 'Strategy', 'June 28, 2018', 'Varies with device', 'Varies with device']


## Deletion of duplicate entries

### Part One

Several apps appear more than once, as the code below shows by printing every entry named "Instagram". This section removes the duplicate rows from the dataset.

In [4]:
for app in android: 
    name = app[0]
    if name == 'Instagram':
        print(app)

['Instagram', 'SOCIAL', '4.5', '66577313', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']
['Instagram', 'SOCIAL', '4.5', '66577446', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']
['Instagram', 'SOCIAL', '4.5', '66577313', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']
['Instagram', 'SOCIAL', '4.5', '66509917', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']


Count the number of duplicate apps in the Google Play dataset:

In [5]:
duplicate_apps=[]
unique_apps=[]

for app in android: 
    name = app[0]
    if name in unique_apps:
        duplicate_apps.append(name)
    else:
        unique_apps.append(name)

len_unique = len(unique_apps)
len_duplicate = len(duplicate_apps)

print('Number of duplicate apps: ', len_duplicate)
print('\n')
print('Examples of duplicate apps: ', duplicate_apps[:15])

Number of duplicate apps:  1181


Examples of duplicate apps:  ['Quick PDF Scanner + OCR FREE', 'Box', 'Google My Business', 'ZOOM Cloud Meetings', 'join.me - Simple Meetings', 'Box', 'Zenefits', 'Google Ads', 'Google My Business', 'Slack', 'FreshBooks Classic', 'Insightly CRM', 'QuickBooks Accounting: Invoicing & Expenses', 'HipChat - Chat Built for Teams', 'Xero Accounting Software']
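
The loop above tests `name in unique_apps` against a list, which rescans the list for every app. As a minimal sketch, the same count could be obtained with `collections.Counter` from the standard library. The `sample_rows` data below is made up for illustration and is not taken from the dataset.

```python
from collections import Counter

# Hypothetical rows: only the first column (the app name) matters here.
sample_rows = [
    ['Box', 'BUSINESS'],
    ['Slack', 'BUSINESS'],
    ['Box', 'BUSINESS'],
    ['Box', 'BUSINESS'],
    ['Slack', 'BUSINESS'],
    ['Zenefits', 'BUSINESS'],
]

name_counts = Counter(row[0] for row in sample_rows)

# Every occurrence after the first counts as a duplicate,
# which matches how the loop above fills duplicate_apps.
n_duplicates = sum(count - 1 for count in name_counts.values())
n_unique = len(name_counts)

print('Unique names:', n_unique)
print('Duplicate rows:', n_duplicates)
```

A name that appears three times contributes two duplicates, so this agrees with the list-based count while looking each name up in constant time.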


### Part Two

Duplicates will be removed by keeping, for each app, the entry with the highest number of reviews. A higher review count suggests that the entry was collected more recently, so it should be the most up to date.

We will start by building a dictionary.

### Removal of problem entry

One entry must be removed first: its review count is listed as '3.0M', which cannot be converted into a float. The row has only 12 values because its category is missing, so every later value is shifted one column to the left. A check is included so that no additional rows are deleted if the code is run twice.

In [6]:
bad_index = None  # stays None when no malformed row is found
for index, app in enumerate(android):
    if app[3] == '3.0M':
        print(app)
        bad_index = index

print(bad_index)
print(bad_index is not None)

if bad_index is not None:
    del android[bad_index]
else:
    print('No bad entries')

['Life Made WI-Fi Touchscreen Photo Frame', '1.9', '19', '3.0M', '1,000+', 'Free', '0', 'Everyone', '', 'February 11, 2018', '1.0.19', '4.0 and up']
10472
True
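
The row printed above has 12 values while the dataset has 13 columns, so a more general check is to compare each row's length with the header's. The sketch below uses a hypothetical helper, `find_malformed_rows`, and made-up three-column data. It is one possible way to spot such rows, not the method used in this notebook.

```python
def find_malformed_rows(rows, expected_len):
    """Return (index, row) pairs whose length differs from expected_len."""
    return [(i, row) for i, row in enumerate(rows) if len(row) != expected_len]

# Hypothetical three-column data; the second row is missing its category.
sample_header = ['App', 'Category', 'Reviews']
sample_rows = [
    ['Box', 'BUSINESS', '159'],
    ['Photo Frame', '19'],
    ['Slack', 'BUSINESS', '967'],
]

bad_rows = find_malformed_rows(sample_rows, len(sample_header))
print(bad_rows)
```

Unlike matching the literal string '3.0M', a length check does not depend on knowing the bad value in advance.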


In [7]:
reviews_max = {}

for app in android: 
    name = app[0]
    n_reviews = float(app[3])
    
    if name in reviews_max and reviews_max[name] < n_reviews:
        reviews_max[name] = n_reviews
    if name not in reviews_max: 
        reviews_max[name] = n_reviews
            
            
print(len(reviews_max))

9659


Now we will use the dictionary to remove the duplicate rows. We loop through the android list and keep each row whose number of reviews matches the highest review count recorded for its name in the dictionary. The already_added list ensures that only one row is kept when several duplicates share the same highest count.

We then print the length of the android_clean list to make sure it matches the length of the reviews_max dictionary, which is the number of unique app names in the dataset.

In [9]:
android_clean = []
already_added = [] 

for app in android: 
    name = app[0]
    n_reviews = float(app[3])
    
    if n_reviews == reviews_max[name] and name not in already_added:
        android_clean.append(app)
        already_added.append(name)
        
explore_data(android_clean,0,3,True)


['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 and up']


Number of rows: 9659
Number of columns: 13
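
The two passes above (build `reviews_max`, then filter with `already_added`) could also be folded into a single pass by storing the best row itself in the dictionary. The sketch below assumes a hypothetical helper, `keep_most_reviewed`, and made-up two-column rows. Note that it returns rows in order of each name's first appearance, which can differ from the order produced by the cell above.

```python
def keep_most_reviewed(rows, name_idx=0, reviews_idx=1):
    """Keep one row per app name: the first row with the highest review count."""
    best = {}
    for row in rows:
        name = row[name_idx]
        n_reviews = float(row[reviews_idx])
        # Strict > keeps the earliest row when review counts tie.
        if name not in best or n_reviews > float(best[name][reviews_idx]):
            best[name] = row
    return list(best.values())

# Hypothetical rows: [name, reviews]
sample_rows = [
    ['Instagram', '66577313'],
    ['Box', '159'],
    ['Instagram', '66577446'],
    ['Instagram', '66577313'],
]

deduped = keep_most_reviewed(sample_rows)
print(deduped)
```

Because dictionary lookups are constant time, this avoids the repeated `name not in already_added` scan over a growing list.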


## Removal of non-English apps

### Part One

We will now remove non-English app listings from the android_clean dataset by checking whether the characters in the name field fall within the ASCII range (code points 0 to 127) used for English text. Below are some entries with non-English names:

In [10]:
print(android_clean[4412][0])
print(android_clean[7940][0])

中国語 AQリスニング
لعبة تقدر تربح DZ


In [19]:
def check_english(string):
    for character in string: 
        if ord(character) > 127:
            return False
        
    return True 
    
print(check_english('Instagram'))
print(check_english('爱奇艺PPS -《欢乐颂2》电视剧热播'))
print(check_english('Docs To Go™ Free Office Suite'))
print(check_english('Instachat 😜'))

True
False
False
False


### Part Two

We need to add a condition so that we don't remove apps whose names contain characters such as ™ or emoji. We will update the function to return False only if three or more non-ASCII characters are present.

In [20]:
print(ord('😜'))

128540


In [24]:
def is_english(string): 
    count = 0
    for character in string: 
        if ord(character) > 127:
            count +=1     
    if count >= 3:
        return False
        
    return True 

print(is_english('Instagram'))
print(is_english('爱奇艺PPS -《欢乐颂2》电视剧热播'))
print(is_english('Docs To Go™ Free Office Suite'))
print(is_english('Instachat 😜'))

True
False
True
True
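
To see which characters the check actually counts, the test can be split into two small helpers. The names `non_ascii_chars` and `is_english_with_limit` are hypothetical, and the default `max_non_ascii=2` is chosen to mirror the `count >= 3` condition above. This is a sketch, not a replacement for `is_english`.

```python
def non_ascii_chars(string):
    """Return the characters whose code point is above 127."""
    return [ch for ch in string if ord(ch) > 127]

def is_english_with_limit(string, max_non_ascii=2):
    """True if the string has at most max_non_ascii non-ASCII characters."""
    return len(non_ascii_chars(string)) <= max_non_ascii

for name in ['Docs To Go™ Free Office Suite', 'Instachat 😜', '中国語 AQリスニング']:
    print(name, non_ascii_chars(name), is_english_with_limit(name))
```

Making the limit a parameter keeps the threshold visible in one place, so it is easy to compare how many apps survive under a stricter or looser setting.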


In [26]:
android_english = []
ios_english = []

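for app in android_clean:
    name = app[0]
    if is_english(name):
        android_english.append(app)

# Caution: in the iOS data, column 0 is the numeric app id and the app
# name is in column 1, so this check on app[0] lets every row through.
# Use app[1] to filter on the name.
for app in ios: 
    name = app[0]
    if is_english(name):
        ios_english.append(app)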

explore_data(android_english,0,3,True)
explore_data(ios_english,0,3,True) 

['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 and up']


Number of rows: 9597
Number of columns: 13
['284882215', 'Facebook', '389879808', 'USD', '0.0', '2974676', '212', '3.5', '3.5', '95.0', '4+', 'Social Networking', '37', '1', '29', '1']


['389801252', 'Instagram', '113954816', 'USD', '0.0', '2161558', '1289', '4.5', '4.0', '10.23', '12+', 'Photo & Video', '37', '0', '29', '1']


['529479190', 'Clash of Clans', '116476928', 'USD', '0.0', '2130805', '579', '4.5', '4.5', '9.24.12', '9+', 'G

## Isolating the Free Apps

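Now we will isolate the free apps. The price is stored as a string in column 7 of the Google Play data and in column 4 of the App Store data, so we compare against '0' and '0.0' respectively.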

In [27]:
android_free = []
ios_free = []

for app in android_english: 
    price = app[7]
    if price == '0':
        android_free.append(app)

for app in ios_english: 
    price = app[4]
    if price == '0.0':
        ios_free.append(app)
        
print(len(android_free))
print(len(ios_free))


8848
4056
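
Hard-coded positions such as `app[7]` and `app[4]` are easy to mix up between the two datasets. One possible alternative, sketched below, is to look the position up from the header row. The helper `filter_by_column` and the small `sample_header` and `sample_rows` are hypothetical and only illustrate the idea.

```python
def filter_by_column(rows, header, column_name, wanted):
    """Return the rows whose value in column_name equals wanted."""
    idx = header.index(column_name)  # raises ValueError if the column is absent
    return [row for row in rows if row[idx] == wanted]

# Hypothetical data with a header row kept separately, as in this notebook.
sample_header = ['App', 'Type', 'Price']
sample_rows = [
    ['Box', 'Free', '0'],
    ['Paid Notes', 'Paid', '$4.99'],
    ['Slack', 'Free', '0'],
]

free_rows = filter_by_column(sample_rows, sample_header, 'Price', '0')
print(len(free_rows))
```

The comparison is still between strings, so '0' and '0.0' remain different values and each dataset needs its own `wanted` argument.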


## Finding the most common apps by Genre

### Part One

We would like to determine the most common genres of apps on both the Google Play Store and the App Store. This will help inform which type of app our company should build to be most likely to achieve widespread use.


In [31]:
print('Google Play:')
explore_data(android_free, 0,4, True)
print('Apple Store:')
explore_data(ios_free,0,4,True)

Google Play:
['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 and up']


['Pixel Draw - Number Art Coloring Book', 'ART_AND_DESIGN', '4.3', '967', '2.8M', '100,000+', 'Free', '0', 'Everyone', 'Art & Design;Creativity', 'June 20, 2018', '1.1', '4.4 and up']


Number of rows: 8848
Number of columns: 13
Apple Store:
['284882215', 'Facebook', '389879808', 'USD', '0.0', '2974676', '212', '3.5', '3.5', '95.0', '4+', 'Social Networking', '37', '1', '29', '1']


['389801252', 'Instagram', '113954816', 'U

### Part Two

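We will now build frequency tables. The freq_table function returns, for one column, the percentage of rows that hold each value, and display_table prints those percentages in descending order.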

In [35]:
def freq_table(dataset,index):
    
    table = {}
    total = 0 
    
    for item in dataset:
        total +=1
        value = item[index]
        if value in table: 
            table[value] += 1
        else: 
            table[value] = 1
    
    table_percentages = {}
    for key in table:
        percentage = (table[key] / total) * 100
        table_percentages[key] = percentage 
    
    return table_percentages

def display_table(dataset, index):
    table = freq_table(dataset, index)
    table_display = []
    for key in table:
        key_val_as_tuple = (table[key], key)
        table_display.append(key_val_as_tuple)

    table_sorted = sorted(table_display, reverse = True)
    for entry in table_sorted:
        print(entry[1], ':', entry[0])
        
        
display_table(android_english,8) # Content Rating


Everyone : 81.83807439824945
Teen : 10.732520579347712
Mature 17+ : 4.053350005209961
Everyone 10+ : 3.32395540273002
Adults only 18+ : 0.03125976867771178
Unrated : 0.020839845785141187
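
The counting and sorting done by `freq_table` and `display_table` can also be expressed with `collections.Counter`, whose `most_common()` method already returns values ordered from most to least frequent. The function name `percentage_table` and the sample rows below are assumptions made for this sketch.

```python
from collections import Counter

def percentage_table(rows, index):
    """Map each value in column `index` to its share of rows, in percent,
    ordered from most to least frequent."""
    counts = Counter(row[index] for row in rows)
    total = sum(counts.values())
    return {value: count / total * 100 for value, count in counts.most_common()}

# Hypothetical rows: [name, content rating]
sample_rows = [
    ['Box', 'Everyone'],
    ['Chat', 'Teen'],
    ['Slack', 'Everyone'],
    ['Notes', 'Everyone'],
]

table = percentage_table(sample_rows, 1)
for value, percentage in table.items():
    print(value, ':', percentage)
```

Since dictionaries preserve insertion order, iterating over the result prints the largest share first without building a separate list of tuples.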


### Part Three

We will now examine the frequency tables:

In [36]:
display_table(ios_english,-5) #prime genre

Games : 53.66124774211477
Entertainment : 7.433652910935113
Education : 6.294289287203002
Photo & Video : 4.849242740030569
Utilities : 3.4458802278727245
Health & Fitness : 2.501042100875365
Productivity : 2.473252744198972
Social Networking : 2.3204112824788106
Lifestyle : 2.0008336807002918
Music : 1.9174656106711132
Shopping : 1.6951507572599693
Sports : 1.5839933305543976
Book : 1.5562039738780047
Finance : 1.445046547172433
Travel : 1.1254689453939142
News : 1.0421008753647354
Weather : 1.0004168403501459
Reference : 0.8892594136445742
Food & Drink : 0.8753647353063776
Business : 0.7919966652771988
Navigation : 0.6391552035570377
Medical : 0.31957760177851885
Catalogs : 0.1389467833819647


We can conclude that Games make up the majority of apps in the Apple Store, with a 53.66% share.

In [37]:
display_table(android_english,9) # Genre

Tools : 8.59643638637074
Entertainment : 5.8038970511618215
Education : 5.2412212149630095
Business : 4.365947691987079
Medical : 4.115869542565385
Personalization : 3.9074710847139733
Productivity : 3.8866312389288318
Lifestyle : 3.761592164217985
Finance : 3.5948733979368557
Sports : 3.4385745545482966
Communication : 3.2614358653745965
Action : 3.1051370219860375
Health & Fitness : 3.000937793060331
Photography : 2.917578409919767
News & Magazines : 2.594560800250078
Social : 2.4903615713243723
Travel & Local : 2.27154319058039
Books & Reference : 2.261123267687819
Shopping : 2.0944045014066894
Simulation : 1.979785349588413
Arcade : 1.9068458893404188
Dating : 1.771386891737001
Casual : 1.7192872772741483
Video Players & Editors : 1.6776075857038657
Maps & Navigation : 1.333750130249036
Puzzle : 1.2399708242159009
Food & Drink : 1.1670313639679066
Role Playing : 1.0836719808273418
Strategy : 0.9794727519016359
Racing : 0.9482129832239241
Libraries & Demo : 0.87527352297593
Auto & V

We can see here that the top three genres in the Google Play Store are Tools, Entertainment, and Education.

In [38]:
display_table(android_english,1) #Category

FAMILY : 19.360216734396165
GAME : 9.794727519016359
TOOLS : 8.60685630926331
BUSINESS : 4.365947691987079
MEDICAL : 4.115869542565385
PERSONALIZATION : 3.9074710847139733
PRODUCTIVITY : 3.8866312389288318
LIFESTYLE : 3.7720120871105554
FINANCE : 3.5948733979368557
SPORTS : 3.376055017192873
COMMUNICATION : 3.2614358653745965
HEALTH_AND_FITNESS : 3.000937793060331
PHOTOGRAPHY : 2.917578409919767
NEWS_AND_MAGAZINES : 2.594560800250078
SOCIAL : 2.4903615713243723
TRAVEL_AND_LOCAL : 2.28196311347296
BOOKS_AND_REFERENCE : 2.261123267687819
SHOPPING : 2.0944045014066894
DATING : 1.771386891737001
VIDEO_PLAYERS : 1.6984474314890068
MAPS_AND_NAVIGATION : 1.333750130249036
FOOD_AND_DRINK : 1.1670313639679066
EDUCATION : 1.104511826612483
ENTERTAINMENT : 0.9065332916536417
LIBRARIES_AND_DEMO : 0.87527352297593
AUTO_AND_VEHICLES : 0.87527352297593
WEATHER : 0.8127539856205065
HOUSE_AND_HOME : 0.7398145253725122
EVENTS : 0.666875065124518
PARENTING : 0.6251953735542357
ART_AND_DESIGN : 0.62519537

We can see here that Family, Game, and Tools are the largest categories in the Google Play Store. Games dominate the Apple Store with more than half of all apps, whereas the distribution of genres and categories in the Google Play Store is more balanced.

## Most Popular Apps on the App Store by Genre

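We will now look for the most popular apps, meaning the genres with the most users per app. This differs from the most common genres found in the previous section, which only counted how many apps each genre contains.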

In [43]:
genres_ios = freq_table(ios_english,-5)

for genre in genres_ios:
    
    total = 0
    len_genre = 0
    
    for app in ios_english: 
        genre_app = app[-5]
        if genre == genre_app:
            n_ratings = float(app[5])
            total += n_ratings
            len_genre += 1
            
    avg_n_ratings = total / len_genre
    
    print(genre, ' : ', avg_n_ratings)
            

Social Networking  :  45498.89820359281
Photo & Video  :  14352.280802292264
Games  :  13691.996633868463
Music  :  28842.021739130436
Reference  :  22410.84375
Health & Fitness  :  9913.172222222222
Weather  :  22181.027777777777
Utilities  :  6863.822580645161
Travel  :  14129.444444444445
Shopping  :  18615.32786885246
News  :  13015.066666666668
Navigation  :  11853.95652173913
Lifestyle  :  6161.763888888889
Entertainment  :  7533.678504672897
Food & Drink  :  13938.619047619048
Sports  :  14026.929824561403
Book  :  5125.4375
Finance  :  11047.653846153846
Education  :  2239.2295805739514
Productivity  :  8051.3258426966295
Business  :  4788.087719298245
Catalogs  :  1732.5
Medical  :  592.7826086956521


Using the average number of ratings per app as a proxy for downloads, Social Networking is the most popular genre on iOS, with roughly 45,499 ratings per app.
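
The nested loop above rescans the whole dataset once per genre. A minimal single-pass sketch is shown below: it accumulates a running total and a count for each group, then divides. The helper `average_by_group` and the sample rows are hypothetical.

```python
from collections import defaultdict

def average_by_group(rows, group_idx, value_idx):
    """Average the numeric column value_idx for each value of column group_idx."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for row in rows:
        group = row[group_idx]
        totals[group] += float(row[value_idx])
        counts[group] += 1
    return {group: totals[group] / counts[group] for group in totals}

# Hypothetical rows: [name, genre, number of ratings]
sample_rows = [
    ['Facebook', 'Social Networking', '300'],
    ['Pinterest', 'Social Networking', '100'],
    ['Clash', 'Games', '50'],
]

averages = average_by_group(sample_rows, group_idx=1, value_idx=2)
# Sort from highest to lowest average so the top genre is printed first.
ranked = sorted(averages.items(), key=lambda item: item[1], reverse=True)
for genre, avg in ranked:
    print(genre, ':', avg)
```

Sorting the result also makes the ranking easier to read than the unordered output above.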

## Most Popular Apps on Google Play by Genre

We will now look at apps on Google Play. Note that we must first remove the + and , characters from the Installs column before converting the values to floats so that we can compute an average.

In [45]:
categories_android = freq_table(android_english, 1)

for category in categories_android:
    total = 0
    len_category = 0
    for app in android_english:
        category_app = app[1]
        if category_app == category:            
            n_installs = app[5]
            n_installs = n_installs.replace(',', '')
            n_installs = n_installs.replace('+', '')
            total += float(n_installs)
            len_category += 1
    avg_n_installs = total / len_category
    print(category, ':', avg_n_installs)

ART_AND_DESIGN : 1887285.0
AUTO_AND_VEHICLES : 632501.3214285715
BEAUTY : 513151.88679245283
BOOKS_AND_REFERENCE : 7676991.133640553
BUSINESS : 1663758.627684964
COMICS : 832613.8888888889
COMMUNICATION : 35266026.32907348
DATING : 828971.2176470588
EDUCATION : 1782566.0377358492
ENTERTAINMENT : 11375402.298850575
EVENTS : 249580.640625
FINANCE : 1319851.4028985507
FOOD_AND_DRINK : 1891060.2767857143
HEALTH_AND_FITNESS : 3972300.388888889
HOUSE_AND_HOME : 1360598.042253521
LIBRARIES_AND_DEMO : 630903.6904761905
LIFESTYLE : 1377507.0138121548
GAME : 14210387.675531914
FAMILY : 3345018.516684607
MEDICAL : 96944.49873417722
SOCIAL : 22961790.384937238
SHOPPING : 6966908.880597015
PHOTOGRAPHY : 16636241.267857144
SPORTS : 3384026.2283950616
TRAVEL_AND_LOCAL : 13218662.767123288
TOOLS : 9809631.85835351
PERSONALIZATION : 4086652.4853333333
PRODUCTIVITY : 15530942.008042896
PARENTING : 525351.8333333334
WEATHER : 4628211.794871795
VIDEO_PLAYERS : 24121489.079754602
NEWS_AND_MAGAZINES : 95108

It appears that communication apps have the highest average number of installs on the Google Play Store, at roughly 35.3 million per app.
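
The clean-up of the Installs column can be isolated in a small function, which makes it easy to check on a few values before running it over the dataset. The name `parse_installs` is hypothetical. Because the column holds open-ended buckets such as '1,000+', the parsed number is only a lower bound on the true install count.

```python
def parse_installs(text):
    """Convert an install bucket such as '1,000+' to a float (its lower bound)."""
    return float(text.replace(',', '').replace('+', ''))

for raw in ['10,000+', '1,000,000,000+', '0']:
    print(raw, '->', parse_installs(raw))
```

Averages built from these lower bounds will understate the real totals, so they are best read as a relative ranking of categories rather than as exact install counts.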

## Conclusion

We have analysed data from the Apple Store and Google Play Store to determine the best type of app to build. We would recommend continued analys