# Finding the most attractive App - Analyzing App Store and Google Play Markets

Our goal is to identify the profile of the app that is most likely to be profitable by analyzing data from the App Store and Google Play markets.
This helps our developers make data-driven decisions about the kinds of apps that are likely to attract more users.

## Opening and Exploring Data

Given the huge number of apps, we'll analyze a sample of the data. In particular, we'll use two existing data sets that are available at no cost:
* a data set containing data about approximately ten thousand Android apps from Google Play;
* a data set containing data about approximately seven thousand iOS apps from the App Store.

In [19]:
from csv import reader

### The Google Play data set ###
# 'with' closes the file automatically once all rows have been read
with open('C:/Users/Acer/Dataquest_projects/App Profile/googleplaystore.csv', encoding='utf8') as opened_file:
    android = list(reader(opened_file))
android_header = android[0]  # the first row holds the column names
android = android[1:]        # the remaining rows hold the data

In [10]:
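### The App Store data set ###
# 'with' closes the file automatically once all rows have been read
with open('C:/Users/Acer/Dataquest_projects/App Profile/AppleStore.csv', encoding='utf8') as opened_file:
    ios = list(reader(opened_file))
ios_header = ios[0]  # the first row holds the column names
ios = ios[1:]        # the remaining rows hold the data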
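
Since both cells repeat the same open, read, and split steps, they could be folded into one helper. A minimal sketch, assuming only the standard `csv` module; the name `load_csv` is our own and not part of the original project. Passing `newline=""` follows the `csv` module's recommendation for files opened for reading.

```python
import csv


def load_csv(path, encoding="utf8"):
    """Read a CSV file and return (header, rows) as lists of strings.

    The `with` block closes the file even if parsing raises an error.
    """
    with open(path, encoding=encoding, newline="") as f:
        data = list(csv.reader(f))
    return data[0], data[1:]


# Hypothetical usage with the paths from the cells above:
# android_header, android = load_csv('C:/Users/Acer/Dataquest_projects/App Profile/googleplaystore.csv')
# ios_header, ios = load_csv('C:/Users/Acer/Dataquest_projects/App Profile/AppleStore.csv')
```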

We want to explore the two data sets, so we'll create a function named explore_data() that we can use repeatedly to explore rows in a more readable way. We'll also add an option for our function to show the number of rows and columns for any data set.

In [12]:
def explore_data(dataset, start, end, row_and_columns=False):
    dataset_slice = dataset[start:end]
    for row in dataset_slice:
        print(row)
        print('\n')  # adds an empty line between rows

    if row_and_columns:
        print('Number of rows:', len(dataset))
        print('Number of columns:', len(dataset[0]))

        
print(android_header)
print('\n')
explore_data(android,0,3,True)

['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']


['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['Coloring book moana', 'ART_AND_DESIGN', '3.9', '967', '14M', '500,000+', 'Free', '0', 'Everyone', 'Art & Design;Pretend Play', 'January 15, 2018', '2.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


Number of rows: 10841
Number of columns: 13


The **columns** that might be useful for our analysis are **'App'**, **'Category'**, **'Reviews'**, **'Installs'**, **'Type'**, **'Price'** and **'Genres'**.
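
Rather than remembering that 'Installs' is column 5 and 'Price' is column 7, we can look positions up from the header. A minimal sketch; `select_columns` is a hypothetical helper, and the sample header and row are copied from the output above.

```python
def select_columns(header, rows, column_names):
    """Return new rows that contain only the named columns, in the requested order."""
    # Look up each column's position once, so rows are indexed by name, not by magic numbers
    indexes = [header.index(name) for name in column_names]
    return [[row[i] for i in indexes] for row in rows]


sample_header = ['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type',
                 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']
sample_rows = [['Coloring book moana', 'ART_AND_DESIGN', '3.9', '967', '14M', '500,000+', 'Free', '0',
                'Everyone', 'Art & Design;Pretend Play', 'January 15, 2018', '2.0.0', '4.0.3 and up']]
print(select_columns(sample_header, sample_rows, ['App', 'Installs', 'Price']))
# → [['Coloring book moana', '500,000+', '0']]
```

`header.index` raises `ValueError` for a misspelled column name, which surfaces typos early instead of silently reading the wrong column.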

In [24]:
print(ios_header)
print('\n')
explore_data(ios,0,3,True)

['id', 'track_name', 'size_bytes', 'currency', 'price', 'rating_count_tot', 'rating_count_ver', 'user_rating', 'user_rating_ver', 'ver', 'cont_rating', 'prime_genre', 'sup_devices.num', 'ipadSc_urls.num', 'lang.num', 'vpp_lic']


['284882215', 'Facebook', '389879808', 'USD', '0.0', '2974676', '212', '3.5', '3.5', '95.0', '4+', 'Social Networking', '37', '1', '29', '1']


['389801252', 'Instagram', '113954816', 'USD', '0.0', '2161558', '1289', '4.5', '4.0', '10.23', '12+', 'Photo & Video', '37', '0', '29', '1']


['529479190', 'Clash of Clans', '116476928', 'USD', '0.0', '2130805', '579', '4.5', '4.5', '9.24.12', '9+', 'Games', '38', '5', '18', '1']


Number of rows: 7197
Number of columns: 16


Looking at the **columns**, the most interesting ones for our analysis are **"track_name"**, **"currency"**, **"price"**, **"rating_count_tot"** and **"prime_genre"**.

We can find the details about the other columns [here](https://www.kaggle.com/ramamet4/app-store-apple-data-set-10k-apps/home). 

## Deleting Wrong Data

The Google Play data set has a dedicated [discussion section](https://www.kaggle.com/lava18/google-play-store-apps/discussion), and [one of the discussions](https://www.kaggle.com/lava18/google-play-store-apps/discussion/66015) outlines an error in row 10472. Let's print this row and compare it against the header and against a row that is correct.

In [27]:
print(android[10472])  # incorrect row
print('\n')
print(android_header) # header
print('\n')
print(android[1]) # correct row

['Life Made WI-Fi Touchscreen Photo Frame', '1.9', '19', '3.0M', '1,000+', 'Free', '0', 'Everyone', '', 'February 11, 2018', '1.0.19', '4.0 and up']


['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']


['Coloring book moana', 'ART_AND_DESIGN', '3.9', '967', '14M', '500,000+', 'Free', '0', 'Everyone', 'Art & Design;Pretend Play', 'January 15, 2018', '2.0.0', '4.0.3 and up']


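Row 10472 corresponds to the app *Life Made WI-Fi Touchscreen Photo Frame*. The row has only 12 values instead of 13: the 'Category' value is missing, so every following value is shifted one column to the left and the 'Rating' column holds 19. This is clearly off, because the maximum rating for a Google Play app is 5. As a consequence, we'll delete this row.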

In [28]:
print(len(android))
del android[10472]  # don't run this more than once: each run deletes whatever row is now at index 10472
print(len(android))

10841
10840
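
Deleting by index is fragile because re-running the cell removes a correct row. Since the broken row is recognizable by its length (12 values against 13 header columns), a filter on row length can be re-run safely. A minimal sketch; `split_by_row_length` is a hypothetical helper, and the sample rows are copied from the outputs above. Note that on the full data set it would drop *every* row whose length differs from the header, not only row 10472, so it is worth inspecting the malformed rows before discarding them.

```python
def split_by_row_length(header, rows):
    """Split rows into (well_formed, malformed) by comparing each row's length with the header's."""
    well_formed, malformed = [], []
    for row in rows:
        if len(row) == len(header):
            well_formed.append(row)
        else:
            malformed.append(row)
    return well_formed, malformed


sample_header = ['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type',
                 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']
sample_rows = [
    ['Coloring book moana', 'ART_AND_DESIGN', '3.9', '967', '14M', '500,000+', 'Free', '0',
     'Everyone', 'Art & Design;Pretend Play', 'January 15, 2018', '2.0.0', '4.0.3 and up'],
    # row 10472: the 'Category' value is missing, so it has 12 values
    ['Life Made WI-Fi Touchscreen Photo Frame', '1.9', '19', '3.0M', '1,000+', 'Free', '0',
     'Everyone', '', 'February 11, 2018', '1.0.19', '4.0 and up'],
]
well_formed, malformed = split_by_row_length(sample_header, sample_rows)
print(len(well_formed), len(malformed))  # → 1 1
print(malformed[0][0])                   # → Life Made WI-Fi Touchscreen Photo Frame
```

Calling the function again on `well_formed` returns the same rows and an empty `malformed` list, which is what makes it safe to re-run.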


## Removing Duplicate Entries

If we explore the Google Play data set long enough, we'll find that some apps have more than one entry. For instance, the Instagram app has four entries.

In [30]:
for app in android:
    name = app[0]
    if name == 'Instagram':
        print(app)

['Instagram', 'SOCIAL', '4.5', '66577313', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']
['Instagram', 'SOCIAL', '4.5', '66577446', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']
['Instagram', 'SOCIAL', '4.5', '66577313', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']
['Instagram', 'SOCIAL', '4.5', '66509917', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']


In [69]:
how_many_times_app = {}
for app in android:
    name = app[0]
    if name in how_many_times_app:
        how_many_times_app[name] += 1  # counts how many times the app appears
    else:
        how_many_times_app[name] = 1

# how_many_times_app now maps every app name to its number of occurrences
for app in how_many_times_app:
    print(app, how_many_times_app[app])
    print('\n')

Photo Editor & Candy Camera & Grid & ScrapBook 1


Coloring book moana 2


U Launcher Lite – FREE Live Cool Themes, Hide Apps 1


Sketch - Draw & Paint 1


Pixel Draw - Number Art Coloring Book 1


Paper flowers instructions 1


Smoke Effect Photo Maker - Smoke Editor 1


Infinite Painter 1


Garden Coloring Book 1


Kids Paint Free - Drawing Fun 1


Text on Photo - Fonteee 1


Name Art Photo Editor - Focus n Filters 1


Tattoo Name On My Photo Editor 1


Mandala Coloring Book 1


3D Color Pixel by Number - Sandbox Art Coloring 1


Learn To Draw Kawaii Characters 1


Photo Designer - Write your name with shapes 1


350 Diy Room Decor Ideas 1


FlipaClip - Cartoon animation 1


ibis Paint X 1


Logo Maker - Small Business 1


Boys Photo Editor - Six Pack & Men's Suit 1


Superheroes Wallpapers | 4K Backgrounds 1


Mcqueen Coloring pages 2


HD Mickey Minnie Wallpapers 1


Harley Quinn wallpapers HD 1


Colorfit - Drawing & Coloring 1


Animated Photo Editor 1


Pencil Sketch Drawing 1



Wedding Countdown Widget 1


My Day - Countdown Calendar 🗓️ 1


justWink Greeting Cards 1


Wedding LookBook by The Knot 1


Big Days - Events Countdown 1


Wedding Planner by WeddingWire - Venues, Checklist 1


Been Together (Ad) - D-day 1


WedMeGood - Wedding Planner 1


DIY Garden Ideas 1


Brit + Co 2


Creative Ideas - DIY & Craft 1


Homestyler Interior Design & Decorating Ideas 1


Wheretoget: Shop in style 2


My Dressing - Fashion closet 2


Chictopia 2


Scarf Fashion Designer 2


Zara 2


Real Estate by Movoto 1


Etta Homes 1


Houlihan Lawrence 1


Home Scouting® MLS Mobile 1


Housing-Real Estate & Property 1


ZipRealty Real Estate & Homes 1


Sotheby's International Realty 1


Zoopla Property Search UK - Home to buy & rent 1


Neighborhoods & Apartments 1


Keller Williams Real Estate 1


Relax Rain ~ Rain Sounds 1


Relax Ocean ~ Nature Sounds 1


Bed Time Fan - White Noise Sleep Sounds 1


Nature Sounds 1


SleepyTime: Bedtime Calculator 1


White Noise Baby 1


Whit

DreamTrips 1


Navmii GPS USA (Navfree) 1


Sygic Truck GPS Navigation 1


Moto File Manager 2


Google 2


Google Translate 2


Moto Display 1


Motorola Alert 1


Motorola Assist 1


Cache Cleaner-DU Speed Booster (booster & cleaner) 2


Moto Suggestions ™ 1


Moto Voice 1


Device Help 1


Account Manager 1


myMetro 1


File Manager 1


My Telcel 1


Calculator - free calculator, multi calculator app 1


ASUS Sound Recorder 1


iWnn IME for Nexus 1


Samsung Max - Data Savings & Privacy Protection 1


Android TV Remote Service 1


ZenUI Help 1


Calculator - free calculator ,multi calculator app 1


SHAREit - Transfer & Share 2


ZenUI Keyboard – Emoji, Theme 1


Files Go by Google: Free up space on your phone 1


SD card backup 1


Nokia mobile support 1


File Manager -- Take Command of Your Files Easily 1


Samsung Calculator 1


Clear 1


Phone 1


HTC Lock Screen 1


Gboard - the Google Keyboard 3


Google Korean Input 1


AT&T Smart Wi-Fi 1


Google app for Android TV 1


Sou



Casino X - Free Online Slots 1


X-WOLF 1


Light X - Icon Pack 1


X-Plane 10 Flight Simulator 1


Anime X Wallpaper 1


NEW Theme for Phone X 1


X-Plane to GPS 1


Control Center iOS 11 - Phone X notifications 1


iLocker X - iOS11 Lockscreen with HD Wallpapers 1


iLauncher OS 12 Pro - Phone X 1


Hero Fighter X 1


Slendrina X 1


X Back - Icon Pack 1


Guide (for X-MEN) 1


Decibel X PRO - Sound Meter dBA, Noise Detector 1


Flight Simulator X 2016 Free 1


Robocar X Ray 1


Space X: Sky Wars of Air Force 1


OS-X EMUI 4/5/8 THEME 1


Flexiroam X 1


X-ray scanner simulator 1


X Construction 1


X Home Bar - Home Bar Gesture Pro 1


PLAYBULB X 1


X-Wing Squadron Builder 1


YouTube TV - Watch & Record Live TV 1


JW Caleb y Sofia 1


Boomerang Make and Race 1


Talking Tom & Ben News 1


Vegas Crime Simulator 1


PewDiePie's Tuber Simulator 1


Real Gangster Crime 1


Antivirus & Mobile Security 1


Flashlight Ultimate 1


Gangster Town 1


Rope Hero 3 1


Avast Mobile Securi


Auto Background Changer 1


Block Gun 3D: Call of Destiny 1


BG Remover & Eraser Pro 1


BG Metro - Red voznje 1


BG Cricket 1


BG Guide 1


Bg Radios - Bulgarian radio stations online 1


Micro.bg Cloud POS System 1


BG-FLEET 1


trip.bg 1


Dete.bg 1


BLOOD & GLORY: IMMORTALS 1


Change photo background 1


Es-Bg Offline Voice Translator 1


Baldur's Gate: Enhanced Edition 1


Shadow Fight 2 Special Edition 1


ePazar.bg 1


Revita.bg 1


Background Eraser 1


ePay.bg 1


Top Novini BG 1


Block Gun 3D: Ghost Ops 1


Music for Youtube - Tube Music BG, Red+ 1


Bg+ Call Blocker 1


BG Blurry HD Wallpapers 1


Background Changer & Eraser 1


Stolica.bg 1


Cut Out : Background Eraser and background changer 1


Photo Background Changer 2018 - Blur Background 1


BGCN TV 1


ScorePal 1


Backgammon NJ for Android 1


BG Burger 1


JOBS.bg 1


CSCS BG (в български) 1


LEGO ® Batman: Beyond Gotham 1


Blood Glucose Tracker 1


Baldur's Gate II 1


Rento - Dice Board Game Online 1






Roll Call News 1


UFO-CQ 1


CQ ESPM 1


SHUTTLLS CQ - Connect Ride Go 1


QRZ Assistant 1


Palavras la cq 1


CQ Ukraine 1


Pocket Prefix Plus 1


CQ Electrical Group 1


10 WPM Amateur ham radio CW Morse code trainer 1


Sabbath School 1


Gunship Modern Combat 3D 1


Gun Simulator - Gun Games 1


ClanManagerTT2 1


Christian Questions Podcast 1


Sniper Killer Shooter 1


25WPM Amateur ham radio Koch CW Morse code trainer 1


QST 1


Cypress College Library 1


Create My App 1


Snowboard Racing Free Fun Game 1


theCut 1


الفاتحون Conquerors 1


Toughest Game Ever 2 1


RIDE ZERO 1


CricQuick 1


Next Island: Dino Village 1


iReadMe 1


Global Shop 1


Webmail web mobile app 1


QC 1


Traffic Info and Traffic Alert 1


Color Quartets 1


Ham Radio Prefixes 1


Battle Result Predictor for CR 1


Deck Advisor for CR 1


Card Creator for CR 1


CR & CoC Private Server - Clash Barbarians PRO 1


Deck Analyzer for CR 1


Chest Simulator for Clash Royale 1


Clash Soundboard For

Travelmoji 1


EF Forms 1


EFAmbassador 1


Agenda EF 1


EF App 1


EF Lens Simulator 1


EF Coach 1


e-Docente EI/anos iniciais EF 1


EF Universe: Endless Battle 1


[EF]ShoutBox 1


EF Catalogues · Kataloge 1


EF Staff 1


EF Academy 1


EF First 1


Financial Calculator Pro EF 1


Carrier Landings Pro 1


EF Calculator 1


EF Utilities 1


EF Financial Control Free 1


أحداث وحقائق | خبر عاجل في اخبار العالم 1


Diário Escola Mestres EF 1


EF Jumper 1


StudyLock - Education First 1


E.F.JUVENTUD MADRID 1


English Conversation Courses 1


eG Monitor 1


E.G. Chess Free 1


EG SIM CARD (EGSIMCARD, 이지심카드) 1


EG-Boost 1


EG Mantenimiento 1


EG CrossPad - ASPECT4 1


EG CrossPad 1


EG Player 1


Bee Mobile EG 1


EG 1


E.G. Chess 1


EG SIM CARD (EGSIMCARD) 1


EG Retail 1


EG Movi 1


EG Groups 1


EG Way Life 1


EG Tax Service 1


Energenie EG-PM1W Setup Wizard 1


qEG APP / Química EG SRL 1


EG Classroom Decimals™ 1


EGW Writings 2 1


Alipay 1


EG | Explore Folegan

In total, there are 1,181 duplicate entries, that is, rows beyond the first occurrence of an app:

In [94]:
more_than_once = []      # names of the apps that appear more than once
times_more_than_one = 0  # number of duplicate rows

for app in how_many_times_app:
    if how_many_times_app[app] > 1:
        more_than_once.append(app)
        times_more_than_one += how_many_times_app[app] - 1  # -1 because one occurrence of each app is legitimate

print(times_more_than_one)  # total number of duplicate rows

1181
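
The two counting cells above can be condensed with `collections.Counter` from the standard library. A minimal sketch on a made-up list of names (not the real data set); the helper name `count_duplicate_rows` is our own. It also shows why the expected-length check later in this section works: the number of duplicate rows always equals the number of rows minus the number of unique names (10,840 - 1,181 = 9,659 for Google Play).

```python
from collections import Counter


def count_duplicate_rows(rows, name_index=0):
    """Return (counts, n_extra): occurrences per name, and rows beyond each name's first occurrence."""
    counts = Counter(row[name_index] for row in rows)
    n_extra = sum(n - 1 for n in counts.values() if n > 1)
    return counts, n_extra


sample = [['Instagram'], ['Zara'], ['Instagram'], ['Instagram'], ['Zara'], ['Box']]
counts, n_extra = count_duplicate_rows(sample)
print(counts.most_common(2))                 # → [('Instagram', 3), ('Zara', 2)]
print(n_extra)                               # → 3
print(len(sample) - n_extra == len(counts))  # → True
```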


We don't want to remove the duplicates randomly. Instead, for each app we'll keep the row with the highest number of reviews, because the higher the number of reviews, the more reliable the rating.

In [100]:
reviews_max = {}

for app in android:
    name = app[0]
    n_reviews = float(app[3])
    
    if name in reviews_max and reviews_max[name] < n_reviews:
        reviews_max[name] = n_reviews
        
    elif name not in reviews_max:
        reviews_max[name] = n_reviews

In a previous code cell, we found that there are 1,181 duplicate entries, so the length of our dictionary (one key per unique app) should be equal to the length of our data set minus 1,181.

In [103]:
print('Expected length', len(android) - 1181)
print('Actual length', len(reviews_max))

Expected length 9659
Actual length 9659



Now, let's use the reviews_max dictionary to remove the duplicates. For the duplicate cases, we'll only keep the entries with the highest number of reviews. In the code cell below:

* We start by initializing two empty lists, android_clean and already_added.
* We loop through the android data set, and for every iteration:
   * We isolate the name of the app and the number of reviews.
   * We add the current row (app) to the android_clean list, and the app name (name) to the already_added list if:
     * The number of reviews of the current app matches the number of reviews of that app as described in the reviews_max dictionary; and
     * The name of the app is not already in the already_added list. We need to add this supplementary condition to account for those cases where the highest number of reviews of a duplicate app is the same for more than one entry (for example, the Box app has three entries, and the number of reviews is the same). If we just check for reviews_max[name] == n_reviews, we'll still end up with duplicate entries for some apps.

In [104]:
android_clean = []
already_added = []

for app in android:
    name = app[0]
    n_reviews = float(app[3])
    
    if (reviews_max[name] == n_reviews) and (name not in already_added):
        android_clean.append(app)
        already_added.append(name) # make sure this is inside the if block

Now let's quickly explore the new data set, and confirm that the number of rows is 9,659.

In [105]:
explore_data(android_clean, 0, 3, True)

['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 and up']


Number of rows: 9659
Number of columns: 13


We have 9659 rows, just as expected.
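
The two passes above (building `reviews_max`, then filtering) can be wrapped in a single function. A minimal sketch; `deduplicate_by_max_reviews` is a hypothetical name, and it swaps the `already_added` list for a `set`, which gives constant-time membership tests instead of scanning a growing list for every row. The Instagram review counts come from the output earlier in this section; the "Same Reviews App" rows are made up to mimic the Box case, where several entries share the same highest count.

```python
def deduplicate_by_max_reviews(rows, name_index=0, reviews_index=3):
    """Keep one row per app name: the first row that has that app's highest review count."""
    # Pass 1: highest review count seen for each name
    reviews_max = {}
    for row in rows:
        name, n_reviews = row[name_index], float(row[reviews_index])
        if n_reviews > reviews_max.get(name, float('-inf')):
            reviews_max[name] = n_reviews

    # Pass 2: keep the first row reaching that maximum; the set prevents ties from slipping through
    clean, already_added = [], set()
    for row in rows:
        name, n_reviews = row[name_index], float(row[reviews_index])
        if n_reviews == reviews_max[name] and name not in already_added:
            clean.append(row)
            already_added.add(name)
    return clean


sample_rows = [
    ['Instagram', 'SOCIAL', '4.5', '66577313'],
    ['Instagram', 'SOCIAL', '4.5', '66577446'],
    ['Instagram', 'SOCIAL', '4.5', '66577313'],
    ['Instagram', 'SOCIAL', '4.5', '66509917'],
    ['Same Reviews App', 'X', 'X', '100'],  # made-up rows with tied review counts
    ['Same Reviews App', 'X', 'X', '100'],
    ['Same Reviews App', 'X', 'X', '100'],
]
print(deduplicate_by_max_reviews(sample_rows))
# → [['Instagram', 'SOCIAL', '4.5', '66577446'], ['Same Reviews App', 'X', 'X', '100']]
```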

## Removing Non-English Apps

If we explore the data sets long enough, we'll notice that some app names suggest the apps are not directed toward an English-speaking audience.
Below, we see a couple of examples from each data set:

In [107]:
print(ios[813][1])
print(ios[6731][1])

print(android_clean[4412][0])
print(android_clean[7940][0])

爱奇艺PPS -《欢乐颂2》电视剧热播
【脱出ゲーム】絶対に最後までプレイしないで 〜謎解き＆ブロックパズル〜
中国語 AQリスニング
لعبة تقدر تربح DZ


So, we'll remove every app whose name contains symbols that are not commonly used in English text. The characters used in English text (the English alphabet, digits, punctuation marks, and symbols like +, \*, etc.) are all encoded in the ASCII standard.
Each ASCII character has a corresponding number between 0 and 127. Python's built-in `ord()` returns a character's Unicode code point, which for these characters coincides with the ASCII code, so we can take advantage of it to build a function that checks an app name and tells us whether it contains non-ASCII characters.

In [108]:
def is_english(string):
    
    for character in string:
        if ord(character)>127:
            return False
        
    return True

In [110]:
print(is_english('爱奇艺PPS -《欢乐颂2》电视剧热播'))
print(is_english('Instagram'))

False
True
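
The same strict check can be written with built-ins. A minimal sketch: `str.isascii()` exists from Python 3.7 onward and returns `True` when every character has a code point of 127 or less (and for the empty string); the `all(...)` version is an equivalent for older interpreters. Both function names are hypothetical.

```python
def is_english_strict(string):
    # str.isascii() (Python 3.7+) is True when every code point is <= 127,
    # the negation of "some ord(character) > 127"
    return string.isascii()


def is_english_strict_portable(string):
    # Equivalent check that also runs on Python versions older than 3.7
    return all(ord(character) <= 127 for character in string)


for name in ['爱奇艺PPS -《欢乐颂2》电视剧热播', 'Instagram']:
    print(is_english_strict(name), is_english_strict_portable(name))
# → False False
# → True True
```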


The function seems to work fine, but some English app names use emojis or other symbols (™, — (em dash), – (en dash), etc.) that fall outside of the ASCII range. Because of this, we'll remove useful apps if we use the function in its current form.

In [111]:
print(is_english('Docs To Go™ Free Office Suite'))
print(is_english('Instachat 😜'))

print(ord('™'))
print(ord('😜'))

False
False
8482
128540


To minimize the impact of data loss, we'll only remove an app if its name has more than three non-ASCII characters:

In [113]:
def is_english(string):
    no_ascii = 0
    for character in string:
        if ord(character)>127:
            no_ascii += 1
            
    if no_ascii > 3:
        return False
    else:
        return True 

print(is_english('Docs To Go™ Free Office Suite'))
print(is_english('Instachat 😜'))
print(is_english('爱奇艺PPS -《欢乐颂2》电视剧热播'))

True
True
False
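
The threshold of three is a heuristic: a non-English name with three or fewer non-ASCII characters still passes, and an English name with four or more emojis is still dropped. Making the threshold a parameter keeps that trade-off visible. A minimal sketch of this variant of `is_english` plus a filter that applies it to either data set; the parameter `max_non_ascii` and the helper `filter_english` are our own names, and the sample rows are made up from the app names shown above (the iOS id `'0'` is a placeholder). App names sit in column 0 for Google Play and in column 1 ('track_name') for the App Store.

```python
def is_english(string, max_non_ascii=3):
    """Treat a name as English unless it has more than `max_non_ascii` non-ASCII characters."""
    non_ascii = sum(1 for character in string if ord(character) > 127)
    return non_ascii <= max_non_ascii


def filter_english(rows, name_index):
    """Keep the rows whose app name (at position `name_index`) passes is_english."""
    return [row for row in rows if is_english(row[name_index])]


sample_android = [['Instagram'], ['Docs To Go™ Free Office Suite'], ['中国語 AQリスニング']]
sample_ios = [['389801252', 'Instagram'], ['0', '爱奇艺PPS -《欢乐颂2》电视剧热播']]  # '0' is a placeholder id
print([row[0] for row in filter_english(sample_android, 0)])
# → ['Instagram', 'Docs To Go™ Free Office Suite']
print([row[1] for row in filter_english(sample_ios, 1)])
# → ['Instagram']
```

On the real data, this would be applied as `filter_english(android_clean, 0)` and `filter_english(ios, 1)`.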
