# Insights from the Play Store and App Store datasets

- Find mobile app profiles that are profitable for the App Store and Google Play markets.
- The goal of the project is to analyze data extracted from Apple's App Store and Google's Play Store.
- The analysis aims to give developers insight into the types of apps that are likely to attract more users.

### Exploring data
- In the code below we define a function to explore the datasets (i.e., App Store and Google Play).

In [1]:
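def explore_data(data_set, start=0, end=None, row_col=False):
    # end=None slices through the final row; an end of -1 would stop one row short
    data_set_slice = data_set[start:end]
    for row in data_set_slice:
        print(row)
        print('\n')
    if row_col and data_set_slice:
        # these counts describe the displayed slice, not the full data set
        print('no of rows :', len(data_set_slice))
        print('no of columns:', len(data_set_slice[0]))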
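
The `start` and `end` parameters follow ordinary Python slice semantics, which is worth checking on a tiny list before running the function on the full data. The sketch below uses a made-up three-row list (`rows` is a hypothetical stand-in, not one of the real data sets) to show how an end of -1 differs from an end of None.

```python
# Hypothetical three-row stand-in for a data set (not taken from the CSV files).
rows = [['a', '1'], ['b', '2'], ['c', '3']]

first_two = rows[0:2]       # rows at index 0 and 1
without_last = rows[0:-1]   # a negative end counts from the end, so the last row is left out
everything = rows[0:None]   # an end of None runs through the final row

print(first_two)
print(without_last)
print(everything)
```

A default of None is therefore the safer choice when "no end given" should mean "show every row".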

#### Storing the given datasets in two-dimensional lists for easy analysis

In [2]:
from csv import reader

# Read each CSV into a list of rows, then split the header from the data.
with open('AppleStore.csv', encoding='utf8') as f1:
    ios = list(reader(f1))
ios_header = ios[0]
ios = ios[1:]

with open('googleplaystore.csv', encoding='utf8') as f2:
    android = list(reader(f2))
android_header = android[0]
android = android[1:]

print('ios_header', ios_header)
print('\n')
print('android_header', android_header)
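
Both files are read with the same steps, so the loading could be folded into one helper. The sketch below is one possible shape, not the notebook's own code: `read_table` is a hypothetical name, and the in-memory sample with made-up values stands in for a real file so the example runs without the CSVs.

```python
import io
from csv import reader

def read_table(file_obj):
    """Return (header, rows) from an open CSV file object."""
    rows = list(reader(file_obj))
    return rows[0], rows[1:]

# In-memory sample with made-up values; with real data one would pass
# open('AppleStore.csv', encoding='utf8') instead.
sample = io.StringIO('App,Reviews\n"Docs, Sheets",10\nNotes,3\n')
header, data = read_table(sample)
print(header)
print(data)
```

The quoted field in the sample shows why `csv.reader` is used rather than splitting each line on commas: app names can contain commas themselves.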
    
    
    

## Column names for the App Store dataset

"id": App ID

"track_name": App Name

"size_bytes": Size (in bytes)

"currency": Currency type

"price": Price amount

"rating_count_tot": User rating count (for all versions)

"rating_count_ver": User rating count (for the current version)

"user_rating": Average user rating value (for all versions)

"user_rating_ver": Average user rating value (for the current version)

"ver": Latest version code

"cont_rating": Content rating

"prime_genre": Primary genre

"sup_devices.num": Number of supported devices

"ipadSc_urls.num": Number of screenshots shown for display

"lang.num": Number of supported languages

"vpp_lic": VPP device-based licensing enabled

#### Check for bad records by comparing each row's length with the header's length, and delete the offending records by index (e.g., android[10472])

In [16]:
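# enumerate restarts the index for each data set, so a printed index
# can be passed straight to del for that data set
for index, row in enumerate(android):
    if len(row) != len(android_header):
        print(index, row)
for index, row in enumerate(ios):
    if len(row) != len(ios_header):
        print(index, row)

# del android[10472]  # run once only: repeating it would delete a different, valid row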
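
The length check can also be wrapped in a small function that returns the offending indexes instead of printing them. This is a sketch under assumptions: `bad_row_indexes` is a hypothetical helper and the three rows are made-up data, not rows from the real files.

```python
def bad_row_indexes(rows, header):
    """Return the indexes of rows whose length differs from the header's."""
    return [i for i, row in enumerate(rows) if len(row) != len(header)]

# Made-up data: the row at index 1 is missing a column.
header = ['App', 'Category', 'Rating']
rows = [['A', 'TOOLS', '4.1'], ['B', '4.5'], ['C', 'GAME', '3.9']]

bad = bad_row_indexes(rows, header)
print(bad)

# Delete from the highest index down so earlier deletions do not shift later ones.
for i in sorted(bad, reverse=True):
    del rows[i]
print(len(rows))
```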




        

#### Identifying duplicate records in the data sets and storing the respective app names in a list
- Check for duplicate app names using index 0 for the Google Play data set.
- Use index 1 for the iOS data set (i.e., the track_name column, which holds the app name).

In [4]:
android_dup_apps = []
android_uniq_apps = []
for app in android:
    name = app[0]
    if name in android_uniq_apps:
        android_dup_apps.append(name)
    else:
        android_uniq_apps.append(name)
print('no of dup_apps',len(android_dup_apps))
print('no of uniq_apps', len(android_uniq_apps))
print('dup_apps', android_dup_apps[:15])

ios_dup_apps = []
ios_uniq_apps = []
for app in ios:
    name = app[1]
    if name in ios_uniq_apps:
        ios_dup_apps.append(name)
    else:
        ios_uniq_apps.append(name)
print('no of ios_dup_apps',len(ios_dup_apps))
print('no of uniq_apps', len(ios_uniq_apps))
print('dup_apps', ios_dup_apps[:15])


no of dup_apps 1181
no of uniq_apps 9659
dup_apps ['Quick PDF Scanner + OCR FREE', 'Box', 'Google My Business', 'ZOOM Cloud Meetings', 'join.me - Simple Meetings', 'Box', 'Zenefits', 'Google Ads', 'Google My Business', 'Slack', 'FreshBooks Classic', 'Insightly CRM', 'QuickBooks Accounting: Invoicing & Expenses', 'HipChat - Chat Built for Teams', 'Xero Accounting Software']
no of ios_dup_apps 2
no of uniq_apps 7195
dup_apps ['Mannequin Challenge', 'VR Roller Coaster']
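
Testing `name in some_list` rescans the list for every app, which gets slow as the data grows. As an alternative sketch (not the notebook's method), `collections.Counter` tallies every name in one pass; the names below are made up for illustration.

```python
from collections import Counter

# Made-up app names; in the notebook these would come from app[0] or app[1].
names = ['Box', 'Slack', 'Box', 'Zoom', 'Slack', 'Box']

counts = Counter(names)
unique_names = list(counts)  # one entry per distinct name
# Each repeat beyond the first occurrence counts as one duplicate row.
n_duplicates = sum(c - 1 for c in counts.values())

print(len(unique_names))
print(n_duplicates)
```

The two numbers correspond to the "no of uniq_apps" and "no of dup_apps" figures printed by the loop-based version.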


#### Instead of removing duplicate apps randomly, it is better to use the number of ratings as the criterion
- The higher the number of reviews, the more recent the data should be. We'll keep only the row with the highest number of reviews and remove the other entries for any given app.

In [5]:
for app in android:
    name = app[0]
    if name == 'Slack':
        print(app)

['Slack', 'BUSINESS', '4.4', '51507', 'Varies with device', '5,000,000+', 'Free', '0', 'Everyone', 'Business', 'August 2, 2018', 'Varies with device', 'Varies with device']
['Slack', 'BUSINESS', '4.4', '51507', 'Varies with device', '5,000,000+', 'Free', '0', 'Everyone', 'Business', 'August 2, 2018', 'Varies with device', 'Varies with device']
['Slack', 'BUSINESS', '4.4', '51510', 'Varies with device', '5,000,000+', 'Free', '0', 'Everyone', 'Business', 'August 2, 2018', 'Varies with device', 'Varies with device']


#### To remove the duplicates, we will:

- Create a dictionary, where each dictionary key is a unique app name and the corresponding dictionary value is the highest number of reviews of that app.
- Use the information stored in the dictionary and create a new data set, which will have only one entry per app (and for each app, we'll only select the entry with the highest number of reviews).

In [30]:

android_reviews_max = {}
ios_reviews_max = {}
for app in android:
    name = app[0]
    n_reviews = float(app[3])
    # keep the largest review count seen so far for each app name
    if name not in android_reviews_max or android_reviews_max[name] < n_reviews:
        android_reviews_max[name] = n_reviews
# the dictionary length should match the earlier unique-app count (9,659)
# print('android_reviews_max', len(android_reviews_max))
# print(android_reviews_max)

for app in ios:
    name = app[1]
    n_reviews = float(app[5])
    if name not in ios_reviews_max or ios_reviews_max[name] < n_reviews:
        ios_reviews_max[name] = n_reviews

# expected length of ios_reviews_max: 7,195
print('ios_reviews_max', len(ios_reviews_max))
print('android_reviews_max', len(android_reviews_max))
for name, n_reviews in ios_reviews_max.items():
    print(name, n_reviews)
        
    
    

ios_reviews_max 7195
android_reviews_max 9659
Facebook 2974676.0
Instagram 2161558.0
Clash of Clans 2130805.0
Temple Run 1724546.0
Pandora - Music & Radio 1126879.0
Pinterest 1061624.0
Bible 985920.0
Candy Crush Saga 961794.0
Spotify Music 878563.0
Angry Birds 824451.0
Subway Surfers 706110.0
Fruit Ninja Classic 698516.0
Solitaire 679055.0
CSR Racing 677247.0
Crossy Road - Endless Arcade Hopper 669079.0
Injustice: Gods Among Us 612532.0
Hay Day 567344.0
Clear Vision (17+) 541693.0
Minecraft: Pocket Edition 522012.0
PAC-MAN 508808.0
Calorie Counter & Diet Tracker by MyFitnessPal 507706.0
DragonVale 503230.0
The Weather Channel: Forecast, Radar & Alerts 495626.0
Head Soccer 481564.0
Google – Search made just for mobile 479440.0
Despicable Me: Minion Rush 464312.0
The Sims™ FreePlay 446880.0
Google Earth 446185.0
Plants vs. Zombies 426463.0
Sonic Dash 418033.0
Groupon - Deals, Coupons & Discount Shopping App 417779.0
8 Ball Pool™ 416736.0
Tiny Tower - Free City Building 414803.0
Jetpack J

Instant Heart Rate+: Heart Rate & Pulse Monitor 10158.0
Attack the Light - Steven Universe Light RPG 10132.0
Weed Firm: RePlanted 10122.0
Google Play Music 10118.0
Bumble – Find a Date, Meet Friends & Network 10109.0
Fresh Tracks Snowboarding 10107.0
Map My Ride+ - GPS Cycling & Route Tracker 10046.0
Big Button Box - funny sounds, sound effects buttons, pro fx soundboard, fun games board, scary music, annoying fart noises, jokes, super cool dj effect, cat, dog & animal fx 10031.0
Dark Sky Weather 10014.0
Certified Mixtapes - Hip Hop Albums & Mixtapes 9975.0
Flick Golf! 9948.0
High 5 Casino - Real Vegas Slots! 9930.0
Google Slides 9920.0
Home Design 3D GOLD 9889.0
The Impossible Game 9885.0
365Scores 9879.0
Five Nights at Freddy's 3 9876.0
Mr. Crab 9875.0
Worms 2: Armageddon 9870.0
All-in-1 Logic GameBox 9854.0
HP All-in-One Printer Remote 9819.0
Red Ball 4 9818.0
Remind: Fast, Efficient School Messaging 9796.0
Genies & Gems 9768.0
Dude Perfect 9763.0
Drop Flip 9752.0
The Sims 3 World A

Hudl 2622.0
Firefox web browser 2619.0
Ultimate Food Value Diary - Diet & Weight Tracker 2618.0
My Virtual Girlfriend - Deluxe Dating Sim 2609.0
Bakery Blitz: Cooking Game 2605.0
Video Merger Pro Combine Multiple Videos to Video 2604.0
Pixel Car Racer 2602.0
AirCoaster - Roller Coaster Builder 2602.0
Can You Dab? 2601.0
Scanner App by Photomyne: Scan & Auto-Crop Photos 2582.0
First Words Animals 2576.0
Bits of Sweets 2572.0
Trebel Music - Unlimited Music Downloader 2570.0
The Trace: Murder Mystery Game - Analyze evidence and solve the criminal case 2570.0
Red Onion - Tor-powered web browser for anonymous browsing and darknet 2566.0
Tozzle - Toddler's favorite puzzle 2562.0
Give It Up! 2560.0
Jelly Blast: New Exciting Match 3 2557.0
Soccer Clicker 2556.0
Pirate Power 2555.0
Nom Nom Paleo 2555.0
Due — Reminders, Countdown Timers 2554.0
Glowing Snake King - Anaconda Diep War Battle Game 2554.0
Dude Perfect HD 2552.0
Kitty Powers' Matchmaker 2552.0
TeachMe: Kindergarten 2548.0
Mad Skills B

Slice Fractions 834.0
AJ Jump: Animal Jam Kangaroos! 834.0
Dawn of Gods 833.0
Ketchapp Tennis 830.0
Camcorder - Record VHS Home Videos 830.0
All is Lost 829.0
Ski Tracks 829.0
Family Organizer - Calendar Planner 828.0
Infect Them All : Vampires 826.0
Guides for Pokémon GO - Pokemon GO News and Cheats 826.0
Cisco AnyConnect 825.0
Scotland Yard 824.0
TEDiSUB - Enjoy TED Talks with Subtitles 823.0
Eden: The Game - Build Your Village! 823.0
har•mo•ny 3 823.0
讯飞输入法-智能语音输入和表情斗图神器 822.0
MacID 820.0
Clash of Queens: Dragons Rise 820.0
Mutant Mudds 819.0
Defenders 2: Tower Defense battle of the frontiers 818.0
FL Studio Mobile 818.0
Videon - Video Camera and Editor with Zoom, Pause, Effects, Filters 818.0
Mimpi 816.0
Trick Shot 816.0
Crayola Color Alive 816.0
Snowboarding The Fourth Phase 815.0
Pretty Ballerina - Ballet Dreams 814.0
Gather - In real life 813.0
Focus To-Do: The Best Focus Timer for Work & Study 813.0
Nintype 813.0
Hockey Clicker 813.0
Big Day - Event Countdown 812.0
UFC ® 809.0


Dr. Seuss Camera - The Cat in the Hat Edition 249.0
Hydra - Amazing Photography 249.0
High Dive 249.0
ownCloud 249.0
Trainline UK: Live Train Times, Tickets & Planner 248.0
中国联通手机营业厅客户端(官方版) 248.0
Haunted Manor 2 - The Horror behind the Mystery - FULL (Christmas Edition) 248.0
Math 42 248.0
Dreampath - The Two Kingdoms HD - A Magical Hidden Object Game (Full) 248.0
VinoCell: manage your wine cellar like a pro 248.0
电信营业厅 248.0
Osmo Masterpiece 247.0
Crypt of the NecroDancer Pocket Edition 247.0
Galactic Nemesis 247.0
The EO Bar 247.0
Hiking Project 247.0
Motor Trend OnDemand 247.0
miCal - the missing calendar with reminders 245.0
Frax - The First Realtime Immersive Fractals 245.0
Dirt Racing Mobile 3D 245.0
PetMOJI – Character Creator & Emoji Keyboard 244.0
BeetMoji 244.0
FabFocus - portraits with depth and bokeh 244.0
News Pro - Breitbart Edition 244.0
Chaos Centurions 244.0
Take Off - The Flight Simulator 244.0
HARVEST MOON: Seeds Of Memories 244.0
InstaSave for Instagram - Download 

Mystery Trackers: Nightsville Horror HD - A Hidden Object Adventure (Full) 72.0
Copy My Data 72.0
Gumtree Classifieds: Buy & Sell - Local for Sale 72.0
Parents Calling Easter Bunny 72.0
Hunting USA 72.0
Save The Line 72.0
Flip King 72.0
Super Why! Phonics Fair 72.0
Royal Detective: Legend of The Golem - A Hidden Object Adventure (Full) 72.0
Small Town Terrors: Galdor's Bluff HD - A Magical Hidden Object Mystery (Full) 71.0
The Curio Society: Eclipse over Mesina HD - A Hidden Object Mystery (Full) 71.0
Azerrz Sounds 71.0
KNFB Reader 71.0
Signily Keyboard - Sign Language Emoji and GIFs! 71.0
Phantasmat: The Dread of Oakville - A Mystery Hidden Object Game (Full) 71.0
Chimeras: The Signs of Prophecy - A Hidden Object Adventure (Full) 71.0
Iron Commander 71.0
Space War HD 71.0
Cut the Rope HD 71.0
Teenage Mutant Ninja Turtles: Battle Match Game 70.0
Flowstate 70.0
NORAD Tracks Santa Claus 70.0
Sago Mini Babies 70.0
Grim Tales: Threads of Destiny - A Hidden Object Mystery (Full) 70.0
Vectra

Super Spy Girl Salon: Spa, Makeup and Dressup Game 12.0
Stealers Steal: A thief's quest for gold 12.0
【明星恋爱】偶像之路TIME TO STAR 12.0
실시간 날씨 12.0
Cleanz - Clean up Your Photo Library 12.0
GameDay College Football Radio - Live Games, Scores, News, Highlights, Videos, Schedule, and Rankings 12.0
Attack Heroes 12.0
Oh, the Places You'll Go! - Read & Play - Dr. Seuss 12.0
零基础学音标 12.0
Vandermojis by Lisa Vanderpump 12.0
Grandpa's Toy Shop 12.0
Assault on Arnhem 12.0
Monster Zombie Plague War - Virtual Reality (VR) 12.0
记账·圈子账本(专业版)—可共享的全能记帐本软件 12.0
Demi Lovato Stickers 12.0
宾果消消消 -赵丽颖代言 12.0
Addictive Pro 12.0
Clipstro - auto strobe motion video creator 12.0
Hear My Baby - Baby Heartbeat Monitor App 12.0
PJ Masks: Super City Run 12.0
Baby Cries Translator 12.0
Team Drift Cats 12.0
Aquarium VR 12.0
戦艦帝国-200艘の実在戦艦を集めろ (2周年記念&世界2000万DL) 12.0
Toby: The Secret Mine 12.0
Police Lights 3 - Fire Truck, Police and Paramedic Sirens 12.0
Dark Souls III Map Companion 11.0
DADA Trains 11.0
Stephen Hawking's

肌年齢診断 0.0
ｗｗｗ 0.0
自らデッドボール 0.0
洋葱圈—正经人的不正经聊天工具 0.0
飞刀传奇-动作武侠热血江湖即时PK传奇（登录爆金装） 0.0
パチスロ デビル メイ クライ クロス 0.0
ガールズちゃんねる - 女子のニュースとガールズトーク 0.0
英熟語ターゲット1000（4訂版） 0.0
【マイナビバイト大学生版】大学生のバイト探し・アルバイト求人 0.0
Pupoji - Cute Dog Emoji Keyboard Puppy Face Emojis 0.0
兜町アナリストがお伝えする「兜予報」（無料） 0.0
Masks - MSQRD Edition 0.0
Clash Playbook: Plan Attacks for Clash of Clans 0.0
ドリブルの達人 0.0
剑客情缘-高爆率高掉落天天疯玩 0.0
【放置】勇者改名 ～「ふざけた名前つけやがって！」 0.0
現金製造ロボ -毎日コツコツ3万円稼ぐ- 0.0
出会い系無料で遊べるsnsアプリ内緒チャットトーク 0.0
鬼吹灯昆仑神宫 - 年兽袭来 0.0
スッキリ謎解きゲーム！！ 0.0
えほう - 最強の恵方コンパス 0.0
Clash of Richers 3 （城市富翁3） 0.0
借金勇者~そして完済へ… 0.0
Poker - Texas Holdem HD Poker 0.0
Fluege.de - Finde den billigsten Flug 0.0
BLOCK(ブロック) -ぼくの箱庭【3D】- 0.0
NumberQ 0.0
素飛び 0.0
BlurEffect-Blur Photo & Video, Hide Face 0.0
问仙奇遇-新玩法新套装嗨到爆 0.0
心の美男美女診断 0.0
動画英文法2700 0.0
TAP BRAIN - 1日5分の計算で頭が良くなるゲーム 0.0
地元の掲示板「ジモティー」地元でカンタン！フリマよりもお得！ 0.0
Digiposte + 0.0
BringGo Western Europe 0.0
モンスト覇者の塔マルチ掲示板 for モンスターストライク 0.0
ケンタッキーフライドチキン　公式アプリ 0.0
MangaTiara - love comic reader 0.0
君にはク

#### Use the dictionary created above to remove the duplicate rows:
- Start by creating the empty lists android_clean and ios_clean (which will store the new, cleaned data sets) and android_already_added and ios_already_added (which will just store app names).
- Loop through each data set and, for each iteration:
    1. Assign the app name to a variable named name.
    2. Convert the number of reviews to float, and assign it to a variable named n_reviews.
- If n_reviews is the same as the maximum number of reviews for that app name (found in the corresponding reviews_max dictionary) and name is not already in the already_added list:
    1. Append the entire row to the clean list (which will eventually be a list of lists storing the cleaned data set).
    2. Append the app name to the already_added list; this helps us keep track of apps that we have already added.
- Explore the clean data sets to ensure everything went as expected. The Android and iOS data sets should have 9,659 and 7,195 rows respectively.

In [7]:
android_clean = []
android_already_added = []
ios_clean = []
ios_already_added = []
for app in android:
    name = app[0] 
    n_reviews = float(app[3])
    if name not in android_already_added and  android_reviews_max[name] == n_reviews:
        android_clean.append(app)
        android_already_added.append(name)
# print('android_clean', len(android_clean))
# print(android_clean[:3])

for app in ios:
    name = app[1] 
    n_reviews = float(app[5])
    if name not in ios_already_added and  ios_reviews_max[name] == n_reviews:
        ios_clean.append(app)
        ios_already_added.append(name)
print('ios_clean', len(ios_clean))
print(ios_clean[:3])
    
        

ios_clean 7195
[['284882215', 'Facebook', '389879808', 'USD', '0.0', '2974676', '212', '3.5', '3.5', '95.0', '4+', 'Social Networking', '37', '1', '29', '1'], ['389801252', 'Instagram', '113954816', 'USD', '0.0', '2161558', '1289', '4.5', '4.0', '10.23', '12+', 'Photo & Video', '37', '0', '29', '1'], ['529479190', 'Clash of Clans', '116476928', 'USD', '0.0', '2130805', '579', '4.5', '4.5', '9.24.12', '9+', 'Games', '38', '5', '18', '1']]
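
The two-step approach (build a dictionary of maximum review counts, then filter) can also be written as a single pass that stores the best row per app directly. This is only a sketch of an alternative: `keep_most_reviewed` is a hypothetical helper, the Slack rows are trimmed to four columns, and the Box row is made up.

```python
def keep_most_reviewed(rows, name_index, reviews_index):
    """Keep one row per app name: the first row with the highest review count."""
    best = {}
    for row in rows:
        name = row[name_index]
        n_reviews = float(row[reviews_index])
        # a strict > keeps the earlier row when two rows tie
        if name not in best or n_reviews > float(best[name][reviews_index]):
            best[name] = row
    return list(best.values())

# Rows trimmed to (name, category, rating, reviews); the Box values are made up.
rows = [
    ['Slack', 'BUSINESS', '4.4', '51507'],
    ['Slack', 'BUSINESS', '4.4', '51507'],
    ['Slack', 'BUSINESS', '4.4', '51510'],
    ['Box', 'BUSINESS', '4.2', '159872'],
]
clean = keep_most_reviewed(rows, 0, 3)
print(clean)
```

One difference from the two-step version: the result is ordered by each app's first appearance, so the row order may not match `android_clean` or `ios_clean` exactly even though the same rows are kept.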


#### Removing non-English apps from both data sets:
- Since we are analyzing apps that are developed in English, getting rid of non-English apps is a good idea, as it will save space and time.
- If we explore the data long enough, we'll find that both data sets have apps with names that suggest they are not directed toward an English-speaking audience.
- One way to go about this is to remove each app whose name contains a symbol that is not commonly used in English text.
- We can get the corresponding number of each character using the ord() built-in function.
- The numbers corresponding to the characters we commonly use in English text are all in the range 0 to 127, according to the ASCII system.
- Based on this number range, we can build a function that detects whether a character belongs to the set of common English characters.
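
To make the 0 to 127 range concrete, the short sketch below prints the `ord()` value of a few sample characters and whether each falls inside the ASCII range. The character selection is illustrative only.

```python
# Code points for a few characters that can appear in app names.
samples = ['a', 'Z', '5', '+', '™', '爱', '😜']
code_points = {ch: ord(ch) for ch in samples}

for ch, number in code_points.items():
    # True means the character is inside the ASCII range
    print(repr(ch), number, number <= 127)
```

Letters, digits and common punctuation stay at or below 127, while the trademark sign, the Chinese character and the emoji all land well above it.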

In [8]:
# Function that takes in a string and returns False if more than three of its
# characters fall outside the ASCII range (0 to 127); otherwise returns True
def eng_word(word):
    count = 0
    for letter in word:
        if ord(letter) > 127:
            count += 1
        if count > 3:
            return False
    return True
# testing eng_word
print(eng_word('Facebook'))
print(eng_word('Docs To Go™ Free Office Suite'))
print(eng_word('爱奇艺PPS -《欢乐颂2》电视剧热播'))
print(eng_word('Instachat 😜'))

True
True
False
True


- A function that rejected every name containing a non-ASCII character would wrongly flag English app names such as 'Docs To Go™ Free Office Suite' and 'Instachat 😜', because emojis and characters like ™ fall outside the ASCII range and have corresponding numbers over 127.
- To minimize the impact of data loss, we only remove an app if its name has more than three characters with corresponding numbers falling outside the ASCII range.

- Filter out non-English apps from both data sets: loop through each data set and, if an app name is identified as English, append the whole row to a separate list.
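
The three-character tolerance can be inspected by counting the non-ASCII characters in each name directly. The sketch below is an equivalent formulation with hypothetical helper names (`non_ascii_count`, `looks_english`), applied to the three names discussed in this section.

```python
def non_ascii_count(name):
    """Count the characters whose code point lies outside the ASCII range."""
    return sum(1 for ch in name if ord(ch) > 127)

def looks_english(name, max_non_ascii=3):
    # same rule as described above: tolerate up to three non-ASCII characters
    return non_ascii_count(name) <= max_non_ascii

names = ['Docs To Go™ Free Office Suite', 'Instachat 😜', '爱奇艺PPS -《欢乐颂2》电视剧热播']
for name in names:
    print(name, non_ascii_count(name), looks_english(name))
```

Exposing the threshold as a parameter makes it easy to see how many apps a stricter or looser rule would drop. The rule remains a heuristic: a short non-English name with three or fewer non-ASCII characters still passes.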

In [9]:
android_english = []
android_non_english = []
ios_english = []
ios_non_english = []
for app in android_clean:
    name = app[0]
    if eng_word(name):
        android_english.append(app)
    else:
        android_non_english.append(app)
print('android_english', len(android_english))
print('android_non_english', len(android_non_english))

for app in ios_clean:
    name = app[1]
    if eng_word(name):
        ios_english.append(app)
    else:
        ios_non_english.append(app)
print('ios_english', len(ios_english))

android_english 9614
android_non_english 45
ios_english 6181


#### So far in the data cleaning process, we:
- Removed inaccurate data
- Removed duplicate app entries
- Removed non-English apps

We are analyzing only apps that are free to download and install, since the main source of revenue consists of in-app ads. 
Our data sets contain both free and non-free apps, so we need to isolate the free apps for our analysis.
Isolating the free apps will be our last step in the data cleaning process. 

- Loop through each data set to isolate the free apps in separate lists, making sure to identify the columns describing the app price correctly.

- After isolating the free apps, check the length of each data set to see how many apps remain.

In [10]:
print(android_english[:2])
print(ios_english[:2])

[['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up'], ['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']]
[['284882215', 'Facebook', '389879808', 'USD', '0.0', '2974676', '212', '3.5', '3.5', '95.0', '4+', 'Social Networking', '37', '1', '29', '1'], ['389801252', 'Instagram', '113954816', 'USD', '0.0', '2161558', '1289', '4.5', '4.0', '10.23', '12+', 'Photo & Video', '37', '0', '29', '1']]


In [11]:
# android_english 9614, ios_english 6181
android_eng_free = []
android_eng_non_free = []
ios_eng_free = []
ios_eng_non_free = []
for app in android_english:
    price = app[6].lower()
    if price == 'free':
        android_eng_free.append(app)
    else:
        android_eng_non_free.append(app)
print('android_eng_free', len(android_eng_free))
print('android_eng_non_free', len(android_eng_non_free))
print('total android',(len(android_eng_free)+len(android_eng_non_free)))

for app in ios_english:
    price = float(app[4])
        
    if price == 0:
        ios_eng_free.append(app)
    else:
        ios_eng_non_free.append(app)
print('ios_eng_free', len(ios_eng_free))
print('ios_eng_non_free', len(ios_eng_non_free))
    

android_eng_free 8863
android_eng_non_free 751
total android 9614
ios_eng_free 3220
ios_eng_non_free 2961
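
The Android loop relies on the Type column while the iOS loop converts the price to a number. A single price-based test could serve both data sets; the sketch below is one way to write it. `is_free` is a hypothetical helper, and the leading `$` on paid prices is an assumption about the data rather than something shown in the rows above.

```python
def is_free(price_text):
    """Return True when a price string represents zero."""
    # Assumption: paid prices may carry a leading '$' (e.g. '$4.99'); strip it before converting.
    return float(price_text.strip().lstrip('$')) == 0.0

free_android_style = is_free('0')     # value seen in the Google Play rows above
free_ios_style = is_free('0.0')       # value seen in the App Store rows above
paid_example = is_free('$4.99')       # hypothetical paid price

print(free_android_style, free_ios_style, paid_example)
```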


In [12]:
print(android_header)
print(android_eng_free[1])
print('\n')
print(ios_header)
print(ios_eng_free[1])

['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']
['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']


['id', 'track_name', 'size_bytes', 'currency', 'price', 'rating_count_tot', 'rating_count_ver', 'user_rating', 'user_rating_ver', 'ver', 'cont_rating', 'prime_genre', 'sup_devices.num', 'ipadSc_urls.num', 'lang.num', 'vpp_lic']
['389801252', 'Instagram', '113954816', 'USD', '0.0', '2161558', '1289', '4.5', '4.0', '10.23', '12+', 'Photo & Video', '37', '0', '29', '1']
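
The code in this notebook refers to columns by position (for example `app[5]` or `app[-5]`), which is easy to get wrong. A lookup built from the header, sketched below, lets the position be found by name; `ios_col` is a hypothetical variable name and the header list is copied from the output above.

```python
ios_header = ['id', 'track_name', 'size_bytes', 'currency', 'price',
              'rating_count_tot', 'rating_count_ver', 'user_rating',
              'user_rating_ver', 'ver', 'cont_rating', 'prime_genre',
              'sup_devices.num', 'ipadSc_urls.num', 'lang.num', 'vpp_lic']

# Map each column name to its position in a row.
ios_col = {name: position for position, name in enumerate(ios_header)}

print(ios_col['track_name'])        # position of the app name
print(ios_col['rating_count_tot'])  # position of the total rating count
print(ios_col['prime_genre'])       # position of the primary genre
```

With this mapping, `app[ios_col['prime_genre']]` reads the same value as `app[-5]`, and the same pattern works for `android_header`.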


#### Determine the kinds of apps that are likely to attract more users 
- The goal is to publish the app on both Google Play and the App Store, so we need to find app profiles that are successful in both markets.
- Let's begin the analysis by getting a sense of the most common genres in each market.
- For this, we'll need to build frequency tables for a few columns in our data sets.

- Inspect both data sets and identify the columns we could use to generate frequency tables showing the most common genres in each market.
- We can use the Genres and prime_genre columns from the Android and iOS data sets respectively to determine the genre of each app.
- The review-count columns can indicate the number of users for each genre.

In [13]:
android_genre_dict = {}
ios_genre_dict = {}
for app in android_eng_free:
    genre = app[-4].lower()
#     reviews = float(app[3])
    android_genre_dict[genre] = android_genre_dict.get(genre, 0) + 1
print(android_genre_dict) 

for app in ios_eng_free:
    genre = app[-5].lower()
#     reviews = float(app[5])
    ios_genre_dict[genre] = ios_genre_dict.get(genre,0) + 1
print('\n')
print(ios_genre_dict)
    

{'art & design': 53, 'art & design;creativity': 6, 'auto & vehicles': 82, 'beauty': 53, 'books & reference': 190, 'business': 407, 'comics': 54, 'comics;creativity': 1, 'communication': 287, 'dating': 165, 'education': 474, 'education;creativity': 4, 'education;education': 30, 'education;pretend play': 5, 'education;brain games': 3, 'entertainment': 538, 'entertainment;brain games': 7, 'entertainment;creativity': 3, 'entertainment;music & video': 15, 'events': 63, 'finance': 328, 'food & drink': 110, 'health & fitness': 273, 'house & home': 73, 'libraries & demo': 83, 'lifestyle': 345, 'lifestyle;pretend play': 1, 'card': 40, 'arcade': 164, 'puzzle': 100, 'racing': 88, 'sports': 307, 'casual': 156, 'simulation': 181, 'adventure': 60, 'trivia': 37, 'action': 275, 'word': 23, 'role playing': 83, 'strategy': 80, 'board': 34, 'music': 18, 'action;action & adventure': 9, 'casual;brain games': 12, 'educational;creativity': 3, 'puzzle;brain games': 15, 'educational;education': 35, 'casual;pre

In [14]:
print(sorted([(v,k) for k,v in ios_genre_dict.items()], reverse = True))

[(1872, 'games'), (254, 'entertainment'), (160, 'photo & video'), (118, 'education'), (106, 'social networking'), (84, 'shopping'), (81, 'utilities'), (69, 'sports'), (66, 'music'), (65, 'health & fitness'), (56, 'productivity'), (51, 'lifestyle'), (43, 'news'), (40, 'travel'), (36, 'finance'), (28, 'weather'), (26, 'food & drink'), (18, 'reference'), (17, 'business'), (14, 'book'), (6, 'navigation'), (6, 'medical'), (4, 'catalogs')]
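
Raw counts are easier to compare across the two stores once they are expressed as shares of the total. As a small worked example, the sketch below converts the iOS counts printed above into percentages; the dictionary is copied from that output.

```python
# Counts copied from the sorted iOS output above.
ios_genre_counts = {
    'games': 1872, 'entertainment': 254, 'photo & video': 160, 'education': 118,
    'social networking': 106, 'shopping': 84, 'utilities': 81, 'sports': 69,
    'music': 66, 'health & fitness': 65, 'productivity': 56, 'lifestyle': 51,
    'news': 43, 'travel': 40, 'finance': 36, 'weather': 28, 'food & drink': 26,
    'reference': 18, 'business': 17, 'book': 14, 'navigation': 6, 'medical': 6,
    'catalogs': 4,
}

total = sum(ios_genre_counts.values())
percentages = {genre: count / total * 100 for genre, count in ios_genre_counts.items()}

print(total)
for genre in ['games', 'entertainment', 'photo & video']:
    print(genre, round(percentages[genre], 2))
```

The total equals the number of free English iOS apps found earlier, which is a useful sanity check that no genre was dropped.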


#### In the code below we will:
- Create a function for generating frequency tables, and use it in combination with the display_table() function.
    - freq_table() takes the parameters dataset (expected to be a list of lists) and ind (expected to be an integer column index).
    - The function returns the frequency table (as a dictionary) for any column we want, with frequencies expressed as percentages.
    - We will use it to display the frequency tables of the prime_genre column from iOS, and the Genres and Category columns from Android.

In [21]:
def freq_table(dataset, ind):
    # count how often each value appears in column ind
    table = {}
    for app in dataset:
        col = app[ind]
        table[col] = table.get(col, 0) + 1
    # convert the counts to percentages of the total number of rows
    total = len(dataset)
    for k in table:
        table[k] = (table[k] / total) * 100
    return table
# print(freq_table(android_eng_free, -4))
def display_table(dataset, ind):
    table = freq_table(dataset, ind)
    table_sorted = sorted([(v, k) for k, v in table.items()], reverse=True)
    for v, k in table_sorted:
        print(k, ':', v)

genres_freq_android = freq_table(android_eng_free, -4)
category_freq_android = freq_table(android_eng_free, 1)
prime_genre_freq_ios = freq_table(ios_eng_free, -5)

# display_table prints its rows and returns None, so call it directly
display_table(android_eng_free, -4)
print('\n')
display_table(android_eng_free, 1)
print('\n')
display_table(ios_eng_free, -5)

        
        
        
    
     
    

Tools : 8.450863138892023
Entertainment : 6.070179397495204
Education : 5.348076272142616
Business : 4.592124562789123
Productivity : 3.8925871601038025
Lifestyle : 3.8925871601038025
Finance : 3.7007785174320205
Medical : 3.5315355974275078
Sports : 3.463838429425702
Personalization : 3.317161232088458
Communication : 3.2381812027530184
Action : 3.102786866749408
Health & Fitness : 3.0802211440821394
Photography : 2.944826808078529
News & Magazines : 2.798149610741284
Social : 2.6627552747376737
Travel & Local : 2.324269434728647
Shopping : 2.245289405393208
Books & Reference : 2.1437436533904997
Simulation : 2.042197901387792
Dating : 1.8616721200496444
Arcade : 1.8503892587160102
Video Players & Editors : 1.771409229380571
Casual : 1.7601263680469368
Maps & Navigation : 1.399074805370642
Food & Drink : 1.241114746699763
Puzzle : 1.128286133363421
Racing : 0.9928917973598104
Role Playing : 0.9364774906916393
Libraries & Demo : 0.9364774906916393
Auto & Vehicles : 0.9251946293580051
S

### Analysis based on the frequency tables above:
#### prime_genre column from iOS:
    - most common genre: Games with 58%
    - runner-up: Entertainment with 7.89%
    - most iOS apps are designed for entertainment (games, photo and video, social networking, sports, music)

#### Category and Genres columns of the Google Play data set
    - most common: Family with 58% (Category) and Tools with 8.45% (Genres) respectively
    - runner-up: Game with 9.72% and Entertainment with 6.07% respectively
#### Comparing patterns of the Google Play market with the App Store market.
    - after comparing both markets, we can see a higher percentage of entertainment apps such as games on both platforms
    - the genres with the fewest apps are educational apps and apps that require reading, such as comics and art & design
- We can't recommend an app profile based on these frequency tables alone. A large number of apps in a particular genre does not imply that apps of that genre generally have a large number of users. For example, a single app such as Fitbit in the health category might have a large number of installs even though the genre contains very few apps.

### Calculate the average number of user ratings for each genre
The frequency tables we analyzed so far showed us that the App Store is dominated by apps designed for fun,
while Google Play shows a more balanced landscape of both practical and fun apps. Now, we'd like to get an idea of the kinds of apps that have the most users.
One way to find out which genres are the most popular (have the most users) is to calculate the average number of installs for each app genre.
For the Google Play data set, we can find this information in the Installs column, but it is missing from the App Store data set. 
As a workaround, we'll take the total number of user ratings as a proxy, which we can find in the rating_count_tot column.
Let's start by calculating the average number of user ratings per app genre on the App Store. To do that, we'll:
- Isolate the apps of each genre.
- Sum up the user ratings for the apps of that genre.
- Divide the sum by the number of apps belonging to that genre (not by the total number of apps).
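
The three steps above can be sketched on a handful of made-up (genre, rating count) pairs before applying them to the real rows. The numbers are illustrative only, and `defaultdict` is used here as a convenience in place of the `dict.get` pattern.

```python
from collections import defaultdict

# Made-up (genre, rating count) pairs standing in for rows of the iOS data set.
apps = [('games', 100), ('games', 300), ('navigation', 900), ('music', 50), ('music', 150)]

totals = defaultdict(float)   # sum of rating counts per genre
counts = defaultdict(int)     # number of apps per genre
for genre, n_ratings in apps:
    totals[genre] += n_ratings
    counts[genre] += 1

# Divide each genre's sum by that genre's own app count.
averages = {genre: totals[genre] / counts[genre] for genre in totals}
for genre, avg in sorted(averages.items(), key=lambda item: item[1], reverse=True):
    print(genre, avg)
```

In this toy data a single navigation app with many ratings tops the list, which hints at how one or two large apps can dominate a genre average.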

In [28]:
android_genre_count = {}
android_genre_installs = {}
android_avg_installs = {}

ios_genre_count = {}
ios_genre_rating = {}
ios_avg_rating = {}

for app in android_eng_free:
    genre = app[-4].lower()
    installs = float(app[5].replace(',','').replace('+',''))
    android_genre_installs[genre] = android_genre_installs.get(genre, 0) + installs
    android_genre_count[genre] = android_genre_count.get(genre, 0) + 1
for genre in android_genre_installs:
    android_avg_installs[genre] = android_genre_installs[genre] / android_genre_count[genre]

print(sorted(android_avg_installs.items(), key = lambda x:x[1], reverse = True ))

for app in ios_eng_free:
    genre = app[-5].lower()
    rating = float(app[5])
    ios_genre_rating[genre] = ios_genre_rating.get(genre,0) + rating
    ios_genre_count[genre] = ios_genre_count.get(genre, 0) + 1
for genre in ios_genre_rating:
    ios_avg_rating[genre] = ios_genre_rating[genre] / ios_genre_count[genre]
    
    
print('\n')
print(sorted(ios_avg_rating.items(), key = lambda item : item[1], reverse = True ))
    


[('communication', 38456119.167247385), ('adventure;action & adventure', 35333333.333333336), ('video players & editors', 24947335.796178345), ('social', 23253652.127118643), ('arcade', 22888365.48780488), ('casual', 19569221.602564104), ('puzzle;action & adventure', 18366666.666666668), ('photography', 17840110.40229885), ('educational;action & adventure', 17016666.666666668), ('productivity', 16787331.344927534), ('racing', 15910645.681818182), ('travel & local', 14051476.145631067), ('casual;action & adventure', 12916666.666666666), ('action', 12603588.872727273), ('strategy', 11339901.3125), ('tools', 10802461.246995995), ('lifestyle;pretend play', 10000000.0), ('casual;music & video', 10000000.0), ('tools;education', 10000000.0), ('card;action & adventure', 10000000.0), ('adventure;education', 10000000.0), ('role playing;brain games', 10000000.0), ('news & magazines', 9549178.467741935), ('music', 9445583.333333334), ('educational;pretend play', 9375000.0), ('puzzle;brain games', 

## Conclusion 

### Genre with the highest average user base:
#### Android
On average, communication apps have the most installs: 38,456,119. This number is heavily skewed upward by a few apps that have over one billion installs (WhatsApp, Facebook Messenger, Skype, Google Chrome, Gmail, and Hangouts), and a few others with over 100 million and 500 million installs.
#### iOS
On average, navigation apps have the highest number of user reviews. This figure is heavily influenced by Waze and Google Maps, which together have close to half a million user reviews.
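
Because a few giant apps pull these averages upward, it can help to look at the median alongside the mean. The sketch below uses hypothetical install counts (not values from the data sets) to show how far the two can diverge.

```python
from statistics import mean, median

# Hypothetical install counts for one genre: two giants and three small apps.
installs = [1_000_000_000, 500_000_000, 100_000, 50_000, 10_000]

genre_mean = mean(installs)      # pulled upward by the two largest apps
genre_median = median(installs)  # the middle value, unaffected by how large the giants are

print(genre_mean)
print(genre_median)
```

A large gap between the two suggests that the genre's apparent popularity rests on a few dominant apps rather than on typical apps in that genre.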