## Profitable App Profiles for the App Store and Google Play Markets

Our aim in this project is to find mobile app profiles that are profitable for the App Store and Google Play markets. We're working as data analysts for a company that builds Android and iOS mobile apps, and our job is to enable our team of developers to make data-driven decisions with respect to the kind of apps they build.

At our company, we only build apps that are free to download and install, and our main source of revenue consists of in-app ads. This means that our revenue for any given app is mostly influenced by the number of users that use our app. Our goal for this project is to analyze data to help our developers understand what kinds of apps are likely to attract more users.

### Opening and Exploring the Data

As of September 2018, there were approximately 2 million iOS apps available on the App Store, and 2.1 million Android apps on Google Play.

Collecting data for over four million apps requires a significant amount of time and money, so we'll analyze a sample of the data instead. To avoid spending resources on collecting new data ourselves, we should first check whether any relevant existing data is available at no cost. Luckily, there are two data sets that seem suitable for our purpose:

- [A data set](https://www.kaggle.com/lava18/google-play-store-apps) containing data about approximately ten thousand Android apps from Google Play. You can download the data set directly from [this link](https://dq-content.s3.amazonaws.com/350/googleplaystore.csv).
- [A data set](https://www.kaggle.com/ramamet4/app-store-apple-data-set-10k-apps) containing data about approximately seven thousand iOS apps from the App Store. You can download the data set directly from [this link](https://dq-content.s3.amazonaws.com/350/AppleStore.csv).

Let's start by opening the two data sets and then continue with exploring the data.

In [11]:
app_store_file_name = "AppleStore.csv"

In [12]:
play_store_file_name = "googleplaystore.csv"

In [13]:
app_store_data = open(app_store_file_name)
play_store_data = open(play_store_file_name)

In [14]:
#Without using csv library

#for lines in app_store_data:
    #print(lines.split(","))


In [15]:
import csv as rd

In [16]:
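# Read the whole file into a list of rows (the first row is the header).
# The encoding is set explicitly because app names contain non-ASCII characters;
# the with-block closes the file once the rows are loaded.
with open(app_store_file_name, 'r', encoding='utf-8', newline='') as app_store_d:
    adata = rd.reader(app_store_d, delimiter=',')
    app_store_data = list(adata)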

In [17]:
def data_info(x):
    print("Rows =", len(x))
    print("Columns =",len(x[0]))    


In [18]:
#print(app_store_data)

In [20]:
print(app_store_data[0])

['id', 'track_name', 'size_bytes', 'currency', 'price', 'rating_count_tot', 'rating_count_ver', 'user_rating', 'user_rating_ver', 'ver', 'cont_rating', 'prime_genre', 'sup_devices.num', 'ipadSc_urls.num', 'lang.num', 'vpp_lic']


In [23]:
print(len(app_store_data[0]))

16


In [22]:
print(len(app_store_data))

7198


In [24]:
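# Same approach for the Google Play file: explicit encoding, file closed after reading.
with open(play_store_file_name, 'r', encoding='utf-8', newline='') as play_store_d:
    pdata = rd.reader(play_store_d, delimiter=',')
    play_store_data = list(pdata)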
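
Both files are read in the same way, so the steps could be folded into one helper. The following is a minimal sketch, not the notebook's own code: `load_csv` and its `has_header` parameter are hypothetical names, and UTF-8 is assumed as the file encoding.

```python
import csv

def load_csv(path, has_header=True):
    """Return (header, rows) for a CSV file; header is None when has_header is False."""
    # newline='' lets the csv module handle line endings itself;
    # utf-8 is an assumption about how the files are encoded.
    with open(path, newline='', encoding='utf-8') as f:
        rows = list(csv.reader(f))
    if has_header and rows:
        return rows[0], rows[1:]
    return None, rows
```

Under these assumptions, `header, rows = load_csv("googleplaystore.csv")` would return the column names and the data rows separately, so the header never has to be sliced off by hand.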

In [25]:
#print(play_store_data)

In [26]:
play_store_data[0]

['App',
 'Category',
 'Rating',
 'Reviews',
 'Size',
 'Installs',
 'Type',
 'Price',
 'Content Rating',
 'Genres',
 'Last Updated',
 'Current Ver',
 'Android Ver']

In [27]:
play_store_headers = play_store_data[0]
play_store_data = play_store_data[1:]

play_store_headers

['App',
 'Category',
 'Rating',
 'Reviews',
 'Size',
 'Installs',
 'Type',
 'Price',
 'Content Rating',
 'Genres',
 'Last Updated',
 'Current Ver',
 'Android Ver']

In [28]:
len(play_store_data)
col_in_play_store = len(play_store_data[0])
print(col_in_play_store)

13


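### Wrong entry
One row in the Google Play CSV has fewer fields than the header (12 instead of 13), so its values are shifted out of their columns. The check below finds it by comparing each row's length with the number of columns.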

In [29]:
wrong_entries_indexes = []
row_index = 0
for row in play_store_data:
    if(len(row) != col_in_play_store):
        wrong_entries_indexes.append(row_index)
        print(row)
    row_index = row_index + 1

print(wrong_entries_indexes)

## Next step: wrap this check in a function so it can be reused on both data sets

['Life Made WI-Fi Touchscreen Photo Frame', '1.9', '19', '3.0M', '1,000+', 'Free', '0', 'Everyone', '', 'February 11, 2018', '1.0.19', '4.0 and up']
[10472]


In [30]:
#for i in wrong_entries_indexes:
    #play_store_data.pop(i)

print(len(play_store_data))

10841


In [31]:
data_info(play_store_data)

Rows = 10841
Columns = 13


In [32]:
def get_wrong_entries(x):
    wrong_indexes = []
    col = len(x[0])
    row_index = 0
    for row in x:
        if(len(row) != col):
            wrong_indexes.append(row_index)
            #print(row)
        row_index = row_index + 1
    
    return wrong_indexes

In [33]:
print(get_wrong_entries(play_store_data))

[10472]


In [34]:
def remove_wrong_entries(x):
    wrong_indexes = get_wrong_entries(x)
    if(len(wrong_indexes) > 0):
        
        wrong_indexes.reverse()
        for i in wrong_indexes:
            print("index=",i)
            x.pop(i)
    else:
        print("No wrong entries")
    
    return x
            

In [35]:
data_info(remove_wrong_entries(play_store_data))

index= 10472
Rows = 10840
Columns = 13
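
`remove_wrong_entries` reverses the index list before popping. The toy example below (made-up values, not rows from the data sets) sketches why that matters: popping in ascending order shifts the remaining elements, so a later index points at the wrong item.

```python
rows = ['a', 'BAD1', 'b', 'BAD2', 'c']
bad_indexes = [1, 3]

# Ascending order: after the first pop everything shifts left by one,
# so index 3 now refers to 'c' instead of 'BAD2'.
ascending = list(rows)
for i in bad_indexes:
    ascending.pop(i)

# Descending order: removing from the end leaves earlier indexes untouched.
descending = list(rows)
for i in sorted(bad_indexes, reverse=True):
    descending.pop(i)

# Alternative that avoids in-place mutation: build a new, filtered list.
bad = set(bad_indexes)
filtered = [row for i, row in enumerate(rows) if i not in bad]

print(ascending)   # ['a', 'b', 'BAD2']
print(descending)  # ['a', 'b', 'c']
print(filtered)    # ['a', 'b', 'c']
```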


In [36]:
data_info(remove_wrong_entries(app_store_data))

No wrong entries
Rows = 7198
Columns = 16


In [37]:
def get_duplicates(x,limit):
    check_for_duplicates = False
    count_of_duplicates = 0
    for row in x:
        if(x.count(row) > 1 and count_of_duplicates < limit):
            print("No. of duplicates= ",x.count(row), " Duplicate row= ",row)
            check_for_duplicates = True
            count_of_duplicates = count_of_duplicates + 1
    return check_for_duplicates

In [38]:
get_duplicates(play_store_data,5)

No. of duplicates=  2  Duplicate row=  ['Ebook Reader', 'BOOKS_AND_REFERENCE', '4.1', '85842', '37M', '5,000,000+', 'Free', '0', 'Everyone', 'Books & Reference', 'June 25, 2018', '5.0.6', '4.0 and up']
No. of duplicates=  2  Duplicate row=  ['Docs To Go™ Free Office Suite', 'BUSINESS', '4.1', '217730', 'Varies with device', '50,000,000+', 'Free', '0', 'Everyone', 'Business', 'April 2, 2018', 'Varies with device', 'Varies with device']
No. of duplicates=  3  Duplicate row=  ['Google My Business', 'BUSINESS', '4.4', '70991', 'Varies with device', '5,000,000+', 'Free', '0', 'Everyone', 'Business', 'July 24, 2018', '2.19.0.204537701', '4.4 and up']
No. of duplicates=  3  Duplicate row=  ['Box', 'BUSINESS', '4.2', '159872', 'Varies with device', '10,000,000+', 'Free', '0', 'Everyone', 'Business', 'July 31, 2018', 'Varies with device', 'Varies with device']
No. of duplicates=  2  Duplicate row=  ['ZOOM Cloud Meetings', 'BUSINESS', '4.4', '31614', '37M', '10,000,000+', 'Free', '0', 'Everyone'

True

In [39]:
get_duplicates(app_store_data,5)

False
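
`get_duplicates` calls `x.count(row)` for every row, which rescans the whole list each time and reports a repeated row once per occurrence. A possible alternative, sketched here with `collections.Counter`, counts every row in a single pass. The name `count_duplicates` and the sample rows are illustrative only.

```python
from collections import Counter

def count_duplicates(rows):
    """Map each repeated row (as a tuple) to the number of times it occurs."""
    # Lists are not hashable, so each row is converted to a tuple first.
    counts = Counter(tuple(row) for row in rows)
    return {row: n for row, n in counts.items() if n > 1}

sample = [
    ['Box', 'BUSINESS'],
    ['ZOOM Cloud Meetings', 'BUSINESS'],
    ['Box', 'BUSINESS'],
    ['Box', 'BUSINESS'],
]
print(count_duplicates(sample))  # {('Box', 'BUSINESS'): 3}
```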

In [40]:
def get_set_of_rows(x):
    result_set = set(tuple(row) for row in x)
    return list(result_set)


In [41]:
print(len(get_set_of_rows(play_store_data)))

10357


In [42]:
print("Number of duplicates = ", len(play_store_data)-len(get_set_of_rows(play_store_data)) )

Number of duplicates =  483


In [43]:
# Replace play_store_data with its de-duplicated rows
# (rows become tuples, and a set does not preserve their original order)

play_store_data = get_set_of_rows(play_store_data)
data_info(play_store_data)

Rows = 10357
Columns = 13
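
Because `get_set_of_rows` goes through a `set`, the surviving rows come back as tuples in arbitrary order. If the original order matters for later steps, one option is to track the rows already seen and keep the first occurrence of each. This is a sketch with a hypothetical helper name, `dedupe_rows`, and toy data.

```python
def dedupe_rows(rows):
    """Drop exact duplicate rows, keeping the first occurrence and the original order."""
    seen = set()
    unique = []
    for row in rows:
        key = tuple(row)  # hashable stand-in for the row
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique

sample = [['b', '1'], ['a', '2'], ['b', '1']]
print(dedupe_rows(sample))  # [['b', '1'], ['a', '2']]
```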


In [44]:
play_store_headers

['App',
 'Category',
 'Rating',
 'Reviews',
 'Size',
 'Installs',
 'Type',
 'Price',
 'Content Rating',
 'Genres',
 'Last Updated',
 'Current Ver',
 'Android Ver']

In [45]:

def get_dict_of_columns(x,headers):
    store_dict = dict()
    count = 0
    for head in headers:
        for row in x:
            store_dict.setdefault(head,[]).append(row[count])
        count = count + 1
    return store_dict
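
The same row-to-column transposition can be written more compactly with `zip(*rows)`. The sketch below uses a hypothetical name, `columns_by_header`, and assumes every row has as many fields as there are headers (`zip` silently truncates to the shortest row otherwise).

```python
def columns_by_header(rows, headers):
    """Build {header: [values in that column]} from a list of equal-length rows."""
    # zip(*rows) yields one tuple per column; pair each with its header.
    return {head: list(col) for head, col in zip(headers, zip(*rows))}

headers = ['App', 'Category']
rows = [['Box', 'BUSINESS'], ['ZOOM Cloud Meetings', 'BUSINESS']]
print(columns_by_header(rows, headers))
# {'App': ['Box', 'ZOOM Cloud Meetings'], 'Category': ['BUSINESS', 'BUSINESS']}
```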

In [46]:
play_store_dict = get_dict_of_columns(play_store_data,play_store_headers)
#play_store_dict

In [47]:
## Looking for duplicates according to only one column


In [48]:
for i in play_store_dict:
    print("Column= ",i)
    get_duplicates(play_store_dict[i],10)
    

Column=  App
No. of duplicates=  2  Duplicate row=  Moovit: Bus Time & Train Time Live Info
No. of duplicates=  5  Duplicate row=  Angry Birds Classic
No. of duplicates=  2  Duplicate row=  BZWBK24 mobile
No. of duplicates=  2  Duplicate row=  Manga AZ - Manga Comic Reader
No. of duplicates=  2  Duplicate row=  Video Downloader
No. of duplicates=  2  Duplicate row=  YouCam Perfect - Selfie Photo Editor
No. of duplicates=  3  Duplicate row=  Magic Tiles 3
No. of duplicates=  2  Duplicate row=  SayHi Chat, Meet New People
No. of duplicates=  2  Duplicate row=  Pinterest
No. of duplicates=  2  Duplicate row=  textPlus: Free Text & Calls
Column=  Category
No. of duplicates=  1943  Duplicate row=  FAMILY
No. of duplicates=  137  Duplicate row=  MAPS_AND_NAVIGATION
No. of duplicates=  1121  Duplicate row=  GAME
No. of duplicates=  1943  Duplicate row=  FAMILY
No. of duplicates=  373  Duplicate row=  LIFESTYLE
No. of duplicates=  1943  Duplicate row=  FAMILY
No. of duplicates=  1121  Duplicat
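
The `App` column still contains repeated names even though exact duplicate rows were removed, so those rows must differ in some other field. One way to reduce them to a single row per app is sketched below. Keeping the row with the highest `Reviews` value is an assumption made for this example, not a rule taken from the notebook; `keep_most_reviewed` is a hypothetical helper, and the review counts in the sample are made up. The sketch also assumes `Reviews` (column index 3) always parses as an integer.

```python
def keep_most_reviewed(rows, name_col=0, reviews_col=3):
    """Keep one row per app name: the one with the largest review count."""
    best = {}
    for row in rows:
        name = row[name_col]
        reviews = int(row[reviews_col])
        if name not in best or reviews > int(best[name][reviews_col]):
            best[name] = row
    return list(best.values())

sample = [
    ['Pinterest', 'SOCIAL', '4.6', '100'],
    ['Pinterest', 'SOCIAL', '4.6', '250'],
    ['Box', 'BUSINESS', '4.2', '50'],
]
print(keep_most_reviewed(sample))
# [['Pinterest', 'SOCIAL', '4.6', '250'], ['Box', 'BUSINESS', '4.2', '50']]
```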

In [63]:
def check_for_non_asc(x):
    non_asc_index = []
    for val in x:
        if not val.isascii():
            non_asc_index.append(x.index(val))
    #return x.isascii()
    
    return non_asc_index


print(check_for_non_asc(["Test"]))
print(check_for_non_asc(["_1991_اف_جي2"]))
print(check_for_non_asc(["Babbel – Learn Languages"]))


[]
[0]
[0]


In [61]:
non_eng_index = check_for_non_asc(play_store_dict['App'])
#print(non_eng_index)

for x in non_eng_index:
    print(x,play_store_dict['App'][x])

7 Fruit Ninja®
13 XCOM®: Enemy Within
17 Xperia Link™
24 Cookpad - FREE recipe search makes fun cooking · musical making!
44 Anime Love Story Games: ✨Shadowtime✨
58 AC - Tips & News for Android™
117 뽕티비 - 개인방송, 인터넷방송, BJ방송
130 Meitu – Beauty Cam, Easy Photo Editor
147 SecondSecret ‐「恋を読む」BLノベルゲーム‐
158 Skip-Bo™
205 U Launcher Lite – FREE Live Cool Themes, Hide Apps
223 Low Poly – Puzzle art game
232 Sona - Nær við allastaðni
233 EA SPORTS™ FIFA 18 Companion
261 💘 WhatsLov: Smileys of love, stickers and GIF
263 EZCast – Cast Media to TV
276 SUBWAY®
281 FK Čukarički
282 Babbel – Learn Languages
292 Cheapflights – Flight Search
309 Live Camera Viewer ★ World Webcam & IP Cam Streams
313 DU Browser—Browse fast & fun
331 Direct Express®
332 Robot Fighting Games™ - Real Boxing Champions 3D
368 Mirror’s Edge™ Companion
375 DG ग्राम / Digital Gram Panchayat
282 Babbel – Learn Languages
407 Disciple Maker’s (DM) Lab
439 Lep's World 3 🍀🍀🍀
441 PBA® Bowling Challenge
446 Knightfall™ AR
448 studentsL
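
Index 282 appears twice in the output above. This is because `x.index(val)` returns the position of the first matching element, so a name that occurs more than once is always reported at its first position. A minimal sketch of a variant that records each element's own position with `enumerate` follows; `non_ascii_positions` is a hypothetical name, and `str.isascii()` requires Python 3.7 or later.

```python
def non_ascii_positions(values):
    """Return the position of every string that contains a non-ASCII character."""
    return [i for i, val in enumerate(values) if not val.isascii()]

names = ['Babbel – Learn Languages', 'Test', 'Babbel – Learn Languages']

# list.index() finds the first match, so the duplicate is reported at 0 again.
print([names.index(n) for n in names if not n.isascii()])  # [0, 0]

# enumerate() reports where each element actually sits.
print(non_ascii_positions(names))  # [0, 2]
```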

In [95]:
def check(x):
    try:
        x.encode(encoding='utf-8').decode('utf-8')
    except UnicodeDecodeError:
        return False
    else:
        return True

In [96]:
check("Babbel–Learn Languages")

True

In [97]:
check("Babbel-LearnLanguages")

True

In [91]:
ord('–')

8211

In [92]:
ord('-')

45

In [98]:
check("SUBWAY®")

True

In [99]:
check("뽕티비 - 개인방송, 인터넷방송, BJ방송")

True
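
`check` returns `True` for every name tested here, including the Korean one, because ordinary Python strings can always be encoded to UTF-8 and decoded back. The round trip therefore says nothing about whether a name is English. A common heuristic, sketched below as an assumption rather than as the notebook's method, is to count characters outside the ASCII range and tolerate a few of them, so that symbols such as `®`, `™`, or an en dash do not disqualify an otherwise English name. The function name `looks_english` and the threshold of 3 are illustrative choices.

```python
def looks_english(name, max_non_ascii=3):
    """Heuristic: allow at most max_non_ascii characters outside the ASCII range."""
    non_ascii = sum(1 for ch in name if ord(ch) > 127)
    return non_ascii <= max_non_ascii

print(looks_english("Babbel – Learn Languages"))           # True
print(looks_english("SUBWAY®"))                            # True
print(looks_english("Lep's World 3 🍀🍀🍀"))                # True
print(looks_english("뽕티비 - 개인방송, 인터넷방송, BJ방송"))  # False
```

A threshold like this is a trade-off: it keeps names decorated with trademark signs or emoji, but short non-English names with three or fewer non-ASCII characters would still pass.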