# Data Analysis on Apps

## *Problem Statement*
 
The company earns its revenue from in-app ads, so its income depends on how many users an app attracts. The goal of this project is to find out which types of apps attract the most users. What genres of app do they like?

The company concentrates on in-app ads rather than on app development. Its interest is in free apps that are convenient to store. 


In [1]:
from csv import reader

# Opening the datasets and converting them to lists of rows
ios_dataset = list(reader(open('AppleStore.csv', encoding = 'utf8')))
android_dataset = list(reader(open('googleplaystore.csv', encoding = 'utf8')))
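
The cell above leaves both files open. As a minimal sketch, assuming we want the files closed as soon as they are read, a small helper (the name `load_csv` is hypothetical, not part of the original notebook) could return the header and the data rows separately:

```python
import csv


def load_csv(path):
    """Read a CSV file and return (header, rows).

    `load_csv` is an illustrative helper; the notebook itself keeps the
    header as row 0 of each dataset.
    """
    # The with-block closes the file even if reading fails.
    with open(path, encoding='utf8', newline='') as f:
        rows = list(csv.reader(f))
    return rows[0], rows[1:]
```

With this helper, `header, apps = load_csv('AppleStore.csv')` would give the column names and the app rows without the later need to slice with `[1:]`.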



To explore the data, we use the **explore** function to print a chosen range of rows. This makes it easier to inspect the data and later select the columns or rows needed for further analysis.

In [2]:
def explore(dataset, start, end, rows_columns = False):
    dataset_slice = dataset[start:end]
    for row in dataset_slice:
        print(row)
        print('\n')
        
    if rows_columns:
        print('Number of rows', len(dataset))
        print('Number of columns', len(dataset[0]))
        

print('Apple apps \n')
print(ios_dataset[0])
print('\n')
explore(ios_dataset, 1, 3, True)

Apple apps 

['id', 'track_name', 'size_bytes', 'currency', 'price', 'rating_count_tot', 'rating_count_ver', 'user_rating', 'user_rating_ver', 'ver', 'cont_rating', 'prime_genre', 'sup_devices.num', 'ipadSc_urls.num', 'lang.num', 'vpp_lic']


['284882215', 'Facebook', '389879808', 'USD', '0.0', '2974676', '212', '3.5', '3.5', '95.0', '4+', 'Social Networking', '37', '1', '29', '1']


['389801252', 'Instagram', '113954816', 'USD', '0.0', '2161558', '1289', '4.5', '4.0', '10.23', '12+', 'Photo & Video', '37', '0', '29', '1']


Number of rows 7198
Number of columns 16


The iOS dataset has 7,198 rows including the header, so it contains 7,197 apps. The column names are explained in the dataset [documentation](https://www.kaggle.com/ramamet4/app-store-apple-data-set-10k-apps/home). The columns needed for our analysis are: 

- **'track_name'** - Name of the app
- **'currency'** - Type of Currency
- **'price'** - Free or Paid app
- **'rating_count_tot'** - User rating total counts
- **'rating_count_ver'** - User rating for current version counts
- **'user_rating'** - Average user rating for respective app

Let's explore the Google Play apps.

In [3]:
print('Google Apps \n')
print(android_dataset[0])
print('\n')
explore(android_dataset, 1, 3, True)

Google Apps 

['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']


['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']


['Coloring book moana', 'ART_AND_DESIGN', '3.9', '967', '14M', '500,000+', 'Free', '0', 'Everyone', 'Art & Design;Pretend Play', 'January 15, 2018', '2.0.0', '4.0.3 and up']


Number of rows 10842
Number of columns 13


The Android dataset has 10,842 rows including the header, so it contains 10,841 apps. The columns needed for our analysis are: 'App', 'Category', 'Reviews', 'Installs', 'Type', 'Price', 'Genres'.  

## Deleting Wrong data

Before processing further, we need to check whether the data contains duplicates or incorrect entries. For the Android dataset, the [discussion](https://www.kaggle.com/lava18/google-play-store-apps/discussion/163791) section helps to identify incorrect data. One user points out that row 10473 has its values shifted to the left, so we can drop that row.

In [4]:
print(android_dataset[10473])
print('\n')
print(android_dataset[0])
print ('\n')


['Life Made WI-Fi Touchscreen Photo Frame', '1.9', '19', '3.0M', '1,000+', 'Free', '0', 'Everyone', '', 'February 11, 2018', '1.0.19', '4.0 and up']


['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']




Comparing the incorrect row with the header shows that the Category value is missing, so every following value sits one column too far to the left. We'll delete this row.

In [5]:
del android_dataset[10473]
print(len(android_dataset))

10841
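
The shifted row has 12 values while the header has 13, so rows like it can also be found by comparing column counts rather than relying on a known index. The following is a minimal sketch under that assumption; `find_malformed_rows` and the sample data are illustrative, not part of the original notebook:

```python
def find_malformed_rows(dataset):
    """Return the indices of rows whose column count differs from the header's."""
    expected = len(dataset[0])
    return [i for i, row in enumerate(dataset) if len(row) != expected]


# Tiny made-up dataset: the last row is missing its Category value.
sample = [
    ['App', 'Category', 'Rating'],
    ['Some App', 'TOOLS', '4.1'],
    ['Broken App', '1.9'],
]

bad_indices = find_malformed_rows(sample)

# Delete from the end so earlier indices stay valid.
for i in reversed(bad_indices):
    del sample[i]
```

Checking the column count before deleting also guards against running the `del` cell twice, which would otherwise remove a valid row.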


## Removing repeated data


After removing the incorrect row, we need to identify duplicate rows. An app that appears more than once is counted more than once, which skews the analysis, so each app should be kept only once. This takes two steps:

- First, identify the duplicate rows
- Second, decide on a criterion for choosing which row to keep, and remove the rest

In [6]:
# Identify the duplicates
duplicate_apps = []
unique_apps = []
for row in android_dataset:
    app = row[0]
    if app in unique_apps:
        duplicate_apps.append(app)
    else:
        unique_apps.append(app)
print('Total no. of duplicated apps - ', len(duplicate_apps))
print('\n')
print('Examples of duplicate apps:', duplicate_apps[:10])

Total no. of duplicated apps -  1181


Examples of duplicate apps: ['Quick PDF Scanner + OCR FREE', 'Box', 'Google My Business', 'ZOOM Cloud Meetings', 'join.me - Simple Meetings', 'Box', 'Zenefits', 'Google Ads', 'Google My Business', 'Slack']


In [7]:
# frequency table of duplicated rows
dicti_duplicate = {}

for name in duplicate_apps:
    if name in dicti_duplicate:
        dicti_duplicate[name] += 1
    else:
        dicti_duplicate[name] = 1
        
print(len(dicti_duplicate))

798


The 1,181 duplicate rows belong to 798 distinct apps. 
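
The two counts above (extra rows and distinct duplicated names) can also be obtained in one pass with `collections.Counter`. This is a sketch of an alternative, not the notebook's own method; `duplicate_counts` and the sample rows are hypothetical:

```python
from collections import Counter


def duplicate_counts(dataset, name_index=0):
    """Map each app name that occurs more than once to its number of occurrences."""
    counts = Counter(row[name_index] for row in dataset)
    return {name: n for name, n in counts.items() if n > 1}


# Made-up rows holding only the app name.
sample = [['Box'], ['Slack'], ['Box'], ['Box'], ['Zoom'], ['Slack']]

dupes = duplicate_counts(sample)

# Number of distinct apps that are duplicated.
n_duplicated_apps = len(dupes)

# Number of surplus rows: every occurrence beyond the first one.
n_extra_rows = sum(n - 1 for n in dupes.values())
```

Because a `Counter` uses hashing, this avoids the repeated `in` lookups on a list, which become slow as the list of unique names grows.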

In [8]:
for app in android_dataset:
    name = app[0]
    if name == 'Instagram':
        print(app)
        print('\n')

['Instagram', 'SOCIAL', '4.5', '66577313', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']


['Instagram', 'SOCIAL', '4.5', '66577446', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']


['Instagram', 'SOCIAL', '4.5', '66577313', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']


['Instagram', 'SOCIAL', '4.5', '66509917', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']


We can see that the rows differ in the number of reviews (the 'Reviews' column). We can keep the row with the highest number of reviews and delete the rest, since a higher review count suggests that the data was collected more recently.

This leads to the second step: **identifying the criterion**.

In [9]:
reviews_max = {}
for row in android_dataset[1:]:
    name = row[0]
    n_reviews = float(row[3])
    if name in reviews_max and reviews_max[name] < n_reviews:
        reviews_max[name] = n_reviews
    elif name not in reviews_max:
        reviews_max[name] = n_reviews

print('Expected length:', len(android_dataset[1:]) - 1181)
print('Actual length:', len(reviews_max))

Expected length: 9659
Actual length: 9659


Let's use the dictionary created above to remove the duplicates. Now that each app name is mapped to its maximum number of reviews, we can build a new list with the clean data. We can check that the data has been cleaned by comparing the length of the dictionary (**'reviews_max'**) with the length of the clean Android dataset (**'android_clean'**).

In [10]:
android_clean = [] # To store new cleaned data set
already_android =[] # To store app_names
for row in android_dataset[1:]:
    name = row[0]
    n_reviews = float(row[3])
    if n_reviews == reviews_max[name] and name not in already_android:
        android_clean.append(row)
        already_android.append(name)

print(len(android_clean))

9659
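
The two cells above first record the maximum review count per app and then collect the matching rows. As a sketch of an equivalent single-pass approach (the helper name `keep_most_reviewed` is hypothetical), a dictionary can store the best row seen so far for each name:

```python
def keep_most_reviewed(rows, name_index=0, reviews_index=3):
    """Keep one row per app name: the first row with the highest review count."""
    best = {}
    for row in rows:
        name = row[name_index]
        n_reviews = float(row[reviews_index])
        # Strict '>' keeps the earliest row when review counts tie.
        if name not in best or n_reviews > float(best[name][reviews_index]):
            best[name] = row
    return list(best.values())


# Made-up rows with the same column positions for name and reviews.
sample = [
    ['Instagram', 'SOCIAL', '4.5', '66577313'],
    ['Instagram', 'SOCIAL', '4.5', '66577446'],
    ['Box', 'BUSINESS', '4.2', '159'],
    ['Instagram', 'SOCIAL', '4.5', '66509917'],
]

cleaned = keep_most_reviewed(sample)
```

One difference worth noting: the result is ordered by the first appearance of each app name, which may not match the row order produced by the two-step version.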


To check whether the iOS dataset has any duplicates, let's create a frequency table and compare its length with the length of the iOS dataset.

In [11]:
already_ios = {}
for row in ios_dataset[1:]:
    identity = row[0]
    if identity in already_ios:
        already_ios[identity] += 1
    else: 
        already_ios[identity] = 1

print('The length of actual ios dataset', len(ios_dataset[1:]))
print('The length of frequency table of ios apps', len(already_ios))
        

The length of actual ios dataset 7197
The length of frequency table of ios apps 7197


From the result above, we can conclude that there are no duplicates, as each app has its own identity (**'id'**) number. Note that duplicates are checked here with the identity number rather than the app name (**'track_name'**).

## Removing Non-English Apps

To check whether an app is aimed at English speakers, we look at its name. One way is to remove every app whose name contains a symbol that is not commonly used in English text. English text usually includes letters from the English alphabet, digits from 0 to 9, punctuation marks (., !, ?, ;), and other symbols (+, *, /).

Each character corresponds to a number in the ASCII system. The numbers for the characters commonly used in English text fall in the range 0 to 127, so a character with a number greater than 127 is not a standard English character. Because some English app names contain emoji or symbols such as ™, an app is only treated as non-English if its name has more than three such characters.

- First, identify whether the app is English or not
- Second, remove the rows corresponding to non-English apps
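
To make the character-code idea concrete, here is a small sketch that prints the code of a few characters with the built-in `ord` and counts the characters above 127 in a name. The helper `count_non_ascii` is illustrative only:

```python
def count_non_ascii(text):
    """Count the characters whose code point is greater than 127."""
    return sum(1 for ch in text if ord(ch) > 127)


# Letters and common symbols stay at or below 127; the trademark sign,
# emoji and Chinese characters are all above it.
for ch in ['a', '+', '™', '😜', '爱']:
    print(repr(ch), ord(ch))

names = [
    'Instagram',
    'Docs To Go™ Free Office Suite',
    'Instachat 😜',
    '爱奇艺PPS -《欢乐颂2》电视剧热播',
]
non_ascii_per_name = [count_non_ascii(name) for name in names]
```

Only the last name has more than three characters above 127, which is why a threshold of three keeps names with a single emoji or ™ while still filtering out names written in another script.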


In [12]:
def eng_non_eng(word): # Check whether an app name is English or not.
    j = 0
    for i in word:
        if ord(i) > 127:
            j += 1              
            if j > 3:        # To check if there are more than 3 characters that
                             # don't belong to english language
                return False
    return True
print(eng_non_eng('Instagram'))
print(eng_non_eng('爱奇艺PPS -《欢乐颂2》电视剧热播'))
print(eng_non_eng('Docs To Go™ Free Office Suite'))
print(eng_non_eng('Instachat 😜'))

True
False
True
True


Let's apply this to the Android dataset and separate the English apps from the non-English apps in the cleaned dataset.

In [13]:
english_android_dataset = []
for row in android_clean:
    name = row[0]
    if eng_non_eng(name):
        english_android_dataset.append(row)
        
print('The total number of english android apps',len(english_android_dataset))
print('The total number of non-english android apps',len(android_clean)-len(english_android_dataset))
        

The total number of english android apps 9614
The total number of non-english android apps 45


Let's do the same for the Apple apps dataset.

In [14]:
english_apple_dataset = []
for row in ios_dataset[1:]:
    name = row[1]
    if eng_non_eng(name):
        english_apple_dataset.append(row)
            
print('The total number of english IOs apps', len(english_apple_dataset))
print('The total number of non-english IOs apps', len(ios_dataset[1:])-len(english_apple_dataset))

The total number of english IOs apps 6183
The total number of non-english IOs apps 1014


## Free Apps

As mentioned, the company is only interested in in-app ads and only builds apps that are free to download and install. As the datasets contain both free and non-free apps, we need to separate them.

- First identify the free apps
- Second isolate the free apps

Let's start with the Android dataset.

In [15]:
free_android_apps = []
for row in english_android_dataset:
    type_app = row[6]
    if type_app == 'Free':
        free_android_apps.append(row)

print('The total number of free android apps', len(free_android_apps))
print('The total number of non-free apps', len(english_android_dataset)-len(free_android_apps))
    
    

The total number of free android apps 8863
The total number of non-free apps 751


Now let's identify and isolate the free iOS apps.

In [16]:
free_apple_apps = []
for row in english_apple_dataset:
    price = float(row[4])
    if price == 0.0:
        free_apple_apps.append(row)
        
print('The total number of free iOs apps', len(free_apple_apps))
print('The total number non-free iOs apps', len(english_apple_dataset)-len(free_apple_apps))

The total number of free iOs apps 3222
The total number non-free iOs apps 2961


## Common apps by Genre

So far we have cleaned the datasets. The main aim is to determine the kinds of apps that attract the most users.

The company's validation strategy comprises three steps:

- Build a minimal Android version of the app and add it to Google Play.
- If the app has a good response from users, develop it further.
- If the app is profitable after six months, build an iOS version of the app and add it to the App Store.

So, we create a frequency table to identify the most common genres in each market.

In [17]:
genre_android_apps = {}                # Frequency-Genre data for  android apps
for row in free_android_apps:
    genre = row[9]
    if genre in genre_android_apps:
        genre_android_apps[genre] += 1
    else:
        genre_android_apps[genre] = 1
maximum = 0
name = '' 
for i in genre_android_apps:            # To find most common genre in android apps
    if genre_android_apps[i] > maximum:
        maximum = genre_android_apps[i]
        name = i
print('The most common genre in android apps is', name)
print('The number of apps in this genre is', maximum)

The most common genre in android apps is Tools
The number of apps in this genre is 749


In [18]:
genre_apple_apps = {}
for row in free_apple_apps:
    genre = row[11]
    if genre in genre_apple_apps:
        genre_apple_apps[genre] += 1    
    else:
        genre_apple_apps[genre] = 1
maximum = 0
name = ''
for i in genre_apple_apps:
    if genre_apple_apps[i] > maximum:
        maximum = genre_apple_apps[i]
        name = i
print('The most common genre in iOS apps is', name)
print('The number of apps in this genre is', maximum)

The most common genre in iOS apps is Games
The number of apps in this genre is 1874


Absolute counts alone don't tell us much. Percentage frequency tables make the data easier to understand, and sorting them in descending order shows the most common genres in both markets.

- One function to generate frequency tables to show percentages
- Another function to display the data in descending order according to the percentages

In [19]:
def freq_table(dataset, index):
    frequency_table = {}
    total = 0
    for row in dataset:
        total += 1
        genre = row[index]
        if genre in frequency_table:
            frequency_table[genre] += 1
        else:
            frequency_table[genre] = 1
            
    percentages = {}
    for genre in frequency_table:
        percentage = (frequency_table[genre]/total) * 100
        percentages[genre] = percentage
        
    return percentages

def display_table(dataset, index):
    table = freq_table(dataset, index)
    table_display = []
    for key in table:
        key_val_as_tuple = (table[key], key)
        table_display.append(key_val_as_tuple)
        
    table_sorted = sorted(table_display, reverse = True)
    for entry in table_sorted:
        print(entry[1],':', entry[0])
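
For comparison, the same percentage table can be sketched with `collections.Counter`, whose `most_common()` method already returns the entries sorted by count in descending order. The function name `percentage_table` and the sample rows are assumptions for illustration:

```python
from collections import Counter


def percentage_table(dataset, index):
    """Return (value, percentage) pairs for one column, most common first."""
    counts = Counter(row[index] for row in dataset)
    total = sum(counts.values())
    return [(value, count / total * 100) for value, count in counts.most_common()]


# Made-up rows: app name followed by genre.
sample = [
    ['App A', 'Games'],
    ['App B', 'Games'],
    ['App C', 'Music'],
    ['App D', 'Games'],
]

table = percentage_table(sample, 1)
for genre, percentage in table:
    print(genre, ':', percentage)
```

Unlike sorting `(percentage, key)` tuples, this orders ties by first appearance rather than by reverse alphabetical order of the key, so rows with equal percentages may print in a different order.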
    

In [20]:
display_table(free_apple_apps, -5) # to display the percentage-table of iOS  

Games : 58.16263190564867
Entertainment : 7.883302296710118
Photo & Video : 4.9658597144630665
Education : 3.662321539416512
Social Networking : 3.2898820608317814
Shopping : 2.60707635009311
Utilities : 2.5139664804469275
Sports : 2.1415270018621975
Music : 2.0484171322160147
Health & Fitness : 2.0173805090006205
Productivity : 1.7380509000620732
Lifestyle : 1.5828677839851024
News : 1.3345747982619491
Travel : 1.2414649286157666
Finance : 1.1173184357541899
Weather : 0.8690254500310366
Food & Drink : 0.8069522036002483
Reference : 0.5586592178770949
Business : 0.5276225946617008
Book : 0.4345127250155183
Navigation : 0.186219739292365
Medical : 0.186219739292365
Catalogs : 0.12414649286157665


As seen above, Games is by far the most common genre among the free English iOS apps, at about 58%. Further down the list, apps meant for fun (Games, Entertainment, Photo & Video, Music and Sports) are well represented, whereas apps with a practical purpose have smaller shares. Note that these percentages measure how many apps exist in each genre, not how many users those apps have.

In [21]:
display_table(free_android_apps, 9) # percentage-table for android apps for genres

Tools : 8.450863138892023
Entertainment : 6.070179397495204
Education : 5.348076272142616
Business : 4.592124562789123
Productivity : 3.8925871601038025
Lifestyle : 3.8925871601038025
Finance : 3.7007785174320205
Medical : 3.5315355974275078
Sports : 3.463838429425702
Personalization : 3.317161232088458
Communication : 3.2381812027530184
Action : 3.102786866749408
Health & Fitness : 3.0802211440821394
Photography : 2.944826808078529
News & Magazines : 2.798149610741284
Social : 2.6627552747376737
Travel & Local : 2.324269434728647
Shopping : 2.245289405393208
Books & Reference : 2.1437436533904997
Simulation : 2.042197901387792
Dating : 1.8616721200496444
Arcade : 1.8503892587160102
Video Players & Editors : 1.771409229380571
Casual : 1.7601263680469368
Maps & Navigation : 1.399074805370642
Food & Drink : 1.241114746699763
Puzzle : 1.128286133363421
Racing : 0.9928917973598104
Role Playing : 0.9364774906916393
Libraries & Demo : 0.9364774906916393
Auto & Vehicles : 0.9251946293580051
S

In the Google Play market, Tools is the most common genre, but it accounts for only about 8% of all apps. Looking at the [tools](https://play.google.com/store/apps/collection/cluster?clp=ogoRCAESBVRPT0xTKgIIB1ICCAE%3D:S:ANO1ljJ5-OA&gsr=ChSiChEIARIFVE9PTFMqAggHUgIIAQ%3D%3D:S:ANO1ljJNkiQ) collection, we can see that these are mostly necessary utilities for different devices, products or other apps. Apps with a practical purpose (Education, Business, Productivity, Lifestyle, Finance, Medical, Personalization, Communication) make up a larger share than fun and entertainment apps.

In [22]:
display_table(free_android_apps, 1) # Category of android apps

FAMILY : 18.898792733837304
GAME : 9.725826469592688
TOOLS : 8.462146000225657
BUSINESS : 4.592124562789123
LIFESTYLE : 3.9038700214374367
PRODUCTIVITY : 3.8925871601038025
FINANCE : 3.7007785174320205
MEDICAL : 3.5315355974275078
SPORTS : 3.396141261423897
PERSONALIZATION : 3.317161232088458
COMMUNICATION : 3.2381812027530184
HEALTH_AND_FITNESS : 3.0802211440821394
PHOTOGRAPHY : 2.944826808078529
NEWS_AND_MAGAZINES : 2.798149610741284
SOCIAL : 2.6627552747376737
TRAVEL_AND_LOCAL : 2.335552296062281
SHOPPING : 2.245289405393208
BOOKS_AND_REFERENCE : 2.1437436533904997
DATING : 1.8616721200496444
VIDEO_PLAYERS : 1.7939749520478394
MAPS_AND_NAVIGATION : 1.399074805370642
FOOD_AND_DRINK : 1.241114746699763
EDUCATION : 1.1621347173643235
ENTERTAINMENT : 0.9590432133589079
LIBRARIES_AND_DEMO : 0.9364774906916393
AUTO_AND_VEHICLES : 0.9251946293580051
HOUSE_AND_HOME : 0.8236488773552973
WEATHER : 0.8010831546880289
EVENTS : 0.7108202640189552
PARENTING : 0.6544059573507841
ART_AND_DESIGN : 0

The Category and Genres columns do not differ much; the Genres column simply has more subdivisions than the Category column. To get the bigger picture, we will work with the Category column.

Let's find out the kind of the apps most users use.

## Most popular apps

One way to find the most popular genres is to calculate the average number of installs for each genre. To do this:

- Isolate the apps of each genre
- Sum up the installs (or, for iOS, the number of user ratings) of the apps in each genre
- Divide the sum by the number of apps belonging to that genre

The Android dataset already has an 'Installs' column with this information. The iOS dataset has no install counts, so we use the total number of user ratings ('rating_count_tot') as a proxy and follow the steps above to get the average number of user ratings per app for each genre.

In [23]:
frequency_table = {}
for row in free_apple_apps:
    genre = row[-5]
    if genre in frequency_table:
        frequency_table[genre] += 1
    else:
        frequency_table[genre] = 1

for key in frequency_table:
    total = 0
    len_genre = 0
    for row in free_apple_apps:
        genre_app = row[-5]
        if genre_app == key:
            rating = float(row[5])
            total = total + rating
            len_genre += 1
    average = total/len_genre
    print(key,':',average)


Social Networking : 71548.34905660378
Photo & Video : 28441.54375
Games : 22788.6696905016
Music : 57326.530303030304
Reference : 74942.11111111111
Health & Fitness : 23298.015384615384
Weather : 52279.892857142855
Utilities : 18684.456790123455
Travel : 28243.8
Shopping : 26919.690476190477
News : 21248.023255813954
Navigation : 86090.33333333333
Lifestyle : 16485.764705882353
Entertainment : 14029.830708661417
Food & Drink : 33333.92307692308
Sports : 23008.898550724636
Book : 39758.5
Finance : 31467.944444444445
Education : 7003.983050847458
Productivity : 21028.410714285714
Business : 7491.117647058823
Catalogs : 4004.0
Medical : 612.0
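
The cell above rescans the whole dataset once per genre. As a hedged sketch of the same isolate, sum and divide steps done in a single pass, two dictionaries can accumulate the totals and the counts together. The helper `average_by_group` and the sample rows are hypothetical:

```python
from collections import defaultdict


def average_by_group(rows, group_index, value_index):
    """Average a numeric column per group, reading each row only once."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for row in rows:
        group = row[group_index]
        totals[group] += float(row[value_index])
        counts[group] += 1
    return {group: totals[group] / counts[group] for group in totals}


# Made-up rows: name, genre, number of ratings.
sample = [
    ['App A', 'Weather', '100'],
    ['App B', 'Weather', '300'],
    ['App C', 'Medical', '612'],
]

averages = average_by_group(sample, 1, 2)
```

For the iOS data this would be called as `average_by_group(free_apple_apps, -5, 5)`, using the same column positions as the cell above.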


The result above shows that Navigation apps have the highest average number of user ratings. Let's look at the apps in the Navigation genre and see which ones have the most ratings.

In [24]:
for row in free_apple_apps:
    if row[-5] == 'Navigation':
        print(row[1],':',row[5])

Waze - GPS Navigation, Maps & Real-time Traffic : 345046
Google Maps - Navigation & Transit : 154911
Geocaching® : 12811
CoPilot GPS – Car Navigation & Offline Maps : 3582
ImmobilienScout24: Real Estate Search in Germany : 187
Railway Route Search : 5


The result above shows that most of the ratings come from just two apps, Waze and Google Maps.

The same applies to other popular genres such as Games, Music and Social Networking. These genres already have dominant apps such as Facebook, Instagram and Spotify, so there is little room for a new app. Genres such as Health & Fitness or Weather may offer more room.
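
To put a number on how strongly the two largest apps dominate, the following sketch computes their share of all Navigation ratings. The counts are copied from the output above; the variable names are illustrative:

```python
# Rating counts for the free iOS Navigation apps, taken from the output above.
navigation_ratings = {
    'Waze - GPS Navigation, Maps & Real-time Traffic': 345046,
    'Google Maps - Navigation & Transit': 154911,
    'Geocaching®': 12811,
    'CoPilot GPS – Car Navigation & Offline Maps': 3582,
    'ImmobilienScout24: Real Estate Search in Germany': 187,
    'Railway Route Search': 5,
}

total_ratings = sum(navigation_ratings.values())

# The two largest apps by rating count.
top_two = sorted(navigation_ratings.values(), reverse=True)[:2]
top_two_share = sum(top_two) / total_ratings * 100

print('Total ratings:', total_ratings)
print('Share held by the two largest apps:', round(top_two_share, 1), '%')
```

The two apps hold roughly 97% of the ratings, so the genre average of about 86,000 says little about a typical Navigation app.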

Let's see the apps in Weather genre

In [25]:
for row in free_apple_apps:
    if row[-5] == 'Weather':
        print(row[1],':',row[5])

The Weather Channel: Forecast, Radar & Alerts : 495626
The Weather Channel App for iPad – best local forecast, radar map, and storm tracking : 208648
WeatherBug - Local Weather, Radar, Maps, Alerts : 188583
MyRadar NOAA Weather Radar Forecast : 150158
AccuWeather - Weather for Life : 144214
Yahoo Weather : 112603
Weather Underground: Custom Forecast & Local Radar : 49192
NOAA Weather Radar - Weather Forecast & HD Radar : 45696
Weather Live Free - Weather Forecast & Alerts : 35702
Storm Radar : 22792
QuakeFeed Earthquake Map, Alerts, and News : 6081
Moji Weather - Free Weather Forecast : 2333
Hurricane by American Red Cross : 1158
Forecast Bar : 375
Hurricane Tracker WESH 2 Orlando, Central Florida : 203
FEMA : 128
iWeather - World weather forecast : 80
Weather - Radar - Storm with Morecast App : 78
Yurekuru Call : 53
Weather & Radar : 37
WRAL Weather Alert : 25
Météo-France : 24
JaxReady : 22
Freddy the Frogcaster's Weather Station : 14
Almanac Long-Range Weather Forecast : 12
TodayAir

This genre leaves room for new ideas, such as describing the weather as a beach day or a sunny day and suggesting activities that suit those conditions.

## Popular apps on Google Play Store

The Android dataset already has an 'Installs' column that gives the number of installs. Let's display the data.

In [28]:
display_table(free_android_apps, 5)

1,000,000+ : 15.728308699086089
100,000+ : 11.55365000564143
10,000,000+ : 10.549475346947986
10,000+ : 10.199706645605326
1,000+ : 8.394448832223853
100+ : 6.916393997517771
5,000,000+ : 6.826131106848697
500,000+ : 5.562450637481666
50,000+ : 4.772650344127271
5,000+ : 4.513144533453684
10+ : 3.542818458761142
500+ : 3.2494640640866526
50,000,000+ : 2.3017037120613786
100,000,000+ : 2.1324607920568655
50+ : 1.9180864267178157
5+ : 0.7898002933543946
1+ : 0.5077287600135394
500,000,000+ : 0.270788672007221
1,000,000,000+ : 0.2256572266726842
0+ : 0.045131445334536835


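As the result shows, the install counts are open-ended buckets rather than exact figures, so an app listed as 100,000+ could have 100,000, 200,000 or 350,000 installs. That precision is enough here, since we only want to know which apps are popular. From the table we can see that about 15.7% of the free Android apps fall in the 1,000,000+ bucket, and so on. 

To see which category of app is most popular, we need to convert each install string to a float after removing the **','** and **'+'** characters. This treats an app with 100,000+ installs as having exactly 100,000.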
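
As a small illustration of that conversion, the sketch below wraps the two `replace` calls in a helper. The name `parse_installs` is an assumption, not part of the original notebook:

```python
def parse_installs(text):
    """Convert an install bucket such as '1,000,000+' to a float lower bound."""
    return float(text.replace(',', '').replace('+', ''))


# A few bucket labels taken from the table above.
buckets = ['1,000,000+', '100,000+', '0+']
parsed = [parse_installs(bucket) for bucket in buckets]
print(parsed)
```

Each result is the lower bound of its bucket, so averages computed from these values are conservative estimates of the real install counts.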

In [30]:
category_tab = freq_table(free_android_apps,1)

for category in category_tab:
    total = 0
    len_category = 0
    for app in free_android_apps:
        category_v = app[1]
        if category_v == category:
            n_installs = app[5]
            n_installs = n_installs.replace(',','')
            n_installs = n_installs.replace('+','')
            total += float(n_installs)
            len_category += 1
    avg_n_installs = total / len_category
    print(category, ':', avg_n_installs)
        
        
        

ART_AND_DESIGN : 1986335.0877192982
AUTO_AND_VEHICLES : 647317.8170731707
BEAUTY : 513151.88679245283
BOOKS_AND_REFERENCE : 8767811.894736841
BUSINESS : 1712290.1474201474
COMICS : 817657.2727272727
COMMUNICATION : 38456119.167247385
DATING : 854028.8303030303
EDUCATION : 1833495.145631068
ENTERTAINMENT : 11640705.88235294
EVENTS : 253542.22222222222
FINANCE : 1387692.475609756
FOOD_AND_DRINK : 1924897.7363636363
HEALTH_AND_FITNESS : 4188821.9853479853
HOUSE_AND_HOME : 1331540.5616438356
LIBRARIES_AND_DEMO : 638503.734939759
LIFESTYLE : 1437816.2687861272
GAME : 15588015.603248259
FAMILY : 3697848.1731343283
MEDICAL : 120550.61980830671
SOCIAL : 23253652.127118643
SHOPPING : 7036877.311557789
PHOTOGRAPHY : 17840110.40229885
SPORTS : 3638640.1428571427
TRAVEL_AND_LOCAL : 13984077.710144928
TOOLS : 10801391.298666667
PERSONALIZATION : 5201482.6122448975
PRODUCTIVITY : 16787331.344927534
PARENTING : 542603.6206896552
WEATHER : 5074486.197183099
VIDEO_PLAYERS : 24727872.452830188
NEWS_AND_

Communication apps have the highest average number of installs, approximately 38,000,000. We can guess that this figure is driven by a few very large apps such as WhatsApp, Facebook, Instagram and various other social networking apps, which would account for most of the installs.

Let's list the apps in this category with a high number of installs.

In [32]:
for row in free_android_apps:
    if row[1] == 'COMMUNICATION' and (row[5] =='1,000,000,000+' or
                                      row[5] == '500,000,000+' or
                                      row[5] == '100,000,000+'):
        print(row[0],':',row[5])

WhatsApp Messenger : 1,000,000,000+
imo beta free calls and text : 100,000,000+
Android Messages : 100,000,000+
Google Duo - High Quality Video Calls : 500,000,000+
Messenger – Text and Video Chat for Free : 1,000,000,000+
imo free video calls and chat : 500,000,000+
Skype - free IM & video calls : 1,000,000,000+
Who : 100,000,000+
GO SMS Pro - Messenger, Free Themes, Emoji : 100,000,000+
LINE: Free Calls & Messages : 500,000,000+
Google Chrome: Fast & Secure : 1,000,000,000+
Firefox Browser fast & private : 100,000,000+
UC Browser - Fast Download Private & Secure : 500,000,000+
Gmail : 1,000,000,000+
Hangouts : 1,000,000,000+
Messenger Lite: Free Calls & Messages : 100,000,000+
Kik : 100,000,000+
KakaoTalk: Free Calls & Text : 100,000,000+
Opera Mini - fast web browser : 100,000,000+
Opera Browser: Fast and Secure : 100,000,000+
Telegram : 100,000,000+
Truecaller: Caller ID, SMS spam blocking & Dialer : 100,000,000+
UC Browser Mini -Tiny Fast Private & Secure : 100,000,000+
Viber Mess

We can see that most of these apps are already well established. The same holds for other categories such as video players (e.g. YouTube, MX Player). These niches are dominated by a few apps and are hard to compete in.

Previously we decided that the Weather genre leaves scope for a new free app. Let's look at the Android market and the number of installs of all the apps in this category.
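
The filter used for the Communication apps matches three install buckets by their exact strings. As a sketch of an alternative, assuming the bucket's lower bound is a good enough number to compare against, the same selection can be written with a numeric threshold. The helper `apps_with_min_installs` and the sample rows are hypothetical:

```python
def apps_with_min_installs(rows, category, min_installs,
                           category_index=1, installs_index=5):
    """Return (name, installs) pairs for apps in a category at or above a threshold."""
    selected = []
    for row in rows:
        # Strip the formatting characters to get the bucket's lower bound.
        installs = float(row[installs_index].replace(',', '').replace('+', ''))
        if row[category_index] == category and installs >= min_installs:
            selected.append((row[0], row[installs_index]))
    return selected


# Made-up rows that follow the Android column layout up to 'Installs'.
sample = [
    ['Chat App', 'COMMUNICATION', '4.4', '1000', '10M', '1,000,000,000+'],
    ['Small Chat', 'COMMUNICATION', '4.0', '10', '5M', '10,000+'],
    ['Forecast', 'WEATHER', '4.2', '500', '8M', '100,000,000+'],
    ['Mail App', 'COMMUNICATION', '4.3', '900', '12M', '100,000,000+'],
]

big_communication_apps = apps_with_min_installs(sample, 'COMMUNICATION', 100_000_000)
```

A threshold also picks up any bucket between the listed ones, so there is no need to enumerate every label by hand.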

In [33]:
for app in free_android_apps:
    if app[1] == 'WEATHER':
        print(app[0], ':', app[5])

The Weather Channel: Rain Forecast & Storm Alerts : 50,000,000+
Weather forecast : 1,000,000+
AccuWeather: Daily Forecast & Live Weather Reports : 50,000,000+
Live Weather Pro : 10,000+
Weather by WeatherBug: Forecast, Radar & Alerts : 10,000,000+
weather - weather forecast : 1,000,000+
MyRadar NOAA Weather Radar : 10,000,000+
SMHI Weather : 1,000,000+
Free live weather on screen : 1,000,000+
Weather Radar Widget : 1,000,000+
Weather –Simple weather forecast : 10,000,000+
Weather Crave : 5,000,000+
Klara weather : 500,000+
Yahoo Weather : 10,000,000+
Real time Weather Forecast : 1,000,000+
METEO FRANCE : 5,000,000+
APE Weather ( Live Forecast) : 5,000,000+
Live Weather & Daily Local Weather Forecast : 1,000,000+
Weather : 10,000,000+
Rainfall radar - weather : 5,000,000+
Yahoo! Weather for SH Forecast for understanding the approach of rain clouds Free : 1,000,000+
The Weather Network : 5,000,000+
Klart.se - Sweden's best weather : 1,000,000+
GO Weather - Widget, Theme, Wallpaper, Effic

Let's see the popular apps in the weather category.

In [35]:
for row in free_android_apps:
    if row[1] == 'WEATHER' and (row[5] == '50,000,000+' or
                               row[5] == '10,000,000+' or
                               row[5] == '1,000,000+'):
        print(row[0],':',row[5])

The Weather Channel: Rain Forecast & Storm Alerts : 50,000,000+
Weather forecast : 1,000,000+
AccuWeather: Daily Forecast & Live Weather Reports : 50,000,000+
Weather by WeatherBug: Forecast, Radar & Alerts : 10,000,000+
weather - weather forecast : 1,000,000+
MyRadar NOAA Weather Radar : 10,000,000+
SMHI Weather : 1,000,000+
Free live weather on screen : 1,000,000+
Weather Radar Widget : 1,000,000+
Weather –Simple weather forecast : 10,000,000+
Yahoo Weather : 10,000,000+
Real time Weather Forecast : 1,000,000+
Live Weather & Daily Local Weather Forecast : 1,000,000+
Weather : 10,000,000+
Yahoo! Weather for SH Forecast for understanding the approach of rain clouds Free : 1,000,000+
Klart.se - Sweden's best weather : 1,000,000+
GO Weather - Widget, Theme, Wallpaper, Efficient : 50,000,000+
Info BMKG : 1,000,000+
wetter.com - Weather and Radar : 10,000,000+
Storm Radar: Tornado Tracker & Hurricane Alerts : 1,000,000+
Yandex.Weather : 10,000,000+
Wetter by t-online.de : 1,000,000+
HTC We

We can see that most of these apps focus on weather forecasts and on reporting current conditions, so it is probably not a good idea to develop a similar app. Instead, we could work on an app that suggests activities or food according to the weather conditions. This could be one idea to start developing.

## Conclusions

In this project, we analysed data from the App Store and Google Play to recommend the most promising category for a free app.

We can conclude that the Weather category is one niche with room to grow. The markets are full of weather forecasting apps, but we can add recommendations based on the weather conditions, for example what to cook or what food to order. 