# Profitable App Profiles for the Google Play Market

Our aim in this project is to find mobile app profiles that are profitable for the Google Play market. We're working as data analysts for a company that builds Android mobile apps, and our job is to enable our team of developers to make data-driven decisions with respect to the kind of apps they build.

At our company, we only build apps that are free to download and install, and our main source of revenue consists of in-app ads. This means that our revenue for any given app is mostly influenced by the number of users that use our app. Our goal for this project is to analyze data to help our developers understand what kinds of apps are likely to attract more users.

* [A data set](https://www.kaggle.com/lava18/google-play-store-apps/home) containing data about approximately ten thousand Android apps from Google Play

In [None]:

# This Python 3 environment is defined by the kaggle/python Docker image: https://github.com/kaggle/docker-python
# For example, here are several helpful packages to load

import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)

# Input data files are available in the "/kaggle/input" directory.
# For example, running this (by clicking run or pressing Shift+Enter) will list all files under the input directory

import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))

# Any results you write to the current directory are saved as output.

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

In [None]:
android = pd.read_csv("/kaggle/input/google-play-store-apps/googleplaystore.csv")

In [None]:
android.head(3)

In [None]:
type(android)

In [None]:
android.shape

In [None]:
10841 * 13  # rows * columns = total number of cells (same value as android.size)

### In the data set's Kaggle discussion, I saw that row 10472 contains a wrong entry, so I decided to drop this row.

In [None]:
android.iloc[10472]

In [None]:
android.head(1)

In [None]:
# android = android.drop(10472, axis = 0)
android.drop(10472, axis = 0, inplace = True)
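The drop above relies on the Kaggle discussion to locate the bad row. A reusable way to find such rows is to look for out-of-range values. The sketch below uses a tiny made-up DataFrame (not the real data) and assumes ratings must lie between 1 and 5, so a row whose values have shifted into the wrong columns can show up as an impossible rating.

```python
import pandas as pd

# Tiny made-up sample: the second row imitates a record whose values shifted one column
sample = pd.DataFrame(
    {
        "App": ["Some App", "Broken App"],
        "Rating": [4.1, 19.0],
        "Reviews": ["159", "3.0M"],
    },
    index=[10471, 10472],
)

# Ratings are assumed to be on a 1-5 scale, so a value above 5 flags a malformed row
bad_rows = sample[sample["Rating"] > 5]
print(bad_rows.index.tolist())  # [10472]

# Drop every flagged row by its index label
cleaned = sample.drop(bad_rows.index)
print(cleaned.shape)  # (1, 3)
```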

In [None]:
android.iloc[10472]

In [None]:
android.loc[10473]

### The next cell raises a KeyError, because the row with label 10472 has already been removed.

In [None]:
android.drop(10472, axis = 0)
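This explains both the KeyError and why `android.iloc[10472]` still returns a row after the drop: `.drop` and `.loc` look up index labels, whereas `.iloc` works with positions. A minimal sketch on a made-up three-row DataFrame:

```python
import pandas as pd

df = pd.DataFrame({"App": ["A", "B", "C"]}, index=[0, 1, 2])
df = df.drop(1)  # removes the row whose index *label* is 1

# Dropping the same label again raises KeyError: .drop looks up labels, not positions
try:
    df.drop(1)
    raised = False
except KeyError:
    raised = True
print(raised)  # True

# errors="ignore" skips labels that are not present instead of raising
print(df.drop(1, errors="ignore").index.tolist())  # [0, 2]

# .iloc is positional, so position 1 now holds the row labelled 2,
# while .loc[2] looks up that same row by label
print(df.iloc[1]["App"], df.loc[2]["App"])  # C C
```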

In [None]:
android.duplicated(["App"])

In [None]:
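# duplicated() marks the first occurrence of each app as False and every later repeat as True, e.g.:
# Facebook  False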
# Facebook  True
# Facebook  True

In [None]:
android.duplicated(['App']).sum()
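A minimal sketch of how `duplicated()` flags rows, on a made-up four-row DataFrame. It also shows the `keep=False` option, which marks every copy of a repeated app and is handy for inspecting all versions side by side.

```python
import pandas as pd

apps = pd.DataFrame({"App": ["Facebook", "Facebook", "Facebook", "Instagram"]})

# Default keep="first": the first occurrence is False, later repeats are True
print(apps.duplicated(["App"]).tolist())  # [False, True, True, False]

# keep=False flags every row of a repeated group
print(apps.duplicated(["App"], keep=False).tolist())  # [True, True, True, False]

# Summing the boolean mask counts the rows that drop_duplicates would remove
print(apps.duplicated(["App"]).sum())  # 2
```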

In [None]:
bol = android.duplicated(['App'])


android[bol]

In [None]:
android["App"].head()

In [None]:
android["App"] == "Instagram"

In [None]:
android[android["App"] == "Instagram"]

In [None]:
type(android["Reviews"])

In [None]:
android.info()

In [None]:
android["Reviews"].head()

In [None]:
# android["Reviews"].sum()  # Reviews is still stored as text here, so it cannot be summed as numbers yet

In [None]:
android["Reviews"] = android["Reviews"].astype(float)
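Converting Reviews matters because the column is read as text. A small sketch with made-up values (with `dtype=object` set explicitly to mimic a text column) shows the difference. It also shows `pd.to_numeric(..., errors="coerce")` as an alternative that turns unparseable entries such as "3.0M" into NaN instead of raising.

```python
import pandas as pd

# Made-up review counts stored as text (object dtype)
reviews = pd.Series(["159", "967", "87510"], dtype=object)

# On strings, sum() concatenates instead of adding
print(reviews.sum())  # 15996787510

# After astype(float), sum() adds the numbers
print(reviews.astype(float).sum())  # 88636.0

# pd.to_numeric with errors="coerce" turns unparseable text into NaN instead of raising
coerced = pd.to_numeric(pd.Series(["159", "3.0M"], dtype=object), errors="coerce")
print(coerced.tolist())  # [159.0, nan]
```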

In [None]:
android["Reviews"].head()

In [None]:
# android.drop_duplicates(keep = "last")

In [None]:
android.sort_values("Reviews", ascending = False).head(15)

In [None]:
android.sort_values("Reviews", ascending = False, inplace = True)

In [None]:
android.head(10)

### Apps with the most reviews:
1. Facebook
2. WhatsApp
3. Instagram
4. Messenger

### These giants are at the top.

In [None]:
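# android.drop_duplicates(keep = "first")
# Rows are sorted by Reviews (descending), so keep="first" (the default) retains each app's row with the most reviews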
android.drop_duplicates(["App"], inplace = True)
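Sorting before deduplicating is what makes the kept row meaningful. A sketch on made-up rows, with a `groupby` cross-check that yields the same per-app maximum:

```python
import pandas as pd

# Made-up rows: the same app listed several times with different review counts
apps = pd.DataFrame({
    "App": ["Facebook", "Facebook", "Instagram", "Facebook"],
    "Reviews": [90.0, 100.0, 50.0, 95.0],
})

# Sort by Reviews (descending) so that keep="first" keeps each app's highest count
deduped = (
    apps.sort_values("Reviews", ascending=False)
        .drop_duplicates(["App"], keep="first")
)
print(deduped["App"].tolist())      # ['Facebook', 'Instagram']
print(deduped["Reviews"].tolist())  # [100.0, 50.0]

# Cross-check: the per-app maximum computed with groupby (keys sorted alphabetically)
print(apps.groupby("App")["Reviews"].max().tolist())  # [100.0, 50.0]
```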

In [None]:
android.shape

In [None]:
android["Type"].unique()

In [None]:
android.info()

In [None]:
android["Type"].value_counts(dropna = False)

In [None]:
android["Type"] == "Free"

In [None]:
android = android[android["Type"] == "Free"]
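`value_counts(dropna=False)` includes a count for missing Type values, if there are any. Comparing NaN with "Free" evaluates to False, so this filter drops such rows along with the paid apps. A quick check on a made-up Series:

```python
import numpy as np
import pandas as pd

# Made-up Type values, including a missing entry
types = pd.Series(["Free", "Paid", np.nan, "Free"])

# NaN == "Free" is False, so a missing Type is excluded just like "Paid"
mask = types == "Free"
print(mask.tolist())     # [True, False, False, True]
print(types[mask].size)  # 2
```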

In [None]:
android.shape

### There are also apps with names in other languages:
* Arabic
* Hindi
* Chinese
* Japanese
* Bengali

### Some names also contain emojis:
* 👍
* 🔥
* 😍
* 💘

In [None]:

android["App"].sort_values().tail(20)

The ASCII range covers 128 characters, with codes from 0 to 127.

In [None]:
# Facebook


ord("F")

In [None]:
ord("k")

In [None]:
ord("🔥")

In [None]:
ap = "Facebook🔥"

for i in ap:
    print(ord(i))
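The character codes printed here follow one rule: ASCII characters have codes 0 to 127, and anything else (CJK characters, emojis, and so on) has a larger code point. A short recap, plus the built-in `str.isascii()` method (Python 3.7+), which performs the same all-characters check in one call:

```python
# ord() returns a character's Unicode code point; ASCII characters have codes 0 to 127
print(ord("F"), ord("k"))  # 70 107
print(ord("🔥"))            # 128293, far outside the ASCII range

# chr() is the inverse of ord()
print(chr(70))  # F

# str.isascii() checks that every character in a string is ASCII
print("Facebook".isascii(), "Facebook🔥".isascii())  # True False
```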

In [None]:
for i in ap:
    if ord(i)<128:
        print("Ascii")
    else:
        print("non_ascii")
    

### I decided to analyze only English apps.

In [None]:
def english(app_name):
    # Strict first attempt: reject the name as soon as any character is outside ASCII (code > 127)
    for i in app_name:
        if ord(i) > 127:
            return False
    return True

In [None]:
# print()
english("Facebook")

In [None]:
english("🔥Facebook")

In [None]:
english("Facebリスニングook🔥")

In [None]:
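# Relaxed version: tolerate up to 3 non-ASCII characters (such as emojis) before treating a name as non-English
def english(app_name):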
    non_eng = 0
    for i in app_name:
        if ord(i)>127:
            non_eng+=1
    if non_eng>3:
        return False
    else:
        return True
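The relaxed rule can also be written more compactly with `sum()` over a generator. The name `is_mostly_ascii` and its `max_non_ascii` parameter are hypothetical, chosen for this sketch. The default of 3 mirrors the threshold in the `english()` function above.

```python
def is_mostly_ascii(app_name, max_non_ascii=3):
    """Return True if app_name has at most max_non_ascii characters outside ASCII."""
    # Count characters whose code point is above 127 (the last ASCII code)
    non_ascii = sum(1 for ch in app_name if ord(ch) > 127)
    return non_ascii <= max_non_ascii


print(is_mostly_ascii("Facebook"))             # True
print(is_mostly_ascii("Facebスニook🔥"))        # True: exactly 3 non-ASCII characters
print(is_mostly_ascii("Facebリスニングook🔥"))  # False: more than 3 non-ASCII characters
```

Applied to a column, it works the same way as `english()`, for example `android["App"].apply(is_mostly_ascii)`.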

In [None]:
english("Facebリスニングook🔥")

In [None]:
english("Facebスニングook🔥")

In [None]:
english("Facebスニook🔥")

In [None]:
is_english = android["App"].apply(english)
is_english

### There are 8863 English apps.

In [None]:
android["App"].apply(english).sum()

In [None]:
android_final = android[is_english].copy()
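The `.copy()` here matters because `android_final` is modified later (columns dropped, Installs converted). Without it, pandas may treat the filtered frame as a possible view of `android`, and older versions can emit a SettingWithCopyWarning on assignment. A minimal sketch on made-up data showing that changes to an explicit copy leave the original untouched:

```python
import pandas as pd

df = pd.DataFrame({"App": ["A", "B", "C"], "Installs": ["10+", "20+", "30+"]})
is_kept = pd.Series([True, False, True])

# Explicit copy: an independent DataFrame, safe to modify
subset = df[is_kept].copy()
subset["Installs"] = subset["Installs"].str.replace("+", "", regex=False).astype(int)

print(subset["Installs"].tolist())  # [10, 30]
print(df["Installs"].tolist())      # ['10+', '20+', '30+'] (original unchanged)
```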

In [None]:
android_final.shape

In [None]:
android_final.head()

In [None]:
android_final.drop(["Type", "Price"], axis = 1, inplace = True)

In [None]:
android_final.shape

In [None]:
android_final.head()

In [None]:
android_final["Installs"] = android_final["Installs"].str.replace(",","").str.replace("+","").astype(int)
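The Installs conversion strips the thousands separators and the trailing plus sign before casting to `int`. Here is a sketch on made-up strings in the same format. It passes `regex=False` explicitly so that "+" is treated as a literal character rather than a regular-expression quantifier, since the default for `regex` changed between pandas versions.

```python
import pandas as pd

# Made-up values in the raw Installs format
installs = pd.Series(["10,000+", "500,000+", "1,000,000,000+"])

# Remove "," and "+" as literal text, then cast the digits to integers
cleaned = (
    installs.str.replace(",", "", regex=False)
            .str.replace("+", "", regex=False)
            .astype(int)
)
print(cleaned.tolist())  # [10000, 500000, 1000000000]
```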

In [None]:
android_final["Category"].value_counts()

### By category, FAMILY has the most apps.

In [None]:
len(android_final["Category"].unique())

In [None]:
cat = android_final["Category"].unique()

cat

In [None]:
for i in cat[:3]:
    print(i)

In [None]:
cat_mean_ins = {}
for i in cat:
    df = android_final[android_final["Category"] == i]
    mean = df["Installs"].mean()
    cat_mean_ins[i] = mean



### By category, "SOCIAL" has one of the highest mean numbers of installs.
### The SOCIAL category contains giants such as:
* Facebook
* Instagram 
* Twitter

In [None]:
cat_mean_ins

In [None]:
cat_means = android_final.groupby("Category")["Installs"].mean()
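The `groupby` call computes the same per-category means as the earlier loop over `cat`, in a single step. A sketch on made-up data that checks the two approaches agree:

```python
import pandas as pd

# Made-up sample with the two columns used above
df = pd.DataFrame({
    "Category": ["SOCIAL", "SOCIAL", "GAME", "GAME", "GAME"],
    "Installs": [1000, 3000, 10, 20, 30],
})

# Loop version, mirroring the cat_mean_ins cell
loop_means = {}
for category in df["Category"].unique():
    loop_means[category] = df[df["Category"] == category]["Installs"].mean()

# groupby version: one split-apply-combine call
group_means = df.groupby("Category")["Installs"].mean()

print(loop_means == group_means.to_dict())                # True
print(group_means.sort_values(ascending=False).index[0])  # SOCIAL
```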

In [None]:
cat_means.head()

In [None]:
sorted_cat = cat_means.sort_values(ascending = False)

### After sorting, the COMMUNICATION category is at the top.

### Again, the COMMUNICATION category contains giants such as:

* WhatsApp
* Skype
* Gmail
* Messenger
* Google Chrome

### It would be very hard for a new app to compete with them.

In [None]:
sorted_cat.head(15)
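The recommendation at the end of this notebook points to the categories ranked 4th to 6th. Because `sorted_cat` is sorted in descending order, those can be picked out by position with `.iloc`. The sketch below uses made-up values, and the category names (CAT_A and so on) are hypothetical placeholders.

```python
import pandas as pd

# Made-up mean installs per category, already sorted in descending order
sorted_means = pd.Series(
    [9.0, 8.0, 7.0, 6.0, 5.0, 4.0, 3.0],
    index=["CAT_A", "CAT_B", "CAT_C", "CAT_D", "CAT_E", "CAT_F", "CAT_G"],
)

# Positions are 0-based, so ranks 4 to 6 are positions 3, 4 and 5: iloc[3:6]
ranks_4_to_6 = sorted_means.iloc[3:6]
print(ranks_4_to_6.index.tolist())  # ['CAT_D', 'CAT_E', 'CAT_F']
```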

In [None]:
com = android_final[android_final["Category"] == "COMMUNICATION"]

In [None]:
com[["App", "Installs"]].sort_values("Installs", ascending = False).head(15)

In [None]:
com[com["Installs"]< 100000000]["Installs"].mean()
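This cell recomputes the COMMUNICATION mean after excluding apps with 100,000,000 or more installs. A sketch on made-up numbers shows how a handful of huge values can dominate a mean:

```python
import pandas as pd

# Made-up install counts: two "giant" apps and several small ones
installs = pd.Series([1_000_000_000, 500_000_000, 100_000, 50_000, 10_000, 10_000])

threshold = 100_000_000  # same cut-off as the cell above
overall_mean = installs.mean()
without_giants = installs[installs < threshold].mean()

print(overall_mean)    # roughly 250 million, driven by the two giants
print(without_giants)  # 42500.0
```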

### The COMMUNICATION category's high mean is driven largely by a few giants, such as:
* WhatsApp
* Skype
* Gmail
* Messenger
* Google Chrome

### As a data analyst, I would not try to compete with giants like the ones mentioned above. I recommend that our developers build an app in one of the categories ranked 4th to 6th, such as productivity, game, or photography, which could benefit our company at a lower cost.

### This is my first project with pandas. If you like it, please upvote my notebook.