<h1 style="color:violet;">Wrestle Android App Store Data into Beautiful-Looking Charts with Plotly</h1>

<p style="color:smokewhite; font-size:20px;">Have you ever thought about building your own iOS or Android app? If so, you have probably wondered how things work in the app stores. Today we'll replicate some of the app store analytics provided by companies like App Annie or Sensor Tower, which help inform development and app marketing strategies for many companies. This stuff is BIG business!</p>

<h2 style="color:violet"><em>Questions to answer</em></h2>
<ol style="font-size:20px">
<li>How competitive different app categories (e.g., Games, Lifestyle, Weather) are</li>
<li>Which app category offers compelling opportunities based on its popularity</li>
<li>How many downloads you would give up by making your app paid vs. free</li>
<li>How much you can reasonably charge for a paid app</li>
<li>Which paid apps have had the highest revenue</li>
<li>How many paid apps will recoup their development costs based on their sales revenue</li>

</ol>

<h3 style="color:violet;"><em>Importing the needed packages</em></h3>

In [47]:
import pandas as pd
import plotly.express as px

<h3 style="color:violet;"><em>Loading the data</em></h3>

In [3]:
app_df = pd.read_csv("./data/apps.csv")

<h3 style="color:violet;"><em>Preliminary Explorations</em></h3>

In [4]:
app_df.head()

Unnamed: 0,App,Category,Rating,Reviews,Size_MBs,Installs,Type,Price,Content_Rating,Genres,Last_Updated,Android_Ver
0,Ak Parti Yardım Toplama,SOCIAL,,0,8.7,0,Paid,$13.99,Teen,Social,"July 28, 2017",4.1 and up
1,Ain Arabic Kids Alif Ba ta,FAMILY,,0,33.0,0,Paid,$2.99,Everyone,Education,"April 15, 2016",3.0 and up
2,Popsicle Launcher for Android P 9.0 launcher,PERSONALIZATION,,0,5.5,0,Paid,$1.49,Everyone,Personalization,"July 11, 2018",4.2 and up
3,Command & Conquer: Rivals,FAMILY,,0,19.0,0,,0,Everyone 10+,Strategy,"June 28, 2018",Varies with device
4,CX Network,BUSINESS,,0,10.0,0,Free,0,Everyone,Business,"August 6, 2018",4.1 and up


In [12]:
app_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10841 entries, 0 to 10840
Data columns (total 12 columns):
 #   Column          Non-Null Count  Dtype  
---  ------          --------------  -----  
 0   App             10841 non-null  object 
 1   Category        10841 non-null  object 
 2   Rating          9367 non-null   float64
 3   Reviews         10841 non-null  int64  
 4   Size_MBs        10841 non-null  float64
 5   Installs        10841 non-null  object 
 6   Type            10840 non-null  object 
 7   Price           10841 non-null  object 
 8   Content_Rating  10841 non-null  object 
 9   Genres          10841 non-null  object 
 10  Last_Updated    10841 non-null  object 
 11  Android_Ver     10839 non-null  object 
dtypes: float64(2), int64(1), object(9)
memory usage: 1016.5+ KB


In [8]:
app_df.describe().T

Unnamed: 0,count,mean,std,min,25%,50%,75%,max
Rating,9367.0,4.191513,0.5157352,1.0,4.0,4.3,4.5,5.0
Reviews,10841.0,444111.928051,2927629.0,0.0,38.0,2094.0,54768.0,78158306.0
Size_MBs,10841.0,19.774147,21.40435,0.008301,4.9,11.0,27.0,100.0


<h3 style="color:violet;"><em>Dropping columns that are not needed</em></h3>

In [13]:
app_df.drop(columns=['Last_Updated', 'Android_Ver'], inplace=True)

In [14]:
app_df.head()

Unnamed: 0,App,Category,Rating,Reviews,Size_MBs,Installs,Type,Price,Content_Rating,Genres
0,Ak Parti Yardım Toplama,SOCIAL,,0,8.7,0,Paid,$13.99,Teen,Social
1,Ain Arabic Kids Alif Ba ta,FAMILY,,0,33.0,0,Paid,$2.99,Everyone,Education
2,Popsicle Launcher for Android P 9.0 launcher,PERSONALIZATION,,0,5.5,0,Paid,$1.49,Everyone,Personalization
3,Command & Conquer: Rivals,FAMILY,,0,19.0,0,,0,Everyone 10+,Strategy
4,CX Network,BUSINESS,,0,10.0,0,Free,0,Everyone,Business


<h3 style="color:violet;"><em>Checking for missing values and dropping them</em></h3>

In [15]:
app_df.isna().sum()

App                  0
Category             0
Rating            1474
Reviews              0
Size_MBs             0
Installs             0
Type                 1
Price                0
Content_Rating       0
Genres               0
dtype: int64

In [16]:
clean_df = app_df.dropna()
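Called with no arguments, `dropna()` discards every row that has a missing value in any remaining column, which is why the frame shrinks from 10841 to 9367 rows (the 1474 missing `Rating` values account for the loss). A minimal sketch on made-up data, contrasting that with dropping on a chosen column only; the toy frame and its values are illustrative assumptions, not rows from `apps.csv`:

```python
import pandas as pd

# Toy frame (illustrative values only).
toy = pd.DataFrame({
    "App": ["a", "b", "c"],
    "Rating": [4.5, None, 3.0],
    "Type": ["Free", "Paid", None],
})

# Drop a row if ANY column is missing (what the cell above does).
all_clean = toy.dropna()

# Drop a row only if Rating is missing; other gaps are kept.
rating_only = toy.dropna(subset=["Rating"])

print(len(toy), len(all_clean), len(rating_only))
```

Which variant is appropriate depends on whether the later analysis actually needs every column to be populated.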

In [17]:
clean_df.isna().sum()

App               0
Category          0
Rating            0
Reviews           0
Size_MBs          0
Installs          0
Type              0
Price             0
Content_Rating    0
Genres            0
dtype: int64

In [18]:
clean_df.info()

<class 'pandas.core.frame.DataFrame'>
Index: 9367 entries, 21 to 10840
Data columns (total 10 columns):
 #   Column          Non-Null Count  Dtype  
---  ------          --------------  -----  
 0   App             9367 non-null   object 
 1   Category        9367 non-null   object 
 2   Rating          9367 non-null   float64
 3   Reviews         9367 non-null   int64  
 4   Size_MBs        9367 non-null   float64
 5   Installs        9367 non-null   object 
 6   Type            9367 non-null   object 
 7   Price           9367 non-null   object 
 8   Content_Rating  9367 non-null   object 
 9   Genres          9367 non-null   object 
dtypes: float64(2), int64(1), object(7)
memory usage: 805.0+ KB


<h3 style="color:violet;"><em>Checking for duplicates and removing them</em></h3>

In [25]:
clean_df.duplicated().sum()

476

In [23]:
clean_df[clean_df.duplicated()].sort_values('App')

Unnamed: 0,App,Category,Rating,Reviews,Size_MBs,Installs,Type,Price,Content_Rating,Genres
6371,10 Best Foods for You,HEALTH_AND_FITNESS,4.0,2490,3.8,500000,Free,0,Everyone 10+,Health & Fitness
7159,1800 Contacts - Lens Store,MEDICAL,4.7,23160,26.0,1000000,Free,0,Everyone,Medical
2604,2017 EMRA Antibiotic Guide,MEDICAL,4.4,12,3.8,1000,Paid,$16.99,Everyone,Medical
5140,21-Day Meditation Experience,HEALTH_AND_FITNESS,4.4,11506,15.0,100000,Free,0,Everyone,Health & Fitness
946,420 BZ Budeze Delivery,MEDICAL,5.0,2,11.0,100,Free,0,Mature 17+,Medical
...,...,...,...,...,...,...,...,...,...,...
9635,"theScore: Live Sports Scores, News, Stats & Vi...",SPORTS,4.4,133833,34.0,10000000,Free,0,Everyone 10+,Sports
9634,"theScore: Live Sports Scores, News, Stats & Vi...",SPORTS,4.4,133833,34.0,10000000,Free,0,Everyone 10+,Sports
9632,"theScore: Live Sports Scores, News, Stats & Vi...",SPORTS,4.4,133825,34.0,10000000,Free,0,Everyone 10+,Sports
10156,trivago: Hotels & Travel,TRAVEL_AND_LOCAL,4.2,219848,12.0,50000000,Free,0,Everyone,Travel & Local


In [26]:
cleaned_df = clean_df.drop_duplicates(subset=["App", "Type", "Price"])

In [30]:
cleaned_df.duplicated().sum()

0
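Note that `drop_duplicates` was keyed on `App`, `Type` and `Price` rather than on the full row. A full-row check misses entries for the same app that differ only in a volatile column, as with the theScore rows in the listing above, where `Reviews` is 133833 in two rows and 133825 in a third. A small sketch of the difference, using made-up rows (the app names and counts are illustrative assumptions):

```python
import pandas as pd

# Toy data: one app listed three times with slightly different review
# counts, plus an unrelated app.
toy = pd.DataFrame({
    "App": ["Score App", "Score App", "Score App", "Other App"],
    "Type": ["Free", "Free", "Free", "Free"],
    "Price": ["0", "0", "0", "0"],
    "Reviews": [133833, 133833, 133825, 10],
})

# Full-row check: only the exact repeat is flagged.
full_row_dupes = int(toy.duplicated().sum())

# Key-based check: every repeat of the same (App, Type, Price) is flagged.
keyed_dupes = int(toy.duplicated(subset=["App", "Type", "Price"]).sum())

deduped = toy.drop_duplicates(subset=["App", "Type", "Price"])
print(full_row_dupes, keyed_dupes, len(deduped))
```

By default `drop_duplicates` keeps the first occurrence of each key, so which review count survives depends on row order.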

<h4 style="color:smokewhite;">Which Apps are the highest rated?</h4>

In [32]:
cleaned_df['Rating'].max()

5.0

In [33]:
cleaned_df.loc[cleaned_df['Rating'] == cleaned_df['Rating'].max()]

Unnamed: 0,App,Category,Rating,Reviews,Size_MBs,Installs,Type,Price,Content_Rating,Genres
21,KBA-EZ Health Guide,MEDICAL,5.0,4,25.0,1,Free,0,Everyone,Medical
28,Ra Ga Ba,GAME,5.0,2,20.0,1,Paid,$1.49,Everyone,Arcade
47,Mu.F.O.,GAME,5.0,2,16.0,1,Paid,$0.99,Everyone,Arcade
82,Brick Breaker BR,GAME,5.0,7,19.0,5,Free,0,Everyone,Arcade
99,Anatomy & Physiology Vocabulary Exam Review App,MEDICAL,5.0,1,4.6,5,Free,0,Everyone,Medical
...,...,...,...,...,...,...,...,...,...,...
2680,Florida Wildflowers,FAMILY,5.0,5,69.0,1000,Free,0,Everyone,Education
2750,"Superheroes, Marvel, DC, Comics, TV, Movies News",COMICS,5.0,34,12.0,5000,Free,0,Everyone,Comics
3030,CL Keyboard - Myanmar Keyboard (No Ads),TOOLS,5.0,24,3.2,5000,Free,0,Everyone,Tools
3115,Oración CX,LIFESTYLE,5.0,103,3.8,5000,Free,0,Everyone,Lifestyle


In [41]:
# Alternatively
cleaned_df.sort_values("Rating", ascending=False).head()

Unnamed: 0,App,Category,Rating,Reviews,Size_MBs,Installs,Type,Price,Content_Rating,Genres
21,KBA-EZ Health Guide,MEDICAL,5.0,4,25.0,1,Free,0,Everyone,Medical
1230,Sway Medical,MEDICAL,5.0,3,22.0,100,Free,0,Everyone,Medical
1227,AJ Men's Grooming,LIFESTYLE,5.0,2,22.0,100,Free,0,Everyone,Lifestyle
1224,FK Dedinje BGD,SPORTS,5.0,36,2.6,100,Free,0,Everyone,Sports
1223,CB VIDEO VISION,PHOTOGRAPHY,5.0,13,2.6,100,Free,0,Everyone,Photography


<h4 style="color:smokewhite;">Which Apps are the largest in downloadable size?</h4>

In [35]:
cleaned_df['Size_MBs'].max()

100.0

In [36]:
cleaned_df.loc[cleaned_df['Size_MBs'] == cleaned_df['Size_MBs'].max()]

Unnamed: 0,App,Category,Rating,Reviews,Size_MBs,Installs,Type,Price,Content_Rating,Genres
1795,Navi Radiography Pro,MEDICAL,4.7,11,100.0,500,Paid,$15.99,Everyone,Medical
3144,Vi Trainer,HEALTH_AND_FITNESS,3.6,124,100.0,5000,Free,0,Everyone,Health & Fitness
4176,Car Crash III Beam DH Real Damage Simulator 2018,GAME,3.6,151,100.0,10000,Free,0,Everyone,Racing
7926,Post Bank,FINANCE,4.5,60449,100.0,1000000,Free,0,Everyone,Finance
7927,The Walking Dead: Our World,GAME,4.0,22435,100.0,1000000,Free,0,Teen,Action
7928,Stickman Legends: Shadow Wars,GAME,4.4,38419,100.0,1000000,Paid,$0.99,Everyone 10+,Action
8718,Mini Golf King - Multiplayer Game,GAME,4.5,531458,100.0,5000000,Free,0,Everyone,Sports
8719,Draft Simulator for FUT 18,SPORTS,4.6,162933,100.0,5000000,Free,0,Everyone,Sports
9942,Talking Babsy Baby: Baby Games,LIFESTYLE,4.0,140995,100.0,10000000,Free,0,Everyone,Lifestyle;Pretend Play
9943,Miami crime simulator,GAME,4.0,254518,100.0,10000000,Free,0,Mature 17+,Action


In [42]:
# Alternatively
cleaned_df.sort_values("Size_MBs", ascending=False)

Unnamed: 0,App,Category,Rating,Reviews,Size_MBs,Installs,Type,Price,Content_Rating,Genres
9942,Talking Babsy Baby: Baby Games,LIFESTYLE,4.0,140995,100.000000,10000000,Free,0,Everyone,Lifestyle;Pretend Play
10687,Hungry Shark Evolution,GAME,4.5,6074334,100.000000,100000000,Free,0,Teen,Arcade
9943,Miami crime simulator,GAME,4.0,254518,100.000000,10000000,Free,0,Mature 17+,Action
9944,Gangster Town: Vice District,FAMILY,4.3,65146,100.000000,10000000,Free,0,Mature 17+,Simulation
3144,Vi Trainer,HEALTH_AND_FITNESS,3.6,124,100.000000,5000,Free,0,Everyone,Health & Fitness
...,...,...,...,...,...,...,...,...,...,...
2648,Ad Remove Plugin for App2SD,PRODUCTIVITY,4.1,66,0.016602,1000,Paid,$1.29,Everyone,Productivity
5798,ExDialer PRO Key,COMMUNICATION,4.5,5474,0.016602,100000,Paid,$3.99,Everyone,Communication
2684,My baby firework (Remove ad),FAMILY,4.1,30,0.013672,1000,Paid,$0.99,Everyone,Entertainment
7966,Market Update Helper,LIBRARIES_AND_DEMO,4.1,20145,0.010742,1000000,Free,0,Everyone,Libraries & Demo


<h4 style="color:smokewhite;">Which Apps have the highest number of Reviews?</h4>

In [37]:
cleaned_df['Reviews'].max()

78158306

In [38]:
cleaned_df.loc[cleaned_df['Reviews'] == cleaned_df["Reviews"].max()]

Unnamed: 0,App,Category,Rating,Reviews,Size_MBs,Installs,Type,Price,Content_Rating,Genres
10805,Facebook,SOCIAL,4.1,78158306,5.3,1000000000,Free,0,Teen,Social


In [43]:
# Alternatively
cleaned_df.sort_values("Reviews", ascending=False)

Unnamed: 0,App,Category,Rating,Reviews,Size_MBs,Installs,Type,Price,Content_Rating,Genres
10805,Facebook,SOCIAL,4.1,78158306,5.3,1000000000,Free,0,Teen,Social
10785,WhatsApp Messenger,COMMUNICATION,4.4,69119316,3.5,1000000000,Free,0,Everyone,Communication
10806,Instagram,SOCIAL,4.5,66577313,5.3,1000000000,Free,0,Teen,Social
10784,Messenger – Text and Video Chat for Free,COMMUNICATION,4.0,56642847,3.5,1000000000,Free,0,Everyone,Communication
10650,Clash of Clans,GAME,4.6,44891723,98.0,100000000,Free,0,Everyone 10+,Strategy
...,...,...,...,...,...,...,...,...,...,...
453,Wowkwis aq Ka'qaquj,FAMILY,5.0,1,49.0,10,Free,0,Everyone,Education;Education
462,CB Fit,HEALTH_AND_FITNESS,5.0,1,7.8,10,Free,0,Everyone,Health & Fitness
901,ES Billing System (Offline App),PRODUCTIVITY,5.0,1,4.2,100,Free,0,Everyone,Productivity
1416,Ek Kahani Aisi Bhi Season 3 - The Horror Story,FAMILY,3.0,1,5.8,100,Free,0,Teen,Entertainment


<h3 style="color:violet;"><em>Plotting with Plotly</em></h3>

In [46]:
content_dist = cleaned_df['Content_Rating'].value_counts()
content_dist

Content_Rating
Everyone           6621
Teen                912
Mature 17+          357
Everyone 10+        305
Adults only 18+       3
Unrated               1
Name: count, dtype: int64

In [54]:
# names= supplies the slice labels; labels= is meant for a dict that renames columns
fig = px.pie(names=content_dist.index, values=content_dist)
fig.show()


In [55]:
fig = px.pie(names=content_dist.index, values=content_dist, title="Content Rating")
fig.update_traces(textposition="outside", textinfo="percent+label")
fig.show()


In [56]:
fig = px.pie(names=content_dist.index, values=content_dist, title="Content Rating", hole=0.6)
fig.update_traces(textposition="inside", textinfo="percent", textfont_size=15)
fig.show()
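With only 3 "Adults only 18+" apps and 1 "Unrated" app out of 8199, two of the slices are too thin to read on the chart. One option is to fold small groups into an "Other" bucket before plotting. The sketch below reuses the counts printed above; the 1% cutoff is an arbitrary assumption, not something the dataset prescribes:

```python
import pandas as pd

# Counts copied from the value_counts() output above.
counts = pd.Series({
    "Everyone": 6621,
    "Teen": 912,
    "Mature 17+": 357,
    "Everyone 10+": 305,
    "Adults only 18+": 3,
    "Unrated": 1,
}, name="count")

# ASSUMPTION: groups below 1% of all apps are merged into "Other".
CUTOFF = 0.01
share = counts / counts.sum()
small = share < CUTOFF

grouped = counts[~small].copy()
grouped["Other"] = counts[small].sum()
print(grouped)
```

The resulting `grouped` Series can be passed to `px.pie` in the same way as `content_dist`.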


<h4 style="color:smokewhite;">How many Apps have at least 1 Billion Installs?</h4>

In [58]:
cleaned_df['Installs'].dtype

dtype('O')

In [59]:
cleaned_df['Installs'].unique()

array(['1', '5', '10', '50', '100', '500', '1,000', '5,000', '10,000',
       '50,000', '100,000', '500,000', '1,000,000', '5,000,000',
       '10,000,000', '50,000,000', '100,000,000', '500,000,000',
       '1,000,000,000'], dtype=object)

In [64]:
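# Work on an explicit copy and assign back to cleaned_df (not clean_df).
cleaned_df = cleaned_df.copy()
# Casting to str first keeps the cell safe to re-run once the column is numeric.
cleaned_df["Installs"] = cleaned_df["Installs"].astype(str).str.replace(",", "", regex=False).astype("int")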

In [65]:
cleaned_df['Installs'].dtype

dtype('int32')
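Re-running a `.str` cleanup on a column that is already numeric raises an `AttributeError`, which is easy to hit in a notebook. A hedged sketch of a re-runnable helper follows; `to_int_installs` is a hypothetical name introduced here, and the sample values are made up. It also shows one way to count the apps in the top install bucket:

```python
import pandas as pd

def to_int_installs(col: pd.Series) -> pd.Series:
    """Strip thousands separators and cast to int (hypothetical helper).

    Casting to str first makes the function safe to call on a column
    that has already been converted.
    """
    return col.astype(str).str.replace(",", "", regex=False).astype("int64")

raw = pd.Series(["1,000", "5,000,000", "1,000,000,000", "50"])
installs = to_int_installs(raw)
again = to_int_installs(installs)  # second call leaves the values unchanged

# Count entries in the top bucket (1 billion installs or more).
billion_plus = int((installs >= 1_000_000_000).sum())
print(installs.tolist(), billion_plus)
```

Because Google Play reports installs in buckets, these numbers are lower bounds rather than exact download counts.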

In [67]:
cleaned_df[["App", "Installs"]].groupby('Installs').count()

Installs,App
1,3
5,9
10,69
50,56
100,303
500,199
1000,698
5000,425
10000,988
50000,457


In [68]:
cleaned_df["Price"].dtype

dtype('O')

In [69]:
cleaned_df["Price"].unique()

array(['0', '$1.49', '$0.99', '$14.99', '$2.49', '$3.04', '$3.49',
       '$2.99', '$4.99', '$29.99', '$19.99', '$1.99', '$1.76', '$8.99',
       '$1.75', '$12.99', '$7.99', '$3.99', '$3.88', '$4.59', '$9.99',
       '$14.00', '$4.49', '$6.99', '$399.99', '$5.99', '$39.99', '$3.02',
       '$6.49', '$17.99', '$2.00', '$15.99', '$10.99', '$1.61', '$4.29',
       '$15.46', '$7.49', '$5.49', '$1.70', '$379.99', '$79.99', '$2.56',
       '$8.49', '$1.50', '$4.84', '$19.40', '$33.99', '$3.08', '$16.99',
       '$37.99', '$10.00', '$1.29', '$4.60', '$1.00', '$24.99', '$2.50',
       '$3.90', '$2.95', '$1.97', '$3.28', '$299.99', '$389.99', '$1.20',
       '$400.00', '$2.59', '$18.99', '$1.59', '$11.99', '$4.77', '$9.00',
       '$3.95', '$13.99', '$2.90'], dtype=object)

In [72]:
# regex=False treats "$" literally; plain column assignment lets the dtype become float
cleaned_df["Price"] = cleaned_df["Price"].str.replace("$", "", regex=False).astype("float")
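In a regular expression `$` is the end-of-string anchor, so it is safer to tell pandas explicitly that the dollar sign is a literal character. A minimal sketch on a few of the price strings listed above:

```python
import pandas as pd

raw_price = pd.Series(["0", "$1.49", "$399.99", "$14.00"])

# regex=False: "$" is matched as a plain character, not as a regex anchor.
price = raw_price.str.replace("$", "", regex=False).astype(float)

print(price.dtype, price.max())
```

After the conversion, numeric comparisons and sorting behave as expected, which the string column does not guarantee.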

In [75]:
cleaned_df.sort_values("Price", ascending=False).head(20)

Unnamed: 0,App,Category,Rating,Reviews,Size_MBs,Installs,Type,Price,Content_Rating,Genres
3946,I'm Rich - Trump Edition,LIFESTYLE,3.6,275,7.3,10000,Paid,400.0,Everyone,Lifestyle
2461,I AM RICH PRO PLUS,FINANCE,4.0,36,41.0,1000,Paid,399.99,Everyone,Finance
4606,I Am Rich Premium,FINANCE,4.1,1867,4.7,50000,Paid,399.99,Everyone,Finance
3145,I am rich(premium),FINANCE,3.5,472,0.942383,5000,Paid,399.99,Everyone,Finance
3554,💎 I'm rich,LIFESTYLE,3.8,718,26.0,10000,Paid,399.99,Everyone,Lifestyle
5765,I am rich,LIFESTYLE,3.8,3547,1.8,100000,Paid,399.99,Everyone,Lifestyle
1946,I am rich (Most expensive app),FINANCE,4.1,129,2.7,1000,Paid,399.99,Teen,Finance
2775,I Am Rich Pro,FAMILY,4.4,201,2.7,5000,Paid,399.99,Everyone,Entertainment
3221,I am Rich Plus,FAMILY,4.0,856,8.7,10000,Paid,399.99,Everyone,Entertainment
3114,I am Rich,FINANCE,4.3,180,3.8,5000,Paid,399.99,Everyone,Finance


In [77]:
# .copy() makes apps_below_250 an independent frame, so adding a column
# below does not raise SettingWithCopyWarning.
apps_below_250 = cleaned_df.loc[cleaned_df['Price'] <= 250].copy()

In [79]:
apps_below_250["Revenue_Estimate"] = apps_below_250['Installs'] * apps_below_250["Price"]


In [80]:
apps_below_250

Unnamed: 0,App,Category,Rating,Reviews,Size_MBs,Installs,Type,Price,Content_Rating,Genres,Revenue_Estimate
21,KBA-EZ Health Guide,MEDICAL,5.0,4,25.00,1,Free,0.0,Everyone,Medical,0.0
28,Ra Ga Ba,GAME,5.0,2,20.00,1,Paid,1.49,Everyone,Arcade,1.49
47,Mu.F.O.,GAME,5.0,2,16.00,1,Paid,0.99,Everyone,Arcade,0.99
82,Brick Breaker BR,GAME,5.0,7,19.00,5,Free,0.0,Everyone,Arcade,0.0
99,Anatomy & Physiology Vocabulary Exam Review App,MEDICAL,5.0,1,4.60,5,Free,0.0,Everyone,Medical,0.0
...,...,...,...,...,...,...,...,...,...,...,...
10824,Google Drive,PRODUCTIVITY,4.4,2731171,4.00,1000000000,Free,0.0,Everyone,Productivity,0.0
10828,YouTube,VIDEO_PLAYERS,4.3,25655305,4.65,1000000000,Free,0.0,Teen,Video Players & Editors,0.0
10829,Google Play Movies & TV,VIDEO_PLAYERS,3.7,906384,4.65,1000000000,Free,0.0,Teen,Video Players & Editors,0.0
10831,Google News,NEWS_AND_MAGAZINES,3.9,877635,13.00,1000000000,Free,0.0,Teen,News & Magazines,0.0
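`Revenue_Estimate` is a ballpark figure: it assumes every install was bought at the currently listed price. Adding a column to a filtered frame is also where pandas tends to emit `SettingWithCopyWarning`; taking an explicit `.copy()` of the filtered rows avoids it. A small sketch on made-up rows (names and values are illustrative assumptions):

```python
import pandas as pd

apps = pd.DataFrame({
    "App": ["A", "B", "C"],
    "Installs": [1000, 50, 10_000],
    "Price": [0.0, 399.99, 2.99],
})

# .copy() gives an independent DataFrame, so the assignment below
# modifies only `affordable` and never the original `apps`.
affordable = apps.loc[apps["Price"] <= 250].copy()
affordable["Revenue_Estimate"] = affordable["Installs"] * affordable["Price"]

print(affordable)
```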


<h4 style="color:smokewhite;">What are the top 10 highest-grossing paid Apps?</h4>

In [84]:
top_10 = apps_below_250.loc[apps_below_250['Type']=="Paid"].sort_values("Revenue_Estimate", ascending=False).head(10)
top_10

Unnamed: 0,App,Category,Rating,Reviews,Size_MBs,Installs,Type,Price,Content_Rating,Genres,Revenue_Estimate
9220,Minecraft,FAMILY,4.5,2376564,19.0,10000000,Paid,6.99,Everyone 10+,Arcade;Action & Adventure,69900000.0
8825,Hitman Sniper,GAME,4.6,408292,29.0,10000000,Paid,0.99,Mature 17+,Action,9900000.0
7151,Grand Theft Auto: San Andreas,GAME,4.4,348962,26.0,1000000,Paid,6.99,Mature 17+,Action,6990000.0
7477,Facetune - For Free,PHOTOGRAPHY,4.4,49553,48.0,1000000,Paid,5.99,Everyone,Photography,5990000.0
7977,Sleep as Android Unlock,LIFESTYLE,4.5,23966,0.851562,1000000,Paid,5.99,Everyone,Lifestyle,5990000.0
6594,DraStic DS Emulator,GAME,4.6,87766,12.0,1000000,Paid,4.99,Everyone,Action,4990000.0
6082,Weather Live,WEATHER,4.5,76593,4.75,500000,Paid,5.99,Everyone,Weather,2995000.0
7633,Five Nights at Freddy's,GAME,4.6,100805,50.0,1000000,Paid,2.99,Teen,Action,2990000.0
7954,Bloons TD 5,FAMILY,4.6,190086,94.0,1000000,Paid,2.99,Everyone,Strategy,2990000.0
7044,Tasker,TOOLS,4.6,43045,3.4,1000000,Paid,2.99,Everyone,Tools,2990000.0


<h4 style="color:smokewhite;">Out of the top 10, how many are games?</h4>


In [85]:
top_10['Category'].value_counts()

Category
GAME           4
FAMILY         2
PHOTOGRAPHY    1
LIFESTYLE      1
WEATHER        1
TOOLS          1
Name: count, dtype: int64

In [86]:
apps_below_250['Category'].nunique()

33

In [90]:
top_10_categories = apps_below_250['Category'].value_counts().head(10)

In [91]:
bar = px.bar(x=top_10_categories.index, y=top_10_categories.values)
bar.show()


In [104]:
categories_installs = apps_below_250.groupby("Category").agg({"Installs":pd.Series.mean})

In [105]:
categories_installs.sort_values("Installs", ascending=True, inplace=True)

In [106]:
categories_installs

Category,Installs
MEDICAL,134118.8
EVENTS,354431.3
PARENTING,622322.2
BEAUTY,640861.9
AUTO_AND_VEHICLES,727805.5
LIBRARIES_AND_DEMO,813796.9
COMICS,832057.4
DATING,1051585.0
FINANCE,1538005.0
HOUSE_AND_HOME,1565839.0


In [107]:
barh = px.bar(x=categories_installs.Installs, y=categories_installs.index, orientation='h')
barh.update_layout(xaxis_title="Average Number of Downloads", yaxis_title="Category")
barh.show()

In [108]:
cat_df = apps_below_250.groupby("Category").agg({"App":pd.Series.count, "Installs":pd.Series.mean})

In [109]:
cat_df

Category,App,Installs
ART_AND_DESIGN,61,1872674.0
AUTO_AND_VEHICLES,73,727805.5
BEAUTY,42,640861.9
BOOKS_AND_REFERENCE,169,9856755.0
BUSINESS,262,2641291.0
COMICS,54,832057.4
COMMUNICATION,257,42954250.0
DATING,134,1051585.0
EDUCATION,118,2990271.0
ENTERTAINMENT,102,20722160.0


In [110]:
cat_df.sort_values("App", ascending=False, inplace=True)
cat_df

Category,App,Installs
FAMILY,1606,2763110.0
GAME,910,15229410.0
TOOLS,719,11265260.0
PRODUCTIVITY,301,19229470.0
PERSONALIZATION,298,5142124.0
LIFESTYLE,297,1695660.0
FINANCE,296,1538005.0
MEDICAL,292,134118.8
PHOTOGRAPHY,263,17677350.0
BUSINESS,262,2641291.0
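The same per-category table can also be built with pandas' named aggregation, which lets each output column be named for what it holds. A sketch on toy data; the output names `n_apps` and `mean_installs` are choices made here, not names used elsewhere in this notebook:

```python
import pandas as pd

toy = pd.DataFrame({
    "Category": ["GAME", "GAME", "TOOLS"],
    "App": ["a", "b", "c"],
    "Installs": [100, 300, 50],
})

# Named aggregation: output_column=(input_column, function).
cat = (
    toy.groupby("Category")
       .agg(n_apps=("App", "count"), mean_installs=("Installs", "mean"))
       .sort_values("n_apps", ascending=False)
)
print(cat)
```

Few apps combined with high average installs suggests a concentrated category; many apps with low average installs suggests a crowded one.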


In [112]:
scat = px.scatter(data_frame=cat_df, x="App", y="Installs", title="Category Concentration", size="App", hover_name=cat_df.index, color="Installs")
scat.update_layout(xaxis_title="Number of Apps (Lower = More Concentrated)", yaxis_title="Installs", yaxis=dict(type="log"))
scat.show()

In [113]:
apps_below_250.head()

Unnamed: 0,App,Category,Rating,Reviews,Size_MBs,Installs,Type,Price,Content_Rating,Genres,Revenue_Estimate
21,KBA-EZ Health Guide,MEDICAL,5.0,4,25.0,1,Free,0.0,Everyone,Medical,0.0
28,Ra Ga Ba,GAME,5.0,2,20.0,1,Paid,1.49,Everyone,Arcade,1.49
47,Mu.F.O.,GAME,5.0,2,16.0,1,Paid,0.99,Everyone,Arcade,0.99
82,Brick Breaker BR,GAME,5.0,7,19.0,5,Free,0.0,Everyone,Arcade,0.0
99,Anatomy & Physiology Vocabulary Exam Review App,MEDICAL,5.0,1,4.6,5,Free,0.0,Everyone,Medical,0.0


In [114]:
apps_below_250["Genres"].nunique()

114

In [116]:
apps_below_250['Genres'].value_counts()

Genres
Tools                                718
Entertainment                        467
Education                            429
Productivity                         301
Personalization                      298
                                    ... 
Adventure;Brain Games                  1
Travel & Local;Action & Adventure      1
Art & Design;Pretend Play              1
Music & Audio;Music & Video            1
Lifestyle;Pretend Play                 1
Name: count, Length: 114, dtype: int64

In [120]:
stack = apps_below_250['Genres'].str.split(";", expand=True).stack()

In [121]:
stack

21     0                    Medical
28     0                     Arcade
47     0                     Arcade
82     0                     Arcade
99     0                    Medical
                     ...           
10824  0               Productivity
10828  0    Video Players & Editors
10829  0    Video Players & Editors
10831  0           News & Magazines
10835  0                     Arcade
Length: 8564, dtype: object
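Splitting on `;` is needed because a single app can carry several genres, so counting the raw `Genres` strings undercounts each individual genre. An alternative to `expand=True` followed by `stack()` is to split into lists and `explode()` them, which never creates padding cells for apps with fewer genres. A sketch on three made-up entries:

```python
import pandas as pd

genres = pd.Series([
    "Arcade",
    "Lifestyle;Pretend Play",
    "Arcade;Action & Adventure",
])

# Split each entry into a list, then give every list element its own row.
exploded = genres.str.split(";").explode()

counts = exploded.value_counts()
print(counts)
```

The original row label is repeated for each genre of the same app, so the result can still be joined back to the source frame if needed.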

In [123]:
genres_stack = stack.value_counts()

In [126]:
bar = px.bar(x=genres_stack.index[:15], y=genres_stack.values[:15], title="Top Genres", hover_name=genres_stack.index[:15], color=genres_stack.values[:15], color_continuous_scale="Agsunset")
bar.update_layout(xaxis_title="Genres", yaxis_title="Number of Apps", coloraxis_showscale=False)
bar.show()

In [127]:
apps_below_250["Type"].value_counts()

Type
Free    7595
Paid     589
Name: count, dtype: int64

In [129]:
df_free_vs_paid = apps_below_250.groupby(["Category", "Type"], as_index=False).agg({"App":pd.Series.count})

In [130]:
df_free_vs_paid

Unnamed: 0,Category,Type,App
0,ART_AND_DESIGN,Free,58
1,ART_AND_DESIGN,Paid,3
2,AUTO_AND_VEHICLES,Free,72
3,AUTO_AND_VEHICLES,Paid,1
4,BEAUTY,Free,42
...,...,...,...
56,TRAVEL_AND_LOCAL,Paid,8
57,VIDEO_PLAYERS,Free,144
58,VIDEO_PLAYERS,Paid,4
59,WEATHER,Free,65


In [133]:
barg = px.bar(data_frame=df_free_vs_paid, x="Category", y="App", title="Free vs Paid Apps by Category", color="Type", barmode="group")
barg.update_layout(xaxis_title ="Category", yaxis_title = "Number of Apps", xaxis={"categoryorder":"total descending"}, yaxis=dict(type="log"))
barg.show()

In [138]:
box = px.box(apps_below_250, y="Installs", x="Type", color="Type", notched=True, points="all", title="How many downloads are paid apps giving up?")
box.update_layout(yaxis=dict(type='log'))
box.show()
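The box plot answers the question visually; a single number can be read off by comparing the median installs of each type. A minimal sketch on made-up install counts (the values are illustrative assumptions, not results from the dataset):

```python
import pandas as pd

apps = pd.DataFrame({
    "Type": ["Free", "Free", "Free", "Paid", "Paid"],
    "Installs": [1_000_000, 500_000, 10_000, 5_000, 1_000],
})

# Median installs per type; the median is robust to a few huge outliers.
median_installs = apps.groupby("Type")["Installs"].median()

# How many times more installs the typical free app gets.
ratio = median_installs["Free"] / median_installs["Paid"]
print(median_installs, ratio)
```

Applied to `apps_below_250`, the same two lines would give the typical download gap between free and paid apps.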

In [140]:
df_paid_apps = apps_below_250[apps_below_250['Type'] == 'Paid']
box = px.box(df_paid_apps, 
             x='Category', 
             y='Revenue_Estimate',
             title='How Much Can Paid Apps Earn?')
 
box.update_layout(xaxis_title='Category',
                  yaxis_title='Paid App Ballpark Revenue',
                  xaxis={'categoryorder':'min ascending'},
                  yaxis=dict(type='log'))
 
 
box.show()

In [141]:
box = px.box(df_paid_apps,
             x='Category',
             y="Price",
             title='Price per Category')
 
box.update_layout(xaxis_title='Category',
                  yaxis_title='Paid App Price',
                  xaxis={'categoryorder':'max descending'},
                  yaxis=dict(type='log'))
 
box.show()
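The question list at the top also asks how many paid apps would recoup their development costs. One way to put a number on it is to compare each app's `Revenue_Estimate` with an assumed cost. The sketch below uses made-up rows, and `ASSUMED_DEV_COST` is a hypothetical placeholder that should be replaced with a figure that fits your own project:

```python
import pandas as pd

paid = pd.DataFrame({
    "App": ["A", "B", "C", "D"],
    "Revenue_Estimate": [69_900_000.0, 2_990.0, 45_000.0, 99.0],
})

# ASSUMPTION: a flat development cost in dollars; not a value from the dataset.
ASSUMED_DEV_COST = 30_000

recouped = paid["Revenue_Estimate"] >= ASSUMED_DEV_COST
n_recouped = int(recouped.sum())         # number of apps above the threshold
share_recouped = float(recouped.mean())  # fraction of paid apps above it
print(n_recouped, share_recouped)
```

Since the revenue figure is itself an upper-end ballpark (every install counted at the current price), the resulting share is best read as optimistic.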
