![](https://apptweak-blog.imgix.net/images/2019/02/06/Game%20Arcade.png?auto=format)

**Hello all,**

In this kernel, I'm going to perform Exploratory Data Analysis on ***ANDROID GAMES***. I'm going to use basic interactive Plotly charts for visualization, so that anyone can easily understand them. The code is very simple, which makes this a good notebook for beginners.

Data analysis is all about finding interesting insights in the data, and asking more questions uncovers more of them. Here I'm going to look for insights by asking the following questions:

1. **What is the percentage of free/paid games in Play store?**
2. **Which game category has the highest Average Ratings in Free Games?**
3. **Which game category was liked by people?**
4. **Which game category was disliked by people?**

I'm also going to analyze all games and their ratings in detail.

Let's start!!!

## Importing libraries

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import plotly.express as px
import plotly.graph_objects as go
import plotly.figure_factory as ff
from plotly.offline import download_plotlyjs, init_notebook_mode, plot, iplot
init_notebook_mode(connected = True)

## Displaying the Dataset

In [None]:
data = pd.read_csv('../input/top-play-store-games/android-games.csv')
data.head(5)

## Data Cleaning

In [None]:
data.info()

In [None]:
data.shape

In [None]:
data.isnull().sum()

In [None]:
data.duplicated().sum()

Next, I'm going to convert the values in the 'installs' column from text labels to numbers for visualization.

In [None]:
change = {'100.0 k' : 100000, '500.0 k' : 500000, '1.0 M' : 1000000, '5.0 M' : 5000000, '10.0 M' :10000000, 
                '50.0 M' : 50000000, '100.0 M': 100000000, '500.0 M': 500000000, '1000.0 M': 1000000000,}
data['installs'] = data['installs'].map(change)
data.head(5)
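
The dictionary lookup above turns any label it does not list into `NaN`. As a hedged alternative, the labels could be parsed instead of enumerated. `parse_installs` below is a hypothetical helper, not part of the original notebook, and it assumes every label has the form `'<number> k'` or `'<number> M'`, as the dictionary keys do.

```python
# Hypothetical helper (an assumption, not from the original notebook):
# convert install labels such as '500.0 k' or '1000.0 M' into integers.
# Assumption: every label is "<number> <suffix>" with suffix 'k' or 'M'.
SUFFIX_MULTIPLIERS = {'k': 1_000, 'M': 1_000_000}

def parse_installs(label):
    number, suffix = label.split()
    # An unknown suffix raises KeyError instead of silently producing NaN.
    return int(round(float(number) * SUFFIX_MULTIPLIERS[suffix]))

print(parse_installs('500.0 k'))
print(parse_installs('1000.0 M'))
```

Used in place of the dictionary, this would read `data['installs'].map(parse_installs)` on the raw text column. An unexpected label then fails loudly rather than leaving a gap in the charts.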

## EDA

### ***What is the percentage of free/paid games in Play store?***

In [None]:
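# Count free vs. paid games and label each slice from the index (assumes 'paid' is boolean).
counts = data['paid'].value_counts()
fig = px.pie(values = counts.values, names = counts.index.map({False: 'free', True: 'paid'}).tolist(), hole = 0.4, opacity = 1,
       color_discrete_sequence = ['green','yellow'])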
fig.add_annotation(text = 'Top charts', x = 0.5, y = 0.5,showarrow = False, font_size = 20, opacity = 1,font_family = 'serif')
fig.update_traces(textposition = 'outside', textinfo = 'percent+label')
fig.show()
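
The donut chart shows the free/paid split visually, but the same percentages can be read directly from pandas. A minimal sketch on a made-up `paid` column (the toy values are assumptions, not the real dataset):

```python
import pandas as pd

# Toy stand-in for the real 'paid' column: True marks a paid game.
toy = pd.DataFrame({'paid': [False, False, False, True]})

# normalize=True returns fractions instead of raw counts.
share = (toy['paid'].value_counts(normalize=True) * 100).round(2)

# Replace the boolean index with readable labels.
share.index = share.index.map({False: 'free', True: 'paid'})
print(share)
```

Swapping `toy` for `data` would give the percentages for the actual dataset.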

## ***CATEGORICAL ANALYSIS***

### ***1. No.of Installs***

In [None]:
df = data[data['paid'] == False]

df1 = df.groupby('category')['installs'].sum().sort_values(ascending = False)

fig = px.bar(df1, x = df1.index, y = df1.values, color = df1.values)

fig.update_layout(font_family = 'serif',
                   title = dict(text = 'TOTAL INSTALLS OF FREE GAMES', x = 0.50, y = 0.95, font = dict(color = 'grey', size = 30)),
                   xaxis_title_text = 'categories',
                   yaxis_title_text = 'installs',
                   plot_bgcolor = 'white')

fig.show()

### ***2. Growth of Free Games***

In [None]:
# Use df (free games only) so the chart matches the section heading.
fig = px.line(df, x = 'category', y = ['growth (30 days)', 'growth (60 days)'])

fig.update_layout(font_family = 'Helvetica',
                   title = dict(text = '30 days growth & 60 days growth', x = 0.50, y = 0.95),
                   xaxis = dict(showline = False, showgrid = False, showticklabels = True, linecolor = 'rgb(0,0,0)', linewidth = 5, ticks = 'outside'),
                   yaxis = dict(showgrid = False, zeroline = False, showline = True, showticklabels = True),
                   autosize = True,
                   margin = dict(autoexpand = True, l = 100, r = 100, t = 100, b = 100),
                   showlegend = True, plot_bgcolor = 'white')

fig.show()

### ***3. Ratings (Free Games)***

In [None]:
fig1 = px.bar(df, x = 'category', y = 'total ratings',color = '5 star ratings', hover_name = 'title')

fig1.update_layout(
    font_family = 'Comic Sans MS',
    title = dict(text = '5 STAR RATINGS', x = 0.50, y = 0.95, font = dict(color = 'red', size = 30)),
    bargap = 0.3,
    plot_bgcolor = 'white'
)

fig1.show()

In [None]:
fig2 = px.bar(df, x = 'category', y = 'total ratings',color = '4 star ratings', hover_name = 'title')

fig2.update_layout(
    font_family = 'Comic Sans MS',
    title = dict(text = '4 STAR RATINGS', x = 0.50, y = 0.95, font = dict(color = 'blue', size = 30)),
    bargap = 0.3,
    plot_bgcolor = 'white'
)

fig2.show()

In [None]:
fig3 = px.bar(df, x = 'category', y = 'total ratings',color = '3 star ratings', hover_name = 'title')

fig3.update_layout(
    font_family = 'Comic Sans MS',
    title = dict(text = '3 STAR RATINGS', x = 0.50, y = 0.95, font = dict(color = 'green', size = 30)),
    bargap = 0.3,
    plot_bgcolor = 'white'
)

fig3.show()

In [None]:
fig4 = px.bar(df, x = 'category', y = 'total ratings',color = '2 star ratings', hover_name = 'title')

fig4.update_layout(
    font_family = 'Comic Sans MS',
    title = dict(text = '2 STAR RATINGS', x = 0.50, y = 0.95, font = dict(color = 'hotpink', size = 30)),
    bargap = 0.3,
    plot_bgcolor = 'white'
)

fig4.show()

In [None]:
fig5 = px.bar(df, x = 'category', y = 'total ratings',color = '1 star ratings', hover_name = 'title')

fig5.update_layout(
    font_family = 'Comic Sans MS',
    title = dict(text = '1 STAR RATING', x = 0.50, y = 0.95, font = dict(color = 'orange', size = 30)),
    bargap = 0.3,
    plot_bgcolor = 'white'
)

fig5.show()

### ***4. Which game category has the highest Average Ratings in Free Games?***

In [None]:
fig = px.histogram(df, x = 'category', y = 'average rating', color = 'average rating', template = 'plotly_white', marginal = 'box',
            nbins = 100, color_discrete_sequence = ['red','orange','blue'], barmode = 'group', histfunc = 'count')

fig.update_layout(
    font_family = 'cambria',
    title = dict(text = 'Average ratings of free games',x = 0.50, y = 0.95, font = dict(color = 'black', size = 30)),
    legend = dict(x = 1,y = 0.96, bordercolor = 'black', borderwidth = 2, tracegroupgap = 5),
    bargap = 0.1,
)

fig.show()

**GAME CARD** has the highest average ratings.
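
The histogram counts games per rating value, so the category ranking has to be read off the chart. One way to check the conclusion numerically is to average `average rating` within each category. A sketch on toy rows (the numbers and the shortened category list are invented for illustration):

```python
import pandas as pd

# Invented rows; only the column names mirror the dataset.
toy = pd.DataFrame({
    'category': ['GAME CARD', 'GAME CARD', 'GAME ACTION', 'GAME ACTION'],
    'average rating': [4.6, 4.4, 4.1, 4.3],
})

# Mean rating per category, highest first.
mean_rating = (toy.groupby('category')['average rating']
                  .mean()
                  .sort_values(ascending=False))
print(mean_rating)

# idxmax returns the category label with the highest mean.
print(mean_rating.idxmax())
```

Running the same `groupby` on `df` would rank the real categories.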

### ***5. Which game category was liked by people?***

In [None]:
df1 = df.groupby('category')[['installs','total ratings','5 star ratings']].sum().sort_values(by = 'installs', ascending = True)
df1['percentage'] = (df1['5 star ratings']/df1['total ratings'])*100
df1['percentage'] = df1['percentage'].round(2)
# Keep the sorted result (the original call discarded it); highest 5-star share first.
df1 = df1.sort_values(by = 'percentage', ascending = False)

fig = go.Figure()

fig.add_trace(go.Bar(
    x = df1.index,
    y = df1['total ratings'],
    name = 'total ratings',
    marker_color = 'blue'
))
fig.add_trace(go.Bar(
    x = df1.index,
    y = df1['5 star ratings'],
    name = '5 star ratings',
    marker_color = 'cyan'
))

fig.update_layout(font_family = 'cambria',
                  title = dict(text = 'Total ratings Vs 5 Star ratings', x = 0.50, y = 0.95, font = dict(color = 'black', size = 20)),
                  legend = dict(x = 1 , y = 0.96, bordercolor = 'black', borderwidth = 2, tracegroupgap = 10),
                  bargap = 0.5,barmode='group', plot_bgcolor = 'white')
fig.show()

**Top 5 categories by 5-star rating percentage:**
 
  1. GAME CASINO (74.08%)
  2. GAME CASUAL (72.55%)
  3. GAME WORD (72.44%)
  4. GAME PUZZLE (72.13%)
  5. GAME ARCADE (72.00%)
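
The ranking above comes from dividing each category's 5-star count by its total ratings. A sketch of that calculation with `nlargest`, using invented counts (the toy numbers are assumptions, not dataset values):

```python
import pandas as pd

# Invented per-game counts; only the column names mirror the dataset.
toy = pd.DataFrame({
    'category': ['GAME CASINO', 'GAME CASINO', 'GAME WORD', 'GAME ACTION'],
    'total ratings': [1000, 3000, 2000, 5000],
    '5 star ratings': [800, 2200, 1400, 3000],
})

# Sum the counts per category first, then take the ratio.
totals = toy.groupby('category')[['total ratings', '5 star ratings']].sum()
totals['percentage'] = (totals['5 star ratings'] / totals['total ratings'] * 100).round(2)

# nlargest returns the top rows already sorted in descending order.
top = totals['percentage'].nlargest(2)
print(top)
```

Summing before dividing weights each game by how many ratings it received, which is what the notebook's `groupby(...).sum()` does. Averaging per-game percentages would instead weight every game equally and could produce a different ranking.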

### ***6. Which game category was disliked by people?***

In [None]:
df2 = df.groupby('category')[['installs','total ratings','1 star ratings']].sum().sort_values(by = 'installs', ascending = True)
df2['percentage'] = (df2['1 star ratings']/df2['total ratings'])*100
df2['percentage'] = df2['percentage'].round(2)
# Keep the sorted result (the original call discarded it); highest 1-star share first.
df2 = df2.sort_values(by = 'percentage', ascending = False)

fig = go.Figure()

fig.add_trace(go.Bar(
    x = df2.index,
    y = df2['total ratings'],
    name = 'total ratings',
    marker_color = 'hotpink',
))
fig.add_trace(go.Bar(
    x = df2.index,
    y = df2['1 star ratings'],
    name = '1 star ratings',
    marker_color = 'magenta'
))

fig.update_layout(font_family = 'calibri',
                  title = dict(text = 'Total ratings Vs 1 Star rating', x = 0.50, y = 0.95, font = dict(color = 'black', size = 20)),
                  legend = dict(x = 1 , y = 0.96, bordercolor = 'black', borderwidth = 2, tracegroupgap = 10),
                  bargap = 0.5,barmode='group', plot_bgcolor = 'white')
fig.show()

**Top 5 categories by 1-star rating percentage:**
 
  1. GAME ACTION (12.54%)
  2. GAME MUSIC (11.28%)
  3. GAME EDUCATIONAL (10.77%)
  4. GAME BOARD (10.38%)
  5. GAME ADVENTURE (10.10%)

## ***FINDING BEST FREE GAMES***

In [None]:
df3 = df.groupby('installs')['title'].count().sort_values(ascending = True)
df3

In [None]:
g1 = df[df['installs'] == 1000000000].copy()  # copy so the new column is not written to a slice of df
g1['percentage'] = (g1['5 star ratings']/g1['total ratings'])*100
g1['percentage'] = g1['percentage'].round(2)

fig = px.histogram(g1, x = 'title', y = 'percentage', color = 'category', template = 'plotly_white', marginal = 'rug',
            nbins = 100, histfunc = 'sum',opacity = 1)

fig.update_layout(
    font_family = 'segoe print',
    title = dict(text = '1000M downloads',x = 0.50, y = 0.95, font = dict(color = 'black', size = 30)),
    legend = dict(x = 1,y = 0.96, bordercolor = 'black', borderwidth = 2, tracegroupgap = 5),
    bargap = 0.2,
)

fig.show()

**Possible reasons for 1000M downloads:**

1. Age restrictions
2. Animations
3. Easy controls
4. File size
5. Compatibility with all Android phones
6. Popular from the beginning

In [None]:
g2 = df[df['installs'] == 500000000].copy()  # copy so the new column is not written to a slice of df
g2['percentage'] = (g2['5 star ratings']/g2['total ratings'])*100
g2['percentage'] = g2['percentage'].round(2)

fig = px.histogram(g2, x = 'title', y = 'percentage', color = 'category', template = 'plotly_white', marginal = 'box',
            nbins = 100, histfunc = 'sum', opacity = 1)

fig.update_layout(
    font_family = 'segoe print',
    title = dict(text = '500M downloads',x = 0.50, y = 0.95, font = dict(color = 'black', size = 30)),
    legend = dict(x = 1,y = 0.96, bordercolor = 'black', borderwidth = 2, tracegroupgap = 5),
    bargap = 0.2,
)

fig.show()

1. Most of the above games were probably downloaded by teenagers and young adults.
2. These games give players a realistic experience because of their graphics.

In [None]:
g3 = df[df['installs'] == 100000000].copy()  # copy so the new column is not written to a slice of df
g3['percentage'] = (g3['5 star ratings']/g3['total ratings'])*100
g3['percentage'] = g3['percentage'].round(2)
g3 = g3.sort_values(by = 'percentage', ascending = False)

fig = px.histogram(g3[:10], x = 'title', y = 'percentage', color = 'category', template = 'plotly_white', marginal = 'violin',
            nbins = 100, histfunc = 'sum', opacity = 1)

fig.update_layout(
    font_family = 'segoe print',
    title = dict(text = 'Top 10 games in 100M downloads',x = 0.50, y = 0.95, font = dict(color = 'black', size = 30)),
    legend = dict(x = 1,y = 0.96, bordercolor = 'black', borderwidth = 2, tracegroupgap = 5),
    bargap = 0.2,
)

fig.show()

In [None]:
g4 = df[df['installs'] == 50000000].copy()  # copy so the new column is not written to a slice of df
g4['percentage'] = (g4['5 star ratings']/g4['total ratings'])*100
g4['percentage'] = g4['percentage'].round(2)
g4 = g4.sort_values(by = 'percentage', ascending = False)

fig = px.histogram(g4[:10], x = 'title', y = 'percentage', color = 'category', template = 'plotly_white', marginal = 'violin',
            nbins = 100, histfunc = 'count', opacity = 1)

fig.update_layout(
    font_family = 'segoe print',
    title = dict(text = 'Top 10 games in 50M downloads',x = 0.50, y = 0.95, font = dict(color = 'black', size = 30)),
    legend = dict(x = 1,y = 0.96, bordercolor = 'black', borderwidth = 2, tracegroupgap = 5),
    bargap = 0.2,
)

fig.show()

Solitaire games take 4 of the top 10 spots in the 50M-download tier.

In [None]:
g5 = df[df['installs'] == 10000000].copy()  # copy so the new column is not written to a slice of df
g5['percentage'] = (g5['5 star ratings']/g5['total ratings'])*100
g5['percentage'] = g5['percentage'].round(2)
g5 = g5.sort_values(by = 'percentage', ascending = False)

fig = px.histogram(g5[:10], x = 'title', y = 'percentage', color = 'category', template = 'plotly_white', marginal = 'violin',
            nbins = 100, histfunc = 'count', opacity = 1)

fig.update_layout(
    font_family = 'segoe print',
    title = dict(text = 'Top 10 games in 10M downloads',x = 0.50, y = 0.95, font = dict(color = 'black', size = 30)),
    legend = dict(x = 1,y = 0.96, bordercolor = 'black', borderwidth = 2, tracegroupgap = 5),
    bargap = 0.2,
)

fig.show()

Here too, Solitaire games take 5 of the top 10 spots in the 10M-download tier.

In [None]:
g6 = df[df['installs'] == 5000000].copy()  # copy so the new column is not written to a slice of df
g6['percentage'] = (g6['5 star ratings']/g6['total ratings'])*100
g6['percentage'] = g6['percentage'].round(2)
g6 = g6.sort_values(by = 'percentage', ascending = False)

fig = px.histogram(g6[:10], x = 'title', y = 'percentage', color = 'category', template = 'plotly_white', marginal = 'violin',
            nbins = 100, histfunc = 'sum', opacity = 1)

fig.update_layout(
    font_family = 'segoe print',
    title = dict(text = 'Top 10 games in 5M downloads',x = 0.50, y = 0.95, font = dict(color = 'black', size = 30)),
    legend = dict(x = 1,y = 0.96, bordercolor = 'black', borderwidth = 2, tracegroupgap = 5),
    bargap = 0.2,
)

fig.show()

In [None]:
g7 = df[df['installs'] == 1000000].copy()  # copy so the new column is not written to a slice of df
g7['percentage'] = (g7['5 star ratings']/g7['total ratings'])*100
g7['percentage'] = g7['percentage'].round(2)
g7 = g7.sort_values(by = 'percentage', ascending = False)

fig = px.histogram(g7[:10], x = 'title', y = 'percentage', color = 'category', template = 'plotly_white', marginal = 'box',
            nbins = 100, histfunc = 'sum', opacity = 1)

fig.update_layout(
    font_family = 'segoe print',
    title = dict(text = 'Top 10 games in 1M downloads',x = 0.50, y = 0.95, font = dict(color = 'black', size = 30)),
    legend = dict(x = 1,y = 0.96, bordercolor = 'black', borderwidth = 2, tracegroupgap = 5),
    bargap = 0.2,
)

fig.show()

In [None]:
g8 = df[df['installs'] == 500000].copy()  # copy so the new column is not written to a slice of df
g8['percentage'] = (g8['5 star ratings']/g8['total ratings'])*100
g8['percentage'] = g8['percentage'].round(2)
g8 = g8.sort_values(by = 'percentage', ascending = False)

fig = px.histogram(g8, x = 'title', y = 'percentage', color = 'category', template = 'plotly_white', marginal = 'box',
            nbins = 100, histfunc = 'sum', opacity = 1)

fig.update_layout(
    font_family = 'segoe print',
    title = dict(text = '500K downloads',x = 0.50, y = 0.95, font = dict(color = 'black', size = 30)),
    legend = dict(x = 1,y = 0.96, bordercolor = 'black', borderwidth = 2, tracegroupgap = 5),
    bargap = 0.2,
)

fig.show()

In [None]:
g9 = df[df['installs'] == 100000].copy()  # copy so the new column is not written to a slice of df
g9['percentage'] = (g9['5 star ratings']/g9['total ratings'])*100
g9['percentage'] = g9['percentage'].round(2)
g9 = g9.sort_values(by = 'percentage', ascending = False)

fig = px.histogram(g9, x = 'title', y = 'percentage', color = 'category', template = 'plotly_white', marginal = 'rug',
            nbins = 100, histfunc = 'sum', opacity = 1)

fig.update_layout(
    font_family = 'segoe print',
    title = dict(text = '100K downloads',x = 0.50, y = 0.95, font = dict(color = 'black', size = 30)),
    legend = dict(x = 1,y = 0.96, bordercolor = 'black', borderwidth = 2, tracegroupgap = 5),
    bargap = 0.2,
)

fig.show()
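
The install-tier cells in this section repeat the same filter, percentage, and sort steps. A hedged sketch of how they might be folded into one function follows. `top_by_five_star_share` is a hypothetical name, and the toy rows are invented for illustration.

```python
import pandas as pd

def top_by_five_star_share(frame, installs, n=10):
    """Return the n games in one install tier with the highest 5-star share."""
    # .copy() ensures the new column is added to an independent DataFrame.
    tier = frame[frame['installs'] == installs].copy()
    tier['percentage'] = (tier['5 star ratings'] / tier['total ratings'] * 100).round(2)
    return tier.sort_values(by='percentage', ascending=False).head(n)

# Invented rows; only the column names mirror the dataset.
toy = pd.DataFrame({
    'title': ['A', 'B', 'C', 'D'],
    'installs': [1000000, 1000000, 5000000, 1000000],
    'total ratings': [100, 200, 300, 400],
    '5 star ratings': [50, 180, 150, 300],
})

result = top_by_five_star_share(toy, 1000000, n=2)
print(result[['title', 'percentage']])
```

Each tier's chart could then be built from `top_by_five_star_share(df, tier_value)`, leaving only the title text and marginal type to vary between cells.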

![](https://encrypted-tbn0.gstatic.com/images?q=tbn:ANd9GcTla08azUQDopf0SUDEcAbGch2Qp4YAJtphhg&usqp=CAU)

### ***Please upvote if you like it...***💐