![PUBG_Logo](assets/PUBG_logo.png)

## Objective
* Complete Exploratory Data Analysis.
* Deploy a dashboard to report findings.

## Background Information
* PlayerUnknown's Battlegrounds (PUBG) is a video game that set the standard for subsequent games in the Battle Royale genre. The main goal is to SURVIVE at all costs.

## Process:
* Exploratory Data Analysis conducted using various Python packages (NumPy, Matplotlib, Pandas, and Plotly).


## Table of Contents:
* Part I: Exploratory Data Analysis
    * EDA
* Part II: Dashboard
    * Produced with Plotly Dash and deployed on Heroku.

In [None]:
from plotly.offline import init_notebook_mode, iplot
from plotly.graph_objs import *
from scipy import stats
from sklearn.model_selection import train_test_split

import matplotlib as mpl
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import plotly.express as px
import seaborn as sns

%matplotlib inline
pd.options.mode.chained_assignment = None

# Set an offline environment for plotly
init_notebook_mode(connected = True)

# PART I - Exploratory Data Analysis

### Data Preprocessing / Feature Engineering

Let us begin by reading in the CSV file containing the data and examining its contents, such as the number of features and the number of samples. The output of `info()` shows 152 columns (features) and 87,898 rows (samples).

In [None]:
#--------- Pandas Dataframe
## Read in CSV
orig = pd.read_csv('data/PUBG_Player_Statistics.csv')

## Examine data contents
orig.info()
orig.head()

Now, let us remove the features that do not pertain to our goal of clustering solo player behavior, and combine a few of the remaining ones.

Remove:
* player_name
* tracker_id
* duo
* squad

Add:
* Total Distance

The duo and squad columns can be removed by dropping every column from positional index 52 onward. We also create a new feature that combines the walking and riding distance.

In [None]:
#---------Preprocessing
## Create a copy of the dataframe
df = orig.copy()
cols = np.arange(52, 152, 1)

## Drop columns after the 52nd index
df.drop(df.columns[cols], axis = 1, inplace = True)

## Drop player_name and tracker id
df.drop(df.columns[[0, 1]], axis = 1, inplace = True)

## Drop Knockout and Revives
df.drop(df.columns[[49]], axis = 1, inplace = True)
df.drop(columns = ['solo_Revives'], inplace = True)

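## Remove the 'solo_' prefix from all column names
## (str.lstrip strips a set of characters, not a prefix, so slice instead)
df.rename(columns = lambda x: x[len('solo_'):] if x.startswith('solo_') else x, inplace = True)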

## Combine a few columns 
df['TotalDistance'] = df['WalkDistance'] + df['RideDistance']
df['AvgTotalDistance'] = df['AvgWalkDistance'] + df['AvgRideDistance']
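
Positional drops depend on the exact column order of the CSV. A name-based selection is one possible alternative. The sketch below runs on a tiny made-up frame (the toy column names are assumptions chosen to mimic the `solo_` / `duo_` naming, not the real file) and is meant as an illustration rather than a replacement for the cell above.

```python
import pandas as pd

# Toy frame standing in for the real CSV; these column names are assumptions.
toy = pd.DataFrame({
    "player_name": ["a", "b"],
    "tracker_id": [1, 2],
    "solo_Kills": [3, 7],
    "solo_WalkDistance": [100.0, 250.0],
    "solo_RideDistance": [50.0, 0.0],
    "duo_Kills": [1, 2],
})

# Keep only the columns whose name starts with "solo_".
solo = toy.filter(regex=r"^solo_")

# Remove the prefix exactly once by slicing it off.
solo = solo.rename(columns=lambda c: c[len("solo_"):])

# Combine walking and riding distance, as in the preprocessing step.
solo["TotalDistance"] = solo["WalkDistance"] + solo["RideDistance"]
print(solo.columns.tolist())
```

Selecting by name keeps working if columns are reordered, at the cost of relying on a consistent prefix.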



Split the data into three sets: train, dev, and test set.

In [None]:
# Create train and test set using Sci-Kit Learn
train, test = train_test_split(df, test_size = 0.1)
dev, test = train_test_split(test, test_size = 0.5)
data = train

print("The number of training samples is", len(train))
print("The number of development samples is", len(dev))
print("The number of testing samples is", len(test))
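
The split above is random, so the three sets change on every run unless a seed is fixed (`train_test_split` accepts a `random_state` argument for this). As a rough, pandas-only sketch of the same 90 / 5 / 5 idea, assuming a toy frame in place of `df` and an arbitrary seed:

```python
import numpy as np
import pandas as pd

# Toy stand-in for the preprocessed dataframe.
toy = pd.DataFrame({"Kills": np.arange(1000)})

SEED = 42  # arbitrary; any fixed integer makes the split repeatable

# 90% of the rows go to the training set.
train_s = toy.sample(frac=0.9, random_state=SEED)

# The remaining 10% is divided evenly into dev and test.
rest = toy.drop(train_s.index)
dev_s = rest.sample(frac=0.5, random_state=SEED)
test_s = rest.drop(dev_s.index)

print(len(train_s), len(dev_s), len(test_s))
```

This uses `DataFrame.sample` instead of scikit-learn, so it is a swapped-in technique and not the notebook's own method.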

It is important to inspect the final output to make sure that our data preprocessing is complete. And it looks great!

In [None]:
with pd.option_context('display.max_columns', 52):
    print(data.describe(include = 'all'))

## Feature Relations

In this section, we'll go over the continuous and discrete representations of the features. Starting with univariate plots and concluding with observing bivariate relationships.

### Univariate Plots 


#### Continuous Representations

The following features were selected because they are prominent predictors for clustering player behavior.
They were chosen in consultation with experts who have domain experience.

The distribution plots suggest that all of these features are right-skewed except for the Average Survival Time per round, which appears to follow a normal distribution.

In [None]:
#---------Histogram Plots
## Creating axes for plots
f, axes = plt.subplots(3, 4, figsize = (40,30), sharex = False)
f.suptitle('Feature Histograms', fontsize = 32)

## Distplots
ax1 = sns.distplot(data["Kills"], color = "skyblue", ax = axes[(0, 0)])
ax1.set_xlabel('Kills', fontsize = 24)

ax2 = sns.distplot(data["KillDeathRatio"], color = "olive", ax = axes[(0, 1)])
ax2.set_xlabel('Kill-Death Ratio', fontsize = 24)

ax3 = sns.distplot(data["Wins"], color = "gold", ax = axes[(1, 0)])
ax3.set_xlabel('Wins', fontsize = 24)

ax4 = sns.distplot(data["WinRatio"], color = "teal", ax = axes[(1, 1)])
ax4.set_xlabel('Win Ratio', fontsize = 24)

ax5 = sns.distplot(data["HeadshotKills"], color = "blue", ax = axes[(0, 2)])
ax5.set_xlabel('Headshot Kills', fontsize = 24)

ax6 = sns.distplot(data["HeadshotKillRatio"], color = "red", ax = axes[(1, 2)])
ax6.set_xlabel('Headshot-Kill Ratio', fontsize = 24)

ax7 = sns.distplot(data["DamageDealt"], color = "purple", ax = axes[(0, 3)])
ax7.set_xlabel('Total Damage Dealt', fontsize = 24)

ax8 = sns.distplot(data["DamagePg"], color = "magenta", ax = axes[(1, 3)])
ax8.set_xlabel('Damage per round', fontsize = 24)

ax9 = sns.distplot(data["RoundsPlayed"], color = "silver", ax = axes[(2, 0)])
ax9.set_xlabel('Rounds Played', fontsize = 24)

ax10 = sns.distplot(data["AvgSurvivalTime"], color = "orange", ax = axes[(2, 1)])
ax10.set_xlabel('Average Survival Time per round', fontsize = 24)

ax11 = sns.distplot(data['TotalDistance'], color = "green", ax = axes[(2, 2)])
ax11.set_xlabel('Total Distance', fontsize = 24)

ax12 = sns.distplot(data["AvgTotalDistance"], color = "darkblue", ax = axes[(2, 3)])
ax12.set_xlabel('Average Total Distance per round', fontsize = 24)
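
Skewness can also be checked numerically alongside the histograms. The sketch below uses synthetic data (an exponential and a normal sample, both assumptions for illustration) to show how `DataFrame.skew()` separates a right-skewed column from a roughly symmetric one. On the real data the same call could be applied to the columns plotted above.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Synthetic stand-ins: one right-skewed column and one symmetric column.
toy = pd.DataFrame({
    "right_skewed": rng.exponential(scale=1.0, size=10_000),
    "symmetric": rng.normal(loc=0.0, scale=1.0, size=10_000),
})

# Sample skewness per column: clearly positive for a long right tail,
# close to zero for a symmetric distribution.
skewness = toy.skew()
print(skewness.round(2))
```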


To follow up on the previous claim of normality, we examine a probability plot of each feature. If a feature is normally distributed, its points fall close to a straight line.

In [None]:
#--------- Probability Plots
## 1st group of four plots

ax1 = plt.subplot(221)
stats.probplot(data["Kills"], plot = sns.mpl.pyplot)
plt.title('Kills', fontsize = 36)
plt.xlabel('Theoretical quantiles', fontsize = 36)
plt.ylabel('Ordered Values', fontsize = 36)

ax2 = plt.subplot(222)
stats.probplot(data["KillDeathRatio"], plot = sns.mpl.pyplot)
plt.title('Kill-Death Ratio', fontsize = 36)
plt.xlabel('Theoretical quantiles', fontsize = 36)
plt.ylabel('Ordered Values', fontsize = 36)

ax3 = plt.subplot(223)
stats.probplot(data["Wins"], plot = sns.mpl.pyplot) 
plt.title('Wins', fontsize = 36)
plt.xlabel('Theoretical quantiles', fontsize = 36)
plt.ylabel('Ordered Values', fontsize = 36)

ax4 = plt.subplot(224)
stats.probplot(data["WinRatio"], plot = sns.mpl.pyplot)
plt.title('Win Ratio', fontsize = 36)
plt.xlabel('Theoretical quantiles', fontsize = 36)
plt.ylabel('Ordered Values', fontsize = 36)

plt.subplots_adjust(left = 0, bottom = 0, right = 3, top = 3, wspace = 0.25, hspace = 0.5)
fig = plt.figure()



## 2nd group of four plots

ax_1 = fig.add_subplot(221)
stats.probplot(data["HeadshotKills"], plot = sns.mpl.pyplot) 
plt.title('Headshot Kills' , fontsize = 36)
plt.xlabel('Theoretical quantiles', fontsize = 36)
plt.ylabel('Ordered Values', fontsize = 36)

ax_2 = fig.add_subplot(222)
stats.probplot(data["HeadshotKillRatio"], plot = sns.mpl.pyplot)
plt.title('Headshot-Kill Ratio', fontsize = 36)
plt.xlabel('Theoretical quantiles', fontsize = 36)
plt.ylabel('Ordered Values', fontsize = 36)

ax_3 = fig.add_subplot(223)
stats.probplot(data["DamageDealt"], plot = sns.mpl.pyplot) 
plt.title('Damage Dealt', fontsize = 36)
plt.xlabel('Theoretical quantiles', fontsize = 36)
plt.ylabel('Ordered Values', fontsize = 36)

ax_4 = fig.add_subplot(224)
stats.probplot(data["DamagePg"], plot = sns.mpl.pyplot)
plt.title('Damage per round', fontsize = 36)
plt.xlabel('Theoretical quantiles', fontsize = 36)
plt.ylabel('Ordered Values', fontsize = 36)

plt.subplots_adjust(left = 0, bottom = 0, right = 3, top = 3, wspace = 0.25, hspace = 0.5)
fig = plt.figure()

## 3rd group of four plots

ax_31 = fig.add_subplot(221)
stats.probplot(data["RoundsPlayed"], plot = sns.mpl.pyplot)
plt.title('Rounds Played', fontsize = 36)
plt.xlabel('Theoretical quantiles', fontsize = 36)
plt.ylabel('Ordered Values', fontsize = 36)

ax_32 = fig.add_subplot(222)
stats.probplot(data["AvgSurvivalTime"], plot = sns.mpl.pyplot)
plt.title('Average Survival Time per round', fontsize = 36)
plt.xlabel('Theoretical quantiles', fontsize = 36)
plt.ylabel('Ordered Values', fontsize = 36)

ax_33 = fig.add_subplot(223)
stats.probplot(data['TotalDistance'], plot = sns.mpl.pyplot)
plt.title('Total Distance', fontsize = 36)
plt.xlabel('Theoretical quantiles', fontsize = 36)
plt.ylabel('Ordered Values', fontsize = 36)

ax_34 = fig.add_subplot(224)
stats.probplot(data["AvgTotalDistance"], plot = sns.mpl.pyplot)
plt.title('Average Total Distance per round', fontsize = 36)
plt.xlabel('Theoretical quantiles', fontsize = 36)
plt.ylabel('Ordered Values', fontsize = 36)

plt.subplots_adjust(left = 0, bottom = 0, right = 3, top = 3, wspace = 0.25, hspace = 0.5)
fig = plt.figure()
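
How straight a probability plot is can be summarized by the correlation between the ordered values and the theoretical normal quantiles. The sketch below is a simplified version of that idea on synthetic data: it uses the plotting positions `(i - 0.5) / n`, which is an assumption made for brevity and differs slightly from the estimate used by `stats.probplot`. The helper name `normal_probplot_r` is hypothetical.

```python
from statistics import NormalDist

import numpy as np


def normal_probplot_r(values):
    """Correlation between sorted values and standard-normal quantiles."""
    ordered = np.sort(np.asarray(values, dtype=float))
    n = len(ordered)
    # Simple plotting positions; scipy uses a slightly different estimate.
    positions = (np.arange(1, n + 1) - 0.5) / n
    theoretical = np.array([NormalDist().inv_cdf(p) for p in positions])
    return float(np.corrcoef(theoretical, ordered)[0, 1])


rng = np.random.default_rng(0)
r_normal = normal_probplot_r(rng.normal(size=2000))       # close to 1
r_skewed = normal_probplot_r(rng.exponential(size=2000))  # noticeably lower
print(r_normal, r_skewed)
```

With its default `fit = True`, `stats.probplot` also returns the least-squares fit `(slope, intercept, r)`, so the same `r` value can be read directly from its return value.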


#### Discrete Representations

With the continuous representation of the data complete, let's discretize the data into specific intervals and see how the population is spread. Discretizing converts a feature from numerical to categorical. It is very useful for box plots or violin plots, as we don't want to look at every point but at a select few intervals!

The function below will take a feature, discretize it, and then plot the discretized representation of it.

In [None]:
def CountPlot(data, discretized, cut, title_label, plot_xlabel, rotation, bins, labels):
    ### Creates a seaborn count plot by discretizing a column into intervals (categories)
    ## Use 
    # 1) Discretize a feature: convert it from numerical to categorical
    # 2) Plot a countplot using seaborn
    # 3) Annotate the frequency of each interval
    ## Function parameters:
    # data == dataframe
    # discretized == name of the new feature (category)
    # cut == name of the feature that is being converted into a category
    # title_label == label used in the plot title
    # plot_xlabel == label of the x-axis
    # rotation == rotation of the x-tick marks
    # bins == the interval edges
    # labels == string representations of the bins
    
    # Discretize the data
    data[discretized] = pd.cut(data[cut], bins = bins, labels = labels)
    
    # Plot
    dfWIM = data
    plt.style.use('ggplot')
    plt.figure(figsize = (15,10))
    ax = sns.countplot(x = discretized, data = dfWIM, order = labels)
    plt.title('Distribution of ' + title_label)
    plt.xlabel(plot_xlabel)
    plt.xticks(rotation = rotation, fontsize = 10)

    # Annotate the frequencies of each interval
    ncount = len(dfWIM)
    for p in ax.patches:
        x = p.get_bbox().get_points()[:,0]
        y = p.get_bbox().get_points()[1,1]
        ax.annotate('{:.1f}%'.format(100. *y /ncount), (x.mean(), y), 
                ha = 'center', va = 'bottom', rotation = 90) # set the alignment of the text
    return plt.show()
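
One detail of `pd.cut` is worth keeping in mind when reading the plots: by default each interval is open on the left and closed on the right, so a value equal to the first bin edge (for example 0) is not assigned to any interval. The toy example below (made-up values and labels) contrasts the default with `right = False`.

```python
import pandas as pd

values = pd.Series([0, 5, 10, 15])
bins = [0, 10, 20]
labels = ["0-9", "10-19"]

# Default: intervals are (0, 10] and (10, 20], so 0 falls outside every bin.
default_cut = pd.cut(values, bins=bins, labels=labels)

# right=False: intervals are [0, 10) and [10, 20), which matches the labels.
left_closed = pd.cut(values, bins=bins, labels=labels, right=False)

print(default_cut.tolist())   # [nan, '0-9', '0-9', '10-19']
print(left_closed.tolist())   # ['0-9', '0-9', '10-19', '10-19']
```

With `right = False` the last edge is excluded instead, so the final edge would need to sit slightly above the maximum value. Switching conventions would also change the reported percentages, so this is shown only as a caveat.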


##### Kills / Kill Death Ratio

Distribution of Kills:
* The most common interval is 0 - 9 kills, which holds 9.1% of the data.

Distribution of Kill Death Ratio (KDR):
* The most common intervals are 0.60 - 0.79, 0.80 - 0.99, and 1.00 - 1.19 (KDR).
* For reference, a KDR of 1.0 means one kill for every death.

In [None]:
# Distribution of Kills
bins = [i for i in range(0, 610, 10)] + [5033]
labels = [str(i) + '-' + str(i + 9) for i in range(0,600,10)] + ['600.0-5033']
CountPlot(data = data, discretized = 'Kills_Category', cut = 'Kills'
          , title_label = 'Kills' , plot_xlabel = 'Kills'
          , rotation = 90 , bins = bins, labels = labels)

# Distribution of KDR(Kill Death Ratio)
bins =  [i for i in np.arange(0, 9.2, 0.2)] + [100]
labels = ["{0:.2f}".format(i) + '-' + "{0:.2f}".format(i + 0.19) for i in np.arange(0, 9.0, 0.2)] + ['9.0-100']
CountPlot(data = data, discretized = 'KillDeathRatio_Category', cut = 'KillDeathRatio'
          , title_label = 'Kill-Death Ratio', rotation = 45 ,plot_xlabel = 'Kill-Death Ratio'
          , bins = bins, labels = labels)
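
The percentages annotated on the count plots can also be obtained as a table. The sketch below uses made-up kill counts and coarse bins (both assumptions) to show the idea with `value_counts(normalize = True)`.

```python
import pandas as pd

# Made-up kill counts standing in for data["Kills"].
kills = pd.Series([1, 4, 12, 15, 18, 27, 33, 8])
bins = [0, 10, 20, 30, 40]
labels = ["0-9", "10-19", "20-29", "30-39"]

# Share of samples per interval, in percent, kept in bin order.
share = (
    pd.cut(kills, bins=bins, labels=labels)
    .value_counts(normalize=True, sort=False)
    .mul(100)
    .round(1)
)
print(share)
```

Note that `value_counts` ignores unbinned (NaN) rows by default, whereas `CountPlot` divides by the full length of the dataframe, so the two can differ when some values fall outside the bins.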


##### Headshots and Headshot Ratio

Distribution of Headshots:
* The most common interval is 0 - 9 headshots, which holds 34.4% of the data.

Distribution of Headshot Ratio:
* The most common intervals are 0.150 - 0.199 and 0.200 - 0.249 (HKR).
* For reference, an HKR of 1.0 means that every kill is a headshot kill.



In [None]:
# Distribution of Headshots
bins = [i for i in range(0, 150, 10)] + [1500]
labels = [str(i) + '-' + str(i + 9) for i in range(0,140,10)] + ['140-1500']
CountPlot(data = data, discretized = 'HeadshotKills_Category'
          , cut = 'HeadshotKills', title_label = 'Headshots' , plot_xlabel = 'Headshots'
          , rotation = 0 , bins = bins, labels = labels)

# Distribution of Headshot-Kill Ratio
bins =  [i for i in np.arange(0, 0.55, 0.050)] + [1.1]
labels = ["{0:.3f}".format(i) + '-' + "{0:.3f}".format(i + 0.049) for i in np.arange(0, 0.50, 0.050)] + ['0.50-1.00']
CountPlot(data = data, discretized = 'HeadshotKillRatio_Category'
         , cut = 'HeadshotKillRatio', title_label = 'Headshot-Kill Ratio'
         , plot_xlabel = 'Headshot-Kill Ratio' , rotation = 0 , bins = bins, labels = labels)


##### Wins And Win Ratio

Distribution of Wins:
* The most common interval is 0 - 9 wins, which holds 25.8% of the data.

Distribution of Win Ratio:
* The most common interval is 1.00 - 1.99 (%), which holds 12.2% of the data.
* For reference, a 1.0% win ratio means one win for every 100 games played.

In [None]:
# Distribution of Wins
bins =  [i for i in range(0, 16, 1)] + [106]
labels = [str(i) for i in range(0,15,1)] + ['15-106']
CountPlot(data = data, discretized = 'Wins_Category'
         , cut = 'Wins', title_label = 'Wins' , plot_xlabel = 'Wins' 
         , rotation = 0 , bins = bins, labels = labels)

# Distribution of Win Ratio
bins = [i for i in np.arange(0, 51, 1)] + [100]
labels = ["{0:.2f}".format(i) + '-' + "{0:.2f}".format(i + 0.99) for i in np.arange(0, 50, 1)] + ['50.00-100']
CountPlot(data = data, discretized = 'WinsRatio_Category'
         , cut = 'WinRatio', title_label = 'Win Ratio', plot_xlabel = 'Win Ratio (%)' 
         , rotation = 90 , bins = bins, labels = labels)

##### Top 10s and Top 10 Ratio

Distribution of Top 10:
* The most common value is zero top 10 finishes, which accounts for 6.9% of the data.

Distribution of Top 10 Ratio:
* The most common interval is 11.00 - 11.99 (%), which holds 4.6% of the data.
* For reference, a 1% top 10 ratio means one top 10 finish out of 100 rounds played.

In [None]:
# Top 10s
bins = [i for i in np.arange(0, 70, 1)] + [386]
labels = [ str(i) for i in np.arange(0, 69, 1)] + ['69-386']
CountPlot(data = data, discretized = 'Top10s_Category'
         , cut = 'Top10s', title_label = 'Top 10s',  plot_xlabel = 'Top 10s' 
         , rotation = 90 , bins = bins, labels = labels)

# Top10 Ratio
bins =  [i for i in np.arange(0, 70, 1)] + [100]
labels = ["{0:.2f}".format(i) + '-' + "{0:.2f}".format(i + 0.99) for i in np.arange(0, 69, 1)] + ['69.00-100']
CountPlot(data = data, discretized = 'Top10Ratio_Category'
         , cut = 'Top10Ratio', title_label = 'Top 10 Ratios' , plot_xlabel = 'Top 10 Ratio (%)' 
         , rotation = 90 , bins = bins, labels = labels)


##### Total Distance and Average Distance per round

Distribution of Total Distance:
* The most common interval is 0 - 19999 miles, which holds 12.0% of the data.
* The average man will travel 110,000 miles in his lifetime, which is 6x the reported amount from the majority of players.

Distribution of Average Distance per round:
* Most of the data sits in the center (1800 - 3000 miles).
* The average man will travel 1,000 miles (driving) + 3.7 miles (walking) daily. 

In [None]:
# Total Distance
bins =  [i for i in range(0, 1260000 , 20000)] + [6.490221e+06]
labels = [str(i) + '-' + str(i + 19999) for i in range(0,1240000,20000)] + ['1240000-6.490221e+06']
CountPlot(data = data, discretized = 'TotalDistance_Category'
         , cut = 'TotalDistance', title_label = 'Total Distance', plot_xlabel = 'Total Distance (miles)' 
         , rotation = 90 , bins = bins, labels = labels)

# Average Distance Per Round
bins = [i for i in np.arange(0, 7100 , 100)] + [77000]
labels = [str(i)+ '-' + str(i + 99) for i in np.arange(0,7000,100)] + ['7000-77000']
CountPlot(data = data, discretized = 'AvgTotalDistance_Category'
         , cut = 'AvgTotalDistance', title_label = 'Average Distance per round', plot_xlabel = 'Average Distance per round (miles)' 
         , rotation = 90 , bins = bins, labels = labels)


##### Time Survived and Average Time Survived per round

Distribution of Time Survived:
* The most common interval is 0 - 9999 seconds, which holds 15.6% of the data.
* Assuming a 70-year lifespan, a person lives roughly 2.2 billion seconds (70 x 365 x 86,400 = 2,207,520,000), which is about 220,000x the upper bound of this interval.

Distribution of Average Time Survived per round:
* Most of the data sits in the center (900 - 999 seconds).

In [None]:
# Time Survived
bins = [i for i in np.arange(0, 400000 , 10000)] + [1530000]
labels = [str(i) + '-' + str(i+9999) for i in np.arange(0, 390000, 10000)] + ['390000-1530000']
CountPlot(data = data, discretized = 'TimeSurvived_Category'
         , cut = 'TimeSurvived', title_label = 'Time Survived', plot_xlabel = 'Time Survived (s)' 
         , rotation = 90 , bins = bins, labels = labels)

# Average Time Survived per round
bins = [i for i in np.arange(0, 1800, 100)] + [2200]
labels = [str(i) + '-' + str(i + 99) for i in np.arange(0, 1700 , 100)] + ['1700-2200']
CountPlot(data = data, discretized = 'AvgSurvivalTime_Category'
         , cut = 'AvgSurvivalTime', title_label = 'Average Time Survived per round', plot_xlabel = 'Average Time Survived per round (s)' 
         , rotation = 45 , bins = bins, labels = labels)


##### Rounds Played and Damage per round

Distribution of Rounds Played:
* The most common interval is 0 - 9 rounds, which holds 17.0% of the data.


Distribution of Damage per round:
* Most of the data sits in the center; the most common interval is 130 - 139 damage per round, which holds 6.0% of the data.

In [None]:
# RoundsPlayed
bins = [i for i in range(0, 480 , 10)] + [1682]
labels = [str(i) + '-' + str(i+9) for i in range(0,470,10)] + ['470 - 1682']

CountPlot(data = data, discretized = 'RoundsPlayed_Category'
         , cut = 'RoundsPlayed', title_label = 'Rounds Played', plot_xlabel = 'Rounds' 
         , rotation = 45 , bins = bins, labels = labels)

# Damage per round
bins = [i for i in range(0, 630 , 10)] + [2030]
labels = [str(i) + '-' + str(i + 9) for i in range(0,620,10)] + ['620 - 2030']
CountPlot(data = data, discretized = 'DamagePg_Category'
         , cut = 'DamagePg', title_label = 'Damage per round', plot_xlabel = 'Damage per round' 
         , rotation = 90 , bins = bins, labels = labels)


### Bivariate Relations

For bivariate relationships, we will use scatter plots and box plots. To keep the number of boxes manageable, the features are discretized as in the previous section. The function below takes an input feature and returns a 2D scatter plot and a box plot over the discretized intervals. But what are we comparing? In this case, let's compare each feature from the previous section against win ratio to observe any correlation.

In [None]:
def ScatterBoxPlot(data, discretized, cut, bins, labels, x, y):
    ## Creates a plotly scatter plot and box plot by discretizing a column into intervals (categories)
    ## Use 
    # 1) Discretize a feature: convert it from numerical to categorical
    # 2) Plot a scatter plot
    # 3) Plot a box plot
    ## Function parameters:
    # data == dataframe
    # discretized == name of the new feature (category)
    # cut == name of the feature that is being converted into a category
    # bins == the interval edges
    # labels == string representations of the bins
    # x == x-axis feature of the scatter plot
    # y == y-axis feature of the scatter plot and box plot
    
    # Discretize the data
    data[discretized] = pd.cut(data[cut], bins = bins, labels = labels)
    
    # Give a numerical label to the category
    c = data[discretized].astype('category')
    d = dict(enumerate(c.cat.categories))
    box_code = discretized + '_' + 'code'
    level_back = discretized + '_' + 'interval'
    data[box_code] = data[discretized].astype('category').cat.codes
    data[box_code] = data[box_code].replace(-1,0)
    data[level_back] = data[box_code].map(d)
    category_names = x + "_" + 'Interval'
    data[category_names] = data[level_back]      # we need this to color our boxplot
    
    # Scatter
    scatter = px.scatter(data, x = x, y = y,color = category_names,
                         color_discrete_sequence=px.colors.qualitative.Light24,
                         category_orders = {category_names : labels})
    scatter.update_xaxes(showline = True, linewidth = 1, linecolor = 'black', 
                          mirror = True, gridcolor = 'LightPink', automargin = True, 
                          zeroline = True, zerolinewidth = 2, zerolinecolor = 'LightPink', 
                          ticks = "outside", tickwidth = 2, tickcolor = 'black', ticklen = 10)
    scatter.update_yaxes(showline = True, linewidth = 2, linecolor = 'black', 
                          mirror = True, gridcolor = 'LightPink',
                          zeroline = True, zerolinewidth = 1, zerolinecolor = 'LightPink', 
                          ticks = "outside", tickwidth = 2, tickcolor = 'black', ticklen = 10)
    
    
    scatter.update_layout(
        legend = dict(
            x = 1,
            y = 1,
            traceorder = "normal",
            font = dict(
                family = "sans-serif",
                size = 14,
                color = "black"
            ),
            bgcolor = "#e5ecf6",
            bordercolor = "Black",
            borderwidth = 2
        )
    )

    # Boxplot
    box = px.box(data, x = discretized , y = y, color = category_names, 
                 color_discrete_sequence = px.colors.qualitative.Light24,
                         category_orders = { category_names : labels})
    box.update_xaxes(automargin = True, showline = True, linewidth = 1, linecolor = 'black', 
                          mirror = True, gridcolor = 'LightPink', 
                          zeroline = True, zerolinewidth = 2, zerolinecolor = 'LightPink', 
                          ticks = "outside", tickwidth = 2, tickcolor = 'black', ticklen = 10)
    box.update_yaxes(showline = True, linewidth = 2, linecolor = 'black', 
                          mirror = True, gridcolor = 'LightPink',
                          zeroline = True, zerolinewidth = 1, zerolinecolor = 'LightPink', 
                          ticks = "outside", tickwidth = 2, tickcolor = 'black', ticklen = 10)
    
    box.update_layout(
        legend = dict(
            x = 1,
            y = 1,
            traceorder = "normal",
            font = dict(
                family = "sans-serif",
                size = 14,
                color = "black"
            ),
            bgcolor = "#e5ecf6",
            bordercolor = "Black",
            borderwidth = 2
        )
    )
        
    return scatter, box
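
To complement the visual comparison, a rank correlation gives a single number for how monotonic a relationship is. The sketch below computes Spearman's correlation as the Pearson correlation of ranks, on synthetic data where the relationship between the toy `KillDeathRatio` and `WinRatio` columns is invented for illustration. The helper name `spearman` is hypothetical.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Synthetic data: win ratio rises with KDR plus noise (a made-up relationship).
toy = pd.DataFrame({"KillDeathRatio": rng.uniform(0, 5, size=500)})
toy["WinRatio"] = 2.0 * toy["KillDeathRatio"] ** 2 + rng.normal(0, 3, size=500)
toy["Noise"] = rng.normal(size=500)  # unrelated to WinRatio by construction


def spearman(a, b):
    """Spearman rank correlation computed as Pearson correlation of ranks."""
    return a.rank().corr(b.rank())


rho_trend = spearman(toy["KillDeathRatio"], toy["WinRatio"])  # strong trend
rho_noise = spearman(toy["Noise"], toy["WinRatio"])           # near zero
print(rho_trend, rho_noise)
```

A value near 1 supports a "clear increasing trend", while a value near 0 matches the "no clear indication" cases. On the real data the columns of `data` would be passed in place of the toy ones.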

##### Kills

There is no clear indication of a strong relationship because there is large variance.

In [None]:
# Kills
bins = [i for i in range(0, 700, 100)] + [5600]
labels = [str(i) + '-' + str(i + 99) for i in range(0,600,100)] + ['600.0-5600']
Kills_Scatter, Kills_Box =  ScatterBoxPlot(data = data, discretized = 'Kills_Box', cut = 'Kills',
                                           bins = bins , labels = labels,
                                           x = 'Kills', y = 'WinRatio')

Kills_Scatter.update_xaxes(title_text = 'Kills', title_font = {'size': 24})
Kills_Scatter.update_yaxes(title_text = 'Win Ratio (%)', title_font = {'size': 24})
Kills_Scatter.update_layout(title_text = "Scatterplot of Kills and Win Ratios", title_font = {'size': 30} )

Kills_Box.update_xaxes(title_text = 'Kills', title_font = {'size': 24})
Kills_Box.update_yaxes(title_text = 'Win Ratio (%)', title_font = {'size': 24})
Kills_Box.update_layout(title_text = "Boxplot of Kills and Win Ratios", title_font = {'size': 30} )





Kills_Scatter.show()

Kills_Box.show()

##### Kill-Death Ratio

There is a clear increasing trend between Kill-Death Ratio and Win Ratio.

In [None]:
# KDR
bins =  [i for i in np.arange(0, 10, 1)] + [110]
labels = ["{0:.2f}".format(i) + '-' + "{0:.2f}".format(i + 0.99) for i in np.arange(0, 9, 1)] + ['9.0-100']

KDR_Scatter, KDR_Box = ScatterBoxPlot(data = data, discretized = 'KillDeathRatio_Box',
                                      cut = 'KillDeathRatio', bins = bins,
                                      labels = labels, x = 'KillDeathRatio', y = 'WinRatio')




KDR_Scatter.update_xaxes(title_text = 'Kill-Death Ratio', title_font = {'size': 24})
KDR_Scatter.update_yaxes(title_text = 'Win Ratio (%)', title_font = {'size': 24})
KDR_Scatter.update_layout(title_text = "Scatterplot of Kill-Death Ratios and Win Ratios", title_font = {'size': 30} )

KDR_Box.update_xaxes(title_text = 'Kill-Death Ratio', title_font = {'size': 24})
KDR_Box.update_yaxes(title_text = 'Win Ratio (%)', title_font = {'size': 24})
KDR_Box.update_layout(title_text = "Boxplot of Kill-Death Ratios and Win Ratios", title_font = {'size': 30} )





KDR_Scatter.show()

KDR_Box.show()

##### Headshot Kills

There is no clear indication of a strong relationship because there is large variance.

In [None]:
# Headshot
bins = [i for i in range(0, 150, 10)] + [1500]
labels = [str(i) + '-' + str(i + 9) for i in range(0,140,10)] + ['140-1500']

Headshot_Scatter, Headshot_Box = ScatterBoxPlot(data = data, discretized = 'HeadshotKills_Box',
                                                  cut = 'HeadshotKills', bins = bins,
                                                  labels = labels, x = 'HeadshotKills', y = 'WinRatio')


Headshot_Scatter.update_xaxes(title_text = 'Headshot Kills', title_font = {'size': 24})
Headshot_Scatter.update_yaxes(title_text = 'Win Ratio (%)', title_font = {'size': 24})
Headshot_Scatter.update_layout(title_text = "Scatterplot of Headshot Kills and Win Ratios", title_font = {'size': 30} )

Headshot_Box.update_xaxes(title_text = 'Headshot Kills', title_font = {'size': 24})
Headshot_Box.update_yaxes(title_text = 'Win Ratio (%)', title_font = {'size': 24})
Headshot_Box.update_layout(title_text = "Boxplot of Headshot Kills and Win Ratios", title_font = {'size': 30} )





Headshot_Scatter.show()

Headshot_Box.show()

##### Headshot-Kill Ratio

There is no clear indication of a strong relationship because there is large variance.

In [None]:
# Headshot Kill Ratio
bins =  [i for i in np.arange(0, 0.55, 0.050)] + [1.1]
labels = ["{0:.3f}".format(i) + '-' + "{0:.3f}".format(i + 0.049) for i in np.arange(0, 0.50, 0.050)] + ['0.50-1.00']
HKR_Scatter, HKR_Box = ScatterBoxPlot(data = data, discretized = 'HeadshotKillRatio_Box', cut = 'HeadshotKillRatio',
                                      bins = bins , labels = labels, x = 'HeadshotKillRatio', y = 'WinRatio')


HKR_Scatter.update_xaxes(title_text = 'Headshot-Kill Ratio', title_font = {'size': 24})
HKR_Scatter.update_yaxes(title_text = 'Win Ratio (%)', title_font = {'size': 24})
HKR_Scatter.update_layout(title_text = "Scatterplot of Headshot-Kill Ratio and Win Ratios", title_font = {'size': 30} )

HKR_Box.update_xaxes(title_text = 'Headshot-Kill Ratio', title_font = {'size': 24})
HKR_Box.update_yaxes(title_text = 'Win Ratio (%)', title_font = {'size': 24})
HKR_Box.update_layout(title_text = "Boxplot of Headshot-Kill Ratio and Win Ratios", title_font = {'size': 30} )





HKR_Scatter.show()

HKR_Box.show()

##### Top 10 Ratio

There is a clear increasing trend between Top 10 Ratio and Win Ratio.

In [None]:
# Top 10s
bins =  [i for i in np.arange(0, 110, 10)]
labels = ["{0:.2f}".format(i) + '-' + "{0:.2f}".format(i + 9.99) for i in np.arange(0, 100, 10)] 
Top10_Scatter, Top10_Box = ScatterBoxPlot(data = data, discretized = 'Top10Ratio_Box' , cut = 'Top10Ratio',
                                          bins = bins , labels = labels, x = 'Top10Ratio', y = 'WinRatio')


Top10_Scatter.update_xaxes(title_text = 'Top 10 Ratio (%)', title_font = {'size': 24})
Top10_Scatter.update_yaxes(title_text = 'Win Ratio (%)', title_font = {'size': 24})
Top10_Scatter.update_layout(title_text = "Scatterplot of Top 10 Ratio and Win Ratios", title_font = {'size': 30} )

Top10_Box.update_xaxes(title_text = 'Top 10 Ratio (%)', title_font = {'size': 24})
Top10_Box.update_yaxes(title_text = 'Win Ratio (%)', title_font = {'size': 24})
Top10_Box.update_layout(title_text = "Boxplot of Top 10 Ratio and Win Ratios", title_font = {'size': 30} )





Top10_Scatter.show()

Top10_Box.show()

##### Total Distance Survived

There is no clear indication of a strong relationship because there is large variance.

In [None]:
# Total Distance Survived
bins =  [i for i in range(0, 1400000 , 200000)] + [6.490221e+06]
labels = [str(i) + '-' + str(i + 199999) for i in range(0,1200000,200000)] + ['1200000-6500000']
TDS_Scatter, TDS_Box = ScatterBoxPlot(data = data, discretized = 'TotalDistance_Box' , cut = 'TotalDistance'
               , bins = bins , labels = labels, x = 'TotalDistance', y = 'WinRatio')

TDS_Scatter.update_xaxes(title_text = 'Total Distance Survived (miles)', title_font = {'size': 24})
TDS_Scatter.update_yaxes(title_text = 'Win Ratio (%)', title_font = {'size': 24})
TDS_Scatter.update_layout(title_text = "Scatterplot of Total Distance Survived and Win Ratios", title_font = {'size': 30} )


TDS_Box.update_xaxes(title_text = 'Total Distance Survived (miles)', title_font = {'size': 24})
TDS_Box.update_yaxes(title_text = 'Win Ratio (%)', title_font = {'size': 24})
TDS_Box.update_layout(title_text = "Boxplot of Total Distance Survived and Win Ratios", title_font = {'size': 30} )





TDS_Scatter.show()

TDS_Box.show()

##### Average Distance Per Round

There is no clear indication of a strong relationship because there is large variance.

In [None]:
# Average Distance
bins = [i for i in np.arange(0, 8000 , 1000)] + [77000]
labels = [str(i) + '-' + str(i + 999) for i in np.arange(0,7000,1000)] + ['7000-77000']

AD_Scatter, AD_Box = ScatterBoxPlot(data = data, discretized = 'AvgTotalDistance_Box' , cut = 'AvgTotalDistance',
               bins = bins , labels = labels, x = 'AvgTotalDistance', y = 'WinRatio')


AD_Scatter.update_xaxes(title_text = 'Average Total Distance per round (miles)', title_font = {'size': 24})
AD_Scatter.update_yaxes(title_text = 'Win Ratio (%)', title_font = {'size': 24})
AD_Scatter.update_layout(title_text = "Scatterplot of Average Total Distance per round and Win Ratios", title_font = {'size': 30} )


AD_Box.update_xaxes(title_text = 'Average Total Distance per round (miles)', title_font = {'size': 24})
AD_Box.update_yaxes(title_text = 'Win Ratio (%)', title_font = {'size': 24})
AD_Box.update_layout(title_text = "Boxplot of Average Total Distance per round and Win Ratios", title_font = {'size': 30} )





AD_Scatter.show()

AD_Box.show()

##### Time Survived

There is no clear indication of a strong relationship because there is large variance.

In [None]:
# Time Survived
bins = [i for i in np.arange(0, 500000 , 100000)] + [1530000]
labels = [str(i) + '-' + str(i + 99999) for i in np.arange(0, 400000, 100000)] + ['400000-1530000']
TS_Scatter, TS_Box = ScatterBoxPlot(data = data, discretized = 'TimeSurvived_Box' , cut = 'TimeSurvived',
                                    bins = bins , labels = labels, x = 'TimeSurvived', y = 'WinRatio')


TS_Scatter.update_xaxes(title_text = 'Time Survived (s)', title_font = {'size': 24})
TS_Scatter.update_yaxes(title_text = 'Win Ratio (%)', title_font = {'size': 24})
TS_Scatter.update_layout(title_text = "Scatterplot of Time Survived and Win Ratios", title_font = {'size': 30} )

TS_Box.update_xaxes(title_text = 'Time Survived (s)', title_font = {'size': 24})
TS_Box.update_yaxes(title_text = 'Win Ratio (%)', title_font = {'size': 24})
TS_Box.update_layout(title_text = "Boxplot of Time Survived and Win Ratios", title_font = {'size': 30} )





TS_Scatter.show()

TS_Box.show()

##### Average Time Survived per round

There is a clear increasing trend between Average Survival Time and Win Ratio.

In [None]:
# Average Time Survived
bins = [i for i in np.arange(0, 2000, 200)] + [2200]
labels = [str(i) + '-' + str(i + 199) for i in np.arange(0, 1800 , 200)] + ['1800-2200']
ATS_Scatter, ATS_Box = ScatterBoxPlot(data = data, discretized = 'AvgSurvivalTime_Box' , cut = 'AvgSurvivalTime',
                                      bins = bins , labels = labels, x = 'AvgSurvivalTime', y = 'WinRatio')

ATS_Scatter.update_xaxes(title_text = 'Average Time Survived per round (s)', title_font = {'size': 24})
ATS_Scatter.update_yaxes(title_text = 'Win Ratio (%)', title_font = {'size': 24})
ATS_Scatter.update_layout(title_text = "Scatterplot of Average Time Survived per round and Win Ratios", title_font = {'size': 30} )

ATS_Box.update_xaxes(title_text = 'Average Time Survived per round (s)', title_font = {'size': 24})
ATS_Box.update_yaxes(title_text = 'Win Ratio (%)', title_font = {'size': 24})
ATS_Box.update_layout(title_text = "Boxplot of Average Time Survived per round and Win Ratios", title_font = {'size': 30} )





ATS_Scatter.show()

ATS_Box.show()

##### Rounds Played

There is no clear indication of a strong relationship because there is large variance.

In [None]:
# Rounds Played
bins = [i for i in range(0, 550 , 50)] + [1682]
labels = [str(i) + '-' + str(i + 49) for i in range(0,500,50)] + ['500-1682']

RP_Scatter, RP_Box = ScatterBoxPlot(data = data, discretized = 'RoundsPlayed_Box' , cut = 'RoundsPlayed',
               bins = bins , labels = labels, x = 'RoundsPlayed', y = 'WinRatio')


RP_Scatter.update_xaxes(title_text = 'Rounds Played', title_font = {'size': 24})
RP_Scatter.update_yaxes(title_text = 'Win Ratio (%)', title_font = {'size': 24})
RP_Scatter.update_layout(title_text = "Scatterplot of Rounds Played and Win Ratios", title_font = {'size': 30} )

RP_Box.update_xaxes(title_text = 'Rounds Played', title_font = {'size': 24})
RP_Box.update_yaxes(title_text = 'Win Ratio (%)', title_font = {'size': 24})
RP_Box.update_layout(title_text = "Boxplot of Rounds Played and Win Ratios", title_font = {'size': 30} )





RP_Scatter.show()

RP_Box.show()

##### Damage per round

There is a clear increasing trend between Damage Per Round and Win Ratio.

In [None]:
# Damage per Round
bins = [i for i in range(0, 700 , 100)] + [2030]
labels = [str(i) + '-' + str(i + 99) for i in range(0,600,100)] + ['600-2030']
DPG_Scatter, DPG_Box = ScatterBoxPlot(data = data, discretized = 'DamagePg_Box' , cut = 'DamagePg',
                                      bins = bins , labels = labels, x = 'DamagePg', y = 'WinRatio')

DPG_Scatter.update_xaxes(title_text = 'Damage Per Round', title_font = {'size': 24})
DPG_Scatter.update_yaxes(title_text = 'Win Ratio (%)', title_font = {'size': 24})
DPG_Scatter.update_layout(title_text = "Scatterplot of Damage per round and Win Ratios", title_font = {'size': 30} )

DPG_Box.update_xaxes(title_text = 'Damage Per Round', title_font = {'size': 24})
DPG_Box.update_yaxes(title_text = 'Win Ratio (%)', title_font = {'size': 24})
DPG_Box.update_layout(title_text = "Boxplot of Damage per round and Win Ratios", title_font = {'size': 30} )





DPG_Scatter.show()

DPG_Box.show()


## Data Exploration

### Does traveling more distance per round correlate with a higher win rate? 
##### Let's analyze this by comparing the lower 50% and the upper 50% of the population. 

In [None]:
# Do people who walk more than 50% of the population have higher win ratios?
## Calculate upper half
x = data[(data['AvgTotalDistance'] > data['AvgTotalDistance'].quantile(0.50))]
wins_upper = x['WinRatio'].mean()

## Calculate lower half
x = data[(data['AvgTotalDistance'] < data['AvgTotalDistance'].quantile(0.50))]
wins_lower = x['WinRatio'].mean()
winratio_mean = data['WinRatio'].mean()

# Annotations
rounded_wins_lower = round(wins_lower, 3)
rounded_wins_upper = round(wins_upper, 3)
rounded_wins_mean = round(winratio_mean, 3)


# Plot parameters
sns.set(style = "white", rc = {"lines.linewidth": 3})
fig, ax1 = plt.subplots(figsize = (10,10))
g = sns.barplot(y = [wins_lower, wins_upper, winratio_mean], x = ['Population < 50%', 'Population > 50%', 'Win Ratio Mean'],
                ax = ax1, palette="bright")

plt.ylabel('Win Ratio (%)', fontsize = '24')
plt.title('How does distance traveled affect win ratio?')
plt.text(-0.13, rounded_wins_lower + 0.03, str(rounded_wins_lower) + '%', size = 'large', color = 'blue', weight = 'semibold')
plt.text(0.86, rounded_wins_upper + 0.03, str(rounded_wins_upper) + '%', size = 'large', color = 'orange', weight = 'semibold')
plt.text(1.86, rounded_wins_mean + 0.03, str(rounded_wins_mean) + '%', size = 'large', color = 'green', weight = 'semibold')
plt.show()



In [None]:
print("The lower 50% of the population has a" + ' ' + str(rounded_wins_lower) + "% win ratio.")
print("The upper 50% of the population has a" + ' ' + str(rounded_wins_upper) + "% win ratio.")
print("The mean of the population has a" + ' ' + str(rounded_wins_mean) + "% win ratio.")

With this in mind, having a higher average distance per round can lead to a greater win ratio!
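
The comparison above splits players at the median of one feature and compares the mean of another in each half. A minimal sketch of that pattern as a reusable helper follows. The function name `median_split_means` and the toy data are hypothetical and are not results from the PUBG dataset.

```python
import pandas as pd

def median_split_means(frame, split_col, target_col):
    """Mean of target_col below and above the median of split_col, plus the overall mean."""
    median = frame[split_col].quantile(0.50)
    # Strict inequalities mirror the cell above: rows exactly at the median
    # belong to neither half.
    lower = frame.loc[frame[split_col] < median, target_col].mean()
    upper = frame.loc[frame[split_col] > median, target_col].mean()
    return {"lower": lower, "upper": upper, "overall": frame[target_col].mean()}

# Hypothetical toy data, for illustration only.
toy = pd.DataFrame({
    "AvgTotalDistance": [100, 200, 300, 400, 500],
    "WinRatio": [1.0, 2.0, 3.0, 4.0, 5.0],
})
result = median_split_means(toy, "AvgTotalDistance", "WinRatio")
print(result)
```

The same helper would cover the survival-time and KDR comparisons later in this section by changing the two column names.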

----------------------------------------------------------------------------------

### Does average survival time per round correlate with a higher win rate? 
##### Let's analyze this by comparing the lower 50% and the upper 50% of the population. 

In [None]:
# Do people who survive longer than 50% of the population have higher win ratios?
## Calculate upper half
x = data[(data['AvgSurvivalTime'] > data['AvgSurvivalTime'].quantile(0.50))]
wins_upper = x['WinRatio'].mean()

## Calculate lower half
x = data[(data['AvgSurvivalTime'] < data['AvgSurvivalTime'].quantile(0.50))]
wins_lower = x['WinRatio'].mean()
winratio_mean = data['WinRatio'].mean()

# Annotations
rounded_wins_lower = round(wins_lower, 3)
rounded_wins_upper = round(wins_upper, 3)
rounded_wins_mean = round(winratio_mean, 3)

# Plot parameters
sns.set(style = "white", rc = {"lines.linewidth": 3})
fig, ax1 = plt.subplots(figsize = (10,10))
sns.barplot(y = [wins_lower, wins_upper, winratio_mean], x = ['Population < 50%', 'Population > 50%', 'Win Ratio Mean'],
            ax = ax1, palette = "bright")

plt.ylabel('Win Ratio (%)', fontsize = '24')
plt.title('How does average survival time per round affect win ratio?')
plt.text(-0.13, rounded_wins_lower + 0.03, str(rounded_wins_lower) + '%', size = 'large', color = 'blue', weight = 'semibold')
plt.text(0.85, rounded_wins_upper + 0.03, str(rounded_wins_upper) + '%', size = 'large', color = 'orange', weight = 'semibold')
plt.text(1.86, rounded_wins_mean + 0.03, str(rounded_wins_mean) + '%', size = 'large', color = 'green', weight = 'semibold')
plt.show()


In [None]:
print("The lower 50% of the population has a" + ' ' + str(rounded_wins_lower) + "% win ratio.")
print("The upper 50% of the population has a" + ' ' + str(rounded_wins_upper) + "% win ratio.")
print("The mean of the population has a" + ' ' + str(rounded_wins_mean) + "% win ratio.")

With this in mind, having a greater average survival time per round can lead to a greater win ratio!

-------------------------------------------------------------------------------------------------------------

### Does average survival time per round correlate with higher KDRs? 
##### Let's analyze this by comparing the lower 50% and the upper 50% of the population. 

In [None]:
# Do people who survive longer than 50% of the population have higher KDRs?
# Calculate upper half
x = data[(data['AvgSurvivalTime'] > data['AvgSurvivalTime'].quantile(0.50))]
wins_upper = x['KillDeathRatio'].mean()

# Calculate lower half
x = data[(data['AvgSurvivalTime'] < data['AvgSurvivalTime'].quantile(0.50))]
wins_lower = x['KillDeathRatio'].mean()
winratio_mean = data['KillDeathRatio'].mean()

# Annotations
rounded_wins_lower = round(wins_lower, 3)
rounded_wins_upper = round(wins_upper, 3)
rounded_wins_mean = round(winratio_mean, 3)

# Plot parameters
sns.set(style = "white", rc = {"lines.linewidth": 3})
fig, ax1 = plt.subplots(figsize = (10,10))
sns.barplot(y = [wins_lower, wins_upper, winratio_mean],  x = ['Population < 50%', 'Population > 50%', 'KDR Mean'],
            ax = ax1, palette = "bright")

plt.ylabel('Kill-Death Ratio', fontsize = '24')
plt.title('How does average survival time per round affect kill-death ratios?')
plt.text(-0.20, rounded_wins_lower + 0.008, str(rounded_wins_lower) + ' KDR', size = 'large', color = 'blue', weight = 'semibold')
plt.text(0.80, rounded_wins_upper + 0.008, str(rounded_wins_upper) + ' KDR', size = 'large', color = 'orange', weight = 'semibold')
plt.text(1.79, rounded_wins_mean + 0.008, str(rounded_wins_mean) + ' KDR', size = 'large', color = 'green', weight = 'semibold')
plt.show()

In [None]:
print("The lower 50% of the population has a " +   str(rounded_wins_lower) + " KDR.")
print("The upper 50% of the population has a " + str(rounded_wins_upper) + " KDR.")
print("The mean of the population has a " + str(rounded_wins_mean) + " KDR.")

With this in mind, having a greater average survival time per round can lead to a greater KDR!

--------------------------------------------------------------------------------------------------------------

### What factors correlate with a higher average survival time?

In [None]:
g = sns.pairplot(data,
                 x_vars = 
                 [
                     "KillsPg", "AvgTotalDistance", 'KillDeathRatio',
                     'HeadshotKillRatio', 'WinRatio', 'Top10Ratio',
                     'RoundsPlayed'
                 ],
                 y_vars = ["AvgSurvivalTime"])

The only factors above that have a positive correlation to average survival time are Average Total Distance per round, Win Ratio, and Top 10 Ratio.
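
Reading correlations off a pairplot by eye can be cross-checked numerically. The sketch below ranks features by their Pearson correlation with average survival time using `DataFrame.corr`. The toy values are hypothetical and only illustrate the call. Pearson correlation measures linear association only, so it is a complement to the plots rather than a replacement.

```python
import pandas as pd

# Hypothetical toy data, for illustration only (not from the PUBG dataset).
toy = pd.DataFrame({
    "AvgSurvivalTime": [600, 800, 1000, 1200, 1400],
    "Top10Ratio": [5, 9, 14, 18, 25],
    "RoundsPlayed": [300, 20, 150, 90, 210],
})

# Correlation of every other column with AvgSurvivalTime, strongest first.
correlations = (
    toy.corr()["AvgSurvivalTime"]
    .drop("AvgSurvivalTime")
    .sort_values(ascending = False)
)
print(correlations)
```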

## Remarks

* The analysis in the EDA section shows that the data is widely scattered: several features show no strong relationship with win ratio because of large variance.
* By the nature of the game, having better combat skills than your opponents does not guarantee success (wins); external factors in the game also affect the outcome.


# Part II - Dashboard

Dashboard created using Plotly Dash, which runs on Flask.
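
The dashboard code in this part embeds each PNG asset as a base64 data URI by repeating the same open-and-encode pattern for every file. A minimal sketch of a helper that wraps that pattern is shown below. The name `encode_image` is hypothetical, and the demo file contains only the PNG signature bytes rather than a real image.

```python
import base64
import tempfile
from pathlib import Path

def encode_image(path):
    """Return the file at `path` as a PNG data URI usable in an image `src`."""
    raw = Path(path).read_bytes()  # read_bytes opens and closes the file for us
    return "data:image/png;base64," + base64.b64encode(raw).decode()

# Demo with a throwaway file holding only the 8-byte PNG signature.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = Path(tmp) / "demo.png"
    demo_path.write_bytes(b"\x89PNG\r\n\x1a\n")
    demo_uri = encode_image(demo_path)

print(demo_uri)
```

With such a helper, each `html.Img` could take `src = encode_image('assets/PUBG_logo.png')` directly, which also avoids leaving file handles open.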

### Dashboard 1 - Univariate Relationships
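
The `DataTable` in the layout below lists fourteen column specifications that differ only in `id` and `name`. As a sketch, the same list could be generated from a single mapping. The names `COLUMN_LABELS` and `build_columns` are hypothetical, while the ids and display names are copied from the table definition.

```python
# Column id -> display name, copied from the DataTable definition in this notebook.
COLUMN_LABELS = {
    "Kills": "Kills",
    "KillDeathRatio": "Kill-Death Ratio",
    "HeadshotKills": "Headshot Kills",
    "HeadshotKillRatio": "Headshot-Kill Ratio",
    "Wins": "Wins",
    "WinRatio": "WinRatio (%)",
    "Top10s": "Top 10s",
    "Top10Ratio": "Top 10 Ratio",
    "TotalDistance": "Total Distance",
    "AvgTotalDistance": "Average Total Distance (miles)",
    "TimeSurvived": "Survival Time (s)",
    "AvgSurvivalTime": "Average Survival Time (s)",
    "RoundsPlayed": "Rounds Played",
    "DamagePg": "Damage Per Round",
}

def build_columns(labels):
    """Build the list of numeric column specs in the shape the table expects."""
    return [
        {"id": col_id, "name": name, "type": "numeric"}
        for col_id, name in labels.items()
    ]

columns = build_columns(COLUMN_LABELS)
print(len(columns), columns[0])
```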

In [None]:
from collections import OrderedDict
from dash.dependencies import Input, Output
from dash_table.Format import Format, Scheme, Sign, Symbol
from plotly.graph_objs import *
from scipy import stats
from sklearn.model_selection import train_test_split

import base64
import dash
import dash_core_components as dcc
import dash_html_components as html
import dash_table
import dash_table.FormatTemplate as FormatTemplate
import numpy as np
import pandas as pd
import plotly.express as px
import plotly.graph_objects as go


#--------- Pandas Dataframe
orig = pd.read_csv('data/PUBG_Player_Statistics.csv')

## Create a copy of the dataframe
df = orig.copy()
cols = np.arange(52, 152, 1)

## Drop columns after the 52nd index
df.drop(df.columns[cols], axis = 1, inplace = True)

## Drop player_name and tracker id
df.drop(df.columns[[0, 1]], axis = 1, inplace = True)

## Drop Knockout and Revives
df.drop(df.columns[[49]], axis = 1, inplace = True)
df.drop(columns = ['solo_Revives'], inplace = True)

## Drop the string solo from all strings
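## (str.lstrip removes a set of characters, not a prefix, so slice the prefix off instead)
df.rename(columns = lambda x: x[len('solo_'):] if x.startswith('solo_') else x, inplace = True)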

## Combine a few columns 
df['TotalDistance'] = df['WalkDistance'] + df['RideDistance']
df['AvgTotalDistance'] = df['AvgWalkDistance'] + df['AvgRideDistance']

# Create train and test set using Sci-Kit Learn
train, test = train_test_split(df, test_size=0.1)
dev, test = train_test_split(test, test_size = 0.5)
df = train

#--------- Dashboard
## Importing Logo and encoding it
image_filename = 'assets/PUBG_logo.png' 
encoded_image = base64.b64encode(
    open(image_filename, 'rb').read())


## Importing Plots and encoding it
### Histogram
image_filename = 'assets/Histogram_Plots.png' 
encoded_image1 = base64.b64encode(
    open(image_filename, 'rb').read())

### Q-Q Plot
image_filename = 'assets/Probability_Plot_1.png' 
encoded_image2 = base64.b64encode(
    open(image_filename, 'rb').read())

image_filename = 'assets/Probability_Plot_2.png' 
encoded_image3 = base64.b64encode(
    open(image_filename, 'rb').read())

image_filename = 'assets/Probability_Plot_3.png' 
encoded_image4 = base64.b64encode(
    open(image_filename, 'rb').read())

### Discretized Distributions
image_filename = 'assets/Distribution_Kills.png' 
encoded_image5 = base64.b64encode(
    open(image_filename, 'rb').read())

image_filename = 'assets/Distribution_Kill-Death-Ratio.png' 
encoded_image6 = base64.b64encode(
    open(image_filename, 'rb').read())

image_filename = 'assets/Distribution_Headshots.png' 
encoded_image7 = base64.b64encode(
    open(image_filename, 'rb').read())

image_filename = 'assets/Distribution_Headshot-Kill-Ratio.png' 
encoded_image8 = base64.b64encode(
    open(image_filename, 'rb').read())

image_filename = 'assets/Distribution_Wins.png' 
encoded_image9 = base64.b64encode(
    open(image_filename, 'rb').read())

image_filename = 'assets/Distribution_Win-Ratio.png' 
encoded_image10 = base64.b64encode(
    open(image_filename, 'rb').read())

image_filename = 'assets/Distribution_Top10s.png' 
encoded_image11 = base64.b64encode(
    open(image_filename, 'rb').read())

image_filename = 'assets/Distribution_Top10-Ratio.png' 
encoded_image12 = base64.b64encode(
    open(image_filename, 'rb').read())

image_filename = 'assets/Distribution_Total-Distance.png' 
encoded_image13 = base64.b64encode(
    open(image_filename, 'rb').read())

image_filename = 'assets/Distribution_Average-Distance.png' 
encoded_image14 = base64.b64encode(
    open(image_filename, 'rb').read())

image_filename = 'assets/Distribution_Time-Survived.png' 
encoded_image15 = base64.b64encode(
    open(image_filename, 'rb').read())

image_filename = 'assets/Distribution_Average-Time-Survived.png' 
encoded_image16 = base64.b64encode(
    open(image_filename, 'rb').read())

image_filename = 'assets/Distribution_Rounds-Played.png' 
encoded_image17 = base64.b64encode(
    open(image_filename, 'rb').read())

image_filename = 'assets/Distribution_Damage-Per-Game.png' 
encoded_image18 = base64.b64encode(
    open(image_filename, 'rb').read())


## CSS stylesheet for formatting
external_stylesheets = ['https://codepen.io/chriddyp/pen/bWLwgP.css']

## Instantiating the dashboard application
app = dash.Dash(__name__,
                external_stylesheets=external_stylesheets)


server = app.server
app.config['suppress_callback_exceptions'] = True

## Setting up the dashboard layout
app.layout = html.Div(
    [

### Inserting Logo into Heading and centering it
        html.Div(
            [
                html.Img(src = 'data:image/png;base64,{}'
                         .format(encoded_image.decode())
                        )
            ],
            
            style = 
            {
                'display': 'flex', 'align-items': 'center',
                'justify-content': 'center'
            }
        ),

### Inserting Datatable Header               
        html.Div(
            [
                html.H2("Playerunknown's Battlegrounds Match Statistics")
            ]
        ),
        html.Div(
            [
                dcc.Markdown(
                    ''' 
                    * Dataset distributed through Kaggle on a highly popular multiplayer video game, Playerunknown's Battlegrounds.
                    * The dataset includes various features on the performance of an individual player collected through their match history.
                    
                    '''
                )
            ]
        ),

### Inserting in Datatable
        dash_table.DataTable( 
            id = 'typing_formatting_1',
            data = df.to_dict('records'),
            columns =
            [
                {
                    'id': 'Kills',
                    'name': 'Kills',
                    'type': 'numeric'
                }, 

                {
                    'id': 'KillDeathRatio',
                    'name': 'Kill-Death Ratio',
                    'type': 'numeric'
                }, 

                {
                    'id': 'HeadshotKills',
                    'name': 'Headshot Kills',
                    'type': 'numeric'
                }, 

                {
                    'id': 'HeadshotKillRatio',
                    'name': 'Headshot-Kill Ratio',
                    'type': 'numeric'
                }, 

                {
                    'id': 'Wins',
                    'name': 'Wins',
                    'type': 'numeric',
                },  

                {
                    'id': 'WinRatio',
                    'name': 'WinRatio (%)',
                    'type': 'numeric'

                },
                {
                    'id': 'Top10s',
                    'name': 'Top 10s',
                    'type': 'numeric'

                },
                {
                    'id': 'Top10Ratio',
                    'name': 'Top 10 Ratio',
                    'type': 'numeric'

                },
                {
                    'id': 'TotalDistance',
                    'name': 'Total Distance',
                    'type': 'numeric'

                },
                {
                    'id': 'AvgTotalDistance',
                    'name': 'Average Total Distance (miles)',
                    'type': 'numeric'

                },
                
                {
                    'id': 'TimeSurvived',
                    'name': 'Survival Time (s)',
                    'type': 'numeric'

                },
                
                {
                    'id': 'AvgSurvivalTime',
                    'name': 'Average Survival Time (s)',
                    'type': 'numeric'

                },
                

                {
                    'id': 'RoundsPlayed',
                    'name': 'Rounds Played',
                    'type': 'numeric'

                },
                {
                    'id': 'DamagePg',
                    'name': 'Damage Per Round',
                    'type': 'numeric'

                },
                
            ],



### Formatting the data/headers cells
            style_cell = 
            {
                'backgroundColor': 'rgb(255, 245, 205)','height': 'auto',
                'minWidth': '0px', 'maxWidth': '300px',
                'whiteSpace': 'normal'
            },

            style_data = 
            {
                'border': '1px solid blue',
                'font-size': 18 
            },

            style_header = 
            {
                'border': '2px solid gold',
                'font-size': 21
            },
            editable = True,
            filter_action = "native",
            sort_action = "native",
            sort_mode = "multi",
            column_selectable = "single",
            row_selectable = "multi",
            row_deletable = True,
            selected_columns = [],
            selected_rows = [],
            page_action = "native",
            page_current = 0,
            page_size = 20,
        
        ),
        html.Div(id = 'typing_formatting_1-container'),
        
        html.Div(
            [
                html.H2("Continuous Representations")
            ]
        ),
        
# Markdown on Continuous Representations
        html.Div(
            [
                dcc.Markdown(
                    '''
                    * Examine whether the distribution of each feature is left-skewed, normal, or right-skewed.
                    
                    '''
                )
            ]
        ),
        
# Insert Header for Histograms
        html.Div(
            [
                html.H3("Feature Distributions")
            ]
        ),
        html.Div(
            [
                dcc.Markdown(
                    ''' 
                    * Most features are right-skewed; only Average Survival Time per round appears to be normal.
                    
                    ''')
            ]
        ),

# Insert Histogram Plots
        html.Div(
            [
                html.Img(src = 'data:image/png;base64,{}'
                         .format(encoded_image1.decode())
                        )
            ],
            style = 
            {
                'display': 'flex', 'align-items': 'center',
                'justify-content': 'center'
            }
        ),    

# Insert Header for Probability Plots
        html.Div(
            [
                html.H3("Q-Q Plots")
            ]
        ),
        html.Div(
            [
                dcc.Markdown(
                    ''' 
                    * Verify our initial claims from the histograms by examining linear behavior in Q-Q plots.
                    * Average Survival Time per round exhibits linear behavior, which is consistent with a normal distribution.
                    
                    '''
                )
            ]
        ),

# Insert Q-Q Plots
        html.Div(
            [
                html.Div(
                    [
                        html.Img(src = 'data:image/png;base64,{}'
                         .format(encoded_image2.decode()), className = "four columns")
                    ],
                ), 
                html.Div(
                    [
                        html.Img(src = 'data:image/png;base64,{}'
                                 .format(encoded_image3.decode()), className = "four columns")
                    ],
                ), 
                html.Div(
                    [
                        html.Img(src = 'data:image/png;base64,{}'
                                 .format(encoded_image4.decode()), className = "four columns")
                    ],
                ),
            ], className = 'row'
        ),

# Insert Header for Discrete Representation
        html.Div(
            [ 
                html.H2("Discrete Representations")
            ]
        ), 
        html.Div(
            [
                dcc.Markdown(
                    ''' 
                    * Convert features from numerical into categorical to identify populous intervals.
                    
                    ''')
            ]
        ),

# Insert Header for Kills and Kill-Death Ratio
        html.Div(
            [
                html.H3("Kills and Kill-Death Ratio")
            ], className = 'row'
        ),
        

# Insert Markdown for Kills and Kill-Death Ratio     
        html.Div(
            [
                html.Div(
                    [
                        dcc.Markdown(
                            ''' 
                            * The most populous interval is 0 - 9 kills, which holds 9.1% of the data.
                            
                            ''', className = "six columns"
                        )
                    ],
                ),
                html.Div(
                    [
                        dcc.Markdown(
                            ''' 
                            * Most players are in intervals of 0.60 - 0.79, 0.80 - 0.99, and 1.00 - 1.19 (KDR). 
                            * For reference, a KDR of 1.0 implies that for every death you incur, you accomplish one kill.
                            
                            ''', className = "six columns" 
                        )
                    ],
                ), 
            ], className = 'row'
        ),
        
# Insert Kills and Kill-Death Ratio Distributions
        html.Div(
            [
                html.Div(
                    [
                        html.Img(src = 'data:image/png;base64,{}'
                                 .format(encoded_image5.decode()), className = "six columns")
                    ],
                ), 
                html.Div(
                    [
                        html.Img(src = 'data:image/png;base64,{}'
                                 .format(encoded_image6.decode()), className = "six columns")
                    ],
                ), 
            ], className = 'row'
        ),

# Insert Header for Headshots and Headshot-Kill Ratio
        html.Div(
            [
                html.H3("Headshots and Headshot-Kill Ratio" )
            ]
        ),
        
# Insert Markdown for Headshots and Headshot-Kill Ratio     
        html.Div(
            [
                html.Div(
                    [
                        dcc.Markdown(
                            ''' 
                            * The most populous interval is 0 - 9 headshots, which holds 34.4% of the data.
                            
                            ''', className = "six columns"
                        )
                    ],
                ), 
                html.Div(
                    [
                        dcc.Markdown(
                            ''' 
                            * Most players are in the intervals of 0.150 - 0.199 and 0.200 - 0.249 (HKR). 
                            * For reference, a HKR of 1.0 implies that every kill you make is a headshot.

                            ''', className = "six columns" 
                        )
                    ],
                ), 
            ], className = 'row'
        ),

# Insert Headshots and Headshot-Kill Ratio Distributions
        html.Div(
            [
                html.Div(
                    [
                        html.Img(src = 'data:image/png;base64,{}'
                                 .format(encoded_image7.decode()), className = "six columns")
                    ],
                ), 
                html.Div(
                    [
                        html.Img(src = 'data:image/png;base64,{}'
                                 .format(encoded_image8.decode()), className = "six columns")
                    ],
                ), 
            ], className = 'row'
        ),

# Insert Header for Wins and Win Ratio
        html.Div(
            [
                html.H3("Wins and Win Ratio" )
            ]
        ),

# Insert Markdown for Wins and Win Ratio    
        html.Div(
            [
                html.Div(
                    [
                        dcc.Markdown(
                            ''' 
                            * The most populous interval is 0 - 9 wins, which holds 25.8% of the data.
                            
                            ''', className = "six columns" )
                    ],
                ), 
                html.Div(
                    [
                        dcc.Markdown(
                            ''' 
                            * The most populous interval is 1.00 - 1.99 (%), which holds 12.2% of the data.
                            * For reference, a 1.0% win ratio means that one win is achieved for every 100 rounds played.
                
                            ''', className = "six columns" 
                        )
                    ],
                ), 
            ], className = 'row'
        ),

# Insert Wins and Win Ratio Distributions
        html.Div(
            [
                html.Div(
                    [
                        html.Img(src = 'data:image/png;base64,{}'
                                 .format(encoded_image9.decode()), className = "six columns")
                    ],
                ), 
                html.Div(
                    [
                        html.Img(src = 'data:image/png;base64,{}'
                                 .format(encoded_image10.decode()), className = "six columns")
                    ],
                ), 
            ], className = 'row'
        ), 

# Insert Header for Top 10s and Top 10 Ratio
        html.Div(
            [
                html.H3("Top 10s and Top 10 Ratio" )
            ]
        ),
        
# Insert Markdown for Top 10s and Top 10 Ratio    
        html.Div(
            [
                html.Div(
                    [
                        dcc.Markdown(
                            ''' 
                            * The largest single group is players who have not achieved a top 10 finish, which is 6.9% of the data.

                            ''', className = "six columns" 
                        )
                    ],
                ), 
                html.Div(
                    [
                        dcc.Markdown(
                            ''' 
                            * The most populous interval is 11.00 - 11.99 (%), which holds 4.6% of the data.
                            * For reference, a 1% top 10 ratio implies that you earn one top 10 finish out of 100 rounds played.

                            ''', className = "six columns" 
                        )
                    ],
                ), 
            ], className = 'row'
        ),
# Insert Top 10s and Top 10 Ratio Distributions
        html.Div(
            [
                html.Div(
                    [
                        html.Img(src = 'data:image/png;base64,{}'
                                 .format(encoded_image11.decode()), className = "six columns")
                    ],
                ), 
                html.Div(
                    [
                        html.Img(src = 'data:image/png;base64,{}'
                                 .format(encoded_image12.decode()), className = "six columns")
                    ],
                ), 
            ], className = 'row'
        ), 

# Insert Header for Total Distance and Average Distance Per Round
        html.Div(
            [
                html.H3("Total Distance and Average Distance per round" )
            ]
        ),
        
# Insert Markdown for Total Distance and Average Distance Per Round      
        html.Div(
            [
                html.Div(
                    [
                        dcc.Markdown(
                            ''' 
                            * The most populous interval is 0 - 19999 miles, which holds 12.0% of the data.
                            * The average man will travel 110,000 miles in his lifetime, which is 6x the reported amount from the majority of players.
                
                ''', className = "six columns" 
                        )
                    ],
                ), 
                html.Div(
                    [
                        dcc.Markdown(
                            ''' 
                            * Most data is concentrated in the center (1800 - 3000 miles).
                            * The average man will travel 1,000 miles (driving) + 3.7 miles (walking). 
                
                ''', className = "six columns" 
                        )
                    ],
                ), 
            ], className = 'row'
        ),
        
# Insert Total Distance and Average Distance Per Round Distributions
        html.Div(
            [
                html.Div(
                    [
                        html.Img(src = 'data:image/png;base64,{}'
                                 .format(encoded_image13.decode()), className = "six columns")
                    ],
                ), 
                html.Div(
                    [
                        html.Img(src = 'data:image/png;base64,{}'
                                 .format(encoded_image14.decode()), className = "six columns")
                    ],
                ), 
            ], className = 'row'
        ), 
        
# Insert Header for Time Survived and Average Time Survived per round
        html.Div(
            [
                html.H3("Time Survived and Average Time Survived per round" )
            ]
        ),
        

# Insert Markdown for Time Survived and Average Time Survived per round      
        html.Div(
            [
                html.Div(
                    [
                        dcc.Markdown(
                            ''' 
                            * The most populous interval is 0 - 9999 seconds, which holds 15.6% of the data.
                            * For comparison, a 70-year lifespan is roughly 2.2 billion seconds, which is about 220,000x the upper bound of this interval.
                            
                            ''', className = "six columns" 
                        )
                    ],
                ), 
                html.Div(
                    [
                        dcc.Markdown(
                            ''' 
                            * Most data is represented in the center (900 - 999 seconds).
                            
                            ''', className = "six columns" )
                    ],
                ), 
            ], className = 'row'
        ),

# Insert Time Survived and Average Time Survived per round
        html.Div(
            [
                html.Div(
                    [
                        html.Img(src = 'data:image/png;base64,{}'
                                 .format(encoded_image15.decode()), className = "six columns")
                    ],
                ), 
                html.Div(
                    [
                        html.Img(src = 'data:image/png;base64,{}'
                                 .format(encoded_image16.decode()), className = "six columns")
                    ],
                ), 
            ], className = 'row'
        ), 

# Insert Header for Rounds Played and Damage per round
        html.Div(
            [
                html.H3("Rounds Played and Damage per round" )
            ]
        ),

        
# Insert Markdown for Rounds Played and Damage per round   
        html.Div(
            [
                html.Div(
                    [
                        dcc.Markdown(
                            ''' 
                            * Most players are in the range of 0 - 9 rounds, which is 17.0% of the data.
                            
                            ''', className = "six columns" 
                        )
                    ],
                ), 
                html.Div(
                    [
                        dcc.Markdown(
                            ''' 
                            * Most data is represented in the center (130 - 139 Damage per round), which is 6.0% of the data.
                            
                            ''', className = "six columns" 
                        )
                    ],
                ), 
            ], className = 'row'
        ),
# Insert Rounds Played and Damage per round
        html.Div(
            [
                html.Div(
                    [
                        html.Img(src = 'data:image/png;base64,{}'
                                 .format(encoded_image17.decode()), className = "six columns")
                    ],
                ), 
                html.Div(
                    [
                        html.Img(src = 'data:image/png;base64,{}'
                                 .format(encoded_image18.decode()), className = "six columns")
                    ],
                ), 
            ], className = 'row'
        ),          
    ]
)

    

if __name__ == '__main__':
    app.run_server(debug = False)
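
The dashboard above embeds each pre-rendered plot by base64-encoding the PNG file and placing the result in a `data:` URI. A minimal sketch of that pattern follows; the helper name `png_to_data_uri` and the sample bytes are illustrative assumptions and do not appear in the notebook.

```python
import base64


def png_to_data_uri(png_bytes: bytes) -> str:
    """Return a data URI that an <img> src attribute can display directly."""
    encoded = base64.b64encode(png_bytes).decode("ascii")
    return "data:image/png;base64," + encoded


# The 8-byte PNG signature stands in for the contents of a real image file.
sample_bytes = b"\x89PNG\r\n\x1a\n"
uri = png_to_data_uri(sample_bytes)
print(uri)

# Decoding the payload after the comma recovers the original bytes.
payload = uri.split(",", 1)[1]
print(base64.b64decode(payload) == sample_bytes)
```

Wrapping the encoding in one function would avoid repeating the `open(...).read()` and `.format(...)` pair for every image, at the cost of one extra name to maintain.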

--------------------------------------------------------------------------------------------------------

### Dashboard 2 - Bivariate Relationships
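
The dashboard cell that follows discretizes each feature with `pd.cut` before drawing its box plot. The sketch below shows that step on made-up values; the series and the `9.00+` label are assumptions for illustration only. By default `pd.cut` builds right-closed intervals, so a value equal to the lowest bin edge is left unbinned (`NaN`). The notebook's helper deals with this by mapping the resulting category code `-1` to `0`; passing `include_lowest=True` is one alternative, not the method used in the notebook.

```python
import pandas as pd

# Hypothetical kill-death ratios; the real values come from the CSV file.
kdr = pd.Series([0.0, 0.4, 1.0, 2.5, 9.3, 57.0])

bins = list(range(0, 10)) + [110]  # edges 0, 1, ..., 9, 110 -> 10 intervals
labels = [f"{i:.2f}-{i + 0.99:.2f}" for i in range(9)] + ["9.00+"]

# Default behaviour: intervals are (0, 1], (1, 2], ..., so 0.0 is not binned.
default_cut = pd.cut(kdr, bins=bins, labels=labels)

# include_lowest=True closes the first interval on the left: [0, 1].
inclusive_cut = pd.cut(kdr, bins=bins, labels=labels, include_lowest=True)

print("unbinned (default):", default_cut.isna().sum())
print("unbinned (include_lowest):", inclusive_cut.isna().sum())
print(inclusive_cut.value_counts(sort=False))
```

Because the intervals are right-closed, a value of exactly `1.0` lands in the first bin rather than the second, so labels such as `0.00-0.99` are approximate descriptions of the intervals.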

In [None]:
from collections import OrderedDict
from dash.dependencies import Input, Output
from dash_table.Format import Format, Scheme, Sign, Symbol
from plotly.graph_objs import *
from scipy import stats
from sklearn.model_selection import train_test_split

import base64
import dash
import dash_core_components as dcc
import dash_html_components as html
import dash_table
import dash_table.FormatTemplate as FormatTemplate
import numpy as np
import pandas as pd
import plotly.express as px
import plotly.graph_objects as go


#--------- Pandas Dataframe
orig = pd.read_csv('data/PUBG_Player_Statistics.csv')

## Create a copy of the dataframe
df = orig.copy()
cols = np.arange(52, 152, 1)

## Drop columns after the 52nd index
df.drop(df.columns[cols], axis = 1, inplace = True)

## Drop player_name and tracker id
df.drop(df.columns[[0, 1]], axis = 1, inplace = True)

## Drop Knockout and Revives
df.drop(df.columns[[49]], axis = 1, inplace = True)
df.drop(columns = ['solo_Revives'], inplace = True)

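## Drop the 'solo_' prefix from all column names
## (str.lstrip removes a set of characters, not a prefix, so slice instead)
df.rename(columns = lambda x: x[len('solo_'):] if x.startswith('solo_') else x, inplace = True)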

## Combine a few columns 
df['TotalDistance'] = df['WalkDistance'] + df['RideDistance']
df['AvgTotalDistance'] = df['AvgWalkDistance'] + df['AvgRideDistance']

# Split into train (90%), dev (5%), and test (5%) sets using scikit-learn
train, test = train_test_split(df, test_size = 0.1)
dev, test = train_test_split(test, test_size = 0.5)
# The dashboard only uses the training set
df = train
data = df


#--------- Dashboard
## Importing Logo and encoding it
image_filename = 'assets/PUBG_logo.png' 
with open(image_filename, 'rb') as image_file:
    encoded_image = base64.b64encode(image_file.read())


## Bivariate Plots


def ScatterBoxPlot(data, discretized, cut, bins, labels, x, y):
    ## Creates a Plotly Express scatter plot and box plot by discretizing a column into intervals (categories)
    ## Steps:
    # 1) Discretize a feature: convert it from numerical to categorical
    # 2) Plot a scatter plot
    # 3) Plot a box plot
    ## Function parameters:
    # data == dataframe
    # discretized == name of the new categorical feature
    # cut == name of the feature that is being discretized
    # bins == the interval edges
    # labels == string representations of the bins
    # x == x-axis feature of the scatter plot
    # y == y-axis feature of the scatter and box plots
    
    # Discretize the data
    data[discretized] = pd.cut(data[cut], bins = bins, labels = labels)
    
    # Give a numerical label to the category
    c = data[discretized].astype('category')
    d = dict(enumerate(c.cat.categories))
    box_code = discretized + '_' + 'code'
    level_back = discretized + '_' + 'interval'
    data[box_code] = data[discretized].astype('category').cat.codes
    data[box_code] = data[box_code].replace(-1,0)
    data[level_back] = data[box_code].map(d)
    category_names = x + "_" + 'Interval'
    data[category_names] = data[level_back]      # we need this to color our boxplot
    
    # Scatter
    scatter = px.scatter(data, x = x, y = y,color = category_names,
                         color_discrete_sequence=px.colors.qualitative.Light24,
                         category_orders = {category_names : labels})
    scatter.update_xaxes(showline = True, linewidth = 1, linecolor = 'black', 
                          mirror = True, gridcolor = 'LightPink', automargin = True, 
                          zeroline = True, zerolinewidth = 2, zerolinecolor = 'LightPink', 
                          ticks = "outside", tickwidth = 2, tickcolor = 'black', ticklen = 10)
    scatter.update_yaxes(showline = True, linewidth = 2, linecolor = 'black', 
                          mirror = True, gridcolor = 'LightPink',
                          zeroline = True, zerolinewidth = 1, zerolinecolor = 'LightPink', 
                          ticks = "outside", tickwidth = 2, tickcolor = 'black', ticklen = 10)
    
    
    scatter.update_layout(
        legend = dict(
            x = 1,
            y = 1,
            traceorder = "normal",
            font = dict(
                family = "sans-serif",
                size = 14,
                color = "black"
            ),
            bgcolor = "#e5ecf6",
            bordercolor = "Black",
            borderwidth = 2
        )
    )

    # Boxplot
    box = px.box(data, x = discretized , y = y, color = category_names, 
                 color_discrete_sequence = px.colors.qualitative.Light24,
                         category_orders = { category_names : labels})
    box.update_xaxes(automargin = True, showline = True, linewidth = 1, linecolor = 'black', 
                          mirror = True, gridcolor = 'LightPink', 
                          zeroline = True, zerolinewidth = 2, zerolinecolor = 'LightPink', 
                          ticks = "outside", tickwidth = 2, tickcolor = 'black', ticklen = 10)
    box.update_yaxes(showline = True, linewidth = 2, linecolor = 'black', 
                          mirror = True, gridcolor = 'LightPink',
                          zeroline = True, zerolinewidth = 1, zerolinecolor = 'LightPink', 
                          ticks = "outside", tickwidth = 2, tickcolor = 'black', ticklen = 10)
    
    box.update_layout(
        legend = dict(
            x = 1,
            y = 1,
            traceorder = "normal",
            font = dict(
                family = "sans-serif",
                size = 14,
                color = "black"
            ),
            bgcolor = "#e5ecf6",
            bordercolor = "Black",
            borderwidth = 2
        )
    )
        
    return scatter, box




### KDR
bins =  [i for i in np.arange(0, 10, 1)] + [110]
labels = ["{0:.2f}".format(i) + '-' + "{0:.2f}".format(i + 0.99) for i in np.arange(0, 9, 1)] + ['9.00-110.00']

KDR_Scatter, KDR_Box = ScatterBoxPlot(data = data, discretized = 'KillDeathRatio_Box',
                                      cut = 'KillDeathRatio', bins = bins,
                                      labels = labels, x = 'KillDeathRatio', y = 'WinRatio')




KDR_Scatter.update_xaxes(title_text = 'Kill-Death Ratio', title_font = {'size': 24})
KDR_Scatter.update_yaxes(title_text = 'Win Ratio (%)', title_font = {'size': 24})
KDR_Scatter.update_layout(title_text = "Scatterplot of Kill-Death Ratios and Win Ratios", title_font = {'size': 30} )

KDR_Box.update_xaxes(title_text = 'Kill-Death Ratio', title_font = {'size': 24})
KDR_Box.update_yaxes(title_text = 'Win Ratio (%)', title_font = {'size': 24})
KDR_Box.update_layout(title_text = "Boxplot of Kill-Death Ratios and Win Ratios", title_font = {'size': 30} )



### Top 10s
bins =  [i for i in np.arange(0, 110, 10)]
labels = ["{0:.2f}".format(i) + '-' + "{0:.2f}".format(i + 9.99) for i in np.arange(0, 100, 10)] 
Top10_Scatter, Top10_Box = ScatterBoxPlot(data = data, discretized = 'Top10Ratio_Box' , cut = 'Top10Ratio',
                                          bins = bins , labels = labels, x = 'Top10Ratio', y = 'WinRatio')


Top10_Scatter.update_xaxes(title_text = 'Top 10 Ratio (%)', title_font = {'size': 24})
Top10_Scatter.update_yaxes(title_text = 'Win Ratio (%)', title_font = {'size': 24})
Top10_Scatter.update_layout(title_text = "Scatterplot of Top 10 Ratio and Win Ratios", title_font = {'size': 30} )

Top10_Box.update_xaxes(title_text = 'Top 10 Ratio (%)', title_font = {'size': 24})
Top10_Box.update_yaxes(title_text = 'Win Ratio (%)', title_font = {'size': 24})
Top10_Box.update_layout(title_text = "Boxplot of Top 10 Ratio and Win Ratios", title_font = {'size': 30} )


### Average Time Survived
bins = [i for i in np.arange(0, 2000, 200)] + [2200]
labels = [str(i) + '-' + str(i + 199) for i in np.arange(0, 1800 , 200)] + ['1800-2200']
ATS_Scatter, ATS_Box = ScatterBoxPlot(data = data, discretized = 'AvgSurvivalTime_Box' , cut = 'AvgSurvivalTime',
                                      bins = bins , labels = labels, x = 'AvgSurvivalTime', y = 'WinRatio')

ATS_Scatter.update_xaxes(title_text = 'Average Time Survived per round (s)', title_font = {'size': 24})
ATS_Scatter.update_yaxes(title_text = 'Win Ratio (%)', title_font = {'size': 24})
ATS_Scatter.update_layout(title_text = "Scatterplot of Average Time Survived per round and Win Ratios", title_font = {'size': 30} )

ATS_Box.update_xaxes(title_text = 'Average Time Survived per round (s)', title_font = {'size': 24})
ATS_Box.update_yaxes(title_text = 'Win Ratio (%)', title_font = {'size': 24})
ATS_Box.update_layout(title_text = "Boxplot of Average Time Survived per round and Win Ratios", title_font = {'size': 30} )


### Damage per round
bins = list(range(0, 700, 100)) + [2030]
# Each bin is 100 wide, so labels run 0-99, 100-199, ..., 500-599; the last bin covers 600-2030
labels = [str(i) + '-' + str(i + 99) for i in range(0, 600, 100)] + ['600-2030']
DPG_Scatter, DPG_Box = ScatterBoxPlot(data = data, discretized = 'DamagePg_Box' , cut = 'DamagePg',
                                      bins = bins , labels = labels, x = 'DamagePg', y = 'WinRatio')

DPG_Scatter.update_xaxes(title_text = 'Damage per round', title_font = {'size': 24})
DPG_Scatter.update_yaxes(title_text = 'Win Ratio (%)', title_font = {'size': 24})
DPG_Scatter.update_layout(title_text = "Scatterplot of Damage per round and Win Ratios", title_font = {'size': 30} )

DPG_Box.update_xaxes(title_text = 'Damage per round', title_font = {'size': 24})
DPG_Box.update_yaxes(title_text = 'Win Ratio (%)', title_font = {'size': 24})
DPG_Box.update_layout(title_text = "Boxplot of Damage per round and Win Ratios", title_font = {'size': 30} )






## Data Exploration
def encode_image(path):
    ## Read an image file and return its contents base64-encoded
    with open(path, 'rb') as image_file:
        return base64.b64encode(image_file.read())

encoded_image19 = encode_image('assets/Distance_Win-Ratio.png')
encoded_image20 = encode_image('assets/Survival_Time-Win-Ratio.png')
encoded_image21 = encode_image('assets/Survival_Time-KDR.png')



## CSS stylesheet for formatting
external_stylesheets = ['https://codepen.io/chriddyp/pen/bWLwgP.css']

## Instantiating the dashboard application
app = dash.Dash(__name__,
                external_stylesheets=external_stylesheets)

app.config['suppress_callback_exceptions'] = True
server = app.server

## Setting up the dashboard layout
app.layout = html.Div(
    [


### Inserting Logo into Heading and centering it
        html.Div(
            [
                html.Img(src='data:image/png;base64,{}'
                         .format(encoded_image.decode())
                        )
            ],
            style = 
            {
                'display': 'flex', 'align-items': 'center',
                'justify-content': 'center'
            }
        ),







# Bivariate Relationship with Win Ratio
 
## Insert header for Bivariate Relationships
        html.Div(
            [
                html.H2("Bivariate Relationships with Win Ratio" )
            ]
        ),

# Insert Markdown for Bivariate Relationships       
        html.Div(
            [
                dcc.Markdown(
                    ''' 
                    * Bivariate relationships are analyzed using scatter and box plots. Each feature is discretized into wider intervals to keep the number of box plots manageable.
                    * All features below are positively correlated with win ratio, as shown by the increasing trend in the scatter plots.
                    * The box plots show large variance because the data is sporadic.
                    '''
                )
            ]
        ),

# Insert KDR
        html.Div(
            [
                html.H3("Kill-Death Ratio")
            ]
        ),
        html.Div(
            [
                html.Div(
                    [
                        dcc.Graph(figure = KDR_Scatter),
                    ], className = "six columns"
                ), 
                html.Div(
                    [
                        dcc.Graph( figure = KDR_Box),
                    ], className = "six columns"
                ), 
            ], className = 'row'
        ),

# Insert Top 10s
        html.Div(
            [
                html.H3("Top 10s")
            ]
        ),
        html.Div(
            [
                html.Div(
                    [
                        dcc.Graph(figure = Top10_Scatter),
                    ], className = "six columns"
                ), 
                html.Div(
                    [
                        dcc.Graph( figure = Top10_Box),
                    ], className="six columns"
                ), 
            ], className = 'row'
        ), 

        
# Insert Average Time Survived
        html.Div(
            [
                html.H3("Average Time Survived per round")
            ]
        ),
        html.Div(
            [
                html.Div(
                    [
                        dcc.Graph(figure = ATS_Scatter),
                    ], className = "six columns"
                ), 
                html.Div(
                    [
                        dcc.Graph( figure = ATS_Box),
                    ], className = "six columns"
                ), 

            ], className = 'row'
        ), 
        
        
# Insert Damage per round
        html.Div(
            [
                html.H3("Damage per round")
            ]
        ),
        html.Div(
            [
                html.Div(
                    [
                        dcc.Graph(figure = DPG_Scatter),
                    ], className = "six columns"
                ), 
                html.Div(
                    [
                        dcc.Graph( figure = DPG_Box),
                    ], className = "six columns"
                ), 
            ], className = 'row'
        ),
        html.Div(
            [
                html.H2("Data Exploration")
            ]
        ),
# Insert Markdown for Data Exploration 1
        html.Div(
            [
                dcc.Markdown(
                    ''' 
                    ### Does traveling a greater distance per round correlate with a higher win rate? 
                    ##### Let's analyze this by comparing the lower 50% and the upper 50% of the population. 
                
                    '''
                )
            ]
        ),
        html.Div(
            [
                html.Img(src = 'data:image/png;base64,{}'
                         .format(encoded_image19.decode())
                        )
            ],
            style = 
            {
                'display': 'flex', 'align-items': 'center',
                'justify-content': 'center'
            }
        ),
# Insert Markdown for Data Exploration 1
        html.Div(
            [
                dcc.Markdown(
                    ''' 
                    * The lower 50% of the population has a 2.724% win ratio.
                    * The upper 50% of the population has a 7.326% win ratio.
                    * The population as a whole has a mean win ratio of 5.025%.
                
                    '''
                )
            ]
        ),
# Insert Markdown for Data Exploration 2
        html.Div(
            [
                dcc.Markdown(
                    ''' 
                    ### Does average survival time per round correlate with a higher win rate? 
                    ##### Let's analyze this by comparing the lower 50% and the upper 50% of the population.   
                
                ''')
            ]
        ),
        html.Div(
            [
                html.Img(src = 'data:image/png;base64,{}'
                         .format(encoded_image20.decode())
                        )
            ],
            style = 
            {
                'display': 'flex', 'align-items': 'center',
                'justify-content': 'center'
            }
        ),
        html.Div(
            [
# Insert Markdown for Data Exploration 2
                dcc.Markdown(
                    ''' 
                    * The lower 50% of the population has a 2.323% win ratio.
                    * The upper 50% of the population has a 7.724% win ratio.
                    * The population as a whole has a mean win ratio of 5.023%.
                
                '''
                )
            ]
        ),
        
        html.Div(
            [
                
# Insert Markdown for Data Exploration 3
                dcc.Markdown(
                    ''' 
                    ### Does average survival time per round correlate with higher KDRs? 
                    ##### Let's analyze this by comparing the lower 50% and the upper 50% of the population. 
                
                    '''
                )
            ]
        ),

        html.Div(
            [
                html.Img(src = 'data:image/png;base64,{}'
                         .format(encoded_image21.decode())
                        )
            ],
            style = 
            {
                'display': 'flex', 'align-items': 'center',
                'justify-content': 'center'
            }
        ),

        html.Div(
            [
                
# Insert Markdown for Data Exploration 3
                
                dcc.Markdown(
                    ''' 
                    * The lower 50% of the population has a 1.424 KDR.
                    * The upper 50% of the population has a 2.313 KDR.
                    * The population as a whole has a mean KDR of 1.868.
                
                    '''
                )
            ]
        ),
        html.Div(
            [
                html.H2("Remarks" )
            ]
        ),

# Insert Markdown for Remarks    
        html.Div(
            [
                dcc.Markdown(
                    ''' 
                    * As the analysis in the EDA section shows, the data is sporadic.
                    * By the nature of the game, better combat skills than your opponents do not guarantee success: external factors in the game can also affect your wins.
                    * In the next section, we'll cluster player behavior to identify cheaters (hackers).
                    
                    
                    '''
                )
            ]
        ),
    ]
)




if __name__ == '__main__':
    app.run_server(debug=False)
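
The bivariate section reports that every feature is positively correlated with win ratio, judging by the trend in the scatter plots. One way to put a number on such a claim is to compute Pearson and Spearman coefficients with `scipy.stats`, which the notebook already imports. The sketch below uses synthetic arrays as stand-ins; they are assumptions for illustration, not the PUBG data, and the printed values say nothing about the real features.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Synthetic stand-ins for a feature (e.g. a top-10 ratio) and the win ratio.
top10_ratio = rng.uniform(0, 100, size=200)
win_ratio = 0.1 * top10_ratio + rng.normal(0, 2, size=200)

# Pearson measures linear association; Spearman measures monotonic association.
pearson_r, pearson_p = stats.pearsonr(top10_ratio, win_ratio)
spearman_rho, spearman_p = stats.spearmanr(top10_ratio, win_ratio)

print(f"Pearson r = {pearson_r:.3f} (p = {pearson_p:.3g})")
print(f"Spearman rho = {spearman_rho:.3f} (p = {spearman_p:.3g})")
```

Spearman's coefficient is often the safer summary for skewed, outlier-heavy game statistics, since it depends only on ranks.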

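The Data Exploration comparisons contrast the lower and upper 50% of players. The notebook displays only pre-rendered images for these comparisons, so the exact computation is not visible here. The sketch below assumes a median split on the feature of interest and uses made-up rows; `half_means` is a hypothetical helper, and the column names follow the notebook's renamed features.

```python
import pandas as pd

# Hypothetical rows standing in for the player statistics.
players = pd.DataFrame({
    "AvgSurvivalTime": [300, 450, 600, 900, 1200, 1500],
    "WinRatio": [0.0, 1.0, 2.0, 4.0, 8.0, 12.0],
})


def half_means(frame, split_on, target):
    """Mean of `target` at or below, and above, the median of `split_on`."""
    median = frame[split_on].median()
    lower = frame.loc[frame[split_on] <= median, target].mean()
    upper = frame.loc[frame[split_on] > median, target].mean()
    return lower, upper, frame[target].mean()


lower, upper, overall = half_means(players, "AvgSurvivalTime", "WinRatio")
print(f"lower 50%: {lower:.3f}, upper 50%: {upper:.3f}, overall: {overall:.3f}")
```

With an even split, the overall mean sits midway between the two half means, which matches the pattern in the figures reported by the dashboard.
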
## In the next section, we'll cluster player behavior to identify cheaters (hackers).
![PUBG Clustering player behavior for cheaters]()