# Data Description

## Data Source

* PUBG Match Deaths and Statistics, Kaggle 
    \- https://www.kaggle.com/skihikingkevin/pubg-match-deaths

## Data Introduction

This Kaggle dataset contains over 720,000 competitive matches from the popular game PlayerUnknown's Battlegrounds (PUBG). The data was extracted from pubg.op.gg, a game tracker website.


### PlayerUnknown's Battlegrounds

PUBG is a first/third-person shooter battle royale game that matches over 90 players on a large island, where teams and players fight to the death until one remains. Players are airdropped from an airplane onto the island, where they scavenge towns and buildings for weapons, ammo, armor, and first-aid supplies. Players then decide whether to fight or hide, with the ultimate goal of being the last one standing. A bluezone appears a few minutes into the game to corral players closer and closer together by dealing damage to anyone who stands within the bluezone and sparing whoever is within the safe zone.


### The Dataset

This dataset provides two zips: aggregate and deaths.

In **deaths**, the files record every death that occurred within the 720k matches. That is, each row documents an event where a player has died in the match.

In **aggregate**, each match's meta information and player statistics are summarized (as provided by PUBG). It includes various aggregate statistics such as player kills, damage, and distance walked, as well as metadata on the match itself such as queue size, fpp/tpp, and date.
The uncompressed data is divided into 5 chunks of approximately 2 GB each.

### Columns in deaths

1. killed_by: Weapon or cause of the kill
1. killer_name: Killer's in-game ID
1. killer_placement: Final ranking of the killer's team
1. killer_position_x: X coordinate of the killer when the kill occurred
1. killer_position_y: Y coordinate of the killer when the kill occurred
1. map: Game map (Erangel island / Miramar desert)
1. match_id: Unique match ID
1. time: When the kill occurred (seconds after the match started)
1. victim_name: Victim's in-game ID
1. victim_placement: Final ranking of the victim's team
1. victim_position_x: X coordinate of the victim when the kill occurred
1. victim_position_y: Y coordinate of the victim when the kill occurred

### Columns in aggregate

1. date: Start time of the match
1. game_size: Number of teams in the match
1. match_id: Unique match ID
1. match_mode: Game mode (first-/third-person view)
1. party_size: Squad size (1, 2, or 4 players)
1. player_assists: Number of assists
1. player_dbno: Number of knockdowns (DBNO, down but not out)
1. player_dist_ride: Distance traveled by vehicle
1. player_dist_walk: Distance traveled on foot
1. player_dmg: Damage dealt
1. player_kills: Number of kills
1. player_name: Player's in-game ID
1. player_survive_time: Player survival time
1. team_id: The player's team number
1. team_placement: Final ranking of the player's team

# Libraries and Data Loading

## Libraries

In [1]:
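import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import math
from scipy import stats        # Anderson-Darling and Kruskal-Wallis tests used below
import scikit_posthocs as sp   # Conover post-hoc test used below
from tqdm.auto import tqdm
tqdm.pandas()
import os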

In [2]:
plt.rc('font', family='AppleGothic')
plt.rc('axes', unicode_minus=False)

In [3]:
pd.set_option('display.float_format', lambda x: '%.3f' % x)

## Data

In [4]:
data_dir = '../dataset/raw/'

In [5]:
def data_load(data_dir, name_list, df_list=None):
    # df_list is ignored and kept only so existing three-argument calls still work.
    df_list = []
    for i in tqdm(name_list):
        df_list.append(pd.read_csv(data_dir + i))
    return df_list

In [6]:
agg_data_dir = '../dataset/raw/aggregate/'
agg_name_list = sorted(os.listdir(agg_data_dir))
agg_list = data_load(agg_data_dir, agg_name_list, [])  # agg_list does not exist yet, so pass an empty list

  0%|          | 0/5 [00:00<?, ?it/s]

In [8]:
deaths_data_dir = '../dataset/raw/deaths/'
deaths_name_list = sorted(os.listdir(deaths_data_dir))
deaths_list = data_load(deaths_data_dir, deaths_name_list, [])  # deaths_list does not exist yet, so pass an empty list

  0%|          | 0/5 [00:00<?, ?it/s]

In [4]:
def get_shape(df_list):
    for i in df_list:
        print(i.shape)

In [10]:
get_shape(agg_list)

(13849287, 15)
(13844275, 15)
(13841504, 15)
(13840680, 15)
(11993485, 15)


In [11]:
get_shape(deaths_list)

(13426348, 12)
(13440889, 12)
(13431052, 12)
(13431331, 12)
(11640855, 12)


# Data preprocessing

## Removing NA from Aggregate

In [12]:
def df_drop_na(df_list):
    for i in tqdm(range(len(df_list))):
        df_list[i] = df_list[i].dropna()

In [13]:
df_drop_na(agg_list)

  0%|          | 0/5 [00:00<?, ?it/s]

## Removing match_mode
* Every row has the value 'tpp' only

In [14]:
def del_col(df_list, col_name):
    for i in tqdm(df_list):
        del i[col_name]

In [15]:
del_col(agg_list, 'match_mode')

  0%|          | 0/5 [00:00<?, ?it/s]

## Removing match_ids that do not match between the two datasets

In [16]:
def get_unique_match_id(df_list):
    match_id = []
    for i in df_list:
        match_id += [x for x in i['match_id'].unique()]
    return match_id

In [17]:
agg_match_id = get_unique_match_id(agg_list)

In [18]:
len(agg_match_id)

729969

In [19]:
deaths_match_id = get_unique_match_id(deaths_list)

In [20]:
agg_mat = set(agg_match_id)
deaths_mat = set(deaths_match_id)

In [21]:
len(agg_mat), len(deaths_mat), len(agg_mat & deaths_mat)

(729969, 722425, 722396)
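
The counts above imply that 729969 - 722396 = 7573 match_ids appear only in aggregate and 722425 - 722396 = 29 appear only in deaths. A minimal sketch of the same set arithmetic, using made-up ids (`m1` to `m5` are hypothetical, not taken from the dataset):

```python
# Hypothetical match ids; the real ones are long strings read from the CSV files.
agg_ids = {"m1", "m2", "m3", "m4"}
deaths_ids = {"m2", "m3", "m5"}

common = agg_ids & deaths_ids           # present in both datasets
only_in_agg = agg_ids - deaths_ids      # matches without any death rows
only_in_deaths = deaths_ids - agg_ids   # death rows without an aggregate row

print(sorted(common), sorted(only_in_agg), sorted(only_in_deaths))
```

Only the ids in `common` can be joined later, which is why the unmatched death rows are filtered out in the next cells.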

In [22]:
get_shape(deaths_list)

(13426348, 12)
(13440889, 12)
(13431052, 12)
(13431331, 12)
(11640855, 12)


In [23]:
for i in range(len(deaths_list)):
    deaths_list[i] = deaths_list[i][deaths_list[i]['match_id'].isin(agg_mat)]  # same ids as agg_match_id, held in a set

In [24]:
get_shape(deaths_list)

(13425912, 12)
(13440074, 12)
(13430252, 12)
(13430868, 12)
(11640760, 12)


## Handling NA in the deaths data

### Handling NA in map

In [25]:
deaths_list[0]['map'].unique()

array(['MIRAMAR', 'ERANGEL', nan], dtype=object)

* Check whether missing map values can be imputed

In [26]:
map_na_match_id = []
for i in deaths_list:
    map_na_match_id += [x for x in i.loc[i['map'].isnull(), 'match_id'].unique()]

In [27]:
E_match_id = []
for i in deaths_list:
    E_match_id += [x for x in i.loc[i['map'] == 'ERANGEL', 'match_id'].unique()]

In [28]:
M_match_id = []
for i in deaths_list:
    M_match_id += [x for x in i.loc[i['map'] == 'MIRAMAR', 'match_id'].unique()]

* Check for overlapping match_ids

In [29]:
map_na_match_id = set(map_na_match_id)
E_match_id = set(E_match_id)
M_match_id = set(M_match_id)

In [30]:
len(map_na_match_id & E_match_id), len(map_na_match_id & M_match_id), len(E_match_id & M_match_id) 

(0, 0, 0)

* Judged not imputable -> Drop
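
The reasoning behind this check: a missing map could only be filled in if the same match_id also appeared on rows with a known map. A minimal sketch on a toy table (all values below are hypothetical):

```python
import pandas as pd

# Toy deaths table: match "c" has no map on any of its rows.
deaths = pd.DataFrame({
    "match_id": ["a", "a", "b", "c", "c"],
    "map": ["ERANGEL", "ERANGEL", "MIRAMAR", None, None],
})

na_ids = set(deaths.loc[deaths["map"].isnull(), "match_id"])
known_ids = set(deaths.loc[deaths["map"].notnull(), "match_id"])

# An empty intersection means no match has both missing and known map rows,
# so there is no row to copy the map name from.
recoverable = na_ids & known_ids
print(len(recoverable))
```

With an empty intersection, as in the notebook output `(0, 0, 0)`, dropping the rows is the remaining option.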

In [31]:
get_shape(deaths_list)

(13425912, 12)
(13440074, 12)
(13430252, 12)
(13430868, 12)
(11640760, 12)


### Dropping NA from the deaths data

In [32]:
df_drop_na(deaths_list)

  0%|          | 0/5 [00:00<?, ?it/s]

In [33]:
get_shape(deaths_list)

(12100006, 12)
(12116443, 12)
(12108530, 12)
(12101869, 12)
(10494810, 12)


## Merging the data

* key columns
    * agg.match_id = deaths.match_id
    * agg.player_name = deaths.killer_name

In [34]:
# The key column names must match
# Rename killer_name in deaths to player_name

def chg_col_names(df_list, col_names):
    for i in tqdm(df_list):
        i.columns = col_names

In [35]:
deaths_list[0].columns

Index(['killed_by', 'killer_name', 'killer_placement', 'killer_position_x',
       'killer_position_y', 'map', 'match_id', 'time', 'victim_name',
       'victim_placement', 'victim_position_x', 'victim_position_y'],
      dtype='object')

In [36]:
deaths_cols = ['killed_by', 'player_name', 'killer_placement', 'killer_position_x',
               'killer_position_y', 'map', 'match_id', 'time', 'victim_name',
               'victim_placement', 'victim_position_x', 'victim_position_y']

In [37]:
# Align the key column names

chg_col_names(deaths_list, deaths_cols)

  0%|          | 0/5 [00:00<?, ?it/s]

## Merging the Aggregate and Deaths data

* Matching match_ids confirm that files with the same number pair up
    * e.g. agg_0 is merged with deaths_0

In [38]:
get_shape(agg_list)

(13829038, 14)
(13824209, 14)
(13821505, 14)
(13820791, 14)
(11976035, 14)


In [39]:
get_shape(deaths_list)

(12100006, 12)
(12116443, 12)
(12108530, 12)
(12101869, 12)
(10494810, 12)


In [40]:
deaths_list[0]['map'].unique()

array(['MIRAMAR', 'ERANGEL'], dtype=object)

In [41]:
def get_merged_df(left, right, join, keys):
    df_list = []
    for i in tqdm(range(len(left))):
        df_list.append(pd.merge(left[i], right[i], how=join, on=keys))
    return df_list

In [42]:
df_merge = get_merged_df(agg_list, deaths_list, 'left', ['match_id', 'player_name'])

  0%|          | 0/5 [00:00<?, ?it/s]

In [43]:
get_shape(df_merge)

(20122234, 24)
(20125055, 24)
(20116444, 24)
(20111666, 24)
(17431740, 24)
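
The merged frames have more rows than the aggregate files (about 20.1M versus 13.8M) and 14 + 12 - 2 = 24 columns. A left merge repeats an aggregate row once per matching death row, so a player with several kills appears several times, while a player with no kills keeps a single row with NaN in the deaths columns. A small sketch with hypothetical players `p1` to `p4`:

```python
import pandas as pd

# Toy data: p1 has two kills, p2 has none.
agg = pd.DataFrame({
    "match_id": ["m1", "m1"],
    "player_name": ["p1", "p2"],
    "player_kills": [2, 0],
})
deaths = pd.DataFrame({
    "match_id": ["m1", "m1"],
    "player_name": ["p1", "p1"],   # killer_name after renaming
    "victim_name": ["p3", "p4"],
})

merged = pd.merge(agg, deaths, how="left", on=["match_id", "player_name"])
print(merged)
```

Because per-player statistics such as `player_kills` are duplicated across those repeated rows, later cells first aggregate by `player_name` and `match_id` before summing.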


## Filling in map

In [44]:
def fill_map_na(df_list, match_id_list, map_name):
    for i in df_list:
        i.loc[(i['match_id'].isin(match_id_list))&(i['map'].isnull()), 'map'] = map_name
    return df_list

In [45]:
match_id = [E_match_id, M_match_id]
map_names = ['ERANGEL', 'MIRAMAR']

for i in tqdm(range(len(match_id))):
    df_merge = fill_map_na(df_merge, match_id[i], map_names[i])

  0%|          | 0/2 [00:00<?, ?it/s]

## Splitting the data by map

* Split the dataset into ERANGEL and MIRAMAR

In [46]:
def get_df_map(df_list, map_name):
    df_map = []
    for i in df_list:
        df_map.append(i[i['map'] == map_name])
    return df_map

In [47]:
df_map  = []
for i in tqdm(range(len(map_names))):
    df_map.append(get_df_map(df_merge, map_names[i]))

  0%|          | 0/2 [00:00<?, ?it/s]

## Splitting the data by party_size

* The data is split because tiers differ by party_size

In [48]:
def get_df_party_size(df_list, party_size):
    df_party_size = []
    for i in df_list:
        df_party_size.append(i[i['party_size'] == party_size])
    return df_party_size

In [50]:
df = []
party_size = [1, 2, 4]

for i in tqdm(range(len(party_size))):
    for j in range(len(df_map)):
        df.append(get_df_party_size(df_map[j], party_size[i]))

  0%|          | 0/3 [00:00<?, ?it/s]

In [51]:
# df[0]: solo, ERANGEL / df[1]: solo, MIRAMAR
# df[2]: duo, ERANGEL / df[3]: duo, MIRAMAR
# df[4]: squad, ERANGEL / df[5]: squad, MIRAMAR

len(df)

6

## Concatenating the datasets per party_size and map

In [52]:
def get_concat(df_list):
    df_concat = []
    for i in tqdm(range(len(df_list))):
        df_concat.append(pd.concat(df_list[i], ignore_index=True))
    return df_concat

In [53]:
# df_concat[0]: solo, ERANGEL / df_concat[1]: solo, MIRAMAR
# df_concat[2]: duo, ERANGEL / df_concat[3]: duo, MIRAMAR
# df_concat[4]: squad, ERANGEL / df_concat[5]: squad, MIRAMAR

df_concat = get_concat(df)

  0%|          | 0/6 [00:00<?, ?it/s]

## Grouping killed_by

In [54]:
def killed_by_refine(df):
    df['killed_by'] = df['killed_by'].replace({'death.WeapSawnoff_C': 'sawed_off', 
                                               'death.PlayerMale_A_C': 'Punch',
                                               'death.PG117_A_01_C': 'Boat' , 'death.RedZoneBomb_C': 'RedZone'})
    df['killed_by'] = df['killed_by'].replace(['Pickup Truck','Hit by Car','Buggy','Dacia','Motorbike',
                                               'Motorbike (SideCar)','Uaz','Van'], 'land_vehicle')
    df['killed_by'] = df['killed_by'].replace(['death.ProjMolotov_C', 'death.ProjMolotov_DamageField_C', 
                                               'death.Buff_FireDOT_C'], 'Molotov')
    df['killed_by'] = df['killed_by'].replace(['Aquarail','Boat'], 'water_vehicle')

In [55]:
for i in tqdm(df_concat):
    killed_by_refine(i)

  0%|          | 0/6 [00:00<?, ?it/s]

## Exporting to CSV

In [56]:
def df_to_csv(df_list, data_dir, file_name):
    # Write each DataFrame to data_dir/<file_name>.csv
    for i in tqdm(range(len(df_list))):
        df_list[i].to_csv(data_dir + f'{file_name[i]}.csv', index=False)

In [57]:
data_dir = '../dataset/preprocessing/'
file_name = ['solo_E', 'solo_M', 'duo_E', 'duo_M', 'squad_E', 'squad_M']

df_to_csv(df_concat, data_dir, file_name)

  0%|          | 0/6 [00:00<?, ?it/s]

# Outlier handling

In [8]:
def data_load(data_dir, name_list, df_list):
    df_list = []
    for i in tqdm(name_list):
        df_list.append(pd.read_csv(data_dir + i))
    return df_list

In [11]:
data_dir = '../dataset/preprocessing/'
name_list = ['solo_E.csv', 'solo_M.csv', 'duo_E.csv', 'duo_M.csv', 'squad_E.csv', 'squad_M.csv']
df_prep = []

df_prep = data_load(data_dir, name_list, df_prep)

  0%|          | 0/6 [00:00<?, ?it/s]

In [41]:
# df_prep[0]: solo, ERANGEL / df_prep[1]: solo, MIRAMAR
# df_prep[2]: duo, ERANGEL / df_prep[3]: duo, MIRAMAR
# df_prep[4]: squad, ERANGEL / df_prep[5]: squad, MIRAMAR

get_shape(df_prep)

(17105104, 24)
(3096664, 24)
(25881722, 24)
(5208521, 24)
(36231117, 24)
(8886496, 24)


### game_size
* Fewer than 40 teams

In [46]:
df_prep_raw = df_prep.copy()

In [67]:
class CheckingOutlier:
    
    '''
    Class for outlier handling.
    Only data that satisfies the criteria below is used.
    
    game_size : solo  - 80 or more
                duo   - 40 or more
                squad - 30 or more
    dist_ride : 30000 (30 km) or less
    dist_walk : 10000 (10 km) or less
    kills :     30 kills or fewer
    dmg :       3000 damage or less
    kill_dist : 40000 (400 m) or less
    dbno :      duo/squad : 11 or fewer
    survive_time : 1900 or less
    
    '''
    
    def game_size_outlier(self, df):
        # Each DataFrame holds a single party_size, so the first row is enough.
        party_size = df['party_size'].iloc[0]
        if party_size == 1:
            df = df.loc[df['game_size'] >= 80]
        elif party_size == 2:
            df = df.loc[df['game_size'] >= 40]
        else:
            df = df.loc[df['game_size'] >= 30]
        return df
        
    def player_dist_ride(self, df):
        return df.loc[df['player_dist_ride'] <= 30000]
    
    def player_dist_walk(self, df):
        return df.loc[df['player_dist_walk'] <= 10000]
        
    def player_kills(self, df):
        return df.drop(df.loc[df['player_kills'] > 30].index)
    
    def player_dmg(self, df):
        return df.drop(df.loc[df['player_dmg'] > 3000].index)
        
    def kill_dist(self, df):
        # Euclidean distance between killer and victim; NaN for rows without a kill.
        df = df.assign(kill_dist=np.sqrt(
            ((df['killer_position_x'] - df['victim_position_x']) ** 2)
            + ((df['killer_position_y'] - df['victim_position_y']) ** 2)))
        df = df.drop(df.loc[df['kill_dist'] > 40000].index)
        return df
    
    def dbno(self, df):
        # iloc[0] rather than loc[0]: index label 0 may already have been filtered out.
        if df['party_size'].iloc[0] != 1:
            df = df.drop(df.loc[df['player_dbno'] > 11].index)
        return df
    
    def survive_time(self, df):
        return df.drop(df.loc[df['player_survive_time'] > 1900].index)


In [68]:
def check_outlier(df_list):
    new_df_list = []
    checker = CheckingOutlier()
    
    for df in tqdm(df_list):
        # Apply every filter in turn; each method returns the filtered DataFrame.
        filtered_df = checker.game_size_outlier(df)
        filtered_df = checker.player_dist_ride(filtered_df)
        filtered_df = checker.player_dist_walk(filtered_df)
        filtered_df = checker.player_kills(filtered_df)
        filtered_df = checker.player_dmg(filtered_df)
        filtered_df = checker.kill_dist(filtered_df)
        filtered_df = checker.dbno(filtered_df)
        filtered_df = checker.survive_time(filtered_df)
        new_df_list.append(filtered_df)
            
    return new_df_list

In [69]:
df_prep = check_outlier(df_prep)

  0%|          | 0/3 [00:00<?, ?it/s]

In [73]:
df_prep[0]['game_size'].describe()

count   16451160.000
mean          94.117
std            3.847
min           80.000
25%           93.000
50%           95.000
75%           97.000
max          100.000
Name: game_size, dtype: float64

In [None]:
df_E = df_E.loc[df_E['game_size'] >= 40]

In [None]:
df_M = df_M.loc[df_M['game_size'] >= 40]

In [None]:
df_E['team_placement'].value_counts()

In [None]:
df_M['team_placement'].value_counts()

### player_dist_ride, player_dist_walk
* ride: over 30 km
* walk: over 10 km

In [None]:
df_E = df_E.loc[df_E['player_dist_ride'] <= 30000]

In [None]:
df_M = df_M.loc[df_M['player_dist_ride'] <= 30000]

In [None]:
df_E = df_E.loc[df_E['player_dist_walk'] <= 10000]

In [None]:
df_M = df_M.loc[df_M['player_dist_walk'] <= 10000]

In [None]:
df_E['team_placement'].value_counts()

In [None]:
df_M['team_placement'].value_counts()

### player_kills, player_dmg
* kill: more than 30 kills
* dmg: more than 3000 damage

In [None]:
plt.ticklabel_format(style='plain')
sns.scatterplot(data=df_E[['player_dmg', 'player_kills']], x='player_dmg', y='player_kills')

In [None]:
plt.ticklabel_format(style='plain')
sns.scatterplot(data=df_M[['player_dmg', 'player_kills']], x='player_dmg', y='player_kills')

In [None]:
df_E = df_E.drop(df_E.loc[(df_E['player_kills'] > 30 ) | (df_E['player_dmg'] > 3000)].index)

In [None]:
df_M = df_M.drop(df_M.loc[(df_M['player_kills'] > 30 ) | (df_M['player_dmg'] > 3000)].index)

In [None]:
df_E['team_placement'].value_counts()

In [None]:
df_M['team_placement'].value_counts()

### kill_dist
* Over 400 m

In [None]:
df_E['kill_dist'] = np.sqrt((df_E['killer_position_x'] - df_E['victim_position_x'])**2 
                             + (df_E['killer_position_y'] - df_E['victim_position_y'])**2)

In [None]:
df_M['kill_dist'] = np.sqrt((df_M['killer_position_x'] - df_M['victim_position_x'])**2 
                             + (df_M['killer_position_y'] - df_M['victim_position_y'])**2)

In [None]:
plt.ticklabel_format(style='plain')
df_E.loc[df_E['kill_dist'] < 40000, 'kill_dist'].hist(bins=100)

In [None]:
plt.ticklabel_format(style='plain')
df_M.loc[df_M['kill_dist'] < 40000, 'kill_dist'].hist(bins=100)

In [None]:
df_E = df_E.drop(df_E[df_E['kill_dist'] > 40000].index)

In [None]:
df_M = df_M.drop(df_M[df_M['kill_dist'] > 40000].index)
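
The kill distance is the Euclidean distance between the killer and victim coordinates. A minimal sketch with hypothetical coordinates follows. It assumes roughly 100 map units per meter, which is inferred from the 40000 threshold being described as 400 m, and it uses `np.hypot` as an equivalent of the square-root formula above:

```python
import numpy as np
import pandas as pd

# Hypothetical coordinates in map units (assumed ~100 units per meter).
kills = pd.DataFrame({
    "killer_position_x": [0.0, 0.0],
    "killer_position_y": [0.0, 0.0],
    "victim_position_x": [3000.0, 30000.0],
    "victim_position_y": [4000.0, 40000.0],
})

kills["kill_dist"] = np.hypot(
    kills["killer_position_x"] - kills["victim_position_x"],
    kills["killer_position_y"] - kills["victim_position_y"],
)

# Keep everything that is not above the threshold; rows whose distance is NaN
# (players without a kill after the left merge) are kept as well.
kept = kills[~(kills["kill_dist"] > 40000)]
print(kept["kill_dist"].tolist())
```

Under that unit assumption the first kill is about 50 m away and is kept, while the second is about 500 m away and is dropped.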

### player_assists, player_dbno
* assist: used as is
* dbno: more than 11

In [None]:
df_E = df_E.drop(df_E.loc[df_E['player_dbno'] > 11].index)

In [None]:
df_M = df_M.drop(df_M.loc[df_M['player_dbno'] > 11].index)

In [None]:
df_E['team_placement'].value_counts()

In [None]:
df_M['team_placement'].value_counts()

### player_survive_time
* Over 1900 seconds

In [None]:
df_E_1 = df_E.copy()
df_M_1 = df_M.copy()

In [None]:
# df_E = df_E_1.copy()
# df_M = df_M_1.copy()

In [None]:
df_E['player_survive_time'].hist(bins=100)

In [None]:
df_E.loc[df_E['team_placement']==1, 'player_survive_time'].hist(bins=100)

In [None]:
df_E.loc[df_E['player_survive_time'] > 1900, 'team_placement'].value_counts()

In [None]:
df_E = df_E.drop(df_E.loc[(df_E['player_survive_time'] > 1900)].index)

In [None]:
df_M = df_M.drop(df_M.loc[(df_M['player_survive_time'] > 1900)].index)

In [None]:
df_E['team_placement'].value_counts()

In [None]:
df_M['team_placement'].value_counts()

## Checking the analysis dataset

In [None]:
df_E.shape

In [None]:
df_E.isnull().sum()

In [None]:
df_M.shape

In [None]:
df_M.isnull().sum()

## Splitting the dataset

* Split into a dataset with outliers removed and a dataset containing only the outliers

In [None]:
E_in_idx = list(df_E.index)
M_in_idx = list(df_M.index)

In [None]:
df_E_in = df_E.reset_index(drop=True)
df_M_in = df_M.reset_index(drop=True)

In [None]:
E_out_idx = list(set(df_E_raw.index) - set(df_E.index))
M_out_idx = list(set(df_M_raw.index) - set(df_M.index))

In [None]:
df_E_out = df_E_raw.loc[E_out_idx].reset_index(drop=True)
df_M_out = df_M_raw.loc[M_out_idx].reset_index(drop=True)
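
The split relies on the filtered frame keeping the original index labels, so the outliers are the labels that no longer appear after filtering. A small sketch on hypothetical values, using `Index.difference` in place of the set subtraction above:

```python
import pandas as pd

raw = pd.DataFrame({"player_kills": [1, 45, 3, 60]})   # hypothetical values
inliers = raw.loc[raw["player_kills"] <= 30]

# Index labels present in raw but missing from inliers are the outliers.
out_idx = raw.index.difference(inliers.index)
outliers = raw.loc[out_idx].reset_index(drop=True)

# Reset the inlier index only after the outlier labels have been computed.
inliers = inliers.reset_index(drop=True)
```

Resetting the index before computing the difference would break the comparison, because the labels would no longer line up with the raw frame.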

## Exporting to CSV

In [None]:
df_E_out.to_csv('../dataset/duo/duo_E_out.csv', index=False)

In [None]:
df_M_out.to_csv('../dataset/duo/duo_M_out.csv', index=False)

In [None]:
df_E_in.to_csv('../dataset/duo/duo_E_in.csv', index=False)

In [None]:
df_M_in.to_csv('../dataset/duo/duo_M_in.csv', index=False)

## Selecting players with a play_count of 10 or more

In [None]:
df_duo = pd.concat([df_E_in, df_M_in])

In [None]:
df_duo.shape

In [None]:
play_count = df_duo.groupby('player_name')['match_id'].nunique().to_frame()

In [None]:
play_count.loc[play_count['match_id'] >= 10].index

In [None]:
df_duo = df_duo.loc[df_duo['player_name'].isin(play_count[play_count['match_id'] >= 10].index)]

In [None]:
df_duo.shape
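
Since the merged data has one row per kill event, games are counted with `nunique` on match_id rather than by counting rows. A minimal sketch with hypothetical players, using a lowered threshold (`min_games = 2` instead of the notebook's 10) so the toy data has a survivor:

```python
import pandas as pd

# Hypothetical rows: a match can repeat for the same player.
rows = pd.DataFrame({
    "player_name": ["a", "a", "a", "b", "b"],
    "match_id":    ["m1", "m1", "m2", "m1", "m1"],
})

play_count = rows.groupby("player_name")["match_id"].nunique()
min_games = 2  # assumption for this toy example only
frequent = play_count[play_count >= min_games].index
filtered = rows[rows["player_name"].isin(frequent)]
```

Player `b` has two rows but only one distinct match, so it is filtered out even though a plain row count would have kept it.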

In [None]:
df_duo.to_csv('../dataset/duo/duo.csv', index=False)

# Creating derived variables

## Duo

In [None]:
df = pd.read_csv('../dataset/duo/duo.csv')

In [None]:
df.shape

In [None]:
df.info()

### date

* Process date-type data

In [None]:
df['date'] = pd.to_datetime(df['date'])

### Score
* each_game_score: (50 - team_placement) * 1 + player_kills * 2 + player_assists * 2
* total_score: sum(each_game_score) by player_name
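
As a worked example of the two formulas, a first-place finish with 5 kills and 2 assists scores (50 - 1) + 5*2 + 2*2 = 63. The sketch below applies the same arithmetic to a hypothetical table with one row per player and match:

```python
import pandas as pd

# Hypothetical per-match rows for two players.
games = pd.DataFrame({
    "player_name": ["a", "a", "b"],
    "team_placement": [1, 10, 50],
    "player_kills": [5, 1, 0],
    "player_assists": [2, 0, 0],
})

games["each_game_score"] = (
    (50 - games["team_placement"]) * 1
    + games["player_kills"] * 2
    + games["player_assists"] * 2
)
total_score = games.groupby("player_name")["each_game_score"].sum()
print(total_score)
```

Player `b` ends with a total of 0, and a total of 0 turns into `-inf` under a logarithm, so such cases may be worth checking before a log transform.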

In [None]:
df['team_placement'].value_counts()

In [None]:
# Compute each player's score in each game

df['each_game_score'] = (50 - df['team_placement'])*1 + df['player_kills']*2 + df['player_assists']*2
min(df['each_game_score']), max(df['each_game_score'])

In [None]:
# Compute each player's total score

score = df.groupby(['player_name', 'match_id'])['each_game_score'].mean().to_frame()
total_score = score.groupby('player_name')['each_game_score'].sum().to_frame()
total_score.columns = ['total_score']
min(total_score['total_score']), max(total_score['total_score'])

In [None]:
total_score['total_score'].hist(bins=1000)
plt.ticklabel_format(style='plain')

In [None]:
total_score.loc[total_score['total_score'] > 1500, 'total_score'].count()

In [None]:
df['total_score'] = total_score.loc[df['player_name'], 'total_score'].values

In [None]:
df['total_score'] = np.log(df['total_score'])

In [None]:
df.groupby('player_name')['total_score'].mean().hist(bins=1000)

### Tier

In [None]:
def get_tier(score):
    if score < 6:
        return 1 # Bronze
    elif score < 6.75:
        return 2 # Silver
    elif score < 7.5:
        return 3 # Gold
    elif score < 8.25:
        return 4 # Platinum
    elif score < 9:
        return 5 # Diamond
    else: 
        return 6 # Master

In [None]:
df['tier'] = df['total_score'].apply(lambda x: get_tier(x))
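
The same binning can be done in one vectorized call. The sketch below swaps in `np.digitize` as an alternative to `apply`, with the cut points copied from `get_tier` and hypothetical log scores as input:

```python
import numpy as np
import pandas as pd

log_score = pd.Series([5.2, 6.0, 6.74, 7.5, 8.9, 9.0])  # hypothetical values

# Upper bounds of tiers 1..5, as in get_tier; anything >= 9 becomes tier 6.
cut_points = [6, 6.75, 7.5, 8.25, 9]

# np.digitize returns i where cut_points[i-1] <= x < cut_points[i], so adding 1
# reproduces the "score < bound" chain of get_tier.
tier = pd.Series(np.digitize(log_score, cut_points) + 1, index=log_score.index)
print(tier.tolist())
```

A value exactly on a boundary, such as 6.0 or 7.5, falls into the higher tier, which matches the strict `<` comparisons in `get_tier`.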

In [None]:
df.groupby('player_name')['tier'].mean().hist()

In [None]:
df.groupby('tier')['player_name'].nunique()

In [None]:
df.groupby('tier')['player_name'].nunique()/491883*100

### KDA
* (kills + assists)/deaths

#### kills

In [None]:
kill = df.groupby(['player_name', 'match_id'])['player_kills'].mean().to_frame()

In [None]:
kills = kill.groupby('player_name')['player_kills'].sum().to_frame()

In [None]:
kills.head()

In [None]:
kills.isnull().sum()

#### assists

In [None]:
assist = df.groupby(['player_name', 'match_id'])['player_assists'].mean().to_frame()

In [None]:
assists = assist.groupby('player_name')['player_assists'].sum().to_frame()

In [None]:
assists.head()

In [None]:
assists.isnull().sum()

#### deaths

In [None]:
# Number of games finished in 1st place
count_rank1 = df[df['team_placement']==1].groupby('player_name')['match_id'].nunique().to_frame()
count_rank1.columns = ['rank1']

In [None]:
count_rank1.isnull().sum()

In [None]:
# Number of games played

game_count = df.groupby('player_name')['match_id'].nunique().to_frame()
game_count.columns = ['games']

In [None]:
game_count.isnull().sum()

In [None]:
deaths = pd.merge(count_rank1, game_count, how='outer', on='player_name')
deaths.head()

In [None]:
deaths.isnull().sum()

In [None]:
deaths = deaths.fillna(0)

In [None]:
deaths['deaths'] = deaths['games'] - deaths['rank1']
deaths.head(1)

#### KDA

In [None]:
kda = pd.merge(kills, assists, how='outer', on='player_name')
kda = pd.merge(kda, deaths['deaths'], how='outer', on='player_name')
kda.isnull().sum()

In [None]:
kda['kda'] = (kda['player_kills'] + kda['player_assists']) / kda['deaths']
kda.head()

In [None]:
df = pd.merge(df, kda['kda'], how='left', on='player_name')

In [None]:
df['kda'].isnull().sum()
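
The cells above approximate deaths as games played minus first-place finishes, then compute (kills + assists) / deaths per player. A compact sketch of the same idea on hypothetical per-match rows, counting wins with a boolean sum instead of the outer merge and `fillna(0)`:

```python
import pandas as pd

# Hypothetical data: one row per (player, match).
pm = pd.DataFrame({
    "player_name": ["a", "a", "a", "b", "b"],
    "team_placement": [1, 5, 20, 1, 1],
    "player_kills": [4, 2, 0, 3, 1],
    "player_assists": [1, 0, 1, 0, 0],
})

g = pm.groupby("player_name")
games = g.size()
wins = pm["team_placement"].eq(1).groupby(pm["player_name"]).sum()
deaths = games - wins  # a first-place finish is treated as "no death"
kda = (g["player_kills"].sum() + g["player_assists"].sum()) / deaths
print(kda)
```

Player `b` won every game, so `deaths` is 0 and the KDA is infinite. Whether to cap such values, for example by dividing by at least 1, is a judgment call that this sketch leaves open.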

### num_of_match
* Number of games played per player

In [None]:
# Select only the columns needed to check differences between tiers

cols = ['player_name', 'match_id', 'player_kills', 'player_dmg', 'player_assists', 'player_dbno',  'kda',
        'player_dist_walk', 'player_dist_ride', 'kill_dist', 'player_survive_time', 'team_placement', 'tier']

df_player_match = pd.pivot_table(data=df[cols], index=['player_name', 'match_id'], aggfunc='mean')

In [None]:
# Number of games played per player

num_of_match = df_player_match.groupby('player_name')['tier'].value_counts().to_frame()
num_of_match.columns = ['num_of_match']

In [None]:
df_player = df_player_match.groupby('player_name').mean()
df_player = pd.merge(df_player, num_of_match, how='left', on='player_name')

In [None]:
df_player = df_player[['player_kills', 'player_dmg', 'player_assists', 'player_dbno',  'kda', 'player_dist_walk', 
                       'player_dist_ride', 'kill_dist', 'player_survive_time', 'team_placement',
                       'num_of_match','tier']]
df_player.head()

In [None]:
df_player.shape

In [None]:
df_player.info()

In [None]:
df_player.to_csv('tier_diff_duo.csv', index=False)

# Testing differences between tiers

## Duo

In [None]:
df.corr().style.background_gradient(cmap='Blues')

In [None]:
df.groupby('tier').mean()

### player_kills
* The mean number of kills increases with tier

In [None]:
df.groupby('tier')['player_kills'].mean().to_frame()

In [None]:
plt.figure(figsize=(15, 5))
sns.kdeplot(data=df, x='player_kills', hue='tier', multiple='stack')

In [None]:
tier1 = stats.anderson(df.loc[df['tier'] == 1, 'player_kills'], dist='norm')
tier2 = stats.anderson(df.loc[df['tier'] == 2, 'player_kills'], dist='norm')
tier3 = stats.anderson(df.loc[df['tier'] == 3, 'player_kills'], dist='norm')
tier4 = stats.anderson(df.loc[df['tier'] == 4, 'player_kills'], dist='norm')
tier5 = stats.anderson(df.loc[df['tier'] == 5, 'player_kills'], dist='norm')
tier6 = stats.anderson(df.loc[df['tier'] == 6, 'player_kills'], dist='norm')

print('tier1:', tier1[0] < tier1[1][2], '\n' 
      'tier2:', tier2[0] < tier2[1][2], '\n'
      'tier3:', tier3[0] < tier3[1][2], '\n'
      'tier4:', tier4[0] < tier4[1][2], '\n'
      'tier5:', tier5[0] < tier5[1][2], '\n'
      'tier6:', tier6[0] < tier6[1][2], '\n')
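
The comparison above reads the Anderson-Darling result positionally: element 0 is the test statistic and element 1 holds the critical values. For `dist='norm'` the significance levels are 15%, 10%, 5%, 2.5% and 1%, so index 2 is the 5% level, and a statistic below that critical value means normality is not rejected. A minimal sketch on synthetic data, where the exponential sample is an assumption chosen only because it is clearly non-normal:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
sample = rng.exponential(scale=1.0, size=2000)  # synthetic, clearly non-normal

result = stats.anderson(sample, dist="norm")

# Look up the 5% level by value instead of hard-coding index 2.
idx = list(result.significance_level).index(5.0)
looks_normal = result.statistic < result.critical_values[idx]
print(idx, bool(looks_normal))
```

Looking the level up by value keeps the check readable and guards against mixing up the critical values of different groups.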

In [None]:
stats.kruskal(df.loc[df['tier'] == 1, 'player_kills'],
              df.loc[df['tier'] == 2, 'player_kills'],
              df.loc[df['tier'] == 3, 'player_kills'],
              df.loc[df['tier'] == 4, 'player_kills'],
              df.loc[df['tier'] == 5, 'player_kills'],
              df.loc[df['tier'] == 6, 'player_kills'])
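
Because normality is tested rather than assumed, the notebook uses the rank-based Kruskal-Wallis test to compare tiers. The six `df.loc[...]` arguments can also be built with a `groupby`, as in this sketch on a hypothetical three-tier table:

```python
import pandas as pd
from scipy import stats

# Hypothetical data with three tiers of five players each.
toy = pd.DataFrame({
    "tier": [1] * 5 + [2] * 5 + [3] * 5,
    "player_kills": [0, 1, 0, 1, 2, 1, 2, 3, 2, 3, 4, 5, 4, 6, 5],
})

# One array per tier, in ascending tier order.
groups = [g["player_kills"].to_numpy() for _, g in toy.groupby("tier")]
h_stat, p_value = stats.kruskal(*groups)
print(h_stat, p_value)
```

A small p-value only says that at least one tier differs, which is why a pairwise post-hoc test follows.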

In [None]:
sp.posthoc_conover(df, val_col ='player_kills', 
                     group_col ='tier', p_adjust = 'holm').style.background_gradient(cmap='Blues')

### player_dmg

In [None]:
df.groupby('tier')['player_dmg'].mean().to_frame()

In [None]:
plt.figure(figsize=(15,5))
sns.violinplot(data=df, x='tier', y='player_dmg')

In [None]:
plt.figure(figsize=(15, 5))
sns.kdeplot(data=df, x='player_dmg', hue='tier', multiple='stack')

In [None]:
tier1 = stats.anderson(df.loc[df['tier'] == 1, 'player_dmg'], dist='norm')
tier2 = stats.anderson(df.loc[df['tier'] == 2, 'player_dmg'], dist='norm')
tier3 = stats.anderson(df.loc[df['tier'] == 3, 'player_dmg'], dist='norm')
tier4 = stats.anderson(df.loc[df['tier'] == 4, 'player_dmg'], dist='norm')
tier5 = stats.anderson(df.loc[df['tier'] == 5, 'player_dmg'], dist='norm')
tier6 = stats.anderson(df.loc[df['tier'] == 6, 'player_dmg'], dist='norm')

print('tier1:', tier1[0] < tier1[1][2], '\n' 
      'tier2:', tier2[0] < tier2[1][2], '\n'
      'tier3:', tier3[0] < tier3[1][2], '\n'
      'tier4:', tier4[0] < tier4[1][2], '\n'
      'tier5:', tier5[0] < tier5[1][2], '\n'
      'tier6:', tier6[0] < tier6[1][2], '\n')

In [None]:
stats.kruskal(df.loc[df['tier'] == 1, 'player_dmg'],
              df.loc[df['tier'] == 2, 'player_dmg'],
              df.loc[df['tier'] == 3, 'player_dmg'],
              df.loc[df['tier'] == 4, 'player_dmg'],
              df.loc[df['tier'] == 5, 'player_dmg'],
              df.loc[df['tier'] == 6, 'player_dmg'])

In [None]:
sp.posthoc_conover(df, val_col ='player_dmg', 
                     group_col ='tier', p_adjust = 'holm').style.background_gradient(cmap='Blues')

### player_assists
* The mean number of assists increases with tier

In [None]:
df.groupby('tier')['player_assists'].mean().to_frame()

In [None]:
plt.figure(figsize=(15,5))
sns.violinplot(data=df, x='tier', y='player_assists')

In [None]:
plt.figure(figsize=(15, 5))
sns.kdeplot(data=df, x='player_assists', hue='tier', multiple='stack')

In [None]:
tier1 = stats.anderson(df.loc[df['tier'] == 1, 'player_assists'], dist='norm')
tier2 = stats.anderson(df.loc[df['tier'] == 2, 'player_assists'], dist='norm')
tier3 = stats.anderson(df.loc[df['tier'] == 3, 'player_assists'], dist='norm')
tier4 = stats.anderson(df.loc[df['tier'] == 4, 'player_assists'], dist='norm')
tier5 = stats.anderson(df.loc[df['tier'] == 5, 'player_assists'], dist='norm')
tier6 = stats.anderson(df.loc[df['tier'] == 6, 'player_assists'], dist='norm')

print('tier1:', tier1[0] < tier1[1][2], '\n' 
      'tier2:', tier2[0] < tier2[1][2], '\n'
      'tier3:', tier3[0] < tier3[1][2], '\n'
      'tier4:', tier4[0] < tier4[1][2], '\n'
      'tier5:', tier5[0] < tier5[1][2], '\n'
      'tier6:', tier6[0] < tier6[1][2], '\n')

In [None]:
stats.kruskal(df.loc[df['tier'] == 1, 'player_assists'],
              df.loc[df['tier'] == 2, 'player_assists'],
              df.loc[df['tier'] == 3, 'player_assists'],
              df.loc[df['tier'] == 4, 'player_assists'],
              df.loc[df['tier'] == 5, 'player_assists'],
              df.loc[df['tier'] == 6, 'player_assists'])

In [None]:
sp.posthoc_conover(df, val_col ='player_assists', 
                     group_col ='tier', p_adjust = 'holm').style.background_gradient(cmap='Blues')

### player_dbno
* The mean number of DBNOs increases with tier

In [None]:
df.groupby('tier')['player_dbno'].mean().to_frame()

In [None]:
plt.figure(figsize=(15, 5))
sns.kdeplot(data=df, x='player_dbno', hue='tier', multiple='stack')

In [None]:
tier1 = stats.anderson(df.loc[df['tier'] == 1, 'player_dbno'], dist='norm')
tier2 = stats.anderson(df.loc[df['tier'] == 2, 'player_dbno'], dist='norm')
tier3 = stats.anderson(df.loc[df['tier'] == 3, 'player_dbno'], dist='norm')
tier4 = stats.anderson(df.loc[df['tier'] == 4, 'player_dbno'], dist='norm')
tier5 = stats.anderson(df.loc[df['tier'] == 5, 'player_dbno'], dist='norm')
tier6 = stats.anderson(df.loc[df['tier'] == 6, 'player_dbno'], dist='norm')

print('tier1:', tier1[0] < tier1[1][2], '\n' 
      'tier2:', tier2[0] < tier2[1][2], '\n'
      'tier3:', tier3[0] < tier3[1][2], '\n'
      'tier4:', tier4[0] < tier4[1][2], '\n'
      'tier5:', tier5[0] < tier5[1][2], '\n'
      'tier6:', tier6[0] < tier6[1][2], '\n')

In [None]:
stats.kruskal(df.loc[df['tier'] == 1, 'player_dbno'],
              df.loc[df['tier'] == 2, 'player_dbno'],
              df.loc[df['tier'] == 3, 'player_dbno'],
              df.loc[df['tier'] == 4, 'player_dbno'],
              df.loc[df['tier'] == 5, 'player_dbno'],
              df.loc[df['tier'] == 6, 'player_dbno'])

In [None]:
sp.posthoc_conover(df, val_col ='player_dbno', 
                     group_col ='tier', p_adjust = 'holm').style.background_gradient(cmap='Blues')

### kda
* The mean KDA increases with tier

In [None]:
df.groupby('tier')['kda'].mean().to_frame()

In [None]:
plt.figure(figsize=(15,5))
sns.violinplot(data=df, x='tier', y='kda')

In [None]:
plt.figure(figsize=(15, 5))
sns.kdeplot(data=df, x='kda', hue='tier', multiple='stack')

In [None]:
tier1 = stats.anderson(df.loc[df['tier'] == 1, 'kda'], dist='norm')
tier2 = stats.anderson(df.loc[df['tier'] == 2, 'kda'], dist='norm')
tier3 = stats.anderson(df.loc[df['tier'] == 3, 'kda'], dist='norm')
tier4 = stats.anderson(df.loc[df['tier'] == 4, 'kda'], dist='norm')
tier5 = stats.anderson(df.loc[df['tier'] == 5, 'kda'], dist='norm')
tier6 = stats.anderson(df.loc[df['tier'] == 6, 'kda'], dist='norm')

print('tier1:', tier1[0] < tier1[1][2], '\n' 
      'tier2:', tier2[0] < tier2[1][2], '\n'
      'tier3:', tier3[0] < tier3[1][2], '\n'
      'tier4:', tier4[0] < tier4[1][2], '\n'
      'tier5:', tier5[0] < tier5[1][2], '\n'
      'tier6:', tier6[0] < tier6[1][2], '\n')

In [None]:
stats.kruskal(df.loc[df['tier'] == 1, 'kda'],
              df.loc[df['tier'] == 2, 'kda'],
              df.loc[df['tier'] == 3, 'kda'],
              df.loc[df['tier'] == 4, 'kda'],
              df.loc[df['tier'] == 5, 'kda'],
              df.loc[df['tier'] == 6, 'kda'])

In [None]:
sp.posthoc_conover(df, val_col ='kda', 
                     group_col ='tier', p_adjust = 'holm').style.background_gradient(cmap='Blues')

### player_dist_walk
* The mean distance increases from Tier 1 to Tier 3
* The distance decreases from Tier 3 to Tier 6
* There is no difference in means between Tier 2 and Tier 5

In [None]:
df.groupby('tier')['player_dist_walk'].mean().to_frame()

In [None]:
plt.figure(figsize=(15,5))
sns.violinplot(data=df, x='tier', y='player_dist_walk')

In [None]:
plt.figure(figsize=(15, 5))
sns.kdeplot(data=df, x='player_dist_walk', hue='tier', multiple='stack')

In [None]:
tier1 = stats.anderson(df.loc[df['tier'] == 1, 'player_dist_walk'], dist='norm')
tier2 = stats.anderson(df.loc[df['tier'] == 2, 'player_dist_walk'], dist='norm')
tier3 = stats.anderson(df.loc[df['tier'] == 3, 'player_dist_walk'], dist='norm')
tier4 = stats.anderson(df.loc[df['tier'] == 4, 'player_dist_walk'], dist='norm')
tier5 = stats.anderson(df.loc[df['tier'] == 5, 'player_dist_walk'], dist='norm')
tier6 = stats.anderson(df.loc[df['tier'] == 6, 'player_dist_walk'], dist='norm')

print('tier1:', tier1[0] < tier1[1][2], '\n' 
      'tier2:', tier2[0] < tier2[1][2], '\n'
      'tier3:', tier3[0] < tier3[1][2], '\n'
      'tier4:', tier4[0] < tier4[1][2], '\n'
      'tier5:', tier5[0] < tier5[1][2], '\n'
      'tier6:', tier6[0] < tier6[1][2], '\n')

In [None]:
stats.kruskal(df.loc[df['tier'] == 1, 'player_dist_walk'],
              df.loc[df['tier'] == 2, 'player_dist_walk'],
              df.loc[df['tier'] == 3, 'player_dist_walk'],
              df.loc[df['tier'] == 4, 'player_dist_walk'],
              df.loc[df['tier'] == 5, 'player_dist_walk'],
              df.loc[df['tier'] == 6, 'player_dist_walk'])

In [None]:
sp.posthoc_conover(df, val_col ='player_dist_walk', 
                     group_col ='tier', p_adjust = 'holm').style.background_gradient(cmap='Blues')

### player_dist_ride
* Tier 1 clearly covers a shorter distance by vehicle
* From Tier 2 upward the groups do differ, but vehicle distance is not directly proportional to tier

In [None]:
df.groupby('tier')['player_dist_ride'].mean().to_frame()

In [None]:
plt.figure(figsize=(15,5))
sns.violinplot(data=df, x='tier', y='player_dist_ride')

In [None]:
plt.figure(figsize=(15, 5))
sns.kdeplot(data=df, x='player_dist_ride', hue='tier', multiple='stack')

In [None]:
tier1 = stats.anderson(df.loc[df['tier'] == 1, 'player_dist_ride'], dist='norm')
tier2 = stats.anderson(df.loc[df['tier'] == 2, 'player_dist_ride'], dist='norm')
tier3 = stats.anderson(df.loc[df['tier'] == 3, 'player_dist_ride'], dist='norm')
tier4 = stats.anderson(df.loc[df['tier'] == 4, 'player_dist_ride'], dist='norm')
tier5 = stats.anderson(df.loc[df['tier'] == 5, 'player_dist_ride'], dist='norm')
tier6 = stats.anderson(df.loc[df['tier'] == 6, 'player_dist_ride'], dist='norm')

print('tier1:', tier1[0] < tier1[1][2], '\n' 
      'tier2:', tier2[0] < tier2[1][2], '\n'
      'tier3:', tier3[0] < tier3[1][2], '\n'
      'tier4:', tier4[0] < tier4[1][2], '\n'
      'tier5:', tier5[0] < tier5[1][2], '\n'
      'tier6:', tier6[0] < tier6[1][2], '\n')

In [None]:
stats.kruskal(df.loc[df['tier'] == 1, 'player_dist_ride'],
              df.loc[df['tier'] == 2, 'player_dist_ride'],
              df.loc[df['tier'] == 3, 'player_dist_ride'],
              df.loc[df['tier'] == 4, 'player_dist_ride'],
              df.loc[df['tier'] == 5, 'player_dist_ride'],
              df.loc[df['tier'] == 6, 'player_dist_ride'])

In [None]:
sp.posthoc_conover(df, val_col ='player_dist_ride', 
                     group_col ='tier', p_adjust = 'holm').style.background_gradient(cmap='Blues')

### kill_dist
* kill_dist does not grow in proportion to tier, but Tiers 1~2 and Tiers 3~6 appear to form two distinct groups

In [None]:
df.groupby('tier')['kill_dist'].median().to_frame()

In [None]:
plt.figure(figsize=(15,5))
sns.violinplot(data=df, x='tier', y='kill_dist')
plt.xticks(np.arange(6), ['Bronze', 'Silver', 'Gold', 'Platinum', 'Diamond', 'Master'])

In [None]:
plt.figure(figsize=(15, 5))
sns.kdeplot(data=df, x='kill_dist', hue='tier', multiple='stack')

In [None]:
tier1 = stats.anderson(df.loc[df['tier'] == 1, 'kill_dist'], dist='norm')
tier2 = stats.anderson(df.loc[df['tier'] == 2, 'kill_dist'], dist='norm')
tier3 = stats.anderson(df.loc[df['tier'] == 3, 'kill_dist'], dist='norm')
tier4 = stats.anderson(df.loc[df['tier'] == 4, 'kill_dist'], dist='norm')
tier5 = stats.anderson(df.loc[df['tier'] == 5, 'kill_dist'], dist='norm')
tier6 = stats.anderson(df.loc[df['tier'] == 6, 'kill_dist'], dist='norm')

print('tier1:', tier1[0] < tier1[1][2], '\n' 
      'tier2:', tier2[0] < tier2[1][2], '\n'
      'tier3:', tier3[0] < tier3[1][2], '\n'
      'tier4:', tier4[0] < tier4[1][2], '\n'
      'tier5:', tier5[0] < tier5[1][2], '\n'
      'tier6:', tier6[0] < tier6[1][2], '\n')

In [None]:
stats.kruskal(df.loc[df['tier'] == 1, 'kill_dist'].fillna(0),
               df.loc[df['tier'] == 2, 'kill_dist'].fillna(0),
               df.loc[df['tier'] == 3, 'kill_dist'].fillna(0),
               df.loc[df['tier'] == 4, 'kill_dist'].fillna(0),
               df.loc[df['tier'] == 5, 'kill_dist'].fillna(0),
               df.loc[df['tier'] == 6, 'kill_dist'].fillna(0))

In [None]:
sp.posthoc_conover(df, val_col ='kill_dist', 
                     group_col ='tier', p_adjust = 'holm').style.background_gradient(cmap='Blues')
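
In this section only the Kruskal-Wallis call fills missing `kill_dist` values with 0, while the other calls receive the column as is, so the tests may not be looking at the same sample. The sketch below compares the two treatments of missing values on synthetic data. The helper `kruskal_by_tier`, the `demo` frame, and the 20% missing rate are assumptions for illustration; whether a missing `kill_dist` should count as a zero distance depends on what the gap means in the real data.

```python
import numpy as np
import pandas as pd
from scipy import stats

# Synthetic stand-in data (hypothetical): six tiers, about 20% missing distances.
rng = np.random.default_rng(1)
demo = pd.DataFrame({
    "tier": np.repeat(np.arange(1, 7), 100),
    "kill_dist": rng.gamma(shape=2.0, scale=30.0, size=600),
})
demo.loc[rng.random(600) < 0.2, "kill_dist"] = np.nan


def kruskal_by_tier(frame, value_col, fill=None):
    """Kruskal-Wallis across tiers; drop NaN rows, or fill them when `fill` is given."""
    series = frame[value_col] if fill is None else frame[value_col].fillna(fill)
    groups = [g.dropna().to_numpy() for _, g in series.groupby(frame["tier"])]
    return stats.kruskal(*groups)


dropped = kruskal_by_tier(demo, "kill_dist")         # missing rows excluded
filled = kruskal_by_tier(demo, "kill_dist", fill=0)  # missing rows ranked as 0

n_dropped = int(demo["kill_dist"].notna().sum())
n_filled = len(demo)
print("rows used:", n_dropped, "vs", n_filled)
print("dropna p-value:", dropped.pvalue)
print("fillna(0) p-value:", filled.pvalue)
```

Filling with 0 adds a block of tied lowest ranks to every group, so the two p-values can differ. Applying one treatment to every test in the section keeps the results comparable.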


### player_survive_time
* Higher tiers clearly survive longer than 1,500 seconds more often

In [None]:
df.groupby('tier')['player_survive_time'].mean().to_frame()

In [None]:
plt.figure(figsize=(15,5))
sns.violinplot(data=df, x='tier', y='player_survive_time')

In [None]:
plt.figure(figsize=(15, 5))
sns.kdeplot(data=df, x='player_survive_time', hue='tier', multiple='stack')

In [None]:
tier1 = stats.anderson(df.loc[df['tier'] == 1, 'player_survive_time'], dist='norm')
tier2 = stats.anderson(df.loc[df['tier'] == 2, 'player_survive_time'], dist='norm')
tier3 = stats.anderson(df.loc[df['tier'] == 3, 'player_survive_time'], dist='norm')
tier4 = stats.anderson(df.loc[df['tier'] == 4, 'player_survive_time'], dist='norm')
tier5 = stats.anderson(df.loc[df['tier'] == 5, 'player_survive_time'], dist='norm')
tier6 = stats.anderson(df.loc[df['tier'] == 6, 'player_survive_time'], dist='norm')

print('tier1:', tier1[0] < tier1[1][2], '\n' 
      'tier2:', tier2[0] < tier2[1][2], '\n'
      'tier3:', tier3[0] < tier3[1][2], '\n'
      'tier4:', tier4[0] < tier4[1][2], '\n'
      'tier5:', tier5[0] < tier5[1][2], '\n'
      'tier6:', tier6[0] < tier6[1][2], '\n')

In [None]:
stats.kruskal(df.loc[df['tier'] == 1, 'player_survive_time'],
              df.loc[df['tier'] == 2, 'player_survive_time'],
              df.loc[df['tier'] == 3, 'player_survive_time'],
              df.loc[df['tier'] == 4, 'player_survive_time'],
              df.loc[df['tier'] == 5, 'player_survive_time'],
              df.loc[df['tier'] == 6, 'player_survive_time'])

In [None]:
sp.posthoc_conover(df, val_col ='player_survive_time', 
                     group_col ='tier', p_adjust = 'holm').style.background_gradient(cmap='Blues')

### team_placement
* Tier 1 clearly finishes with poor placements more often

In [None]:
df.groupby('tier')['team_placement'].mean().to_frame()

In [None]:
plt.figure(figsize=(15,5))
sns.violinplot(data=df, x='tier', y='team_placement')

In [None]:
tier1 = stats.anderson(df.loc[df['tier'] == 1, 'team_placement'], dist='norm')
tier2 = stats.anderson(df.loc[df['tier'] == 2, 'team_placement'], dist='norm')
tier3 = stats.anderson(df.loc[df['tier'] == 3, 'team_placement'], dist='norm')
tier4 = stats.anderson(df.loc[df['tier'] == 4, 'team_placement'], dist='norm')
tier5 = stats.anderson(df.loc[df['tier'] == 5, 'team_placement'], dist='norm')
tier6 = stats.anderson(df.loc[df['tier'] == 6, 'team_placement'], dist='norm')

print('tier1:', tier1[0] < tier1[1][2], '\n' 
      'tier2:', tier2[0] < tier2[1][2], '\n'
      'tier3:', tier3[0] < tier3[1][2], '\n'
      'tier4:', tier4[0] < tier4[1][2], '\n'
      'tier5:', tier5[0] < tier5[1][2], '\n'
      'tier6:', tier6[0] < tier6[1][2], '\n')

In [None]:
stats.kruskal(df.loc[df['tier'] == 1, 'team_placement'],
              df.loc[df['tier'] == 2, 'team_placement'],
              df.loc[df['tier'] == 3, 'team_placement'],
              df.loc[df['tier'] == 4, 'team_placement'],
              df.loc[df['tier'] == 5, 'team_placement'],
              df.loc[df['tier'] == 6, 'team_placement'])

In [None]:
sp.posthoc_conover(df, val_col ='team_placement', 
                     group_col ='tier', p_adjust = 'holm').style.background_gradient(cmap='Blues')

In [None]:
df_tier = pd.read_csv('../dataset/duo/duo_tier.csv')

In [None]:
del df_count

In [None]:
df_tier.columns

### num_of_match

In [None]:
df.groupby('tier')['num_of_match'].mean().to_frame()

In [None]:
plt.figure(figsize=(15,5))
sns.violinplot(data=df, x='tier', y='num_of_match')

In [None]:
tier1 = stats.anderson(df.loc[df['tier'] == 1, 'num_of_match'], dist='norm')
tier2 = stats.anderson(df.loc[df['tier'] == 2, 'num_of_match'], dist='norm')
tier3 = stats.anderson(df.loc[df['tier'] == 3, 'num_of_match'], dist='norm')
tier4 = stats.anderson(df.loc[df['tier'] == 4, 'num_of_match'], dist='norm')
tier5 = stats.anderson(df.loc[df['tier'] == 5, 'num_of_match'], dist='norm')
tier6 = stats.anderson(df.loc[df['tier'] == 6, 'num_of_match'], dist='norm')

print('tier1:', tier1[0] < tier1[1][2], '\n' 
      'tier2:', tier2[0] < tier2[1][2], '\n'
      'tier3:', tier3[0] < tier3[1][2], '\n'
      'tier4:', tier4[0] < tier4[1][2], '\n'
      'tier5:', tier5[0] < tier5[1][2], '\n'
      'tier6:', tier6[0] < tier6[1][2], '\n')

In [None]:
stats.kruskal(df.loc[df['tier'] == 1, 'num_of_match'],
              df.loc[df['tier'] == 2, 'num_of_match'],
              df.loc[df['tier'] == 3, 'num_of_match'],
              df.loc[df['tier'] == 4, 'num_of_match'],
              df.loc[df['tier'] == 5, 'num_of_match'],
              df.loc[df['tier'] == 6, 'num_of_match'])

In [None]:
sp.posthoc_conover(df, val_col ='num_of_match', 
                     group_col ='tier', p_adjust = 'holm').style.background_gradient(cmap='Blues')

### killed_by

In [None]:
plt.figure(figsize=(5, 15))
sns.countplot(data=df_tier[df_tier['tier']==1], y='killed_by', 
              order=df_tier.loc[df_tier['tier']==1, 'killed_by'].value_counts().index)

In [None]:
plt.figure(figsize=(5, 15))
sns.countplot(data=df_tier[df_tier['tier']==2], y='killed_by',
              order=df_tier.loc[df_tier['tier']==2, 'killed_by'].value_counts().index)

In [None]:
plt.figure(figsize=(5, 15))
sns.countplot(data=df_tier[df_tier['tier']==3], y='killed_by',
              order=df_tier.loc[df_tier['tier']==3, 'killed_by'].value_counts().index)

In [None]:
plt.figure(figsize=(5, 15))
sns.countplot(data=df_tier[df_tier['tier']==4], y='killed_by',
              order=df_tier.loc[df_tier['tier']==4, 'killed_by'].value_counts().index)

In [None]:
plt.figure(figsize=(5, 15))
sns.countplot(data=df_tier[df_tier['tier']==5], y='killed_by',
              order=df_tier.loc[df_tier['tier']==5, 'killed_by'].value_counts().index)

In [None]:
plt.figure(figsize=(5, 15))
sns.countplot(data=df_tier[df_tier['tier']==6], y='killed_by',
              order=df_tier.loc[df_tier['tier']==6, 'killed_by'].value_counts().index)
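
The six count plots above are easier to compare when the most frequent `killed_by` values per tier sit side by side in one table. The following is a minimal sketch using pandas only. The helper `top_weapons` and the small `demo_tier` frame (including its weapon labels) are assumptions for illustration, not values taken from `duo_tier.csv`.

```python
import pandas as pd

# Tiny stand-in for df_tier (hypothetical rows).
demo_tier = pd.DataFrame({
    "tier": [1, 1, 1, 1, 2, 2, 2, 2, 2],
    "killed_by": ["M416", "M416", "AKM", "Bluezone",
                  "Kar98k", "Kar98k", "Kar98k", "M416", "AKM"],
})


def top_weapons(frame, n=2):
    """Return the n most frequent killed_by values for each tier."""
    counts = (frame.groupby("tier")["killed_by"]
                   .value_counts()       # sorted by frequency within each tier
                   .rename("count")      # avoid a name clash on reset_index
                   .reset_index())
    return counts.groupby("tier").head(n)


top1 = top_weapons(demo_tier, n=1)
print(top1)
```

With the real data, `top_weapons(df_tier, n=10)` would give one table covering all six tiers, assuming `df_tier` has the `tier` and `killed_by` columns used in the plots.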