# COMP0138 Final Year Project


## Setup and Data Loading
Python imports and loading data from CSV.

In [11]:
import networkx as nx
import csv
import pandas as pd
from google.colab import drive
import matplotlib.pyplot as plt
import io
import numpy as np
import json
import seaborn as sns
drive.mount('/content/drive')

Mounted at /content/drive


In [210]:
data_folder = '/content/drive/MyDrive/COMP0138/data/'
results_folder = '/content/drive/MyDrive/COMP0138/results/'
# Zero-padded month numbers '01' .. '12'
months = [f'{x:02d}' for x in range(1, 13)]
# labels = ['id', 'player', 'start_elo', ' end_elo', ' max_elo', ' min_elo', ' avg_elo', 'w_below_neg_300', 'w_neg_200_to_299', 'w_neg_100_to_199', 'w_neg_50_to_99', 'w_neg_1_to_49', 'w_zero', 'w_1_to_49', 'w_50_to_99', 'w_100_to_199', 'w_200_to_299', 'w_above_pos_300', 'l_below_neg_300', 'l_neg_200_to_299', 'l_neg_100_to_199', 'l_neg_50_to_99', 'l_neg_1_to_49', 'l_zero', 'l_1_to_49', 'l_50_to_99', 'l_100_to_199', 'l_200_to_299', 'l_above_pos_300', 'd_below_neg_300', 'd_neg_200_to_299', 'd_neg_100_to_199', 'd_neg_50_to_99', 'd_neg_1_to_49', 'd_zero', 'd_1_to_49', 'd_50_to_99', 'd_100_to_199', 'd_200_to_299', 'd_above_pos_300', 'w_Grandmaster', ' w_Master', ' w_ClassA', ' w_ClassB', ' w_ClassC', ' w_ClassD', ' w_Novice', ' l_Grandmaster', ' l_Master', ' l_ClassA', ' l_ClassB', ' l_ClassC', ' l_ClassD', ' l_Novice', ' d_Grandmaster', ' d_Master', ' d_ClassA', ' d_ClassB', ' d_ClassC', ' d_ClassD', ' d_Novice', 'win_count', ' loss_count', ' draw_count', 'win_loss_ratio', ' max_win_streak', ' max_lose_streak', 'max_elo_diff', ' min_elo_diff', ' avg_elo_diff', ' win_higher_count', ' win_lower_count', ' win_even_count', ' loss_higher_count', ' loss_lower_count', ' loss_even_count', ' draw_higher_count', ' draw_lower_count', ' draw_even_count', ' opponent_count', 'game_count', ' avg_daily_games', ' avg_weekly_games', ' avg_days_between_sessions', ' full_day_count']

df_list = []
for m in months:
  file_name = data_folder + 'nodelist_2014-{}.csv'.format(m)
  df = pd.read_csv(file_name)
  df_list.append(df)

# df_list

In [152]:
nodes_file = data_folder + '2014_nodes.txt'
nodes_dict = dict()

with open(nodes_file) as json_file:
  nodes_dict = json.load(json_file)


## First Stage: Progress

### Input data format
The data for part 1 consists of CSV files, each containing the 2313 nodes (players) and the metrics recorded for each of them. The metrics are grouped into three perspectives: progress, strategy and activity. Each file covers a specified period (by default, 1 month).

#### **Perspective 1: Progress**
The following are the metrics collected for measuring **progress**. 

**Elo Rating**
>`start_elo`: the player's Elo rating at the start of the period \\
`end_elo`: the player's Elo rating at the end of the period \\
`avg_elo`: the player's average Elo rating \\
`min_elo`: the player's lowest Elo rating \\
`max_elo`: the player's highest Elo rating

**Win/Loss/Draw Statistics** \\
*Overview*
>`win_count`: the number of games won \\
`loss_count`: the number of games lost \\
`draw_count`: the number of games drawn \\
`win_loss_ratio`: `win_count`/`loss_count`, defaults to the `win_count` if `loss_count` is zero \\
`max_win_streak`: the maximum number of consecutive wins \\
`max_lose_streak`: the maximum number of consecutive losses
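
As a minimal sketch of the `win_loss_ratio` rule described above (the function name is hypothetical, not taken from the data-collection code), the fallback for a player with no losses could look like this:

```python
def win_loss_ratio(win_count: int, loss_count: int) -> float:
    """Return win_count / loss_count, or win_count itself when there are no losses."""
    if loss_count == 0:
        return float(win_count)
    return win_count / loss_count


print(win_loss_ratio(31, 2))  # 15.5
print(win_loss_ratio(72, 0))  # 72.0
```

Both results match the `win_loss_ratio` values shown for the sample player later in this notebook.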

**Stats Breakdown** \\
*Relative Metrics* \\
The win, loss and draw counts of each player are also broken down by the rating difference $x$ between the opponent and the player. They are collected in the following ranges:


*   $x \le -300$, `_below_neg_300`
*   $ -300 < x \le -200 $, `_neg_200_to_299`
*   $ -200 < x \le -100 $, `_neg_100_to_199`
*   $ -100 < x \le -50 $, `_neg_50_to_99`
*   $ -50 < x < 0 $, `_neg_1_to_49`
*   $ x = 0 $, `_zero`
*   $ 0 < x < 50 $, `_1_to_49`
*   $ 50 \le x < 100 $, `_50_to_99`
*   $ 100 \le x < 200 $, `_100_to_199`
*   $ 200 \le x < 300 $, `_200_to_299`
*   $ x \ge 300 $, `_above_pos_300`

The headers are in the form `{w/l/d}_{range}`, where the `{range}` is given by the monospace text beside the respective range in the above list. e.g. number of wins against opponents rated 100 to 199 above the player's own is given by `w_100_to_199`. This data is available for wins, losses and draws.
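
The mapping from a rating difference to its header can be sketched as follows. This is an illustration only: the function names are hypothetical, $x$ is assumed to be an integer equal to the opponent's rating minus the player's, and the suffix for the last range follows the CSV column name `w_above_pos_300`.

```python
def relative_bucket(x: int) -> str:
    """Map an Elo difference x (opponent minus player) to its range suffix."""
    if x <= -300:
        return "below_neg_300"
    if x <= -200:
        return "neg_200_to_299"
    if x <= -100:
        return "neg_100_to_199"
    if x <= -50:
        return "neg_50_to_99"
    if x < 0:
        return "neg_1_to_49"
    if x == 0:
        return "zero"
    if x < 50:
        return "1_to_49"
    if x < 100:
        return "50_to_99"
    if x < 200:
        return "100_to_199"
    if x < 300:
        return "200_to_299"
    return "above_pos_300"


def relative_header(result: str, x: int) -> str:
    """Build a column header; result is 'w', 'l' or 'd'."""
    return f"{result}_{relative_bucket(x)}"


print(relative_header("w", 150))  # w_100_to_199
```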

*Absolute Metrics* \\
The following metrics measure the win, loss and draw counts of each player against 7 classes of players by rating (here $x$ is the opponent's rating). These are:

* Grandmaster $(x \ge 2300)$, `_Grandmaster`
* Master $(2000 \le x < 2300)$, `_Master`
* Class A $(1800 \le x < 2000)$, `_ClassA`
* Class B $(1600 \le x < 1800)$, `_ClassB`
* Class C $(1400 \le x < 1600)$, `_ClassC`
* Class D $(1200 \le x < 1400)$, `_ClassD`
* Novice $(x < 1200)$, `_Novice`

As above, the headers are in the form `{w/l/d}_{class}`. The numbers in the brackets indicate the absolute rating of the opponent at the time of the match. e.g. `w_Grandmaster` indicates the number of wins against Grandmasters (players with an Elo rating at or above 2300 at the time of the match).
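
A small sketch of the class lookup, assuming the boundaries implied by the worked example above (Grandmaster at or above 2300, with each lower class starting at its stated lower bound). The function names are hypothetical.

```python
def rating_class(opponent_elo: float) -> str:
    """Return the class name for an opponent's absolute Elo rating."""
    if opponent_elo >= 2300:
        return "Grandmaster"
    if opponent_elo >= 2000:
        return "Master"
    if opponent_elo >= 1800:
        return "ClassA"
    if opponent_elo >= 1600:
        return "ClassB"
    if opponent_elo >= 1400:
        return "ClassC"
    if opponent_elo >= 1200:
        return "ClassD"
    return "Novice"


def absolute_header(result: str, opponent_elo: float) -> str:
    """Build a column header; result is 'w', 'l' or 'd'."""
    return f"{result}_{rating_class(opponent_elo)}"


print(absolute_header("w", 2300))  # w_Grandmaster
```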

#### **Perspective 2: Strategy**

The following metrics are collected to measure **strategy**. All differences are in the form $Elo(opponent) - Elo(player)$, meaning a positive figure indicates the opponent's Elo rating was higher than the player's. Additionally, the Elo differences are taken using the player's Elo rating for that particular game (as it changes after every game).

**Elo Difference**
>`max_elo_diff`: the largest Elo rating difference in the period, i.e. the difference between the strongest opponent played and the player \\
`min_elo_diff`: the smallest (most negative) Elo rating difference in the period, i.e. the difference between the weakest opponent played and the player \\
`avg_elo_diff`: the average Elo rating difference between the opponents played and the player
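
To illustrate the sign convention, here is a minimal sketch that derives the three values from a list of games. The input format, a list of `(player_elo, opponent_elo)` pairs taken at the time of each game, and the function name are assumptions made for this example.

```python
def elo_diff_stats(games):
    """Return (max, min, average) of Elo(opponent) - Elo(player) over all games."""
    diffs = [opponent_elo - player_elo for player_elo, opponent_elo in games]
    return max(diffs), min(diffs), sum(diffs) / len(diffs)


# Hypothetical games: the player's rating changes slightly after each game.
games = [(2100, 1700), (2110, 900), (2120, 1500)]
max_diff, min_diff, avg_diff = elo_diff_stats(games)
print(max_diff, min_diff)  # -400 -1210
```

A player who only meets lower-rated opponents, as in this toy input, has a negative `max_elo_diff`.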
 
**Opponent Count**
>`opponent_count`: number of unique opponents played (may be different from `game_count`), equivalent to node degree

**Win/Loss/Draw Statistics** \\
These statistics are sums of the breakdown statistics obtained for Perspective 1.
>`win_higher_count`: the number of wins against higher-rated opponents \\
`win_lower_count`: the number of wins against lower-rated opponents \\
`win_even_count`: the number of wins against opponents of the same rating

Same for `loss` and `draw`, i.e. `loss_higher_count`, `draw_even_count`, etc.
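
The sums could be sketched as below, assuming a player's monthly record is available as a dictionary keyed by the Perspective 1 column names. The helper name and the dictionary input are hypothetical.

```python
LOWER = ["below_neg_300", "neg_200_to_299", "neg_100_to_199", "neg_50_to_99", "neg_1_to_49"]
HIGHER = ["1_to_49", "50_to_99", "100_to_199", "200_to_299", "above_pos_300"]


def band_counts(row: dict, result: str) -> dict:
    """Sum the relative breakdown columns for one result type ('w', 'l' or 'd')."""
    def get(suffix: str) -> int:
        # Missing columns are treated as zero in this sketch.
        return row.get(f"{result}_{suffix}", 0)

    return {
        "higher": sum(get(s) for s in HIGHER),
        "lower": sum(get(s) for s in LOWER),
        "even": get("zero"),
    }


row = {"w_below_neg_300": 31, "l_below_neg_300": 2}
print(band_counts(row, "w"))  # {'higher': 0, 'lower': 31, 'even': 0}
```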

It is likely that, instead of these individual win/loss/draw statistics, the sums of the higher, lower and even counts (i.e. the total number of games in each band) will be used to measure strategy. The same totals can be obtained for the number of games played against each class of player (Grandmaster, Master, Class A, etc.).



#### **Perspective 3: Activity**
The following metrics measure **activity**.

>`avg_daily_games`: average number of games played daily (sum of games/days in the month) \\
`avg_weekly_games`: `avg_daily_games` * 7 \\
`full_day_count`: a string of numbers separated by full stops giving the number of games played on each day of the month (e.g. 0.1.10.5 ... indicates that the player played 0 games on day 1, 1 game on day 2, 10 on day 3, etc.) \\
`avg_days_between_sessions`: number of inactive days / number of inactive periods
 

For `avg_days_between_sessions`, an inactive period is a run of one or more consecutive days on which the player has not played a game, and inactive days are the days on which the player has played zero games. For example:

>0.0.1.1.0.0.1.0.0.1

The above has 3 inactive periods and 6 inactive days, meaning `avg_days_between_sessions` is 2.
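
The calculation can be sketched directly from a `full_day_count` string. The function name is hypothetical, and returning 0 when the player has no inactive days is an assumption of this sketch rather than something the definition above specifies.

```python
def avg_days_between_sessions(full_day_count: str) -> float:
    """Number of inactive days divided by the number of inactive periods."""
    games_per_day = [int(g) for g in full_day_count.split(".")]
    inactive_days = 0
    inactive_periods = 0
    previous_inactive = False
    for games in games_per_day:
        inactive = games == 0
        if inactive:
            inactive_days += 1
            # A new inactive period starts when the previous day was active.
            if not previous_inactive:
                inactive_periods += 1
        previous_inactive = inactive
    if inactive_periods == 0:
        return 0.0  # Assumption: no inactive days gives an average of zero.
    return inactive_days / inactive_periods


print(avg_days_between_sessions("0.0.1.1.0.0.1.0.0.1"))  # 2.0
```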


### Part 1: User Classifications
Classify users based on three perspectives: progress, strategies and activity.

**Progress** is measured using data on a player's own Elo rating and their win/loss/draw statistics.

**Strategy** is measured using data on the player's Elo rating with respect to their opponent's (difference between ratings) and their win/loss/draw statistics with respect to this difference.

**Activity** is measured using data on the player's number of games played and frequency of games. 

#### **Progress Definition**

To properly measure progress, it must first be given a clear definition. We will classify players as having made progress, maintained their level or regressed. They will be given a basic classification (progressed, maintained, regressed) based on a scale from 

1. **Elo rating change**

The Elo rating of a player is a measure of their relative strength or performance against other players. Each player's start, end, minimum, maximum and average Elo ratings are recorded for each period. The variance of their Elo ratings can also be easily calculated. 


2. Win/Loss/Draw percentages with respect to the Elo difference between the player and each opponent


3. Win/Loss/Draw percentages with respect to the Elo rating of the opponents



In [211]:
# Full list of labels = 
# ['id', 'player', 'start_elo', 'end_elo', 'max_elo', 'min_elo', 'avg_elo', 
#  'w_below_neg_300', 'w_neg_200_to_299', 'w_neg_100_to_199', 'w_neg_50_to_99', 'w_neg_1_to_49', 'w_zero', 'w_1_to_49', 'w_50_to_99', 'w_100_to_199', 'w_200_to_299', 'w_above_pos_300', 
#  'l_below_neg_300', 'l_neg_200_to_299', 'l_neg_100_to_199', 'l_neg_50_to_99', 'l_neg_1_to_49', 'l_zero', 'l_1_to_49', 'l_50_to_99', 'l_100_to_199', 'l_200_to_299', 'l_above_pos_300', 
#  'd_below_neg_300', 'd_neg_200_to_299', 'd_neg_100_to_199', 'd_neg_50_to_99', 'd_neg_1_to_49', 'd_zero', 'd_1_to_49', 'd_50_to_99', 'd_100_to_199', 'd_200_to_299', 'd_above_pos_300', 
#  'w_Grandmaster', 'w_Master', 'w_ClassA', 'w_ClassB', 'w_ClassC', 'w_ClassD', 'w_Novice', 
#  'l_Grandmaster', 'l_Master', 'l_ClassA', 'l_ClassB', 'l_ClassC', 'l_ClassD', 'l_Novice', 
#  'd_Grandmaster', 'd_Master', 'd_ClassA', 'd_ClassB', 'd_ClassC', 'd_ClassD', 'd_Novice', 
#  'win_count', ' loss_count', ' draw_count', 'win_loss_ratio', 
#  'max_win_streak', 'max_lose_streak', 'max_elo_diff', 'min_elo_diff', 'avg_elo_diff', 
#  'win_higher_count', 'win_lower_count', 'win_even_count',
#  'loss_higher_count', 'loss_lower_count', 'loss_even_count', 
#  'draw_higher_count', 'draw_lower_count', 'draw_even_count', 
#  'opponent_count', 'game_count', ' avg_daily_games', ' avg_weekly_games', ' avg_days_between_sessions', ' full_day_count']

players = [x for x in nodes_dict]
cols = ['player', 'avg_elo', 'min_elo','max_elo']
cols += ['game_count', 'win_count', 'loss_count','draw_count']
cols += ['win_higher_count', 'win_lower_count', 'win_even_count', 'loss_higher_count', 'loss_lower_count', 'loss_even_count', 'draw_higher_count', 'draw_lower_count', 'draw_even_count']
cols += ['w_Grandmaster', 'w_Master', 'w_ClassA', 'w_ClassB', 'w_ClassC', 'w_ClassD', 'w_Novice',  'l_Grandmaster', 'l_Master', 'l_ClassA', 'l_ClassB', 'l_ClassC', 'l_ClassD', 'l_Novice',  'd_Grandmaster', 'd_Master', 'd_ClassA', 'd_ClassB', 'd_ClassC', 'd_ClassD', 'd_Novice']

# Keep only the selected columns from each monthly frame and stack them into one frame.
# pd.concat is used because DataFrame.append was removed in pandas 2.0.
cut_df = pd.concat([df[cols] for df in df_list], ignore_index=True)

cut_df = cut_df.rename_axis('id').sort_values(by=['player','id'])

In [214]:
# Stack all monthly frames, keeping every column.
# pd.concat is used because DataFrame.append was removed in pandas 2.0.
cut_df = pd.concat(df_list, ignore_index=True)

cut_df = cut_df.rename_axis('id')

player_df = cut_df.loc[cut_df['player'] == 'mfhgxe0']
player_df

id (index),id,player,start_elo,end_elo,max_elo,min_elo,avg_elo,w_below_neg_300,w_neg_200_to_299,w_neg_100_to_199,w_neg_50_to_99,w_neg_1_to_49,w_zero,w_1_to_49,w_50_to_99,w_100_to_199,w_200_to_299,w_above_pos_300,l_below_neg_300,l_neg_200_to_299,l_neg_100_to_199,l_neg_50_to_99,l_neg_1_to_49,l_zero,l_1_to_49,l_50_to_99,l_100_to_199,l_200_to_299,l_above_pos_300,d_below_neg_300,d_neg_200_to_299,d_neg_100_to_199,d_neg_50_to_99,d_neg_1_to_49,d_zero,d_1_to_49,d_50_to_99,d_100_to_199,d_200_to_299,d_above_pos_300,...,w_ClassD,w_Novice,l_Grandmaster,l_Master,l_ClassA,l_ClassB,l_ClassC,l_ClassD,l_Novice,d_Grandmaster,d_Master,d_ClassA,d_ClassB,d_ClassC,d_ClassD,d_Novice,win_count,loss_count,draw_count,win_loss_ratio,max_win_streak,max_lose_streak,max_elo_diff,min_elo_diff,avg_elo_diff,win_higher_count,win_lower_count,win_even_count,loss_higher_count,loss_lower_count,loss_even_count,draw_higher_count,draw_lower_count,draw_even_count,opponent_count,game_count,avg_daily_games,avg_weekly_games,avg_days_between_sessions,full_day_count
252,253,mfhgxe0,2154,2057,2163,2027,2103.515152,31,0,0,0,0,0,0,0,0,0,0,2,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...,22,5,0,0,0,0,0,2,0,0,0,0,0,0,0,0,31,2,0,15.5,17,2,-373,-1203,-804.848485,0,31,0,0,2,0,0,0,0,25,33,1.03125,7.21875,2.0,0.0.0.1.1.1.0.2.0.1.0.1.0.2.1.0.0.0.1.4.0.0.0....
2565,253,mfhgxe0,2060,2138,2140,2060,2100.180556,72,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...,47,16,0,0,0,0,0,0,0,0,0,0,0,0,0,0,72,0,0,72.0,72,0,-546,-1086,-822.0,0,72,0,0,0,0,0,0,0,60,72,2.482759,17.37931,4.0,0.0.0.0.0.0.0.13.1.3.1.3.2.1.3.6.5.9.1.2.2.0.5...
4878,253,mfhgxe0,2141,2210,2210,2104,2166.546099,140,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...,81,25,0,0,0,0,0,1,0,0,0,0,0,0,0,0,140,1,0,140.0,74,1,-446,-1215,-851.326241,0,140,0,0,1,0,0,0,0,104,141,4.40625,30.84375,1.5,0.8.1.1.1.6.4.7.5.1.8.1.0.0.1.8.5.2.8.7.3.4.9....
7191,253,mfhgxe0,2211,2203,2277,2129,2220.827338,137,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,...,90,16,0,0,0,0,1,0,0,0,0,0,0,0,1,0,137,1,1,137.0,91,1,-403,-1129,-890.410072,0,137,0,0,1,0,0,1,0,103,139,4.483871,31.387097,0.0,2.5.4.4.10.8.2.10.3.5.7.12.2.2.1.3.5.3.4.6.7.3...
9504,253,mfhgxe0,2203,2173,2279,2109,2209.176796,179,0,0,0,0,0,0,0,0,0,0,2,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...,123,21,0,0,0,0,0,2,0,0,0,0,0,0,0,0,179,2,0,89.5,115,1,-583,-1220,-888.762431,0,179,0,0,2,0,0,0,0,121,181,5.65625,39.59375,1.0,0.3.3.6.12.2.2.4.3.2.9.3.1.5.7.7.12.1.6.3.1.4....
11817,253,mfhgxe0,2175,1980,2212,1961,2079.661538,188,0,0,0,0,0,0,0,0,0,0,6,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,...,126,19,0,0,0,0,3,3,0,0,0,0,0,1,0,0,188,6,1,31.333333,53,1,-409,-1065,-751.882051,0,188,0,0,6,0,0,1,0,154,195,6.290323,44.032258,1.0,0.4.8.6.2.3.4.1.12.1.2.3.4.2.16.5.2.23.11.18.1...
14130,253,mfhgxe0,1981,2087,2180,1981,2093.757282,204,0,0,0,0,0,0,0,0,0,0,2,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...,158,18,0,0,0,0,0,2,0,0,0,0,0,0,0,0,204,2,0,102.0,157,1,-492,-1160,-776.529126,0,204,0,0,2,0,0,0,0,147,206,6.4375,45.0625,1.0,1.9.7.3.6.12.5.4.6.3.10.3.11.13.10.4.5.1.2.15....
16443,253,mfhgxe0,2088,1966,2139,1929,2031.125,140,0,0,0,0,0,0,0,0,0,0,4,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...,134,4,0,0,0,0,0,4,0,0,0,0,0,0,0,0,140,4,0,35.0,61,2,-542,-924,-732.722222,0,140,0,0,4,0,0,0,0,100,144,4.5,31.5,1.5,0.2.11.3.3.5.6.2.2.4.10.0.2.1.2.4.0.0.0.6.1.2....
18756,253,mfhgxe0,1971,2004,2014,1919,1978.938356,142,0,0,0,0,0,0,0,0,0,0,2,0,0,0,0,0,0,0,0,0,0,2,0,0,0,0,0,0,0,0,0,0,...,139,1,0,0,0,0,0,2,0,0,0,0,0,0,2,0,142,2,2,71.0,46,1,-505,-784,-669.59589,0,142,0,0,2,0,0,2,0,87,146,4.709677,32.967742,2.0,0.7.6.2.5.4.4.10.9.8.3.3.2.4.3.1.5.6.4.3.16.8....
21069,253,mfhgxe0,2006,2058,2058,1910,1994.04,123,0,0,0,0,0,0,0,0,0,0,2,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...,122,0,0,0,0,0,0,2,0,0,0,0,0,0,0,0,123,2,0,61.5,82,1,-513,-834,-692.12,0,123,0,0,2,0,0,0,0,86,125,3.90625,27.34375,4.0,0.0.0.0.0.16.4.8.5.4.4.6.13.6.5.5.5.3.6.4.4.0....


In [106]:
# Rating 

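# ES = end minus start, EY = end minus year average, SE = start minus end.
# A flat list gives ordinary column labels; a nested list would create a one-level MultiIndex.
new_cols = ['player','ES_avg_rating', 'EY_avg_rating', 'ES_max_rating','SE_min_rating']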
new_rows = []

for player in players:
  temp_df = cut_df.loc[cut_df['player'] == player]
  # Average of the monthly averages over all loaded months
  year_avg = temp_df['avg_elo'].sum()/len(df_list)
  
  start_row = temp_df.iloc[0]
  end_row = temp_df.iloc[len(df_list) - 1]

  end_start_avg_diff = end_row['avg_elo'] - start_row['avg_elo']
  end_year_avg_diff = end_row['avg_elo'] - year_avg

  end_start_max_rating_diff = end_row['max_elo'] - start_row['max_elo']
  start_end_min_rating_diff = start_row['min_elo'] - end_row['min_elo']
  
  player_row = [player, end_start_avg_diff, end_year_avg_diff, end_start_max_rating_diff, start_end_min_rating_diff]
  new_rows.append(player_row)

rating_df = pd.DataFrame(columns=new_cols, data=new_rows)
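
As an alternative to the per-player loop, the same rating differences could be computed with a pandas `groupby`. The sketch below runs on a small made-up frame (`toy_df`, not project data) and assumes each player's rows are in chronological order. It uses the mean of the available monthly averages, which equals the loop's result only when every player has one row per month.

```python
import pandas as pd

# Hypothetical two-player, three-month stand-in for cut_df.
toy_df = pd.DataFrame({
    "player":  ["a", "b", "a", "b", "a", "b"],
    "avg_elo": [1500.0, 1800.0, 1520.0, 1790.0, 1560.0, 1750.0],
    "max_elo": [1510, 1820, 1540, 1800, 1600, 1760],
    "min_elo": [1490, 1780, 1500, 1770, 1530, 1720],
})

grouped = toy_df.groupby("player", sort=False)
first = grouped.first()              # first month of each player
last = grouped.last()                # last month of each player
year_avg = grouped["avg_elo"].mean() # average of the monthly averages

rating_sketch = pd.DataFrame({
    "ES_avg_rating": last["avg_elo"] - first["avg_elo"],
    "EY_avg_rating": last["avg_elo"] - year_avg,
    "ES_max_rating": last["max_elo"] - first["max_elo"],
    "SE_min_rating": first["min_elo"] - last["min_elo"],
}).reset_index()

print(rating_sketch)
```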


In [107]:
# Basic W/L/D 

# A flat list gives ordinary column labels; a nested list would create a one-level MultiIndex.
new_cols = ['player','ES_win', 'SE_loss', 'SE_draw' , 'EY_win', 'EY_loss', 'EY_draw']
new_rows = []

for player in players:
  temp_df = cut_df.loc[cut_df['player'] == player]
  
  start_row = temp_df.iloc[0]
  end_row = temp_df.iloc[len(df_list) - 1]
  
  year_games = temp_df['game_count'].sum()
  year_win_pc = temp_df['win_count'].sum()/year_games
  year_loss_pc = temp_df['loss_count'].sum()/year_games
  year_draw_pc = temp_df['draw_count'].sum()/year_games

  win_pc_diff = end_row['win_count'] / end_row['game_count'] - start_row['win_count'] / start_row['game_count']
  loss_pc_diff = start_row['loss_count'] / start_row['game_count'] - end_row['loss_count'] / end_row['game_count']
  draw_pc_diff = start_row['draw_count'] / start_row['game_count'] - end_row['draw_count'] / end_row['game_count']

  end_year_win_pc_diff = end_row['win_count'] / end_row['game_count'] - year_win_pc
  end_year_loss_pc_diff = end_row['loss_count'] / end_row['game_count'] - year_loss_pc
  end_year_draw_pc_diff = end_row['draw_count'] / end_row['game_count'] - year_draw_pc

  player_row = [player, win_pc_diff*100, loss_pc_diff*100, draw_pc_diff*100, end_year_win_pc_diff*100, end_year_loss_pc_diff*100, end_year_draw_pc_diff*100]
  new_rows.append(player_row)

basic_wld_df = pd.DataFrame(columns=new_cols, data=new_rows)

In [154]:
# Relative W/L/D

# A flat list gives ordinary column labels; a nested list would create a one-level MultiIndex.
new_cols = ['player','ES_win_higher','SE_loss_higher','SE_draw_higher','ES_win_lower','SE_loss_lower','SE_draw_lower','EY_win_higher', 'EY_loss_higher','EY_draw_higher', 'EY_win_lower', 'EY_loss_lower', 'EY_draw_lower']
new_rows = []

for player in players:
  temp_df = cut_df.loc[cut_df['player'] == player]
  
  start_row = temp_df.iloc[0]
  end_row = temp_df.iloc[len(df_list) - 1]
  
  higher_count = temp_df['win_higher_count'].sum() + temp_df['loss_higher_count'].sum() + temp_df['draw_higher_count'].sum()
  higher_count += temp_df['win_even_count'].sum() + temp_df['loss_even_count'].sum() + temp_df['draw_even_count'].sum()
  lower_count = temp_df['win_lower_count'].sum() + temp_df['loss_lower_count'].sum() + temp_df['draw_lower_count'].sum()

  if(higher_count == 0):
    higher_count = 1
  if(lower_count == 0):
    lower_count = 1

  # 12 month stats
  year_win_higher_pc = (temp_df['win_higher_count'].sum() + temp_df['win_even_count'].sum()) / higher_count
  year_win_lower_pc = temp_df['win_lower_count'].sum() / lower_count

  year_loss_higher_pc = (temp_df['loss_higher_count'].sum() + temp_df['loss_even_count'].sum()) / higher_count
  year_loss_lower_pc = temp_df['loss_lower_count'].sum() / lower_count

  year_draw_higher_pc = (temp_df['draw_higher_count'].sum() + temp_df['draw_even_count'].sum()) / higher_count
  year_draw_lower_pc = temp_df['draw_lower_count'].sum() / lower_count

  # 1st month stats
  start_higher_count = start_row['win_higher_count'] + start_row['loss_higher_count'] + start_row['draw_higher_count']
  start_higher_count += start_row['win_even_count'] + start_row['loss_even_count'] + start_row['draw_even_count']
  start_lower_count = start_row['win_lower_count'] + start_row['loss_lower_count'] + start_row['draw_lower_count']

  # 12th month stats
  end_higher_count = end_row['win_higher_count'] + end_row['loss_higher_count'] + end_row['draw_higher_count']
  end_higher_count += end_row['win_even_count'] + end_row['loss_even_count'] + end_row['draw_even_count']
  end_lower_count = end_row['win_lower_count'] + end_row['loss_lower_count'] + end_row['draw_lower_count']

  if(start_higher_count == 0):
    start_higher_count = 1
  if(start_lower_count == 0):
    start_lower_count = 1
  if(end_higher_count == 0):
    end_higher_count = 1
  if(end_lower_count == 0):
    end_lower_count = 1

  end_start_win_higher_pc_diff = (end_row['win_higher_count'] + end_row['win_even_count']) / end_higher_count - (start_row['win_higher_count'] + start_row['win_even_count']) / start_higher_count
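  # Even-rated games are included in the loss and draw numerators as well, so that they
  # match the denominators and the win and end-year calculations.
  start_end_loss_higher_pc_diff = (start_row['loss_higher_count'] + start_row['loss_even_count']) / start_higher_count - (end_row['loss_higher_count'] + end_row['loss_even_count']) / end_higher_count
  start_end_draw_higher_pc_diff = (start_row['draw_higher_count'] + start_row['draw_even_count']) / start_higher_count - (end_row['draw_higher_count'] + end_row['draw_even_count']) / end_higher_count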

  end_start_win_lower_pc_diff = end_row['win_lower_count'] / end_lower_count - start_row['win_lower_count'] / start_lower_count
  start_end_loss_lower_pc_diff = start_row['loss_lower_count'] / start_lower_count - end_row['loss_lower_count'] / end_lower_count
  start_end_draw_lower_pc_diff = start_row['draw_lower_count'] / start_lower_count - end_row['draw_lower_count'] / end_lower_count
 
  # End-Year Stats
  end_year_win_higher_pc_diff = (end_row['win_higher_count'] + end_row['win_even_count']) / end_higher_count - year_win_higher_pc
  end_year_loss_higher_pc_diff = (end_row['loss_higher_count'] + end_row['loss_even_count'])/ end_higher_count - year_loss_higher_pc
  end_year_draw_higher_pc_diff = (end_row['draw_higher_count'] + end_row['draw_even_count'])/ end_higher_count - year_draw_higher_pc
 
  end_year_win_lower_pc_diff = end_row['win_lower_count'] / end_lower_count - year_win_lower_pc
  end_year_loss_lower_pc_diff = end_row['loss_lower_count'] / end_lower_count - year_loss_lower_pc
  end_year_draw_lower_pc_diff = end_row['draw_lower_count'] / end_lower_count - year_draw_lower_pc

  player_row = [end_start_win_higher_pc_diff, start_end_loss_higher_pc_diff, start_end_draw_higher_pc_diff, end_start_win_lower_pc_diff, start_end_loss_lower_pc_diff, start_end_draw_lower_pc_diff,   end_year_win_higher_pc_diff ,  end_year_loss_higher_pc_diff,  end_year_draw_higher_pc_diff,  end_year_win_lower_pc_diff,  end_year_loss_lower_pc_diff,  end_year_draw_lower_pc_diff]
  player_row = [player] + [x*100 for x in player_row]
  new_rows.append(player_row)

relative_wld_df = pd.DataFrame(columns=new_cols, data=new_rows)
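
The repeated `if count == 0: count = 1` guards could be factored into one helper. This is only a sketch with a hypothetical name. It relies on the numerator being part of the denominator's total, so a zero denominator always comes with a zero numerator and the result is 0.

```python
def safe_ratio(numerator, denominator):
    """Return numerator / denominator, treating a zero denominator as 1."""
    return numerator / (denominator if denominator != 0 else 1)


print(safe_ratio(3, 4))  # 0.75
print(safe_ratio(0, 0))  # 0.0
```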

In [155]:
# Absolute W/L/D
#  'w_Grandmaster', 'w_Master', 'w_ClassA', 'w_ClassB', 'w_ClassC', 'w_ClassD', 'w_Novice', 
#  'l_Grandmaster', 'l_Master', 'l_ClassA', 'l_ClassB', 'l_ClassC', 'l_ClassD', 'l_Novice', 
#  'd_Grandmaster', 'd_Master', 'd_ClassA', 'd_ClassB', 'd_ClassC', 'd_ClassD', 'd_Novice'

# A flat list gives ordinary column labels; a nested list would create a one-level MultiIndex.
new_cols = ['player','ES_win_Master','ES_win_ClassA','ES_win_ClassB','ES_win_ClassC','ES_win_ClassD','ES_win_Novice','SE_loss_Master','SE_loss_ClassA','SE_loss_ClassB','SE_loss_ClassC','SE_loss_ClassD','SE_loss_Novice','SE_draw_Master','SE_draw_ClassA','SE_draw_ClassB','SE_draw_ClassC','SE_draw_ClassD','SE_draw_Novice','EY_win_Master','EY_win_ClassA','EY_win_ClassB','EY_win_ClassC','EY_win_ClassD','EY_win_Novice','EY_loss_Master','EY_loss_ClassA','EY_loss_ClassB','EY_loss_ClassC','EY_loss_ClassD','EY_loss_Novice','EY_draw_Master','EY_draw_ClassA','EY_draw_ClassB','EY_draw_ClassC','EY_draw_ClassD','EY_draw_Novice']
new_rows = []

for player in players:
  temp_df = cut_df.loc[cut_df['player'] == player]
  
  start_row = temp_df.iloc[0]
  end_row = temp_df.iloc[len(df_list) - 1]
  
  # Number of GMs and Masters played
  master_count = temp_df['w_Grandmaster'].sum() + temp_df['l_Grandmaster'].sum() + temp_df['d_Grandmaster'].sum()
  master_count += temp_df['w_Master'].sum() + temp_df['l_Master'].sum() + temp_df['d_Master'].sum()
  
  # Number of Class A played
  classA_count = temp_df['w_ClassA'].sum() + temp_df['l_ClassA'].sum() + temp_df['d_ClassA'].sum()

  # Number of Class B played
  classB_count = temp_df['w_ClassB'].sum() + temp_df['l_ClassB'].sum() + temp_df['d_ClassB'].sum()

  # Number of Class C played
  classC_count = temp_df['w_ClassC'].sum() + temp_df['l_ClassC'].sum() + temp_df['d_ClassC'].sum()

  # Number of Class D played
  classD_count = temp_df['w_ClassD'].sum() + temp_df['l_ClassD'].sum() + temp_df['d_ClassD'].sum()

  # Number of Novices played
  novice_count = temp_df['w_Novice'].sum() + temp_df['l_Novice'].sum() + temp_df['d_Novice'].sum()

  # Avoid division by zero
  if(master_count == 0):
    master_count = 1
  if(classA_count == 0):
    classA_count = 1
  if(classB_count == 0):
    classB_count = 1
  if(classC_count == 0):
    classC_count = 1
  if(classD_count == 0):
    classD_count = 1
  if(novice_count == 0):
    novice_count = 1

  # 12 month stats
  # GM and Master
  year_win_Master_pc = (temp_df['w_Master'].sum() + temp_df['w_Grandmaster'].sum()) / master_count
  year_loss_Master_pc = (temp_df['l_Master'].sum() + temp_df['l_Grandmaster'].sum()) / master_count
  year_draw_Master_pc = (temp_df['d_Master'].sum() + temp_df['d_Grandmaster'].sum()) / master_count

  # Class A
  year_win_ClassA_pc = temp_df['w_ClassA'].sum() / classA_count
  year_loss_ClassA_pc = temp_df['l_ClassA'].sum() / classA_count
  year_draw_ClassA_pc = temp_df['d_ClassA'].sum() / classA_count

  # Class B
  year_win_ClassB_pc = temp_df['w_ClassB'].sum() / classB_count
  year_loss_ClassB_pc = temp_df['l_ClassB'].sum() / classB_count
  year_draw_ClassB_pc = temp_df['d_ClassB'].sum() / classB_count

  # Class C
  year_win_ClassC_pc = temp_df['w_ClassC'].sum() / classC_count
  year_loss_ClassC_pc = temp_df['l_ClassC'].sum() / classC_count
  year_draw_ClassC_pc = temp_df['d_ClassC'].sum() / classC_count

  # Class D
  year_win_ClassD_pc = temp_df['w_ClassD'].sum() / classD_count
  year_loss_ClassD_pc = temp_df['l_ClassD'].sum() / classD_count
  year_draw_ClassD_pc = temp_df['d_ClassD'].sum() / classD_count

  # Novice
  year_win_Novice_pc = temp_df['w_Novice'].sum() / novice_count
  year_loss_Novice_pc = temp_df['l_Novice'].sum() / novice_count
  year_draw_Novice_pc = temp_df['d_Novice'].sum() / novice_count

  # 1st month stats

  # Number of GMs and Masters played
  start_Master_count = start_row['w_Grandmaster'] + start_row['l_Grandmaster'] + start_row['d_Grandmaster']
  start_Master_count += start_row['w_Master'] + start_row['l_Master'] + start_row['d_Master']
  
  # Number of Class A played
  start_classA_count = start_row['w_ClassA'] + start_row['l_ClassA'] + start_row['d_ClassA']

  # Number of Class B played
  start_classB_count = start_row['w_ClassB'] + start_row['l_ClassB'] + start_row['d_ClassB']

  # Number of Class C played
  start_classC_count = start_row['w_ClassC'] + start_row['l_ClassC'] + start_row['d_ClassC']

  # Number of Class D played
  start_classD_count = start_row['w_ClassD'] + start_row['l_ClassD'] + start_row['d_ClassD']

  # Number of Novices played
  start_novice_count = start_row['w_Novice'] + start_row['l_Novice'] + start_row['d_Novice']

  # Avoid division by zero
  if(start_Master_count == 0):
    start_Master_count = 1
  if(start_classA_count == 0):
    start_classA_count = 1
  if(start_classB_count == 0):
    start_classB_count = 1
  if(start_classC_count == 0):
    start_classC_count = 1
  if(start_classD_count == 0):
    start_classD_count = 1
  if(start_novice_count == 0):
    start_novice_count = 1

  # GM and Master
  start_win_Master_pc = (start_row['w_Master'] + start_row['w_Grandmaster']) / start_Master_count
  start_loss_Master_pc = (start_row['l_Master'] + start_row['l_Grandmaster']) / start_Master_count
  start_draw_Master_pc = (start_row['d_Master'] + start_row['d_Grandmaster']) / start_Master_count

  # Class A
  start_win_ClassA_pc = start_row['w_ClassA'] / start_classA_count
  start_loss_ClassA_pc = start_row['l_ClassA'] / start_classA_count
  start_draw_ClassA_pc = start_row['d_ClassA'] / start_classA_count

  # Class B
  start_win_ClassB_pc = start_row['w_ClassB'] / start_classB_count
  start_loss_ClassB_pc = start_row['l_ClassB'] / start_classB_count
  start_draw_ClassB_pc = start_row['d_ClassB'] / start_classB_count

  # Class C
  start_win_ClassC_pc = start_row['w_ClassC'] / start_classC_count
  start_loss_ClassC_pc = start_row['l_ClassC'] / start_classC_count
  start_draw_ClassC_pc = start_row['d_ClassC'] / start_classC_count

  # Class D
  start_win_ClassD_pc = start_row['w_ClassD'] / start_classD_count
  start_loss_ClassD_pc = start_row['l_ClassD'] / start_classD_count
  start_draw_ClassD_pc = start_row['d_ClassD'] / start_classD_count

  # Novice
  start_win_Novice_pc = start_row['w_Novice'] / start_novice_count
  start_loss_Novice_pc = start_row['l_Novice'] / start_novice_count
  start_draw_Novice_pc = start_row['d_Novice'] / start_novice_count


  # 12th month stats
  # Number of GMs and Masters played
  end_Master_count = end_row['w_Grandmaster'] + end_row['l_Grandmaster'] + end_row['d_Grandmaster']
  end_Master_count += end_row['w_Master'] + end_row['l_Master'] + end_row['d_Master']
  
  # Number of Class A played
  end_classA_count = end_row['w_ClassA'] + end_row['l_ClassA'] + end_row['d_ClassA']

  # Number of Class B played
  end_classB_count = end_row['w_ClassB'] + end_row['l_ClassB'] + end_row['d_ClassB']

  # Number of Class C played
  end_classC_count = end_row['w_ClassC'] + end_row['l_ClassC'] + end_row['d_ClassC']

  # Number of Class D played
  end_classD_count = end_row['w_ClassD'] + end_row['l_ClassD'] + end_row['d_ClassD']

  # Number of Novices played
  end_novice_count = end_row['w_Novice'] + end_row['l_Novice'] + end_row['d_Novice']

  # Avoid division by zero
  if(end_Master_count == 0):
    end_Master_count = 1
  if(end_classA_count == 0):
    end_classA_count = 1
  if(end_classB_count == 0):
    end_classB_count = 1
  if(end_classC_count == 0):
    end_classC_count = 1
  if(end_classD_count == 0):
    end_classD_count = 1
  if(end_novice_count == 0):
    end_novice_count = 1

  # GM and Master
  end_win_Master_pc = (end_row['w_Master'] + end_row['w_Grandmaster']) / end_Master_count
  end_loss_Master_pc = (end_row['l_Master'] + end_row['l_Grandmaster']) / end_Master_count
  end_draw_Master_pc = (end_row['d_Master'] + end_row['d_Grandmaster']) / end_Master_count

  # Class A
  end_win_ClassA_pc = end_row['w_ClassA'] / end_classA_count
  end_loss_ClassA_pc = end_row['l_ClassA'] / end_classA_count
  end_draw_ClassA_pc = end_row['d_ClassA'] / end_classA_count

  # Class B
  end_win_ClassB_pc = end_row['w_ClassB'] / end_classB_count
  end_loss_ClassB_pc = end_row['l_ClassB'] / end_classB_count
  end_draw_ClassB_pc = end_row['d_ClassB'] / end_classB_count

  # Class C
  end_win_ClassC_pc = end_row['w_ClassC'] / end_classC_count
  end_loss_ClassC_pc = end_row['l_ClassC'] / end_classC_count
  end_draw_ClassC_pc = end_row['d_ClassC'] / end_classC_count

  # Class D
  end_win_ClassD_pc = end_row['w_ClassD'] / end_classD_count
  end_loss_ClassD_pc = end_row['l_ClassD'] / end_classD_count
  end_draw_ClassD_pc = end_row['d_ClassD'] / end_classD_count

  # Novice
  end_win_Novice_pc = end_row['w_Novice'] / end_novice_count
  end_loss_Novice_pc = end_row['l_Novice'] / end_novice_count
  end_draw_Novice_pc = end_row['d_Novice'] / end_novice_count

  # 12th month vs. 1st month (wins: end - start; losses and draws: start - end)
  end_start_win_Master_pc_diff = end_win_Master_pc - start_win_Master_pc
  end_start_win_ClassA_pc_diff = end_win_ClassA_pc - start_win_ClassA_pc
  end_start_win_ClassB_pc_diff = end_win_ClassB_pc - start_win_ClassB_pc
  end_start_win_ClassC_pc_diff = end_win_ClassC_pc - start_win_ClassC_pc
  end_start_win_ClassD_pc_diff = end_win_ClassD_pc - start_win_ClassD_pc
  end_start_win_Novice_pc_diff = end_win_Novice_pc - start_win_Novice_pc

  start_end_loss_Master_pc_diff = start_loss_Master_pc - end_loss_Master_pc
  start_end_loss_ClassA_pc_diff = start_loss_ClassA_pc - end_loss_ClassA_pc
  start_end_loss_ClassB_pc_diff = start_loss_ClassB_pc - end_loss_ClassB_pc
  start_end_loss_ClassC_pc_diff = start_loss_ClassC_pc - end_loss_ClassC_pc
  start_end_loss_ClassD_pc_diff = start_loss_ClassD_pc - end_loss_ClassD_pc
  start_end_loss_Novice_pc_diff = start_loss_Novice_pc - end_loss_Novice_pc

  start_end_draw_Master_pc_diff = start_draw_Master_pc - end_draw_Master_pc
  start_end_draw_ClassA_pc_diff = start_draw_ClassA_pc - end_draw_ClassA_pc
  start_end_draw_ClassB_pc_diff = start_draw_ClassB_pc - end_draw_ClassB_pc
  start_end_draw_ClassC_pc_diff = start_draw_ClassC_pc - end_draw_ClassC_pc
  start_end_draw_ClassD_pc_diff = start_draw_ClassD_pc - end_draw_ClassD_pc
  start_end_draw_Novice_pc_diff = start_draw_Novice_pc - end_draw_Novice_pc

  # 12th month vs. full-year value (end - year)
  end_year_win_Master_pc_diff = end_win_Master_pc - year_win_Master_pc
  end_year_win_ClassA_pc_diff = end_win_ClassA_pc - year_win_ClassA_pc
  end_year_win_ClassB_pc_diff = end_win_ClassB_pc - year_win_ClassB_pc
  end_year_win_ClassC_pc_diff = end_win_ClassC_pc - year_win_ClassC_pc
  end_year_win_ClassD_pc_diff = end_win_ClassD_pc - year_win_ClassD_pc
  end_year_win_Novice_pc_diff = end_win_Novice_pc - year_win_Novice_pc

  end_year_loss_Master_pc_diff = end_loss_Master_pc - year_loss_Master_pc
  end_year_loss_ClassA_pc_diff = end_loss_ClassA_pc - year_loss_ClassA_pc
  end_year_loss_ClassB_pc_diff = end_loss_ClassB_pc - year_loss_ClassB_pc
  end_year_loss_ClassC_pc_diff = end_loss_ClassC_pc - year_loss_ClassC_pc
  end_year_loss_ClassD_pc_diff = end_loss_ClassD_pc - year_loss_ClassD_pc
  end_year_loss_Novice_pc_diff = end_loss_Novice_pc - year_loss_Novice_pc

  end_year_draw_Master_pc_diff = end_draw_Master_pc - year_draw_Master_pc
  end_year_draw_ClassA_pc_diff = end_draw_ClassA_pc - year_draw_ClassA_pc
  end_year_draw_ClassB_pc_diff = end_draw_ClassB_pc - year_draw_ClassB_pc
  end_year_draw_ClassC_pc_diff = end_draw_ClassC_pc - year_draw_ClassC_pc
  end_year_draw_ClassD_pc_diff = end_draw_ClassD_pc - year_draw_ClassD_pc
  end_year_draw_Novice_pc_diff = end_draw_Novice_pc - year_draw_Novice_pc
  
  player_row = [end_start_win_Master_pc_diff, end_start_win_ClassA_pc_diff, end_start_win_ClassB_pc_diff, end_start_win_ClassC_pc_diff, end_start_win_ClassD_pc_diff, end_start_win_Novice_pc_diff, start_end_loss_Master_pc_diff, start_end_loss_ClassA_pc_diff, start_end_loss_ClassB_pc_diff, start_end_loss_ClassC_pc_diff, start_end_loss_ClassD_pc_diff, start_end_loss_Novice_pc_diff, start_end_draw_Master_pc_diff, start_end_draw_ClassA_pc_diff, start_end_draw_ClassB_pc_diff, start_end_draw_ClassC_pc_diff, start_end_draw_ClassD_pc_diff, start_end_draw_Novice_pc_diff, end_year_win_Master_pc_diff, end_year_win_ClassA_pc_diff, end_year_win_ClassB_pc_diff, end_year_win_ClassC_pc_diff, end_year_win_ClassD_pc_diff, end_year_win_Novice_pc_diff, end_year_loss_Master_pc_diff, end_year_loss_ClassA_pc_diff, end_year_loss_ClassB_pc_diff, end_year_loss_ClassC_pc_diff, end_year_loss_ClassD_pc_diff, end_year_loss_Novice_pc_diff, end_year_draw_Master_pc_diff, end_year_draw_ClassA_pc_diff, end_year_draw_ClassB_pc_diff, end_year_draw_ClassC_pc_diff, end_year_draw_ClassD_pc_diff, end_year_draw_Novice_pc_diff]
  player_row = [player] + [x*100 for x in player_row]
  new_rows.append(player_row)

absolute_wld_df = pd.DataFrame(columns=new_cols, data=new_rows)

In [47]:
# rating_df
# basic_wld_df
# relative_wld_df
# absolute_wld_df

#### **Strategy Definition**

**Option 1:** 

Categorise players by whether each opponent is rated higher or lower than they are, ignoring the size of the rating difference.

*Strategy 1: Play up*
>  100% = Play opponents rated higher only \\
  0%   = Play opponents rated lower only

*Strategy 2: Play down*
>100% = Play opponents rated lower only \\
  0%   = Play opponents rated higher only

*Strategy 3: Play randomly*
>100% = Equal numbers of higher- and lower-rated opponents played \\
  0%   = Play opponents rated higher/lower only
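
The three Option 1 scores can be sketched for a single month as follows. This is a minimal illustration, not the notebook's own cell: the function name `strategy_scores` and its arguments are hypothetical, and even-rated games are grouped with higher-rated ones on the assumption that this matches the strategy cell further down.

```python
def strategy_scores(higher, lower, even=0):
    """Return (S1, S2, S3) as fractions for one month of games.

    Assumption: even-rated games count as "higher", as in the
    notebook's strategy cell.
    """
    total = higher + lower + even
    if total == 0:
        # No games played, so no strategy can be measured.
        return 0.0, 0.0, 0.0
    play_up = (higher + even) / total            # Strategy 1
    play_down = lower / total                    # Strategy 2
    play_random = 1 - abs(play_up - play_down)   # Strategy 3
    return play_up, play_down, play_random


# 30 games against higher-rated and 10 against lower-rated opponents.
print(strategy_scores(higher=30, lower=10))  # (0.75, 0.25, 0.5)
```

A player who only plays up scores (1, 0, 0), and a perfectly balanced player scores (0.5, 0.5, 1).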


**Option 2:** 

Use the opponent's relative rating as a weight:

>$\text{Strength} = \frac{\sum \left(\text{Elo}(\text{Opponent}) - \text{Elo}(\text{Player})\right)}{\text{Total Games}}$
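
As a rough sketch of Option 2, the strength measure is the mean rating difference over a player's games. The function name `opponent_strength` and the per-game rating lists are assumptions made for illustration; the notebook's monthly node lists store aggregated counts rather than per-game ratings.

```python
def opponent_strength(player_elos, opponent_elos):
    """Mean of (opponent Elo - player Elo) over all games."""
    if len(player_elos) != len(opponent_elos):
        raise ValueError("need one player and one opponent rating per game")
    if not player_elos:
        return 0.0  # no games played
    diffs = [opp - own for own, opp in zip(player_elos, opponent_elos)]
    return sum(diffs) / len(diffs)


# Three games: opponents are +100, -50 and +10 relative to the player.
print(opponent_strength([1500, 1510, 1505], [1600, 1460, 1515]))  # 20.0
```

A positive value means the player tends to play up, a negative value means the player tends to play down.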




In [26]:
# Strategy

# Full list of labels = 
# ['id', 'player', 'start_elo', 'end_elo', 'max_elo', 'min_elo', 'avg_elo', 
#  'w_below_neg_300', 'w_neg_200_to_299', 'w_neg_100_to_199', 'w_neg_50_to_99', 'w_neg_1_to_49', 'w_zero', 'w_1_to_49', 'w_50_to_99', 'w_100_to_199', 'w_200_to_299', 'w_above_pos_300', 
#  'l_below_neg_300', 'l_neg_200_to_299', 'l_neg_100_to_199', 'l_neg_50_to_99', 'l_neg_1_to_49', 'l_zero', 'l_1_to_49', 'l_50_to_99', 'l_100_to_199', 'l_200_to_299', 'l_above_pos_300', 
#  'd_below_neg_300', 'd_neg_200_to_299', 'd_neg_100_to_199', 'd_neg_50_to_99', 'd_neg_1_to_49', 'd_zero', 'd_1_to_49', 'd_50_to_99', 'd_100_to_199', 'd_200_to_299', 'd_above_pos_300', 
#  'w_Grandmaster', 'w_Master', 'w_ClassA', 'w_ClassB', 'w_ClassC', 'w_ClassD', 'w_Novice', 
#  'l_Grandmaster', 'l_Master', 'l_ClassA', 'l_ClassB', 'l_ClassC', 'l_ClassD', 'l_Novice', 
#  'd_Grandmaster', 'd_Master', 'd_ClassA', 'd_ClassB', 'd_ClassC', 'd_ClassD', 'd_Novice', 
#  'win_count', ' loss_count', ' draw_count', 'win_loss_ratio', 
#  'max_win_streak', 'max_lose_streak', 'max_elo_diff', 'min_elo_diff', 'avg_elo_diff', 
#  'win_higher_count', 'win_lower_count', 'win_even_count',
#  'loss_higher_count', 'loss_lower_count', 'loss_even_count', 
#  'draw_higher_count', 'draw_lower_count', 'draw_even_count', 
#  'opponent_count', 'game_count', ' avg_daily_games', ' avg_weekly_games', ' avg_days_between_sessions', ' full_day_count']

players = [x for x in nodes_dict]
cols = ['player']
cols += ['win_higher_count', 'win_lower_count', 'win_even_count', 'loss_higher_count', 'loss_lower_count', 'loss_even_count', 'draw_higher_count', 'draw_lower_count', 'draw_even_count', 'game_count','opponent_count']

# Stack the monthly frames, keeping only the columns needed here.
# pd.concat replaces DataFrame.append, which was removed in pandas 2.0.
cut_df = pd.concat([df[cols] for df in df_list], ignore_index=True)

cut_df = cut_df.rename_axis('id').sort_values(by=['player','id'])

In [110]:
new_cols = [['player', 'S1', 'S2', 'S3', 'S1_start', 'S2_start', 'S3_start', 'S1_end', 'S2_end', 'S3_end', 'S1_SD', 'S2_SD', 'S3_SD']]
# new_cols = [['player', 'S1', 'S2', 'S3', 'S1_start', 'S2_start', 'S3_start', 'S1_end', 'S2_end', 'S3_end', 'S1_list', 'S2_list', 'S3_list']]
new_rows = []

for player in players:
  temp_df = cut_df.loc[cut_df['player'] == player]

  strategy_1 = []
  strategy_2 = []
  strategy_3 = []
  
  for r in range(0, len(df_list)):
    row = temp_df.iloc[r]
    game_count = row['game_count']
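    # Even-rated games are grouped with higher-rated ones
    higher_count = row['win_higher_count'] + row['loss_higher_count'] + row['draw_higher_count']
    higher_count += row['win_even_count'] + row['loss_even_count'] + row['draw_even_count']
    lower_count = row['win_lower_count'] + row['loss_lower_count'] + row['draw_lower_count']

    # Monthly share of games played up (strat_1) and down (strat_2);
    # strat_3 is 1 when the two shares are equal and 0 when one of them is 100%
    strat_1 = higher_count / game_count
    strat_2 = lower_count / game_count
    strat_3 = 1 - abs(strat_1 - strat_2)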
    
    strategy_1.append(strat_1)
    strategy_2.append(strat_2)
    strategy_3.append(strat_3)

  strategy_1 = list(np.around(np.array(strategy_1),2))
  strategy_2 = list(np.around(np.array(strategy_2),2))
  strategy_3 = list(np.around(np.array(strategy_3),2))


  S1 = sum(strategy_1) / len(strategy_1)
  S2 = sum(strategy_2) / len(strategy_2)
  S3 = sum(strategy_3) / len(strategy_3)

  # Sample standard deviation (n - 1 denominator) of each strategy across months
  S1_sd = (sum([(s - S1)**2 for s in strategy_1])/(len(strategy_1) - 1))**0.5
  S2_sd = (sum([(s - S2)**2 for s in strategy_2])/(len(strategy_2) - 1))**0.5
  S3_sd = (sum([(s - S3)**2 for s in strategy_3])/(len(strategy_3) - 1))**0.5

  S1_start = strategy_1[0]
  S2_start = strategy_2[0]
  S3_start = strategy_3[0]

  S1_end = strategy_1[-1]
  S2_end = strategy_2[-1]
  S3_end = strategy_3[-1]

  # S1_list = '.'.join([str(int(i*100)) for i in strategy_1])
  # S2_list = '.'.join([str(int(i*100)) for i in strategy_2])
  # S3_list = '.'.join([str(int(i*100)) for i in strategy_3])

  player_row = [S1, S2, S3, S1_start, S2_start, S3_start, S1_end, S2_end, S3_end, S1_sd, S2_sd, S3_sd]
  # player_row = [S1, S2, S3, S1_start, S2_start, S3_start, S1_end, S2_end, S3_end, S1_list, S2_list, S3_list]
  player_row = [player] + [x*100 for x in player_row]
  new_rows.append(player_row)

# strategy_df = pd.DataFrame(columns=new_cols, data=new_rows)
strategy_df_small = pd.DataFrame(columns=new_cols, data=new_rows)

#### **Activity Definition**

This measures a player's activity from month to month.
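
The activity metrics compare the final month with the first month (`ES_` prefix) and with the yearly average (`EY_` prefix). A minimal sketch of those two differences for one metric, assuming a plain list of monthly values; the function name `activity_deltas` is hypothetical:

```python
def activity_deltas(monthly_values):
    """Return (ES, EY) for a list of monthly values.

    ES: last month minus first month.
    EY: last month minus the mean over all months.
    """
    if not monthly_values:
        raise ValueError("need at least one month of data")
    start, end = monthly_values[0], monthly_values[-1]
    yearly_avg = sum(monthly_values) / len(monthly_values)
    return end - start, end - yearly_avg


# Game counts for four months: activity rises towards the end.
print(activity_deltas([10, 20, 30, 60]))  # (50, 30.0)
```

Positive values indicate that the player was more active at the end of the period than at the start, or than on average.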


In [111]:
# Activity

# Full list of labels = 
# ['id', 'player', 'start_elo', 'end_elo', 'max_elo', 'min_elo', 'avg_elo', 
#  'w_below_neg_300', 'w_neg_200_to_299', 'w_neg_100_to_199', 'w_neg_50_to_99', 'w_neg_1_to_49', 'w_zero', 'w_1_to_49', 'w_50_to_99', 'w_100_to_199', 'w_200_to_299', 'w_above_pos_300', 
#  'l_below_neg_300', 'l_neg_200_to_299', 'l_neg_100_to_199', 'l_neg_50_to_99', 'l_neg_1_to_49', 'l_zero', 'l_1_to_49', 'l_50_to_99', 'l_100_to_199', 'l_200_to_299', 'l_above_pos_300', 
#  'd_below_neg_300', 'd_neg_200_to_299', 'd_neg_100_to_199', 'd_neg_50_to_99', 'd_neg_1_to_49', 'd_zero', 'd_1_to_49', 'd_50_to_99', 'd_100_to_199', 'd_200_to_299', 'd_above_pos_300', 
#  'w_Grandmaster', 'w_Master', 'w_ClassA', 'w_ClassB', 'w_ClassC', 'w_ClassD', 'w_Novice', 
#  'l_Grandmaster', 'l_Master', 'l_ClassA', 'l_ClassB', 'l_ClassC', 'l_ClassD', 'l_Novice', 
#  'd_Grandmaster', 'd_Master', 'd_ClassA', 'd_ClassB', 'd_ClassC', 'd_ClassD', 'd_Novice', 
#  'win_count', ' loss_count', ' draw_count', 'win_loss_ratio', 
#  'max_win_streak', 'max_lose_streak', 'max_elo_diff', 'min_elo_diff', 'avg_elo_diff', 
#  'win_higher_count', 'win_lower_count', 'win_even_count',
#  'loss_higher_count', 'loss_lower_count', 'loss_even_count', 
#  'draw_higher_count', 'draw_lower_count', 'draw_even_count', 
#  'opponent_count', 'game_count', ' avg_daily_games', ' avg_weekly_games', ' avg_days_between_sessions', ' full_day_count']

players = [x for x in nodes_dict]
cols = ['player']
cols += ['opponent_count', 'game_count', 'avg_daily_games', 'avg_weekly_games', 'avg_days_between_sessions']

# Stack the monthly frames, keeping only the columns needed here.
# pd.concat replaces DataFrame.append, which was removed in pandas 2.0.
cut_df = pd.concat([df[cols] for df in df_list], ignore_index=True)

cut_df = cut_df.rename_axis('id').sort_values(by=['player','id'])

In [112]:
new_cols = [['player', 'avg_opponents', 'avg_games', 'total_games', 'avg_daily', 'avg_weekly', 'avg_rest', 'ES_games', 'ES_daily', 'ES_weekly', 'ES_rest', 'EY_games', 'EY_daily', 'EY_weekly', 'EY_rest']]
new_rows = []

  # classA_count = temp_df['w_ClassA'].sum() + temp_df['l_ClassA'].sum() + temp_df['d_ClassA'].sum()

for player in players:
  temp_df = cut_df.loc[cut_df['player'] == player]

  start = temp_df.iloc[0]
  end = temp_df.iloc[len(df_list) - 1]

  avg_opponents = temp_df['opponent_count'].sum() / len(df_list)
  avg_games = temp_df['game_count'].sum() / len(df_list)
  total_games = temp_df['game_count'].sum()
  avg_daily = temp_df['avg_daily_games'].sum() / len(df_list)
  avg_weekly = temp_df['avg_weekly_games'].sum() / len(df_list)
  avg_rest = temp_df['avg_days_between_sessions'].sum() / len(df_list)

  ES_games = end['game_count'] - start['game_count']
  ES_daily = end['avg_daily_games'] - start['avg_daily_games']
  ES_weekly = end['avg_weekly_games'] - start['avg_weekly_games']
  ES_rest = end['avg_days_between_sessions'] - start['avg_days_between_sessions']

  EY_games = end['game_count'] - avg_games
  EY_daily = end['avg_daily_games'] - avg_daily
  EY_weekly = end['avg_weekly_games'] - avg_weekly
  EY_rest = end['avg_days_between_sessions'] - avg_rest

  player_row = [avg_opponents, avg_games, total_games, avg_daily, avg_weekly, avg_rest, ES_games, ES_daily, ES_weekly, ES_rest, EY_games, EY_daily, EY_weekly, EY_rest]
  player_row = [player] + player_row
  new_rows.append(player_row)

# strategy_df = pd.DataFrame(columns=new_cols, data=new_rows)
activity_df = pd.DataFrame(columns=new_cols, data=new_rows)

In [156]:
# Save results to Drive

results = [rating_df,basic_wld_df,relative_wld_df,absolute_wld_df, strategy_df_small, activity_df]
results_names = ['rating_df','basic_wld_df','relative_wld_df','absolute_wld_df','strategy_df_small','activity_df']

for i in range(0, len(results)):
  csv_path = results_folder + '{0}.csv'.format(results_names[i])
  results[i].to_csv(csv_path, index=False)
  # !cp data.csv results_folder

In [30]:
csv_path = results_folder + 'strategy_df_small.csv'
strategy_df_small.to_csv(csv_path)

### Processing Data
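
The cells in this section rescale each metric to a common range and then combine the metrics of a component with fixed weights. As a minimal sketch of that idea, the hypothetical helpers `min_max` and `weighted_score` below use textbook min-max scaling; they are illustrative and are not the notebook's own code. The notebook's expression `(x + |min|) / (|max| + |min|)` gives the same result whenever `min <= 0 <= max`.

```python
def min_max(value, lo, hi):
    """Rescale value from the range [lo, hi] to [0, 1]."""
    if hi == lo:
        return 0.0  # constant metric: carries no information
    return (value - lo) / (hi - lo)


def weighted_score(values, bounds, weights):
    """Weighted mean of min-max scaled values; weights are normalised to sum to 1."""
    total = sum(weights)
    return sum(
        min_max(v, lo, hi) * w / total
        for v, (lo, hi), w in zip(values, bounds, weights)
    )


# Two metrics, both halfway through their range, weighted 2:1.
score = weighted_score(values=[0, 50], bounds=[(-100, 100), (0, 100)], weights=[2, 1])
print(round(score, 3))
```

Dividing by the sum of the weights keeps every component score between 0 and 1, which makes components comparable before they are combined.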


In [157]:
'''
Metrics available, grouped by component

Rating
ES_avg_rating, EY_avg_rating, ES_max_rating, SE_min_rating

Basic WLD
ES_win, SE_loss, SE_draw , EY_win, EY_loss, EY_draw

Relative WLD
ES_win_higher, SE_loss_higher, SE_draw_higher, ES_win_lower, SE_loss_lower, SE_draw_lower, 
EY_win_higher, EY_loss_higher, EY_draw_higher, EY_win_lower, EY_loss_lower, EY_draw_lower

Absolute WLD
ES_win_Master,  ES_win_ClassA,  ES_win_ClassB,  ES_win_ClassC,  ES_win_ClassD,  ES_win_Novice,  
SE_loss_Master, SE_loss_ClassA, SE_loss_ClassB, SE_loss_ClassC, SE_loss_ClassD, SE_loss_Novice,  
SE_draw_Master, SE_draw_ClassA, SE_draw_ClassB, SE_draw_ClassC, SE_draw_ClassD, SE_draw_Novice,  
EY_win_Master,  EY_win_ClassA,  EY_win_ClassB,  EY_win_ClassC,  EY_win_ClassD,  EY_win_Novice,  
EY_loss_Master, EY_loss_ClassA, EY_loss_ClassB, EY_loss_ClassC, EY_loss_ClassD, EY_loss_Novice,  
EY_draw_Master, EY_draw_ClassA, EY_draw_ClassB, EY_draw_ClassC, EY_draw_ClassD, EY_draw_Novice

Strategy
S1, S2, S3, S1_start, S2_start, S3_start, S1_end, S2_end, S3_end, S1_SD, S2_SD, S3_SD

Activity
avg_opponents, avg_games, total_games, avg_daily, avg_weekly, avg_rest, ES_games, ES_daily, ES_weekly, ES_rest, EY_games, EY_daily, EY_weekly, EY_rest
'''

# Get dataframes from Drive
rating_df = pd.read_csv(results_folder + 'rating_df.csv')
basic_wld_df = pd.read_csv(results_folder + 'basic_wld_df.csv')
relative_wld_df = pd.read_csv(results_folder + 'relative_wld_df.csv')
absolute_wld_df = pd.read_csv(results_folder + 'absolute_wld_df.csv')
strategy_df = pd.read_csv(results_folder + 'strategy_df_small.csv')
activity_df = pd.read_csv(results_folder + 'activity_df.csv')
df_list = [rating_df,basic_wld_df,relative_wld_df,absolute_wld_df, strategy_df, activity_df]



In [179]:
# Rating Component
# ES_avg_rating, EY_avg_rating, ES_max_rating, SE_min_rating

weights = [2, 2, 1, 1]
rating_weights = [w/6 for w in weights]

rating_max = rating_df.max()
rating_min = rating_df.min()

# ES_avg_rating
max_ES_avg_rating = rating_max['ES_avg_rating']
min_ES_avg_rating = rating_min['ES_avg_rating']
# EY_avg_rating
max_EY_rating = rating_max['EY_avg_rating']
min_EY_rating = rating_min['EY_avg_rating']
# ES_max_rating (separate names so the ES_avg_rating values are not overwritten)
max_ES_max_rating = rating_max['ES_max_rating']
min_ES_max_rating = rating_min['ES_max_rating']
# SE_min_rating
max_SE_rating = rating_max['SE_min_rating']
min_SE_rating = rating_min['SE_min_rating']

max_vals = [max_ES_avg_rating,max_EY_rating,max_ES_max_rating,max_SE_rating]
min_vals = [min_ES_avg_rating,min_EY_rating,min_ES_max_rating,min_SE_rating]
zip_vals = list(zip(max_vals, min_vals))

rating_metrics = ['ES_avg_rating','EY_avg_rating','ES_max_rating','SE_min_rating']
rating_normalise = [abs(a[0]) + abs(a[1]) for a in zip_vals]
rating_min_vals = min_vals

# Basic WLD Component
# ES_win, SE_loss, SE_draw , EY_win, EY_loss, EY_draw

weights = [1, 1, 1, 2, 2, 2]
basicWLD_weights = [w/9 for w in weights]
basicWLD_metrics = ['ES_win', 'SE_loss', 'SE_draw', 'EY_win', 'EY_loss', 'EY_draw']

# Relative WLD Component
# ES_win_higher, SE_loss_higher, SE_draw_higher, ES_win_lower, SE_loss_lower, SE_draw_lower, 
# EY_win_higher, EY_loss_higher, EY_draw_higher, EY_win_lower, EY_loss_lower, EY_draw_lower

weights = [3, 3, 1, 2, 2, 1] * 2
# Divide by the total (24) so the weights sum to 1, as in the other components
relativeWLD_weights = [w/24 for w in weights]
relativeWLD_metrics = ['ES_win_higher', 'SE_loss_higher', 'SE_draw_higher', 'ES_win_lower', 'SE_loss_lower', 'SE_draw_lower', 'EY_win_higher', 'EY_loss_higher', 'EY_draw_higher', 'EY_win_lower', 'EY_loss_lower', 'EY_draw_lower']

# Absolute WLD
# ES_win_Master,  ES_win_ClassA,  ES_win_ClassB,  ES_win_ClassC,  ES_win_ClassD,  ES_win_Novice,  
# SE_loss_Master, SE_loss_ClassA, SE_loss_ClassB, SE_loss_ClassC, SE_loss_ClassD, SE_loss_Novice,  
# SE_draw_Master, SE_draw_ClassA, SE_draw_ClassB, SE_draw_ClassC, SE_draw_ClassD, SE_draw_Novice,  
# EY_win_Master,  EY_win_ClassA,  EY_win_ClassB,  EY_win_ClassC,  EY_win_ClassD,  EY_win_Novice,  
# EY_loss_Master, EY_loss_ClassA, EY_loss_ClassB, EY_loss_ClassC, EY_loss_ClassD, EY_loss_Novice,  
# EY_draw_Master, EY_draw_ClassA, EY_draw_ClassB, EY_draw_ClassC, EY_draw_ClassD, EY_draw_Novice

weights = [1, 4, 4, 3, 3, 2] * 6
absoluteWLD_weights = [w/102 for w in weights]
absoluteWLD_metrics = ['ES_win_Master', 'ES_win_ClassA', 'ES_win_ClassB', 'ES_win_ClassC', 'ES_win_ClassD', 'ES_win_Novice', 'SE_loss_Master', 'SE_loss_ClassA', 'SE_loss_ClassB', 'SE_loss_ClassC', 'SE_loss_ClassD', 'SE_loss_Novice', 'SE_draw_Master', 'SE_draw_ClassA', 'SE_draw_ClassB', 'SE_draw_ClassC', 'SE_draw_ClassD', 'SE_draw_Novice', 'EY_win_Master', 'EY_win_ClassA', 'EY_win_ClassB', 'EY_win_ClassC', 'EY_win_ClassD', 'EY_win_Novice', 'EY_loss_Master', 'EY_loss_ClassA', 'EY_loss_ClassB', 'EY_loss_ClassC', 'EY_loss_ClassD', 'EY_loss_Novice', 'EY_draw_Master', 'EY_draw_ClassA', 'EY_draw_ClassB', 'EY_draw_ClassC', 'EY_draw_ClassD', 'EY_draw_Novice']

# Strategy
# S1, S2, S3, S1_start, S2_start, S3_start, S1_end, S2_end, S3_end, S1_SD, S2_SD, S3_SD

# S1_SD, S2_SD, S3_SD
strategy_weights = [1/4, 1/4, 2/4]
strategy_metrics = ['S1_SD', 'S2_SD', 'S3_SD']

strategy_max = strategy_df.max()
strategy_min = strategy_df.min()
max_S1_SD = strategy_max['S1_SD']
min_S1_SD = strategy_min['S1_SD']
max_S2_SD = strategy_max['S2_SD']
min_S2_SD = strategy_min['S2_SD']
max_S3_SD = strategy_max['S3_SD']
min_S3_SD = strategy_min['S3_SD']

max_vals = [max_S1_SD, max_S2_SD, max_S3_SD]
min_vals = [min_S1_SD, min_S2_SD, min_S3_SD]
zip_vals = list(zip(max_vals, min_vals))
strategy_normalise = [abs(a[0]) + abs(a[1]) for a in zip_vals]
strategy_min_vals = min_vals


# Activity
# avg_opponents, avg_games, total_games, avg_daily, avg_weekly, avg_rest, ES_games, ES_daily, ES_weekly, ES_rest, EY_games, EY_daily, EY_weekly, EY_rest

weights = [3, 2, 2, 3, 1, 1, 1, 1, 2, 2, 2, 2]
activity_weights = [w/22 for w in weights]

activity_max = activity_df.max()
activity_min = activity_df.min()

max_total_games = activity_max['total_games']
min_total_games = activity_min['total_games']

max_daily = activity_max['avg_daily']
min_daily = activity_min['avg_daily']

max_weekly = activity_max['avg_weekly']
min_weekly = activity_min['avg_weekly']

max_rest = activity_max['avg_rest']
min_rest = activity_min['avg_rest']

max_ES_games = activity_max['ES_games']
min_ES_games = activity_min['ES_games']

max_ES_daily = activity_max['ES_daily']
min_ES_daily = activity_min['ES_daily']

max_ES_weekly = activity_max['ES_weekly']
min_ES_weekly = activity_min['ES_weekly']

max_ES_rest = activity_max['ES_rest']
min_ES_rest = activity_min['ES_rest']

max_EY_games = activity_max['EY_games']
min_EY_games = activity_min['EY_games']

max_EY_daily = activity_max['EY_daily']
min_EY_daily = activity_min['EY_daily']

max_EY_weekly = activity_max['EY_weekly']
min_EY_weekly = activity_min['EY_weekly']

max_EY_rest = activity_max['EY_rest']
min_EY_rest = activity_min['EY_rest']

max_vals = [max_total_games,max_daily,max_weekly,max_rest,max_ES_games,max_ES_daily,max_ES_weekly,max_ES_rest,max_EY_games,max_EY_daily,max_EY_weekly,max_EY_rest]
min_vals = [min_total_games,min_daily,min_weekly,min_rest,min_ES_games,min_ES_daily,min_ES_weekly,min_ES_rest,min_EY_games,min_EY_daily,min_EY_weekly,min_EY_rest]
zip_vals = list(zip(max_vals, min_vals))
activity_normalise = [abs(a[0]) + abs(a[1]) for a in zip_vals]
activity_metrics = ['total_games', 'avg_daily', 'avg_weekly', 'avg_rest', 'ES_games', 'ES_daily', 'ES_weekly', 'ES_rest', 'EY_games', 'EY_daily', 'EY_weekly', 'EY_rest']
activity_min_vals = min_vals


In [204]:
# new_cols = [['player', 'ProgressScore', 'ProgressConsistencyScore', 'S1Score', 'S2Score', 'S3Score', 'StrategyFlexibility', 'ActivityScore', 'rating_component', 'basicWLD_component', 'relativeWLD_component', 'absoluteWLD_component', 'P_rank', 'A_rank']]
new_cols = [['player', 'ProgressScore', 'S1Score', 'S2Score', 'S3Score', 'StrategyFlexibility', 'ActivityScore', 'rating_component', 'basicWLD_component', 'relativeWLD_component', 'absoluteWLD_component']]

new_rows = []
concat_df = pd.concat([rating_df, basic_wld_df, relative_wld_df, absolute_wld_df, strategy_df, activity_df],axis=1)
try:
  concat_df = concat_df.drop("Unnamed: 0",axis=1)
except KeyError:
  pass
concat_df = concat_df.loc[:,~concat_df.columns.duplicated()]
i = 0
for player in players:
  temp_df = concat_df.loc[concat_df['player'] == player].iloc[0]

  # Rating Component
  metrics = [temp_df[x] for x in rating_metrics]
  min_vals = rating_min_vals
  denominators = rating_normalise
  weights = rating_weights
  rating_component = 0
  for m in range(0, len(metrics)):
    # Normalise and multiply by weights
    rating_component += (((metrics[m] + abs(min_vals[m]))/denominators[m]) * weights[m])

  # Basic WLD Component
  metrics = [temp_df[x] for x in basicWLD_metrics]
  weights = basicWLD_weights
  basicWLD_component = 0
  for m in range(0, len(metrics)):
    basicWLD_component += ((metrics[m] + 100)/2 * weights[m])/100
  
  # Relative WLD component
  metrics = [temp_df[x] for x in relativeWLD_metrics]
  weights = relativeWLD_weights
  relativeWLD_component = 0
  for m in range(0, len(metrics)):
    relativeWLD_component += ((metrics[m] + 100)/2 * weights[m])/100

  # Absolute WLD component
  metrics = [temp_df[x] for x in absoluteWLD_metrics]
  weights = absoluteWLD_weights
  absoluteWLD_component = 0
  for m in range(0, len(metrics)):
    absoluteWLD_component += ((metrics[m] + 100)/2 * weights[m])/100

  progress_components = [rating_component, basicWLD_component, relativeWLD_component, absoluteWLD_component]
  # Progress Component
  # Combine rating and basic, relative and absolute WLD components
  weights = [2, 2, 1, 1]
  weights = [w/6 for w in weights]
  progressScore = sum([progress_components[i] * weights[i] for i in range(0, len(progress_components))])

  # Strategy component
  S1Score = temp_df['S1']/100
  S2Score = temp_df['S2']/100
  S3Score = temp_df['S3']/100

  # Strategy Flexibility Component
  metrics = [temp_df[x] for x in strategy_metrics]
  min_vals = strategy_min_vals
  denominators = strategy_normalise
  weights = strategy_weights
  flexibilityScore = 0
  for m in range(0, len(metrics)):
    # Normalise and multiply by weights
    flexibilityScore += (((metrics[m] + abs(min_vals[m]))/denominators[m]) * weights[m])

  # Activity
  metrics = [temp_df[x] for x in activity_metrics]
  min_vals = activity_min_vals
  denominators = activity_normalise
  weights = activity_weights
  activityScore = 0
  for m in range(0, len(metrics)):
    # Normalise and multiply by weights
    activityScore += (((metrics[m] + abs(min_vals[m]))/denominators[m]) * weights[m])

  
  playerScores = [progressScore, S1Score, S2Score, S3Score, flexibilityScore, activityScore] + progress_components
  playerScores = list(np.around(np.array(playerScores),3))
  player_row = [player] + [x for x in playerScores]
  new_rows.append(player_row)
# new_cols = [['player', 'ProgressScore', 'ProgressConsistencyScore', 'S1Score', 'S2Score', 'S3Score', 'StrategyFlexibility', 'ActivityScore', 'rating_component', 'basicWLD_component', 'relativeWLD_component', 'absoluteWLD_component', 'P_rank', 'A_rank']]

# # strategy_df = pd.DataFrame(columns=new_cols, data=new_rows)
Part1Progress = pd.DataFrame(columns=new_cols, data=new_rows)

In [205]:
csv_path = results_folder + 'Part1Progress.csv'
Part1Progress.to_csv(csv_path)

In [208]:
player_df = concat_df.loc[concat_df['player'] == 'mfhgxe0']
player_df

Unnamed: 0,player,ES_avg_rating,EY_avg_rating,ES_max_rating,SE_min_rating,ES_win,SE_loss,SE_draw,EY_win,EY_loss,EY_draw,ES_win_higher,SE_loss_higher,SE_draw_higher,ES_win_lower,SE_loss_lower,SE_draw_lower,EY_win_higher,EY_loss_higher,EY_draw_higher,EY_win_lower,EY_loss_lower,EY_draw_lower,ES_win_Master,ES_win_ClassA,ES_win_ClassB,ES_win_ClassC,ES_win_ClassD,ES_win_Novice,SE_loss_Master,SE_loss_ClassA,SE_loss_ClassB,SE_loss_ClassC,SE_loss_ClassD,SE_loss_Novice,SE_draw_Master,SE_draw_ClassA,SE_draw_ClassB,SE_draw_ClassC,SE_draw_ClassD,...,EY_win_ClassD,EY_win_Novice,EY_loss_Master,EY_loss_ClassA,EY_loss_ClassB,EY_loss_ClassC,EY_loss_ClassD,EY_loss_Novice,EY_draw_Master,EY_draw_ClassA,EY_draw_ClassB,EY_draw_ClassC,EY_draw_ClassD,EY_draw_Novice,S1,S2,S3,S1_start,S2_start,S3_start,S1_end,S2_end,S3_end,S1_SD,S2_SD,S3_SD,avg_opponents,avg_games,total_games,avg_daily,avg_weekly,avg_rest,ES_games,ES_daily,ES_weekly,ES_rest,EY_games,EY_daily,EY_weekly,EY_rest
252,mfhgxe0,-46.284785,-24.803669,-26,57,5.013486,5.013486,0.0,0.971334,-0.567643,-0.403691,0.0,0.0,0.0,5.013486,5.013486,0.0,0.0,0.0,0.0,0.971334,-0.567643,-0.403691,0.0,0.0,-100.0,0.0,7.275132,0.0,0.0,0.0,0.0,0.0,7.275132,0.0,0.0,0.0,0.0,0.0,0.0,...,1.080074,0.0,0.0,0.0,0.0,-2.116402,-0.652419,0.0,0.0,0.0,0.0,-0.529101,-0.427655,0.0,0.0,100.0,0.0,0.0,100.0,0.0,0.0,100.0,0.0,0.0,0.0,0.0,103.416667,144.5,1734,4.588869,32.122083,1.666667,158,4.9375,34.5625,-1.0,46.5,1.379881,9.659167,-0.666667


### Part 2: User Distributions
Based on the user classifications, the distribution of users will be calculated and plotted.

### Part 3: User Profile Classifications
Using unsupervised learning, the users will be divided into a number of groups.

## Second Stage
This stage analyses whether a user's strategy is relevant to their progress. It will measure how the opponents a user plays influence that user's progress, using the definition of progress from the first stage.

## Third Stage
This stage verifies the findings of the second stage using more nodes from the same dataset, datasets from other chess platforms, or data from other games.

## Fourth Stage
This stage is concerned with the potential value of the project. It looks into how relevant progress data is to predicting ratings or rating trends, which may reveal causal relationships or explanations.