## **Supervised ML classification algorithm to predict next round team value (CT & T)**
## **Analysis**

First, we analyze the data to decide how many **bins** (classes) to create and then predict.


In [1]:
import pandas as pd

pd.set_option('display.max_columns', 30)
pd.set_option('display.max_rows', 50)

In [2]:
meta_demos = pd.read_csv('../data/csgo/esea_meta_demos.csv')

In [3]:
meta_demos.head()

Unnamed: 0.1,Unnamed: 0,file,map,round,start_seconds,end_seconds,winner_team,winner_side,round_type,ct_eq_val,t_eq_val
0,0,esea_match_13770997.dem,de_overpass,1,94.30782,160.9591,Hentai Hooligans,Terrorist,PISTOL_ROUND,4300,4250
1,1,esea_match_13770997.dem,de_overpass,2,160.9591,279.3998,Hentai Hooligans,Terrorist,ECO,6300,19400
2,2,esea_match_13770997.dem,de_overpass,3,279.3998,341.0084,Hentai Hooligans,Terrorist,SEMI_ECO,7650,19250
3,3,esea_match_13770997.dem,de_overpass,4,341.0084,435.4259,Hentai Hooligans,Terrorist,NORMAL,24900,23400
4,4,esea_match_13770997.dem,de_overpass,5,435.4259,484.2398,Animal Style,CounterTerrorist,ECO,5400,20550


In [4]:
meta_demos['round_type'].unique()

array(['PISTOL_ROUND', 'ECO', 'SEMI_ECO', 'NORMAL', 'FORCE_BUY'],
      dtype=object)

#### esea_meta_demos.csv provides a **round_type** column with 5 classes that are closely related to team value:

- PISTOL_ROUND
- ECO
- SEMI_ECO
- FORCE_BUY
- NORMAL

#### Let's analyze each type and estimate the range of team values for each class

In [5]:
meta_types = meta_demos[['round_type', 'ct_eq_val', 't_eq_val']]

In [6]:
display(meta_types.groupby('round_type').min())
display(meta_types.groupby('round_type').max())

round_type,ct_eq_val,t_eq_val
ECO,200,400
FORCE_BUY,3000,2200
NORMAL,2750,2300
PISTOL_ROUND,1000,400
SEMI_ECO,1250,2800


round_type,ct_eq_val,t_eq_val
ECO,42050,36700
FORCE_BUY,40350,38150
NORMAL,41250,37900
PISTOL_ROUND,30300,28350
SEMI_ECO,41100,35250


This information is not reliable, because round_type may describe only one of the two teams.

The ECO type shows this clearly.

First, let's define ECO rounds. An ECO round occurs when a team does not have enough money to buy good weapons to face the other team. Usually, the team members buy weapons or utility while saving around \$2,000 for the next round, which gives them the chance of a more even fight in that round.

With that definition in mind, the minimum ECO values (\$200 for CT and \$400 for T) make sense. The maximum values, however, are \$42,050 and \$36,700, which is extremely high for a team that cannot afford good equipment.

This suggests that round_type is assigned according to the team with the lower equipment value.

From the cheapest to the most expensive type, the classification is:
- ECO 
- SEMI_ECO
- FORCE_BUY
- NORMAL

PISTOL_ROUND is a special type: it is the first round a team plays on each side, so it occurs twice per game.

## **Preprocessing**
#### Let's create a new column with the lowest team value for each round and analyze it by round type.

In [7]:
# assign() returns a new DataFrame, which avoids pandas' SettingWithCopyWarning
meta_types = meta_types.assign(min_eq_val=meta_types[['ct_eq_val', 't_eq_val']].min(axis=1))


In [8]:
meta_types.head(10)

Unnamed: 0,round_type,ct_eq_val,t_eq_val,min_eq_val
0,PISTOL_ROUND,4300,4250,4250
1,ECO,6300,19400,6300
2,SEMI_ECO,7650,19250,7650
3,NORMAL,24900,23400,23400
4,ECO,5400,20550,5400
5,NORMAL,29650,25450,25450
6,ECO,3200,25300,3200
7,ECO,4850,27600,4850
8,FORCE_BUY,32150,18200,18200
9,ECO,32950,9950,9950


In [9]:
meta_types[['round_type', 'min_eq_val']].groupby('round_type').max()

round_type,min_eq_val
ECO,14500
FORCE_BUY,22900
NORMAL,34800
PISTOL_ROUND,24300
SEMI_ECO,18400


This gives the **top value** of the lower-valued team for each category under the given classification.

### Data

Let's load the data obtained from the first prediction (2_4_ml_regressor_lgbm_tuning.ipynb):
- ct_predicted_value.csv
- t_predicted_value.csv

In [10]:
ct_df = pd.read_csv('../data/results/ct_predicted_value.csv')
t_df = pd.read_csv('../data/results/t_predicted_value.csv')

In [11]:
ct_df.drop(columns=['Unnamed: 0'], inplace=True)
t_df.drop(columns=['Unnamed: 0'], inplace=True)

In [12]:
display(ct_df.head())
display(t_df.head())

Unnamed: 0,file,round,wp_ct_val,nade_ct_val,ct_alive,t_alive,ct_winner,bomb_planted,ct_cons_wins,t_cons_wins,ct_val_real,ct_val_pred
0,0,1,1000.0,550,5,5,0.5,0.5,0,0,4550,4078.134589
1,0,2,10100.0,1100,4,0,1.0,0.0,1,0,18450,17819.702711
2,0,3,4125.0,900,0,1,0.0,0.0,0,1,9550,7038.468589
3,0,4,1000.0,0,0,3,0.0,1.0,0,2,1600,1452.468928
4,0,5,15500.0,1400,0,4,0.0,1.0,0,3,23350,22676.205763


Unnamed: 0,file,round,wp_t_val,nade_t_val,ct_alive,t_alive,ct_winner,bomb_planted,ct_cons_wins,t_cons_wins,t_val_real,t_val_pred
0,0,1,1166.666667,1200,5,5,0.5,0.5,0,0,3850,3943.272665
1,0,2,3687.5,50,4,0,1.0,0.0,1,0,5300,6290.616771
2,0,3,11700.0,2450,0,1,0.0,0.0,0,1,22900,19600.790638
3,0,4,11700.0,1600,0,3,0.0,1.0,0,2,19650,22568.098741
4,0,5,12750.0,1700,0,4,0.0,1.0,0,3,21750,24459.855175


#### Add the **round_type** column to the dataframes ct_df & t_df

We can recover this column from **base_to_ml_predicted_team_value.csv**, which was exported in 2_1_ml_preprocessingdata.ipynb.

In [13]:
df_round_type = pd.read_csv('../data/processed/base_to_ml_predicted_team_value.csv')

In [14]:
df_round_type.head()

Unnamed: 0,file,round,wp_ct_val,wp_t_val,nade_ct_val,nade_t_val,ct_alive,t_alive,ct_winner,bomb_planted,ct_cons_wins,t_cons_wins,ct_val_real,t_val_real,round_type
0,esea_match_13779704.dem,1,1000.0,1166.666667,550,1200,5,5,0.5,0.5,0,0,4550,3850,PISTOL_ROUND
1,esea_match_13779704.dem,2,10100.0,3687.5,1100,50,4,0,1.0,0.0,1,0,18450,5300,ECO
2,esea_match_13779704.dem,3,4125.0,11700.0,900,2450,0,1,0.0,0.0,0,1,9550,22900,SEMI_ECO
3,esea_match_13779704.dem,4,1000.0,11700.0,0,1600,0,3,0.0,1.0,0,2,1600,19650,ECO
4,esea_match_13779704.dem,5,15500.0,12750.0,1400,1700,0,4,0.0,1.0,0,3,23350,21750,NORMAL


In [15]:
ct_df['round_type'] = df_round_type['round_type']
t_df['round_type'] = df_round_type['round_type']
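The assignment above relies on pandas aligning rows by index, so it is only correct if the three dataframes list the same rounds in the same order. A minimal sketch of a sanity check on toy data follows; the helper name `same_row_order` and the toy frames are hypothetical, and the compared columns are just one reasonable choice.

```python
import pandas as pd


def same_row_order(left, right, cols=('round',)):
    """Return True if both frames have the same length and identical
    values, row by row, in every column listed in `cols`."""
    if len(left) != len(right):
        return False
    return all((left[c].to_numpy() == right[c].to_numpy()).all() for c in cols)


# Toy frames (hypothetical), mimicking the prediction file and the base file
toy_pred = pd.DataFrame({'round': [1, 2, 3],
                         'ct_val_real': [4550, 18450, 9550]})
toy_base = pd.DataFrame({'round': [1, 2, 3],
                         'ct_val_real': [4550, 18450, 9550],
                         'round_type': ['PISTOL_ROUND', 'ECO', 'SEMI_ECO']})

aligned = same_row_order(toy_pred, toy_base, cols=('round', 'ct_val_real'))
print(aligned)
```

If the check fails on the real data, copying the column by position would attach labels to the wrong rounds, so a key-based join would be the safer option.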

In [16]:
display(ct_df.head())
display(t_df.head())

Unnamed: 0,file,round,wp_ct_val,nade_ct_val,ct_alive,t_alive,ct_winner,bomb_planted,ct_cons_wins,t_cons_wins,ct_val_real,ct_val_pred,round_type
0,0,1,1000.0,550,5,5,0.5,0.5,0,0,4550,4078.134589,PISTOL_ROUND
1,0,2,10100.0,1100,4,0,1.0,0.0,1,0,18450,17819.702711,ECO
2,0,3,4125.0,900,0,1,0.0,0.0,0,1,9550,7038.468589,SEMI_ECO
3,0,4,1000.0,0,0,3,0.0,1.0,0,2,1600,1452.468928,ECO
4,0,5,15500.0,1400,0,4,0.0,1.0,0,3,23350,22676.205763,NORMAL


Unnamed: 0,file,round,wp_t_val,nade_t_val,ct_alive,t_alive,ct_winner,bomb_planted,ct_cons_wins,t_cons_wins,t_val_real,t_val_pred,round_type
0,0,1,1166.666667,1200,5,5,0.5,0.5,0,0,3850,3943.272665,PISTOL_ROUND
1,0,2,3687.5,50,4,0,1.0,0.0,1,0,5300,6290.616771,ECO
2,0,3,11700.0,2450,0,1,0.0,0.0,0,1,22900,19600.790638,SEMI_ECO
3,0,4,11700.0,1600,0,3,0.0,1.0,0,2,19650,22568.098741,ECO
4,0,5,12750.0,1700,0,4,0.0,1.0,0,3,21750,24459.855175,NORMAL


#### As mentioned before, this is not the best way to classify the rounds, because the given label does not describe each team separately.

Let's classify each team manually using the following thresholds:

- ECO $\leqslant$ 14500
- 14500 $<$ SEMI_ECO $\leqslant$ 18400
- 18400 $<$ FORCE_BUY $\leqslant$ 22900
- 22900 $<$ NORMAL

**PISTOL_ROUND is the only class that remains as given**
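As an illustration, the thresholds above can also be applied in a vectorized way with `pd.cut`. This is only a sketch on toy data: the names `BINS`, `LABELS` and `classify_values` are hypothetical, and the sample values are copied from the first CT rows shown earlier.

```python
import numpy as np
import pandas as pd

# Right-closed bins: (-inf, 14500], (14500, 18400], (18400, 22900], (22900, inf)
BINS = [-np.inf, 14500, 18400, 22900, np.inf]
LABELS = ['ECO', 'SEMI_ECO', 'FORCE_BUY', 'NORMAL']


def classify_values(values, is_pistol):
    """Label each team value by threshold; rows flagged as pistol rounds
    keep the PISTOL_ROUND label."""
    labels = pd.cut(values, bins=BINS, labels=LABELS, right=True).astype(object)
    return labels.where(~is_pistol, 'PISTOL_ROUND')


toy = pd.DataFrame({
    'ct_val_real': [4550, 18450, 9550, 1600, 23350],
    'round_type': ['PISTOL_ROUND', 'ECO', 'SEMI_ECO', 'ECO', 'NORMAL'],
})
toy_labels = classify_values(toy['ct_val_real'],
                             toy['round_type'] == 'PISTOL_ROUND')
print(toy_labels.tolist())
```

Because the bins are right-closed, a value exactly on a threshold (for example 14500) falls in the lower class, which matches the $\leqslant$ signs in the list above.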

In [17]:
def classify_value(value, given_type):
    # PISTOL_ROUND is kept as given; the other classes come from the thresholds above
    if given_type == 'PISTOL_ROUND':
        return 'PISTOL_ROUND'
    if value <= 14500:
        return 'ECO'
    if value <= 18400:
        return 'SEMI_ECO'
    if value <= 22900:
        return 'FORCE_BUY'
    return 'NORMAL'


def get_round_type(index, ct_df, t_df):
    # Classify each team separately from its own real equipment value
    ct_rnd_type = classify_value(ct_df.loc[index, 'ct_val_real'],
                                 ct_df.loc[index, 'round_type'])
    t_rnd_type = classify_value(t_df.loc[index, 't_val_real'],
                                t_df.loc[index, 'round_type'])
    return ct_rnd_type, t_rnd_type

In [18]:
lst_round_type = [get_round_type(index, ct_df, t_df) for index in ct_df.index]

In [19]:
df_round_type = pd.DataFrame(lst_round_type, columns=['ct_round_type', 't_round_type'])
df_round_type.head()

Unnamed: 0,ct_round_type,t_round_type
0,PISTOL_ROUND,PISTOL_ROUND
1,FORCE_BUY,ECO
2,ECO,FORCE_BUY
3,ECO,FORCE_BUY
4,NORMAL,FORCE_BUY


In [20]:
ct_df['round_type'] = df_round_type[['ct_round_type']]
t_df['round_type'] = df_round_type[['t_round_type']]

In [21]:
display(ct_df.head())
display(t_df.head())

Unnamed: 0,file,round,wp_ct_val,nade_ct_val,ct_alive,t_alive,ct_winner,bomb_planted,ct_cons_wins,t_cons_wins,ct_val_real,ct_val_pred,round_type
0,0,1,1000.0,550,5,5,0.5,0.5,0,0,4550,4078.134589,PISTOL_ROUND
1,0,2,10100.0,1100,4,0,1.0,0.0,1,0,18450,17819.702711,FORCE_BUY
2,0,3,4125.0,900,0,1,0.0,0.0,0,1,9550,7038.468589,ECO
3,0,4,1000.0,0,0,3,0.0,1.0,0,2,1600,1452.468928,ECO
4,0,5,15500.0,1400,0,4,0.0,1.0,0,3,23350,22676.205763,NORMAL


Unnamed: 0,file,round,wp_t_val,nade_t_val,ct_alive,t_alive,ct_winner,bomb_planted,ct_cons_wins,t_cons_wins,t_val_real,t_val_pred,round_type
0,0,1,1166.666667,1200,5,5,0.5,0.5,0,0,3850,3943.272665,PISTOL_ROUND
1,0,2,3687.5,50,4,0,1.0,0.0,1,0,5300,6290.616771,ECO
2,0,3,11700.0,2450,0,1,0.0,0.0,0,1,22900,19600.790638,FORCE_BUY
3,0,4,11700.0,1600,0,3,0.0,1.0,0,2,19650,22568.098741,FORCE_BUY
4,0,5,12750.0,1700,0,4,0.0,1.0,0,3,21750,24459.855175,FORCE_BUY


#### Based on long experience playing the game, I do not agree with the classes produced by these thresholds

Let's define a new classification with the following classes:
- ECO
- MEDIUM (minimum threshold value: Armor + cheapest SMG + Flash $\rightarrow$ 1900 x 5 = 9,500)
- FULL (minimum threshold value: Armor with helmet + AK47 + Grenade + Flash $\rightarrow$ 4200 x 5 = 21,000)
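The arithmetic behind the two thresholds can be spelled out as a short sketch. The per-item prices below are assumptions (commonly quoted in-game prices, not values taken from the dataset), chosen because they add up to the per-player figures of 1900 and 4200 given above; `PRICES`, `TEAM_SIZE` and `team_threshold` are hypothetical names.

```python
# Assumed item prices in dollars (not from the dataset)
PRICES = {
    'kevlar': 650,          # armor without helmet
    'kevlar_helmet': 1000,  # armor with helmet
    'mac10': 1050,          # assumed cheapest SMG
    'ak47': 2700,
    'he_grenade': 300,
    'flashbang': 200,
}
TEAM_SIZE = 5


def team_threshold(items):
    """Cost of buying the given loadout for every player on the team."""
    return TEAM_SIZE * sum(PRICES[item] for item in items)


medium_threshold = team_threshold(['kevlar', 'mac10', 'flashbang'])
full_threshold = team_threshold(['kevlar_helmet', 'ak47', 'he_grenade', 'flashbang'])
print(medium_threshold, full_threshold)
```

Under these assumed prices the two loadouts cost 1900 and 4200 per player, which reproduces the team thresholds of 9,500 and 21,000.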

In [22]:
def classify_value(value, given_type):
    # PISTOL_ROUND is kept as given; the other classes use the 9500 / 21000 thresholds
    if given_type == 'PISTOL_ROUND':
        return 'PISTOL_ROUND'
    if value <= 9500:
        return 'ECO'
    if value <= 21000:
        return 'MEDIUM'
    return 'FULL'


def get_round_type(index, ct_df, t_df):
    # Classify each team separately from its own real equipment value
    ct_rnd_type = classify_value(ct_df.loc[index, 'ct_val_real'],
                                 ct_df.loc[index, 'round_type'])
    t_rnd_type = classify_value(t_df.loc[index, 't_val_real'],
                                t_df.loc[index, 'round_type'])
    return ct_rnd_type, t_rnd_type

In [23]:
lst_round_type = [get_round_type(index, ct_df, t_df) for index in ct_df.index]

In [24]:
df_round_type = pd.DataFrame(lst_round_type, columns=['ct_round_type', 't_round_type'])
df_round_type.head()

Unnamed: 0,ct_round_type,t_round_type
0,PISTOL_ROUND,PISTOL_ROUND
1,MEDIUM,ECO
2,MEDIUM,FULL
3,ECO,MEDIUM
4,FULL,FULL


In [25]:
ct_df['round_type'] = df_round_type[['ct_round_type']]
t_df['round_type'] = df_round_type[['t_round_type']]

In [26]:
display(ct_df.head())
display(t_df.head())

Unnamed: 0,file,round,wp_ct_val,nade_ct_val,ct_alive,t_alive,ct_winner,bomb_planted,ct_cons_wins,t_cons_wins,ct_val_real,ct_val_pred,round_type
0,0,1,1000.0,550,5,5,0.5,0.5,0,0,4550,4078.134589,PISTOL_ROUND
1,0,2,10100.0,1100,4,0,1.0,0.0,1,0,18450,17819.702711,MEDIUM
2,0,3,4125.0,900,0,1,0.0,0.0,0,1,9550,7038.468589,MEDIUM
3,0,4,1000.0,0,0,3,0.0,1.0,0,2,1600,1452.468928,ECO
4,0,5,15500.0,1400,0,4,0.0,1.0,0,3,23350,22676.205763,FULL


Unnamed: 0,file,round,wp_t_val,nade_t_val,ct_alive,t_alive,ct_winner,bomb_planted,ct_cons_wins,t_cons_wins,t_val_real,t_val_pred,round_type
0,0,1,1166.666667,1200,5,5,0.5,0.5,0,0,3850,3943.272665,PISTOL_ROUND
1,0,2,3687.5,50,4,0,1.0,0.0,1,0,5300,6290.616771,ECO
2,0,3,11700.0,2450,0,1,0.0,0.0,0,1,22900,19600.790638,FULL
3,0,4,11700.0,1600,0,3,0.0,1.0,0,2,19650,22568.098741,MEDIUM
4,0,5,12750.0,1700,0,4,0.0,1.0,0,3,21750,24459.855175,FULL


### Let's create the column to predict: the next round type (**nxt_rnd_type**)

In [27]:
%%time

# This code does not look fast, but it is 3 times faster than a much prettier and
# shorter version that I wrote for another similar situation

files = ct_df['file'].unique()

nxt_rnd_type_ct = []
nxt_rnd_type_t = []

log = 0

for file in files:

    ct_df_file = ct_df[ct_df['file'] == file]
    t_df_file = t_df[t_df['file'] == file]
    
    rounds = ct_df_file['round'].unique()
    
    if log%1000 == 0:
        print(f'{log}/12185')    
    log += 1

    
    for rnd in rounds:
        if rnd == rounds[-1]:
            nxt_rnd_type_ct.append('LAST')
            nxt_rnd_type_t.append('LAST')
        else:
            nxt_rnd_type_ct.append(ct_df_file[ct_df_file['round'] == rnd + 1]['round_type'].values[0])
            nxt_rnd_type_t.append(t_df_file[t_df_file['round'] == rnd + 1]['round_type'].values[0])
            
ct_df['nxt_rnd_type'] = nxt_rnd_type_ct
t_df['nxt_rnd_type'] = nxt_rnd_type_t

0/12185
1000/12185
2000/12185
3000/12185
4000/12185
5000/12185
6000/12185
7000/12185
8000/12185
9000/12185
10000/12185
11000/12185
12000/12185
CPU times: user 5min 7s, sys: 87.6 ms, total: 5min 7s
Wall time: 5min 7s
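For comparison, the same "next round" column can be sketched with a grouped shift, which avoids filtering the dataframe once per round. This is an illustrative alternative on toy data, not the method used above; it assumes round numbers are consecutive within each file, and the helper name `add_next_round_type` is hypothetical.

```python
import pandas as pd


def add_next_round_type(df):
    """Return a copy of df with 'nxt_rnd_type': the round_type of the
    following round in the same file, or 'LAST' for the final round."""
    # Sort so that shift(-1) really points at the next round of each file
    out = df.sort_values(['file', 'round'], kind='stable').copy()
    out['nxt_rnd_type'] = (out.groupby('file')['round_type']
                              .shift(-1)
                              .fillna('LAST'))
    # Restore the original row order
    return out.sort_index()


toy = pd.DataFrame({
    'file': [0, 0, 0, 1, 1],
    'round': [1, 2, 3, 1, 2],
    'round_type': ['PISTOL_ROUND', 'MEDIUM', 'ECO', 'PISTOL_ROUND', 'FULL'],
})
toy_next = add_next_round_type(toy)
print(toy_next['nxt_rnd_type'].tolist())
```

Unlike the loop, this version does not look up `rnd + 1` explicitly, so a file with a missing round would silently pair a round with the next available one; that is why the consecutive-rounds assumption matters.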


In [28]:
display(ct_df.head())
display(t_df.head())

Unnamed: 0,file,round,wp_ct_val,nade_ct_val,ct_alive,t_alive,ct_winner,bomb_planted,ct_cons_wins,t_cons_wins,ct_val_real,ct_val_pred,round_type,nxt_rnd_type
0,0,1,1000.0,550,5,5,0.5,0.5,0,0,4550,4078.134589,PISTOL_ROUND,MEDIUM
1,0,2,10100.0,1100,4,0,1.0,0.0,1,0,18450,17819.702711,MEDIUM,MEDIUM
2,0,3,4125.0,900,0,1,0.0,0.0,0,1,9550,7038.468589,MEDIUM,ECO
3,0,4,1000.0,0,0,3,0.0,1.0,0,2,1600,1452.468928,ECO,FULL
4,0,5,15500.0,1400,0,4,0.0,1.0,0,3,23350,22676.205763,FULL,FULL


Unnamed: 0,file,round,wp_t_val,nade_t_val,ct_alive,t_alive,ct_winner,bomb_planted,ct_cons_wins,t_cons_wins,t_val_real,t_val_pred,round_type,nxt_rnd_type
0,0,1,1166.666667,1200,5,5,0.5,0.5,0,0,3850,3943.272665,PISTOL_ROUND,ECO
1,0,2,3687.5,50,4,0,1.0,0.0,1,0,5300,6290.616771,ECO,FULL
2,0,3,11700.0,2450,0,1,0.0,0.0,0,1,22900,19600.790638,FULL,MEDIUM
3,0,4,11700.0,1600,0,3,0.0,1.0,0,2,19650,22568.098741,MEDIUM,FULL
4,0,5,12750.0,1700,0,4,0.0,1.0,0,3,21750,24459.855175,FULL,MEDIUM


#### Last step before moving on to the ML classification algorithm: remove the real value columns, then save the dataframes that will be used for the predictions

In [29]:
ct_df.drop(columns=['ct_val_real'], inplace=True)
t_df.drop(columns=['t_val_real'], inplace=True)

In [30]:
ct_df.to_csv('../data/processed/4_base_predict_next_rnd_ct_type.csv', index=False)
t_df.to_csv('../data/processed/4_base_predict_next_rnd_t_type.csv', index=False)