## **Supervised ML regression algorithm to predict next round team value (CT & T)**
## **Preprocessing**

Input data: the DataFrames obtained from the previous prediction step:
- ct_predicted_value
- t_predicted_value

Preprocess the data to create a new target column, the next round team value (**nxt_rnd_ct_val** & **nxt_rnd_t_val**).
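Written as a formula, the CT target looks like this. The notation is introduced here only for illustration: $v^{CT}_{f,r}$ is the real CT team value (`ct_val_real`) in round $r$ of match file $f$, $R_f$ is the last round of that file, and $y^{CT}_{f,r}$ is `nxt_rnd_ct_val`. The last round has no successor, so it keeps the placeholder 0 that the columns are initialised with.

$$
y^{CT}_{f,r} =
\begin{cases}
v^{CT}_{f,r+1} & \text{if } r < R_f \\
0 & \text{if } r = R_f
\end{cases}
$$

The T side is defined the same way with `t_val_real` and `nxt_rnd_t_val`.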

In [1]:
import pandas as pd

In [2]:
pd.set_option('display.max_columns', 30)
pd.set_option('display.max_rows', 30)

### Data

In [3]:
ct_df = pd.read_csv('../data/results/ct_predicted_value.csv')
t_df = pd.read_csv('../data/results/t_predicted_value.csv')

### Preprocessing

To train the model, we need the team value of the next round as the target.

We will train with the predicted value for the current round as the input and the real value for the next round as the target.
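A minimal sketch of how those (input, target) pairs could be assembled once the target column exists. It assumes a single feature column (`ct_val_pred`) purely for illustration; the real model may use more features. It also drops the last round of each match, since that round has no next round and only carries the placeholder 0 created further down. The helper name `training_pairs` and the toy values are hypothetical.

```python
import pandas as pd


def training_pairs(df, features=('ct_val_pred',), target='nxt_rnd_ct_val'):
    """Return (X, y) pairing current-round inputs with the next-round real value.

    Hypothetical helper. Assumes 'file' and 'round' columns (as after
    reset_index in this notebook). Rows for the last round of each file are
    excluded because their target is only a placeholder.
    """
    last_round = df.groupby('file')['round'].transform('max')
    has_next = df['round'] < last_round
    return df.loc[has_next, list(features)], df.loc[has_next, target]


# Hypothetical toy data: match 0 has three rounds, match 1 has two
toy = pd.DataFrame({
    'file': [0, 0, 0, 1, 1],
    'round': [1, 2, 3, 1, 2],
    'ct_val_pred': [4100.0, 17800.0, 7000.0, 3900.0, 12000.0],
    'nxt_rnd_ct_val': [18450, 9550, 0, 11000, 0],
})

X, y = training_pairs(toy)
print(len(X), y.tolist())  # 3 [18450, 9550, 11000]
```

Filtering on `round < max(round)` per file, rather than on `target != 0`, avoids discarding a genuine next-round value that happens to be 0.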

In [4]:
ct_df.drop(columns=['Unnamed: 0'], inplace=True)
t_df.drop(columns=['Unnamed: 0'], inplace=True)
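The `'Unnamed: 0'` column looks like a saved index. This is an assumption: it suggests the CSVs were written with pandas' default `index=True`. If so, an alternative to dropping it afterwards is to read it as the index with `index_col=0`. A small in-memory sketch with hypothetical values:

```python
import io

import pandas as pd

# Hypothetical two-round frame; to_csv() keeps the default index=True,
# so the RangeIndex is written as an unnamed first column.
df = pd.DataFrame({'file': [0, 0], 'round': [1, 2], 'ct_val_real': [4550, 18450]})
csv_text = df.to_csv()

# Default read: the unnamed first column comes back as 'Unnamed: 0'
default_read = pd.read_csv(io.StringIO(csv_text))
# index_col=0: the same column is used as the index instead
indexed_read = pd.read_csv(io.StringIO(csv_text), index_col=0)

print(default_read.columns.tolist())  # ['Unnamed: 0', 'file', 'round', 'ct_val_real']
print(indexed_read.columns.tolist())  # ['file', 'round', 'ct_val_real']
```

Applied here, that would be `pd.read_csv('../data/results/ct_predicted_value.csv', index_col=0)`, making the `drop` cell unnecessary.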

#### Create the columns **nxt_rnd_ct_val** & **nxt_rnd_t_val** filled with 0

In [5]:
ct_df['nxt_rnd_ct_val'] = 0
t_df['nxt_rnd_t_val'] = 0

In [6]:
files = ct_df['file'].unique()

In [7]:
ct_df.set_index(['file', 'round'], inplace=True)
t_df.set_index(['file', 'round'], inplace=True)
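With the `(file, round)` MultiIndex in place, the filling loop relies on two kinds of lookup. Selecting a single `file` label returns that match's rows indexed by `round`. A full `(file, round)` tuple addresses one row. A small sketch on a hypothetical two-match frame:

```python
import pandas as pd

# Hypothetical frame indexed like ct_df after set_index(['file', 'round'])
indexed = pd.DataFrame({
    'file': [0, 0, 0, 1, 1],
    'round': [1, 2, 3, 1, 2],
    'ct_val_real': [4550, 18450, 9550, 3000, 7000],
}).set_index(['file', 'round'])

# Selecting one 'file' label drops that level: the result is indexed by round only
rounds_of_match_0 = indexed.loc[0].index
print(rounds_of_match_0.tolist())       # [1, 2, 3]
print(rounds_of_match_0[:-1].tolist())  # [1, 2]  (the rounds that have a successor)

# A full (file, round) tuple plus a column name addresses a single cell
print(indexed.loc[(0, 2), 'ct_val_real'])  # 18450
```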

#### Fill the columns with the real team value of the next round

In [8]:
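%%time

n_files = len(files)

for i, file in enumerate(files, start=1):
    # Report progress every 500 matches
    if i % 500 == 0:
        print(f'{i}/{n_files}')
    # Each round except the last takes the REAL team value of the following round;
    # the last round of a match has no successor and keeps the placeholder 0
    for rnd in ct_df.loc[file].index[:-1]:
        ct_df.loc[(file, rnd), 'nxt_rnd_ct_val'] = ct_df.loc[(file, rnd + 1), 'ct_val_real']
        t_df.loc[(file, rnd), 'nxt_rnd_t_val'] = t_df.loc[(file, rnd + 1), 't_val_real']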

500/12185
1000/12185
...
12000/12185
CPU times: user 15min 34s, sys: 107 ms, total: 15min 34s
Wall time: 15min 35s
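The loop above takes about 15 minutes, largely because every `.loc` read and write touches a single cell. A vectorised alternative is sketched below. It is not the notebook's code; the helper name `add_next_round_value` is hypothetical. Within each file it shifts the real value up by one row with `groupby(...).shift(-1)` and fills the last round with 0. It assumes `file` and `round` are ordinary columns and that rounds are consecutive within each file, which is what the loop's `rnd + 1` lookup also assumes.

```python
import pandas as pd


def add_next_round_value(df, value_col, target_col):
    """Hypothetical vectorised replacement for the row-by-row loop.

    Sorts by (file, round), then gives each round the value of the next row
    within the same file. The last round of each file has no successor and
    gets 0, matching the loop's placeholder. Returns a new, sorted frame.
    """
    df = df.sort_values(['file', 'round'])  # returns a copy; the input is untouched
    next_vals = df.groupby('file')[value_col].shift(-1)
    df[target_col] = next_vals.fillna(0).astype(df[value_col].dtype)
    return df


# Hypothetical toy data: match 0 has three rounds, match 1 has two
toy = pd.DataFrame({
    'file': [0, 0, 0, 1, 1],
    'round': [1, 2, 3, 1, 2],
    'ct_val_real': [4550, 18450, 9550, 3000, 7000],
})
out = add_next_round_value(toy, 'ct_val_real', 'nxt_rnd_ct_val')
print(out['nxt_rnd_ct_val'].tolist())  # [18450, 9550, 0, 7000, 0]
```

On the notebook's data it could be called on the frames before the `set_index` step, e.g. `ct_df = add_next_round_value(ct_df, 'ct_val_real', 'nxt_rnd_ct_val')` and `t_df = add_next_round_value(t_df, 't_val_real', 'nxt_rnd_t_val')`.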


### Reset index and save the DataFrames 

In [9]:
ct_df.reset_index(inplace=True)
t_df.reset_index(inplace=True)

In [7]:
ct_df.to_csv('../data/processed/3_base_predict_next_rnd_ct_val.csv', index=True)
t_df.to_csv('../data/processed/3_base_predict_next_rnd_t_val.csv', index=True)
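Because `reset_index` has already turned `file` and `round` back into columns, the remaining RangeIndex carries no information. Writing it with `index=True` makes it reappear as an `Unnamed: ...` column on reload, which is consistent with the extra unnamed columns in the output below. A minimal round-trip sketch with hypothetical values, using `index=False` instead:

```python
import io

import pandas as pd

# Hypothetical frame after reset_index: file/round are ordinary columns
df = pd.DataFrame({'file': [0, 0], 'round': [1, 2], 'nxt_rnd_ct_val': [18450, 9550]})

buffer = io.StringIO()
df.to_csv(buffer, index=False)  # the default RangeIndex is not written
buffer.seek(0)
restored = pd.read_csv(buffer)

# The round trip reproduces the frame without an extra unnamed column
pd.testing.assert_frame_equal(df, restored)
print(restored.columns.tolist())  # ['file', 'round', 'nxt_rnd_ct_val']
```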

### Load Data

In [8]:
ct_df = pd.read_csv('../data/processed/3_base_predict_next_rnd_ct_val.csv')
t_df = pd.read_csv('../data/processed/3_base_predict_next_rnd_t_val.csv')

In [9]:
display(ct_df.head())
display(t_df.head())

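| Unnamed: 0.1 | Unnamed: 0 | file | round | wp_ct_val | nade_ct_val | ct_alive | t_alive | ct_winner | bomb_planted | ct_cons_wins | t_cons_wins | ct_val_real | ct_val_pred | nxt_rnd_ct_val |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 0 | 1 | 1000.0 | 550 | 5 | 5 | 0.5 | 0.5 | 0 | 0 | 4550 | 4078.134589 | 18450 |
| 1 | 1 | 0 | 2 | 10100.0 | 1100 | 4 | 0 | 1.0 | 0.0 | 1 | 0 | 18450 | 17819.702711 | 9550 |
| 2 | 2 | 0 | 3 | 4125.0 | 900 | 0 | 1 | 0.0 | 0.0 | 0 | 1 | 9550 | 7038.468589 | 1600 |
| 3 | 3 | 0 | 4 | 1000.0 | 0 | 0 | 3 | 0.0 | 1.0 | 0 | 2 | 1600 | 1452.468928 | 23350 |
| 4 | 4 | 0 | 5 | 15500.0 | 1400 | 0 | 4 | 0.0 | 1.0 | 0 | 3 | 23350 | 22676.205763 | 26400 |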


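| Unnamed: 0.1 | Unnamed: 0 | file | round | wp_t_val | nade_t_val | ct_alive | t_alive | ct_winner | bomb_planted | ct_cons_wins | t_cons_wins | t_val_real | t_val_pred | nxt_rnd_t_val |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 0 | 1 | 1166.666667 | 1200 | 5 | 5 | 0.5 | 0.5 | 0 | 0 | 3850 | 3943.272665 | 5300 |
| 1 | 1 | 0 | 2 | 3687.5 | 50 | 4 | 0 | 1.0 | 0.0 | 1 | 0 | 5300 | 6290.616771 | 22900 |
| 2 | 2 | 0 | 3 | 11700.0 | 2450 | 0 | 1 | 0.0 | 0.0 | 0 | 1 | 22900 | 19600.790638 | 19650 |
| 3 | 3 | 0 | 4 | 11700.0 | 1600 | 0 | 3 | 0.0 | 1.0 | 0 | 2 | 19650 | 22568.098741 | 21750 |
| 4 | 4 | 0 | 5 | 12750.0 | 1700 | 0 | 4 | 0.0 | 1.0 | 0 | 3 | 21750 | 24459.855175 | 19900 |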
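A quick sanity check on the saved targets: within each file, every non-final round's target should equal the next round's `ct_val_real`. The sketch below is a hypothetical helper, not part of the notebook. It is run on rounds 1 to 5 of file 0 copied from the CT output above. Round 5's successor is not shown, so that round is treated as the last and skipped. Checking against `ct_val_pred` instead fails, which shows that the stored targets are the real next-round values.

```python
import pandas as pd


def check_next_round_target(df, value_col, target_col):
    """Return True if each non-final round's target equals the next round's value.

    Hypothetical helper. Assumes 'file' and 'round' columns and consecutive
    rounds per file; the last round of each file (placeholder 0) is skipped.
    """
    df = df.sort_values(['file', 'round'])
    expected = df.groupby('file')[value_col].shift(-1)
    has_next = expected.notna()
    return bool((df.loc[has_next, target_col] == expected[has_next]).all())


# Rounds 1-5 of file 0, copied from the CT output above
ct_head = pd.DataFrame({
    'file': [0, 0, 0, 0, 0],
    'round': [1, 2, 3, 4, 5],
    'ct_val_real': [4550, 18450, 9550, 1600, 23350],
    'ct_val_pred': [4078.134589, 17819.702711, 7038.468589, 1452.468928, 22676.205763],
    'nxt_rnd_ct_val': [18450, 9550, 1600, 23350, 26400],
})

print(check_next_round_target(ct_head, 'ct_val_real', 'nxt_rnd_ct_val'))  # True
print(check_next_round_target(ct_head, 'ct_val_pred', 'nxt_rnd_ct_val'))  # False
```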
