# Extracting the model and preprocessing the dataset

**ATTENTION:**

Notebook language: **R**

A model from the paper [Explainable Expected Goal Models for Performance Analysis in Football Analytics](https://arxiv.org/pdf/2206.07212.pdf) will be used for further research. It is a `ranger` model trained on an oversampled dataset.

## Preparing preprocessed dataset

In [2]:
df <- read.csv('./data/raw_data.csv')
df <- df[,-1]
head(df)

,league,id,minute,result,X,Y,player,h_a,player_id,situation,season,shotType,match_id,home_team,away_team,home_goals,away_goals,date,player_assisted,lastAction
,<chr>,<int>,<int>,<chr>,<dbl>,<dbl>,<chr>,<chr>,<int>,<chr>,<int>,<chr>,<int>,<chr>,<chr>,<int>,<int>,<chr>,<chr>,<chr>
1,Ligue_1,425095,7,MissedShots,0.964,0.654,Myron Boadu,h,9612,OpenPlay,2021,LeftFoot,17822,Monaco,Nantes,1,1,2021-08-06 19:00:00,,BallRecovery
2,Ligue_1,425098,13,Goal,0.925,0.431,Gelson Martins,h,7012,OpenPlay,2021,RightFoot,17822,Monaco,Nantes,1,1,2021-08-06 19:00:00,Caio Henrique,Throughball
3,Ligue_1,425100,24,BlockedShot,0.785,0.388,Kevin Volland,h,83,OpenPlay,2021,LeftFoot,17822,Monaco,Nantes,1,1,2021-08-06 19:00:00,,
4,Ligue_1,425101,24,MissedShots,0.761,0.525,Jean Lucas,h,7687,OpenPlay,2021,RightFoot,17822,Monaco,Nantes,1,1,2021-08-06 19:00:00,,Rebound
5,Ligue_1,425102,30,MissedShots,0.936,0.415,Kevin Volland,h,83,FromCorner,2021,Head,17822,Monaco,Nantes,1,1,2021-08-06 19:00:00,Jean Lucas,Aerial
6,Ligue_1,425104,42,MissedShots,0.751,0.511,Aurelien Tchouameni,h,6560,OpenPlay,2021,RightFoot,17822,Monaco,Nantes,1,1,2021-08-06 19:00:00,Caio Henrique,Pass


Remove the `OwnGoal` rows. This is also the first preprocessing step.

In [None]:
library(dplyr)

In [11]:
df <- df %>% filter(result != "OwnGoal")
head(df)

,league,id,minute,result,X,Y,player,h_a,player_id,situation,season,shotType,match_id,home_team,away_team,home_goals,away_goals,date,player_assisted,lastAction
,<chr>,<int>,<int>,<chr>,<dbl>,<dbl>,<chr>,<chr>,<int>,<chr>,<int>,<chr>,<int>,<chr>,<chr>,<int>,<int>,<chr>,<chr>,<chr>
1,Ligue_1,425095,7,MissedShots,0.964,0.654,Myron Boadu,h,9612,OpenPlay,2021,LeftFoot,17822,Monaco,Nantes,1,1,2021-08-06 19:00:00,,BallRecovery
2,Ligue_1,425098,13,Goal,0.925,0.431,Gelson Martins,h,7012,OpenPlay,2021,RightFoot,17822,Monaco,Nantes,1,1,2021-08-06 19:00:00,Caio Henrique,Throughball
3,Ligue_1,425100,24,BlockedShot,0.785,0.388,Kevin Volland,h,83,OpenPlay,2021,LeftFoot,17822,Monaco,Nantes,1,1,2021-08-06 19:00:00,,
4,Ligue_1,425101,24,MissedShots,0.761,0.525,Jean Lucas,h,7687,OpenPlay,2021,RightFoot,17822,Monaco,Nantes,1,1,2021-08-06 19:00:00,,Rebound
5,Ligue_1,425102,30,MissedShots,0.936,0.415,Kevin Volland,h,83,FromCorner,2021,Head,17822,Monaco,Nantes,1,1,2021-08-06 19:00:00,Jean Lucas,Aerial
6,Ligue_1,425104,42,MissedShots,0.751,0.511,Aurelien Tchouameni,h,6560,OpenPlay,2021,RightFoot,17822,Monaco,Nantes,1,1,2021-08-06 19:00:00,Caio Henrique,Pass
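
A quick sanity check on a toy data frame illustrates what the filter does. The `shots` table below is made up for illustration; only the `result` column name and the `OwnGoal` label come from the data above.

```r
library(dplyr)

# Hypothetical toy data; only the `result` column mirrors the real dataset.
shots <- data.frame(
  id     = 1:5,
  result = c("Goal", "OwnGoal", "MissedShots", "BlockedShot", "OwnGoal")
)

filtered <- shots %>% filter(result != "OwnGoal")

# Compare the label counts before and after filtering.
table(shots$result)
table(filtered$result)

# filter() also drops rows where the condition evaluates to NA, so it is
# worth counting missing labels before relying on the resulting row count.
sum(is.na(shots$result))
```

Running the same two `table()` calls on `df` before and after the filter would show how many own goals were removed from the real data.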


Overwrite `raw_data.csv` with the filtered data to avoid conflicts in the future.

In [12]:
write.csv(df, './data/raw_data.csv')
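
By default `write.csv` stores the row names as an extra, unnamed first column, which is presumably why the first column is dropped with `df[,-1]` right after reading the file. The round trip can be sketched on a toy data frame (`toy` and the temporary path are illustrative, not part of the project):

```r
# Hypothetical toy data standing in for the shots table.
toy <- data.frame(result = c("Goal", "MissedShots"), minute = c(13L, 7L))

path <- tempfile(fileext = ".csv")
write.csv(toy, path)        # row names are written as an unnamed first column

back <- read.csv(path)
ncol(toy)                   # 2
ncol(back)                  # 3: the row-name column comes back as data

back <- back[, -1]          # same step as `df <- df[,-1]` above
identical(names(back), names(toy))
```

Since `read.csv` has to invent a name for that unnamed column, it is worth checking `names(df)` after re-reading the real file, which already has columns of its own called `X` and `Y`.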

### Preprocessing

In [9]:
source('./scripts/preprocess.R')  # preprocessing steps from the paper

In [10]:
df_preprocessed <- preprocess(df)
head(df_preprocessed)

,status,minute,h_a,situation,shotType,lastAction,distanceToGoal,angleToGoal
,<dbl>,<dbl>,<fct>,<fct>,<fct>,<fct>,<dbl>,<dbl>
1,0,7,h,OpenPlay,LeftFoot,BallRecovery,12.554569,10.86049
2,1,13,h,OpenPlay,RightFoot,Throughball,8.497323,44.42738
3,0,24,h,OpenPlay,LeftFoot,,23.388803,17.20585
4,0,24,h,OpenPlay,RightFoot,Rebound,25.298204,16.33905
5,0,30,h,FromCorner,Head,Aerial,7.967234,44.48587
6,0,42,h,OpenPlay,RightFoot,Pass,26.241467,15.82464
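
`preprocess.R` is not shown in this notebook, so the exact derivation of `distanceToGoal` and `angleToGoal` is not confirmed here. The sketch below is an assumption: it treats `X` and `Y` as fractions of a 105 m long pitch with a 7.32 m wide goal, and its constants were chosen because they reproduce the values printed above for the first two shots. The function name `shot_features` is hypothetical.

```r
# Assumed geometry (not taken from preprocess.R): 105 m pitch length,
# 7.32 m goal width, X/Y given as fractions of pitch length and width.
shot_features <- function(X, Y) {
  dx <- 105 - X * 105       # distance to the goal line along the pitch
  dy <- 32.5 - Y * 68       # lateral offset from the assumed goal centre
  d2 <- dx^2 + dy^2
  data.frame(
    distanceToGoal = sqrt(d2),
    angleToGoal    = abs(atan(7.32 * dx / (d2 - (7.32 / 2)^2)) * 180 / pi)
  )
}

# First two shots from the raw data shown earlier.
features <- shot_features(X = c(0.964, 0.925), Y = c(0.654, 0.431))
features
```

For these two shots the sketch gives distances of about 12.55 and 8.50 and angles of about 10.86 and 44.43 degrees, in line with the first two rows of `df_preprocessed`.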


In [13]:
write.csv(df_preprocessed, './data/data_preprocessed.csv')

## Exporting the model to `Python` format

In [None]:
model <- readRDS('./model.model.RDS')

In [None]:
# install.packages("reticulate")

In [None]:
library(reticulate)

In [None]:
py_save_object(model$forest, './model.pickle')
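
Before relying on the pickle, it may help to check that the export round-trips. The sketch below uses a small hypothetical list in place of `model$forest` (the field names only imitate a forest object) and needs a working Python installation that `reticulate` can find. How a real `ranger` forest is laid out after conversion is not shown in this notebook and should be inspected separately.

```r
library(reticulate)

# Hypothetical stand-in for `model$forest`: a plain named list.
toy_forest <- list(num.trees = 2L, split.values = list(c(0.5, 0), c(1.5, 0)))

path <- tempfile(fileext = ".pickle")
py_save_object(toy_forest, path)   # pickles the Python conversion of the R object

restored <- py_load_object(path)   # loads the pickle and converts it back to R
str(restored)
```

A named R list is converted to a Python `dict`, so the exported file should be readable from Python with the standard `pickle` module.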
