# Machine Learning Practical Assignment

In [25]:
install.packages("tidyverse")

Installing package into 'C:/Users/Mariano/Documents/R/win-library/3.6'
(as 'lib' is unspecified)


package 'tidyverse' successfully unpacked and MD5 sums checked

The downloaded binary packages are in
	C:\Users\Mariano\AppData\Local\Temp\Rtmp4eCgH9\downloaded_packages


## The caret library for R
**caret** is short for **C**lassification **A**nd **RE**gression **T**raining.

- caret speeds up our work by providing a common interface to hundreds of Machine Learning algorithms: 
    - The package gives access to more than 200 different Machine Learning algorithms.
    - Functions for splitting and sampling the data (data splitting/sampling).
    - Feature selection.
    - Model tuning.

caret documentation: http://topepo.github.io/caret/index.html
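
To make the "common interface" idea concrete, here is a minimal sketch of a typical caret workflow: split the data, train a model with resampling, and score it on held-out rows. It uses `mtcars`, a dataset that ships with R, purely as a stand-in. The formula `mpg ~ wt + hp`, the 80/20 split and the 5-fold setting are illustrative assumptions, not choices made in this assignment.

```r
library(caret)

set.seed(123)

# Stratified split on the outcome: roughly 80% of rows go to training
train_idx <- createDataPartition(mtcars$mpg, p = 0.8, list = FALSE)
train_set <- mtcars[train_idx, ]
test_set  <- mtcars[-train_idx, ]

# The same train() call works for other algorithms by changing `method`
ctrl <- trainControl(method = "cv", number = 5)
fit <- train(mpg ~ wt + hp, data = train_set, method = "lm", trControl = ctrl)

# Evaluate on the held-out rows (RMSE, R squared and MAE)
predictions <- predict(fit, newdata = test_set)
print(postResample(pred = predictions, obs = test_set$mpg))
```

Swapping `method = "lm"` for another supported method name is usually the only change needed to try a different algorithm, although some methods need extra packages installed.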

## The e1071 library
**e1071** implements several Machine Learning algorithms that can be accessed through caret.

In [26]:
library(dplyr)
library(lattice)
library(ggplot2)
library(e1071)
library(caret)
library(DataExplorer)
library(tidyverse)

-- Attaching packages --------------------------------------- tidyverse 1.2.1 --
v tibble  2.1.3     v purrr   0.3.2
v tidyr   0.8.3     v stringr 1.4.0
v readr   1.3.1     v forcats 0.4.0
-- Conflicts ------------------------------------------ tidyverse_conflicts() --
x dplyr::filter() masks stats::filter()
x dplyr::lag()    masks stats::lag()
x purrr::lift()   masks caret::lift()


## **FIFA 2019** dataset
For this assignment we will use a dataset of players from the FIFA 2019 video game: https://www.kaggle.com/karangadiya/fifa19

The dataset contains statistics for each of the football players that appear in the game.

In this assignment we will try to predict each player's value from those statistics.


In [53]:
# Path to the CSV file
fifa_url <- 'datasets/fifa19.csv'

# Load the dataset into a data frame
fifa <- read.table(fifa_url, sep = ",", header = TRUE, quote = "\"", encoding="UTF-8")

We inspect the data before doing any preprocessing.

In [34]:
summary(fifa)

       X               ID                   Name            Age       
 Min.   :    0   Min.   :    16   J. Rodríguez:   11   Min.   :16.00  
 1st Qu.: 4552   1st Qu.:200316   Paulinho    :    8   1st Qu.:21.00  
 Median : 9103   Median :221759   J. Williams :    7   Median :25.00  
 Mean   : 9103   Mean   :214298   R. Williams :    7   Mean   :25.12  
 3rd Qu.:13654   3rd Qu.:236530   Felipe      :    6   3rd Qu.:28.00  
 Max.   :18206   Max.   :246620   J. Gómez    :    6   Max.   :45.00  
                                  (Other)     :18162                  
                                            Photo          Nationality   
 https://cdn.sofifa.org/players/4/19/100803.png:    1   England  : 1662  
 https://cdn.sofifa.org/players/4/19/100899.png:    1   Germany  : 1198  
 https://cdn.sofifa.org/players/4/19/101317.png:    1   Spain    : 1072  
 https://cdn.sofifa.org/players/4/19/101473.png:    1   Argentina:  937  
 https://cdn.sofifa.org/players/4/19/101488.png:    1   France

## First, a quick data cleanup

We drop the columns that are not useful for learning: IDs and URLs.

In [54]:
fifa <- select(fifa,-c('X', 'ID', 'Flag', 'Photo', 'Club.Logo'))

We convert the monetary values to numbers by stripping the euro symbol and the trailing K or M (thousands and millions, respectively).

In [55]:
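# Convert a money string such as "€110.5M" or "€565K" to a number.
# df_value is one row as passed by apply(); its first element is the string.
value_wage <- function(df_value) {
    # Drop the leading euro symbol and the trailing unit letter
    value <- substr(df_value[[1]], start=2, stop=nchar(df_value[[1]])-1)
    # The last character is the unit: M (millions) or K (thousands)
    end <- substr(df_value[[1]], start=nchar(df_value[[1]]), stop=nchar(df_value[[1]]))
    if (identical(end, "M")) {
        value <- as.numeric(value) * 1000000 
    } else if (identical(end, 'K')) {
        value <- as.numeric(value) * 1000
    } else {
        # No recognised unit (for example "€0" or an empty string): use 0
        value <- 0
    }
    return(value)
}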

fifa['Value'] <- apply(select(fifa, 'Value'), 1, value_wage)
fifa['Wage'] <- apply(select(fifa, 'Wage'), 1, value_wage)
fifa['Release.Clause'] <- apply(select(fifa, 'Release.Clause'), 1, value_wage)
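
As an alternative sketch, the same conversion can be written as a vectorized base R function that takes a whole column at once instead of going row by row through `apply`. The name `parse_money` is hypothetical (it is not part of the notebook), and the sample strings below are only illustrations of the format. Like `value_wage`, it returns 0 when the string does not end in M or K.

```r
# Hypothetical helper: vectorized version of the money conversion
parse_money <- function(x) {
  x <- as.character(x)
  # Last character of each string is the unit letter, if any
  suffix <- substr(x, nchar(x), nchar(x))
  multiplier <- ifelse(suffix == "M", 1e6, ifelse(suffix == "K", 1e3, 0))
  # Keep only digits and the decimal point (drops the euro sign and the unit)
  digits <- gsub("[^0-9.]", "", x)
  amount <- suppressWarnings(as.numeric(digits))
  amount[is.na(amount)] <- 0
  amount * multiplier
}

# "\u20ac" is the euro sign, escaped so the script does not depend on file encoding
sample_values <- c("\u20ac110.5M", "\u20ac565K", "\u20ac0", "")
converted <- parse_money(sample_values)
print(converted)
```

With a helper like this, a column could be converted in one call, for example `fifa$Value <- parse_money(fifa$Value)`, assuming the column still holds the original strings.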

We drop some columns that we are not going to use.

In [56]:
fifa <- select(fifa, -c('LS', 'ST', 'RS', 'LW', 'LF', 'CF', 'RF', 'RW', 'LAM', 'CAM', 'RAM', 
                        'LM', 'LCM', 'CM', 'RCM', 'RM', 'LWB', 'LDM', 'CDM', 'RDM', 'RWB', 'LB',
                        'LCB', 'CB', 'RCB', 'RB'))
summary(fifa)

           Name            Age           Nationality       Overall     
 J. Rodríguez:   11   Min.   :16.00   England  : 1662   Min.   :46.00  
 Paulinho    :    8   1st Qu.:21.00   Germany  : 1198   1st Qu.:62.00  
 J. Williams :    7   Median :25.00   Spain    : 1072   Median :66.00  
 R. Williams :    7   Mean   :25.12   Argentina:  937   Mean   :66.24  
 Felipe      :    6   3rd Qu.:28.00   France   :  914   3rd Qu.:71.00  
 J. Gómez    :    6   Max.   :45.00   Brazil   :  827   Max.   :94.00  
 (Other)     :18162                   (Other)  :11597                  
   Potential                    Club           Value                Wage       
 Min.   :48.00                    :  241   Min.   :        0   Min.   :     0  
 1st Qu.:67.00   Arsenal          :   33   1st Qu.:   300000   1st Qu.:  1000  
 Median :71.00   AS Monaco        :   33   Median :   675000   Median :  3000  
 Mean   :71.31   Atlético Madrid  :   33   Mean   :  2410696   Mean   :  9731  
 3rd Qu.:75.00   Borussi

In [57]:
unique(fifa$Position)

## Now let's work on the features

The dataset clearly contains non-numeric features, and these have to be transformed into numeric ones. For that we use one-hot encoding, which creates one new column for each possible value the feature can take.

To do this we use the dummyVars function.

In [58]:
# Apply one-hot encoding to Preferred.Foot, i.e. whether the player is right- or left-footed
dmy <- dummyVars(" ~ .", data = fifa['Preferred.Foot'])
dmy <- data.frame(predict(dmy, newdata = fifa))
# Add the new columns to the data frame
fifa[colnames(dmy)] <- dmy
# Drop the old column
fifa <- select(fifa,-c('Preferred.Foot'))
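
For comparison, base R's `model.matrix` can build the same kind of indicator columns. The sketch below uses a small made-up data frame (`toy`, not the FIFA data) so it runs on its own. Removing the intercept with `- 1` is what yields one column per level instead of dropping a reference level.

```r
# Made-up data, only to illustrate the encoding
toy <- data.frame(
  Name = c("A", "B", "C"),
  Preferred.Foot = factor(c("Left", "Right", "Right"))
)

# One indicator column per factor level (no intercept, so no level is dropped)
indicators <- model.matrix(~ Preferred.Foot - 1, data = toy)

# Attach the indicator columns in place of the original factor
toy_encoded <- cbind(toy["Name"], indicators)
print(toy_encoded)
```

Each row has exactly one 1 among the indicator columns, which is the defining property of a one-hot encoding.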

Before applying the procedure above to the other categorical variables, keep in mind that a column with a very large number of possible values can cause problems (memory use, and difficulty for the algorithm to learn). In that case we have to group the values or discard the column.
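
One generic way to shrink a high-cardinality column is to lump infrequent values into a single catch-all level before encoding. The sketch below is only an illustration under assumed names: `lump_rare_levels`, the `min_count` threshold and the sample nationalities are all hypothetical and not part of the notebook.

```r
# Hypothetical helper: replace values seen fewer than `min_count` times with `other`
lump_rare_levels <- function(x, min_count = 2, other = "Other") {
  x <- as.character(x)
  counts <- table(x)
  rare <- names(counts)[counts < min_count]
  x[x %in% rare] <- other
  factor(x)
}

# Made-up sample: two frequent values and two that appear only once
nationality <- c("England", "England", "Spain", "Spain", "Fiji", "Malta")
lumped <- lump_rare_levels(nationality, min_count = 2)
print(table(lumped))
```

The threshold is a judgment call: a higher `min_count` gives fewer columns after one-hot encoding but hides more detail.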

For example, we group the player's position into a simpler set of categories.

In [59]:
simple_position <- function(df) {
    value <- df['Position'][[1]]
    if (identical(value, 'GK')) {
        return('GK')
    } else if (identical(value, 'CB') | identical(value, 'LCB') |
              identical(value, 'RCB')) {
        return('CB')
    } else if(identical(value, 'RB') | identical(value, 'LB')) {
        return('RLB')
    } else if(identical(value, 'RWB') | identical(value, 'LWB')) {
        return('LRWB')          
    } else if (identical(value, 'LDM') | identical(value, 'RDM')) {
        return('LRDM')
    } else if(identical(value, 'RCM') | identical(value, 'LCM')
             | identical(value, 'CM')){
        return('CM')
    } else if (identical(value, 'LM') | identical(value, 'RM')) {
        return('LRM')
    } else if (identical(value, 'LW') | identical(value, 'RW')
              | identical(value, 'LAM') | identical(value, 'RAM')) {
        return('LRAW')
    } else if (identical(value, 'CF') | identical(value, 'LF') | identical(value, 'RF')) {
        return('CLRF')
    } else if (identical(value, 'RS') | identical(value, 'LS') | identical(value, 'ST')) {
        return('ST')
    } else {
        return(value)
    }
}

fifa['Position'] <- apply(fifa, 1, simple_position)
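
The same grouping can also be expressed as a lookup table, which keeps the mapping in one place and avoids the row-by-row `apply`. This is a sketch, not the notebook's method: `position_groups` and `simplify_positions` are hypothetical names, and the mapping is copied from the `if`/`else` chain above. Positions without an entry (for example CDM or CAM) are returned unchanged, as in the original function.

```r
# Mapping from detailed position to simplified group (copied from simple_position)
position_groups <- c(
  GK = "GK",
  CB = "CB", LCB = "CB", RCB = "CB",
  RB = "RLB", LB = "RLB",
  RWB = "LRWB", LWB = "LRWB",
  LDM = "LRDM", RDM = "LRDM",
  RCM = "CM", LCM = "CM", CM = "CM",
  LM = "LRM", RM = "LRM",
  LW = "LRAW", RW = "LRAW", LAM = "LRAW", RAM = "LRAW",
  CF = "CLRF", LF = "CLRF", RF = "CLRF",
  RS = "ST", LS = "ST", ST = "ST"
)

# Hypothetical vectorized helper: look up each position, keep it if unmapped
simplify_positions <- function(position) {
  position <- as.character(position)
  grouped <- unname(position_groups[position])
  ifelse(is.na(grouped), position, grouped)
}

example_positions <- c("LCB", "CDM", "RS", "")
simplified <- simplify_positions(example_positions)
print(simplified)
```

Applied to the data frame, this would read `fifa$Position <- simplify_positions(fifa$Position)`.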

In [40]:
str(fifa)

'data.frame':	18207 obs. of  58 variables:
 $ Name                    : Factor w/ 17194 levels "A. Ábalos","A. Abang",..: 9676 3192 12552 4169 8661 4458 9684 9892 15466 7822 ...
 $ Age                     : int  31 33 26 27 27 27 32 31 32 25 ...
 $ Nationality             : Factor w/ 164 levels "Afghanistan",..: 7 124 21 141 14 14 36 159 141 138 ...
 $ Overall                 : int  94 94 92 91 91 91 91 91 91 90 ...
 $ Potential               : int  94 94 93 93 92 91 91 91 91 93 ...
 $ Club                    : Factor w/ 652 levels ""," SSV Jahn Regensburg",..: 215 330 437 377 376 138 474 215 474 62 ...
 $ Value                   : num  1.10e+08 7.70e+07 1.18e+08 7.20e+07 1.02e+08 ...
 $ Wage                    : num  565000 405000 290000 260000 355000 340000 420000 455000 380000 94000 ...
 $ Special                 : int  2202 2228 2143 1471 2281 2142 2280 2346 2201 1331 ...
 $ Preferred.Foot          : Factor w/ 3 levels "","Left","Right": 2 3 3 3 3 3 3 3 3 3 ...
 $ International.Rep

### NOTE: Before moving on, perform whatever data cleaning you consider necessary, for example removing nulls, or discarding columns that have many 0s or nulls or that you believe add nothing

### Apply the previous procedures to the columns you consider necessary

In [60]:
# Replace empty values with NA
fifaClean <- mutate_all(fifa, funs(na_if(.,"")))
profile_missing(fifaClean)

feature,num_missing,pct_missing
Name,0,0.0
Age,0,0.0
Nationality,0,0.0
Overall,0,0.0
Potential,0,0.0
Club,241,0.013236667
Value,0,0.0
Wage,0,0.0
Special,0,0.0
International.Reputation,48,0.002636349


In [61]:
# Count the NA values in each column
summarise_all(fifaClean, funs(sum(is.na(.))))

Name,Age,Nationality,Overall,Potential,Club,Value,Wage,Special,International.Reputation,...,SlidingTackle,GKDiving,GKHandling,GKKicking,GKPositioning,GKReflexes,Release.Clause,Preferred.Foot.,Preferred.Foot.Left,Preferred.Foot.Right
0,0,0,0,0,241,0,0,0,48,...,48,48,48,48,48,48,0,0,0,0


In [62]:
# Drop the rows that contain NA values
fifaNotNull <- na.omit(fifaClean)
summarise_all(fifaNotNull, funs(sum(is.na(.))))

Name,Age,Nationality,Overall,Potential,Club,Value,Wage,Special,International.Reputation,...,SlidingTackle,GKDiving,GKHandling,GKKicking,GKPositioning,GKReflexes,Release.Clause,Preferred.Foot.,Preferred.Foot.Left,Preferred.Foot.Right
0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
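
The three cleaning steps above (blank to NA, count NAs per column, drop incomplete rows) can be sketched in base R as follows. The data frame `players` is made up for the example and `blank_to_na` is a hypothetical helper. Note that this version converts factor columns to character, which the dplyr code above does not necessarily do.

```r
# Made-up data with one blank string and one NA
players <- data.frame(
  Name = c("A", "B", "C"),
  Club = c("Arsenal", "", "AS Monaco"),
  Reactions = c(80, NA, 75),
  stringsAsFactors = FALSE
)

# Hypothetical helper: turn empty strings into NA in text columns
blank_to_na <- function(df) {
  for (col in names(df)) {
    if (is.character(df[[col]]) || is.factor(df[[col]])) {
      values <- as.character(df[[col]])
      values[!is.na(values) & values == ""] <- NA
      df[[col]] <- values
    }
  }
  df
}

players_clean <- blank_to_na(players)

# Number of missing values per column
missing_per_column <- colSums(is.na(players_clean))
print(missing_per_column)

# Keep only the rows with no missing values
complete_players <- players_clean[complete.cases(players_clean), ]
print(complete_players)
```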


In [63]:
# Dataset for goalkeepers
fifaGK <- fifaClean %>% filter(Position == "GK")%>% 
                   select(Name, Age, Nationality, Overall, Potential,GKDiving,
                              GKHandling,GKKicking,GKPositioning,GKReflexes, Reactions, International.Reputation) 
head(fifaGK, 20)

Name,Age,Nationality,Overall,Potential,GKDiving,GKHandling,GKKicking,GKPositioning,GKReflexes,Reactions,International.Reputation
De Gea,27,Spain,91,93,90,85,87,88,94,90,4
J. Oblak,25,Slovenia,90,93,86,92,78,88,89,86,3
M. ter Stegen,26,Germany,89,92,87,85,88,85,90,85,3
T. Courtois,26,Belgium,89,90,85,91,72,86,88,84,4
M. Neuer,32,Germany,89,89,90,86,91,87,87,84,5
H. Lloris,31,France,88,88,88,84,68,83,92,85,4
S. Handanovic,33,Slovenia,88,88,87,86,69,89,89,83,3
G. Buffon,40,Italy,88,88,88,87,74,90,83,79,4
K. Navas,31,Costa Rica,87,87,90,81,75,82,90,84,3
Ederson,24,Brazil,86,90,85,80,91,82,87,86,2


In [64]:
# Dataset for central defenders
fifaCB <- fifaClean %>% filter(Position == "CB")%>% 
                   select(Name, Age, Nationality, Overall, Potential,Marking,
                              StandingTackle,SlidingTackle,HeadingAccuracy,Strength, Aggression, Interceptions, ShortPassing,
                          BallControl, Reactions, Jumping, International.Reputation) 
head(fifaCB, 20)

Name,Age,Nationality,Overall,Potential,Marking,StandingTackle,SlidingTackle,HeadingAccuracy,Strength,Aggression,Interceptions,ShortPassing,BallControl,Reactions,Jumping,International.Reputation
Sergio Ramos,32,Spain,91,91,87,92,91,91,83,88,90,78,84,85,93,4
D. Godín,32,Uruguay,90,90,90,89,89,92,88,89,88,79,76,85,91,3
G. Chiellini,33,Italy,89,89,93,93,90,83,89,92,88,59,57,82,89,4
M. Hummels,29,Germany,88,88,88,90,88,87,84,69,92,81,81,87,68,4
Thiago Silva,33,Brazil,88,88,88,89,85,81,82,76,89,80,80,82,90,4
S. Umtiti,24,France,87,92,90,89,86,79,84,81,87,81,77,82,89,3
K. Koulibaly,27,Senegal,87,90,91,88,86,81,94,87,88,66,60,80,81,3
J. Vertonghen,31,Belgium,87,87,90,87,88,80,79,84,89,79,76,84,85,3
Piqué,31,Spain,87,87,91,86,84,83,83,72,88,81,78,84,74,4
V. van Dijk,26,Netherlands,86,88,88,89,84,82,92,81,86,76,73,85,85,3


In [47]:
# Dataset for right/left backs
fifaRLB <- fifaClean %>% filter(Position == "RLB")%>% 
                   select(Name, Age, Nationality, Overall, Potential, Marking,
                              StandingTackle, SlidingTackle, HeadingAccuracy, Aggression, Interceptions, ShortPassing,
                          BallControl, Reactions, Stamina, Crossing, SprintSpeed, International.Reputation) 
head(fifaRLB, 20)

Name,Age,Nationality,Overall,Potential,Marking,StandingTackle,SlidingTackle,HeadingAccuracy,Aggression,Interceptions,ShortPassing,BallControl,Reactions,Stamina,Crossing,SprintSpeed,International.Reputation
Marcelo,30,Brazil,88,88,71,85,86,75,84,85,84,92,88,91,90,82,4
Jordi Alba,29,Spain,87,87,72,84,85,70,75,84,84,84,83,91,87,93,3
Alex Sandro,27,Brazil,86,86,81,84,84,76,82,82,81,81,84,92,84,86,3
Azpilicueta,28,Spain,86,86,88,90,86,76,82,89,81,77,88,87,81,76,3
D. Alaba,26,Austria,85,87,80,82,80,75,69,84,82,83,84,87,81,86,4
Filipe Luís,32,Brazil,85,85,78,84,86,72,81,84,80,83,85,89,86,75,3
Alex Telles,25,Brazil,84,87,80,81,79,74,78,80,83,82,83,92,89,85,3
Carvajal,26,Spain,84,87,83,83,85,70,81,83,81,83,82,84,83,84,3
K. Walker,28,England,84,84,78,84,83,74,78,81,78,78,83,89,81,93,3
Sergi Roberto,26,Spain,83,86,75,83,83,72,68,81,86,83,80,85,85,79,3


In [65]:
# Dataset for right/left wing-backs
fifaLRWB <- fifaClean %>% filter(Position == "LRWB")%>% 
                   select(Name, Age, Nationality, Overall, Potential, Marking,
                              StandingTackle, SlidingTackle, Dribbling, Interceptions, Agility,
                          BallControl, Reactions, Stamina, Crossing, SprintSpeed, International.Reputation) 
head(fifaLRWB, 20)

Name,Age,Nationality,Overall,Potential,Marking,StandingTackle,SlidingTackle,Dribbling,Interceptions,Agility,BallControl,Reactions,Stamina,Crossing,SprintSpeed,International.Reputation
M. Ginter,24,Germany,80,82,82,84,77,62,80,55,72,79,74,68,69,2
P. Kaderábek,26,Czech Republic,80,81,72,79,76,76,72,66,77,82,86,86,78,2
N. Schulz,25,Germany,80,81,74,76,77,79,74,76,78,78,79,83,86,1
S. Coleman,29,Republic of Ireland,80,80,78,84,82,79,80,73,78,80,77,82,75,2
Granell,29,Spain,79,79,55,75,65,76,74,71,80,72,69,73,60,1
Jonny,24,Spain,79,83,77,79,82,73,80,72,76,71,83,77,78,2
J. Hector,28,Germany,79,79,76,79,79,74,81,67,77,80,82,79,77,3
D. Caligiuri,30,Italy,79,79,70,72,68,81,76,76,80,79,85,83,78,2
Pablo Maffeo,20,Spain,78,86,76,81,80,80,69,82,81,64,74,79,86,1
J. Mojica,25,Colombia,78,81,67,78,79,78,72,74,79,70,70,79,90,1


In [66]:
# Dataset for central defensive midfielders
fifaCDM <- fifaClean %>% filter(Position == "CDM")%>% 
                   select(Name, Age, Nationality, Overall, Potential, ShortPassing,
                              StandingTackle, LongPassing, Vision, Interceptions, Marking,
                          BallControl, Reactions, Stamina, Aggression, Strength, International.Reputation) 
head(fifaCDM, 20)

Name,Age,Nationality,Overall,Potential,ShortPassing,StandingTackle,LongPassing,Vision,Interceptions,Marking,BallControl,Reactions,Stamina,Aggression,Strength,International.Reputation
Sergio Busquets,29,Spain,89,89,89,86,82,87,87,90,88,87,86,85,77,4
Casemiro,26,Brazil,88,90,85,90,82,77,87,88,78,84,87,87,89,3
M. Pjanic,28,Bosnia Herzegovina,86,86,89,74,85,88,78,75,89,84,78,70,66,3
Fernandinho,33,Brazil,86,86,85,85,81,75,88,85,82,86,79,87,76,3
Fabinho,24,Brazil,84,88,83,86,78,75,84,83,82,83,92,85,79,3
William Carvalho,26,Portugal,84,86,84,85,86,83,84,84,79,75,84,75,90,3
N. Matic,29,Serbia,84,84,84,84,83,73,86,83,79,83,84,80,85,3
E. Banega,30,Argentina,84,84,88,69,87,88,76,75,86,77,74,73,67,3
Danilo Pereira,26,Portugal,83,86,82,84,79,73,84,83,78,76,83,85,89,3
K. Strootman,28,Netherlands,83,83,87,84,84,79,83,75,84,83,79,86,81,3


In [67]:
# Dataset for central midfielders
fifaCM <- fifaClean %>% filter(Position == "CM")%>% 
                   select(Name, Age, Nationality, Overall, Potential, ShortPassing,
                              StandingTackle, LongPassing, Vision, Interceptions, LongShots,
                          BallControl, Reactions, Stamina, Dribbling, Positioning, International.Reputation)  
head(fifaCM, 20)

Name,Age,Nationality,Overall,Potential,ShortPassing,StandingTackle,LongPassing,Vision,Interceptions,LongShots,BallControl,Reactions,Stamina,Dribbling,Positioning,International.Reputation
K. De Bruyne,27,Belgium,91,92,92,58,91,94,61,91,91,91,90,86,87,4
L. Modric,32,Croatia,91,91,93,76,88,92,83,82,93,90,89,90,79,4
T. Kroos,28,Germany,90,90,92,79,93,86,82,92,90,89,75,81,79,4
David Silva,32,Spain,90,90,93,53,87,92,50,75,94,90,78,89,89,4
M. Hamšík,30,Slovakia,87,87,88,73,83,86,72,83,87,88,84,86,88,3
I. Rakitic,30,Croatia,87,87,87,74,90,86,75,90,87,77,84,84,79,4
M. Verratti,25,Italy,86,89,90,83,89,87,84,58,88,85,77,90,71,3
Thiago,27,Spain,86,86,90,63,87,86,78,79,90,84,75,90,79,3
S. Milinkovic-Savic,23,Serbia,85,90,85,77,85,85,78,80,87,80,85,86,79,2
J. Kimmich,23,Germany,85,88,85,81,80,79,79,69,85,85,85,80,80,3


In [51]:
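# Dataset for left/right central midfielders
# Note: simple_position() as defined above already groups LCM and RCM under "CM",
# so this filter only returns rows if that mapping is changed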
fifaLRCM <- fifaClean %>% filter(Position == "LRCM")%>% 
                   select(Name, Age, Nationality, Overall, Potential, ShortPassing,
                              StandingTackle, LongPassing, Vision, Interceptions, LongShots,
                          BallControl, Reactions, Stamina, Dribbling, Positioning, International.Reputation)  
head(fifaLRCM, 20)

Name,Age,Nationality,Overall,Potential,ShortPassing,StandingTackle,LongPassing,Vision,Interceptions,LongShots,BallControl,Reactions,Stamina,Dribbling,Positioning,International.Reputation
Thiago,27,Spain,86,86,90,63,87,86,78,79,90,84,75,90,79,3
S. Milinkovic-Savic,23,Serbia,85,90,85,77,85,85,78,80,87,80,85,86,79,2
Jorginho,26,Italy,84,87,89,78,87,87,78,62,85,83,79,82,72,2
I. Gündogan,27,Germany,84,84,88,68,83,86,77,73,86,85,69,85,79,3
N. Keïta,23,Guinea,83,88,88,62,78,81,75,73,88,82,82,88,74,2
C. Tolisso,23,France,83,88,84,78,86,80,76,81,81,83,87,77,79,2
A. Rabiot,23,France,83,87,85,81,83,80,80,79,83,83,82,79,76,2
L. Goretzka,23,Germany,83,88,84,79,79,82,83,78,83,83,86,81,77,3
J. Draxler,24,Germany,83,86,84,64,79,84,66,82,89,83,69,88,79,3
Cesc Fàbregas,31,Spain,83,83,90,59,89,91,50,70,85,81,56,79,72,4


In [52]:
# Dataset for central attacking midfielders
fifaCAM <- fifaClean %>% filter(Position == "CAM")%>% 
                   select(Name, Age, Nationality, Overall, Potential, ShortPassing,
                              Agility, Acceleration, Vision, Finishing, LongShots,
                          BallControl, Reactions, ShotPower, Dribbling, Positioning, International.Reputation)  
head(fifaCAM, 20)

Name,Age,Nationality,Overall,Potential,ShortPassing,Agility,Acceleration,Vision,Finishing,LongShots,BallControl,Reactions,ShotPower,Dribbling,Positioning,International.Reputation
A. Griezmann,27,France,89,90,83,90,88,83,90,82,90,90,80,88,91,4
C. Eriksen,26,Denmark,88,91,91,79,75,91,80,89,91,88,84,84,83,3
Roberto Firmino,26,Brazil,86,87,86,80,78,85,87,76,88,86,81,87,87,3
T. Müller,28,Germany,86,86,83,75,73,85,87,80,82,91,78,75,92,4
M. Özil,29,Germany,86,86,89,79,72,91,73,75,90,84,70,84,83,4
N. Fekir,24,France,85,89,83,90,79,81,82,82,89,80,84,90,81,3
A. Vidal,31,Chile,85,85,83,74,60,80,75,85,82,84,86,76,80,4
R. Nainggolan,30,Belgium,85,85,84,76,78,76,75,86,85,87,84,80,86,3
D. Payet,31,France,84,84,84,79,75,87,78,82,90,75,80,86,79,3
Anderson Talisca,24,Brazil,83,90,81,76,77,81,80,88,84,79,84,82,86,2


In [68]:
# Dataset for left/right midfielders
fifaLRM <- fifaClean %>% filter(Position == "LRM")%>% 
                   select(Name, Age, Nationality, Overall, Potential, Crossing, ShortPassing,
                              Agility, Acceleration, Vision, SprintSpeed, LongPassing,
                          BallControl, Reactions, Dribbling, Stamina, Positioning, International.Reputation)  
head(fifaLRM, 20)

Name,Age,Nationality,Overall,Potential,Crossing,ShortPassing,Agility,Acceleration,Vision,SprintSpeed,LongPassing,BallControl,Reactions,Dribbling,Stamina,Positioning,International.Reputation
K. Mbappé,19,France,88,95,77,82,92,96,82,96,73,91,87,90,83,88,3
M. Salah,26,Egypt,88,89,78,82,91,94,82,91,72,88,91,89,84,90,3
P. Aubameyang,29,Gabon,88,88,77,77,76,93,77,95,64,82,87,79,76,90,3
S. Mané,26,Senegal,86,87,73,79,91,95,82,93,71,86,86,87,84,87,3
Douglas Costa,27,Brazil,86,86,84,84,93,97,84,93,68,91,84,92,78,76,3
M. Reus,29,Germany,86,86,79,86,86,86,86,85,75,86,87,87,73,88,4
Koke,26,Spain,85,86,86,90,74,71,87,68,89,86,85,82,90,84,3
Y. Brahimi,28,Algeria,85,85,79,79,92,87,79,75,72,86,84,93,85,83,3
I. Perišic,29,Croatia,85,85,83,81,78,84,79,88,77,85,81,84,89,85,3
B. Matuidi,31,France,85,85,75,83,83,79,76,77,77,79,84,78,94,72,3


In [69]:
# Dataset for left/right attacking wingers
fifaLRAW <- fifaClean %>% filter(Position == "LRAW")%>% 
                   select(Name, Age, Nationality, Overall, Potential, Crossing, ShortPassing,
                              HeadingAccuracy, Acceleration, Vision, SprintSpeed, ShotPower,
                          BallControl, Reactions, Dribbling, LongShots, Positioning, International.Reputation)  
head(fifaLRAW, 20)

Name,Age,Nationality,Overall,Potential,Crossing,ShortPassing,HeadingAccuracy,Acceleration,Vision,SprintSpeed,ShotPower,BallControl,Reactions,Dribbling,LongShots,Positioning,International.Reputation
Neymar Jr,26,Brazil,92,93,79,84,62,94,87,90,80,95,94,96,82,89,5
J. Rodríguez,26,Colombia,88,89,90,89,62,73,89,67,86,90,85,85,92,80,4
L. Insigne,27,Italy,88,88,86,85,56,94,87,86,75,93,83,90,84,83,3
Isco,26,Spain,88,91,75,89,55,75,89,69,69,95,77,94,87,78,3
Coutinho,26,Brazil,88,89,79,88,48,89,90,75,83,92,83,91,93,84,3
L. Sané,22,Germany,86,92,83,79,72,93,82,96,86,85,81,88,78,84,2
Bernardo Silva,23,Portugal,86,91,85,85,51,84,86,74,70,91,82,92,72,83,2
R. Sterling,23,England,86,89,77,84,38,95,77,92,73,87,87,88,73,87,3
Marco Asensio,22,Spain,85,92,82,83,50,85,84,82,86,85,82,86,88,82,3
R. Mahrez,27,Algeria,85,85,81,82,48,88,81,83,79,90,77,91,81,80,3


In [70]:
# Dataset for center/left/right forwards
fifaCLRF <- fifaClean %>% filter(Position == "CLRF")%>% 
                   select(Name, Age, Nationality, Overall, Potential, Finishing, ShortPassing,
                              HeadingAccuracy, Acceleration, Vision, SprintSpeed, ShotPower,
                          BallControl, Reactions, Dribbling, LongShots, Positioning, International.Reputation)  
head(fifaCLRF, 20)

Name,Age,Nationality,Overall,Potential,Finishing,ShortPassing,HeadingAccuracy,Acceleration,Vision,SprintSpeed,ShotPower,BallControl,Reactions,Dribbling,LongShots,Positioning,International.Reputation
L. Messi,31,Argentina,94,94,95,90,70,91,94,86,85,96,95,97,94,94,5
E. Hazard,27,Belgium,91,91,84,89,61,94,89,88,82,94,90,95,80,87,4
P. Dybala,24,Argentina,89,94,84,87,68,87,87,83,82,92,86,92,88,84,3
D. Mertens,31,Belgium,87,87,86,82,35,93,83,85,80,89,88,91,81,87,3
Iniesta,34,Spain,86,86,70,90,54,70,93,67,65,92,86,90,71,81,4
Luis Alberto,25,Spain,82,85,80,88,49,77,87,72,77,88,83,86,77,81,2
Jonathan Viera,28,Spain,82,82,78,84,41,86,84,76,70,82,81,84,79,72,2
S. Giovinco,31,Italy,82,82,80,80,34,88,81,80,80,86,80,86,81,82,2
A. Milik,24,Poland,81,88,88,63,78,69,60,74,85,78,75,74,82,81,3
L. Stindl,29,Germany,81,81,79,85,64,74,83,67,81,83,84,78,83,84,2


In [71]:
# Dataset for strikers
fifaST <- fifaClean %>% filter(Position == "ST")%>% 
                   select(Name, Age, Nationality, Overall, Potential, Finishing, Strength,
                              HeadingAccuracy, Acceleration, Volleys, SprintSpeed, ShotPower,
                          BallControl, Reactions, Dribbling, LongShots, Positioning, International.Reputation)  
head(fifaST, 20)

Name,Age,Nationality,Overall,Potential,Finishing,Strength,HeadingAccuracy,Acceleration,Volleys,SprintSpeed,ShotPower,BallControl,Reactions,Dribbling,LongShots,Positioning,International.Reputation
Cristiano Ronaldo,33,Portugal,94,94,94,79,89,89,87,91,95,94,96,88,93,95,5
L. Suárez,31,Uruguay,91,91,93,83,77,86,88,75,86,90,92,87,85,92,5
R. Lewandowski,29,Poland,90,90,91,84,85,77,89,78,88,89,90,85,84,91,4
H. Kane,24,England,89,91,94,84,85,68,84,72,88,84,91,80,85,93,3
E. Cavani,31,Uruguay,89,89,89,78,89,75,90,76,87,82,91,80,79,93,4
S. Agüero,30,Argentina,89,89,93,73,77,88,85,80,88,89,90,89,83,92,4
G. Bale,28,Wales,88,88,86,80,84,94,85,95,92,85,85,87,91,85,4
G. Higuaín,30,Argentina,88,88,92,85,80,73,90,73,86,85,86,84,80,92,4
M. Icardi,25,Argentina,87,90,91,76,91,77,85,78,84,81,88,77,70,92,3
R. Lukaku,25,Belgium,87,89,87,94,86,77,79,90,88,72,86,80,74,89,3
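
The per-position cells above all follow the same pattern: filter by `Position`, then keep a few common columns plus the attributes relevant to that role. A small helper could capture that pattern. The sketch below uses base R and a made-up data frame so it runs on its own. `position_subset` is a hypothetical name, and the set of "common" columns is inferred from the cells above.

```r
# Hypothetical helper: rows for one position, with common columns plus role attributes
position_subset <- function(players, position, attributes) {
  common <- c("Name", "Age", "Nationality", "Overall", "Potential")
  columns <- c(common, attributes, "International.Reputation")
  # which() skips rows where Position is NA
  rows <- which(players$Position == position)
  players[rows, columns, drop = FALSE]
}

# Made-up data, only to exercise the helper
toy <- data.frame(
  Name = c("Keeper One", "Striker One", "Keeper Two"),
  Age = c(27, 24, 31),
  Nationality = c("Spain", "England", "France"),
  Overall = c(85, 80, 78),
  Potential = c(88, 86, 78),
  Position = c("GK", "ST", "GK"),
  GKDiving = c(84, 12, 77),
  Finishing = c(10, 82, 14),
  International.Reputation = c(3, 2, 1),
  stringsAsFactors = FALSE
)

toy_gk <- position_subset(toy, "GK", c("GKDiving"))
print(toy_gk)
```

With the real data, a call such as `position_subset(fifaClean, "ST", c("Finishing", "Strength"))` would mirror the strikers cell with a shorter attribute list.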


## Separate the Value column and try to predict it with a linear regression
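
As a starting point, the sketch below shows the general shape of this step with base R's `lm`: hold out part of the rows, fit on the rest, separate the target from the features on the held-out part, and measure the error. The data is synthetic and generated inside the block (it is not the FIFA dataset). The two predictors, the 80/20 split and the linear relationship are assumptions made only so the example runs on its own.

```r
set.seed(42)

# Synthetic stand-in data: NOT the real FIFA values
n <- 200
players <- data.frame(
  Overall = sample(50:90, n, replace = TRUE),
  Age = sample(17:38, n, replace = TRUE)
)
players$Value <- 50000 * players$Overall - 20000 * players$Age + rnorm(n, sd = 10000)

# Hold out 20% of the rows for evaluation
test_idx <- sample(n, size = 0.2 * n)
train <- players[-test_idx, ]
test <- players[test_idx, ]

# Separate the target column from the features on the held-out rows
features <- test[, setdiff(names(test), "Value"), drop = FALSE]
target <- test$Value

# Fit a linear regression of Value on all remaining columns
model <- lm(Value ~ ., data = train)

# Predict and compute the root mean squared error on the held-out rows
predictions <- predict(model, newdata = features)
rmse <- sqrt(mean((target - predictions)^2))
print(rmse)
```

On the real data, non-numeric columns would need to be encoded or dropped first, as discussed earlier in the notebook.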

## Use cross-validation to estimate the model's error
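
To show what k-fold cross-validation does, here is a manual sketch in base R: the rows are assigned to k folds, each fold is held out once while the model is fit on the others, and the per-fold errors are averaged. The data is synthetic and generated inside the block, and k = 5 is an arbitrary choice. caret's `trainControl(method = "cv")` combined with `train` automates the same procedure.

```r
set.seed(7)

# Synthetic stand-in data: NOT the real FIFA values
n <- 150
players <- data.frame(
  Overall = sample(50:90, n, replace = TRUE),
  Age = sample(17:38, n, replace = TRUE)
)
players$Value <- 50000 * players$Overall - 20000 * players$Age + rnorm(n, sd = 10000)

# Randomly assign each row to one of k folds of equal size
k <- 5
fold_id <- sample(rep(seq_len(k), length.out = n))

fold_rmse <- numeric(k)
for (fold in seq_len(k)) {
  train <- players[fold_id != fold, ]
  held_out <- players[fold_id == fold, ]

  # Fit on k - 1 folds, evaluate on the remaining one
  model <- lm(Value ~ Overall + Age, data = train)
  pred <- predict(model, newdata = held_out)
  fold_rmse[fold] <- sqrt(mean((held_out$Value - pred)^2))
}

# The cross-validated error estimate is the mean over folds
cv_rmse <- mean(fold_rmse)
print(fold_rmse)
print(cv_rmse)
```

The spread of `fold_rmse` across folds gives a rough sense of how stable the error estimate is.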