# Goalkeepers

We will select the best goalkeepers according to:

    1. Goals conceded
    2. Saves
    3. Save percentage (saves / shots on target)

We import the "pandas" library and load the initial dataset.

In [1]:
import pandas as pd

# Create the DataFrame and drop duplicate rows

df_gk = pd.read_csv('/Users/alexcar934/Desktop/EDA_bien/src/data/raw/datasetMatchGkInfo2020-La-Liga.csv')
df_gk = df_gk.drop_duplicates()


### We are interested in goalkeepers who have played more than 900 minutes.

With only a few minutes played, the save percentage (saves / shots on target) is not a reliable figure.

    Example: if a goalkeeper has played one match and faced a single shot on target, saving it gives a save percentage of 100%.
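The small-sample effect described above can be sketched with a toy calculation. The function name `save_percentage` and the 70-from-100 workload are hypothetical and chosen only for illustration; they do not come from the dataset.

```python
def save_percentage(saves: int, shots_on_target: int) -> float:
    """Return saves as a percentage of shots on target."""
    if shots_on_target == 0:
        raise ValueError("save percentage is undefined with no shots on target")
    return 100.0 * saves / shots_on_target


# One match, one shot faced, one save: a perfect but uninformative figure.
one_match = save_percentage(saves=1, shots_on_target=1)

# Hypothetical larger workload: 70 saves from 100 shots on target.
long_run = save_percentage(saves=70, shots_on_target=100)

print(one_match, long_run)
```

A single additional shot would move the first figure from 100% to 50%, while it would barely change the second, which is the reason for requiring a minimum number of minutes.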

In [2]:
# Total minutes per goalkeeper, keeping only those with more than 900 minutes.
# The result stays a Series indexed by player, which the later cells rely on.

df_minutos = df_gk.groupby('Player')['Min'].sum()
df_minutos = df_minutos.loc[df_minutos > 900]
df_minutos




Player
Aitor Fernández     1170
Andrés Fernández     990
David Soria         1080
Fernando Pacheco    1440
Jan Oblak           1260
Jaume Doménech      1440
Jeremías Ledesma    1170
Marko Dmitrović     1440
Rui Silva           1260
Sergio Asenjo       1440
Sergio Herrera      1158
Thibaut Courtois    1440
Unai Simón          1440
Yassine Bounou       990
Álex Remiro         1530
Édgar Badía         1260
Name: Min, dtype: int64
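Note that the filter is strict: a goalkeeper with exactly 900 minutes is excluded. A minimal sketch on toy data, assuming every appearance lasts a full 90 minutes (players "A" and "B" are hypothetical):

```python
import pandas as pd

# Hypothetical appearances: A plays 10 full matches, B plays 11.
toy = pd.DataFrame({
    "Player": ["A"] * 10 + ["B"] * 11,
    "Min": [90] * 21,
})

minutes = toy.groupby("Player")["Min"].sum()

# Same strict comparison as in the cell above: 900 minutes is not enough.
eligible = minutes[minutes > 900]
print(eligible)
```

Under that assumption, "more than 900 minutes" means at least 11 full matches, which is consistent with the lowest totals in the output above (990).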

### We want the average number of goals conceded per match

Raw totals are not comparable between goalkeepers: the number of goals conceded depends on whether a goalkeeper has played 900 or 1,400 minutes.
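As a sketch of why averaging matters, the toy example below compares the per-match mean with a per-90-minutes rate. The per-90 rate is an alternative shown only for comparison and is not what this notebook computes; the rows are hypothetical, and only the column names `Min` and `Shot Stopping|GA` are taken from the dataset.

```python
import pandas as pd

# Hypothetical match rows; B's second appearance lasts only 45 minutes.
toy = pd.DataFrame({
    "Player": ["A", "A", "B", "B"],
    "Min": [90, 90, 90, 45],
    "Shot Stopping|GA": [1, 0, 2, 1],
})

# Mean goals conceded per appearance (the approach used in this notebook).
ga_per_match = toy.groupby("Player")["Shot Stopping|GA"].mean()

# Alternative: goals conceded per 90 minutes actually played.
totals = toy.groupby("Player")[["Min", "Shot Stopping|GA"]].sum()
ga_per_90 = totals["Shot Stopping|GA"] / totals["Min"] * 90

print(ga_per_match)
print(ga_per_90)
```

The two measures agree when every appearance is a full match and diverge when a goalkeeper is substituted, so the per-match mean is a reasonable simplification for goalkeepers who mostly play 90 minutes.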

In [3]:
# Mean goals conceded per match for each goalkeeper, lowest first

df_goals_mean = df_gk.groupby('Player')['Shot Stopping|GA'].mean().sort_values(ascending=True)
df_goals_mean


Player
Jan Oblak                0.357143
Yassine Bounou           0.636364
Rubén Yáñez              0.666667
Álex Remiro              0.705882
Thibaut Courtois         0.937500
Marko Dmitrović          1.000000
Cárdenas                 1.000000
Tomáš Vaclík             1.000000
Neto                     1.000000
Marc-André ter Stegen    1.000000
Sergio Asenjo            1.062500
Fernando Pacheco         1.125000
Jeremías Ledesma         1.153846
David Soria              1.166667
Unai Simón               1.187500
Édgar Badía              1.214286
Rubén Blanco             1.333333
Aitor Fernández          1.384615
Jordi Masip              1.400000
Iván Villar              1.428571
Sergio Herrera           1.461538
Rui Silva                1.500000
Jaume Doménech           1.500000
Alberto Cifuentes        1.500000
Andrés Fernández         1.545455
Álvaro Fernández         1.600000
Joel Robles              1.666667
Roberto                  1.666667
Rubén                    1.666667
Koke Ve

### Combining the DataFrames

We need the values of these statistics:

    1. Goals conceded
    2. Saves
    3. Save percentage (saves / shots on target)

But we only want them for goalkeepers who have played more than 900 minutes.
 

In [4]:
# Keep only the means of goalkeepers who have played more than 900 minutes.
df_goals = df_goals_mean[df_goals_mean.index.isin(df_minutos.index)]

# Divide by the total so that each value is a share between 0 and 1 (the shares sum to 1)
df_goals = df_goals / df_goals.sum()
df_goals



Player
Jan Oblak           0.019910
Yassine Bounou      0.035475
Álex Remiro         0.039351
Thibaut Courtois    0.052262
Marko Dmitrović     0.055747
Sergio Asenjo       0.059231
Fernando Pacheco    0.062715
Jeremías Ledesma    0.064323
David Soria         0.065038
Unai Simón          0.066199
Édgar Badía         0.067692
Aitor Fernández     0.077188
Sergio Herrera      0.081476
Rui Silva           0.083620
Jaume Doménech      0.083620
Andrés Fernández    0.086154
Name: Shot Stopping|GA, dtype: float64

In [5]:
# Do the same with the mean number of saves per match, highest first
df_saves_mean = df_gk.groupby('Player')['Shot Stopping|Saves'].mean().sort_values(ascending=False)
df_saves_try = df_saves_mean[df_saves_mean.index.isin(df_minutos.index)]

# Divide by the total so that each value is a share between 0 and 1 (the shares sum to 1)
df_saves = df_saves_try / df_saves_try.sum()
df_saves

Player
Édgar Badía         0.090262
Jaume Doménech      0.086723
Sergio Herrera      0.074334
Fernando Pacheco    0.072785
Aitor Fernández     0.068616
Jan Oblak           0.067254
Yassine Bounou      0.065324
Jeremías Ledesma    0.064804
Marko Dmitrović     0.063493
Rui Silva           0.061945
Thibaut Courtois    0.055750
David Soria         0.051621
Sergio Asenjo       0.049556
Andrés Fernández    0.049556
Álex Remiro         0.040811
Unai Simón          0.037167
Name: Shot Stopping|Saves, dtype: float64

In [6]:
# And once more with the mean save percentage (saves / shots on target) per match
df_save_percent_mean = df_gk.groupby('Player')['Shot Stopping|Save%'].mean().sort_values(ascending=False)
df_save_percent_try = df_save_percent_mean[df_save_percent_mean.index.isin(df_minutos.index)]

# Divide by the total so that each value is a share between 0 and 1 (the shares sum to 1)
df_save_percent = df_save_percent_try / df_save_percent_try.sum()
df_save_percent


Player
Jan Oblak           0.076443
Yassine Bounou      0.071350
Álex Remiro         0.068695
Édgar Badía         0.067478
Thibaut Courtois    0.067449
David Soria         0.067423
Marko Dmitrović     0.062072
Fernando Pacheco    0.061981
Rui Silva           0.061846
Sergio Herrera      0.060288
Jeremías Ledesma    0.059411
Jaume Doménech      0.059397
Sergio Asenjo       0.057609
Aitor Fernández     0.056057
Andrés Fernández    0.054427
Unai Simón          0.048074
Name: Shot Stopping|Save%, dtype: float64
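The three cells above repeat the same pattern: group by player, take the mean, keep the eligible goalkeepers, and divide by the total. A minimal sketch of a helper that captures it is shown below. The function name `normalized_mean` and the toy data are hypothetical; only the `Player` and `Shot Stopping|Saves` column names come from the dataset.

```python
import pandas as pd


def normalized_mean(df, column, players, ascending=False):
    """Per-player mean of `column`, restricted to `players`, scaled to sum to 1."""
    means = df.groupby("Player")[column].mean()
    means = means[means.index.isin(players)].sort_values(ascending=ascending)
    return means / means.sum()


# Hypothetical data: player C is not in the eligible list and is dropped.
toy = pd.DataFrame({
    "Player": ["A", "A", "B", "C"],
    "Shot Stopping|Saves": [2, 4, 1, 9],
})
eligible = pd.Index(["A", "B"])

shares = normalized_mean(toy, "Shot Stopping|Saves", eligible)
print(shares)
```

With such a helper, each statistic would need a single call, for example `normalized_mean(df_gk, 'Shot Stopping|GA', df_minutos.index, ascending=True)`.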

We convert the resulting Series into DataFrames.

In [7]:
df_goals = df_goals.to_frame()
df_saves = df_saves.to_frame()
df_save_percent = df_save_percent.to_frame()

We merge the DataFrames.

In [8]:
# Merge the three DataFrames on the player name
df_gk_final = df_goals.merge(df_saves, how='inner', on='Player')
df_gk_final = df_gk_final.merge(df_save_percent, how='inner', on='Player')
df_gk_final

                    Shot Stopping|GA  Shot Stopping|Saves  Shot Stopping|Save%
Player
Jan Oblak                   0.019910             0.067254             0.076443
Yassine Bounou              0.035475             0.065324             0.071350
Álex Remiro                 0.039351             0.040811             0.068695
Thibaut Courtois            0.052262             0.055750             0.067449
Marko Dmitrović             0.055747             0.063493             0.062072
Sergio Asenjo               0.059231             0.049556             0.057609
Fernando Pacheco            0.062715             0.072785             0.061981
Jeremías Ledesma            0.064323             0.064804             0.059411
David Soria                 0.065038             0.051621             0.067423
Unai Simón                  0.066199             0.037167             0.048074
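As an alternative to converting each Series and merging twice, the three Series could be aligned in one step with `pd.concat`. This is a sketch on toy Series with hypothetical values, not the approach used in this notebook; it relies on each Series having a distinct `name` and on the player names being the index.

```python
import pandas as pd

# Hypothetical normalized shares; note that `saves` lists the players in a different order.
ga = pd.Series({"A": 0.4, "B": 0.6}, name="Shot Stopping|GA")
saves = pd.Series({"B": 0.7, "A": 0.3}, name="Shot Stopping|Saves")
pct = pd.Series({"A": 0.55, "B": 0.45}, name="Shot Stopping|Save%")

# axis=1 places each Series in its own column, aligned on the index;
# join="inner" keeps only players present in all three.
combined = pd.concat([ga, saves, pct], axis=1, join="inner")
print(combined)
```

Because alignment happens on the index, the different sort orders of the three Series do not matter.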


We create the final score column ("Puntuación Final"):

    1. Add the values of the "Saves" and "Save percentage (saves / shots on target)" columns.
    2. Subtract the value of the "Goals conceded" column.

In [9]:
# Create the final score column: add the normalized saves (column 1) and
# save percentage (column 2), then subtract the normalized goals conceded (column 0).
df_gk_final['Puntuación Final'] = df_gk_final.iloc[:, 1] + df_gk_final.iloc[:, 2] - df_gk_final.iloc[:, 0]
df_gk_final


                    Shot Stopping|GA  Shot Stopping|Saves  Shot Stopping|Save%  Puntuación Final
Player
Jan Oblak                   0.019910             0.067254             0.076443          0.123788
Yassine Bounou              0.035475             0.065324             0.071350          0.101198
Álex Remiro                 0.039351             0.040811             0.068695          0.070155
Thibaut Courtois            0.052262             0.055750             0.067449          0.070937
Marko Dmitrović             0.055747             0.063493             0.062072          0.069819
Sergio Asenjo               0.059231             0.049556             0.057609          0.047934
Fernando Pacheco            0.062715             0.072785             0.061981          0.072051
Jeremías Ledesma            0.064323             0.064804             0.059411          0.059892
David Soria                 0.065038             0.051621             0.067423          0.054006
Unai Simón                  0.066199             0.037167             0.048074          0.019041
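The same score can be computed by column label instead of by position, which keeps working if the column order changes. The sketch below uses the rounded values printed above for two goalkeepers, so the results can differ from the table in the last decimal place.

```python
import pandas as pd

# Rounded shares copied from the output above (two rows only, for illustration).
scores = pd.DataFrame(
    {
        "Shot Stopping|GA": [0.019910, 0.035475],
        "Shot Stopping|Saves": [0.067254, 0.065324],
        "Shot Stopping|Save%": [0.076443, 0.071350],
    },
    index=pd.Index(["Jan Oblak", "Yassine Bounou"], name="Player"),
)

# Saves share plus save-percentage share, minus goals-conceded share.
scores["Puntuación Final"] = (
    scores["Shot Stopping|Saves"]
    + scores["Shot Stopping|Save%"]
    - scores["Shot Stopping|GA"]
)
print(scores["Puntuación Final"])
```

For Jan Oblak this gives 0.067254 + 0.076443 - 0.019910 = 0.123787, in line with the 0.123788 shown in the table, which was computed from unrounded values.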


In [10]:
# Drop rows with missing values and sort by Puntuación Final, highest first
df_porteros = df_gk_final.dropna().sort_values(by='Puntuación Final', ascending=False)
df_porteros

                    Shot Stopping|GA  Shot Stopping|Saves  Shot Stopping|Save%  Puntuación Final
Player
Jan Oblak                   0.019910             0.067254             0.076443          0.123788
Yassine Bounou              0.035475             0.065324             0.071350          0.101198
Édgar Badía                 0.067692             0.090262             0.067478          0.090048
Fernando Pacheco            0.062715             0.072785             0.061981          0.072051
Thibaut Courtois            0.052262             0.055750             0.067449          0.070937
Álex Remiro                 0.039351             0.040811             0.068695          0.070155
Marko Dmitrović             0.055747             0.063493             0.062072          0.069819
Jaume Doménech              0.083620             0.086723             0.059397          0.062499
Jeremías Ledesma            0.064323             0.064804             0.059411          0.059892
David Soria                 0.065038             0.051621             0.067423          0.054006


In [11]:
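# Rename the columns. The goals-conceded values themselves stay positive;
# the "x(-1)" label only records that they are subtracted in the final score.
df_porteros.columns = ['Goles encajados x(-1)','Paradas', 'Porcentaje paradas/tiros', 'Puntuación Final']
df_porteros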

                    Goles encajados x(-1)   Paradas  Porcentaje paradas/tiros  Puntuación Final
Player
Jan Oblak                        0.019910  0.067254                  0.076443          0.123788
Yassine Bounou                   0.035475  0.065324                  0.071350          0.101198
Édgar Badía                      0.067692  0.090262                  0.067478          0.090048
Fernando Pacheco                 0.062715  0.072785                  0.061981          0.072051
Thibaut Courtois                 0.052262  0.055750                  0.067449          0.070937
Álex Remiro                      0.039351  0.040811                  0.068695          0.070155
Marko Dmitrović                  0.055747  0.063493                  0.062072          0.069819
Jaume Doménech                   0.083620  0.086723                  0.059397          0.062499
Jeremías Ledesma                 0.064323  0.064804                  0.059411          0.059892
David Soria                      0.065038  0.051621                  0.067423          0.054006


In [12]:
# Export to CSV
df_porteros.to_csv('/Users/alexcar934/Desktop/EDA_bien/src/data/processed/csv_principales/1_Portero.csv')
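The absolute path above only works on the original machine. One possible way to make the export portable is to build the path with `pathlib` relative to a project directory. The helper `export_csv`, the `data/processed` layout, and the temporary directory used here are assumptions for illustration, not the project's actual structure.

```python
from pathlib import Path
import tempfile

import pandas as pd


def export_csv(df: pd.DataFrame, base_dir: Path, filename: str) -> Path:
    """Write `df` under base_dir/data/processed, creating folders as needed."""
    out_dir = Path(base_dir) / "data" / "processed"
    out_dir.mkdir(parents=True, exist_ok=True)
    out_path = out_dir / filename
    # UTF-8 keeps accented names and column labels intact.
    df.to_csv(out_path, encoding="utf-8")
    return out_path


# Demonstration with a one-row frame in a temporary directory.
demo = pd.DataFrame(
    {"Puntuación Final": [0.123788]},
    index=pd.Index(["Jan Oblak"], name="Player"),
)
with tempfile.TemporaryDirectory() as tmp:
    written = export_csv(demo, Path(tmp), "1_Portero.csv")
    round_trip = pd.read_csv(written, index_col="Player", encoding="utf-8")

print(round_trip)
```

Reading the file back with `index_col="Player"` restores the player names as the index, matching the shape of `df_porteros`.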