# Players dataframes

This script builds a clean dataset split into position players and pitchers. For each group it exports three datasets: players who are free agents, players who are not, and all players. The script is organized into the following sections:

- **Viewing the contents of the datasets.**
- **Cleaning and exporting the datasets.**
- **Creating an indicator of whether the player is a free agent.**

Let us import the required modules and set the desired configuration.

In [1]:
import pandas as pd
import numpy as np
import math
import os
import warnings
print('Modules imported')

Modules imported


In [2]:
# Limit the number of rows displayed
pd.options.display.max_rows = 10

In [3]:
print("Suppress warning messages so they do not clutter the output.")
# warnings was already imported in the first cell
warnings.filterwarnings('ignore')

Suppress warning messages so they do not clutter the output.


In [4]:
# Show the current working directory
print(os.getcwd())
# The data paths below are relative to the project root, so switch to it if the directory printed above is different:
path = '/home/usuario/Documentos/Github/Proyectos/MLB_HN'
os.chdir(path)

/home/usuario/Documentos/Github/Proyectos/MLB_HN/ETL


In [5]:
free_agents_2012 = 'Data/Free_Agents/Free_Agents_2012.csv'
hitting_2012 = 'Data/Statistics/Hitting/Hitting_2012.csv'
pitching_2012 = 'Data/Statistics/Pitching/Pitching_2012.csv'
salary_2012 = 'Data/Salary/Salary_2012.csv'

df_free_agent_auxiliar_2012 = pd.read_csv(free_agents_2012)
df_hitting_auxiliar_2012 = pd.read_csv(hitting_2012)
df_pitching_auxiliar_2012 = pd.read_csv(pitching_2012)
df_salary_auxiliar_2012 = pd.read_csv(salary_2012)

## Viewing the datasets

Below we show the contents of the datasets on *hitters*, *pitchers*, *free-agent salaries* and *salaries of all players*, in order to decide which cleaning steps are needed.

### Free agents

Let us first look at the dataframe.

In [6]:
df_free_agent_auxiliar_2012.head()

Unnamed: 0,Rank,Player,Year,Pos,Status,Team From,Team From To,YRS,Value,AAV
0,1,Albert Pujols,2012,DH,UFA,STL,LAA,10,"$240,000,000","$24,000,000"
1,2,Prince Fielder,2012,DH,UFA,MIL,DET,9,"$214,000,000","$23,777,778"
2,3,Jose Reyes,2012,SS,UFA,NYM,MIA,6,"$106,000,000","$17,666,667"
3,4,C.J. Wilson,2012,SP,UFA,TEX,LAA,5,"$77,500,000","$15,500,000"
4,5,Mark Buehrle,2012,SP,UFA,CHW,MIA,4,"$58,000,000","$14,500,000"


### Hitting

Let us look at the dataframe.

In [7]:
df_hitting_auxiliar_2012.head()

Unnamed: 0,Rank,Player,Pos,Team,GP,GP%,AB,H,HR,RBI,AVG,OPS,Cash2022
0,1,Derek Jeter,SS,NYY,159,0.982,683,216,15,58,0.316,0.791,$0
1,2,Miguel Cabrera,1B,DET,161,0.994,622,205,44,139,0.33,0.999,"$32,000,000"
2,3,Robinson Cano,2B,NYY,161,0.994,627,196,33,94,0.313,0.929,"$20,250,000"
3,4,Everth Cabrera,SS,SD,230,0.71,796,196,4,48,0.246,0.648,$0
4,5,Adrian Beltre,3B,TEX,156,0.963,604,194,36,102,0.321,0.921,$0


The column abbreviations are kept exactly as they appear in the dataset to avoid ambiguity:

- **Pos**: Player position.
- **Team**: Team acronym.
- **GP**: Games played.
- **GP%**: Games played %.
- **AB**: At bats.
- **H**: Hits.
- **HR**: Home runs.
- **RBI**: Runs batted in.
- **AVG**: Batting average.
- **OPS**: On-base plus slugging.

The *Cash2022* column will be dropped: the player's present-day value is not relevant to this work, since some free agents retired in later years.

### Pitching

In [8]:
df_pitching_auxiliar_2012.head()

Unnamed: 0,Rank,Player,Pos,Team,GP,GS,IP,H,R,ER,BB,SO,W,L,SV,WHIP,ERA,Cash2022,Unnamed: 18
0,,R.A. Dickey,SP,NYM,34,33,233.7,192,78,71,54,230,20,6,0,1.05,2.74,$0,
1,,Felix Hernandez,SP,SEA,33,33,232.0,209,84,79,56,223,13,9,0,1.14,3.07,$0,
2,,James Shields,SP,TB,33,33,227.7,209,103,89,58,223,15,10,0,1.17,3.52,$0,
3,,Clayton Kershaw,SP,LAD,34,33,227.7,170,70,64,63,229,14,9,0,1.02,2.53,"$17,000,000",
4,,Hiroki Kuroda,SP,NYY,33,33,219.7,205,86,81,51,167,16,11,0,1.16,3.32,$0,


#### Notation

Let us see what some of the terms mean:

- **Pos**: Player position.
- **Team**: Team acronym.
- **GP**: Games played.
- **GS**: Games started.
- **IP**: Innings pitched.
- **H**: Hits.
- **R**: Runs.
- **ER**: Earned runs.
- **BB**: Walks.
- **SO**: Strikeouts.
- **W**: Wins.
- **L**: Losses.
- **SV**: Saves.
- **WHIP**: Walks plus hits per inning pitched.
- **ERA**: Earned run average.

For the same reasons, the *Cash2022* column will be dropped.
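
As a rough illustration of this step, the sketch below removes `Cash2022` together with any empty `Unnamed: N` column such as the `Unnamed: 18` seen in the pitching output. The helper name `drop_unused_columns` and the sample rows are hypothetical and are not part of the original script, which instead keeps only the columns it needs later on.

```python
import pandas as pd

def drop_unused_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy without Cash2022 and without any 'Unnamed: N' column."""
    unnamed = [c for c in df.columns if str(c).startswith("Unnamed")]
    # errors="ignore" keeps the call safe when Cash2022 is absent
    return df.drop(columns=["Cash2022"] + unnamed, errors="ignore")

# Illustrative data only (not taken from the real CSV files)
sample = pd.DataFrame({
    "Player": ["Pitcher A", "Pitcher B"],
    "ERA": [2.74, 3.07],
    "Cash2022": ["$0", "$0"],
    "Unnamed: 18": [None, None],
})
cleaned = drop_unused_columns(sample)
print(list(cleaned.columns))
```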

### Salary compensation

This dataset has far fewer variables than the previous ones.

In [9]:
df_salary_auxiliar_2012.head()

Unnamed: 0,Rank,Player,Year,Pos,Team,BaseSalary,Payroll Salary,Adj Salary,Unnamed: 8
0,,Alex Rodriguez,2012,DH,NYY,"$29,000,000","$30,000,000","$30,000,000",
1,,C.C. Sabathia,2012,SP,NYY,"$23,000,000","$24,285,714","$24,285,714",
2,,Vernon Wells,2012,LF,LAA,"$21,000,000","$24,187,500","$24,187,500",
3,,Johan Santana,2012,SP,NYM,"$24,000,000","$24,000,000","$24,000,000",
4,,Prince Fielder,2012,DH,DET,"$23,000,000","$23,150,000","$23,150,000",


- **BaseSalary**: A base salary is the minimum amount you can expect to earn in exchange for your time or services. This is the amount earned before benefits, bonuses, or compensation is added.
- **Payroll Salary**: Payroll is the compensation a business must pay to its employees for a set period and on a given date.
- **Adj Salary**: Adjusted Salary means the regular salary, wages and commissions, if any, payable to a Participant by the Company for the Participant's service, excluding any bonuses or other compensation.

## Algorithm for building the datasets

Next, the code is generalized so that the *dataframes* above can be built for a set of consecutive years, as in our case.

In [10]:
# Auxiliary values (path prefixes, file extension, number of seasons):
free_agents = 'Data/Free_Agents/Free_Agents_'
hitting = 'Data/Statistics/Hitting/Hitting_'
pitching = 'Data/Statistics/Pitching/Pitching_'
salary = 'Data/Salary/Salary_'
csv = '.csv'
period = 11
# Original dataframes:
df_free_agents = [None]*period
df_hitting = [None]*period
df_pitching = [None]*period
df_salary = [None]*period
# Working copies:
df_free_agents_copy = [None]*period
df_hitting_copy = [None]*period
df_pitching_copy = [None]*period
df_salary_copy = [None]*period
# Final products:
df_pitchers = [None]*period
df_hitters = [None]*period
df_pitchers_free_agents = [None]*period
df_hitters_free_agents = [None]*period
df_pitchers_no_free_agents = [None]*period
df_hitters_no_free_agents = [None]*period
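
The lists above are indexed by position, so season `2011 + i` lives at index `i`. As an alternative sketch, the paths can be keyed by the season itself, which avoids the offset arithmetic. `FIRST_YEAR`, `PERIOD` and `csv_path` are hypothetical names introduced here; the first season is assumed to be 2011, as implied by `str(2011 + i)` in the loop.

```python
FIRST_YEAR = 2011  # assumed first season, taken from str(2011 + i) in the loop
PERIOD = 11        # number of consecutive seasons, same value as `period`

def csv_path(prefix: str, year: int) -> str:
    """Build the CSV path for one season from a prefix such as 'Data/Salary/Salary_'."""
    return f"{prefix}{year}.csv"

years = list(range(FIRST_YEAR, FIRST_YEAR + PERIOD))
hitting_paths = {year: csv_path("Data/Statistics/Hitting/Hitting_", year) for year in years}
print(hitting_paths[2012])
```

With this layout a season is looked up as `hitting_paths[2013]` rather than by remembering that 2013 corresponds to index 2.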

In [11]:
for i in range(0,period):    
    df_free_agents[i] = pd.read_csv(free_agents + str(2011 + i) + csv)
    df_hitting[i] = pd.read_csv(hitting + str(2011 + i) + csv)
    df_pitching[i] = pd.read_csv(pitching + str(2011 + i) + csv)
    df_salary[i] = pd.read_csv(salary + str(2011 + i) + csv)
    
    df_free_agents_copy[i] = df_free_agents[i].copy()
    df_hitting_copy[i] = df_hitting[i].copy()
    df_pitching_copy[i] = df_pitching[i].copy()
    df_salary_copy[i] = df_salary[i].copy()
    
    df_free_agents_copy[i]  = df_free_agents_copy[i][['Player', 'Year', 'Status', 'Team From',
                                                      'YRS', 'Value', 'AAV']]
    df_free_agents_names  = ['Jugador', 'Anio', 'Status', 'Equipo_anterior',
                             'Anios_contrato', 'Valor_contrato', 'Valor_promedio_contrato']
    df_free_agents_copy[i].columns = df_free_agents_names

    # regex=False treats "$" and "," as literal characters rather than regular expressions
    free_agents_aux_1 = df_free_agents_copy[i]['Valor_contrato'].str.replace("$","", regex=False)
    free_agents_aux_2 = free_agents_aux_1.str.replace(",","", regex=False)
    free_agents_aux_3 = df_free_agents_copy[i]['Valor_promedio_contrato'].str.replace("$","", regex=False)
    free_agents_aux_4 = free_agents_aux_3.str.replace(",","", regex=False)
    df_free_agents_copy[i]['Valor_contrato'] = free_agents_aux_2
    df_free_agents_copy[i]['Valor_promedio_contrato'] = free_agents_aux_4
    
    df_free_agents_copy[i]['Valor_contrato'] = pd.to_numeric(df_free_agents_copy[i]['Valor_contrato'])
    df_free_agents_copy[i]['Valor_promedio_contrato'] = pd.to_numeric(df_free_agents_copy[i]['Valor_promedio_contrato'])
    
    df_hitting_copy[i] = df_hitting_copy[i][['Player', 'Pos', 'GP', 'GP%', 'AB', 'H',
                                             'HR', 'RBI', 'AVG', 'OPS']]
    df_hitting_names = ['Jugador', 'Posicion', 'Juegos', 'Porcetnaje_juegos', 'At-bats',
                        'Bateos', 'Home-runs', 'RBI', 'Porcentaje_bateo', 'OPS']
    df_hitting_copy[i].columns = df_hitting_names
    
    df_pitching_copy[i] = df_pitching_copy[i][['Player', 'Pos', 'GP', 'GS', 'IP', 'H', 
                                               'R', 'ER', 'BB', 'SO', 'W', 'L', 'SV', 
                                               'WHIP', 'ERA']]
    df_pitching_names = ['Jugador', 'Posicion', 'Juegos', 'Juegos_iniciados', 'Inning_pitched', 'Bateos_pitcher',
                         'Carreras', 'Carreras_ganadas', 'Walks', 'Strike-outs', 'Wins', 'Losses',
                         'Saves', 'WHIP', 'ERA']
    df_pitching_copy[i].columns = df_pitching_names
    
    df_salary_copy[i] = df_salary_copy[i][['Player', 'Team', 'BaseSalary',
                                           'Payroll Salary', 'Adj Salary']]
    df_salary_names = ['Jugador', 'Equipo', 'Sueldo_base', 'Sueldo', 'Sueldo_regular']
    df_salary_copy[i].columns = df_salary_names
    
    # Strip "$" and "," literally (regex=False) before converting each salary column to numbers
    salary_aux_1 = df_salary_copy[i]['Sueldo_base'].str.replace("$","", regex=False)
    salary_aux_2 = salary_aux_1.str.replace(",","", regex=False)
    df_salary_copy[i]['Sueldo_base'] = salary_aux_2
    df_salary_copy[i]['Sueldo_base'] = pd.to_numeric(df_salary_copy[i]['Sueldo_base'])
    
    salary_aux_3 = df_salary_copy[i]['Sueldo'].str.replace("$","", regex=False)
    salary_aux_4 = salary_aux_3.str.replace(",","", regex=False)
    df_salary_copy[i]['Sueldo'] = salary_aux_4
    df_salary_copy[i]['Sueldo'] = pd.to_numeric(df_salary_copy[i]['Sueldo'])
    
    salary_aux_5 = df_salary_copy[i]['Sueldo_regular'].str.replace("$","", regex=False)
    salary_aux_6 = salary_aux_5.str.replace(",","", regex=False)
    df_salary_copy[i]['Sueldo_regular'] = salary_aux_6
    df_salary_copy[i]['Sueldo_regular'] = pd.to_numeric(df_salary_copy[i]['Sueldo_regular'])

    df_hitters[i] = pd.merge(df_hitting_copy[i], df_salary_copy[i], on = 'Jugador')
    df_pitchers[i] = pd.merge(df_pitching_copy[i], df_salary_copy[i], on = 'Jugador')
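
The loop repeats the same strip-and-convert steps for every money column. A minimal sketch of how they could be factored into one helper follows. The function name `currency_to_number` is hypothetical, and `errors="coerce"` is a choice made for this sketch (malformed values become `NaN`), whereas the original `pd.to_numeric` calls raise on them.

```python
import pandas as pd

def currency_to_number(series: pd.Series) -> pd.Series:
    """Convert strings such as '$24,000,000' to numeric values."""
    stripped = (
        series.astype(str)
        .str.replace("$", "", regex=False)  # literal dollar sign
        .str.replace(",", "", regex=False)  # thousands separators
    )
    # Assumption of this sketch: unparseable entries become NaN instead of raising
    return pd.to_numeric(stripped, errors="coerce")

values = pd.Series(["$240,000,000", "$23,777,778", "$0"])
converted = currency_to_number(values)
print(converted.tolist())
```

Inside the loop, each column could then be handled with a single assignment such as `df_salary_copy[i]['Sueldo'] = currency_to_number(df_salary_copy[i]['Sueldo'])`.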

## Adding variables suggested by the literature

The first variables we add are the squares of all the playing statistics, together with the following pitching ratios:

- DOMINANCE = $\text{Strike-outs} / (9 \cdot \text{Innings pitched})$
- CONTROL = $\text{Walks} / (9 \cdot \text{Innings pitched})$
- COMMAND = $\text{Strike-outs} / \text{Walks}$

In [12]:
df_hitters[2].head()

Unnamed: 0,Jugador,Posicion,Juegos,Porcetnaje_juegos,At-bats,Bateos,Home-runs,RBI,Porcentaje_bateo,OPS,Equipo,Sueldo_base,Sueldo,Sueldo_regular
0,Adrian Beltre,3B,161,0.988,631,199,30,92,0.315,0.88,TEX,16000000,16000000,16000000
1,Matt Carpenter,3B,157,0.969,626,199,11,78,0.318,0.873,STL,504000,514000,514000
2,Dustin Pedroia,2B,160,0.988,641,193,9,84,0.301,0.787,BOS,10000000,10250000,10250000
3,Miguel Cabrera,1B,148,0.914,555,193,44,137,0.348,1.078,DET,21000000,21000000,21000000
4,Robinson Cano,2B,160,0.988,605,190,27,107,0.314,0.899,NYY,15000000,15000000,15000000


In [13]:
df_pitchers[2].head()

Unnamed: 0,Jugador,Posicion,Juegos,Juegos_iniciados,Inning_pitched,Bateos_pitcher,Carreras,Carreras_ganadas,Walks,Strike-outs,Wins,Losses,Saves,WHIP,ERA,Equipo,Sueldo_base,Sueldo,Sueldo_regular
0,Miguel Gonzalez,SP,60,56,342.7,314,162,144,106,240,22,16,0,2.45,7.56,BAL,502000,502000,502000
1,Miguel Gonzalez,SP,60,56,342.7,314,162,144,106,240,22,16,0,2.45,7.56,CHW,490000,490000,74972
2,Adam Wainwright,SP,35,34,241.7,223,83,79,35,219,19,9,0,1.07,2.94,STL,12000000,12150000,12150000
3,Clayton Kershaw,SP,35,33,236.0,164,55,48,52,232,16,9,0,0.92,1.83,LAD,11000000,11250000,11250000
4,James Shields,SP,34,34,228.7,215,82,80,68,196,13,9,0,1.24,3.15,KC,10250000,10250000,10250000
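
In the output above, rows 0 and 1 carry identical statistics under one name but different teams and salaries. An inner merge on `Jugador` returns one row per matching pair, so a name that occurs more than once in the salary table is repeated in the result. The sketch below, built on made-up data, shows one way such rows might be detected, either after the merge with `duplicated` or during it with the `validate` argument of `pd.merge`.

```python
import pandas as pd

# Illustrative data only: "Player A" has two salary records
stats = pd.DataFrame({"Jugador": ["Player A", "Player B"], "Juegos": [60, 35]})
salaries = pd.DataFrame({
    "Jugador": ["Player A", "Player A", "Player B"],
    "Equipo": ["T1", "T2", "T3"],
    "Sueldo": [502000, 490000, 12000000],
})

merged = pd.merge(stats, salaries, on="Jugador")
# keep=False flags every row of a repeated name, not only the later ones
repeated = merged[merged.duplicated(subset="Jugador", keep=False)]
print(repeated)

# validate="one_to_one" raises MergeError when a key is repeated on either side
try:
    pd.merge(stats, salaries, on="Jugador", validate="one_to_one")
    unique_keys = True
except pd.errors.MergeError:
    unique_keys = False
print(unique_keys)
```

How to resolve the repeated rows (for example, keeping one team or summing salaries) is a modelling decision that this sketch leaves open.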


In [14]:
for i in range(0,period):
    df_pitchers[i]['Dominio'] = df_pitchers[i]['Strike-outs']/(9*df_pitchers[i]['Inning_pitched'])
    df_pitchers[i]['Control'] = df_pitchers[i]['Walks']/(9*df_pitchers[i]['Inning_pitched'])
    df_pitchers[i]['Comando'] = df_pitchers[i]['Strike-outs']/df_pitchers[i]['Walks']
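
These ratios divide by innings pitched and by walks, either of which can be zero. In pandas a zero denominator yields `inf`, or `NaN` for 0/0, rather than an error. The sketch below is one possible guard that turns every zero denominator into `NaN`; this differs from the cell above, and the helper `add_pitching_ratios` is hypothetical. The first sample row reuses the innings, walks and strikeouts listed for Clayton Kershaw in `df_pitchers[2]`; the second row is invented.

```python
import numpy as np
import pandas as pd

def add_pitching_ratios(df: pd.DataFrame) -> pd.DataFrame:
    """Add Dominio, Control and Comando, using NaN wherever a denominator is zero."""
    out = df.copy()
    # Series.where keeps values that satisfy the condition and sets the rest to NaN
    innings = out["Inning_pitched"].where(out["Inning_pitched"] != 0)
    walks = out["Walks"].where(out["Walks"] != 0)
    out["Dominio"] = out["Strike-outs"] / (9 * innings)
    out["Control"] = out["Walks"] / (9 * innings)
    out["Comando"] = out["Strike-outs"] / walks
    return out

sample = pd.DataFrame({
    "Inning_pitched": [236.0, 10.0],
    "Walks": [52, 0],
    "Strike-outs": [232, 8],
})
result = add_pitching_ratios(sample)
print(result.round(6))
```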

In [15]:
df_pitchers[2].head()

Unnamed: 0,Jugador,Posicion,Juegos,Juegos_iniciados,Inning_pitched,Bateos_pitcher,Carreras,Carreras_ganadas,Walks,Strike-outs,...,Saves,WHIP,ERA,Equipo,Sueldo_base,Sueldo,Sueldo_regular,Dominio,Control,Comando
0,Miguel Gonzalez,SP,60,56,342.7,314,162,144,106,240,...,0,2.45,7.56,BAL,502000,502000,502000,0.077813,0.034368,2.264151
1,Miguel Gonzalez,SP,60,56,342.7,314,162,144,106,240,...,0,2.45,7.56,CHW,490000,490000,74972,0.077813,0.034368,2.264151
2,Adam Wainwright,SP,35,34,241.7,223,83,79,35,219,...,0,1.07,2.94,STL,12000000,12150000,12150000,0.100676,0.01609,6.257143
3,Clayton Kershaw,SP,35,33,236.0,164,55,48,52,232,...,0,0.92,1.83,LAD,11000000,11250000,11250000,0.109228,0.024482,4.461538
4,James Shields,SP,34,34,228.7,215,82,80,68,196,...,0,1.24,3.15,KC,10250000,10250000,10250000,0.095224,0.033037,2.882353


To make creating the squared variables more efficient, we select the columns by index.

In [16]:
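# Select the columns to square by positional index.
# Caution: Dominio, Control and Comando sit at positions 19-21, so [20,21,22]
# squares Control, Comando and Juegos_2 (created earlier in the same loop).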
square_pitchers_index = list(range(2,15)) + [20,21,22]
square_hitters_index = list(range(2,10))

In [17]:
for i in range(0,period):
    for j in square_pitchers_index:
        df_pitchers[i][df_pitchers[i].columns[j] + '_2'] = np.power(df_pitchers[i][df_pitchers[i].columns[j]], 2)
    
    for k in square_hitters_index:
        df_hitters[i][df_hitters[i].columns[k] + '_2'] = np.power(df_hitters[i][df_hitters[i].columns[k]], 2)

Let us look at the final result.

In [18]:
df_pitchers[2].head()

Unnamed: 0,Jugador,Posicion,Juegos,Juegos_iniciados,Inning_pitched,Bateos_pitcher,Carreras,Carreras_ganadas,Walks,Strike-outs,...,Walks_2,Strike-outs_2,Wins_2,Losses_2,Saves_2,WHIP_2,ERA_2,Control_2,Comando_2,Juegos_2_2
0,Miguel Gonzalez,SP,60,56,342.7,314,162,144,106,240,...,11236,57600,484,256,0,6.0025,57.1536,0.001181,5.126379,12960000
1,Miguel Gonzalez,SP,60,56,342.7,314,162,144,106,240,...,11236,57600,484,256,0,6.0025,57.1536,0.001181,5.126379,12960000
2,Adam Wainwright,SP,35,34,241.7,223,83,79,35,219,...,1225,47961,361,81,0,1.1449,8.6436,0.000259,39.151837,1500625
3,Clayton Kershaw,SP,35,33,236.0,164,55,48,52,232,...,2704,53824,256,81,0,0.8464,3.3489,0.000599,19.905325,1500625
4,James Shields,SP,34,34,228.7,215,82,80,68,196,...,4624,38416,169,81,0,1.5376,9.9225,0.001091,8.307958,1336336
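
Positional indices depend on the column order, which shifts as new columns are appended. A sketch of an alternative that selects the columns to square by name is given below. `add_squares` is a hypothetical helper and the sample frame is made up for illustration.

```python
import pandas as pd

def add_squares(df: pd.DataFrame, columns: list) -> pd.DataFrame:
    """Return a copy of df with a '<name>_2' column for each listed column."""
    out = df.copy()
    for name in columns:
        out[name + "_2"] = out[name] ** 2
    return out

# Illustrative data only
sample = pd.DataFrame({
    "Juegos": [35, 34],
    "Dominio": [0.1, 0.2],
    "Equipo": ["LAD", "KC"],
})
squared = add_squares(sample, ["Juegos", "Dominio"])
print(list(squared.columns))
```

Because the list of names is fixed before the loop starts, columns created during the loop (such as `Juegos_2`) cannot be picked up by accident.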


In [19]:
df_hitters[7].head()

Unnamed: 0,Jugador,Posicion,Juegos,Porcetnaje_juegos,At-bats,Bateos,Home-runs,RBI,Porcentaje_bateo,OPS,...,Sueldo,Sueldo_regular,Juegos_2,Porcetnaje_juegos_2,At-bats_2,Bateos_2,Home-runs_2,RBI_2,Porcentaje_bateo_2,OPS_2
0,Whit Merrifield,2B,158,0.975,632,192,12,60,0.304,0.806,...,569500,569500,24964,0.950625,399424,36864,144,3600,0.092416,0.649636
1,Freddie Freeman,1B,162,1.0,618,191,23,98,0.309,0.892,...,21609375,21609375,26244,1.0,381924,36481,529,9604,0.095481,0.795664
2,J.D. Martinez,DH,150,0.926,569,188,43,130,0.33,1.031,...,23750000,23750000,22500,0.857476,323761,35344,1849,16900,0.1089,1.062961
3,Manny Machado,3B,162,0.994,632,188,37,107,0.298,0.905,...,16000000,6365591,26244,0.988036,399424,35344,1369,11449,0.088804,0.819025
4,Christian Yelich,LF,147,0.902,574,187,36,110,0.326,1.0,...,7000000,7000000,21609,0.813604,329476,34969,1296,12100,0.106276,1.0


## Segmentation by free agents

We split the pitchers and hitters into two groups:

- Free agents.
- Non-free agents.

In [20]:
for i in range(0,period):    
    df_hitters_free_agents[i] = pd.merge(df_free_agents_copy[i], df_hitters[i], on = 'Jugador')
    
    df_pitchers_free_agents[i] = pd.merge(df_free_agents_copy[i], df_pitchers[i], on = 'Jugador')
    
    df_hitters_no_free_agents[i] = df_hitters[i][~df_hitters[i].Jugador.isin(df_hitters_free_agents[i].Jugador)]
    df_pitchers_no_free_agents[i] = df_pitchers[i][~df_pitchers[i].Jugador.isin(df_pitchers_free_agents[i].Jugador)]
    
    # Export each dataframe separately
    df_hitters_free_agents[i].to_csv('Data/New_Data/Hitters/Free_Agent/free_agents_batters_' + str(2011 + i) + '.csv', index = False)
    df_pitchers_free_agents[i].to_csv('Data/New_Data/Pitchers/Free_Agent/free_agents_pitchers_' + str(2011 + i) + '.csv', index = False)
    df_hitters_no_free_agents[i].to_csv('Data/New_Data/Hitters/No_Free_Agent/no_free_agents_batters_' + str(2011 + i) + '.csv', index = False)
    df_pitchers_no_free_agents[i].to_csv('Data/New_Data/Pitchers/No_Free_Agent/no_free_agents_pitchers_' + str(2011 + i) + '.csv', index = False)
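
`DataFrame.to_csv` does not create missing folders, so the exports above assume that the `Data/New_Data/...` directories already exist. A small sketch of a helper that creates the target folder first is shown below. `export_csv` is a hypothetical name, and a temporary directory stands in for the project folder so the example can run anywhere.

```python
import tempfile
from pathlib import Path

import pandas as pd

def export_csv(df: pd.DataFrame, directory: Path, filename: str) -> Path:
    """Create the directory if needed, write df as CSV and return the file path."""
    directory.mkdir(parents=True, exist_ok=True)
    target = directory / filename
    df.to_csv(target, index=False)
    return target

with tempfile.TemporaryDirectory() as tmp:
    out_dir = Path(tmp) / "Data" / "New_Data" / "Hitters" / "Free_Agent"
    written = export_csv(pd.DataFrame({"Jugador": ["Player A"]}),
                         out_dir, "free_agents_batters_2011.csv")
    file_exists = written.exists()
    reloaded = pd.read_csv(written)
print(file_exists)
```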

In [21]:
# Some examples
df_pitchers_no_free_agents[0].head()

Unnamed: 0,Jugador,Posicion,Juegos,Juegos_iniciados,Inning_pitched,Bateos_pitcher,Carreras,Carreras_ganadas,Walks,Strike-outs,...,Walks_2,Strike-outs_2,Wins_2,Losses_2,Saves_2,WHIP_2,ERA_2,Control_2,Comando_2,Juegos_2_2
0,Justin Verlander,SP,34,34,251.0,174,73,67,57,250,...,3249,62500,576,25,0,0.8464,5.76,0.000637,19.236688,1336336
1,James Shields,SP,33,33,249.3,195,83,78,65,225,...,4225,50625,256,144,0,1.0816,7.9524,0.000839,11.982249,1185921
2,Dan Haren,SP,35,34,238.3,211,91,84,33,192,...,1089,36864,256,100,0,1.0404,10.0489,0.000237,33.85124,1500625
3,C.C. Sabathia,SP,33,33,237.3,230,87,79,61,230,...,3721,52900,361,64,0,1.5129,9.0,0.000816,14.216608,1185921
4,Jered Weaver,SP,33,33,235.7,182,65,63,56,198,...,3136,39204,324,64,0,1.0201,5.8081,0.000697,12.501276,1185921


In [22]:
df_hitters_no_free_agents[0].head()

Unnamed: 0,Jugador,Posicion,Juegos,Porcetnaje_juegos,At-bats,Bateos,Home-runs,RBI,Porcentaje_bateo,OPS,...,Sueldo,Sueldo_regular,Juegos_2,Porcetnaje_juegos_2,At-bats_2,Bateos_2,Home-runs_2,RBI_2,Porcentaje_bateo_2,OPS_2
0,Adrian Gonzalez,1B,159,0.982,630,213,27,117,0.338,0.957,...,6375000,6375000,25281,0.964324,396900,45369,729,13689,0.114244,0.915849
1,Jacoby Ellsbury,CF,158,0.975,660,212,32,105,0.321,0.928,...,2500000,2500000,24964,0.950625,435600,44944,1024,11025,0.103041,0.861184
2,Starlin Castro,2B,158,0.975,674,207,10,66,0.307,0.773,...,440000,440000,24964,0.950625,454276,42849,100,4356,0.094249,0.597529
3,Melky Cabrera,LF,155,0.957,658,201,18,87,0.306,0.809,...,1250000,1250000,24025,0.915849,432964,40401,324,7569,0.093636,0.654481
4,Miguel Cabrera,1B,161,0.994,572,197,30,105,0.344,1.033,...,20000000,20000000,25921,0.988036,327184,38809,900,11025,0.118336,1.067089


In [23]:
df_pitchers_free_agents[10]

Unnamed: 0,Jugador,Anio,Status,Equipo_anterior,Anios_contrato,Valor_contrato,Valor_promedio_contrato,Posicion,Juegos,Juegos_iniciados,...,Walks_2,Strike-outs_2,Wins_2,Losses_2,Saves_2,WHIP_2,ERA_2,Control_2,Comando_2,Juegos_2_2
0,Trevor Bauer,2021,UFA,CIN,3,102000000,34000000,SP,17,17,...,1369,18769,64,25,0,1.0000,6.7081,0.001457,13.710007,83521
1,Liam Hendriks,2021,UFA,OAK,3,54000000,18000000,RP/CL,69,0,...,49,12769,64,9,1444,0.5329,6.4516,0.000120,260.591837,22667121
2,Justin Turner,2021,UFA,LAD,2,34000000,17000000,3B,151,139,...,0,0,0,0,0,4.0000,0.0000,0.000000,,519885601
3,Jake Odorizzi,2021,UFA,MIN,3,23500000,7833333,SP,24,23,...,1156,8281,36,49,0,1.5625,17.7241,0.001302,7.163495,331776
4,Taijuan Walker,2021,UFA,TOR,3,23000000,7666667,SP,31,29,...,3025,21316,49,121,0,1.3924,19.9809,0.001477,7.046612,923521
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
59,Steve Cishek,2021,UFA,HOU,1,1000000,1000000,RP,74,0,...,1681,4096,0,4,0,2.2201,11.6964,0.004449,2.436645,29986576
60,Ryan Tepera,2021,UFA,CHC,1,800000,800000,RP,65,0,...,361,5476,0,4,4,0.7744,7.7841,0.001186,15.168975,17850625
61,Kohl Stewart,2021,UFA,BAL,1,700000,700000,SP,4,3,...,36,121,1,1,0,2.8224,27.7729,0.002368,3.361111,256
62,Ross Detwiler,2021,UFA,MIA,0,0,0,RP,53,5,...,400,3844,9,1,0,1.4884,21.5296,0.001805,9.610000,7890481


In [24]:
df_hitters_free_agents[8].head()

Unnamed: 0,Jugador,Anio,Status,Equipo_anterior,Anios_contrato,Valor_contrato,Valor_promedio_contrato,Posicion,Juegos,Porcetnaje_juegos,...,Sueldo,Sueldo_regular,Juegos_2,Porcetnaje_juegos_2,At-bats_2,Bateos_2,Home-runs_2,RBI_2,Porcentaje_bateo_2,OPS_2
0,Bryce Harper,2019,UFA,WSH,13,330000000,25384615,RF,157,0.969,...,11538462,11538462,24649,0.938961,328329,22201,1225,12996,0.0676,0.777924
1,Manny Machado,2019,UFA,LAD,10,300000000,30000000,3B,156,0.963,...,12000000,12000000,24336,0.927369,344569,22500,1024,7225,0.065536,0.633616
2,Patrick Corbin,2019,UFA,ARI,6,140000000,23333333,SP,33,0.204,...,12916666,12916666,1089,0.041616,4225,36,0,16,0.008464,0.0576
3,Nathan Eovaldi,2019,UFA,BOS,4,68000000,17000000,SP,23,0.142,...,17000000,17000000,529,0.020164,4,0,0,0,0.0,0.0
4,A.J. Pollock,2019,UFA,ARI,5,60000000,12000000,CF,86,0.531,...,4000000,4000000,7396,0.281961,94864,6724,225,2209,0.070756,0.632025


### Labels for free agents

We create a label indicating whether each pitcher or hitter is a free agent.

In [25]:
for i in range(0,period):
    # Conditions
    condicion_hitter = [df_hitters[i].Jugador.isin(df_hitters_free_agents[i].Jugador)]
    condicion_pitcher = [df_pitchers[i].Jugador.isin(df_pitchers_free_agents[i].Jugador)]
    # Labels
    etiquetas = ['Si']
    
    df_hitters[i]['Agente libre'] = np.select(condicion_hitter, etiquetas, default = 'No')
    df_pitchers[i]['Agente libre'] = np.select(condicion_pitcher, etiquetas, default = 'No')
    
    # Export the dataframes
    df_hitters[i].to_csv('Data/New_Data/Hitters/All_Hitters/hitters_' + str(2011 + i) + '.csv', index = False)
    df_pitchers[i].to_csv('Data/New_Data/Pitchers/All_Pitchers/pitchers_' + str(2011 + i) + '.csv', index = False)
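
With a single condition and two outcomes, `np.select` can be replaced by `np.where`, which takes the boolean mask directly. The sketch below shows that equivalent form on made-up data; the `Si`/`No` values and the column name `Agente libre` are the ones used above.

```python
import numpy as np
import pandas as pd

# Illustrative data only
players = pd.DataFrame({"Jugador": ["Player A", "Player B", "Player C"]})
free_agents = pd.DataFrame({"Jugador": ["Player B"]})

# True for every player whose name appears in the free-agent table
is_free_agent = players["Jugador"].isin(free_agents["Jugador"])
players["Agente libre"] = np.where(is_free_agent, "Si", "No")
print(players)
```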

In [26]:
df_hitters[10].head()

Unnamed: 0,Jugador,Posicion,Juegos,Porcetnaje_juegos,At-bats,Bateos,Home-runs,RBI,Porcentaje_bateo,OPS,...,Sueldo_regular,Juegos_2,Porcetnaje_juegos_2,At-bats_2,Bateos_2,Home-runs_2,RBI_2,Porcentaje_bateo_2,OPS_2,Agente libre
0,Trea Turner,SS,148,0.914,595,195,28,77,0.328,0.911,...,4567568,21904,0.835396,354025,38025,784,5929,0.107584,0.829921,No
1,Bo Bichette,SS,159,0.982,640,191,29,102,0.298,0.828,...,587800,25281,0.964324,409600,36481,841,10404,0.088804,0.685584,No
2,Vladimir Guerrero Jr.,1B,161,0.994,604,188,48,111,0.311,1.002,...,605400,25921,0.988036,364816,35344,2304,12321,0.096721,1.004004,No
3,Whit Merrifield,RF,162,1.0,664,184,10,74,0.277,0.713,...,7450000,26244,1.0,440896,33856,100,5476,0.076729,0.508369,No
4,Freddie Freeman,1B,160,0.994,600,180,31,83,0.3,0.895,...,22409375,25600,0.988036,360000,32400,961,6889,0.09,0.801025,No


In [27]:
df_pitchers[9].head()

Unnamed: 0,Jugador,Posicion,Juegos,Juegos_iniciados,Inning_pitched,Bateos_pitcher,Carreras,Carreras_ganadas,Walks,Strike-outs,...,Strike-outs_2,Wins_2,Losses_2,Saves_2,WHIP_2,ERA_2,Control_2,Comando_2,Juegos_2_2,Agente libre
0,Lance Lynn,SP,13,13,84.0,64,34,31,25,89,...,7921,36,9,0,1.1236,11.0224,0.001094,12.6736,28561,No
1,German Marquez,SP,13,13,81.7,78,41,34,25,73,...,5329,16,36,0,1.5876,14.0625,0.001156,8.5264,28561,No
2,Kyle Hendricks,SP,12,12,81.3,73,26,26,8,64,...,4096,36,25,0,1.0,8.2944,0.00012,64.0,20736,No
3,Yu Darvish,SP,12,12,76.0,59,18,17,14,93,...,8649,64,9,0,0.9216,4.0401,0.000419,44.127551,20736,No
4,Brandon Woodruff,SP,13,13,73.7,55,26,25,18,91,...,8281,9,25,0,0.9801,9.3025,0.000736,25.558642,28561,No
