# Introduction

This notebook provides a basic exploration of several datasets related to baseball statistics. The datasets cover a wide range of information, including player performance, team performance, and player biographical information. The goal of this exploration is to gain a better understanding of the structure and content of these datasets, which will inform further analysis and modeling.

## Contents:
1. [Dataset Loading](#dataset-loading)
2. [Basic Data Exploration](#basic-data-exploration)
    - Shape of Datasets and first look at rows
    - Dataset descriptions and dictionaries
    - Data types
3. [Table Joins](#table-joins)
4. [Next Steps](#summary)

In [4]:
# Import necessary libraries
import pandas as pd
import numpy as np

### Dataset Loading <a class="anchor" id="dataset-loading"></a>

In [5]:
# Load datasets
raw_war = pd.read_csv('Raw datasets/war_daily_bat.csv')
raw_teams = pd.read_csv('Raw datasets/Teams.csv')
raw_batting = pd.read_csv('Raw datasets/Batting.csv')
raw_fielding = pd.read_csv('Raw datasets/Fielding.csv')
raw_people = pd.read_csv('Raw datasets/People.csv')
raw_salaries = pd.read_csv('Raw datasets/Salaries.csv')
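
The loading cell above keeps six separate variables. As an alternative sketch, the same files could be read into one dictionary keyed by dataset name, which makes later loops simpler and reports missing files up front. The helper name `load_datasets` and the toy CSV written below are assumptions for illustration and are not part of the notebook or the real data.

```python
import tempfile
from pathlib import Path

import pandas as pd


def load_datasets(data_dir, file_names):
    """Read each CSV in data_dir and return the DataFrames in a dict keyed by name."""
    data_dir = Path(data_dir)
    # Report every missing file at once instead of failing on the first read.
    missing = [f for f in file_names.values() if not (data_dir / f).is_file()]
    if missing:
        raise FileNotFoundError(f'Missing in {data_dir}: {missing}')
    return {name: pd.read_csv(data_dir / f) for name, f in file_names.items()}


# Toy demonstration with a made-up two-row file standing in for Salaries.csv.
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, 'Salaries.csv').write_text(
        'yearID,teamID,lgID,playerID,salary\n'
        '2001,AAA,AL,playera01,100000\n'
        '2002,AAA,AL,playera01,200000\n'
    )
    loaded = load_datasets(tmp, {'raw_salaries': 'Salaries.csv'})

print(loaded['raw_salaries'].shape)
```

In the notebook itself the call would presumably take `'Raw datasets'` as the folder and a mapping such as `{'raw_war': 'war_daily_bat.csv', ...}`.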

### Basic Data Exploration  <a class="anchor" id="basic-data-exploration"></a>  
#### Shape of Datasets and first look at rows

In [6]:
# Check the shape of the datasets and view the first few rows
datasets = [raw_war, raw_teams, raw_batting, raw_fielding, raw_people, raw_salaries]
dataset_names = ['raw_war', 'raw_teams', 'raw_batting', 'raw_fielding', 'raw_people', 'raw_salaries']

for dataset, name in zip(datasets, dataset_names):
    print(f'\n{name}:')
    print(f'Shape: {dataset.shape}')
    print(dataset.head())


raw_war:
Shape: (121375, 49)
     name_common   age    mlb_ID  player_ID  year_ID team_ID  stint_ID lg_ID  \
0  David Aardsma  22.0  430911.0  aardsda01     2004     SFG         1    NL   
1  David Aardsma  24.0  430911.0  aardsda01     2006     CHC         1    NL   
2  David Aardsma  25.0  430911.0  aardsda01     2007     CHW         1    AL   
3  David Aardsma  26.0  430911.0  aardsda01     2008     BOS         1    AL   
4  David Aardsma  27.0  430911.0  aardsda01     2009     SEA         1    AL   

    PA   G  ...  oppRpG_rep  pyth_exponent  pyth_exponent_rep  waa_win_perc  \
0  0.0  11  ...     4.67092          1.890              1.890         0.500   
1  3.0  43  ...     4.86457          1.912              1.913         0.499   
2  0.0   2  ...     4.85895          1.912              1.912         0.500   
3  1.0   5  ...     4.69650          1.893              1.894         0.497   
4  0.0   3  ...     4.79788          1.905              1.905         0.500   

   waa_win_per
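
Printing `head()` for six wide tables is hard to scan. A compact overview, sketched below, collects the row count, column count, and number of missing cells of each dataset into one small table. The helper `summarize` and the toy frames are hypothetical and only illustrate the idea.

```python
import pandas as pd


def summarize(frames):
    """Return one row per DataFrame with its row, column, and missing-cell counts."""
    records = [
        {
            'dataset': name,
            'rows': df.shape[0],
            'columns': df.shape[1],
            'missing_cells': int(df.isna().sum().sum()),
        }
        for name, df in frames.items()
    ]
    return pd.DataFrame(records).set_index('dataset')


# Made-up frames; in the notebook this would be dict(zip(dataset_names, datasets)).
toy_frames = {
    'toy_a': pd.DataFrame({'x': [1, 2, 3], 'y': [1.0, None, 3.0]}),
    'toy_b': pd.DataFrame({'z': ['p', 'q']}),
}
summary = summarize(toy_frames)
print(summary)
```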

#### Dataset descriptions and dictionaries
- **raw_war:** Player-level value metrics for each season and stint, including runs above average, wins above average (WAA), and wins above replacement (WAR).
- **raw_teams:** Team-level statistics for each season, covering batting, pitching, fielding, final standings, and attendance.
- **raw_batting:** Player-level batting statistics for each season.
- **raw_fielding:** Player-level fielding statistics for each season.
- **raw_people:** Contains personal and biographical information about baseball players.   
- **raw_salaries:** Player-level salary data for each season.


##### raw_war
| Column Name | Description |  
| --- | --- |
| name_common | Player name |
| age | Player age |
| mlb_ID | MLB ID code |
| player_ID | Player ID code |
| year_ID | Year |
| team_ID | Team |
| stint_ID | Player's stint (order of appearances within a season) |
| lg_ID | League |
| PA | Plate appearances when batting |
| G | Games |
| Inn | Innings played in the field |
| runs_bat | Runs from batting |
| runs_br | Runs from baserunning |
| runs_dp | Runs from avoiding double plays |
| runs_field | Runs from fielding |
| runs_infield | Runs from infield defense |
| runs_outfield | Runs from outfield defense |
| runs_catcher | Runs from catcher defense |
| runs_good_plays | Runs from good fielding plays |
| runs_defense | Runs from all defensive plays |
| runs_position | Runs from positional scarcity |
| runs_position_p | Runs from positional scarcity, pitcher |
| runs_replacement | Runs from replacement level |
| runs_above_rep | Runs above replacement level |
| runs_above_avg | Runs above average |
| runs_above_avg_off | Runs above average, offense |
| runs_above_avg_def | Runs above average, defense |
| WAA | Wins above average |
| WAA_off | Wins above average, offense |
| WAA_def | Wins above average, defense |
| WAR | Wins above replacement |
| WAR_def | Wins above replacement, defense |
| WAR_off | Wins above replacement, offense |
| WAR_rep | Wins above replacement, replacement level |
| salary | Salary |
| pitcher | Pitcher indicator |
| teamRpG | Team runs per game |
| oppRpG | Opponent runs per game |
| oppRpPA_rep | Opponent runs per plate appearance, replacement level |
| oppRpG_rep | Opponent runs per game, replacement level |
| pyth_exponent | Pythagorean win percentage exponent |
| pyth_exponent_rep | Pythagorean win percentage exponent, replacement level |
| waa_win_perc | Win percentage, based on WAA |
| waa_win_perc_off | Win percentage, based on WAA, offense |
| waa_win_perc_def | Win percentage, based on WAA, defense |
| waa_win_perc_rep | Win percentage, based on WAA, replacement level |
| OPS_plus | OPS+, relative to league |
| TOB_lg | Times on base, relative to league |
| TB_lg | Total bases, relative to league |

##### raw_teams
| Column Name | Description |
| --- | --- |
| yearID | Year |
| lgID | League |
| teamID | Team |
| franchID | Franchise (links to TeamsFranchise table) |
| divID | Team's division |
| Rank | Position in final standings |
| G | Games played |
| Ghome | Games played at home |
| W | Wins |
| L | Losses |
| DivWin | Division Winner (Y or N) |
| WCWin | Wild Card Winner (Y or N) |
| LgWin | League Champion (Y or N) |
| WSWin | World Series Winner (Y or N) |
| R | Runs scored |
| AB | At bats |
| H | Hits by batters |
| 2B | Doubles |
| 3B | Triples |
| HR | Homeruns by batters |
| BB | Walks by batters |
| SO | Strikeouts by batters |
| SB | Stolen bases |
| CS | Caught stealing |
| HBP | Batters hit by pitch |
| SF | Sacrifice flies |
| RA | Opponents' runs scored |
| ER | Earned runs allowed |
| ERA | Earned run average |
| CG | Complete games |
| SHO | Shutouts |
| SV | Saves |
| IPouts | Outs Pitched (innings pitched x 3) |
| HA | Hits allowed |
| HRA | Homeruns allowed |
| BBA | Walks allowed |
| SOA | Strikeouts by pitchers |
| E | Errors |
| DP | Double Plays |
| FP | Fielding percentage |
| name | Team's full name |
| park | Name of team's home ballpark |
| attendance | Home attendance total |
| BPF | Three-year park factor for batters |
| PPF | Three-year park factor for pitchers |
| teamIDBR | Team ID used by Baseball Reference website |
| teamIDlahman45 | Team ID used in Lahman database version 4.5 |
| teamIDretro | Team ID used by Retrosheet |



##### raw_batting
| Column Name | Description |
| --- | --- |
| playerID | Player ID code |
| yearID | Year |
| stint | Player's stint (order of appearances within a season) |
| teamID | Team |
| lgID | League |
| G | Games |
| AB | At Bats |
| R | Runs |
| H | Hits |
| 2B | Doubles |
| 3B | Triples |
| HR | Homeruns |
| RBI | Runs Batted In |
| SB | Stolen Bases |
| CS | Caught Stealing |
| BB | Base on Balls |
| SO | Strikeouts |
| IBB | Intentional walks |
| HBP | Hit by pitch |
| SH | Sacrifice hits |
| SF | Sacrifice flies |
| GIDP | Grounded into double plays |


##### raw_fielding
| Column Name | Description |
| --- | --- |
| playerID | Player ID code |
| yearID | Year |
| stint | Player's stint (order of appearances within a season) |
| teamID | Team |
| lgID | League |
| POS | Position |
| G | Games |
| GS | Games Started |
| InnOuts | Time played in the field expressed as outs |
| PO | Putouts |
| A | Assists |
| E | Errors |
| DP | Double Plays |
| PB | Passed Balls (by catchers) |
| WP | Wild Pitches (by catchers) |
| SB | Stolen bases allowed (by catchers) |
| CS | Caught Stealing (by catchers) |
| ZR | Zone Rating |

##### raw_people
| Column Name | Description |
| --- | --- |
| playerID | Player ID code |
| birthYear | Year player was born |
| birthMonth | Month player was born |
| birthDay | Day player was born |
| birthCountry | Country where player was born |
| birthState | State where player was born |
| birthCity | City where player was born |
| deathYear | Year player died |
| deathMonth | Month player died |
| deathDay | Day player died |
| deathCountry | Country where player died |
| deathState | State where player died |
| deathCity | City where player died |
| nameFirst | Player's first name |
| nameLast | Player's last name |
| nameGiven | Player's given name (typically first and middle) |
| weight | Player's weight in pounds |
| height | Player's height in inches |
| bats | Player's batting hand (left, right, or both) |
| throws | Player's throwing hand (left or right) |
| debut | Date that player made first major league appearance |
| finalGame | Date that player made last major league appearance (blank if still active) |
| retroID | ID used by Retrosheet |
| bbrefID | ID used by Baseball Reference website |

##### raw_salaries
| Column Name | Description |
| --- | --- |
| yearID | Year |
| teamID | Team |
| lgID | League |
| playerID | Player ID code |
| salary | Salary |
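
Hand-written data dictionaries can drift from the actual files. One way to catch this, sketched below, is to compare the documented column names against the columns of the loaded DataFrame. The helper `compare_columns`, the toy frame, and the extra `bonus` entry are made up for illustration.

```python
import pandas as pd


def compare_columns(df, documented):
    """Return (undocumented, absent).

    undocumented: columns in df that the dictionary does not list.
    absent: dictionary entries that are not columns of df.
    """
    actual = list(df.columns)
    undocumented = [c for c in actual if c not in documented]
    absent = [c for c in documented if c not in actual]
    return undocumented, absent


# Empty toy frame with the raw_salaries column names from the table above.
toy_salaries = pd.DataFrame(columns=['yearID', 'teamID', 'lgID', 'playerID', 'salary'])
# 'bonus' is a deliberately wrong, made-up dictionary entry.
documented = ['yearID', 'teamID', 'lgID', 'playerID', 'salary', 'bonus']

undocumented, absent = compare_columns(toy_salaries, documented)
print(undocumented, absent)
```

Running the same check with each real DataFrame and its dictionary would flag entries that were copied into the wrong table.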


#### Data Types


In [7]:
# Print the column data types of each dataset
for dataset, name in zip(datasets, dataset_names):
    print(f'\n{name}:')
    print(dataset.dtypes)



raw_war:
name_common            object
age                   float64
mlb_ID                float64
player_ID              object
year_ID                 int64
team_ID                object
stint_ID                int64
lg_ID                  object
PA                    float64
G                       int64
Inn                   float64
runs_bat              float64
runs_br               float64
runs_dp               float64
runs_field            float64
runs_infield          float64
runs_outfield         float64
runs_catcher          float64
runs_good_plays       float64
runs_defense          float64
runs_position         float64
runs_position_p       float64
runs_replacement      float64
runs_above_rep        float64
runs_above_avg        float64
runs_above_avg_off    float64
runs_above_avg_def    float64
WAA                   float64
WAA_off               float64
WAA_def               float64
WAR                   float64
WAR_def               float64
WAR_off               float64


### Table Joins <a class="anchor" id="table-joins"></a>

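Starting from `raw_batting`, left-join the other tables to create a single dataset that contains player-level information for each season and stint. A left join keeps every batting row, even when a player has no match in the joined table. `raw_fielding` and `raw_salaries` are loaded above but are not joined here. The resulting dataset will be used for further analysis and modeling.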
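
A join like this only preserves the row count if the join keys are unique in the table being joined. The sketch below, on made-up toy tables, uses the `validate` argument of `pd.merge` to turn a violated assumption into an error instead of silently duplicated rows. The toy frames and their values are assumptions; only the key column names mirror this notebook.

```python
import pandas as pd

# Toy stand-ins for raw_batting and raw_war (values are made up).
batting = pd.DataFrame({
    'playerID': ['a01', 'a01', 'b02'],
    'yearID': [2001, 2002, 2001],
    'stint': [1, 1, 1],
    'H': [10, 20, 30],
})
war = pd.DataFrame({
    'player_ID': ['a01', 'a01'],
    'year_ID': [2001, 2002],
    'stint_ID': [1, 1],
    'WAR': [1.5, 2.0],
})
keys = dict(left_on=['playerID', 'yearID', 'stint'],
            right_on=['player_ID', 'year_ID', 'stint_ID'])

# validate='one_to_one' raises MergeError if either side repeats a key.
merged = pd.merge(batting, war, how='left', validate='one_to_one', **keys)
print(len(merged) == len(batting))

# A duplicated key on the right is caught instead of silently adding rows.
war_dup = pd.concat([war, war.iloc[[0]]], ignore_index=True)
try:
    pd.merge(batting, war_dup, how='left', validate='one_to_one', **keys)
    duplicate_caught = False
except pd.errors.MergeError:
    duplicate_caught = True
print(duplicate_caught)
```

For the joins against `raw_people` and `raw_teams`, where many batting rows share one key, `validate='many_to_one'` would be the matching check.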

In [38]:
pd.set_option('display.max_columns', None)

def show_player_info(dataset, player_name):
    """Return all rows for one player (matched on name_common) as a sanity check."""
    return dataset[dataset['name_common'] == player_name]


# Join raw_batting and raw_war on playerID = player_ID, yearID = year_ID, and stint = stint_ID
batting_war = pd.merge(raw_batting, raw_war, how='left', left_on=['playerID', 'yearID', 'stint'], right_on=['player_ID', 'year_ID', 'stint_ID'])

show_player_info(batting_war, 'Derek Jeter')

Unnamed: 0,playerID,yearID,stint,teamID,lgID,G_x,AB,R,H,2B,3B,HR,RBI,SB,CS,BB,SO,IBB,HBP,SH,SF,GIDP,name_common,age,mlb_ID,player_ID,year_ID,team_ID,stint_ID,lg_ID,PA,G_y,Inn,runs_bat,runs_br,runs_dp,runs_field,runs_infield,runs_outfield,runs_catcher,runs_good_plays,runs_defense,runs_position,runs_position_p,runs_replacement,runs_above_rep,runs_above_avg,runs_above_avg_off,runs_above_avg_def,WAA,WAA_off,WAA_def,WAR,WAR_def,WAR_off,WAR_rep,salary,pitcher,teamRpG,oppRpG,oppRpPA_rep,oppRpG_rep,pyth_exponent,pyth_exponent_rep,waa_win_perc,waa_win_perc_off,waa_win_perc_def,waa_win_perc_rep,OPS_plus,TOB_lg,TB_lg
73466,jeterde01,1995,1,NYA,AL,15,48,5,12,4,1,0,7.0,0.0,0.0,3,11.0,0.0,0.0,0.0,0.0,0.0,Derek Jeter,21.0,116539.0,jeterde01,1995.0,NYY,1.0,AL,51.0,15.0,120.0,-2.27,0.1,0.4,-4.6,0.0,0.0,0.0,,-4.6,0.85,0.0,1.91,-3.6,-5.5,-0.9,-3.8,-0.51,-0.08,-0.35,-0.34,-0.35,0.09,0.17,,N,5.03944,5.10078,0.08892,4.97328,1.935,1.932,0.4657,0.4941,0.4768,0.4878,74.011305,17.483,20.405
74712,jeterde01,1996,1,NYA,AL,157,582,104,183,25,6,10,78.0,14.0,7.0,48,102.0,1.0,9.0,6.0,9.0,13.0,Derek Jeter,22.0,116539.0,jeterde01,1996.0,NYY,1.0,AL,654.0,157.0,1370.7,9.09,1.6,1.81,-10.5,-3.0,0.0,0.0,,-13.5,9.78,0.0,24.53,33.3,8.8,22.3,-3.7,1.08,2.16,-0.12,3.29,-0.12,4.37,2.21,130000.0,N,5.53993,5.39802,0.09509,5.24181,1.977,1.962,0.505,0.5128,0.4979,0.4856,101.426778,227.902,260.096
75964,jeterde01,1997,1,NYA,AL,159,654,116,190,31,7,10,70.0,23.0,12.0,74,125.0,0.0,10.0,8.0,2.0,14.0,Derek Jeter,23.0,116539.0,jeterde01,1997.0,NYY,1.0,AL,748.0,159.0,1417.0,14.01,2.19,1.46,-1.8,-1.0,0.0,0.0,,-2.8,10.26,0.0,26.03,51.2,25.1,27.9,7.5,2.55,2.8,0.82,4.96,0.82,5.21,2.41,550000.0,N,5.13784,4.96224,0.08613,4.79852,1.933,1.914,0.5151,0.5168,0.5045,0.484,103.121196,252.192,280.501
77212,jeterde01,1998,1,NYA,AL,149,626,127,203,25,8,19,84.0,30.0,6.0,57,119.0,1.0,5.0,3.0,3.0,13.0,Derek Jeter,24.0,116539.0,jeterde01,1998.0,NYY,1.0,AL,694.0,149.0,1304.7,38.05,4.36,1.83,2.4,0.0,0.0,0.0,,2.4,9.0,0.0,24.12,79.8,55.6,53.2,11.4,5.29,5.1,1.14,7.53,1.14,7.34,2.24,750000.0,N,5.39864,5.04133,0.08829,4.87948,1.951,1.923,0.5349,0.5334,0.5073,0.4843,126.75572,232.314,267.114
78541,jeterde01,1999,1,NYA,AL,158,627,134,219,37,9,24,102.0,19.0,8.0,91,116.0,5.0,12.0,3.0,6.0,12.0,Derek Jeter,25.0,116539.0,jeterde01,1999.0,NYY,1.0,AL,739.0,158.0,1395.7,58.12,3.11,2.1,-9.1,-2.0,0.0,0.0,,-11.1,9.56,0.0,25.94,87.7,61.8,72.9,-1.5,5.58,6.65,-0.11,8.0,-0.11,9.07,2.42,5000000.0,N,5.75498,5.29365,0.09295,5.12948,1.983,1.95,0.5349,0.5413,0.4991,0.4846,153.329041,253.846,273.56
79849,jeterde01,2000,1,NYA,AL,148,593,119,201,31,4,15,73.0,22.0,4.0,68,99.0,4.0,12.0,3.0,3.0,14.0,Derek Jeter,26.0,116539.0,jeterde01,2000.0,NYY,1.0,AL,679.0,148.0,1278.7,31.63,6.73,0.88,-18.6,-4.0,0.0,0.0,,-22.6,8.7,0.0,24.01,49.4,25.3,47.9,-13.9,2.35,4.42,-1.19,4.57,-1.19,6.64,2.22,10000000.0,N,5.64178,5.31786,0.09361,5.15564,1.979,1.953,0.5153,0.5292,0.4915,0.4849,128.094926,235.18,262.402
81221,jeterde01,2001,1,NYA,AL,150,614,110,191,35,3,21,74.0,27.0,3.0,56,99.0,3.0,10.0,5.0,1.0,13.0,Derek Jeter,27.0,116539.0,jeterde01,2001.0,NYY,1.0,AL,686.0,150.0,1312.3,30.7,6.72,0.22,-14.7,-2.0,0.0,0.0,,-16.7,8.5,0.0,24.04,53.5,29.4,46.1,-8.2,2.89,4.49,-0.71,5.19,-0.71,6.79,2.3,12600000.0,N,5.19318,4.88558,0.0853,4.72534,1.932,1.906,0.5187,0.5295,0.4947,0.4841,123.98488,228.476,264.573
82554,jeterde01,2002,1,NYA,AL,157,644,124,191,26,0,18,75.0,32.0,3.0,73,114.0,2.0,7.0,3.0,3.0,14.0,Derek Jeter,28.0,116539.0,jeterde01,2002.0,NYY,1.0,AL,730.0,157.0,1392.3,14.37,8.09,-0.07,-15.4,-3.0,0.0,0.0,,-18.4,8.92,0.0,25.26,38.2,12.9,31.3,-9.5,1.29,3.13,-0.87,3.67,-0.87,5.51,2.38,14600000.0,N,5.02959,4.83016,0.08422,4.66929,1.92,1.9,0.5079,0.5194,0.4941,0.4839,111.404453,241.073,273.764
83882,jeterde01,2003,1,NYA,AL,119,482,87,156,25,3,10,52.0,11.0,5.0,43,88.0,2.0,13.0,3.0,1.0,10.0,Derek Jeter,29.0,116539.0,jeterde01,2003.0,NYY,1.0,AL,542.0,119.0,1033.7,19.69,3.19,1.43,-11.0,-2.0,0.0,0.0,0.0,-13.0,6.32,0.0,19.34,37.0,17.6,30.6,-6.7,1.74,3.0,-0.58,3.57,-0.58,4.83,1.83,15600000.0,N,5.16126,4.90387,0.08597,4.74131,1.931,1.908,0.5141,0.5247,0.4946,0.4839,124.70112,178.571,204.754
85286,jeterde01,2004,1,NYA,AL,154,643,111,188,44,1,23,78.0,23.0,4.0,46,99.0,1.0,14.0,16.0,2.0,19.0,Derek Jeter,30.0,116539.0,jeterde01,2004.0,NYY,1.0,AL,721.0,154.0,1341.7,20.01,4.43,-1.07,-12.0,0.0,0.0,0.0,-1.0,-13.0,8.18,0.0,24.99,43.5,18.6,31.6,-4.8,1.91,3.04,-0.29,4.24,-0.29,5.37,2.33,18600000.0,N,5.21688,5.01201,0.08781,4.84975,1.94,1.92,0.5114,0.5194,0.497,0.4842,114.248715,236.668,276.812
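
In the output above, `year_ID` and `stint_ID` print as `1995.0` and `1.0` although `raw_war` stores them as `int64`. This suggests that some batting rows found no match in `raw_war`, because a left join fills unmatched rows with `NaN` and pandas then stores the column as float. A sketch on made-up toy tables shows how `indicator=True` counts matched and unmatched rows.

```python
import pandas as pd

# Toy stand-ins for raw_batting and raw_war; player b02 has no WAR row.
batting = pd.DataFrame({
    'playerID': ['a01', 'a01', 'b02'],
    'yearID': [2001, 2002, 2001],
    'stint': [1, 1, 1],
})
war = pd.DataFrame({
    'player_ID': ['a01', 'a01'],
    'year_ID': [2001, 2002],
    'stint_ID': [1, 1],
    'WAR': [1.5, 2.0],
})

merged = pd.merge(
    batting, war, how='left',
    left_on=['playerID', 'yearID', 'stint'],
    right_on=['player_ID', 'year_ID', 'stint_ID'],
    indicator=True,  # adds a _merge column: 'both', 'left_only', or 'right_only'
)
match_counts = merged['_merge'].value_counts()
print(match_counts)
print(merged['year_ID'].dtype)  # float64: the unmatched row holds NaN
```

Applied to `batting_war`, the `left_only` count would show how many batting rows carry no WAR information.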


In [46]:
# Join batting_war and raw_people on playerID
batting_war_people = pd.merge(batting_war, raw_people, how='left', on='playerID')

show_player_info(batting_war_people, 'Derek Jeter')

Unnamed: 0,playerID,yearID,stint,teamID,lgID,G_x,AB,R,H,2B,3B,HR,RBI,SB,CS,BB,SO,IBB,HBP,SH,SF,GIDP,name_common,age,mlb_ID,player_ID,year_ID,team_ID,stint_ID,lg_ID,PA,G_y,Inn,runs_bat,runs_br,runs_dp,runs_field,runs_infield,runs_outfield,runs_catcher,runs_good_plays,runs_defense,runs_position,runs_position_p,runs_replacement,runs_above_rep,runs_above_avg,runs_above_avg_off,runs_above_avg_def,WAA,WAA_off,WAA_def,WAR,WAR_def,WAR_off,WAR_rep,salary,pitcher,teamRpG,oppRpG,oppRpPA_rep,oppRpG_rep,pyth_exponent,pyth_exponent_rep,waa_win_perc,waa_win_perc_off,waa_win_perc_def,waa_win_perc_rep,OPS_plus,TOB_lg,TB_lg,birthYear,birthMonth,birthDay,birthCountry,birthState,birthCity,deathYear,deathMonth,deathDay,deathCountry,deathState,deathCity,nameFirst,nameLast,nameGiven,weight,height,bats,throws,debut,finalGame,retroID,bbrefID
73466,jeterde01,1995,1,NYA,AL,15,48,5,12,4,1,0,7.0,0.0,0.0,3,11.0,0.0,0.0,0.0,0.0,0.0,Derek Jeter,21.0,116539.0,jeterde01,1995.0,NYY,1.0,AL,51.0,15.0,120.0,-2.27,0.1,0.4,-4.6,0.0,0.0,0.0,,-4.6,0.85,0.0,1.91,-3.6,-5.5,-0.9,-3.8,-0.51,-0.08,-0.35,-0.34,-0.35,0.09,0.17,,N,5.03944,5.10078,0.08892,4.97328,1.935,1.932,0.4657,0.4941,0.4768,0.4878,74.011305,17.483,20.405,1974.0,6.0,26.0,USA,NJ,Pequannock,,,,,,,Derek,Jeter,Derek Sanderson,195.0,75.0,R,R,1995-05-29,2014-09-28,jeted001,jeterde01
74712,jeterde01,1996,1,NYA,AL,157,582,104,183,25,6,10,78.0,14.0,7.0,48,102.0,1.0,9.0,6.0,9.0,13.0,Derek Jeter,22.0,116539.0,jeterde01,1996.0,NYY,1.0,AL,654.0,157.0,1370.7,9.09,1.6,1.81,-10.5,-3.0,0.0,0.0,,-13.5,9.78,0.0,24.53,33.3,8.8,22.3,-3.7,1.08,2.16,-0.12,3.29,-0.12,4.37,2.21,130000.0,N,5.53993,5.39802,0.09509,5.24181,1.977,1.962,0.505,0.5128,0.4979,0.4856,101.426778,227.902,260.096,1974.0,6.0,26.0,USA,NJ,Pequannock,,,,,,,Derek,Jeter,Derek Sanderson,195.0,75.0,R,R,1995-05-29,2014-09-28,jeted001,jeterde01
75964,jeterde01,1997,1,NYA,AL,159,654,116,190,31,7,10,70.0,23.0,12.0,74,125.0,0.0,10.0,8.0,2.0,14.0,Derek Jeter,23.0,116539.0,jeterde01,1997.0,NYY,1.0,AL,748.0,159.0,1417.0,14.01,2.19,1.46,-1.8,-1.0,0.0,0.0,,-2.8,10.26,0.0,26.03,51.2,25.1,27.9,7.5,2.55,2.8,0.82,4.96,0.82,5.21,2.41,550000.0,N,5.13784,4.96224,0.08613,4.79852,1.933,1.914,0.5151,0.5168,0.5045,0.484,103.121196,252.192,280.501,1974.0,6.0,26.0,USA,NJ,Pequannock,,,,,,,Derek,Jeter,Derek Sanderson,195.0,75.0,R,R,1995-05-29,2014-09-28,jeted001,jeterde01
77212,jeterde01,1998,1,NYA,AL,149,626,127,203,25,8,19,84.0,30.0,6.0,57,119.0,1.0,5.0,3.0,3.0,13.0,Derek Jeter,24.0,116539.0,jeterde01,1998.0,NYY,1.0,AL,694.0,149.0,1304.7,38.05,4.36,1.83,2.4,0.0,0.0,0.0,,2.4,9.0,0.0,24.12,79.8,55.6,53.2,11.4,5.29,5.1,1.14,7.53,1.14,7.34,2.24,750000.0,N,5.39864,5.04133,0.08829,4.87948,1.951,1.923,0.5349,0.5334,0.5073,0.4843,126.75572,232.314,267.114,1974.0,6.0,26.0,USA,NJ,Pequannock,,,,,,,Derek,Jeter,Derek Sanderson,195.0,75.0,R,R,1995-05-29,2014-09-28,jeted001,jeterde01
78541,jeterde01,1999,1,NYA,AL,158,627,134,219,37,9,24,102.0,19.0,8.0,91,116.0,5.0,12.0,3.0,6.0,12.0,Derek Jeter,25.0,116539.0,jeterde01,1999.0,NYY,1.0,AL,739.0,158.0,1395.7,58.12,3.11,2.1,-9.1,-2.0,0.0,0.0,,-11.1,9.56,0.0,25.94,87.7,61.8,72.9,-1.5,5.58,6.65,-0.11,8.0,-0.11,9.07,2.42,5000000.0,N,5.75498,5.29365,0.09295,5.12948,1.983,1.95,0.5349,0.5413,0.4991,0.4846,153.329041,253.846,273.56,1974.0,6.0,26.0,USA,NJ,Pequannock,,,,,,,Derek,Jeter,Derek Sanderson,195.0,75.0,R,R,1995-05-29,2014-09-28,jeted001,jeterde01
79849,jeterde01,2000,1,NYA,AL,148,593,119,201,31,4,15,73.0,22.0,4.0,68,99.0,4.0,12.0,3.0,3.0,14.0,Derek Jeter,26.0,116539.0,jeterde01,2000.0,NYY,1.0,AL,679.0,148.0,1278.7,31.63,6.73,0.88,-18.6,-4.0,0.0,0.0,,-22.6,8.7,0.0,24.01,49.4,25.3,47.9,-13.9,2.35,4.42,-1.19,4.57,-1.19,6.64,2.22,10000000.0,N,5.64178,5.31786,0.09361,5.15564,1.979,1.953,0.5153,0.5292,0.4915,0.4849,128.094926,235.18,262.402,1974.0,6.0,26.0,USA,NJ,Pequannock,,,,,,,Derek,Jeter,Derek Sanderson,195.0,75.0,R,R,1995-05-29,2014-09-28,jeted001,jeterde01
81221,jeterde01,2001,1,NYA,AL,150,614,110,191,35,3,21,74.0,27.0,3.0,56,99.0,3.0,10.0,5.0,1.0,13.0,Derek Jeter,27.0,116539.0,jeterde01,2001.0,NYY,1.0,AL,686.0,150.0,1312.3,30.7,6.72,0.22,-14.7,-2.0,0.0,0.0,,-16.7,8.5,0.0,24.04,53.5,29.4,46.1,-8.2,2.89,4.49,-0.71,5.19,-0.71,6.79,2.3,12600000.0,N,5.19318,4.88558,0.0853,4.72534,1.932,1.906,0.5187,0.5295,0.4947,0.4841,123.98488,228.476,264.573,1974.0,6.0,26.0,USA,NJ,Pequannock,,,,,,,Derek,Jeter,Derek Sanderson,195.0,75.0,R,R,1995-05-29,2014-09-28,jeted001,jeterde01
82554,jeterde01,2002,1,NYA,AL,157,644,124,191,26,0,18,75.0,32.0,3.0,73,114.0,2.0,7.0,3.0,3.0,14.0,Derek Jeter,28.0,116539.0,jeterde01,2002.0,NYY,1.0,AL,730.0,157.0,1392.3,14.37,8.09,-0.07,-15.4,-3.0,0.0,0.0,,-18.4,8.92,0.0,25.26,38.2,12.9,31.3,-9.5,1.29,3.13,-0.87,3.67,-0.87,5.51,2.38,14600000.0,N,5.02959,4.83016,0.08422,4.66929,1.92,1.9,0.5079,0.5194,0.4941,0.4839,111.404453,241.073,273.764,1974.0,6.0,26.0,USA,NJ,Pequannock,,,,,,,Derek,Jeter,Derek Sanderson,195.0,75.0,R,R,1995-05-29,2014-09-28,jeted001,jeterde01
83882,jeterde01,2003,1,NYA,AL,119,482,87,156,25,3,10,52.0,11.0,5.0,43,88.0,2.0,13.0,3.0,1.0,10.0,Derek Jeter,29.0,116539.0,jeterde01,2003.0,NYY,1.0,AL,542.0,119.0,1033.7,19.69,3.19,1.43,-11.0,-2.0,0.0,0.0,0.0,-13.0,6.32,0.0,19.34,37.0,17.6,30.6,-6.7,1.74,3.0,-0.58,3.57,-0.58,4.83,1.83,15600000.0,N,5.16126,4.90387,0.08597,4.74131,1.931,1.908,0.5141,0.5247,0.4946,0.4839,124.70112,178.571,204.754,1974.0,6.0,26.0,USA,NJ,Pequannock,,,,,,,Derek,Jeter,Derek Sanderson,195.0,75.0,R,R,1995-05-29,2014-09-28,jeted001,jeterde01
85286,jeterde01,2004,1,NYA,AL,154,643,111,188,44,1,23,78.0,23.0,4.0,46,99.0,1.0,14.0,16.0,2.0,19.0,Derek Jeter,30.0,116539.0,jeterde01,2004.0,NYY,1.0,AL,721.0,154.0,1341.7,20.01,4.43,-1.07,-12.0,0.0,0.0,0.0,-1.0,-13.0,8.18,0.0,24.99,43.5,18.6,31.6,-4.8,1.91,3.04,-0.29,4.24,-0.29,5.37,2.33,18600000.0,N,5.21688,5.01201,0.08781,4.84975,1.94,1.92,0.5114,0.5194,0.497,0.4842,114.248715,236.668,276.812,1974.0,6.0,26.0,USA,NJ,Pequannock,,,,,,,Derek,Jeter,Derek Sanderson,195.0,75.0,R,R,1995-05-29,2014-09-28,jeted001,jeterde01


In [51]:
# Join batting_war_people and raw_teams on teamID and yearID
batting_war_people_teams = pd.merge(batting_war_people, raw_teams, how='left', on=['teamID', 'yearID'])

show_player_info(batting_war_people_teams, 'Derek Jeter')

Unnamed: 0,playerID,yearID,stint,teamID,lgID_x,G_x,AB_x,R_x,H_x,2B_x,3B_x,HR_x,RBI,SB_x,CS_x,BB_x,SO_x,IBB,HBP_x,SH,SF_x,GIDP,name_common,age,mlb_ID,player_ID,year_ID,team_ID,stint_ID,lg_ID,PA,G_y,Inn,runs_bat,runs_br,runs_dp,runs_field,runs_infield,runs_outfield,runs_catcher,runs_good_plays,runs_defense,runs_position,runs_position_p,runs_replacement,runs_above_rep,runs_above_avg,runs_above_avg_off,runs_above_avg_def,WAA,WAA_off,WAA_def,WAR,WAR_def,WAR_off,WAR_rep,salary,pitcher,teamRpG,oppRpG,oppRpPA_rep,oppRpG_rep,pyth_exponent,pyth_exponent_rep,waa_win_perc,waa_win_perc_off,waa_win_perc_def,waa_win_perc_rep,OPS_plus,TOB_lg,TB_lg,birthYear,birthMonth,birthDay,birthCountry,birthState,birthCity,deathYear,deathMonth,deathDay,deathCountry,deathState,deathCity,nameFirst,nameLast,nameGiven,weight,height,bats,throws,debut,finalGame,retroID,bbrefID,lgID_y,franchID,divID,Rank,G,Ghome,W,L,DivWin,WCWin,LgWin,WSWin,R_y,AB_y,H_y,2B_y,3B_y,HR_y,BB_y,SO_y,SB_y,CS_y,HBP_y,SF_y,RA,ER,ERA,CG,SHO,SV,IPouts,HA,HRA,BBA,SOA,E,DP,FP,name,park,attendance,BPF,PPF,teamIDBR,teamIDlahman45,teamIDretro
73466,jeterde01,1995,1,NYA,AL,15,48,5,12,4,1,0,7.0,0.0,0.0,3,11.0,0.0,0.0,0.0,0.0,0.0,Derek Jeter,21.0,116539.0,jeterde01,1995.0,NYY,1.0,AL,51.0,15.0,120.0,-2.27,0.1,0.4,-4.6,0.0,0.0,0.0,,-4.6,0.85,0.0,1.91,-3.6,-5.5,-0.9,-3.8,-0.51,-0.08,-0.35,-0.34,-0.35,0.09,0.17,,N,5.03944,5.10078,0.08892,4.97328,1.935,1.932,0.4657,0.4941,0.4768,0.4878,74.011305,17.483,20.405,1974.0,6.0,26.0,USA,NJ,Pequannock,,,,,,,Derek,Jeter,Derek Sanderson,195.0,75.0,R,R,1995-05-29,2014-09-28,jeted001,jeterde01,AL,NYY,E,2,145,73.0,79,65,N,Y,N,N,749,4947,1365,280,34,122,625,851.0,50.0,30.0,39.0,68.0,688,651,4.56,18,5,35,3854,1286,159,535,908,74,121,0.986,New York Yankees,Yankee Stadium II,1705263.0,99,98,NYY,NYA,NYA
74712,jeterde01,1996,1,NYA,AL,157,582,104,183,25,6,10,78.0,14.0,7.0,48,102.0,1.0,9.0,6.0,9.0,13.0,Derek Jeter,22.0,116539.0,jeterde01,1996.0,NYY,1.0,AL,654.0,157.0,1370.7,9.09,1.6,1.81,-10.5,-3.0,0.0,0.0,,-13.5,9.78,0.0,24.53,33.3,8.8,22.3,-3.7,1.08,2.16,-0.12,3.29,-0.12,4.37,2.21,130000.0,N,5.53993,5.39802,0.09509,5.24181,1.977,1.962,0.505,0.5128,0.4979,0.4856,101.426778,227.902,260.096,1974.0,6.0,26.0,USA,NJ,Pequannock,,,,,,,Derek,Jeter,Derek Sanderson,195.0,75.0,R,R,1995-05-29,2014-09-28,jeted001,jeterde01,AL,NYY,E,1,162,80.0,92,70,Y,N,Y,Y,871,5628,1621,293,28,162,632,909.0,96.0,46.0,41.0,72.0,787,744,4.65,6,9,52,4320,1469,143,610,1139,91,146,0.985,New York Yankees,Yankee Stadium II,2250877.0,101,100,NYY,NYA,NYA
75964,jeterde01,1997,1,NYA,AL,159,654,116,190,31,7,10,70.0,23.0,12.0,74,125.0,0.0,10.0,8.0,2.0,14.0,Derek Jeter,23.0,116539.0,jeterde01,1997.0,NYY,1.0,AL,748.0,159.0,1417.0,14.01,2.19,1.46,-1.8,-1.0,0.0,0.0,,-2.8,10.26,0.0,26.03,51.2,25.1,27.9,7.5,2.55,2.8,0.82,4.96,0.82,5.21,2.41,550000.0,N,5.13784,4.96224,0.08613,4.79852,1.933,1.914,0.5151,0.5168,0.5045,0.484,103.121196,252.192,280.501,1974.0,6.0,26.0,USA,NJ,Pequannock,,,,,,,Derek,Jeter,Derek Sanderson,195.0,75.0,R,R,1995-05-29,2014-09-28,jeted001,jeterde01,AL,NYY,E,2,162,80.0,96,66,N,Y,N,N,891,5710,1636,325,23,161,676,954.0,99.0,58.0,37.0,70.0,688,626,3.84,11,10,51,4403,1463,144,532,1165,104,156,0.983,New York Yankees,Yankee Stadium II,2580325.0,100,98,NYY,NYA,NYA
77212,jeterde01,1998,1,NYA,AL,149,626,127,203,25,8,19,84.0,30.0,6.0,57,119.0,1.0,5.0,3.0,3.0,13.0,Derek Jeter,24.0,116539.0,jeterde01,1998.0,NYY,1.0,AL,694.0,149.0,1304.7,38.05,4.36,1.83,2.4,0.0,0.0,0.0,,2.4,9.0,0.0,24.12,79.8,55.6,53.2,11.4,5.29,5.1,1.14,7.53,1.14,7.34,2.24,750000.0,N,5.39864,5.04133,0.08829,4.87948,1.951,1.923,0.5349,0.5334,0.5073,0.4843,126.75572,232.314,267.114,1974.0,6.0,26.0,USA,NJ,Pequannock,,,,,,,Derek,Jeter,Derek Sanderson,195.0,75.0,R,R,1995-05-29,2014-09-28,jeted001,jeterde01,AL,NYY,E,1,162,81.0,114,48,Y,N,Y,Y,965,5643,1625,290,31,207,653,1025.0,153.0,63.0,57.0,59.0,656,619,3.82,22,16,48,4370,1357,156,466,1080,98,146,0.984,New York Yankees,Yankee Stadium II,2955193.0,97,95,NYY,NYA,NYA
78541,jeterde01,1999,1,NYA,AL,158,627,134,219,37,9,24,102.0,19.0,8.0,91,116.0,5.0,12.0,3.0,6.0,12.0,Derek Jeter,25.0,116539.0,jeterde01,1999.0,NYY,1.0,AL,739.0,158.0,1395.7,58.12,3.11,2.1,-9.1,-2.0,0.0,0.0,,-11.1,9.56,0.0,25.94,87.7,61.8,72.9,-1.5,5.58,6.65,-0.11,8.0,-0.11,9.07,2.42,5000000.0,N,5.75498,5.29365,0.09295,5.12948,1.983,1.95,0.5349,0.5413,0.4991,0.4846,153.329041,253.846,273.56,1974.0,6.0,26.0,USA,NJ,Pequannock,,,,,,,Derek,Jeter,Derek Sanderson,195.0,75.0,R,R,1995-05-29,2014-09-28,jeted001,jeterde01,AL,NYY,E,1,162,81.0,98,64,Y,N,Y,Y,900,5568,1568,302,36,193,718,978.0,104.0,57.0,55.0,53.0,731,661,4.13,6,10,50,4319,1402,158,581,1111,111,132,0.982,New York Yankees,Yankee Stadium II,3292736.0,98,97,NYY,NYA,NYA
79849,jeterde01,2000,1,NYA,AL,148,593,119,201,31,4,15,73.0,22.0,4.0,68,99.0,4.0,12.0,3.0,3.0,14.0,Derek Jeter,26.0,116539.0,jeterde01,2000.0,NYY,1.0,AL,679.0,148.0,1278.7,31.63,6.73,0.88,-18.6,-4.0,0.0,0.0,,-22.6,8.7,0.0,24.01,49.4,25.3,47.9,-13.9,2.35,4.42,-1.19,4.57,-1.19,6.64,2.22,10000000.0,N,5.64178,5.31786,0.09361,5.15564,1.979,1.953,0.5153,0.5292,0.4915,0.4849,128.094926,235.18,262.402,1974.0,6.0,26.0,USA,NJ,Pequannock,,,,,,,Derek,Jeter,Derek Sanderson,195.0,75.0,R,R,1995-05-29,2014-09-28,jeted001,jeterde01,AL,NYY,E,1,161,80.0,87,74,Y,N,Y,Y,871,5556,1541,294,25,205,631,1007.0,99.0,48.0,57.0,50.0,814,753,4.76,9,6,40,4273,1458,177,577,1040,109,132,0.981,New York Yankees,Yankee Stadium II,3055435.0,99,98,NYY,NYA,NYA
81221,jeterde01,2001,1,NYA,AL,150,614,110,191,35,3,21,74.0,27.0,3.0,56,99.0,3.0,10.0,5.0,1.0,13.0,Derek Jeter,27.0,116539.0,jeterde01,2001.0,NYY,1.0,AL,686.0,150.0,1312.3,30.7,6.72,0.22,-14.7,-2.0,0.0,0.0,,-16.7,8.5,0.0,24.04,53.5,29.4,46.1,-8.2,2.89,4.49,-0.71,5.19,-0.71,6.79,2.3,12600000.0,N,5.19318,4.88558,0.0853,4.72534,1.932,1.906,0.5187,0.5295,0.4947,0.4841,123.98488,228.476,264.573,1974.0,6.0,26.0,USA,NJ,Pequannock,,,,,,,Derek,Jeter,Derek Sanderson,195.0,75.0,R,R,1995-05-29,2014-09-28,jeted001,jeterde01,AL,NYY,E,1,161,80.0,95,65,Y,N,Y,N,804,5577,1488,289,20,203,519,1035.0,161.0,53.0,64.0,43.0,713,649,4.02,7,9,57,4354,1429,158,465,1266,109,132,0.982,New York Yankees,Yankee Stadium II,3264907.0,102,100,NYY,NYA,NYA
82554,jeterde01,2002,1,NYA,AL,157,644,124,191,26,0,18,75.0,32.0,3.0,73,114.0,2.0,7.0,3.0,3.0,14.0,Derek Jeter,28.0,116539.0,jeterde01,2002.0,NYY,1.0,AL,730.0,157.0,1392.3,14.37,8.09,-0.07,-15.4,-3.0,0.0,0.0,,-18.4,8.92,0.0,25.26,38.2,12.9,31.3,-9.5,1.29,3.13,-0.87,3.67,-0.87,5.51,2.38,14600000.0,N,5.02959,4.83016,0.08422,4.66929,1.92,1.9,0.5079,0.5194,0.4941,0.4839,111.404453,241.073,273.764,1974.0,6.0,26.0,USA,NJ,Pequannock,,,,,,,Derek,Jeter,Derek Sanderson,195.0,75.0,R,R,1995-05-29,2014-09-28,jeted001,jeterde01,AL,NYY,E,1,161,80.0,103,58,Y,N,N,N,897,5601,1540,314,12,223,640,1171.0,100.0,38.0,72.0,41.0,697,625,3.87,9,11,53,4356,1441,144,403,1135,127,117,0.979,New York Yankees,Yankee Stadium II,3465807.0,100,99,NYY,NYA,NYA
83882,jeterde01,2003,1,NYA,AL,119,482,87,156,25,3,10,52.0,11.0,5.0,43,88.0,2.0,13.0,3.0,1.0,10.0,Derek Jeter,29.0,116539.0,jeterde01,2003.0,NYY,1.0,AL,542.0,119.0,1033.7,19.69,3.19,1.43,-11.0,-2.0,0.0,0.0,0.0,-13.0,6.32,0.0,19.34,37.0,17.6,30.6,-6.7,1.74,3.0,-0.58,3.57,-0.58,4.83,1.83,15600000.0,N,5.16126,4.90387,0.08597,4.74131,1.931,1.908,0.5141,0.5247,0.4946,0.4839,124.70112,178.571,204.754,1974.0,6.0,26.0,USA,NJ,Pequannock,,,,,,,Derek,Jeter,Derek Sanderson,195.0,75.0,R,R,1995-05-29,2014-09-28,jeted001,jeterde01,AL,NYY,E,1,163,82.0,101,61,Y,N,Y,N,877,5605,1518,304,14,230,684,1042.0,98.0,33.0,81.0,35.0,716,653,4.02,8,12,49,4386,1512,145,375,1119,114,126,0.981,New York Yankees,Yankee Stadium II,3465600.0,98,97,NYY,NYA,NYA
85286,jeterde01,2004,1,NYA,AL,154,643,111,188,44,1,23,78.0,23.0,4.0,46,99.0,1.0,14.0,16.0,2.0,19.0,Derek Jeter,30.0,116539.0,jeterde01,2004.0,NYY,1.0,AL,721.0,154.0,1341.7,20.01,4.43,-1.07,-12.0,0.0,0.0,0.0,-1.0,-13.0,8.18,0.0,24.99,43.5,18.6,31.6,-4.8,1.91,3.04,-0.29,4.24,-0.29,5.37,2.33,18600000.0,N,5.21688,5.01201,0.08781,4.84975,1.94,1.92,0.5114,0.5194,0.497,0.4842,114.248715,236.668,276.812,1974.0,6.0,26.0,USA,NJ,Pequannock,,,,,,,Derek,Jeter,Derek Sanderson,195.0,75.0,R,R,1995-05-29,2014-09-28,jeted001,jeterde01,AL,NYY,E,1,162,81.0,101,61,Y,N,N,N,897,5527,1483,281,20,242,670,982.0,84.0,33.0,80.0,50.0,808,752,4.69,1,5,59,4331,1532,182,445,1058,99,148,0.984,New York Yankees,Yankee Stadium II,3775292.0,98,97,NYY,NYA,NYA
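
After this join, columns that exist in both the player and team tables receive the default `_x` and `_y` suffixes (for example `R_x` for the player's runs and `R_y` for the team's runs). A possible refinement, sketched below, passes explicit `suffixes` so the origin of each column stays readable. The suffix names `_player` and `_team` are assumptions; the numbers are the 1996 values from the output above.

```python
import pandas as pd

# One player row and one team row, using the 1996 values shown above.
players = pd.DataFrame({
    'teamID': ['NYA'], 'yearID': [1996], 'playerID': ['jeterde01'],
    'R': [104], 'H': [183],
})
teams = pd.DataFrame({
    'teamID': ['NYA'], 'yearID': [1996],
    'R': [871], 'H': [1621], 'W': [92],
})

merged = pd.merge(
    players, teams, how='left', on=['teamID', 'yearID'],
    suffixes=('_player', '_team'),  # replaces the default ('_x', '_y')
)
print(list(merged.columns))
```

Only overlapping non-key columns are renamed; `W` exists only in the team table and keeps its name.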


In [53]:
# Shape of batting_war_people_teams
print(f'Shape of batting_war_people_teams: {batting_war_people_teams.shape}')

Shape of batting_war_people_teams: (112184, 140)
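
The 140 columns can be checked by hand from the column counts of the inputs. The arithmetic below assumes the counts read from this notebook: 22 batting columns (`playerID` through `GIDP` in the first merge header), 49 in `raw_war`, 24 in `raw_people`, and 48 in `raw_teams`.

```python
# Column counts read from the merge headers and data dictionaries in this notebook.
batting_cols = 22   # playerID through GIDP
war_cols = 49       # raw_war shape is (121375, 49)
people_cols = 24
teams_cols = 48

# First join: key names differ (playerID vs player_ID), so both sets of keys are kept.
after_war = batting_cols + war_cols
# Second join: the shared key playerID appears only once.
after_people = after_war + people_cols - 1
# Third join: the shared keys teamID and yearID appear only once each.
after_teams = after_people + teams_cols - 2

print(after_war, after_people, after_teams)  # 71 94 140
```

The row count can be checked the same way: with left joins and unique keys on the right side, it should equal the number of rows in `raw_batting`.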


In [52]:
# Save batting_war_people_teams to csv
batting_war_people_teams.to_csv('batting_war_people_teams.csv', index=False)

### Next Steps <a class="anchor" id="summary"></a>
- Data wrangling to clean and prepare data for modeling
- EDA on joined dataset to identify potential features for modeling

