# Moneyball Project

The concept of “moneyball”, coined to describe the Oakland Athletics’ approach to building competitive teams despite having one of the sport’s lowest payrolls, entered the popular lexicon with Michael Lewis’ *Moneyball: The Art of Winning an Unfair Game*.

In this project we will apply our data wrangling and exploratory data analysis skills to baseball. We pulled 12 years of historical data (the 2010 through 2021 seasons) from Kaggle, MLB.com and Spotrac.com.

## Initial imports

Let's import the libraries for this project. We will use pandas to read the CSV files of batting statistics and glob to find those files by filename pattern.

In [1]:
import pandas as pd
import os
import glob

**Load Data**

Find the batting stats CSV files, one per season, that will be read into a pandas DataFrame.

In [2]:
path = '../Resources/batting_stats_*.csv' 
all_files = glob.glob(path)
all_files

['../Resources/batting_stats_2019.csv',
 '../Resources/batting_stats_2018.csv',
 '../Resources/batting_stats_2020.csv',
 '../Resources/batting_stats_2021.csv',
 '../Resources/batting_stats_2010.csv',
 '../Resources/batting_stats_2011.csv',
 '../Resources/batting_stats_2013.csv',
 '../Resources/batting_stats_2012.csv',
 '../Resources/batting_stats_2016.csv',
 '../Resources/batting_stats_2017.csv',
 '../Resources/batting_stats_2015.csv',
 '../Resources/batting_stats_2014.csv']
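The files in the listing above are not in chronological order, because `glob.glob` makes no promise about ordering. A minimal sketch, assuming a deterministic order is wanted, wraps the call in `sorted`. The temporary directory and empty files below are stand-ins for `../Resources`, created only so the example runs anywhere.

```python
import glob
import os
import tempfile

# Build a throwaway directory with empty stand-in files (hypothetical data,
# not the real batting stats) so the example is self-contained.
tmp_dir = tempfile.mkdtemp()
for yr in (2019, 2010, 2021):
    open(os.path.join(tmp_dir, f"batting_stats_{yr}.csv"), "w").close()

pattern = os.path.join(tmp_dir, "batting_stats_*.csv")

# sorted() gives a reproducible order; it is also chronological here because
# the four-digit year is the only part of the filename that varies.
sorted_files = sorted(glob.glob(pattern))
file_names = [os.path.basename(f) for f in sorted_files]
print(file_names)
```

The order does not change the combined data, only the order in which seasons appear in it.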

**Create a variable `data` to store files**

Use a `for` loop to iterate over `all_files`, read each file into a DataFrame, tag its rows with the season, and append it to `data`.

In [3]:
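data = []

for file in all_files:
    # Each file holds one season; the first row is the header.
    df = pd.read_csv(file, index_col=None, header=0)
    # The filename ends in 'YYYY.csv', so these four characters are the year.
    # Store it as an integer so it sorts and compares numerically.
    year = int(file[-8:-4])
    df['Year'] = year
    data.append(df)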
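The slice `file[-8:-4]` relies on the year being the last four characters before `.csv`. A slightly more defensive sketch pulls the year out with a regular expression and fails loudly if the filename does not match. Note that `year_from_path` is a hypothetical helper name, not something from the original notebook.

```python
import re
from pathlib import Path


def year_from_path(path):
    """Return the four-digit year embedded in a batting_stats_YYYY.csv path."""
    # Path(...).stem drops the directory and the .csv extension.
    match = re.search(r"batting_stats_(\d{4})$", Path(path).stem)
    if match is None:
        raise ValueError(f"no year found in {path!r}")
    return int(match.group(1))


example_year = year_from_path("../Resources/batting_stats_2019.csv")
print(example_year)  # 2019
```

Inside the loop, `df['Year'] = year_from_path(file)` would take the place of the slice.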

**Create a dataframe `stats_df`**

In [4]:
stats_df = pd.concat(data, axis=0, ignore_index=True)
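To see what `ignore_index=True` does in the call above, here is a small sketch with two made-up frames (the values are illustrative, not real statistics). Without it, each file's rows would keep their own 0-based labels and the combined index would contain duplicates.

```python
import pandas as pd

# Two toy per-season frames standing in for the per-file DataFrames.
season_a = pd.DataFrame({"Player": ["A", "B"], "Year": [2010, 2010]})
season_b = pd.DataFrame({"Player": ["C", "D"], "Year": [2011, 2011]})

# Default behaviour keeps each frame's original row labels.
kept = pd.concat([season_a, season_b], axis=0)

# ignore_index=True relabels the rows 0..n-1 in the combined frame.
combined = pd.concat([season_a, season_b], axis=0, ignore_index=True)

print(list(kept.index))      # [0, 1, 0, 1]
print(list(combined.index))  # [0, 1, 2, 3]
```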

In [5]:
stats_df.head()

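|   | Player | Team | Pos | Age | G | AB | R | H | 2B | 3B | ... | BB | SO | SH | SF | HBP | AVG | OBP | SLG | OPS | Year |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Whit Merrifield | KC | 2B | 32 | 162 | 681 | 105 | 206 | 41 | 10 | ... | 45 | 126 | 0 | 4 | 5 | 0.302 | 0.348 | 0.463 | 0.811 | 2019 |
| 1 | Marcus Semien | OAK | SS | 30 | 162 | 657 | 123 | 187 | 43 | 7 | ... | 87 | 102 | 0 | 1 | 2 | 0.285 | 0.369 | 0.522 | 0.891 | 2019 |
| 2 | Rafael Devers | BOS | 3B | 24 | 156 | 647 | 129 | 201 | 54 | 4 | ... | 48 | 119 | 1 | 2 | 4 | 0.311 | 0.361 | 0.555 | 0.916 | 2019 |
| 3 | Jonathan Villar | BAL | 2B | 30 | 162 | 642 | 111 | 176 | 33 | 5 | ... | 61 | 176 | 2 | 4 | 4 | 0.274 | 0.339 | 0.453 | 0.792 | 2019 |
| 4 | Ozzie Albies | ATL | 2B | 24 | 160 | 640 | 102 | 189 | 43 | 8 | ... | 54 | 112 | 0 | 4 | 4 | 0.295 | 0.352 | 0.5 | 0.852 | 2019 |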


In [6]:
stats_df.columns

Index(['Player', 'Team', 'Pos', 'Age', 'G', 'AB', 'R', 'H', '2B', '3B', 'HR',
       'RBI', 'SB', 'CS', 'BB', 'SO', 'SH', 'SF', 'HBP', 'AVG', 'OBP', 'SLG',
       'OPS', 'Year'],
      dtype='object')

**Re-arrange the columns**

In [7]:
stats_df = stats_df[['Year', 'Player', 'Team', 'Pos', 'Age', 'G', 'AB', 'R', 'H', '2B', '3B', 'HR', 'RBI',
                     'SB', 'CS', 'BB', 'SO', 'SH', 'SF', 'HBP', 'AVG', 'OBP', 'SLG', 'OPS']]

stats_df.head(20)

**Set `Year` as the index**

In [9]:
stats_df = stats_df.set_index('Year')

In [16]:
stats_df.head(10)

| Year | Player | Team | Pos | Age | G | AB | R | H | 2B | 3B | ... | CS | BB | SO | SH | SF | HBP | AVG | OBP | SLG | OPS |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2019 | Whit Merrifield | KC | 2B | 32 | 162 | 681 | 105 | 206 | 41 | 10 | ... | 10 | 45 | 126 | 0 | 4 | 5 | 0.302 | 0.348 | 0.463 | 0.811 |
| 2019 | Marcus Semien | OAK | SS | 30 | 162 | 657 | 123 | 187 | 43 | 7 | ... | 8 | 87 | 102 | 0 | 1 | 2 | 0.285 | 0.369 | 0.522 | 0.891 |
| 2019 | Rafael Devers | BOS | 3B | 24 | 156 | 647 | 129 | 201 | 54 | 4 | ... | 8 | 48 | 119 | 1 | 2 | 4 | 0.311 | 0.361 | 0.555 | 0.916 |
| 2019 | Jonathan Villar | BAL | 2B | 30 | 162 | 642 | 111 | 176 | 33 | 5 | ... | 9 | 61 | 176 | 2 | 4 | 4 | 0.274 | 0.339 | 0.453 | 0.792 |
| 2019 | Ozzie Albies | ATL | 2B | 24 | 160 | 640 | 102 | 189 | 43 | 8 | ... | 4 | 54 | 112 | 0 | 4 | 4 | 0.295 | 0.352 | 0.5 | 0.852 |
| 2019 | Eduardo Escobar | ARI | 3B | 32 | 158 | 636 | 94 | 171 | 29 | 10 | ... | 1 | 50 | 130 | 0 | 10 | 3 | 0.269 | 0.32 | 0.511 | 0.831 |
| 2019 | Starlin Castro | MIA | 2B | 31 | 162 | 636 | 68 | 172 | 31 | 4 | ... | 2 | 28 | 111 | 0 | 9 | 3 | 0.27 | 0.3 | 0.436 | 0.736 |
| 2019 | Jose Abreu | CWS | 1B | 34 | 159 | 634 | 85 | 180 | 38 | 1 | ... | 2 | 36 | 152 | 0 | 10 | 13 | 0.284 | 0.33 | 0.503 | 0.833 |
| 2019 | Jorge Polanco | MIN | SS | 28 | 153 | 631 | 107 | 186 | 40 | 7 | ... | 3 | 60 | 116 | 2 | 7 | 4 | 0.295 | 0.356 | 0.485 | 0.841 |
| 2019 | Ronald Acuna | ATL | OF | 23 | 156 | 626 | 127 | 175 | 22 | 2 | ... | 9 | 76 | 188 | 0 | 1 | 9 | 0.28 | 0.365 | 0.518 | 0.883 |
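With `Year` as the index, a single season can be selected by label. The sketch below uses a tiny made-up frame rather than the real data, and assumes the `Year` values are integers. If they are stored as strings, the label would need to be `'2019'` instead.

```python
import pandas as pd

# Hypothetical rows; player names and home run totals are placeholders.
toy_df = pd.DataFrame(
    {
        "Year": [2018, 2019, 2019],
        "Player": ["Player A", "Player B", "Player C"],
        "HR": [10, 20, 30],
    }
).set_index("Year")

# Passing a list to .loc always returns a DataFrame, even for a single match.
season_2019 = toy_df.loc[[2019]]
print(season_2019)

# The index is not unique (many players per year), so per-season totals
# go through groupby on the index level.
hr_by_year = toy_df.groupby(level="Year")["HR"].sum()
print(hr_by_year)
```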


**Save the combined DataFrame as a CSV file for later use**

In [11]:
stats_df.to_csv('../Resources/stats_df.csv')
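When the file is read back later, `Year` comes in as an ordinary column unless it is named as the index. A minimal sketch of the round trip, using an in-memory buffer and made-up rows in place of the real file:

```python
import io

import pandas as pd

# Hypothetical rows standing in for stats_df.
toy_df = pd.DataFrame(
    {
        "Year": [2019, 2020],
        "Player": ["Player A", "Player B"],
        "OPS": [0.811, 0.792],
    }
).set_index("Year")

# Write to an in-memory buffer instead of ../Resources/stats_df.csv.
buffer = io.StringIO()
toy_df.to_csv(buffer)
buffer.seek(0)

# index_col="Year" restores the index that to_csv wrote as the first column.
restored = pd.read_csv(buffer, index_col="Year")
print(restored.index.name)
```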