# playwithdata
The purpose of this file is to show off some of the basic data and stats that the other files will be working with. It is less a testbed than a simple demonstration of what the data looks like and how some of the more basic functions work (such as calculating individual player stats). The actual functions live in `baseballstats.py`, imported in the next cell as `bbs`.

In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import baseballstats as bbs

In [2]:
players = pd.read_csv("baseballdatabank-2022.2/core/People.csv")
players['playerID'] = players['playerID']


In [3]:
players.head()

| | playerID | birthYear | birthMonth | birthDay | birthCountry | birthState | birthCity | deathYear | deathMonth | deathDay | ... | nameLast | nameGiven | weight | height | bats | throws | debut | finalGame | retroID | bbrefID |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | aardsda01 | 1981.0 | 12.0 | 27.0 | USA | CO | Denver | NaN | NaN | NaN | ... | Aardsma | David Allan | 215.0 | 75.0 | R | R | 2004-04-06 | 2015-08-23 | aardd001 | aardsda01 |
| 1 | aaronha01 | 1934.0 | 2.0 | 5.0 | USA | AL | Mobile | 2021.0 | 1.0 | 22.0 | ... | Aaron | Henry Louis | 180.0 | 72.0 | R | R | 1954-04-13 | 1976-10-03 | aaroh101 | aaronha01 |
| 2 | aaronto01 | 1939.0 | 8.0 | 5.0 | USA | AL | Mobile | 1984.0 | 8.0 | 16.0 | ... | Aaron | Tommie Lee | 190.0 | 75.0 | R | R | 1962-04-10 | 1971-09-26 | aarot101 | aaronto01 |
| 3 | aasedo01 | 1954.0 | 9.0 | 8.0 | USA | CA | Orange | NaN | NaN | NaN | ... | Aase | Donald William | 190.0 | 75.0 | R | R | 1977-07-26 | 1990-10-03 | aased001 | aasedo01 |
| 4 | abadan01 | 1972.0 | 8.0 | 25.0 | USA | FL | Palm Beach | NaN | NaN | NaN | ... | Abad | Fausto Andres | 184.0 | 73.0 | L | L | 2001-09-10 | 2006-04-13 | abada001 | abadan01 |


In [4]:
# Iterate over (column name, dtype) pairs directly instead of indexing by position
for col, dtype in players.dtypes.items():
    print(col, dtype)

playerID object
birthYear float64
birthMonth float64
birthDay float64
birthCountry object
birthState object
birthCity object
deathYear float64
deathMonth float64
deathDay float64
deathCountry object
deathState object
deathCity object
nameFirst object
nameLast object
nameGiven object
weight float64
height float64
bats object
throws object
debut object
finalGame object
retroID object
bbrefID object


The first function created was `verify_player`. Its task is simply to make sure that a given `playerID` is in fact in the database, and to report it if not. This was largely meant to clarify any `KeyError`s that might spring up later, and it also served as a simple introduction to handling the data. Here, we simply check it against the first 10 players in the database:

In [5]:
for player in players['playerID'].head(10):
    print(player)
    try:
        bbs.verify_player(player)
        print("worked!")
    except Exception:  # avoid a bare except, which would also swallow KeyboardInterrupt
        print("nope :-(")

aardsda01
worked!
aaronha01
worked!
aaronto01
worked!
aasedo01
worked!
abadan01
worked!
abadfe01
worked!
abadijo01
worked!
abbated01
worked!
abbeybe01
worked!
abbeych01
worked!
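
For readers who don't have `baseballstats.py` open, a check of this kind could look roughly like the sketch below. This is not the actual `verify_player`: the name `verify_player_sketch`, the explicit `people` argument, and the choice to raise `KeyError` are assumptions made for illustration, and the two-row table is a stand-in for `People.csv`.

```python
import pandas as pd


def verify_player_sketch(player_id, people):
    """Hypothetical membership check; the real verify_player may differ.

    Raises a KeyError with a readable message when player_id is missing.
    """
    # Compare against the column's values (``in`` on a Series tests the index)
    if not people["playerID"].eq(player_id).any():
        raise KeyError(f"playerID {player_id!r} was not found in the People table")
    return True


# Tiny stand-in for People.csv, using two IDs that appear above
people_demo = pd.DataFrame({"playerID": ["aardsda01", "aaronha01"]})

print(verify_player_sketch("aaronha01", people_demo))
try:
    verify_player_sketch("nobody99", people_demo)  # made-up ID
except KeyError as err:
    print("not found:", err)
```

Raising early with a descriptive message means a mistyped ID fails at the point of entry rather than as an opaque lookup error deep inside a stat calculation.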


The next set of functions calculates basic stats, such as batting average (AVG), by digging through the database and combining the appropriate totals. It also includes helper functions for counting that can be called either in this notebook or within other functions in `baseballstats`. For consistency, batting and fielding stats are tested on Barry Bonds (`playerID = bondsba01`) and pitching stats on Tom Seaver (`playerID = seaveto01`). If you don't know what a stat represents, `baseballstats.py` contains more helpful documentation.

The numbers produced from Bonds and Seaver were verified by online searches.

In [8]:
bbs.verify_batter('bondsba01')  # makes sure he's in the database
print(f"Barry Bonds' career regular season batting average was {bbs.AVG('bondsba01'):.3f}")
print(f"His career regular season on-base percentage was {bbs.OBP('bondsba01'):.3f}")
print(f"His career regular season slugging percentage was {bbs.SLG('bondsba01'):.3f}")
print(f"His career regular season OPS+ was {bbs.OPSplus('bondsba01'):.3f}")

Barry Bonds' career regular season batting average was 0.298
His career regular season on-base percentage was 0.444
His career regular season slugging percentage was 0.607
His career regular season OPS+ was 181.860
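
The rate stats above are built from career counting totals. As a minimal sketch of the "combine the appropriate totals" idea, the snippet below sums per-season rows and applies the standard formulas AVG = H/AB, OBP = (H + BB + HBP)/(AB + BB + HBP + SF), and SLG = TB/AB. The helper names `count_stat` and `batting_rates` are hypothetical and are not the functions in `baseballstats.py`; the two season rows are invented numbers, not any real player's line. Column names follow the Lahman `Batting.csv` convention.

```python
import pandas as pd


def count_stat(rows, column):
    """Sum one counting column over all of a player's season rows."""
    return float(rows[column].sum())


def batting_rates(rows):
    """Return AVG, OBP and SLG from summed counting stats (illustrative only)."""
    ab = count_stat(rows, "AB")
    h = count_stat(rows, "H")
    bb = count_stat(rows, "BB")
    hbp = count_stat(rows, "HBP")
    sf = count_stat(rows, "SF")
    # Total bases: every hit counts once, plus extra bases for 2B, 3B and HR
    tb = h + count_stat(rows, "2B") + 2 * count_stat(rows, "3B") + 3 * count_stat(rows, "HR")
    return {
        "AVG": h / ab,
        "OBP": (h + bb + hbp) / (ab + bb + hbp + sf),
        "SLG": tb / ab,
    }


# Two invented seasons for a made-up player
seasons = pd.DataFrame(
    {
        "AB": [300, 200], "H": [90, 60], "2B": [20, 10], "3B": [3, 2],
        "HR": [12, 8], "BB": [40, 20], "HBP": [3, 2], "SF": [2, 3],
    }
)

rates = batting_rates(seasons)
print(f"AVG {rates['AVG']:.3f}  OBP {rates['OBP']:.3f}  SLG {rates['SLG']:.3f}")
# → AVG 0.300  OBP 0.377  SLG 0.500
```

Summing the totals first and dividing once gives the career rate; averaging the per-season rates instead would weight a short season the same as a full one.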


In [9]:
bbs.verify_pitcher('seaveto01')  # makes sure he's in the database
print(f"Tom Seaver's career regular season ERA was {bbs.ERA('seaveto01'):.2f}")
print(f"His career WHIP was {bbs.WHIP('seaveto01'):.2f}")
print(f"He struck out {int(bbs.count_pitching_stat('seaveto01', 'SO'))} batters over his career")
print(f"His career regular season ERA+ was {bbs.ERAplus('seaveto01'):.3f}")

Tom Seaver's career regular season ERA was 2.86
His career WHIP was 1.12
He struck out 3640 batters over his career
His career regular season ERA+ was 129.283
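
The pitching rates follow the same pattern. The sketch below assumes the Lahman `Pitching.csv` convention of storing innings as outs recorded (`IPouts`), so innings pitched is `IPouts / 3`; it then applies ERA = 9 × ER / IP and WHIP = (BB + H) / IP. The function name `pitching_rates` is hypothetical and the season rows are invented, so this illustrates the arithmetic rather than the code in `baseballstats.py`.

```python
import pandas as pd


def pitching_rates(rows):
    """Return ERA and WHIP from summed counting stats (illustrative only)."""
    innings = rows["IPouts"].sum() / 3  # three outs per inning
    earned_runs = rows["ER"].sum()
    baserunners = rows["BB"].sum() + rows["H"].sum()
    return {
        "ERA": 9 * earned_runs / innings,  # earned runs per nine innings
        "WHIP": baserunners / innings,     # walks plus hits per inning
    }


# Two invented seasons for a made-up pitcher (600 outs = 200 innings)
pitching_demo = pd.DataFrame(
    {"IPouts": [330, 270], "ER": [35, 25], "BB": [30, 20], "H": [90, 80]}
)

p_rates = pitching_rates(pitching_demo)
print(f"ERA {p_rates['ERA']:.2f}  WHIP {p_rates['WHIP']:.2f}")
# → ERA 2.70  WHIP 1.10
```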
