# Practice Data Analysis of the 2024 MLB season
- data from https://www.retrosheet.org/schedule/index.html
    - The information used here was obtained free of
      charge from and is copyrighted by Retrosheet.
      Interested parties may contact Retrosheet at
      20 Sunset Rd., Newark, DE 19711.

### Data Description: Fields and Their Meanings
1. Date in the form "yyyymmdd"
2. Number of game:
    - "0": a single game
    - "1": the first game of a doubleheader, including separate-admission doubleheaders
    - "2": the second game of a doubleheader, including separate-admission doubleheaders
3. day of week ("Sun","Mon","Tue","Wed","Thur","Fri","Sat"); the CSV loaded below spells these out in full (for example, "Wednesday")
4. visiting team
5. visiting team league
6. season game number for visiting team
7. home team
8. home team league
9. season game number for home team
10. day (D), night (N), afternoon (A), evening (E for twinight); the CSV loaded below stores these in lowercase (for example, "n")
11. location
12. postponement/cancellation indicator
13. date of makeup if played in the form "yyyymmdd"
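
The "yyyymmdd" dates load as plain integers, so date arithmetic and weekday lookups need a conversion first. A minimal sketch, assuming the two sample values below (copied from rows shown later in this notebook) and a hypothetical variable name `raw_dates`:

```python
import pandas as pd

# Two sample values in Retrosheet's "yyyymmdd" form, taken from the schedule.
raw_dates = pd.Series([20240320, 20240430], name="Date")

# Cast to string first so the explicit format string applies cleanly.
parsed = pd.to_datetime(raw_dates.astype(str), format="%Y%m%d")

# The derived weekday names can be compared against the Day column.
print(parsed.dt.day_name().tolist())  # ['Wednesday', 'Tuesday']
```

Comparing the derived weekday with the `Day` column is one cheap way to sanity-check the file.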

In [1]:
import numpy as np
import pandas as pd

import matplotlib.pyplot as plt
import seaborn as sns

import warnings
warnings.filterwarnings('ignore')

In [2]:
df = pd.read_csv('/Users/andrewbadzioch/Desktop/pandasPractice/data/2024schedule.csv')

In [3]:
mlb = df.copy()
mlb.head()

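| | Date | Num | Day | Visitor | League | Game | Home | League.1 | Game.1 | Day/Night | Location | Postponed | Makeup |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 20240320 | 0 | Wednesday | LAN | NL | 1 | SDN | NL | 1 | n | SEO01 | NaN | NaN |
| 1 | 20240321 | 0 | Thursday | SDN | NL | 2 | LAN | NL | 2 | n | SEO01 | NaN | NaN |
| 2 | 20240328 | 0 | Thursday | MIL | NL | 1 | NYN | NL | 1 | d | NYC20 | NaN | NaN |
| 3 | 20240328 | 0 | Thursday | ANA | AL | 1 | BAL | AL | 1 | d | BAL12 | NaN | NaN |
| 4 | 20240328 | 0 | Thursday | ATL | NL | 1 | PHI | NL | 1 | d | PHI13 | NaN | NaN |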
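
The `League.1` and `Game.1` columns are the home team's league and game number. One plausible explanation, assumed here rather than confirmed, is that the CSV header repeats `League` and `Game`, and `read_csv` adds the `.1` suffix to keep column names unique. The sketch below reproduces that behavior on a one-row in-memory file and renames the columns; the new names (`VisitorLeague`, `HomeGame`, and so on) are hypothetical.

```python
import io

import pandas as pd

# A one-row stand-in for the schedule file with a repeated header (assumption).
csv_text = "Visitor,League,Game,Home,League,Game\nLAN,NL,1,SDN,NL,1\n"
sample = pd.read_csv(io.StringIO(csv_text))

# pandas de-duplicates repeated names by appending ".1" to the second copy.
print(list(sample.columns))

# Hypothetical clearer names for the visitor/home pairs.
renamed = sample.rename(columns={
    "League": "VisitorLeague",
    "Game": "VisitorGame",
    "League.1": "HomeLeague",
    "Game.1": "HomeGame",
})
```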


In [4]:
mlb.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2430 entries, 0 to 2429
Data columns (total 13 columns):
 #   Column     Non-Null Count  Dtype  
---  ------     --------------  -----  
 0   Date       2430 non-null   int64  
 1   Num        2430 non-null   int64  
 2   Day        2430 non-null   object 
 3   Visitor    2430 non-null   object 
 4   League     2430 non-null   object 
 5   Game       2430 non-null   int64  
 6   Home       2430 non-null   object 
 7   League.1   2430 non-null   object 
 8   Game.1     2430 non-null   int64  
 9   Day/Night  2430 non-null   object 
 10  Location   2430 non-null   object 
 11  Postponed  0 non-null      float64
 12  Makeup     0 non-null      float64
dtypes: float64(2), int64(4), object(7)
memory usage: 246.9+ KB
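
`Postponed` and `Makeup` report 0 non-null values, so they carry no information in this file. A minimal sketch of finding and dropping all-null columns, using a small hand-built frame (`sample`) rather than the full schedule:

```python
import numpy as np
import pandas as pd

# Small stand-in frame with the same two empty columns.
sample = pd.DataFrame({
    "Date": [20240320, 20240321],
    "Location": ["SEO01", "SEO01"],
    "Postponed": [np.nan, np.nan],
    "Makeup": [np.nan, np.nan],
})

# List the columns in which every value is missing.
empty_cols = sample.columns[sample.isna().all()].tolist()

# Drop only the columns that are entirely null; partly filled ones survive.
trimmed = sample.dropna(axis="columns", how="all")
```

Whether to drop them is a judgment call: keeping them preserves the original 13-field layout.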


**Working with Attributes**
- shape: a (rows, columns) tuple
- size: the total number of cells (rows × columns)

In [5]:
mlb.shape

(2430, 13)

In [6]:
mlb.size

31590
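
The two attributes can be cross-checked with simple arithmetic. The sketch below uses the figures reported above and assumes a 30-team league in which every team plays 162 games:

```python
# Figures reported by mlb.shape and mlb.size in the cells above.
rows, cols = 2430, 13
reported_size = 31590

# size is the number of cells, i.e. rows multiplied by columns.
cells = rows * cols

# Assumption: 30 teams, 162 games each, two teams per game.
expected_games = 30 * 162 // 2

print(cells == reported_size, expected_games == rows)  # True True
```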

In [7]:
# positional indexing: .iloc[]
# .iloc is zero-based, so position 455 is the 456th row of the DataFrame
mlb.iloc[455]

Date         20240430
Num                 0
Day           Tuesday
Visitor           ATL
League             NL
Game               30
Home              SEA
League.1           AL
Game.1             30
Day/Night           n
Location        SEA03
Postponed         NaN
Makeup            NaN
Name: 455, dtype: object
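
Here position 455 and index label 455 coincide because the frame still has its default `RangeIndex`. After filtering, the two diverge. A minimal sketch on a four-row stand-in frame (`sample` and `subset` are illustrative names):

```python
import pandas as pd

sample = pd.DataFrame(
    {"Location": ["SEO01", "SEO01", "NYC20", "BAL12"]},
    index=[0, 1, 2, 3],
)

# Filtering keeps the original labels, so the subset is labelled 2 and 3.
subset = sample[sample["Location"] != "SEO01"]

# .iloc counts positions from zero within the subset.
first_by_position = subset.iloc[0]

# .loc looks up the preserved label; subset.loc[0] would raise KeyError.
row_by_label = subset.loc[2]
```

Both lookups return the NYC20 row, but only `.iloc` ignores the original labels.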

In [8]:
# count the games scheduled at each ballpark; value_counts() returns a Series
# each team plays 162 regular-season games, 81 of them at home
mlb['Location'].value_counts()

Location
WAS11    81
OAK01    81
CLE08    81
TOR02    81
ANA01    81
ATL03    81
SFO03    81
PIT08    81
NYC21    81
MIN04    81
MIL06    81
CHI11    81
PHO01    81
SEA03    81
ARL03    81
MIA02    81
BAL12    81
CHI12    81
CIN09    81
HOU03    81
KAN06    81
BOS07    81
STP01    81
STL10    80
SAN02    80
NYC20    80
PHI13    80
LOS03    80
DET05    80
DEN02    79
LON01     2
SEO01     2
MEX02     2
BIR01     1
WIL02     1
Name: count, dtype: int64
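
Some parks show 80 or 79 games because a few home games were moved to other sites. Grouping by both `Home` and `Location` makes that visible. The sketch below uses five rows copied from this notebook's outputs, not the full schedule:

```python
import pandas as pd

# Home team and site for five games shown elsewhere in this notebook.
sample = pd.DataFrame({
    "Home": ["SDN", "LAN", "LAN", "LAN", "LAN"],
    "Location": ["SEO01", "SEO01", "LOS03", "LOS03", "LOS03"],
})

# Count games for each (home team, site) pair.
per_site = sample.groupby(["Home", "Location"]).size()
print(per_site)
```

In this sample LAN is the home team for one game at SEO01, which is consistent with LOS03 showing 80 rather than 81 in the full counts.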

In [9]:
# count how many games were (True) and were not (False) played at LOS03
mlb.value_counts([mlb['Location'] == 'LOS03'])

Location
False       2350
True          80
Name: count, dtype: int64
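
The same tally can be taken directly from the boolean mask. A minimal sketch on a four-value stand-in Series (`locations` is an illustrative name):

```python
import pandas as pd

locations = pd.Series(["LOS03", "HOU03", "LOS03", "SEO01"], name="Location")

# Comparing a Series with a scalar yields a boolean mask.
mask = locations == "LOS03"

# value_counts() on the mask tallies both True and False.
tally = mask.value_counts()

# True counts as 1, so sum() gives just the number of matches.
matches = mask.sum()
```

`mask.sum()` is convenient when only the number of matching rows matters.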

In [10]:
# filter the DataFrame with a boolean mask to keep only the games played at LOS03
mlb[mlb['Location'] == 'LOS03']

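| | Date | Num | Day | Visitor | League | Game | Home | League.1 | Game.1 | Day/Night | Location | Postponed | Makeup |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 12 | 20240328 | 0 | Thursday | SLN | NL | 1 | LAN | NL | 3 | d | LOS03 | NaN | NaN |
| 24 | 20240329 | 0 | Friday | SLN | NL | 2 | LAN | NL | 4 | n | LOS03 | NaN | NaN |
| 38 | 20240330 | 0 | Saturday | SLN | NL | 3 | LAN | NL | 5 | n | LOS03 | NaN | NaN |
| 54 | 20240331 | 0 | Sunday | SLN | NL | 4 | LAN | NL | 6 | d | LOS03 | NaN | NaN |
| 68 | 20240401 | 0 | Monday | SFN | NL | 5 | LAN | NL | 7 | n | LOS03 | NaN | NaN |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 2326 | 20240921 | 0 | Saturday | COL | NL | 155 | LAN | NL | 155 | n | LOS03 | NaN | NaN |
| 2341 | 20240922 | 0 | Sunday | COL | NL | 156 | LAN | NL | 156 | d | LOS03 | NaN | NaN |
| 2360 | 20240924 | 0 | Tuesday | SDN | NL | 157 | LAN | NL | 157 | n | LOS03 | NaN | NaN |
| 2375 | 20240925 | 0 | Wednesday | SDN | NL | 158 | LAN | NL | 158 | n | LOS03 | NaN | NaN |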
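
Masks can be combined with `&` (and) or `|` (or), with each condition wrapped in parentheses. A minimal sketch that keeps only day games at LOS03, using five rows copied from this notebook's outputs:

```python
import pandas as pd

# Index labels and values copied from rows 7, 12, 24, 38 and 54.
sample = pd.DataFrame(
    {
        "Day/Night": ["d", "d", "n", "n", "d"],
        "Location": ["HOU03", "LOS03", "LOS03", "LOS03", "LOS03"],
    },
    index=[7, 12, 24, 38, 54],
)

# Parentheses are required because & binds more tightly than ==.
at_los03 = sample["Location"] == "LOS03"
is_day_game = sample["Day/Night"] == "d"
day_games = sample[at_los03 & is_day_game]

print(day_games.index.tolist())  # [12, 54]
```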


In [11]:
mlb.value_counts([mlb['Location'] == 'HOU03'])

Location
False       2349
True          81
Name: count, dtype: int64

In [12]:
mlb[mlb['Location'] == 'HOU03']

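| | Date | Num | Day | Visitor | League | Game | Home | League.1 | Game.1 | Day/Night | Location | Postponed | Makeup |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 7 | 20240328 | 0 | Thursday | NYA | AL | 1 | HOU | AL | 1 | d | HOU03 | NaN | NaN |
| 19 | 20240329 | 0 | Friday | NYA | AL | 2 | HOU | AL | 2 | n | HOU03 | NaN | NaN |
| 35 | 20240330 | 0 | Saturday | NYA | AL | 3 | HOU | AL | 3 | n | HOU03 | NaN | NaN |
| 47 | 20240331 | 0 | Sunday | NYA | AL | 4 | HOU | AL | 4 | d | HOU03 | NaN | NaN |
| 63 | 20240401 | 0 | Monday | TOR | AL | 5 | HOU | AL | 5 | n | HOU03 | NaN | NaN |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 2320 | 20240921 | 0 | Saturday | ANA | AL | 155 | HOU | AL | 155 | n | HOU03 | NaN | NaN |
| 2333 | 20240922 | 0 | Sunday | ANA | AL | 156 | HOU | AL | 156 | d | HOU03 | NaN | NaN |
| 2344 | 20240923 | 0 | Monday | SEA | AL | 157 | HOU | AL | 157 | n | HOU03 | NaN | NaN |
| 2356 | 20240924 | 0 | Tuesday | SEA | AL | 158 | HOU | AL | 158 | n | HOU03 | NaN | NaN |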


In [13]:
mlb[mlb['Location'] == 'SEO01']

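| | Date | Num | Day | Visitor | League | Game | Home | League.1 | Game.1 | Day/Night | Location | Postponed | Makeup |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 20240320 | 0 | Wednesday | LAN | NL | 1 | SDN | NL | 1 | n | SEO01 | NaN | NaN |
| 1 | 20240321 | 0 | Thursday | SDN | NL | 2 | LAN | NL | 2 | n | SEO01 | NaN | NaN |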


In [14]:
mlb[mlb['Location'] == 'MEX02']

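| | Date | Num | Day | Visitor | League | Game | Home | League.1 | Game.1 | Day/Night | Location | Postponed | Makeup |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 405 | 20240427 | 0 | Saturday | HOU | AL | 27 | COL | NL | 27 | d | MEX02 | NaN | NaN |
| 423 | 20240428 | 0 | Sunday | HOU | AL | 28 | COL | NL | 28 | d | MEX02 | NaN | NaN |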
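
Rather than filtering one location per cell, `isin()` can select several site codes at once. A minimal sketch using five rows copied from this notebook's outputs; the list name `special_sites` is illustrative:

```python
import pandas as pd

sample = pd.DataFrame({
    "Date": [20240320, 20240321, 20240328, 20240427, 20240428],
    "Visitor": ["LAN", "SDN", "MIL", "HOU", "HOU"],
    "Home": ["SDN", "LAN", "NYN", "COL", "COL"],
    "Location": ["SEO01", "SEO01", "NYC20", "MEX02", "MEX02"],
})

# Site codes that appear only twice in the value counts above.
special_sites = ["SEO01", "MEX02"]

# isin() is True wherever Location matches any code in the list.
special_games = sample[sample["Location"].isin(special_sites)]

print(special_games["Location"].value_counts().to_dict())
```

The same pattern extends to the other low-count sites (LON01, BIR01, WIL02) by adding them to the list.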
