# Intro Guide

The output of `audl-pull.py` is a set of comma-separated values (CSV) files.
The Pandas library is an elegant and powerful tool to explore these data.

```python
import pandas as pd
```

To read a single CSV file with Pandas, use `pd.read_csv`, which returns a `DataFrame` object.

Let's read in the Madison Radicals' 2018 championship season!

```python
mad2018 = pd.read_csv('../../data/processed/2018/2018_MadisonRadicals.csv', index_col=0)
```

The first column (column 0) of the CSV holds the row number, so we pass `index_col=0` to use it as the DataFrame's index.
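To see what `index_col=0` changes without needing the real data files, here is a minimal sketch that reads a tiny, made-up CSV held in memory. `io.StringIO` stands in for a file path, and the rows are invented placeholders, not AUDL data:

```python
import io

import pandas as pd

# A tiny invented CSV whose first, unnamed column holds row numbers,
# mimicking the layout of the processed files.
csv_text = """,Opponent,Action,Passer,Receiver
0,Team A,Catch,Player 1,Player 2
1,Team A,Catch,Player 2,Player 3
2,Team A,Goal,Player 3,Player 4
"""

# Without index_col, the row numbers become an ordinary column named "Unnamed: 0".
no_index = pd.read_csv(io.StringIO(csv_text))
print(no_index.columns.tolist())

# With index_col=0, the first column becomes the DataFrame's index instead.
events = pd.read_csv(io.StringIO(csv_text), index_col=0)
print(events.columns.tolist())
print(events.index.tolist())
```

Forgetting `index_col` is the usual source of a stray `Unnamed: 0` column.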

The `head()` method returns the first five rows of the DataFrame. Each row is a game event, and the events within each game are listed in chronological order.

```python
mad2018.head()
```

```
Unnamed: 0,Date/Time,Opponent,Point Elapsed Seconds,Line,Our Score - End of Point,Their Score - End of Point,Event Type,Action,Passer,Receiver,...,Begin Y,End Area,End X,End Y,Distance Unit of Measure,Absolute Distance,Lateral Distance,Toward Our Goal Distance,Teamname,Tournament
0,2018-04-07 19:38,Indianapolis AlleyCats,44,O,1,0,Offense,Catch,Nelson,K Brown,...,,,,,,,,,Madison Radicals,AUDL 2018
1,2018-04-07 19:38,Indianapolis AlleyCats,44,O,1,0,Offense,Catch,K Brown,Nelson,...,,,,,,,,,Madison Radicals,AUDL 2018
2,2018-04-07 19:38,Indianapolis AlleyCats,44,O,1,0,Offense,Catch,Nelson,Animal,...,,,,,,,,,Madison Radicals,AUDL 2018
3,2018-04-07 19:38,Indianapolis AlleyCats,44,O,1,0,Offense,Catch,Animal,Shriwise,...,,,,,,,,,Madison Radicals,AUDL 2018
4,2018-04-07 19:38,Indianapolis AlleyCats,44,O,1,0,Offense,Catch,Shriwise,K Brown,...,,,,,,,,,Madison Radicals,AUDL 2018
```
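`head()` also takes an optional row count, and `tail()` does the same from the end of the DataFrame. A small sketch on an invented DataFrame (the actions are placeholders; only their order matters here):

```python
import pandas as pd

# Ten invented events in a fixed order.
events = pd.DataFrame({
    "Action": ["Pull", "Catch", "Catch", "Drop", "Catch",
               "Catch", "Throwaway", "D", "Catch", "Goal"],
})

first_three = events.head(3)  # first 3 rows instead of the default 5
last_two = events.tail(2)     # last 2 rows

print(first_three.Action.tolist())
print(last_two.Action.tolist())
```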


The `columns` attribute lists the fields that describe each game event.

```python
mad2018.columns
```

```
Index(['Date/Time', 'Opponent', 'Point Elapsed Seconds', 'Line',
       'Our Score - End of Point', 'Their Score - End of Point', 'Event Type',
       'Action', 'Passer', 'Receiver', 'Defender', 'Hang Time (secs)',
       'Player 0', 'Player 1', 'Player 2', 'Player 3', 'Player 4', 'Player 5',
       'Player 6', 'Player 7', 'Player 8', 'Player 9', 'Player 10',
       'Player 11', 'Player 12', 'Player 13', 'Player 14', 'Player 15',
       'Player 16', 'Player 17', 'Player 18', 'Player 19', 'Player 20',
       'Player 21', 'Player 22', 'Player 23', 'Player 24', 'Player 25',
       'Player 26', 'Player 27', 'Elapsed Time (secs)', 'Begin Area',
       'Begin X', 'Begin Y', 'End Area', 'End X', 'End Y',
       'Distance Unit of Measure', 'Absolute Distance', 'Lateral Distance',
       'Toward Our Goal Distance', 'Teamname', 'Tournament'],
      dtype='object')
```

Individual columns can be selected...

```python
actions = mad2018.Action
actions[:5]
```

```
0    Catch
1    Catch
2    Catch
3    Catch
4    Catch
Name: Action, dtype: object
```
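Attribute access such as `mad2018.Action` only works for column names that are valid Python identifiers. Many of these columns contain spaces (e.g. `Point Elapsed Seconds`), so they need bracket notation. A sketch on an invented two-row DataFrame that reuses two of the real column names:

```python
import pandas as pd

# Invented rows using two column names from the AUDL files.
events = pd.DataFrame({
    "Action": ["Catch", "Goal"],
    "Point Elapsed Seconds": [44, 44],
})

# Attribute access works for names that are valid Python identifiers...
actions = events.Action

# ...but names containing spaces need bracket notation.
seconds = events["Point Elapsed Seconds"]

# A list of names inside the brackets returns a DataFrame rather than a Series.
subset = events[["Action", "Point Elapsed Seconds"]]

print(type(actions).__name__, type(seconds).__name__, type(subset).__name__)
```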

...and their contents summarized.

```python
actions.unique()
```

```
array(['Catch', 'Goal', 'Pull', 'D', 'Throwaway', 'EndOfFirstQuarter',
       'PullOb', 'Halftime', 'Stall', 'EndOfThirdQuarter', 'Drop',
       'GameOver'], dtype=object)
```

`unique` is useful when you want to know which distinct values a column contains.
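When only the number of distinct values matters, `nunique()` returns that count directly. A sketch on an invented Series of actions:

```python
import pandas as pd

actions = pd.Series(["Catch", "Catch", "Goal", "Pull", "Catch", "Goal"], name="Action")

# unique() returns a NumPy array in order of first appearance.
print(actions.unique())

# nunique() returns only how many distinct values there are.
print(actions.nunique())
```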

```python
actions.value_counts()
```

```
Catch                3966
Goal                  722
Throwaway             421
Pull                  413
D                     238
Drop                   61
PullOb                 34
EndOfThirdQuarter      17
GameOver               17
EndOfFirstQuarter      17
Halftime               17
Stall                   7
Name: Action, dtype: int64
```

`value_counts` takes this a step further and provides the number of occurrences.
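Passing `normalize=True` turns the counts into fractions of the total, which makes it easier to compare teams or seasons of different lengths. A sketch on an invented Series:

```python
import pandas as pd

actions = pd.Series(["Catch", "Catch", "Catch", "Goal", "Throwaway", "Catch"], name="Action")

counts = actions.value_counts()                # absolute counts, most frequent first
shares = actions.value_counts(normalize=True)  # fractions that sum to 1.0

print(counts)
print(shares.round(3))
```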

We can also select the subsets of the DataFrame that interest us by indexing it with a boolean condition.

Say, all events from games against their rivals, the Minnesota Wind Chill:

```python
mad2018[mad2018.Opponent == 'Minnesota Wind Chill'].head()
```

```
Unnamed: 0,Date/Time,Opponent,Point Elapsed Seconds,Line,Our Score - End of Point,Their Score - End of Point,Event Type,Action,Passer,Receiver,...,Begin Y,End Area,End X,End Y,Distance Unit of Measure,Absolute Distance,Lateral Distance,Toward Our Goal Distance,Teamname,Tournament
976,2018-05-05 15:25,Minnesota Wind Chill,45,O,1,0,Offense,Catch,K Brown,A Brown,...,,,,,,,,,Madison Radicals,AUDL 2018
977,2018-05-05 15:25,Minnesota Wind Chill,45,O,1,0,Offense,Catch,A Brown,Shriwise,...,,,,,,,,,Madison Radicals,AUDL 2018
978,2018-05-05 15:25,Minnesota Wind Chill,45,O,1,0,Offense,Catch,Shriwise,Akyuz,...,,,,,,,,,Madison Radicals,AUDL 2018
979,2018-05-05 15:25,Minnesota Wind Chill,45,O,1,0,Offense,Catch,Akyuz,Johnson,...,,,,,,,,,Madison Radicals,AUDL 2018
980,2018-05-05 15:25,Minnesota Wind Chill,45,O,1,0,Offense,Catch,Johnson,A Brown,...,,,,,,,,,Madison Radicals,AUDL 2018
```
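Conditions can be combined with `&` (and) and `|` (or). Each comparison needs its own parentheses, because those operators bind more tightly than `==`. `DataFrame.query` expresses the same filter as a string. A sketch on invented rows (the receivers are placeholders):

```python
import pandas as pd

events = pd.DataFrame({
    "Opponent": ["Minnesota Wind Chill", "Minnesota Wind Chill", "Indianapolis AlleyCats"],
    "Action": ["Catch", "Goal", "Goal"],
    "Receiver": ["Player A", "Player B", "Player C"],
})

# Goal events in games against one opponent: both conditions must hold.
vs_wind_chill_goals = events[
    (events.Opponent == "Minnesota Wind Chill") & (events.Action == "Goal")
]
print(vs_wind_chill_goals)

# The same filter written with DataFrame.query.
same = events.query("Opponent == 'Minnesota Wind Chill' and Action == 'Goal'")
print(same.equals(vs_wind_chill_goals))
```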


But why read in a single CSV... when you can have it all?!?

The `glob` module is convenient for collecting the filenames that match a wildcard (`*`) pattern.

```python
import glob

all_audl_files = glob.glob('../../data/processed/*/*.csv')
sorted(all_audl_files[:5])
```

```
['../output/AUDL2014_MadisonRadicals.csv',
 '../output/AUDL2016_PittsburghThunderbirds.csv',
 '../output/AUDL2018_ChicagoWildfire.csv',
 '../output/AUDL2018_DallasRoughnecks.csv',
 '../output/AUDL2018_IndianapolisAlleyCats.csv']
```
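`glob.glob` returns matches in no guaranteed order (it depends on the file system), so sorting the full list before using it keeps results reproducible across machines. The sketch below builds a throwaway directory tree that mimics a `<year>/<team>.csv` layout; the directory and file names are invented for illustration:

```python
import glob
import os
import tempfile

with tempfile.TemporaryDirectory() as root:
    # Create empty placeholder files such as <root>/2018/2018_TeamB.csv.
    for year, team in [("2018", "TeamB"), ("2017", "TeamA"), ("2018", "TeamA")]:
        os.makedirs(os.path.join(root, year), exist_ok=True)
        open(os.path.join(root, year, f"{year}_{team}.csv"), "w").close()

    # '*' matches any run of characters within a single path component,
    # so this pattern finds every CSV exactly one directory below root.
    pattern = os.path.join(root, "*", "*.csv")

    # Sort the whole list, not just a slice of it.
    files = sorted(glob.glob(pattern))
    names = [os.path.relpath(f, root) for f in files]

print(names)
```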

Let's read each file using the `read_csv` function and `index_col` option...

But because we are Python badasses, let's write that for-loop as a one-line list comprehension that collects each resulting DataFrame in a list.

```python
df_list = [pd.read_csv(dfteam, index_col=0) for dfteam in all_audl_files]
```
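For readers newer to list comprehensions, the one-liner is equivalent to appending inside an explicit for-loop. In this sketch, two invented CSV strings wrapped in `io.StringIO` stand in for the real file paths:

```python
import io

import pandas as pd

# Two invented CSV "files"; io.StringIO stands in for real file paths.
sources = [
    ",Teamname,Action\n0,Team A,Catch\n1,Team A,Goal\n",
    ",Teamname,Action\n0,Team B,Pull\n",
]

# The explicit for-loop...
df_list_loop = []
for text in sources:
    df_list_loop.append(pd.read_csv(io.StringIO(text), index_col=0))

# ...and the equivalent one-line list comprehension.
df_list = [pd.read_csv(io.StringIO(text), index_col=0) for text in sources]

print(len(df_list), all(a.equals(b) for a, b in zip(df_list, df_list_loop)))
```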

Using Pandas, we can concatenate ("smush together") all the DataFrames into one!

```python
audl = pd.concat(df_list)
```
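Because each file's index restarts at 0, the concatenated DataFrame repeats index labels, so a lookup such as `.loc[0]` returns one row per file. If unique labels are needed, `ignore_index=True` renumbers the rows; whether that is preferable depends on whether you want to keep the per-file row numbers. A sketch with two invented per-team DataFrames:

```python
import pandas as pd

# Two invented per-team DataFrames, each numbered from 0 like the processed CSVs.
team_a = pd.DataFrame({"Teamname": ["Team A", "Team A"], "Action": ["Catch", "Goal"]})
team_b = pd.DataFrame({"Teamname": ["Team B"], "Action": ["Pull"]})

combined = pd.concat([team_a, team_b])
print(combined.index.tolist())    # labels repeat: [0, 1, 0]

# ignore_index=True discards the original labels and numbers rows 0..n-1.
renumbered = pd.concat([team_a, team_b], ignore_index=True)
print(renumbered.index.tolist())  # [0, 1, 2]
```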

What can we do with this monstrosity of data, you may ask? Well, just about anything!
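As one small illustration (a sketch, not an analysis from this guide), a question like "how many `Goal` events does each team's file record?" becomes a filter followed by a `groupby`. The rows below are invented placeholders, not real AUDL results:

```python
import pandas as pd

events = pd.DataFrame({
    "Teamname": ["Team A", "Team A", "Team B", "Team B", "Team B"],
    "Action": ["Goal", "Catch", "Goal", "Goal", "Drop"],
})

# Keep only Goal events, count them per team, and list the largest count first.
goals_per_team = (
    events[events.Action == "Goal"]
    .groupby("Teamname")
    .size()
    .sort_values(ascending=False)
)
print(goals_per_team)
```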

There are far too many useful functions in Pandas to highlight here, nor could I do them justice as well as others already have.

However, this should get your brain whirring over the possibilities for turning questions into answers with Pandas and the right data in hand.