# Red Sox Game Analysis

This file examines post-game entries at Kenmore station in comparison to expectations.

## Setup    

In [1]:
# Libraries.
import matplotlib
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Feature libraries.
from features import date

In [2]:
# Package settings.
%matplotlib inline

## Data

In [81]:
# Read Red Sox home games.
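# pd.DataFrame.from_csv has been removed from pandas; read_csv with index_col=0 matches its default.
sox = pd.read_csv("../../../data/sox_master.csv", index_col=0).reset_index(drop=True)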

# Keep only the date portion of the start date (drops the trailing 0s).
sox.START_DATE = sox.START_DATE.str.extract(r'(\d+-\d+-\d+)', expand=False)
# Column for start datetime.
sox['game_datetime'] = pd.to_datetime(sox.START_DATE + ' ' + sox.START_TIME)

# Add a "rounded" game time to make grouping easier.
# Minutes 00-25 round down to :00 and minutes 30-55 round down to :30.
sox['game_time_rounded'] = sox.START_TIME.str.replace(r':[012][05] ', ':00 ', regex=True).str.replace(r':[345][05] ', ':30 ', regex=True)
sox['game_datetime_rounded'] = pd.to_datetime(sox.START_DATE + ' ' + sox.game_time_rounded)

# Turn start date into a date.
sox.START_DATE = pd.to_datetime(sox.START_DATE)

# Rename other columns to make joining easy.
sox.rename(columns={'START_DATE':'game_date', 'START_TIME':'game_time', 'OPPONENT':'opponent'}, inplace=True)
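
The rounding step above rewrites the time strings with regular expressions. As an alternative sketch (not the notebook's method), the same half-hour rounding can be done on parsed datetimes with `dt.floor`. The sample rows below are made up for illustration; only the column names `START_DATE` and `START_TIME` come from the data above.

```python
import pandas as pd

# Hypothetical sample rows; the real values come from sox_master.csv.
sample = pd.DataFrame({
    "START_DATE": ["2014-04-04", "2014-04-05", "2014-04-06"],
    "START_TIME": ["02:05 PM", "01:35 PM", "07:10 PM"],
})

# Parse the 12-hour clock strings into real datetimes.
game_datetime = pd.to_datetime(
    sample.START_DATE + " " + sample.START_TIME,
    format="%Y-%m-%d %I:%M %p",
)

# Floor to the half hour: minutes 00-29 become :00, minutes 30-59 become :30.
game_datetime_rounded = game_datetime.dt.floor("30min")

# Rebuild a label in the same "HH:MM AM/PM" shape for grouping.
game_time_rounded = game_datetime_rounded.dt.strftime("%I:%M %p")
print(game_time_rounded.tolist())
```

Unlike the regular expressions, which only match minutes that are multiples of five, flooring handles any minute value.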

In [None]:
sox.head()

In [41]:
# Import gatecount data.
def get_data():
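    # from_csv has been removed from pandas; these read_csv arguments match its defaults.
    return pd.read_csv("../../../data/mbta.csv", index_col=0, parse_dates=True).reset_index()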
gatecount = date.init(get_data())

# Restrict to Kenmore and the columns that we need.
kenmore = gatecount.loc[gatecount.locationid==1059,['service_day','service_datetime','entries']].reset_index(drop=True)

# Turn days/dates into datetimes.
kenmore.service_day = pd.to_datetime(kenmore.service_day)
kenmore.service_datetime = pd.to_datetime(kenmore.service_datetime)

# Add day of week.
kenmore = date.add_day_of_week(kenmore.copy())
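
The Kenmore restriction above combines a row filter and a column selection in one indexing call. A minimal sketch of that pattern with `.loc` follows. The rows and the `exits` column are hypothetical; only the other column names and the location id 1059 come from the cell above.

```python
import pandas as pd

# Hypothetical gate counts; the values and the "exits" column are made up.
gatecount = pd.DataFrame({
    "locationid": [1059, 1002, 1059],
    "service_day": ["2014-04-04", "2014-04-04", "2014-04-05"],
    "service_datetime": ["2014-04-04 17:00:00", "2014-04-04 17:00:00", "2014-04-05 17:15:00"],
    "entries": [120, 45, 98],
    "exits": [80, 30, 60],
})

# Boolean mask for the rows to keep, plus the list of columns to keep.
is_kenmore = gatecount.locationid == 1059
columns = ["service_day", "service_datetime", "entries"]

# .loc[rows, columns] applies both at once; reset_index renumbers the result from 0.
kenmore = gatecount.loc[is_kenmore, columns].reset_index(drop=True)
print(kenmore)
```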

In [None]:
kenmore.head()

## Game Exploration 

Let's figure out when games happen.

In [84]:
# Copy DF for this analysis.
sox_only = sox.copy()
# Add weekday.
sox_only['day_of_week'] = pd.DatetimeIndex(sox_only.game_date).weekday

When do games occur during the week? **Monday is 0.**

In [85]:
games_by_dow = sox_only.groupby(['day_of_week']).agg({'game_time':len})
games_by_dow.columns = ['games']
games_by_dow

day_of_week,games
0,25
1,38
2,38
3,29
4,36
5,45
6,40


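Answer: games are spread fairly evenly across the week. Saturday (45) and Sunday (40) have the most, Tuesday and Wednesday (38 each) are close behind, and Monday (25) and Thursday (29) have the fewest.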
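
To make the numeric `day_of_week` codes easier to read, here is a small sketch of how pandas numbers weekdays and how the same dates map to day names. The dates are arbitrary examples, not rows from the game data.

```python
import pandas as pd

# Arbitrary example dates: a Monday, a Saturday, and a Sunday.
game_dates = pd.Series(pd.to_datetime(["2014-04-07", "2014-04-12", "2014-04-13"]))

# weekday counts from Monday = 0 through Sunday = 6.
day_of_week = game_dates.dt.weekday

# day_name gives the readable label for the same dates.
day_name = game_dates.dt.day_name()

print(pd.DataFrame({"day_of_week": day_of_week, "day_name": day_name}))
```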

What about during the day?

In [90]:
# Group and aggregate by time.
games_by_time = sox_only.groupby(['game_time']).agg({'opponent':len})
games_by_time.columns = ['games']
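# Sort by the time string, then move the last two rows (the 11 AM and 12 PM times)
# to the front, because string sorting places them after the afternoon times.
games_by_time.sort_index(inplace=True)
games_by_time = pd.concat([games_by_time.iloc[-2:], games_by_time.iloc[:-2]])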
games_by_time

game_time,games
11:05 AM,3
12:35 PM,1
01:05 PM,7
01:10 PM,3
01:35 PM,44
02:05 PM,3
03:00 PM,1
03:05 PM,1
04:05 PM,14
05:30 PM,1
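
Moving the last two rows to the front works only while exactly two start times sort out of place. A more general sketch, assuming the index holds `HH:MM AM/PM` strings as above, parses the labels and orders the rows by the parsed time. The counts below are made up for illustration.

```python
import pandas as pd

# Hypothetical counts indexed by 12-hour time strings, in string-sorted order.
counts = pd.DataFrame(
    {"games": [2, 5, 1, 3]},
    index=pd.Index(["01:05 PM", "07:10 PM", "11:05 AM", "12:35 PM"], name="game_time"),
)

# Parse the labels as times of day and get the positions that sort them chronologically.
order = pd.to_datetime(counts.index, format="%I:%M %p").argsort()

# Reorder the rows by those positions; the original string labels are kept.
counts_sorted = counts.iloc[order]
print(counts_sorted)
```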


In [91]:
# Group and aggregate by rounded time.
games_by_time_rounded = sox_only.groupby(['game_time_rounded']).agg({'opponent':len})
games_by_time_rounded.columns = ['games']
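# Sort by the time string, then move the last two rows (the 11 AM and 12 PM times)
# to the front, because string sorting places them after the afternoon times.
games_by_time_rounded.sort_index(inplace=True)
games_by_time_rounded = pd.concat([games_by_time_rounded.iloc[-2:], games_by_time_rounded.iloc[:-2]])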
games_by_time_rounded

game_time_rounded,games
11:00 AM,3
12:30 PM,1
01:00 PM,10
01:30 PM,44
02:00 PM,3
03:00 PM,2
04:00 PM,14
05:30 PM,1
06:00 PM,2
06:30 PM,3


Now group by both day of week and time, using the rounded time to keep the number of groups manageable.

In [94]:
# Group and aggregate by day of week and rounded time.
games_by_datetime = sox_only.groupby(['day_of_week','game_time_rounded']).agg({'opponent':len})
games_by_datetime.columns = ['games']
# Sort by day of week, then by time string (AM times sort after PM times within a day).
games_by_datetime.sort_index(inplace=True)
games_by_datetime

day_of_week,game_time_rounded,games
0,01:30 PM,3
0,02:00 PM,1
0,06:00 PM,1
0,06:30 PM,1
0,07:00 PM,16
0,11:00 AM,3
1,01:00 PM,1
1,04:00 PM,1
1,06:00 PM,1
1,06:30 PM,1
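
A long day-by-time table like the one above can be easier to scan as a grid. The sketch below shows one way to reshape it, using `groupby(...).size()` followed by `unstack`. The rows are hypothetical; only the column names `day_of_week` and `game_time_rounded` come from the notebook.

```python
import pandas as pd

# Hypothetical games: two Monday night games, one Saturday night game,
# and two Sunday afternoon games.
games = pd.DataFrame({
    "day_of_week": [0, 0, 5, 6, 6],
    "game_time_rounded": ["07:00 PM", "07:00 PM", "07:00 PM", "01:30 PM", "01:30 PM"],
})

# Count games per (day, time) pair, then pivot the time level into columns.
# fill_value=0 marks day/time combinations with no games.
grid = (
    games.groupby(["day_of_week", "game_time_rounded"])
    .size()
    .unstack(fill_value=0)
)
print(grid)
```

Each row is a day of the week and each column a rounded start time, so days with no games at a given time show up as zeros.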


Primetime games are common on every day except Sunday, which instead features many early-afternoon games. As expected, weekdays have few afternoon games, though Friday has more than the other weekdays (at various times). Primetime games Monday through Saturday and Sunday afternoon games appear to show the highest potential for this analysis.