<a href="https://colab.research.google.com/github/BoomerPython/Week_4/blob/main/DSA_BoomerPython_Week4_Plotly.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [1]:
# Visualizing Baseball Attendance Data
# Based on Miller (2015)

# Import packages for analysis and modeling
import numpy as np
import pandas as pd


The following examples are an exploratory data analysis of Dodgers' attendance data. Based on a case study from Thomas Miller's *Modeling Techniques in Predictive Analytics with Python*, the code below uses Plotly to demonstrate some basic visualizations.



In [8]:
# Alternative visualizations
# Plotting with Plotly - more details here: https://plotly.com/python/

# Standard plotly imports
import plotly as py
import plotly.graph_objs as go
from plotly.offline import iplot, init_notebook_mode

# Import cufflinks, a wrapper that adds the .iplot() method to pandas dataframes
import cufflinks
cufflinks.go_offline(connected=True)
init_notebook_mode(connected=True)

In [9]:
# FOR USING PLOTLY WITHIN COLAB WE NEED TO CHANGE THE DEFAULT RENDERER
# CHECK HERE FOR MORE DETAILS BASED ON YOUR SETUP
# https://plotly.com/python/renderers/#setting-the-default-renderer

import plotly.io as pio
pio.renderers.default = 'colab'

In [4]:
# OBTAIN - THERE ARE MANY WAYS TO READ DATA INTO COLAB
# THIS IS ONE - ASSUMES YOU HAVE DATA STORED WHERE YOU CAN ACCESS IT

from google.colab import files
import io

uploaded = files.upload()

# USE THE CHOOSE FILES BUTTON TO OPEN UP NAVIGATION TO DESIRED FILE

Saving dodgers.csv to dodgers (1).csv


In [5]:
# OBTAIN - NOW THAT THE FILE IS AVAILABLE IN COLAB - WE NEED TO READ THE DATA
# INTO A DF

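# Use whichever filename Colab stored the upload under; a repeated upload may
# be saved under a new name such as 'dodgers (1).csv'
uploaded_name = next(iter(uploaded))
dodgers = pd.read_csv(io.StringIO(uploaded[uploaded_name].decode('utf-8')))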

# Examine the structure of the data frame
print("\nContents of dodgers data frame ---------------")

print(dodgers.head())


Contents of dodgers data frame ---------------
  month  day  attend day_of_week  ... cap  shirt fireworks bobblehead
0   APR   10   56000     Tuesday  ...  NO     NO        NO         NO
1   APR   11   29729   Wednesday  ...  NO     NO        NO         NO
2   APR   12   28328    Thursday  ...  NO     NO        NO         NO
3   APR   13   31601      Friday  ...  NO     NO       YES         NO
4   APR   14   46549    Saturday  ...  NO     NO        NO         NO

[5 rows x 12 columns]


In [6]:
# SCRUB - The following data manipulation is used to help with presenting the 
# data in a meaningful way

# attendance in thousands for plotting 
dodgers['attend_000'] = dodgers['attend']/1000

# a little manipulation of the days of week
mondays = dodgers[dodgers['day_of_week'] == 'Monday']
tuesdays = dodgers[dodgers['day_of_week'] == 'Tuesday']
wednesdays = dodgers[dodgers['day_of_week'] == 'Wednesday']
thursdays = dodgers[dodgers['day_of_week'] == 'Thursday']
fridays = dodgers[dodgers['day_of_week'] == 'Friday']
saturdays = dodgers[dodgers['day_of_week'] == 'Saturday']
sundays = dodgers[dodgers['day_of_week'] == 'Sunday']

# convert days' attendance into list of vectors for box plot
data = [mondays['attend_000'], tuesdays['attend_000'], 
    wednesdays['attend_000'], thursdays['attend_000'], 
    fridays['attend_000'], saturdays['attend_000'], 
    sundays['attend_000']]
ordered_day_names = ['Mon', 'Tue', 'Wed', 'Thur', 'Fri', 'Sat', 'Sun']

# a little manipulation in case we want to see the list of opponents
# in reverse alphabetical order
ordered_team_names = sorted(set(dodgers['opponent']), reverse=True)

# How might we order the days of week to something we are familiar with?

# Confirm that the new attend_000 column was added
print(dodgers.head())

  month  day  attend day_of_week  ... shirt  fireworks bobblehead attend_000
0   APR   10   56000     Tuesday  ...    NO         NO         NO     56.000
1   APR   11   29729   Wednesday  ...    NO         NO         NO     29.729
2   APR   12   28328    Thursday  ...    NO         NO         NO     28.328
3   APR   13   31601      Friday  ...    NO        YES         NO     31.601
4   APR   14   46549    Saturday  ...    NO         NO         NO     46.549

[5 rows x 13 columns]
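
One possible answer to the question above about ordering the days of the week, offered as a sketch rather than as the approach used in the case study, is to store day_of_week as an ordered pandas categorical. The names DAY_ORDER, order_days, and attendance_by_day are illustrative assumptions; the column names come from the data frame above.

```python
import pandas as pd

# Calendar order for the full day names used in the day_of_week column
DAY_ORDER = ['Monday', 'Tuesday', 'Wednesday', 'Thursday',
             'Friday', 'Saturday', 'Sunday']

def order_days(df, column='day_of_week'):
    """Return a copy of df with `column` stored as an ordered categorical."""
    out = df.copy()
    out[column] = pd.Categorical(out[column], categories=DAY_ORDER, ordered=True)
    return out

def attendance_by_day(df):
    """Return one attend_000 Series per observed day, in calendar order."""
    ordered = order_days(df)
    # Grouping on an ordered categorical yields groups in category order
    return {day: group['attend_000']
            for day, group in ordered.groupby('day_of_week', observed=True)}
```

With this in place, attendance_by_day(dodgers) could stand in for the seven separate filters, and order_days(dodgers).sort_values('day_of_week') would list Monday games first.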


In [10]:
# EXPLORE - Look at distribution of attendance across data

dodgers['attend_000'].iplot(kind='hist', xTitle='Attendance (thousands)',
                            yTitle='Count', title="Distribution of Attendance at Dodgers' Games")


In [11]:
# EXPLORE - What about a box plot?

dodgers['attend'].iplot(kind = 'box', yTitle='Attendance', 
                            title="Boxplot of Attendance at Dodgers' Games")


In [12]:
# EXPLORE - What about a box plot conditioned on the day of the week?

dodgers.pivot(columns='day_of_week', values='attend').iplot(
      kind='box', yTitle='Attendance', title='Attendance by Day of Week')

In [13]:
# EXPLORE - What about small multiples? 
# Remember the point of the chart is to tell a story OR to find a pattern in 
# the data that might be helpful for modeling

import plotly.express as px
fig = px.scatter(dodgers, x='temp', y='attend', facet_col='day_night',
                 color='bobblehead',
                 title='Attendance vs. Temperature by Day/Night and Bobblehead Promotion')
fig.show()

The scatter plots suggest a weak negative relationship between temperature and attendance.

We can also see that, taken together, the games with bobblehead promotions appear to have higher attendance.
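
To check these visual impressions numerically, a minimal sketch might compute the Pearson correlation between temperature and attendance within each day/night panel, along with mean attendance with and without a bobblehead promotion. The function names are illustrative, and the code assumes the temp, attend, day_night, and bobblehead columns used in the plot above.

```python
import pandas as pd

def temp_attend_correlation(df):
    """Pearson correlation of temp and attend within each day_night group."""
    return pd.Series({name: group['temp'].corr(group['attend'])
                      for name, group in df.groupby('day_night')})

def mean_attendance_by_promo(df, promo='bobblehead'):
    """Mean attendance for games with and without the given promotion."""
    return df.groupby(promo)['attend'].mean()

# Example use in the notebook:
# print(temp_attend_correlation(dodgers))
# print(mean_attendance_by_promo(dodgers))
```

A correlation near zero would indicate that the apparent downward trend is weak, and comparing the two promotion means gives a rough size for the bobblehead effect. Neither summary controls for other factors such as day of week or opponent.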