**1. Late-night sports analysis**

*Problem:*
A regional sports network currently pays editors and analysts overtime after each game to produce same-day video analysis for its web properties. This overtime is costly, and while some videos gain many views, others do not.

What factors influence the popularity of these videos produced and published immediately after games? Importantly, can some of the analysis wait until the next morning?

*Target Variable:*
Editorial Video Plays, as reported in Google Analytics, for game-related content posted after the game and played between the end of the game and the next morning.

*Benefit:*
If we could predict how many plays a video would earn that same night, we could recommend for or against paying analysts overtime rather than finishing the work in the morning.

*Features:*
We would want to consider at least:
- wins versus losses,
- the record of the teams entering the game,
- whether the game went to overtime,
- the score of the game,
- the category of the video,
- home versus away games,
- the date of the game.

*Goals:*
We aim to hold video plays steady while reducing overtime costs.

*Risks and limitations:*
It is not clear that these factors correlate with differences in video plays for same-day, game-related content. We will also need to combine multiple datasets to cover all of these factors. FiveThirtyEight's Elo ratings would have been a useful addition, but that dataset contains no data for this year.

There is also a risk that other factors, such as individual player records being broken, in-game fights, or off-court storylines, play a large role in the popularity of videos. However, this system is meant as a tool for a more data-informed decision at the end of games, and human decision-makers can take those factors into account.

In [1]:
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns
from pathlib import Path

In [2]:
videos = pd.read_csv('./data/PSuperMetrics.csv')
videos.head()

| | Date | Hour | Client_ID | Game | Video_Category | Post_Date_Hour | Editorial_Video_Plays |
|---|---|---|---|---|---|---|---|
| 0 | 2018-12-24 | 0 | 136319200.0 | Knicks vs Hawks 2018-12-21 | Front Office Interview | 2018122119 | 1 |
| 1 | 2018-12-24 | 0 | 950482900.0 | Knicks vs Hawks 2018-12-21 | Front Office Interview | 2018122119 | 1 |
| 2 | 2018-12-24 | 0 | 1589813000.0 | Knicks vs Suns 2018-12-17 | Feature | 2018121720 | 1 |
| 3 | 2018-12-24 | 0 | 1864715000.0 | Knicks vs Hawks 2018-12-21 | Coach Interview | 2018122017 | 1 |
| 4 | 2018-12-24 | 0 | 1923767000.0 | Knicks at 76ers 2018-12-19,Knicks vs Suns 2018... | Full Episode | 2018121810 | 1 |


In [7]:
videos.tail()

| | Date | Hour | Client_ID | Game | Video_Category | Post_Date_Hour | Editorial_Video_Plays |
|---|---|---|---|---|---|---|---|
| 5321 | 2019-01-22 | 22 | 523570600.0 | Knicks vs Thunder 2019-01-21 | Highlight Clip | 2019012115 | 1 |
| 5322 | 2019-01-22 | 22 | 838400800.0 | Knicks vs Thunder 2019-01-21 | Coach Interview | 2019012212 | 1 |
| 5323 | 2019-01-22 | 22 | 838400800.0 | Knicks vs Thunder 2019-01-21 | Highlights Analysis | 2019012212 | 1 |
| 5324 | 2019-01-22 | 23 | 137662000.0 | Knicks vs Thunder 2019-01-21 | Highlights Analysis | 2019012116 | 1 |
| 5325 | 2019-01-22 | 23 | 1715855000.0 | Knicks vs Thunder 2019-01-21 | Highlights Analysis | 2019012212 | 1 |
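
Several of the planned features (home versus away, opponent, game date) appear to be recoverable from the `Game` labels shown above. The following is a minimal sketch, assuming each label follows the pattern `<team> vs|at <opponent> <YYYY-MM-DD>` and that "vs" marks a home game while "at" marks an away game. Both are assumptions read off the sample rows, not confirmed rules. `GameInfo`, `GAME_PATTERN`, and `parse_game` are hypothetical names introduced for illustration.

```python
import re
from datetime import date
from typing import NamedTuple, Optional


class GameInfo(NamedTuple):
    team: str
    opponent: str
    is_home: bool
    game_date: date


# Assumed label pattern: "<team> vs|at <opponent> <YYYY-MM-DD>".
# Commas are excluded so that rows listing several games do not match.
GAME_PATTERN = re.compile(
    r"^(?P<team>[^,]+?) (?P<venue>vs|at) (?P<opponent>[^,]+) "
    r"(?P<date>\d{4}-\d{2}-\d{2})$"
)


def parse_game(label: str) -> Optional[GameInfo]:
    """Parse a single-game label; return None if it does not fit the pattern."""
    match = GAME_PATTERN.match(label.strip())
    if match is None:
        return None  # multi-game or truncated labels need separate handling
    return GameInfo(
        team=match["team"],
        opponent=match["opponent"],
        is_home=match["venue"] == "vs",  # assumption: "vs" = home, "at" = away
        game_date=date.fromisoformat(match["date"]),
    )
```

Applied to the frame, this could look like `videos["Game"].map(parse_game)`. Rows that reference more than one game, such as row 4 in the first preview, come back as `None` and would need to be split or excluded before modeling.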


In [4]:
videos.dtypes

Date                      object
Hour                       int64
Client_ID                float64
Game                      object
Video_Category            object
Post_Date_Hour             int64
Editorial_Video_Plays      int64
dtype: object
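
The dtypes above show `Date` stored as a string and `Post_Date_Hour` as an integer, so neither can be compared as a time yet. A minimal sketch of converting both, assuming `Post_Date_Hour` packs a timestamp as `YYYYMMDDHH` (as the sample rows suggest) and `Hour` is the hour of day on `Date`. The `sample` frame and the `play_time` and `post_time` column names are illustrative only.

```python
import pandas as pd

# Small stand-in for `videos`, using values from the rows previewed earlier.
sample = pd.DataFrame({
    "Date": ["2018-12-24", "2019-01-22"],
    "Hour": [0, 22],
    "Post_Date_Hour": [2018122119, 2019012115],
})


def add_timestamps(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy with hypothetical `play_time` and `post_time` columns."""
    out = df.copy()
    # Play time: calendar date plus the hour of day.
    out["play_time"] = pd.to_datetime(out["Date"]) + pd.to_timedelta(out["Hour"], unit="h")
    # Post time: integer assumed to be YYYYMMDDHH.
    out["post_time"] = pd.to_datetime(out["Post_Date_Hour"].astype(str), format="%Y%m%d%H")
    return out


sample_ts = add_timestamps(sample)
```

With both columns as timestamps, the delay between posting and each play is a simple subtraction, `sample_ts["play_time"] - sample_ts["post_time"]`.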

In [5]:
videos.shape

(5326, 7)

In [6]:
videos.isnull().sum()

Date                     0
Hour                     0
Client_ID                0
Game                     0
Video_Category           0
Post_Date_Hour           0
Editorial_Video_Plays    0
dtype: int64
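
With no missing values to handle, the target described under *Target Variable* can be sketched as a sum of plays per game inside a same-night window. This is only an illustration under stated assumptions: the dataset shown here has no game end times, so `GAME_END_HOUR` and `MORNING_CUTOFF_HOUR` are hypothetical placeholders, `game_date` is a hypothetical column that would have to be derived from `Game`, and the `plays` frame is toy data, not rows from `videos`.

```python
import pandas as pd

GAME_END_HOUR = 22       # hypothetical: assumed end of game, local time
MORNING_CUTOFF_HOUR = 9  # hypothetical: start of the next working morning

# Toy rows shaped like `videos`, plus a hypothetical `game_date` column.
plays = pd.DataFrame({
    "Game": ["Game A", "Game A", "Game A", "Game B", "Game B"],
    "game_date": pd.to_datetime(
        ["2019-01-21", "2019-01-21", "2019-01-21", "2019-01-22", "2019-01-22"]
    ),
    "Date": ["2019-01-21", "2019-01-22", "2019-01-22", "2019-01-22", "2019-01-23"],
    "Hour": [23, 2, 22, 21, 8],
    "Editorial_Video_Plays": [1, 1, 1, 1, 1],
})


def same_night_plays(df: pd.DataFrame) -> pd.Series:
    """Sum plays per game between the assumed game end and the next morning."""
    play_time = pd.to_datetime(df["Date"]) + pd.to_timedelta(df["Hour"], unit="h")
    window_start = df["game_date"] + pd.Timedelta(hours=GAME_END_HOUR)
    window_end = df["game_date"] + pd.Timedelta(days=1, hours=MORNING_CUTOFF_HOUR)
    in_window = (play_time >= window_start) & (play_time < window_end)
    return df.loc[in_window].groupby("Game")["Editorial_Video_Plays"].sum()


target = same_night_plays(plays)
```

Grouping by `["Game", "Video_Category"]` instead would give one target value per video category per game, which is closer to the decision the network has to make for each piece of content.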