## Final Tutorial
##### Nicole Tran, Azwa Bajwah, and Evelyn Zhao
### Analysis of Super Bowl Winners

Directions: In general, the tutorial should contain at least 1500 words of prose (excluding the comments) and 150 lines of (nonpadded, legitimate) Python code, along with appropriate documentation, visualization, and links to any external information that might help the reader. 

Grading
1. Motivation. Does the tutorial make the reader believe the topic is relevant or important (i) in
general and (ii) with respect to data science?
2. Understanding. After reading through the tutorial, does an uninformed reader feel informed about
the topic? Would a reader who already knew about the topic feel like s/he learned more about it?
3. Other resources. Does the tutorial link out to other resources (on the web, in books, etc) that
would give a lagging reader additional help on specific topics, or an advanced reader the ability to
dive more deeply into a specific application area or technique?
4. Prose. Does the prose portion of the tutorial actually add to the content of the deliverable?
5. Code. Is the code well written, well documented, reproducible, and does it help the reader understand
the tutorial? Does it give good examples of specific techniques?
6. Subjective evaluation. If somebody linked to this tutorial from, say, Hacker News, would people
actually read through the entire thing?

In [31]:
import sqlite3 as sql
import numpy as np
import pandas as pd
from matplotlib import pyplot as plt
from scipy import stats
from sklearn.linear_model import LinearRegression
import statsmodels.formula.api as smf
from plotnine import *

In [32]:
stadium_data = pd.read_csv('nfl_stadiums.csv', encoding='ISO-8859-1')
team_data = pd.read_csv('nfl_teams.csv', encoding='ISO-8859-1')
scores_data = pd.read_csv('spreadspoke_scores.csv', encoding='ISO-8859-1')

In [33]:
# not really useful
# stadium_data.head(10)

In [34]:
# also not that useful
# team_data.head(10)

In [35]:
scores_data.head(10)

Unnamed: 0,schedule_date,schedule_season,schedule_week,schedule_playoff,team_home,score_home,score_away,team_away,team_favorite_id,spread_favorite,over_under_line,stadium,stadium_neutral,weather_temperature,weather_wind_mph,weather_humidity,weather_detail
0,9/2/1966,1966,1,False,Miami Dolphins,14,23,Oakland Raiders,,,,Orange Bowl,False,83.0,6.0,71.0,
1,9/3/1966,1966,1,False,Houston Oilers,45,7,Denver Broncos,,,,Rice Stadium,False,81.0,7.0,70.0,
2,9/4/1966,1966,1,False,San Diego Chargers,27,7,Buffalo Bills,,,,Balboa Stadium,False,70.0,7.0,82.0,
3,9/9/1966,1966,2,False,Miami Dolphins,14,19,New York Jets,,,,Orange Bowl,False,82.0,11.0,78.0,
4,9/10/1966,1966,1,False,Green Bay Packers,24,3,Baltimore Colts,,,,Lambeau Field,False,64.0,8.0,62.0,
5,9/10/1966,1966,2,False,Houston Oilers,31,0,Oakland Raiders,,,,Rice Stadium,False,77.0,6.0,82.0,
6,9/10/1966,1966,2,False,San Diego Chargers,24,0,New England Patriots,,,,Balboa Stadium,False,69.0,9.0,81.0,
7,9/11/1966,1966,1,False,Atlanta Falcons,14,19,Los Angeles Rams,,,,Atlanta-Fulton County Stadium,False,71.0,7.0,57.0,
8,9/11/1966,1966,2,False,Buffalo Bills,20,42,Kansas City Chiefs,,,,War Memorial Stadium,False,63.0,11.0,73.0,
9,9/11/1966,1966,1,False,Detroit Lions,14,3,Chicago Bears,,,,Tiger Stadium,False,67.0,7.0,73.0,


### List of Super Bowl Winners:
https://www.espn.com/nfl/superbowl/history/winners

In [36]:
# Remove columns that we will not be using.
# The weather columns are dropped because we expect betting patterns to be more
# informative than weather. They could be kept instead if we later want to see
# what kind of weather the Super Bowl winners play best in.
scores_data = scores_data.drop(
    columns=[
        'schedule_playoff', 'stadium_neutral', 'stadium',
        'weather_temperature', 'weather_wind_mph', 'weather_humidity',
        'weather_detail',
    ]
)
scores_data

Unnamed: 0,schedule_date,schedule_season,schedule_week,team_home,score_home,score_away,team_away,team_favorite_id,spread_favorite,over_under_line
0,9/2/1966,1966,1,Miami Dolphins,14,23,Oakland Raiders,,,
1,9/3/1966,1966,1,Houston Oilers,45,7,Denver Broncos,,,
2,9/4/1966,1966,1,San Diego Chargers,27,7,Buffalo Bills,,,
3,9/9/1966,1966,2,Miami Dolphins,14,19,New York Jets,,,
4,9/10/1966,1966,1,Green Bay Packers,24,3,Baltimore Colts,,,
...,...,...,...,...,...,...,...,...,...,...
13796,1/21/2024,2023,Division,Buffalo Bills,24,27,Kansas City Chiefs,BUF,-2.5,46
13797,1/21/2024,2023,Division,Detroit Lions,31,23,Tampa Bay Buccaneers,DET,-6.0,49.5
13798,1/28/2024,2023,Conference,Baltimore Ravens,10,17,Kansas City Chiefs,BAL,-4.5,44
13799,1/28/2024,2023,Conference,San Francisco 49ers,34,31,Detroit Lions,SF,-7.5,53.5


### Explaining some columns:

### spread_favorite
In football betting, the "spread favorite" is the team that bookmakers expect to win, and the "spread" is the number of points by which they expect the favorite to outscore the underdog. In this data set, team_favorite_id holds the favorite's team abbreviation (for example, BUF) and spread_favorite holds the spread itself as a negative number (for example, -2.5).

Example of a Point Spread Bet:
Suppose the Green Bay Packers are playing the Detroit Lions in an NFL game. If the Packers are the spread favorite, the listing might look something like this:

- Green Bay Packers -7.5
- Detroit Lions +7.5

This means that the Packers are favored to win by more than 7.5 points. For a bet on the Packers to pay out, they must win by 8 points or more. Conversely, a bet on the Lions would win if the Lions lose by 7 points or fewer, or if they win the game outright.
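
The rule above can be written as a small function. This is a minimal sketch, not part of the data set or any library: the name `spread_result` and its arguments are our own, and it assumes the spread is given as a negative number for the favorite, as in the spread_favorite column.

```python
def spread_result(favorite_score, underdog_score, spread):
    """Return which side of a point-spread bet wins.

    spread is the favorite's line as a negative number (e.g. -7.5),
    the same sign convention as the spread_favorite column.
    """
    # Add the (negative) spread to the favorite's winning margin.
    adjusted_margin = (favorite_score - underdog_score) + spread
    if adjusted_margin > 0:
        return "favorite"
    if adjusted_margin < 0:
        return "underdog"
    # Only possible when the spread is a whole number.
    return "push"


# Hypothetical final scores for the Packers (-7.5) vs. Lions example.
print(spread_result(28, 20, -7.5))  # Packers win by 8
print(spread_result(27, 20, -7.5))  # Packers win by 7
print(spread_result(17, 24, -7.5))  # Lions win outright
```

With these made-up scores, only the first game pays out for a bet on the Packers. The other two pay out for a bet on the Lions.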

### over_under_line
In football betting, the "over/under" line, also known as the total, is a wager on the combined score of both teams in a game. Oddsmakers set a predicted total score, and bettors wager on whether the actual combined score will be over or under that number.

Example of an Over/Under Bet:
Suppose in an NFL game between the New England Patriots and the Miami Dolphins, the over/under line is set at 47.5 points. Here are the betting options:

- Over 47.5 Points: If you bet the over, you are predicting that the combined score of both teams will be 48 points or more.
- Under 47.5 Points: If you bet the under, you are predicting that the combined score will be 47 points or fewer.

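Because this line ends in a half point, the combined score can never equal it exactly, so every bet either wins or loses. Whole-number lines, such as the 46 and 44 that appear in this data set, can end in a tie (a "push"), in which case bets are typically refunded.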
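
The same comparison can be sketched in code. The function name `over_under_result` is our own (an assumption for illustration). The first two calls use made-up scores for the Patriots vs. Dolphins example, and the last uses the Bills vs. Chiefs row from the table above (24-27, line 46).

```python
def over_under_result(score_home, score_away, line):
    """Return 'over', 'under', or 'push' for a total-points bet."""
    total = score_home + score_away
    if total > line:
        return "over"
    if total < line:
        return "under"
    # The total landed exactly on a whole-number line.
    return "push"


print(over_under_result(27, 21, 47.5))  # combined score of 48
print(over_under_result(24, 23, 47.5))  # combined score of 47
print(over_under_result(24, 27, 46))    # combined score of 51
```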

### schedule_week
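In this data set, the schedule week is a number for regular-season games. Playoff games are labeled with the name of the round instead (for example, "Division" or "Conference"), and Super Bowl games are labeled as "Superbowl" in the schedule_week column.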
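
That label makes it straightforward to pull out the Super Bowl rows. The sketch below shows one way to do it on a small stand-in DataFrame, so it runs without the CSV files. The first three rows are copied from the output above, while the "Superbowl" row uses placeholder teams and scores that are made up for illustration. The names `toy_scores`, `super_bowls`, and `winner` are our own.

```python
import numpy as np
import pandas as pd

# Stand-in for scores_data. The last row is hypothetical, not real data.
toy_scores = pd.DataFrame({
    "schedule_season": [1966, 2023, 2023, 2023],
    "schedule_week": ["1", "Conference", "Conference", "Superbowl"],
    "team_home": ["Miami Dolphins", "Baltimore Ravens",
                  "San Francisco 49ers", "Team A"],
    "score_home": [14, 10, 34, 30],
    "team_away": ["Oakland Raiders", "Kansas City Chiefs",
                  "Detroit Lions", "Team B"],
    "score_away": [23, 17, 31, 20],
})

# Keep only the rows labeled "Superbowl" in schedule_week.
super_bowls = toy_scores[toy_scores["schedule_week"] == "Superbowl"].copy()

# The winner is whichever side scored more (a Super Bowl cannot end in a tie).
super_bowls["winner"] = np.where(
    super_bowls["score_home"] > super_bowls["score_away"],
    super_bowls["team_home"],
    super_bowls["team_away"],
)

print(super_bowls[["schedule_season", "winner"]])
```

Applying the same filter to scores_data should give one row per Super Bowl, which can then be checked against the ESPN list of winners linked earlier.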