# Introduction to Data Science, CS 5963 / Math 3900
## Lab 15: Ranking in Practice 

In this lab, we rate and rank college football teams. 

Many different rating and ranking methods exist. The following webpage compares a number of them:
[masseyratings.com](http://www.masseyratings.com/cf/compare.htm)

## College Football Primer

National Collegiate Athletic Association (NCAA) College Football is divided into two subdivisions: 
- Football Bowl Subdivision (FBS), formerly  Division I-A
- Football Championship Subdivision (FCS), formerly Division I-AA

Our goal will be to rank the 128 teams in the FBS. 

The FBS is further divided into 10 conferences plus a group of independents (11 groups in the list below); some conferences are split into divisions. The University of Utah is in the South Division of the Pacific 12 (Pac-12) Conference. 

      A.  American Athletic Conference
           i) East Division
                Central Florida
                Cincinnati
                Connecticut
                East Carolina
                South Florida
                Temple
          ii) West Division
                Houston
                Memphis
                Navy
                SMU
                Tulane
                Tulsa
      B.  Atlantic Coast Conference       
           i) Atlantic Division
                Boston College
                Clemson
                Florida St
                Louisville
                North Carolina St
                Syracuse
                Wake Forest
         ii) Coastal Division
                Duke
                Georgia Tech
                Miami FL
                North Carolina
                Pittsburgh
                Virginia
                Virginia Tech       
      C.  Big 10 Conference
           i) East Division
                Indiana
                Maryland
                Michigan
                Michigan St
                Ohio State
                Penn State
                Rutgers
         ii) West Division
                Illinois
                Iowa
                Minnesota
                Nebraska
                Northwestern
                Purdue
                Wisconsin      
      D.  Big 12 Conference
            Baylor
            Iowa St
            Kansas
            Kansas St
            Oklahoma
            Oklahoma St
            Texas
            TCU 
            Texas Tech
            West Virginia
      E.  Conference USA
           i) East Division
                Florida Atlantic
                Florida Int'l
                Marshall
                Middle Tennessee St
                UNC-Charlotte
                Old Dominion
                Western Kentucky
          ii) West Division
                Louisiana Tech
                North Texas
                Rice
                Southern Miss
                Texas-San Antonio
                UTEP                   
      F.  Mid-American Conference
           i) East Division
                Akron
                Bowling Green
                Buffalo
                Kent St
                Miami OH
                Ohio U.
          ii) West Division
                Ball St
                Central Michigan
                Eastern Michigan
                Northern Illinois
                Toledo
                Western Michigan
      G.  Mountain West Conference
           i) Mountain Division
                Air Force
                Boise St
                Colorado St
                New Mexico
                Utah St
                Wyoming
          ii) West Division
                Fresno St
                Hawai`i
                Nevada
                San Diego St
                San José St
                UNLV      
      H.  Pacific 12 Conference
           i) North Division
                California
                Oregon
                Oregon St
                Stanford
                Washington
                Washington St
          ii) South Division
                Arizona
                Arizona St
                Colorado
                Southern Cal
                UCLA  
                Utah        
      I.  Southeastern Conference
           i) Eastern Division
                Florida
                Georgia
                Kentucky
                Missouri
                South Carolina
                Tennessee
                Vanderbilt
          ii) Western Division
                Alabama
                Arkansas
                Auburn
                LSU
                Mississippi
                Mississippi St
                Texas A&M
      J.  Sun Belt Conference
            Appalachian St
            Arkansas St
            Georgia Southern
            Georgia St
            Idaho
            Louisiana-Lafayette
            Louisiana-Monroe
            New Mexico St
            South Alabama
            Texas St-San Marcos
            Troy
      K.  Division I FBS Independents
            Army
            Brigham Young
            Massachusetts
            Notre Dame  

More conference information is available [here](http://prwolfe.bol.ucla.edu/cfootball/conferences.htm). 

## Download data

We download the 2016 College Football game results from 
[this website](http://masseyratings.com/scores.php?s=286577&sub=286577&all=1). So far, there have been 4206 games. 



In [None]:
# imports and setup
import numpy as np
import pandas as pd
import networkx as nx

from bs4 import BeautifulSoup
import urllib.request
from io import StringIO

import matplotlib.pyplot as plt
%matplotlib inline
plt.rcParams['figure.figsize'] = (15, 9)
plt.style.use('ggplot')


### First get a list of team names

In [None]:
url = "http://masseyratings.com/scores.php?s=286577&sub=286577&all=1&mode=3&format=2"
with urllib.request.urlopen(url) as response:
    html = response.read()
soup = BeautifulSoup(html, 'html.parser')

print(soup)

### Exercise: load the team names into a Pandas dataframe
Call the Pandas dataframe *teams* and name the column with the team names *team*.

**Hint**: you might use the BeautifulSoup command *get_text()* and the string command *split()*.

In [None]:
# your code here

num_teams = teams.shape[0]
print(num_teams)

print(teams)

In [None]:
# where is Utah in the Pandas dataframe teams? 
teams[teams['team']=='Utah'].index.tolist()

In [None]:
teams.loc[823]

### Get the game results

In [None]:
url = "http://masseyratings.com/scores.php?s=286577&sub=286577&all=1&mode=3&format=1"
with urllib.request.urlopen(url) as response:
    html = response.read()
soup = BeautifulSoup(html, 'html.parser')

soup

In [None]:
soup_text = soup.get_text()
df = pd.read_csv(StringIO(soup_text),names=['id','date','team1','homefield1','score1','team2','homefield2','score2'])

num_games = df.shape[0]
print(num_games)

df.head()

### Clean the data

In [None]:
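# drop the home-field columns, which are not used in this lab
df.drop(['homefield1','homefield2'],inplace=True,axis=1)
# add new columns with team names (team ids in df start at 1, rows of teams start at 0)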
df.insert(3, 'team_name1', df['team1'].map(lambda x: teams['team'][x-1]))
df.insert(6, 'team_name2', df['team2'].map(lambda x: teams['team'][x-1]))
df.head()


## Consider only Pac-12 teams

In [None]:
P12 = ['California', 'Oregon', 'Oregon_St', 'Stanford', 'Washington', 'Washington_St', 
    'Arizona', 'Arizona_St', 'Colorado', 'USC', 'UCLA', 'Utah'] 
num_P12_teams = len(P12)

# get PAC12 teams from teams (take a copy so that adding columns below does not touch teams)
P12_ind = teams[teams['team'].isin(P12)].index.tolist()  
P12_teams = teams.loc[P12_ind].copy()

# assign a new ordering for teams
P12_teams['P12_ind'] = np.arange(num_P12_teams)
P12_teams['global_ind'] = P12_teams.index
P12_teams.set_index('P12_ind',inplace=True)

P12_teams

### Exercise: Get the Pac-12 games
Make a new dataframe called *P12_df* containing only the game results from the dataframe *df* in which both teams are in the Pac-12. Take a copy of the selected rows, since a new column is added to *P12_df* later.

In [None]:
# your code here


num_P12_games = P12_df.shape[0]
print(num_P12_games)
print(P12_df)


## Use the Least Squares method to construct a rating
See Lecture 16. 

For each game, played between teams $i$ and $j$, we first construct the pairwise comparison $y_{i,j}$, defined by
$$
y_{i,j} = \frac{(\text{points team } j \text{ scored}) - (\text{points team } i \text{ scored})}{\text{total points in game}}. 
$$
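
As a quick check of the definition, here is a minimal sketch that evaluates the comparison for a single made-up game. The function name `pairwise_comparison` and the scores are illustrative assumptions, not part of the lab data; the handling of a scoreless tie is also an assumption.

```python
def pairwise_comparison(score_j, score_i):
    """Normalized point difference for one game: (points j - points i) / total."""
    total = score_j + score_i
    if total == 0:
        # assumption: treat a scoreless tie as an even game
        return 0.0
    return (score_j - score_i) / total

# hypothetical game: team j scores 30, team i scores 20
y_example = pairwise_comparison(30, 20)
print(y_example)
```

The value always lies between -1 and 1, and swapping the two teams flips its sign.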


In [None]:
P12_df['y'] = (P12_df['score1'] - P12_df['score2']) / (P12_df['score1'] + P12_df['score2'])
y = P12_df['y'].tolist()
P12_df

Construct the arc-vertex incidence matrix
$$
B_{k,j} = \begin{cases}
1 & j = \textrm{head}(k) \\
-1 & j = \textrm{tail}(k) \\
0 & \textrm{otherwise}. 
\end{cases}
$$
Each row of $B$ corresponds to one game and records which two teams played: in the code below, team 1 is the head ($+1$) and team 2 is the tail ($-1$). 
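
To make the definition concrete, the following sketch builds the incidence matrix for a made-up schedule of three games among three teams. The list `games` and the name `B_toy` are hypothetical and only serve as an illustration.

```python
import numpy as np

# hypothetical schedule: each game is (head team, tail team), teams indexed 0..2
games = [(0, 1), (1, 2), (0, 2)]
num_toy_teams = 3

# one row per game, one column per team
B_toy = np.zeros((len(games), num_toy_teams))
for k, (head, tail) in enumerate(games):
    B_toy[k, head] = 1    # head of arc k
    B_toy[k, tail] = -1   # tail of arc k

print(B_toy)
```

Every row contains exactly one $+1$ and one $-1$, so each row sums to zero.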

In [None]:
# first we need to reorder the teams in the PAC12 ordering

print(P12_teams['global_ind'].tolist())

glob_P12_dict = {j:i for i,j in enumerate(P12_teams['global_ind'].tolist())}
print(glob_P12_dict)

In [None]:
# construct B

B = np.zeros((num_P12_games, num_P12_teams))

for ii,g in enumerate(P12_df.index):
    team1_global_ind = P12_df['team1'][g]
    team1_P12_ind = glob_P12_dict[team1_global_ind-1]    
    B[ii,team1_P12_ind] = 1

    team2_global_ind = P12_df['team2'][g]
    team2_P12_ind = glob_P12_dict[team2_global_ind-1]    
    B[ii,team2_P12_ind] = -1


In [None]:
# B and y now contain enough information to print the game results
for i,sc in enumerate(y):
    head = np.where(B[i,:]==1)[0][0]
    tail = np.where(B[i,:]==-1)[0][0]
    print(P12_teams['team'][head] + ' vs. ' + P12_teams['team'][tail] + ': ' +str(sc))

We now use the *lstsq* function in *np.linalg* to find the least squares rating $\phi$, i.e., to solve the least squares problem 
$$
\min_{\phi} \ \| B \phi - y \|^2. 
$$
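
Here is a small sketch of this step on made-up data, assuming three teams and three games. The names `B_toy`, `y_toy` and `phi_toy` are hypothetical, and the sketch uses `rcond=None` (NumPy's machine-precision cutoff) rather than the value used in the lab code.

```python
import numpy as np

# hypothetical incidence matrix: 3 games among 3 teams
B_toy = np.array([[1., -1., 0.],
                  [0., 1., -1.],
                  [1., 0., -1.]])
# hypothetical pairwise comparisons, one per game
y_toy = np.array([0.2, 0.1, 0.3])

# lstsq returns (solution, residuals, rank, singular values)
phi_toy, residuals, rank, sing_vals = np.linalg.lstsq(B_toy, y_toy, rcond=None)

print(phi_toy)        # ratings
print(rank)           # rank of B_toy
print(phi_toy.sum())  # sum of the ratings (zero up to rounding)
```

Adding the same constant to every rating does not change $B\phi$, so $B$ does not have full column rank and the ratings are only determined up to a shift. Among all minimizers, *lstsq* returns the one of smallest norm, which for a connected schedule is the one whose ratings sum to zero.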

In [None]:
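# B is rank-deficient (adding a constant to all ratings leaves B @ phi unchanged), so lstsq
# returns the minimum-norm solution; rcond=.1 zeroes singular values below 0.1 * the largest
phi = np.linalg.lstsq(B,y,rcond=.1)[0]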
print(phi)

In [None]:
P12_teams['rating'] = phi
print(P12_teams)

## Sort the ratings to generate a ranking


In [None]:
P12_rankings = P12_teams.sort_values('rating', axis=0, ascending=False)
P12_rankings['ranking'] = np.arange(1,num_P12_teams+1)
P12_rankings.set_index('ranking',inplace=True)
P12_rankings.drop('global_ind',axis=1,inplace=True)
print(P12_rankings)
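
The same idea can be sketched without Pandas: sorting the ratings in decreasing order gives the ranking. The team names and rating values below are made up for illustration.

```python
import numpy as np

# hypothetical teams and ratings
toy_names = ['A', 'B', 'C', 'D']
toy_ratings = np.array([0.05, -0.10, 0.20, -0.15])

# indices of the teams from highest to lowest rating
order = np.argsort(-toy_ratings)
toy_ranking = [toy_names[i] for i in order]

for rank, name in enumerate(toy_ranking, start=1):
    print(rank, name)
```

Only the order of the ratings matters for the ranking, so shifting all ratings by a constant leaves it unchanged.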

### Exercise: discuss the results
Compare against the official Pac-12 standings [here](https://pac-12.com/football/standings).

## Visualize the schedule

In [None]:
# make graph
Lap = np.dot(np.transpose(B),B)
adj = -Lap + np.diag(np.diag(Lap))
# from_numpy_matrix was removed in NetworkX 3.0; from_numpy_array replaces it
game_graph = nx.from_numpy_array(adj)

# Calculate the layout positions first
pos = nx.spring_layout(game_graph)

# labeling needs a dictionary
label_dict = {i:j for i,j in enumerate(P12_teams['team'].tolist())}

# draw graph
nx.draw_networkx(game_graph, pos=pos, node_size=3000, labels = label_dict, node_shape='s')
plt.show()
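
The graph above is built from the matrix $B^T B$, which is the Laplacian of the schedule graph. The following sketch, on a made-up schedule in which teams 0 and 1 play twice, shows how the adjacency matrix is recovered from it. The names `B_toy`, `Lap_toy` and `adj_toy` are hypothetical.

```python
import numpy as np

# hypothetical incidence matrix: 4 games among 3 teams, teams 0 and 1 meet twice
B_toy = np.array([[1., -1., 0.],
                  [0., 1., -1.],
                  [1., 0., -1.],
                  [1., -1., 0.]])

# Laplacian: diagonal = games played per team, off-diagonal = -(games between the pair)
Lap_toy = B_toy.T @ B_toy

# removing the diagonal and flipping the sign leaves the weighted adjacency matrix
adj_toy = np.diag(np.diag(Lap_toy)) - Lap_toy

print(Lap_toy)
print(adj_toy)
```

The entry `adj_toy[i, j]` counts the games played between teams `i` and `j`, so repeated matchups appear as heavier edge weights.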
