# Intro

This notebook aggregates NCAA Men's Basketball rankings at a given point in time across three of the most well-regarded ranking systems/sites.

The notebook scrapes ranking data from www.ncaa.com (NET), www.kenpom.com (KenPom), and www.barttorvik.com (BartTorvik).

The notebook uses a few .csv files to standardize team names across all 353 D1 Men's Basketball programs. Each file maps one site's team names to a standard naming convention, which follows the names used by the NCAA for the NET rankings.
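The conversion step can be sketched in plain Python, assuming a two-column mapping file: the notebook later indexes `NETtoKP.csv` on a `KP` column, while the `NET` header used here is an assumption, and the spellings are illustrative rather than copied from the real files. Names without an entry pass through unchanged, which mirrors how `DataFrame.replace` leaves unmatched values alone.

```python
import csv
import io

# Illustrative excerpt of a mapping file such as NETtoKP.csv.
# The "KP" column matches the notebook; the "NET" header is an assumption.
MAPPING_CSV = """KP,NET
Connecticut,UConn
Saint Mary's,Saint Mary's (CA)
"""


def load_name_map(text, source_col, target_col):
    """Return {source-site name: standard NET name} from CSV text."""
    reader = csv.DictReader(io.StringIO(text))
    return {row[source_col]: row[target_col] for row in reader}


def standardize(names, name_map):
    """Map each name to its NET spelling; unmapped names pass through unchanged."""
    return [name_map.get(name, name) for name in names]


name_map = load_name_map(MAPPING_CSV, "KP", "NET")
print(standardize(["Duke", "Connecticut", "Saint Mary's"], name_map))
# ['Duke', 'UConn', "Saint Mary's (CA)"]
```

Because unmapped names are kept as-is, a mapping file only needs rows for the schools whose spelling actually differs from the NET one.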

The notebook uses BeautifulSoup (with the html5lib parser) and Requests to scrape web data.

The notebook pulls only the overall rank from each site, though the code could be modified to pull further statistics for each team.
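One way to pull more than the rank, sketched in plain Python: map column positions to field names and build one record per table row. The column layout, the `row_to_record` helper, and the placeholder rows are all hypothetical; the real position of each statistic has to be read off each site's table.

```python
# Hypothetical column layout: cell index -> field name.
COLUMNS = {0: "rank", 1: "school", 3: "record"}


def row_to_record(cells, columns=COLUMNS):
    """Pick the wanted cells from one row of cell text and convert the rank to int."""
    record = {name: cells[i].strip() for i, name in columns.items()}
    record["rank"] = int(record["rank"])
    return record


# Placeholder rows of cell text, not real standings.
rows = [
    ["1", "Team A", "Conf X", "20-5"],
    ["2", "Team B", "Conf Y", "19-6"],
]
records = [row_to_record(cells) for cells in rows]
print(records[0])  # {'rank': 1, 'school': 'Team A', 'record': '20-5'}
```

A list of such dicts can be passed straight to `pandas.DataFrame`, which would give one column per field.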

In [1]:
#https://www.ncaa.com/rankings/basketball-men/d1/ncaa-mens-basketball-net-rankings 
#https://kenpom.com/
#http://www.barttorvik.com/#

# NET

In [2]:
import pandas
import requests
import bs4


url = 'https://www.ncaa.com/rankings/basketball-men/d1/ncaa-mens-basketball-net-rankings'
# Browser-like User-Agent header; some sites may reject the default one sent by requests.
header = {'user-agent' : 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/12.0 Safari/605.1.15'}
page = requests.get(url, headers=header)
soup = bs4.BeautifulSoup(page.text, 'html5lib')
page.reason

'OK'

In [3]:
tags1 = soup.find_all('tr')

In [4]:
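# Skip the header row. With whitespace between cells, t.contents interleaves text
# nodes with <td> tags, so contents[1] is the first cell (rank) and contents[5] the third (school).
data = [(int(t.contents[1].text), t.contents[5].text) for t in tags1[1:]]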
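Indexing `t.contents` depends on whitespace text nodes sitting between the cells. Selecting `<td>` elements by tag avoids that. The sketch below uses the standard-library `html.parser` instead of BeautifulSoup so it runs without extra packages; the `RowCellParser` class and the markup (rank, previous rank, school) are hypothetical.

```python
from html.parser import HTMLParser


class RowCellParser(HTMLParser):
    """Collect the text of every <td> cell, grouped by <tr>; rows without <td> are skipped."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the <tr> currently open
        self._cell = None  # text fragments of the <td> currently open

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            self._cell = []

    def handle_endtag(self, tag):
        if tag == "td" and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if self._row:  # header rows hold only <th> cells, so they stay empty
                self.rows.append(self._row)
            self._row = None

    def handle_data(self, data):
        if self._cell is not None:  # whitespace between tags is ignored
            self._cell.append(data)


# Hypothetical markup shaped like a ranking table.
HTML = """
<table>
  <tr><th>Rank</th><th>Previous</th><th>School</th></tr>
  <tr>
    <td>1</td>
    <td>2</td>
    <td>Team A</td>
  </tr>
  <tr><td>2</td><td>1</td><td>Team B</td></tr>
</table>
"""

parser = RowCellParser()
parser.feed(HTML)
parser.close()
ranked = [(int(cells[0]), cells[2]) for cells in parser.rows]
print(ranked)  # [(1, 'Team A'), (2, 'Team B')]
```

With BeautifulSoup, the equivalent idea is `t.find_all('td')`, which returns only the cell tags regardless of the whitespace between them.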

In [32]:
pandas.options.display.max_rows = 4000  # show every row when a DataFrame is displayed
NETtable = pandas.DataFrame(data, columns = ['Rank', 'School'])
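Because every column is picked by position, a change in a site's layout could shift columns or drop rows without raising an error. A hypothetical `check_ranks` helper, sketched here under the assumption that a site's ranks run from 1 to N with no ties, flags duplicates and gaps. It could be called as `check_ranks(NETtable['Rank'].tolist())`.

```python
def check_ranks(ranks):
    """Return a list of problems if ranks are not exactly 1..N (an empty list means OK)."""
    problems = []
    expected = set(range(1, len(ranks) + 1))
    seen = set()
    for r in ranks:
        if r in seen:
            problems.append(f"duplicate rank {r}")
        seen.add(r)
    missing = sorted(expected - seen)
    if missing:
        problems.append(f"missing ranks {missing}")
    unexpected = sorted(seen - expected)
    if unexpected:
        problems.append(f"unexpected ranks {unexpected}")
    return problems


print(check_ranks([1, 2, 3]))     # []
print(check_ranks([1, 2, 2, 5]))  # ['duplicate rank 2', 'missing ranks [3, 4]', 'unexpected ranks [5]']
```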

In [34]:
NETtable

| | Rank | School |
|---|---|---|
| 0 | 1 | San Diego St. |
| 1 | 2 | Kansas |
| 2 | 3 | Butler |
| 3 | 4 | Duke |
| 4 | 5 | Baylor |
| 5 | 6 | Auburn |
| 6 | 7 | Gonzaga |
| 7 | 8 | Dayton |
| 8 | 9 | Wichita St. |
| 9 | 10 | West Virginia |


# KenPom

In [7]:
url = 'https://kenpom.com/'
# Browser-like User-Agent header; some sites may reject the default one sent by requests.
header = {'user-agent' : 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/12.0 Safari/605.1.15'}
page = requests.get(url, headers=header)
soup = bs4.BeautifulSoup(page.text, 'html5lib')
page.reason

'OK'

In [8]:
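trs = soup.find_all('tr')

# Collect the <td> cells of every row and drop rows that have none (such as
# header rows). Building a new list avoids the skipped elements (and possible
# ValueError) that come from calling list.remove while iterating over the same list.
tds = [cells for cells in (tr.find_all('td') for tr in trs) if cells]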
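Filtering empty rows with `list.remove` inside a `for` loop over the same list is unreliable: each removal shrinks the list under the running loop, so the element that slides into the freed slot is never examined. A small demonstration with placeholder rows; both function names are hypothetical.

```python
def drop_empty_in_place(rows):
    """Remove empty rows while iterating over the same list (the unreliable pattern)."""
    rows = list(rows)  # work on a copy so the caller's list is untouched
    for x in rows:
        if x == []:
            rows.remove(x)  # the list shrinks under the loop, so the next item is skipped
    return rows


def drop_empty(rows):
    """Build a new list instead, keeping only rows with at least one cell."""
    return [row for row in rows if row]


# Placeholder rows: two leading rows with no <td> cells, then a gap row.
rows = [[], [], ["1", "Team A"], [], ["2", "Team B"]]
print(drop_empty_in_place(rows))  # [['1', 'Team A'], [], ['2', 'Team B']]
print(drop_empty(rows))           # [['1', 'Team A'], ['2', 'Team B']]
```

The comprehension makes a single pass and never mutates the list it reads from, so the result does not depend on how the empty rows are arranged.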

In [9]:
newtd = tds[2:]  # not used below: dataKP is built from the full tds list

In [22]:
# The first cell holds the rank; the team name is the link text in the second cell.
dataKP = [(int(t[0].string), t[1].a.string) for t in tds]
KPtable = pandas.DataFrame(dataKP, columns = ['Rank', 'School'])

In [11]:
convert = pandas.read_csv('NETtoKP.csv')

In [12]:
# {KenPom name: NET name}: index on the KP column and turn the remaining column into a dict.
convert_dict = convert.set_index('KP').iloc[:, 0].to_dict()

In [30]:
KPtable = KPtable.replace({'School': convert_dict})

In [31]:
KPtable

| | Rank | School |
|---|---|---|
| 0 | 1 | Duke |
| 1 | 2 | Kansas |
| 2 | 3 | Michigan St. |
| 3 | 4 | Ohio St. |
| 4 | 5 | Butler |
| 5 | 6 | Louisville |
| 6 | 7 | Dayton |
| 7 | 8 | Maryland |
| 8 | 9 | Gonzaga |
| 9 | 10 | Baylor |


# Bart Torvik

In [14]:
url = 'http://www.barttorvik.com/#'
# Browser-like User-Agent header; some sites may reject the default one sent by requests.
header = {'user-agent' : 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/12.0 Safari/605.1.15'}
page = requests.get(url, headers=header)
soup = bs4.BeautifulSoup(page.text, 'html5lib')
page.reason

'OK'

In [15]:
tags2 = soup.find_all('tr',{'class':"seedrow"})

In [16]:
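# Rank is in contents[1]; the team-name cell (contents[3]) may carry extra text after a
# non-breaking space (\xa0), so keep only the part before it.
dataBT = [(int(tr.contents[1].text), tr.contents[3].text.split('\xa0')[0]) for tr in tags2]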
BTtable = pandas.DataFrame(dataBT, columns = ['Rank', 'School'])
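The split on `'\xa0'` keeps only the text before the first non-breaking space. The assumption, based only on that code, is that the site separates the team name from any trailing annotation with U+00A0. A hypothetical helper with placeholder cell text:

```python
def clean_team_name(cell_text):
    """Keep only the team name: the text before the first non-breaking space (U+00A0)."""
    return cell_text.split("\xa0")[0].strip()


# Placeholder cell text: a team name, non-breaking spaces, then an annotation.
print(clean_team_name("Team A\xa0\xa0extra text"))  # Team A
print(clean_team_name("San Diego St."))             # San Diego St.
```

Splitting on `'\xa0'` specifically, rather than calling `split()` with no argument (which splits on any whitespace, including ordinary spaces), keeps multi-word names such as "San Diego St." whole.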

In [17]:
convert2 = pandas.read_csv('NETtoBT.csv')

In [18]:
# {BartTorvik name: NET name}: index on the BT column and turn the remaining column into a dict.
convert_dict2 = convert2.set_index('BT').iloc[:, 0].to_dict()

In [29]:
BTtable = BTtable.replace({'School': convert_dict2})
BTtable

| | Rank | School |
|---|---|---|
| 0 | 1 | Kansas |
| 1 | 2 | Duke |
| 2 | 3 | Dayton |
| 3 | 4 | Butler |
| 4 | 5 | Ohio St. |
| 5 | 6 | Michigan St. |
| 6 | 7 | San Diego St. |
| 7 | 8 | Louisville |
| 8 | 9 | Baylor |
| 9 | 10 | West Virginia |


# Build final merged DataFrame with NET, KenPom and BartTorvik Rankings

In [35]:
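from functools import reduce

dfs = [NETtable, KPtable, BTtable]

# Inner-join the three tables on School: a team is kept only if its name matches
# in all three. The first merge suffixes the clashing Rank columns as Rank_x (NET)
# and Rank_y (KenPom); BartTorvik's Rank column keeps its name.
df_final = reduce(lambda left, right: pandas.merge(left, right, on='School'), dfs)
rankings = df_final.rename(columns={"Rank_x": "NET", "Rank_y": "KenPom", "Rank": "BartTorvik"})
rankings = rankings[['School', 'NET', 'KenPom', 'BartTorvik']]
rankings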

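| | School | NET | KenPom | BartTorvik |
|---|---|---|---|---|
| 0 | San Diego St. | 1 | 16 | 7 |
| 1 | Kansas | 2 | 2 | 1 |
| 2 | Butler | 3 | 5 | 4 |
| 3 | Duke | 4 | 1 | 2 |
| 4 | Baylor | 5 | 10 | 9 |
| 5 | Auburn | 6 | 11 | 18 |
| 6 | Gonzaga | 7 | 9 | 15 |
| 7 | Dayton | 8 | 7 | 3 |
| 8 | Wichita St. | 9 | 34 | 14 |
| 9 | West Virginia | 10 | 17 | 10 |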
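`pandas.merge` defaults to an inner join, so a school whose name is missing from a conversion file is dropped from the final table without any warning. The sketch below mimics that reduce-over-inner-joins in plain Python on placeholder `{school: rank}` dicts and also reports which names were dropped; the data is invented for illustration.

```python
from functools import reduce

# Placeholder {school: rank} tables; "Team C" is spelled differently in the third one.
net = {"Team A": 1, "Team B": 2, "Team C": 3}
kenpom = {"Team A": 2, "Team B": 1, "Team C": 3}
torvik = {"Team A": 1, "Team B": 3, "Team C.": 2}

tables = [net, kenpom, torvik]

# Inner join on the school name: keep only names present in every table.
common = reduce(lambda left, right: left & right, (set(t) for t in tables))
merged = {school: tuple(t[school] for t in tables) for school in sorted(common)}

# Names that appear in at least one table but were dropped by the inner join.
dropped = reduce(lambda left, right: left | right, (set(t) for t in tables)) - common

print(merged)           # {'Team A': (1, 2, 1), 'Team B': (2, 1, 3)}
print(sorted(dropped))  # ['Team C', 'Team C.']
```

In pandas itself, merging with `how='outer', indicator=True` adds a `_merge` column (`left_only`, `right_only`, `both`) that exposes the same mismatches.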


# Save locally as .csv

In [37]:
from datetime import datetime
current_time = datetime.now()
date = current_time.strftime('%m-%d-%Y')

save_name = 'CBB_Rankings_' + date + '.csv'
rankings.to_csv(save_name)
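The save cell stamps the file name with the current date in month-day-year form. Passing the date in explicitly, as in this hypothetical `rankings_filename` helper, makes the naming testable; the date used below is arbitrary.

```python
from datetime import datetime


def rankings_filename(when, fmt="%m-%d-%Y"):
    """Build a dated file name such as CBB_Rankings_02-10-2020.csv."""
    return f"CBB_Rankings_{when.strftime(fmt)}.csv"


snapshot = datetime(2020, 2, 10)
print(rankings_filename(snapshot))              # CBB_Rankings_02-10-2020.csv
print(rankings_filename(snapshot, "%Y-%m-%d"))  # CBB_Rankings_2020-02-10.csv
```

Year-first names (`%Y-%m-%d`) sort chronologically in a directory listing, which month-first names do not once snapshots span more than one year.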