## Stats Quality for 2016 D-I College Nationals

As one of the biggest tournaments hosted by USAU, D-I College Nationals is one of the few where player statistics are tracked fairly reliably. For each game, each player's scores, assists, Ds, and turns are counted, although what counts as a "D" or a "turn" can depend on the stat-keeper.

Data below was scraped from the [USAU website](http://play.usaultimate.org/events/USA-Ultimate-D-I-College-Championships/). First we'll set up some imports to be able to load this data.

In [1]:
import usau.reports
import usau.fantasy

In [2]:
from IPython.display import display, HTML
import pandas as pd
pd.options.display.width = 200
pd.options.display.max_colwidth = 200
pd.options.display.max_columns = 200

In [3]:
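def display_url_column(df):
  """Render df as an HTML table, turning the url column into clickable links."""
  # Work on a copy so the caller's DataFrame (possibly a slice) is not modified
  df = df.copy()
  df["url"] = df["url"].apply(lambda url: "<a href='{base}{url}'>Match Report Link</a>"
                              .format(base=usau.reports.USAUResults.BASE_URL, url=url))
  # escape=False keeps the anchor tags as HTML instead of escaping them
  display(HTML(df.to_html(escape=False)))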

Since the data should already be downloaded as CSV files in this repository, there is no need to re-scrape it. Omit this cell to download directly from the USAU website instead (this may be slow).

In [4]:
# Read data from csv files
usau.reports.d1_college_nats_men_2016.load_from_csvs()
usau.reports.d1_college_nats_women_2016.load_from_csvs()

<usau.reports.USAUResults at 0x3ddfb10>

Let's take a look at the games in which the players' goals or assists sum to less than the team's final score:

In [5]:
display_url_column(pd.concat([usau.reports.d1_college_nats_men_2016.missing_tallies,
                              usau.reports.d1_college_nats_women_2016.missing_tallies])
                   [["Score", "Gs", "As", "Ds", "Ts", "Team", "Opponent", "url"]])

Unnamed: 0,Score,Gs,As,Ds,Ts,Team,Opponent,url
0,11,9,10,5,13,Massachusetts,Georgia,Match Report Link
14,14,14,12,6,8,Massachusetts,Washington,Match Report Link
17,15,15,14,11,10,Texas A&M,Georgia,Match Report Link
18,15,14,13,5,2,Washington,Cal Poly-SLO,Match Report Link
22,16,15,15,9,10,North Carolina,Florida State,Match Report Link
23,14,13,14,5,12,Florida State,North Carolina,Match Report Link
33,15,14,15,8,5,Case Western Reserve,North Carolina,Match Report Link
34,15,14,15,2,5,Florida State,Case Western Reserve,Match Report Link
37,15,14,15,6,11,Colorado,North Carolina,Match Report Link
38,15,14,15,4,6,Oregon,Florida State,Match Report Link


All in all, not too bad! A few of the women's consolation games are missing player statistics, and in several other games a couple of goals or assists were missed. For missing assists, it is technically possible that one or more Callahans were scored in those games (a Callahan is a goal with no assist), but that clearly does not account for all ~14 missing assists. Surprisingly, the statkeepers recorded 10 more assists than goals; I would have guessed that assists would be harder to keep track of.
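The criterion used above (player goals or assists summing to less than the final score) can be sketched in plain pandas. This is only an illustration under assumptions: the rows are a handful copied from the outputs in this notebook, the `missing_Gs` and `missing_As` columns are hypothetical names, and nothing here is confirmed to be how `missing_tallies` is computed inside `usau.reports`.

```python
import pandas as pd

# A few rows copied from the outputs in this notebook (one row per team per game).
rows = pd.DataFrame(
  [
    (11, 9, 10, "Massachusetts", "Georgia"),
    (14, 14, 12, "Massachusetts", "Washington"),
    (14, 13, 14, "Florida State", "North Carolina"),
    (15, 15, 15, "Stanford", "Southern California"),
  ],
  columns=["Score", "Gs", "As", "Team", "Opponent"],
)

# Hypothetical helper columns: how far each player tally falls short of the score.
rows["missing_Gs"] = rows["Score"] - rows["Gs"]
rows["missing_As"] = rows["Score"] - rows["As"]

# A row is incomplete if either tally falls short of the final score.
incomplete = rows[(rows["missing_Gs"] > 0) | (rows["missing_As"] > 0)]

# A Callahan adds a goal without an assist, so it can only explain assists
# trailing goals. Rows with more assists than goals must have missed goals.
more_assists_than_goals = rows[rows["As"] > rows["Gs"]]

print(incomplete[["Team", "Opponent", "missing_Gs", "missing_As"]])
print(more_assists_than_goals[["Team", "Opponent", "Gs", "As"]])
```

Summing `missing_Gs` and `missing_As` over the full set of games would give the tournament-wide shortfall; the four sample rows here are too few to reproduce the totals quoted in the text.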

Turns and Ds are the other stats available. In past tournaments these haven't been tracked very well, but this time there was only one game where no turns were recorded (the filter below checks turns only; Ds were still tallied in that game):

In [6]:
men_matches = usau.reports.d1_college_nats_men_2016.match_results
women_matches = usau.reports.d1_college_nats_women_2016.match_results
display_url_column(pd.concat([men_matches[(men_matches.Ts == 0) & (men_matches.Gs > 0)],
                              women_matches[(women_matches.Ts == 0) & (women_matches.Gs > 0)]])
                   [["Score", "Gs", "As", "Ds", "Ts", "Team", "Opponent", "url"]])

Unnamed: 0,Score,Gs,As,Ds,Ts,Team,Opponent,url
48,15,15,15,9,0,Stanford,Southern California,Match Report Link
49,4,4,4,4,0,Southern California,Stanford,Match Report Link


This implies that there was a pretty good effort made to keep up with counting turns and Ds. By contrast, see how many teams did not keep track of Ds and turns last year!

In [7]:
# Read last year's data from csv files
usau.reports.d1_college_nats_men_2015.load_from_csvs()
usau.reports.d1_college_nats_women_2015.load_from_csvs()
display_url_column(pd.concat([usau.reports.d1_college_nats_men_2015.missing_tallies,
                              usau.reports.d1_college_nats_women_2015.missing_tallies])
                   [["Score", "Gs", "As", "Ds", "Ts", "Team", "Opponent", "url"]])

Unnamed: 0,Score,Gs,As,Ds,Ts,Team,Opponent,url
4,10,9,9,0,0,Georgia,Texas,Match Report Link
7,9,5,9,0,0,Auburn,Wisconsin,Match Report Link
8,15,14,13,0,0,Pittsburgh,Auburn,Match Report Link
9,4,3,2,0,0,Auburn,Pittsburgh,Match Report Link
13,12,10,12,0,0,Auburn,Georgia,Match Report Link
20,14,13,13,0,0,Texas A&M,Cincinnati,Match Report Link
22,15,14,14,0,0,Central Florida,Western Washington,Match Report Link
23,11,8,10,0,0,Western Washington,Central Florida,Match Report Link
27,11,11,8,0,0,Western Washington,Minnesota,Match Report Link
29,12,10,11,0,0,Cincinnati,Minnesota,Match Report Link
