# Sports Analytics in Python

## Richard Demsyn-Jones

Analyst at Google who, in his spare time, has looked into evaluating quality in hockey and taking advantage of friends in fantasy hockey. Writings at oddacious.github.io

*Sports analytics* is the application of quantitative methods to the realm of sports, and the field is surprisingly broad. It is great if you live and breathe sports, or if you just get really passionate about them.


### Why Python?

Not always Python: some Python, some R, some Stata, some SAS, and even some Perl.

But why Python? It has a great community, easy coding for fast insights, awesome packages, and a growing and *consistent* data analysis codebase... and while your data will be ugly, your code doesn't have to be.

So with Python you can use stuff like: mechanize, urllib2, lxml, json, re, BeautifulSoup, CSV, SQLite, pandas, jupyter, matplotlib, sklearn.

### Where do you get your data from?

We will focus on hockey. The data is a sequence of *events* (shots, goals, faceoffs, etc.), and we know when and where each one happened. We also know who was on the ice at any given time. But we don't know what happens between events or where players are at any given moment.

Data might come from a JSON API, or parsable XML, or interactive webpages (might need to use mechanize), but not persistent player/ball tracking (which is present in basketball).
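As a rough sketch of what event data from a JSON API might look like once loaded, the snippet below parses a made-up payload and counts events by type and by player on the ice. The field names (`type`, `period`, `seconds`, `x`, `y`, `on_ice`) and the payload itself are hypothetical, not taken from any real feed.

```python
import json
from collections import Counter

# Hypothetical payload: real feeds use their own field names and structure.
raw = """
{"events": [
  {"type": "shot",    "period": 1, "seconds": 75,  "x": 60, "y": -10, "on_ice": ["A", "B", "C"]},
  {"type": "faceoff", "period": 1, "seconds": 80,  "x": 0,  "y": 0,   "on_ice": ["A", "B", "D"]},
  {"type": "goal",    "period": 1, "seconds": 312, "x": 82, "y": 4,   "on_ice": ["B", "C", "D"]}
]}
"""

events = json.loads(raw)["events"]

# How many events of each type occurred.
event_counts = Counter(e["type"] for e in events)

# How many events each player was on the ice for.
on_ice_counts = Counter(player for e in events for player in e["on_ice"])

print(event_counts)
print(on_ice_counts)
```

Note that nothing here says where players were between events, which matches the limitation described above.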


### Case Study 1: Quantifying Goaltenders

Who is the best goaltender? It's tough to separate the goalie from the team effort and we have a small sample of goals. Need to adjust for randomness and information we can identify (offense, number of shots at them, quality of those shots).

Can divide into low/medium/high difficulty shots and adjust appropriately. But this isn't enough: there's a high correlation between goalies and their backups, which suggests that team effects are not totally isolated (or, alternatively, that the teams who find the best starters also find the best backups).
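One way the difficulty adjustment could work, as a minimal sketch: compute the save rate within each bucket, then reweight by a common shot mix so that goalies facing harder shots are not penalized. The league mix and the goalie's numbers below are invented for illustration, and the function name is hypothetical.

```python
# Assumed league-wide share of shots in each difficulty bucket (made-up numbers).
LEAGUE_MIX = {"low": 0.5, "medium": 0.3, "high": 0.2}


def adjusted_save_pct(shots, saves, mix=LEAGUE_MIX):
    """Weight each bucket's save rate by the league shot mix.

    Assumes the goalie faced at least one shot in every bucket.
    """
    return sum(share * saves[bucket] / shots[bucket] for bucket, share in mix.items())


# A goalie who faces an unusually large share of high-difficulty shots.
shots = {"low": 100, "medium": 100, "high": 200}
saves = {"low": 98, "medium": 92, "high": 170}

raw_pct = sum(saves.values()) / sum(shots.values())  # 360 / 400 = 0.900
adj_pct = adjusted_save_pct(shots, saves)            # 0.49 + 0.276 + 0.17 = 0.936

print(round(raw_pct, 3), round(adj_pct, 3))
```

The raw figure understates this goalie because half of the shots faced were high difficulty, while the adjusted figure evaluates them against a typical mix.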

There is also the issue of small sample size. For evaluating playoff performance there is only a maximum of 28 games to get data from. Turns out the worst goalies played the least and the best goalies played the most.

To combat this there are a couple of approaches. Bayesian updating of binomial data: take a prior on shots and goals and update it with observed saves and shots (effectively, goalies that haven't played much are treated as 'average', and the more a goalie plays the less the 'average' prior matters). Or we can use the empirical distribution to establish variance: make up data by taking out blocks of consecutive games from the regular season and generating streaks.
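Both ideas can be sketched briefly. The first function is a standard Beta-binomial posterior mean, where the prior is expressed as pseudo-shots at league-average performance; the second draws a random block of consecutive games to simulate a streak. The prior strength (500 shots at .910) and all game data are assumptions chosen for illustration, not values from the talk.

```python
import random


def shrunk_save_pct(saves, shots, prior_saves, prior_shots):
    """Posterior mean save rate under a Beta(prior_saves, prior_shots - prior_saves) prior."""
    return (prior_saves + saves) / (prior_shots + shots)


def sample_block(games, block_len, rng):
    """Pick a random run of `block_len` consecutive games."""
    start = rng.randrange(len(games) - block_len + 1)
    return games[start:start + block_len]


# Assumed prior: 500 pseudo-shots at a .910 league-average save rate.
PRIOR_SAVES, PRIOR_SHOTS = 455, 500

# A goalie with a tiny sample and a great raw rate, versus a heavily used starter.
backup = shrunk_save_pct(28, 30, PRIOR_SAVES, PRIOR_SHOTS)        # raw .933, shrunk to about .911
starter = shrunk_save_pct(1850, 2000, PRIOR_SAVES, PRIOR_SHOTS)   # raw .925, shrunk to .922

# Made-up regular-season log of (saves, shots) per game.
games = [(27, 30), (31, 33), (22, 26), (35, 36), (28, 31), (24, 27)]
rng = random.Random(0)
block = sample_block(games, 3, rng)
block_pct = sum(s for s, _ in block) / sum(n for _, n in block)

print(round(backup, 3), round(starter, 3), block, round(block_pct, 3))
```

Repeating `sample_block` many times would give an empirical distribution of short-run save rates to compare a playoff run against.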





### Case Study 2: Fantasy Hockey

You pick players and if they do well you win... 

Figure out your problem domain: work out what really matters, what restrictions are on your team, how the scoring works, the key dynamics of the gameplay etc.

Predict what matters: predict every stat for every player through automatic LASSO models (sklearn - so could also build Random Forest classifiers etc).

Develop an optimization criterion: team average across categories, standardized, weighted by predictability of the stat and diminishing in distance from average.
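A minimal sketch of one possible scoring function along these lines. The talk does not specify the exact form, so the use of `tanh` to make gains diminish away from average, the weights, and the category data are all assumptions.

```python
import math
from statistics import mean, pstdev


def team_score(team_avgs, league_avgs, weights):
    """Sum of weighted, saturating z-scores across categories.

    team_avgs:   {category: this team's average}
    league_avgs: {category: list of every team's average}
    weights:     {category: predictability weight}
    Assumes each category has nonzero spread across the league.
    """
    score = 0.0
    for cat, value in team_avgs.items():
        mu = mean(league_avgs[cat])
        sigma = pstdev(league_avgs[cat])
        z = (value - mu) / sigma
        # tanh flattens out far from average, so extra dominance adds little.
        score += weights[cat] * math.tanh(z)
    return score


league = {"goals": [20, 30, 40], "assists": [40, 50, 60]}
weights = {"goals": 1.0, "assists": 0.5}  # hypothetical predictability weights

average_team = team_score({"goals": 30, "assists": 50}, league, weights)
strong_team = team_score({"goals": 40, "assists": 50}, league, weights)

print(average_team, strong_team)
```

With a saturating transform like this, spreading strength across categories tends to score better than dominating a single one.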

Develop an optimization strategy: think in terms of Bayesian Nash Equilibrium, and model how my opponents draft.


# AI Planning with Graphplan in Python

## Alex Kehayias

From fr8.guru, previously CTO at Shareablee. @alexkehayias, github.com/alexkehayias

### Automated planning and scheduling

Using artificial intelligence to pick the optimal steps and their sequence.

Example... Want to **Get Stuff Done**. Need to get code written under certain constraints, such as needing caffeine or needing things to be quiet. Want to ask the computer what sequence of events needs to be completed to get things done.


### Graphplan

General purpose propositional planning algorithm. Achieves speed by reducing the search space. Guarantees the shortest possible plan (or reports that there is no valid plan) and finds ordering independence for potentially parallel actions.

Need to have *propositions*: possible states (dressed, caffeinated).

Need to have *actions*: assert preconditions and return postconditions (effects).

Also have the concept of *no op actions*: an action that maintains a proposition (so precondition == postcondition).
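These three ideas can be sketched with a small data structure. The `Action` class, the `noop` helper and the example propositions are illustrative names, not the speaker's implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Action:
    name: str
    preconditions: frozenset  # propositions that must hold before the action
    effects: frozenset        # propositions that hold after the action


def noop(proposition):
    """A no-op action simply carries a proposition forward unchanged."""
    return Action(f"noop({proposition})", frozenset([proposition]), frozenset([proposition]))


# Propositions are plain strings here; "not X" stands for the negation of X.
drink_coffee = Action(
    "drink_coffee",
    frozenset({"have_coffee"}),
    frozenset({"caffeinated", "not have_coffee"}),
)
get_dressed = Action("get_dressed", frozenset(), frozenset({"dressed"}))
keep_dressed = noop("dressed")

print(drink_coffee)
print(keep_dressed)
```

Frozen dataclasses with frozensets are hashable, which makes it easy to store actions in sets when building graph layers.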

Then can build out the *planning graph*: alternating layers of propositions and actions, showing which actions become possible given the propositions that already hold.




### Building out the Graph

We have to start with some initial conditions. The next layer is the actions that are supported by the current set of propositions. This generates the next layer of propositions (postconditions).
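The layer expansion described above might look like the following sketch. It is simplified: mutexes are ignored and negative effects are not tracked. The `expand` function and the example actions are hypothetical.

```python
from collections import namedtuple

Action = namedtuple("Action", ["name", "preconditions", "effects"])


def expand(propositions, actions):
    """Return (action layer, next proposition layer) for one expansion step."""
    # No-ops carry every current proposition forward to the next layer.
    noops = [Action(f"noop({p})", frozenset([p]), frozenset([p])) for p in sorted(propositions)]
    # An action is supported when all of its preconditions are present.
    applicable = [a for a in actions if a.preconditions <= propositions] + noops
    next_props = frozenset().union(*(a.effects for a in applicable))
    return applicable, next_props


actions = [
    Action("drink_coffee", frozenset({"have_coffee"}), frozenset({"caffeinated"})),
    Action("write_code", frozenset({"caffeinated"}), frozenset({"code_written"})),
]
initial = frozenset({"have_coffee"})

layer1_actions, layer1_props = expand(initial, actions)
layer2_actions, layer2_props = expand(layer1_props, actions)

print([a.name for a in layer1_actions], sorted(layer1_props))
print([a.name for a in layer2_actions], sorted(layer2_props))
```

Here `write_code` only becomes available in the second action layer, once `caffeinated` has appeared as a proposition.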

To expand we generate the next action and proposition layer. At each layer we find *mutexes* (pairs of things that are mutually exclusive), which tell us where we can prune actions.

An action mutex can be due to competing needs (a precondition of one action is mutex with a precondition of the other), inconsistent effects (an effect of one is the negation of an effect of the other), or interference (one action deletes a precondition of the other action).

A proposition mutex can be due to negation (the two propositions are negations of each other) or inconsistent support (every pair of actions in the previous level that could achieve the two propositions is mutex).
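A sketch of the three action mutex checks, using the common Graphplan formulation in which competing needs means two actions have mutually exclusive preconditions. Negation is represented by a `"not "` prefix on the proposition string, which is an assumption of this example, as are all the function and action names.

```python
from collections import namedtuple
from itertools import product

Action = namedtuple("Action", ["name", "preconditions", "effects"])


def negate(literal):
    """Flip a literal between 'x' and 'not x'."""
    return literal[4:] if literal.startswith("not ") else "not " + literal


def inconsistent_effects(a, b):
    """An effect of one action negates an effect of the other."""
    return any(negate(e) in b.effects for e in a.effects)


def interference(a, b):
    """One action's effect negates a precondition of the other."""
    return (any(negate(e) in b.preconditions for e in a.effects)
            or any(negate(e) in a.preconditions for e in b.effects))


def competing_needs(a, b, prop_mutexes):
    """Some pair of preconditions was mutex in the previous proposition layer."""
    return any(frozenset((p, q)) in prop_mutexes
               for p, q in product(a.preconditions, b.preconditions))


def actions_mutex(a, b, prop_mutexes):
    return (inconsistent_effects(a, b)
            or interference(a, b)
            or competing_needs(a, b, prop_mutexes))


play_music = Action("play_music", frozenset(), frozenset({"not quiet"}))
write_code = Action("write_code", frozenset({"quiet", "caffeinated"}), frozenset({"code_written"}))
keep_quiet = Action("noop(quiet)", frozenset({"quiet"}), frozenset({"quiet"}))
drink_coffee = Action("drink_coffee", frozenset({"have_coffee"}), frozenset({"caffeinated"}))

print(actions_mutex(play_music, write_code, set()))    # interference
print(actions_mutex(play_music, keep_quiet, set()))    # inconsistent effects
print(actions_mutex(drink_coffee, write_code, set()))  # compatible
```

`prop_mutexes` is expected to be a set of two-element frozensets from the previous proposition layer; passing an empty set disables the competing needs check.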


### Searching for valid plans

Need to see if all the goal conditions are present in the last proposition layer and make sure no two goal conditions are mutex with each other.
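That check is small enough to sketch directly. The function name and the representation of mutexes as a set of two-element frozensets are assumptions of this example.

```python
from itertools import combinations


def goals_reachable(goals, propositions, prop_mutexes):
    """True if every goal is in the layer and no pair of goals is mutex."""
    if not set(goals) <= set(propositions):
        return False
    return all(frozenset(pair) not in prop_mutexes
               for pair in combinations(sorted(goals), 2))


last_layer = {"code_written", "caffeinated", "quiet", "music_playing"}
mutexes = {frozenset({"quiet", "music_playing"})}

print(goals_reachable({"code_written", "quiet"}, last_layer, mutexes))
print(goals_reachable({"quiet", "music_playing"}, last_layer, mutexes))
print(goals_reachable({"code_written", "dressed"}, last_layer, mutexes))
```

Passing this check only means a plan may exist at this depth; the backward search still has to confirm it, and if it fails the graph is expanded by another layer.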

Then build out subgoals and keep checking mutexes as you backtrack.

Can extend the algorithm by adding distance based goal ordering, jump back search, persisting failures, and forward search.