# Catch up for April 20

The world of sports remains locked down because of coronavirus. With that in mind, we can still build out the code, analysis, and trading strategy for when the NBA reopens.

I want to go over a few subjects today:

- Unit testing
- Continuous integration
- Backtesting
- Database development
- Coronavirus situation

I should point out that I am *not* officially a software developer, so I'm eager to hear your ideas, as CS majors, on how to develop the codebase further.

## Unit Testing

We want to incorporate unit testing into our development. I know Clara is familiar with unit testing using JUnit in Java. For Python, I have opted to use pytest. Here is an example of a pytest unit test:

```
from datetime import datetime

def datetime_to_milliseconds(dt):
    # Unix epoch as a naive UTC datetime (same value as the deprecated
    # datetime.utcfromtimestamp(0))
    root = datetime(1970, 1, 1)
    return int((dt - root).total_seconds() * 1000)

def test_datetime_to_milliseconds():
    dt = datetime(2019, 1, 1)
    assert datetime_to_milliseconds(dt) == 1546300800000
```

This test can be executed from the command line like so:

```
$ pytest ../tests/test_utils.py
platform darwin -- Python 3.7.4, pytest-5.4.1, py-1.8.1, pluggy-0.13.1
rootdir: /Users/alex/PycharmProjects/basketball
collected 1 item

../tests/test_utils.py .                                                 [100%]
```

This is obviously a very simplistic test case, but it illustrates our goal. The idea is to test a very small portion of code and guarantee that it works as expected. This can mean feeding it various inputs and asserting the correct output for each. Unit tests must be fast (this one ran in 0.04 seconds) and run often.
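To show what "various inputs" can look like, here is a minimal sketch that checks the same function against a small table of cases. The extra cases are my own picks for illustration, and the loop is just one way to do it; pytest also offers `pytest.mark.parametrize` for the same purpose.

```python
from datetime import datetime

def datetime_to_milliseconds(dt):
    # Milliseconds between the Unix epoch and a naive UTC datetime
    root = datetime(1970, 1, 1)
    return int((dt - root).total_seconds() * 1000)

# (input, expected output) pairs; the cases besides 2019-01-01 are
# illustrative additions, not taken from the project's test suite
CASES = [
    (datetime(1970, 1, 1), 0),
    (datetime(1970, 1, 1, 0, 0, 1), 1000),
    (datetime(1970, 1, 2), 86400000),
    (datetime(2019, 1, 1), 1546300800000),
]

def test_datetime_to_milliseconds_cases():
    for dt, expected in CASES:
        # Include the input in the message so a failure names the bad case
        assert datetime_to_milliseconds(dt) == expected, dt
```

pytest collects any function whose name starts with `test_`, so this runs with the same `pytest` command as before.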

The benefit of unit testing may not be obvious at first, and it may even seem like an unnecessary burden. But the true power of maintaining a test suite shows when we refactor the code.

Suppose we develop a Python function - `sql_select(cols, tbl)` for example - that generates SQL code. We write the function, create a unit test that checks the SQL output, then run the test. It passes - great. Perhaps the code inside the function is a bit complicated. We *may* want to go back and refactor that code to clean it up or generalize its core functionality. Without a unit test, you *might break the function* when you refactor it and never notice. With the test in place, a broken refactor fails immediately.
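As a rough sketch of that scenario, here is one possible `sql_select` with tests that pin down its output. The body of the function, the exact SQL formatting, and the table and column names are all assumptions made for illustration; the real function may look quite different.

```python
def sql_select(cols, tbl):
    """Build a simple SELECT statement (hypothetical implementation).

    Names are interpolated directly into the string, so this should
    only ever be called with trusted column and table names.
    """
    if not cols:
        raise ValueError("cols must contain at least one column name")
    return "SELECT {} FROM {};".format(", ".join(cols), tbl)

def test_sql_select_single_column():
    assert sql_select(["game_id"], "games") == "SELECT game_id FROM games;"

def test_sql_select_multiple_columns():
    expected = "SELECT game_id, home_team FROM games;"
    assert sql_select(["game_id", "home_team"], "games") == expected
```

Once these tests pass, the inside of `sql_select` can be rewritten freely: as long as the tests still pass, the output has not changed for the cases we care about.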

### Test Driven Development (TDD)

One pretty extreme approach to unit testing is TDD. Though a typical workflow for a developer is to write the production code first, then run a battery of tests against it, TDD turns this idea on its head - you write the test code *before any production code* and then write the actual function. The idea is to have a failing test, write the *minimal* code required to pass the test, then refactor. Here's a great example using this process for writing a bowling game in Java:

http://butunclebob.com/files/downloads/Bowling%20Game%20Kata.ppt
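The same failing-test-first loop can be sketched in Python. The function `moneyline_to_decimal` is a hypothetical example I made up for this purpose (it converts American moneyline odds to decimal odds); it is not part of the codebase.

```python
# Step 1 (red): write the tests first. Run at this point, they fail with a
# NameError, because moneyline_to_decimal does not exist yet.
def test_underdog_moneyline():
    # +150 pays 150 on a 100 stake, so decimal odds are 2.5
    assert moneyline_to_decimal(150) == 2.5

def test_favorite_moneyline():
    # -200 requires a 200 stake to win 100, so decimal odds are 1.5
    assert moneyline_to_decimal(-200) == 1.5

# Step 2 (green): write the minimal code that makes both tests pass.
def moneyline_to_decimal(moneyline):
    if moneyline > 0:
        return 1 + moneyline / 100
    return 1 + 100 / abs(moneyline)

# Step 3 (refactor): clean up the implementation, rerunning the tests
# after every change to confirm they still pass.
```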


## Continuous Integration

I have connected the GitHub repo with Travis CI, a dev tool for continuous integration. The idea with continuous integration is to check, on every push, that all the tests pass, that the dependencies install correctly for a given version of Python, and perhaps that a user-defined script also works as expected. I added a script that downloads a small amount of the data, just to be sure that the whole system is operational. Here's that script:

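```
#!/bin/sh

# Activate the virtual environment ("." is the POSIX form of "source")
. venv/bin/activate

# Create the output directories; -p skips any that already exist
mkdir -p data/games data/boxscore data/pbp data/shotchart

# Crawl one season's game list, then the details of a single game
scrapy crawl games -a year=2019
scrapy crawl shotchart -a code="201810180WAS"
scrapy crawl pbp -a code="201810180WAS"
scrapy crawl boxscore -a code="201810180WAS"
```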
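One possible follow-up step, sketched here in Python, is to have CI confirm that the crawl actually produced files. This assumes each spider writes its output into the matching `data/` subdirectory, which I have not confirmed; the function name `empty_output_dirs` is made up for this sketch.

```python
from pathlib import Path
import tempfile

def empty_output_dirs(root, subdirs=("games", "boxscore", "pbp", "shotchart")):
    """Return the names of subdirectories that are missing or contain no files."""
    root = Path(root)
    problems = []
    for name in subdirs:
        directory = root / name
        if not directory.is_dir() or not any(directory.iterdir()):
            problems.append(name)
    return problems

# Demonstration against a temporary directory instead of the real data/ folder
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "games").mkdir()
    (Path(tmp) / "games" / "2019.csv").write_text("placeholder")
    (Path(tmp) / "pbp").mkdir()  # exists but is empty
    demo_result = empty_output_dirs(tmp)
```

In a CI run, the script would call `empty_output_dirs("data")` after the crawl and exit with a non-zero status if the returned list is not empty.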

## Backtesting

I want to talk about the idea of backtesting. Suppose we develop a high-quality machine learning model, cross-validate its performance, and are quite confident that it can "beat the market". We can calculate how the model would have performed on past data using a backtest - a simulation of "what if" we had implemented this model and placed bets accordingly.

There are a number of tools in Python and other languages that run backtests against stock market data. In Python, there is Quantopian's [zipline](https://www.zipline.io/) package, or another option called [backtrader](https://www.backtrader.com/). Another backtesting option is [QuantConnect](https://www.quantconnect.com/), whose engine is written in C#.

All of these libraries are for the stock market. We want to build a library that has a similar goal of evaluating previous performance, but using betting data.
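To make the idea concrete, here is a minimal sketch of a betting backtest. Everything in it is an assumption for illustration: the `Game` record, the flat-stake rule, the "bet only when expected value is positive" strategy, and the four made-up games. A real library would also need to handle bankroll limits, bet sizing, and multiple bookmakers.

```python
from dataclasses import dataclass
from typing import Iterable, List

@dataclass
class Game:
    model_prob: float    # model's estimated win probability for a team
    decimal_odds: float  # bookmaker's decimal odds on that team
    won: bool            # whether that team actually won

def backtest(games: Iterable[Game], stake: float = 1.0) -> List[float]:
    """Return cumulative profit after each game, betting a flat stake
    whenever the model sees positive expected value."""
    profit = 0.0
    curve = []
    for game in games:
        # Expected value per unit staked is p * odds - 1
        if game.model_prob * game.decimal_odds > 1.0:
            if game.won:
                profit += stake * (game.decimal_odds - 1.0)
            else:
                profit -= stake
        curve.append(profit)
    return curve

# Made-up history, in chronological order
history = [
    Game(0.60, 2.00, True),   # positive EV, bet wins
    Game(0.40, 2.00, True),   # negative EV, no bet placed
    Game(0.55, 2.00, False),  # positive EV, bet loses
    Game(0.70, 1.50, True),   # positive EV, bet wins
]
curve = backtest(history)
```

The important property is that each decision uses only the model's probability and the odds available before the game, never the outcome, so the simulation cannot peek into the future.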

## Database Development

I want to push all this data into a Postgres database. Currently, I can manage this with Scrapy, but the entire system seems rather fragile. Using the previously described unit testing and continuous integration, I believe a more robust means of developing a database is possible. If this is accomplished, there are a number of directions that the project can take - a user interface, using AWS for an RDS instance, incremental ETL loads, etc.
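As a sketch of what a testable, incremental load could look like, the example below uses Python's built-in `sqlite3` with an in-memory database as a stand-in for Postgres, so it runs without a server. The `games` table, its columns, and the sample row are hypothetical. The `ON CONFLICT ... DO UPDATE` upsert syntax is also supported by Postgres, though a Postgres driver such as psycopg2 uses `%s` placeholders rather than `?`.

```python
import sqlite3

# Hypothetical schema: one row per game, keyed by its code
SCHEMA = """
CREATE TABLE IF NOT EXISTS games (
    code TEXT PRIMARY KEY,
    home_pts INTEGER,
    away_pts INTEGER
)
"""

# Upsert: insert a new game, or overwrite the scores if the code exists
UPSERT = """
INSERT INTO games (code, home_pts, away_pts)
VALUES (?, ?, ?)
ON CONFLICT (code) DO UPDATE SET
    home_pts = excluded.home_pts,
    away_pts = excluded.away_pts
"""

def load_games(conn, rows):
    """Load (code, home_pts, away_pts) rows; safe to rerun on the same data."""
    conn.execute(SCHEMA)
    conn.executemany(UPSERT, rows)
    conn.commit()

conn = sqlite3.connect(":memory:")
load_games(conn, [("GAME001", 100, 98)])  # made-up row
load_games(conn, [("GAME001", 101, 98)])  # rerun with a corrected score
row_count = conn.execute("SELECT COUNT(*) FROM games").fetchone()[0]
stored = conn.execute("SELECT code, home_pts, away_pts FROM games").fetchone()
```

Because rerunning a load never duplicates rows, a unit test can call `load_games` twice and assert on the row count, which is exactly the kind of check that would make the pipeline less fragile.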

## Coronavirus

Everything is closed. So, with that in mind, we should probably expect *quite a while* before any actual betting can be done. Nonetheless, I want to talk about project expectations and what a timeline may look like.

Perhaps we 