## Testing

_Premature optimization is the root of all evil_ (D. Knuth)

In other words: get it right first, then make it fast.

How do we make sure our code is right?

Useful links:
- We'll follow the concise [Software Carpentry Testing Tutorial](http://carpentries-incubator.github.io/python-testing/) authored by [Dr. Katy Huff](http://katyhuff.github.io).
- Dr. Huff's [profile at the U.S. Department of Energy](https://www.energy.gov/ne/person/dr-kathryn-huff).
- [The First Notebook War](https://yihui.org/en/2018/09/notebook-war/)

````{note}
There isn’t a clear borderline between software engineers and data analysts.
 
How would you write unit tests for data analysis? I feel it will be both tricky and unnecessary. For a function/method, if you defined it, you know what its expected output should be. For data, you often don’t know what exactly to expect in the output. For example, when you subset a dataset, how do you know the result is correct?
```R
mtcars2 = dplyr::filter(mtcars, hp > 100)
```
That is probably not something you, as a data analyst, need to worry about. It is the responsibility of the package author (the software engineer) to write enough unit tests in the package that you are using.

On the other hand, data analysts often do tests in an informal way, too. As they explore the data, they may draw plots or create summary tables, in which they may be able to discover problems (e.g., wrong categories, outliers, and so on). Notebooks are great for these inline output elements, from which you can make quick discoveries.
````

### Simple motivation: chaotic systems and numerical precision

We are going to play with the notebook `simple-numerical-chaos.ipynb`. Consider, for example, the following operation:

```python
a, b, c = 1.0, 1e-16, 1e-16
print(f"(a + b) + c = {(a + b) + c}")
print(f"a + (b + c) = {a + (b + c)}")
```

```
(a + b) + c = 1.0
a + (b + c) = 1.0000000000000002
```


The problem here is caused by the rounding of floating-point numbers: every intermediate result is rounded to the nearest representable number, so floating-point addition is not associative. A good reference on this topic is [What Every Computer Scientist Should Know About Floating-Point Arithmetic](https://docs.oracle.com/cd/E19957-01/806-3568/ncg_goldberg.html).
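To see why the two orderings differ, we can ask Python for the spacing of doubles around `1.0`. A small sketch using the standard-library `sys.float_info.epsilon` (the gap between `1.0` and the next larger double):

```python
import sys

eps = sys.float_info.epsilon  # gap between 1.0 and the next representable double
print(eps)                    # 2.220446049250313e-16

# 1e-16 is less than half that gap, so 1.0 + 1e-16 rounds back to 1.0 ...
print(1.0 + 1e-16 == 1.0)         # True
# ... while 2e-16 is more than half the gap, so 1.0 + 2e-16 rounds up to 1.0 + eps.
print(1.0 + 2e-16 == 1.0 + eps)   # True
```

This is exactly what happens above: in `(a + b) + c` each `1e-16` is lost on its own, whereas in `a + (b + c)` the two small numbers are first combined into `2e-16`, which survives.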

In Python, the [decimal](https://docs.python.org/3/library/decimal.html) module provides decimal floating-point arithmetic with a user-adjustable precision. This is particularly useful in real-world cases where small differences can be critical, for example when processing a large number of bank transactions.

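Rounding also means that two mathematically identical expressions can give different results. Take the logistic map, `r*x*(1-x)`, written in two equivalent ways:

```python
r = 3.9  # parameter of the logistic map, read as a global by f1 and f2
x = 0.8  # starting point

def f1(x):
    return r*x*(1-x)     # factored form

def f2(x):
    return r*x - r*x**2  # expanded form, mathematically identical

print('f1:', f1(x))
print('f2:', f2(x))
print('difference:', f1(x) - f2(x))
```

```
f1: 0.6239999999999999
f2: 0.6239999999999997
difference: 2.220446049250313e-16
```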


The digits of the difference are just noise: the difference is of the order of the machine epsilon (about `2.2e-16`), and neither `f1(x)` nor `f2(x)` carries any information beyond its last digit.
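In a chaotic system, such a tiny discrepancy does not stay tiny. A minimal sketch, reusing the two forms above with `r` passed as a parameter and a hypothetical helper `first_divergence`, iterates both versions from the same starting point and reports the first step at which they differ by more than a chosen tolerance (0.1 here, an arbitrary choice):

```python
def f1(x, r=3.9):
    return r * x * (1 - x)       # factored form

def f2(x, r=3.9):
    return r * x - r * x**2      # expanded form


def first_divergence(x0=0.8, n_steps=200, tol=0.1):
    """Iterate both forms from x0; return the first step where they differ by more than tol."""
    a, b = x0, x0
    for step in range(1, n_steps + 1):
        a, b = f1(a), f2(b)
        if abs(a - b) > tol:
            return step
    return None  # no divergence within n_steps


print(first_divergence())
```

We don't quote the resulting step here; the point is that a difference of about `2.2e-16` after the first iteration grows to order one after a finite number of iterations, so the two "identical" programs end up on completely different trajectories.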

This raises the question of what it means to get the _right answer_ from our code, and _what it means to be reproducible_ in scientific computing.

This short example helps us understand what is important in the context of computational science.
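In practice, a test on floating-point results should rarely demand exact equality. A minimal sketch using the standard-library `math.isclose` (the tolerances are illustrative choices, not values from the original notebook); inside a pytest suite, `pytest.approx` plays the same role:

```python
import math

r, x = 3.9, 0.8
y1 = r*x*(1-x)       # factored form, as in f1
y2 = r*x - r*x**2    # expanded form, as in f2

# Exact equality is too strict for floating-point results.
print(y1 == y2)                                         # False
# Compare within a tolerance instead: relative by default (rel_tol=1e-09) ...
print(math.isclose(y1, y2))                             # True
# ... or with an explicit absolute tolerance, useful for values near zero.
print(math.isclose(y1, y2, rel_tol=0.0, abs_tol=1e-12)) # True


def test_factored_and_expanded_forms_agree():
    """A pytest-style test: it passes if the assertion does not raise."""
    assert math.isclose(y1, y2, rel_tol=1e-12)
```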

## We are always testing 

As scientists and students, we are always testing our code and our methods, often in a subtle way: printing an output, making a plot, and so on. All of these are quite similar to unit tests.

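Exploratory data analysis is a typical example: while exploring the data, plots and summary tables help us spot problems such as wrong categories or outliers.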
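Those informal checks can be made repeatable by writing them down as assertions. A minimal sketch on a tiny, made-up dataset (the names and horsepower values are invented for illustration), mirroring the `hp > 100` filter from the note above:

```python
# Made-up example data: (name, horsepower) pairs standing in for a real table.
cars = [("car_a", 110), ("car_b", 93), ("car_c", 175), ("car_d", 62), ("car_e", 245)]

# The analysis step: keep cars with more than 100 hp.
powerful = [(name, hp) for name, hp in cars if hp > 100]

# Checks we would otherwise do by eye, written as assertions.
assert all(hp > 100 for _, hp in powerful)                 # filter condition holds
assert len(powerful) <= len(cars)                          # subsetting never adds rows
assert {n for n, _ in powerful} <= {n for n, _ in cars}    # no invented rows

print(len(powerful), "of", len(cars), "cars have more than 100 hp")  # 3 of 5 ...
```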

Tests also give us a to-do list when we make significant changes to the code (Fernando's example of moving from numerics to NumPy).

### Simple testing

There are tools for Python, such as `coverage.py` (also usable from pytest through the `pytest-cov` plugin), that show how much of our current code is actually exercised by tests.

### Writing a test suite

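pytest makes this super simple: it collects every file whose name starts with `test_` (or ends with `_test.py`) and runs the functions in it whose names start with `test`. These test functions usually take no arguments (unless they request fixtures), and instead of returning a boolean they use `assert` statements: a test passes if it runs to completion without raising an exception.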
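A minimal sketch of such a file, assuming a hypothetical file name `test_logistic.py` and a helper `logistic` that is not part of the original notebook; it relies only on plain `assert` statements and the standard library:

```python
# Hypothetical test_logistic.py: pytest would collect it because the name starts with "test_".
import math


def logistic(x, r=3.9):
    """One step of the logistic map x -> r*x*(1-x)."""
    return r * x * (1 - x)


def test_fixed_point_at_zero():
    # 0 is a fixed point of the logistic map for any r.
    assert logistic(0.0) == 0.0


def test_nontrivial_fixed_point():
    # For r > 1 the map has a second fixed point at x* = 1 - 1/r.
    r = 3.9
    x_star = 1 - 1 / r
    assert math.isclose(logistic(x_star, r), x_star)


def test_stays_in_unit_interval():
    # For 0 <= r <= 4, the map sends [0, 1] into [0, 1].
    for i in range(101):
        x = i / 100
        assert 0.0 <= logistic(x) <= 1.0
```

Running `pytest` in the directory containing this file would collect and run all three tests; since they are ordinary functions, they can also be called directly.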

Notice that pytest's output is quite informative: when a test fails, it points to the failing assertion and, thanks to its assertion rewriting, also prints the values of the expressions that made it fail.

Having to write tests forces you to write better code: small functions with clear inputs and outputs are much easier to test.

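A related design choice: handling a problematic case with a `try`/`except` block versus checking for it first with an `if` statement.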
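Python programmers often call these two styles "look before you leap" (LBYL) and "easier to ask for forgiveness than permission" (EAFP). A minimal sketch of both, using hypothetical `ratio_lbyl` and `ratio_eafp` helpers, together with tests that exercise the error path as well as the normal one:

```python
import math


def ratio_lbyl(a, b):
    """LBYL: check for the problematic case before dividing."""
    if b == 0:
        return math.nan
    return a / b


def ratio_eafp(a, b):
    """EAFP: attempt the division and handle the failure if it happens."""
    try:
        return a / b
    except ZeroDivisionError:
        return math.nan


def test_ratio_normal_case():
    assert ratio_lbyl(6, 3) == ratio_eafp(6, 3) == 2.0


def test_ratio_division_by_zero():
    # The error path deserves a test too.
    assert math.isnan(ratio_lbyl(1, 0))
    assert math.isnan(ratio_eafp(1, 0))
```

The `if` version must anticipate every failing input, while `try`/`except` reacts to the exception Python actually raises; on the other hand, an overly broad `except` can hide genuine bugs, so it is safer to catch only the specific exception you expect.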