# Best Practices for Debugging
**Instructor:** Kristian Rother

* One of the most basic and over-used debugging techniques: `print()` $\Leftarrow$ guilty as charged
* "Using `print` for debugging is like shooting holes in a building to see if a fire is inside."

#### Some basic debugging methods:
1) Read error message

2) `print()`

3) Make a hypothesis: 
   * When you know the expected results
   * When you have some reproducible way of checking whether your program executes correctly
   * Test/check could be manual or automatic
   * **Important:** Keep some sort of notes to describe what the problem actually is

4) Talk to someone (rubber duck debugging)

5) Clean up code 
   * Easier to spot bugs
    
6) Assert statements

7) Interactive debugger(s)
   * `pdb`
   * `ipdb`
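
Item 6 (assert statements) can be illustrated with a minimal sketch. The `mean` function and its inputs are hypothetical, chosen only to show how an `assert` states an assumption so that a violation fails right where it happens:

```python
def mean(values):
    # Assumption stated explicitly: the caller never passes an empty list.
    assert len(values) > 0, "mean() needs at least one value"
    return sum(values) / len(values)


print(mean([2, 4, 6]))  # 4.0

try:
    mean([])
except AssertionError as err:
    # The failure names the violated assumption instead of surfacing
    # later as a less informative ZeroDivisionError.
    print("assertion failed:", err)
```

Assertions are skipped when Python runs with the `-O` flag, so they suit debugging checks rather than validation of user input.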

### `pdb`
* Start `pdb` by inserting `import pdb; pdb.set_trace()` into `code.py`; run the script from the command line and a `pdb` session will automatically start at that line



* `l` : for list; shows a portion of the code plus a small arrow that tells us where we are at this very moment; the arrow points to the line that would be executed next

* `n` : for next; executes the next line; in this way we can execute our program line by line

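* `s` : for step; cousin of `n`, but if the current line calls a function, `s` steps into that function instead of stepping over it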

* `dir()` : lists the names in the current namespace (which variables do I have?)

* `locals()` : give me all the local variables

* `b` : set a break point; a **break point** means we want the program to run until it reaches a specified point in the code and then pause. For example: `b 68` places a break point at line 68; execution pauses just before line 68 runs

* `c` : continue; continue the program until it either ends or you reach a break point

* `q` : quit

* `b <line#>, cond` : conditional break points; for example: `b 69, length != sum(aa_counts)` $\Leftarrow$ this break point only triggers when the condition evaluates to true.


* In the example, `X` indicates missing data, equivalent to `NaN`
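
The commands above can be tried on a small script such as the sketch below. The `count_residues` function, the `DEBUG` switch, and the sample sequence are hypothetical and not the example used in the session; the switch is only there so the script also runs non-interactively.

```python
import pdb

DEBUG = False  # assumption: set to True to drop into a pdb session


def count_residues(sequence):
    """Count each character in a sequence, skipping 'X' (missing data)."""
    counts = {}
    for residue in sequence:
        if DEBUG:
            # Execution pauses here; try l, n, s, locals(), c and q.
            pdb.set_trace()
        if residue == "X":
            continue
        counts[residue] = counts.get(residue, 0) + 1
    return counts


aa_counts = count_residues("MKXVLA")
length = sum(aa_counts.values())
print(aa_counts, length)
```

With `DEBUG = True`, typing `locals()` at the prompt shows `sequence`, `counts` and `residue`, and `c` continues to the next loop iteration.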

#### Linters
* `pylint`
* `pyflakes`
* Linters point out syntax errors, other potential errors, style issues, unused variables, variables initialized more than once...
* Linters can be quite useful in preventing bugs from happening
* Anaconda and PyCharm have linters already built in (pep8 issues get highlighted)
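
As a rough illustration, the hypothetical snippet below runs without error but contains two of the issues listed above. The comments describe what a linter such as `pyflakes` would typically report; exact wording varies by tool and version.

```python
import os  # typically reported: imported but unused


def total(values):
    result = 0
    unused = len(values)  # typically reported: assigned but never used
    for v in values:
        result += v
    return result


print(total([1, 2, 3]))  # 6
```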

#### Restart kernel/container
* Anaconda
* jupyter
* Docker
* Delta debugging

#### Bisection strategy
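
A minimal sketch of one common form of bisection, assuming a single bad input record makes the program fail: test half of the remaining records, keep the half that still fails, and repeat until one record is left. The `find_bad_record` helper, the `fails` check, and the sample data are all hypothetical.

```python
def find_bad_record(records, fails):
    """Narrow a failing input down to one record by halving the search space.

    Assumes fails(records) is True and exactly one record is responsible.
    """
    lo, hi = 0, len(records)
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if fails(records[lo:mid]):
            hi = mid  # the culprit is in the first half
        else:
            lo = mid  # otherwise it must be in the second half
    return records[lo]


def fails(subset):
    # Stand-in for "run the program on this subset and see if it breaks".
    return any(not item.isdigit() for item in subset)


records = ["12", "7", "oops", "3", "44"]
print(find_bad_record(records, fails))  # oops
```

Each round halves the number of candidates, so even a large input needs only a handful of runs.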

#### Logging
* We write our program in such a way that it produces some metainformation that accompanies the output
* In addition to our primary output file, we generate some second file that allows us to see what the program is doing
* Logging in Python is super easy to use:
    * `import logging`
    * `import sys`
    * `logging.basicConfig(filename='debug.log', level=logging.WARNING)`
* You can even build a search engine on your logs if they get big enough

* **Log verbosity levels:**
    * DEBUG (1): most verbose; you will see everything in your log
    * INFO (2)
    * WARNING (3)
    * ERROR (4)
    * CRITICAL (5)
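
A minimal sketch of how the level acts as a filter. To stay self-contained it writes to an in-memory stream rather than to `debug.log`; the logger name and the messages are hypothetical.

```python
import io
import logging

# In-memory target so the result can be inspected; a FileHandler or
# logging.basicConfig(filename=...) would write to a file instead.
stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.setFormatter(logging.Formatter("%(levelname)s:%(message)s"))

logger = logging.getLogger("debug_notes_demo")  # hypothetical logger name
logger.setLevel(logging.WARNING)  # drop anything less severe than WARNING
logger.addHandler(handler)
logger.propagate = False  # keep the demo output out of the root logger

logger.debug("entering parse loop")       # filtered out
logger.info("read 10 records")            # filtered out
logger.warning("missing data in record 3")
logger.error("could not parse record 7")

print(stream.getvalue())
```

Only the WARNING and ERROR lines reach the stream. Lowering the level to `logging.DEBUG` would let all four messages through.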

#### Summary: What we know about debugging
* Error messages in Python are not always helpful
* A `SyntaxError` means Python could not parse the code, so nothing gets executed at all
* Some errors cause a program to stop with an Exception
* Read error messages from bottom to top
* Semantic errors: the program does not do the right thing
* Errors are distinct from the underlying defects
* Defects propagate through the program
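
The advice to read error messages from bottom to top can be seen in a small sketch. The `parse_ratio` function is hypothetical; the traceback is captured as text so its first and last lines can be compared.

```python
import traceback


def parse_ratio(text):
    numerator, denominator = text.split("/")
    return int(numerator) / int(denominator)


try:
    parse_ratio("3/0")
except ZeroDivisionError:
    report = traceback.format_exc()

lines = report.strip().splitlines()
print(lines[-1])  # ZeroDivisionError: division by zero
print(lines[0])   # Traceback (most recent call last):
```

The last line names the exception and its message. The lines above it trace the call chain that led there, with the most recent call closest to the bottom.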

# Best Data Testing Practices for Data Science
PyCon 2017: Seattle \
**Presenters:** Eric Ma and Hugo Bowne-Anderson

#### Why tests?
* We make assumptions about our code & data
* There are cases where those assumptions are violated
* Therefore, automated testing of those assumptions is important
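
A minimal sketch of testing one assumption about data, here "no record has a missing length". The records, field names, and the `check_no_missing_lengths` helper are hypothetical.

```python
import math

# Hypothetical records; "length" is NaN when the value is missing.
records = [
    {"id": "seq1", "length": 120.0},
    {"id": "seq2", "length": 87.0},
    {"id": "seq3", "length": float("nan")},
]


def check_no_missing_lengths(rows):
    """Return the ids of rows that violate the 'no missing length' assumption."""
    return [row["id"] for row in rows if math.isnan(row["length"])]


violations = check_no_missing_lengths(records)
print(violations)  # ['seq3']
```

Running a check like this automatically whenever new data arrives turns a silent assumption into an explicit, repeatable test.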

#### Tests: A Definition
A contract between your current self and your future self: what you expect to be correct now should still hold true in the future, and what you expect to be wrong now should still be wrong in the future, unless the requirements have changed! (Also maybe a contract between yourself and your data provider...)

#### For code, what needs to be tested?
* Given some example input(s), the output is correct.
* Counter-examples should show up as incorrect.
* Boundary cases are accounted for using defensive programming.
* All lines of stable code are subject to at least one test.
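
The first three points can be sketched with plain `assert`-based tests. The `gc_fraction` function and its test cases are hypothetical. A test runner such as `pytest` would collect the `test_` functions automatically; here they are called by hand to keep the example self-contained.

```python
def gc_fraction(sequence):
    """Fraction of G and C characters in a DNA string."""
    if not sequence:
        # Defensive programming: reject the boundary case explicitly.
        raise ValueError("sequence must not be empty")
    sequence = sequence.upper()
    return (sequence.count("G") + sequence.count("C")) / len(sequence)


def test_example_input():
    # Given example inputs, the output is correct.
    assert gc_fraction("GGCC") == 1.0
    assert gc_fraction("ATGC") == 0.5


def test_counter_example():
    # A sequence without G or C must not be reported as GC-rich.
    assert gc_fraction("AAAA") == 0.0


def test_boundary_case():
    # Empty input should be rejected rather than divide by zero.
    try:
        gc_fraction("")
    except ValueError:
        return
    raise AssertionError("empty input should raise ValueError")


for test in (test_example_input, test_counter_example, test_boundary_case):
    test()
print("all tests passed")
```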