# Debugging

>With reference to Voltaire's famous novel *Candide*, our authors state "[A]nnoyingly, our programs don't always function properly the first time we run them." (p. 85).

This might be considered an extreme understatement, especially when we are new to programming or, even with experience, when we are learning a new language or tackling a new problem. In this module, we will learn some tips and tricks to help us find and fix errors in our programs. Since the early days of programming, these errors have been called "bugs," and finding and fixing them is usually called "debugging."

Tools have been developed to facilitate debugging; we will play with some of these tools for Python. We can design our programs in a way that makes them easier to debug. We can use systematic search heuristics to find the errors. Heuristics is a key word here, and we will make frequent references to Polya's method in this module.

## First Steps

### Get the Language (Python) to Agree to Run the Program

* Eliminate syntactical and static semantic errors

### [PyLint](https://www.pylint.org/)

One tool that I find invaluable in this regard is [PyLint](https://www.pylint.org/). PyLint analyzes your code without running it and flags syntax errors and many static semantic errors, such as undefined names. Unfortunately, there does not currently seem to be a way to integrate PyLint into Jupyter notebooks. To illustrate how it works, consider the code

```Python
for i in rang(1,0):
    print i
```

When we type this code in a Jupyter cell, we don't see anything unique about the typesetting. However, if we type that code into an editor that has integration with PyLint (e.g. I have PyLint integrated into my vim editors), we see an error highlighted with a squiggly, red underline. If I click on that underlining, the editor (via PyLint) tells me "Undefined name 'rang'".

This visual highlighting of errors can be very valuable.

In [None]:
for i in rang(1,0):
    print i


![vim_lint_demo](./vim_lint.png)


#### What about `print i`?

Why doesn't the editor complain about `print i`? While this syntax is an error in Python 3, it is valid in Python 2, so a linter that checks the code against Python 2 rules (as mine evidently was) does not flag it.

#### What about `range(1,0)`?

Specifying the arguments for `range` as `1,0` is almost surely an error: it produces an empty range, so the loop body never runs. This is perfectly fine Python syntax, but it is an error in the semantics of my program.

In [None]:
for i in range(1,0):
    print(i)
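
A quick way to confirm this kind of suspicion is to look at the range directly. The sketch below (the names `values` and `iterations` are mine, for illustration only) shows that `range(1, 0)` is empty, so the loop body never executes:

```python
values = list(range(1, 0))   # materialize the range so we can look at it
print(values)                # an empty list: there is nothing to iterate over

iterations = 0
for i in range(1, 0):
    iterations += 1          # never reached
print(iterations)            # still 0
```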

## Debugging

### Types of runtime bugs

#### Overt/Covert

* **Overt errors** are obvious because they result in something obvious, like the program crashing or never ending.
* **Covert errors** do not result in something drastic. They simply produce wrong results.
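
To make the distinction concrete, here is a small sketch with two deliberately buggy functions. Both names (`mean_overt`, `mean_covert`) and both bugs are made up for illustration; they are not from the textbook.

```python
def mean_overt(values):
    # Overt bug: an empty list raises ZeroDivisionError, so the failure is obvious.
    return sum(values) / len(values)


def mean_covert(values):
    # Covert bug: floor division silently drops the fractional part.
    return sum(values) // len(values)


try:
    mean_overt([])
except ZeroDivisionError as err:
    print("overt failure:", err)

print(mean_covert([1, 2]))  # prints 1, although the true mean is 1.5
```

The first bug announces itself with a traceback; the second returns a plausible-looking number, which is why covert errors tend to survive longer.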

    
#### Persistent/Intermittent

* **Persistent errors** occur every time a program runs.
* **Intermittent errors** occur only sometimes.
    * These are difficult to address because they are hard to reproduce.
    
### What have I found to be difficult to debug?

* Programs with user interactions, particularly graphical user interfaces
* Bugs that occur after a long chain of necessary computations.

## Learning to Debug

>Debugging is a learned skill. Nobody does it well instinctively. The good news is that it's not hard to learn, and it is a transferable skill. (p. 94)

### There is more good news!

You will have lots of opportunities to learn how to debug because you will be making lots of mistakes throughout this class!

### Debugging Tools

>For at least four decades people have been building tools called debuggers, and there are debugging tools built into all of the popular Python IDE's. These are supposed to help people find bugs in their programs. They can help, but only a little. What's more important is how you approach the problem. Many experienced programmers don't even bother with debugging tools. **Most programmers say that the most important debugging tool is the `print` statement.** (p. 94)


* Python comes with a debugging module known as the **p**ython **d**e**b**ugger ([pdb](https://docs.python.org/3/library/pdb.html))
* IPython has its own debugging package known as [ipdb](https://pypi.python.org/pypi/ipdb).

The function `euclid` is supposed to compute the greatest common divisor of two positive integers. The algorithm repeatedly subtracts the smaller number from the larger until the two numbers are equal. However, as implemented it fails to terminate if our two arguments (initial numbers) are not equal. Feel free to run it, but you'll have to interrupt the cell by hitting the stop button above.


```Python
def euclid(x,y):
    while x != y:
        if x < y:
            tmp = x
            x = y
            y = tmp
        y = y - x
    return x
```

In [None]:
def euclid(x,y):
    while x != y:
        if x < y:
            tmp = x
            x = y
            y = tmp
        y = y - x
    return x

In [None]:
euclid(15,15)

In [None]:
euclid(15, 27)

### Polya's Method and Debugging

1. What is the problem?
2. Devise a plan
3. Execute the plan
4. Look back

#### 1. What is the problem?

`euclid` fails to terminate if our initial two numbers are not equal.

#### 2. Devise a Plan

The equality comparison is working correctly, so the problem must be in the body of the while loop.

Let's put a `print(x,y)` statement somewhere inside of the `while` loop. Since I know the loop isn't going to terminate, I'm also going to put an `input` call in the loop, to make it easier to break out of the loop. I will test it with two numbers that are not equal (e.g. 27 and 15).

#### 3. Execute the plan

In [None]:
def euclid(x,y):
    while x != y:
        if x < y:
            tmp = x
            x = y
            y = tmp
        y = y - x
        print(x,y)
        input('continue')
    return x

In [None]:
euclid(27,15)

#### 4. Look Back

My print statement shows me that I ended up with a negative number when I should only have positive numbers.

I've refined my problem; that is, I now have a new, narrower problem. Let me step through Polya's method again.

#### 1. Understand the problem 
`x` never changes, but `y` becomes ever more negative, so equality is never reached.

#### 2. Devise a plan

On line 7 I've messed up the subtraction. It should be `x = x - y`, since after the swap `x` is the larger number and the algorithm subtracts the smaller number from the larger. If I change this line, I think the program will work properly.

#### 3. Execute the plan
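
A minimal sketch of the corrected function, assuming the only change needed is the subtraction inside the loop: once the swap has put the larger value in `x`, the smaller value is subtracted from it. I've called it `euclid_fixed` (my name, for illustration) so it doesn't overwrite the buggy version, and I've used tuple assignment for the swap instead of `tmp`.

```python
def euclid_fixed(x, y):
    """Greatest common divisor of two positive integers by repeated subtraction."""
    while x != y:
        if x < y:
            x, y = y, x      # make sure x holds the larger value
        x = x - y            # subtract the smaller from the larger
    return x


print(euclid_fixed(27, 15))  # prints 3
print(euclid_fixed(15, 15))  # prints 15
```

Looking back: both values stay positive on every pass and the larger one shrinks each time, so the loop must eventually reach `x == y`.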

### Using the built-in debuggers [pdb](https://docs.python.org/3/library/pdb.html)/ipdb

#### Important commands

* **p**: print
* **c**: continue
* **s**: step (into)
* **n**: next (over)
* **u**: go up the call stack
* **d**: go down the call stack
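
To see `p` and `c` in action without an interactive prompt, here is a small sketch that feeds commands to the standard-library `pdb` from a string. The scripted-input setup and the helper name `gcd_by_subtraction` are my own choices for illustration; in a notebook you would normally type the same commands at the debugger prompt instead.

```python
import io
import pdb


def gcd_by_subtraction(x, y):
    # Hypothetical helper, used only to give the debugger something to stop in.
    while x != y:
        if x < y:
            x, y = y, x
        x = x - y
    return x


commands = io.StringIO("p (x, y)\nc\n")   # print the arguments, then continue
transcript = io.StringIO()                # captures what the debugger prints

debugger = pdb.Pdb(stdin=commands, stdout=transcript, nosigint=True, readrc=False)
result = debugger.runcall(gcd_by_subtraction, 27, 15)  # stops on the first line

print(transcript.getvalue())   # the debugger session, including the printed tuple
print(result)
```

`runcall` pauses at the first line of the function, `p (x, y)` shows the current values, and `c` lets the call run to completion.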

In [None]:
# !conda install ipdb -y
# pip install ipdb

In [None]:
import ipdb
def euclid(x, y):
    ipdb.set_trace()
    if x == y:
        return x
    if x > y:
        return euclid(y - x, x)
    else:
        return euclid(x - y, y)


In [None]:
euclid(27,100)

### Example: Debugging a Pandas Application

Here is a code snippet that I previously wrote to read a table from a website using `read_html`.

```Python
mortality = pd.read_html("https://www.ssa.gov/oact/STATS/table4c6.html", 
                         skiprows=4, 
                         header=None)[0]
mortality = mortality.iloc[0:120,[1,4]]
mortality.rename(columns=dict(zip(mortality.columns, 
                                  ("Male prob. death", 
                                   "Female prob. death"))),inplace=True)
mortality.head()
```
#### What is this code doing?

* Open the website.
* Skip four rows.
* Assume there is no header.
* Take the first DataFrame that results from this read.
* Keep the first 120 rows and the 2nd and 5th columns (positions 1 and 4).
* Rename the two columns we kept.

What I expect as output is this:
<table border="1" class="dataframe">
  <thead>
    <tr style="text-align: right;">
      <th></th>
      <th>Male prob. death</th>
      <th>Female prob. death</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <th>0</th>
      <td>0.006322</td>
      <td>0.005313</td>
    </tr>
    <tr>
      <th>1</th>
      <td>0.000396</td>
      <td>0.000346</td>
    </tr>
    <tr>
      <th>2</th>
      <td>0.000282</td>
      <td>0.000221</td>
    </tr>
    <tr>
      <th>3</th>
      <td>0.000212</td>
      <td>0.000162</td>
    </tr>
    <tr>
      <th>4</th>
      <td>0.000186</td>
      <td>0.000131</td>
    </tr>
  </tbody>
</table>

#### What do I actually get?


In [None]:
import pandas as pd
from IPython.display import display, HTML

mortality = pd.read_html(
    "https://www.ssa.gov/oact/STATS/table4c6.html", skiprows=4, header=None)[0]
mortality = mortality.iloc[0:120, [1, 4]]
mortality.rename(
    columns=dict(
        zip(mortality.columns, ("Male prob. death", "Female prob. death"))),
    inplace=True)
mortality.head()

#### My column renaming has failed!
#### What are my potential debugging steps?

In [None]:
mortality = pd.read_html(
    "https://www.ssa.gov/oact/STATS/table4c6.html", skiprows=4, header=None)[0]
display(mortality.head())

In [None]:
mortality = mortality.iloc[0:120, [1, 4]]
mortality.head()

In [None]:
mortality.rename(
    columns=dict(
        zip(mortality.columns, ("Male prob. death", "Female prob. death"))),
    inplace=True)
mortality.head()

In [None]:
dict(list(zip(mortality.columns, ("Male prob. death", "Female prob. death"))))

In [None]:
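import random

# Assumption: relativedelta is the one provided by the dateutil package.
from dateutil.relativedelta import relativedelta


def increment_study(participants, mortality, unit=10):
    # Advance each participant by `unit` days, then apply the annual death
    # probability for their age and sex, scaled down to the time step.
    delta = 365/unit
    mkeys = {"M":"Male prob. death", "F":"Female prob. death"}
    for p in participants:
        p.increment_study_time(relativedelta(days=+unit))
        if random.random()< mortality.iloc[p.age["years"]][mkeys[p.sex]]/delta:
            p.dies()
    return None

# `participants` is assumed to have been created earlier in the study code.
while True:
    living = [p for p in participants if not p.deceased]
    if len(living)%200 == 0:
        print(len(living))
    if not living:
        break
    increment_study(living, mortality)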

In [None]:
mortality.columns

In [None]:
mortality = pd.read_html("https://www.ssa.gov/oact/STATS/table4c6.html", 
                         skiprows=4, 
                         tupleize_cols=True,
                         header=None)[0]
mortality = mortality.iloc[0:120,[1,4]]
mortality.rename(columns=dict(zip(mortality.columns, 
                                  ("Male prob. death", 
                                   "Female prob. death"))),inplace=True)
mortality.head()
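
As far as I know, `tupleize_cols` was deprecated and later removed from `read_html` in newer pandas releases, so the cell above may raise an error there. One alternative that does not depend on that keyword is to assign flat column names directly after selecting the two columns. The small DataFrame below is a hypothetical stand-in for the scraped table: its two-level column labels are my assumption about what the scrape returns, and the numbers are the first two rows of the expected output shown earlier.

```python
import pandas as pd

# Hypothetical stand-in for the two selected columns of the scraped table.
two_level_columns = pd.MultiIndex.from_tuples(
    [("Male", "Death probability"), ("Female", "Death probability")])
mortality_demo = pd.DataFrame(
    [[0.006322, 0.005313], [0.000396, 0.000346]], columns=two_level_columns)

# Debugging step: check what kind of column index we actually have.
print(type(mortality_demo.columns).__name__)   # MultiIndex

# Replace the column index wholesale instead of renaming label by label.
mortality_demo.columns = ["Male prob. death", "Female prob. death"]
print(list(mortality_demo.columns))
```

Assigning to `.columns` sidesteps the question of how `rename` matches keys against a multi-level index, at the cost of relying on the column order being what I expect.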
