With reference to Voltaire's famous novel *Candide*, our authors state "[A]nnoyingly, our programs don't always function properly the first time we run them." (p. 85).

This might be considered an extreme understatement, especially when we are new to programming or, even if we are experienced, when we are learning a new language or tackling a new problem. In this module, we will learn some tips and tricks to help us find and fix errors in our programs. Since the early days of programming, these errors have been called "bugs," and finding and fixing them is usually called "debugging."

Tools have been developed to facilitate debugging; we will play with some of these tools for Python. We can also design our programs in a way that makes them easier to debug, and we can use systematic search heuristics to find the errors. "Heuristics" is a key word here, and we will make frequent references to Polya's method in this module.

## First Steps

### Get the Language (Python) to Agree to Run the Program

* Eliminate syntactical and static semantic errors

### [PyLint](https://www.pylint.org/)


In [None]:
for i in rang(1,0):
    print i

![vim_lint_demo](./vim_lint.png)

While the notebook has Python-aware syntax highlighting, it doesn't alert me to my typo of `rang` instead of `range`. When I use my Vim editor with PyLint support, however, the editor highlights `rang` and tells me that it is an undefined name. This visual highlighting of errors can be very valuable.

#### What about `print i`?

Why doesn't the editor complain about `print i`? While this syntax is an error in Python 3, it is not an error in Python 2.
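
To see the difference between the two mistakes in the cell above, we can ask Python 3 to compile each one without running it. This is only a minimal sketch: the helper name `compiles` and the two source strings are made up for illustration, and a linter such as PyLint performs many more checks than this.

```python
# Two variants of the loop from the cell above, stored as text so that we can
# inspect them without running them. Both strings are illustrative.
source_with_typo = "for i in rang(1, 0):\n    print(i)\n"
source_py2_print = "for i in range(1, 0):\n    print i\n"


def compiles(source):
    """Return (True, None) if Python accepts the syntax, else (False, message)."""
    try:
        compile(source, "<example>", "exec")
    except SyntaxError as err:
        return False, err.msg
    return True, None


# Under Python 3, the Python 2 print statement is rejected before anything runs.
print(compiles(source_py2_print)[0])

# The misspelled name is syntactically fine, so compilation succeeds...
print(compiles(source_with_typo)[0])

# ...and the mistake only surfaces as a NameError when the code is executed.
try:
    exec(source_with_typo)
except NameError as err:
    print("NameError:", err)
```

The undefined name is a static semantic error: the interpreter does not notice it until the line runs, which is why a tool that flags it in the editor saves us a round trip.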

## Debugging

### Types of runtime bugs

#### Overt/Covert

* **Overt errors** are obvious because they result in something obvious like the program crashing or never ending.
* **Covert errors** do not result in something drastic. They simply produce wrong results.
    * The bug in `isPrime` defined above is a covert error: The function runs nicely for all integer input, but produces the wrong value for one input value.
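
The contrast can be sketched with two small functions. Both are hypothetical and written only for this illustration; in particular, `is_prime_sketch` is not the `isPrime` from the course, just a stand-in with a similar kind of flaw.

```python
def mean_overt(values):
    """Overt bug: an empty list makes the program crash."""
    return sum(values) / len(values)


def is_prime_sketch(n):
    """Covert bug: for positive integers, the answer is wrong only for n == 1."""
    for divisor in range(2, n):
        if n % divisor == 0:
            return False
    return True  # reached for n == 1 as well, although 1 is not prime


# The overt error announces itself with an exception.
try:
    mean_overt([])
except ZeroDivisionError as err:
    print("crash:", err)

# The covert error runs quietly and hands back a wrong answer.
print(is_prime_sketch(7), is_prime_sketch(9), is_prime_sketch(1))
```

Nothing in the second call tells us that something went wrong. We only find out if we already know what the answer should be.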
    
#### Persistent/Intermittent

* **Persistent errors** occur every time a program runs.
* **Intermittent errors** occur only some of the time.
    * These are difficult to address because they are hard to reproduce.
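
When an intermittent error comes from random numbers, one common way to pin it down is to fix the seed so that a failing run can be repeated. The sketch below assumes a made-up function, `flaky_ratio`, whose bug only shows up for some random draws; the names and the range of seeds are illustrative.

```python
import random


def flaky_ratio(rng):
    """Hypothetical function with an intermittent bug: it sometimes divides by 0."""
    denominator = rng.randrange(10)  # 0 through 9
    return 1 / denominator


def outcome(seed):
    """Run flaky_ratio with its own seeded generator and report what happened."""
    rng = random.Random(seed)
    try:
        return flaky_ratio(rng)
    except ZeroDivisionError:
        return "crash"


# Search a range of seeds for runs that fail.
failing_seeds = [seed for seed in range(200) if outcome(seed) == "crash"]
print(len(failing_seeds), "of 200 seeds fail")

# With a fixed seed, the same run fails every time we repeat it.
if failing_seeds:
    print(outcome(failing_seeds[0]), outcome(failing_seeds[0]))
```

Once we hold a seed that fails, the intermittent error behaves like a persistent one, and we can study it at leisure.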
    
### What have I found to be difficult to debug?

* Programs with user interactions, particularly graphical user interfaces
* Bugs that occur after a long chain of necessary computations.

## Learning to Debug

>Debugging is a learned skill. Nobody does it well instinctively. The good news is that it's not hard to learn, and it is a transferable skill. (p. 94)

### There is more good news!

You will have lots of opportunities to learn how to debug because you will be making lots of mistakes throughout this class!

### Debugging Tools

* Python comes with a debugging module known as the **p**ython **d**e**b**ugger ([pdb](https://docs.python.org/3/library/pdb.html))
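* The [ipdb](https://pypi.python.org/pypi/ipdb) package gives access to the IPython debugger, which has the same interface as pdb and adds features such as tab completion and syntax highlighting.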

The function `euclid` is supposed to compute the greatest common divisor of two positive integers. However, it fails to terminate.

```Python
def euclid(x,y):
    if x == y:
        return x
    if x > y:
        return euclid(y-x,x)
    else:
        return euclid(x-y,y)
```

In [None]:
def euclid(x, y):
    if x == y:
        return x
    if x > y:
        return euclid(y - x, x)
    else:
        return euclid(x - y, y)
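
One way to see why `euclid` never finishes is to record the arguments of the first few calls. The sketch below copies the function's logic unchanged and adds two things that are my own assumptions for the illustration: a `trace` list and a `max_calls` cut-off. The same values could be inspected interactively by stepping through the function in pdb. The second function, `euclid_fixed`, is one possible repair, not necessarily the intended one.

```python
import math


def euclid_traced(x, y, trace, max_calls=5):
    """Same logic as `euclid`, but records each call and gives up after max_calls."""
    trace.append((x, y))
    if len(trace) >= max_calls:
        return None  # stop the runaway recursion so that we can read the trace
    if x == y:
        return x
    if x > y:
        return euclid_traced(y - x, x, trace, max_calls)
    else:
        return euclid_traced(x - y, y, trace, max_calls)


trace = []
euclid_traced(6, 4, trace)
print(trace)  # [(6, 4), (-2, 6), (-8, 6), (-14, 6), (-20, 6)]


def euclid_fixed(x, y):
    """One possible repair: always subtract the smaller value from the larger."""
    if x == y:
        return x
    if x > y:
        return euclid_fixed(x - y, y)
    return euclid_fixed(x, y - x)


print(euclid_fixed(6, 4), math.gcd(6, 4))  # 2 2
```

The trace shows the first argument turning negative on the second call and drifting further from `y` on every call after that, so the test `x == y` can never succeed.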

### Example: Debugging a Pandas Application

Here is a code snippet that I previously wrote to read a table from a website using `read_html`:

```Python
mortality = pd.read_html("https://www.ssa.gov/oact/STATS/table4c6.html", 
                         skiprows=4, 
                         header=None)[0]
mortality = mortality.iloc[0:120,[1,4]]
mortality.rename(columns=dict(zip(mortality.columns, 
                                  ("Male prob. death", 
                                   "Female prob. death"))),inplace=True)
mortality.head()
```

#### What is this code doing?

* Open the website and read its tables.
* Skip four rows.
* Assume there is no header.
* Take the first DataFrame that results from this read.
* Keep the first 120 rows and the 2nd and 5th columns (positions 1 and 4, because `iloc` counts from zero).
* Rename the two columns we kept.
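
Because `iloc` selects by zero-based position, it is easy to be off by one when reading a call like `iloc[0:120, [1, 4]]`. A toy DataFrame makes the selection visible; the data and column names below are invented for this sketch.

```python
import pandas as pd

# A toy table with five columns, named so that their positions are easy to see.
toy = pd.DataFrame(
    [[10, 11, 12, 13, 14],
     [20, 21, 22, 23, 24],
     [30, 31, 32, 33, 34]],
    columns=["a", "b", "c", "d", "e"],
)

# Rows 0 and 1 (the end of the slice is excluded), columns at positions 1 and 4.
subset = toy.iloc[0:2, [1, 4]]
print(list(subset.columns))  # ['b', 'e']
print(subset.shape)          # (2, 2)
```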

What I expect as output is this:
<table border="1" class="dataframe">
  <thead>
    <tr style="text-align: right;">
      <th></th>
      <th>Male prob. death</th>
      <th>Female prob. death</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <th>0</th>
      <td>0.006322</td>
      <td>0.005313</td>
    </tr>
    <tr>
      <th>1</th>
      <td>0.000396</td>
      <td>0.000346</td>
    </tr>
    <tr>
      <th>2</th>
      <td>0.000282</td>
      <td>0.000221</td>
    </tr>
    <tr>
      <th>3</th>
      <td>0.000212</td>
      <td>0.000162</td>
    </tr>
    <tr>
      <th>4</th>
      <td>0.000186</td>
      <td>0.000131</td>
    </tr>
  </tbody>
</table>

#### What do I actually get?


In [2]:
import pandas as pd
from IPython.display import display, HTML

mortality = pd.read_html(
    "https://www.ssa.gov/oact/STATS/table4c6.html", skiprows=4, header=None)[0]
mortality = mortality.iloc[0:120, [1, 4]]
mortality.rename(
    columns=dict(
        zip(mortality.columns, ("Male prob. death", "Female prob. death"))),
    inplace=True)
mortality.head()

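<table border="1" class="dataframe">
  <thead>
    <tr style="text-align: right;">
      <th></th>
      <th>Male</th>
      <th>Unnamed: 4_level_0</th>
    </tr>
    <tr style="text-align: right;">
      <th></th>
      <th>Number of lives b</th>
      <th>Number of lives b</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <th>0</th>
      <td>0.006322</td>
      <td>0.005313</td>
    </tr>
    <tr>
      <th>1</th>
      <td>0.000396</td>
      <td>0.000346</td>
    </tr>
    <tr>
      <th>2</th>
      <td>0.000282</td>
      <td>0.000221</td>
    </tr>
    <tr>
      <th>3</th>
      <td>0.000212</td>
      <td>0.000162</td>
    </tr>
    <tr>
      <th>4</th>
      <td>0.000186</td>
      <td>0.000131</td>
    </tr>
  </tbody>
</table>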


#### My column renaming has failed!
#### What are my potential debugging steps?
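
A reasonable first step is to look at what `mortality.columns` actually contains. The sketch below does not touch the website. Instead it builds a small stand-in DataFrame whose two-level column header imitates the output above; the stand-in, the variable names `renamed` and `fixed`, and the repair at the end are my assumptions, not necessarily how the original notebook was fixed.

```python
import pandas as pd

# Hypothetical stand-in for the scraped table: the columns have two header levels.
columns = pd.MultiIndex.from_tuples([
    ("Male", "Number of lives b"),
    ("Unnamed: 4_level_0", "Number of lives b"),
])
mortality = pd.DataFrame(
    [[0.006322, 0.005313], [0.000396, 0.000346]], columns=columns)

# Step 1: inspect the column index. It is a MultiIndex, so each label is a tuple.
print(type(mortality.columns).__name__)
print(list(mortality.columns))

# Step 2: repeat the rename. The dictionary keys are tuples, but rename compares
# them with the individual level labels, so nothing matches and nothing changes.
new_names = ("Male prob. death", "Female prob. death")
renamed = mortality.rename(columns=dict(zip(mortality.columns, new_names)))
print(list(renamed.columns) == list(mortality.columns))

# Step 3: one possible repair is to assign the flat names directly.
fixed = mortality.copy()
fixed.columns = list(new_names)
print(list(fixed.columns))
```

Under this reading, the rename did not raise an error because unmatched keys are ignored by default, which makes this a covert error in the sense used earlier.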

In [None]:
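import random

# relativedelta is assumed to come from the dateutil package.
from dateutil.relativedelta import relativedelta

# `participants` and `mortality` are assumed to be defined in earlier cells.


def increment_study(participants, mortality, unit=10):
    """Advance every participant by `unit` days and apply a chance of death."""
    # Number of time steps per year; scales the annual probability to one step.
    delta = 365 / unit
    mkeys = {"M": "Male prob. death", "F": "Female prob. death"}
    for p in participants:
        p.increment_study_time(relativedelta(days=+unit))
        # Look up the annual probability of death by age (row) and sex (column).
        annual_prob = mortality.iloc[p.age["years"]][mkeys[p.sex]]
        if random.random() < annual_prob / delta:
            p.dies()


# Run the study until no participant is left alive.
while True:
    living = [p for p in participants if not p.deceased]
    if len(living) % 200 == 0:
        print(len(living))  # progress report
    if not living:
        break
    increment_study(living, mortality)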





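<table border="1" class="dataframe">
  <thead>
    <tr style="text-align: right;">
      <th></th>
      <th>Male prob. death</th>
      <th>Female prob. death</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <th>0</th>
      <td>0.006322</td>
      <td>0.005313</td>
    </tr>
    <tr>
      <th>1</th>
      <td>0.000396</td>
      <td>0.000346</td>
    </tr>
    <tr>
      <th>2</th>
      <td>0.000282</td>
      <td>0.000221</td>
    </tr>
    <tr>
      <th>3</th>
      <td>0.000212</td>
      <td>0.000162</td>
    </tr>
    <tr>
      <th>4</th>
      <td>0.000186</td>
      <td>0.000131</td>
    </tr>
  </tbody>
</table>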
