In [None]:
%matplotlib inline
import matplotlib
import seaborn as sns
sns.set()
matplotlib.rcParams['figure.dpi'] = 144

# Debugging


One of the most challenging things we'll need to do as programmers is debugging. Debugging requires us to understand (sometimes cryptic) error messages and use the stack trace to find where an error in our code occurred, but more crucially it requires us to have a critical understanding of what our code is doing or attempting to do and how it interacts with its environment.

Let's review some common types of errors and understand how we would debug them. We can then apply these principles more broadly.

(Note: We're going to use the magic command `%%expect_exception` to make our notebook run nicely even though we will be encountering errors. It is not important to debugging.)

In [None]:
import expectexception

## NameError

In [None]:
%%expect_exception NameError

x = 2
print(xx)

The error message is the last printed line. It's the first place to look when diagnosing an error. In this case, knowing that it's a
```
NameError
```

helps give a clue to what's going on. To boot, there's the helpful message

```
name 'xx' is not defined
```

where the variable name `xx` is a dead giveaway. We never assigned a variable called `xx`, only `x`. A `NameError` usually means you have tried to reference a variable or function ([some _name_ in your _namespace_](http://sebastianraschka.com/Articles/2014_python_scope_and_namespaces.html#introduction-to-namespaces-and-scopes)) that you haven't defined.

In [None]:
# NameError is usually easy to fix
x = 2
print(x)  # not xx

This can happen easily in Jupyter notebooks where code is spread across different cells. Make sure to execute cells in the right order, so that any object you use in a particular cell is up-to-date.

## TypeError

In [None]:
%%expect_exception TypeError

a = 10
b = "20"
a + b

Error messages are not always simple. For example, the above error reads

```
TypeError: unsupported operand type(s) for +: 'int' and 'str'
```

However, we have additional information in the `Traceback`: Python displays a stack trace leading to the line that triggered the error (in this case `a + b`). Looking at this line of code and the error message, we realize
- `a` is an `int`
- `b` is a `str`
- Python can't add `int` to `str`

Obviously, we meant to add two integers, not an int and a string. We could imagine a few ways to fix this.

In [None]:
# get the type right in the first place
a = 10
b = 20
a + b

In [None]:
# recast the string
a = 10
b = "20"
a + int(b)

We'll often face some ambiguity with debugging, and will have to choose a solution that seems best suited to the context of our program. The first solution seems best as long as `b` being an `int` is not an issue. However, maybe we've extracted a value from user input in a text field. Then it might be natural for `b` to be a string at some stage, and we would have to recast it. When debugging, think about where objects originate in your code, and what other objects they might interact with later on.
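As a minimal sketch of the second situation, suppose a value arrives as text and has to be converted before it is added. The helper name `add_user_value` is hypothetical and only serves the illustration; the point is that the conversion happens in one place, and a bad value produces an error message that names the offending input.

```python
def add_user_value(total, raw_text):
    """Add a number that arrived as text (e.g. from a form field) to a total."""
    try:
        # Convert at the boundary, where the string enters our program
        value = int(raw_text)
    except ValueError:
        # Re-raise with a message that shows exactly what we received
        raise ValueError("expected a whole number, got %r" % (raw_text,))
    return total + value


a = 10
b = "20"
print(type(a).__name__, type(b).__name__)  # check the types before combining
result = add_user_value(a, b)
print(result)
```

Printing the types of the operands, as in the last lines, is often the quickest way to confirm what a `TypeError` is telling us.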

## AttributeError

In [None]:
%%expect_exception AttributeError

import pandas as pd

df = pd.DataFrame({
  'A': [0] * 5 + [1] * 5,
  'B': range(10)
})

df.group_by('A')['B'].mean()

The error message in this case includes a stack trace with multiple levels. We'll want to be able to read the entire stack trace to understand the sequence of function calls that led to this error. In general, the stack trace walks through the call stack as follows:

0. The first entry of the stack trace is the highest-level call that initiated everything (this will be the immediate code you were executing).
1. Each subsequent entry goes one level deeper into the call stack.
2. The last entry is the function call that encountered the error.

This is particularly useful when working with libraries, which might have several layers of unfamiliar code. In this stack trace, we see the error originates with our call `df.group_by('A')['B'].mean()` and terminates somewhere in the Pandas library. In addition, we have an error message saying

```
AttributeError: 'DataFrame' object has no attribute 'group_by'
```

which is a sign that we are not using a `DataFrame` correctly. Which `DataFrame`? Looking at the top of the stack trace, we can see that we're talking about the `DataFrame` `df`, which helps identify the issue. The `DataFrame` object has no attribute or method called `group_by`. In this case, we simply misspelled the `groupby` method.
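To see the ordering of stack trace entries in a smaller setting, here is a sketch that uses only the standard library. The functions `load_price` and `compute_total` are made up for the illustration. We catch the error and collect the function name from each stack trace entry, from the outermost call to the innermost.

```python
import sys
import traceback


def load_price(record):
    # Innermost call: the KeyError is raised here
    return record["price"]


def compute_total(records):
    # Middle level: calls load_price once per record
    total = 0
    for record in records:
        total += load_price(record)
    return total


call_chain = []
try:
    # The second record has no "price" key, so load_price fails on it
    compute_total([{"price": 3}, {"cost": 4}])
except KeyError:
    tb = sys.exc_info()[2]
    # Entries are ordered from the outermost call to the innermost one
    for filename, lineno, func_name, text in traceback.extract_tb(tb):
        call_chain.append(func_name)

print(call_chain)
```

The last two names in `call_chain` are `compute_total` and `load_price`, in that order: the function we called, followed by the function that actually raised the error.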

We can check which attributes and methods are available by checking the [Pandas DataFrame documentation](https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.html). Reading documentation is often an important part of debugging, since it describes how to use libraries and modules that we didn't write ourselves.
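Alongside the documentation, we can ask the object itself what it offers. The sketch below uses the built-ins `hasattr` and `dir` together with `difflib.get_close_matches` from the standard library. The helper `suggest_attribute` is a hypothetical name introduced here, not part of Pandas, and it is demonstrated on a plain list so that it runs without any third-party library.

```python
import difflib


def suggest_attribute(obj, wrong_name):
    """Return public attribute names on obj that look similar to wrong_name."""
    public_names = [name for name in dir(obj) if not name.startswith("_")]
    return difflib.get_close_matches(wrong_name, public_names)


words = ["a", "b"]

# hasattr answers the yes/no question without raising an AttributeError
print(hasattr(words, "apend"))

# List similarly spelled attributes that do exist on the object
print(suggest_attribute(words, "apend"))
```

The same helper could be called as `suggest_attribute(df, 'group_by')` on the `DataFrame` above to look for similarly named methods.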

In [None]:
import pandas as pd

df = pd.DataFrame({
  'A': [0] * 5 + [1] * 5,
  'B': range(10)
})

df.groupby('A')['B'].mean()

## KeyError

In [None]:
%%expect_exception KeyError

df.groupby('A')['BB'].sum()

Errors are sometimes even more cryptic. For example, what does 
```
KeyError: 'Column not found: BB'
```
mean? It's not clear what 'KeyError' means and 'Column not found' doesn't give a full explanatory context. But by looking at the stack trace, we see the error occurred with a call to a DataFrame `df`. In the call we select a column `'BB'`. Now the error message makes sense: the column we tried to select doesn't exist.

When we try to index something using keys (e.g. `dict`, `DataFrame`, `Series`) and the key isn't in the object we're indexing, we will encounter a `KeyError`. In this case, our error is again a simple misspelling, which will often be the case with `KeyError`.
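The same behavior is easy to reproduce with a plain `dict`. The sketch below, with a made-up `prices` dictionary, shows two habits that help with a `KeyError`: listing the keys that actually exist, and checking membership before indexing.

```python
prices = {"apple": 1.5, "banana": 0.25}

try:
    prices["aple"]  # misspelled key
except KeyError as err:
    # The exception carries the key that could not be found
    missing_key = err.args[0]
    print("missing key: %r" % (missing_key,))
    # Comparing against the available keys usually reveals the typo
    print("available keys: %r" % (sorted(prices),))

# Membership tests and dict.get avoid the exception altogether
print("aple" in prices)
print(prices.get("aple", 0.0))
```

For a `DataFrame`, the analogous check is to look at `df.columns`, for example with `'BB' in df.columns`.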

In [None]:
df.groupby('A')['B'].sum()

## Reading code critically

In [None]:
%%expect_exception AttributeError

df.groupby('A')['B'].average()

You won't always understand everything in an error message. For example, in 
```
AttributeError: 'SeriesGroupBy' object has no attribute 'average'
```
what is a `SeriesGroupBy`? The stack trace doesn't make it any clearer. But it's clear that we're talking about `average`, so it's reasonable to guess that we need to use `mean` instead of `average`. We might also see if we can find some documentation on `SeriesGroupBy` (the stack trace tells us it's part of Pandas) or search [Stack Overflow](https://stackoverflow.com/) to see if someone has had a similar issue (if no one has, add your question!).
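When an error message names a type we don't recognize, we can also inspect the object directly. The sketch below uses a `collections.Counter` as a stand-in for an unfamiliar object (an assumption made so the example needs only the standard library); the same attributes exist on any Python object, including the result of `df.groupby('A')['B']`.

```python
import collections

counts = collections.Counter("banana")

kind = type(counts)
print(kind.__name__)    # the class name that appears in error messages
print(kind.__module__)  # the module that defines it, a hint for where to find docs

# The classes it inherits from often explain which methods it supports
base_names = [base.__name__ for base in kind.__mro__]
print(base_names)
```

Calling `help(kind)` in a notebook prints the class's docstring and methods, which is often enough to find the right method name.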

In [None]:
df.groupby('A')['B'].mean()

Often debugging will involve using context clues from the error message or checking documentation to understand the issue. Because context is so important, it helps to read your code critically, and try to translate what your code is doing into plain language that you could explain to someone else (and actually explain it to someone else, even if it's only a [rubber duck](https://en.wikipedia.org/wiki/Rubber_duck_debugging)). Beginning programmers particularly get tripped up on things like variable and function [_scope_](https://en.wikipedia.org/wiki/Scope_%28computer_science%29), a language's [_syntax_](IW_Python_Basic_Syntax.ipynb), or following the logic of conditionals or iteration. Use tools like diagrams to help plan and understand your code.

For more about error messages and tools for dealing with them, see the [Exceptions notebook](IW_Exceptions.ipynb).

*Copyright &copy; 2017 The Data Incubator.  All rights reserved.*