# AstroPGH Boot Camp - Debugging

- **Objectives**
    - Learn the difference between syntax, runtime, and semantic errors
    - Learn the best starting place to fix the error causing your bug
    - Learn how to use Pdb (or iPdb) - a powerful Python debugging tool

- **Possible side-effects**
    - Learn something about machine precision or statistics
    - Desire to use Pdb *all* the time, even when you have nothing to debug

# The three categories of coding errors

- **Syntax Error**
    - Code can't be parsed by the interpreter
    - Example: `print("Hello World!"`
    - Output: `SyntaxError: unexpected EOF while parsing` (Python 3.10 and later report `SyntaxError: '(' was never closed`)
- **Runtime Error**
    - An error was raised during execution
    - Example: `print(1 + "1")`
    - Output: `TypeError: unsupported operand type(s) for +: 'int' and 'str'`
- **Semantic Error**
    - The code does something, but it was not what you expected
    - Example: `print(0.2 + 0.4 == 0.6)`
    - Output: `False`
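
A minimal sketch that triggers all three categories in one place, so their behavior can be compared side by side. `compile()` is used only so that the syntax error can be caught instead of stopping the whole example; the variable names are illustrative.

```python
# Syntax error: the source cannot even be compiled, so nothing runs.
try:
    compile('print("Hello World!"', "<example>", "exec")
except SyntaxError as err:
    syntax_message = f"{type(err).__name__}: {err.msg}"

# Runtime error: the code parses, but raises while executing.
try:
    1 + "1"
except TypeError as err:
    runtime_message = f"{type(err).__name__}: {err}"

# Semantic error: no exception at all, just an unexpected result.
semantic_result = (0.2 + 0.4 == 0.6)

print(syntax_message)   # exact wording depends on the Python version
print(runtime_message)
print(semantic_result)
```

Note that only the first two categories announce themselves with an exception; the semantic error runs silently.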

# Runtime/Syntax Errors

- Read the error message. Most of the time, you can follow it to the exact line you need to fix.
- If a syntax error is reported on a line whose syntax looks perfectly fine, check the preceding lines for an unclosed parenthesis or bracket. An unclosed delimiter makes the interpreter think that the line is continuing.

- If you don't understand the error message, the best thing to do is to use Google.
- Look for links to Stack Exchange
- Read the official documentation (RTFD) for your installed version of the package raising the error 
    - How to check the version of a package: `pip show` (e.g., use `pip show scipy` to check your version of SciPy)
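    - You can try out different versions using pip (e.g., use `pip install scipy==1.4.1` if you want to install version `1.4.1`). To list the available versions, `pip install scipy==` works with older releases of pip (the versions appear in the error message); newer releases may reject that syntax and offer `pip index versions scipy` instead.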
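
The installed version can also be checked from inside Python, which is handy in a notebook. This is a small sketch assuming Python 3.8 or later (for `importlib.metadata`); `installed_version` is a hypothetical helper name, not part of any library.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string of `package`, or None if absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# Prints None for any package that is not installed in this environment
for name in ("scipy", "numpy"):
    print(name, installed_version(name))
```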

## Avoid almost all runtime and syntax errors: <ins>use an IDE</ins>!

- My favorite is PyCharm
    - Other options: Spyder, Wing, Atom, Eclipse, Visual Studio...
- While you code (or when you open a file), it will notify you if it looks like there is an error
- Many IDEs will also automatically check your code for PEP 8 violations

# Semantic Errors

- Semantic errors don't present themselves the way runtime and syntax errors do. Before looking for semantic errors, you first have to test your code for bugs.
- You find semantic errors by running test cases for your code. 
    - For example, check the output of a function given parameters that produce a known output.
    - Or, you could make a plot if you roughly know what it should look like.
- If you find a bug (which implies there is a semantic error somewhere), googling similar issues other people have had could help, but it is often better to dig into your code and try to understand what it is doing line by line.
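
As a sketch of what a test case with a known output can look like, the example below checks a small chi-squared function against two inputs whose answers can be worked out by hand. `chi_squared` is a hypothetical stand-alone function written for this illustration.

```python
def chi_squared(y_observed, y_model, y_err):
    """Sum of squared residuals, each scaled by its measurement error."""
    return sum(((m - o) / e) ** 2
               for o, m, e in zip(y_observed, y_model, y_err))


y_observed = [1.0, 2.0, 3.0]
y_err = [0.5, 0.5, 0.5]

# Known answer 1: a model that reproduces the data exactly gives 0.
perfect = chi_squared(y_observed, y_observed, y_err)

# Known answer 2: a model that is off by exactly one error bar at
# every point gives one unit of chi-squared per data point.
shifted = [y + e for y, e in zip(y_observed, y_err)]
one_sigma = chi_squared(y_observed, shifted, y_err)

assert perfect == 0.0
assert one_sigma == 3.0
```

If either assertion fails, you know a semantic error exists before you have even looked at the implementation.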

# Debugging Methods

## <ins>Method 1</ins>: `print()` the values that are acting funny

- Uncomment the print statement to investigate what is going on with the values that don't get returned as expected.

In [None]:
def func1():
    """
    Generate values = 0.1, 0.2, 0.3 ..., 0.7, 0.8
    and return all values that are multiples of 0.2

    Expected output:   0.2   0.4   0.6   0.8
    """

    values = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8]
    multiples = []

    for value in values:
        # print("value =", value, "remainder =", value % 0.2)
        if value % 0.2 == 0:
            multiples.append(value)

    return multiples

In [None]:
func1()

**You can force Python to print floats to higher precision using `format(object, spec)`**

In [None]:
print(0.2, 0.6)
print(format(0.2, ".20f"), format(0.6, ".20f"))

**Explanation: A binary float can only store a finite sum of powers of 2, and 0.2 would need infinitely many terms**

$0.2 = 0 + \frac{0}{2} + \frac{0}{4} + \frac{1}{8} + \frac{1}{16} + \frac{0}{32} + \frac{0}{64} + \frac{1}{128} + \frac{1}{256} + ...$

$0.2 = (0.00110011 ...)_2$

In [None]:
1/8 + 1/16 + 1/128 + 1/256 + 1/2048 + 1/4096  # + ...
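
One possible way to make the multiples check robust is to compare within a tolerance instead of testing for exact equality. The sketch below is not the only fix; `multiples_of` and its `tolerance` default are assumptions chosen for illustration. `Decimal(float)` is used to display the exact value that each float really stores.

```python
import math
from decimal import Decimal

# Decimal(float) shows the exact value the float actually stores.
print(Decimal(0.2))
print(Decimal(0.6))


def multiples_of(step, values, tolerance=1e-9):
    """Return the values that are whole multiples of `step`, within `tolerance`."""
    result = []
    for value in values:
        ratio = value / step
        # Accept the value if the ratio is (almost) a whole number
        if math.isclose(ratio, round(ratio), abs_tol=tolerance):
            result.append(value)
    return result


values = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8]
print(multiples_of(0.2, values))
```

The tolerance has to be chosen with the scale of your data in mind; a value that suits numbers near 1 may be far too loose or too strict elsewhere.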

**But what if the bug is in a function within a function within a function???**

- Define `func2()` which calls `func1()` and `func3()` which calls `func2()`
- Now the bug is buried deep in your code (i.e., somewhere you probably don't want print statements)

In [None]:
def func2(raise_error=False):
    """
    Wrapper for func1
    """
    if raise_error:
        raise ValueError("Unhelpful error message")
    else:
        return func1()


def func3(raise_error=False):
    """
    Wrapper for func2
    """
    return func2(raise_error=raise_error)

## <ins>Method 2</ins>: Pdb / iPdb

**Accessing the pdb help page**
- Asterisks are placed next to commands you are expected to learn in this tutorial, but you are encouraged to play around with all of them

`%debug any_function()`<br>
`ipdb> help`
```
Documented commands (type help <topic>):
========================================
EOF    c          d        h         list*     q        rv       undisplay
a      cl         debug*   help*     ll        quit     s        unt      
alias  clear*     disable  ignore    longlist* r        source*  until*   
args*  commands   display  interact  n         restart  step*    up*      
b      condition  down*    j         next*     return*  tbreak   w        
break* cont       enable   jump*     p*        retval   u        whatis*  
bt     continue*  exit*    l         pp        run      unalias  where*
```

***Warnings*, if using pdb in a notebook**
- Always quit when you're done (`q`, `quit`, or `exit`), or else you can't run other cells. <br>
If you can't access the pdb prompt but it's still running, the only option is restarting the kernel.
- Don't use `interact` in a Jupyter Notebook, as there is no way to exit without restarting your kernel. <br>
Besides, there is no need to use `interact` unless you want to define functions or write loops on the fly.

`ipdb> help help`
```
h(elp)
        Without argument, print the list of available commands.
        With a command name as argument, print help about that command.
        "help pdb" shows the full pdb documentation.
        "help exec" gives help on the ! command.
```

### Try out the following commands

- Start using ipdb with `%debug func3()`

- <ins>Inserting lines of Python and other essential syntax</ins>
    - `!import matplotlib.pyplot as plt; plt.plot([0,1], [1,0]); plt.show()`
    - `!x = 1` to assign a variable `x`
    - `!print(x)` to print `x`
    - `p x` (same thing)
    - `!print(type(x))` to print the type of `x`
    - `whatis x` (same thing)
    - `whatis func3` 
    - `source func3` to show the source code of a function or class definition
    - `debug func1()` to enter a debugger within the debugger (yay recursion!)
    - `exit` or `q` or `quit` to return to the original debugger
    - To stop debugging, you would have to enter `exit` again

In [None]:
func3()

In [None]:
%debug func3()

- <ins>Start `%debug func3()` again, and this time step through the frames of your code</ins>
    - `step` or `s` to step down into `func3()`, which is the first "frame" of your code
    - `list` or `l` to list several lines around the current line
    - `longlist` or `ll` to list all lines in your current frame
    - `next` or `n` to go to the next line in your current frame
    - `s`
    - `ll`
    - `args` to list the values of the arguments
    - `until 8` or `unt 8` to run until line 8 of the current frame
    - `s`
    - `where` or `w` to show the stack (the list of frames that led to where you currently are)
    - `up` or `u` and `down` or `d` to navigate up and down the stack

- <ins>Walking through loops using breakpoints</ins>
    - Return to the bottom frame (`func1`) by entering `down` a few times
    - `ll`
    - You could step through the loop using `n` a *lot* of times (note that hitting enter executes the previous command)
    - `jump 9` or `j 9` from wherever you are in the function to restart from line 9 of the current frame. Your variables remain unchanged from before the jump to after. A jump can also be used to skip over lines you don't want to execute while experimenting in your code.
    - `break 14` or `b 14` to create a breakpoint to pause immediately **before** executing line 14
    - `break` to see the list of breakpoints (we only have breakpoint #1 now)
    - `continue` or `c` runs the code until it finishes *or* a breakpoint is reached
    - You may check the value to high precision using `p format(value, ".20f")` at the breakpoint (this could be automated via `commands`)
    - `return` or `r` continues until the function in your current frame returns a value (or returns None)

In [None]:
%debug func3()

- <ins>Try post-mortem debugging. Start `%debug func3(raise_error=True)`</ins>
    - Running `continue`, `return`, or `until` will not catch the error
    - To make sure you catch the error, run the line using `next`
    - When the error is raised, you are free to step all the way `down` to learn the cause of the error
    - Not all commands work in "post-mortem" mode (don't try using anything that moves through the code), but you can still freely move `up` and `down` the stack, inspect variables, and start a new (recursive) debugger in any function you like using `debug function()`

In [None]:
%debug func3(raise_error=True)
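
Outside a notebook, roughly the same post-mortem workflow is available through the standard library's `pdb.post_mortem()`. The sketch below uses two hypothetical functions, `inner` and `outer`, and prints the chain of frames non-interactively; the commented-out line shows where an interactive debugger could be opened.

```python
import traceback


def inner(raise_error=False):
    if raise_error:
        raise ValueError("Unhelpful error message")
    return "ok"


def outer(raise_error=False):
    return inner(raise_error=raise_error)


try:
    outer(raise_error=True)
except ValueError as err:
    # Non-interactive look at the stack that post-mortem mode would show.
    frames = [frame.name for frame in traceback.extract_tb(err.__traceback__)]
    print(" -> ".join(frames))
    # In an interactive session, uncomment to open the debugger here:
    # import pdb; pdb.post_mortem(err.__traceback__)
```

The last name printed is the frame where the error was raised, which is where a post-mortem session starts.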

# Things to try on your own...

## <ins>Other ways to run Pdb</ins>

### Try starting from a specific line in your code:
- At the line you want to start, type `import pdb; pdb.set_trace()`.
- Run your code normally, and `pdb` will begin where you told it to.
- Note, if you `pip install ipdb`, you could similarly use `ipdb.set_trace()`
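
Since Python 3.7, the built-in `breakpoint()` does the same job as `import pdb; pdb.set_trace()` by default. The sketch below places it behind an `if`, which gives a simple conditional stop; `running_total` is a hypothetical function. The environment variable `PYTHONBREAKPOINT` is set to `0` here only so that the example runs without pausing.

```python
import os


def running_total(values):
    total = 0
    for value in values:
        if value < 0:
            # Conditional stop: only pause when something looks wrong.
            breakpoint()
        total += value
    return total


# Setting PYTHONBREAKPOINT=0 turns every breakpoint() call into a no-op,
# so this example runs straight through. Remove this line (or unset the
# variable) to actually drop into the debugger at the negative value.
os.environ["PYTHONBREAKPOINT"] = "0"

print(running_total([1, 2, -3, 4]))
```

Setting `PYTHONBREAKPOINT=ipdb.set_trace` should select ipdb instead, assuming ipdb is installed.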

### Try it from the terminal:
- Copy and paste the entire script into a file called `buggystats.py`
- From a terminal, `python -m pdb buggystats.py` or `python -m ipdb buggystats.py`
- Of course, you have to fix the syntax error first. Even pdb can't help if the file isn't written in proper Python syntax.

### Try it from any IPython console other than a notebook
- You can use `%debug` in any IPython console. For example, if you type `ipython` in the terminal and debug using that console, certain features like auto-complete and the `interact` command will be usable.

### Try it from a web browser
- Requires: `pip install web-pdb`
- At the line you want to start, type `import web_pdb; web_pdb.set_trace()`
- In a new tab, go to the following address: `localhost:5555`

### Try debugging the entire code cell
- `%debug` ("line magic") only debugs a single line
- `%%debug` ("cell magic") can be placed at the beginning of a cell to debug the whole thing. This isn't really recommended because it still won't print the cell's source code, meaning you'll have to infer the line the debugger is on from the line number.

## <ins>Commands that change the behavior of breakpoints</ins>

**Try these ones out (but first learn what they do with `help {command}`)**
- `tbreak`
    - Same as break, but the breakpoint is temporary: it is removed automatically the first time it is hit
- `disable`
- `enable`
    - `disable`/`enable` are self-explanatory, but remember to use the breakpoint number, not the line number
- `clear`
    - Completely get rid of a breakpoint, so you can't even re-enable it
- `condition`
    - Useful if you only want to stop at a breakpoint if a condition is met
- `commands`
    - Useful for automating commands every time you reach a certain breakpoint
    - For instance, try something like this:
    ```
    ipdb> # First, create breakpoint #1 at a line where you want to print some variable x
    ipdb> commands 1
    (com) silent
    (com) p x
    (com) continue
    ipdb> # You may want to create another breakpoint after this one
    ipdb> # because breakpoint #1 no longer breaks (it continues)
    ipdb> continue
    ```

# <ins>Challenge</ins> - Find the Bugs

- The following cell is a Python script which was intended to compare various models to data, and then print the reduced chi-squared and p-value of each model. You may even use code similar to this one day for your own research...
- However, this script contains a few bugs:
    - One syntax error
    - Two runtime errors
    - One semantic error
- Find and fix these bugs using any method(s) you like.
- To learn the true intention of this code, read all the comments and docstrings.
- **Note**: Knowledge of statistics may help you identify that these bugs exist, but all equations used in this code are correct, so don't look for math mistakes.
- Bonus points if you make a pretty plot showing the successful fits to the data.

In [None]:
"""
buggystats.py

I wrote some custom classes for modeling data with polynomials, and
tuning the models via linear regression. Unfortunately, it seems like
I may have made a few mistakes. Four, to be exact. Could you please
help me with the debugging?

Thanks,
Alan
"""

import numpy as np
from scipy import stats
from astropy.utils.misc import NumpyRNGContext


class LinearRegressor:
    def __init__(self, x, y, y_err):
        """
        The purpose of this class is to fit a functional form to our data
        such that y = f(x). All fitting is performed via linear regression.
        """
        self.x = np.asarray(x)
        self.y = np.asarray(y)
        self.y_err = np.asarray(y_err

    def fit_line(self):
        """
        Return best fit (a,b) such that y = a*x + b
        """
        a, b = np.polyfit(self.x, self.y, deg=1, w=1/self.y_err)

    def fit_quadratic(self):
        """
        Return best fit (a,b,c) such that y = a*x^2 + b*x + c
        """
        a, b, c = np.polyfit(self.x, self.y, deg=2, w=1/self.y_err)
        return a, b, c

    def chisq(self, model):
        """
        Return the chi squared value of a model, which can be any
        function that predicts the array of y, given an array of x

        - This function has been thoroughly tested and is bug-free! :)
        """
        model_y = model(self.x)
        residual = np.asarray(model_y) - self.y
        z_score = residual / self.y_err

        return np.sum(z_score ** 2)

    def reduced_chisq(self, model):
        """
        Returns chisq divided by the number of degrees of freedom.
        A good fit should return a value ~ 1.

        - This function has been thoroughly tested and is bug-free! :)
        """
        deg_of_freedom = len(self.x) - model.n_params
        chisq = self.chisq(model)

        return chisq / deg_of_freedom

    def p_test(self, model):
        """
        This function has been thoroughly tested and is bug-free! :)

        Returns the probability of collecting data with such a large
        chi squared value, given our model.

        - Small values (< 0.05) can be rejected with 95% confidence.
        - Large values (> 0.95) could imply over-fitting.
        """
        deg_of_freedom = len(self.x) - model.n_params
        chisq = self.chisq(model)

        # Note: SF (survival function) is 1 - CDF and is equivalent to p-value
        return 1 - stats.chi2.cdf(chisq, deg_of_freedom)


class LinearModel:
    def __init__(self, a, b):
        """
        Constructs a functor which predicts y, given x
        """
        self.a = a
        self.b = b
        self.c = c
        self.n_params = 2

    def __call__(self, x):
        """
        Return f(x) = a*x + b
        """
        return self.a*x + self.b

    def __repr__(self):
        return f"f(x) = {self.a:.2f}*x + {self.b:.1f}"


class QuadraticModel:
    def __init__(self, a, b, c):
        """
        Constructs a functor which predicts y, given x
        """
        self.a = a
        self.b = a
        self.c = c
        self.n_params = 3

    def __call__(self, x):
        """
        Return f(x) = a*x^2 + b*x + c
        """
        return self.a*x**2 + self.b*x + self.c

    def __repr__(self):
        return f"f(x) = {self.a:.3f}*x^2 + {self.b:.2f}*x + {self.c:.1f}"


class TrueModel:
    """
    NOTE: Don't edit this class. You can't change the truth!
    """

    def __init__(self):
        """
        Constructs a functor which perfectly predicts y, given x
        """
        self.n_params = 0

    def __call__(self, x):
        """
        Returns f(x) = 2cos(sqrt(x)) + 10
        """
        return 2 * np.cos(np.sqrt(x)) + 10

    def __repr__(self):
        return "f(x) = 2cos(sqrt(x)) + 10"


def collect_data():
    """
    Data collection was thoroughly tested by your collaboration,
    so you don't need to search this function for bugs.
    """
    x = np.linspace(8, 20, 500)  # true x value
    y = TrueModel()(x)  # true y value
    y_err = 1/y  # true y error

    # Seed the RNG so that we get the same data every time
    with NumpyRNGContext(seed=123):
        y += np.random.normal(0, y_err)  # measured y value

    return x, y, y_err


def main():
    """
    Collect data and perform regression using both a linear and quadratic model

    NOTE: Don't edit this function. It is bug-free!
    """
    x, y, y_err = collect_data()
    regressor = LinearRegressor(x, y, y_err)

    a, b = regressor.fit_line()
    model1 = LinearModel(a, b)

    a, b, c = regressor.fit_quadratic()
    model2 = QuadraticModel(a, b, c)

    true_model = TrueModel()

    # Print the performance of each model
    # ===================================
    reduced_chisq = regressor.reduced_chisq(model1)
    p_test = regressor.p_test(model1)
    print("Model 1:\n========")
    print(model1)
    print(f"Reduced chi-squared = {reduced_chisq:.3f}")
    print(f"P value = {p_test:.3f}")

    reduced_chisq = regressor.reduced_chisq(model2)
    p_test = regressor.p_test(model2)
    print("\nModel 2:\n========")
    print(model2)
    print(f"Reduced chi-squared = {reduced_chisq:.3f}")
    print(f"P value = {p_test:.3f}")

    reduced_chisq = regressor.reduced_chisq(true_model)
    p_test = regressor.p_test(true_model)
    print("\nTruth:\n======")
    print(true_model)
    print(f"Reduced chi-squared = {reduced_chisq:.3f}")
    print(f"P value = {p_test:.3f}")


if __name__ == "__main__":
    # Running this script just executes the main() function
    main()

## <ins>Hints</ins>

- For syntax errors, Pdb can't help you.
    - Read the error message, but remember that syntax errors don't always point to the correct line...
- After the syntax error has been fixed, read the runtime error messages carefully.
    - Post-mortem debugging and/or Google will help you understand the error messages
- To find the semantic error, check for suspicious outputs
    - Hint: try plotting the fits to the data from inside Pdb using the ! command

In [None]:
%debug main()

## <ins>Spoiler warning</ins>: Scroll down for solution

___
<br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br>
almost there...
<br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br><br>
___

## <ins>Solution</ins>
- **Syntax error**
    - Line 26: `np.asarray(y_err` $\rightarrow$ `np.asarray(y_err)`

- **Runtime errors**
    - After 32: `<missing>` $\rightarrow$ `return a, b`
    - Line 90: `self.c = c` $\rightarrow$ `<delete line>`

- **Semantic error**
    - Line 109: `self.b = a` $\rightarrow$ `self.b = b`


- **Correct output**

```
Model 1:
========
f(x) = 0.13*x + 6.6
Reduced chi-squared = 2.520
P value = 0.000

Model 2:
========
f(x) = 0.013*x^2 + -0.24*x + 9.1
Reduced chi-squared = 1.046
P value = 0.232

Truth:
======
f(x) = 2cos(sqrt(x)) + 10
Reduced chi-squared = 1.008
P value = 0.442
```

**Since I chose an example using statistics, let's quickly discuss the results**
- The p-value of the linear model is basically zero, so we can safely reject that model.
- The p-value of the quadratic model is > 0.05, so we can't reject it at the 2$\sigma$ level. Looking at the plot, this is not surprising, because it fits nearly as well as the true model.
- The p-value of the "true" model is effectively drawn from a uniform distribution between 0 and 1. That means if you ran this code over and over (with a random seed in `collect_data()`), you could expect to reject the true model about once in every 20 trials.
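
The "once in every 20 trials" claim can be checked with a small simulation. The sketch below is a simplified stand-in for the challenge script, not a rerun of it: it uses only the standard library, assumes 10 data points per trial, and draws the residuals of the true model directly as standard normal noise. The helper `chi2_sf_even_dof` relies on the closed-form survival function of the chi-squared distribution, which holds only for an even number of degrees of freedom.

```python
import math
import random


def chi2_sf_even_dof(chisq, dof):
    """Survival function (1 - CDF) of chi-squared, valid for even `dof` only."""
    if dof % 2 != 0:
        raise ValueError("this closed form requires an even number of degrees of freedom")
    half = chisq / 2
    term, total = 1.0, 1.0
    # Accumulate sum_{i=0}^{dof/2 - 1} (chisq/2)^i / i!
    for i in range(1, dof // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total


def simulate_rejection_rate(n_trials=2000, n_points=10, seed=123):
    """Fraction of trials in which the *true* model gets p < 0.05."""
    rng = random.Random(seed)
    rejected = 0
    for _ in range(n_trials):
        # Residuals of the true model are pure noise: z ~ Normal(0, 1)
        chisq = sum(rng.gauss(0, 1) ** 2 for _ in range(n_points))
        if chi2_sf_even_dof(chisq, n_points) < 0.05:
            rejected += 1
    return rejected / n_trials


print(f"Rejected the true model in {simulate_rejection_rate():.1%} of trials")
```

The printed fraction should land near 5%, with some scatter from the finite number of trials.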

**If you made a plot, it should look like this**

In [None]:
import matplotlib.pyplot as plt

# Generate the data and models
x, y, y_err = collect_data()
regressor = LinearRegressor(x, y, y_err)
model1 = LinearModel(*regressor.fit_line())
model2 = QuadraticModel(*regressor.fit_quadratic())

# Plot the data
plt.errorbar(x, y, y_err, ls="none", alpha=0.3)

# Plot the models and truth
plt.plot(x, TrueModel()(x), "orange", label="Truth")
plt.plot(x, model1(x), "r--", label="Linear Model")
plt.plot(x, model2(x), "k--", label="Quadratic Model")

# Add labels to the plot
plt.legend(fontsize=14, frameon=False)
plt.xlabel("x", fontsize=14)
plt.ylabel("y", fontsize=14)
plt.show()

<img src="images/debugging_challenge_plot.png" alt="Linear vs quadratic models vs truth">

___
<br><br><br><br><br><br><br><br>
___

# Closing remarks

## <ins>Pdb vs. iPdb</ins>

- They are *extremely* similar. These are the only differences I'm aware of:

- <ins><b>Pdb</b></ins>
    - Launch with `pdb.set_trace()` or `python -m pdb {script.py}`
    - Feature: you can type `list .` as an alias for `list {current_line_number}`
    - Feature: you can debug another python module with `python -m pdb -m {module}`

- <ins><b>iPdb</b></ins>
    - Launch with `%debug func()`, `ipdb.set_trace()`, or `python -m ipdb {script.py}`
    - Feature: syntax highlighting
    - Feature: better autocompletion from terminal?

## <ins>After fixing a bug</ins>, you should really write a unit test...

- A good practice after you fix a bug is to write a test function which raises an `AssertionError` in case a similar mistake is made during future code development.
- Learning how to use the `pytest` package is a fun project for a rainy day.
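
As a rough sketch of such a test, the example below guards against the coefficient mix-up from the challenge. It includes a stand-alone copy of the corrected `QuadraticModel` so that it runs by itself; in a real project you would import the class instead. The function name follows the `test_*` convention that `pytest` discovers, and the test is also called directly here so the example works without `pytest`.

```python
class QuadraticModel:
    """Stand-alone copy of the corrected model, so this example runs by itself."""

    def __init__(self, a, b, c):
        self.a = a
        self.b = b
        self.c = c

    def __call__(self, x):
        return self.a * x**2 + self.b * x + self.c


def test_quadratic_model_uses_all_coefficients():
    # Distinct coefficients, so mixing any two of them changes the answer.
    model = QuadraticModel(a=1, b=10, c=100)
    assert model(0) == 100   # only c contributes
    assert model(1) == 111   # a + b + c
    assert model(2) == 124   # 4a + 2b + c


test_quadratic_model_uses_all_coefficients()
```

Choosing coefficients that differ by orders of magnitude makes the failure message easy to read: the wrong digit tells you which coefficient was misassigned.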