# Into the rabbit hole: Debugging in Python

![Rabbit hole](./img/rabbit-hole.gif)

<h2 style="text-align: right;font-size: 1em;">Juan Luis Cano Rodríguez &lt;hello@juanlu.space&gt;<br>2023-09-26 @ Data Umbrella</h2>

## Outline

1. Why debugging is important
2. Sources of bugs
3. Types of debugging: tracing vs interactive
4. Debugging is problem solving: divide and conquer, hypothesis testing
5. Think of your future self
6. Practical debugging in Jupyter
7. Practical debugging in VSCode
8. Conclusions

_Raise your hand if you have ever had to debug some code for much longer than expected_ 🙌

## Who is this guy?

- Aerospace Engineer turned coder turned developer advocate turned...
- Passionate about tech communities and the **Solidarity Economy** ♻️
- **Product Manager** for Kedro, an open source pipeline framework, at QuantumBlack, AI by McKinsey 🔶
- Organizer of the **PyData Madrid** monthly meetup (ex Python España, ex PyCon Spain) 🐍
- Contributor to the SciPy and PyData ecosystem

Let's connect! https://www.linkedin.com/in/juanluiscanor/

![Me](img/juanlu-everything.jpg)

## 1. Why debugging is important

...Because you will spend _a lot_ of your coding time doing it!

> "75% of a developer’s time is spent on debugging (1500 hours a year!)"

https://coralogix.com/blog/this-is-what-your-developers-are-doing-75-of-the-time-and-this-is-the-cost-you-pay/

> "I spend 80% of my time debugging, fixing small things to get everything working."

https://softwareengineering.stackexchange.com/q/117123

> "Debugging consumes about 30-90 % of the total development time"

Hirsch, T. and Hofer, B. (2021) What we can learn from how programmers debug their code, arXiv.org. Available at: https://arxiv.org/abs/2103.12447 (Accessed: 26 September 2023). 

> "Debugging was frequent, even in programming work, occurring **once every eight minutes**. Debugging episodes vary greatly in time, with most being less than a few minutes **and a few as more than 100 minutes**. However, most debugging time is spent in long debugging episodes"

[emphasis mine]

Alaboudi, A. and LaToza, T.D. (2021) An exploratory study of debugging episodes, arXiv.org. Available at: https://arxiv.org/abs/2105.02162 (Accessed: 26 September 2023). 

Long story short: **to become a good programmer, you will have to become a good debugger**!

## 2. Sources of bugs

![No idea why](img/no-idea-why.jpg)

Bugs (crashes or unexpected behaviors) can come from anywhere. From most likely to least likely:

- A defect in our own code
- Outdated or incompatible dependencies
- Environment (env variables, available system libraries)
- Requirements miscommunication
- A defect in some dependency code
- A defect in the operating system
- Cosmic rays causing bit flips (_yes, it can happen_)
- And many others I didn't consider!

**In most cases, we will be dealing with defects in our own code.**

_There is an acronym for this: PEBCAK, "Problem Exists Between Chair And Keyboard"._ https://en.wiktionary.org/wiki/PEBCAK

## 3. Types of debugging: tracing vs interactive

![Print debugging](img/is-this-debugging.jpg)

**Tracing**: Watch program logs to analyze flow of execution and detect problems.

- ...also known as "print debugging"
- `print` is fine for debugging, often better than nothing!
- For a better experience, use `logging` (standard library) or `structlog` instead:

```python
import structlog

logger = structlog.get_logger()

logger.info("Attempting execution")
logger.warning("Unexpected condition, proceeding anyway")
logger.error("Something broke, pay attention here", logger_obj=logger)
```

```text
2023-09-26 11:05:04 [info     ] Attempting execution
2023-09-26 11:05:04 [error    ] Something broke, pay attention here logger_obj=<BoundLoggerLazyProxy(logger=None, wrapper_class=None, processors=None, context_class=None, initial_values={}, logger_factory_args=())>
```

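If `structlog` is not installed, a similar trace can be produced with the standard library `logging` module. The following is a minimal sketch rather than anything shown in the talk: `safe_divide` and the logger name `demo` are hypothetical, chosen only for illustration.

```python
import logging
from typing import Optional

# Configure the root logger once, near the program's entry point.
logging.basicConfig(
    level=logging.DEBUG,
    format="%(asctime)s [%(levelname)-8s] %(name)s: %(message)s",
)
logger = logging.getLogger("demo")  # hypothetical logger name


def safe_divide(a: float, b: float) -> Optional[float]:
    """Divide a by b, logging each step instead of printing."""
    logger.debug("Dividing %r by %r", a, b)
    try:
        result = a / b
    except ZeroDivisionError:
        # logger.exception logs at ERROR level and appends the traceback.
        logger.exception("Division failed, returning None")
        return None
    logger.info("Result is %r", result)
    return result


safe_divide(6, 3)
safe_divide(1, 0)
```

Unlike `print`, the log level can be raised later (for example to `logging.WARNING`) to silence the noisy messages without deleting them from the code.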

**Interactive** debugging: Advance program execution step by step and query the state of the program.

- If the code has no logs, there is no alternative
- But if your program is running in a remote location, is a long-running process, or executes when you're sleeping, it might be really hard or impossible!
- Python ships with a basic debugger (`pdb`), but your IDE (modern Jupyter, VSCode) has a much better one
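The debugger that ships with Python is `pdb`, and since Python 3.7 the built-in `breakpoint()` function drops into it. Here is a small sketch of what that looks like; `running_mean` and its `debug` flag are hypothetical, added only so the example runs unattended unless you opt in.

```python
def running_mean(values, debug=False):
    """Return the cumulative mean after each element of `values`."""
    total = 0.0
    means = []
    for count, value in enumerate(values, start=1):
        total += value
        if debug:
            # Execution pauses here and a pdb prompt opens. Useful commands:
            # `p total` prints a variable, `n` runs the next line,
            # `c` continues until the next breakpoint, `q` quits.
            breakpoint()
        means.append(total / count)
    return means


# Pass debug=True to step through the loop interactively.
print(running_mean([2, 4, 6]))
```

Setting the environment variable `PYTHONBREAKPOINT=0` turns every `breakpoint()` call into a no-op, which is handy if one is left in the code by mistake.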

![Jupyter debugger](img/jupyterlab-debug.png)

![VSCode debugger](img/vscode-debug.jpg)

## 4. Debugging is problem solving

![Useless comments](img/useless-code-comments.jpg)

- Computers are ~~mostly~~ deterministic!
- What seems random _often_ has a very specific root cause
- Lots of problem solving techniques exist that can be used while coding

**Divide and conquer**: "A divide-and-conquer algorithm recursively breaks down a problem into two or more sub-problems of the same or related type, until these become simple enough to be solved directly."

Examples:

- Decomposing the code in functions and testing them in isolation
- _Bisecting_ the commit history (`git bisect`) to find the commit that introduced a bug
- ...

![git bisect](./img/git-bisect.png)
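The idea behind bisecting can be sketched in a few lines of Python. This is an illustration of the binary search, not how `git bisect` is implemented: `first_bad`, the commit names, and the `is_bad` check are all hypothetical, and the sketch assumes the last commit is bad and that every commit after the first bad one is also bad.

```python
from typing import Callable, Sequence


def first_bad(commits: Sequence[str], is_bad: Callable[[str], bool]) -> str:
    """Return the earliest commit for which `is_bad` is true.

    Assumes commits are ordered oldest to newest, the last one is bad,
    and once a commit is bad all later commits are bad too.
    """
    low, high = 0, len(commits) - 1
    while low < high:
        middle = (low + high) // 2
        if is_bad(commits[middle]):
            high = middle  # the culprit is this commit or an earlier one
        else:
            low = middle + 1  # the culprit is strictly later
    return commits[low]


# Hypothetical history in which the bug was introduced in "d4".
history = ["a1", "b2", "c3", "d4", "e5", "f6"]
tested = []


def is_bad(commit: str) -> bool:
    tested.append(commit)  # record which commits we had to check
    return history.index(commit) >= history.index("d4")


print(first_bad(history, is_bad), "found after testing", tested)
```

Each check halves the remaining range, so even a history of a thousand commits needs only about ten checks.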

**Hypothesis testing**: "assuming a possible explanation to the problem and trying to prove (or, in some contexts, disprove) the assumption"

Examples:

- _Hypothesis_: This bug must be caused by the `pandas` version
- _Action_: Test the exact same code _under the exact same conditions_ but under two different `pandas` versions
- _Possible results_:
  - If the bug appears in both `pandas` versions, we either try another version, or think of another possible cause
  - If the bug only appears in one `pandas` version, we have strong evidence that it could be the root cause!

_However_, replicating the _exact same conditions_ is hard! This is a well-known problem (see "confounding factors", "double-blind experiments", etc.).
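One small step towards controlling the conditions is to record them on every run, so that two experiments can be compared line by line. The sketch below uses only the standard library; `describe_environment` is a hypothetical helper and the package list is just an example.

```python
import platform
import sys
from importlib import metadata


def describe_environment(packages):
    """Collect interpreter, OS, and package versions for one experiment run."""
    env = {
        "python": sys.version.split()[0],
        "platform": platform.platform(),
    }
    for name in packages:
        try:
            env[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            env[name] = "not installed"
    return env


# Print one line per item so two runs can be compared with a diff tool.
for key, value in describe_environment(["pandas", "numpy"]).items():
    print(f"{key}: {value}")
```

If the two reports differ in more than the `pandas` line, the experiment has changed more than one variable and the conclusion is weaker.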

**Reduction**: "transforming the problem into another problem for which solutions exist"

Examples:

- Crafting a Short, Self Contained, Correct Example (SSCCE) http://www.sscce.org/ (also known as a Minimal Reproducible Example, MRE)
- Trying the same code with a smaller or simpler data structure
- ...
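Shrinking the input can even be partly automated. The following is a rough sketch of a greedy reduction, assuming you can express "the bug still happens" as a function; `shrink` and `fails` are hypothetical names, and real tools use more sophisticated strategies.

```python
def shrink(items, fails):
    """Greedily drop items one at a time while `fails(items)` stays true."""
    current = list(items)
    index = 0
    while index < len(current):
        candidate = current[:index] + current[index + 1:]
        if fails(candidate):
            current = candidate  # the item was irrelevant, keep it removed
        else:
            index += 1  # the item is needed to reproduce the bug, keep it
    return current


def fails(rows):
    """Hypothetical failure: the code under test crashes on missing values."""
    return any(row is None for row in rows)


print(shrink([1, 2, None, 4], fails))
```

What remains is the smallest input this strategy can find that still triggers the failure, which is a good starting point for an SSCCE.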

...and one underrated technique: _reading carefully_!

https://github.com/kedro-org/kedro/issues/3055

![Traceback](./img/traceback.png)

![Traceback](./img/traceback-highlighted.png)

## 5. Think of your future self

![Joke's on you](img/jokes-on-you.png)

> _"Always code as if the guy who ends up maintaining your code will be a violent psychopath who knows where you live. Code for readability."_

You **will** need to debug your code. Plan accordingly.

- Invest time in mastering your tools
  - Don't let anyone mock you for your editor/IDE of choice (avoid religious wars)
  - If your editor is not helping you, improve it, extend it, or find another one!
- Use meaningful variable names (code is written once, read many times!)
- Write comments that communicate _intent_ ("why?") rather than just describe what the code does
- Add logs (if `structlog` is not available, plain `logging` is fine)
- Structure your code in functions and place side effects carefully https://rhodesmill.org/brandon/slides/2015-05-pywaw/hoist/
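Several of these points can be combined in one small example. This is a sketch under invented assumptions: the function names and the temperature file scenario are hypothetical. The parsing logic is a pure function that is easy to test, the file read is hoisted into a thin wrapper, the comment explains _why_, and a log line records what was skipped.

```python
import logging
from pathlib import Path

logger = logging.getLogger(__name__)


def parse_temperatures(lines):
    """Pure function: turn text lines into floats, with no I/O."""
    readings = []
    for line_number, line in enumerate(lines, start=1):
        text = line.strip()
        if not text:
            continue
        try:
            readings.append(float(text))
        except ValueError:
            # Why: in this hypothetical scenario the files may contain
            # partial lines, and one bad row should not discard the rest.
            logger.warning("Skipping unparseable line %d: %r", line_number, text)
    return readings


def load_temperatures(path):
    """Thin wrapper that performs the side effect (reading a file)."""
    return parse_temperatures(Path(path).read_text().splitlines())


print(parse_temperatures(["21.5", "", "oops", "19.0"]))
```

Because `parse_temperatures` never touches the file system, a failing case can be reproduced by passing it a short list of strings.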

## 6. Practical debugging in Jupyter

- Make sure you are on the latest version so you can enjoy:
  - Fresh notebook interface (document based, simpler than JupyterLab)
  - Interactive debugger https://jupyterlab.readthedocs.io/en/stable/user/debugger.html
  - Variable explorer
  - ...and much more!
- Recommended extra packages:
  - `jupyterlab-lsp` and `python-lsp-server` enable advanced IDE functionalities https://github.com/jupyter-lsp/jupyterlab-lsp/blob/main/README.md
  - `ipyflow` provides a "reactive" kernel (gives execution order and cell dependency hints by tracking dataflow relationships between symbols and cells) https://github.com/ipyflow/ipyflow

## 7. Practical debugging in VSCode

- Make sure to install the official extension https://code.visualstudio.com/docs/languages/python
- Always use a (native or conda) environment and make sure you select it https://code.visualstudio.com/docs/python/environments
- Learn how debugging configurations work https://code.visualstudio.com/docs/python/debugging

## 8. Conclusions

- You will spend _most_ of your coding time debugging
- Sharpen your tools and make a deliberate effort to get better at it!
- Apply problem solving techniques to avoid getting stuck
- Write useful comments and add informative logging to make life easier for your future self

Happy coding!
