# Debugging and Profiling Workshop in VS Code

Welcome to the workshop! In this notebook, we'll explore techniques for debugging and profiling Python code using VS Code. You'll learn to use built-in tools such as `breakpoint()`, `%timeit`, `cProfile`, and `%prun`, as well as VS Code-specific features like conditional breakpoints and logpoints.

Make sure you're running this notebook in a VS Code dev container with Jupyter support, and that you have the Python and Jupyter extensions installed.


## Timeit

`timeit` is a quick way to measure how long a small piece of code takes to run.

In [None]:
import timeit
# Compare two methods to calculate sum of squares
def sum_squares_loop(n):
    total = 0
    for i in range(n):
        total += i ** 2
    return total

def sum_squares_comp(n):
    return sum([i ** 2 for i in range(n)])

# Using %timeit to compare performance
print("Timing sum_squares_loop:")
%timeit sum_squares_loop(10000)

print("\nTiming sum_squares_comp:")
%timeit sum_squares_comp(10000)

# Experiment with different ranges or implementations to see how performance varies.



### `timeit` parameters
Setting the number of runs (`-r`) or loops (`-n`) explicitly lets us control how many measurements `%timeit` collects, which can make the results more stable.

The `-r` flag specifies the number of runs, and the `-n` flag specifies the number of loops per run. Each run executes the statement the specified number of times.

#### What do you think is the difference between runs and loops?

In [None]:
# Use %timeit with more runs to get more stable results
print("\nTiming sum_squares_loop:")
%timeit -r 10 sum_squares_loop(10000)
# You can also set the number of loops per run explicitly
print("\nTiming sum_squares_comp:")
%timeit -r 5 -n 2000 sum_squares_comp(10000)


#### A: Runs are the number of times the whole measurement is repeated, and loops are the number of times the code is executed within each run.

`%timeit -r X -n Y`:

- In each run (repeat), the code is executed Y times.
- You then obtain a vector of X total times, one for each run.
- `%timeit` divides each total by Y to get a time per loop and summarizes the X values. Current IPython versions report the mean and standard deviation across runs; older versions reported the best (minimum) run.
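
The relationship between runs and loops can be sketched with the standard-library `timeit.repeat`, which returns one total time per run. This is only an illustration: the values of `runs` and `loops` are arbitrary, and the helper is repeated so that the snippet runs on its own.

```python
import statistics
import timeit


def sum_squares_loop(n):
    """Same helper as in the cell above, repeated so the sketch is self-contained."""
    total = 0
    for i in range(n):
        total += i ** 2
    return total


runs = 5     # X: how many times the whole measurement is repeated (-r)
loops = 200  # Y: how many times the statement executes within one run (-n)

# timeit.repeat returns one total time (in seconds) per run.
totals = timeit.repeat(lambda: sum_squares_loop(1000), repeat=runs, number=loops)

# Dividing each total by the loop count gives one per-loop time per run.
per_loop = [t / loops for t in totals]

print(f"{len(totals)} totals, one per run")
print(f"best: {min(per_loop) * 1e6:.2f} µs per loop")
print(
    f"mean ± std. dev.: {statistics.mean(per_loop) * 1e6:.2f} "
    f"± {statistics.stdev(per_loop) * 1e6:.2f} µs per loop"
)
```

Increasing `loops` averages out timer resolution within a run, while increasing `runs` gives more samples for estimating the spread.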


### To store results in variables, use `timeit.timeit` and/or `timeit.repeat`

In [None]:
print("\nTiming sum_squares_comp:")
%timeit -r 40 -n 1000 sum_squares_comp(5000)
# takes the same measurements as
runs = 40     # number of runs (-r), passed as `repeat`
loops = 1000  # number of loops per run (-n), passed as `number`
times = timeit.repeat(stmt='sum_squares_comp(5000)', globals=globals(), number=loops, repeat=runs)
best = min(times) / loops * 1e6
print(f"{best:.2f} µs per loop (best of {runs} runs)")

## `cProfile`

A very common profiler in Python is `cProfile`. In this cell, we profile a function that generates and sorts a list of random numbers using `cProfile`. The output shows how many function calls were made, along with timing details for each call.


In [None]:
import random
import cProfile

def sort_numbers(n):
    # Generate a list of n random numbers
    data = [random.random() for _ in range(n)]
    # Sort the list
    data.sort()
    return data

print("Profiling sort_numbers with cProfile:")
cProfile.run('sort_numbers(100000)')
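
The raw report can be made easier to read by sorting it and limiting the number of rows with the standard-library `pstats` module. The following is a minimal sketch, assuming the same `sort_numbers` function as above (repeated here so the snippet is self-contained); the choice of sorting by cumulative time and showing five rows is arbitrary.

```python
import cProfile
import io
import pstats
import random


def sort_numbers(n):
    # Generate a list of n random numbers and sort it
    data = [random.random() for _ in range(n)]
    data.sort()
    return data


# Profile only the call we are interested in.
profiler = cProfile.Profile()
profiler.enable()
sort_numbers(100000)
profiler.disable()

# Collect the statistics into a string buffer, sort by cumulative time,
# and keep only the five most expensive entries.
buffer = io.StringIO()
stats = pstats.Stats(profiler, stream=buffer)
stats.strip_dirs().sort_stats("cumulative").print_stats(5)
report = buffer.getvalue()
print(report)
```

Other sort keys accepted by `sort_stats`, such as `"tottime"` or `"ncalls"`, can be swapped in depending on what you want to inspect.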

## Detailed Profiling with `%prun`

As you can see, the raw `cProfile` output is hard to read: by default it is sorted by function name rather than by time.

`%prun` runs a statement through the profiler directly in the Jupyter notebook and orders the output by internal time. This cell profiles `sort_numbers` for a larger input size.

In [None]:
print("\nProfiling sort_numbers with %prun:")
%prun sort_numbers(10000000)

# Study the profiling output and note which functions take the most time.

## Advanced Profiling with PyInstrument

In this cell, we use **PyInstrument** to profile the performance of our `sort_numbers` function on a large dataset. PyInstrument is a statistical profiler that samples your program’s execution and generates a clear, hierarchical report of where time is spent in your code.

- **Profiler Setup:**  
  - The profiler is imported from `pyinstrument` and started using `profiler.start()`.
- **Profiling Output:**  
  - After executing the function, the profiler is stopped and the profiling report is printed.

This cell is designed to help you understand performance bottlenecks in your code by visualizing a hierarchical call tree of function execution times.


In [None]:
from pyinstrument import Profiler

profiler = Profiler()
profiler.start()

def sort_numbers(n):
    # Generate a list of n random numbers
    data = [random.random() for _ in range(n)]
    # Sort the list
    data.sort()
    return data


sort_numbers(10000000)

profiler.stop()
profiler.print()

# Study the profiling output and note which functions take the most time.

### Small challenge

Sorting is the slowest part, but this code is far from efficient. How could you make it faster?

In [None]:
from pyinstrument import Profiler

profiler = Profiler()
profiler.start()

def sort_numbers(n):
    # Generate a list of n random numbers
    data = [random.random() for _ in range(n)]
    # Sort the list
    data.sort()
    return data

data1 = sort_numbers(10000000)

profiler.stop()
profiler.print()
# Study the profiling output and note which functions take the most time.

### A more complex example

In [None]:
import math
import random

def compute_statistics(data):
    """
    Compute basic statistics (mean and variance) for a list of numbers.
    This function iterates over the data twice:
      - Once to calculate the mean.
      - Once more to compute the variance.
    """
    n = len(data)
    mean_val = sum(data) / n
    variance = sum((x - mean_val) ** 2 for x in data) / n
    return mean_val, variance

def transform_data(n):
    """
    Generate a list of n random numbers and apply a transformation:
    each random number x is transformed by calculating math.sqrt(x) * math.sin(x).
    This function calls:
      - random.random() for each element.
      - math.sqrt() and math.sin() for the transformation.
    """
    # Generate n random numbers
    data = [random.random() for _ in range(n)]
    # Transform the data using math operations
    transformed = [math.sqrt(x) * math.sin(x) for x in data]
    return transformed

def process_data(n):
    """
    Process data by first transforming it and then computing its statistics.
    This function builds a dependency tree:
      process_data -> transform_data -> (random.random, math.sqrt, math.sin)
                      -> compute_statistics -> (sum, generator expressions)
    """
    transformed = transform_data(n)
    stats = compute_statistics(transformed)
    return stats

def main(n, iterations=5):
    """
    Run process_data several times, collecting statistics each time.
    This loop simulates repeated data processing.
    """
    results = []
    for i in range(iterations):
        stats = process_data(n)
        results.append(stats)
    return results


    
profiler = Profiler()
profiler.start()
main(1000000)
profiler.stop()
profiler.print()


### This function calculates the sum of all prime numbers below `n`, but it doesn't perform very well

In [None]:


def sum_primes_naive(n):
    """Compute the sum of all prime numbers below n using a naïve approach."""
    def is_prime(x):
        if x < 2:
            return False
        # Check divisibility from 2 up to sqrt(x)
        for i in range(2, int(x**0.5) + 1):
            if x % i == 0:
                return False
        return True

    total = 0
    for num in range(2, n):
        if is_prime(num):
            total += num
    return total

print("Naïve sum of primes below 1000000:", sum_primes_naive(1000000))


### Here is a very basic way to test how code scales:

In [None]:
import matplotlib.pyplot as plt


n_values = [100, 200, 400, 800, 1600, 3200]

times_naive = []
runs = 20       # calls per measurement (timeit's `number`)
repeats = 1000  # number of measurements (timeit's `repeat`)
# Use timeit.repeat to measure performance: for each n, take the best of
# `repeats` measurements and divide by `runs` to get the time per call in ms.
for n in n_values:
    t = timeit.repeat('sum_primes_naive(n)', globals=globals(), number=runs, repeat=repeats)
    t = min(t) / runs * 1e3
    times_naive.append(t)
    print(f"n = {n:5d}, time = {t:.4f} ms")



In [None]:
plt.figure(figsize=(8, 6))
plt.loglog(n_values, times_naive, 'o-', label='Measured time')
plt.xlabel("n")
plt.ylabel("Time (ms)")
plt.title("Scaling of sum_primes_naive (log-log plot)")
plt.legend()
times_naive
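
On a log-log plot, a power law `t = c * n**k` appears as a straight line with slope `k`, so the slope gives a rough estimate of how the runtime scales. The sketch below fits that slope with a plain least-squares formula. The function name `loglog_slope` is hypothetical, and the timings are synthetic values generated from a known exponent, not measurements from this notebook.

```python
import math


def loglog_slope(xs, ys):
    """Least-squares slope of log(y) against log(x)."""
    log_x = [math.log(x) for x in xs]
    log_y = [math.log(y) for y in ys]
    mean_x = sum(log_x) / len(log_x)
    mean_y = sum(log_y) / len(log_y)
    numerator = sum((lx - mean_x) * (ly - mean_y) for lx, ly in zip(log_x, log_y))
    denominator = sum((lx - mean_x) ** 2 for lx in log_x)
    return numerator / denominator


# Synthetic timings that follow t = c * n**1.5 exactly (not measured data).
example_n = [100, 200, 400, 800, 1600, 3200]
example_t = [2e-4 * n ** 1.5 for n in example_n]

slope = loglog_slope(example_n, example_t)
print(f"estimated exponent: {slope:.2f}")
```

With the notebook's own data, the call would be `loglog_slope(n_values, times_naive)`. A slope near 1 suggests roughly linear scaling and a slope near 2 suggests quadratic scaling; real measurements are noisy, so treat the estimate as indicative only.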

### Can you propose a new function that performs better? Compare with the previous version

In [None]:
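def sum_primes_improved(n):
    # Write a function to compute the sum of all prime numbers below n using a more efficient approach
    raise NotImplementedError("Replace this line with your implementation")

times = []
runs = 20       # calls per measurement (timeit's `number`)
repeats = 1000  # number of measurements (timeit's `repeat`)
# Use timeit.repeat to measure performance: for each n, take the best of
# `repeats` measurements and divide by `runs` to get the time per call in ms.
for n in n_values:
    t = timeit.repeat('sum_primes_improved(n)', globals=globals(), number=runs, repeat=repeats)
    t = min(t) / runs * 1e3
    times.append(t)
    print(f"n = {n:5d}, time = {t:.4f} ms")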



In [None]:
plt.figure(figsize=(8, 6))
plt.loglog(n_values, times_naive, 'o-', label='Measured time Naive')
plt.loglog(n_values, times, 'o-', label='Measured time Optimized')
plt.xlabel("n")
plt.ylabel("Time (ms)")
plt.title("Scaling of sum_primes_naive vs. sum_primes_improved (log-log plot)")
plt.legend()
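
If you want something to compare your own attempt against, one possible approach (not the only one) is the Sieve of Eratosthenes, which crosses out multiples instead of testing each number for divisibility. The name `sum_primes_sieve` is hypothetical; this is a minimal sketch rather than a tuned implementation.

```python
def sum_primes_sieve(n):
    """Sum of all primes below n using the Sieve of Eratosthenes."""
    if n < 3:
        return 0  # there are no primes below 2
    # is_prime[i] is 1 while i is still a prime candidate.
    is_prime = bytearray([1]) * n
    is_prime[0] = is_prime[1] = 0
    for i in range(2, int(n ** 0.5) + 1):
        if is_prime[i]:
            # Mark every multiple of i, starting at i*i, as composite.
            is_prime[i * i::i] = bytes(len(range(i * i, n, i)))
    return sum(i for i, flag in enumerate(is_prime) if flag)


print(sum_primes_sieve(10))   # 17
print(sum_primes_sieve(100))  # 1060
```

The sieve trades memory for speed: it allocates one byte per candidate below `n`, which is fine for the sizes used in this workshop but worth keeping in mind for very large limits.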

## Debugging with `pdb` and `breakpoint()`

In this cell, we demonstrate how to use Python’s built-in debugger (`pdb`) via the `breakpoint()` function. Run this cell using the notebook-specific debug command (Ctrl+Shift+Alt+Enter) so that execution will pause at the breakpoint, letting you inspect variables and step through the code.


In [None]:
# Simple debugging example using pdb and breakpoint
def buggy_function(x):
    result = x * 2
    # Introduce a deliberate error: division by zero when x is 0
    if x == 0:
        print('Entering debugger because x is 0')
        breakpoint()  # This starts the interactive debugger
        result = result / x  
    return result

# Trigger the debugger by passing 0

buggy_function(0)

In [None]:
# Debugging a recursive Fibonacci function
# The Fibonacci sequence is a series of numbers in which each number is the sum of the two preceding ones.
# It starts with 0 and 1, and the sequence goes 0, 1, 1, 2, 3, 5, 8, 13, 21, and so on.
# The Fibonacci function is often used as an example of recursion.
# Use the interactive debugger to step through the recursion, check the results for n > 1,
# and work out how many calls are made.

def fib(n):
    if n < 0:
        raise ValueError('Negative arguments not allowed')
    if n in (0, 1):
        return n
    return fib(n - 1) + fib(n - 2)

# Set breakpoints here in VS Code and step through the recursion of fib(10)
print('Fibonacci result:', fib(10))

# Use the interactive debugger to examine how the recursion unfolds.
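
To cross-check what you observe in the debugger, the number of calls can also be counted programmatically. The sketch below is a hypothetical variant, `fib_counted`, that threads a counter dictionary through the recursion; it is not part of the original exercise.

```python
def fib_counted(n, counter):
    """Recursive Fibonacci that also records how many calls were made."""
    counter["calls"] += 1
    if n < 0:
        raise ValueError("Negative arguments not allowed")
    if n in (0, 1):
        return n
    return fib_counted(n - 1, counter) + fib_counted(n - 2, counter)


counter = {"calls": 0}
result = fib_counted(10, counter)
print(f"fib(10) = {result}, computed with {counter['calls']} calls")
```

The call count grows much faster than `n` because the same subproblems are recomputed many times, which is exactly what stepping through the recursion makes visible.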

## Using Conditional Breakpoints

Conditional breakpoints pause execution only when a specific condition is met. For example, in the cell below, right-click the red breakpoint (set on the `print` statement) and add the condition `i == 50` so that the debugger stops only when `i` equals 50.


In [None]:
def conditional_breakpoint_demo(n):
    total = 0
    for i in range(n):
        total += i
        # Click in the gutter to set a breakpoint on the following line, then add the condition: i == 50
        print(f"Iteration {i}: total = {total}")
    return total

# Run the function
result = conditional_breakpoint_demo(100)
print("Final result:", result)
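
Outside the editor, the same idea can be approximated in code by guarding the pause with an `if`. This is a rough sketch with hypothetical names (`conditional_pause_demo`, `stop_at`, `on_hit`): by default `on_hit` is the built-in `breakpoint`, but the demonstration passes a recording callback so that the snippet runs without stopping.

```python
def conditional_pause_demo(n, stop_at, on_hit=breakpoint):
    """Sum 0..n-1 and call on_hit once, when the loop index equals stop_at."""
    total = 0
    for i in range(n):
        total += i
        if i == stop_at:
            # Similar in spirit to a breakpoint with the condition `i == stop_at`.
            on_hit()
    return total


# For a non-interactive demonstration, record the hit instead of pausing.
hits = []
result = conditional_pause_demo(
    100, stop_at=50, on_hit=lambda: hits.append("paused at i == 50")
)
print(hits, result)
```

The editor's conditional breakpoint is usually preferable because it leaves the source untouched; the in-code guard is mainly useful when no graphical debugger is available.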
