
---

## 🧠 Day 2, Session 1: CPU Profilers and FlameGraph Basics

We’ll explore several profiling tools:
- `cProfile` (built-in)
- `line_profiler` (line-by-line stats)
- `py-spy` (external sampling profiler)
- `perf + flamegraph` (Linux-based)

---

# 🔹 1. `cProfile`: Built-in Function Call Profiler

### What It Does:
Records every function call along with its call count and timing. Great for finding bottlenecks at the function level.

### Code Snippet:
```python
import cProfile, pstats, io

def slow_factorial(n):
    return 1 if n == 0 else n * slow_factorial(n - 1)

def fast_factorial(n):
    res = 1
    for i in range(2, n+1):
        res *= i
    return res

pr = cProfile.Profile()
pr.enable()
slow_factorial(15)
fast_factorial(15)
pr.disable()

s = io.StringIO()
pstats.Stats(pr, stream=s).strip_dirs().sort_stats("cumulative").print_stats(10)
print(s.getvalue())
```

### Output Sample:
```
ncalls  tottime  percall  cumtime  percall filename:lineno(function)
     1    0.000    0.000    0.000    0.000 <stdin>:1(<module>)
  16/1    0.000    0.000    0.000    0.000 profile_example.py:4(slow_factorial)
     1    0.000    0.000    0.000    0.000 profile_example.py:9(fast_factorial)
```

> ✅ Use this when you want to see which functions are taking the most time.
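
When profiling inside a notebook or a larger program, the report can fill up with unrelated frames. The sketch below is one way to narrow it down, assuming Python 3.8+ (where `cProfile.Profile` works as a context manager). The helper name `profile_report` is made up for this example; passing a string to `print_stats` treats it as a regular expression matched against the printed function column.

```python
import cProfile
import io
import pstats


def slow_factorial(n):
    return 1 if n == 0 else n * slow_factorial(n - 1)


def fast_factorial(n):
    res = 1
    for i in range(2, n + 1):
        res *= i
    return res


def profile_report(pattern):
    """Profile both factorials and return only the rows matching `pattern`."""
    # Profile acts as a context manager on Python 3.8+ (enable/disable for us).
    with cProfile.Profile() as pr:
        slow_factorial(15)
        fast_factorial(15)

    buf = io.StringIO()
    stats = pstats.Stats(pr, stream=buf).strip_dirs().sort_stats("cumulative")
    # A string restriction is a regex matched against the function column.
    stats.print_stats(pattern)
    return buf.getvalue()


report = profile_report("factorial")
print(report)
```

Only the two factorial rows should remain in the printed table, which keeps the output readable even when the surrounding environment adds its own calls.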

---

# 🔹 2. `line_profiler`: Line-by-Line Execution Time

### What It Does:
Profiles individual lines inside functions. Shows time spent per line.

### Code Snippet:
```python
try:
    from line_profiler import LineProfiler

    def busy():
        total = 0
        for i in range(10000):
            total += i*i
        return total

    lp = LineProfiler(busy)
    lp.enable_by_count()
    busy()
    lp.disable()
    print("line_profiler stats:")
    lp.print_stats()
except ImportError:
    print("line_profiler not installed")
```

### Output Sample:
```
Timer unit: 1e-06 s

Total time: 0.00348 s
File: example.py
Function: busy at line 4

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
     4                                           def busy():
     5         1            2      2.0      0.1      total = 0
     6     10001         1270      0.1     36.5      for i in range(10000):
     7     10000         2208      0.2     63.4          total += i*i
     8         1            0      0.0      0.0      return total
```

> ✅ Use this when optimizing loops or expensive computations inside a function.

---

# 🔹 3. `py-spy`: External Sampling Profiler (FlameGraph Support)

### What It Does:
Samples your running Python program without modifying it — great for long-running processes or hard-to-profile code.

### Usage:
```bash
py-spy record -o flame.svg -- python your_script.py
```

### Example:
If your script is called `factorial_benchmark.py`, run:
```bash
py-spy record -o flame.svg -- python factorial_benchmark.py
```

This will generate an interactive SVG flame graph showing where CPU time was spent.

> ✅ Use this for visualizing performance issues in production apps or complex libraries.
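
To make the idea of sampling concrete, here is a minimal in-process sketch. It is not how `py-spy` is implemented (`py-spy` runs as a separate process and inspects the target from outside); it only illustrates the principle of periodically checking which function is running and counting the hits. The names `sample_stacks` and `busy_loop` are invented for this example.

```python
import collections
import sys
import threading
import time


def sample_stacks(target_thread_id, stop_event, counts, interval=0.005):
    """Periodically record which function the target thread is executing."""
    while not stop_event.is_set():
        frame = sys._current_frames().get(target_thread_id)
        if frame is not None:
            counts[frame.f_code.co_name] += 1
        time.sleep(interval)


def busy_loop(duration=0.3):
    """Burn CPU for roughly `duration` seconds."""
    deadline = time.perf_counter() + duration
    total = 0
    while time.perf_counter() < deadline:
        total += 1
    return total


counts = collections.Counter()
stop = threading.Event()
sampler = threading.Thread(
    target=sample_stacks, args=(threading.get_ident(), stop, counts)
)
sampler.start()
busy_loop()      # the code under observation is not modified in any way
stop.set()
sampler.join()

print(counts.most_common(3))
```

Functions that occupy the CPU longer collect more samples, which is the same signal a flame graph turns into box widths.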

---

# 🔹 4. `perf + flamegraph`: Linux Native Profiling Tools

### What It Does:
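Uses the Linux kernel’s `perf` tool to sample stack traces, then renders them as a flame graph with the Perl scripts `stackcollapse-perf.pl` and `flamegraph.pl` from the FlameGraph repository. On Python 3.12+, running the interpreter with `python -X perf` lets `perf` resolve Python function names; without it, the stacks show only the interpreter's C frames.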

### Usage Steps:
1. Install `perf` (via OS package manager)
2. Record:
```bash
perf record -g -- python your_script.py
```
3. Generate flamegraph:
```bash
perf script | ./stackcollapse-perf.pl | ./flamegraph.pl > flame.svg
```

### Sample ASCII FlameGraph:
```
 slow_factorial @@@@@
  builtin_mul   ##
```

> ✅ Best for deep system-level profiling (e.g., C extensions, interpreter internals, kernel time)
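
`flamegraph.pl` reads "folded" stacks: one line per unique stack, with frames joined by semicolons and followed by a sample count. The sketch below shows that folding step on a handful of made-up samples (the stacks and the helper name `fold_stacks` are hypothetical, not real `perf` output).

```python
from collections import Counter

# Hypothetical sampled stacks, outermost frame first (not real perf output).
samples = [
    ("main", "slow_factorial"),
    ("main", "slow_factorial"),
    ("main", "slow_factorial", "builtin_mul"),
    ("main", "fast_factorial"),
]


def fold_stacks(stacks):
    """Collapse identical stacks into 'frame;frame;... count' lines."""
    counts = Counter(";".join(stack) for stack in stacks)
    return [f"{stack} {n}" for stack, n in sorted(counts.items())]


folded = fold_stacks(samples)
for line in folded:
    print(line)
# main;fast_factorial 1
# main;slow_factorial 2
# main;slow_factorial;builtin_mul 1
```

Each count becomes the width of a box in the flame graph, and each semicolon-separated level becomes one row of the stack.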

---

# 📌 Summary Table

| Tool             | Scope       | Output Type | Installation Needed | Visual Graph |
|------------------|-------------|-------------|----------------------|--------------|
| `cProfile`       | Function    | Text Stats  | No                   | ❌           |
| `line_profiler`  | Line        | Line Stats  | Yes (`pip install`)  | ❌           |
| `py-spy`         | Process     | FlameGraph  | Yes (`pip install`)  | ✅ (SVG)     |
| `perf + flamegraph` | System   | FlameGraph  | Yes (Linux only)     | ✅ (SVG)     |

---



In [2]:
import cProfile, pstats, io

def slow_factorial(n):
    return 1 if n == 0 else n * slow_factorial(n - 1)

def fast_factorial(n):
    res = 1
    for i in range(2, n+1):
        res *= i
    return res

pr = cProfile.Profile()
pr.enable()
slow_factorial(15)
fast_factorial(15)
pr.disable()

s = io.StringIO()
pstats.Stats(pr, stream=s).strip_dirs().sort_stats("cumulative").print_stats(20)
print(s.getvalue())

         106 function calls (91 primitive calls) in 0.000 seconds

   Ordered by: cumulative time
   List reduced from 28 to 20 due to restriction <20>

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        3    0.000    0.000    0.000    0.000 codeop.py:120(__call__)
        3    0.000    0.000    0.000    0.000 interactiveshell.py:3512(run_code)
        3    0.000    0.000    0.000    0.000 {built-in method builtins.compile}
        3    0.000    0.000    0.000    0.000 {built-in method builtins.exec}
        3    0.000    0.000    0.000    0.000 interactiveshell.py:3337(_update_code_co_name)
        9    0.000    0.000    0.000    0.000 {built-in method builtins.next}
     16/1    0.000    0.000    0.000    0.000 <ipython-input-2-684be8280216>:3(slow_factorial)
        3    0.000    0.000    0.000    0.000 contextlib.py:299(helper)
        6    0.000    0.000    0.000    0.000 dis.py:639(findlinestarts)
        3    0.000    0.000    0.000    0.000 contextl



---

## 🧾 Understanding `cProfile` Output

When using Python’s built-in `cProfile` profiler, you’ll often see output like this:

```
ncalls  tottime  percall  cumtime  percall filename:lineno(function)
     1    0.000    0.000    0.000    0.000 <stdin>:1(<module>)
  16/1    0.000    0.000    0.000    0.000 profile_example.py:4(slow_factorial)
     1    0.000    0.000    0.000    0.000 profile_example.py:9(fast_factorial)
```

Let’s break down each column one by one.

---

### 🔹 `ncalls`: Number of Calls

- **What it means**: How many times the function was called.
- **Interpretation**:
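  - For `slow_factorial`, `16` calls are made because it's recursive:
    - `slow_factorial(15)` → calls `slow_factorial(14)` → ... → `slow_factorial(0)`
    - That’s 16 total calls (from 15 down to 0), of which only 1 is primitive (not reached through recursion). Real `cProfile` output prints such rows as `16/1` (total/primitive).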
  - `fast_factorial` is called once — it's iterative.
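
The call counts can also be read programmatically. This is a small sketch assuming Python 3.9+, where `pstats.Stats.get_stats_profile()` is available; it exposes each report row as an object whose `ncalls` field is the same string shown in the table.

```python
import cProfile
import pstats


def slow_factorial(n):
    return 1 if n == 0 else n * slow_factorial(n - 1)


pr = cProfile.Profile()
pr.enable()
slow_factorial(15)
pr.disable()

# get_stats_profile() (Python 3.9+) returns the report rows keyed by function name.
profile = pstats.Stats(pr).get_stats_profile()
row = profile.func_profiles["slow_factorial"]

# For recursive functions ncalls has the form "total/primitive".
total_calls, _, primitive_calls = row.ncalls.partition("/")
print(f"ncalls={row.ncalls!r} total={total_calls} primitive={primitive_calls}")
```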

---

### 🔹 `tottime`: Total Time in Function (Excluding Subcalls)

- **What it means**: The total time spent **inside** this function only — not including time spent in functions it calls.
- **Interpretation**:
  - Both functions show `0.000` here because the work finishes in well under a millisecond, below the three-decimal precision of the report.
  - If you had heavier computations, this would reflect the actual time spent inside the function body.

---

### 🔹 `percall`: Time per Call (Appears Twice)

- **What it means**: Average time per call. The column appears twice: the first `percall` is `tottime / ncalls`, and the second (after `cumtime`) is `cumtime` divided by the number of primitive calls.
- **Interpretation**:
  - Useful for comparing performance across many small calls.
  - Again, shown as `0.000` here due to negligible runtime.

---

### 🔹 `cumtime`: Cumulative Time (Including Subcalls)

- **What it means**: Total time spent in the function **plus all functions it calls directly or indirectly**.
- **Interpretation**:
  - For `slow_factorial`, this includes the cumulative effect of all recursive calls.
  - For `fast_factorial`, this includes just the loop body and no extra calls.
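
The difference between `tottime` and `cumtime` is easiest to see with a function that delegates all of its work. The sketch below uses two made-up functions, `parent` and `child`, and assumes Python 3.9+ for `get_stats_profile()`; the exact timings will vary by machine.

```python
import cProfile
import pstats


def child():
    total = 0
    for i in range(200_000):
        total += i * i
    return total


def parent():
    # parent does almost no work itself; it delegates everything to child().
    return child()


pr = cProfile.Profile()
pr.enable()
parent()
pr.disable()

rows = pstats.Stats(pr).get_stats_profile().func_profiles
for name in ("parent", "child"):
    r = rows[name]
    print(f"{name:6s} tottime={r.tottime:.3f} cumtime={r.cumtime:.3f}")
```

`parent` should show a `tottime` near zero but a `cumtime` at least as large as `child`'s, because the time spent in `child` is charged to `parent` only cumulatively.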

---

### 🔹 `filename:lineno(function)`

- **What it means**: Where the function is defined — file name, line number, and function name.
- **Interpretation**:
  - Helps you locate the function in your codebase.
  - `<stdin>:1(<module>)` means the top-level code was read from standard input, for example when typed into the interactive interpreter or piped to `python`.

---

## 📊 Real-World Example Interpretation

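Suppose we ran the factorial functions with larger input, say `slow_factorial(1000)` and `fast_factorial(1000)`. Because 1,001 nested frames exceed Python's default recursion limit of 1000, the recursive version first needs a higher limit via `sys.setrecursionlimit`. Then the output might look like: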

```
ncalls  tottime  percall  cumtime  percall filename:lineno(function)
     1    0.000    0.000    0.001    0.001 <stdin>:1(<module>)
1001/1    0.000    0.000    0.001    0.000 profile_example.py:4(slow_factorial)
     1    0.000    0.000    0.000    0.000 profile_example.py:9(fast_factorial)
```

| Function            | `tottime` | `cumtime` | Meaning |
|---------------------|-----------|-----------|---------|
| `slow_factorial`    | 0.000 s   | 0.001 s   | Most time is in recursion overhead |
| `fast_factorial`    | 0.000 s   | 0.000 s   | Faster due to iteration |

> ✅ This tells us that recursion introduces more overhead than iteration.
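
Because the report rounds to milliseconds, differences this small are hard to read from `cProfile` alone. One way to check the claim is to repeat each call many times with the standard-library `timeit` module, as in this sketch (the values of `N` and `RUNS` are arbitrary choices for illustration, and the timings depend on the machine).

```python
import math
import timeit


def slow_factorial(n):
    return 1 if n == 0 else n * slow_factorial(n - 1)


def fast_factorial(n):
    res = 1
    for i in range(2, n + 1):
        res *= i
    return res


N = 200       # stays well below the default recursion limit of 1000
RUNS = 2_000  # repeat enough times for a stable average

# Sanity check: both versions agree with the standard library.
assert slow_factorial(N) == fast_factorial(N) == math.factorial(N)

t_slow = timeit.timeit(lambda: slow_factorial(N), number=RUNS)
t_fast = timeit.timeit(lambda: fast_factorial(N), number=RUNS)

print(f"recursive: {t_slow / RUNS * 1e6:.1f} µs per call")
print(f"iterative: {t_fast / RUNS * 1e6:.1f} µs per call")
```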

---

## 🧠 Key Takeaway

| Column      | Focus                     | When to Use It |
|-------------|----------------------------|----------------|
| `ncalls`    | Call frequency             | Find frequently invoked functions |
| `tottime`   | Self-time (excluding children) | Spot slow inner loops |
| `cumtime`   | Total inclusive time       | Find overall bottlenecks |
| `percall`   | Efficiency per call        | Compare small utility functions |

---



In [3]:
import math
import time

# Generates a deterministic list of 2D points from sin/cos values (not random)
def generate_points(n=1000):
    return [(x / n, y / n) for x, y in [(math.sin(i), math.cos(i)) for i in range(n)]]

# ❌ Bottleneck Function: O(n^2) nested loop computing all pairwise distances
def compute_distances(points):
    n = len(points)
    result = []
    for i in range(n):
        for j in range(n):  # <-- This is the performance killer!
            dx = points[i][0] - points[j][0]
            dy = points[i][1] - points[j][1]
            dist = math.hypot(dx, dy)
            result.append(dist)
    return result

# Main function with profiling enabled
if __name__ == "__main__":
    print("Generating 1000 points...")
    points = generate_points(1000)

    print("Starting profile...\n")

    import cProfile, pstats
    profiler = cProfile.Profile()

    # Start profiling
    profiler.enable()

    # Call the bottlenecked function
    distances = compute_distances(points)

    # Stop profiling
    profiler.disable()

    # Print stats using pstats
    stats = pstats.Stats(profiler)
    stats.sort_stats(pstats.SortKey.TIME).print_stats(10)

Generating 1000 points...
Starting profile...

         2000003 function calls in 0.978 seconds

   Ordered by: internal time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.743    0.743    0.978    0.978 <ipython-input-3-f5ff35a893a6>:9(compute_distances)
  1000000    0.153    0.000    0.153    0.000 {built-in method math.hypot}
  1000000    0.081    0.000    0.081    0.000 {method 'append' of 'list' objects}
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
        1    0.000    0.000    0.000    0.000 {built-in method builtins.len}




In [5]:
import math
import time
import numpy as np
import cProfile
import pstats

# --- Slow version with bottleneck ---
def generate_points_slow(n=1000):
    return [(math.sin(i), math.cos(i)) for i in range(n)]

def compute_distances_slow(points):
    n = len(points)
    result = []
    for i in range(n):
        for j in range(n):  # <-- Bottleneck: O(n^2) loop
            dx = points[i][0] - points[j][0]
            dy = points[i][1] - points[j][1]
            dist = math.hypot(dx, dy)
            result.append(dist)
    return result


# --- Fast version with NumPy ---
def generate_points_fast(n=1000):
    np.random.seed(0)
    return np.random.rand(n, 2)

def compute_distances_fast(points):
    points = np.array(points)
    diff = points[:, np.newaxis, :] - points[np.newaxis, :, :]
    return np.sqrt(np.sum(diff ** 2, axis=-1))


# --- Main: Profile both versions ---
def run_benchmark():
    print("=== Running slow version ===")
    points_slow = generate_points_slow(1000)

    profiler = cProfile.Profile()
    profiler.enable()
    compute_distances_slow(points_slow)
    profiler.disable()
    print("\nSlow function stats:")
    stats = pstats.Stats(profiler)
    stats.sort_stats(pstats.SortKey.TIME).print_stats(5)

    print("\n\n=== Running fast version ===")
    points_fast = generate_points_fast(1000)

    profiler = cProfile.Profile()
    profiler.enable()
    compute_distances_fast(points_fast)
    profiler.disable()
    print("\nFast function stats:")
    stats = pstats.Stats(profiler)
    stats.sort_stats(pstats.SortKey.TIME).print_stats(10)


if __name__ == "__main__":
    run_benchmark()

=== Running slow version ===

Slow function stats:
         2000003 function calls in 0.987 seconds

   Ordered by: internal time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.762    0.762    0.987    0.987 <ipython-input-5-fd62851eb583>:11(compute_distances_slow)
  1000000    0.153    0.000    0.153    0.000 {built-in method math.hypot}
  1000000    0.072    0.000    0.072    0.000 {method 'append' of 'list' objects}
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
        1    0.000    0.000    0.000    0.000 {built-in method builtins.len}




=== Running fast version ===

Fast function stats:
         10 function calls in 0.044 seconds

   Ordered by: internal time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.022    0.022    0.022    0.022 {method 'reduce' of 'numpy.ufunc' objects}
        1    0.022    0.022    0.044    0.044 <ipython-input-5-fd62851eb5


---

# 🔍 Using `line_profiler` for Line-by-Line Performance Analysis

`line_profiler` reports **how much time each line in a function takes**, which makes it well suited to optimizing tight loops or computationally heavy functions.

---

## ✅ Code Example with `line_profiler`

```python
try:
    from line_profiler import LineProfiler

    def busy():
        total = 0
        for i in range(10_000):
            total += i * i  # Compute sum of squares
        return total

    def run_profile():
        profiler = LineProfiler(busy)
        profiler.enable_by_count()
        result = busy()
        profiler.disable()
        print(f"Result: {result}")
        print("\nLine Profiling Stats:")
        profiler.print_stats()

    run_profile()

except ImportError:
    print("Error: line_profiler not installed. Install it using:")
    print("pip install line_profiler")
```

---

## 🧾 Sample Output (Explanation Below)

```
Result: 333283335000

Line Profiling Stats:
Timer unit: 1e-06 s

Total time: 0.00348 s
File: example.py
Function: busy at line 5

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
     5                                           def busy():
     6         1            2      2.0      0.1      total = 0
     7     10001         1270      0.1     36.5      for i in range(10000):
     8     10000         2208      0.2     63.4          total += i * i
     9         1            0      0.0      0.0      return total
```

---

## 📊 Interpreting the Output

Each column tells you something important about performance:

| Column | Meaning |
|--------|---------|
| `Line #` | The line number in your source file |
| `Hits` | How many times this line was executed |
| `Time` | Total time spent on this line, in timer units (see the `Timer unit` header: 1e-06 s, i.e. microseconds, in this sample) |
| `Per Hit` | Average time per execution (`Time / Hits`) |
| `% Time` | Percentage of total function time spent on this line |
| `Line Contents` | The actual code line |

---

## 🔍 Key Observations

- Line `8`: `total += i * i` is where **63.4%** of the time is spent.
  - Executed **10,000 times**
  - Took **2208 µs** (~2.2 ms)
- Line `7`: The loop condition runs **10,001** times (one extra for exit check)
  - Takes **1270 µs**
- Lines `6` and `9`: Initialization and return are negligible
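
The derived columns can be recomputed from `Hits` and `Time` alone. The sketch below copies the four body rows from the sample output above and reproduces its `Per Hit` and `% Time` values.

```python
# (line number, hits, time in timer units) copied from the sample output.
rows = [
    (6, 1, 2),
    (7, 10001, 1270),
    (8, 10000, 2208),
    (9, 1, 0),
]

# 3480 timer units of 1e-06 s each, i.e. the reported total of 0.00348 s.
total_time = sum(units for _, _, units in rows)

for lineno, hits, units in rows:
    per_hit = units / hits              # the "Per Hit" column
    pct = 100 * units / total_time      # the "% Time" column
    print(f"line {lineno}: per_hit={per_hit:.1f} pct={pct:.1f}")
# line 6: per_hit=2.0 pct=0.1
# line 7: per_hit=0.1 pct=36.5
# line 8: per_hit=0.2 pct=63.4
# line 9: per_hit=0.0 pct=0.0
```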

---

## 🧠 Why This Matters

This shows that even simple arithmetic like `i * i` adds up when repeated 10,000 times. It also helps identify **hot lines** where optimization could have the most impact.

For example:
- Could we vectorize this with NumPy?
- Could we use a mathematical formula instead?
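
For the second question, the sum of squares has a well-known closed form, so the loop can be removed entirely. A minimal sketch (the function names are invented for this example):

```python
def sum_of_squares_loop(n):
    """Same computation as busy(): 0^2 + 1^2 + ... + (n-1)^2."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def sum_of_squares_closed_form(n):
    # 0^2 + 1^2 + ... + (n-1)^2 = (n-1) * n * (2n-1) / 6
    return (n - 1) * n * (2 * n - 1) // 6


n = 10_000
print(sum_of_squares_closed_form(n))  # 333283335000, same as the loop
```

The closed form does a fixed amount of arithmetic regardless of `n`, so there is no hot line left to profile.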

> ✅ Use `line_profiler` whenever you want to dig deeper than function-level profiling.

---

## 🛠️ Installation Tip

If you don't have it yet:

```bash
pip install line_profiler
```

Then run your script normally:
```bash
python profile_example.py
```

---



In [8]:
!pip install line_profiler



In [9]:
try:
    from line_profiler import LineProfiler

    def busy():
        total = 0
        for i in range(10_000):
            total += i * i  # Compute sum of squares
        return total

    def run_profile():
        profiler = LineProfiler(busy)
        profiler.enable_by_count()
        result = busy()
        profiler.disable()
        print(f"Result: {result}")
        print("\nLine Profiling Stats:")
        profiler.print_stats()

    run_profile()

except ImportError:
    print("Error: line_profiler not installed. Install it using:")
    print("pip install line_profiler")

Result: 333283335000

Line Profiling Stats:
Timer unit: 1e-09 s

Total time: 0.00534521 s
File: <ipython-input-9-48e82b4b4e56>
Function: busy at line 4

Line #      Hits         Time  Per Hit   % Time  Line Contents
     4                                               def busy():
     5         1        806.0    806.0      0.0          total = 0
     6     10001    2269674.0    226.9     42.5          for i in range(10_000):
     7     10000    3074386.0    307.4     57.5              total += i * i  # Compute sum of squares
     8         1        345.0    345.0      0.0          return total



In [10]:
def busy_sum():
    return sum(i * i for i in range(10_000))

lp = LineProfiler(busy_sum)
lp.enable_by_count()
busy_sum()
lp.disable()
print("Line Profiling Stats (sum + gen):")
lp.print_stats()

Line Profiling Stats (sum + gen):
Timer unit: 1e-09 s

Total time: 0.00337635 s
File: <ipython-input-10-2062304e187b>
Function: busy_sum at line 1

Line #      Hits         Time  Per Hit   % Time  Line Contents
     1                                           def busy_sum():
     2         1    3376353.0    3e+06    100.0      return sum(i * i for i in range(10_000))



In [11]:
import numpy as np

def busy_numpy():
    arr = np.arange(10_000)
    return np.sum(arr * arr)

lp = LineProfiler(busy_numpy)
lp.enable_by_count()
busy_numpy()
lp.disable()
print("Line Profiling Stats (NumPy):")
lp.print_stats()

Line Profiling Stats (NumPy):
Timer unit: 1e-09 s

Total time: 0.00187893 s
File: <ipython-input-11-75600df7a87a>
Function: busy_numpy at line 3

Line #      Hits         Time  Per Hit   % Time  Line Contents
     3                                           def busy_numpy():
     4         1    1038996.0    1e+06     55.3      arr = np.arange(10_000)
     5         1     839930.0 839930.0     44.7      return np.sum(arr * arr)

