## Profiling and Timing Code

When developing data processing pipelines, you often face trade-offs between different implementations. As Donald Knuth famously said: *"We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil."*

However, once your code is working, it's useful to analyze its efficiency. IPython provides powerful tools for timing and profiling code.

### When to Profile

- **After code works**: Don't optimize prematurely
- **Performance bottlenecks**: Identify slow operations
- **Algorithm comparison**: Test different implementations
- **Memory usage**: Monitor resource consumption

> ⚡ **Optimization wisdom**: Get it working first, then make it fast!

### IPython Timing Commands

#### Built-in Commands
- **`%time`**: Time execution of a single statement
- **`%timeit`**: Time repeated execution for accuracy
- **`%prun`**: Run code with profiler

#### Extension Commands (require installation)
- **`%lprun`**: Run code with line-by-line profiler
- **`%memit`**: Measure memory use of a single statement
- **`%mprun`**: Run code with line-by-line memory profiler

> 🔧 **Extension note**: Line and memory profilers require separate installation of `line_profiler` and `memory_profiler` packages.

### Pro Tips

- **Start with `%timeit`**: Repeated runs give the most reliable timing for short snippets
- **Use `%prun` for complex code**: Find bottlenecks in multi-line processes
- **Install extensions**: Get advanced profiling capabilities
- **Profile selectively**: Focus on critical code paths

> 🎯 **Workflow**: Write working code → Profile bottlenecks → Optimize critical sections!




## Timing Code Snippets: %timeit and %time

IPython provides two main commands for timing code execution, each with different strengths and use cases.

### %timeit: Repeated Execution Timing

`%timeit` automatically runs code multiple times for accurate timing:

```python
In[1]: %timeit sum(range(100))
100000 loops, best of 3: 1.54 µs per loop
```

For slower operations, it automatically runs fewer repetitions:

```python
In[2]: %%timeit
total = 0
for i in range(1000):
    for j in range(1000):
        total += i * (-1) ** j
1 loops, best of 3: 407 ms per loop
```

> ⏱️ **Smart timing**: %timeit automatically adjusts repetitions based on operation speed!
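Outside IPython, the standard-library `timeit` module offers the same kind of repeated measurement. The sketch below is only an approximation of what `%timeit` reports: the loop and trial counts are fixed by hand (illustrative values), whereas `%timeit` chooses them automatically.

```python
import timeit

# Run the statement 100,000 times per trial, for 3 trials.
# These counts are chosen by hand; %timeit picks them automatically.
number = 100_000
trials = timeit.repeat("sum(range(100))", number=number, repeat=3)

# Each entry is the total time for one trial; divide to get the per-loop time.
best_per_loop = min(trials) / number
print(f"{number} loops, best of {len(trials)}: {best_per_loop * 1e6:.2f} µs per loop")
```

Taking the minimum over trials is a common choice because slower trials usually reflect interference from other processes rather than the code itself.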

### When NOT to Use %timeit

Some operations shouldn't be repeated because they change the data they operate on. For example, sorting an already sorted list is much faster than sorting an unsorted one:

```python
In[3]: import random
L = [random.random() for i in range(100000)]
%timeit L.sort()
100 loops, best of 3: 1.9 ms per loop
```

This is misleading: `L.sort()` sorts in place, so every run after the first sorts an already sorted list, and the reported time mostly reflects that easier case.
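If repeated timing of a state-changing operation is still wanted, one possible workaround is to give each trial fresh input. The sketch below uses the standard-library `timeit.repeat` rather than the `%timeit` magic; the variable name `data` and the trial count are illustrative choices, not part of the original example.

```python
import random
import timeit

L = [random.random() for _ in range(100_000)]

# setup runs once before each trial and is not timed, so every trial
# sorts a fresh unsorted copy instead of the already sorted list.
trials = timeit.repeat(
    stmt="data.sort()",
    setup="data = L.copy()",
    number=1,          # sort each copy exactly once
    repeat=5,
    globals={"L": L},
)
print(f"best of {len(trials)}: {min(trials) * 1e3:.2f} ms per sort")
```

Because only the copies are sorted, `L` itself stays unsorted and each trial measures the same work.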

### %time: Single Execution Timing

Use `%time` for operations that shouldn't be repeated:

```python
In[4]: import random
L = [random.random() for i in range(100000)]
print("sorting an unsorted list:")
%time L.sort()
sorting an unsorted list:
CPU times: user 40.6 ms, sys: 896 µs, total: 41.5 ms
Wall time: 41.5 ms

In[5]: print("sorting an already sorted list:")
%time L.sort()
sorting an already sorted list:
CPU times: user 8.18 ms, sys: 10 µs, total: 8.19 ms
Wall time: 8.24 ms
```

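Notice the difference: the unsorted list took 41.5 ms to sort, the already sorted list only 8.24 ms. Notice also that `%time` reports a longer time for the sorted list (8.24 ms) than `%timeit` did (1.9 ms). `%timeit` takes steps to keep interference out of the measurement, such as disabling garbage collection while timing.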
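The two figures that `%time` reports, CPU time and wall time, can be approximated in plain Python with `time.process_time` and `time.perf_counter`. This is a minimal sketch, not how `%time` is implemented; `time_once` is a hypothetical helper introduced here for illustration.

```python
import random
import time

def time_once(func):
    """Run func() a single time; return (wall_seconds, cpu_seconds)."""
    wall_start = time.perf_counter()   # elapsed real time
    cpu_start = time.process_time()    # user + system CPU time of this process
    func()
    cpu = time.process_time() - cpu_start
    wall = time.perf_counter() - wall_start
    return wall, cpu

L = [random.random() for _ in range(100_000)]

wall, cpu = time_once(L.sort)      # first call sorts the unsorted list
print(f"unsorted: wall {wall * 1e3:.2f} ms, CPU {cpu * 1e3:.2f} ms")

wall2, cpu2 = time_once(L.sort)    # second call sees an already sorted list
print(f"sorted:   wall {wall2 * 1e3:.2f} ms, CPU {cpu2 * 1e3:.2f} ms")
```

Since each call runs exactly once, the first measurement really does reflect sorting unsorted data.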

### Cell Magic Versions

Both commands work with cell magic for multi-line code:

```python
In[6]: %%time
total = 0
for i in range(1000):
    for j in range(1000):
        total += i * (-1) ** j
CPU times: user 504 ms, sys: 979 µs, total: 505 ms
Wall time: 505 ms
```

### Key Differences

| Feature | %timeit | %time |
|---------|---------|-------|
| **Repetitions** | Automatic multiple runs | Single execution |
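| **Accuracy** | More reliable (statistics over many runs) | Less reliable (single run, sensitive to noise) |
| **Reported time** | Usually lower (garbage collection disabled while timing) | Usually higher (includes normal interpreter and system overhead) |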
| **Best for** | Fast operations, fair comparison | Slow operations, state-changing code |

> 🎯 **Rule of thumb**: Use `%timeit` for fair comparisons, `%time` for state-changing operations!
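One reason repeated and single-run timings of the same code can disagree is that the standard-library `timeit` machinery switches off garbage collection while it measures. The sketch below compares both settings; the statement and the loop counts are illustrative, and the size of the gap (if any) depends on the workload and Python version.

```python
import timeit

stmt = "[[i] for i in range(1000)]"   # allocates many small objects

# Default behaviour: garbage collection is disabled during timing.
gc_off = min(timeit.repeat(stmt, number=1000, repeat=3))

# Re-enabling it in setup includes garbage-collection cost in the measurement.
gc_on = min(timeit.repeat(stmt, setup="import gc; gc.enable()", number=1000, repeat=3))

print(f"GC disabled: {gc_off / 1000 * 1e6:.2f} µs per loop")
print(f"GC enabled:  {gc_on / 1000 * 1e6:.2f} µs per loop")
```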

## Profiling Full Scripts: %prun

For analyzing where time is spent in complex functions:

```python
In[7]: def sum_of_lists(N):
    total = 0
    for i in range(5):
        L = [j ^ (j >> i) for j in range(N)]
        total += sum(L)
    return total

In[8]: %prun sum_of_lists(1000000)
```

**Output:**
```
14 function calls in 0.714 seconds

   Ordered by: internal time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        5    0.599    0.120    0.599    0.120 <ipython-input-19>:4(<listcomp>)
        5    0.064    0.013    0.064    0.013 {built-in method sum}
        1    0.036    0.036    0.699    0.699 <ipython-input-19>:1(sum_of_lists)
```

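> 🔍 **Bottleneck finder**: The table is ordered by internal time (`tottime`), so the top row shows where the code spends the most time. Here that is the list comprehension inside `sum_of_lists`.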
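A comparable report can be produced in a plain script with the standard-library `cProfile` and `pstats` modules. This is a sketch under a few assumptions: `N` is reduced to keep the run short, the report is captured in a string buffer for convenience, and the exact rows vary by Python version (some versions do not list the list comprehension separately).

```python
import cProfile
import io
import pstats

def sum_of_lists(N):
    total = 0
    for i in range(5):
        L = [j ^ (j >> i) for j in range(N)]
        total += sum(L)
    return total

profiler = cProfile.Profile()
profiler.enable()
result = sum_of_lists(100_000)   # smaller N than above to keep the run short
profiler.disable()

# Sort by internal time ("tottime"), the ordering shown in the %prun output.
buffer = io.StringIO()
pstats.Stats(profiler, stream=buffer).sort_stats("tottime").print_stats(5)
report = buffer.getvalue()
print(report)
```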

## Line-by-Line Profiling: %lprun

For even more detailed analysis (requires `line_profiler` package):

```bash
$ pip install line_profiler
```

```python
In[9]: %load_ext line_profiler
In[10]: %lprun -f sum_of_lists sum_of_lists(5000)
```

**Output:**
```
Timer unit: 1e-06 s

Total time: 0.009382 s
File: <ipython-input-19-fa2be176cc3e>
Function: sum_of_lists at line 1

Line #   Hits   Time   Per Hit   % Time   Line Contents
==============================================================
     1      1      0         0     0.0%   def sum_of_lists(N):
     2      1      0         0     0.0%       total = 0
     3      6      0         0     0.0%       for i in range(5):
     4      5   5990      1198    63.8%           L = [j ^ (j >> i) for j in range(N)]
     5      5   3392       678    36.2%           total += sum(L)
     6      1      0         0     0.0%       return total
```

> 📊 **Microscopic view**: See which lines consume the most time. Here the list comprehension (line 4) accounts for 63.8% of the time and `sum(L)` (line 5) for 36.2%.
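When `line_profiler` is not available, a rough stand-in is to time the two expensive statements by hand with `time.perf_counter`. This is not what `%lprun` does internally, and the timer calls add their own overhead; `sum_of_lists_timed` is a hypothetical instrumented copy of the function, written only for illustration.

```python
import time

def sum_of_lists_timed(N):
    """Hypothetical instrumented copy of sum_of_lists."""
    total = 0
    build_time = 0.0   # time spent in the list comprehension
    sum_time = 0.0     # time spent in sum(L)
    for i in range(5):
        t0 = time.perf_counter()
        L = [j ^ (j >> i) for j in range(N)]
        t1 = time.perf_counter()
        total += sum(L)
        t2 = time.perf_counter()
        build_time += t1 - t0
        sum_time += t2 - t1
    return total, build_time, sum_time

total, build_time, sum_time = sum_of_lists_timed(5000)
elapsed = build_time + sum_time
print(f"list comprehension: {100 * build_time / elapsed:.1f}% of measured time")
print(f"sum(L):             {100 * sum_time / elapsed:.1f}% of measured time")
```

Manual instrumentation like this only scales to a handful of statements, which is why a dedicated line profiler is preferable for larger functions.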

### Pro Tips

- **`%timeit`**: Best for comparing algorithm performance
- **`%time`**: Best for operations that change state (like sorting)
- **`%prun`**: Find bottlenecks in complex functions
- **`%lprun`**: Detailed line-by-line analysis (requires extension)

> 🚀 **Profiling workflow**: Start with `%timeit` for quick checks → Use `%prun` for function analysis → Use `%lprun` for detailed optimization!
