## 🧩 Threads and Concurrency

In this notebook, we explore how Python uses **threads** to run multiple tasks that 
*overlap in time*.  

Threads share the same memory inside one process and are ideal for **I/O-bound** work, 
such as reading files, downloading data, or waiting on network responses.  

We'll start with a **sequential baseline**, then see how `ThreadPoolExecutor.map()` 
and `submit()` let us improve throughput by overlapping these waits.


In [1]:
import time
from concurrent.futures import ThreadPoolExecutor, as_completed
from mp_tasks import fetch_data

### Baseline: Sequential Execution

Before exploring concurrency, let's see how long the same I/O-bound task takes 
when executed **sequentially**.

This version runs `fetch_data()` in a simple `for` loop, one task after another.
Use this as a **baseline** to compare against the threaded versions (`map` and `submit`).

We expect:
- Output strictly ordered by task ID (since each runs in sequence).
- Total time ≈ **sum** of all delays (~10 × average delay).


In [2]:
print("=== Sequential example (no concurrency) ===")
start = time.perf_counter()

results = []
for i in range(10):
    results.append(fetch_data(i))
    print(results[-1])
    
elapsed = time.perf_counter() - start
print(f"Total time: {elapsed:.2f}s\n")

=== Sequential example (no concurrency) ===
Task 0 done in 0.59s
Task 1 done in 0.47s
Task 2 done in 0.86s
Task 3 done in 0.41s
Task 4 done in 0.61s
Task 5 done in 0.56s
Task 6 done in 0.73s
Task 7 done in 0.52s
Task 8 done in 0.42s
Task 9 done in 0.62s
Total time: 5.86s
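
As a quick check on the "sum of all delays" expectation, the ten delays printed above can be added up. This is plain arithmetic on the logged values; the small remainder is presumably call and printing overhead plus rounding in the printed delays.

```python
# Per-task delays copied from the sequential output above.
delays = [0.59, 0.47, 0.86, 0.41, 0.61, 0.56, 0.73, 0.52, 0.42, 0.62]

total_sleep = sum(delays)          # time spent waiting inside fetch_data()
measured = 5.86                    # "Total time" reported by the cell
overhead = measured - total_sleep  # whatever the delays do not account for

print(f"sum of delays: {total_sleep:.2f}s")  # 5.79s
print(f"unaccounted:   {overhead:.2f}s")     # 0.07s
```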



### Using `ThreadPoolExecutor.map()`

This example demonstrates the simplest way to run multiple I/O-bound tasks concurrently 
with threads. The function `fetch_data()` simulates a slow operation (like reading a file 
or downloading data) using `time.sleep()`.
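
`fetch_data()` is imported from `mp_tasks`, which is not shown in this notebook. A minimal sketch of what such a helper might look like, assuming a random delay and the `Task N done in X.XXs` message format seen in the outputs (the delay bounds and parameter names are guesses, not the module's actual code):

```python
import random
import time


def fetch_data(task_id, min_delay=0.2, max_delay=1.0):
    """Hypothetical stand-in for mp_tasks.fetch_data (an assumption, not the real module)."""
    # Pick a random delay to imitate variable I/O latency.
    delay = random.uniform(min_delay, max_delay)
    # time.sleep() blocks only the calling thread, so other threads keep running.
    time.sleep(delay)
    return f"Task {task_id} done in {delay:.2f}s"
```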

We use `ThreadPoolExecutor.map()` to submit all tasks at once. The key points to notice:

- Tasks **run concurrently**, but results are returned **in the same order** as input.
- Even if a later task finishes first, its result is held back until every earlier result has been yielded.
- This method is ideal for *uniform* workloads where you don't need results early.

Run the cell and observe:
- Output appears ordered (`Task 0`, `Task 1`, …).
- Total time is **much less** than sequential execution, showing true I/O overlap.
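
The ordering guarantee can be seen in isolation with a tiny sketch. `sleep_then_return` is a hypothetical stand-in for `fetch_data()`; the first input deliberately sleeps longest so that it finishes last:

```python
import time
from concurrent.futures import ThreadPoolExecutor

finish_order = []  # filled in by worker threads as each call completes


def sleep_then_return(delay):
    """Hypothetical stand-in for fetch_data(): sleep, record completion, return the input."""
    time.sleep(delay)
    finish_order.append(delay)  # list.append is thread-safe in CPython
    return delay


delays = [0.3, 0.1, 0.2]
with ThreadPoolExecutor(max_workers=3) as ex:
    ordered = list(ex.map(sleep_then_return, delays))

print("map() yielded:", ordered)       # always input order: [0.3, 0.1, 0.2]
print("finished in:  ", finish_order)  # typically shortest delay first
```

All three calls run at once, yet the 0.1s and 0.2s results are only handed back after the 0.3s one.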

In [3]:
print("=== map() example (ordered results) ===")
start = time.perf_counter()

with ThreadPoolExecutor(max_workers=4) as ex:
    for result in ex.map(fetch_data, range(10)):
        print(result)

elapsed = time.perf_counter() - start
print(f"Total time: {elapsed:.2f}s\n")

=== map() example (ordered results) ===
Task 0 done in 0.36s
Task 1 done in 0.65s
Task 2 done in 0.26s
Task 3 done in 0.46s
Task 4 done in 0.61s
Task 5 done in 0.32s
Task 6 done in 0.86s
Task 7 done in 0.21s
Task 8 done in 0.72s
Task 9 done in 0.33s
Total time: 1.41s
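
The 1.41s total can be sanity-checked by replaying the ten reported delays through a simple model of the pool. The sketch below assumes tasks are handed out in submission order to whichever of the 4 workers frees up first, and it ignores scheduling overhead; `simulated_makespan` is an illustrative helper, not part of `concurrent.futures`.

```python
import heapq


def simulated_makespan(delays, workers):
    """Illustrative model: each task goes to the worker that becomes free earliest."""
    free_at = [0.0] * workers      # time at which each worker is next idle
    heapq.heapify(free_at)
    finish = 0.0
    for delay in delays:
        start = heapq.heappop(free_at)   # earliest available worker
        end = start + delay
        heapq.heappush(free_at, end)
        finish = max(finish, end)
    return finish


# Delays copied from the map() output above, in task order.
map_delays = [0.36, 0.65, 0.26, 0.46, 0.61, 0.32, 0.86, 0.21, 0.72, 0.33]

predicted = simulated_makespan(map_delays, workers=4)
print(f"predicted: {predicted:.2f}s")        # 1.40s
print(f"back to back: {sum(map_delays):.2f}s")  # 4.78s
```

The model predicts about 1.40s, close to the measured 1.41s and well below the 4.78s the same delays would take back to back.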



### Using `submit()` and `as_completed()`

This example uses the same `fetch_data()` function, but schedules tasks with 
`executor.submit()` and retrieves results as soon as each one finishes.

Key differences to notice:

- `submit()` returns a **Future** immediately, representing a pending result.
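- `as_completed()` yields futures **in completion order**, so tasks that finish sooner report first.
- For the same pool size, total runtime is similar to `map()`; the output order is what changes. Note that the cell below uses `max_workers=8` while the `map()` cell used 4, so the two timings are not a like-for-like comparison.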

This approach is ideal for **variable-duration tasks**, for example downloading 
files of different sizes or evaluating models with varying compute times.

Run it and observe:
- Output appears **out of order**, depending on random delays.
- You can start processing results while others are still running.
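
Because `as_completed()` yields futures in completion order, it is common to keep a dictionary from each future back to its input, so results can be matched up and failures handled per task. A minimal sketch, using a hypothetical `slow_square` in place of `fetch_data()` and one deliberately failing input:

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed


def slow_square(n):
    """Hypothetical stand-in for fetch_data(): sleeps briefly, then returns n * n."""
    time.sleep(0.01 * (5 - n))  # larger n finishes sooner
    if n == 3:
        raise ValueError("simulated failure")
    return n * n


results, errors = {}, {}
with ThreadPoolExecutor(max_workers=5) as ex:
    # Map each Future back to the input that produced it.
    future_to_input = {ex.submit(slow_square, n): n for n in range(5)}
    for fut in as_completed(future_to_input):
        n = future_to_input[fut]
        try:
            results[n] = fut.result()  # re-raises any exception from the worker
        except ValueError as exc:
            errors[n] = str(exc)

print(results)  # contents: {0: 0, 1: 1, 2: 4, 4: 16} (insertion order may vary)
print(errors)   # {3: 'simulated failure'}
```

Handling the exception inside the loop means one failed task does not discard the results of the others.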


In [4]:
print("=== submit() + as_completed() example (unordered results) ===")
start = time.perf_counter()

with ThreadPoolExecutor(max_workers=8) as ex:
    futures = [ex.submit(fetch_data, i) for i in range(10)]
    for f in as_completed(futures):
        print(f.result())
        
elapsed = time.perf_counter() - start
print(f"Total time: {elapsed:.2f}s\n")

=== submit() + as_completed() example (unordered results) ===
Task 7 done in 0.25s
Task 0 done in 0.28s
Task 5 done in 0.33s
Task 6 done in 0.39s
Task 9 done in 0.30s
Task 2 done in 0.68s
Task 3 done in 0.84s
Task 8 done in 0.66s
Task 4 done in 0.93s
Task 1 done in 0.98s
Total time: 0.98s
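
The order above is not arbitrary. Assuming tasks are handed out in submission order to whichever worker frees up first, the reported delays and `max_workers=8` are enough to reproduce it. `completion_order` is an illustrative helper, not a library function:

```python
import heapq


def completion_order(delays, workers):
    """Illustrative model: return task IDs sorted by simulated finish time."""
    free_at = [0.0] * workers      # time at which each worker is next idle
    heapq.heapify(free_at)
    finished = []
    for task_id, delay in enumerate(delays):
        start = heapq.heappop(free_at)   # earliest available worker
        end = start + delay
        heapq.heappush(free_at, end)
        finished.append((end, task_id))
    finished.sort()
    return [task_id for _, task_id in finished]


# Delays from the output above, indexed by task ID (Task 0 first).
submit_delays = [0.28, 0.98, 0.68, 0.84, 0.93, 0.33, 0.39, 0.25, 0.66, 0.30]

order = completion_order(submit_delays, workers=8)
print(order)  # [7, 0, 5, 6, 9, 2, 3, 8, 4, 1]
```

Tasks 8 and 9 have to wait for a free worker (at about 0.25s and 0.28s), which is why Task 9 reports after Task 6 despite its shorter delay. The 0.98s total is simply the longest single task.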

