# 🧵 Threading in Python – A Complete Guide

## 🚀 What is a Process?

A **process** is an instance of a running program.

* When you open Chrome, a process is created.
* When you run a Python script, a process is created.

Each process has:

* Its **own memory space**
* Its **own CPU time**
* **No direct access** to another process’s memory

📌 **Think of a process as a person with their own desk, files, and tools — they don’t share their desk with others.**


## 🧵 What is a Thread?

A **thread** is a smaller unit of execution **within a process**.
> A unit of execution is a sequence of instructions (usually starting from a function or method) that the operating system can schedule and run independently, either as a thread or as a whole process.

* A process can have **one or many threads**.
* All threads in a process **share the same memory space**.

📌 **Think of threads as coworkers working at the same desk (the process). They can access the same files, but they might clash if they don't coordinate.**
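The shared-desk idea can be sketched in a few lines. This is a minimal illustration, not a recommended pattern; the names `run_workers` and `record` are made up for the example. Every thread appends to the same list object because all of them live in one process.

```python
import threading

def run_workers(count=4):
    shared = []  # one list, visible to every thread in this process

    def record(worker_id):
        # Each thread writes into the same list object
        shared.append(worker_id)

    threads = [threading.Thread(target=record, args=(i,)) for i in range(count)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()  # wait until every thread has finished

    # Sort because the order in which threads run is not guaranteed
    return sorted(shared)

if __name__ == "__main__":
    print(run_workers())  # prints [0, 1, 2, 3]
```

No data had to be copied or sent between the threads; they all saw the same `shared` list.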

## ⚙️ Why Use Threads?

### 🧠 Use Case: Suppose you're building a web scraper

You want to:

* Download 50 web pages
* Process them
* Save data to a file

If done one by one, it could take a long time due to waiting on downloads. Instead, with **threads**, you can download multiple pages **concurrently**, making your program much faster.
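A rough sketch of the download step, assuming a hypothetical `fake_download` function that only sleeps for 0.1 s to stand in for a real network request (no real pages are fetched). `ThreadPoolExecutor` comes from the standard library's `concurrent.futures` module.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def fake_download(url):
    # Hypothetical stand-in for a network request: just waits 0.1 s
    time.sleep(0.1)
    return f"<html>content of {url}</html>"

def download_all(urls, max_workers=5):
    # Each worker thread handles one URL at a time, so the waits overlap
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map returns results in the same order as the input URLs
        return list(pool.map(fake_download, urls))

if __name__ == "__main__":
    urls = [f"https://example.com/page/{i}" for i in range(5)]
    start = time.perf_counter()
    pages = download_all(urls)
    print(f"Fetched {len(pages)} pages in {time.perf_counter() - start:.2f}s")
```

With five workers the five simulated downloads should finish in roughly the time of one, rather than the sum of all five.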

To see how **threads** relate to **cores**, it helps to separate a few ideas: the CPU core, the thread (in programming), and how the operating system schedules threads onto cores.

---

## 🔧 What is a CPU Core?

A **CPU core** is a physical processing unit inside your CPU.

* A single-core CPU can handle **one task at a time**.
* A dual-core CPU can handle **two tasks at a time**.
* A quad-core CPU can handle **four tasks at a time**, and so on.

Modern CPUs often have **multiple cores** and **hyper-threading**, which we'll touch on later.

## 🧵 What is a Thread in Programming?

A **thread** is a unit of execution within a process (as discussed earlier). It's how your program does work.

You can think of:

* One running **program** = at least one **process**
* One process = **one or more threads**


## 🔄 How Threads and Cores Work Together

Now the key part:

> **Threads are scheduled by the operating system to run on available cores.**

Here’s how it plays out:

* If you have 4 threads and 2 cores, the OS will run 2 threads at a time, switching between them rapidly.
* If you have 4 threads and 4 cores, all threads can run **simultaneously** — true parallel execution.

### 🔁 Time-Slicing (Context Switching)

When there are **more threads than cores**, the OS uses **time-slicing**: each thread gets a short turn on a core, and the OS performs a **context switch** to hand the core to the next thread. The switches happen so quickly that all threads seem to run at once. But only `n` threads (where `n = number of cores`) can run *truly in parallel*.


## 🧠 Bonus: Hyper-Threading

Some CPUs can run **two threads per core** using **hyper-threading**:

* A 4-core CPU may support 8 concurrent threads.
* These aren’t as powerful as having 8 real cores but improve performance on multi-threaded tasks.
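To check what your own machine reports, the standard library exposes the number of logical CPUs (which counts hyper-threads, not only physical cores). A small sketch; the helper name `describe_machine` is made up for the example.

```python
import os
import threading

def describe_machine():
    # os.cpu_count() reports logical CPUs (hardware threads), or None if unknown
    logical_cpus = os.cpu_count()
    # threading.active_count() counts Thread objects currently alive in this process
    live_threads = threading.active_count()
    return logical_cpus, live_threads

if __name__ == "__main__":
    cpus, threads = describe_machine()
    print(f"Logical CPUs: {cpus}, live Python threads: {threads}")
```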

---

## 📦 Real-World Use Cases for Threading

| Use Case                  | Why Threading?                      |
| ------------------------- | ----------------------------------- |
| Downloading many files    | While one thread waits, others work |
| Web scraping              | Pages can load in parallel          |
| Sending multiple emails   | Send emails without blocking UI     |
| Periodic background tasks | Run while main app continues        |

---

## ⚠️ Limitations of Threading in Python

* In CPython, the GIL lets only one thread execute Python bytecode at a time.
* Threads therefore give little or no speedup for **CPU-heavy tasks** written in pure Python (like video processing).
* Can be tricky to debug due to **shared state** and **race conditions**.

---

## 📚 Key Threading Concepts

| Term           | Explanation                                                                    |
| -------------- | ------------------------------------------------------------------------------ |
| Thread         | A unit of execution inside a process                                           |
| GIL            | CPython’s lock that lets only one thread execute Python bytecode at a time     |
| Thread-safe    | Code that handles multiple threads without bugs                                |
| Race condition | A bug when multiple threads modify shared data without coordination            |
| Lock/Mutex     | A tool to prevent multiple threads from accessing critical code simultaneously |
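
As an illustration of the last two rows, here is a minimal sketch of a shared counter protected by a `threading.Lock`. The function name `count_with_lock` and its parameters are made up for the example. Without the lock, `counter += 1` is a read-modify-write sequence that two threads can interleave, which may lose updates.

```python
import threading

def count_with_lock(num_threads=4, increments=10_000):
    counter = 0
    lock = threading.Lock()

    def worker():
        nonlocal counter
        for _ in range(increments):
            # Only one thread at a time may run the read-modify-write below
            with lock:
                counter += 1

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter

if __name__ == "__main__":
    print(count_with_lock())  # prints 40000
```

The `with lock:` form releases the lock even if the body raises an exception, which is why it is usually preferred over calling `acquire()` and `release()` by hand.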

---

## ✅ Summary

* Threads let you run multiple tasks **concurrently** in one process.
* They’re best for **I/O-bound** operations like network and file access.
* Threads **share memory**, which is powerful but dangerous without precautions (locks).
* Python threads are **limited by the GIL**, so use **multiprocessing** for heavy computation.

---
---



# 🧠 What is the GIL?

**The Global Interpreter Lock (GIL)** is a **mutex** (mutual exclusion lock) that **prevents multiple native threads from executing Python bytecode at the same time** in a single Python process.

* It is **specific to CPython**, the default implementation of Python.
* It protects the **interpreter's own internal state**; it does not automatically make your code thread-safe.

### 🔧 Why does Python have a GIL?

The internals of CPython objects, such as the reference count stored on every list, dictionary, and integer, are **not protected by per-object locks**. Without the GIL, two threads could update that internal state at the same time, causing crashes or corrupt data.

So, the GIL is a **simple solution** to make memory management safe — especially since Python uses **reference counting** for memory management.
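Reference counting can be observed directly in CPython with `sys.getrefcount`. The sketch below is CPython-specific, and the helper name `refcount_delta` is made up for the example. It shows that creating one more reference to an object raises its count by one; this per-object counter is the kind of state that must not be updated by two threads at once.

```python
import sys

def refcount_delta():
    data = []
    before = sys.getrefcount(data)
    holder = [data]  # the new list stores a second reference to `data`
    after = sys.getrefcount(data)
    # `holder` is still alive here, so its reference is included in `after`
    return after - before

if __name__ == "__main__":
    print(refcount_delta())
```

The absolute numbers returned by `sys.getrefcount` vary between Python versions, so the example only compares two readings taken the same way.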

## 🍕 Real-World Analogy

Imagine a pizza shop with **only one oven**:

* Multiple chefs (threads) want to bake pizzas (run code).
* But only **one chef can use the oven at a time** (GIL ensures one thread executes Python bytecode at a time).
* Even if you have 4 chefs (threads) and 4 prep stations (CPU cores), only one chef can bake at any moment.


## ⚠️ What does the GIL *actually* block?

It **blocks multiple threads from running Python code at once**, even on multi-core machines.

* ✅ **Allowed**: One thread running Python code.
* ✅ **Allowed**: Other threads doing I/O (reading files, downloading data) — they release the GIL.
* ❌ **Not allowed**: Two threads doing Python math at the same time.


## 🔁 But wait — Python has `threading`, right?

Yes, but...

### 📌 Two Kinds of Tasks:

| Task Type                         | Do threads help? | Does the GIL hurt? | Better Option                       |
| --------------------------------- | ---------------- | ------------------ | ----------------------------------- |
| **I/O-bound** (e.g., file, web)   | ✔ Yes            | –                  | `threading`, `asyncio`              |
| **CPU-bound** (e.g., math, loops) | –                | ❌ Yes              | `multiprocessing`, C/C++ extensions |

So even if you use threads, **CPU-bound tasks won't be truly parallel** in CPython due to the GIL.
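
The table lists `asyncio` as an option for I/O-bound work. A minimal sketch of that route, assuming a hypothetical coroutine `fake_request` that only sleeps to imitate waiting on the network: a single thread runs all tasks and switches between them whenever one is waiting.

```python
import asyncio
import time

async def fake_request(delay):
    # Hypothetical stand-in for a network call: waits without blocking the thread
    await asyncio.sleep(delay)
    return delay

async def run_all(delays):
    # gather runs the coroutines concurrently and keeps the input order
    return await asyncio.gather(*(fake_request(d) for d in delays))

if __name__ == "__main__":
    start = time.perf_counter()
    print(asyncio.run(run_all([0.1, 0.2, 0.3])))  # prints [0.1, 0.2, 0.3]
    print(f"Took {time.perf_counter() - start:.2f}s")
```

The total time should be close to the longest single wait, not the sum of all three.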


## 🔬 Technical Insight: How it works

Behind the scenes:

```text
Thread 1:        ⏳ Acquires GIL ➡ Executes Python ➡ Releases GIL
Thread 2:                          ⏳ Acquires GIL ➡ Executes ➡ ...
```

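CPython periodically forces the running thread to give up the GIL so another thread can take it. Since Python 3.2 this is time-based (a switch interval of 5 ms by default); older versions counted bytecode instructions instead. So, even though only one thread runs at a time, **they take turns** fast enough to seem concurrent.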
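
In CPython 3.2 and later the turn-taking is time-based, and the interval can be inspected at run time. A small sketch; the helper name `current_switch_interval` is made up for the example.

```python
import sys

def current_switch_interval():
    # Seconds a thread may hold the GIL before it is asked to release it
    return sys.getswitchinterval()

if __name__ == "__main__":
    print(f"Switch interval: {current_switch_interval()} s")
```

`sys.setswitchinterval(seconds)` changes the value, though the default is rarely worth tuning.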


## 🛠️ Example: Threading vs Multiprocessing

### With `threading` (GIL blocks true parallel CPU usage):

```python
import threading

def cpu_task():
    for _ in range(10**7):
        pass

for _ in range(4):
    threading.Thread(target=cpu_task).start()
```

### With `multiprocessing` (bypasses GIL):

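```python
from multiprocessing import Process

def cpu_task():
    for _ in range(10**7):
        pass

# The guard is required where child processes are started by re-importing
# this module (the "spawn" start method, the default on Windows and macOS).
if __name__ == "__main__":
    for _ in range(4):
        Process(target=cpu_task).start()
```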

➡ The second one will run **in parallel on multiple cores**. The first one will just **switch between threads**.


## 🔄 Does Every Python Have a GIL?

| Implementation | GIL Present? | Notes                              |
| -------------- | ------------ | ---------------------------------- |
| **CPython**    | ✅ Yes        | Default Python; 3.13+ also offers an experimental free-threaded build |
| **Jython**     | ❌ No         | Java-based                         |
| **IronPython** | ❌ No         | .NET-based                         |
| **PyPy**       | ✅ Yes        | Has its own GIL                    |


## ✅ Summary

* The **GIL** allows only one thread to execute Python bytecode at a time.
* It makes Python **easier and safer**, but **limits CPU-bound multithreading**.
* For **I/O-bound tasks**: use `threading` (blocking I/O calls release the GIL while they wait) or `asyncio` (a single thread switches between tasks while they wait).
* For **CPU-bound tasks**: use `multiprocessing` to achieve real parallelism.

---
---

```python
from functools import reduce
import time

def calculate_sum_squares(n):
    print(reduce(lambda x, y: x + y, map(lambda x: x**2, range(n))))

def sleep(sec):
    time.sleep(sec)

def main():
    calc_start = time.time()

    for i in range(10):
        calculate_sum_squares((i + 1) * 100000)

    print(f'Calculation took : {round(time.time() - calc_start, 2)}')

    sleep_start = time.time()

    for i in range(1, 6):
        sleep(i)

    print(f'Sleep took : {round(time.time() - sleep_start, 2)}')

main()
```

```
333328333350000
2666646666700000
8999955000050000
21333253333400000
41666541666750000
71999820000100000
114333088333450000
170666346666800000
242999595000150000
333332833333500000
Calculation took : 1.03
Sleep took : 15.0
```


```python
import threading                     # For creating threads
from functools import reduce         # For applying reduce on a sequence
import time                          # To measure execution time and use sleep

# Function to calculate sum of squares from 0 to n-1
def calculate_sum_squares(n):
    # Uses map to square each number, then reduce to sum them
    print(reduce(lambda x, y: x + y, map(lambda x: x**2, range(n))))

# Function that just sleeps for `sec` seconds
def sleep(sec):
    time.sleep(sec)

# Main function where threading is used
def main():
    calc_start = time.time()  # Start timer for calculation section

    current_threads = []  # Keep track of threads

    # Start 10 threads to calculate sum of squares up to increasing `n`
    for i in range(10):
        n = (i + 1) * 100000     # n = 100000, 200000, ..., 1000000
        t = threading.Thread(target=calculate_sum_squares, args=(n,))
        t.start()                # Start the thread
        current_threads.append(t)

    # Wait for all threads to finish (join)
    for t in current_threads:
        t.join()

    print(f'Calculation took : {round(time.time() - calc_start, 2)}')

    # Start of sleep test
    sleep_start = time.time()

    current_threads = []

    # Start 5 threads that each sleep for different durations
    for i in range(1, 6):    # sleep(1), sleep(2), ..., sleep(5)
        t = threading.Thread(target=sleep, args=(i,))
        t.start()
        current_threads.append(t)

    # Wait for all threads to finish (join)
    for t in current_threads:
        t.join()

    print(f'Sleep took : {round(time.time() - sleep_start, 2)}')

main()
```

```
333328333350000
2666646666700000
8999955000050000
21333253333400000
41666541666750000
71999820000100000
170666346666800000
114333088333450000
242999595000150000
333332833333500000
Calculation took : 1.02
Sleep took : 5.0
```


### 🧠 **Explanation**

#### 🔷 Part 1: **Calculating Sum of Squares**

* The program creates **10 threads**, each calculating the sum of squares from 0 to `n-1`, where `n` increases from 100,000 to 1,000,000.
* This is a **CPU-bound** task: each thread performs heavy computation.
* Despite using threads, Python’s **GIL (Global Interpreter Lock)** prevents true parallel execution of CPU-bound code. So these calculations run **mostly sequentially**.
* You may not see a big time benefit here from threading.

#### 🔷 Part 2: **Sleeping in Threads**

* This part uses **5 threads** to sleep for 1 to 5 seconds each.
* `time.sleep` stands in for an **I/O-bound** task: each thread just waits, not using the CPU, and releases the GIL while it waits.
* Threads are **great for I/O-bound tasks** because while one thread sleeps, others can run.
* So the total sleep time will be close to 5 seconds (not 1+2+3+4+5 = 15), showing **concurrency**.

---

### ⏱️ **Expected Output Summary**

* `Calculation took`: about the **same as the sequential version** (1.02 s versus 1.03 s in the runs above), because the GIL lets only one thread compute at a time.
* `Sleep took`: around **5 seconds**, thanks to concurrent sleeping.

---
---



### 🧵 `daemon=True` in Python Threads

When you create a thread using the `threading` module, you can pass `daemon=True` to indicate that the **thread is a daemon thread**.

---

### ✅ What is a Daemon Thread?

A **daemon thread** is a background thread that runs **in support of other threads**. When all **non-daemon (main/user) threads** complete, the program **will exit**, even if daemon threads are still running.

---

### 🧠 Why Use `daemon=True`?

It’s useful when:

* You want a thread to run in the background.
* You don’t want that thread to block your program from exiting.
* Example: background logging, monitoring, auto-save, etc.

---

### 🛑 Behavior Difference

| Thread Type                | Blocks Program Exit? | Use Case                          |
| -------------------------- | -------------------- | --------------------------------- |
| `daemon=False` *(default)* | ✅ Yes                | Main or worker tasks              |
| `daemon=True`              | ❌ No                 | Background helpers (e.g. loggers) |

---

### 🔁 Example

```python
import threading
import time

def background_task():
    while True:
        print("Running in background...")
        time.sleep(1)

# Daemon thread
t = threading.Thread(target=background_task, daemon=True)
t.start()

time.sleep(3)
print("Main program finished!")
```

### 🧾 Output

```
Running in background...
Running in background...
Running in background...
Main program finished!
```

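* The program exits after 3 seconds. Depending on timing, a fourth `Running in background...` line may appear just before the final message.
* Even though `background_task()` loops forever, the **daemon thread is stopped abruptly** when the main program ends. It gets no chance to run cleanup code, so open files or other resources may not be released properly.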
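
When a background thread needs to clean up before it stops, one common alternative to `daemon=True` is a regular thread that checks a `threading.Event`. The sketch below is only one way to do it; the name `run_stoppable_worker` and its parameters are made up for the example.

```python
import threading
import time

def run_stoppable_worker(run_for=0.3, interval=0.05):
    stop_event = threading.Event()
    ticks = []

    def worker():
        # Event.wait returns False on timeout and True once the event is set
        while not stop_event.wait(interval):
            ticks.append(time.monotonic())  # stand-in for real background work
        # Cleanup code would go here; it runs because the loop exits normally

    t = threading.Thread(target=worker)  # non-daemon by default
    t.start()
    time.sleep(run_for)   # the main program does its own work
    stop_event.set()      # ask the worker to stop
    t.join()              # wait for it to finish cleanly
    return len(ticks), t.is_alive(), t.daemon

if __name__ == "__main__":
    count, alive, daemon = run_stoppable_worker()
    print(f"ticks={count}, alive={alive}, daemon={daemon}")
```

The trade-off is that the main program must remember to set the event, otherwise the non-daemon thread keeps the process alive.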

---
