These notes cover **Edit Distance** for **PDSA (Programming, Data Structures, and Algorithms using Python)**: the definition, a worked example, the recurrence relation and its base cases, and the resulting algorithms.

---

# 🧮 Edit Distance – In-Depth Notes (PDSA)

---

## 📌 **What is Edit Distance?**

The **Edit Distance** (also called **Levenshtein Distance**) between two strings is the **minimum number of single-character operations** required to convert one string into the other.

### ✅ **Allowed Operations**

1. **Insertion** – Add a character.
2. **Deletion** – Remove a character.
3. **Substitution** – Replace one character with another.

---

## ✍️ **Motivation**

Edit Distance is widely used in:

* **Spell checkers**
* **Plagiarism detection**
* **Bioinformatics** (e.g., comparing DNA sequences)
* **Version control systems**
* **Natural Language Processing (NLP)**

---

## 🧾 **Example**

Consider:

* `s1 = "intention"`
* `s2 = "execution"`

With every operation costing 1, the edit distance is **5**. One optimal sequence of edits:

```
intention → inention (delete 't')
          → enention (replace 'i' with 'e')
          → exention (replace 'n' with 'x')
          → exection (replace 'n' with 'c')
          → execution (insert 'u')
```
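The trace above can be replayed mechanically. The sketch below is illustrative only: `apply_edits` and the `(operation, index, character)` tuple format are names introduced here, not part of the notes. It confirms that these five edits do turn `intention` into `execution`, which shows that 5 edits are enough; that no shorter sequence exists is what the DP later establishes.

```python
def apply_edits(s, edits):
    """Apply (op, index, char) edits to s in order; return all intermediate strings."""
    steps = [s]
    for op, i, ch in edits:
        if op == "delete":
            s = s[:i] + s[i + 1:]
        elif op == "insert":
            s = s[:i] + ch + s[i:]
        elif op == "replace":
            s = s[:i] + ch + s[i + 1:]
        else:
            raise ValueError(f"unknown operation: {op}")
        steps.append(s)
    return steps

# The five edits from the trace; each index refers to the current string.
edits = [
    ("delete", 2, None),   # intention -> inention
    ("replace", 0, "e"),   # inention  -> enention
    ("replace", 1, "x"),   # enention  -> exention
    ("replace", 3, "c"),   # exention  -> exection
    ("insert", 4, "u"),    # exection  -> execution
]
steps = apply_edits("intention", edits)
print(" -> ".join(steps))
```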

---

## 🧠 **Key Insight**

Many different edit sequences can turn one string into another. We want the length of the **shortest** one, not just any sequence that works.

---

## 🔁 **Relation to Other String Problems**

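### **Longest Common Subsequence (LCS)**

If only **insertions and deletions** are allowed (no substitutions), then:

* `Edit Distance = (len(s1) - LCS) + (len(s2) - LCS)`

Here `LCS` is the *length* of the longest common subsequence of `s1` and `s2`: every character of `s1` outside the LCS is deleted, and every character of `s2` outside it is inserted.

LCS measures how much the two strings share (a maximization), while Edit Distance measures the **minimum transformation effort** (a minimization).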
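As a minimal sketch of the relation above, the functions below compute the LCS length with the standard DP and derive the insert/delete-only distance from it. The names `lcs_length` and `indel_distance` are chosen here for illustration.

```python
def lcs_length(s1, s2):
    """Length of the longest common subsequence of s1 and s2."""
    m, n = len(s1), len(s2)
    # lcs[i][j] = LCS length of the suffixes s1[i:] and s2[j:]
    lcs = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m - 1, -1, -1):
        for j in range(n - 1, -1, -1):
            if s1[i] == s2[j]:
                lcs[i][j] = 1 + lcs[i + 1][j + 1]
            else:
                lcs[i][j] = max(lcs[i + 1][j], lcs[i][j + 1])
    return lcs[0][0]

def indel_distance(s1, s2):
    """Edit distance when only insertions and deletions are allowed."""
    k = lcs_length(s1, s2)
    return (len(s1) - k) + (len(s2) - k)

print(lcs_length("intention", "execution"))      # 5, e.g. "etion"
print(indel_distance("intention", "execution"))  # 8
```

Without substitutions the distance for this pair rises from 5 to 8, because each substitution has to be simulated by one deletion plus one insertion.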

---

## 🧮 **Formal Definition and Recurrence**

Let:

* `u = a₀a₁...aₘ₋₁` (length m)
* `v = b₀b₁...bₙ₋₁` (length n)
* `ED(i, j)` = minimum number of edits needed to convert the suffix `u[i:]` into the suffix `v[j:]`, so the final answer is `ED(0, 0)`

### 🔁 **Recurrence Relation**

If `a[i] == b[j]`:

```python
ED(i, j) = ED(i+1, j+1)       # Characters match; no edit needed
```

Else:

```python
ED(i, j) = 1 + min(
    ED(i+1, j+1),  # substitution
    ED(i+1, j),    # deletion from u
    ED(i, j+1)     # insertion into u
)
```

---

## 🧱 **Base Cases**

1. **Both strings exhausted:**

```python
ED(m, n) = 0
```

2. **First string exhausted:**

```python
ED(m, j) = n - j     # Insert remaining characters from v[j:]
```

3. **Second string exhausted:**

```python
ED(i, n) = m - i     # Delete remaining characters from u[i:]
```

---

## 📊 **DP Table Setup**

* Table `dp` of size `(m+1) x (n+1)`
* Each `dp[i][j]` stores `ED(i, j)`
* Each cell depends on `dp[i+1][j+1]`, `dp[i+1][j]`, and `dp[i][j+1]`, so fill from bottom-right to top-left (or use recursion + memoization)

---

## ✅ **Algorithm (Bottom-Up DP)**

```python
def edit_distance(u, v):
    m, n = len(u), len(v)
    dp = [[0]*(n+1) for _ in range(m+1)]

    # Base Cases
    for i in range(m+1):
        dp[i][n] = m - i  # delete remaining
    for j in range(n+1):
        dp[m][j] = n - j  # insert remaining

    # Fill the table
    for i in range(m-1, -1, -1):
        for j in range(n-1, -1, -1):
            if u[i] == v[j]:
                dp[i][j] = dp[i+1][j+1]
            else:
                dp[i][j] = 1 + min(
                    dp[i+1][j+1],  # substitution
                    dp[i+1][j],    # deletion
                    dp[i][j+1]     # insertion
                )

    return dp[0][0]
```
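For a small input it can help to look at the whole table rather than only `dp[0][0]`. The sketch below is a variant of the function above that returns the filled table; the name `edit_distance_table` is introduced here for illustration.

```python
def edit_distance_table(u, v):
    """Return the full (m+1) x (n+1) table, where table[i][j] = ED(u[i:], v[j:])."""
    m, n = len(u), len(v)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        dp[i][n] = m - i          # v exhausted: delete the rest of u
    for j in range(n + 1):
        dp[m][j] = n - j          # u exhausted: insert the rest of v
    for i in range(m - 1, -1, -1):
        for j in range(n - 1, -1, -1):
            if u[i] == v[j]:
                dp[i][j] = dp[i + 1][j + 1]
            else:
                dp[i][j] = 1 + min(dp[i + 1][j + 1], dp[i + 1][j], dp[i][j + 1])
    return dp

table = edit_distance_table("cat", "cut")
for row in table:
    print(row)
# [1, 2, 2, 3]
# [2, 1, 1, 2]
# [2, 1, 0, 1]
# [3, 2, 1, 0]
```

The last row and last column hold the base cases, and the top-left entry `1` is the answer: a single substitution of `a` by `u`.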

---

## 🔁 **Memoized Recursive Version**

```python
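from functools import lru_cache

def edit_distance(u, v):
    m, n = len(u), len(v)

    @lru_cache(maxsize=None)  # cache every (i, j) so each subproblem is solved once
    def dp(i, j):
        # Base cases: one of the strings is exhausted
        if i == m:
            return n - j  # insert remaining characters of v
        if j == n:
            return m - i  # delete remaining characters of u
        if u[i] == v[j]:
            return dp(i+1, j+1)  # characters match; no edit needed
        return 1 + min(
            dp(i+1, j+1),  # substitution
            dp(i+1, j),    # deletion
            dp(i, j+1)     # insertion
        )

    # Recursion depth can reach m + n, so very long inputs may hit
    # Python's recursion limit; the bottom-up version avoids this.
    return dp(0, 0)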
```

---

## 🔎 **Time and Space Complexity**

| Approach        | Time Complexity | Space Complexity |
| --------------- | --------------- | ---------------- |
| DP (Tabulation) | `O(m × n)`      | `O(m × n)`       |
| Memoization     | `O(m × n)`      | `O(m × n)`       |
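
The `O(m × n)` space for tabulation is needed only if the full table is kept, for example for a traceback. When only the distance is required, each row depends solely on the row below it, so two rows suffice. The following is a sketch under that assumption; `edit_distance_two_rows` is a name introduced here, and the swap relies on edit distance being symmetric when all operations cost 1.

```python
def edit_distance_two_rows(u, v):
    """Edit distance using O(min(m, n)) extra space; no traceback possible."""
    if len(v) > len(u):
        u, v = v, u                        # keep the shorter string along the columns
    m, n = len(u), len(v)
    nxt = [n - j for j in range(n + 1)]    # row m: insert the rest of v
    for i in range(m - 1, -1, -1):
        cur = [0] * (n + 1)
        cur[n] = m - i                     # column n: delete the rest of u
        for j in range(n - 1, -1, -1):
            if u[i] == v[j]:
                cur[j] = nxt[j + 1]
            else:
                cur[j] = 1 + min(
                    nxt[j + 1],            # substitution
                    nxt[j],                # deletion
                    cur[j + 1],            # insertion
                )
        nxt = cur
    return nxt[0]

print(edit_distance_two_rows("intention", "execution"))  # 5
```

Time stays `O(m × n)`; only the space drops.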

---

## 🎯 **Traceback for Operations**

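To retrieve the actual operations (insert, delete, substitute), start at `dp[0][0]` and walk toward `dp[m][n]`, at each cell following the choice that produced its value:

* Diagonal move (`i+1, j+1`) with `u[i] == v[j]` ⇒ match, no operation
* Diagonal move (`i+1, j+1`) with `u[i] ≠ v[j]` ⇒ substitution
* Horizontal move (`j+1`) ⇒ insertion of `v[j]`
* Vertical move (`i+1`) ⇒ deletion of `u[i]`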
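
The moves listed above can be sketched in code as follows. This is one possible traceback, not the only one: when several choices tie, it prefers substitution, then deletion, then insertion. The function name `edit_operations` and the `(operation, char_in_u, char_in_v)` tuple format are assumptions made for this example.

```python
def edit_operations(u, v):
    """Return (distance, ops), where ops is one optimal list of edits."""
    m, n = len(u), len(v)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        dp[i][n] = m - i
    for j in range(n + 1):
        dp[m][j] = n - j
    for i in range(m - 1, -1, -1):
        for j in range(n - 1, -1, -1):
            if u[i] == v[j]:
                dp[i][j] = dp[i + 1][j + 1]
            else:
                dp[i][j] = 1 + min(dp[i + 1][j + 1], dp[i + 1][j], dp[i][j + 1])

    ops = []
    i = j = 0
    while i < m and j < n:
        if u[i] == v[j]:
            i, j = i + 1, j + 1                      # diagonal, match: no edit
        elif dp[i][j] == 1 + dp[i + 1][j + 1]:
            ops.append(("substitute", u[i], v[j]))   # diagonal, mismatch
            i, j = i + 1, j + 1
        elif dp[i][j] == 1 + dp[i + 1][j]:
            ops.append(("delete", u[i], None))       # vertical move
            i += 1
        else:
            ops.append(("insert", None, v[j]))       # horizontal move
            j += 1
    # One string is exhausted: the rest is pure deletion or pure insertion.
    ops.extend(("delete", c, None) for c in u[i:])
    ops.extend(("insert", None, c) for c in v[j:])
    return dp[0][0], ops

distance, ops = edit_operations("cat", "cut")
print(distance, ops)  # 1 [('substitute', 'a', 'u')]
```

The number of recorded operations always equals `dp[0][0]`, which is a useful sanity check when writing a traceback by hand.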

---

## 🧬 **Applications**

1. **Spell Correction** – Find the closest correct word.
2. **DNA Matching** – Estimate how many mutations separate two sequences.
3. **Plagiarism Detection** – Measure similarity between documents.
4. **Diff Tools** – Show changes between versions.
5. **Search Engines** – "Did you mean..." suggestions.

---

## 🧠 **Tips for PDSA**

* Always initialize the boundary cases (last row and last column) correctly.
* Write the recurrence with all three choices (substitution, deletion, insertion) and take the minimum.
* Trace back through the table to understand how the result was built.

---

## 📚 Summary Table

| Concept         | Description                                    |
| --------------- | ---------------------------------------------- |
| Problem         | Minimum edits to convert one string to another |
| Operations      | Insert, Delete, Substitute                     |
| Base Cases      | If one string is empty, insert/delete rest     |
| Recurrence      | Based on whether characters match or not       |
| Complexity      | O(m×n) time and space                          |
| Relation to LCS | Similar structure but maximizing vs minimizing |
| Applications    | NLP, biology, spell-check, diff tools          |