# 📚 PDSA: Common Subwords and Subsequences

## 🔸 Overview

This topic introduces two classic problems on strings that are solved using **Dynamic Programming (DP)**:

1. **Longest Common Subword (LCW)** – the longest segment of consecutive characters that appears in both strings.
2. **Longest Common Subsequence (LCS)** – the longest sequence of characters that appears in both strings in the same order, not necessarily contiguously.

These problems demonstrate the use of **memoization**, **inductive substructure**, and **efficient DP table-filling strategies**.

---

## 🔹 1. Longest Common Subword (LCW)

### 🔍 Definition

Given two strings `u` and `v`, find the **longest contiguous substring (segment)** present in both.

Example:

* `u = "bisect"`, `v = "secret"`
* Longest common subword: `"sec"` → length = 3

### ❌ Brute-force Approach

* For each pair of positions `i` in `u` and `j` in `v`, check how long the matching segment is.
* Continue comparing `u[i+k]` with `v[j+k]` until the characters differ or one string ends.
* Time complexity: **O(m × n × min(m, n))** in worst case.
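
The brute-force idea above can be sketched as follows. This is a minimal illustration, not a reference implementation; the function name `lcw_brute_force` is an assumption made for this example.

```python
def lcw_brute_force(u, v):
    """Return the length of the longest common subword of u and v."""
    best = 0
    for i in range(len(u)):
        for j in range(len(v)):
            # Extend the match that starts at u[i] and v[j] for as long
            # as the characters agree and neither string has ended.
            k = 0
            while i + k < len(u) and j + k < len(v) and u[i + k] == v[j + k]:
                k += 1
            best = max(best, k)
    return best
```

For the running example, `lcw_brute_force("bisect", "secret")` evaluates to `3`. The two outer loops visit every pair of start positions, and the inner `while` loop accounts for the extra `min(m, n)` factor.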

### ✅ Efficient DP Approach

#### ✨ Key Insight (Inductive Substructure)

Let `LCW[i][j]` denote the length of the longest common subword starting at `u[i]` and `v[j]`.

* **If** `u[i] == v[j]`:
  `LCW[i][j] = 1 + LCW[i+1][j+1]`

* **Else**:
  `LCW[i][j] = 0`

#### 📌 Base Case

* If either string ends:
  `LCW[i][n] = LCW[m][j] = 0`

#### 🧠 Implementation Idea

* Use a **DP table of size (m+1) x (n+1)** initialized with zeros.
* Fill the table **bottom-up** (from last character to first).
* Track the **maximum value** encountered while filling the table.
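
A possible table-filling sketch of these three steps is shown below. The name `lcw_length` is hypothetical; the table layout follows the recurrence and base case given above.

```python
def lcw_length(u, v):
    """Return the length of the longest common subword of u and v."""
    m, n = len(u), len(v)
    # table[i][j] is the length of the common subword starting at u[i], v[j].
    # Row m and column n are the base cases and remain 0.
    table = [[0] * (n + 1) for _ in range(m + 1)]
    best = 0
    # Fill from the last character towards the first.
    for i in range(m - 1, -1, -1):
        for j in range(n - 1, -1, -1):
            if u[i] == v[j]:
                table[i][j] = 1 + table[i + 1][j + 1]
                best = max(best, table[i][j])
            # On a mismatch the cell keeps its initial value 0.
    return best
```

Because the table starts out filled with zeros, the mismatch case needs no explicit assignment.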

#### 🧮 Time and Space Complexity

* Time: **O(m × n)**
* Space: **O(m × n)** (reducible to O(n) with rolling arrays, since each row depends only on the row below it)
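
The rolling-array optimization might look like the sketch below. It keeps only two rows at a time; the names `lcw_length_rolling`, `below`, and `current` are illustrative assumptions.

```python
def lcw_length_rolling(u, v):
    """Return the LCW length of u and v using O(len(v)) extra space."""
    n = len(v)
    below = [0] * (n + 1)  # plays the role of row i+1 of the full table
    best = 0
    for i in range(len(u) - 1, -1, -1):
        current = [0] * (n + 1)  # row i, rebuilt from scratch each time
        for j in range(n - 1, -1, -1):
            if u[i] == v[j]:
                current[j] = 1 + below[j + 1]
                best = max(best, current[j])
        below = current
    return best
```

This version returns only the length. To also recover the subword without the full table, one could record the start index at which the maximum was seen.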

#### 🧵 Recovering the Subword

* After computing the DP table, locate a cell `(i, j)` holding the maximum value; the subword is `u[i : i + LCW[i][j]]`, read by following the diagonal down and to the right from that cell.
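
One way to recover the subword itself is to remember where the maximum occurred while filling the table. The sketch below assumes that approach; `longest_common_subword` is a hypothetical name, and when several subwords share the maximum length it returns whichever one the bottom-up fill encounters first.

```python
def longest_common_subword(u, v):
    """Return one longest common subword of u and v (empty string if none)."""
    m, n = len(u), len(v)
    table = [[0] * (n + 1) for _ in range(m + 1)]
    best_len, best_start = 0, 0
    for i in range(m - 1, -1, -1):
        for j in range(n - 1, -1, -1):
            if u[i] == v[j]:
                table[i][j] = 1 + table[i + 1][j + 1]
                if table[i][j] > best_len:
                    # Remember the length and where the match starts in u.
                    best_len, best_start = table[i][j], i
    # The subword runs along the diagonal starting at the recorded cell.
    return u[best_start:best_start + best_len]
```

For example, `longest_common_subword("bisect", "secret")` gives `"sec"`.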

---

## 🔹 2. Longest Common Subsequence (LCS)

### 🔍 Definition

Find the **longest sequence of characters** that appears in the same relative order in both strings, **not necessarily contiguous**.

Example:

* `u = "bisect"`, `v = "secret"`
* Longest common subsequence: `"sect"` → length = 4

### 📌 Applications

* **Genomics**: Matching DNA sequences.
* **Unix diff command**: Comparing text files line-by-line.
* **Version control systems**: Finding changes in source code.

### ❌ Brute-force Approach

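* Enumerate every subsequence of `u` and of `v` and look for the longest one common to both.
* A string of length `m` has `2^m` subsequences, so this takes exponential time and is not feasible.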
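
To make the cost concrete, here is a small brute-force sketch. As a simplifying assumption it enumerates subsequences of `u` only and tests each against `v`, which is equivalent but slightly cheaper than enumerating both; the names `is_subsequence` and `lcs_brute_force` are made up for this example.

```python
from itertools import combinations


def is_subsequence(s, t):
    """Return True if s can be obtained from t by deleting characters."""
    remaining = iter(t)
    # Each membership test consumes the iterator up to the matched character,
    # so the characters of s must be found in order.
    return all(ch in remaining for ch in s)


def lcs_brute_force(u, v):
    """Return the LCS length of u and v by exhaustive search."""
    # Try candidate lengths from longest to shortest; the first hit is optimal.
    for length in range(len(u), 0, -1):
        for positions in combinations(range(len(u)), length):
            candidate = "".join(u[p] for p in positions)
            if is_subsequence(candidate, v):
                return length
    return 0
```

This is only usable on very short strings, since the number of position sets grows as `2^m`.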

### ✅ Efficient DP Approach

#### ✨ Key Insight (Inductive Substructure)

Let `LCS[i][j]` be the length of the longest common subsequence of the suffixes `u[i:]` and `v[j:]`. Unlike LCW, the subsequence need not begin at `u[i]` or `v[j]`.

* **If** `u[i] == v[j]`:
  `LCS[i][j] = 1 + LCS[i+1][j+1]`

* **Else**:
  `LCS[i][j] = max(LCS[i+1][j], LCS[i][j+1])`

#### 🧠 Decision Making

* If characters match, include and move diagonally.
* If not, at least one of `u[i]` and `v[j]` is not part of the answer: try skipping each in turn and keep the larger result.

#### 📌 Base Case

* If either string ends:
  `LCS[i][n] = LCS[m][j] = 0`
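
The recurrence and base case translate almost directly into a memoized recursion. The sketch below uses `functools.lru_cache` from the standard library as the memo table; the names `lcs_length_memo` and `solve` are assumptions for illustration.

```python
from functools import lru_cache


def lcs_length_memo(u, v):
    """Return the LCS length of u and v using memoized recursion."""
    m, n = len(u), len(v)

    @lru_cache(maxsize=None)
    def solve(i, j):
        # Base case: one of the suffixes is empty.
        if i == m or j == n:
            return 0
        if u[i] == v[j]:
            return 1 + solve(i + 1, j + 1)
        return max(solve(i + 1, j), solve(i, j + 1))

    return solve(0, 0)
```

Each pair `(i, j)` is evaluated once, so the work is O(m × n). The recursion can go about `m + n` levels deep, so for long inputs the bottom-up table is the safer choice in Python.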

#### 🧮 Time and Space Complexity

* Time: **O(m × n)**
* Space: **O(m × n)** (reducible to O(n) with rolling arrays when only the length is needed; the traceback described below needs the full table)
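
A rolling-array version for the length alone could be sketched as below. The names `lcs_length_rolling`, `below`, and `current` are hypothetical.

```python
def lcs_length_rolling(u, v):
    """Return the LCS length of u and v using O(len(v)) extra space."""
    n = len(v)
    below = [0] * (n + 1)  # plays the role of row i+1 of the full table
    for i in range(len(u) - 1, -1, -1):
        current = [0] * (n + 1)  # row i; current[n] is the base case 0
        for j in range(n - 1, -1, -1):
            if u[i] == v[j]:
                current[j] = 1 + below[j + 1]
            else:
                current[j] = max(below[j], current[j + 1])
        below = current
    # After the last iteration, below holds row 0 of the table.
    return below[0]
```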

#### 🔄 Table Construction Order

* Fill from bottom-right to top-left.
* At each cell, use values from:

  * Right `LCS[i][j+1]`
  * Below `LCS[i+1][j]`
  * Diagonal `LCS[i+1][j+1]` (if match)

#### 🧵 Recovering the Subsequence

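* Start at cell `(0, 0)`, with `i = 0` and `j = 0`.
* Walk through the table until `i == m` or `j == n`:

  * If `u[i] == v[j]`, include the character and move diagonally to `(i+1, j+1)`.
  * Else if `LCS[i+1][j] > LCS[i][j+1]`, move down to `(i+1, j)`.
  * Else, move right to `(i, j+1)`.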
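
The walk through the table can be sketched as follows. This is one possible implementation under the rules listed above; `longest_common_subsequence` is a name chosen for this example, and when several subsequences are equally long it returns the one this particular tie-breaking rule selects.

```python
def longest_common_subsequence(u, v):
    """Return one longest common subsequence of u and v as a string."""
    m, n = len(u), len(v)
    table = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m - 1, -1, -1):
        for j in range(n - 1, -1, -1):
            if u[i] == v[j]:
                table[i][j] = 1 + table[i + 1][j + 1]
            else:
                table[i][j] = max(table[i + 1][j], table[i][j + 1])

    # Walk from (0, 0) towards the base-case border, collecting matches.
    chars = []
    i = j = 0
    while i < m and j < n:
        if u[i] == v[j]:
            chars.append(u[i])
            i += 1
            j += 1
        elif table[i + 1][j] > table[i][j + 1]:
            i += 1  # the better answer lies in the row below
        else:
            j += 1  # otherwise move right
    return "".join(chars)
```

For the running example, `longest_common_subsequence("bisect", "secret")` gives `"sect"`.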

---

## 🆚 LCW vs LCS

| Feature                      | LCW (Subword)             | LCS (Subsequence)             |
| ---------------------------- | ------------------------- | ----------------------------- |
| Must be contiguous           | ✅ Yes                     | ❌ No                          |
| DP Dependency                | Diagonal only (`i+1,j+1`) | Diagonal + Right + Down       |
| Cell holding the answer      | Maximum over all cells    | Always at `(0,0)`             |
| Recovering actual sequence   | Backtrack along diagonals | Backtrack using max direction |
| Common in string algorithms? | Moderate                  | Very common                   |

---

## 🔧 Python Code for LCS

```python
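def LCS(u, v):
    """Return the length of a longest common subsequence of u and v."""
    m, n = len(u), len(v)
    # table[i][j] holds the LCS length of the suffixes u[i:] and v[j:].
    # Row m and column n are the base cases (an empty suffix) and stay 0.
    table = [[0] * (n + 1) for _ in range(m + 1)]

    # Fill from the bottom-right corner towards the top-left.
    for i in range(m - 1, -1, -1):
        for j in range(n - 1, -1, -1):
            if u[i] == v[j]:
                # Match: keep the character and move diagonally.
                table[i][j] = 1 + table[i + 1][j + 1]
            else:
                # Mismatch: drop u[i] or v[j], whichever leaves more.
                table[i][j] = max(table[i + 1][j], table[i][j + 1])

    return table[0][0]  # LCS length of the full strings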
```

---

## 🧠 Key Learnings

* Dynamic programming solves problems with overlapping subproblems efficiently by computing each subproblem once.
* LCW uses strictly diagonal recurrence; LCS uses more general branching.
* Base cases are essential to initialize recurrence.
* Recovering the answer requires tracking how each cell was computed.
* LCS underlies practical tools such as `diff`, version control systems, and DNA sequence comparison.