## KMP substring search

KMP (Knuth-Morris-Pratt) is a substring search algorithm.
It runs in time linear in the combined length of the text and the pattern.
It preprocesses the pattern so that, on a mismatch, we can reuse what has already matched and jump straight to the next possible position in the text and pattern, without re-reading text characters.

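The search state consists of
* m: the index in the text where the current match attempt starts
* i: the index in the pattern of the character being compared

We build a table T that tells us, on a mismatch at pattern index i, to reassign m = m + i - T[i] and then set i = T[i]. Here T[i] is the length of the longest proper prefix of pattern[:i] that is also a suffix of it. For i = 0 nothing has matched yet, so we simply advance m by one and keep i = 0. In the code below, m and i are called t and p, and each table entry is stored as a Jump with text_delta = i - T[i] and pattern_index = T[i].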
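
To make the update rule concrete, here is a minimal sketch that computes T by brute force and applies one jump. It assumes T[i] is the length of the longest proper prefix of pattern[:i] that is also its suffix, and that a mismatch at i = 0 just advances m by one. The helper names `border_length`, `naive_table` and `jump` are hypothetical, and the brute-force construction is only for illustration, not the linear-time one.

```python
def border_length(s: str) -> int:
    """Length of the longest proper prefix of s that is also a suffix of s."""
    for k in range(len(s) - 1, 0, -1):
        if s[:k] == s[-k:]:
            return k
    return 0


def naive_table(pattern: str) -> list[int]:
    # T[i] describes the already-matched part pattern[:i].
    return [border_length(pattern[:i]) for i in range(len(pattern))]


def jump(m: int, i: int, table: list[int]) -> tuple[int, int]:
    """Apply the mismatch rule and return the new (m, i)."""
    if i == 0:
        # Assumption: nothing matched, so move the attempt one step right.
        return m + 1, 0
    return m + i - table[i], table[i]


T = naive_table("ababc")
print(T)              # [0, 0, 0, 1, 2]
print(jump(0, 4, T))  # (2, 2)
```

In the last call, "abab" matched starting at m = 0 and the mismatch happened at i = 4. The trailing "ab" is also a prefix of the pattern, so the next attempt starts at m = 2 with two characters already matched.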

In [62]:
from dataclasses import dataclass


@dataclass(frozen=True)
class Jump:
    text_delta: int
    pattern_index: int


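def kmp_search(text: str, pattern: str) -> int:
    """Return the index of the first occurrence of pattern in text, or -1."""
    if not pattern:
        # Assumption: the empty pattern matches at index 0, mirroring str.find.
        return 0
    jumps = build_jumps(pattern)
    print(jumps)  # debug output: show the jump table
    t = 0  # index in text where the current match attempt starts
    p = 0  # index in pattern of the character being compared
    while t + p < len(text):
        if text[t + p] == pattern[p]:
            p += 1
            if p == len(pattern):
                return t
        else:
            # Mismatch: shift the attempt and keep the part that still matches.
            jump = jumps[p]
            p = jump.pattern_index
            t += jump.text_delta
    return -1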


def build_jumps(pattern: str) -> list[Jump]:
    if not pattern:
        return []

    # A mismatch at index 0 means nothing matched: advance the text by one.
    jumps = [Jump(text_delta=1, pattern_index=0)]
    # Length of the longest proper prefix of pattern[:i] that is also its suffix.
    border = 0
    for i in range(1, len(pattern)):
        # After a mismatch at i, the last `border` matched characters are kept.
        jumps.append(Jump(text_delta=i - border, pattern_index=border))
        # Update border from pattern[:i] to pattern[:i + 1].
        while border > 0 and pattern[i] != pattern[border]:
            border = jumps[border].pattern_index
        if pattern[i] == pattern[border]:
            border += 1

    return jumps

In [63]:
assert kmp_search("abc", "bc") == 1
assert kmp_search("bc", "bc") == 0
assert kmp_search("abcd", "bc") == 1
assert kmp_search("abcd", "bk") == -1
assert kmp_search("abababc", "abc") == 4
assert kmp_search("ababababc", "ababc") == 4

[Jump(text_delta=1, pattern_index=0), Jump(text_delta=1, pattern_index=0)]
[Jump(text_delta=1, pattern_index=0), Jump(text_delta=1, pattern_index=0)]
[Jump(text_delta=1, pattern_index=0), Jump(text_delta=1, pattern_index=0)]
[Jump(text_delta=1, pattern_index=0), Jump(text_delta=1, pattern_index=0)]
[Jump(text_delta=1, pattern_index=0), Jump(text_delta=1, pattern_index=0), Jump(text_delta=2, pattern_index=0)]
[Jump(text_delta=1, pattern_index=0), Jump(text_delta=1, pattern_index=0), Jump(text_delta=2, pattern_index=0), Jump(text_delta=2, pattern_index=1), Jump(text_delta=2, pattern_index=2)]
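
The search above stops at the first hit. As a rough sketch of how the same idea extends to reporting every occurrence, including overlapping ones, the block below scans the text once and never moves backwards in it. The names `prefix_table` and `kmp_find_all` are hypothetical, the table is indexed by the number of matched characters instead of storing Jump objects, and yielding nothing for an empty pattern is an assumption of this sketch.

```python
from typing import Iterator


def prefix_table(pattern: str) -> list[int]:
    """table[k] = length of the longest proper prefix of pattern[:k] that is also its suffix."""
    table = [0] * (len(pattern) + 1)
    k = 0
    for i in range(1, len(pattern)):
        # Fall back to shorter borders until pattern[i] can extend one.
        while k > 0 and pattern[i] != pattern[k]:
            k = table[k]
        if pattern[i] == pattern[k]:
            k += 1
        table[i + 1] = k
    return table


def kmp_find_all(text: str, pattern: str) -> Iterator[int]:
    """Yield the start index of every (possibly overlapping) occurrence."""
    if not pattern:
        return  # assumption: an empty pattern yields no matches
    table = prefix_table(pattern)
    p = 0  # number of pattern characters currently matched
    for pos, ch in enumerate(text):
        while p > 0 and ch != pattern[p]:
            p = table[p]
        if ch == pattern[p]:
            p += 1
        if p == len(pattern):
            yield pos - p + 1
            p = table[p]  # keep the longest border so overlaps are found


print(list(kmp_find_all("aaaa", "aa")))          # [0, 1, 2]
print(list(kmp_find_all("ababababc", "ababc")))  # [4]
```

Resetting p to table[p] after a hit, and not to 0, is what lets overlapping occurrences be reported without rescanning the text.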
