In [1]:
# KMP (Knuth-Morris-Pratt) Pattern Search algorithm

# The Knuth-Morris-Pratt (KMP) algorithm searches for a substring (W)
# in a given string (S) in O(m+n) time, where m and n are the lengths of W and S.
# It compares characters from left to right. Whenever a mismatch occurs,
# it uses a preprocessed table called the "prefix table" to avoid re-comparing characters that have already been matched.

# The naive string matching algorithm uses either a sliding window or a two-pointer approach,
# and restarts the comparison from the beginning of the pattern after every mismatch. Its worst-case time complexity is O(mn).
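
# For comparison, the naive approach described above can be sketched as follows.
# This is a minimal illustration, not part of the original notebook code;
# the function name naive_search is hypothetical.

```python
def naive_search(text, pattern):
    """Return every shift at which pattern occurs in text (brute force)."""
    n, m = len(text), len(pattern)
    shifts = []
    for s in range(n - m + 1):
        # Compare the window text[s:s+m] with the pattern, character by character.
        k = 0
        while k < m and text[s + k] == pattern[k]:
            k += 1
        if k == m:
            shifts.append(s)
    return shifts


print(naive_search('ABCABAABCABAC', 'CAB'))  # [2, 8]
```

# Every window starts again at pattern[0], which is where the O(mn) worst case comes from.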

# Components of the KMP algorithm:
# a. Prefix.
# b. Suffix.
# c. LPS table: for each prefix of the pattern, the length of the Longest Proper Prefix that is also a Suffix.
# In this algorithm, we keep two pointers, i into the string and j into the pattern, and consult the LPS table on a mismatch.
# We compare string[i] and pattern[j]. One of 3 things happens on each iteration of the while loop:
# a. The characters match: increment i and j.
# b. The characters mismatch and j > 0: set j to LPS[j - 1], leave i as it is, and compare again.
# c. The characters mismatch and j = 0: increment i and compare again.
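
# To make the meaning of the LPS table concrete, the sketch below computes it directly
# from the definition. It is deliberately slow and is only meant for checking small
# examples by hand; the function name lps_by_definition is hypothetical.

```python
def lps_by_definition(pattern):
    """LPS table computed straight from the definition (slow, for illustration)."""
    table = []
    for end in range(1, len(pattern) + 1):
        prefix = pattern[:end]
        best = 0
        # A proper prefix is strictly shorter than the string itself.
        for k in range(1, end):
            if prefix[:k] == prefix[-k:]:
                best = k
        table.append(best)
    return table


print(lps_by_definition('ABABCABAB'))  # [0, 0, 1, 2, 0, 1, 2, 3, 4]
```

# For example, the entry 4 at the end says that 'ABAB' is both a prefix and a suffix of 'ABABCABAB'.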

# Advantages of the KMP algorithm
# The most obvious advantage of the KMP algorithm is its time complexity.
# Its worst-case running time is linear, O(m+n), whereas the naive algorithm is O(mn).
# No input degrades it beyond this bound, so there are no accidental worst-case inputs.
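
# The difference can be seen by counting character comparisons on an input that is bad
# for the naive search. The sketch below is an illustration under stated assumptions:
# the pattern is non-empty, only the scan over the text is counted (not the table
# construction), and the names count_naive and count_kmp are hypothetical.

```python
def count_naive(text, pattern):
    """Character comparisons made by a brute-force search."""
    n, m = len(text), len(pattern)
    comparisons = 0
    for s in range(n - m + 1):
        for k in range(m):
            comparisons += 1
            if text[s + k] != pattern[k]:
                break
    return comparisons


def count_kmp(text, pattern):
    """Character comparisons made by the KMP scan (assumes a non-empty pattern)."""
    m = len(pattern)
    # Build the LPS table.
    lps = [0] * m
    length, i = 0, 1
    while i < m:
        if pattern[i] == pattern[length]:
            length += 1
            lps[i] = length
            i += 1
        elif length:
            length = lps[length - 1]
        else:
            i += 1
    # Scan the text, counting each text/pattern character comparison.
    comparisons = i = j = 0
    while i < len(text):
        comparisons += 1
        if text[i] == pattern[j]:
            i += 1
            j += 1
            if j == m:
                j = lps[j - 1]
        elif j:
            j = lps[j - 1]
        else:
            i += 1
    return comparisons


text = 'A' * 1000
pattern = 'A' * 9 + 'B'
print(count_naive(text, pattern), count_kmp(text, pattern))  # 9910 1991
```

# Each iteration of the KMP scan either advances i or shrinks j, so the scan never needs more than 2n comparisons.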

# Disadvantage of the KMP algorithm
# The main disadvantage of the KMP algorithm is that it is harder to understand than the naive approach.

# Applications of the KMP Algorithm
# Its uses include:
# Checking for plagiarism in documents
# Bioinformatics and DNA sequencing
# Digital forensics
# Spelling checkers
# Spam filters
# Search engines, or searching content in large databases
# Intrusion detection systems


In [2]:
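def KMP(text, pattern):
    # Base case 1: an empty pattern matches at shift 0.
    if not pattern:
        print('The pattern occurs with shift 0')
        return

    # Base case 2: the text is empty or shorter than the pattern.
    if not text or len(pattern) > len(text):
        print('Pattern not found')
        return

    # nxt[k] is the length of the longest proper prefix of pattern[:k]
    # that is also a suffix of it (named nxt to avoid shadowing the built-in next).
    nxt = [0] * (len(pattern) + 1)

    for i in range(1, len(pattern)):
        j = nxt[i]

        # Fall back to shorter candidates until pattern[i] extends one of them.
        while j > 0 and pattern[j] != pattern[i]:
            j = nxt[j]

        if pattern[j] == pattern[i]:
            nxt[i + 1] = j + 1

    # Scan the text. A while loop is needed because i must stay put on a fallback.
    i = j = 0
    while i < len(text):
        if j < len(pattern) and text[i] == pattern[j]:
            i += 1
            j += 1
            if j == len(pattern):
                print('Pattern occurs with shift', i - j)
        elif j > 0:
            j = nxt[j]
        else:
            i += 1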
 
 

 
text = 'ABCABAABCABAC'
pattern = 'CAB'
 
KMP(text, pattern)

Pattern occurs with shift 2
Pattern occurs with shift 8


In [3]:
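def KMPSearch(pat, txt):
    M = len(pat)
    N = len(txt)

    # An empty pattern has no LPS table, so there is nothing to search for.
    if M == 0:
        return

    # lps[k] is the length of the longest proper prefix of pat[:k+1]
    # that is also a suffix of it.
    lps = [0] * M
    j = 0  # index into pat

    # Preprocess the pattern.
    computeLPSArray(pat, M, lps)

    i = 0  # index into txt
    while i < N:
        if pat[j] == txt[i]:
            i += 1
            j += 1

        if j == M:
            print("Found pattern at index " + str(i - j))
            # Keep going to find later (possibly overlapping) matches.
            j = lps[j - 1]

        # Mismatch after j matched characters.
        elif i < N and pat[j] != txt[i]:
            # Reuse the matched prefix instead of restarting from pat[0].
            if j != 0:
                j = lps[j - 1]
            else:
                i += 1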
  
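def computeLPSArray(pat, M, lps):
    length = 0  # length of the previous longest prefix that is also a suffix

    lps[0] = 0  # a single character has no non-empty proper prefix
    i = 1

    # Fill lps[i] for i = 1 .. M - 1.
    while i < M:
        if pat[i] == pat[length]:
            length += 1
            lps[i] = length
            i += 1
        else:
            if length != 0:
                # Fall back to the next shorter candidate; i is not advanced.
                length = lps[length - 1]
            else:
                lps[i] = 0
                i += 1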
  
txt = "ABABDABACDABABCABAB"
pat = "ABABCABAB"
KMPSearch(pat, txt)
  

Found pattern at index 10
