# String Algorithms and Techniques
### String notations and concepts
Strings are essentially sequences of objects, usually sequences of characters. As with any other data type, such as `int` or `float`, we need to store the data and the operations that can be applied to it. The string data type stores the data, and Python provides a rich set of operations and functions that can be applied to it. Most of these operations and functions were described in Chapter 1.

Strings are mainly textual data that is generally handled very efficiently. The following is an example of a string:
```python 
"packt publishing" 
```
A substring is a contiguous sequence of characters that is part of a given string. For example, `"packt"` is a substring of the string `"packt publishing"`.

A subsequence is a sequence of characters obtained from a given string by removing some of its characters while keeping the remaining characters in their original order. For example, `"pct pblishing"` is a valid subsequence of the string `"packt publishing"`, obtained by removing the characters a, k, and u. However, it is not a substring, because its characters are not contiguous in the original string. Subsequences generalize substrings: every substring is a subsequence, but not every subsequence is a substring.
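
The difference between the two notions can be checked directly in Python. The sketch below uses the built-in `in` operator for the substring test and a small helper, `is_subsequence`, for the subsequence test; the helper name is an assumption made for this illustration and is not part of the standard library.

```python
def is_subsequence(candidate, text):
    """Return True if candidate can be formed by deleting characters from text."""
    remaining = iter(text)
    # `ch in remaining` advances the iterator until it finds ch,
    # so the characters must appear in the same order as in text.
    return all(ch in remaining for ch in candidate)

s = "packt publishing"
print("packt" in s)                        # True: contiguous, so a substring
print("pct pblishing" in s)                # False: not contiguous
print(is_subsequence("pct pblishing", s))  # True: order is preserved
```

Every substring passes the subsequence test as well, which is the sense in which subsequences generalize substrings.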

A prefix of a string `s` is a substring that occurs at the start of `s`, so that `s` consists of the prefix followed by another string, `u`. For example, the substring `"pack"` is a prefix of the string `s = "packt publishing"`, as it starts the string and is followed by another substring.

A suffix `d` is a substring that occurs at the end of the string `s`, so that there is another nonempty substring before `d`. For example, the substring `"shing"` is a suffix of the string `"packt publishing"`. Python has built-in methods to check whether a string has a given prefix or suffix, as shown in the following code snippet:

In [1]:
string = "this is data structures book by packt publisher"
suffix = "publisher"
prefix = "this"
print(string.endswith(suffix))   # Check if string ends with the given suffix
print(string.startswith(prefix)) # Check if string starts with the given prefix

True
True


Pattern matching algorithms are among the most important string processing algorithms, and we discuss them in the following sections.
### Pattern matching algorithms
A pattern matching algorithm determines the index positions at which a given pattern string occurs in a text string. It reports `"pattern not found"` if the pattern does not occur in the text. For example, for the text `s = "packt publisher"` and the pattern `p = "publisher"`, the algorithm returns index 6, the position in the text where the pattern starts.

In this section, we discuss four pattern matching algorithms: the brute-force method, the Rabin-Karp algorithm, the Knuth-Morris-Pratt (KMP) algorithm, and the Boyer-Moore algorithm.
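
As a point of reference before looking at the individual algorithms, Python's built-in `str.find` already behaves like such a matcher: it returns the first index of the pattern, or `-1` when the pattern is absent. The helper `find_all` below is a hypothetical wrapper, written only for this illustration, that collects every starting index.

```python
def find_all(text, pattern):
    """Return every index at which pattern starts in text (overlaps included)."""
    positions = []
    start = text.find(pattern)
    while start != -1:
        positions.append(start)
        start = text.find(pattern, start + 1)  # resume one past the last hit
    return positions

s = "packt publisher"
print(s.find("publisher"))         # 6
print(s.find("python"))            # -1
print(find_all("abababa", "aba"))  # [0, 2, 4]
```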
### The brute-force algorithm
The brute-force algorithm, or naive approach to pattern matching, is very basic. We simply test every possible alignment of the pattern against the text to find the positions where the pattern occurs. This approach is not suitable if the text is very long.

We start by comparing the characters of the pattern and the text one by one. If all the characters of the pattern match the text, we return the index position in the text where the first character of the pattern is placed. If any character of the pattern mismatches, we shift the pattern by one place and repeat the comparison from the new position.

Let's consider the Python implementation of the brute-force algorithm for pattern matching:

In [2]:
def brute_force(text, pattern):
    l1 = len(text)      # Length of the text to be searched
    l2 = len(pattern)   # Length of the pattern to look for
    i = 0
    j = 0               # Looping variables are set to 0
    flag = False        # Becomes True once the pattern is found at least once
    while i < l1:       # Iterate over every starting index in the text
        j = 0
        count = 0       # Number of pattern characters matched at this shift
        while j < l2:
            if i + j < l1 and text[i+j] == pattern[j]:  # Check whether the characters match
                count += 1                              # Record the match
            j += 1
        if count == l2:                                 # Every character of the pattern matched
            print("\nPattern occurs at index", i)       # Print the starting index of the match
            flag = True                                 # Remember that at least one match was found
        i += 1
    if not flag:                                        # The pattern never occurred
        print('\nPattern is not present in the text')

brute_force('acbcabccababcaacbcaabacbbc', 'acbcaa')    # function call


Pattern occurs at index 14


In the preceding code for the brute-force approach, we start by computing the lengths of the given text and pattern. We also initialize the looping variables to `0` and set `flag` to `False`. This variable records whether the pattern has been found at least once. If `flag` is still `False` when we reach the end of the text, there is no match of the pattern in the text at all.

Next, we start the searching loop from the 0<sup>th</sup> index to the end of the text string. In this loop, the `count` variable keeps track of how many characters of the pattern have matched the text at the current position. A nested loop then runs from the 0<sup>th</sup> index to the length of the pattern. Here, the variable `i` keeps track of the index position in the text string and the variable `j` keeps track of the characters in the pattern. We compare the characters of the pattern and the text string using the following code fragment:
```python
if i + j < l1 and text[i+j] == pattern[j]:
```
We increment the `count` variable after every character of the pattern that matches the text, and then continue with the next character. If `count` equals the length of the pattern once the inner loop finishes, there is a match.

We print the index position in the text if the pattern matches there, and set the `flag` variable to `True`; the outer loop then continues in order to find further matches of the pattern. Finally, if the value of `flag` is `False`, the pattern did not match anywhere in the text.

The best-case and worst-case time complexities of naive string matching are $\mathcal{O}(n)$ and $\mathcal{O}(m(n-m+1))$, respectively, where $n$ is the length of the text and $m$ is the length of the pattern. The best case occurs when the first character of the pattern is not present in the text at all, for example, if the text string is `ABAACEBCCDAAEE` and the pattern is `FAA`. Here, the first character of the pattern mismatches at every position, so the number of comparisons is proportional to the length of the text, $n$. This assumes that the inner loop stops at the first mismatch; the implementation above always compares all $m$ characters of the pattern at every position.

The worst case occurs when all characters of the text string and the pattern are the same, for example, if the text string is `AAAAAAAAAAA` and the pattern is `AAAA`. Another worst-case scenario occurs when only the last character is different, for example, if the text string is `AAAAAAAAAAAAAF` and the pattern is `AAAAF`. Thus, the worst-case time complexity is $\mathcal{O}(m(n-m+1))$.
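
The complexity bounds quoted here assume that the comparison at each shift stops at the first mismatch. A minimal sketch of such a variant is shown below; the function name `brute_force_indices` and the choice to return a list instead of printing are assumptions made for this illustration.

```python
def brute_force_indices(text, pattern):
    """Return the start index of every occurrence of pattern in text."""
    n, m = len(text), len(pattern)
    matches = []
    for i in range(n - m + 1):      # only shifts where the pattern still fits
        j = 0
        while j < m and text[i + j] == pattern[j]:
            j += 1                  # the loop ends at the first mismatch
        if j == m:                  # all m characters matched
            matches.append(i)
    return matches

print(brute_force_indices("acbcabccababcaacbcaabacbbc", "acbcaa"))  # [14]
print(brute_force_indices("A" * 11, "AAAA"))  # [0, 1, 2, 3, 4, 5, 6, 7]
```

Limiting `i` to `n - m + 1` shifts also removes the need for the `i + j < l1` bounds check.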
### Rabin-Karp algorithm
The Rabin-Karp pattern matching algorithm is an improved version of the brute-force approach for finding the location of a given pattern in a text string. It reduces the number of comparisons with the help of hashing. The hash function returns a numeric value for a given string. The algorithm is faster than the brute-force approach because it avoids unnecessary character-by-character comparisons. Instead, the hash value of the pattern is compared with the hash value of a substring of the text all at once. If the hash values do not match, the pattern is moved one position, with no need to compare the characters of the pattern one by one.

The algorithm is based on the observation that if two strings are equal, their hash values are also equal, so a hash match identifies a candidate position. The main problem is that two different strings can have equal hash values. Such a false candidate is known as a spurious hit. To avoid reporting it, after matching the hash values of the pattern and the substring, we confirm that the pattern actually matches by comparing them character by character.
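
To make the idea of a spurious hit concrete, the following sketch hashes each window of the text as the sum of its ordinal values and labels every hash match as genuine or spurious. The names `ordinal_sum` and `classify_hash_hits` and the sample strings are assumptions chosen for this illustration.

```python
def ordinal_sum(s):
    """Hash a string as the sum of its characters' ordinal values."""
    return sum(ord(ch) for ch in s)

def classify_hash_hits(text, pattern):
    """List (index, window, kind) for every window whose hash equals the pattern's."""
    m = len(pattern)
    target = ordinal_sum(pattern)
    hits = []
    window_hash = ordinal_sum(text[:m])          # hash of the first window
    for i in range(len(text) - m + 1):
        if i > 0:
            # rolling update: add the character that entered, drop the one that left
            window_hash += ord(text[i + m - 1]) - ord(text[i - 1])
        if window_hash == target:
            window = text[i:i + m]
            kind = "match" if window == pattern else "spurious hit"
            hits.append((i, window, kind))
    return hits

for hit in classify_hash_hits("cababc", "abc"):
    print(hit)
# (0, 'cab', 'spurious hit')
# (3, 'abc', 'match')
```

Because `"cab"` and `"abc"` contain the same characters, their ordinal sums are equal, and only the character-by-character check tells them apart.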

The Rabin-Karp pattern matching algorithm works as follows:

1. First, we preprocess the pattern before starting the search, that is, we compute the hash value of the pattern of length $m$ and the hash values of all the possible substrings of the text of length $m$. The total number of such substrings is $n-m+1$, where $n$ is the length of the text.
2. We compare the hash value of the pattern with the hash values of the substrings of the text one by one.
3. If the hash values do not match, we move the pattern by one position.
4. If the hash value of the pattern and the hash value of the substring of the text match, we compare the pattern and the substring character by character to ensure that the pattern is actually found in the text.
5. We repeat steps 2 through 4 until we reach the end of the given text string.

In this algorithm, we can compute the numerical hash values using Horner's rule or any other hash function that maps the given string to a number. We can also compute the hash value as the sum of the ordinal values of all the characters of the string. An example is given on pages 293-294.
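
For comparison with the ordinal-sum hash used in the implementation that follows, here is a minimal sketch of a Horner's-rule (polynomial) hash with a rolling update. The constants `BASE` and `MOD` and the function names are assumptions made for this illustration; they are not taken from the book. Like any hash, this one can still collide, so a hash match has to be verified character by character.

```python
BASE = 256                # assumed radix, one "digit" per character
MOD = (1 << 61) - 1       # assumed modulus (a Mersenne prime)

def horner_hash(s, base=BASE, mod=MOD):
    """Evaluate the string as a base-`base` number using Horner's rule."""
    h = 0
    for ch in s:
        h = (h * base + ord(ch)) % mod
    return h

def rolling_hashes(text, m, base=BASE, mod=MOD):
    """Hashes of every length-m window of text, each derived from the previous one."""
    n = len(text)
    if m == 0 or m > n:
        return []
    high = pow(base, m - 1, mod)                    # weight of the leading character
    h = horner_hash(text[:m], base, mod)
    hashes = [h]
    for i in range(1, n - m + 1):
        h = (h - ord(text[i - 1]) * high) % mod     # remove the leading character
        h = (h * base + ord(text[i + m - 1])) % mod # shift and append the new one
        hashes.append(h)
    return hashes

text = "packt publishing"
print(rolling_hashes(text, 5)[0] == horner_hash("packt"))   # True
print(horner_hash("abc") == horner_hash("cab"))             # False
```

Unlike the ordinal sum, this hash depends on the order of the characters, so anagrams such as `"abc"` and `"cab"` no longer collide.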
### Implementing the Rabin-Karp algorithm
The first step to implementing the Rabin-Karp algorithm is to choose the hash function. We use the sum of all the ordinal values of the characters of the string as the hashing function. We start by storing the ordinal values of all the characters of the text and the pattern. Next, we store the length of the text and the pattern in `len_text` and `len_pattern` variables. Next, we compute the hash value for the pattern by summing up the ordinal values of all the characters in the pattern.

Next, we create a variable called `len_hash_array` that stores the total number of possible substrings of length `len_pattern`, computed as `len_text - len_pattern + 1`, and we create an array called `hash_text` that stores the hash values of all the possible substrings.

Next, we start a loop that will run for all the possible substrings of the text. Initially, we compute the hash value for the first substring by summing the ordinal values of all of its characters using `sum(ord_text[:len_pattern])`. Furthermore, the hash values for all of the substrings are computed using the hash values of its previous substrings as `((hash_text[i-1] - ord_text[i-1]) + ord_text[i+len_pattern-1])`.

The complete Python implementation to compute the hashing values is shown here:

In [3]:
def generate_hash(text, pattern):
    ord_text = [ord(i) for i in text]           # Stores unicode value of each character in the text
    ord_pattern = [ord(j) for j in pattern]     # Stores unicode value of each character in the pattern
    len_text = len(text)
    len_pattern = len(pattern)
    hash_pattern = sum(ord_pattern)
    len_hash_array = len_text - len_pattern + 1 # Stores the length of the new array that will contain
                                                # the hash values of the text
    hash_text = [0] * (len_hash_array)          # Initialize all the values in the array to zero
    for i in range(0, len_hash_array):
        if i == 0:
            # Initial value of hash function
            hash_text[i] = sum(ord_text[:len_pattern])
        else:
            # Calculating next hash value using previous value
            hash_text[i] = ((hash_text[i-1] - ord_text[i-1]) + ord_text[i+len_pattern-1])
            
    return [hash_text, hash_pattern]            # Return the hash values

After preprocessing the pattern and text, we have precomputed hash values that we use for comparing the pattern and the text. The implementation of the main Rabin-Karp algorithm works as follows. First, we convert the given text and pattern to strings, as ordinal values can only be computed for strings.

Next, we call the `generate_hash` function to compute the hash values. We also store the lengths of the text and the pattern in the `len_text` and `len_pattern` variables, and we initialize the `flag` variable to `False` so that it keeps track of whether the pattern is present in the text at least once.

Next, we start a loop that implements the main concept of the algorithm. This loop runs over the length of `hash_text`, which is the total number of possible substrings. At each position, we compare the hash value of the substring with the hash of the pattern by using `if hash_text[i] == hash_pattern`. If they do not match, we do nothing and move on to the next substring. If they match, we compare the substring and the pattern character by character through a loop by using `if pattern[j] == text[i+j]`.

We use a `count` variable to keep track of how many characters match between the pattern and the substring. If `count` equals the length of the pattern, all of the characters match, and the index at which the pattern was found is printed. Finally, if the `flag` variable remains `False`, the pattern does not occur in the text at all.

The complete Python implementation of the Rabin-Karp algorithm is shown here:

In [6]:
def rabin_karp_matcher(text, pattern):
    text = str(text)                 # Convert text to a string type
    pattern = str(pattern)           # Convert the pattern to a string type
    hash_text, hash_pattern = generate_hash(text, pattern) # Generate hash values using generate_hash functions
    len_text = len(text)
    len_pattern = len(pattern)
    flag = False                     # Checks if pattern is present at least once or not at all
    for i in range(len(hash_text)):
        if hash_text[i] == hash_pattern:
            count = 0
            for j in range(len_pattern):
                # Comparing pattern and substring character by character
                if pattern[j] == text[i+j]:
                    count += 1
                else:
                    break
            # Pattern is found in the text
            if count == len_pattern:
                flag = True          # Update flag accordingly
                print("Pattern occurs at index", i)
    if not flag:
        print("Pattern is not present in the text.")

In [8]:
# Tests
# Works for numeric
rabin_karp_matcher("101110000011010010101101","1011")

# Works for alphabets
rabin_karp_matcher("ABBACCADABBACCEDF","ACCE")

# Works for alpha numeric
rabin_karp_matcher("abc1-3klm890zsdoifjwej8cjv09wn vn09aej09jv 09wje09cj 09 j093j0 9j 092j3 09c09", "09w")

Pattern occurs at index 0
Pattern occurs at index 18
Pattern occurs at index 11
Pattern occurs at index 26
Pattern occurs at index 43


The Rabin-Karp pattern matching algorithm preprocesses the pattern before searching, that is, it computes the hash value for the pattern, which has a complexity of $\mathcal{O}(m)$. The worst-case running time of the Rabin-Karp algorithm is $\mathcal{O}(m(n-m+1))$. The worst case occurs when the hash values match at every position, whether through genuine matches or spurious hits, so that every substring has to be verified character by character. When hash matches are rare, the search takes roughly $\mathcal{O}(n+m)$ time.
### The Knuth-Morris-Pratt algorithm
The **Knuth-Morris-Pratt (KMP)** algorithm is a pattern matching algorithm based on a precomputed prefix function that stores information about overlapping portions of the pattern. The KMP algorithm preprocesses the pattern and uses the prefix function to avoid unnecessary comparisons: whenever there is a mismatch, the prefix function determines how far the pattern should be shifted. The KMP algorithm is efficient because it minimizes the comparisons of the pattern against the text string. An example illustrating the motivation behind the KMP algorithm is on page 297.
### The prefix function
The `prefix` function (also known as the failure function) finds occurrences of the pattern's own prefixes within the pattern. When there is a mismatch, it tells us how much of the previous comparisons can be reused because of repetition in the pattern. For each position, its value is the length of the longest proper prefix of the pattern that is also a suffix of the part of the pattern matched so far. An example is on pages 298-300.
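
As a small worked illustration, the sketch below computes the prefix function with ordinary 0-based indexing. The function name `prefix_function` and the sample pattern are assumptions for this example; the implementation later in this section uses 1-based indexing with a dummy character instead.

```python
def prefix_function(pattern):
    """pi[q] = length of the longest proper prefix of pattern[:q+1] that is also its suffix."""
    pi = [0] * len(pattern)
    k = 0                                   # length of the currently matched prefix
    for q in range(1, len(pattern)):
        while k > 0 and pattern[k] != pattern[q]:
            k = pi[k - 1]                   # fall back to the next shorter candidate
        if pattern[k] == pattern[q]:
            k += 1
        pi[q] = k
    return pi

print(prefix_function("ababaca"))   # [0, 0, 1, 2, 3, 0, 1]
```

For example, the value 3 at index 4 says that `"aba"` is both a prefix and a suffix of `"ababa"`.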
### Understanding KMP algorithms
The KMP pattern matching algorithm exploits overlaps within the pattern itself to avoid unnecessary comparisons. The main idea is to determine how far the pattern should be shifted, based on these overlaps. The algorithm works as follows:
1. First, we precompute the `prefix` function for the given pattern and initialize a counter, q, that represents the number of characters matched so far.
2. We start by comparing the first character of the pattern with the first character of the text string. If they match, we increment the counter q for the pattern and the counter for the text string, and we compare the next characters.
3. If there is a mismatch, we set q to the value of the precomputed `prefix` function for q.
4. We continue searching for the pattern in the text string until we reach the end of the text. Whenever all of the characters in the pattern are matched, we report the position where the pattern occurs in the text and continue to search for another match.

Another example is on pages 301-303. Computing the `prefix` function takes $\mathcal{O}(m)$ space and time, and the second phase, the search itself, takes $\mathcal{O}(n)$ time.
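
The steps above can be sketched compactly with 0-based indexing. The function name `kmp_search` and the decision to return a list of indices are assumptions for this illustration; it is meant as a reading aid rather than a replacement for the implementation that follows.

```python
def kmp_search(text, pattern):
    """Return the start index of every occurrence of pattern in text."""
    m = len(pattern)
    if m == 0:
        return []
    # Phase 1: prefix function, O(m)
    pi = [0] * m
    k = 0
    for q in range(1, m):
        while k > 0 and pattern[k] != pattern[q]:
            k = pi[k - 1]
        if pattern[k] == pattern[q]:
            k += 1
        pi[q] = k
    # Phase 2: scan the text once, O(n)
    matches = []
    q = 0                                   # number of pattern characters matched so far
    for i, ch in enumerate(text):
        while q > 0 and pattern[q] != ch:
            q = pi[q - 1]                   # mismatch: reuse the overlap, never step back in text
        if pattern[q] == ch:
            q += 1
        if q == m:                          # full match ending at index i
            matches.append(i - m + 1)
            q = pi[q - 1]                   # continue, allowing overlapping matches
    return matches

print(kmp_search("abcdeadivcedevandweorpja", "evan"))   # [12]
print(kmp_search("aabaacaadaabaaba", "aaba"))           # [0, 9, 12]
```

The text index `i` only ever moves forward, which is what gives the search phase its linear running time.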
### Implementing the KMP algorithm
The Python implementation of the KMP algorithm is explained here. We start by implementing the `prefix` function for the given pattern. For this, first, we compute the length of the pattern by using the `len()` function, and then we initialize a list to store the computed values by the `prefix` function.

Next, we start the loop that executes from 2 to the length of the pattern (the pattern is passed in with a dummy character at index 0, so its characters are indexed from 1). Inside it, a nested loop falls back to shorter prefixes as long as the characters do not match. The variable k is initialized to `0`, which is the value of the `prefix` function for the first element of the pattern. If the (k+1)<sup>th</sup> element of the pattern is equal to the q<sup>th</sup> element, then we increment the value of k by 1.

The value of k is the value of the `prefix` function at position q, and so we store it at index q of the list. Finally, we return the list that holds the computed `prefix` function value for each character of the pattern. The code for the prefix function is shown here:

In [9]:
def pfun(pattern):
    """Generate the prefix function for the given pattern (index 0 holds a dummy character)"""
    n = len(pattern)                     # Length of the pattern, including the dummy character
    prefix_fun = [0] * n                 # Initialize all elements of the list to 0
    k = 0
    for q in range(2, n):
        while k > 0 and pattern[k+1] != pattern[q]:
            k = prefix_fun[k]
        # If the kth element of the pattern is equal to the qth element
        if pattern[k+1] == pattern[q]:
            k += 1                       # Update k accordingly 
        prefix_fun[q] = k
    return prefix_fun                    # Return the prefix function

Once we have created the `prefix` function, we implement the main KMP matching algorithm. We start by computing the length of the text string and the pattern, which are stored in the variables `m` and `n`, respectively. The following code shows this:

In [10]:
def kmp_matcher(text, pattern):
    m = len(text)
    n = len(pattern)
    flag = False
    text = "-" + str(text)       # prepend a dummy character to start indexing at 1
    pattern = "-" + str(pattern) # same concept applies to the pattern string
    prefix_fun = pfun(pattern)   # generate prefix function for the pattern
    q = 0
    for i in range(1, m + 1):
        # While the pattern and text are not equal, decrement the value of q if it is > 0
        while q > 0 and pattern[q+1] != text[i]:
            q = prefix_fun[q]
        # if pattern and text are equal, update the value of q
        if pattern[q+1] == text[i]:
            q += 1
        # if q is equal to the length of the pattern, it means the pattern has been found
        if q == n:
            print("Pattern occurs with shift", i-n)  # print the index where the first match occurs
            flag = True
            q = prefix_fun[q]
    if not flag:
        print("\nNo match was found.")

In [12]:
# Test
kmp_matcher('aabaacaadaabaaba','aabac')              # function call with two parameters, text and pattern
kmp_matcher('abcdeadivcedevandweorpja', 'evan')


No match was found.
Pattern occurs with shift 12


### Boyer-Moore algorithm
As we have already discussed, the main objective of the string pattern matching algorithm is to find ways of skipping comparisons as much as possible by avoiding unnecessary comparisons.

The Boyer-Moore pattern matching algorithm is another such algorithm (apart from the KMP algorithm) that further improves the performance of pattern matching by skipping some comparisons. You need to understand the following concepts to be able to use the Boyer-Moore algorithm:
1. In this algorithm, we shift the pattern from left to right, similar to the KMP algorithm.
2. We compare the characters of the pattern and the text string from right to left, which is the opposite of the KMP algorithm.
3. The algorithm skips unnecessary comparisons by using the good-suffix and bad-character shift concepts.
### Understanding the Boyer-Moore algorithm
The Boyer-Moore algorithm compares the pattern with the text from right to left. It preprocesses the pattern to gather information about its possible alignments. The main idea of this algorithm is that we first compare the last character of the pattern with the text. If they do not match, there is no need for further comparisons at this position, and the pattern can be moved on. In addition, we can see which portion of the pattern has matched, so we use this information to align the text and the pattern while skipping unnecessary comparisons.

The Boyer-Moore algorithm has two heuristics to determine the maximum shift possible for the pattern when we find a mismatch:
- bad character heuristic
- good suffix heuristic

At the time of a mismatch, each of these heuristics suggests a possible shift, and the Boyer-Moore algorithm shifts the pattern by the larger of the two. The bad character and good suffix heuristics are explained with examples in the following subsections.
### Bad character heuristic
The Boyer-Moore algorithm compares the pattern and the text string from right to left and uses the bad character heuristic to shift the pattern. According to the bad character shift concept, if there is a mismatch between a character of the pattern and the text, we check whether the mismatched character of the text occurs in the pattern. If this mismatched character (known as the bad character) does not appear in the pattern, the pattern is shifted completely past it. If the character appears somewhere in the pattern, we shift the pattern to align that occurrence with the bad character of the text string. Examples of the bad character heuristic are shown on pages 306-308.
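
A minimal sketch of a search that uses the bad character rule on its own is given below. It precomputes the rightmost position of each character in the pattern; the names `last_occurrence` and `bad_character_search` are assumptions for this illustration, and the full Boyer-Moore algorithm would combine this shift with the good suffix shift.

```python
def last_occurrence(pattern):
    """Map each character to the index of its rightmost occurrence in pattern."""
    return {ch: idx for idx, ch in enumerate(pattern)}

def bad_character_search(text, pattern):
    """Right-to-left matching that shifts using the bad character rule alone."""
    n, m = len(text), len(pattern)
    if m == 0:
        return []
    last = last_occurrence(pattern)
    matches = []
    i = 0                                   # position of pattern[0] in the text
    while i <= n - m:
        j = m - 1
        while j >= 0 and pattern[j] == text[i + j]:
            j -= 1                          # compare from the right end
        if j < 0:
            matches.append(i)
            i += 1                          # conservative shift after a full match
        else:
            # Align the bad character text[i+j] with its rightmost occurrence in
            # the pattern; -1 (absent) moves the pattern completely past it.
            # max(1, ...) guards against a zero or negative shift.
            i += max(1, j - last.get(text[i + j], -1))
    return matches

print(bad_character_search("acbaacacababaacacac", "acacac"))   # [13]
```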
### Good suffix heuristic
The bad character heuristic does not always provide good suggestions. The Boyer-Moore algorithm therefore also uses the good suffix heuristic to shift the pattern over the text string to find the locations of the matched patterns.

The good suffix heuristic is based on the matched suffix. Here, we shift the pattern to the right in such a way that the matched suffix subpattern is aligned with another occurrence of the same suffix in the pattern. It works like this: we start by comparing the pattern and the text string from right to left. If we find any mismatch, then we check the occurrence of the suffix that we have matched so far. This is known as the good suffix. We shift the pattern in such a way that we align another occurrence of the good suffix to the text. The good suffix heuristic has two main cases:
1. The matching suffix has one or more occurrences in the pattern.
2. Some part of the matching suffix is present at the start of the pattern, meaning that a suffix of the matched suffix exists as a prefix of the pattern.

The book uses the same examples to explain the good suffix heuristic on pages 309-310. Preprocessing the pattern in the Boyer-Moore algorithm takes $\mathcal{O}(m)$ time, and the search takes $\mathcal{O}(mn)$ time in the worst case.
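
Both cases of the good suffix heuristic can be illustrated with a deliberately simple, brute-force sketch. The function name `good_suffix_shift` and the sample patterns are assumptions for this example. It implements only the basic form of the rule (it does not additionally require the character before the re-aligned suffix to differ), and it checks candidate shifts one by one instead of using the $\mathcal{O}(m)$ preprocessing.

```python
def good_suffix_shift(pattern, j):
    """Smallest shift that keeps the matched suffix pattern[j+1:] consistent.

    j is the index in the pattern at which the mismatch occurred.
    """
    m = len(pattern)
    for shift in range(1, m + 1):
        # After shifting, pattern[k - shift] lies under the text that matched
        # pattern[k]. Positions that fall off the left edge (k - shift < 0)
        # impose no constraint, which covers case 2, where only a prefix of
        # the pattern overlaps the end of the matched suffix.
        if all(pattern[k - shift] == pattern[k]
               for k in range(j + 1, m) if k - shift >= 0):
            return shift
    return m

print(good_suffix_shift("abcab", 2))    # 3: "ab" occurs again at the start (case 1)
print(good_suffix_shift("abxcab", 2))   # 4: only "ab", a suffix of "cab", is a prefix (case 2)
```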
### Implementing the Boyer-Moore algorithm
Initially, we have the text string and the pattern. After initializing the variables, we start a while loop that begins by comparing the last character of the pattern to the corresponding character of the text.

Then, the characters are compared from right to left by a nested loop that runs from the last index of the pattern to the first character of the pattern. This uses `range(len(pattern)-1, -1, -1)`.

The outer while loop keeps track of the index in the text string while the inner for loop keeps track of the index position of the pattern.

Next, we start comparing the characters by using `pattern[j] != text[i+j]`. If they are mismatched, we set the flag variable to `False`, denoting that there is a mismatch.

Now, we check whether a good suffix is present by using the condition `j == len(pattern)-1`. If this condition is true, the mismatch occurred at the last character of the pattern, so no good suffix is possible and we fall back on the bad character heuristic. We check whether the mismatched character is present in the pattern by using the condition `text[i+j] in pattern[0:j]`. If the condition is true, the bad character is present in the pattern. In this case, we move the pattern to align this bad character with its other occurrence in the pattern by using `i=i+j-pattern[0:j].rfind(text[i+j])`. Here, `i+j` is the index of the bad character.

If the bad character is not present in the pattern, we move the whole pattern next to the mismatched character by using the index `i=i+j+1`.

Next, we go into the `else` part of the condition to check the good suffix. When we find the mismatch, we further test to see whether we have any subpart of a good suffix present in the prefix of the pattern. We do this by using the following condition:
```python
text[i+j+k:i+len(pattern)] not in pattern[0:len(pattern)-1]
```
Furthermore, we check whether the length of the good suffix is 1. If the length of the good suffix is 1, we do not consider this shift. Otherwise, we compute the shift given by the good suffix heuristic and store it in the `gsshift` variable. This moves the pattern to a position where the good suffix of the text lines up with its occurrence in the pattern, found with `pattern[0:len(pattern)-1].rfind(text[i+j+k:i+len(pattern)])`. We also compute the shift given by the bad character heuristic and store it in the `bcshift` variable. Its value is `i+j-pattern[0:j].rfind(text[i+j])` when the bad character is present in the pattern, and `i+j+1` when the bad character is not present in the pattern. Both values are new positions for `i` rather than relative offsets.

Next, we move the pattern over the text string to the larger of the positions given by the bad character and good suffix heuristics by using the instruction `i=max((bcshift, gsshift))`. Finally, we check whether the flag variable is `True`. If it is, the pattern has been found, and the matched index is stored in the `matched_indices` variable.

The complete implementation of the Boyer-Moore algorithm is shown as follows:

In [14]:
text = "acbaacacababaacacac"
pattern = "acacac"

matched_indices = []
i = 0
flag = True

while i <= len(text)-len(pattern):
    for j in range(len(pattern) - 1, -1, -1): # Reverse searching
        if pattern[j] != text[i+j]:
            flag = False                      # indicates there is a mismatch
            if j == len(pattern) - 1:         # If good suffix is not present, we test bad character
                if text[i+j] in pattern[0:j]:
                    # i + j is the index of the bad character, this line is used for jumping pattern to
                    # match the bad character of text with the same character in the pattern
                    i = i + j - pattern[0:j].rfind(text[i+j])
                else:
                    i = i + j + 1             # if a bad character is not present, jump pattern next to it
            else:
                k = 1
                # Used for finding subpart of a good suffix
                while text[i+j+k:i+len(pattern)] not in pattern[0:len(pattern)-1]:
                    k += 1
                if len(text[i+j+k:i+len(pattern)]) != 1:  # good-suffix should not be one character
                    # Jumps pattern to a position where good-suffix of pattern matches with a good
                    # suffix of text
                    gsshift = i + j + k - pattern[0:len(pattern)-1].rfind(text[i+j+k:i+len(pattern)])
                else:
                    # gsshift = i + len(pattern)
                    gsshift = 0  # when good-suffix heuristic is NA, we prefer the bad character heuristic
                if text[i+j] in pattern[0:j]:
                    # i+j is the index of a bad character, this line is used for jumping pattern to match
                    # the bad character of text with the same character in the pattern
                    bcshift = i+j-pattern[0:j].rfind(text[i+j])
                else:
                    bcshift = i + j + 1
                i = max((bcshift, gsshift))
            break
    if flag:    # if pattern is found then normal iteration
        matched_indices.append(i)
        i = i+1
    else:   # again set flag to True so new string in text can be examined
        flag = True

print("Pattern found at", matched_indices)

Pattern found at [13]
