#### [Leetcode 0028 Easy] [Implement strStr()](https://leetcode.com/problems/implement-strstr/) (String)

Implement strStr().

Return the index of the first occurrence of needle in haystack, or -1 if needle is not part of haystack.

Example 1:
```
Input: haystack = "hello", needle = "ll"
Output: 2
```

Example 2:
```
Input: haystack = "aaaaa", needle = "bba"
Output: -1
```

Clarification:
* What should we return when needle is an empty string? This is a great question to ask during an interview.
* For the purpose of this problem, we will return 0 when needle is an empty string. This is consistent with C's strstr() and Java's indexOf().
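Python's built-in `str.find` follows the same convention, so it can serve as a reference to test against. A minimal sketch; the helper name `reference_strstr` is made up for illustration:

```python
def reference_strstr(haystack: str, needle: str) -> int:
    """Hypothetical helper: delegate to the built-in str.find,
    which returns the lowest matching index or -1."""
    return haystack.find(needle)


print(reference_strstr("hello", "ll"))   # 2
print(reference_strstr("hello", ""))     # 0, empty needle matches at index 0
print(reference_strstr("aaaaa", "bba"))  # -1
```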

In [6]:
class Solution:
    def strStr(self, haystack: str, needle: str) -> int:
        """
        Brute Force
        Time Complexity: O(nm)
        Space Complexity: O(1)
        """
        # Edge Case
        if not needle:
            return 0
        if not haystack:
            return -1
        
        n, m = len(haystack), len(needle)
        
        for i in range(0, n - m + 1):
            for j in range(0, m):
                if haystack[i + j] != needle[j]:
                    break
            else:
                return i
                
        return -1
    
if __name__ == "__main__":
    soln = Solution()
    
    print(soln.strStr(haystack="hello", needle="ll") == 2)
    print(soln.strStr(haystack="hello", needle="") == 0)
    print(soln.strStr(haystack="aaaaa", needle="bba") == -1)
    print(soln.strStr(haystack="", needle="bba") == -1)

True
True
True
True


<font color='blue'>*Solution*:</font>   (DS) Rabin-Karp, average case $O(n+m)$, worst case $O(nm)$
```
s2 = 'c d e'   -->   hash('c d e') = X0
s1 = 'a b c d e'
          |---|
      a b c     = X1,  X1 = 'a'*26^2 + 'b'*26^1 + 'c'*26^0
        b c d   = X2,  X2 = (X1 - 'a'*26^2)*26 + 'd'
          c d e = X0
```
* Principle: If we can hash the pattern s2 to a unique integer (with a collision-free hash function), then we can compare the hash of each length-m substring of s1 with the hash of s2 instead of comparing them character by character.
* Assumption: only lower case letter (base = 26, a -- z)
```
a = 97 --> 0
b = 98 --> 1
c = 99 --> 2
...
z = 122--> 25
bcd = 123 (base 26) = 1*26^2 + 2*26^1 + 3*26^0 = 731
cde = 234 (base 26) = 2*26^2 + 3*26^1 + 4*26^0 = 1434
So, we have that
      a b c     = X1,  X1 = 0*26^2 + 1*26^1 + 2*26^0 = 28
        b c d   = X2,  X2 = 1*26^2 + 2*26^1 + 3*26^0 = 731
                       X2 = (X1 - 0*26^2)*26 + 3*26^0 = 731
          c d e = X0,  X0 = 2*26^2 + 3*26^1 + 4*26^0 = 1434
```
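* Initialization: compute hash(s2) and the hash of the first window of s1 -- O(m)
* Each time we move the sliding window -- O(1) * (n-m) = O(n-m)
* Total Time Complexity: O(m) + O(n - m) = O(n) on average. Whenever two hashes match, the window must still be verified character by character in O(m), so many collisions push the worst case to O(nm).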
```
char --> int,  ord('a') = 97
int --> char,  chr(97)  = 'a'
```
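The digit mapping and the base-26 values from the worked example can be checked directly with `ord`. This is a small sketch under the lowercase-only assumption; `digit` and `window_value` are illustrative names, not part of the solution below:

```python
BASE = 26  # lowercase letters only, as assumed above


def digit(ch: str) -> int:
    """Map 'a'..'z' to 0..25."""
    return ord(ch) - ord('a')


def window_value(s: str) -> int:
    """Read s as a base-26 number, most significant digit first."""
    m = len(s)
    return sum(digit(ch) * BASE ** (m - 1 - k) for k, ch in enumerate(s))


print(window_value("abc"))  # 28
print(window_value("bcd"))  # 731
print(window_value("cde"))  # 1434
```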
How to calculate X1 step by step?
```
initialization: X1 = 0, power = 1
get 'a':        X1 = X1 * 26 + (ord('a') - 97) = 0
                power = 1 
get 'b':        X1 = X1 * 26 + (ord('b') - 97) = 0 * 26 ^ 1 + 1 = 1
                power = power * 26 = 26 ^ 1
get 'c':        X1 = X1 * 26 + (ord('c') - 97) = 0 * 26 ^ 2 + 1 * 26 ^ 1 + 2 = 28
                power = power * 26 = 26 ^ 2
```
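The same accumulation can be written as a loop. The sketch below assumes lowercase input and no modulus (so values grow without bound on long strings); `build_hash` is a hypothetical helper name:

```python
from typing import Tuple


def build_hash(s: str, base: int = 26) -> Tuple[int, int]:
    """Return (value, power) for s, where power = base ** (len(s) - 1)
    is the weight of the leftmost character."""
    value, power = 0, 1
    for k, ch in enumerate(s):
        if k != 0:
            power *= base  # one more digit to the left of the newest char
        value = value * base + (ord(ch) - ord('a'))
    return value, power


print(build_hash("abc"))  # (28, 676)
```

Keeping `power` alongside the value is what later lets the leftmost character be removed in O(1).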

How to move from X1 to X2?
```
initialization: X2 = X1
kick 'a':       X2 = X2 - power * (ord('a') - 97) = 1 * 26 ^ 1 + 2 = 28
get 'd':        X2 = X2 * 26 + (ord('d') - 97) = 1 * 26 ^ 2 + 2 * 26 ^ 1 + 3 = 731
```
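The kick/get transition can be sketched as a single O(1) update. As before, this assumes lowercase letters and omits the modulus that a real implementation would add; `roll` and `digit` are illustrative names:

```python
def digit(ch: str) -> int:
    """Map 'a'..'z' to 0..25."""
    return ord(ch) - ord('a')


def roll(value: int, power: int, old_ch: str, new_ch: str, base: int = 26) -> int:
    """Slide the window one step: drop old_ch on the left, append new_ch."""
    value -= power * digit(old_ch)       # kick the oldest char
    return value * base + digit(new_ch)  # shift left and add the new char


power = 26 ** 2                # weight of the leftmost char in a 3-char window
x1 = 28                        # value of 'abc'
x2 = roll(x1, power, 'a', 'd')
x0 = roll(x2, power, 'b', 'e')
print(x2, x0)  # 731 1434
```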

In [20]:
class Solution:
    def __init__(self):
        self.base = 26
        self.mod = 997 # a prime number for hashing
        
    def strStr(self, haystack: str, needle: str) -> int:
        """
        Rabin Karp
        Time Complexity: Average O(n+m), Worst O(nm)
        Space Complexity: O(1)
        """
        # Edge Case
        if not needle:
            return 0
        if not haystack:
            return -1
        
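        n, m = len(haystack), len(needle)
        # A needle longer than the haystack can never match; without this
        # guard the initialization loop would index past the haystack.
        if m > n:
            return -1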
        
        hay_hash, ndl_hash = 0, 0
        power = 1
        
        # initialization
        for i in range(0, m):
            if i != 0:
                power = (power * self.base) % self.mod
            hay_hash = (hay_hash * self.base + ord(haystack[i])) % self.mod
            ndl_hash = (ndl_hash * self.base + ord(needle[i])) % self.mod
                
        # transition
        for i in range(m, n):
            if hay_hash == ndl_hash and needle == haystack[i-m:i]:
                return i - m
            # drop out the oldest char
            hay_hash = hay_hash - ((power * ord(haystack[i-m]))) % self.mod
            if hay_hash < 0:
                hay_hash = hay_hash + self.mod
            # add a new char
            hay_hash = (hay_hash * self.base + ord(haystack[i])) % self.mod
            
        # post-processing
        if hay_hash == ndl_hash and needle == haystack[n-m:]:
            return n - m
         
        return -1
    
if __name__ == "__main__":
    soln = Solution()
    
    print(soln.strStr(haystack="hello", needle="ll") == 2)
    print(soln.strStr(haystack="hello", needle="") == 0)
    print(soln.strStr(haystack="aaaaa", needle="bba") == -1)
    print(soln.strStr(haystack="", needle="bba") == -1)

True
True
True
True
