Skip to content

Latest commit

 

History

History
215 lines (171 loc) · 6.37 KB

28. Implement strStr().md

File metadata and controls

215 lines (171 loc) · 6.37 KB

28. Implement strStr()

Tag

  • String Matching, KMP
  • Amazon, Google

Description

Implement strStr().

Given two strings needle and haystack, return the index of the first occurrence of needle in haystack, or -1 if needle is not part of haystack.

Clarification:

What should we return when needle is an empty string? This is a great question to ask during an interview.

For the purpose of this problem, we will return 0 when needle is an empty string. This is consistent to C's strstr() and Java's indexOf().

Example 1:

Input: haystack = "hello", needle = "ll"
Output: 2

Example 2:

Input: haystack = "aaaaa", needle = "bba"
Output: -1

Constraints:

  • 1 <= haystack.length, needle.length <= 104
  • haystack and needle consist of only lowercase English characters.

Solution

Brute Force

  • The Brute Force solution is just checking every range in the haystack of length needle.length() to see if we have a match, which involves many repeated operations.
class Solution {
    public int strStr(String haystack, String needle) {
        int sLength = needle.length();
        int lLength = haystack.length();
        // Corner cases
        if (sLength == 0) {
            return 0;
        }
        if (sLength > lLength) {
            return -1;
        }
        // Loop every starting position to check occurrence
        for (int start = 0; start <= haystack.length() - needle.length(); start++) {
            int i = 0;
            while (i < sLength) {
                if (needle.charAt(i) != haystack.charAt(start + i)) {
                    break;
                }
                i++;
            }
            if (i == sLength) {
                return start;
            }
        }
        return -1;
    }
}
  • TC: O(m*n)
  • SC: O(1)

Rabin-Karp

  • It's like useing hash to match, and we can save time when calculating the new window's hash code
public class Strstr {
	// Method 2: RabinKarp
  public int strstrII(String large, String small) {
    if (large.length() < small.length()) {
      return -1;
    }
    // return 0 if small is empty.
    if (small.length() == 0) {
      return 0;
    }
    // We need a large prime as module end.
    int largePrime = 101;
    // We also need a small prime to calculate the hash value
    // (since the charset would be very large, e.g. 1,112,064 for using UTF,
    // we can not use that number).
    int prime = 31;
    int seed = 1;
    // hash value is calculated using the smallPrime and taken the module
    // operation on largePrime.
    // hash = (s0*smallP^k + s1*smallP^(k-1) + ... + sk*smallP^0) % largeP
    int targetHash = small.charAt(0) % largePrime;
    for (int i = 1; i < small.length(); i++) {
      seed = moduleHash(seed, 0, prime, largePrime);
      targetHash = moduleHash(targetHash, small.charAt(i), prime, largePrime);
    }
    int hash = 0;
    for (int i = 0; i < small.length(); i++) {
      hash = moduleHash(hash, large.charAt(i), prime, largePrime);
    }
    if (hash == targetHash && equals(large, 0, small)) {
      return 0;
    }
    for (int i = 1; i <= large.length() - small.length(); i++) {
      // we need to make sure the module number is non-negative.
      hash = nonNegative(hash - seed * large.charAt(i - 1) % largePrime, largePrime);
      hash = moduleHash(hash, large.charAt(i + small.length() - 1), prime, largePrime);
      // Notice: If the hash matches, it does not mean we really find a
      // substring match.
      // Because there is collision, we need to check if the substring really
      // matches the small string.
      // On average, this will not increase the time complexity, the probability
      // of collision is O(1) so we still have O(N + M).
      if (hash == targetHash && equals(large, i, small)) {
        return i;
      }
    }
    return -1;
  }

  public boolean equals(String large, int start, String small) {
    for (int i = 0; i < small.length(); i++) {
      if (large.charAt(i + start) != small.charAt(i)) {
        return false;
      }
    }
    return true;
  }

  public int moduleHash(int hash, int addition, int prime, int largePrime) {
    return (hash * prime % largePrime + addition) % largePrime;
  }

  public int nonNegative(int hash, int largePrime) {
    if (hash < 0) {
      hash += largePrime;
    }
    return hash;
  }
}
  • TC: O(m + n)
  • SC: O(1)

KMP

  • KMP saves time by using more space, it reduces the times of repeated matching. When we encounter mismatch, we can directly jump to the next position to start matching.

  • I do not completely understand this algorithm

class Solution {
    // KMP 算法
    // ss: 原串(string)  pp: 匹配串(pattern)
    public int strStr(String ss, String pp) {
        if (pp.isEmpty()) return 0;
        
        // 分别读取原串和匹配串的长度
        int n = ss.length(), m = pp.length();
        // 原串和匹配串前面都加空格,使其下标从 1 开始
        ss = " " + ss;
        pp = " " + pp;

        char[] s = ss.toCharArray();
        char[] p = pp.toCharArray();

        // 构建 next 数组,数组长度为匹配串的长度(next 数组是和匹配串相关的)
        int[] next = new int[m + 1];
        // 构造过程 i = 2,j = 0 开始,i 小于等于匹配串长度 【构造 i 从 2 开始】
        for (int i = 2, j = 0; i <= m; i++) {
            // 匹配不成功的话,j = next(j)
            while (j > 0 && p[i] != p[j + 1]) j = next[j];
            // 匹配成功的话,先让 j++
            if (p[i] == p[j + 1]) j++;
            // 更新 next[i],结束本次循环,i++
            next[i] = j;
        }

        // 匹配过程,i = 1,j = 0 开始,i 小于等于原串长度 【匹配 i 从 1 开始】
        for (int i = 1, j = 0; i <= n; i++) {
            // 匹配不成功 j = next(j)
            while (j > 0 && s[i] != p[j + 1]) j = next[j];
            // 匹配成功的话,先让 j++,结束本次循环后 i++
            if (s[i] == p[j + 1]) j++;
            // 整一段匹配成功,直接返回下标
            if (j == m) return i - m;
        }

        return -1;
    }
}
  • TC: O(m + n)
  • SC: O(m) -- where m is the length of the haystack