28. Implement strStr()

Tag

String Matching, KMP
Amazon, Google

Description

Given two strings needle and haystack, return the index of the first occurrence of needle in haystack, or -1 if needle is not part of haystack.

Clarification:

What should we return when needle is an empty string? This is a great question to ask during an interview.

For the purpose of this problem, we will return 0 when needle is an empty string. This is consistent to C's strstr() and Java's indexOf().

Example 1:

Input: haystack = "hello", needle = "ll"
Output: 2

Example 2:

Input: haystack = "aaaaa", needle = "bba"
Output: -1

Constraints:

1 <= haystack.length, needle.length <= 104
haystack and needle consist of only lowercase English characters.

Solution

Brute Force

The Brute Force solution is just checking every range in the haystack of length needle.length() to see if we have a match, which involves many repeated operations.

class Solution {
    public int strStr(String haystack, String needle) {
        int sLength = needle.length();
        int lLength = haystack.length();
        // Corner cases
        if (sLength == 0) {
            return 0;
        }
        if (sLength > lLength) {
            return -1;
        }
        // Loop every starting position to check occurrence
        for (int start = 0; start <= haystack.length() - needle.length(); start++) {
            int i = 0;
            while (i < sLength) {
                if (needle.charAt(i) != haystack.charAt(start + i)) {
                    break;
                }
                i++;
            }
            if (i == sLength) {
                return start;
            }
        }
        return -1;
    }
}

TC: O(m*n)
SC: O(1)

Rabin-Karp

It's like useing hash to match, and we can save time when calculating the new window's hash code

public class Strstr {
	// Method 2: RabinKarp
  public int strstrII(String large, String small) {
    if (large.length() < small.length()) {
      return -1;
    }
    // return 0 if small is empty.
    if (small.length() == 0) {
      return 0;
    }
    // We need a large prime as module end.
    int largePrime = 101;
    // We also need a small prime to calculate the hash value
    // (since the charset would be very large, e.g. 1,112,064 for using UTF,
    // we can not use that number).
    int prime = 31;
    int seed = 1;
    // hash value is calculated using the smallPrime and taken the module
    // operation on largePrime.
    // hash = (s0*smallP^k + s1*smallP^(k-1) + ... + sk*smallP^0) % largeP
    int targetHash = small.charAt(0) % largePrime;
    for (int i = 1; i < small.length(); i++) {
      seed = moduleHash(seed, 0, prime, largePrime);
      targetHash = moduleHash(targetHash, small.charAt(i), prime, largePrime);
    }
    int hash = 0;
    for (int i = 0; i < small.length(); i++) {
      hash = moduleHash(hash, large.charAt(i), prime, largePrime);
    }
    if (hash == targetHash && equals(large, 0, small)) {
      return 0;
    }
    for (int i = 1; i <= large.length() - small.length(); i++) {
      // we need to make sure the module number is non-negative.
      hash = nonNegative(hash - seed * large.charAt(i - 1) % largePrime, largePrime);
      hash = moduleHash(hash, large.charAt(i + small.length() - 1), prime, largePrime);
      // Notice: If the hash matches, it does not mean we really find a
      // substring match.
      // Because there is collision, we need to check if the substring really
      // matches the small string.
      // On average, this will not increase the time complexity, the probability
      // of collision is O(1) so we still have O(N + M).
      if (hash == targetHash && equals(large, i, small)) {
        return i;
      }
    }
    return -1;
  }

  public boolean equals(String large, int start, String small) {
    for (int i = 0; i < small.length(); i++) {
      if (large.charAt(i + start) != small.charAt(i)) {
        return false;
      }
    }
    return true;
  }

  public int moduleHash(int hash, int addition, int prime, int largePrime) {
    return (hash * prime % largePrime + addition) % largePrime;
  }

  public int nonNegative(int hash, int largePrime) {
    if (hash < 0) {
      hash += largePrime;
    }
    return hash;
  }
}

TC: O(m + n)
SC: O(1)

KMP

KMP saves time by using more space, it reduces the times of repeated matching. When we encounter mismatch, we can directly jump to the next position to start matching.
I do not completely understand this algorithm

class Solution {
    // KMP 算法
    // ss: 原串(string)  pp: 匹配串(pattern)
    public int strStr(String ss, String pp) {
        if (pp.isEmpty()) return 0;
        
        // 分别读取原串和匹配串的长度
        int n = ss.length(), m = pp.length();
        // 原串和匹配串前面都加空格，使其下标从 1 开始
        ss = " " + ss;
        pp = " " + pp;

        char[] s = ss.toCharArray();
        char[] p = pp.toCharArray();

        // 构建 next 数组，数组长度为匹配串的长度（next 数组是和匹配串相关的）
        int[] next = new int[m + 1];
        // 构造过程 i = 2，j = 0 开始，i 小于等于匹配串长度 【构造 i 从 2 开始】
        for (int i = 2, j = 0; i <= m; i++) {
            // 匹配不成功的话，j = next(j)
            while (j > 0 && p[i] != p[j + 1]) j = next[j];
            // 匹配成功的话，先让 j++
            if (p[i] == p[j + 1]) j++;
            // 更新 next[i]，结束本次循环，i++
            next[i] = j;
        }

        // 匹配过程，i = 1，j = 0 开始，i 小于等于原串长度 【匹配 i 从 1 开始】
        for (int i = 1, j = 0; i <= n; i++) {
            // 匹配不成功 j = next(j)
            while (j > 0 && s[i] != p[j + 1]) j = next[j];
            // 匹配成功的话，先让 j++，结束本次循环后 i++
            if (s[i] == p[j + 1]) j++;
            // 整一段匹配成功，直接返回下标
            if (j == m) return i - m;
        }

        return -1;
    }
}

TC: O(m + n)
SC: O(m) -- where m is the length of the haystack

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

28. Implement strStr().md

28. Implement strStr().md

28. Implement strStr()

Tag

Description

Solution

Files

28. Implement strStr().md

Latest commit

History

28. Implement strStr().md

File metadata and controls

28. Implement strStr()

Tag

Description

Solution