- String Matching, KMP
- Amazon, Google
Implement strStr().
Given two strings needle
and haystack
, return the index of the first occurrence of needle
in haystack
, or -1
if needle
is not part of haystack
.
Clarification:
What should we return when needle
is an empty string? This is a great question to ask during an interview.
For the purpose of this problem, we will return 0 when needle
is an empty string. This is consistent to C's strstr() and Java's indexOf().
Example 1:
Input: haystack = "hello", needle = "ll"
Output: 2
Example 2:
Input: haystack = "aaaaa", needle = "bba"
Output: -1
Constraints:
1 <= haystack.length, needle.length <= 104
haystack
andneedle
consist of only lowercase English characters.
Brute Force
- The Brute Force solution is just checking every range in the haystack of length needle.length() to see if we have a match, which involves many repeated operations.
class Solution {
public int strStr(String haystack, String needle) {
int sLength = needle.length();
int lLength = haystack.length();
// Corner cases
if (sLength == 0) {
return 0;
}
if (sLength > lLength) {
return -1;
}
// Loop every starting position to check occurrence
for (int start = 0; start <= haystack.length() - needle.length(); start++) {
int i = 0;
while (i < sLength) {
if (needle.charAt(i) != haystack.charAt(start + i)) {
break;
}
i++;
}
if (i == sLength) {
return start;
}
}
return -1;
}
}
- TC: O(m*n)
- SC: O(1)
Rabin-Karp
- It's like useing hash to match, and we can save time when calculating the new window's hash code
public class Strstr {
// Method 2: RabinKarp
public int strstrII(String large, String small) {
if (large.length() < small.length()) {
return -1;
}
// return 0 if small is empty.
if (small.length() == 0) {
return 0;
}
// We need a large prime as module end.
int largePrime = 101;
// We also need a small prime to calculate the hash value
// (since the charset would be very large, e.g. 1,112,064 for using UTF,
// we can not use that number).
int prime = 31;
int seed = 1;
// hash value is calculated using the smallPrime and taken the module
// operation on largePrime.
// hash = (s0*smallP^k + s1*smallP^(k-1) + ... + sk*smallP^0) % largeP
int targetHash = small.charAt(0) % largePrime;
for (int i = 1; i < small.length(); i++) {
seed = moduleHash(seed, 0, prime, largePrime);
targetHash = moduleHash(targetHash, small.charAt(i), prime, largePrime);
}
int hash = 0;
for (int i = 0; i < small.length(); i++) {
hash = moduleHash(hash, large.charAt(i), prime, largePrime);
}
if (hash == targetHash && equals(large, 0, small)) {
return 0;
}
for (int i = 1; i <= large.length() - small.length(); i++) {
// we need to make sure the module number is non-negative.
hash = nonNegative(hash - seed * large.charAt(i - 1) % largePrime, largePrime);
hash = moduleHash(hash, large.charAt(i + small.length() - 1), prime, largePrime);
// Notice: If the hash matches, it does not mean we really find a
// substring match.
// Because there is collision, we need to check if the substring really
// matches the small string.
// On average, this will not increase the time complexity, the probability
// of collision is O(1) so we still have O(N + M).
if (hash == targetHash && equals(large, i, small)) {
return i;
}
}
return -1;
}
public boolean equals(String large, int start, String small) {
for (int i = 0; i < small.length(); i++) {
if (large.charAt(i + start) != small.charAt(i)) {
return false;
}
}
return true;
}
public int moduleHash(int hash, int addition, int prime, int largePrime) {
return (hash * prime % largePrime + addition) % largePrime;
}
public int nonNegative(int hash, int largePrime) {
if (hash < 0) {
hash += largePrime;
}
return hash;
}
}
- TC: O(m + n)
- SC: O(1)
KMP
-
KMP saves time by using more space, it reduces the times of repeated matching. When we encounter mismatch, we can directly jump to the next position to start matching.
-
I do not completely understand this algorithm
class Solution {
// KMP 算法
// ss: 原串(string) pp: 匹配串(pattern)
public int strStr(String ss, String pp) {
if (pp.isEmpty()) return 0;
// 分别读取原串和匹配串的长度
int n = ss.length(), m = pp.length();
// 原串和匹配串前面都加空格,使其下标从 1 开始
ss = " " + ss;
pp = " " + pp;
char[] s = ss.toCharArray();
char[] p = pp.toCharArray();
// 构建 next 数组,数组长度为匹配串的长度(next 数组是和匹配串相关的)
int[] next = new int[m + 1];
// 构造过程 i = 2,j = 0 开始,i 小于等于匹配串长度 【构造 i 从 2 开始】
for (int i = 2, j = 0; i <= m; i++) {
// 匹配不成功的话,j = next(j)
while (j > 0 && p[i] != p[j + 1]) j = next[j];
// 匹配成功的话,先让 j++
if (p[i] == p[j + 1]) j++;
// 更新 next[i],结束本次循环,i++
next[i] = j;
}
// 匹配过程,i = 1,j = 0 开始,i 小于等于原串长度 【匹配 i 从 1 开始】
for (int i = 1, j = 0; i <= n; i++) {
// 匹配不成功 j = next(j)
while (j > 0 && s[i] != p[j + 1]) j = next[j];
// 匹配成功的话,先让 j++,结束本次循环后 i++
if (s[i] == p[j + 1]) j++;
// 整一段匹配成功,直接返回下标
if (j == m) return i - m;
}
return -1;
}
}
- TC: O(m + n)
- SC: O(m) -- where m is the length of the haystack