# 1044. Longest Duplicate Substring
Given a string S, consider all duplicated substrings: (contiguous) substrings of S that occur 2 or more times.  (The occurrences may overlap.)

Return any duplicated substring that has the longest possible length.  (If S does not have a duplicated substring, the answer is "".)


Example 1:

Input: "banana"
Output: "ana"
Example 2:

Input: "abcd"
Output: ""

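A brute-force solution starts from len(S) - 1 and checks whether a duplicated substring of that length exists.
We decrement the length until we find a duplicate.
With N = len(S), this takes O(N^3) time: there are O(N) lengths, and storing the O(N) windows of each length in a hash set costs O(N) per window.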
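
A minimal sketch of that brute force, for comparison. The function name `brute_force_dup` is hypothetical; it slices every window of the current length into a set and stops at the first repeat.

```python
def brute_force_dup(S: str) -> str:
    """Try lengths from len(S) - 1 down to 1; return the first duplicate found."""
    for size in range(len(S) - 1, 0, -1):
        seen = set()
        for start in range(len(S) - size + 1):
            window = S[start:start + size]
            if window in seen:
                return window  # a repeat of this length exists
            seen.add(window)
    return ""  # no duplicated substring at all

print(brute_force_dup("banana"))  # ana
```

Because it tries lengths from the longest down, the first repeat it finds is already a longest one.
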
There are 2 bottlenecks which make this problem difficult:

1. How to search for the longest length that satisfies the condition
2. How to find a duplicate substring of a specified size

For the first bottleneck, we can binary search on the length, because if S has a duplicated substring of length i, then it also has a duplicated substring of every length j < i (take the first j characters of both occurrences). In other words, the existence of a duplicated substring is monotone in its length.
For example, with S = "banana" there is a duplicated substring of length 3, "ana", so there are also duplicated substrings of lengths 1 and 2, such as "a" and "an".
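
To see the monotonicity on the example, here is a small sketch (the helper `has_duplicate` is hypothetical) that tabulates whether a repeated substring of each length exists in "banana". The answers switch from True to False exactly once, which is what makes binary search on the length valid.

```python
def has_duplicate(S: str, L: int) -> bool:
    """True if some substring of length L occurs at least twice in S."""
    seen = set()
    for start in range(len(S) - L + 1):
        window = S[start:start + L]
        if window in seen:
            return True
        seen.add(window)
    return False

print([has_duplicate("banana", L) for L in range(1, 7)])
# [True, True, True, False, False, False]
```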

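With binary search, only O(log N) lengths are checked, and each check with a hash set of substrings costs O(N^2), so the time drops to O(N^2 log N).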
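
A sketch of this intermediate O(N^2 log N) version, assuming the check still stores raw substrings (no hashing yet). The names `longest_dup_binary_search` and `find_dup` are hypothetical; the loop uses the same "round the midpoint up" binary search as the notebook code.

```python
def longest_dup_binary_search(S: str) -> str:
    """Binary search on the length; each check stores the raw substrings."""

    def find_dup(L: int):
        # Return a repeated substring of length L, or None. Costs O(N * L).
        seen = set()
        for start in range(len(S) - L + 1):
            window = S[start:start + L]
            if window in seen:
                return window
            seen.add(window)
        return None

    # Invariant: a duplicate of length lo exists (length 0 trivially).
    lo, hi, best = 0, len(S) - 1, ""
    while lo < hi:
        mid = (lo + hi + 1) // 2  # round up so that lo = mid always makes progress
        dup = find_dup(mid)
        if dup is not None:
            lo, best = mid, dup
        else:
            hi = mid - 1
    return best

print(longest_dup_binary_search("banana"))  # ana
```

The upper bound is len(S) - 1 because a substring of length len(S) cannot occur twice.
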
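Next, for the second bottleneck, we can use Rabin-Karp with a rolling hash: the hash of each window is updated from the previous window's hash in O(1), so checking one length takes expected O(N) time and the whole solution expected O(N log N).
Note that different substrings can share a hash, so you need to deal with collisions: a hash match should be confirmed (for example, by comparing the two substrings) before it is accepted as a duplicate.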
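
The rolling update can be checked directly. This sketch (helper names `direct_hash` and `rolling_hashes` are hypothetical) uses the same base 26 and modulus 2**63 - 1 as the notebook code: it hashes every window of "banana" both from scratch and via the O(1) update, and confirms they agree. The update multiplies by the base to shift the window, adds the incoming character, and subtracts the outgoing character times base**L.

```python
def direct_hash(codes, base, mod):
    """Polynomial hash computed from scratch: sum of codes[k] * base**(len-1-k)."""
    h = 0
    for c in codes:
        h = (h * base + c) % mod
    return h

def rolling_hashes(S: str, L: int, base: int = 26, mod: int = 2**63 - 1):
    """Hashes of all length-L windows of S, each derived from the previous one in O(1)."""
    codes = [ord(ch) - ord('a') for ch in S]  # assumes lowercase letters
    p = pow(base, L, mod)  # weight of the character leaving the window
    cur = direct_hash(codes[:L], base, mod)
    out = [cur]
    for i in range(L, len(S)):
        # Shift by one digit, add the new character, drop the old one.
        cur = (cur * base + codes[i] - codes[i - L] * p) % mod
        out.append(cur)
    return out

S, L = "banana", 3
codes = [ord(ch) - ord('a') for ch in S]
rolled = rolling_hashes(S, L)
direct = [direct_hash(codes[i:i + L], 26, 2**63 - 1) for i in range(len(S) - L + 1)]
print(rolled == direct)        # True
print(rolled[1] == rolled[3])  # True: both windows are "ana"
```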

Binary search the length of the longest duplicated substring and call the helper function check(L) for each candidate length.
check(L) slides a window of length L over S,
computes the rolling hash of the string in this window,
records the hashes seen so far in a hash table,
and returns the window's start position as soon as it finds a duplicated string.
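
A standalone sketch of check(L) that handles collisions, under the same assumptions as the notebook code (lowercase S, base 26). The name `find_repeat_start` and its `mod` parameter are hypothetical; the parameter exists only so a tiny modulus can force collisions and show that confirming a match by comparing substrings prevents false positives.

```python
def find_repeat_start(S: str, L: int, base: int = 26, mod: int = 2**63 - 1):
    """Slide a length-L window over S; return the start of a window whose
    substring already appeared earlier, or None. Assumes lowercase S and L >= 1."""
    if L > len(S):
        return None  # no length-L window exists
    codes = [ord(ch) - ord('a') for ch in S]
    p = pow(base, L, mod)
    cur = 0
    for c in codes[:L]:
        cur = (cur * base + c) % mod
    seen = {cur: [0]}  # hash -> starts of earlier windows with that hash
    for i in range(L, len(S)):
        cur = (cur * base + codes[i] - codes[i - L] * p) % mod
        start = i - L + 1
        # A shared hash is only a candidate: confirm by comparing characters.
        for j in seen.get(cur, []):
            if S[j:j + L] == S[start:start + L]:
                return start
        seen.setdefault(cur, []).append(start)
    return None

print(find_repeat_start("banana", 3))          # 3 ("ana" first seen at 1)
print(find_repeat_start("abcd", 1, mod=2))     # None, despite hash collisions
```

With mod=2, "a" and "c" (and "b" and "d") share a hash, so a version that trusted hashes alone would wrongly report a duplicate in "abcd".
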

Complexity (N = len(S)):
* Time: O(N log N) expected
* Space: O(N)

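```python
from functools import reduce

def longestDupSubstring(S: str) -> str:
    # Map each lowercase letter to 0..25 (assumes S contains only 'a'-'z').
    char2int = [ord(ch) - ord('a') for ch in S]
    base = 26
    mod = 2**63 - 1  # a large prime modulus keeps collisions rare

    def check(L):
        """Return the start of a repeated length-L substring, or None."""
        p = pow(base, L, mod)  # weight of the character leaving the window
        cur = reduce(lambda x, y: (x * base + y) % mod, char2int[:L], 0)
        seen = {cur: [0]}  # hash -> start positions of windows with that hash
        for i in range(L, len(S)):
            # Roll the window: drop S[i-L], append S[i].
            cur = (cur * base + char2int[i] - char2int[i - L] * p) % mod
            start = i - L + 1
            # Equal hashes may be a collision, so compare the actual substrings.
            for j in seen.get(cur, []):
                if S[j:j + L] == S[start:start + L]:
                    return start
            seen.setdefault(cur, []).append(start)
        return None

    res, lo, hi = 0, 0, len(S)
    while lo < hi:
        mid = (lo + hi + 1) // 2
        pos = check(mid)
        if pos is not None:
            lo, res = mid, pos
        else:
            hi = mid - 1
    return S[res:res + lo]

print(longestDupSubstring("banana"))  # ana
```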