# Tira sequences

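Your task is to count how many substrings of a string contain each of the characters `t`, `i`, `r` and `a` at least once, in any order. Substrings at different positions are counted separately, even if their contents are equal.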

The time complexity of the algorithm should be $O(n)$.

In a file `sequences.py`, implement a function `count` that returns the desired count.

In [None]:
def count(s):
    # TODO

if __name__ == "__main__":
    print(count("aybabtu")) # 0
    print(count("tira")) # 1
    print(count("ritari")) # 6
    print(count("tiratiratira")) # 45
    print(count("xaxrxixtx")) # 4

*Explanation*: For example, the string `ritari` has six desired substrings: `itar`, `itari`, `rita`, `ritar`, `ritari` and `tari`.
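The six substrings listed above can be reproduced with a small brute-force enumeration. The sketch below is only a checking aid, not the required $O(n)$ solution, and the helper name `tira_substrings` is an assumption introduced here for illustration.

```python
def tira_substrings(s):
    """Return every substring of s (one entry per position) containing t, i, r and a."""
    required = set("tira")
    found = []
    for start in range(len(s)):
        # A valid substring needs at least four characters
        for end in range(start + 4, len(s) + 1):
            piece = s[start:end]
            if required <= set(piece):  # all four characters are present
                found.append(piece)
    return found

print(sorted(tira_substrings("ritari")))
# ['itar', 'itari', 'rita', 'ritar', 'ritari', 'tari']
```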

### Approach 1

In [89]:
import re

def count(s):
    # Four lookaheads: the pattern matches only at a position from which
    # 't', 'i', 'r' and 'a' each occur somewhere ahead (capture groups are not needed)
    pattern = re.compile(r'(?=.*t)(?=.*i)(?=.*r)(?=.*a)')
    
    n = len(s)
    result = 0
    
    # Check all substrings starting from each character in the string
    for i in range(n):
        for j in range(i+4, n+1):  # Substrings must be at least 4 characters long
            substring = s[i:j]
            if pattern.search(substring):
                result += 1
    
    return result

if __name__ == "__main__":
    print(count("aybabtu")) # 0
    print(count("tira")) # 1
    print(count("ritari")) # 6
    print(count("tiratiratira")) # 45
    print(count("xaxrxixtx")) # 4
    print(count("iamtr")) # 1


0
1
6
45
4
1


This code uses a nested loop to check every possible substring of the input string. For each substring, it uses a regular expression with positive lookaheads to ensure that `t`, `i`, `r`, and `a` are all present at least once. If the substring matches, it increments the result counter.

The time complexity of the provided code is $O(n^3)$. This is because the code uses two nested loops to generate all possible substrings of the input string, and for each substring, it performs a regular expression match.

Here’s a breakdown of the time complexity:
- The outer loop runs $n$ times.
- The inner loop runs $O(n)$ times for each iteration of the outer loop, since it checks every possible ending index for a substring starting at index `i`.
- The regular expression check inside the inner loop is applied to each substring. Scanning a substring of length $m$ for the four characters takes $O(m)$ time, and $m$ is $O(n)$ in the worst case, when the substring is nearly as long as the entire string. (With Python's backtracking engine, `search` may also retry from later start positions when a lookahead fails, so a failed check can cost more than one linear scan; the estimate below treats each check as a single scan.)
 
Multiplying these together, we get a time complexity of $O(n \times n \times n) = O(n^3)$.
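One factor of $n$ can be removed by dropping the per-substring check. The sketch below is not one of the approaches in this notebook. It is a variant introduced here for illustration, and the name `count_quadratic` is an assumption. For each start index it extends the end index only until all four characters have been seen, because every longer substring from the same start is then valid as well.

```python
def count_quadratic(s):
    """Count substrings containing t, i, r and a in O(n^2) time."""
    required = set("tira")
    n = len(s)
    result = 0
    for start in range(n):
        seen = set()
        for end in range(start, n):
            if s[end] in required:
                seen.add(s[end])
            if len(seen) == 4:
                # s[start:end+1] is the shortest valid substring from this start;
                # each of the n - end possible end positions from here on is valid.
                result += n - end
                break
    return result

print(count_quadratic("ritari"))  # 6
```

This is still too slow for the required $O(n)$ bound, but it shows that the regular expression is not needed to decide validity.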

### Approach 2

In [100]:
def count(s):
    # Initialize the last seen indices for 't', 'i', 'r', and 'a'
    last_seen = {'t': -1, 'i': -1, 'r': -1, 'a': -1}
    result = 0
    n = len(s)
    
    # Iterate through the string
    for i in range(n):
        if s[i] in last_seen:
            # Update the last seen index for the character
            last_seen[s[i]] = i
        
        # Find the earliest last seen index among 't', 'i', 'r', 'a'
        min_last_seen = min(last_seen.values())
        
        # If all characters have been seen at least once, add the number of valid substrings
        if min_last_seen != -1:
            result += min_last_seen + 1
    
    return result

if __name__ == "__main__":
    print(count("aybabtu")) # 0
    print(count("tira")) # 1
    print(count("ritari")) # 6
    print(count("tiratiratira")) # 45
    print(count("xaxrxixtx")) # 4
    print(count("iamtr")) # 1


0
1
6
45
4
1


We can use a **single-pass approach** that keeps track of the index of the last occurrence of each of the characters `t`, `i`, `r`, and `a`. As we iterate through the string, we update these indices and count the valid substrings ending at the current position.

The function updates the `last_seen` dictionary with the index of the most recent occurrence of each character. At each position, it takes the smallest of the four last-seen indices: a substring ending at the current position contains all four characters exactly when it starts at or before that index, which gives `min_last_seen + 1` valid substrings. This count is added only if all characters have been seen at least once (`min_last_seen != -1`).
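To see how the running total builds up, it can help to print the per-position contribution. The helper below is a hypothetical tracing aid (the name `trace` and the row layout are assumptions made here), not part of the solution itself.

```python
def trace(s):
    """Return (index, character, min_last_seen, added) for each position of s."""
    last_seen = {'t': -1, 'i': -1, 'r': -1, 'a': -1}
    rows = []
    for i, ch in enumerate(s):
        if ch in last_seen:
            last_seen[ch] = i
        min_last_seen = min(last_seen.values())
        added = min_last_seen + 1  # 0 while any character is still unseen
        rows.append((i, ch, min_last_seen, added))
    return rows

for row in trace("ritari"):
    print(row)
# (0, 'r', -1, 0)
# (1, 'i', -1, 0)
# (2, 't', -1, 0)
# (3, 'a', 0, 1)
# (4, 'r', 1, 2)
# (5, 'i', 2, 3)
```

The last column sums to $1 + 2 + 3 = 6$, the expected count for `ritari`.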

In [102]:
def count(s):
    last = {'t': -1, 'i': -1, 'r': -1, 'a': -1}
    result = 0
    n = len(s)
    
    for i in range(n):
        if s[i] in last:
            last[s[i]] = i
        
        minimum = min(last.values())
        
        if minimum != -1:
            result += minimum + 1
    
    return result

if __name__ == "__main__":
    print(count("aybabtu")) # 0
    print(count("tira")) # 1
    print(count("ritari")) # 6
    print(count("tiratiratira")) # 45
    print(count("xaxrxixtx")) # 4
    print(count("iamtr")) # 1


0
1
6
45
4
1


### Solution

The idea of the solution is to scan through the string from left to right while keeping track of the most recent occurrence of each of the characters `t`, `i`, `r` and `a`.

Any valid substring ending at the current position must start at or before the smallest of the tracked positions, and every substring that starts there or earlier is valid. Thus the number of valid substrings ending at the current position is the smallest position plus one. The four positions are initialized with $-1$, which ensures that the count remains zero until all four characters have occurred at least once.
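Written as a formula, this reasoning gives the sum below. The notation is an assumption introduced here for illustration: $n$ is the length of the string $s$, and $p_c(j)$ is the largest index $k \le j$ with $s_k = c$, or $-1$ if $c$ does not occur in $s_0 \dots s_j$.

$$\text{count}(s) = \sum_{j=0}^{n-1} \Bigl( \min\bigl(p_t(j),\, p_i(j),\, p_r(j),\, p_a(j)\bigr) + 1 \Bigr)$$

For `ritari` the minimum is $-1$ at positions 0 to 2 and then $0$, $1$, $2$, so the sum is $0 + 0 + 0 + 1 + 2 + 3 = 6$.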

The time complexity of the resulting algorithm is $O(n)$.

In [103]:
def count(s):
    result = 0
    pos_t = pos_i = pos_r = pos_a = -1

    for i in range(len(s)):
        if s[i] == "t": pos_t = i
        if s[i] == "i": pos_i = i
        if s[i] == "r": pos_r = i
        if s[i] == "a": pos_a = i

        result += min(pos_t, pos_i, pos_r, pos_a) + 1

    return result

print(count("aybabtu")) # 0
print(count("tira")) # 1
print(count("ritari")) # 6
print(count("tiratiratira")) # 45
print(count("xaxrxixtx")) # 4
print(count("iamtr")) # 1

0
1
6
45
4
1
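As a further sanity check, the linear scan can be compared against a brute-force count on random strings. This is a sketch under stated assumptions: the helper names `count_brute` and `cross_check`, the alphabet `"tirax"`, the trial count and the seed are all arbitrary choices made here, not part of the exercise.

```python
import random

def count(s):
    """Linear scan, following the solution above."""
    result = 0
    pos_t = pos_i = pos_r = pos_a = -1
    for i, ch in enumerate(s):
        if ch == "t":
            pos_t = i
        elif ch == "i":
            pos_i = i
        elif ch == "r":
            pos_r = i
        elif ch == "a":
            pos_a = i
        result += min(pos_t, pos_i, pos_r, pos_a) + 1
    return result

def count_brute(s):
    """Check every substring directly (slow, used for testing only)."""
    required = set("tira")
    return sum(
        1
        for start in range(len(s))
        for end in range(start + 1, len(s) + 1)
        if required <= set(s[start:end])
    )

def cross_check(trials=200, max_len=12, seed=0):
    """Compare both functions on random strings over a small alphabet."""
    rng = random.Random(seed)
    for _ in range(trials):
        length = rng.randint(0, max_len)
        s = "".join(rng.choice("tirax") for _ in range(length))
        assert count(s) == count_brute(s), s
    return True

print(cross_check())
```

A small alphabet keeps the four characters frequent, so most random strings exercise the non-zero case.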
