### InterviewDB description

T9 is a type of keypad commonly seen on mobile phones. To type with T9, a user enters digits and a program (also called an input method) translates the input into English words with the help of a dictionary stored on the device. Our task is to implement this input method.

**Inputs**: The main function is given two inputs:

- `input_digits`: an integer array (contains only 2-9; length up to 25)
- `valid_words`: a string array that defines the list of valid English words (up to 50)

**Outputs**: A 2D string array of all the possible word combinations that the given input can be mapped to. More specifically, the return value has the format `[<word combinations 1>, <word combinations 2>, ...]`, where each `<word combinations>` is an array of words representing one word or a list of words that the input digits can be translated into.

**Examples**: In this example, each `<word combinations>` contains only a single word.

- `input_digits`: `[2, 2, 8]`
- `valid_words`: `["act","bat","cat","acd","test"]`
- output: `[["act"], ["bat"], ["cat"]]`

In this example, a word combination can contain multiple words.

- `input_digits`: `[7, 6, 6, 3, 8, 4, 6, 3]`
- `valid_words`: `["some","time","rome","sometime","so","me"]`
- output: `[["rome","time"],["so","me","time"],["some","time"],["sometime"]]`

---

This is more complex than basic T9 lookup: the input digits can map to several words that together span the entire digit sequence. So for `[7,6,6,3,8,4,6,3]`, both `"sometime"` (all 8 digits) and `["some","time"]` (4+4 digits) are valid.
This is essentially a word segmentation problem on top of T9 lookup.
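
To make the segmentation view concrete, here is a minimal sketch that encodes a word as the digits that would type it, then checks that the example combinations cover the same digit sequence. The names `LETTER_TO_DIGIT` and `to_digits` are illustrative helpers and are not part of the solution that follows.

```python
# Illustrative helper (an assumption, not part of the solution): word -> T9 digits.
T9_MAP = {
    '2': 'abc', '3': 'def', '4': 'ghi', '5': 'jkl',
    '6': 'mno', '7': 'pqrs', '8': 'tuv', '9': 'wxyz',
}

# Invert the keypad: every letter belongs to exactly one digit.
LETTER_TO_DIGIT = {ch: int(d) for d, letters in T9_MAP.items() for ch in letters}


def to_digits(word: str) -> list[int]:
    """Return the digit sequence that types `word` on a T9 keypad."""
    return [LETTER_TO_DIGIT[ch] for ch in word.lower()]


target = [7, 6, 6, 3, 8, 4, 6, 3]
print(to_digits("sometime") == target)                                  # True: one word, 8 digits
print(to_digits("some") + to_digits("time") == target)                  # True: 4 + 4 digits
print(to_digits("so") + to_digits("me") + to_digits("time") == target)  # True: 2 + 2 + 4 digits
```

Each valid combination is a way of cutting the digit array into consecutive slices so that every slice is the encoding of some dictionary word.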

---

#### How it works:
`predict` is the entry point; `_segment` performs a backtracking search over the digit array. At each position `start`, it tries every possible end position and asks: is there a valid word that maps to `digits[start:end]`? If yes, it adds that word to the current combination and recurses from `end`. When `start` reaches the end of the digit array, the current combination spans all digits exactly, so it is a valid result.

`_match` handles the T9 lookup for a fixed slice `digits[start:end]`. It runs a DFS through the trie, at each level following only the children whose character appears in the current digit's letter set. Only paths that end on an `is_end` node produce a word.

The two concerns are cleanly separated: `_match` handles T9 ambiguity resolution, and `_segment` handles word boundary finding.

In [None]:
T9_MAP = {
    '2': 'abc',
    '3': 'def',
    '4': 'ghi',
    '5': 'jkl',
    '6': 'mno',
    '7': 'pqrs',
    '8': 'tuv',
    '9': 'wxyz'
}

class TrieNode:
    def __init__(self):
        self.children = {}
        self.is_end = False

class T9:
    def __init__(self, valid_words: list[str]):
        self.root = TrieNode()
        for word in valid_words:
            self._insert(word)

    def _insert(self, word: str) -> None:
        node = self.root
        for char in word:
            if char not in node.children:
                node.children[char] = TrieNode()
            node = node.children[char]
        node.is_end = True

    def _words_at(self, digits: list[int], start: int) -> list[str]:
        """Find all valid words that match digits[start:end] for any end > start.

        Not used by predict(); kept as a convenience helper.
        """
        results = []
        for end in range(start + 1, len(digits) + 1):
            # Collect the words that map exactly to digits[start:end]
            results.extend(self._match(self.root, digits, start, end))
        return list(dict.fromkeys(results))  # deduplicate while preserving order

    def _match(self, node: TrieNode, digits: list[int], start: int, end: int) -> list[str]:
        """Find all words that map exactly to digits[start:end]."""
        results = []
        self._dfs_match(node, digits, start, end, 0, "", results)
        return results

    def _dfs_match(self, node, digits, start, end, depth, current, results):
        if depth == end - start:
            if node.is_end:
                results.append(current)
            return
        digit = str(digits[start + depth])
        for char in T9_MAP[digit]:
            if char in node.children:
                self._dfs_match(node.children[char], digits, start, end, depth + 1, current + char, results)

    def predict(self, digits: list[int]) -> list[list[str]]:
        results = []
        self._segment(digits, 0, [], results)
        return results

    def _segment(self, digits, start, current_combo, results):
        """Backtracking: try all ways to segment digits into valid words."""
        if start == len(digits):
            results.append(list(current_combo))
            return
        # Try every possible word length starting at `start`
        for end in range(start + 1, len(digits) + 1):
            matched_words = self._match(self.root, digits, start, end)
            for word in matched_words:
                current_combo.append(word)
                self._segment(digits, end, current_combo, results)
                current_combo.pop()


t9 = T9(valid_words=["act", "bat", "cat", "acd", "test"])
print(t9.predict([2, 2, 8]))
# [['act'], ['bat'], ['cat']]

t9 = T9(valid_words=["some", "time", "rome", "sometime", "so", "me"])
print(t9.predict([7, 6, 6, 3, 8, 4, 6, 3]))
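# [['so', 'me', 'time'], ['rome', 'time'], ['some', 'time'], ['sometime']]
# Results come back in search order; wrap the call in sorted() to get the
# ordering shown in the problem statement.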

### T9 predictive text

T9 doesn't guess character by character; that's the key insight that made it clever. It works at the **word level**, not the character level.

When you press a sequence of digits, T9 looks up all possible words that match the entire digit sequence at once. So pressing `4-3` doesn't first guess `g/h/i` then `d/e/f`; it finds all words whose letters map to `4` then `3`, such as `he`, `if`, `id`, and `ge`, and surfaces the most frequent one.

**How frequency ranking works in practice:**

The phone ships with a static dictionary where each word has a pre-assigned frequency score based on general English usage. The highest frequency match for the digit sequence is shown first. So for `4663` it knows `good` is more common than `gone` or `home` and shows that by default.

Over time many T9 implementations also did **personal learning** — if you repeatedly selected `gone` over `good` for `4663`, it would bump `gone`'s personal frequency score and start showing it first for you.

**The user experience flow:**

1. You press `4663`
2. T9 looks up all words matching that sequence and sorts by frequency
3. It displays the top match, say `good`
4. If that's wrong you press `0` or `*` to cycle through alternatives: `gone`, `home`
5. You confirm with a space or punctuation
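
The flow above, together with the personal learning described earlier, can be sketched roughly as follows. Everything here is hypothetical: the `CandidateSession` class, the scores in `STATIC_FREQ`, and the `PERSONAL_BOOST` weight are made up for illustration and do not come from any real T9 implementation.

```python
from collections import Counter
from typing import Optional

# Hypothetical static dictionary: digit sequence -> {word: frequency score}.
STATIC_FREQ = {"4663": {"good": 300, "gone": 200, "home": 100}}
PERSONAL_BOOST = 60  # assumed weight added per past selection of a word


class CandidateSession:
    """Models one user's word entry: look up, rank, cycle, confirm."""

    def __init__(self) -> None:
        self.personal: Counter = Counter()  # how often the user picked each word
        self.candidates: list[str] = []
        self.index = 0

    def press(self, digits: str) -> Optional[str]:
        # Steps 1-3: look up all matches, rank them, show the top one.
        scores = STATIC_FREQ.get(digits, {})
        self.candidates = sorted(
            scores,
            key=lambda w: scores[w] + PERSONAL_BOOST * self.personal[w],
            reverse=True,
        )
        self.index = 0
        return self.current()

    def current(self) -> Optional[str]:
        return self.candidates[self.index] if self.candidates else None

    def cycle(self) -> Optional[str]:
        # Step 4: move to the next alternative, wrapping around at the end.
        if self.candidates:
            self.index = (self.index + 1) % len(self.candidates)
        return self.current()

    def confirm(self) -> Optional[str]:
        # Step 5: accept the shown word and remember the choice.
        word = self.current()
        if word is not None:
            self.personal[word] += 1
        return word


session = CandidateSession()
print(session.press("4663"))  # good
print(session.cycle())        # gone
print(session.confirm())      # gone
```

With these made-up numbers, `good` still comes first after `gone` has been chosen once, and `gone` takes over after the second selection, which mirrors the "repeatedly selected" behaviour described above.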

**Where it gets interesting** is ambiguous short sequences like `43` which could be `he`, `if`, `id`, `ge` — frequency scoring is what makes it not feel like a guessing game. And for words not in the dictionary, there was usually a fallback "multi-tap" mode where pressing `4` twice gives you `h`, three times gives `i` — the old pre-T9 method.
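
For comparison, the multi-tap fallback can be sketched in a few lines. The input format is an assumption made for this example: each letter is a group of repeated presses of one key, and groups are separated by spaces (standing in for whatever pause or separator a real phone used). Wrapping around after the last letter on a key is also an assumption.

```python
T9_MAP = {
    '2': 'abc', '3': 'def', '4': 'ghi', '5': 'jkl',
    '6': 'mno', '7': 'pqrs', '8': 'tuv', '9': 'wxyz',
}


def multitap_decode(presses: str) -> str:
    """Decode space-separated groups of repeated digits, e.g. '44 444' -> 'hi'."""
    letters = []
    for group in presses.split():
        digit = group[0]
        # A group must repeat a single key that carries letters.
        if digit not in T9_MAP or set(group) != {digit}:
            raise ValueError(f"invalid multi-tap group: {group!r}")
        options = T9_MAP[digit]
        # n presses select the n-th letter; extra presses wrap around (assumed).
        letters.append(options[(len(group) - 1) % len(options)])
    return "".join(letters)


print(multitap_decode("44"))           # h
print(multitap_decode("444"))          # i
print(multitap_decode("4 666 666 3"))  # good
```

Typing `good` takes 8 presses this way versus 4 with T9, which is the saving that word-level lookup buys.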

So the trie implementation above is actually a reasonable model of the core lookup, but a real T9 would attach a frequency score to each `is_end` node and sort results by that rather than returning an unordered list.
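
A minimal sketch of that idea, assuming each end node stores an integer score instead of a boolean flag. The class names `FreqTrieNode` and `RankedT9` and the frequency values are invented for illustration; real dictionaries would supply their own scores.

```python
T9_MAP = {
    '2': 'abc', '3': 'def', '4': 'ghi', '5': 'jkl',
    '6': 'mno', '7': 'pqrs', '8': 'tuv', '9': 'wxyz',
}


class FreqTrieNode:
    def __init__(self) -> None:
        self.children: dict[str, "FreqTrieNode"] = {}
        self.freq = 0  # 0 means no word ends at this node


class RankedT9:
    def __init__(self, word_freqs: dict[str, int]) -> None:
        self.root = FreqTrieNode()
        for word, freq in word_freqs.items():
            node = self.root
            for ch in word:
                if ch not in node.children:
                    node.children[ch] = FreqTrieNode()
                node = node.children[ch]
            node.freq = freq

    def predict(self, digits: str) -> list[str]:
        matches: list[tuple[int, str]] = []
        self._dfs(self.root, digits, 0, "", matches)
        # Highest frequency first; ties broken alphabetically for determinism.
        matches.sort(key=lambda m: (-m[0], m[1]))
        return [word for _, word in matches]

    def _dfs(self, node, digits, depth, current, matches) -> None:
        if depth == len(digits):
            if node.freq > 0:
                matches.append((node.freq, current))
            return
        # Follow only the children reachable through the current digit's letters.
        for ch in T9_MAP[digits[depth]]:
            child = node.children.get(ch)
            if child is not None:
                self._dfs(child, digits, depth + 1, current + ch, matches)


ranked = RankedT9({"good": 300, "gone": 200, "home": 100, "he": 500, "if": 400})
print(ranked.predict("4663"))  # ['good', 'gone', 'home']
print(ranked.predict("43"))    # ['he', 'if']
```

The lookup is the same DFS as before; only the payload on end nodes and the final sort change.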

In [None]:
# Blog
from functools import cached_property
from typing import Iterable

class TrieNode:
    def __init__(self, value: str, parent: "TrieNode | None" = None):
        self._children: dict[str, TrieNode] = {}
        self._insertion_count = 0
        self._value = value
        self._parent = parent

    def __repr__(self):
        return f"TrieNode<{self._value}>"  # pragma: no cover

    @cached_property
    def word(self) -> str:
        return "".join(n._value for n in self._bottom_up_traversal())[::-1]

    def _bottom_up_traversal(self) -> Iterable["TrieNode"]:
        current: "TrieNode | None" = self
        while current is not None:
            yield current
            current = current._parent

    @property
    def word_nodes(self) -> Iterable["TrieNode"]:
        result = []
        dfs = [self]
        while dfs:
            node = dfs.pop()
            if node._insertion_count:
                result.append(node)

            for child in node._children.values():
                dfs.append(child)

        return result


In [None]:
# Claude

T9_MAP = {
    '2': 'abc',
    '3': 'def',
    '4': 'ghi',
    '5': 'jkl',
    '6': 'mno',
    '7': 'pqrs',
    '8': 'tuv',
    '9': 'wxyz'
}

class TrieNode:
    def __init__(self):
        self.children = {}
        self.is_end = False

class T9:
    def __init__(self):
        self.root = TrieNode()

    def insert(self, word: str) -> None:
        node = self.root
        for char in word:
            if char not in node.children:
                node.children[char] = TrieNode()
            node = node.children[char]
        node.is_end = True

    def predict(self, digits: str) -> list[str]:
        results = []
        self._dfs(self.root, digits, 0, "", results)
        return results

    def _dfs(self, node: TrieNode, digits: str, depth: int, current: str, results: list) -> None:
        # Collected a full match for the digit sequence
        if depth == len(digits):
            if node.is_end:
                results.append(current)
            return

        # Try every letter mapped to the current digit
        for char in T9_MAP[digits[depth]]:
            if char in node.children:
                self._dfs(node.children[char], digits, depth + 1, current + char, results)

In [None]:
t9 = T9()

words = ["the", "of", "and", "he", "she", "his", "her", "home", "good", "gone"]
for word in words:
    t9.insert(word)

print(t9.predict("4663"))   # ['gone', 'good', 'home']
print(t9.predict("843"))    # ['the']
print(t9.predict("43"))     # ['he'] — 'he' = 4→h, 3→e
print(t9.predict("743"))    # ['she'] — 'she' = 7→s, 4→h, 3→e
print(t9.predict("447"))    # ['his'] — 'his' = 4→h, 4→i, 7→s