# Problem 1
There are two inputs: a list of synonyms and a list of query pairs. For each query pair, determine whether the two queries are equivalent given the list of synonyms.

The synonym list is always provided as a list of pairs of strings. For any term with synonyms, every synonymous pair appears explicitly in the list, so there's no need to follow transitive relationships. A term may consist of one word or several, and one term may have several different synonyms.

The query pair list is always provided as a list of pairs of strings.

## Problem 1 Examples
Synonym list:
```
[("read", "look through"), ("The Iliad", "The Song of Ilium"), ("look", "see")]
```
Query pairs:
```
[("look through The Iliad", "read The Song of Ilium"),
 ("look through the glass", "see through the glass"),
 ("look through book", "read through book"),
 ("read The Iliad", "The Song of Ilium look through")]
```
Expected output:
```
[True, True, False, False]
```
## Dynamic Programming Formulation
### "Build the grid"
Each query is tokenized by splitting on whitespace to yield a list of words. For a query `a`, define `a[i]` as the word at zero-based position `i` of `a`, and `a[i:j]` as the tuple of words at positions `i` through `j - 1` (inclusive), which has length `j - i`.
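
As a small illustration of this notation, the sketch below tokenizes one of the example queries and takes half-open word slices. The helper name `term` is hypothetical and exists only for this example. Python list slices return lists, so wrap them in `tuple(...)` if a hashable value is needed.

```python
def term(words, i, j):
    """Join words[i:j] (positions i through j - 1) back into a single term."""
    return " ".join(words[i:j])

# Example query taken from the problem statement above.
a = "look through The Iliad".split()

first_word = a[0]             # "look"
first_term = term(a, 0, 2)    # "look through", length j - i = 2
second_term = term(a, 2, 4)   # "The Iliad"
```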

For a given query pair `(a, b)`, define `M[i, j]` to be True if the first `i` words of `a` (up to and including `a[i - 1]`) are equivalent to the first `j` words of `b` (up to and including `b[j - 1]`), and False otherwise. Then:

```
M[0, 0] = True
M[i, j] = ∃ k < i, l < j s.t. M[k, l] is True and a[k:i] equals or is a synonym of b[l:j]
```

A query pair `(a, b)` is equivalent if and only if `M[len(a), len(b)]` is True.

This yields a straightforward dynamic programming solution in which a 2-d grid is built with query `a` on the horizontal axis and query `b` on the vertical axis. One column is devoted to each word of `a` and one row to each word of `b`. Because of the indexing defined above, the grid has one extra column and one extra row, which represent the empty prefixes. Each cell `(i, j)` contains True if there is equivalence between `a` and `b` up to `a[i - 1]` and `b[j - 1]`, and False otherwise. In the solution below, we only keep track of the grid coordinates containing True, since they're the only ones that matter in the recursive formulation of `M[i, j]`.

In [1]:
SYNONYMS = {("read", "look through"), ("The Iliad", "The Song of Ilium"), ("look", "see")}
PAIRS = [
    ("look through The Iliad", "read The Song of Ilium"),
    ("look through the glass", "see through the glass"),
    ("look through book", "read through book"),
    ("read The Iliad", "The Song of Ilium look through")]

def are_synonymous(a, b, synonyms):
    return a == b or (a, b) in synonyms or (b, a) in synonyms

def solve_query(synonyms, query):
    a, b = (q.split(" ") for q in query)
    equivalences = {(0, 0)}
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            for k, l in equivalences:
                if k >= i or l >= j:
                    continue
                if are_synonymous(' '.join(a[k:i]), ' '.join(b[l:j]), synonyms):
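                    # Adding to the set mid-iteration is safe only because
                    # we break out of the loop immediately afterwards.
                    equivalences.add((i, j))
                    break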
    return (len(a), len(b)) in equivalences

def solve_queries(synonyms, queries):
    return [solve_query(synonyms, q) for q in queries]

solve_queries(SYNONYMS, PAIRS)

[True, True, False, False]
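
For comparison, here is a minimal sketch that materializes the whole boolean grid `M` from the recurrence instead of tracking only the True coordinates. The function name `build_grid` and the nested helper `same` are hypothetical, and in this sketch the first index runs over the words of `a` and the second over the words of `b`.

```python
SYNONYMS = {("read", "look through"), ("The Iliad", "The Song of Ilium"), ("look", "see")}

def build_grid(a, b, synonyms):
    """Return the (len(a) + 1) x (len(b) + 1) grid M for one query pair."""
    a_words, b_words = a.split(), b.split()

    def same(x, y):
        # Identical terms count as equivalent, as do listed pairs in either order.
        return x == y or (x, y) in synonyms or (y, x) in synonyms

    M = [[False] * (len(b_words) + 1) for _ in range(len(a_words) + 1)]
    M[0][0] = True  # the two empty prefixes are trivially equivalent
    for i in range(1, len(a_words) + 1):
        for j in range(1, len(b_words) + 1):
            # M[i][j] holds if some earlier True cell (k, l) can be extended
            # by one pair of equivalent terms a[k:i] and b[l:j].
            M[i][j] = any(
                M[k][l] and same(" ".join(a_words[k:i]), " ".join(b_words[l:j]))
                for k in range(i)
                for l in range(j)
            )
    return M

grid = build_grid("look through The Iliad", "read The Song of Ilium", SYNONYMS)
answer = grid[-1][-1]  # M[len(a), len(b)]
```

In this example `grid[2][1]` is True because "look through" is a synonym of "read", and the bottom-right cell is True because "The Iliad" then matches "The Song of Ilium".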

### Search with memoization
The previous solution asked "is there an equivalence ending at this position pair?" and built up the answers iteratively, starting with True at (0, 0). This approach builds up the same kind of answers by searching forward from (0, 0): at each position pair (i, j) it reaches, it asks "is there an equivalence starting at this position?" and remembers the negative answers. Note that in the code below the positions are character offsets into the query strings rather than word indices.

Formalizing this, we have:

```
M[len(a), len(b)] = True
M[i, j] = ∃ k > i, l > j s.t. a[i:k] equals or is a synonym of b[j:l] and M[k, l] is True
```

Code:

In [2]:
from collections import defaultdict

def make_synonym_lookup(synonyms):
    lookup = defaultdict(set)
    for a, b in synonyms:
        lookup[a].add(b)
        lookup[b].add(a)
    return lookup

def find_synonyms(tail, synonym_lookup):
    word_ends = [i for i, c in enumerate(tail) if c == ' '] + [len(tail)]
    synonyms = set()
    for word_end in word_ends:
        tail_prefix = tail[:word_end]
        synonyms.add((tail_prefix, tail_prefix))
        for synonym in synonym_lookup.get(tail_prefix, []):
            synonyms.add((tail_prefix, synonym))
    return synonyms

def solve_query(synonyms, query):
    a, b = query
    negative_answers = set()
    synonym_lookup = make_synonym_lookup(synonyms)
    def visit(i, j):
        if (i, j) in negative_answers:
            return False
        if i >= len(a) and j >= len(b):
            return True
        elif i >= len(a) or j >= len(b):
            return False
        a_tail, b_tail = a[i:], b[j:]
        a_synonyms = find_synonyms(a_tail, synonym_lookup)
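        for term, synonym in a_synonyms:
            # Require the match to end on a word boundary, so that e.g.
            # "look" does not match the start of "looking".
            if b_tail == synonym or b_tail.startswith(synonym + ' '):
                if visit(i + len(term) + 1, j + len(synonym) + 1):
                    return True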
        negative_answers.add((i, j))
        return False
    return visit(0, 0)

def solve_queries(synonyms, queries):
    return [solve_query(synonyms, q) for q in queries]

solve_queries(SYNONYMS, PAIRS)

[True, True, False, False]
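
The same "equivalence starting at (i, j)" recurrence can also be written directly over word indices, which avoids the character-offset bookkeeping. The following is only a sketch under that assumption: `solve_query_by_words` and `same` are hypothetical names, and `functools.lru_cache` stands in for the hand-rolled `negative_answers` set by caching both positive and negative results.

```python
from functools import lru_cache

SYNONYMS = {("read", "look through"), ("The Iliad", "The Song of Ilium"), ("look", "see")}
PAIRS = [
    ("look through The Iliad", "read The Song of Ilium"),
    ("look through the glass", "see through the glass"),
    ("look through book", "read through book"),
    ("read The Iliad", "The Song of Ilium look through")]

def solve_query_by_words(synonyms, query):
    a, b = (q.split() for q in query)

    def same(x, y):
        return x == y or (x, y) in synonyms or (y, x) in synonyms

    @lru_cache(maxsize=None)
    def visit(i, j):
        # Both queries fully consumed: an equivalence has been found.
        if i == len(a) and j == len(b):
            return True
        # If only one side is consumed, one of the ranges below is empty
        # and any() returns False.
        return any(
            same(" ".join(a[i:k]), " ".join(b[j:l])) and visit(k, l)
            for k in range(i + 1, len(a) + 1)
            for l in range(j + 1, len(b) + 1)
        )

    return visit(0, 0)

results = [solve_query_by_words(SYNONYMS, q) for q in PAIRS]
```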

# Problem 2
Same as the first problem, except that the terms may appear in any order in the second query.

Synonym list:
```
[("read", "look through"), ("The Iliad", "The Song of Ilium"), ("look", "see")]
```
Query pairs:
```
[("look through The Iliad", "read The Song of Ilium"),
 ("look through the glass", "see through the glass"),
 ("look through book", "read through book"),
 ("read The Iliad", "The Song of Ilium look through"),
 ("look through and see", "look and read")]
```
Expected output:
```
[True, True, False, True, True]
```
Note that the first four test cases are identical to the four test cases of Problem 1, but the expected result for the fourth is now True. We also added a fifth test case whose expected result is True.
## Backtracking Search Formulation
In this formulation, we search through all possible segmentations of `a` into terms. At each node in the search tree, we abandon the branch as soon as neither the current term nor any of its synonyms can be found in what remains of `b`. This essentially treats the problem as a constraint satisfaction problem.
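
To make the search space concrete, the sketch below enumerates every segmentation of a short query into contiguous terms. The generator name `segmentations` is hypothetical and is not part of the solution itself. A query of `n` words has `2 ** (n - 1)` segmentations, since each gap between adjacent words is either a term boundary or not.

```python
def segmentations(words):
    """Yield every way to split a list of words into contiguous terms."""
    if not words:
        yield []
        return
    for j in range(1, len(words) + 1):
        head = " ".join(words[:j])           # first term uses words[0:j]
        for rest in segmentations(words[j:]):
            yield [head] + rest

all_splits = list(segmentations("read The Iliad".split()))
# all_splits is:
# [["read", "The", "Iliad"],
#  ["read", "The Iliad"],
#  ["read The", "Iliad"],
#  ["read The Iliad"]]
```

The backtracking search never needs to build this whole list, because it discards a partial segmentation as soon as one of its terms cannot be matched.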

Formalizing this, **TODO**

In [3]:
def solve_query(synonyms, query):
    a, b = query
    a_words = a.split(" ")
    synonym_lookup = make_synonym_lookup(synonyms)
    def visit(i, leftover_b):
        leftover_b = leftover_b.strip()
        if i == len(a_words) and leftover_b == "":
            return True
        if i == len(a_words) or leftover_b == "":
            return False
        for j in range(i + 1, len(a_words) + 1):
            word = ' '.join(a_words[i:j])
            synonyms = synonym_lookup.get(word, set()) | { word }
            for synonym in synonyms:
                if synonym not in leftover_b:
                    # cut off this search branch, since it violates our constraint
                    continue
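                # Remove only one occurrence, so that a repeated term in b
                # is not consumed by a single term of a.
                if visit(j, leftover_b.replace(synonym, "", 1)):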
                    return True
        return False
    return visit(0, b)

def solve_queries(synonyms, queries):
    return [solve_query(synonyms, q) for q in queries]

solve_queries(
    SYNONYMS,
    [("look through The Iliad", "read The Song of Ilium"),
     ("look through the glass", "see through the glass"),
     ("look through book", "read through book"),
     ("read The Iliad", "The Song of Ilium look through"),
     ("look through and see", "look and read")])

[True, True, False, True, True]

**TODO** time complexity analysis, include some interesting examples a la http://norvig.com/sudoku.html