# Resolving the bill

It's time to leave the hotel and pay your bill. Unfortunately, there's been some kind of foul-up in the hotel's admin system, and people's bills have been mixed up. Can you help the staff clarify what's happened?

The bills are complex, but they are just strings of events that are billed. We can (without loss of generality) use letters for each different event, so we can treat the bills as strings of letters.

The bills the staff have to work through are given one per line, each with a leading integer that gives the index of the bill; the index is separated from the bill by a colon.

For instance, the bills could look like this:

```
0: accbadaadc
1: bbbbaabada
2: cdaacacadcddbccacdab
3: bbcdabbaaabcbcadcaac
4: accbbabbdbaaabadaadc
5: acadcdddab
6: aacccabaddcdaddaabdc
```

You know what you spent, so you've helpfully included your bill as the first line of the file, at index 0. 

You can find your bill in the mixed-up line at index 4, like this:

```
accbbabbdbaaabadaadc
*** **  *   *    ***
```

(the stars indicate the bits that come from your bill). There's more than one way to extract your bill from that mixed-up line, and that's OK: you're just trying to find possible matches at the moment. 

Your bill is also in the line at index 6:

```
aacccabaddcdaddaabdc
 * ** ** *  *   * **
```

but you can't find your bill in any other lines. That means your bill occurs 2 times in this list of bills, if you exclude line 0 (which you added).
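As a quick check of the example above, here is a minimal sketch that counts the matching lines. The names `is_subsequence`, `sample` and `sample_bills` are illustrative assumptions rather than part of the puzzle. The check relies on the fact that Python's `in` operator, applied to an iterator, consumes the iterator up to and including the first match.

```python
def is_subsequence(needle, haystack):
    """Return True if needle can be read off haystack, in order, skipping letters."""
    remaining = iter(haystack)
    # `c in remaining` advances the iterator past the first match, so each
    # later letter is only searched for in what is left of haystack.
    return all(c in remaining for c in needle)

# The example bills from the text above
sample = """0: accbadaadc
1: bbbbaabada
2: cdaacacadcddbccacdab
3: bbcdabbaaabcbcadcaac
4: accbbabbdbaaabadaadc
5: acadcdddab
6: aacccabaddcdaddaabdc"""

sample_bills = {}
for line in sample.splitlines():
    index, bill = line.split(': ')
    sample_bills[int(index)] = bill

matches = [i for i in sample_bills
           if i != 0 and is_subsequence(sample_bills[0], sample_bills[i])]
print(matches)  # [4, 6]
```

The list has two entries, agreeing with the count of 2 given above.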

## Part 1

Given that your bill is at line 0, and the list of bills as [09-bills.txt](09-bills.txt), **how many _other_ lines contain your bill as a subsequence?**

It seems there's some method behind the madness of the bills. Enough that you have hope of paying the bill and leaving soon enough to catch your flight home!

Somewhere in the list of bills is a line that's a mixture of your bill and your friend's bill. Your bill is still on line 0. Your friend's bill is on line 1. In the example above, you can see that line 4 is the only line that's a mixture of your two bills.

## Part 2
Given that your bill is at line 0, your friend's is on line 1, and the list of bills is still in [09-bills.txt](09-bills.txt), **which line is a mixture of your bill and your friend's bill?** (There's only one such line.)

In [1]:
import sys
sys.setrecursionlimit(10**6)

In [2]:
with open('09-bills.txt') as f:
    # Each line is "index: bill"; build a dict mapping index -> bill string
    bills = {}
    for line in f:
        index, bill = line.strip().split(': ')
        bills[int(index)] = bill
len(bills)

148

In [3]:
def show_table(table):
    return '\n'.join(
        ' '.join('T' if table[i, j] else '.' for j in sorted(set([k[1] for k in table])))
        for i in sorted(set([k[0] for k in table])))        

In [4]:
def show_annotated_table(table, bps):
    return '\n'.join(' '.join('.' if (i, j) not in bps else bps[i, j][2] if table[i, j] else '.' for j in sorted(set([k[1] for k in table])))
        for i in sorted(set([k[0] for k in table])))

# Worked example: part 1
Ah, the problem that didn't pan out. 

This was _meant_ to be an exercise in dynamic programming, another technique taught in M269. However, I mucked up both the problem specification in part 1 and the test data in part 2, so that other, simpler, approaches gave the correct answers. 

Part 1 was meant to be a variant on the longest common subsequence problem, but making it _whole_ subsequence checking meant there was no need for dynamic programming. Part 2 did require something like dynamic programming for the general case, but the test data didn't force examination of all the cases, so a simpler algorithm that could give false positives didn't return any on this data set.

So, part 1. We want to see if $s_1$ is a subsequence of $s_2$. The simple way is to walk along $s_2$, character by character, keeping track of how much of $s_1$ is a subsequence up to this point. I use the pointer _i_ as the position of the next character to check in $s_1$. If, when we've finished, _i_ points beyond the end of $s_1$, then $s_1$ is a subsequence of $s_2$.

For instance, if we want to see if `abc` is a subsequence of `babaca`, we can see that:
* ø is a subsequence of `b` (_i_ == 0)
* `a` is a subsequence of `ba` (_i_ == 1)
* `ab` is a subsequence of `bab` (_i_ == 2)
* `ab` is a subsequence of `baba` (_i_ == 2)
* `abc` is a subsequence of `babac` (_i_ == 3)
* `abc` is a subsequence of `babaca` (_i_ == 3)

That's implemented as `is_subseq_simple`. The `is_subseq_simple_shortcut` does the same, but bails out of the loop as soon as it's determined that $s_1$ is a subsequence of $s_2$; working from the end of $s_2$ means the checks are for _i_ < 0 rather than using the length of $s_1$.
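The walk described above can be traced step by step. This is a small sketch for illustration only; `pointer_trace` is a hypothetical helper, not one of the functions used later in this notebook. It records the value of _i_ after each character of $s_2$ is read.

```python
def pointer_trace(s1, s2):
    """Return the value of the pointer i after each character of s2 is read."""
    i = 0
    trace = []
    for c in s2:
        # Only advance when the next unmatched character of s1 is found
        if i < len(s1) and s1[i] == c:
            i += 1
        trace.append(i)
    return trace

trace = pointer_trace('abc', 'babaca')
print(trace)                    # [0, 1, 2, 2, 3, 3]
print(trace[-1] == len('abc'))  # True: 'abc' is a subsequence of 'babaca'
```

The trace matches the values of _i_ in the list above.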



The idea of the problem was dynamic programming. This comes in useful when considering the _longest common subsequence_ problem, where we have to identify how much of $s_1$ can be found as a subsequence in $s_2$.

A recursive solution to the subsequence problem looks at the last character of each of $s_1$ and $s_2$. 

> If they're different, $s_1$ is only a subsequence of $s_2$ if $s_1$ is also a subsequence of all but the last character of $s_2$ (i.e. `abc` is a subsequence of `babaca` iff `abc` is a subsequence of `babac`). 

> If the last characters are the same, $s_1$ is a subsequence of $s_2$ if $s_1$ is a subsequence of all but the last character of $s_2$, or all but the last character of $s_1$ is a subsequence of all but the last character of $s_2$ (i.e. `abc` is a subsequence of `babac` if `ab` is a subsequence of `baba` or `abc` is a subsequence of `baba`).

> There are two base cases. If $s_1$ is empty, return True. If length($s_1$) > length($s_2$), return False.

The problem with this definition is that it can do a lot of repeated work (see the image below). The complexity is $O(2^{\text{length of } s_2})$. The dynamic programming approach comes at the problem from the other angle.

<a href="gt.dot.png"><img src="gt.dot.png" alt="Finding a subsequence" style="width: 200px;"/></a>

The way I think about it is that the recursive solution would be very efficient if there was some magic lookup table we could consult, which would give the answers to the subproblems. We can build that lookup table, starting with very short fragments of $s_1$ and $s_2$, building up the table, and using previous results in the table to fill in each cell.
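One way to see the effect of such a lookup table is to memoise the recursion and compare how much work each version does. The sketch below is an illustration under my own assumptions: the function names are hypothetical, the recursion works on prefix lengths rather than string slices, and the test strings are chosen so that the search fails only after exploring many branches. It uses `functools.lru_cache` from the standard library as the lookup table.

```python
from functools import lru_cache

def count_naive_calls(s1, s2):
    """Run the plain recursion; return (answer, number of calls made)."""
    calls = 0
    def recurse(i, j):
        # i and j are the lengths of the prefixes of s1 and s2 being compared
        nonlocal calls
        calls += 1
        if i == 0:
            return True
        if i > j:
            return False
        if s1[i-1] == s2[j-1]:
            return recurse(i, j-1) or recurse(i-1, j-1)
        return recurse(i, j-1)
    return recurse(len(s1), len(s2)), calls

def count_memo_calls(s1, s2):
    """Same recursion, but each (i, j) subproblem is only worked out once."""
    @lru_cache(maxsize=None)
    def recurse(i, j):
        if i == 0:
            return True
        if i > j:
            return False
        if s1[i-1] == s2[j-1]:
            return recurse(i, j-1) or recurse(i-1, j-1)
        return recurse(i, j-1)
    answer = recurse(len(s1), len(s2))
    # cache misses = number of distinct subproblems actually computed
    return answer, recurse.cache_info().misses

s1, s2 = 'b' + 'a' * 8, 'a' * 16   # the 'b' is never found, so every branch fails
naive_answer, naive_calls = count_naive_calls(s1, s2)
memo_answer, memo_calls = count_memo_calls(s1, s2)
print(naive_answer, memo_answer)   # False False
print(naive_calls, memo_calls)     # the memoised version makes far fewer calls
```

The memoised version can never work out more than (length of $s_1$ + 1) × (length of $s_2$ + 1) subproblems, which is the size of the table described next.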

In this problem, we build a table such that the cell at row _i_ and column _j_ contains True if the first _i_ characters of $s_1$ are a subsequence of the first _j_ characters of $s_2$. (Note a complication due to Python's zero-based indexing of strings and lists. The third character of $s_1$ is referred to in Python as `s1[2]`.)

Going back to the recursive description, we can see that:

> All cells in the top row (_i_ == 0) contain True.

> All cells in the left column (_j_ == 0) contain False (apart from _i_ == _j_ == 0, which is True).

> If the _i_ th character of $s_1$ (`s1[i-1]` in Python) is different from the _j_ th character of $s_2$ (`s2[j-1]`), this cell (at position (_i_, _j_)) contains the same value as the cell at (_i_, _j_ - 1), i.e. the cell to the left.

> If the _i_ th character of $s_1$ is the same as the _j_ th character of $s_2$, this cell (at position (_i_, _j_)) contains True if either the cell at (_i_, _j_ - 1) (i.e. the cell to the left) contains True, or the cell at (_i_ - 1, _j_ - 1) (i.e. the cell diagonally above and to the left) contains True.

As each cell in the table only references the cells above and to the left, we can fill out the table row by row, going from left to right, and know we will always have the information needed to complete each cell when we get to it.

And that's dynamic programming. As we're filling out a table, the complexity is $O({\text{length of } s_1} \times {\text{length of } s_2})$, which is at most $O\left((\text{length of } s_2)^2 \right)$ whenever $s_1$ is no longer than $s_2$.
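The four rules above translate almost directly into code. What follows is a compact sketch using a list of lists; `subseq_table` and `render` are names I've made up for this illustration, and the fuller `is_subseq` function later in this notebook also tracks backpointers.

```python
def subseq_table(s1, s2):
    """Build the table: cell [i][j] is True if s1[:i] is a subsequence of s2[:j]."""
    rows, cols = len(s1) + 1, len(s2) + 1
    table = [[False] * cols for _ in range(rows)]
    for j in range(cols):
        table[0][j] = True               # top row: the empty prefix always fits
    for i in range(1, rows):
        for j in range(1, cols):         # left column stays False for i > 0
            table[i][j] = table[i][j-1]  # the cell to the left
            if s1[i-1] == s2[j-1]:
                # characters match: the diagonal cell can also make this True
                table[i][j] = table[i][j] or table[i-1][j-1]
    return table

def render(table):
    """Show the table as rows of T (True) and . (False)."""
    return '\n'.join(' '.join('T' if cell else '.' for cell in row)
                     for row in table)

print(render(subseq_table('acba', 'aaccabab')))
print(subseq_table('acba', 'aaccabab')[-1][-1])  # True
print(subseq_table('acba', 'cdabcaca')[-1][-1])  # False
```

The answer to the original question is always in the bottom right cell.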

The tables below show worked examples for seeing if `acba` is a subsequence of `aaccabab` (it is) and `cdabcaca` (it isn't).

For the first example, we fill out the first row of the table with True (by definition).

For the second row (with _i_ = 1), we want to see if `a` is a subsequence of different prefixes of `aaccabab`. The cell with _j_ = 0 is False, by definition. For the cell at (_i_ = 1, _j_ = 1), the characters at $s_1$[0] and $s_2$[0] are the same, so this cell is True if the cell to the left is True (it isn't) or the cell above and to the left is True (it is). So cell (1, 1) is True, and the rest of that row is filled out to True.

For the third row (with _i_ = 2), we want to see if `ac` is a subsequence of different prefixes of `aaccabab`. The cells with _j_ = 0 and _j_ = 1 are False, as a two-character string can't be a subsequence of anything shorter. For the cell at (_i_ = 2, _j_ = 2), the characters at $s_1$[1] and $s_2$[1] are different, so this cell is True if the cell to the left is True; it isn't, so this cell contains False. For the cell at (_i_ = 2, _j_ = 3), the characters at $s_1$[1] and $s_2$[2] are the same, so this cell is True if the cell to the left is True (it isn't) or the cell above and to the left is True (it is). So cell (2, 3) is True, and the rest of that row is filled out to True.

You can continue filling out the table in the same way.

When the table is complete, the bottom right cell contains True, which means that `acba` is a subsequence of `aaccabab`.

|   |<br />0|a<br />1|a<br />a<br />2|a<br />a<br />c<br />3|a<br />a<br />c<br />c<br />4|a<br />a<br />c<br />c<br />a<br />5|a<br />a<br />c<br />c<br />a<br />b<br />6|a<br />a<br />c<br />c<br />a<br />b<br />a<br />7|a<br />a<br />c<br />c<br />a<br />b<br />a<br />b<br />8|
|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|
|0<br />|T|T|T|T|T|T|T|T|T|
|1<br />a|.|T|T|T|T|T|T|T|T|
|2<br />ac|.|.|.|T|T|T|T|T|T|
|3<br />acb|.|.|.|.|.|.|T|T|T|
|4<br />acba|.|.|.|.|.|.|.|T|T|


|   |<br />0|c<br />1|c<br />d<br />2|c<br />d<br />a<br />3|c<br />d<br />a<br />b<br />4|c<br />d<br />a<br />b<br />c<br />5|c<br />d<br />a<br />b<br />c<br />a<br />6|c<br />d<br />a<br />b<br />c<br />a<br />c<br />7|c<br />d<br />a<br />b<br />c<br />a<br />c<br />a<br />8|
|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|
|0<br />|T|T|T|T|T|T|T|T|T|
|1<br />a|.|.|.|T|T|T|T|T|T|
|2<br />ac|.|.|.|.|.|T|T|T|T|
|3<br />acb|.|.|.|.|.|.|.|.|.|
|4<br />acba|.|.|.|.|.|.|.|.|.|



# Worked example: part 2
This was a harder task, but the test data I provided didn't require the general case to be solved. 

The task was to return if $s_1$ and $s_2$ could be interleaved to form $s_3$. That's a stronger condition than just saying that both $s_1$ and $s_2$ are subsequences of $s_3$. For instance, `aba` and `aca` are both subsequences of `abbcca`, but there's no way of interleaving `aba` and `aca` to form `abbcca` (the interleaved sequence should have four `a`s, one `b`, and one `c`).
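The `aba` / `aca` / `abbcca` example can be checked directly. This is a minimal sketch with a hypothetical `is_subsequence` helper. It compares letter counts using `collections.Counter`, which is a necessary condition for an interleaving but not a sufficient one, so it is no substitute for the full check.

```python
from collections import Counter

def is_subsequence(needle, haystack):
    """Return True if needle can be read off haystack, in order, skipping letters."""
    remaining = iter(haystack)
    return all(c in remaining for c in needle)

s1, s2, s3 = 'aba', 'aca', 'abbcca'

both_subsequences = is_subsequence(s1, s3) and is_subsequence(s2, s3)
same_letters = Counter(s1 + s2) == Counter(s3)

print(both_subsequences)  # True: each string alone can be found in s3
print(same_letters)       # False: s3 doesn't use up the letters of s1 and s2
```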

For the test data provided, there was only one string which had both $s_1$ and $s_2$ as subsequences. I should have given other distractors in the test data, where $s_1$ and $s_2$ were both subsequences but the distractor wasn't formed from the interleaving.

Anyway, the solution I was hoping for was another dynamic programming one. 

A recursive solution to the problem (can $s_1$ and $s_2$ be interleaved to form $s_3$?) looks like:

> If the last characters of $s_1$ and $s_3$ are the same, $s_1$ and $s_2$ can be interleaved to form $s_3$ if `butlast`($s_1$) and $s_2$ can be interleaved to form `butlast`($s_3$).

> If the last characters of $s_2$ and $s_3$ are the same, $s_1$ and $s_2$ can be interleaved to form $s_3$ if $s_1$ and `butlast`($s_2$) can be interleaved to form `butlast`($s_3$).

> If the last characters of $s_1$ and $s_2$ and $s_3$ are all the same, check both of the conditions above, returning True if either is True.

> If the last character of $s_3$ matches neither the last character of $s_1$ nor the last character of $s_2$, return False.

> There are three base cases. If $s_1$ is empty, return $s_2$ == $s_3$. If $s_2$ is empty, return $s_1$ == $s_3$. If the combined length of $s_1$ and $s_2$ is different from the length of $s_3$, return False.
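The recursive description can be sketched with memoisation, so that each pair of prefix lengths is only examined once. This is an illustration under my own assumptions: `is_interleave_memo` and `can_build` are hypothetical names, and the recursion is written on prefix lengths (_i_, _j_) instead of slicing off the last characters, which lets `functools.lru_cache` act as the lookup table.

```python
from functools import lru_cache

def is_interleave_memo(s1, s2, s3):
    """Return True if s3 is some interleaving of s1 and s2."""
    if len(s1) + len(s2) != len(s3):
        return False

    @lru_cache(maxsize=None)
    def can_build(i, j):
        # Can s1[:i] and s2[:j] be interleaved to form s3[:i+j]?
        if i == 0 and j == 0:
            return True
        last = s3[i + j - 1]
        # The last character of the s3 prefix came from s1...
        if i > 0 and s1[i-1] == last and can_build(i-1, j):
            return True
        # ...or it came from s2
        if j > 0 and s2[j-1] == last and can_build(i, j-1):
            return True
        return False

    return can_build(len(s1), len(s2))

# Lines 0, 1 and 4 of the example bills in the problem statement
print(is_interleave_memo('accbadaadc', 'bbbbaabada', 'accbbabbdbaaabadaadc'))  # True
# Both are subsequences of the third string, but it isn't an interleaving
print(is_interleave_memo('aba', 'aca', 'abbcca'))  # False
```

For very long strings the recursion depth could become a problem, which is one reason to prefer filling out the table bottom-up.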

This gives us the ammunition to build the dynamic programming table. The cell at (_i_, _j_) will contain True if the first _i_ characters of $s_1$ can be interleaved with the first _j_ characters of $s_2$ to form the first _i_ + _j_ characters of $s_3$.

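All cells in the table initially contain False; the cell at (0, 0) is then set to True, as two empty strings interleave to form the empty string.

When filling out the table, you look at the cell above, at (_i_ - 1, _j_), if the last characters of the prefixes of $s_1$ and $s_3$ are the same, and at the cell to the left, at (_i_, _j_ - 1), if the last characters of the prefixes of $s_2$ and $s_3$ are the same. The cell contains True if either of those routes starts from a cell that contains True.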

## Part 1

In [5]:
def is_subseq_simple(s1, s2):
    i = 0
    for c in s2:
        if i < len(s1) and s1[i] == c:
            i += 1
    return i == len(s1)

In [6]:
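def is_subseq_simple_shortcut(s1, s2):
    # Walk both strings from the end; i is the next position in s1 to match
    i = len(s1) - 1
    for c in reversed(s2):
        # Test before indexing, so that an empty s1 doesn't raise an IndexError
        if i < 0: break
        if s1[i] == c:
            i -= 1
    return i < 0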

In [8]:
def show_backtrace_s(bps):
    i = max([0] + [k[0] for k in bps])
    j = max([0] + [k[1] for k in bps])
    chars = ''
    if (i, j) in bps:
        while i != 0 and j != 0:
            if bps[i, j][3] == 'seq1':
                chars += bps[i, j][2].upper()
            else:
                chars += bps[i, j][2]
            i, j = bps[i, j][0], bps[i, j][1] 
        return ''.join(list(reversed(chars)))
    else:
        return ''

In [9]:
def is_subseq_recursive(s1, s2):
    if not s1:
        return True
    elif len(s1) > len(s2):
        return False
    else:
        if s1[-1] == s2[-1]:
            return is_subseq_recursive(s1, s2[:-1]) or is_subseq_recursive(s1[:-1], s2[:-1])
        else:
            return is_subseq_recursive(s1, s2[:-1])

In [10]:
def is_subseq(seq1, seq2, return_backpointers=False, return_table=False, debug=False):
    """Return true if seq1 is a subsequence of seq2.
    If return_backpointers, also return the set of backpointers to
    reconstruct the subsequence"""
    
    # dp_table[i, j] is True if first i characters of seq1 can
    # be found in the first j characters of seq2
  
    dp_table = {(i, j): False
               for i in range(len(seq1)+1)
               for j in range(len(seq2)+1)}

    backpointers = {}
    
    for i in range(len(seq1)+1):
        for j in range(i, len(seq2)+1):
            if i == 0:
                dp_table[i, j] = True
                if debug: print('aa', i, j, '!', '!', dp_table[i, j])
            elif j == 0:
                dp_table[i, j] = False
                if debug: print('zz', i, j, '!', '!', dp_table[i, j])
            else:
                # extend by character from s2
                if dp_table[i, j-1]:
                    dp_table[i, j] = True
                    backpointers[i, j] = (i, j-1, seq2[j-1], 'seq2')
                    if debug: print('s2', i, j, seq1[i-1], seq2[j-1], dp_table[i, j])                
                # extend by character from s1
                if dp_table[i-1, j-1] and seq1[i-1] == seq2[j-1]:
                    dp_table[i, j] = True
                    backpointers[i, j] = (i-1, j-1, seq1[i-1], 'seq1')                
                    if debug: print('s1', i, j, seq1[i-1], seq2[j-1], dp_table[i, j])
                if not dp_table[i, j]:
                    if debug: print('xx', i, j, seq1[i-1], seq2[j-1], dp_table[i, j]) 
    
    if return_backpointers or return_table:
        retval = [dp_table[len(seq1), len(seq2)]]
        if return_backpointers:
            retval += [backpointers]
        if return_table:
            retval += [dp_table]
        return tuple(retval)
    else:
        return dp_table[len(seq1), len(seq2)]

In [11]:
def is_subseq_rows(seq1, seq2, return_backpointers=False, debug=False):
    """Return true if seq1 is a subsequence of seq2.
    If return_backpointers, also return the set of backpointers to
    reconstruct the subsequence"""
    
    # dp_table[i, j] is True if first i characters of seq1 can
    # be found in the first j characters of seq2
  
    backpointers = {}
    
    for i in range(len(seq1)+1):
        row = [False] * (len(seq2)+1)
        for j in range(i, len(seq2)+1):
            if i == 0:
                row[j] = True
                if debug: print('aa', i, j, '!', '!', row[j])
            elif j == 0:
                row[j] = False
                if debug: print('zz', i, j, '!', '!', row[j])
            else:
                # extend by character from s2
                if row[j-1]:
                    row[j] = True
                    backpointers[i, j] = (i, j-1, seq2[j-1], 'seq2')
                    if debug: print('s2', i, j, seq1[i-1], seq2[j-1], row[j])                
                # extend by character from s1
                if previous_row[j-1] and seq1[i-1] == seq2[j-1]:
                    row[j] = True
                    backpointers[i, j] = (i-1, j-1, seq1[i-1], 'seq1')                
                    if debug: print('s1', i, j, seq1[i-1], seq2[j-1], row[j])
                if not row[j]:
                    if debug: print('xx', i, j, seq1[i-1], seq2[j-1], row[j])
        previous_row = row
    
    if return_backpointers:
        return row[-1], backpointers
    else:
        return row[-1]

In [12]:
sum(1 for s in bills
   if s != 0
   if is_subseq(bills[0], bills[s]))

22

In [13]:
sum(1 for s in bills
   if s != 0
   if is_subseq_rows(bills[0], bills[s]))

22

In [14]:
sum(1 for s in bills
   if s != 0
   if is_subseq_simple(bills[0], bills[s]))

22

In [15]:
sum(1 for s in bills
   if s != 0
   if is_subseq_simple_shortcut(bills[0], bills[s]))

22

In [17]:
%%timeit
sum(1 for s in bills
   if s != 0
   if is_subseq_simple(bills[0], bills[s]))

100 loops, best of 3: 6.11 ms per loop


In [18]:
%%timeit
sum(1 for s in bills
   if s != 0
   if is_subseq_simple_shortcut(bills[0], bills[s]))

100 loops, best of 3: 4.24 ms per loop


In [20]:
%%timeit
sum(1 for s in bills
   if s != 0
   if is_subseq(bills[0], bills[s]))

1 loop, best of 3: 7.75 s per loop


In [21]:
[s for s in bills
   if s != 0
   if is_subseq(bills[0], bills[s])]

[11,
 21,
 25,
 30,
 33,
 38,
 43,
 45,
 48,
 50,
 52,
 55,
 56,
 61,
 75,
 80,
 91,
 103,
 104,
 113,
 144,
 146]

In [22]:
%time is_subseq(bills[0], bills[11]), is_subseq(bills[0], bills[3])

CPU times: user 140 ms, sys: 4 ms, total: 144 ms
Wall time: 144 ms


(True, False)

In [23]:
# %time is_subseq_recursive(bills[0], bills[11])

In [24]:
v, bp, t = is_subseq(bills[0], bills[11], return_backpointers=True, return_table=True)
print(show_annotated_table(t, bp))
print(show_backtrace_s(bp))

. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
. c g d b h c e c a b h g f e g h a a c a c c h d g g g g e e f g h g b c g e g a e c a e c g d g e e e g g d d h b a e h b d f b g g f g f h b f e b d h c c d c g h c f a b h c c c h b h f f c c d 

## Part 2

In [25]:
def is_interleave_recursive(s1, s2, s3):
    if not s1:
        return s2 == s3
    elif not s2:
        return s1 == s3
    elif len(s1) + len(s2) != len(s3):
        return False
    else:
        if s1[-1] == s3[-1] and s2[-1] == s3[-1]:
            return (is_interleave_recursive(s1[:-1], s2, s3[:-1]) 
                    or 
                    is_interleave_recursive(s1, s2[:-1], s3[:-1]) )
        elif s1[-1] == s3[-1]:
            return is_interleave_recursive(s1[:-1], s2, s3[:-1])
        elif s2[-1] == s3[-1]:
            return is_interleave_recursive(s1, s2[:-1], s3[:-1])
        else:
            return False

In [26]:
def is_interleave(seq1, seq2, seq3, return_backpointers=False, return_table=False, debug=False):
    """Return true if seq3 is some interleaved merge of seq1 and seq2.
    If return_backpointers, also return the set of backpointers to
    reconstruct the interleaving"""
    
    # dp_table[i, j] is True if first i+j characters of seq3 are made up of 
    # an interleaving of the first i characters of seq1 and the 
    # first j characters of seq2
    
    if len(seq1) + len(seq2) != len(seq3):
        if return_backpointers or return_table:
            retval = [False]
            if return_backpointers:
                retval += [{}]
            if return_table:
                retval += [{}]
            return tuple(retval)
        else:
            return False
    
    dp_table = {(i, j): False
               for i in range(len(seq1)+1)
               for j in range(len(seq2)+1)}

    backpointers = {}

    for i in range(len(seq1)+1):
        for j in range(len(seq2)+1):
            if i == 0 and j == 0:
                dp_table[i, j] = True
                if debug: print('xxxx', i, j, '!', '!', '!', dp_table[i, j])
            elif i == 0:
                # extend by character from seq2
                if dp_table[i, j-1] and seq2[j-1] == seq3[i+j-1]:
                    dp_table[i, j] = True
                    backpointers[i, j] = (i, j-1, seq2[j-1], 'seq2')
                if debug: print('seq2', i, j, '!', seq2[j-1], seq3[i+j-1], dp_table[i, j])
            elif j == 0:
                # extend by character from seq1
                if dp_table[i-1, j] and seq1[i-1] == seq3[i+j-1]:
                    dp_table[i, j] = True
                    backpointers[i, j] = (i-1, j, seq1[i-1], 'seq1')
                if debug: print('seq1', i, j, seq1[i-1], '!', seq3[i+j-1], dp_table[i, j])
            else:
                # extend by character from seq2
                if dp_table[i, j-1] and seq2[j-1] == seq3[i+j-1]:
                    dp_table[i, j] = True
                    backpointers[i, j] = (i, j-1, seq2[j-1], 'seq2')
                    if debug: print('seq2', i, j, seq1[i-1], seq2[j-1], seq3[i+j-1], dp_table[i, j])                
                # extend by character from seq1
                if dp_table[i-1, j] and seq1[i-1] == seq3[i+j-1]:
                    dp_table[i, j] = True
                    backpointers[i, j] = (i-1, j, seq1[i-1], 'seq1')                
                    if debug: print('seq1', i, j, seq1[i-1], seq2[j-1], seq3[i+j-1], dp_table[i, j])
                if not dp_table[i, j]:
                    if debug: print('xxxx', i, j, seq1[i-1], seq2[j-1], seq3[i+j-1], dp_table[i, j])

    if return_backpointers or return_table:
        retval = [dp_table[len(seq1), len(seq2)]]
        if return_backpointers:
            retval += [backpointers]
        if return_table:
            retval += [dp_table]
        return tuple(retval)
    else:
        return dp_table[len(seq1), len(seq2)]

In [27]:
def is_interleave_rows(seq1, seq2, seq3, return_backpointers=False, debug=False):
    """Return true if seq3 is some interleaved merge of seq1 and seq2.
    If return_backpointers, also return the set of backpointers to
    reconstruct the interleaving.
    
    This version doesn't build the whole table, just keeps the current and previous rows"""
    
    # dp_table[i, j] is True if first i+j characters of seq3 are made up of 
    # an interleaving of the first i characters of seq1 and the 
    # first j characters of seq2
    
    if len(seq1) + len(seq2) != len(seq3):
        if return_backpointers:
            return False, {}
        else:
            return False
    

    backpointers = {}

    for i in range(len(seq1)+1):
        row = [False] * (len(seq2)+1)
        for j in range(len(seq2)+1):
            if i == 0 and j == 0:
                row[j] = True
                if debug: print('xxxx', i, j, '!', '!', '!', row[j])
            elif i == 0:
                # extend by character from seq2
                if row[j-1] and seq2[j-1] == seq3[i+j-1]:
                    row[j] = True
                    backpointers[i, j] = (i, j-1, seq2[j-1], 'seq2')
                if debug: print('seq2', i, j, '!', seq2[j-1], seq3[i+j-1], row[j])
            elif j == 0:
                # extend by character from seq1
                if previous_row[j] and seq1[i-1] == seq3[i+j-1]:
                    row[j] = True
                    backpointers[i, j] = (i-1, j, seq1[i-1], 'seq1')
                if debug: print('seq1', i, j, seq1[i-1], '!', seq3[i+j-1], row[j])
            else:
                # extend by character from seq2
                if row[j-1] and seq2[j-1] == seq3[i+j-1]:
                    row[j] = True
                    backpointers[i, j] = (i, j-1, seq2[j-1], 'seq2')
                    if debug: print('seq2', i, j, seq1[i-1], seq2[j-1], seq3[i+j-1], row[j])                
                # extend by character from seq1
                if previous_row[j] and seq1[i-1] == seq3[i+j-1]:
                    row[j] = True
                    backpointers[i, j] = (i-1, j, seq1[i-1], 'seq1')                
                    if debug: print('seq1', i, j, seq1[i-1], seq2[j-1], seq3[i+j-1], row[j])
                if not row[j]:
                    if debug: print('xxxx', i, j, seq1[i-1], seq2[j-1], seq3[i+j-1], row[j])
        previous_row = row

    if return_backpointers:
        return row[-1], backpointers
    else:
        return row[-1]

In [28]:
def is_interleave_rows2(seq1, seq2, seq3, return_backpointers=False, debug=False):
    """Return true if seq3 is some interleaved merge of seq1 and seq2.
    If return_backpointers, also return the set of backpointers to
    reconstruct the interleaving.
    
    This version doesn't keep the whole table, just the current and previous
    rows. It also builds the current row as it goes along, rather than
    building the whole row and updating elements as required."""
    
    # dp_table[i, j] is True if first i+j characters of seq3 are made up of 
    # an interleaving of the first i characters of seq1 and the 
    # first j characters of seq2
    
    if len(seq1) + len(seq2) != len(seq3):
        if return_backpointers:
            return False, {}
        else:
            return False
    

    backpointers = {}

    for i in range(len(seq1)+1):
        row = []
        for j in range(len(seq2)+1):
            if i == 0 and j == 0:
                row += [True]
                if debug: print('xxxx', i, j, '!', '!', '!', row[j])
            elif i == 0:
                # extend by character from seq2
                if row[j-1] and seq2[j-1] == seq3[i+j-1]:
                    row += [True]
                    backpointers[i, j] = (i, j-1, seq2[j-1], 'seq2')
                else:
                    row += [False]
                if debug: print('seq2', i, j, '!', seq2[j-1], seq3[i+j-1], row[j])
            elif j == 0:
                # extend by character from seq1
                if previous_row[j] and seq1[i-1] == seq3[i+j-1]:
                    row += [True]
                    backpointers[i, j] = (i-1, j, seq1[i-1], 'seq1')
                else:
                    row += [False]
                if debug: print('seq1', i, j, seq1[i-1], '!', seq3[i+j-1], row[j])
            else:
                # extend by character from seq2
                if row[j-1] and seq2[j-1] == seq3[i+j-1]:
                    row += [True]
                    backpointers[i, j] = (i, j-1, seq2[j-1], 'seq2')
                    if debug: print('seq2', i, j, seq1[i-1], seq2[j-1], seq3[i+j-1], row[j])                
                # extend by character from seq1
                elif previous_row[j] and seq1[i-1] == seq3[i+j-1]:
                    row += [True]
                    backpointers[i, j] = (i-1, j, seq1[i-1], 'seq1')                
                    if debug: print('seq1', i, j, seq1[i-1], seq2[j-1], seq3[i+j-1], row[j])
                else:
                    row += [False]
                    if debug: print('xxxx', i, j, seq1[i-1], seq2[j-1], seq3[i+j-1], row[j])
        previous_row = row

    if return_backpointers:
        return row[-1], backpointers
    else:
        return row[-1]

In [29]:
def show_backtrace_i(bps):
    i = max([0] + [k[0] for k in bps])
    j = max([0] + [k[1] for k in bps])
    chars = ''
    if (i, j) in bps:
        while i != 0 or j != 0:
            if bps[i, j][3] == 'seq1':
                chars += bps[i, j][2].upper()
            else:
                chars += bps[i, j][2]
            i, j = bps[i, j][0], bps[i, j][1] 
        return ''.join(list(reversed(chars)))
    else:
        return ''

In [30]:
[s for s in bills
   if is_interleave(bills[0], bills[1], bills[s])]

[30]

In [31]:
[s for s in bills
   if is_interleave_rows(bills[0], bills[1], bills[s])]

[30]

In [32]:
[s for s in bills
   if is_interleave_rows2(bills[0], bills[1], bills[s])]

[30]

In [33]:
[s for s in bills
   if is_subseq_simple(bills[0], bills[s])
   if is_subseq_simple(bills[1], bills[s])]

[30]

In [34]:
v, bp, t = is_interleave(bills[0], bills[1], bills[30], return_backpointers=True, return_table=True)
print(show_table(t))
print(show_backtrace_i(bp))
v

T . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
T . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
T T . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 

True

In [35]:
[s for s in bills
   if is_interleave_recursive(bills[0], bills[1], bills[s])]

[30]

In [36]:
%%timeit
[s for s in bills
   if is_interleave(bills[0], bills[1], bills[s])]

1 loop, best of 3: 3.04 s per loop


In [37]:
%%timeit
[s for s in bills
   if is_interleave_rows(bills[0], bills[1], bills[s])]

1 loop, best of 3: 960 ms per loop


In [38]:
%%timeit
[s for s in bills
   if is_interleave_rows2(bills[0], bills[1], bills[s])]

1 loop, best of 3: 1.47 s per loop


In [39]:
%%timeit
[s for s in bills
   if is_interleave_recursive(bills[0], bills[1], bills[s])]

1000 loops, best of 3: 511 µs per loop


# Sense solution
The Sense solution took:
* 25.8 seconds to load file
* 15203.9 seconds to find subsequences (4.22 hours; 4 hours, 13 minutes, 23.9 seconds)
* 40083.8 seconds to check all interleavings (11.13 hours; 11 hours, 8 minutes, 3.8 seconds)

Total of 15 hours 21 minutes 53.5 seconds.

In [40]:
rtime = 15203.9
(rtime / 3600,
 int(rtime / 3600), 
 int(rtime / 60 - int(rtime / 3600) * 60), 
 rtime - int(rtime / 60) * 60
)

(4.223305555555555, 4, 13, 23.899999999999636)

In [41]:
rtime = 40083.8
(rtime / 3600,
 int(rtime / 3600), 
 int(rtime / 60 - int(rtime / 3600) * 60), 
 rtime - int(rtime / 60) * 60
)

(11.13438888888889, 11, 8, 3.8000000000029104)

In [42]:
rtime = 25.8 + 15203.9 +40083.8
(rtime / 3600,
 int(rtime / 3600), 
 int(rtime / 60 - int(rtime / 3600) * 60), 
 rtime - int(rtime / 60) * 60
)

(15.36486111111111, 15, 21, 53.5)
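The same breakdown into hours, minutes and seconds can be done with Python's built-in `divmod`. This is a small sketch; `hours_minutes_seconds` is a name I've made up for it, and the seconds are rounded to one decimal place to hide floating-point noise.

```python
def hours_minutes_seconds(total_seconds):
    """Split a duration in seconds into (hours, minutes, seconds)."""
    minutes, seconds = divmod(total_seconds, 60)
    hours, minutes = divmod(minutes, 60)
    return int(hours), int(minutes), round(seconds, 1)

print(hours_minutes_seconds(15203.9))                    # (4, 13, 23.9)
print(hours_minutes_seconds(40083.8))                    # (11, 8, 3.8)
print(hours_minutes_seconds(25.8 + 15203.9 + 40083.8))   # (15, 21, 53.5)
```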

# Explanations

In [43]:
starget = 'acba'
swrong = 'cdabcaca'
sinterleave = 'abcacbba'
ssubseq = 'aaccabab'

In [44]:
def show_subseq_md_table(short, long, table):
    header = '|   |' + '|'.join('{}<br />{}'.format('<br />'.join(str(long[:i])), i) for i in range(len(long) + 1)) + '|'
    separator = '|:---:' * (len(long) + 2) + '|'
    rows = []
    columns = sorted(set(k[1] for k in table))
    for r in range(len(short) + 1):
        row = '|{}<br />{}|'.format(r, short[:r])
        row += '|'.join('T' if table[r, c] else '.' for c in columns)
        row += '|'
        rows += [row]
    return '\n'.join([header] + [separator] + rows)
#     return '\n'.join(
#         ' '.join('T' if table[i, j] else '.' for j in sorted(set([k[1] for k in table])))
#         for i in sorted(set([k[0] for k in table])))        

In [45]:
v, bp, t = is_subseq(starget, ssubseq, return_backpointers=True, return_table=True)
print(show_annotated_table(t, bp))
print(show_backtrace_s(bp))

. . . . . . . . .
. a a c c a b a b
. . . c c a b a b
. . . . . . b a b
. . . . . . . a b
AcCaBAb


In [46]:
print(show_table(t))

T T T T T T T T T
. T T T T T T T T
. . . T T T T T T
. . . . . . T T T
. . . . . . . T T


In [47]:
print(show_subseq_md_table(starget, ssubseq, t))

|   |<br />0|a<br />1|a<br />a<br />2|a<br />a<br />c<br />3|a<br />a<br />c<br />c<br />4|a<br />a<br />c<br />c<br />a<br />5|a<br />a<br />c<br />c<br />a<br />b<br />6|a<br />a<br />c<br />c<br />a<br />b<br />a<br />7|a<br />a<br />c<br />c<br />a<br />b<br />a<br />b<br />8|
|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|
|0<br />|T|T|T|T|T|T|T|T|T|
|1<br />a|.|T|T|T|T|T|T|T|T|
|2<br />ac|.|.|.|T|T|T|T|T|T|
|3<br />acb|.|.|.|.|.|.|T|T|T|
|4<br />acba|.|.|.|.|.|.|.|T|T|



In [48]:
v, bp, t = is_subseq(starget, swrong, return_backpointers=True, return_table=True)
print(show_annotated_table(t, bp))
print(show_backtrace_s(bp))

. . . . . . . . .
. . . a b c a c a
. . . . . c a c a
. . . . . . . . .
. . . . . . . . .
ACa


In [49]:
print(show_subseq_md_table(starget, swrong, t))

|   |<br />0|c<br />1|c<br />d<br />2|c<br />d<br />a<br />3|c<br />d<br />a<br />b<br />4|c<br />d<br />a<br />b<br />c<br />5|c<br />d<br />a<br />b<br />c<br />a<br />6|c<br />d<br />a<br />b<br />c<br />a<br />c<br />7|c<br />d<br />a<br />b<br />c<br />a<br />c<br />a<br />8|
|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|
|0<br />|T|T|T|T|T|T|T|T|T|
|1<br />a|.|.|.|T|T|T|T|T|T|
|2<br />ac|.|.|.|.|.|T|T|T|T|
|3<br />acb|.|.|.|.|.|.|.|.|.|
|4<br />acba|.|.|.|.|.|.|.|.|.|

