# (Un)Packing containers

In the first notebook of this unit, I said that we want each FSA to be held by a single variable so that we can easily iterate over multiple FSAs.
This expansion unit explains why this isn't actually necessary and why we could have used Python's **unpacking** abilities instead.
This is a very nifty trick, and once you understand how it works you can use its counterpart **packing**.
With packing, you can define functions that take an arbitrary number of arguments.

## First some setup

Since this expansion unit builds directly on the first notebook, we have to carry over a few function definitions first.
Make sure to run the cell below.

In [None]:
# defining accepts with only two arguments
def accepts(sentence, fsa):
    """Test if FSA accepts sentence."""
    # set current state
    cs = fsa["I"]
    # iterate over sentence and follow along in automaton
    for word in sentence:
        cs = next_state_nolist(cs, word, fsa["T"])
    # did we make it to a final state?
    return cs in fsa["F"]


def next_state_nolist(cs, word, T):
    """Return state reached via word from current state"""
    for x, y, next_state in T:
        if x == cs and y == word:
            return next_state
    return False

## A first look at unpacking

Originally we defined each FSA as three separate variables `I`, `F`, and `T`.
But then we switched to a dictionary instead.
This made it easy to iterate over multiple automata as in the code below.

In [None]:
import re

fsa1 = {"I": 1, "F": {4}, "T": {(1, "John", 2),
                                (2, "likes", 3),
                                (3, "Bill", 4)}}
fsa2 = {"I": 1, "F": {3}, "T": {(1, "Sue", 2),
                                (2, "slept", 3),
                                (2, "snored", 3),
                                (2, "knows", 1)}}


sentences = ["John likes Bill", "Sue knows Sue snored", "John likes Sue"]
for sentence in sentences:
    tokenized = re.findall(r"\w+", sentence)
    for fsa in [fsa1, fsa2]:
        status = "well-formed" if accepts(tokenized, fsa) else "ill-formed"
        print(f"\"{sentence}\" is {status}")
    print("="*20)

But now suppose that we still had the format with three distinct variables.
This also means that we would have a slightly different definition of `accepts`.

In [None]:
def accepts(sentence, I, F, T):
    """Test if FSA accepts sentence. Takes 3 separate arguments for FSA."""
    # set current state
    cs = I
    # iterate over sentence and follow along in automaton
    for word in sentence:
        cs = next_state_nolist(cs, word, T)
    # did we make it to a final state?
    return cs in F

Could we do the same iteration as above?
It's tricky.
Here's an attempt that doesn't work.

In [None]:
import re

I1 = 1
F1 = {4}
T1 = {(1, "John", 2),
      (2, "likes", 3),
      (3, "Bill", 4)}

I2 = 1
F2 = {3}
T2 = {(1, "Sue", 2),
      (2, "slept", 3),
      (2, "snored", 3),
      (2, "knows", 1)}


sentences = ["John likes Bill", "Sue knows Sue snored", "John likes Sue"]
for sentence in sentences:
    tokenized = re.findall(r"\w+", sentence)
    for I, F, T in [I1, F1, T1, I2, F2, T2]:
        status = "well-formed" if accepts(tokenized, I, F, T) else "ill-formed"
        print(f"\"{sentence}\" is {status}")
    print("="*20)

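The loop fails because its first item is `I1`, the integer `1`, which cannot be unpacked into the three variables `I`, `F`, and `T`.
However, a minor change is sufficient to get it to work: group the variables of each FSA into a list of their own.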

In [None]:
for sentence in sentences:
    tokenized = re.findall(r"\w+", sentence)
    for I, F, T in [[I1, F1, T1], [I2, F2, T2]]:
        status = "well-formed" if accepts(tokenized, I, F, T) else "ill-formed"
        print(f"\"{sentence}\" is {status}")
    print("="*20)

This might not be surprising to you at all because we've already seen it before in variable assignments.

In [None]:
a, b = [3, 5]
print(a)
print(b)

The `for`-loop above generalizes this idea.
First, each step of the `for`-loop takes one item out of `[[I1, F1, T1], [I2, F2, T2]]`.
This is a list containing lists, so the first item is `[I1, F1, T1]`.
Since we use three variables `I`, `F`, `T` instead of a single variable `fsa`, Python realizes that we want to do the equivalent of

```python
I, F, T = [I1, F1, T1]
```

And that's all there is to this.
We have successfully iterated over multiple automata without a unified data structure for FSAs.
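
To make the equivalence concrete, here is a minimal sketch with made-up toy data (the name `triples` and its contents are purely illustrative).
It shows that unpacking in the loop header behaves like an assignment at the start of each iteration, and what happens when the number of variables doesn't match the number of items.

```python
# Hypothetical toy data: each inner list has exactly three items.
triples = [[1, "a", True], [2, "b", False]]

# Unpacking directly in the loop header ...
for number, letter, flag in triples:
    print(number, letter, flag)

# ... does the same as unpacking by assignment inside the loop body.
for triple in triples:
    number, letter, flag = triple
    print(number, letter, flag)

# If the number of variables and items differs, Python raises a ValueError.
try:
    number, letter = [1, "a", True]
except ValueError as error:
    print("ValueError:", error)
```

So the only requirement is that every item of the outer list can be split into exactly as many pieces as there are loop variables.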

But things don't stop here.
Instead of the code above, we could have also used a `for`-loop with just one variable, followed by **list unpacking**.

In [None]:
for sentence in sentences:
    tokenized = re.findall(r"\w+", sentence)
    for fsa in [[I1, F1, T1], [I2, F2, T2]]:
        status = "well-formed" if accepts(tokenized, *fsa) else "ill-formed"
        print(f"\"{sentence}\" is {status}")
    print("="*20)

For comparison, here's a piece of code that doesn't work.

In [None]:
for sentence in sentences:
    tokenized = re.findall(r"\w+", sentence)
    for fsa in [[I1, F1, T1], [I2, F2, T2]]:
        status = "well-formed" if accepts(tokenized, fsa) else "ill-formed"
        print(f"\"{sentence}\" is {status}")
    print("="*20)

Can you spot the difference?
The first version uses `accepts(tokenized, *fsa)`, the second one `accepts(tokenized, fsa)`.
The `*` tells Python to **unpack** the list before passing it into `accepts`.
In other words, instead of `accepts(tokenized, [I1, F1, T1])`, Python runs `accepts(tokenized, I1, F1, T1)`.
Unpacking is the kind of technique that you might not need all the time, but when you need it you really need it.
So it's good to have it in your toolbox.
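
If you want to convince yourself that the starred call and the spelled-out call really are the same thing, a small sketch like the following may help.
The function `describe` is a hypothetical stand-in that just reports what it received, it is not part of the FSA code.

```python
def describe(sentence, initial, finals, transitions):
    """Hypothetical stand-in: report the four arguments that were received."""
    return (f"{sentence}: start={initial}, "
            f"finals={sorted(finals)}, transitions={len(transitions)}")


fsa = [1, {4}, {(1, "John", 2), (2, "likes", 3), (3, "Bill", 4)}]

# These two calls pass exactly the same four arguments.
unpacked = describe("John likes Bill", *fsa)
spelled_out = describe("John likes Bill", fsa[0], fsa[1], fsa[2])
print(unpacked == spelled_out)

# Without the star, the whole list counts as one argument, so two are missing.
try:
    describe("John likes Bill", fsa)
except TypeError as error:
    print("TypeError:", error)
```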

## Examples of unpacking

Intuitively, list unpacking replaces a list by a sequence of its items.
The effect of that is most easily seen in a `for`-loop.

In [None]:
print("Iterating over list with two lists")
for x in [[0, 1], [2, 3]]:
    print(x)
    
print()
    
print("Iterating over list with four items")
for x in [0, 1, 2, 3]:
    print(x)

print()

print("Iterating over list with two unpacked lists.")
print("It looks exactly the same as the list with four items.")
for x in [*[0, 1], *[2, 3]]:
    print(x)

Unpacking isn't actually limited to lists; it works with just about any kind of container.

In [None]:
print("Iterating over a list with unpacked dictionaries")
for x in [*{"a": 1, "b": 2}, *{"c": 3, "d": 4}]:
    print(x)
    
print()

print("Iterating over a list with unpacked sets")
for x in [*{"a", "b"}, *{"c", "d"}]:
    print(x)
    
print()

print("Iterating over a list with unpacked tuples")
for x in [*("a", "b"), *("c", "d")]:
    print(x)

print()

print("Iterating over a list with unpacked strings")
for x in [*"ab", *"cd"]:
    print(x)

Any kind of container can be unpacked this way.
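
One detail of the dictionary example is easy to miss: unpacking a dictionary gives you its keys, not its values.
The sketch below, using a made-up dictionary `scores`, shows how to get at the values or the key-value pairs instead.

```python
# Hypothetical example dictionary.
scores = {"a": 1, "b": 2}

keys = [*scores]             # unpacking a dictionary yields its keys
values = [*scores.values()]  # unpack the values view to get the values
pairs = [*scores.items()]    # unpack the items view to get (key, value) tuples

print(keys)
print(values)
print(pairs)
```

Also keep in mind that sets are unordered, so the items of an unpacked set may come out in a different order than you typed them.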

Alright, nifty, but what is any of that good for?

## Practical uses of unpacking

Unpacking is useful whenever you have a collection of containers but want to iterate over all their elements, rather than the containers themselves.
Suppose you have a collection of wordlists that you want to extract character n-grams from.
One option is to use two nested `for`-loops.

In [None]:
from nltk.corpus import brown, words

def extract_chargrams(word, n=2):
    return [word[m:m+n] for m in range(len(word) - (n - 1))]


charbigrams = []
for wordlist in [brown.words()[:20], words.words()[:20]]:
    for word in wordlist:
        charbigrams += extract_chargrams(word)
        
print(charbigrams[:10])

But instead we can use a single `for`-loop with unpacking.

In [None]:
from nltk.corpus import brown, words

def extract_chargrams(word, n=2):
    return [word[m:m+n] for m in range(len(word) - (n - 1))]


charbigrams = []
for word in [*brown.words()[:20], *words.words()[:20]]:
    charbigrams += extract_chargrams(word)
        
print(charbigrams[:10])
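
A word of caution: unpacking into a list builds one new list that holds every word, which can be wasteful for large corpora.
As a side note, and as a different technique from unpacking, `itertools.chain` from the standard library walks through the containers one after the other without building that list.
The sketch below uses two tiny made-up wordlists instead of the NLTK corpora so that it runs on its own.

```python
from itertools import chain


def extract_chargrams(word, n=2):
    return [word[m:m+n] for m in range(len(word) - (n - 1))]


# Hypothetical stand-ins for the two corpus slices.
wordlist1 = ["The", "Fulton"]
wordlist2 = ["A", "aa"]

# Version with unpacking: builds a combined list first.
unpacked_bigrams = []
for word in [*wordlist1, *wordlist2]:
    unpacked_bigrams += extract_chargrams(word)

# Version with chain: iterates over both lists lazily.
chained_bigrams = []
for word in chain(wordlist1, wordlist2):
    chained_bigrams += extract_chargrams(word)

print(unpacked_bigrams)
print(unpacked_bigrams == chained_bigrams)
```

For short lists like the ones in this notebook the difference doesn't matter, so use whichever reads better to you.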

Unpacking is particularly useful if you want to bundle up possible parameter values for a function.

In [None]:
def circumfixation(stem, prefix, suffix):
    return prefix + stem + suffix

circumfix1 = ("anti-", "-missile")

print(circumfixation("tank", *circumfix1))
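
Bundling pays off once there is more than one bundle.
Here is a small sketch that loops over several prefix-suffix pairs; the list `circumfixes` and the affixes in it are invented for illustration.

```python
def circumfixation(stem, prefix, suffix):
    return prefix + stem + suffix


# Hypothetical bundles of (prefix, suffix) pairs.
circumfixes = [("anti-", "-missile"), ("un", "able"), ("ge", "t")]

# Each tuple is unpacked into the prefix and suffix arguments.
results = [circumfixation("tank", *circumfix) for circumfix in circumfixes]
print(results)
```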

Speaking of functions, `*` can also be used in the definition of a function, as we'll see next.

## Using `*` for packing instead of unpacking

Sometimes you might want to write a function that can take an arbitrary number of arguments.
The standard way to do this is with the counterpart of unpacking, which is argument **packing**.

In [None]:
def funcatenate(*words):
    output = "funcatenating:"
    sep = " ^-^"
    for word in words:
        output += " " + word + sep
    return output


print(funcatenate("this", "function", "takes", "an", "unlimited", "number", "of", "arguments"))

So by using `*words`, we tell Python to collect all the arguments into a tuple `words`, which we can then use in the usual fashion.
By the way, this includes slicing.

In [None]:
def funcatenate(*words):
    output = "funcatenating:"
    sep = " ^-^"
    for word in words[2:]:
        output += " " + word + sep
    return output

print(funcatenate("this", "function", "ignores", "the", "first", "two", "arguments"))
print(funcatenate("nothing", "here"))
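
A quick way to see what `*words` actually collects is to inspect it directly.
The sketch below uses a hypothetical helper `inspect_packed`; it suggests that the packed arguments arrive as a tuple, which supports iteration, indexing, and slicing but cannot be modified in place.

```python
def inspect_packed(*words):
    """Hypothetical helper: report what the packed arguments look like."""
    return type(words).__name__, words, words[1:]


kind, packed, sliced = inspect_packed("a", "b", "c")
print(kind)
print(packed)
print(sliced)

# Calling the function without arguments gives an empty tuple, not an error.
print(inspect_packed())
```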

We can also leave some arguments unpacked so that they can be referenced separately.

In [None]:
def funcatenate(start, finale, *words):
    output = start + ":"
    sep = " ^-^"
    for word in words:
        output += " " + word + sep
    return output + finale

print(funcatenate("still funcatenating", " !!!", "this", "goes", "in", "the", "middle"))

Note that normal arguments should occur before the packed ones.
Otherwise Python treats them as keyword-only and complains about a missing argument when you try to pass them by position.

In [None]:
def funcatenate(start, *words, finale):
    output = start + ":"
    sep = " ^-^"
    for word in words:
        output += " " + word + sep
    return output + finale

print(funcatenate("still funcatenating", "this", "goes", "in", "the", "middle", " !!!"))

If you absolutely want some normal argument to occur after the packed ones, each function call must specify this argument by name like `finale` in the example below.

In [None]:
def funcatenate(start, *words, finale):
    output = start + ":"
    sep = " ^-^"
    for word in words:
        output += " " + word + sep
    return output + finale

print(funcatenate("still funcatenating", "this", "goes", "in", "the", "middle", finale=" !!!"))
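
A small variation on the cell above: if the argument after `*words` gets a default value, callers can leave it out entirely.
The default `""` is my own assumption for illustration, not something the earlier definitions use.

```python
def funcatenate(start, *words, finale=""):
    # finale comes after *words, so it can only be passed by name;
    # the default value (an assumption for this sketch) makes it optional
    output = start + ":"
    sep = " ^-^"
    for word in words:
        output += " " + word + sep
    return output + finale


with_finale = funcatenate("still funcatenating", "one", "two", finale=" !!!")
without_finale = funcatenate("still funcatenating", "one", "two")
print(with_finale)
print(without_finale)
```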

## A final test

Alright, there was a fair amount of new stuff in this expansion unit.
The ideas are very simple, but the fact that `*` is used for both packing and unpacking can be confusing.
So as a final test of your understanding, try to make sense of the code below.
Needless to say, this code is deliberately unreadable.

In [None]:
def rapid_growth(x, y, *zs):
    n = x * y
    for z in zs:
        n **= z
    return n


# ranges kept small: recent Python versions refuse to print integers with more than 4300 digits
for r in [(3, 5), (2, 6)]:
    print(rapid_growth(1, 2, *range(*r)))

## Bullet point summary

- Any container `con` can be unpacked with `*con`.
  Unpacking breaks the container down into its elements.
- Unpacking is useful with `for`-loops and for passing arguments into a function.
- When `*` is used in the definition of a function's arguments, it packs arguments into a tuple.
- Argument packing is needed when a function should take an unlimited number of arguments.