# Day 9: Stream Processing

A large stream blocks your path. According to the locals, it's not safe to cross the stream at the moment because it's full of garbage. You look down at the stream; rather than water, you discover that it's a stream of characters.

You sit for a while and record part of the stream (your puzzle input).
The characters represent **groups** - sequences that begin with `{` and end with `}`.
Within a group, there are zero or more other things, separated by commas: either another **group** or **garbage**.
Groups are nestable: because a group can contain other groups, a `}` closes only the most recently opened group that is still unclosed.
Your puzzle input represents a single, large group which itself contains many smaller ones.

Sometimes, instead of a group, you will find **garbage**.
Garbage begins with `<` and ends with `>`. Between those angle brackets, almost any character can appear, *including* `{` and `}`. Within garbage, `<` has *no* special meaning.

In a futile attempt to clean up the garbage, some program has canceled some of the characters within it using `!`: inside garbage, any character that comes (immediately) after `!` should be ignored, including `<`, `>`, and even another `!`.

You don't see any characters that deviate from these rules.
Outside garbage, you only find well-formed groups, and garbage always terminates according to the rules above.

In [1]:
B = '{'
E = '}'


def G(content):
    return {'t': 'garbage', 'c': content}


def parse_group(s):
    # start of group
    if not s or s[0] != '{':
        raise Exception('Expect {, got %s.' % (s[0] if s else 'EOF'))
    s = s[1:]
    yield B, s

    first = True

    while True:
        if not s:
            raise Exception('Expect %s or }, got EOF.' %
                            ('{ or <' if first else 'comma'))

        # end of group?
        if s[0] == '}':
            break

        # need comma unless first
        if not first:
            if s[0] != ',':
                raise Exception('Expect comma, got %s.' % s[0])
            s = s[1:]
            # the comma may have been the last character
            if not s:
                raise Exception('Expect { or <, got EOF.')

        # expect subgroup or garbage
        if s[0] == '{':
            for tok, s in parse_group(s):
                yield tok, s
        elif s[0] == '<':
            for tok, s in parse_garbage(s):
                yield tok, s
        else:
            raise Exception('Expect { or <, got %s.' % s[0])
        first = False

    # end of group
    yield E, s[1:]


def parse_garbage(s):
    # start of garbage
    content = ''
    if not s or s[0] != '<':
        raise Exception('Expect <, got %s.' % (s[0] if s else 'EOF'))
    s = s[1:]

    while True:
        if not s:
            raise Exception('Expect garbage character, got EOF.')

        # end of garbage?
        if s[0] == '>':
            break

        # skip?
        if s[0] == '!':
            s = s[2:]
        else:
            content += s[0]
            s = s[1:]

    # emit token
    yield G(content), s[1:]


def parse(s):
    for tok, s in parse_group(s):
        yield tok
    if s:
        raise Exception('Expect EOF, got %s.' % s)
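
The parser above re-slices the string (`s = s[1:]`) at every step, which copies the remainder each time. As a point of comparison, here is a minimal sketch of an index-based tokenizer that walks the stream once. The name `tokens` and the `('garbage', content)` tuple shape are assumptions made for this illustration, and unlike `parse_group` it does not validate commas or group nesting.

```python
def tokens(stream):
    """Yield '{', '}' and ('garbage', content) tokens from a stream.

    Hypothetical helper for illustration; it skips commas and does not
    check that groups are well formed.
    """
    i = 0
    n = len(stream)
    while i < n:
        ch = stream[i]
        if ch == '<':
            # Collect garbage up to the first non-canceled '>'.
            i += 1
            content = []
            while i < n and stream[i] != '>':
                if stream[i] == '!':
                    i += 2  # skip '!' and the character it cancels
                else:
                    content.append(stream[i])
                    i += 1
            if i >= n:
                raise ValueError('unterminated garbage')
            yield ('garbage', ''.join(content))
        elif ch in '{}':
            yield ch
        # Move past the current character ('>' after garbage, a brace,
        # or a comma, which carries no information here).
        i += 1


print(list(tokens('{<a>,{<!>},<b>}')))
```

Because the `!` cancels the first `>`, the second piece of garbage in this example runs on until the `>` after `b`.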

Here are some self-contained pieces of garbage:

In [2]:
p = lambda s: list(parse_garbage(s))

# empty garbage
assert p('<>') == [(G(''), '')]

# garbage containing random characters
assert p('<random characters>') == [(G('random characters'), '')]

# the extra < are ignored
assert p('<<<<>') == [(G('<<<'), '')]

# the first > is canceled
assert p('<{!>}>') == [(G('{}'), '')]

# the second ! is canceled, allowing the > to terminate the garbage
assert p('<!!>') == [(G(''), '')]

# the second ! and the first > are canceled
assert p('<!!!>>') == [(G(''), '')]

# ends at the first >
assert p('<{o"i!a,<{i<a>') == [(G('{o"i,<{i<a'), '')]

Here are some examples of whole streams and the number of groups they contain:

In [3]:
p = lambda s: list(parse(s))
c = lambda s: sum(tok == B for tok in parse(s))

# 1 group
assert p('{}') == [B, E]
assert c('{}') == 1

# 3 groups
assert p('{{{}}}') == [B, B, B, E, E, E]
assert c('{{{}}}') == 3

# also 3 groups
assert p('{{},{}}') == [B, B, E, B, E, E]
assert c('{{},{}}') == 3

# 6 groups
assert p('{{{},{},{{}}}}') == [B, B, B, E, B, E, B, B, E, E, E, E]
assert c('{{{},{},{{}}}}') == 6

# 1 group (which itself contains garbage).
assert p('{<{},{},{{}}>}') == [B, G('{},{},{{}}'), E]
assert c('{<{},{},{{}}>}') == 1

# 1 group
assert p('{<a>,<a>,<a>,<a>}') == [B, G('a'), G('a'), G('a'), G('a'), E]
assert c('{<a>,<a>,<a>,<a>}') == 1

# 5 groups
assert p('{{<a>},{<a>},{<a>},{<a>}}') == [
    B, B, G('a'), E, B,
    G('a'), E, B, G('a'), E, B,
    G('a'), E, E
]
assert c('{{<a>},{<a>},{<a>},{<a>}}') == 5

# 2 groups (since all but the last > are canceled)
assert p('{{<!>},{<!>},{<!>},{<a>}}') == [B, B, G('},{<},{<},{<a'), E, E]
assert c('{{<!>},{<!>},{<!>},{<a>}}') == 2

Your goal is to find the total score for all groups in your input. Each group is assigned a score which is one more than the score of the group that immediately contains it. (The outermost group gets a score of 1.)

In [4]:
def score(s):
    level = 1
    total = 0
    for tok in parse(s):
        if tok == B:
            total += level
            level += 1
        elif tok == E:
            level -= 1
    return total


# score of 1
assert score('{}') == 1

# score of 1 + 2 + 3 = 6
assert score('{{{}}}') == 6

# score of 1 + 2 + 2 = 5
assert score('{{},{}}') == 5

# score of 1 + 2 + 3 + 3 + 3 + 4 = 16
assert score('{{{},{},{{}}}}') == 16

# score of 1
assert score('{<a>,<a>,<a>,<a>}') == 1

# score of 1 + 2 + 2 + 2 + 2 = 9
assert score('{{<ab>},{<ab>},{<ab>},{<ab>}}') == 9

# score of 1 + 2 + 2 + 2 + 2 = 9
assert score('{{<!!>},{<!!>},{<!!>},{<!!>}}') == 9

# score of 1 + 2 = 3
assert score('{{<a!>},{<a!>},{<a!>},{<ab>}}') == 3
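
The scoring rule can also be sketched without building tokens at all: track the current nesting depth and two flags while scanning characters. This is an alternative sketch, not the approach used in the cell above. The name `score_single_pass` is hypothetical, and the function assumes the input is well formed as the puzzle promises.

```python
def score_single_pass(stream):
    """Sum group scores in one pass (assumes a well-formed stream)."""
    depth = 0           # number of currently open groups
    total = 0
    in_garbage = False  # True between '<' and its closing '>'
    skip = False        # True right after a '!' inside garbage
    for ch in stream:
        if skip:
            skip = False          # this character was canceled
        elif in_garbage:
            if ch == '!':
                skip = True
            elif ch == '>':
                in_garbage = False
        elif ch == '<':
            in_garbage = True
        elif ch == '{':
            depth += 1
            total += depth        # a group scores its own depth
        elif ch == '}':
            depth -= 1
    return total


print(score_single_pass('{{{},{},{{}}}}'))
```

The trade-off is that malformed input (a missing comma, an unclosed group) passes silently here, whereas the token-based parser raises an error.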

What is the total score for all groups in your input?

In [5]:
with open('09.input') as f:
    puzzle = f.read().strip()
score(puzzle)

21037

## Part Two

Now, you're ready to remove the garbage.

To prove you've removed it, you need to count all of the characters within the garbage.
The leading and trailing `<` and `>` don't count, nor do any canceled characters or the `!` doing the canceling.

In [6]:
def count_garbage(s):
    total = 0
    for tok in parse(s):
        if isinstance(tok, dict) and tok['t'] == 'garbage':
            total += len(tok['c'])
    return total


c = lambda s: count_garbage('{' + s + '}')

# 0 characters
assert c('<>') == 0

# 17 characters
assert c('<random characters>') == 17

# 3 characters
assert c('<<<<>') == 3

# 2 characters
assert c('<{!>}>') == 2

# 0 characters
assert c('<!!>') == 0

# 0 characters
assert c('<!!!>>') == 0

# 10 characters
assert c('<{o"i!a,<{i<a>') == 10
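
As a cross-check on the counting rule, here is a minimal regex-based sketch. It relies on an assumption taken from the puzzle rules: `!` only ever appears inside garbage, so canceled pairs can be removed from the whole stream first. The name `count_garbage_regex` is hypothetical.

```python
import re


def count_garbage_regex(stream):
    """Count non-canceled garbage characters (assumes '!' occurs only in garbage)."""
    # Drop every '!' together with the character it cancels.
    cleaned = re.sub(r'!.', '', stream, flags=re.DOTALL)
    # Each piece of garbage now ends at the first '>' after its '<'.
    return sum(len(body) for body in re.findall(r'<([^>]*)>', cleaned))


print(count_garbage_regex('{<random characters>}'))
```

This version does no validation of the group structure, so it is best treated as a second opinion on the parser's result rather than a replacement for it.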

How many non-canceled characters are within the garbage in your puzzle input?

In [7]:
count_garbage(puzzle)

9495