# Advent of Code 2018

Hey there! For this year of Advent of Code, I'm going to try journaling my progress through the month's problems using a Python notebook, inspired by the similar notebooks of [Peter Norvig](https://github.com/norvig/pytudes). I've considered doing other things this year, like using Advent of Code as an excuse to learn a new language, but last year I only completed the first 8 problems, so I figured I should focus on completing all of this year's problems before getting ahead of myself. (Plus, the easier challenges make for good interview practice, so it fits in with my current goals.)

Since I'm very comfortable with Python, I'm going to try to be competitive on timing (the time to write code is typically the main bottleneck, not the program execution), so I will try to finish on the leaderboard. But this is my first time using a Python notebook, so there's a chance this will slow me down. Better hope for the best!

## Day 0: Imports and utilities

Following the inspiration of Peter Norvig, here I will try to put some code and imports that I think could be useful. For now I'll just include libraries and functions I've used a decent amount.

In [1]:
import bisect
import math
import random
import re
import string

from blist import blist
from collections import Counter, defaultdict
from heapq import heappush, heappop
from itertools import accumulate, combinations, cycle, takewhile

# source: https://stackoverflow.com/a/30558049
def unique_permutations(elements):
    if len(elements) == 1:
        yield (elements[0],)
    else:
        unique_elements = set(elements)
        for first_element in unique_elements:
            remaining_elements = list(elements)
            remaining_elements.remove(first_element)
            for sub_permutation in unique_permutations(remaining_elements):
                yield (first_element,) + sub_permutation

def flatten(lst):
    return [elem for sublst in lst for elem in sublst]

## [Day 1](https://adventofcode.com/2018/day/1): Chronal Calibration

This problem was very straightforward. Getting a tight time is just a matter of having your environment set up, and fortunately I don't think using Jupyter really slowed me down much, as I had already prepared some starter code in advance for reading in input.

Unfortunately, I took 1:48 to solve the first star, but to get in the top 100 I would have needed 1:32; and I took 6:32 on the second star, but to get in the top 100 I would have needed 5:28. Besides reading the problem being a bottleneck, during the second star I was planning to loop over indices but ended up looping over values, so I ended up with an incorrect answer that cost me a minute. Also, my code definitely looks pretty sloppy...

In [2]:
# first star

with open('input/input1.txt') as f:
    X = list(map(int, f.read().split()))

sum(X)

556

In [3]:
# second star

freqs = set()
curr = 0
while True:
    found = False
    for x in X:
        curr += x
        if curr in freqs:
            found = True
            print(curr)
            break
        freqs.add(curr)
    if found:
        break

448


Here's how I would refactor the second star with a few extra minutes to spare. The interesting thing about this kind of problem is that it seems simple enough that it should be doable with a Python one-or-two-liner, but I don't have an exact intuition for it (though I'm sure someone on Reddit has done it). That said, I think for most people this type of problem feels easier to solve with iterative methods than with functional programming.

In [6]:
# second star: revision 1

seen = set()
for x in accumulate(cycle(X)):
    if x in seen:
        print(x)
        break
    seen.add(x)

448


In [7]:
# second star: revision 2

seen = set()
def check_repeats(val):
    if val not in seen:
        seen.add(val)
        return False
    return True

next(filter(check_repeats, accumulate(cycle(X))))

448

## [Day 2](https://adventofcode.com/2018/day/2): Inventory Management System

Unfortunately, I didn't have time to do this challenge at midnight, so I can't really comment on speed for these questions, though they were only slightly more involved than Day 1. The first star is very easily solvable in Python using the `Counter` class from the collections module - it's useful in quite a number of coding interview puzzles, in my experience.
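
As a small illustration of the `Counter` idea, here is a minimal sketch run on a handful of short box IDs rather than my real input. The `sample_ids` list and the `checksum` helper are names made up for this example.

```python
from collections import Counter

# Illustrative IDs only; the real puzzle input is much longer.
sample_ids = ['abcdef', 'bababc', 'abbcde', 'abcccd', 'aabcdd', 'abcdee', 'ababab']

def checksum(ids):
    """Multiply the number of IDs containing some letter exactly twice
    by the number of IDs containing some letter exactly three times."""
    twos = threes = 0
    for box_id in ids:
        letter_counts = set(Counter(box_id).values())
        twos += 2 in letter_counts
        threes += 3 in letter_counts
    return twos * threes

print(checksum(sample_ids))  # 4 IDs with a pair * 3 IDs with a triple = 12
```

Collecting the counts into a set first means each ID contributes at most once to each tally, no matter how many letters repeat.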

The second star involves a bit more work, as it seems to necessitate comparing all pairs of strings against one another to decide which pair differs by exactly one character. Both parts, calculating the number of differences and deriving the string in common, can be written as Python one-liners without adding much overhead. Python one-liners tend to feel satisfying to derive and look clean, even though they don't always reflect how a person came up with the line.

It's worth noting how out of habit (from websites like HackerRank and LeetCode), I tend to code defensively, doing things like iterating only through the minimum of the lengths of the two strings, even though all strings in the given input are the same length.

In [10]:
# first star

with open('input/input2.txt') as f:
    X = f.read().split()

twos, threes = 0, 0
for x in X:
    count = Counter(x)
    if 2 in count.values():
        twos += 1
    if 3 in count.values():
        threes += 1
twos * threes

7192

In [11]:
# second star

def num_differences(s1, s2):
    return sum([s1[i] != s2[i] for i in range(min(len(s1), len(s2)))])

def in_common(s1, s2):
    return ''.join([s1[i] for i in range(min(len(s1), len(s2))) if s1[i] == s2[i]])

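# wrapped in a function so that `return` is valid and exits both loops at once
def find_common(ids):
    for i in range(len(ids)):
        for j in range(i + 1, len(ids)):
            if num_differences(ids[i], ids[j]) == 1:
                return in_common(ids[i], ids[j])

find_common(X)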

'mbruvapghxlzycbhmfqjonsie'

In [19]:
# second star: revision (using zip, combinations)

def num_differences(s1, s2):
    return sum([c1 != c2 for c1, c2 in zip(s1, s2)])

def in_common(s1, s2):
    return ''.join([c1 for c1, c2 in zip(s1, s2) if c1 == c2])

in_common(*next(filter(lambda x: num_differences(x[0], x[1]) == 1, combinations(X, 2))))

'mbruvapghxlzycbhmfqjonsie'

## [Day 3](https://adventofcode.com/2018/day/3): No Matter How You Slice It

That was satisfying for sure! I ended up solving this puzzle using a simple brute force approach, storing all of the positions in a grid and then updating the grid with the appropriate ownership of each square. For the second star, my first solution made the answer relatively easy to retrieve, since it just required me to iterate over the claims in the same fashion and re-check the ownership of each region.

Unfortunately, the main part I got stuck on was something I should probably be fluent with: input parsing. I knew how to use `str.split()` to separate the lines, but the complex format of the input made regex a clear candidate for the problem. That said, I haven't used regex a lot, so I was originally looking up ways to separate strings by multiple separators in Python, before deciding that just gathering all of the sequences of digits would be more efficient.
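
To make the "gather all the digit runs" idea concrete, here is a minimal sketch. The `parse_claim` helper and the sample claim strings are made up for illustration; only `re.findall` with the `\d+` pattern comes from the solution below.

```python
import re

def parse_claim(line):
    """Pull every run of digits out of a claim line, in order of appearance."""
    return tuple(map(int, re.findall(r'\d+', line)))

# Illustrative lines in the puzzle's "#id @ x,y: wxh" format.
print(parse_claim('#1 @ 1,3: 4x4'))    # (1, 1, 3, 4, 4)
print(parse_claim('#123 @ 3,2: 5x4'))  # (123, 3, 2, 5, 4)
```

This works because the separators (`#`, `@`, `,`, `:`, `x`) never matter once only the numbers are kept, so there is no need to split on each of them.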

The other slight slowdown was that at the very end I remembered I had to flatten the list (or at least flattening would make for the cleanest solution), and I hadn't written a function for it in advance. Once these issues were figured out though, most of the coding was smooth sailing.
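
For reference, the standard library can also do the flattening. The sketch below uses `itertools.chain.from_iterable` on a tiny made-up grid; it is an alternative to the `flatten` helper from Day 0, not what I used at the time.

```python
from itertools import chain

# A made-up 2x3 grid: None = unclaimed, int = single claim, 'X' = overlap.
tiny_grid = [
    [None, 7, 'X'],
    ['X', 'X', 8],
]

flat = list(chain.from_iterable(tiny_grid))
overlaps = sum(cell == 'X' for cell in flat)
print(flat)      # [None, 7, 'X', 'X', 'X', 8]
print(overlaps)  # 3
```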

In [30]:
# first star

with open('input/input3.txt') as f:
    X = f.read().split('\n')[:-1]  # remove the last line, which is empty

grid = [[None] * 1000 for _ in range(1000)]
for entry in X:
    claim, x, y, w, h = map(int, re.findall(r'\d+', entry))  # raw string avoids an invalid-escape warning
    for row in range(y, y + h):
        for col in range(x, x + w):
            if grid[row][col] is None:
                grid[row][col] = claim
            else:
                grid[row][col] = 'X'

sum([claim == 'X' for claim in flatten(grid)])

113966

In [34]:
# second star

for entry in X:
    claim, x, y, w, h = map(int, re.findall(r'\d+', entry))
    safe = True
    for row in range(y, y + h):
        for col in range(x, x + w):
            if grid[row][col] == 'X':
                safe = False
                break
        if not safe:
            break
    if safe:
        print(claim)
        break

235


In [35]:
# second star: revision

for entry in X:
    claim, x, y, w, h = map(int, re.findall(r'\d+', entry))
    tiles = [item for row in grid[y:y + h] for item in row[x:x + w]]
    if all(map(lambda x: x != 'X', tiles)):
        print(claim)
        break

235


## [Day 4](https://adventofcode.com/2018/day/4): Repose Record

This update is quite delayed, but as I became preoccupied with final exams, I had to take a temporary hiatus on Advent of Code. But now that I am on holiday break, there is more time to complete these challenges (though I won't be competing for leaderboard positions). Anyway, onward to the puzzle!

As with Day 3, I was slowed down a bit by first figuring out what the most efficient way to parse the data was. I used regular expressions again this time, but instead of splitting the string, I just used re.match while specifying groups in my pattern (using parentheses) so I could extract individual groups into tuples. I'm not sure if this is necessarily the most efficient method, but it's fairly readable and understandable, so I think it works fine. Since the time stamps of each string are fixed in size, it now occurs to me that I could probably have just picked slices out of the strings for the times, but I would still have to search for the guard IDs on the "begins shift" records.
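
The slicing idea mentioned above could look roughly like this. It is a sketch that assumes every line starts with a fixed-width `[YYYY-MM-DD HH:MM] ` prefix, as the regex in the solution below does; `parse_record` and the sample line are made up for illustration.

```python
def parse_record(line):
    """Slice a fixed-width timestamp into (month, day, hour, minute, message)."""
    month, day = int(line[6:8]), int(line[9:11])
    hour, minute = int(line[12:14]), int(line[15:17])
    return (month, day, hour, minute, line[19:].strip())

print(parse_record('[1518-11-01 00:00] Guard #10 begins shift\n'))
# (11, 1, 0, 0, 'Guard #10 begins shift')
```

The guard ID still has to be searched for in the message, so this only replaces the timestamp part of the regex.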

The rest of the calculation itself was fairly simple. Once I extracted the data into tuples (ordered by month, day, hour, minute), sorting in Python automatically places all of the records in the right order. Then I stored individual data for guards using a dictionary, associating an array of 60 integers for each guard. Once I determined the time interval during which a guard was sleeping, I simply incremented the corresponding entries of the array. It's possible this could be handled more efficiently somehow (perhaps with some variant of range queries?) but for a list of size 60, we can sleep safely at night. After we have the sleeping data for each guard, calculating the most-slept minutes (and the total time they spent sleeping) was easy.
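
On the "range queries" thought: one standard trick for adding 1 over many intervals is a difference array, where each interval costs two updates and a single prefix sum recovers the per-minute counts. The sketch below is an alternative I did not use, with a made-up `minutes_asleep` helper and illustrative nap intervals.

```python
from itertools import accumulate

def minutes_asleep(naps):
    """naps: iterable of (fall_asleep, wake_up) minutes, wake_up exclusive, both in 0..60.
    Returns a length-60 list of how many naps cover each minute."""
    diff = [0] * 61            # one extra slot so wake_up == 60 is safe
    for start, end in naps:
        diff[start] += 1       # nap begins covering `start`
        diff[end] -= 1         # nap stops covering at `end`
    return list(accumulate(diff))[:60]

counts = minutes_asleep([(5, 25), (30, 55), (24, 29)])
print(sum(counts))                # total minutes asleep: 50
print(counts.index(max(counts)))  # most-slept minute: 24
```

For arrays of length 60 the direct loop is just as good; this only starts to matter when the intervals are long or numerous.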

In [37]:
# first star

with open('input/input4.txt') as f:
    lines = f.readlines()

records = []
for line in lines:
    m = re.match(r"\[\d\d\d\d-(\d\d)-(\d\d) (\d\d):(\d\d)\] (.*)", line)
    records.append((int(m.group(1)), int(m.group(2)), int(m.group(3)), int(m.group(4)), m.group(5)))

# generate guard data
records = sorted(records)
guard_data = dict()  # dictionary mapping guard ID's (ints) to length-60 arrays of sleep frequencies
curr_guard = -1
sleeping = -1
for record in records:
    m = re.search(r"\d+", record[4])
    if m:  # "Begin shift" record
        curr_guard = int(m.group(0))
        if curr_guard not in guard_data:
            guard_data[curr_guard] = [0] * 60
    elif record[4] == 'falls asleep':  # "falls asleep" record
        sleeping = record[3]
    elif record[4] == 'wakes up':  # "wakes up" record
        for i in range(sleeping, record[3]):
            guard_data[curr_guard][i] += 1
    else:
        raise Exception('could not read data: {}'.format(record))
        
# process guard data
best_id = ''
best_total = 0
best_minute = 0
for guard in guard_data:
    total = sum(guard_data[guard])
    if total > best_total:
        best_id = guard
        best_total = total
        best_minute = guard_data[guard].index(max(guard_data[guard]))

best_id * best_minute

39698

In [38]:
# second star

best_id = ''
best_freq = 0
best_minute = 0
for guard in guard_data:
    freq = max(guard_data[guard])
    if freq > best_freq:
        best_id = guard
        best_freq = freq
        best_minute = guard_data[guard].index(freq)

best_id * best_minute

14920

In [87]:
# first star: revision

with open('input/input4.txt') as f:
    lines = f.readlines()

def extract_entry(line):
    m = re.match(r"\[\d\d\d\d-(\d\d)-(\d\d) (\d\d):(\d\d)\] (.*)", line)
    return tuple(map(lambda i: int(m.group(i)), range(1, 5))) + (m.group(5),)

records = sorted(map(extract_entry, lines))

# generate guard data
guard_data = defaultdict(lambda: [0] * 60)  # map guard ID's (ints) to length-60 arrays of sleep frequencies
curr_guard = -1
sleeping = -1
for record in records:
    m = re.search(r"\d+", record[4])
    if m:  # "Begin shift" record
        curr_guard = int(m.group(0))
    elif record[4] == 'falls asleep':  # "falls asleep" record
        sleeping = record[3]
    elif record[4] == 'wakes up':  # "wakes up" record
        for i in range(sleeping, record[3]):
            guard_data[curr_guard][i] += 1
    else:
        raise Exception('could not read data: {}'.format(record))

best_total, best_minute, best_guard = max(map(
    # tup[0] is the guard id, tup[1] is the guard's sleep times
    lambda tup: (sum(tup[1]), tup[1].index(max(tup[1])), tup[0]),
    guard_data.items()))

best_guard * best_minute

39698

In [76]:
# second star: revision

best_freq, best_minute, best_guard = max(map(
    # tup[0] is the guard id, tup[1] is the guard's sleep times
    lambda tup: (max(tup[1]), tup[1].index(max(tup[1])), tup[0]),
    guard_data.items()))

best_guard * best_minute

14920

## [Day 5](https://adventofcode.com/2018/day/5): Alchemical Reduction

For the first star, my initial instinct was to try and somehow convert the string into a data structure which allowed constant time intermediate deletions, like a doubly-linked list. But as it happens, doubly-linked lists aren't natively available in Python, and creating a data structure for this didn't seem like it would be very efficient (unless I was using a lower level language like C/C++/Rust perhaps).

Fortunately, I realized I could accomplish the task more easily if I just used a stack to build up the "result" string, checking only whether each new character reacts with the top element of the stack. When one imagines the long chemical chain reacting in a vacuum, the first image that comes to my mind is a long line which slowly collapses inwards - but when you view it just from one side, the iterative approach clearly doesn't miss out on any reactions. This result can probably be shown easily using some form of proof by induction or contradiction.

For the second star, I reused the same approach to simply iterate through the newly reduced string with a stack, but additionally checking if the character I am currently reading is a fixed target letter (e.g. 'a'), in which case I ignore it and move to the next character. I repeat this for every possible fixed letter, and keep track of what the lowest string length I obtained overall was.
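
Both stars can be tried on a short polymer first. The sketch below is a compact variant of the stack approach, assuming `str.swapcase` is an acceptable way to test "same letter, opposite case"; the `react` helper and the sample string are for illustration only.

```python
import string

def react(polymer, skip=None):
    """Collapse adjacent opposite-case pairs using a stack.
    If `skip` (a lowercase letter) is given, ignore that unit in either case."""
    stack = []
    for unit in polymer:
        if skip is not None and unit.lower() == skip:
            continue
        if stack and stack[-1] == unit.swapcase():
            stack.pop()          # top of stack reacts with the incoming unit
        else:
            stack.append(unit)
    return ''.join(stack)

sample = 'dabAcCaCBAcCcaDA'
reduced = react(sample)
print(reduced, len(reduced))     # dabCBAcaDA 10

# Second star: start from the already-reduced polymer, as in my solution.
shortest = min(len(react(reduced, letter)) for letter in string.ascii_lowercase)
print(shortest)                  # 4 (reached by removing c/C)
```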

In [118]:
# first star

with open('input/input5.txt') as f:
    X = f.readline().strip()

stack = []
for char in X:
    if not stack:  # stack is empty
        stack.append(char)
    else:
        e1 = stack[-1]
        e2 = char
        if e1.lower() == e2.lower() and \
                ((e1 == e1.lower() and e2 == e2.upper()) or \
                 (e1 == e1.upper() and e2 == e2.lower())):
            stack.pop()
        else:
            stack.append(char)

print(len(stack))

10564


In [113]:
# second star

min_length = len(stack)
for letter in string.ascii_lowercase:
    new_stack = []
    for char in stack:
        if char == letter or char == letter.upper():
            continue
        elif not new_stack:
            new_stack.append(char)
        else:
            e1 = new_stack[-1]
            e2 = char
            if e1.lower() == e2.lower() and \
                    ((e1 == e1.lower() and e2 == e2.upper()) or \
                     (e1 == e1.upper() and e2 == e2.lower())):
                new_stack.pop()
            else:
                new_stack.append(char)
    min_length = min(min_length, len(new_stack))

print(min_length)

6336


In [132]:
# first star: revision

with open('input/input5.txt') as f:
    X = f.readline().strip()

def reduce_polymer(pol, exclude=None):
    stack = []
    for char in pol:
        if exclude and (char == exclude or char == exclude.upper()):
            continue
        elif not stack:  # stack is empty
            stack.append(char)
        else:
            top = stack[-1]
            if top.lower() == char.lower() and top != char:
                stack.pop()
            else:
                stack.append(char)
    return ''.join(stack) # converting back to string doesn't seem to impact performance

first_reduction = reduce_polymer(X)
len(first_reduction)

10564

In [140]:
# second star: revision

min(map(lambda letter: len(reduce_polymer(first_reduction, letter)), string.ascii_lowercase))

6336

## [Day 6](https://adventofcode.com/2018/day/6): Chronal Coordinates

Right off the bat, this problem seemed like a large difficulty spike compared to the previous puzzles. It was additionally challenging because I originally went down some incorrect paths, after intuitively thinking about the problem in terms of Euclidean distance, even though it actually specifies that Manhattan distance be used.

Nonetheless, to start off this puzzle I began trying to think of how I could identify which points had areas that were finite. After some thinking (initially with the assumption of using Euclidean distance), I first realized that any point with finite area must be in some way "enclosed" by a polygon formed by other points, and moreover, there has to be a triangle of other points that encloses it. So if for each point I iterated over all triples of other points, and if I could easily calculate whether one point was within a triangle of other points, then I could determine which points have bounded areas. (Calculating this is not trivial, but I found a Stack Overflow post offering an easy calculation for whether a point is within a 2D triangle, and willingly borrowed it for my educational use.)

Later on, after realizing I had to use Manhattan distance, I had to reconsider this "finite area" identification subproblem, as the triangle-based solution no longer worked. One thing I noticed was that the diagrams formed by the points were very clearly Voronoi diagrams, like the one shown in [this picture](https://commons.wikimedia.org/wiki/File:Manhattan_Voronoi_Diagram.svg). After studying the picture for long enough, I concluded that for a point to have finite area, there must be at least one other point in each of the four diagonal quadrants surrounding it (that is, quadrants like the Cartesian ones but rotated by 45 degrees, with the point itself at their center). So in a similar fashion as with the triangle idea, I iterated through all 4-combinations of points, and checked which would satisfy this four-quadrant constraint. (The exact checking method can be seen in the `point_in_diamond` function.) Fortunately, as the input given only includes 50 points, 50^5 is just a small enough number of iterations to make handling all the combinations practical. Now, we have identified which of the 50 points will have finite areas.

Also after inspecting the input, I noticed that all 50 of the input points were within roughly a 400x400 grid space. This makes for about 160,000 individual "locations" on the grid, for which it would not be that expensive to calculate which input point each location is closest to. From here, it's just a matter of summing the number of locations associated with each input point, filtering out the input points with infinite space, and taking the maximum out of all of them to get the answer!
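
The labelling step can be sketched on a small grid. The code below assumes a made-up `closest_index` helper and uses the six commented-out test points from the cell further down. It only shows the tie-aware labelling and the tally; it does not filter out the points with infinite area.

```python
from collections import Counter

def closest_index(points, loc):
    """Index of the unique nearest point by Manhattan distance, or -1 on a tie."""
    dists = [abs(px - loc[0]) + abs(py - loc[1]) for px, py in points]
    best = min(dists)
    return dists.index(best) if dists.count(best) == 1 else -1

sample_points = [(1, 1), (1, 6), (8, 3), (3, 4), (5, 5), (8, 9)]

print(closest_index(sample_points, (0, 0)))  # 0: nearest to (1, 1)
print(closest_index(sample_points, (5, 0)))  # -1: (1, 1) and (5, 5) are both 5 away

# Tally how many locations of a 10x10 grid belong to each point (-1 = tied).
areas = Counter(closest_index(sample_points, (x, y))
                for x in range(10) for y in range(10))
print(areas)
```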

For the second star, the solution was easier as it just involved checking which locations were within the constrained distance. Naturally, I just iterated over all of the locations, summed the distances to each of the given points, and checked whether each total was within the required constraint.
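
Here is the same total-distance check as a small, self-contained sketch. It uses the six test points and a much smaller limit (32 instead of 10000) so the result can be checked by hand; the `region_size` helper and its parameters are names made up for this example.

```python
def region_size(points, grid_size, limit):
    """Count grid locations whose summed Manhattan distance to all points is below limit."""
    count = 0
    for x in range(grid_size):
        for y in range(grid_size):
            total = sum(abs(px - x) + abs(py - y) for px, py in points)
            if total < limit:
                count += 1
    return count

sample_points = [(1, 1), (1, 6), (8, 3), (3, 4), (5, 5), (8, 9)]
print(region_size(sample_points, 10, 32))  # 16
```

Because the Manhattan total splits into an x part and a y part, the two sums could also be precomputed per column and per row, which would avoid the inner loop over points.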

Overall the code I wrote takes a while to run - around 30 seconds to a minute it seems, so I reckon there are probably some much more efficient algorithms for the problem. But I really enjoyed this challenge as a whole!

Edit: Thanks to a kind suggestion of a friend, I was able to optimize the finite-area identification part of my algorithm from O(N^5) to O(N^2) (where N is the number of points). The trick is that we only actually need to identify the quadrant of each point once, to see if there is a point in the given quadrant. Thus, we do not actually have to know which four-tuple of points make up the quadrant. This allows me to calculate the solution to both parts within about 15 seconds now (with less than a second going towards the finite area calculation), so this is a substantial improvement.

In [3]:
# test = """1, 1
# 1, 6
# 8, 3
# 3, 4
# 5, 5
# 8, 9
# """

# X = test.split('\n')[:-1]

GRID_SIZE = 400

with open('input/input6.txt') as f:
    X = f.read().split('\n')[:-1]  # remove last entry corresponding to empty line

# points are stored as a list of coordinate tuples
points = list(map(lambda x: tuple(map(int, x.split(', '))), X))

def left_quad(x, test):
    return test[0] < x[0] and abs(test[1] - x[1]) < (x[0] - test[0])

def right_quad(x, test):
    return test[0] > x[0] and abs(test[1] - x[1]) < (test[0] - x[0])

def top_quad(x, test):
    return test[1] > x[1] and abs(test[0] - x[0]) < (test[1] - x[1])

def bottom_quad(x, test):
    return test[1] < x[1] and abs(test[0] - x[0]) < (x[1] - test[1])

# calculate which points have finite area
finite = [False] * len(points)
for i, point in enumerate(points):
    l = r = t = b = False
    for test_point in points:
        if left_quad(point, test_point):
            l = True
        elif right_quad(point, test_point):
            r = True
        elif top_quad(point, test_point):
            t = True
        elif bottom_quad(point, test_point):
            b = True
    if l and r and t and b:
        finite[i] = True

# this is about the halfway point of the calculation,
# so it's nice to get an update that things are working!
print(finite)

def manhattan_dist(p0, p1):
    return abs(p1[0] - p0[0]) + abs(p1[1] - p0[1])

# calculate which point is closest for each location in the grid
grid = [[-1] * GRID_SIZE for _ in range(GRID_SIZE)]
for i in range(GRID_SIZE):
    for j in range(GRID_SIZE):
        shortest_index = 0
        shortest_dist = 2 * GRID_SIZE * GRID_SIZE
        for k in range(len(points)):
            dist = manhattan_dist(points[k], (i, j))
            if dist < shortest_dist:
                shortest_dist = dist
                shortest_index = k
            elif dist == shortest_dist:  # avoid ties
                shortest_index = -1
        grid[i][j] = shortest_index

# calculate which point (with finite area) has the most locations closest to it
flat_grid = [entry for row in grid for entry in row]
best_point = 0
best_count = 0
for i in range(len(points)):
    if not finite[i]:
        continue
    count = sum(map(lambda x: x == i, flat_grid))
    if count > best_count:
        best_count = count
        best_point = i

print(best_count)

# second star

TOTAL_DISTANCE = 10000

# store a grid indicating which positions are "centralized",
# i.e. within the total distance constraint
central = [[False] * GRID_SIZE for _ in range(GRID_SIZE)]
for i in range(GRID_SIZE):
    for j in range(GRID_SIZE):
        total = 0
        for point in points:
            total += manhattan_dist(point, (i, j))
        if total < TOTAL_DISTANCE:
            central[i][j] = True

flat_central = [entry for row in central for entry in row]
print(sum(flat_central))

[True, True, False, True, True, False, False, False, True, False, False, False, True, False, False, True, True, False, True, True, False, True, True, True, True, True, True, True, True, False, False, False, True, True, True, False, False, False, False, False, True, False, True, True, True, True, True, False, True, True]
3722
44634


## [Day 7](https://adventofcode.com/2018/day/7): The Sum of Its Parts

The first star for this challenge was relatively straightforward, as I quickly identified that it just required a topological sort of the steps. For a minute or so I forgot how the algorithm worked (is it like a breadth first search starting from the root? or from the tail? or is it different?), but I've implemented it in Python before so I eventually recalled how it worked. The algorithm follows easily from the logic of the example they included. The only trick perhaps is getting items to be removed in alphabetical order to break ties, but this is easy with the built-in heapq module.
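
For anyone who also forgets how it goes, here is a minimal sketch of the topological sort with a heap for alphabetical tie-breaking, run on a small dependency graph. The `topo_order` helper and `sample_edges` list are illustrative names, not part of my solution.

```python
from collections import defaultdict
from heapq import heapify, heappop, heappush

def topo_order(edges):
    """edges: iterable of (prerequisite, step). Returns steps in dependency order,
    always taking the alphabetically smallest available step first."""
    successors = defaultdict(set)
    indegree = defaultdict(int)   # step -> number of unfinished prerequisites
    steps = set()
    for before, after in edges:
        if after not in successors[before]:
            successors[before].add(after)
            indegree[after] += 1
        steps.update((before, after))

    ready = [s for s in steps if indegree[s] == 0]
    heapify(ready)
    order = []
    while ready:
        step = heappop(ready)     # smallest letter among the available steps
        order.append(step)
        for succ in successors[step]:
            indegree[succ] -= 1
            if indegree[succ] == 0:
                heappush(ready, succ)
    return ''.join(order)

sample_edges = [('C', 'A'), ('C', 'F'), ('A', 'B'), ('A', 'D'),
                ('B', 'E'), ('D', 'E'), ('F', 'E')]
print(topo_order(sample_edges))   # CABDFE
```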

The second star required just adding some additional components to handle the time-tracking logic separately. Essentially I just stored a dictionary of (letter, time-left) pairs to keep track of what is getting worked on.
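
The time-tracking logic can be sketched as a function with the worker count and base duration as parameters, which makes it easy to try on a small graph. The `simulate` helper, its parameter names, and `sample_edges` are assumptions made for this example; my actual cell below hard-codes 5 workers and a 60-second base.

```python
from collections import defaultdict
from heapq import heapify, heappop, heappush

def simulate(edges, num_workers, base_time):
    """Return (completion order, elapsed seconds). Step 'A' takes base_time + 1, 'B' + 2, ..."""
    successors = defaultdict(set)
    pending = defaultdict(set)    # step -> prerequisites not yet finished
    steps = set()
    for before, after in edges:
        successors[before].add(after)
        pending[after].add(before)
        steps.update((before, after))

    ready = [s for s in steps if not pending[s]]
    heapify(ready)
    in_progress = {}              # step -> seconds of work remaining
    order, elapsed = [], 0
    while ready or in_progress:
        # hand the alphabetically-first ready steps to idle workers
        while ready and len(in_progress) < num_workers:
            step = heappop(ready)
            in_progress[step] = base_time + ord(step) - ord('A') + 1
        # advance the clock by one second
        elapsed += 1
        for step in list(in_progress):
            in_progress[step] -= 1
            if in_progress[step] == 0:
                del in_progress[step]
                order.append(step)
                for succ in successors[step]:
                    pending[succ].discard(step)
                    if not pending[succ]:
                        heappush(ready, succ)
    return ''.join(order), elapsed

sample_edges = [('C', 'A'), ('C', 'F'), ('A', 'B'), ('A', 'D'),
                ('B', 'E'), ('D', 'E'), ('F', 'E')]
print(simulate(sample_edges, num_workers=2, base_time=0))  # ('CABFDE', 15)
```

With `num_workers=1` the completion order matches the plain topological sort, which is a handy sanity check.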

In [31]:
# first star

with open('input/input7.txt') as f:
    X = f.readlines()

X = list(map(lambda x: (x[5], x[36]), X))

letters = set()
reqs = defaultdict(set)  # map from a required step to the set of steps that depend on it
rev_reqs = defaultdict(set)  # map from a step to the set of steps it requires
for req_step, step in X:
    reqs[req_step].add(step)
    rev_reqs[step].add(req_step)
    letters.add(step)
    letters.add(req_step)

ready = sorted([letter for letter in letters if len(rev_reqs[letter]) == 0])
sequence = []
while ready:
    step = heappop(ready)
    sequence.append(step)
    for succ in reqs[step]:
        rev_reqs[succ].remove(step)
        if len(rev_reqs[succ]) == 0:
            heappush(ready, succ)

print(''.join(sequence))

# second star

# re-initialize variables
sequence = []
rev_reqs = defaultdict(set)
for req_step, step in X:
    rev_reqs[step].add(req_step)
ready = sorted([letter for letter in letters if len(rev_reqs[letter]) == 0])

times = {letter: 61 + ord(letter) - ord('A') for letter in letters}
processing = dict()
time = 0
while ready or processing:
    # move ready steps to idle workers so they can begin work
    while ready and len(processing) < 5:
        step = heappop(ready)
        processing[step] = times[step]
        
    # perform time-based updates
    keys = list(processing.keys())
    for key in keys:
        processing[key] -= 1
        if processing[key] == 0:
            del processing[key]
            sequence.append(key)
            for succ in reqs[key]:
                rev_reqs[succ].remove(key)
                if len(rev_reqs[succ]) == 0:
                    heappush(ready, succ)

    time += 1
    
print(''.join(sequence))
print(time)

BFKEGNOVATIHXYZRMCJDLSUPWQ
BFKVEGAOTNYIHXZRMCJLDSUPWQ
1020


## [Day 8](https://adventofcode.com/2018/day/8): Memory Maneuver

This challenge presented a simple but important design question: how should the solution be implemented? The calculation itself wasn't complicated, as it is simply an "evaluation" strategy for a particular tree structure (and the input string can easily be parsed into a tree via recursion and stacks), but the implementation can be simple or complex.

One totally viable strategy I was considering was one where I actually generated the tree corresponding to the input, and then performed a recursive computation on it. In the first part, this option certainly tempted me since I knew there would be a second part of the problem, which surely would also require some kind of calculation based on trees. But in the end, I figured the problem would just be easier (and shorter) to implement by directly parsing the input in a solution that combines iteration and recursion, so no tree data structure was needed. It's possible the solutions are less elegant, since there are no clear lines where the data transformations occur, and if you wanted to change the calculations in some way you would essentially have to read through all of the manual iteration logic and figure out which line to change - but that said, most of the logic is chunked into "stages" of processing that should be easy to interpret.
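
For comparison, the tree-building strategy I decided against might look something like the sketch below. The `Node` dataclass and the `parse`, `metadata_sum`, and `value` helpers are hypothetical names, and the input is the short commented-out example from the cell below rather than my real data.

```python
from dataclasses import dataclass
from typing import Iterator, List

@dataclass
class Node:
    children: List["Node"]
    metadata: List[int]

def parse(nums: Iterator[int]) -> Node:
    """Consume one node (header, children, metadata) from the number stream."""
    num_children, num_entries = next(nums), next(nums)
    children = [parse(nums) for _ in range(num_children)]
    metadata = [next(nums) for _ in range(num_entries)]
    return Node(children, metadata)

def metadata_sum(node: Node) -> int:
    # first star: sum of every metadata entry in the tree
    return sum(node.metadata) + sum(metadata_sum(c) for c in node.children)

def value(node: Node) -> int:
    # second star: leaves sum their metadata; other nodes use it as 1-based child indices
    if not node.children:
        return sum(node.metadata)
    return sum(value(node.children[i - 1])
               for i in node.metadata if 1 <= i <= len(node.children))

root = parse(iter(map(int, "2 3 0 3 10 11 12 1 1 0 1 99 2 1 1 2".split())))
print(metadata_sum(root))  # 138
print(value(root))         # 66
```

Separating parsing from evaluation costs a few more lines, but each star then becomes a short function over `Node`.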

In [23]:
# first star

with open('input/input8.txt') as f:
    X = list(map(int, f.read().split()))

# X = list(map(int, "2 3 0 3 10 11 12 1 1 0 1 99 2 1 1 2".split()))

# returns (sum, index) tuple where index is the next index to read
def sum_entries(lst, index):
    total = 0
    num_children = lst[index]
    num_entries = lst[index + 1]
    index += 2
    for _ in range(num_children):
        subtotal, end = sum_entries(lst, index)
        total += subtotal
        index = end
    total += sum(lst[index:index + num_entries])
    return (total, index + num_entries)

print(sum_entries(X, 0))

# second star

def node_value(lst, index):
    num_children = lst[index]
    num_entries = lst[index + 1]
    index += 2
    if num_children == 0:
        value = sum(lst[index:index + num_entries])
        return (value, index + num_entries)
    children_values = []
    for _ in range(num_children):
        value, end = node_value(lst, index)
        children_values.append(value)
        index = end
    value = 0
    for i in range(num_entries):
        entry = lst[index + i]
        if entry > 0 and entry <= num_children:
            value += children_values[entry - 1]
    return (value, index + num_entries)

print(node_value(X, 0))

(48443, 18734)
(30063, 18734)


## [Day 9](https://adventofcode.com/2018/day/9): Marble Mania

In principle, this problem is simple - the main challenge is finding a way to optimize it so that it runs quickly. Without any clear mathematical intuition for a quick shortcut for calculating scores, the core of my solution is to just simulate the entire marble game using a list. The cause for concern, of course, is that we need to constantly be inserting into the list - and when our list gets large, this becomes very inefficient - particularly for the second star.

One natural option would be to represent the circle as a circular linked list, since with pointers each insertion takes constant time - but this isn't a natively implemented data structure in Python, and I doubted how efficient it would be with the overhead of Python classes. To save time, I did a little bit of googling and found a library named [blist](http://stutzbachenterprises.com/blist/blist.html), which offers lists with logarithmic insertion time, so I could improve the code while only having to change about two lines!

After getting this working solution, I did write a solution with a doubly linked list implementation, and I'm happy with how it turned out. The runtimes seem roughly comparable (both taking no more than 10-15 seconds), though I could imagine this being much faster if it were implemented in C or another low level language. It's crazy actually having to write my own linked list class after thinking it was just a data structure reserved for introductory data structures classes at college, but it was useful after all! I mostly just included the methods of the `DLNode` (doubly-linked node) class that mattered for my problem, so it goes without saying that a full implementation would probably have things interfacing slightly differently.
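
A third option, which I did not use here, is `collections.deque` from the standard library: keeping the current marble at the right end and calling `rotate` moves around the circle in constant time per step. The sketch below is a common formulation of that idea, with `marble_score_deque` as a made-up name, checked against small games rather than my real input.

```python
from collections import deque

def marble_score_deque(num_players, num_marbles):
    """Simulate the marble game with the current marble kept at the right end of a deque."""
    circle = deque([0])
    scores = [0] * num_players
    for marble in range(1, num_marbles + 1):
        if marble % 23 == 0:
            circle.rotate(7)       # bring the marble 7 counter-clockwise to the right end
            scores[marble % num_players] += marble + circle.pop()
            circle.rotate(-1)      # the marble clockwise of the removed one becomes current
        else:
            circle.rotate(-1)      # step one marble clockwise
            circle.append(marble)  # place the new marble after it; it becomes current
    return max(scores)

print(marble_score_deque(9, 25))     # 32
print(marble_score_deque(10, 1618))  # 8317
```

The player index is offset by one compared with the functions below, which does not affect the maximum score.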

In [58]:
# first star

with open('input/input9.txt') as f:
    X = f.read().split(' ')
    players = int(X[0])
    marbles = int(X[6])

def marble_score(num_players, num_marbles):
    circle = blist([0])
    curr_marble_index = 1
    scores = [0] * num_players
    for i in range(1, num_marbles + 1):
        if i % 23 != 0:
            new_index = (curr_marble_index + 2) % len(circle)
            if new_index == 0:
                circle.append(i)
                curr_marble_index = len(circle) - 1
            else:
                circle.insert(new_index, i)
                curr_marble_index = new_index
        else:
            curr_marble_index = (curr_marble_index - 7) % len(circle)
            scores[(i - 1) % num_players] += i + circle[curr_marble_index]
            circle.pop(curr_marble_index)
    return max(scores)

# doubly linked node
class DLNode(object):
    def __init__(self, val, nxt, prev):
        self.val = val
        self.next = nxt
        self.prev = prev
    
    def insert_after(self, val):
        new = DLNode(val, self.next, self)
        self.next.prev = new
        self.next = new
    
    def back(self, count):
        curr = self
        for _ in range(count):
            curr = curr.prev
        return curr
    
    # unlinks self from its neighbors and returns the next node (if any)
    def delete_self(self):
        temp = self.next
        # guard each neighbor before touching it
        if self.next:
            self.next.prev = self.prev
        if self.prev:
            self.prev.next = temp
        self.prev = None
        self.next = None
        return temp
    
    def __str__(self):
        res = [str(self.val)]
        curr = self
        while curr.next and curr.next != self:
            curr = curr.next
            res.append(str(curr.val))
        return '[{}]'.format(', '.join(res))

def marble_score_ll(num_players, num_marbles):
    curr = DLNode(0, None, None)
    orig = curr
    curr.next = curr
    curr.prev = curr
    scores = [0] * num_players
    for i in range(1, num_marbles + 1):
        if i % 23 != 0:
            curr_next = curr.next
            curr.next.insert_after(i)
            curr = curr_next.next
        else:
            seven_back = curr.back(7)
            scores[(i - 1) % num_players] += i + seven_back.val
            seven_back_next = seven_back.delete_self()
            curr = seven_back_next
    return max(scores)
            

print(marble_score(players, marbles))
print(marble_score_ll(players, marbles))

# second star

print(marble_score(players, marbles * 100))
print(marble_score_ll(players, marbles * 100))

422980
422980
3552041936
3552041936
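
For comparison, the same game can also be sketched with `collections.deque` from the standard library, which can rotate by a fixed number of steps and append or pop at its ends cheaply. This is not one of the two solutions above: the function name `marble_score_deque` is made up for illustration, and the current marble is assumed to sit at the right end of the deque. The checks use the small examples from the puzzle statement.

```python
from collections import deque

def marble_score_deque(num_players, num_marbles):
    # Convention (an assumption of this sketch): the current marble
    # is always the rightmost element of the deque.
    circle = deque([0])
    scores = [0] * num_players
    for marble in range(1, num_marbles + 1):
        if marble % 23 == 0:
            # Bring the marble 7 steps counter-clockwise to the right end and remove it.
            circle.rotate(7)
            scores[(marble - 1) % num_players] += marble + circle.pop()
            # The marble clockwise of the removed one becomes the current marble.
            circle.rotate(-1)
        else:
            # Skip one marble clockwise, then place the new marble after it.
            circle.rotate(-1)
            circle.append(marble)
    return max(scores)

print(marble_score_deque(9, 25))    # 32
print(marble_score_deque(10, 1618)) # 8317
```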


## [Day 10](https://adventofcode.com/2018/day/10): The Stars Align
 
It is late February 2019 and it has been a long time since I've attempted any of these challenges, but it never hurts to pick off another!

This challenge wasn't too algorithmically difficult, as it was just a matter of being clever about finding when (and where) the points converge. For me, the trick was looking at the data and noticing that despite the wide range of starting positions, each star's position directly corresponds to its initial velocity: an x-coordinate in the 50000s comes with an x velocity of -5, a y-coordinate in the -20000s comes with a y velocity of +2, and so on. In other words, each coordinate is roughly -10000 times the matching velocity, so I quickly extrapolated that I could just move the stars' positions 10000 seconds into the future in one step, and start guessing/simulating from there.

I initially guessed that the points might end up in a small 100x100 range near the origin, though after looking at both my image projections and the actual numerical ranges of the points, it turned out they were slightly further off, so I had to move my window a bit. It's definitely interesting to consider how one could generalize this task or do it more algorithmically (as opposed to just using the code as a way to 'aid' the search process). One simple idea might be to just look for the point in time where the variance in the points' positions is smallest.
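
A minimal sketch of that variance idea, assuming each star is stored as a `((px, py), (vx, vy))` tuple. The helper names `spread` and `time_of_min_spread` and the four demo stars are made up for illustration; this is not the code used for the answer below.

```python
def spread(stars, t):
    """Sum of the variances of the x and y coordinates at time t."""
    def variance(values):
        mean = sum(values) / len(values)
        return sum((v - mean) ** 2 for v in values) / len(values)
    xs = [px + vx * t for (px, _), (vx, _) in stars]
    ys = [py + vy * t for (_, py), (_, vy) in stars]
    return variance(xs) + variance(ys)

def time_of_min_spread(stars):
    # With constant velocities the spread is a convex quadratic in t,
    # so walking forward until it stops shrinking finds the minimum over t >= 0.
    t = 0
    while spread(stars, t + 1) < spread(stars, t):
        t += 1
    return t

# Made-up stars that form a 2x2 square at t = 3.
demo_stars = [((-3, -6), (1, 2)), ((4, 6), (-1, -2)),
              ((3, -5), (-1, 2)), ((-2, 7), (1, -2))]
print(time_of_min_spread(demo_stars))  # 3
```

The time of minimum spread is only a heuristic for when the message appears, so it would still be worth printing the neighboring seconds to confirm.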

I'm mostly satisfied with my solution, though the way I printed out a windowed region of the sky/graph seems inelegant, as does the somewhat hardcoded way I parsed the values out of the input data.
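
Both rough edges could plausibly be smoothed out along these lines: pull every signed integer out of a line with a regular expression, and size the printed grid from the points' bounding box instead of a fixed window. The names `parse_star` and `render` are hypothetical, and the sample lines follow the format shown in the puzzle statement.

```python
import re

def parse_star(line):
    # Grab the four signed integers regardless of column widths.
    px, py, vx, vy = map(int, re.findall(r'-?\d+', line))
    return (px, py), (vx, vy)

def render(points):
    # Draw only the bounding box of the points, one text row per y value.
    points = set(points)
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return '\n'.join(
        ''.join('#' if (x, y) in points else '.'
                for x in range(min(xs), max(xs) + 1))
        for y in range(min(ys), max(ys) + 1))

print(parse_star('position=<-6, 10> velocity=< 2, -2>'))  # ((-6, 10), (2, -2))
print(render([(5, 5), (6, 6)]))
```

Rendering the bounding box only makes sense once the stars are close together, since the grid would otherwise be enormous.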

In [29]:
# both stars

with open('input/input10.txt') as f:
    X = f.readlines()

class Star:
    def __init__(self, pos, vel):
        self.pos = pos
        self.vel = vel
    
    def move(self, scale=1):
        new_x = self.pos[0] + (self.vel[0] * scale)
        new_y = self.pos[1] + (self.vel[1] * scale)
        self.pos = (new_x, new_y)
    
    def __repr__(self):
        return 'Star({}, {})'.format(self.pos[0], self.pos[1])

POSX_INDEX = X[0].index('<') + 1  # 6 chars long
POSY_INDEX = X[0].index(',') + 1  # 7 chars long
VELX_INDEX = X[0].index('<', POSX_INDEX + 1) + 1  # 2 chars long
VELY_INDEX = X[0].index(',', POSY_INDEX + 1) + 1  # 3 chars long

def create_star(line):
    posx = int(line[POSX_INDEX:POSX_INDEX + 6])
    posy = int(line[POSY_INDEX:POSY_INDEX + 7])
    velx = int(line[VELX_INDEX:VELX_INDEX + 2])
    vely = int(line[VELY_INDEX:VELY_INDEX + 3])
    return Star((posx, posy), (velx, vely))

# randomly guessing that everything will appear within x and y ranges of [100, 200]

WINDOW_LEFT = 100
WINDOW_SIZE = 100

def print_map(stars):
    grid = [['.'] * WINDOW_SIZE for _ in range(WINDOW_SIZE)]
    for star in stars:
        if star.pos[0] >= WINDOW_LEFT and star.pos[0] < WINDOW_LEFT + WINDOW_SIZE \
            and star.pos[1] >= WINDOW_LEFT and star.pos[1] < WINDOW_LEFT + WINDOW_SIZE:
            grid[star.pos[1] - WINDOW_LEFT][star.pos[0] - WINDOW_LEFT] = '#'
    for row in grid:
        print(''.join(row))

def update_stars(stars, scale=1):
    for star in stars:
        star.move(scale)

data = list(map(create_star, X[:-1]))
print('String: ')
print('Seconds: ', 10227)
update_stars(data, 10227)
print_map(data)

# # show the points converging
# iters = 10225
# update_stars(data, iters) # starting point
# for _ in range(10):
#     update_stars(data, 1)
#     iters += 1
#     print('Iters: ', iters)
#     print_map(data)

Seconds:  10227
....................................................................................................
....................................................................................................
....................................................................................................
....................................................................................................
....................................................................................................
....................................................................................................
....................................................................................................
....................................................................................................
....................................................................................................
...........................................................................

## Day 10.5: Intermission

At this point, I've solved 10 of the problems. While I'm a long way from finishing, I'd like to pause for a moment to look back at my solutions, review my code, and look for ways to improve efficiency or modularity. After this, I'll also be taking a look at [Peter Norvig's](https://github.com/norvig/pytudes/blob/master/ipynb/Advent-2018.ipynb) solutions to see how another experienced Python user approaches the problems, and what I can learn from it.

### Day 1

For the second star, I managed to find a way to make the solution slightly more functional by performing the repeat-checking in a helper function and then just using `filter`. But altogether this doesn't make a large difference. In Norvig's code, it's interesting that his `partial_sums` function is, I believe, just a rewrite of `accumulate`, so it is slightly redundant. Otherwise the solutions are comparable.
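
As a rough illustration of how `accumulate` covers the running-sum part, here is a sketch of the second star using `accumulate` over `cycle`. The function name `first_repeat` is made up, and it uses a plain loop rather than the `filter`-based helper described above. The sample inputs are the ones from the puzzle statement.

```python
from itertools import accumulate, cycle

def first_repeat(changes):
    # The starting frequency 0 counts as already seen.
    seen = {0}
    # accumulate() yields the running frequency; cycle() repeats the list forever.
    for frequency in accumulate(cycle(changes)):
        if frequency in seen:
            return frequency
        seen.add(frequency)

print(first_repeat([+1, -1]))              # 0
print(first_repeat([+3, +3, +4, -2, -4]))  # 10
```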

### Day 2

For both of the stars in this problem, I felt fairly confident with my solution. That said, I admire Norvig's use of a `quantify` function for the first star, which makes things look even cleaner. I believe my solution may still be more efficient since I only end up evaluating `Counter(x)` once per ID (whereas I have doubts as to whether the Python interpreter would optimize his code to do so). Also, even though I was proud of my one-liners for the second star, it turns out using the `zip` function yields an even cleaner result. Likewise I simplified my nested for-loops into a one-liner by using `combinations`, which is slightly more efficient than Norvig's two-part list comprehension (which may check many pairs of boxes twice).
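
A minimal sketch of what these ideas could look like together: one `Counter` per ID for the checksum, and `combinations` plus `zip` for the near-matching pair. The function names are hypothetical and this is neither the notebook's nor Norvig's exact code; the sample IDs come from the puzzle statement.

```python
from collections import Counter
from itertools import combinations

def checksum(ids):
    # Build each Counter exactly once and keep only the set of letter counts.
    counts = [set(Counter(box_id).values()) for box_id in ids]
    return sum(2 in c for c in counts) * sum(3 in c for c in counts)

def common_letters(ids):
    # combinations() visits each unordered pair of IDs exactly once.
    for a, b in combinations(ids, 2):
        same = [x for x, y in zip(a, b) if x == y]
        if len(same) == len(a) - 1:
            return ''.join(same)

print(checksum(['abcdef', 'bababc', 'abbcde', 'abcccd',
                'aabcdd', 'abcdee', 'ababab']))            # 12
print(common_letters(['abcde', 'fghij', 'klmno', 'pqrst',
                      'fguij', 'axcye', 'wvxyz']))         # fgij
```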

### Day 3

For this problem, I initially named my variables quite poorly (using letters a, b, c, d, e instead of claim, x, y, w, h), so I just modified this in place to improve clarity. Then, for the second star I came up with a simpler check for claims which simply generates a list of the tiles in the appropriate rectangle, and performs a boolean operation over it to check if any are False. I'd worry that this solution could be slower if the `all` operation didn't short-circuit at the first occurrence of a false value, but `all` stops at the first falsy value and `map` is lazy in Python 3, so the checks stop at the first failure by default.
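
The short-circuiting can be seen in a tiny experiment. The predicate `is_free` and its rule (tiles below 2 count as unclaimed) are invented purely for the demonstration.

```python
checked = []

def is_free(tile):
    # Hypothetical check: record the call, then pretend tiles below 2 are unclaimed.
    checked.append(tile)
    return tile < 2

result = all(map(is_free, range(10)))
print(result)   # False
print(checked)  # [0, 1, 2]
```

Only three of the ten tiles are ever checked. Note that this relies on `map` being lazy: wrapping the checks in a list first would evaluate all of them before `all` sees anything.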

That said, Norvig's improvements still impress me. It's extremely nice how he is able to parse input (just extracting the relevant integers) using his `Input` function, so perhaps I will adapt this at some point. His solution for counting grid cells is also rather clean. For the second star our solutions seem comparable, but I like his first star solution (using `Counter`) a lot better given that it requires no if statements or for loops. That said, I reckon `Counter` might be less space efficient than storing the grid directly as a 2D list.
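
To make the `Counter` approach concrete, here is a rough sketch in that spirit. It is not Norvig's code: `parse_claim`, `tile_counts`, `overlapping_tiles` and `intact_claim` are hypothetical names, and the integer extraction uses a plain `re.findall` rather than his `Input` helper. The three sample claims come from the puzzle statement.

```python
import re
from collections import Counter

def parse_claim(line):
    # '#1 @ 1,3: 4x4' -> (1, 1, 3, 4, 4), i.e. (claim, x, y, w, h)
    return tuple(map(int, re.findall(r'\d+', line)))

def tile_counts(claims):
    # Count how many claims cover each (x, y) tile.
    return Counter((x + dx, y + dy)
                   for _, x, y, w, h in claims
                   for dx in range(w) for dy in range(h))

def overlapping_tiles(claims):
    return sum(n > 1 for n in tile_counts(claims).values())

def intact_claim(claims):
    counts = tile_counts(claims)
    for claim, x, y, w, h in claims:
        if all(counts[(x + dx, y + dy)] == 1
               for dx in range(w) for dy in range(h)):
            return claim

claims = [parse_claim(s) for s in ('#1 @ 1,3: 4x4', '#2 @ 3,1: 4x4', '#3 @ 5,5: 2x2')]
print(overlapping_tiles(claims))  # 4
print(intact_claim(claims))       # 3
```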

### Day 4

For this problem my solution overall seems pretty long and clunky (both in terms of parsing the input and then computing over the input data), but it was hard for me to find clear optimizations. As usual, Norvig has a clean way to parse the input: first he substitutes spaces for the symbols, and then uses `text.split()`, allowing him to easily unpack the fields. Also, he parses the entire date as a single unit, which is quite reasonable. The rest of the solution is also extremely clear to follow, using a similar functional style as usual. His solution to part 2 requires some thought as to how his code counts the minutes, but overall is still very understandable. Dividing pieces of code into functions and using string substitution for parsing are both techniques I should use more often.
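
The substitute-then-split trick can be sketched roughly as follows. This is an illustration of the technique rather than Norvig's actual parser: the function name `parse_record`, the choice of which symbols to replace, and the returned tuple shape are all assumptions. The sample lines follow the format in the puzzle statement.

```python
import re

def parse_record(line):
    # Turn brackets and '#' into spaces so a plain split() separates the fields.
    date, time, *words = re.sub(r'[\[\]#]', ' ', line).split()
    minute = int(time.split(':')[1])
    return date, minute, words

print(parse_record('[1518-11-01 00:00] Guard #10 begins shift'))
# ('1518-11-01', 0, ['Guard', '10', 'begins', 'shift'])
print(parse_record('[1518-11-01 00:05] falls asleep'))
# ('1518-11-01', 5, ['falls', 'asleep'])
```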

### Day 5

Here I was able to make various minor improvements, including making the reaction checking simpler, and reducing the second star down to just a single line by making the first function more modular. That said, I'm not sure if there is an easy way to make my core `reduce_polymer` function any simpler, since the algorithm I am applying (which relies on pushing and popping elements on a stack) doesn't seem to lend itself easily to simple map operations and such. Norvig's solution seems to confirm this, although he uses a simple regex-substitution loop to remove the characters, which is much easier to write albeit much less efficient. I think this is part of the beauty of Python though: the code takes so little time to write that if it ends up being too inefficient for our purposes, the time we spent writing the initial solution is overall negligible. Also, the fact that he even defines a "shortest" function just to improve readability is a nice touch.
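
For reference, the stack-based idea can be sketched as below. This is a reconstruction under assumptions, not the notebook's actual `reduce_polymer`: the names `reduce_polymer_sketch` and `best_reduction` are made up, and the polymer is assumed to contain only letters. The sample polymer comes from the puzzle statement.

```python
def reduce_polymer_sketch(polymer):
    stack = []
    for unit in polymer:
        # Same letter, opposite case: the new unit reacts with the top of the stack.
        if stack and stack[-1] == unit.swapcase():
            stack.pop()
        else:
            stack.append(unit)
    return ''.join(stack)

def best_reduction(polymer):
    # Second star: remove each unit type in turn and keep the shortest result.
    return min(
        len(reduce_polymer_sketch(polymer.replace(c, '').replace(c.upper(), '')))
        for c in set(polymer.lower()))

print(reduce_polymer_sketch('dabAcCaCBAcCcaDA'))  # dabCBAcaDA
print(best_reduction('dabAcCaCBAcCcaDA'))         # 4
```

Each unit is pushed and popped at most once, so the reduction runs in a single pass over the polymer.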