# Day 7

https://adventofcode.com/2022/day/7

In [87]:
from pathlib import Path

INPUTS = Path('input.txt').read_text().strip().split('\n')

## Part 1

In [88]:
from collections import defaultdict

# Illustrative shape of one entry in `dirs`; not used by the code below.
_template = {
    "<dir/path>": {
        "files": [
            {"name": 123}
        ],
        "total_size": 123,
    }
}

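def new_dir():
    """Default entry for a directory seen for the first time."""
    return {"files": [], "total_size": 0}

# Flat mapping of absolute directory path -> entry, created on first access.
dirs = defaultdict(new_dir)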


We obviously have to start from the top of the input. The first command moves to the root dir, and it only happens once, so I think we can safely ignore it.

On inspection of inputs, every instance of `$ cd <name>` is immediately followed by `$ ls`, and there are no instances where `$ cd ..` is followed by `$ ls`.

So when we encounter `$ cd <name>`, we can ignore the next line entirely, which means we can safely strip all `$ ls` commands from the input.
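The inspection described above can also be checked mechanically instead of by eye. This is a minimal sketch: the helper names `ls_follows_every_cd` and `ls_never_follows_cd_up` and the short `sample` transcript are made up for illustration and are not part of the original notebook.

```python
def ls_follows_every_cd(lines: list[str]) -> bool:
    """True if every `$ cd <name>` (anything but `$ cd ..`) is directly followed by `$ ls`."""
    for i, line in enumerate(lines):
        if line.startswith("$ cd ") and line != "$ cd ..":
            # A `cd` on the very last line has no successor, so it fails too.
            if i + 1 >= len(lines) or lines[i + 1] != "$ ls":
                return False
    return True


def ls_never_follows_cd_up(lines: list[str]) -> bool:
    """True if no `$ cd ..` is directly followed by `$ ls`."""
    return not any(
        prev == "$ cd .." and nxt == "$ ls" for prev, nxt in zip(lines, lines[1:])
    )


# Hypothetical transcript, only here to exercise the two checks.
sample = [
    "$ cd /", "$ ls", "dir a", "100 b.txt",
    "$ cd a", "$ ls", "200 c.txt",
    "$ cd ..",
]
print(ls_follows_every_cd(sample), ls_never_follows_cd_up(sample))
```

Running both helpers on the raw lines, before any filtering, would confirm whether dropping the `$ ls` lines is safe for a given input.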

In [89]:
# Skip the first line (the root `cd`) and drop every `$ ls` line.
INPUTS = [x for x in INPUTS[1:] if x != "$ ls"]

Now, a conundrum. It would have made things easier if I could uniquely identify a directory by its name alone, but that's not the case: several directories share the same name, presumably in different parts of the file system tree so they don't conflict.

So we can't make a simple dict keyed by names of the dirs only: we have to make the keys into the full paths from root so they can remain unique. That also makes things simpler for the traversal of the file system: when going up a level, simply pop a dir name off the path, which we can use as a cursor.
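To make the collision concrete, here is a small sketch with made-up data: two different directories that are both named `logs`. The `observed` list and the two dict names are hypothetical and only serve the illustration.

```python
# Hypothetical (cursor, size) pairs: two distinct directories both named "logs".
observed = [(["a", "logs"], 100), (["b", "logs"], 250)]

by_name: dict[str, int] = {}
by_path: dict[str, int] = {}
for cursor, size in observed:
    # Keyed by the last path component only: the two dirs collapse into one entry.
    by_name[cursor[-1]] = by_name.get(cursor[-1], 0) + size
    # Keyed by the full path from root: each dir keeps its own entry.
    key = "/" + "/".join(cursor)
    by_path[key] = by_path.get(key, 0) + size

print(by_name)  # {'logs': 350}
print(by_path)  # {'/a/logs': 100, '/b/logs': 250}
```

Keying by name silently merges the two sizes, while the full-path key keeps them apart.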

In [90]:
def cursor_to_key(cursor: list[str]) -> str:
    return "/" + "/".join(cursor)

def add_file_size_to_dirs(cursor: list[str], file_size: int):
    """Add the size of the current file to every directory on the cursor's path, up to root."""
    # Work on a copy so the caller's cursor is left untouched.
    this_cursor = cursor.copy()
    key = cursor_to_key(this_cursor)
    dirs[key]["total_size"] += file_size
    if this_cursor:
        # There are more dirs above this one to process
        this_cursor.pop()
        add_file_size_to_dirs(this_cursor, file_size)


cursor = []
for line in INPUTS:
    # line could be a file, a dir, or a cd command
    if line.startswith("$"):
        # cd command (the only kind left, since the `$ ls` lines were stripped)
        name = line.split()[-1]
        if name == "..":
            cursor.pop()
        else:
            cursor.append(name)
    elif not line.startswith("dir"):
        # We can safely ignore lines with `dir`. As we `cd` to them, we just append to the cursor
        # and then add files to them by appending to the defaultdict.
        # But I don't think it's necessary to actually create those keys with empty lists first.
        size, name = line.split()
        size = int(size)
        dirs[cursor_to_key(cursor)]["files"].append({name: size})
        add_file_size_to_dirs(cursor, size)


Now we have `dirs`, a flat dict that maps each directory's absolute path to the files directly inside it, along with their individual sizes. Each dir also has `total_size` set to the combined size of everything beneath it, nested directories included, thanks to the recursive func `add_file_size_to_dirs`.

The puzzle allows files to be counted more than once, which our method does: each file adds to the total of every directory above it. So we now just need to sum up the total sizes that are at most the max value `100000`.
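The double counting is easier to see on a tiny example. The sketch below uses a made-up `files` list (not the real input) and a loop over path prefixes in place of the recursive helper; the arithmetic is the same.

```python
from collections import defaultdict

# Hypothetical files as (directory cursor, size); not taken from the real input.
files = [
    ([], 150_000),      # a file directly in /
    (["a"], 30_000),    # a file in /a
    (["a", "e"], 500),  # a file in /a/e
]

totals = defaultdict(int)
for cursor, size in files:
    # Credit the file to its own directory and to every ancestor up to root.
    for depth in range(len(cursor), -1, -1):
        totals["/" + "/".join(cursor[:depth])] += size

MAX = 100_000
small = {path: total for path, total in totals.items() if total <= MAX}

print(dict(totals))         # {'/': 180500, '/a': 30500, '/a/e': 500}
print(sum(small.values()))  # 31000
```

The 500-byte file shows up in both `/a` and `/a/e`, so it is counted twice in the final sum, and `/` is excluded because it exceeds the limit.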

In [91]:
MAX = 100000
result = sum(val['total_size'] for val in dirs.values() if val['total_size'] <= MAX)
print(f"{result=}")

result=1118405


## Part 2

This is a pretty simple ask given what we've already calculated. We need to determine:
- How much space we're using out of the total `70000000`;
- How much space we need to free so that at least `30000000` is unused;
- Which directories have a total size `>=` that difference; and finally
- Which of those directories has the smallest total size.

So we just need to determine that minimum amount to remove, gather the total sizes that are at least that amount, sort them, and pick the first one.
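The arithmetic can be sketched on made-up numbers. The `totals` dict below is hypothetical, and the sketch takes `min()` directly rather than sorting, which gives the same answer.

```python
TOTAL_DISK = 70_000_000
NEEDED_FREE = 30_000_000

# Hypothetical directory totals, keyed by absolute path; not the real input.
totals = {"/": 48_000_000, "/a": 9_000_000, "/b": 8_000_000, "/b/c": 5_000_000}

free_now = TOTAL_DISK - totals["/"]  # 22_000_000 currently unused
to_remove = NEEDED_FREE - free_now   # 8_000_000 still has to be freed

# A directory exactly as large as the shortfall is enough, hence `>=`.
candidates = [size for size in totals.values() if size >= to_remove]
print(min(candidates))  # 8000000
```

With these numbers a strict `>` would skip `/b` and pick the 9,000,000 directory instead, so the comparison operator matters in the boundary case.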

In [92]:
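# Space still to free: required free space minus what is currently unused.
to_remove = 30000000 - (70000000 - dirs["/"]["total_size"])
# A directory exactly as large as the shortfall also qualifies, hence `>=`.
a = sorted(
    [val["total_size"] for val in dirs.values() if val["total_size"] >= to_remove]
)
result = a[0]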

print(f"{result=}")


result=12545514
