# Day 7

The solution is based on the visitor design pattern, implemented with the `singledispatchmethod` decorator from Python's `functools` module.
References:
* https://stackoverflow.com/questions/2525677/how-to-write-the-visitor-pattern-for-abstract-syntax-tree-in-python
* https://docs.python.org/3/library/functools.html#functools.singledispatch
* https://en.wikipedia.org/wiki/Visitor_pattern

The basic idea of the visitor pattern is to separate the algorithm (for instance, printing a file or directory to stdout, or computing its size) from the data it operates on.

This means we need two things:
- Objects that represent the data (files and directories). They do not implement the operations themselves (print or size).
- Visitors that handle each data type (i.e. file and directory) and perform the appropriate action. For this, we use `singledispatchmethod` with type hints, so that the handler is selected at run time from the type of the visited object.
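
As a minimal illustration of the dispatch mechanism (a sketch, not the visitor used in this notebook; the class name `Describer` and the handled types are chosen only for this example), `singledispatchmethod` picks the handler from the run-time type of the first argument after `self`:

```python
from functools import singledispatchmethod


class Describer:
    """Hypothetical visitor: one handler per supported type."""

    @singledispatchmethod
    def visit(self, arg):
        # Default handler, reached when no registered type matches
        raise NotImplementedError(f"No handler for {type(arg).__name__}")

    @visit.register
    def _(self, arg: int):
        # Selected through the type hint on "arg"
        return f"int:{arg}"

    @visit.register
    def _(self, arg: str):
        return f"str:{arg}"


describer = Describer()
print(describer.visit(3))    # int:3
print(describer.visit("x"))  # str:x
```

A subclass falls back to the handler of its closest registered base class (a `bool` is handled by the `int` handler here), and any other type reaches the default implementation.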

> FIXME: the directory size is computed directly in the `Directory` class. I did not find a way to accumulate the sizes of all subdirectories in the visitor itself.
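
One possible way to address this FIXME, as a sketch only: let the visitor drive the traversal itself and have each `visit` call return the size of the subtree it just visited. The classes `SketchFile`, `SketchDir` and `SizeVisitor` below are hypothetical stand-ins, not the classes defined later in this notebook, and the extra `path` parameter is an assumption made for the example.

```python
from dataclasses import dataclass, field
from functools import singledispatchmethod


@dataclass
class SketchFile:
    name: str
    size: int


@dataclass
class SketchDir:
    name: str
    children: list = field(default_factory=list)


class SizeVisitor:
    """Accumulates directory sizes; the data classes hold no size logic."""

    def __init__(self):
        self.sizes = {}

    @singledispatchmethod
    def visit(self, node, path=""):
        raise NotImplementedError(f"Unsupported node: {node!r}")

    @visit.register
    def _(self, node: SketchFile, path=""):
        # A file simply reports its own size
        return node.size

    @visit.register
    def _(self, node: SketchDir, path=""):
        # Build the full path so that same-named directories do not collide
        full_path = f"{path.rstrip('/')}/{node.name}" if path else node.name
        # Visit the children first, then record the accumulated total
        total = sum(self.visit(child, full_path) for child in node.children)
        self.sizes[full_path] = total
        return total


root = SketchDir("/", [SketchFile("b", 10), SketchDir("a", [SketchFile("f", 5)])])
visitor = SizeVisitor()
visitor.visit(root)
print(visitor.sizes)  # {'/a': 5, '/': 15}
```

The trade-off is that the traversal moves from `accept` into the visitor, so this variant does not reuse the `traverse` flag used elsewhere in the notebook.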

Benefits of using the pattern:
* The data classes do not need to be modified when we change or add an operation (for instance, to change the way a directory is printed).

Alternative solution:
* Implement a "print" method directly in the File and Directory classes, which is fairly easy in this specific case.
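
For comparison, the alternative could look roughly like the sketch below, where each class renders itself. The names `PlainFile`, `PlainDirectory` and `render` are hypothetical and only serve this illustration.

```python
from dataclasses import dataclass, field


@dataclass
class PlainFile:
    name: str
    size: int

    def render(self, level=0):
        # One line per file, indented by its depth in the tree
        return ["|____" * level + f" {self.size} {self.name}"]


@dataclass
class PlainDirectory:
    name: str
    children: list = field(default_factory=list)

    def render(self, level=0):
        # The directory line first, then every child one level deeper
        lines = ["|____" * level + f" DIR {self.name}"]
        for child in self.children:
            lines.extend(child.render(level + 1))
        return lines


tree = PlainDirectory("/", [PlainFile("a.txt", 1)])
print("\n".join(tree.render()))
```

This is shorter, but every new operation (disk usage, search, and so on) then requires touching both data classes.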

In [1]:
# First, a small hack so that the notebook can import the common functions
import sys
sys.path.append("..")

In [2]:
# The following classes define the file system objects: nodes, files, directories
from dataclasses import dataclass, field

@dataclass
class Node:
    _name: str
    _size: int = 0      
    _parent = None
    _children: list = field(default_factory=list)

    def set_parent(self, parent):
        self._parent = parent

    @property
    def size(self):
        return self._size
    
    @property
    def pwd(self):
        pwd = self.name
        a = self.parent
        while a is not None:
            if a.name != "/":
                pwd = a.name + "/" + pwd
            else:
                pwd = a.name + pwd
            a = a.parent
        return pwd
        
    @property
    def depth(self):
        a = self.parent
        level = 0
        while a is not None:
            a = a.parent
            level += 1
        return level

    @property
    def parent(self):
        return self._parent

    @property
    def name(self):
        return self._name

    @property
    def children(self):
        return self._children

    def accept(self, visitor, traverse=False):
        visitor.visit(self)
        for c in self.children:
            if traverse:
                # Explore tree recursively
                c.accept(visitor, traverse)
            else:
                # Print children, nothing more
                visitor.visit(c)


@dataclass
class Directory(Node):
    def add(self, node):
        node.set_parent(self)
        self.children.append(node)

    @property
    def size(self):
        return sum([child.size for child in self.children])


@dataclass
class File(Node):         
    def __str__(self):
        return f"{self.size} {self.name}"


In [3]:
# This class implements a file system visitor that prints a directory or a file to stdout
from functools import singledispatchmethod

@dataclass
class Printer:

    @singledispatchmethod
    def visit(self, arg):
        # Fallback for unregistered types; it must accept the visited object
        raise NotImplementedError("Element cannot be printed")

    def print(self, data, level):
        # print("{val:>{depth}}".format(val="|__", depth=level*4), data)
        print("|____"*level, data)

    @visit.register
    def _(self, arg: Directory):
        self.print(f"DIR {arg.name} (total size: {arg.size})", arg.depth)

    @visit.register
    def _(self, arg: File):
        self.print(f"{arg.size} {arg.name}", arg.depth)


In [4]:
# This class implements a file system visitor that builds a dictionary of all directories and their sizes (subdirectories included)
from functools import singledispatchmethod

@dataclass
class DiskUsage:
    _dirs: dict = field(default_factory=dict)

    @property
    def dirs(self):
        return self._dirs

    @singledispatchmethod
    def visit(self, arg):
        # Fallback for unregistered types; it must accept the visited object
        raise NotImplementedError("Element has no disk usage")

    @visit.register
    def _(self, arg: Directory):
        # Use full path to make sure directories with same name don't collide
        self.dirs[arg.pwd] = arg.size

    @visit.register
    def _(self, arg: File):
        # Size of children is already implemented in the Directory class.
        pass

In [5]:
# This class defines a file system explorer: it stores the hierarchy of nodes, implements basic commands and binds the visitors to the nodes

class ElfFileSystemExplorer:

    def __init__(self) -> None:       
        self.root = Directory("/")
        self.cursor = self.root

    def cd(self, arg):
        """Update cursor. Return False if directory is not found"""
        if arg == "/":
            self.cursor = self.root
            return True

        if arg == ".." and self.cursor.parent is not None:
            self.cursor = self.cursor.parent
            return True

        for dir in filter(lambda x: isinstance(x, Directory), self.cursor.children):
            if dir.name == arg:
                self.cursor = dir
                return True

        return False
        
    def ls(self):
        """List files and directories at cursor"""
        print(f"${self.cursor.name}")
        self.cursor.accept(Printer(), False)

    def mkdir(self, name):
        """Create a directory in the current position"""
        if name not in [a.name for a in filter(lambda x: isinstance(x, Directory), self.cursor.children)]:
            self.cursor.add(Directory(name))

    def touch(self, name, length):
        """Create a file of given name and length"""
        self.cursor.add(File(name, length))

    def tree(self):
        """Show the full file hierarchy starting from cursor"""
        print(f"${self.cursor.name}")
        self.cursor.accept(Printer(), traverse=True)

    def du(self):
        """Show the disk usage: size of all directories. Warning: this starts from current cursor"""
        _du = DiskUsage()
        self.cursor.accept(_du, traverse=True)
        return _du.dirs



In [6]:
# Test the file system

a = ElfFileSystemExplorer()
print("Empty FS")
a.ls()
a.mkdir("Toto")
a.mkdir("Toto2")
a.mkdir("Toto3")
a.touch("fileroot", 1)
print("3 dir?")
a.ls()
a.cd("Toto")
a.touch("file1", 100)
a.touch("file2", 101)
print("Toto?")
a.ls()
a.cd("..")
print("/?")
a.ls()
a.cd("Toto2")
print("Toto2?")
a.ls()
a.cd("/")
print("/?")
a.ls()
print("Tree?")
a.tree()
print("DU?")
print(a.du())

Empty FS
$/
 DIR / (total size: 0)
3 dir?
$/
 DIR / (total size: 1)
|____ DIR Toto (total size: 0)
|____ DIR Toto2 (total size: 0)
|____ DIR Toto3 (total size: 0)
|____ 1 fileroot
Toto?
$Toto
|____ DIR Toto (total size: 201)
|____|____ 100 file1
|____|____ 101 file2
/?
$/
 DIR / (total size: 202)
|____ DIR Toto (total size: 201)
|____ DIR Toto2 (total size: 0)
|____ DIR Toto3 (total size: 0)
|____ 1 fileroot
Toto2?
$Toto2
|____ DIR Toto2 (total size: 0)
/?
$/
 DIR / (total size: 202)
|____ DIR Toto (total size: 201)
|____ DIR Toto2 (total size: 0)
|____ DIR Toto3 (total size: 0)
|____ 1 fileroot
Tree?
$/
 DIR / (total size: 202)
|____ DIR Toto (total size: 201)
|____|____ 100 file1
|____|____ 101 file2
|____ DIR Toto2 (total size: 0)
|____ DIR Toto3 (total size: 0)
|____ 1 fileroot
DU?
{'/': 202, '/Toto': 201, '/Toto2': 0, '/Toto3': 0}


## Parse input commands

We will first define all available commands and use a factory method to convert the input line to the appropriate command.

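Python lets us iterate over the direct subclasses of a class (`__subclasses__()`), so we use this to try every known command in turn and return the first one whose regular expression matches. This implies that each regular expression must be selective enough not to match lines meant for another command.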
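
To illustrate why selectivity matters, here is a small sketch with hypothetical classes (`Rule`, `AnyWord` and `DirEntry` are not part of the solution). `__subclasses__()` returns the direct subclasses in definition order, so a loose pattern defined first hides a more specific one defined later:

```python
import re


class Rule:
    pattern = ""

    @classmethod
    def first_match(cls, line):
        # Try every direct subclass in definition order, first match wins
        for rule in cls.__subclasses__():
            if re.search(rule.pattern, line):
                return rule.__name__
        return None


class AnyWord(Rule):
    # Too loose: matches any line that contains a word character
    pattern = r"\w+"


class DirEntry(Rule):
    pattern = r"^dir\s+\w+$"


print(Rule.first_match("dir a"))  # AnyWord, although DirEntry was intended
```

Anchoring the patterns (`^...$`) or defining the most specific commands first are two ways to avoid this kind of shadowing.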

In [7]:
import re
from dataclasses import dataclass

@dataclass
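class Command:
    """A parsed command.
    The command is identified by its type, which must be a subclass of Command.
    "command" holds the regular expression that recognizes the input line;
    "arg" holds the captured groups as a tuple of strings.
    """
    
    command: str = ""
    arg: tuple = ()

    def __eq__(self, __o: object) -> bool:
        # Commands are equal when they have the same type and the same arguments
        return type(__o) is type(self) and __o.arg == self.arg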

    @staticmethod
    def get_command(line):
        for command in Command.__subclasses__():
            args = re.search(command.command, line)
            if args:
                return command(arg=args.groups())
        raise NotImplementedError(f"Unknown Command: {line}")

@dataclass
class Cd(Command):
    command: str = r'\$\s+cd\s+([\w/\.]+)'

@dataclass
class Ls(Command):
    command: str = r'\$\s+ls'
    arg: tuple = ()

@dataclass
class FileList(Command):
    command: str = r'(\d+)\s+([\w.]+)'

@dataclass
class DirList(Command):
    command: str = r'dir\s+(\w+)'

print()





In [8]:
# Test above class

test_data = (
    ("$ ls", Ls()),
    ("dir a", DirList(arg=("a",))),
    ("14848514 b.txt", FileList(arg=("14848514", "b.txt"))),
    ("8504156 c", FileList(arg=("8504156", "c"))),
    ("dir d", DirList(arg=("d",))),
    ("$ cd a", Cd(arg=("a",))),
)

for line, check in test_data:
    # "line" avoids shadowing the built-in input()
    c = Command.get_command(line)
    assert c == check, f"Got {c} instead of {check}"

In [9]:
from functools import singledispatchmethod


class ExplorerCommandParser:
    """Parse and execute Explorer commands
    We will use "singledispatchmethod" decorator so that Python calls the 
    correct command handlers solely based on the command type
    """
    
    def parse(self, input_str):
        for line in input_str.split("\n"):
            if len(line) == 0:
                continue
            command = Command.get_command(line)
            self.handle(command)
            
    @singledispatchmethod
    def handle(self, target):
        raise NotImplementedError("Unknown object")

    @handle.register
    def _(self, c: Cd):
        target = c.arg[0]
        self.fs.cd(target)

    @handle.register
    def _(self, c: Ls):
        # Nothing to do: the listing lines that follow are handled as DirList and FileList commands
        pass

    @handle.register
    def _(self, c: DirList):
        name,  = c.arg
        self.fs.mkdir(name)

    @handle.register
    def _(self, c: FileList):
        size, name = c.arg
        self.fs.touch(name, int(size, 10))

    def __init__(self, input_data: str) -> None:
        self.fs = ElfFileSystemExplorer()
        self.parse(input_data)

Implement the test class with the challenge logic

In [10]:
from adventofcode import AdventOfCode

class Day7(AdventOfCode):
    def solve(self):
        a = ExplorerCommandParser(self.input_data)
        a.fs.cd("/")
        s = sum(filter(lambda x: x <= 100000, a.fs.du().values()))
        return s
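
As a worked check of this logic on the example listing used below (a sketch; the function name `sum_small_directories` is only illustrative, and the dictionary repeats the directory sizes of the example):

```python
def sum_small_directories(disk_usage, limit=100_000):
    # Nested directories are counted once for each directory that contains them
    return sum(size for size in disk_usage.values() if size <= limit)


example = {"/": 48381165, "/a/e": 584, "/a": 94853, "/d": 24933642}
print(sum_small_directories(example))  # 94853 + 584 = 95437
```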

Test everything with the given example

In [11]:
test_data = ("""$ cd /
$ ls
dir a
14848514 b.txt
8504156 c.dat
dir d
$ cd a
$ ls
dir e
29116 f
2557 g
62596 h.lst
$ cd e
$ ls
584 i
$ cd ..
$ cd ..
$ cd d
$ ls
4060174 j
8033020 d.log
5626152 d.ext
7214296 k
""", {"/": 48381165, "/a/e": 584, "/a": 94853, "/d": 24933642})


In [12]:
from adventofcode import AdventOfCode

a = ExplorerCommandParser(test_data[0])
a.fs.cd("/")
a.fs.tree()
disk_usage = a.fs.du()
assert disk_usage == test_data[1]

challenge = Day7(test_data[0])
res = challenge.run()
assert res == 95437

$/
 DIR / (total size: 48381165)
|____ DIR a (total size: 94853)
|____|____ DIR e (total size: 584)
|____|____|____ 584 i
|____|____ 29116 f
|____|____ 2557 g
|____|____ 62596 h.lst
|____ 14848514 b.txt
|____ 8504156 c.dat
|____ DIR d (total size: 24933642)
|____|____ 4060174 j
|____|____ 8033020 d.log
|____|____ 5626152 d.ext
|____|____ 7214296 k
Running AdventOfCode : 95437


Finally, let's get started with the challenge :-)

In [13]:
from adventofcode import AdventOfCodeFromFileInput
challenge = AdventOfCodeFromFileInput(Day7, "input.txt")
challenge.run()

Running AdventOfCode Day7: 1367870


1367870

## Part 2

In [14]:
class Day7Part2(AdventOfCode):
    def solve(self):
        a = ExplorerCommandParser(self.input_data)
        a.fs.cd("/")
        disk_usage = a.fs.du()
        # Space that still has to be freed (may be negative if there is already enough free space)
        target_space = 30000000 - (70000000 - disk_usage["/"])
        # Keeping the paths is not needed for the answer, but it lets us inspect the candidate directories for deletion
        sorted_candidates = sorted(
            ((path, size) for path, size in disk_usage.items() if size >= target_space),
            key=lambda item: item[1],
        )
        return sorted_candidates[0][1]
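
The arithmetic behind `target_space` can be checked on the example data with a short stand-alone sketch. The constant and function names below are illustrative, not part of the solution class.

```python
TOTAL_DISK = 70_000_000
REQUIRED_FREE = 30_000_000


def smallest_directory_to_delete(disk_usage):
    free = TOTAL_DISK - disk_usage["/"]  # 70000000 - 48381165 = 21618835
    to_free = REQUIRED_FREE - free       # 30000000 - 21618835 = 8381165
    # Keep only the directories large enough, then pick the smallest
    return min(size for size in disk_usage.values() if size >= to_free)


example = {"/": 48381165, "/a/e": 584, "/a": 94853, "/d": 24933642}
print(smallest_directory_to_delete(example))  # 24933642
```

Only `/` and `/d` are large enough in the example, so the smaller of the two, `/d`, is the one to delete.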

Test with the example

In [15]:
from adventofcode import AdventOfCode

challenge = Day7Part2(test_data[0])
res = challenge.run()
assert res == 24933642

Running AdventOfCode : 24933642


Solve the challenge

In [16]:
from adventofcode import AdventOfCodeFromFileInput
challenge = AdventOfCodeFromFileInput(Day7Part2, "input.txt")
challenge.run()

Running AdventOfCode Day7Part2: 549173


549173