Building from source
In order to build the project for a specific platform, run
dotnet build -r <platform> in the repository's directory, where
<platform> is either
You will need the .NET Core SDK v3.0 or newer, which you can download here.
Syntax and memory tape
Since Brainfuck syntax is not fully agreed upon between different interpeters/compilers, I made the following choices:
- Characters that are not BF instructions are simply ignored
- Single-line comments are supported, and are introduced with the
- The size of cells in the memory tape is 1 byte
- Cell values can overflow (without introducing undefined behaviour)
- The number of cells in the memory tape is 65536, thus the memory tape occupies 64KiB
- The memory tape is allocated on the stack, so overflowing it will likely result in undefined behaviour
In the parsing phase, the text source is converted into an Abstract Syntax Tree (defined here), which is then processed by the later phases. In the optimization phase, the AST that was parsed is converted into a smaller, more efficient AST. More details can be found in the next section. In the codegen phase, the AST is converted into .NET CIL code, which is then compiled to an in-memory assembly and executed.
AST and Optimizations
The AST, or intermediate representation used by the compiler consists of generalized versions of BF instructions.
For example, the
+ instruction gets compiled to an
Add 1 node, while
- gets compiled to
This representation enables a number optimizations to be performed:
Multiple instructions of the same kind (e.g. + and -, or < and >) are merged into a single instruction.
+++ --- ++ => Add 2
+++ --- --- => Add -3
>>> << => Mov 1
> <<<< => Mov -3
+++ --- => # Nothing
Efficient cell zeroing
It is a common pattern in BF code to reset a cell's value using the
The compiler recognizes this pattern and transforms it into a single assignment.
Note that due to the fact that cells may overflow,
[+] achieves the same purpose (and is also optimized).
[-] => Set 0
[+] => Set 0
Dead code elimination
In some cases (such as consecutive loops), the compiler can statically determine that particular instructions have no effect, so they are removed.
[+>][<-] => [+>] => Loop [Add 1, Mov 1] # Second loop is not compiled
Loops at the beginning of the program are also removed, as the initial value of cells is 0.
In some cases, certain instructions can be replaced with cheaper ones.
For example, adding after a loop can be replaced with a
set instruction, as the value of the cell is 0 (since it exited the loop), so the 2 instructions are equivalent.
[-]+++ => Set 0, Add 3 => Set 3
Also, setting a value after adding to it (or subtracting from it) will overwrite the result of the additions, so they are removed.
+++[-] => Add 3, Set 0 => Set 0
sets also behave similarly: Only the last
set is compiled, previous ones are elided.
BF programs used to test the interpreter can be found in the Tests folder.
Some test programs are made by other people and were found on the web. Such programs have a comment containing the download link, as well as the original license/copyright notices.
This project is licensed under the terms of the MIT License. For more details, see the license file.