Skip to content

Heavily optimising Brainf*** compiler targeting LLVM

Notifications You must be signed in to change notification settings

JakuJ/BFC-10000

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

26 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

BFC-10000

An optimizing Brainfuck compiler and interpreter. Uses flex and bison for lexing and parsing the source code. LLVM is used as the code generation front-end, using a set of IR optimization passes before compiling to x86-64 object files.

Requirements

  • CMake
  • LLVM
  • clang++
  • flex
  • bison >= 3.6

Building

Make an out of tree build with CMake.

mkdir build
cd build
cmake ..
make

This creates two executables - the compiler and the interpreter.

Usage

compiler [-hvAOS] [-o <output file>] <source file>
	-h	 Print this help
	-v	 Verbose mode, print optimization passes' stats
	-A	 Emit the optimized AST (for debugging purposes)
	-S	 Emit LLVM IR
	-O	 Optimize LLVM IR

The interpreter has a simpler interface, taking the source file as its sole argument.

The compiler creates object files which have to be linked in order to create an executable (clang <object file> will do).

Optimizations

A range of optimization are employed in order to simplify the code and significantly reduce the number of operations required. The following examples use C-like pseudocode.

  1. Adjacent +- and >< instructions are folded. For example +++>>-----++-- is turned into this:

     tape[index] += 3;
     index += 2;
     tape[index] -= 5;
  2. The [-] loop, a common idiom used for clearing the cell, is simplified to tape[index] = 0

  3. Any sequence of instructions that starts with clearing the current cell and then changing its value is folded into a single assignment. This means such a sequence: [-]+++[-]----- would be translated to this assignment: tape[index] = -5;

  4. Unreachable loops (such that occur immediately after another loop) are eliminated.

  5. Multiplication loops are simplified to a few addition operations. This means that such code: [->++<<<--->>] would be translated to this:

    while (tape[index] != 0) {
        tape[index + 1] += 2 * tape[index];
        tape[index - 2] -= 3 * tape[index];
        tape[index] = 0; // loop executes at most once
    }
  6. Alternating "set to N" and "move one cell" instructions are simplified to a memset. Example - setting 8 consecutive cells to value of 3: [-]+++>[-]+++>[-]+++>[-]+++[-]+++>[-]+++>[-]+++>[-]+++ is turned into memset(&tape[index], 3, 8); index += 7;.

Example sources

A Mandelbrot set viewer by Erik Bosman

some brainfuck fluff by Daniel Cristofani
A rich collection of BF programs.

Towers of Hanoi by Clifford Wolf
This one comes with a nice ASCII visuals, but your terminal must be able to understand vt100 escape codes.

Use the interpreter, not the compiler - the compiled version is waaaaay too fast for the animation. If it's still too fast, you might want to comment out some AST optimizations in interpreter.cpp.

Lost Kingdom by Jon Ripley
A text adventure game written in nearly 30k lines of BF code.

You might want to turn off LLVM IR optimizations for this one (compile without the -O flag). Even then the compilation might take a few minutes, mostly spent in the last step (making an object file).