subset-of-C compiler targeting 32-bit x86
C Shell Other
Fetching latest commit…
Cannot retrieve the latest commit at this time.
Failed to load latest commit information.
bpf-asm Explain what the obsolete VM-target "assembler" is Apr 4, 2014


This project is a personal programming exercise. It consists mainly in
a compiler for a subset of C, targeting 32-bit x86 (via GNU assembler).

This 32-bit x86 compiler is tested on FreeBSD, GNU / Linux, and Windows
(via the MinGW toolchain).

It can compile itself !

I started this in highschool in 2012 and the code has turned into a mess.
I am now attending a CS program, since 2013, & if I started this project
again from scratch I should follow better development principles!

The code is not particularly maintainable or readable, this is what happens
when an overzealous high school student wants to ``prove himself'' and
writes large amounts of code like an insane madman. 
Programming is like a drug
Corporate-quality code this ain't. but definitely i was enthusiastic and
willing, i suppose that is the point

Although, I think the lexer and parser are not *that* bad. The codegen_x86 
module is extremely nasty in a few parts, it has some very messy stuff when 
it sets up symbols. And there are few things that need to be refactored...

	---------------------- X86 QUICKSTART -----------------------

		$ cd compiler/
		$ ./
		$ ./ test/x86/recursive-fib.c 

	  Note that 32-bit C libraries must be installed.


For further info: please go in the "compiler" directory and read
"compiler/docs/docs-x86.txt" for info about the supported C features,
how to build the program, etc.  (There is some other useful information
in the "compiler/docs" directory, for example docs/wcc.txt tells you
how to build a standalone "wcc" command).

The test programs "compiler/test/x86/*.c" should give an idea of the
subset of C accepted at this point. Highlights include: pointers and
multi-dimensional arrays of structs, ints, and chars; structs with
self-referential pointers; support for several libc calls (e.g. printf);
custom procedures with recursion allowed; for-loops; typedefs; string
constants; switch statements compiled to jump tables; (some) function
prototypes; (some) array initializers.

There are still bugs, probably especially in constructs not yet tried
in the test programs. Some expressions may not compile properly yet,
or compile at all, for that matter. This is a work-in-progress. Big real
serious professional or academic compilers likely use more sophisitcated
techniques; I'm doing this project to learn because I still have a lot
to learn.

I haven't read the standard. I'm mostly looking at gcc's x86 output and
imitating that. It's not going to give a professional-quality standard
compiler but it will certainly be a good learning experience for me.

This really really really isn't an optimizing compiler!


This compiler can also target my terrible old bytecode VM,
arbitrarily named "BPF" -- for details, see "README-BPF" and

Note that the BPF compiler supports a much smaller subset of C than does
the x86 compiler, and it may have some weird modifications.


Future project ideas:

MIPS codegen (i'm learning MIPS at school as of early 2014)

The compiler could be improved, merged with a well-designed VM and hacked
into a big REPL to create a script interpreter.

Supporting a larger subset of C ? Not sure if it would be possible to
do this by growing this code.

Support for other real architectures than x86, say e.g. powerpc or arm
(PowerPC does not look easy.)

Support for popular serious VMs like JVM / .NET / LLVM


Project started: December 2012