nanoc is a tiny subset of C and a tiny compiler that targets 32-bit x86 machines.
The only types are:
- `int` (32-bit signed integer)
- `char` (8-bit signed integer)
- `void` (only to be used as a function return type or pointee type)
- pointers to the above types and to pointer types, which are functionally equivalent to 32-bit unsigned integers
Notably, there are no structs, unions, enums or arrays.
The only keywords are: int, char, void, if, else, while, continue, break, return.
There are many syntactic limitations:
- Declarations and assignments cannot both be made in a single statement. For example, `int a = 12;` is not valid but `int a; a = 12;` is.
- Only a single declaration can be made in a single statement. `int a; int b;` is valid but `int a, b;` is not.
- There is no operator precedence (see "Differences with C").
- The `++` and `--` operators are always prefix operators. `++a` is valid, but `a++` is not.
- There is no array indexing syntax. `*(a + 1)` is valid but `a[1]` is not.
- There is no unary `-` operator. `0 - a` is valid but `-a` is not.
- There is no preprocessor.
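The limitations above are easier to see in a short program. The sketch below stays within them and is also valid standard C, so it can be checked with any C compiler. The function names are made up for illustration, and it assumes nanoc accepts ordinary C function definitions with parameters. The comments are for the reader only; whether nanoc accepts comments is not stated here.

```c
/* Counts the characters in a NUL-terminated string. */
int length(char* s)
{
    int n;                  /* declaration only: no initializer */
    n = 0;                  /* assignment is a separate statement */
    while (*(s + n)) {      /* no s[n]; s is a char*, so offsets are bytes */
        ++n;                /* prefix form only */
    }
    return n;
}

/* Negates a value without a unary minus. */
int negate(int a)
{
    return 0 - a;
}

/* Every binary operation is parenthesized explicitly. */
int dot2(int a, int b, int c, int d)
{
    return (a * b) + (c * d);
}
```

The string example uses `char*` on purpose: byte offsets and C pointer arithmetic agree for `char`, so the function behaves the same under both languages.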
As of now, the compiler is just under 1800 lines of code. It only outputs 32-bit x86 machine code formatted as an ELF executable.
nanoc is not a strict subset of C, as there are some small semantic differences between the two languages:
- Operators do not take precedence over each other; the order in which they are applied must be specified explicitly with parentheses. For example, `1 + 2 * 3` is not a valid expression but `1 + (2 * 3)` is.
- "Pointer arithmetic" does not exist: pointers are simply integers, so adding 1 to an `int*` will increment it by 1 instead of 4 (or whatever `sizeof(int)` is).
- There is no explicit type casting (or any type checking at all); all types are cast implicitly.
- Boolean operators always evaluate both of their operands. For example, `0` will always be evaluated in the expression `1 || 0`, and `1` will always be evaluated in the expression `0 && 1`.
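The pointer and boolean differences can be modeled in standard C. The helpers below are hypothetical and are not part of nanoc; they only emulate the behavior described above so it can be observed with a regular C compiler. Normalizing boolean results to 0 or 1 is an assumption, since the result value of nanoc's boolean operators is not stated here.

```c
/* Models nanoc's `p + n` for an int*: the address grows by n bytes,
   not by n * sizeof(int) as it would in C. */
int* nanoc_int_ptr_add(int* p, int n)
{
    return (int*)((char*)p + n);
}

/* Models nanoc's `a || b` and `a && b`: both operands are already
   evaluated by the time the function body runs, so nothing is skipped. */
int nanoc_or(int a, int b)
{
    return (a != 0) | (b != 0);
}

int nanoc_and(int a, int b)
{
    return (a != 0) & (b != 0);
}

/* Counts how many operands were evaluated. */
int calls;

int bump(int v)
{
    ++calls;
    return v;
}

/* A null-pointer guard that does not depend on short-circuiting:
   the dereference sits inside the if, not next to a boolean operator. */
int first_or_zero(int* p)
{
    if (p) {
        return *p;
    }
    return 0;
}
```

With these helpers, `nanoc_or(bump(1), bump(0))` leaves `calls` at 2, whereas C's `bump(1) || bump(0)` would stop after the first call. A practical consequence is that a guard such as `p && *p` still dereferences `p` when it is null, so nested `if` statements as in `first_or_zero` are the safer pattern. Likewise, stepping to the next `int` takes `p + 4` on a 32-bit target rather than `p + 1`.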
nanoc is meant to be a small, self-contained and easily portable compiler for a usable subset of C. I wrote it because I wanted to write and compile code on my hobby operating system without having to port a big toolchain. Since many hobby operating systems run on x86 and read ELF executables, I hope this project proves to be useful for other OSdev enthusiasts as well.
Most existing "little C compiler" projects output some form of assembly. Because writing an x86 assembler is a difficult (and not very interesting) task, I decided to make nanoc output x86 machine code directly rather than assembly text. nanoc is also capable of linking source code with a statically compiled archive (like a libnanoc.a) to produce an ELF executable that can use syscalls and library functions and do useful things.
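To give a sense of what emitting machine code directly involves, here is a minimal sketch in C. It is not taken from nanoc's source: the function name and buffer handling are made up for illustration. It writes the bytes for `mov eax, imm32` followed by `ret`, which is roughly what a compiled 32-bit x86 function returning a constant could end with.

```c
#include <stddef.h>

/* Hypothetical helper: writes "mov eax, value; ret" into out and
   returns the number of bytes written. out must have room for 6 bytes. */
size_t emit_return_constant(unsigned char* out, unsigned int value)
{
    size_t n = 0;

    out[n++] = 0xB8;                                  /* opcode: mov eax, imm32 */
    out[n++] = (unsigned char)(value & 0xFF);         /* immediate, little-endian */
    out[n++] = (unsigned char)((value >> 8) & 0xFF);
    out[n++] = (unsigned char)((value >> 16) & 0xFF);
    out[n++] = (unsigned char)((value >> 24) & 0xFF);
    out[n++] = 0xC3;                                  /* opcode: ret */

    return n;
}
```

Writing bytes this way means the compiler needs no separate assembler pass; the cost is that every instruction encoding it uses has to be written out by hand.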
As of now, nanoc produces very inefficient code. This is not desirable, but I would not sacrifice portability or too much simplicity for code efficiency. nanoc is also a very rudimentary linker -- in particular, the way it combines .data, .rodata and .bss sections and handles global variables is questionable at best and non-functional at worst.
TODO.md lists TODOs to be addressed in the short-term.
To compile nanoc:
make
To use an OS-specific toolchain, simply define the CC variable:
CC=i686-pc-myos-gcc make
nanoc depends on a few libc functions: some simple ones from string.h, malloc+realloc, fopen+fread+fwrite, printf and atoi.
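When porting, it can help to check the target libc before building nanoc itself. The function below is a rough smoke test written for this purpose, not something shipped with nanoc. It assumes `strlen` and `strcmp` are among the string.h functions needed, which is a guess, and it also calls `free` and `fclose`, which are not in the list above.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Writes a short string to path, reads it back, and checks the result.
   Returns 0 on success or a small positive code naming the failed step. */
int libc_smoke_test(const char* path)
{
    const char* msg = "1800";
    size_t len = strlen(msg);
    size_t got;
    char* buf;
    char* grown;
    FILE* f;

    /* malloc + realloc: allocate a small buffer, then grow it */
    buf = malloc(2);
    if (!buf) { return 1; }
    grown = realloc(buf, len + 1);
    if (!grown) { free(buf); return 2; }
    buf = grown;

    /* fopen + fwrite: write the message */
    f = fopen(path, "wb");
    if (!f) { free(buf); return 3; }
    if (fwrite(msg, 1, len, f) != len) { fclose(f); free(buf); return 4; }
    fclose(f);

    /* fopen + fread: read the message back */
    f = fopen(path, "rb");
    if (!f) { free(buf); return 5; }
    got = fread(buf, 1, len, f);
    fclose(f);
    buf[got] = '\0';

    /* string.h + atoi: compare and convert what was read */
    if (got != len || strcmp(buf, msg) != 0) { free(buf); return 6; }
    if (atoi(buf) != 1800) { free(buf); return 7; }

    printf("libc smoke test passed: read back %s\n", buf);
    free(buf);
    return 0;
}
```

Calling `libc_smoke_test("smoke.tmp")` from a small `main` built with the target toolchain should return 0 if these functions behave as expected; a non-zero code points at the step that failed.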
If you are having trouble porting nanoc to your operating system, please reach out to me! I am happy to help. Feel free to raise an issue on this repository or send me an email.
To use nanoc:
nanoc <filename> [<archive>]
For example:
nanoc program.c
or
nanoc program.c /usr/lib/libnanoc.a
This will compile program.c and produce an executable file called a.out. If an archive file is specified, nanoc will link it with program.c so the program can use functions defined in the archive.