Simple C compiler
alexfru Add rudimentary support for wchar_t
This also adds support for integer character constants
containing more than one character, e.g. 'ab'.

With this we should be able to use wide characters and wide
string literals, e.g. L'a' and L"abc".
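
As a minimal sketch of what wide literals look like in use (the function names are illustrative only and not part of Smaller C), the following relies on standard C rules alone: a wide character constant has type wchar_t, and a wide string literal is an array of wchar_t with a terminating zero element.

```c
#include <stddef.h> /* wchar_t, size_t */

/* Number of wchar_t elements in L"abc", counting the terminating zero. */
size_t wide_abc_elems(void)
{
    return sizeof(L"abc") / sizeof(wchar_t);
}

/* In C a wide character constant has type wchar_t,
   so both sizes are expected to match. */
int wide_const_is_wchar_sized(void)
{
    return sizeof(L'a') == sizeof(wchar_t);
}
```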

This introduces wchar_t, WCHAR_MIN, WCHAR_MAX in the
appropriate existing headers (<stddef.h>, <stdlib.h>,
<stdint.h>), but does not add new related headers (<wchar.h>,
<wctype.h>) or any other wide char functionality.
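
A small sketch of how the new definitions might be queried, assuming only the standard locations named above (wchar_t in <stddef.h>, WCHAR_MIN and WCHAR_MAX in <stdint.h>) plus CHAR_BIT from <limits.h>. The helper names are hypothetical.

```c
#include <limits.h> /* CHAR_BIT */
#include <stddef.h> /* wchar_t */
#include <stdint.h> /* WCHAR_MIN, WCHAR_MAX */

/* Returns 1 if wchar_t is a signed type, 0 if it is unsigned.
   An unsigned wchar_t has WCHAR_MIN equal to 0. */
int wchar_is_signed(void)
{
    return WCHAR_MIN < 0;
}

/* Width of wchar_t in bits. */
unsigned wchar_bits(void)
{
    return (unsigned)(sizeof(wchar_t) * CHAR_BIT);
}
```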

The compiler (both smlrcc and smlrc) receives additional
options to control the type underlying wchar_t:

  UCS-2/UTF-16: -short-wchar -unsigned-wchar
  UCS-4/UTF-32: -long-wchar -signed-wchar
            (or -long-wchar -unsigned-wchar if desired)

UCS-4/UTF-32 is the default for ELF/Linux and Mach-O/MacOS.
UCS-2/UTF-16 is the default for DOS, PE/Windows and flat binaries.

As usual, 32-bit wchar_t is only available in 32-bit mode(l)s
(that is, not in tiny, small or 16-bit flat mode(l)s).

Wide characters and wide string literals are limited to ASCII
chars inside the single and double quotation marks because
Smaller C parses source code as 8-bit chars and doesn't know
how to interpret non-ASCII chars with codes 0x80 through 0xFF
without properly supporting locales, code pages and the like.
Smaller C does not support UTF-8 with BOM either.
For non-ASCII characters use hex escapes with their Unicode codes like so:

  L"na\xEFve" // 0xEF = Latin small letter I with diaeresis
  L"fianc\xE9"L"e" // 0xE9 = Latin small letter E with acute

Note that in the last example we break the string literal
into parts in order to stop the hex sequence after two hex
digits, \xE9. Without that, \xE9e would be interpreted as a
single hex sequence, \xE9E, the code for Lao letter Pho Tam
(U+0E9E).
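
The difference between the split and unsplit forms can be checked with a short sketch. The helper wide_len is hypothetical and written by hand because <wchar.h> (and therefore wcslen) is not provided.

```c
#include <stddef.h> /* wchar_t, size_t */

/* Count the wchar_t elements before the terminating zero. */
size_t wide_len(const wchar_t *s)
{
    size_t n = 0;
    while (s[n] != 0)
        n++;
    return n;
}

/* Split literal: the escape ends at the closing quote, giving \xE9,
   and the 'e' from the second literal is appended after it. */
const wchar_t *fiancee_split(void)
{
    return L"fianc\xE9" L"e";
}

/* Unsplit literal: the escape absorbs the trailing 'e' and becomes
   the single element 0xE9E. */
const wchar_t *fiancee_joined(void)
{
    return L"fianc\xE9e";
}
```

The split form has seven elements ending in 0xE9 and 'e', while the unsplit form has six, the last being 0xE9E.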

Also note that string literal concatenation works only for
literals of the same type, e.g. "ab" followed by "cd" (or
L"ab" followed by L"cd"). Mixing and matching wide and non-
wide literals is not supported, e.g. the following won't
compile: "ab" L"cd" (or L"ab" "cd").
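
One possible workaround when a narrow literal comes from a macro is to widen it with token pasting before concatenating. This is a sketch under the assumption that the preprocessor handles ## the way a standard one does; it has not been verified against Smaller C, and WIDEN, WIDEN2 and NAME are hypothetical names.

```c
#include <stddef.h> /* wchar_t */

/* Two-level macro so that the argument is expanded before L is pasted on. */
#define WIDEN2(s) L##s
#define WIDEN(s)  WIDEN2(s)

#define NAME "ab" /* hypothetical narrow literal defined elsewhere */

/* Narrow + narrow: same type, so plain concatenation applies. */
const char *narrow_joined(void)
{
    return NAME "cd";
}

/* Wide + wide: NAME is turned into L"ab" first, then joined with L"cd". */
const wchar_t *wide_joined(void)
{
    return WIDEN(NAME) L"cd";
}
```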
Latest commit ca30475 Aug 16, 2018


Smaller C is a simple and small single-pass C compiler,
currently supporting most of the C language common between C89/ANSI C
and C99 (minus some C89 and plus some C99 features).

Currently it generates 16-bit and 32-bit 80386+ assembly code for NASM
that can then be assembled and linked into DOS, Windows, Linux and Mac OS X
programs. (You may use YASM or FASM instead of NASM.)

Code generation for MIPS CPUs is also supported (primarily for RetroBSD).

Code generation for the TR3200 CPU and VASM is supported (see Trillek).

The compiler is capable of compiling itself.

The core compiler comes with a preprocessor (ucpp), a linker and a compiler
driver (the driver invokes the preprocessor, the core compiler, the assembler,
and the linker and supports options similar to those of gcc).

The standard C library is a work in progress and is close to completion.

See the project documentation for more up-to-date details:

For the lack of a better place, you can discuss Smaller C here:

HX DOS Extender:

Other projects based on/using Smaller C:
"ROM C" (like ROM Basic but C):
NewBasic Compiler:
Sweet32 CPU and toolchain:

Normative and other useful documents on C:
C99 + TC1 + TC2 + TC3, WG14 N1256:
Dave Prosser's C preprocessing algorithm annotated by Diomidis D. Spinellis:
Rationale for C99:
The New C Standard: An Economic and Cultural Commentary:
The Development of the C Language by Dennis M. Ritchie: