This is dcputhings, an assortment of tools for DCPU-16 development. This repository is maintained by Kang Seonghoon and contains the following software:
- dcpu.c and dcpuopt.c, my early attempts to build a DCPU-16 emulator.
- DcpuAsm, an Ocaml DSL for DCPU-16 assembly.
DcpuAsm is a DCPU-16 assembler embedded in the Ocaml syntax. It allows easier DCPU-16 code generation, but it can also be used as an ordinary macro assembler if you understand a bit of Ocaml.
Basically, the assembly is represented as an Ocaml list:
[
SET A, 4;
SET B, 5;
%loop:
SET PC, %loop;
]
More specifically, SET A, 4, SET B, 5 and %loop: SET PC, %loop each evaluate
to the internal representation via Camlp4. Note that Ocaml allows a trailing
semicolon inside the brackets, so the last ; is just fine.
This list can be converted to the binary via the ASM keyword:
let code = ASM [
SET A, 4;
SET B, 5;
%loop:
SET PC, %loop;
];;
print_string (DcpuAsm.to_binary_le code)
This will resolve all the labels to fixed offsets. to_binary_le converts
the static code into a little-endian byte string. The big-endian counterpart
is to_binary_be, and you can get an array of words using to_words.
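The difference between the two byte orders can be sketched in plain Ocaml. This is only an illustration: words_to_bytes is a hypothetical helper written for this README, not DcpuAsm's own implementation, and it assumes the words are already available as an int array.

```ocaml
(* Hypothetical helper, not part of DcpuAsm: pack 16-bit words into a byte
   string, low byte first (little-endian) or high byte first (big-endian). *)
let words_to_bytes ~little_endian (words : int array) : string =
  let buf = Buffer.create (2 * Array.length words) in
  Array.iter
    (fun w ->
      let lo = Char.chr (w land 0xff) in
      let hi = Char.chr ((w lsr 8) land 0xff) in
      if little_endian then begin
        Buffer.add_char buf lo;
        Buffer.add_char buf hi
      end else begin
        Buffer.add_char buf hi;
        Buffer.add_char buf lo
      end)
    words;
  Buffer.contents buf
```

Each word always occupies two bytes; only the order of those two bytes changes.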
By default ASM assumes the origin at 0x0000. This can be changed using the
ASM ~origin:0x1000 [...] syntax; in fact, ASM is just a shorter alias for the
DcpuAsm.asm function.
DcpuAsm supports the following instructions (and pseudo-instructions):
- Basic opcodes: SET, ADD, SUB, MUL, MLI, DIV, DVI, MOD, MDI, AND, BOR, XOR, SHR, ASR, SHL, IFB, IFC, IFE, IFN, IFG, IFA, IFL, IFU, ADX, SBX, STI, STD
- Special opcodes: JSR, HCF, INT, IAG, IAS, IAP, IAQ, HWN, HWQ, HWI
- Raw data: DAT, ORG, ALIGN
- Syntactic extensions: NOP, JMP, PUSH, POP, RET, BRK, HLT
- Empty opcode (i.e. no output at all): PASS
Basic opcodes take two arguments, and special opcodes take one.
Multiple arguments are separated with a comma (,), much like in other assemblers.
DAT has one or more arguments. Each argument can be a typical immediate (see
below for the syntax), which occupies exactly one word, or a string, which
occupies one word per character (so that "ok" equals 'o', 'k').
One can also use _ as a placeholder whose value can be ignored; it is mostly
equivalent to 0, but _s at the end of the binary will be ignored.
Arguments can have a TIMES prefix, as in DAT 3 TIMES 0x1234, where the
repeat count can be any expression including labels. Repeating a string is also
allowed (e.g. 3 TIMES "hello?").
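As a rough model of how DAT arguments expand into words, the following plain Ocaml sketch may help. Both function names are made up for illustration and say nothing about DcpuAsm's internals.

```ocaml
(* Hypothetical sketch: a string argument yields one word per character. *)
let string_to_words (s : string) : int list =
  List.init (String.length s) (fun i -> Char.code s.[i])

(* Hypothetical sketch: "n TIMES arg" repeats the words of arg n times. *)
let rec times n words =
  if n <= 0 then [] else words @ times (n - 1) words
```

Under this model, 2 TIMES "ok" would correspond to times 2 (string_to_words "ok"), i.e. four words.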
ORG x is equivalent to DAT (x-%_) TIMES _, and will set the current
assembly position to x if possible. It will raise an error if that is
impossible. x should be a positive integer.
ALIGN x is equivalent to DAT ((x-(%_ MOD x)) MOD x) TIMES _, and will set
the current assembly position to the next multiple of x. (Therefore it will
add at most x-1 zeroes.)
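The padding arithmetic for ORG and ALIGN can be checked with a few lines of plain Ocaml. The function names are hypothetical, and the sketch assumes that ORG is "impossible" exactly when the target lies behind the current position, which the text above does not spell out.

```ocaml
(* Number of placeholder words ALIGN x emits at position pos,
   following the formula (x - (pos MOD x)) MOD x. *)
let align_padding ~pos ~x = (x - (pos mod x)) mod x

(* Number of placeholder words ORG x emits at position pos.
   Assumption: moving backwards is the error case, modelled here as None. *)
let org_padding ~pos ~x = if x < pos then None else Some (x - pos)
```

For example, at position 5 ALIGN 4 needs 3 words of padding, while at position 8 it needs none.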
NOP is equivalent to SET A, A. It does nothing but take one cycle. While
DCPU-16 has lots of nops, this encoding is chosen because of the simplicity of
its binary encoding (0x0001). It may change if 0x0000 also turns out to be a
nop.
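The value 0x0001 follows from the basic instruction layout aaaaaabbbbbooooo in the DCPU-16 1.7 specification, where SET is opcode 0x01 and register A is operand value 0x00. A minimal sketch, with names invented for this example:

```ocaml
(* Pack a basic instruction word: 6 bits of a, 5 bits of b, 5 bits of opcode. *)
let encode_basic ~opcode ~b ~a = (a lsl 10) lor (b lsl 5) lor opcode

let set_opcode = 0x01 (* SET, per the DCPU-16 1.7 specification *)
let reg_a = 0x00      (* register A as an operand value *)

(* SET A, A *)
let nop_word = encode_basic ~opcode:set_opcode ~b:reg_a ~a:reg_a
```

With both operand fields zero, only the opcode bits remain set, which is why the word is so simple.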
JMP a sets the PC to a in the fastest or at least shortest way. There are
4 possible encodings for JMP: SET PC, ..., XOR PC, ..., AND PC, ... and
SUB PC, .... (Among them XOR is the fastest, but it is not applicable in all
cases.) Note that a plain SET PC, a won't be optimized; you must
explicitly use JMP a instead.
PUSH a is equivalent to SET PUSH, a. POP a is equivalent to SET a, POP.
You can also use [SP] instead of PEEK, and [SP+...] instead of PICK ....
RET is equivalent to SET PC, POP, and is used for returning from a
subroutine initiated by the JSR instruction.
BRK and HLT are equivalent to SUB PC, 1. This forms a simple infinite
loop, and is used as a de facto instruction to terminate the emulator.
PASS does not emit any binary at all; it can be used as a placeholder.
DcpuAsm does not support an EQU pseudo-instruction or similar; you can use an
ordinary let x = ... in ... construct to define constants, however.
DcpuAsm supports labels. A label is a valid Ocaml name (always starting with a
lowercase letter or _) prepended by %; %asdf, %_foo_bar and %loop42
are valid labels, for example.
There are two ways to use labels:
- It can occur in an expression, where it evaluates to the location pointed to by the label. An instruction may refer to labels defined after it.
- It can also occur at the front of an instruction (e.g. %foo: SET A, 3) to declare the label. The colon (:) is optional for predefined instructions, but you are recommended to keep the colon as it allows multiple label definitions. Skipping the colon may be natural for DAT instructions, however.
You can define labels at the end of a list; a PASS statement will be
implicitly added:
[
JMP %garbage;
DAT 1, 2, 3, 4;
%garbage:
]
DcpuAsm will automatically resolve labels to the appropriate position. Since
the length of instructions may vary depending on the position of labels,
DcpuAsm runs multiple passes to settle them down. If the positions have not
stabilized after the given number of passes, DcpuAsm gives up. The default
limit is 50, but it can be configured like ASM ~maxpass:10 [...].
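The idea of running passes until label positions stop changing can be illustrated with a toy model in plain Ocaml. Everything here is an assumption made for the sketch (the item type, the size rule of one word for targets in 0..30 and two words otherwise, and the function names); it is not DcpuAsm's actual algorithm.

```ocaml
(* Toy program items: a label definition, a jump to a label, or n data words. *)
type item = Label of string | Jump of string | Data of int

(* Size of an item given the label positions known so far. Unknown or
   far-away targets use the long (2-word) form. *)
let size env = function
  | Label _ -> 0
  | Data n -> n
  | Jump l ->
    (match List.assoc_opt l env with
     | Some pos when pos >= 0 && pos <= 30 -> 1
     | _ -> 2)

(* One pass: lay out the items using positions from the previous pass and
   return the newly computed label positions. *)
let pass env items =
  let _, env' =
    List.fold_left
      (fun (pos, acc) it ->
        match it with
        | Label l -> (pos, (l, pos) :: acc)
        | _ -> (pos + size env it, acc))
      (0, []) items
  in
  List.rev env'

(* Repeat passes until nothing changes, giving up after maxpass passes. *)
let resolve ?(maxpass = 50) items =
  let rec go n env =
    if n > maxpass then None
    else
      let env' = pass env items in
      if env' = env then Some env else go (n + 1) env'
  in
  go 1 []
```

In this model a jump that starts out long can shrink once its target turns out to be close, which moves later labels and forces another pass.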
It is possible to have free (undefined) labels in the assembly. DcpuAsm will make sure that these labels, while unresolved, will not affect the other parts of generated code. This is done by forcing all remaining immediates to always use a longer form.
It is advised to prepend _ to local labels. DcpuAsm has special support
for these local labels (see below).
The special label %_, when used in an expression, resolves to the position
of the current instruction. For example, SET PC, %_ is the same as
%_temp: SET PC, %_temp. You cannot define a label named %_.
An expression can occur as an instruction's argument. It may contain registers,
numbers, labels, memory references (enclosed in []) and other expressions.
DcpuAsm supports all general and special registers: A, B, C, X, Y, Z, I, J,
SP, PC, EX, IA. (IA cannot really be used, but it is there for better error
handling.) It also supports PUSH, PEEK, POP and PICK ...; they cannot be used
inside an expression.
DcpuAsm supports numbers in base 2 (0b101), base 8 (0o337), base 10 and
base 16 (0x1337) just like Ocaml. Additionally, a character literal ('A')
will be equal to its numerical code (i.e. int_of_char 'A'). All numbers are
treated as built-in Ocaml integers (31 or 63 bits long depending on the
platform), so you should be aware of it.
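Since these are ordinary Ocaml literals, their values can be checked directly in plain Ocaml. The to_word helper below is hypothetical; it only illustrates that a native integer is wider than a 16-bit DCPU-16 word, and it makes no claim about how DcpuAsm itself truncates values.

```ocaml
(* The same literal syntaxes as in plain Ocaml. *)
let binary = 0b101        (* 5 *)
let octal = 0o337         (* 223 *)
let hex = 0x1337          (* 4919 *)
let letter = int_of_char 'A'  (* 65 *)

(* Hypothetical helper: keep only the low 16 bits of a native integer. *)
let to_word n = n land 0xffff
```

For instance, to_word (-1) gives 0xffff, the word that represents -1 on DCPU-16.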
DcpuAsm supports all ordinary arithmetic and bitwise operations: +, -, *,
DIV, MOD, NOT, AND, OR, XOR, SHL, SHR. While arithmetic operations are
permitted on registers, the resulting expression has to be in the form
register + other expression or [register + other expression] due to the
constraints of DCPU-16. (Intermediate expressions do not have to be, however:
[3*A+2*(2*B-A)-(8 DIV 2)*B] will be resolved to [A], which is
perfectly valid in DCPU-16.)
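One way to picture this canonicalization is to reduce an expression to a constant plus a coefficient per register. The following plain Ocaml sketch does that for +, - and * only; the types and function names are invented for illustration and are not DcpuAsm's internal representation.

```ocaml
type expr =
  | Num of int
  | Reg of string
  | Add of expr * expr
  | Sub of expr * expr
  | Mul of expr * expr

(* Linear form: a constant term plus (register, coefficient) pairs. *)
type linear = { const : int; regs : (string * int) list }

(* Add two coefficient lists, dropping registers whose coefficient is 0. *)
let add_regs xs ys =
  let names = List.sort_uniq compare (List.map fst (xs @ ys)) in
  let coeff n l = try List.assoc n l with Not_found -> 0 in
  List.filter (fun (_, c) -> c <> 0)
    (List.map (fun n -> (n, coeff n xs + coeff n ys)) names)

let scale k l =
  { const = k * l.const;
    regs = List.filter (fun (_, c) -> c <> 0)
        (List.map (fun (n, c) -> (n, k * c)) l.regs) }

let rec linearize = function
  | Num n -> { const = n; regs = [] }
  | Reg r -> { const = 0; regs = [ (r, 1) ] }
  | Add (a, b) ->
    let a = linearize a and b = linearize b in
    { const = a.const + b.const; regs = add_regs a.regs b.regs }
  | Sub (a, b) -> linearize (Add (a, Mul (Num (-1), b)))
  | Mul (a, b) ->
    (* Multiplication stays linear only if one side is a pure constant. *)
    (match linearize a, linearize b with
     | { const = k; regs = [] }, l | l, { const = k; regs = [] } -> scale k l
     | _ -> failwith "non-linear expression")
```

An expression is then acceptable to DCPU-16 when its linear form has at most one register, with coefficient 1.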
As mentioned before, labels in the expression evaluate to their positions. You can do something like this:
[
(* Returns sqrt(A) from the precomputed table. A should be less than 16.
* B will contain an integral part and A will contain a fractional part.
*)
%isqrt:
IFG A, (%_fpend-%_fpstart) DIV 2 - 1; (* bound check *)
HLT;
MUL A, 2;
SET B, [%_fpstart+A];
SET A, [%_fpstart+A+1];
JMP POP;
%_fpstart:
DAT 0x0000, 0x0000; DAT 0x0001, 0x0000;
DAT 0x0001, 0x6a0a; DAT 0x0001, 0xbb68;
DAT 0x0002, 0x0000; DAT 0x0002, 0x3c6f;
DAT 0x0002, 0x7312; DAT 0x0002, 0xa550;
DAT 0x0002, 0xd414; DAT 0x0003, 0x0000;
DAT 0x0003, 0x298b; DAT 0x0003, 0x510e;
DAT 0x0003, 0x76cf; DAT 0x0003, 0x9b05;
DAT 0x0003, 0xbddd; DAT 0x0003, 0xdf7c;
%_fpend:
]
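The table entries above appear to be 16.16 fixed-point square roots rounded to the nearest unit. A plain Ocaml sketch that reproduces them, under that assumption (sqrt_entry is a made-up name, and Float.round requires Ocaml 4.07 or later):

```ocaml
(* Integral and fractional words of sqrt(n) in 16.16 fixed point,
   rounded to the nearest representable value. *)
let sqrt_entry n =
  let fixed = int_of_float (Float.round (sqrt (float_of_int n) *. 65536.0)) in
  (fixed lsr 16, fixed land 0xffff)
```

Such a function could be used to generate the DAT lines from Ocaml instead of typing them by hand.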
IMM (e) (note the parentheses) and PTR [e] are longer forms of e and [e],
respectively, and you should use them outside assembly instructions.
Normal Ocaml expressions can also be used; VAL e will evaluate e as an
Ocaml expression and use its value as an immediate. Similarly, STR e uses
its value as a string (only useful in DAT arguments). If the Ocaml
expression is simple enough (e.g. a single identifier) then you can omit
VAL entirely. This is very useful for compile-time constants:
let screen_base = 0x8000 in
[
SET [screen_base], 'H';
SET [screen_base+1], 'e';
SET [screen_base+2], 'l';
SET [screen_base+3], 'l';
SET [screen_base+4], 'o';
HLT;
]
DcpuAsm tries to generate the shortest code for the given assembly, but you can
override this behavior with the SHORT and LONG prefixes. SHORT e will cause an
error when e does not fit in the range -1 to 30 or when it is used as a first
operand (a short literal cannot be used there), and LONG e will generate the
longer form of the given immediate (not of the instruction, so you should use it
twice for basic opcodes). This only applies to literal values; it is silently
ignored for other kinds of values.
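The range -1 to 30 comes from the DCPU-16 1.7 specification, which reserves operand values 0x20 to 0x3f for inline literals. A minimal sketch of that mapping in plain Ocaml (short_literal is a hypothetical name; how DcpuAsm implements the check is not shown here):

```ocaml
(* Operand value for a short literal, or None if a next word is needed.
   -1 (0xffff) maps to 0x20 and 0..30 map to 0x21..0x3f. *)
let short_literal v =
  let v = v land 0xffff in
  if v = 0xffff then Some 0x20
  else if v <= 30 then Some (0x21 + v)
  else None
```

A value for which this returns None is what SHORT would reject and what LONG would force even for small values.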
A special value NEXT can also be used as part of an expression. It won't
generate the "next words" required for long literals and register-relative
addressing, so whatever the next instruction is, it (or its first word) will be
the next word. It can be used for simple self-modifying programs in combination
with SHORT (to ensure that the next instruction is always one word long), for
example. Note that NEXT is not canonicalized, so [A+NEXT*2-NEXT] (for
example) is invalid. Only the forms NEXT, [reg+NEXT] and [NEXT+reg] are valid.
DcpuAsm supports a block as a unit of assembly instructions. They can be used as a building block:
(* Warning: not tail-recursive. Illustration purpose only. *)
let rec copyn src dst n =
if n = 0 then
PASS
else if n = 1 then
SET [dst], [src]
else
BLOCK [
SET [dst], [src];
copyn (src+1) (dst+1) (n-1);
]
in
[
(* save and restore the video memory *)
copyn 0x8000 0x4000 (16*32);
copyn 0x4000 0x8000 (16*32);
HLT;
]
BLOCK e evaluates e as an Ocaml expression (which includes,
incidentally, a list containing assembly instructions) and makes an
instruction block out of it. You don't have to use blocks if you have exactly
one instruction, as the case n = 1 of the above code suggests.
A more interesting use of blocks involves local labels:
let case k = BLOCK [ (* ... *) ] in
[
BLOCK LOCAL [
IFE A, 1;
JMP %_next;
JMP %_skip;
%_next:
case 1;
%_skip:
];
BLOCK LOCAL [
IFE A, 2;
JMP %_next;
JMP %_skip;
%_next:
case 2;
%_skip:
];
HLT;
]
BLOCK LOCAL will make all labels defined in the block that start with _ local.
It is not possible to access these local labels outside of the block (unless
you use a nasty hack). Non-local blocks (in this case, case 1 and case 2) will
not affect this procedure. This is very useful for generated code.
You can manually give a list of local labels using the BLOCK LOCAL %a, %b, %c
syntax, or as an Ocaml list using BLOCK LOCAL *["a"; "b"; "c"].
DcpuAsmExample.ml contains some extreme examples of local blocks.
There is no separate macro feature in DcpuAsm, but you can trivially make a
simple macro with local blocks and the Ocaml let construct.
DcpuAsm does support some additional features for macros. While VAL and
STR allow the insertion of an arbitrary immediate value or string, you cannot
insert registers or other expressions this way. Therefore DcpuAsm supports a
quoted form #e of an Ocaml expression, which can appear in:
- Expressions (e.g. 3 + #reg). The expression should evaluate to the internal representation of an expression; an immediate should be quoted using the IMM prefix.
- Labels (e.g. %#labelname). The expression should evaluate to a string. While you can use any character in the label name (even an empty string is permitted), you should restrict yourself to normal identifiers. Local labels, for example, start with a . character internally. The DcpuAsm.gensym function can be used to generate unique symbols.
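For readers unfamiliar with the gensym idea, here is a generic counter-based sketch in plain Ocaml. It is not DcpuAsm.gensym, whose signature and naming scheme are not described in this README; make_gensym and the prefix argument are assumptions made purely for illustration.

```ocaml
(* Hypothetical generator of unique names: each call to the returned
   function yields the prefix followed by a fresh counter value. *)
let make_gensym prefix =
  let counter = ref 0 in
  fun () ->
    incr counter;
    Printf.sprintf "%s%d" prefix !counter
```

A string produced this way could then be spliced into a label position with the %#... form described above.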
As always you should expect the following caveats:
- You cannot use code like [(%label1, ...); (%label2, ...); ...] in normal Ocaml code because (% will be treated as one token. (( %label1 etc. will work.) The assembly syntax is specially crafted to separate those two, however. Any suggestions about this problem are welcome.
- DAT is missing *-prefixed items.