Hubris is programming language, consisting of a very thin layer atop 65C02 assembly language. It is designed to target the (fictional) Eiling Technologies ARS, but can probably be coerced into targeting any sane 65C02 system. It uses the excellent WLA-DX assembler and linker as the backend.
This documentation is more for my own reference than any other purpose. I don't really expect other people to end up in the situation of needing a language like Hubris. Technically, all the answers needed are in this document, but they're not easy to find if you don't already know what to look for. If you have a question whose answer is not obvious here, the best way to get an answer is to contact me on IRC. I go by SolraBizna on irc.tejat.net.
Hubris has three major goals:
- Efficiently allocating local variables, parameters, etc. in a full-system program
- Improving the safety and efficiency of subroutine calls
- Reducing programmer workload for bank-switching
hcc.lua is NOT the Hubris compiler. It is the Hubris compiler compiler. It is used to assemble
hubris.lua from the components in
Install Lua 5.3, LuaFileSystem, and LPEG. The latter two can be installed with LuaRocks:
luarocks install luafilesystem luarocks install lpeg
On UNIX, you may want to put a symlink to
hubris.lua somewhere in your PATH.
Hubris currently requires a recent build of WLA-DX, more recent than the official 9.5 release. The source can be found here.
When you run the Hubris compiler, you provide it one or more source directories. The Hubris compiler recursively searches these directories and processes all
.hu files it finds.
.hu files are Hubris-language source files, and are fully processed.
.65c files are raw 65C02 source files, and are passed on to the assembler with no further processing. (
.65c files contain startup code, certain kinds of glue, and pure data.)
There is no file dependency or order dependency in
#routines must each be fully contained inside one file, but other than that, the final compiled code is the same as if all
.hu files are concatentated together before compiling (except in the case of
Hubris puts all its intermediate files, and
.o files, into the output directory provided. By convention, this directory is named
obj. In addition to the generated
.o files, Hubris will place two important files in this directory:
common: Should be included by every
.65cfile. Contains Hubris's beliefs about the memory map of the machine,
.DEFINEdirectives for the assignments of the
#slots, and everything inside
link: Pass this to
wlalinkto link the program as known by Hubris. If you want to add object files, an image file header, etc. then you should edit this file after running the Hubris compiler.
An example compile process:
$ hubris.lua obj src $ wlalink -S -v obj/link MyGame.etars/main.rom
(You should really put this into a shell script or batch file rather than doing this manually each time. Using a Makefile would be even better.)
.hu files mainly contain Routines. A Routine is a block of code, and has its own scope for local variables and parameters. Recursion, directly or indirectly, is normally not allowed; that is, a Routine may not normally cause itself to be called again before the first call returns, directly or indirectly. This lack of recursion is what makes Hubris's variable allocation system work.
Entry Points are a special kind of routine. Entry Points are not called from other Routines. Instead, they are jumped into via interrupt vectors, or called from special assembly glue code. An Entry Point, along with every Routine called directly or indirectly by that Entry Point, forms a Subprogram. Routines from one Subprogram cannot call routines from another; otherwise, interrupt safety is at risk.
Subprograms separate routines that must be able to interrupt one another safely. A typical ARS game will have three or four Entry Points, and therefore three or four Subprograms:
IRQwill contain some very slim graphics code that does scanline effects.
BRKwill contain a short routine to gracefully halt execution if a
BRKis encountered. (The standard ET init code handles BRKs, so if you're using that, you won't need a
NMIwill contain code to do per-frame PPU updates, and audio code.
Mainwill contain the vast bulk of the program, including all game logic.
Hubris does not assign any special meaning to any particular entry point name. You are free to define as many or as few as you like, with whatever names you like. You must define at least one, however; Routines that are not called from any Entry Point (dead Routines) are not included in the final program.
Hubris source files use
; as a comment character, just like regular 65C02 assembly files. Each non-blank line is either:
- A Hubris directive, beginning with
#(optionally preceded by whitespace)
- A line of 65C02 assembly, or WLA-DX directive (only allowed within a
By convention, most Hubris directives are unindented, Hubris directives that "are code" (e.g.
#return) are indented to the same level as the surrounding 65C02 code (with a single tab).
#include instead of
.INCLUDE. Each source directory is searched in the order given on the command line. Please, please do not use a
.hu extension for files meant to be included by other files; if you do, those files will be processed as independent source files. And please, regardless of what operating system you run Hubris on, please use
/ as a path component separator.
Hubris's variable and routine systems are relatively smart, so
#include is mainly only needed for incorporating automatically-generated data files.
Every Hubris program must contain, somewhere in its source, a
#bs directive, a
#bankcount directive, and at least one
#region directive. These should generally be in their own source file, possibly named something like
Informs Hubris of the hardwired BSx pin values for this cartridge. In the name of sanity, Hubris uses this information to determine both the slot count and the slot size.
<BS>=0 means 1x32KiB slot, 1 means 2x16KiB slots, 2 means 4x8KiB slots, and 3 means 8x4KiB slots.
Informs Hubris of the number of ROM banks available for use. This must be a power of two.
#region <first> <last> <name>
Informs Hubris of a region of memory that can contain variables. An ARS program will almost always have:
#region $0000 $00FF fast #region $0250 $7FFF slow
Variables with hardcoded addresses may exist outside of these regions. In fact, if you hardcode every address, you don't need
#region at all. (But then you've lost out on half the point of using Hubris...)
Banks and Slots
If you are making a one-bank ROM, you can ignore this section.
WLA-DX has support for multiple banks and slots that is normally quite adequate. However, Hubris generates each
#routine in a separate WLA
.SECTION, and WLA-DX does not have support for ensuring that multiple
.SECTIONs will end up in the same bank. Since WLA-DX decides banks at link time, it also can not provide a way to optimize out longcalls and such. For these reasons, Hubris requires manual bank assignment. It has some features intended to make this a little easier.
#group <n> <slot> <name>
Defines a group with the given name. A group is a logical collection of related routines and data that should always be mapped at the same time. Multiple groups may coexist in the same ROM bank, even if they occupy different slots.
n is the 0-based index of the ROM bank this group will reside in.
slot is the 0-based index or
#slot name of the slot this group will occupy. Example:
#group 0 VectorSlot InterruptCode #group 0 0 StartupCode #group 0 0 GameCode
Creates a group named
InterruptCode which will occupy the
VectorSlot slot, and groups named
GameCode which will occupy the lowest slot. The groups, despite potentially occupying different slots, will coexist in the same ROM bank of the cartridge. (If this cartridge has
#bs 1 and
#slot 1 VectorSlot, then
InterruptCode will link to
$C000 and on, while
StartupCode will link to
$8000 and on.)
GameCode grew large enough to need its own bank, WLA would start refusing to link this program. At that point, the directives could be changed to:
#group 0 VectorSlot InterruptCode #group 0 0 StartupCode #group 1 0 GameCode
Bank switching logic would also need to be added to the program at this point.
#slot <n> <name>
Defines a slot alias. It is not an error to have multiple different aliases for the same slot, but it is an error to have multiple aliases with the same name.
#slot 0 LiveSlot #slot 1 FixedSlot #slot 1 InterruptSlot
LiveSlot might be the slot where game logic and data are found, and get switched out willy nilly.
FixedSlot might be where some useful utilities (decompression and math routines, for instance) are always mapped.
InterruptSlot might be the slot where the interrupt handlers are. Note that
InterruptSlot are both aliases to slot 1 in this example. If the game later changed to a four-slot memory map,
InterruptSlot could end up being mapped to different slots without requiring extra effort.
Hubris defines the following special symbols in the memory map file:
The ROM bank number assigned to the given group.
The slot number used by the given group.
The slot number for the given slot alias.
These are useful if you want to include mass amounts of data in the program, such as with the following
.INCLUDE "obj/common" ; The map data itself can live in whatever bank .SLOT hubris_Slot_DataSlot_slot .ORG 0 .SECTION "map1" SUPERFREE map1: .INCLUDE "src/gen/map1.inc" .ENDS .ORG 0 .SECTION "map2" SUPERFREE map2: .INCLUDE "src/gen/map2.inc" .ENDS ; The pointers to the map data live in the same bank as MainGroup .BANK hubris_Group_MainGroup_bank SLOT hubris_Slot_MainSlot_slot .ORG 0 .SECTION "map_pointers" FREE map_banks: .DB :map1, :map2 map_lows: .DB <map1, <map2 map_highs: .DB >map1, >map2 .ENDS
#routine <name> <...> ; assembly, such as look-up tables and "backward jumps", may go here #begin ; execution of the routine begins at this point #return ; more optional assembly, including possible "forward jump points" #endroutine
This is the structure of a Routine. Between
#endroutine is the only place assembly lines are allowed. When someone does
#call <name> (or glue assembly does
JMP <name>), execution begins at the location specified by
#return is the equivalent of
#return INTERRUPT will do
RTI instead of
Routine names may not begin with
_, but may otherwise be any valid identifier. A Routine name may also consist of arbitrarily many names separated by double colons (
::), in which case this is a Subroutine. For example,
X::Y is a subroutine of
X. It has access to
#sublocal variables of
X, and should only be called from
X and other subroutines of
X. Hubris exports a symbol whose name is the exact routine name, including any colons.
Every line within a
#routine that is not a Hubris directive is passed to the assembler. They can contain instructions, labels, even WLA-DX directives. (Some WLA-DX directives, such as
.ENDS, are inadvisable within a
#routine.) Any labels defined within a
#routine should begin with an
@ or be a +/- label, but if it doesn't, Hubris assumes you know what you're doing.
There may be more than one
#return, in case the routine has multiple termination points. There may even be no
#return, in case the routine is a deliberate infinite loop or otherwise never terminates, in which event
#endroutine must be replaced with
#endroutine NORETURN to indicate that this is intended.
Extra parameters to
Marks this routine as an Entry Point. (Generally there will be very few of these in a program.)
<regs>is a list of registers, containing any of
CLOBBERindicates that this routine intentionally clobbers these registers.
PRESERVEasks Hubris to push these registers on entry and pop them on
#call ... JUMP, marking them as callee-save registers. If a register is not listed as either
PRESERVE, it is assumed that this routine does not alter the register, either because it doesn't use it or because it has its own internal preservation logic. The caller can mark their
PRESERVEto indicate that they expect certain registers to be preserved, and Hubris will add pushes around the call only if required.
Forces this routine to start at the given CPU address. (This applies to the first line of the routine, not necessarily to its
#begin!) WLA will complain if the address is outside the relevant slot.
This specifies which group the routine belongs to. Related routines should be put into the same group. It is an error to leave off the
GROUPparameter for an
ENTRYroutine if multiple banks are available. (Non-
ENTRYroutines default to inheriting the
GROUPof their nearest parent routine.)
#call <name> <...>
Calls a routine with the given name. This should be used instead of
JMP, since otherwise Hubris won't know about the call, and won't be able to check / ensure register preservation.
Extra parameters to
<regs>is a list of registers, containing any of
CLOBBERindicates that you don't care if the listed registers are clobbered, which is also the default state.
PRESERVEindicates that you want the listed registers to be preserved. (Hubris will automatically add pushes and pops to ensure this, if and only if it is necessary.)
JSRto call this routine. Registers
PRESERVEmarked preserve in this routine (not in this
#call) will be popped, as if this were a return. Saves a few cycles for explicit tail returns, and is the only condoned way of doing recursion. (
PRESERVEare meaningless and ignored with a
JUMPcall; registers are restored according to the calling routine's
PRESERVEtags instead. This can cause problems if the routine being
JUMPed to has more destructive
CLOBBERSthan this routine. A future version of Hubris will raise an error if this happens.)
#callis intended for a different group. This tag must be present on calls across group boundaries, and must not be present on calls within the same group. Intergroup calls may incur a Longcall, see below.
Longcalls will clobber the accumulator; with the
-ioption, the compiler will clobber the accumulator on all
INTERcalls, even non-long ones; this helps head off accumulator-related accidents when groups are shuffled around.
Don't use this! This ignores the call for purposes of recursion checking, cross-subprogram call prevention, and variable allocation! This allows you to bypass all of Hubris's safety checks! This is a very naughty tag that you should only use if you know what you're doing! Before you do something like this, consider not doing the unsafe thing in the first place!
If you have a routine named
foo and a parameter named
p_Bar, you can access it within any routine that directly calls
foo by writing
foo::p_Bar. You should write any parameters immediately before the call, and read any return values immediately afterward.
When calling between groups that occupy the same slot, but different ROM banks, a Longcall is required. Hubris generates the following code:
LDA #<TARGET STA ENTRY::glue_longcallSLOT::target LDA #>TARGET STA ENTRY::glue_longcallSLOT::target+1 LDA #BANK STA ENTRY::glue_longcallSLOT::target_bank JSR ENTRY::glue_longcallSLOT
where TARGET, BANK, and SLOT are the target routine and the ROM bank and slot occupied by the target routine, and ENTRY is the entry point in effect.
You must implement your own
glue_longcallSLOT routines according to your own use of slots. For example, if your code follows the convention that slot 0 contains all main routines, and
FixedUtilGroup is in another slot is always mapped, you might provide:
#routine Main::glue_longcall0 GROUP FixedUtilGroup CLOBBER A, X, Y #param fast PTR target #param fast BYTE target_bank #begin ; Save the current slot 0 mapping LDA r_BankSelect PHA ; Push the return address, as if by JSR LDA #>(_tail-1) PHA LDA #<(_tail-1) PHA ; Activate the target bank in slot 0 LDA target_bank STA r_BankSelect ; Indirect JMP! (There is no indirect JSR, so we explicitly pushed ; the return address and did this instead.) JMP (target) ; restore slot 0 mapping, clobbering Y rather than A ; (According to our program's convention, A may contain a return ; value, while Y may not. Therefore, we clobber Y here.) _tail: PLY STY r_BankSelect #return #endroutine
As you can see, long calls will have a significant amount of overhead compared to regular ones. This is why you should keep related code in the same bank, if not the same
GROUP, whenever possible.
(The call to the
longcall glue bypasses recursion checks and certain other housekeeping, so you must be careful what you write inside. For one thing, you should only provide
#params. If you are having trouble understanding these principles, you should consider just using the above code verbatim.)
#indirectcaller <routine> #indirectcallers <routine>, <routine>, ...
When placed within a Routine, indicates that that routine will be called through a function pointer, by the given routines. Hubris behaves as though the listed routines contain a
#call to this routine, but doesn't actually generate any calling code.
This is an advanced directive. If you don't know why you need it, you don't need it. (But if you do need it, there's no other way to accomplish it without generating spurious code.)
A variable declaration looks like:
#<type> <location> <size> <name> <...>
There are four types of variable in Hubris:
#global: Not scoped, and potentially shared between subprograms. Also used, with hardcoded addresses, to denote hardware registers.
#local: Accessible only from the active scope.
#sublocal: Accessible from the active scope and from the scopes of Subroutines of the active
#routine. (All Routines are, at least, Subroutines of their Entry Point.)
#param: Accessible from the active scope, and the scopes of callers of this
Location may be:
- The exact address the variable must be located at (interpreted as hex if a leading
$is present, and as decimal otherwise)
- The name of a
#regionin which the variable should be allocated
ANY, indicating that the variable may be placed in the first
#regionthat has room for it (after all pickier variables have been placed)
Size may be:
<n>contiguous bytes are required for the variable.
<n>one-byte slots, each separated by
<m>bytes of unused space, are required for this variable. (Useful for funky indexing tricks.)
<l>-byte slots are required for this variable.
<l>-byte slots, each separated by
<m>bytes of unused space, are required for this variable. (Useful for even more funky indexing tricks.)
PTR: Syntactic sugar for
(The size value is used only when placing the variables; Hubris has no choice but to take it on faith that your code accesses the variables in a manner consistent with their sizes.)
PERSIST: Like declaring a
staticlocal variable in C. The variable's lifetime is the entire program, rather than a single call to its containing scope. (This is redundant for
PERSISTvariable may not share a memory location with any other variable, so use these sparingly.
#<type>flag <name> <...>
Declares an atomic boolean Flag. Flags are implemented using the Rockwell
BBRx instructions. They will always live in the "lowest"
#region; therefore, if Flags are used, the "lowest"
#region must be contained entirely in the zero page.
Types, scoping rules, and extra parameters are the same as for variable declarations. Be aware that a memory byte that is used for Flags will not be used for any other purpose; one function with 2048 local Flags will prevent any non-Flag use of the zero page!
#branchflagset <name> <label> #branchflagclear <name> <label> #branchflagreset <name> <label>
Corresponds to the
BBRx instructions. Atomically tests the Flag and branches to
<label> if it is currently set (
#branchflagset) or clear (
#setflag <name> #clearflag <name> #resetflag <name>
Corresponds to the
RMBx instructions. Atomically sets (
#setflag) or clears (
#resetflag) the Flag.
#alias <aliasname> <contents> #globalalias <aliasname> <contents>
Anywhere in assembly source code
<aliasname> is found,
<contents> is processed instead.
<contents> will be processed as if it had been written instead of
<aliasname> at that point in the source, including possibly being handled as an alias. (Hubris makes no attempt to detect alias loops; processing such a loop will hang the compiler.)
Aliases may be used to provide synonyms for variables, enabling some rudimentary polymorphism. They may also be used to create constants.
Unlike every other part of Hubris,
#alias directives have source file scope. An
#alias is effective only on lines after it is defined, and then only until a corresponding
#globalalias, on the other hand, is global and cross-file in scope, including lines before the
Aliases are only processed in assembly source, such as in
#commons. Aliases do not take effect in Hubris directives.
#globalalias processing is performed after all
Forgets a previously-defined alias. There is no equivalent directive for global aliases.
#export <name> [<exportedname>]
NOT IMPLEMENTED YET
<name> is a variable active in the current scope. A symbol for this variable will be exported to assembly code. If
<exportedname> is provided that will be the symbol name, otherwise the variable's name is used.
#common ; WLA directives may go here #endcommon
This directive is used to add directives to the
obj/common file generated by Hubris. Directives here will be effective in every routine and assembly file. (Don't put
.SECTION directives and the like in here unless you're really sure you know what you're doing.)
For example, if you want to use the W65C02
STP instructions (which WLA doesn't know about), you might have the following
#common tucked away in one of your source files:
#common .MACRO WAI .DB $CB .ENDM .MACRO STP .DB $DB .ENDM #endcommon
And then every
#routine or assembly file could use
STP like normal instructions.