Version 1 — 27 April 2023
Version history for this document is kept in a companion document.
This document is released to the public domain under the Creative Commons Zero dedication.
No copyright is claimed on this document; attribution is appreciated.
The latest version of this document lives in the gb-debugfiles repository.
The present document describes a textual file format for information regarding debugging actions to be taken when executing a Game Boy or Game Boy Color program. The main intended consumer of this information is the emulator executing the program; while hardware debuggers aren't explicitly ruled out, they are not the intended audience for this file format.
This specification does not intend to provide an origin for those debugging actions; they may be manually written or generated by some code generation tool (e.g., an assembler or a compiler). For simplicity, the originator of these debugging actions is addressed as the "user".
In order to debug Game Boy programs, developers may need to specify a list of debugging actions for the emulator executing the program to take on their behalf. These actions are simple commands, such as halting execution or printing information to a debug console, that are to be taken when a certain event occurs, such as the program reading from or executing a certain memory address.
It therefore becomes useful to have a common way to specify and persist these actions, so that they can be supplied to the emulator on start-up (instead of having to manually enter them every time the program is debugged); this also allows for tools (such as assemblers) to generate these lists of actions, and it enables an easy way of sharing them.
A textual format permits the easy generation and editing of these lists of actions, as it allows users to perform simple modifications to them in a text editor. Therefore, this document describes such a textual format, a debugfile, to be used alongside existing metadata files (such as symfiles) as a debugging tool.
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY" and "OPTIONAL" in this document are to be interpreted as described in RFC 2119 when, and only when, they appear in all capitals, as shown here.
This section describes the basic rules and constraints that make up the debugfile format. The actual contents are defined in a later section.
Debugfiles MUST be plain text files, encoded in UTF-8. (Note that US ASCII, being a subset of UTF-8, is a valid encoding.) Byte order marks (that is, initial U+FEFF characters to indicate the file's encoding) MUST NOT be used. There MUST be no C0 control characters (that is, characters with codepoints below 32) other than line feeds (codepoint 10), horizontal tabulations ("tabs", codepoint 9) and carriage returns (codepoint 13).
The file is broken down into lines, which are delimited by line feed characters. The last line of the file MAY end with a line feed character; this MUST NOT cause any differences in interpretation of the file.
Carriage return characters MAY appear before line feed characters; if they do, they MUST be considered to be part of the line delimiter. Carriage return characters MUST NOT be used in any other context: a carriage return character not immediately followed by a line feed character is an error.
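The encoding and line-delimiter rules above can be sketched as a small reader. This is a minimal illustration, not a reference implementation; the function name `split_debugfile_lines` and the choice of `ValueError` for reporting errors are assumptions made for this example.

```python
def split_debugfile_lines(data: bytes) -> list[str]:
    """Decode raw debugfile bytes and split them into lines (delimiters removed)."""
    text = data.decode("utf-8")  # invalid UTF-8 raises UnicodeDecodeError (a ValueError)
    if text.startswith("\ufeff"):
        raise ValueError("byte order marks are not allowed")
    for ch in text:
        if ord(ch) < 32 and ch not in "\n\t\r":
            raise ValueError(f"forbidden control character U+{ord(ch):04X}")

    # A line feed at the very end only terminates the last line.
    ended_with_lf = text.endswith("\n")
    if ended_with_lf:
        text = text[:-1]
    raw_lines = text.split("\n") if (text or ended_with_lf) else []

    lines = []
    for index, line in enumerate(raw_lines):
        followed_by_lf = ended_with_lf or index < len(raw_lines) - 1
        if followed_by_lf and line.endswith("\r"):
            line = line[:-1]  # the carriage return belongs to the delimiter
        if "\r" in line:
            raise ValueError("carriage return not immediately followed by a line feed")
        lines.append(line)
    return lines
```

A file with or without a final line feed yields the same list of lines, which is one way to satisfy the requirement that the trailing line feed makes no difference.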
Normalization describes a process that eliminates superfluous changes to a debugfile. While emulators interpreting such files are not required to actually perform normalization, they MUST interpret the file "as if" normalization had happened — for instance, by ignoring trailing whitespace as if it had been eliminated by normalization.
If normalization is actually performed, the file provided by the user MUST NOT be overwritten with a normalized version unless the user explicitly requests this.
Normalization is applied to each line of the file individually, and it involves the following steps:
- Replace all tab characters with an implementation-defined non-zero number of space characters per tab. This number MUST be the same for all tab characters in the file. Consecutive tab characters MUST be replaced individually: for instance, if an implementation replaces each tab with 5 spaces, 3 tab characters in a row must be replaced with 15 spaces. (Note that the exact number of spaces will only be relevant inside quoted strings; in particular, this means that they will never contain tab characters. Outside of quoted strings, the number of spaces is immaterial.)
- Delete all leading and trailing spaces from the line.
- If the line is now empty, or if its first character is a semicolon, discard it.
(A line whose first non-whitespace character is a semicolon is considered a comment.)
Discarded lines are not considered at all by any further processing of the file: for instance, any reference to the "next line" ignores discarded lines, and the requirement for a @debugfile directive in the first line of the file (explained in section 3.4) applies to the first line that hasn't been discarded.
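The normalization steps can be sketched as follows. The tab width of 4 and the name `normalize_lines` are assumptions for this example; the specification only requires a non-zero width that is the same for every tab in the file.

```python
TAB_WIDTH = 4  # implementation-defined; any non-zero value is acceptable


def normalize_lines(lines):
    """Apply the three normalization steps to each line, dropping discarded lines."""
    normalized = []
    for line in lines:
        # Step 1: every tab becomes the same number of spaces, one tab at a time.
        line = line.replace("\t", " " * TAB_WIDTH)
        # Step 2: remove leading and trailing spaces.
        line = line.strip(" ")
        # Step 3: discard empty lines and comment lines.
        if not line or line.startswith(";"):
            continue
        normalized.append(line)
    return normalized
```

Because discarded lines never reach the output list, later stages can treat `normalized[0]` as the "first line" and `normalized[i + 1]` as the "next line" without any further filtering.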
There are several kinds of errors that can be detected while handling a debugfile.
The most obvious kind of error is a syntax error, which consists of some content in the file that is not conformant to this specification. If such an error is detected, the implementation MUST report an error to the user and MUST NOT use the file other than for further error detection; the error SHOULD tell the user the location of the invalid content and the reason why it was considered invalid.
If the file references other files (for instance, through a @symfile directive), and those files are not available, the implementation MUST treat this as an error; it MUST NOT use the contents of the main file in that case other than for further error detection.
Syntax errors in referenced files MUST be treated like syntax errors in the main file.
Errors other than syntax errors (such as an expression containing a name that is neither an existing symbol nor a valid variable name) can often be detected when the file is initially read in. Implementations SHOULD attempt to detect these errors when possible, giving them the same treatment described above for syntax errors.
It is also possible that some emulator will not implement the full range of functionality described in this specification, only achieving partial compliance. If this is detected when the file is initially read in, the emulator SHOULD inform the user that some functionality specified in the file is not supported; the emulator MAY ignore those actions or directives it doesn't implement and only follow the ones it does. The emulator MAY also treat this as a hard error, in which case it MUST report the error to the user on initial read-in and disregard the data in the file. Any errors reported due to missing functionality SHOULD indicate which functionality was requested by the file that the emulator does not implement.
If a portion of a debugfile is excluded by a conditional inclusion directive, implementations MUST NOT report any errors for unsupported functionality within the excluded portion.
It is possible that an implementation will detect that some part of a debugfile, while not erroneous under this specification, is suspicious and likely to be the result of an error while authoring the file. For this reason, implementations MAY issue warnings for any such scenarios they detect; however, these warnings MUST NOT cause the implementation to stop processing the debugfile unless the user explicitly requests this.
Lines that begin with a single @ character are called "directives".
They are intended to be processed immediately (i.e., during the initial read-in of the file), and they perform several utility functions, such as defining additional symbols or setting the default base for numeric constants.
Each directive has a name, which MUST immediately follow the @ character (without any whitespace); unrecognized directive names are a syntax error.
All directive names are valid identifiers (as specified in section 4.3) and they are case-insensitive (for example, @debugfile and @DEBUGFILE represent the same directive).
Lines that begin with multiple @ characters (at least two) are reserved for private use, for emulators that need to define their own extensions to the format.
These lines should always be conditionally included in debugfiles in order to avoid errors.
Implementations MUST NOT issue syntax errors for these lines if they are excluded by a conditional inclusion
directive, regardless of content.
The semantics of private use lines are implementation-defined.
While implementations are free to define any meaning they like for these lines, it is RECOMMENDED that they reject
private use lines they don't recognize, unless they are conditionally excluded.
Any other lines are called "actions". An action line specifies a sequence of actions that the emulator must take when a certain event occurs and some conditions are satisfied. Actions are usually specified entirely in one line, but they may be broken down into several: if an action line ends in a colon or semicolon, it continues on the next line. If an action line is continued into the next line, the next line MUST be an action line as well; if it isn't (i.e., if the next line is a directive or a private use line, or if the continued line is the last non-empty, non-comment line of the file), a syntax error occurs.
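As an illustration of the three kinds of lines and of the continuation rule, the sketch below classifies normalized lines and joins continued action lines. The function names are hypothetical, and the action text used in the usage note is placeholder text rather than real action syntax, which is defined elsewhere in the specification.

```python
def classify(line):
    """Return the kind of a normalized (non-empty, non-comment) line."""
    if line.startswith("@@"):
        return "private"
    if line.startswith("@"):
        return "directive"
    return "action"


def group_lines(lines):
    """Group normalized lines into (kind, [lines]) items, joining continued actions."""
    items = []
    pending = None  # lines of an action that is still being continued
    for line in lines:
        kind = classify(line)
        if pending is not None:
            if kind != "action":
                raise ValueError("a continued action line must be followed by an action line")
            pending.append(line)
        elif kind == "action":
            pending = [line]
        else:
            items.append((kind, [line]))
            continue
        # An action line ending in a colon or semicolon continues on the next line.
        if not line.endswith((":", ";")):
            items.append(("action", pending))
            pending = None
    if pending is not None:
        raise ValueError("the file ends in the middle of a continued action")
    return items
```

For example, `group_lines(["@debugfile 1", "condition:", "first;", "second"])` yields one directive item followed by a single action item that spans three lines.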
The very first line of the file MUST be a @debugfile directive; this serves both as a versioning system for the format and as a way of telling debugfiles apart.
If this directive is missing, a syntax error occurs.
Some directives are used to control conditional inclusion. These directives specify a condition; if the condition is not met, until the next conditional inclusion directive (or until the end of the file), lines that don't contain a directive (including private-use lines) MUST be ignored (i.e., not processed for any reason, including detecting syntax errors), and lines that contain a directive other than a conditional inclusion directive MUST be ignored as well. However, invalid directive names SHOULD result in a syntax error, even within excluded portions of the file.
All conditions are evaluated on initial file read-in, and they remain active until a new conditional inclusion directive appears or until the end of the file is reached.
Since there is no "end of condition" directive, conditional inclusion directives cannot be nested. This is intentional: conditional inclusion directives themselves are never conditional.
Emulators MUST define a (hopefully unique) name and version for themselves.
The @ifemu and @ifnotemu conditional inclusion directives can refer to these values in order to tailor debugfiles to specific emulators.
Emulator names and versions MUST be between 1 and 50 characters long, and they MUST NOT contain any characters other than ASCII letters, digits, and the characters #$%&*+-.?@_; the name MUST begin with a letter and the version MUST begin with a digit.
Emulators MUST also be able to compare their own version numbers in order to determine which one is greater.
Subject to these restrictions, emulator names and versions are arbitrary and chosen by their authors; emulator names
SHOULD be chosen making a reasonable effort to keep them unique.
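The syntactic restrictions on emulator names and versions fit in two regular expressions. This is a sketch; the helper names are made up for the example.

```python
import re

# One leading character plus up to 49 more gives the 1 to 50 character limit.
_TAIL = r"[A-Za-z0-9#$%&*+\-.?@_]{0,49}"
_NAME_RE = re.compile(r"[A-Za-z]" + _TAIL)
_VERSION_RE = re.compile(r"[0-9]" + _TAIL)


def is_valid_emulator_name(name):
    """True if the name starts with a letter and uses only allowed characters."""
    return _NAME_RE.fullmatch(name) is not None


def is_valid_emulator_version(version):
    """True if the version starts with a digit and uses only allowed characters."""
    return _VERSION_RE.fullmatch(version) is not None
```

These checks only cover the format shared by all emulators; how an emulator orders its own versions is left to that emulator.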
All implementations MUST fully support all conditional inclusion directives, as these are the only mechanism that can be used to prevent the use of unsupported features in partially-compliant emulators.
Directives perform various utility functions within a debugfile. They are processed immediately when the file is initially read in, and they can affect how the remainder of the file is processed. Directives are meant to have a support role within this specification; they are described before actions only because some directives can affect the actions that come after them.
All valid directives are listed in this section, indicating their syntax and semantics. Any directives not listed here (other than private-use directives) MUST be considered syntax errors. Names of directives are case-insensitive; they are all listed in lowercase (and shown that way throughout this document) for consistency.
Syntax: @debugfile <version>
This directive identifies the file as a debugfile, and indicates which version of the specification is being followed.
Valid version numbers are sequences of one to three numerical components separated by period characters (.); leading zero digits are forbidden (except for 0 by itself), and trailing zero components can be appended or removed, provided that the overall number of components does not exceed three (e.g., 4.2.0 and 4.2 represent the same version number).
The current version number for this specification is given at the top of the document, with all trailing zero components removed.
The first line of the file MUST contain this directive. (Note that empty and comment lines will have been stripped by normalization, and thus don't count towards the first line requirement.) If this directive appears more than once in the file, all instances of it MUST indicate compatible version numbers. Implementations MUST issue errors and MUST NOT process the file any further if this directive is not present, if they don't support the specification version the directive indicates, or if the directive appears multiple times in the file indicating incompatible version numbers. Two version numbers are considered compatible if they are equal (after removing all trailing zero components) or if their first component is equal and non-zero; compatibility is transitive.
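The version number syntax and the compatibility rule can be sketched as below. The names `parse_spec_version` and `compatible` are assumptions for this example; versions are represented as tuples of integers with trailing zero components removed.

```python
import re

_COMPONENT = r"(?:0|[1-9][0-9]*)"  # no leading zero digits, except 0 by itself
_VERSION_RE = re.compile(rf"{_COMPONENT}(?:\.{_COMPONENT}){{0,2}}")


def parse_spec_version(text):
    """Parse a @debugfile version into a tuple without trailing zero components."""
    if _VERSION_RE.fullmatch(text) is None:
        raise ValueError(f"invalid version number: {text!r}")
    parts = [int(part) for part in text.split(".")]
    while len(parts) > 1 and parts[-1] == 0:
        parts.pop()
    return tuple(parts)


def compatible(a, b):
    """Equal versions, or versions sharing a non-zero first component, are compatible."""
    return a == b or (a[0] == b[0] and a[0] != 0)
```

With this representation, `parse_spec_version("4.2.0")` and `parse_spec_version("4.2")` produce the same tuple, so equality after removing trailing zeros is a plain tuple comparison.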
Files included via the @include directive are not required to contain any @debugfile directives.
However, if they do, those directives MUST indicate version numbers compatible with the parent file.
(Included and transcluded files are the main reason why multiple @debugfile directives are allowed in the first place.)
Including files that target incompatible versions of the specification is an error.
If multiple @debugfile directives, whether in the same physical file or through included files, specify different (but compatible) version numbers of the specification, implementations MUST process the entire debugfile as the most recent version specified across all those directives.
(Amongst two different version numbers, the most recent one is the one with the higher value in the first component that differs; this comparison is transitive.)
However, if the first @debugfile directive does not specify that most recent version, implementations MAY instead process the file using the declared version until the point where a more recent version is declared, at which point they are REQUIRED to use a version no earlier than the one declared.
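Selecting the most recent of several compatible versions amounts to comparing components from left to right, treating missing components as zero. A minimal sketch, assuming versions are already available as tuples of one to three integers (the function names are hypothetical):

```python
def padded(version):
    """Extend a version tuple to three components with trailing zeros."""
    return tuple(version) + (0,) * (3 - len(version))


def most_recent(versions):
    """Return the most recent version among those declared by @debugfile directives."""
    # Tuple comparison looks at the first component that differs.
    return max(versions, key=padded)
```

For instance, `most_recent([(1,), (1, 2), (1, 1, 5)])` selects `(1, 2)`, since the second component is the first one that differs and 2 is the highest value there.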
These directives allow conditional inclusion of parts of the file. They remain in effect until the end of the file or until a new conditional inclusion directive is found; conditional inclusion directives are never conditional themselves, and thus nesting them is explicitly not possible.
As long as a conditional inclusion directive is in effect, implementations MUST ignore any lines (other than other conditional inclusion directives and, at their choice, @debugfile directives) if the condition given by the directive is false.
These directives MUST be supported by all implementations, since they are the only mechanism available to exclude features not supported by some particular implementation.
If a file is included inside a conditionally-included block, the entire file is either included or ignored (although conditional inclusion directives inside the included file MUST be processed as normal); likewise, if an included file creates conditionally-included blocks, those conditions end at the end of the included file. In other words, conditional inclusion directives in a parent file and the files it includes MUST NOT interact in any way.
Syntax: @always
This is a conditional inclusion directive whose condition is always true. In other words, this directive cancels all previous conditions for conditional inclusion, indicating that the following lines (up to the next conditional inclusion directive or the end of the file) MUST be processed.
Syntax: @if <expression>
This is a conditional inclusion directive that evaluates an expression. The expression MUST be a constant expression.
This directive's condition is true if the expression's value is not zero.
The expression MUST be evaluated according to the current base and signedness in force, as modified by any prior @radix and @signedness directives, if any.
Syntax: @ifdef <identifier>
This is a conditional inclusion directive whose condition is true if the identifier in question is defined.
The identifier may be a symbol or a variable name; variables MUST be prefixed with @ for the purposes of this directive.
Symbols are considered defined regardless of whether they come from external sources (such as implicit symfiles) or from the debugfile itself (for instance, from @sym or @symfile directives).
Likewise, variables are considered defined regardless of whether they are defined by the emulator or via a @var directive.
If a symbol or variable is defined in the debugfile itself, it MUST NOT be considered defined until the point of its definition.
Syntax: @ifemu <emulator spec> [, <emulator spec> [, <emulator spec> ...]]
This is a conditional inclusion directive that is true if the emulator matches the given emulator specification. Several specifications can be given, separated by commas; if that is the case, the condition is true if the emulator matches any of the given specifications. Emulator specifications are matched against the name and version of the emulator; valid values for names and versions are detailed in section 3.5.
An emulator specification can have any of the following formats:
- Just the emulator name. The specification matches if the specified name is equal to the emulator's name. There are no wildcards or regular expressions: the name must match exactly. Name matching is case-insensitive.
- Emulator name and version, separated by one or more spaces; for example, fooemu 3.8. The specification matches if the specified name is equal to the emulator's name and the specified version is considered equal to the emulator's version. (This is shorthand for the equality comparison given below: fooemu 3.8 is equivalent to fooemu = 3.8.)
- Emulator name, comparison operator and version; for example, fooemu > 3.6. There MUST be at least one space between the name and the operator; there MAY be spaces between the operator and the version. Valid operators are < (earlier than), > (later than), = or == (equal), <> or != (not equal), >= (later or equal) and <= (earlier or equal). The specification matches if the specified name is equal to the emulator's name and the comparison between the specified version and the emulator's version, as given by the comparison operator, is true.
- Multiple comparisons; for example, fooemu > 3.5 < 3.8 <> 3.7.2. There MUST be at least one space before every operator; there MAY be spaces after the operators. The specification matches if the specified name is equal to the emulator's name and all the version comparisons are simultaneously true.
Since versions are only checked when the names match, version comparisons can be handled by each emulator in any way they deem suitable. Each emulator MUST be able to determine which one of two valid version numbers is later than the other, in order to be able to resolve the comparison; if the specified version is not a valid version number (i.e., it doesn't match the formatting rules for that emulator), the comparison MUST be considered to fail. In particular, invalid version numbers (that is, version numbers that are syntactically correct according to this specification, but do not follow the emulator's rules for version numbers) MUST NOT raise any errors.
If version numbers contain alphabetical characters, version comparisons SHOULD be handled in a case-insensitive manner (for example, by case-folding the version number string).
Note that the characters <, >, ! and = are not allowed in either emulator names or version numbers.
Additionally, version numbers are required to begin with a digit.
Therefore, it is always possible to split the version number from the comparison operator in an unambiguous way.
If more than one emulator specification is given, there MAY be spaces around the commas used to separate them. These spaces MUST be ignored when processing the emulator specifications.
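The sketch below parses the argument of an @ifemu directive and evaluates it for one emulator. Several things here are assumptions made for illustration: the function names, the rule that this hypothetical emulator's versions are dot-separated decimal numbers, and the decision to reject an operator-less version when more than one comparison is given. The length limit on names and versions is not checked.

```python
import operator
import re

_CHARS = r"A-Za-z0-9#$%&*+\-.?@_"  # characters allowed in names and versions
_NAME = re.compile(rf"[A-Za-z][{_CHARS}]*")
# At least one space, an optional operator, optional spaces, then a version.
_COMPARISON = re.compile(rf" +(<=|>=|<>|!=|==|<|>|=)? *([0-9][{_CHARS}]*)")
_OPERATORS = {
    "<": operator.lt, ">": operator.gt, "=": operator.eq, "==": operator.eq,
    "<>": operator.ne, "!=": operator.ne, ">=": operator.ge, "<=": operator.le,
}


def parse_emulator_spec(spec):
    """Return (lowercased name, [(operator, version), ...]) for one specification."""
    name = _NAME.match(spec)
    if name is None:
        raise ValueError(f"missing emulator name in {spec!r}")
    comparisons = []
    position = name.end()
    while position < len(spec):
        match = _COMPARISON.match(spec, position)
        if match is None:
            raise ValueError(f"malformed emulator specification {spec!r}")
        comparisons.append((match.group(1), match.group(2)))
        position = match.end()
    if len(comparisons) > 1 and any(op is None for op, _ in comparisons):
        raise ValueError(f"missing comparison operator in {spec!r}")
    # A bare version is shorthand for an equality comparison.
    return name.group().lower(), [(op or "=", ver) for op, ver in comparisons]


def _dotted(version):
    """Hypothetical emulator rule: versions are dot-separated decimal numbers."""
    parts = version.split(".")
    if not all(part.isdigit() for part in parts):
        return None  # syntactically fine for the format, but not for this emulator
    numbers = [int(part) for part in parts]
    while len(numbers) > 1 and numbers[-1] == 0:
        numbers.pop()
    return tuple(numbers)


def emulator_matches(arguments, emulator_name, emulator_version):
    """Evaluate the argument list of @ifemu for the given emulator."""
    own = _dotted(emulator_version)
    specs = [parse_emulator_spec(spec.strip(" ")) for spec in arguments.split(",")]

    def holds(op, text):
        other = _dotted(text)
        # Versions this emulator cannot interpret make the comparison fail silently.
        return own is not None and other is not None and _OPERATORS[op](own, other)

    return any(
        name == emulator_name.lower() and all(holds(op, ver) for op, ver in comparisons)
        for name, comparisons in specs
    )
```

All specifications are parsed before any is evaluated, so a malformed specification is reported even when an earlier one already matches. The condition of @ifnotemu would be `not emulator_matches(...)`.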
Syntax: @ifnotdef <identifier>
This is the negated form of the @ifdef directive.
The condition is true if the corresponding @ifdef directive's condition would be false.
Syntax: @ifnotemu <emulator spec> [, <emulator spec> [, <emulator spec> ...]]
This is the negated form of the @ifemu directive.
The condition is true if the emulator matches none of the given emulator specifications.
All considerations given for the @ifemu directive are valid for @ifnotemu.
Syntax: @else [<condition>]
This is a conditional inclusion directive whose condition is only evaluated if previous directives' conditions were false.
The @else directive MUST NOT be the first conditional inclusion directive in the file.
If any prior conditional inclusion directive up to (and including) the most recent non-@else conditional inclusion directive in the file was considered true, this directive's condition is considered automatically false.
Otherwise, its condition is evaluated.
(Note: this is equivalent to the handling of else and else if statements in most programming languages.)
The condition of an @else directive can be any of the other conditional inclusion directives (without the leading @, since the @ merely marks the line as a directive), following the corresponding directive's syntax and semantics.
(For example, @else ifemu foobar > 3.14 is a valid directive.)
If the condition is omitted, it defaults to always.
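Since conditional inclusion directives never nest, the state needed to process them is small: whether lines are currently included, and whether any directive in the current @else chain has already been true. The class below is a sketch of that bookkeeping; its name and interface are assumptions, and conditions are passed as callables so that a skipped @else condition is never evaluated.

```python
class InclusionState:
    """Tracks conditional inclusion within a single file."""

    def __init__(self):
        self.active = True       # lines before any conditional directive are processed
        self.chain_taken = None  # None until a non-@else conditional directive is seen

    def apply(self, is_else, condition):
        """Process one conditional inclusion directive; condition() returns a bool."""
        if is_else:
            if self.chain_taken is None:
                raise ValueError("@else cannot be the first conditional inclusion directive")
            # If an earlier directive in the chain was true, this one is false.
            self.active = not self.chain_taken and condition()
        else:
            # Any non-@else directive starts a new chain.
            self.active = condition()
            self.chain_taken = False
        self.chain_taken = self.chain_taken or self.active
```

A plain @else corresponds to `apply(True, lambda: True)` and @always to `apply(False, lambda: True)`. Each included file would get its own `InclusionState`, in line with the rule that conditions in parent and included files do not interact.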
These directives declare elements for further directives and action lines to use. Declarations MUST appear before the first use of the declared elements.
Every declared element has an identifier.
Identifiers MUST NOT contain any characters other than ASCII letters, ASCII digits, and the characters $#.@_.
Additionally, their first character MUST be a letter or an underscore.
Identifiers are case-sensitive; each declared identifier MUST be unique.
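The identifier syntax translates directly into a regular expression. A minimal sketch, with a hypothetical helper name:

```python
import re

# First character: letter or underscore; then letters, digits and $ # . @ _
_IDENTIFIER_RE = re.compile(r"[A-Za-z_][A-Za-z0-9$#.@_]*")


def is_identifier(text):
    """True if the text is a syntactically valid identifier."""
    return _IDENTIFIER_RE.fullmatch(text) is not None
```

Because identifiers are case-sensitive, `Main` and `main` are distinct names and may both be declared.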
Syntax: @sym <name> <address>
This directive declares a symbol; any further reference to the declared symbol MUST use the address declared by this directive. Additionally, if the emulator keeps a list of symbols for the loaded program (e.g., loaded from a symfile), it SHOULD add the declared symbol to that internal list.
Symbols beginning with exactly two underscores are reserved for the implementation: implementations MAY implicitly declare any such symbols they wish, and users MUST NOT declare any except where the implementation in question explicitly allows it.
If the declared symbol collides with a symbol loaded from an implicit source such as a symfile, the address given in this directive takes precedence. However, the symbol MUST NOT collide with any other symbol declared in the debugfile. Implementations MAY show a warning on initial read-in of the debugfile if a declared symbol collides with a symbol loaded from an implicit source.
The address can be any constant address expression; this expression will use the current base and signedness in force (as declared by the @radix and @signedness directives, if any).
If this address expression is banked, so will be the declared symbol; otherwise, the symbol will be unbanked.
Syntax: @local <name> <address>
This directive declares a local symbol, i.e., a symbol that is only valid in the file where it is declared and in any files that file includes, but not in any files that include it. In other words, the symbol ceases to exist when the current file ends; parent files MUST NOT refer to that symbol.
The local symbol MUST NOT collide with a symbol declared by @sym, @local or @alias in the same file, but it MAY have the same name as a symbol declared in any other way.
If it does, the @local definition takes precedence throughout its scope.
If the emulator keeps a list of symbols for the loaded program, it SHOULD NOT add symbols declared by @local to that list.
All other considerations listed for @sym also apply to @local.
Syntax: @alias <name> <reference>
This directive declares a symbol alias, that is, a symbol that refers to another symbol. This may be used to be able to name symbols with invalid characters in them, or simply to give a shorter or better name to an existing symbol. The newly-declared symbol becomes equivalent to the aliased symbol for all intents and purposes.
As with @sym and @local, users MUST NOT declare aliases that begin with exactly two underscores: symbols that begin with two underscores are reserved for the implementation.
On the other hand, the referenced symbol can begin with two underscores: implementations MUST treat such references as references to real symbols, not implementation-defined internal symbols.
Symbols declared by @alias are only valid in the file where they are declared and in the files it includes, but not in parent files; in other words, they have the same scope as symbols declared by @local.
The declared alias MUST NOT collide with a symbol declared by @sym, @local or @alias in the same file, but it MAY have the same name as a symbol declared in any other way.
If it does, the @alias definition takes precedence throughout its scope.
If the emulator keeps a list of symbols for the loaded program, it SHOULD NOT add symbols declared by @alias to that list, since those symbols are considered local.
The reference (i.e., the aliased symbol) MUST be given as a quoted string; no escape sequences are allowed within it.
This quoted string can contain non-ASCII UTF-8 characters, as described in section 7; since escape sequences aren't allowed, non-ASCII characters MUST be written as themselves if needed.
The referenced symbol MUST be declared by an external source, by a symfile loaded by @symfile prior to the @alias directive, or by a prior @sym directive; local symbols (i.e., those declared by @local or @alias) MUST NOT be aliased.
If it comes from an implicit external source or from a symfile, it MUST NOT be overridden by any later @sym directives in the same file.
Syntax: @var <name> <value>
This directive declares a variable to be used in expressions. The name MUST begin with an underscore and it MUST NOT collide with any other existing variable; variable names that don't begin with an underscore are reserved for the implementation (including the required variables listed in the corresponding section).
The value is the initial value of the variable, and it can be any constant expression; that expression will use the current base and signedness in force, as given by the @radix and @signedness directives.
Variables declared this way are always 32 bits wide.
User-defined variables MUST be reinitialized to their initial value whenever the emulator is reset (for example, via the use of the reset command, through a menu entry or a hotkey, etc.).
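One possible way to keep user-defined variables is to store the initial values separately from the current ones, so that a reset can restore them. The class below is a sketch: its name and methods are hypothetical, and truncating values to 32 bits by masking is an assumption, since this section only states the width of the variables.

```python
MASK_32 = 0xFFFFFFFF


class UserVariables:
    """Holds variables declared with @var and restores them on reset."""

    def __init__(self):
        self._initial = {}
        self.values = {}

    def declare(self, name, value):
        if not name.startswith("_"):
            raise ValueError("user-defined variable names must begin with an underscore")
        if name in self._initial:
            raise ValueError(f"variable {name} is already declared")
        # Assumption: values wider than 32 bits are truncated.
        self._initial[name] = value & MASK_32
        self.values[name] = value & MASK_32

    def reset(self):
        """Called whenever the emulator is reset."""
        self.values = dict(self._initial)
```

After `declare("_count", 3)` and any number of later modifications to `values["_count"]`, calling `reset()` brings the variable back to 3.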
Syntax: @str <name> <value>
This directive declares a named string that can be referenced by commands that display messages to the user (such as message and alert) and by other named strings.
The given name for the string MUST be unique across all strings declared with the @str directive.
Strings live in their own namespace: there is no ambiguity between strings and other identifiers (such as symbols and variables) because they cannot be used in the same contexts. Therefore, the given name for the string MAY be equal to the name of some symbol or variable.
String names beginning with exactly two underscores are reserved for the implementation: implementations MAY implicitly declare any such strings they wish, and users MUST NOT use any such string names except where the implementation in question explicitly allows it.
The value MUST be a constant string, between quotes. The format for the value is identical to the format used in commands that take a string as an argument; refer to section 7 for further details.
These directives allow gathering actions into named groups, so that operations can be applied to a whole group at once (such as disabling all actions within a group).
Actions that come before a @group directive are not considered to belong to any group.
Syntax: @group <name> [<display name>]
This directive declares that the actions that follow it belong to the specified group.
If there was already a @group directive in effect, this directive replaces it; it is not necessary to explicitly close groups with @endgroup before opening a new group.
The name of the group is an identifier, which must follow the same rules given in section 4.3.
If the same group name is given multiple times in a debugfile, all of the actions covered by the corresponding @group directives are considered to belong to the same group; in other words, reusing a group name appends actions to that group.
The display name is an OPTIONAL constant quoted string; no escape sequences are allowed within it. If this string is given, and the emulator's interface allows grouping debugging actions, it SHOULD use the given display name as the name for the group.
If multiple @group directives refer to the same group name, and at least two of them specify a display name for the group, they MUST all specify the same display name.
If different display names are specified for the same group, the emulator MAY, at its discretion, raise an error or use any of the display names given.
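The display name rule can be checked while collecting @group directives. In this sketch, each directive is represented as a (name, display name) pair with `None` for an omitted display name; the function name is hypothetical, and raising an error on conflicting display names is just one of the two behaviors an emulator is allowed to choose.

```python
def collect_display_names(group_directives):
    """Map each group name to its display name (or None if none was given)."""
    display_names = {}
    for name, display in group_directives:
        known = display_names.setdefault(name, display)
        if display is None:
            continue  # this directive only reopens the group
        if known is None:
            display_names[name] = display
        elif known != display:
            raise ValueError(f"conflicting display names for group {name}")
    return display_names
```

Directives without a display name never conflict with anything, so a group can be reopened by name alone after its display name has been set once.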
Syntax: @endgroup
This directive closes a @group directive: actions that follow will not be considered to belong to any group.
These directives allow a debugfile to refer to other files. The specified file paths may be relative or absolute; relative paths are relative to the location of the file containing the directive.
Implementations MAY impose restrictions on allowable file paths for security reasons; if an implementation determines that the file path specified for one of these directives is not acceptable, it MUST raise an error on initial read-in of the file, behaving as if the included file didn't exist (except, potentially, for the text of the error message).
File paths MUST be given as quoted strings; no escape sequences are allowed within them.
Syntax: @include <filepath>
This directive includes a separate debugfile as part of the one where it appears. Circular inclusions (that is, chains of inclusions that result in a file directly or indirectly including itself, also known as recursive inclusions) are not allowed; implementations SHOULD check for this case and treat it as an error.
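Circular inclusions can be detected by keeping track of the chain of files currently being included. The sketch below works on an already-built mapping from each file to the files it includes; that mapping, the function name and the file names in the usage note are all assumptions for illustration, and a real implementation would compare resolved paths while reading the files.

```python
def check_includes(path, includes, chain=()):
    """Raise ValueError if following @include directives from path leads back to it."""
    if path in chain:
        cycle = " -> ".join(chain[chain.index(path):] + (path,))
        raise ValueError(f"circular inclusion: {cycle}")
    for child in includes.get(path, []):
        check_includes(child, includes, chain + (path,))
```

Only the current chain is checked, so a file that is included twice through different parents (for example, `main.dbg` including both `a.dbg` and `b.dbg`, with `a.dbg` also including `b.dbg`) is accepted, while `a.dbg` including `main.dbg` is rejected.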
Syntax: @symfile <filepath>
This directive includes a symfile into the debugging information for the loaded program. If the included symfile has already been automatically loaded by the implementation, the implementation MAY reload its symbols.
A symfile is a text file with a format defined by the RGBDS symbol file specification.
Symfiles may define banked or unbanked symbols: symbols that define an explicit bank number are considered banked, and other symbols are unbanked.
That specification allows defining symbols with a bank identifier of BOOT: those symbols SHOULD be considered unbanked.
Implementations MAY automatically load symfiles even in the absence of any @symfile directives; this is part of the normal behavior of debuggers.
These directives allow configuring some internal parameters of the debugfile, such as the default base in force for numeric constants. They take effect immediately for the directives and actions that follow them; they may be given several times to change the configuration at different points in the file.
Included files MUST have a configuration separate from their parent files; in other words, they neither inherit nor modify the configuration from the file that includes them.
The defaults for each configuration option are stated in the description of the directives themselves.
Syntax: @radix 2|10|16
Default: 10
This directive specifies the default base in force for numeric constants. Numeric constants that don't contain a prefix character indicating their base will use this value as their base. Valid values are 2, 10 and 16.
Syntax: @signedness signed|unsigned
Default: unsigned
This directive specifies whether subsequent arithmetic expressions (in actions and in other directives) will use signed or unsigned arithmetic; this affects some operations such as divisions and right shifts. The words `signed` and `unsigned` are case-insensitive.
These directives don't fit in any of the above categories.
Syntax: @warning <message>
This directive contains warning text for the user; this is intended to be used within a conditionally-included section to warn the user about missing features or actions. The emulator SHOULD display the warning message to the user during the initial read-in of the file, and it SHOULD allow the user to abort the load of the debugfile.
The message MUST be a quoted string; no escape sequences are allowed within it.
Syntax: @error <message>
This directive is equivalent to the `@warning` directive, but it generates an error when loading the debugfile. If this directive is encountered and not excluded by a conditional inclusion directive, the error text MUST be shown to the user and the debugfile MUST NOT be used.
Emulators that implement debugfiles MUST immediately stop loading the debugfile and discard it if an unexcluded `@error` directive is encountered.
Actions are a series of responses that the user requests the emulator to automatically take when a certain event occurs and some conditions are met: for example, stopping emulation when a particular memory address is written to. Actions are the main focus of this specification: the goal of a debugfile is to define these actions for a particular program.
An action consists of two components: a condition and one or more commands. The condition indicates when the action will execute, and the commands indicate what the action will do when it executes.
The condition is bound to one or more addresses, and it triggers when the addresses in question are read, executed or written to. The least restrictive condition is one that is true for all reads, writes and executions; such a condition would trigger at least once per instruction executed.
Commands are the actual steps that the emulator must take when an event occurs for which the condition is true. Commands are simple debugging steps, such as stopping execution and breaking into the debugger, or printing a debug message to some debug console. An action can have any number of commands, which MUST be executed in sequence.
It is possible for many actions' conditions to be true simultaneously. If this is the case, the emulator MUST queue all of them for execution in any order it wishes. The commands for each action MUST be executed in sequence, and they MUST NOT be interleaved with other actions' commands.
An action can be enabled or disabled. Actions are enabled by default, unless their flags set them as disabled. A disabled action behaves as if its condition were always false, i.e., it never triggers. Actions MUST return to their initial state (enabled or disabled) whenever the emulator is reset (for example, via the use of the `reset` command, through a menu entry or a hotkey, etc.).
Emulators MAY offer the ability to edit some or all of the properties of an action. For instance, they may allow enabling or disabling actions, as well as deleting them. An edited action SHOULD behave as if it had been entered into the debugfile in whatever form it results from the edits; in other words, such edits SHOULD take precedence over the originally-entered data in the debugfile.
The overall syntax of an action line is as follows:
<condition>: <command> [; <command> [; <command> ...]]
There MAY be zero or more whitespace characters at either side of the colon and semicolons; such whitespace characters are meaningless and ignored. Also, an action MAY be broken down into more than one line right after the colon and/or any of the semicolons shown above, as explained in section 3.4; action lines MUST NOT be broken down anywhere else. The trailing colon or semicolon at the end of a line indicates that the action has been broken down into several lines and that the next line continues the same action.
If an action is broken down into several lines, these lines MUST be consecutive; in other words, after an action line ending with a colon or semicolon, the action MUST be continued on the immediately following line. (Note that blank lines and comment lines are considered stripped by the normalization process described in section 3.2, and thus they are not considered for this rule; if the second line is separated from the first by blank and/or comment lines only, it is still considered to immediately follow the first line.) An action line ending with a colon or semicolon MUST NOT be the last line of the debugfile and it MUST NOT be followed by a directive or a private use line.
Semicolons are used as command delimiters, not command terminators. In particular, this means that the last command of an action MUST NOT be followed by a semicolon. Additionally, all actions MUST have at least one command. There is no explicit upper bound on the number of commands per action, but implementations MAY impose a reasonable limit. Such a limit SHOULD NOT interfere with intended uses of debugfiles.
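The line-continuation rule can be illustrated with a short sketch. It assumes that the input has already been normalized (blank and comment lines stripped) and contains only action lines; the function name is hypothetical, and the check against a following directive or private use line is omitted.

```python
def join_action_lines(lines):
    """Join action lines that were broken after a colon or semicolon.

    Assumes normalized input containing only action lines (an assumption of
    this sketch, not a requirement stated by the specification).
    """
    joined = []
    pending = ""
    for line in lines:
        text = pending + line.strip()
        if text.endswith((":", ";")):
            # A trailing colon or semicolon means the next line continues this action.
            pending = text + " "
        else:
            joined.append(text)
            pending = ""
    if pending:
        # A line ending with a colon or semicolon must not be the last line.
        raise ValueError("action is continued past the end of the file")
    return joined
```

For example, the three lines `$C000 w:`, `break;` and `reset` are joined into the single action `$C000 w: break; reset`.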
Many components within an action (as well as some directives) use expressions to specify values. Expressions have similar syntax regardless of where they appear, and thus they are described here. A list of sample expressions (with their values) is available in Annex B for further reference.
Expressions use a single data type: 32-bit integers. All values in an expression are 32-bit integers, and operators manipulate those values. Expressions can contain numeric constants, symbols, variables, memory accesses and operators. All operations that result in a signed value use two's complement arithmetic.
Expressions may be signed or unsigned; this is determined by the action's flags and the current signedness in force (as modified by `@signedness`). This only affects some operators and variables; the exact cases where the signedness makes a difference are indicated below. For example, division is signed or unsigned depending on the signedness of the expression where it appears.
Numeric constants are sequences of digits, optionally prefixed by a base indicator character, that evaluate to a constant non-negative numerical value. The value of a numeric constant MUST fit in 32 bits.
Numeric constants can be prefixed with the characters `%`, `#` or `$` to indicate that they are respectively in base 2, 10 or 16; if no prefix is present, the constant is in the default base in force, as given by the `@radix` directive. For instance, the constants `%10100`, `#20` and `$14` all specify the same value, and so does the constant `20` if the default base in force is 10. The hexadecimal digits `A` through `F` can be written in uppercase or lowercase.
Numeric constants that are not prefixed by a base indicator character MUST begin with a digit. This is only relevant if the default base in force is 16; in that case, a numeric constant MUST NOT begin with the digits `A` through `F`. (This can be avoided by adding an additional `0` in front, or by explicitly prefixing the constant with a `$` character: for example, the value `FF` can be written as `0FF` or `$FF`.)
Numeric constants MUST NOT use characters that are not valid digits for the chosen base, either implicitly or via a prefix. A numeric constant with invalid digits is a syntax error.
If a signed expression contains a numeric constant that evaluates to a value that has its upper bit set (i.e., a value that is greater than or equal to `$80000000`), that constant MUST be converted to a negative value via two's complement arithmetic.
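The rules for numeric constants can be summarized in a small parser. This is a sketch under stated assumptions: the function name and its `radix` and `signed` parameters are hypothetical, and a prefix with no digits after it is treated as a syntax error, which the text above does not state explicitly.

```python
DIGITS = "0123456789abcdef"
PREFIXES = {"%": 2, "#": 10, "$": 16}

def parse_constant(text, radix=10, signed=False):
    """Parse a numeric constant; radix is the default base set by @radix."""
    if text[:1] in PREFIXES and text:
        base, digits = PREFIXES[text[0]], text[1:]
    else:
        base, digits = radix, text
        # Unprefixed constants must begin with a decimal digit.
        if not digits or digits[0] not in "0123456789":
            raise ValueError("unprefixed constant must begin with a digit")
    if not digits:
        raise ValueError("missing digits")  # assumption: not spelled out in the text
    value = 0
    for ch in digits.lower():
        digit = DIGITS.find(ch)
        if digit < 0 or digit >= base:
            raise ValueError("invalid digit for base %d: %r" % (base, ch))
        value = value * base + digit
    if value > 0xFFFFFFFF:
        raise ValueError("constant does not fit in 32 bits")
    if signed and value >= 0x80000000:
        value -= 1 << 32  # two's complement reinterpretation
    return value
```

With this sketch, `parse_constant("%10100")`, `parse_constant("#20")` and `parse_constant("$14")` all return 20, while `parse_constant("FF", radix=16)` is rejected and `parse_constant("0FF", radix=16)` returns 255.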
Symbols are constants, specified in a symfile or declared explicitly in a `@sym`, `@local` or `@alias` directive, that evaluate to an address. The address in question may be banked or unbanked; a banked address contains an explicit bank number in addition to the address portion. Unless otherwise stated in a specific context, bank numbers are generally ignored; banked symbols evaluate to their address portions when used in an expression. Symbols always evaluate to an unsigned value, i.e., they are zero-extended into 32 bits.
A symbol's name can collide with a variable's name. If this is the case, using its name in an expression MUST evaluate to the symbol, not the variable. The use of a name in an expression that is neither a valid symbol name nor a valid variable name MUST be considered an error.
Only symbols with valid names (according to the rules stated in section 4.3) may be used in expressions. For symbols that don't comply with these rules, an alias needs to be declared (using the `@alias` directive) in order to be able to refer to them.
If a symbol's address portion falls within an unbanked memory region and its bank number is zero, it SHOULD be treated as an unbanked symbol in order to facilitate its usage. (Banked memory regions are defined in the subsection regarding memory accesses.)
A banked symbol whose address falls at the beginning of a memory region may have a bank number that would be valid for the region immediately preceding it. For example, a symbol might have an address of `$02:$C000` (assuming that 2 is a valid SRAM bank for that program). This is common in past-the-end symbols, where a symbol points to the end of some memory block. Implementations MUST NOT consider this an error, since that symbol can be used in calculations that result in a valid banked address.
Variables are values that represent some part of the program's state, such as a register. They come from two sources: the emulator provides a fixed set of variables that the user can use, and the user can define additional variables through the `@var` directive. User-defined variables only change when the user explicitly changes them (for instance, via a command); emulator variables can also change to reflect changes in the program state.
All user-defined variables MUST begin with an underscore; emulator variables MUST NOT begin with one.
A variable's name can collide with a symbol's name; in this case, using its name in an expression MUST evaluate to the symbol, not the variable. In order to allow using the variable in this case, the name can be prefixed with a `@` character to indicate that it SHALL evaluate to the variable. Variable names MAY be prefixed with `@` even if there is no collision; for example, the expression `@bc` MUST evaluate to the `bc` variable, regardless of whether a `bc` symbol exists. The use of a name in an expression that is neither a valid variable name nor a valid symbol name, as well as the use of a `@`-prefixed name that is not a valid variable name, MUST be considered an error.
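The name lookup rules (symbols win over variables unless the name is prefixed with `@`) can be sketched as below. The function name and the use of plain dictionaries for the symbol and variable tables are assumptions made for illustration only.

```python
def resolve_name(name, symbols, variables):
    """Resolve a name used in an expression.

    Returns a ("symbol", value) or ("variable", value) pair, or raises
    KeyError if the name cannot be resolved.
    """
    if name.startswith("@"):
        # An @-prefixed name always refers to a variable.
        if name[1:] in variables:
            return ("variable", variables[name[1:]])
        raise KeyError("unknown variable: " + name)
    if name in symbols:
        # On a collision, the symbol takes precedence.
        return ("symbol", symbols[name])
    if name in variables:
        return ("variable", variables[name])
    raise KeyError("unknown name: " + name)
```

Given a symbol table containing `bc` and a variable table containing `bc` and `hl`, the name `bc` resolves to the symbol, `@bc` to the variable, and `hl` to the variable.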
Emulators MUST provide at least the variables indicated below. Emulators MAY provide any additional variables they wish, including alternate casings for the variables listed below, provided that none of them begin with an underscore.
Variables | Bits | Signedness | Description |
---|---|---|---|
`a`, `b`, `c`, `d`, `e`, `h`, `l` | 8 | Context | CPU single-byte registers. |
`f` | 8 | Unsigned | CPU flags register; it is always unsigned. |
`af`, `bc`, `de`, `hl` | 16 | Context | CPU register pairs. |
`sp`, `pc` | 16 | Unsigned | CPU two-byte registers; they are always unsigned. |
`zf`, `cf`, `nf`, `hf` | 32 | Unsigned | Zero, carry, subtraction and half carry flags. Always 0 or 1. |
`ime` | 32 | Unsigned | Interrupt master enable flag. Always 0 or 1. |
`sram` | 32 | Signed | 1 if SRAM is enabled, 0 if disabled, -1 if no SRAM is present. |
`target` | 16 | Unsigned | Address that caused the action to fire. |
`op` | 2 | Unsigned | Current operation (0: read, 1: write, 2: execute, 3: read+write). |
`value` | 8 | Context | Value being read or written, or opcode byte for execution actions. |
`next` | 16 | Unsigned | Address of the next instruction that will be executed. |
The "signedness" column in the table above indicates how the variables' values are extended into 32 bits: they are zero-extended if unsigned or sign-extended if signed. If the signedness is "context", it means that they take the signedness of the expression they appear in.
The "bits" column of the table above indicates the natural width of the value, i.e., the number of bits the value actually has. The value will be extended to 32 bits when used in an expression, since all values are 32 bits wide in expressions.
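As an illustration of how a variable's natural width and signedness combine, the following sketch extends a raw value to a 32-bit pattern. The function names are hypothetical, only a few rows of the table are reproduced, and results are represented as unsigned 32-bit patterns (so -1 appears as `0xFFFFFFFF`).

```python
# (bits, signedness) for a few of the variables in the table above.
VARIABLE_INFO = {
    "a": (8, "context"),
    "f": (8, "unsigned"),
    "hl": (16, "context"),
    "sp": (16, "unsigned"),
    "sram": (32, "signed"),
}

def extend_to_32(value, bits, signed):
    """Zero-extend or sign-extend a value of the given width to a 32-bit pattern."""
    low_mask = (1 << bits) - 1
    value &= low_mask
    if signed and value >> (bits - 1):
        value |= 0xFFFFFFFF & ~low_mask  # replicate the sign bit upwards
    return value

def variable_value(name, raw, expression_signed):
    """Extend a variable's raw value, resolving "context" from the expression."""
    bits, signedness = VARIABLE_INFO[name]
    signed = expression_signed if signedness == "context" else signedness == "signed"
    return extend_to_32(raw, bits, signed)
```

For instance, `variable_value("a", 0xFF, True)` yields `0xFFFFFFFF`, while `variable_value("f", 0xFF, True)` yields `0xFF` because `f` is always unsigned.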
The `pc` variable contains the value of the `pc` register, which points to the instruction about to be executed (regardless of the value that the corresponding register would have in actual hardware). The `next` variable contains the address of the next instruction that will be executed, that is, `pc` plus the length of the instruction that is currently executing; for the purposes of this calculation, undefined instructions and the `stop` and `halt` instructions MUST be considered to be one byte long.
A memory access indicates that a value MUST be retrieved from memory in order to evaluate the expression. Memory accesses can retrieve values that are 8, 16 or 32 bits in width; if the value is not 32 bits wide, it MUST be extended into 32 bits according to the signedness of the expression (zero-extended if unsigned, sign-extended if signed). The retrieval of a value from memory MUST NOT be considered a memory read; in particular, it MUST NOT trigger any actions that watch the address in question, and it MUST NOT cause any side effects.
Memory accesses are 8 bits wide by default. A suffix can indicate that an access is 16 or 32 bits wide instead; such an access will read several bytes of memory. The suffix will also indicate whether the bytes are collected into a single value using little or big endianness.
The syntax for a memory access is `[<address><suffix>]`, where the brackets are literal and the suffix can be empty. There MAY be arbitrary whitespace around the `<address>` and `<suffix>` components (i.e., after the `[` character, before the `]` character, or between `<address>` and `<suffix>`); this whitespace is meaningless and ignored.
The `<address>` part of the memory access can be any address expression. The `<suffix>` part can be any number of the following possible suffixes, including none:
- Multi-byte access suffix: one of `!` (16-bit little endian), `!!` (32-bit little endian), `?` (16-bit big endian) or `??` (32-bit big endian).
- Accessibility override suffix: `^`.
Suffixes MUST appear in the order listed, without any duplicates and without any whitespace between them.
Memory accesses may be banked or unbanked. For banked memory regions, banked accesses use a specific bank, while unbanked accesses use whichever bank is currently loaded. Accesses to unbanked memory regions or accesses that cross memory regions (such as a 16-bit access at address `$9FFF`) MUST be unbanked. A memory region is considered banked if bank switching is possible in that region, and unbanked otherwise; emulators MAY consider all of ROM, VRAM, SRAM and/or WRAM to be banked even when, under the currently loaded program's hardware configuration, no bank switching is possible in some or all of those regions.
If a memory access does not contain the `^` suffix, the corresponding memory MUST be read as the CPU would read it. (For a banked access, this may imply accessing data from a bank that isn't currently loaded, as if there was a bank switch prior to the access.) This means that inaccessible memory MUST return the values the CPU would expect: for example, `$FF` when reading from VRAM while it cannot be accessed. On the other hand, if a memory access does contain the `^` suffix, the underlying memory MUST be accessed instead, regardless of whether the CPU can access it at the time. This ignores VRAM and OAM accessibility restrictions, as well as the global memory lock during OAM DMA transfers.
For example, `[$C000]` is an 8-bit memory access, `[@bc?]` is a 16-bit big-endian memory access, and `[$FE00!!^]` is a 32-bit little-endian memory access that ignores OAM accessibility restrictions.
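The suffix handling can be sketched as follows. This is only an illustration: the function names are hypothetical, memory is modelled as a flat 64 KiB byte array, banking and the effect of the `^` override are not modelled, and wrapping a multi-byte access around `$FFFF` is an assumption of this sketch rather than something stated above.

```python
# Multi-byte suffix -> (width in bytes, byte order)
SUFFIXES = {
    "": (1, "little"),
    "!": (2, "little"),
    "!!": (4, "little"),
    "?": (2, "big"),
    "??": (4, "big"),
}

def parse_suffix(suffix):
    """Split a suffix into (width, byte order, accessibility override)."""
    override = suffix.endswith("^")
    if override:
        suffix = suffix[:-1]
    if suffix not in SUFFIXES:
        raise ValueError("invalid memory access suffix")
    width, order = SUFFIXES[suffix]
    return width, order, override

def read_access(memory, address, suffix, signed):
    """Read a value from a flat memory image and extend it to a 32-bit pattern."""
    width, order, _override = parse_suffix(suffix)
    # Assumption: addresses wrap around at $FFFF for multi-byte accesses.
    data = bytes(memory[(address + i) & 0xFFFF] for i in range(width))
    return int.from_bytes(data, order, signed=signed) & 0xFFFFFFFF
```

Because the override suffix must come last and may appear only once, stripping a single trailing `^` and looking up the remainder in the table is enough to reject inputs such as `^!` or `^^`.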
Operators indicate a calculation that is carried out on their operands in order to produce a new result. All operators take 32-bit values as their operands and generate a 32-bit result; if their result overflows 32 bits, it MUST be truncated to 32 bits. Some operators behave differently depending on whether they are signed or unsigned; operators always take the signedness of the expression they appear in. Signed calculations use two's complement arithmetic.
Operators can be unary or binary. Binary operators go between their two operands, while unary operators go before their single operand. Operators MAY be surrounded by arbitrary amounts of whitespace; this whitespace is meaningless. Unary operators MUST only appear at the beginning of an expression or parenthesized subexpression; emulators MAY relax this rule, but they are also allowed to issue an error instead if the rule is broken.
Parentheses can be used to group all or part of an expression into a subexpression; such a subexpression MUST be evaluated into a single value before being used as an operand to some other operator. In other words, parentheses indicate that the subexpression takes precedence above the regular precedence rules. Parentheses MAY be surrounded by arbitrary amounts of whitespace; this whitespace is meaningless.
Unary operators MUST be evaluated before binary operators. If an implementation allows many of them to appear together (violating the rule about unary operators appearing at the beginning of an expression or subexpression, since only the first of such operators can be at the beginning), they MUST be evaluated right to left. Valid unary operators are the following ones, all having the same precedence:
Operator | Description |
---|---|
`-` | Two's complement of its operand. |
`+` | Results in its operand unmodified. Added for completeness. |
`~` | One's complement of its operand, i.e., unary NOT. |
`&` | Current bank at the memory location given by its operand; 0 if that location is unbanked or unmapped. |
`&&` | MUST be applied to a symbol; results in that symbol's bank. If the symbol is unbanked, the result is 0. |
`!` | 1 if the operand is zero, or 0 otherwise. |
`!!` | 0 if the operand is zero, or 1 otherwise. |
If an implementation allows multiple unary operators to appear together, the `!!` operator MAY be implemented as two consecutive instances of the `!` operator instead of being a valid operator by itself.
The `&` operator returns the bank at a specific memory location, as an unsigned (i.e., zero-extended) value; the operator MUST only consider the lower 16 bits of its operand. (For instance, `&$4000` would return the current ROM bank on a usual MBC-type Game Boy cartridge.) For unbanked or invalid addresses, the operation MUST result in 0 and MUST NOT cause any errors. If the bank number currently selected for that location aliases a different bank (for example, WRAM bank 0 aliasing WRAM bank 1 in the Game Boy Color), the effective bank number (as determined by the hardware) for that location MUST be returned instead.
Likewise, the `&&` operator returns the bank for a specific symbol as an unsigned (i.e., zero-extended) value; if the symbol is unbanked, the operator MUST result in 0 and MUST NOT cause any errors.
Binary operators MUST be evaluated in descending order of precedence; ties are broken in left-to-right order. Valid binary operators are:
Operator | Precedence | Description |
---|---|---|
`<<` | 9 | Left shift. If the right operand is not between 0 and 31, the result is zero. |
`>>` | 9 | Right shift, affected by signedness. Shift counts not between 0 and 32 are taken to be 32. |
`*` | 8 | 32-bit multiplication; the result is unaffected by signedness. |
`/` | 8 | 32-bit division, always rounding towards zero. Affected by signedness. |
`%` | 8 | 32-bit remainder, affected by signedness. Defined as `x % y = x - x / y * y`. |
`**` | 8 | Upper 32 bits of the 64-bit full product of its operands. Affected by signedness. |
`+` | 7 | 32-bit addition. |
`-` | 7 | 32-bit subtraction. |
`&` | 6 | Bitwise AND. |
`\|` | 5 | Bitwise OR. |
`^` | 5 | Bitwise XOR. |
`=`, `==` | 4 | 1 if both operands are equal, or 0 otherwise. |
`!=`, `<>` | 4 | 0 if both operands are equal, or 1 otherwise. |
`<` | 3 | Less than comparison; affected by signedness. 1 if true, 0 if false. |
`>` | 3 | Greater than comparison; affected by signedness. 1 if true, 0 if false. |
`<=` | 3 | Less than or equals comparison; affected by signedness. 1 if true, 0 if false. |
`>=` | 3 | Greater than or equals comparison; affected by signedness. 1 if true, 0 if false. |
`&&` | 2 | Logical AND. 1 if both operands are non-zero, or 0 otherwise. |
`\|\|` | 1 | Logical OR. 1 if either operand is non-zero, or 0 otherwise. |
`^^` | 1 | Logical XOR. 1 if exactly one operand is non-zero, or 0 otherwise. |
If there is any parsing ambiguity between two unary or two binary operators, it MUST be resolved in favour of the longest one of the two. If there is any parsing ambiguity between a unary and a binary operator, it MUST be resolved in favour of the binary operator.
Division and remainder by zero MUST NOT result in an error: division by zero MUST result in zero and remainder by zero MUST result in the first operand unmodified (following the definition for remainder in the table above).
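The signedness-sensitive operators and the division-by-zero rules can be sketched as follows. This is an illustration rather than a reference implementation: the function names are hypothetical, and all values are carried as unsigned 32-bit patterns that are reinterpreted as signed only where the operator requires it.

```python
MASK = 0xFFFFFFFF

def as_operand(value, signed):
    """Interpret a 32-bit pattern as a Python int, honouring signedness."""
    value &= MASK
    return value - (1 << 32) if signed and value & 0x80000000 else value

def shift_left(x, count):
    """Left shift; counts outside 0..31 give zero."""
    count &= MASK
    return (x << count) & MASK if count <= 31 else 0

def shift_right(x, count, signed):
    """Right shift; counts outside 0..32 are treated as 32."""
    count = min(count & MASK, 32)
    return (as_operand(x, signed) >> count) & MASK

def divide(x, y, signed):
    """Division rounding towards zero; division by zero gives zero."""
    a, b = as_operand(x, signed), as_operand(y, signed)
    if b == 0:
        return 0
    quotient = abs(a) // abs(b)
    return (-quotient if (a < 0) != (b < 0) else quotient) & MASK

def remainder(x, y, signed):
    """x % y = x - x / y * y, so remainder by zero gives x unmodified."""
    return (x - divide(x, y, signed) * y) & MASK

def high_product(x, y, signed):
    """Upper 32 bits of the full 64-bit product."""
    return ((as_operand(x, signed) * as_operand(y, signed)) >> 32) & MASK
```

Python's `//` rounds towards negative infinity, which is why `divide` works on absolute values and restores the sign afterwards; a negative shift count appears here as a very large unsigned pattern and therefore falls outside the valid range for both shifts.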
An expression is said to be constant if it doesn't contain any references to variables, memory accesses, or the unary `&` operator. In other words, a constant expression MUST only contain numeric constants and symbols (and operators and parentheses combining them, excluding unary `&`). Constant expressions are a subset of all expressions; while they can be used wherever an expression is called for, some parts of this specification will explicitly require constant expressions.
The value of a constant expression MAY be precalculated when the debugfile is initially read in, since it depends on values that cannot change at runtime. If an emulator precalculates constant expressions this way and it also offers the ability to edit symbol addresses, constant expressions MUST be reevaluated when the address of a symbol they refer to changes.
Address expressions are a specific kind of expression that is used in some contexts; they evaluate to a memory address instead of a value. An address expression may be banked or unbanked: a banked address expression refers to a specific bank/address pair, while an unbanked address expression refers to a specific address in any bank.
Address expressions can take one of three forms: `<expression>`, `:<expression>` and `<expression>:<expression>`. The last two forms are explicitly unbanked and banked, respectively; in the last form, the first expression is the bank and the second one is the address. The colon in those forms MAY be surrounded by arbitrary amounts of whitespace; this whitespace is meaningless.
The first form is just an ordinary expression. In that case, if the first token of the expression (not including any parentheses) is a banked symbol, the address expression is banked and takes the bank of that symbol; otherwise (including when the first non-parenthesis token is not a symbol at all), the address expression is unbanked.
The address part of the expression MUST be truncated to 16 bits, ignoring the upper half of the value; overflows MUST NOT cause an error. Likewise, if the address expression is banked and it is not being used to declare a symbol (i.e., as part of a `@sym` or `@local` directive), the bank part of the expression MUST be truncated to the correct number of bits for that address, ignoring any overflowing bits. (Declared symbols keep their full 32-bit bank values, as those symbols can be used in expressions resulting in unrelated values or addresses.) Any truncations MUST be carried out only after the value of the expression has been determined; in other words, the expressions contained in the address expression MUST be evaluated to a single value using the full 32-bit precision before being truncated.
If the address expression is not being used to declare a symbol and the address part of the expression falls within an unbanked memory region, the address expression MUST be unbanked; however, implementations SHOULD accept banked address expressions with a bank number of zero for those regions.
If an address expression meets the conditions set for constant expressions, it is a constant address expression.
The condition field of an action indicates when the action will fire. This field consists of three subfields: the address field, the flags field and the condition expression field. The first two are mandatory, but the condition expression field MAY be missing; if it is, it is assumed to be the constant expression `1`. These subfields are separated by arbitrary amounts of whitespace.
The address and flags subfields MUST NOT contain any inner whitespace, since whitespace is used to delimit them. The condition expression subfield MAY contain arbitrary inner whitespace, as it is the last subfield of the condition field. The condition field is terminated by a colon; while the address and condition expression subfields can contain colons, the terminating colon is unambiguous because the flags subfield is mandatory and all colons within the condition expression subfield will necessarily appear within memory access expressions (and thus inside brackets).
The address subfield indicates the addresses that can cause the action to fire; the flags subfield indicates which operations over those addresses will trigger it, as well as some additional attributes for the action. The condition expression subfield is an expression that must be non-zero for the action to fire; if the condition expression evaluates to zero at the point where the action would otherwise fire, the action does not fire. For example, the following condition field:
$04:$58AB rx @bc > $1234
indicates that the action will fire if the byte at bank 4, address `$58AB` is read or executed, but only if register `bc` is greater than `$1234`.
The condition field will be followed by a colon and the command list; the action line MAY be broken after the colon by inserting a newline. Following the example above, the full action line might look like this:
$04:$58AB rx @bc > $1234: break
The colon that terminates the condition field (the one after `$1234`) is unambiguous, because the colon in the address subfield (`$04:$58AB`) comes before the flags subfield, and the flags subfield is mandatory.
Additional whitespace between the subfields MAY be inserted for readability, like so:
$04:$58AB rx @bc > $1234 : break
The condition expression subfield can be any arbitrary expression, as detailed in section 5.3. The other two subfields will be explained in the following sections.
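The way the terminating colon is located can be sketched as below. This is a rough illustration under several assumptions: the function name is hypothetical, the action is already joined into a single line, flags are taken to be a run of letters, and commands are split naively on semicolons (the command syntax is not covered in this section, so quoted strings containing semicolons are not handled).

```python
def split_action_line(line):
    """Split an action line into (address, flags, condition, commands)."""
    # The address subfield contains no whitespace, so it ends at the first gap.
    address, rest = line.strip().split(None, 1)
    # The flags subfield is a run of letters right after the address subfield.
    end = 0
    while end < len(rest) and rest[end].isalpha():
        end += 1
    flags, rest = rest[:end].lower(), rest[end:]
    if not flags:
        raise ValueError("missing flags subfield")
    # Colons inside the condition expression only occur within brackets, so the
    # first colon outside brackets terminates the condition field.
    depth = 0
    for position, char in enumerate(rest):
        if char == "[":
            depth += 1
        elif char == "]":
            depth -= 1
        elif char == ":" and depth == 0:
            condition = rest[:position].strip() or "1"  # missing means constant 1
            commands = [c.strip() for c in rest[position + 1:].split(";")]
            return address, flags, condition, commands
    raise ValueError("missing colon after the condition field")
```

Applied to the example above, `split_action_line("$04:$58AB rx @bc > $1234: break")` returns the address `$04:$58AB`, the flags `rx`, the condition `@bc > $1234` and the single command `break`.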
The address subfield of the condition field specifies the addresses that will be watched by the action, i.e., which addresses can cause the action to fire. This subfield contains one or more address specifications separated by commas; each specification may be a single address or a range, and it may be banked or unbanked. The address subfield MUST NOT contain any whitespace: while whitespace is normally allowed within expressions (and the address subfield will usually contain one or more expressions), whitespace is used to delimit the address and flags subfields, and thus it cannot be included within those subfields.
A single address is given as a constant address expression; the address will be banked or unbanked depending on whether the given expression is.
Since addresses in this subfield are constant expressions, numerical constants are handled in the manner explained in the corresponding part of the Expressions section. In particular, this means that numerical constants don't default to hexadecimal (unless there is a `@radix 16` directive in effect), and that they MUST begin with a digit or a base indicator character.
If some address specification specifies a range, the starting address MUST be lower than or equal to the ending address; no wrap-around is assumed for this subfield. Address ranges always include both endpoints. If an address is banked, that address or range of addresses MUST be in a banked region of memory and MUST NOT cross memory regions; if an address or range is unbanked, but it covers a banked portion of memory, it refers to those addresses in all banks.
Ranges can be specified in one of two ways:
- `<address>--<expression>`: the start and end are given explicitly. The `--` delimiter represents a hyphen (indicating a range), not the subtraction operator; for instance, the range `$C000--$DFFF` covers all of WRAM.
- `<address>++<expression>`: the second expression indicates the length of the range. For instance, `$C000++$2000` covers all of WRAM as well.
In both cases, `<address>` is any constant address expression, and `<expression>` is any constant expression. The result of `<expression>` MUST be truncated to 16 bits before computing the range; in particular, this means that using a negative value of `<expression>` for the `++` form will not work as expected. The `--` and `++` symbols are delimiters, not operators; there is no need to parenthesize the two expressions that they delimit, and they don't have a precedence because they are not involved in any calculation.
For the `++` form, after truncating to 16 bits, the length expression MUST NOT evaluate to zero.
Ranges MUST NOT wrap around. This means that, for the `--` form, the ending address MUST be greater than or equal to the starting address; for the `++` form, after truncating both the address and the length to 16 bits, their sum MUST NOT exceed `$10000`. (It is acceptable for the sum to be exactly `$10000`, as this occurs for ranges that extend to the very end of the addressing space, such as `$FF80++$80`.)
The `,`, `++` and `--` delimiters cannot appear naturally in expressions, and thus they will unambiguously delimit the expressions involved in the address specifications. For implementations that don't require unary operators to appear at the beginning of an expression or subexpression (which the corresponding section of this specification allows as an option), any `++` or `--` sequences in address specifications MUST be parsed as the corresponding delimiters and not as operator sequences, even if this would cause parsing to fail. (For example, the specification `20--30` MUST be parsed as a range and not as `20-(-30)`; also, the specification `2*(20++30)` MUST be parsed as a range (which will cause parsing to fail), even though parsing it as `2*(20+(+30))` would be successful.)
If the address subfield contains two or more address specifications, the action will watch the union of the addresses specified by all such address specifications. (For example, a specification of `$4000,$4010--$4013,$4020++5` will watch a total of 10 addresses.)
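The range rules above can be checked with a small sketch. The function names are hypothetical, the inputs are assumed to be already-evaluated constant expressions, and banks are ignored for simplicity.

```python
def range_explicit(start, end):
    """Range given as <address>--<expression>; both endpoints are included."""
    start &= 0xFFFF
    end &= 0xFFFF
    if end < start:
        raise ValueError("range end is below range start")
    return start, end

def range_length(start, length):
    """Range given as <address>++<expression>."""
    start &= 0xFFFF
    length &= 0xFFFF  # truncated to 16 bits before computing the range
    if length == 0:
        raise ValueError("range length is zero")
    if start + length > 0x10000:
        raise ValueError("range wraps around")
    return start, start + length - 1

def watched_addresses(specifications):
    """Union of the addresses covered by a list of (first, last) pairs."""
    addresses = set()
    for first, last in specifications:
        addresses.update(range(first, last + 1))
    return addresses
```

For the example `$4000,$4010--$4013,$4020++5`, combining a single address with `range_explicit(0x4010, 0x4013)` and `range_length(0x4020, 5)` gives a set of 10 watched addresses.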
The address subfield can also be a single `*` character. This is shorthand for the range `$0000--$FFFF`. The `*` character MUST appear by itself; in other words, it MUST NOT be part of a list of two or more address specifications. (This is not to be confused with the `*` operator, which can appear any number of times in the address subfield.)
An action's flags indicate which operations will trigger the action, as well as some attributes for it. The address subfield is used to specify the addresses that trigger the action; the flags refine this by stating which operations on those addresses will cause it to fire. Actions MUST always fire before the operation is executed.
Flags come in two kinds: operation flags and attribute flags. Operation flags are the flags that represent operations over the memory addresses that are being watched; every action MUST have at least one operation flag. Attribute flags indicate an action's attributes; actions may have any number of these, including none.
Flags consist of one or two characters; multiple flags can be given by concatenating them. Flags MUST NOT be separated by whitespace. Two-character flags always consist of identical characters, and they are modified versions of the corresponding one-character versions; the corresponding one-character and two-character flags MUST NOT be used simultaneously in the same action. Flags MUST NOT be duplicated; any particular flag may only be given once per action. Flags MAY be given in any order; the order is immaterial. Flags are case-insensitive; they are all listed in lowercase (and shown that way throughout this document) for consistency.
The flags listed below (as well as their uppercase equivalents) are the only valid flags; unknown flags MUST be considered a syntax error. These flags are:
Flag | Type | Description |
---|---|---|
r | Operation | Indicates that the action will fire when an address is read. |
w | Operation | Indicates that the action will fire when an address is written to. |
ww | Operation | Indicates that the action will fire when a new value is written to an address (modifying write). |
x | Operation | Indicates that the action will fire when an address is executed. |
xx | Operation | Indicates that the action will fire when an address is being jumped to. |
s | Attribute | Sets the signedness to signed for this action's expressions, overriding the default. |
ss | Attribute | Sets the signedness to unsigned for this action's expressions, overriding the default. |
d | Attribute | Disables the action. This sets the action's state when loaded to disabled (instead of enabled). |
m | Attribute | Indicates that the action can fire multiple times per instruction. |
b | Attribute | Indicates that the action will only fire when the boot ROM is mapped. |
bb | Attribute | Indicates that the action will fire regardless of whether the boot ROM is mapped. |
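The validation rules for the flags subfield can be sketched as follows. This is an illustrative Python fragment, not a reference implementation: the function name is hypothetical, and the greedy reading of two identical characters as a single two-character flag is an assumption (it is safe because a repeated one-character flag would be an error anyway).

```python
OPERATION_FLAGS = {"r", "w", "ww", "x", "xx"}
ATTRIBUTE_FLAGS = {"s", "ss", "d", "m", "b", "bb"}
VALID_FLAGS = OPERATION_FLAGS | ATTRIBUTE_FLAGS


def parse_flags(text: str) -> set[str]:
    """Split a flags subfield into individual flags, enforcing the rules above."""
    text = text.lower()  # flags are case-insensitive
    flags: set[str] = set()
    index = 0
    while index < len(text):
        pair = text[index:index + 2]
        # Prefer a valid two-character flag made of identical characters.
        if len(pair) == 2 and pair[0] == pair[1] and pair in VALID_FLAGS:
            flag = pair
        else:
            flag = text[index]
        if flag not in VALID_FLAGS:
            raise ValueError(f"unknown flag: {flag!r}")
        # Rejects both exact duplicates and one-/two-character pairs (w with ww).
        if any(other[0] == flag[0] for other in flags):
            raise ValueError(f"flag {flag!r} duplicates or conflicts with another flag")
        flags.add(flag)
        index += len(flag)
    if not flags & OPERATION_FLAGS:
        raise ValueError("at least one operation flag is required")
    return flags


print(sorted(parse_flags("WWs")))  # ['s', 'ww']
```

Whitespace inside the subfield is rejected as an unknown flag, which matches the rule that flags are not separated by whitespace.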
For actions that fire on writes, if an expression contains a memory access that retrieves the value at the address
being written to, the value before the write MUST be used for that expression.
The ww flag indicates that the action fires when the value being written is different from the value already
present at that address; in other words, when the expression [target] != value evaluates to 1.
Executing an instruction MUST NOT be considered a memory read for the purpose of firing actions. Actions that fire on reads are only triggered by explicit memory reads.
If an instruction triggers an action multiple times (for example, by writing two bytes at once, or by reading and
writing to the same address at once), the action MUST fire only once unless the m
flag is given.
With the m
flag, the action MUST fire once per operation. For example, given the following actions:
$C100--$C101 w: message "action 1"
$C100--$C101 wm: message "action 2"
the instruction ld [$C100], sp
would cause the first action to fire once and the second action to fire twice (once
per byte written).
If an action with the flags xm
was created for the address range containing that instruction, that action would fire
three times, since three bytes are being executed.
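The effect of the m flag on the number of firings can be sketched in Python. The helper below is hypothetical and models only a single kind of operation (the bytes touched by one instruction) against one action's watched addresses.

```python
def fire_count(watched: set[int], accessed: list[int], multi: bool) -> int:
    """Number of times one action fires for one instruction.

    watched:  addresses watched by the action
    accessed: addresses touched by the instruction for the relevant operation
    multi:    whether the action carries the m flag
    """
    hits = sum(1 for address in accessed if address in watched)
    if hits == 0:
        return 0
    return hits if multi else 1


watched = {0xC100, 0xC101}
written = [0xC100, 0xC101]  # the two bytes written by ld [$C100], sp
print(fire_count(watched, written, multi=False))  # 1
print(fire_count(watched, written, multi=True))   # 2
```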
If a multi-byte instruction is executed, actions that fire on execution MUST fire if any part of the instruction lies
within the addresses watched by the action.
If the action does not include the m
flag, it fires only once; the target
variable MUST be set to point to the
first byte of the instruction that lies within the watched addresses.
On the other hand, actions that fire on jump targets (i.e., actions that contain the xx
flag) MUST only fire if the
jump target matches exactly a watched address, regardless of the size of the instruction at the receiving end of the
jump.
Actions that contain the xx flag MUST fire before the jump is taken; in other words, the value of the pc variable
MUST point to the jump instruction (jr, jp, call, ret or rst) that is causing the jump.
These actions MUST NOT fire due to jumps that are not taken.
For these actions, the value of the target
variable will be the target of the jump, and the value of the op
variable will be 2.
For the purpose of firing actions that trigger on instruction execution, undefined instructions MUST be considered to
be one byte long; any action firing on the execution of such an instruction MUST fire before the emulator handles the
error condition that results from attempting to execute it.
The stop
and halt
instructions MUST also be considered to be one byte long, regardless of whether they are
followed by a nop
instruction or not; if this nop
is present and it does get executed (as opposed to being skipped
by the CPU), it MUST be handled as a separate instruction for the purpose of firing actions that trigger on
instruction execution.
Actions MUST NOT fire on operations not caused by the execution of a CPU instruction.
For example, execution of an interrupt handler MUST NOT fire an action containing the xx flag and targeting the
handler address (or an action containing the r flag and targeting the interrupt registers), and writes not
initiated by the CPU (such as DMA, changes to memory-mapped registers, or cartridges changing cartridge-mapped values)
MUST NOT fire actions containing the w flag that target those addresses.
(On the other hand, execution of an interrupt handler will fire actions containing the x flag targeting the
executed handler, because these actions fire as the corresponding instructions execute.)
For a combined read/write operation (such as inc [hl]
), if an action fires on both reads and writes and the m
flag
is given, the read operation MUST fire before the write.
If the action does not contain the m
flag, the combined operation fires the action once, with the op
variable
taking the value 3 and the value
variable taking the value of the written byte.
If an action fires on reads and modifying writes (i.e., it contains the r
and ww
flags), the above behavior only
applies for bytes that are modified.
Reads followed by non-modifying writes (e.g., the execution of a set 0, [hl]
instruction when the value at [hl]
is
already odd) MUST be treated as pure reads for the purpose of firing the action.
For an instruction that accesses multiple bytes at once, if it causes an action watching several of those addresses
that does not contain the m
flag to fire, the target
and value
variables for it MUST be set for the highest
address in common between the action and the operation.
If the action contains the ww
flag and the operation that fires the action is a write that only modifies some of the
affected addresses, addresses that are not modified by the write MUST NOT be taken into account when determining the
highest address in common.
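The "highest address in common" rule can be illustrated with a short Python sketch. The function and its dictionary-based arguments are hypothetical; it only models a write operation for an action without the m flag.

```python
def reported_target(watched, writes, memory, modifying_only=False):
    """Pick the address reported through the target variable for a multi-byte write.

    watched:        set of addresses watched by the action
    writes:         dict mapping each written address to its new value
    memory:         dict mapping each address to its value before the write
    modifying_only: True when the action fires through the ww flag
    Returns None when the write does not fire the action at all.
    """
    candidates = [
        address
        for address, new_value in writes.items()
        if address in watched
        and (not modifying_only or memory[address] != new_value)
    ]
    return max(candidates) if candidates else None


memory = {0xC100: 0x34, 0xC101: 0x12}
writes = {0xC100: 0x56, 0xC101: 0x12}  # only $C100 actually changes
watched = {0xC100, 0xC101}
print(hex(reported_target(watched, writes, memory)))                       # 0xc101
print(hex(reported_target(watched, writes, memory, modifying_only=True)))  # 0xc100
```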
By default, actions MUST NOT fire as long as the boot ROM is mapped.
The b
flag inverts this behavior, making an action fire only as long as the boot ROM is mapped; the bb
flag
instructs the emulator to fire the action regardless of whether the boot ROM is mapped or not.
Emulators that don't emulate the boot ROM (either because they lack the capability or because they have been
configured to not do so) MUST ignore all actions containing the b
flag; actions containing the bb
flag MUST be
handled normally.
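The boot ROM rules reduce to a small decision function. The following Python sketch uses hypothetical names and assumes the caller already knows whether the boot ROM is currently mapped.

```python
def boot_rom_allows(flags: set[str], boot_rom_mapped: bool,
                    emulates_boot_rom: bool = True) -> bool:
    """Decide whether the boot ROM state lets an action fire."""
    mapped = emulates_boot_rom and boot_rom_mapped
    if "bb" in flags:
        return True    # fires regardless of the boot ROM state
    if "b" in flags:
        return mapped  # fires only while the boot ROM is mapped
    return not mapped  # default: never fires while the boot ROM is mapped


print(boot_rom_allows({"x"}, boot_rom_mapped=True))        # False
print(boot_rom_allows({"x", "b"}, boot_rom_mapped=True))   # True
print(boot_rom_allows({"x", "bb"}, boot_rom_mapped=True))  # True
```

With emulates_boot_rom set to False, actions containing the b flag never fire, which has the same effect as ignoring them.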
The s
and ss
flags affect all expressions in the action; this includes the address specifications, the condition
expression subfield and any expressions contained in any commands within the action.
Commands indicate the actual steps taken by the emulator when an action fires. Actions may cause the emulator to halt execution, print a debug message, and so on. Every action has a command list containing one or more commands; these commands MUST be executed in order when the action fires. Command lists MAY have any number of commands, but emulators MAY impose reasonable limits in order to prevent bugs or security issues. Such a limit SHOULD be high enough to not interfere with the normal, intended usage of debugfiles.
Commands in an action's command list are separated by semicolons. Semicolons are separators, not terminators; this means that the last command in the command list MUST NOT be followed by a semicolon. (In fact, the line ending without a colon or semicolon at the end is what marks the end of the action.) These semicolons MAY be surrounded by arbitrary amounts of whitespace, which is ignored.
An action line MAY be broken after any of these separating semicolons, inserting a newline. This means that the command list continues in the next line.
Each command has its own syntax and semantics.
Commands always begin with a keyword, which is a valid identifier (according to section 4.3) that
identifies the command; commands can have additional arguments after the keyword, separated by one or more spaces.
For example, the message command will be followed by a string, as in message "Hello world!",
but the break command always stands by itself.
Command names are case-insensitive: commands that only differ in case MUST be considered equal.
For example, break
and BREAK
represent the same command.
Command names are shown in lowercase throughout this document for consistency.
Command names beginning with an underscore are private use, reserved for emulators to define their own.
Prefixing private-use command names with the name of the emulator that defines them (such as _fooemu_dosomething
) is
RECOMMENDED.
Implementations MUST consider unrecognized command names, private use or otherwise, as an error; it is RECOMMENDED
that they detect such errors on initial file read-in, as specified in section 3.3.
Therefore, debugfiles SHOULD use conditional inclusion to ensure that private-use commands are only used
in emulators that support them.
These commands cause the emulator to perform basic debugging functions.
Syntax: break
This command causes a breakpoint, i.e., it instructs the emulator to stop execution and open the debugger.
This command is idempotent: if several of these commands are executed at once (because a command list contains
multiple break
commands or because several actions containing a break
command fire at once), they have the same
effect as a single break
command.
This MUST NOT cause an error.
Syntax: reset
This command instructs the emulator to reset itself, as if the Game Boy was power-cycled.
This command MAY be implemented in an idempotent way: several reset
commands executed at once MAY reset the emulator
any number of times between one and the number of reset
commands executed.
The exact number of resets executed in this case is implementation-defined and it MAY vary between executions.
Syntax: message <string>
This command causes a debug message to be printed. The emulator MAY output debug messages in any way it wishes; it SHOULD NOT stop emulation for that purpose.
The <string>
argument can be either a named string declared with the @str
directive or a quoted
constant string as explained in section 7.
Escape sequences are applied regardless of the chosen format.
Syntax: alert <string>
This command is equivalent to the message
command, but it is intended to bring immediate attention to the message
printed.
The emulator SHOULD display the message in a way that brings the user's attention to it, such as a message box;
emulation MUST be stopped at least until the user acknowledges the message.
If several alert
commands are executed at once (because a command list contains many of them or because several
actions containing an alert
command fire at once), the emulator SHOULD combine them into a single alert, in order to
avoid stopping emulation repeatedly.
These commands enable or disable actions.
Syntax: enable [<group>]
Enables all actions within a group (as declared by the @group
directive).
If some or all actions within that group are already enabled, this command does nothing to those actions; enabling
already-enabled actions MUST NOT cause an error.
The group may be omitted from the command; if it is, it enables the action where it appears. (This can be useful if the action has been disabled by another command from the same command list, or by a command from an action that fired simultaneously.)
Enabling an action MUST NOT cause that action to immediately fire, even if its condition is currently true: the action is enabled starting at the next event that could cause actions to fire.
Syntax: disable [<group>]
Disables all actions within a group; disabling already-disabled actions MUST NOT cause an error.
This is the opposite of the enable
command.
The group may be omitted from the command, in which case it disables the action where it appears.
Disabling an action that has already fired but is pending execution MUST NOT prevent that action's commands from being executed: the action is disabled starting at the next event that could cause actions to fire.
Syntax: toggle [<group>]
Toggles the enabled/disabled status of all actions within a group, enabling the ones that are disabled and disabling the ones that are enabled.
The group may be omitted from the command, in which case it toggles the enabled/disabled status of the action where it appears.
Considerations about enabling and disabling actions that apply to enable and disable also apply to toggle.
These commands modify the program state, by altering a variable, a memory location or the execution of the program.
Syntax: set <lvalue> := <expression>
This command modifies a value in the program state.
The <lvalue>
part is an expression, which MUST be a variable name, a
memory access expression, or the result of the unary &
operator; the
<expression>
part can be any arbitrary expression.
The :=
symbol used as a delimiter MAY be surrounded by arbitrary amounts of whitespace, which is ignored.
If a variable name is used for the <lvalue>
part, it MAY be prefixed with an @
character; this prefix MUST be used
if the variable name collides with an existing symbol name.
The behavior of this command depends on the kind of expression used for <lvalue>:
- If the <lvalue> is a memory access that doesn't contain the ^ suffix, a write to that address MUST be simulated; if it is a multi-byte access, the bytes MUST be written in ascending order of addresses. This may or may not cause the addresses to be modified; for example, an address in WRAM will be modified by the write, while an address in ROM will probably cause some banking command to be executed. This write MUST NOT cause any actions to fire.
- If the <lvalue> is a memory access that does contain the ^ suffix, the behavior is equivalent to the corresponding access without the ^ suffix, except for the following cases:
  - An access to an address in the ROM region MUST be performed as if the ROM image was writable;
  - An access to an address in the VRAM or OAM regions MUST write to those addresses, even if they are currently inaccessible to the CPU;
  - An access to an address in the SRAM region MUST ignore any SRAM write protections;
  - An access to an address in the ROM or SRAM regions MUST NOT cause any banking commands (or other mapper commands) to execute;
  - An access performed while an OAM DMA transfer is active MUST ignore the global memory lock imposed by such a transfer.
- If the <lvalue> is a user-defined variable, that variable is set to the value of <expression>.
- If the <lvalue> is an emulator variable, the semantics of writing to it are defined by the emulator; in particular, an emulator variable MAY be treated as read-only, ignoring the writes. For the minimal required variables listed in the "Variables" heading of the Expressions section, they MUST behave in the following manner:
  - Writes to register variables MUST update the corresponding registers (in particular, writes to the f and af registers will ignore the lower four bits, and writes to the pc register will cause a jump);
  - Writes to variables representing CPU flags (zf, cf, nf, hf and ime) MUST set the corresponding CPU flag to 0 if the expression is zero, or to 1 otherwise;
  - If the program uses SRAM, and the program's hardware configuration supports disabling SRAM, writes to the sram variable MUST enable SRAM if the expression is non-zero or disable SRAM if it is zero;
  - The remaining required variables (target, op, value and next, as well as the sram variable for programs without SRAM or for which SRAM cannot be disabled) MUST be read-only.
- If the <lvalue> is the result of the unary & operator, the emulator MUST switch the active bank at the memory region corresponding to the address designated by the operand of the & operator to the bank designated by <expression>. (If the <lvalue> expression involves more than one such operator, such as in &(&0), only the operand to the outermost operator is considered.) If the chosen bank number aliases another bank (such as WRAM bank 0 aliasing WRAM bank 1 in the Game Boy Color), the actual bank number that would be selected by the hardware MUST be selected instead. If the operand to the & operator in the <lvalue> designates an unbanked or invalid memory location, the write MUST be ignored.
The result of evaluating <expression>
MUST be truncated to the correct number of bits before performing the
assignment.
This depends on the kind of expression used for the <lvalue>, as follows:
- For memory accesses, <expression> is truncated to the width of the access;
- For user-defined variables, <expression> is not modified (user-defined variables are always 32 bits wide);
- For emulator variables, <expression> is truncated to the actual size of the variable;
- For the result of a unary & operator, <expression> is truncated to the size of bank numbers for the corresponding memory region.
Undefined variables (that is, variables that aren't provided by the emulator or already defined by the user in a
@var
directive) MUST NOT be defined by a set
command.
In other words, attempting to set an undefined variable MUST cause an error.
Since the names of existing variables are already known as the debugfile is loaded, this error SHOULD be detected
during initial read-in of the file.
Syntax: jump <address>
This command causes execution of the program to continue at the specified address. The address can be any address expression.
If the address is unbanked, this command merely causes execution to continue at the designated address, as if the
address had been written to the pc
register.
If the address is banked, the emulator MUST select the designated bank for the corresponding memory region, and then
jump to the designated address.
This command MUST NOT fire any actions that trigger when the designated address is jumped to. (Actions that trigger when the designated address is executed will execute as normal as those instructions execute.)
These commands alter the execution of other commands. They can be used to conditionally control the execution of the command list in order to provide multiple behaviors for an action.
While these commands can be used to skip some commands in the list, there is no way to move backwards in it. This is an intentional design choice, which ensures that pending commands can be safely discarded by the emulator once they have been executed or skipped.
Syntax: nop
This command does absolutely nothing; it can be used as a placeholder.
Syntax: done
This command terminates the execution of the command list; any further commands within the list MUST NOT be executed
after a done
command is executed.
This MUST NOT affect commands executed by other actions that happen to fire simultaneously.
Syntax: skip <expression>
This command causes a certain number of subsequent commands to be skipped. The number of skipped commands is determined by the given expression, which MUST be a constant expression; this expression MUST NOT evaluate to a negative value (if it is a signed expression) or to a value greater than the number of remaining commands in the list.
Syntax: if [<expression>]
This command causes the following command to be skipped if the given expression evaluates to zero; it MUST NOT be the last command in the command list. The given expression can be any arbitrary expression.
If the expression is omitted, the if
command behaves identically to the previous if
command; in other words, it
skips the following command if and only if the previous if
command did as well.
This MUST NOT cause the expression of the previous if
command to be evaluated again.
If an if
command without an expression is executed before any other if
command has been executed in the same
command list, it skips the following command, as if the condition had been false.
Syntax: else
This command causes the following command to be skipped if the last if
command executed did not skip the command
that followed it; in other words, it inverts the condition for the last if
command.
This MUST NOT cause the last if
command's expression to be evaluated again.
This command MUST NOT be the last command in the command list.
If an else
command is executed before any if
commands are executed in the same command list, it doesn't skip the
following command, as if the last if
command's condition had been false.
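The interaction between done, skip, if and else can be sketched as a small forward-only interpreter. This Python fragment is illustrative only: commands are modelled as tuples, expressions are assumed to be already-evaluated integers unless an evaluate callback is supplied, and the assumption that else leaves the remembered if result unchanged is not stated by the text above.

```python
def run_commands(commands, evaluate=lambda expression: expression):
    """Run a command list, returning the non-control commands that execute."""
    executed = []
    # Before any "if" has run, behave as if the last condition had been false.
    last_if_skipped = True
    index = 0
    while index < len(commands):
        name, *args = commands[index]
        index += 1
        if name == "done":
            break
        elif name == "skip":
            index += evaluate(args[0])  # constant, non-negative by definition
        elif name == "if":
            if args:  # an "if" without an expression reuses the previous result
                last_if_skipped = evaluate(args[0]) == 0
            if last_if_skipped:
                index += 1
        elif name == "else":
            if not last_if_skipped:
                index += 1
        elif name != "nop":
            executed.append((name, *args))
    return executed


program = [("if", 0), ("message", "a"), ("else",), ("message", "b")]
print(run_commands(program))  # [('message', 'b')]
```

The index only ever moves forward, which reflects the design choice that a command list cannot loop.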
Strings are used in several contexts within a debugfile.
They appear in several directives (@alias, @str, @group, @include, @symfile, @warning and @error) as well as in
some commands (message and alert).
Strings are also the only part of a debugfile (other than comments) that can contain non-ASCII characters: the full
range of UTF-8 is allowed within a string, except for C0 control characters (other than tabs, which will be replaced
with spaces by normalization).
Strings MUST be written within quotes ("), and they MUST NOT contain any embedded newlines.
Strings cannot contain literal quotes within them: a " character will always end the string.
Simple strings are always literal, and they contain no escape sequences; these are the strings used by the various
directives that take strings as arguments.
On the other hand, the message
and alert
commands allow strings that contain
escape sequences, in order to incorporate the result of evaluating an expression into those strings.
The @str
directive doesn't parse the strings passed to it, but it also accepts those escape
sequences, since those strings are evaluated when used in the commands mentioned before.
Escape sequences are described in the following section.
Escape sequences allow inserting variable text into a string, such as the result of evaluating an expression or some
text that is subject to a condition.
Escape sequences always occur between braces ({ and }); these signs MUST be properly matched when using a string
in a context that allows escape sequences.
There are three kinds of valid escape sequences:
- Expression escape sequences. These are of the form {<expression>} or {<expression>,<format>}.
- Selection escape sequences. These are of the form {<expression>:[<name>][:[<name>]...]}.
- Character replacement escape sequences. These are of the form {:<character>}.
Arbitrary whitespace is allowed (and ignored) between the tokens inside the braces for the first two forms. Of course, whitespace outside the braces is taken literally as part of the string.
For the purposes of determining the default base in force and whether expressions are signed or unsigned, expressions contained within escape sequences are always evaluated in the context of the command that contains or references the string.
If a string contains malformed or otherwise invalid escape sequences, the emulator SHOULD display an error and SHOULD
NOT perform any escape sequence substitutions on it whenever the string is displayed.
Emulators MAY instead treat such invalid sequences in strings used as arguments to the message
and alert
commands and to the @str
directive as syntax errors, as described in
the corresponding section.
Expression escape sequences are escape sequences that contain an expression and are replaced by the expression's value
when printed.
They take the forms {<expression>}
or {<expression>,<format>}
; the expression can be any arbitrary expression.
Since expression escape sequences can only occur in strings used as part of an action's commands, the corresponding
expression MUST be evaluated in the context of the action that contains the command: the default base and signedness
in force for the expression are the corresponding values for the action (as modified by @radix
and @signedness
directives that come before it, or the s
and ss
flags if
they are used in that action), and the available variables and symbols and their values are those that would have been
available for expressions in that action's condition and commands.
This is true even for strings defined by the @str
directive and referenced by name in a command or
in selection escape sequences contained in strings used by that command.
The <format>
part of the escape sequence determines how the value will be formatted when printed.
It can contain a number, a formatting character, or both; if it contains both, the number MUST precede the formatting
character.
The numbers used as part of the format specifier are always decimal, regardless of the default base in force; the
format character is not a base prefix in this context.
The number MUST NOT contain more than two digits.
Valid formatting characters are as follows:
Character | Description |
---|---|
# | Unsigned decimal number. |
$ | Hexadecimal number. Digits A through F MUST be printed in uppercase. |
% | Binary number. |
- | Signed decimal number, prefixed with a - if negative. |
+ | Signed decimal number prefixed with a - or + depending on its sign. The + sign is used for zero. |
The number indicates how many digits will be used to print the value. If the number is 0, the minimal number of digits needed to represent the value MUST be used; the value will never begin with a 0 digit unless it is equal to zero. Otherwise, the exact number of digits indicated MUST be used to print the value; the value MUST be prefixed with 0 digits if it is shorter than that length, and if it is longer, only the last digits are used. For signed decimal numbers, the sign is not taken into account when calculating the length.
If the number is omitted, it defaults to 0.
If the format character is omitted, it defaults to the character that matches the default base in force (as given
by the @radix directive); if that base is 10, the default is # for unsigned expressions and - for signed ones.
The <format>
part of the escape sequence MUST NOT be empty, but it MAY be left out entirely.
If it is, both defaults (number and format character) are used.
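The format rules can be sketched in Python as follows. The function names are hypothetical, and treating values as 32 bits wide is an assumption made for the sketch (this section only states that width for user-defined variables). Choosing the default format character from the base in force is left to the caller.

```python
import re

# Up to two decimal digits, optionally followed by one formatting character.
FORMAT_RE = re.compile(r"(\d{1,2})?([#$%+-])?")


def parse_format(text: str):
    """Split a non-empty <format> into (digit count, formatting character or None)."""
    match = FORMAT_RE.fullmatch(text)
    if not text or match is None:
        raise ValueError(f"invalid format: {text!r}")
    return int(match.group(1) or 0), match.group(2)


def format_value(value: int, width: int = 0, kind: str = "#") -> str:
    """Render a value according to a digit count and a formatting character."""
    unsigned = value & 0xFFFFFFFF  # assumption: values are 32 bits wide
    if kind in ("-", "+"):
        signed = unsigned - (1 << 32) if unsigned & 0x80000000 else unsigned
        sign = "-" if signed < 0 else ("+" if kind == "+" else "")
        digits = str(abs(signed))
    elif kind in ("#", "$", "%"):
        sign = ""
        digits = format(unsigned, {"#": "d", "$": "X", "%": "b"}[kind])
    else:
        raise ValueError(f"invalid formatting character: {kind!r}")
    if width:
        # Pad with zeros on the left, or keep only the last digits.
        digits = digits.zfill(width)[-width:]
    return sign + digits  # the sign does not count towards the width


print(format_value(255, *parse_format("4$")))  # 00FF
print(format_value(0x1234, 2, "$"))            # 34
print(format_value(5, 3, "+"))                 # +005
```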
Selection escape sequences are replaced by one out of a list of named strings depending on a value.
They take the form {<expression>:<name>:<name>:<name>...}
, where the expression is followed by one or more names
separated by colons.
The expression may be any expression, and the names may either be empty or be the name of a string declared with the
@str
directive.
There may be arbitrary amounts of whitespace between tokens inside the escape sequence.
The same caveats regarding default base and signedness that apply for expression escape sequences apply
for the expression in selection escape sequences.
The names (including the empty ones) are assigned sequential indexes, starting from 0. The escape sequence is replaced by the named string whose index matches the value of the expression; if the expression matches none of those indexes (because it is negative or higher than the maximum index), the last one of those strings is chosen. If the chosen string has an empty name, the escape sequence is replaced by an empty string.
The referenced named strings MUST have been previously declared in a @str
directive, and the chosen
replacement string MUST be parsed for additional escape sequences.
For example, a sequence of the form {<expression>::<name>}
will be replaced by the named string if the expression is
true (i.e., non-zero), or by nothing if the expression is false (i.e., equal to zero).
A selection escape sequence of the form {<expression>:<name>}
will be replaced by the named string unconditionally,
as it is the only possible option; emulators MAY choose not to evaluate the expression in that case.
(In the absence of private-use extensions, this makes no difference, as expressions have no side effects.)
Note that, if the name is missing, this replacement always results in the empty string.
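The index selection rule can be sketched in Python. The function name and the dictionary standing in for strings declared with the @str directive are hypothetical; the sketch returns the chosen string as-is and does not perform the further escape sequence parsing required for the replacement.

```python
def select_string(value: int, names: list[str], strings: dict[str, str]) -> str:
    """Choose the replacement for a selection escape sequence.

    names:   the colon-separated names, in order (empty names allowed)
    strings: named strings previously declared, keyed by name
    """
    # Out-of-range values (negative or too large) select the last entry.
    index = value if 0 <= value < len(names) else len(names) - 1
    name = names[index]
    return strings[name] if name else ""


declared = {"yes": "enabled"}
print(repr(select_string(0, ["", "yes"], declared)))  # ''
print(select_string(1, ["", "yes"], declared))        # enabled
print(select_string(7, ["", "yes"], declared))        # enabled
```

The example mirrors the {<expression>::<name>} form: index 0 is the empty name, and every non-zero value ends up on the named string.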
Character replacement escape sequences allow inserting characters that cannot otherwise be inserted in a string.
They take the form {:<character>}
, where <character>
is a single lowercase letter, and they are replaced by a
character that depends on the letter used.
No whitespace is allowed inside the braces for this form.
Valid replacement characters are the following ones:
Letter | Replacement character |
---|---|
c | Closing brace (}) |
n | Newline |
o | Opening brace ({) |
q | Double quotes (") |
t | Tab (U+0009) |
Newlines (i.e., the replacement for {:n}
) are implementation-defined; implementations may use a single line feed
character (U+000A) or a carriage return (U+000D) followed by a line feed.
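A minimal Python sketch of character replacement follows. It handles only this form of escape sequence, uses a single line feed for the newline (one of the two permitted choices), and uses hypothetical names; a real implementation would process all escape sequence forms in the same pass.

```python
import re

REPLACEMENTS = {"c": "}", "n": "\n", "o": "{", "q": '"', "t": "\t"}


def expand_character_escapes(text: str) -> str:
    """Replace every {:<letter>} sequence with the character it stands for."""
    def replace(match):
        letter = match.group(1)
        if letter not in REPLACEMENTS:
            raise ValueError(f"invalid replacement character: {letter!r}")
        return REPLACEMENTS[letter]

    # No whitespace is tolerated inside the braces for this form.
    return re.sub(r"\{:([a-z])\}", replace, text)


print(expand_character_escapes("say {:q}hi{:q}"))  # say "hi"
print(expand_character_escapes("{:o}x{:c}"))       # {x}
```

Since re.sub scans the input once and never rescans replaced text, a brace produced by {:o} cannot start a new escape sequence.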
An implementation can be said to be compliant with this specification if it implements it in full, accepting any valid debugfile and handling it as the specification describes. This can apply to any kind of tool (for example, it should be possible to create debugfile editors that generate correct debugfiles based on user selections), but it is mostly relevant for emulators.
However, at the time of writing, most known debugging emulators don't implement all the debugging capabilities required by this specification. In order to encourage adoption of the specification and with the purpose of achieving compliance progressively as more features are used, this specification allows for an emulator to claim "partial compliance", provided that it follows the criteria set in the following sections.
Partial compliance is compliance with parts of the specification, applicable to implementations that only implement some of its features. For instance, some directives or commands may not be recognized, or the implementation may impose additional restrictions not allowed by the specification, such as requiring that conditional expression subfields contain only simple expressions.
An emulator MAY claim partial compliance with this specification if it meets all of the following criteria:
- It implements all of the minimum required features specified in the following section;
- It documents its own name and version numbers, allowing the use of conditional inclusion directives with it; and
- It accurately documents which features are fully supported and which features are missing, known to be defective or only partially implemented.
The documentation MUST make it clear to all users what parts of the specification are usable, so that users can readily determine if they should conditionally exclude and/or rewrite some parts of their debugfiles to use them with the emulator in question.
In order to claim partial compliance, an emulator MUST support at least the following features of the specification:
- The identification directive, as described in section 4.1;
- Conditional inclusion directives, as described in section 4.2;
- The @warning and @error directives;
- The break command; and
- The x flag in the flags subfield.
Also, there MUST be enough features supported so that an action line of some form can be written.
This specification is the result of several partially-conflicting goals. The desire is to determine a format that allows easy recording and sharing of debugging information; however, any features supported by such a format become a burden for implementations to handle, and thus the desire for additional features must be balanced against the cost of implementing such features.
Several considerations within the format, such as always allowing @ prefixes for variables, come from the desire to support automated or tool-assisted generation of debugfiles. Tools that generate debugfiles may not have the full symbol list available at the time of writing debugging actions, and thus the @ prefix helps them emit the right expressions without having to assume that a symbol with some specific name will or will not be defined. The same applies to the various features that allow overriding defaults, such as base prefixes or the s and ss flags; these spare those tools from having to rely on their context.
Some restrictions are intended to make implementing the format easier. For instance, unary operators are only allowed at the beginning of an expression; this ensures that expressions are always alternating sequences of values and operators, without having to detect consecutive operators. Since the restriction exists to simplify implementations, emulators that can handle the additional burden are allowed to relax it, which helps them deal with accidental breaches of this rule more gracefully.
Finally, it is explicitly not the intention to turn this format into a programming language; features like loops are intentionally left out. User-defined variables are a concession to the utility of tracking some values while debugging a program (such as the iteration counter in the example in annex A). Moving forward, this specification is not intended to incorporate additional features that would turn the debugfile format into a full programming language, such as negative arguments to the skip command.
This annex is not normative.
The following is a full example file, in order to showcase various parts of the format:
@debugfile 1
@ifemu fooemu < 2.0.3beta
@warning "Your emulator is known to have issues with debugfiles; please consider updating."
@else ifemu baremu < 1.0
@warning "Your emulator may have issues logging null pointer accesses."
@always
@sym NULL $0000
; define a string that will contain the call site for the current function (from the stack)
@str knownloc "{&[sp!], 2$}:{[sp!], 4$}"
@str unknownloc "an unknown location"
@str callsite "{sp >= $c000 && sp <= $fdfe : unknownloc : knownloc}"
; {0:xyz} in a string will always include the named string xyz
@str rstmessage "RST ${@target, 2$} triggered from {0:callsite}! Resetting..."
NULL,$0038 x : message rstmessage; reset
NULL rw :
message "Null pointer access at {&pc, 2$}:{pc, 4$}!";
if @op == 1;
break
; track iterations of this loop
@var _iter 0
@var _total 0
FuncFoo.loop x : set _iter := _iter + 1
FuncFoo.loop x _iter >= 5000 :
set _total := _total + _iter;
message "Looped {_iter} times (accumulated: {_total})";
set _iter := 0
; prevent running code from RAM accidentally
$8000--$FF7F xx : alert "About to jump to a routine in RAM!"
; for HRAM, we handle it a bit differently, because the OAM wait loop is there
@group hramexec "HRAM execution prevention routines"
$FF80--$FFFF x : alert "Executing code from HRAM!"
hOAMWait.done xd : toggle hramexec
@endgroup
; only a call ($cd) or conditional call ($c4, $cc, $d4, $dc) instruction is allowed to do this
hOAMWait xx [pc] = $cd || [pc] & $e7 = $c4: toggle hramexec
; check for stack underflows and overflows everywhere, but alert only once
@group stackcheck
* xd sp < wStack || sp > wStackTop : alert "Stack overflow/underflow!";
disable
@endgroup
; make sure that the previous check is only enabled after the stack is set up
@alias InitDone "Init.initialization_done"
$0100 xbb sp = $FFFE : disable stackcheck
InitDone x :
if sp = wStackTop;
enable stackcheck;
else;
alert "Stack error: sp initialized to ${sp, 4$}!"
This annex is not normative.
Expressions are complex enough to be worth testing on their own. Therefore, this annex offers a number of test cases for implementors to verify their own expression evaluation engines against known results. These expressions are intended to exercise as many corner cases as possible; however, evaluating all these expressions successfully does not guarantee a correct implementation.
Examples are shown in a table containing the following columns:
- #: a simple sequential code for cross references (for example, in testing code). These codes are assigned sequentially, consist of two letters, and skip the letter O, as it can be mistaken for a zero. Sequence codes are unique across the entire annex.
- Expression: the expression being evaluated.
- Unsigned: the result of evaluating the expression in an unsigned context.
- Signed: the result of evaluating the expression in a signed context.
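The sequence codes described above can be reproduced with a short generator, which may be handy when labelling additional test cases. This is only a sketch in Python (a language this specification does not otherwise use); it assumes that the letter O is skipped in both positions, and the name `sequence_codes` is purely illustrative.

```python
from itertools import islice, product
from string import ascii_uppercase

# Letters usable in a sequence code: A-Z without O.
# Assumption: O is skipped in the first position as well as the second.
CODE_LETTERS = [letter for letter in ascii_uppercase if letter != "O"]


def sequence_codes():
    """Yield two-letter codes in order: AA, AB, ..., AN, AP, ..., AZ, BA, ..."""
    for first, second in product(CODE_LETTERS, repeat=2):
        yield first + second


first_codes = list(islice(sequence_codes(), 30))
print(first_codes[13:16])  # ['AN', 'AP', 'AQ']
print(first_codes[24:26])  # ['AZ', 'BA']
```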
Results are always shown in hexadecimal, with a $ prefix, expanded to their full 32-bit (8-hex-digit) precision. Results for address expressions are shown as a 32-bit (8-hex-digit) bank and a 16-bit (4-hex-digit) address separated by a colon; if the address expression is unbanked, no value is shown before the colon. (The address part of an address expression is always truncated to 16 bits.)
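Test code that compares an implementation's output against these tables needs to render results in the same notation. The following Python sketch shows one way to do so; the function names are illustrative, and using `None` to stand for "unbanked" is an assumption of this sketch, not something the specification prescribes.

```python
def format_value(value):
    """Format a result as `$` followed by 8 uppercase hex digits (wrapped to 32 bits)."""
    return "${:08X}".format(value & 0xFFFFFFFF)


def format_address(bank, address):
    """Format an address result; `bank` is None for an unbanked address."""
    bank_text = "" if bank is None else format_value(bank)
    # The address part is always truncated to 16 bits.
    return "{}:${:04X}".format(bank_text, address & 0xFFFF)


print(format_value(-1))                 # $FFFFFFFF
print(format_address(None, 0x12345))    # :$2345
print(format_address(20, 6000))         # $00000014:$1770
```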
Some address expressions may evaluate to an invalid address. This is not accounted for in these examples: when an address is used to reference a location in memory (i.e., not to declare a symbol), it is validated after being calculated. Likewise, bank numbers are shown as full 32-bit values, even though, if the corresponding addresses are used to reference a location in memory, their bank numbers would later have to be truncated to the correct width for that address.
All expressions assume that the default base in force (as set by the @radix directive) is 10. Any other assumptions are listed at the beginning of the corresponding section.
These constant expressions only involve literal constants, without any symbols.
# | Expression | Unsigned | Signed |
---|---|---|---|
AA | -0 | $00000000 | $00000000 |
AB | -1 | $FFFFFFFF | $FFFFFFFF |
AC | -$80000000 | $80000000 | $80000000 |
AD | +$80000000 | $80000000 | $80000000 |
AE | 1 << 20 | $00100000 | $00100000 |
AF | -1 << 20 | $FFF00000 | $FFF00000 |
AG | -1 << 40 | $00000000 | $00000000 |
AH | -1 << (-1) | $00000000 | $00000000 |
AI | 1 >> 20 | $00000000 | $00000000 |
AJ | -1 >> 20 | $00000FFF | $FFFFFFFF |
AK | -1 >> 40 | $00000000 | $FFFFFFFF |
AL | -1 >> (-1) | $00000000 | $FFFFFFFF |
AM | $aaaaaaaa >> 1 | $55555555 | $D5555555 |
AN | $aaaaaaaa << 1 | $55555554 | $55555554 |
AP | $80000000 >> $80000000 | $00000000 | $FFFFFFFF |
AQ | -1 + 2 | $00000001 | $00000001 |
AR | 2 - 5 | $FFFFFFFD | $FFFFFFFD |
AS | -4 * 3 | $FFFFFFF4 | $FFFFFFF4 |
AT | -4 * (-3) | $0000000C | $0000000C |
AU | -4 ** 3 | $00000002 | $FFFFFFFF |
AV | -4 ** (-3) | $FFFFFFF9 | $00000000 |
AW | 4 ** 3 | $00000000 | $00000000 |
AX | -8 / 4 | $3FFFFFFE | $FFFFFFFE |
AY | -8 / 7 | $24924923 | $FFFFFFFF |
AZ | 8 / (-7) | $00000000 | $FFFFFFFF |
BA | -8 / (-7) | $00000000 | $00000001 |
BB | -8 % 4 | $00000000 | $00000000 |
BC | -8 % 7 | $00000003 | $FFFFFFFF |
BD | 8 % (-7) | $00000008 | $00000001 |
BE | -8 % (-7) | $FFFFFFF8 | $FFFFFFFF |
BF | $80000000 / (-1) | $00000000 | $80000000 |
BG | 0 / 0 | $00000000 | $00000000 |
BH | 3 / 0 | $00000000 | $00000000 |
BI | -2 / 0 | $00000000 | $00000000 |
BJ | $80000000 % (-1) | $80000000 | $00000000 |
BK | 0 % 0 | $00000000 | $00000000 |
BL | 3 % 0 | $00000003 | $00000003 |
BM | -2 % 0 | $FFFFFFFE | $FFFFFFFE |
BN | $ffffffff > 0 | $00000001 | $00000000 |
BP | -1 = $ffffffff | $00000001 | $00000001 |
BQ | $7fffffff < $80000000 | $00000001 | $00000000 |
BR | -$80000000 = $80000000 | $00000001 | $00000001 |
BS | -3 >= (-5) | $00000001 | $00000001 |
BT | -42 <= (-42) | $00000001 | $00000001 |
BU | !$80000000 | $00000000 | $00000000 |
BV | !!(-1) | $00000001 | $00000001 |
BW | ~0 | $FFFFFFFF | $FFFFFFFF |
BX | ~$7fffffff | $80000000 | $80000000 |
BY | ~$20 & $30 | $00000010 | $00000010 |
BZ | -1 & $80000000 | $80000000 | $80000000 |
CA | $42 & (-1) | $00000042 | $00000042 |
CB | ~$c0 \| (~$60) | $FFFFFFBF | $FFFFFFBF |
CC | $42 \| $80000000 | $80000042 | $80000042 |
CD | $42 \| (-1) | $FFFFFFFF | $FFFFFFFF |
CE | $fffffffe ^ $fffffffd | $00000003 | $00000003 |
CF | $ffffffff ^ $ffff0000 | $0000FFFF | $0000FFFF |
CG | 1 ^^ 2 | $00000000 | $00000000 |
CH | -4 \|\| 0 | $00000001 | $00000001 |
CI | -4 && 0 | $00000000 | $00000000 |
CJ | (3 < 5) + (10 <= 20) + (45 >= 20) + (70 > 50) | $00000004 | $00000004 |
CK | -(8 = 8) - (10 == 10) - (20 <> 30) - (40 != 50) | $FFFFFFFC | $FFFFFFFC |
CL | (1 ^^ 0) != (0 ^^ 1) != (0 ^^ 0) | $00000000 | $00000000 |
CM | 1 * 2 + 3 * 4 - 5 | $00000009 | $00000009 |
CN | 2 * 3 / 4 * 5 | $00000005 | $00000005 |
CP | $f0 & $30 = $30 | $00000001 | $00000001 |
CQ | $f0 & 1 << 6 | $00000040 | $00000040 |
CR | $40 \| (-1) & $20 | $00000060 | $00000060 |
CS | -1 < 5 \|\| 4 > 2 | $00000001 | $00000001 |
CT | -1 < 5 && 4 > 2 | $00000000 | $00000001 |
CU | -1 < 5 ^^ 4 < (-2) | $00000001 | $00000001 |
CV | 2 < 3 && 3 < 1 \|\| 4 > (-2) | $00000000 | $00000001 |
CW | 2 = 3 < 5 = 5 | $00000001 | $00000001 |
CX | 1 << 25 * 1 << 6 | $80000000 | $80000000 |
CY | -1 >> 4 ** 1 << 8 | $0000000F | $FFFFFFFF |
CZ | -$532 / $10 % $10 | $0000000C | $FFFFFFFD |
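Several rows above exercise shifts, division, modulo and the `**` operator at the edges of the 32-bit range. The Python sketch below reproduces a handful of those rows. It is inferred from the table values alone, not from the normative operator definitions (which are not part of this annex), so every rule in it is an assumption: shift counts are read as unsigned, signed division truncates towards zero, dividing by zero yields zero, taking a remainder by zero yields the dividend, and `**` yields the upper 32 bits of the 64-bit product. All function names are illustrative.

```python
MASK = 0xFFFFFFFF


def to_signed(value):
    """Reinterpret a 32-bit pattern as a two's complement signed integer."""
    value &= MASK
    return value - 0x100000000 if value & 0x80000000 else value


def shift_left(a, count, signed=False):
    # The context does not matter here; the count is read as unsigned.
    count &= MASK
    return ((a & MASK) << count) & MASK if count < 32 else 0


def shift_right(a, count, signed):
    a &= MASK
    count &= MASK
    if signed:
        # Arithmetic shift; counts of 32 or more leave only copies of the sign bit.
        return (to_signed(a) >> min(count, 31)) & MASK
    return a >> count if count < 32 else 0


def divide(a, b, signed):
    a &= MASK
    b &= MASK
    if b == 0:
        return 0  # rows BG, BH, BI
    if not signed:
        return a // b
    a, b = to_signed(a), to_signed(b)
    quotient = abs(a) // abs(b)  # truncate towards zero
    if (a < 0) != (b < 0):
        quotient = -quotient
    return quotient & MASK


def modulo(a, b, signed):
    a &= MASK
    b &= MASK
    if b == 0:
        return a  # rows BK, BL, BM
    if not signed:
        return a % b
    a, b = to_signed(a), to_signed(b)
    remainder = abs(a) % abs(b)  # the remainder takes the sign of the dividend
    return (-remainder if a < 0 else remainder) & MASK


def multiply_high(a, b, signed):
    a &= MASK
    b &= MASK
    if signed:
        a, b = to_signed(a), to_signed(b)
    return ((a * b) >> 32) & MASK


# (row, function, left, right, unsigned result, signed result)
CHECKS = [
    ("AG", shift_left, -1, 40, 0x00000000, 0x00000000),
    ("AJ", shift_right, -1, 20, 0x00000FFF, 0xFFFFFFFF),
    ("AL", shift_right, -1, -1, 0x00000000, 0xFFFFFFFF),
    ("AU", multiply_high, -4, 3, 0x00000002, 0xFFFFFFFF),
    ("AY", divide, -8, 7, 0x24924923, 0xFFFFFFFF),
    ("BF", divide, 0x80000000, -1, 0x00000000, 0x80000000),
    ("BD", modulo, 8, -7, 0x00000008, 0x00000001),
    ("BM", modulo, -2, 0, 0xFFFFFFFE, 0xFFFFFFFE),
]

for row, function, left, right, unsigned, signed in CHECKS:
    assert function(left, right, False) == unsigned, row
    assert function(left, right, True) == signed, row
```

A real evaluator would also need parsing and operator precedence, which this sketch deliberately leaves out.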
These are constant expressions involving banked and unbanked symbols. They assume the following symbols are defined:
- TT, set to $0:$CAFE
- VV, set to $FFFF
- WW, set to $3:$DDDD
- XX, set to $F:$4000
- YY, set to $0:$4000
- ZZ, set to $4242
# | Expression | Unsigned | Signed |
---|---|---|---|
DA | ZZ | $00004242 | $00004242 |
DB | WW | $0000DDDD | $0000DDDD |
DC | YY | $00004000 | $00004000 |
DD | &&ZZ | $00000000 | $00000000 |
DE | &&VV | $00000000 | $00000000 |
DF | &&XX | $0000000F | $0000000F |
DG | &&YY | $00000000 | $00000000 |
DH | WW - TT | $000012DF | $000012DF |
DI | XX * VV | $3FFFC000 | $3FFFC000 |
DJ | VV + 1 | $00010000 | $00010000 |
DK | XX = YY | $00000001 | $00000001 |
DL | &&TT = (&&YY) | $00000001 | $00000001 |
DM | WW << 16 >> 16 | $0000DDDD | $FFFFDDDD |
DN | YY + VV | $00013FFF | $00013FFF |
DP | XX - ZZ | $FFFFFDBE | $FFFFFDBE |
DQ | (XX - ZZ) / 4 | $3FFFFF6F | $FFFFFF70 |
DR | (XX - ZZ) >> 2 | $3FFFFF6F | $FFFFFF6F |
DS | &&WW + WW | $0000DDE0 | $0000DDE0 |
DT | XX / (&&WW) | $00001555 | $00001555 |
DU | &&XX > WW | $00000000 | $00000000 |
DV | &&XX > (&&WW) | $00000001 | $00000001 |
DW | XX > YY | $00000000 | $00000000 |
DX | -WW | $FFFF2223 | $FFFF2223 |
DY | &&XX << 14 + (XX & $3fff) | $0003C000 | $0003C000 |
DZ | -1 = VV | $00000000 | $00000000 |
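The rows in this section are consistent with a simple model of symbols in ordinary (non-address) expressions: a symbol contributes only its address part, and the `&&` operator yields its bank, or zero for an unbanked symbol. The Python sketch below encodes that model and re-derives a few rows. The model is inferred from the table rather than taken from the normative text, and the names `Symbol`, `value_of` and `bank_of` are illustrative.

```python
from typing import NamedTuple, Optional

MASK = 0xFFFFFFFF


class Symbol(NamedTuple):
    address: int
    bank: Optional[int] = None  # None marks an unbanked symbol


# The symbol definitions assumed by this section.
SYMBOLS = {
    "TT": Symbol(0xCAFE, 0x0),
    "VV": Symbol(0xFFFF),
    "WW": Symbol(0xDDDD, 0x3),
    "XX": Symbol(0x4000, 0xF),
    "YY": Symbol(0x4000, 0x0),
    "ZZ": Symbol(0x4242),
}


def value_of(name):
    """Value of a symbol in an ordinary expression: its address, without the bank."""
    return SYMBOLS[name].address


def bank_of(name):
    """Result of `&&name`: the symbol's bank, or 0 when the symbol is unbanked."""
    bank = SYMBOLS[name].bank
    return 0 if bank is None else bank


print(hex((value_of("WW") - value_of("TT")) & MASK))  # 0x12df (row DH)
print(hex((bank_of("WW") + value_of("WW")) & MASK))   # 0xdde0 (row DS)
print(int(value_of("XX") == value_of("YY")))          # 1 (row DK)
print(bank_of("ZZ"))                                  # 0 (row DD)
```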
These examples are constant address expressions, as opposed to the ordinary expressions shown in the previous sections. They use the same symbol definitions as the previous section.
Note that, as specified in the conventions, bank numbers are shown in full 32-bit form, without any truncation. Some of these addresses may be invalid; this would be checked if needed after calculating them.
# | Address expression | Unsigned | Signed |
---|---|---|---|
EA | 0 | :$0000 | :$0000 |
EB | 6000 | :$1770 | :$1770 |
EC | 20:6000 | $00000014:$1770 | $00000014:$1770 |
ED | $6000 | :$6000 | :$6000 |
EE | $20:6000 | $00000020:$1770 | $00000020:$1770 |
EF | 20:$6000 | $00000014:$6000 | $00000014:$6000 |
EG | $20:$6000 | $00000020:$6000 | $00000020:$6000 |
EH | 0:$6000 | $00000000:$6000 | $00000000:$6000 |
EI | $12345 | :$2345 | :$2345 |
EJ | $12345 >> 4 | :$1234 | :$1234 |
EK | -1 >> 18 | :$3FFF | :$FFFF |
EL | $42424 >> 14:($42424 & $3fff) + $4000 | $00000010:$6424 | $00000010:$6424 |
EM | ZZ | :$4242 | :$4242 |
EN | WW | $00000003:$DDDD | $00000003:$DDDD |
EP | YY | $00000000:$4000 | $00000000:$4000 |
EQ | :ZZ | :$4242 | :$4242 |
ER | :WW | :$DDDD | :$DDDD |
ES | :YY | :$4000 | :$4000 |
ET | &&ZZ:ZZ | $00000000:$4242 | $00000000:$4242 |
EU | &&WW:WW | $00000003:$DDDD | $00000003:$DDDD |
EV | &&YY:YY | $00000000:$4000 | $00000000:$4000 |
EW | ZZ + 1 | :$4243 | :$4243 |
EX | WW + 1 | $00000003:$DDDE | $00000003:$DDDE |
EY | 1 + WW | :$DDDE | :$DDDE |
EZ | (TT) + VV | $00000000:$CAFD | $00000000:$CAFD |
FA | VV + TT | :$CAFD | :$CAFD |
FB | WW - TT | $00000003:$12DF | $00000003:$12DF |
FC | XX - (XX - ZZ) / 4 | $0000000F:$4091 | $0000000F:$4090 |
FD | $42:WW | $00000042:$DDDD | $00000042:$DDDD |
FE | VV:VV | $0000FFFF:$FFFF | $0000FFFF:$FFFF |
FF | :&&YY | :$0000 | :$0000 |
FG | WW * VV | $00000003:$2223 | $00000003:$2223 |
FH | YY + VV | $00000000:$3FFF | $00000000:$3FFF |
FI | YY - 1 | $00000000:$3FFF | $00000000:$3FFF |
FJ | -WW | :$2223 | :$2223 |
FK | XX = YY | $0000000F:$0001 | $0000000F:$0001 |
FL | WW + (TT - WW) / 2 | $00000003:$D46D | $00000003:$D46E |
FM | (TT - WW) / 2 + WW | $00000000:$D46D | $00000000:$D46E |
FN | ~(&&WW):~WW | $FFFFFFFC:$2222 | $FFFFFFFC:$2222 |
FP | XX + (ZZ - XX) / 17 | $0000000F:$4022 | $0000000F:$4022 |
FQ | XX - (XX - ZZ) / 17 | $0000000F:$3113 | $0000000F:$4022 |
FR | $23:$4567 + 1 | $00000023:$4568 | $00000023:$4568 |
FS | 1 + $23:$4567 | $00000024:$4567 | $00000024:$4567 |
FT | 2:WW > $d800 | $00000002:$0001 | $00000002:$0001 |
FU | $d800 < 2:WW | $00000000:$DDDD | $00000000:$DDDD |
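Row EL is worth a closer look, because its arithmetic matches the usual way of splitting a ROM file offset into a 16 KiB bank number and an address in the $4000-$7FFF region. The Python sketch below performs the same computation step by step. Reading the row as an offset conversion is an interpretation made here, not something the table states, and the function name is illustrative.

```python
def rom_offset_to_address(offset):
    """Split a ROM file offset into (bank, address), mirroring row EL."""
    bank = offset >> 14                   # 16 KiB ($4000 bytes) per bank
    address = (offset & 0x3FFF) + 0x4000  # position within the bank, based at $4000
    # As in every address expression, the address part is truncated to 16 bits.
    return bank, address & 0xFFFF


bank, address = rom_offset_to_address(0x42424)
print("${:08X}:${:04X}".format(bank, address))  # $00000010:$6424
```

The formula is applied exactly as written in row EL, so offsets below $4000 also come out based at $4000; whether that is appropriate depends on how the caller intends to use the address.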