Skip to content

ec429/bast

Repository files navigation

bast - ZX Basic tape compiler - readme
	bast reads text files containing ZX Basic programs (and machine code hex dumps), and creates a tape file (.TAP format) with those programs saved on it

Usage:
	bast [[-b] <basfile>]* [-l <objfile>]* [-O[-] <optim>]* [-W[-] <warning>]* -t <tapfile> [--emu]

Each <basfile> specifies a BASIC file to be saved onto the tape.  If <basfile> begins with a '-' you will need to use the -b specifier
Each <objfile> specifies a BINARY object file to be saved onto the tape
Each <optim> turns on an optimisation (-O- <optim> turns <optim> off).  See 'Optimisations' below.
Each <warning> turns on a warning (-W- <warning> turns <warning> off).  See 'Warnings' below.
<tapfile> specifies the output TAP file
--emu tells bast to open the created TAP file in an emulator once compilation has finished.  The command used is obtained by evaluating the environment variable $EMU and replacing each instance of the percent-sign '%' with the filename.  For instance, to use Phil Kendall's FUSE, you might type "export EMU=fuse\ %" (without quotes) at your shell prompt

Optimisations:
-O cut-numbers		The text parts of floating point numbers are replaced with '.', which is much shorter.  Since the BASIC interpreter on the Spectrum only reads the numeric part (the five bytes after the 0x0E), the textual representation need not be supplied.  However, it is then difficult to edit the program on the Spectrum and listings do not make sense (as all the numbers appear to be missing).  Off by default

Warnings:
-W all				Enables all warnings
-W object-length	Warns if an object file is missing a length directive (*xxxx).  Off by default
-W object-checksum	Warns if an object file is missing a checksum (== xx); only the first missing checksum in each file is reported.  On by default
-W se-basic			Warns if SE BASIC tokens have been used.  On by default
-W embedded-newline	Warns if a newline literal has been embedded in a line with a \xx escape, as this will likely confuse the Spectrum.  On by default

Source file directives:
	#pragma <pr> [<args>]	#pragma directives must precede all other directives and all source lines; only blank lines may precede a #pragma.  They must also be at start-of-line (ie. they may not be preceded by spaces or tabs)
	#pragma name <name>		If eg. TAP output is produced, <name> will be used as the Name of the segment resulting from this file
	#pragma line <line>		If eg. TAP output is produced, this file will be stored as though saved with 'SAVE "<name>" LINE <line>' (i.e., <line> is the autorun line)
	#pragma renum [=<start>] [+<offset>] [-<end>]	This file is not numbered (only labelled) and should be auto-numbered.  #pragma renum and line-numbers may not be mixed in a single source file: if you are going to auto-number, don't hardcode numbers in eg. GOTOs as these will NOT be updated.  The numberings of separate BASIC segments are completely unrelated; don't expect to be able to MERGE them unless you've specified a <start> and <end>.  <start> is the number to use for the first line; subsequent numbers step by at most <offset> (10 if not given); if this would overrun <end> (default 9999), the offset will be reduced, trying each of 8, 6, 5, 4, 3, 2, and 1.  If it still won't fit with a step of 1, bast throws an error

Other source file non-basic entities:
	.<label>				A label.  <label> must match "[[:alpha:]][[:alnum:]_]*"; that is, it must start with a letter (either case) and consist of letters, underscores and numbers only.  Labels must occur at the start of line; that is, they may not be preceded by whitespace.  They should be followed by a newline
	%<label>[+<index>]		Within an expression, is replaced by the line number of label <label>, which need not be in the same source file.  If <index> is present, it is added to the value when it is computed (<index> must be a hex pair and may range from +7F to -80)
	@<label>[+<index>]		Within an expression, is replaced by the address of the start-of-text of the line labelled <label>.  If <index> is present, it is added to the value when it is computed (<index> must be a hex pair and may range from +7F to -80)
	!link <objfile>			Expands to a REM statement containing the object code from <objfile> starting from the byte following the REM.  A typical design pattern is to give the line a label, and call the object code with 'usr @label+01'
	!hex <hex>			Within an expression, is replaced by the decimal value of hexadecimal <hex> (ie. like BIN).  E.g. "!HEX 1FF" -> "511"
	!oct <oct>			Within an expression, is replaced by the decimal value of octal <oct>.  E.g. "!OCT 307" -> "199"

Compilation process
Step 0: Director.  Directives are parsed and where possible acted upon
Step 1: Tokeniser.  Each line of BASIC is split into a series of tokens (such as KEYWORDS (characters 0xA3 to 0xFF) and numbers (in Sinclair floating point notation: 0x0E + 4mantissa + 1exponent, or 0E 00 {00|FF}sign LSB MSB 00 for small integers))
Step 2: Linker.  If a segment has no name, one will be generated for it of the form basN or binN where N starts from 0.  Renumbering is performed if directed.  Labels are translated
Step 3: Output.  Whichever kind of output is required (objects, tape, etc), is produced

Format of object files (.obj)
Optional ORG directive of the form '@B1FF' where B1FF is some hex value (big endian!).  If the code is to be linked it must have an ORG; if it is !linked (REM statements) it should not have, and any ORG it does have will be ignored
Optional NAME directive of the form '#<name>'
Optional length directive of the form '*009A' where 009a is some (big endian) hex value, the count of bytes
Rows of eight bytes followed by '==' and a checksum byte, all in hex pairs, like '01 00 00 c9 FD CB 01 81 == 7E'.  The checksum byte is the XOR of all eight data bytes.  If the file ends in the middle of a line, pad to length with '$$' entries