
STM8 eForth Compiler and Assembly

Thomas edited this page Jan 13, 2019 · 16 revisions

Why Assembler?

STM8 eForth packs an optimised and efficient Forth into a small amount of Flash. The remaining space is sufficient for the majority of programming tasks. However, there are times when a little bit of Assembly goes a long way. Some words can be coded in Assembler so efficiently that a much larger task can be squeezed into the available space. There are also times when you will want to emulate a start-up delay, like the one an AVR fuse setting provides on Atmel MCUs. That requires some Assembler and patching of the reset vector. Forth is by nature close to the underlying machine code of the MCU. Understanding Assembler can allow you to achieve things in STM8 eForth that would otherwise be impossible.

Another reason to write some assembler code is optimizing inner loops of time-critical routines, e.g. for "bit-banging" or for interrupt service routines.

Assembly code fragments can also be generated by Forth "compiler extension" words. Examples are ]B! or ]B? for setting or testing bits.

Code Generated by the Compiler

STM8 eForth is a Subroutine Threaded (STC) Forth with 16-bit word width. Unlike DTC or ITC code, STC doesn't require an "Inner Interpreter". The native CALL instruction replaces the interpreter, and compiled Forth is simply executable machine code with subroutine calls.

In order to reduce the (sometimes substantial) overhead, the STM8 eForth compiler applies the following optimizations:

| STM8 eForth (bytes) | Standard STC (bytes) | DTC (bytes) | Comment |
|---|---|---|---|
| RET (1) | CALL EXIT (3) | EXIT (2) | |
| JP <target> (3) | CALL BRANCH <target> (5) | BRANCH <target> (4) | also see ALIAS feature |
| CALLR <rel> (2) | CALL <target> (3) | <target> (2) | only calls "within reach" |
| TRAP <literal> (3) | CALL DOLIT <literal> (5) | DOLIT <literal> (4) | TRAP serves as a pseudo-opcode |

The optimizations in STM8 eForth improve code density. Depending on the number of control structures and literals, the resulting code size can be comparable to that of DTC code.

Note that the ALIAS feature for "headerless" code depends on the optimization for BRANCH, i.e. on the STM8 opcode JP being the first byte in the code field. This feature, together with the STC capabilities for in-line assembly, can be used for writing dense code, similar to hand-optimized assembly (XH-M194/board.fs is a good example).
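As a rough worked example of the table above, consider a hypothetical word ADD10 (it is not part of the library). The byte counts in the comments are derived only from the table. They cover the body without the dictionary header, and they assume that + is outside CALLR reach and that no further optimization applies. The code the compiler actually emits may differ.

```forth
\ Hypothetical word, used only to compare body sizes per threading model
: ADD10 ( n -- n+10 )  10 + ;

\ model          literal 10           call of +     exit            total
\ STM8 eForth    TRAP 10       (3)    CALL +  (3)   RET       (1)     7
\ Standard STC   CALL DOLIT 10 (5)    CALL +  (3)   CALL EXIT (3)    11
\ DTC            DOLIT 10      (4)    +       (2)   EXIT      (2)     8
```

If + were within CALLR reach, the STM8 eForth body would shrink by one more byte.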

Forth Words in Assembly

The STM8 X register serves as the Data Stack Pointer. Except for intended stack pointer manipulations, Forth words in assembly must leave the value of X unchanged.

The STM8 eForth library folder contains a number of words that create machine code at compile time. Especially words that write constants to memory, set or reset bits, or inject special opcodes like HALT are useful.

The STM8 16-bit instruction set isn't fully symmetrical. For TOS manipulation, it's useful to make the X and Y registers perform a little "dance" around the opcode EXGW. An example is SRL in the library:

\ SRL: shift right logical (unsigned divide by 2)
: SRL ( n -- n )
   \ LDW Y,X , LDW X,(X) , SRLW X , EXGW X,Y , LDW (X),Y
   [ $9093 , $FE C, $54 C, $51 C, $FF C, ]
   ;
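A quick check at the console shows the difference between a logical and an arithmetic shift. This is only a sketch: it assumes SRL has been defined as above, that the number base is decimal, and that the core word 2/ performs an arithmetic shift.

```forth
\ assumes SRL from above is already defined and BASE is decimal
8 SRL .     \ 4
-2 SRL .    \ 32767 : -2 is $FFFE, the logical shift moves a 0 into bit 15
-2 2/ .     \ -1    : the arithmetic shift keeps the sign bit
```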

STM8 addressing modes that use the X register are more compact than those using the Y register. The helper word DOXCODE can be used for writing compact assembly code by providing an "execution capsule" for TOS in X. Here is the CRC-16-ANSI code from the STM8 eForth library as an example:

\ STM8: CRC-16-ANSI,  polynomial x^16 + x^15 + x^2 + 1
#require DOXCODE

\ CRC-16-ANSI (seed with -1 for Modbus CRC)
: CRC16 ( n c -- n )
   XOR DOXCODE [
      $A608 ,  \         LD      A,#8
      $54 C,   \ 1$:     SRLW    X
      $2407 ,  \         JRNC    2$
      $01 C,   \         RRWA    X   ; XOR X,#0xA001
      $A801 ,  \         XOR     A,#0x01  
      $01 C,   \         RRWA    X
      $A8A0 ,  \         XOR     A,#0xA0
      $01 C,   \         RRWA    X
      $4A C,   \ 2$:     DEC     A
      $26F3 ,  \         JRNE    1$
   ] ;
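One way to sanity-check CRC16 is the usual CRC-16/MODBUS check value: the ASCII string "123456789" with seed -1 (i.e. $FFFF) yields $4B37. The sketch below feeds the nine character codes one at a time. It assumes CRC16 and DOXCODE have been loaded, and it relies on . printing in the current base.

```forth
\ assumes CRC16 from above is already defined
HEX
-1  31 CRC16 32 CRC16 33 CRC16 34 CRC16 35 CRC16
    36 CRC16 37 CRC16 38 CRC16 39 CRC16
.          \ 4B37 is the CRC-16/MODBUS check value for "123456789"
DECIMAL
```

For a real Modbus frame the CRC is transmitted low byte first.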

Mixing Forth Code and Assembly

Compiled Forth code and assembly code can be mixed as long as X, the Data Stack Pointer, remains unchanged (except for stack pointer manipulation).

The interpreter mode can be used to assemble machine code. For example, the following code compiles either a BSET <addr>,#<bit> or a BRES <addr>,#<bit>:

\ STM8EF : ]B!                                                         MM-170927
  \ Enable the compile mode and compile the code to set|reset the bit at addr.
  : ]B! ( 1|0 addr bit -- )
     ROT 0= 1 AND SWAP 2* $10 + + $72 C, C, , ]
; IMMEDIATE
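A minimal usage sketch: since ]B! switches back to compile mode itself, its arguments are given in interpreter mode after [ inside a colon definition. The word names PB5.HI and PB5.LO are hypothetical, and the address $5005 is an assumption (PB_ODR on STM8S devices). Adjust address and bit for your board, and configure the pin as an output first.

```forth
\ assumes ]B! from above is already defined
\ Assumption: $5005 is PB_ODR (STM8S); the pin must already be an output
: PB5.HI ( -- )  [ 1 $5005 5 ]B! ;   \ compiles 72 1A 50 05 = BSET $5005,#5
: PB5.LO ( -- )  [ 0 $5005 5 ]B! ;   \ compiles 72 1B 50 05 = BRES $5005,#5
```

The second opcode byte follows from the definition: 2 * 5 + $10 = $1A for BSET, plus 1 for BRES when the value is 0.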

There are more words in the library that directly compile STM8 machine code:

| Word | Description |
|---|---|
| ]BC | copy bit from memory location to C flag |
| ]CB | copy bit from C flag to memory location |
| ]B? | copy bit from memory location to Boolean on data stack |
| ]C! | copy byte constant to memory location |

In compiled Forth code, A, Y, and YTEMP are working registers that may contain the TOS value after the execution of most core Forth words. Assembly can be mixed with core Forth words when their effect on these registers is known:

The STM8EF glossary docs/words.md (a list of Forth words in forth.asm) describes the effect on flags and scratchpad variables in an extended data-flow notation:

;       @       ( a -- n )      ( TOS STM8: -- Y,Z,N )
;       Push memory location to stack

For programming in assembly this stack comment can be read as: "After execution of @ the register Y contains the TOS (top of stack) value, and the flags Z (zero) and N (negative) correspond to n". The Forth stack comment conventions are as described here.

There are Forth words for passing data to or from the stack and A or Y, e.g. for using these registers as index to the memory, or for coding inner loops of low-level peripherals access or interrupt routines:

| Word | Description |
|---|---|
| >A | pull data stack to A and Flags |
| A@ | push contents of A:shortAddr to data stack |
| A> | push A to data stack |
| >Y | pull data stack to Y and Flags |
| Y@ | push contents of Y:Addr to data stack |
| Y> | push Y to data stack |

These words are available as ALIAS definitions in out/<board>/target.

Forth Cross-Assembler in e4thcom

There is an experimental Forth cross-assembler in e4thcom. Some STM8 opcodes cannot yet be rendered, and it's not fully tested. If you're interested in testing it, please open an issue.

Forth Boot

This section uses the MINDEV board as an example. Enough information is provided to allow you to apply these concepts to the other supported boards.

During the start-up process the STM8 chip waits until the power supply voltage rises to an acceptable level, sets default register values and then starts program execution at the reset vector address, $8000. Looking at the file MINDEV.ihx with the ST visual programmer, one can see that the first four bytes of machine code at $8000 are $82 $00 $80 $6B. This is the reset vector; its last two bytes, at $8002, hold the address $806B.

On reset, program execution begins at $806B, which hooks into COLD. COLD calls 'BOOT and then jumps to QUIT to start interpreting.

The mechanics of this are not vital. But if you look at COLD in the listing FORTH.RST, you will see at line 519 that COLD initialises the UART. That is fine until you discover that a device attached to the UART has not yet initialised, and the attached device latches up.
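The reset vector can also be inspected from the console. The sketch below assumes the unmodified MINDEV image described above; other boards or builds will show a different address at $8002.

```forth
HEX
$8000 @ .   \ 8200 : opcode byte $82 followed by the high address byte $00
$8002 @ .   \ 806B in the MINDEV image described above
DECIMAL
```

The STM8 stores 16-bit values high byte first, so @ at $8002 returns the two low address bytes as one cell.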

The fix? With an AVR we would set a fuse to delay start-up until more clock cycles have passed. With the STM8 we can do this from the terminal prompt:

NVM HERE \ pushes address we are about to start writing op-codes to onto the stack
      $90AE , 12500 ,  \     LDW   Y, #(t*8)
      $A604 ,          \ 1$: LD    A, #4
      $4A C,           \ 2$: DEC   A
      $26FD ,          \     JRNE  2$ 
      $905A ,          \     DECW  Y
      $26F7 ,          \     JRNE  1$
      $CC C, $8002 @ , \     JP    #(RESET)  remember, $8002 is where the reset vector was pointing to.
$8002 ! RAM            \ save the address we started writing the op-codes at into the reset vector.

Now, every time the device is turned on it runs this piece of code, which does nothing more than insert a delay before COLD is executed. With a value of 125 the routine above results in a 1ms delay (125 * 8µs) before COLD is called. The value 12500 used above gives a delay of 100ms. Depending on the required delay, some bytes can be saved by removing the inner loop.

If at a later date you wish to RESET then you must first restore the reset vector with:
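The loop count for other delays follows from the same arithmetic. MS>COUNT is a hypothetical helper, not a library word, and it assumes the 8µs per outer loop pass stated above.

```forth
\ Hypothetical helper: outer loop count for a delay given in ms
\ Assumption: one outer loop pass takes 8 us, i.e. 125 passes per ms
: MS>COUNT ( ms -- n )  125 * ;

100 MS>COUNT .   \ 12500
```

Since Y is a 16-bit register, the count is limited to 65535, which corresponds to roughly 524ms with this routine.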

NVM 
$8002 @ $E +  \ address of the saved old reset vector: $E bytes after the start of the delay routine
@ $8002 !     \ fetch the old reset vector address and write it back into the reset vector
RAM

'BOOT will, of course, work normally.
