Two-way interoperability with C and code compiled with other tools #8

Closed
fidergo-stephane-gourichon opened this issue Dec 27, 2020 · 13 comments

Comments

@fidergo-stephane-gourichon

Hi Roudoudou! Note that I just state the ideal case, I don't expect you to say you'll do anything about it!

Need

In my "dreams", one could define common values (a.k.a. symbols) including constants, values compiled from other values, possibly with formulas, etc, and use them in all source code involved in a project, whatever their source language (C, ASM, others), and get final binaries incorporating everything cleanly, for example laying out everything in memory as needed.

What rasm does

Rasm lets you define symbols and compute values from one another with formulas. This is nice, though it currently works only with ASM code in rasm syntax.

Also, rasm doesn't compile C. That's fine, I don't expect rasm to compile all languages of this planet! ;-)

Question

Is there any plan for RASM to be able to:

  • in one direction, incorporate externally compiled stuff (e.g. from C) into RASM processing
  • in the other direction, compute values that become available to other steps (actually I'm not even convinced that it's RASM's job to do that)

Think before you ask

Available tools

  • rasm, obviously
  • SDCC toolchain (sdcc compiler, sdasz80 assembler, sdld linker, sdcpp preprocessor and a few others) is a complete toolchain similar to those used to build executables for modern OSes
  • something else?

Of particular interest is SDCC, a solid C compiler (it even handles most of C11, including generics), which can generate several formats:

  • generated Z80 assembly source code with SDCC syntax, .asm (example ld -4 (ix), a).
  • relocatable compiled code .rel (ascii-based format with in and out symbol information, area/sections, actual bytes in hexa, and byte-by-byte relocation information)
  • libraries .lib, traditionally used as intermediate format by many compilation toolchains (assembler, compiler, linker), it is mostly text, with .rel files embedded, and an optional common binary header (not sure it's necessary to parse it in the case here)

Additional, somewhat important information

  • rasm compiles one of the known Z80 syntaxes, not the one produced by SDCC (e.g. ld (ix-4),a not ld -4(ix),a).
  • support for .lib files is especially interesting. SDCC linker is smart enough to pick only modules that have at least one symbol referenced in other code. This allows to keep precompiled libraries of Z80 code, link them at will without recompiling any of the original code, knowing that no useless byte gets into the final binary.
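The "pick only referenced modules" behaviour described in the last bullet can be sketched as a reachability walk over symbol references. This is only an illustration in Python, not SDCC's actual linker code; the function name, the module names and the symbol tables are all hypothetical.

```python
from collections import deque


def select_modules(modules, roots):
    """Return the set of module names that must be linked.

    modules: dict mapping module name -> (defined symbols, referenced symbols)
    roots:   modules that are always linked (e.g. the main program)
    """
    # Index which module defines each symbol (first definition wins).
    definer = {}
    for name, (defined, _) in modules.items():
        for symbol in defined:
            definer.setdefault(symbol, name)

    selected = set(roots)
    queue = deque(roots)
    while queue:
        current = queue.popleft()
        _, referenced = modules[current]
        for symbol in referenced:
            provider = definer.get(symbol)
            if provider is not None and provider not in selected:
                selected.add(provider)
                queue.append(provider)
    return selected


# Hypothetical library: nothing references "sound", so it is left out.
library = {
    "main":   ({"_main"}, {"_draw_sprite"}),
    "sprite": ({"_draw_sprite"}, {"_wait_vsync"}),
    "vsync":  ({"_wait_vsync"}, set()),
    "sound":  ({"_play_tune"}, {"_wait_vsync"}),
}
print(sorted(select_modules(library, ["main"])))  # ['main', 'sprite', 'vsync']
```

A module is pulled in only when something already selected references one of its symbols, which is why no byte of "sound" would reach the final binary here.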

Possible ways to achieve the goal

  • Preprocess .asm files to generate (kind-of readable) asm source code with rasm syntax, feed that to rasm.

  • Preprocess .rel files to generate (unreadable) asm source code with rasm syntax, feed that to rasm.

  • Modify rasm so that it can consume .asm, .rel or .lib (would that be more flexible?).

  • Modify rasm so that it can pick only "modules" (or whatever it's named) that are actually referenced by code, so that we get again the "generated binary only contains actually used stuff" nice behavior. This is mostly independent of the rest and could be implemented independently.
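As a rough idea of what the first option (preprocessing .asm files) involves, here is a minimal Python sketch that rewrites only the indexed-addressing operand mentioned above. It assumes decimal displacements; real SDCC output has other syntax differences (immediates, directives, labels) that this does not handle, and the function name is hypothetical.

```python
import re

# Matches an SDCC-style indexed operand such as "-4 (ix)" or "5(iy)".
# The lookbehind avoids matching digits that end a label name.
_INDEXED = re.compile(r"(?<!\w)([+-]?\d+)\s*\(\s*(ix|iy)\s*\)", re.IGNORECASE)


def sdcc_indexed_to_rasm(line):
    """Rewrite 'disp (ix)' operands as '(ix+disp)' / '(ix-disp)'."""
    def rewrite(match):
        displacement = int(match.group(1))
        register = match.group(2)
        sign = "-" if displacement < 0 else "+"
        return f"({register}{sign}{abs(displacement)})"
    return _INDEXED.sub(rewrite, line)


print(sdcc_indexed_to_rasm("ld -4 (ix), a"))  # ld (ix-4), a
print(sdcc_indexed_to_rasm("ld a, 5(iy)"))    # ld a, (iy+5)
print(sdcc_indexed_to_rasm("ld a, (hl)"))     # ld a, (hl)
```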

Question

What do you think?

Again, note that I just state the ideal case, I don't expect you to say you'll do anything about it!

@EdouardBERGE
Owner

from SDCC to Rasm:
I guess the key is with Disark, provided with Arkos Tracker 2 in tools subdirectory
Z80 Disassembler. Disassembles any Z80 binary, reconstructs accurately the original code via its symbol table, including regions (code, byte, word, and pointer array).
you will transcode SDCC code to binary code to Rasm code

there is also https://gitlab.com/norecess464/sdcc2pasmo
very short project but i guess this may be fine with pasmo import for Rasm

for lib import i will need documentation (which is not on SDCC Wiki).

from Rasm to SDCC
If you mean .lib output support, i guess this is already achievable with macros (anyway, no doc on SDCC Wiki again)

note: Rasm already knows how to pick only modules referenced by code with the IFUSED directive, but it is supposed to be part of the Rasm assembly

let me know if you get information about that LIB format

@fidergo-stephane-gourichon
Author

Thanks for your attention.

I guess the key is with Disark, provided with Arkos Tracker 2 in tools subdirectory
Z80 Disassembler. Disassembles any Z80 binary, reconstructs accurately the original code via its symbol table, including regions (code, byte, word, and pointer array).
you will transcode SDCC code to binary code to Rasm code

Thanks for mentioning Disark. I cooperated with Arkos when making my latest game, reported some bugs and issues. AT2 is a great piece of engineering.

Disark was created to meet a need in the other direction: AT2 generates source code for rasm, and Disark makes it possible to compile AT2 players and then disassemble them into any of the other supported assembler syntaxes (winape, maxam, pasmo, sdcc, vasm, orgams). I actually used it for my latest game (because AT2 songs are generated with rasm syntax and my game is all in C and ASM with sdcc syntax) and it worked in that case, but it added a lot of complexity to the build process; things could be made simpler.

Disark could be used in the other direction, yet there are two problems. First, it is not open-source, which I avoid. Also, Disark is not magical: it relies on the source code being precisely marked with "semantic labels" http://julien-nevo.com/disark/index.php/examples/ http://julien-nevo.com/disark/index.php/label-semantics/ . Otherwise, when you recompile the generated source to a different address, Disark has to guess and can guess wrong. Source code generated by AT2 has these markers. Other source code won't have them: the cost of adding them manually is high, and source found on the Internet will not have those labels.

This would be better solved if rasm could output some relocatable format, like .rel.

there is also https://gitlab.com/norecess464/sdcc2pasmo
very short project but i guess this may be fine with pasmo import for Rasm

I cannot compile it on Linux due to Windows-specific code (see https://gitlab.com/norecess464/sdcc2pasmo/-/blob/master/src/crt.cpp#L43 ). It looks like code to find resources embedded in a Windows binary; this would have to be ported to Linux. Does the project work well for all source code that compiles with sdasz80? Anyway it would solve only one direction, and not the one I miss the most.

For the record, my latest game is fully compiled with SDCC, including music code converted by Disark. The main loader is made with rasm because I wanted to give it a try, and it worked because these are separate stages: sdcc-compiled code is just binary data in a file, from rasm-compiled-code point-of-view. Even assuming all code was converted to source code that rasm can read, one invocation of rasm would not have worked anyway, because rasm builds one image in memory and dumps it to output file, while my program has different things in the same memory addresses at different stages.

for lib import i will need documentation (which is not on SDCC Wiki).
let me know if you get information about that LIB format

.rel format is described on e.g. https://github.com/ixaxaar/sdcc/blob/6f6182eb7ea0f8ea1b653eb00e0cf8f06ab85a3e/sdas/doc/asmlnk.txt#L5102

.lib format is described on https://en.wikipedia.org/wiki/Ar_(Unix)

from Rasm to SDCC
If you mean .lib output support, i guess this is already achievable with macros (anyway, no doc on SDCC Wiki again)

Gathering several .rel files into a .lib is already possible with ar rcs foo.lib a.rel b.rel c.rel or sdar rcs foo.lib a.rel b.rel c.rel, so on the "rasm generating something" side, generating a .rel is highly valuable, while generating a .lib is maybe less important.

note: Rasm already knows how to pick only modules referenced by code with the IFUSED directive, but it is supposed to be part of the Rasm assembly

I see section 5.2.5 of current rasm documentation. I would never have understood this. Even after reading your reply I'm not even sure I understand how it can fit the need.
A module possibly defines and uses many symbols. Do you mean that each module could be surrounded like below?

IFUSED <list of all symbols exported by module>
...module source code...
ENDIF

Make rasm play nice with downstream tools

rasm is currently useful to create final binaries. The ability to generate relocatable files (.rel or others) would also make it useful for generating files to be processed in any workflow.

.rel format can be thought of as a kind of "binary" (actually stored in ASCII form) that provides, in a standardized format, information to downstream programs (after RASM). So, you generate something like "Hello! Here are my compiled binary bytes: 65 34 0f 8e etc. To relocate these binary bytes to final address RELOCATED (which I can't know in advance), you will need to add the 16-bit value of RELOCATED to bytes 123/124, also to bytes 200/201, etc. Also, code was compiled referencing the value of unknown external symbol A, so you'll have to add the value of A to bytes 456/457. Also, this binary data, when relocated, supplies to other code symbol B which is A+0x1890 and symbol C which is constant 0xabcd". The format is independent of any assembler and relatively simple to generate.
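The "message" a relocatable file carries, as paraphrased above, can be sketched in a few lines of Python. This illustrates the idea only, not the actual .rel encoding; the function names, the fixup lists and the sample bytes are all hypothetical.

```python
def add16(image, offset, value):
    """Add value to the little-endian 16-bit word stored at image[offset]."""
    word = int.from_bytes(image[offset:offset + 2], "little")
    image[offset:offset + 2] = ((word + value) & 0xFFFF).to_bytes(2, "little")


def link_module(code, base, internal_fixups, external_fixups, externals, exports):
    """Place one relocatable module at address `base`.

    internal_fixups: offsets of 16-bit words that need `base` added
    external_fixups: (offset, symbol) pairs that need the symbol's value added
    exports:         name -> (symbol or None, addend); None means a constant
    """
    image = bytearray(code)
    for offset in internal_fixups:
        add16(image, offset, base)
    for offset, name in external_fixups:
        add16(image, offset, externals[name])
    resolved = {}
    for name, (relative_to, addend) in exports.items():
        origin = externals[relative_to] if relative_to is not None else 0
        resolved[name] = (origin + addend) & 0xFFFF
    return bytes(image), resolved


# Hypothetical module assembled as if loaded at 0: call 0x0006 ; ld hl,A ; ret
code = bytes([0xCD, 0x06, 0x00, 0x21, 0x00, 0x00, 0xC9])
image, symbols = link_module(
    code,
    base=0x8000,
    internal_fixups=[1],           # operand of the call
    external_fixups=[(4, "A")],    # operand of ld hl,A
    externals={"A": 0x1234},
    exports={"B": ("A", 0x1890), "C": (None, 0xABCD)},
)
print(image.hex())
print({name: hex(value) for name, value in symbols.items()})
```

The point of the format is that the producer (assembler or compiler) only records the fixups, and the consumer (linker) decides `base` and supplies the external values.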

For example, such a possibility would instantly simplify integration of AT2 tracks, by removing the need to use the closed-source Disark. Just compile the generated rasm source to a .rel. That .rel joins the others (from C, asm, sprite data, music data, whatever) gathered by the linker, which then makes the final binaries at addresses automatically figured out, with automatic module pruning.

All in all, .rel output would be a great evolution to rasm!

@redbug26

I’m also interested in .rel output ;)

@EdouardBERGE
Owner

hi
i'm still thinking about it, did not have time these days
i will have to install SDCC and take a look as the documentation is... ...minimalistic
thank you for your patience, this will be a cool feature :)

@fidergo-stephane-gourichon
Author

Thanks Roudoudou for your interest!

Regarding documentation :

The doc above mentions the linker and file format in several areas, so not seeing an answer in one area means one should probably look in other areas.

You might like this, not directly related to .REL file format and linking, still in the general topic:

@EdouardBERGE
Owner

After some reverse engineering + doc reading, i still do not understand 100% how REL works so...

I started something similar instead of being blocked ^_^

Here is a sample code

BUILDOBJ
external unevar
var_internal=50
ld hl,unevar
call unevar
ld hl,var_internal
call var_internal

lz48:defs 20:ei:ret:ei:ret:lzclose ; test relocation of external and jump relocation

ld (unevar),hl
call #1000
call #10

and here is the output

EXTERNAL UNEVAR 0x0001 2
EXTERNAL UNEVAR 0x0004 2
EXTERNAL UNEVAR 0x0017 2
RELOCATION 0x001D
LONGJUMP 0x001A ; CALL or JP outside scope
LONGJUMP 0x000A ; CALL or JP outside scope
DATA START=0x0000 LEN=0x001F
BYTE 0x21,0x00,0x00,0xCD,0x00,0x00,0x21,0x32,0x00,0xCD,0x32,0x00,0x00,0x0F,0x01,0x00
BYTE 0x40,0xFB,0xC9,0xFB,0xC9,0xFF,0x22,0x00,0x00,0xCD,0x00,0x10,0xCD,0x10,0x00

As you can see, absolute CALL/JP inside scope are tagged "RELOCATION" whereas CALL/JP outside scope are annotated LONGJUMP but not supposed to be modified (syscall, player, ...)
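To show how a downstream tool might consume this listing, here is a minimal Python sketch that parses the exact output above and places the bytes at a chosen address. The interpretation is an assumption drawn from this thread, not from rasm documentation: the last field of EXTERNAL is taken as a size in bytes, RELOCATION as the offset of a 16-bit word that needs the load address added, and LONGJUMP as informational only. The function names, load address and symbol value are hypothetical.

```python
LISTING = """\
EXTERNAL UNEVAR 0x0001 2
EXTERNAL UNEVAR 0x0004 2
EXTERNAL UNEVAR 0x0017 2
RELOCATION 0x001D
LONGJUMP 0x001A ; CALL or JP outside scope
LONGJUMP 0x000A ; CALL or JP outside scope
DATA START=0x0000 LEN=0x001F
BYTE 0x21,0x00,0x00,0xCD,0x00,0x00,0x21,0x32,0x00,0xCD,0x32,0x00,0x00,0x0F,0x01,0x00
BYTE 0x40,0xFB,0xC9,0xFB,0xC9,0xFF,0x22,0x00,0x00,0xCD,0x00,0x10,0xCD,0x10,0x00
"""


def parse_obj_listing(text):
    """Parse the listing into a dict of records (comments after ';' are dropped)."""
    obj = {"externals": [], "relocations": [], "longjumps": [],
           "start": None, "length": None, "data": bytearray()}
    for raw in text.splitlines():
        line = raw.split(";", 1)[0].strip()
        if not line:
            continue
        keyword, _, rest = line.partition(" ")
        if keyword == "EXTERNAL":
            name, offset, size = rest.split()
            obj["externals"].append((name, int(offset, 16), int(size)))
        elif keyword == "RELOCATION":
            obj["relocations"].append(int(rest, 16))
        elif keyword == "LONGJUMP":
            obj["longjumps"].append(int(rest, 16))
        elif keyword == "DATA":
            fields = dict(item.split("=") for item in rest.split())
            obj["start"] = int(fields["START"], 16)
            obj["length"] = int(fields["LEN"], 16)
        elif keyword == "BYTE":
            obj["data"].extend(int(value, 16) for value in rest.split(","))
        else:
            raise ValueError(f"unknown record: {line!r}")
    if len(obj["data"]) != obj["length"]:
        raise ValueError("BYTE records do not match LEN")
    return obj


def place(obj, base, symbols):
    """Return the bytes as they would look when loaded at `base`."""
    image = bytearray(obj["data"])
    for name, offset, size in obj["externals"]:
        image[offset:offset + size] = symbols[name].to_bytes(size, "little")
    for offset in obj["relocations"]:
        word = int.from_bytes(image[offset:offset + 2], "little")
        image[offset:offset + 2] = ((word + base) & 0xFFFF).to_bytes(2, "little")
    return bytes(image)   # LONGJUMP targets are deliberately left untouched


obj = parse_obj_listing(LISTING)
image = place(obj, base=0x4000, symbols={"UNEVAR": 0x8000})
print(image.hex())
```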

Will try to read again documentation ;)

@cpcitor

cpcitor commented Apr 20, 2022

Hi @EdouardBERGE.

I must have missed the notification about this.

Seems like a good start.

As you can see, absolute CALL/JP inside scope are tagged "RELOCATION" whereas CALL/JP outside scope are annotated LONGJUMP but not supposed to be modified (syscall, player, ...)

I could understand this sentence after reading it several times and counting bytes. Answer: yes, I see, and all of this has a "consistent" feel.

I also learn what bytes are generated by lz48 and lzclose. :-)

0x00,0x0F,0x01,0x00
BYTE 0x40
0xFF

I guess this is early debug output. Do you want to create your own relocatable format? To get most benefits with smallest effort I would start with generating and reading the relocatable output format used by SDCC.

From documentation (about crunched zones):

You cannot call a label located after a crunched zone from the crunched zone because RASM cannot determine where it will be located after crunching. This will trigger an error.

(Personally, I would not want to use lz48 or similar. These seem too magical: I don't know how they work, where my code will be, etc. For https://cpcitor.itch.io/just-get-9 I implemented my own workflow where a loader loads compressed sections, moves them in RAM as needed to overcome firmware limitations, then uncompresses them, without using any LZ macro of Rasm.)

Back to the topic, I see that packing in zillions of features can make it more complex to support many combinations. Still, I'm wondering whether supporting relocation might make it possible to call a label located after a crunched zone. Even if not, that's okay.

You don't have to support all combinations. Even only some support to generate relocatable objects in SDCC format would be great for me, no more Disark dependency to build prods with AT2 music, simplified build process. And more benefits with the other way.

@EdouardBERGE
Owner

To get most benefits with smallest effort I would start with generating and reading the relocatable output format used by SDCC.

I did not find relevant information about REL format in your links so...
I did some very simple REL output with a few functions, externals, in order to reverse engineer the REL format
But it mostly looks like an unnecessarily complex system AND obfuscated

I really wanted to output REL format but without a proper documentation, i cannot go further
Note that everything is ready for it, as you can see in the prototype

@cpcitor

cpcitor commented Apr 20, 2022

Ok, this motivates me to find some more doc or even write it. Don't hold your breath as I'm busy all around, but I'll keep this in a corner of my mind and I'll be back.

@EdouardBERGE
Owner

looks like REL obfuscation finally beats us ?

@cpcitor

cpcitor commented Dec 9, 2022

Hi @EdouardBERGE .

I have not notified here, yet there is progress.

In the meantime (sprinkled in late August, September and October), based on documentation I linked above, for example https://sourceforge.net/p/sdcc/code/HEAD/tree/trunk/sdcc/sdas/doc/asmlnk.txt#l5187 , I have started to write a linker that is (not yet but to be one day) aware of CPC specifics.

Currently, it reads most of what we need from the .rel format, yet not enough to be really useful. It's been paused since, after struggling with some details of how relocation information is encoded. The .rel format can encode more than we need and has some aspects specific to other machines that I won't try to parse (just exit with an error message, if encountered). That's okay.

Milestones will be when it can generate a proper, complete, relocated simple executable, then a working build of "Just Get 9", then when cpc-dev-tool-chain provides support for exotic memory layouts (including extra banks) with what is usually called a generic "run-time linker". I will probably need that for an ambitious prod to come (but don't hold your breath).

Efforts are not completed yet. The experimental code you wrote (the code that produced the output you showed above) is an interesting starting point. This will be useful if/when I contribute a relocatable output (whatever the format) to rasm. Can you share that code in a github branch?

Thanks.

@EdouardBERGE
Owner

EdouardBERGE commented Dec 9, 2022

In fact the OBJ output is already in the 1.7 release from april
So you could test like sample code i posted in february if you want
For the moment, i won't go any further with that obfuscated REL output* : /

*you posted me the 10 lines "documentation" x4 times, it does not help. I got moar information by reverse engineering and there are still mysteries

@cpcitor

cpcitor commented Dec 9, 2022

*you posted me the 10 lines "documentation" x4 times, it does not help. I got moar information by reverse engineering and there are still mysteries

Ah, sorry. I also had to look at the code for missing details. Sdcc source code style feels very old (indeed I guess large chunks of it were literally written in the previous century). Like, optimized to do just one job at a time from bytes streaming from input file to output file, no big in-memory data structure, etc. It feels like some parts of it (the assembler, the linker) could even run on a CPC.

In fact the OBJ output is already in the 1.7 release from april

Great! Now I understand why you marked this issue completed.

So you could test like sample code i posted in february if you want
For the moment, i won't go any further with that obfuscated REL output* : /

This should be fine. A solution is that my linker also reads your relocatable format. Will focus on it when time permits.

Cheers! 😺
