SubX in SubX: computing addresses for labels #34

akkartik · 2019-05-18T22:01:29Z

This branch is a work in progress on implementing SubX in SubX. The big-picture plan is to divide up the work into the following phases:

$ cat file1.subx file2.subx ... |dquotes |assort |pack |survey |hex > a.elf

Each phase is an ELF binary constructed out of a corresponding .subx file.

hex.subx converts a sequence of bytes in hex into a binary. (done)
survey.subx will turn labels into addresses, and also slap on ELF headers. It's called 'survey' because the act of imposing a coordinate system on a program feels somewhat like surveying terrain and determining various distances and heights. (in progress here)
pack.subx converts bitfields in instructions into whole bytes in the right order. For example, 0/mod 1/rm32 1/r32 is turned into the single byte 09. (done)
assort.subx just reads in many segments starting with == and containing lines of code or data, and concatenates all segments with the same name. (done)
dquotes.subx moves string literals in the code segment to the data segment. (done)
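To illustrate the kind of bitfield packing the pack phase performs, here is a minimal Python sketch of combining the three ModR/M fields into one byte. It uses the standard x86 layout (mod in bits 7-6, r32 or subop in bits 5-3, rm32 in bits 2-0). The function name `pack_modrm` is hypothetical and is not taken from pack.subx.

```python
def pack_modrm(mod, reg, rm):
    """Pack the x86 ModR/M fields into a single byte.

    mod occupies bits 7-6, reg (r32 or subop) bits 5-3, rm bits 2-0.
    The name and signature are illustrative, not from pack.subx.
    """
    if not (0 <= mod <= 3 and 0 <= reg <= 7 and 0 <= rm <= 7):
        raise ValueError("ModR/M field out of range")
    return (mod << 6) | (reg << 3) | rm


# 3/mod 0/rm32 3/r32 packs to the byte d8
print(format(pack_modrm(3, 3, 0), "02x"))
```

The range check matters because a field that overflows its bits would silently corrupt its neighbours.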

This branch is where the survey phase is being built.

To run its tests (and so see how far along things are):

$ ./subx translate 0*.subx apps/subx-common.subx apps/survey.subx -o apps/survey  &&  \
      ./subx run apps/survey test

Contact me if you'd like to contribute. Commit access freely given.

akkartik · 2019-05-18T22:03:24Z

@charles-l I just spun up this PR for survey.subx. No failing tests yet. The only mildly interesting thing to look at may be the ELF header definitions.

add lengths to data blobs

akkartik · 2019-05-19T19:50:28Z

@charles-l I'm still cleaning up #32 so haven't written any tests here yet. If you're looking for something to do, could you create a function here called emit-elf-header that prints out the Elf_header string? For now, convert can consist of just emit-elf-header followed by copying stdin to stdout.

Any tests I think of for this seem pretty silly, so let's just test it manually at the commandline.

akkartik · 2019-05-21T05:32:31Z

I'm thinking hard about how to build this phase, now that dquotes is fully merged (yay!)

There seem to be three tasks to perform here (and I'm constantly wondering whether to split them up into multiple phases, but they seem to need some common offset tracking as they read stdin):

1. Assign a start address to each segment.
   1a. User programs provide an approximate start address for segments. But we have to perturb it slightly so the offset at which a segment starts in the ELF binary has the same alignment as the start address for the segment when loading to memory and running.
2. Assign a couple of offsets to each label (whether in code or in data):
   2a. Global file offset.
   2b. Local segment offset.
3. Replace labels with addresses or displacements. If you know the starting address and segment offset for a label, you can compute its address.

Since labels can be defined after they're used, there need to be two passes. The first performs tasks 1 and 2. The second performs task 3.
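The arithmetic behind the first and third tasks can be sketched in a few lines of Python. This is only an illustration under stated assumptions: a 4 KiB page size, and "perturbing" a start address by keeping its page and replacing the low bits with those of the file offset. The names `align_start_address` and `label_address` are hypothetical; survey.subx may do this differently.

```python
PAGE_SIZE = 0x1000  # assumption: 4 KiB pages


def align_start_address(requested_address, file_offset, page_size=PAGE_SIZE):
    """Nudge a requested segment address so that
    address % page_size == file_offset % page_size."""
    page_base = requested_address - (requested_address % page_size)
    return page_base + (file_offset % page_size)


def label_address(segment_start, segment_offset):
    """A label's address is its segment's start plus its offset in the segment."""
    return segment_start + segment_offset


code_start = align_start_address(0x09000000, 0x74)
print(hex(code_start))                    # 0x9000074
print(hex(label_address(code_start, 5)))  # 0x9000079
```

Matching the low bits lets the loader map the file page directly into memory without shifting the segment's contents.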

We need to drop labels at some point. And we can't rewind the input fd at the moment. Perhaps it's cleaner to not add seek syscalls for now.

Ok, so very high-level pseudocode.

Data structures shared between pass 1 and pass 2.

segments: table from segment name to (starting address, starting offset, size in bytes)
labels: table from label name to segment offset
tmp: copy of stdin in memory

Pass 1

var file-offset = 0, segment-offset = 0
while reading words from stdin
  if word is a label
    x : (address number) = insert(labels, word)
    *x = segment-offset
    continue
  emit word to tmp
  if word == '=='
    name = read word from stdin
    seg : (address segment) = insert(segments, name)
    emit name to tmp
    addr = read word from stdin
    emit addr to tmp
    seg->starting-address = parse-hex-int(addr)
    seg->starting-offset = file-offset
    segment-offset = 0
  else
    width = compute-width(word)
    segment-offset += width
    file-offset += width

compute-width usually returns 1, unless word has metadata /imm32, /disp16, etc.
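A minimal Python sketch of that rule, assuming metadata is everything after the first '/' in a word and that the only width-changing tags are the {disp,imm}{8,16,32} family. The Python name `compute_width` just mirrors the pseudocode; it is not the SubX implementation.

```python
def compute_width(word):
    """Return the number of bytes a word occupies in the output.

    /imm32 and /disp32 take 4 bytes, /imm16 and /disp16 take 2,
    and everything else (including /imm8 and /disp8) takes 1.
    """
    tags = word.split("/")[1:]  # metadata tags after the datum
    if "imm32" in tags or "disp32" in tags:
        return 4
    if "imm16" in tags or "disp16" in tags:
        return 2
    return 1


for w in ["e8/call", "foo/disp32", "0x1234/imm16", "x/disp8"]:
    print(w, compute_width(w))
```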

Pass 2

emit-elf-header(segments)
emit-elf-program-headers(segments)
var file-offset = 0, segment-offset = 0
var curr-seg-offset = 0, curr-seg-address = 0
while reading word from tmp
  if (word == "==") skip line
  datum = next-token(word, '/')
  var value = 0, width = 1
  if is-hex-int?(datum)
    value = parse-hex-int(datum)
  else if has-metadata?("/disp8") or has-metadata?("/disp32")
    value = get(labels, datum) - segment-offset
  else if has-metadata?("/imm8") or has-metadata?("/imm32")
    ???
  width = compute-width(word)
  emit-hex(value, width)
  segment-offset += width
  file-offset += width

Ugh, this is complicated.
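To make the two passes concrete, here is a heavily simplified Python sketch. It is not the design above. It assumes a single segment with no `==` headers, labels written as words ending in ':', and displacements measured from the byte just past the displacement field (the usual x86 convention for jumps and calls). Labels used as immediates, the `???` case above, are deliberately left unimplemented. All function names are hypothetical.

```python
def width_of(word):
    """4 bytes for /imm32 or /disp32, 2 for /imm16 or /disp16, else 1."""
    tags = word.split("/")[1:]
    if "imm32" in tags or "disp32" in tags:
        return 4
    if "imm16" in tags or "disp16" in tags:
        return 2
    return 1


def to_hex_bytes(value, width):
    """Little-endian two's-complement bytes of value, as two-digit hex strings."""
    raw = (value % (1 << (8 * width))).to_bytes(width, "little")
    return [format(b, "02x") for b in raw]


def resolve_labels(words):
    # Pass 1: record each label's segment offset and drop the label itself.
    labels, body, offset = {}, [], 0
    for word in words:
        if word.endswith(":"):
            labels[word[:-1]] = offset
            continue
        body.append(word)
        offset += width_of(word)

    # Pass 2: emit bytes, replacing label references with displacements.
    out, offset = [], 0
    for word in body:
        datum, _, metadata = word.partition("/")
        tags = metadata.split("/")
        width = width_of(word)
        offset += width  # offset of the byte just past this word
        if datum in labels:
            if "disp8" not in tags and "disp32" not in tags:
                raise NotImplementedError("label used as an immediate: " + word)
            value = labels[datum] - offset
        else:
            value = int(datum, 16)
        out.extend(to_hex_bytes(value, width))
    return out


# A backward jump: 'loop' is at offset 0, the jump ends at offset 3.
print(resolve_labels(["loop:", "40/inc-eax", "eb/jump", "loop/disp8"]))
```

Because pass 1 buffers the label-free words in `body`, pass 2 never needs to rewind its input, which mirrors the role of `tmp` in the pseudocode.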

akkartik · 2019-05-21T21:30:35Z

One small piece we can nibble off today is to extract the logic for compute-width from emit in pack.subx. Does that make sense? You'd have to move some code into subx-common.subx. Let me know if that seems interesting to you.

charles-l · 2019-05-22T01:36:49Z

The only place in pack.subx that I see the width of a value being computed is in the switch in the convert-data function. Is that what you're referring to?

akkartik · 2019-05-22T01:53:37Z

Ah yes, that's the one. Thanks!

akkartik · 2019-05-25T01:17:19Z

I think what may stop me spinning my wheels is a helper like check-ints-equal called check-int-arrays-equal that accepts a string, parses it as a list of numbers and compares the other arg with it. That'll help me write clean tests for pass 1 in isolation.

Just an idea to throw out there. I'll probably work on it unless you really want to.
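For illustration, here is how such a helper might look in Python. This is a sketch only: it assumes the expected values arrive as whitespace-separated integers in a string, and the names `parse_array_of_ints` and `check_int_arrays_equal` simply mirror the ones floated above. The eventual SubX primitives may differ.

```python
def parse_array_of_ints(text):
    """Parse whitespace-separated integers; base 0 accepts both 10 and 0x10."""
    return [int(token, 0) for token in text.split()]


def check_int_arrays_equal(actual, expected_text, message):
    """Compare a list of ints against a textual literal, reporting failures."""
    expected = parse_array_of_ints(expected_text)
    ok = list(actual) == expected
    if not ok:
        print("F - {}: expected {}, got {}".format(message, expected, list(actual)))
    return ok


print(check_int_arrays_equal([1, 2, 16], "1 2 0x10", "offsets match"))
```

Writing the expected value as a string keeps the test readable even when the language has no array-literal syntax.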

hoist 'Heap' variable into the std library in anticipation of the parse-array-of-ints primitive.

Mostly for tests. For every new type we want to compare in a test, we're now going to start using some primitive that can parse its value from string. In this manner we can get syntax for literals in machine code. Open question: parsing aggregates of aggregates. Like an array of structs. This is the first time we allocate from the heap in standard library tests. So we now need to start initializing the heap in all our apps.

'get-or-insert-stream' is now the more generic 'get-or-insert' that can handle tables of any value type. But callers have to be careful to cast values to the right type.

I think the path to readable tests for survey.subx passes through white-box tests.

akkartik · 2019-06-08T16:51:04Z

Excellent!

charles-l · 2019-06-08T17:51:22Z

OK -- implemented compute-width. Not sure if it needs more specifiers, though. Only supports {disp,imm}{8,16,32} right now...

Simplify `string-equal`.

akkartik · 2019-06-08T18:55:31Z

That is the whole set. I'm pretty inconsistent there as well; sometimes I support {disp,imm}{8,32} and sometimes disp8/16/32 and imm8/32.

akkartik · 2019-06-08T19:41:35Z

Any preference on what you'd like to do next?

I've been (veeery sloooowly) building out primitives for checking the trace, so that I can write the survey tests like this:

# . check-trace-contains(Trace-stream, "label 'x' is at address 0x1079")
# . check-trace-contains(Trace-stream, "segment 'code' starts at address 0x74")
# . check-trace-contains(Trace-stream, "segment 'code' has size 0x5")
# . check-trace-contains(Trace-stream, "segment 'data' starts at address 0x1079")

That feels more useful than printing out say a series of bytes for the ELF header, and then writing a test that I printed out that same series of bytes.
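A rough Python analogue of this style of white-box test, assuming the trace is just a list of text lines and that a check passes when some line contains the expected text. The `Trace` class and its methods are hypothetical stand-ins for Trace-stream and check-trace-contains, not their actual SubX definitions.

```python
class Trace:
    """An append-only log of facts the program establishes as it runs."""

    def __init__(self):
        self.lines = []

    def record(self, line):
        self.lines.append(line)

    def contains(self, expected):
        """True if any recorded line contains the expected text."""
        return any(expected in line for line in self.lines)


trace = Trace()
# The code under test would emit lines like these while it works.
trace.record("segment 'code' starts at address 0x74")
trace.record("segment 'code' has size 0x5")
trace.record("segment 'data' starts at address 0x1079")
trace.record("label 'x' is at address 0x1079")

print(trace.contains("label 'x' is at address 0x1079"))
print(trace.contains("segment 'code' has size 0x6"))
```

Tests written this way assert on intermediate facts, so they keep passing if the byte-level output format changes.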

More details on the trace.

akkartik · 2019-07-12T18:44:24Z

Woohoo!

The final bug was something I introduced in 538f24c :/ No idea what I was thinking.

Continuation of commit 6f6d458 to support unsigned comparisons in 32-bit jumps. Once again, no tests.

Snapshot at a random moment, showing a new debugging trick: hacking on the C++ level to dump memory contents on specific labels. For some reason label 'x' doesn't have a segment assigned by the time we get to compute-addresses.

I carefully logged the segment a label is declared in but forgot to actually save it in the table. This has been a theoretic concern for some time, but I've never seen it actually happen until now. SubX is just too low level. Now I get past the first two phases but code generation fails to find the 'Entry' label.

Clean up.

map of how far we've gotten by now (functions with '*' independently tested):
✓ compute-offsets*
✓ compute-addresses*
✓ emit-output
✓ emit-headers
✓ emit-elf-header
✓ emit-hex-array*
✓ first emit-elf-program-header-entry
✓ emit-hex-array*
? second emit-elf-program-header-entry
emit-hex-array*
emit-segments*

All assertions in `test-convert-computes-addresses` still failing.

Clean up.

akkartik · 2019-07-14T05:56:15Z

This PR is done, but sadly I still get an error when compiling examples/ex1 with:

$ cat examples/ex1.subx |subx_bin run apps/dquotes |subx_bin run apps/assort |subx_bin run apps/pack |subx_bin run apps/survey

(It only fails in the final survey pipe stage. So that's something at least.)

We should probably get a sense of whether any of our examples complete without error right now.

charles-l · 2019-07-19T00:10:01Z

Hrmm... Yeah, I'm only getting segfaults when I run through examples right now, but now that we have tracing, it might be easier to track it down.

$ cat run.sh
for f in $(find examples -name "*.subx"); do
  echo $f
  cat $f |./subx_bin run apps/dquotes |./subx_bin run apps/assort |./subx_bin run apps/pack |./subx_bin run apps/survey
done

$ sh run.sh                                                                    
examples/ex1.subx
   0 error: Tried to access uninitialized memory at address 0x78000043\n
examples/ex10.subx
   0 error: Tried to access uninitialized memory at address 0x78000056\n
examples/ex11.subx
stream overflow
   0 error: Tried to access uninitialized memory at address 0x00000000\n
   0 error: The entire 4-byte word should be initialized and lie in a single segment.\n
   0 error: Tried to access uninitialized memory at address 0x00000000\n
   0 error: The entire 4-byte word should be initialized and lie in a single segment.\n
Segmentation fault
examples/ex12.subx
   0 error: Tried to access uninitialized memory at address 0x7800005c\n
examples/ex2.subx
   0 error: Tried to access uninitialized memory at address 0x78000042\n
examples/ex3.subx
   0 error: Tried to access uninitialized memory at address 0x78000042\n
examples/ex4.subx
   0 error: Tried to access uninitialized memory at address 0x78000042\n
examples/ex5.subx
   0 error: Tried to access uninitialized memory at address 0x7800006c\n
examples/ex6.subx
   0 error: Tried to access uninitialized memory at address 0x78000042\n
examples/ex7.subx
   0 error: Tried to access uninitialized memory at address 0x7800004c\n
examples/ex8.subx
   0 error: Tried to access uninitialized memory at address 0x78000056\n
examples/ex9.subx
   0 error: Tried to access uninitialized memory at address 0x78000056\n

akkartik · 2019-07-19T01:21:25Z

Are you back on master? 10/12 examples work for me now.

akkartik · 2019-07-24T00:00:25Z

Just FYI, things are improving rapidly on master. All example programs are now translating using this new self-hosted translator, and the resulting binaries are identical to the output of the C++ translator. That's programs with 5k+ lines so far.

I'm now trying to get the translator phases themselves through. "SubX in SubX" in "SubX in SubX".

I've been posting status updates on this thread: https://mastodon.social/@akkartik/102488929327915911

charles-l · 2019-07-24T00:19:59Z

@akkartik oh sweet -- yeah I gave it a shot on master and many of the examples run through the pipeline now.

akkartik · 2019-07-25T16:14:46Z

All done!

https://mastodon.social/@akkartik/102495274992610155
https://mastodon.social/@akkartik/102499076373416165

charles-l · 2019-07-25T19:35:58Z

Whoo!!! That is an awesome milestone!!

akkartik · 2019-07-25T20:45:18Z

Thank you so much!

I'm taking a breath to think about what programs I want to write next on this foundation. Something silly/fun just to decompress. Any ideas?

Beyond the next few days, creating some syntactic sugar for function calls would be pretty sweet. I think it may first require some sugar for reg/mem operands.

%reg => 3/mod/direct reg/rm32
*reg => 0/mod/indirect reg/rm32
*(reg+d) => 2/mod reg/rm32 d/disp32
*(r1+r2+d) => 2/mod 4/rm32/SIB r1/base r2/index 0/scale d/disp32
*(r1+r2<<1+d) => 2/mod 4/rm32/SIB r1/base r2/index 1/scale d/disp32
etc.
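As a sketch of how the first three rewrite rules might be prototyped, here is a small Python function that expands one sugared operand into the existing word syntax. It only covers `%reg`, `*reg` and `*(reg+d)`, and it rejects the SIB forms. The function name `desugar_operand` is hypothetical, and nothing here reflects an actual implementation.

```python
import re


def desugar_operand(operand):
    """Expand %reg, *reg and *(reg+d) into mod/rm32/disp32 words."""
    m = re.fullmatch(r"%(\w+)", operand)
    if m:
        return "3/mod/direct {}/rm32".format(m.group(1))
    m = re.fullmatch(r"\*(\w+)", operand)
    if m:
        return "0/mod/indirect {}/rm32".format(m.group(1))
    m = re.fullmatch(r"\*\((\w+)\+(\w+)\)", operand)
    if m:
        return "2/mod {}/rm32 {}/disp32".format(m.group(1), m.group(2))
    raise ValueError("unsupported operand: " + operand)


for operand in ["%EAX", "*ECX", "*(ECX+8)"]:
    print(operand, "=>", desugar_operand(operand))
```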

If we had this, we could then have calls like foo(%EAX, 32/imm32) expand to:

ff 6/subop/push %EAX
68/push 32/imm32
e8/call foo/disp32
81 0/subop/add %ESP 8/imm32  # num args * 4

So that's one idea.

I'm also tempted to push on https://github.com/ivandavidov/minimal and see how much I can learn from it.

akkartik added the "in progress" label (Feature branch; may not work; not ready for merging) on May 18, 2019

.

90fd6a6

add lengths to data blobs

.

d3d452e

akkartik and others added 11 commits May 24, 2019 21:01

.

886097a

.

7c06e10

new primitive for tests: check-string-equal

0b2d2d9

new primitive: array-equal?

6c03f2e

.

bd31dbe

hoist 'Heap' variable into the std library in anticipation of the parse-array-of-ints primitive.

new primitive: check-array-equal

cb3d96b

.

7c575de

.

965dd1b

'get-or-insert-stream' is now the more generic 'get-or-insert' that can handle tables of any value type. But callers have to be careful to cast values to the right type.

start fleshing out trace support some more

43f1c41

I think the path to readable tests for survey.subx passes through white-box tests.

added tests for compute-width

d9c4825

implement compute-width

0d0af0a

.

21cb677

Simplify `string-equal`.

akkartik added 2 commits June 8, 2019 11:55

.

627b35b

Fix stale initialize-trace-stream

3527296

akkartik added 19 commits July 12, 2019 12:00

.

2870525

.

066e01f

.

94f2de6

Continuation of commit 6f6d458 to support unsigned comparisons in 32-bit jumps. Once again, no tests.

.

8ba17d8

Snapshot at a random moment, showing a new debugging trick: hacking on the C++ level to dump memory contents on specific labels. For some reason label 'x' doesn't have a segment assigned by the time we get to compute-addresses.

.

f518bd9

fixed second bug, hit third

a259389

.

4c81119

Clean up.

fixed third bug, hit fourth

195a0d7

fixed fourth bug, hit fifth

50ac5ca

.

d30c716

Clean up.

fixed fifth bug, hit sixth

58c643c

.

62bb910

grow the output stream; test now completes

17fb42c

All assertions in `test-convert-computes-addresses` still failing.

survey.subx now passing all tests

2773d5a

.

e63ec16

add subx/apps/survey to CI

ae30c46

.

7f23be0

Clean up.

akkartik merged commit c4aa819 into master Jul 14, 2019

akkartik removed the "in progress" label (Feature branch; may not work; not ready for merging) on Jul 14, 2019
