SubX in SubX: computing addresses for labels #34

Merged
merged 172 commits into from Jul 14, 2019


akkartik
Owner

This branch is work in progress on implementing SubX in SubX. The big-picture plan is to divide up the work into the following phases:

$ cat file1.subx file2.subx ... |dquotes |assort |pack |survey |hex > a.elf

Each phase is an ELF binary constructed out of a corresponding .subx file.

  • hex.subx converts a sequence of bytes in hex into a binary. (done)
  • survey.subx will turn labels into addresses, and also slap on ELF headers. It's called 'survey' because the act of imposing a coordinate system on a program feels somewhat like surveying terrain and determining various distances and heights. (in progress here)
  • pack.subx converts bitfields in instructions into whole bytes in the right order. For example, 0/mod 1/rm32 1/r32 is turned into the single byte 09. (done)
  • assort.subx just reads in many segments starting with == and containing lines of code or data, and concatenates all segments with the same name. (done)
  • dquotes.subx moves string literals in the code segment to the data segment. (done)
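
To make the pack step concrete, here is a rough sketch in Python (not the SubX implementation). It assumes the standard x86 ModR/M layout: mod in the top two bits, r32 in the middle three, rm32 in the bottom three. The name pack_modrm is made up for this illustration.

```python
def pack_modrm(mod: int, r32: int, rm32: int) -> int:
    """Combine the three ModR/M bitfields into one byte.

    Assumes the standard x86 layout: mod (2 bits), r32 (3 bits), rm32 (3 bits).
    The name pack_modrm is hypothetical, not taken from pack.subx.
    """
    if not (0 <= mod < 4 and 0 <= r32 < 8 and 0 <= rm32 < 8):
        raise ValueError("bitfield out of range")
    return (mod << 6) | (r32 << 3) | rm32


# 0/mod 1/rm32 1/r32 is binary 00 001 001
print(f"{pack_modrm(mod=0, r32=1, rm32=1):02x}")
```

The order in which the fields are written in the source doesn't matter; each one lands in its fixed bit position.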

This branch is where the survey phase is being built.

To run its tests (and so see how far along things are):

$ ./subx translate 0*.subx apps/subx-common.subx apps/survey.subx -o apps/survey  &&  \
      ./subx run apps/survey test

Contact me if you'd like to contribute. Commit access freely given.

Start of the final phase needed to implement SubX in SubX:

  $ cat files.subx ... |dquotes |assort |pack |survey |hex > a.elf

survey.subx is responsible for assigning addresses to labels and segments.
@akkartik added the "in progress" label (Feature branch; may not work; not ready for merging) May 18, 2019
@akkartik
Owner Author

@charles-l I just spun up this PR for survey.subx. No failing tests yet. Only mildly interesting thing to look at may be the ELF header definitions.

.
add lengths to data blobs
@akkartik
Owner Author

@charles-l I'm still cleaning up #32 so haven't written any tests here yet. If you're looking for something to do, could you create a function here called emit-elf-header that prints out the Elf_header string? For now convert can consist of just emit-elf-header followed by copying stdin to stdout, like cat.

Any tests I think of for this seem pretty silly, so let's just test it manually at the commandline.

@akkartik
Owner Author

akkartik commented May 21, 2019

I'm thinking hard about how to build this phase, now that dquotes is fully merged (yay!)

There seem to be three tasks to perform here (and I'm constantly wondering whether to split them up into multiple phases, but they seem to need some common offset tracking as they read stdin):

  1. Assign a start address to each segment.
    1a. User programs provide an approximate start address for segments. But we have to perturb it slightly so the offset at which a segment starts in the ELF binary has the same alignment as the start address for the segment when loading to memory and running.
  2. Assign a couple of offsets to each label (whether in code or in data):
    2a. Global file offset.
    2b. Local segment offset.
  3. Replace labels with addresses or displacements. If you know the starting address and segment offset for a label, you can compute its address.
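
A small Python sketch of the arithmetic in tasks 1a and 3, as one possible reading rather than what survey.subx does. The 4 KiB page size, the choice to round the requested address down to a page boundary, and the function names are all assumptions made for illustration.

```python
PAGE_SIZE = 0x1000  # assumption: 4 KiB pages


def align_segment_start(requested_address: int, file_offset: int) -> int:
    """Task 1a: nudge the requested start address so it has the same
    alignment (modulo the page size) as the segment's offset in the file."""
    page_base = requested_address & ~(PAGE_SIZE - 1)
    return page_base + (file_offset % PAGE_SIZE)


def label_address(segment_start: int, segment_offset: int) -> int:
    """Task 3: a label's address is its segment's start address plus the
    label's offset within that segment."""
    return segment_start + segment_offset


start = align_segment_start(0x0A000000, file_offset=0x79)
print(hex(start))
print(hex(label_address(start, 0x10)))
```

With these made-up inputs the segment starts at 0xa000079 and a label 0x10 bytes into it lands at 0xa000089.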

Since labels can be defined after they're used, there need to be two passes. The first performs tasks 1 and 2. The second performs task 3.

We need to drop labels at some point. And we can't rewind the input fd at the moment. Perhaps it's cleaner to not add seek syscalls for now.

Ok, so very high-level pseudocode.

Data structures shared between pass 1 and pass 2.

  • segments: table from segment name to (starting address, starting offset, size in bytes)
  • labels: table from label name to segment offset
  • tmp: copy of stdin in memory

Pass 1

var file-offset = 0, segment-offset = 0
while reading words from stdin
  if word is a label
    x : (address number) = insert(labels, name)
    *x = segment-offset
    continue
  emit word to tmp
  if word == '=='
    name = read word from stdin
    seg : (address segment) = insert(segments, name)
    emit name to tmp
    addr = read word from stdin
    emit addr to tmp
    seg->starting-address = parse-hex-int(addr)
    seg->starting-offset = file-offset
    segment-offset = 0
  else
    width = compute-width(word)
    segment-offset += width
    file-offset += width

compute-width usually returns 1, unless word has metadata /imm32, /disp16, etc.
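
For reference, a minimal Python sketch of what compute-width might look like. The metadata names are the {disp,imm}{8,16,32} set discussed in this thread; the table layout and the Python function name are just for illustration.

```python
# Width in bytes implied by each size-bearing metadata tag.
WIDTHS = {
    "imm8": 1, "disp8": 1,
    "imm16": 2, "disp16": 2,
    "imm32": 4, "disp32": 4,
}


def compute_width(word: str) -> int:
    """Return how many bytes a word occupies in the output.

    Metadata follows the datum, separated by '/', e.g. 'foo/disp32'.
    Words without any size-bearing metadata take one byte.
    """
    for meta in word.split("/")[1:]:
        if meta in WIDTHS:
            return WIDTHS[meta]
    return 1


for w in ["e8/call", "foo/disp32", "8/imm32", "x/disp16"]:
    print(w, compute_width(w))
```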

Pass 2

emit-elf-header(segments)
emit-elf-program-headers(segments)
var file-offset = 0, segment-offset = 0
var curr-seg-offset = 0, curr-seg-address = 0
while reading word from tmp
  if (word == "==") skip line
  datum = next-token(word, '/')
  var value = 0, width = 1
  if is-hex-int?(datum)
    value = parse-hex-int(datum)
  else if has-metadata?("/disp8") or has-metadata?("/disp32")
    value = get(labels, datum) - curr-offset
  else if has-metadata?("/imm8") or has-metadata?("/imm32")
    ???
  width = compute-width(word)
  emit-hex(value, width)
  segment-offset += width
  file-offset += width

Ugh, this is complicated.
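
To see the two passes end to end, here is a stripped-down Python toy. It is a sketch under several assumptions, not the design above: there is a single segment and no ELF headers, labels are words ending in ':', displacements are measured from the byte just after the displacement word, and labels used as immediates are not handled at all.

```python
import string

WIDTHS = {"imm8": 1, "disp8": 1, "imm16": 2, "disp16": 2, "imm32": 4, "disp32": 4}


def compute_width(word):
    """Bytes occupied by a word: 1 unless its metadata says otherwise."""
    return next((WIDTHS[m] for m in word.split("/")[1:] if m in WIDTHS), 1)


def is_hex_int(datum):
    return datum != "" and all(c in string.hexdigits for c in datum)


def pass1(words):
    """Record each label's offset; copy every other word to tmp."""
    labels, tmp, offset = {}, [], 0
    for word in words:
        if word.endswith(":"):
            labels[word[:-1]] = offset
            continue
        tmp.append(word)
        offset += compute_width(word)
    return labels, tmp


def pass2(labels, tmp):
    """Emit bytes, replacing label references with displacements."""
    out = bytearray()
    for word in tmp:
        datum, *metadata = word.split("/")
        width = compute_width(word)
        if is_hex_int(datum):
            value = int(datum, 16)
        elif "disp8" in metadata or "disp32" in metadata:
            # Assumption: displacements are relative to the byte after this word.
            value = labels[datum] - (len(out) + width)
        else:
            raise ValueError(f"don't know how to emit {word!r}")
        mask = (1 << (8 * width)) - 1  # wrap negatives to two's complement
        out += (value & mask).to_bytes(width, "little")
    return bytes(out)


words = ["eb/jump", "end/disp8", "90/nop", "90/nop", "end:", "c3/return"]
labels, tmp = pass1(words)
print(labels)
print(pass2(labels, tmp).hex())
```

Because 'end' is used before it is defined, pass 2 can only resolve it after pass 1 has seen the whole input, which is the reason for buffering into tmp.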

@akkartik
Owner Author

One small piece we can nibble off today is to extract the logic for compute-width from emit in pack.subx. Does that make sense? You'd have to move some code into subx-common.subx. Let me know if that seems interesting to you.

@charles-l
Collaborator

The only place in pack.subx that I see the width of a value being computed is in the switch in the convert-data function. Is that what you're referring to?

@akkartik
Owner Author

Ah yes, that's the one. Thanks!

@akkartik
Owner Author

I think what may stop me spinning my wheels is a helper like check-ints-equal called check-int-arrays-equal that accepts a string, parses it as a list of numbers and compares the other arg with it. That'll help me write clean tests for pass 1 in isolation.
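
Roughly the shape of helper I have in mind, sketched in Python rather than SubX. It assumes the expected values arrive as whitespace-separated hex numbers; the return value and the failure message format are invented for this example.

```python
def parse_array_of_ints(text):
    """Parse whitespace-separated hex numbers, e.g. '1 2 ff' -> [1, 2, 255].
    Assumption: numbers are hex, with or without a 0x prefix."""
    return [int(token, 16) for token in text.split()]


def check_int_arrays_equal(actual, expected_text, msg):
    """Compare an array of ints against its expected value written as a string.
    Prints a failure message and returns False on mismatch."""
    expected = parse_array_of_ints(expected_text)
    ok = list(actual) == expected
    if not ok:
        print(f"F - {msg}: expected {expected}, got {list(actual)}")
    return ok


check_int_arrays_equal([1, 2, 3], "1 2 3", "offsets of labels a, b, c")
check_int_arrays_equal([1, 2, 4], "1 2 3", "this one reports a failure")
```

The point is that the expected side of a test reads like a literal, even though the language has no syntax for array literals.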

Just an idea to throw out there. I'll probably work on it unless you really want to.

akkartik and others added 11 commits May 24, 2019 21:01
.
hoist 'Heap' variable into the std library in anticipation of the parse-array-of-ints
primitive.
Mostly for tests. For every new type we want to compare in a test, we're
now going to start using some primitive that can parse its value from string. In this manner we can get syntax for literals in machine code.

Open question: parsing aggregates of aggregates. Like an array of structs.

This is the first time we allocate from the heap in standard library tests.
So we now need to start initializing the heap in all our apps.
.
'get-or-insert-stream' is now the more generic 'get-or-insert' that can
handle tables of any value type. But callers have to be careful to cast
values to the right type.
I think the path to readable tests for survey.subx passes through
white-box tests.
@akkartik
Owner Author

akkartik commented Jun 8, 2019

Excellent!

@charles-l
Collaborator

charles-l commented Jun 8, 2019

OK -- implemented compute-width. Not sure if it needs more specifiers, though. Only supports {disp,imm}{8,16,32} right now...

.
Simplify `string-equal`.
@akkartik
Owner Author

akkartik commented Jun 8, 2019

That is the whole set. I'm pretty inconsistent there as well; sometimes I support {disp,imm}{8,32} and sometimes disp8/16/32 and imm8/32.

@akkartik
Owner Author

akkartik commented Jun 8, 2019

Any preference on what you'd like to do next?

I've been (veeery sloooowly) building out primitives for checking the trace, so that I can write the survey tests like this:

# . check-trace-contains(Trace-stream, "label 'x' is at address 0x1079")
# . check-trace-contains(Trace-stream, "segment 'code' starts at address 0x74")
# . check-trace-contains(Trace-stream, "segment 'code' has size 0x5")
# . check-trace-contains(Trace-stream, "segment 'data' starts at address 0x1079")

That feels more useful than printing out say a series of bytes for the ELF header, and then writing a test that I printed out that same series of bytes.
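
As a sketch of the idea in Python (the real primitives are SubX functions; the Trace class and its method names here are hypothetical): the code under test appends human-readable facts to a trace, and the test only asserts that particular facts showed up.

```python
class Trace:
    """A minimal in-memory trace: an append-only list of lines."""

    def __init__(self):
        self.lines = []

    def trace(self, message):
        self.lines.append(message)


def check_trace_contains(trace, expected):
    """True if some line of the trace contains the expected text."""
    return any(expected in line for line in trace.lines)


# Stand-in for the code under test: it records what it decided.
t = Trace()
t.trace("segment 'code' starts at address 0x74")
t.trace("segment 'code' has size 0x5")
t.trace("segment 'data' starts at address 0x1079")
t.trace("label 'x' is at address 0x1079")

print(check_trace_contains(t, "label 'x' is at address 0x1079"))
print(check_trace_contains(t, "label 'y' is at address"))
```

A test written this way doesn't break when the byte-level output format changes, as long as the same facts are still established.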

More details on the trace.

@akkartik
Owner Author

Woohoo!

The final bug was something I introduced in 538f24c :/ No idea what I was thinking.

.
Continuation of commit 6f6d458 to support unsigned comparisons in
32-bit jumps.

Once again, no tests.
.
Snapshot at a random moment, showing a new debugging trick: hacking on
the C++ level to dump memory contents on specific labels.

For some reason label 'x' doesn't have a segment assigned by the time we
get to compute-addresses.
I carefully logged the segment a label is declared in but forgot to
actually save it in the table. This has been a theoretical concern for
some time, but I've never seen it actually happen until now. SubX is
just too low level.

Now I get past the first two phases but code generation fails to find
the 'Entry' label.
.
Clean up.
.
Clean up.
map of how far we've gotten by now (functions with '*' independently tested):
✓ compute-offsets*
✓ compute-addresses*
✓ emit-output
✓   emit-headers
✓     emit-elf-header
✓       emit-hex-array*
✓     first emit-elf-program-header-entry
✓       emit-hex-array*
?     second emit-elf-program-header-entry
        emit-hex-array*
    emit-segments*
All assertions in `test-convert-computes-addresses` still failing.
.
Clean up.
@akkartik akkartik merged commit c4aa819 into master Jul 14, 2019
@akkartik removed the "in progress" label (Feature branch; may not work; not ready for merging) Jul 14, 2019
@akkartik
Owner Author

This PR is done, but sadly I still get an error when compiling examples/ex1 with:

$ cat examples/ex1.subx |subx_bin run apps/dquotes |subx_bin run apps/assort |subx_bin run apps/pack |subx_bin run apps/survey

(It only fails in the final survey pipe stage. So that's something at least.)

We should probably get a sense of whether any of our examples complete without error right now.

@charles-l
Collaborator

charles-l commented Jul 19, 2019

Hrmm... Yeah, I'm only getting segfaults when I run through examples right now, but now that we have tracing, it might be easier to track it down.

$ cat run.sh
for f in $(find examples -name "*.subx"); do
  echo $f
  cat $f |./subx_bin run apps/dquotes |./subx_bin run apps/assort |./subx_bin run apps/pack |./subx_bin run apps/survey
done
$ sh run.sh
examples/ex1.subx
   0 error: Tried to access uninitialized memory at address 0x78000043\n
examples/ex10.subx
   0 error: Tried to access uninitialized memory at address 0x78000056\n
examples/ex11.subx
stream overflow
   0 error: Tried to access uninitialized memory at address 0x00000000\n
   0 error: The entire 4-byte word should be initialized and lie in a single segment.\n
   0 error: Tried to access uninitialized memory at address 0x00000000\n
   0 error: The entire 4-byte word should be initialized and lie in a single segment.\n
Segmentation fault
examples/ex12.subx
   0 error: Tried to access uninitialized memory at address 0x7800005c\n
examples/ex2.subx
   0 error: Tried to access uninitialized memory at address 0x78000042\n
examples/ex3.subx
   0 error: Tried to access uninitialized memory at address 0x78000042\n
examples/ex4.subx
   0 error: Tried to access uninitialized memory at address 0x78000042\n
examples/ex5.subx
   0 error: Tried to access uninitialized memory at address 0x7800006c\n
examples/ex6.subx
   0 error: Tried to access uninitialized memory at address 0x78000042\n
examples/ex7.subx
   0 error: Tried to access uninitialized memory at address 0x7800004c\n
examples/ex8.subx
   0 error: Tried to access uninitialized memory at address 0x78000056\n
examples/ex9.subx
   0 error: Tried to access uninitialized memory at address 0x78000056\n

@akkartik
Owner Author

Are you back on master? 10/12 examples work for me now.

@akkartik
Owner Author

Just FYI, things are improving rapidly on master. All example programs are now translating using this new self-hosted translator, and the resulting binaries are identical to the output of the C++ translator. That's programs with 5k+ lines so far.

I'm now trying to get the translator phases themselves through. "SubX in SubX" in "SubX in SubX".

I've been posting status updates on this thread: https://mastodon.social/@akkartik/102488929327915911

@charles-l
Collaborator

@akkartik oh sweet -- yeah I gave it a shot on master and many of the examples run through the pipeline now.

@akkartik
Owner Author

All done!

https://mastodon.social/@akkartik/102495274992610155
https://mastodon.social/@akkartik/102499076373416165

@charles-l
Collaborator

Whoo!!! That is an awesome milestone!!

@akkartik
Owner Author

Thank you so much!

I'm taking a breath to think about what programs I want to write next on this foundation. Something silly/fun just to decompress. Any ideas?

Beyond the next few days, creating some syntactic sugar for function calls would be pretty sweet. I think it may first require some sugar for reg/mem operands.

%reg => 3/mod/direct reg/rm32
*reg => 0/mod/indirect reg/rm32
*(reg+d) => 2/mod reg/rm32 d/disp32
*(r1+r2+d) => 2/mod 4/rm32/SIB r1/base r2/index 0/scale d/disp32
*(r1+r2<<1+d) => 2/mod 4/rm32/SIB r1/base r2/index 1/scale d/disp32
etc.
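
A quick Python sketch of how the first three rewrites might be prototyped. It only covers %reg, *reg and *(reg+d); the SIB forms are left out, and the function name and regexes are assumptions for illustration.

```python
import re


def desugar_operand(operand):
    """Expand a sugared reg/mem operand into SubX-style words.
    Handles only %reg, *reg and *(reg+d); anything else raises ValueError."""
    if operand.startswith("%"):
        return f"3/mod/direct {operand[1:]}/rm32"
    m = re.fullmatch(r"\*\((\w+)\+(\w+)\)", operand)
    if m:
        reg, disp = m.groups()
        return f"2/mod {reg}/rm32 {disp}/disp32"
    m = re.fullmatch(r"\*(\w+)", operand)
    if m:
        return f"0/mod/indirect {m.group(1)}/rm32"
    raise ValueError(f"unsupported operand: {operand!r}")


for op in ["%EAX", "*EAX", "*(EBP+8)"]:
    print(op, "=>", desugar_operand(op))
```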

If we had this, we could then have calls like foo(%EAX, 32/imm32) expand to:

ff 6/subop/push %EAX
68/push 32/imm32
e8/call foo/disp32
81 0/subop/add %ESP 8/imm32  # num args * 4
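
The expansion above could be prototyped along these lines in Python. This is a sketch with assumptions: arguments are pushed in the order written (as in the listing above), an argument ending in /imm32 uses the 68 opcode and anything else uses ff 6/subop/push, and the cleanup amount is written in hex. The function names are invented.

```python
import re


def parse_call(text):
    """Split 'foo(%EAX, 32/imm32)' into ('foo', ['%EAX', '32/imm32'])."""
    m = re.fullmatch(r"\s*([\w-]+)\((.*)\)\s*", text)
    if not m:
        raise ValueError(f"not a call: {text!r}")
    args = [a.strip() for a in m.group(2).split(",") if a.strip()]
    return m.group(1), args


def expand_call(text):
    """Expand a sugared call into push/call/cleanup lines."""
    name, args = parse_call(text)
    lines = []
    for arg in args:
        if arg.endswith("/imm32"):
            lines.append(f"68/push {arg}")
        else:
            lines.append(f"ff 6/subop/push {arg}")
    lines.append(f"e8/call {name}/disp32")
    if args:
        # Assumption: numbers are written in hex, 4 bytes per argument.
        lines.append(f"81 0/subop/add %ESP {4 * len(args):x}/imm32  # num args * 4")
    return lines


print("\n".join(expand_call("foo(%EAX, 32/imm32)")))
```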

So that's one idea.

I'm also tempted to push on https://github.com/ivandavidov/minimal and see how much I can learn from it.
