Instruction Selection, Object Files, and More #66

LensPlaysGames · 2023-04-29T00:07:32Z

First off, I'll just start by saying I don't even know if this will end up getting merged. The code is quite messy. If I can manage to clean it up, however, that would be rather cool: it would mean the compiler can emit object files, which a linker can then use to generate an executable. This is basically "one layer past" assembly, which is what we currently emit. This PR aims to provide an object file backend for x86_64 while retaining the assembly backend.

LensPlaysGames · 2023-04-29T00:15:09Z

Progress Report: simple_math.int is assembled completely into final machine code; it could be embedded in an executable file as is. Assembling simple_pointer.int succeeds, and all instructions are represented in the final machine code; however, multiple relocations are not applied, which leaves the final code incorrect. greatest_common_denominator.int also succeeds, with similar relocation issues.

TL;DR: We are assembling x86_64 instructions and operands into proper machine code, but that's kind of it. We are missing large portions of required functionality, like keeping track of relocations, symbol entries, etc.

Going forward:

- Alter the machine-code-producing `mcode_*` functions so they append to some sort of byte buffer instead of writing to a file.
- Create some sort of meta, in-memory object file structure that keeps track of things like defined symbols, relocations, sections, etc. (like our very own lil' BFD hehe).
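A minimal sketch of those two items, assuming a single `.text` section and only one relocation kind (the PC-relative displacement of an x86_64 `call rel32`, opcode `E8`). `ByteBuffer`, `Symbol`, `Relocation`, `RelocationList`, `emit_call` and `apply_relocations` are hypothetical names, not the compiler's actual API. Relocations against symbols defined in the same section can be patched right away; the rest would be written out for the linker.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

// Hypothetical growable buffer that mcode_*-style functions could append to.
typedef struct {
  uint8_t *data;
  size_t size;
  size_t capacity;
} ByteBuffer;

static void buffer_append(ByteBuffer *buf, const void *bytes, size_t n) {
  if (buf->size + n > buf->capacity) {
    size_t cap = buf->capacity ? buf->capacity : 64;
    while (cap < buf->size + n) cap *= 2;
    uint8_t *grown = realloc(buf->data, cap);
    assert(grown && "Memory allocation failure");
    buf->data = grown;
    buf->capacity = cap;
  }
  memcpy(buf->data + buf->size, bytes, n);
  buf->size += n;
}

// A symbol with address -1 is undefined (e.g. an external function).
typedef struct {
  const char *name;
  int64_t address;
} Symbol;

// PC-relative 32-bit relocation: the 4 bytes at `offset` must end up holding
// (symbol address - address of the byte right after those 4 bytes).
typedef struct {
  size_t offset;
  size_t symbol; // index into the symbol array
} Relocation;

typedef struct {
  Relocation items[32];
  size_t count;
} RelocationList;

// Emit `call rel32` (E8 cd) with a zero placeholder and record where it lives.
static void emit_call(ByteBuffer *text, RelocationList *relocs, size_t symbol) {
  static const uint8_t call_opcode = 0xE8;
  static const uint8_t placeholder[4] = {0};
  assert(relocs->count < sizeof relocs->items / sizeof *relocs->items);
  buffer_append(text, &call_opcode, 1);
  relocs->items[relocs->count++] = (Relocation){ text->size, symbol };
  buffer_append(text, placeholder, 4);
}

// Patch relocations against symbols defined in this section; the others
// would be written to the object file for the linker to resolve.
static void apply_relocations(ByteBuffer *text, const RelocationList *relocs, const Symbol *symbols) {
  for (size_t i = 0; i < relocs->count; ++i) {
    const Relocation *r = &relocs->items[i];
    if (symbols[r->symbol].address < 0) continue;
    int64_t next = (int64_t)r->offset + 4;
    uint32_t disp = (uint32_t)(int32_t)(symbols[r->symbol].address - next);
    for (int b = 0; b < 4; ++b) text->data[r->offset + b] = (uint8_t)(disp >> (8 * b)); // little-endian
  }
}
```

For example, appending `ret` (`C3`) at offset 0 and then a call to it puts the displacement at offset 2; since the next instruction starts at offset 6, patching writes -6 (`FA FF FF FF`).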

src/codegen/x86_64/arch_x86_64.c



This is definitely "ill-formed" and will require lots of cleanup, but it is a step in the right direction. "Instruction selection" now becomes its own pass, which we can easily expand upon and improve. We are still doing RA (register allocation) on the regular IR, which is a bit scuffed, so we'll have to change that at some point. Overall, this lets every target that supports the x86_64 architecture share more code. Again, this is just a very basic, beginner-level MIR that we will make better over time.

LensPlaysGames · 2023-05-01T20:45:04Z

Here comes the boom


Basically, we want MIR to be able to represent generic machine instructions as well as specific machine instructions, and we want each architecture/ISA to set up patterns that convert one or more of these generic MIR instructions into one or more specific MIR instructions. `select_instructions2` is basically those patterns, but in code. The idea is that A. RA will move to after all of this happens and operate on MIR instead of IR, and B. each target architecture can more easily share arch-specific code, RA, etc.; each target supported for that arch (e.g. COFF, ELF, GAS) is then just a simple translation layer from these MIR<arch> instructions.
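A rough sketch of "patterns, but as a table" instead of hand-written code, assuming only one-to-one patterns keyed on the generic opcode and operand kinds. All enum and function names here are hypothetical; a real version would also need many-to-one patterns (matching a sequence of generic instructions) and would produce full MIR instructions rather than just an opcode.

```c
#include <assert.h>
#include <stddef.h>

// Hypothetical generic and x86_64-specific opcodes; not the compiler's real enums.
typedef enum { MIR_ADD, MIR_COPY } GenericOpcode;
typedef enum {
  X86_NONE,
  X86_ADD_IMM_TO_REG,
  X86_ADD_REG_TO_REG,
  X86_MOV_IMM_TO_REG,
  X86_MOV_REG_TO_REG,
} X86Opcode;
typedef enum { OP_REG, OP_IMM } OperandKind;

// A generic instruction as seen by ISel: opcode plus destination/source kinds.
typedef struct {
  GenericOpcode op;
  OperandKind dst, src;
} GenericInst;

// One pattern: "generic op with these operand kinds becomes this x86_64 op".
typedef struct {
  GenericOpcode op;
  OperandKind dst, src;
  X86Opcode result;
} Pattern;

static const Pattern x86_64_patterns[] = {
  { MIR_ADD,  OP_REG, OP_IMM, X86_ADD_IMM_TO_REG },
  { MIR_ADD,  OP_REG, OP_REG, X86_ADD_REG_TO_REG },
  { MIR_COPY, OP_REG, OP_IMM, X86_MOV_IMM_TO_REG },
  { MIR_COPY, OP_REG, OP_REG, X86_MOV_REG_TO_REG },
};

// First matching pattern wins; X86_NONE means no pattern applies
// (a real backend would ICE or fall back to a slower sequence here).
static X86Opcode select_instruction(GenericInst inst) {
  for (size_t i = 0; i < sizeof x86_64_patterns / sizeof *x86_64_patterns; ++i) {
    const Pattern *p = &x86_64_patterns[i];
    if (p->op == inst.op && p->dst == inst.dst && p->src == inst.src) return p->result;
  }
  return X86_NONE;
}
```

Keeping the patterns as data is also what would make a later DSL straightforward: the DSL would just generate a table like this one.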

Sirraide

This is only a partial review. I’ll finish this later probably. Left off at machine_ir_forward.h.

Sirraide · 2023-05-02T17:06:34Z

src/codegen.c

```diff
@@ -809,6 +809,7 @@ bool codegen
  if (!code) ICE("codegen(): failed to open file at path: \"%s\"\n", outfile);

  CodegenContext *context = codegen_context_create(ast, format, call_convention, dialect, code);
+
```


Big change. Approved.

Sirraide · 2023-05-02T17:08:10Z

src/codegen/codegen_forward.h

```diff
@@ -60,4 +60,72 @@ enum ComparisonType {
  COMPARE_COUNT,
 };

+/// All instructions that take two arguments.
+#define ALL_BINARY_INSTRUCTION_TYPES(F) \
```


I would have left these tables in their respective headers (intermediate_representation.h and machine_ir.h), but I suppose they could also go here. codegen_forward.h was mostly intended for forward declarations to break header dependency cycles.

That's also why I needed machine_ir_forward.h, but I never thought about just putting those in codegen_forward.h lol

I don't know how I thought that was a response to the comment above lol, but that was meant for the one down below regarding the header not being needed.

If you want to try and put the IR macros back in their header, then by all means go ahead, but the current situation took enough fighting with the stupid preprocessor and circular includes that I've given up trying to get that to work. I hate includes so much when projects start getting bigger like this.
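For readers unfamiliar with the pattern, a sketch of how an X-macro table such as `ALL_BINARY_INSTRUCTION_TYPES(F)` is typically consumed. The entries (`ADD`, `SUB`, `MUL`), the single-argument shape of `F`, the `IR_` prefix and `binary_instruction_name` are assumptions for illustration; the real table in codegen_forward.h may differ.

```c
#include <assert.h>
#include <string.h>

// Hypothetical entries; the real table lives in codegen_forward.h.
#define ALL_BINARY_INSTRUCTION_TYPES(F) \
  F(ADD)                                \
  F(SUB)                                \
  F(MUL)

// Expansion 1: an enum.
#define DEFINE_ENUMERATOR(type) IR_##type,
typedef enum {
  ALL_BINARY_INSTRUCTION_TYPES(DEFINE_ENUMERATOR)
  IR_BINARY_COUNT
} BinaryInstructionType;
#undef DEFINE_ENUMERATOR

// Expansion 2: a parallel name table generated from the same list.
#define DEFINE_NAME(type) #type,
static const char *const binary_instruction_names[] = {
  ALL_BINARY_INSTRUCTION_TYPES(DEFINE_NAME)
};
#undef DEFINE_NAME

// Both expansions come from one list, so they cannot drift apart.
_Static_assert(sizeof binary_instruction_names / sizeof *binary_instruction_names == IR_BINARY_COUNT,
               "name table out of sync with enum");

static const char *binary_instruction_name(BinaryInstructionType type) {
  assert((unsigned)type < IR_BINARY_COUNT);
  return binary_instruction_names[type];
}
```

Because the table is just a macro, it can live in whichever header both consumers can include, which is exactly the circular-include trade-off discussed above.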


Sirraide · 2023-05-02T17:39:32Z

src/codegen/machine_ir.c

```diff
+  if (!inst->machine_inst) {
+    ir_femit_instruction(stdout, inst);
+    ICE("Must translate IRInstruction into MIR before taking reference to it.");
```


Does this play nicely w/ forward references (e.g. branches and PHIs that refer to instructions not yet translated, as happens in loop codegen)?

Almost certainly not. What is the solution to that? Do we need like "relocations" for the MIR, or is there some easier solution?

VRegs or some other index-based solution is the only thing I can think of, unfortunately... That, or keep track of a list of instructions that need to be fixed up later, but I wouldn’t really call that a ‘solution’, candidly.
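A sketch of the fixup-list option, using hypothetical `IRInst`/`MIRInst` stand-ins: instead of ICE-ing when `machine_inst` is still NULL, the operand slot is remembered and patched once the whole function has been translated. It relies on operand addresses staying stable until resolution (a growable operand vector that reallocates would break it), which is one reason a vreg/index-based scheme is the cleaner option.

```c
#include <assert.h>
#include <stddef.h>

// Hypothetical stand-ins for IRInstruction / MIRInstruction.
typedef struct MIRInst MIRInst;
typedef struct IRInst { MIRInst *machine_inst; } IRInst;

typedef struct MIROperand {
  MIRInst *inst; // resolved reference, or NULL while still pending
} MIROperand;

struct MIRInst { MIROperand operands[2]; };

// A pending reference: "operand `slot` must point at the MIR for `target`".
typedef struct { MIROperand *slot; IRInst *target; } Fixup;

typedef struct { Fixup items[64]; size_t count; } FixupList;

// Resolve immediately if possible; otherwise remember the slot for later.
static void mir_reference(MIROperand *slot, IRInst *target, FixupList *fixups) {
  if (target->machine_inst) {
    slot->inst = target->machine_inst;
    return;
  }
  assert(fixups->count < sizeof fixups->items / sizeof *fixups->items);
  fixups->items[fixups->count++] = (Fixup){ slot, target };
  slot->inst = NULL;
}

// After the whole function is translated, every target must have MIR.
static void mir_resolve_fixups(FixupList *fixups) {
  for (size_t i = 0; i < fixups->count; ++i) {
    assert(fixups->items[i].target->machine_inst && "IR instruction never translated");
    fixups->items[i].slot->inst = fixups->items[i].target->machine_inst;
  }
  fixups->count = 0;
}
```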

Sirraide · 2023-05-02T17:40:43Z

src/codegen/machine_ir.c

```diff
+MIRInstruction *mir_makenew(uint32_t opcode) {
+  MIRInstruction *mir = calloc(1, sizeof(*mir));
+  ASSERT(mir, "Memory allocation failure");
+  mir->id = ++mir_alloc_id;
```


This `mir_alloc_id` should probably be part of the codegen context.

I'm not really arguing that getting rid of the statics here is a bad idea (it's definitely a good idea), but I do wonder what it would accomplish.


Nothing, unless you’re planning to be able to codegen several programs/files concurrently. If you care about multithreading at all, then these should all go in the context, because that’s what it’s for; otherwise none of that matters to begin with, and everything that’s in the context could just as well be a bunch of globals. My point here is that it doesn’t make much sense to put half of this stuff in the context and make the other half global variables. It’s one way or the other: mixing both unfortunately doesn’t accomplish much of anything, because you get the inconvenience of having to pass a context to every function and yet no thread safety in return...
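A sketch of the "put it in the context" option, assuming a stripped-down `CodegenContext` that holds only the counter. The extra `context` parameter to `mir_makenew` is hypothetical; the version in the diff takes only the opcode and bumps a file-scope static.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

// Hypothetical slimmed-down context; the real CodegenContext has many more fields.
typedef struct CodegenContext {
  uint32_t mir_alloc_id; // formerly a file-scope static
} CodegenContext;

typedef struct MIRInstruction {
  uint32_t id;
  uint32_t opcode;
} MIRInstruction;

// Same idea as mir_makenew, but the id counter lives in the context, so two
// contexts (e.g. two files compiled on two threads) never touch shared state.
static MIRInstruction *mir_makenew(CodegenContext *context, uint32_t opcode) {
  MIRInstruction *mir = calloc(1, sizeof(*mir));
  assert(mir && "Memory allocation failure");
  mir->id = ++context->mir_alloc_id;
  mir->opcode = opcode;
  return mir;
}
```

This only buys thread safety if every other piece of mutable codegen state moves into the context too, which is the point being made above.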

Sirraide · 2023-05-02T17:44:06Z

src/codegen/machine_ir.h

```diff
+  uint8_t instruction_form;
+  uint16_t instruction;
```


Swap the order of these two to get rid of 1 byte of padding in this struct.
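A sketch of why the order matters, assuming the usual x86_64 ABIs where `uint16_t` is 2-byte aligned, and adding a hypothetical trailing `flags` byte (the real `MIRInstruction` has other neighbouring fields). In the original order one padding byte sits between the two fields; how much the whole struct actually shrinks depends on the members that follow (with this layout, 6 bytes become 4).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

// Field order as in the diff, followed by a hypothetical one-byte member.
typedef struct {
  uint8_t instruction_form; // offset 0
  uint16_t instruction;     // needs 2-byte alignment: one padding byte before it
  uint8_t flags;            // hypothetical; followed by tail padding
} MIRFieldsOriginal;

// Swapped order: the 2-byte member comes first, so no interior padding is needed.
typedef struct {
  uint16_t instruction;     // offset 0
  uint8_t instruction_form; // offset 2
  uint8_t flags;            // hypothetical; offset 3
} MIRFieldsSwapped;
```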

src/codegen/machine_ir_forward.h

We are slowly working towards an MIR that fully represents the program while being closer in representation to the underlying ISA we will be emitting code for. From here, there's lots of complex stuff to do, like moving register allocation from IR to MIR. Pattern matching for instruction selection using a DSL or something would also be glorious.

Basically, we don't know for sure *when* hardware registers are defined, so we can't properly set interference excluding the definition. (We could if we somehow kept track of where hardware registers get their values set, but that would mean each backend would have to tell us, because we don't know the opcodes at the time of RA, after ISel.)
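A sketch of the interference rule in question, assuming hypothetical live ranges over a linear instruction numbering (not the compiler's actual RA data structures). A range is treated as half-open, from its definition up to but not including its last use, so a value that dies at the instruction defining another value does not interfere with it. When a hardware register's definition point is unknown, the only safe choice is to assume it is live from the start, which adds interference that a known definition point would avoid.

```c
#include <assert.h>
#include <stdbool.h>

// Hypothetical live range over a linear instruction numbering.
typedef struct {
  int def;      // index of the defining instruction, or -1 if unknown
  int last_use; // index of the last instruction that reads the value
} LiveRange;

// Half-open [start, last_use) overlap test. An unknown definition is
// conservatively treated as "live since instruction 0".
static bool interferes(LiveRange a, LiveRange b) {
  assert(a.last_use >= 0 && b.last_use >= 0);
  int a_start = a.def < 0 ? 0 : a.def;
  int b_start = b.def < 0 ? 0 : b.def;
  return a_start < b.last_use && b_start < a.last_use;
}
```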


Sirraide:

> I guess, as an invariant, we can just require in the x86_64 backend that any operation that operates on 32-bit registers clears the upper bits.
> If an instruction for some reason doesn’t do that, we can emit an `and` or sth.

Basically, there are one or two undocumented instructions that *may* not zero the top bits when moving into a 32-bit register, but that's ... *rare*. They are also not used by this compiler, so this seems like a good invariant to have for the x86_64 backend.
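A small model of the invariant being discussed (plain C, not an emulator; the function names are hypothetical). Writing a 32-bit destination on x86_64 zero-extends into the full 64-bit register, while 8- and 16-bit writes leave the upper bits untouched. One caveat for the `and` fallback: it has to be a 32-bit `and` (e.g. `and eax, 0xFFFFFFFF`); a 64-bit `and` with an `imm32` sign-extends `0xFFFFFFFF` to all ones and changes nothing.

```c
#include <assert.h>
#include <stdint.h>

// Writing a 32-bit destination zero-extends into the full 64-bit register.
static uint64_t write_r32(uint64_t old, uint32_t value) {
  (void)old; // the previous upper 32 bits are discarded
  return (uint64_t)value;
}

// Writing a 16-bit destination leaves bits 16..63 untouched.
static uint64_t write_r16(uint64_t old, uint16_t value) {
  return (old & ~(uint64_t)0xFFFF) | value;
}

// The fallback from the quote: what a 32-bit `and r32, 0xFFFFFFFF`
// (or `mov r32, r32`) does to the full register.
static uint64_t clear_upper32(uint64_t reg) {
  return reg & 0xFFFFFFFFu;
}
```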


LensPlaysGames commented three times on Apr 29, 2023, on src/codegen/x86_64/arch_x86_64.c (six review threads, all since marked outdated and resolved).

LensPlaysGames added the enhancement label (New feature or request) Apr 29, 2023

LensPlaysGames force-pushed the machine_code branch from ee32aa6 to f0e1449 April 30, 2023 05:33

LensPlaysGames added 4 commits April 30, 2023 18:13

[x86_64] SIB byte helper

b913332

[Sema] Prevent NULL dereference

a1c089b

[x86_64/MSWIN] Bugfixes that allow for more stack-based arguments

93b5eda

[x86_64/GObj] More machine code encodings

241f0f8

`sub` in `imm_to_mem` form. `mov` in `reg_to_name` form.
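A sketch of the byte-level helpers these encodings (and the SIB byte helper in b913332) revolve around, assuming only the eight legacy registers (no REX.B/REX.X, so r8-r15 are out of scope). `modrm`, `sib` and `encode_sub_imm8_to_mem` are hypothetical names. The `sub` shown is the `imm_to_mem` form with an 8-bit displacement and immediate, `REX.W 83 /5 ib`; for example, `sub qword [rbp - 8], 1` encodes as `48 83 6D F8 01`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

// x86 register numbers as used in ModRM/SIB fields.
enum { RAX = 0, RCX = 1, RSP = 4, RBP = 5 };

// ModRM: mod (2 bits) | reg or opcode extension (3 bits) | r/m (3 bits).
static uint8_t modrm(uint8_t mod, uint8_t reg, uint8_t rm) {
  assert(mod < 4 && reg < 8 && rm < 8);
  return (uint8_t)(mod << 6 | reg << 3 | rm);
}

// SIB: scale as log2(1/2/4/8) | index | base. Used when ModRM.rm == 0b100.
static uint8_t sib(uint8_t scale_log2, uint8_t index, uint8_t base) {
  assert(scale_log2 < 4 && index < 8 && base < 8);
  return (uint8_t)(scale_log2 << 6 | index << 3 | base);
}

// `sub qword [base + disp8], imm8`: REX.W 83 /5 ib. Returns the byte count.
static size_t encode_sub_imm8_to_mem(uint8_t *out, uint8_t base, int8_t disp8, int8_t imm8) {
  assert(base != RSP && "rsp as base needs a SIB byte; not handled here");
  size_t n = 0;
  out[n++] = 0x48;              // REX.W: 64-bit operand size
  out[n++] = 0x83;              // group-1 op r/m64, imm8 (sign-extended)
  out[n++] = modrm(1, 5, base); // mod=01: disp8 follows; /5 selects SUB
  out[n++] = (uint8_t)disp8;
  out[n++] = (uint8_t)imm8;
  return n;
}
```

As a SIB example, `[rax + rcx*4]` uses a ModRM with `rm = 100` and the SIB byte `sib(2, RCX, RAX)`, which is `0x88`.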

LensPlaysGames changed the title ~~[WIP] x86_64 Object File Backend~~ x86_64 Object File Backend May 1, 2023

LensPlaysGames added 9 commits April 30, 2023 23:32

[GObj/COFF] Actually generate string table instead of truncating

e1a9b38
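A sketch of what "actually generate the string table" involves, assuming hypothetical names (`StringTable`, `coff_set_symbol_name`, `coff_finalize_string_table`). Per the PE/COFF specification, a symbol name of up to 8 bytes is stored inline in the symbol record; a longer name is stored as four zero bytes followed by a 4-byte offset into the string table, which begins with a 4-byte size field that counts itself (so the first string is at offset 4).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

// Hypothetical in-memory COFF string table.
typedef struct {
  uint8_t *data;
  uint32_t size; // includes the leading 4-byte size field
} StringTable;

// Fill the 8-byte name field of a symbol record.
static void coff_set_symbol_name(uint8_t field[8], const char *name, StringTable *strtab) {
  size_t len = strlen(name);
  memset(field, 0, 8);
  if (len <= 8) { // short names are stored inline, null-padded
    memcpy(field, name, len);
    return;
  }
  if (strtab->size == 0) strtab->size = 4; // reserve the size field
  uint32_t offset = strtab->size;
  uint8_t *grown = realloc(strtab->data, (size_t)offset + len + 1);
  assert(grown && "Memory allocation failure");
  strtab->data = grown;
  memcpy(strtab->data + offset, name, len + 1); // keep the terminating NUL
  strtab->size = offset + (uint32_t)len + 1;
  // First 4 bytes stay zero; the next 4 hold the offset, little-endian.
  for (int i = 0; i < 4; ++i) field[4 + i] = (uint8_t)(offset >> (8 * i));
}

// Write the size field before the table is emitted after the symbol table.
static void coff_finalize_string_table(StringTable *strtab) {
  if (strtab->size == 0) {
    uint8_t *grown = realloc(strtab->data, 4);
    assert(grown && "Memory allocation failure");
    strtab->data = grown;
    strtab->size = 4;
  }
  for (int i = 0; i < 4; ++i) strtab->data[i] = (uint8_t)(strtab->size >> (8 * i));
}
```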

[Minor] Remove debug print statements

8bcda1d

[GObj] .rodata handling for COFF and ELF

06dd0f2

[x86_64/GObj] Create and use .rodata section for string literals

e534477

[x86_64] Abstract some architecture-specific stuff away from assembly emission

403fef6

[x86_64] ¡No more varargs!

e8ba620

[x86_64] Don't allocate and free register_pool since we aren't using it

3138964

[x86_64] Rename femit to femit_none

6f1b5b3

LensPlaysGames added 4 commits May 1, 2023 21:53

[Codegen] Expansion upon that basic MIR, among other things

63aef7c

Also separate mcode and femit code entirely in the x86_64 backend; that is, the GObj backend and GAS backend are getting closer and closer to actually being separate backends, hehe.

[MIR/Minor] Mildly better debug printout

ec4ae42

[MIR] Name operand type, among other things

c1e46a8

Sirraide reviewed May 2, 2023


LensPlaysGames added 4 commits May 2, 2023 14:46

[Minor] Formatting

0429689

[x86_64] Fix encoding of imm-to-reg form of mov

1c38b7d

Remove machine_ir_forward.h header

bdd7537

LensPlaysGames added 15 commits May 25, 2023 21:32

[x86_64] Print less unless prompted

36829a0

[x86_64/Minor] Add TODO for unhandled SysV ABI case

7e09bf5

This comes to light through `tst/tests/function_arguments_many.int`

[x86_64] No longer insert an unused instruction

af70d09

[Sema] Proper implicit casts for binary arithmetic operators

674bd07

Integer promotion go brr

[Codegen] Actually use type_is_signed, don't just guess

72f95a3

[x86_64/ISel] not in imm form

62ba00a

[x86_64/Asm] cwd, cdq, movsx

77e2bcc

[x86_64/GObj] cwd, cdq, cmp, movsx

8b27035

[x86_64/GObj] idiv in reg form

c3b69fb
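A sketch of why `cwd`/`cdq` show up next to `idiv`: signed division divides the double-width value in `rdx:rax` (or `edx:eax`, `dx:ax`), so the dividend has to be sign-extended into the high half first. The emitter below is hypothetical and only handles the eight legacy registers; `cdq; idiv ecx` encodes as `99 F7 F9`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

// Hypothetical emitter for `cdq; idiv r32` or `cqo; idiv r64`.
// Skipping the sign extension leaves garbage in edx/rdx, which gives wrong
// results or a #DE fault when the quotient does not fit.
static size_t encode_signed_div(uint8_t *out, int bits, uint8_t divisor_reg) {
  assert((bits == 32 || bits == 64) && divisor_reg < 8);
  size_t n = 0;
  if (bits == 64) out[n++] = 0x48; // REX.W turns cdq into cqo
  out[n++] = 0x99;                 // cdq / cqo
  if (bits == 64) out[n++] = 0x48; // REX.W for idiv r/m64
  out[n++] = 0xF7;                 // group-3 op r/m
  out[n++] = (uint8_t)(0xC0 | 7 << 3 | divisor_reg); // mod=11, /7 selects IDIV
  return n;
}

// What cdq computes: edx becomes all ones if eax is negative, else zero.
static uint32_t cdq_edx(int32_t eax) {
  return eax < 0 ? 0xFFFFFFFFu : 0u;
}
```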

[x86_64/GObj] Fix invalid break (supposed to be FALLTHROUGH)

4f25025

Oops!

[x86_64/SysV] Error upon failure to lower a register class

9dc8108

[Tests] Simplify now that we have array initialisation

dcbea08

[x86_64/GObj] movzx machine code

0b3c043

LensPlaysGames mentioned this pull request May 27, 2023

Variables with the same name as registers break Intel assembly #50 (closed)

LensPlaysGames added 11 commits May 27, 2023 15:44

[x86_64/GObj] BugFix: Add missing break

bc52e09

[x86_64/ISel] Branch conditional with immediate condition operand

65ad5f6

Kind of ridiculous, but for debugging purposes...

[Tests] Update ALGOL 68 test suite

5930867

[Tests] Update C++ test suite to allow different intc targets

d453de8

This means we can now test both object file and assembly backends using ctest, as well as optimised versions.

[Tests] Remove object file testing from default suite

2e3a802

Testing the same test in multiple backends is *not* something we need to be doing every time we make a small change; it should be done, but not all the time during development, ig.

[Tests] Fix concurrency issue with file names in C++ test driver

b457123

[Tests] Update lots-of-phi-nodes.int to work with new language semantics

92aac99

[x86_64/GObj] Zero-sized registers will be the bane of my existence

ef62732

I should really just create a macro that does this and apply it to every register operand, lolol.

[x86_64/ISel] sub in reg to reg form

3497ec4

[Codegen] argv is now @@byte, not @@integer

484a905

[Tests] Delete (possibly) created file upon failure

372ec24

LensPlaysGames merged commit 5706b7c into main May 28, 2023

LensPlaysGames deleted the machine_code branch May 28, 2023 20:51

LensPlaysGames mentioned this pull request May 28, 2023

Clang doesn’t accept AT&T assembly without proper suffixes #64 (closed)

