-
Notifications
You must be signed in to change notification settings - Fork 17
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Instruction Selection, Object Files, and More #66
Conversation
Progress Report: TL;DR: We are assembling x86_64 instructions and operands into proper machine code, but that's kind of it. We are missing large portions of required functionality, like keeping track of relocations, symbol entries, etc. Going forward:
|
ee32aa6
to
f0e1449
Compare
`sub` in `imm_to_mem` form. `mov` in `reg_to_name` form.
This is definitely "ill-formed" and will require lots of cleanup, but it is a step in the right direction. "Instruction selection" now becomes it's own pass which we can easily expand upon and make better. We are still doing RA on the regular IR, which is a bit scuffed, so we'll have to change that at some point. Overall, this helps to share more code between each target that supports the x86_64 architecture. Again, this is just a very basic, beginner-level MIR that we will make better over time.
Here comes the boom |
Also separate mcode and femit code entirely in the x86_64 backend; that is, the GObj backend and GAS backend are getting closer and closer to actually being separate backends, hehe.
Basically, we want to have MIR able to represent generic machine instructions as well as specific machine instructions, and we want each architecture/ISA to set up patterns to convert one-or-more of these generic MIR instructions into one-or-more specific MIR instructions. `select_instructions2` is basically those patterns but in code. The idea is that A. RA will move until after all this happens, and operate on MIR instead of IR and B. each target architecture can more easily share arch-specific code, RA, etc, and then each target supported for that arch (i.e. COFF, ELF, GAS, etc) is just a simple translation layer from these MIR<arch> instructions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is only a partial review. I’ll finish this later probably. Left off at machine_ir_forward.h.
@@ -809,6 +809,7 @@ bool codegen | |||
if (!code) ICE("codegen(): failed to open file at path: \"%s\"\n", outfile); | |||
|
|||
CodegenContext *context = codegen_context_create(ast, format, call_convention, dialect, code); | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Big change. Approved.
@@ -60,4 +60,72 @@ enum ComparisonType { | |||
COMPARE_COUNT, | |||
}; | |||
|
|||
/// All instructions that take two arguments. | |||
#define ALL_BINARY_INSTRUCTION_TYPES(F) \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would have left these tables in their respective headers (intermediate_representation.h and machine_ir.h), but I suppose they could also go here. codegen_forward.h was mostly intended for forward declarations to break header dependency cycles
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's also why I needed machine_ir_forward.h
, but I never thought about just putting those in codegen_forward.h
lol
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't know how I thought that was a response to the comment above lol but that was meant for the one down below regarding the header not being needed.
If you want to try and put the IR macros back in their header, then by all means go ahead, but the current situation took enough fighting with the stupid preprocessor and circular includes that I've given up trying to get that to work. I hate includes so much when projects start getting bigger like this.
if (!inst->machine_inst) { | ||
ir_femit_instruction(stdout, inst); | ||
ICE("Must translate IRInstruction into MIR before taking reference to it."); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this play nicely w/ forward references (e.g. branches and PHIs containing forward references due to e.g. loop codegen)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Almost certainly not. What is the solution to that? Do we need like "relocations" for the MIR, or is there some easier solution?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
VRegs or some other index-based solution is the only thing that I can think of unfortunately... That or keep track of a list of instructions that need to be fixed up later, but I wouldn’t really call that a ‘solution’, candidly.
src/codegen/machine_ir.c
Outdated
MIRInstruction *mir_makenew(uint32_t opcode) { | ||
MIRInstruction *mir = calloc(1, sizeof(*mir)); | ||
ASSERT(mir, "Memory allocation failure"); | ||
mir->id = ++mir_alloc_id; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This mir_alloc_id
should probably be part of the codegen context
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not really arguing it's a bad idea to get rid of the statics here---that's definitely a good idea---but I do wonder what it would accomplish
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not really arguing it's a bad idea to get rid of the statics here---that's definitely a good idea---but I do wonder what it would accomplish
Nothing unless you’re planning to be able to codegen several programs/files concurrently. If you care about multithreading at all, then these should all go in the context because that’s what it’s for, otherwise none of that matters to begin with and everything that’s in the context could just as well be a bunch of globals. My point here is, it doesn’t make that much sense to put half of this stuff in the context and make the other half global variables. It’s either one way or the other, but mixing both unfortunately doesn’t end up accomplishing much of anything because you get both the inconvenience of having to pass a context to every function and yet no thread safety in return...
src/codegen/machine_ir.h
Outdated
uint8_t instruction_form; | ||
uint16_t instruction; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Swap the order of these two to reduce the size of this struct by 1 byte of padding
We are slowly working towards having an MIR that fully represents the program while at the same time being closer in representation to the underlying ISA that we will be emitting code for. From here, there's lots of complex stuff to do, like moving register allocation to MIR instead of IR. And also pattern matching for instruction selection using a DSL or something would be glorious.
Basically, we don't know for sure *when* hardware registers are defined, so we can't properly set interference excluding definition (unless we somehow kept track of where hardware registers get their values set, but that would mean each backend would have to tell us that, because we don't know opcodes at time of RA, after ISel).
This comes to light through `tst/tests/function_arguments_many.int`
Integer promotion go brr
Sirraide: > I guess, as an invariant, we can just require in the x86_64 backend that any operation that operates on 32-bit registers clears the upper bits > If an instruction for some reason doesn’t do that, we can emit an and or sth Basically, there is one/two undocumented instruction(s) that *may* not zero the top bits when moving into a 32 bit register, but that's ... *rare*. And also not in use by this compiler. So this seems like a good invariant to have for the x86_64 backend.
Kind of ridiculous, but for debugging purposes...
This means we can now test both object file and assembly backends using ctest, as well as optimised versions.
Testing the same test in multiple backends is *not* something we need to be doing every time we make a small change; it should be done, but not all the time during development, ig.
I should really just create a macro that does this and apply it to every register operand, lolol.
First off, I'll just start by saying I don't even know if this will end up getting merged. The code is quite messy. If I can manage to clean it up, however, that will be rather cool. That means the compiler will be able to emit object files which a linker can then use to generate an executable. This is basically "one layer past" assembly, which we currently emit. This PR aims to provide support for an object file backend for x86_64 while retaining the assembly backend.