Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Types from beginning to end #37

Merged
merged 98 commits into from
Feb 7, 2023
Merged

Types from beginning to end #37

merged 98 commits into from
Feb 7, 2023

Conversation

LensPlaysGames
Copy link
Owner

Right now, type information is thrown out as soon as code generation happens; type information is only ever stored in the AST. This was okay when integer was the only type, but it's coming time that we need to have byte types, and eventually even more complex types.

This PR "carries through" type information from the syntactic/semantic analysis on through code generation, all the way to the backends, by storing type information of each IR instruction. The backend can then react to variables like the size of the type being operated on, whether or not it can fit in different registers, etc.

@LensPlaysGames
Copy link
Owner Author

examples/arrays.un now is as follows:

defun main (%3, %4) global {
bb1:
      %3 │      r2 │ .reg %2 | integer
      %4 │      r3 │ .reg %3 | @@integer
      %5 │      r1 │ .ref int_array | integer[4]
      %6 │      r2 │ .ref first_int_pointer | @integer
      %7 │      r3 │ load %5 | integer[4]
      %8 │      r4 │ imm 0 | <integer_literal>
      %9 │      r3 │ add %7, %8 | integer
         │         │ store into %6, %9 | void
     %10 │      r2 │ .ref first_int | integer
     %12 │      r4 │ imm 0 | <integer_literal>
     %13 │      r3 │ add %11, %12 | integer
     %14 │      r3 │ load %13 | integer
         │         │ store into %10, %14 | void
     %15 │      r2 │ imm 69 | <integer_literal>
     %16 │      r3 │ load %5 | integer[4]
     %17 │      r4 │ imm 0 | <integer_literal>
     %18 │      r3 │ add %16, %17 | integer
         │         │ store into %18, %15 | void
     %19 │      r2 │ imm 420 | <integer_literal>
     %20 │      r3 │ load %5 | integer[4]
     %21 │      r4 │ imm 1 | <integer_literal>
     %22 │      r3 │ add %20, %21 | integer
         │         │ store into %22, %19 | void
     %23 │      r2 │ imm 69 | <integer_literal>
     %24 │      r3 │ load %5 | integer[4]
     %25 │      r4 │ imm 2 | <integer_literal>
     %26 │      r3 │ add %24, %25 | integer
         │         │ store into %26, %23 | void
     %27 │      r2 │ imm 420 | <integer_literal>
     %28 │      r3 │ load %5 | integer[4]
     %29 │      r4 │ imm 3 | <integer_literal>
     %30 │      r3 │ add %28, %29 | integer
         │         │ store into %30, %27 | void
     %31 │      r1 │ load %5 | integer[4]
     %32 │      r2 │ imm 2 | <integer_literal>
     %33 │      r1 │ add %31, %32 | integer
     %34 │      r1 │ load %33 | integer
         │         │ ret %34 | void
}
D:/Programming/strema/2022/FUNCompiler/src/codegen/x86_64/arch_x86_64.c: In function ‘regsize_from_bytes’
D:/Programming/strema/2022/FUNCompiler/src/codegen/x86_64/arch_x86_64.c:207: Internal Compiler Error: Byte size can not be converted into register size on x86_64: 32
  in function �() at offset 201420132022201d
  in function �() at offset 201420132022201d
  in function �() at offset 201420132022201d
  in function �() at offset 201420132022201d
  in function �() at offset 201420132022201d
  in function �() at offset 201420132022201d
  in function �() at offset 201420132022201d
  in function �() at offset 201420132022201d
  in function �() at offset 201420132022201d
  in function �() at offset 201420132022201d
  in function BaseThreadInitThunk() at offset 7fff05cd74a0
  in function RtlUserThreadStart() at offset 7fff05e82680

It seems that if we handle the case of loading an array turning into it's pointer, then we should be okay?

@LensPlaysGames LensPlaysGames added the enhancement New feature or request label Jan 23, 2023
@LensPlaysGames
Copy link
Owner Author

D:/Programming/strema/2022/FUNCompiler/src/codegen/x86_64/arch_x86_64.c: In function ‘regsize_from_bytes’
D:/Programming/strema/2022/FUNCompiler/src/codegen/x86_64/arch_x86_64.c:208: Internal Compiler Error: Byte size can not be converted into register size on x86_64: 0
Could not print stackframe: SymFromAddr returned false (err code 126)!
Could not print stackframe: SymFromAddr returned false (err code 487)!
Could not print stackframe: SymFromAddr returned false (err code 487)!
Could not print stackframe: SymFromAddr returned false (err code 487)!
Could not print stackframe: SymFromAddr returned false (err code 487)!
Could not print stackframe: SymFromAddr returned false (err code 487)!
Could not print stackframe: SymFromAddr returned false (err code 487)!
Could not print stackframe: SymFromAddr returned false (err code 487)!
Could not print stackframe: SymFromAddr returned false (err code 487)!
Could not print stackframe: SymFromAddr returned false (err code 487)!
  in function BaseThreadInitThunk() at offset 7fff05cd74a0
  in function RtlUserThreadStart() at offset 7fff05e82680
ERRORED!                 --  tests/incomplete-param.un
FAILED!  (optimised)     --  tests/incomplete-param.un
    Expected error, but test compiled successfully
test.S: Assembler messages:
test.S:2: Warning: .space repeat count is zero, ignored
FAILED!                  --  tests/incomplete.un
    Expected error, but test compiled successfully
FAILED!  (optimised)     --  tests/incomplete.un
    Expected error, but test compiled successfully

So, there appears to be some issue with incomplete types now. Luckily, it seems like it's just that the typechecker isn't catching this, and codegen is choking on the incomplete type (the type with zero byte size).

@LensPlaysGames
Copy link
Owner Author

At this point, it is by no means perfect, but I'd say we have fully implemented types in the IR, all the way from beginning to end. So maybe it's becoming time to merge this PR very soon

@LensPlaysGames LensPlaysGames mentioned this pull request Jan 27, 2023
@LensPlaysGames
Copy link
Owner Author

LensPlaysGames commented Jan 27, 2023

"Failing" tests:

FAILED!                  --  tests/bug_with_not_or_something.un
ERRORED! (optimised)     --  tests/bug_with_not_or_something.un
FAILED!                  --  tests/byte.un
FAILED! (optimised)      --  tests/byte.un
ERRORED!                 --  tests/cast.un
ERRORED! (optimised)     --  tests/cast.un
ERRORED!                 --  tests/incomplete-param.un
FAILED!  (optimised)     --  tests/incomplete-param.un
FAILED!                  --  tests/incomplete.un
FAILED!  (optimised)     --  tests/incomplete.un
SUCCESS! (optimised)     --  tests/local_arrays.un
SUCCESS!                 --  tests/overload-ref-in-call.un
FAILED! (optimised)      --  tests/overload-ref-in-call.un
FAILED!                  --  tests/overload-return-value-err.un
ERRORED!                 --  tests/overload-return-value.un

bug_with_not_or_something has been broken ever since I noticed it. I really should look into fixing it.

byte.un is kind of ambiguous whether or not the declared test result should be the expected result (no cast)

cast.un fails at codegen with Sorry, unimplemented: Codegen cast from <integer_literal> to byte

incomplete_param.un shows two problems, I think. Unoptimised, an incomplete type is able to be used without the typechecker complaining about it; this causes codegen to fail when trying to figure out how zero bytes fits into a register, lol. Optimised, it says "expected error, but test compiled successfully", so this means that we are just not catching this at all in that case, and codegen doesn't complain because it has been removed due to DCE or something.

incomplete.un is also expecting an error but compiling successfully; likely due to the fact that typechecker doesn't catch usage of incomplete type somewhere.

local_arrays.un, when optimised, does succeed, but it emits warnings that I know are faulty: Warning: Load of uninitialised variable in function foo. foo is a named function, which literally can't be uninitialised. So this is a curious warning; also curious---it's emitted three times even though foo only shows up twice in the program lmao

overload-ref-in-call.un, when optimised, fails with an incorrect exit code of 21 vs the expected 42. Optimisation breaks program semantics, it appears.

overload-return-value-err.un fails because it expects an error, but compiles successfully. Basically, the type checker needs to ensure at some point that all functions in each overload set share the same return type.

overload-return-value.un is broken because it relies on integer_literal to byte casting to work properly, and that isn't implemented in codegen yet.

@LensPlaysGames LensPlaysGames changed the title WIP: Types from beginning to end Types from beginning to end Jan 28, 2023
src/ast.h Outdated Show resolved Hide resolved
src/ast.h Outdated Show resolved Hide resolved
src/codegen.c Outdated Show resolved Hide resolved
src/codegen.c Outdated Show resolved Hide resolved
src/codegen.c Show resolved Hide resolved
src/typechecker.c Outdated Show resolved Hide resolved
src/typechecker.c Show resolved Hide resolved
src/typechecker.c Outdated Show resolved Hide resolved
src/typechecker.c Outdated Show resolved Hide resolved
tst/tests/cast_to_incomplete.un Show resolved Hide resolved
Promise, last one lmao.

Still have to fix assignment source location, eventually
Okay I lied this is the last one, lmao.
This will eventually allow for typedefs to be a drop-in-place feature,
rather than requiring extensive rewriting
This will now allow `ir_static_reference` to take in a direct
`IRStaticVariable`, when necessary, instead of just a name span.

Still not sure how to go about handling the local version of this, to
be honest. Maybe a `copy`?
…uctions

```
femit_reg_to_reg(context, I_MOV, inst->rhs->result, REG_RCX);
femit_reg(context, I_SAR, inst->lhs->result);
femit_reg_to_reg(context, I_MOV, inst->lhs->result, inst->result);
```

This clobbers left, not the right.
@LensPlaysGames
Copy link
Owner Author

LensPlaysGames commented Feb 7, 2023

One thing I've noticed; the type of IR_LOAD instructions seems to be off; this is probably a simple fix, but something in the backend may be relying on that since things are working currently. So I'll see what I can do, but may just open issue and leave it for future me :)

defun calculate_state (%2) global {
bb1:
      %2 │      r2 │ .reg %2 | integer
      %3 │      r1 │ copy %2 | integer
      %4 │      r3 │ alloca 8 | @integer
         │         │ store into %4, %3 | void
      %5 │      r1 │ alloca 8 | @integer
      %6 │      r3 │ load %4 | @integer
         │         │ store into %5, %6 | void
      %7 │      r1 │ alloca 8 | @integer
      %8 │      r2 │ imm 1 | <integer_literal>
         │         │ store into %7, %8 | void
         │         │ br bb2 | void
...

Excerpt from rule110.un example IR output with --debug-ir

I feel like the type of %6 should be integer, right? But in the backend, we may be doing a bodge where we use pointer->to type to counteract this type assignment bug.

@Sirraide
Copy link
Collaborator

Sirraide commented Feb 7, 2023

I feel like the type of %6 should be integer, right?

Yeah, it should

@Sirraide
Copy link
Collaborator

Sirraide commented Feb 7, 2023

This seems to be the issue:
https://github.com/LensPlaysGames/FUNCompiler/blob/40b4430e710e58b6999e9e7a3b8cb6a619da1c59/src/codegen/intermediate_representation.c#L485-L488
The line

load->type = address->type; 

should probably be

Type* t = type_canonical(address->type);
ASSERT(type_is_pointer(t));
load->type = t->pointer.to;

src/codegen.c Outdated Show resolved Hide resolved
Comment on lines 654 to 656
// TODO: `v->reference` may need to become list of references? I think this is why
// optimisation is broken.
// TODO: `v->reference` may need to become list of references?
Copy link
Collaborator

@Sirraide Sirraide Feb 7, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well the original idea was that there would ever only be one IR_STATIC_REF for each static variable. That’s why there simply was no ir_create_static_ref(), simply because the idea of such an operation would have been flawed to begin with, at least when I wrote this.

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting; so there's only some confusion in the fact that IR_STATIC_REF both creates a static as well as holds a reference to an existing static (it's address), in the same way that ALLOCA does. However, can't a static be referenced from multiple scopes? If so, won't each one need a reference to the static? Maybe I'm thinking about this the wrong way

Copy link
Collaborator

@Sirraide Sirraide Feb 7, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When you create a static variable, it creates an IRStaticVariable, which is added to a vector in the context and it’s how the variables are emitted by the backend. This structure holds information about e.g. the name of a static variable:
https://github.com/LensPlaysGames/FUNCompiler/blob/8282edc50843fc5ec882469198e1aedb149dd46e/src/codegen.h#L40-L44

By contrast, an IR_STATIC_REF is an instruction that wraps an IRStaticVariable, in the same way that an IR_ALLOCA wraps an IRStackAllocation. The IRStaticVariable is the variable, the IR_STATIC_REF contains a pointer to that variable. The type of an IR_STATIC_REF is a pointer to whatever the type of the static variable is, for the same reason that the type of an IR_ALLOCA is a pointer to the allocated storage.

The IR_STATIC_REF is just a way to refer to an IRStaticVariable in IR. Only one is needed for each variable as just like the IR_STATIC_REF holds a pointer to the IRStaticVariable, so does the IRStaticVariable contain a pointer to its IR_STATIC_REF. This is why the (not ‘a’) IR_STATIC_REF for a static variable is created together with that variable

src/typechecker.c Outdated Show resolved Hide resolved
Also add error for unhandled IR instruction types, just in case.
There are still some we need to transfer over, like `is_integer` and
such... We'll get there.
@LensPlaysGames
Copy link
Owner Author

The line
load->type = address->type;
should probably be

Type* t = type_canonical(address->type);
ASSERT(type_is_pointer(t));
load->type = t->pointer.to;

I think we also need to be able to dereference integers (in the IR), right? Otherwise subscript operator would just break

@LensPlaysGames
Copy link
Owner Author

Or add and other binary operators need to be able to result in a type other than integer (i.e. pointer)

@Sirraide
Copy link
Collaborator

Sirraide commented Feb 7, 2023

Or add and other binary operators need to be able to result in a type other than integer (i.e. pointer)

This is definitely the case and should be handled in Sema really. E.g. pointer+integer is a pointer, pointer-pointer is an integer, and pointer+pointer is a type error. ;Þ

@Sirraide
Copy link
Collaborator

Sirraide commented Feb 7, 2023

I think we also need to be able to dereference integers (in the IR), right?

No, that doesn’t really make sense... Only pointer types can be dereferenced. If an operation produces something that can or should be dereferenced, then that something should have pointer type.

@LensPlaysGames LensPlaysGames merged commit de4b01a into main Feb 7, 2023
@LensPlaysGames LensPlaysGames deleted the ir_types_bb branch February 7, 2023 21:10
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants