Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Incorrect memory allocation for NEW []:INT #29

Open
dmcoles opened this issue May 2, 2023 · 10 comments
Open

Incorrect memory allocation for NEW []:INT #29

dmcoles opened this issue May 2, 2023 · 10 comments
Assignees
Labels
bug Something isn't working good first issue Good for newcomers
Milestone

Comments

@dmcoles
Copy link
Contributor

dmcoles commented May 2, 2023

create:
c:=NEW [11,21,31,41,51,61,71]:INT
destroy:
END c[7]
fin:

generates this code:

create:
move.l #$10,d3 (allocate 16 bytes)
move.l d3,-(a7)
bsr.l FastNew
addq.l #4
movea.l d0,a1
move.l #$B,d0
move.w d0,0(a1)
move.l #$15,d1
move.w d1,2(a1)
move.l #$1F,d2
move.w d2,4(a1)
move.l #$29,d0
move.w d0,6(a1)
move.l #$33,d1
move.w d1,8(a1)
move.l #$3D,d2
move.w d2,$A(a1)
move.l #$47,d0
move.w d0,$C(a1)
move.l a1,-4(a5)
destroy:
movea.l -4(a5),a1
moveq #7 (number of elements)
lsl.l #1 (*2 = 14)
move.l d1,-(a7)
move.l a1,-(a7)
move.l a1,d3
move.l d3,-(a7)
move.l d1,-(a7)
bsr.l FastDispose
addq.l #8
movea.l (a7)+,a1
move.l (a7)+,d1
moveq #0,d2
move.l d2,-4(a5)
fin:

so the allocation is 16 bytes but the freeing is only 14 bytes.

@SamuraiCrow
Copy link
Contributor

Well spotted!

@SamuraiCrow SamuraiCrow added bug Something isn't working good first issue Good for newcomers labels May 2, 2023
@SamuraiCrow SamuraiCrow added this to the Release 1.0 milestone May 2, 2023
@SamuraiCrow
Copy link
Contributor

@Hypexed , can you verify the PPC assembly generates correct code in this case? I've never learned PPC Assembly yet.

@dmcoles
Copy link
Contributor Author

dmcoles commented May 2, 2023

i'm no expert in PPC but i think its broken there also

                             **************************************************************
                             *                          FUNCTION                          *
                             **************************************************************
                             undefined create()
             undefined         r3:1           <RETURN>
             undefined4        Stack[0x8]:4   local_res8                              XREF[1]:     000009e0(W)  
                             create
        000009a0 38 60 00 10     li         r3,0x10      (16 bytes)
        000009a4 48 00 00 c9     bl         FastNew                                          undefined FastNew()
        000009a8 39 40 00 0b     li         r10,0xb
        000009ac b1 43 00 00     sth        r10,0x0(r3)
        000009b0 39 20 00 15     li         r9,0x15
        000009b4 b1 23 00 02     sth        r9,0x2(r3)
        000009b8 39 00 00 1f     li         r8,0x1f
        000009bc b1 03 00 04     sth        r8,0x4(r3)
        000009c0 38 e0 00 29     li         r7,0x29
        000009c4 b0 e3 00 06     sth        r7,0x6(r3)
        000009c8 38 c0 00 33     li         r6,0x33
        000009cc b0 c3 00 08     sth        r6,0x8(r3)
        000009d0 38 a0 00 3d     li         r5,0x3d
        000009d4 b0 a3 00 0a     sth        r5,0xa(r3)
        000009d8 38 80 00 47     li         r4,0x47
        000009dc b0 83 00 0c     sth        r4,0xc(r3)
        000009e0 90 61 00 08     stw        r3,local_res8(r1)
                             **************************************************************
                             *                          FUNCTION                          *
                             **************************************************************
                             undefined destroy()
             undefined         r3:1           <RETURN>
             undefined4        Stack[0x1c]:4  local_res1c                             XREF[2]:     000009f4(W), 
                                                                                                   00000a00(R)  
             undefined4        Stack[0x18]:4  local_res18                             XREF[2]:     000009f0(W), 
                                                                                                   00000a04(R)  
             undefined4        Stack[0x8]:4   local_res8                              XREF[2]:     000009e4(R), 
                                                                                                   00000a0c(W)  
                             destroy
        000009e4 80 61 00 08     lwz        r3,Stack[0x8](r1)
        000009e8 39 40 00 07     li         r10,0x7     (7 items)
        000009ec 55 4a 08 3c     rlwinm     r10,r10,0x1,0x0,0x1e    (rotate left by 1 and only keep bits 0-30
        000009f0 90 61 00 18     stw        r3,local_res18(r1)
        000009f4 91 41 00 1c     stw        r10,local_res1c(r1)
        000009f8 7d 44 53 78     or         r4,r10,r10
        000009fc 48 00 01 f9     bl         FastDispose                                      undefined FastDispose()
        00000a00 81 41 00 1c     lwz        r10,local_res1c(r1)
        00000a04 80 61 00 18     lwz        r3,local_res18(r1)
        00000a08 39 20 00 00     li         r9,0x0
        00000a0c 91 21 00 08     stw        r9,local_res8(r1)

@SamuraiCrow
Copy link
Contributor

Ok, and W is 32-bit word and H is 16-bit half-word in the store operations. You're right about the fancy rotate 1, mask left 0, mask right 18 to shift.

Conclusion:
It's not in the final code-gen stage. The PPC machine code is equivalent.

@Hypexed
Copy link

Hypexed commented Jul 12, 2023

Sorry bit late to the party here. From a cursory glance this code generated is slightly poor. Obvious improvements could be made to the code generator. It also knows the numbers ahead of time and yet calculates them on the fly. I don't understand why it sends 2 parameters down the stack twice to FastDispose. So on PPC it also loads in 7 and doubles it. Why it doesn't use 14 directly I have no idea. Also, the PPC uses too many memory stores. On PPC it's standard to use registers for locals but it isn't doing that. Well, I suppose it works. :-)

@SamuraiCrow
Copy link
Contributor

The code quality could be addressed by issue #26 . The current code generator is not "production grade" and that issue will allow the replacement of the optimizer and code gen using a recent version of GCC. Please comment there.

@SamuraiCrow
Copy link
Contributor

so the allocation is 16 bytes but the freeing is only 14 bytes.

This is probably the cause of issue #8

Also, the PPC uses too many memory stores. On PPC it's standard to use registers for locals but it isn't doing that. Well, I suppose it works. :-)

There needs to be an OPT OPTIMIZE or something similar in EEC/ECX to trigger the register loading optimizations. It's there, you just need to ask for it.

@Hypexed
Copy link

Hypexed commented Jul 21, 2023

There needs to be an OPT OPTIMIZE or something similar in EEC/ECX to trigger the register loading optimizations. It's there, you just need to ask for it.

That's a good idea. For PPC it should be as standard though. The PPC ABI uses registers for function parameters by convention and stores locals in registers as well. It can also call without using stack to optimise calls further. But in this case it lacks a stack frame for debugging.

@SamuraiCrow
Copy link
Contributor

SamuraiCrow commented Jul 21, 2023

There needs to be an OPT OPTIMIZE or something similar in EEC/ECX to trigger the register loading optimizations. It's there, you just need to ask for it.

That's a good idea. For PPC it should be as standard though. The PPC ABI uses registers for function parameters by convention and stores locals in registers as well. It can also call without using stack to optimise calls further. But in this case it lacks a stack frame for debugging.

When I said "needs to be", I meant either in your code or from the command line. The option is already there. It's shorthand for OPT REGS=MAX + one other optimization. It's equivalent to -O2 in GCC.

Edit

The command line switch is OPTI to activate optimization.

@SamuraiCrow
Copy link
Contributor

Since AllocMem() rounds partial longwords up to the next higher longword length, this can only cause a crash if the deallocation rounds up to a different length than the allocation.

Now if I could find where FastNew() is declared, I might be able to find its callsites in the compiler.

@SamuraiCrow SamuraiCrow self-assigned this Jun 20, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working good first issue Good for newcomers
Projects
None yet
Development

No branches or pull requests

3 participants