Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Need ability to distinguish between 16- and 32- bit disassembly to RE Windows 3.X protected-mode applications #4685

Open
Wall-AF opened this issue Oct 22, 2022 · 6 comments
Assignees
Labels
Feature: Processor/x86 Status: Triage Information is being gathered

Comments

@Wall-AF
Copy link

Wall-AF commented Oct 22, 2022

When REing 16-bit protected mode applications, the flag that indicates the 16-/32-bit definition of each code segment comes from the segment descriptor table (LDT/GDT). This is unavailable (or unused) by Ghidra and consequently is confusing the decompiler by using incorrect assembly when using the 66h/67h overrides on components.

E.g. the bytes 66 60 are being disassembled into PUSHAD instead of (the ia.sync's definition of) PUSHA which in this case is causing the decompiler to produce a whole load of rubbish and even messes with the parameters of the forwarding function call (which, btw is this function's sole purpose). Here's Ghidra's disassembly:

                             **************************************************************
                             *                          FUNCTION                          *
                             **************************************************************
                             void __pascal16far LOGPARAMERROR(enumUINT_LogParamErrors
                               assume DS = 0x1018
             void              <VOID>         <RETURN>
             enumUINT_LogPa    Stack[0xc]:2   idErr                                   XREF[1]:     1000:9517(*)  
             FARPROC           Stack[0x8]:4   pfn                                     XREF[1,1]:   1000:9514(*), 
                                                                                                   1000:9511(*)  
             LPVOID            Stack[0x4]:4   pvDetails                               XREF[1,1]:   1000:950e(*), 
                                                                                                   1000:950b(*)  
                             Ordinal_325                                     XREF[2]:     Entry Point(*), 
                             LOGPARAMERROR                                                K327:1000:94b8(c)  
   1000:9500 55             0           PUSH       BP
   1000:9501 8b ec        002           MOV        BP,SP
   1000:9503 1e           002           PUSH       DS
   1000:9504 06           004           PUSH       ES
   1000:9505 0f a0        006           PUSH       FS
   1000:9507 0f a8        008           PUSH       GS
   1000:9509 66 60        00a           PUSHAD
   1000:950b ff 76 08     02a           PUSH       word ptr [BP + pvDetails+0x2]
   1000:950e ff 76 06     02c           PUSH       word ptr [BP + pvDetails]
   1000:9511 ff 76 0c     02e           PUSH       word ptr [BP + pfn+0x2]
   1000:9514 ff 76 0a     030           PUSH       word ptr [BP + pfn]
   1000:9517 ff 76 0e     032           PUSH       word ptr [BP + idErr]
   1000:951a e8 2d 05     034           CALL       FUN_1000_9a4a                                    undefined FUN_1000_9a4a(enumUINT
   1000:951d 66 61        034           POPAD
   1000:951f 0f a9        014           POP        GS
   1000:9521 0f a1        012           POP        FS
   1000:9523 07           010           POP        ES
   1000:9524 1f           00e           POP        DS
   1000:9525 c9           00c           LEAVE
   1000:9526 ca 0a 00     - ? -         RETF       0xa

and here's the corresponding (incorrect) decompilation:


/* WARNING: Stack frame is not setup normally: Input value of stackpointer is not used */

void __pascal16far LOGPARAMERROR(enumUINT_LogParamErrors idErr,FARPROC pfn,LPVOID pvDetails)

{
   undefined4 in_EAX;
   undefined4 in_ECX;
   undefined4 in_EDX;
   undefined4 in_EBX;
   int iVar1;
   ulong in_ESP;
   ulong in_EBP;
   undefined4 in_ESI;
   undefined4 in_EDI;
   undefined2 unaff_ES;
   undefined2 unaff_SS;
   undefined2 in_FS;
   undefined2 in_GS;
   
   iVar1 = (int)in_ESP;
   *(undefined2 *)(iVar1 + -2) = (int)in_EBP;
   *(int *)(iVar1 + -4) = (int)s_FONTS.FON_1018_100f + 9;
   *(undefined2 *)(iVar1 + -6) = unaff_ES;
   *(undefined2 *)(iVar1 + -8) = in_FS;
   *(undefined2 *)(iVar1 + -10) = in_GS;
   *(undefined4 *)(iVar1 + -0xe) = in_EAX;
   *(undefined4 *)(iVar1 + -0x12) = in_ECX;
   *(undefined4 *)(iVar1 + -0x16) = in_EDX;
   *(undefined4 *)(iVar1 + -0x1a) = in_EBX;
   *(ulong *)(iVar1 + -0x1e) = in_ESP & 0xffff0000 | (ulong)(iVar1 - 10);
   *(ulong *)(iVar1 + -0x22) = in_EBP & 0xffff0000 | (ulong)(iVar1 - 2);
   *(undefined4 *)(iVar1 + -0x26) = in_ESI;
   *(undefined4 *)(iVar1 + -0x2a) = in_EDI;
   *(undefined2 *)(iVar1 + -0x2c) = *(undefined2 *)(iVar1 + 6);
   *(undefined2 *)(iVar1 + -0x2e) = *(undefined2 *)(iVar1 + 4);
   *(undefined2 *)(iVar1 + -0x30) = *(undefined2 *)(iVar1 + 10);
   *(undefined2 *)(iVar1 + -0x32) = *(undefined2 *)(iVar1 + 8);
   *(undefined2 *)(iVar1 + -0x34) = *(undefined2 *)(iVar1 + 0xc);
   *(undefined2 *)(iVar1 + -0x36) = 0x951d;
   FUN_1000_9a4a(*(enumUINT_LogParamErrors *)(iVar1 + -0x34),*(FARPROC **)(iVar1 + -0x32),
                 *(LPVOID *)(iVar1 + -0x30));
   return;
}

If I manually replace the 66 with a NOP, the decompilation becomes:


void __pascal16far LOGPARAMERROR(enumUINT_LogParamErrors idErr,FARPROC pfn,LPVOID pvDetails)

{
   FUN_1000_9a4a(idErr,pfn,pvDetails);
   return;
}

This needs some kind of mechanism to tell Ghidra that the current code segment we are REing is actually one using 32-bit, which in turn would correctly interpret the 66 override (hopefully!).

I have seen this issue in several locations throughout my application suit.

Some of this is described in "Undocumented Windows." by Andrew Schulman
Subtitle: A programmer's guide to reserved Microsoft Windows API functions
Covers Windows 3.1 and 3.0

Originally posted by @Wall-AF in #2033 (comment)

@emteere
Copy link
Contributor

emteere commented Nov 29, 2022

You can try setting the context bits addrsize=1 and opsize=1 to change the decode of the the instructions to a 32-bit model and then re-disassemble this section.

addrsize # =0 16-bit addressing =1 32-bit addressing
opsize= # =0 16-bit operands =1 32-bit operands =2 64-bit operands

opsize gets flipped by the 0x66 prefix, so opsize=1 will end up as 0 and match the PUSHA
addrsize gets flipped by the 0x67, so opsize=1 will end up as 0

These will show up as registers in the set register popup menu. You can try it on this function and if it works for you, select areas that should be considered 32-bit code and use set the context registers.

These settings might not quite solve this issue, but you can try it. There may be other issues with a mixed 16 and 32 bit memory model. I've used 16-bit or 32-bit base processors and overridden them depending on the majority of the code.

@Wall-AF
Copy link
Author

Wall-AF commented Nov 30, 2022

Mmm! Any attempt (with/without selecting a range of assembly) to change either of these context bits results in the following error!
image

What am I doing wong???

@emteere
Copy link
Contributor

emteere commented Dec 2, 2022

Oh yeah, forgot you need to clear the instructions first before setting the context.

You can't change the context if an instruction already is disassembled there because you might be changing something that was used in the instruction match. Same thing applies to changing byte values. You can't change them while an instruction exists there.

@Wall-AF
Copy link
Author

Wall-AF commented Dec 2, 2022

Thanks @emteere. Trying that solution produced (what I believed to be) the correct disassembly, but had absolutely no effect upon the decompilation (see screenshot below) except for the 2 red X's complaining about "inconsistant context"!
image

However, if I replace the two prefix bytes 66h of the two PUSHAD/POPAD instructions with NOP(90h), then the two instructions become PUSHA & POPA (as above) and the decompilaton appears correct (see below).
image

This implies something else is happening somewhere! Any ideas??

@Wall-AF
Copy link
Author

Wall-AF commented Dec 4, 2022

FYI, if I change either or both addrsize/opsize then the decompiler breaks with this error:
Exception while decompiling 1000:9500: Decompiler process died

@GhidorahRex GhidorahRex assigned emteere and unassigned GhidorahRex Dec 12, 2022
@Wall-AF
Copy link
Author

Wall-AF commented May 31, 2023

Mmm, seems the NeLoader supports the 16-/32-bit distinction after all and has identified the example above as 16-bit. However that doesn't deminish the issue described here. The example above comes from a segment identified by the loader as 16-bit and therefore the use of the 66h&67h overrides are indeed correct, however the resulting decompiled output isn't and needs to be supported.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Feature: Processor/x86 Status: Triage Information is being gathered
Projects
None yet
Development

No branches or pull requests

4 participants