Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Segment modified MOV statement ignored by decompiler #4393

Open
Wall-AF opened this issue Jun 29, 2022 · 10 comments
Open

Segment modified MOV statement ignored by decompiler #4393

Wall-AF opened this issue Jun 29, 2022 · 10 comments
Assignees
Labels
Feature: Decompiler Status: Future This has future value but is not being worked on at this time

Comments

@Wall-AF
Copy link

Wall-AF commented Jun 29, 2022

Describe the bug
I have disassembly that is modifying memory in a segment (ES) other than the default (DS):

             assume ES = 0x11f8
   1100:0937 26 88 87     014           MOV        byte ptr ES:[BX + 0x4f62],AL
             62 4f
                                                      $U1c00:2 = INT_ADD BX, 0x4f62:2
                                                      $U3800:4 = CALLOTHER "segment", ES, $U1c00:2
                                                      $U7800:1 = COPY AL
                                                      STORE ram($U3800:4), $U7800:1

However the equivalent decompilation is turning the address reference 0x4f52 into a memory area pointed to by the combination of this address and the value stored in the default data (DS) segment instead of using the overridden ES segment producing
*(byte *)((int)"LMEXTERN" + 8 + (uint)local_b) = local_b & 0x7f;

When I forcibly modify the DS register to 11f8 the output becomes
*(byte *)((int)BYTE_ARRAY_11f8_4f62 + (uint)local_b) = local_b & 0x7f;

This means that the decompiler is ignoring the register override that is correctly interpreted by the disassembly process.

To Reproduce
Steps to reproduce the behavior:

  1. Load the Debug Function Decompilation menu output (see attachment)
  2. See error

Expected behavior
Segment overrides respected through the complete disassembly to decompilation process.

Attachments
Output from Debug Function Decompilation menu
dragon_FUN_1100_0823.zip

Environment (please complete the following information):

  • OS: Windows 11
  • Java Version: 17.0.3.1
  • Ghidra Version: [e.g. 10.2-DEV
  • Ghidra Origin: locally built commit a59c42d
@Wall-AF
Copy link
Author

Wall-AF commented Jul 4, 2022

There is something weird going on with segment referenced addresses in general and constants too! One example I have recently seen is a correctly defined function void InitParDefault (DgnParDetailsList_0xa_t *32 pParDetLst, LPCSTR pszParName, int nValue) that is called with the following assembly:

   1088:007e 68 40 1f     002           PUSH       8000
                                                      $U9400:2 = COPY 0x1f40:2
                                                      SP = INT_SUB SP, 2:2
                                                      $U9580:4 = CALLOTHER "segment", SS, SP
                                                      STORE ram($U9580:4), $U9400:2
   1088:0081 1e           004           PUSH       DS
                                                      $U9400:2 = COPY DS
                                                      SP = INT_SUB SP, 2:2
                                                      $U9580:4 = CALLOTHER "segment", SS, SP
                                                      STORE ram($U9580:4), $U9400:2
   1088:0082 68 98 2a     006           PUSH       0x2a98
                                                      $U9400:2 = COPY 0x2a98:2
                                                      SP = INT_SUB SP, 2:2
                                                      $U9580:4 = CALLOTHER "segment", SS, SP
                                                      STORE ram($U9580:4), $U9400:2
   1088:0085 66 ff 76 06  008           PUSH       dword ptr [BP + pParDetLst]
                                                      $U1100:2 = INT_ADD BP, 6:2
                                                      $U3400:4 = CALLOTHER "segment", SS, $U1100:2
                                                      $U7a00:4 = LOAD ram($U3400:4)
                                                      $U9600:4 = COPY $U7a00:4
                                                      SP = INT_SUB SP, 4:2
                                                      $U9780:4 = CALLOTHER "segment", SS, SP
                                                      STORE ram($U9780:4), $U9600:4
   1088:0089 9a d4 0a     00c           CALLF      InitParDefault                                   void InitParDefault(DgnParDetail
             98 10
                                                      $U9400:2 = COPY CS
                                                      SP = INT_SUB SP, 2:2
                                                      $U9580:4 = CALLOTHER "segment", SS, SP
                                                      STORE ram($U9580:4), $U9400:2
                                                      CS = COPY 0x1098:2
                                                      $U9400:2 = COPY 0x8e:2
                                                      SP = INT_SUB SP, 2:2
                                                      $U9580:4 = CALLOTHER "segment", SS, SP
                                                      STORE ram($U9580:4), $U9400:2
                                                      CALL *[ram]0x10980ad4:4

But the decompilation shows: InitParDefault(pParDetLst,"npre", (int)"State_GetHighestMixedRecogMethod" + 7); ("npre" is located at DS:2a98.)
Which is implying that Ghidra is interpreting PUSH 8000 as a memory location instead of the constant -32768!

Others involve near references to function addresses that should use the code segmentCSregister to create the full address, but Ghidra automagically (incorrectly) uses the data segmentDSregister and there doesn't appear to be any way to override these mistakes.

@caheckman caheckman added Status: Prioritize This is currently being prioritized and removed Status: Triage Information is being gathered labels Jul 21, 2022
@caheckman caheckman added Status: Future This has future value but is not being worked on at this time and removed Status: Prioritize This is currently being prioritized labels Aug 4, 2022
@Wall-AF
Copy link
Author

Wall-AF commented Oct 11, 2022

@caheckman any advances/insights/timeline?

@Wall-AF
Copy link
Author

Wall-AF commented Nov 2, 2022

@ryanmkurtz @caheckman any update???

@ryanmkurtz
Copy link
Collaborator

I must defer to @caheckman

@Wall-AF
Copy link
Author

Wall-AF commented Nov 8, 2022

In dragon_FUN_10d0_0000.zip example, the structure members of param_2 aren't recognised and appear as messy computations (like if (*(int *)((int)param_2 + 10) != idVocab) break;) within the do loop. Maybe also to do with Ghidra's misinterpretation of segments! Any thoughts???

@Wall-AF
Copy link
Author

Wall-AF commented Mar 23, 2023

@caheckman @ryanmkurtz any progress??????

@ryanmkurtz
Copy link
Collaborator

The Future label indicates no progress has been made yet.

@Wall-AF
Copy link
Author

Wall-AF commented Mar 24, 2023

@caheckman @ryanmkurtz Okay, thanks. Sorry for badgering.

@Wall-AF
Copy link
Author

Wall-AF commented Nov 26, 2023

Here's another GOOD example whereby just the segment values are passed (the 1st & 2nd parameters) as references to data tables used to fill the returned (parameter) pResults.

I have placed commented code throughout generated C that Ghidra should be capable of producing, as a guide. Is there anything else I can supply to help you with a solution?

@Wall-AF
Copy link
Author

Wall-AF commented Nov 27, 2023

Here's another GOOD example whereby just the segment values are passed (the 1st & 2nd parameters) as references to data tables used to fill the returned (parameter) pResults.

I have placed commented code throughout generated C that Ghidra should be capable of producing, as a guide. Is there anything else I can supply to help you with a solution?

Having thought more about this, the two following comments of what the code should produce
/* nVal = *MK_FP(pHi, nIdx << 1); */
and
/* nVal = *MK_FP(pLo, (nIdx << 1) & 0xffff); */
would be more usable if each pointer creation was separated, from the actual offset within the structure (or array) it represents, so that the user could modify the datatype assigned (if necessary) and would subsequently be
/* pTmp1 = MK_FP(pHi, 0);
nVal = pTmp1->aInts[nIdx]; */
and
/* pTmp2 = MK_FP(pLo, 0);
nVal = pTmp2->aInts[nIdx & 0x7fff); */
where in this case, the two temporary pointers would be pointing to a structure who's only member (aInts) is an array of type int.

How can this be achieved?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Feature: Decompiler Status: Future This has future value but is not being worked on at this time
Projects
None yet
Development

No branches or pull requests

3 participants