Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Ability to assign a datatype to segment:0x0000 #4686

Open
Wall-AF opened this issue Oct 23, 2022 · 13 comments
Open

Ability to assign a datatype to segment:0x0000 #4686

Wall-AF opened this issue Oct 23, 2022 · 13 comments
Assignees
Labels
Status: Triage Information is being gathered

Comments

@Wall-AF
Copy link

Wall-AF commented Oct 23, 2022

Is your feature request related to a problem? Please describe.
There's no mechanism to apply a datatype to an address of the form SEG:0000. The below (correct disassembly) from KRNL386.EXE (Windows 3.11) illustrates this issue perfectly:

                             **************************************************************
                             *                          FUNCTION                          *
                             **************************************************************
                             void __pascal16far POSTEVENT(HTASK hTask)
                               assume DS = 0x1018
             void              <VOID>         <RETURN>
             HTASK             Stack[0x4]:2   hTask
                             Ordinal_31                                      XREF[1]:     Entry Point(*)  
                             POSTEVENT
   1000:7d92 e8 e7 ff       0           CALL       GetTaskIntoES                                    SegmentCodeAddress GetTaskIntoES
   1000:7d95 26 ff 06       0           INC        word ptr ES:[0x6]
             06 00
   1000:7d9a ca 02 00       0           RETF       0x2

decompiles to:

void __pascal16far POSTEVENT(HTASK hTask)

{
   int *piVar1;
   SegmentCodeAddress SVar2;
   
   SVar2 = GetTaskIntoES(hTask);
   piVar1 = (int *)&DAT_1018_0006;
   *piVar1 = *piVar1 + 1;
   return;
}

converting ES:[0x6] into an address in the applications DS (data segment - 1018:0006). Whereas what's required is for ES:[0x6] to be the 6th-7th byte of a 0x200 byte structure (TaskDatabase).

Describe the solution you'd like
The ability to set a datatype on an address of this format. (I believe it is what used to be a based datatype.)

Describe alternatives you've considered
Changing the global data area (at address 1018:0000) to type TaskDatabase - breaks everything else that really uses the data in 1018:0000 - 1018:0200.

@emteere
Copy link
Contributor

emteere commented Nov 7, 2022

Try setting ES to a value at the beginning of the function as is done for DS.
You could create an arbitrary memory block in segment memory to hold TaskDatabase, and set ES at the beginning of this function to that value.

Looking at the function more closely, I would make a custom return value for the CALL to GetTaskIntoES and have it return a pointer to the TaskDatabase in the ES register. This is actually what is going on. The decompiler should pick that up.

There may be other ways with the new pointer typedef, but I'd try this first.

@emteere emteere self-assigned this Nov 7, 2022
@emteere emteere added the Status: Waiting on customer Waiting for customer feedback label Nov 7, 2022
@Wall-AF
Copy link
Author

Wall-AF commented Nov 7, 2022

Nice one! Creating a memory block to hold a TaskDatabase object worked, but only after setting the DS register to the newly created memory block. I think the reason/fault is described in #4393.

The only issue doing it this way is lots of new memory blocks will be needed for any type accessed this way (or maybe a big union!).

@emteere
Copy link
Contributor

emteere commented Nov 8, 2022

Glad you got it to work.

If you can change the GetTaskIntoES() to return a value in ES and set the return type to (TaskDatabase *), you would probably get a more general solution. Might be some issues, since ES is not the full pointer size, so it might not accept the data type / offset.

@Wall-AF Wall-AF closed this as completed Nov 8, 2022
@Wall-AF Wall-AF reopened this Nov 8, 2022
@Wall-AF
Copy link
Author

Wall-AF commented Nov 8, 2022

If you can change the GetTaskIntoES() to return a value in ES and set the return type to (TaskDatabase *)

That's what I did beforehand, but doing that alone doesn't work because ES is the segment portion of the address and not the offset. Also, setting ES for the next assembly instruction (as below)

             assume ES = 0x9000
   1000:7d95 26 ff 06       0           INC        word ptr ES:[0x6]
             06 00

does nothing when interpreting the addressES:[0x6]. However, setting DS instead like

             assume DS = 0x9000
   1000:7d95 26 ff 06       0           INC        word ptr ES:[0x6]
             06 00

produces a better outcome

void __pascal16far POSTEVENT(HTASK hTask)
{
   int *piVar1;
   TaskDatabase *pTVar2;
   
   pTVar2 = GetTaskIntoES(hTask);
   piVar1 = (int *)&taskDB.nWaitingEvents;
   *piVar1 = *piVar1 + 1;
   return;
}

However, as you can see, the variable pTVar2 is assigned but as it's just the ES segment, the decompiled output doesn't understand it and consequently ignores it!!!

@emteere
Copy link
Contributor

emteere commented Nov 9, 2022

I thought that might happen.
Would be better if the ES were understood as the upper part of the pointer.

Did you set the assume value at the instruction, or at the beginning of the function?
If you were able to do it at the instruction, that is a change in the decompiler I was unaware of. In previous versions, the decompiler only picks up setting of register values at the beginning of a function.

@Wall-AF
Copy link
Author

Wall-AF commented Nov 10, 2022

Would be better if the ES were understood as the upper part of the pointer.

Absolutely. (Don't forget about the other segment registers CS, DS, FS, GS, SS, etc.)

Did you set the assume value at the instruction, or at the beginning of the function?

I changed the register value at the instruction location NOT the beginning of the function.

Also, adding a value for ES at the start of the function just gave ES that value for the first assembly instruction. Selecting all assembly statements in the function DID apply the given value to ES for all instructions but, unless I amended the DS register on the INC instruction, the member (of the taskDB object) wasn't recognised because the data element used the original DS value (0x1018) resulting in the data item being DAT_1018_0006 instead of taskDB.nWaitingEvents (which was the 6th byte at my newly created - and type asigned - location of 9000:0000). (To me, this clearly shows that Ghidra is currently only geared up to work on the data segment DS. Can this be addressed?)

@emteere
Copy link
Contributor

emteere commented Jan 3, 2023

Changing the register value at any instruction location other than at the start of the function will be ignored. It has to occur at the beginning of the function for the decompiler to pick it up.

Setting DS at the INC instruction should have no effect.

I think what is happening is that when you set the ES at the start of the function, and then changed the function to return a value in ES in a custom calling convention, essentially the decompiler thinks the ES register value has changed from the call and no longer uses the value at entry to the function.

There might be an assumption in the decompiler that assumes values are offset from the DS segment, but in this case I'm not sure why you are getting a pointer to &DAT_1018_0006

In the main branch, I get:

void __stdcall16far POSTEVENT(void)

{
  int *piVar1;
  undefined2 unaff_ES;
  
  FUN_1000_7d7c();
  piVar1 = (int *)0x6;
  *piVar1 = *piVar1 + 1;
  return;
}

If I set ES at the beginning of the function with NO changes to the called function signature:

void __stdcall16far POSTEVENT(void)

{
  FUN_1000_7d7c();
  iRam90000006 = iRam90000006 + 1;
  return;
}

Adding the TaskDatabase structure at 9000 yields:

void __stdcall16far POSTEVENT(void)

{
  FUN_1000_7d7c();
  taskDB.nWaitingEvents = taskDB.nWaitingEvents + 1;
  return;
}

This is using protected mode which keeps the segment separate from the offset for pointers.
We'll be changing the default import to protected mode and making some changes to the assumptions for calculations with the CS register. They currently are very incorrect.

@Wall-AF
Copy link
Author

Wall-AF commented Feb 1, 2023

Sorry for the delayed response, I missed yours!

Changing the register value at any instruction location other than at the start of the function will be ignored. It has to occur at the beginning of the function for the decompiler to pick it up.

This is definitely untrue/incorrect for the DS register, but seems true for ES.

Setting DS at the INC instruction should have no effect.

It definitely does!

I think what is happening is that when you set the ES at the start of the function, and then changed the function to return a value in ES in a custom calling convention, essentially the decompiler thinks the ES register value has changed from the call and no longer uses the value at entry to the function.

First, I setup ES as the returned datatype from GetTaskIntoES (FUN_1000_7d7c) prior to attempting to fix POSTEVENT. Regardless, I agree that having GetTaskIntoES return a value for ES has caused the decompiler to cease using the value set at entry. However (as above) I can override the DS register at statement level, so that should be a possibility with other registers.

There might be an assumption in the decompiler that assumes values are offset from the DS segment, but in this case I'm not sure why you are getting a pointer to &DAT_1018_0006

This appears to be the case, and that's why I'm seeing &DAT_1018_0006 as the value of DS is 1018.

This is using protected mode which keeps the segment separate from the offset for pointers. We'll be changing the default import to protected mode and making some changes to the assumptions for calculations with the CS register. They currently are very incorrect.

Please ensure these changes don't require a reload otherwise months/years of work will go down the proverbial drain!!!

@Wall-AF
Copy link
Author

Wall-AF commented Feb 1, 2023

This is using protected mode which keeps the segment separate from the offset for pointers. We'll be changing the default import to protected mode and making some changes to the assumptions for calculations with the CS register. They currently are very incorrect.

Will any of these help with near/far pointers?

@Wall-AF
Copy link
Author

Wall-AF commented Feb 1, 2023

I have a similar situation here - dragon_FUN_1108_0dd6.zip in which a lower level nested structure contains an array of segments that should be used as the basis of another array of uint's, whereas instead I'm seeing an assignment to the constant index, like:
*(undefined2 *)((int)idWord << 1) = 0;.
in the for loop!

Tried the solution described earlier by creating a new memory block, but that didn't work.

@ryanmkurtz ryanmkurtz added Status: Triage Information is being gathered and removed Status: Waiting on customer Waiting for customer feedback labels Feb 1, 2023
@Wall-AF Wall-AF closed this as completed Aug 4, 2023
@Wall-AF Wall-AF reopened this Aug 4, 2023
@Wall-AF
Copy link
Author

Wall-AF commented Aug 4, 2023

@ryanmkurtz any progress?

@Wall-AF
Copy link
Author

Wall-AF commented Aug 5, 2023

In the example above, the line
*(undefined2 *)((int)idWord << 1) = 0;
should actually be something akin to

pObj = MAKELP(pCatNGram.a5x12Byte_0x24.m_saSegs_0x8.m_aSegs[(idWord % 65536) >>15], idWord<<1);
*pObj = 0;

as each element of the m_aSegs array is a 2-byte segment descriptor and the offset into that segment is the index represented by another 2-byte (word) array index idWord, hence together making up a 32-bit address.

@ryanmkurtz
Copy link
Collaborator

In general, if the label on the ticket is still Status: Triage, you can assume no progress.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Status: Triage Information is being gathered
Projects
None yet
Development

No branches or pull requests

3 participants