Address not in trace #4059

Closed
astrelsky opened this issue Mar 8, 2022 · 44 comments

@astrelsky
Contributor

astrelsky commented Mar 8, 2022

Describe the bug
Attempting to go to a virtual address to see the values in memory results in "Address not in trace" error.

To Reproduce
Steps to reproduce the behavior:

  1. Compile the two code snippets in the attachments sections.
  2. Run analysis in the code browser for the launcher and the dll.
  3. Open the launcher and the dll in the debugger tool.
  4. In the launcher tab go to main and then in the decompiler window right-click anywhere just for laughs (and because I'm lazy).
  5. Launch the debugger, in-vm and add the dll as the first argument.
  6. Set a breakpoint after the call to LoadLibraryA.
  7. Run until the breakpoint is hit.
  8. Switch to the dll tab.
  9. Open the modules view, right click on your dll and select "Map module to {dll_name}".
  10. Set a breakpoint at the entry point to debuggerProblems.
  11. Go back to the launcher and continue running until the new breakpoint is hit.
  12. Step until the string pointer is loaded into a register.
  13. Find the register in the memory view.
  14. Double-click the register, right click, go to the stack-view, registers in the object tree for the stopped thread, look through the Windows toolbar and just scratch your head in utter confusion.
  15. Give up and try to use goto.
  16. Enter the address or be lazy and use *:4 EAX (assuming it is in EAX).
  17. Be greeted with "Address not in trace" error text.
  18. Mash enter repeatedly hoping it will magically work the next time.
  19. [image: 7686178464_fdc8ea66c7]

Attachments

main.cpp
#include <windows.h>
#include <iostream>
#include <system_error>

[[noreturn]] void garbageWinApiError() {
    std::error_code ec (errno,std::system_category());
    std::cerr << ec.message() << std::endl;
    throw std::system_error(ec);
}

struct Dll {
    HMODULE mod;

    Dll(HMODULE m) : mod(m) {};
    Dll& operator=(HMODULE m) { mod = m; return *this; }
    ~Dll() { FreeLibrary(mod); }
    operator HMODULE() { return mod; }
    operator bool() { return mod != nullptr; }
};

int main(int argc, const char **argv) {
    if (argc <= 1) {
        return -1;
    }
    Dll lib = LoadLibrary(argv[1]);
    if (!lib)
        garbageWinApiError();
    FARPROC fun = GetProcAddress(lib, "debuggerProblems");
    if (fun == nullptr)
        garbageWinApiError();
    fun();
    std::cout << "Press any key to continue..." << std::endl;
    std::cin.get();
}
dllmain.cpp
#include <Windows.h>
#include <iostream>

extern "C" BOOL WINAPI DllMain(HINSTANCE hinstDLL, DWORD fdwReason, LPVOID lpReserved) {
    if (fdwReason == DLL_PROCESS_ATTACH) {
        std::cout << "Deleting System32 please wait...\n"
            << "Deleted successfully...\n"
            << "Sending all your data through Ghidra's backdoor or something..." // for debugging purposes of course ◔_◔
            << std::endl;
    }
    return true;
}

__declspec(dllexport) extern "C" void debuggerProblems() {
    std::cout << "You're too late we're already done!" << std::endl;
}

Environment (please complete the following information):

  • OS: Microsoft Windows [Version 10.0.22000.527]
  • Java Version: 11
  • Ghidra Version: 10.2_DEV ee268de
  • Ghidra Origin: [e.g. official ghidra-sre.org distro, third party distro, locally built]

Additional context
#3151

@d-millar
Collaborator

d-millar commented Mar 8, 2022

@astrelsky getting on a plane right now, but will take a look at this after I get back, either this aft or tomorrow.

@astrelsky
Contributor Author

astrelsky commented Mar 8, 2022

Ok, no rush.

@nsadeveloper789
Contributor

This might be another case of a stale or incomplete memory map. My guess is the memory map does not include the library loaded via LoadLibrary. This should be correctable in two ways. One is to refresh the Memory node in the Objects view. The second is to tell the listing to ignore the memory map. Find the Regions view (by default it's docked in the same place as Watches and Stack views). In its toolbar should be a dropdown menu. Toggle "Force Full View". That should allow you to navigate to any address, regardless of whether Ghidra thinks it's actually mapped. Granted, if you navigate to places that aren't actually mapped, expect to see a log full of errors (eh, more errors than usual).

@astrelsky
Contributor Author

astrelsky commented Mar 10, 2022

I mentioned it in the comment in the discussion just now but will make it here as well. I get the same behavior if I link directly to the dll's lib file instead of using LoadLibrary.

Just saw the second half, trying now.

@d-millar
Collaborator

@astrelsky WOW, OK, so I wasn't able to repeat your experiment - not because I didn't encounter your problem, but because I hit so many other problems on the way to your problem that I never got to your problem. Guess I know what I'll be doing for the next week or three. Will try to keep you posted as I go, but either the existing unit tests are not keeping the dbgeng version in a sane state or something about your example is causing it to go seriously haywire. Among the issues I'm seeing:

  • I seem to be getting phantom breakpoints with zero addresses all over the place
  • the behavior of the breakpoints is inconsistent depending on whether you add them in the static or dynamic windows
  • icons may or may not be added to both listings for the breakpoints
  • breakpoints added permanently to the program are not triggering automatic loads
  • new breakpoints show as disabled or in a mixed state in the Breakpoints view
  • breakpoints (possibly phantom) can't be cleared
  • adding the breakpoint in the static listing may cause repeated "Unable to insert breakpoint" messages
  • the dynamic window is not auto-disassembling to match the static listing
  • deletes from the memory list are causing concurrent processing exceptions
  • save for the command-line launch command is misbehaving
    NOT TO MENTION
  • I can't get the debuggerProblems breakpoint to fire at all

sigh more tomorrow

@astrelsky
Contributor Author

Wait, this is the normal behavior? 🤣

In order to get the breakpoints in the dll to fire you have to map the module to the dll. Please excuse the horrid screenshot below.

huge screenshot

screenshot

@nsadeveloper789 with respect to the second half with enabling the full view: once it is enabled, while I am tracking the instructions for my function in the static listing, the dynamic view is grayed out and all zeroes. This may be seen in the screenshot above as well.

This might end up being a bit of a challenge but I, as well as many others I'm sure, would need to be able to break and step through DllMain itself. I suspect it is achievable in a similar fashion to breaking when starting an executable in the debugger but I don't actually know anything about the api. As a matter of fact the most I really know about the Windows api is it's a dumpster fire.

@d-millar
Collaborator

d-millar commented Mar 10, 2022

Well, leaving aside the issue of "normal", none of those things are supposed to happen. And, I'm pretty sure, you shouldn't have to map dll.dll to DLL.DLL. That function should only be necessary if the program name is not a match to the name of the executable as seen by the loader. After LoadLibraryA is called, there should be a load module event and the memory should be updated to reflect the load. You might need to pre-map the memory if you wanted to set up the breakpoint before the LoadLibraryA call, but not after.

@d-millar
Collaborator

Just out of curiosity, at what point was the screenshot above taken? i.e. before or after the break post-LoadLibraryA? Looks like it was taken at the first, i.e. windows-inserted default, breakpoint, well before the dll load.

@astrelsky
Contributor Author

astrelsky commented Mar 10, 2022

The current instruction is 1000b95c. I was stepping through the dll function seen in the listing.

@nsadeveloper789
Contributor

@d-millar you may be working from the master branch. Even so, if that's the case, sounds like quite a few problems have cropped up.

@astrelsky Following up on the second "Force-full-view" method, I suspect it's not a feature used very often, and some of the bits of automation are still heeding the incomplete map. You might be able to work around by selecting a portion of the grey 0s and clicking the little "memory chip" looking button in the dynamic listing. That should cause it to refresh pages containing the selected addresses. Whether or not that works, sounds like you've found another bug :) .

@d-millar
Collaborator

@nsadeveloper789 - yeah, a lot. more than I have listed above. we should discuss at some point, but fixing what I can right now.

@d-millar
Collaborator

OK, bit of an update: still a lot of things I need to work on, but I have at least some understanding of what's going on. The main issue is a concurrency bug in how I update the memory regions. The addRegion logic was ok, but the removeRegion logic was, well, not so good. Why is that relevant, you might ask? To answer that, let me backtrack a bit and walk through your experiment to make sure I'm on the same page. So, your steps with some edits:

Steps to reproduce the behavior:

  1. Compile the two code snippets in the attachments sections. check
  2. Run analysis in the code browser for the launcher and the dll. check
  3. Open the launcher and the dll in the debugger tool. check
    (4. In the launcher tab go to main and then in the decompiler window right-click anywhere just for laughs (and because I'm lazy).) Not necessary AFAICT
  4. Launch the debugger, in-vm and add the dll as the first argument. check
  5. Set a breakpoint after the call to LoadLibraryA. check
    I should note that I hit resume once to cause the conhost.exe process to launch before setting the bpt, then selected the original thread, and set the breakpoint. This uncovered some interesting bugs, including a failure to correctly re-direct to the original process.
  6. Run until the breakpoint is hit. check
    (8. Switch to the dll tab.)
    (9. Open the modules view, right click on your dll and select "Map module to {dll_name}".)
    Am pretty sure (need to check again) steps 8 & 9 are unnecessary. At least in my case, the name of the DLL program and the name used by the loader are the same, so the mapping happens on its own.
  7. Set a breakpoint at the entry point to debuggerProblems. check
  8. Go back to the launcher and continue running until the new breakpoint is hit. Took me a while until I noticed I was missing the export. My compiler has a definite preference for 'extern "C" __declspec(dllexport)' over '__declspec(dllexport) extern "C"'
  9. Step until the string pointer is loaded into a register.
    OK, so here's where I start seeing chaos. The first step or two triggered a thread switch to the conhost.exe process. The thread switch caused memory to be re-read, but, I think, on an almost duplicate space. New list triggered deletes on the old list, which triggered the concurrency bug. Bug was absorbed by a CompletableFuture, but left that process with no memory. No memory meant no Dynamic View, no tracking, etc.

Haven't really gotten to 13-18 (although I pretty much am always doing 18). Guessing I am not at the end of the difficulties yet, but am trying to fix as I go.
13. Find the register in the memory view.
14. Double-click the register, right click, go to the stack-view, registers in the object tree for the stopped thread, look through the Windows toolbar and just scratch your head in utter confusion.
15. Give up and try to use goto.
16. Enter the address or be lazy and use *:4 EAX (assuming it is in EAX).
17. Be greeted with "Address not in trace" error text.
18. Mash enter repeatedly hoping it will magically work the next time.

@astrelsky Big thanks for this example - it highlights a bunch of bugs I probably wouldn't have seen through the unit tests. The concurrency bug is bad, but the thread/process mismatch could be causing some fairly insidious behavior. I also discovered our cleanup logic was failing, so re-running the same process causes progressively worse performance.
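
The export ordering noted in step 8 of the walkthrough above can be shown in isolation. This is only a sketch, not the attachment itself: it puts the linkage specification before the export attribute, which is the order that compiler preferred. The DEMO_EXPORT macro is a made-up name, added purely as an assumption so the snippet also builds with non-Windows toolchains.

```cpp
#include <iostream>

// DEMO_EXPORT is a hypothetical helper macro, not part of the original attachment.
#if defined(_WIN32)
#  define DEMO_EXPORT __declspec(dllexport)
#else
#  define DEMO_EXPORT __attribute__((visibility("default")))
#endif

// Linkage specification first, export attribute second.
// extern "C" keeps the exported name unmangled, so a lookup by the plain
// name "debuggerProblems" (as GetProcAddress does in main.cpp) can find it.
extern "C" DEMO_EXPORT void debuggerProblems() {
    std::cout << "You're too late we're already done!" << std::endl;
}
```

If the export is missing, the launcher's GetProcAddress call returns nullptr and the breakpoint in debuggerProblems never has a chance to fire.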

@astrelsky
Contributor Author

astrelsky commented Mar 11, 2022

> (4. In the launcher tab go to main and then in the decompiler window right-click anywhere just for laughs (and because I'm lazy).) Not necessary AFAICT

This was to point out the error popup that occurs when right clicking in the decompiler window when there is no debugger connection.

What section of the debugger api were the concurrency issues in? Was it just the windows debugger portions or would it affect gdb too? For a while I was working on a gdb stub for yuzu to use with Ghidra but after getting the stub nearly complete I gave up. I was having problems getting breakpoints to be consistent among other things and couldn't figure out why. I assumed it to be due to the lack of debugging functionality built into yuzu and dynarmic and put the project aside.

@d-millar
Collaborator

@astrelsky The concurrency issues (and, in fact, most of the issues I've mentioned) have been in the Windows-specific code, so probably not relevant for the gdbstub. If you get back to the stub work at some point, feel free to give me a shout. Have written stubs before, albeit not in the last decade - not sure if I can be helpful but willing to try.

On the decompiler error, that seems a little surprising - the decompiler is more or less completely decoupled from the debugger. What was the actual error?

@astrelsky
Contributor Author

stacktrace
2022-03-11 10:17:39 ERROR (SwingExceptionHandler) Error: Uncaught Exception!  java.lang.NullPointerException
	at ghidra.app.plugin.core.debug.service.modules.DebuggerStaticMappingServicePlugin.getDynamicLocationFromStatic(DebuggerStaticMappingServicePlugin.java:904)
	at ghidra.app.plugin.core.debug.gui.watch.DebuggerWatchesProvider.getDynamicLocation(DebuggerWatchesProvider.java:582)
	at ghidra.app.plugin.core.debug.gui.watch.DebuggerWatchesProvider.hasDynamicLocation(DebuggerWatchesProvider.java:610)
	at docking.action.builder.AbstractActionBuilder.lambda$adaptPredicate$1(AbstractActionBuilder.java:763)
	at docking.action.DockingAction.isEnabledForContext(DockingAction.java:184)
	at docking.PopupActionManager.populatePopupMenuActions(PopupActionManager.java:143)
	at docking.PopupActionManager.createPopupMenu(PopupActionManager.java:104)
	at docking.PopupActionManager.popupMenu(PopupActionManager.java:84)
	at docking.ActionToGuiMapper.showPopupMenu(ActionToGuiMapper.java:133)
	at docking.DockableComponent.showContextMenu(DockableComponent.java:173)
	at docking.DockableComponent$1.mouseReleased(DockableComponent.java:75)
	at java.desktop/java.awt.AWTEventMulticaster.mouseReleased(AWTEventMulticaster.java:298)
	at java.desktop/java.awt.Component.processMouseEvent(Component.java:6635)
	at java.desktop/javax.swing.JComponent.processMouseEvent(JComponent.java:3342)
	at java.desktop/java.awt.Component.processEvent(Component.java:6400)
	at java.desktop/java.awt.Container.processEvent(Container.java:2263)
	at java.desktop/java.awt.Component.dispatchEventImpl(Component.java:5011)
	at java.desktop/java.awt.Container.dispatchEventImpl(Container.java:2321)
	at java.desktop/java.awt.Component.dispatchEvent(Component.java:4843)
	at java.desktop/java.awt.LightweightDispatcher.retargetMouseEvent(Container.java:4918)
	at java.desktop/java.awt.LightweightDispatcher.processMouseEvent(Container.java:4547)
	at java.desktop/java.awt.LightweightDispatcher.dispatchEvent(Container.java:4488)
	at java.desktop/java.awt.Container.dispatchEventImpl(Container.java:2307)
	at java.desktop/java.awt.Window.dispatchEventImpl(Window.java:2772)
	at java.desktop/java.awt.Component.dispatchEvent(Component.java:4843)
	at java.desktop/java.awt.EventQueue.dispatchEventImpl(EventQueue.java:772)
	at java.desktop/java.awt.EventQueue$4.run(EventQueue.java:721)
	at java.desktop/java.awt.EventQueue$4.run(EventQueue.java:715)
	at java.base/java.security.AccessController.doPrivileged(Native Method)
	at java.base/java.security.ProtectionDomain$JavaSecurityAccessImpl.doIntersectionPrivilege(ProtectionDomain.java:85)
	at java.base/java.security.ProtectionDomain$JavaSecurityAccessImpl.doIntersectionPrivilege(ProtectionDomain.java:95)
	at java.desktop/java.awt.EventQueue$5.run(EventQueue.java:745)
	at java.desktop/java.awt.EventQueue$5.run(EventQueue.java:743)
	at java.base/java.security.AccessController.doPrivileged(Native Method)
	at java.base/java.security.ProtectionDomain$JavaSecurityAccessImpl.doIntersectionPrivilege(ProtectionDomain.java:85)
	at java.desktop/java.awt.EventQueue.dispatchEvent(EventQueue.java:742)
	at java.desktop/java.awt.EventDispatchThread.pumpOneEventForFilters(EventDispatchThread.java:203)
	at java.desktop/java.awt.EventDispatchThread.pumpEventsForFilter(EventDispatchThread.java:124)
	at java.desktop/java.awt.EventDispatchThread.pumpEventsForHierarchy(EventDispatchThread.java:113)
	at java.desktop/java.awt.EventDispatchThread.pumpEvents(EventDispatchThread.java:109)
	at java.desktop/java.awt.EventDispatchThread.pumpEvents(EventDispatchThread.java:101)
	at java.desktop/java.awt.EventDispatchThread.run(EventDispatchThread.java:90)

As for the gdbstub then I'm almost certain it is due to lack of debugging support in yuzu/dynarmic.

@d-millar
Collaborator

Gotcha - well, I can fix the NPE for sure. :)

@astrelsky
Contributor Author

astrelsky commented Mar 11, 2022

I'd hope so 😅. That's why it was "for laughs".

@d-millar
Collaborator

AH! (missed that)

On the progress front, almost "everything" fixed. Not sure if the fixes will make the next minor release, but certainly 10.2.

@d-millar
Collaborator

One thing probably not in "everything" is how to walk through DllMain. I think the answer is to break before LoadLibraryA, then select up Debugger->Sessions[0]->Processes[x]->Debug->Events->Load module->Execute, and hit T a couple of times to toggle the option to "Break". When you continue, the process should break on the load and the address space for DllMain should be valid. You could then set a breakpoint there and single-step through the code. You'll want to wait until the LoadLibraryA call to avoid having to resume a gabillion times for every module load and probably will want to toggle Break->Output after for the same reason.

@d-millar
Collaborator

To follow up on [https://github.com//discussions/3151], another approach to this which I just verified is to run "/c:/windows/system32/rundll32.exe DLL.dll,debuggerProblems". On the initial break, in the Interpreter window, enter "bp DLL!debuggerProblems". The breakpoint will be deferred and will trigger on entry to debuggerProblems. I should figure out how to make a non-command line version of the same, but that will have to be a problem for Monday.

@astrelsky
Contributor Author

Excellent thank you.

@nsadeveloper789
Contributor

FWIW, the stack trace from right-clicking in the decompiler is already fixed in an internal branch. It's awaiting review.

Also, regarding setting a deferred breakpoint, there is an "Add breakpoint" button in the "Objects" window. AFAIK for Windows, that should be a shortcut for the "bp" command. You'd enter DLL!debuggerProblems into the dialog.

@d-millar
Collaborator

@nsadeveloper789 yep, gotcha - new adds move "DLL!debuggerProblems" automatically into the dialog if you "Add" off that node in Objects' Process->Module->Symbols.

@astrelsky
Contributor Author

Is this supposed to be fixed yet in the master branch? I just want to be sure we are on the same page as I tried yesterday and was still unable to see anything on the heap.

@d-millar
Collaborator

@astrelsky Some, but not all, of the fixes mentioned above have been incorporated into master. With regard to the heap, I think we need to nail down the specifics on this one. If this is still an issue of using "*:4 EAX" with GoTo, I think the problem is the comment on the GoTo, which is frankly confusing and which has been modified. Specifically, if you want to go to the address in RAX, you should just enter RAX. If you want to go to the contents of the address at RAX, you should enter *:8 RAX. In other words, *:8 RAX assumes something like RAX == DEADBEEF, DEADBEEF is a pointer, say, containing FEEDFACE, and you want to go to FEEDFACE. If FEEDFACE is not in the trace, *:8 RAX will throw the "Not in trace" error.

If this is NOT the situation, then, as @nsadeveloper789 mentioned, the problem is probably incomplete information provided by the API. For Windows, the API is IDebugDataSpaces::QueryVirtual. Possibly we're misusing this, I guess, but, if so, I'm not sure of the fix. I believe this provides the same info as "!address" in Windbg. Is your address in the list if you issue "!address" from the Interpreter window? Also, you should still have the option to force all of memory in as described. Does your address not appear in the Dynamic Listing if you've forced memory in?
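
The difference between the two GoTo forms described above can be modeled in a few lines. This is a toy model only: ToyTrace, gotoRegister and gotoDeref8 are hypothetical names, and nothing here is Ghidra's actual implementation. It assumes the listing refuses to navigate to any address outside the memory map it currently knows about.

```cpp
#include <cstdint>
#include <map>
#include <vector>

struct Region { std::uint64_t start; std::uint64_t length; };
struct GotoResult { bool inTrace; std::uint64_t address; };

// Hypothetical stand-in for a trace: a memory map plus some 8-byte values.
struct ToyTrace {
    std::vector<Region> regions;                    // regions the debugger reported
    std::map<std::uint64_t, std::uint64_t> words;   // 8-byte values keyed by address

    bool contains(std::uint64_t addr) const {
        for (const Region& r : regions) {
            if (addr >= r.start && addr - r.start < r.length) return true;
        }
        return false;
    }
};

// Models "RAX": navigate to the address held in the register itself.
GotoResult gotoRegister(const ToyTrace& trace, std::uint64_t rax) {
    return GotoResult{trace.contains(rax), rax};
}

// Models "*:8 RAX": read 8 bytes at the address in RAX, then navigate to
// the value that was read. The check applies to that value, not to RAX.
GotoResult gotoDeref8(const ToyTrace& trace, std::uint64_t rax) {
    auto it = trace.words.find(rax);
    if (it == trace.words.end()) return GotoResult{false, 0};
    return GotoResult{trace.contains(it->second), it->second};
}
```

With RAX == 0xDEADBEEF pointing at the value 0xFEEDFACE, gotoRegister succeeds as long as 0xDEADBEEF is mapped, while gotoDeref8 reports "not in trace" until a region covering 0xFEEDFACE is present in the map.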

@astrelsky
Contributor Author

If I force all memory I can go to it but it is grayed out and all 0's. However when dereferenced by an instruction I can see it is not 0.

@d-millar
Collaborator

when you force it in, you probably have to refresh/flush caches (yes, anticipating a scowl, that should be fixed)

ryanmkurtz added this to the 10.2 milestone Mar 22, 2022
@astrelsky
Contributor Author

astrelsky commented Mar 22, 2022

How would I do that? If you meant to just refresh the objects in the tree that didn't help.

I should mention that I don't actually know how to use the debugger outside of Ghidra so I'm not familiar with any of its commands.

@d-millar
Collaborator

that was the right thing - well, there are two options. "Refresh" in the Objects view with the root selected, and "flush caches" in the Targets view from either the pull-down or a right-click. So, I think you did the right thing. Which means I clearly do not have a grasp of the problem yet. (Side note: the #4059 change set has now been pushed to our local master - shouldn't be too long before it's in the public master.)

@d-millar
Collaborator

So, am trying to get this down to the minimal experiment to make sure you and I are doing the same thing. My latest runs have been along the lines of: connect to IN-VM dbgeng, launch "Test.exe TestDll.dll" (my local names), resume once to get the conhost process to launch, enable the bpt after the load, resume until the bpt gets hit, enable the bpt in TestDll.dll, resume, bpt hits. Haven't done anything past that. Are you doing more or less the same? And just because someone else here had a similar issue, is the pull-down in the Dynamic listing set to track PC? And is the trace associated with Test (vs conhost) in Threads selected?

@astrelsky
Contributor Author

astrelsky commented Mar 22, 2022

> So, am trying to get this down to the minimal experiment to make sure you and I are doing the same thing. My latest runs have been along the lines of: connect to IN-VM dbgeng, launch "Test.exe TestDll.dll" (my local names), resume once to get the conhost process to launch, enable the bpt after the load, resume until the bpt gets hit, enable the bpt in TestDll.dll, resume, bpt hits. Haven't done anything past that. Are you doing more or less the same?

It is similar. It could be because I still have to do the map modules to xyz. 100% of the time the filename I have will not be what is expected.

> And just because someone else here had a similar issue, is the pull-down in the Dynamic listing set to track PC?

yes

> And is the trace associated with Test (vs conhost) in Threads selected?

yes

I didn't flush the caches, only the refresh. Should I try flushing them?

@d-millar
Collaborator

Can't hurt (he says). I will try renaming my program and testing that way

ryanmkurtz pushed a commit that referenced this issue Mar 23, 2022
GP-1812: another revert
GP-1812: moving changes to alt branches
GP-1812: comment in goto no longer applies to registers
GP-1812: new providers not retrieving configState
GP-1812: NPE mentioned in #4059
GP-1812:  MISSING+ENABLED -> ENABLED, not DISABLED_ENABLED
GP-1812: name inconsistency in breakpoints
GP-1812: String->Address to assist navigation
GP-1812: force memory refresh on module load
GP-1812: concurrency error processing memory
GP-1812: thread/process fix for dbgmodel; restricting changeElements to matching container matching process
GP-1812: make currentThread/Process consistent
GP-1812: fix for failed DebugClient cleanup; callback error msg issue
@d-millar
Collaborator

I'm going to re-open this - am still trying to understand the missing memory issue.

d-millar reopened this Mar 23, 2022
@d-millar
Collaborator

OK, I have a fix - took me a while to understand what was really happening. Hopefully, will hit master in the near future. In the meantime, I can describe the problem and a workaround.

The issue: we try not to update memory on every event because it's super-expensive. We were updating it before on process start-up, module loads, and resync. Would have thought "Flush Caches" would trigger a resync, but it doesn't. Would have thought "Refresh" on the Object tree root or the Memory node would trigger a resync, but they don't. The resync only happens if the Memory node is open. While this probably seems dumb, it's actually a good design decision for a couple of reasons. Namely, it restricts updates to those items the user is presumably interested in, and it prevents potentially recursive transits through the tree, which contains linked nodes.

So, the workaround is to (1) open the Memory node, and (2) select/refresh it if you suspect memory is stale.

The full fix should allow "Refresh" from anywhere in the tree above or on Memory and increase the number of events that auto-refresh memory.

@astrelsky
Contributor Author

Ok great. I will give it a shot tomorrow morning. Thank you.

@astrelsky
Contributor Author

I was just thinking about this and am wondering if it would be sufficient to set a "dirty" flag on events which update memory and then the actual update would occur on a user triggered event such as reaching a breakpoint or stepping.
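
As a rough illustration of that "dirty" flag idea, here is a minimal sketch. It assumes some event source can call markDirty() whenever the target's memory map may have changed, which may not hold for a real debugger backend. LazyMemoryMap and its members are hypothetical names, not Ghidra classes.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <utility>
#include <vector>

struct Region { std::uint64_t start; std::uint64_t length; };

// Hypothetical cache that defers the expensive memory-map query until the
// target actually stops.
class LazyMemoryMap {
public:
    using Query = std::function<std::vector<Region>()>;

    explicit LazyMemoryMap(Query query) : query_(std::move(query)) {
        assert(query_ && "a query callback is required");
    }

    // Cheap: call for any event that may have changed the memory map.
    void markDirty() { dirty_ = true; }

    // Call when the target stops (breakpoint hit, step finished).
    // The expensive query runs only if something marked the map dirty.
    const std::vector<Region>& onStop() {
        if (dirty_) {
            regions_ = query_();
            dirty_ = false;
            ++refreshCount_;
        }
        return regions_;
    }

    int refreshCount() const { return refreshCount_; }

private:
    Query query_;
    std::vector<Region> regions_;
    bool dirty_ = true;      // nothing has been cached yet
    int refreshCount_ = 0;
};
```

Repeated stops with no intervening markDirty() call reuse the cached regions, so the cost of the query is paid at most once per change.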

@d-millar
Collaborator

Would work if you had events for updated memory, but that’s a pretty rare case unfortunately.

@astrelsky
Contributor Author

> Would work if you had events for updated memory, but that’s a pretty rare case unfortunately.

That's unfortunate. How about a different approach, then? Why is there so much overhead in updating the memory? It's still a bit early, so I could be mistaken, but I think I recall a lot of things happening in the UI while the target is running in the debugger. This may not be necessary and could be deferred until execution pauses. I have no idea whether this actually causes overhead, but I do recall comments elsewhere about some analysis performing better if you minimize the UI, since the events won't fire or something.

@d-millar
Collaborator

Well, this is just my opinion - take it with a grain of salt - but my impression is that most users can stand a considerably larger hit to the running process than they can when interacting with it. For example, full-on tracing via single-step incurs about a 100x penalty, but for debugging a serious problem on a single process many users are willing to take that hit. Tracing via step-by-branch or using dedicated hardware costs considerably less, in the 10x-100x range, and most users don't mind this at all unless they're batch processing hundreds (or thousands) of executables. However, a 2x-10x slowdown in single-stepping is unacceptable to almost everyone. Users want to be able to bang on single-step as hard and fast as they can and have the GUI respond. Single-step normally involves a minimum of commands: typically asking for the step (often by setting a register), resuming, and returning the new register values. That's typically on the order of a hundred bytes for a large register set. Asking for all the modules and/or all the memory regions and potentially pages that might have changed and are visible is more like a thousand+ bytes. For users with very fast machines that may not be a problem, but for most it is at least an annoyance. I think updating on non-single-step events is a good compromise, and generally that means process/thread create/destroy, module load/unload, and breakpoints hit.
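
To put rough numbers on that compromise, here is a small sketch. The `Event` names, the `estimated_bytes` helper, and the cost model (every event pays the register round-trip, refreshing events pay the memory query on top) are assumptions made for illustration. The byte counts are just the ballpark figures from the comment above, not measurements.

```python
from enum import Enum, auto


class Event(Enum):
    PROCESS_CREATED = auto()
    PROCESS_EXITED = auto()
    THREAD_CREATED = auto()
    THREAD_EXITED = auto()
    MODULE_LOADED = auto()
    MODULE_UNLOADED = auto()
    BREAKPOINT_HIT = auto()
    SINGLE_STEP = auto()


# Policy from the comment: refresh memory on everything except single-step.
REFRESH_EVENTS = frozenset(Event) - {Event.SINGLE_STEP}

STEP_BYTES = 100      # "order of a hundred bytes" for a register update
REFRESH_BYTES = 1000  # "a thousand+ bytes" for modules/regions/pages


def estimated_bytes(events, refresh_on=REFRESH_EVENTS):
    """Rough traffic estimate under the assumed cost model."""
    return sum(
        STEP_BYTES + (REFRESH_BYTES if event in refresh_on else 0)
        for event in events
    )


session = [Event.SINGLE_STEP] * 50 + [Event.BREAKPOINT_HIT]
compromise = estimated_bytes(session)                     # refresh at the breakpoint only
refresh_always = estimated_bytes(session, frozenset(Event))  # refresh on every event
```

Under these assumed figures, 50 steps followed by a breakpoint cost about 6,100 bytes with the compromise policy and about 56,100 bytes when every event refreshes memory, which lines up with the "2x-10x slowdown in single-stepping" concern.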

@astrelsky
Contributor Author

Oh yeah, this is the good stuff. <- Me after refreshing the memory node.

@astrelsky
Contributor Author

> Well, this is just my opinion - take it with a grain of salt - but my impression is that most users can stand a considerably larger hit to the running process than they can when interacting with it. For example, full-on tracing via single-step incurs about a 100x penalty, but for debugging a serious problem on a single process many users are willing to take that hit. Tracing via step-by-branch or using dedicated hardware is considerably less, in the 10x-100x range, and most users don't mind this at all unless they're batch processing hundreds (or thousands) of executables. However, a 2x-10x slowdown in single-stepping is unacceptable to almost everyone. Users want to be able to bang in single-step as hard and fast as they can and have the GUI respond. Single-step normally involves a minimum of commands, typically ask for the step (often by setting a register), resuming, and returning the new register values. That's typically of the order of a hundred bytes for a large register set. Asking for all the modules and/or all the memory regions and potentially pages that might have changed and are visible is more like a thousand+ bytes. For users with very fast machines maybe not a problem, but for most at least an annoyance. I think updating on non-single-step events is a good compromise, and generally that means process/thread create/destroy, module load/unload, and breakpoints hit.

So initially I was under the impression that I had to refresh the memory node all the time, including after a single byte had been written to memory. Now that I know it is only needed when sections of memory are added or removed, I no longer see it as such a big deal. 😅

@d-millar
Collaborator

Wow, that’s an interesting point. I suppose, if you’re watching the stack, you really might want it to update on every step. I think we might need to make this an option.

@d-millar
Collaborator

Actually, I think it does that much. Will verify, but I think visible memory is refreshed for all events, and non-visible memory is refreshed when viewed.
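
If the behavior is as described (still to be verified), it could be modeled along these lines. `PagedMemoryCache`, `show`, `on_event`, and the `read_page` callable are hypothetical names for illustration only, not the debugger's real classes.

```python
class PagedMemoryCache:
    """Refreshes visible pages on every event; others only when viewed."""

    def __init__(self, read_page):
        self._read_page = read_page  # hypothetical: page number -> bytes
        self._pages = {}             # cached page contents
        self._stale = set()          # cached pages that may be out of date
        self.visible = set()         # pages currently shown to the user

    def on_event(self):
        """Any target event: re-read visible pages, mark the rest stale."""
        self._stale = set(self._pages) - self.visible
        for page in self.visible:
            self._pages[page] = self._read_page(page)

    def show(self, pages):
        """User scrolls to `pages`: read only the missing or stale ones."""
        self.visible = set(pages)
        for page in self.visible:
            if page not in self._pages or page in self._stale:
                self._pages[page] = self._read_page(page)
                self._stale.discard(page)
        return {page: self._pages[page] for page in self.visible}


log = []  # records which pages were actually read from the target


def fake_read(page):
    log.append(page)
    return bytes([page])


cache = PagedMemoryCache(fake_read)
cache.show([1])   # reads page 1
cache.show([2])   # reads page 2; page 1 is no longer visible
cache.on_event()  # re-reads page 2 only; page 1 becomes stale
cache.show([2])   # fresh, so no read
cache.show([1])   # stale, so it is re-read when viewed
```

The point of the split is that the per-event cost scales with what is on screen rather than with the size of the address space.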

@astrelsky
Contributor Author

> So initially I was under the impression that I had to refresh the memory node all the time including after a single byte had been written to memory. Now that I know it is only when sections of memory are added/removed I no longer see it as such a big deal. 😅

I think you misunderstood my above comment. I initially thought the behavior was like that. After getting everything mostly working I realized I had misunderstood the reason for having to refresh the memory node. It seems to be functioning as would be expected at the moment.
