
Conversation

sriemer
Collaborator

@sriemer sriemer commented May 8, 2014

This patch set extends scanmem to determine the region type and the load address of libraries and the executable. The lregions command now displays this information as well. The default type is misc, which describes a single region. The heap and stack types are also single regions and easy to determine, but code and exe are special:
code is a library or an executable which consists of 3 or 4 regions:
0: r,x - .text; 1: r - .rodata; 2: r,w - .data; 3: r,w - .bss
Regions 1, 2 and 3 are consecutive in memory (often region 0 as well). This is how scanmem determines that these regions belong together. The start of the .text region is used as the load address of the binary. Subtracting it from a match address within the .data or .bss section yields the address used within the binary. This helps to bypass address space layout randomization (ASLR) in combination with position-independent code (PIC) or a position-independent executable (PIE). The list command is extended to show this offset in additional region info displayed for found matches. Because the output format changes, the GUI is changed as well. The exe type is a subtype of code and covers only the executable.

sriemer added 5 commits May 8, 2014 22:35
The region type can be one of the following:
misc  (default),
code  (belongs to library or executable),
exe   (belongs to the executable),
heap  (is the heap region),
stack (is the stack region)

The types misc, heap and stack are single regions which can be
determined easily. But code and exe are special. In the executable
or when loading a library, there are often three or four
consecutive regions belonging to it:
0: read,exec; 1: read; 2: read,write; 3: read,write (no file name)
0: .text; 1: .rodata; 2: .data; 3: .bss

Scanmem only cares about regions 2 and 3. But with address space
layout randomization (ASLR) and PIC/PIE, the load address of the
code (start of region 0) is randomized. Subtracting the load
address from a found memory address within a code region gives
us the static memory address within the binary. This is why the
load address is important and should be determined. For other types,
the region start is used instead.

Further commits using the region type and the load address follow.
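
The detection described above can be sketched roughly as follows. This is not the code from the patch set (scanmem itself is written in C); it is a minimal Python illustration that assumes the region layout given in the commit message (an executable .text region first, followed by up to three regions of the same file or a directly adjacent anonymous .bss region). The names `Region`, `parse_maps_line`, `classify` and the sample maps content are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Region:
    start: int
    end: int
    perms: str
    path: str
    type: str = "misc"
    load_addr: int = 0


def parse_maps_line(line):
    """Parse one /proc/<pid>/maps line into a Region."""
    fields = line.split(None, 5)
    start_s, end_s = fields[0].split("-")
    path = fields[5].strip() if len(fields) > 5 else ""
    return Region(int(start_s, 16), int(end_s, 16), fields[1], path)


def classify(lines, exe_path):
    """Assign a type and a load address to every region."""
    regions = []
    code_regions = 0   # regions seen so far for the current binary
    is_exe = False     # current binary is the executable itself
    prev_end = 0       # end address of the previous region
    load_addr = 0      # start of the current binary's .text region
    bin_path = ""
    for line in lines:
        r = parse_maps_line(line)
        executable = "x" in r.perms
        if code_regions > 0:
            # Same file, or an unnamed region directly following (.bss).
            same_binary = r.path == bin_path or (r.path == "" and r.start == prev_end)
            if executable or not same_binary or code_regions >= 4:
                code_regions = 0
                is_exe = False
            else:
                code_regions += 1
        if code_regions == 0 and executable and r.path.startswith("/"):
            # First region of a binary: its start is the load address.
            code_regions = 1
            is_exe = r.path == exe_path
            bin_path = r.path
            load_addr = r.start
        prev_end = r.end
        if code_regions > 0:
            r.type = "exe" if is_exe else "code"
            r.load_addr = load_addr
        else:
            if r.path == "[heap]":
                r.type = "heap"
            elif r.path == "[stack]":
                r.type = "stack"
            r.load_addr = r.start  # other types use the region start
        regions.append(r)
    return regions


# Hypothetical maps content for illustration only.
SAMPLE_MAPS = """\
00400000-0040b000 r-xp 00000000 08:01 123 /usr/bin/game
0060a000-0060b000 r--p 0000a000 08:01 123 /usr/bin/game
0060b000-0060c000 rw-p 0000b000 08:01 123 /usr/bin/game
0060c000-0060e000 rw-p 00000000 00:00 0
01a00000-01a21000 rw-p 00000000 00:00 0 [heap]
7f0000000000-7f0000010000 r-xp 00000000 08:01 456 /lib/libdemo.so
7f0000010000-7f0000011000 r--p 00010000 08:01 456 /lib/libdemo.so
7f0000011000-7f0000012000 rw-p 00011000 08:01 456 /lib/libdemo.so
7f0000012000-7f0000014000 rw-p 00000000 00:00 0
7f0000100000-7f0000200000 rw-p 00000000 00:00 0
7ffc00000000-7ffc00021000 rw-p 00000000 00:00 0 [stack]
"""

regions = classify(SAMPLE_MAPS.splitlines(), "/usr/bin/game")
for r in regions:
    print("%12x-%12x %-5s load=%x" % (r.start, r.end, r.type, r.load_addr))
```

The state variables are kept outside the loop on purpose, because a binary spans several consecutive regions. Binaries whose first mapped region is not executable would need additional handling that this sketch leaves out.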

The pointer format should not only be used for the 'list' command
but also e.g. for the 'lregions' command. Also, '%20p' is too
wide because, on x86_64 at least, only 6 bytes are used for the
memory address. We can also drop the '0x' prefix to shorten it even
further. So change it to '%12lx' on 64 bit and '%8lx' on 32 bit.

We've also noticed that there is no 'ULONGMAX' define. The correct
name is 'ULONG_MAX'. So rename it and add a compiler warning
in case it's not defined.
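
To illustrate the width argument: 6 address bytes are 12 hex digits, so a 12-character column is enough for typical x86_64 user-space addresses. The following sketch uses Python's printf-style formatting (which accepts and ignores the `l` length modifier) only to show the resulting column widths. The helper `addr_format` and the sample addresses are hypothetical, and the `'%20s'` line merely approximates what `'%20p'` prints with glibc.

```python
import struct


def addr_format(pointer_bytes=struct.calcsize("P")):
    """Pick the address format by pointer size: 12 hex digits on 64 bit, 8 on 32 bit."""
    return "%12lx" if pointer_bytes == 8 else "%8lx"


addr64 = 0x7F00003B4010   # sample 64-bit user-space address (6 bytes used)
addr32 = 0x8049F10        # sample 32-bit address

old_style = "%20s" % hex(addr64)       # approximates '%20p': padded, with '0x'
new_style64 = addr_format(8) % addr64  # 12 columns, no '0x'
new_style32 = addr_format(4) % addr32  # 8 columns, no '0x'

print("[%s]" % old_style)
print("[%s]" % new_style64)
print("[%s]" % new_style32)
```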

This tells us whether a region belongs to a library or an
executable and where these are loaded into memory.

It makes things much easier if the 'list' command shows not only
the match info but the associated region info as well. Display the
region ID and the region type to give a feeling for the kind of
memory region in which a match is located. Subtracting the load address
of the executable or library from the match address allows us
to calculate a match offset, which is the variable address within
the binary. This helps to bypass address space layout randomization
(ASLR). So display it.

The GUI needs to be changed in following commits to recognize the
new format.
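
As a worked example of the match offset: the sketch below subtracts the load address for code and exe regions and the region start for all other types, then adds the offset to a different load address to find the same variable in another run. The function names and all addresses are made up for illustration and are not taken from scanmem.

```python
def match_offset(match_addr, region_type, load_addr, region_start):
    """Offset of a match from the load address (code/exe) or the region start."""
    base = load_addr if region_type in ("code", "exe") else region_start
    return match_addr - base


def runtime_address(offset, load_addr):
    """Turn a stored offset back into an address for the current run."""
    return load_addr + offset


# Two hypothetical runs of the same PIE executable under ASLR.
run1_load, run1_match = 0x555555554000, 0x55555575F010
run2_load = 0x55D2C1A3F000

offset = match_offset(run1_match, "exe", run1_load, region_start=0x55555575F000)
run2_match = runtime_address(offset, run2_load)

# A heap match uses the region start as its base instead.
heap_offset = match_offset(0x1A00040, "heap", 0x1A00000, region_start=0x1A00000)

print("offset in binary: %x" % offset)
print("address in second run: %x" % run2_match)
print("heap offset: %x" % heap_offset)
```

The offset stays the same across runs while the load address changes, which is what makes it usable despite ASLR.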

The format of the scanmem 'list' command has changed, so the GUI
has to be changed as well. The match offset from the code load
address or region start as well as the region type are also
useful in the ScanResult_TreeView. So add them as new columns.
Owner

some of these variables can be moved into the loop below.

Collaborator Author

Which ones?
I don't see a single one!
All of these are required across multiple regions and not just for a single region.
code_regions is set when the .text region is found and then incremented or reset to 0 for further regions. is_exe is also used for all 3 or 4 regions belonging to the binary, so it is not detected again and again. prev_end obviously holds the end address of the previous region. load_addr is also only set at the .text region, unless the region type is something other than code or exe.

Owner

Sorry, I didn't notice the initialization here. Usually I'd move them right above the while statement, but it's OK this way.

@coolwanglu
Owner

I just made the final review, please take a look at my comments.

Owner

why 12, shouldn't it be 16?

Collaborator Author

Because I have never seen the full 8 bytes being used for virtual memory addresses. If that becomes the case in the future on any architecture, we can increase it. It only affects indentation.
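
A quick way to check the indentation point: in printf-style formats the number is a minimum field width, so an address that needs more than 12 hex digits is still printed in full and only shifts the column. The sketch below shows this with Python's printf-style formatting, which follows the same width semantics. The addresses are arbitrary examples.

```python
col = "%12lx"

short_addr = 0x400000            # padded with spaces up to 12 columns
user_addr = 0x7FFFFFFFF000       # fits exactly into 12 hex digits
wide_addr = 0xFFFF800000000000   # needs all 16 hex digits

for addr in (short_addr, user_addr, wide_addr):
    text = col % addr
    print("[%s] width=%d" % (text, len(text)))
```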

Owner

Well, better to be safe here.

coolwanglu added a commit that referenced this pull request May 16, 2014
introduce region type and load address support
@coolwanglu coolwanglu merged commit 9afd46a into coolwanglu:master May 16, 2014
@coolwanglu
Owner

Sorry for my lag these days, I've been very busy.

Thank you for your efforts and cooperation in this long process. I just tried to be careful, especially when I don't have my build environment available.

This could be a very useful and powerful feature.

@sriemer
Collaborator Author

sriemer commented May 16, 2014

Thank you, too! This means much to me!
For me this is already a very useful and powerful feature! :-) I recently made some bigger changes to the dynamic memory discovery process in my ugtrain. So I had to retest the example configs, doing a lot of memory scanning with PIE and without PIE, with static memory and with dynamic memory. It is so cool to see the region type right away for matches! With PIE and the "exe" type I just have to put the found match offset into the config. As ugtrain uses the same method to get the load address, it just has to add it back to the address from the config if PIE is detected. It's the only game trainer on Linux, or possibly anywhere, that supports PIE. :-) Now I can finally document the memory discovery with PIE and make the next release. :-) Btw.: iOS on iPhones also always uses PIE. Ubuntu has had it as a default since 13.04.

@sriemer sriemer deleted the for-wanglu branch July 22, 2014 05:58