introduce region type and load address support #25

Merged
merged 5 commits into from May 16, 2014

Collaborator

sriemer commented May 8, 2014

This patch set extends scanmem to determine the region type and the load address of libraries and the executable. The lregions command additionally displays this information. The default type is misc, which is a single region. The single-region types heap and stack are also easy to determine, but code and exe are special:
code is a library or an executable, which consists of 3 or 4 regions:
0: r,x - .text; 1: r - .rodata; 2: r,w - .data; 3: r,w - .bss
Regions 1, 2 and 3 are consecutive in memory (often region 0 as well). This is how it can be determined that these regions belong together. The start of the .text region is used as the load address of the binary. Subtracting it from a match address within the .data or .bss section yields the address used within the binary. This helps to bypass address space layout randomization (ASLR) in combination with position independent code (PIC) or a position independent executable (PIE). The list command is extended to show this offset in additional region info for found matches. Because the output format changes, the GUI is changed as well. The exe type is a subtype of code and covers just the executable.

sriemer added some commits Apr 10, 2014

maps: determine region type and load address
The region type can be one of the following:
misc  (default),
code  (belongs to library or executable),
exe   (belongs to the executable),
heap  (is the heap region),
stack (is the stack region)

The types misc, heap and stack are single regions which can be
determined easily. But code and exe are special. In the executable
or when loading a library, there are often three or four
consecutive regions belonging to it:
0: read,exec; 1: read; 2: read,write; 3: read,write (no file name)
0: .text; 1: .rodata; 2: .data; 3: .bss

Scanmem only cares about regions 2 and 3. But with address space
layout randomization (ASLR) and PIC/PIE the load address of the
code (start of region 0) is randomized. Subtracting the load
address from a found memory address within a code region can show
us the static memory address within the binary. This is why the
load address is important and should be determined. For other types
the region start is used.

Further commits using the region type and the load address follow.
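
As a rough illustration of the single-region types listed in this commit message, the sketch below maps the pathname column of a /proc/<pid>/maps line to a type. The enum and function names are hypothetical and not taken from the patch. It relies only on the kernel's "[heap]" and "[stack]" pseudo-paths, and the multi-region code/exe detection is deliberately left out.

```c
#include <string.h>

/* Hypothetical names; the patch may spell these differently. */
enum region_type {
    REGION_MISC,   /* default */
    REGION_CODE,   /* belongs to a library or the executable */
    REGION_EXE,    /* belongs to the executable */
    REGION_HEAP,
    REGION_STACK
};

/* Printable name for a region type, as it might appear in 'lregions'. */
static const char *region_type_name(enum region_type type)
{
    switch (type) {
    case REGION_CODE:  return "code";
    case REGION_EXE:   return "exe";
    case REGION_HEAP:  return "heap";
    case REGION_STACK: return "stack";
    default:           return "misc";
    }
}

/* Classify a region that stands on its own (no code/exe detection here). */
static enum region_type single_region_type(const char *pathname)
{
    if (strcmp(pathname, "[heap]") == 0)
        return REGION_HEAP;
    if (strcmp(pathname, "[stack]") == 0)
        return REGION_STACK;
    return REGION_MISC;
}
```

Anything that is neither the heap nor the stack falls back to misc here; telling code and exe apart needs the state carried across consecutive regions that the review discusses further down.
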
lregions: show region type and load address
This tells us whether a region belongs to a library or
an executable and where these are loaded into memory.
handlers: show match region info with 'list' command
It makes things much easier if the 'list' command not only shows
the match info but the associated region info as well. Display the
region ID and the region type to get a feeling in which kind of
memory region a match is located. Subtracting the load address
of the executable or library from the match address allows us
to calculate a match offset which is the variable address within
the binary. This helps to bypass address space layout randomization
(ASLR). So display it.

The GUI needs to be changed in following commits to recognize the
new format.
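
The match offset described in this commit message is plain address arithmetic. A minimal sketch follows, with hypothetical helper names that are not part of the patch. The second function shows the reverse direction, where a stored offset is turned back into a runtime address after the binary has been loaded somewhere else.

```c
#include <assert.h>

/*
 * Hypothetical helper: offset of a match within its binary, given the
 * load address (start of the .text region) of that binary.
 */
static unsigned long match_offset(unsigned long match_addr,
                                  unsigned long load_addr)
{
    /* A match inside the binary can never lie below its load address. */
    assert(match_addr >= load_addr);
    return match_addr - load_addr;
}

/*
 * Hypothetical helper: rebuild the runtime address in a later run, where
 * ASLR has placed the same binary at a different load address.
 */
static unsigned long runtime_address(unsigned long offset,
                                     unsigned long load_addr)
{
    return load_addr + offset;
}
```

For example, a match at 0x555520a8 in a binary loaded at 0x55550000 has the offset 0x20a8; if the next run loads the binary at 0x56780000, the same variable is expected at 0x567820a8. The addresses are made up for illustration.
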
gui: add match offset and region type support
The format of the scanmem 'list' command has changed. So the GUI
has to be changed as well. The match offset from the code load
address or region start as well as the region type are also
useful in the ScanResult_TreeView. So add them to new columns.
handlers: improve pointer output
The pointer format should not only be used for the 'list' command
but e.g. for the 'lregions' command as well. Also '%20p' is too
long as, on x86_64 at least, there are only 6 bytes used for the
memory address. We can also get rid of the '0x' to shorten it even
further. So change it to '%12lx' for 64 bit and '%8lx' on 32 bit.

We've also noticed that there is no 'ULONGMAX' define. The correct
name would be 'ULONG_MAX'. So rename it and add a compiler warning
in case it's not defined.
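
The format selection described above could look roughly like the sketch below. The two widths and the ULONG_MAX comparisons come from the commit message and the diff; the fallback branch and the format_pointer() helper are assumptions made for illustration. Note that a printf field width is only a minimum, so an address with more than 12 hex digits is still printed in full and merely breaks the column alignment.

```c
#include <limits.h>
#include <stdio.h>

#if ULONG_MAX == 4294967295UL
#define POINTER_FMT "%8lx"      /* 32-bit unsigned long */
#elif ULONG_MAX == 18446744073709551615UL
#define POINTER_FMT "%12lx"     /* 64-bit unsigned long, 6 bytes in use */
#else
/* Assumed fallback for this sketch only */
#define POINTER_FMT "%12lx"
#endif

/*
 * Hypothetical helper: write addr into buf using POINTER_FMT and return
 * the number of characters that the full output needs.
 */
static int format_pointer(char *buf, size_t size, unsigned long addr)
{
    return snprintf(buf, size, POINTER_FMT, addr);
}
```

A short address such as 0x1000 is padded with leading spaces to the column width, while ULONG_MAX takes as many characters as it has hex digits.
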
size_t len = 0;
unsigned int code_regions = 0;
bool is_exe = false;
unsigned long prev_end = 0, load_addr = 0;

@coolwanglu

coolwanglu May 10, 2014

Owner

some of these variables can be moved into the loop below.

@sriemer

sriemer May 10, 2014

Collaborator

Which ones?
I don't see a single one!
All these are required for multiple regions and not for a single region.
code_regions is set upon .text region and then incremented or reset to 0 in further regions. is_exe is also used for all 3 or 4 regions belonging to the binary and not detected again and again. prev_end is obviously used to hold the end address of the previous region. The load_addr is also only set upon .text region unless there is a region type different from code or exe.

@coolwanglu

coolwanglu May 16, 2014

Owner

Sorry, I didn't notice the initialization here. Usually I'd move them right above the while statement, but it's OK this way.

/* get load address for consecutive code regions (.text, .rodata, .data) */
if (code_regions > 0) {
if (exec == 'x' || (read == 'r' && write == 'w' &&
start != prev_end) || code_regions >= 4) {

@coolwanglu

coolwanglu May 10, 2014

Owner

What does >= 4 mean?

@sriemer

sriemer May 10, 2014

Collaborator

It's just in case the detection misbehaves. There should never be more than 4 consecutive code regions (.text, .rodata, .data and .bss). As you would only notice at the fifth region, >= 4 is put here.

fprintf(stdout, "[%2u] "POINTER_FMT", %s\n", i++, remote_address_of_nth_element(reading_swath_index, reading_iterator /* ,MATCHES_AND_VALUES */), v);
void *address = remote_address_of_nth_element(reading_swath_index,
reading_iterator /* ,MATCHES_AND_VALUES */);
unsigned long address_ul = (unsigned long)address;

@coolwanglu

coolwanglu May 10, 2014

Owner

This looks ugly; we will need to unify void *, unsigned long (long), etc. later.

@sriemer

sriemer May 10, 2014

Collaborator

Yes, I also have the patch for that ready. It shouldn't block us now. There are so many other conversions between unsigned long and void * in the code.

@coolwanglu

coolwanglu Jun 7, 2014

Owner

Wondering if you are still working on this. This part is kind of annoying.

@@ -86,11 +103,41 @@ bool readmaps(pid_t target, list_t * regions)
if (sscanf(line, "%lx-%lx %c%c%c%c %x %x:%x %u %s", &start, &end, &read,
&write, &exec, &cow, &offset, &dev_major, &dev_minor, &inode, filename) >= 6) {
/* get load address for consecutive code regions (.text, .rodata, .data) */

@coolwanglu

coolwanglu May 10, 2014

Owner

I cannot follow the logic of this part. Could you add more comments explaining the algorithm?

@sriemer

sriemer May 10, 2014

Collaborator

To the source? Yes, I can do that. Let's agree on a description first before I add another patch. What about this one?:
/* When loading a binary into memory there are 3 or 4 memory regions created:
* .text, .rodata, .data and optionally also .bss. The last three are consecutive in memory -
* often even .text as well. This means that the start address is the end address of the
* previous region. If a r/w region isn't consecutive with the previous one, if the region is the
* next .text region, or if there are more than 4 consecutive regions, the region is expected
* to belong to a different binary.
*/
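
The grouping rule from this proposed comment can be sketched as a small loop over already-parsed map entries. Only the end-of-binary condition is taken from the diff excerpt under review; the struct, the function name, and the rest of the control flow are assumptions for illustration, and the file name checks and the exe detection are left out.

```c
#include <stddef.h>

/* Hypothetical, simplified view of one /proc/<pid>/maps entry */
struct map_entry {
    unsigned long start, end;
    char read, write, exec;     /* 'r', 'w', 'x' or '-' */
};

/*
 * For every entry, store the load address it is associated with: the
 * start of the preceding .text region while the entry still belongs to
 * that binary, otherwise the start of the entry itself.
 */
static void assign_load_addrs(const struct map_entry *entries, size_t n,
                              unsigned long *load_addrs)
{
    unsigned int code_regions = 0;      /* regions seen in current binary */
    unsigned long prev_end = 0, load_addr = 0;

    for (size_t i = 0; i < n; i++) {
        const struct map_entry *e = &entries[i];

        if (code_regions > 0) {
            /* The binary ends at the next .text region, at a r/w region
             * that is not consecutive, or after four regions. */
            if (e->exec == 'x' ||
                (e->read == 'r' && e->write == 'w' &&
                 e->start != prev_end) || code_regions >= 4)
                code_regions = 0;
            else
                code_regions++;
        }
        if (code_regions == 0) {
            if (e->exec == 'x')
                code_regions = 1;       /* a new binary starts here */
            load_addr = e->start;
        }
        load_addrs[i] = load_addr;
        prev_end = e->end;
    }
}
```

With four consecutive regions followed by a detached r/w region, the first four entries share the start of the first region as their load address and the fifth gets its own start address.
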

@coolwanglu

coolwanglu May 16, 2014

Owner

Oh I see, it's based on the actual binary formats. Better add some more references.

I'll merge the patch first, please take your time and add the comments.

@sriemer

sriemer May 16, 2014

Collaborator

Thank you very much for the merge! I'll add the comments in the next PR.

Owner

coolwanglu commented May 10, 2014

I just made the final review, please take a look at my comments.

#if ULONG_MAX == 4294967295UL
#define POINTER_FMT "%8lx"
#elif ULONG_MAX == 18446744073709551615UL
#define POINTER_FMT "%12lx"

@coolwanglu

coolwanglu May 16, 2014

Owner

why 12, shouldn't it be 16?

@sriemer

sriemer May 16, 2014

Collaborator

Because I have never seen the full 8 bytes being used for virtual memory addresses. If that should become the case in the future on any architecture, we can increase it. It is only about indentation.

@coolwanglu

coolwanglu May 16, 2014

Owner

Well, better to be safe here.

coolwanglu added a commit that referenced this pull request May 16, 2014

Merge pull request #25 from sriemer/for-wanglu
introduce region type and load address support

@coolwanglu coolwanglu merged commit 9afd46a into coolwanglu:master May 16, 2014

Owner

coolwanglu commented May 16, 2014

Sorry for my lag these days, I've been very busy.

Thank you for your efforts and cooperation in this long process. I just tried to be careful, especially when I don't have my build environment available.

This could be a very useful and powerful feature.

Collaborator

sriemer commented May 16, 2014

Thank you, too! This means much to me!
For me this is already a very useful and powerful feature! :-) I made some bigger changes to the dynamic memory discovery process in my ugtrain recently. So I had to retest the example configs, doing a lot of memory scanning with PIE and without PIE, with static memory and with dynamic memory. It is so cool to see the region type right away for matches! With PIE and the "exe" type I just have to put the found match offset into the config. As ugtrain has the same method to get the load address, it just has to add it back to the address from the config if PIE is detected. It's the only game trainer on Linux, or even the only one there is, which supports PIE. :-) Now I can finally document the memory discovery with PIE and make the next release. :-) Btw.: iOS on iPhones also always uses PIE. Ubuntu has had it as a default since 13.04.

@sriemer sriemer deleted the sriemer:for-wanglu branch Jul 22, 2014
