Member

@pablogsal pablogsal commented Jan 21, 2025

The current implementation incorrectly assumes that calculating a file
offset from a process memory address can be done using a simple
subtraction from the library's load address. This assumption doesn't
hold for binaries with non-standard ELF layouts, where PT_LOAD segments
may have different virtual address to file offset mappings.

Fix the issue by:

  1. First converting the absolute process address to a library-relative
    offset by subtracting the library's load point in the process
  2. Finding the PT_LOAD segment in the ELF file that contains this offset
  3. Using the segment's p_vaddr and p_offset to calculate the correct
    file offset

To avoid performance penalties from repeatedly parsing ELF files, add
caching of PT_LOAD segments per library.
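
The per-library caching described above can be sketched as follows. This is a minimal illustration in Python, not the C++ code from this PR: `SegmentCache`, `segments_for`, and the `parse` callback are hypothetical names, and the `(p_vaddr, p_offset, p_filesz)` tuple layout is an assumption made for the example.

```python
from typing import Callable, Dict, List, Tuple

# Assumed representation of one PT_LOAD entry: (p_vaddr, p_offset, p_filesz).
Segment = Tuple[int, int, int]


class SegmentCache:
    """Remember the PT_LOAD segments of each library after the first parse."""

    def __init__(self, parse: Callable[[str], List[Segment]]) -> None:
        self._parse = parse  # hypothetical ELF parser, injected by the caller
        self._by_path: Dict[str, List[Segment]] = {}

    def segments_for(self, path: str) -> List[Segment]:
        # Parse the ELF file only on the first request for this path.
        if path not in self._by_path:
            self._by_path[path] = self._parse(path)
        return self._by_path[path]


# Usage with a fake parser that records how often it is called.
parse_calls: List[str] = []


def fake_parse(path: str) -> List[Segment]:
    parse_calls.append(path)
    return [(0x0, 0x0, 0x1000)]


cache = SegmentCache(fake_parse)
first = cache.segments_for("libexample.so")
second = cache.segments_for("libexample.so")
```

Keying the cache on the library path keeps repeated lookups for the same library from re-reading its program headers.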

Example of what was wrong:
old: file_offset = addr - lib_start
new: file_offset = ((addr - lib_start) - segment->p_vaddr) + segment->p_offset
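
To make the difference concrete, here is a small sketch of both formulas in Python. It is an illustration under stated assumptions, not the implementation in `mem.cpp`: `LoadSegment` and `file_offset_for` are hypothetical names, the segment values are invented, and bounding the containment check by `p_filesz` is an assumption about how "contains this offset" is decided.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class LoadSegment:
    """Hypothetical stand-in for the relevant fields of a PT_LOAD header."""
    p_vaddr: int   # virtual address of the segment inside the library
    p_offset: int  # offset of the segment inside the ELF file
    p_filesz: int  # number of bytes of the segment backed by the file


def file_offset_for(
    addr: int, lib_start: int, segments: Sequence[LoadSegment]
) -> Optional[int]:
    """Map an absolute process address to an offset in the library file."""
    rel = addr - lib_start  # step 1: library-relative offset
    for seg in segments:
        # Step 2: find the segment containing the offset (assumed half-open).
        if seg.p_vaddr <= rel < seg.p_vaddr + seg.p_filesz:
            # Step 3: translate using the segment's p_vaddr and p_offset.
            return (rel - seg.p_vaddr) + seg.p_offset
    return None  # the address is not backed by any PT_LOAD segment


# Invented layout where the second segment's p_vaddr differs from its p_offset.
LIB_START = 0x7F0000000000
SEGMENTS = [
    LoadSegment(p_vaddr=0x0, p_offset=0x0, p_filesz=0x1000),
    LoadSegment(p_vaddr=0x201000, p_offset=0x1000, p_filesz=0x500),
]

addr = LIB_START + 0x201010
old_offset = addr - LIB_START                            # 0x201010
new_offset = file_offset_for(addr, LIB_START, SEGMENTS)  # 0x1010
```

In this invented layout the old formula points far past the end of the second segment's file data, while the new formula lands 0x10 bytes into it.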

This fixes an issue where pystack would read from incorrect file offsets
when analyzing binaries compiled with non-standard layout options (e.g.,
when using the gold linker with custom flags).

@pablogsal pablogsal changed the title from "elf fix core" to "Fix incorrect file offset calculation in memory mapping" Jan 21, 2025
@pablogsal pablogsal force-pushed the elf_fix_core branch 9 times, most recently from 1b17fec to 3b04fd4 Compare January 21, 2025 21:34
codecov-commenter commented Jan 21, 2025

Codecov Report

Attention: Patch coverage is 59.61538% with 21 lines in your changes missing coverage. Please review.

Project coverage is 83.33%. Comparing base (19b9759) to head (def56f4).
Report is 2 commits behind head on main.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| src/pystack/_pystack/mem.cpp | 73.80% | 11 Missing ⚠️ |
| src/pystack/_pystack/process.cpp | 0.00% | 10 Missing ⚠️ |

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #220      +/-   ##
==========================================
- Coverage   83.51%   83.33%   -0.19%     
==========================================
  Files          46       46              
  Lines        6201     6248      +47     
  Branches      134      458     +324     
==========================================
+ Hits         5179     5207      +28     
- Misses       1020     1041      +21     
+ Partials        2        0       -2     
| Flag | Coverage Δ |
|---|---|
| cpp | 83.33% <59.61%> (+18.89%) ⬆️ |
| python_and_cython | 83.33% <59.61%> (-15.73%) ⬇️ |

@pablogsal pablogsal force-pushed the elf_fix_core branch 2 times, most recently from 5644a23 to 813c89f Compare January 21, 2025 23:12
Contributor

@godlygeek godlygeek left a comment

The commit messages for the fighting-the-CI commits need rewording to explain the rationale for the changes. Other than that...

Contributor

@godlygeek godlygeek left a comment

OK, I've pushed a bunch of fixup commits for minor improvements, plus one bug fix for a bug that I think has existed for as long as pystack. Please review my changes, and if you're happy with them feel free to squash and land this. Let me know if you dislike any of the fixups, or if you disagree with my diagnosis of the range check bug.

@godlygeek
Contributor

Oh, and the fixup commits actually have commit message bodies, so each one should explain the rationale for the change.

pablogsal and others added 4 commits February 13, 2025 01:28
The latest version supports Python 3.13, and the older version that we
were pinning to now causes CI failures.

Signed-off-by: Pablo Galindo <pablogsal@gmail.com>
The latest version of mypy enforces that enum members do not have type
annotations, because the type of every member of a subclass of `Enum` is
that subclass itself.

Signed-off-by: Pablo Galindo <pablogsal@gmail.com>
The current implementation incorrectly assumes that calculating a file
offset from a process memory address can be done using a simple
subtraction from the library's load address. This assumption doesn't
hold for binaries with non-standard ELF layouts, where PT_LOAD segments
may have different virtual address to file offset mappings.

Fix the issue by:
1. First converting the absolute process address to a library-relative
   offset by subtracting the library's load point in the process
2. Finding the PT_LOAD segment in the ELF file that contains this offset
3. Using the segment's p_vaddr and p_offset to calculate the correct
   file offset

To avoid performance penalties from repeatedly parsing ELF files, add
caching of PT_LOAD segments per library.

Example of what was wrong:
  old: file_offset = addr - lib_start
  new: file_offset = ((addr - lib_start) - segment->p_vaddr) + segment->p_offset

This fixes an issue where pystack would read from incorrect file offsets
when analyzing binaries compiled with non-standard layout options (e.g.,
when using the gold linker with custom flags).

Signed-off-by: Pablo Galindo <pablogsal@gmail.com>
The end address is exclusive, not inclusive. This off-by-one could mean
that we look at the wrong shared library for a given address if it
occurs at the very start of a mapping.

Signed-off-by: Pablo Galindo <pablogsal@gmail.com>
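
The range check fix in the last commit can be illustrated with a short Python sketch. It is not the pystack code: `find_mapping` and the `(start, end, path)` tuple layout are hypothetical, and the addresses are invented. The only point is that the end address is treated as exclusive.

```python
from typing import Optional, Sequence, Tuple

# Assumed representation of a memory mapping: (start, end, path), end exclusive.
Mapping = Tuple[int, int, str]


def find_mapping(addr: int, mappings: Sequence[Mapping]) -> Optional[str]:
    """Return the path of the mapping containing addr, or None."""
    for start, end, path in mappings:
        # Half-open interval: `end` is the first address past the mapping.
        if start <= addr < end:
            return path
    return None


# Two adjacent mappings: the first one ends where the second one starts.
MAPPINGS = [
    (0x1000, 0x2000, "liba.so"),
    (0x2000, 0x3000, "libb.so"),
]

owner = find_mapping(0x2000, MAPPINGS)  # first byte of libb.so's mapping
```

With an inclusive check (`addr <= end`), the same lookup would stop at `liba.so`, which is the wrong library for an address at the very start of the next mapping.
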
@pablogsal pablogsal enabled auto-merge (rebase) February 13, 2025 01:29
@pablogsal pablogsal merged commit 5b75822 into bloomberg:main Feb 13, 2025
22 checks passed

3 participants