Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix several issues uncovered while investigating recent CI failures #155

Merged
merged 6 commits into from
Nov 22, 2023

Conversation

godlygeek
Copy link
Contributor

@godlygeek godlygeek commented Nov 17, 2023

This fixes several issues, predominantly around how we handle mappings that are created or destroyed while we're actively trying to attach, or threads that are started and stopped while we're still in the process of attaching.

These variables are unused, as `ProcessManager.create_from_pid` now
parses the maps file itself; the caller doesn't need to.

Signed-off-by: Matt Wozniski <mwozniski@bloomberg.net>
Previously we were pausing the process only after we had already fetched
the processes mappings, leading to a potential TOCTOU issue: the fetched
mappings might be stale by the time the process was paused and we were
ready to use them.

Handle this by creating a new `ProcessTracer` class responsible for
pausing the process. This class matches the behavior that was previously
found in `BlockingProcessMemoryManager`, except that it's not also an
instance of `AbstractRemoteMemoryManager`: it handles only tracing the
process on construction, and detaching on destruction.

In blocking mode, we create a `ProcessTracer` before reading the
mappings. Later, we pass it to the `ProcessManager`, which takes over
ownership of it, destroying it when we're done analyzing the process and
want to detach.

Signed-off-by: Matt Wozniski <mwozniski@bloomberg.net>
By having the tracer expose the list of thread IDs that it stopped, the
manager can avoid a call to find the list of thread IDs itself. Since
every thread must be paused when there is a tracer, the tracer's list of
threads must be accurate.

Signed-off-by: Matt Wozniski <mwozniski@bloomberg.net>
@godlygeek godlygeek self-assigned this Nov 17, 2023
Previously, we were fetching the process's list of threads once, and
iterating over that list attaching to each thread. This is racy, though:
after we get the list of threads and before we've suspended every
thread, it's still possible for new threads to start or old threads to
finish.

Handle this by fetching the list of threads at least twice. When we
encounter an error trying to attach to a thread, don't report it unless
the next call to list all of the threads says that thread still belongs
to the process we're attaching to (if the thread has disappeared from
the list, then the reason we failed to attach to that thread is that it
has died).

Similarly, if any new threads show up in the second list of threads,
we'll try to pause those, repeating this until the set of threads that
we've paused contains every thread that the process currently owns.

Signed-off-by: Matt Wozniski <mwozniski@bloomberg.net>
This is called from C++ code that doesn't know how to propagate a Python
exception if one should occur. Tell Cython not to allow an exception to
escape from this function, and to instead print and discard one if it
should occur.

Signed-off-by: Matt Wozniski <mwozniski@bloomberg.net>
Leverage our existing `_try_to_decode_string` function which uses
`errors="replace"` for decoding, guaranteeing that log messages won't be
lost entirely if they contain non-UTF-8 data.

Also, pass the log message around as a C++ string rather than
a C string, which is more efficient anyway (since the size is known and
doesn't need to be computed).

Signed-off-by: Matt Wozniski <mwozniski@bloomberg.net>
@codecov-commenter
Copy link

Codecov Report

Attention: 3 lines in your changes are missing coverage. Please review.

Comparison is base (65f8d4e) 89.71% compared to head (db30c2f) 89.75%.

Files Patch % Lines
src/pystack/_pystack/process.cpp 94.00% 3 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #155      +/-   ##
==========================================
+ Coverage   89.71%   89.75%   +0.03%     
==========================================
  Files          48       48              
  Lines        5505     5524      +19     
  Branches      873      873              
==========================================
+ Hits         4939     4958      +19     
  Misses        566      566              
Flag Coverage Δ
cpp 75.07% <94.11%> (+0.21%) ⬆️
python_and_cython 99.93% <100.00%> (ø)

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@pablogsal pablogsal merged commit 3e7d72d into bloomberg:main Nov 22, 2023
26 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants