Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Print the thread name in the live TUI #562

Merged
merged 5 commits into from
May 30, 2024

Conversation

pablogsal
Copy link
Member

@pablogsal pablogsal commented Mar 7, 2024

To allow users to determine thread in a multithreaded application
corresponds to the "Thread X of Y" number displayed in the live view,
include the thread name that we collect in the TUI.

Closes: #561

Copy link
Contributor

@godlygeek godlygeek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As we discussed offline, I made a bunch of changes.

I renamed AllocationRecord.thread_name to pretty_thread_name to make it clear that it's doing some string formatting and not just returning data from the record, and then added a new thread_name attribute that is just the data from the record.

Given that AllocationRecord is intended to (eventually) be part of our public interface, it probably shouldn't have pretty_thread_name at all, and we should instead move that to a function in common.py that our reporters can use, but that's a problem for another day.

Anyway, on top of that change, I updated this to only show a thread name if we have one, so the changes to the Textual snapshot cases are reverted (none of those threads have names). I also updated one of the tests to exercise the case where threads do have names, and I tested it against a Bloomberg service where threads do have names to show that it does work as intended.

I'm approving this since you won't be able to approve your own PR, but please do check my work before you land this, @pablogsal.

@pablogsal
Copy link
Member Author

Not really a fan of having pretty_thread_name and thread_name. Also, I think this PR makes the output nicer but it doesn't really solve that is requested in #561 yet, so I would prefer to not merge this until we find a way to solve that as well.

@godlygeek godlygeek marked this pull request as draft March 8, 2024 16:15
@godlygeek
Copy link
Contributor

godlygeek commented Mar 8, 2024

Not really a fan of having pretty_thread_name and thread_name.

Me neither, but I don't think pretty_thread_name should be an attribute of the record at all. It's obscuring the actual thread name of the record by doing formatting, and presentation should be a concern of the reporter, rather than something exposed as an attribute of the record.

If you'd like me to revert the changes to have both pretty_thread_name and thread_name, I could, but the end result would be that the TUI has to parse the thread name out of the record.thread_name by doing something like

thread_name = record.thread_name
thread_name = thread_name[thread_name.find("(") : thread_name.rfind(")") + 1]

which, yuck.

Alternatively (and I think preferably), I could push forward with my cleanup by removing pretty_thread_name from the record and factoring it into a def format_thread_name(record) -> str: function in reporters/__init__.py or a new reporters/common.py, and update our reporters to get it from there.

I think this PR makes the output nicer but it doesn't really solve that is requested in #561 yet

It doesn't, but it does display the thread name when we have the thread name, which is more than never. It also makes it so that the only piece missing for solving #561 is for us to capture the Python thread name in the tracker as well.

I would prefer to not merge this until we find a way to solve that as well

Well, flipping this to draft until we decide, but it seems to me that this is already a reasonably sized self-contained PR that moves the ball forward and displays information that we already collect and which can be useful to people using the TUI.

I think #561 can really be broken down into 2 related work items:

  1. capture Python thread names as well as OS thread names
  2. displaying the best name we have for each thread on the TUI

We need both pieces, and I don't think both need to be in the same PR (or even the same release). This is already useful today in applications that use pthread_setname_np or prctl(PR_SET_NAME), and some Python libraries do (confluent_kafka does, for instance).

@codecov-commenter
Copy link

codecov-commenter commented Mar 8, 2024

Codecov Report

Attention: Patch coverage is 97.87234% with 2 lines in your changes are missing coverage. Please review.

Project coverage is 93.04%. Comparing base (41248ed) to head (378be7f).
Report is 72 commits behind head on main.

Files Patch % Lines
src/memray/_memray/tracking_api.cpp 86.66% 2 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #562      +/-   ##
==========================================
+ Coverage   92.55%   93.04%   +0.49%     
==========================================
  Files          91       94       +3     
  Lines       11304    11407     +103     
  Branches     1581     2089     +508     
==========================================
+ Hits        10462    10614     +152     
+ Misses        837      793      -44     
+ Partials        5        0       -5     
Flag Coverage Δ
cpp 93.04% <97.87%> (+7.10%) ⬆️
python_and_cython 93.04% <97.87%> (-2.68%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@godlygeek godlygeek force-pushed the thread_name branch 6 times, most recently from dc73a0d to de20f4e Compare March 8, 2024 23:14
@godlygeek
Copy link
Contributor

godlygeek commented Mar 8, 2024

Alternatively (and I think preferably), I could push forward with my cleanup by removing pretty_thread_name from the record and factoring it into a def format_thread_name(record) -> str: function in reporters/__init__.py or a new reporters/common.py, and update our reporters to get it from there.

I've done this (last commit on this PR)

I think #561 can really be broken down into 2 related work items:

  1. capture Python thread names as well as OS thread names
  2. displaying the best name we have for each thread on the TUI

I've implemented item 1 (second to last commit on this PR), using the descriptor-based plan we discussed offline earlier today. It turned out to be a bit more involved than it seemed at first glance, because we needed to worry about names being set before a thread is started, or one thread's name being set from a different thread. It turned out not to be too tough to support both of those things, though.

Before you ask: there's a C-style cast in this PR to convert a pthread_t to a uint64_t. I couldn't figure out any other way to do this conversion:

  • On macOS, pthread_t is an opaque pointer, and we need to reinterpret_cast to get a uint64_t or a uintptr_t
  • On i686, pthread_t is an unsigned long int, and we need to static_cast to get either a uint64_t or even a uintptr_t

I stopped wrestling with it after a while and just switched to a C-style cast, so that it reinterprets when we need reinterpreting and statically casts when we need a conversion between integer types.

@godlygeek godlygeek marked this pull request as ready for review March 8, 2024 23:19
We're performing formatting in this method, rather than exposing the
thread's name directly. Rename this method to make it clear that it's
performing some formatting, and to unblock adding a new method exposing
the thread name directly.

Signed-off-by: Matt Wozniski <mwozniski@bloomberg.net>
Signed-off-by: Matt Wozniski <mwozniski@bloomberg.net>
To allow users to determine which thread in a multithreaded application
corresponds to the "Thread X of Y" number displayed in the live view,
include the thread name that we collect in the TUI.

Signed-off-by: Pablo Galindo <pablogsal@gmail.com>
Signed-off-by: Matt Wozniski <mwozniski@bloomberg.net>
Previously we only captured the thread name as set by `prctl`, but not
the thread name as returned by `threading.current_thread().name`.

Begin capturing the name for the Python thread as well. We retain only
the last name set for each thread, so assignments to `Thread.name`
override earlier calls to `prctl(PR_SET_NAME)`, and vice versa.

This implementation uses a custom descriptor to intercept assignments to
`Thread._name` and `Thread._ident` in order to detect when a thread has
a name or a thread id assigned to it. Because this is tricky and a bit
fragile (poking at the internals of `Thread`), I've implemented that
descriptor in a Python module. At least that way if it ever breaks, it
should be a bit easier for someone to investigate.

Signed-off-by: Matt Wozniski <mwozniski@bloomberg.net>
Introduce a new `memray.reporters.common` module for utilities needed by
multiple reporters. Replace the `AllocationRecord.pretty_thread_name`
property with a `format_thread_name` function in that new module.

Signed-off-by: Matt Wozniski <mwozniski@bloomberg.net>
setattr(
threading.Thread,
attr,
ThreadNameInterceptor(attr, NativeTracker.registerThreadNameById),
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, good old friend :P

@pablogsal pablogsal enabled auto-merge (rebase) May 30, 2024 21:09
@pablogsal pablogsal merged commit 896b37d into bloomberg:main May 30, 2024
18 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Include thread name in Memray live tracking view
3 participants