
Use api that "stack trace unavailable" in sub-thread #130

Closed
wilsonchai8 opened this issue Jun 2, 2022 · 12 comments · Fixed by #152
Labels: bug, documentation

Comments

@wilsonchai8 commented Jun 2, 2022

I have a situation where the main thread runs a function whose memory usage grows slowly, and at the same time I open a sub-thread to monitor it:

import time
from threading import Thread
import memray

def hello():
    a = 'h' * 10240
    count = 0
    while count<15:
        a += a
        count += 1

def m():
    with memray.Tracker("test.bin"):
        time.sleep(2)

t = Thread(target=m, args=())
t.start()
hello()
t.join()
root@b55328e153b0:/tmp# memray stats test.bin
📏 Total allocations:
	2

📦 Total memory allocated:
	480.000MB

📊 Histogram of allocation size:
	min: 480.000MB
	----------------------------------------
	< 225.000PB: 1 ▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇
	< 105.5YB  : 0
	< 53084201343.8YB: 0
	< 26718158566205665280.0YB: 0
	< 13447692140004132921498664960.0YB: 0
	< 6768446389904240393238904681458040832.0YB: 0
	< 3406671275343084001598671104068678912850264064.0YB: 0
	< 1714634128676590725640136232502358247467182517297086464.0YB: 0
	< 863003782167520542775850501504234338770282486944023855555936256.0YB: 0
	<=434364110441617396824120646697741245745787014207085330508234065870585856.0YB: 0
	----------------------------------------
	max: 480.000MB

📂 Allocator type distribution:
	 MALLOC: 1

🥇 Top 5 largest allocating locations (by size):
	- <stack trace unavailable> -> 480.000MB

🥇 Top 5 largest allocating locations (by number of allocations):
	- <stack trace unavailable> -> 2

In the end, "<stack trace unavailable>" is printed on the screen, so I can't see which line consumed the memory.

@wilsonchai8 wilsonchai8 changed the title use api that "stack trace unavailable" in sub-thread Use api that "stack trace unavailable" in sub-thread Jun 2, 2022
@godlygeek (Contributor)

Wow, I can reproduce that, and that is very very wrong. The "stack trace unavailable" seems less wrong than the fact that the histogram thought the smallest bucket should be 225 petabytes and the largest one should be 434 million vigintillion yottabytes - even though all of those buckets except one are empty.

How very strange...

@godlygeek (Contributor)

Well, I've figured out what's happening for each of those bugs. The histogram one is easy: we go off the rails when all of the allocation locations have allocated the same amount of memory (we're careful to avoid a divide by zero, but what we do instead isn't terribly helpful).
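
(A rough illustration, not memray's actual code, of how a logarithmic histogram can go off the rails when every location allocated the same amount: the natural bucket step collapses to zero, and whatever value is substituted to avoid the division by zero produces bounds unrelated to the data.)

import math

def log_histogram(sizes, n_buckets=10):
    # Sketch of logarithmic bucketing -- NOT memray's real implementation.
    low, high = math.log(min(sizes)), math.log(max(sizes))
    step = (high - low) / n_buckets
    if step == 0.0:
        # All locations allocated the same amount: dodge the division by
        # zero below, but any substitute step is arbitrary, so the bucket
        # bounds end up having nothing to do with the data.
        step = 25.0
    bounds = [math.exp(low + step * (i + 1)) for i in range(n_buckets)]
    counts = [0] * n_buckets
    for size in sizes:
        index = min(int((math.log(size) - low) / step), n_buckets - 1)
        counts[index] += 1
    return bounds, counts

print(log_histogram([480 * 2**20, 480 * 2**20]))
# Every bucket but the first is empty, and the bounds are absurd --
# qualitatively the same failure as the 225 PB ... YB output above.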

The other problem, the one that you raised the issue for, is trickier. What's happening is that, when the tracker is started in one thread, we're failing to install our profile function on the other threads, which means that we're failing to collect the Python stack of the main thread leading up to the allocations. I'm not sure what to do about that just yet.

@pablogsal (Member) commented Jun 3, 2022

> The other problem, the one that you raised the issue for, is trickier. What's happening is that, when the tracker is started in one thread, we're failing to install our profile function on the other threads, which means that we're failing to collect the Python stack of the main thread leading up to the allocations. I'm not sure what to do about that just yet.

This is expected: thread profile functions only affect newly created threads and the sys.setprofile call is per-thread. This means that when you activate the Tracker it will only track the stack of threads created after it was initialized. We should either document this limitation or try to overcome it, which is tricky, because there isn't a supported way to "install a profile function in every running thread".
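
(For reference, a minimal pure-Python sketch of that per-thread behaviour: sys.setprofile affects only the calling thread, and threading.setprofile affects only threads started afterwards, so a thread that is already running never gets the profile function.)

import sys
import threading
import time

def profiler(frame, event, arg):
    print(f"{threading.current_thread().name}: {event} {frame.f_code.co_name}")

def worker():
    # This thread is already running when the profile functions are installed
    # below, so it never reports any events.
    time.sleep(0.5)
    sorted([3, 1, 2])

pre_existing = threading.Thread(target=worker, name="pre-existing")
pre_existing.start()

sys.setprofile(profiler)        # installs on the current (main) thread only
threading.setprofile(profiler)  # installs on threads created from now on

pre_existing.join()             # main-thread events are reported; the worker's are not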

Also, even if we install a profile function, the main thread will continue to be untracked because it never entered new functions while we are tracking in another thread, so we won't see a frame push or a pop.

For the main thread we can call Py_AddPendingCall with the trace function setter if we are invoked from a different thread, but it has limited usefulness.

I think we should:

  1. Document the limitation with Tracker and existing running threads.
  2. Fix the stats histogram.

@pablogsal added the bug and documentation labels Jun 3, 2022
@godlygeek (Contributor) commented Jun 3, 2022

> because there isn't a supported way to "install a profile function in every running thread"

How about if we make install_trace_function do

// Install our profile function on every thread state in this interpreter,
// not just the one belonging to the calling thread.
PyInterpreterState* interp = PyThreadState_GetInterpreter(PyThreadState_Get());
PyThreadState* ts = PyInterpreterState_ThreadHead(interp);
while (ts) {
    if (_PyEval_SetProfile(ts, PyTraceFunction, PyLong_FromLong(123)) < 0) {
        PyErr_Clear();  // ignore failures for individual threads
    }
    ts = PyThreadState_Next(ts);
}

Everything that this uses is public except for _PyEval_SetProfile.

> even if we install a profile function, the main thread will continue to be untracked because it never entered new functions while we are tracking in another thread, so we won't see a frame push or a pop.

When we install the profile function on a thread, perhaps we could also capture its initial stack, since at the point where we would be installing the profile function, we're holding the GIL and could walk the stack backwards from the thread's current frame. We could then apply any later pushes/pops on top of our captured initial stack.
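
(A rough pure-Python analogue of that idea, just to sketch the shape of it — the real thing would walk the frames from C while holding the GIL:)

import sys

def capture_initial_stacks():
    # Snapshot the current Python call stack of every running thread, so that
    # later frame pushes/pops can be applied on top of this baseline.
    stacks = {}
    for thread_id, frame in sys._current_frames().items():
        stack = []
        while frame is not None:          # walk backwards from the innermost frame
            stack.append((frame.f_code.co_filename,
                          frame.f_lineno,
                          frame.f_code.co_name))
            frame = frame.f_back
        stacks[thread_id] = stack[::-1]   # store outermost call first
    return stacks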

@pablogsal (Member)

> Everything that this uses is public except for _PyEval_SetProfile.

This means it is not supported and there is no assurance we can do this in the future; that's why I said "supported way" 😉

OTOH we can go with this for now; meanwhile I'll raise this issue with the rest of the core team and we can consider making _PyEval_SetProfile public or at least semi-private.

@godlygeek (Contributor)

> there is no assurance we can do this in the future

Sure, fair enough. It is available in 3.7 through 3.11, though, at least. And I can't see much reason why it shouldn't be public: it's just PyEval_SetProfile applied to an arbitrary PyThreadState, the ability to iterate through thread states is already public, and PyEval_SetProfile itself is public.

> meanwhile I'll raise this issue with the rest of the core team and we can consider making _PyEval_SetProfile public or at least semi-private.

I think that's a good idea. If there's no resistance to that, then that seems like a reasonable way for us to proceed.

If there is resistance to it, then we could always interpose the pthreads functions used to acquire the GIL, giving us a hook into when the GIL is next picked up in whatever other thread is already running 😈

@godlygeek (Contributor)

Another option would be for us to use the public and documented _PyInterpreterState_SetEvalFrameFunc to inject a frame evaluation function instead of a profile function - those are shared across all threads for an interpreter, and would let us detect that tracing has begun on the next Python function call within a thread. Though that seems much more likely to go away in the future than _PyEval_SetProfile would be.

@godlygeek (Contributor)

I'm not gonna pretend to have a good handle on the thread state APIs, but - could we iterate through the interpreter's thread states and, for each of them, use PyThreadState_Swap to make it our current thread state, then call PyEval_SetProfile, then restore our old thread state?

If that's valid, it would only use public APIs, but there's an assertion I don't understand in _PyThreadState_Swap which makes me worry that it wouldn't be valid to do so.

@godlygeek (Contributor)

(I'm guessing that the answer is "no", and that it's not valid to use a thread state on a different OS thread than the one that it was created for, but that doesn't seem to be documented anywhere...)

@pablogsal (Member) commented Jun 3, 2022

> I'm not gonna pretend to have a good handle on the thread state APIs, but - could we iterate through the interpreter's thread states and, for each of them, use PyThreadState_Swap to make it our current thread state, then call PyEval_SetProfile, then restore our old thread state?
>
> If that's valid, it would only use public APIs, but there's an assertion I don't understand in _PyThreadState_Swap which makes me worry that it wouldn't be valid to do so.

I'm not at my computer so I cannot check, but if I recall correctly _PyThreadState_Swap will crash if you try to pass a thread state that is not the one currently registered with the GIL, if one is set.

The other method we discussed is much more attainable than messing directly with the thread APIs.

@wilsonchai8 (Author) commented Jun 6, 2022

If I use 'live' mode, I can get more information about every thread

Like this:

import time
from threading import Thread

def hello():
    a = 'h' * 10240
    count = 0
    while count<15:
        a += a
        count += 1
    time.sleep(10)

t = Thread(target=hello, args=())
t.start()
t.join()

memray run --live main.py

(two screenshots of the memray live TUI attached)

@godlygeek (Contributor)

Right, any usage of memray run should work fine, because in those cases our hooks get installed before the application creates any threads.

It's only using with memray.Tracker(...) that isn't working, because doing that in some particular thread leaves any other already-running threads without our hooks installed.
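
(Until that's addressed, one workaround sketch for API users — adapted from the reproducer above, not an official recommendation — is to enter the Tracker from the main thread before any other threads are started, so every thread created afterwards is tracked:)

import time
from threading import Thread
import memray

def hello():
    a = 'h' * 10240
    count = 0
    while count < 15:
        a += a
        count += 1

# Enter the Tracker before creating any threads; threads started inside the
# `with` block inherit the profile hooks and get full stack traces.
with memray.Tracker("test.bin"):
    worker = Thread(target=hello)
    worker.start()
    worker.join()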
