
Use api that "stack trace unavailable" in sub-thread #130

Closed
wilsonchai8 opened this issue Jun 2, 2022 · 12 comments · Fixed by #152
Labels: bug, documentation

Comments

@wilsonchai8 commented Jun 2, 2022

I have a situation where the main thread runs a function whose memory usage grows slowly, and at the same time I open a sub-thread to monitor it:

import time
from threading import Thread
import memray

def hello():
    a = 'h' * 10240
    count = 0
    while count<15:
        a += a
        count += 1

def m():
    with memray.Tracker("test.bin"):
        time.sleep(2)

t = Thread(target=m, args=())
t.start()
hello()
t.join()
root@b55328e153b0:/tmp# memray stats test.bin
📏 Total allocations:
	2

📦 Total memory allocated:
	480.000MB

📊 Histogram of allocation size:
	min: 480.000MB
	----------------------------------------
	< 225.000PB: 1 ▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇
	< 105.5YB  : 0
	< 53084201343.8YB: 0
	< 26718158566205665280.0YB: 0
	< 13447692140004132921498664960.0YB: 0
	< 6768446389904240393238904681458040832.0YB: 0
	< 3406671275343084001598671104068678912850264064.0YB: 0
	< 1714634128676590725640136232502358247467182517297086464.0YB: 0
	< 863003782167520542775850501504234338770282486944023855555936256.0YB: 0
	<=434364110441617396824120646697741245745787014207085330508234065870585856.0YB: 0
	----------------------------------------
	max: 480.000MB

📂 Allocator type distribution:
	 MALLOC: 1

🥇 Top 5 largest allocating locations (by size):
	- <stack trace unavailable> -> 480.000MB

🥇 Top 5 largest allocating locations (by number of allocations):
	- <stack trace unavailable> -> 2

In the end, "<stack trace unavailable>" is printed on the screen, so I can't see which line consumed the memory.

@wilsonchai8 wilsonchai8 changed the title use api that "stack trace unavailable" in sub-thread Use api that "stack trace unavailable" in sub-thread Jun 2, 2022
@godlygeek (Contributor)

Wow, I can reproduce that, and that is very very wrong. The "stack trace unavailable" seems less wrong than the fact that the histogram thought the smallest bucket should be 225 petabytes and the largest one should be 434 million vigintillion yottabytes - even though all of those buckets except one are empty.

How very strange...

@godlygeek (Contributor)

Well, I've figured out what's happening for each of those bugs. The histogram one is easy: we go off the rails when all of the allocation locations have allocated the same amount of memory (we're careful to avoid a divide by zero, but what we do instead isn't terribly helpful).
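
(A rough illustration, not memray's actual code, of how a logarithmic histogram can go off the rails when every location allocated the same amount: the natural bucket step collapses to zero, and whatever value is substituted to avoid the division by zero produces bounds unrelated to the data.)

import math

def log_histogram(sizes, n_buckets=10):
    # Sketch of logarithmic bucketing -- NOT memray's real implementation.
    low, high = math.log(min(sizes)), math.log(max(sizes))
    step = (high - low) / n_buckets
    if step == 0.0:
        # All locations allocated the same amount: dodge the division by
        # zero below, but any substitute step is arbitrary, so the bucket
        # bounds end up having nothing to do with the data.
        step = 25.0
    bounds = [math.exp(low + step * (i + 1)) for i in range(n_buckets)]
    counts = [0] * n_buckets
    for size in sizes:
        index = min(int((math.log(size) - low) / step), n_buckets - 1)
        counts[index] += 1
    return bounds, counts

print(log_histogram([480 * 2**20, 480 * 2**20]))
# Every bucket but the first is empty, and the bounds are absurd --
# qualitatively the same failure as the 225 PB ... YB output above.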

The other problem, the one that you raised the issue for, is trickier. What's happening is that, when the tracker is started in one thread, we're failing to install our profile function on the other threads, which means that we're failing to collect the Python stack of the main thread leading up to the allocations. I'm not sure what to do about that just yet.

@pablogsal (Member) commented Jun 3, 2022

> The other problem, the one that you raised the issue for, is trickier. What's happening is that, when the tracker is started in one thread, we're failing to install our profile function on the other threads, which means that we're failing to collect the Python stack of the main thread leading up to the allocations. I'm not sure what to do about that just yet.

This is expected: thread profile functions only affect newly created threads and the sys.setprofile call is per-thread. This means that when you activate the Tracker it will only track the stack of threads created after it was initialized. We should either document this limitation or try to overcome it, which is tricky, because there isn't a supported way to "install a profile function in every running thread".
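
(For reference, a minimal pure-Python sketch of that per-thread behaviour: sys.setprofile affects only the calling thread, and threading.setprofile affects only threads started afterwards, so a thread that is already running never gets the profile function.)

import sys
import threading
import time

def profiler(frame, event, arg):
    print(f"{threading.current_thread().name}: {event} {frame.f_code.co_name}")

def worker():
    # This thread is already running when the profile functions are installed
    # below, so it never reports any events.
    time.sleep(0.5)
    sorted([3, 1, 2])

pre_existing = threading.Thread(target=worker, name="pre-existing")
pre_existing.start()

sys.setprofile(profiler)        # installs on the current (main) thread only
threading.setprofile(profiler)  # installs on threads created from now on

pre_existing.join()             # main-thread events are reported; the worker's are not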

Also, even if we install a profile function, the main thread will continue to be untracked because it never entered new functions while we are tracking in another thread, so we won't see a frame push or a pop.

For the main thread we can call Py_AddPendingCall with the trace function setter if we are invoked from a different thread, but it has limited usefulness.

I think we should:

  1. Document the limitation with Tracker and existing running threads.
  2. Fix the stats histogram.

@pablogsal added the bug and documentation labels Jun 3, 2022
@godlygeek (Contributor) commented Jun 3, 2022

> because there isn't a supported way to "install a profile function in every running thread"

How about if we make install_trace_function do

// Install our profile function on every thread state in this interpreter,
// not just the one belonging to the calling thread.
PyInterpreterState* interp = PyThreadState_GetInterpreter(PyThreadState_Get());
PyThreadState* ts = PyInterpreterState_ThreadHead(interp);
while (ts) {
    if (_PyEval_SetProfile(ts, PyTraceFunction, PyLong_FromLong(123)) < 0) {
        PyErr_Clear();  // ignore failures for individual threads
    }
    ts = PyThreadState_Next(ts);
}

Everything that this uses is public except for _PyEval_SetProfile.

> even if we install a profile function, the main thread will continue to be untracked because it never entered new functions while we are tracking in another thread, so we won't see a frame push or a pop.

When we install the profile function on a thread, perhaps we could also capture its initial stack, since at the point where we would be installing the profile function, we're holding the GIL and could walk the stack backwards from the thread's current frame. We could then apply any later pushes/pops on top of our captured initial stack.
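
(A rough pure-Python analogue of that idea, just to sketch the shape of it — the real thing would walk the frames from C while holding the GIL:)

import sys

def capture_initial_stacks():
    # Snapshot the current Python call stack of every running thread, so that
    # later frame pushes/pops can be applied on top of this baseline.
    stacks = {}
    for thread_id, frame in sys._current_frames().items():
        stack = []
        while frame is not None:          # walk backwards from the innermost frame
            stack.append((frame.f_code.co_filename,
                          frame.f_lineno,
                          frame.f_code.co_name))
            frame = frame.f_back
        stacks[thread_id] = stack[::-1]   # store outermost call first
    return stacks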

@pablogsal (Member)

> Everything that this uses is public except for _PyEval_SetProfile.

This means it is not supported and there is no assurance we can do this in the future; that's why I said "supported way" 😉

OTOH we can go with this for now; meanwhile I'll raise this issue with the rest of the core team and we can consider making _PyEval_SetProfile public or at least semi-private.

@godlygeek (Contributor)

> there is no assurance we can do this in the future

Sure, fair enough. It is available in 3.7 through 3.11, though, at least. And I can't see much reason why it shouldn't be public: it's just PyEval_SetProfile applied to an arbitrary PyThreadState, the ability to iterate through thread states is already public, and PyEval_SetProfile itself is public.

> meanwhile I'll raise this issue with the rest of the core team and we can consider making _PyEval_SetProfile public or at least semi-private.

I think that's a good idea. If there's no resistance to that, then that seems like a reasonable way for us to proceed.

If there is resistance to it, then we could always interpose the pthreads functions used to acquire the GIL, giving us a hook into when the GIL is next picked up in whatever other thread is already running 😈

@godlygeek (Contributor)

Another option would be for us to use the public and documented _PyInterpreterState_SetEvalFrameFunc to inject a frame evaluation function instead of a profile function - those are shared across all threads for an interpreter, and would let us detect that tracing has begun on the next Python function call within a thread. Though that seems much more likely to go away in the future than _PyEval_SetProfile would be.

@godlygeek (Contributor)

I'm not gonna pretend to have a good handle on the thread state APIs, but - could we iterate through the interpreter's thread states and, for each of them, use PyThreadState_Swap to make it our current thread state, then call PyEval_SetProfile, then restore our old thread state?

If that's valid, it would only use public APIs, but there's an assertion I don't understand in _PyThreadState_Swap which makes me worry that it wouldn't be valid to do so.

@godlygeek (Contributor)

(I'm guessing that the answer is "no", and that it's not valid to use a thread state on a different OS thread than the one that it was created for, but that doesn't seem to be documented anywhere...)

@pablogsal (Member) commented Jun 3, 2022

> I'm not gonna pretend to have a good handle on the thread state APIs, but - could we iterate through the interpreter's thread states and, for each of them, use PyThreadState_Swap to make it our current thread state, then call PyEval_SetProfile, then restore our old thread state?
>
> If that's valid, it would only use public APIs, but there's an assertion I don't understand in _PyThreadState_Swap which makes me worry that it wouldn't be valid to do so.

I'm not at my computer so I cannot check, but if I recall correctly _PyThreadState_Swap will crash if you try to pass a thread state that is not the one currently registered with the GIL, if one is set.

The other method we discussed is much more attainable than messing directly with the thread APIs.

@wilsonchai8 (Author) commented Jun 6, 2022

If I use 'live' mode, I can get more information about every thread

Like this:

import time
from threading import Thread

def hello():
    a = 'h' * 10240
    count = 0
    while count<15:
        a += a
        count += 1
    time.sleep(10)

t = Thread(target=hello, args=())
t.start()
t.join()

memray run --live main.py

(two screenshots of the memray live TUI attached)

@godlygeek (Contributor)

Right, any usage of memray run should work fine, because in those cases our hooks get installed before the application creates any threads.

It's only using with memray.Tracker(...) that isn't working, because doing that in some particular thread leaves any other already-running threads without our hooks installed.
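
(Until that's addressed, one workaround sketch for API users — adapted from the reproducer above, not an official recommendation — is to enter the Tracker from the main thread before any other threads are started, so every thread created afterwards is tracked:)

import time
from threading import Thread
import memray

def hello():
    a = 'h' * 10240
    count = 0
    while count < 15:
        a += a
        count += 1

# Enter the Tracker before creating any threads; threads started inside the
# `with` block inherit the profile hooks and get full stack traces.
with memray.Tracker("test.bin"):
    worker = Thread(target=hello)
    worker.start()
    worker.join()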
