-
Couldn't load subscription status.
- Fork 2.6k
Move eventpipe buffer to TLS #21817
Move eventpipe buffer to TLS #21817
Conversation
src/vm/eventpipebuffer.cpp
Outdated
| pActivityId, | ||
| pRelatedActivityId); | ||
| } | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| pInstance = new (m_pCurrent) EventPipeEventInstance( | |
| session, | |
| event, | |
| osThreadId, | |
| pDataDest, | |
| payload.GetSize(), | |
| (pThread == NULL) ? NULL : pActivityId, | |
| pRelatedActivityId); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand why the conditional on pActivityId is needed. I think we can further simplify to:
| pInstance = new (m_pCurrent) EventPipeEventInstance( | |
| session, | |
| event, | |
| osThreadId, | |
| pDataDest, | |
| payload.GetSize(), | |
| pActivityId, | |
| pRelatedActivityId); |
src/vm/threads.h
Outdated
| AppDomain* m_pAppDomain; // This field is read only by the SOS plugin to get the AppDomain | ||
| void** m_EETlsData; // ClrTlsInfo::data | ||
| EventPipeBufferList* m_pThreadEventBufferList; | ||
| bool m_threadEventWriteInProgress; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thread local slots are moderately precious things, so would like to economize. You can put the m_threadEventWriteInProgress as a property of EventPipeBufferList and avoid needing it here (it belongs there anyway, it is faster (since you already have a EventPipeBufferList when you need to know EventWriteInProgress, and this variable is protecting exactly the EventPipeBufferList structure anyway, so there is no down-side to moving it).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fyi changes in DeAllocateBuffers have broken the original usage of m_threadEventWriteInProgress. It is dead-code at this point. Originally this field was used so that DeAllocateBuffers could clean up per-thread buffers that were still in use. Looking over the current code I believe we are leaking all the thread local buffers on shutdown.
I think we'd be OK checking in with the leaks and then fixing it in a follow-up PR.
Background: Prior to the PR we had two per-thread volatile fields and the code used them like this:
/* cleanup-thread in DeAllocateBuffers() */
if(!pThread->m_threadEventWriteInProgress)
{
// delete all the memory in the buffers
m_pThreadEventBufferList = NULL;
}
else
{
// leak it
}
/* event writing thread in EventPipeBufferManager::WriteEvent */
m_threadEventWriteInProgress = TRUE
pBufferList = m_pThreadEventBufferList
WriteToTheBuffer(pBufferList)
m_threadEventWriteInProgress = FALSE
We've since eliminated that code in DeAllocateBuffers() and I didn't notice any replacement.
That original code didn't look thread-safe either. The WriteEvent thread could set the in-use flag and begin using the buffers just after the cleanup thread checked they weren't in use.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I made a couple of change request, but mostly it looks good.
|
Thanks Vance! I'll update the PR accordingly. |
src/vm/eventpipebuffer.cpp
Outdated
| if (pThread == NULL) | ||
| { | ||
| // if pthread is NULL, it's likely we are running in something like a GC thread which is not a Thread object, so it can't have an activity ID set anyway | ||
| pInstance = new (m_pCurrent) EventPipeEventInstance( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The only thing thing different between the two cases is the activity ID argument. You can just have the condition for that one argument without duplicating the whole thing.
(pThread != NULL) ? pActivityId : NULL
src/vm/eventpipebuffer.cpp
Outdated
| pActivityId, | ||
| pRelatedActivityId); | ||
| } | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand why the conditional on pActivityId is needed. I think we can further simplify to:
| pInstance = new (m_pCurrent) EventPipeEventInstance( | |
| session, | |
| event, | |
| osThreadId, | |
| pDataDest, | |
| payload.GetSize(), | |
| pActivityId, | |
| pRelatedActivityId); |
src/vm/threads.h
Outdated
| Thread* m_pThread; | ||
| AppDomain* m_pAppDomain; // This field is read only by the SOS plugin to get the AppDomain | ||
| void** m_EETlsData; // ClrTlsInfo::data | ||
| EventPipeBufferList* m_pThreadEventBufferList; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Previously these types were marked volatile and now they aren't. I would either restore those markings or walk me through the rationale of why they are no longer necessary.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you should be able do define your own local __thread EventPipeBufferList* t_pThreadEventBufferList. The reasons why we had to have all thread-local statics in this central structure should no longer exist.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
walk me through the rationale of why they are no longer necessary.
The central structure was required because of the offsets of thread-local statics were hardcoded in the debugger. They are no longer hardcoded. The code in InitThreadManager computes the current offset and communicates it to debugger via DAC.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jkotas - sorry I had two separate issues in those two comments. The text you quoted was in the first comment asking about Sung changing types from Volatile<T> to T but you answered referring to the second comment about usage of TLS slots.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My preference would be the local definition - __thread EventPipeBufferList* t_pThreadEventBufferList.
src/vm/threads.cpp
Outdated
| EventPipeBufferList *pBufferList = GetThreadEventBufferList(); | ||
| if(pBufferList != NULL) | ||
| { | ||
| pBufferList->SetOwnedByThread(false); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the buffers aren't owned by the Thread object any longer then we should move the cleanup code to a path that applies to both managed and unmanaged threads. As it is, creating an unmanaged thread, logging from it, then destroying it would leak a buffer. CExecutionEngine::ThreadDetaching looks like a reasonable place though admittedly I am not that familiar with that portion of our code. @kouvel might be more knowledgable about it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like CExecutionEngine::ThreadDetaching only applies to Thread objects and is called when the Thread object is being cleaned up. One way may be to add a destructor to ThreadLocalInfo, seems to work with __declspec(thread) but would need to ensure that it also works with clang. Another option is to keep the thread local info in Thread and create Thread objects. The Thread object is not necessarily tied to managed threads, that would be the m_exposedObject in Thread. For instance, wait threads in the thread pool (see ThreadpoolMgr::CreateWaitThread()) set up the thread to create a Thread object even though they would not run any managed code. I don't know the code well enough yet to tell when a Thread object is necessary vs not.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks Kount! I agree if we could do a destructor that seems very elegant.
@jkotas believed creating a Thread for those threads would be the harder of the options (https://github.com/dotnet/coreclr/issues/21380#issuecomment-447876984) but I don't have the GC knowledge to have a good opinion myself.
src/vm/eventpipebuffermanager.cpp
Outdated
| if (!pThreadBufferList->OwnedByThread()) | ||
| { | ||
| Thread *pThread = NULL; | ||
| while ((pThread = ThreadStore::GetThreadList(pThread)) != NULL) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@brianrob was there a reason we needed to find the Thread object that was pointing at a specific buffer list and NULL it out? It appears we only unowned a buffer when the Thread was being detached/destroyed and I would assume we could either NULL the m_pEventPipeBufferList field at that point or ignore it because it would never be read from again?
@sywhang - This loop is weird now. It is iterating over all the Thread objects, but since you don't use the Thread object in the body of the loop it is just checking the current thread's list (the thread that is executing this code, not the thread refered to by the pThread object) over and over again. Depending on what Brian says I suspect we can delete this entire while((pThread = …) loop as no longer useful, and we could set the TLS slot containing the EventPipeBufferList NULL at the same time it is unowned by a thread.
src/vm/threads.cpp
Outdated
| // so that they can be cleaned up after the thread dies. | ||
| EventPipeBufferList *pBufferList = th->GetEventPipeBufferList(); | ||
| EventPipeBufferList *pBufferList = GetThreadEventBufferList(); | ||
| if(pBufferList != NULL) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same issue as mentioned below. If the buffers aren't owned by Thread then Thread related code shouldn't be responsible for cleaning them up.
src/vm/eventpipebuffer.cpp
Outdated
| bool success = true; | ||
| EX_TRY | ||
| { | ||
| LPCGUID pThreadActivityID = pActivityId; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this assignment intended? Variable is not used.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah I missed that. Thanks for catching that
src/vm/eventpipebuffermanager.cpp
Outdated
| } // extern "C" | ||
|
|
||
|
|
||
| EXTERN_C inline EventPipeBufferList* STDCALL GetThreadEventBufferList() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These do not need to be EXTERN_C. In fact, these can be a static methods inside EventPipeBufferList.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(or inside ThreadEventBufferList)
| { | ||
| if (payload[j-1] == (byte)(0) && payload[j] == (byte)(0)) | ||
| { | ||
| byteCount = j; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks like an off-by-one-error. For example in the string "cat", I'd expect byteCount = 8 but this computes 7. Below
charPayload = MemoryMarshal.Cast<byte, char>(payload.Slice(0, byteCount));
creates payload with an odd number of bytes, then casts it as a charPayload. I'm guessing the implicit rounding in the length conversion within the cast makes this work out, but it would be nice not to rely on those subtleties.
|
heads up, I responded to Vance's comment way above, its easy to miss in all the noise. |
src/vm/eventpipebuffermanager.h
Outdated
| // when the thread dies so we can free EventPipeBufferList in the destructor. | ||
| class ThreadEventBufferList | ||
| { | ||
| public: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: Since you have get&set methods, the field itself can be private. Only the methods need to be public.
| for (int j = 0; j < payload.Length; j+=2) | ||
| { | ||
| if (payload[j-1] == (byte)(0) && payload[j] == (byte)(0)) | ||
| if (payload[j] == (byte)(0) && payload[j+1] == (byte)(0)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unfortunately now the code will read off the end of the buffer when given an odd size buffer (which isn't supposed to happen, but depending on global correctness of code to prevent overruning a buffer isn't great)
I'd suggest leaving the code how it was before, except set byteCount = j+1 and adjust the payload slice length computations.
| { | ||
| charPayload = charPayload.Slice(0, charCount); | ||
| payload = payload.Slice((charCount + 1) * sizeof(char)); | ||
| charPayload = MemoryMarshal.Cast<byte, char>(payload.Slice(0, byteCount)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| charPayload = MemoryMarshal.Cast<byte, char>(payload.Slice(0, byteCount)); | |
| //byteCount includes the null terminator, which we do not want in the managed string object | |
| charPayload = MemoryMarshal.Cast<byte, char>(payload.Slice(0, byteCount - sizeof(char)); |
byteCount includes the null terminator bytes, but managed string buffers aren't canonically null-terminated. For an example, this code snippet will fail the assert evaluating 4 == 3
char[] catChars = { 'c', 'a', 't', (char)0 };
ReadOnlySpan<char> span = new ReadOnlySpan<char>(catChars);
string cat = new string(span);
string cat2 = "cat";
Debug.Assert(cat.Length == cat2.Length);
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you apply that one final fix then lgtm
| int charCount = charPayload.IndexOf('\0'); | ||
| if (charCount < 0) | ||
| // Try to find null terminator (0x00) from the byte span | ||
| // NOTE: we do this by hand instead of using payload.IndexOf('\n') because payload may be unaligned due to |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: This comment talks about payload.IndexOf('\n'), but the code is looking for zeros (\n is not a zero).
|
Merging this since CI failures seem to be infrastructure issues (timeouts). Thanks everyone for all the feedback! |
* start ripping out eventpipe buffer to tls * can now emit events from gc threads * cleanup * more cleanup * more cleanup * tested on linux * Addressing PR comments * Move things around a bit to build in Linux * change eventpipe buffer deallocation code * more cleanup * this while loop doesnt do anything now * Fix build * fixing build * More cleanup * more pr comments * Fix unix build * more pr comments * trying to add a message to assertion that seems to be causing CIs to fail * more pr feedback * handle non-2-byte aligned string payloads inside payload buffers * some more cleanup * Fix off by one error in null index calculation * Make Get/SetThreadEventBufferList a static member of ThreadEventBufferList * make only the methods public in ThreadEventBufferList * Addressing noah's comments * fix comment and last off by 1 error Commit migrated from dotnet/coreclr@ee8cda0
EventPipe has a buffer that's stored in the
Threadobject, but in the runtime we may be emitting events from threads that are notThreadobject. (i.e. BGC threads). This moves the pointer toEventPipeBufferListthat's stored in Thread to TLS so that it can be accessed from a "non-Thread" thread.This should address https://github.com/dotnet/coreclr/issues/21380