Decrease lock contention in SingleThreadQueueExtent #2594
Conversation
Great work. Any impact to small edits -- negative or positive?
That's really difficult to measure, but from a quick test it seems to perform mostly the same. As a note, we currently still generate 2 (see Lines 213 to 216 in f94b96d), so when we try to access data from that chunk directly afterwards, we have to load it again. This was basically the case before too (although less deterministically), just that these chunks were loaded through a different STQE instance (the specific one I mentioned in the PR description). We can probably get rid of that somehow, but I didn't come up with a simple solution for that.
fwiw, chunks may not be reused often when it's just one person editing, but on a larger server with up to tens of people editing simultaneously, I wonder how often pooled chunks are used, and in that case, whether there could be a negative impact on GC/performance due to more ChunkHolders now being generated and requiring GC.

I would also assume that the cause of not using pooled chunks more frequently is, in actuality, the large amount of lock contention: chunks are submitted but take some time to be processed, meaning the ChunkHolder is not released to the pool again until much of the edit is completed, due to the priority we give to the threads completing the main edit work. I wonder what it would look like if correctly exposing the correct STQE to each thread were implemented in isolation.

One comment: why do we want a separate extent to handle the thread-specific STQE? Given it is intrinsic to correct use of the parallel extent, we could simply add the behaviour there. Also, is there a reason for not using a ThreadLocal to store the STQEs?
I thought about that, but in my measurements, FAWE generated >6 GB of garbage during my edit, while ChunkHolder instances only accounted for 1 MB of that. There is even more garbage generated indirectly due to chunk loading, chunk saving etc. Therefore, I don't think it is worth pooling ChunkHolder objects.
That might be true, although from what it looks like the objects were recycled too soon rather than too late.
Good idea.
I experimented with something where this was necessary, but I can roll it back I guess.
I suppose for the small benefit vs the added complexity it's probably easiest not to pool then yeah. Minecraft chunks are so overloaded with objects (esp. during generation) already.
Doesn't really matter anymore if we're removing pooling though anyways
ThreadLocal should be more efficient as there's no synchronisation/locking involved, and it's not like it removes any "security" for object retention/bleed, as we should always still be uncaching from either implementation.
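To illustrate the ThreadLocal suggestion above, here is a minimal sketch contrasting a shared, concurrently accessed registry with a `ThreadLocal` holding a per-thread instance. `WorkerState` is a hypothetical stand-in for an STQE; none of these names are FAWE API.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;

public class ThreadLocalDemo {

    // Hypothetical stand-in for a per-thread SingleThreadQueueExtent.
    static class WorkerState {
        final String owner = Thread.currentThread().getName();
    }

    // Option 1: a shared concurrent map keyed by thread (lookup touches shared state).
    static final Map<Thread, WorkerState> registry = new ConcurrentHashMap<>();

    static WorkerState fromRegistry() {
        return registry.computeIfAbsent(Thread.currentThread(), t -> new WorkerState());
    }

    // Option 2: ThreadLocal -- no shared map, no contention on lookup.
    static final ThreadLocal<WorkerState> local = ThreadLocal.withInitial(WorkerState::new);

    public static boolean statesAreDistinct() throws InterruptedException {
        WorkerState[] seen = new WorkerState[2];
        CountDownLatch done = new CountDownLatch(2);
        for (int i = 0; i < 2; i++) {
            final int idx = i;
            new Thread(() -> {
                seen[idx] = local.get(); // each thread lazily gets its own instance
                local.remove();          // "uncache" once the thread's work is done
                done.countDown();
            }).start();
        }
        done.await();
        return seen[0] != seen[1]; // the two threads never shared an instance
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(statesAreDistinct());
    }
}
```

Either way, `remove()`/uncaching at the end of an edit is what prevents state from bleeding into the next task run on a pooled thread.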
Overview
Fixes #2590
Description
This change addresses multiple smaller parts that sum up into a performance problem and race conditions:

With a `//gmask` set, the `MaskingExtent` will access blocks through the `SingleThreadQueueExtent` instance created here:

- FastAsyncWorldEdit/worldedit-core/src/main/java/com/fastasyncworldedit/core/queue/implementation/ParallelQueueExtent.java, Line 62 in f94b96d
- FastAsyncWorldEdit/worldedit-core/src/main/java/com/fastasyncworldedit/core/queue/implementation/SingleThreadQueueExtent.java, Line 293 in f94b96d

By introducing a `ThreadLocalPassthroughExtent`, we can provide the threads of the `ParallelQueueExtent` with enough context to use their own `SingleThreadQueueExtent`. Additionally, we have the Blocking Executor thread pool that also wants to access chunk data. As tasks running on those threads are called from `ChunkHolder`, we can statically provide the context for the running task. This is a bit dirty, but I think it is worth it!

As a side effect, these changes caused #2590 to happen regularly. As a fix, we no longer pool `ChunkHolder` objects. According to my measurements, only ~1/4 of the `ChunkHolder` objects were actually reused anyway (and the objects are really small, 56 bytes on typical JVM configurations). By not reusing them, we help avoid them ending up in an old generation. This might also have positive effects on GC overhead, but that part is basically impossible to measure. By not recycling these objects, we no longer run into the race condition.
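The hazard that pooling introduces can be shown with a deliberately simplified, single-threaded sketch (illustrative only, not FAWE's actual `ChunkHolder` or pool): a reference retained past `release()` suddenly observes the next user's data once the pool recycles the object.

```java
import java.util.ArrayDeque;

public class PoolingDemo {

    // Hypothetical, tiny holder object -- a stand-in for a pooled ChunkHolder.
    static class Holder {
        int chunkX;
    }

    static final ArrayDeque<Holder> pool = new ArrayDeque<>();

    static Holder acquire(int chunkX) {
        Holder h = pool.isEmpty() ? new Holder() : pool.poll();
        h.chunkX = chunkX;
        return h;
    }

    static void release(Holder h) {
        pool.push(h);
    }

    public static boolean staleReferenceSeesNewData() {
        Holder a = acquire(1);
        release(a);            // released, but 'a' is still reachable elsewhere
        Holder b = acquire(2); // the pool hands back the very same object
        return a == b && a.chunkX == 2; // the stale reader now sees chunk 2, not chunk 1
    }

    public static void main(String[] args) {
        System.out.println(staleReferenceSeesNewData());
        // Allocating a fresh Holder per use (no pooling) makes this state
        // aliasing impossible, at the cost of slightly more short-lived garbage.
    }
}
```

With real concurrency the same aliasing becomes a timing-dependent race, which matches the "recycled too soon" observation above; small short-lived objects like this are also exactly what generational GCs handle cheapest.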
I also noticed that the mask in `LocalSession` might keep objects of the most recent `EditSession` in memory, so when remembering a change, we now also clean potential references to that edit session. This further helps keeping `ChunkHolder` objects short-lived.

Performance

The previously high lock contention resulted in situations where most of the threads in the blocking executor were busy waiting. This cascaded into the Fork Join Pool Primary doing the work of the blocking executor, because its tasks were rejected. As this then caused those threads to compete for the lock too, we ended up waiting most of the time:
With this change, we can properly use the CPU:
In my experiments, this brought a 100% speed improvement for medium and large edits.
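The rejection cascade described above can be reproduced in miniature with a saturated bounded pool and a caller-runs rejection policy: once the pool's worker is stuck (here simulated with a latch standing in for a contended lock) and its queue is full, rejected tasks execute on the submitting thread itself. This is a generic JDK sketch, not FAWE's executor configuration.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class RejectionDemo {

    public static int tasksRunOnCaller() throws InterruptedException {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                1, 1, 0L, TimeUnit.MILLISECONDS,
                new ArrayBlockingQueue<>(1),                // tiny queue -> easy to saturate
                new ThreadPoolExecutor.CallerRunsPolicy()); // rejected tasks run on the caller

        Thread caller = Thread.currentThread();
        AtomicInteger onCaller = new AtomicInteger();
        CountDownLatch block = new CountDownLatch(1);

        // Occupy the single worker; stands in for a thread busy waiting on a lock.
        pool.execute(() -> {
            try {
                block.await();
            } catch (InterruptedException ignored) {
            }
        });

        // Worker busy, queue fills up -> further submissions bounce back to the caller.
        for (int i = 0; i < 4; i++) {
            pool.execute(() -> {
                if (Thread.currentThread() == caller) {
                    onCaller.incrementAndGet();
                }
            });
        }

        block.countDown();
        pool.shutdown();
        pool.awaitTermination(5, TimeUnit.SECONDS);
        return onCaller.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(tasksRunOnCaller() > 0); // the submitter did the pool's work
    }
}
```

Once contention on the shared STQE lock is removed, the blocking executor's workers make progress again, its queue drains, and the submitting (Fork Join) threads stop absorbing its workload.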