-
-
Notifications
You must be signed in to change notification settings - Fork 452
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Use coordinate-based locking to increase chunk system parallelism #74
Conversation
I did a bit of testing with this branch and bots joining + spawnRadius = 50k. Chunk loading seems to be smoother, which is nice. However, in two out of three tries the server completely locked up after about 10 minutes. |
How many regions? You may have hit the scheduler's limit. |
I have reverted the change to the region shift, please try again. |
Got this on previous built, from 19h ago, might help. log
|
I can only see this happening if you have a plugin used the regenerate chunk API. Is this the case? |
Hi Leaf, I don't have any plugins generating chunks. BTW on current built I got these controlled errores: "[21:14:34 ERROR]: Failed to fetch mob spawner entity at (-3540, 21, -529)" |
I have reviewed the stacktrace again only to find that the line number on ImposterProtoChunk does not match what it should. |
A significant overhead in Folia comes from the chunk system's locks, the ticket lock and the scheduling lock. The public test server, which had ~330 players, had signficant performance problems with these locks: ~80% of the time spent ticking was _waiting_ for the locks to free. Given that it used around 15 cores total at peak, this is a complete and utter loss of potential. To address this issue, I have replaced the ticket lock and scheduling lock with two ReentrantAreaLocks. The ReentrantAreaLock takes a shift, which is used internally to group positions into sections. This grouping is neccessary, as the possible radius of area that needs to be acquired for any given lock usage is up to 64. As such, the shift is critical to reduce the number of areas required to lock for any lock operation. Currently, it is set to a shift of 6, which is identical to the ticket level propagation shift (and, it must be at least the ticket level propagation shift AND the region shift). The chunk system locking changes required a complete rewrite of the chunk system tick, chunk system unload, and chunk system ticket level propagation - as all of the previous logic only works with a single global lock. This does introduce two other section shifts: the lock shift, and the ticket shift. The lock shift is simply what shift the area locks use, and the ticket shift represents the size of the ticket sections. Currently, these values are just set to the region shift for simplicity. However, they are not arbitrary: the lock shift must be at least the size of the ticket shift and must be at least the size of the region shift. The ticket shift must also be >= the ceil(log2(max ticket level source)). The chunk system's ticket propagator is now global state, instead of region state. This cleans up the logic for ticket levels significantly, and removes usage of the region lock in this area, but it also means that the addition of a ticket no longer creates a region. To alleviate the side effects of this change, the global tick thread now processes ticket level updates for each world every tick to guarantee eventual ticket level processing. The chunk system also provides a hook to process ticket level changes in a given _section_, so that the region queue can guarantee that after adding its reference counter that the region section is created/exists/wont be destroyed. The ticket propagator operates by updating the sources in a single ticket section, and propagating the updates to its 1 radius neighbours. This allows the ticket updates to occur in parallel or selectively (see above). Currently, the process ticket level update function operates by polling from a concurrent queue of sections to update and simply invoking the single section update logic. This allows the function to operate completely in parallel, provided the queue is ordered right. Additionally, this limits the area used in the ticket/scheduling lock when processing updates, which should massively increase parallelism compared to before. The chunk system ticket addition for expirable ticket types has been modified to no longer track exact tick deadlines, as this relies on what region the ticket is in. Instead, the chunk system tracks a map of lock section -> (chunk coordinate -> expire ticket count) and every ticket has been changed to have a removeDelay count that is decremented each tick. Each region searches its own sections to find tickets to try to expire. Chunk system unloading has been modified to track unloads by lock section. The ordering is determined by which section a chunk resides in. The unload process now removes from unload sections and processes the full unload stages (1, 2, 3) before moving to the next section, if possible. This allows the unload logic to only hold one lock section at a time for each lock, which is a massive parallelism increase. In stress testing, these changes lowered the locking overhead to only 5% from ~70%, which completely fix the original problem as described.
This branch seems stable enough for Folia, so I'll merge it now. |
Might have missed a step in the built, anyway the last built from this branch did not produce any problems besides the one I last posted, all controlled. |
See patch notes for implementation details.
This patch should massively increase performance in Folia due to the significantly better locking behavior.
As with mainline Folia, builds are not provided you must build it yourself. If you already know how to build regular Folia, just use
your git client to checkout to the dev/locking branch (and please verify via current HEAD that you have the latest from the branch).
Please report any issues with this branch on this PR.