Regionized Entity (and Tile) Ticking #1001

@aikar

Description

Goal

This issue serves as a note holder and idea registry for an idea I had today: ticking entities in parallel, safely (or at least, safely enough). Please read the entire issue before jumping to "THIS ISN'T SAFE!!!"

This design is intended to benefit servers whose players are spread out, nowhere near each other.

The goal of this idea is not to be 100% thread safe, but to be practically safe: under most conditions, and in realistic scenarios, it should be unlikely to cause issues.

Implementation Design

Break world ticking into a 3-stage process:

Stage 1: doTick method. All world operations that happen for each world before entity/tile entity ticking run sequentially, with no parallelism, as they do today.
Stage 2: Iterate all players' chunk maps, and the chunks each player is participating in.

  • If none of a player's chunks are shared with another player, and none of them border a chunk that is already part of another region, create a new region.
  • If a chunk is already part of a region, or borders a chunk that is part of a region, that player's entire chunk map joins that region. If the chunk map touches more than one region, those regions merge into one.
    • This ensures that no 2 regions will ever border each other. All connected chunks are part of the same region.
  • Once all players' chunk maps are marked and regions created, scan all chunks for outlier "floating" chunks that are not part of any player's chunk map (these are pending unload). Find the nearest region and join the chunk to that region.
    • Consider: a config option to simply skip ticking outliers instead? This is likely desirable.
  • Each region should create and immediately acquire a write lock on itself. This lock is used by the mid-event teleportation handling described at the bottom of the Concurrency sections.
  • Once all chunks are marked to a region, iterate the world entity and tile entity lists (it is important to use the world lists and not the chunk lists, to keep ticking order correct), find each entity's current chunk and, by extension, its region, and build an Entity/Tile Entity tick list for that region. (We likely also need the "Global" list, which is used by lightning?)
  • Dispatch every region to a Region Thread pool (configurable number of threads, defaulting to the number of cores).
  • Each of these threads should set a static ThreadLocal, "isSafeThread", to true.
    • This system should be generic so we can reuse it for other sections of Paper.
  • Potentially add a "safer" config option that creates a Region Thread Pool per world with a max of 1 thread. Each world then uses 1 global region, so worlds tick in parallel with each other while each world itself still ticks sequentially.
  • Entity addition/removal lists need to be made concurrent, or per-region instead of per-world, to avoid concurrency issues with multiple threads adding to them.
  • The main thread waits on the region thread pools until they have fully flushed and emptied. This keeps the main thread paused until all regions are done executing.
  • Repeat all of this, but for Tile Entities. There is no benefit to doing this at the same time as Entities: we would hopefully already be maximizing CPU usage, and would just be creating more queued work. Ticking both at the same time would also increase the likelihood of behavioral changes.
  • Again, the main thread waits on the region thread pools until they have fully flushed and emptied, staying paused until all regions are done executing.
  • Flush any per-region addition/removal lists into the world lists, and unset the region markers on every entity and tile entity.
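
A minimal sketch of the Stage 2 region-building and tick-list steps, in Java. It assumes chunks are identified by packed (x, z) coordinates and that "borders" includes diagonal neighbours, and it uses union-find, a swapped-in technique the design does not name, so that any chain of connected chunks ends up in one region. `RegionBuilder`, `build`, and `tickLists` are hypothetical names, not Paper APIs. Outlier chunks are skipped (the config-option variant) rather than joined to the nearest region.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.ToLongFunction;

/**
 * Hypothetical sketch of Stage 2 region building (not Paper code).
 * Chunks in the same player's chunk map, or chunks that border each other,
 * are unioned into one region, so no two regions ever border.
 * Assumption: "border" includes diagonal neighbours.
 */
final class RegionBuilder {
    private final Map<Long, Long> parent = new HashMap<>();

    /** Packs chunk coordinates into one long key. */
    static long key(int x, int z) {
        return ((long) x << 32) | (z & 0xFFFFFFFFL);
    }

    private long find(long k) {
        long root = k;
        while (parent.get(root) != root) {
            root = parent.get(root);
        }
        while (k != root) { // path compression
            long next = parent.get(k);
            parent.put(k, root);
            k = next;
        }
        return root;
    }

    private void union(long a, long b) {
        long ra = find(a), rb = find(b);
        if (ra != rb) {
            parent.put(ra, rb);
        }
    }

    /** Each inner list is one player's chunk map as {x, z} pairs; returns chunk key -> region id. */
    Map<Long, Integer> build(List<List<int[]>> playerChunkMaps) {
        for (List<int[]> chunks : playerChunkMaps) {
            for (int[] c : chunks) {
                long k = key(c[0], c[1]);
                parent.putIfAbsent(k, k);
            }
        }
        for (List<int[]> chunks : playerChunkMaps) {
            if (chunks.isEmpty()) continue;
            long first = key(chunks.get(0)[0], chunks.get(0)[1]);
            for (int[] c : chunks) {
                long k = key(c[0], c[1]);
                union(first, k); // a player's whole chunk map joins one region
                for (int dx = -1; dx <= 1; dx++) {
                    for (int dz = -1; dz <= 1; dz++) {
                        long n = key(c[0] + dx, c[1] + dz);
                        if (parent.containsKey(n)) union(k, n); // bordering chunks merge
                    }
                }
            }
        }
        Map<Long, Integer> rootToId = new HashMap<>();
        Map<Long, Integer> chunkToRegion = new HashMap<>();
        for (long k : new ArrayList<>(parent.keySet())) {
            int id = rootToId.computeIfAbsent(find(k), r -> rootToId.size());
            chunkToRegion.put(k, id);
        }
        return chunkToRegion;
    }

    /** Builds per-region tick lists in world-list order; entities in no region are skipped. */
    static <E> Map<Integer, List<E>> tickLists(List<E> worldList, ToLongFunction<E> chunkOf,
                                               Map<Long, Integer> chunkToRegion) {
        Map<Integer, List<E>> lists = new HashMap<>();
        for (E e : worldList) {
            Integer region = chunkToRegion.get(chunkOf.applyAsLong(e));
            if (region == null) continue; // outlier chunk: the "skip outliers" option
            lists.computeIfAbsent(region, r -> new ArrayList<>()).add(e);
        }
        return lists;
    }
}
```

Building tick lists by walking the world list, rather than each region's chunks, is what keeps the original ticking order within a region.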

Stage 3: All post-tick entity operations per world, such as the Entity Tracker, done sequentially with no parallelism (we can investigate applying parallelism to the Entity Tracker in a later change).
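
The three stages can be sketched as a per-world tick loop, assuming a fixed thread pool whose threads set the "isSafeThread" flag before running any work. `Tickable`, `RegionTicker`, and `IS_SAFE_THREAD` are hypothetical names; this is an illustration, not Paper's tick code. `ExecutorService.invokeAll` blocks the calling thread until every task has completed, which gives the "main thread waits for the pools to empty" behavior, and an exception from one entity only marks that entity for removal.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Consumer;

/** Hypothetical stand-in for an entity or tile entity; not a Paper type. */
interface Tickable {
    void tick();
}

/** Hypothetical sketch of the three-stage world tick with a region thread pool. */
final class RegionTicker {
    /** Marks region threads so thread checks can treat them as "safe". */
    static final ThreadLocal<Boolean> IS_SAFE_THREAD = ThreadLocal.withInitial(() -> false);

    private final ExecutorService pool;

    RegionTicker(int threads) {
        pool = Executors.newFixedThreadPool(threads, worker -> {
            Thread t = new Thread(() -> {
                IS_SAFE_THREAD.set(true); // set once, before the pool's worker loop starts
                worker.run();
            }, "Region Thread");
            t.setDaemon(true);
            return t;
        });
    }

    /** Ticks each region's list on the pool; blocks until every region is done. */
    List<Tickable> tickRegions(Collection<List<Tickable>> regions) throws InterruptedException {
        List<Tickable> failed = Collections.synchronizedList(new ArrayList<>());
        List<Callable<Void>> tasks = new ArrayList<>();
        for (List<Tickable> region : regions) {
            tasks.add(() -> {
                for (Tickable t : region) {
                    try {
                        t.tick();
                    } catch (RuntimeException e) {
                        failed.add(t); // an error marks the entity for removal, not a crash
                    }
                }
                return null;
            });
        }
        pool.invokeAll(tasks); // the calling (main) thread waits here
        return failed;
    }

    void tickWorld(Runnable stage1, Collection<List<Tickable>> entityRegions,
                   Collection<List<Tickable>> tileEntityRegions,
                   Consumer<List<Tickable>> removeFromWorld, Runnable stage3)
            throws InterruptedException {
        stage1.run();                                                        // Stage 1: sequential
        List<Tickable> failed = new ArrayList<>(tickRegions(entityRegions)); // entities first
        failed.addAll(tickRegions(tileEntityRegions));                       // then tile entities
        removeFromWorld.accept(failed);                                      // flush removals
        stage3.run();                                                        // Stage 3: sequential
    }

    void shutdown() {
        pool.shutdown();
    }
}
```

The "safer" option would roughly correspond to a one-thread `RegionTicker` per world with every chunk in a single region, with the worlds' `tickWorld` calls run in parallel.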

Concurrency Concerns

During Parallel Region Execution, the main thread will be suspended, blocked waiting for the thread pools to completely empty.

This reduces the risk of concurrency issues to one case: a plugin acts on an event for an Entity or Tile Entity, then follows a relationship or scans the entire world for another entity, and that entity happens to be in a different region.

This scenario is of course possible, but it is not going to be very common.

If a plugin does end up responding to 1 entity's event by acting on a completely different entity, we then have to consider what the plugin is doing to that other entity.

Even if a plugin does act on another region's entity, I see it as very unlikely that it would do something with a negative outcome.

The closest thing to a risky scenario I can see is:

  • A creeper in Region 1 and a creeper in Region 2 explode on the exact same tick. A plugin listens to the event and teleports creeper 1 to the same location as creeper 2.
  • Both creepers are now going to explode at the same location, modifying the world.

This is the worst-case scenario I can see: 2 threads operate in the same location of the world, both removing blocks, and an entity may receive damage in the middle of its own tick, causing weird results.

However, the likelihood that the server would crash here should be pretty minimal. What would ultimately come out of this bad scenario is behavioral differences.
The teleported creeper may have killed the attacking skeleton, which still fires an arrow, killing a player.
In a sequential scenario, the creeper kills the skeleton, the skeleton never shoots its arrow, and the player doesn't die.
In this risky scenario, both the skeleton and the player die.

Ultimately, you would never have known the outcome would have been different without regionized ticking.
Many server owners would be perfectly OK with this rare risk of behavior change.

If state does end up inconsistent, any error thrown would simply remove the entity; it would not crash the server.
Server owners could then make their own judgement to turn off this enhancement if it's causing them issues.

Concurrency of Plugin Events

For any event fired on these threads, we should synchronize to ensure that no 2 events fire at the exact same time.
This ensures that plugins do not have any local state manipulated concurrently by events they expected to be sync.

For plugins that check Bukkit.isPrimaryThread, we would update isPrimaryThread to also check the isSafeThread ThreadLocal mentioned in Stage 2. If that flag is set to true, we return true.

This would also cover all of the Async Catcher checks.
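
A sketch of the two plugin-facing rules, assuming a single global lock is acceptable for serializing events: events from any region thread run one at a time, and a stand-in for `Bukkit.isPrimaryThread` also returns true on threads flagged by the isSafeThread ThreadLocal. `RegionThreadSafety`, `callEvent`, and `setMainThread` are hypothetical names, not Bukkit or Paper API.

```java
import java.util.concurrent.locks.ReentrantLock;
import java.util.function.Consumer;

/** Hypothetical helpers for the plugin-event rules; not Bukkit or Paper API. */
final class RegionThreadSafety {
    /** Set to true by each region thread (the "isSafeThread" ThreadLocal). */
    static final ThreadLocal<Boolean> IS_SAFE_THREAD = ThreadLocal.withInitial(() -> false);

    /** One lock shared by every thread, so no two events fire at the same time. */
    private static final ReentrantLock EVENT_LOCK = new ReentrantLock();

    private static volatile Thread mainThread;

    static void setMainThread(Thread thread) {
        mainThread = thread;
    }

    /** Stand-in for a patched Bukkit.isPrimaryThread(). */
    static boolean isPrimaryThread() {
        return Thread.currentThread() == mainThread || IS_SAFE_THREAD.get();
    }

    /**
     * Runs the listeners for one event while holding the global event lock.
     * The lock is reentrant, so a listener that fires a nested event on the
     * same thread does not deadlock.
     */
    static <E> void callEvent(E event, Consumer<? super E> listeners) {
        EVENT_LOCK.lock();
        try {
            listeners.accept(event);
        } finally {
            EVENT_LOCK.unlock();
        }
    }
}
```

Using a reentrant lock matters because listeners commonly trigger further events; with a non-reentrant lock, the nested call would block on the lock its own thread already holds.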

Entity Teleportation Mid-Event - A Resolution to Teleportation Concerns (Creeper Scenario)

If an event causes an Entity to teleport mid-event and change regions, as in the Creeper scenario, we can reduce the risk further:

  • In Entity.setLocation, if the region marker is set (i.e., we are mid-tick), perform a post-teleport region check to make sure the entity is still in the same region as before.
  • If the new location is not part of any region, treat it as the same region.
  • If the new location is in a chunk of another region, try to acquire that region's write lock, which blocks this region's execution until the new location's region has fully finished ticking. This isn't ideal, but it should be rare, and the delay should be trivial in the bigger picture.
  • This avoids having 2 entities tick in the same area at the same time, completely resolving the Creeper Scenario concern!
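
A sketch of the post-teleport check. One detail worth noting: a `ReentrantReadWriteLock` write lock can only be released by the thread that holds it, so if each region's lock is acquired during Stage 2 on the main thread but released by a region thread, unlocking would throw `IllegalMonitorStateException`. This sketch therefore substitutes a `CountDownLatch` per region, which any thread may count down. It also uses a timed wait, because two regions that each teleport an entity into the other would otherwise wait on each other forever. `TickRegion`, `TeleportGuard`, and the timeout are hypothetical.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

/** Hypothetical per-region state; the latch stands in for the region's write lock. */
final class TickRegion {
    final int id;
    private final CountDownLatch finished = new CountDownLatch(1);

    TickRegion(int id) {
        this.id = id;
    }

    /** Called by the region thread once its whole tick list has been processed. */
    void finishTick() {
        finished.countDown();
    }

    /** Waits until this region finishes ticking; false if the timeout elapsed first. */
    boolean awaitFinished(long timeoutMillis) throws InterruptedException {
        return finished.await(timeoutMillis, TimeUnit.MILLISECONDS);
    }
}

/** Hypothetical hook for Entity.setLocation while a region marker is set. */
final class TeleportGuard {
    /**
     * Returns true when the teleported entity may keep ticking: the destination is
     * in no region, in the same region, or in a region that has finished ticking.
     * Returns false if the wait timed out; what to do then is left open.
     */
    static boolean afterTeleport(TickRegion current, TickRegion destination, long timeoutMillis)
            throws InterruptedException {
        if (destination == null || destination == current) {
            return true; // no other region involved, nothing to wait for
        }
        // Block this region until the destination region is done ticking.
        // Timed: two regions teleporting into each other would otherwise deadlock.
        return destination.awaitFinished(timeoutMillis);
    }
}
```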

Closing / Feedback

This idea isn't "final". I am opening this now to collect feedback and more risk scenarios from the community so we can design ideas to solve them.

If we can pull this off, this will pretty much be the largest performance improvement we ever make to the server.

There is no ETA on this. This is an idea I had, and I would love to implement myself, but I am also extremely busy with work for the next few months... so I don't know when.

I don't want to hear any "This won't work" or "This isn't safe". I know perfectly well this isn't 100% safe. That's not my goal. My goal is practically safe,
meaning the server runs as expected under vanilla expectations and players would never even notice anything different.

Give me scenarios that could result in noticeable negative behavior that could be considered "deal breaking", forcing people to keep this improvement off.

Metadata

Assignees: No one assigned

Labels:
  • for: future (Issue scheduled for resolution at some point in the future.)
  • status: accepted (Disputed bug is accepted as valid or Feature accepted as desired to be added.)
  • status: input wanted (Looking for community feedback on this issue.)
  • type: feature (Request for a new Feature.)

Projects: No projects
Milestone: No milestone
Relationships: None yet
Development: No branches or pull requests