Concurrently load biome info in AreaLazy #3559

Techcable · 2020-06-14T21:58:15Z

Sometimes the main thread wanted to create treasure maps for villages and treasures chests.
It would call ItemWorldMap.applySepiaFilter, which causes extreme contention with
the async chunk loading threads (which also need biome information from the cache).

Instead of having a single global lock shared between all values, each value has its own independent lock.
The LazyInt utility class is responsible for this locking and ensures we never initialize a value more than once.
Lazy-loading a single chunk's biome information via LazyInt::getOrLoad doesn't affect any of the other entries in the cache.

In theory, this would fix #3536
However the AreaTransformers don't really seem to be pure functions of the area seed - they have some sort of internal state.

For various reasons (outlined in the patch file) ConcurrentHashMap does not work very well. It's basically just best to put a lock around all get/set operations in the map. This overhead is minuscule compared to the work done computing biome information.

After this change, computing treasure maps is still an expensive operation. It takes a second or two before I can spawn one using my plugin. I can easily overload the server by spawning a bunch of treasure maps. Although it's much better than the situation before this patch. Chunk-generation still continues and the server no longer crashes.

Maybe someone can refactor ItemWorldMap.applySepiaFilter to compute asynchronously? It could just wait until all the chunks are loaded instead of trying to do all the biome lookups on the main-thread.

EDIT: Clarify the current state of the PR

aikar · 2020-06-15T14:01:13Z

yeah compute is what I was expecting.

Techcable · 2020-06-16T20:05:37Z

So using get and then put generates IllegalStateException because the AreaTransformer generates two different results for the same input coordinates. I was assuming that all the area transformers were pure functions of the input coordinates, but that doesn't seem to be the case.

Using ConcurrentHashMap.computeIfAbsent inside AreaLazy doesn't seem to work because computing biomes recursively calls itself. I think it's because computeIfAbsent internally acquires a lock on a portion of the ConcurrentHashMap, eventually causing deadlocks. For some reason generating chunks works fine but villager trades immediately stall the server.
Here's the Relativent Watchdog Dump.

I'm thinking I should just use the old synchronized hashmap and compute the info we need for ItemWorldMap.applySepiaFilter asynchronously. We basically just need to compute a byte[] of colors. We could even leave the color computation and just load the BiomeBase[] async. This wouldn't exactly "fix" the contention issue, but it'd definitely prevent it from stalling the main thread.

This should be more reliable anyway. It makes fewer assumptions about the thread-safety of Mojang code. We no longer have to worry about the ordering of concurrent computations since the vanilla implementation of the cache maintains a lock on the internal hashmap. We're only assuming that WorldServer.getBiomeBySeed can be called by another thread.

aikar · 2020-06-16T20:48:42Z

This function scopes beyond the sepia filter. I see it pop up in profiling as blocking threads often.

even if the function isn't pure, it seems the usage of it doesn't care. the first caller is what is cached and further callers use it.

Parallelizing this won't help because each parallel operation is still going to slam each other in contention and im not sure this function is slow enough to justify jumping to parallelization either.

here's what im thinking on impl if you can run with it - we still need to keep synchronize sadly to protect from multiple positions modifying map at same time, but now we at least only lock the map for access and modification.

goal here being first obtainer of a lock is the one who has responsibility to compute it and remove the lock from the map

int l = b.get(k);
if (l != Integer.MAX_VALUE) return l;
AtomicBoolean createdLock = new AtomicBoolean();
WriteLock lock = locks.computeIfAbsent(k, (x) -> {
    createdLock.set(true);
    WriteLock lock = new ReentrantReadWriteLock().writeLock();
    lock.lock();
    return lock;
});
if (!createdLock.get()) {
  lock.lock(); // going to read cached value
}
try {
    int l = b.get(k);
    if (l != Integer.MAX_VALUE) return l;

    int i1 = this.a.apply(i, j);

    this.b.put(k, i1);

    return i1;
} finally {
    lock.unlock();
    if (createdLock.get(k) && lock.getHoldCount() == 0) {
        locks.remove(position);
    }
    if (this.b.size() > this.c) {
        synchronized (this.b) {
            for (int j1 = 0; j1 < this.c / 16; ++j1) {
                this.b.removeFirstInt();
            }
        }
    }
}

krolik-exe · 2020-08-06T14:19:53Z

@Techcable Why is it marked as broken , because i tested it and works fine on latest 1.16.1

aikar · 2020-08-06T14:42:11Z

Do you commonly pull PR's that say they are broken and put them in production?
As stated there's risk for errors with this implementation.

Just because you haven't seen it yet doesn't mean it's not going to happen.

stale · 2020-12-02T20:41:15Z

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

stale · 2020-12-09T20:46:48Z

This issue has been automatically closed because it has not had activity in a long time. If the issue still applies to the most recent supported version, please open a new issue referencing this original issue.

Techcable · 2021-01-28T02:37:15Z

I just rebased this to master (untested).

The new patch number is 666 which makes me very happy :devil:

MiniDigger · 2021-01-28T10:44:41Z

just to confirm, this isn't broken anymore, so we can remove that from the title?

Techcable · 2021-01-29T07:22:30Z

@MiniDigger I'm not quite sure 😳 Did you test it?

I'm pretty sure it's fixed, but I need to verify to be sure.

I really struggled to reproduce the issue on an "unmodified", but then I realized I might have confused the jar with the fixed one.

I guess I had a little version mixup and I'm not sure if I was running the jar from upstream or my fork. (I feel kind of stupid 🙈)

It's probably fixed it and I was running the modified jar, but it's possible I was just having extreme trouble reproducing the bug on regular Paper.

I think this fix worked before the rebase, I want to be thorough and verify everything.

If I create a small plugin which uses Server.createExplorerMap to directly create treasure maps, bypassing any nonsense with the villagers.

Than I can try and reproduce the bug on upstream and verify this fix works.

I will strip the "BROKEN" tag from the header, since it's technically just WIP - not broken.

aikar · 2021-01-29T15:17:50Z

with YourKit, monitor for lock contention on threads and set a high no tick view distance and go generate chunks, how much this lock blocks should be very visible in that monitor.

Also supports downloading official builds. Should avoid that attrocious version mixup I made in PaperMC/Paper#3559 This doesn't work because click is broken with it -_- I always wanted to try a different argument parsing library

This avoids contention between main-thread treasure map handling and async chunk loading/generation. Because we use boxed primitives w/ ConcurrentHashMap instead of unboxed fastutil collections, there is a risk of performance degregation in the uncontended case. Fixes PaperMC#3536

This doesn't seem to work because computing biomes recursively calls itself. Since computeIfAbsent internally acquires a lock on a portion of the ConcurrentHashMap, eventually it deadlocks.

Using ConcurrentHashMap.computeIfAbsent locks the map coarsely, causing deadlocks if the computation of one value recursively computes a different value that wants the same lock. We use a FIFO queue for removal. I'm having some problem with lock contention on removal/cleanup so I need to look into that.

Techcable · 2021-02-24T22:11:34Z

Thanks to my spawn explorer map plugin, I can now reproduce this very reliably (without involving villagers).
On the latest paper (build 496), this contention in AreaLazy is still a significant problem.

The current changes in the PR fix the original problem with lock contention.
However, due to the specific implementation, there is now a ton of convention actually evicting old entries.
Because of this, spawning an explorer map still triggers watchdog for me. However, the overall lag seems to be much more reasonable (not a complete stall).

I believe this has something to do with the fact I replaced the LRU cache eviction strategy. I tried using a FIFO array queue but still no luck on the lock contention.

Aikar's approach is semantically similar but actually uses twice as many hashmaps. The secondary concurrent map is needed to store lazy-loading locks separately from the main (synchronized) data store.
This change could potentially resolve the issue (or maybe it would just have the same problem).

I'll look into that later, and I have some other ideas for resolving the contention in cleaning the cache.
I'm thinking about maybe a sort of reference count on who is currently using the lock, so we will defer cleanup until after there is no one else trying to access the AreaLazy concurrently........

This shouldn't be too hard to fix this final set of bugs. I'm just busy with other stuff 😄

ConcurrentHashMap doesn't support LRU cache eviction, so the cache invalidation had horrible performance. Furthermore, clearing the map effectively locked the entire map and brought performance to a standstill...... Having the actually hashmap load/store protected by a lock is fine since it's so minor compared to actually performing the biome computation itself. I can confirm this elimnates the lock contention, although supporting treasure maps is still an expensive operation. Spawning a treasure map no longer actually stalls the server, but it does eat up plenty of TPS and take a couple seconds for me to get my treasure map. However, chunk loading seems to go on normally during this time and YourKit's monitor profiling reports (what seems like) relatively low contention.

Techcable · 2021-03-04T00:18:00Z

Okay, this PR is now ready for review 🎉

I cleaned up the main PR message (although I now realize that loses some context 😢 ).

For various reasons (outlined in the patch file) ConcurrentHashMap does not work very well. It's basically just best to put a lock around all get/set operations in the map. This overhead is minuscule compared to the work done computing biome information.

After this change, computing treasure maps is still an expensive operation. It takes a second or two before I can spawn one using my plugin. I can easily overload the server by spawning a bunch of treasure maps. Although it's much better than the situation before this patch. Chunk-generation still continues and the server no longer crashes.

I did some profiling by flying around my test-server into unloaded chunks and spawning treasure maps. I can still pretty easily ruin the server's TPS by spamming treasure maps, but this change is a big improvement. Before I would completely crash the server with a single treasure-map near unloaded chunks. Now, I can actually spawn several maps without crashing. I can visibly see chunks continuing to generate/load while I fly. The server recovers from the (large) drop in TPS fairly gracefully (although watchdog still prints a stack-trace if I spawn the maps fast enough).

Treasure maps are still somewhat of a performance concern after this patch, but they've gotten a lot better.

Based on my Yourkit profiling, very little time is now spent waiting on the locks themselves. Most of the time is actually spent while generating the biome i. The big difference is that internal locking in AreaLazy can no longer completely stall the server, although the underlying biome computations are still able to have a significant impact on performance.

I would think that with the relative rarity of treasure maps in vanilla minecraft, using them in survival mode will no longer be a serious performance problem. You may notice some moderate TPS drop, but I doubt there would be much beyond that.

Once you review this, I'd be happy to rebase/squash this personally 😄

Owen1212055 · 2022-10-22T16:23:45Z

As of 1.19, LazyArea no longer exists.
And in general, biome data has gotten much more complex.

I'm guessing that this is an optimization that can no longer be applied?

Techcable · 2022-10-22T17:42:23Z

As of 1.19, LazyArea no longer exists. And in general, biome data has gotten much more complex.

I'm guessing that this is an optimization that can no longer be applied?

I haven't looked at the Minecraft source code in a while. However, if ArenaLazy no longer exists, I doubt this would be relevant. It's quite likely to require a rewrite or significant changes. For now, I'll close the issue.

Proximyst linked an issue Aug 31, 2020 that may be closed by this pull request

When you open a Treasure chest the server crashes (Genericially: Server searching for generation features causes watchdog to time out) #2312

Closed

electronicboy removed a link to an issue Aug 31, 2020

When you open a Treasure chest the server crashes (Genericially: Server searching for generation features causes watchdog to time out) #2312

Closed

stale bot added the resolution: stale label Dec 2, 2020

stale bot closed this Dec 9, 2020

Proximyst reopened this Jan 26, 2021

Proximyst added type: performance and removed resolution: stale labels Jan 26, 2021

Techcable force-pushed the concurrent-arealazy branch from 9d19a33 to 4d27de9 Compare January 28, 2021 02:34

Techcable changed the title ~~[BROKEN] Use ConcurrentHashMap for AreaLazy~~ Use ConcurrentHashMap for AreaLazy Jan 29, 2021

Techcable force-pushed the concurrent-arealazy branch from 4d27de9 to c102064 Compare February 15, 2021 01:56

Techcable added 2 commits February 24, 2021 11:00

Use ConcurrentHashMap.computeIfAbsent to load cached biomes in AreaLazy

50ff649

This doesn't seem to work because computing biomes recursively calls itself. Since computeIfAbsent internally acquires a lock on a portion of the ConcurrentHashMap, eventually it deadlocks.

Techcable force-pushed the concurrent-arealazy branch from c102064 to e812d65 Compare February 24, 2021 18:11

Techcable force-pushed the concurrent-arealazy branch from e812d65 to cc9dc37 Compare February 24, 2021 20:38

Techcable changed the title ~~Use ConcurrentHashMap for AreaLazy~~ Concurrently load biome info in AreaLazy Mar 3, 2021

Techcable marked this pull request as ready for review March 4, 2021 00:18

Techcable requested a review from a team as a code owner March 4, 2021 00:18

Proximyst added the status: rebase required label Mar 18, 2021

Techcable requested a review from a team as a code owner June 20, 2021 12:00

Techcable closed this Oct 22, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Concurrently load biome info in AreaLazy #3559

Concurrently load biome info in AreaLazy #3559

Techcable commented Jun 14, 2020 •

edited

Loading

aikar commented Jun 15, 2020

Techcable commented Jun 16, 2020

aikar commented Jun 16, 2020

krolik-exe commented Aug 6, 2020

aikar commented Aug 6, 2020

stale bot commented Dec 2, 2020

stale bot commented Dec 9, 2020

Techcable commented Jan 28, 2021

MiniDigger commented Jan 28, 2021

Techcable commented Jan 29, 2021

aikar commented Jan 29, 2021

Techcable commented Feb 24, 2021

Techcable commented Mar 4, 2021

Owen1212055 commented Oct 22, 2022

Techcable commented Oct 22, 2022

Concurrently load biome info in AreaLazy #3559

Concurrently load biome info in AreaLazy #3559

Conversation

Techcable commented Jun 14, 2020 • edited Loading

aikar commented Jun 15, 2020

Techcable commented Jun 16, 2020

aikar commented Jun 16, 2020

krolik-exe commented Aug 6, 2020

aikar commented Aug 6, 2020

stale bot commented Dec 2, 2020

stale bot commented Dec 9, 2020

Techcable commented Jan 28, 2021

MiniDigger commented Jan 28, 2021

Techcable commented Jan 29, 2021

aikar commented Jan 29, 2021

Techcable commented Feb 24, 2021

Techcable commented Mar 4, 2021

Owen1212055 commented Oct 22, 2022

Techcable commented Oct 22, 2022

Techcable commented Jun 14, 2020 •

edited

Loading