Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Concurrently load biome info in AreaLazy #3559

Closed
wants to merge 4 commits into from

Conversation

Techcable
Copy link
Contributor

@Techcable Techcable commented Jun 14, 2020

Sometimes the main thread wanted to create treasure maps for villages and treasures chests.
It would call ItemWorldMap.applySepiaFilter, which causes extreme contention with
the async chunk loading threads (which also need biome information from the cache).

Instead of having a single global lock shared between all values, each value has its own independent lock.
The LazyInt utility class is responsible for this locking and ensures we never initialize a value more than once.
Lazy-loading a single chunk's biome information via LazyInt::getOrLoad doesn't affect any of the other entries in the cache.

In theory, this would fix #3536
However the AreaTransformers don't really seem to be pure functions of the area seed - they have some sort of internal state.

For various reasons (outlined in the patch file) ConcurrentHashMap does not work very well. It's basically just best to put a lock around all get/set operations in the map. This overhead is minuscule compared to the work done computing biome information.

After this change, computing treasure maps is still an expensive operation. It takes a second or two before I can spawn one using my plugin. I can easily overload the server by spawning a bunch of treasure maps. Although it's much better than the situation before this patch. Chunk-generation still continues and the server no longer crashes.

Maybe someone can refactor ItemWorldMap.applySepiaFilter to compute asynchronously? It could just wait until all the chunks are loaded instead of trying to do all the biome lookups on the main-thread.

EDIT: Clarify the current state of the PR

@aikar
Copy link
Member

aikar commented Jun 15, 2020

yeah compute is what I was expecting.

@Techcable
Copy link
Contributor Author

So using get and then put generates IllegalStateException because the AreaTransformer generates two different results for the same input coordinates. I was assuming that all the area transformers were pure functions of the input coordinates, but that doesn't seem to be the case.

Using ConcurrentHashMap.computeIfAbsent inside AreaLazy doesn't seem to work because computing biomes recursively calls itself. I think it's because computeIfAbsent internally acquires a lock on a portion of the ConcurrentHashMap, eventually causing deadlocks. For some reason generating chunks works fine but villager trades immediately stall the server.
Here's the Relativent Watchdog Dump.

I'm thinking I should just use the old synchronized hashmap and compute the info we need for ItemWorldMap.applySepiaFilter asynchronously. We basically just need to compute a byte[] of colors. We could even leave the color computation and just load the BiomeBase[] async. This wouldn't exactly "fix" the contention issue, but it'd definitely prevent it from stalling the main thread.

This should be more reliable anyway. It makes fewer assumptions about the thread-safety of Mojang code. We no longer have to worry about the ordering of concurrent computations since the vanilla implementation of the cache maintains a lock on the internal hashmap. We're only assuming that WorldServer.getBiomeBySeed can be called by another thread.

@aikar
Copy link
Member

aikar commented Jun 16, 2020

This function scopes beyond the sepia filter. I see it pop up in profiling as blocking threads often.

even if the function isn't pure, it seems the usage of it doesn't care. the first caller is what is cached and further callers use it.

Parallelizing this won't help because each parallel operation is still going to slam each other in contention and im not sure this function is slow enough to justify jumping to parallelization either.

here's what im thinking on impl if you can run with it - we still need to keep synchronize sadly to protect from multiple positions modifying map at same time, but now we at least only lock the map for access and modification.

goal here being first obtainer of a lock is the one who has responsibility to compute it and remove the lock from the map

int l = b.get(k);
if (l != Integer.MAX_VALUE) return l;
AtomicBoolean createdLock = new AtomicBoolean();
WriteLock lock = locks.computeIfAbsent(k, (x) -> {
    createdLock.set(true);
    WriteLock lock = new ReentrantReadWriteLock().writeLock();
    lock.lock();
    return lock;
});
if (!createdLock.get()) {
  lock.lock(); // going to read cached value
}
try {
    int l = b.get(k);
    if (l != Integer.MAX_VALUE) return l;

    int i1 = this.a.apply(i, j);

    this.b.put(k, i1);

    return i1;
} finally {
    lock.unlock();
    if (createdLock.get(k) && lock.getHoldCount() == 0) {
        locks.remove(position);
    }
    if (this.b.size() > this.c) {
        synchronized (this.b) {
            for (int j1 = 0; j1 < this.c / 16; ++j1) {
                this.b.removeFirstInt();
            }
        }
    }
}

@krolik-exe
Copy link
Contributor

@Techcable Why is it marked as broken , because i tested it and works fine on latest 1.16.1

@aikar
Copy link
Member

aikar commented Aug 6, 2020

Do you commonly pull PR's that say they are broken and put them in production?
As stated there's risk for errors with this implementation.

Just because you haven't seen it yet doesn't mean it's not going to happen.

@stale
Copy link

stale bot commented Dec 2, 2020

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale
Copy link

stale bot commented Dec 9, 2020

This issue has been automatically closed because it has not had activity in a long time. If the issue still applies to the most recent supported version, please open a new issue referencing this original issue.

@Techcable
Copy link
Contributor Author

I just rebased this to master (untested).

The new patch number is 666 which makes me very happy :devil:

@MiniDigger
Copy link
Member

just to confirm, this isn't broken anymore, so we can remove that from the title?

@Techcable
Copy link
Contributor Author

@MiniDigger I'm not quite sure 😳 Did you test it?

I'm pretty sure it's fixed, but I need to verify to be sure.

I really struggled to reproduce the issue on an "unmodified", but then I realized I might have confused the jar with the fixed one.

I guess I had a little version mixup and I'm not sure if I was running the jar from upstream or my fork. (I feel kind of stupid 🙈)

It's probably fixed it and I was running the modified jar, but it's possible I was just having extreme trouble reproducing the bug on regular Paper.

I think this fix worked before the rebase, I want to be thorough and verify everything.

If I create a small plugin which uses Server.createExplorerMap to directly create treasure maps, bypassing any nonsense with the villagers.

Than I can try and reproduce the bug on upstream and verify this fix works.

I will strip the "BROKEN" tag from the header, since it's technically just WIP - not broken.

@Techcable Techcable changed the title [BROKEN] Use ConcurrentHashMap for AreaLazy Use ConcurrentHashMap for AreaLazy Jan 29, 2021
@aikar
Copy link
Member

aikar commented Jan 29, 2021

with YourKit, monitor for lock contention on threads and set a high no tick view distance and go generate chunks, how much this lock blocks should be very visible in that monitor.

Techcable added a commit to Techcable/minecraft-testserver that referenced this pull request Feb 15, 2021
Also supports downloading official builds.
Should avoid that attrocious version mixup I made in
PaperMC/Paper#3559

This doesn't work because click is broken with it -_-
I always wanted to try a different argument parsing library
This avoids contention between main-thread treasure map
handling and async chunk loading/generation.

Because we use boxed primitives w/ ConcurrentHashMap
instead of unboxed fastutil collections, there is a risk
of performance degregation in the uncontended case.

Fixes PaperMC#3536
This doesn't seem to work because computing biomes recursively calls itself.
Since computeIfAbsent internally acquires a lock on a portion of the ConcurrentHashMap,
eventually it deadlocks.
Using ConcurrentHashMap.computeIfAbsent locks the map
coarsely, causing deadlocks if the computation of one value
recursively computes a different value that wants the same lock.

We use a FIFO queue for removal. I'm having some problem with lock
contention on removal/cleanup so I need to look into that.
@Techcable
Copy link
Contributor Author

Thanks to my spawn explorer map plugin, I can now reproduce this very reliably (without involving villagers).
On the latest paper (build 496), this contention in AreaLazy is still a significant problem.

The current changes in the PR fix the original problem with lock contention.
However, due to the specific implementation, there is now a ton of convention actually evicting old entries.
Because of this, spawning an explorer map still triggers watchdog for me. However, the overall lag seems to be much more reasonable (not a complete stall).

I believe this has something to do with the fact I replaced the LRU cache eviction strategy. I tried using a FIFO array queue but still no luck on the lock contention.

Aikar's approach is semantically similar but actually uses twice as many hashmaps. The secondary concurrent map is needed to store lazy-loading locks separately from the main (synchronized) data store.
This change could potentially resolve the issue (or maybe it would just have the same problem).

I'll look into that later, and I have some other ideas for resolving the contention in cleaning the cache.
I'm thinking about maybe a sort of reference count on who is currently using the lock, so we will defer cleanup until after there is no one else trying to access the AreaLazy concurrently........

This shouldn't be too hard to fix this final set of bugs. I'm just busy with other stuff 😄

ConcurrentHashMap doesn't support LRU cache eviction,
so the cache invalidation had horrible performance.
Furthermore, clearing the map effectively locked the entire
map and brought performance to a standstill......

Having the actually hashmap load/store protected by a lock
is fine since it's so minor compared to actually performing
the biome computation itself.

I can confirm this elimnates the lock contention, although
supporting treasure maps is still an expensive operation.
Spawning a treasure map no longer actually stalls the server,
but it does eat up plenty of TPS and take a couple seconds for me
to get my treasure map. However, chunk loading seems to go on normally
during this time and YourKit's monitor profiling reports (what seems
like) relatively low contention.
@Techcable Techcable changed the title Use ConcurrentHashMap for AreaLazy Concurrently load biome info in AreaLazy Mar 3, 2021
@Techcable
Copy link
Contributor Author

Okay, this PR is now ready for review 🎉

I cleaned up the main PR message (although I now realize that loses some context 😢 ).

For various reasons (outlined in the patch file) ConcurrentHashMap does not work very well. It's basically just best to put a lock around all get/set operations in the map. This overhead is minuscule compared to the work done computing biome information.

After this change, computing treasure maps is still an expensive operation. It takes a second or two before I can spawn one using my plugin. I can easily overload the server by spawning a bunch of treasure maps. Although it's much better than the situation before this patch. Chunk-generation still continues and the server no longer crashes.

I did some profiling by flying around my test-server into unloaded chunks and spawning treasure maps. I can still pretty easily ruin the server's TPS by spamming treasure maps, but this change is a big improvement. Before I would completely crash the server with a single treasure-map near unloaded chunks. Now, I can actually spawn several maps without crashing. I can visibly see chunks continuing to generate/load while I fly. The server recovers from the (large) drop in TPS fairly gracefully (although watchdog still prints a stack-trace if I spawn the maps fast enough).

Treasure maps are still somewhat of a performance concern after this patch, but they've gotten a lot better.

Based on my Yourkit profiling, very little time is now spent waiting on the locks themselves. Most of the time is actually spent while generating the biome i. The big difference is that internal locking in AreaLazy can no longer completely stall the server, although the underlying biome computations are still able to have a significant impact on performance.

I would think that with the relative rarity of treasure maps in vanilla minecraft, using them in survival mode will no longer be a serious performance problem. You may notice some moderate TPS drop, but I doubt there would be much beyond that.

Once you review this, I'd be happy to rebase/squash this personally 😄

@Techcable Techcable marked this pull request as ready for review March 4, 2021 00:18
@Techcable Techcable requested a review from a team as a code owner March 4, 2021 00:18
@Techcable Techcable requested a review from a team as a code owner June 20, 2021 12:00
@Owen1212055
Copy link
Member

As of 1.19, LazyArea no longer exists.
And in general, biome data has gotten much more complex.

I'm guessing that this is an optimization that can no longer be applied?

@Techcable
Copy link
Contributor Author

As of 1.19, LazyArea no longer exists. And in general, biome data has gotten much more complex.

I'm guessing that this is an optimization that can no longer be applied?

I haven't looked at the Minecraft source code in a while. However, if ArenaLazy no longer exists, I doubt this would be relevant. It's quite likely to require a rewrite or significant changes. For now, I'll close the issue.

@Techcable Techcable closed this Oct 22, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Lag when trading with villager
6 participants