Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Async chunk api/io/loading #2308

Merged
merged 57 commits into from Aug 18, 2019

Conversation

@Spottedleaf
Copy link
Contributor

commented Jul 11, 2019

This patch re-adds async chunk API and IO/Loading

(For people watching this is not the async chunk patch you've been waiting for, that will be later)
The primary issue will be chunk generation - I have not touched this system and as such it is not prioritized. This will come in a later patch when I've taken the proper time to fully understand how Mojang's chunk generation system works.

I have performed internal testing to show that these changes work really well if you pre-generate your world (match or exceed 1.13).

Before merge:

  • Delete the async api placeholder patch

TODO list:

  • Ready for review
  • Review & Test
    Wait on #2376 to move compression/decompression to the loader threads?
  • Wait on #2332
  • Add detailed notes to patch header
  • Wait on #2297
  • Wait on Billy's changes
  • Config
  • Anti-Xray async load?
  • Get #2296 merged and resolve the chunk status cache thread-safety issues

So after a bunch of internal testing I think it's time to start officially testing this out.

Your configuration settings for async chunks will carry over except generation stuff - we don't have control of that currently. However mojang's generation system is async currently, however it lacks prioritization. This means this will work best with a pre-generated world.

Please report any issues with async chunks on this PR, the jar is below.
Please remember to backup regularly, especially before using this.

Currently this jar is based off paper build 172

https://cdn.discordapp.com/attachments/602031644919988236/612350235594981411/paperclip.jar
SHA1: b022e316542efad4771aef5d2b71e0bb579a51d1

@Spottedleaf Spottedleaf changed the title [W.I.P] Async chunk api/io [W.I.P] Async chunk api/io/loading Jul 13, 2019

Spottedleaf added 7 commits Jul 13, 2019
Leave config until later
Still need to decide about per world load threads vs per server
Actually use POI data after loading it
Improve exception handling too
@LockedThread

This comment has been minimized.

Copy link

commented Jul 16, 2019

I'm looking forward to this

Spottedleaf added 8 commits Jul 16, 2019
Document code and fix varies bugs
- Apply datafixers before trying to deserialize data
- Remove async catchers for scheduling async chunk loads
 (we could schedule one from another worker thread)
- Fix QueueExecutorThread#flush and QueueExecutorThread#close hopefully
Fix deadlock concern with QueueExecutorThread#flush()
The issue was where we were incrementing the flush counter, which
was before draining the flush queue. Now, the flush counter is
incremented before each call to unpark().

A deadlock could occur if the following chain of events occured:

Queue executor thread -> Q
Wait thread -> W

W -> invokes flush, gets to unparking the queue thread
Q -> wakes up, polls tasks, starts emptying the flush queue, increment flush counter
W -> reads new flush counter, rets immediately
W -> re-enters flush, gets old flush counter, adds to flush queue
W -> enters park loop
Q -> unparks W from its first queue, unparks again (however it fails since the flush counter is not incremented)
Q -> goes back to sleep

The fix is to increment the flush counter before unparking a thread, for each thread
in the flush queue.

There's also an issue if W schedules a task before re-entering flush
(in the ordering listed above), the queue executor would not recognize
that a task had been scheduled under its nose  and would unpark the
waiting thread. Now before unparking we poll tasks to prevent
that from occurring.
It does NOT prevent another thread from scheduling
and then the queue executor unparking, however that's a race condition
that the wait thread is responsible for - if it correctly synchronizes
with the other scheduling thread the poll tasks would catch it.
@KennyTV

This comment has been minimized.

Copy link
Contributor

commented Aug 14, 2019

Just a sexy note that none of these reviews actually mean anything (unless it's by an actual paper member, or feedback of how tests went) :x

@clrxbl
clrxbl approved these changes Aug 14, 2019
@Spottedleaf

This comment has been minimized.

Copy link
Contributor Author

commented Aug 14, 2019

I've updated the jar to include a fix for servers deadlocking while loading chunks that need to be converted

@iiLearner

This comment has been minimized.

Copy link

commented Aug 16, 2019

After using the pr my server apparenrly crashes, here's the crash log if it helps: https://pastebin.com/QMKeK57e

(deadlock)

@Spottedleaf

This comment has been minimized.

Copy link
Contributor Author

commented Aug 16, 2019

I don't see anything indicating this PR given the the log you showed shows the server is not actually waiting on chunk loads

@iiLearner

This comment has been minimized.

Copy link

commented Aug 16, 2019

I've been using the pr since last night and it has crashed 3 times by then, here's some timings as well if it helps:

[Fri, 16. Aug 2019 18:04:46 CEST INFO] View Timings Report: https://timings.aikar.co/?id=708582987acd492ca652ec4b91880dc8

Reply above: do you mean the pr file is not even being used?

@Spottedleaf

This comment has been minimized.

Copy link
Contributor Author

commented Aug 16, 2019

I mean there is nothing in that crashlog indicating my PR, given the server is not waiting on chunk loads

@iiLearner

This comment has been minimized.

Copy link

commented Aug 16, 2019

I get your point, but doesn't the crash log and timings tell the version? isn't that the pr version? sorry im kinda new to it

@Spottedleaf

This comment has been minimized.

Copy link
Contributor Author

commented Aug 16, 2019

I'm talking about there's nothing in that crashlog indicating my PR caused it

@DefinePvP

This comment has been minimized.

Copy link

commented Aug 17, 2019

My server has also frozen for the first time in weeks after I put this build on it. https://pastebin.com/E1v3NHLF

@electronicboy

This comment has been minimized.

Copy link
Member

commented Aug 17, 2019

That's #2312

@Spottedleaf

This comment has been minimized.

Copy link
Contributor Author

commented Aug 17, 2019

I've updated the jar to be based off build 172, as well as include some debug info in watchdog reports

@zachbr zachbr merged commit 294e304 into PaperMC:ver/1.14 Aug 18, 2019

1 check passed

continuous-integration/travis-ci/pr The Travis CI build passed
Details
@Spottedleaf

This comment has been minimized.

Copy link
Contributor Author

commented Aug 19, 2019

As you all may know now this PR has been merged to ver/1.14.

Thanks for all the testing, it really made the merge time for this go down significantly.
Test jars are no longer supported and it is recommended to use builds 173+ for async loading.

@HexedHero

This comment has been minimized.

Copy link

commented Aug 19, 2019

Thank you for everything leaf <3

@TheCompieter

This comment has been minimized.

Copy link

commented Aug 19, 2019

damm this made an big improvent guys no more tps dropping when loading a bunch of chunks
great work.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.