Faster World Gen on WSL?? #4784

Open
callowaysutton opened this issue Jul 27, 2020 · 8 comments

Comments

@callowaysutton

Client version: 1.12.2
Server OS: Windows/Linux (WSL2 Ubuntu 20.04)
Cuberite Commit id: 99f8c44 (for both)
Processor: i7-4770

Expected behavior

The same, or worse, performance on WSL compared to running natively on Windows

Actual behavior

WSL gets around 1.5-2x the speed of Windows somehow? I'm really just curious as to how/why

Steps to reproduce the behavior

Compile on Windows then compare to WSL

Server log

WSL (Pregenerated):

[12:58:09] Preparing spawn (world): 56.82% (9309/16384; 1865.75 chunks / sec)
[12:58:10] Preparing spawn (world): 67.74% (11099/16384; 1774.03 chunks / sec)
[12:58:11] Preparing spawn (world): 79.95% (13099/16384; 1976.28 chunks / sec)
[12:58:12] Preparing spawn (world): 92.16% (15099/16384; 1982.16 chunks / sec)

WSL (Generating Chunks):

[13:06:14] Preparing spawn (world): 97.59% (15989/16384; 183.71 chunks / sec)
[13:06:15] Chunk generator performance: 126.18 ch / sec (16616 ch total)
[13:06:15] Preparing spawn (world): 98.75% (16179/16384; 187.38 chunks / sec)
[13:06:16] Preparing spawn (world): 99.90% (16368/16384; 186.94 chunks / sec)

Windows (Pregenerated):

[12:59:52] Preparing spawn (world): 80.02% (13110/16384; 1527.89 chunks / sec)
[12:59:53] Preparing spawn (world): 88.47% (14495/16384; 1191.91 chunks / sec)
[12:59:54] Preparing spawn (world): 97.00% (15892/16384; 1397.00 chunks / sec)

Windows (Generating Chunks):

[13:10:07] Preparing spawn (world): 96.94% (15882/16384; 83.42 chunks / sec)
[13:10:08] Chunk generator performance: 101.69 ch / sec (16466 ch total)
[13:10:08] Preparing spawn (world): 97.54% (15981/16384; 96.96 chunks / sec)
[13:10:09] Preparing spawn (world): 98.14% (16080/16384; 97.83 chunks / sec)
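To put a number on the "1.5-2x" impression, the progress lines above can be parsed and averaged. This is only a rough sketch: the regular expression and the `spawn_rates` helper are my own assumptions based on the log format quoted in this issue, and three samples from the tail of each run are far too few for a real benchmark.

```python
import re
from statistics import mean

# Matches the "Preparing spawn" progress lines as quoted in this issue.
# The pattern is an assumption derived from those lines, not from Cuberite's source.
RATE_RE = re.compile(r"Preparing spawn \(\w+\): .*?; ([\d.]+) chunks / sec\)")


def spawn_rates(log_text):
    """Return the chunks/sec figures from 'Preparing spawn' lines."""
    return [float(m.group(1)) for m in RATE_RE.finditer(log_text)]


WSL_GENERATING = """\
[13:06:14] Preparing spawn (world): 97.59% (15989/16384; 183.71 chunks / sec)
[13:06:15] Chunk generator performance: 126.18 ch / sec (16616 ch total)
[13:06:15] Preparing spawn (world): 98.75% (16179/16384; 187.38 chunks / sec)
[13:06:16] Preparing spawn (world): 99.90% (16368/16384; 186.94 chunks / sec)
"""

WINDOWS_GENERATING = """\
[13:10:07] Preparing spawn (world): 96.94% (15882/16384; 83.42 chunks / sec)
[13:10:08] Chunk generator performance: 101.69 ch / sec (16466 ch total)
[13:10:08] Preparing spawn (world): 97.54% (15981/16384; 96.96 chunks / sec)
[13:10:09] Preparing spawn (world): 98.14% (16080/16384; 97.83 chunks / sec)
"""

wsl = mean(spawn_rates(WSL_GENERATING))
win = mean(spawn_rates(WINDOWS_GENERATING))
print(f"WSL: {wsl:.1f} ch/s, Windows: {win:.1f} ch/s, ratio: {wsl / win:.2f}x")
```

Running the same helper over a complete log from each platform, rather than these excerpts, would give a fairer comparison.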
@Titaniumtown

Titaniumtown commented Jul 27, 2020

Yeah, WSL2 is Linux, and Linux is faster than Windows. This shouldn't really be an issue on GitHub, though.

@callowaysutton
Author

> Yea, WSL2 is linux, and Linux is faster than windows. This shouldn't really be an issue on github though.

Linux itself isn't inherently faster than Windows, and WSL2 is just a Hyper-V VM, so there's going to be a little overhead regardless. What I'm mainly asking is why it is faster on WSL2. That could arguably be an issue for anybody who wants to host on a Windows machine, since it would be objectively better to run Cuberite on WSL2.

@peterbell10
Member

I suspect the difference is down to the compiler. MSVC has at least historically been far behind gcc and clang in terms of optimising code.

It's possible we could improve things a bit with some compiler flags. Did you build the executables yourself or download from cuberite.org? If you built it yourself then -march=native can be a big factor. Not sure if MSVC has an equivalent flag.

@callowaysutton
Author

> I suspect the difference is down to the compiler. MSVC has at least historically been far behind gcc and clang in terms of optimising code.
>
> It's possible we could improve things a bit with some compiler flags. Did you build the executables yourself or download from cuberite.org? If you built it yourself then -march=native can be a big factor. Not sure if MSVC has an equivalent flag.

Hmm, I know MSVC has the /favor flag, but I'll look more into optimizing on the Windows side. I'll try again tomorrow with 'optimized' flags to see if anything can be improved.

@madmaxoft
Member

I don't think measuring Cuberite directly is a good idea; there's a lot of other stuff going on under the hood. If you could somehow extract the world gen itself (or at least a part of it) and measure its performance, that would be much more valuable information.
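As a rough illustration of what an isolated measurement could look like, here is a minimal timing harness. Everything in it is hypothetical: `measure_throughput` and `fake_generate_chunk` are made-up names, and the workload is a stand-in, not Cuberite's generator. A real test would be written in C++ against the actual generator classes; this only shows the shape of timing one component on its own.

```python
import time


def measure_throughput(generate_chunk, chunk_coords):
    """Time generate_chunk over chunk_coords and return chunks per second."""
    start = time.perf_counter()
    count = 0
    for cx, cz in chunk_coords:
        generate_chunk(cx, cz)
        count += 1
    elapsed = time.perf_counter() - start
    return count / elapsed if elapsed > 0 else float("inf")


def fake_generate_chunk(cx, cz):
    # Hypothetical stand-in workload, NOT Cuberite's generator:
    # fills a 16x16 grid with arbitrary integer arithmetic.
    return [
        [((cx * 16 + x) * 31) ^ ((cz * 16 + z) * 17) for z in range(16)]
        for x in range(16)
    ]


# A 16x16 square of chunk coordinates around the origin (256 chunks).
coords = [(x, z) for x in range(-8, 8) for z in range(-8, 8)]
rate = measure_throughput(fake_generate_chunk, coords)
print(f"{len(coords)} chunks at {rate:.0f} chunks / sec")
```

Because only the generator call sits inside the timed loop, networking, storage, and lighting would not affect the result the way they do in a full server start-up.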

@madmaxoft
Member

Might also be worth experimenting with mingw, to see whether it is closer to the WSL or the native Windows perf.

@callowaysutton
Author

callowaysutton commented Jul 27, 2020

What do you think would be a good way of testing? Right now I'm just setting PregenerateDistance to 128 in world.ini, deleting the world data, and then starting up the server. In my opinion that gives a general view of how well Cuberite is generating chunks, since it seems like no other process can run until the initial world is generated.

Also I was playing around with MSVC some more and got it to do 300-500ch/sec with bursts to 900+ch/sec on my i5 7300HQ laptop. A very big improvement so far (I think a lot of it was just enabling intrinsics).

This is generating new chunks:

[18:20:32] Preparing spawn (world): 68.30% (11191/16384; 546.17 chunks / sec)
[18:20:32] Chunk generator performance: 105.30 ch / sec (1574 ch total)
[18:20:33] Preparing spawn (world): 72.07% (11808/16384; 599.61 chunks / sec)
[18:20:34] Preparing spawn (world): 78.86% (12921/16384; 1092.25 chunks / sec)
[18:20:35] Preparing spawn (world): 84.49% (13843/16384; 907.48 chunks / sec)

Flags Used/Changed from Default:

/Ob2 /Oi /Ot /GF /Gm- /fp:fast /arch:AVX /Qpar

For Comparison, Running WSL2:

[18:36:46] Preparing spawn (world): 97.80% (16024/16384; 174.90 chunks / sec)
[18:36:46] Chunk generator performance: 116.36 ch / sec (13456 ch total)
[18:36:47] Preparing spawn (world): 98.94% (16210/16384; 184.34 chunks / sec)
[18:36:48] Chunk generator performance: 116.61 ch / sec (13718 ch total)

Also, @peterbell10 the -march=native flag is already set in the CMake configuration

@tigerw
Member

tigerw commented Aug 20, 2020

I can't find a Quick March flag for Visual C++. I guess we could try throwing the kitchen sink at the CMake files and enable every compiler flag to do with performance.
