Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Login timeout due to garbage collection #2923

Closed
keybounce opened this issue Jun 18, 2018 · 10 comments
Closed

Login timeout due to garbage collection #2923

keybounce opened this issue Jun 18, 2018 · 10 comments

Comments

@keybounce
Copy link

Bug Report

I am having an issue with my client being unable to connect to my server. The issue has been traced to the time required on my system for 2 full garbage collects done by the Minecraft client during login.

To clarify: this server is running on my Macintosh. My client, on the same machine as the server, cannot connect because of timeouts, while my 3 friends are able to connect to my machine over the Internet.

I am on a Mac (4+4 i7, 2.5 GHz). My friends are on various Windows systems.

I am seeing garbage collection times of 12 seconds typical, 8 seconds best case. These are abnormally high lengths of time for a full garbage collection.

To clarify again: the default (throughput collector) has historically given me 1.5 to 4.5 second (long) pauses, with peak pauses of over 30 seconds. But it never gives me two long pauses in a row. In this case, I had the client running under jvisualvm, with G1GC; manually invoking garbage collection from jvisualvm resulted in the same ten to twelve second typical, with multiple collections taking the same length of time.

The server is timing out after 15 seconds of no response during login. Two long garbage collects is exceeding this length of time.

I have used G1GC with other modpacks (* Much smaller *) without issue.

The throughput collector, because of the lag concerns, is unacceptable, and is probably responsible for the normally accepted view that Java is no good for games.

The G1GC collector normally performs much better than the CMS collector (repeated testing on much smaller modpacks shows that CMS requires a minimum of 800 MB additional space for_eden_ to give good memory performance -- the primary issue is that it takes 30 seconds to make sure that junk is seen as junk and not promoted, and premature promotion under CMS is HORRIBLE.) (Comparisons based on Java 7, and 8 version 61 and higher; G1GC on Java 6 was bad on the Macintosh, and earlier versions of Java 8 had bugs in G1GC that could crash programs that loaded classes after starting up, including at least one Minecraft mod with 100% repeatability), so being able to use the G1GC collector is a huge gain.

The issue is not inherent to this modpack, but it is triggered by this modpack.

As an attempted workaround, I added 2 GB of memory to the client (going from 4 GB to 6 GB). This reduced the time needed for the garbage collections, but did not prevent the timeouts. (logs are from this run)

Possible resolutions:

  1. A mod to the client to eliminate the system garbage collection calls that occur at login, dimension change, logout, potentially others.
  2. A mod to the server to increase the timeout length at login (possibly having a config value in the server.properties file)
  3. An official policy that G1GC is not supported or recommended, and that CMS along with : -XX:+ExplicitGCInvokesConcurrentAndUnloadsClasses is recommended for play (n.b.: this will raise the memory requirement for the client to have good performance. Since I am going to need to be doing this until there is some other workaround, I may be able to give you some numbers :-)

Log files:

server side at the point of connection:

https://gist.githubusercontent.com/keybounce/c4db0b57d15701055413046a7bbec420/raw/c0c73d012e4be469efe55742339b5a1c712e9460/server%2520connection%2520log.txt

Client memory log:

https://gist.github.com/keybounce/c4db0b57d15701055413046a7bbec420/raw/c0c73d012e4be469efe55742339b5a1c712e9460/client%2520GC.log

Client Forge log:

https://gist.github.com/keybounce/c4db0b57d15701055413046a7bbec420/raw/c0c73d012e4be469efe55742339b5a1c712e9460/client%2520latest.log

Client Information

  • Modpack Version: 3.0.8

  • Java Version:
    java version "1.8.0_66"
    Java(TM) SE Runtime Environment (build 1.8.0_66-b17)
    Java HotSpot(TM) 64-Bit Server VM (build 25.66-b17, mixed mode)

  • Launcher Used: Vanilla

  • Memory Allocated: 6 gb

  • Server/LAN/Single Player: client

  • Optifine Installed: no

  • Shaders Enabled: pack default

World Information

  • Modpack Version world created in: 3.0.8

Server Information

  • Java Version: same
  • Operating System: Mac 10.9.5
  • Hoster/Hosting Solution: Self
@sharktemplar
Copy link

This seems vaguely like it may be experience something similar to what I am currently still having huge problems with from the new sevtech update:
#2889
Except it sounds like you know way more about 12+second freeze-ups in Minecraft that ultra-boost the CPU usage during said freeze.

Really REALLY sucks that sevtech ages now HUGELY discourages large builds from this new update. Whatever was added to obliterate performance when the server is under certain conditions in-game, I really hope they revert it or fix it or patch it or whatever the solution is. Honestly would gladly accept a downgrade revert to the previous version so I could at least continue construction on my own world, but am unsure how to go about bumping back down to 3.0.7.

@keybounce
Copy link
Author

The problem is even worse than I thought. Using the FML timeout launch argument, I gave both the client and server a 240 second timeout. I played with the garbage collection arguments and memory size, and was able to get that full collection under 6 seconds. I made sure the client was entirely memory resident, not being paged in from the swap file.

I have managed to get disconnected from the server in less than 30 seconds. That's right, less than the normal connection timeout.

The most annoying, was my "almost" connection this afternoon. My client actually started drawing the world, and I could see about 6 chunks ahead of me, when I got a timeout.

At 10 minutes per launch, trying to tune CMS is a nightmare.

@keybounce
Copy link
Author

https://puu.sh/AIUAk/a10a8e7048.png

This shows 2 attempts to connect. The 2nd was the one that actually started drawing.

@keybounce
Copy link
Author

here is a working set of CMS arguments:

-Dfml.readTimeout=240 -XX:NewSize=930m -XX:MaxTenuringThreshold=5 -XX:SurvivorRatio=13 -Xms1500m -Xmx4500m -XX:CompileThreshold=300 -XX:MaxPermSize=150m -XX:TargetSurvivorRatio=80 -XX:SoftRefLRUPolicyMSPerMB=1 -XX:+UseConcMarkSweepGC -XX:+UseCompressedOops -XX:+UseParNewGC -XX:-CMSPrecleaningEnabled -XX:CMSInitiatingOccupancyFraction=75 -XX:+UseAdaptiveGCBoundary -XX:+UseCMSCompactAtFullCollection -XX:+ExplicitGCInvokesConcurrentAndUnloadsClasses -XX:+PrintCommandLineFlags -XX:+PrintGCDateStamps -XX:+PrintGCDetails -XX:+PrintHeapAtGC -XX:+PrintTenuringDistribution -Xloggc:GC.log

@sharktemplar
Copy link

@keybounce What exactly would using these arguments resolve? Does it fix or at least significantly minimize the memory leak you suspected with some of us getting constant lock-ups from overclocked memory dumping?

@Espergium
Copy link

Espergium commented Jun 21, 2018

Before you get too deep into the issue you should seriously troubleshoot this on jre8 v171. You are using version 66 which is completely unsupported for all platforms.

Make sure you can reproduce this on the latest version before you try to debug all this.

@Mareckoo01
Copy link

Mareckoo01 commented Jun 22, 2018

I had similar issues but i "fixed" 2 of them..

Full gc cycle takes ~9s on average for me and it still runs 2 at login but i added fml timeout flag (set to 300) for client and server and that fixed it..

Idk what is causing this but the problem with ram filling up and then running frequent full gc cycles (time between them gets to <2s) is caused by particles, you can fix this by removing fancy block particles, betterfps and optifine if you added it but if you need optifine then you will need a mod that clears them..
I made a mod that makes a jframe with button to clear them so i do that manually..
My game starts getting laggy when it holds 2-4m instances of particles so i clear them and run full gc cycle.

Memory given to:
Client-4096M
Server-2100M
Server is ran on separate local pc.

I use this to clear particles:

private void clearP() {
	Set<Field> qF = new HashSet<>();
	for(Field f : ParticleManager.class.getDeclaredFields()){
		if(Queue.class.isAssignableFrom(f.getType())) qF.add(f);
	}
	Queue bQ = null;
	for(Field f : qF){
		try {
			f.setAccessible(true);
			Queue q = (Queue) f.get(Minecraft.getMinecraft().effectRenderer);
			if(bQ==null) bQ=q;
			else if(bQ.size()<q.size()) bQ=q;
			Minecraft.getMinecraft().player.sendMessage(new TextComponentString("Name: "+f.getName()+", size: "+q.size()));
		} catch (IllegalArgumentException | IllegalAccessException e) {
			e.printStackTrace();
		}
	}
	bQ.clear();
}

@keybounce
Copy link
Author

@sharktemplar
The only purpose of those arguments, primarily, is to have a decent set of options for CMS, because CMS permits you to say "do not lock up and do a full garbage collection".

Upgraded to Java 171, client and server both have a timeout parameter of 240, and yet I am disconnected physically less than 30 seconds after seeing the client login message in the server log. Total time from connection initial establishment to disconnect is typically less than 60 seconds.

Seriously, a mod to remove those lines from the client would be so bleeping welcome. CMS has a higher memory overhead, does not do nearly as well when you are running low on memory, and those figures I gave before are simply not sufficient -- I'm going to have to do some more tuning. "Works" does not mean "works well".

@stale
Copy link

stale bot commented Jul 25, 2018

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for reporting to the tracker.

@stale
Copy link

stale bot commented Aug 1, 2018

This issue has been marked as stale for a week now with no new responses. As such, it has been automatically closed.

@stale stale bot closed this as completed Aug 1, 2018
@lock lock bot locked as resolved and limited conversation to collaborators Sep 30, 2018
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests

4 participants