This repository has been archived by the owner on Feb 9, 2021. It is now read-only.

Server stuck : Kill -9 is the only way to stop it. #2532

Closed
3do2 opened this issue Jan 12, 2015 · 48 comments

Comments

@3do2

3do2 commented Jan 12, 2015

Twice a day the server gets stuck, with 40 players (full) almost all the time.
A second server with 40 players has the same issue.
OS : Debian

No errors in the log.
No messages.
If I try to run a command in the console, I get no response.
CTRL-C does not stop the server.
The only way to restart the server is to run kill -9 on the process.
Because I have no crash dump or message, it's hard to give more information :/

I'm running build 936, but I've had this issue for a long time.

Can I do something to get more information about the origin of the bug?

@shoghicp
Member

It seems like this might be a plugin issue, could you try removing all of them?

@3do2
Author

3do2 commented Jan 12, 2015

If there are no players on the server, it doesn't get stuck.
If I remove all plugins, I can't keep 40 players playing for hours :/

The server does not lag. It freezes suddenly, without any warning signs.

I've tried turning timings on, but I can't run the timings merge to get the result and find a possible infinite loop somewhere.

It happened again this morning on one server. Output of top:

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
2788 minecraf 20 0 740m 438m 16m R 100.2 4.4 173:55.07 php

Is there a way to stop the server and find out which part of the code is running, like a debug mode?
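On Linux, one way to narrow this down without stopping the server is to find which thread of the stuck `php` process is burning CPU. The following is a minimal sketch, assuming Linux's /proc layout (field positions as documented in proc(5)) and Python 3; `parse_stat` and `busiest_threads` are hypothetical helper names, not part of PocketMine. It only identifies the OS thread, not the PHP function that thread is executing.

```python
import os
from pathlib import Path
from typing import List, Tuple


def parse_stat(line: str) -> Tuple[int, str, str, int]:
    """Parse one /proc/<pid>/task/<tid>/stat line.

    Returns (tid, comm, state, cpu_ticks), where cpu_ticks = utime + stime.
    """
    # comm is wrapped in parentheses and may contain spaces or ')',
    # so split on the LAST ')' instead of on whitespace.
    left, right = line.rsplit(")", 1)
    tid_str, comm = left.split(" (", 1)
    fields = right.split()  # fields[0] is field 3 (state) in proc(5)
    state = fields[0]
    utime, stime = int(fields[11]), int(fields[12])  # fields 14 and 15
    return int(tid_str), comm, state, utime + stime


def busiest_threads(pid: int, top: int = 5) -> List[Tuple[int, str, str, float]]:
    """Return the `top` threads of `pid` with the most CPU time, in seconds."""
    ticks_per_second = os.sysconf("SC_CLK_TCK")
    rows = []
    for stat_file in Path(f"/proc/{pid}/task").glob("*/stat"):
        try:
            rows.append(parse_stat(stat_file.read_text()))
        except (FileNotFoundError, ProcessLookupError):
            continue  # the thread exited while we were scanning
    rows.sort(key=lambda row: row[3], reverse=True)
    return [(tid, comm, state, cpu / ticks_per_second)
            for tid, comm, state, cpu in rows[:top]]
```

For example, `busiest_threads(2788)` for the PID in the top output above lists its hungriest threads; `top -H -p 2788` gives the same per-thread view interactively. A thread that stays in state `R` with steadily growing CPU time is the one to attach `strace -p <tid>` to.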

@wrewolf
Contributor

wrewolf commented Jan 12, 2015

I ran into the same problem

@shoghicp
Member

This seems like a loop that never ends. We have had reports of this before; the cause was a plugin that created Async Tasks improperly.
You can fork PM and add a bunch of debug outputs. Enabling debug level 2 in pocketmine.yml will also give extra information.
I don't know which part could be causing this, so I cannot give you a custom build.

@3do2
Author

3do2 commented Jan 12, 2015

I'll set debug level 2 and wait for the next freeze.
I'll also add a trace to each async task my plugins launch...

@3do2
Author

3do2 commented Jan 12, 2015

I have 2 types of tasks called by the main thread:

  • scheduleAsyncTask uses a class that extends PluginTask
  • scheduleRepeatingTask uses a class that extends AsyncTask

Is that correct?

@shoghicp
Member

Async Task != Scheduled task (Plugin Task)

@3do2
Author

3do2 commented Jan 13, 2015

Oh yes, what I wrote is wrong:
I've used the PluginTask class for sync tasks (scheduleRepeatingTask and scheduleDelayedTask),
and the AsyncTask class for scheduleAsyncTask.

I've removed all uses of the CallbackTask class from my code.

I'll see today whether the server gets stuck again.

@shoghicp
Member

Async tasks might cause issues too, due to the serialization used.

@shoghicp
Member

@3do2 If your server hangs again, try running this script:
https://gist.github.com/shoghicp/2b93ac93664561c0e9e4#file-analysishangup-sh
(analysisHangUp.sh)

Make sure only one PocketMine server is running at that moment (the blocked one); the script gathers all kinds of useful info from the existing process without stopping it. Be aware that it needs strace installed and will ask you for root.

@shoghicp
Member

This is due to PHP waiting on a non-existent resource/memory.
Using some runtime analysis tools, this was part of the output:

root     19495     1 19495  0   18 Jan13 ?        00:00:15 PocketMine-MP 1.4-916
root     19495     1 19499  0   18 Jan13 ?        00:00:12 PocketMine-MP 1.4-916
root     19495     1 19500  0   18 Jan13 ?        00:02:46 PocketMine-MP 1.4-916
root     19495     1 19503  0   18 Jan13 ?        00:06:46 PocketMine-MP 1.4-916
root     19495     1 19505  0   18 Jan13 ?        00:02:55 PocketMine-MP 1.4-916
root     19495     1 19506  0   18 Jan13 ?        00:00:00 PocketMine-MP 1.4-916
root     19495     1 19521  0   18 Jan13 ?        00:00:00 PocketMine-MP 1.4-916
root     19495     1 19641  0   18 Jan13 ?        00:00:00 PocketMine-MP 1.4-916
root     19495     1 19654  0   18 Jan13 ?        00:00:00 PocketMine-MP 1.4-916
root     19495     1 19699  0   18 Jan13 ?        00:00:00 PocketMine-MP 1.4-916
root     19495     1 19720  0   18 Jan13 ?        00:00:00 PocketMine-MP 1.4-916
root     19495     1 19758  0   18 Jan13 ?        00:00:00 PocketMine-MP 1.4-916
root     19495     1 19800  0   18 Jan13 ?        00:00:00 PocketMine-MP 1.4-916
root     19495     1 19977  0   18 Jan13 ?        00:00:00 PocketMine-MP 1.4-916
root     19495     1 20015  0   18 Jan13 ?        00:00:00 PocketMine-MP 1.4-916
root     19495     1 20084  0   18 Jan13 ?        00:00:00 PocketMine-MP 1.4-916
root     19495     1 20099  0   18 Jan13 ?        00:00:00 PocketMine-MP 1.4-916
root     19495     1 20110  0   18 Jan13 ?        00:00:00 PocketMine-MP 1.4-916
...
7f279b175000-7f279b1c4000 rw-p 00000000 00:00 0
7f279b1c4000-7f279b1c5000 ---p 00000000 00:00 0
7f279b1c5000-7f279bbc5000 rw-p 00000000 00:00 0
7f279bbc5000-7f27a3bc5000 rw-s 00000000 00:2c 199123271    (deleted)/dev/zero
...
Process 19495 attached - interrupt to quit
futex(0x7f279bbc49d0, FUTEX_WAIT, 19499, NULL <unfinished ...>
Process 19495 detached

As strace shows (output collected over 15 seconds), PHP (process 19495) is waiting on thread 19499 for the memory address 0x7f279bbc49d0 to change, which is inside the range 7f279bbc5000-7f27a3bc5000 that is now (deleted)/dev/zero. This means that, somehow, the memory being watched for a change no longer exists, so it will never change.

If anyone with this issue could try disabling Query, please tell me whether the server stops hanging.
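To repeat this kind of check on another analysis.log, a small sketch can pull FUTEX_WAIT calls out of strace output and look up the region that contains the address in /proc/<pid>/maps. It assumes strace's usual `futex(addr, FUTEX_WAIT..., value, ...)` formatting and the maps column order from proc(5); `futex_waits` and `find_mapping` are hypothetical helper names. Two things are worth keeping in mind when reading the excerpt above with it: by the printed bounds, 0x7f279bbc49d0 lies in 7f279b1c5000-7f279bbc5000 (rw-p), just below the start of the deleted range at 7f279bbc5000; and shared anonymous mappings (MAP_SHARED | MAP_ANONYMOUS) are normally listed with a deleted /dev/zero path, so that suffix on its own does not show the memory was freed.

```python
import re
from typing import Iterator, Optional, Tuple

# strace prints e.g. "futex(0x7f279bbc49d0, FUTEX_WAIT, 19499, NULL <unfinished ...>"
FUTEX_RE = re.compile(r"futex\((0x[0-9a-f]+), (FUTEX_WAIT\w*), (\d+)")
# /proc/<pid>/maps columns: start-end perms offset dev inode [path]
MAPS_RE = re.compile(r"^([0-9a-f]+)-([0-9a-f]+)\s+(\S{4})\s+\S+\s+\S+\s+\S+\s*(.*)$")


def futex_waits(strace_text: str) -> Iterator[Tuple[int, str, int]]:
    """Yield (address, operation, expected_value) for each FUTEX_WAIT* call."""
    for m in FUTEX_RE.finditer(strace_text):
        yield int(m.group(1), 16), m.group(2), int(m.group(3))


def find_mapping(maps_text: str, addr: int) -> Optional[Tuple[int, int, str, str]]:
    """Return (start, end, perms, path) of the mapping containing addr, if any."""
    for line in maps_text.splitlines():
        m = MAPS_RE.match(line.strip())
        if m is None:
            continue  # skips elided lines such as "..."
        start, end = int(m.group(1), 16), int(m.group(2), 16)
        if start <= addr < end:  # the end address is exclusive
            return start, end, m.group(3), m.group(4)
    return None
```

Feeding it the strace line and the maps lines quoted above returns the rw-p mapping ending at 7f279bbc5000 for the futex address.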

@3do2
Author

3do2 commented Jan 15, 2015

OK, thanks. The server hung again and I ran your script. I've looked at the output, but it's hard to interpret.
Where can I send you the analysis.log?

@3do2
Author

3do2 commented Jan 15, 2015

The server hung again.
Here are the last 2 analysis.log files:
http://pastebin.com/irsunNbM
http://pastebin.com/83EeL2US

I don't see any FUTEX_WAIT in my analysis.log files?

@ghost

ghost commented Jan 15, 2015

So what would the solution be, @shoghicp?

@shoghicp
Member

@3do2 Yours is dead!
I'll update the script with the ability to trace each thread.

@KiddRock

Yes, I have this problem too. I kept waiting while nothing was responding,
and I found this info:

Fatal error: Out of memory (allocated 1268514816) (tried to allocate 33554432 bytes) in phar:///root/Downloads/pm/PocketMine-Soft.phar/src/raklib/server/Session.php on line 176
00:19:21 [EMERGENCY] [RakLib Thread #-1270875280] RakLib crashed!
@shoghicp @3do2

@3do2
Author

3do2 commented Jan 16, 2015

If I wait, nothing happens (I've tried waiting up to 12 hours).
I also suspect a memory issue, because I have 2 servers, and the one with a 2GB memory-limit (server.properties) hangs more frequently than the one with 3GB.
But I have no way to confirm that :/
PS: I restart my servers every 3 hours to clear the memory...

@ghost

ghost commented Jan 18, 2015

@3do2 I have made a temporary workaround for this issue: https://github.com/iJoshuaHD/ASR-Screen-Listener. It's not that accurate, but it helps, and it's useful until a real fix for this is found.

@alcmoe

alcmoe commented Jan 18, 2015

I have the same issue: when the server gets 20+ players, it gets stuck, and CPU usage goes to 100%.

@alcmoe

alcmoe commented Jan 25, 2015

WorldEditArt. This damn plugin caused this issue.

@3do2
Author

3do2 commented Jan 25, 2015

@Alcatraz-Du I have this plugin too, but I got this error before I added it.

@3do2
Author

3do2 commented Jan 25, 2015

@shoghicp will you update your script soon? My servers are hanging so frequently :(

@alcmoe

alcmoe commented Jan 26, 2015

I think we should work out together which plugin causes this issue.
Here is my plugin list.
You should list yours.

@ghost

ghost commented Jan 26, 2015

@Alcatraz-Du blame the plugins you have installed that aren't in the PocketMine plugin repo.

@3do2
Author

3do2 commented Feb 7, 2015

@shoghicp I've removed all async tasks from all plugins... but my servers still hang :(

@3do2
Author

3do2 commented Feb 19, 2015

@shoghicp told me today it could be a PHP binary issue.
I've recompiled PHP 5.6.5 (the latest PHP release) on 64-bit Debian.
My server is running; I'm waiting...
shoghicp told me to test 5.6.1 too, because this issue could come from a bug in the latest PHP build... if someone can test...

@shoghicp
Member

@3do2 or even PHP 5.6.0

@alcmoe

alcmoe commented Feb 20, 2015

I ran a test: with 5.6.4 the load becomes very high, but with 5.6.2 the load is normal. 5.6.4 gets stuck more easily than 5.6.2.

@3do2
Author

3do2 commented Feb 20, 2015

My servers got stuck this morning with 5.6.5; I restarted them with 5.6.1... let's wait...

@3do2
Author

3do2 commented Feb 21, 2015

I've tried every PHP version from 5.6.0 to 5.6.5, and the issue still occurs :(

@3do2
Author

3do2 commented Feb 23, 2015

@wrewolf is it OK with 5.6.6?

@alcmoe

alcmoe commented Feb 25, 2015

One of my friends told me he got the same issue, and he thinks it's caused by config.yml: when the server processes config.yml, his server gets stuck.

@3do2
Author

3do2 commented Feb 25, 2015

My servers get stuck after hours. Can you explain in more detail why config.yml could explain this bug?

I noticed that each time the console freezes, there is some non-ANSI text in the chat a few seconds before:

02:36:32 [INFO] <fidel[0]> у тебя нехватает на ищо компамэс

The server is in an infinite loop (the process is not dead), because it consumes nearly 100% of the CPU while the console is frozen.

I think there is an infinite loop somewhere in the PocketMine code, triggered by a player event that can occur at any time (after a few minutes or a few hours). Perhaps a chat length overflow with UTF-XX, or something else?
@shoghicp is it possible to add tracing on each thread of the PocketMine server, and in the main thread, to find where the server's main loop is stuck?
I can do it, but I need documentation explaining how to start PocketMine from the sources.

@alcmoe

alcmoe commented Feb 26, 2015

Sometimes I use config.yml to save a player's inventory, then read the config back to give the player their items, and then it gets stuck.

@wrewolf
Contributor

wrewolf commented Feb 27, 2015

5.6.6 freezes too
@3do2

@Q-bick

Q-bick commented Mar 1, 2015

My server has been running for more than 3 days without a console freeze, with ~13 players online.

As a test, I changed the server source files:

  1. deleted all $this->server->saveOfflinePlayerData calls in Player.php
  2. deleted the dropItem function in Level.php

Maybe this will help find the freeze bug.

@HmHmmHm

HmHmmHm commented Mar 1, 2015

@Q-bick '1' is dangerous, because offline player data isn't saved; I think it may result in user data loss.
'2' is the same: if dropItem doesn't exist, the server crashes.

@ghost

ghost commented Mar 1, 2015

13 players without freezing is reasonable -,- try hosting 30 or more with more plugins and let's see what happens.

@3do2
Author

3do2 commented Mar 1, 2015

@Q-bick why did you remove these? Did you notice that the server gets stuck when those functions are called?

@wrewolf
Contributor

wrewolf commented Mar 11, 2015

Before the freeze:

Fatal error: Allowed memory size of 2147483648 bytes exhausted (tried to allocate 72 bytes) in /root/build/10/src/pocketmine/level/Level.php on line 985
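The two fatal memory errors quoted in this thread describe different conditions in PHP: "Allowed memory size of N bytes exhausted" means the script reached memory_limit, while "Out of memory (allocated N)" is raised when the allocator fails to obtain more memory from the operating system before that limit is reached. A small sketch that classifies such lines and converts the sizes to MiB; the regexes assume exactly the message wording shown in this thread, and `classify_php_oom` is a hypothetical helper name.

```python
import re
from typing import Optional, Tuple

MIB = 1024 * 1024

# "Allowed memory size of N bytes exhausted": PHP's memory_limit was reached.
LIMIT_RE = re.compile(
    r"Allowed memory size of (\d+) bytes exhausted \(tried to allocate (\d+) bytes\)")
# "Out of memory (allocated N)": the allocator could not get more memory from the OS.
OOM_RE = re.compile(
    r"Out of memory \(allocated (\d+)\) \(tried to allocate (\d+) bytes\)")


def classify_php_oom(line: str) -> Optional[Tuple[str, float, float]]:
    """Return (kind, size_mib, request_mib) for a PHP fatal memory error line."""
    m = LIMIT_RE.search(line)
    if m:
        return "memory_limit", int(m.group(1)) / MIB, int(m.group(2)) / MIB
    m = OOM_RE.search(line)
    if m:
        return "system", int(m.group(1)) / MIB, int(m.group(2)) / MIB
    return None
```

Applied to the two errors in this thread, the Level.php one is a memory_limit of exactly 2048 MiB (2 GiB), and the RakLib one failed a 32 MiB request with about 1209.75 MiB already allocated.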

@shoghicp
Member

@3do2 one way the server can freeze is by being stuck in a kernel call, specifically one on an old file that no longer exists.

@3do2
Author

3do2 commented Mar 12, 2015

@shoghicp none of my plugins delete files while the server is running :/

@ghost

ghost commented Mar 17, 2015

It happens from time to time. Well, I don't know why.

@3do2
Author

3do2 commented Mar 17, 2015

@xxFlare same for me.
@iJoshuaHD I've tried all the binaries; the bug is still there.

@vvzar

vvzar commented Apr 11, 2015

02:36:32 [INFO] у тебя нехватает на ищо компамэс

This "non-ANSI" text is in Russian. It means "You do not have enough...", followed by "ищо компамэс", which are not real words, so I can't tell what they mean. Maybe some plugin has a memory leak and crashes with this message when free memory runs out.

@shoghicp
Member

Please see this article

It seems FUTEX_WAIT was broken in the Linux kernel.

The impact of this kernel bug is very simple: user processes can deadlock and hang in seemingly impossible situations. A futex wait call (and anything using a futex wait) can stay blocked forever, even though it had been properly woken up by someone.

If you have this problem, try updating to kernel 3.18 or newer. You can also check whether your distribution maintainer has backported the fix to older versions.
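A quick way to see whether a machine runs a kernel older than the one recommended here is to compare the release string reported by `uname -r`. A minimal sketch, assuming the usual `major.minor...` release format; `kernel_version` and `older_than_fix` are hypothetical helper names. Because the fix may have been backported to older versions, a True result is a reason to check the distribution's changelog, not proof that the machine has the bug.

```python
import platform
import re
from typing import Optional, Tuple

FIXED_IN = (3, 18)  # the version recommended in the comment above


def kernel_version(release: str) -> Tuple[int, int]:
    """Extract (major, minor) from a release string such as '3.16.0-4-amd64'."""
    m = re.match(r"(\d+)\.(\d+)", release)
    if m is None:
        raise ValueError(f"unrecognised kernel release: {release!r}")
    return int(m.group(1)), int(m.group(2))


def older_than_fix(release: Optional[str] = None) -> bool:
    """True if the given (or running) kernel is older than FIXED_IN.

    Only a hint: distributions may have backported the fix to older
    versions, so check the vendor changelog before drawing conclusions.
    """
    return kernel_version(release or platform.release()) < FIXED_IN
```

For example, `older_than_fix("3.16.0-4-amd64")` is True, while any 3.18 or 4.x release string gives False.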

@PEMapModder
Collaborator

@shoghicp 👍

@dancing-ipsum

:O


10 participants