vpnserver stop on CentOS 6.5 hangs sometimes #18

Closed
ghost opened this issue Jan 29, 2014 · 15 comments

@ghost

ghost commented Jan 29, 2014

I've encountered a situation where stopping vpnserver just hangs. The log file shows that it did shut down (the last line is: "The SoftEther VPN Server Engine has been successfully shutdown."). netstat -lntp shows that vpnserver is no longer listening.

But ps aux still shows the vpnserver process running:

/usr/vpnserver/vpnserver execsvc
/usr/vpnserver/vpnserver execsvc

which causes my init script to hang when stopping the service. Is there any way to debug this further?

Starting vpnserver again will add 2 new processes in addition to the old ones remaining. Basically, vpnserver process sometimes doesn't disappear.

@dnobori
Member

dnobori commented Jan 31, 2014

Hi,

You can build from the source code with "make DEBUG=YES" command.
After that, you can debug the running process of vpnserver by using gdb to analyze the cause of the hang-up.

@ghost
Author

ghost commented Mar 3, 2014

Hi,

Still trying to troubleshoot this issue. As a beginner, I couldn't make much sense of gdb, so I used strace on the two processes that vpnserver creates. This is the strace output of the "second" process when I stop the daemon:

Process 3086 attached - interrupt to quit
pause() = ? ERESTARTNOHAND (To be restarted)
--- SIGTERM (Terminated) @ 0 (0) ---
rt_sigreturn(0xf) = -1 EINTR (Interrupted system call)
futex(0x1508e3c, FUTEX_WAKE_OP_PRIVATE, 1, 1, 0x1508e38, {FUTEX_OP_SET, 0, FUTEX_OP_CMP_GT, 1}) = 1
futex(0x1508e10, FUTEX_WAKE_PRIVATE, 1) = 1
futex(0x1508bdc, FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 1, {1393829210, 562023000}, ffffffff) = 0
futex(0x1508bb0, FUTEX_WAIT_PRIVATE, 2, NULL) = -1 EAGAIN (Resource temporarily unavailable)
futex(0x1508bb0, FUTEX_WAKE_PRIVATE, 1) = 0
futex(0x15018ec, FUTEX_WAKE_OP_PRIVATE, 1, 1, 0x15018e8, {FUTEX_OP_SET, 0, FUTEX_OP_CMP_GT, 1}) = 1
futex(0x15018c0, FUTEX_WAKE_PRIVATE, 1) = 1
unlink("/usr/vpnserver/.pid_3E649A678269D4A01B73BF9E3388D075") = 0
unlink("/usr/vpnserver/.ctl_3E649A678269D4A01B73BF9E3388D075") = 0
fcntl(9, F_SETLK, {type=F_UNLCK, whence=SEEK_SET, start=0, len=0}) = 0
close(9) = 0
unlink("/usr/vpnserver/.VPN-EA1D67A3FB") = 0
munmap(0x7fb3b15c7000, 626688) = 0
close(8) = 0
futex(0x15016fc, FUTEX_WAIT_PRIVATE, 1, NULL) = 0
futex(0x15016d0, FUTEX_WAKE_PRIVATE, 1) = 0
futex(0x104b5dc, FUTEX_WAKE_OP_PRIVATE, 1, 1, 0x104b5d8, {FUTEX_OP_SET, 0, FUTEX_OP_CMP_GT, 1}) = 1
futex(0x104b5b0, FUTEX_WAKE_PRIVATE, 1) = 1
nanosleep({0, 25000000}, NULL) = 0
nanosleep({0, 25000000}, NULL) = 0
nanosleep({0, 25000000}, NULL) = 0
nanosleep({0, 25000000}, NULL) = 0
nanosleep({0, 25000000}, NULL) = 0
nanosleep({0, 25000000}, NULL) = 0
nanosleep({0, 25000000}, NULL) = 0
nanosleep({0, 25000000}, NULL) = 0
nanosleep({0, 25000000}, NULL) = 0
nanosleep({0, 25000000}, NULL) = 0
nanosleep({0, 25000000}, NULL) = 0
nanosleep({0, 25000000}, NULL) = 0
nanosleep({0, 25000000}, NULL) = 0
nanosleep({0, 25000000}, NULL) = 0
nanosleep({0, 25000000}, NULL) = 0

... and "nanosleep({0, 25000000}, NULL) = 0" repeats forever while the process hangs. I will still try to give gdb a chance. I hope this is a little helpful; if not, apologies for the added noise.
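The repeating nanosleep({0, 25000000}, NULL) line is a 25 ms sleep (25,000,000 ns), which matches the SleepThread(25) call in the FreeThreading() loop quoted later in this thread. The helper names below (ms_to_timespec, sleep_ms) are hypothetical; this is only a sketch of how a millisecond sleep maps onto that system call on POSIX, not SoftEther's actual SleepThread() implementation.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <errno.h>
#include <time.h>

/* Convert milliseconds into the timespec that nanosleep() takes. */
struct timespec ms_to_timespec(long ms)
{
    struct timespec ts;
    ts.tv_sec = ms / 1000;
    ts.tv_nsec = (ms % 1000) * 1000000L;
    return ts;
}

/* Hypothetical millisecond sleep: sleep_ms(25) requests {0, 25000000}.
   If a signal interrupts the sleep, it resumes with the remaining time. */
void sleep_ms(long ms)
{
    struct timespec req = ms_to_timespec(ms);
    struct timespec rem;

    while (nanosleep(&req, &rem) == -1 && errno == EINTR)
    {
        req = rem;
    }
}
```

A loop that calls such a helper until a counter reaches zero produces the same endless series of 25 ms sleeps whenever the counter never drops.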

@tonychung00
Contributor

I encountered the same problem.
I ran gdb and found that "vpnserver stop" first sends a SIGTERM and then runs "killall -KILL" on timeout. So you need to stop right after the first signal is sent:

(gdb) n
Stopping SoftEther VPN Server Service...
2496            kill(pid, SIGTERM);
(gdb) n
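The stop sequence described here (SIGTERM first, a forced kill after a timeout) can be sketched in C as follows. This is a hypothetical illustration of the pattern, not SoftEther's `vpnserver stop` code: the function names, the 25 ms poll interval and the timeout are assumptions, and the sketch sends SIGKILL to the single PID instead of using killall.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <errno.h>
#include <signal.h>
#include <stdbool.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Delay between existence checks (assumed value). */
static const struct timespec poll_interval = { 0, 25000000L };

/* True once `pid` no longer exists. If it is our own child, this also reaps it. */
static bool process_gone(pid_t pid)
{
    pid_t r = waitpid(pid, NULL, WNOHANG);
    if (r == pid)
        return true;                 /* our child: exited and reaped */
    if (r == 0)
        return false;                /* our child: still running */
    /* Not our child (ECHILD): probe for existence with signal 0. */
    return kill(pid, 0) == -1 && errno == ESRCH;
}

/* Send SIGTERM, wait up to timeout_ms, then fall back to SIGKILL.
   Returns 0 for a graceful exit, 1 if SIGKILL was needed, -1 on error. */
int stop_process(pid_t pid, long timeout_ms)
{
    if (kill(pid, SIGTERM) == -1)
        return -1;

    for (long waited = 0; waited < timeout_ms; waited += 25)
    {
        if (process_gone(pid))
            return 0;
        nanosleep(&poll_interval, NULL);
    }
    if (process_gone(pid))
        return 0;

    /* Forced fallback, aimed at one PID rather than every matching process. */
    if (kill(pid, SIGKILL) == -1 && errno != ESRCH)
        return -1;
    while (!process_gone(pid))
        nanosleep(&poll_interval, NULL);
    return 1;
}
```

A return value of 1 corresponds to the situation in this issue: the process ignored or never finished handling SIGTERM, and only the forced kill removed it.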

You can then see that PID 6208 is hung:

# ps alx | grep vpn
4     0  3583  3003  20   0 138460 20668 poll_s S+   pts/2      0:00 gdb ./vpnserver
4     0  5910  3583  20   0 226032 15900 utrace Tl   pts/2      0:00 /usr/vpnserver/vpnserver stop
0     0  6207     1   0 -20  48336  1328 wait   S<s  ?          0:00 /usr/vpnserver/vpnserver execsvc
5     0  6208  6207   0 -20 1153888 18096 hrtime S<l ?          0:01 /usr/vpnserver/vpnserver execsvc

Now run "gdb vpnserver" again and attach to pid 6208. There are 30 threads:

 (gdb) info threads
      Id   Target Id         Frame
      30   Thread 0x7f1ab5714440 (LWP 6209) "idle" 0x000000382ce0b5bc in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
.
.
.
      2    Thread 0x7f1ab4ce5440 (LWP 6239) "idle" 0x000000382ce0b5bc in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
    * 1    Thread 0x7f1abbfb27c0 (LWP 6208) "vpnserver" 0x000000382caacced in nanosleep () from /lib64/libc.so.6

Thread 1 is just waiting for the other threads to exit, but all the other threads are blocked in pthread_cond_wait():

(gdb) bt
#0  0x000000382ce0b5bc in pthread_cond_wait@@GLIBC_2.3.2 ()
   from /lib64/libpthread.so.0
#1  0x00000000004085c5 in UnixWaitEvent (event=0x7f1a9c000d00,
    timeout=4294967295) at src/Mayaqua/Unix.c:1789
#2  0x00000000004dd756 in OSWaitEvent (event=0x7f1a9c000d00,
    timeout=4294967295) at src/Mayaqua/OS.c:530
#3  0x00000000004dca35 in Wait (e=0x7f1a9c000d00, timeout=4294967295)
    at src/Mayaqua/Object.c:541
#4  0x00000000004ad001 in ThreadPoolProc (t=0x7f1a9c000f60,
    param=0x7f1a9c000cc0) at src/Mayaqua/Kernel.c:857
#5  0x0000000000408204 in UnixDefaultThreadProc (param=0x7f1a9c0011b0)
    at src/Mayaqua/Unix.c:1643
#6  0x000000382ce079d1 in start_thread () from /lib64/libpthread.so.0
#7  0x000000382cae8b5d in clone () from /lib64/libc.so.6

I don't think it is critical, since the processes eventually get killed (a workaround). The problem seems to be some kind of timing/race condition: these threads miss the event that tells them to quit gracefully.
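The backtrace shows the workers parked in UnixWaitEvent() with timeout=4294967295 (0xFFFFFFFF), which is presumably an "infinite" timeout, so a worker that misses its wake-up never returns. A common pthread pattern that cannot lose a wake-up is an event with a sticky flag, checked under the mutex before every wait. The names below (event_t, event_set, event_wait, WAIT_INFINITE) are hypothetical; this is a generic sketch, not SoftEther's Unix.c code, and it does not claim to show the bug that was later fixed.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdint.h>
#include <time.h>

#define WAIT_INFINITE 0xFFFFFFFFu   /* 4294967295, the timeout in the backtrace */

typedef struct
{
    pthread_mutex_t lock;
    pthread_cond_t cond;
    bool signaled;   /* remembers a set that arrives while nobody is waiting */
} event_t;

void event_init(event_t *e)
{
    pthread_mutex_init(&e->lock, NULL);
    pthread_cond_init(&e->cond, NULL);   /* default clock: CLOCK_REALTIME */
    e->signaled = false;
}

void event_set(event_t *e)
{
    pthread_mutex_lock(&e->lock);
    e->signaled = true;
    pthread_cond_broadcast(&e->cond);
    pthread_mutex_unlock(&e->lock);
}

/* Wait until the event is set or timeout_ms passes; WAIT_INFINITE means no
   deadline. Returns true if the event was set. */
bool event_wait(event_t *e, uint32_t timeout_ms)
{
    struct timespec deadline = { 0, 0 };
    int rc = 0;
    bool was_set;

    if (timeout_ms != WAIT_INFINITE)
    {
        clock_gettime(CLOCK_REALTIME, &deadline);
        deadline.tv_sec += timeout_ms / 1000;
        deadline.tv_nsec += (long)(timeout_ms % 1000) * 1000000L;
        if (deadline.tv_nsec >= 1000000000L)
        {
            deadline.tv_sec += 1;
            deadline.tv_nsec -= 1000000000L;
        }
    }

    pthread_mutex_lock(&e->lock);
    /* Loop on the flag: covers spurious wake-ups and a set that happened
       before this thread started waiting. */
    while (!e->signaled && rc != ETIMEDOUT)
    {
        if (timeout_ms == WAIT_INFINITE)
            rc = pthread_cond_wait(&e->cond, &e->lock);
        else
            rc = pthread_cond_timedwait(&e->cond, &e->lock, &deadline);
    }
    was_set = e->signaled;
    e->signaled = false;   /* auto-reset: this waiter consumes the set */
    pthread_mutex_unlock(&e->lock);
    return was_set;
}
```

The key design point is that event_set() records the wake-up in the flag, so a set that lands before the worker reaches pthread_cond_wait() is still seen; a bare pthread_cond_signal() with no flag would be lost in that window.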

@dnobori
Member

dnobori commented Jun 13, 2014

Hi tonychung00,

I am a developer of SoftEther VPN. I really want to solve this reported hang during shutdown; however, I have not succeeded in reproducing the problem in our environment with CentOS 6.5.
So I suppose this problem depends on the environment.

To reproduce the problem in my environment, I need some information from you.

  1. Does the problem always (or frequently) occur during "vpnserver stop"?
  2. I need the results of following commands.
    uname -a
    cat /etc/rc.local
    sysctl -a
    ifconfig -a
    gcc --version
    dmesg
  3. The current vpn_server.config file.
    Please delete your credentials from the vpn_server.config file.

Could you kindly upload the above files as a ZIP file via the following HTTPS upload page in "Private Mode"?
https://www.softether-upload.com/
PIN: 8931

After uploading, post the resulting URL on this forum. The URL can only be accessed by us, so pasting it here is safe.

Your cooperation is much appreciated.

@tonychung00
Contributor

I uploaded a ZIP file containing the sosreport output, plus the usr/vpnserver directory and the gcc version:
http://www.softether-upload.com/files/20140613_33ca231c4af36cbc50a08/

@tonychung00
Contributor

To reproduce the issue, do this:

  • /usr/vpnserver/vpnserver start
  • kill -SIGTERM
  • gdb /usr/vpnserver/vpnserver

Apparently the system is mostly idle.

This loop is not robust, since it relies on thread_count dropping to zero.
Some kind of timeout is needed:

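void FreeThreading()
{
        // Busy-wait until every other thread has exited. There is no timeout,
        // so a thread that never exits keeps this loop spinning forever.
        while (true)
        {
                if (Count(thread_count) == 0)
                {
                        break;
                }

                SleepThread(25);        // 25 ms, i.e. the nanosleep({0, 25000000}) seen in strace
        }

        // ... (remainder of the function omitted)
}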
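A bounded version of this loop, as suggested, could look like the following sketch. It rests on assumptions: a C11 atomic_int stands in for SoftEther's thread_count counter and Count(), and wait_for_threads() and its timeout parameter are hypothetical. A timeout only keeps shutdown from hanging; it does not explain why the worker threads never exit.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <stdatomic.h>
#include <stdbool.h>
#include <time.h>

/* Milliseconds elapsed since *start on the monotonic clock. */
static long elapsed_ms(const struct timespec *start)
{
    struct timespec now;
    long long ns;

    clock_gettime(CLOCK_MONOTONIC, &now);
    ns = (long long)(now.tv_sec - start->tv_sec) * 1000000000LL
       + (now.tv_nsec - start->tv_nsec);
    return (long)(ns / 1000000LL);
}

/* Poll every 25 ms until *thread_count reaches zero or timeout_ms elapses.
   Returns false on timeout so the caller can report the stuck threads
   instead of spinning forever. */
bool wait_for_threads(atomic_int *thread_count, long timeout_ms)
{
    struct timespec start;
    const struct timespec tick = { 0, 25000000L };

    clock_gettime(CLOCK_MONOTONIC, &start);
    while (atomic_load(thread_count) != 0)
    {
        if (elapsed_ms(&start) >= timeout_ms)
        {
            return false;
        }
        nanosleep(&tick, NULL);
    }
    return true;
}
```

Using CLOCK_MONOTONIC for the elapsed time keeps the timeout correct even if the wall clock is adjusted during shutdown.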

There are 29 threads that seem to be idle and blocked in pthread_cond_wait().
Maybe we should somehow wake them up.
During the gdb session, I also observed about 6 or 7 SIGUSR1 signals being received.
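One way to wake every idle worker at shutdown, sketched here with plain pthreads, is to set a halt flag under the pool mutex and broadcast on the condition variable, so each waiting thread wakes, re-checks the flag and exits. The pool_t type, POOL_SIZE and the function names are hypothetical; this is not SoftEther's ThreadPoolProc() code and not the fix that was eventually merged.

```c
#include <pthread.h>
#include <stdbool.h>

#define POOL_SIZE 4   /* hypothetical; the hung process had 29 idle workers */

typedef struct
{
    pthread_mutex_t lock;
    pthread_cond_t wake;
    bool halting;                   /* set once at shutdown, never cleared */
    int started;                    /* threads successfully created */
    pthread_t threads[POOL_SIZE];
} pool_t;

/* An idle worker that sleeps until shutdown is requested (work dispatch is
   omitted). The flag is tested under the lock before every wait, so a
   broadcast sent before this thread reaches pthread_cond_wait() is not lost. */
static void *worker(void *arg)
{
    pool_t *p = arg;

    pthread_mutex_lock(&p->lock);
    while (!p->halting)
    {
        pthread_cond_wait(&p->wake, &p->lock);
    }
    pthread_mutex_unlock(&p->lock);
    return NULL;
}

int pool_start(pool_t *p)
{
    pthread_mutex_init(&p->lock, NULL);
    pthread_cond_init(&p->wake, NULL);
    p->halting = false;
    p->started = 0;
    for (int i = 0; i < POOL_SIZE; i++)
    {
        if (pthread_create(&p->threads[i], NULL, worker, p) != 0)
        {
            break;
        }
        p->started++;
    }
    return p->started;
}

/* Wake every idle worker, then join them all. Returns the number joined. */
int pool_stop(pool_t *p)
{
    pthread_mutex_lock(&p->lock);
    p->halting = true;
    pthread_cond_broadcast(&p->wake);   /* wake all waiters, not just one */
    pthread_mutex_unlock(&p->lock);

    for (int i = 0; i < p->started; i++)
    {
        pthread_join(p->threads[i], NULL);
    }
    pthread_cond_destroy(&p->wake);
    pthread_mutex_destroy(&p->lock);
    return p->started;
}
```

Calling pool_stop() immediately after pool_start() exercises exactly the race discussed here: some workers may not have reached pthread_cond_wait() when the broadcast is sent, and the halt flag is what keeps them from sleeping forever.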

@dnobori
Copy link
Member

dnobori commented Jul 1, 2014

Hi tonychung00,

I set up the same environment and tried to reproduce the hang of the vpnserver process, but I was unable to reproduce the problem.

If you don't mind, could you please let me log in to your actual server via ssh to analyze the problem?
If that is OK, please upload a text file describing how to log in to your server via the following HTTPS uploader.

https://www.softether-upload.com/ PIN: 8931

Make sure to choose "Private Mode" when you send the file.

After uploading, post the resulting URL on this forum. The URL can only be accessed by us, so pasting it here is safe.

Your cooperation is much appreciated.

@dnobori
Copy link
Member

dnobori commented Jul 1, 2014

Hi thepoch,

I set up the same environment and tried to reproduce the hang of the vpnserver process, but I was unable to reproduce the problem.

If you don't mind, could you please let me log in to your actual server via ssh to analyze the problem?
If that is OK, please upload a text file describing how to log in to your server via the following HTTPS uploader.

https://www.softether-upload.com/ PIN: 8931

Make sure to choose "Private Mode" when you send the file.

After uploading, post the resulting URL on this forum. The URL can only be accessed by us, so pasting it here is safe.

Your cooperation is much appreciated.

@ksimonenko-zz

Hi @dnobori!
I can provide you with access to our test/demo server so you can see the problem in a real environment. Would that help?

@tonychung00
Copy link
Contributor

I was able to reproduce the issue in VirtualBox appliances. Every time I tried to power off a VM, it hung for 90 seconds.
I uploaded the rpm packages (compiled with DEBUG) here:

http://www.softether-upload.com/files/20140702_5f167b8ec8dc1ccf0824c/

Setting up an ssh session may take me some time. It may be easier to upload a VM for you to test; it should be less than 1 GB.

@dnobori
Member

dnobori commented Jul 9, 2014

Hi everyone,

Finally, I found a tiny bug in the code which causes a deadlock during the vpnserver shutdown process.

I will release the fixed version in a few days.

Thank you very much for your cooperation.

@adminport

#341

New maintainer needed

@moatazelmasry2
Copy link
Member

@dnobori did you manage to implement a fix, or can you at least tell us what caused the problem?

@davidebeatrici
Copy link
Member

Fixed in #248, which has been merged.

I think that this issue can now be closed.

@chipitsine
Copy link
Member

Can we close this issue?
