Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

socket close: graceful shutdown vs. lingering (eternal TIME_WAIT issue) #14

Open
sebres opened this issue Aug 27, 2020 · 2 comments
Open

Comments

@sebres
Copy link
Contributor

sebres commented Aug 27, 2020

Package seems to be affected by same issue as Tcl - [b6d0d8cc2c]:

proc stats {} { 
  upvar port port; set state TIME_WAIT; # note it is localized on windows
  puts [regexp -all ":$port\\s+$state" [exec netstat -n]]
}
set port 80; stats
timerate { time { close [socket localhost $port] } 50; after 100; stats }

results to this:

% set port 80; stats
0
% timerate { time { close [socket localhost $port] } 50; after 100; stats }
50
100
150
200
250

This test and test in next comment expecting replacement for socket with ::iocp::inet::socket or following inject for every interpreter (thread) used in test:

# inject default socket command:
namespace eval ::iocp::inet { namespace export socket }; rename ::socket {}; namespace import ::iocp::inet::socket

May be similar fix like [b960d1b71e] or some option (like -linger) can be implemented to avoid such TIME-WAIT floods.
Or even both, so -linger auto may be set as default, so decision whether lingering or graceful shutdown is expected can follow the implementation like [b960d1b71e].

@sebres
Copy link
Contributor Author

sebres commented Jan 25, 2021

Revisited that... first attempt to fix is partially successful (fewer TIME_WAITs now, but they are still there sporadically), must investigate deeper...
Here is more complex test covering additional situations (and need threads module):

test code ...
proc log args {}
proc get_tw_count {port} {
  regexp -all [regsub {@port} {:@port\M[^\n]+TIME_WAIT} $port] [exec netstat -n]
}
proc test {port} {
  puts [string repeat -- 20]\t[get_tw_count $port]
  timerate {
    puts [timerate {
      close [set ch [socket 127.0.0.1 $port]]
      log C-closed-cli:\t$ch
    } 2000 50]
    after 100; puts [string repeat -- 20]\t[get_tw_count $port]
  } 2000
}
proc test_server {port onConnect} {
  package require Thread
  thread::send -async [set th [thread::create]] [list proc log args [info body log]]
  thread::send -async $th [list proc onConnect {ch args} $onConnect]
  thread::send -async $th [list set port $port]
  thread::send $th { set srv [socket -server onConnect $port] }
  test $port
  thread::send $th {close $srv; thread::release}
}

puts "================== extern (web-server) =============="
test 80

puts "================== simple-close ====================="
test_server 8055 {
  log S-close-srv:\t$ch; close $ch; log S-closed:\t$ch
}

puts "================== proper-close ====================="
test_server 8055 {
  chan event $ch readable [list apply {{ch} {
    if {[catch {string length [read $ch 1024]} s] || $s == 0} {
      log S-close-srv:\t$ch; close $ch; log S-closed:\t$ch
    }
  }} $ch]
}

@sebres
Copy link
Contributor Author

sebres commented Jan 25, 2021

Fixed (well there are some TIME_WAIT still remaining sporadically, but the "flood" is completely gone)...
Here are results of aforementioned script before and after:

before (not fixed, ca. 1500 handles in TIME_WAIT) ...
================== extern (web-server) ==============
----------------------------------------        0
2690.62 µs/# 50 # 371.66 #/sec 134.531 net-ms
----------------------------------------        50
435.000 µs/# 50 # 2298.9 #/sec 21.750 net-ms
----------------------------------------        100
380.500 µs/# 50 # 2628.1 #/sec 19.025 net-ms
----------------------------------------        150
404.920 µs/# 50 # 2469.6 #/sec 20.246 net-ms
----------------------------------------        200
431.080 µs/# 50 # 2319.8 #/sec 21.554 net-ms
----------------------------------------        250
456.240 µs/# 50 # 2191.8 #/sec 22.812 net-ms
----------------------------------------        300
424.500 µs/# 50 # 2355.7 #/sec 21.225 net-ms
----------------------------------------        350
458.280 µs/# 50 # 2182.1 #/sec 22.914 net-ms
----------------------------------------        400
401.080 µs/# 50 # 2493.3 #/sec 20.054 net-ms
----------------------------------------        450
509.300 µs/# 50 # 1963.5 #/sec 25.465 net-ms
----------------------------------------        500
391.960 µs/# 50 # 2551.3 #/sec 19.598 net-ms
----------------------------------------        550
================== simple-close =====================
----------------------------------------        0
525.540 µs/# 50 # 1902.8 #/sec 26.277 net-ms
----------------------------------------        51
465.320 µs/# 50 # 2149.1 #/sec 23.266 net-ms
----------------------------------------        101
516.720 µs/# 50 # 1935.3 #/sec 25.836 net-ms
----------------------------------------        152
515.600 µs/# 50 # 1939.5 #/sec 25.780 net-ms
----------------------------------------        204
505.000 µs/# 50 # 1980.2 #/sec 25.250 net-ms
----------------------------------------        254
498.980 µs/# 50 # 2004.1 #/sec 24.949 net-ms
----------------------------------------        304
641.220 µs/# 50 # 1559.5 #/sec 32.061 net-ms
----------------------------------------        357
486.680 µs/# 50 # 2054.7 #/sec 24.334 net-ms
----------------------------------------        407
450.280 µs/# 50 # 2220.8 #/sec 22.514 net-ms
----------------------------------------        457
527.240 µs/# 50 # 1896.7 #/sec 26.362 net-ms
----------------------------------------        507
================== proper-close =====================
----------------------------------------        507
490.660 µs/# 50 # 2038.1 #/sec 24.533 net-ms
----------------------------------------        557
442.060 µs/# 50 # 2262.1 #/sec 22.103 net-ms
----------------------------------------        607
509.540 µs/# 50 # 1962.6 #/sec 25.477 net-ms
----------------------------------------        657
516.840 µs/# 50 # 1934.8 #/sec 25.842 net-ms
----------------------------------------        707
519.140 µs/# 50 # 1926.3 #/sec 25.957 net-ms
----------------------------------------        757
521.600 µs/# 50 # 1917.2 #/sec 26.080 net-ms
----------------------------------------        807
530.500 µs/# 50 # 1885.0 #/sec 26.525 net-ms
----------------------------------------        857
508.500 µs/# 50 # 1966.6 #/sec 25.425 net-ms
----------------------------------------        907
508.540 µs/# 50 # 1966.4 #/sec 25.427 net-ms
----------------------------------------        957
after (fixed, 0 handles in TIME_WAIT) ...
================== extern (web-server) ==============
----------------------------------------        0
2817.28 µs/# 50 # 354.95 #/sec 140.864 net-ms
----------------------------------------        0
440.720 µs/# 50 # 2269.0 #/sec 22.036 net-ms
----------------------------------------        0
455.560 µs/# 50 # 2195.1 #/sec 22.778 net-ms
----------------------------------------        0
447.700 µs/# 50 # 2233.6 #/sec 22.385 net-ms
----------------------------------------        0
467.100 µs/# 50 # 2140.9 #/sec 23.355 net-ms
----------------------------------------        0
465.260 µs/# 50 # 2149.3 #/sec 23.263 net-ms
----------------------------------------        0
475.800 µs/# 50 # 2101.7 #/sec 23.790 net-ms
----------------------------------------        0
489.020 µs/# 50 # 2044.9 #/sec 24.451 net-ms
----------------------------------------        0
461.960 µs/# 50 # 2164.7 #/sec 23.098 net-ms
----------------------------------------        0
483.520 µs/# 50 # 2068.2 #/sec 24.176 net-ms
----------------------------------------        0
================== simple-close =====================
----------------------------------------        0
407.140 µs/# 50 # 2456.2 #/sec 20.357 net-ms
----------------------------------------        0
424.200 µs/# 50 # 2357.4 #/sec 21.210 net-ms
----------------------------------------        0
431.500 µs/# 50 # 2317.5 #/sec 21.575 net-ms
----------------------------------------        0
471.080 µs/# 50 # 2122.8 #/sec 23.554 net-ms
----------------------------------------        0
412.480 µs/# 50 # 2424.4 #/sec 20.624 net-ms
----------------------------------------        0
473.000 µs/# 50 # 2114.2 #/sec 23.650 net-ms
----------------------------------------        0
439.500 µs/# 50 # 2275.3 #/sec 21.975 net-ms
----------------------------------------        0
465.040 µs/# 50 # 2150.4 #/sec 23.252 net-ms
----------------------------------------        0
425.200 µs/# 50 # 2351.8 #/sec 21.260 net-ms
----------------------------------------        0
================== proper-close =====================
----------------------------------------        0
587.040 µs/# 50 # 1703.5 #/sec 29.352 net-ms
----------------------------------------        0
429.360 µs/# 50 # 2329.0 #/sec 21.468 net-ms
----------------------------------------        0
455.920 µs/# 50 # 2193.4 #/sec 22.796 net-ms
----------------------------------------        0
471.380 µs/# 50 # 2121.4 #/sec 23.569 net-ms
----------------------------------------        0
523.420 µs/# 50 # 1910.5 #/sec 26.171 net-ms
----------------------------------------        0
412.500 µs/# 50 # 2424.2 #/sec 20.625 net-ms
----------------------------------------        0
457.960 µs/# 50 # 2183.6 #/sec 22.898 net-ms
----------------------------------------        0
441.520 µs/# 50 # 2264.9 #/sec 22.076 net-ms
----------------------------------------        0

I'll make a small review and hereafter a PR here.

sebres added a commit to sebres/tcl-iocp that referenced this issue Jan 25, 2021
…wn vs. lingering (eternal TIME_WAIT issue);

back-ported [b960d1b71e] from tcl-repo, adjusted for IOCP, provides asynchronous close in helper thread (after completion) if pending operations available
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

1 participant