* Cleanup pass, fix a few bugs.
* Add the -pftop (:pftop) feature. This feature displays all active PF states. PF (packet filter) must be active for this feature to be useful. Only those states which are actively passing traffic are displayed, which makes this a lot more useful than a raw 'pfctl -s state' dump.
* Display is sorted by highest aggregate receive+transmit bandwidth.
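The filter-and-sort behaviour described for -pftop can be sketched in a few lines of C. This is only an illustration under stated assumptions: `struct state_rate`, its field names, and `filter_and_sort()` are hypothetical and are not taken from the actual implementation.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical per-state record; the field names are illustrative only. */
struct state_rate {
	double rx_bps;	/* receive rate, bytes/sec */
	double tx_bps;	/* transmit rate, bytes/sec */
};

/* qsort(3) comparator: highest aggregate rx+tx rate sorts first. */
static int
cmp_aggregate_desc(const void *a, const void *b)
{
	const struct state_rate *sa = a;
	const struct state_rate *sb = b;
	double ta = sa->rx_bps + sa->tx_bps;
	double tb = sb->rx_bps + sb->tx_bps;

	if (ta < tb)
		return 1;
	if (ta > tb)
		return -1;
	return 0;
}

/*
 * Drop entries that are not passing traffic (in place), sort the rest
 * by aggregate bandwidth, and return the number of entries kept.
 */
static size_t
filter_and_sort(struct state_rate *v, size_t n)
{
	size_t i, kept = 0;

	for (i = 0; i < n; i++) {
		if (v[i].rx_bps + v[i].tx_bps > 0.0)
			v[kept++] = v[i];
	}
	qsort(v, kept, sizeof(v[0]), cmp_aggregate_desc);
	return kept;
}
```

Filtering before sorting keeps idle states out of the comparison entirely, so only the entries that will actually be displayed are sorted.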
* Add aggregate byte retransmission rate
* Print IPs in a more readable fashion
* A better IP:port sort order
* Better alignment
* Add interesting state and TCP flags
* Add a new option -netbw (or :netbw when running). This option displays the aggregate and per-connection TCP receive and send data rates. Only active connections are shown, making it easy to pick out which connections are hogging the line.
* Add nameservers to resolv.conf
After e368a6e, the remaining bottleneck in nonblocking TCP connect(2) performance is that all socket(2) and initial TCP connect operations (binding laddr and lport) are carried out in netisr0; CPU0 is 100% busy during the test.

The idea of a random initial msgport for TCP is that, instead of using netisr0's msgport as the initial msgport, we can use any of the available netisr msgports to carry out socket(2) and the initial TCP connect operation.

Most parts of TCP are already ready for a random initial msgport; only TCP pru_listen requires a trivial modification to fix the socket msgport to netisr0's msgport (which is required to update the global wild hashtable).

As of this commit, the current CPU's netisr msgport is selected as the TCP socket's initial msgport if random initial msgport is enabled.

Sysctl node kern.ipc.rand_initport is added to disable this optimization. It is enabled by default.

This commit improves both nonblocking TCP connect(2) and blocking TCP connect(2) performance.

Nonblocking connect(2) performance measurement (i7-2600 w/ bnx(4)), using tools/tools/netrate/accept_connect/kq_connect_client:

    kq_connect_client -4 SERVADDR -p SERVPORT -i 8 -c 32 -l 30

(8 processes, each creates 32 connections simultaneously)

16 run average:

    random initial msgport    netisr0 msgport
    263915.17 conns/s         220979.89 conns/s

This commit gives ~19% performance improvement for nonblocking connect(2).

Blocking connect(2) performance measurement (i7-2600 w/ bnx(4)), using tools/tools/netrate/accept_connect/connect_client:

    connect_client -4 SERVADDR -p SERVPORT -i 256 -l 30

(256 processes)

16 run average:

    random initial msgport    netisr0 msgport
    240235.23 conns/s         198312.87 conns/s

This commit gives ~21% performance improvement for blocking connect(2).
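As a quick sanity check of the ~19% and ~21% figures quoted for the random initial msgport change, the relative improvement can be recomputed from the reported connection rates. The helper name `improvement_pct()` is hypothetical; only the input numbers come from the measurements in that message.

```c
/*
 * Relative improvement of new_rate over old_rate, in percent.
 * Hypothetical helper, used only to re-derive the quoted figures.
 */
static double
improvement_pct(double new_rate, double old_rate)
{
	return (new_rate / old_rate - 1.0) * 100.0;
}
```

Calling `improvement_pct(263915.17, 220979.89)` gives roughly 19.4 for the nonblocking case, and `improvement_pct(240235.23, 198312.87)` gives roughly 21.1 for the blocking case, consistent with the rounded figures in the message.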
This is mainly used to improve TCP nonblocking connect(2) performance. Before this commit, a user space thread using nonblocking connect(2) had to wait for the netisr to complete the SYN output. This could be a performance hit for nonblocking connect(2). First, the user space thread is put to sleep, even if the connect(2) is nonblocking. Second, it does not make much sense for nonblocking connect(2) to wait for the SYN output.

TCP's asynchronous pru_connect implementation sets ISCONNECTING before dispatching the netmsg to netisr0. Errors like EADDRNOTAVAIL, i.e. out of local port space, are reported through kevent(2) or getsockopt(2) SOL_SOCKET/SO_ERROR.

NFS and other kernel code still use the old synchronous pru_connect. This commit only affects the connect(2) syscall.

Sysctl node kern.ipc.soconnect_async is added to enable and disable asynchronous pru_connect. It is enabled by default.

The performance measurement (i7-2600 w/ bnx(4)), using tools/tools/netrate/accept_connect/kq_connect_client:

    kq_connect_client -4 SERVADDR -p SERVPORT -i 8 -c 32 -l 30

(8 processes, each creates 32 connections simultaneously, run 30 secs)

16 run average:

    asynchronous pru_connect    synchronous pru_connect
    220979.89 conns/s           189106.88 conns/s

This commit gives ~16% performance improvement for nonblocking connect(2).
Obtained-from: FreeBSD SVN commit r254882 and a few missing bits
They are roughly similar to the amd64 implementations but due to i386 pmap shortcomings we have to completely flush cpu caches in some cases, which is not ideal performance-wise.
This brings in FreeBSD's revisions 188570, 188670, 188671 and 188688.

188670 (most changed lines in the patch) is for debugging purposes only, while 188570 and 188671 fix the actual issue and 188688 fixes gcc whining.

Reported-and-tested-by: Dongsheng Song <email@example.com>

Extra credit to vsrinivas who had actually pointed out 188570 to us a while ago before it hit us now but at the time I hadn't noticed.

Quoting FreeBSD's commit messages:

r188570
-------
In the case that the probe has determined that it can't query the device for a serial number, fall through to the next case so that initial negotiation still happens. Without this, devices were showing up with only 1 available tag opening, leading to observations of very poor I/O performance.

This should fix problems reported with VMWare Fusion and ESX. Early generation MPT-SAS controllers with SATA disks might also be affected. HP CISS controllers are also likely affected, as are many other pseudo-scsi disk subsystems.

r188671
-------
Fix parallel SCSI negotiation in the CAM_NEW_TRAN_CODE world order. Overzealous sanity checks were locking the sync_rate and offset values to zero, thanks to a twisty maze of recursive code.
dump.8 bits taken from FreeBSD.
In order to switch to mandoc(1) eventually we want groff(1) to output manual pages as similar as possible. Hyphenation is the biggest offender.

Approved-by: swildner
Taken-from: OpenBSD