systemtap Built-in probe point types (DWARF-based kernel or module probes)

Author

digoal

Date

2013-09-29

Tags

PostgreSQL , Linux , systemtap , stap , dtrace , probe


Background

This family of probe points uses symbolic debugging information for the target kernel or module, as may be found in executables that have not been stripped, or in the separate debuginfo packages. They allow logical placement of probes into the execution path of the target by specifying a set of points in the source or object code. When a matching statement executes on any processor, the probe handler is run in that context.

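The above is the definition of this family of built-in probe point types. Nothing special is needed beyond what the operating system provides: each kernel version has a corresponding debuginfo package.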

Probe points in a kernel are identified by module, source file, line number, function name or some combination of these.

For example:

[root@db-172-16-3-39 ~]# rpm -qa|grep debuginfo  
kernel-debuginfo-common-2.6.18-348.12.1.el5  
kernel-debuginfo-2.6.18-348.12.1.el5  
[root@db-172-16-3-39 ~]# uname -a  
Linux db-172-16-3-39.sky-mobi.com 2.6.18-348.12.1.el5 #1 SMP Wed Jul 10 05:28:41 EDT 2013 x86_64 x86_64 x86_64 GNU/Linux  

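Probe point syntax: probe points are named hierarchically, much like DNS names, in the form a.b.c.d, where a is the top-level class, b is a subclass of a, and so on. a can also be called the prefix, and c the suffix.

There may also be a d component, which is usually a configuration option. For example: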

module(MPATTERN).function(PATTERN).return.maxactive(VALUE)  
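
The dotted a.b.c.d structure can be made concrete with a small Python sketch that splits a probe point into its components. This only illustrates the naming scheme and is not how stap itself parses probe points; split_probe_point is a hypothetical helper, and it assumes double-quoted strings without escaped quotes. Dots inside a quoted pattern (for example in a file name) must not be treated as separators.

```python
def split_probe_point(probe_point):
    """Split a probe point on dots that lie outside double-quoted strings.

    Hypothetical helper for illustration; escaped quotes are not handled.
    """
    components = []
    current = []
    in_quotes = False
    for ch in probe_point:
        if ch == '"':
            # Toggle quote state; the quote itself stays in the component.
            in_quotes = not in_quotes
            current.append(ch)
        elif ch == "." and not in_quotes:
            # A dot outside quotes ends the current component.
            components.append("".join(current))
            current = []
        else:
            current.append(ch)
    components.append("".join(current))
    return components


if __name__ == "__main__":
    for point in (
        'module("ext3").function("ext3_*").return.maxactive(100)',
        'kernel.function("tcp_v4_connect@net/ipv4/tcp_ipv4.c:158").return',
        "timer.ms(5000)",
    ):
        print(split_probe_point(point))
```

The first component is the prefix (kernel, module(...), timer), and trailing components such as return or maxactive(100) refine it.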

Typical usage examples:

kernel.function("foo")  
kernel.function("foo").return  
module("ext3").function("ext3_*")  
kernel.function("no_such_function") ?  # the question mark means no error is reported even if no probe point matches  
syscall.*  
end  
timer.ms(5000)  

For details on the ? suffix, see:

http://blog.163.com/digoal@126/blog/static/1638770402013811957335/

To list the function probe points supported on the current system:

[root@db-172-16-3-39 ~]# stap -l 'kernel.function("**")'  

Or narrow the list with a wildcard. Wildcard usage is described in man stapprobes and later in this article.

[root@db-172-16-3-39 ~]# stap -l 'kernel.function("zlib*")'  
kernel.function("zlib_adler32@include/linux/zutil.h:81")  
kernel.function("zlib_fixedtables@lib/zlib_inflate/inflate.c:94")  
kernel.function("zlib_inflate@lib/zlib_inflate/inflate.c:333")  
kernel.function("zlib_inflateEnd@lib/zlib_inflate/inflate.c:756")  
kernel.function("zlib_inflateIncomp@lib/zlib_inflate/inflate.c:888")  
kernel.function("zlib_inflateInit2@lib/zlib_inflate/inflate.c:64")  
kernel.function("zlib_inflateReset@lib/zlib_inflate/inflate.c:24")  
kernel.function("zlib_inflateSyncPacket@lib/zlib_inflate/inflate.c:162")  
kernel.function("zlib_inflate_table@lib/zlib_inflate/inftrees.c:25")  
kernel.function("zlib_inflate_workspacesize@lib/zlib_inflate/inflate.c:19")  
kernel.function("zlib_updatewindow@lib/zlib_inflate/inflate.c:117")  
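
To get a feel for how such a wildcard narrows the list, the Python sketch below applies shell-style glob matching to a few function names taken from the listing above. It uses Python's fnmatch module as a stand-in; that stap's own matcher behaves the same in every corner case is an assumption, and match_functions is a hypothetical helper, not part of SystemTap.

```python
from fnmatch import fnmatchcase

# Names copied from the stap -l listing above, plus one unrelated function.
FUNCTIONS = [
    "zlib_adler32",
    "zlib_inflate",
    "zlib_inflateEnd",
    "zlib_inflate_table",
    "tcp_v4_connect",
]


def match_functions(pattern, names):
    """Return the names matching a glob pattern (*, ?, [] wildcards).

    Hypothetical helper: fnmatchcase() is case-sensitive on every
    platform, which is what function names need.
    """
    return [name for name in names if fnmatchcase(name, pattern)]


if __name__ == "__main__":
    print(match_functions("zlib*", FUNCTIONS))           # every zlib_ function
    print(match_functions("zlib_inflate?*", FUNCTIONS))  # at least one extra character
    print(match_functions("*_v[46]_*", FUNCTIONS))       # bracket set: v4 or v6
```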

DWARF probe points can be specified by module, source file, line number, function name, or a combination of these.

For example:

kernel.function(PATTERN)  
kernel.function(PATTERN).call  
kernel.function(PATTERN).return  
kernel.function(PATTERN).return.maxactive(VALUE)  
kernel.function(PATTERN).inline  
kernel.function(PATTERN).label(LPATTERN)  
module(MPATTERN).function(PATTERN)  
module(MPATTERN).function(PATTERN).call  
module(MPATTERN).function(PATTERN).return.maxactive(VALUE)  
module(MPATTERN).function(PATTERN).inline  
kernel.statement(PATTERN)  
kernel.statement(ADDRESS).absolute  
module(MPATTERN).statement(PATTERN)  

The probe forms above are explained below:

The .function variant places a probe near the beginning of the named function, so that parameters are available as context variables.  

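.function places the probe at the start of the function (the pp() function shows the exact location). A .function probe can therefore print the function's parameters and other context variables.

Example 1: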

[root@db-172-16-3-39 ~]# stap --vp 5 -e 'probe kernel.function("tcp_v4_connect") {printf("%s, %d, %d, %s\n", pp(), pid(), cpu(), $$vars);}'  
Parsed kernel "/lib/modules/2.6.18-348.12.1.el5/build/.config", containing 1977 tuples  
Parsed kernel /lib/modules/2.6.18-348.12.1.el5/build/Module.symvers, which contained 3546 vmlinux exports  
Searched: " /usr/share/systemtap/tapset/x86_64/*.stp ", found: 4, processed: 4  
Searched: " /usr/share/systemtap/tapset/*.stp ", found: 81, processed: 81  
Pass 1: parsed user script and 85 library script(s) using 146796virt/23712res/3012shr/21392data kb, in 160usr/10sys/172real ms.  

The handler output is as follows.

kernel.function("tcp_v4_connect@net/ipv4/tcp_ipv4.c:158"), 15460, 2, sk=0xffff810224ec5340 uaddr=0xffff8101dfd2dec8 addr_len=0x10 inet=? tp=? usin=? rt=? daddr=0xffff8102 nexthop=? tmp=? err=? inet_opt=?  

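The source of the tcp_v4_connect function, with line numbers added, is listed at the end of this article. The output above shows that the probe sits at line 158, the start of the function.

The function's local variables have not been initialized at this point and their run-time location cannot be found, so they are printed as unknown (?).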

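The local variable daddr is only declared at line 163, so why does it already show a value at the probe location on line 158, the start of the function? If anyone can explain this, please do. One plausible reading, offered as an assumption: a declaration generates no code, the debuginfo simply maps daddr to a register or stack slot, and at function entry that location still holds leftover data. In every run shown in this article the printed value equals the upper 32 bits of the sk pointer.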

The .return variant places a probe at the moment of return from the named function, so the return value is available as the $return context variable. The entry parameters are also available, though the function may have changed their values. Return probes may be further qualified with .maxactive, which specifies how many instances of the specified function can be probed simultaneously. You can leave off .maxactive in most cases, as the default (KRETACTIVE) should be sufficient. However, if you notice an excessive number of skipped probes, try setting .maxactive to incrementally higher values to see if the number of skipped probes decreases.

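.return fires when the function returns, so the return value is available as $return. The function's parameters are available too, but their values may have been changed inside the function. .return may be followed by a .maxactive() qualifier, which limits how many instances of the probed function can be tracked simultaneously. It defaults to KRETACTIVE. If you see many skipped probes while debugging, raise the .maxactive value.

Example 2: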

[root@db-172-16-3-39 ~]# stap --vp 5 -e 'probe kernel.function("tcp_v4_connect") {printf("%s, %d, %d, %s, %s\n", pp(), pid(), cpu(), $$vars, $sk$$.$uaddr$$.$addr_len$$.$inet$$.$tp$$.$usin$$.$rt$$.$daddr$$.$nexthop$$.$tmp$$.$err$$.$inet_opt$$);}'  
Parsed kernel "/lib/modules/2.6.18-348.12.1.el5/build/.config", containing 1977 tuples  
Parsed kernel /lib/modules/2.6.18-348.12.1.el5/build/Module.symvers, which contained 3546 vmlinux exports  
Searched: " /usr/share/systemtap/tapset/x86_64/*.stp ", found: 4, processed: 4  
Searched: " /usr/share/systemtap/tapset/*.stp ", found: 81, processed: 81  
Pass 1: parsed user script and 85 library script(s) using 146812virt/23700res/3012shr/21408data kb, in 160usr/20sys/173real ms.  
kernel.function("tcp_v4_connect@net/ipv4/tcp_ipv4.c:158"), 4284, 3, sk=0xffff81012d214d00 uaddr=0xffff8100aa907ec8 addr_len=0x10 inet=? tp=? usin=? rt=? daddr=0xffff8101 nexthop=? tmp=? err=? inet_opt=?, {.__sk_common={.skc_family=2, .skc_state='\a', .skc_reuse='\000', .skc_bound_dev_if=0, .skc_node={.next=0x0, .pprev=0x0}, .skc_bind_node={.next=0x0, .pprev=0x0}, .skc_refcnt={.counter=1}, .skc_hash=0, .skc_prot=0xffffffff80370780}, .sk_shutdown=0, .sk_no_check=0, .sk_userlocks=0, .sk_protocol='\006', .sk_type=1, .sk_rcvbuf=87380, .sk_lock={.slock={.raw_lock={.slock=1}}, .owner=0x1, .wq={.lock={.raw_lock={.slock=1}}, .task_list={.next=0xffff81012d214d58, .prev=0xffff81012d214d58}}}, .sk_sleep=0xffff81012ef9  
  
[root@db-172-16-3-39 ~]# stap --vp 5 -e 'probe kernel.function("tcp_v4_connect").return {printf("%s, %d, %d, %s, %s\n", pp(), pid(), cpu(), $$vars, $sk$$.$uaddr$$.$addr_len$$.$inet$$.$tp$$.$usin$$.$rt$$.$daddr$$.$nexthop$$.$tmp$$.$err$$.$inet_opt$$);}'  
Parsed kernel "/lib/modules/2.6.18-348.12.1.el5/build/.config", containing 1977 tuples  
Parsed kernel /lib/modules/2.6.18-348.12.1.el5/build/Module.symvers, which contained 3546 vmlinux exports  
Searched: " /usr/share/systemtap/tapset/x86_64/*.stp ", found: 4, processed: 4  
Searched: " /usr/share/systemtap/tapset/*.stp ", found: 81, processed: 81  
Pass 1: parsed user script and 85 library script(s) using 146804virt/23700res/3012shr/21400data kb, in 160usr/10sys/172real ms.  
kernel.function("tcp_v4_connect@net/ipv4/tcp_ipv4.c:158").return, 4336, 2, sk=0xffff810222892d00 uaddr=0xffff8100aa631ec8 addr_len=0x10 inet=? tp=? usin=? rt=? daddr=0xffff8102 nexthop=? tmp=? err=? inet_opt=?, {.__sk_common={.skc_family=2, .skc_state='\a', .skc_reuse='\000', .skc_bound_dev_if=0, .skc_node={.next=0x0, .pprev=0x0}, .skc_bind_node={.next=0x0, .pprev=0x0}, .skc_refcnt={.counter=1}, .skc_hash=0, .skc_prot=0xffffffff80370780}, .sk_shutdown=0, .sk_no_check=0, .sk_userlocks=0, .sk_protocol='\006', .sk_type=1, .sk_rcvbuf=87380, .sk_lock={.slock={.raw_lock={.slock=1}}, .owner=0x1, .wq={.lock={.raw_lock={.slock=1}}, .task_list={.next=0xffff810222892d58, .prev=0xffff810222892d58}}}, .sk_sleep=0xffff81022bcc  
# The value of $return is available in a .return probe.  
[root@db-172-16-3-39 ~]# stap --vp 5 -e 'probe kernel.function("tcp_v4_connect").return {printf("%s, %d, %d, %s\n", pp(), pid(), cpu(), $return$$);}'  
Parsed kernel "/lib/modules/2.6.18-348.12.1.el5/build/.config", containing 1977 tuples  
Parsed kernel /lib/modules/2.6.18-348.12.1.el5/build/Module.symvers, which contained 3546 vmlinux exports  
Searched: " /usr/share/systemtap/tapset/x86_64/*.stp ", found: 4, processed: 4  
Searched: " /usr/share/systemtap/tapset/*.stp ", found: 81, processed: 81  
Pass 1: parsed user script and 85 library script(s) using 146796virt/23704res/3012shr/21392data kb, in 160usr/10sys/171real ms.  
kernel.function("tcp_v4_connect@net/ipv4/tcp_ipv4.c:158").return, 6461, 2, 0  

Three filters are available for function probes:

The .inline modifier for .function filters the results to include only instances of inlined functions.   
The .call modifier selects the opposite subset.   
The .exported modifier filters the results to include only exported functions.   
Inline functions do not have an identifiable return point, so .return is not supported on .inline probes.  

The .inline filter cannot be combined with .return, because an inlined function has no return point.

The .statement variant places a probe at the exact spot, exposing those local variables that are visible there.  

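A statement-level probe targets a specific line or range of lines in the source. It is typically used to watch how variable values change inside a function.

In addition, a statement probe placed on the line where a function begins has essentially the same effect as a function probe on that function.

For background on inline functions, see: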

http://en.wikipedia.org/wiki/Inline_function

Example 3:

probe kernel.statement("*@net/ipv4/tcp_ipv4.c:159")  
[root@db-172-16-3-39 ~]# stap -e 'probe kernel.statement("*@net/ipv4/tcp_ipv4.c:159") {printf("%s, %d, %d, %s, %s\n", pp(), pid(), cpu(), $$vars, $sk$$.$uaddr$$.$addr_len$$.$inet$$.$tp$$.$usin$$.$rt$$.$daddr$$.$nexthop$$.$tmp$$.$err$$.$inet_opt$$);}'  
kernel.statement("tcp_v4_connect@net/ipv4/tcp_ipv4.c:168"), 4761, 2, sk=0xffff810222892d00 uaddr=0xffff8100acfc1ec8 addr_len=0x10 inet=? tp=? usin=? rt=? daddr=0xffff8102 nexthop=? tmp=? err=? inet_opt=?, {.__sk_common={.skc_family=2, .skc_state='\a', .skc_reuse='\000', .skc_bound_dev_if=0, .skc_node={.next=0x0, .pprev=0x0}, .skc_bind_node={.next=0x0, .pprev=0x0}, .skc_refcnt={.counter=1}, .skc_hash=0, .skc_prot=0xffffffff80370780}, .sk_shutdown=0, .sk_no_check=0, .sk_userlocks=0, .sk_protocol='\006', .sk_type=1, .sk_rcvbuf=87380, .sk_lock={.slock={.raw_lock={.slock=1}}, .owner=0x1, .wq={.lock={.raw_lock={.slock=1}}, .task_list={.next=0xffff810222892d58, .prev=0xffff810222892d58}}}, .sk_sleep=0xffff81022327  

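Note that pp() reports line 168 in the output above, even though the probe was requested at line 159. As the source listing at the end of this article shows, all variable declarations are complete by line 168.

With the line changed to 169, the reported location is actually line 171, which is where the next statement begins.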

probe kernel.statement("*@net/ipv4/tcp_ipv4.c:169")  
[root@db-172-16-3-39 ~]# stap -e 'probe kernel.statement("*@net/ipv4/tcp_ipv4.c:169") {printf("%s, %d, %d, %s, %s\n", pp(), pid(), cpu(), $$vars, $sk$$.$uaddr$$.$addr_len$$.$inet$$.$tp$$.$usin$$.$rt$$.$daddr$$.$nexthop$$.$tmp$$.$err$$.$inet_opt$$);}'  
kernel.statement("tcp_v4_connect@net/ipv4/tcp_ipv4.c:171"), 4907, 5, sk=0xffff81012d214080 uaddr=0xffff8100afb07ec8 addr_len=0x10 inet=? tp=? usin=? rt=? daddr=0xffff8101 nexthop=? tmp=? err=? inet_opt=?, {.__sk_common={.skc_family=2, .skc_state='\a', .skc_reuse='\000', .skc_bound_dev_if=0, .skc_node={.next=0x0, .pprev=0x0}, .skc_bind_node={.next=0x0, .pprev=0x0}, .skc_refcnt={.counter=1}, .skc_hash=0, .skc_prot=0xffffffff80370780}, .sk_shutdown=0, .sk_no_check=0, .sk_userlocks=0, .sk_protocol='\006', .sk_type=1, .sk_rcvbuf=87380, .sk_lock={.slock={.raw_lock={.slock=1}}, .owner=0x1, .wq={.lock={.raw_lock={.slock=1}}, .task_list={.next=0xffff81012d2140d8, .prev=0xffff81012d2140d8}}}, .sk_sleep=0xffff81012d0d  

A statement probe does not need a function name. What matters is the file and the line number.

Next, an introduction to patterns and wildcards:

kernel.function(PATTERN)  
kernel.function(PATTERN).call  
kernel.function(PATTERN).return  
kernel.function(PATTERN).return.maxactive(VALUE)  
kernel.function(PATTERN).inline  
kernel.function(PATTERN).label(LPATTERN)  
module(MPATTERN).function(PATTERN)  
module(MPATTERN).function(PATTERN).call  
module(MPATTERN).function(PATTERN).return.maxactive(VALUE)  
module(MPATTERN).function(PATTERN).inline  
kernel.statement(PATTERN)  
kernel.statement(ADDRESS).absolute  
module(MPATTERN).statement(PATTERN)  

In the above probe descriptions, MPATTERN stands for a string literal that identifies the loaded kernel module of interest and LPATTERN stands for a source program label. Both MPATTERN and LPATTERN may include asterisk (*), square brackets "[]", and question mark (?) wildcards.

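MPATTERN and LPATTERN are the pattern strings for the module and the label respectively. Each must be enclosed in double quotes and may contain the wildcards *, [], and ?.

PATTERN stands for a string literal that identifies a point in the program. It is composed of three parts:

That is, PATTERN is the pattern string used in function() and statement(); it too must be enclosed in double quotes. Its three parts are described below.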

The first part is the name of a function, as would appear in the nm program's output. This part may use the asterisk and question mark wildcard operators to match multiple names.

The first part is the function name, which may use wildcards such as *, [], and ?.  

The second part is optional, and begins with the at sign (@) character. It is followed by the path to the source file containing the function, which may include a wildcard pattern, such as mm/slab*. In most cases, the path should be relative to the top of the linux source directory, although an absolute path may be necessary for some kernels. If a relative pathname doesn't work, try absolute.

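The second part (optional) is the source file. It starts with @ followed by the source file path, which may use wildcards such as *, [], and ?.

The source file is normally given as a relative path. For example, the file used in this article is /usr/src/debug/kernel-2.6.18/linux-2.6.18-348.12.1.el5.x86_64/net/ipv4/tcp_ipv4.c

and it is specified as the relative path @net/ipv4/tcp_ipv4.c

If the relative path is not recognized for your kernel, use the absolute path.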

The third part is optional if the file name part was given. It identifies the line number in the source file, preceded by a ":" or "+".  
The line number is assumed to be an absolute line number if preceded by a ":",  
or relative to the entry of the function if preceded by a "+".  
All the lines in the function can be matched with ":*".  
A range of lines x through y can be matched with ":x-y".  

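The third part is a suffix of the second part: without the second part there can be no third part.

The third part specifies the line number and starts with : or +.  
  
: introduces an absolute line number.  
  
+ introduces an offset in lines, relative to the function entry.  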
  
Alternately, specify PATTERN as a numeric constant to indicate a relative module address or an absolute kernel address.  

Finally, PATTERN may instead be a numeric constant giving a module address or a kernel address.
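
The three-part structure of PATTERN described above can be sketched as a small Python parser. This is an illustration of the grammar as the text describes it, not stap's real parser: ProbePattern and parse_pattern are hypothetical names, numeric-address patterns are not covered, and file paths containing ":" or "+" are assumed not to occur.

```python
import re
from typing import NamedTuple, Optional


class ProbePattern(NamedTuple):
    function: str                # part 1: function name, may contain wildcards
    source_file: Optional[str]   # part 2: path after "@", if given
    line_kind: Optional[str]     # "absolute", "relative", "all", "range" or None
    line_spec: Optional[str]     # part 3: text after ":" or "+", if given


# function [ @file [ (":" | "+") line ] ]
_PATTERN_RE = re.compile(
    r"^(?P<func>[^@]+)"
    r"(?:@(?P<file>[^:+]+)"
    r"(?:(?P<sep>[:+])(?P<line>\*|\d+(?:-\d+)?))?"
    r")?$"
)


def parse_pattern(pattern):
    """Split a PATTERN string into function, file and line parts."""
    m = _PATTERN_RE.match(pattern)
    if m is None:
        raise ValueError(f"unrecognized PATTERN: {pattern!r}")
    sep, line = m.group("sep"), m.group("line")
    if sep is None:
        kind = None
    elif sep == "+":
        # "+" only takes a plain number: an offset from the function entry.
        if not line.isdigit():
            raise ValueError(f"'+' needs a numeric offset: {pattern!r}")
        kind = "relative"
    elif line == "*":
        kind = "all"
    elif "-" in line:
        kind = "range"
    else:
        kind = "absolute"
    return ProbePattern(m.group("func"), m.group("file"), kind, line)


if __name__ == "__main__":
    for text in (
        "tcp_v4_connect@net/ipv4/tcp_ipv4.c:*",
        "*@kernel/time.c:296",
        "bio_init@fs/bio.c+3",
        "zlib*",
    ):
        print(parse_pattern(text))
```

Because the line part sits inside the optional file group, the parser also reflects the rule that a line number can only be given together with a source file.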

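With a statement probe you can give a line number, or * to cover all lines of the function. In the latter case it is best to name the function in the first part.

For example: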

probe kernel.statement("tcp_v4_connect@net/ipv4/tcp_ipv4.c:*")  

This probe fires on every line of the function tcp_v4_connect in net/ipv4/tcp_ipv4.c.

[root@db-172-16-3-39 ~]# stap -e 'probe kernel.statement("tcp_v4_connect@net/ipv4/tcp_ipv4.c:*") {printf("%s, %d, %d, %s, %s\n", pp(), pid(), cpu(), $$vars, $sk$$.$uaddr$$.$addr_len$$.$inet$$.$tp$$.$usin$$.$rt$$.$daddr$$.$nexthop$$.$tmp$$.$err$$.$inet_opt$$);}'  
kernel.statement("tcp_v4_connect@net/ipv4/tcp_ipv4.c:171"), 5847, 3, sk=0xffff81012d2146c0 uaddr=0xffff8100a82d1ec8 addr_len=0x10 inet=? tp=? usin=? rt=? daddr=0xffff8101 nexthop=? tmp=? err=? inet_opt=?, {.__sk_common={.skc_family=2, .skc_state='\a', .skc_reuse='\000', .skc_bound_dev_if=0, .skc_node={.next=0x0, .pprev=0x0}, .skc_bind_node={.next=0x0, .pprev=0x0}, .skc_refcnt={.counter=1}, .skc_hash=0, .skc_prot=0xffffffff80370780}, .sk_shutdown=0, .sk_no_check=0, .sk_userlocks=0, .sk_protocol='\006', .sk_type=1, .sk_rcvbuf=87380, .sk_lock={.slock={.raw_lock={.slock=1}}, .owner=0x1, .wq={.lock={.raw_lock={.slock=1}}, .task_list={.next=0xffff81012d214718, .prev=0xffff81012d214718}}}, .sk_sleep=0xffff8100b060  
kernel.statement("tcp_v4_connect@net/ipv4/tcp_ipv4.c:174"), 5847, 3, sk=0xffff81012d2146c0 uaddr=0xffff8100a82d1ec8 addr_len=0x10 inet=? tp=? usin=? rt=? daddr=0xffff8101 nexthop=? tmp=0xffffffffffffff9f err=? inet_opt=?, {.__sk_common={.skc_family=2, .skc_state='\a', .skc_reuse='\000', .skc_bound_dev_if=0, .skc_node={.next=0x0, .pprev=0x0}, .skc_bind_node={.next=0x0, .pprev=0x0}, .skc_refcnt={.counter=1}, .skc_hash=0, .skc_prot=0xffffffff80370780}, .sk_shutdown=0, .sk_no_check=0, .sk_userlocks=0, .sk_protocol='\006', .sk_type=1, .sk_rcvbuf=87380, .sk_lock={.slock={.raw_lock={.slock=1}}, .owner=0x1, .wq={.lock={.raw_lock={.slock=1}}, .task_list={.next=0xffff81012d214718, .prev=0xffff81012d214718}}}, .sk_sleep=0xffff8100b060  
kernel.statement("tcp_v4_connect@net/ipv4/tcp_ipv4.c:186"), 5847, 3, sk=0xffff81012d2146c0 uaddr=0xffff8100a82d1ec8 addr_len=? inet=? tp=? usin=? rt=0xffff81012dfaf200 daddr=0x270310ac nexthop=0x270310ac tmp=0x0 err=? inet_opt=?, {.__sk_common={.skc_family=2, .skc_state='\a', .skc_reuse='\000', .skc_bound_dev_if=0, .skc_node={.next=0x0, .pprev=0x0}, .skc_bind_node={.next=0x0, .pprev=0x0}, .skc_refcnt={.counter=1}, .skc_hash=0, .skc_prot=0xffffffff80370780}, .sk_shutdown=0, .sk_no_check=0, .sk_userlocks=0, .sk_protocol='\006', .sk_type=1, .sk_rcvbuf=87380, .sk_lock={.slock={.raw_lock={.slock=1}}, .owner=0x1, .wq={.lock={.raw_lock={.slock=1}}, .task_list={.next=0xffff81012d214718, .prev=0xffff81012d214718}}}, .sk_sleep=0xffff8100b060  
kernel.statement("tcp_v4_connect@net/ipv4/tcp_ipv4.c:192"), 5847, 3, sk=0xffff81012d2146c0 uaddr=0xffff8100a82d1ec8 addr_len=? inet=? tp=? usin=? rt=0xffff81012dfaf200 daddr=0x270310ac nexthop=0x270310ac tmp=0x0 err=? inet_opt=?, {.__sk_common={.skc_family=2, .skc_state='\a', .skc_reuse='\000', .skc_bound_dev_if=0, .skc_node={.next=0x0, .pprev=0x0}, .skc_bind_node={.next=0x0, .pprev=0x0}, .skc_refcnt={.counter=1}, .skc_hash=0, .skc_prot=0xffffffff80370780}, .sk_shutdown=0, .sk_no_check=0, .sk_userlocks=0, .sk_protocol='\006', .sk_type=1, .sk_rcvbuf=87380, .sk_lock={.slock={.raw_lock={.slock=1}}, .owner=0x1, .wq={.lock={.raw_lock={.slock=1}}, .task_list={.next=0xffff81012d214718, .prev=0xffff81012d214718}}}, .sk_sleep=0xffff8100b060  
kernel.statement("tcp_v4_connect@net/ipv4/tcp_ipv4.c:197"), 5847, 3, sk=0xffff81012d2146c0 uaddr=0xffff8100a82d1ec8 addr_len=? inet=? tp=? usin=? rt=0xffff81012dfaf200 daddr=0x270310ac nexthop=0x270310ac tmp=0x0 err=? inet_opt=?, {.__sk_common={.skc_family=2, .skc_state='\a', .skc_reuse='\000', .skc_bound_dev_if=0, .skc_node={.next=0x0, .pprev=0x0}, .skc_bind_node={.next=0x0, .pprev=0x0}, .skc_refcnt={.counter=1}, .skc_hash=0, .skc_prot=0xffffffff80370780}, .sk_shutdown=0, .sk_no_check=0, .sk_userlocks=0, .sk_protocol='\006', .sk_type=1, .sk_rcvbuf=87380, .sk_lock={.slock={.raw_lock={.slock=1}}, .owner=0x1, .wq={.lock={.raw_lock={.slock=1}}, .task_list={.next=0xffff81012d214718, .prev=0xffff81012d214718}}}, .sk_sleep=0xffff8100b060  
kernel.statement("tcp_v4_connect@net/ipv4/tcp_ipv4.c:198"), 5847, 3, sk=0xffff81012d2146c0 uaddr=0xffff8100a82d1ec8 addr_len=? inet=? tp=? usin=? rt=0xffff81012dfaf200 daddr=0x270310ac nexthop=0x270310ac tmp=0x0 err=? inet_opt=?, {.__sk_common={.skc_family=2, .skc_state='\a', .skc_reuse='\000', .skc_bound_dev_if=0, .skc_node={.next=0x0, .pprev=0x0}, .skc_bind_node={.next=0x0, .pprev=0x0}, .skc_refcnt={.counter=1}, .skc_hash=0, .skc_prot=0xffffffff80370780}, .sk_shutdown=0, .sk_no_check=0, .sk_userlocks=0, .sk_protocol='\006', .sk_type=1, .sk_rcvbuf=87380, .sk_lock={.slock={.raw_lock={.slock=1}}, .owner=0x1, .wq={.lock={.raw_lock={.slock=1}}, .task_list={.next=0xffff81012d214718, .prev=0xffff81012d214718}}}, .sk_sleep=0xffff8100b060  
kernel.statement("tcp_v4_connect@net/ipv4/tcp_ipv4.c:200"), 5847, 3, sk=0xffff81012d2146c0 uaddr=0xffff8100a82d1ec8 addr_len=? inet=? tp=? usin=? rt=0xffff81012dfaf200 daddr=0x270310ac nexthop=0x270310ac tmp=0x0 err=? inet_opt=?, {.__sk_common={.skc_family=2, .skc_state='\a', .skc_reuse='\000', .skc_bound_dev_if=0, .skc_node={.next=0x0, .pprev=0x0}, .skc_bind_node={.next=0x0, .pprev=0x0}, .skc_refcnt={.counter=1}, .skc_hash=0, .skc_prot=0xffffffff80370780}, .sk_shutdown=0, .sk_no_check=0, .sk_userlocks=0, .sk_protocol='\006', .sk_type=1, .sk_rcvbuf=87380, .sk_lock={.slock={.raw_lock={.slock=1}}, .owner=0x1, .wq={.lock={.raw_lock={.slock=1}}, .task_list={.next=0xffff81012d214718, .prev=0xffff81012d214718}}}, .sk_sleep=0xffff8100b060  
kernel.statement("tcp_v4_connect@net/ipv4/tcp_ipv4.c:201"), 5847, 3, sk=0xffff81012d2146c0 uaddr=0xffff8100a82d1ec8 addr_len=? inet=? tp=? usin=? rt=0xffff81012dfaf200 daddr=0x270310ac nexthop=0x270310ac tmp=0x0 err=? inet_opt=?, {.__sk_common={.skc_family=2, .skc_state='\a', .skc_reuse='\000', .skc_bound_dev_if=0, .skc_node={.next=0x0, .pprev=0x0}, .skc_bind_node={.next=0x0, .pprev=0x0}, .skc_refcnt={.counter=1}, .skc_hash=0, .skc_prot=0xffffffff80370780}, .sk_shutdown=0, .sk_no_check=0, .sk_userlocks=0, .sk_protocol='\006', .sk_type=1, .sk_rcvbuf=87380, .sk_lock={.slock={.raw_lock={.slock=1}}, .owner=0x1, .wq={.lock={.raw_lock={.slock=1}}, .task_list={.next=0xffff81012d214718, .prev=0xffff81012d214718}}}, .sk_sleep=0xffff8100b060  
kernel.statement("tcp_v4_connect@net/ipv4/tcp_ipv4.c:202"), 5847, 3, sk=0xffff81012d2146c0 uaddr=0xffff8100a82d1ec8 addr_len=? inet=? tp=? usin=? rt=0xffff81012dfaf200 daddr=0x270310ac nexthop=0x270310ac tmp=0x0 err=? inet_opt=?, {.__sk_common={.skc_family=2, .skc_state='\a', .skc_reuse='\000', .skc_bound_dev_if=0, .skc_node={.next=0x0, .pprev=0x0}, .skc_bind_node={.next=0x0, .pprev=0x0}, .skc_refcnt={.counter=1}, .skc_hash=0, .skc_prot=0xffffffff80370780}, .sk_shutdown=0, .sk_no_check=0, .sk_userlocks=0, .sk_protocol='\006', .sk_type=1, .sk_rcvbuf=87380, .sk_lock={.slock={.raw_lock={.slock=1}}, .owner=0x1, .wq={.lock={.raw_lock={.slock=1}}, .task_list={.next=0xffff81012d214718, .prev=0xffff81012d214718}}}, .sk_sleep=0xffff8100b060  
kernel.statement("tcp_v4_connect@net/ipv4/tcp_ipv4.c:211"), 5847, 3, sk=0xffff81012d2146c0 uaddr=0xffff8100a82d1ec8 addr_len=? inet=? tp=? usin=? rt=0xffff81012dfaf200 daddr=0x270310ac nexthop=0x270310ac tmp=0x0 err=? inet_opt=?, {.__sk_common={.skc_family=2, .skc_state='\a', .skc_reuse='\000', .skc_bound_dev_if=0, .skc_node={.next=0x0, .pprev=0x0}, .skc_bind_node={.next=0x0, .pprev=0x0}, .skc_refcnt={.counter=1}, .skc_hash=0, .skc_prot=0xffffffff80370780}, .sk_shutdown=0, .sk_no_check=0, .sk_userlocks=0, .sk_protocol='\006', .sk_type=1, .sk_rcvbuf=87380, .sk_lock={.slock={.raw_lock={.slock=1}}, .owner=0x1, .wq={.lock={.raw_lock={.slock=1}}, .task_list={.next=0xffff81012d214718, .prev=0xffff81012d214718}}}, .sk_sleep=0xffff8100b060  
kernel.statement("tcp_v4_connect@net/ipv4/tcp_ipv4.c:220"), 5847, 3, peer=0x0 sk=0xffff81012d2146c0 uaddr=0xffff8100a82d1ec8 addr_len=? inet=? tp=? usin=? rt=0xffff81012dfaf200 daddr=0x270310ac nexthop=0x270310ac tmp=0xffffffffa82d1bd8 err=? inet_opt=?, {.__sk_common={.skc_family=2, .skc_state='\a', .skc_reuse='\000', .skc_bound_dev_if=0, .skc_node={.next=0x0, .pprev=0x0}, .skc_bind_node={.next=0x0, .pprev=0x0}, .skc_refcnt={.counter=1}, .skc_hash=0, .skc_prot=0xffffffff80370780}, .sk_shutdown=0, .sk_no_check=0, .sk_userlocks=0, .sk_protocol='\006', .sk_type=1, .sk_rcvbuf=87380, .sk_lock={.slock={.raw_lock={.slock=1}}, .owner=0x1, .wq={.lock={.raw_lock={.slock=1}}, .task_list={.next=0xffff81012d214718, .prev=0xffff81012d214718}}}, .sk_sleep=0xffff8100b060  
kernel.statement("tcp_v4_connect@net/ipv4/tcp_ipv4.c:226"), 5847, 3, sk=0xffff81012d2146c0 uaddr=0xffff8100a82d1ec8 addr_len=? inet=? tp=? usin=? rt=0xffff81012dfaf200 daddr=0x270310ac nexthop=0x270310ac tmp=0xffffffffa82d1bd8 err=? inet_opt=?, {.__sk_common={.skc_family=2, .skc_state='\a', .skc_reuse='\000', .skc_bound_dev_if=0, .skc_node={.next=0x0, .pprev=0x0}, .skc_bind_node={.next=0x0, .pprev=0x0}, .skc_refcnt={.counter=1}, .skc_hash=0, .skc_prot=0xffffffff80370780}, .sk_shutdown=0, .sk_no_check=0, .sk_userlocks=0, .sk_protocol='\006', .sk_type=1, .sk_rcvbuf=87380, .sk_lock={.slock={.raw_lock={.slock=1}}, .owner=0x1, .wq={.lock={.raw_lock={.slock=1}}, .task_list={.next=0xffff81012d214718, .prev=0xffff81012d214718}}}, .sk_sleep=0xffff8100b060  
kernel.statement("tcp_v4_connect@net/ipv4/tcp_ipv4.c:229"), 5847, 3, sk=0xffff81012d2146c0 uaddr=0xffff8100a82d1ec8 addr_len=? inet=? tp=? usin=? rt=0xffff81012dfaf200 daddr=0x270310ac nexthop=0x270310ac tmp=0xffffffffa82d1bd8 err=? inet_opt=?, {.__sk_common={.skc_family=2, .skc_state='\a', .skc_reuse='\000', .skc_bound_dev_if=0, .skc_node={.next=0x0, .pprev=0x0}, .skc_bind_node={.next=0x0, .pprev=0x0}, .skc_refcnt={.counter=1}, .skc_hash=0, .skc_prot=0xffffffff80370780}, .sk_shutdown=0, .sk_no_check=0, .sk_userlocks=0, .sk_protocol='\006', .sk_type=1, .sk_rcvbuf=87380, .sk_lock={.slock={.raw_lock={.slock=1}}, .owner=0x1, .wq={.lock={.raw_lock={.slock=1}}, .task_list={.next=0xffff81012d214718, .prev=0xffff81012d214718}}}, .sk_sleep=0xffff8100b060  
kernel.statement("tcp_v4_connect@net/ipv4/tcp_ipv4.c:230"), 5847, 3, sk=0xffff81012d2146c0 uaddr=0xffff8100a82d1ec8 addr_len=? inet=? tp=? usin=? rt=0xffff81012dfaf200 daddr=0x270310ac nexthop=0x270310ac tmp=0xffffffffa82d1bd8 err=? inet_opt=?, {.__sk_common={.skc_family=2, .skc_state='\a', .skc_reuse='\000', .skc_bound_dev_if=0, .skc_node={.next=0x0, .pprev=0x0}, .skc_bind_node={.next=0x0, .pprev=0x0}, .skc_refcnt={.counter=1}, .skc_hash=0, .skc_prot=0xffffffff80370780}, .sk_shutdown=0, .sk_no_check=0, .sk_userlocks=0, .sk_protocol='\006', .sk_type=1, .sk_rcvbuf=87380, .sk_lock={.slock={.raw_lock={.slock=1}}, .owner=0x1, .wq={.lock={.raw_lock={.slock=1}}, .task_list={.next=0xffff81012d214718, .prev=0xffff81012d214718}}}, .sk_sleep=0xffff8100b060  
kernel.statement("tcp_v4_connect@net/ipv4/tcp_ipv4.c:233"), 5847, 3, sk=0xffff81012d2146c0 uaddr=0xffff8100a82d1ec8 addr_len=? inet=? tp=? usin=? rt=0xffff81012dfaf200 daddr=0x270310ac nexthop=0x270310ac tmp=0xffffffffa82d1bd8 err=? inet_opt=?, {.__sk_common={.skc_family=2, .skc_state='\a', .skc_reuse='\000', .skc_bound_dev_if=0, .skc_node={.next=0x0, .pprev=0x0}, .skc_bind_node={.next=0x0, .pprev=0x0}, .skc_refcnt={.counter=1}, .skc_hash=0, .skc_prot=0xffffffff80370780}, .sk_shutdown=0, .sk_no_check=0, .sk_userlocks=0, .sk_protocol='\006', .sk_type=1, .sk_rcvbuf=87380, .sk_lock={.slock={.raw_lock={.slock=1}}, .owner=0x1, .wq={.lock={.raw_lock={.slock=1}}, .task_list={.next=0xffff81012d214718, .prev=0xffff81012d214718}}}, .sk_sleep=0xffff8100b060  
kernel.statement("tcp_v4_connect@net/ipv4/tcp_ipv4.c:245"), 5847, 3, sk=0xffff81012d2146c0 uaddr=0xffff8100a82d1ec8 addr_len=? inet=? tp=? usin=? rt=0xffff81012dfaf200 daddr=0x270310ac nexthop=0x270310ac tmp=? err=0x0 inet_opt=?, {.__sk_common={.skc_family=2, .skc_state='\002', .skc_reuse='\000', .skc_bound_dev_if=0, .skc_node={.next=0x0, .pprev=0xffff81012f0f5c38}, .skc_bind_node={.next=0x0, .pprev=0xffff81012e9843b8}, .skc_refcnt={.counter=1}, .skc_hash=62915, .skc_prot=0xffffffff80370780}, .sk_shutdown=0, .sk_no_check=0, .sk_userlocks=0, .sk_protocol='\006', .sk_type=1, .sk_rcvbuf=87380, .sk_lock={.slock={.raw_lock={.slock=1}}, .owner=0x1, .wq={.lock={.raw_lock={.slock=1}}, .task_list={.next=0xffff81012d214718, .prev=0xffff81012  
kernel.statement("tcp_v4_connect@net/ipv4/tcp_ipv4.c:250"), 5847, 3, sk=0xffff81012d2146c0 uaddr=0xffff8100a82d1ec8 addr_len=? inet=? tp=? usin=? rt=0xffff81012dfaf200 daddr=0x270310ac nexthop=0x270310ac tmp=? err=0x0 inet_opt=?, {.__sk_common={.skc_family=2, .skc_state='\002', .skc_reuse='\000', .skc_bound_dev_if=0, .skc_node={.next=0x0, .pprev=0xffff81012f0f5c38}, .skc_bind_node={.next=0x0, .pprev=0xffff81012e9843b8}, .skc_refcnt={.counter=1}, .skc_hash=62915, .skc_prot=0xffffffff80370780}, .sk_shutdown=0, .sk_no_check=0, .sk_userlocks=0, .sk_protocol='\006', .sk_type=1, .sk_rcvbuf=87380, .sk_lock={.slock={.raw_lock={.slock=1}}, .owner=0x1, .wq={.lock={.raw_lock={.slock=1}}, .task_list={.next=0xffff81012d214718, .prev=0xffff81012  
kernel.statement("tcp_v4_connect@net/ipv4/tcp_ipv4.c:251"), 5847, 3, sk=0xffff81012d2146c0 uaddr=0xffff8100a82d1ec8 addr_len=? inet=? tp=? usin=? rt=0xffff81012dfaf200 daddr=0x270310ac nexthop=0x270310ac tmp=? err=0x0 inet_opt=?, {.__sk_common={.skc_family=2, .skc_state='\002', .skc_reuse='\000', .skc_bound_dev_if=0, .skc_node={.next=0x0, .pprev=0xffff81012f0f5c38}, .skc_bind_node={.next=0x0, .pprev=0xffff81012e9843b8}, .skc_refcnt={.counter=1}, .skc_hash=62915, .skc_prot=0xffffffff80370780}, .sk_shutdown=0, .sk_no_check=0, .sk_userlocks=0, .sk_protocol='\006', .sk_type=1, .sk_rcvbuf=87380, .sk_lock={.slock={.raw_lock={.slock=1}}, .owner=0x1, .wq={.lock={.raw_lock={.slock=1}}, .task_list={.next=0xffff81012d214718, .prev=0xffff81012  
kernel.statement("tcp_v4_connect@net/ipv4/tcp_ipv4.c:253"), 5847, 3, sk=0xffff81012d2146c0 uaddr=0xffff8100a82d1ec8 addr_len=? inet=? tp=? usin=? rt=0xffff81012dfaf200 daddr=0x270310ac nexthop=0x270310ac tmp=? err=? inet_opt=?, {.__sk_common={.skc_family=2, .skc_state='\002', .skc_reuse='\000', .skc_bound_dev_if=0, .skc_node={.next=0x0, .pprev=0xffff81012f0f5c38}, .skc_bind_node={.next=0x0, .pprev=0xffff81012e9843b8}, .skc_refcnt={.counter=1}, .skc_hash=62915, .skc_prot=0xffffffff80370780}, .sk_shutdown=0, .sk_no_check=0, .sk_userlocks=0, .sk_protocol='\006', .sk_type=1, .sk_rcvbuf=87380, .sk_lock={.slock={.raw_lock={.slock=1}}, .owner=0x1, .wq={.lock={.raw_lock={.slock=1}}, .task_list={.next=0xffff81012d214718, .prev=0xffff81012  
kernel.statement("tcp_v4_connect@net/ipv4/tcp_ipv4.c:254"), 5847, 3, sk=0xffff81012d2146c0 uaddr=0xffff8100a82d1ec8 addr_len=? inet=? tp=? usin=? rt=0xffff81012dfaf200 daddr=0x270310ac nexthop=0x270310ac tmp=? err=? inet_opt=?, {.__sk_common={.skc_family=2, .skc_state='\002', .skc_reuse='\000', .skc_bound_dev_if=0, .skc_node={.next=0x0, .pprev=0xffff81012f0f5c38}, .skc_bind_node={.next=0x0, .pprev=0xffff81012e9843b8}, .skc_refcnt={.counter=1}, .skc_hash=62915, .skc_prot=0xffffffff80370780}, .sk_shutdown=0, .sk_no_check=0, .sk_userlocks=0, .sk_protocol='\006', .sk_type=1, .sk_rcvbuf=87380, .sk_lock={.slock={.raw_lock={.slock=1}}, .owner=0x1, .wq={.lock={.raw_lock={.slock=1}}, .task_list={.next=0xffff81012d214718, .prev=0xffff81012  
kernel.statement("tcp_v4_connect@net/ipv4/tcp_ipv4.c:264"), 5847, 3, sk=0xffff81012d2146c0 uaddr=0xffff8100a82d1ec8 addr_len=? inet=? tp=? usin=? rt=0xffff81012dfaf200 daddr=0x270310ac nexthop=0x270310ac tmp=? err=? inet_opt=?, {.__sk_common={.skc_family=2, .skc_state='\002', .skc_reuse='\000', .skc_bound_dev_if=0, .skc_node={.next=0x0, .pprev=0xffff81012f0f5c38}, .skc_bind_node={.next=0x0, .pprev=0xffff81012e9843b8}, .skc_refcnt={.counter=2}, .skc_hash=62915, .skc_prot=0xffffffff80370780}, .sk_shutdown=0, .sk_no_check=0, .sk_userlocks=0, .sk_protocol='\006', .sk_type=1, .sk_rcvbuf=87380, .sk_lock={.slock={.raw_lock={.slock=1}}, .owner=0x1, .wq={.lock={.raw_lock={.slock=1}}, .task_list={.next=0xffff81012d214718, .prev=0xffff81012  
kernel.statement("tcp_v4_connect@net/ipv4/tcp_ipv4.c:262"), 5847, 3, sk=0xffff81012d2146c0 uaddr=0xffff8100a82d1ec8 addr_len=? inet=? tp=? usin=? rt=0xffff81012dfaf200 daddr=0x270310ac nexthop=0x270310ac tmp=0x0 err=0x0 inet_opt=?, {.__sk_common={.skc_family=2, .skc_state='\002', .skc_reuse='\000', .skc_bound_dev_if=0, .skc_node={.next=0x0, .pprev=0xffff81012f0f5c38}, .skc_bind_node={.next=0x0, .pprev=0xffff81012e9843b8}, .skc_refcnt={.counter=2}, .skc_hash=62915, .skc_prot=0xffffffff80370780}, .sk_shutdown=0, .sk_no_check=0, .sk_userlocks=0, .sk_protocol='\006', .sk_type=1, .sk_rcvbuf=87380, .sk_lock={.slock={.raw_lock={.slock=1}}, .owner=0x1, .wq={.lock={.raw_lock={.slock=1}}, .task_list={.next=0xffff81012d214718, .prev=0xffff81012  
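
In the statement-probe output above, daddr and nexthop read 0x270310ac from line 186 onward. Assuming these variables hold an IPv4 address in network byte order, and given that the trace was taken on x86_64 (little-endian, per the uname output earlier), the integer stap prints has its bytes reversed relative to dotted-quad notation. A minimal Python sketch of the conversion follows; decode_net_addr is a hypothetical helper name.

```python
import socket
import struct


def decode_net_addr(value):
    """Turn a network-order IPv4 address, as read into an integer on a
    little-endian CPU, back into dotted-quad notation."""
    # "<I" writes the integer least-significant byte first, which restores
    # the original in-memory (network) byte order of the address.
    raw = struct.pack("<I", value)
    return socket.inet_ntoa(raw)


if __name__ == "__main__":
    print(decode_net_addr(0x270310AC))
```

The result is 172.16.3.39, which matches the address embedded in the test host's name (db-172-16-3-39), so the traced connection was made to the host itself.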

Other examples:

# Refers to the statement at line 296 within the  
# kernel/time.c file:  
kernel.statement("*@kernel/time.c:296")  
# Refers to the statement at line bio_init+3 within the fs/bio.c file:  
kernel.statement("bio_init@fs/bio.c+3")  
  
# Refers to all kernel functions with "init" or "exit"  
# in the name:  
kernel.function("*init*"), kernel.function("*exit*")  
  
# Refers to any functions within the "kernel/time.c"  
# file that span line 240:  
kernel.function("*@kernel/time.c:240")  
  
# Refers to all functions in the ext3 module:  
module("ext3").function("*")  

Next, context variables:

Some of the source-level variables, such as function parameters, locals, or globals visible in the compilation unit, are visible to probe handlers. Refer to these variables by prefixing their name with a dollar sign within the scripts. In addition, a special syntax allows limited traversal of structures, pointers, arrays, taking the address of a variable or pretty printing a whole structure.  
  
  
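The source-level variables visible to a handler generally include the function's parameters, its local variables, and the global variables visible in the CU (compilation unit, roughly one C source file).  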
  
  
$var refers to an in-scope variable var. If it is a type similar to an integer, it will be cast to a 64-bit integer for script use. Pointers similar to a string (char *) are copied to SystemTap string values by the kernel_string() or user_string() functions.  
  
  
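Use $varname or @var("varname") to refer to local variables and parameters. Integer-like values can be printed directly; C strings are converted with kernel_string() or user_string() before printing.  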
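
As a minimal sketch of the above, the script below reads two parameters of `tcp_v4_connect` (the function listed in reference 3) at function entry. `$addr_len` is an `int` and `$sk` is a pointer, and both reach the script as 64-bit integers. The probe point and output format are illustrative choices, and the script assumes matching kernel debuginfo is installed.

```
# Sketch: read integer-like target variables at function entry.
probe kernel.function("tcp_v4_connect") {
  # $addr_len (int) and $sk (pointer) are both seen as 64-bit integers.
  printf("%s(%d) addr_len=%d sk=%p\n", execname(), pid(), $addr_len, $sk)
}
```

For a `char *` variable, wrap it instead, for example `kernel_string($name)` for kernel memory or `user_string($name)` for user memory, where `$name` is a hypothetical variable.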
  
@var("varname") is an alternative syntax for $varname. It can also be used to access global variables in a particular compile unit (CU). @var("varname@src/file.c") refers to the global (either file local or external) variable varname defined when the file src/file.c was compiled. The CU in which the variable is resolved is the first CU in the module of the probe point which matches the given file name at the end and has the shortest file name path (e.g. given @var("foo@bar/baz.c") and CUs with file name paths src/sub/module/bar/baz.c and src/bar/baz.c the second CU will be chosen to resolve foo).  
  
  
  
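To refer to a global variable in a particular CU, use the form @var("varname@src/file.c"). When several CUs match the given file name, the one with the shortest path is chosen.  
  
For example, given @var("foo@bar/baz.c") and CUs with file name paths src/sub/module/bar/baz.c and src/bar/baz.c, the second CU will be chosen to resolve foo.  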
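
A hedged sketch of the CU-qualified form follows. It reads the global `tcp_death_row`, which `tcp_v4_connect` uses in the source listed in reference 3. The file name `net/ipv4/tcp_minisocks.c` is an assumption about where this 2.6.18 tree defines that variable, so check your own kernel source before relying on it. The @var syntax also needs a SystemTap release that supports it.

```
# Sketch: read a global variable through @var with an explicit CU.
# Assumption: tcp_death_row is defined in net/ipv4/tcp_minisocks.c
# in this kernel tree; adjust the path for other kernels.
probe kernel.function("tcp_v4_connect") {
  printf("sysctl_tw_recycle=%d\n",
         @var("tcp_death_row@net/ipv4/tcp_minisocks.c")->sysctl_tw_recycle)
}
```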
  
$var->field or @var("var@file.c")->field traverses a structure's field. The indirection operator may be repeated to follow additional levels of pointers.  
  
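The -> operator accesses a field of a structure. Note that the dot (.) cannot be used for this, because in SystemTap (.) is the string concatenation operator, similar to || in SQL.  
  
So in stap, -> is used for every field access, whether or not the variable is a pointer.  
  
For example: $var->field or @var("var@file.c")->field  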
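
A small sketch of field traversal, using field names that appear in the sample `struct sock` output earlier in this article (`__sk_common`, `skc_family`, `sk_rcvbuf`). The probe point is the same illustrative `tcp_v4_connect` entry probe. Field names can differ on other kernel versions.

```
# Sketch: follow structure fields with "->".
probe kernel.function("tcp_v4_connect") {
  # __sk_common is an embedded struct rather than a pointer,
  # but "->" is still the operator to use.
  printf("family=%d rcvbuf=%d\n",
         $sk->__sk_common->skc_family, $sk->sk_rcvbuf)
}
```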
  
$var[N] or @var("var@file.c")[N] indexes into an array. The index is given with a literal number.  
  
Array elements are written as $var[N] or @var("var@file.c")[N].  
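
The following sketch indexes into an array with a literal number. It is not taken from this article's examples: it assumes the kernel has a function `wake_up_new_task` whose parameter `p` is a `struct task_struct *`, and that `task_struct` has a `comm` character array. Check both against your kernel before using it.

```
# Sketch: index an array field with a literal number.
# Assumption: wake_up_new_task(struct task_struct *p, ...) exists
# and task_struct has a char array named comm.
probe kernel.function("wake_up_new_task") {
  # $p->comm[0] is the first character of the new task's name.
  printf("first char of comm: %c\n", $p->comm[0])
}
```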
  
&$var or &@var("var@file.c") provides the address of a variable as a long. It can also be used in combination with field access or array indexing to provide the address of a particular field or an element in an array with &$var->field, &@var("var@file.c")[N] or a combination of those accessors.  
  
As in C, & takes an address: &$var or &@var("var@file.c") gives the address of a variable as a long. To take the address of a structure field or an array element, write &$var->field or &@var("var@file.c")[N].  
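
A minimal sketch of the address-of operator, again on the illustrative `tcp_v4_connect` entry probe. It prints the pointer held in `$sk` next to the address of one of its fields, `sk_rcvbuf`, a field name taken from the sample output earlier in this article.

```
# Sketch: take the address of a structure field.
probe kernel.function("tcp_v4_connect") {
  # & applies to the whole expression $sk->sk_rcvbuf.
  printf("sk=%p &sk->sk_rcvbuf=%p\n", $sk, &$sk->sk_rcvbuf)
}
```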
  
Using a single $ or a double $$ suffix provides a shallow or deep string representation of the variable data type. Using a single $, as in $var$, will provide a string that only includes the values of all basic type values of fields of the variable structure type but not any nested complex type values (which will be represented with {...}). Using a double $$, as in @var("var")$$ will provide a string that also includes all values of nested data types.  
  
Appending a single $ to a structure variable prints the values of its basic-type fields, with nested structures shown as {...}. Appending $$ also descends into fields that are themselves structures, until every basic-type value has been printed. For example: $var$, @var("var")$$  
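
To compare the two suffixes side by side, a sketch along these lines prints both forms of the `sk` parameter. The probe point is an illustrative choice. Long results are cut off at SystemTap's string length limit (MAXSTRINGLEN), so the deep form of a large structure such as `struct sock` may not print in full.

```
# Sketch: shallow ($) versus deep ($$) pretty-printing.
probe kernel.function("tcp_v4_connect") {
  printf("shallow: %s\n", $sk$)   # nested structs appear as {...}
  printf("deep:    %s\n", $sk$$)  # nested structs are expanded
}
```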
  
$$vars expands to a character string that is equivalent to sprintf("parm1=%x ... parmN=%x var1=%x ... varN=%x", $parm1, ..., $parmN, $var1, ..., $varN)  
  
$$vars prints the names and values of all local variables and parameters.  
  
$$locals expands to a character string that is equivalent to sprintf("var1=%x ... varN=%x", $var1, ..., $varN)  
  
$$locals prints the names and values of all local variables.  
  
$$parms expands to a character string that is equivalent to sprintf("parm1=%x ... parmN=%x", $parm1, ..., $parmN)  
  
$$parms prints the names and values of all parameters.  
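
The three variables can be tried with a sketch like the one below. It prints `$$parms` at function entry and `$$locals` and `$$vars` at line 262 of `net/ipv4/tcp_ipv4.c`, the statement that appears in the sample output earlier in this article. The line number is specific to that kernel source tree. A variable whose location is not available at the probed statement is printed as `?`, as in that sample output.

```
# Sketch: dump parameters at entry, locals and all variables at a statement.
probe kernel.function("tcp_v4_connect") {
  printf("%s parms: %s\n", pp(), $$parms)
}

# The line number below matches the 2.6.18-348.12.1.el5 source only.
probe kernel.statement("tcp_v4_connect@net/ipv4/tcp_ipv4.c:262") {
  printf("%s locals: %s\n", pp(), $$locals)
  printf("%s vars: %s\n", pp(), $$vars)
}
```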

References

1. https://sourceware.org/systemtap/langref/Probe_points.html

2. https://sourceware.org/systemtap/tapsets/

3. /usr/src/debug/kernel-2.6.18/linux-2.6.18-348.12.1.el5.x86_64/net/ipv4/tcp_ipv4.c

   156  /* This will initiate an outgoing connection. */  
   157  int tcp_v4_connect(struct sock *sk, struct sockaddr *uaddr, int addr_len)  
   158  {  
   159          struct inet_sock *inet = inet_sk(sk);  
   160          struct tcp_sock *tp = tcp_sk(sk);  
   161          struct sockaddr_in *usin = (struct sockaddr_in *)uaddr;  
   162          struct rtable *rt;  
   163          u32 daddr, nexthop;  
   164          int tmp;  
   165          int err;  
   166          struct ip_options *inet_opt;  
   167    
   168          if (addr_len < sizeof(struct sockaddr_in))  
   169                  return -EINVAL;  
   170    
   171          if (usin->sin_family != AF_INET)  
   172                  return -EAFNOSUPPORT;  
   173    
   174          nexthop = daddr = usin->sin_addr.s_addr;  
   175          inet_opt = rcu_dereference(inet->opt);  
   176          if (inet_opt && inet_opt->srr) {  
   177                  if (!daddr)  
   178                          return -EINVAL;  
   179                  nexthop = inet_opt->faddr;  
   180          }  
   181    
   182          tmp = ip_route_connect(&rt, nexthop, inet->saddr,  
   183                                 RT_CONN_FLAGS(sk), sk->sk_bound_dev_if,  
   184                                 IPPROTO_TCP,  
   185                                 inet->sport, usin->sin_port, sk, 1);  
   186          if (tmp < 0) {  
   187                  if (tmp == -ENETUNREACH)  
   188                          IP_INC_STATS_BH(IPSTATS_MIB_OUTNOROUTES);  
   189                  return tmp;  
   190          }  
   191    
   192          if (rt->rt_flags & (RTCF_MULTICAST | RTCF_BROADCAST)) {  
   193                  ip_rt_put(rt);  
   194                  return -ENETUNREACH;  
   195          }  
   196    
   197          if (!inet_opt || !inet_opt->srr)  
   198                  daddr = rt->rt_dst;  
   199    
   200          if (!inet->saddr)  
   201                  inet->saddr = rt->rt_src;  
   202          inet->rcv_saddr = inet->saddr;  
   203    
   204          if (tp->rx_opt.ts_recent_stamp && inet->daddr != daddr) {  
   205                  /* Reset inherited state */  
   206                  tp->rx_opt.ts_recent       = 0;  
   207                  tp->rx_opt.ts_recent_stamp = 0;  
   208                  tp->write_seq              = 0;  
   209          }  
   210    
   211          if (tcp_death_row.sysctl_tw_recycle &&  
   212              !tp->rx_opt.ts_recent_stamp && rt->rt_dst == daddr) {  
   213                  struct inet_peer *peer = rt_get_peer(rt);  
   214    
   215                  /* VJ's idea. We save last timestamp seen from  
   216                   * the destination in peer table, when entering state TIME-WAIT  
   217                   * and initialize rx_opt.ts_recent from it, when trying new connection.  
   218                   */  
   219    
   220                  if (peer && peer->tcp_ts_stamp + TCP_PAWS_MSL >= xtime.tv_sec) {  
   221                          tp->rx_opt.ts_recent_stamp = peer->tcp_ts_stamp;  
   222                          tp->rx_opt.ts_recent = peer->tcp_ts;  
   223                  }  
   224          }  
   225    
   226          inet->dport = usin->sin_port;  
   227          inet->daddr = daddr;  
   228    
   229          inet_csk(sk)->icsk_ext_hdr_len = 0;  
   230          if (inet_opt)  
   231                  inet_csk(sk)->icsk_ext_hdr_len = inet_opt->optlen;  
   232    
   233          tp->rx_opt.mss_clamp = 536;  
   234    
   235          /* Socket identity is still unknown (sport may be zero).  
   236           * However we set state to SYN-SENT and not releasing socket  
   237           * lock select source port, enter ourselves into the hash tables and  
   238           * complete initialization after this.  
   239           */  
   240          tcp_set_state(sk, TCP_SYN_SENT);  
   241          err = inet_hash_connect(&tcp_death_row, sk);  
   242          if (err)  
   243                  goto failure;  
   244    
   245          err = ip_route_newports(&rt, IPPROTO_TCP, inet->sport, inet->dport, sk);  
   246          if (err)  
   247                  goto failure;  
   248    
   249          /* OK, now commit destination to socket.  */  
   250          sk->sk_gso_type = SKB_GSO_TCPV4;  
   251          sk_setup_caps(sk, &rt->u.dst);  
   252    
   253          if (!tp->write_seq)  
   254                  tp->write_seq = secure_tcp_sequence_number(inet->saddr,  
   255                                                             inet->daddr,  
   256                                                             inet->sport,  
   257                                                             usin->sin_port);  
   258    
   259          inet->id = tp->write_seq ^ jiffies;  
   260    
   261          err = tcp_connect(sk);  
   262          rt = NULL;  
   263          if (err)  
   264                  goto failure;  
   265    
   266          return 0;  
   267    
   268  failure:  
   269          /* This unhashes the socket and releases the local port, if necessary. */  
   270          tcp_set_state(sk, TCP_CLOSE);  
   271          ip_rt_put(rt);  
   272          sk->sk_route_caps = 0;  
   273          inet->dport = 0;  
   274          return err;  
   275  }  
