btf: use recursion #1397

Merged
lmb merged 4 commits into cilium:main from btf-recursion on Mar 27, 2024

@lmb lmb commented Mar 26, 2024

Reduce iterator complexity in the btf package by using recursion. If you have played with the rangefunc proposal you will realise that visitPostorder, etc. can be turned into an iter.Seq with a few lines of code. The reason to make this change now (rather than waiting) is that the recursive implementation is easier to reason about and about as fast:

core: 1
goos: linux
goarch: amd64
pkg: github.com/cilium/ebpf/btf
cpu: 12th Gen Intel(R) Core(TM) i7-1260P
                                       │  base.txt   │            recursive.txt             │
                                       │   sec/op    │   sec/op     vs base                 │
ParseVmlinux                             64.06m ± 2%   64.43m ± 2%        ~ (p=0.394 n=6)
SpecCopy                                 353.2n ± 1%   270.3n ± 2%  -23.46% (p=0.002 n=6)
SpecTypeByID                             35.99n ± 0%   35.98n ± 0%   -0.03% (p=0.032 n=6)
CORESkBuff/byte_off                      1.663µ ± 0%   1.670µ ± 1%        ~ (p=0.255 n=6)
CORESkBuff/byte_sz                       1.684µ ± 0%   1.699µ ± 0%   +0.89% (p=0.002 n=6)
CORESkBuff/field_exists                  1.660µ ± 0%   1.673µ ± 0%   +0.75% (p=0.002 n=6)
CORESkBuff/signed                        1.676µ ± 0%   1.684µ ± 0%   +0.45% (p=0.002 n=6)
CORESkBuff/lshift_u64                    1.683µ ± 1%   1.691µ ± 1%        ~ (p=0.104 n=6)
CORESkBuff/rshift_u64                    1.682µ ± 0%   1.686µ ± 1%        ~ (p=0.058 n=6)
CORESkBuff/local_type_id                 123.2n ± 1%   122.6n ± 1%   -0.53% (p=0.013 n=6)
CORESkBuff/target_type_id                750.8n ± 1%   776.9n ± 0%   +3.49% (p=0.002 n=6)
CORESkBuff/type_exists                   755.5n ± 1%   779.5n ± 1%   +3.18% (p=0.002 n=6)
CORESkBuff/type_size                     777.9n ± 1%   814.1n ± 1%   +4.65% (p=0.002 n=6)
CORESkBuff/enumval_exists                744.4n ± 1%   780.1n ± 1%   +4.80% (p=0.002 n=6)
CORESkBuff/enumval_value                 744.3n ± 1%   774.4n ± 2%   +4.04% (p=0.002 n=6)
Marshaler                                13.64m ± 0%   11.73m ± 0%  -14.01% (p=0.002 n=6)
BuildVmlinux                             122.3m ± 2%   129.9m ± 1%   +6.25% (p=0.002 n=6)
StringTableZeroLookup                    2.885n ± 0%   2.885n ± 0%        ~ (p=0.574 n=6)
PostorderTraversal/single_type           32.20n ± 1%   39.69n ± 0%  +23.24% (p=0.002 n=6)
PostorderTraversal/cycle(1)              565.6n ± 1%   112.3n ± 0%  -80.14% (p=0.002 n=6)
PostorderTraversal/cycle(10)             2.730µ ± 1%   1.578µ ± 1%  -42.22% (p=0.002 n=6)
PostorderTraversal/gov_update_cpu_data   2.001m ± 1%   1.795m ± 1%  -10.32% (p=0.002 n=6)
PreorderTraversal/single_type            36.87n ± 0%
PreorderTraversal/cycle(1)               103.8n ± 1%
PreorderTraversal/cycle(10)              1.926µ ± 0%
PreorderTraversal/gov_update_cpu_data    2.181m ± 1%
Copy                                     5.084µ ± 1%   3.291µ ± 2%  -35.28% (p=0.002 n=6)
Walk/Void                                4.327n ± 1%   4.337n ± 0%        ~ (p=0.061 n=6)
Walk/Int[unsigned_size=0]                4.327n ± 0%   4.340n ± 0%   +0.31% (p=0.002 n=6)
Walk/Pointer[target=<nil>]               109.9n ± 2%   110.7n ± 2%        ~ (p=0.327 n=6)
Walk/Array[index=<nil>_type=<nil>_n=0]   119.2n ± 1%   118.0n ± 1%        ~ (p=0.054 n=6)
Walk/Struct[fields=2]                    122.5n ± 1%   122.4n ± 2%        ~ (p=0.970 n=6)
Walk/Union[fields=2]                     121.1n ± 1%   123.0n ± 2%   +1.57% (p=0.050 n=6)
Walk/Enum[size=0_values=0]               4.327n ± 0%   4.343n ± 0%   +0.35% (p=0.006 n=6)
Walk/Fwd[struct]                         4.340n ± 0%   4.340n ± 1%        ~ (p=0.333 n=6)
Walk/Typedef[<nil>]                      111.9n ± 4%   111.3n ± 1%        ~ (p=0.699 n=6)
Walk/Volatile[<nil>]                     110.8n ± 3%   110.5n ± 2%        ~ (p=0.974 n=6)
Walk/Const[<nil>]                        108.8n ± 2%   110.4n ± 2%        ~ (p=0.121 n=6)
Walk/Restrict[<nil>]                     110.9n ± 1%   110.4n ± 2%        ~ (p=0.671 n=6)
Walk/Func[static_proto=<nil>]            112.7n ± 2%   109.7n ± 2%   -2.66% (p=0.015 n=6)
Walk/FuncProto[args=2_return=<nil>]      136.9n ± 3%   128.6n ± 1%   -6.10% (p=0.002 n=6)
Walk/Var[static]                         114.4n ± 2%   107.5n ± 3%   -5.99% (p=0.002 n=6)
Walk/Datasec                             127.7n ± 2%   118.6n ± 1%   -7.09% (p=0.002 n=6)
UnderlyingType/no_unwrapping             3.384n ± 1%   3.380n ± 0%        ~ (p=0.221 n=6)
UnderlyingType/single_unwrapping         4.817n ± 1%   4.812n ± 0%   -0.08% (p=0.006 n=6)
geomean                                  538.4n        442.0n        -6.67%               ¹
¹ benchmark set differs from baseline; geomeans may not be comparable
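
To illustrate the claim in the description that a recursive visitor converts into an iter.Seq with a few lines of code, here is a minimal sketch. It is not the code from this PR: the node type, the seq type, and the postorder and collect helpers are hypothetical stand-ins, and seq is declared locally with the same shape as iter.Seq so the example does not depend on a Go release that ships the iter package.

```go
package main

import "fmt"

// node is a hypothetical stand-in for a BTF type: a name plus child edges.
type node struct {
	name     string
	children []*node
}

// seq has the same shape as iter.Seq[*node] from the rangefunc proposal.
// It is declared locally so the sketch compiles without the iter package.
type seq func(yield func(*node) bool)

// visitPostorder yields every node reachable from root after its children.
// The visited set makes the walk terminate on cyclic graphs.
// It returns false if yield asked to stop early.
func visitPostorder(root *node, visited map[*node]struct{}, yield func(*node) bool) bool {
	if _, ok := visited[root]; ok {
		return true
	}
	visited[root] = struct{}{}

	for _, child := range root.children {
		if !visitPostorder(child, visited, yield) {
			return false
		}
	}
	return yield(root)
}

// postorder adapts the recursive visitor into a seq.
func postorder(root *node) seq {
	return func(yield func(*node) bool) {
		visitPostorder(root, make(map[*node]struct{}), yield)
	}
}

// collect drains a seq into a slice of node names.
func collect(s seq) []string {
	var names []string
	s(func(n *node) bool {
		names = append(names, n.name)
		return true
	})
	return names
}

func main() {
	// a -> b -> c -> a forms a cycle; a also points at d.
	a := &node{name: "a"}
	b := &node{name: "b"}
	c := &node{name: "c"}
	d := &node{name: "d"}
	a.children = []*node{b, d}
	b.children = []*node{c}
	c.children = []*node{a}

	fmt.Println(collect(postorder(a))) // prints [c b d a]
}
```

The adapter is a single closure: all of the traversal logic stays in the recursive function, which is what makes the recursive form easy to reason about.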

Commit messages below.


btf: export As and remove Transformer

BTF encodes C language constructs such as const, restrict, etc. as separate
types. In most cases we don't care about these when traversing a type graph.
Users can strip these qualifiers by invoking Copy() with a
"transformer" aka UnderlyingType(). Unfortunately, copying is quite 
expensive.

For this reason we changed CO-RE relocations to use an unexported as() 
function that can be used to "unwrap" a type instead of copying it. Think of
it as a generic version of UnderlyingType. This is much faster than copying
and turns out to be quite ergonomic as well, since we often have to assert a
type anyways.

Export As() so that external users can benefit from it. Usage is like so:

    foo, ok := As[*Int](typ)
    if !ok {
        panic("not an Int")
    }

Remove the Transformer type and the extra argument from Copy, since As is
the better way to deal with qualifiers and typedefs. This allows further
simplifying Copy in a follow-up commit.
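
The "unwrap instead of copy" idea can be sketched on a toy type model. Everything below is an assumption made for illustration: the Type interface, the Int, Const and Typedef structs, the depth bound, and this As function are simplified stand-ins and not the btf package's implementation.

```go
package main

import "fmt"

// Type is a toy stand-in for a BTF type.
type Type interface{ typeName() string }

// Int is a toy leaf type.
type Int struct{ Name string }

// Const is a toy qualifier wrapping another type.
type Const struct{ Type Type }

// Typedef is a toy alias wrapping another type.
type Typedef struct {
	Name string
	Type Type
}

func (i *Int) typeName() string     { return i.Name }
func (c *Const) typeName() string   { return "const" }
func (t *Typedef) typeName() string { return t.Name }

// maxDepth is an arbitrary bound chosen for this sketch so that a
// self-referential chain of wrappers cannot loop forever.
const maxDepth = 32

// As peels qualifiers and typedefs off typ until it finds a T.
// Nothing is copied: the returned value points into the original graph.
func As[T Type](typ Type) (T, bool) {
	var zero T
	for depth := 0; depth < maxDepth; depth++ {
		if t, ok := typ.(T); ok {
			return t, true
		}
		switch v := typ.(type) {
		case *Const:
			typ = v.Type
		case *Typedef:
			typ = v.Type
		default:
			return zero, false
		}
	}
	return zero, false
}

func main() {
	// typedef u32 -> const -> unsigned int
	typ := &Typedef{Name: "u32", Type: &Const{Type: &Int{Name: "unsigned int"}}}

	if i, ok := As[*Int](typ); ok {
		fmt.Println(i.Name) // prints unsigned int
	}
}
```

Because the assertion and the unwrapping happen in one call, the caller gets a concrete type back without a separate type switch.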

Signed-off-by: Lorenz Bauer <lmb@isovalent.com>

btf: rename walkType to children

children is a much better name than walkType, so let's use that.

Signed-off-by: Lorenz Bauer <lmb@isovalent.com>

btf: replace modifyGraphPreorder with recursion

Rewrite Copy() to use recursion instead of a manually maintained stack.

Signed-off-by: Lorenz Bauer <lmb@isovalent.com>
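
As a rough picture of what replacing a manually maintained stack with recursion looks like for a graph copy, here is a minimal sketch. The node type and copyNode function are hypothetical and much simpler than BTF types; the map of already copied nodes is the assumption that keeps shared nodes shared and lets cycles terminate.

```go
package main

import "fmt"

// node is a hypothetical graph node standing in for a BTF type.
type node struct {
	name     string
	children []*node
}

// copyNode returns a deep copy of n. Copies already recorded in the map
// are reused, so shared nodes stay shared and cycles terminate.
func copyNode(n *node, copies map[*node]*node) *node {
	if n == nil {
		return nil
	}
	if cpy, ok := copies[n]; ok {
		return cpy
	}

	cpy := &node{name: n.name}
	// Register the copy before recursing so that a cycle back to n
	// resolves to cpy instead of recursing forever.
	copies[n] = cpy

	for _, child := range n.children {
		cpy.children = append(cpy.children, copyNode(child, copies))
	}
	return cpy
}

func main() {
	// a -> b -> a forms a cycle.
	a := &node{name: "a"}
	b := &node{name: "b", children: []*node{a}}
	a.children = []*node{b}

	cpy := copyNode(a, make(map[*node]*node))

	// The copy is a distinct node, and the cycle points back at the copy.
	fmt.Println(cpy != a, cpy.children[0].children[0] == cpy) // prints true true
}
```

The call stack plays the role of the explicit stack, so the only bookkeeping left is the map from originals to copies.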

btf: replace postorderIterator with recursion

Throw out the painstakingly hand-optimized postorder iterator in favour of a
simple recursive function. Turns out this can be a lot faster for larger
types, probably because the visited map can be allocated on the stack.

There is a small hit to very simple types (about +23% on single_type), but it
doesn't seem to affect overall benchmarks too much.

    core: 1
    goos: linux
    goarch: amd64
    pkg: github.com/cilium/ebpf/btf
    cpu: 12th Gen Intel(R) Core(TM) i7-1260P
                                            │ base-po.txt │           recursive.txt            │
                                            │   sec/op    │   sec/op     vs base               │
    PostorderTraversal/single_type           32.22n ± 0%   39.69n ± 0%  +23.18% (p=0.002 n=6)
    PostorderTraversal/cycle(1)              576.3n ± 3%   112.3n ± 0%  -80.51% (p=0.002 n=6)
    PostorderTraversal/cycle(10)             2.807µ ± 1%   1.578µ ± 1%  -43.80% (p=0.002 n=6)
    PostorderTraversal/gov_update_cpu_data   2.039m ± 1%   1.795m ± 1%  -11.97% (p=0.002 n=6)
    geomean                                  3.211µ        1.885µ       -41.30%

                                            │  base-po.txt   │              recursive.txt               │
                                            │      B/op      │     B/op      vs base                    │
    PostorderTraversal/single_type             0.000 ± 0%       0.000 ± 0%         ~ (p=1.000 n=6) ¹
    PostorderTraversal/cycle(1)                264.0 ± 0%         0.0 ± 0%  -100.00% (p=0.002 n=6)
    PostorderTraversal/cycle(10)               716.5 ± 0%       326.0 ± 0%   -54.50% (p=0.002 n=6)
    PostorderTraversal/gov_update_cpu_data   345.1Ki ± 0%     334.8Ki ± 0%    -2.98% (p=0.002 n=6)
    geomean                                               ²                 ?                      ² ³
    ¹ all samples are equal
    ² summaries must be >0 to compute geomean
    ³ ratios must be >0 to compute geomean

                                            │ base-po.txt  │             recursive.txt              │
                                            │  allocs/op   │ allocs/op   vs base                    │
    PostorderTraversal/single_type           0.000 ± 0%     0.000 ± 0%         ~ (p=1.000 n=6) ¹
    PostorderTraversal/cycle(1)              4.000 ± 0%     0.000 ± 0%  -100.00% (p=0.002 n=6)
    PostorderTraversal/cycle(10)             7.000 ± 0%     1.000 ± 0%   -85.71% (p=0.002 n=6)
    PostorderTraversal/gov_update_cpu_data   132.0 ± 1%     115.0 ± 1%   -12.88% (p=0.002 n=6)
    geomean                                             ²               ?                      ² ³
    ¹ all samples are equal
    ² summaries must be >0 to compute geomean
    ³ ratios must be >0 to compute geomean

Signed-off-by: Lorenz Bauer <lmb@isovalent.com>


lmb commented Mar 26, 2024

TODO:

  • Check for users of btf.Copy with non-nil parameter.
  • Cut release before merging?


lmb commented Mar 27, 2024

Looking at Sourcegraph makes me think nobody calls Copy: https://sourcegraph.com/search?q=context:global+lang:go+%22btf.Copy%22&patternType=keyword&sm=0

@lmb lmb marked this pull request as ready for review March 27, 2024 09:59
@lmb lmb requested review from dylandreimerink and a team as code owners March 27, 2024 09:59

@dylandreimerink dylandreimerink left a comment

Looks good overall, just the one doc comment

btf/types.go (outdated, resolved)
lmb added 4 commits March 27, 2024 14:09
@lmb lmb dismissed dylandreimerink’s stale review March 27, 2024 14:12

Made the doc change.

@lmb lmb merged commit f631fcc into cilium:main Mar 27, 2024
15 checks passed
@lmb lmb added the breaking-change Changes exported API label Mar 27, 2024
@lmb lmb deleted the btf-recursion branch March 27, 2024 14:23