stop loading type information of obfuscated dependencies #456

mvdan · 2022-01-10T12:04:47Z

If package P1 imports package P2, P1 needs to know which names from P2 weren't obfuscated. For instance, if P2 declares T2 and does reflect.TypeOf(T2{...}), then P2 won't obfuscate the name T2, and neither should P1.
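To illustrate why such a name has to survive, here is a minimal sketch of how reflection exposes a declared type name at run time. The `typeName` helper is hypothetical and only stands in for whatever P2 does with the result of `reflect.TypeOf`; it says nothing about how garble detects these uses.

```go
package main

import (
	"fmt"
	"reflect"
)

// T2 stands in for the type declared by package P2 in the example above.
type T2 struct {
	Field int
}

// typeName is a hypothetical helper: it returns the declared name of v's
// type as seen through reflection.
func typeName(v interface{}) string {
	return reflect.TypeOf(v).Name()
}

func main() {
	// The declared name is observable by the program itself, so renaming
	// T2 during obfuscation could change the program's behavior.
	fmt.Println(typeName(T2{})) // prints "T2"
}
```

Any code that compares, prints, or serializes this string depends on the original name, which is why both P2 and its importers need to agree on leaving it alone.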

This kind of information should flow from P2 to P1, as P2 builds before P1 thanks to the dependency graph. The way this is currently done is via obfuscatedTypesPackage; P1 loads the type information of the obfuscated version of P2, and does a lookup for T2. If T2 exists, then it wasn't obfuscated.
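The lookup described above can be sketched with go/types. This is only an approximation under stated assumptions: the real code loads type information from compiled export data, whereas this sketch type-checks a made-up source string, and `loadPackage`, `wasObfuscated`, and the renamed identifier `zQ8x1` are all hypothetical.

```go
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
	"go/types"
)

// obfuscatedP2 is a made-up stand-in for P2 after obfuscation: T2 kept its
// name, while another declared type was renamed.
const obfuscatedP2 = `package p2

type T2 struct{}

type zQ8x1 struct{}
`

// loadPackage type-checks a single source file and returns its package.
// Assumption: the source has no imports, so no importer is configured.
func loadPackage(path, src string) (*types.Package, error) {
	fset := token.NewFileSet()
	file, err := parser.ParseFile(fset, path+".go", src, 0)
	if err != nil {
		return nil, err
	}
	var conf types.Config
	return conf.Check(path, fset, []*ast.File{file}, nil)
}

// wasObfuscated reports whether name is absent from the package scope of
// the obfuscated package, meaning the original name did not survive.
func wasObfuscated(pkg *types.Package, name string) bool {
	return pkg.Scope().Lookup(name) == nil
}

func main() {
	pkg, err := loadPackage("p2", obfuscatedP2)
	if err != nil {
		panic(err)
	}
	fmt.Println(wasObfuscated(pkg, "T2")) // false: T2 still exists
	fmt.Println(wasObfuscated(pkg, "T1")) // true: no such name in scope
}
```

Note that answering a single yes/no question requires building the whole `*types.Package`, which is the cost the rest of this issue is about.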

This mechanism has served us well for some time, but it has some significant downsides:

- It wastes resources. We load the type information for the entire dependency package, when in practice we just need very small bits of information. I estimate this matters in terms of CPU and memory cost, more than anything.
- It's complex. Note how we need KnownObjectFiles for the sake of indirect dependencies, because importcfg files do not contain those object files to feed to the go/types importer.
- It makes our code harder to understand. We definitely need the type information for the original code, to inspect it and obfuscate it well. Loading the obfuscated type information as well makes the code rather hard to grasp, as now we have "original type info" and "obfuscated type info".

I propose a different mechanism; following our previous example with P1 and P2:

- When we decide not to obfuscate T2 while building P2, we record that in its cachedOutput gob file, much like we do with KnownObjectFiles today.
- The format of that entry can be SkippedNames map[string]bool, where the key is T2's fully qualified name, P2.T2.
- P1 loads P2's cachedOutput file, and thus knows that it shouldn't obfuscate T2.
- Just like KnownObjectFiles, SkippedNames would accumulate with dependencies - P1's cachedOutput file would include P2.T2. This way, we could get rid of KnownObjectFiles entirely, as we wouldn't need to load cached output files from indirect deps.
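The proposal could look roughly like the sketch below. It assumes a trimmed-down `cachedOutput` struct containing only the proposed `SkippedNames` field; the `buildOutput`, `encode`, and `decode` helpers are hypothetical, and in-memory byte slices stand in for the gob files in the build cache.

```go
package main

import (
	"bytes"
	"encoding/gob"
	"fmt"
)

// cachedOutput is a hypothetical, trimmed-down cache entry; only the
// proposed SkippedNames field is modeled here.
type cachedOutput struct {
	// SkippedNames is keyed by fully qualified name, such as "P2.T2".
	SkippedNames map[string]bool
}

// buildOutput creates a package's cache entry from the names it skipped
// itself plus everything recorded by its direct dependencies, so that
// entries accumulate along the import graph.
func buildOutput(own []string, deps ...cachedOutput) cachedOutput {
	out := cachedOutput{SkippedNames: make(map[string]bool)}
	for _, dep := range deps {
		for name := range dep.SkippedNames {
			out.SkippedNames[name] = true
		}
	}
	for _, name := range own {
		out.SkippedNames[name] = true
	}
	return out
}

// encode serializes a cache entry with gob.
func encode(out cachedOutput) ([]byte, error) {
	var buf bytes.Buffer
	if err := gob.NewEncoder(&buf).Encode(out); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

// decode reads a cache entry previously written by encode.
func decode(data []byte) (cachedOutput, error) {
	var out cachedOutput
	err := gob.NewDecoder(bytes.NewReader(data)).Decode(&out)
	return out, err
}

func main() {
	// P2 builds first and records that T2 was not obfuscated.
	p2Data, err := encode(buildOutput([]string{"P2.T2"}))
	if err != nil {
		panic(err)
	}

	// P1 loads only P2's entry and folds it into its own.
	p2, err := decode(p2Data)
	if err != nil {
		panic(err)
	}
	p1 := buildOutput(nil, p2)

	fmt.Println(p1.SkippedNames["P2.T2"]) // true
	fmt.Println(p1.SkippedNames["P2.T3"]) // false
}
```

Because P1's entry already contains P2.T2, a package importing only P1 would learn about T2 without opening P2's entry, which is the property that would make KnownObjectFiles unnecessary.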

One disadvantage of this mechanism is that garble would use more space in the build cache, as SkippedNames would be an added extra. On the plus side, we shouldn't be skipping names for obfuscation very often; I expect the number of names to be in the order of dozens per package on average.

It should solve the original three problems with type information, but especially the "wastes resources" one - we would be loading far less information from disk.


If package P1 imports package P2, P1 needs to know which names from P2 weren't obfuscated. For instance, if P2 declares T2 and does "reflect.TypeOf(T2{...})", then P2 won't obfuscate the name T2, and neither should P1.

This information should flow from P2 to P1, as P2 builds before P1. We do this via obfuscatedTypesPackage; P1 loads the type information of the obfuscated version of P2, and does a lookup for T2. If T2 exists, then it wasn't obfuscated.

This mechanism has served us well, but it has downsides:

1) It wastes CPU; we load the type information for the entire package.

2) It's complex; for instance, we need KnownObjectFiles as an extra.

3) It makes our code harder to understand, as we load both the original and obfuscated type information.

Instead, we now have each package record what names were not obfuscated as part of its cachedOutput file. Much like KnownObjectFiles, the map accumulates incrementally through the import graph, to avoid having to load cachedOutput files for indirect dependencies.

We shouldn't need to worry about those maps getting large; we only skip obfuscating declared names in a few uncommon scenarios, such as the use of reflection or cgo's "//export".

Since go/types is relatively allocation-heavy, and the export files contain a lot of data, we get a nice speed-up:

    name      old time/op         new time/op         delta
    Build-16          11.5s ± 2%          11.1s ± 3%  -3.77%  (p=0.008 n=5+5)

    name      old bin-B           new bin-B           delta
    Build-16          5.15M ± 0%          5.15M ± 0%    ~     (all equal)

    name      old cached-time/op  new cached-time/op  delta
    Build-16          375ms ± 3%          341ms ± 6%  -8.96%  (p=0.008 n=5+5)

    name      old sys-time/op     new sys-time/op     delta
    Build-16          283ms ±17%          289ms ±13%    ~     (p=0.841 n=5+5)

    name      old user-time/op    new user-time/op    delta
    Build-16          687ms ± 6%          664ms ± 7%    ~     (p=0.548 n=5+5)

Fixes burrowers#456.
Updates burrowers#475.


mvdan added the enhancement label Jan 10, 2022

mvdan self-assigned this Jan 21, 2022

This was referenced Jan 22, 2022

redesign benchmark to be more useful and realistic #472

Merged

look into garble functions showing up in "perf report" #475

Open

mvdan mentioned this issue Jan 27, 2022

stop loading obfuscated type information from deps #476

Merged

lu4p closed this as completed in #476 Jan 31, 2022

mvdan mentioned this issue Feb 3, 2022

Obfuscation Fails on inet.af/netstack/tcpip/transport/tcp #410

Closed
