libselinux: rework selabel_file(5) database
Currently the database for the file backend of selabel stores the file context specifications in a single long array. This array is sorted by special precedence rules, e.g. regular expressions without meta characters first, ordered by length, and the remaining regular expressions ordered by stem (the prefix part of the regular expressions without meta characters) length.

This results in suboptimal lookup performance for two reasons:

- File context specifications without any meta characters (e.g. '/etc/passwd') are still matched via an expensive regular expression match operation.
- All such trivial regular expressions are matched against before any non-trivial regular expression, resulting in thousands of regex match operations for lookups of paths not matching any of the trivial ones.

Rework the internal representation of the database in two ways:

- Convert regular expressions without any meta characters and containing only supported escaped characters (e.g. '/etc/rc\.d/init\.d') into literal strings, which get compared via strcmp(3) later on.
- Store the specifications in a tree structure (since the filesystem is a tree) to reduce the number of specifications that need to be checked.

Since the internal representation is completely rewritten, introduce a new compiled file context file format mirroring the tree structure. The new format also stores all multi-byte data in network byte-order, so that such compiled files can be cross-compiled, e.g. for embedded devices with read-only filesystems (except for the regular expressions, which are still architecture-dependent).

The improved lookup performance will also benefit SELinux-aware daemons, which create files with their default context, e.g. systemd.
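As an illustration of the first change, here is a minimal sketch of how a specification without meta characters could be turned into a literal string. It is written in Python for brevity and is not the libselinux implementation (which is C). The function name `regex_to_literal` and the `META_CHARS` set are hypothetical, and the set of escapes libselinux actually supports may differ.

```python
from typing import Optional

# Characters treated as regex meta characters in this sketch. This set is an
# assumption; the characters recognized by libselinux may differ.
META_CHARS = set(".^$?*+|[](){}")


def regex_to_literal(spec: str) -> Optional[str]:
    """Return the literal path a spec matches, or None if a regex is needed."""
    out = []
    i = 0
    while i < len(spec):
        ch = spec[i]
        if ch == "\\":
            # Only an escaped meta character (e.g. '\.') can be unescaped.
            # Anything else (e.g. '\d', or a trailing backslash) is kept as
            # a regular expression to stay on the safe side.
            if i + 1 >= len(spec) or spec[i + 1] not in META_CHARS:
                return None
            out.append(spec[i + 1])
            i += 2
        elif ch in META_CHARS:
            # Unescaped meta character: this is a real regular expression.
            return None
        else:
            out.append(ch)
            i += 1
    return "".join(out)
```

A specification for which this returns a string can be compared with plain string equality; one for which it returns `None` still needs the regex engine.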
# Performance data

## Compiled file context sizes

Fedora 38 (regular expressions are omitted on Fedora):
    file_contexts.bin:            596783 -> 575284  (bytes)
    file_contexts.homedirs.bin:    21219 -> 18185   (bytes)

Debian Sid (regular expressions are included):
    file_contexts.bin:           2580704 -> 1428354 (bytes)
    file_contexts.homedirs.bin:   130946 -> 96884   (bytes)

## Single lookup (selabel -b file -k /bin/bash)

Fedora 38 in VM:
    text:
        time:       3.6 ms -> 4.7 ms
        peak heap:  2.32M -> 1.44M
        peak rss:   5.61M -> 6.03M
    compiled:
        time:       1.5 ms -> 1.5 ms
        peak heap:  2.14M -> 917.93K
        peak rss:   5.33M -> 5.47M

Debian Sid on Raspberry Pi 3:
    text:
        time:       33.9 ms -> 19.9 ms
        peak heap:  10.46M -> 468.72K
        peak rss:   9.44M -> 4.98M
    compiled:
        time:       39.3 ms -> 22.8 ms
        peak heap:  13.09M -> 1.86M
        peak rss:   12.57M -> 7.86M

## Full filesystem relabel (restorecon -vRn /)

Fedora 38 in VM:
    27.445 s -> 3.293 s
Debian Sid on Raspberry Pi 3:
    86.734 s -> 10.810 s

(restorecon -vRn -T0 /)

Fedora 38 in VM (8 cores):
    29.205 s -> 2.521 s
Debian Sid on Raspberry Pi 3 (4 cores):
    46.974 s -> 10.728 s

(note: I am unsure why the parallel runs on Fedora are slower)

# TODO

There might be subtle differences in lookup results which evaded my testing, because some precedence rules are oblique. For example `/usr/(.*/)?lib(/.*)?` has to have a higher precedence than `/usr/(.*/)?bin(/.*)?` to match the current Fedora behavior. Please report any behavior changes.

If any code section is unclear I am happy to add some inline comments.

The maximum node depth in the database is set to 3, which seems to give the best performance to memory usage ratio. Might be tweaked for systems with different filesystem hierarchies (Android?).

I am not that familiar with the selabel_partial_match(3), selabel_get_digests_all_partial_matches(3) and selabel_hash_all_partial_matches(3) related interfaces, so I only did some rudimentary tests for them.

Signed-off-by: Christian Göttsche <cgzones@googlemail.com>
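To make the tree idea and the maximum node depth of 3 more concrete, here is a minimal sketch, written in Python for brevity, of how specifications could be bucketed by their leading meta-character-free directory components. It is not the libselinux data structure. `Node`, `stem_components`, `insert` and `candidates` are hypothetical names, and returning the deepest node's specifications first is an assumption of this sketch, not the precedence rule the commit implements.

```python
from dataclasses import dataclass, field
from typing import Dict, List

MAX_DEPTH = 3  # depth quoted in the commit message
# Assumed meta character set; a backslash also stops the stem in this sketch.
META_CHARS = set(".^$?*+|[](){}\\")


@dataclass
class Node:
    children: Dict[str, "Node"] = field(default_factory=dict)
    specs: List[str] = field(default_factory=list)


def stem_components(spec: str) -> List[str]:
    """Leading directory components of a spec without meta characters."""
    parts = spec.split("/")[1:]  # drop the empty string before the first '/'
    stem: List[str] = []
    # The last component is the file name part and never becomes a node.
    for part in parts[:-1]:
        if len(stem) == MAX_DEPTH or any(c in META_CHARS for c in part):
            break
        stem.append(part)
    return stem


def insert(root: Node, spec: str) -> None:
    """Store a spec at the node addressed by its stem components."""
    node = root
    for part in stem_components(spec):
        node = node.children.setdefault(part, Node())
    node.specs.append(spec)


def candidates(root: Node, path: str) -> List[str]:
    """Specs that still need to be checked for path, deepest node first."""
    visited = [root]
    node = root
    for part in path.split("/")[1:-1][:MAX_DEPTH]:
        child = node.children.get(part)
        if child is None:
            break
        visited.append(child)
        node = child
    result: List[str] = []
    for n in reversed(visited):
        result.extend(n.specs)
    return result
```

With this layout, a lookup for a path only has to test the specifications stored along one root-to-node chain instead of the whole array; capping the depth bounds the number of nodes at the cost of larger buckets in deep hierarchies.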