diff --git a/_vale/Docker/Acronyms.yml b/_vale/Docker/Acronyms.yml index 8bad91e72b83..512319cdf373 100644 --- a/_vale/Docker/Acronyms.yml +++ b/_vale/Docker/Acronyms.yml @@ -17,6 +17,7 @@ exceptions: - AWS - BIOS - BPF + - BSD - CI - CISA - CLI @@ -73,6 +74,7 @@ exceptions: - NFS - NOTE - NTLM + - NUMA - NVDA - OCI - OS diff --git a/_vale/config/vocabularies/Docker/accept.txt b/_vale/config/vocabularies/Docker/accept.txt index b5ec8d271386..205fcac3ab8d 100644 --- a/_vale/config/vocabularies/Docker/accept.txt +++ b/_vale/config/vocabularies/Docker/accept.txt @@ -20,8 +20,8 @@ Couchbase Datadog Ddosify Debootstrap -Dev Environments? Dev +Dev Environments? Django Docker Build Cloud Docker Business @@ -73,8 +73,8 @@ Nuxeo OAuth OTel Okta -Paketo PKG +Paketo Postgres PowerShell Python @@ -98,8 +98,9 @@ WireMock Zscaler Zsh [Aa]utobuild -[Bb]uildx +[Aa]llowlist [Bb]uildpack(s)? +[Bb]uildx [Cc]odenames? [Cc]ompose [Dd]istroless @@ -134,6 +135,10 @@ Zsh [Ss]ysfs [Tt]oolchains? [Uu]narchived? +[Uu]ngated +[Uu]ntrusted +[Uu]serland +[Uu]serspace [Vv]irtiofs [Vv]irtualize [Ww]alkthrough @@ -178,8 +183,5 @@ systemd tmpfs ufw umask -ungated -userland -untrusted vSphere vpnkit diff --git a/content/manuals/build/cache/garbage-collection.md b/content/manuals/build/cache/garbage-collection.md index d0b1e59b8237..9e35068d24cf 100644 --- a/content/manuals/build/cache/garbage-collection.md +++ b/content/manuals/build/cache/garbage-collection.md @@ -8,36 +8,155 @@ aliases: While [`docker builder prune`](/reference/cli/docker/builder/prune.md) or [`docker buildx prune`](/reference/cli/docker/buildx/prune.md) -commands run at once, garbage collection runs periodically and follows an -ordered list of prune policies. +commands run at once, Garbage Collection (GC) runs periodically and follows an +ordered list of prune policies. The BuildKit daemon clears the build cache when +the cache size becomes too big, or when the cache age expires. -Garbage collection runs in the BuildKit daemon. 
The daemon clears the build -cache when the cache size becomes too big, or when the cache age expires. The -following sections describe how you can configure both the size and age -parameters by defining garbage collection policies. +For most users, the default GC behavior is sufficient and doesn't require any +intervention. Advanced users, particularly those working with large-scale +builds, self-managed builders, or constrained storage environments, might +benefit from customizing these settings to better align with their workflow +needs. The following sections explain how GC works and provide guidance on +tailoring its behavior through custom configuration. -Each of the policy's parameters corresponds with a `docker buildx prune` command line -argument. Details can be found in the -`docker buildx prune` [documentation](/reference/cli/docker/buildx/prune.md). +## Garbage collection policies + +GC policies define a set of rules that determine how the build cache is managed +and cleaned up. These policies include criteria for when to remove cache +entries, such as the age of the cache, the amount of space being used, and the +type of cache records to prune. + +Each GC policy is evaluated in sequence, starting with the most specific +criteria, and proceeds to broader rules if previous policies do not free up +enough cache. This lets BuildKit prioritize cache entries, preserving the most +valuable cache while ensuring the system maintains performance and +availability. + +For example, say you have the following GC policies: + +1. Find "stale" cache records that haven't been used in the past 48 hours, and + delete records until there's maximum 5GB of "stale" cache left. +2. If the build cache size exceeds 10GB, delete records until the total cache + size is no more than 10GB. + +The first rule is more specific, prioritizing stale cache records and setting a +lower limit for a less valuable type of cache. 
The second rule imposes a higher +hard limit that applies to any type of cache records. With these policies, suppose +you have 11GB of build cache, where: + +- 7GB is "stale" cache +- 4GB is other, more valuable cache + +In that case, a GC sweep would delete 2GB of stale cache as part of the first policy, leaving +5GB of stale cache and 9GB in total, so the second policy doesn't need to clear any more cache. + +The default GC policies are (approximately): + +1. Remove cache that can be easily regenerated, such as build contexts from + local directories or remote Git repositories, and cache mounts, if it hasn't + been used for more than 48 hours. +2. Remove cache that hasn't been used in a build for more than 60 days. +3. Remove unshared cache that exceeds the build cache size limit. Unshared + cache records refer to layer blobs that are not used by other resources + (typically, as image layers). +4. Remove any build cache that exceeds the build cache size limit. + +The precise algorithm and the means of configuring the policies differ slightly +depending on what kind of builder you're using. Refer to +[Configuration](#configuration) for more details. ## Configuration -Depending on the [driver](../builders/drivers/_index.md) used by your builder instance, -the garbage collection will use a different configuration file. +> [!NOTE] +> If you're satisfied with the default garbage collection behavior and don't +> need to fine-tune its settings, you can skip this section. Default +> configurations work well for most use cases and require no additional setup. + +Depending on the type of [build driver](../builders/drivers/_index.md) you use, +you will use different configuration files to change the builder's GC settings: + +- If you use the default builder for Docker Engine (the `docker` driver), use + the [Docker daemon configuration file](#docker-daemon-configuration-file). +- If you use a custom builder, use a [BuildKit configuration file](#buildkit-configuration-file). 
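
The two-policy example from the "Garbage collection policies" section can be worked through as a small simulation. This is a minimal Python sketch, not BuildKit's implementation: the `sweep` function, its parameters, and the two-bucket model of the cache (stale versus other) are assumptions made purely for illustration.

```python
def sweep(stale_gb, other_gb, stale_limit_gb=5, total_limit_gb=10):
    """Apply the two example policies in order; return (stale, other) in GB."""
    # Policy 1: trim stale records until at most stale_limit_gb remain.
    stale_gb = min(stale_gb, stale_limit_gb)

    # Policy 2: if the total still exceeds the hard limit, delete more.
    # Assumption: remaining stale records are deleted before more valuable ones.
    excess = stale_gb + other_gb - total_limit_gb
    if excess > 0:
        from_stale = min(stale_gb, excess)
        stale_gb -= from_stale
        other_gb -= excess - from_stale
    return stale_gb, other_gb


stale, other = sweep(stale_gb=7, other_gb=4)
print(f"stale={stale}GB other={other}GB total={stale + other}GB")
```

Under this model, 7GB of stale cache and 4GB of other cache end up as 5GB and 4GB respectively, so the total is already below the 10GB hard limit and the second policy has nothing left to remove.
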
+ +### Docker daemon configuration file + +If you're using the default [`docker` driver](../builders/drivers/docker.md), +GC is configured in the [`daemon.json` configuration file](/reference/cli/dockerd.md#daemon-configuration-file), +or if you use Docker Desktop, in [**Settings > Docker Engine**](/manuals/desktop/settings-and-maintenance/settings.md). + +The following snippet shows the default builder configuration for the `docker` +driver for Docker Desktop users: + +```json +{ + "builder": { + "gc": { + "defaultKeepStorage": "20GB", + "enabled": true + } + } +} +``` + +The `defaultKeepStorage` option configures the size limit of the build cache, +which influences the GC policies. The default policies for the `docker` driver +work as follows: + +1. Remove ephemeral, unused build cache older than 48 hours if it exceeds 13.8% + of `defaultKeepStorage`, or at minimum 512MB. +2. Remove unused build cache older than 60 days. +3. Remove unshared build cache that exceeds the `defaultKeepStorage` limit. +4. Remove any build cache that exceeds the `defaultKeepStorage` limit. + +Given the Docker Desktop default value for `defaultKeepStorage` of 20GB, the +default GC policies resolve to: + +```json +{ + "builder": { + "gc": { + "enabled": true, + "policy": [ + { + "keepStorage": "2.764GB", + "filter": [ + "unused-for=48h", + "type==source.local,type==exec.cachemount,type==source.git.checkout" + ] + }, + { "keepStorage": "20GB", "filter": ["unused-for=1440h"] }, + { "keepStorage": "20GB" }, + { "keepStorage": "20GB", "all": true } + ] + } + } +} +``` + +The easiest way to tweak the build cache configuration for the `docker` driver +is to adjust the `defaultKeepStorage` option: + +- Increase the limit if you think the GC is too aggressive. +- Decrease the limit if you need to preserve space. 
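
The threshold used by the first default policy can be estimated with a little arithmetic. The following Python sketch assumes decimal units (1GB = 1000MB) and reads "or at minimum 512MB" as a floor on the computed value. The function name is hypothetical, and the 13.8% figure is rounded, so the result may differ slightly from what the daemon actually resolves.

```python
def ephemeral_keep_storage_mb(default_keep_storage_gb, ratio=0.138, floor_mb=512):
    """Estimate the policy 1 threshold in MB for a given defaultKeepStorage."""
    # Scale the overall limit by the documented (rounded) 13.8% ratio.
    scaled_mb = default_keep_storage_gb * 1000 * ratio
    # Never go below the documented 512MB minimum.
    return max(scaled_mb, floor_mb)


for limit_gb in (1, 20, 50):
    print(f"{limit_gb}GB -> {ephemeral_keep_storage_mb(limit_gb):.0f}MB")
```

For the 20GB Docker Desktop default, this gives roughly 2.76GB, close to the `2.764GB` value in the resolved policy above. For small limits such as 1GB, the 512MB floor applies instead.
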
-If you're using the [`docker` driver](../builders/drivers/docker.md), garbage collection -can be configured in the [Docker Daemon configuration](/reference/cli/dockerd.md#daemon-configuration-file). -file: +If you need even more control, you can define your own GC policies directly. +The following example defines a more conservative GC configuration with the +following policies: + +1. Remove unused cache entries older than 1440 hours, or 60 days, if build cache exceeds 50GB. +2. Remove unshared cache entries if build cache exceeds 50GB. +3. Remove any cache entries if build cache exceeds 100GB. ```json { "builder": { "gc": { "enabled": true, - "defaultKeepStorage": "10GB", + "defaultKeepStorage": "50GB", "policy": [ - { "keepStorage": "10GB", "filter": ["unused-for=2200h"] }, - { "keepStorage": "50GB", "filter": ["unused-for=3300h"] }, + { "keepStorage": "0", "filter": ["unused-for=1440h"] }, + { "keepStorage": "0" }, { "keepStorage": "100GB", "all": true } ] } @@ -45,52 +164,127 @@ file: } ``` -For other drivers, garbage collection can be configured using the -[BuildKit configuration](../buildkit/toml-configuration.md) file: +Policies 1 and 2 here set `keepStorage` to `0`, which means they'll fall back +to the default limit of 50GB as defined by `defaultKeepStorage`. + +### BuildKit configuration file + +For build drivers other than `docker`, GC is configured using a +[`buildkitd.toml`](../buildkit/toml-configuration.md) configuration file. This +file uses the following high-level configuration options that you can use to +tweak the thresholds for how much disk space BuildKit should use for cache: + +| Option | Description | Default value | +| --------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------- | ----------------------------------------------------- | +| `reservedSpace` | The minimum amount of disk space BuildKit is allowed to allocate for cache. 
Usage below this threshold will not be reclaimed during garbage collection. | 10% of total disk space or 10GB (whichever is lower) | +| `maxUsedSpace` | The maximum amount of disk space that BuildKit is allowed to use. Usage above this threshold will be reclaimed during garbage collection. | 60% of total disk space or 100GB (whichever is lower) | +| `minFreeSpace` | The amount of disk space that must be kept free. | 20GB | + +You can set these options either as number of bytes, a unit string (for +example, `512MB`), or as a percentage of the total disk size. Changing these +options influences the default GC policies used by the BuildKit worker. With +the default thresholds, the GC policies resolve as follows: ```toml +# Global defaults [worker.oci] gc = true - gckeepstorage = 10000 - [[worker.oci.gcpolicy]] - keepBytes = 512000000 - keepDuration = 172800 - filters = [ "type==source.local", "type==exec.cachemount", "type==source.git.checkout"] - [[worker.oci.gcpolicy]] - all = true - keepBytes = 1024000000 + reservedSpace = "10GB" + maxUsedSpace = "100GB" + minFreeSpace = "20%" + +# Policy 1 +[[worker.oci.gcpolicy]] + filters = [ "type==source.local", "type==exec.cachemount", "type==source.git.checkout" ] + keepDuration = "48h" + maxUsedSpace = "512MB" + +# Policy 2 +[[worker.oci.gcpolicy]] + keepDuration = "1440h" # 60 days + reservedSpace = "10GB" + maxUsedSpace = "100GB" + +# Policy 3 +[[worker.oci.gcpolicy]] + reservedSpace = "10GB" + maxUsedSpace = "100GB" + +# Policy 4 +[[worker.oci.gcpolicy]] + all = true + reservedSpace = "10GB" + maxUsedSpace = "100GB" ``` -## Default policies - -Default garbage collection policies apply to all builders if not set: - -```text -GC Policy rule#0: - All: false - Filters: type==source.local,type==exec.cachemount,type==source.git.checkout - Keep Duration: 48h0m0s - Keep Bytes: 512MB -GC Policy rule#1: - All: false - Keep Duration: 1440h0m0s - Keep Bytes: 26GB -GC Policy rule#2: - All: false - Keep Bytes: 26GB -GC Policy rule#3: 
- All: true - Keep Bytes: 26GB +In practical terms, this means: + +- Policy 1: If the build cache exceeds 512MB, BuildKit removes cache records + for local build contexts, remote Git contexts, and cache mounts that haven’t + been used in the last 48 hours. +- Policy 2: If disk usage exceeds 100GB, unshared build cache older than 60 + days is removed, ensuring at least 10GB of disk space is reserved for cache. +- Policy 3: If disk usage exceeds 100GB, any unshared cache is removed, + ensuring at least 10GB of disk space is reserved for cache. +- Policy 4: If disk usage exceeds 100GB, all cache (including shared and + internal records) is removed, ensuring at least 10GB of disk space is reserved + for cache. + +`reservedSpace` has the highest priority in defining the lower limit for build +cache size. If `maxUsedSpace` or `minFreeSpace` would define a lower value, the +cache size is still never brought below `reservedSpace`. + +If both `reservedSpace` and `maxUsedSpace` are set, a GC sweep results in a +cache size between those thresholds. For example, if `reservedSpace` is set to +10GB, and `maxUsedSpace` is set to 20GB, the resulting amount of cache after a +GC run is less than 20GB, but at least 10GB. + +You can also define completely custom GC policies. Custom policies also let you +define filters, which let you pinpoint the types of cache entries that a given +policy is allowed to prune. + +#### Custom GC policies in BuildKit + +Custom GC policies let you fine-tune how BuildKit manages its cache, and give +you full control over cache retention based on criteria such as cache type, +duration, or disk space thresholds. If you need full control over the cache +thresholds and how cache records should be prioritized, defining custom GC +policies is the way to go. + +To define a custom GC policy, use the `[[worker.oci.gcpolicy]]` configuration +block in `buildkitd.toml`. Each policy defines the thresholds that will be used +for that policy. 
The global values for `reservedSpace`, `maxUsedSpace`, and +`minFreeSpace` do not apply if you use custom policies. + +Here’s an example configuration: + +```toml +# Custom GC Policy 1: Remove unused local contexts older than 24 hours +[[worker.oci.gcpolicy]] + filters = ["type==source.local"] + keepDuration = "24h" + reservedSpace = "5GB" + maxUsedSpace = "50GB" + +# Custom GC Policy 2: Remove remote Git contexts older than 30 days +[[worker.oci.gcpolicy]] + filters = ["type==source.git.checkout"] + keepDuration = "720h" + reservedSpace = "5GB" + maxUsedSpace = "30GB" + +# Custom GC Policy 3: Aggressively clean all cache if disk usage exceeds 90GB +[[worker.oci.gcpolicy]] + all = true + reservedSpace = "5GB" + maxUsedSpace = "90GB" ``` -- `rule#0`: if build cache uses more than 512MB delete the most easily - reproducible data after it has not been used for 2 days. -- `rule#1`: remove any data not used for 60 days. -- `rule#2`: keep the unshared build cache under cap. -- `rule#3`: if previous policies were insufficient start deleting internal data - to keep build cache under cap. +In addition to the `reservedSpace`, `maxUsedSpace`, and `minFreeSpace` thresholds, +a GC policy accepts two more configuration options: -> [!NOTE] -> -> `Keep Bytes` defaults to 10% of the size of the disk. If the disk size cannot -> be determined, it uses 2GB as a fallback. +- `all`: By default, BuildKit will exclude some cache records from being pruned + during GC. Setting this option to `true` will allow any cache records to be + pruned. +- `filters`: Filters let you specify the types of cache records that a GC + policy is allowed to prune. diff --git a/content/manuals/engine/security/seccomp.md b/content/manuals/engine/security/seccomp.md index 1ea65a0b9d00..094bdbffe0a0 100644 --- a/content/manuals/engine/security/seccomp.md +++ b/content/manuals/engine/security/seccomp.md @@ -26,13 +26,13 @@ protective while providing wide application compatibility. 
The default Docker profile can be found [here](https://github.com/moby/moby/blob/master/profiles/seccomp/default.json). -In effect, the profile is an allowlist which denies access to system calls by -default, then allowlists specific system calls. The profile works by defining a +In effect, the profile is an allowlist that denies access to system calls by +default and then allows specific system calls. The profile works by defining a `defaultAction` of `SCMP_ACT_ERRNO` and overriding that action only for specific system calls. The effect of `SCMP_ACT_ERRNO` is to cause a `Permission Denied` error. Next, the profile defines a specific list of system calls which are fully allowed, because their `action` is overridden to be `SCMP_ACT_ALLOW`. Finally, -some specific rules are for individual system calls such as `personality`, and others, +some specific rules are for individual system calls such as `personality`, and others, to allow variants of those system calls with specific arguments. `seccomp` is instrumental for running Docker containers with least privilege. It @@ -53,61 +53,61 @@ $ docker run --rm \ Docker's default seccomp profile is an allowlist which specifies the calls that are allowed. The table below lists the significant (but not all) syscalls that -are effectively blocked because they are not on the Allowlist. The table includes +are effectively blocked because they are not on the allowlist. The table includes the reason each syscall is blocked rather than white-listed. -| Syscall | Description | -|---------------------|---------------------------------------------------------------------------------------------------------------------------------------| -| `acct` | Accounting syscall which could let containers disable their own resource limits or process accounting. Also gated by `CAP_SYS_PACCT`. | -| `add_key` | Prevent containers from using the kernel keyring, which is not namespaced. 
| -| `bpf` | Deny loading potentially persistent bpf programs into kernel, already gated by `CAP_SYS_ADMIN`. | -| `clock_adjtime` | Time/date is not namespaced. Also gated by `CAP_SYS_TIME`. | -| `clock_settime` | Time/date is not namespaced. Also gated by `CAP_SYS_TIME`. | -| `clone` | Deny cloning new namespaces. Also gated by `CAP_SYS_ADMIN` for CLONE_* flags, except `CLONE_NEWUSER`. | -| `create_module` | Deny manipulation and functions on kernel modules. Obsolete. Also gated by `CAP_SYS_MODULE`. | -| `delete_module` | Deny manipulation and functions on kernel modules. Also gated by `CAP_SYS_MODULE`. | -| `finit_module` | Deny manipulation and functions on kernel modules. Also gated by `CAP_SYS_MODULE`. | -| `get_kernel_syms` | Deny retrieval of exported kernel and module symbols. Obsolete. | -| `get_mempolicy` | Syscall that modifies kernel memory and NUMA settings. Already gated by `CAP_SYS_NICE`. | -| `init_module` | Deny manipulation and functions on kernel modules. Also gated by `CAP_SYS_MODULE`. | -| `ioperm` | Prevent containers from modifying kernel I/O privilege levels. Already gated by `CAP_SYS_RAWIO`. | -| `iopl` | Prevent containers from modifying kernel I/O privilege levels. Already gated by `CAP_SYS_RAWIO`. | -| `kcmp` | Restrict process inspection capabilities, already blocked by dropping `CAP_SYS_PTRACE`. | -| `kexec_file_load` | Sister syscall of `kexec_load` that does the same thing, slightly different arguments. Also gated by `CAP_SYS_BOOT`. | -| `kexec_load` | Deny loading a new kernel for later execution. Also gated by `CAP_SYS_BOOT`. | -| `keyctl` | Prevent containers from using the kernel keyring, which is not namespaced. | -| `lookup_dcookie` | Tracing/profiling syscall, which could leak a lot of information on the host. Also gated by `CAP_SYS_ADMIN`. | -| `mbind` | Syscall that modifies kernel memory and NUMA settings. Already gated by `CAP_SYS_NICE`. | -| `mount` | Deny mounting, already gated by `CAP_SYS_ADMIN`. 
| -| `move_pages` | Syscall that modifies kernel memory and NUMA settings. | -| `nfsservctl` | Deny interaction with the kernel nfs daemon. Obsolete since Linux 3.1. | -| `open_by_handle_at` | Cause of an old container breakout. Also gated by `CAP_DAC_READ_SEARCH`. | -| `perf_event_open` | Tracing/profiling syscall, which could leak a lot of information on the host. | -| `personality` | Prevent container from enabling BSD emulation. Not inherently dangerous, but poorly tested, potential for a lot of kernel vulns. | -| `pivot_root` | Deny `pivot_root`, should be privileged operation. | -| `process_vm_readv` | Restrict process inspection capabilities, already blocked by dropping `CAP_SYS_PTRACE`. | -| `process_vm_writev` | Restrict process inspection capabilities, already blocked by dropping `CAP_SYS_PTRACE`. | +| Syscall | Description | +| ------------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | +| `acct` | Accounting syscall which could let containers disable their own resource limits or process accounting. Also gated by `CAP_SYS_PACCT`. | +| `add_key` | Prevent containers from using the kernel keyring, which is not namespaced. | +| `bpf` | Deny loading potentially persistent BPF programs into kernel, already gated by `CAP_SYS_ADMIN`. | +| `clock_adjtime` | Time/date is not namespaced. Also gated by `CAP_SYS_TIME`. | +| `clock_settime` | Time/date is not namespaced. Also gated by `CAP_SYS_TIME`. | +| `clone` | Deny cloning new namespaces. Also gated by `CAP_SYS_ADMIN` for CLONE\_\* flags, except `CLONE_NEWUSER`. | +| `create_module` | Deny manipulation and functions on kernel modules. Obsolete. Also gated by `CAP_SYS_MODULE`. | +| `delete_module` | Deny manipulation and functions on kernel modules. Also gated by `CAP_SYS_MODULE`. 
| +| `finit_module` | Deny manipulation and functions on kernel modules. Also gated by `CAP_SYS_MODULE`. | +| `get_kernel_syms` | Deny retrieval of exported kernel and module symbols. Obsolete. | +| `get_mempolicy` | Syscall that modifies kernel memory and NUMA settings. Already gated by `CAP_SYS_NICE`. | +| `init_module` | Deny manipulation and functions on kernel modules. Also gated by `CAP_SYS_MODULE`. | +| `ioperm` | Prevent containers from modifying kernel I/O privilege levels. Already gated by `CAP_SYS_RAWIO`. | +| `iopl` | Prevent containers from modifying kernel I/O privilege levels. Already gated by `CAP_SYS_RAWIO`. | +| `kcmp` | Restrict process inspection capabilities, already blocked by dropping `CAP_SYS_PTRACE`. | +| `kexec_file_load` | Sister syscall of `kexec_load` that does the same thing, slightly different arguments. Also gated by `CAP_SYS_BOOT`. | +| `kexec_load` | Deny loading a new kernel for later execution. Also gated by `CAP_SYS_BOOT`. | +| `keyctl` | Prevent containers from using the kernel keyring, which is not namespaced. | +| `lookup_dcookie` | Tracing/profiling syscall, which could leak a lot of information on the host. Also gated by `CAP_SYS_ADMIN`. | +| `mbind` | Syscall that modifies kernel memory and NUMA settings. Already gated by `CAP_SYS_NICE`. | +| `mount` | Deny mounting, already gated by `CAP_SYS_ADMIN`. | +| `move_pages` | Syscall that modifies kernel memory and NUMA settings. | +| `nfsservctl` | Deny interaction with the kernel NFS daemon. Obsolete since Linux 3.1. | +| `open_by_handle_at` | Cause of an old container breakout. Also gated by `CAP_DAC_READ_SEARCH`. | +| `perf_event_open` | Tracing/profiling syscall, which could leak a lot of information on the host. | +| `personality` | Prevent container from enabling BSD emulation. Not inherently dangerous, but poorly tested, potential for a lot of kernel vulnerabilities. | +| `pivot_root` | Deny `pivot_root`, should be privileged operation. 
| +| `process_vm_readv` | Restrict process inspection capabilities, already blocked by dropping `CAP_SYS_PTRACE`. | +| `process_vm_writev` | Restrict process inspection capabilities, already blocked by dropping `CAP_SYS_PTRACE`. | | `ptrace` | Tracing/profiling syscall. Blocked in Linux kernel versions before 4.8 to avoid seccomp bypass. Tracing/profiling arbitrary processes is already blocked by dropping `CAP_SYS_PTRACE`, because it could leak a lot of information on the host. | -| `query_module` | Deny manipulation and functions on kernel modules. Obsolete. | -| `quotactl` | Quota syscall which could let containers disable their own resource limits or process accounting. Also gated by `CAP_SYS_ADMIN`. | -| `reboot` | Don't let containers reboot the host. Also gated by `CAP_SYS_BOOT`. | -| `request_key` | Prevent containers from using the kernel keyring, which is not namespaced. | -| `set_mempolicy` | Syscall that modifies kernel memory and NUMA settings. Already gated by `CAP_SYS_NICE`. | -| `setns` | Deny associating a thread with a namespace. Also gated by `CAP_SYS_ADMIN`. | -| `settimeofday` | Time/date is not namespaced. Also gated by `CAP_SYS_TIME`. | -| `stime` | Time/date is not namespaced. Also gated by `CAP_SYS_TIME`. | -| `swapon` | Deny start/stop swapping to file/device. Also gated by `CAP_SYS_ADMIN`. | -| `swapoff` | Deny start/stop swapping to file/device. Also gated by `CAP_SYS_ADMIN`. | -| `sysfs` | Obsolete syscall. | -| `_sysctl` | Obsolete, replaced by /proc/sys. | -| `umount` | Should be a privileged operation. Also gated by `CAP_SYS_ADMIN`. | -| `umount2` | Should be a privileged operation. Also gated by `CAP_SYS_ADMIN`. | -| `unshare` | Deny cloning new namespaces for processes. Also gated by `CAP_SYS_ADMIN`, with the exception of `unshare --user`. | -| `uselib` | Older syscall related to shared libraries, unused for a long time. | -| `userfaultfd` | Userspace page fault handling, largely needed for process migration. 
| -| `ustat` | Obsolete syscall. | -| `vm86` | In kernel x86 real mode virtual machine. Also gated by `CAP_SYS_ADMIN`. | -| `vm86old` | In kernel x86 real mode virtual machine. Also gated by `CAP_SYS_ADMIN`. | +| `query_module` | Deny manipulation and functions on kernel modules. Obsolete. | +| `quotactl` | Quota syscall which could let containers disable their own resource limits or process accounting. Also gated by `CAP_SYS_ADMIN`. | +| `reboot` | Don't let containers reboot the host. Also gated by `CAP_SYS_BOOT`. | +| `request_key` | Prevent containers from using the kernel keyring, which is not namespaced. | +| `set_mempolicy` | Syscall that modifies kernel memory and NUMA settings. Already gated by `CAP_SYS_NICE`. | +| `setns` | Deny associating a thread with a namespace. Also gated by `CAP_SYS_ADMIN`. | +| `settimeofday` | Time/date is not namespaced. Also gated by `CAP_SYS_TIME`. | +| `stime` | Time/date is not namespaced. Also gated by `CAP_SYS_TIME`. | +| `swapon` | Deny start/stop swapping to file/device. Also gated by `CAP_SYS_ADMIN`. | +| `swapoff` | Deny start/stop swapping to file/device. Also gated by `CAP_SYS_ADMIN`. | +| `sysfs` | Obsolete syscall. | +| `_sysctl` | Obsolete, replaced by /proc/sys. | +| `umount` | Should be a privileged operation. Also gated by `CAP_SYS_ADMIN`. | +| `umount2` | Should be a privileged operation. Also gated by `CAP_SYS_ADMIN`. | +| `unshare` | Deny cloning new namespaces for processes. Also gated by `CAP_SYS_ADMIN`, with the exception of `unshare --user`. | +| `uselib` | Older syscall related to shared libraries, unused for a long time. | +| `userfaultfd` | Userspace page fault handling, largely needed for process migration. | +| `ustat` | Obsolete syscall. | +| `vm86` | In kernel x86 real mode virtual machine. Also gated by `CAP_SYS_ADMIN`. | +| `vm86old` | In kernel x86 real mode virtual machine. Also gated by `CAP_SYS_ADMIN`. 
| ## Run without the default seccomp profile @@ -115,6 +115,6 @@ You can pass `unconfined` to run a container without the default seccomp profile. ```console -$ docker run --rm -it --security-opt seccomp=unconfined debian:jessie \ +$ docker run --rm -it --security-opt seccomp=unconfined debian:latest \ unshare --map-root-user --user sh -c whoami ``` diff --git a/content/manuals/engine/storage/drivers/_index.md b/content/manuals/engine/storage/drivers/_index.md index 07ad123cfab2..57bf355e7b81 100644 --- a/content/manuals/engine/storage/drivers/_index.md +++ b/content/manuals/engine/storage/drivers/_index.md @@ -48,7 +48,7 @@ CMD python /app/app.py ``` This Dockerfile contains four commands. Commands that modify the filesystem create -a layer. The `FROM` statement starts out by creating a layer from the `ubuntu:22.04` +a new layer. The `FROM` statement starts out by creating a layer from the `ubuntu:22.04` image. The `LABEL` command only modifies the image's metadata, and doesn't produce a new layer. The `COPY` command adds some files from your Docker client's current directory. The first `RUN` command builds your application using the `make` command, diff --git a/content/reference/compose-file/services.md b/content/reference/compose-file/services.md index 73ec17d5ce9f..f14a48f5a83e 100644 --- a/content/reference/compose-file/services.md +++ b/content/reference/compose-file/services.md @@ -913,12 +913,12 @@ services: common: image: busybox security_opt: - - label:role:ROLE + - label=role:ROLE cli: extends: service: common security_opt: - - label:user:USER + - label=user:USER ``` Produces the following configuration for the `cli` service. @@ -926,8 +926,8 @@ Produces the following configuration for the `cli` service. 
```yaml image: busybox security_opt: -- label:role:ROLE -- label:user:USER +- label=role:ROLE +- label=user:USER ``` In case list syntax is used, the following keys should also be treated as sequences: @@ -1736,8 +1736,8 @@ secrets: ```yml security_opt: - - label:user:USER - - label:role:ROLE + - label=user:USER + - label=role:ROLE ``` For further default labeling schemes you can override, see [Security configuration](/reference/cli/docker/container/run.md#security-opt).