docs: consolidate BPF map documentation in concepts/ebpf/intro.rst #12183

Merged · 1 commit, merged on Jun 18, 2020
79 changes: 0 additions & 79 deletions Documentation/concepts/ebpf/intro.rst
@@ -134,85 +134,6 @@ by Cilium. Below we show the following possible flows connecting endpoints on a
node, ingress to an endpoint, and endpoint to egress networking device. In each case
there is an additional diagram showing the TCP accelerated path available when socket layer enforcement is enabled.

Scale
=====

.. _bpf_map_limitations:

BPF Map Limitations
-------------------

All BPF maps are created with upper capacity limits. Insertions beyond the
limit fail, limiting the scalability of the datapath. The following table
shows the default limits of the maps. Each limit can be raised in the source
code, and configuration options will be added on request if demand arises.

======================== ================ =============== =====================================================
Map Name                 Scope            Default Limit   Scale Implications
======================== ================ =============== =====================================================
Connection Tracking      node or endpoint 1M TCP/256k UDP Max 1M concurrent TCP connections, max 256k expected UDP answers
NAT                      node             512k            Max 512k NAT entries
Neighbor Table           node             512k            Max 512k neighbor entries
Endpoints                node             64k             Max 64k local endpoints + host IPs per node
IP cache                 node             512k            Max 256k endpoints (IPv4+IPv6), max 512k endpoints (IPv4 or IPv6) across all clusters
Load Balancer            node             64k             Max 64k cumulative backends across all services across all clusters
Policy                   endpoint         16k             Max 16k allowed identity + port + protocol pairs for specific endpoint
Proxy Map                node             512k            Max 512k concurrent redirected TCP connections to proxy
Tunnel                   node             64k             Max 32k nodes (IPv4+IPv6) or 64k nodes (IPv4 or IPv6) across all clusters
IPv4 Fragmentation       node             8k              Max 8k fragmented datagrams in flight simultaneously on the node
Session Affinity         node             64k             Max 64k affinities from different clients
======================== ================ =============== =====================================================

For some BPF maps, the upper capacity limit can be overridden using command
line options for ``cilium-agent``. A given capacity can be set using
``--bpf-ct-global-tcp-max``, ``--bpf-ct-global-any-max``,
``--bpf-nat-global-max``, ``--bpf-neigh-global-max``, ``--bpf-policy-map-max``,
and ``--bpf-fragments-map-max``.

Using the ``--bpf-map-dynamic-size-ratio`` flag, the upper capacity limits of
several large BPF maps are determined at agent startup based on the given
ratio of the total system memory. For example, a ratio of 0.0025 means that
0.25% of the total system memory is used for these maps.

This flag affects the following BPF maps, which consume the most memory in
the system: ``cilium_ct_{4,6}_global``, ``cilium_ct_{4,6}_any``,
``cilium_nodeport_neigh{4,6}``, ``cilium_snat_v{4,6}_external`` and
``cilium_lb{4,6}_reverse_sk``.

``kube-proxy`` sets the maximum number of entries in Linux's connection
tracking table based on the number of cores the machine has. By default,
``kube-proxy`` allows a maximum of ``32768`` entries per core, with a minimum
of ``131072`` entries regardless of the number of cores.

Cilium has its own connection tracking tables as BPF maps, and the number of
entries in those maps is calculated based on the total amount of memory in the
node, with a minimum of ``131072`` entries regardless of how much memory the
machine has.

The following table presents the values that ``kube-proxy`` and Cilium set
for their own connection tracking tables when Cilium is configured with
``--bpf-map-dynamic-size-ratio: 0.0025``.

+------+--------------+-----------------------+-------------------+
| vCPU | Memory (GiB) | Kube-proxy CT entries | Cilium CT entries |
+------+--------------+-----------------------+-------------------+
| 1    | 3.75         | 131072                | 131072            |
+------+--------------+-----------------------+-------------------+
| 2    | 7.5          | 131072                | 131072            |
+------+--------------+-----------------------+-------------------+
| 4    | 15           | 131072                | 131072            |
+------+--------------+-----------------------+-------------------+
| 8    | 30           | 262144                | 284560            |
+------+--------------+-----------------------+-------------------+
| 16   | 60           | 524288                | 569120            |
+------+--------------+-----------------------+-------------------+
| 32   | 120          | 1048576               | 1138240           |
+------+--------------+-----------------------+-------------------+
| 64   | 240          | 2097152               | 2276480           |
+------+--------------+-----------------------+-------------------+
| 96   | 360          | 3145728               | 4552960           |
+------+--------------+-----------------------+-------------------+

Kubernetes Integration
======================

59 changes: 51 additions & 8 deletions Documentation/concepts/ebpf/maps.rst
@@ -4,8 +4,10 @@
Please use the official rendered version released here:
https://docs.cilium.io

.. _bpf_map_limitations:

BPF Maps
========

All BPF maps are created with upper capacity limits. Insertions beyond the
limit fail, limiting the scalability of the datapath (see the sketch after the
table below). The following table
@@ -17,22 +19,63 @@ Map Name Scope Default Limit Scale Implications
======================== ================ =============== =====================================================
Connection Tracking      node or endpoint 1M TCP/256k UDP Max 1M concurrent TCP connections, max 256k expected UDP answers
NAT                      node             512k            Max 512k NAT entries
Neighbor Table           node             512k            Max 512k neighbor entries
Endpoints                node             64k             Max 64k local endpoints + host IPs per node
IP cache                 node             512k            Max 256k endpoints (IPv4+IPv6), max 512k endpoints (IPv4 or IPv6) across all clusters
Load Balancer            node             64k             Max 64k cumulative backends across all services across all clusters
Policy                   endpoint         16k             Max 16k allowed identity + port + protocol pairs for specific endpoint
Proxy Map                node             512k            Max 512k concurrent redirected TCP connections to proxy
Tunnel                   node             64k             Max 32k nodes (IPv4+IPv6) or 64k nodes (IPv4 or IPv6) across all clusters
IPv4 Fragmentation       node             8k              Max 8k fragmented datagrams in flight simultaneously on the node
Session Affinity         node             64k             Max 64k affinities from different clients
======================== ================ =============== =====================================================

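The effect of these hard limits can be reproduced with a few lines of Go. The
following minimal sketch is not part of Cilium itself; it assumes the
``github.com/cilium/ebpf`` library and a host privileged enough to create BPF
maps (root or ``CAP_BPF``). It creates a deliberately tiny hash map and shows
that inserts beyond ``MaxEntries`` are rejected, analogous to datapath
insertions failing once one of the limits above is reached:

.. code-block:: go

    // Sketch: inserts into a fixed-capacity BPF hash map fail once it is full.
    package main

    import (
        "fmt"

        "github.com/cilium/ebpf"
    )

    func main() {
        m, err := ebpf.NewMap(&ebpf.MapSpec{
            Type:       ebpf.Hash,
            KeySize:    4,
            ValueSize:  4,
            MaxEntries: 4, // deliberately tiny upper capacity limit
        })
        if err != nil {
            panic(err)
        }
        defer m.Close()

        for i := uint32(0); i < 8; i++ {
            if err := m.Put(i, i); err != nil {
                // Once the map holds MaxEntries elements, further inserts
                // return an error (typically E2BIG) instead of succeeding.
                fmt.Printf("insert %d rejected: %v\n", i, err)
            }
        }
    }
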
For some BPF maps, the upper capacity limit can be overridden using command
line options for ``cilium-agent``. A given capacity can be set using
``--bpf-ct-global-tcp-max``, ``--bpf-ct-global-any-max``,
``--bpf-nat-global-max``, ``--bpf-neigh-global-max``, ``--bpf-policy-map-max``,
and ``--bpf-fragments-map-max``.

Using the ``--bpf-map-dynamic-size-ratio`` flag, the upper capacity limits of
several large BPF maps are determined at agent startup based on the given
ratio of the total system memory. For example, a ratio of 0.0025 means that
0.25% of the total system memory is used for these maps.
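
The arithmetic behind the ratio is simple multiplication, as the following
illustrative sketch shows (the exact split of the budget across the individual
maps is decided internally by the agent and is not modelled here):

.. code-block:: go

    package main

    import "fmt"

    func main() {
        const gib = 1 << 30
        totalMemory := uint64(16 * gib) // example node with 16 GiB of RAM
        ratio := 0.0025                 // --bpf-map-dynamic-size-ratio

        budget := float64(totalMemory) * ratio
        // 0.25% of 16 GiB is roughly 41 MiB, shared by the maps listed below.
        fmt.Printf("budget for dynamically sized maps: %.0f bytes (%.1f MiB)\n",
            budget, budget/(1<<20))
    }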

This flag affects the following BPF maps, which consume the most memory in
the system: ``cilium_ct_{4,6}_global``, ``cilium_ct_{4,6}_any``,
``cilium_nodeport_neigh{4,6}``, ``cilium_snat_v{4,6}_external`` and
``cilium_lb{4,6}_reverse_sk``.

``kube-proxy`` sets the maximum number of entries in Linux's connection
tracking table based on the number of cores the machine has. By default,
``kube-proxy`` allows a maximum of ``32768`` entries per core, with a minimum
of ``131072`` entries regardless of the number of cores.

Cilium has its own connection tracking tables as BPF maps, and the number of
entries in those maps is calculated based on the total amount of memory in the
node, with a minimum of ``131072`` entries regardless of how much memory the
machine has.
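
As a rough illustration of these two sizing rules (the helper names below are
made up for this sketch and do not exist in the Cilium or kube-proxy code
bases):

.. code-block:: go

    package main

    import "fmt"

    // kubeProxyCTEntries follows the rule above: 32768 entries per core,
    // but never fewer than 131072.
    func kubeProxyCTEntries(cores int) int {
        if n := cores * 32768; n > 131072 {
            return n
        }
        return 131072
    }

    // ciliumCTEntries only models the floor: the real value is derived from
    // --bpf-map-dynamic-size-ratio and the node's total memory, and is then
    // clamped to a minimum of 131072 entries.
    func ciliumCTEntries(entriesFromMemory int) int {
        if entriesFromMemory < 131072 {
            return 131072
        }
        return entriesFromMemory
    }

    func main() {
        for _, cores := range []int{1, 2, 4, 8, 16} {
            fmt.Printf("%2d vCPU -> kube-proxy CT entries: %d\n",
                cores, kubeProxyCTEntries(cores))
        }
        // Matches the kube-proxy column of the table below: 1, 2 and 4 vCPUs
        // hit the 131072 floor, 8 vCPUs give 262144 and 16 vCPUs give 524288.

        // Cilium enforces the same floor even on very small nodes:
        fmt.Println("Cilium CT entries on a small node:", ciliumCTEntries(50000))
    }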

The following table presents the values that ``kube-proxy`` and Cilium set
for their own connection tracking tables when Cilium is configured with
``--bpf-map-dynamic-size-ratio: 0.0025``.

+------+--------------+-----------------------+-------------------+
| vCPU | Memory (GiB) | Kube-proxy CT entries | Cilium CT entries |
+------+--------------+-----------------------+-------------------+
| 1    | 3.75         | 131072                | 131072            |
+------+--------------+-----------------------+-------------------+
| 2    | 7.5          | 131072                | 131072            |
+------+--------------+-----------------------+-------------------+
| 4    | 15           | 131072                | 131072            |
+------+--------------+-----------------------+-------------------+
| 8    | 30           | 262144                | 284560            |
+------+--------------+-----------------------+-------------------+
| 16   | 60           | 524288                | 569120            |
+------+--------------+-----------------------+-------------------+
| 32   | 120          | 1048576               | 1138240           |
+------+--------------+-----------------------+-------------------+
| 64   | 240          | 2097152               | 2276480           |
+------+--------------+-----------------------+-------------------+
| 96   | 360          | 3145728               | 4552960           |
+------+--------------+-----------------------+-------------------+