Update xdp.md
shard77 committed May 30, 2024
1 parent 3dfcf85 commit 471d5b4
Showing 1 changed file with 98 additions and 51 deletions.
149 changes: 98 additions & 51 deletions docs/book/programs/xdp.md
@@ -6,23 +6,36 @@


## What is XDP in eBPF?
XDP (eXpress Data Path) is a type of eBPF program that attaches to a network interface.
It enables filtering, manipulation and redirection of network packets
as soon as they are received from the network driver,
even before they enter the Linux kernel networking stack, resulting in low latency and high throughput.

The idea behind XDP is to add an early hook in the `RX` path of the kernel,
and let a user-supplied eBPF program decide the fate of the packet.
The hook is placed in the NIC driver just after the interrupt processing,
and before any memory allocation needed by the network stack itself.

The XDP program is allowed to edit the packet data and,
after the XDP program returns, an action code determines what to do with the packet:

* `XDP_PASS`: let the packet continue through the network stack
* `XDP_DROP`: silently drop the packet
* `XDP_ABORTED`: drop the packet and raise a tracepoint exception
* `XDP_TX`: bounce the packet back to the same NIC it arrived on
* `XDP_REDIRECT`: redirect the packet to another NIC, or to a user-space socket via the
  [`AF_XDP`](https://www.kernel.org/doc/html/latest/networking/af_xdp.html) address family
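
For reference, these action codes are small integers defined by the kernel's `enum xdp_action`. The following is a minimal sketch that mirrors those values in plain Rust. The `XdpAction` enum and its `describe` method are illustrative names for this sketch only; in an Aya program the constants come from `aya_ebpf::bindings::xdp_action` instead.

```rust
/// Mirrors the numeric values of the kernel's `enum xdp_action`.
/// `XdpAction` is a hypothetical name used only in this sketch.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
#[repr(u32)]
enum XdpAction {
    Aborted = 0,
    Drop = 1,
    Pass = 2,
    Tx = 3,
    Redirect = 4,
}

impl XdpAction {
    /// One-line summary of what happens to the packet.
    fn describe(self) -> &'static str {
        match self {
            XdpAction::Aborted => "drop the packet and raise a tracepoint exception",
            XdpAction::Drop => "silently drop the packet",
            XdpAction::Pass => "let the packet continue through the network stack",
            XdpAction::Tx => "bounce the packet back to the NIC it arrived on",
            XdpAction::Redirect => "redirect the packet to another NIC or an AF_XDP socket",
        }
    }
}
```

Casting a variant with `as u32` yields the value an XDP program would return, for example `XdpAction::Drop as u32` is `1`.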

## AF_XDP
Along with XDP, a new address family entered the Linux kernel in version 4.18.
`AF_XDP`, formerly known as `AF_PACKETv4` (which was never included in the mainline kernel),
is a raw socket optimized for high performance packet processing and
allows zero-copy between kernel and applications.
As the socket can be used for both receiving and transmitting,
it supports high performance network applications purely in user-space.

If you want a more extensive explanation about `AF_XDP`,
you can find it in the [kernel documentation](https://www.kernel.org/doc/html/latest/networking/af_xdp.html).

## XDP Operation Modes
You can connect an XDP program to an interface using the following modes:
@@ -46,28 +59,28 @@ You can connect an XDP program to an interface using the following modes:

A list of drivers supporting native XDP can be found in the table below:

| Vendor            | Driver     | XDP Support (kernel version) |
| ----------------- | ---------- | ---------------------------- |
| Amazon | ena | >=5.6 |
| Broadcom | bnxt_en | >=4.11 |
| Cavium | thunderx | >=4.12 |
| Freescale | dpaa2 | >=5.0 |
| Intel | ixgbe | >=4.12 |
| Intel | ixgbevf | >=4.17 |
| Intel | i40e | >=4.13 |
| Intel | ice | >=5.5 |
| Marvell | mvneta | >=5.5 |
| Mellanox | mlx4 | >=4.8 |
| Mellanox | mlx5 | >=4.9 |
| Microsoft | hv_netvsc | >=5.6 |
| Netronome | nfp | >=4.10 |
| Others | virtio_net | >=4.10 |
| Others | tun/tap | >=4.14 |
| Others | bond | >=5.15 |
| Qlogic | qede | >=4.10 |
| Socionext | netsec | >=5.3 |
| Solarflare | sfc | >=5.5 |
| Texas Instruments | cpsw | >=5.3 |

You can use the following command to check your interface's network driver name:
`ethtool -i <interface>`.
@@ -89,9 +102,12 @@ cargo generate --name simple-xdp-program -d program_type=xdp https://github.com/
```

### Creating the eBPF Component
First, we must create the eBPF component for our program.
In this component, we decide what to do with the incoming packets.

Since we want to drop the incoming packets from certain IPs,
we are going to use the `XDP_DROP` action code whenever the IP is in our blocklist,
and everything else will be treated with the `XDP_PASS` action code.

```rust
#![no_std]
@@ -115,7 +131,8 @@ use network_types::{

We import the necessary dependencies:

* `aya_ebpf`: For XDP actions (`bindings::xdp_action`), the XDP context struct `XdpContext` (`programs::XdpContext`),
  map definitions (for our HashMap) and XDP program macros (`macros::{map, xdp}`)
* `aya_log_ebpf`: For logging within the eBPF program
* `core::mem`: For memory manipulation
* `network_types`: For Ethernet and IP header definitions
@@ -132,14 +149,16 @@ fn panic(_info: &core::panic::PanicInfo) -> ! {
}
```

An eBPF-compatible panic handler is provided because
eBPF programs cannot use the default panic behavior.

```rust
#[map]
static BLOCKLIST: HashMap<u32, u32> = HashMap::<u32, u32>::with_max_entries(1024, 0);
```

Here, we define our blocklist as a `HashMap`
that maps `u32` keys to `u32` values, with a maximum of 1024 entries.
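
To make the data model concrete, here is a minimal user-space sketch of the same idea built on `std::collections::HashMap`. It is only a stand-in: the real blocklist is an eBPF map declared with `#[map]` and shared with the kernel, and the `Blocklist` type and its methods below are hypothetical names chosen for this illustration.

```rust
use std::collections::HashMap;

/// Upper bound on entries, matching the `1024` passed to `with_max_entries` above.
const MAX_ENTRIES: usize = 1024;

/// User-space stand-in for the blocklist. The key is an IPv4 address as a `u32`;
/// the value is unused and only marks the key as present.
struct Blocklist {
    entries: HashMap<u32, u32>,
}

impl Blocklist {
    fn new() -> Self {
        Blocklist {
            entries: HashMap::with_capacity(MAX_ENTRIES),
        }
    }

    /// Inserts an address. New keys are refused once the map is full;
    /// re-inserting an existing key always succeeds.
    fn insert(&mut self, address: u32) -> Result<(), ()> {
        if self.entries.len() >= MAX_ENTRIES && !self.entries.contains_key(&address) {
            return Err(());
        }
        self.entries.insert(address, 0);
        Ok(())
    }

    /// Returns true if the address is blocked.
    fn contains(&self, address: u32) -> bool {
        self.entries.contains_key(&address)
    }
}
```

Only the presence of a key matters for the firewall, which is why the value is a placeholder `0`.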

```rust
#[xdp]
@@ -151,7 +170,9 @@ pub fn xdp_firewall(ctx: XdpContext) -> u32 {
}
```

The `xdp_firewall` function (picked up in user-space) accepts an `XdpContext` and returns a `u32`.
It delegates the main packet processing logic to the `try_xdp_firewall` function.
If an error occurs, the function returns `xdp_action::XDP_ABORTED` (which is equal to the u32 `0`).
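
The delegation pattern described above can be sketched without any eBPF dependencies. In this sketch, `try_process` and `process` are hypothetical stand-ins for `try_xdp_firewall` and `xdp_firewall`, and a byte slice replaces the `XdpContext`; the constants carry the kernel's values for `XDP_ABORTED` and `XDP_PASS`.

```rust
const XDP_ABORTED: u32 = 0;
const XDP_PASS: u32 = 2;

/// Hypothetical fallible worker: fails on an empty packet, passes everything else.
fn try_process(packet: &[u8]) -> Result<u32, ()> {
    if packet.is_empty() {
        return Err(());
    }
    Ok(XDP_PASS)
}

/// Entry point that always returns an action code, mapping any error to XDP_ABORTED.
fn process(packet: &[u8]) -> u32 {
    match try_process(packet) {
        Ok(action) => action,
        Err(_) => XDP_ABORTED,
    }
}
```

Keeping the fallible logic in a separate function lets it use the `?` operator freely, while the entry point still returns a plain `u32`.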

```rust
#[inline(always)]
@@ -169,7 +190,11 @@ unsafe fn ptr_at<T>(ctx: &XdpContext, offset: usize) -> Result<*const T, ()> {
}
```

Our `ptr_at` function is designed to provide safe access to a generic type `T`
within an `XdpContext` at a specified offset.
It performs bounds checking by comparing the desired memory range (`start + offset + len`) against the end of the data (`end`).
If the access is within bounds, it returns a pointer to the specified type; otherwise,
it returns an error. We are going to use this function to retrieve data from the `XdpContext`.
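
As a rough user-space analogue of this bounds check, the sketch below applies the same idea to a byte slice instead of an `XdpContext`. The names `ptr_at_slice` and `read_u16_be` are hypothetical, and the comparison is written relative to the start of the buffer with checked addition, which is an adaptation for user space rather than the exact expression used in the eBPF code.

```rust
use std::mem;

/// Returns a pointer to a `T` at `offset` inside `data`, or an error if the
/// `size_of::<T>()` bytes starting at `offset` would run past the end of the buffer.
fn ptr_at_slice<T>(data: &[u8], offset: usize) -> Result<*const T, ()> {
    let len = mem::size_of::<T>();
    // Same test as `start + offset + len > end`, expressed as offsets so that
    // a huge `offset` cannot overflow.
    match offset.checked_add(len) {
        Some(end_offset) if end_offset <= data.len() => {
            Ok(data[offset..].as_ptr() as *const T)
        }
        _ => Err(()),
    }
}

/// Reads a big-endian `u16` (for example an EtherType) at `offset`.
fn read_u16_be(data: &[u8], offset: usize) -> Result<u16, ()> {
    let ptr = ptr_at_slice::<[u8; 2]>(data, offset)?;
    // SAFETY: the bounds check above guarantees two readable bytes at `ptr`,
    // and `[u8; 2]` has an alignment of 1.
    let bytes = unsafe { ptr.read() };
    Ok(u16::from_be_bytes(bytes))
}
```

The important property is that the check happens before any dereference, so an out-of-range offset produces `Err(())` rather than an invalid read.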

```rust

@@ -200,13 +225,20 @@ fn try_xdp_firewall(ctx: XdpContext) -> Result<u32, ()> {

The `block_ip` function checks if a given IP address (`address`) exists in the blocklist.

As mentioned earlier, the `try_xdp_firewall` function contains the main logic for our firewall.
We first retrieve the Ethernet header from the `XdpContext` with the `ptr_at` function.
The header is located at the beginning of the packet data, so we use `0` as the offset.

If the packet is not IPv4 (`ether_type` check), the function returns `xdp_action::XDP_PASS` and
allows the packet to pass through the network stack.

`ipv4hdr` points to the IPv4 header, and `source` stores the source IP address read from that header.
We then check whether this address is in our blocklist using the `block_ip` function we created earlier.
If `block_ip` reports a match, meaning that the IP is in the blocklist, we use the `XDP_DROP` action code so that the packet
does not reach the network stack; otherwise we let it pass with the `XDP_PASS` action code.

Lastly, we log the activity: `SRC` is the source IP address and `ACTION`
is the action code that was applied to it. We then return `Ok(action)` as the result.
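
The decision flow described above can be sketched in plain Rust over a raw frame held in a byte slice. This is an illustration under stated assumptions, not the eBPF code itself: `decide` and the `is_blocked` closure are hypothetical names, header offsets are hard-coded for an untagged Ethernet frame carrying IPv4, and logging is left out.

```rust
const ETH_HDR_LEN: usize = 14; // dst MAC (6) + src MAC (6) + EtherType (2)
const ETHERTYPE_IPV4: u16 = 0x0800;
const IPV4_SRC_OFFSET: usize = 12; // offset of the source address in the IPv4 header

const XDP_DROP: u32 = 1;
const XDP_PASS: u32 = 2;

/// Chooses an action for a raw Ethernet frame. `is_blocked` stands in for the
/// blocklist lookup. A truncated frame yields `Err(())`.
fn decide(frame: &[u8], is_blocked: impl Fn(u32) -> bool) -> Result<u32, ()> {
    // The EtherType is the last two bytes of the Ethernet header, big-endian.
    let ether_type = frame.get(12..14).ok_or(())?;
    if u16::from_be_bytes([ether_type[0], ether_type[1]]) != ETHERTYPE_IPV4 {
        // Not IPv4: let the packet through untouched.
        return Ok(XDP_PASS);
    }

    // The IPv4 header starts right after the Ethernet header.
    let src_start = ETH_HDR_LEN + IPV4_SRC_OFFSET;
    let src = frame.get(src_start..src_start + 4).ok_or(())?;
    let source = u32::from_be_bytes([src[0], src[1], src[2], src[3]]);

    Ok(if is_blocked(source) { XDP_DROP } else { XDP_PASS })
}
```

A caller could pass a closure such as `|addr| addr == 0x01010101` to block `1.1.1.1`, since the source address is converted from network byte order before the lookup.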

The full code:
```rust
@@ -286,9 +318,11 @@ fn try_xdp_firewall(ctx: XdpContext) -> Result<u32, ()> {
### Populating our map from user-space
In order to add the addresses to block, we first need to get a reference to the `BLOCKLIST` map.

Once we have it, it's simply a case of calling `ip_blocklist.insert()`
to insert the IPs into the blocklist.

We'll use the `Ipv4Addr` type to represent our IP address as
it's human-readable and can be easily converted to a u32.

We'll block all traffic originating from `1.1.1.1` in this example.

@@ -323,15 +357,19 @@ use tokio::signal;
```

* `anyhow::Context`: Provides additional context for error handling
* `aya`: Provides the `Bpf` struct and related functions for loading eBPF programs,
  as well as the XDP program and its flags (`aya::programs::{Xdp, XdpFlags}`)
* `aya_log::BpfLogger`: For receiving and printing the log messages emitted by the eBPF program
* `clap::Parser`: Provides argument parsing
* `log::{info, warn}`: The [logging library](https://docs.rs/log/latest/log/index.html)
we use for informational and warning messages
* `std::net::Ipv4Addr`: A struct to work with IPv4 addresses
* `tokio::signal`: For handling signals asynchronously, see [this link](https://docs.rs/tokio/latest/tokio/signal/) for more information

!!! note
    `aya::Bpf` is deprecated since version `0.13.0` and `aya_log::BpfLogger` since `0.2.1`.
    Use [`aya::Ebpf`](https://docs.aya-rs.dev/aya/struct.ebpf) and
    [`aya_log::EbpfLogger`](https://docs.aya-rs.dev/aya_log/struct.ebpflogger) instead if you are using the more recent versions.

#### Defining command-line arguments

@@ -343,7 +381,8 @@ struct Opt {
}
```

A simple struct is defined for command-line parsing using [clap's derive feature](https://docs.rs/clap/latest/clap/_derive/index.html),
with the optional argument `iface` to provide our network interface name.

#### Main Function

@@ -387,23 +426,29 @@ async fn main() -> Result<(), anyhow::Error> {
```

##### Parsing command-line arguments
Inside the `main` function, we first parse the command-line arguments,
using [`Opt::parse()`](https://docs.rs/clap/latest/clap/trait.Parser.html#method.parse) and the struct defined earlier.

##### Initializing environment logging
Logging is initialized using [`env_logger::init()`](https://docs.rs/env_logger/latest/env_logger/fn.init.html);
we will make use of the environment logger later in our code.

##### Loading the eBPF program
The eBPF program is loaded using `Bpf::load()`, choosing the debug or
release version based on the build configuration (`debug_assertions`).
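
The build-profile selection can be illustrated with a small standalone sketch. It only shows how `debug_assertions` distinguishes the two builds; the helper names `ebpf_profile` and `ebpf_object_path` and the directory layout in the format string are assumptions made for this example, and the real program embeds the object file at compile time rather than building a path at run time.

```rust
/// Returns the profile directory that matches how this binary was compiled.
/// Hypothetical helper for illustration.
fn ebpf_profile() -> &'static str {
    if cfg!(debug_assertions) {
        "debug"
    } else {
        "release"
    }
}

/// Builds the path of the matching eBPF object file.
/// The directory layout here is an assumption, not taken from the project.
fn ebpf_object_path(program: &str) -> String {
    format!("target/bpfel-unknown-none/{}/{}", ebpf_profile(), program)
}
```

Because `debug_assertions` is enabled by default in debug builds and disabled in release builds, the user-space binary and the eBPF object it loads come from the same profile.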

##### Loading and attaching our XDP program
The XDP program named `xdp_firewall` is retrieved from the eBPF program
we defined earlier using `bpf.program_mut()`.
The XDP program is then loaded and attached to our network interface.

##### Setting up the IP blocklist
The IP blocklist (`BLOCKLIST` map) is loaded from the eBPF program and converted to a `HashMap`.
The IP `1.1.1.1` is added to the blocklist.
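
One way to see why an `Ipv4Addr` works as a map key is to look at its `u32` form. The sketch below uses only the standard library; `blocklist_key` and `key_from_wire` are hypothetical helper names, and the second one models the packet side under the assumption that the source address bytes are converted from network byte order before the lookup.

```rust
use std::net::Ipv4Addr;

/// Converts a dotted-quad address into a `u32` key (first octet in the
/// most significant byte).
fn blocklist_key(addr: Ipv4Addr) -> u32 {
    u32::from(addr)
}

/// Models the packet side: four source-address bytes in network (big-endian)
/// order, converted to the same `u32` representation.
fn key_from_wire(bytes: [u8; 4]) -> u32 {
    u32::from_be_bytes(bytes)
}
```

For `1.1.1.1` both helpers produce `0x01010101`, so a key inserted from user space matches the value computed from the packet bytes.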

##### Waiting for the exit signal
The program awaits the `CTRL+C` signal asynchronously using `signal::ctrl_c().await`.
Once the signal is received, it logs an exit message and returns `Ok(())`.

#### Full user-space code

@@ -466,4 +511,6 @@ async fn main() -> Result<(), anyhow::Error> {
```

### Running our program!
Now that we have all the pieces for our eBPF program, we can run it using `RUST_LOG=info cargo xtask run`,
or `RUST_LOG=info cargo xtask run -- --iface <interface>` if you want to provide another network interface name.
Note that you can also use `cargo xtask run` without `RUST_LOG=info`, but you won't get any logging.
