New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Multipath support with non-zero length Connection IDs #1310
base: master
Are you sure you want to change the base?
Conversation
qlog/src/events/quic.rs
Outdated
}, | ||
|
||
PathAbandon { | ||
identifier_type: u8, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
in the absence of a qlog spec, I'd expect identifier_type
to be u64. That makes it consistent with varint type in the frame definition.
qlog/src/events/quic.rs
Outdated
}, | ||
|
||
PathStatus { | ||
identifier_type: u8, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
in the absence of a qlog spec, I'd expect identifier_type
to be u64. That makes it consistent with varint type in the frame definition.
895e259
to
896295e
Compare
be47100
to
3edcdd9
Compare
c730e2d
to
ad2e520
Compare
b892a23
to
30a9dff
Compare
d2dda3d
to
d677228
Compare
Thanks for your work on implementing MPQUIC! I played around with this code and have the suspicion that some packets are not sent on the path that they should be sent on. My setup: I created a mininet topology to test multipath support with the quiche-server and quiche-client applications.
I added My tests showed that, on the server, This is however not true for The quiche logs hint that packets, that are assigned to the packet number space of one path, are sometimes sent on the other path:
Is this behavior expected or is there actually something going wrong? Attachments:
|
@hendrikcech Nice to see you are experimenting with the code :) It seems that
But at server side:
So It actually seems that |
@qdeconinck Thanks for taking a look! I can confirm that these changes to quiche-server resolve the problem. |
apps/src/client.rs
Outdated
let migrate_socket = if args.perform_migration { | ||
let mut socket = | ||
mio::net::UdpSocket::bind(bind_addr.parse().unwrap()).unwrap(); | ||
let mut addrs = Vec::new(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think now would be a good opportunity to move the socket construction code into its own method.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, per my comment about the args, it might help to think of the possible failure scenarios with user provided addresses and add some sanity checking to report problems rather then it just letting it fail as a timeout etc
apps/src/client.rs
Outdated
/// Generate a new pair of Source Connection ID and reset token. | ||
fn generate_cid_and_reset_token<T: SecureRandom>( | ||
rng: &T, | ||
) -> (quiche::ConnectionId<'static>, u128) { | ||
let mut scid = [0; quiche::MAX_CONN_ID_LEN]; | ||
rng.fill(&mut scid).unwrap(); | ||
let scid = scid.to_vec().into(); | ||
let mut reset_token = [0; 16]; | ||
rng.fill(&mut reset_token).unwrap(); | ||
let reset_token = u128::from_be_bytes(reset_token); | ||
(scid, reset_token) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this seems to be identical to the one in common.rs. Can't we just use that existing one?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, this seems to be code I forgot to clean up, well spotted!
apps/src/args.rs
Outdated
@@ -273,6 +281,8 @@ Options: | |||
--max-active-cids NUM The maximum number of active Connection IDs we can support [default: 2]. | |||
--enable-active-migration Enable active connection migration. | |||
--perform-migration Perform connection migration on another source port. | |||
--multipath Enable multipath support. | |||
-A --address ADDR ... Additional client addresses to use. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This description is kind of confusing.
Prior to this change, the client would look at the Server IP and pick an "any" socket using an IP family that matched the server's i.e. 0.0.0.0:0 or [::]:0 depending on v4 or v6 respectively
With this change, there aren't additional address that are used on top of the old behaviour, but instead they are just the addresses that would be used. This opens a few new failure scenarios that could catch users out. For example, if a server only returns an A record and the user provided a v6 client address, then the connection would fail.
Among the most confusing type of failure is where packets go to a black hole and the connection fails after a timeout.
We might want to tweak how the socket code handling for this option works, and make the description a bit clearer about what it does and how it might fail.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're right, the current description does not suit the actual behavior of this option. As a first step, the initial description could be rewritten as "Specify source addresses to be used instead of the unspecified address" (or "instead of letting the OS decides",...).
For the family address mismatch that may arise, I can indeed rework the code to remove all the specified addresses that do not match the family of the server one. In the case none is remaining, the code can hence fallback to the original behavior, i.e., use "0.0.0.0:0" or "[::]:0".
There still remains the issue that non-routable addresses may be provided, e.g., fe80::/16
or other. I'm not sure we can do much here, so maybe update the description as "Specify source addresses to be used instead of the unspecified address, non-routable addresses will lead to connectivity issues" (maybe a bit long, but not sure how to make it shorter without loosing information).
I made a simple client/server example and tested with this multipath patch. I noticed there might be a flaw in the ACKMP sending logic. My code is based on the one in quiche/apps. The client opens 2 paths against the server and writes data on a single QUIC stream. It turns out that the server sometimes stops sending ACKMPs which causes a client-side idle timeout and subsequent termination of the connection. This problem can be mitigated if the send() call in the server event loop is replaced with a send_on_path() call that iterates over all available paths, just like the client example (in the multipath branch) does. The send_single() function sends an ACKMP only if there is an unacked packet receive don the same path as specified in the function argument, while the send() function just selects a "best" path and no other paths are tried even if the selected path doesn't yield a packet data. I noticed the client is retransmitting STREAM packets before a timeout but it doesn't seem to help. I haven't checked on which path those retransmits are happening. |
@@ -485,80 +469,75 @@ pub fn connect( | |||
scid_sent = true; | |||
} | |||
|
|||
if args.perform_migration && | |||
if conn_args.multipath && |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When the number of addresses to use is greater than conn.available_dcids() some addresses are ignored without giving any information. I think a warning could be added to show that this happens and it can be solved with the --max-active-cids parameter
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It might be indeed interesting to add a warning message indicating so, good point!
@@ -116,6 +101,7 @@ pub fn connect( | |||
config.set_initial_max_streams_uni(conn_args.max_streams_uni); | |||
config.set_disable_active_migration(!conn_args.enable_active_migration); | |||
config.set_active_connection_id_limit(conn_args.max_active_cids); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Similar to my previous comment, I don't think it makes sense for the client to set the active_connection_id_limit to less than the number of paths it intends to use, so the configuration could be:
config.set_active_connection_id_limit(std::cmp::max( conn_args.max_active_cids, args.addrs.len().try_into().unwrap()));
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure if we want the client to include such "magic" without proper documentation, but I can wait for other opinions. Also, the comparison should be made against addrs.len()
and not args.addrs.len()
, as some provided addresses might be of different families than the contacted server address.
@toshiiw This is strange, as the |
I tested with the following version.
I also tested with the newest code on the multipath branch but still saw premature shutdowns. |
quiche/src/lib.rs
Outdated
@@ -6984,6 +7547,63 @@ impl Connection { | |||
} | |||
} | |||
|
|||
let mut consider_standby = false; | |||
let dgrams_to_emit = self.dgram_max_writable_len().is_some(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IIUC this always returns true even if self.dgram_send_queue is empty.
// When using multiple packet number spaces, let's force ACK_MP sending | ||
// on their corresponding paths. | ||
if self.is_multipath_enabled() { | ||
if let Some(pid) = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
...so, this code isn't executed.
@toshiiw Oooh good catch indeed! If you configure your connection to enable datagrams on the connection, I can indeed reproduce the issue! I will push a fix with the adapted multipath test now; replacing |
ead77e6
to
f1ba4a6
Compare
- packet number space map - spaced packet number - `PathEvent::Closed` now includes error code and reason - function refactoring in lib.rs and frame.rs
This commit introduces the following features. - Leave to the application to choose the multipath extensions or not - Build on the `PathEvent`s to let the application decide which paths to use - Provide default reasonable behavior into `quiche` while letting the application provide optimised behavior == Requesting the Multipath feature The application can request multipath through the `Config` structure using a specific API,`set_multipath()`. ```rust config.set_multipath(true); ``` The boolean value determines whether the underlying connection negotiates the multipath extension. Once the handshake succeeded, the application can check whether the connection has the multipath feature enabled using the `is_multipath_enabled()` API on the `Connection` structure. == Path management The API requires the application to specify on which paths/4-tuples it wants to send non-probing packets. Paths must first be validated before using them. This is automatically done for servers, and client must use `probe_path()`. Once the path is validated, the application decides whether it wants active usage through the `set_active()` API. It provides a single API entrypoint to request its usage or not. For active path usage, it can use the following. ```rust if let Some(PathEvent::Validated(local, peer)) = conn.path_event_next() { conn.set_active(local, peer, true).unwrap(); } ``` Then, the path will then be considered to use non-probing packets. On the other hand, if for some reason the application wants to temporarily stop sending non-probing frames on a given path, it can do the following. ```rust conn.set_active(local, peer, false).unwrap(); ``` Note that in such state, quiche still replies to PATH_CHALLENGEs observed on that path. Finally, the multipath design allows a QUIC endpoint to close/abandon a given path along with an error code and error message, without altering the connection's operations as long as another path is available. ```rust conn.abandon_path(local, peer, 42, "Some error message".into()).unwrap(); ``` -- Retrocompatibility note - The `Closed` variant of `PathEvent` is now a 4-tuple that, in addition to the local and peer `SocketAddr`, also contains an `u64` error code and a `Vec<u8>` reason message. - There is a new variant of `PathEvent`: `PeerPathStatus` reports to the application that the peer advertised some status for a given 4-tuple. - There are two new `Error` variants: `UnavailablePath` and `MultiPathViolation`. -- Note Currently this API is only available when multipath feature is enabled over the session (i.e., `conn.is_multipath_enabled()` returns `true`). If the extension is not enabled, `set_active()` and `abandon_path()` return an `Error::InvalidState`. Actually, this API might sound "double usage" along with the `migrate()` API (as there is no real "connection migration" with multipath). Should we just keep the `set_active()` or similarly named API and include the `migrate()` functionality in `set_active()`? Actually, an client application without the multipath feature could just migrate using `set_active(local, peer, true)`, setting the previous path in unused mode under the hood. == Scheduling sent packets Similarly to the connection migration PR, there are two ways to control how `quiche` schedules packets on paths. The first consists in letting `quiche` handles this by itself. The application simply uses the `send()` method. In the current master code, `quiche` automatically handles path validation processing thanks to the internal `get_send_path_id()` `Connection` method. The multipath patch extends this method and iterates over all active paths following the lowest latency path having its congestion window open heuristic (a reasonable default in multipath protocols). ```rust loop { let (write, send_info) = match conn.send(&mut out) { Ok(v) => v, Err(quiche::Error::Done) => break, Err(e) => { conn.close(false, 0x1, b"fail").ok(); break; }, }; // write to socket bound to `send_info.from` } ``` The second option is to let the application choose on which path it wants to send the next packet. The application can iterate over the available paths and their corresponding statistics using `path_stats()` and schedules packets using `send_on_path()` method. This can be useful when the use case requires some specific scheduling strategies. See `apps/src/client.rs` for an example of such application-driven scheduling.
And fix clippy
Credits to @toshiiw for finding the issue.
Through the addition of the `find_scid_seq()` method on `Connection`.
That was not doing much.
Hello everyone, I'm new to QUIC, and I'm starting my master's thesis entitled "Enhancing the Performance of a Single QUIC Connection with Multi-Path QUIC." While conducting measurements, I've noticed that the Multi-Path extension is experiencing correctness issues in my setup. I'm using loop-back addresses on a single host. Below is a script that reproduces the issue. The server has one endpoint, and the client has two endpoints. Observations:
I haven't dug deeply into the issue because I'm uncertain whether it's due to a misconfiguration on my part or if it's a bug in the actual source code. Thank you in advance for your time. #!/bin/bash
# Code partlty inspired by https://github.com/tumi8/quic-10g-paper
# Variables
QUICHE_REPO="https://github.com/qdeconinck/quiche.git"
QUICHE_COMMIT="d87332018d84fb7c429ad2ed34cbfdc6ee9477c8"
RUST_PLATFORM="x86_64-unknown-linux-gnu"
FILE_SIZE=8G
NB_RUNS=10
RED='\033[0;31m'
RESET='\033[0m'
echo_red() {
echo -e "${RED}$1${RESET}"
}
get_unused_port(){
local port
port=$(shuf -i 2000-65000 -n 1)
while netstat -atn | grep -q ":$port "; do
port=$(shuf -i 2000-65000 -n 1)
done
echo "$port"
}
clone_mp_quiche() {
if [ ! -d "./quiche" ]; then
git clone --recursive "$QUICHE_REPO"
cd quiche || exit
git checkout "$QUICHE_COMMIT"
RUSTFLAGS='-C target-cpu=native' cargo build --release
cd ..
fi
if [ ! -f "./quiche-client" ]; then
cp "quiche/target/release/quiche-client" .
fi
if [ ! -f "./quiche-server" ]; then
cp "quiche/target/release/quiche-server" .
fi
}
setup_rust() {
# Rust
if ! rustc --version 1>/dev/null 2>&1; then
curl --proto '=https' --tlsv1.2 -sSf -o /tmp/rustup-init.sh https://sh.rustup.rs
chmod +x /tmp/rustup-init.sh
/tmp/rustup-init.sh -q -y --default-host "$RUST_PLATFORM" --default-toolchain stable --profile default
source "$HOME/.cargo/env"
else
echo "Rust is already installed"
fi
}
setup_environment() {
mkdir -p "$(pwd)/www" "$(pwd)/responses" "$(pwd)/logs"
fallocate -l ${FILE_SIZE} "$(pwd)/www/${FILE_SIZE}B_file"
}
iteration_loop() {
for iter in $(seq 1 ${NB_RUNS}); do
echo "Testing Multi-Path QUIC correctness - Iteration $iter"
server_port=$(get_unused_port)
client_port_1=$(get_unused_port)
client_port_2=$(get_unused_port)
# Run server
env RUST_LOG=info ./quiche-server \
--listen 127.0.0.1:${server_port} \
--root "$(pwd)/www/" \
--key "$(pwd)/quiche/apps/src/bin/cert.key" \
--cert "$(pwd)/quiche/apps/src/bin/cert.crt" \
--multipath \
1>"$(pwd)/logs/server_${iter}.log" 2>&1 &
server_pid=$!
# Run client
env RUST_LOG=info ./quiche-client \
--no-verify "https://127.0.0.1:${server_port}/${FILE_SIZE}B_file" \
--dump-responses "$(pwd)/responses/" \
-A 127.0.0.1:${client_port_1} \
-A 127.0.0.1:${client_port_2} \
--multipath \
1>"$(pwd)/logs/client_${iter}.log" 2>&1
error_code=$?
sleep 1
kill -9 "$server_pid" 1>/dev/null 2>&1
if [ $error_code -ne 0 ]; then
echo_red "Error Client: $error_code"
exit 1
fi
# Check if files are the same
diff -q "$(pwd)/www/${FILE_SIZE}B_file" "$(pwd)/responses/${FILE_SIZE}B_file"
if [ $? -ne 0 ]; then
echo_red "Error: files are not the same"
exit 1
fi
done
}
main() {
# Version
setup_rust
[ $? -ne 0 ] && { echo_red "Error setting up rust"; exit 1; }
clone_mp_quiche
[ $? -ne 0 ] && { echo_red "Error cloning quiche"; exit 1; }
setup_environment
[ $? -ne 0 ] && { echo_red "Error setting up environment"; exit 1; }
iteration_loop
}
main |
This commits introduces the following features.
PathEvent
s to let the application decide which paths to usequiche
while letting theapplication provide optimised behavior
== Requesting the Multipath feature
The application can request multipath through the
Config
structure using aspecific API,
set_multipath()
.The boolean value determines whether the underlying connection negotiates the
multipath extension. Once the handshake succeeded, the application can check
whether the connection has the multipath feature enabled using the
is_multipath_enabled()
API on theConnection
structure.== Path management
The API requires the application to specify on which paths/4-tuples it wants
to send non-probing packets. Paths must first be validated before using them.
This is automatically done for servers, and client must use
probe_path()
.Once the path is validated, the application decides whether it wants active
usage through the
set_active()
API. It provides a single API entrypoint torequest its usage or not. For active path usage, it can use the following.
Then, the path will then be considered to use non-probing packets.
On the other hand, if for some reason the application wants to temporarily
stop sending non-probing frames on a given path, it can do the following.
Note that in such state, quiche still replies to PATH_CHALLENGEs observed on
that path.
Finally, the multipath design allows a QUIC endpoint to close/abandon a
given path along with an error code and error message, without altering the
connection's operations as long as another path is available.
-- Retrocompatibility note
Closed
variant ofPathEvent
is now a 4-tuple that, in addition tothe local and peer
SocketAddr
, also contains anu64
error code and aVec<u8>
reason message.PathEvent
:PeerPathStatus
reports to theapplication that the peer advertised some status for a given 4-tuple.
Error
variants:UnavailablePath
andMultiPathViolation
.-- Note
Currently this API is only available when multipath feature is enabled over
the session (i.e.,
conn.is_multipath_enabled()
returnstrue
). If theextension is not enabled,
set_active()
andabandon_path()
return anError::InvalidState
. Actually, this API might sound "double usage" alongwith the
migrate()
API (as there is no real "connection migration" withmultipath). Should we just keep the
set_active()
or similarly named APIand include the
migrate()
functionality inset_active()
? Actually, anclient application without the multipath feature could just migrate using
set_active(local, peer, true)
, setting the previous path in unused modeunder the hood.
== Scheduling sent packets
Similarly to the connection migration PR, there are two ways to control how
quiche
schedules packets on paths.The first consists in letting
quiche
handles this by itself. Theapplication simply uses the
send()
method. In the current master code,quiche
automatically handles path validation processing thanks to theinternal
get_send_path_id()
Connection
method. The multipath patchextends this method and iterates over all active paths following the lowest
latency path having its congestion window open heuristic (a reasonable
default in multipath protocols).
The second option is to let the application choose on which path it wants
to send the next packet. The application can iterate over the available paths
and their corresponding statistics using
path_stats()
and schedules packetsusing
send_on_path()
method. This can be useful when the use case requiressome specific scheduling strategies. See
apps/src/client.rs
for an exampleof such application-driven scheduling.