Multipath support with non-zero length Connection IDs #1310

Open
wants to merge 18 commits into master
Conversation

@qdeconinck (Contributor) commented Sep 5, 2022

This commit introduces the following features.

  • Let the application choose whether to use the multipath extensions
  • Build on the PathEvents to let the application decide which paths to use
  • Provide reasonable default behavior in quiche while letting the
    application provide optimised behavior

== Requesting the Multipath feature

The application can request multipath through the Config structure using a
specific API, set_multipath().

```rust
config.set_multipath(true);
```

The boolean value determines whether the underlying connection negotiates the
multipath extension. Once the handshake has succeeded, the application can check
whether the connection has the multipath feature enabled using the
is_multipath_enabled() API on the Connection structure.

== Path management

The API requires the application to specify on which paths/4-tuples it wants
to send non-probing packets. Paths must first be validated before using them.
This is done automatically for servers; clients must use probe_path().
Once a path is validated, the application decides whether it wants active
usage through the set_active() API, which provides a single entry point to
request or stop a path's usage. For active path usage, it can use the following.

```rust
if let Some(PathEvent::Validated(local, peer)) = conn.path_event_next() {
    conn.set_active(local, peer, true).unwrap();
}
```

The path will then be considered for sending non-probing packets.

On the other hand, if for some reason the application wants to temporarily
stop sending non-probing frames on a given path, it can do the following.

```rust
conn.set_active(local, peer, false).unwrap();
```

Note that in this state, quiche still replies to PATH_CHALLENGEs observed on
that path.

Finally, the multipath design allows a QUIC endpoint to close/abandon a
given path along with an error code and error message, without altering the
connection's operations as long as another path is available.

```rust
conn.abandon_path(local, peer, 42, "Some error message".into()).unwrap();
```

-- Backward compatibility note

  • The Closed variant of PathEvent is now a 4-tuple that, in addition to
    the local and peer SocketAddr, also contains a u64 error code and a
    Vec<u8> reason message.
  • There is a new variant of PathEvent: PeerPathStatus reports to the
    application that the peer advertised some status for a given 4-tuple.
  • There are two new Error variants: UnavailablePath and
    MultiPathViolation.

-- Note

Currently this API is only available when the multipath feature is enabled on
the session (i.e., conn.is_multipath_enabled() returns true). If the
extension is not enabled, set_active() and abandon_path() return an
Error::InvalidState. Actually, this API might look redundant next to the
migrate() API (as there is no real "connection migration" with
multipath). Should we just keep the set_active() or similarly named API
and fold the migrate() functionality into set_active()? Actually, a
client application without the multipath feature could just migrate using
set_active(local, peer, true), setting the previous path in unused mode
under the hood.

== Scheduling sent packets

Similarly to the connection migration PR, there are two ways to control how
quiche schedules packets on paths.

The first consists in letting quiche handle this by itself. The
application simply uses the send() method. In the current master code,
quiche automatically handles path validation processing thanks to the
internal get_send_path_id() Connection method. The multipath patch
extends this method to iterate over all active paths, following a
"lowest-latency path with an open congestion window" heuristic (a
reasonable default in multipath protocols).

```rust
loop {
    let (write, send_info) = match conn.send(&mut out) {
        Ok(v) => v,

        Err(quiche::Error::Done) => break,

        Err(e) => {
            conn.close(false, 0x1, b"fail").ok();
            break;
        },
    };

    // write to socket bound to `send_info.from`
}
```
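The "lowest-latency path with an open congestion window" heuristic can be sketched in isolation. The PathInfo struct and its fields below are illustrative placeholders, not quiche's actual internal types.

```rust
// Illustrative sketch of the scheduling heuristic described above:
// among active paths whose congestion window has room, pick the one
// with the lowest smoothed RTT. `PathInfo` is a hypothetical stand-in
// for quiche's internal per-path state.
struct PathInfo {
    id: usize,
    active: bool,
    rtt_us: u64,           // smoothed RTT, microseconds
    cwnd: usize,           // congestion window, bytes
    bytes_in_flight: usize,
}

fn select_send_path(paths: &[PathInfo]) -> Option<usize> {
    paths
        .iter()
        // only active paths with room in their congestion window
        .filter(|p| p.active && p.bytes_in_flight < p.cwnd)
        // lowest latency wins
        .min_by_key(|p| p.rtt_us)
        .map(|p| p.id)
}
```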

The second option is to let the application choose on which path it wants
to send the next packet. The application can iterate over the available paths
and their corresponding statistics using path_stats() and schedule packets
using the send_on_path() method. This can be useful when the use case requires
specific scheduling strategies. See apps/src/client.rs for an example
of such application-driven scheduling.
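As a toy example of such an application-driven strategy (not the one in apps/src/client.rs), a plain round-robin rotation over the known paths could be paired with send_on_path():

```rust
// Hypothetical application-side scheduler: rotate over the available
// paths; each picked element would then be handed to send_on_path().
// The path element type is generic; in practice it would carry the
// (local, peer) addresses of a validated path.
struct RoundRobin {
    next: usize,
}

impl RoundRobin {
    fn new() -> Self {
        RoundRobin { next: 0 }
    }

    // Return the next path in rotation, or None if there are no paths.
    fn pick<'a, T>(&mut self, paths: &'a [T]) -> Option<&'a T> {
        if paths.is_empty() {
            return None;
        }
        let p = &paths[self.next % paths.len()];
        self.next = self.next.wrapping_add(1);
        Some(p)
    }
}
```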

@qdeconinck qdeconinck requested a review from a team as a code owner September 5, 2022 08:17
```rust
PathAbandon {
    identifier_type: u8,
```

@LPardue (Contributor) commented Sep 5, 2022:

In the absence of a qlog spec, I'd expect identifier_type to be u64. That makes it consistent with the varint type in the frame definition.

```rust
PathStatus {
    identifier_type: u8,
```

Contributor:

In the absence of a qlog spec, I'd expect identifier_type to be u64. That makes it consistent with the varint type in the frame definition.

@hendrikcech

Thanks for your work on implementing MPQUIC! I played around with this code and have the suspicion that some packets are not sent on the path that they should be sent on.

My setup: I created a mininet topology to test multipath support with the quiche-server and quiche-client applications.

                Path 1
         /--- s1 --- s2 ---/
     10.0.1.1           10.0.1.2
 Client h1          Server h2
     10.0.2.1           10.0.2.2 
         \--- s3 --- s4 ---/
                Path 2

I added space_id and path_id to the qlog PacketHeader that is used in the send_single and recv_single functions. From my understanding, space_id refers to the packet number space of the sent/received packet while path_id refers to the network path that the packet will be sent over / was received from. Since we have one packet number space per path, these two values should be equal.
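This invariant could be checked mechanically over the decoded events; the helper below is purely illustrative, operating on plain (space_id, path_id) pairs rather than real qlog records.

```rust
// Illustrative helper for the invariant discussed above: with one
// packet number space per path, every (space_id, path_id) pair taken
// from the qlog events should match. Returns the fraction that do not.
fn mismatch_fraction(events: &[(u32, u32)]) -> f64 {
    if events.is_empty() {
        return 0.0;
    }
    let bad = events.iter().filter(|(s, p)| s != p).count();
    bad as f64 / events.len() as f64
}
```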

My tests showed that, on the server, path_id always equals space_id (values are either 0 or 1). On the client, space_id and path_id are also equal in all transport:packet_sent events.

This is however not true for transport:packet_received events on the client: here, space_id != path_id in about 3% of cases (e.g., 1300 of 40000 received packets). I was not yet able to confirm if packets are actually sent on a different network path than space_id indicates in those cases (Wireshark fails to decode the QUIC packets).

The quiche logs hint that packets that are assigned to the packet number space of one path are sometimes sent on the other path:

[...]
[2022-11-16T14:19:49.964226504Z TRACE quiche] 14d656e811af235b787dc1ba7bc22a72524c0d80 path ID 0 now see SCID with seq num 1
[2022-11-16T14:19:49.974246915Z TRACE quiche] 14d656e811af235b787dc1ba7bc22a72524c0d80 path ID 0 now see SCID with seq num 0
[2022-11-16T14:19:49.985896148Z TRACE quiche] 14d656e811af235b787dc1ba7bc22a72524c0d80 path ID 0 now see SCID with seq num 1
[...]

Is this behavior expected or is there actually something going wrong?

Attachments:

@qdeconinck (Contributor, Author)

@hendrikcech Nice to see you are experimenting with the code :)

It seems that quiche behaves correctly, but the server code does not send the packet on the right path. From the client logs,

[2022-11-16T14:19:43.721770238Z TRACE quiche] 14d656e811af235b787dc1ba7bc22a72524c0d80 rx pkt Short dcid=18f4cf5ad9df2fb7f8cb280b8ea06fafa2e45d3c key_phase=false len=1329 pn=0 src:10.0.1.2:4433 dst:10.0.2.1:8002
[2022-11-16T14:19:43.721878453Z TRACE quiche] 14d656e811af235b787dc1ba7bc22a72524c0d80 path ID 1 now see SCID with seq num 1
[2022-11-16T14:19:43.721925418Z TRACE quiche] 14d656e811af235b787dc1ba7bc22a72524c0d80 rx frm PATH_RESPONSE data=[02, e6, 51, c4, 58, 32, 39, 45]
[2022-11-16T14:19:43.721993077Z TRACE quiche] 14d656e811af235b787dc1ba7bc22a72524c0d80 rx frm PATH_CHALLENGE data=[26, 6b, ab, b8, 54, 64, 49, 0b]
[2022-11-16T14:19:43.722450331Z TRACE quiche] 14d656e811af235b787dc1ba7bc22a72524c0d80 rx frm PADDING len=1294
[2022-11-16T14:19:43.722732114Z TRACE quiche_apps::client] 10.0.2.1:8002: processed 1350 bytes
[2022-11-16T14:19:43.722774926Z TRACE quiche_apps::client] 10.0.2.1:8002: got 49 bytes
[2022-11-16T14:19:43.722840885Z TRACE quiche] 14d656e811af235b787dc1ba7bc22a72524c0d80 rx pkt Short dcid=14d656e811af235b787dc1ba7bc22a72524c0d80 key_phase=false len=28 pn=1 src:10.0.1.2:4433 dst:10.0.2.1:8002
[2022-11-16T14:19:43.722927964Z TRACE quiche] 14d656e811af235b787dc1ba7bc22a72524c0d80 peer reused CID 14d656e811af235b787dc1ba7bc22a72524c0d80 from path 0 on path 1
[2022-11-16T14:19:43.722971351Z TRACE quiche] 14d656e811af235b787dc1ba7bc22a72524c0d80 path ID 1 now see SCID with seq num 0
[2022-11-16T14:19:43.723004292Z TRACE quiche] 14d656e811af235b787dc1ba7bc22a72524c0d80 rx frm DATA_BLOCKED limit=10000000
[2022-11-16T14:19:43.723037715Z TRACE quiche] 14d656e811af235b787dc1ba7bc22a72524c0d80 rx frm STREAM id=7 off=0 len=1 fin=false

But at server side:

[2022-11-16T14:19:43.668765632Z INFO  quiche_server] 327765e95cde0abdb523c6de9e6b2d292c0ced18 Seen new path (10.0.1.2:4433, 10.0.2.1:8002)
[2022-11-16T14:19:43.668816125Z TRACE quiche_server] recv() would block
[2022-11-16T14:19:43.669135491Z TRACE quiche] 327765e95cde0abdb523c6de9e6b2d292c0ced18 tx pkt Short dcid=18f4cf5ad9df2fb7f8cb280b8ea06fafa2e45d3c key_phase=false len=1312 pn=0 src:10.0.1.2:4433 dst:10.0.2.1:8002
[2022-11-16T14:19:43.669183980Z TRACE quiche] 327765e95cde0abdb523c6de9e6b2d292c0ced18 tx frm PATH_RESPONSE data=[02, e6, 51, c4, 58, 32, 39, 45]
[2022-11-16T14:19:43.669228104Z TRACE quiche] 327765e95cde0abdb523c6de9e6b2d292c0ced18 tx frm PATH_CHALLENGE data=[26, 6b, ab, b8, 54, 64, 49, 0b]
[2022-11-16T14:19:43.669267438Z TRACE quiche] 327765e95cde0abdb523c6de9e6b2d292c0ced18 tx frm PADDING len=1294
[2022-11-16T14:19:43.669409946Z TRACE quiche::recovery] 327765e95cde0abdb523c6de9e6b2d292c0ced18 timer=1.023455331s latest_rtt=0ns srtt=None min_rtt=0ns rttvar=166.5ms loss_time=[None, None, None] loss_probes=[0, 0, 0] cwnd=13500 ssthresh=18446744073709551615 bytes_in_flight=1350 app_limited=true congestion_recovery_start_time=None Rate { delivered: 0, delivered_time: Instant { tv_sec: 228741, tv_nsec: 941993262 }, first_sent_time: Instant { tv_sec: 228741, tv_nsec: 941993262 }, end_of_app_limited: SpacedPktNum(0, 0), last_sent_packet: SpacedPktNum(1, 0), largest_acked: SpacedPktNum(0, 0), rate_sample: RateSample { delivery_rate: 0, is_app_limited: false, interval: 0ns, delivered: 0, prior_delivered: 0, prior_time: None, send_elapsed: 0ns, ack_elapsed: 0ns, rtt: 0ns } } pacer=Pacer { enabled: true, capacity: 13500, used: 0, rate: 0, last_update: Instant { tv_sec: 228741, tv_nsec: 941993262 }, next_time: Instant { tv_sec: 228741, tv_nsec: 941993262 }, max_datagram_size: 1350, last_packet_size: None, iv: 0ns } hystart=window_end=Some(SpacedPktNum(1, 0)) last_round_min_rtt=18446744073709551615.999999999s current_round_min_rtt=18446744073709551615.999999999s css_baseline_min_rtt=18446744073709551615.999999999s rtt_sample_count=0 css_start_time=None css_round_count=0 cubic={ k=0 w_max=0 } 
[2022-11-16T14:19:43.669725202Z TRACE quiche] 327765e95cde0abdb523c6de9e6b2d292c0ced18 tx pkt Short dcid=14d656e811af235b787dc1ba7bc22a72524c0d80 key_phase=false len=11 pn=1 src:10.0.1.2:4433 dst:10.0.1.1:8001
[2022-11-16T14:19:43.669773886Z TRACE quiche] 327765e95cde0abdb523c6de9e6b2d292c0ced18 tx frm DATA_BLOCKED limit=10000000
[2022-11-16T14:19:43.669803595Z TRACE quiche] 327765e95cde0abdb523c6de9e6b2d292c0ced18 tx frm STREAM id=7 off=0 len=1 fin=false
[2022-11-16T14:19:43.669936399Z TRACE quiche::recovery] 327765e95cde0abdb523c6de9e6b2d292c0ced18 timer=218.846885ms latest_rtt=104.273399ms srtt=Some(104.518737ms) min_rtt=104.273399ms rttvar=22.416993ms loss_time=[None, None, None] loss_probes=[0, 0, 0] cwnd=13500 ssthresh=18446744073709551615 bytes_in_flight=49 app_limited=true congestion_recovery_start_time=None Rate { delivered: 2194, delivered_time: Instant { tv_sec: 228741, tv_nsec: 936736141 }, first_sent_time: Instant { tv_sec: 228741, tv_nsec: 936736141 }, end_of_app_limited: SpacedPktNum(0, 1), last_sent_packet: SpacedPktNum(0, 1), largest_acked: SpacedPktNum(0, 1), rate_sample: RateSample { delivery_rate: 4958, is_app_limited: true, interval: 104.273399ms, delivered: 517, prior_delivered: 1677, prior_time: Some(Instant { tv_sec: 228741, tv_nsec: 832462742 }), send_elapsed: 0ns, ack_elapsed: 104.273399ms, rtt: 104.273399ms } } pacer=Pacer { enabled: true, capacity: 13500, used: 0, rate: 161454, last_update: Instant { tv_sec: 228741, tv_nsec: 936736141 }, next_time: Instant { tv_sec: 228741, tv_nsec: 936736141 }, max_datagram_size: 1350, last_packet_size: Some(0), iv: 0ns } hystart=window_end=Some(SpacedPktNum(0, 0)) last_round_min_rtt=18446744073709551615.999999999s current_round_min_rtt=18446744073709551615.999999999s css_baseline_min_rtt=18446744073709551615.999999999s rtt_sample_count=0 css_start_time=None css_round_count=0 cubic={ k=0 w_max=0 } 
[2022-11-16T14:19:43.670211822Z TRACE quiche_server] 327765e95cde0abdb523c6de9e6b2d292c0ced18 written 1399 bytes

So quiche at the server side indeed indicates the second packet should be sent on the original path src:10.0.1.2:4433 dst:10.0.1.1:8001, but for some reason the client side notices that the "STREAM" packet is received on the same path as the PATH_CHALLENGE one.

It actually seems that quiche-server batches the conn.send() output into a large buffer and calls send_to once the buffer is full (or there is no additional packet to be sent), but forgets to check whether packets should be sent on different 4-tuples (only the 4-tuple of the first packet in the batch is considered). I can quickly look at fixing this behaviour of quiche-server in a commit.
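A minimal sketch of that kind of fix (illustrative, not the actual quiche-server code): track the destination of the packets already batched, and flush the buffer whenever the next packet targets a different 4-tuple.

```rust
use std::net::SocketAddr;

// Hypothetical batching helper: `push()` returns the (destination,
// bytes) that must be flushed to the socket before the new packet can
// be buffered, i.e. whenever the destination changes mid-batch.
struct Batch {
    buf: Vec<u8>,
    dst: Option<SocketAddr>,
}

impl Batch {
    fn new() -> Self {
        Batch { buf: Vec::new(), dst: None }
    }

    fn push(&mut self, pkt: &[u8], dst: SocketAddr) -> Option<(SocketAddr, Vec<u8>)> {
        let flushed = match self.dst {
            // destination changed: hand back the pending batch first
            Some(prev) if prev != dst && !self.buf.is_empty() =>
                Some((prev, std::mem::take(&mut self.buf))),
            _ => None,
        };
        self.dst = Some(dst);
        self.buf.extend_from_slice(pkt);
        flushed
    }
}
```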

@hendrikcech

@qdeconinck Thanks for taking a look! I can confirm that these changes to quiche-server resolve the problem.

```rust
let migrate_socket = if args.perform_migration {
    let mut socket =
        mio::net::UdpSocket::bind(bind_addr.parse().unwrap()).unwrap();
    let mut addrs = Vec::new();
```

Contributor:

I think now would be a good opportunity to move the socket construction code into its own method.

Contributor:

Also, per my comment about the args, it might help to think of the possible failure scenarios with user-provided addresses and add some sanity checking to report problems, rather than just letting it fail as a timeout etc.

Comment on lines 621 to 673:

```rust
/// Generate a new pair of Source Connection ID and reset token.
fn generate_cid_and_reset_token<T: SecureRandom>(
    rng: &T,
) -> (quiche::ConnectionId<'static>, u128) {
    let mut scid = [0; quiche::MAX_CONN_ID_LEN];
    rng.fill(&mut scid).unwrap();
    let scid = scid.to_vec().into();
    let mut reset_token = [0; 16];
    rng.fill(&mut reset_token).unwrap();
    let reset_token = u128::from_be_bytes(reset_token);
    (scid, reset_token)
}
```

Contributor:

this seems to be identical to the one in common.rs. Can't we just use that existing one?

Contributor (Author):

Oh, this seems to be code I forgot to clean up, well spotted!

apps/src/args.rs Outdated
@@ -273,6 +281,8 @@ Options:
--max-active-cids NUM The maximum number of active Connection IDs we can support [default: 2].
--enable-active-migration Enable active connection migration.
--perform-migration Perform connection migration on another source port.
--multipath Enable multipath support.
-A --address ADDR ... Additional client addresses to use.
Contributor:
This description is kind of confusing.

Prior to this change, the client would look at the server IP and pick an "any" socket using an IP family that matched the server's, i.e., 0.0.0.0:0 or [::]:0 for v4 or v6 respectively.

With this change, there aren't additional addresses used on top of the old behaviour; instead, they are just the addresses that will be used. This opens a few new failure scenarios that could catch users out. For example, if a server only returns an A record and the user provided a v6 client address, then the connection would fail.

Among the most confusing types of failure is where packets go to a black hole and the connection fails after a timeout.

We might want to tweak how the socket code handling for this option works, and make the description a bit clearer about what it does and how it might fail.

Contributor (Author):

You're right, the current description does not suit the actual behavior of this option. As a first step, the initial description could be rewritten as "Specify source addresses to be used instead of the unspecified address" (or "instead of letting the OS decide", ...).

For the address family mismatch that may arise, I can indeed rework the code to remove all the specified addresses that do not match the family of the server one. If none remain, the code can then fall back to the original behavior, i.e., use "0.0.0.0:0" or "[::]:0".

There still remains the issue that non-routable addresses may be provided, e.g., fe80::/10 or others. I'm not sure we can do much here, so maybe update the description as "Specify source addresses to be used instead of the unspecified address; non-routable addresses will lead to connectivity issues" (maybe a bit long, but not sure how to make it shorter without losing information).
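The family-filtering fallback described here could look like the following sketch (illustrative, not the actual apps code):

```rust
use std::net::SocketAddr;

// Hypothetical helper for the behavior described above: keep only the
// provided client addresses whose family matches the server's; if none
// remain, fall back to the unspecified address of the server's family.
fn usable_local_addrs(server: SocketAddr, provided: &[SocketAddr]) -> Vec<SocketAddr> {
    let matching: Vec<SocketAddr> = provided
        .iter()
        .copied()
        .filter(|a| a.is_ipv4() == server.is_ipv4())
        .collect();

    if !matching.is_empty() {
        matching
    } else if server.is_ipv4() {
        vec!["0.0.0.0:0".parse().unwrap()]
    } else {
        vec!["[::]:0".parse().unwrap()]
    }
}
```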

@toshiiw (Contributor) commented Feb 1, 2023

I made a simple client/server example and tested it with this multipath patch. I noticed there might be a flaw in the ACK_MP sending logic.

My code is based on the one in quiche/apps. The client opens 2 paths against the server and writes data on a single QUIC stream. It turns out that the server sometimes stops sending ACK_MPs, which causes a client-side idle timeout and subsequent termination of the connection. This problem can be mitigated if the send() call in the server event loop is replaced with a send_on_path() call that iterates over all available paths, just like the client example (in the multipath branch) does.

The send_single() function sends an ACK_MP only if there is an unacked packet received on the same path as specified in the function argument, while the send() function just selects a "best" path, and no other paths are tried even if the selected path doesn't yield any packet data.
I think the problem lies here.
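The mitigation can be pictured abstractly: instead of a single "best path" send loop, drain every path in turn. In the sketch below the per-path send routine is modeled as a closure, and paths/packets are placeholders rather than quiche types.

```rust
// Illustrative pattern for the mitigation described above: iterate over
// every known 4-tuple and keep calling the per-path send routine until
// it has nothing left, so path-specific frames (such as ACK_MP) on less
// preferred paths still get flushed.
fn drain_all_paths<F>(paths: &[(u32, u32)], mut send_on_path: F) -> Vec<Vec<u8>>
where
    F: FnMut(u32, u32) -> Option<Vec<u8>>,
{
    let mut out = Vec::new();
    for &(local, peer) in paths {
        // drain this 4-tuple completely before moving to the next one
        while let Some(pkt) = send_on_path(local, peer) {
            out.push(pkt);
        }
    }
    out
}
```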

I noticed the client is retransmitting STREAM packets before a timeout but it doesn't seem to help. I haven't checked on which path those retransmits are happening.

@@ -485,80 +469,75 @@ pub fn connect(
scid_sent = true;
}

if args.perform_migration &&
if conn_args.multipath &&

Contributor:

When the number of addresses to use is greater than conn.available_dcids(), some addresses are ignored without giving any information. I think a warning could be added to show that this happens, and that it can be solved with the --max-active-cids parameter.

Contributor (Author):

It might be indeed interesting to add a warning message indicating so, good point!

@@ -116,6 +101,7 @@ pub fn connect(
config.set_initial_max_streams_uni(conn_args.max_streams_uni);
config.set_disable_active_migration(!conn_args.enable_active_migration);
config.set_active_connection_id_limit(conn_args.max_active_cids);

Contributor:

Similar to my previous comment, I don't think it makes sense for the client to set the active_connection_id_limit to less than the number of paths it intends to use, so the configuration could be:
config.set_active_connection_id_limit(std::cmp::max(conn_args.max_active_cids, args.addrs.len().try_into().unwrap()));

Contributor (Author):

I'm not sure if we want the client to include such "magic" without proper documentation, but I can wait for other opinions. Also, the comparison should be made against addrs.len() and not args.addrs.len(), as some provided addresses might be of different families than the contacted server address.

@qdeconinck (Contributor, Author)

@toshiiw This is strange, as the get_send_path_id() method should address this point. Could you indicate on which commit you are based, and provide logs describing the behavior? Feel free to contact me offline if preferred.

@toshiiw (Contributor) commented Feb 7, 2023

I tested with the following version.
Attached are sqlog files with packet number space ids.
Data is sent from a client to a server (unidirectional).

commit d67722801b043a9b82c04fa70bf1e3240492ee23 (HEAD -> multipath)
Author: Quentin De Coninck <quentin_d@apple.com>
Date:   Mon Oct 10 17:29:15 2022 +0200

sqlog.tar.gz

I also tested with the newest code on the multipath branch but still saw premature shutdowns.

@@ -6984,6 +7547,63 @@ impl Connection {
}
}

let mut consider_standby = false;
let dgrams_to_emit = self.dgram_max_writable_len().is_some();
Contributor:

IIUC this always returns true even if self.dgram_send_queue is empty.

```rust
// When using multiple packet number spaces, let's force ACK_MP sending
// on their corresponding paths.
if self.is_multipath_enabled() {
    if let Some(pid) =
```

Contributor:

...so, this code isn't executed.

@qdeconinck (Contributor, Author)

@toshiiw Oooh, good catch indeed! When the connection is configured to enable datagrams, I can indeed reproduce the issue! I will push a fix with the adapted multipath test now; replacing let dgrams_to_emit = self.dgram_max_writable_len().is_some(); with let dgrams_to_emit = self.dgram_send_queue.has_pending(); indeed solves the issue.
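The difference between the two checks can be modeled in isolation (DgramQueue below is a hypothetical stand-in, not quiche's actual type): a capacity-style check is true as soon as datagrams are enabled, even with an empty queue, while a pending check only reports actual queued data.

```rust
// Toy model of the bug discussed above. `max_writable_len()` mirrors a
// capacity check: it returns Some(..) whenever datagrams are enabled,
// regardless of whether anything is queued. `has_pending()` reflects
// the actual send queue contents.
struct DgramQueue {
    enabled: bool,
    queue: Vec<Vec<u8>>,
}

impl DgramQueue {
    fn max_writable_len(&self) -> Option<usize> {
        if self.enabled { Some(1200) } else { None }
    }

    fn has_pending(&self) -> bool {
        !self.queue.is_empty()
    }
}
```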

@qdeconinck (Contributor, Author)

@ghedo Refactoring changes are isolated in #1493.

qdeconinck and others added 17 commits October 27, 2023 16:40
- packet number space map
- spaced packet number
- `PathEvent::Closed` now includes error code and reason
- function refactoring in lib.rs and frame.rs
Through the addition of the `find_scid_seq()` method on
`Connection`.
@vanyingenzi commented Oct 31, 2023

Hello everyone,

I'm new to QUIC, and I'm starting my master's thesis entitled "Enhancing the Performance of a Single QUIC Connection with Multi-Path QUIC."

While conducting measurements, I've noticed that the Multi-Path extension is experiencing correctness issues in my setup. I'm using loopback addresses on a single host. Below is a script that reproduces the issue. The server has one endpoint, and the client has two endpoints.

Observations:

  • The server validates the paths and sends traffic on both of them, but the transfer ends prematurely with the client.
  • The client terminates with an error due to the idle timeout, where the duration of the timeout equals the time of the transfer.
  • The occurrence of this error is non-deterministic, varying from one run to another. However, it appears to happen more frequently with larger files. I tested with 1GB and 8GB only.

I haven't dug deeply into the issue because I'm uncertain whether it's due to a misconfiguration on my part or if it's a bug in the actual source code.

Thank you in advance for your time.

#!/bin/bash

# Code partly inspired by https://github.com/tumi8/quic-10g-paper

# Variables
QUICHE_REPO="https://github.com/qdeconinck/quiche.git"
QUICHE_COMMIT="d87332018d84fb7c429ad2ed34cbfdc6ee9477c8"
RUST_PLATFORM="x86_64-unknown-linux-gnu"
FILE_SIZE=8G
NB_RUNS=10

RED='\033[0;31m'
RESET='\033[0m'

echo_red() {
    echo -e "${RED}$1${RESET}"
}

get_unused_port(){
    local port
    port=$(shuf -i 2000-65000 -n 1)
    while netstat -atn | grep -q ":$port "; do
        port=$(shuf -i 2000-65000 -n 1)
    done
    echo "$port"
}

clone_mp_quiche() {
    if [ ! -d "./quiche" ]; then
        git clone --recursive "$QUICHE_REPO"
        cd quiche || exit
        git checkout "$QUICHE_COMMIT"
        RUSTFLAGS='-C target-cpu=native' cargo build --release
        cd ..
    fi
    if [ ! -f "./quiche-client" ]; then
        cp "quiche/target/release/quiche-client" .
    fi
    if [ ! -f "./quiche-server" ]; then
        cp "quiche/target/release/quiche-server" .
    fi
}

setup_rust() {
    # Rust
    if ! rustc --version 1>/dev/null 2>&1; then
        curl --proto '=https' --tlsv1.2 -sSf -o /tmp/rustup-init.sh https://sh.rustup.rs
        chmod +x /tmp/rustup-init.sh
        /tmp/rustup-init.sh -q -y --default-host "$RUST_PLATFORM" --default-toolchain stable --profile default
        source "$HOME/.cargo/env"
    else 
        echo "Rust is already installed"
    fi
}

setup_environment() {
    mkdir -p "$(pwd)/www" "$(pwd)/responses" "$(pwd)/logs"
    fallocate -l ${FILE_SIZE} "$(pwd)/www/${FILE_SIZE}B_file"
}

iteration_loop() {
    for iter in $(seq 1 ${NB_RUNS}); do
        echo "Testing Multi-Path QUIC correctness - Iteration $iter"
        
        server_port=$(get_unused_port)
        client_port_1=$(get_unused_port)
        client_port_2=$(get_unused_port)

        # Run server
        env RUST_LOG=info ./quiche-server \
            --listen 127.0.0.1:${server_port} \
            --root "$(pwd)/www/" \
            --key "$(pwd)/quiche/apps/src/bin/cert.key" \
            --cert "$(pwd)/quiche/apps/src/bin/cert.crt" \
            --multipath \
            1>"$(pwd)/logs/server_${iter}.log" 2>&1 &
        server_pid=$!

        # Run client
        env RUST_LOG=info ./quiche-client \
            --no-verify "https://127.0.0.1:${server_port}/${FILE_SIZE}B_file" \
            --dump-responses "$(pwd)/responses/" \
            -A 127.0.0.1:${client_port_1} \
            -A 127.0.0.1:${client_port_2} \
            --multipath \
            1>"$(pwd)/logs/client_${iter}.log" 2>&1
        error_code=$?

        sleep 1
        
        kill -9 "$server_pid" 1>/dev/null 2>&1
        if [ $error_code -ne 0 ]; then
            echo_red "Error Client: $error_code"
            exit 1
        fi

        # Check if files are the same
        diff -q "$(pwd)/www/${FILE_SIZE}B_file" "$(pwd)/responses/${FILE_SIZE}B_file"
        if [ $? -ne 0 ]; then
            echo_red "Error: files are not the same"
            exit 1
        fi
    done
}

main() {
    # Version
    setup_rust
    [ $? -ne 0 ] && { echo_red "Error setting up rust"; exit 1; }
    clone_mp_quiche
    [ $? -ne 0 ] && { echo_red "Error cloning quiche"; exit 1; }
    setup_environment
    [ $? -ne 0 ] && { echo_red "Error setting up environment"; exit 1; }
    iteration_loop
}

main

logs.zip
