New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Connection migration support #1223
Conversation
@qdeconinck can you check the test-failure ? |
The integration indicates a failure with
|
After investigation, it seems that the failing test was due to the slowness of the |
apps/src/args.rs
Outdated
@@ -68,6 +70,8 @@ pub struct CommonArgs { | |||
/// --dgram-proto PROTO DATAGRAM application protocol. | |||
/// --dgram-count COUNT Number of DATAGRAMs to send. | |||
/// --dgram-data DATA DATAGRAM data to send. | |||
/// --max-active-cids NUM Maximum number of active Connection IDs. | |||
/// --migrate Enable active connection migration. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not a huge fan of this flag name since it can be overloaded. How about --enable-active-migration
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Naming is complicated :) but yes, --enable-active-migration
actually better describes what this option does. Will update that and the following occurrences.
apps/src/args.rs
Outdated
@@ -216,6 +229,9 @@ Options: | |||
--no-grease Don't send GREASE. | |||
--cc-algorithm NAME Specify which congestion control algorithm to use [default: cubic]. | |||
--disable-hystart Disable HyStart++. | |||
--max-active-cids NUM The maximum number of active Connection IDs we can support [default: 2]. | |||
--migrate Enable active connection migration. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same comment here regarding the name.
apps/src/args.rs
Outdated
@@ -360,6 +381,8 @@ Options: | |||
--dgram-data DATA Data to send for certain types of DATAGRAM application protocol [default: brrr]. | |||
--cc-algorithm NAME Specify which congestion control algorithm to use [default: cubic]. | |||
--disable-hystart Disable HyStart++. | |||
--max-active-cids NUM The maximum number of active Connection IDs we can support [default: 2]. | |||
--migrate Enable active connection migration. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same here :-)
apps/src/bin/quiche-server.rs
Outdated
.conn | ||
.probe_path(local_addr, peer_addr) | ||
.expect("cannot probe"); | ||
}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: house style prefers blank line between match cases. Given this is a bit of a sizeable block of code, please consider putting it in a method.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated that, also integrated the trace_id
of the client's connection for each received event.
apps/src/client.rs
Outdated
|
||
Err(e) => { | ||
// There are no more UDP packets to read, so end the read | ||
// loop. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this comment true? Isn't there a possibility that there are other packets to read from other sockets?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch! Indeed, we should instead iterate over the events and loop on a given socket until there is nothing more to read on that one. Will update the code.
apps/src/client.rs
Outdated
// There are no more UDP packets to read, so end the read | ||
// loop. | ||
if e.kind() == std::io::ErrorKind::WouldBlock { | ||
trace!("recv() would block"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we probably want to add detail to this line to say which socket would block
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indeed. I added in the log the local address on which the socket is bound.
apps/src/client.rs
Outdated
|
||
trace!("got {} bytes", len); | ||
trace!("got {} bytes", len); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
from experience, it helps to be as descriptive in these lines as possible. So like above let's consider if there is any way for the log line to discriminate exactly what received the bytes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did the same as above.
apps/src/client.rs
Outdated
while let Ok(qe) = conn.poll_path() { | ||
match qe { | ||
quiche::PathEvent::New(..) => { | ||
panic!("Should not see new path"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I appreciate this is an semantic error but it would be great to avoid the explicit panic. This makes me wonder, does the library event return such an event?
Perhaps a clearer thing to do would be to use unreachable!()
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On a client-side connection, it should not. I replaced that to unreachable!()
.
apps/src/client.rs
Outdated
@@ -78,14 +78,24 @@ pub fn connect( | |||
std::net::SocketAddr::V6(_) => "[::]:0", | |||
}; | |||
|
|||
let mut sockets = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we are only going to use at most 2 sockets in the migration, I think it would be better to make that explicit (i.e. just name the objects) rather than have a potentially unbounded vector that requires magic indexes, what do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have no objection doing explicit naming instead of the current sockets
solution. My original motivation to do so was to handle the variable number of sockets with slightly less code. I now have socket
and migrate_socket
, the second being an Option
depending whether the application wants to perform migration or not.
apps/src/client.rs
Outdated
@@ -446,3 +553,16 @@ pub fn connect( | |||
|
|||
Ok(()) | |||
} | |||
|
|||
/// Generate a new pair of Source Connection ID and reset token. | |||
fn generate_cid_and_reset_token<T: SecureRandom>( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are two of these and they seem identical?
If so, please put common code in apps/src/common.rs
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point, moved the duplicate codes into common.rs
apps/src/common.rs
Outdated
@@ -89,6 +89,8 @@ pub struct Client { | |||
|
|||
pub http_conn: Option<Box<dyn HttpConn>>, | |||
|
|||
pub id: u64, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a more descriptive term than just id
that we could use?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe client_id
? It is used as an intermediate identifier between Connection IDs and actual client's connections.
@@ -520,6 +621,12 @@ typedef struct { | |||
|
|||
// DATAGRAM frame extension parameter, if any. | |||
ssize_t peer_max_datagram_frame_size; | |||
|
|||
// The stats of the connection's paths. | |||
quiche_path_stats paths[8]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
where does 8
come from?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is actually a magic number to ensure the structure can hold up to 8 path's statistics. I could instead put a simple pointer along with a length, but it is unclear to me whether the caller should allocate the required space or the ffi
should instead do this.
09438b7
to
21ce1c7
Compare
ffff7b4
to
9711413
Compare
I pushed a commit that does a bit of refactoring and clean-up. For the most part there should be no functional change, however I did rename or otherwise change some of the new public APIs as follows:
(I might have missed something in the list above) |
9711413
to
a0264b6
Compare
92b3d16
to
3f96d4d
Compare
The proposed refactoring looks good to me. |
This commit adds connection migration support, as well as related features such as support for changing connection IDs and tracking multiple network paths. Network path changes are exposed to applications through the `path_event_next()` API, and migration is done either passively as a server (raising a `PathEvent::PeerMigrated` event to the app), or actively as a client through the `migrate()` API. Note that after a migration, the connection might lose connectivity, e.g. because the client uses a non-working 4-tuple or because there was some form of address spoofing attack seen as a migration by the server. quiche includes a fall-back support when the newly migrated path is detected as non-working. This required introducing full 4-tuple awareness to quiche, i.e. both the peer and local addresses (while before only the peer's address was tracked), which required changing existing public APIs, such as `SendInfo`, `RecvInfo`, `connect()` and `accept()`, but applications can be easily updated as the local address can be typically retrieved using `socket.local_addr().unwrap()`. In addition, this adds APIs for considering only a specific 4-tuple using the `send_on_path()` method (implementers might also be interested in the `paths_iter()` method). This also adds tracking of multiple network paths, via the `Path` structure (located in its dedicated `path` module). Note that the `Recovery` structure is now relative to a path and not on the whole connection anymore. This enables the stack to easily maintain sent packets, RTT estimates, congestion control state,... on each existing path. All these network paths are handled by a dedicated structure, the `PathMap`, also located in the `path` module. The stack is now able to probe network paths in order to validate them through PATH_CHALLENGE/PATH_RESPONSE exchange. Such challenge process can be triggered by the application thanks to the `probe_path()` method. While the client can probe new 4-tuples, the server first needs to see a peer's packet originating from the 4-tuple before probing it. Several new `PathEvent` events returned by the `path_event_next()` method (`New`, `Validated`, `FailedValidation`,...) enables the application to handle such cases. Finally, support for Connection ID changes was also added. A connection is not bound to a single SCID-DCID pair and can now advertise and use other Connection IDs through NEW_CONNECTION_ID and RETIRE_CONNECTION_ID frames. From an API viewpoint, a host can indicate in its QUIC config the maximum number of active connection IDs it can support (`set_active_connection_id_limit()`). Then, it can actively provide Connection IDs to its peer (`new_source_cid()`) and retire peer-provided Connection IDs (`retire_destination_cid()`). Retiring a source Connection ID are monitored through the `retired_scid_next()` method. Connection IDs are handled by a dedicated structure, `ConnectionIdentifiers`, located in the newly introduced `cid` module.
3f96d4d
to
57be325
Compare
Merged 🎉, thanks @qdeconinck for all your work and @ehaydenr for the review! |
This PR provides provides connection migration support to quiche. It is composed of 3 incremental parts (the current 3 commits), starting from CID handling support, then path validation and finally connection migration. See the commit messages for further details about the changes introduced by each of them.
Note that while most of the changes are additive, there are a few breaking API changes introduced by the path validation support. Namely,
RecvInfo
now contains theto
field, indicating the local host address on which the packet was received.SendInfo
now contains thefrom
field, indicating from which local host address the packet should be sent.connect
andaccept
now takes an ownedConnectionId
for the source CID and an additionalSocketAddr
parameter corresponding to the (initial) local host address that is used by the connection.