Defer DTLS handshake consumption until parse succeeds #144

Merged
algesten merged 6 commits into main from fix-handshake-parse-before-handle
Jun 21, 2026

@algesten algesten commented Jun 15, 2026

Owner

Summary:

@algesten

Owner Author

Alternative take on #143

@zRedShift I felt #143 was going in a very complex direction. This is another attempt that keeps the change smaller.

@zRedShift zRedShift left a comment

Contributor

Didn't have time to look into this, sorry for the delay. Here's a Codex/GPT-5.5-assisted review.

Findings

High: Extension bodies are still validated after handshake consumption

This PR moves set_handled() and transcript append until after Body::parse(), but Body::parse() still does not validate all known extension bodies. The Extension::parse() helpers keep raw extension data ranges, and the known extension body parsers still run later in the client/server state handlers. That leaves the core #142 failure mode intact: malformed known extension bodies can still be parsed after the handshake has already been consumed.

Evidence:

pub fn parse(input: &[u8], base_offset: usize) -> IResult<&[u8], Extension> {
    let original_input = input;
    let (input, extension_type) = ExtensionType::parse(input)?;
    let (input, extension_length) = be_u16(input)?;
    let (input, extension_data_slice) = if extension_length > 0 {
        take(extension_length)(input)?
    } else {
        (input, &input[0..0])
    };
    // Calculate absolute range in root buffer
    let relative_offset =
        extension_data_slice.as_ptr() as usize - original_input.as_ptr() as usize;
    let start = base_offset + relative_offset;
    let end = start + extension_data_slice.len();
    Ok((
        input,
        Extension {
            extension_type,
            extension_data_range: start..end,
        },
    ))
}
pub fn extension_data<'a>(&self, buf: &'a [u8]) -> &'a [u8] {
    &buf[self.extension_data_range.clone()]

let (rest, body) = Body::parse(buffer, 0, first_handshake.header.msg_type, cipher_suite)?;
if !rest.is_empty() && first_handshake.header.msg_type == MessageType::Finished {
    debug!("Defragmentation failed. Body::parse() did not consume the entire buffer");
    return Err(crate::InternalError::parse_incomplete());
}
for handshake in handled {
    handshake.set_handled();
}
// If transcript is provided, write the handshake header + body after parsing succeeds.
if let Some(transcript) = transcript {
    transcript.push(first_handshake.header.msg_type.as_u8());
    transcript.extend_from_slice(&first_handshake.header.length.to_be_bytes()[1..]);
    transcript.extend_from_slice(&first_handshake.header.message_seq.to_be_bytes());
    // Defragmented handshake has fragment_offset=0 and fragment_length=length
    transcript.extend_from_slice(&0u32.to_be_bytes()[1..]);
    transcript.extend_from_slice(&first_handshake.header.length.to_be_bytes()[1..]);
    transcript.extend_from_slice(&buffer[..first_handshake.header.length as usize]);

let (rest, body) = if allow_unknown_client_hello_suites {
    Body::parse_allow_unknown_client_hello_suites(
        buffer,
        0,
        first_handshake.header.msg_type,
        cipher_suite,
    )?
} else {
    Body::parse(buffer, 0, first_handshake.header.msg_type, cipher_suite)?
};
if !rest.is_empty() && first_handshake.header.msg_type == MessageType::Finished {
    debug!("Defragmentation failed. Body::parse() did not consume the entire buffer");
    return Err(crate::InternalError::parse_incomplete());
}
for handshake in handled {
    handshake.set_handled();
}
// If transcript is provided, write the TLS 1.3-style header + body after parsing succeeds.
// Per RFC 9147 Section 5.2, the transcript uses msg_type(1) + length(3)
// WITHOUT the DTLS-specific message_seq, fragment_offset, fragment_length.
if let Some(transcript) = transcript {
    transcript.push(first_handshake.header.msg_type.as_u8());
    transcript.extend_from_slice(&first_handshake.header.length.to_be_bytes()[1..]);
    transcript.extend_from_slice(&buffer[..first_handshake.header.length as usize]);

dimpl/src/dtls12/server.rs

Lines 422 to 442 in 09dafc4

for ext in ch.extensions {
    match ext.extension_type {
        ExtensionType::UseSrtp => {
            let ext_data = ext.extension_data(&server.defragment_buffer);
            let (_, use_srtp) =
                UseSrtpExtension::parse(ext_data).map_err(InternalError::from)?;
            client_srtp_profiles = Some(use_srtp.profiles);
        }
        ExtensionType::ExtendedMasterSecret => {
            client_offers_ems = true;
        }
        ExtensionType::SupportedGroups => {
            let ext_data = ext.extension_data(&server.defragment_buffer);
            let (_, groups) =
                SupportedGroupsExtension::parse(ext_data).map_err(InternalError::from)?;
            client_supported_groups = Some(groups.groups);
        }
        ExtensionType::EcPointFormats => {
            let ext_data = ext.extension_data(&server.defragment_buffer);
            let _ =
                ECPointFormatsExtension::parse(ext_data).map_err(InternalError::from)?;

dimpl/src/dtls13/server.rs

Lines 449 to 496 in 09dafc4

for ext in &client_hello.extensions {
    match ext.extension_type {
        ExtensionType::SupportedVersions => {
            let ext_data = ext.extension_data(&server.defragment_buffer);
            let (_, sv) = SupportedVersionsClientHello::parse(ext_data)
                .map_err(InternalError::from)?;
            for v in &sv.versions {
                if *v == ProtocolVersion::DTLS1_3 {
                    supported_versions_ok = true;
                }
            }
        }
        ExtensionType::KeyShare => {
            let ext_data = ext.extension_data(&server.defragment_buffer);
            let ext_data_start = ext.extension_data_range.start;
            let (_, ks) = KeyShareClientHello::parse(ext_data, ext_data_start)
                .map_err(InternalError::from)?;
            let mut entries = ArrayVec::new();
            for entry in &ks.entries {
                entries
                    .try_push((entry.group, entry.key_exchange_range.clone()))
                    .map_err(|_| {
                        InternalError::parse(nom::error::ErrorKind::LengthValue)
                    })?;
            }
            client_key_shares = Some(entries);
        }
        ExtensionType::SupportedGroups => {
            let ext_data = ext.extension_data(&server.defragment_buffer);
            let (_, sg) =
                SupportedGroupsExtension::parse(ext_data).map_err(InternalError::from)?;
            client_supported_groups = Some(sg.groups);
        }
        ExtensionType::SignatureAlgorithms => {
            let ext_data = ext.extension_data(&server.defragment_buffer);
            // Parse but we don't currently filter by signature algorithms
            let _ = SignatureAlgorithmsExtension::parse(ext_data);
        }
        ExtensionType::UseSrtp => {
            let ext_data = ext.extension_data(&server.defragment_buffer);
            let (_, use_srtp) =
                UseSrtpExtension::parse(ext_data).map_err(InternalError::from)?;
            client_srtp_profiles = Some(use_srtp.profiles);
        }
        ExtensionType::Cookie => {
            let ext_data = ext.extension_data(&server.defragment_buffer);
            let (_, cookie) =
                parse_cookie_extension(ext_data).map_err(InternalError::from)?;

In DTLS 1.2, next_handshake() also advances peer_handshake_seq_no before those later extension parsers run:

dimpl/src/dtls12/engine.rs

Lines 664 to 672 in 09dafc4

let handshake = Handshake::defragment(
    iter,
    defragment_buffer,
    self.cipher_suite,
    Some(&mut self.transcript),
)?;
// Move the expected seq_no along
self.peer_handshake_seq_no = handshake.header.message_seq + 1;

Suggested fix: validate all known extension bodies before committing handled/transcript/sequence state, or store already-validated extension-body data in the parsed handshake body so the state handlers no longer need to run fallible extension parsers after consumption.

Suggested regression: malformed known extension body (use_srtp, supported_groups, key_share, or cookie) followed by a clean retransmission with the same handshake key. The test should prove that only the clean message reaches transcript/sequence state and that the handshake can continue.
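
A minimal sketch of the first suggested fix, not dimpl's actual code: every known extension body is parsed into an owned value first, and transcript and sequence state change only if all of them succeed. `RawExtension`, `ValidatedExtension`, `HandshakeState`, `validate_all` and `consume` are hypothetical names, and only `use_srtp` (extension type 14) is modelled.

```rust
/// Hypothetical raw extension: a type code plus its still-unparsed body.
struct RawExtension<'a> {
    ext_type: u16,
    body: &'a [u8],
}

/// Hypothetical validated form. Only use_srtp (type 14) is modelled.
#[derive(Debug, PartialEq)]
enum ValidatedExtension {
    UseSrtp(Vec<u16>),
    Opaque(u16),
}

#[derive(Debug, PartialEq)]
struct ParseError;

/// use_srtp body: u16 profile-list length, 2-byte profiles, u8 MKI length, MKI.
fn parse_use_srtp(body: &[u8]) -> Result<Vec<u16>, ParseError> {
    if body.len() < 2 {
        return Err(ParseError);
    }
    let list_len = u16::from_be_bytes([body[0], body[1]]) as usize;
    let rest = &body[2..];
    if list_len % 2 != 0 || rest.len() < list_len + 1 {
        return Err(ParseError);
    }
    let profiles = rest[..list_len]
        .chunks_exact(2)
        .map(|c| u16::from_be_bytes([c[0], c[1]]))
        .collect();
    let mki_len = rest[list_len] as usize;
    if rest.len() != list_len + 1 + mki_len {
        return Err(ParseError);
    }
    Ok(profiles)
}

/// Validate every extension body; the first failure aborts the whole batch.
fn validate_all(exts: &[RawExtension]) -> Result<Vec<ValidatedExtension>, ParseError> {
    exts.iter()
        .map(|e| match e.ext_type {
            14 => parse_use_srtp(e.body).map(ValidatedExtension::UseSrtp),
            other => Ok(ValidatedExtension::Opaque(other)),
        })
        .collect()
}

/// Hypothetical stand-in for the transcript and expected-sequence state.
struct HandshakeState {
    transcript: Vec<u8>,
    next_seq: u16,
}

/// Commit point: state is mutated only after validation has succeeded.
fn consume(
    state: &mut HandshakeState,
    seq: u16,
    raw: &[u8],
    exts: &[RawExtension],
) -> Result<Vec<ValidatedExtension>, ParseError> {
    let validated = validate_all(exts)?; // nothing has been mutated yet
    state.transcript.extend_from_slice(raw);
    state.next_seq = seq.wrapping_add(1);
    Ok(validated)
}
```

With this shape the state handlers would receive already-validated values, so no fallible extension parser runs after the commit point.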

High: Non-Finished parse leftovers are still committed

The new consume gate also treats Body::parse() success as enough even when the parser leaves trailing bytes. Both defragment paths only reject non-empty rest for Finished; other known handshake bodies can return a valid prefix plus leftover bytes and still get marked handled and appended to the transcript.

Evidence:

let (rest, body) = Body::parse(buffer, 0, first_handshake.header.msg_type, cipher_suite)?;
if !rest.is_empty() && first_handshake.header.msg_type == MessageType::Finished {
    debug!("Defragmentation failed. Body::parse() did not consume the entire buffer");
    return Err(crate::InternalError::parse_incomplete());
}
for handshake in handled {
    handshake.set_handled();
}
// If transcript is provided, write the handshake header + body after parsing succeeds.
if let Some(transcript) = transcript {
    transcript.push(first_handshake.header.msg_type.as_u8());
    transcript.extend_from_slice(&first_handshake.header.length.to_be_bytes()[1..]);
    transcript.extend_from_slice(&first_handshake.header.message_seq.to_be_bytes());
    // Defragmented handshake has fragment_offset=0 and fragment_length=length
    transcript.extend_from_slice(&0u32.to_be_bytes()[1..]);
    transcript.extend_from_slice(&first_handshake.header.length.to_be_bytes()[1..]);
    transcript.extend_from_slice(&buffer[..first_handshake.header.length as usize]);

let (rest, body) = if allow_unknown_client_hello_suites {
    Body::parse_allow_unknown_client_hello_suites(
        buffer,
        0,
        first_handshake.header.msg_type,
        cipher_suite,
    )?
} else {
    Body::parse(buffer, 0, first_handshake.header.msg_type, cipher_suite)?
};
if !rest.is_empty() && first_handshake.header.msg_type == MessageType::Finished {
    debug!("Defragmentation failed. Body::parse() did not consume the entire buffer");
    return Err(crate::InternalError::parse_incomplete());
}
for handshake in handled {
    handshake.set_handled();
}
// If transcript is provided, write the TLS 1.3-style header + body after parsing succeeds.
// Per RFC 9147 Section 5.2, the transcript uses msg_type(1) + length(3)
// WITHOUT the DTLS-specific message_seq, fragment_offset, fragment_length.
if let Some(transcript) = transcript {
    transcript.push(first_handshake.header.msg_type.as_u8());
    transcript.extend_from_slice(&first_handshake.header.length.to_be_bytes()[1..]);
    transcript.extend_from_slice(&buffer[..first_handshake.header.length as usize]);

Examples of parsers that can return leftover bytes after consuming a declared inner vector:

// Parse extensions length
let (remaining, extensions_len) = be_u16(input)?;
// Early return if extensions length is 0
if extensions_len == 0 {
    return Ok((remaining, extensions));
}
// Take the extensions data
let (remaining, extensions_data) = take(extensions_len)(remaining)?;

pub fn parse(input: &[u8], base_offset: usize) -> IResult<&[u8], EncryptedExtensions> {
    let original_input = input;
    let (input, extensions_len) = be_u16(input)?;
    let (input, extensions_data) = take(extensions_len)(input)?;
    let data_base_offset =
        base_offset + (extensions_data.as_ptr() as usize - original_input.as_ptr() as usize);
    let mut extensions = ArrayVec::new();
    let mut rest = extensions_data;
    let mut current_offset = data_base_offset;
    while !rest.is_empty() {
        let before_len = rest.len();
        let (new_rest, ext) = Extension::parse(rest, current_offset)?;
        let parsed_len = before_len - new_rest.len();
        current_offset += parsed_len;
        if ext.extension_type.is_supported() {
            if extensions
                .iter()
                .any(|existing: &Extension| existing.extension_type == ext.extension_type)
            {
                return Err(Err::Failure(Error::new(rest, ErrorKind::LengthValue)));
            }
            extensions
                .try_push(ext)
                .map_err(|_| Err::Failure(Error::new(rest, ErrorKind::LengthValue)))?;
        }
        rest = new_rest;
    }
    Ok((input, EncryptedExtensions { extensions }))

pub fn parse(input: &[u8], base_offset: usize) -> IResult<&[u8], Certificate> {
    let original_input = input;
    let (input, total_len) = be_u24(input)?;
    let (input, certs_data) = take(total_len)(input)?;
    // Calculate base offset for certs_data within the root buffer
    let certs_base_offset =
        base_offset + (certs_data.as_ptr() as usize - original_input.as_ptr() as usize);
    // Parse certificates manually with dynamic base_offset
    let mut certificate_list = ArrayVec::new();
    let mut rest = certs_data;
    while !rest.is_empty() {
        let offset =
            certs_base_offset + (rest.as_ptr() as usize - certs_data.as_ptr() as usize);
        let (new_rest, cert) = Asn1Cert::parse(rest, offset)?;
        certificate_list
            .try_push(cert)
            .map_err(|_| Err::Failure(Error::new(rest, ErrorKind::LengthValue)))?;
        rest = new_rest;
    }
    Ok((input, Certificate { certificate_list }))
}

pub fn parse(input: &[u8], base_offset: usize) -> IResult<&[u8], Certificate> {
    let original_input = input;
    // certificate_request_context<0..255>
    let (input, context_len) = be_u8(input)?;
    let (input, context_slice) = take(context_len)(input)?;
    let context_relative = context_slice.as_ptr() as usize - original_input.as_ptr() as usize;
    let context_range = (base_offset + context_relative)
        ..(base_offset + context_relative + context_slice.len());
    // certificate_list<0..2^24-1>
    let (input, total_len) = be_u24(input)?;
    let (input, certs_data) = take(total_len)(input)?;
    let certs_base_offset =
        base_offset + (certs_data.as_ptr() as usize - original_input.as_ptr() as usize);
    let mut certificate_list = ArrayVec::new();
    let mut rest = certs_data;
    while !rest.is_empty() {
        let entry_base =
            certs_base_offset + (rest.as_ptr() as usize - certs_data.as_ptr() as usize);
        // cert_data<1..2^24-1>
        let (r, cert) = Asn1Cert::parse(rest, entry_base)?;
        // extensions<0..2^16-1>
        let (r, ext_len) = be_u16(r)?;
        let (r, ext_slice) = take(ext_len)(r)?;
        let ext_relative = ext_slice.as_ptr() as usize - certs_data.as_ptr() as usize;
        let extensions_range = (certs_base_offset + ext_relative)
            ..(certs_base_offset + ext_relative + ext_slice.len());
        certificate_list
            .try_push(CertificateEntry {
                cert,
                extensions_range,
            })
            .map_err(|_| Err::Failure(Error::new(rest, ErrorKind::LengthValue)))?;
        rest = r;
    }
    Ok((
        input,
        Certificate {

Suggested fix: make the pre-consumption validation require exact whole-body consumption for every known non-opaque handshake body, not only Finished. If any known body parser leaves trailing bytes, handle it through the same discard/replacement path as other transient malformed input.

Suggested regression: non-Finished handshake body with a valid prefix plus trailing bytes, followed by a clean retransmission. The malformed prefix should not reach transcript/sequence state, and the clean retransmission should remain processable.
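
The exact-consumption rule can be sketched as a small wrapper around any parser that returns `(rest, value)`. This is an illustration under assumptions: `BodyError`, `parse_len_prefixed` and `parse_exact` are hypothetical names, and the toy parser uses plain slices instead of nom.

```rust
#[derive(Debug, PartialEq)]
enum BodyError {
    Malformed,
    /// The parser succeeded but left this many bytes unconsumed.
    TrailingBytes(usize),
}

/// Toy parser in the style of the snippets above: read a u16 length prefix,
/// take that many bytes, and hand back whatever follows as `rest`.
fn parse_len_prefixed(input: &[u8]) -> Result<(&[u8], &[u8]), BodyError> {
    if input.len() < 2 {
        return Err(BodyError::Malformed);
    }
    let len = u16::from_be_bytes([input[0], input[1]]) as usize;
    let input = &input[2..];
    if input.len() < len {
        return Err(BodyError::Malformed);
    }
    let (data, rest) = input.split_at(len);
    Ok((rest, data))
}

/// Run `parser` and reject the result unless the whole input was consumed.
fn parse_exact<'a, T>(
    input: &'a [u8],
    parser: impl Fn(&'a [u8]) -> Result<(&'a [u8], T), BodyError>,
) -> Result<T, BodyError> {
    let (rest, value) = parser(input)?;
    if !rest.is_empty() {
        return Err(BodyError::TrailingBytes(rest.len()));
    }
    Ok(value)
}
```

A body whose inner vector is valid but is followed by one stray byte then fails with `TrailingBytes(1)` instead of being accepted as a valid prefix.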

High: Failed body parses can leave an unhandled queue entry that blocks recovery

For malformed handshakes rejected before the post-parse commit point (Body::parse() errors or the Finished exact-tail check), the PR now returns before marking the candidate fragments handled. The public packet paths swallow transient parser errors, so the bad queued entry remains visible. A later clean retransmission with the same (message_seq, fragment_offset) can then be dropped as a duplicate instead of replacing or bypassing the failed candidate.

Evidence:

let (rest, body) = Body::parse(buffer, 0, first_handshake.header.msg_type, cipher_suite)?;
if !rest.is_empty() && first_handshake.header.msg_type == MessageType::Finished {
    debug!("Defragmentation failed. Body::parse() did not consume the entire buffer");
    return Err(crate::InternalError::parse_incomplete());
}
for handshake in handled {
    handshake.set_handled();

dimpl/src/dtls12/engine.rs

Lines 298 to 315 in 09dafc4

let search_result = self.queue_rx.binary_search_by(|item| {
    let key_other = item
        .first()
        .first_handshake()
        .as_ref()
        .map(|h| (h.header.message_seq, h.header.fragment_offset))
        .unwrap_or((u16::MAX, u32::MAX));
    key_other.cmp(&key_current)
});
match search_result {
    Err(index) => {
        // Insert in order of handshake key
        self.queue_rx.insert(index, incoming);
    }
    Ok(_) => {
        // Exact duplicate handshake fragment
    }

let (rest, body) = if allow_unknown_client_hello_suites {
    Body::parse_allow_unknown_client_hello_suites(
        buffer,
        0,
        first_handshake.header.msg_type,
        cipher_suite,
    )?
} else {
    Body::parse(buffer, 0, first_handshake.header.msg_type, cipher_suite)?
};
if !rest.is_empty() && first_handshake.header.msg_type == MessageType::Finished {
    debug!("Defragmentation failed. Body::parse() did not consume the entire buffer");
    return Err(crate::InternalError::parse_incomplete());
}
for handshake in handled {
    handshake.set_handled();

dimpl/src/dtls13/engine.rs

Lines 420 to 450 in 09dafc4

Ok(index) => {
    // Duplicate message_seq + fragment_offset. Replace if either:
    // (a) the existing entry was already consumed (handled), so a
    // fresh retransmission (e.g., CH2 after HRR) can be processed.
    // (b) the existing entry looks corrupted: its total `length`
    // differs from `fragment_length` while the new entry's match,
    // indicating the retransmission corrected a bit-flip.
    let existing = &self.queue_rx[index];
    let should_replace = existing.first().is_handled() || {
        let existing_corrupt = existing
            .first()
            .first_handshake()
            .map(|h| h.header.length != h.header.fragment_length)
            .unwrap_or(false);
        let incoming_ok = incoming
            .first()
            .first_handshake()
            .map(|h| h.header.length == h.header.fragment_length)
            .unwrap_or(false);
        existing_corrupt && incoming_ok
    };
    if should_replace {
        for record in incoming.records().iter() {
            let seq = record.record().sequence;
            if seq.epoch >= 2 {
                let _ = self
                    .received_record_numbers
                    .try_push((seq.epoch as u64, seq.sequence_number));
            }
        }
        self.queue_rx[index] = incoming;

dimpl/src/error.rs

Lines 687 to 693 in 09dafc4

pub(crate) fn into_public_error(self) -> Option<Error> {
    match self {
        Self::Transient(err) => {
            debug!("Discarding packet: {err}");
            None
        }
        Self::Fatal(err) => Some(err),

Suggested fix: keep transcript mutation deferred, but make failed assembled-body parses discard or mark only the failed candidate fragments as handled, or track parse-failed queue entries so an exact clean retransmission can replace them. Be careful with fragmented messages: replacing only the first fragment is not enough if stale later fragments still participate in defragmentation.

Suggested regression: a complete same-length malformed ClientHello/Finished/KeyUpdate candidate that fails body parsing, followed by a corrected retransmission with the same handshake key. The connection should progress and the transcript should contain only the corrected message.
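
One of the options in the suggested fix, tracking parse-failed queue entries so an exact retransmission can replace them, might look like the sketch below. `Fragment`, its `parse_failed` flag and `enqueue` are hypothetical; the real queue holds parsed records, not bare fragments.

```rust
#[derive(Debug, Clone, PartialEq)]
struct Fragment {
    message_seq: u16,
    fragment_offset: u32,
    payload: Vec<u8>,
    /// Set when an assembled candidate containing this fragment failed to parse.
    parse_failed: bool,
}

/// Insert `incoming` in handshake-key order. Returns true if the queue changed.
fn enqueue(queue: &mut Vec<Fragment>, incoming: Fragment) -> bool {
    let key = (incoming.message_seq, incoming.fragment_offset);
    match queue.binary_search_by(|f| (f.message_seq, f.fragment_offset).cmp(&key)) {
        // New key: insert at the sorted position.
        Err(index) => {
            queue.insert(index, incoming);
            true
        }
        // Same key, but the queued entry is known bad: let the retransmission in.
        Ok(index) if queue[index].parse_failed => {
            queue[index] = incoming;
            true
        }
        // Same key and the queued entry is still a live candidate: drop the duplicate.
        Ok(_) => false,
    }
}
```

As the finding notes, this handles only single-fragment candidates; for a fragmented message every stale fragment of the failed candidate would need the same treatment.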

Medium: The new stack cap counts a different unit than the existing defragment cap

MAX_DEFRAGMENT_HANDSHAKES caps flattened handshake entries, but the existing defragment visibility cap is expressed in records, and each record can contain multiple parsed handshakes. That introduces a new protocol-visible limit without boundary coverage.

Evidence:

use nom::number::complete::{be_u16, be_u24};
const MAX_DEFRAGMENT_HANDSHAKES: usize = 50;

dimpl/src/dtls12/engine.rs

Lines 593 to 600 in 09dafc4

fn has_complete_handshake_with_seq(&mut self, wanted: MessageType, expected_seq: u16) -> bool {
    let mut skip_handled = self
        .queue_rx
        .iter()
        .flat_map(|i| i.records().iter())
        .skip_while(|r| r.is_handled())
        // Cap to MAX_DEFRAGMENT_PACKETS to avoid misbehaving peers
        .take(MAX_DEFRAGMENT_PACKETS)

dimpl/src/dtls13/engine.rs

Lines 729 to 735 in 09dafc4

fn has_complete_handshake_with_seq(&mut self, wanted: MessageType, expected_seq: u16) -> bool {
    let mut skip_handled = self
        .queue_rx
        .iter()
        .flat_map(|i| i.records().iter())
        .skip_while(|r| r.is_handled())
        .take(MAX_DEFRAGMENT_PACKETS)

pub struct ParsedRecord {
    record: DTLSRecord,
    handshakes: ArrayVec<Handshake, 8>,

pub struct ParsedRecord {
    record: Dtls13Record,
    handshakes: ArrayVec<Handshake, 8>,

Suggested fix: either derive the handled-fragment cap from the same unit as the caller cap, stop defragmenting once the target handshake is complete, or add boundary tests showing that valid coalesced/fragmented traffic is not rejected at the new limit.
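
The unit mismatch can be made concrete with a toy model. `Record`, `handshakes_within_record_cap` and `fragment_cap_is_tighter` are hypothetical names; the only figures taken from the quoted code are the 8-handshake capacity per record and the cap of 50. The record cap is passed as a parameter because its value is not shown here.

```rust
/// Hypothetical stand-in for a parsed record; the quoted `ParsedRecord`
/// holds at most 8 handshakes per record.
struct Record {
    handshakes: Vec<u16>, // message_seq values, for illustration only
}

/// Count the flattened handshakes visible behind a cap expressed in records.
fn handshakes_within_record_cap(queue: &[Record], max_records: usize) -> usize {
    queue
        .iter()
        .take(max_records)
        .flat_map(|r| r.handshakes.iter())
        .count()
}

/// True if a cap on flattened handshakes would be hit before the record cap.
fn fragment_cap_is_tighter(queue: &[Record], max_records: usize, max_handshakes: usize) -> bool {
    handshakes_within_record_cap(queue, max_records) > max_handshakes
}
```

For example, ten records carrying eight handshakes each stay within a ten-record cap but flatten to 80 entries, which exceeds a cap of 50.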

@algesten

Owner Author

Thanks for the detailed review. I went through the findings one by one and kept the scope intentionally below the broader rollback/transactionality direction from #143.

  1. Extension bodies validated after handshake consumption

Accepted/documented rather than expanded. The malformed known extension payload is still rejected by the client/server state handler; the remaining problem is recovery from a transiently corrupted datagram after the handshake body/envelope has already been consumed. Pulling all known extension payload parsing into the handshake parser, or storing parsed extension payloads in the body, makes this PR drift toward the broader transactionality work. I added comments in the DTLS 1.2 and DTLS 1.3 defragment commit points documenting this parser/state-handler boundary and the accepted recovery edge.

  2. Non-Finished parse leftovers are still committed

Fixed. The exact-consumption check now applies to every recognized handshake message type, not only Finished. Unknown handshake types remain opaque. Added DTLS 1.2 and DTLS 1.3 coverage for known non-Finished bodies with trailing bytes, proving they are rejected before handled/transcript mutation.

  3. Failed body parses can leave an unhandled queue entry that blocks recovery

Fixed with a local parser-layer discard. Once defragmentation has assembled a complete candidate, body parse failure or exact-consumption failure marks only the candidate fragments handled before returning the error. Transcript is not written and sequence state is not advanced. This reuses the existing stack ArrayVec of selected fragments and avoids queue rebuild/rollback state.

  4. Stack cap counts flattened handshakes, not records

Accepted/documented as a sanity cap. The cap is 50 flattened fragments for one defragmentation attempt; exceeding that implies pathologically tiny records for ordinary handshake sizes. I added comments by the cap in both protocol versions clarifying that it intentionally does not mirror the receive queue record-count cap.
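
The parser-layer discard described above can be sketched roughly as follows. This is a simplified model, not the merged code: `Fragment`, `ParseError` and this `defragment` signature are hypothetical, and a `Cell<bool>` stands in for the real handled flag.

```rust
use std::cell::Cell;

#[derive(Debug)]
struct ParseError;

/// Hypothetical queued fragment with an interior-mutable handled flag.
struct Fragment {
    handled: Cell<bool>,
    bytes: Vec<u8>,
}

/// Assemble the candidate, parse it, and always retire its fragments.
/// The transcript is appended only when the parse succeeds.
fn defragment(
    fragments: &[&Fragment],
    transcript: &mut Vec<u8>,
    parse: impl Fn(&[u8]) -> Result<(), ParseError>,
) -> Result<(), ParseError> {
    let mut buffer = Vec::new();
    for f in fragments {
        buffer.extend_from_slice(&f.bytes);
    }
    let result = parse(&buffer);
    // Success or failure, only the selected candidate fragments are marked handled.
    for f in fragments {
        f.handled.set(true);
    }
    // On failure, return before touching the transcript.
    result?;
    transcript.extend_from_slice(&buffer);
    Ok(())
}
```

The design choice is that a failed candidate is thrown away as a unit, so no rollback state is needed, at the cost of relying on a later purge to make room for a retransmission.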

Verification run:

MACOSX_DEPLOYMENT_TARGET=13.0 cargo test handshake

@algesten

Owner Author

@zRedShift I think we're ready to merge and close this out.

@zRedShift zRedShift left a comment

Contributor

From Codex's re-review:

Findings

High: clean same-key retransmissions can still be dropped after parser-layer discard

The new parser-layer discard marks the failed assembled candidate handled before returning from Body::parse() or exact-tail failures:

let (rest, body) =
    match Body::parse(buffer, 0, first_handshake.header.msg_type, cipher_suite) {
        Ok(parsed) => parsed,
        Err(err) => {
            mark_handled(handled);
            return Err(err.into());
        }
    };
if !rest.is_empty()
    && first_handshake
        .header
        .msg_type
        .rejects_trailing_body_bytes()
{
    debug!("Defragmentation failed. Body::parse() did not consume the entire buffer");
    mark_handled(handled);
    return Err(crate::InternalError::parse_incomplete());

let (rest, body) = if allow_unknown_client_hello_suites {
    match Body::parse_allow_unknown_client_hello_suites(
        buffer,
        0,
        first_handshake.header.msg_type,
        cipher_suite,
    ) {
        Ok(parsed) => parsed,
        Err(err) => {
            mark_handled(handled);
            return Err(err.into());
        }
    }
} else {
    match Body::parse(buffer, 0, first_handshake.header.msg_type, cipher_suite) {
        Ok(parsed) => parsed,
        Err(err) => {
            mark_handled(handled);
            return Err(err.into());
        }
    }
};
if !rest.is_empty()
    && first_handshake
        .header
        .msg_type
        .rejects_trailing_body_bytes()
{
    debug!("Defragmentation failed. Body::parse() did not consume the entire buffer");
    mark_handled(handled);
    return Err(crate::InternalError::parse_incomplete());

That avoids transcript/sequence mutation, but it does not reliably let a clean retransmission replace or bypass the discarded candidate.

In DTLS 1.2, duplicate insertion still keys only on (message_seq, fragment_offset), and the Ok(_) duplicate branch drops the retransmission without checking whether the existing candidate is already handled:

dimpl/src/dtls12/engine.rs

Lines 298 to 315 in d7e7cd7

let search_result = self.queue_rx.binary_search_by(|item| {
    let key_other = item
        .first()
        .first_handshake()
        .as_ref()
        .map(|h| (h.header.message_seq, h.header.fragment_offset))
        .unwrap_or((u16::MAX, u32::MAX));
    key_other.cmp(&key_current)
});
match search_result {
    Err(index) => {
        // Insert in order of handshake key
        self.queue_rx.insert(index, incoming);
    }
    Ok(_) => {
        // Exact duplicate handshake fragment
    }

The consumer then skips handled records/handshakes:

dimpl/src/dtls12/engine.rs

Lines 593 to 603 in d7e7cd7

fn has_complete_handshake_with_seq(&mut self, wanted: MessageType, expected_seq: u16) -> bool {
    let mut skip_handled = self
        .queue_rx
        .iter()
        .flat_map(|i| i.records().iter())
        .skip_while(|r| r.is_handled())
        // Cap to MAX_DEFRAGMENT_PACKETS to avoid misbehaving peers
        .take(MAX_DEFRAGMENT_PACKETS)
        .flat_map(|r| r.handshakes().iter())
        .skip_while(|h| h.is_handled())
        .peekable();

purge_handled_queue_rx() does not fully save this path because it only runs from poll_output(), and only removes front entries whose records are fully handled. A clean retransmission that arrives before the next poll/purge is still dropped:

dimpl/src/dtls12/engine.rs

Lines 413 to 416 in d7e7cd7

pub fn poll_output<'a>(&mut self, buf: &'a mut [u8], now: Instant) -> Output<'a> {
    // Drain incoming queue of processed records.
    self.purge_handled_queue_rx();

dimpl/src/dtls12/engine.rs

Lines 470 to 478 in d7e7cd7

fn purge_handled_queue_rx(&mut self) {
    while let Some(peek) = self.queue_rx.front() {
        let fully_handled = peek.records().iter().all(|r| r.is_handled());
        if fully_handled {
            let incoming = self.queue_rx.pop_front().unwrap();
            incoming
                .into_records()
                .for_each(|r| self.buffers_free.push(r.into_buffer()));

DTLS 1.3 has an analogous same-record case. Replacement checks existing.first().is_handled(), but Record::is_handled() is false until all handshakes in that record are handled:

dimpl/src/dtls13/engine.rs

Lines 420 to 450 in d7e7cd7

Ok(index) => {
    // Duplicate message_seq + fragment_offset. Replace if either:
    // (a) the existing entry was already consumed (handled), so a
    // fresh retransmission (e.g., CH2 after HRR) can be processed.
    // (b) the existing entry looks corrupted: its total `length`
    // differs from `fragment_length` while the new entry's match,
    // indicating the retransmission corrected a bit-flip.
    let existing = &self.queue_rx[index];
    let should_replace = existing.first().is_handled() || {
        let existing_corrupt = existing
            .first()
            .first_handshake()
            .map(|h| h.header.length != h.header.fragment_length)
            .unwrap_or(false);
        let incoming_ok = incoming
            .first()
            .first_handshake()
            .map(|h| h.header.length == h.header.fragment_length)
            .unwrap_or(false);
        existing_corrupt && incoming_ok
    };
    if should_replace {
        for record in incoming.records().iter() {
            let seq = record.record().sequence;
            if seq.epoch >= 2 {
                let _ = self
                    .received_record_numbers
                    .try_push((seq.epoch as u64, seq.sequence_number));
            }
        }
        self.queue_rx[index] = incoming;

pub fn is_handled(&self) -> bool {
    if self.parsed.handshakes.is_empty() {
        self.parsed.handled.load(Ordering::Relaxed)
    } else {
        self.parsed.handshakes.iter().all(|h| h.is_handled())
    }

So if the failed candidate is the first handshake in a same-record flight and later handshakes remain unhandled, a clean retransmission of the first candidate is not admitted. The parser discard only handled the selected prefix; the whole record is not handled.

Suggested fix: make duplicate replacement/purge candidate-aware rather than whole-record aware. In particular, DTLS 1.2 should be able to replace or remove handled same-key handshake fragments, and DTLS 1.3 should not require the whole existing record to be handled before replacing a handled same-key handshake candidate. Please add recovery tests where a malformed assembled candidate is followed by a clean retransmission with the same (message_seq, fragment_offset).
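
The difference between a whole-record check and a candidate-aware check can be illustrated with a small model. `Handshake`, `Record` and `is_key_handled` are hypothetical; `is_handled` mirrors only the non-empty branch of the quoted method.

```rust
/// Hypothetical handshake fragment identified by (message_seq, fragment_offset).
struct Handshake {
    message_seq: u16,
    fragment_offset: u32,
    handled: bool,
}

/// Hypothetical record carrying one or more handshakes.
struct Record {
    handshakes: Vec<Handshake>,
}

impl Record {
    /// Whole-record view: true only when every handshake in the record is handled.
    fn is_handled(&self) -> bool {
        self.handshakes.iter().all(|h| h.handled)
    }

    /// Candidate-aware view: is the handshake with this key already handled?
    fn is_key_handled(&self, key: (u16, u32)) -> bool {
        self.handshakes
            .iter()
            .any(|h| (h.message_seq, h.fragment_offset) == key && h.handled)
    }
}
```

In a record whose first handshake was discarded and whose second is still pending, `is_handled()` is false while `is_key_handled((0, 0))` is true, which is the case the finding says a replacement check should admit.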

High: the PR still overclaims Fixes #142

Issue #142 asks that the full handshake body, including inner extension payloads, be validated before consumption/transcript/state mutation. The updated PR explicitly accepts that known extension payload parsing still happens later in the client/server state handlers:

// Intentional boundary: Body::parse validates the handshake body shape and
// extension envelopes, but known extension payloads remain validated by the
// client/server state handlers. A transiently corrupted UDP datagram whose
// extension payload fails later may therefore have been consumed here; that
// recovery edge is accepted to keep this path parser-only and avoid the
// broader transaction/rollback machinery.

That scope choice is understandable for keeping this PR smaller, but it means this PR does not fully close #142 as written. The PR body still says Fixes #142, and also says the unit coverage proves parse failures do not mark handshakes handled, while current tests assert the new discard behavior does mark them handled:

assert!(handshake.is_handled());

Suggested fix: either remove/narrow the Fixes #142 claim and leave the extension-payload recovery part open, or pull extension-payload validation before consumption and add the clean-retransmission regressions for that path too. The PR body should also describe the current behavior as “failed complete candidates are discarded without transcript/sequence mutation”, not “not marked handled.”

Validation

Local checks passed:

cargo fmt --check
git diff --check 32e3b02f6995969014b35aed97eca0889f623d0c...HEAD
/home/ronen/.codex/skills/dimpl/scripts/check-snowflake-local.pl 32e3b02f6995969014b35aed97eca0889f623d0c
cargo test handshake --features rcgen
cargo test --all-targets --features rcgen
cargo clippy --all-targets --features rcgen -- -D warnings
cargo test --no-default-features --features rust-crypto
cargo clippy --no-default-features --features rust-crypto -- -D warnings
cargo test --doc --features rcgen

@algesten

algesten commented Jun 21, 2026

Owner Author

@zRedShift yeah, both of those comments are invalid given that I accept the UDP resend issue described.

@zRedShift zRedShift left a comment

Contributor

@algesten then it's good to go

algesten and others added 6 commits June 21, 2026 15:12
Co-Authored-By: Codex <codex@openai.com>
Co-Authored-By: Codex <codex@openai.com>
Co-Authored-By: Codex <codex@openai.com>
Co-Authored-By: Codex <codex@openai.com>
Co-Authored-By: Codex <codex@openai.com>
Co-Authored-By: Codex <codex@openai.com>
@algesten algesten force-pushed the fix-handshake-parse-before-handle branch from 497064c to 5a2db0c Compare June 21, 2026 13:13
@algesten

Owner Author

To summarize, I disagree with these two as blockers.

For the extension-body case, the remaining problem is recovery from UDP corruption after the handshake envelope has already parsed. That can happen, but it is very much an edge case. Moving all known extension payload parsing into the handshake parser, or making the later state-handler parsing transactional, feels like a lot of machinery for that case.

The packet still errors. It just errors later, when the state handler parses the known extension payload.

For the defragmentation case, I don't think we should try to preserve old fragments and mix them with later retransmitted fragments. Once a complete assembled candidate fails to parse, we don't know which fragment was bad. A clean recovery is a resend of the handshake message or flight, not trying combinations of fragments from attempt one and attempt two until something parses. That would be absurd.

Also, this is Sans-IO. After handle_packet, the caller must poll until Output::Timeout. That poll purges handled queue entries. So marking the failed candidate handled doesn't put us in a state where a full resend can't be accepted. The problematic timing case needs the caller to feed another packet before doing the required poll, which is outside the API contract.
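
The caller contract described here, feed one packet and then poll until `Output::Timeout`, can be sketched with a toy engine. Everything below is a simplified assumption: the real `poll_output` takes a buffer and a timestamp, and this `Engine`, its queues and `feed` are hypothetical.

```rust
use std::collections::VecDeque;

/// Simplified output type; the real one carries more variants and a lifetime.
enum Output {
    Packet(Vec<u8>),
    Timeout,
}

/// Toy engine: each received entry is (bytes, handled).
struct Engine {
    queue_rx: VecDeque<(Vec<u8>, bool)>,
    queue_tx: VecDeque<Vec<u8>>,
}

impl Engine {
    /// Toy behaviour: the packet is queued and immediately counted as handled,
    /// as if it had been consumed or discarded by the parser.
    fn handle_packet(&mut self, packet: &[u8]) {
        self.queue_rx.push_back((packet.to_vec(), true));
    }

    /// Purge handled entries from the front, then report pending output.
    fn poll_output(&mut self) -> Output {
        while matches!(self.queue_rx.front(), Some((_, true))) {
            self.queue_rx.pop_front();
        }
        match self.queue_tx.pop_front() {
            Some(p) => Output::Packet(p),
            None => Output::Timeout,
        }
    }
}

/// Driver loop: one packet in, then poll until the engine reports Timeout.
fn feed(engine: &mut Engine, packet: &[u8]) -> Vec<Vec<u8>> {
    engine.handle_packet(packet);
    let mut out = Vec::new();
    loop {
        match engine.poll_output() {
            Output::Packet(p) => out.push(p),
            Output::Timeout => break,
        }
    }
    out
}
```

Because the purge runs on every poll, a caller that follows this loop never presents a retransmission while a discarded candidate is still at the front of the receive queue.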

@algesten algesten merged commit 1a47ca7 into main Jun 21, 2026
46 checks passed
@algesten algesten deleted the fix-handshake-parse-before-handle branch June 21, 2026 13:33

Development

Successfully merging this pull request may close these issues.

Malformed handshake extensions poison the transcript: validate full message before consuming/transcript-append

2 participants