bench: add benchmarks and results by al8n · Pull Request #5 · Findit-AI/fasrt

al8n · 2026-03-11T17:44:24Z

No description provided.

Copilot

Pull request overview

This PR adds Criterion benchmarks for SRT/WebVTT parsing and includes performance-oriented parsing changes (byte-based timestamp parsing + optional memchr fast paths) to improve throughput.

Changes:

Optimize SRT/WebVTT timestamp parsing by extracting fields from bytes instead of split/parse.
Speed up line iteration in both parsers using memchr when enabled.
Add Criterion benchmark suites for SRT and WebVTT and document benchmark results in the README.

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 5 comments.

Show a summary per file

File	Description
`src/vtt.rs`	Reworks timestamp parsing and line scanning for performance.
`src/srt.rs`	Reworks header/timestamp parsing and line scanning for performance.
`benches/vtt.rs`	Adds Criterion benchmarks covering small inputs and fixture-based workloads.
`benches/srt.rs`	Adds Criterion benchmarks covering small/medium inputs and fixture-based workloads.
`README.md`	Documents benchmark results and how to run them.
`Cargo.toml`	Registers the new WebVTT benchmark target.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

You can also share your feedback on Copilot code review. Take the survey.

Copilot · 2026-03-11T17:50:46Z

+  let hour_len = len - 10;
+  let hours = match hour_len {
+    2 => Hour(digit2(&b[..2]) as u16),
+    3 => Hour(digit3(&b[..3])),
+    _ => return Err(ParseHourError::NotPadded.into()),
+  };


parse_timestamp_bytes assumes the hour component is exactly 2 or 3 digits (hour_len match), but the lexer regex for Token::Header allows 1–3 digits ([0-9]{1,3}). This is a behavior regression: inputs like 0:00:01,000 will now be rejected with ParseHourError::NotPadded even though they match the token regex and were previously accepted. Update the hour parsing to handle 1-digit hours (or tighten the regex to require 2–3 digits) so lexer and parser constraints stay consistent.

codecov · 2026-03-11T17:52:25Z

Codecov Report

❌ Patch coverage is 83.63636% with 18 lines in your changes missing coverage. Please review.

Files with missing lines	Patch %	Lines
src/vtt.rs	81.48%	15 Missing ⚠️
src/srt.rs	89.28%	3 Missing ⚠️

📢 Thoughts on this report? Let us know!

Copilot

Pull request overview

Copilot reviewed 6 out of 6 changed files in this pull request and generated 3 comments.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

You can also share your feedback on Copilot code review. Take the survey.

Copilot · 2026-03-11T18:18:11Z

+";
+
+const MEDIUM_SRT: &str = include_str!("../fixtures/srt/DeathNote_01.eng.srt");
+


MEDIUM_SRT uses include_str!("../fixtures/srt/DeathNote_01.eng.srt"), but Cargo.toml excludes the entire fixtures directory from the published package (exclude = ["tests", "fixtures"]). That means cargo bench will fail to compile for downstream users of the published crate because the included file won’t exist. Consider embedding a smaller sample under benches/, generating the data at runtime, or adjusting the package include/exclude so the referenced fixture is available when benches are built.

Copilot · 2026-03-11T18:18:11Z

+  if b[len - 4] != b'.' || b[len - 7] != b':' {
+    return Err(ParseVttError::InvalidTimestamp("invalid format"));
+  }
+  let millis = Millisecond(vtt_digit3(&b[len - 3..]));
+  let seconds = Second(vtt_digit2(&b[len - 6..len - 4]));
+  let minutes = Minute(vtt_digit2(&b[len - 9..len - 7]));
+  let hours = if len > 9 {
+    if b[len - 10] != b':' {
+      return Err(ParseVttError::InvalidTimestamp("invalid format"));
+    }
+    let hour_str = &s[..len - 10];
+    // VTT hours are unbounded u64; parse the variable-length prefix
+    parse_vtt_hour_bytes(hour_str.as_bytes())
+      .map_err(|_| ParseVttError::InvalidTimestamp("invalid hours"))?
+  } else {
+    Hour::new()
+  };
+  Ok(Timestamp::from_hmsm(hours, minutes, seconds, millis))


parse_timestamp now constructs Minute, Second, and Millisecond via their tuple constructors (Minute(..), Second(..), Millisecond(..)), which bypasses the range/padding validation enforced by their FromStr/with/try_with APIs. This function is also used by cue-text parsing (vtt/cue.rs) where the timestamp regex allows arbitrary digits/colons, so inputs like 00:99.000 or 0:00.000 can create invalid units (or trigger underflow/panic in debug builds) and later break as_str()/encoding assumptions. Please validate digits + ranges (e.g., use Minute::try_with, Second::try_with, Millisecond::try_with and reject on None) before constructing the timestamp, or fall back to the previous parse()-based path for cue timestamps.

Copilot

Pull request overview

Copilot reviewed 6 out of 6 changed files in this pull request and generated 5 comments.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

You can also share your feedback on Copilot code review. Take the survey.

Copilot · 2026-03-11T18:39:28Z

+  let b = s.as_bytes();
+  let len = b.len();
+  if len < 9 {
+    return Err(ParseVttError::InvalidTimestamp("too short"));
+  }
+  // Validate expected separators before doing byte arithmetic.
+  // This guards against malformed input from cue-text parsing where
+  // the regex only loosely validates the format.
+  if b[len - 4] != b'.' || b[len - 7] != b':' {
+    return Err(ParseVttError::InvalidTimestamp("invalid format"));
+  }
+  let millis_val = vtt_digit3(&b[len - 3..]);
+  let seconds_val = vtt_digit2(&b[len - 6..len - 4]);
+  let minutes_val = vtt_digit2(&b[len - 9..len - 7]);
+  // Validate ranges — needed for cue-text path where input is not regex-validated.
+  if seconds_val >= 60 || minutes_val >= 60 {
+    return Err(ParseVttError::InvalidTimestamp("out of range"));
+  }
+  let millis = Millisecond(millis_val);
+  let seconds = Second(seconds_val);
+  let minutes = Minute(minutes_val);
+  let hours = if len > 9 {
+    if b[len - 10] != b':' {
+      return Err(ParseVttError::InvalidTimestamp("invalid format"));
+    }
+    let hour_str = &s[..len - 10];
+    // VTT hours are unbounded u64; parse the variable-length prefix
+    parse_vtt_hour_bytes(hour_str.as_bytes())
+      .map_err(|_| ParseVttError::InvalidTimestamp("invalid hours"))?
+  } else {
+    Hour::new()
+  };


In the cue-text path you explicitly note the input may not be regex-validated, but vtt_digit2/vtt_digit3 will silently produce incorrect values for non-ASCII digits (due to u8 underflow/wrap on b[0] - b'0'). Additionally, hour_str can be empty (e.g. \":00:00.000\"), and parse_vtt_hour_bytes returns Ok(Hour(0)), incorrectly accepting an invalid long-form timestamp. Consider validating that all digit positions are is_ascii_digit() before arithmetic and rejecting empty hour_str when len > 9.

Copilot · 2026-03-11T18:39:28Z

+fn parse_vtt_hour_bytes(b: &[u8]) -> Result<Hour, ParseHourError> {
+  let mut val: u64 = 0;
+  for &byte in b {
+    if !byte.is_ascii_digit() {
+      return Err(ParseHourError::NotPadded);
+    }
+    val = val
+      .checked_mul(10)
+      .and_then(|v| v.checked_add((byte - b'0') as u64))
+      .ok_or(ParseHourError::NotPadded)?;
  }
+  Ok(Hour(val))
+}


ParseHourError::NotPadded is used for both non-digit input and overflow (checked_mul/checked_add failure). That makes it hard to distinguish formatting errors from overflow when mapping to ParseVttError. Consider using a dedicated error variant for overflow (or mapping overflow to a more specific InvalidTimestamp message) so callers can report a more accurate failure reason.

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

Copilot

Pull request overview

Copilot reviewed 7 out of 7 changed files in this pull request and generated 2 comments.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

You can also share your feedback on Copilot code review. Take the survey.

Copilot · 2026-03-11T18:57:49Z

+  let millis_val = vtt_digit3(&b[len - 3..]);
+  let seconds_val = vtt_digit2(&b[len - 6..len - 4]);
+  let minutes_val = vtt_digit2(&b[len - 9..len - 7]);


parse_timestamp is called from cue-text parsing (vtt::cue) where timestamp tags are only loosely regex-validated (<[0-9:]+\.[0-9]{3}>). Here vtt_digit2/vtt_digit3 do unchecked u8 subtraction (b[0] - b'0'), which can panic on bytes < b'0' in debug builds and is generally unsafe for malformed input. Consider validating that all expected digit bytes are ASCII digits (or using checked conversion) before doing arithmetic, and return InvalidTimestamp on failure.

Copilot

Pull request overview

Copilot reviewed 7 out of 7 changed files in this pull request and generated 4 comments.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

You can also share your feedback on Copilot code review. Take the survey.

Copilot · 2026-03-11T19:17:08Z

 /// The error type for parsing hour components of timestamps.
 #[derive(Debug, Clone, PartialEq, Eq, IsVariant, Unwrap, TryUnwrap, thiserror::Error)]
 #[unwrap(ref, ref_mut)]
 #[try_unwrap(ref, ref_mut)]
 pub enum ParseHourError {
  /// The hour component is not zero-padded to 2 digits.
  #[error("hour component is not zero-padded to 2 digits")]
  #[unwrap(ignore)]
  #[try_unwrap(ignore)]
  NotPadded,
  /// The hour component is out of range (not between 0-999).
  #[error("hour component must be between 0-999, but was {0}")]
  Overflow(u16),
+  /// The hour component overflowed u64 (VTT unbounded hours).
+  #[error("hour component overflowed")]
+  #[unwrap(ignore)]
+  #[try_unwrap(ignore)]
+  HourOverflow,


ParseHourError is used by both SRT (0–999) and WebVTT (unbounded u64) hour parsing. Adding HourOverflow makes the enum’s current doc comment (0-999) misleading; update the ParseHourError docs to describe that some variants are format-specific (e.g., SRT range vs WebVTT unbounded + overflow).

Copilot · 2026-03-11T19:17:08Z

 pub(crate) fn parse_timestamp(s: &str) -> Result<Timestamp, ParseVttError> {
-  let (time, ms) = s
-    .split_once('.')
-    .ok_or(ParseVttError::InvalidTimestamp("missing '.'"))?;
-  let millis: Millisecond = ms.parse()?;
-  let mut parts = time.splitn(3, ':');
-  let first = parts
-    .next()
-    .ok_or(ParseVttError::InvalidTimestamp("empty timestamp"))?;
-  let second = parts.next().ok_or(ParseVttError::InvalidTimestamp(
-    "missing minutes or seconds",
-  ))?;
-  match parts.next() {
-    // Long form: HH:MM:SS
-    Some(sec_str) => Ok(Timestamp::from_hmsm(
-      first.parse()?,
-      second.parse()?,
-      sec_str.parse()?,
-      millis,
-    )),
-    // Short form: MM:SS
-    None => Ok(Timestamp::from_hmsm(
-      Hour::new(),
-      first.parse()?,
-      second.parse()?,
-      millis,
-    )),
+  let b = s.as_bytes();
+  let len = b.len();
+  if len < 9 {
+    return Err(ParseVttError::InvalidTimestamp("too short"));
+  }
+  // Validate expected separators before doing byte arithmetic.
+  // This guards against malformed input from cue-text parsing where
+  // the regex only loosely validates the format.
+  if b[len - 4] != b'.' || b[len - 7] != b':' {
+    return Err(ParseVttError::InvalidTimestamp("invalid format"));
+  }
+  let millis_val = vtt_digit3(&b[len - 3..]);
+  let seconds_val = vtt_digit2(&b[len - 6..len - 4]);
+  let minutes_val = vtt_digit2(&b[len - 9..len - 7]);
+  // Use try_with to validate ranges — needed for cue-text path where input
+  // is not regex-validated.
+  let millis = Millisecond::try_with(millis_val)
+    .ok_or(ParseVttError::InvalidTimestamp("milliseconds out of range"))?;
+  let seconds =
+    Second::try_with(seconds_val).ok_or(ParseVttError::InvalidTimestamp("seconds out of range"))?;
+  let minutes =
+    Minute::try_with(minutes_val).ok_or(ParseVttError::InvalidTimestamp("minutes out of range"))?;
+  let hours = if len > 9 {
+    if b[len - 10] != b':' {
+      return Err(ParseVttError::InvalidTimestamp("invalid format"));
+    }
+    let hour_str = &s[..len - 10];
+    // VTT hours are unbounded u64; parse the variable-length prefix
+    parse_vtt_hour_bytes(hour_str.as_bytes())
+      .map_err(|_| ParseVttError::InvalidTimestamp("invalid hours"))?
+  } else {
+    Hour::new()
+  };
+  Ok(Timestamp::from_hmsm(hours, minutes, seconds, millis))


The new byte-based timestamp parsing in parse_timestamp has several edge cases (non-digit bytes from cue-text path, empty hour prefix, huge hour overflow). There are existing cue-text timestamp tests, but none that assert malformed timestamp tags are safely rejected (no panic) and that hour-overflow paths produce an error. Add focused tests covering these cases to lock in the intended behavior.

Copilot · 2026-03-11T19:17:09Z

+  let b = s.as_bytes();
+  let len = b.len();
+  if len < 9 {
+    return Err(ParseVttError::InvalidTimestamp("too short"));
+  }
+  // Validate expected separators before doing byte arithmetic.
+  // This guards against malformed input from cue-text parsing where
+  // the regex only loosely validates the format.
+  if b[len - 4] != b'.' || b[len - 7] != b':' {
+    return Err(ParseVttError::InvalidTimestamp("invalid format"));
+  }
+  let millis_val = vtt_digit3(&b[len - 3..]);
+  let seconds_val = vtt_digit2(&b[len - 6..len - 4]);
+  let minutes_val = vtt_digit2(&b[len - 9..len - 7]);
+  // Use try_with to validate ranges — needed for cue-text path where input
+  // is not regex-validated.
+  let millis = Millisecond::try_with(millis_val)
+    .ok_or(ParseVttError::InvalidTimestamp("milliseconds out of range"))?;
+  let seconds =
+    Second::try_with(seconds_val).ok_or(ParseVttError::InvalidTimestamp("seconds out of range"))?;
+  let minutes =
+    Minute::try_with(minutes_val).ok_or(ParseVttError::InvalidTimestamp("minutes out of range"))?;


parse_timestamp does unchecked digit arithmetic (b[i] - b'0') for minutes/seconds/millis. When this function is called from cue-text parsing (e.g., unterminated tags), those byte positions may not be ASCII digits, which can panic in debug builds and/or yield wrapped values that slip past the later range checks. Add explicit ASCII-digit validation for the extracted slices (and return InvalidTimestamp) before calling vtt_digit2/vtt_digit3, or make those helpers validate and return Result.

Copilot

Pull request overview

Copilot reviewed 9 out of 9 changed files in this pull request and generated 4 comments.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

You can also share your feedback on Copilot code review. Take the survey.

Copilot · 2026-03-12T02:50:44Z

+  let slice = s.slice();
+  // The regex guarantees `H+:MM:SS,mmm <ws> --> <ws> H+:MM:SS,mmm`.
+  // Find the arrow separator and extract both timestamps via byte arithmetic.
+  let arrow = slice.find("-->").unwrap();
+  let start_str = slice[..arrow].trim();
+  let end_str = slice[arrow + 3..].trim();


The comment above parse_header says the regex guarantees H+:MM:SS,mmm ..., but the Token::Header regex currently enforces 2–3 hour digits. Either adjust the comment to match the regex, or relax the regex if H+ is intended.

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

Copilot

Pull request overview

Copilot reviewed 11 out of 11 changed files in this pull request and generated 3 comments.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

You can also share your feedback on Copilot code review. Take the survey.

Copilot · 2026-03-12T03:05:01Z

  /// Header "HH:MM:SS,mmm --> HH:MM:SS,mmm"
  #[regex(
-    r"[0-9]{1,3}:[0-5][0-9]:[0-5][0-9],[0-9]{3}[ \t\x0C]+-->[ \t\x0C]+[0-9]{1,3}:[0-5][0-9]:[0-5][0-9],[0-9]{3}",
+    r"[0-9]{2,3}:[0-5][0-9]:[0-5][0-9],[0-9]{3}[ \t\x0C]+-->[ \t\x0C]+[0-9]{2,3}:[0-5][0-9]:[0-5][0-9],[0-9]{3}",
    parse_header


The SRT header lexer regex was tightened from allowing 1–3 hour digits to 2–3 ([0-9]{2,3}). This changes parsing behavior (e.g., 0:00:01,000 --> ... will no longer be recognized as a header line in either strict or lossy mode). If 1-digit hours should remain supported, keep the regex at {1,3} and update parse_timestamp_bytes to handle hour_len == 1; otherwise, consider documenting this as an intentional strictness increase.

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

Copilot

Pull request overview

Copilot reviewed 11 out of 11 changed files in this pull request and generated 3 comments.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

You can also share your feedback on Copilot code review. Take the survey.

Copilot · 2026-03-12T03:52:21Z

+Goodbye world!
+";
+
+const MEDIUM_SRT: &str = include_str!("../fixtures/srt/DeathNote_01.eng.srt");


include_str!("../fixtures/srt/DeathNote_01.eng.srt") hard-depends on a fixture file at compile time, but Cargo.toml excludes fixtures/ from the published package. This will make cargo bench fail to compile from crates.io or any checkout without fixtures. Consider reading the file at runtime with a fallback (like the VTT benches do), or include the needed fixtures in the package.

Copilot · 2026-03-12T03:52:21Z

  /// A timestamp could not be parsed.
  #[error("invalid timestamp: {0}")]
-  InvalidTimestamp(&'static str),
+  InvalidTimestamp(TimestampError),



The PR title suggests this change is benchmark-only, but this diff also changes the public WebVTT error API (InvalidTimestamp now contains TimestampError instead of &'static str) and updates core timestamp parsing logic. Please either update the PR title/description (and ideally the changelog) to reflect these behavior/API changes, or split the parsing changes into a separate PR.

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

bench: add benchmarks and results

ade05b6

al8n requested a review from Copilot March 11, 2026 17:44

Copilot started reviewing on behalf of al8n March 11, 2026 17:45 View session

Copilot AI reviewed Mar 11, 2026

View reviewed changes

fix comments

1f89be3

al8n requested a review from Copilot March 11, 2026 18:12

Copilot started reviewing on behalf of al8n March 11, 2026 18:13 View session

Copilot AI reviewed Mar 11, 2026

View reviewed changes

improve performance

13d0eda

al8n requested a review from Copilot March 11, 2026 18:35

Copilot started reviewing on behalf of al8n March 11, 2026 18:36 View session

Copilot AI reviewed Mar 11, 2026

View reviewed changes

al8n and others added 2 commits March 12, 2026 02:42

Update benches/vtt.rs

a2a379e

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

improve performance

30b5c4a

al8n requested a review from Copilot March 11, 2026 18:52

Copilot started reviewing on behalf of al8n March 11, 2026 18:53 View session

improve performance

cb9bee5

Copilot AI reviewed Mar 11, 2026

View reviewed changes

improve performance

da9be04

al8n requested a review from Copilot March 11, 2026 19:12

Copilot started reviewing on behalf of al8n March 11, 2026 19:12 View session

Copilot AI reviewed Mar 11, 2026

View reviewed changes

al8n added 2 commits March 12, 2026 03:19

improve performance

3f6146b

improve performance

f5e7549

al8n requested a review from Copilot March 12, 2026 02:42

Copilot started reviewing on behalf of al8n March 12, 2026 02:44 View session

Copilot AI reviewed Mar 12, 2026

View reviewed changes

al8n and others added 2 commits March 12, 2026 10:52

Update benches/vtt.rs

ca93584

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

Update benches/srt.rs

71494a9

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

al8n and others added 2 commits March 12, 2026 10:53

Update src/vtt.rs

30ac3ee

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

update stale docs

d633aeb

al8n requested a review from Copilot March 12, 2026 02:58

Copilot started reviewing on behalf of al8n March 12, 2026 02:59 View session

update ci sh

6bb4595

Copilot AI reviewed Mar 12, 2026

View reviewed changes

al8n and others added 4 commits March 12, 2026 11:06

Update src/error.rs

29fa662

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

Update benches/vtt.rs

87a967e

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

cleanup

a8ed2dc

cleanup

c36a4cb

al8n requested a review from Copilot March 12, 2026 03:45

Copilot started reviewing on behalf of al8n March 12, 2026 03:46 View session

Copilot AI reviewed Mar 12, 2026

View reviewed changes

al8n and others added 2 commits March 12, 2026 11:54

Update benches/srt.rs

59ee779

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

fix clippy

cef5922

al8n merged commit ac5338c into main Mar 12, 2026
44 checks passed

al8n deleted the 0.2.0 branch March 12, 2026 04:38

al8n pushed a commit that referenced this pull request Apr 17, 2026

bench: add benchmarks and results (#5)

bdc1189

		";

		const MEDIUM_SRT: &str = include_str!("../fixtures/srt/DeathNote_01.eng.srt");

Conversation

al8n commented Mar 11, 2026

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Uh oh!

Uh oh!

Copilot AI Mar 11, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

codecov bot commented Mar 11, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Uh oh!

Copilot AI Mar 11, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Mar 11, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Copilot AI Mar 11, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Mar 11, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Copilot AI Mar 11, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Uh oh!

Copilot AI Mar 11, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Mar 11, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Mar 11, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Uh oh!

Copilot AI Mar 12, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Uh oh!

codecov bot commented Mar 11, 2026 •

edited

Loading