Set accurate seekMap duration in SubtitleExtractor after parsing by onseok · Pull Request #3106 · androidx/media

onseok · 2026-03-06T07:43:55Z

SubtitleExtractor creates its IndexSeekMap with durationUs = C.TIME_UNSET in init(), but never updates it after parsing, even though the exact duration is knowable from the parsed cues' endTimeUs values.

This causes problems for WebVTT files with long-duration cues (e.g. a cue spanning 00:00.000 --> 01:00:00.000).
Since the seekMap reports TIME_UNSET as its duration, consumers may fall back to estimating duration from sample timestamps (startTimeUs), which can be significantly shorter than the actual content end time. This can result in subtitle samples not being loaded after seeking to a position beyond the estimated duration.

This change updates the seekMap's durationUs to the maximum endTimeUs across all parsed cues, following the same pattern used by Mp3Extractor (which calls IndexSeekMap.setDurationUs() and re-emits the seekMap after reading all samples).

Also added tests to verify this behavior. Happy to adjust if anything needs to be changed :)

icbaker · 2026-03-06T09:43:50Z

This change makes sense - but I'm curious about this part of your explanation:

Since the seekMap reports TIME_UNSET as its duration, consumers may fall back to estimating duration from sample timestamps (startTimeUs), which can be significantly shorter than the actual content end time.

Can you give more details about these consumers? If they are handling the data after it's been transcoded to MimeTypes.APPLICATION_MEDIA3_CUES then there will be a sample representing the 'end of the last cue' (i.e. the sample's start time is the time the cues end), so it would be a correct way to estimate the track duration (and this change wouldn't be required).

onseok · 2026-03-06T10:05:09Z

Hi, @icbaker. The consumer I had in mind is ProgressiveMediaPeriod.onLoadCompleted(), which falls back to getLargestQueuedTimestampUs() + DEFAULT_LAST_SAMPLE_DURATION_US when the seekMap duration is TIME_UNSET.

I initially had the same thought about end-of-cue samples, but it turns out LegacySubtitleUtil.toCuesWithTiming() skips them. In outputSubtitleEvent(), when the cue list at an event time is empty, it returns early without emitting a sample:

if (cuesForThisStartTime.isEmpty()) {
  // An empty cue list has already been implicitly encoded in the duration
  // of the previous sample.
  return;
}

For example, a single WebVTT cue spanning 00:10.000 --> 05:05.000 results in two event times [10s, 305s] inside WebvttSubtitle. But getCues(305s) returns an empty list because startTime <= 305 < endTime evaluates to false, so the 305s event gets skipped entirely. Only one sample at startTimeUs = 10s is produced, and the actual end time is encoded in its durationUs field instead.

Because of this, getLargestQueuedTimestampUs() returns 10s, and the fallback duration ends up being 10.01s rather than the correct 305s.

The existing tests also confirm this behavior. For instance, SubtitleExtractorTest shows that 3 cues produce 4 samples rather than 6, since end-time-only events are not emitted as separate samples.

icbaker · 2026-03-06T10:09:03Z

Ah right, yes I'd misremembered how things work - thanks for clarifying.

After parsing all subtitle cues, the maximum `endTimeUs` across all cues is known. Update the seekMap's `durationUs` with this value instead of leaving it as `C.TIME_UNSET`. This follows the same pattern used by `Mp3Extractor`, which also re-emits the seekMap with an exact duration after reading all samples.

icbaker · 2026-03-09T09:50:45Z

I'm going to send this for internal review now. You may see some more commits being added as I make changes in response to review feedback. Please refrain from pushing any more substantive changes as it will complicate the internal review - thanks!

icbaker self-assigned this Mar 6, 2026

icbaker added the pending comments label Mar 6, 2026

icbaker removed the pending comments label Mar 6, 2026

onseok and others added 2 commits March 9, 2026 09:50

Set (and test) too

58cbdc4

icbaker force-pushed the fix/subtitle-extractor-seekmap-duration branch from f5b3b1a to 58cbdc4 Compare March 9, 2026 09:50

icbaker added the should merge label Mar 9, 2026

Handle internal review feedback

96f6bdc

copybara-service Bot merged commit 2dc01e2 into androidx:main Mar 17, 2026
1 check passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Set accurate seekMap duration in SubtitleExtractor after parsing#3106

Set accurate seekMap duration in SubtitleExtractor after parsing#3106
copybara-service[bot] merged 3 commits intoandroidx:mainfrom
onseok:fix/subtitle-extractor-seekmap-duration

onseok commented Mar 6, 2026 •

edited

Loading

Uh oh!

icbaker commented Mar 6, 2026

Uh oh!

onseok commented Mar 6, 2026

Uh oh!

icbaker commented Mar 6, 2026

Uh oh!

icbaker commented Mar 9, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

onseok commented Mar 6, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

icbaker commented Mar 6, 2026

Uh oh!

onseok commented Mar 6, 2026

Uh oh!

icbaker commented Mar 6, 2026

Uh oh!

icbaker commented Mar 9, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

onseok commented Mar 6, 2026 •

edited

Loading