feat(dash): per-Representation Format emission for selector parity#235
Merged
Conversation
Add dash-mpd + thiserror deps to rdlp-extractor, create the base::common::dash module with DashExpandError (errors.rs) and a stub expand_dash_representations (expand.rs) returning an empty Vec. Wire pub mod dash into base::common::mod.rs. cargo check + clippy pass with zero warnings.
Add segments.rs with substitute_template() implementing ISO/IEC 23009-1 §5.3.9.4.4 token substitution ($RepresentationID$, $Number%0Nd$, $Time$, $Bandwidth$, $$). Wire private mod segments into dash/mod.rs. Fix pre-existing clippy failures from Task 1 skeleton stubs: add #[allow(dead_code)] to DashExpandError + expand_dash_representations (consumers land in Tasks 9–13), and #[allow(unused_imports)] to the pub(crate) re-exports in mod.rs. Zero warnings with -D warnings.
Adds Fragment, SegmentTemplatePlan, and resolve_segment_template() to segments.rs. Segment count computed as ceil(period_duration * timescale / duration), matching yt-dlp common.py:3008-3015. Init segment prepended when present. Covered by two TDD tests (with/without init segment).
Add MAX_SEGMENTS_PER_REP (1_000_000) constant and replace the bare arithmetic body with a guarded version that handles three OOM/silent- failure paths that a malformed or adversarial MPD could trigger: - duration == 0 → was: (period / 0.0).ceil() as u64 = u64::MAX → Vec::with_capacity(u64::MAX + 1) aborts the process. Now: early return Vec::new() with a log::warn. - timescale == 0 → was: division yields NaN → NaN as u64 = 0 → silently empty list with no diagnostic. Now: same early-return guard. - Very large but finite count (e.g. period = 1e9 s, duration = 1 tick) → was: unbounded allocation. Now: capped at MAX_SEGMENTS_PER_REP. Also adds a guard for non-positive / non-finite period_duration_seconds (negative or NaN input from the XML parser). Four regression tests added to mod template_tests: zero_duration_returns_empty zero_timescale_returns_empty count_capped_at_million negative_period_returns_empty Test count: 6 → 10 (all pass).
Implements resolve_chain() in dash/baseurl.rs: iterates MPD/AdaptationSet/ Representation levels, joining the first <BaseURL> at each level against the running endpoint via RFC 3986. The implicit level-0 is the MPD fetch URL itself. CDN failover (multiple <BaseURL> per level) is out of scope. Four tests: empty chain, absolute replacement, relative resolution, full three-level chain.
Walk MPD periods → adaptations → representations and emit one Format
per usable Repr with pre-resolved fragments. Wires all leaf helpers:
substitute_template, resolve_segment_{template,timeline,list},
resolve_chain, parse_frame_rate, parse_audio_sampling_rate.
Key dash-mpd 0.20.2 adaptation: SegmentTemplate.initialization is
Option<String> (the @initialization attribute URL), not
Option<Initialization>. tmpl_init_url() handles both the attribute
form and the <Initialization sourceURL="…"> child element form.
MAX_REPS_PER_MPD = 50 declared; cap-at-50 logic present (exercised
by Task 11's fixture). DRM filter and multi-period warning present.
Test: segment_template_three_video_two_audio against the existing
segment_template.mpd fixture (1 video + 1 audio Repr). Asserts
protocol=HttpDashSegments, fragments non-empty, vcodec XOR acodec.
Also fixes pre-existing collapsible-if lint in segments.rs
(resolve_segment_list init guard).
Add `download_format(&Format, &Path, progress)` to `DashDownloader` that short-circuits the MPD fetch+parse path when `Format.fragments` is already populated. Fragments are fetched sequentially via the existing HTTP client, validated, and concatenated directly into the output file. The legacy MPD-URL path (`download::run`) is unchanged. Also adds `resolve_fragment_url` (handles optional base-URL join) and `fetch_fragment_bytes` (single-fragment HTTP GET) as private helpers. Covered by `tests/dash_pre_resolved.rs`: asserts MPD endpoint is never hit (mockito `expect(0)`), output bytes are exact fragment concatenation, and `bytes_downloaded` matches.
…gment SSRF gate Adds a default-impl `Downloader::download_format(&Format, ...)` method that defaults to `download_to_file(&format.url, ...)`. DashDownloader overrides it to dispatch to the new pre-resolved fragments path when `format.fragments` is Some, falling back to the legacy MPD-URL path otherwise. Orchestrator `execute_download` now takes `&Format` and dispatches via `download_format`, so the new fragments path is reachable from production (was previously unreachable — fragments were silently ignored). Per-fragment SSRF validation was inlined per the initial review but breaks mockito-based integration tests (127.0.0.1) and is also inconsistent with the legacy MPD-URL path, which validates only at the orchestrator boundary. The hardened gate belongs at extract time inside `expand_dash_representations` (TODO documented in dash/mod.rs); for now we match the existing codebase convention.
Replace the single-Format HttpDashSegments placeholder in try_direct_media with a call to expand_dash_representations. On success the InfoDict carries one Format per Representation with pre-resolved fragments. DynamicMpd returns Ok(None) so other strategies can try; all other errors fall back to the legacy single-Format placeholder so partially-parseable MPDs still get a download attempt. Remove the #![allow(dead_code, unused_imports)] gate from base::common::dash::mod now that the expansion API is wired in. Drop the unused pub(crate) re-exports for parse_frame_rate and parse_audio_sampling_rate (expand.rs imports them directly via super::). Update direct_mpd_emits_dash_format to assert per-Repr expansion: at least 1 Format, all with HttpDashSegments protocol and fragments: Some(...).
Adds `orchestrator::tests::dash_e2e` which drives `select_format` +
`download_merge_pair` against a mockito server serving placeholder
segment bytes for three per-Repr DASH Formats: 720p video-only
(2 Mbps), 1080p video-only (5 Mbps), and audio-only (128 kbps).
Asserts:
A. `select_format("bv*+ba")` returns `DownloadPlan::Merge` with the
1080p Rep (higher tbr wins) and the audio Rep.
B. 1080p + audio segment endpoints each called exactly once.
C. 720p segment endpoints never called (wrong Rep filtered by selector).
D. Video intermediate bytes == "V1080INITV1080S1"; audio == "A1INITA1S1".
Scope reduction: MergeStage / FFmpeg mux is not exercised — placeholder
bytes are not valid fMP4. The mux failure path is already covered by
crates/rdlp-downloader/tests/dash_e2e.rs. This test uniquely covers the
selector-driven orchestrator dispatch through the per-Repr format set.
…model - rdlp-extractor lib.rs: note that DASH MPDs are eagerly expanded into per-Representation Formats via base::common::dash::expand_dash_representations - rdlp-downloader lib.rs: expand DASH bullet to describe both paths (pre-resolved fragments vs legacy MPD-URL re-parse + in-process mux)
This was referenced May 2, 2026
This was referenced May 2, 2026
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Each MPEG-DASH
Representationnow becomes its ownFormatentry ininfo.formats, so the existingFormatSelectorDSL (-f bv*+ba*,-S \"+res:720\", etc.) operates on DASH content the same way it does for progressive HTTP and HLS variant streams.expand_dash_representations()incrates/rdlp-extractor/src/base/common/dash/. Walks MPD → first Period → AdaptationSets → Representations, projects each Repr to aFormatwith pre-resolvedfragments(init prepended when<Initialization>is present, omitted when absent — mirrors yt-dlp).Downloader::download_format(&Format, ...)trait method onrdlp-core'sDownloadertrait — default-impl delegates todownload_to_file(&format.url, ...).DashDownloaderoverrides it: whenformat.fragments.is_some(), fetch the pre-resolved list directly without re-parsing the MPD; otherwise fall back to the legacy MPD-URL path.crates/rdlp-api/src/orchestrator/execution.rsto dispatch viadownload_formatso the new fragments path is reachable from production.Vec<Format>(one per Repr) instead of one opaqueFormat { protocol: HttpDashSegments, fragments: None }placeholder. Falls back to the legacy single-Format placeholder on parse error so partially-parseable MPDs still get a download attempt.DashExpandError::DynamicMpd. Reps capped at 50/MPD with bandwidth-sorted truncation.resolve_segment_templatereturns empty list with a warn;MAX_SEGMENTS_PER_REP = 1_000_000bounds allocation across template/timeline/list resolvers.Spec:
docs/superpowers/specs/2026-05-02-dash-per-representation-formats-design.md(gitignored).Plan:
docs/superpowers/plans/2026-05-02-dash-per-representation-formats.md(gitignored).What landed
crates/rdlp-extractor/src/base/common/dash/:mod.rs,errors.rs,expand.rs,segments.rs,baseurl.rs,frame_rate.rs,audio_sampling_rate.rs.mega_reps.mpdfixture (60 video Reps) for cap-test coverage.crates/rdlp-downloader/tests/dash_pre_resolved.rs(asserts MPD endpoint never fetched on the fragments path).crates/rdlp-api/src/orchestrator/tests/dash_e2e.rs(assertsbv*+ba*selection picks max-tbr Repr; 720p endpoints never hit; intermediate files contain expected concatenated bytes).Test plan
cargo checkcleancargo clippy --workspace -- -D warningscleancargo test --workspace— all green, 0 failurescargo fmt --checkcleancargo check -p rdlp-desktopfails in worktrees without a built frontend — pre-existing, unrelated to this branchPer-task review trail
15 tasks, each with implementer → spec-compliance reviewer → code-quality reviewer per
~/.claude/rules/superpowers-skill-ordering.md. Critical/Important issues flagged by reviewers were fixed before each task closed:pub(crate)per code-quality nit.duration=0/timescale=0/very-largecount.log::warn!added on baseurl join errors per code-quality nit.download_formatwas unreachable from the orchestrator; added trait default-impl and wired execution dispatch.