OxideAV/oxideav-workspace

oxideav

A 100% pure Rust media transcoding and streaming framework. No C libraries, no FFI wrappers, no *-sys crates — just Rust, all the way down.

Goals

  • Pure Rust implementation. Never depend on ffmpeg, libav, x264, libvpx, libopus, or any other C library — directly or transitively. Every codec, container, and filter is implemented from the spec.
  • Clean abstractions for codecs, containers, timestamps, and streaming formats.
  • Composable pipelines: media input → demux → decode → transform → encode → mux → output, with pass-through mode for remuxing without re-encoding.
  • Modular workspace: per-format crates for complex modern codecs/containers, a shared crate for simple standard formats, and an aggregator crate that ties them together behind Cargo features.

Non-goals

  • Wrapping existing C codec libraries.
  • Perfect feature parity with FFmpeg on day one. Codec and container coverage grows incrementally.
  • GPU-specific acceleration (may come later through pure-Rust compute libraries, but never C drivers).

Workspace layout

The workspace is a set of Cargo crates under crates/, grouped by role:

  • Infrastructure: oxideav-core (primitives: Packet / Frame / Rational / Timestamp / PixelFormat / ExecutionContext), oxideav-codec (Decoder / Encoder traits + registry), oxideav-container (Demuxer / Muxer traits + registry), oxideav-pipeline (source → transforms → sink composition).
  • I/O: oxideav-source (generic SourceRegistry + file driver + BufferedSource), oxideav-http (HTTP/HTTPS driver, opt-in via feature).
  • Effects + conversions: oxideav-audio-filter (Volume / NoiseGate / Echo / Resample / Spectrogram), oxideav-pixfmt (pixel-format conversion matrix + palette generation + dither).
  • Job graph: oxideav-job (JSON transcode graph + pipelined multithreaded executor).
  • Containers: one crate each for oxideav-ogg / -mkv / -mp4 / -avi / -iff. Simple containers (WAV, raw PCM, slin) live inside oxideav-basic.
  • Codec crates: one crate per codec family; see the Codecs table below for the per-codec status. Tracker formats (oxideav-mod, oxideav-s3m) are decoder-only by design. Codec scaffolds that register-but-refuse (JPEG XL, JPEG 2000, AVIF) reserve their codec ids so the API surface stays forward-compatible.
  • Aggregator: oxideav re-exports every enabled crate behind Cargo features. Registries::with_all_features() builds a registry covering every format compiled in.
  • Binaries: oxideav-cli (the oxideav CLI: list / probe / remux / transcode / run / validate / dry-run) and oxideplay (reference SDL2 + TUI player).

Use cargo run --release -p oxideav-cli -- list to enumerate the codec and container matrix actually compiled into the release binary.

Core concepts

  • Packet — a chunk of compressed (encoded) data belonging to one stream, with timestamps.
  • Frame — a chunk of uncompressed data (audio samples or a video picture).
  • Stream — one media track inside a container (audio, video, subtitle…).
  • TimeBase / Timestamp — rational time base per stream; timestamps are integers in that base.
  • Demuxer — reads a container, emits Packets per stream.
  • Decoder — turns Packets of a given codec into Frames.
  • Encoder — turns Frames into Packets.
  • Muxer — writes Packets into an output container.
  • Pipeline — connects these pieces. A pipeline can pass Packets straight from Demuxer to Muxer (remux, no quality loss) or route through Decoder → [Filter] → Encoder.
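
As a rough illustration of the TimeBase / Timestamp idea, the sketch below converts an integer timestamp from one rational time base to another, which is what a remux between containers with different time bases needs. The `Tb` type and the `rescale` function are hypothetical names written for this example, not the oxideav-core API, and the code assumes positive numerators and denominators.

```rust
/// Hypothetical rational time base: one tick lasts `num / den` seconds.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct Tb {
    pub num: i64,
    pub den: i64,
}

/// Convert `ts` ticks of `from` into ticks of `to`, rounding to nearest
/// (ties away from zero). Assumes all four base fields are positive.
pub fn rescale(ts: i64, from: Tb, to: Tb) -> i64 {
    // seconds = ts * from.num / from.den; ticks_out = seconds * to.den / to.num.
    // i128 keeps the intermediate product from overflowing.
    let n = ts as i128 * from.num as i128 * to.den as i128;
    let d = from.den as i128 * to.num as i128;
    let half = d / 2;
    let rounded = if n >= 0 { (n + half) / d } else { (n - half) / d };
    rounded as i64
}
```

For example, 8000 ticks at 1/8000 s (one second of 8 kHz audio) become 1000 ticks in a 1/1000 s base.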

Using a codec directly (no containers, no pipeline)

Every codec crate in OxideAV is designed to be usable on its own. Pull only oxideav-core (types), oxideav-codec (trait + registry), and the codec itself:

[dependencies]
oxideav-core = "0.0"
oxideav-codec = "0.0"
oxideav-g711 = "0.0"   # or any other codec crate

use oxideav_codec::CodecRegistry;
use oxideav_core::{CodecId, CodecParameters, Frame, Packet, TimeBase};

// Register the G.711 codecs into a fresh registry.
let mut reg = CodecRegistry::new();
oxideav_g711::register(&mut reg);

// Describe the stream: 8 kHz mono μ-law.
let mut params = CodecParameters::audio(CodecId::new("pcm_mulaw"));
params.sample_rate = Some(8_000);
params.channels = Some(1);

// `?` assumes an enclosing function that returns a Result;
// `ulaw_bytes` is the μ-law payload supplied by the caller.
let mut dec = reg.make_decoder(&params)?;
dec.send_packet(&Packet::new(0, TimeBase::new(1, 8_000), ulaw_bytes))?;
let Frame::Audio(a) = dec.receive_frame()? else { unreachable!() };
// `a.data[0]` is S16 PCM.

The canonical walkthrough of the send_packet / receive_frame / flush / reset loop lives in oxideav-codec's README. Each codec crate's README has a concrete example tailored to its payload shape.
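
For orientation, the per-byte expansion that a μ-law decoder performs can be sketched without any OxideAV crates. The function below follows the widely used G.711 μ-law expansion formula. It is an independent illustration with hypothetical function names, not the code or tables inside oxideav-g711.

```rust
/// Expand one G.711 μ-law byte to a signed 16-bit PCM sample.
pub fn ulaw_to_s16(byte: u8) -> i16 {
    // μ-law bytes are stored bit-inverted.
    let u = !byte;
    let negative = u & 0x80 != 0;
    let exponent = (u >> 4) & 0x07; // 3-bit segment number
    let mantissa = (u & 0x0F) as i32; // 4-bit step within the segment
    // 0x84 is the encoder bias (132): added before the shift, removed after.
    let magnitude = (((mantissa << 3) + 0x84) << exponent) - 0x84;
    if negative {
        -magnitude as i16
    } else {
        magnitude as i16
    }
}

/// Decode a whole μ-law payload into S16 samples.
pub fn decode_ulaw(payload: &[u8]) -> Vec<i16> {
    payload.iter().map(|&b| ulaw_to_s16(b)).collect()
}
```

In this scheme the byte 0xFF decodes to silence (0), and 0x80 and 0x00 decode to the positive and negative extremes (±32124).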

Current status

oxideav list (via the CLI) prints the live, build-time-accurate codec + container matrix with per-implementation capability flags; that output is the source of truth at any point. The tables below are a human-readable summary.

Containers

Container format detection is content-based: each container ships a probe that scores the first 256 KB against its magic bytes. The file extension is a tie-breaker hint, not the source of truth — a .mp4 that's actually a WAV opens correctly.
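
The idea of content-based probing with the extension as a tie-breaker can be sketched as follows. The probe functions, the 0 to 100 score scale, and the `detect` helper are assumptions made for this example and do not reflect the actual probe API in oxideav-container. The magic bytes themselves (RIFF/WAVE, fLaC, OggS, ftyp at offset 4) are the standard signatures of those formats.

```rust
/// Hypothetical probe signature: confidence score in 0..=100 for one format.
type Probe = fn(&[u8]) -> u8;

fn probe_wav(b: &[u8]) -> u8 {
    if b.len() >= 12 && &b[0..4] == b"RIFF" && &b[8..12] == b"WAVE" { 100 } else { 0 }
}

fn probe_flac(b: &[u8]) -> u8 {
    if b.starts_with(b"fLaC") { 100 } else { 0 }
}

fn probe_ogg(b: &[u8]) -> u8 {
    if b.starts_with(b"OggS") { 100 } else { 0 }
}

fn probe_mp4(b: &[u8]) -> u8 {
    // The first box of an MP4 file is normally `ftyp`, after a 4-byte size.
    if b.len() >= 8 && &b[4..8] == b"ftyp" { 100 } else { 0 }
}

const PROBES: &[(&str, Probe)] = &[
    ("wav", probe_wav as Probe),
    ("flac", probe_flac as Probe),
    ("ogg", probe_ogg as Probe),
    ("mp4", probe_mp4 as Probe),
];

/// Pick the best-scoring format for the file head. The extension hint only
/// matters when two probes report the same non-zero score.
pub fn detect(head: &[u8], ext_hint: Option<&str>) -> Option<&'static str> {
    let mut best: Option<(&'static str, u8)> = None;
    for &(name, probe) in PROBES {
        let score = probe(head);
        if score == 0 {
            continue;
        }
        let better = match best {
            None => true,
            Some((_, s)) => score > s || (score == s && ext_hint == Some(name)),
        };
        if better {
            best = Some((name, score));
        }
    }
    best.map(|(name, _)| name)
}
```

With this sketch, a buffer that starts with a RIFF/WAVE header is reported as "wav" even when the hint says "mp4".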

Container Demux Mux Seek Notes
WAV LIST/INFO metadata; byte-offset seek
FLAC VORBIS_COMMENT, streaminfo, PICTURE block; SEEKTABLE-based seek
Ogg Vorbis/Opus/Theora/Speex pages + comments; page-granule bisection
Matroska MKV/MKA/MKS; DocType-aware probe; Cues-based seek
WebM First-class: separate fourcc, codec whitelist (VP8/VP9/AV1/Vorbis/Opus); inherits Matroska Cues seek
MP4 mp4/mov/ismv brands, faststart, iTunes ilst metadata; sample-table seek
AVI LIST INFO, avih duration; idx1 keyframe-index seek
MP3 ID3v2/v1 tags + cover art, Xing/VBRI TOC seek (+ CBR fallback), frame sync with mid-stream resync
IFF / 8SVX Amiga IFF with NAME/AUTH/ANNO/CHRS
IVF VP8 elementary stream container
AMV Chinese MP4 player format (RIFF-like)
WebP RIFF/WEBP (lossy + lossless + animation)
PNG / APNG 8 + 16-bit, all color types, APNG animation
GIF GIF87a/GIF89a, LZW, animation + NETSCAPE2.0 loop
JPEG Still-image wrapper around the MJPEG codec
slin Asterisk raw-PCM: .sln/.slin/.sln8..192
MOD / S3M Tracker modules (decode-only by design)

Cross-container remux works for any pair whose codecs don't require rewriting (FLAC ↔ MKV, Ogg ↔ MKV, MP4 ↔ MOV, etc.).

Codecs

Audio
Codec Decode Encode
PCM (s8/16/24/32/f32/f64) ✅ all variants ✅ all variants
slin (Asterisk raw PCM) ✅ .sln/.slin/.sln16/.sln48 etc. ✅ same — headerless S16LE
FLAC ✅ bit-exact vs reference ✅ bit-exact vs reference
Vorbis ✅ matches lewton/ffmpeg (type-0/1/2 residue) ✅ stereo coupling + ATH floor
Opus ✅ CELT mono+stereo; SILK NB/MB/WB mono 10+20+40+60 ms; SILK stereo ✅ CELT-only full-band 20 ms (new_celt_only_full_band() constructor)
MP1 ✅ all modes, RMS 2.9e-5 vs ffmpeg ✅ CBR (greedy allocator, 89 dB PSNR on pure tone)
MP2 ✅ all modes, RMS 2.9e-5 vs ffmpeg ✅ CBR mono+stereo (greedy allocator, ~31 dB PSNR)
MP3 ✅ MPEG-1 Layer III (M/S stereo) ✅ CBR mono+stereo
AAC-LC ✅ mono+stereo, M/S, IMDCT ✅ mono+stereo, ffmpeg accepts
CELT ✅ full §4.3 pipeline (energy + PVQ + IMDCT + post-filter) ✅ mono + stereo dual-stereo (intra-only long-block; energy + PVQ + fMDCT)
Speex ✅ NB modes 1-8 + WB via QMF+SB-CELP (+ formant postfilter) ✅ NB mode-5 CELP + WB sub-mode-1 (QMF split, 16 kHz @ ~16.6 kbit/s)
GSM 06.10 ✅ full RPE-LTP ✅ full RPE-LTP (standard + WAV-49)
G.711 (μ-law / A-law) ✅ ITU tables ✅ ITU tables (pcm_mulaw / pcm_alaw + aliases)
G.722 ✅ 64 kbit/s QMF + dual-band ADPCM (37 dB PSNR, self-consistent tables) ✅ same roundtrip
G.723.1 ✅ ACELP + MP-MLQ decode ✅ 5.3k ACELP + 6.3k MP-MLQ (default 6.3k)
G.728 ✅ LD-CELP 50-order backward-adaptive; placeholder codebook ✅ exhaustive 128×8 analysis-by-synthesis
G.729 ✅ CS-ACELP (non-spec tables, produces audible speech) ✅ symmetric encoder
IMA-ADPCM (AMV) ✅ (33.8 dB PSNR roundtrip)
8SVX ✅ via FORM/8SVX container muxer
Video
Codec Decode Encode
MJPEG ✅ baseline + progressive 4:2:0/4:2:2/4:4:4/grey ✅ baseline
FFV1 ✅ v3, 4:2:0/4:4:4 ✅ v3
MPEG-1 video ✅ I+P+B frames ✅ I+P+B frames (half-pel ME, FWD/BWD/BI B-modes, 43 dB PSNR)
MPEG-4 Part 2 ✅ I+P-VOP, half-pel MC ✅ I+P-VOP (41-43 dB PSNR, 21% vs all-I)
Theora ✅ I+P frames ✅ I+P frames (45 dB PSNR, 3.7× vs all-I)
H.263 ✅ I+P pictures, half-pel MC ✅ I+P pictures, diamond-pattern motion search (±15 pel range), 46 dB PSNR on sliding-gradient
H.264 Baseline I-slice skeleton: CAVLC + intra-pred + transforms + deblocking; 100% on solid-gray IDR
H.265 (HEVC) NAL + VPS/SPS/PPS/slice parse
VP8 ✅ I+P frames (6-tap sub-pel + MV decode + ref management) ✅ I-frame only (DC_PRED, 42 dB PSNR at qindex 50)
VP9 Header parse + bool decoder, DC/V/H intra-pred, 4×4/8×8 iDCT; partition syntax pending
AV1 OBU + sequence/frame header parse, range decoder, DC/V/H intra-pred, 4×4/8×8 DCT
AMV video ✅ (synthesised JPEG header + vertical flip) ✅ (via MJPEG encoder, 33 dB PSNR roundtrip)
ProRes 422 ✅ 4:2:2 Proxy/LT/Standard (forward/inverse 8×8 DCT + zig-zag + exp-Golomb) ✅ same (Yuv422P in; 44 dB PSNR at quant 4). MP4 .mov FourCC wiring still TODO.
Image
Codec | Decode | Encode
PNG / APNG | ✅ 5 color types × 8/16-bit, all 5 filters, APNG animation | ✅ same matrix + APNG emit
GIF | ✅ GIF87a/89a, LZW, interlaced, animation | ✅ GIF89a, animation, per-frame palettes
WebP VP8L | ✅ full lossless (Huffman + LZ77 + transforms) | ✅ lossless (no transforms, byte-identical roundtrip)
WebP VP8 | ✅ lossy (via VP8 decoder) | ✅ lossy (via VP8 I-frame enc, 32 dB PSNR)
JPEG (still) | ✅ via MJPEG codec | ✅ via MJPEG codec
Trackers (decode-only by design)
Codec Decode Encode
MOD ✅ 4-channel Paula-style mixer + main effects
S3M ✅ stereo + SCx/SDx/SBx effects
Subtitles

All text formats parse to a unified IR (SubtitleCue with rich-text Segments: bold / italic / underline / strike / color / font / voice / class / karaoke / timestamp / raw) so cross-format conversion preserves as much styling as each pair can represent. Bitmap-native formats (PGS, DVB, VobSub) decode directly to Frame::Video(Rgba).

Text formats — in oxideav-subtitle:

Format Decode Encode Notes
SRT (SubRip) <b>/<i>/<u>/<s>, <font color> hex + 17 named, <font face size>
WebVTT Header, STYLE ::cue(.class), REGION, inline b/i/u/c/v/lang/ruby/timestamp, cue settings
MicroDVD frame-based, {y:b/i/u/s}, {c:$BBGGRR}, {f:family}
MPL2 decisecond timing, / italic, | break
MPsub relative-start timing, FORMAT=TIME, TITLE=/AUTHOR=
VPlayer HH:MM:SS:text, end inferred
PJS frame-based, quoted body
AQTitle -->> N frame markers
JACOsub \B/\I/\U, #TITLE/#TIMERES headers
RealText HTML-like <time>/<b>/<i>/<u>/<font>/<br/>
SubViewer 1/2 marker-based v1, [INFORMATION] header v2
TTML W3C Timed Text, <tt>/<head>/<styling>/<style>/<p>/<span>/<br/>, tts:* styling
SAMI Microsoft, <SYNC Start=ms> + <STYLE> CSS classes
EBU STL ISO/IEC 18041 binary GSI+TTI (text mode only; bitmap + colour variants deferred)

Advanced text (own crate), in oxideav-ass:

Format Decode Encode Notes
ASS / SSA Script Info + V4+/V4 Styles (BGR+inv-alpha) + override tags (b/i/u/s/c/fn/fs/pos/an/k/kf/ko/N/n/h). Animated tags (\t, \fad, \move, \clip, \fscx/y, \frz, \blur) preserved as opaque raw so text survives round-trip

Bitmap-native (own crate), in oxideav-sub-image:

Format Decode Encode Notes
PGS / HDMV (.sup) Blu-ray subtitle stream; PCS/WDS/PDS/ODS + RLE + YCbCr palette → RGBA
DVB subtitles ETSI EN 300 743 segments + 2/4/8-bit pixel-coded objects
VobSub (.idx+.sub) DVD SPU with control commands + RLE + 16-colour palette

Cross-format transforms (text side): srt_to_webvtt, webvtt_to_srt in oxideav-subtitle; srt_to_ass, webvtt_to_ass, ass_to_srt, ass_to_webvtt in oxideav-ass. Other pairs go through the unified IR directly (parse → IR → write).
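
To give a feel for what the simplest text-side transform involves, here is a deliberately minimal SRT to WebVTT sketch. It only adds the mandatory WEBVTT header and switches the millisecond separator on timing lines from a comma to a dot. It does not go through the unified IR, ignores styling tags, and `srt_to_vtt_minimal` is a hypothetical name, not the srt_to_webvtt function in oxideav-subtitle.

```rust
/// Convert SRT text to WebVTT text, touching only the header and timing lines.
pub fn srt_to_vtt_minimal(srt: &str) -> String {
    // Every WebVTT file starts with this signature line and a blank line.
    let mut out = String::from("WEBVTT\n\n");
    for line in srt.lines() {
        if line.contains("-->") {
            // Timing line: SRT writes 00:00:01,000 where WebVTT writes 00:00:01.000.
            out.push_str(&line.replace(',', "."));
        } else {
            // Cue numbers (valid as WebVTT cue identifiers) and cue text pass through.
            out.push_str(line);
        }
        out.push('\n');
    }
    out
}
```

Commas inside cue text are left alone because only lines containing the `-->` arrow are rewritten.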

Text → RGBA rendering: any decoder producing Frame::Subtitle can be wrapped with RenderedSubtitleDecoder::make_rendered_decoder(inner, width, height), which emits Frame::Video(Rgba) at the caller-specified canvas size, one new frame per visible-state change. The embedded 8×16 bitmap font covers ASCII + Latin-1 supplement; bold via smear, italic via shear; 4-offset outline. No TrueType dep, no CJK.

In-container subtitles (MKV / MP4 subtitle tracks) remain a scoped follow-up.

Scaffolds: API registered, pixel/sample decode not yet implemented
Codec | Status
JPEG XL | stub: registered, returns Error::Unsupported on decode/encode
JPEG 2000 | stub: ditto
AVIF | stub: gated on full AV1 tile decode, which is itself pending

Tags + attached pictures

The oxideav-id3 crate parses ID3v2.2 / v2.3 / v2.4 tags (whole-tag + per-frame unsync, extended header, v2.4 data-length indicator, encrypted/compressed frames recorded as Unknown) plus the legacy 128-byte ID3v1 trailer. Text frames (T*, TXXX), URLs (W*, WXXX), COMM / USLT, and APIC / PIC picture frames are handled structurally; less-common frames (SYLT, RGAD/RVA2, PRIV, GEOB, UFID, POPM, MCDI, …) survive as Unknown with their raw bytes available.
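
As background on the tag layout, the sketch below reads the fixed 10-byte ID3v2 header (including its 28-bit "syncsafe" size) and checks for the 128-byte ID3v1 trailer. The struct and function names are hypothetical and chosen for this example; they are not the oxideav-id3 API, and frame parsing, unsynchronisation, and the extended header are left out.

```rust
/// Decode a 4-byte syncsafe integer: 7 payload bits per byte, top bit clear.
pub fn syncsafe_to_u32(b: [u8; 4]) -> Option<u32> {
    if b.iter().any(|&x| x & 0x80 != 0) {
        return None; // not a valid syncsafe value
    }
    Some(((b[0] as u32) << 21) | ((b[1] as u32) << 14) | ((b[2] as u32) << 7) | b[3] as u32)
}

/// Hypothetical view of the fixed ID3v2 header fields.
#[derive(Debug, PartialEq, Eq)]
pub struct Id3v2Header {
    pub major: u8,    // 2, 3 or 4 for ID3v2.2 / v2.3 / v2.4
    pub revision: u8,
    pub flags: u8,
    pub size: u32,    // tag body size in bytes, excluding this 10-byte header
}

/// Parse the header at the start of `buf`, if one is present.
pub fn parse_id3v2_header(buf: &[u8]) -> Option<Id3v2Header> {
    if buf.len() < 10 || &buf[0..3] != b"ID3" {
        return None;
    }
    let size = syncsafe_to_u32([buf[6], buf[7], buf[8], buf[9]])?;
    Some(Id3v2Header { major: buf[3], revision: buf[4], flags: buf[5], size })
}

/// An ID3v1 tag is the last 128 bytes of the file and starts with "TAG".
pub fn has_id3v1_trailer(file: &[u8]) -> bool {
    file.len() >= 128 && &file[file.len() - 128..file.len() - 125] == b"TAG"
}
```

The syncsafe encoding exists so that the size field can never contain a byte pattern that looks like an MPEG frame sync.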

oxideav-mp3 and oxideav-flac containers surface the extracted fields via the standard Demuxer::metadata() (Vorbis-comment-style keys: title, artist, album, date, genre, track, composer, …) and cover art via a new Demuxer::attached_pictures() method returning &[AttachedPicture] (MIME type + one-of-21 picture-type enum + description + raw image bytes). FLAC's native METADATA_BLOCK_PICTURE is handled natively; FLAC wrapped in ID3 (a few oddball taggers) works via the fallback path.

oxideav probe file.mp3 prints a Metadata: section and an Attached pictures: section with per-picture summary.

Audio filters

The oxideav-audio-filter crate provides:

  • Volume — gain adjustment with configurable scale factor
  • NoiseGate — threshold-based gate with attack/hold/release
  • Echo — delay line with feedback
  • Resample — polyphase windowed-sinc sample rate conversion
  • Spectrogram — STFT → image (Viridis/Magma colormaps, RGB + PNG output)
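
To make "delay line with feedback" concrete, here is a small, self-contained echo sketch over mono f32 samples. It is a generic textbook structure, with a hypothetical `Echo` type and parameter names, and should not be read as the implementation or the signature used by oxideav-audio-filter.

```rust
/// Hypothetical mono echo: a circular delay line with feedback.
pub struct Echo {
    line: Vec<f32>, // circular buffer holding the delayed signal
    pos: usize,     // current read/write position
    feedback: f32,  // fraction of the delayed signal fed back into the line
    mix: f32,       // level of the delayed signal in the output
}

impl Echo {
    pub fn new(delay_samples: usize, feedback: f32, mix: f32) -> Self {
        assert!(delay_samples > 0, "delay must be at least one sample");
        Echo { line: vec![0.0; delay_samples], pos: 0, feedback, mix }
    }

    /// Process one sample: read the delayed value, write input plus
    /// feedback back into the line, and mix the echo into the output.
    pub fn process(&mut self, x: f32) -> f32 {
        let delayed = self.line[self.pos];
        self.line[self.pos] = x + delayed * self.feedback;
        self.pos = (self.pos + 1) % self.line.len();
        x + delayed * self.mix
    }

    /// Process a buffer in place.
    pub fn process_block(&mut self, samples: &mut [f32]) {
        for s in samples.iter_mut() {
            *s = self.process(*s);
        }
    }
}
```

Feeding a single impulse through a two-sample delay with feedback 0.5 yields the impulse, then an echo two samples later at full level, then another at half level, and so on.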

Pixel formats + conversion

The oxideav-pixfmt crate is the shared conversion layer for video codecs. The PixelFormat enum covers ~30 first-tier formats (ffmpeg equivalent names in parentheses):

  • RGB family: Rgb24, Bgr24, Rgba, Bgra, Argb, Abgr, plus 16-bit-per-channel Rgb48Le / Rgba64Le.
  • YUV planar: Yuv420P / Yuv422P / Yuv444P at 8 / 10 / 12-bit, plus JPEG-full-range variants (YuvJ420P, YuvJ422P, YuvJ444P).
  • YUV semi-planar: Nv12, Nv21. YUV packed: Yuyv422, Uyvy422.
  • Grayscale: Gray8, Gray10Le, Gray12Le, Gray16Le.
  • Alpha-bearing: Ya8, Yuva420P.
  • Palette: Pal8. 1-bit: MonoBlack, MonoWhite.

oxideav_pixfmt::convert(src, dst_format, &ConvertOptions) handles the live conversion matrix (RGB all-to-all swizzles, YUV↔RGB under BT.601 / BT.709 × limited / full range, NV12/NV21 ↔ Yuv420P, Gray ↔ RGB, Rgb48 ↔ Rgb24, Pal8 ↔ RGB with optional dither). Palette generation via generate_palette() offers MedianCut and Uniform strategies. Dither options: None, 8×8 ordered Bayer, Floyd-Steinberg.
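
One cell of that matrix, YUV to RGB under BT.601 limited range, comes down to the per-pixel arithmetic sketched below. The coefficients are the usual BT.601 values for 8-bit limited-range video. The function name is hypothetical, and a production converter (including the one in oxideav-pixfmt, whose internals are not shown here) would typically work on whole planes and may use fixed-point math.

```rust
/// Convert one 8-bit BT.601 limited-range YUV pixel to 8-bit RGB.
pub fn yuv_to_rgb_bt601_limited(y: u8, u: u8, v: u8) -> [u8; 3] {
    // Limited range: luma spans 16..=235, chroma is centred on 128.
    let c = y as f32 - 16.0;
    let d = u as f32 - 128.0;
    let e = v as f32 - 128.0;

    // 1.164383 = 255 / 219 stretches luma to full range; the chroma
    // terms follow from Kr = 0.299 and Kb = 0.114.
    let r = 1.164383 * c + 1.596027 * e;
    let g = 1.164383 * c - 0.391762 * d - 0.812968 * e;
    let b = 1.164383 * c + 2.017232 * d;

    // Round to nearest and clamp into the 8-bit range.
    let quantize = |x: f32| x.round().clamp(0.0, 255.0) as u8;
    [quantize(r), quantize(g), quantize(b)]
}
```

Limited-range black (16, 128, 128) maps to RGB (0, 0, 0) and limited-range white (235, 128, 128) maps to (255, 255, 255).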

Codecs declare accepted_pixel_formats on their CodecCapabilities; the job graph (below) auto-inserts conversion when the upstream format doesn't match.

JSON job graph

The oxideav-job crate is a declarative way to describe multi-output transcode pipelines. A job is a JSON object: keys are output filenames (or reserved sinks like @null / @display), values describe tracks grouped by audio / video / subtitle / all, and each track carries a recursive input tree of source refs and filter / convert nodes.

{
  "threads": 8,
  "@in":       {"all": [{"from": "movie.mp4"}]},
  "out.mkv":   {
    "video": [{"from": "@in", "codec": "h264", "codec_params": {"crf": 23}}],
    "audio": [{"from": "@in", "codec": "flac"}]
  },
  "out.png":   {"video": [{"from": "@in", "convert": "rgba"}]}
}

The executor has two modes: serial (threads == 1) runs one packet at a time; pipelined (threads ≥ 2, default when available_parallelism() ≥ 2) spawns one worker thread per stage per track connected by bounded mpsc channels. The mux/sink loop runs on the caller's thread so JobSink implementations don't need to be Send (the SDL2 player sink in oxideplay stays a single-threaded object). Both modes produce byte-identical output for deterministic jobs.
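
The pipelined mode described above can be approximated with the standard library alone: one worker thread per stage, bounded channels between them, and the sink loop on the calling thread. The two toy stages and the function names are placeholders invented for this sketch; the real executor in oxideav-job moves packets and frames, not integers.

```rust
use std::sync::mpsc::sync_channel;
use std::thread;

// Stand-ins for real stages such as decode and encode.
fn stage_a(x: u32) -> u32 { x.wrapping_mul(3) }
fn stage_b(x: u32) -> u32 { x.wrapping_add(1) }

/// Serial mode: one item at a time through every stage.
pub fn run_serial(input: &[u32]) -> Vec<u32> {
    input.iter().map(|&x| stage_b(stage_a(x))).collect()
}

/// Pipelined mode: one thread per stage, bounded channels of `depth` items.
pub fn run_pipelined(input: Vec<u32>, depth: usize) -> Vec<u32> {
    let (tx_a, rx_a) = sync_channel::<u32>(depth);
    let (tx_b, rx_b) = sync_channel::<u32>(depth);

    let worker_a = thread::spawn(move || {
        for x in input {
            // A send error means the downstream stage has gone away.
            if tx_a.send(stage_a(x)).is_err() {
                break;
            }
        }
        // tx_a is dropped here, which ends the next stage's loop.
    });

    let worker_b = thread::spawn(move || {
        for x in rx_a {
            if tx_b.send(stage_b(x)).is_err() {
                break;
            }
        }
    });

    // The sink loop runs on the caller's thread, so the sink need not be Send.
    let out: Vec<u32> = rx_b.iter().collect();
    worker_a.join().unwrap();
    worker_b.join().unwrap();
    out
}
```

Because each channel is FIFO and each stage is deterministic, the pipelined result matches the serial one item for item. The bounded channels apply backpressure when a downstream stage is slower.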

Decoder / Encoder trait hook: set_execution_context(&ExecutionContext) (default no-op) lets codecs opt into slice- / GOP-parallel work later without trait churn.

Explicit pixel-format conversion nodes ({"convert": "yuv420p", "input": ...}) fit anywhere in the input tree; the resolver also auto-inserts a PixConvert stage between Decode and Encode when a codec's accepted_pixel_formats list excludes the upstream format.
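
The auto-insertion rule can be pictured with the small planner below. The `Fmt` and `Stage` enums, the `plan` function, and the choice of the first accepted format as the conversion target are all assumptions made for illustration; the actual resolver in oxideav-job may pick its target differently.

```rust
/// Hypothetical subset of pixel formats.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Fmt {
    Rgba,
    Yuv420P,
    Yuv422P,
}

/// Hypothetical stages of one video track.
#[derive(Debug, PartialEq, Eq)]
pub enum Stage {
    Decode,
    PixConvert(Fmt),
    Encode,
}

/// Build the stage list, inserting a conversion only when the encoder's
/// accepted list excludes the upstream format. Returns None when a
/// conversion is needed but the encoder accepts no format at all.
pub fn plan(upstream: Fmt, accepted: &[Fmt]) -> Option<Vec<Stage>> {
    let mut stages = vec![Stage::Decode];
    if !accepted.contains(&upstream) {
        // Assumption: convert to the encoder's first listed format.
        let target = *accepted.first()?;
        stages.push(Stage::PixConvert(target));
    }
    stages.push(Stage::Encode);
    Some(stages)
}
```

An RGBA source feeding an encoder that only accepts Yuv420P gets a PixConvert(Yuv420P) stage; a Yuv420P source goes straight from Decode to Encode.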

Input sources

The source layer decouples I/O from container parsing. Container demuxers receive an already-opened Box<dyn ReadSeek> and never touch the filesystem directly. The SourceRegistry resolves URIs to readers:

Scheme | Driver | Notes
bare path / file:// | built-in | std::fs::File
http:// / https:// | oxideav-http (opt-in) | ureq + rustls, Range-request seeking

The HTTP driver is off by default in the library (http cargo feature) and on by default in oxideplay and oxideav-cli.
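
The scheme-to-driver resolution can be sketched with a small registry keyed by URI scheme. Everything here (`Registry`, `Opener`, `scheme_of`, and the local `ReadSeek` trait definition) is written for this example under the assumption that drivers are plain functions; the real SourceRegistry in oxideav-source may be shaped differently.

```rust
use std::collections::HashMap;
use std::fs::File;
use std::io::{self, Read, Seek};

/// Object-safe combination of Read + Seek, implemented for every such type.
pub trait ReadSeek: Read + Seek {}
impl<T: Read + Seek> ReadSeek for T {}

/// Hypothetical driver signature: open a URI and hand back a reader.
pub type Opener = fn(&str) -> io::Result<Box<dyn ReadSeek>>;

/// Bare paths are treated as the `file` scheme.
pub fn scheme_of(uri: &str) -> &str {
    match uri.split_once("://") {
        Some((scheme, _)) => scheme,
        None => "file",
    }
}

fn open_file(uri: &str) -> io::Result<Box<dyn ReadSeek>> {
    let path = uri.strip_prefix("file://").unwrap_or(uri);
    Ok(Box::new(File::open(path)?))
}

pub struct Registry {
    drivers: HashMap<&'static str, Opener>,
}

impl Registry {
    /// New registry with only the built-in file driver.
    pub fn new() -> Self {
        let mut reg = Registry { drivers: HashMap::new() };
        reg.register("file", open_file);
        reg
    }

    /// Opt-in drivers (for example an HTTP one) register themselves here.
    pub fn register(&mut self, scheme: &'static str, opener: Opener) {
        self.drivers.insert(scheme, opener);
    }

    /// Resolve the URI's scheme to a driver and open it.
    pub fn open(&self, uri: &str) -> io::Result<Box<dyn ReadSeek>> {
        let opener = self
            .drivers
            .get(scheme_of(uri))
            .copied()
            .ok_or_else(|| io::Error::new(io::ErrorKind::Unsupported, "no driver for scheme"))?;
        opener(uri)
    }
}
```

With this shape, a demuxer only ever sees a `Box<dyn ReadSeek>`, and a URI whose scheme has no registered driver fails before any I/O happens.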

BufferedSource wraps any ReadSeek with a prefetch ring buffer (64 MiB default in oxideplay, configurable via --buffer-mib). A worker thread fills the ring ahead of the read cursor; seeks inside the window are free.

$ oxideav probe https://download.blender.org/peach/bigbuckbunny_movies/BigBuckBunny_320x180.mp4
Input: https://download.blender.org/peach/bigbuckbunny_movies/BigBuckBunny_320x180.mp4
Format: mp4
Duration: 00:09:56.46
  Stream #0 [Video]  codec=h264  video 320x180
  Stream #1 [Audio]  codec=aac  audio 2ch @ 48000 Hz

Playback

An opt-in binary crate oxideplay implements a reference player with SDL2 (audio + video) and a crossterm TUI. SDL2 is loaded at runtime via libloading: oxideplay doesn't link against SDL2 at build time, so the binary builds and ships without requiring SDL2 dev headers. If SDL2 isn't installed on the target machine, the player exits cleanly with a "library not found" message instead of failing to start. The core oxideav library remains 100% pure Rust.

cargo run -p oxideplay -- /path/to/file.mkv
cargo run -p oxideplay -- https://example.com/video.mp4

Keybinds: q quit, space pause, ← / → seek ±10 s, ↑ / ↓ seek ±1 min (up = forward, down = back), pgup / pgdn seek ±10 min, * volume up, / volume down. Works from the SDL window (when a video stream is present) or from the TTY.

CLI

oxideav command-line verbs: list, probe, remux, transcode, run, validate, dry-run. Inputs can be local paths or HTTP(S) URLs.

$ oxideav list                           # print registered codecs + containers
$ oxideav probe song.flac
$ oxideav transcode song.flac song.wav
$ oxideav remux input.ogg output.mkv
$ oxideav probe https://example.com/video.mp4

# JSON job graph
$ oxideav run job.json
$ oxideav run - < job.json
$ oxideav run --inline '{"out.mkv":{"audio":[{"from":"in.mp3"}]}}'
$ oxideav run --threads 4 job.json        # override thread budget
$ oxideav validate job.json               # check without running
$ oxideav dry-run job.json                # print the resolved DAG

oxideplay --job <file> runs a job where @display / @out binds to the SDL2 player sink; other outputs (file paths) write to disk in the same run.

Building

cargo build --workspace
cargo test --workspace

The oxideav binary is produced by the oxideav-cli crate:

cargo run -p oxideav-cli -- --help

Working with extracted sibling crates

A handful of fully-spec-complete codecs have been extracted into their own repositories under the OxideAV organization and are consumed from crates.io. To hack on them locally alongside this repo, clone them as siblings and run scripts/dev-patch.sh:

# layout: parent/
#         ├── oxideav/           (this repo)
#         └── oxideav-<name>/    (any OxideAV/oxideav-* clone)
git clone git@github.com:OxideAV/oxideav-gsm.git ../oxideav-gsm
./scripts/dev-patch.sh           # generates .cargo/config.toml
cargo run -p oxideplay -- some.wav

scripts/dev-patch.sh rewrites .cargo/config.toml with a [patch.crates-io] entry for every ../oxideav-* sibling it finds, plus every in-workspace crates/oxideav-* crate. The file is gitignored, so each dev owns their own layout. Re-run the script after adding or removing a sibling.

License

MIT — see LICENSE. Copyright © 2026 Karpelès Lab Inc.
