v3.5.12 — the camera-media release

DozaVisuals released this 11 Jun 02:22

Doza Assist v3.5.12

The camera-media release. Every fix in here comes from one editor's real-world workflow in #39 — MXF/AVC camera originals, multi-track audio, embedded timecode, two-speaker interviews on an M1 Mac. If that sounds like your footage, this release is for you.

Transcription that survives camera files

  • Multi-track audio is now mixed down properly. Camera files often put each mic on its own mono track; the old extraction grabbed one track — sometimes the empty one. That single bug caused silent/garbled transcripts, the cryptic "cannot reshape tensor of 0 elements" crash, and transcripts that missed a speaker entirely. All tracks are now mixed into the transcription audio.
  • Bad cached audio self-heals. Audio extracted by older versions (including truncated or wrong-track files) is detected and re-extracted automatically on the next transcription. No uninstall/reinstall needed — ever. For an existing broken project, just hit gear → Retranscribe once.
  • A file with no usable audio now says so ("the file may have no usable audio track") instead of crashing the engine, and an empty transcript is reported as an error instead of showing a blank page marked "transcribed."
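
For readers curious what "mixed down" means in practice, here is a minimal sketch, not the app's actual extraction code. It assumes each mic track has already been decoded to a list of float samples in the range -1.0 to 1.0 and uses a plain equal-weight average; the function name `mix_down` and the choice to raise `ValueError` are hypothetical.

```python
from typing import List, Sequence


def mix_down(tracks: Sequence[Sequence[float]]) -> List[float]:
    """Average several mono tracks into one mono track (illustrative only).

    Tracks with no samples are ignored; shorter tracks count as silence
    past their end. Assumption: samples are floats in [-1.0, 1.0].
    """
    non_empty = [t for t in tracks if len(t) > 0]
    if not non_empty:
        # Mirrors the wording of the new error message in this release.
        raise ValueError("the file may have no usable audio track")

    length = max(len(t) for t in non_empty)
    mixed: List[float] = []
    for i in range(length):
        total = sum(t[i] for t in non_empty if i < len(t))
        # Dividing by the track count keeps the result inside [-1.0, 1.0].
        mixed.append(total / len(non_empty))
    return mixed
```

Averaging rather than picking a single track means a speaker recorded on any one mic still reaches the transcription audio, at the cost of a lower level when the other tracks are silent.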

Real progress for long Whisper runs

  • The progress bar now shows actual engine progress ("12 of 20 min processed · ~9m remaining") instead of a size-based guess that camped on "Almost done…" for the entire run. Long CPU transcriptions look like what they are: working, not hung.
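
As an illustration of how a message like the one above can be derived from engine progress rather than file size, here is a small sketch. It assumes the engine reports how many seconds of media it has processed so far; `progress_message` and its parameters are hypothetical names, not the app's API.

```python
def progress_message(processed_s: float, total_s: float, elapsed_s: float) -> str:
    """Build a progress string from media seconds processed and wall-clock time.

    Assumption: processing speed so far is a fair predictor of the rest.
    """
    if total_s <= 0 or processed_s <= 0 or elapsed_s <= 0:
        return "Starting"

    processed_s = min(processed_s, total_s)
    rate = processed_s / elapsed_s  # media seconds handled per wall-clock second
    remaining_min = round((total_s - processed_s) / rate / 60)
    done_min = int(processed_s // 60)
    total_min = int(total_s // 60)
    return f"{done_min} of {total_min} min processed · ~{remaining_min}m remaining"
```

For example, 12 of 20 minutes processed after 13.5 minutes of wall-clock time (`progress_message(720, 1200, 810)`) yields an estimate of about 9 minutes remaining.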

Playback for browser-hostile formats

  • MXF (and many high-bitrate camera MP4s) can't be decoded by any browser — that was the black viewer with the dead progress bar. The player now detects the decode failure and automatically switches to audio playback with a clear note. Transcript, clips, and exports always used the original file and are unaffected.
  • Fixed the audio-only (🔊) toggle crashing on video-only projects, and fixed audio seeking in Safari.

Honest speaker labels + easy reassignment

  • When the Whisper engine (which can't tell speakers apart) handles a multi-speaker project, the transcript now says so up front — with a one-click path to fix it: your interviewer/subject names are pre-seeded in the sidebar, and clicking any paragraph's speaker name reassigns it. Also fixed reassignment silently skipping the last segment of a paragraph.

Analysis errors that name the real problem

  • "Analysis came back empty" will no longer hide "Ollama isn't running," "model not installed," or "model needs more memory than this Mac has free" — those now surface immediately (in seconds, not after 20 minutes of timeouts) with a link to fix them. On 8GB Macs, the out-of-memory case now recommends the smaller gemma4:e2b variant.

Source timecode display

  • Transcripts, clip cards, the player counter, and TXT exports can now display your camera's embedded source timecode (e.g. 14:23:11:05) instead of 00:00-relative time — frame-accurate, drop-frame aware, on automatically when the media carries TC, toggleable via the TC button in the player. You can also jump the transcript by pasting an SMPTE timecode from your NLE. (FCPXML exports already used source TC; now the UI matches.)
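
To show what "drop-frame aware" involves, here is a sketch of the standard frame-count-to-SMPTE conversion at a nominal 30 fps. It is not the app's code: the function name is hypothetical, only the 29.97 drop-frame case is handled, and the `;` separator before the frames field is a common convention assumed here. The input is the absolute frame count, that is, the embedded start timecode in frames plus the offset into the clip.

```python
def frames_to_timecode(frame_number: int, drop_frame: bool = False) -> str:
    """Convert an absolute frame count to HH:MM:SS:FF at a nominal 30 fps.

    With drop_frame=True, frame numbers 00 and 01 are skipped at the start
    of every minute except minutes divisible by 10 (29.97 fps convention).
    """
    fps = 30
    if drop_frame:
        drop = 2
        frames_per_min = fps * 60 - drop            # 1798 counted frames
        frames_per_10min = fps * 600 - drop * 9     # 17982 counted frames
        tens, rem = divmod(frame_number, frames_per_10min)
        # Add back the skipped numbers so plain division gives the label.
        frame_number += drop * 9 * tens
        if rem > drop:
            frame_number += drop * ((rem - drop) // frames_per_min)

    ff = frame_number % fps
    ss = (frame_number // fps) % 60
    mm = (frame_number // (fps * 60)) % 60
    hh = (frame_number // (fps * 3600)) % 24
    sep = ";" if drop_frame else ":"
    return f"{hh:02d}:{mm:02d}:{ss:02d}{sep}{ff:02d}"
```

In drop-frame mode, frame 1800 is labelled `00:01:00;02` because labels `;00` and `;01` are skipped at the top of that minute, while frame 17982 lands exactly on `00:10:00;00`.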

Checksum

SHA-256: 6e3b9330bf3f18eae1f76834eb914e4a20095a4e78a05eb15b44c72fa56307db  Doza-Assist-3.5.12.dmg
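
One way to check a downloaded file against the SHA-256 above is sketched below using Python's standard `hashlib`. The helper names `sha256_of` and `matches_release` are illustrative, and the path you pass is wherever you saved the download.

```python
import hashlib

# Checksum published in these release notes for Doza-Assist-3.5.12.dmg.
EXPECTED = "6e3b9330bf3f18eae1f76834eb914e4a20095a4e78a05eb15b44c72fa56307db"


def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_release(path: str, expected: str = EXPECTED) -> bool:
    """True if the file's SHA-256 equals the published checksum."""
    return sha256_of(path) == expected.lower()
```

Calling `matches_release("Doza-Assist-3.5.12.dmg")` from the download folder should return `True` for an intact download.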

Signed, notarized, and stapled (Developer ID: 55F36Z67ZN — Gatekeeper-verified).

Full diff: v3.5.11...v3.5.12