0.93 (2021-08-16)
- Minor Rust updates (format, typos, docs)
- Updated GUI
0.92 (2021-08-10)
- Rust updates: Added srt writer
- Rust updates: Added writers for transcripts and SAMI
- Added missing DLL to Windows installer
- Updated Windows GUI
0.91 (2021-07-26)
- More Rust in the 708 decoder (Add Pen Presets and timing functions)
- Updated GUI
0.90 (2021-07-14)
- New installer (WiX based)
- New GUI (flutter based)
- More Rust (the 708 decoder is being rewritten)
0.89 (2021-06-13)
- Fix: Fix broken links in README
- Fix: Timing in DVB, sub duration check for timeout.
- New: Added support for SCC and CCD encoder formats
- New: Added support to output captions to MCC file (#733).
- New: Add support for censoring words ("Kid Friendly") (#1139)
- New: Extend support of capitalization for all BITMAP and 608 subtitles (#1214)
- New: Added an option to disable timestamps for WebVTT (In response to issue #1127)
- Fix: Change inet_ntop to inet_ntoa for Windows XP compatibility
- Fix: Added italics, underline, and color rendering support for -out=spupng with EIA608/teletext
- Fix: ccx_demuxer_mxf.c: Parse framerate from MXF captions to fix caption timings.
- Fix: hardsubx_decoder.c: Fix memory leaks using Leptonica API.
- Fix: linux/ added some sources to enable rpms to be created.
- Fix: Crash when using -sc (sentence case) option (#1115)
- Fix: Segmentation fault on VOB #1128
- Fix: Hang while processing video #1121
- Fix: lib_ccx.c: Initialize fatal error logging function before first usage in init_libraries
- Fix: A few (minor) memory leaks around the code.
- Fix: General code clean up / reformatting
- Fix: Fix multiple definitions with new -fno-common default in GCC 10
- Fix: Mac now builds reproducibly again without errors on the date command (#1230)
- Fix: Allow all oem modes with tesseract v4 (#1264)
- Doc: Updated ccextractor.cnf.sample.
- Update: Updated LibPNG to 1.6.37
- Remove: Python API (since no one cares about it and it's unmaintained)
- Remove: -cf, just use FFmpeg if you want an ES from a TS or PS, CCExtractor is a bad tool
for this.
- Fix: Segmentation fault on Windows
- Update: Updated libGPAC to 1.0.1
- Fix: Segmentation fault with unsupported and multitrack file reports
- Fix: Write subtitle header to multitrack outputs
- Fix: Write multitrack files to the output file directory
- Fix: Correct frame number calculation in SCC (#1340)
- Fix: Regression on Teletext that caused dates to be wrong (RT 78 on the sample platform)
- Fix: CEA-708: Better timing, fixes for missing subtitles
- Fix: timing for direct rollup
- Fix: timing for VOB files with multiple chapters
0.88 (2019-05-21)
- New: More tapping points for debug image in ccextractor.
- New: Add support for tesseract 4.0
- Optimize: Remove multiple RGB to grey conversion in OCR.
- Fix: Update UTF8Proc to 2.2.0
- Fix: Update LibPNG to 1.6.35
- Fix: Update Protobuf-c to 1.3.1
- Fix: Warn instead of fatal when a 0xFF marker is missing
- Fix: Segfault in general_loop.c due to null pointer dereference (case of no encoder)
- Fix: Enable printing hdtv stats to console.
- Fix: Many typos in comments and output messages
- Fix: Ignore Visual Studio temporary project files
- New: Add support for non-Latin characters in stdout
- Fix: Check whether stream is empty
- New: Add support for EIA-608 inside .mkv
- New: Add support for DVB inside .mkv
- Fix: Added -latrusmap Map Latin symbols to Cyrillic ones in special cases
of Russian Teletext files (issue #1086)
- Fix: Several OCR crashes
0.87 (2018-10-23)
- New: Upgrade libGPAC to 0.7.1.
- New: mp4 tx3g & multitrack subtitles.
- New: Guide to update dependencies (docs/Updating_Dependencies.txt).
- New: Add LICENSE File (#959).
- New: Display quantisation mode in info box (#954).
- New: Add instruction required to build ccextractor with HARDSUBX support (#946).
- New: Added version no. of libraries to --version.
- New: Added -quant (OCR quantization function).
- New: Python API now compatible with Python 3.
- Fix: linux/builddebug: Added non-local directories to the include search path so we don't
require a locally compiled tesseract or leptonica.
- Fix: Correct -HARDSUBX Bug In CMake, allow build with hardsubx using cmake (#966).
- Fix: possible segfaults in hardsubx_classifier.c due to strdup (#963).
- Fix: Improve the start and end timestamps of extracted burned in captions (#962).
- Fix: Update (#960).
- Fix: Fixed crash with "-out=report" and "-out=null".
- Fix: -nocf not working with OCR'ing (#958).
- Fix: segfault in add_cc_sub_text and initialize to NULL in init_encoder (#950).
- Fix: ccx_decoders_common.c: Copy data type when creating a copy of the subtitle structure.
- Fix: Implicit declaration of these functions throws warning during build (#948).
- Fix: ccx_decoders_common.c: Properly release allocated resources on free_subtitle().
- Fix: Added a datatype member to struct cc_subtitle - needed so we can properly free all
memory when void *data points to a structure that has its own pointers.
- Fix: dvb_subtitle_decoder.c: When combining image regions verify that the offset is
never negative.
- Fix: Updated travis.yml to fix osx build (#947).
- Fix: Add utf8proc src file to cmake, updated header file (#944).
- Fix: Added required pointers on freep() calls.
- Fix: Removed dvb_debug_traces_to_stdout and used the usual dbg_print instead.
- Fix: Additional debug traces for DVB.
- Fix: Fix minor memory leak in ocr.c.
- Fix: Fix issue with displaying utf8proc version.
- Fix: Fix failing cmake due to liblept/tesseract header files.
- Fix: Added missing \n in params.c.
- Fix: builddebug: Use -fsanitize=address -fno-omit-frame-pointer.
- Fix: ccx_decoders_common.c: Removed trivial memory leak.
- Fix: ccx_encoders_srt.c: Made sure a pointer is non-NULL before dereferencing.
- Fix: dvb_subtitle_decoder.c: Initialize pointer members to NULL when creating a structure.
- Fix: lib_ccx.c: Initialize (memset 0) structure cc_subtitle after memory allocation.
- Fix: Added verboseness to error/warnings in dvb_subtitle_decoder.c.
- Fix: dvb_subtitle_decoder.c: Work on passing invalid streams errors upstream (plus some
warning messages) so we can eventually recover from this situation instead of crashing.
- Fix: telxcc.c: Currently setting a colour doesn't necessarily add a space even though the
specifications mandate it. (#930).
- Fix: dvb_subtitle_decoder.c: Fix null pointer dereference when region==NULL in write_dvb_sub.
- Fix: DVB Teletext subtitle incomplete.
- Fix: replace all 0xA characters within startbox with 0x20.
- Fix: DVB Teletext subtitle incomplete (#922).
- Fix: Add missing return value to one of the returns in process_tx3g().
- Fix: Typos and other minor bugs.
- Fix: Tidy CMakeLists & vcxproj (#920).
- Fix: Added m2ts and -mxf to help screen.
- Fix: Added MKV to demuxer_print_cfg.
- Fix: Added MXF to demuxer_print_cfg.
- Fix: "Out of order packets" error had wrong print() parameters.
- Fix: Updated Python documentation.
- Fix: Fix incorrect path in XML (#904).
- Fix: linux build script (non-debug): Don't hide warnings from compiler.
- Fix: linux build script (debug): Display which step of the build script we're in.
- Fix: Make the build reproducible (#976).
- Fix: Remove instance of o1 and o2 from help.
- Fix: Colors of DVB subtitles with depth 2 broken due to a missing break.
- Fix: CEA-708: Caption loss due to CW command (#991).
- Fix: CEA-708: Update patch for windows priority with functions (#990).
0.86 (2018-01-09)
- New: Preliminary MXF support
- New: Added a histogram in one-minute increments of the number of lines in a subtitle.
- New: Added Autoconf build scripts for CCExtractor to generate makefiles (mac).
- New: Added Autoconf build scripts for CCExtractor to generate makefiles (linux).
- New: Added .rpm package generation script.
- New: Added build/installation script for .pkg.tar.xz (Arch Linux).
- New: Added tarball generation script.
- New: Added --analyzevideo. If present the video stream will be processed even if the
subtitles are in a different stream. This is useful when we want video information
(resolution, frame type, etc). -vides now implies this option too.
[Note: Tentative - some possibly breaking changes were made for this, so if you
use it validate results]
- New: Added a GUI in the main CCExtractor binary (separate from the external GUIs
such as CCExtractorGUI).
- New: A Python binding extension so it's possible to use CCExtractor's tools from
- New: Added -nospupngocr (don't OCR bitmaps when generating spupng, faster)
- New: Add support for file split on keyframe (-segmentonkeyonly)
- New: Added WebVTT output from Matroska.
- New: Support for source-specific multicast.
- New: FreeType-based text renderer (-out=spupng with teletext/EIA608).
- New: Upgrade library UTF8proc
- New: Upgrade library win_iconv
- New: Upgrade library zlib
- New: Upgrade library LibPNG
- New: Added Travis CI support
- New: Made error messages clearer, less ambiguous
- Fix: Prevent the OCR being initialized more than once (happened on multiprogram and
PAT changes)
- Fix: Makefiles, build scripts, etc... everything updated and corrected for all
- Fix: Proper line ending for .srt files from bitmaps.
- Fix: OCR corrections using grayscale before extracting texts.
- Fix: End timestamps in transcripts from DVB.
- Fix: Forcing -noru to cause deduplication in ISDB
- Fix: TS: Skip NULL packets
- Fix: When NAL decoding fails, don't dump the whole decoded thing, limit to 160 bytes.
- Fix: Modify Autoconf scripts to generate tarball for mac from `/package_creators/`
and include GUI files in tarball
- Fix: Started work on libGPAC upgrade.
- Fix: DVB subtitle not extracted if there's no display segment
- Fix: Heap corruption in add_ocrtext2str
- Fix: Bug that sometimes caused -out=spupng to crash
- Fix: Checks for text before newlines on DVB subtitles
- Fix: OCR issue caused by separated dvb subtitle regions
- Fix: DVB crash on specific condition (!rect->ocr_text)
- Fix: DVB bug (Multiple-line subtitle; Missing last line)
- Fix: --sentencecap for teletext samples
- Fix: Crash when image passed into OCR is empty
- Fix: Temporarily wrapped the Python API, not production ready yet
- Fix: -delay option in DVB
0.85b (2017-01-26)
- Fix: Base Windows binary (without OCR) compiled without DLL dependencies.
0.85 (2017-01-23)
- New: Added FFMPEG 3.0 to Windows build - last one that is XP compatible.
- New: Major improvements in CEA-608 to WebVTT (styles, etc).
- New: Return a non-zero return code if no subtitles are found.
- New: Windows build files updated to Visual Studio 2015, new target platform is 140_xp.
- New: Added basic support of Tesseract 4.0.0.
- New: Added build script for .deb.
- New: Updated -debugdvbsub parameter to get the most relevant DVB traces for debugging.
- New: SMPTE-TT files are now compatible with Adobe Premiere.
- New: Updated libpng.
- New: Added 3rd party (Tracy from static linux build script.
- New: Add chapter extraction for MP4 files.
- New: Return code 10 if no captions are found at all.
- Fix: Teletext duplicate lines in certain cases.
- Fix: Improved teletext timing.
- Fix: DVB timing is finally good.
- Fix: A few minor memory leaks.
- Fix: tesseract library file included in mac build command.
- Fix: Bad WTV timings in some cases.
- Fix: Mac build script.
- Fix: Memory optimization in HARDSUBX edit_distance.
- Fix: SubStation Alpha subtitles in bitmap.
- Fix: lept msg severity in linux.
- Fix: SSA, SPUPNG and VTT timing and skipping of subtitles for SAMI and TTML.
- Fix: SMPTE-TT : Added support for font color.
- Fix: SAMI unnecessary empty subtitle when extracting DVB subs.
- Fix: Skip the packet if the adaptation field length is broken.
- Fix: 708 - lots of work done in the decoder. Implementation of more commands. Better timing.
0.84 (2016-12-16)
- New: In Windows, binaries both with and without OCR are bundled, since the OCR one causes problems due to
dependencies in some systems. So unless you need the OCR just use the non-OCR version.
- New: Added -sbs (sentence by sentence) for DVB output. Each frame in the output file contains a complete
sentence (experimental).
- New: Added -curlposturl. If used each output frame will be sent with libcurl by doing a POST to that URL.
- Fix: More code consistency checking in function names.
- Fix: linux build script now tries to verify dependencies.
- Fix: Mac build script was missing a directory.
0.83 (2016-12-13)
- Fix: Duplicate lines in mp4 (specifically affects itunes).
- Fix: Timing in .mp4, timing now calculated for each CC pair instead of per atom.
- Fix: Typos everywhere in the documentation and source code.
- Fix: CMakeLists for build in cmake.
- Fix: -unixts option.
- Fix: FPS switching messages.
- Fix: Removed ugly debug statement with local path in HardsubX.
- Fix: Changed platform target to v120_xp in Visual Studio (so XP is supported again).
- Fix: Added detail in many error messages.
- Fix: Memory leaks in videos with XDS.
- Fix: Makefile compatibility issues with Raspberry pi.
- Fix: missing separation between WebVTT header and body.
- Fix: Stupid bug in M2TS that prevented it from working.
- Fix: OCR libraries dependencies for the release version in Windows.
- Fix: non-buffered reading from pipes.
- Fix: --stream option with stdin.
- New: terminate_asap to buffered_read_opt
- New: Added some TV-show specific spelling dictionaries.
- New: Updated GPAC library.
- New: ASS/SSA.
- New: Capture sigterm to do some clean up before terminating.
- New: Work on 708: Changed DefineWindow behavior, only clear text of an existing window if style has changed.
0.82 (2016-08-15)
- New: HardsubX - Burned in subtitle extraction subsystem.
- New: Color Detection in DVB Subtitles
- Fix: Corrected sentence capitalization
- Fix: Skipping redundant bytes at the end of tx3g atom in MP4
- Fix: Illegal SRT files being created from DVB subtitles
- Fix: Incorrect Progress Display
0.81 (2016-06-13)
- New: --version parameter for extensive version information (version number, compile date, executable hash, git commit (if appropriate))
- New: Add -sem (semaphore) to create a .sem file when an output file is open and delete it when it's closed.
- New: Add --append parameter. This will prevent overwriting of existing files.
- New: File Rotation support added. The user has to send a USR1 signal to rotate.
- Fix: Issues with files <1 Mb
- Fix: Preview of generated transcript.
- Fix: Statistics were not generated anymore.
- Fix: Correcting display of sub mode and info in transcripts.
- Fix: Teletext page number displayed in -UCLA.
- Fix: Removal of excessive XDS notices about aspect ratio info.
- Fix: Force Flushing of file buffers works for all files now.
- Fix: mp4 void atoms that were causing some .mp4 files to fail.
- Fix: Memory usage caused by EPG processing was high due to many non-dynamic buffers.
- Fix: Project files for Visual Studio now include OCR support in Windows.
0.80 (2016-04-24)
- Fix: "Premature end of file" (one of the scenarios)
- Fix: XDS data is always parsed again (needed to extract information such as program name)
- Fix: Teletext parsing: @ was incorrectly exported as * - X/26 packet specifications in ETS 300 706 v1.2.1 now better followed
- Fix: Teletext parsing: Latin G2 subsets and accented characters not displaying properly
- Fix: Timing in -ucla
- Fix: Timing in ISDB (some instances)
- Fix: "mfra" mp4 box weight changed to 1 (this helps with correct file format detection)
- Fix: Fix for TARGET File is null.
- Fix: Fixed SegFaults while parsing parameters (if mandatory parameter is not present in -outinterval, -codec or -nocodec)
- Fix: Crash when input is too small
- Fix: Update some URLs in code (references to docs)
- Fix: -delay now updates final timestamp in ISDB, too
- Fix: Removed minor compiler warnings
- Fix: Visual Studio solution files working again
- Fix: ffmpeg integration working again
- New: Added --forceflush (-ff). If used, output file descriptors will be flushed immediately after being written to
- New: Hexdump XDS packets that we cannot parse (shouldn't be many of those anyway)
- New: If input file cannot be opened, provide a decent human readable explanation
- New: GXF support
0.79 (2016-01-09)
- Support for Grid Format (g608)
- Show correct number of teletext packets processed
- Removed Segfault on incorrect mp4 detection
- Remove xml header from transcript format
- Help message updated for Teletext
- Added --help and -h for help message
- Added --nohtmlescape option
- Added --noscte20 option
0.78 (2015-12-12)
- Support to extract Closed Caption from MultiProgram at once.
- CEA-708: exporting to SAMI (.smi), Transcript (.txt), Timed Transcript (ttxt) and SubRip (.srt).
- CEA-708: 16 bit charset support (tested on Korean).
- CEA-708: Roll Up captions handling.
- Changed TCP connection protocol (BIN data is now wrapped in packets, added EPG support and keep-alive packets).
- TCP connection password prompt is removed. To set connection password use -tcppassword argument instead.
- Support ISDB Closed Caption.
- Added a new output format, simplexml (used internally by a CCExtractor user, may or may not be useful for
anyone else).
0.77 (2015-06-20)
- Fixed bug in capitalization code ('I' was not being capitalized).
- GUI should now run in Windows 8 (using the included .Net runtime, since
3.5 cannot be installed in Windows 8 apparently).
- Fixed Mac build script, binary is now compiled with support for
files over 2 GB.
- Fixed bug in PMT code, damaged PMT sections could make CCExtractor
0.76 (2015-03-28)
- Added basic M2TS support
- Added EPG support - you can now export the Program Guide to XML
- Some bug fixes
0.75 (2015-01-15)
- Fixed issue with teletext to formats other than srt.
- CCExtractor can be used as library if compiled using cmake
- By default the Windows version adds BOM to generated UTF files (this is
because it's needed to open the files correctly) while all other
builds don't add it (because it messes with text processing tools).
You can use -bom and -nobom to change the behaviour.
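The -bom / -nobom behaviour above comes down to whether the three-byte UTF-8
byte order mark is written before the text. A minimal sketch of that choice,
assuming UTF-8 output; the function name encode_output is hypothetical and
this is not CCExtractor's own code:

```python
import codecs


def encode_output(text: str, bom: bool) -> bytes:
    """Encode text as UTF-8, optionally prefixed with the UTF-8 BOM (EF BB BF)."""
    payload = text.encode("utf-8")
    # codecs.BOM_UTF8 is the byte sequence b"\xef\xbb\xbf".
    return codecs.BOM_UTF8 + payload if bom else payload


if __name__ == "__main__":
    print(encode_output("CAPTION", bom=True)[:3])   # first three bytes are the BOM
    print(encode_output("CAPTION", bom=False)[:3])  # plain text, no BOM
```

Text processing tools that treat the file as raw bytes see the BOM as three
extra characters at the start of the first line, which is why the non-Windows
builds leave it out by default.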
0.74 (2014-09-24)
- Fixed issue with -o1 -o2 and -12 parameters (where it would write output only in the o2 file)
- Fixed UCLA parameter issue. Now the UCLA parameter settings can't be overwritten anymore by later parameters that affect the custom transcript
- Switched order around for TLT and TT page number in custom transcript to match UCLA settings
- Added nobom parameter, for when files are processed by tools that can't handle the BOM. If using this, files might be not readable under windows.
- Segfault fix when no input files were given
- No more bin output when sending to server + possibility to send TT to server for processing
- Windows: Added the Microsoft redistributable MSVCR120.DLL to both the installation package and the application zip.
0.73 - GSOC (2014-08-19)
- Added support of BIN format for Teletext
- Added start of librarization. This will allow in the future for other programs to use encoder/decoder functions and more.
0.72 - GSOC (2014-08-12)
- Fix for WTV files with incorrect timing
- Added support for fps change using data from AVC video track in a H264 TS file.
- Added FFmpeg support to enable all encapsulators and decoders provided by FFmpeg
0.71 - GSOC (2014-07-31)
- Added feature to receive captions in BIN format according to CCExtractor's own
protocol over TCP (-tcp port [-tcppassword password])
- Added ability to send captions to the server described above or to the
online repository (-sendto host[:port])
- Added -stdin parameter for reading input stream from standard input
- Compilation in Cygwin using linux/Makefile
- Fix for .bin files when not using latin1 charset
- Correction of mp4 timing, when one timestamp points to the timing of two atoms
0.70 - GSOC (2014-07-06)
This is the first release that is part of Google's Summer of Code.
Anshul, Ruslan and Willem joined CCExtractor to work on a number of things
over the summer, and their work is already reaching the mainstream
version of CCExtractor.
- Added a huge dictionary submitted by Matt Stockard.
- Added DVB subtitles decoder, spupng in output
- Added support for cdt2 media atoms in QT video files. Now multiple atoms in
a single sample sequence are supported.
- Changed Makefile.
- Fixed some bugs.
- Added feature to print info about file's subtitles and streams (-out=report).
- Support Long PMT.
- Support Configuration file.
- There is a sample configuration file in doc/ folder with name
- For now, only a file named ccextractor.cnf kept beside the ccextractor
executable is supported
- For details of which options can be set using the configuration file,
please look at the sample file.
- Added options for custom transcript output:
new parameter (-customtxt format), where the format must be like this: 1100100 (7 digits).
The digits indicate whether each of the following is displayed in the (timed) transcript:
- Display start time
- Display end time
- Display caption mode
- Display caption channel
- Use a relative timestamp (relative to the sample)
- Display XDS info
- Use colors
0000101 is the default setting for transcripts
1110101 is the default for timed transcripts
1111001 is the default setting for -ucla
Make sure you use this parameter after others that might affect these
settings (-out, -ucla, -xds, -txt, -ttxt, ...)
- Fixed Negative timing Bug
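The -customtxt format string described in this release can be read as seven
on/off flags. A minimal sketch, assuming the digits map to the listed items in
the order given; the names CustomTxt and parse_customtxt are hypothetical and
this is not CCExtractor's own parser:

```python
from typing import NamedTuple


class CustomTxt(NamedTuple):
    """One flag per digit, in the order the changelog lists them (assumed)."""
    start_time: bool
    end_time: bool
    caption_mode: bool
    caption_channel: bool
    relative_timestamp: bool
    xds_info: bool
    colors: bool


def parse_customtxt(fmt: str) -> CustomTxt:
    """Turn a string such as '1100100' into named boolean flags."""
    if len(fmt) != 7 or any(c not in "01" for c in fmt):
        raise ValueError("expected exactly 7 digits, each 0 or 1")
    return CustomTxt(*(c == "1" for c in fmt))


if __name__ == "__main__":
    # Defaults quoted in the changelog entry.
    for name, fmt in (("transcript", "0000101"),
                      ("timed transcript", "1110101"),
                      ("-ucla", "1111001")):
        print(name, parse_customtxt(fmt))
```

Under that assumed ordering, the example value 1100100 turns on the start
time, the end time and the relative timestamp, and leaves everything else off.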
0.69 (2014-04-05)
- A few patches from Christopher Small, including proper support
for multiple multicast clients listening on the same port.
- GUI: Fixed teletext preview.
- GUI: Added a small indicator of data being received when reading from
- GUI: Added UTF-8 support to preview Window (used for teletext).
- Fixes in Makefile and build script, compilation in linux and OSX failed
if another libpng was found in the system.
- WTV support directly in CCExtractor (no need for wtvccdump any more).
- Started refactoring and clean-up.
- Fix: MPEG clock rollover (happens each 26 hours) caused a time
- Windows GUI: Started work on HDHomeRun support. For now it just looks
for HDHomeRun devices. Lots of other things will arrive in the next
- Windows GUI: Some code refactoring, since the HDHomeRun support makes
the code large enough to require more than one source file :-)
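On the MPEG clock rollover mentioned in this release: MPEG presentation
timestamps are 33-bit counters running at 90 kHz, so they wrap roughly every
26.5 hours. A minimal sketch of the arithmetic and of one common way to
compensate; the function names are hypothetical and this is not necessarily
how CCExtractor handles it:

```python
PTS_MODULUS = 1 << 33   # PTS is a 33-bit counter
PTS_HZ = 90_000         # PTS ticks per second


def rollover_hours() -> float:
    """Hours until a 33-bit, 90 kHz counter wraps around."""
    return PTS_MODULUS / PTS_HZ / 3600


def unwrap_pts(raw_values):
    """Return monotonically extended timestamps from wrapped raw PTS values.

    A drop of more than half the counter range between consecutive values
    is treated as a rollover rather than a genuine backwards jump.
    """
    unwrapped = []
    offset = 0
    prev = None
    for raw in raw_values:
        if prev is not None and prev - raw > PTS_MODULUS // 2:
            offset += PTS_MODULUS
        unwrapped.append(raw + offset)
        prev = raw
    return unwrapped


if __name__ == "__main__":
    print(round(rollover_hours(), 2))
    print(unwrap_pts([PTS_MODULUS - 10, 5, 100]))
```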
0.68 (2013-12-24)
- A couple of shared variables between 608 decoders were causing
problems when both fields were processed at the same time with
-12, fixed.
- Added BOM for UTF-8 files.
- Corrected a few extended characters in the UTF-8 encoding,
probably never used in real world captioning but since we got
a good test sample file...
- Color and fonts in PAC commands were ignored, fixed (Heleen Buus).
- Added a new output format, spupng. It consists of one .png file
for each subtitle frame and one .xml with all the timing
(Heleen Buus).
- Some fixes (Chris Small).
0.67 (2013-10-09)
- Padding bytes were being discarded early in the process in 0.66,
which is convenient for debugging, but it messes with timing in
.raw, which depends on padding. Fixed.
- MythTV's branch had a fixed size buffer that was sometimes too
small. Made dynamic.
- Better support for PAT changing mid-stream.
- Removed quotes in Start in .smi (format fix).
- Added multicast support (Chris Small)
- Added ability to select IP address to bind in UDP (Chris Small)
- Fixes in -unixts and -delay for teletext.
- Added -autodash : When two people are talking, add a dash as
needed (this is based on subtitle position). Only in .srt and
with -trim. Quite experimental, feedback appreciated.
- Added -latin1 to select Latin 1 as encoding. Default is now
UTF-8 (-utf8 still exists but it's not needed).
- Added -ru1, which emulates a (non-existing in real life) 1 line
roll-up mode.
0.66 (2013-07-01)
- Fixed bug in auto detection code that triggered a message
about file being out of sync.
- Added -investigate_packets
The PMT is used to select the most promising elementary stream
to get captions from. Sometimes captions are where you least
expect it so -datapid allows you to select an elementary stream
manually, in case the CC location is not obvious from the PMT
contents. To assist looking for the right stream, the parameter
"-investigate_packets" will have CCExtractor look inside each
stream, looking for CC markers, and report the streams that
are likely to contain CC data even if it can't be determined from
their PMT entry.
- Added -datastreamtype to manually select a stream based on
its type instead of its PID. Useful if your recording program
always hides the caption under the stream type.
- Added -streamtype so if an elementary stream is selected manually
for processing, the streamtype can be selected too. This can be
needed if you process, for example a stream that is declared as
"private MPEG" in the PMT, so CCExtractor can't tell what it is.
Usually you'll want -streamtype 2 (MPEG video) or -streamtype 6
(MPEG private data).
- PMT content listing improved, it now shows the stream type for
more types.
- Fixes in roll-up, cursor was being moved to column 1 if a
RU2, RU3 or RU4 was received even if already in roll-up mode.
- Added -autoprogram. If a multiprogram TS is processed and
-autoprogram is used, CCExtractor will analyze all PMTs and use
the first program that has a suitable data stream.
- Timed transcript (ttxt) now also exports the caption mode
(roll-up, paint-on, etc.) next to each line, as it's useful to
detect things like commercials.
- Content Advisory information from XDS is now decoded if it's
transmitted in "US TV parental guidelines" or "MPA".
Other encoding such as Canada's are not supported yet due
to lack of samples.
- Copy Management information from XDS is now decoded.
- Added -xds. If present and export format is timed transcript
(only), XDS information will be saved to file (same file as the
transcript, with XDS being clearly marked). Note that for now
all XDS data is exported even if it doesn't change, so the
transcript file will be significantly larger.
- Added some PaintOn support, at least enough to prevent it
from breaking things when the other modes are used.
- Removed afd_data() warning. AFD doesn't carry any caption related
data. AFD still detected in code in case we want to do something
with it later anyway.
- Ported last changes from Petr Kutalek's telxcc. Current version
is 2.4.4.
- In teletext mode when exporting to transcript (not .srt), an effort
is made to detect and merge line duplicates. This is done by using
the Levenshtein distance, which is the number of changes required
to convert one string to another. To simplify things, strings are
compared up to the length of the shortest one.
There are 3 parameters that can be used to tweak the thresholds:
-deblev: Enable debug so the calculated distance for each two
strings is displayed. The output includes both strings, the
calculated distance, the maximum allowed distance, and whether
the strings are ultimately considered equivalent or not, i.e.
the calculated distance is less than or equal to the max allowed.
-levdistmincnt value: Minimum distance we always allow
regardless of the length of the strings. Default 2. This means
that if the calculated distance is 0, 1 or 2, we consider the
strings to be equivalent.
-levdistmaxpct value: Maximum distance we allow, as a
percentage of the shortest string length. Default 10%. For
example, consider a comparison of one string of 30 characters
and one of 60 characters. We want to determine whether the
first 30 characters of the longer string are more or less the
same as the shortest string, i.e. whether the longest string
is the shortest one plus new characters and maybe some
corrections. Since the shortest string is 30 characters and
the default percentage is 10%, we would allow a distance of
up to 3 between the first 30 characters.
- Added -lf : Use UNIX line terminator (LF) instead of Windows (CRLF).
- Added -noautotimeref: Prevent UTC reference from being auto set from
the stream data.
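The teletext duplicate-line check described in this release (-levdistmincnt
and -levdistmaxpct) can be sketched as follows. This assumes the allowed
distance is the larger of the two thresholds and that the percentage is
rounded down; the function names are hypothetical and this is not
CCExtractor's own implementation:

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # delete from a
                           cur[j - 1] + 1,              # insert into a
                           prev[j - 1] + (ca != cb)))   # substitute or match
        prev = cur
    return prev[-1]


def lines_equivalent(s1: str, s2: str, min_cnt: int = 2, max_pct: int = 10) -> bool:
    """Compare both strings up to the length of the shorter one."""
    short_len = min(len(s1), len(s2))
    distance = levenshtein(s1[:short_len], s2[:short_len])
    # Assumption: the effective limit is whichever threshold is larger.
    allowed = max(min_cnt, short_len * max_pct // 100)
    return distance <= allowed


if __name__ == "__main__":
    short = "A" * 30
    # 30 characters at 10% gives an allowed distance of 3, as in the example above.
    print(lines_equivalent(short, "BBB" + "A" * 27 + "X" * 30))   # 3 changes
    print(lines_equivalent(short, "BBBB" + "A" * 26 + "X" * 30))  # 4 changes
```

With the defaults, the first comparison is within the limit and the second is
not, matching the 30 versus 60 character example in the entry.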
0.65 (2013-03-14)
- Minor GUI changes for teletext
- Added end timestamps in timed transcripts
- Added support for SMPTE (patch by John Kemp)
- Initial support for MPEG2 video tracks inside MP4 files (thanks a
lot to GPAC's Jean who assisted in analyzing the sample and
doing the required changes in GPAC).
- Improved MP4 auto detection
- Support for PCR if PTS is not available (needed for some teletext
samples, and probably useful for everything else).
- Support for UDP streaming - finally. Use "-udp $port" to have
CCExtractor listen for a stream. I've only been able to test it
with a European HDHomeRun, but it should work fine with any other
- Refactored PMT / PAT processing in transport streams, it now allows
displaying their contents (-parsePAT and -parsePMT) which makes
troubleshooting easier.
0.64 (2012-10-29)
- Changed Window GUI size (larger).
- Added Teletext options to GUI.
- Added -teletext to force teletext mode even if not detected
- Added -noteletext to disable teletext detection. This can be needed
for streams that have both 608 data and teletext packets if you
need to process the 608 data (if teletext is detected it will
take precedence otherwise).
- Added -datapid to force a specific elementary stream to be used for
data (bypassing detections).
- Added -ru2 and -ru3 to limit the number of visible lines in roll-up
captions (bypassing whatever the broadcast says).
- Added support for a .hex (hexadecimal) dump of data.
- Added support for wtv in Windows. This is done by using a new program
(wtvccdump.exe) and a new DirectShow filter (CCExtractorDump.dll) that
process the .wtv using DirectShow's filters and export the line 21 data
to a .hex file. The GUI calls wtvccdump.exe as needed.
- Added --nogoptime to force PTS timing even when CCExtractor would
use GOP timing otherwise.
0.63 (2012-08-17)
- Teletext support added, by integrating Petr Kutalek's telxcc. Integration is
still quite basic (there's equivalent code from both CCExtractor and
telxcc) and some clean up is needed, but it works. Petr has announced that
he's abandoning telxcc so further development will happen directly in
- Some bug fixes, as usual.
0.62 (2012-05-23)
- Corrected Mac build "script" (needed to add GPAC includes). Thanks to the
Mac users that sent this.
- Hauppauge mode now uses PES timing, needed for files that don't have
caption data during all the video (such as in commercial breaks).
- Added -mp4 and -in:mp4 to force the input to be processed as MP4.
- CC608 data embedded in a separate stream (as opposed to in the video
stream itself) in MP4 files is now supported (not heavily tested).
This should be rather useful since closed captioned files from iTunes
use this format.
- More CEA-708 work. The debugger is now able to dump the "TV" contents for
the first time. Also, a .srt can be generated, however timing is not quite
good yet (still need to figure out why).
- Added -svc (or --service) to select the CEA-708 services to be processed.
For example, -svc 1,2 will process the primary and secondary language
services. Valid values are 1-63, where 1 is the primary language, 2 is
the secondary language (this is part of the specification) and 3-63 are
provider defined.
- Rajesh Hingorani sent a fix for the MPEG decoder that fixes garbled output
for certain samples (we had none like this in our test collection). Thanks,
0.61 (2012-03-08)
- Fix: GCC 3.4.4 can now build CCExtractor.
- Fix: Damaged TS packets (those that come with 'error in transport' bit
on) are now skipped.
- Fix: Part of the changes for MP4 support (CC packets buffering in
particular) broke some stuff for other files, causing at least very
annoying character duplication. We hope we've fixed it without breaking
anything, but please report any problems.
- Some non-interesting cleanup.
0.60 (unreleased)
- Add: MP4 support, using GPAC (a media library). Integration is currently
"enough so it works", but needs some more work. There's some duplicate
code, the stream must be a file (no streaming), etc.
- Fix: The Windows version was writing text files with double \r.
- Fix: Closed captions blocks with no data could cause a crash.
- Fix: -noru (to generate files without duplicate lines in
roll-up) was broken, with complete lines being missing.
- Fix: bin format not working as input.
0.59 (2011-10-07)
- More AVC/H.264 work. pic_order_cnt_type != 0 will be processed now.
- Fix: Roll-up captions with interruptions for Text (with ResumeTextDisplay
in the middle of the caption data) were missing complete lines.
- Added a timed text transcript output format, probably only useful for
roll-up captions. Use --timedtranscript or -ttxt. Output is like this:
00:01:30,623 | PRIMARY VOTERS.
- XDS parser. Not complete (no point in dealing with V-Chip stuff for
example), but enough to extract program and station information.
- Input streams can now come from standard input using - (just a hyphen)
as parameter.
- Added a new output format called 'null' (use -null or -out=null). This
format means "Don't produce any file", and is useful to have CCExtractor
process the stream (for XDS messages, debugging, etc) without actually
generating anything.
- Updated Windows GUI.
- Added -quiet => If used, CCExtractor will not write any message.
- Added -stdout => If used, the captions will be sent to stdout (console)
instead of file. Combined with -, CCExtractor can work as a filter in
a larger process, receiving the stream from stdin and sending the
captions to stdout.
- Some code clean up, minor refactoring.
- Teletext detection (not yet processing).
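The timed transcript sample shown above (00:01:30,623 | PRIMARY VOTERS.) can be read back with a few lines of Python. This is a minimal sketch that assumes every line follows exactly that "time | text" layout, which is inferred from the single sample and not from a format specification; parse_ttxt_line is a hypothetical name.

```python
import re

# HH:MM:SS,mmm followed by a pipe and the caption text (layout assumed
# from the one sample line in the changelog).
LINE_RE = re.compile(r"^(\d{2}):(\d{2}):(\d{2}),(\d{3})\s*\|\s*(.*)$")


def parse_ttxt_line(line):
    """Return (milliseconds, text) for one transcript line, or None if it does not match."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    hours, minutes, seconds, millis = (int(g) for g in match.groups()[:4])
    total_ms = ((hours * 60 + minutes) * 60 + seconds) * 1000 + millis
    return total_ms, match.group(5)


if __name__ == "__main__":
    print(parse_ttxt_line("00:01:30,623 | PRIMARY VOTERS."))
```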
0.58 (2011-08-21)
- Implemented new PTS based mode to order the caption information
of AVC/H.264 data streams. The old pic_order_cnt_lsb based method
is still available via the -poc or --usepicorder command switches.
- Removed a couple of those annoying "Impossible!" error messages
that appear when processing some (possibly broken, unsure) files.
- Added -nots --notypesettings to prevent italics and underline
codes from being displayed.
- Note to those not liking the paragraph symbol being used for the
music note: Submit a VALID replacement in latin-1.
- Added preliminary support for multiple program TS files. The
parameter --program-number (or -pn) will let you choose which
program number to process. If no number is passed and the TS
file contains more than one, CCExtractor will display a list of
found programs and terminate.
- Added support (basic, because I only received one sample) for some
Hauppauge cards that save CC data in their own format. Use the
parameter -haup to enable it (CCExtractor will display a notice
if it thinks that it's processing a Hauppauge capture anyway).
- Fixed bug in roll-up.
- More AVC work: TS files from EchoStar that produced garbled
output are now processed OK.
- Updated Windows GUI.
0.57 (2010-12-16)
- Bug fixes in the Windows version. Some debug code was unintentionally
left in the released version.
0.56 (2010-12-09)
- H264 support
- Other minor changes a lot less important
0.55 (2009-08-09)
- Replace pattern matching code with improved parser for MPEG-2 elementary
- Fix parsing of ReplayTV 5000 captions.
- Add ability to decode SCTE 20 encoded captions.
- Make decoding of TS files more error tolerant.
- Start implementation of EIA-708 decoding (not active yet).
- Add -gt / --goptime switch to use GOP timing instead of PTS timing.
- Start implementation of AVC/H.264 decoding (not active yet).
- Fixed: When 24 fps film is converted to 30 fps NTSC, every 4th frame is repeated.
Some pictures carry 3 fields of CC data, with the field 3 CC data belonging to the
same channel as field 1. The following pictures have the fields reversed because of
the odd number of fields. I used top_field_first to tell when the channels
are reversed. See Table 6-1 of SCTE 20 [Paul Fernquist]
0.54 (2009-04-16)
- Add -nosync and -fullbin switches for debugging purposes.
- Remove -lg (--largegops) switch.
- Improve synchronization of captions for source files with
jumps in their time information or gaps in the caption
- [R. Abarca] Changed Mac script, it now compiles/links
everything from the /src directory.
- It's now possible to have CCExtractor add credits
- Added a feature to add start and end messages (for credits).
See help screen for details.
0.53 (2009-02-24)
- Force generated RCWT files to have the same length as source file.
- Fix documentation for -startat / -endat switches.
- Make -startat / -endat work with all output formats.
- Fix sync check for raw/rcwt files.
- Improve timing of dvr-ms NTSC captions.
- Add -in=bin switch to read CCExtractor's own binary format.
- Fix problem with short input files (smaller than 1 MB).
- Clean up regular and debug output.
- Add -out=bin switch to write RCWT data.
- Remove -bo/--bufferoutput switch and functionality.
- [Volker] Added new generic binary format (RCWT
for Raw Captions With Time). This new format
allows one file to contain all the available
closed caption data instead of just one stream.
- Added --no_progress_bar to disable status
information (mostly used when debugging, as the
progress information is annoying in the middle
of debug logs).
- The Windows GUI was reported to freeze in some
conditions. Fixed.
- The Windows GUI is now targeted for .NET 2.0
instead of 3.5. This allows Windows 2000 to run
it (there's no .NET 3.5 for Windows 2000), as
requested by a couple of key users.
0.51 (unreleased)
- Removed -autopad and -goppad, no longer needed.
- In preparation to a new binary format we have
renamed the current .bin to .raw. Raw files
have only CC data (with no header, timing, etc.).
- The input file format (when forced) is now
specified with
such as -in=ts, -in=raw, -in=ps ...
The old switches (-ts, -ps, etc.) still work.
The only exception is -bin which has been removed
(reserved for the new binary format). Use
-in=raw to process a raw file.
- Removed -d, which made raw output use a DVD
format. This has been merged into a new
output type "dvdraw". So now instead of using
-raw -d as before, use -out=dvdraw if you need
- Removed --noff
- Added gui_mode_reports for frontend communications,
see related file.
- Windows GUI rewritten. Source code now included,
- [Volker] Dish Network clean-up
0.50 (2008-12-12)
- [Volker] Fix in DVR-MS NTSC timing
- [Volker] More clean-up
- Minor fixes
0.49 (2008-12-10)
- [Volker] Major MPEG parser rework. Code much
cleaner now.
- Some stations transmit broken roll-up captions,
and for some reason don't send CRs but RUs...
Added work-around code to make captions readable.
- Started work on EIA-708 (DTV). Right now you can
add -debug-708 to get a dump of the 708 data.
An actually useful decoder will come soon.
- Some of the changes MIGHT HAVE BROKEN MythTV's
code. I don't use MythTV myself so I rely on
other people's samples and reports. If MythTV
is broken please let me know.
- Added new debug options.
- [Volker] Added support for DVR-MS NTSC files.
- Other minor bug fixes and changes.
0.46 (2008-11-24)
- Added support for live streaming, CCExtractor
can now process files that are being recorded
at the same time.
- [Volker] Added a new DVR-MS loop - this is
completely new, DVR-MS specific code, so we no
longer use the generic MPEG code for DVR-MS.
DVR-MS should (or will be eventually at least)
be as reliable as TS.
Note: For now, it's only ATSC recordings, not
NTSC (analog) recordings.
0.45 (2008-11-14)
- Added auto-detection of DVR-MS files.
- Added -asf to force DVR-MS mode.
- Added some specific support for DVR-MS
files. This format used to work
correctly in 0.34 (pure luck) but the
MPEG code rework broke it. It should
work as it used to.
- Updated Windows GUI to support the
new options.
- Added -lg --largegops
From the help screen:
Each Group-of-Picture comes with timing
information. When this info is too separate
(for example because there are a lot of
frames in a GOP) ccextractor may prefer not
to use GOP timing. Use this option if you
need ccextractor to use GOP timing in large
0.44 (2008-09-10)
- Added an option to the GUI to process
individual files in batch, i.e. call
ccextractor once per file. Use it if you
want to process several unrelated files
in one go.
- Added an option to prevent duplicate
lines in roll-up captions.
- Several minor bug fixes.
- Updated the GUI to add the new options.
0.43 (2008-06-20)
- Fixed a bug in the read loop (no less)
that caused some files to fail when
reading without buffering (which is
the default in the Linux build).
- Several improvements in the GUI, such as
saving current options as default.
0.42 (2008-06-17)
- The option switch "-transcript" has been
changed to "--transcript". Also, "-txt"
has been added as the short alias.
- Windows GUI
- Updated help screen
0.41 (2008-06-15)
- Default output is now .srt instead of .bin,
use -raw if you need the data dump instead of
- Added -trim, which removes blank spaces at
the left and right of each line in .srt.
Note that those spaces are there to help
deaf people know if the person talking is
at the left or the right of the screen, i.e.
they aren't useless. But if they annoy
you, go ahead...
0.40 (2008-05-20)
- Fixed a bug in the sanity check function
that caused the Myth branch to abort.
- Fixed the OSX build script, it needed a
new #define to work.
0.39 (2008-05-11)
- Added a -transcript. If used, the output will
have no time information. Also, if in roll-up
mode there will be no repeated lines.
- Lots of changes in the MPEG parser, most of
them submitted by Volker Quetschke.
- Fixed a bug in the CC decoder that could cause
the first line not to be cleared in roll-up
- CCExtractor can now follow number sequences in
file names, by suffixing the name with +.
For example,
means DVD0001.VOB, DVD0002.VOB, etc. This works
for all files, so part001.ts+ does what you
could expect.
- Added -90090 which changes the clock frequency
from the MPEG standard 90000 to 90090. It
*could* (remains to be seen) help if there are
timing issues.
- Better support for Tivo files.
- By default ccextractor now considers the whole
input file list as one large file, instead of
several, independent, video files. This has
been changed because most programs (for example
DVDDecrypt) just cut the files by size.
If you need the old behaviour (because you
actually edited the video files and want to
join the subs), use -ve.
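The number-sequence feature above (DVD0001.VOB, DVD0002.VOB, ...) amounts to incrementing a zero-padded counter in the file name. The sketch below is illustrative only: it assumes the counter is the last run of digits before the extension, which the changelog does not state, and next_in_sequence is a hypothetical name.

```python
import os
import re


def next_in_sequence(name):
    """Return the next file name in a numbered sequence, or None if there is no number.

    Assumption: the counter is the last run of digits in the part of the
    name that precedes the extension; zero padding is preserved.
    """
    stem, ext = os.path.splitext(name)
    runs = list(re.finditer(r"\d+", stem))
    if not runs:
        return None
    last = runs[-1]
    digits = last.group()
    bumped = str(int(digits) + 1).zfill(len(digits))
    return stem[:last.start()] + bumped + stem[last.end():] + ext


if __name__ == "__main__":
    print(next_in_sequence("DVD0001.VOB"))
    print(next_in_sequence("part001.ts"))
```

A caller would keep asking for the next name until the file no longer exists on disk.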
0.36 (unreleased)
- Fixed bug in SMI, nbsp was missing a ;.
- Footer for SAMI files was incorrect (<body> and
<sami> tags were being opened again instead of
being closed).
- Displayed memory is now written to disk at end
of stream even if there is no command requesting
so (may prevent losing the last screen-full).
- Important change that could break scripts, but
that has been made because the old behaviour was
annoying to most people: _1 and _2 at the end
of the output file names are now added ONLY if
-12 is used (i.e. when there are two output
files to produce). So
ccextractor -srt sopranos.mpg
now produces instead of
If you use -12, i.e.
ccextractor -srt -12 sopranos.mpg
You get and
as usual.
0.35 (unreleased)
- Added --defaultcolor to the help screen. Code
was already in 0.34 but the documentation wasn't
- Buffer is larger now, since I've found a sample
where 256 Kb isn't enough for a PES (go figure).
- At the end of the process, a ratio between
video length and time to process is displayed.
0.34 (2007-06-03)
- Added some basic letter case and capitalization
support. For captions that broadcast in ALL
UPPERCASE (most of them), ccextractor can now
do the first part of the job.
--sentencecap or -sc will tell ccextractor to
follow the typical capitalization rules, such
as capitalize months, days of week, etc.
So from
You get
You better respect
this robe, alan.
--capfile or -caf also enables the case
processing part and adds an extra list of
words in the specified file, for example:
--capfile names.txt
where names.txt is just a plain text file
with the proper spelling for some words,
such as
So you get
You better respect
this robe, Alan.
Which is the correct spelling. You can
have a different spelling file per TV
show, or a large file with a lot of
words, etc.
- ccextractor has been reported to
compile and run on Mac with a minor
change in the build script, so I've
created a mac directory with the
modified script. I haven't tested it
- Windows build comes with a File Version
Number ( in this version) in case
you want to check for version info.
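The capitalization feature described above can be approximated in a few lines. This is only a sketch of the idea, not ccextractor's actual rules: it lowercases the text, capitalizes the start of each sentence, and then restores the spelling of any word given in a list (the role --capfile plays). The uppercase input and the one-word list ["Alan"] are assumptions reconstructed from the output shown in the entry, and sentence_case is a hypothetical name.

```python
import re


def sentence_case(text, proper_words=()):
    """Convert ALL UPPERCASE caption text to sentence case (illustrative only)."""
    lowered = text.lower()
    # Capitalize the first letter of the text and the first letter after . ! ?
    result = re.sub(
        r"(^|[.!?]\s+)([a-z])",
        lambda m: m.group(1) + m.group(2).upper(),
        lowered,
    )
    # Restore the spelling of each listed word, matching case-insensitively.
    for word in proper_words:
        pattern = r"\b" + re.escape(word) + r"\b"
        result = re.sub(pattern, lambda m, w=word: w, result, flags=re.IGNORECASE)
    return result


if __name__ == "__main__":
    caption = "YOU BETTER RESPECT\nTHIS ROBE, ALAN."
    print(sentence_case(caption))
    print(sentence_case(caption, proper_words=["Alan"]))
```

Month and day names could be passed in the same list; a real implementation needs more care, since a word such as "may" is not always a month.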
0.33 (unreleased)
- Added -scr or --screenfuls, to select the
number of screenfuls ccextractor should
write before exiting. A screenful is
a change of screen contents caused by
a CC command (not new characters). In
practice, this means that for .srt each
group of lines is a screenful, except when
using -dru (which produces a lot of
groups of lines because each new character
produces a new group).
- Completed tables for all encodings.
- Fixed bug in .srt related to milliseconds
in time lines.
- Font colors are back for .srt (apparently
some programs do support them after all).
Use -nofc or --nofontcolor if you don't
want these tags.
0.32 (unreleased)
- Added -delay ms, which adds (or subtracts)
a number of milliseconds to all times in
.srt/.sami files. For example,
-delay 400
causes all subtitles to appear 400 ms later
than they would normally do, and
-delay -400
causes all subtitles to appear 400 ms before
they would normally do.
- Added -startat and -endat, which let you
select just a portion of data to be processed,
such as from minute 3 to minute 5. Check
help screen for exact syntax.
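The effect of -delay on a .srt time line can be sketched as follows. This assumes the usual "HH:MM:SS,mmm --> HH:MM:SS,mmm" time line layout of .srt files and clamps negative results to zero; the clamping is an assumption of this sketch, since the entry above does not say what happens when a negative delay is larger than the timestamp. All function names are hypothetical.

```python
def to_ms(timestamp):
    """Convert "HH:MM:SS,mmm" to milliseconds."""
    hms, millis = timestamp.split(",")
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return ((hours * 60 + minutes) * 60 + seconds) * 1000 + int(millis)


def from_ms(total_ms):
    """Convert milliseconds back to "HH:MM:SS,mmm" (negative values clamp to zero)."""
    total_ms = max(total_ms, 0)
    seconds, millis = divmod(total_ms, 1000)
    minutes, seconds = divmod(seconds, 60)
    hours, minutes = divmod(minutes, 60)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d},{millis:03d}"


def shift_time_line(line, delay_ms):
    """Shift both ends of an .srt time line by delay_ms (may be negative)."""
    start, end = line.strip().split(" --> ")
    return f"{from_ms(to_ms(start) + delay_ms)} --> {from_ms(to_ms(end) + delay_ms)}"


if __name__ == "__main__":
    print(shift_time_line("00:01:30,623 --> 00:01:32,000", 400))
    print(shift_time_line("00:01:30,623 --> 00:01:32,000", -400))
```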
0.31 (unreleased)
- Added -dru (direct rollup), which causes
roll-up captions to be written as
they would on TV instead of line by line.
This makes .srt/.sami files a lot longer,
and ugly too (each line is written many
times, two characters at a time).
0.30 (2007-05-24)
- Fix in extended char decoding, I wasn't
replacing the previous char.
- When a sequence code was found before
having a PTS, reported time was
0.29 (unreleased)
- Minor bug fix.
0.28 (unreleased)
- Fixed a buffering related issue. Short version,
the first 2 Mb in non-TS mode were being
- .srt no longer has <font> tags. No player
seems to process them so my guess is that
they are not part of the .srt "standard"
even if McPoodle adds them.
0.27 (unreleased)
- Modified sanitizing code, it's less aggressive
now. Ideally it should mean that characters
won't be missed anymore. We'll see.
0.26 (unreleased)
- Added -gp (or -goppad) to make ccextractor use
GOP timing. Try it for non TS files where
subs start OK but desync as the video advances.
0.25 (unreleased)
- Format detection is not perfect yet. I've added
-nomyth to prevent the MythTV code path from being
called. I've seen apparently correct files that
make MythTV's MPEG decoder choke. So, if it
doesn't work correctly automatically: Try
-nomyth and -myth. Hopefully one of the two
options will work.
0.24 (unreleased)
- Fixed a bug that caused dvr-ms (Windows Media Center)
files to be incorrectly processed (letters out of
order all the time).
- Reworked input buffer code, faster now.
- Completed MythTV's MPEG decoder for Program Streams,
which results in better processing of some specific
- Automatic file format detection for all kinds of
files and closed caption storage method. No need to
tell ccextractor anything about your file (but you
still can).
0.22 (2007-05-15)
- Added text mode handling to the decoder, which gets rid
of junk when text mode data is present.
- Added support for certain (possibly non-standard-compliant)
DVDs that add more caption blocks in a
user data block than they should (such as Red October).
- Fix in roll-up init code that caused the previous popup
captions not to be written to disk.
- Other minor bug fixes.
0.20 (2007-05-07)
- Unicode should be decent now.
- Added support for Hauppauge PVR 250 cards, and (possibly)
many others (bttv) with the same closed caption recording
This is the result of hacking MythTV's MPEG parser into
CCExtractor. Integration is not very good (to put it
mildly) but it seems to work. Depending on the feedback I
may continue working on this or just leave it 'as is'
(good enough).
If you want to process a file generated by one of these
analog cards, use -myth. This is essential as it will
make the program take a totally different code path.
- Added .SAMI generation. I'm sure this can be improved,
though. If you have a good CSS for .SAMI files let me
0.19 (2007-05-03)
- Work on Dish Network streams, timing was completely broken.
It's fixed now at least for the samples I have, if it's not
completely fixed let me know. Credit for this goes to
Jack Ha who sent me a couple of samples and a first
implementation of a semi-working fix.
- Added support for several input files (see help screen for
- Added Unicode and Latin-1 encoding.
0.17 (2007-04-29)
- Extraction to .srt is almost complete - works correctly for
pop-up and roll-up captions, possibly not yet for paint-on
(mostly because I don't have any sample with paint-on captions
so I can't test).
- Minor bug fixes.
- Automatic TS/non-TS mode detection.
0.14 (2007-04-25)
- Work on handling special cases related to the MPEG reference
clock: Roll over, jumps, etc.
- Modified padding code a bit: In particular, padding occurs
on B-Frames now.
- Started work on CC data parsing (use -608 to see output).
- Added built-in input buffering.
- Major code reorganization.
- Added a decent progress indicator.
- Added TS header synchronization (so the input file no longer
needs to start with a TS header).
- Minor bug fixes.
0.07 (2007-04-19)
- Added MPEG reference clock parsing.
- Added auto padding in TS. Does miracles with timing.
- Added video information (as extracted from sequence header).
- Some code clean-up.
- FF sanity check enabled by default.