A proof-of-concept multi-threaded variable-blocksize flac encoder frontend using a custom libFLAC as a backend (https://github.com/chocolate42/flac)
Features:
- Multithreaded encoding with both variable and fixed blocking strategies
- Flac/wav/raw-CDDA input
- Optional preservation of input flac metadata
- Optional seektable generation
- Multiple analysis modes, different ways to choose variable blocksizes
- Optional merge/tweak passes that refine the frame permutation to make it more space-efficient
- Piping of input and output
A few critical features are still missing (such as full wav support; wav input is currently limited to 16-bit), so this is still alpha.
Usage: flaccid [options]
Note: There are two ways to define the compression settings used: either
the simple interface (--preset and optionally --preset-apod), or the
complex interface (numerous settings allowing full customisation)
Options:
[General]
--in infile : Source. Use - to specify piping from stdin. Valid extensions are
.wav for wav format, .flac for flac format, .bin for raw CDDA
--input-format format : Force input to be treated as a particular format.
Valid options are: flac wav cdda
--lax : Allow non-subset settings
--no-md5 : Disable MD5 generation
--no-seek : Disable seeking of the output stream, meaning the header cannot be
updated at the end of the encode. Requires --no-md5 to also be set, so
that the user acknowledges that disabling seek also disables MD5 (the
MD5 is stored in the header)
--out outfile : Destination. Use - to specify piping to stdout. By default,
piped output is cached entirely in RAM so that the header can be
updated before anything is written to the pipe. With --no-seek,
the output pipe instead writes frames as soon as they are
available
--peakset-window size : Maximum window size in millions of samples (default 26,
i.e. 26 million samples, ~10 minutes of 44.1 kHz input).
This can be set from either the simple or complex interface,
as it mainly controls RAM usage
--preserve-flac-metadata: Preserve metadata from flac input, excluding padding
--queue size : Number of frames in the output queue (default 8192). When the
queue is full it is flushed. Because tweak/merge act on the output
queue and output encoding is batched, multithreading is possible
even if the mode used is single-threaded. This can be set from
either the simple or complex interface, as it mainly controls RAM
usage
--seektable val : Defines if and how a seektable is generated:
-1 (default): Adapt to the input if its size is known (most
flac/wav input has total_sample_cnt in the header),
otherwise use 100 seekpoints
0 : No seektable
n : n seekpoints
--workers integer : The maximum number of threads to use
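The seektable options don't say how seekpoints are positioned. A minimal sketch, assuming n evenly spaced targets, each snapped back to the start of the last frame that begins at or before it: a FLAC seekpoint has to reference a frame boundary, which matters once blocksizes vary. `place_seekpoints` and its parameters are hypothetical, not flaccid's code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not flaccid's code). frame_start[] holds the
 * ascending first-sample index of every frame, starting at 0. Each of the
 * n targets is spaced evenly over total_samples and snapped back to the
 * start of the last frame beginning at or before it, because a FLAC
 * seekpoint must reference a frame boundary. Duplicate seekpoints (possible
 * when frames are longer than the spacing) are dropped. Returns the number
 * of seekpoints written to out[]. */
static size_t place_seekpoints(const uint64_t *frame_start, size_t frame_count,
                               uint64_t total_samples, size_t n, uint64_t *out)
{
    size_t written = 0, f = 0;

    assert(frame_count > 0 && frame_start[0] == 0);
    for (size_t i = 0; i < n; i++) {
        uint64_t target = total_samples * i / n;

        /* Move forward while the next frame still starts at or before target. */
        while (f + 1 < frame_count && frame_start[f + 1] <= target)
            f++;
        if (written == 0 || out[written - 1] != frame_start[f])
            out[written++] = frame_start[f];
    }
    return written;
}
```

For frames starting at 0, 1000, 5000 and 9000 in a 12000-sample stream, three seekpoints target samples 0, 4000 and 8000 and land on the frames starting at 0, 1000 and 5000.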
[Simple interface]
--preset num[extra] : A preset, optionally followed by extra flac settings
(supported settings e/l/m/p/q/r; see ./flac for details).
Presets 0..8 match those of ./flac and use a fixed
blocking strategy (the only caveat being that -M
adaptive mid-side is not supported by flaccid, so -1 and
-4 don't enable it). Presets 9 and up use variable
blocking strategies
--preset-apod apod : Apodization settings that overwrite those set by the preset.
Multiple apod options go in a single semicolon-delimited
string
[Complex interface]
[Complex flac settings]
--analysis-apod apod_string : Apodization settings to use during analysis. If
supplied this overwrites the apod settings
defined by the flac preset
--analysis-comp comp_string : Compression settings to use during analysis
--output-apod apod_string : Apodization settings to use during output. If
supplied this overwrites the apod settings
defined by the flac preset
--output-comp comp_string : Compression settings to use during output
--outperc num : 1-100%, how often the normal output settings are used (default 100%); otherwise the outputalt settings are used
--outputalt-apod apod_string : Alt apod settings to use if outperc not 100%.
If supplied this overwrites the apod settings
defined by the flac preset
--outputalt-comp comp_string : Alt output settings to use if outperc not 100%
[Complex flaccid settings]
--mode mode : Which variable-blocksize algorithm to use for analysis. Valid
modes: fixed, peakset, gasc, chunk, gset
--blocksize-list block,list : Blocksizes that a mode is allowed to use for
analysis. Different modes have different
constraints on valid combinations
--blocksize-limit-lower limit : Minimum blocksize a frame can be
--blocksize-limit-upper limit : Maximum blocksize a frame can be
--merge threshold : If set enables merge passes, iterates until a pass saves
less than threshold bytes
--tweak threshold : If set enables tweak passes, iterates until a pass saves
less than threshold bytes
Modes:
fixed: A fixed blocking strategy like the reference encoder. Must use exactly one
blocksize and cannot use tweak or merge passes; analysis settings are unused.
Effort O(1)
peakset: Find the optimal permutation of frames for a given blocksize list.
Truly optimal if the analysis settings are the same as the output settings.
Tweak/merge passes can still help, as they can use blocksizes
not on the list.
Effort O(blocksize_count^2) when the blocksizes are contiguous multiples
of the smallest blocksize (1x, 2x, ..., nx): with n = blocksize_count,
n*(n+1)/2 samples are encoded per input sample
gasc: To find the next frame, test larger and larger blocksizes until
efficiency drops, then pick the previous blocksize. Typically better than gset
chunk: Process the input as chunks. Each chunk is subdivided evenly by building
a tree in which the children of a node subdivide the parent's input range.
The root's range is the maximum blocksize in the list.
Effort O(blocksize_count)
gset: Test every blocksize in a set and greedily pick the most efficient
as the next frame
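One way to reach the optimum that peakset describes is a shortest-path search over frame boundaries. A minimal dynamic-programming sketch, assuming an ascending blocksize list of multiples of the smallest blocksize and a caller-supplied cost callback standing in for a real libFLAC encode. `frame_cost_fn`, `toy_cost` and `peakset_search` are hypothetical names, and the toy cost model is invented purely for illustration; this is not flaccid's implementation. Every unit position tries every blocksize, which is where the quadratic effort comes from.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Encoded size in bytes of the frame [start, start + len). A real encoder
 * would run libFLAC here; this is an assumed callback interface. */
typedef uint64_t (*frame_cost_fn)(uint64_t start, uint32_t len, void *ctx);

/* Toy cost model for illustration only: 10 bytes of frame overhead, 1 byte
 * per sample, plus 25 bytes if the frame straddles a "transient" at the
 * sample position pointed to by ctx. */
static uint64_t toy_cost(uint64_t start, uint32_t len, void *ctx)
{
    uint64_t t = *(const uint64_t *)ctx;
    return 10 + len + ((start < t && t < start + len) ? 25 : 0);
}

/* Shortest-path search over frame boundaries. Assumes blocksizes[] is
 * ascending, every entry is a multiple of blocksizes[0], and total is a
 * non-zero multiple of blocksizes[0]. best[u] is the cheapest encoding of
 * the first u units (one unit = the smallest blocksize). Writes the chosen
 * frame lengths in order to out[] and returns their count (0 on allocation
 * failure). */
static size_t peakset_search(const uint32_t *blocksizes, size_t blocksize_count,
                             uint64_t total, frame_cost_fn cost, void *ctx,
                             uint32_t *out)
{
    uint32_t unit = blocksizes[0];
    size_t units = (size_t)(total / unit), count = 0;
    uint64_t *best = malloc((units + 1) * sizeof *best);
    uint32_t *choice = malloc((units + 1) * sizeof *choice);

    assert(total > 0 && total % unit == 0);
    if (!best || !choice) {
        free(best);
        free(choice);
        return 0;
    }
    best[0] = 0;
    for (size_t u = 1; u <= units; u++) {
        best[u] = UINT64_MAX;
        /* Try every blocksize as the frame that ends at unit u. */
        for (size_t k = 0; k < blocksize_count; k++) {
            size_t span = blocksizes[k] / unit;
            if (span > u)
                break; /* ascending list: larger sizes will not fit either */
            uint64_t c = best[u - span] + cost((uint64_t)(u - span) * unit, blocksizes[k], ctx);
            if (c < best[u]) {
                best[u] = c;
                choice[u] = blocksizes[k];
            }
        }
    }
    /* Walk back from the end to recover the frames, then reverse them. */
    for (size_t u = units; u > 0; u -= choice[u] / unit)
        out[count++] = choice[u];
    for (size_t i = 0; i < count / 2; i++) {
        uint32_t tmp = out[i];
        out[i] = out[count - 1 - i];
        out[count - 1 - i] = tmp;
    }
    free(best);
    free(choice);
    return count;
}
```

With blocksizes 1024/2048/4096, 8192 samples and a transient at sample 3072, the toy model makes four frames with a boundary at 3072 cheaper than two 4096-sample frames, and the search finds that.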
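gasc and gset can be seen as the same greedy loop that differs only in when it stops testing blocksizes. A sketch assuming "efficiency" means encoded bytes per sample (compared by cross-multiplying to stay in integers) and a hypothetical cost callback in place of a real encode; `greedy_frames`, `more_efficient` and `toy_cost` are illustrative names, not flaccid's.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed callback: encoded size in bytes of the frame [start, start + len). */
typedef uint64_t (*frame_cost_fn)(uint64_t start, uint32_t len, void *ctx);

/* Toy cost (illustration only): 10 bytes overhead + 1 byte per sample,
 * +25 bytes if the frame straddles the transient position in *ctx. */
static uint64_t toy_cost(uint64_t start, uint32_t len, void *ctx)
{
    uint64_t t = *(const uint64_t *)ctx;
    return 10 + len + ((start < t && t < start + len) ? 25 : 0);
}

/* True if cost_a bytes for len_a samples is strictly fewer bytes per sample
 * than cost_b bytes for len_b samples (cross-multiplied to avoid floats). */
static int more_efficient(uint64_t cost_a, uint32_t len_a, uint64_t cost_b, uint32_t len_b)
{
    return cost_a * len_b < cost_b * len_a;
}

/* Greedy frame picker over an ascending blocksizes[] list.
 * ascend_stop != 0: gasc-like, stop at the first blocksize that is not more
 * efficient than the previous one and keep the previous one.
 * ascend_stop == 0: gset-like, test every blocksize that fits, keep the best.
 * If no listed blocksize fits the remaining input, a final short frame is
 * emitted. Returns the number of frame lengths written to out[]. */
static size_t greedy_frames(const uint32_t *blocksizes, size_t count, uint64_t total,
                            frame_cost_fn cost, void *ctx, int ascend_stop,
                            uint32_t *out)
{
    size_t frames = 0;
    uint64_t pos = 0;

    assert(count > 0);
    while (pos < total) {
        uint64_t remaining = total - pos, best_cost = 0;
        uint32_t best_len = 0;

        for (size_t k = 0; k < count && blocksizes[k] <= remaining; k++) {
            uint64_t c = cost(pos, blocksizes[k], ctx);
            if (best_len == 0 || more_efficient(c, blocksizes[k], best_cost, best_len)) {
                best_len = blocksizes[k];
                best_cost = c;
            } else if (ascend_stop) {
                break; /* efficiency dropped: keep the previous blocksize */
            }
        }
        if (best_len == 0)
            best_len = (uint32_t)remaining;
        out[frames++] = best_len;
        pos += best_len;
    }
    return frames;
}
```

With a transient at sample 1536 in a 4096-sample input, the gasc-like path stops as soon as 2048 fails to beat 1024 and ends up with 1024, 2048, 1024, while the gset-like path keeps going and finds that a single 4096-sample frame is cheaper per sample. Stopping early costs fewer encodes per frame but can miss a larger blocksize beyond a dip.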
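The chunk description leaves open how a node decides between itself and its children. This sketch assumes a binary tree (each level halves the range, so the blocksize list is the maximum blocksize and its successive halvings) and a bottom-up choice: keep a node as one frame unless its two halves encode smaller. Each tree level covers every sample once, consistent with the O(blocksize_count) effort. `chunk_node`, `chunk_encode` and the toy cost are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed callback: encoded size in bytes of the frame [start, start + len). */
typedef uint64_t (*frame_cost_fn)(uint64_t start, uint32_t len, void *ctx);

/* Toy cost (illustration only): 10 bytes overhead + 1 byte per sample,
 * +25 bytes if the frame straddles the transient position in *ctx. */
static uint64_t toy_cost(uint64_t start, uint32_t len, void *ctx)
{
    uint64_t t = *(const uint64_t *)ctx;
    return 10 + len + ((start < t && t < start + len) ? 25 : 0);
}

/* Best encoding of [start, start + len): one frame covering the whole node,
 * or the best encodings of its two halves, whichever is smaller. Chosen
 * frame lengths are appended to out[] at *count. Assumes len is min_len
 * times a power of two. */
static uint64_t chunk_node(uint64_t start, uint32_t len, uint32_t min_len,
                           frame_cost_fn cost, void *ctx, uint32_t *out, size_t *count)
{
    uint64_t whole = cost(start, len, ctx), left, right;
    size_t mark = *count;

    if (len / 2 >= min_len) {
        /* Evaluate the halves in sequence so their frames are appended in order. */
        left = chunk_node(start, len / 2, min_len, cost, ctx, out, count);
        right = chunk_node(start + len / 2, len / 2, min_len, cost, ctx, out, count);
        if (left + right < whole)
            return left + right;
        *count = mark; /* the whole node wins: discard the children's frames */
    }
    out[(*count)++] = len;
    return whole;
}

/* Splits total (a multiple of root_len) into root_len chunks and returns the
 * number of frame lengths written to out[]. */
static size_t chunk_encode(uint64_t total, uint32_t root_len, uint32_t min_len,
                           frame_cost_fn cost, void *ctx, uint32_t *out)
{
    size_t count = 0;

    assert(root_len >= min_len && total % root_len == 0);
    for (uint64_t start = 0; start < total; start += root_len)
        chunk_node(start, root_len, min_len, cost, ctx, out, &count);
    return count;
}
```

With a transient at sample 2048, the first 4096-sample chunk splits into two 2048-sample frames, while the second chunk, which contains no transient, stays whole.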
Additional passes:
tweak: Adjusts where adjacent frames are split to look for a more efficient
encoding. Each pass uses a smaller offset than the last to get closer to
optimal. Multithreaded; acts on the output queue, so it can be sped up at
a minor efficiency loss by using a smaller queue
merge: Merges adjacent frames to see if the result is more efficient. Best used
with --lax for plenty of merging headroom; a sane subset encoding is
unlikely to see much benefit, if any, as subset limits the blocksize to
4608 (for sample rates up to 48 kHz). Multithreaded; acts on the output
queue, so it can be sped up at a minor efficiency loss by using a smaller queue
Compression settings format:
* Mostly follows the ./flac interface but requires settings to be in a single string
* Compression level must be the first element
* Supported settings: e, m, l, p, q, r (see ./flac -h)
* Adaptive mid-side (-M) from ./flac is not supported; this affects compression
levels 1 and 4
* e.g. "5er4" defines compression level 5, exhaustive model search, and max rice
partition order up to 4
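A sketch of parsing such a string into a struct, assuming each letter keeps its ./flac meaning (e, m and p are switches; l, q and r take a number). `struct comp_settings` and `parse_comp` are hypothetical, not flaccid's parser, and only the single-number form of r is handled even though ./flac's -r also accepts a min,max pair.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical parsed form of a compression string; field names are
 * illustrative, not flaccid's. -1 means "keep the level's default". */
struct comp_settings {
    int level;
    bool exhaustive_model_search; /* e, like flac -e */
    bool mid_side;                /* m, like flac -m */
    bool qlp_precision_search;    /* p, like flac -p */
    int max_lpc_order;            /* l<n>, like flac -l */
    int qlp_coeff_precision;      /* q<n>, like flac -q */
    int max_rice_partition_order; /* r<n>, like flac -r (max only here) */
};

/* Parses strings such as "5er4". Returns false if the level is not first,
 * an option letter is unknown, or l/q/r lacks a number. */
static bool parse_comp(const char *s, struct comp_settings *cs)
{
    char *end;

    *cs = (struct comp_settings){ .max_lpc_order = -1, .qlp_coeff_precision = -1,
                                  .max_rice_partition_order = -1 };
    if (!isdigit((unsigned char)*s))
        return false;
    cs->level = (int)strtol(s, &end, 10);
    s = end;
    while (*s) {
        char opt = *s++;
        long value;

        switch (opt) {
        case 'e': cs->exhaustive_model_search = true; continue;
        case 'm': cs->mid_side = true; continue;
        case 'p': cs->qlp_precision_search = true; continue;
        case 'l': case 'q': case 'r': break;
        default: return false;
        }
        if (!isdigit((unsigned char)*s))
            return false;
        value = strtol(s, &end, 10);
        s = end;
        if (opt == 'l')
            cs->max_lpc_order = (int)value;
        else if (opt == 'q')
            cs->qlp_coeff_precision = (int)value;
        else
            cs->max_rice_partition_order = (int)value;
    }
    return true;
}
```

For example, "8ml12" parses as level 8 with mid-side enabled and a max LPC order of 12, while "e5" is rejected because the level is not first.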
Apodization settings format:
* All apodization settings in a single semicolon-delimited string
* e.g. tukey(0.5);partial_tukey(2);punchout_tukey(3)
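libFLAC's `FLAC__stream_encoder_set_apodization()` takes the whole semicolon-separated string directly, so splitting it is only needed to inspect or validate individual entries. A small hypothetical helper that does this without modifying the input (unlike `strtok`); the fixed 64-byte entry size is an arbitrary choice for the sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Splits a semicolon-delimited apodization string into its entries.
 * Hypothetical helper: each entry is copied into out[i] (truncated to
 * 63 characters); empty entries are skipped. Returns the entry count. */
static size_t split_apod(const char *s, char out[][64], size_t max)
{
    size_t count = 0;

    while (*s && count < max) {
        const char *semi = strchr(s, ';');
        size_t len = semi ? (size_t)(semi - s) : strlen(s);

        if (len > 0) {
            size_t copy = len < 63 ? len : 63;
            memcpy(out[count], s, copy);
            out[count][copy] = '\0';
            count++;
        }
        s += len;
        if (*s == ';')
            s++; /* step over the delimiter */
    }
    return count;
}
```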
First, build a custom libFLAC that includes a static encoder implementation. Its source is in the static_encoder branch of this repository: https://github.com/chocolate42/flac/tree/static_encoder
Then to build flaccid on Linux do something like this:
gcc -oflaccid chunk.c common.c fixed.c flaccid.c gasc.c gset.c load.c peakset.c seektable.c -I<PATH_TO_LIBFLAC_INCLUDE> <PATH_TO_libFLAC-static.a> -lcrypto -lm -logg -fopenmp -Wall -O3 -funroll-loops -Wall -Wextra -Wstrict-prototypes -Wmissing-prototypes -Waggregate-return -Wcast-align -Wnested-externs -Wshadow -Wundef -Wmissing-declarations -Winline -Wdeclaration-after-statement -fvisibility=hidden -fstack-protector-strong
This is just a copy of the default flags used to compile libFLAC, plus OpenMP for coarse multithreading and OpenSSL for MD5.
The libFLAC changes boil down to:
- Leaving the existing API untouched
- Adding a new type FLAC__StaticEncoder which simply wraps FLAC__StreamEncoder, allowing stream functions to be used internally but separating the interface externally
- A few necessary functions to create and destroy the new type
- The user encodes per frame, feeding an entire frame of input along with the frame/sample index and a valid static encoder instance. Instead of using callbacks, the function returns a buffer containing the encoded frame, valid until the static encoder instance is re-used
- To reduce needless copying there's a variant that takes int16_t[] input. Input is still copied into the internal buffer, but the intermediate external int32_t[] buffer is eliminated
- The frame encoders do not do MD5 hashing: hashing is an in-order operation, and in-order processing cannot be guaranteed (a major use case of the API is to allow things to be done out of order)
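A sketch of the out-of-order pattern this API is meant to enable. The libFLAC function names aren't listed here, so `fake_static_encoder` and `fake_encode_frame` are hypothetical stand-ins that only mimic the buffer-lifetime rule described above: one instance per thread (an assumption, since an instance's buffer is overwritten on reuse), with each encoded frame copied out before that instance is used again. OpenMP is used because flaccid already builds with -fopenmp; without it the pragmas are ignored and the loop runs serially.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a static encoder instance. It mimics the rule
 * that the returned buffer belongs to the instance and is only valid until
 * that instance encodes another frame. */
struct fake_static_encoder {
    uint8_t buf[16];
};

/* Hypothetical stand-in for the per-frame encode call. Its "encoded frame"
 * is just the first-sample index followed by the sample count. */
static const uint8_t *fake_encode_frame(struct fake_static_encoder *enc,
                                        const int32_t *samples, uint32_t count,
                                        uint64_t first_sample, size_t *out_len)
{
    (void)samples;
    memcpy(enc->buf, &first_sample, sizeof first_sample);
    memcpy(enc->buf + sizeof first_sample, &count, sizeof count);
    *out_len = sizeof first_sample + sizeof count;
    return enc->buf;
}

/* Encodes fixed-length frames in any order (in parallel when compiled with
 * -fopenmp), one encoder instance per thread. Each result is copied into its
 * own slot before the instance is reused, so the caller can then write the
 * slots in frame order. */
static void encode_frames(const int32_t *samples, uint32_t frame_len, size_t frames,
                          uint8_t (*slots)[16], size_t *slot_len)
{
    #pragma omp parallel
    {
        struct fake_static_encoder enc; /* private to this thread */

        #pragma omp for schedule(dynamic)
        for (size_t f = 0; f < frames; f++) {
            size_t len;
            const uint8_t *frame = fake_encode_frame(&enc, samples + f * frame_len, frame_len,
                                                     (uint64_t)f * frame_len, &len);
            memcpy(slots[f], frame, len); /* copy before enc is reused */
            slot_len[f] = len;
        }
    }
}
```

MD5 would then be computed in a separate pass over the input samples in stream order, independently of the order in which the frames happened to be encoded.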