OLAF – Overly Lightweight Acoustic Fingerprinting
Olaf is a C application or library for content-based audio search. Olaf is able to extract fingerprints from an audio stream, and either store those fingerprints in a database, or find a match between extracted fingerprints and stored fingerprints. Olaf does this efficiently in order to be used on embedded platforms, traditional computers or in web browsers via WASM.
Please be aware of the patents US7627477 B2 and US6990453 and perhaps others. They describe techniques used in algorithms implemented within Olaf. These patents limit the use of Olaf under various conditions and for several regions. Please make sure to consult your intellectual property rights specialist if you are in doubt about these restrictions. If these restrictions apply, please respect the patent holders rights. The main aim of Olaf is to serve as a learning platform on efficient (embedded) acoustic fingerprinting algorithms.
- Why Olaf?
- Olaf on traditional computers
- Olaf in the browser
- Embedded Olaf
- Olaf Usage
- Configuring Olaf
- Testing, evaluating and benchmarking Olaf
- Contribute to Olaf
- Olaf documentation
- Limitations of Olaf
- Further Reading
With respect to other audio search systems, Olaf stands out for three reasons. Olaf runs on embedded devices. Olaf is fast on traditional computers. Olaf runs in the browsers.
There seem to be no lightweight acoustic fingerprinting libraries that are straightforward to run on embedded platforms. On embedded platforms memory and computational resources are severely limited. Olaf is written in portable C with these restrictions in mind. Olaf mainly targets 32-bit ARM devices such as some Teensy’s, some Arduino’s and the ESP32. Other modern embedded platforms with similar specifications and should work as well.
Olaf, being written in portable C, also targets traditional computers. There, the efficiency of Olaf makes it run fast. On embedded devices reference fingerprints are stored in memory. On traditional computers fingerprints are stored in a high-performance key-value-store: LMDB. LMDB offers an a B+-tree based persistent storage ideal for small keys and values with low storage overhead.
Olaf works in the browser. Via Emscripten Olaf can be compiled to WASM. This makes it relatively straightforward to combine the capabilities of the Web Audio API and Olaf to create browser based audio fingerprinting applications.
Olaf was featured on hackaday. There is also a small discussion about Olaf on Hacker News.
Olaf on traditional computers
To use Olaf
ruby need to be installed on your system. While the core of Olaf is in pure c, a Ruby script provides an easy to use interface to its capabilities. The Ruby script converts audio (with
ffmpeg), parses command line arguments and reports results in a readable format.
To install ffmpeg and ruby on a Debian like system:
apt-get install ffmpeg ruby. On macOS ruby is available by default and
ffmpeg can be installed with homebrew by calling
brew install ffmpeg.
Compilation and installation
To compile the version with a key value store for traditional computers use the following. By default the makefile uses
gcc set to the C11 standard. Other compilers compliant with the C11 standard work equally well. Make sure
gcc is installed correctly or modify the Makefile for your compiler of choice. Compilation and installation:
make make install
By default, a directory named
.olaf is created in the current user home directory. The command line script is installed to
/usr/local/olaf, which is assumed to be on the user’s path.
Compilation with Zig
Alternatively you can build Olaf with zig. Zig is a programming language which also ships with a c compiler. In this project Zig is employed as an easy to use cross-compiler.
zig.build file is provided. Just call “zig build” to compile for your platform. On an M1 mac it gives the following:
zig build file zig-out/bin/olaf_c # Mach-O 64-bit executable arm64
The power of Zig is, however, more obvious when an other target is provided. For a list of platforms, please consult the output of
zig targets. To build a windows executable run the following commands.
zig build -Dtarget=x86_64-windows-gnu -Drelease-small file zig-out/bin/olaf_c.exe # PE32+ executable (console) x86-64, for MS Windows
Zig allso supports WebAssembly as a target platform as an alterative to Emscripten. The
zig.build file includes a conditional to build the memory db for WASM. To get a WASM binary call the following:
zig build -Dtarget=wasm32-wasi-musl -Drelease-small file zig-out/bin/olaf_c.wasm #WebAssembly (wasm) binary module version 0x1 (MVP)
Olaf on Windows
Olaf has been designed with a UNIX like environment in mind but also works on Windows. A pre-built windows executable can be found in the
pre-built folder. There is additional documentation for running Olaf on Windows.
Olaf on Docker
Olaf can be containerized with Docker. The provided Dockerfile is based on Alpine linux and installs only the needed requirements. A way to build the container and run Olaf:
docker build -t olaf:1.0 . wget "https://filesamples.com/samples/audio/mp3/sample3.mp3" docker run -v $HOME/.olaf/docker_dbs:/root/.olaf -v $PWD:/root/audio olaf:1.0 olaf store sample3.mp3 docker run -v $HOME/.olaf/docker_dbs:/root/.olaf -v $PWD:/root/audio olaf:1.0 olaf stats
In this case the
$HOME/.olaf/docker_dbs is mapped to
/root/.olaf in the container to store the database. The audio enters tye system via a mapping of
/root/audio in the container, which is also the work dir in the container. So relative paths with respect to
$PWD of the host can be resolved in the container, absolute paths can not.
Olaf in the browser
To compile Olaf to WASM the emcc compiler from Emcripten is used. Make sure Emscripten is correctly installed and run the following:
make web python3 -m http.server --directory wasm #start a web browser open "http://localhost:8000/spectrogram.html" #open the url in standard browser
Note that the web version does not use a key value store but a list of hashes stored in a header-file. See below for more info.
Olaf is has been tested on the ESP32 microcontroller. Other, similar microcontroller might work as well. The embedded version does not use a key value store but a list of hashes stored in a
.h header file. This header file is the fingerprint index.
The ‘index’ or header file can be created on a computer to contain fingerprints extracted from audio. To create this header file do the following:
make mem olaf to_raw your_audio_file.mp3 bin/olaf_mem store olaf_audio_your_audio_file.raw "arandomidentifier" > your_header_file.h
To test and debug this header file, use the ‘mem’ version of Olaf on your computer. The ESP32 version is basically the same as the mem version, only the audio comes from microphone input and not from a file.
On traditional computers, Olaf provides a command line interface it can be called using
olaf subapplication [argument...]. For each task Olaf has to perform, there is a subapplication. There are subapplications to store fingerprints, to query audio fragments,… . A typical interaction with olaf could look like this:
The store command extracts fingerprints from an audio file and stores them in a reference database. The incoming audio is decoded and resampled using ffmpeg. ffmpeg needs to be installed on your system and available on the path.
olaf store audio_item...
The audio_item can be:
- An audio file:
@olaf store audio.mp3@, if the audio file contains multiple channels they are mixed to a mono.
- A video file. The first audio stream is extracted from the video container and used as input:
@olaf store video.mkv@
- A folder name: Olaf attempts to recursively find all audio files within the folder. It does this with a limited allowlist of known audio file name extensions.
@olaf store /home/user/Music@
- A text file: The text file should contain a list of file names. The following commands recursively finds all mp3 within the current directory and subsequently stores them in the reference database.
find . -name "*.mp3" > list.txt olaf store list.txt
Internally each audio stream is given an identifier using a one time Jenkins Hash function. This identifier is returned when a match is found. A list connecting these identifiers to file names is also stored automatically.
The query command extracts fingerprints and matches them with the database:
olaf query query_fragment.opus
See “the ffmpeg input devices docs for your platform:” http://www.ffmpeg.org/ffmpeg-devices.html#Input-Devices
ffmpeg -f avfoundation -list_devices true -i ""
ffmpeg -f avfoundation -i "none:default" -ac 1 -ar 16000 -f f32le -acodec pcm_f32le pipe:1 | olaf query
The monitor command splits the query file into steps of x seconds. When working in steps of 5 seconds, then the first five seconds are matched with the reference database and matches are reported. Subsequently it goes on with the next 5 seconds and so forth. This is practical if an unsegmented audio file needs to be matched with the reference database.
olaf monitor audio_item.mp3...
There is also a multithreaded ‘batch’ version of the monitor command which writes a text file for each analysed audio file. This makes it easy to continue where the process stopped
olaf batch_monitor audio_list.txt
Deletion of fingerprints is similar to adding prints:
olaf delete item.mp3
Note that it is currently unclear what the performance implications are when adding and removing many items to the db. In other words: how balanced the B+ tree remains with many leaves removed.
Deduplicate a collection
This command finds duplicate audio content in a folder. First each audio file is stored in the reference database. Next each file is used as a query and matched with the reference database. The file should, evidently, match itself but self-matches are ignored, leaving only duplicates.
A duplicate means that audio from the original is found in another file. The start and stop times of the found fragment are reported. If the match reports a start of nearly zero and a duration similar to the duration of the original audio file then a ‘full duplicate’ is found: it is almost certainly the same exact track. If only a couple of seconds are reported it means that only a couple of seconds of the original audio are found in the duplicate.
olaf dedup field_recordings/archive
There is also the
dedupm command which first operates similarly to the
dedup command but uses the monitor command in stead of the query command. This is useful if you expect partial matches.
To get statistics on the database use
stats. It prints information on the b-tree structure backing the storage.
Cache fingerprints and store cached fingerprints
Caching prints can be practical to speed up indexing. Olaf is single threaded mainly for use on embedded platforms. Additionally, the data storage of Olaf only allows one write thread (multiple readers are allowed). However, on more powerfull computing machinery it might be of interest to use multiple cores to extract fingerprints. This can speed up indexing significantly.
The way around this is to cache fingerprints to a file with all available cores and later store all prints in the index.
olaf cache *.mp3 olaf store_cached
By default the script uses 7 cores and needs the
threach ruby gem. Install this with
gem install threach. Also a default is the storage place for cached items:
~/.olaf/cache. These default can be changed in the
/usr/local/bin/olaf ruby file. The configuratiion can be found at the top.
Olaf has a number of configuration parameters. Currently these are done during compile time in the
olaf_config.c file. This is to avoid changing configuration at runtime which could result in an index being not compatible with fingerprints extracted from a query (with different configuration parameters).
The configuration includes the amount of fingerprints extracted, the location of the data directory, configuration related to matching, … Each configuration setting has a small description. There is a default configration for
default cases which slightly differ.
The olaf.rb Ruby script also contains a few settings related to how audio is decoded, where data is expected, the number of threads which can be used,… To change these settings, edit the script itself (typically located at
/usr/local/bin/olaf. Changes to configuration there are immediatly applied.
Testing, Evaluating and Benchmarking Olaf
The tests check wether Olaf works.The evaluation verifies how well Olaf works. The benchmark checks how fast Olaf works and how it deals with scalability.
The first thing this checks is whether Olaf compiles correctly. Afterwards, a small dataset is indexed and some queries are fired. The result of the queries is evaluated for correctness. Also the memory version of Olaf is checked. To run this yourself, with Ruby, ffmpeg and ffprobe installed:
git clone https://github.com/JorenSix/Olaf cd Olaf make && make install ruby eval/olaf_functional_tests.rb
Less interesting are the unit tests, these are mainly of interest for developing Olaf. The unit test can be compiled with `make test` and ran with `./bin/olaf_tests`.
In the Evaluating Olaf
evalfolder there is an evaluation script which takes a folder as input and stores and evaluates queries with several modifications. SoX needs to be available on the system for this to work.
ruby eval/olaf_evaluation.rb /folder/with/music
With the script a folder of audio files is stored and it is registered how long it takes to store 64, 128, 256, 512,… files. If run with the FMA full dataset a total of more than 200 days of audio are stored at a rate of just under 2000 times realtime with a 96 CPU-core system. An interpretation of the graph is that indexing remains linear on larger datasets. At every doubling of the database the query performance is also checked. Run the benchmark yourself:
ruby eval/olaf_benchmark/olaf_benchmark.rb /folder/with/music
Contribute to Olaf
There are several ways to contribute to Olaf. The first is to use Olaf and report issues or feature requests using the Github Issue tracker. Bug reports are greatly appreciated, but keep in mind the note below on responsiveness.
Another way to contribute is to dive into the code, fork and improve Olaf yourself. Merge requests with additional documentation, bug fixes or new features will be handled and end up in the main branch if correctness, maintainability and simplicity are kept in check. However, keep in mind the note below:
My time to spend on Olaf is limited and goes in activity bursts. If an issue is raised it might take a couple of months before I am able to spend time on it during the next burst of activity. A period of relative silence does not mean your feedback / pull request is not greatly valued!
The documentation of Olaf is generated with doxygen and can be consulted on github via the gh-pages branch.
Limitations of Olaf
Currently a complete audio file is read in memory in order to process it. While the algorithm allows streaming, the current implementation does not allow to store incoming streams.Now you can choose whether to compile in the single or stream reader: choose either
olaf_reader_stream.c, which is a bit slower.
- Only one write process: LMDB limits the number of processes that can write to the key-value store to a single process. Attempting to write to the key-value store while an other write-process is busy should put the process automatically in a wait state untile the write lock is released (by the other process). Reading can be done frome multiple processes at the same time.
- Audio decoding is done externally. The core of Olaf does fingerprinting. ffmpeg or similar audio decoding software is required to decode and resample various audio formats.
Removing items from the reference database is currently not supported. The underlying database does support deletion of items so it should be relatively easy to add this functionality.
- Performance implications when removing many items from the database are currently unclear. In other words: how balanced does the B+tree remain when removing many leaves.
- Olaf is single threaded. The main reasons are simplicity and limitations of embedded platforms. The single threaded design keeps the code simple. On embedded platforms with single core CPU’s multithreading makes no sense.
On traditional computers there might be a performance gain by implementing multi-threading. However, the time spent on decoding audio and storing fingerprints is much larger than analysis/extraction so the gain might be limited. As an work-around multiple processes can be used simultaniously to query the database.
- The limitation of the number of tracks that can be indexed and queried on a single computer is not known. Olaf has been used to index and query the fma_full dataset. This dataset contains 100 000 tracks totaling more than 340 days of audio. The dataset, around 800GB of mp3s, were indexed in a 15GB database and query speed remained at a respectable 80 times realtime: it only takes a single second to query 80 seconds of audio. With the datastructure having logaritmic complexity the limit of the number of songs per pc might be a couple of times higher.
Some relevant reading material about (landmark based) acoustic fingerprinting. The order gives an idea of relevance to the Olaf project.
- Wang, Avery L. An Industrial-Strength Audio Search Algorithm (2003)
- Six, Joren and Leman, Marc Panako – A Scalable Acoustic Fingerprinting System Handling Time-Scale and Pitch Modification (2014)
- Cano, Pedro and Batlle, Eloi and Kalker, Ton and Haitsma, Jaap A Review of Audio Fingerprinting (2005)
- Arzt, Andreas and Bock, Sebastian and Widmer, Gerhard Fast Identification of Piece and Score Position via Symbolic Fingerprinting (2012)
- Fenet, Sebastien and Richard, Gael and Grenier, Yves A Scalable Audio Fingerprint Method with Robustness to Pitch-Shifting (2011)
- Ellis, Dan and Whitman, Brian and Porter, Alastair Echoprint – An Open Music Identification Service (2011)
- Sonnleitner, Reinhard and Widmer, Gerhard Quad-based Audio Fingerprinting Robust To Time And Frequency Scaling (2014)
- Sonnleitner, Reinhard and Widmer, Gerhard Robust Quad-Based Audio Fingerprinting (2015)
The programming style of Olaf attempts to use an OOP inspired way to organize code and divide responsibilities and interfaces. For more information see on this style consult this document about OOP in C. Also of interest is the Modern C book by Jens Gustedt.
- PFFFT a pretty fast FFT library. BSD license
- LMDB Lightning Memory-Mapped Database, a fast key value store with a permissive software license (the OpenLDAP Public License) by Howard Chu.
- Hash table and dequeue by Simon Howard, ISC lisence
Olaf by Joren Six at IPEM, Ghent University.