synchdata

echonest · Jun 20, 2011 · 7f1cb26
commit 7f1cb26

Showing 12 changed files with 784 additions and 0 deletions.
diff --git a/LICENSE b/LICENSE
@@ -0,0 +1,28 @@
+synchdata is open source software licensed under the "MIT License"
+More information about the MIT License: http://en.wikipedia.org/wiki/MIT_License
+
+Copyright (c) 2011 The Echo Nest Corporation
+
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
+
+The above copyright notice and this permission notice shall be included in
+all copies or substantial portions of the Software.
+
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+THE SOFTWARE.
+
+
+synchdata makes use of the following pieces of software:
+
+- Base64.cpp and Base64.h, see source files for license
+Copyright (C) 2004-2008 René Nyffenegger
diff --git a/README.md b/README.md
@@ -0,0 +1,74 @@
+# Synchdata with Synchstring
+
+by Tristan Jehan, 06/20/2011
+
+Copyright (c) 2011 The Echo Nest Corporation
+
+Synchdata is sample code (in C++ and Python) that demonstrates how to accurately synchronize [The Echo Nest analysis data](http://developer.echonest.com/docs/v4/track.html "Track API methods") to a corresponding waveform, regardless of which mp3 decoder was used to generate that waveform. This is done using the Echo Nest "synchstring": a base64 encoding of a zlib compression of a hex-encoded series of ASCII integers that describe the zero-crossing locations for multiple chunks of audio throughout the file. The decoded list of integers is formatted as follows:
+
+    Fs Nch <Nzs Zi dz_1 ... dz_Nzs>_1 ... <Nzs Zi dz_1 ... dz_Nzs>_Nch
+
+    where,
+    Fs: sampling rate (currently 22050)
+    Nch: number of chunks (currently 3)
+    Nzs: number of zero crossings
+    Zi: a zero crossing reference
+    dz_n: number of samples to the next zero crossing
+
+## Why is this useful?
+
+All mp3 decoders (e.g. mpg123, ffmpeg, quicktime, lame, and others) have their own approach to decoding and to correcting errors (corrupt frames). That leads to slight variations in the output waveform. In particular, the beginning of the waveform may be shifted by a small yet noticeable time offset (e.g. tens of milliseconds). Unfortunately, that offset is somewhat signal-dependent, and therefore cannot be predicted from the decoder name and version alone.
+
+## How it works
+
+Synchdata first decodes the synchstring into 3 lists of zero-crossing sample locations, as extracted by the Echo Nest analyzer (we use mpg123), i.e. 1 second worth of audio at the beginning, the middle and the end of the file. It then extracts zero-crossings in the same 3 locations from the proposed 1-second chunks of audio: locally decoded mp3, converted to mono and resampled at 22050 Hz. It finally correlates the zero-crossing data as described in the synchstring with that of the proposed waveform, and retains the optimal sample-accurate alignment (a time offset returned in seconds) for each of the chunks.
+
+If the 3 time offsets are identical, then the offset can be trusted throughout the file, and added to any of the timing information provided in the JSON analysis data (e.g. segment onsets, beats, bars). If there's a mismatch between some of the computed offsets, then the analysis data is misaligned with the waveform somewhere, and sample accuracy isn't guaranteed. This can occur when the decoder tries to cope with a corrupt mp3 frame by inserting some silence, inserting some bogus audio, or discarding the frame, resulting in discontinuities and time misalignments.
+
+## Speed
+
+The synchdata sample code is provided as an example of how to handle the Echo Nest synchstring and, with it, data synchronization. It is not optimized for speed, but will be improved in future updates. For instance, the convolution function could be significantly accelerated with the [FFT-based algorithm](https://ccrma.stanford.edu/~jos/mdft/Convolution_Theorem.html "FFT Convolution"). If speed is a concern, or if only a partial waveform is available (e.g. when streaming audio), one can compute only the initial offset and assume it is accurate until the others become available. Currently, the maximum retrievable offset is +/- 500 ms. However, we almost never run into offsets beyond +/- 100 ms. Computation can be reduced by correlating only 200 ms worth of zero crossings.
+
+## C++
+
+Compile the sample program using: make
+
+Test the program with the proposed waveform stored in raw binary format for 3 different decoders:
+
+    $ ./synchdata ../data/billie.mpg123.22050.mono.raw ../data/billie.synchstring.txt 
+    Offset = 0.00000 seconds
+
+Note that since the synchstring was generated with the same version of mpg123, there's an exact match.
+
+    $ ./synchdata ../data/billie.ffmpeg.22050.mono.raw ../data/billie.synchstring.txt 
+    Warning: Mismatch detected!
+    Found offsets 0.01197 -0.01415 -0.01415 seconds
+
+In this case, an error occurred in the first section of the file. There will be a misalignment of up to 0.01415 + 0.01197 = 0.02612 seconds, or ~26 ms.
+
+    $ ./synchdata ../data/billie.quicktime.22050.mono.raw ../data/billie.synchstring.txt 
+    Offset = 0.03478 seconds
+
+The offset here is consistent and can be trusted. The client program should add this constant offset to the timing data in the JSON file.
+
+## Python
+
+Assuming the numpy module is installed (base64 and zlib are part of the Python standard library), run the test examples like this:
+
+    $ python synchdata.py ../data/billie.mpg123.22050.mono.raw ../data/billie.synchstring.txt
+    Offset = 0.00000 seconds
+
+    $ python synchdata.py ../data/billie.ffmpeg.22050.mono.raw ../data/billie.synchstring.txt
+    Warning: Mismatch detected!
+    Found offsets 0.01197 -0.01415 -0.01415 seconds
+
+    $ python synchdata.py ../data/billie.quicktime.22050.mono.raw ../data/billie.synchstring.txt
+    Offset = 0.03478 seconds
+
+See comments in the C++ section.
+
+## FAQ
+
+Q: Can I use this yet?
+
+A: No. The current API doesn't return synchstrings yet.
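Editor's note on the README above: the steps in "How it works" can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the code shipped in `synchdata.py`. The function names (`parse_synch_ints`, `zero_crossings`, `estimate_offset`, `consistent_offset`) are hypothetical, `Zi` is assumed to be the sample index of the first zero crossing in a chunk, positions are assumed to be relative to the start of the chunk, and the half-sample tolerance is an arbitrary choice. The brute-force `np.correlate` call stands in for the convolution the README mentions; it is quadratic in the chunk length.

```python
from itertools import accumulate

import numpy as np


def parse_synch_ints(values):
    """Split the decoded integer list into (fs, zero-crossing positions per chunk).

    Layout from the README: Fs Nch <Nzs Zi dz_1 ... dz_Nzs> repeated Nch times.
    Assumption: Zi is the first zero-crossing index and each dz_n is the
    distance in samples to the next zero crossing.
    """
    fs, nch = values[0], values[1]
    pos, chunks = 2, []
    for _ in range(nch):
        nzs, zi = values[pos], values[pos + 1]
        deltas = values[pos + 2:pos + 2 + nzs]
        if len(deltas) != nzs:
            raise ValueError("truncated chunk")
        # Running sum turns the deltas into absolute sample positions.
        chunks.append(list(accumulate(deltas, initial=zi)))
        pos += 2 + nzs
    return fs, chunks


def zero_crossings(x):
    """Indices n where the sign of x differs between samples n-1 and n."""
    neg = np.signbit(np.asarray(x, dtype=float))
    return np.flatnonzero(neg[1:] != neg[:-1]) + 1


def estimate_offset(ref_zc, local_zc, length, fs=22050):
    """Seconds by which the local zero crossings trail the reference ones."""
    ref = np.zeros(length)
    loc = np.zeros(length)
    ref[np.asarray(ref_zc, dtype=int)] = 1.0
    loc[np.asarray(local_zc, dtype=int)] = 1.0
    # Output index i of the full correlation corresponds to lag i - (length - 1).
    corr = np.correlate(loc, ref, mode="full")
    lag = int(np.argmax(corr)) - (length - 1)
    return lag / fs


def consistent_offset(offsets, fs=22050):
    """Return the shared offset if all chunks agree within half a sample, else None."""
    if max(offsets) - min(offsets) < 0.5 / fs:
        return offsets[0]
    return None
```

With this sign convention a positive result means the locally decoded audio is late relative to the analysis, so the value would be added to the JSON timing data, as the README describes. `consistent_offset` returns `None` for the ffmpeg example above (0.01197, -0.01415, -0.01415), and the shared value for the quicktime one.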
diff --git a/c++/Base64.cxx b/c++/Base64.cxx
@@ -0,0 +1,150 @@
+/* 
+   base64.cpp and base64.h
+
+   Copyright (C) 2004-2008 René Nyffenegger
+
+   This source code is provided 'as-is', without any express or implied
+   warranty. In no event will the author be held liable for any damages
+   arising from the use of this software.
+
+   Permission is granted to anyone to use this software for any purpose,
+   including commercial applications, and to alter it and redistribute it
+   freely, subject to the following restrictions:
+
+   1. The origin of this source code must not be misrepresented; you must not
+      claim that you wrote the original source code. If you use this source code
+      in a product, an acknowledgment in the product documentation would be
+      appreciated but is not required.
+
+   2. Altered source versions must be plainly marked as such, and must not be
+      misrepresented as being the original source code.
+
+   3. This notice may not be removed or altered from any source distribution.
+
+   René Nyffenegger rene.nyffenegger@adp-gmbh.ch
+
+   // Changed spacing, modified some parens for readability. Modified a variable name. JRS
+   
+   Tristan Jehan: 06/10/2011 -- added support for base64_url decoding
+*/
+
+#include "Base64.h"
+#include <iostream>
+
+static const std::string base64_chars = 
+             "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
+             "abcdefghijklmnopqrstuvwxyz"
+             "0123456789+/";
+
+static const std::string base64_chars_url = 
+              "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
+              "abcdefghijklmnopqrstuvwxyz"
+              "0123456789-_";
+
+
+static inline bool is_base64(unsigned char c)     { return (isalnum(c) || (c == '+') || (c == '/'));}
+static inline bool is_base64_url(unsigned char c) { return (isalnum(c) || (c == '-') || (c == '_'));}
+
+
+std::string base64_encode(unsigned char const* bytes_to_encode, unsigned int in_len, bool url) 
+{
+    std::string ret;
+    int i = 0;
+    int j = 0;
+    unsigned char char_array_3[3];
+    unsigned char char_array_4[4];
+
+    while (in_len--)
+    {
+        char_array_3[i++] = *(bytes_to_encode++);
+        if (i == 3)
+        {
+            char_array_4[0] = (char_array_3[0] & 0xfc) >> 2;
+            char_array_4[1] = ((char_array_3[0] & 0x03) << 4) + ((char_array_3[1] & 0xf0) >> 4);
+            char_array_4[2] = ((char_array_3[1] & 0x0f) << 2) + ((char_array_3[2] & 0xc0) >> 6);
+            char_array_4[3] = char_array_3[2] & 0x3f;
+
+            for(i = 0; i < 4; i++)
+            {
+                if (url)
+                    ret += base64_chars_url[char_array_4[i]];
+                else
+                    ret += base64_chars[char_array_4[i]];
+            }
+            i = 0;
+        }
+    }
+
+    if (i)
+    {
+        for (j = i; j < 3; j++)
+            char_array_3[j] = '\0';
+
+        char_array_4[0] = ((char_array_3[0] & 0xfc) >> 2);
+        char_array_4[1] = ((char_array_3[0] & 0x03) << 4) + ((char_array_3[1] & 0xf0) >> 4);
+        char_array_4[2] = ((char_array_3[1] & 0x0f) << 2) + ((char_array_3[2] & 0xc0) >> 6);
+        char_array_4[3] = ((char_array_3[2] & 0x3f));
+
+        for (j = 0; j < (i + 1); j++)
+        {
+            if (url)
+                ret += base64_chars_url[char_array_4[j]];
+            else
+                ret += base64_chars[char_array_4[j]];
+        }
+
+        while (i++ < 3)
+            ret += '=';
+    }
+
+    return ret;
+}
+
+std::string base64_decode(std::string const& encoded_string, bool url)
+{
+    int in_len = encoded_string.size();
+    int i = 0;
+    int j = 0;
+    int in = 0;
+    unsigned char char_array_4[4], char_array_3[3];
+    std::string ret;
+
+    const std::string this_base64_chars = url ? base64_chars_url : base64_chars;
+
+    while (in_len-- && encoded_string[in] != '=' && (url ? is_base64_url(encoded_string[in]) : is_base64(encoded_string[in])))
+    {
+        char_array_4[i++] = encoded_string[in];
+        in++;
+        if (i == 4)
+        {
+            for (i = 0; i < 4; i++)
+                char_array_4[i] = this_base64_chars.find(char_array_4[i]);
+
+            char_array_3[0] = (char_array_4[0] << 2) + ((char_array_4[1] & 0x30) >> 4);
+            char_array_3[1] = ((char_array_4[1] & 0xf) << 4) + ((char_array_4[2] & 0x3c) >> 2);
+            char_array_3[2] = ((char_array_4[2] & 0x3) << 6) + char_array_4[3];
+
+            for (i = 0; i < 3; i++)
+                ret += char_array_3[i];
+            i = 0;
+        }
+    }
+
+    if (i)
+    {
+        for (j = i; j < 4; j++)
+            char_array_4[j] = 0;
+
+        for (j = 0; j < 4; j++)
+            char_array_4[j] = this_base64_chars.find(char_array_4[j]);
+
+        char_array_3[0] = ((char_array_4[0] << 2))       + ((char_array_4[1] & 0x30) >> 4);
+        char_array_3[1] = ((char_array_4[1] & 0xf) << 4) + ((char_array_4[2] & 0x3c) >> 2);
+        char_array_3[2] = ((char_array_4[2] & 0x3) << 6) + char_array_4[3];
+
+        for (j = 0; j < (i - 1); j++)
+            ret += char_array_3[j];
+    }
+
+    return ret;
+}
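Editor's note on Base64.cxx above: the `url` flag selects the `-_` alphabet instead of `+/`. The Python standard library exposes the same variant, so the two outer layers the README describes (base64 over zlib) can be sketched as below. This is only an illustration: `wrap` and `unwrap` are hypothetical helper names, the payload is made up, and the inner hex layer is left out because its exact layout is not specified here. Unlike the C++ decoder, Python's decoder expects `=` padding, so the sketch restores it first.

```python
import base64
import zlib


def wrap(payload: bytes) -> str:
    """zlib-compress, then URL-safe base64-encode the payload."""
    return base64.urlsafe_b64encode(zlib.compress(payload)).decode("ascii")


def unwrap(text: str) -> bytes:
    """Inverse of wrap(); re-adds any stripped '=' padding before decoding."""
    padded = text + "=" * (-len(text) % 4)
    return zlib.decompress(base64.urlsafe_b64decode(padded))


if __name__ == "__main__":
    payload = b"made-up payload, not a real synchstring"
    assert unwrap(wrap(payload)) == payload
    # The two alphabets differ only in the last two symbols.
    print(base64.b64encode(bytes([0xFB, 0xFF])))          # b'+/8='
    print(base64.urlsafe_b64encode(bytes([0xFB, 0xFF])))  # b'-_8='
```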
diff --git a/c++/Base64.h b/c++/Base64.h
@@ -0,0 +1,9 @@
+#ifndef BASE64_H
+#define BASE64_H
+
+#include <string>
+
+std::string base64_encode(unsigned char const* , unsigned int len, bool url);
+std::string base64_decode(std::string const& s, bool url);
+
+#endif
diff --git a/c++/Makefile b/c++/Makefile
@@ -0,0 +1,19 @@
+UNAME := $(shell uname -s)
+CXX=g++
+CC=gcc
+ARCH=`uname -m`
+#OPTFLAGS=-g -O0
+OPTFLAGS=-O3 -DNDEBUG
+CXXFLAGS=-Wall -fPIC $(OPTFLAGS)
+LDFLAGS=-L/usr/local/lib -lz -lpthread $(OPTFLAGS)
+PROG=synchdata
+
+main: $(MODULES) $(PROG).o
+	$(CXX) Base64.cxx $(PROG).o $(LDFLAGS) -o ./$(PROG)
+
+%.o: %.cxx %.h
+	$(CXX) $(CXXFLAGS) -c -o $@ $<
+
+clean:
+	rm -f *.o
+	rm -f $(PROG)