synchdata
tjehan committed Jun 20, 2011
0 parents commit 7f1cb26
Showing 12 changed files with 784 additions and 0 deletions.
28 changes: 28 additions & 0 deletions LICENSE
@@ -0,0 +1,28 @@
synchdata is open source software licensed under the "MIT License"
More information about the MIT License: http://en.wikipedia.org/wiki/MIT_License

Copyright (c) 2011 The Echo Nest Corporation

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in
all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
THE SOFTWARE.


synchdata makes use of the following pieces of software:

- Base64.cpp and Base64.h, see source files for license
Copyright (C) 2004-2008 René Nyffenegger
74 changes: 74 additions & 0 deletions README.md
@@ -0,0 +1,74 @@
# Synchdata with Synchstring

by Tristan Jehan, 06/20/2011

Copyright (c) 2011 The Echo Nest Corporation

Synchdata is sample code (in C++ and Python) that demonstrates how to accurately synchronize [The Echo Nest analysis data](http://developer.echonest.com/docs/v4/track.html "Track API methods") to a corresponding waveform, regardless of which mp3 decoder was used to generate that waveform. This is done using the Echo Nest "synchstring", a base64 encoding of a zlib compression of a hex-encoded series of ASCII integers that describe the zero-crossing locations for multiple chunks of audio throughout the file. The decoded list of integers is formatted as follows:

    Fs Nch <Nzs Zi dz_1 ... dz_Nzs>_1 ... <Nzs Zi dz_1 ... dz_Nzs>_Nch

    where,
    Fs: sampling rate (currently 22050)
    Nch: number of chunks (currently 3)
    Nzs: number of zero crossings
    Zi: a zero crossing reference
    dz_n: number of samples to the next zero crossing
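
As a rough illustration of the layout above, the following Python sketch unpacks a synchstring and rebuilds zero-crossing positions for each chunk. Several details are assumptions that this README does not confirm: the URL-safe base64 alphabet, whitespace-separated hexadecimal tokens after decompression, and `Zi` being an absolute sample index from which the `dz_n` deltas accumulate. The function names (`decode_synchstring`, `parse_chunks`, `encode_for_demo`) are hypothetical and are not taken from `synchdata.py`.

```python
import base64
import zlib


def decode_synchstring(synchstring):
    """Undo the base64 and zlib layers and return the list of integers.

    Assumptions: URL-safe base64 alphabet; the decompressed payload is ASCII
    text made of whitespace-separated hexadecimal tokens.
    """
    padded = synchstring + "=" * (-len(synchstring) % 4)  # restore stripped padding
    payload = zlib.decompress(base64.urlsafe_b64decode(padded))
    return [int(token, 16) for token in payload.decode("ascii").split()]


def parse_chunks(values):
    """Split 'Fs Nch <Nzs Zi dz_1 ... dz_Nzs> ...' into per-chunk positions.

    Assumption: Zi is an absolute sample index and each dz_n is the distance
    to the next zero crossing, so positions are a running sum.
    """
    fs, n_chunks = values[0], values[1]
    pos = 2
    chunks = []
    for _ in range(n_chunks):
        n_zs, zi = values[pos], values[pos + 1]
        deltas = values[pos + 2 : pos + 2 + n_zs]
        crossings = []
        current = zi
        for dz in deltas:
            current += dz
            crossings.append(current)
        chunks.append((zi, crossings))
        pos += 2 + n_zs
    return fs, chunks


def encode_for_demo(values):
    """Build a synthetic synchstring under the same assumptions (demo only)."""
    text = " ".join(format(v, "x") for v in values).encode("ascii")
    return base64.urlsafe_b64encode(zlib.compress(text)).decode("ascii")


# Synthetic data: Fs=22050, 2 chunks, with 3 and 2 zero crossings respectively.
demo_values = [22050, 2, 3, 100, 5, 7, 2, 2, 5000, 10, 4]
fs, chunks = parse_chunks(decode_synchstring(encode_for_demo(demo_values)))
print(fs, chunks)
```

The round trip through `encode_for_demo` only shows that the parser is self-consistent under the stated assumptions; a real synchstring may differ in any of them.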

## Why is this useful?

Each mp3 decoder (e.g. mpg123, ffmpeg, quicktime, lame) has its own approach to decoding and to correcting errors such as corrupt frames. This leads to slight variations in the output waveform. In particular, the beginning of the waveform may be shifted by a small yet noticeable time offset (e.g. tens of milliseconds). Unfortunately that offset is somewhat signal dependent, and therefore cannot be predicted from the decoder name and version alone.

## How it works

Synchdata first decodes the synchstring into 3 lists of zero-crossing sample locations, as extracted by the Echo Nest analyzer (we use mpg123), i.e. 1 second worth of audio at the beginning, the middle and the end of the file. It then extracts zero-crossings in the same 3 locations from the proposed 1-second chunks of audio: locally decoded mp3, converted to mono and resampled at 22050 Hz. It finally correlates the zero-crossing data as described in the synchstring with that of the proposed waveform, and retains the optimal sample-accurate alignment (a time offset returned in seconds) for each of the chunks.
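
The alignment step can be sketched in a few lines of Python. This is a brute-force illustration on synthetic noise, not the code in `synchdata.py`: the helper names `zero_crossings` and `best_lag`, the sign-change definition of a zero crossing, and the "count coinciding crossings" score are all assumptions made for the example.

```python
import random


def zero_crossings(samples):
    """Indices n where the sign differs between samples[n - 1] and samples[n]."""
    return [n for n in range(1, len(samples))
            if (samples[n - 1] < 0) != (samples[n] < 0)]


def best_lag(reference, candidate, max_lag):
    """Lag (in samples) at which most candidate crossings coincide with reference ones.

    A positive lag means the candidate waveform is delayed relative to the reference.
    """
    ref = set(reference)
    best, best_score = 0, -1
    for lag in range(-max_lag, max_lag + 1):
        score = sum(1 for z in candidate if (z - lag) in ref)
        if score > best_score:
            best, best_score = lag, score
    return best


FS = 22050  # sampling rate used by the synchstring
rng = random.Random(0)
reference_audio = [rng.uniform(-1.0, 1.0) for _ in range(2000)]

# Simulate a decoder that delays the waveform by 37 samples.
delay = 37
candidate_audio = [0.5] * delay + reference_audio[:-delay]

lag = best_lag(zero_crossings(reference_audio),
               zero_crossings(candidate_audio), max_lag=100)
offset_seconds = lag / FS
print(lag, offset_seconds)
```

With this sign convention a positive offset means the local waveform lags the reference, which matches adding the offset to the analysis timing.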

If the 3 time offsets are identical, then the offset can be trusted throughout the file, and added to any of the timing information provided in the JSON analysis data (e.g. segment onsets, beats, bars). If there's a mismatch between some of the computed offsets, then the analysis data is misaligned with the waveform somewhere, and sample accuracy isn't guaranteed. This can occur when the decoder tries to cope with a corrupt mp3 frame by inserting some silence, inserting some bogus audio, or discarding the frame, resulting in discontinuities and time misalignments.
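
A client could apply this rule roughly as follows. The sketch is hypothetical: `summarize_offsets` and `shift_analysis` are illustrative names, the half-sample tolerance is an arbitrary choice, and the analysis layout (lists of events carrying a `start` field under `segments`, `beats` and `bars`) is assumed rather than taken from this repository.

```python
def summarize_offsets(offsets, tolerance=0.5 / 22050):
    """Return (consistent, spread) for per-chunk offsets given in seconds.

    The default tolerance of half a sample at 22050 Hz is an assumption.
    """
    spread = max(offsets) - min(offsets)
    return spread <= tolerance, spread


def shift_analysis(analysis, offset, keys=("segments", "beats", "bars")):
    """Return a copy of the analysis dict with `offset` added to every 'start'."""
    shifted = dict(analysis)
    for key in keys:
        if key in analysis:
            shifted[key] = [dict(event, start=event["start"] + offset)
                            for event in analysis[key]]
    return shifted


# Three identical offsets: safe to shift the timing data.
ok, spread = summarize_offsets([0.03478, 0.03478, 0.03478])
analysis = {"beats": [{"start": 1.0, "duration": 0.5}]}
if ok:
    analysis = shift_analysis(analysis, 0.03478)

# Mismatched offsets: report the worst-case misalignment instead.
bad_ok, bad_spread = summarize_offsets([0.01197, -0.01415, -0.01415])
print(ok, analysis["beats"][0]["start"], bad_ok, round(bad_spread, 5))
```

Reporting the spread rather than a single offset makes the mismatch case explicit, so the caller can decide whether an approximate alignment is still acceptable.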

## Speed

The synchdata sample code is provided as an example of how to handle the Echo Nest synchstring and, by extension, data synchronization. It is not optimized for speed, although this will be improved in future updates. For instance, the convolution function could be significantly accelerated with the [FFT-based algorithm](https://ccrma.stanford.edu/~jos/mdft/Convolution_Theorem.html "FFT Convolution"). If speed is a concern, or if only a partial waveform is available (e.g. when streaming audio), one can compute only the initial offset and assume it to be accurate until the others become available. Currently, the maximum retrievable offset is +/- 500 ms. However, we almost never run into offsets beyond +/- 100 ms, so computation can be reduced by correlating only 200 ms worth of zero crossings.
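
To make the FFT remark concrete, here is a minimal sketch of cross-correlating two zero-crossing indicator vectors through the correlation theorem. It uses a small pure-Python radix-2 FFT so that it runs without dependencies; in practice a library FFT such as `numpy.fft` would be the natural choice. The function names and the indicator-vector representation are assumptions for illustration, not the approach used in the sample code.

```python
import cmath


def fft(x, inverse=False):
    """Recursive radix-2 FFT (unnormalised); len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft(x[0::2], inverse)
    odd = fft(x[1::2], inverse)
    sign = 1 if inverse else -1
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


def best_lag_fft(reference, candidate, length, max_lag):
    """Lag maximising sum_n ref[n] * cand[n + lag], via the correlation theorem.

    `reference` and `candidate` are lists of zero-crossing indices below
    `length`; `max_lag` should be smaller than `length`.
    """
    size = 1
    while size < 2 * length:  # zero-pad so circular wrap-around cannot alias
        size *= 2
    ref = [0.0] * size
    cand = [0.0] * size
    for z in reference:
        ref[z] = 1.0
    for z in candidate:
        cand[z] = 1.0
    spectrum = [a.conjugate() * b for a, b in zip(fft(ref), fft(cand))]
    corr = [c.real / size for c in fft(spectrum, inverse=True)]
    # Negative lags live at the end of the circular result (index lag % size).
    return max(range(-max_lag, max_lag + 1), key=lambda lag: corr[lag % size])


reference_zc = [3, 10, 11, 25, 40]
candidate_zc = [z + 5 for z in reference_zc]  # candidate delayed by 5 samples
lag = best_lag_fft(reference_zc, candidate_zc, length=64, max_lag=20)
print(lag)
```

The cost is O(N log N) in the padded length rather than proportional to the number of lags times the number of crossings, which matters most when the full +/- 500 ms search range is used.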

## C++

Compile the sample program using: make

Test the program with the proposed waveform stored in raw binary format for 3 different decoders:

    $ ./synchdata ../data/billie.mpg123.22050.mono.raw ../data/billie.synchstring.txt
    Offset = 0.00000 seconds

Note that since the synchstring was generated with the same version of mpg123, there's an exact match.

    $ ./synchdata ../data/billie.ffmpeg.22050.mono.raw ../data/billie.synchstring.txt
    Warning: Mismatch detected!
    Found offsets 0.01197 -0.01415 -0.01415 seconds

In this case, an error occurred in the first section of the file. There will be a misalignment up to 0.01415 + 0.01197 = 0.02612 seconds or ~26 ms.

    $ ./synchdata ../data/billie.quicktime.22050.mono.raw ../data/billie.synchstring.txt
    Offset = 0.03478 seconds

The offset here is consistent and can be trusted. The client program should add this constant offset to the timing data in the JSON file.

## Python

Assuming the numpy module is installed (base64 and zlib are part of the Python standard library), run the test examples like this:

    $ python synchdata.py ../data/billie.mpg123.22050.mono.raw ../data/billie.synchstring.txt
    Offset = 0.00000 seconds

    $ python synchdata.py ../data/billie.ffmpeg.22050.mono.raw ../data/billie.synchstring.txt
    Warning: Mismatch detected!
    Found offsets 0.01197 -0.01415 -0.01415 seconds

    $ python synchdata.py ../data/billie.quicktime.22050.mono.raw ../data/billie.synchstring.txt
    Offset = 0.03478 seconds

See comments in the C++ section.

## FAQ

Q: Can I use this yet?

A: No. The current API doesn't return synchstrings yet.
150 changes: 150 additions & 0 deletions c++/Base64.cxx
@@ -0,0 +1,150 @@
/*
base64.cpp and base64.h
Copyright (C) 2004-2008 René Nyffenegger
This source code is provided 'as-is', without any express or implied
warranty. In no event will the author be held liable for any damages
arising from the use of this software.
Permission is granted to anyone to use this software for any purpose,
including commercial applications, and to alter it and redistribute it
freely, subject to the following restrictions:
1. The origin of this source code must not be misrepresented; you must not
claim that you wrote the original source code. If you use this source code
in a product, an acknowledgment in the product documentation would be
appreciated but is not required.
2. Altered source versions must be plainly marked as such, and must not be
misrepresented as being the original source code.
3. This notice may not be removed or altered from any source distribution.
René Nyffenegger rene.nyffenegger@adp-gmbh.ch
// Changed spacing, modified some parens for readability. Modified a variable name. JRS
Tristan Jehan: 06/10/2011 -- added support for base64_url decoding
*/

#include "Base64.h"
#include <cctype>    // isalnum
#include <iostream>

static const std::string base64_chars =
"ABCDEFGHIJKLMNOPQRSTUVWXYZ"
"abcdefghijklmnopqrstuvwxyz"
"0123456789+/";

static const std::string base64_chars_url =
"ABCDEFGHIJKLMNOPQRSTUVWXYZ"
"abcdefghijklmnopqrstuvwxyz"
"0123456789-_";


static inline bool is_base64(unsigned char c) { return (isalnum(c) || (c == '+') || (c == '/'));}
static inline bool is_base64_url(unsigned char c) { return (isalnum(c) || (c == '-') || (c == '_'));}


std::string base64_encode(unsigned char const* bytes_to_encode, unsigned int in_len, bool url)
{
    std::string ret;
    int i = 0;
    int j = 0;
    unsigned char char_array_3[3];
    unsigned char char_array_4[4];

    while (in_len--)
    {
        char_array_3[i++] = *(bytes_to_encode++);
        if (i == 3)
        {
            char_array_4[0] = (char_array_3[0] & 0xfc) >> 2;
            char_array_4[1] = ((char_array_3[0] & 0x03) << 4) + ((char_array_3[1] & 0xf0) >> 4);
            char_array_4[2] = ((char_array_3[1] & 0x0f) << 2) + ((char_array_3[2] & 0xc0) >> 6);
            char_array_4[3] = char_array_3[2] & 0x3f;

            for (i = 0; i < 4; i++)
            {
                if (url)
                    ret += base64_chars_url[char_array_4[i]];
                else
                    ret += base64_chars[char_array_4[i]];
            }
            i = 0;
        }
    }

    if (i)
    {
        for (j = i; j < 3; j++)
            char_array_3[j] = '\0';

        char_array_4[0] = ((char_array_3[0] & 0xfc) >> 2);
        char_array_4[1] = ((char_array_3[0] & 0x03) << 4) + ((char_array_3[1] & 0xf0) >> 4);
        char_array_4[2] = ((char_array_3[1] & 0x0f) << 2) + ((char_array_3[2] & 0xc0) >> 6);
        char_array_4[3] = ((char_array_3[2] & 0x3f));

        for (j = 0; j < (i + 1); j++)
        {
            if (url)
                ret += base64_chars_url[char_array_4[j]];
            else
                ret += base64_chars[char_array_4[j]];
        }

        while (i++ < 3)
            ret += '=';
    }

    return ret;
}

std::string base64_decode(std::string const& encoded_string, bool url)
{
    int in_len = encoded_string.size();
    int i = 0;
    int j = 0;
    int in = 0;
    unsigned char char_array_4[4], char_array_3[3];
    std::string ret;

    // Select the standard or the URL-safe alphabet.
    const std::string this_base64_chars = url ? base64_chars_url : base64_chars;

    // Consume input until padding ('=') or a character outside the alphabet.
    while (in_len-- && encoded_string[in] != '=' && (url ? is_base64_url(encoded_string[in]) : is_base64(encoded_string[in])))
    {
        char_array_4[i++] = encoded_string[in];
        in++;
        if (i == 4)
        {
            // Map each character to its 6-bit value.
            for (i = 0; i < 4; i++)
                char_array_4[i] = this_base64_chars.find(char_array_4[i]);

            char_array_3[0] = (char_array_4[0] << 2) + ((char_array_4[1] & 0x30) >> 4);
            char_array_3[1] = ((char_array_4[1] & 0xf) << 4) + ((char_array_4[2] & 0x3c) >> 2);
            char_array_3[2] = ((char_array_4[2] & 0x3) << 6) + char_array_4[3];

            for (i = 0; i < 3; i++)
                ret += char_array_3[i];
            i = 0;
        }
    }

    // Handle a trailing group of 2 or 3 characters.
    if (i)
    {
        for (j = i; j < 4; j++)
            char_array_4[j] = 0;

        // Index with j here: using i would leave the valid characters unmapped.
        for (j = 0; j < 4; j++)
            char_array_4[j] = this_base64_chars.find(char_array_4[j]);

        char_array_3[0] = ((char_array_4[0] << 2)) + ((char_array_4[1] & 0x30) >> 4);
        char_array_3[1] = ((char_array_4[1] & 0xf) << 4) + ((char_array_4[2] & 0x3c) >> 2);
        char_array_3[2] = ((char_array_4[2] & 0x3) << 6) + char_array_4[3];

        for (j = 0; j < (i - 1); j++)
            ret += char_array_3[j];
    }

    return ret;
}
9 changes: 9 additions & 0 deletions c++/Base64.h
@@ -0,0 +1,9 @@
#ifndef BASE64_H
#define BASE64_H

#include <string>

std::string base64_encode(unsigned char const* , unsigned int len, bool url);
std::string base64_decode(std::string const& s, bool url);

#endif
19 changes: 19 additions & 0 deletions c++/Makefile
@@ -0,0 +1,19 @@
UNAME := $(shell uname -s)
CXX=g++
CC=gcc
ARCH=`uname -m`
#OPTFLAGS=-g -O0
OPTFLAGS=-O3 -DNDEBUG
CXXFLAGS=-Wall -fPIC $(OPTFLAGS)
LDFLAGS=-L/usr/local/lib -lz -lpthread $(OPTFLAGS)
PROG=synchdata

main: $(MODULES) $(PROG).o
	# Libraries are listed after the objects so that -lz resolves on linkers using --as-needed.
	$(CXX) Base64.cxx $(PROG).o $(LDFLAGS) -o ./$(PROG)

%.o: %.cxx %.h
	$(CXX) $(CXXFLAGS) -c -o $@ $<

clean:
	rm -f *.o
	rm -f $(PROG)
