Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Compare two fingerprints. #18

Open
avatar29A opened this issue Oct 30, 2013 · 20 comments
Open

Compare two fingerprints. #18

avatar29A opened this issue Oct 30, 2013 · 20 comments

Comments

@avatar29A
Copy link

Another question. Maybe offtopic, but still. I have two identical tracks from different sources. Physically, the tracks are different (possibly encoding compression). I get their fingerprints. They are of course different. Is it possible to get some distance between fingerprints to compare their similarities. If so, in which direction to look? Thank you.

@lalinsky
Copy link
Collaborator

See https://groups.google.com/d/msg/acoustid/Uq_ASjaq3bw/kLreyQgxKmgJ for a general explanation, https://gist.github.com/lalinsky/1132166 for a very simple example and https://bitbucket.org/acoustid/acoustid-server/src/efb787c16ea1a0f6daf38611d12c85376d971b08/postgresql/acoustid_compare.c?at=master#cl-119 for a more complicated example.

Feel free to ask more questions on the acoustid mailing list or on IRC in #acoustid on freenode.

@sampsyo
Copy link
Member

sampsyo commented Oct 30, 2013

Cool; thanks for the links, @lalinsky. Would it make sense to add this bitwise comparison process to the library for reuse? I'll leave this issue open as a reminder to explore that possibility.

@avatar29A
Copy link
Author

Thank you, cool links!

@lalinsky
Copy link
Collaborator

It probably could be added. The problem I had with adding it directly to Chromaprint is that the simple version is not useful for me at AcoustID. I need to handle various exceptional cases, where simple average of the bit differences does not work well enough, and I have the feeling that the more complicated version is too specific to AcoustID (e.g. identifying only full files, not short snippets, etc.).

So I always resorted to only posting examples, not an actual reusable code.

@sampsyo
Copy link
Member

sampsyo commented Oct 31, 2013

That makes sense; thanks again for explaining. I'll reopen this ticket as a reminder to explore adding something simple based on your example.

@naggie
Copy link

naggie commented Apr 29, 2015

def compare_b64_fingerprint(a,b):
    'returns inverse bit error rate in percent'
    a = b64decode(a)
    b = b64decode(b)

    matching_bits  = 0
    different_bits = 0

    if len(b) != len(a):
        # trim to shortest
        shortest = min(len(a),len(b))
        longest  = max(len(a),len(b))

        a = a[:shortest]
        b = b[:shortest]

        # extra bits count as different
        different_bits = 8*(longest-shortest)

    for i in range( len(a) ):
        byte_a = ord( a[i:i+1] )
        byte_b = ord( b[i:i+1] )

        for x in range(8):
            bit_a = byte_a >> x & 1
            bit_b = byte_b >> x & 1

            if (bit_a == bit_b):
                matching_bits  +=1
            else:
                different_bits +=1

    total_bits = different_bits + matching_bits

    return float(matching_bits) / float(total_bits)

This is what I used in a python library that simply used the executable. I put it here in case anyone wants it.

@jimjibone

@lalinsky
Copy link
Collaborator

I don't think that code does what you expect. What are the inputs to the function? For comparing with fingerprints, you need an list of 32-bit integers. Here you are taking some base64-encoded string, decode it and then work with it one byte at a time.

Also, this kind of bit manipulations in Python are very slow. At the very least you should use an 8-bit popcount lookup table, e.g. https://gist.github.com/lalinsky/1132166

@naggie
Copy link

naggie commented Apr 29, 2015

It does exactly what I expect. Base64 strings were the input, generated with a (custom) executable, not the library. Not optimal but it worked bitwise.

Thanks for the link, I'll keep it in mind.

@lalinsky
Copy link
Collaborator

Oh, so these are not chromaprint fingerprints? In that case, it does not make much sense to comment here.

@naggie
Copy link

naggie commented Apr 30, 2015

They are base64 encoded chromaprints. Produced by an fpcalc modified to produce base64 encoded output instead of a series of ints....

The encoding is irrelevant, the method used to calculate BER is relevant here.

@lalinsky
Copy link
Collaborator

Well, I clearly can't convince you, but the code you are using is not comparing the audio fingerprints, only their compressed versions. If it works for you, it's just by luck. What you are doing is like taking two similar text files, compressing them with zip and then comparing the zip files. Even though the original text files were similar, the zip files are completely different.

@naggie
Copy link

naggie commented Apr 30, 2015

What? No, I have 2 regular fingerprints which are base64 encoded in a completely separate custom executable. I then run the executable in a python subprocess, and run this on the output.

I'm not comparing the raw audio files myself. The custom executable is generating a fingerprint for each song.The chromaprint is generated by this executable. What's not to get?

@lalinsky
Copy link
Collaborator

lalinsky commented May 8, 2015

I'm not sure why you posted the code example here, because if it's comparing output of your own program, it's not of much use who use standard acoustid/chromaprint tools.

@naggie
Copy link

naggie commented May 8, 2015

The algorithm for the bit error rate calculation is relevant here, it can be applied to any program that uses chromaprints -- if you ignore the bit that converts from base64.

@shidarin
Copy link

This would be really useful in pyacoustid.

@salamanders
Copy link

This result came up in my searching, I thought I'd add where I ended up in case it helps others.

  1. Cache a list of ints for each mp3 using fpcalc -raw -signed
  2. M^2 comparisons using kotlin:
fun chromaDistance(cl0: List<Int>, cl1: List<Int>): Double = cl0.zip(cl1).map { (c0, c1) ->
    (c0 xor c1).countOneBits()
}.also {
    require(it.isNotEmpty())
}.average()

And it felt like matches < 10.0 started to find duplicates.

@geri777
Copy link

geri777 commented Jul 15, 2020

I found this Python library which utilizes fpcalc and does a great job:
https://medium.com/@shivama205/audio-signals-comparison-23e431ed2207

@salamanders
Copy link

I found this Python library which utilizes fpcalc and does a great job:
https://medium.com/@shivama205/audio-signals-comparison-23e431ed2207

AH! Offsets. I forgot offsets. Dangit. Thank you for the link!

@bindestriche
Copy link
Contributor

I took a stab at it. Decompressed the fingerprints, as @lalinsky pointed out. I tried to port acoustid_compare.c . As such, I lay no claim to intellectual property to the code below, if it's correct feel free to include it in this lib.

import numpy as np
import acoustid
ACOUSTID_MAX_BIT_ERROR = 2
ACOUSTID_MAX_ALIGN_OFFSET = 120

def popcount(x):
    return bin(x).count('1')

def match_fingerprints(a, b):
    asize = len(a)
    bsize = len(b)
    numcounts = asize + bsize + 1
    counts = np.zeros(numcounts, dtype=int)

    for i in range(asize):
        jbegin = max(0, i - ACOUSTID_MAX_ALIGN_OFFSET)
        jend = min(bsize, i + ACOUSTID_MAX_ALIGN_OFFSET)
        for j in range(jbegin, jend):
            biterror = popcount(a[i] ^ b[j])
            if biterror <= ACOUSTID_MAX_BIT_ERROR:
                offset = i - j + bsize
                counts[offset] += 1

    topcount = counts.max()
    return topcount / min(asize, bsize)

def compare_fingerprints(a, b):
    import base64
    # pad
    from acoustid import chromaprint
    a=chromaprint.decode_fingerprint(a)[0]
    b=chromaprint.decode_fingerprint(b)[0]
    a = np.array(a, dtype=np.int32)
    b = np.array(b, dtype=np.int32)
    return match_fingerprints(a, b)

#use
_, a = acoustid.fingerprint_file(file_path1)
_, b = acoustid.fingerprint_file(file_path2)
compare_fingerprints(a,b)

@albertopasqualetto
Copy link
Contributor

It is now implemented in the library, so I think this issue can be closed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

9 participants