This repository has been archived by the owner on Mar 28, 2022. It is now read-only.

endianness of murmurhash #23

Open
bwhitman opened this issue Jul 11, 2011 · 6 comments
Comments

@bwhitman
from ashwin:

The MurmurHash2 implementation in use gives different results on big-endian and little-endian machines. It looks like fixing this might break codegen backward compatibility. Could someone explain whether this was by design or a bug? How will matching work if the codes are different?
Edit Fingerprint.cxx and replace MurmurHash2 with:

//-----------------------------------------------------------------------------
// MurmurHashNeutral2, by Austin Appleby

// Same as MurmurHash2, but endian- and alignment-neutral.
// Half the speed though, alas.

unsigned int MurmurHash2 ( const void * key, int len, unsigned int seed )
{
    const unsigned int m = 0x5bd1e995;
    const int r = 24;

    unsigned int h = seed ^ len;

    const unsigned char * data = (const unsigned char *)key;

    while(len >= 4)
    {
        unsigned int k;

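        k  = (unsigned int)data[0];
        k |= (unsigned int)data[1] << 8;
        k |= (unsigned int)data[2] << 16;
        k |= (unsigned int)data[3] << 24;  // cast first: a byte promoted to int can overflow when shifted by 24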

        k *= m; 
        k ^= k >> r; 
        k *= m;

        h *= m;
        h ^= k;

        data += 4;
        len -= 4;
    }

    switch(len)
    {
    case 3: h ^= data[2] << 16;
    case 2: h ^= data[1] << 8;
    case 1: h ^= data[0];
            h *= m;
    };

    h ^= h >> 13;
    h *= m;
    h ^= h >> 15;

    return h;
} 

Looks like it is the same on little endian machines:

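#include <stdio.h>

int main()
{
  char a[100];  // left uninitialized as in the original test; both hashes read the same bytes
  printf("%d, %d\n", MurmurHash2(a, sizeof(a), 2323), MurmurHashNeutral2(a, sizeof(a), 2323));
  return 0;
}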
Output:
-330669574, -330669574

Machine used was 64bit Intel.
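
A minimal sketch of why the two variants can disagree across architectures, assuming the non-neutral MurmurHash2 reads each 4-byte block as a native word (that code is not shown in this thread). The helper names `load_native` and `load_le` are hypothetical and exist only for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: read 4 bytes the way a native word load sees them.
   The result depends on the host byte order. */
static uint32_t load_native(const unsigned char *p)
{
    uint32_t k;
    assert(p != NULL);
    memcpy(&k, p, sizeof k);
    return k;
}

/* Hypothetical helper: assemble 4 bytes as a little-endian word,
   byte by byte, as the neutral variant above does. Host independent. */
static uint32_t load_le(const unsigned char *p)
{
    assert(p != NULL);
    return (uint32_t)p[0]
         | ((uint32_t)p[1] << 8)
         | ((uint32_t)p[2] << 16)
         | ((uint32_t)p[3] << 24);
}
```

For the bytes 01 02 03 04, `load_le` returns 0x04030201 on any host. `load_native` returns the same value on a little-endian host, which is consistent with the matching output above, but 0x01020304 on a big-endian host, so every mixed block and therefore the final hash would differ there.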

@alastair
Contributor

MurmurHashNeutral2 is twice as slow as the little-endian only version. We should only use it if we detect a big-endian architecture. https://sites.google.com/site/murmurhash/
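
One way the "only on big-endian" idea could look, as a rough sketch rather than anything from the codegen source: keep the fast word load and byte-swap it when the host turns out to be big-endian. The names `host_is_little_endian`, `swap32`, and `read_block_le` are made up for this example, and the check here is done at run time; a build-time check would work too.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Run-time probe: on a little-endian host the lowest-addressed byte
   of the value 1 is 1. */
static int host_is_little_endian(void)
{
    const uint32_t probe = 1;
    unsigned char first;
    memcpy(&first, &probe, 1);
    return first == 1;
}

/* Reverse the byte order of a 32-bit word. */
static uint32_t swap32(uint32_t v)
{
    return (v >> 24)
         | ((v >> 8) & 0x0000ff00u)
         | ((v << 8) & 0x00ff0000u)
         | (v << 24);
}

/* Read one 4-byte block so the result equals the little-endian
   interpretation on any host: word load, then swap only if needed. */
static uint32_t read_block_le(const void *p)
{
    uint32_t k;
    assert(p != NULL);
    memcpy(&k, p, sizeof k);
    return host_is_little_endian() ? k : swap32(k);
}
```

With something like this in the hash's main loop, little-endian hosts pay only for a predictable branch, while big-endian hosts would produce the same block values as little-endian ones.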

@bwhitman
Author

But does that matter in practice when codegenning?

(mobile)

On Jul 11, 2011, at 1:47 PM, alastair <reply@reply.github.com> wrote:

MurmurHashNeutral2 is twice as slow as the little-endian only version. We should only use it if we detect a big-endian architecture. https://sites.google.com/site/murmurhash/

@alastair
Contributor

True - the difference is less than 1/10th of a second over a 5-minute song.

@alastair
Contributor

We need to dig into the whole endianness question a bit more - 16-bit LE PCM won't read correctly on a BE machine anyway, so there's a chance that more parts of the codegen need tweaking.
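
For the PCM side, a sketch of reading 16-bit little-endian samples in a way that does not depend on the host byte order. This is not taken from the codegen; `pcm16le_sample` and `pcm16le_decode` are hypothetical names, and the input is assumed to be a raw byte buffer of interleaved 16-bit LE samples.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Decode one signed 16-bit little-endian sample from two bytes. */
static int16_t pcm16le_sample(const unsigned char *p)
{
    /* Low byte first, then high byte, regardless of host order. */
    uint16_t u = (uint16_t)(p[0] | (p[1] << 8));
    /* Map 0x8000..0xFFFF to the negative range explicitly. */
    return (int16_t)(u >= 0x8000u ? (int32_t)u - 0x10000 : (int32_t)u);
}

/* Decode n_samples samples from a raw byte buffer into out. */
static void pcm16le_decode(const unsigned char *bytes, size_t n_samples,
                           int16_t *out)
{
    assert(bytes != NULL && out != NULL);
    for (size_t i = 0; i < n_samples; i++)
        out[i] = pcm16le_sample(bytes + 2 * i);
}
```

Reading the file into an `int16_t` array directly is what would go wrong on a big-endian machine; decoding from bytes like this gives the same sample values everywhere.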

@ludflu

ludflu commented Dec 29, 2011

I think we're running into this problem - we codegenned on x86 to populate our database, then got our Android app to codegen ten-second samples from the mic - and nothing matches. I'm thinking this is because ARM is little endian and x86 is big? How could this work in the iOS sample? Or is that intended for the desktop and not iPhones?
