endianness of murmurhash #23

bwhitman · 2011-07-11T17:20:05Z

from ashwin:

The MurmurHash2 implementation used here gives different results on big-endian and little-endian machines. If this is fixed, it might break codegen backward compatibility. Could someone explain whether this was by design or a bug? How will matching work if the codes are different?
Edit Fingerprint.cxx, replace MurmurHash2 with:

//-----------------------------------------------------------------------------
// MurmurHashNeutral2, by Austin Appleby

// Same as MurmurHash2, but endian- and alignment-neutral.
// Half the speed though, alas.

unsigned int MurmurHash2 ( const void * key, int len, unsigned int seed )
{
    const unsigned int m = 0x5bd1e995;
    const int r = 24;

    unsigned int h = seed ^ len;

    const unsigned char * data = (const unsigned char *)key;

    while(len >= 4)
    {
        unsigned int k;

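        k  = data[0];
        k |= (unsigned int)data[1] << 8;
        k |= (unsigned int)data[2] << 16;
        k |= (unsigned int)data[3] << 24;  // cast first: shifting a promoted int into the sign bit is undefined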

        k *= m; 
        k ^= k >> r; 
        k *= m;

        h *= m;
        h ^= k;

        data += 4;
        len -= 4;
    }

    switch(len)
    {
    case 3: h ^= data[2] << 16;
    case 2: h ^= data[1] << 8;
    case 1: h ^= data[0];
            h *= m;
    };

    h ^= h >> 13;
    h *= m;
    h ^= h >> 15;

    return h;
}

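It looks like both versions give the same result on little-endian machines:

#include <stdio.h>

// Assumes the original MurmurHash2 and the neutral version above
// (under its original name, MurmurHashNeutral2) are both compiled in.
int main()
{
  char a[100];  // left uninitialized: both calls hash the same arbitrary bytes
  printf("%d, %d\n", (int)MurmurHash2(a, sizeof(a), 2323), (int)MurmurHashNeutral2(a, sizeof(a), 2323));
  return 0;
}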

Output:
-330669574, -330669574

The machine used was 64-bit Intel.
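A self-contained way to repeat this check with a deterministic input is sketched below. The helper names `load_native`, `load_le` and `murmur2_with` are made up for illustration and are not part of the codegen. The native load uses `memcpy` instead of the pointer cast found in the original MurmurHash2, which is an assumption made here to sidestep alignment issues.

```c
#include <stdint.h>
#include <string.h>

/* Native load: copies 4 bytes as stored, so the value depends on host byte order. */
static uint32_t load_native(const unsigned char *p)
{
    uint32_t k;
    memcpy(&k, p, sizeof k);
    return k;
}

/* Neutral load: always assembles the 4 bytes as a little-endian word. */
static uint32_t load_le(const unsigned char *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* MurmurHash2 mixing with the 4-byte load passed in (hypothetical helper). */
static uint32_t murmur2_with(uint32_t (*load)(const unsigned char *),
                             const void *key, int len, uint32_t seed)
{
    const uint32_t m = 0x5bd1e995;
    const int r = 24;
    uint32_t h = seed ^ (uint32_t)len;
    const unsigned char *data = (const unsigned char *)key;

    while (len >= 4) {
        uint32_t k = load(data);
        k *= m; k ^= k >> r; k *= m;
        h *= m; h ^= k;
        data += 4;
        len -= 4;
    }

    switch (len) {               /* 0 to 3 trailing bytes */
    case 3: h ^= (uint32_t)data[2] << 16; /* fall through */
    case 2: h ^= (uint32_t)data[1] << 8;  /* fall through */
    case 1: h ^= data[0];
            h *= m;
    }

    h ^= h >> 13;
    h *= m;
    h ^= h >> 15;
    return h;
}
```

On a little-endian host, `murmur2_with(load_native, buf, n, seed)` and `murmur2_with(load_le, buf, n, seed)` should agree for any input. On a big-endian host only the `load_le` variant would reproduce the codes generated on little-endian machines.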

alastair · 2011-07-11T17:47:33Z

MurmurHashNeutral2 runs at half the speed of the little-endian-only version. We should only use it if we detect a big-endian architecture. https://sites.google.com/site/murmurhash/
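One way to pay the cost only on big-endian builds is to pick the 4-byte load at compile time. This is only a sketch: it assumes the compiler predefines `__BYTE_ORDER__` (GCC and Clang do), and the names `HASH_HOST_IS_LITTLE_ENDIAN` and `read_block32` are hypothetical, not something in the codegen. If the macro is missing, it falls back to the neutral byte-by-byte load.

```c
#include <stdint.h>
#include <string.h>

/* Assumption: __BYTE_ORDER__ is predefined (true for GCC and Clang).
   Anything else is treated as "not known to be little endian". */
#if defined(__BYTE_ORDER__) && defined(__ORDER_LITTLE_ENDIAN__) && \
    (__BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__)
#define HASH_HOST_IS_LITTLE_ENDIAN 1
#else
#define HASH_HOST_IS_LITTLE_ENDIAN 0
#endif

/* Returns the 4 bytes at p interpreted as a little-endian word on any host. */
static uint32_t read_block32(const unsigned char *p)
{
#if HASH_HOST_IS_LITTLE_ENDIAN
    /* Fast path: the in-memory layout already matches. */
    uint32_t k;
    memcpy(&k, p, sizeof k);
    return k;
#else
    /* Neutral path: assemble the word byte by byte. */
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
#endif
}
```

Either branch yields the same value for the same bytes, so codes generated on both kinds of machine would stay comparable.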

bwhitman · 2011-07-11T17:49:05Z

But does that matter in practice when codegenning?

alastair · 2011-07-11T18:08:07Z

True, it costs less than 1/10th of a second over a 5-minute song.

alastair · 2011-07-15T17:42:19Z

We need to dig into the whole endianness question a bit more: 16-bit LE PCM won't be read correctly on a BE machine anyway, so there's a chance that more parts of the codegen need tweaking.
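For the PCM side, a byte-order-independent read could look roughly like the sketch below. The function names are hypothetical, and the scaling to floats in [-1, 1) is an assumption made for illustration, not a statement about how the codegen ingests audio.

```c
#include <stddef.h>
#include <stdint.h>

/* Decode one signed 16-bit little-endian sample from two raw bytes.
   Works the same on little- and big-endian hosts. */
static int16_t pcm16le_sample(const unsigned char *p)
{
    unsigned int u = (unsigned int)p[0] | ((unsigned int)p[1] << 8); /* 0..65535 */
    /* Sign-extend without relying on implementation-defined casts. */
    int v = (int)(u ^ 0x8000u) - 0x8000;                             /* -32768..32767 */
    return (int16_t)v;
}

/* Convert n_samples little-endian 16-bit samples to floats in [-1, 1). */
static void pcm16le_to_float(const unsigned char *bytes, size_t n_samples, float *out)
{
    for (size_t i = 0; i < n_samples; i++)
        out[i] = (float)pcm16le_sample(bytes + 2 * i) / 32768.0f;
}
```

Reading the file into an `int16_t` array with `fread` and using the values directly is the pattern that would break on a big-endian host, since the two bytes of each sample would be swapped.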

alastair · 2011-07-15T17:44:28Z

For my reference, material on detecting endianness:
http://stackoverflow.com/questions/2100331/c-macro-definition-to-determine-big-endian-or-little-endian-machine/2100391#2100391
http://www.gnu.org/s/hello/manual/gnulib/endian_002eh.html
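Besides the macro approaches in those links, a runtime probe is a common fallback when no compile-time macro is available. A minimal sketch, with the hypothetical name `host_is_little_endian`:

```c
#include <stdint.h>
#include <string.h>

/* Returns 1 if the least significant byte of a 32-bit word is stored first
   (little endian), 0 otherwise. */
static int host_is_little_endian(void)
{
    const uint32_t probe = 1;
    unsigned char first;
    memcpy(&first, &probe, 1);   /* inspect the lowest-addressed byte */
    return first == 1;
}
```

The result could be checked once at startup to choose between the fast and the neutral hash. A compile-time check is usually preferable where it exists, because it lets the compiler drop the unused branch.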

ludflu · 2011-12-29T02:59:08Z

I think we're running into this problem: we codegenned on x86 to populate our database, then got our Android app to codegen ten-second samples from the mic, and nothing matches. I'm thinking this is because ARM is little endian and x86 is big? How could this work in the iOS sample? Or is that intended for the desktop and not iPhones?

birkoffe mentioned this issue Feb 21, 2017

Add support murmurhash2 for big-endian redis/redis#3823

Closed
