New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

_dictNextPower function optimize in dict.c #4776

Closed
Qinch opened this Issue Mar 20, 2018 · 13 comments

Comments

Projects
None yet
7 participants
@Qinch

Qinch commented Mar 20, 2018

I read the dict.c, _dictNextPower is implemented as follow:

static unsigned long _dictNextPower(unsigned long size) {
  unsigned long i = DICT_HT_INITIAL_SIZE;

  if (size >= LONG_MAX) return LONG_MAX + 1LU;
  while (1) {
    if (i >= size) return i;
    i *= 2;
  }
}

The following is an optimized version inspired by pow2gt function in cJSON.C

static unsigned long _dictNextPower(unsigned long size) {
  if (size <= DICT_HT_INITIAL_SIZE)
        return DICT_HT_INITIAL_SIZE;
  else if (size >= LONG_MAX)
    return LONG_MAX + 1LU;
  else {
    --size;
    size |= size >> 1;
    size |= size >> 2;
    size |= size >> 4;
    size |= size >> 8;
    size |= size >> 16;
    size |= size >> 32;
    return size + 1;
  }
}

Firstly, the optimized version has the same return as the current version

the test code is (all the test code is here)

#include <stdio.h>

/* Our hash table capability is a power of two */
#define DICT_HT_INITIAL_SIZE 4
#define LONG_MAX 0x7FFFFFFF
static unsigned long _dictNextPower(unsigned long size) {
  unsigned long i = DICT_HT_INITIAL_SIZE;

  if (size >= LONG_MAX) return LONG_MAX + 1LU;
  while (1) {
    if (i >= size) return i;
    i *= 2;
  }
}
static unsigned long _dictNextPowerX(unsigned long size) {
  if (size <= DICT_HT_INITIAL_SIZE)
    return DICT_HT_INITIAL_SIZE;
  else if (size >= LONG_MAX)
    return LONG_MAX + 1LU;
  else {
    --size;
    size |= size >> 1;
    size |= size >> 2;
    size |= size >> 4;
    size |= size >> 8;
    size |= size >> 16;
    size |= size >> 32;
    return size + 1;
  }
}
int main() {
  unsigned long In = 0;
  for (; In <= LONG_MAX; In++) {
    unsigned long ret1 = _dictNextPower(In);
    unsigned long ret2 = _dictNextPowerX(In);
    if (ret1 != ret2) printf("bad optimition\n");
  }
}

Secondly, the compare is as follow:

  1. the old version (_dictNextPower in dict.c)
  • gcc NextPower_old.c -o old
  • time ./old
    real 2m52.216s
    user 2m51.972s
    sys 0m0.000s

the test code is:

#include <stdio.h>

/* Our hash table capability is a power of two */
#define DICT_HT_INITIAL_SIZE 4
#define LONG_MAX 0x7FFFFFFF
static unsigned long _dictNextPower(unsigned long size) {
  unsigned long i = DICT_HT_INITIAL_SIZE;

  if (size >= LONG_MAX) return LONG_MAX + 1LU;
  while (1) {
    if (i >= size) return i;
    i *= 2;
  }
}
static unsigned long _dictNextPowerX(unsigned long size) {
  if (size <= DICT_HT_INITIAL_SIZE)
    return DICT_HT_INITIAL_SIZE;
  else if (size >= LONG_MAX)
    return LONG_MAX + 1LU;
  else {
    --size;
    size |= size >> 1;
    size |= size >> 2;
    size |= size >> 4;
    size |= size >> 8;
    size |= size >> 16;
    size |= size >> 32;
    return size + 1;
  }
}
int main() {
  unsigned long In = 0;
  for (; In <= LONG_MAX; In++) {
    unsigned long ret1 = _dictNextPower(In);
    if (ret1 < In) printf("bad optimition\n");
  }
}
  1. the optimized version
  • gcc NextPower_new.c -o new
  • time ./new
    real 0m23.242s
    user 0m23.196s
    sys 0m0.000s

the test code is

#include <stdio.h>

/* Our hash table capability is a power of two */
#define DICT_HT_INITIAL_SIZE 4
#define LONG_MAX 0x7FFFFFFF
static unsigned long _dictNextPower(unsigned long size) {
  unsigned long i = DICT_HT_INITIAL_SIZE;

  if (size >= LONG_MAX) return LONG_MAX + 1LU;
  while (1) {
    if (i >= size) return i;
    i *= 2;
  }
}
static unsigned long _dictNextPowerX(unsigned long size) {
  if (size <= DICT_HT_INITIAL_SIZE)
    return DICT_HT_INITIAL_SIZE;
  else if (size >= LONG_MAX)
    return LONG_MAX + 1LU;
  else {
    --size;
    size |= size >> 1;
    size |= size >> 2;
    size |= size >> 4;
    size |= size >> 8;
    size |= size >> 16;
    size |= size >> 32;
    return size + 1;
  }
}
int main() {
  unsigned long In = 0;
  for (; In <= LONG_MAX; In++) {
    unsigned long ret2 = _dictNextPowerX(In);
    if (ret2 < In) printf("bad optimition\n");
  }
}
@0xtonyxia

This comment has been minimized.

Contributor

0xtonyxia commented Mar 21, 2018

Great optimization, LGTM.
@antirez

@wendy260310

This comment has been minimized.

wendy260310 commented Mar 21, 2018

Great, this optimization is called RoundUpPowerOf2,many places use the code you post, like Java HashMap#tableSizeFor

@ccqy66

This comment has been minimized.

ccqy66 commented Mar 21, 2018

nice idea

@Qinch

This comment has been minimized.

Qinch commented Mar 22, 2018

@0xtonyxia thanks for your reply. by the way , i have another question see issue #4764 .

@wendy260310

This comment has been minimized.

wendy260310 commented Mar 22, 2018

I suggest you to submit a pull request , see how to contribute, and you can ask @soloestoy when you are not sure about your modification, He is a contributor who works at Alibaba

@WiFeng

This comment has been minimized.

WiFeng commented Mar 22, 2018

The algorithm who called RoundUpPowerOf2 is so cool and perfect. As you know, the long type length is different in the different bit system(32bit/64bit), so should be dealed with different ways.
There is a similar operation here. DaveGamble/cJSON@b4d728d

@Qinch

This comment has been minimized.

Qinch commented Mar 22, 2018

@WiFeng Thanks ,I will add the compile macro.
In fact, I considered that ILP32(int , long, pointer: type length is 4bytes in 32bits system) and LP64(long,pointer: type length is 8bytes, in 64bits system). The function _dictNextPowerX works OK both in 32bits and 64 bits Redis program. But, when in 32 bits ,the last size |= size >> 32; operation do nothing, which means a bit of loss of computational performance.

@soloestoy

This comment has been minimized.

Contributor

soloestoy commented Mar 22, 2018

Good job, the bit twiddling is so cool, and here is the explanation about the algorithm.

BTW, I wonder that if anywhere other codes can be replaced with the bit twiddling.

@Qinch

This comment has been minimized.

Qinch commented Mar 22, 2018

@soloestoy Hi, do you remember my ID? I once asked you a question about ziplist.
@0xtonyxia @wendy260310 @soloestoy 跪求指导!!
Now, I have another question. see issue #4764.
key info about #4764 as follows:
My confusion is why the code is implemented like ?
note that: value is unsigned, svalue is signed.

    if (svalue < 0) {
        if (svalue != LLONG_MIN) {
            value = -svalue;
        } else {
            value = ((unsigned long long) LLONG_MAX)+1;
        }
        negative = 1;
    } 

Instread of (they have the same result)

 if (svalue < 0) {
    value = -svalue;
    negative = 1;
}

It's known that

int tmpI= 2147483648(0x80 00 00 00) means tmpI(%d) = -2147483648;

unsigned int tmpUI = -2147483648(0x80 00 00 00) means tmpUI(%u)= 2147483648;

so unsigned int tmp = -INT_MIN;which means tmp=0x80 00 00 00(2147483648);

I think the assembler code corresponding to this function(test) like this:

#define INT_MIN -2147483648 
void test()
{
    int i = INT_MIN;
    unsigned int ui = -INT_MIN; //0 - INT_MIN
}



xorl    %eax, %eax  
movl    $-2147483648, -4(%rbp)  //i= INT_MIN;  
subl    -4(%rbp), %eax   // eax = 0 -i;
movl    %eax, -8(%rbp)  //ui =eax;
@itamarhaber

This comment has been minimized.

Contributor

itamarhaber commented Mar 22, 2018

Interesting, and reminiscent of #3833...

@Qinch

This comment has been minimized.

Qinch commented Mar 22, 2018

@soloestoy

This comment has been minimized.

Contributor

soloestoy commented Mar 23, 2018

        if (svalue != LLONG_MIN) {
            value = -svalue;
        } else {
            value = ((unsigned long long) LLONG_MAX)+1;
        }
        negative = 1;
    } 

@Qinch please notice that LLONG_MIN is not a constant, in limits.h it is defined as (-LLONG_MAX-1LL).

So, using the - operator on LLONG_MIN will cause overflow implicitly, because the - operator means ~LLONG_MIN+1, in fact ~LLONG_MIN is LLONG_MAX, now you can see where overflow happens.

BTW, just use -LLONG_MIN won't cause bug, but that is still not a good idea.

@Qinch

This comment has been minimized.

Qinch commented Mar 23, 2018

@soloestoy thanks!

@Qinch Qinch closed this Mar 23, 2018

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment