Add SCAN command #579
Conversation
Thank you Pieter, for now I put it into the 2.8 milestone, but actually the code looks so small and self-contained, and its effects so good, that maybe it's worth considering porting it back into 2.6 after some testing. I'll try to understand the code today and provide feedback. Cheers!
FYI, this segfaults with 0 arguments.
OK, great, thanks. Then let's hope that this will be released in 2.6 :-) Thanks, keep up the good work guys!
Hello @pietern! I'm starting to re-evaluate the idea of an iterator for Redis, and the first item in this task is definitely to understand your pull request and implementation better. I don't exactly understand the implementation with the reversed bits counter, however for now that's not important; I would like to understand the behavior and guarantees of your implementation. If I'm correct, this is what I get:
Am I correct? Also, about the order the keys are visited in, could you provide some info? I see that if the table is not actively rehashing, keys are visited just incrementally from the first to the last bucket. However, what happens if instead there are two tables? Last question: is there a simple way to implement SCAN without low-level bit operations, but just taking two indexes instead of one (even if we'll conceptually glue the indexes together as a single one in order to expose it to the user), in order to make the code more obvious to the casual reader? Thank you for your help!
@antirez 👍 this is a really interesting take
@antirez Those guarantees are correct. Although I don't have a formal proof for them, I'm reasonably confident they hold. I worked through every hash table state (stable, grow, shrink) and the approach appears to work everywhere by means of the reverse binary iteration (for lack of a better word).

Nothing can be said about order, because the algorithm only deals with the hashes. From the point of view of the user, the order will be relatively random. Also, the number of keys emitted per operation can be anywhere between 0 and the maximum number of collisions in a slot.

To answer the question about multiple tables, it is important to first understand how the iteration works in the normal case. If a hash table is stable, you can iterate by enumerating the slots sequentially and emitting all entries in a slot. This clearly breaks down when you rehash, because you have no clue whether or not elements in slots have already been emitted. This is a problem that can be ignored, making the iteration algorithm suboptimal, but we can do better. Whenever a hash table is grown, its size doubles from one power of two to the next. In the base case, consider a hash table that grows from size 2 to size 4. Assume we have an iterator that has just started and has emitted the elements at slot 0 before the hash table is resized. When the iterator resumes after the hash table has resized, it should not emit elements in slots 0 or 2, but only the elements in slots 1 and 3.

To carry this information between iterations, a cursor is used. When the iterator is called, it first emits the elements in the slot that the cursor points to. It then masks the irrelevant bits of the cursor to one and performs an increment on the reversed bit string. For example: consider a hash table with size 4. Only the 2 LSB's are relevant in addressing the hash table's slots. This means that if we want an increment operation to have effect on these 2 LSB's, we need to mask the other 62 bits to one.
If this step is skipped, an increment on the reverse bit string might not carry all the way to the relevant 2 LSB's. Which bits are relevant depends on the size of the hash table: for a table of size 2^n, only the n least significant bits are relevant.

When a hash table is shrunk, multiple slots are collapsed into a single one. This means that it is impossible to fully avoid elements being emitted twice: if an iterator has already emitted the elements of one of the collapsed slots, it will see them again when the combined slot is visited. The trick that is applied when the hash table is in the process of being resized is that it always uses the smaller hash table as the reference, and emits both the elements in the slot in the smaller hash table and the elements in all expanded slots in the larger hash table. I hope this clarifies things. Cheers!
The irrelevant bits shouldn't be masked to 1. This can result in slots being skipped when the hash table is resized between calls to the iterator.
Hello, I believe I found an alternative to Pieter's algorithm, inspired by his algorithm but much simpler to describe, that has similar behavior.

Dict.c hash table layout

Redis hash tables always use power of two sizes and are implemented in the file dict.c. However it is possible that at a given time the hash table uses two internal tables, T0 and T1, in order to incrementally rehash data from one table to one that is better dimensioned for the number of elements currently in the table. Conflict resolution is obtained using chaining, so it is guaranteed that in a given table, a key is always in the same bucket. In general, in order to put a new element in the hash table, the hash H(key) is computed, trimmed to the size of the hash table by ANDing the hash with a bitmask (given that the table sizes are always powers of two), and the element is finally added to a linked list that is stored at the bucket. When a rehashing is in progress and there are two tables, new elements are always added to the second table, the one we are rehashing to. When there are two tables, read accesses are performed in the buckets of both tables, by trimming the hash function output with both masks, since we don't know if a key is still in the old table or was already rehashed to the new one.

Algorithm description
Proof (somewhat...)

Assume that there is an element that hashes to position K inside T0, or may hash to another position inside T1, if T1 exists when we start iterating. Only two cases are possible:
English explanation of the algorithm

By visiting a given index in both tables at every step, we cover every bucket a key can occupy while rehashing is in progress. This shows how, during the rehashing process, it is not possible for us to miss a key just because it moved from one place to the other. The "Proof" section explains why this also works globally, that is, why a given key that exists from start to end of the iteration is guaranteed to be returned at least one time.
Hey Salvatore, there are a couple of observations I made while reading your algorithm:
Maybe I'm missing something in the explanation of your algorithm, but as it stands it appears these things need to be fixed for us to maintain the guarantees we talked about earlier in this thread. Ciao!
Hello Pieter! Thanks. I'll analyze your concerns and reply here. Thanks! |
Btw, these concerns are addressed by the bitwise reverse iterator, since the cursor is oblivious to table size ;-) |
Hello Pieter, the similarities between the algorithms are due to the fact that I'm just trying to copy your algorithm as closely as possible, as I think it is very sound; I'm just trying to see it from a different point of view in order to lower the complexity. Now that I'm "into" it, it seems natural, but only because I spent a non-trivial amount of time studying it over the last few days.

Your concerns about my algorithm are very true, and the "stop" condition in my algorithm is wrong, so the result is that it skips some combinations when the new table is smaller. Indeed this can't happen with your implementation, as you test the higher bits first in a given combination, and the magic is that, if the table gets smaller, the mask changes so we test the lower combination again (for example, the table was 8 and we were iterating on higher bits like 100; what happens if the table is resized to 4 is that the new mask is "11", so we effectively try 00 again, which is the only place the 100 and 000 elements of the previous iteration can have moved to). At the same time, when the table gets bigger it just continues iterating, but because the mask changed, fewer bits are turned on by the inversion of the mask when the counter is incremented, so it will explore all the higher bit combinations first.

Super cool, but I still suspect that this same result can be obtained in a way that is simpler to explain, because the concept is indeed "try the expansions first, and later move to the next combination of lower bits", or something like that. I wonder if there is a way to make that more intuitive... so I'm investing some more time into this, and if I fail I'll just merge your code, trying to augment it with more comments... Thanks!
Ok, I suspect that the idea of counting backward to test the higher-level bits first was indeed the right approach in my modified algorithm, but the thing is, to really do it well we need to count down from the largest possible integer (ULONG_MAX, basically) to 0, using as index the integer masked with the current mask. Working more on that after lunch.
No way... to get the same result as Pieter's algorithm I'd have to make mine complex enough that it becomes a pointless exercise; we just have to explain his in better terms. The beauty of Pieter's approach is that it never does additional operations: by counting with the bits flipped, if the table gets bigger, for example from a 3-bit mask to a 4-bit mask, only the combinations not already tried with the previous mask are tried, since it sets the upper bits first. Similarly, when the mask is reduced, it only tests again the patterns that can be a compression of buckets not already checked in the previous table. Hard to explain but awesome.

At this point let's talk API :-) since I want to merge ASAP. My proposal:
In order to iterate over the keyspace.
In order to iterate over everything else, whatever the key is, as long as it is a set, sorted set, or hash. What do you think? Also if you want to play with Pieter's algorithm idea, here is a C program that simulates it:

#include <stdio.h>
#include <stdlib.h>

unsigned long size;
unsigned long v;

unsigned long rev(unsigned long v) {
    unsigned long s = 8 * sizeof(v); // bit size; must be power of 2
    unsigned long mask = ~0;
    while ((s >>= 1) > 0) {
        mask ^= (mask << s);
        v = ((v >> s) & mask) | ((v << s) & ~mask);
    }
    return v;
}

void init(void) {
    size = 8; /* Initial table size. */
    v = 0;    /* Initial cursor position. */
    printf("--- size: %d\n", (int)size);
}

/* Print the low 16 bits of v, with a marker at the table boundary. */
void print_bin(unsigned long v) {
    unsigned long bit = 1<<15;
    while(bit) {
        if (bit<<1 == size) printf("|");
        printf("%s", (v&bit) ? "1" : "0");
        bit >>= 1;
    }
    printf("\n");
}

/* Simulates a SCAN call. */
void scan(void) {
    unsigned long mask = size-1;
    print_bin(v & mask);
    printf("Visit %lu\n", v & mask);
    v |= ~mask;  /* Set the irrelevant high bits to one, ... */
    v = rev(v);  /* ... reverse, ... */
    v++;         /* ... increment, ... */
    v = rev(v);  /* ... and reverse again. */
}

int main(void) {
    int iter = 0;
    init();
    do {
        scan();
        iter++;
        if (iter == 3) {
            size = 16;
            printf("--- size: %d\n", (int)size);
        }
        if (iter == 6) {
            size = 4;
            printf("--- size: %d\n", (int)size);
        }
    } while(v);
    return 0;
}
feels like it should be SCAN or SCAN to stop any confusion. (Khushil Dep)
@khushil I have the feeling that it is much simpler if the key is the first argument, as this is how it usually works in Redis; however I'm not entirely happy with the fact that there is some sort of confusion possible. Maybe we could use SCAN for all the keys iteration, and another command name such as KSCAN for the keyspace; this also plays a lot better with Redis Cluster.
p.s. the second proposal would also allow me to merge Pieter's original implementation ASAP, just renaming it KSCAN, and only later add the SCAN command when it works with all the data types. I would also love to see this implemented by Pieter, if he has some time, since he is the expert in both the iteration algorithm AND the zipped data types one needs to deal with when iterating encoded objects.
@antirez What I mentioned the other day on twitter, is that the proposed approach (even if I like it a lot) breaks this assumption in redis-cluster clients: https://github.com/antirez/redis-rb-cluster/blob/master/cluster.rb#L115-L135
Wow, I was completely unaware that this feature was being discussed and developed. This is great :-) In my opinion, while the implementation may be similar in both cases, iterating over the keyspace and over the elements in a key are fundamentally different from a design point of view. That's why I'd vote for giving these two uses different names. For the keyspace I propose
Hey Damian! In order to simplify things a bit I'm thinking about, as Pieter suggested, to call the key space iterator just SCAN. However I think I'll use a different interface compared to the current one. What I propose is also an extension on the features. Here is a short description:
This is basically a SORT-alike syntax. Supported options:
That's it... For iterating keys I'll give it some more thought, but actually I'm not completely convinced by a generic command, because with the idea of options it makes sense to have different options for different types: for instance, one may want to iterate scores within a given range in sorted sets, or even have different semantics for the return value of sorted sets given that there is a score, so we'll likely end up with HSCAN, SSCAN, ZSCAN. Another reason for that, actually, is that when you read some source code it may be of some semantic help to read HSCAN, for instance. Also this one for example would return key-value pairs, so the client library may return a different format. Cheers,
👍 The options to SCAN that are already implemented are
Oh awesome Pieter! I had not looked into the actual implementation of the SCAN pattern matching, but apparently it is exactly as I wish it to be, so I can just merge at this point I guess, and add TYPE later :-)
p.s. also COUNT is better than MINELE. We just have to document that it is a minimum, as there is no way to guarantee that exactly COUNT elements are returned.
Perfect, I'm +1 on
The command scans until it finds
Disregard my previous comment, I read the code and it's clear now.
Btw, the API looks great now.
@soveran good catch, let's see the different cases. No TYPE or PATTERN option given: what you describe can theoretically happen, to be honest, but it is extremely unlikely, since the hash table size is always proportional to the number of elements it contains and the "low level" is 10% full. I can't imagine how in practice you could ever end up with something like a 1 million bucket hash table with 10 elements. When TYPE or PATTERN are given instead, it is extremely likely to happen :-) So I think that, yes, we should be allowed to return an empty set and not scan more than a given number of buckets per iteration. I think that semantically there are no problems at all with not returning elements; it is just a matter of implementation, and of course of documentation. Thanks a ton! That was really helpful.
p.s. I'm not sure why you disregarded your comment later. Is it already the case that it returns without elements? Reading the code myself...
I think there are two different meanings for
Ok, reading the code it is clear that the keys are collected in one place and then filtered later, so only the first source of high latency is possible, the one about a hash table that is almost empty but has many buckets, which is a rare condition. Given that the filtering step is performed later, this means we are already allowed to return zero elements in the current API. Because of this it will be easy to change the code later, if needed, to also break the busy loop if we see issues with that:
But AFAIK it will not be a source of issues.
And yes, because of the actual semantics, COUNT will be hard to explain to users.
Yes, I think
My "MINLEN" proposal was actually, semantically, the same as "COUNT" when using PATTERN or TYPE, so probably "COUNT" is a better name, since it really does not make the promises that "MINLEN" would imply :-)
On IRC @soveran suggested that the line:
Could be changed with:
While I don't usually touch things that work and have been tested in production for months (Twitter is using the SCAN iterator in production, as far as I can tell), it looks like a pretty safe change. Basically, given that m is guaranteed to be a power of two minus one, v|m will always have all of its lowest bits set to one. Because of the +1, all those low bits turn into 0, so masking with ~m performs no useful operation at all.
IRC again ... reminds me of a story not long ago :). This time the person could be right (logically, anyway), though my first impression from an -O3 run of gcc -S revealed the following difference:

_bitop1:
Leh_func_begin1:
    pushq   %rbp
Ltmp0:
    movq    %rsp, %rbp
Ltmp1:
    movq    %rsi, %rcx
    andq    %rdi, %rcx
    orq     %rsi, %rdi
    incq    %rdi
    notq    %rsi
    andq    %rdi, %rsi
    movq    %rsi, %rax
    orq     %rcx, %rax
    popq    %rbp
    ret
Leh_func_end1:

versus

_bitop2:
Leh_func_begin2:
    pushq   %rbp
Ltmp2:
    movq    %rsp, %rbp
Ltmp3:
    movq    %rsi, %rcx
    andq    %rdi, %rcx
    orq     %rdi, %rsi
    incq    %rsi
    movq    %rsi, %rax
    orq     %rcx, %rax
    popq    %rbp
    ret
Leh_func_end2:

My asm is a bit rusty at the moment, however I believe the two to be identical for all unsigned long values on 64-bit and 32-bit architectures, which essentially boils down to a difference of:

    orq     %rsi, %rdi
    incq    %rdi
    notq    %rsi
    andq    %rdi, %rsi

versus

    orq     %rdi, %rsi
    incq    %rsi

Or if you prefer Clang:

    orq     %rsi, %rdi
    leaq    1(%rdi), %rax
    notq    %rsi
    andq    %rsi, %rax

versus

    orq     %rdi, %rsi
    leaq    1(%rsi), %rax

Could be an interesting SO question regarding compiler bit operation optimisations; pseudo code here:

#include <stdio.h>
#include <stdlib.h>

unsigned long bitop1(unsigned long v, unsigned long m0) {
    return (((v | m0) + 1) & ~m0) | (v & m0);
}

unsigned long bitop2(unsigned long v, unsigned long m0) {
    return ((v | m0) + 1) | (v & m0);
}

int main(void) {
    /*
    printf("bitop1: 0x%08lx --- bitop2: 0x%08lx\n",
        bitop1(0x0, 0x0),
        bitop2(0x0, 0x0));
    */
    bitop1(0x0, 0x0);
    bitop2(0x0, 0x0);
    exit(EXIT_SUCCESS);
}
I don't think the compiler could ever understand that "m0" is always a power of two minus one, hence the difference.
I'll write up the details of this technique later (I wrote a nice and detailed commit message, but vim trashed it).