
Add Redis-compatible cursors for SCAN commands #1489

Merged
merged 56 commits into from Jun 25, 2023

Conversation

@jihuayu (Member) commented Jun 10, 2023

Hello everyone, I have completed this PR and below is my explanation of the changes made.

We have added the following steps to the processing of the SCAN, ZSCAN, SSCAN, and HSCAN commands:

  • Before processing the command, convert the numeric cursor sent by the client into the corresponding keyname cursor.
  • After processing the command, convert the resulting keyname cursor back into a numeric cursor, return it to the client, and store the mapping in Server#cursor_dict_.

Since these steps are non-intrusive to the internal implementation of the SCAN command, we can ensure that the behavior of ZSCAN, SSCAN, and HSCAN remains consistent with that of SCAN.
In the following, I will use only the SCAN command as an example to describe the new design.

Cursor design

We call the new cursor the numeric cursor and the old one the keyname cursor.
The numeric cursor is composed of three parts:

  • bits 1-16: the counter (its first bit is always 1, so a valid cursor is never 0)
  • bits 17-32: a timestamp
  • bits 33-64: a hash of the keyname

The counter is a 16-bit unsigned integer that is incremented by 1 each time a cursor is created; when it overflows, it wraps back to 0 because it is unsigned. Since our cursor_dict_size is a power of 2, the counter remains contiguous modulo cursor_dict_size.
The timestamp is a 16-bit timestamp in seconds (2^16 seconds, roughly 18 hours, before wrapping).
The hash is a 32-bit hash value of the keyname.
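As a rough illustration, the bit layout above could be packed like this (a hypothetical sketch, not the PR's actual code; the hash function and the exact counter bit kept set are assumptions):

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <string>

// Hypothetical sketch: pack the three fields into one 64-bit cursor.
// Field positions follow the layout described above: counter in the low
// 16 bits, timestamp in bits 17-32, keyname hash in the high 32 bits.
uint64_t MakeNumericCursor(uint16_t counter, uint16_t timestamp_s,
                           const std::string &key_name) {
  // Keep one counter bit always set so a valid cursor is never 0
  // (which bit is kept set is an assumption in this sketch).
  counter |= 0x8000;
  // std::hash stands in for whatever 32-bit keyname hash the server uses.
  auto hash = static_cast<uint32_t>(std::hash<std::string>{}(key_name));
  return (static_cast<uint64_t>(hash) << 32) |
         (static_cast<uint64_t>(timestamp_s) << 16) |
         static_cast<uint64_t>(counter);
}
```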

Cursor dictionary(Server#cursor_dict_)

The cursor dictionary is an array of 16384 (1024*16) entries, a length fixed at compile time, occupying about 640 KB of memory. Including the referenced keyname strings, its total size is roughly 1-2 MB.

When converting a keyname cursor into a numeric cursor, a new cursor is generated according to the rules above, and the slot used to store it in the dictionary is derived from the counter value (index = counter % DICT_SIZE).

When converting a numeric cursor back into a keyname cursor, we extract the counter from the cursor and compute its index in cursor_dict_ from that counter. We then only need to compare the cursor value stored at that index with the input cursor to determine whether they match.
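A minimal sketch of the two conversions (illustrative names, not the PR's actual code; assumes the counter occupies the low 16 bits as described above):

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <string>

constexpr size_t kCursorDictSize = 1024 * 16;  // power of two, fixed at compile time

struct CursorDictElement {
  uint64_t cursor = 0;   // numeric cursor; 0 means the slot is empty
  std::string key_name;  // the keyname cursor it maps to
};

std::array<CursorDictElement, kCursorDictSize> cursor_dict_;

// keyname -> numeric: store the mapping in the slot chosen by the counter.
void StoreCursor(uint64_t numeric_cursor, const std::string &key_name) {
  size_t index = (numeric_cursor & 0xFFFF) % kCursorDictSize;
  cursor_dict_[index] = {numeric_cursor, key_name};
}

// numeric -> keyname: one comparison decides validity; a stale or unknown
// cursor misses and is treated like cursor 0 (empty string here).
std::string LookupCursor(uint64_t numeric_cursor) {
  size_t index = (numeric_cursor & 0xFFFF) % kCursorDictSize;
  const CursorDictElement &e = cursor_dict_[index];
  return e.cursor == numeric_cursor ? e.key_name : "";
}
```

Because the counter is contiguous modulo the power-of-two dictionary size, new cursors naturally overwrite the oldest slots, which is what bounds validity to the most recent 16384 cursors.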

Other information about the cursor

This design guarantees the validity of the most recent 16384 (1024*16) cursors; cursors that are older, or that were not generated by this server, are considered invalid. An invalid cursor is treated as cursor 0, meaning iteration starts over from the beginning of the collection.

Our cursor is globally visible, and we store index information in the cursor. As long as the cursor remains valid, using the same cursor in different connections will produce the same results.

We prevent other users from guessing the data traversed by adjacent cursors by including the hash of the keyname in the cursor. If a user tries to discover an adjacent cursor by enumerating hash values, the cursor will become invalid before the enumeration completes, because the 32-bit hash space is far larger than the length of cursor_dict_.

We add a timestamp to the cursor to ensure that the same cursor value does not reappear within a short period before and after a server restart.

Other behaviors are consistent with the original SCAN implementation.

Configuration file

Added the redis-cursor-compatible configuration item.
If enabled, the cursor is an unsigned 64-bit integer.
If disabled, the cursor is a string.
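For example (a sketch of the corresponding kvrocks.conf entry; the option name comes from this PR, while the comment wording is illustrative):

```
# Return SCAN cursors as Redis-style unsigned 64-bit integers
# instead of keyname strings.
redis-cursor-compatible yes
```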

Test file

Added tests for the redis-cli --bigkeys and redis-cli --memkeys commands. We only need to ensure that these commands run successfully, because their correctness is guaranteed by redis-cli, provided the SCAN command itself is correct.
Modified the SCAN command tests to cover both settings of redis-cursor-compatible (yes and no).

Other changes:

Fixed a bug where the returned cursor was not 0 when a SCAN command returned fewer elements than the requested count.

fixed #1402
fixed #877

@git-hulk (Member) commented Jun 10, 2023

Hi @jihuayu, thanks for your great efforts.

Can you explain a bit about the numeric cursor design, and what happens if the wrong cursor is used for a hash/set/zset scan (should we store and compare the prefix as well when fetching by cursor)? Also, would it be good to use a random value or timestamp to avoid conflicts when restarting the server?

@PragmaTwice (Member)

Thanks for your contribution!

Several points after a short glance:

  • It seems the ring buffer is shared between all connections, so I'm just curious: will the functionality of SCAN break if we have lots of client connections simultaneously? And could we duplicate it per connection, or make the buffer size adjustable dynamically?
  • If some keys are deleted while the SCAN operation is executing, will the key properties of SCAN be preserved? (refer to "Scan guarantees" in https://redis.io/commands/scan/#scan%20guarantees)
  • A self-incrementing cursor may cause information leaks: one client can retrieve some information about the commands currently executed by another client (partially the same as the first point).
  • It seems the mutex can be replaced by an rwlock or atomic variables.

cc @mapleFU @torwig

@git-hulk (Member)

> It seems the ring buffer is shared between all connections, so I'm just curious: will the functionality of SCAN break if we have lots of client connections simultaneously? And could we duplicate it per connection, or make the buffer size adjustable dynamically?

@PragmaTwice It cannot be per connection, since we need to allow using the cursor across different connections. That is to say, a cursor obtained on connection A should be visible to connection B from the client side.

@jihuayu (Member, Author) commented Jun 13, 2023

I will redesign the cursor.
I'm very busy these days, so I will come back in a few days. Wait for me.

@PragmaTwice (Member) commented Jun 13, 2023

> It seems the ring buffer is shared between all connections, so I'm just curious: will the functionality of SCAN break if we have lots of client connections simultaneously? And could we duplicate it per connection, or make the buffer size adjustable dynamically?

> @PragmaTwice It cannot be per connection, since we need to allow using the cursor across different connections. That is to say, a cursor obtained on connection A should be visible to connection B from the client side.

Is this property used by some client libraries?

Actually, I cannot find in the official documentation how and when these cursors become invalid (e.g. will different connections lead to an invalid cursor? will adding and deleting elements lead to an invalid cursor?), which suggests they just do not give any guarantee about this. It is sad.

But I did find this statement:

> Multiple parallel iterations
> It is possible for an infinite number of clients to iterate the same collection at the same time, as the full state of the iterator is in the cursor, that is obtained and returned to the client at every call. No server side state is taken at all.

It may make the implementation of Redis-compatible cursors harder if we need to follow it.

@git-hulk (Member)

> Is this property used by some client libraries?

Most libraries should use the same connection to scan, but the cursor needs to remain valid if the connection is reconnected. And since the scan command is a long-running operation, it'd be better to allow reconnecting.

Yes, agreed that it's impossible to follow Redis exactly, since the underlying storage is different. What we can do is stay as compatible as possible.

@jihuayu (Member, Author) commented Jun 14, 2023

@PragmaTwice @git-hulk If the user runs redis-cli scan 0 and then redis-cli scan 10, the two scans will be on different connections. I think this usage may be common, so it is necessary to ensure the cursor remains usable across different connections.

We can promise how long, or how many, recent cursors are guaranteed to be valid: for example, cursors created within the last hour, or the last 100 cursors.

In the Redis docs:

> A full iteration always retrieves all the elements that were present in the collection from the start to the end of a full iteration. This means that if a given element is inside the collection when an iteration is started, and is still there when an iteration terminates, then at some point SCAN returned it to the user.
> A full iteration never returns any element that was NOT present in the collection from the start to the end of a full iteration. So if an element was removed before the start of an iteration, and is never added back to the collection for all the time an iteration lasts, SCAN ensures that this element will never be returned.

I think these two guarantees are necessary, while the others are optional. Our current implementation ensures both.

> Yes, agreed that it's impossible to follow Redis exactly, since the underlying storage is different. What we can do is stay as compatible as possible.

I agree as well. We just need to follow Redis in common behavior.

@git-hulk (Member)

Thanks, @jihuayu.

Keeping the recent N cursors solution is good for me, since those cursors won't occupy too much memory. Let's see if @PragmaTwice has any comments.

@jihuayu (Member, Author) commented Jun 14, 2023

@git-hulk We may need to know how the value of N is determined in actual deployments. If N=1024 and the average key name length is 40 bytes, the memory usage would be about 1024*(40+40) = 80 KB.

@git-hulk (Member)

> @git-hulk We may need to know how the value of N is determined in actual deployments. If N=1024 and the average key name length is 40 bytes, the memory usage would be about 1024*(40+40) = 80 KB.

It's hard to determine a suitable value, but I think we can pick a relatively large one, like 10240 or even more. That's fine, since the memory usage is a fixed amount.

@jihuayu (Member, Author) commented Jun 14, 2023

> It's hard to determine a suitable value, but I think we can pick a relatively large one, like 10240 or even more. That's fine, since the memory usage is a fixed amount.

Do we need to provide a configuration option for this value?

(Review thread on src/server/server.h — outdated, resolved)
@mapleFU (Member) left a comment
LGTM

@jihuayu (Member, Author) commented Jun 22, 2023

Thanks for the review. Does anyone have any other suggestions?

git-hulk previously approved these changes Jun 24, 2023

@git-hulk (Member) left a comment

LGTM, thanks for your contribution and patience.

@git-hulk (Member)

@torwig @PragmaTwice Can you take another look when you're free?

(Review thread on src/config/config.h — outdated, resolved)
torwig
torwig previously approved these changes Jun 25, 2023
@jihuayu jihuayu dismissed stale reviews from torwig and git-hulk via 4411894 June 25, 2023 07:02
@git-hulk (Member)

Thanks all, merging...

@git-hulk git-hulk merged commit 431277c into apache:unstable Jun 25, 2023
23 checks passed
@jihuayu jihuayu deleted the issues/1402 branch June 26, 2023 03:09
@qingping209
Very neat design!

Labels: new feature, release notes

Merging this pull request may close these issues: scan and scan_iter stuck to first iteration; bigkeys analyze support

8 participants