Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

scan and scan_iter stuck to first iteration #1402

Closed
1 of 2 tasks
kinoute opened this issue Apr 25, 2023 · 14 comments · Fixed by #1489
Closed
1 of 2 tasks

scan and scan_iter stuck to first iteration #1402

kinoute opened this issue Apr 25, 2023 · 14 comments · Fixed by #1489
Assignees
Labels
bug type bug

Comments

@kinoute
Copy link
Contributor

kinoute commented Apr 25, 2023

Search before asking

  • I had searched in the issues and found no similar issues.

Version

2.3.0

Minimal reproduce step

  1. Start Kvrocks Docker container:

docker run -it -p 6666:6666 apache/kvrocks

  1. Populate the database with 200k keys with Python 3.9.13 and the forked version of Redis-py available here : https://github.com/hashlookup/redis-py
pip install git+https://github.com/hashlookup/redis-py@master
import redis

REDIS = redis.Redis(
    host='0.0.0.0',
    port=6666,
    password='',
    db=0,
    )

# Populate Kvrocks with 200k keys
pipe = REDIS.pipeline()

for i in range(200_000):
    pipe.set(f'dba_{i}', 1)

pipe.execute()

# Test "scan_iter"
for key in scan_iter('dba_*'):
    print(key)


# Test "scan"
cursor = '0'

while True:
    cursor, keys = REDIS.scan(
        cursor=cursor,
        match='dba_*',
    )
    for key in keys:
        print(key)

    if not keys:
        break

What did you expect to see?

I expected both commands to iterate through every dba_* key.

What did you see instead?

Instead, both commands are stuck on the first iteration of default size and keep printing the same batch of keys every time.

Exemple with scan_iter:

b'dba_0'
b'dba_1'
b'dba_10'
b'dba_100'
b'dba_1000'
b'dba_10000'
b'dba_100000'
b'dba_100001'
b'dba_100002'
b'dba_100003'
b'dba_100004'
b'dba_100005'
b'dba_100006'
b'dba_100007'
b'dba_100008'
b'dba_100009'
b'dba_10001'
b'dba_100010'
b'dba_100011'
b'dba_100012'
b'dba_1'
b'dba_10'
b'dba_100'
b'dba_1000'
b'dba_10000'
b'dba_100000'
b'dba_100001'
b'dba_100002'
b'dba_100003'
b'dba_100004'
b'dba_100005'
b'dba_100006'
b'dba_100007'
b'dba_100008'
b'dba_100009'
b'dba_10001'
b'dba_100010'
b'dba_100011'
b'dba_100012'
b'dba_100013'
b'dba_1'
b'dba_10'
b'dba_100'
b'dba_1000'
b'dba_10000'
b'dba_100000'
b'dba_100001'
b'dba_100002'
b'dba_100003'
b'dba_100004'
b'dba_100005'
b'dba_100006'
b'dba_100007'
b'dba_100008'
b'dba_100009'
b'dba_10001'
b'dba_100010'
b'dba_100011'
b'dba_100012'
b'dba_100013'

etc.

Anything Else?

While being inefficient, keys doesn't suffer from this problem and return every key matching the pattern:

len(REDIS.keys('dba_*'))

200000

Are you willing to submit a PR?

  • I'm willing to submit a PR!
@kinoute kinoute added the bug type bug label Apr 25, 2023
@kinoute
Copy link
Contributor Author

kinoute commented Apr 25, 2023

it works when we use this "obscure" python package : https://pypi.org/project/kvrocks-scan/

Which does the following:

import redis

def parse_scan(response, **options):
	cursor, r = response
	if cursor == b"0" or cursor == "0":
		return int(cursor),r;
	return cursor, r

redis.client.AbstractRedis.RESPONSE_CALLBACKS["SCAN"] = parse_scan;


# Redis scan_iter etc

I'm not sure why this works and the forked Redis python by @hashlookup, which seems to handle this int/str cast, doesn't.

Edit: Actually, with this method, scan_iter is stuck in an infinite loop. scan seems to work correctly.

@git-hulk
Copy link
Member

Hi, @kinoute

Thanks for your feedback, we would take a look.

@git-hulk
Copy link
Member

@kinoute The root cause is right that the Kvrocks scan command would return a string cursor instead of the integer, so it's not compatible with the official Python client. Another workaround can use the raw client to send and parse the scan command.

For why the scan_iter didn't, to see if @adulau or other guys who are familiar with Python can help.

@kinoute
Copy link
Contributor Author

kinoute commented Apr 26, 2023

But it seems the forked Redis-py client the Kvrocks team recommends in many issues should handle this problem? See this commit: redis/redis-py@e9c53fb

Or am I missing something here?

@git-hulk
Copy link
Member

Yes, we recommend using the fork redis-py, and it should be fine to add a wrap function to work around this.

@jihuayu
Copy link
Member

jihuayu commented May 31, 2023

The Redis documentation clearly states that the cursor is unsigned 64 bit number. Do we not need to fully compatible with Redis protocol?
Scan commands return value section

@infdahai
Copy link
Contributor

I approve your point.@jihuayu

@git-hulk
Copy link
Member

git-hulk commented Jun 1, 2023

@jihuayu The underly storage is different between Kvrocks and Redis, so what we can do is try our best to be compatible with Redis. Welcome to submit the proposal if you have any ideas to achieve this.

@jihuayu
Copy link
Member

jihuayu commented Jun 4, 2023

@git-hulk SCAN commond is frequently used in Redis clients.It's important to ensure SCAN commond is correct for Redis toolchains to work with Kvrocks.
Therefore, I believe this issue needs to be resolved.

After carefully reading the Redis documentation, I found that cursors have the following features:

  1. The cursor is a 64-bit unsigned integer, with 0 meaning the end of the iteration.
  2. There is no state server side, the full state is captured by the cursor
  3. During the iteration process, a given element may be returned multiple times.
  4. As long as the number of elements is not constantly increasing, the iteration will eventually end.
  5. Use a corrupted cursor is undefined behavior.

Therefore, I think we can use a ring array to store the key names for the most recently used cursors.

When SCAN end_cursor is "keyname1" we push {key:keyname1,cursor:1276} to cursor-ring.
Next SCAN use cursor 1276, we foreach cursor-ring from the newest item, and get the it's key "keyname1". I call this process cursor translation.
Then we can use "keyname1" to seek rocksdb iterator.

If client call many SCAN command, cursor-ring will be filled and the oldest cursor will be discarded.
Now the if client SCAN use a corrupted cursor(the cursor don't in cursor-ring), we will return element from begin.

Based on the above design, we will have the following additional costs.

  1. Less than 2KB of memory is used to store the cursor ring.
  2. Each SCAN request requires one cursor translation and one cursor mapping save. (Note: Ensuring thread safety is required for cursor mapping save)

Because SCAN command is a slow command, so the additional cursor translation has a minimal impact.

@git-hulk
Copy link
Member

git-hulk commented Jun 4, 2023

@jihuayu Thanks for your solution.

The ring buffer or LRU cache is good for me. For the HSCAN and SSCAN commands, we need to use the subkey to map the string cursor and number.

By the way, to be compatible with the older version, we should add a configuration to allow users to enable this feature. And can enable it by default in the next major version.

@jihuayu
Copy link
Member

jihuayu commented Jun 6, 2023

@git-hulk Ok, I will send a PR later. You can assign this task to me.

#877 It may be solved together.

@git-hulk
Copy link
Member

git-hulk commented Jun 6, 2023

Thanks @jihuayu, assigned

@jihuayu jihuayu mentioned this issue Jun 9, 2023
2 tasks
@mapleFU
Copy link
Member

mapleFU commented Jun 9, 2023

Nice catch! @jihuayu

  1. What if "keyname1" is deleted after cursor put into ring array?
  2. How does the cursor:1276 number generated?

@jihuayu
Copy link
Member

jihuayu commented Jun 9, 2023

Thank you! @mapleFU

  1. I'm not storing the iterator, I just made a dictionary that converts the number cursor to a begin key name when receiving the command. After completing the scan, I use the end key name to generate a new number cursor.
  2. The cursor is a global counter that is incremented whenever a new cursor needs to be generated.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug type bug
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants