
Chunk Redis Backend Results when larger than 512MB #7912

Open · 4 tasks done

ErikBZ opened this issue Nov 18, 2022 · 8 comments

Comments

ErikBZ commented Nov 18, 2022

Checklist

  • I have checked the issues list for similar or identical feature requests.
  • I have checked the pull requests list for existing proposed implementations of this feature.
  • I have checked the commit log to find out if the same feature was already implemented in the master branch.
  • I have included all related issues and possible duplicate issues in this issue (If there are none, check this box anyway).

Related Issues and Possible Duplicates

Related Issues

  • #6533
  • #7911

Possible Duplicates

  • None

Brief Summary

Redis has a 512MB limit for string values. While generating tasks, some results may grow larger than that limit, resulting in a BackendStoreError (#6533). In applications where results larger than 512MB are expected, we'd like to chunk the result into manageable pieces and then coalesce the data when it's needed.

This is related to the other issue I posted, #7911, since we would be able to send the IDs of the chunked data and then merge the results later on when a worker has picked up the task.
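
For context, a minimal sketch of the failure mode, assuming a local Redis instance reachable via redis-py; the exact exception surfaced can vary by Redis and redis-py version:

```python
import redis

r = redis.Redis()
# One byte over Redis's 512MB string limit.
payload = b"x" * (512 * 1024 * 1024 + 1)
# The server rejects the oversized value; Celery surfaces this kind of
# backend failure as a BackendStoreError.
r.set("celery-task-meta-561cb2b3-c43e-4944-9531-e1b37e84c33d", payload)
```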

Design

During a SET we'd use the same key as we normally would (e.g. celery-task-meta-561cb2b3-c43e-4944-9531-e1b37e84c33d), but the data saved to this key would instead be a list of sub-keys: the prefix REDIS_CHUNK_KEYS, followed by the IDs of the chunks, delimited by commas. For example: REDIS_CHUNK_KEYS,key1,key2.

These sub-keys would be where the chunks of the actual data are stored; a sketch of the SET path follows.
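
A minimal sketch of the SET side, assuming redis-py; the chunk-size threshold and the `chunked_set` helper name are hypothetical, not existing Celery API:

```python
import uuid

CHUNK_PREFIX = "REDIS_CHUNK_KEYS"
# Hypothetical threshold; values at or under it are stored as they are today.
MAX_CHUNK_SIZE = 256 * 1024 * 1024

def chunked_set(client, key: str, value: bytes) -> None:
    """Store value under key, splitting it into sub-keys when too large."""
    if len(value) <= MAX_CHUNK_SIZE:
        client.set(key, value)
        return
    sub_keys = []
    for start in range(0, len(value), MAX_CHUNK_SIZE):
        sub_key = f"{key}-chunk-{uuid.uuid4()}"
        client.set(sub_key, value[start:start + MAX_CHUNK_SIZE])
        sub_keys.append(sub_key)
    # The main key stores only the pointer list,
    # e.g. "REDIS_CHUNK_KEYS,key1,key2".
    client.set(key, ",".join([CHUNK_PREFIX, *sub_keys]))
```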

During a GET, we'd check whether the string value is prefixed by REDIS_CHUNK_KEYS. If it is, we'd parse the value and then make another request for each of the data chunks pointed to by the sub-keys.
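
And the matching GET side of the sketch, under the same assumptions:

```python
CHUNK_PREFIX = b"REDIS_CHUNK_KEYS"

def chunked_get(client, key: str) -> bytes | None:
    """Fetch key, transparently reassembling chunked values."""
    value = client.get(key)
    if value is None or not value.startswith(CHUNK_PREFIX):
        return value
    # Parse "REDIS_CHUNK_KEYS,key1,key2" and fetch each chunk in order.
    sub_keys = value.decode().split(",")[1:]
    return b"".join(client.get(sub_key) for sub_key in sub_keys)
```

One thing a real implementation would need to handle is a legitimate result value that happens to start with the prefix string.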

Architectural Considerations

None

Proposed Behavior

Proposed UI/UX

Diagrams

N/A

Alternatives

None

@open-collective-bot

Hey @ErikBZ 👋,
Thank you for opening an issue. We will get back to you as soon as we can.
Also, check out our Open Collective and consider backing us - every little helps!

We also offer priority support for our sponsors.
If you require immediate assistance please consider sponsoring us.

auvipy (Member) commented Nov 19, 2022

Do you have implementation details in mind? Also, how does #7801 (comment) fit in this case?

ErikBZ (Author) commented Nov 21, 2022

maxmemory sets the maximum memory usage for the service, but not the maximum size of a single value. What we're running into is a key's value hitting the 512MB limit.
As for the implementation, it's the scheme described in the Design section above: during a SET, the usual celery-task-meta-<task-id> key would store the REDIS_CHUNK_KEYS-prefixed list of sub-keys where the chunks live, and during a GET we'd detect that prefix and fetch each chunk before reassembling the result.

auvipy (Member) commented Nov 22, 2022

OK, so you are up for the implementation?

ErikBZ (Author) commented Nov 22, 2022

Yup, sure am

auvipy (Member) commented Nov 23, 2022

Ping me on the PR for review.

denyszhak commented

Are there any updates on this? It's been here for a while now.

auvipy (Member) commented Mar 1, 2023

We are still waiting for contributions.
