Currently the Redis string type and the protocol itself are limited to max string length of 512 MB.
Actually the internal limit of an sds.c string (the dynamic string abstraction used inside Redis) is 2 GB, since the string header represents the length and the remaining space as signed 32-bit integers. At some point we decided to reduce it to 512 MB: it is a large enough value anyway, and a 32-bit integer can address every bit inside a 512 MB value, which was handy for bit operations.
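To make the bit-addressing point concrete, here is a minimal sketch (macro and function names are illustrative, not from the Redis source) showing that a 512 MB value contains exactly 2^32 bits, so any bit offset fits in an unsigned 32-bit integer:

```c
#include <assert.h>
#include <stdint.h>

/* 512 MB expressed in bytes: 2^29. */
#define MAX_STRING_BYTES (512ULL * 1024 * 1024)

/* Number of individually addressable bits in a 512 MB value. */
static uint64_t max_bit_count(void) {
    return MAX_STRING_BYTES * 8; /* = 2^32 bits */
}
```

So the largest valid bit offset is 2^32 - 1, exactly the maximum of an unsigned 32-bit integer, which is why commands like SETBIT and GETBIT could use 32-bit offsets under the old limit.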
However, while the limit was never a practical problem so far, there is a new interesting issue with the MIGRATE, DUMP and RESTORE commands, which are used to serialize / deserialize or atomically transfer Redis keys. It is perfectly valid for a Redis key to have a serialization blob that exceeds the 512 MB limit, even if it is not common, since the serialization format is very compact (for instance, a list of 1 million small strings serializes into a 7 MB blob).
This means that with the current limit we can't MIGRATE very large keys, which is surely a problem for certain applications (actually Redis Cluster is not the best bet if you have keys with many millions of elements, as migration can be slow).
It is certainly possible to remove the 512 MB limit simply by using 64-bit integers in the sds.c string headers and removing the checks inside the bulk parsing code. The problems with this approach are:
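As a rough sketch of the header change (the struct names and layout below are illustrative, not the actual sds.c definitions), widening the length fields from 32 to 64 bits removes the cap but adds 8 bytes of overhead to every string, which is the memory tradeoff discussed here:

```c
#include <assert.h>
#include <stdint.h>

/* Current-style header: 32-bit length and free space, hence the 2 GB
 * internal cap (and the 512 MB limit enforced on top of it). */
struct sdshdr32 {
    int32_t len;
    int32_t free;
    char buf[];
};

/* Widened header: 64-bit fields remove the cap at the cost of 8 extra
 * bytes per allocated string. */
struct sdshdr64 {
    int64_t len;
    int64_t free;
    char buf[];
};
```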
So long story short, the real tradeoff is the additional memory usage. The plan is:
But in general, limits suck.
For instance there was an AOF issue, and there is currently an unsolved redis-cli issue, both resulting from the sds.c 2 GB limit. My vote for now is to remove the limit and pay the 8 bytes of overhead, but more testing will provide more data points to evaluate this better.
I have a question: if the limitation is removed, could it hurt performance when someone tries to GET or SET values larger than 512 MB?
I believe in the Unix philosophy and giving the user enough rope to hang himself; so please, let's remove this limit, however make it visible enough for potential users and people upgrading to this feature. Otherwise you could reap all sorts of new funny bug reports from users hanging themselves. Make it backwards compatible for smooth upgrade paths of people with important large-scale implementations, so you'll have more early adopters.
I'm in a project where we're working on some sort of redis filesystem using FUSE and given some design considerations in the future, lifting this 512MB limit could prove to be a valuable asset. So, holler as soon as you have a first implementation ready to test (which probably means by next weekend :)).
There are definitely no performance concerns if we just remove the limit the vanilla way.
Well, we have a couple of options fortunately and there is no need to pick the best one right now; some time will help as usual :-)
I brought this up with Pieter at RedisConf (among other topics):
What if strings were pre-sharded if they were beyond a specific size? Instead of storing it all as a single allocation, break it up into blocks, and access the blocks via a hash table.
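A minimal sketch of that pre-sharding idea (all names are hypothetical; a real design might use an actual hash table rather than the flat block table below): a large string becomes fixed-size blocks reached through an index, so no single allocation has to hold the whole value:

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical block size for a sharded big string: 4 MB. */
#define BLOCK_SIZE (4u * 1024 * 1024)

typedef struct bigstr {
    uint64_t len;      /* logical string length */
    size_t nblocks;    /* number of allocated blocks */
    char **blocks;     /* block table; could also be a hash table */
} bigstr;

/* Translate a logical offset into a pointer inside the right block. */
static char *bigstr_at(bigstr *s, uint64_t off) {
    return s->blocks[off / BLOCK_SIZE] + (off % BLOCK_SIZE);
}

/* Allocate a zeroed big string of the given logical length. */
static bigstr *bigstr_new(uint64_t len) {
    bigstr *s = malloc(sizeof(*s));
    s->len = len;
    s->nblocks = (size_t)((len + BLOCK_SIZE - 1) / BLOCK_SIZE);
    s->blocks = malloc(s->nblocks * sizeof(char *));
    for (size_t i = 0; i < s->nblocks; i++)
        s->blocks[i] = calloc(1, BLOCK_SIZE);
    return s;
}
```

The upside is that growing the value (e.g. via APPEND) only allocates new blocks instead of reallocating and copying one giant buffer; the downside is an extra indirection on every access and more complex range operations.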
Turn off a few crashing tests in nightly unittest until perfmons are fixed. See #757.
fix a lazy crashing bug (c.f. Issue #757)
Given the merging of #2509, what remains to be done in order to remove this limit? Do you think it will be part of 3.2?
I can try to assist with this, given some guidance. So far, I've just been searching for "512". The code that is particularly biting me is here:
I also see this line, does this need to be changed or do you want the networking/protocol aspects to be unchanged?
My specific use case is APPENDing to a key which may grow bigger than 1GB; however, my config does not use any persistence.
Hello @neomantra, indeed as a side effect of the changes made by @oranagra to sds.c this should be a solved problem AFAIK... However we need to check whether there are assertions to remove in code that assumed the old limits. Then we need to decide whether we also want to remove the string limits in the API, but maybe it is a sane safeguard to keep them? Thanks.
Oh sorry, I said something stupid: MIGRATE uses the API anyway, so we need to remove the limit from the API as well... or at least whitelist just the MIGRATE command.
P.S. btw yes, this is going to be part of Redis 3.2 since it affects Cluster operations in a serious way.
@antirez I just realized there's one more change that is needed for all of this to work: rdbSaveLen should be fixed so that it can save strings larger than 4 GB.
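The 4 GB ceiling comes from the length encoding writing at most a 32-bit length. A sketch of the kind of fix needed (the opcode values and function name below are illustrative, not necessarily what Redis ended up shipping): add a new encoding byte that is followed by a 64-bit big-endian length.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical RDB-style length opcodes: one byte announcing whether a
 * 32-bit or 64-bit big-endian length follows. */
#define LEN_32BIT 0x80
#define LEN_64BIT 0x81

/* Serialize len into buf (which must hold at least 9 bytes); returns the
 * number of bytes written. Lengths that fit in 32 bits keep the compact
 * form, so existing RDB files stay small. */
static int save_len(unsigned char *buf, uint64_t len) {
    if (len <= 0xFFFFFFFFULL) {
        buf[0] = LEN_32BIT;
        buf[1] = (len >> 24) & 0xFF;
        buf[2] = (len >> 16) & 0xFF;
        buf[3] = (len >> 8) & 0xFF;
        buf[4] = len & 0xFF;
        return 5;
    } else {
        buf[0] = LEN_64BIT;
        for (int i = 0; i < 8; i++)
            buf[1 + i] = (len >> (56 - 8 * i)) & 0xFF;
        return 9;
    }
}
```

Old readers would reject the new opcode, so this also implies an RDB version bump.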
I'm going to experiment with this a bit, with some of the simple changes I noted before and ignoring safety.
With respect to safeguards, I could imagine two different configurations, allowing operators to balance security/sanity/utility:
In my use case, I have very large string values, but they are built up with many small APPENDs, so I might keep a low max message size but crank up the max string value size. Ultimately, though, I am in a highly trusted environment (e.g. localhost with auth), so I might not fiddle with it at all.
Or, with the whitelist approach, I'd like to be able to whitelist APPEND.
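As a sketch of what the two knobs described above could look like in redis.conf (the option names here are hypothetical illustrations, not existing settings, although Redis did later gain a similar `proto-max-bulk-len` option):

```conf
# Hypothetical: cap on a single protocol bulk (message) size.
# Kept low so clients cannot send one enormous SET.
proto-max-bulk-len 16mb

# Hypothetical: separate, larger cap on the size a stored string
# value may reach, e.g. through many small APPENDs.
string-max-value-len 8gb
```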
I also note that some documentation would need to be changed, e.g.:
Seems that one can really mess things up by invoking SETRANGE with a huge offset. Perhaps that is a +1 for the whitelist approach?
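To illustrate the concern (the function name is hypothetical; this is just the size arithmetic, not Redis code): SETRANGE at offset N with a P-byte payload must grow the value to N + P bytes, zero-padding the gap, so a single small command can force a multi-gigabyte allocation once the 512 MB cap is gone.

```c
#include <assert.h>
#include <stdint.h>

/* Length the value must reach after SETRANGE(offset, payload bytes):
 * the existing length, or offset + payload, whichever is larger. */
static uint64_t setrange_required_len(uint64_t curlen, uint64_t offset,
                                      uint64_t payload) {
    uint64_t needed = offset + payload;
    return needed > curlen ? needed : curlen;
}
```

For example, writing a single byte at a 4 GB offset into an empty key requires allocating just over 4 GB, which is the sort of foot-gun a whitelist or a configurable cap would guard against.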