Potentially Incorrect Expiration Mechanisms #182

Closed
astutejoe opened this issue Mar 5, 2020 · 8 comments · Fixed by #184

Comments

@astutejoe
Contributor

astutejoe commented Mar 5, 2020

I'm having tons of problems with channels on my Redis at the moment. After studying the problem carefully, I realized that the expiration mechanisms don't work as I expected. First, let me quote the Django Channels specification so we're on the same page expectations-wise:

Channel layers do not need to persist data long-term; group memberships only need to live as long as a connection
does, and messages only as long as the message expiry time, which is usually a couple of minutes.

Message expiration

From what I saw, the message backlog is saved under a key like asgi:specific.$random12 with a TTL of 60 seconds, but that expiry is reset on every new message (https://github.com/django/channels_redis/blob/master/channels_redis/core.py#L306). As a result, messages actually live indefinitely as long as new messages keep arriving, not the 60 seconds per message I expected after reading the documentation and all the READMEs.
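The reset-on-write behavior can be illustrated with a toy model (pure Python, not the actual channels_redis code): as long as writes arrive faster than the TTL, the key never expires.

```python
# Toy model (assumption: an illustration only, NOT channels_redis code)
# of a key whose TTL is re-issued on every write, as core.py does with EXPIRE.
class ExpiringBacklog:
    def __init__(self, ttl):
        self.ttl = ttl
        self.messages = []
        self.expires_at = None

    def push(self, msg, now):
        self.messages.append(msg)
        # EXPIRE is re-issued per message, pushing the deadline forward again
        self.expires_at = now + self.ttl

    def alive(self, now):
        return self.expires_at is not None and now < self.expires_at


backlog = ExpiringBacklog(ttl=60)
for t in range(0, 300, 30):  # a message every 30 seconds for 5 minutes
    backlog.push("msg", now=t)

# The key has now outlived its 60-second TTL several times over.
print(backlog.alive(now=299))  # True
```

With steady traffic the backlog key behaves as if it had no TTL at all, which matches the unbounded growth described below.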

My problem: a connection isn't cleanly closed, so the group membership is not discarded, leaving that connection with a huge backlog (I've set the capacity to 100000; everything else is the default) that only expires after a day, not 60 seconds. This consumes a lot of Redis memory in my case, and even with the default channel capacity it could potentially enable a mild DoS.

I'm using:
channels-redis 2.4.2
channels 2.4.0
django 3.0.4

My structure:
Redis <- Uvicorn <- Gunicorn <- Nginx <- AWS ELB <- AWS CloudFront

Are my assumptions and expectations correct? If so I will start doing some code for channel_redis. Thanks!!

@tarikki
Contributor

tarikki commented Mar 6, 2020

@astutejoe: Ah, you are absolutely correct! I ran into the same issue in my own project but forgot about it because I was able to mitigate it with other measures.

The day expiration comes from the group_expiry config setting, whose default is 86400 seconds.
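For context, a settings sketch showing where that value lives (assumption: the standard channels_redis config keys; the values shown are the library defaults discussed in this thread):

```python
# settings.py sketch (assumption: standard channels_redis config keys;
# values shown are the defaults mentioned in this thread).
CHANNEL_LAYERS = {
    "default": {
        "BACKEND": "channels_redis.core.RedisChannelLayer",
        "CONFIG": {
            "hosts": [("127.0.0.1", 6379)],
            "expiry": 60,           # per-message expiry, in seconds
            "group_expiry": 86400,  # group membership expiry: the one-day default
        },
    },
}
```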

Yeah this is definitely a bug 🐛

@astutejoe
Contributor Author

@tarikki I'll try submitting a pull request today using the same mechanism that is used to expire group memberships (sorted sets).
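The sorted-set idea can be sketched in pure Python (a toy model under my own assumptions, not the eventual PR): store each message with its arrival time as the score, then prune everything older than the expiry window before reads, the way ZREMRANGEBYSCORE would.

```python
# Toy model (assumption: illustration of the sorted-set expiry idea,
# not the actual channels_redis implementation).
class SortedSetBacklog:
    def __init__(self, expiry):
        self.expiry = expiry
        self.entries = []  # (timestamp, message) pairs, kept sorted like a ZSET

    def add(self, message, now):
        self.entries.append((now, message))
        self.entries.sort()

    def prune(self, now):
        # Equivalent of: ZREMRANGEBYSCORE key -inf (now - expiry)
        cutoff = now - self.expiry
        self.entries = [(t, m) for t, m in self.entries if t > cutoff]


backlog = SortedSetBacklog(expiry=60)
backlog.add("old", now=0)
backlog.add("fresh", now=100)
backlog.prune(now=120)
print([m for _, m in backlog.entries])  # ['fresh']
```

Because each entry carries its own timestamp, expiry is per message rather than per key, so a busy channel can no longer keep stale messages alive.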

@tarikki
Contributor

tarikki commented Mar 6, 2020

Great! I also came up with something to fix this, will try to come back to it after work (timezone Europe/Helsinki).

@PelegMedia

PelegMedia commented May 6, 2020

@astutejoe there seems to be an issue with the use of bzpopmin in core.py.
Since this method returns None (nil) when the set is empty, the code fails when it tries to unpack a None result into _, member, timestamp.

The code should be as follows:

result = await connection.bzpopmin(channel, timeout=timeout)
if result is None:
    # An empty set returns None (nil), so there is no member to hand back
    member = None
else:
    # The result is (key, member, score); take the member and its timestamp
    member = result[1]
    timestamp = result[2]
    # Add the member to the backup queue
    await connection.zadd(backup_queue, float(timestamp), member)
return member

@astutejoe
Contributor Author

Very good catch @PelegMedia, I committed a fix. Really appreciate you testing this out for me.

@PelegMedia

Hi @astutejoe ,

We found a problem with one of the Redis Lua script usages. It can fill up memory until Redis hits its maximum.
We saw 60GB of memory fill up very quickly after sending data over the channel.

Redis caches every script it evaluates (forever), even if you execute it only once.
The only ways to clear that cache are SCRIPT FLUSH or a server restart.
If you execute a script with even a single character changed, Redis creates a new entry in its cache.

To solve this issue you have to pass the parameters as EVAL arguments (ARGV) while keeping the script text identical on each eval.

The core.py code makes 3 calls to "eval".
"cleanup_script" and "delete_prefix" are static and OK.

The third one, "group_send_lua", has the problem.
It is generated each time with Python string interpolation ("% self.expiry"), which produces a different script string every time.
That means each execution creates a new entry in the Redis script cache (forever).

I think the best solution is to pass the "expiry" value as one of the ARGV items.
That keeps the script text fixed across evals.
The three scripts will then create only three entries in the cache.
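A quick way to see why interpolation bloats the cache: Redis keys its script cache by the SHA1 of the script body (that is what SCRIPT LOAD / EVALSHA use), so every distinct string becomes a new entry. A minimal sketch, using simplified script bodies rather than the real group_send_lua:

```python
import hashlib


def script_sha(script: str) -> str:
    # Redis identifies cached scripts by the SHA1 of their exact text
    return hashlib.sha1(script.encode()).hexdigest()


# Problematic pattern: the expiry value is interpolated into the script body,
# so every distinct expiry yields a distinct script (and a new cache entry).
bad = ["redis.call('EXPIRE', KEYS[1], %d)" % e for e in (60, 120, 300)]

# Fixed pattern: the body is constant; expiry travels in ARGV at EVAL time.
good = ["redis.call('EXPIRE', KEYS[1], ARGV[1])" for _ in (60, 120, 300)]

print(len({script_sha(s) for s in bad}))   # 3 distinct cache entries
print(len({script_sha(s) for s in good}))  # 1
```

With the ARGV form, no matter how many different expiry values are used, the script cache holds a single entry per script.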

@astutejoe
Contributor Author

@PelegMedia Very good catch!! I'll try introducing that fix later.

@gurshafriri

gurshafriri commented Jun 28, 2020

👋 @astutejoe, Gur from Snyk here

Would you say this bug has a real security implication? If so, we would like to add it to our vulnerability database. Feel free to reach out at report@snyk.io to provide more details and context.
