
Made core layer close connection pools on loop close. #347

Merged · 1 commit · Mar 28, 2023

Conversation

@sevdog (Contributor) commented Feb 15, 2023

Small refactor to ensure that both RedisChannelLayer and RedisPubSubChannelLayer close their connections when the event loop closes.

This prevents redis from raising a RuntimeError (see #332).

Closes #332.
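The idea can be sketched roughly like this: a minimal, self-contained illustration of tracking one pool per event loop and closing it before the loop goes away. This is not the actual channels_redis implementation; `DummyPool` and `PerLoopPools` are hypothetical names for illustration only.

```python
import asyncio


class DummyPool:
    """Hypothetical stand-in for a redis connection pool."""

    def __init__(self):
        self.closed = False

    async def disconnect(self):
        self.closed = True


class PerLoopPools:
    """Sketch of the approach: keep one pool per event loop, so the pools
    created on a loop can be closed before that loop shuts down."""

    def __init__(self, pool_factory=DummyPool):
        self._pools = {}  # maps event loop -> pool
        self._factory = pool_factory

    def get_pool(self):
        loop = asyncio.get_running_loop()
        if loop not in self._pools:
            self._pools[loop] = self._factory()
        return self._pools[loop]

    async def close_pools(self):
        # Close and forget the pool bound to the current loop.
        loop = asyncio.get_running_loop()
        pool = self._pools.pop(loop, None)
        if pool is not None:
            await pool.disconnect()


async def main():
    pools = PerLoopPools()
    pool = pools.get_pool()
    assert pools.get_pool() is pool  # same loop -> same pool
    await pools.close_pools()
    return pool.closed


print(asyncio.run(main()))  # True
```

Because each loop owns its pools, a later call on a fresh loop simply creates fresh pools instead of reusing ones bound to a closed loop.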

@marksweb (Contributor)

Following some test failures while upgrading a project, mostly failing with:

RuntimeError: Two event loops are trying to receive() on one channel layer at once!

I now have a passing test suite with these changes. Thank you @sevdog

@carltongibson (Member) left a comment

OK, let's have it. (Lint, I see...)

@carltongibson (Member)

@sevdog I've been somewhat occupied. If you fancied reviewing the open issues and PRs for any others that you think are ready to go, or close, that would really help, and I'll see if I can get a release out. Thanks

@carltongibson (Member)

(There's the one about the settings dict at least IIRC)

@sevdog (Contributor, Author) commented Feb 24, 2023

> (There's the one about the settings dict at least IIRC)

Actually there are two: #341 (to avoid host dict changes in the base layer) and #337 (to pass extra params). It might be best to merge the first, since it could also cover the latter (though it needs changes, as it is only partial).

PS: it is funny that people focus on a single layer implementation, forgetting that there is another one 😅

@sevdog (Contributor, Author) commented Feb 24, 2023

Is it normal for the unit tests to run this long? I had problems locally too: the tests only worked with an old version of redis-py (4.3.5); with any more recent version something kept hanging forever.

@carltongibson (Member)

No, that means something is amiss.

It was happening a while back but had cleared up, so I wonder what changed?

@sevdog (Contributor, Author) commented Feb 24, 2023

Something goes wrong in one of these tests (the last two logged before the timeout):

tests/test_core.py::test_message_expiry__all_messages_under_expiration_time PASSED [ 23%]
tests/test_core.py::test_message_expiry__group_send PASSED               [ 24%]

My machine also hangs on them when I test with tox (which creates its own env).

The stack trace on cancellation shows it was waiting to receive something (I stopped tox during test_message_expiry__all_messages_under_expiration_time):

Traceback (most recent call last):
  File "channels_redis/tests/test_core.py", line 594, in test_message_expiry__group_send__one_channel_expires_message
    message = await channel_layer.receive(channel_2)
  File "channels_redis/channels_redis/core.py", line 386, in receive
    message_channel, message = await self.receive_single(
  File "channels_redis/channels_redis/core.py", line 441, in receive_single
    content = await self._brpop_with_clean(
  File "channels_redis/channels_redis/core.py", line 279, in _brpop_with_clean
    result = await connection.bzpopmin(channel, timeout=timeout)
  File "channels_redis/.tox/py310/lib/python3.10/site-packages/redis/asyncio/client.py", line 514, in execute_command
    return await conn.retry.call_with_retry(
  File "channels_redis/.tox/py310/lib/python3.10/site-packages/redis/asyncio/retry.py", line 62, in call_with_retry
    await fail(error)
  File "channels_redis/.tox/py310/lib/python3.10/site-packages/redis/asyncio/client.py", line 501, in _disconnect_raise
    raise error
  File "channels_redis/.tox/py310/lib/python3.10/site-packages/redis/asyncio/retry.py", line 59, in call_with_retry
    return await do()
  File "channels_redis/.tox/py310/lib/python3.10/site-packages/redis/asyncio/client.py", line 488, in _send_command_parse_response
    return await self.parse_response(conn, command_name, **options)
  File "channels_redis/.tox/py310/lib/python3.10/site-packages/redis/asyncio/client.py", line 535, in parse_response
    response = await connection.read_response()
  File "channels_redis/.tox/py310/lib/python3.10/site-packages/redis/asyncio/connection.py", line 836, in read_response
    response = await self._parser.read_response(
  File "channels_redis/.tox/py310/lib/python3.10/site-packages/redis/asyncio/connection.py", line 256, in read_response
    response = await self._read_response(disable_decoding=disable_decoding)
  File "channels_redis/.tox/py310/lib/python3.10/site-packages/redis/asyncio/connection.py", line 266, in _read_response
    raw = await self._readline()
  File "channels_redis/.tox/py310/lib/python3.10/site-packages/redis/asyncio/connection.py", line 343, in _readline
    raise ConnectionError(SERVER_CLOSED_CONNECTION_ERROR)
redis.exceptions.ConnectionError: Connection closed by server.
ERROR: got KeyboardInterrupt signal

From the Actions history it seems that something disrupted execution (the changelog PR #333 also took about 6 hours): https://github.com/django/channels_redis/actions

@carltongibson (Member)

Ok, I reran the workflow and it hung again. Something needs rewriting. I'll have to take a look (but that might be a little while...)

@bbrowning918 (Contributor)

In these latest runs redis==4.5.1 is being pulled in, and it seems to have some breaking changes (though I can't discern them from its changelog). Pinning to a version closer to what 4.0.0 was released against, in the 4.2.x–4.3.x range, seems to help.

I suspect some of the flaky-test pain around the 4.0.0 release came from having set redis>=4.2.0, which pulled whatever was latest at test time, possibly with changes incompatible with channels_redis.

https://github.com/bbrowning918/channels_redis/actions is me messing about with versions
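A temporary pin of the kind described might look like this in a requirements file. The exact upper bound below is illustrative; the thread had not settled on one at this point.

```
# Illustrative pin: stay in the range redis-py 4.0.0 was tested against,
# and below the 4.5.x releases that appeared to break the test suite.
redis>=4.2,<4.5
```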

@sevdog (Contributor, Author) commented Feb 27, 2023

@carltongibson I tried running the tests with pytest-timeout using a 10s timeout, and the tests that fail due to the timeout are:

FAILED tests/test_core.py::test_message_expiry__group_send__one_channel_expires_message - Failed: Timeout >10.0s
FAILED tests/test_sentinel.py::test_message_expiry__group_send__one_channel_expires_message - Failed: Timeout >10.0s

PS: there is also a TypeError in test_receive_cancel:

channels_redis/core.py:237: in send
    await connection.zremrangebyscore(
.tox/py310/lib/python3.10/site-packages/redis/asyncio/client.py:514: in execute_command
    return await conn.retry.call_with_retry(
.tox/py310/lib/python3.10/site-packages/redis/asyncio/retry.py:59: in call_with_retry
    return await do()
.tox/py310/lib/python3.10/site-packages/redis/asyncio/client.py:488: in _send_command_parse_response
    return await self.parse_response(conn, command_name, **options)
...
>           retval = self.response_callbacks[command_name](response, **options)
E           TypeError: int() argument must be a string, a bytes-like object or a real number, not 'NoneType'
.tox/py310/lib/python3.10/site-packages/redis/asyncio/client.py:547: TypeError
--------
FAILED tests/test_core.py::test_receive_cancel - TypeError: int() argument must be a string, a bytes-like object or a real number, not 'NoneType'
FAILED tests/test_sentinel.py::test_receive_cancel - TypeError: int() argument must be a string, a bytes-like object or a real number, not 'NoneType'

@carltongibson (Member)

@bbrowning918 -- If a temporary pin solved the issues, I'm happy to take that but ideally we'd work out what's going on... (I feel slightly bad saying that since I have 0 capacity myself just now 😬)

@carltongibson (Member)

> ... the tests which fail due to timeout are...

@sevdog — So if you skip those tests? 🤔 (It's not a solution, but it narrows down our issue....)

@carltongibson (Member) left a comment

Hey @sevdog — I rebased after #349 — all green.

Can I get you to add a regression test for this? The issue was calling async_to_sync() on the channel layer twice in the same context, so doing that? See #332 and django/channels#1966 (comment)
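The failure mode behind #332 can be reproduced without redis at all. This pure-asyncio sketch (where `LoopBoundResource` is a hypothetical stand-in for a connection pool, not anything from channels_redis) shows why a second sync-wrapped call, running on a fresh event loop, trips over a resource bound to the first, now-closed loop:

```python
import asyncio


class LoopBoundResource:
    """Hypothetical stand-in for a connection pool tied to the event loop
    it was created on (as redis connection pools effectively are)."""

    def __init__(self):
        self.loop = asyncio.get_running_loop()

    async def use(self):
        # Scheduling work on the stored loop fails once that loop has
        # closed -- the same way a pool from a dead loop fails.
        self.loop.call_soon(lambda: None)
        return "ok"


holder = {}


async def first_call():
    holder["pool"] = LoopBoundResource()
    return await holder["pool"].use()


async def second_call():
    # Reuses the resource created on the first loop, which asyncio.run()
    # has since closed -- analogous to a second async_to_sync() call
    # reusing a layer's cached pools.
    return await holder["pool"].use()


print(asyncio.run(first_call()))  # ok
try:
    asyncio.run(second_call())
except RuntimeError as exc:
    print(exc)  # Event loop is closed
```

Closing the pools per loop (as this PR does) means the second call builds new pools on its own loop instead of hitting the dead one.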

@carltongibson changed the title from "Assure pools are closed on loop close in core (#332)" to "Made core layer close connection pools on loop close." on Mar 9, 2023
@sevdog (Contributor, Author) commented Mar 9, 2023

Ok, as soon as I get a spare moment I will add a test asserting that no loop-related RuntimeError is raised.

@carltongibson (Member)

Thanks @sevdog — if we can get this one in then we're not far off being able to roll a 4.1. Thanks for your help! 🎁

@sevdog (Contributor, Author) commented Mar 9, 2023

I used a crude but effective copy of test_double_receive, which raises the RuntimeError on the current main branch but not on this branch.

I chose to "copy" that test since it was close to the proposed solution for the issue; I just had to remove the .close_pools call from the inner send.

@carltongibson (Member) left a comment

OK, let's take this. Thanks @sevdog!

See #348 for the PY311 failures, pending the next redis-py release.

@carltongibson merged commit 0c89a97 into django:main on Mar 28, 2023
5 of 6 checks passed
fredfsh pushed a commit to cobalt-robotics/channels_redis that referenced this pull request Jun 20, 2023
Development

Successfully merging this pull request may close these issues.

Update to v4 results in "RuntimeError: Event loop is closed"
4 participants