Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Chanel layer group_send bug #48

Open
davidsarosi92 opened this issue Jul 13, 2023 · 6 comments
Open

Chanel layer group_send bug #48

davidsarosi92 opened this issue Jul 13, 2023 · 6 comments
Labels
question Further information is requested

Comments

@davidsarosi92
Copy link

Hi!

I use Django 4.2.3, Channels 4.0.0 and strawberry-graphql-django 0.10.6.
Python version is 3.11.4

channel_layer is channels_rabbitmq.core.RabbitmqChannelLayer

I use subscription, and I got an error message sometimes for channel_layer.group_send:

| Fatal error: protocol.data_received() call failed.
| protocol: <carehare._protocol.Protocol object at 0xffff6edd6690>
| transport: <_SelectorSocketTransport fd=17 read=polling write=<idle, bufsize=0>>
| Traceback (most recent call last):
|   File "/usr/local/lib/python3.11/asyncio/selector_events.py", line 1009, in _read_ready__data_received
|     self._protocol.data_received(data)
|   File "/usr/local/lib/python3.11/site-packages/carehare/_protocol.py", line 118, in data_received
|     self._handle_buffer_frames()
|   File "/usr/local/lib/python3.11/site-packages/carehare/_protocol.py", line 170, in _handle_buffer_frames
|     self._handle_frame_or_crash(channel_id, frame)
|   File "/usr/local/lib/python3.11/site-packages/carehare/_protocol.py", line 184, in _handle_frame_or_crash
|     self.accept_frame(frame, channel_id)
|   File "/usr/local/lib/python3.11/site-packages/carehare/_protocol.py", line 212, in accept_frame
|     return self._accept_frame_after_handshake(frame, channel_id)
|            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
|   File "/usr/local/lib/python3.11/functools.py", line 946, in _method
|     return method.__get__(obj, cls)(*args, **kwargs)
|            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
|   File "/usr/local/lib/python3.11/site-packages/carehare/_protocol.py", line 220, in _accept_frame_after_handshake
|     channel.accept_frame(frame)
|   File "/usr/local/lib/python3.11/functools.py", line 946, in _method
|     return method.__get__(obj, cls)(*args, **kwargs)
|            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
|   File "/usr/local/lib/python3.11/site-packages/carehare/_exchange_channel.py", line 52, in _
|     self._delivering.pop(tag).set_result(None)
| asyncio.exceptions.InvalidStateError: invalid state

Sometimes channel_layer.group_add is stuck, I don't know, it is just waiting...

Will there be support for Python 3.11?

@strawberry.subscription
    async def inventory_count_header(self, info: Info, inventory_count_header_id: int) -> AsyncGenerator[
        InventoryCountHeaderWithInventoryItemsType, None]:
        ...

        ws = info.context.ws
        channel_layer = ws.channel_layer
        group_name = ...

        await channel_layer.group_add(group_name, ws.channel_name)

        await channel_layer.group_send(
            group_name,
            {
                'type': ...,
                'operation': ...,
                'id': ...
            },
        )

        async for message in ws.channel_listen(ChannelGroup.INVENTORY_COUNT_HEADER_TYPE.value, groups=[group_name]):
            operation: Operation = message['operation']
            ...
            yield result
@adamhooper
Copy link
Contributor

Hiya! Thank you for your interest.

Could you please reproduce this error without Strawberry in the picture?

@adamhooper
Copy link
Contributor

Also: is there an exception logged before this one?

@adamhooper
Copy link
Contributor

adamhooper commented Jul 14, 2023

I suspect the culprit might be somewhere around here: https://github.com/strawberry-graphql/strawberry/blob/fb7a9c463027d86965219b8647220d35fd692084/strawberry/subscriptions/protocols/graphql_ws/handlers.py#L196C8-L196C8

The key being: something might be calling .cancel() on the .group_send(). And if that happens, it messes with state.

Okay, gotta rant about Python asyncio for a minute. There are probably a million cancel-related bugs in asyncio-based libraries. What a mess. Nobody should cancel tasks because every async library is hopelessly buggy. And every async library is hopelessly buggy because cancelling tasks is so freakishly difficult to implement -- the opposite of how Python should be! (We should all use cancellation contexts, like Go does and C# does and JS does and ....)

All right, rant complete. @davidsarosi92 I don't know your level of expertise in forking and trickery ... but I'm going to need you to dig deep. Can you somehow edit carehare (not channels_rabbitmq) carehare/_exchange_channel.py and change return done to return asyncio.shield(done) ... and get that code running in a place where you could normally make the bug happen. Does that asyncio.shield(done) make your problems go away?

I'm not sure asyncio.shield(done) is the solution we want. The goal here is just to figure out whether the problem is with a cancel in group_send().

@davidsarosi92
Copy link
Author

I missed an error message last time: Failed to send message {'type': '...', 'operation': ..., 'id': ...} to group ... because we disconnected

I'm not sure asyncio.shield(done) is the solution we want. The goal here is just to figure out whether the problem is with a cancel in group_send().

Okay, I changed carehare/_exchange_channel.py and now I got socket.send() raised exception.
I try to help, but I just now started use asyncio tools.

Also: is there an exception logged before this one?

Huhh, it's a good question, I use graphql-flutter,
I started an update with Python 3.9, Django 3.2 etc... Now I use 3.11, 4.2...
I just created temp. subscription solution for later developmet, because it's a bigger update. When we started working on subscription, we got this.

I don't know if it helps, but there is no such problem with redis and redis_channel, but if there is an opportunity, I would stick with the rabbitmq solutions.

I have a hunch that this happens when "disconnect doesn't happen".
Connection managed by graphql_flutter

@adamhooper
Copy link
Contributor

Thanks for trying it out! I'd need to see more thorough logs to pick through those error messages and figure out what's happening....

@adamhooper
Copy link
Contributor

Marking this issue as "question": it looks like there's something wrong, but I'm not sure whether it's in the application or in this library.

socket.send() raised exception looks like it happens if we try and send after the RabbitMQ connection has been closed. ( python/asyncio#301). But that shouldn't happen -- not without also triggering warnings from carehare -- unless something weird is going on. Invalid use of threads or os.fork() could cause this message, for instance.

More thorough logs may help figure out what's happening.

@adamhooper adamhooper added the question Further information is requested label Jul 23, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
question Further information is requested
Projects
None yet
Development

No branches or pull requests

2 participants