Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CometD: stalled subscription notifications #530

Open
timbell opened this issue Feb 5, 2021 · 5 comments
Open

CometD: stalled subscription notifications #530

timbell opened this issue Feb 5, 2021 · 5 comments
Labels

Comments

@timbell
Copy link

timbell commented Feb 5, 2021

Testing CometD subcriptions to server status and player status notifications using libcometd.js from the material-skin plugin.

When this code path is followed on the server long-polled notifications are stalled until the next long polling timeout (60s) occurs. This is due to the remove_connection call in sendHTTPResponse.

Reproducing the problem is a bit tricky. I've managed to do it reasonably reliably using the attached test web page as follows:

  • copy the attached files to a test-cometd folder under HTML/Default/html folder on the server
  • run the page in a browser with a URL like: http://localhost:9000/html/test-cometd/
  • click on Connect (the first player registered on the server will be selected) and then Subscribe Player
  • power on/off the selected player to generate server and player status notifications which should show up in the page's Log widget
  • refresh the page in the browser and repeat the previous two steps. On the 2nd or 3rd attempt (page refresh + previous 2 steps) you should see that the status notifications are stalled until the long-polling timeout occurs.

NB the page refreshing is crucial to getting transport('long-polling') to be set on the $httpClient used to service the /slim/subscribe player status call.

From what I can see, the problem could be fixed in a few different ways:

  • suppressing the remove_connection call in this exact case
  • suppressing the remove_connection call more generally - eg only calling it in the /meta/(re)connect case?
  • removing $httpClient->transport('long-polling') from the client once the response has been sent
  • others???

The problem was reproduced with LMS running on Windows 10 (version 8.1) and Linux (version 7.9.1).

@michaelherger
Copy link

Is this something @CDrummond has seen, too? Are you able to reproduce this with any browser?

@CDrummond
Copy link
Contributor

Can't say I have noticed this issue, and I do refresh the page quite often to ensure the skin reloads. I clear cache and do a full refresh/reload, so perhaps this is why I do not see it?

I have noticed, but probably not related to this issue, that notifications can be 'slow' (i.e. take a few seconds to show), which is why Material manually requests a status update after most actions.

@timbell
Copy link
Author

timbell commented Feb 7, 2021

Reproduced with Chrome + Firefox on Windows and Chrome on Android. I'm pretty sure it's not a client issue - the page refreshing just seems to be a reasonably reliable way to get the $delayedResponse = 0 code path to be followed on the server.

What I think is happening is that the $httpClient instances persist across requests and each one is associated with a particular socket. (I could be way off here - new to Perl.) When the page is opened we get, say, socket 1 used for /meta/connect long polling requests and socket 2 used for other requests (/slim/subscribe). Refreshing the page and connecting again can lead to socket 2 being used for /meta/connect long polling requests and socket 1 being used for /slim/subscribe requests.

So, the $httpClient that was initially being used for the long-polling requests (and so has transport('long-polling') set) is now being used for the /slim/subscribe requests. And that's how we get to execute the $delayedResponse = 0 code path. (There could be other ways to get there too.)

@michaelherger
Copy link

Thanks for your test files, @timbell. I believe I have been able to reproduce the issue: sometimes after reloading a web UI using long-polling, it would take a time out on the long-poll before the connection would re-establish. Therefore after a page reload we can have up to 60 seconds before the notifications are reaching the web UI. Is this correct? Are there any other limitations I haven't been able to experience?

@timbell
Copy link
Author

timbell commented Feb 7, 2021

Yes, @michaelherger, that's right. (There may be other ways that can cause the $delayedResponse = 0 code path in Slim::Web::Cometd::handler to be followed, however.)

A simple fix might be to clear the transport in Slim::Web::Cometd::handler like this:

    # reset transport (especially for the long-polling case)
    $conn->[HTTP_CLIENT]->transport(  ' '  );
    
    my $clid;
    my $events = [];
    ...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

4 participants