Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Session detached after new session attached, resulting in wamp.error.procedure_already_exists #578

Open
DamiToma opened this issue May 16, 2023 · 5 comments
Labels
bug CI-CD Test, build and packaging infra needs-investigation

Comments

@DamiToma
Copy link

DamiToma commented May 16, 2023

I'm using Crossbar v20.12.3 and autobahn v20.9.2. Sometimes the onclose callback is called before the session is detached from the router.

I had a running process connected to the router with session ID 7159338250676650. Then, the onclose event was called at 2023-05-15T20:51:11.041Z with the following content

{
  "reason": "lost",
  "details": {
    "reason": null,
    "message": null,
    "retry_delay": 1.9348029488634622,
    "retry_count": 1,
    "will_retry": true
  }
}

In our implementation, we register all procedures that were registered before as soon as the connection is open again. So, as soon as we received the open event, we registered all procedures that were registered before. However, all of them failed to be created with the error wamp.error.procedure_already_exists. This should not be possible because we only try to register when the new session is connected.

After reviewing router logs, I found out that the previous session had its detached session log about 30 seconds after the new session connected. So my understanding is:

  1. Client lost connection at 20:51:11.041
  2. Router logged the new session at 20:51:13, below is the log
    2023-05-15T20:51:13+0000 [Router 12] attached session 1165314019232070 to realm "myrealm" (authid="username", authrole="role") <crossbar.router.router.Router.attach>
  3. Client called the onopen callback at 20:51:13.512Z and it tried to register all procedures that were registered with the old session, however none of them worked because router still had to detach old session
  4. Router logged old session detachment at 20:51:46, below is the log
    2023-05-15T20:51:46+0000 [Router 12] detached session 7159338250676650 from realm "myrealm" (authid="username", authrole="role", detached 1 sessions total) <crossbar.router.router.Router.detach>

So the question is, should the router detach the session before attaching the new one? Otherwise, how can we handle disconnections and register procedures again? Thanks!

@oberstet
Copy link
Contributor

Sometimes the onclose callback is called before the session is detached from the router.

if so (a session is detached client side only), the that session's internal list of registered procedures and subscriptions must be cleared together with the connection being closed (and possibly auto-reconnect then happen ..)

if this is not the case, it is likely a bug

in this case, we first need a test that reproduces the situation (and then fails currently, because of the supposed bug), and then a fix which rectifies the bug and makes the (new) test pass

@oberstet oberstet added bug needs-investigation CI-CD Test, build and packaging infra labels May 16, 2023
@DamiToma
Copy link
Author

The reason I'm suspicious about this is reconnections happen several times a day, however this is the time we experience such issue. It's not really easy to look into the data (because router is running with minimal logging), however I'm confident about my analysis because the timestamps between router and client logs match. I'm almost sure the old session was not cleared before the new one was created, and this resulted in the list of procedures being held up.

How would we reproduce this? We've been running our production environment for a lot of time now, and it's the first time we experience such issue (although I would not be surprised if it happened another time in the past and I did not recognize it). Perhaps I can run a script on the staging environment that does frequent reconnects? Also, I'm not sure whether clearing out procedures is something that ought to be done on the router or client. Is there resources about this? I've looked into the protocol specification and I'm not sure whether this report belongs to crossbar or Autobahn either.

@DamiToma
Copy link
Author

DamiToma commented Jul 6, 2023

Hi @oberstet Following up on this. I would love to help figuring out whether this is an actual bug (and possibly help fix it), but not sure the best way of reproducing. Is what I proposed above fine? I might just go ahead and try it out, or is there any standard you're following when debugging WAMP related libraries?

@oberstet
Copy link
Contributor

oberstet commented Jul 6, 2023

if it makes sense and works, one option would be expanding the automated testing that we do have in this repo. eg consider

var dl = testutil.connect_n(2);

this will fire up 2 sessions (originating from the 1 test driver instance) connecting to the same router, and then does checking of behavior across sessions.

a test might even "kill -9" the router instance during the test, then restart a router instance, then wait for the testee sessions to reconnect, and then test whatever behavior during or after the reconnect should be tested for ..

@oberstet
Copy link
Contributor

oberstet commented Jul 6, 2023

local testing can be started via makefile:

test_pubsub_basic:

github actions automated testing is using that makefile also:
https://github.com/crossbario/autobahn-js/blob/c831f35c1911eb31ec8ffdfe0f62fa15e9dcccb6/.github/workflows/main.yml#L44C18-L44C18

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug CI-CD Test, build and packaging infra needs-investigation
Projects
None yet
Development

No branches or pull requests

2 participants