Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Problem with INFO event handling. #43

Closed
phildunks opened this issue Jun 27, 2019 · 27 comments

Comments

Projects
None yet
2 participants
@phildunks
Copy link

commented Jun 27, 2019

Hi Dave
Making good progress but hit a problem.

I have a sipp UAC script emulating our application server executing a MSML call flow to collect a PIN and join a conference.
This all works fine.

However, the same call flow fails for calls from the actual application server.
It appears the INFO event handler in our javascript is not being called in this call, and the srf is responding automatically to the INFO.

https://gist.github.com/phildunks/8eba657aec5ff8dea97b3459980e1bdc

This gist contains the full srf log for both calls. I have commented inline to explain the call flow.

Thanks very much for your help.
Phil

@davehorton

This comment has been minimized.

Copy link
Owner

commented Jun 27, 2019

The two scenarios do actually differ in a subtle but important way:

  • in the first call, there is no tag on the To header of the INFO request
  • in the second call, there is a tag on the To header of the INFO request

That makes the first example a request outside a dialog, while the second example is a sip request occuring within a dialog.

Since the INFO is intended (I think) to be a message within a dialog, I think you should change your SIPP script to include the To header (with tag) that you got back from the 200 OK to INVITE.

Now as to why the app is not getting triggered by the INFO within a dialog, that's because you need to handle that differently then an INFO outside a dialog. You need to do this:

uac.on('info', (req, res) => {..}

instead of

srf.info((req, res) => {..}

where uac is the UAC dialog you created with the call to Srf#createB2BUA.

@phildunks

This comment has been minimized.

Copy link
Author

commented Jun 27, 2019

Thanks Dave

Yes, the INFO is intended to be within a dialog.

I've only ever had an event handler on the uac dialog object though, never on the srf.

It's just got a bit weirder.

I came back from lunch, haven't changed anything, and made a call using the real application server - and it worked fine (!?!).

The next and subsequent calls fail with the usual problem.

https://gist.github.com/phildunks/d8785e456a2646bc59d8030f788976d6

This gist shows the srf log for the first two sequential calls from the application server. One good, one bad.

@phildunks

This comment has been minimized.

Copy link
Author

commented Jun 27, 2019

Sorry I should mention for clarity, that because the first call in (from the caller - PIN collection done on this leg) was successful, the application server creates a dedicated session for the conference control leg (all other MSML on this leg). So you will see a second successful call coming in immediately. This is part of the same conference.

The failed call comes after this conference has ended.

@davehorton

This comment has been minimized.

Copy link
Owner

commented Jun 27, 2019

Not sure what the issue is with that 3rd INVITE, where we get an INFO and send the default 200 OK response, but whatever the issue is it will be on the app side -- the drachtio server does send that INFO over to the app, and apparently there is no handler found for the INFO message within that dialog.

Not sure if it is related, but I see a BYE come in for the conference control leg (the second dialog) before we get the 3rd INVITE. shouldn't the conference control leg be staying up?

To troubleshoot what is going on on the app side, I suggest running with DEBUG set as below:

DEBUG=drachtio:srf node app.js

with particular attention to the logging that you see when the INFO on that 3rd dialog comes in

@phildunks

This comment has been minimized.

Copy link
Author

commented Jun 27, 2019

Ok, it looks like a race condition, where for the failed calls, the INFO is processed before the B2B is fully established.

I think this is probably due to my schonky javascript.

I have been manually creating the B2B (eg. using createUAS + createUAC, rather than createB2BUA). I did have reasons for this, but now I understand more about the srf I don't think they are valid.

This gist contains the offending code, and application logs for good and bad call showing the problem occurring.

I'll change it to use createB2BUA and let you know how it goes.

Thanks again for your quick responses.

https://gist.github.com/phildunks/62628a2b6e2383cea6ed6c56390bb578

@phildunks

This comment has been minimized.

Copy link
Author

commented Jun 27, 2019

Working fine with createB2BUA.
Thanks for your help.

@phildunks phildunks closed this Jun 27, 2019

@davehorton

This comment has been minimized.

Copy link
Owner

commented Jun 27, 2019

Yes, it seems like a race condition. Try using createB2BUA and let me know if there is still a race condition...there shouldn't be, but I am wondering now if there is a small window where this can happen, which I need to address, so let me know your results.

I slightly simplified and cleaned your code example here:

https://gist.github.com/davehorton/1b93ace680c049d740d646bf9976492c

I left out the manipulation of tags you were doing because I wasn't sure that was really necessary (but I may be not understanding the purpose).

@davehorton davehorton reopened this Jun 27, 2019

@phildunks

This comment has been minimized.

Copy link
Author

commented Jun 27, 2019

Thanks Dave.
It's certainly working much better with createB2BUA and I haven't been able to reproduce it yet.
When we've worked out all the functionals, we'll definitely be load testing.
I'll let you know how it goes.

@davehorton

This comment has been minimized.

Copy link
Owner

commented Jun 27, 2019

great, thanks

@davehorton davehorton closed this Jun 27, 2019

@phildunks

This comment has been minimized.

Copy link
Author

commented Jun 28, 2019

Hi Dave
I'm still getting this issue occasionally when using createB2BUA.
Having said that, it is running in a docker container on my 6 year old mac, which is also running a VM with a freeswitch, plus a couple of IDE's etc. We haven't tried it on a decent server.
But it does confirm that there is a small window for it to happen.
Thanks
Phil

@davehorton

This comment has been minimized.

Copy link
Owner

commented Jun 28, 2019

Thanks. Is there any chance of getting another drachtio server log when it happens?

@phildunks

This comment has been minimized.

@phildunks

This comment has been minimized.

Copy link
Author

commented Jul 2, 2019

Hi Dave
Do you need anything else?
Any luck spotting the issue?
Thanks
Phil

@davehorton

This comment has been minimized.

Copy link
Owner

commented Jul 2, 2019

I tried creating with this test app last night, unsuccessfully

    try {
      const {uas, uac} = await srf.createB2BUA(req, res, 'sip:172.31.73.159:5065');
      logger.info('call connected successfully');
      [uas, uac].forEach((dlg) => dlg.on('destroy', () => {
        logger.info('call ended'); dlg.other.destroy();
      }));
      uas.on('info', (req, res) => {
        logger.info('dialog got info');
        res.send(503);
      })
    } catch (err) {

I responded 503 in the event handler so I could distinguish the failure case where a default 200 OK is sent.

In single calls generated to this with a sipp test app, which sent an INFO immediately following the app, I could never make this happen.

Does your app differ in any significant way from the above? Are you able to recreate (intermittently) on single calls, or is it more of a load test scenario?

@davehorton davehorton reopened this Jul 2, 2019

@phildunks

This comment has been minimized.

Copy link
Author

commented Jul 2, 2019

Hi Dave
It fails on single calls. It's fine for a while, and then starts failing for a while.
I'm not using 'await' in my code.
I'll update it to use that, and see if it improves things.
I'll let you know.
Thanks
Phil

@davehorton

This comment has been minimized.

Copy link
Owner

commented Jul 2, 2019

hmm, the 'await' should be the same as resolving the Promise via 'then' so I'll be surprised if that happens.

As long as you call .on('info' on the dialog as soon as it is returned from createB2BUA this should not be happening.

I'll try running my test longer to see if I can get it to happen. Meanwhile let me review the logs you provided again to see if I need any more from your side

@davehorton

This comment has been minimized.

Copy link
Owner

commented Jul 2, 2019

ok, had a quick look at your logs. Would it be possible to get another set of logs showing an occurrence:

  • the drachtio server log, at debug level, as you had it before
  • the log from the drachtio app, with DEBUG=drachtio:srf when running?
@phildunks

This comment has been minimized.

Copy link
Author

commented Jul 2, 2019

OK sure.
I've updated to use await, and it's behaving fine at the moment.

@phildunks

This comment has been minimized.

Copy link
Author

commented Jul 2, 2019

Hi Dave

Here are the logs.

https://gist.github.com/phildunks/035dafadebd3ec52c1baf58b92d21cc1

The problem can be seen in the app/srf log :

 drachtio:srf examining ACK, dialog id: st5hsVxKUj;from-tag=1213582815620752658891.0 +2ms
  drachtio:srf calling dialog handler +0ms
  drachtio:srf Dialog handling message:  ACK +4m
  drachtio:srf examining INFO, dialog id: st5hsVxKUj;from-tag=1213582815620752658891.0 +1ms
  drachtio:srf calling dialog handler +0ms
  drachtio:srf Dialog handling message:  INFO +1ms
  drachtio:srf createB2BUA: successfully created UAS..done! +1ms
Created B2BUA

We can see the INFO is processed before the B2BUA is created, and hence the INFO event handler has not been registered on the dialog yet.

Thanks
Phil

@phildunks

This comment has been minimized.

Copy link
Author

commented Jul 2, 2019

I'm probably barking up the wrong tree, but there is a small difference in ordering in the logs for good and bad calls.

Good Call

2019-07-02 14:04:14.742321 timerG: removed entry, timer not set (queue is empty after removal), length: 0
2019-07-02 14:04:14.742575 timerH: removing entry, prior to removal length: 1
2019-07-02 14:04:14.742636 timerH: removed entry, timer not set (queue is empty after removal), length: 0
2019-07-02 14:04:14.743019 Client::write_handler - wrote 528 bytes: system:0
2019-07-02 14:04:14.743107 ClientController::removeNetTransaction: transactionId 91487ead-1e26-47d8-be70-a368248fafe0; size: 2
2019-07-02 14:04:14.743166 ClientController::route_ack_request_inside_dialog - removed incoming invite transaction, map size is now: 2 request
2019-07-02 14:04:14.743230 nta.c:5892 incoming_free() nta: incoming_free(0x55d2415c0240)
2019-07-02 14:04:14.743350 tport.c:2777 tport_wakeup_pri() tport_wakeup_pri(0x55d2415275a0): events IN

Here we can see ' wrote 528 bytes' comes before ' incoming_free()'.

Bad Call

2019-06-27 12:59:47.762910 timerG: removed entry, timer not set (queue is empty after removal), length: 0
2019-06-27 12:59:47.762953 timerH: removing entry, prior to removal length: 1
2019-06-27 12:59:47.762983 timerH: removed entry, timer not set (queue is empty after removal), length: 0
2019-06-27 12:59:47.763113 ClientController::removeNetTransaction: transactionId c89659cc-683e-4f58-81a8-99e3fde9ff52; size: 0
2019-06-27 12:59:47.763155 ClientController::route_ack_request_inside_dialog - removed incoming invite transaction, map size is now: 0 request
2019-06-27 12:59:47.763204 nta.c:5892 incoming_free() nta: incoming_free(0x55d8ce0fcc10)
2019-06-27 12:59:47.763234 Client::write_handler - wrote 528 bytes: system:0
2019-06-27 12:59:47.763288 tport.c:2777 tport_wakeup_pri() tport_wakeup_pri(0x55d8ce0e55a0): events IN

Here we can see it comes after.

Other than that, the logs seem pretty identical.

@davehorton

This comment has been minimized.

Copy link
Owner

commented Jul 2, 2019

can you report which version of drachtio-srf you are using?

@phildunks

This comment has been minimized.

Copy link
Author

commented Jul 2, 2019

{
  "_from": "drachtio-srf",
  "_id": "drachtio-srf@4.4.8",
  "_inBundle": false,
  "_integrity": "sha512-mPyU4tPiabpssctl3GGh0Gwrixbo1ZPeG1RcC4uytv/7F6NECciU6D0inRUJushjkLMb088N7G3qrAI3JqewqA==",
  "_location": "/drachtio-srf",
  "_phantomChildren": {},
  "_requested": {
    "type": "tag",
    "registry": true,
    "raw": "drachtio-srf",
    "name": "drachtio-srf",
    "escapedName": "drachtio-srf",
    "rawSpec": "",
    "saveSpec": null,
    "fetchSpec": "latest"
  },
  "_requiredBy": [
    "#USER",
    "/"
  ],
  "_resolved": "https://registry.npmjs.org/drachtio-srf/-/drachtio-srf-4.4.8.tgz",
  "_shasum": "45704314a05f940a7e93e61854213a9b60ca2adc",
  "_spec": "drachtio-srf",
  "_where": "/temp/drachtio",
  "author": {
    "name": "Dave Horton"
  },
  "bugs": {
    "url": "https://github.com/davehorton/drachtio-srf/issues"
  },
  "bundleDependencies": false,
  "dependencies": {
    "async": "^1.4.2",
    "debug": "^3.1.0",
    "delegates": "^0.1.0",
    "deprecate": "^1.0.0",
    "drachtio-mw-registration-parser": "0.0.2",
    "drachtio-sip": "^0.3.9",
    "lodash": "^4.17.11",
    "node-noop": "0.0.1",
    "only": "0.0.2",
    "sip-methods": "^0.3.0",
    "utils-merge": "1.0.0",
    "uuid": "^3.0.0",
    "winston": "^2.4.2"
  },
  "deprecated": false,
  "description": "drachtio signaling resource framework",
  "devDependencies": {
    "blue-tape": "^1.0.0",
    "cheerio": "^0.22.0",
    "config": "^1.28.1",
    "drachtio-test-fixtures": "^0.1.7",
    "eslint": "^5.16.0",
    "eslint-plugin-promise": "^3.6.0",
    "fs-extra": "^0.26.7",
    "minimist": "^1.1.0",
    "nyc": "^14.1.0",
    "passport": "^0.3.2",
    "passport-http": "^0.3.0",
    "tap": "^13.1.6",
    "tap-dot": "^2.0.0",
    "tap-spec": "^5.0.0"
  },
  "engines": {
    "node": ">= 6.9.3"
  },
  "homepage": "https://github.com/davehorton/drachtio-srf#readme",
  "license": "MIT",
  "main": "lib/srf.js",
  "name": "drachtio-srf",
  "repository": {
    "type": "git",
    "url": "git+https://github.com/davehorton/drachtio-srf.git"
  },
  "scripts": {
    "coverage": "nyc --reporter html --report-dir ./coverage npm run test",
    "jslint": "eslint lib",
    "test": "NODE_ENV=test node test/ | ./node_modules/.bin/tap-spec"
  },
  "version": "4.4.8"
}

@davehorton

This comment has been minimized.

Copy link
Owner

commented Jul 2, 2019

OK, I think I see the issue. It's a Promise race condition -- even though the code delivering a Promise that resolves with the uas is called here:

drachtio:srf Srf#addDialog: adding dialog with id st5hsVxKUj;from-tag=1213582815620752658891.0 type INVITE, dialog count is now 2  +24ms

the .then(uas) code is not guaranteed to execute immediately, and per the spec is enqueued at the back of the job queue.

That queue already has the incoming and 'ack' and 'info' messages that have arrived, which then get processed:

drachtio:srf examining ACK, dialog id: st5hsVxKUj;from-tag=1213582815620752658891.0 +2ms
  drachtio:srf calling dialog handler +0ms
  drachtio:srf Dialog handling message:  ACK +4m
  drachtio:srf examining INFO, dialog id: st5hsVxKUj;from-tag=1213582815620752658891.0 +1ms
  drachtio:srf calling dialog handler +0ms
  drachtio:srf Dialog handling message:  INFO +1ms

and only then do we see are 'then' code execute:

drachtio:srf createB2BUA: successfully created UAS..done! +1ms

Now that I think I see what is going on I should be able to work out a solution.

@phildunks

This comment has been minimized.

Copy link
Author

commented Jul 2, 2019

That's great.
Thanks Dave

@davehorton

This comment has been minimized.

Copy link
Owner

commented Jul 2, 2019

can you install and test with this version:

npm install --save drachtio-srf@beta

and let me know if this resolves it?

@phildunks

This comment has been minimized.

Copy link
Author

commented Jul 3, 2019

Hi Dave
Hasn't occurred all morning.
Approx a couple of hundred calls.
Looks very much better!
I'll keep an eye on it, and let you know any issues.
Thanks very much for your help.
Phil

@davehorton

This comment has been minimized.

Copy link
Owner

commented Jul 3, 2019

ok, you can update to drachtio-srf@4.4.9 (latest) if you like as well

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.