New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Extra 2xx_transaction on reply retransmission #1636
Comments
@bcnewlin , I can confirm (by checking the code) that the statistics are actually counting the number of relayed replies (with certain code classes). |
Any updates here? No progress has been made in the last 15 days, marking as stale. Will close this issue if no further updates are made in the next 30 days. |
Updating to prevent stale bot from closing. |
@bcnewlin , could you comment on my above reply ? |
@bogdan-iancu Apologies, I didn't realize you were looking for a response. I think your suggested approach is a good one. |
OK, thanks for confirmation, I will work out a fix in this direction. |
Any updates here? No progress has been made in the last 15 days, marking as stale. Will close this issue if no further updates are made in the next 30 days. |
@bogdan-iancu This patch seems to be causing transaction counts to be not counted in some cases. On our server that uses only TM it appears to be fine, but on our server that uses Dialog module the INVITE transaction is not counted on reply. Here are the stats after a single successful call, which is INVITE and BYE both with 200OK: You can see one 200OK is not counted and inuse_transactions is left at 1. I believe it is the INVITE that is getting missed as when I run a call with only an INVITE and 486 response I see these stats: |
@bcnewlin , maybe to pull the stats too fast after the call end. The transactions are deleted (by default) after like 5 secs after final reply. This may explain why the BYE transaction is still in use, and not yet counted as 2xx. Just be sure you wait 10 secs or so. Right now the Nxx_transactions stats are updated when the transactions are destroyed , so after a bit of time after their completion. |
@bogdan-iancu You are right, inserting a delay before pulling the statistics resolved the issue. This is not too bad a side effect in practical use, but will add significant time to our testbed unfortunately. I have decreased the wt_timer TM parameter in our tests to compensate, but that may potentially increase our exposure to other scheduling and timing issues. However, that is our problem. I will continue testing the patch and let you know the results. |
@bogdan-iancu I have verified the original patch fixes the scenario described in this ticket and reports the correct stats for all our other tests. For the new patch, I have verified that it fixes the scenario described in this ticket and does not require a delay before collecting stats. However, I am now seeing an extra 4xx_transaction count for scenarios involving authentication on our server using Dialog. Even if no second Invite is sent with credentials (because no matching credentials are found), I still see two 4xx_transactions (and two received_replies) but only one UAS_transaction: 4xx_transactions: 2, I'm attaching a trace for this scenario, in which the OpenSIPS server using Dialog is at 203.0.113.3. I've verified this is not the case for generic 4xx responses, it seems to only occur for 401, though possibly also for 407 as we have no test for that since SIPp does not support it. |
@bcnewlin , the thing you observed may be related to this part of the code - when I sent you the second patch I had in my mind the issue with "double" |
@bogdan-iancu I think you are right about that code being the culprit. I've run all of our tests against this patch and verified it produces the expected statistics in all cases with no delay. I've also verified it produces the correct statistics in the reported case of a retransmission. |
Perfect, thank you @bcnewlin for the testing, I will push the fix and backport it also. |
The fix consist of counting only ONCE each transaction (for the xxx_transaction statistics), disregarding its retransmissions, multiple 200OK replies or final code override (like 200 after a negative reply). Closes #1636 Many thanks to @bcnewlin for his testing and validation of the fix. (cherry picked from commit 68bedfe)
The fix consist of counting only ONCE each transaction (for the xxx_transaction statistics), disregarding its retransmissions, multiple 200OK replies or final code override (like 200 after a negative reply). Closes #1636 Many thanks to @bcnewlin for his testing and validation of the fix. (cherry picked from commit 68bedfe)
I have found a scenario where I believe there is an error in statistics collection for the TM module. I am attaching a trace where OpenSIPS is the node at 203.0.113.3.
In this scenario the ACK to the 200 OK response is delayed and the 200 response is retransmitted. After this call, the TM module showed the following statistics:
4xx_transactions: 0,
UAS_transactions: 2,
2xx_transactions: 3,
relayed_replies: 4,
5xx_transactions: 0,
received_replies: 7,
3xx_transactions: 0,
UAC_transactions: 0,
local_replies: 1,
6xx_transactions: 0,
inuse_transactions: 2
There should be only two 2xx_transactions: the INVITE and the BYE. I believe the retransmitted 200 OK is being erroneously counted as an additional transaction.
opensips -V
version: opensips 2.4.4 (x86_64/linux)
flags: STATS: On, DISABLE_NAGLE, USE_MCAST, SHM_MMAP, PKG_MALLOC, F_MALLOC, FAST_LOCK-ADAPTIVE_WAIT
ADAPTIVE_WAIT_LOOPS=1024, MAX_RECV_BUFFER_SIZE 262144, MAX_LISTEN 16, MAX_URI_SIZE 1024, BUF_SIZE 65535
poll method support: poll, epoll, sigio_rt, select.
git revision: 52ff74a
main.c compiled on 20:42:40 Mar 13 2019 with gcc 7
extra_transaction.pcap.zip
The text was updated successfully, but these errors were encountered: