[BUG] HA backup node deletes subscriptions in Federation cluster #2960
Comments
Any updates here? No progress has been made in the last 15 days, marking as stale. Will close this issue if no further updates are made in the next 30 days.
@gitforxh, I'm a bit confused here - you mentioned 2 servers in HA mode (so one active, the other standby), but I see your clustering configuration is "on-demand-sharing" - and this setting is more for implementing horizontal scalability with multiple active locations.
@bogdan-iancu We're trying to implement the "Federating scenario with redundancy" described in https://blog.opensips.org/2018/03/27/clustering-presence-services-with-opensips-2-4/, and we're starting with 1 pair only to see how it goes. And we're following the recommended settings in the doc for the cluster_federation_mode parameter: modparam("presence", "cluster_federation_mode", 1) Isn't the value '1' corresponding to "on-demand-sharing" listed here?
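For context, the presence clustering settings from that blog scenario would look roughly like the sketch below. This is not our actual configuration (which is posted above) - the cluster id, node id, and DB URL here are placeholders:

```
loadmodule "clusterer.so"
loadmodule "presence.so"

# placeholder node id and DB URL; must match the clusterer table provisioning
modparam("clusterer", "current_id", 1)
modparam("clusterer", "db_url", "mysql://opensips:opensipsrw@localhost/opensips")

# attach presence to cluster 1; federation mode 1 = "on-demand-sharing"
# (note: newer 3.x releases may take a string value here instead of an
# integer - check the presence module docs for your exact version)
modparam("presence", "cluster_id", 1)
modparam("presence", "cluster_federation_mode", 1)
```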
@gitforxh, thanks for the input. The original blog post is for 2.4, while in 3.2 things were changed a bit (to be more straightforward to use). And yes, value '1' does correspond to "on-demand-sharing". Reviewing the scenario, I agree that the backup node (inside a location) should actually do nothing upon receiving a replicated PUBLISH via clustering - somehow the presence module should know which sharing-tag controls the active-backup mode, and if the tag is inactive, it should ignore the data received via clustering. Let me do more thinking on this, to see what's the best way to get this in place without bloating it too much.
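For reference, the active/backup roles within a location can already be expressed as a clusterer sharing tag - a minimal sketch (the tag name "vip" is made up here, and "1" is the cluster id):

```
# on the active node: the "vip" tag in cluster 1 starts as active
modparam("clusterer", "sharing_tag", "vip/1=active")

# on the backup node: same tag, same cluster, starts as backup
modparam("clusterer", "sharing_tag", "vip/1=backup")
```

During a failover, the tag state can be flipped at runtime via the clusterer_shtag_set_active MI command on the node taking over; the open question here is how the presence module could consult that tag state.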
Any updates here? No progress has been made in the last 15 days, marking as stale. Will close this issue if no further updates are made in the next 30 days.
Marking as closed due to lack of progress for more than 30 days. If this issue is still relevant, please re-open it with additional details.
@bogdan-iancu Any update on this? Can you leave this ticket open?
Well, almost 1 year later, but better than never :P
OpenSIPS version you are running
Describe the bug
We have two OpenSIPS instances configured as an active-backup HA pair in federation cluster mode.
The active node has the following settings:
The backup node has the following settings:
These are the entries in the clusterer table:
The VIP 69.168.214.69 is configured on the active node_id 1.
We have phones sending REGISTER and SUBSCRIBE (for BLF) requests to the VIP on node 1, and the subscriptions are processed by node 1 and stored in the active_watchers table.
When there's a call going on, we have a Presence server sending a PUBLISH request to the VIP on node 1, which does the following:
At the same time, node 1 also broadcasts the PUBLISH request to node 2, which then does the following things that we think it is not supposed to do:
What's worse is that after a few failed sending attempts, the backup node 2 apparently decides that the subscriber is unreachable and DELETEs the subscription from the active_watchers table!
Expected behavior
To our understanding, the backup node is not supposed to do any of the above. Since it's just a BACKUP node:
It's NOT supposed to handle the PUBLISH request.
It's NOT supposed to insert into presentity table.
It's NOT supposed to query the active_watchers table.
It's NOT supposed to send NOTIFY to subscribers.
It's NOT supposed to delete any entry in the active_watchers table.
Why is it doing all of the above at all? Are we configuring the cluster wrong, or missing any key settings?
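What we would expect is some way to bind the presence module's clustering behaviour to the sharing tag that controls the active/backup roles, so that a node whose tag is in "backup" state simply stores nothing and sends nothing. A hedged sketch of what that might look like - the presence parameter name below is assumed for illustration and should be checked against the presence module docs for your release; only the clusterer sharing_tag parameter is known to exist as written:

```
# the sharing tag itself, defined in the clusterer module
# (made-up tag name "vip"; this node is the backup in cluster 1)
modparam("clusterer", "sharing_tag", "vip/1=backup")

# ASSUMED parameter: tell the presence module to act on replicated
# PUBLISHes only while the "vip" tag is active on this node
modparam("presence", "cluster_be_active_shtag", "vip")
```

With something like this in place, the backup node would ignore replicated PUBLISHes (no presentity insert, no NOTIFY, no active_watchers changes) until a failover flips the tag to active.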