Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[improvement]: Change res_odbc connection pool request logic to not lock around blocking operations #465

Closed
creslin2877 opened this issue Nov 30, 2023 · 1 comment · Fixed by #466
Assignees
Labels
improvement support-level-core Functionality with core support level

Comments

@creslin2877
Copy link

Improvement Description

There are valid scenarios where res_odbc's connection pool might have some dead or stuck connections while others are healthy (imagine network elements/firewalls/routers silently timing out connections to a single DB and a single IP address, or a heterogeneous connection pool connected to potentially multiple IPs/instances of a replicated DB using a DNS front end for load balancing and one replica fails).

In order to time out those unhealthy connections without blocking access to other parts of Asterisk that may attempt access to the connection pool, it would be beneficial to not lock/block access around the entire pool in _ast_odbc_request_obj2 while doing potentially blocking operations on connection pool objects such as the connection_dead() test, odbc_obj_connect(), or by dereferencing a struct odbc_obj for the last time and triggering a odbc_obj_disconnect().

This would facilitate much quicker and concurrent timeout of dead connections via the connection_dead() test, which could block potentially for a long period of time depending on odbc.ini or other odbc connector specific timeout settings.

This also would make rapid failover (in the clustered DB scenario) much quicker.

@jcolp jcolp added support-level-core Functionality with core support level and removed triage labels Nov 30, 2023
creslin287 added a commit to creslin287/asterisk that referenced this issue Nov 30, 2023
There are valid scenarios where res_odbc's connection pool might have some dead
or stuck connections while others are healthy (imagine network
elements/firewalls/routers silently timing out connections to a single DB and a
single IP address, or a heterogeneous connection pool connected to potentially
multiple IPs/instances of a replicated DB using a DNS front end for load
balancing and one replica fails).

In order to time out those unhealthy connections without blocking access to
other parts of Asterisk that may attempt access to the connection pool, it would
be beneficial to not lock/block access around the entire pool in
_ast_odbc_request_obj2 while doing potentially blocking operations on connection
pool objects such as the connection_dead() test, odbc_obj_connect(), or by
dereferencing a struct odbc_obj for the last time and triggering a
odbc_obj_disconnect().

This would facilitate much quicker and concurrent timeout of dead connections
via the connection_dead() test, which could block potentially for a long period
of time depending on odbc.ini or other odbc connector specific timeout settings.

This also would make rapid failover (in the clustered DB scenario) much quicker.

This patch changes the locking in _ast_odbc_request_obj2() to not lock around
odbc_obj_connect(), _disconnect(), and connection_dead(), while continuing to
lock around truly shared, non-immutable state like the connection_cnt member and
the connections list on struct odbc_class.

Fixes: asterisk#465
@creslin2877
Copy link
Author

cherry-pick-to: 21
cherry-pick-to: 20
cherry-pick-to: 18

asterisk-org-access-app bot pushed a commit that referenced this issue Dec 6, 2023
There are valid scenarios where res_odbc's connection pool might have some dead
or stuck connections while others are healthy (imagine network
elements/firewalls/routers silently timing out connections to a single DB and a
single IP address, or a heterogeneous connection pool connected to potentially
multiple IPs/instances of a replicated DB using a DNS front end for load
balancing and one replica fails).

In order to time out those unhealthy connections without blocking access to
other parts of Asterisk that may attempt access to the connection pool, it would
be beneficial to not lock/block access around the entire pool in
_ast_odbc_request_obj2 while doing potentially blocking operations on connection
pool objects such as the connection_dead() test, odbc_obj_connect(), or by
dereferencing a struct odbc_obj for the last time and triggering a
odbc_obj_disconnect().

This would facilitate much quicker and concurrent timeout of dead connections
via the connection_dead() test, which could block potentially for a long period
of time depending on odbc.ini or other odbc connector specific timeout settings.

This also would make rapid failover (in the clustered DB scenario) much quicker.

This patch changes the locking in _ast_odbc_request_obj2() to not lock around
odbc_obj_connect(), _disconnect(), and connection_dead(), while continuing to
lock around truly shared, non-immutable state like the connection_cnt member and
the connections list on struct odbc_class.

Fixes: #465
asterisk-org-access-app bot pushed a commit that referenced this issue Dec 6, 2023
There are valid scenarios where res_odbc's connection pool might have some dead
or stuck connections while others are healthy (imagine network
elements/firewalls/routers silently timing out connections to a single DB and a
single IP address, or a heterogeneous connection pool connected to potentially
multiple IPs/instances of a replicated DB using a DNS front end for load
balancing and one replica fails).

In order to time out those unhealthy connections without blocking access to
other parts of Asterisk that may attempt access to the connection pool, it would
be beneficial to not lock/block access around the entire pool in
_ast_odbc_request_obj2 while doing potentially blocking operations on connection
pool objects such as the connection_dead() test, odbc_obj_connect(), or by
dereferencing a struct odbc_obj for the last time and triggering a
odbc_obj_disconnect().

This would facilitate much quicker and concurrent timeout of dead connections
via the connection_dead() test, which could block potentially for a long period
of time depending on odbc.ini or other odbc connector specific timeout settings.

This also would make rapid failover (in the clustered DB scenario) much quicker.

This patch changes the locking in _ast_odbc_request_obj2() to not lock around
odbc_obj_connect(), _disconnect(), and connection_dead(), while continuing to
lock around truly shared, non-immutable state like the connection_cnt member and
the connections list on struct odbc_class.

Fixes: #465
asterisk-org-access-app bot pushed a commit that referenced this issue Dec 6, 2023
There are valid scenarios where res_odbc's connection pool might have some dead
or stuck connections while others are healthy (imagine network
elements/firewalls/routers silently timing out connections to a single DB and a
single IP address, or a heterogeneous connection pool connected to potentially
multiple IPs/instances of a replicated DB using a DNS front end for load
balancing and one replica fails).

In order to time out those unhealthy connections without blocking access to
other parts of Asterisk that may attempt access to the connection pool, it would
be beneficial to not lock/block access around the entire pool in
_ast_odbc_request_obj2 while doing potentially blocking operations on connection
pool objects such as the connection_dead() test, odbc_obj_connect(), or by
dereferencing a struct odbc_obj for the last time and triggering a
odbc_obj_disconnect().

This would facilitate much quicker and concurrent timeout of dead connections
via the connection_dead() test, which could block potentially for a long period
of time depending on odbc.ini or other odbc connector specific timeout settings.

This also would make rapid failover (in the clustered DB scenario) much quicker.

This patch changes the locking in _ast_odbc_request_obj2() to not lock around
odbc_obj_connect(), _disconnect(), and connection_dead(), while continuing to
lock around truly shared, non-immutable state like the connection_cnt member and
the connections list on struct odbc_class.

Fixes: #465
asterisk-org-access-app bot pushed a commit that referenced this issue Dec 6, 2023
There are valid scenarios where res_odbc's connection pool might have some dead
or stuck connections while others are healthy (imagine network
elements/firewalls/routers silently timing out connections to a single DB and a
single IP address, or a heterogeneous connection pool connected to potentially
multiple IPs/instances of a replicated DB using a DNS front end for load
balancing and one replica fails).

In order to time out those unhealthy connections without blocking access to
other parts of Asterisk that may attempt access to the connection pool, it would
be beneficial to not lock/block access around the entire pool in
_ast_odbc_request_obj2 while doing potentially blocking operations on connection
pool objects such as the connection_dead() test, odbc_obj_connect(), or by
dereferencing a struct odbc_obj for the last time and triggering a
odbc_obj_disconnect().

This would facilitate much quicker and concurrent timeout of dead connections
via the connection_dead() test, which could block potentially for a long period
of time depending on odbc.ini or other odbc connector specific timeout settings.

This also would make rapid failover (in the clustered DB scenario) much quicker.

This patch changes the locking in _ast_odbc_request_obj2() to not lock around
odbc_obj_connect(), _disconnect(), and connection_dead(), while continuing to
lock around truly shared, non-immutable state like the connection_cnt member and
the connections list on struct odbc_class.

Fixes: #465
asterisk-org-access-app bot pushed a commit that referenced this issue Jan 12, 2024
There are valid scenarios where res_odbc's connection pool might have some dead
or stuck connections while others are healthy (imagine network
elements/firewalls/routers silently timing out connections to a single DB and a
single IP address, or a heterogeneous connection pool connected to potentially
multiple IPs/instances of a replicated DB using a DNS front end for load
balancing and one replica fails).

In order to time out those unhealthy connections without blocking access to
other parts of Asterisk that may attempt access to the connection pool, it would
be beneficial to not lock/block access around the entire pool in
_ast_odbc_request_obj2 while doing potentially blocking operations on connection
pool objects such as the connection_dead() test, odbc_obj_connect(), or by
dereferencing a struct odbc_obj for the last time and triggering a
odbc_obj_disconnect().

This would facilitate much quicker and concurrent timeout of dead connections
via the connection_dead() test, which could block potentially for a long period
of time depending on odbc.ini or other odbc connector specific timeout settings.

This also would make rapid failover (in the clustered DB scenario) much quicker.

This patch changes the locking in _ast_odbc_request_obj2() to not lock around
odbc_obj_connect(), _disconnect(), and connection_dead(), while continuing to
lock around truly shared, non-immutable state like the connection_cnt member and
the connections list on struct odbc_class.

Fixes: #465
(cherry picked from commit 058ead0)
asterisk-org-access-app bot pushed a commit that referenced this issue Jan 12, 2024
There are valid scenarios where res_odbc's connection pool might have some dead
or stuck connections while others are healthy (imagine network
elements/firewalls/routers silently timing out connections to a single DB and a
single IP address, or a heterogeneous connection pool connected to potentially
multiple IPs/instances of a replicated DB using a DNS front end for load
balancing and one replica fails).

In order to time out those unhealthy connections without blocking access to
other parts of Asterisk that may attempt access to the connection pool, it would
be beneficial to not lock/block access around the entire pool in
_ast_odbc_request_obj2 while doing potentially blocking operations on connection
pool objects such as the connection_dead() test, odbc_obj_connect(), or by
dereferencing a struct odbc_obj for the last time and triggering a
odbc_obj_disconnect().

This would facilitate much quicker and concurrent timeout of dead connections
via the connection_dead() test, which could block potentially for a long period
of time depending on odbc.ini or other odbc connector specific timeout settings.

This also would make rapid failover (in the clustered DB scenario) much quicker.

This patch changes the locking in _ast_odbc_request_obj2() to not lock around
odbc_obj_connect(), _disconnect(), and connection_dead(), while continuing to
lock around truly shared, non-immutable state like the connection_cnt member and
the connections list on struct odbc_class.

Fixes: #465
(cherry picked from commit bfac394)
asterisk-org-access-app bot pushed a commit that referenced this issue Jan 12, 2024
There are valid scenarios where res_odbc's connection pool might have some dead
or stuck connections while others are healthy (imagine network
elements/firewalls/routers silently timing out connections to a single DB and a
single IP address, or a heterogeneous connection pool connected to potentially
multiple IPs/instances of a replicated DB using a DNS front end for load
balancing and one replica fails).

In order to time out those unhealthy connections without blocking access to
other parts of Asterisk that may attempt access to the connection pool, it would
be beneficial to not lock/block access around the entire pool in
_ast_odbc_request_obj2 while doing potentially blocking operations on connection
pool objects such as the connection_dead() test, odbc_obj_connect(), or by
dereferencing a struct odbc_obj for the last time and triggering a
odbc_obj_disconnect().

This would facilitate much quicker and concurrent timeout of dead connections
via the connection_dead() test, which could block potentially for a long period
of time depending on odbc.ini or other odbc connector specific timeout settings.

This also would make rapid failover (in the clustered DB scenario) much quicker.

This patch changes the locking in _ast_odbc_request_obj2() to not lock around
odbc_obj_connect(), _disconnect(), and connection_dead(), while continuing to
lock around truly shared, non-immutable state like the connection_cnt member and
the connections list on struct odbc_class.

Fixes: #465
(cherry picked from commit e0bf65b)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
improvement support-level-core Functionality with core support level
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants