[improvement]: Change res_odbc connection pool request logic to not lock around blocking operations #465

creslin2877 · 2023-11-30T17:36:36Z

Improvement Description

A res_odbc connection pool can legitimately hold some dead or stuck connections while others are healthy. For example, network elements (firewalls, routers) may silently time out connections to a single DB at a single IP address, or a heterogeneous pool may be connected to multiple IPs/instances of a replicated DB behind a DNS front end for load balancing, and one replica fails.

To time out those unhealthy connections without blocking other parts of Asterisk that need the connection pool, _ast_odbc_request_obj2 should not hold the lock on the entire pool while performing potentially blocking operations on pool objects, such as the connection_dead() test, odbc_obj_connect(), or dropping the last reference to a struct odbc_obj and thereby triggering an odbc_obj_disconnect().

This would let dead connections be timed out much more quickly, and concurrently, via the connection_dead() test, which can block for a long time depending on odbc.ini or other ODBC-connector-specific timeout settings.

This would also make failover in the clustered DB scenario much quicker.

This patch changes the locking in _ast_odbc_request_obj2() so that it no longer locks around odbc_obj_connect(), odbc_obj_disconnect(), and connection_dead(), while continuing to lock around truly shared, mutable state such as the connection_cnt member and the connections list on struct odbc_class.

Fixes: asterisk#465
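The locking split described above can be illustrated with a small, self-contained sketch. This is not the res_odbc code: `struct pool`, `struct conn`, `probe_is_dead()`, `conn_close()`, `pool_request()`, and `pool_release()` are hypothetical stand-ins, and only the `connection_cnt` name is borrowed from the description. The mutex is held only while touching the idle list and the counter; the (potentially slow) probe and teardown run with the lock released, so other threads can keep requesting connections in the meantime.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the pool structures; not the real res_odbc definitions. */
struct conn {
    struct conn *next;
    bool dead;                    /* what a (possibly slow) health probe would report */
};

struct pool {
    pthread_mutex_t lock;         /* protects only 'idle' and 'connection_cnt' */
    struct conn *idle;            /* singly linked list of idle connections */
    unsigned int connection_cnt;  /* number of connections owned by the pool */
};

/* Stand-in for a blocking probe such as connection_dead(). */
static bool probe_is_dead(const struct conn *c)
{
    return c->dead;
}

/* Stand-in for a blocking teardown such as a disconnect. */
static void conn_close(struct conn *c)
{
    c->next = NULL;
}

/* Return a healthy idle connection, or NULL if none is left. */
static struct conn *pool_request(struct pool *p)
{
    for (;;) {
        struct conn *c;

        /* Short critical section: unlink one candidate from the idle list. */
        pthread_mutex_lock(&p->lock);
        c = p->idle;
        if (c) {
            p->idle = c->next;
        }
        pthread_mutex_unlock(&p->lock);

        if (!c) {
            return NULL; /* a real pool might open a new connection here */
        }
        c->next = NULL;

        /* Potentially blocking probe runs WITHOUT the pool lock held. */
        if (!probe_is_dead(c)) {
            return c;
        }

        /* Teardown of the dead connection also runs outside the lock. */
        conn_close(c);

        /* Re-take the lock only to update the shared counter. */
        pthread_mutex_lock(&p->lock);
        assert(p->connection_cnt > 0);
        p->connection_cnt--;
        pthread_mutex_unlock(&p->lock);
    }
}

/* Put a connection back on the idle list. */
static void pool_release(struct pool *p, struct conn *c)
{
    pthread_mutex_lock(&p->lock);
    c->next = p->idle;
    p->idle = c;
    pthread_mutex_unlock(&p->lock);
}
```

Because a candidate is unlinked from the list before it is probed, no other thread can see it while the probe is in flight, which is what makes it safe to drop the lock in this sketch. How the actual patch guarantees exclusive ownership during the probe is not stated in this thread.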

creslin2877 · 2023-11-30T18:13:12Z

cherry-pick-to: 21
cherry-pick-to: 20
cherry-pick-to: 18

Fixes: #465 (cherry picked from commit 058ead0)

Fixes: #465 (cherry picked from commit bfac394)

Fixes: #465 (cherry picked from commit e0bf65b)

creslin2877 added improvement triage labels Nov 30, 2023

jcolp added the support-level-core label (functionality with core support level) and removed the triage label Nov 30, 2023

jcolp assigned creslin2877 Nov 30, 2023

creslin2877 mentioned this issue Nov 30, 2023

res_odbc.c: Allow concurrent access to request odbc connections #466

Merged

asterisk-org-access-app bot closed this as completed in #466 Dec 6, 2023
