
cgrates: add failover mechanism
razvancrainea committed Jan 12, 2017
1 parent aa64739 commit e8f55f4
Showing 7 changed files with 126 additions and 22 deletions.
53 changes: 39 additions & 14 deletions modules/cgrates/README
@@ -26,6 +26,7 @@ Razvan Crainea
 1.7.1. cgrates_engine (string)
 1.7.2. bind_ip (string)
 1.7.3. max_async_connections (integer)
+1.7.4. retry_timeout (integer)

 1.8. Exported Functions

@@ -50,13 +51,14 @@ Razvan Crainea
 1.1. Set cgrates_engine parameter
 1.2. Set bind_ip parameter
 1.3. Set max_async_connections parameter
-1.4. cgrates_acc() usage
-1.5. cgrates_auth() usage
-1.6. cgrates_cmd() usage
-1.7. $cgr(name) usage
-1.8. $cgrret usage
-1.9. async cgrates_auth usage
-1.10. async cgrates_cmd usage
+1.4. Set retry_timeout parameter
+1.5. cgrates_acc() usage
+1.6. cgrates_auth() usage
+1.7. cgrates_cmd() usage
+1.8. $cgr(name) usage
+1.9. $cgrret usage
+1.10. async cgrates_auth usage
+1.11. async cgrates_cmd usage

 Chapter 1. Admin Guide

@@ -154,6 +156,17 @@ Chapter 1. Admin Guide
 servers, but this is a feature one of the CGRateS component
 does starting with newer versions.

+Each CGRateS engine has assigned up to max_async_connections
+connections, plus one used for synchronous commands. If a
+connection fails (due to network issues, or server issues), it
+is marked as closed and a new one is tried. If all connections
+to that engine are down, then the entire engine is marked as
+disabled, and a new engine is queried. After an engine is down
+for more than retry_timeout seconds, OpenSIPS tries to connect
+once again to that server. If it succeeds, that server is
+enabled. Otherwise, the other engines are used, until none is
+available and the command fails.
+

 1.6. Dependencies

 1.6.1. OpenSIPS Modules
@@ -212,6 +225,18 @@ modparam("cgrates", "bind_ip", "10.0.0.100")
 modparam("cgrates", "max_async_connections", 20)
 ...

+1.7.4. retry_timeout (integer)
+
+The number of seconds after which a disabled connection/engine
+is retried.
+
+Default value is "60".
+
+Example 1.4. Set retry_timeout parameter
+...
+modparam("cgrates", "retry_timeout", 120)
+...
+
 1.8. Exported Functions

 1.8.1. cgrates_acc([flags[, account[, destination]]])
@@ -256,7 +281,7 @@ modparam("cgrates", "max_async_connections", 20)
 This function can be used from REQUEST_ROUTE, FAILURE_ROUTE,
 BRANCH_ROUTE and LOCAL_ROUTE.

-Example 1.4. cgrates_acc() usage
+Example 1.5. cgrates_acc() usage
 ...
 if (!has_totag()) {
 ...
@@ -292,7 +317,7 @@ modparam("cgrates", "max_async_connections", 20)
 This function can be used from REQUEST_ROUTE, FAILURE_ROUTE,
 BRANCH_ROUTE and LOCAL_ROUTE.

-Example 1.5. cgrates_auth() usage
+Example 1.6. cgrates_auth() usage
 ...
 if (!has_totag()) {
 ...
@@ -324,7 +349,7 @@ modparam("cgrates", "max_async_connections", 20)
 This function can be used from REQUEST_ROUTE, FAILURE_ROUTE,
 BRANCH_ROUTE and LOCAL_ROUTE.

-Example 1.6. cgrates_cmd() usage
+Example 1.7. cgrates_cmd() usage
 ...
 # cgrate_auth("$fU", "$rU"); simulation
 $cgr(Tenant) = $fd;
@@ -353,7 +378,7 @@ modparam("cgrates", "max_async_connections", 20)
 pairs are moved in the dialog. Therefore the values will be
 accessible along the dialog's lifetime.

-Example 1.7. $cgr(name) usage
+Example 1.8. $cgr(name) usage
 ...
 if (!has_totag()) {
 ...
@@ -372,7 +397,7 @@ ounting

 Returns the reply message of a CGRateS command in script.

-Example 1.8. $cgrret usage
+Example 1.9. $cgrret usage
 ...
 cgrates_auth("$fU", "$rU");
 xlog("Call is allowed to run $cgrret seconds\n");
@@ -405,7 +430,7 @@ ounting
 headers, or it is not an initial INVITE.
 * -5 - CGRateS returned an invalid message.

-Example 1.9. async cgrates_auth usage
+Example 1.10. async cgrates_auth usage
 route {
 ...
 async(cgrates_auth("$fU", "$rU"), auth_reply);
@@ -440,7 +465,7 @@ route [auth_reply]
 * -3 - No suitable CGRateS server found. message type (not an
 initial INVITE).

-Example 1.10. async cgrates_cmd usage
+Example 1.11. async cgrates_cmd usage
 route {
 ...
 $cgr(Tenant) = $fd;
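The failover behaviour described in the new README paragraph can be modelled in a few lines of C. This is a minimal, hypothetical sketch and not the module's code: `struct engine`, its `up` flag, `pick_engine()` and `engine_in_retry_window()` are illustrative names, connecting is reduced to a boolean, and the loop over the configured engines is assumed (the diff does not show it). Only the retry-window test (`disable_time && disable_time + retry_timeout > now`, with 0 meaning "enabled") is taken from the commit.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical, simplified engine record: the real struct cgr_engine
 * also holds the host, port, socket address and connection list. */
struct engine {
	const char *name;
	int up;              /* hypothetical: would connecting succeed right now? */
	time_t disable_time; /* 0 means "enabled", as in the commit */
};

/* Same test as the commit: skip an engine that was disabled
 * less than retry_tout seconds ago. */
static int engine_in_retry_window(const struct engine *e, time_t now, int retry_tout)
{
	return e->disable_time && e->disable_time + retry_tout > now;
}

/* Try the engines in order, disabling the ones that fail. Returns the
 * first usable engine, or NULL when none is available (command fails). */
static struct engine *pick_engine(struct engine *engines, int n,
		time_t now, int retry_tout)
{
	assert(retry_tout >= 0); /* mod_init() rejects negative values */

	for (int i = 0; i < n; i++) {
		struct engine *e = &engines[i];

		if (engine_in_retry_window(e, now, retry_tout))
			continue;            /* still disabled: query the next engine */
		if (e->up) {
			e->disable_time = 0; /* success re-enables the engine */
			return e;
		}
		e->disable_time = now;   /* failure starts a new retry window */
	}
	return NULL;
}
```

For example, with `cgr1` down and `cgr2` up, a call at t = 1000 returns `cgr2` and disables `cgr1`. If `cgr1` recovers, a call at t = 1030 still returns `cgr2`, and the first call at or after t = 1060 (with the default retry_timeout of 60) tries `cgr1` again and re-enables it.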
7 changes: 4 additions & 3 deletions modules/cgrates/cgrates.c
@@ -41,7 +41,7 @@
 #include "cgrates_common.h"
 #include "cgrates_engine.h"

-int cgre_conn_tout = CGR_DEFAULT_CONN_TIMEOUT;
+int cgre_retry_tout = CGR_DEFAULT_RETRY_TIMEOUT;
 int cgrc_max_conns = CGR_DEFAULT_MAX_CONNS;
 str cgre_bind_ip;

@@ -110,6 +110,7 @@ static param_export_t params[] = {
 		(void*)cgrates_set_engine },
 	{"bind_ip", STR_PARAM, &cgre_bind_ip.s },
 	{"max_async_connections", INT_PARAM, &cgrc_max_conns },
+	{"retry_timeout", INT_PARAM, &cgre_retry_tout },
 	{0, 0, 0}
 };

@@ -213,8 +214,8 @@ static int fixup_cgrates_acc(void ** param, int param_no)

 static int mod_init(void)
 {
-	if (cgre_conn_tout < 0) {
-		LM_ERR("Invalid connection timeout to CGR engine\n");
+	if (cgre_retry_tout < 0) {
+		LM_ERR("Invalid retry connection timeout\n");
 		return -1;
 	}

2 changes: 1 addition & 1 deletion modules/cgrates/cgrates.h
@@ -23,8 +23,8 @@
 #define _CGRATES_H_

 #define CGR_DEFAULT_PORT 2012 /* default port of the CGR Engine */
-#define CGR_DEFAULT_CONN_TIMEOUT 500 /* default connection timeout (ms) */
 #define CGR_DEFAULT_MAX_CONNS 10 /* maximum number of conections per process */
+#define CGR_DEFAULT_RETRY_TIMEOUT 60 /* default timeout for re-connection */

 #define CGR_BUFFER_SIZE 4096 /* buffer read size */

1 change: 1 addition & 0 deletions modules/cgrates/cgrates_common.h
@@ -74,6 +74,7 @@ struct cgr_conn {
 	int fd;
 	char flags;
 	enum cgrc_state state;
+	time_t disable_time;
 	struct cgr_engine *engine;
 	struct json_tokener *jtok;
 	struct list_head list;
50 changes: 47 additions & 3 deletions modules/cgrates/cgrates_engine.c
@@ -30,9 +30,28 @@ struct cgr_conn *cgr_get_free_conn(struct cgr_engine *e)
 {
 	struct list_head *l;
 	struct cgr_conn *c;
+	time_t now = time(NULL);
+	int disabled_no = 0;
+
+	if (e->disable_time && e->disable_time + cgre_retry_tout > now)
+		return NULL;

 	list_for_each(l, &e->conns) {
 		c = list_entry(l, struct cgr_conn, list);
+		if (c->state == CGRC_CLOSED) {
+			if (c->disable_time + cgre_retry_tout < now) {
+				if (tcp_connect_blocking(c->fd, &c->engine->su.s, sockaddru_len(c->engine->su))<0){
+					LM_INFO("cannot connect to %.*s:%d\n", c->engine->host.len,
+							c->engine->host.s, c->engine->port);
+					c->disable_time = now;
+				} else {
+					c->state = CGRC_FREE;
+					e->disable_time = 0;
+					return c;
+				}
+			}
+			disabled_no++;
+		}
 		if (c->state == CGRC_FREE)
 			return c;
 	}
@@ -41,22 +60,46 @@ struct cgr_conn *cgr_get_free_conn(struct cgr_engine *e)
 	if (e->conns_no < cgrc_max_conns) {
 		if ((c = cgrc_new(e)) && cgrc_conn(c) >= 0) {
 			e->conns_no++;
+			e->disable_time = 0;
 			list_add(&c->list, &e->conns);
 			return c;
-		} else
-			LM_ERR("cannot create a new connection!\n");
+		}
+		LM_ERR("cannot create a new connection!\n");
 	} else {
 		LM_DBG("maximum async connections per process reached!\n");
 	}
+	if (disabled_no > 0) {
+		LM_INFO("Disabling CGRateS engine %.*s:%d for %ds\n",
+				e->host.len, e->host.s, e->port, cgre_retry_tout);
+		e->disable_time = now;
+		return NULL;
+	}
 	return cgr_get_default_conn(e);
 }

 struct cgr_conn *cgr_get_default_conn(struct cgr_engine *e)
 {
+	time_t now = time(NULL);
+
+	if (e->disable_time && e->disable_time + cgre_retry_tout > now)
+		return NULL;
+
 	/* use the default connection */
-	if (e->default_con && e->default_con->state == CGRC_FREE) {
+	if (!e->default_con)
+		return NULL;
+	if (e->default_con->state == CGRC_FREE) {
 		LM_DBG("using default connection - running in sync mode!\n");
 		return e->default_con;
+	} else if (e->default_con->disable_time + cgre_retry_tout < now) {
+		if (tcp_connect_blocking(e->default_con->fd, &e->su.s, sockaddru_len(e->su))<0){
+			LM_INFO("cannot connect to %.*s:%d\n", e->host.len,
+					e->host.s, e->port);
+			e->default_con->disable_time = now;
+		} else {
+			e->default_con->state = CGRC_FREE;
+			e->disable_time = 0;
+			return e->default_con;
+		}
 	}
 	return NULL;
 }
@@ -88,6 +131,7 @@ struct cgr_conn *cgrc_new(struct cgr_engine *e)
 void cgrc_close(struct cgr_conn *c, int release)
 {
 	c->state = CGRC_CLOSED;
+	c->disable_time = time(NULL);
 	/* clean whatever was left in the buffer */
 	json_tokener_reset(c->jtok);
 	if (release) {
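At connection level, the new `cgr_get_free_conn()` keeps a per-connection `disable_time`, retries a closed connection only after `retry_timeout` seconds have passed, and disables the whole engine when it saw closed connections but could not hand out a usable one. The sketch below models just that path under stated assumptions: the names `struct conn`, `reachable` and `get_free_conn()` are hypothetical, `reachable` stands in for `tcp_connect_blocking()` succeeding, and the creation of new async connections and the fallback to the synchronous default connection are left out.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical stand-ins for enum cgrc_state and struct cgr_conn. */
enum conn_state { CONN_FREE, CONN_BUSY, CONN_CLOSED };

struct conn {
	enum conn_state state;
	time_t disable_time; /* set when the connection is closed */
	int reachable;       /* hypothetical: would tcp_connect_blocking() succeed? */
};

/* Hypothetical stand-in for struct cgr_engine, with a fixed-size pool. */
struct engine {
	struct conn conns[4];
	int conns_no;
	time_t disable_time; /* 0 means "enabled" */
};

/* Models the retry/disable path of cgr_get_free_conn(); opening new async
 * connections and the sync default connection are omitted, so NULL is
 * returned wherever the real function would try those. */
static struct conn *get_free_conn(struct engine *e, time_t now, int retry_tout)
{
	int disabled_no = 0;

	assert(retry_tout >= 0); /* mod_init() rejects negative values */

	/* the whole engine is still inside its retry window */
	if (e->disable_time && e->disable_time + retry_tout > now)
		return NULL;

	for (int i = 0; i < e->conns_no; i++) {
		struct conn *c = &e->conns[i];

		if (c->state == CONN_CLOSED) {
			/* retry a closed connection only after retry_tout seconds */
			if (c->disable_time + retry_tout < now) {
				if (!c->reachable) {
					c->disable_time = now; /* failed: restart its window */
				} else {
					c->state = CONN_FREE;  /* reconnected: re-enable engine */
					e->disable_time = 0;
					return c;
				}
			}
			disabled_no++;
		}
		if (c->state == CONN_FREE)
			return c;
	}

	/* closed connections seen and nothing usable: disable the engine */
	if (disabled_no > 0)
		e->disable_time = now;
	return NULL;
}
```

One consequence worth checking against the README wording: as far as this diff shows, the engine is disabled once at least one closed connection is seen and no usable connection is obtained (in the real function, also after failing to open a new one), even if the remaining connections are merely busy. That is somewhat broader than "all connections to that engine are down".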
3 changes: 2 additions & 1 deletion modules/cgrates/cgrates_engine.h
@@ -28,6 +28,7 @@ struct cgr_engine {
 	short port;
 	str host;
 	union sockaddr_union su;
+	time_t disable_time;

 	struct cgr_conn *default_con;

@@ -50,7 +51,7 @@ extern struct list_head cgrates_engines;
 #define CGRC_IS_DEFAULT(_c) ((_c)->flags & CGRF_DEFAULT)
 #define CGRC_SET_DEFAULT(_c) (_c)->flags |= CGRF_DEFAULT

-extern int cgre_conn_tout;
+extern int cgre_retry_tout;
 extern int cgrc_max_conns;
 extern str cgre_bind_ip;
 int cgrc_conn(struct cgr_conn *c);
32 changes: 32 additions & 0 deletions modules/cgrates/doc/cgrates_admin.xml
@@ -119,6 +119,18 @@
 balancing logic between the servers, but this is a feature one of the CGRateS
 component does starting with newer versions.
 </para>
+<para>
+Each CGRateS engine has assigned up to
+<emphasis>max_async_connections</emphasis> connections, plus one
+used for synchronous commands. If a connection fails (due to network
+issues, or server issues), it is marked as closed and a new one is
+tried. If all connections to that engine are down, then the entire
+engine is marked as disabled, and a new engine is queried. After an
+engine is down for more than <emphasis>retry_timeout</emphasis>
+seconds, &osips; tries to connect once again to that server. If it
+succeeds, that server is enabled. Otherwise, the other engines are
+used, until none is available and the command fails.
+</para>
 </section>

 <section>
@@ -221,6 +233,26 @@ modparam("cgrates", "bind_ip", "10.0.0.100")
 ...
 modparam("cgrates", "max_async_connections", 20)
 ...
+</programlisting>
+</example>
+</section>
+<section>
+<title><varname>retry_timeout</varname> (integer)</title>
+<para>
+The number of seconds after which a disabled connection/engine
+is retried.
+</para>
+<para>
+<emphasis>
+Default value is <quote>60</quote>.
+</emphasis>
+</para>
+<example>
+<title>Set <varname>retry_timeout</varname> parameter</title>
+<programlisting format="linespecific">
+...
+modparam("cgrates", "retry_timeout", 120)
+...
 </programlisting>
 </example>
 </section>
