segv in rlm_sql_postgresql if server connection goes away #651

Closed
philmayers opened this Issue May 22, 2014 · 12 comments

@philmayers
Member

If I do the following:

  1. Start freeradius
  2. Shutdown postgres
  3. Send a radius request

...the radius process gets a sigsegv at:

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7fc4da727700 (LWP 6470)]
sql_query (handle=0x0, config=0x1cab050, 
    query=0x7fc4c40152e0 "select distinct groupname from (select * from netgroup where username=lower('nsg') or callingstationid='02:00:00:00:00:01' order by precedence,groupname) as a") at src/modules/rlm_sql/drivers/rlm_sql_postgresql/rlm_sql_postgresql.c:239
239     rlm_sql_postgres_conn_t *conn = handle->conn;

This is with v3.0.x git 4388926

Will dig into it.

@arr2036
Member
arr2036 commented May 22, 2014

Hm, handle is NULL. That should have been caught. I suspect the issue is in rlm_sql itself.

@philmayers
Member

Backtrace:

#0  sql_query (handle=0x0, config=0x1cab050, 
    query=0x7fc4c40152e0 "select distinct groupname from (select * from netgroup where username=lower('nsg') or callingstationid='02:00:00:00:00:01' order by precedence,groupname) as a") at src/modules/rlm_sql/drivers/rlm_sql_postgresql/rlm_sql_postgresql.c:239
#1  0x00007fc4f0c5f072 in rlm_sql_select_query (handle=0x7fc4da725458, inst=0x1cab050, 
    query=0x7fc4c40152e0 "select distinct groupname from (select * from netgroup where username=lower('nsg') or callingstationid='02:00:00:00:00:01' order by precedence,groupname) as a") at src/modules/rlm_sql/sql.c:441
#2  0x00007fc4f0c5c746 in sql_get_grouplist (inst=0x1cab050, handle=0x0, request=0x7fc4c4004250, phead=0x7fc4da7254c8) at src/modules/rlm_sql/rlm_sql.c:476
#3  0x00007fc4f0c5ea5d in sql_groupcmp (instance=0x1cab050, request=0x7fc4c4004250, request_vp=<value optimized out>, check=0x7fc4c4014330, check_pairs=<value optimized out>, 
    reply_pairs=<value optimized out>) at src/modules/rlm_sql/rlm_sql.c:551
#4  0x00007fc4f16da832 in paircompare (request=0x7fc4c4004250, req_list=0x7fc4c4011350, check=0x7fc4c4014330, rep_list=0x0) at src/main/valuepair.c:531

handle is indeed NULL

Looks like this is in a block doing:

 if (SQL-Group == "foo") {
 ...
 }
@arr2036
Member
arr2036 commented May 22, 2014

That helps, let me have a quick look.

@arr2036
Member
arr2036 commented May 22, 2014

Yeah, this code hasn't been fixed up, the handle should be a rlm_sql_handle_t **, else it won't deal with reconnects.
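
To illustrate the point about `rlm_sql_handle_t **`, here is a minimal sketch of why a reconnect needs the extra level of indirection. Everything in it (`demo_handle_t`, `demo_reconnect`, `demo_query_by_value`, `demo_query_by_ref`) is a hypothetical stand-in written for this example and not the actual rlm_sql code; it only assumes that a failed reconnect leaves the caller with no usable handle.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the module's connection handle. */
typedef struct {
	int id;
} demo_handle_t;

/*
 * Simulated reconnect that fails: the caller's pointer is cleared
 * through the extra level of indirection and an error is returned.
 */
static int demo_reconnect(demo_handle_t **handle)
{
	assert(handle != NULL);
	*handle = NULL;		/* no free connections are available */
	return -1;
}

/* Takes the handle by value: the update is lost when this returns. */
static int demo_query_by_value(demo_handle_t *handle)
{
	return demo_reconnect(&handle);	/* only clears the local copy */
}

/* Takes the handle by address: the caller sees the update. */
static int demo_query_by_ref(demo_handle_t **handle)
{
	return demo_reconnect(handle);
}
```

With the by-value variant the caller keeps a stale pointer after a failed reconnect, while the by-reference variant leaves the caller's pointer set to NULL, so the caller can tell the connection is gone and check for it before using the handle again.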

@arr2036
Member
arr2036 commented May 22, 2014

Could you try with 3f97db3

@arr2036 arr2036 closed this in fc092b3 May 22, 2014
@philmayers
Member

I still get a segv in f091b0c (one commit on from 87bc6c0 in 3.0.x)

Same place AFAICT:

0x00007ffff714de7a in sql_free_result (handle=0x0, config=0x8e35a0) at src/modules/rlm_sql/drivers/rlm_sql_postgresql/rlm_sql_postgresql.c:408
408     rlm_sql_postgres_conn_t *conn = handle->conn;

The debug under -X immediately before says:

(20)  sql_groupcmp
(20)  EXPAND %{%{Stripped-User-Name}:-%{%{User-Name}:-none}}
(20)     --> nsg
(20)  SQL-User-Name set to 'nsg'
rlm_sql (sql): Reserved connection (4)
(20)  EXPAND select ...
(20)     --> select ...
rlm_sql (sql): Executing query: 'select ...'
rlm_sql_postgresql: Status: PGRES_FATAL_ERROR
rlm_sql_postgresql: 57P01: ADMIN SHUTDOWN
rlm_sql (sql): Reconnecting (4)
rlm_sql_postgresql: Connecting using parameters: dbname=...
rlm_sql_postgresql: Connection failed: could not connect to server: Connection refused...
rlm_sql_postgresql: Socket destructor called, closing socket
rlm_sql_postgresql: Socket destructor called, closing socket

...then the "Reconnecting" repeats several times until:

rlm_sql (sql): Reconnecting (0)
rlm_sql_postgresql: Connecting using parameters: dbname=...
rlm_sql_postgresql: Connection failed: could not connect to server: Connection refused...
rlm_sql_postgresql: Socket destructor called, closing socket
rlm_sql_postgresql: Socket destructor called, closing socket
rlm_sql (sql): Failed to reconnect (0), no free connections are available

Program received signal SIGSEGV, Segmentation fault.
0x00007ffff714de7a in sql_free_result (handle=0x0, config=0x8e35a0) at src/modules/rlm_sql/drivers/rlm_sql_postgresql/rlm_sql_postgresql.c:408
408     rlm_sql_postgres_conn_t *conn = handle->conn;

@arr2036
Member
arr2036 commented May 22, 2014

No, your previous issue was at:

#0  sql_query (handle=0x0, config=0x1cab050, 
    query=0x7fc4c40152e0 "select distinct groupname from (select * from netgroup where username=lower('nsg') or callingstationid='02:00:00:00:00:01' order by precedence,groupname) as a") at src/modules/rlm_sql/drivers/rlm_sql_postgresql/rlm_sql_postgresql.c:239

Which was really wrong. This is a slightly more reasonable SEGV.

@philmayers
Member

Oops yes. Do you want a separate issue opened?

@arr2036
Member
arr2036 commented May 22, 2014

No it's ok, this should be a simple fix.

@arr2036 arr2036 reopened this May 22, 2014
@arr2036
Member
arr2036 commented May 22, 2014

Please provide the full backtrace for the new issue though, it's not quite as obvious as I thought it would be.

@arr2036
Member
arr2036 commented May 22, 2014

nevermind, found it

@arr2036 arr2036 added a commit that referenced this issue May 22, 2014
@arr2036 arr2036 Need to check that rlm_sql_select_query != RLM_SQL_OK, < 0 is not enough as it may return RLM_SQL_RECONNECT Fixes #651
3f7c2ae
@arr2036 arr2036 added a commit that closed this issue May 22, 2014
@arr2036 arr2036 Need to check that rlm_sql_select_query != RLM_SQL_OK, < 0 is not enough as it may return RLM_SQL_RECONNECT Fixes #651
736d34e
@arr2036 arr2036 closed this in 736d34e May 22, 2014
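
The commit message describes a return-code check that was too narrow. A rough sketch of the difference follows, assuming the reconnect code is a non-negative value (the message implies this, since `< 0` missed it). The `DEMO_SQL_*` names and their numeric values, `demo_handle_t`, and the helper functions are assumptions made for this example and not the real rlm_sql definitions.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed values for illustration; the real enum may differ. */
typedef enum {
	DEMO_SQL_ERROR = -1,
	DEMO_SQL_OK = 0,
	DEMO_SQL_RECONNECT = 1
} demo_rcode_t;

/* Hypothetical stand-in for the module's connection handle. */
typedef struct {
	int id;
} demo_handle_t;

/* Old-style check: treats only negative codes as failure. */
static int demo_failed_old(demo_rcode_t rcode)
{
	return rcode < 0;
}

/* Fixed check: anything other than OK is a failure. */
static int demo_failed_fixed(demo_rcode_t rcode)
{
	return rcode != DEMO_SQL_OK;
}

/* Stand-in for a driver call that dereferences the handle. */
static void demo_free_result(demo_handle_t *handle)
{
	assert(handle != NULL);
	handle->id = 0;
}

/* Returns 0 if the result was processed, -1 if the lookup bailed out. */
static int demo_group_lookup(demo_rcode_t rcode, demo_handle_t *handle)
{
	if (demo_failed_fixed(rcode)) return -1;	/* never touch the handle */
	demo_free_result(handle);
	return 0;
}
```

Under these assumptions the old check lets a reconnect code fall through to the call that dereferences the handle, which matches the `sql_free_result (handle=0x0, ...)` crash reported above; the fixed check returns before the handle is used.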
@philmayers
Member

Yep, that's got it. Thanks!
