New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Freeradius v3.0.15 segfaults on (some) looped requests #2055

Closed
madrarua opened this Issue Sep 4, 2017 · 7 comments

Comments

Projects
None yet
3 participants
@madrarua

madrarua commented Sep 4, 2017

Issue type

  • Questions about the server or its usage should be posted to the users mailing list.
  • Remote security exploits MUST be sent to security@freeradius.org.
  • Defect - Crash or memory corruption.
  • Defect - Non compliance with a standards document, or incorrect API usage.
  • Defect - Unexpected behaviour (obvious or verified by project member).
  • Feature request.

See here for debugging instructions and how to obtain backtraces.

NOTE: PATCHES GO IN PULL REQUESTS. IF YOU SUBMIT A DIFF HERE, THE DEVELOPMENT TEAM WILL HUNT YOU DOWN AND BEAT YOU OVER THE HEAD WITH YOUR OWN KEYBOARD.

Defect/Feature description

An EAP request was proxied from a Freeradius v3.0.15 server (operating as an eduroam national gateway) to another Freeradius v3.0.15 server (an eduroam IdP). Due to a misconfiguration on the IdP server it proxied the request onwards/back to the national gateway. The request looped back and forth until Freeradius segfault'ed on the national gateway.

The segfault was seen to occur on requests for three different usernames within the IdP’s realm. However, while the segfault occurred only for these three usernames, not every Radius request for one of these usernames triggered the segfault. The IdP was successfully handling/terminating EAP requests for other usernames within its realm throughout this time, its misconfiguration caused it to loop requests for some usernames only.

How to reproduce issue

I haven't been able to reproduce the segfault. I used eapol_test to send an EAP request to the national gateway and this triggered the loop but did not lead to a segfault. I used one of the
three usernames seen on requests that did trigger the segfault, but my test request didn't replicate all of the Radius attributes seen on the problem requests (my test request contained just the default attributes that eapol_test generates).

I set “allow_core_dumps = yes” in radiusd.conf and restarted the service, but no core file was produced on a segfault. Making the service stable was the priority so I was unable to make time to resolve this, and as a consequence I have no core dump to pass on.

Below are relevant sections of various Freeradius and system log files at the time of a segfault occurring.

System kern.log file

The affected server is running Ubuntu 14.04 LTS, here are some of the incidents recorded in its kern.log file:

Aug 18 08:59:10 hostname kernel: radiusd[22790]: segfault at 78 ip 0000000000432a0c sp 00007fff355055a0 error 4 in radiusd[400000+63000]
Aug 18 09:32:28 hostname kernel: radiusd[53744]: segfault at 8c ip 0000000000432c2e sp 00007ffccdf5a110 error 6 in radiusd[400000+63000]
Aug 18 09:36:51 hostname kernel: radiusd[59176]: segfault at 78 ip 0000000000432a0c sp 00007ffd338e2ed0 error 4 in radiusd[400000+63000]
Aug 19 06:31:12 hostname kernel: radiusd[59791]: segfault at 78 ip 0000000000432a0c sp 00007fff5bd62e50 error 4 in radiusd[400000+63000]
Aug 19 06:34:42 hostname kernel: radiusd[59016]: segfault at 78 ip 0000000000432a0c sp 00007ffe44affec0 error 4 in radiusd[400000+63000]

...note that the timestamp is incorrect, each time this segfault occurred the timestamp logged against the entries in kern.log were several minutes or more into the future. The server is ntp-sync'ed (stratum 2) and all Freeradius logs seem to have accurate timestamps at the time of the incidents.

radius.log

In radius.log there is nothing logged at the time of segfault, but the following lines are representative of lines that recur repeatedly before a segfault. Nothing further appears in radius.log until the service is manually restarted:

Sat Aug 19 06:28:52 2017 : Proxy: (1247108) Marking home server x.x.x.x port 1812 alive
Sat Aug 19 06:28:52 2017 : Proxy: (1247108) Received response to status check 1247108 ID 206 (1 in current sequence)
Sat Aug 19 06:28:52 2017 : Proxy: No outstanding request was found for Access-Reject packet from host x.x.x.x port 1812 - ID 129
Sat Aug 19 06:28:52 2017 : Proxy: Marking home server x.x.x.x port 1812 as zombie (it has not responded in 20.000000 seconds).
Sat Aug 19 06:28:52 2017 : Auth: (1244376) Login incorrect (Home Server failed to respond): [xxx@xxx.ie] (from client radius2.xxx.ie port 0 cli xx:xx:xx:xx:xx:xx)
Sat Aug 19 06:28:52 2017 : ERROR: (1244402) ERROR: Failing proxied request for user "xxx@xxx.ie", due to lack of any response from home server x.x.x.x port 1812
Sat Aug 19 06:28:52 2017 : Proxy: No outstanding request was found for Access-Reject packet from host x.x.x.x port 1812 - ID 217
Sat Aug 19 06:28:52 2017 : Proxy: No outstanding request was found for Access-Reject packet from host x.x.x.x port 1812 - ID 28
Sat Aug 19 06:28:52 2017 : Proxy: No outstanding request was found for Access-Reject packet from host x.x.x.x port 1812 - ID 56
Sat Aug 19 06:28:52 2017 : Auth: (1244376) Login incorrect (Failing proxied request for user "xxx@xxx.ie", due to lack of any response from home server x.x.x.x port 1812): [xxx@xxx.ie] (from client radius2.xxx.ie port 0 cli xx:xx:xx:xx:xx:xx)
Sat Aug 19 06:28:52 2017 : Proxy: No outstanding request was found for Access-Reject packet from host x.x.x.x port 1812 - ID 129
Sat Aug 19 06:28:52 2017 : Auth: (1244399) Login incorrect (Home Server failed to respond): [xxx@xxx.ie] (from client radius2.xxx.ie port 0 cli xx:xx:xx:xx:xx:xx)
Sat Aug 19 06:28:52 2017 : Auth: (1244253) Login incorrect (Failing proxied request for user "jxxx@xxx.ie", due to lack of any response from home server x.x.x.x port 1812): [xxx@xxx.ie] (from client radius2.xxx.ie port 0 cli xx:xx:xx:xx:xx:xx)
Sat Aug 19 06:28:52 2017 : Auth: (1244255) Login incorrect (Home Server failed to respond): [xxx@xxx.ie] (from client radius2.xxx.ie port 0 cli xx:xx:xx:xx:xx:xx)
Sat Aug 19 06:28:52 2017 : Auth: (1244252) Login incorrect (Home Server failed to respond): [xxx@xxx.ie] (from client radius2.xxx.ie port 0 cli xx:xx:xx:xx:xx:xx)

tcpdump output

The following is the output of a simple tcpdump I was running that captured traffic to/from port 1812/udp at the time of a segfault. From this it seems that my server stopped responding to Radius requests at 06:28:52:

2017-08-19 06:28:52.012898 IP hostname.heanet.ie.57907 > radius2.xxx.ie.radius: RADIUS, Status-Server (12), id: 0xce length: 70
2017-08-19 06:28:52.013528 IP radius2.xxx.ie.49928 > hostname.heanet.ie.radius: RADIUS, Status-Server (12), id: 0x18 length: 70
2017-08-19 06:28:52.014055 IP radius2.xxx.ie.radius > hostname.heanet.ie.57907: RADIUS, Access-Accept (2), id: 0xce length: 20
2017-08-19 06:28:52.014232 IP radius2.xxx.ie.radius > hostname.heanet.ie.39021: RADIUS, Access-Reject (3), id: 0x81 length: 25
2017-08-19 06:28:52.014486 IP hostname.heanet.ie.radius > xxx.surfnet.nl.20280: RADIUS, Access-Reject (3), id: 0x6b length: 20
2017-08-19 06:28:52.014907 IP hostname.heanet.ie.radius > xxx.surfnet.nl.20280: RADIUS, Access-Reject (3), id: 0x6c length: 20
2017-08-19 06:28:52.014968 IP radius2.xxx.ie.radius > hostname.heanet.ie.35745: RADIUS, Access-Reject (3), id: 0xd9 length: 25
2017-08-19 06:28:52.015152 IP radius2.xxx.ie.radius > hostname.heanet.ie.46303: RADIUS, Access-Reject (3), id: 0x1c length: 33
2017-08-19 06:28:52.015229 IP hostname.heanet.ie.50511 > radius2.xxx.ie.radius: RADIUS, Status-Server (12), id: 0x9e length: 70
2017-08-19 06:28:52.015297 IP radius2.xxx.ie.radius > hostname.heanet.ie.44047: RADIUS, Access-Reject (3), id: 0x38 length: 33
2017-08-19 06:28:52.015308 IP radius2.xxx.ie.radius > hostname.heanet.ie.56198: RADIUS, Access-Reject (3), id: 0x81 length: 25
2017-08-19 06:28:52.015384 IP hostname.heanet.ie.radius > radius2.xxx.ie.49928: RADIUS, Access-Accept (2), id: 0x18 length: 20
2017-08-19 06:28:52.015512 IP radius2.xxx.ie.radius > hostname.heanet.ie.56428: RADIUS, Access-Reject (3), id: 0xa7 length: 34
2017-08-19 06:28:52.015832 IP hostname.heanet.ie.radius > radius2.xxx.ie.38851: RADIUS, Access-Reject (3), id: 0x46 length: 30
2017-08-19 06:28:52.015870 IP radius2.xxx.ie.radius > hostname.heanet.ie.57907: RADIUS, Access-Reject (3), id: 0xb2 length: 35
2017-08-19 06:28:52.016032 IP radius2.xxx.ie.radius > hostname.heanet.ie.34376: RADIUS, Access-Reject (3), id: 0x0f length: 34
2017-08-19 06:28:52.016104 IP hostname.heanet.ie.radius > radius2.xxx.ie.55321: RADIUS, Access-Reject (3), id: 0x1e length: 29
2017-08-19 06:28:52.016238 IP radius2.xxx.ie.radius > hostname.heanet.ie.38275: RADIUS, Access-Reject (3), id: 0x5d length: 42
2017-08-19 06:28:52.016402 IP radius2.xxx.ie.radius > hostname.heanet.ie.44047: RADIUS, Access-Reject (3), id: 0x1e length: 42
2017-08-19 06:28:52.016413 IP radius2.xxx.ie.radius > hostname.heanet.ie.52310: RADIUS, Access-Reject (3), id: 0x0d length: 44
2017-08-19 06:28:52.016582 IP radius2.xxx.ie.radius > hostname.heanet.ie.56428: RADIUS, Access-Reject (3), id: 0x64 length: 42
2017-08-19 06:28:52.016591 IP radius2.xxx.ie.radius > hostname.heanet.ie.50511: RADIUS, Access-Accept (2), id: 0x9e length: 20
2017-08-19 06:28:52.017092 IP radius2.xxx.ie.38851 > hostname.heanet.ie.radius: RADIUS, Status-Server (12), id: 0x69 length: 70
2017-08-19 06:28:52.017283 IP radius2.xxx.ie.radius > hostname.heanet.ie.44047: RADIUS, Access-Reject (3), id: 0x3c length: 44
2017-08-19 06:28:52.017301 IP radius2.xxx.ie.radius > hostname.heanet.ie.44047: RADIUS, Access-Reject (3), id: 0xa9 length: 52
2017-08-19 06:28:52.017461 IP radius2.xxx.ie.radius > hostname.heanet.ie.46302: RADIUS, Access-Reject (3), id: 0x2f length: 50
2017-08-19 06:28:52.017625 IP radius2.xxx.ie.radius > hostname.heanet.ie.46078: RADIUS, Access-Reject (3), id: 0x83 length: 50
2017-08-19 06:28:52.017686 IP radius2.xxx.ie.radius > hostname.heanet.ie.59189: RADIUS, Access-Reject (3), id: 0x00 length: 53
2017-08-19 06:28:52.018710 IP radius2.xxx.ie.radius > hostname.heanet.ie.37506: RADIUS, Access-Reject (3), id: 0xa8 length: 53
2017-08-19 06:28:52.019253 IP radius2.xxx.ie.radius > hostname.heanet.ie.44409: RADIUS, Access-Reject (3), id: 0xdc length: 61
2017-08-19 06:28:52.019262 IP radius2.xxx.ie.radius > hostname.heanet.ie.53436: RADIUS, Access-Reject (3), id: 0x29 length: 61
2017-08-19 06:28:52.019266 IP radius2.xxx.ie.radius > hostname.heanet.ie.41359: RADIUS, Access-Reject (3), id: 0x56 length: 60
2017-08-19 06:28:52.019307 IP radius2.xxx.ie.radius > hostname.heanet.ie.44047: RADIUS, Access-Reject (3), id: 0xee length: 59
2017-08-19 06:28:52.019783 IP radius2.xxx.ie.radius > hostname.heanet.ie.40457: RADIUS, Access-Reject (3), id: 0x48 length: 63
2017-08-19 06:28:52.020138 IP radius2.xxx.ie.radius > hostname.heanet.ie.47554: RADIUS, Access-Reject (3), id: 0x21 length: 69
2017-08-19 06:28:52.020319 IP radius2.xxx.ie.radius > hostname.heanet.ie.43638: RADIUS, Access-Reject (3), id: 0x3e length: 71
2017-08-19 06:28:52.020681 IP radius2.xxx.ie.radius > hostname.heanet.ie.49351: RADIUS, Access-Reject (3), id: 0x0e length: 68
2017-08-19 06:28:52.020861 IP radius2.xxx.ie.radius > hostname.heanet.ie.35669: RADIUS, Access-Reject (3), id: 0xd4 length: 70
2017-08-19 06:28:52.021223 IP radius2.xxx.ie.radius > hostname.heanet.ie.57894: RADIUS, Access-Reject (3), id: 0xc4 length: 71

auth-detail

The following is what was logged via auth_log for a Radius request that I believe triggered the segfault, this is the last request recorded from the problem IdP before the Freeradius service was manually restarted:

Sat Aug 19 06:28:46 2017
Packet-Type = Access-Request
User-Name = “xxx@xxx.ie"
Chargeable-User-Identity = 0x00
Calling-Station-Id = “xx:xx:xx:xx:xx:xx”
Acct-Session-Id = "5997cc61/xx:xx:xx:xx:xx:xx/8109919"
NAS-IP-Address = x.x.x.x
NAS-Identifier = "WISM2_CTY_Cont1"
EAP-Message = 0x02010013016a6d7572706879407463642e6965
Message-Authenticator = 0x9702c5069098724acecebd1a8ca49ec8
Operator-Name = "1xxx.ac.uk"
Proxy-State = 0x323035
Proxy-State = 0x313237
Proxy-State = 0x3631
Proxy-State = 0x313633
Proxy-State = 0x34
Proxy-State = 0x3632
Proxy-State = 0x313731
Proxy-State = 0x313930
Proxy-State = 0x313736
Proxy-State = 0x313632
Proxy-State = 0x313832
Proxy-State = 0x3533
Proxy-State = 0x313437
Proxy-State = 0x3734
Proxy-State = 0x313330
Proxy-State = 0x3639
Proxy-State = 0x313236
Proxy-State = 0x323432
Proxy-State = 0x3538
Proxy-State = 0x313735
Proxy-State = 0x323032
Proxy-State = 0x36
Proxy-State = 0x313537
Proxy-State = 0x3637
Proxy-State = 0x313839
Proxy-State = 0x3830
Proxy-State = 0x32
Proxy-State = 0x313736
Proxy-State = 0x323037
Proxy-State = 0x313830
Proxy-State = 0x3831
Proxy-State = 0x323233
Proxy-State = 0x323339
Proxy-State = 0x323031
Proxy-State = 0x313539
Proxy-State = 0x313032
Proxy-State = 0x313036
Proxy-State = 0x313936
Proxy-State = 0x313135
Proxy-State = 0x3538
Proxy-State = 0x313239
Proxy-State = 0x313836
Proxy-State = 0x323533
Proxy-State = 0x313631
Proxy-State = 0x31
Proxy-State = 0x313037
Proxy-State = 0x3738
Proxy-State = 0x313937
Proxy-State = 0x313832
Proxy-State = 0x3130
Proxy-State = 0x313136
Proxy-State = 0x313236
Proxy-State = 0x323433
Proxy-State = 0x3737
Proxy-State = 0x313536
Proxy-State = 0x3934
Proxy-State = 0x313330
Proxy-State = 0x3430
Proxy-State = 0x3632
Proxy-State = 0x313630
Proxy-State = 0x3136
Proxy-State = 0x313439
Proxy-State = 0x323330
Proxy-State = 0x323339
Proxy-State = 0x323532
Proxy-State = 0x323038
Proxy-State = 0x313839
Proxy-State = 0x3435
Proxy-State = 0x313437
Proxy-State = 0x323236
Proxy-State = 0x313039
Proxy-State = 0x37
Proxy-State = 0x3536
Proxy-State = 0x3831
Proxy-State = 0x3430
Proxy-State = 0x313534
Proxy-State = 0x313239
Proxy-State = 0x3335
Proxy-State = 0x313039
Proxy-State = 0x313734
Proxy-State = 0x323430
Proxy-State = 0x323030
Proxy-State = 0x3330
Proxy-State = 0x313434
Proxy-State = 0x313436
Proxy-State = 0x3738
Proxy-State = 0x323439
Proxy-State = 0x313331
Proxy-State = 0x313633
Proxy-State = 0x3838
Proxy-State = 0x3835
Proxy-State = 0x313137
Proxy-State = 0x3931
Proxy-State = 0x3838
Proxy-State = 0x323132
Proxy-State = 0x313333
Proxy-State = 0x323035
Proxy-State = 0x3131
Proxy-State = 0x3237
Proxy-State = 0x30
Proxy-State = 0x313737
Proxy-State = 0x3937
Proxy-State = 0x313330
Proxy-State = 0x3634
Proxy-State = 0x3633
Proxy-State = 0x3837
Proxy-State = 0x313832
Proxy-State = 0x313937
Proxy-State = 0x313033
Proxy-State = 0x313531
Proxy-State = 0x3236
Proxy-State = 0x323135
Proxy-State = 0x323337
Proxy-State = 0x323035
Proxy-State = 0x313439
Proxy-State = 0x313332
Proxy-State = 0x323134
Proxy-State = 0x313435
Proxy-State = 0x3939
Proxy-State = 0x313539
Proxy-State = 0x3331
Proxy-State = 0x3236
Proxy-State = 0x313333
Proxy-State = 0x323239
Proxy-State = 0x3239
Proxy-State = 0x313237
Proxy-State = 0x3537
Proxy-State = 0x3236
Proxy-State = 0x313432
Proxy-State = 0x313032
Proxy-State = 0x3231
Proxy-State = 0x3638
Proxy-State = 0x3431
Proxy-State = 0x3135
Proxy-State = 0x323533
Proxy-State = 0x313034
Proxy-State = 0x3837
Proxy-State = 0x3336
Proxy-State = 0x313135
Proxy-State = 0x323130
Proxy-State = 0x313530
Proxy-State = 0x313732
Proxy-State = 0x3135
Proxy-State = 0x313537
Proxy-State = 0x313330
Proxy-State = 0x3838
Proxy-State = 0x3236
Proxy-State = 0x323337
Proxy-State = 0x323234
Proxy-State = 0x3736
Proxy-State = 0x313136
Proxy-State = 0x3736
Proxy-State = 0x323330
Proxy-State = 0x313533
Proxy-State = 0x3333
Proxy-State = 0x313734
Proxy-State = 0x313230
Proxy-State = 0x313233
Proxy-State = 0x323136
Proxy-State = 0x3235
Proxy-State = 0x313533
Proxy-State = 0x3438
Proxy-State = 0x323338
Proxy-State = 0x313935
Proxy-State = 0x323137
Proxy-State = 0x313536
Proxy-State = 0x313034
Proxy-State = 0x3939
Proxy-State = 0x313331
Proxy-State = 0x323437
Proxy-State = 0x313737
Proxy-State = 0x323030
Proxy-State = 0x323331
Proxy-State = 0x313334
Proxy-State = 0x36
Proxy-State = 0x3533
Proxy-State = 0x323038
Proxy-State = 0x323432
Proxy-State = 0x313631
Proxy-State = 0x313739
Proxy-State = 0x323036
Proxy-State = 0x3533
Proxy-State = 0x3730
Proxy-State = 0x3731
Proxy-State = 0x39
Proxy-State = 0x313239
Proxy-State = 0x313038
Proxy-State = 0x323435
Proxy-State = 0x3835
Event-Timestamp = "Aug 19 2017 06:28:46 IST"
Proxy-State = 0x323030
Timestamp = 1503120526

By comparison, here is what was logged via auth_log for my test request, which triggered the loop but didn't trigger the segfault:

Mon Aug 21 16:55:29 2017
Packet-Type = Access-Request
User-Name = “xxx@xxx.Ie"
NAS-IP-Address = 127.0.0.1
Calling-Station-Id = "02-00-00-00-00-01"
EAP-Message = 0x020000130166626f6c616e64407463642e4965
Message-Authenticator = 0x4dc9a5257ec0ccca320b0314e15f10bc
Operator-Name = "1heanet.ie"
Proxy-State = 0x30
Proxy-State = 0x313133
Proxy-State = 0x313432
Proxy-State = 0x323237
Proxy-State = 0x323430
Proxy-State = 0x323235
Proxy-State = 0x313132
Proxy-State = 0x313638
Proxy-State = 0x3536
Proxy-State = 0x3833
Proxy-State = 0x33
Proxy-State = 0x3637
Proxy-State = 0x3735
Proxy-State = 0x313832
Proxy-State = 0x3231
Proxy-State = 0x3932
Proxy-State = 0x323435
Proxy-State = 0x313437
Proxy-State = 0x323339
Proxy-State = 0x313835
Proxy-State = 0x313836
Proxy-State = 0x313134
Proxy-State = 0x313333
Proxy-State = 0x313630
Proxy-State = 0x313639
Proxy-State = 0x313035
Proxy-State = 0x313537
Proxy-State = 0x3631
Proxy-State = 0x3339
Proxy-State = 0x313635
Proxy-State = 0x3336
Proxy-State = 0x3934
Proxy-State = 0x313135
Proxy-State = 0x323031
Proxy-State = 0x313239
Proxy-State = 0x313430
Proxy-State = 0x3531
Proxy-State = 0x313831
Proxy-State = 0x323038
Proxy-State = 0x3731
Proxy-State = 0x3235
Proxy-State = 0x31
Proxy-State = 0x313031
Proxy-State = 0x313833
Proxy-State = 0x3134
Proxy-State = 0x323239
Proxy-State = 0x3437
Proxy-State = 0x313339
Proxy-State = 0x323234
Proxy-State = 0x36
Proxy-State = 0x313631
Proxy-State = 0x323339
Proxy-State = 0x313430
Proxy-State = 0x323138
Proxy-State = 0x313434
Proxy-State = 0x323132
Proxy-State = 0x313435
Proxy-State = 0x313334
Proxy-State = 0x313230
Proxy-State = 0x313038
Proxy-State = 0x3136
Proxy-State = 0x3139
Proxy-State = 0x34
Proxy-State = 0x3830
Proxy-State = 0x3439
Proxy-State = 0x323234
Proxy-State = 0x3137
Proxy-State = 0x3338
Proxy-State = 0x323237
Proxy-State = 0x323532
Proxy-State = 0x313133
Proxy-State = 0x313131
Proxy-State = 0x323132
Proxy-State = 0x3333
Proxy-State = 0x313438
Proxy-State = 0x313431
Proxy-State = 0x3838
Proxy-State = 0x313739
Proxy-State = 0x3839
Proxy-State = 0x313332
Proxy-State = 0x3736
Proxy-State = 0x313438
Proxy-State = 0x313832
Proxy-State = 0x313636
Proxy-State = 0x323533
Proxy-State = 0x3239
Proxy-State = 0x323239
Proxy-State = 0x313037
Proxy-State = 0x313037
Proxy-State = 0x3232
Proxy-State = 0x323434
Proxy-State = 0x3937
Proxy-State = 0x3935
Proxy-State = 0x313436
Proxy-State = 0x3632
Proxy-State = 0x313435
Proxy-State = 0x313830
Proxy-State = 0x313330
Proxy-State = 0x323133
Proxy-State = 0x3332
Proxy-State = 0x323031
Proxy-State = 0x323432
Proxy-State = 0x323136
Proxy-State = 0x323132
Proxy-State = 0x313337
Proxy-State = 0x313239
Proxy-State = 0x323231
Proxy-State = 0x3636
Proxy-State = 0x323534
Proxy-State = 0x313036
Proxy-State = 0x313034
Proxy-State = 0x323130
Proxy-State = 0x313738
Proxy-State = 0x313434
Proxy-State = 0x3734
Proxy-State = 0x313834
Proxy-State = 0x313834
Proxy-State = 0x3430
Proxy-State = 0x3738
Proxy-State = 0x3531
Proxy-State = 0x313134
Proxy-State = 0x313135
Proxy-State = 0x323332
Proxy-State = 0x323334
Proxy-State = 0x323039
Proxy-State = 0x313138
Proxy-State = 0x313939
Proxy-State = 0x3432
Proxy-State = 0x3837
Proxy-State = 0x313137
Proxy-State = 0x313837
Proxy-State = 0x3532
Proxy-State = 0x313632
Proxy-State = 0x3534
Proxy-State = 0x313232
Proxy-State = 0x3938
Proxy-State = 0x3633
Proxy-State = 0x313230
Proxy-State = 0x3133
Proxy-State = 0x313537
Proxy-State = 0x3233
Proxy-State = 0x323039
Proxy-State = 0x323232
Proxy-State = 0x3435
Proxy-State = 0x3232
Proxy-State = 0x3439
Proxy-State = 0x3537
Proxy-State = 0x3339
Proxy-State = 0x3138
Proxy-State = 0x33
Proxy-State = 0x323130
Proxy-State = 0x323037
Proxy-State = 0x3434
Proxy-State = 0x313938
Proxy-State = 0x3332
Proxy-State = 0x313036
Proxy-State = 0x313130
Proxy-State = 0x323236
Proxy-State = 0x313635
Proxy-State = 0x323134
Proxy-State = 0x323333
Proxy-State = 0x313039
Proxy-State = 0x323137
Proxy-State = 0x3137
Proxy-State = 0x3139
Proxy-State = 0x313734
Proxy-State = 0x3538
Proxy-State = 0x313439
Proxy-State = 0x313732
Proxy-State = 0x313532
Proxy-State = 0x313332
Proxy-State = 0x3238
Proxy-State = 0x323337
Proxy-State = 0x313530
Proxy-State = 0x313433
Proxy-State = 0x313335
Proxy-State = 0x323238
Proxy-State = 0x313737
Proxy-State = 0x3836
Proxy-State = 0x313238
Proxy-State = 0x3830
Proxy-State = 0x3333
Proxy-State = 0x313136
Proxy-State = 0x313333
Proxy-State = 0x313934
Proxy-State = 0x3936
Proxy-State = 0x313734
Proxy-State = 0x323439
Proxy-State = 0x313431
Proxy-State = 0x3235
Proxy-State = 0x3835
Proxy-State = 0x323335
Event-Timestamp = "Aug 21 2017 16:55:29 IST"
Proxy-State = 0x313935
Timestamp = 1503330929

Output of [radiusd|freeradius] -X showing issue occurring

Unfortunately I didn’t have the opportunity to run Freeradius in debug mode. The issue was service-affecting and I had to eliminate the issue as a priority, which didn’t allow time to troubleshoot at this level.

(you may need to run [radiusd|freeradius] -fxx -l stdout if using eg RADIUS with TLS)
REMOVE SECTION IF FEATURE REQUEST

COPY/PASTE OUTPUT HERE (WITHIN BACKTICKS).  NO PASTEBIN (ET AL) LINKS!

Full backtrace from LLDB or GDB

REMOVE SECTION IF FEATURE REQUEST OR NOT CRASH/MEMORY CORRUPTION

COPY/PASTE OUTPUT HERE (WITHIN BACKTICKS).  NO PASTEBIN (ET AL) LINKS!
@alanbuxey

This comment has been minimized.

Show comment
Hide comment
@alanbuxey

alanbuxey Sep 4, 2017

Member
Member

alanbuxey commented Sep 4, 2017

@alandekok

This comment has been minimized.

Show comment
Hide comment
@alandekok

alandekok Sep 4, 2017

Member

without a gdb backtrace, it's difficult to tell what's going wrong here.

We've tested it with proxy loops, and it doesn't crash for us.

It may be useful to add a simple configuration of "server identity", which is used to create a Proxy-State attribute with a magic value. Then, if the server sees that value on input, it refuses to re-proxy the packet.

Member

alandekok commented Sep 4, 2017

without a gdb backtrace, it's difficult to tell what's going wrong here.

We've tested it with proxy loops, and it doesn't crash for us.

It may be useful to add a simple configuration of "server identity", which is used to create a Proxy-State attribute with a magic value. Then, if the server sees that value on input, it refuses to re-proxy the packet.

@alandekok

This comment has been minimized.

Show comment
Hide comment
@alandekok

alandekok Sep 4, 2017

Member

Most proxies should have something like:

if ("%{Proxy-State[#]}" > 20) {
   reject
}
Member

alandekok commented Sep 4, 2017

Most proxies should have something like:

if ("%{Proxy-State[#]}" > 20) {
   reject
}
@madrarua

This comment has been minimized.

Show comment
Hide comment
@madrarua

madrarua Sep 4, 2017

Thanks for the suggestions, I'll explore those.

And apologies for the lack of debug info, I appreciate that it probably makes it impossible to delve into this issue.

madrarua commented Sep 4, 2017

Thanks for the suggestions, I'll explore those.

And apologies for the lack of debug info, I appreciate that it probably makes it impossible to delve into this issue.

@madrarua

This comment has been minimized.

Show comment
Hide comment
@madrarua

madrarua Sep 27, 2017

Perhaps I should log this as a separate issue, but the following are details of a segfault occurring on another two Freeradius v3.0.15 servers that I manage. I suspect the cause here is the same as I first reported with this bug report, and if so then I was wrong to focus on the looping of requests as being anything other than a contributory factor.

I have one of these other servers compiled with --enable-developer and core dumps are enabled. Unfortunately though, it didn't dump core, but wrote the following to its log file:

Sat Sep 23 17:34:45 2017 : Auth: (632035) Login OK: [xxx@xxx.de] (from client radius1.corkcity.ie port 0 cli XX-XX-XX-XX-XX-XX)
Sat Sep 23 17:34:55 2017 : Auth: (632048) Login OK: [USER_A@REALM.ie] (from client hse-mater-radius1.hse.ie port 13 cli AA-AA-AA-AA-AA-AA)
Sat Sep 23 17:34:59 2017 : Auth: (632060) Login OK: [xxx@xxx.de] (from client radius1.corkcity.ie port 0 cli XX-XX-XX-XX-XX-XX)
Sat Sep 23 17:35:18 2017 : ERROR: (632009) ERROR: Failing proxied request for user "USER_B@REALM.ie", due to lack of any response from home server 1.2.3.4 port 1812
Sat Sep 23 17:35:18 2017 : Auth: (632009) Login incorrect (Home Server failed to respond): [USER_B@REALM.ie] (from client hse-svuh-radius1.hse.ie port 1 cli BB-BB-BB-BB-BB-BB)
Sat Sep 23 17:35:18 2017 : Proxy: No outstanding request was found for Access-Reject packet from host 1.2.3.4 port 1812 - ID 199
Sat Sep 23 17:35:18 2017 : Auth: (632009) Login incorrect (Failing proxied request for user "USER_B@REALM.ie", due to lack of any response from home server 1.2.3.4 port 1812): [USER_B@REALM.ie] (from client hse-svuh-radius1.hse.ie port 1 cli BB-BB-BB-BB-BB-BB)
Sat Sep 23 17:35:28 2017 : Proxy: Marking home server 1.2.3.4 port 1812 as zombie (it has not responded in 20.000000 seconds).
Sat Sep 23 17:35:28 2017 : ERROR: (632023) ERROR: Failing proxied request for user "USER_B@REALM.ie", due to lack of any response from home server 1.2.3.4 port 1812
Sat Sep 23 17:35:28 2017 : Proxy: (632023) : Marking home server 1.2.3.4 port 1812 alive
Sat Sep 23 17:35:28 2017 : Proxy: (632073) Received response to status check 632073 ID 232 (1 in current sequence)
Sat Sep 23 17:35:28 2017 : Auth: (632023) Login incorrect (Home Server says so): [USER_B@REALM.ie] (from client hse-svuh-radius1.hse.ie port 1 cli BB-BB-BB-BB-BB-BB)
Sat Sep 23 17:35:28 2017 : Auth: (632023) Login incorrect (Failing proxied request for user "USER_B@REALM.ie", due to lack of any response from home server 1.2.3.4 port 1812): [USER_B@REALM.ie] (from client hse-svuh-radius1.hse.ie port 1 cli BB-BB-BB-BB-BB-BB)
Sat Sep 23 17:35:34 2017 : ERROR: (632036) ERROR: Failing proxied request for user "USER_C@REALM.ie", due to lack of any response from home server 1.2.3.4 port 1812
Sat Sep 23 17:35:34 2017 : Auth: (632036) Login incorrect (Home Server failed to respond): [USER_C@REALM.ie] (from client hse-svuh-radius1.hse.ie port 1 cli CC-CC-CC-CC-CC-CC)
Sat Sep 23 17:35:34 2017 : Proxy: No outstanding request was found for Access-Reject packet from host 1.2.3.4 port 1812 - ID 124
Sat Sep 23 17:35:34 2017 : Auth: (632036) Login incorrect (Failing proxied request for user "USER_C@REALM.ie", due to lack of any response from home server 1.2.3.4 port 1812): [USER_C@REALM.ie] (from client hse-svuh-radius1.hse.ie port 1 cli CC-CC-CC-CC-CC-CC)
Sat Sep 23 17:35:44 2017 : Auth: (632085) Login OK: [USER_B@REALM.ie] (from client hse-svuh-radius1.hse.ie port 1 cli BB-BB-BB-BB-BB-BB)
Sat Sep 23 17:36:00 2017 : ERROR: (632072) ERROR: Failing proxied request for user "USER_B@REALM.ie", due to lack of any response from home server 1.2.3.4 port 1812
CONSISTENCY CHECK FAILED src/main/process.c[2078]: Expected VALUE_PAIR "Proxy-To-Realm" to be parented by 0x1ac14a0 (REQUEST), instead parented by (nil) (NULL)

Talloc chunk lineage:
0x1ac14a0 (REQUEST) < 0x1ac11c0 (auth_listener_pool)
Talloc context level 0:
SOFT ASSERT FAILED src/lib/pair.c[2437]: 0
CAUGHT SIGNAL: Aborted
Backtrace of last 11 frames:
/usr/local/lib/libfreeradius-radius.so(fr_fault+0x12d)[0x7efd6d8f9720]
/usr/local/lib/libfreeradius-radius.so(fr_assert_cond+0x4c)[0x7efd6d8fa252]
/usr/local/lib/libfreeradius-radius.so(fr_pair_list_verify+0x11c)[0x7efd6d90bef5]
/usr/local/lib/libfreeradius-server.so(verify_request+0xa7)[0x7efd6db5ee7c]
/usr/local/sbin/radiusd[0x4414db]
/usr/local/sbin/radiusd[0x441751]
/usr/local/sbin/radiusd[0x442065]
/usr/local/sbin/radiusd[0x442d0d]
/usr/local/sbin/radiusd[0x43be66]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x8184)[0x7efd6cc88184]
/lib/x86_64-linux-gnu/libc.so.6(clone+0x6d)[0x7efd6c77bffd]
No panic action set

Perhaps there is enough info there to point towards an issue? If not then my next step may be to run this server in debug mode and wait for the issue to recur.

madrarua commented Sep 27, 2017

Perhaps I should log this as a separate issue, but the following are details of a segfault occurring on another two Freeradius v3.0.15 servers that I manage. I suspect the cause here is the same as I first reported with this bug report, and if so then I was wrong to focus on the looping of requests as being anything other than a contributory factor.

I have one of these other servers compiled with --enable-developer and core dumps are enabled. Unfortunately though, it didn't dump core, but wrote the following to its log file:

Sat Sep 23 17:34:45 2017 : Auth: (632035) Login OK: [xxx@xxx.de] (from client radius1.corkcity.ie port 0 cli XX-XX-XX-XX-XX-XX)
Sat Sep 23 17:34:55 2017 : Auth: (632048) Login OK: [USER_A@REALM.ie] (from client hse-mater-radius1.hse.ie port 13 cli AA-AA-AA-AA-AA-AA)
Sat Sep 23 17:34:59 2017 : Auth: (632060) Login OK: [xxx@xxx.de] (from client radius1.corkcity.ie port 0 cli XX-XX-XX-XX-XX-XX)
Sat Sep 23 17:35:18 2017 : ERROR: (632009) ERROR: Failing proxied request for user "USER_B@REALM.ie", due to lack of any response from home server 1.2.3.4 port 1812
Sat Sep 23 17:35:18 2017 : Auth: (632009) Login incorrect (Home Server failed to respond): [USER_B@REALM.ie] (from client hse-svuh-radius1.hse.ie port 1 cli BB-BB-BB-BB-BB-BB)
Sat Sep 23 17:35:18 2017 : Proxy: No outstanding request was found for Access-Reject packet from host 1.2.3.4 port 1812 - ID 199
Sat Sep 23 17:35:18 2017 : Auth: (632009) Login incorrect (Failing proxied request for user "USER_B@REALM.ie", due to lack of any response from home server 1.2.3.4 port 1812): [USER_B@REALM.ie] (from client hse-svuh-radius1.hse.ie port 1 cli BB-BB-BB-BB-BB-BB)
Sat Sep 23 17:35:28 2017 : Proxy: Marking home server 1.2.3.4 port 1812 as zombie (it has not responded in 20.000000 seconds).
Sat Sep 23 17:35:28 2017 : ERROR: (632023) ERROR: Failing proxied request for user "USER_B@REALM.ie", due to lack of any response from home server 1.2.3.4 port 1812
Sat Sep 23 17:35:28 2017 : Proxy: (632023) : Marking home server 1.2.3.4 port 1812 alive
Sat Sep 23 17:35:28 2017 : Proxy: (632073) Received response to status check 632073 ID 232 (1 in current sequence)
Sat Sep 23 17:35:28 2017 : Auth: (632023) Login incorrect (Home Server says so): [USER_B@REALM.ie] (from client hse-svuh-radius1.hse.ie port 1 cli BB-BB-BB-BB-BB-BB)
Sat Sep 23 17:35:28 2017 : Auth: (632023) Login incorrect (Failing proxied request for user "USER_B@REALM.ie", due to lack of any response from home server 1.2.3.4 port 1812): [USER_B@REALM.ie] (from client hse-svuh-radius1.hse.ie port 1 cli BB-BB-BB-BB-BB-BB)
Sat Sep 23 17:35:34 2017 : ERROR: (632036) ERROR: Failing proxied request for user "USER_C@REALM.ie", due to lack of any response from home server 1.2.3.4 port 1812
Sat Sep 23 17:35:34 2017 : Auth: (632036) Login incorrect (Home Server failed to respond): [USER_C@REALM.ie] (from client hse-svuh-radius1.hse.ie port 1 cli CC-CC-CC-CC-CC-CC)
Sat Sep 23 17:35:34 2017 : Proxy: No outstanding request was found for Access-Reject packet from host 1.2.3.4 port 1812 - ID 124
Sat Sep 23 17:35:34 2017 : Auth: (632036) Login incorrect (Failing proxied request for user "USER_C@REALM.ie", due to lack of any response from home server 1.2.3.4 port 1812): [USER_C@REALM.ie] (from client hse-svuh-radius1.hse.ie port 1 cli CC-CC-CC-CC-CC-CC)
Sat Sep 23 17:35:44 2017 : Auth: (632085) Login OK: [USER_B@REALM.ie] (from client hse-svuh-radius1.hse.ie port 1 cli BB-BB-BB-BB-BB-BB)
Sat Sep 23 17:36:00 2017 : ERROR: (632072) ERROR: Failing proxied request for user "USER_B@REALM.ie", due to lack of any response from home server 1.2.3.4 port 1812
CONSISTENCY CHECK FAILED src/main/process.c[2078]: Expected VALUE_PAIR "Proxy-To-Realm" to be parented by 0x1ac14a0 (REQUEST), instead parented by (nil) (NULL)

Talloc chunk lineage:
0x1ac14a0 (REQUEST) < 0x1ac11c0 (auth_listener_pool)
Talloc context level 0:
SOFT ASSERT FAILED src/lib/pair.c[2437]: 0
CAUGHT SIGNAL: Aborted
Backtrace of last 11 frames:
/usr/local/lib/libfreeradius-radius.so(fr_fault+0x12d)[0x7efd6d8f9720]
/usr/local/lib/libfreeradius-radius.so(fr_assert_cond+0x4c)[0x7efd6d8fa252]
/usr/local/lib/libfreeradius-radius.so(fr_pair_list_verify+0x11c)[0x7efd6d90bef5]
/usr/local/lib/libfreeradius-server.so(verify_request+0xa7)[0x7efd6db5ee7c]
/usr/local/sbin/radiusd[0x4414db]
/usr/local/sbin/radiusd[0x441751]
/usr/local/sbin/radiusd[0x442065]
/usr/local/sbin/radiusd[0x442d0d]
/usr/local/sbin/radiusd[0x43be66]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x8184)[0x7efd6cc88184]
/lib/x86_64-linux-gnu/libc.so.6(clone+0x6d)[0x7efd6c77bffd]
No panic action set

Perhaps there is enough info there to point towards an issue? If not then my next step may be to run this server in debug mode and wait for the issue to recur.

@alandekok

This comment has been minimized.

Show comment
Hide comment
@alandekok

alandekok Sep 27, 2017

Member

Maybe you have cisco_accounting_username_bug = yes set in the EAP module?

If so, a fix was added in commit e9097a3.

You can also run the server with radiusd -Xxx in debug mode. It will do additional sanity checks to ensure that the request and everything in it is sane. If something goes wrong, the server will die quickly with an assert, instead of randomly later.

It would also help to build the server with debugging symbols, so we can see which function in radiusd is calling verify_request

Member

alandekok commented Sep 27, 2017

Maybe you have cisco_accounting_username_bug = yes set in the EAP module?

If so, a fix was added in commit e9097a3.

You can also run the server with radiusd -Xxx in debug mode. It will do additional sanity checks to ensure that the request and everything in it is sane. If something goes wrong, the server will die quickly with an assert, instead of randomly later.

It would also help to build the server with debugging symbols, so we can see which function in radiusd is calling verify_request

@alandekok alandekok added the v3.0.x label Nov 3, 2017

@alandekok

This comment has been minimized.

Show comment
Hide comment
@alandekok

alandekok Mar 29, 2018

Member

I think I'll close this as old and likely no longer relevant. We've pushed a number of fixes since 3.0.15 was released, including a few talloc parenting issues. If this can be reproduced with the v3.0.x HEAD, then please re-open it.

Member

alandekok commented Mar 29, 2018

I think I'll close this as old and likely no longer relevant. We've pushed a number of fixes since 3.0.15 was released, including a few talloc parenting issues. If this can be reproduced with the v3.0.x HEAD, then please re-open it.

@alandekok alandekok closed this Mar 29, 2018

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment