Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

crash in sssd-kcm due to a race-condition between two concurrent requests #4402

Closed
sssd-bot opened this issue May 2, 2020 · 0 comments
Closed
Assignees
Labels
Bugzilla Closed: Fixed Issue was closed as fixed.

Comments

@sssd-bot
Copy link

sssd-bot commented May 2, 2020

Cloned from Pagure issue: https://pagure.io/SSSD/sssd/issue/3372


It seems to be use after free.

sh$ kinit lslebodn@EXAMPLE.COM
kinit: Credentials cache I/O operation failed while getting default ccache

valgrind error:

==15846== 1 errors in context 1 of 1:
==15846== Invalid read of size 4
==15846==    at 0x5BFB6B9: talloc_chunk_from_ptr (talloc.c:452)
==15846==    by 0x5BFB6B9: __talloc_get_name (talloc.c:1486)
==15846==    by 0x5BFB6B9: talloc_check_name (talloc.c:1509)
==15846==    by 0x120F58: kcm_op_queue_add (kcmsrv_op_queue.c:136)
==15846==    by 0x120F58: kcm_op_queue_send (kcmsrv_op_queue.c:223)
==15846==    by 0x12072F: kcm_cmd_send (kcmsrv_ops.c:162)
==15846==    by 0x1115B1: kcm_cmd_dispatch (kcmsrv_cmd.c:364)
==15846==    by 0x1115B1: kcm_recv (kcmsrv_cmd.c:512)
==15846==    by 0x1115B1: kcm_fd_handler (kcmsrv_cmd.c:600)
==15846==    by 0x59F1A4F: epoll_event_loop (tevent_epoll.c:728)
==15846==    by 0x59F1A4F: epoll_event_loop_once (tevent_epoll.c:930)
==15846==    by 0x59EFEC6: std_event_loop_once (tevent_standard.c:114)
==15846==    by 0x59EBCAC: _tevent_loop_once (tevent.c:721)
==15846==    by 0x59EBECA: tevent_common_loop_wait (tevent.c:844)
==15846==    by 0x59EFE66: std_event_loop_wait (tevent_standard.c:145)
==15846==    by 0x7C2D0F2: server_loop (server.c:718)
==15846==    by 0x110856: main (kcm.c:313)
==15846==  Address 0xdccbea0 is 544 bytes inside a block of size 773 free'd
==15846==    at 0x4C2FCC8: free (vg_replace_malloc.c:530)
==15846==    by 0x5C024B3: _tc_free_poolmem (talloc.c:1000)
==15846==    by 0x5C024B3: _tc_free_internal (talloc.c:1141)
==15846==    by 0x5BFAAA7: _tc_free_children_internal (talloc.c:1593)
==15846==    by 0x5BFAAA7: _tc_free_internal (talloc.c:1104)
==15846==    by 0x5BFAAA7: _talloc_free_internal (talloc.c:1174)
==15846==    by 0x5BFAAA7: _talloc_free (talloc.c:1716)
==15846==    by 0x59ECFC0: tevent_req_received (tevent_req.c:255)
==15846==    by 0x59ECFD8: tevent_req_destructor (tevent_req.c:107)
==15846==    by 0x5BFAF30: _tc_free_internal (talloc.c:1078)
==15846==    by 0x5BFAF30: _talloc_free_internal (talloc.c:1174)
==15846==    by 0x5BFAF30: _talloc_free (talloc.c:1716)
==15846==    by 0x111711: kcm_cmd_request_done (kcmsrv_cmd.c:391)
==15846==    by 0x11E846: kcm_op_get_cache_uuid_list_done (kcmsrv_ops.c:1303)
==15846==    by 0x11B1F7: ccdb_sec_list_done (kcmsrv_ccache_secrets.c:1147)
==15846==    by 0x119DB0: sec_list_done (kcmsrv_ccache_secrets.c:224)
==15846==    by 0x12495A: tcurl_request_done (tev_curl.c:746)
==15846==    by 0x12495A: handle_curlmsg_done (tev_curl.c:234)
==15846==    by 0x12495A: process_curl_activity.isra.0 (tev_curl.c:245)
==15846==    by 0x124E4B: tcurlsock_input_available (tev_curl.c:288)
==15846==  Block was alloc'd at
==15846==    at 0x4C2EB1B: malloc (vg_replace_malloc.c:299)
==15846==    by 0x5BFE03B: __talloc_with_prefix (talloc.c:698)
==15846==    by 0x5BFE03B: _talloc_pool (talloc.c:752)
==15846==    by 0x5BFE03B: _talloc_pooled_object (talloc.c:820)
==15846==    by 0x59ECCAA: _tevent_req_create (tevent_req.c:73)
==15846==    by 0x120C43: kcm_op_queue_send (kcmsrv_op_queue.c:206)
==15846==    by 0x12072F: kcm_cmd_send (kcmsrv_ops.c:162)
==15846==    by 0x1115B1: kcm_cmd_dispatch (kcmsrv_cmd.c:364)
==15846==    by 0x1115B1: kcm_recv (kcmsrv_cmd.c:512)
==15846==    by 0x1115B1: kcm_fd_handler (kcmsrv_cmd.c:600)
==15846==    by 0x59F1A4F: epoll_event_loop (tevent_epoll.c:728)
==15846==    by 0x59F1A4F: epoll_event_loop_once (tevent_epoll.c:930)
==15846==    by 0x59EFEC6: std_event_loop_once (tevent_standard.c:114)
==15846==    by 0x59EBCAC: _tevent_loop_once (tevent.c:721)
==15846==    by 0x59EBECA: tevent_common_loop_wait (tevent.c:844)
==15846==    by 0x59EFE66: std_event_loop_wait (tevent_standard.c:145)
==15846==    by 0x7C2D0F2: server_loop (server.c:718)

Part of sssd-kcm.log

(Tue Apr 18 08:30:48:243213 2017) [sssd[kcm]] [handle_socket] (0x2000): Activity on curl socket 14 socket data 0xdcd93f0
(Tue Apr 18 08:30:48:246461 2017) [sssd[kcm]] [handle_curlmsg_done] (0x0400): Handled http://localhost/kcm/persistent/1000/ccache/
(Tue Apr 18 08:30:48:247437 2017) [sssd[kcm]] [tcurl_request_done] (0x0400): TCURL request finished [0]: Success
(Tue Apr 18 08:30:48:249128 2017) [sssd[kcm]] [schedule_fd_processing] (0x2000): timeout_ms: -1
(Tue Apr 18 08:30:48:249732 2017) [sssd[kcm]] [check_curl_timeouts] (0x4000): Still tracking 0 outstanding requests
(Tue Apr 18 08:30:48:253592 2017) [sssd[kcm]] [sec_list_done] (0x2000): Found 0 items
(Tue Apr 18 08:30:48:262271 2017) [sssd[kcm]] [sec_list_done] (0x2000): list done
(Tue Apr 18 08:30:48:263827 2017) [sssd[kcm]] [ccdb_sec_list_done] (0x2000): Found 2 ccaches
(Tue Apr 18 08:30:48:266425 2017) [sssd[kcm]] [ccdb_sec_list_done] (0x2000): Listing all caches done
(Tue Apr 18 08:30:48:269666 2017) [sssd[kcm]] [kcm_debug_uuid] (0x4000): UUID: 23cb4886-fa25-41e1-b065-327cc6f565c4
(Tue Apr 18 08:30:48:270029 2017) [sssd[kcm]] [kcm_debug_uuid] (0x4000): UUID: 901c7634-98ee-4e56-bd6c-84d6548d0f9f
(Tue Apr 18 08:30:48:270963 2017) [sssd[kcm]] [kcm_cmd_done] (0x0400): KCM operation GET_CACHE_UUID_LIST returned [0]: Success
(Tue Apr 18 08:30:48:272724 2017) [sssd[kcm]] [kcm_op_get_default_ccache_send] (0x1000): Getting client's default ccache
(Tue Apr 18 08:30:48:273393 2017) [sssd[kcm]] [ccdb_sec_get_default_send] (0x2000): Getting the default ccache
(Tue Apr 18 08:30:48:273723 2017) [sssd[kcm]] [tcurl_request_send] (0x0400): Sending TCURL request for http://localhost/kcm/persistent/1000/default, at socket /var/run/secrets.socket
(Tue Apr 18 08:30:48:273828 2017) [sssd[kcm]] [schedule_fd_processing] (0x2000): timeout_ms: 0
(Tue Apr 18 08:30:48:274587 2017) [sssd[kcm]] [kcm_send_reply] (0x2000): Sending a reply
(Tue Apr 18 08:30:48:274802 2017) [sssd[kcm]] [kcm_output_construct] (0x1000): Sending a reply with 36 bytes of payload
(Tue Apr 18 08:30:48:280941 2017) [sssd[kcm]] [handle_socket] (0x2000): Activity on curl socket 14 socket data (nil)
(Tue Apr 18 08:30:48:281043 2017) [sssd[kcm]] [schedule_fd_processing] (0x2000): timeout_ms: 0
(Tue Apr 18 08:30:48:281121 2017) [sssd[kcm]] [check_curl_timeouts] (0x4000): Still tracking 1 outstanding requests
(Tue Apr 18 08:30:48:281227 2017) [sssd[kcm]] [schedule_fd_processing] (0x2000): timeout_ms: 4995
(Tue Apr 18 08:30:48:281319 2017) [sssd[kcm]] [check_curl_timeouts] (0x4000): Still tracking 1 outstanding requests
(Tue Apr 18 08:30:48:282299 2017) [sssd[kcm]] [kcm_send] (0x2000): All data sent!
(Tue Apr 18 08:30:48:282717 2017) [sssd[kcm]] [kcm_input_parse] (0x1000): Received message with length 4
(Tue Apr 18 08:30:48:282789 2017) [sssd[kcm]] [kcm_get_opt] (0x2000): The client requested operation 20
(Tue Apr 18 08:30:48:282870 2017) [sssd[kcm]] [kcm_cmd_send] (0x0400): KCM operation GET_DEFAULT_CACHE
(Tue Apr 18 08:30:48:282935 2017) [sssd[kcm]] [kcm_cmd_send] (0x1000): 0 bytes on KCM input
(Tue Apr 18 08:30:48:283032 2017) [sssd[kcm]] [kcm_op_queue_send] (0x0200): Adding request by 1000 to the wait queue
(Tue Apr 18 08:30:48:284040 2017) [sssd[kcm]] [talloc_log_fn] (0x0010): Bad talloc magic value - wrong talloc version used/mixed

Comments


Comment from lslebodn at 2017-04-18 18:39:26

I have a suspicion that it is partially related to expired keys.

sh]$ klist -l
Principal name                 Cache name
--------------                 ----------
lslebodn@EXAMPLE.COM           KCM:1000 (Expired)
lslebodn@ANOTHERREALM.ORG      KCM:1000:85269 (Expired)

sh$ kinit lslebodn@EXAMPLE.COM
kinit: Credentials cache I/O operation failed while getting default ccache

sh$ kinit lslebodn@EXAMPLE.COM
Password for lslebodn@EXAMPLE.COM:

Comment from jhrozek at 2017-04-19 21:45:16

I have a different theory. I think it's a race condition between the request that just finished, calls tevent_req_done which frees the queue entry and between the code that first fetches the queue entry from the hash table and then checks it. I think the entry is first returned from the hash table, then the request is freed and then the queue code checks the entry.


Comment from lslebodn at 2017-04-19 22:49:03

I doubt there was any other request. It is still possible.

But I haven't had a time to find reasonable reproducer.


Comment from lslebodn at 2017-04-20 07:15:46

You were probably right. Reasonable reproducer is to kinit from two different terminals with different realm at the same time.

1st terminal:

sh$ kinit lslebodn@EXAMPLE.COM
kinit: Credentials cache I/O operation failed while getting default ccache
sh$ kinit lslebodn@EXAMPLE.COM
kinit: Credentials cache I/O operation failed while getting default ccache
sh$ kinit lslebodn@EXAMPLE.COM
Password for lslebodn@EXAMPLE.COM: 
kinit: Pre-authentication failed: Password read interrupted while getting initial credentials
sh$ kinit lslebodn@EXAMPLE.COM
kinit: Credentials cache I/O operation failed while searching for ccache for lslebodn@EXAMPLE.COM
sh$ kinit lslebodn@EXAMPLE.COM
Password for lslebodn@EXAMPLE.COM:

2nd terminal

sh$ kinit lslebodn@ANOTHER.REALM
sh$ kinit lslebodn@ANOTHER.REALM
Password for lslebodn@ANOTHER.REALM: 
kinit: Pre-authentication failed: Password read interrupted while getting initial credentials
sh$ kinit lslebodn@ANOTHER.REALM
Password for lslebodn@ANOTHER.REALM: 
kinit: Pre-authentication failed: Password read interrupted while getting initial credentials
sh$ kinit lslebodn@ANOTHER.REALM
kinit: Credentials cache I/O operation failed while getting default ccache
sh$ kinit lslebodn@ANOTHER.REALM
kinit: Credentials cache I/O operation failed while getting default ccache

I might reproduce it more often because I run sssd-secrets and sssd-kcm with valgrind and therefore they are slower. But I am still not sure why it can occur even with single kinit operation; as it is described in ticket.


Comment from lslebodn at 2017-04-20 07:36:21

And it needn't be even a different realm. You can just renew the same ticket twice in parallel as a reasonable reproducer. Just ensure that ticker is not Expired :-)

e.g. for i in {1..2} ; do kinit -R & done


Comment from jhrozek at 2017-04-26 23:01:52

Metadata Update from @jhrozek:

  • Issue priority set to: critical
  • Issue set to the milestone: SSSD 1.15.3

Comment from jhrozek at 2017-04-27 17:43:08

Metadata Update from @jhrozek:


Comment from jhrozek at 2017-04-27 17:43:08

Metadata Update from @jhrozek:


Comment from jhrozek at 2017-04-27 17:43:08

Issue linked to Bugzilla: Bug 1446302


Comment from jhrozek at 2017-05-23 19:18:38

Metadata Update from @jhrozek:

  • Issue assigned to jhrozek

Comment from jhrozek at 2017-05-23 19:19:07

Metadata Update from @jhrozek:

  • Custom field patch adjusted to on

Comment from jhrozek at 2017-05-24 16:07:02

master:


Comment from jhrozek at 2017-05-24 16:07:44

Metadata Update from @jhrozek:

  • Issue close_status updated to: Fixed
  • Issue status updated to: Closed (was: Open)

Comment from lslebodn at 2017-05-24 16:16:23

Metadata Update from @lslebodn:

  • Custom field version adjusted to 1.1.5.3
@sssd-bot sssd-bot added Bugzilla Closed: Fixed Issue was closed as fixed. labels May 2, 2020
@sssd-bot sssd-bot closed this as completed May 2, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bugzilla Closed: Fixed Issue was closed as fixed.
Projects
None yet
Development

No branches or pull requests

2 participants