Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Incorrect synchronization in mmap cache #2736

Closed
sssd-bot opened this issue May 2, 2020 · 0 comments
Closed

Incorrect synchronization in mmap cache #2736

sssd-bot opened this issue May 2, 2020 · 0 comments
Labels
Bugzilla Closed: Fixed Issue was closed as fixed.

Comments

@sssd-bot
Copy link

sssd-bot commented May 2, 2020

Cloned from Pagure issue: https://pagure.io/SSSD/sssd/issue/1694


https://bugzilla.redhat.com/show_bug.cgi?id=883429 (Fedora)

sss_nss_check_header() in src/sss_client/nss_mc_common.c lacks a memory
barrier.  As a result, this loop:

    /* retry barrier protected reading max 5 times then give up */
    for (count = 5; count > 0; count--) {
        memcpy(&h, ctx->mmap_base, sizeof(struct sss_mc_header));
        if (MC_VALID_BARRIER(h.b1) && h.b1 == h.b2) {
            /* record is consistent so we can proceed */
            break;
        }
    }
    if (count == 0) {
        /* couldn't successfully read header we have to give up */
        return EIO;
    }

is compiled to:

   0x0000000000004d20 <+0>:     mov    0x10(%rdi),%rdx
   0x0000000000004d24 <+4>:     mov    %rbx,-0x28(%rsp)
   0x0000000000004d29 <+9>:     mov    $0x5,%eax
   0x0000000000004d2e <+14>:    mov    %rbp,-0x20(%rsp)
   0x0000000000004d33 <+19>:    mov    %r12,-0x18(%rsp)
   0x0000000000004d38 <+24>:    mov    %r13,-0x10(%rsp)
   0x0000000000004d3d <+29>:    mov    %r14,-0x8(%rsp)
   0x0000000000004d42 <+34>:    mov    (%rdx),%esi
   0x0000000000004d44 <+36>:    mov    0x28(%rdx),%ebp
   0x0000000000004d47 <+39>:    mov    0x20(%rdx),%ebx
   0x0000000000004d4a <+42>:    mov    0x30(%rdx),%r8d
   0x0000000000004d4e <+46>:    mov    0x8(%rdx),%r9d
   0x0000000000004d52 <+50>:    mov    0xc(%rdx),%r11d
   0x0000000000004d56 <+54>:    mov    0x10(%rdx),%r12d
   0x0000000000004d5a <+58>:    mov    0x1c(%rdx),%r13d
   0x0000000000004d5e <+62>:    mov    %esi,%ecx
   0x0000000000004d60 <+64>:    mov    0x4(%rdx),%r10d
   0x0000000000004d64 <+68>:    mov    0x14(%rdx),%r14d
   0x0000000000004d68 <+72>:    and    $0xff000000,%ecx
   0x0000000000004d6e <+78>:    cmp    $0xf0000000,%ecx
   0x0000000000004d74 <+84>:    je     0x4da0 <sss_nss_check_header+128>
   0x0000000000004d76 <+86>:    sub    $0x1,%eax
   0x0000000000004d79 <+89>:    jne    0x4d6e <sss_nss_check_header+78>
   0x0000000000004d7b <+91>:    mov    $0x5,%eax
   0x0000000000004d80 <+96>:    mov    -0x28(%rsp),%rbx
   0x0000000000004d85 <+101>:   mov    -0x20(%rsp),%rbp
   0x0000000000004d8a <+106>:   mov    -0x18(%rsp),%r12
   0x0000000000004d8f <+111>:   mov    -0x10(%rsp),%r13
   0x0000000000004d94 <+116>:   mov    -0x8(%rsp),%r14
   0x0000000000004d99 <+121>:   retq
   0x0000000000004d9a <+122>:   nopw   0x0(%rax,%rax,1)
   0x0000000000004da0 <+128>:   cmp    %r8d,%esi

That is, %ecx is never reloaded from memory.

In sss_nss_mc_get_record(), __sync_synchronize() is needed before the final
barrier check.

sss_mc_add_rec_to_chain() in src/responder/nss/nsssrv_mmap_cache.c contains
this comment:

    /* changing a single uint32_t is atomic, so there is no
     * need to use barriers in this case */

I think the comment is misleading because it's not just atomicity that matters,
ordering can also be relevant.  However, in this particular case, it does not
appear to matter in what order the writes to the links in the hash chain
happen.

A client can pick up a reference to a hash table slot which is about to be
invalidated, resulting in a lookup failure when the record is actually present
in the cache.  This is probably not a problem for this application.

You need to deal with counter overflow in your barriers (the ABA problem).  One
way to do this is to switch to switch to a different file when it happens, and
rename it over the existing file.  Concurrent readers will still have a
consistent view.

It seems you use this scheme because you need wait-free writers, so that the
privileged process which updates the cache does not block on readers.  Correct?
Otherwise, there are probably simpler approaches.

Comments


Comment from simo at 2012-12-05 21:42:53

Fields changed

blockedby: =>
blocking: =>
coverity: =>
design: =>
design_review: => 0
feature_milestone: =>
fedora_test_page: =>
milestone: NEEDS_TRIAGE => SSSD 1.9.3
owner: somebody => simo
status: new => assigned
testsupdated: => 0


Comment from simo at 2012-12-05 22:43:15

Fields changed

patch: 0 => 1


Comment from simo at 2012-12-05 22:49:22

A synchronization instruction right after the memcpy effectively fixes this problem.
This is the new code:

Dump of assembler code for function sss_nss_check_header:
   0x00000000000056dc <+0>:	push   %rbp
   0x00000000000056dd <+1>:	mov    %rsp,%rbp
   0x00000000000056e0 <+4>:	mov    %rdi,-0x48(%rbp)
   0x00000000000056e4 <+8>:	movl   $0x5,-0x4(%rbp)
   0x00000000000056eb <+15>:	jmp    0x574a <sss_nss_check_header+110>
   0x00000000000056ed <+17>:	mov    -0x48(%rbp),%rax
   0x00000000000056f1 <+21>:	mov    0x10(%rax),%rax
   0x00000000000056f5 <+25>:	mov    (%rax),%rdx
   0x00000000000056f8 <+28>:	mov    %rdx,-0x40(%rbp)
   0x00000000000056fc <+32>:	mov    0x8(%rax),%rdx
   0x0000000000005700 <+36>:	mov    %rdx,-0x38(%rbp)
   0x0000000000005704 <+40>:	mov    0x10(%rax),%rdx
   0x0000000000005708 <+44>:	mov    %rdx,-0x30(%rbp)
   0x000000000000570c <+48>:	mov    0x18(%rax),%rdx
   0x0000000000005710 <+52>:	mov    %rdx,-0x28(%rbp)
   0x0000000000005714 <+56>:	mov    0x20(%rax),%rdx
   0x0000000000005718 <+60>:	mov    %rdx,-0x20(%rbp)
   0x000000000000571c <+64>:	mov    0x28(%rax),%rdx
   0x0000000000005720 <+68>:	mov    %rdx,-0x18(%rbp)
   0x0000000000005724 <+72>:	mov    0x30(%rax),%eax
   0x0000000000005727 <+75>:	mov    %eax,-0x10(%rbp)
   0x000000000000572a <+78>:	mfence 
   0x000000000000572d <+81>:	mov    -0x40(%rbp),%eax
   0x0000000000005730 <+84>:	and    $0xff000000,%eax
   0x0000000000005735 <+89>:	cmp    $0xf0000000,%eax
   0x000000000000573a <+94>:	jne    0x5746 <sss_nss_check_header+106>
   0x000000000000573c <+96>:	mov    -0x40(%rbp),%edx
   0x000000000000573f <+99>:	mov    -0x10(%rbp),%eax
   0x0000000000005742 <+102>:	cmp    %eax,%edx
   0x0000000000005744 <+104>:	je     0x5752 <sss_nss_check_header+118>
   0x0000000000005746 <+106>:	subl   $0x1,-0x4(%rbp)
   0x000000000000574a <+110>:	cmpl   $0x0,-0x4(%rbp)
   0x000000000000574e <+114>:	jg     0x56ed <sss_nss_check_header+17>
   0x0000000000005750 <+116>:	jmp    0x5753 <sss_nss_check_header+119>
   0x0000000000005752 <+118>:	nop
   0x0000000000005753 <+119>:	cmpl   $0x0,-0x4(%rbp)
   0x0000000000005757 <+123>:	jne    0x5763 <sss_nss_check_header+135>
   0x0000000000005759 <+125>:	mov    $0x5,%eax
   0x000000000000575e <+130>:	jmpq   0x5852 <sss_nss_check_header+374>
   0x0000000000005763 <+135>:	mov    -0x3c(%rbp),%eax
   0x0000000000005766 <+138>:	cmp    $0x1,%eax
   ....

Patch sent to the list.

I have not changed anything for ABA because updates are so rare that it is effectively impossible for any meaningful client to run into this issue IMO.


Comment from jhrozek at 2012-12-05 23:27:59

resolution: => fixed
status: assigned => closed


Comment from simo at 2012-12-11 18:39:08

The fix was not complete, the Fedora bugzilla has been reopened as well.

resolution: fixed =>
status: closed => reopened


Comment from simo at 2012-12-11 21:05:41

Fields changed

milestone: SSSD 1.9.3 => SSSD 1.9.4
patch: 1 => 0
version: => 1.9.3


Comment from dpal at 2012-12-13 04:38:34

Ticket has been cloned to Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=886757

rhbz: [https://bugzilla.redhat.com/show_bug.cgi?id=883429 883429] => [https://bugzilla.redhat.com/show_bug.cgi?id=883429 883429], [https://bugzilla.redhat.com/show_bug.cgi?id=886757 886757]


Comment from jhrozek at 2012-12-13 20:45:58

resolution: => fixed
status: reopened => closed


Comment from jhrozek at 2017-02-24 15:05:57

Metadata Update from @jhrozek:

  • Issue assigned to simo
  • Issue set to the milestone: SSSD 1.9.4
@sssd-bot sssd-bot added Bugzilla Closed: Fixed Issue was closed as fixed. labels May 2, 2020
@sssd-bot sssd-bot closed this as completed May 2, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bugzilla Closed: Fixed Issue was closed as fixed.
Projects
None yet
Development

No branches or pull requests

1 participant