Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

pkg/bpf: fix bug where second value can be skipped. #26583

Merged
merged 2 commits into from
Jul 14, 2023

Commits on Jul 9, 2023

  1. pkg/bpf: fix bug where bpf map entries may not be reliably dumped.

    DumpReliablyWithCallback will skip a value callback in some situations.
    This may result in incorrect cilium map dumps or garbage collection.
    
    In situations where the initial key is deleted just after being retrieved, there is no previous key to fallback on.
    The reliable dump will attempt to use the current nextKey (that was based on the deleted current
    key).
    
    The local currentKey and nextKey Key types are being passed to NextKey
    which eventually writes the nextKey output to the nextKeys pointer
    location (via the bpf syscall).
    
    The currentValue was simply being assinged by the equals operator, which
    was copying the underlying interface pointer.
    
    Thus in this situation, the next iteration attempt was passing the same
    pointer twice to NextKey causing the currentKey to be set to the next
    key a second time - skipping a map element.
    
    Fixes cilium#26491
    
    Signed-off-by: Tom Hadlaw <tom.hadlaw@isovalent.com>
    tommyp1ckles committed Jul 9, 2023
    Configuration menu
    Copy the full SHA
    80d49b0 View commit details
    Browse the repository at this point in the history

Commits on Jul 12, 2023

  1. pkg/bpf: add additional unit test to avoid dump bugs/regressions.

    This test is an improvement on the previous TestDumpReliablyWithCallback test.
    The goal of this one is to provide more robust testing of the reliable dump mechanism.
    Specifically, it does the following:
    
     1. Creates a map with a small number of entries, populate it with [1, maxEntries)
     2. Start a goroutine that continuously dumps the map and checks that the dump contains
        all odd elements in the range [1, maxEntries).
     3. At the same time, start another goroutine that continuously deletes and re-adds even
        elements in the map.
    
    The motivation here is to provide a test that better catches regressions in code that is inherently prone to race-condition.
    
    This creates a situation where we have interleaved updates and dumps, we're interested
    in ensuring that each dump contains all odd elements in the range [1, maxEntries).
    This will catch bugs and regressions related where elements are skipped in the dump.
    
    For example, while running this without the fix: 74f71841e9c037ddd10bedc3128f3b28cb023597 this will fail a majority of the time.
    Following this fix, this test should always pass.
    
    This was tested locally by running this several times with a million iterations - both with the fix and without.
    For practical purposes we will lower the number of iterations to 1000 to avoid slowing down the test suite too much.
    
    Signed-off-by: Tom Hadlaw <tom.hadlaw@isovalent.com>
    tommyp1ckles committed Jul 12, 2023
    Configuration menu
    Copy the full SHA
    84ce10e View commit details
    Browse the repository at this point in the history