fixes race condition in metadata consistency check #3392

Merged
keith-turner merged 2 commits into apache:2.1 from keith-turner:accumulo-3386
May 11, 2023

@keith-turner
Contributor

While looking into #3386 I noticed the Accumulo metadata consistency check was incrementing a counter in the wrong place. The counter should be incremented before writing to the metadata table, but it was not. This could cause the check to report false positives. A false positive in this case would be transient and should not repeat on subsequent checks.

I also noticed a redundant check when deciding whether a file should be added to the set of in-memory files. AFAICT this redundant check is harmless, but it could cause problems for future changes.
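
To illustrate why the position of the increment matters, here is a minimal single-threaded sketch of one interleaving. It is not the Accumulo code: `TabletModel`, `checkRacingWithWrite`, the `Result` enum and the file name are all hypothetical, and the model assumes the checker records the counter before reading metadata and skips the comparison if the counter has changed since.

```java
import java.util.Set;
import java.util.TreeSet;

public class UpdateCounterSketch {

  // Hypothetical model of one tablet: files listed in the metadata table,
  // files the tablet server tracks in memory, and an update counter.
  static class TabletModel {
    final Set<String> metadataFiles = new TreeSet<>();
    final Set<String> inMemoryFiles = new TreeSet<>();
    long updateCounter = 0;
  }

  public enum Result { CONSISTENT, SKIPPED, MISMATCH }

  // Simulates a check that starts, is overtaken by a metadata write, and
  // finishes before the writer has updated the in-memory file set.
  public static Result checkRacingWithWrite(boolean incrementBeforeWrite) {
    TabletModel t = new TabletModel();

    // Checker, step 1: remember the counter before reading metadata.
    long counterAtStart = t.updateCounter;

    // Writer: optionally increment first, then add the new file to metadata.
    if (incrementBeforeWrite) {
      t.updateCounter++;
    }
    t.metadataFiles.add("F0001.rf");

    // Checker, step 2: read metadata, which already lists the new file.
    Set<String> seenInMetadata = new TreeSet<>(t.metadataFiles);

    // Checker, step 3: compare only if no update began since step 1.
    Result result;
    if (t.updateCounter != counterAtStart) {
      result = Result.SKIPPED;
    } else if (seenInMetadata.equals(t.inMemoryFiles)) {
      result = Result.CONSISTENT;
    } else {
      result = Result.MISMATCH;
    }

    // Writer finishes: update memory; a late increment only happens now.
    t.inMemoryFiles.add("F0001.rf");
    if (!incrementBeforeWrite) {
      t.updateCounter++;
    }
    return result;
  }

  public static void main(String[] args) {
    System.out.println("increment after write:  " + checkRacingWithWrite(false));
    System.out.println("increment before write: " + checkRacingWithWrite(true));
  }
}
```

In this model the late increment lets the check compare a metadata snapshot that is ahead of memory and report a mismatch, while the early increment makes the check skip the comparison. The sketch covers only this one interleaving, and the mismatch is transient because memory and metadata agree once the writer finishes.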

@keith-turner
Contributor Author

Best to ignore whitespace when looking at the diffs

@keith-turner
Contributor Author

I am trying to trigger the false positive with the following test.

I set the following properties.

tserver.compaction.minor.concurrent.max=20
tserver.health.check.interval=1ms

Then I run the following code in jshell

    // `client` is assumed to be the AccumuloClient already available in the jshell session
    client.tableOperations().create("foo");

    var bw = client.createBatchWriter("foo");

    // Pre-split the table at rows 001 through 099 (99 split points, 100 tablets)
    SortedSet<Text> splits = new TreeSet<>();

    IntStream.range(1, 100).mapToObj(i -> String.format("%03d", i)).map(Text::new)
        .forEach(splits::add);

    client.tableOperations().addSplits("foo", splits);

    // Loop forever: write one mutation per row 001 through 099, then flush the
    // table and wait, which triggers minor compactions
    while (true) {
      IntStream.range(1, 100).mapToObj(i -> String.format("%03d", i)).forEach(row -> {
        Mutation mutation = new Mutation(row);
        mutation.put("f1", "q1", "v1");
        try {
          bw.addMutation(mutation);
        } catch (Exception e) {
          throw new RuntimeException(e);
        }
      });
      bw.flush();
      client.tableOperations().flush("foo", null, null, true);
    }

I am seeing around 40 minor compactions per second while this test is running against 2.1.0, but it has not hit the race condition yet. I think I need more tservers doing more checks against more tablets to increase the probability of hitting it. I am running it under Uno.

@keith-turner
Contributor Author

Looking at the code I noticed the increment for the bulk import code is also in the wrong place. Going to push a fix for that.

keith-turner merged commit 617bfcb into apache:2.1 on May 11, 2023
ctubbsii linked an issue on May 12, 2023 that may be closed by this pull request
ctubbsii added this to the 2.1.1 milestone on Jul 12, 2024

Development

Successfully merging this pull request may close these issues.

Accumulo tablets are getting out of sync with metadata

3 participants