Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

AssertionError in StoreAppender.checkWritePositionHeaderNumber() #611

Closed
mcs6502 opened this issue Jun 5, 2019 · 9 comments
Closed

AssertionError in StoreAppender.checkWritePositionHeaderNumber() #611

mcs6502 opened this issue Jun 5, 2019 · 9 comments

Comments

@mcs6502
Copy link

mcs6502 commented Jun 5, 2019

I am having an assertion failure in StoreAppender's checkWritePositionHeaderNumber() method when the appender checks the position after having reset it.

@mcs6502
Copy link
Author

mcs6502 commented Jun 5, 2019

WARN [] [Thread-3] n.o.c.q.i.s.SCQIndexing - Took 263 us to linearScan by position from 0 to 0 = (0x0-0x0)=0
java.lang.AssertionError: ~~~~~~~~~~~~~~ thread: Thread-4 pos: 133901 header: 77532749627923 seq1: 531 seq2: 529
	at net.openhft.chronicle.queue.impl.single.SingleChronicleQueueExcerpts$StoreAppender.checkWritePositionHeaderNumber(SingleChronicleQueueExcerpts.java:422)
	at net.openhft.chronicle.queue.impl.single.SingleChronicleQueueExcerpts$StoreAppender.resetPosition(SingleChronicleQueueExcerpts.java:334)
	at net.openhft.chronicle.queue.impl.single.SingleChronicleQueueExcerpts$StoreAppender.setCycle2(SingleChronicleQueueExcerpts.java:292)
	at net.openhft.chronicle.queue.impl.single.SingleChronicleQueueExcerpts$StoreAppender.cycle(SingleChronicleQueueExcerpts.java:560)
	at ChronicleAppenderCycleTest.lambda$runTest$1(ChronicleAppenderCycleTest.java:59)
	at ChronicleAppenderCycleTest.lambda$useAppender$2(ChronicleAppenderCycleTest.java:88)

@mcs6502
Copy link
Author

mcs6502 commented Jun 5, 2019

I've attached a test case that reproduces this issue -- it fails about 8 times out of 10. Confirmed that this failure occurs in the latest chronicle-queue 5.17.14.
ChronicleAppenderCycleTest.java.txt

@mcs6502
Copy link
Author

mcs6502 commented Jun 6, 2019

A race window might exist between the following two lines in StoreAppender's resetPosition() method:

                position(store.writePosition());
                …
                final long headerNumber = store.lastSequenceNumber(this);

(The store.lastSequenceNumber() could have moved as a result of another party appending to the queue at the same time.)

@dpisklov
Copy link
Contributor

@mcs6502 do you mind pasting the test case here? I can't access the attachment for some reason...

@mcs6502
Copy link
Author

mcs6502 commented Jun 10, 2019

Sure thing -- here it is inline:

import static org.junit.Assert.assertNull;

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Consumer;

import org.junit.Test;

import net.openhft.chronicle.bytes.Bytes;
import net.openhft.chronicle.queue.ExcerptAppender;
import net.openhft.chronicle.queue.RollCycles;
import net.openhft.chronicle.queue.impl.single.SingleChronicleQueue;
import net.openhft.chronicle.queue.impl.single.SingleChronicleQueueBuilder;
import net.openhft.chronicle.wire.WireType;

/**
 * This test case replicates the assertion error in Chronicle StoreAppender's
 * checkWritePositionHeaderNumber() method.
 */
public class ChronicleAppenderCycleTest {

  private static final long LATCH_TIMEOUT_MS = 5000;

  @Test
  public void testAppenderCycle() throws IOException {
    String id = "testAppenderCycle";
    Bytes msg = Bytes.allocateDirect(64);
    int n = 20;
    for (int i = 0; i < n; ++i)
      runTest(id + '-' + i, msg);
  }

  private void runTest(String id, Bytes msg) throws IOException {
    Path path = Files.createTempDirectory(id);
    CountDownLatch steady = new CountDownLatch(2);
    CountDownLatch go = new CountDownLatch(1);
    CountDownLatch done = new CountDownLatch(1);
    int n = 468;

    AtomicReference<Throwable> thr1 = useAppender(path, appender -> {
      appender.cycle();
      for (int i = 0; i < n; ++i)
        appender.writeBytes(msg);
      steady.countDown();
      await(go, "go");
      for (int i = 0; i < n; ++i)
        appender.writeBytes(msg);
    }, done);

    AtomicReference<Throwable> thr2 = useAppender(path, appender -> {
      steady.countDown();
      await(go, "go");
      int m = 2 * n;
      for (int i = 0; i < m; ++i)
        appender.cycle();
    }, done);

    await(steady, "steady");
    go.countDown();
    await(done, "done");

    assertNull(thr1.get());
    assertNull(thr2.get());
  }

  private void await(CountDownLatch latch, String name) {
    try {
      latch.await(LATCH_TIMEOUT_MS, TimeUnit.MILLISECONDS);
    } catch (InterruptedException e) {
      throw new RuntimeException("Problem acquiring the \"" + name + "\" latch",
        e);
    }
  }

  private AtomicReference<Throwable> useAppender(Path path,
    Consumer<ExcerptAppender> tester, CountDownLatch done)
  {
    AtomicReference<Throwable> refThr = new AtomicReference<>();
    Thread thread = new Thread(() -> {
      try {
        SingleChronicleQueueBuilder builder = createBuilder(path);
        try (SingleChronicleQueue queue = builder.build()) {
          ExcerptAppender appender = queue.acquireAppender();
          tester.accept(appender);
        }
      } catch (Throwable e) {
        refThr.set(e);
      } finally {
        done.countDown();
      }
    });
    thread.setDaemon(true);
    thread.start();
    return refThr;
  }

  private SingleChronicleQueueBuilder createBuilder(Path path) {
    SingleChronicleQueueBuilder builder =
      SingleChronicleQueueBuilder.builder(path, WireType.FIELDLESS_BINARY);
    builder.rollCycle(RollCycles.DAILY);
    return builder;
  }
}

@dpisklov
Copy link
Contributor

@mcs6502 thanks for this, I fixed the issue now. Basically you've hit a corner case - normally users don't use cycle(), especially on a new appender, and cycle() operation had a (undesired) side effect - it also initialized roll cycle for appender if it's unset. I removed this logic, as I don't think read access should mutate anything.

@mcs6502
Copy link
Author

mcs6502 commented Jun 11, 2019

Thanks for the quick fix, Dmitry! I've done a build of the current master and can confirm that the assertion error during cycle() is fixed.
Just wanted to note that previously by calling cycle() on the appender, one could create an empty queue -- and initialise its files on disk -- if the queue didn't already exist. This was (still is?) required because tailers would fail to open the queue unless it had been created on disk first. If the queue creation is removed from the appender's cycle() then that change might break existing chronicle-queue code that relies on this behaviour.
Also I observed a couple of new test failures after I'd applied the fix (see below). This is on a Windows build with jdk 1.8.0_92, so I don't know if they are specific to my setup. The pretoucher test failure occurs less frequently than the other one.

  • ChronicleQueueMethodsWithoutParameters. test
  • PretoucherTest. shouldHandleEarlyCycleRoll
java.lang.IllegalStateException: Bytes not released properly 1
	at net.openhft.chronicle.bytes.BytesUtil.checkRegisteredBytes(BytesUtil.java:215)
	at net.openhft.chronicle.queue.ChronicleQueueTestBase.checkRegisteredBytes(ChronicleQueueTestBase.java:106)
java.lang.AssertionError: 
Expected: is <2>
     but: was <1>
	at org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:20)
	at org.junit.Assert.assertThat(Assert.java:956)
	at org.junit.Assert.assertThat(Assert.java:923)
	at net.openhft.chronicle.queue.impl.single.PretoucherTest.lambda$shouldHandleEarlyCycleRoll$1(PretoucherTest.java:105)
	at java.util.stream.Streams$RangeIntSpliterator.forEachRemaining(Streams.java:110)
	at java.util.stream.IntPipeline$Head.forEach(IntPipeline.java:557)
	at net.openhft.chronicle.queue.impl.single.PretoucherTest.shouldHandleEarlyCycleRoll(PretoucherTest.java:85)

Thanks!

@dpisklov
Copy link
Contributor

@mcs6502 The PretoucherTest is known to be flaky, so you can ignore it. Not sure about the other one, but can see that our CI builds are all passing including Windows and ARM.

As to the side effect you described that's exactly the undesired behaviour. There shouldn't really be a way to create a queue file without any data in it - tailer should only pick up files when there's data. I checked that there's no code in the queue itself depending on this logic, so overall the behaviour now is more predictable and appropriate. Thanks for reporting and testing!

PS just for our KYC - are you using Chronicle stuff commercially or for your own pet projects? And if commercially do you mind us asking who you work for?

@hft-team-city
Copy link
Collaborator

Released in Chronicle-Queue-5.17.15, BOM-2.17.279

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants