
Fix estimation of claimed batch length when truncating job batch activation records #8491

Closed

wants to merge 21 commits

Conversation

npepinpe
Member

@npepinpe npepinpe commented Dec 28, 2021

Description

Related issues

closes #5525

Definition of Done

Depending on the issue and the pull request, not all items need to be done.

Code changes:

  • The changes are backward compatible with previous versions
  • If it fixes a bug, then PRs are created to backport the fix to the last two minor versions. You can trigger a backport by assigning labels (e.g. backport stable/0.25) to the PR; if that fails, you need to create the backports manually.

Testing:

  • There are unit/integration tests that verify all acceptance criteria of the issue
  • New tests are written to ensure backwards compatibility with future versions
  • The behavior is tested manually
  • The change has been verified by a QA run
  • The impact of the changes is verified by a benchmark

Documentation:

  • The documentation is updated (e.g. BPMN reference, configuration, examples, get-started guides, etc.)
  • New content is added to the release announcement

@npepinpe
Member Author

To the reviewer: I think the test is not very future-proof and needs to be improved. It works now and shows that the fix is correct, but it may not stay correct in the future, at which point it will become useless.

final var jobCopyBuffer = new ExpandableArrayBuffer();
final var unwritableJob = new MutableReference<LargeJob>();

jobState.forEachActivatableJobs(
Member Author


💭 This lambda is way too big imo. In fact, this method in general has a lot of state, and I think it would be good to decompose it.
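For context, here is a rough sketch of what such a collector loop does. The `BatchWriter` interface and all names below are simplified stand-ins for illustration, not the actual Zeebe APIs; the real writer asks the dispatcher whether the batch can still be claimed.

```java
import java.util.ArrayList;
import java.util.List;

/** Simplified stand-in for LogStreamBatchWriter#canWriteAdditionalEvent(int); illustrative only. */
interface BatchWriter {
  boolean canWriteAdditionalEvent(int eventLength);
}

/** Toy writer with a fixed capacity; the real writer delegates this decision to the dispatcher. */
final class FixedCapacityWriter implements BatchWriter {
  private final int capacity;
  private int used;

  FixedCapacityWriter(final int capacity) {
    this.capacity = capacity;
  }

  @Override
  public boolean canWriteAdditionalEvent(final int eventLength) {
    return used + eventLength <= capacity;
  }

  void write(final int eventLength) {
    used += eventLength;
  }
}

final class JobCollectorSketch {
  /** Collects jobs until the next one would make the batch un-writable, truncating the batch. */
  static List<Integer> collect(final List<Integer> jobLengths, final FixedCapacityWriter writer) {
    final List<Integer> collected = new ArrayList<>();
    for (final int length : jobLengths) {
      if (!writer.canWriteAdditionalEvent(length)) {
        break; // truncate here instead of failing later on tryWrite
      }
      writer.write(length);
      collected.add(length);
    }
    return collected;
  }
}
```

The point is that the probe happens before anything is written, so no claim ever has to be aborted.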

@npepinpe npepinpe force-pushed the 5525-max-job-batch branch 2 times, most recently from 5dd1bee to 9f1ed54 on February 14, 2022 13:17
@@ -133,6 +133,34 @@ public void shouldNotClaimBeyondPublisherLimit() {
verify(logBufferPartition0).getTailCounterVolatile();
}

@Test
public void canClaimFragmentBatch() {
Member Author


💭 I really dislike these tests, so I hope I can discuss with the reviewer a better approach.

@@ -105,7 +105,7 @@ public long getRequestId() {
}

@Override
public long getLength() {
public int getLength() {
Member Author


I switched these things from long to int mostly because otherwise we end up with a bunch of unnecessary casts everywhere. I'm not sure why we ever went with longs - did we expect ByteBuffer to eventually support more than 2GB? Do we want to write buffers > 2GB?

return variables;
}

record LargeJob(long key, JobRecord record) {}
Member Author


💭 Unsure about this, but it seems like we should already have a way to group a job and its key

/**
* This is not actually accurate, as the frame length needs to also be aligned by the same amount
 * of bytes as the batch. However, fixing that would break the separation of concerns here, i.e.
 * the writer would have to become Dispatcher-aware.
Member Author


💭 I guess the comment breaks the abstraction boundary as well 🤡

import org.junit.jupiter.api.extension.ExtendWith;

@ExtendWith(ZeebeStateExtension.class)
final class JobBatchCollectorTest {
Member Author


💭 I know we never quite finalized the discussion on unit tests and how closely we want to test, but I do believe these tests have value, if only because when they fail, they'll be much easier to diagnose than failures of the higher-level engine tests.


@ExtendWith(ZeebeStateExtension.class)
Member Author


💭 Just a small refactoring; happy to extract this since it's not really in scope here.

@@ -284,4 +292,8 @@ private void resetEvent() {
bufferWriterInstance.reset();
metadataWriterInstance.reset();
}

private int computeBatchLength(final int eventsCount, final int eventsLength) {
Member Author


💭 Extracted because it was important to me that canWriteAdditionalEvent and tryWrite stay as consistent as possible. I'm happy to get other suggestions, though.
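As a sketch of what such a helper might compute: the batch length is the summed event lengths plus a fixed per-event header. The `HEADER_BLOCK_LENGTH` constant below is an assumption for illustration, not the actual Zeebe value.

```java
/** Hypothetical sketch of the extracted helper; HEADER_BLOCK_LENGTH is an assumed constant. */
final class BatchLengthSketch {
  static final int HEADER_BLOCK_LENGTH = 12; // assumption, not the real per-event header size

  /** The batch length is the summed event lengths plus one header block per event. */
  static int computeBatchLength(final int eventsCount, final int eventsLength) {
    return eventsLength + eventsCount * HEADER_BLOCK_LENGTH;
  }
}
```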

private final LogStreamBatchWriterImpl writer = new LogStreamBatchWriterImpl(1, dispatcher);

/**
* This test asserts that {@link LogStreamBatchWriterImpl#canWriteAdditionalEvent(int)} computes
Member Author


👀 Would love some alternative suggestions here. I tried to ensure that if we ever change how a batch is claimed, i.e. what we pass as fragment count/batch length between tryWrite and canWriteAdditionalEvent, a test would fail and notify us, so that we only make such a change on purpose. However, I understand this is way too coupled to the implementation... I just couldn't think of something else.

@@ -230,7 +230,7 @@ public void updateVariables(
.command()
.put("scopeKey", scopeKey)
.put("updateSemantics", updateSemantics)
.put("document", MsgPackUtil.asMsgPack(document).byteArray())
.put("variables", MsgPackUtil.asMsgPack(document).byteArray())
Member Author


Just a little mistake I noticed - we renamed the property at some point but never updated it here.

*
* @param <L> the left type
* @param <R> the right type
*/
public final class EitherAssert<L, R>
Member Author


👀 I just saw we have another EitherAssert in the util module's test classes...should we use that one? Should we use the one here? What's our idea?

@npepinpe
Member Author

npepinpe commented Feb 15, 2022

This is a big PR. We could split it in the following way:

  • Either and EitherAssert changes as one PR
  • Dispatcher and Logstreams addition as one PR
  • Engine fix as one PR (the main one)

Let me know what you think 👍

Right now it's kind of a mess, so I need to clean it up before setting it for review.

Adds a new public API method to the Dispatcher,
`#canClaimFragmentBatch(int, int)`. This method determines whether the batch
with the given fragment count and of the given length can actually be
claimed for this dispatcher instance.

This method is useful to determine whether a batch will be claimable before
you actually claim it, mainly to avoid having to abort your claim
unnecessarily when the batch isn't finished yet at the time of the check.

Replaces the API to get the batch length with an additional event with a
simpler predicate method, `canWriteAdditionalEvent(int)`. This avoids
breaking the abstractions and having the writer know how the dispatcher
will compute the framed and aligned batch length, and instead simply
delegates deciding whether the batch, with the additional event of the
given length, can be written to the dispatcher or not.
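To illustrate the kind of computation `canClaimFragmentBatch(int, int)` performs, here is a hedged sketch. The frame header size and alignment constants are assumptions for illustration; the real dispatcher may frame and align differently.

```java
/** Hypothetical sketch; FRAME_HEADER_LENGTH and FRAME_ALIGNMENT are assumed values. */
final class BatchLengthEstimator {
  static final int FRAME_HEADER_LENGTH = 8; // assumed per-fragment frame header size
  static final int FRAME_ALIGNMENT = 8;     // assumed alignment boundary

  /** Rounds the given length up to the next multiple of FRAME_ALIGNMENT. */
  static int alignedLength(final int length) {
    return (length + (FRAME_ALIGNMENT - 1)) & ~(FRAME_ALIGNMENT - 1);
  }

  /** Each fragment is framed individually; the resulting batch length is then aligned. */
  static int claimedBatchLength(final int fragmentCount, final int batchLength) {
    return alignedLength(batchLength + fragmentCount * FRAME_HEADER_LENGTH);
  }

  /** The batch is claimable if its framed, aligned length fits within the max fragment length. */
  static boolean canClaimFragmentBatch(
      final int fragmentCount, final int batchLength, final int maxFragmentLength) {
    return claimedBatchLength(fragmentCount, batchLength) <= maxFragmentLength;
  }
}
```

With this sketch, a writer can compare the framed and aligned size of a prospective batch against the maximum fragment length without claiming anything.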
@npepinpe
Member Author

npepinpe commented Feb 16, 2022

Closing as this was extracted to #8797, #8798, and #8799. I'll delete the branch after these are merged.

@npepinpe npepinpe closed this Feb 16, 2022
ghost pushed a commit that referenced this pull request Feb 17, 2022
8798: Add API to probe the logstream batch writer if more bytes can be written without writing them r=npepinpe a=npepinpe

## Description

This PR adds a new API method, `LogStreamBatchWriter#canWriteAdditionalEvent(int)`. This allows users of the writer to probe if adding the given amount of bytes to the batch would cause it to become un-writable, without actually having to write anything to the batch, or even modify their DTO (e.g. the `TypedRecord<?>` in the engine).

To avoid having dispatcher details leak into the implementation, an analogous method is added to the dispatcher, `Dispatcher#canClaimFragmentBatch(int, int)`, which will compare the given size, framed and aligned, with the max fragment length. This is the main building block to eventually solve #5525, and enable other use cases (e.g. multi-instance creation) which deal with large batches until we have a more permanent solution (e.g. chunking follow up batches).

NOTE: the tests added in the dispatcher are not very good, but I couldn't come up with something else that wouldn't be too coupled to the implementation (i.e. essentially reusing `LogBufferAppender`). I would like some ideas/suggestions.

NOTE: this PR comes out of the larger one, #8491. You can check that one out to see how the new API would be used, e.g. in the `JobBatchCollector`. As such, this is marked for backporting, since we'll backport the complete fix for #5525.

## Related issues

related to #5525 



Co-authored-by: Nicolas Pepin-Perreault <nicolas.pepin-perreault@camunda.com>
ghost pushed a commit that referenced this pull request Feb 17, 2022
8797: Extend Either/EitherAssert capabilities r=npepinpe a=npepinpe

## Description

This PR extends `Either` by adding a new API, `Either#getOrElse(R)`. This allows extracting the right value of the `Either` or returning a fallback. I did not add any tests as the implementation is incredibly simple, and I can't foresee it ever getting more complex, but do challenge this.
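A minimal sketch of what `getOrElse` amounts to, assuming a right-biased `Either` like Zeebe's; the implementation below is illustrative, not the actual one:

```java
/** Minimal right-biased Either sketch; illustrative, not Zeebe's actual implementation. */
sealed interface Either<L, R> {
  /** Returns the right value, or the given fallback if this is a left. */
  R getOrElse(R fallback);

  record Left<L, R>(L value) implements Either<L, R> {
    @Override
    public R getOrElse(final R fallback) {
      return fallback; // a left carries no right value, so the fallback wins
    }
  }

  record Right<L, R>(R value) implements Either<L, R> {
    @Override
    public R getOrElse(final R fallback) {
      return value; // the right value is present, so the fallback is ignored
    }
  }
}
```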

It also extends the related `EitherAssert` by adding new `left` and `right` extraction capabilities. So you can now assert something like:

```java
EitherAssert.assertThat(either).left().isEqualTo(1);
EitherAssert.assertThat(instantEither)
	.right()
	.asInstanceOf(InstanceOfAssertFactories.INSTANT)
	.isBetween(today, tomorrow);
```

Note that calling `EitherAssert#right()` will, under the hood, still call `EitherAssert#isRight()`.

This PR is related to #5525 and is extracted from the bigger spike in #8491. You can review how it's used there, specifically in the `JobBatchCollectorTest`. As such, this is marked for backporting, since we'll backport the complete fix for #5525.

## Related issues

related to #5525 



Co-authored-by: Nicolas Pepin-Perreault <nicolas.pepin-perreault@camunda.com>
Co-authored-by: Nicolas Pepin-Perreault <43373+npepinpe@users.noreply.github.com>
@npepinpe npepinpe deleted the 5525-max-job-batch branch July 25, 2022 10:06
Successfully merging this pull request may close these issues.

I can't activate jobs with a high max job count