
Add API to probe the logstream batch writer if more bytes can be written without writing them #8798

Merged

4 commits · Feb 17, 2022

Conversation

npepinpe
Member

@npepinpe npepinpe commented Feb 16, 2022

Description

This PR adds a new API method, `LogStreamBatchWriter#canWriteAdditionalEvent(int)`. This allows users of the writer to probe whether adding the given number of bytes to the batch would cause it to become unwritable, without actually having to write anything to the batch, or even modify their DTO (e.g. the `TypedRecord<?>` in the engine).

To avoid having dispatcher details leak into the implementation, an analogous method is added to the dispatcher, `Dispatcher#canClaimFragmentBatch(int, int)`, which compares the given size, framed and aligned, with the max fragment length. This is the main building block to eventually solve #5525, and it enables other use cases (e.g. multi-instance creation) which deal with large batches until we have a more permanent solution (e.g. chunking follow-up batches).
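As a rough sketch of the comparison described above: the check frames and aligns the requested batch size before comparing it to the max fragment length. The header and alignment constants below are placeholders for illustration, not the dispatcher's actual values.

```java
final class FragmentBatchCheck {
  static final int FRAME_HEADER_LENGTH = 8; // assumed per-fragment framing overhead
  static final int FRAME_ALIGNMENT = 8;     // assumed alignment boundary

  // Rounds length up to the next multiple of the (power-of-two) alignment.
  static int align(final int length, final int alignment) {
    return (length + alignment - 1) & ~(alignment - 1);
  }

  // Total on-buffer size of a batch: every fragment is framed, then the
  // whole batch is aligned.
  static int framedBatchLength(final int fragmentCount, final int batchLength) {
    return align(batchLength + fragmentCount * FRAME_HEADER_LENGTH, FRAME_ALIGNMENT);
  }

  // Conceptually what canClaimFragmentBatch answers: does the framed,
  // aligned batch still fit within the maximum fragment length?
  static boolean canClaimFragmentBatch(
      final int fragmentCount, final int batchLength, final int maxFragmentLength) {
    return framedBatchLength(fragmentCount, batchLength) <= maxFragmentLength;
  }
}
```

The point of exposing this as a query on the dispatcher is that callers never need to know the framing or alignment details themselves.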

NOTE: the tests added in the dispatcher are not very good, but I couldn't come up with something else that wouldn't be too coupled to the implementation (i.e. essentially reusing `LogBufferAppender`). I would like some ideas/suggestions.

NOTE: this PR comes out of the larger one, #8491. You can check that one out to see how the new API would be used, e.g. in the `JobBatchCollector`. As such, this is marked for backporting, since we'll backport the complete fix for #5525.

Related issues

related to #5525

Definition of Done

Not all items need to be done depending on the issue and the pull request.

Code changes:

  • The changes are backwards compatible with previous versions
  • If it fixes a bug, then PRs are created to backport the fix to the last two minor versions. You can trigger a backport by assigning labels (e.g. `backport stable/1.3`) to the PR; if that fails, you need to create the backports manually.

Testing:

  • There are unit/integration tests that verify all acceptance criteria of the issue
  • New tests are written to ensure backwards compatibility with future versions
  • The behavior is tested manually
  • The change has been verified by a QA run
  • The impact of the changes is verified by a benchmark

Documentation:

  • The documentation is updated (e.g. BPMN reference, configuration, examples, get-started guides, etc.)
  • New content is added to the release announcement
  • If the PR changes how BPMN processes are validated (e.g. support new BPMN element) then the Camunda modeling team should be informed to adjust the BPMN linting.

Adds a new public API method to the Dispatcher,
`#canClaimFragmentBatch(int, int)`. This method determines whether a batch
with the given fragment count and total length can actually be claimed
on this dispatcher instance.

This method is useful to determine whether a batch will be claimable
before you actually claim it, avoiding an unnecessary claim-and-abort
cycle when the batch isn't finished yet at the time of the check.
Introduces a new method to the `LogStreamBatchWriter` API,
`#canWriteAdditionalEvent(int)`. This lets one probe whether a batch
including an additional event of the given size could still be written.

While this could be done via `tryWrite`, there are cases, such as job
activation, where you want to know whether adding data to a record (not
to the writer, but to the record itself) would make the batch
unwritable, at which point you want to handle it differently. Rolling
back changes to the record is more complicated than rolling back the
writer, so as a compromise you can probe the writer to find out whether
it can accept more bytes before writing them anywhere, even, as in our
example, to your DTO.
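The probe-before-mutate pattern described above can be sketched as follows. The `canWriteAdditionalEvent` predicate is a stand-in for the new writer API, and the list of item lengths stands in for the record/DTO being grown; both are hypothetical simplifications for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntPredicate;

final class ProbeBeforeWrite {
  // Collect item lengths into a batch, probing the writer BEFORE mutating
  // the batch, so there is never anything to roll back.
  static List<Integer> collect(
      final IntPredicate canWriteAdditionalEvent, final int[] itemLengths) {
    final List<Integer> batch = new ArrayList<>(); // stands in for the record/DTO
    int batchLength = 0;

    for (final int itemLength : itemLengths) {
      // probe first: would the batch still be writable with this item added?
      if (!canWriteAdditionalEvent.test(batchLength + itemLength)) {
        break; // stop collecting instead of aborting a half-built write later
      }
      batch.add(itemLength);
      batchLength += itemLength;
    }

    return batch;
  }
}
```

With a writer that accepts at most 10 bytes, collecting items of length 4 stops after the second item, since the third would push the total to 12.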
@npepinpe
Member Author

Looking forward to talking about tests 😄 I'm not very satisfied with the tests I added, especially in the dispatcher, but nothing came to mind without essentially reusing the implementation in the tests 😵

Member

@saig0 saig0 left a comment


@npepinpe looks good so far 👍
I'm not so happy about the writer test but I have no better idea 😅

Please have a look at my comment. But feel free to merge 🚀

Adds a common-case test for `canClaimFragmentBatch`. The idea is that
the previous test covered a corner case, i.e. when the given batch
length fits within the max fragment length on its own, but no longer
does once framed and aligned.
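Concrete numbers make the corner case easier to see; the header and alignment values here are placeholders, not the dispatcher's actual constants. A raw length of 30 fits a max fragment length of 32, but once framed (+8) and aligned (to 8) it grows to 40 and no longer fits.

```java
final class CornerCase {
  static final int HEADER = 8;     // assumed per-fragment frame header
  static final int ALIGNMENT = 8;  // assumed alignment boundary

  // Adds the frame header, then rounds up to the alignment boundary.
  static int framedAndAligned(final int length) {
    final int framed = length + HEADER;
    return (framed + ALIGNMENT - 1) & ~(ALIGNMENT - 1);
  }
}
```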
@npepinpe
Member Author

I also took the opportunity to update the `DispatcherTest` to JUnit 5, as it was an extremely easy thing to do and we do want to eventually migrate to JUnit 5 completely. I don't think it needs another review, but I'll ping you on Slack for a quick chance to veto anyway 😉

@npepinpe
Member Author

bors merge

ghost pushed a commit that referenced this pull request Feb 17, 2022
8798: Add API to probe the logstream batch writer if more bytes can be written without writing them r=npepinpe a=npepinpe

## Description

This PR adds a new API method, `LogStreamBatchWriter#canWriteAdditionalEvent(int)`. This allows users of the writer to probe whether adding the given number of bytes to the batch would cause it to become unwritable, without actually having to write anything to the batch, or even modify their DTO (e.g. the `TypedRecord<?>` in the engine).

To avoid having dispatcher details leak into the implementation, an analogous method is added to the dispatcher, `Dispatcher#canClaimFragmentBatch(int, int)`, which compares the given size, framed and aligned, with the max fragment length. This is the main building block to eventually solve #5525, and it enables other use cases (e.g. multi-instance creation) which deal with large batches until we have a more permanent solution (e.g. chunking follow-up batches).

NOTE: the tests added in the dispatcher are not very good, but I couldn't come up with something else that wouldn't be too coupled to the implementation (i.e. essentially reusing `LogBufferAppender`). I would like some ideas/suggestions.

NOTE: this PR comes out of the larger one, #8491. You can check that one out to see how the new API would be used, e.g. in the `JobBatchCollector`. As such, this is marked for backporting, since we'll backport the complete fix for #5525.

## Related issues

related to #5525 



Co-authored-by: Nicolas Pepin-Perreault <nicolas.pepin-perreault@camunda.com>
@ghost

ghost commented Feb 17, 2022

Build failed:

@npepinpe
Member Author

bors r+

ES tests were flaky, which incidentally is my next issue to work on 😄

@ghost

ghost commented Feb 17, 2022

Build succeeded:

@github-actions
Contributor

Successfully created backport PR #8809 for stable/1.2.

@github-actions
Contributor

Successfully created backport PR #8810 for stable/1.3.

ghost pushed a commit that referenced this pull request Feb 17, 2022
8809: [Backport stable/1.2] Add API to probe the logstream batch writer if more bytes can be written without writing them r=npepinpe a=github-actions[bot]

# Description
Backport of #8798 to `stable/1.2`.

relates to #5525

Co-authored-by: Nicolas Pepin-Perreault <nicolas.pepin-perreault@camunda.com>
ghost pushed a commit that referenced this pull request Feb 17, 2022
8810: [Backport stable/1.3] Add API to probe the logstream batch writer if more bytes can be written without writing them r=npepinpe a=github-actions[bot]

# Description
Backport of #8798 to `stable/1.3`.

relates to #5525

Co-authored-by: Nicolas Pepin-Perreault <nicolas.pepin-perreault@camunda.com>
ghost pushed a commit that referenced this pull request Feb 23, 2022
8799: Correctly truncate a job activation batch if it will not fit in the dispatcher r=npepinpe a=npepinpe

## Description

This PR fixes a bug where, at times, we would try to activate a job batch which was too big and could not be written to the dispatcher, as its framed and aligned size would exceed the `maxFragmentLength`. The fix itself is simple: delegate the decision of whether more jobs can be added to the batch all the way down to the dispatcher, with each layer adding its own framing in between. To simplify the code and testing, the responsibility of collecting jobs into a batch was extracted into a dedicated class.

The new API, `TypedEventWriter#canWriteEventOfLength(int)` could potentially be reused in other places where we want to prevent writing batches that are too large. However, this is a bit brittle, since it requires the code in the engine to properly compute beforehand how the record will grow when you modify it. For example, in the `JobBatchCollector`, to calculate the length of the `TypedRecord<JobBatchRecordValue> record` when adding a given `JobRecordValue` to it, we need the following:

```java
// the expected length is based on the current record's length, plus the length of the job
// record we would add to the batch, the number of bytes taken by the additional job key,
// as well as one byte required per job key for its type header; if we ever add more,
// this should be updated accordingly
final var jobRecordLength = jobRecord.getLength();
final var expectedEventLength = record.getLength() + jobRecordLength + Long.BYTES + 1;
```

Not extremely complicated, but it could be error-prone. A better solution, chunking the follow-up events, is not something we can sanely do in between KRs, unfortunately, and would most likely carry a high risk, so it wouldn't be anything we can backport. For now, I think this is an OK compromise.
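A minimal sketch of how the collector might use this length computation together with the writer probe; the `canWriteEventOfLength` predicate stands in for `TypedEventWriter#canWriteEventOfLength(int)`, and the lengths are plain ints for illustration, so this is an assumption-laden simplification rather than the actual `JobBatchCollector` code.

```java
import java.util.function.IntPredicate;

final class JobBatchSizing {
  // Prospective batch-record length if one more job were appended: the
  // current record length, the job record length, the job key (a long),
  // and one byte for the key's type header.
  static int expectedEventLength(final int recordLength, final int jobRecordLength) {
    return recordLength + jobRecordLength + Long.BYTES + 1;
  }

  // Only grow the record when the writer confirms the resulting event
  // would still be writable.
  static boolean tryAppendJob(
      final IntPredicate canWriteEventOfLength,
      final int recordLength,
      final int jobRecordLength) {
    return canWriteEventOfLength.test(expectedEventLength(recordLength, jobRecordLength));
  }
}
```

The brittleness noted above is visible here: the `Long.BYTES + 1` term must track the record's actual wire format by hand.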

NOTE: this PR depends on #8797 and #8798. Wait for these to be merged before reviewing!

## Related issues

closes #5525



Co-authored-by: Nicolas Pepin-Perreault <nicolas.pepin-perreault@camunda.com>
@Zelldon added the Release: 1.2.11 and version:1.3.5 labels on Mar 1, 2022
This pull request was closed.