Adds streaming append capability. #11

Merged
merged 111 commits into from
May 7, 2023

@arindas arindas commented May 7, 2023

laminarmq-specific enhancements to the segmented_log data structure

While the conventional segmented_log data structure is quite performant for a commit_log implementation,
it still requires the following properties to hold true for the record being appended:

  • We have the entire record in memory
  • We know the record bytes' length and record bytes' checksum before the record is appended

Neither property holds when the record bytes are read from an asynchronous stream of bytes: the length and
checksum are known only once the stream has been fully consumed. Without the enhancements, we would have to
concatenate the intermediate byte buffers into a vector before appending. This would not only incur more
allocations and copies, but also slow down the append path.

Hence, to accommodate this use case, we introduced an intermediate indexing layer to our design.

Fig: Data organisation for persisting the segmented_log data structure on a *nix file system.

In the new design, instead of referring to records by raw offset, we refer to them by index. The index in
each segment translates record indices to raw file positions in the segment's store file.
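
A minimal sketch of this translation, assuming an in-memory index per segment. The names `SegmentIndex`, `base_index`, `push` and `get` are hypothetical and chosen for illustration, the header fields are flattened into the entry for brevity, and the field widths are guesses rather than the crate's actual layout:

```rust
use std::convert::TryFrom;

/// Hypothetical index entry: header fields plus the raw store position.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct IndexRecord {
    pub checksum: u32,
    pub length: u64,
    pub position: u64,
}

/// Hypothetical in-memory view of a single segment's index.
pub struct SegmentIndex {
    /// Record index of the first entry held by this segment.
    base_index: u64,
    records: Vec<IndexRecord>,
}

impl SegmentIndex {
    pub fn new(base_index: u64) -> Self {
        SegmentIndex { base_index, records: Vec::new() }
    }

    /// Appends an entry and returns the record index assigned to it.
    pub fn push(&mut self, record: IndexRecord) -> u64 {
        let record_index = self.base_index + self.records.len() as u64;
        self.records.push(record);
        record_index
    }

    /// Translates a record index to its entry, or `None` if the index
    /// falls outside this segment.
    pub fn get(&self, record_index: u64) -> Option<&IndexRecord> {
        let offset = record_index.checked_sub(self.base_index)?;
        self.records.get(usize::try_from(offset).ok()?)
    }
}
```

A caller would look up the entry for a record index, then read `length` bytes from `position` in the store file.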

The store append operation now accepts an asynchronous stream of bytes instead of a contiguous slice of
bytes. We use this operation to write the record bytes and, while writing them, we calculate their length
and checksum. Once the record bytes are written to the store, we write the corresponding record_header
(containing the checksum and length), position and index as an index_record in the segment index.
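
The append flow described above can be sketched as follows. This is an illustration under stated assumptions, not the crate's implementation: a synchronous iterator of chunks stands in for the asynchronous stream, any `std::io::Write` stands in for the store file, the header fields are flattened into a hypothetical `IndexRecord`, and a bitwise CRC-32 stands in for whichever checksum the project actually uses:

```rust
use std::io::{self, Write};

/// Bitwise CRC-32 (IEEE) update step. A stand-in for a real checksum crate.
fn crc32_update(mut crc: u32, bytes: &[u8]) -> u32 {
    for &byte in bytes {
        crc ^= u32::from(byte);
        for _ in 0..8 {
            crc = if crc & 1 == 1 { (crc >> 1) ^ 0xEDB8_8320 } else { crc >> 1 };
        }
    }
    crc
}

/// Hypothetical index entry, produced only after all bytes are written.
#[derive(Debug, PartialEq, Eq)]
pub struct IndexRecord {
    pub checksum: u32,
    pub length: u64,
    pub position: u64,
}

/// Writes each chunk to the store as it arrives, updating the running
/// length and checksum, so no intermediate buffer holds the whole record.
/// `position` is where the record starts in the store (assumed known).
pub fn append_stream<W, I, B>(store: &mut W, position: u64, chunks: I) -> io::Result<IndexRecord>
where
    W: Write,
    I: IntoIterator<Item = B>,
    B: AsRef<[u8]>,
{
    let mut crc = 0xFFFF_FFFFu32;
    let mut length = 0u64;
    for chunk in chunks {
        let bytes = chunk.as_ref();
        store.write_all(bytes)?;
        crc = crc32_update(crc, bytes);
        length += bytes.len() as u64;
    }
    // The header fields are known only here, after the last chunk.
    Ok(IndexRecord { checksum: !crc, length, position })
}
```

The returned entry is what would then be written to the segment index, which is why the index write has to follow the store write.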

This provides two quality-of-life enhancements:

  • Asynchronous streaming writes are possible without concatenating intermediate byte buffers
  • Records are addressed by index rather than by raw file offset, which makes them easier to access

To prevent a malicious user from exhausting our storage capacity and memory with a crafted request that
loops over some data and streams it to our server indefinitely, all append operations take an optional
append_threshold parameter. When provided, it prevents a streaming append from writing more bytes than
the given append_threshold.
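
A minimal sketch of how such a threshold could be enforced, again using a synchronous iterator in place of the asynchronous stream and a `Vec<u8>` in place of the store file. The function name, the `AppendError` type and the rollback of partially written bytes are assumptions made for this example, not behaviour confirmed by this PR:

```rust
/// Hypothetical error type for this sketch.
#[derive(Debug, PartialEq, Eq)]
pub enum AppendError {
    ThresholdExceeded { threshold: u64 },
}

/// Appends chunks to `store`, refusing to write more than `append_threshold`
/// bytes in total when a threshold is given. Returns the bytes written.
pub fn append_with_threshold<I, B>(
    store: &mut Vec<u8>,
    chunks: I,
    append_threshold: Option<u64>,
) -> Result<u64, AppendError>
where
    I: IntoIterator<Item = B>,
    B: AsRef<[u8]>,
{
    let start = store.len();
    let mut written = 0u64;
    for chunk in chunks {
        let bytes = chunk.as_ref();
        let next = written + bytes.len() as u64;
        if let Some(threshold) = append_threshold {
            if next > threshold {
                // Assumption: discard the partial record on failure.
                store.truncate(start);
                return Err(AppendError::ThresholdExceeded { threshold });
            }
        }
        store.extend_from_slice(bytes);
        written = next;
    }
    Ok(written)
}
```

Because the check runs per chunk, even a stream that never ends is rejected after at most `append_threshold` bytes have been accepted.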

At the segment level, this requires us to keep a segment overflow capacity. All segment append operations
now use segment_capacity - segment.size + segment_overflow_capacity as the append_threshold value.
A good segment_overflow_capacity value could be segment_capacity / 2.

Append threshold when appending to store is calculated as follows:
append_threshold = remaining_store_capacity + max_store_overflow
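
As a worked example of this calculation, here is a small sketch. `max_store_overflow` is the member named in this PR; the `Config` shape, the `max_store_size` field name and the use of a saturating subtraction (to cover a store that has already grown past its capacity through an earlier overflowing append) are assumptions for illustration:

```rust
/// Hypothetical, trimmed-down configuration for this sketch.
pub struct Config {
    /// Nominal store capacity in bytes (assumed field name).
    pub max_store_size: u64,
    /// Extra bytes an append may write past the nominal capacity.
    pub max_store_overflow: u64,
}

/// append_threshold = remaining_store_capacity + max_store_overflow
pub fn append_threshold(config: &Config, store_size: u64) -> u64 {
    // Saturate at zero if the store already exceeds its nominal capacity.
    let remaining_store_capacity = config.max_store_size.saturating_sub(store_size);
    remaining_store_capacity + config.max_store_overflow
}
```

With a capacity of 1024 bytes and an overflow of 512 bytes (capacity / 2), a store holding 1000 bytes yields a threshold of 24 + 512 = 536 bytes, and an empty store yields 1536 bytes.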

BREAKING CHANGE: adds new member max_store_overflow to segment::Config
@arindas arindas changed the title from "Dev/0" to "Adds streaming append capability." May 7, 2023
@codecov-commenter

Codecov Report

Patch coverage: 89.23% and project coverage change: +7.38 🎉

Comparison is base (64d229e) 80.57% compared to head (1b48707) 87.95%.

Additional details and impacted files
@@             Coverage Diff             @@
##           develop      #11      +/-   ##
===========================================
+ Coverage    80.57%   87.95%   +7.38%     
===========================================
  Files           19       37      +18     
  Lines          942     4875    +3933     
===========================================
+ Hits           759     4288    +3529     
- Misses         183      587     +404     
Impacted Files                                        Coverage Δ
src/commit_log/glommio_impl/segmented_log/store.rs    94.56% <0.00%> (+17.04%) ⬆️
src/commit_log/segmented_log/mod.rs                   94.17% <0.00%> (+14.72%) ⬆️
src/commit_log/segmented_log/segment.rs               89.37% <0.00%> (+3.81%) ⬆️
src/lib.rs                                            100.00% <ø> (ø)
src/server/partition/single_node/commit_log.rs        84.72% <0.00%> (+14.95%) ⬆️
src/server/worker.rs                                  52.17% <0.00%> (-22.83%) ⬇️
src/common/mod.rs                                     71.60% <68.57%> (-28.40%) ⬇️
src/storage/commit_log/mod.rs                         73.07% <73.07%> (ø)
src/storage/commit_log/segmented_log/store.rs         87.65% <87.65%> (ø)
src/storage/commit_log/segmented_log/index.rs         88.62% <88.62%> (ø)
... and 10 more

... and 17 files with indirect coverage changes

@arindas arindas merged commit 6f98438 into develop May 7, 2023
2 checks passed
arindas added a commit that referenced this pull request May 7, 2023
Add development updates from #11
@arindas arindas deleted the dev/0 branch May 7, 2023 08:18