Adds streaming append capability. #11

arindas · 2023-05-07T05:09:17Z

`laminarmq` specific enhancements to the `segmented_log` data structure

While the conventional segmented_log data structure is quite performant for a commit_log implementation,
it still requires the following properties to hold true for the record being appended:

- We have the entire record in memory
- We know the record bytes' length and record bytes' checksum before the record is appended

Neither the length nor the checksum can be known up front when the record bytes are read from an asynchronous
stream of bytes. Without the enhancements, we would have to concatenate the intermediate byte buffers into a
single vector before appending. This would incur extra allocations and copies, and slow down the append path.

Hence, to accommodate this use case, we introduced an intermediate indexing layer to our design.

Fig: Data organisation for persisting the segmented_log data structure on a *nix file system.

In the new design, instead of referring to records with a raw offset, we refer to them with indices. The
index in each segment translates record indices to raw file positions in the segment store file.
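
The translation from record index to store position can be pictured with a minimal in-memory sketch. The names `SegmentIndex`, `base_index`, `push` and `get` are hypothetical and chosen for illustration only; the field types and the contiguous in-memory layout are assumptions, not the actual `laminarmq` index format.

```rust
/// Hypothetical record header: checksum and length of the record bytes.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct RecordHeader {
    pub checksum: u32,
    pub length: u64,
}

/// Hypothetical index entry: the header plus the raw position in the store file.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct IndexRecord {
    pub header: RecordHeader,
    pub position: u64,
}

/// Hypothetical per-segment index. Record indices are assumed to be
/// contiguous and to start at `base_index`.
pub struct SegmentIndex {
    base_index: u64,
    records: Vec<IndexRecord>,
}

impl SegmentIndex {
    pub fn new(base_index: u64) -> Self {
        Self { base_index, records: Vec::new() }
    }

    /// Registers the next record and returns the record index assigned to it.
    pub fn push(&mut self, record: IndexRecord) -> u64 {
        let index = self.base_index + self.records.len() as u64;
        self.records.push(record);
        index
    }

    /// Translates a record index into its index entry.
    /// Returns `None` when the index lies outside this segment.
    pub fn get(&self, index: u64) -> Option<&IndexRecord> {
        let offset = index.checked_sub(self.base_index)?;
        self.records.get(usize::try_from(offset).ok()?)
    }
}
```

A reader would call `get(index)` and then read `header.length` bytes starting at `position` from the store file.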

Now, the store append operation accepts an asynchronous stream of bytes instead of a contiguously laid out
slice of bytes. We use this operation to write the record bytes, and at the time of writing the record
bytes, we calculate the record bytes' length and checksum. Once we are done writing the record bytes to
the store, we write its corresponding record_header (containing the checksum and length), position and
index as an index_record in the segment index.
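
The append flow described above can be sketched as follows. This is a simplified illustration under stated assumptions: a synchronous iterator of byte chunks stands in for the asynchronous stream, a `Vec<u8>` stands in for the store file, and Adler-32 stands in for the checksum (the PR makes the checksum algorithm generic). The function name `append_stream` and the struct layouts are hypothetical.

```rust
/// Adler-32 state, updated one chunk at a time.
/// Used here only as a stand-in for the configurable checksum algorithm.
struct Adler32 {
    a: u32,
    b: u32,
}

impl Adler32 {
    const MOD: u32 = 65_521;

    fn new() -> Self {
        Self { a: 1, b: 0 }
    }

    fn update(&mut self, bytes: &[u8]) {
        for &byte in bytes {
            self.a = (self.a + u32::from(byte)) % Self::MOD;
            self.b = (self.b + self.a) % Self::MOD;
        }
    }

    fn finish(&self) -> u32 {
        (self.b << 16) | self.a
    }
}

#[derive(Debug, PartialEq, Eq)]
pub struct RecordHeader {
    pub checksum: u32,
    pub length: u64,
}

#[derive(Debug, PartialEq, Eq)]
pub struct IndexRecord {
    pub header: RecordHeader,
    pub position: u64,
}

/// Writes chunks to the store as they arrive, computing length and checksum
/// on the fly. The index record is produced only after the last chunk.
pub fn append_stream<'a, I>(store: &mut Vec<u8>, chunks: I) -> IndexRecord
where
    I: IntoIterator<Item = &'a [u8]>,
{
    let position = store.len() as u64;
    let mut checksum = Adler32::new();
    let mut length = 0u64;

    for chunk in chunks {
        store.extend_from_slice(chunk); // no intermediate concatenation buffer
        checksum.update(chunk);
        length += chunk.len() as u64;
    }

    IndexRecord {
        header: RecordHeader { checksum: checksum.finish(), length },
        position,
    }
}
```

The caller never needs the whole record in memory: each chunk is written and folded into the running checksum, then dropped.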

This provides two quality-of-life enhancements:

- Asynchronous streaming writes are possible without having to concatenate intermediate byte buffers
- Records are addressed by simple indices instead of raw offsets

To prevent a malicious user from exhausting our storage capacity and memory with a crafted request that
loops over some data indefinitely and streams it to our server, all append operations accept an optional
append_threshold parameter. When provided, it prevents a streaming append from writing more bytes than the
given append_threshold.
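
A minimal sketch of how such a threshold might be enforced while consuming a stream. The names `append_bounded` and `AppendError` are hypothetical, a synchronous iterator stands in for the asynchronous stream, and rolling back the partial write by truncating the store is an assumption about the failure behaviour, not something this description states.

```rust
#[derive(Debug, PartialEq, Eq)]
pub enum AppendError {
    /// The stream tried to write more bytes than the threshold allows.
    ThresholdExceeded { threshold: usize },
}

/// Appends chunks to `store`, refusing to write more than `append_threshold`
/// bytes in total. `None` means no limit is applied.
/// Returns the number of bytes written on success.
pub fn append_bounded<'a, I>(
    store: &mut Vec<u8>,
    chunks: I,
    append_threshold: Option<usize>,
) -> Result<usize, AppendError>
where
    I: IntoIterator<Item = &'a [u8]>,
{
    let start = store.len();
    let mut written = 0usize;

    for chunk in chunks {
        if let Some(threshold) = append_threshold {
            if written + chunk.len() > threshold {
                // Assumption: discard the partially written record.
                store.truncate(start);
                return Err(AppendError::ThresholdExceeded { threshold });
            }
        }
        store.extend_from_slice(chunk);
        written += chunk.len();
    }

    Ok(written)
}
```

Because the check runs before each chunk is written, even an endless stream such as `std::iter::repeat("spam".as_bytes())` is cut off after at most `append_threshold` bytes.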

At the segment level, this requires us to keep a segment overflow capacity. All segment append operations
now use segment_capacity - segment.size + segment_overflow_capacity as the append_threshold value.
A good segment_overflow_capacity value could be segment_capacity / 2.
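
The threshold arithmetic above, written out as a small helper. The function name and the use of `u64` are assumptions for illustration; saturating arithmetic is used so that a segment which has already grown past its capacity plus overflow yields a threshold of zero instead of underflowing.

```rust
/// append_threshold = segment_capacity - segment_size + segment_overflow_capacity,
/// clamped at zero.
pub fn segment_append_threshold(
    segment_capacity: u64,
    segment_size: u64,
    segment_overflow_capacity: u64,
) -> u64 {
    segment_capacity
        .saturating_add(segment_overflow_capacity)
        .saturating_sub(segment_size)
}
```

For example, with a capacity of 1024 bytes, a current size of 900 bytes and an overflow capacity of 512 bytes (capacity / 2), the next append may write up to 1024 - 900 + 512 = 636 bytes.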

codecov-commenter · 2023-05-07T05:15:19Z

Codecov Report

Patch coverage: 89.23% and project coverage change: +7.38 🎉

Comparison is base (64d229e) 80.57% compared to head (1b48707) 87.95%.

Additional details and impacted files

@@             Coverage Diff             @@
##           develop      #11      +/-   ##
===========================================
+ Coverage    80.57%   87.95%   +7.38%     
===========================================
  Files           19       37      +18     
  Lines          942     4875    +3933     
===========================================
+ Hits           759     4288    +3529     
- Misses         183      587     +404

| Impacted Files | Coverage Δ | |
|---|---|---|
| src/commit_log/glommio_impl/segmented_log/store.rs | `94.56% <0.00%> (+17.04%)` | ⬆️ |
| src/commit_log/segmented_log/mod.rs | `94.17% <0.00%> (+14.72%)` | ⬆️ |
| src/commit_log/segmented_log/segment.rs | `89.37% <0.00%> (+3.81%)` | ⬆️ |
| src/lib.rs | `100.00% <ø> (ø)` | |
| src/server/partition/single_node/commit_log.rs | `84.72% <0.00%> (+14.95%)` | ⬆️ |
| src/server/worker.rs | `52.17% <0.00%> (-22.83%)` | ⬇️ |
| src/common/mod.rs | `71.60% <68.57%> (-28.40%)` | ⬇️ |
| src/storage/commit_log/mod.rs | `73.07% <73.07%> (ø)` | |
| src/storage/commit_log/segmented_log/store.rs | `87.65% <87.65%> (ø)` | |
| src/storage/commit_log/segmented_log/index.rs | `88.62% <88.62%> (ø)` | |

... and 10 more

... and 17 files with indirect coverage changes


arindas added 30 commits January 29, 2023 22:13

Added indexed commit log trait implementation, along with store write*() utilities.

9b68fa2

Added utilities for RecordHeader.

3679af8

Added utilities for deserializing RecordHeader from a byte slice.

b01d6cf

Extracted out close(), remove() and truncate() into appropriate traits.

643b2ff

Refactored TryFrom impls into more ergonomic read(), write() fns.

cb80831

Restructures storage/segmented_log/store module as a simple .rs file

de4736d

Added a RAII utility for automatically cleaning up AsyncConsume instance.

861b9b5

Moved ConsumeHandle to consume submodule under storage::impls::glommio

93a5684

Added SegmentError type.

09a1412

Started AsyncIndexedRead impl for Segment

f26533a

Added AsyncIndexRead implementation for Segment.

94bc53b

Added SizedStorage impl and is_maxed() for Segment.

ff95c65

Added Segment append() implementation.

7d90df1

Implemented AsyncTruncate and AsyncConsume for Segment

f378f5e

Removed redundant reference in has_expired() params since Duration is Copy.

0fb2a9c

Added with_consume_method constructor for ConsumeHandle

c36fa01

Started SegmentedLog implementation.

3c65e41

Added AsyncIndexedRead impl for SegmentedLog

60d8403

Made Store and its utilities generic in terms of checksum hashing algorithm.

3be9b1e

Made SegmentCreator more minimalist.

583b9bb

Added rotate_new_write_segment

c794ca7

Implemented AsyncTruncate for SegmentedLog

52647c1

Implemented remove_expired_segments and AsyncConsume for SegmentedLog

04a7939

Implemented CommitLog for SegmentedLog.

4637aa4

Added doc to elucidate return type for remove_expired_segments

62440bd

Made store size generic.

27b7163

Setup module structure for in_mem commit_log/* impls.

7ad19d9

Added storage trait.

dbf1938

Added temporary storage stub.

f2599e1

Remove unnecessary modules from in_mem.

1190afc

arindas added 25 commits March 22, 2023 14:55

ci: adds on push to rust-ci workflow

b9b2cfa

chore: empty-commit to trigger github workflow load.

615b6d5

ci: removes on-push, uses ulimit to increase memlock limit.

bdc97cc

ci: adds on-push to rust-ci.

c6a53c2

ci: removes sudo in ulimit invocation.

c1016ed

chore: empty-commit to trigger github workflow load.

eb586ba

ci: set ulimit -m 8192

2a67a2a

ci: set ulimit -m 1024

a969cbe

ci: invokes ulimit inside bash

0bc993e

ci: runs tests on self-hosted runner instead

e21bb35

ci: invokes grcov in run command instead of actions-rs/grcov

5655b7c

ci: installs grcov with cargo

794b8cb

ci: grabs grcov from releases instead of using cargo install

b66a8c2

ci: adds component llvm-tools-preview when using actions-rs/toolchain

33e2260

ci: writes lcov to ./coverage.lcov

46dbc06

ci: changes grcov output to cobertura

19b325a

ci: rust-ci configure on-push

6612fbc

ci: removes main and develop branches from on-push target; adds feature/**

f57839c

feat(segment)!: adds max_store_overflow to segment config

81f508c

Append threshold when appending to store is calculated as follows: append_threshold = remaining_store_capacity + max_store_overflow BREAKING CHANGE: adds new member max_store_overflow to segment::Config

test(segmented_log): adds test for remove_expired_segments

8a5f1be

fix(SegmentedLog::lowest_index): Considers write segment lowest index if no read segments are present

545a1b5

feat: Manually implements Default and Clone traits for InMemStorageProvider

94e0046

style(in_mem::segment): remove _ from instance variables

ad9433e

style(storage::commit_log::segmented_log): remove _ from instance variables

e45ad55

fix(clippy): removes redundant closure and pattern matching

1b48707

arindas changed the title ~~Dev/0~~ Adds streaming append capability. May 7, 2023

arindas merged commit 6f98438 into develop May 7, 2023
2 checks passed

arindas added a commit that referenced this pull request May 7, 2023

Merge pull request #12 from arindas/develop

b9c4ffc

Add development updates from #11

arindas deleted the dev/0 branch May 7, 2023 08:18
