Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(tiering): Simplest small bins #2810

Merged
merged 4 commits into from
Apr 5, 2024

Conversation

dranikpg
Copy link
Contributor

@dranikpg dranikpg commented Apr 1, 2024

Simplest design for small bins

@@ -14,6 +14,10 @@
namespace dfly::tiering {

struct DiskSegment {
DiskSegment FillPages() const {
return {offset / 4_KB * 4_KB, length / 4_KB * 4_KB};
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's not good, imho. You round down length, where you should round up offset + length

Copy link
Contributor Author

@dranikpg dranikpg Apr 1, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch, actually I don't use the length for now anywhere. Offset should be rounded down of course. Rounding should always contain the original page

Copy link
Collaborator

@romange romange left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

src/server/tiering/disk_storage.h Outdated Show resolved Hide resolved
src/server/tiering/small_bins.cc Show resolved Hide resolved
src/server/tiering/small_bins.cc Outdated Show resolved Hide resolved
return {id, std::move(out)};
}

SmallBins::CutList SmallBins::ReportStashed(unsigned id, DiskSegment segment) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what's a cutlist? why this name?

Copy link
Contributor Author

@dranikpg dranikpg Apr 2, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll call it KeySegmentList. After a bin was stashed, this list returns where each value is stored on the page (segment)

src/server/tiering/small_bins.h Show resolved Hide resolved
@dranikpg dranikpg marked this pull request as ready for review April 2, 2024 06:57
@dranikpg dranikpg requested a review from romange April 2, 2024 06:57
auto& key = node.key();
auto& value = node.mapped();

// 1. 8 byte key hash
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you interleave hashes and key/values.
Is it how it was in tiered_storage? I think we keep all the hashes together. The motivation behind the separation: to read easily all the hashes from the page when performing defragmentation.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the format was different. The hashes were written first, then all the values with their lengths rounded up because we had separate bins. Will change to all hashes being stored first, but keep in mind this is not a final design

@dranikpg dranikpg requested a review from romange April 4, 2024 07:39
@dranikpg
Copy link
Contributor Author

dranikpg commented Apr 4, 2024

Now the format is: num_entries [hash...] [value_length value...]

@romange
Copy link
Collaborator

romange commented Apr 4, 2024

you need to rebase and fix confllicts.

@dranikpg
Copy link
Contributor Author

dranikpg commented Apr 4, 2024

Actually, I don't store value lengths, fixed accounting 😅

@dranikpg dranikpg requested a review from romange April 5, 2024 06:13
using FilledBin = std::pair<unsigned /* id */, std::string>;

// List of locations of values for corresponding keys
using KeySegmentList = std::vector<std::pair<std::string /* key*/, DiskSegment>>;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

instead of commenting on basic types, I suggest using to declare Key, Id.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also comment on Id that it's unique

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will define Id. I suggest not to define Key = string anywhere because it's overkill, we don't to it anywhere. It's clear from the comments above, inline comments are just for readability

// SIMPLEST VERSION for now.
class SmallBins {
public:
using FilledBin = std::pair<unsigned /* id */, std::string>;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

here actually add a comment that string is a serialized blob, that is at most 4KB but can be less.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's stated above in SmallBins that it fill 4kb pages

FilledBin FlushBin();

private:
unsigned last_bin_id_ = 0;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the implication of this id ? unsigned can overflow, that's why I am asking.
Do you use it to track in flight requests or it's more then that?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now I see. you use it to track in flight ids.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it's only for pending stashes. It can overflow safely, I doubt we can have 2e9 ongoing disk operations

@dranikpg dranikpg force-pushed the tiering-small-bins branch 3 times, most recently from 531c3a3 to dc1a0c1 Compare April 5, 2024 08:15
Signed-off-by: Vladislav Oleshko <vlad@dragonflydb.io>
Signed-off-by: Vladislav Oleshko <vlad@dragonflydb.io>
Signed-off-by: Vladislav Oleshko <vlad@dragonflydb.io>
@dranikpg dranikpg merged commit 5fcd64a into dragonflydb:main Apr 5, 2024
7 checks passed
szinn pushed a commit to szinn/k8s-homelab that referenced this pull request Apr 16, 2024
…nfly ( v1.16.1 → v1.17.0 ) (#3473)

This PR contains the following updates:

| Package | Update | Change |
|---|---|---|
|
[docker.dragonflydb.io/dragonflydb/dragonfly](https://togithub.com/dragonflydb/dragonfly)
| minor | `v1.16.1` -> `v1.17.0` |

---

### Release Notes

<details>
<summary>dragonflydb/dragonfly
(docker.dragonflydb.io/dragonflydb/dragonfly)</summary>

###
[`v1.17.0`](https://togithub.com/dragonflydb/dragonfly/releases/tag/v1.17.0)

[Compare
Source](https://togithub.com/dragonflydb/dragonfly/compare/v1.16.1...v1.17.0)

##### Dragonfly v1.17.0

Some prominent changes include:

- Improved performance for MGET operations
([#&#8203;2453](https://togithub.com/dragonflydb/dragonfly/issues/2453))
- Fix argument parsing in json.objkeys
([#&#8203;2872](https://togithub.com/dragonflydb/dragonfly/issues/2872))
- Fix ipv6 support for replication
([#&#8203;2889](https://togithub.com/dragonflydb/dragonfly/issues/2889))
- Support serialisation of bloom filters - saving to and loading from
snapshots
([#&#8203;2846](https://togithub.com/dragonflydb/dragonfly/issues/2846))
- Support of HLL PFADD
([#&#8203;2761](https://togithub.com/dragonflydb/dragonfly/issues/2761))
- Support bullmq workloads that do not have `{}` hashtags in their queue
names
([#&#8203;2890](https://togithub.com/dragonflydb/dragonfly/issues/2890))

##### What's Changed

- fix:
[#&#8203;2745](https://togithub.com/dragonflydb/dragonfly/issues/2745)
don't start migration process again after apply the same the same config
is applied by [@&#8203;BorysTheDev](https://togithub.com/BorysTheDev) in
[dragonflydb/dragonfly#2822
- feat(transaction): Idempotent callbacks (immediate runs) by
[@&#8203;dranikpg](https://togithub.com/dranikpg) in
[dragonflydb/dragonfly#2453
- refactor(cluster): replace sync_id with node_id for slot migration
[#&#8203;2835](https://togithub.com/dragonflydb/dragonfly/issues/2835)
by [@&#8203;BorysTheDev](https://togithub.com/BorysTheDev) in
[dragonflydb/dragonfly#2838
- feat(tiering): Simple OpManager by
[@&#8203;dranikpg](https://togithub.com/dranikpg) in
[dragonflydb/dragonfly#2781
- chore: implement path mutation for JsonFlat by
[@&#8203;romange](https://togithub.com/romange) in
[dragonflydb/dragonfly#2805
- feat(cluster): add migration removing by config
[#&#8203;2835](https://togithub.com/dragonflydb/dragonfly/issues/2835)
by [@&#8203;BorysTheDev](https://togithub.com/BorysTheDev) in
[dragonflydb/dragonfly#2844
- chore: expose direct API on Bloom objects by
[@&#8203;romange](https://togithub.com/romange) in
[dragonflydb/dragonfly#2845
- chore: generalize CompactObject::AllocateMR by
[@&#8203;kostasrim](https://togithub.com/kostasrim) in
[dragonflydb/dragonfly#2847
- feat(tiering): Simplest small bins by
[@&#8203;dranikpg](https://togithub.com/dranikpg) in
[dragonflydb/dragonfly#2810
- refactor: clean cluster slot migration code by
[@&#8203;BorysTheDev](https://togithub.com/BorysTheDev) in
[dragonflydb/dragonfly#2848
- fix(tests): Fix numsub test by
[@&#8203;dranikpg](https://togithub.com/dranikpg) in
[dragonflydb/dragonfly#2852
- fix: healthcheck for docker containers by
[@&#8203;romange](https://togithub.com/romange) in
[dragonflydb/dragonfly#2853
- fix: possible crash in tls code by
[@&#8203;romange](https://togithub.com/romange) in
[dragonflydb/dragonfly#2854
- fix(server): Do not block admin-port commands by
[@&#8203;chakaz](https://togithub.com/chakaz) in
[dragonflydb/dragonfly#2842
- fix(pytest): make pytests fail if server crash on shutdown by
[@&#8203;adiholden](https://togithub.com/adiholden) in
[dragonflydb/dragonfly#2827
- feat(server): add prints on takeover timeout by
[@&#8203;adiholden](https://togithub.com/adiholden) in
[dragonflydb/dragonfly#2856
- fix(pytest): dont check process return code on kill by
[@&#8203;adiholden](https://togithub.com/adiholden) in
[dragonflydb/dragonfly#2862
- fix: authorize the http connection to call commands by
[@&#8203;romange](https://togithub.com/romange) in
[dragonflydb/dragonfly#2863
- feat(cluster): Send number of keys for incoming and outgoing
migrations. by [@&#8203;chakaz](https://togithub.com/chakaz) in
[dragonflydb/dragonfly#2858
- feat(tiering): TieredStorageV2 by
[@&#8203;dranikpg](https://togithub.com/dranikpg) in
[dragonflydb/dragonfly#2849
- bug(server): set connection flags block/pause flag on all blocking
commands by [@&#8203;adiholden](https://togithub.com/adiholden) in
[dragonflydb/dragonfly#2816
- chore: serialize SBF by
[@&#8203;romange](https://togithub.com/romange) in
[dragonflydb/dragonfly#2846
- fix: test_replicaof_reject_on_load crash on stop by
[@&#8203;kostasrim](https://togithub.com/kostasrim) in
[dragonflydb/dragonfly#2818
- feat(dbslice): Add self-laundering iterator in `DbSlice` by
[@&#8203;chakaz](https://togithub.com/chakaz) in
[dragonflydb/dragonfly#2815
- chore: License update by
[@&#8203;romange](https://togithub.com/romange) in
[dragonflydb/dragonfly#2767
- fix(acl): incompatibilities with acl load by
[@&#8203;kostasrim](https://togithub.com/kostasrim) in
[dragonflydb/dragonfly#2867
- fix(json): make path optional in json.objkeys by
[@&#8203;kostasrim](https://togithub.com/kostasrim) in
[dragonflydb/dragonfly#2872
- fix: return wrong type errors for SET...GET command by
[@&#8203;romange](https://togithub.com/romange) in
[dragonflydb/dragonfly#2874
- fix(redis replication): remove partial sync flow ,not supported yet by
[@&#8203;adiholden](https://togithub.com/adiholden) in
[dragonflydb/dragonfly#2865
- chore: limit traffic logger only to the main interface by
[@&#8203;romange](https://togithub.com/romange) in
[dragonflydb/dragonfly#2877
- chore: relax repltakeover constraints to only exclude write commands
by [@&#8203;kostasrim](https://togithub.com/kostasrim) in
[dragonflydb/dragonfly#2873
- chore(replayer): Roll back to go1.18 by
[@&#8203;dranikpg](https://togithub.com/dranikpg) in
[dragonflydb/dragonfly#2881
- fix: brpoplpush single shard to wake up blocked transactions by
[@&#8203;kostasrim](https://togithub.com/kostasrim) in
[dragonflydb/dragonfly#2875
- chore: LockTable tracks fingerprints of keys by
[@&#8203;romange](https://togithub.com/romange) in
[dragonflydb/dragonfly#2839
- chore: reject TLS handshake when our listener is plain TCP by
[@&#8203;romange](https://togithub.com/romange) in
[dragonflydb/dragonfly#2882
- Add support for Sparse HLL PFADD by
[@&#8203;azuredream](https://togithub.com/azuredream) in
[dragonflydb/dragonfly#2761
- feat server: bring visibility to script errors by
[@&#8203;adiholden](https://togithub.com/adiholden) in
[dragonflydb/dragonfly#2879
- chore: clean up REPLTAKEOVER flow by
[@&#8203;romange](https://togithub.com/romange) in
[dragonflydb/dragonfly#2887
- chore(tiering): Move files and move kb literal to common by
[@&#8203;dranikpg](https://togithub.com/dranikpg) in
[dragonflydb/dragonfly#2868
- chore(interpreter): Support object replies by
[@&#8203;dranikpg](https://togithub.com/dranikpg) in
[dragonflydb/dragonfly#2885
- fix(ci/helm): Stick to v0.73.0 version of prom operator by
[@&#8203;Pothulapati](https://togithub.com/Pothulapati) in
[dragonflydb/dragonfly#2893
- fix(acl): authentication with UDS socket by
[@&#8203;kostasrim](https://togithub.com/kostasrim) in
[dragonflydb/dragonfly#2895
- feat(cluster): add repeated ACK if an error is happened by
[@&#8203;BorysTheDev](https://togithub.com/BorysTheDev) in
[dragonflydb/dragonfly#2892
- chore(blocking): Remove faulty DCHECK by
[@&#8203;dranikpg](https://togithub.com/dranikpg) in
[dragonflydb/dragonfly#2898
- chore: add a clear link on how to build dragonfly from source by
[@&#8203;romange](https://togithub.com/romange) in
[dragonflydb/dragonfly#2884
- feat(server): Allow configuration of hashtag extraction by
[@&#8203;chakaz](https://togithub.com/chakaz) in
[dragonflydb/dragonfly#2890
- fix: fix build under macos by
[@&#8203;BorysTheDev](https://togithub.com/BorysTheDev) in
[dragonflydb/dragonfly#2901
- fix(cluster_replication): replicate redis cluster node bug fix by
[@&#8203;adiholden](https://togithub.com/adiholden) in
[dragonflydb/dragonfly#2876
- fix(acl): skip http and add check on connection traversals by
[@&#8203;kostasrim](https://togithub.com/kostasrim) in
[dragonflydb/dragonfly#2883
- fix(zset): Better memory consumption calculation by
[@&#8203;chakaz](https://togithub.com/chakaz) in
[dragonflydb/dragonfly#2900
- fix: fix ld for num converting by
[@&#8203;BorysTheDev](https://togithub.com/BorysTheDev) in
[dragonflydb/dragonfly#2902
- chore: add help string for memory_fiberstack_vms_bytes by
[@&#8203;romange](https://togithub.com/romange) in
[dragonflydb/dragonfly#2903
- fix(sanitizers): false positive fail on multi_test::Eval by
[@&#8203;kostasrim](https://togithub.com/kostasrim) in
[dragonflydb/dragonfly#2896
- chore: pull helio and add ipv6 replication test by
[@&#8203;dranikpg](https://togithub.com/dranikpg) in
[dragonflydb/dragonfly#2889
- chore: add ipv6 support for native linux release by
[@&#8203;romange](https://togithub.com/romange) in
[dragonflydb/dragonfly#2908

##### Huge thanks to all the contributors! ❤️

**Full Changelog**:
dragonflydb/dragonfly@v1.16.0...v1.17.0

</details>

<!--renovate-debug:eyJjcmVhdGVkSW5WZXIiOiIzNy4zMDEuNSIsInVwZGF0ZWRJblZlciI6IjM3LjMwMS41IiwidGFyZ2V0QnJhbmNoIjoibWFpbiIsImxhYmVscyI6WyJyZW5vdmF0ZS9jb250YWluZXIiLCJ0eXBlL21pbm9yIl19-->

Co-authored-by: repo-jeeves[bot] <106431701+repo-jeeves[bot]@users.noreply.github.com>
@dranikpg dranikpg deleted the tiering-small-bins branch April 18, 2024 11:42
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants