chore(replication-tests): add cache_mode on test replication all #2685

Merged
merged 9 commits into from
Mar 27, 2024

Conversation

kostasrim
Contributor

@kostasrim kostasrim commented Mar 4, 2024

  • add cache_mode cases to test_replication_all
  • fix CVCOnBump to not skip some of the modified buckets

The regression tests added here uncovered a corner case in CVCOnBump: two buckets with versions below the threshold were modified, but only one bucket id was returned, causing inconsistent data between the replica and master. This was fixed by properly returning the number of modified buckets to the caller.
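The corner case described above can be illustrated with a toy model. This is a minimal sketch, not Dragonfly's C++ code: the function names, the dictionary of bucket versions, and the bucket ids are all hypothetical, and it assumes only what the description states, namely that every modified bucket whose version is below the threshold must be reported to the caller.

```python
from typing import List, Mapping, Sequence


def first_stale_bucket(
    candidates: Sequence[int], versions: Mapping[int, int], ver_threshold: int
) -> List[int]:
    """Buggy variant (hypothetical): stops at the first bucket below the threshold."""
    for bid in candidates:
        if versions[bid] < ver_threshold:
            return [bid]
    return []


def all_stale_buckets(
    candidates: Sequence[int], versions: Mapping[int, int], ver_threshold: int
) -> List[int]:
    """Fixed variant (hypothetical): reports every bucket below the threshold."""
    return [bid for bid in candidates if versions[bid] < ver_threshold]


# Two of the three touched buckets are older than the threshold.
bucket_versions = {3: 1, 4: 2, 60: 9}
touched = [3, 4, 60]
only_first = first_stale_bucket(touched, bucket_versions, ver_threshold=5)
every_one = all_stale_buckets(touched, bucket_versions, ver_threshold=5)
```

In this model the buggy variant reports one bucket while two were modified, which is the kind of gap that would let a change slip past a consumer relying on the returned ids.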

@kostasrim kostasrim self-assigned this Mar 4, 2024
async def check():
    await check_all_replicas_finished(c_replicas, c_master)
    hashes = await asyncio.gather(*(SeederV2.capture(c) for c in [c_master] + c_replicas))
    assert len(set(hashes)) == 1
Contributor Author

@kostasrim kostasrim Mar 4, 2024

  1. This is amazing 💯 (thank you @dranikpg 🙏 )

  2. @adiholden We have an issue: this is flaky with cache mode and fails from time to time, which means that some replica is not on par with the master when we evict (cache mode on). I will investigate.

Contributor Author

Also (3): check could be a general function; this is a nice idiom :)
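One way the check idiom could be generalized is sketched below. This is an assumption-laden sketch rather than code from the test suite: `assert_same_state` and `fake_capture` are hypothetical names, and `fake_capture` merely stands in for `SeederV2.capture` so the example runs without a server.

```python
import asyncio
from typing import Awaitable, Callable, Hashable, Sequence, TypeVar

Client = TypeVar("Client")


async def assert_same_state(
    capture: Callable[[Client], Awaitable[Hashable]],
    clients: Sequence[Client],
) -> None:
    """Capture every client's state concurrently and require them all to match."""
    states = await asyncio.gather(*(capture(c) for c in clients))
    assert len(set(states)) == 1, f"instances diverged: {states}"


async def fake_capture(client: dict) -> str:
    # Hypothetical stand-in for SeederV2.capture: returns a precomputed digest.
    return client["digest"]


async def main() -> None:
    master = {"digest": "abc"}
    replicas = [{"digest": "abc"}, {"digest": "abc"}]
    await assert_same_state(fake_capture, [master] + replicas)


if __name__ == "__main__":
    asyncio.run(main())
```

Passing the capture function as a parameter keeps the helper independent of any particular seeder, at the cost of requiring the captured values to be hashable.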

Contributor

  1. Thanks 😊
  2. Hmmm.. shouldn't be the case, because we wait for replication lag to catch up

Contributor Author

For (2) I suspect we have a bug in cache_mode

@kostasrim kostasrim requested a review from adiholden March 4, 2024 15:12
dranikpg
dranikpg previously approved these changes Mar 4, 2024
@kostasrim
Contributor Author

@dranikpg it passes locally as well, but it requires increasing the async_timeout from 20 seconds to 80 seconds (it still fails with a 60-second timeout). The question is: is this acceptable? We are looking at 4x slower replication in cache mode for the stress test (pytest.param(8, [8, 8], dict(key_target=1_000_000, units=16), 50_000, marks=M_STRESS)). cc @adiholden

@dranikpg
Contributor

dranikpg commented Mar 7, 2024

We can skip the stress test, or we can divide the workload by 2, 3, or 4 if we see that we're running in cache mode. I'll let you decide 🤷🏻
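The second option, shrinking the workload when cache mode is on, might look roughly like the sketch below. The helper name `scale_workload` and the `divisor` parameter are hypothetical; only the `key_target` and `units` keys come from the stress-test parameters quoted earlier in the thread, and the divisor of 4 is simply one of the values suggested.

```python
from typing import Dict


def scale_workload(seeder_config: Dict[str, int], cache_mode: bool, divisor: int = 4) -> Dict[str, int]:
    """Return a copy of the seeder config with key_target reduced in cache mode."""
    scaled = dict(seeder_config)
    if cache_mode:
        # Never scale below one key, so the seeder still has work to do.
        scaled["key_target"] = max(1, seeder_config["key_target"] // divisor)
    return scaled


stress_config = dict(key_target=1_000_000, units=16)
regular = scale_workload(stress_config, cache_mode=False)
reduced = scale_workload(stress_config, cache_mode=True)
```

Returning a copy leaves the parametrized config untouched, so the same `pytest.param` entry can be reused for both modes.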

@kostasrim kostasrim requested a review from dranikpg March 11, 2024 09:03
@@ -1506,46 +1506,22 @@ template <bool UV>
 std::enable_if_t<UV, unsigned> Segment<Key, Value, Policy>::CVCOnBump(uint64_t ver_threshold,
                                                                       unsigned bid, unsigned slot,
                                                                       Hash_t hash,
-                                                                      uint8_t result_bid[2]) const {
+                                                                      uint8_t result_bid[3]) const {
Contributor Author

The problem now is that we might send more buckets than we actually changed, because there are fast paths in Segment::BumpUp. For example, if we find empty space, we just unload the stash entry and leave the rest of the buckets untouched. Since this was broken before, this is a small price to pay for now, but we should improve it. I am still not 100% familiar with the dash code, but I will investigate how we can patch this and make it better.

@kostasrim kostasrim requested a review from romange March 21, 2024 15:18
// 2. If there is empty space in target or probe bucket insert the slot there and remove
// it from the stash bucket.
// 3. If there is no empty space then we need swap slots with either the target or the probe
// bucket. Furthermore, we might clear the stash bucket so in total all the 3 buckets
Collaborator

Looks good!!!
"Furthermore, we might clear the stash bucket": why might we clear the stash bucket?

Contributor Author

Because we call target.UnsetStashPtr which internally calls ClearStash

Collaborator

ok, ClearStash clears a single entry inside a stash bucket, so maybe we might remove an entry from the stash bucket, so...

Contributor Author

@kostasrim kostasrim Mar 26, 2024

Well, technically we do not remove an entry (slot) via ClearStash but rather clear the stash pointer of the bucket we are about to swap (since insertions after displacement end up in the stash buckets). I slightly reworded.
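The two cases from the code comment under review (steps 2 and 3) can be modeled as a small decision function. This is only a toy sketch under stated assumptions: it is not the dash implementation, `touched_buckets` and its parameters are hypothetical, step 1 is not shown in the excerpt and is therefore left out, and the slow path conservatively reports all three buckets because the comment says all three may end up modified.

```python
from typing import Set


def touched_buckets(
    target_has_space: bool,
    probe_has_space: bool,
    target_bid: int,
    probe_bid: int,
    stash_bid: int,
) -> Set[int]:
    """Toy model of which buckets a bump-up from a stash bucket may modify."""
    if target_has_space:
        # Step 2: the slot moves into the target bucket and leaves the stash bucket.
        return {target_bid, stash_bid}
    if probe_has_space:
        # Step 2: same move, but into the probe bucket.
        return {probe_bid, stash_bid}
    # Step 3: a swap is needed; report all three buckets to stay on the safe side.
    return {target_bid, probe_bid, stash_bid}


fast_target = touched_buckets(True, False, target_bid=4, probe_bid=5, stash_bid=60)
fast_probe = touched_buckets(False, True, target_bid=4, probe_bid=5, stash_bid=60)
slow_path = touched_buckets(False, False, target_bid=4, probe_bid=5, stash_bid=60)
```

Distinguishing the paths like this is one possible way to avoid reporting buckets that a fast path never touched, which is the over-reporting concern raised earlier in the thread.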

// and both of them have version < version_threshold. If we don't return them both
// then full sync phase during replication might fail to capture the changes leading
// to data inconsistencies between master and replica.
EXPECT_EQ(touched_bid[1], 0);
Collaborator

Thanks!

@romange
Collaborator

romange commented Mar 24, 2024

@kostasrim is this pr ready ?

@kostasrim
Contributor Author

@romange for some reason I received 0 notifications for this PR. Yes it's ready!

@kostasrim kostasrim requested a review from romange March 26, 2024 08:52
@romange romange removed their request for review March 26, 2024 09:08
@kostasrim kostasrim requested a review from romange March 26, 2024 14:33
@kostasrim kostasrim merged commit cd20c40 into main Mar 27, 2024
10 checks passed
@kostasrim kostasrim deleted the add_cache_mode_to_replication_test_all branch March 27, 2024 12:28
szinn pushed a commit to szinn/k8s-homelab that referenced this pull request Apr 3, 2024
…nfly ( v1.15.1 → v1.16.0 ) (#3354)

This PR contains the following updates:

| Package | Update | Change |
|---|---|---|
| [docker.dragonflydb.io/dragonflydb/dragonfly](https://togithub.com/dragonflydb/dragonfly) | minor | `v1.15.1` -> `v1.16.0` |

---

### Release Notes

<details>
<summary>dragonflydb/dragonfly (docker.dragonflydb.io/dragonflydb/dragonfly)</summary>

### [`v1.16.0`](https://togithub.com/dragonflydb/dragonfly/releases/tag/v1.16.0)

[Compare Source](https://togithub.com/dragonflydb/dragonfly/compare/v1.15.1...v1.16.0)

##### Dragonfly v1.16.0

Our spring release. We are getting closer to 2.0 with some very exciting
features ahead. Stay tuned!

Some prominent changes include:

- Improved memory accounting of client connections ([#2710](https://togithub.com/dragonflydb/dragonfly/issues/2710), [#2755](https://togithub.com/dragonflydb/dragonfly/issues/2755) and [#2692](https://togithub.com/dragonflydb/dragonfly/issues/2692))
- FT.AGGREGATE call ([#2413](https://togithub.com/dragonflydb/dragonfly/issues/2413))
- Properly handle and replicate Memcache flags ([#2787](https://togithub.com/dragonflydb/dragonfly/issues/2787), [#2807](https://togithub.com/dragonflydb/dragonfly/issues/2807))
- Introduce BF.AGGREGATE BF.(M)ADD and BF.(M)EXISTS methods ([#2801](https://togithub.com/dragonflydb/dragonfly/issues/2801)). Note that it does not work with snapshots and replication yet.
- Dragonfly builds natively on MacOS. We would love some help with extending the release pipeline to create a proper macos binary.
- Following the requests from the Edge developers community, we added basic HTTP API support! Try running Dragonfly with the `--expose_http_api` flag and then call `curl -X POST -d '["ping"]' localhost:6379/api`. We will follow up with more extensive docs later this month.
- Lots of stability fixes, especially around Sidekiq and BullMQ workloads.

##### What's Changed

- chore: make usan asan optional and enable them on CI by [@kostasrim](https://togithub.com/kostasrim) in dragonflydb/dragonfly#2631
- fix: missing manual trigger for daily sanitizers by [@kostasrim](https://togithub.com/kostasrim) in dragonflydb/dragonfly#2682
- bug(tiering): fix overflow in page offset calculation and wrong hash offset calculation by [@theyueli](https://togithub.com/theyueli) in dragonflydb/dragonfly#2683
- Chore: Fixed Docker Health Check by [@manojks1999](https://togithub.com/manojks1999) in dragonflydb/dragonfly#2659
- chore: Increase disk space in the coverage runs by [@romange](https://togithub.com/romange) in dragonflydb/dragonfly#2684
- fix(flushall): Decommit memory after releasing tables. by [@chakaz](https://togithub.com/chakaz) in dragonflydb/dragonfly#2691
- feat(server): Account for serializer's temporary buffer size by [@chakaz](https://togithub.com/chakaz) in dragonflydb/dragonfly#2689
- refactor: remove FULL-SYNC-CUT cmd [#2687](https://togithub.com/dragonflydb/dragonfly/issues/2687) by [@BorysTheDev](https://togithub.com/BorysTheDev) in dragonflydb/dragonfly#2688
- chore: add malloc-based stats and decommit by [@romange](https://togithub.com/romange) in dragonflydb/dragonfly#2692
- feat(cluster): automatic slot migration finalization [#2697](https://togithub.com/dragonflydb/dragonfly/issues/2697) by [@BorysTheDev](https://togithub.com/BorysTheDev) in dragonflydb/dragonfly#2698
- Basic FT.AGGREGATE by [@dranikpg](https://togithub.com/dranikpg) in dragonflydb/dragonfly#2413
- chore: little transaction cleanup by [@dranikpg](https://togithub.com/dranikpg) in dragonflydb/dragonfly#2608
- fix(channel store): add acquire/release pair in fast update path by [@dranikpg](https://togithub.com/dranikpg) in dragonflydb/dragonfly#2704
- chore: add ubuntu22 devcontainer by [@romange](https://togithub.com/romange) in dragonflydb/dragonfly#2700
- feat(cluster): Add `--cluster_id` flag by [@chakaz](https://togithub.com/chakaz) in dragonflydb/dragonfly#2695
- feat(server): Use mimalloc in SSL calls by [@chakaz](https://togithub.com/chakaz) in dragonflydb/dragonfly#2710
- chore: Pull helio with new BlockingCounter by [@dranikpg](https://togithub.com/dranikpg) in dragonflydb/dragonfly#2711
- chore(transaction): Simplify PollExecution by [@dranikpg](https://togithub.com/dranikpg) in dragonflydb/dragonfly#2712
- chore(transaction): Don't call GetLocalMask from blocking controller by [@dranikpg](https://togithub.com/dranikpg) in dragonflydb/dragonfly#2715
- chore: improve compatibility of EXPIRE functions with Redis by [@romange](https://togithub.com/romange) in dragonflydb/dragonfly#2696
- chore: disable flaky fuzzy migration test by [@kostasrim](https://togithub.com/kostasrim) in dragonflydb/dragonfly#2716
- chore: update sanitizers workflow by [@kostasrim](https://togithub.com/kostasrim) in dragonflydb/dragonfly#2686
- chore: Use c-ares for resolving hosts in `ProtocolClient` by [@chakaz](https://togithub.com/chakaz) in dragonflydb/dragonfly#2719
- Remove check-fail in ExpireIfNeeded and introduce DFLY LOAD by [@romange](https://togithub.com/romange) in dragonflydb/dragonfly#2699
- chore: Record cmd stat from invoke by [@dranikpg](https://togithub.com/dranikpg) in dragonflydb/dragonfly#2720
- fix(transaction): nullptr access on non transactional commands by [@kostasrim](https://togithub.com/kostasrim) in dragonflydb/dragonfly#2724
- fix(BgSave): async from sync by [@kostasrim](https://togithub.com/kostasrim) in dragonflydb/dragonfly#2702
- chore: remove core/fibers by [@romange](https://togithub.com/romange) in dragonflydb/dragonfly#2723
- fix(transaction): Replace with armed sync point by [@dranikpg](https://togithub.com/dranikpg) in dragonflydb/dragonfly#2708
- feat(json): Deserialize ReJSON format by [@chakaz](https://togithub.com/chakaz) in dragonflydb/dragonfly#2725
- feat: add flag masteruser by [@kostasrim](https://togithub.com/kostasrim) in dragonflydb/dragonfly#2693
- refactor: block list refactoring [#2580](https://togithub.com/dragonflydb/dragonfly/issues/2580) by [@BorysTheDev](https://togithub.com/BorysTheDev) in dragonflydb/dragonfly#2732
- chore: fix DeduceExecMode by [@dranikpg](https://togithub.com/dranikpg) in dragonflydb/dragonfly#2733
- fix(cluster): Reply with correct `\n` / `\r\n` from `CLUSTER` sub cmd by [@chakaz](https://togithub.com/chakaz) in dragonflydb/dragonfly#2731
- chore: Introduce fiber stack allocator by [@romange](https://togithub.com/romange) in dragonflydb/dragonfly#2730
- fix(cluster): Save replica ID per replica by [@chakaz](https://togithub.com/chakaz) in dragonflydb/dragonfly#2735
- fix(ssl): Proper cleanup by [@chakaz](https://togithub.com/chakaz) in dragonflydb/dragonfly#2742
- chore: add skeleton files for flat_dfs code by [@romange](https://togithub.com/romange) in dragonflydb/dragonfly#2738
- chore: better error reporting when connecting to tls with plain socket by [@romange](https://togithub.com/romange) in dragonflydb/dragonfly#2740
- chore: Support json paths without root selector by [@dranikpg](https://togithub.com/dranikpg) in dragonflydb/dragonfly#2747
- chore: journal cleanup by [@romange](https://togithub.com/romange) in dragonflydb/dragonfly#2749
- refactor: remove start-slot-migration cmd [#2727](https://togithub.com/dragonflydb/dragonfly/issues/2727) by [@BorysTheDev](https://togithub.com/BorysTheDev) in dragonflydb/dragonfly#2728
- feat(server): Add TLS usage to /metrics and `INFO MEMORY` by [@chakaz](https://togithub.com/chakaz) in dragonflydb/dragonfly#2755
- chore: preparations for adding flat json support by [@romange](https://togithub.com/romange) in dragonflydb/dragonfly#2752
- chore(transaction): Introduce RunCallback by [@dranikpg](https://togithub.com/dranikpg) in dragonflydb/dragonfly#2760
- feat(replication): Do not auto replicate different master by [@chakaz](https://togithub.com/chakaz) in dragonflydb/dragonfly#2753
- Improve Helm chart to be rendered locally and on machines where is not the application target by [@fafg](https://togithub.com/fafg) in dragonflydb/dragonfly#2706
- chore: preparation for basic http api by [@romange](https://togithub.com/romange) in dragonflydb/dragonfly#2764
- feat(server): Add metric for RDB save duration. by [@chakaz](https://togithub.com/chakaz) in dragonflydb/dragonfly#2768
- chore: fix flat_dfs read tests. by [@romange](https://togithub.com/romange) in dragonflydb/dragonfly#2772
- fix(ci): do not overwrite last_log_file among tests by [@kostasrim](https://togithub.com/kostasrim) in dragonflydb/dragonfly#2759
- feat(server): support cluster replication by [@adiholden](https://togithub.com/adiholden) in dragonflydb/dragonfly#2748
- fix: fiber preempts on read path and OnCbFinish() clears fetched_items\_ by [@kostasrim](https://togithub.com/kostasrim) in dragonflydb/dragonfly#2763
- chore(ci): open last_log_file in append mode by [@kostasrim](https://togithub.com/kostasrim) in dragonflydb/dragonfly#2776
- doc(README): fix outdated expiry ranges description by [@enjoy-binbin](https://togithub.com/enjoy-binbin) in dragonflydb/dragonfly#2779
- Benchmark runner by [@adiholden](https://togithub.com/adiholden) in dragonflydb/dragonfly#2780
- chore(replication-tests): add cache_mode on test replication all by [@kostasrim](https://togithub.com/kostasrim) in dragonflydb/dragonfly#2685
- feat(tiering): DiskStorage by [@dranikpg](https://togithub.com/dranikpg) in dragonflydb/dragonfly#2770
- chore: add a boilerplate for bloom filter family by [@romange](https://togithub.com/romange) in dragonflydb/dragonfly#2782
- chore: introduce conversion routines between JsonType and FlatJson by [@romange](https://togithub.com/romange) in dragonflydb/dragonfly#2785
- chore: Fix memcached flags not updated by [@dranikpg](https://togithub.com/dranikpg) in dragonflydb/dragonfly#2787
- chore: remove duplicate code from dash and simplify by [@kostasrim](https://togithub.com/kostasrim) in dragonflydb/dragonfly#2765
- chore: disable test_cluster_slot_migration by [@kostasrim](https://togithub.com/kostasrim) in dragonflydb/dragonfly#2788
- fix: new\[] delete\[] missmatch in disk_storage by [@kostasrim](https://togithub.com/kostasrim) in dragonflydb/dragonfly#2792
- fix: sanitizers clang build and clean up some warnings by [@kostasrim](https://togithub.com/kostasrim) in dragonflydb/dragonfly#2793
- chore: add bloom filter class by [@romange](https://togithub.com/romange) in dragonflydb/dragonfly#2791
- chore: add SBF data structure by [@romange](https://togithub.com/romange) in dragonflydb/dragonfly#2795
- chore(tiering): Disable compilation for MacOs by [@dranikpg](https://togithub.com/dranikpg) in dragonflydb/dragonfly#2799
- chore: fix daily build by [@romange](https://togithub.com/romange) in dragonflydb/dragonfly#2798
- chore: expose SBF via compact_object by [@romange](https://togithub.com/romange) in dragonflydb/dragonfly#2797
- fix(ci): malloc trim on sanitizers workflow by [@kostasrim](https://togithub.com/kostasrim) in dragonflydb/dragonfly#2794
- fix(cluster): Don't miss updates in FLUSHSLOTS by [@chakaz](https://togithub.com/chakaz) in dragonflydb/dragonfly#2783
- feat: add bf.(m)add and bf.(m)exists commands by [@romange](https://togithub.com/romange) in dragonflydb/dragonfly#2801
- fix: SBF memory leaks by [@kostasrim](https://togithub.com/kostasrim) in dragonflydb/dragonfly#2803
- chore: refactor StringFamily::Set to use CmdArgParser by [@romange](https://togithub.com/romange) in dragonflydb/dragonfly#2800
- fix: propagate memcached flags to replica by [@romange](https://togithub.com/romange) in dragonflydb/dragonfly#2807
- DFLYMIGRATE ACK refactoring by [@BorysTheDev](https://togithub.com/BorysTheDev) in dragonflydb/dragonfly#2790
- feat: add master lsn and journal_executed dcheck in replica via ping by [@kostasrim](https://togithub.com/kostasrim) in dragonflydb/dragonfly#2778
- fix: correct json response for errors by [@romange](https://togithub.com/romange) in dragonflydb/dragonfly#2813
- chore: bloom test - cover corner cases by [@romange](https://togithub.com/romange) in dragonflydb/dragonfly#2806
- bug(server): do not write lsn opcode to journal by [@adiholden](https://togithub.com/adiholden) in dragonflydb/dragonfly#2814
- chore: Fix build by disabling the tests. by [@romange](https://togithub.com/romange) in dragonflydb/dragonfly#2821
- fix(replication): replication with multi shard sync enabled lagging by [@adiholden](https://togithub.com/adiholden) in dragonflydb/dragonfly#2823
- fix: io_uring/fibers bug in DnsResolve by [@romange](https://togithub.com/romange) in dragonflydb/dragonfly#2825

##### New Contributors

- [@manojks1999](https://togithub.com/manojks1999) made their first contribution in dragonflydb/dragonfly#2659
- [@fafg](https://togithub.com/fafg) made their first contribution in dragonflydb/dragonfly#2706
- [@enjoy-binbin](https://togithub.com/enjoy-binbin) made their first contribution in dragonflydb/dragonfly#2779

##### Huge thanks to all the contributors! ❤️

🇮🇱  🇺🇦

**Full Changelog**:
dragonflydb/dragonfly@v1.15.0...v1.16.0

</details>

Co-authored-by: repo-jeeves[bot] <106431701+repo-jeeves[bot]@users.noreply.github.com>