fix(transaction): Replace with armed sync point #2708

Merged (9 commits) on Mar 14, 2024

@dranikpg (Contributor) commented Mar 10, 2024

  1. Replaces run_barrier as the synchronization point with is_armed plus an embedded blocking counter for awaiting running jobs.
  2. Replaces the IsArmedInShard + GetLocalMask + is_armed.exchange chain with a single DisarmInShard() / DisarmInShardWhen() call.
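A minimal sketch of the idea in point 1, written with standard C++ threads rather than Dragonfly's fibers. The coordinator sets a counter to the number of shards and arms each shard. Every poller tries to disarm its shard with a single atomic exchange, and only the winner runs the callback and decrements the counter. The coordinator blocks until the counter drains. `SimpleBlockingCounter`, `ShardSlot`, `HopCoordinator`, `RunHop` and `PollShard` are hypothetical names, and a mutex plus condition variable stands in for the embedded blocking counter.

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <functional>
#include <mutex>
#include <thread>
#include <vector>

// Stand-in for a fiber-aware blocking counter: Wait() returns once the count reaches zero.
class SimpleBlockingCounter {
 public:
  void Add(std::size_t n) {
    std::lock_guard<std::mutex> lk(mu_);
    count_ += n;
  }
  void Dec() {
    std::lock_guard<std::mutex> lk(mu_);
    assert(count_ > 0);
    if (--count_ == 0) cv_.notify_all();
  }
  void Wait() {
    std::unique_lock<std::mutex> lk(mu_);
    cv_.wait(lk, [this] { return count_ == 0; });
  }

 private:
  std::mutex mu_;
  std::condition_variable cv_;
  std::size_t count_ = 0;
};

// Per-shard sync point for one hop: armed by the coordinator, disarmed by exactly one poller.
struct ShardSlot {
  std::atomic<bool> is_armed{false};
};

class HopCoordinator {
 public:
  explicit HopCoordinator(std::size_t shards) : slots_(shards) {}

  // Shard side: run cb only if this call is the one that flips armed -> disarmed.
  bool PollShard(std::size_t sid, const std::function<void()>& cb) {
    if (!slots_[sid].is_armed.exchange(false, std::memory_order_acq_rel))
      return false;  // not armed, or another poller already took this hop
    cb();
    running_.Dec();  // this shard's part of the hop is done
    return true;
  }

  // Coordinator side: count the jobs, arm every shard, then block until all of them finished.
  void RunHop(const std::function<void(std::size_t)>& shard_cb) {
    running_.Add(slots_.size());  // set the counter before any shard can observe "armed"
    for (auto& s : slots_) s.is_armed.store(true, std::memory_order_release);

    std::vector<std::thread> pollers;
    for (std::size_t sid = 0; sid < slots_.size(); ++sid) {
      for (int i = 0; i < 2; ++i) {  // two pollers race per shard; only one wins
        pollers.emplace_back([this, sid, &shard_cb] { PollShard(sid, [&] { shard_cb(sid); }); });
      }
    }
    running_.Wait();
    for (auto& t : pollers) t.join();
  }

 private:
  std::vector<ShardSlot> slots_;
  SimpleBlockingCounter running_;
};
```

For example, `HopCoordinator hc(4); hc.RunHop(cb);` invokes `cb` exactly once per shard even though two pollers race for each one. The losing poller simply sees the slot already disarmed.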

@@ -1091,7 +1061,7 @@ Transaction::RunnableResult Transaction::RunQuickie(EngineShard* shard) {
void Transaction::ExpireBlocking(WaitKeysProvider wcb) {
DCHECK(!IsGlobal());
DVLOG(1) << "ExpireBlocking " << DebugId();
run_barrier_.Start(unique_shard_cnt_);
run_barrier_.Add(unique_shard_cnt_);
Collaborator

In your embedded blocking counter, please add a Set or Start function that actually DCHECKs that the previous value is 0.
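The difference the reviewer asks for can be sketched as follows: `Start(n)` asserts that the previous round has fully drained, while `Add(n)` silently accumulates. This `EmbeddedCounter` is a hypothetical illustration built on `std::atomic`, not helio's `BlockingCounter`. Here `assert` stands in for glog's `DCHECK`, and the blocking `Wait()` is omitted.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical counter sketch: only the Start/Add/Dec accounting is modeled, not waiting.
class EmbeddedCounter {
 public:
  // Begin a new round of n jobs. The previous round must have fully drained.
  void Start(uint32_t n) {
    uint32_t prev = count_.exchange(n, std::memory_order_acq_rel);
    assert(prev == 0 && "Start() called while jobs from a previous round are still running");
    (void)prev;  // avoid an unused-variable warning when NDEBUG disables assert
  }

  // Add n jobs on top of whatever is already pending; no invariant is checked.
  void Add(uint32_t n) { count_.fetch_add(n, std::memory_order_acq_rel); }

  // Mark one job as done; returns true if it was the last pending job.
  bool Dec() {
    uint32_t prev = count_.fetch_sub(1, std::memory_order_acq_rel);
    assert(prev > 0);
    return prev == 1;
  }

  uint32_t DEBUG_Count() const { return count_.load(std::memory_order_relaxed); }

 private:
  std::atomic<uint32_t> count_{0};
};
```

With `Start()` in place of `Add()`, a count left over from a previous hop becomes an immediate debug failure instead of a silent over-count.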

bool Transaction::DisarmInShard(ShardId sid, uint16_t relevant_flags) {
auto& sd = shard_data_[SidToId(sid)];

if (relevant_flags) {
Collaborator

Seems like you combine two different functions in one.

Contributor Author

I want to simplify the interface: IsArmedInShard and GetLocalMask are never used separately. We disarm either unconditionally (in the queue) or conditionally (awakened, out of order).
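To make the argument concrete, here is a sketch of the unconditional variant. Because the check and the clear happen in one atomic exchange, exactly one caller observes "was armed" for each arming. Callers therefore no longer need a separate `IsArmedInShard()` query before disarming. `PerShard` and `CountWinners` are hypothetical, and the function takes the per-shard record directly rather than a `ShardId`.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Hypothetical per-shard record; only the armed flag matters for this sketch.
struct PerShard {
  std::atomic<bool> is_armed{false};
};

// Unconditional disarm: check and clear happen in one atomic exchange, so exactly one
// caller observes "was armed" for a given arming, no matter how many race for it.
bool DisarmInShard(PerShard& sd) {
  return sd.is_armed.exchange(false, std::memory_order_acq_rel);
}

// Races `pollers` threads on the same slot and counts how many of them won the disarm.
int CountWinners(PerShard& sd, int pollers) {
  std::atomic<int> winners{0};
  std::vector<std::thread> threads;
  for (int i = 0; i < pollers; ++i) {
    threads.emplace_back([&sd, &winners] {
      if (DisarmInShard(sd)) winners.fetch_add(1, std::memory_order_relaxed);
    });
  }
  for (auto& t : threads) t.join();
  return winners.load();
}
```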

@@ -994,6 +961,11 @@ void Transaction::ExecuteAsync() {
});
}

void Transaction::FinishHop() {
boost::intrusive_ptr<Transaction> guard(this);
Collaborator

could be nice to add hop_index_ for debugging.

Contributor Author

To shard stats?

Collaborator

No, for example to use it in Debug_FailedState, or in general to have it available in case we need to debug. Not for prod.

@dranikpg requested a review from romange March 12, 2024 09:07
@dranikpg marked this pull request as ready for review March 12, 2024 09:22
// Check if the caller was handled by a previous poll.
if (trans && !trans->IsArmedInShard(sid))
return;
uint16_t local_mask = trans ? trans->DisarmInShardWhen(sid, Transaction::AWAKED_Q) : 0;
Collaborator

"local" here does not add much value; maybe call it trans_mask instead?

DCHECK(trans != head);
CHECK(trans->DisarmInShard(sid));
Collaborator

Don't you already disarm it at line 451?

Contributor Author

We fetch the local mask, but we disarm only if AWAKED was present. See the comment on DisarmWhen.

The other option is disarming if OUT_OF_ORDER or AWAKED is present and then checking head == trans || head->Disarm, which is also not that elegant.

The third option is just reading LocalMask and disarming separately, like we did before, but that is not elegant either and grows the outside interface, where we only read the local mask in order to disarm.

Collaborator

Oh, I looked at the semantics of DisarmInShardWhen and I find them a bit confusing. You return a mask if we are armed, but you disarm only if the mask matches the argument. And now we need to track somehow whether we disarmed the state or not based on the mask returned. I don't have a good suggestion; I'm just noting it was confusing to follow. Maybe we should add some comments at line 451.

Contributor Author

> And now we need to track somehow whether we disarmed the state or not based on the mask returned

Well, we enter the if block if we disarmed; otherwise we assume we didn't. Will add comments.
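The semantics discussed in this thread, with the kind of comments the author promises, can be sketched as follows. Based on the discussion, the sketch assumes that `DisarmInShardWhen()` returns 0 when the transaction is not armed in the shard and otherwise returns the local mask. It also assumes the armed flag is cleared only when that mask intersects the requested flags. The caller then re-tests the returned mask to learn whether it performed the disarm. `PerShardData`, the numeric flag values and `TryRunAwakened()` are hypothetical. The variable is named `trans_mask` as suggested in the review.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical flag values; the real constants live in Dragonfly's Transaction.
constexpr uint16_t AWAKED_Q = 1 << 0;
constexpr uint16_t OUT_OF_ORDER = 1 << 1;

struct PerShardData {
  std::atomic<bool> is_armed{false};
  uint16_t local_mask = 0;  // assumed to be modified only by the owning shard thread
};

// Returns 0 if not armed (or if another caller disarmed first). Otherwise returns the local
// mask, and disarms only when the mask intersects relevant_flags. Callers therefore test
// (result & relevant_flags) to learn whether they performed the disarm.
uint16_t DisarmInShardWhen(PerShardData& sd, uint16_t relevant_flags) {
  if (!sd.is_armed.load(std::memory_order_acquire)) return 0;
  uint16_t mask = sd.local_mask;
  if ((mask & relevant_flags) == 0) return mask;  // armed, but not for a reason we handle
  if (!sd.is_armed.exchange(false, std::memory_order_acq_rel)) return 0;  // lost the race
  return mask;
}

// Caller pattern: enter the block only if this call disarmed the shard.
bool TryRunAwakened(PerShardData& sd, int& runs) {
  uint16_t trans_mask = DisarmInShardWhen(sd, AWAKED_Q);
  if (trans_mask & AWAKED_Q) {
    ++runs;  // we disarmed, so we own this hop in this shard
    return true;
  }
  return false;  // not armed, lost the race, or armed without AWAKED_Q
}
```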

@@ -1023,7 +996,7 @@ void Transaction::FIX_ConcludeJournalExec() {
string Transaction::DEBUG_PrintFailState(ShardId sid) const {
auto res = StrCat(
"usc: ", unique_shard_cnt_, ", name:", GetCId()->name(),
", usecnt:", use_count_.load(memory_order_relaxed), ", runcnt: ", run_barrier_.DEBUG_Count(),
", usecnt:", use_count_.load(memory_order_relaxed), ", runcnt: ", 0,
Collaborator

runcnt 0

Contributor Author

Good catch
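One way to address the "runcnt 0" remark, and to fold in the earlier hop_index_ suggestion, is to have the debug dump read live values instead of printing a constant. `DebugTx` and its output format are hypothetical, loosely modeled on the `DEBUG_PrintFailState()` fragment in this hunk. In Dragonfly the run count would presumably come from the embedded blocking counter.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <sstream>
#include <string>

// Hypothetical subset of transaction state that a fail-state dump might print.
struct DebugTx {
  uint32_t unique_shard_cnt = 0;
  std::atomic<uint32_t> use_count{1};
  std::atomic<uint32_t> run_count{0};  // live value of the hop's blocking counter
  uint32_t hop_index = 0;              // debug-only: number of finished hops

  // Called at the end of each hop (e.g. from a FinishHop()-like method).
  void OnHopFinished() { ++hop_index; }

  std::string PrintFailState() const {
    std::ostringstream os;
    os << "usc: " << unique_shard_cnt
       << ", usecnt: " << use_count.load(std::memory_order_relaxed)
       << ", runcnt: " << run_count.load(std::memory_order_relaxed)  // read, not hard-coded 0
       << ", hop: " << hop_index;
    return os.str();
  }
};
```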

@dranikpg requested a review from romange March 14, 2024 13:31
Co-authored-by: Roman Gershman <romange@gmail.com>
Signed-off-by: Vladislav <vladislav.oleshko@gmail.com>
@dranikpg enabled auto-merge (squash) March 14, 2024 14:24
@dranikpg merged commit 9c6e6a9 into dragonflydb:main Mar 14, 2024
7 checks passed
@dranikpg deleted the tx-new-sync branch March 28, 2024 21:00
szinn pushed a commit to szinn/k8s-homelab that referenced this pull request Apr 3, 2024
…nfly ( v1.15.1 → v1.16.0 ) (#3354)

This PR contains the following updates:

| Package | Update | Change |
|---|---|---|
| [docker.dragonflydb.io/dragonflydb/dragonfly](https://togithub.com/dragonflydb/dragonfly) | minor | `v1.15.1` -> `v1.16.0` |

---

### Release Notes

<details>
<summary>dragonflydb/dragonfly
(docker.dragonflydb.io/dragonflydb/dragonfly)</summary>

### [`v1.16.0`](https://togithub.com/dragonflydb/dragonfly/releases/tag/v1.16.0)

[Compare
Source](https://togithub.com/dragonflydb/dragonfly/compare/v1.15.1...v1.16.0)

##### Dragonfly v1.16.0

Our spring release. We are getting closer to 2.0 with some very exciting
features ahead. Stay tuned!

Some prominent changes include:

- Improved memory accounting of client connections
([#&#8203;2710](https://togithub.com/dragonflydb/dragonfly/issues/2710)
[#&#8203;2755](https://togithub.com/dragonflydb/dragonfly/issues/2755)
and
[#&#8203;2692](https://togithub.com/dragonflydb/dragonfly/issues/2692) )
- FT.AGGREGATE call
([#&#8203;2413](https://togithub.com/dragonflydb/dragonfly/issues/2413))
- Properly handle and replicate Memcache flags
([#&#8203;2787](https://togithub.com/dragonflydb/dragonfly/issues/2787)
[#&#8203;2807](https://togithub.com/dragonflydb/dragonfly/issues/2807))
- Introduce BF.AGGREGATE, BF.(M)ADD and BF.(M)EXISTS methods
([#&#8203;2801](https://togithub.com/dragonflydb/dragonfly/issues/2801)).
Note that it does not work with snapshots and replication yet.
- Dragonfly builds natively on macOS. We would love some help with
extending the release pipeline to create a proper macOS binary.
- Following requests from the Edge developer community, we added
basic HTTP API support! Try running Dragonfly with the
`--expose_http_api` flag and then call
`curl -X POST -d '["ping"]' localhost:6379/api`. We will follow up
with more extensive docs later this month.
- Lots of stability fixes, especially around Sidekiq and BullMQ
workloads.

##### What's Changed

- chore: make usan asan optional and enable them on CI by
[@&#8203;kostasrim](https://togithub.com/kostasrim) in
dragonflydb/dragonfly#2631
- fix: missing manual trigger for daily sanitizers by
[@&#8203;kostasrim](https://togithub.com/kostasrim) in
dragonflydb/dragonfly#2682
- bug(tiering): fix overflow in page offset calculation and wrong hash
offset calculation by [@&#8203;theyueli](https://togithub.com/theyueli)
in
dragonflydb/dragonfly#2683
- Chore: Fixed Docker Health Check by
[@&#8203;manojks1999](https://togithub.com/manojks1999) in
dragonflydb/dragonfly#2659
- chore: Increase disk space in the coverage runs by
[@&#8203;romange](https://togithub.com/romange) in
dragonflydb/dragonfly#2684
- fix(flushall): Decommit memory after releasing tables. by
[@&#8203;chakaz](https://togithub.com/chakaz) in
dragonflydb/dragonfly#2691
- feat(server): Account for serializer's temporary buffer size by
[@&#8203;chakaz](https://togithub.com/chakaz) in
dragonflydb/dragonfly#2689
- refactor: remove FULL-SYNC-CUT cmd
[#&#8203;2687](https://togithub.com/dragonflydb/dragonfly/issues/2687)
by [@&#8203;BorysTheDev](https://togithub.com/BorysTheDev) in
dragonflydb/dragonfly#2688
- chore: add malloc-based stats and decommit by
[@&#8203;romange](https://togithub.com/romange) in
dragonflydb/dragonfly#2692
- feat(cluster): automatic slot migration finalization
[#&#8203;2697](https://togithub.com/dragonflydb/dragonfly/issues/2697)
by [@&#8203;BorysTheDev](https://togithub.com/BorysTheDev) in
dragonflydb/dragonfly#2698
- Basic FT.AGGREGATE by
[@&#8203;dranikpg](https://togithub.com/dranikpg) in
dragonflydb/dragonfly#2413
- chore: little transaction cleanup by
[@&#8203;dranikpg](https://togithub.com/dranikpg) in
dragonflydb/dragonfly#2608
- fix(channel store): add acquire/release pair in fast update path by
[@&#8203;dranikpg](https://togithub.com/dranikpg) in
dragonflydb/dragonfly#2704
- chore: add ubuntu22 devcontainer by
[@&#8203;romange](https://togithub.com/romange) in
dragonflydb/dragonfly#2700
- feat(cluster): Add `--cluster_id` flag by
[@&#8203;chakaz](https://togithub.com/chakaz) in
dragonflydb/dragonfly#2695
- feat(server): Use mimalloc in SSL calls by
[@&#8203;chakaz](https://togithub.com/chakaz) in
dragonflydb/dragonfly#2710
- chore: Pull helio with new BlockingCounter by
[@&#8203;dranikpg](https://togithub.com/dranikpg) in
dragonflydb/dragonfly#2711
- chore(transaction): Simplify PollExecution by
[@&#8203;dranikpg](https://togithub.com/dranikpg) in
dragonflydb/dragonfly#2712
- chore(transaction): Don't call GetLocalMask from blocking controller
by [@&#8203;dranikpg](https://togithub.com/dranikpg) in
dragonflydb/dragonfly#2715
- chore: improve compatibility of EXPIRE functions with Redis by
[@&#8203;romange](https://togithub.com/romange) in
dragonflydb/dragonfly#2696
- chore: disable flaky fuzzy migration test by
[@&#8203;kostasrim](https://togithub.com/kostasrim) in
dragonflydb/dragonfly#2716
- chore: update sanitizers workflow by
[@&#8203;kostasrim](https://togithub.com/kostasrim) in
dragonflydb/dragonfly#2686
- chore: Use c-ares for resolving hosts in `ProtocolClient` by
[@&#8203;chakaz](https://togithub.com/chakaz) in
dragonflydb/dragonfly#2719
- Remove check-fail in ExpireIfNeeded and introduce DFLY LOAD by
[@&#8203;romange](https://togithub.com/romange) in
dragonflydb/dragonfly#2699
- chore: Record cmd stat from invoke by
[@&#8203;dranikpg](https://togithub.com/dranikpg) in
dragonflydb/dragonfly#2720
- fix(transaction): nullptr access on non transactional commands by
[@&#8203;kostasrim](https://togithub.com/kostasrim) in
dragonflydb/dragonfly#2724
- fix(BgSave): async from sync by
[@&#8203;kostasrim](https://togithub.com/kostasrim) in
dragonflydb/dragonfly#2702
- chore: remove core/fibers by
[@&#8203;romange](https://togithub.com/romange) in
dragonflydb/dragonfly#2723
- fix(transaction): Replace with armed sync point by
[@&#8203;dranikpg](https://togithub.com/dranikpg) in
dragonflydb/dragonfly#2708
- feat(json): Deserialize ReJSON format by
[@&#8203;chakaz](https://togithub.com/chakaz) in
dragonflydb/dragonfly#2725
- feat: add flag masteruser by
[@&#8203;kostasrim](https://togithub.com/kostasrim) in
dragonflydb/dragonfly#2693
- refactor: block list refactoring
[#&#8203;2580](https://togithub.com/dragonflydb/dragonfly/issues/2580)
by [@&#8203;BorysTheDev](https://togithub.com/BorysTheDev) in
dragonflydb/dragonfly#2732
- chore: fix DeduceExecMode by
[@&#8203;dranikpg](https://togithub.com/dranikpg) in
dragonflydb/dragonfly#2733
- fix(cluster): Reply with correct `\n` / `\r\n` from `CLUSTER` sub cmd
by [@&#8203;chakaz](https://togithub.com/chakaz) in
dragonflydb/dragonfly#2731
- chore: Introduce fiber stack allocator by
[@&#8203;romange](https://togithub.com/romange) in
dragonflydb/dragonfly#2730
- fix(cluster): Save replica ID per replica by
[@&#8203;chakaz](https://togithub.com/chakaz) in
dragonflydb/dragonfly#2735
- fix(ssl): Proper cleanup by
[@&#8203;chakaz](https://togithub.com/chakaz) in
dragonflydb/dragonfly#2742
- chore: add skeleton files for flat_dfs code by
[@&#8203;romange](https://togithub.com/romange) in
dragonflydb/dragonfly#2738
- chore: better error reporting when connecting to tls with plain socket
by [@&#8203;romange](https://togithub.com/romange) in
dragonflydb/dragonfly#2740
- chore: Support json paths without root selector by
[@&#8203;dranikpg](https://togithub.com/dranikpg) in
dragonflydb/dragonfly#2747
- chore: journal cleanup by
[@&#8203;romange](https://togithub.com/romange) in
dragonflydb/dragonfly#2749
- refactor: remove start-slot-migration cmd
[#&#8203;2727](https://togithub.com/dragonflydb/dragonfly/issues/2727)
by [@&#8203;BorysTheDev](https://togithub.com/BorysTheDev) in
dragonflydb/dragonfly#2728
- feat(server): Add TLS usage to /metrics and `INFO MEMORY` by
[@&#8203;chakaz](https://togithub.com/chakaz) in
dragonflydb/dragonfly#2755
- chore: preparations for adding flat json support by
[@&#8203;romange](https://togithub.com/romange) in
dragonflydb/dragonfly#2752
- chore(transaction): Introduce RunCallback by
[@&#8203;dranikpg](https://togithub.com/dranikpg) in
dragonflydb/dragonfly#2760
- feat(replication): Do not auto replicate different master by
[@&#8203;chakaz](https://togithub.com/chakaz) in
dragonflydb/dragonfly#2753
- Improve Helm chart to be rendered locally and on machines where is not
the application target by [@&#8203;fafg](https://togithub.com/fafg) in
dragonflydb/dragonfly#2706
- chore: preparation for basic http api by
[@&#8203;romange](https://togithub.com/romange) in
dragonflydb/dragonfly#2764
- feat(server): Add metric for RDB save duration. by
[@&#8203;chakaz](https://togithub.com/chakaz) in
dragonflydb/dragonfly#2768
- chore: fix flat_dfs read tests. by
[@&#8203;romange](https://togithub.com/romange) in
dragonflydb/dragonfly#2772
- fix(ci): do not overwrite last_log_file among tests by
[@&#8203;kostasrim](https://togithub.com/kostasrim) in
dragonflydb/dragonfly#2759
- feat(server): support cluster replication by
[@&#8203;adiholden](https://togithub.com/adiholden) in
dragonflydb/dragonfly#2748
- fix: fiber preempts on read path and OnCbFinish() clears
fetched_items\_ by [@&#8203;kostasrim](https://togithub.com/kostasrim)
in
dragonflydb/dragonfly#2763
- chore(ci): open last_log_file in append mode by
[@&#8203;kostasrim](https://togithub.com/kostasrim) in
dragonflydb/dragonfly#2776
- doc(README): fix outdated expiry ranges description by
[@&#8203;enjoy-binbin](https://togithub.com/enjoy-binbin) in
dragonflydb/dragonfly#2779
- Benchmark runner by
[@&#8203;adiholden](https://togithub.com/adiholden) in
dragonflydb/dragonfly#2780
- chore(replication-tests): add cache_mode on test replication all by
[@&#8203;kostasrim](https://togithub.com/kostasrim) in
dragonflydb/dragonfly#2685
- feat(tiering): DiskStorage by
[@&#8203;dranikpg](https://togithub.com/dranikpg) in
dragonflydb/dragonfly#2770
- chore: add a boilerplate for bloom filter family by
[@&#8203;romange](https://togithub.com/romange) in
dragonflydb/dragonfly#2782
- chore: introduce conversion routines between JsonType and FlatJson by
[@&#8203;romange](https://togithub.com/romange) in
dragonflydb/dragonfly#2785
- chore: Fix memcached flags not updated by
[@&#8203;dranikpg](https://togithub.com/dranikpg) in
dragonflydb/dragonfly#2787
- chore: remove duplicate code from dash and simplify by
[@&#8203;kostasrim](https://togithub.com/kostasrim) in
dragonflydb/dragonfly#2765
- chore: disable test_cluster_slot_migration by
[@&#8203;kostasrim](https://togithub.com/kostasrim) in
dragonflydb/dragonfly#2788
- fix: new\[] delete\[] missmatch in disk_storage by
[@&#8203;kostasrim](https://togithub.com/kostasrim) in
dragonflydb/dragonfly#2792
- fix: sanitizers clang build and clean up some warnings by
[@&#8203;kostasrim](https://togithub.com/kostasrim) in
dragonflydb/dragonfly#2793
- chore: add bloom filter class by
[@&#8203;romange](https://togithub.com/romange) in
dragonflydb/dragonfly#2791
- chore: add SBF data structure by
[@&#8203;romange](https://togithub.com/romange) in
dragonflydb/dragonfly#2795
- chore(tiering): Disable compilation for MacOs by
[@&#8203;dranikpg](https://togithub.com/dranikpg) in
dragonflydb/dragonfly#2799
- chore: fix daily build by
[@&#8203;romange](https://togithub.com/romange) in
dragonflydb/dragonfly#2798
- chore: expose SBF via compact_object by
[@&#8203;romange](https://togithub.com/romange) in
dragonflydb/dragonfly#2797
- fix(ci): malloc trim on sanitizers workflow by
[@&#8203;kostasrim](https://togithub.com/kostasrim) in
dragonflydb/dragonfly#2794
- fix(cluster): Don't miss updates in FLUSHSLOTS by
[@&#8203;chakaz](https://togithub.com/chakaz) in
dragonflydb/dragonfly#2783
- feat: add bf.(m)add and bf.(m)exists commands by
[@&#8203;romange](https://togithub.com/romange) in
dragonflydb/dragonfly#2801
- fix: SBF memory leaks by
[@&#8203;kostasrim](https://togithub.com/kostasrim) in
dragonflydb/dragonfly#2803
- chore: refactor StringFamily::Set to use CmdArgParser by
[@&#8203;romange](https://togithub.com/romange) in
dragonflydb/dragonfly#2800
- fix: propagate memcached flags to replica by
[@&#8203;romange](https://togithub.com/romange) in
dragonflydb/dragonfly#2807
- DFLYMIGRATE ACK refactoring by
[@&#8203;BorysTheDev](https://togithub.com/BorysTheDev) in
dragonflydb/dragonfly#2790
- feat: add master lsn and journal_executed dcheck in replica via ping
by [@&#8203;kostasrim](https://togithub.com/kostasrim) in
dragonflydb/dragonfly#2778
- fix: correct json response for errors by
[@&#8203;romange](https://togithub.com/romange) in
dragonflydb/dragonfly#2813
- chore: bloom test - cover corner cases by
[@&#8203;romange](https://togithub.com/romange) in
dragonflydb/dragonfly#2806
- bug(server): do not write lsn opcode to journal by
[@&#8203;adiholden](https://togithub.com/adiholden) in
dragonflydb/dragonfly#2814
- chore: Fix build by disabling the tests. by
[@&#8203;romange](https://togithub.com/romange) in
dragonflydb/dragonfly#2821
- fix(replication): replication with multi shard sync enabled lagging by
[@&#8203;adiholden](https://togithub.com/adiholden) in
dragonflydb/dragonfly#2823
- fix: io_uring/fibers bug in DnsResolve by
[@&#8203;romange](https://togithub.com/romange) in
dragonflydb/dragonfly#2825

##### New Contributors

- [@&#8203;manojks1999](https://togithub.com/manojks1999) made their
first contribution in
dragonflydb/dragonfly#2659
- [@&#8203;fafg](https://togithub.com/fafg) made their first
contribution in
dragonflydb/dragonfly#2706
- [@&#8203;enjoy-binbin](https://togithub.com/enjoy-binbin) made their
first contribution in
dragonflydb/dragonfly#2779

##### Huge thanks to all the contributors! ❤️

🇮🇱  🇺🇦

**Full Changelog**:
dragonflydb/dragonfly@v1.15.0...v1.16.0

</details>


Co-authored-by: repo-jeeves[bot] <106431701+repo-jeeves[bot]@users.noreply.github.com>
Development

Successfully merging this pull request may close these issues.

check of whether a tx is ARMED in a shard can be wrong for multi-hop operations.
2 participants