
feat: add master lsn and journal_executed dcheck in replica via ping #2778

Merged: 16 commits, Apr 1, 2024

Conversation

kostasrim
Contributor

@kostasrim kostasrim commented Mar 26, 2024

resolves #2773

  • add lsn number to journal ping
  • add periodic ping from master to replica in journal
  • add dcheck in replica that journal_executed == lsn
  • add version 4 in dfly version
  • add separate counter for journal_executed that has proper semantics for pinging lsn
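The versioned PING extension in the list above can be sketched roughly as follows; `Op`, `SerializePing`, and the version constants are illustrative stand-ins, not the actual Dragonfly identifiers:

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical sketch: a VER4+ peer receives PING followed by the
// current LSN; older peers receive the bare PING opcode only.
enum class Op : uint8_t { PING = 10 };
constexpr int kVer3 = 3, kVer4 = 4;

inline std::vector<uint64_t> SerializePing(int peer_version, uint64_t lsn) {
  std::vector<uint64_t> out{static_cast<uint64_t>(Op::PING)};
  if (peer_version >= kVer4)  // only VER4+ understands the LSN extension
    out.push_back(lsn);
  return out;
}
```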

@kostasrim kostasrim self-assigned this Mar 26, 2024
@kostasrim
Contributor Author

@adiholden This is just a prototype so don't review -- it needs polishing and some fixing/gluing. I opened this because I have a small question:

There are two options:

  1. Ping at period P on a separate fiber.
  2. Ping at period P when we Record an entry in the journal (that means that if the master is idle there will be no pings, even if the period interval was reached).

I opted for (2) (although it's easy to switch to (1)). The reason is that if the master is idle, then the last recorded entry (or one of the last within 2 seconds) will send a PING LSN journal entry, and since the master won't progress anyway, the last lag will show how close the replica is. The downside is that we won't get continuous updates on our progress on the replica side. Also, note that for (1) we would need n = number_of_flows fibers, whereas with (2) we don't (it flows naturally with stable sync and journal recording).

Do you have any objections to (2)?
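A minimal sketch of option (2), assuming a hypothetical `PeriodicPing` helper invoked from the journal-record path (names are illustrative, not the real class):

```cpp
#include <cassert>
#include <chrono>

// The ping is only *considered* while recording a journal entry, so an
// idle master emits no pings; it fires only once the period elapsed.
class PeriodicPing {
 public:
  explicit PeriodicPing(std::chrono::seconds period) : period_(period) {}

  // Called from the record path; true => emit a PING LSN entry now.
  bool ShouldPing(std::chrono::steady_clock::time_point now) {
    if (now - last_ < period_) return false;
    last_ = now;
    return true;
  }

 private:
  std::chrono::seconds period_;
  std::chrono::steady_clock::time_point last_{};  // epoch => first call pings
};
```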

// TODO remove incrementing lsn on master side otherwise we break takeover
// journal_rec_executed_.fetch_add(1, std::memory_order_relaxed);
if (tx_data->lsn != journal_rec_executed_.load()) {
// TODO LOG
Contributor Author

@adiholden we should not DCHECK here. Lagging should not crash us on debug builds (mentioning this because it was mentioned in the issue)

Collaborator

Why not crash? If we reproduce this in tests or some run, I want to know we have a bug here so we can debug it.
The check is not that the replica is lagging behind the master, but that the count of journal changes differs between master and replica, meaning some data did not reach the replica, or we do not count the journal changes correctly on the replica.

@adiholden
Collaborator

@adiholden This is just a prototype so don't review -- it needs polishing and some fixing/gluing. I opened this because I have a small question:

There are two options:

  1. Ping at period P on a separate fiber.
  2. Ping at period P when we Record an entry in the journal (that means that if the master is idle there will be no pings, even if the period interval was reached).

I opted for (2) (although it's easy to switch to (1)). The reason is that if the master is idle, then the last recorded entry (or one of the last within 2 seconds) will send a PING LSN journal entry, and since the master won't progress anyway, the last lag will show how close the replica is. The downside is that we won't get continuous updates on our progress on the replica side. Also, note that for (1) we would need n = number_of_flows fibers, whereas with (2) we don't (it flows naturally with stable sync and journal recording).

Do you have any objections to (2)?

Option 2 sounds good

@kostasrim
Contributor Author

@adiholden replication tests should fail -- I am chasing two missing LSNs 😛 In the meantime, you can leave comments :)

@kostasrim kostasrim marked this pull request as ready for review March 27, 2024 12:31
@kostasrim kostasrim changed the title feat: add master lag check in replica feat: add master lsn and journal_executed dcheck in replica via ping Mar 27, 2024
} else if (tx_data->opcode == journal::Op::EXEC) {
if (use_multi_shard_exe_sync_) {
records = tx_data->journal_rec_count;
Contributor Author

I will revert this change.

@@ -503,7 +503,8 @@ OpStatus DflyCmd::StartStableSyncInThread(FlowInfo* flow, Context* cntx, EngineS

if (shard != nullptr) {
flow->streamer.reset(new JournalStreamer(sf_->journal(), cntx));
flow->streamer->Start(flow->conn->socket());
const bool should_ping = flow->version == DflyVersion::VER4;
Collaborator

flow->version >= DflyVersion::VER4

@@ -195,7 +195,11 @@ io::Result<journal::ParsedEntry> JournalReader::ReadEntry() {
entry.dbid = dbid_;
entry.opcode = opcode;

if (opcode == journal::Op::PING || opcode == journal::Op::FIN) {
if (opcode == journal::Op::PING) {
SET_OR_UNEXPECT(ReadUInt<uint64_t>(), entry.lsn);
Collaborator

We currently send a PING in the replica takeover flow, so you need to read entry.lsn only if the master version is VER4 or higher.

if (use_multi_shard_exe_sync_) {
InsertTxDataToShardResource(std::move(*tx_data));
} else {
ExecuteTxWithNoShardSync(std::move(*tx_data), cntx);
}
}
journal_rec_executed_.fetch_add(records);
Collaborator

I wrote in the issue that we need to compare with journal_rec_executed_, but after reviewing the code I now understand that we need a different counter, incremented inside NextTxData when reading another entry from the socket.
The reason is that under use_multi_shard_exe_sync_ we accumulate the multi-transaction data and do not execute it immediately, therefore journal_rec_executed_ may hold a different value when a ping arrives in the middle of a multi transaction.

Contributor Author

@kostasrim kostasrim Mar 27, 2024

The reason for this is that under use_multi_shard_exe_sync_ we accumulate the multi-transaction data and do not execute it, therefore you might get a different value in journal_rec_executed_

Exactly. That's why I moved the fetch_add out of execute and put it here, so this doesn't happen. I am not 100% sure this is correct, but for now I am looking at why one of the replication tests triggers the DCHECK (and it's not because of multi). I will take care of this soon :)
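The divergence being discussed can be illustrated with a toy sketch (all names hypothetical): under multi-shard sync, entries of a multi transaction are read and accumulated but not yet executed, so only a counter bumped on read can match the master's LSN at ping time:

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Toy replica: records_read advances as entries come off the socket,
// records_executed only when the accumulated multi-tx actually runs.
struct ToyReplica {
  uint64_t records_read = 0;
  uint64_t records_executed = 0;
  std::vector<int> pending;  // accumulated multi-transaction entries

  void ReadMultiEntry(int entry) {
    ++records_read;            // matches what the master already sent
    pending.push_back(entry);  // ...but execution is deferred
  }
  void ExecutePending() {
    records_executed += pending.size();
    pending.clear();
  }
};
```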

Collaborator

But journal_rec_executed_ was incremented after executing the command on purpose. The replica sends the master this value so that we know the replica lag, and so that we do replica takeover only after the lag is 0, meaning no records remain unexecuted.

Contributor Author

Yes, I understand this, and that's why I said it's probably wrong. I ended up using a separate counter in NextTxData, but we still have some issues :)

NotifyWritten(allow_await);
});
if (with_pings) {
periodic_ping_.MaybePing(allow_await);
Collaborator

So I don't think this flow works if we have a few replicas.
This class is created per replica, but here you increase the LSN and send the journal change to only a single replica.
To make this flow work you should write this code in the journal slice, so that the ping is sent to all registered callbacks.

Contributor Author

I am aware, but what you suggest would also be problematic and a bug. We use Journal::RegisterOnChange in other places, and we do not only register callbacks associated with StableSync; for example, we also register a callback during snapshot (see snapshot.cc:65). So if we were to apply the ping in all callbacks, we would send multiple pings on execution paths that are completely irrelevant. For stable sync, we register one callback per flow, and each flow should periodically ping its LSN.

Collaborator

how was this resolved?

Contributor Author

I am fairly certain that what @adiholden suggests here is a bug. We should NOT send a ping for every registered callback, because some of the registered callbacks are completely irrelevant to StableSync. An example is the snapshot, which does call RegisterOnChange, and you most certainly wouldn't want to ping there.

Each shard on the master has a local JournalSlice. Even if we have fewer shards available on the replica, we still have the master's number of flows. For example:

master(3 shards)  [shard 1] [shard 2] [shard 3]
                   |       /            |
replica(2 shards)  [shard 1]     [shard 2]

Here we have 3 flows, each having its local LSN retrieved from the thread/shard-local JournalSlice. However, on the replica side we track LSNs via a local variable of the flow. So when shard 2 sends its LSN to replica shard 1, it does not overwrite a thread-local on the replica side but a member variable that tracks the LSNs for that flow, and therefore we should be safe.
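The per-flow bookkeeping described above can be sketched with illustrative names: each replica-side flow object keeps its own last-seen LSN as a member, so two flows landing on the same replica shard cannot clobber a shared thread-local:

```cpp
#include <cassert>
#include <cstdint>

// Each master flow streams its own LSN sequence; the replica tracks it
// in a per-flow member rather than a thread- or shard-local variable.
struct FlowLsnTracker {
  uint64_t last_master_lsn = 0;
  void OnPingLsn(uint64_t lsn) { last_master_lsn = lsn; }
};
```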

Collaborator

@kostasrim you resolved this now by sending not the LSN but a local variable you keep track of in the streamer class, which you named total_records_.
With this change we might miss something when trying to understand where our bug in replica takeover is.
We conclude replica takeover when the replica's LSN is the same as the master's.
If we missed sending some data from master to replica but it did get into the journal, we will not see it. If we sent a wrong LSN value when full sync finished, we will not see this.
Also, regarding my suggestion above: it is not a bug; the only thing is that we will sometimes write a ping to the journal when it is not needed, i.e. when snapshotting from the SAVE command.

Collaborator

@adiholden adiholden Mar 31, 2024

journal_rec_executed and tx_reader.NextTxData are on the replica side, and I don't see how they are relevant here. In order to make sure we get all the journal changes on the replica side and that we are in sync with the LSN, we must send the LSN from the master. Because the LSN is saved in the journal slice, we need to write the LSN value held by that class.
I do think that changing the journal slice to write an Op::PING entry with the LSN is the right way. You can ignore the Op::PING in snapshotting, as we do for Op::EXEC and Op::NOOP. I don't see this as mixing irrelevant flows: we want to be able to track the LSN, so we must record it. Whether we want to track it is for the callback writing the sync to decide: no in snapshot, yes in streamer.
This will also simplify the flow in the replica, because you will always increase the LSN on PING on the master side, and you will always increase journal_rec_executed on the replica side.

Contributor Author

@kostasrim kostasrim Mar 31, 2024

I made the changes, but now we trigger the DCHECK almost every time on test_replication_all, which IMO should not happen. So either there is something we are missing or there is a bug.

This will also simplify the flow in the replica, because you will always increase the LSN on PING on the master side, and you will always increase journal_rec_executed on the replica side

We still need a separate variable for this, because journal_rec_executed gets incremented in an interleaved fashion. See also your comment above:

I wrote in the issue that we need to compare with journal_rec_executed_, but after reviewing the code I now understand that we need a different counter, incremented inside NextTxData when reading another entry from the socket.
The reason is that under use_multi_shard_exe_sync_ we accumulate the multi-transaction data and do not execute it immediately, therefore journal_rec_executed_ may hold a different value when a ping arrives in the middle of a multi transaction.

Contributor Author

any ideas?

Collaborator

The changes you did are again sending pings from the streamer and not from the journal slice. This was my first comment in this thread: it will not work, because when we have several replicas you send the ping to only one and increase the lag for all.

Contributor Author

I see. I made the changes, however the DCHECK still triggers. I will investigate. Let me know if you have any ideas as well.

@@ -83,6 +83,10 @@ LSN Journal::GetLsn() const {
return journal_slice.cur_lsn();
}

LSN Journal::PostIncrLsn() {
Collaborator

what's a postincrlsn?

Contributor Author

PostIncrementLsn -> PostIncrLsn

@@ -234,6 +234,7 @@ class DflyShardReplica : public ProtocolClient {
// **executed** records, which might be received interleaved when commands
// run out-of-order on the master instance.
std::atomic_uint64_t journal_rec_executed_ = 0;
std::atomic_uint64_t lsn_ = 0;
Collaborator

why is it atomic?

Contributor Author

It shouldn't be, just like journal_rec_executed and some other atomics. I will patch this, give me a second

Collaborator

please change only lsn_ though.

void JournalStreamer::PeriodicPing::MaybePing(bool allow_await) {
const auto now = std::chrono::system_clock::now();
const auto elapsed = std::chrono::duration_cast<std::chrono::seconds>(now - start_time_);
if (elapsed > kLimit) {
Collaborator

where is kLimit defined?

Contributor Author

It's a static member of JournalStreamer::PeriodicPing.

return AwaitIfWritten();
}
++total_records_;
LOG(INFO) << "TOTAL RECORDS " << total_records_;
Collaborator

should we remove this?

void MaybePing(bool allow_await);
void Start();

static constexpr std::chrono::seconds kLimit{2};
Collaborator

probably better to move it to the .cc file, and give it a better name

void Start();

static constexpr std::chrono::seconds kLimit{2};
friend JournalStreamer;
Collaborator

why do we need a friend here?

Contributor Author

we don't, it was a leftover

if (tx_data->lsn != 0) {
const uint64_t expect = lsn_.load();
const bool is_expected = tx_data->lsn == expect;
LOG(INFO) << "tx_data->lsn=" << tx_data->lsn << " lsn_=" << expect;
Collaborator

@romange romange Mar 28, 2024

If we have this situation, the logs will continuously output this and will fill the volume.
Please use LOG_FIRST_N(..., 1000) for that.

src/server/replica.cc (outdated, resolved)
@romange
Collaborator

romange commented Mar 28, 2024

@kostasrim should I review?

@kostasrim
Contributor Author

@romange yes I think your comments are addressed. Let me know :)

romange previously approved these changes Mar 28, 2024
@@ -65,6 +66,7 @@ struct TransactionReader {
// Stores ongoing multi transaction data.
absl::flat_hash_map<TxId, TransactionData> current_;
bool accumulate_multi_ = false;
int64_t total_ = 0;
Collaborator

is this used?

LOG_FIRST_N(INFO, 10) << "tx_data->lsn=" << tx_data->lsn << " lsn_=" << expect;
DCHECK(is_expected) << "tx_data->lsn=" << tx_data->lsn << "lsn=" << expect;
} else {
journal_rec_executed_.fetch_add(1);
Collaborator

This flow is so confusing. In replica takeover we send Op::PING, but you don't fill entry.lsn, so it is 0 and therefore you end up increasing journal_rec_executed_.

Contributor Author

Yes, it's not ideal. If it's 0, it means the PING came from REPLTAKEOVER. LSNs start at 1, so 0 is used here to denote that the PING is used without the extension; that is, PING 0 == old PING and PING >= 1 == PING LSN.

I also treat PING LSN as separate, which is why these entries don't participate in incrementing journal_rec_executed.
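The sentinel convention described here can be captured in a tiny sketch (illustrative names; per the comment above, LSNs start at 1, so 0 marks a legacy PING):

```cpp
#include <cassert>
#include <cstdint>

enum class PingKind { Legacy, WithLsn };

// lsn == 0 => old-style PING (e.g. from REPLTAKEOVER); nonzero => the
// "PING LSN" extension, which is checked against the replica counter
// rather than counted as an executed journal record.
inline PingKind ClassifyPing(uint64_t lsn) {
  return lsn == 0 ? PingKind::Legacy : PingKind::WithLsn;
}
```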

const uint64_t expect = lsn_;
const bool is_expected = tx_data->lsn == expect;
LOG_FIRST_N(INFO, 10) << "tx_data->lsn=" << tx_data->lsn << " lsn_=" << expect;
DCHECK(is_expected) << "tx_data->lsn=" << tx_data->lsn << "lsn=" << expect;
Collaborator

DCHECK_EQ(tx_data->lsn, lsn_)

if (tx_data->lsn != 0) {
const uint64_t expect = lsn_;
const bool is_expected = tx_data->lsn == expect;
LOG_FIRST_N(INFO, 10) << "tx_data->lsn=" << tx_data->lsn << " lsn_=" << expect;
Collaborator

LOG_IF_EVERY_N(WARNING, tx_data->lsn != lsn_, 1000)

src/server/replica.cc (outdated, resolved)
@@ -54,8 +54,10 @@ void TransactionData::AddEntry(journal::ParsedEntry&& entry) {
opcode = entry.opcode;

switch (entry.opcode) {
case journal::Op::PING:
case journal::Op::LSN:
Contributor Author

But we said we would extend PING. I would have done the same :/

Collaborator

I think that extending PING instead of adding another opcode would be the right way. BUT, going into the code, I understand that the serializer cannot decide whether to add the LSN data to the ping, because the result is used for all registered callbacks, and some of them may support the new format while others may not. E.g. we can have 2 replicas, one with VER4 and the other with VER3. So the serializer used in the journal slice, which first serializes the entry and then calls all the registered callbacks, cannot decide whether to add the LSN data. Therefore introducing a new opcode is the best solution.
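The fan-out constraint behind this decision can be sketched as follows (hypothetical types, not the real JournalSlice): the slice serializes an entry once and hands the same bytes to every registered callback, so it cannot tailor a PING payload per peer version, and a dedicated opcode avoids any per-peer branch:

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <vector>

// Toy fan-out: one serialization, N callbacks, identical bytes for all.
struct SliceSketch {
  std::vector<std::function<void(const std::string&)>> callbacks;

  void Emit(const std::string& serialized_once) {
    for (auto& cb : callbacks) cb(serialized_once);  // same bytes for all
  }
};
```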

Signed-off-by: adi_holden <adi@dragonflydb.io>
Signed-off-by: adi_holden <adi@dragonflydb.io>
Signed-off-by: adi_holden <adi@dragonflydb.io>
romange previously approved these changes Apr 1, 2024
Signed-off-by: adi_holden <adi@dragonflydb.io>
romange previously approved these changes Apr 1, 2024
Signed-off-by: adi_holden <adi@dragonflydb.io>
@adiholden adiholden merged commit b2e2ad6 into main Apr 1, 2024
10 checks passed
@adiholden adiholden deleted the master_lag_check branch April 1, 2024 14:51
szinn pushed a commit to szinn/k8s-homelab that referenced this pull request Apr 3, 2024
…nfly ( v1.15.1 → v1.16.0 ) (#3354)

[@&#8203;romange](https://togithub.com/romange) in
[dragonflydb/dragonfly#2807
- DFLYMIGRATE ACK refactoring by
[@&#8203;BorysTheDev](https://togithub.com/BorysTheDev) in
[dragonflydb/dragonfly#2790
- feat: add master lsn and journal_executed dcheck in replica via ping
by [@&#8203;kostasrim](https://togithub.com/kostasrim) in
[dragonflydb/dragonfly#2778
- fix: correct json response for errors by
[@&#8203;romange](https://togithub.com/romange) in
[dragonflydb/dragonfly#2813
- chore: bloom test - cover corner cases by
[@&#8203;romange](https://togithub.com/romange) in
[dragonflydb/dragonfly#2806
- bug(server): do not write lsn opcode to journal by
[@&#8203;adiholden](https://togithub.com/adiholden) in
[dragonflydb/dragonfly#2814
- chore: Fix build by disabling the tests. by
[@&#8203;romange](https://togithub.com/romange) in
[dragonflydb/dragonfly#2821
- fix(replication): replication with multi shard sync enabled lagging by
[@&#8203;adiholden](https://togithub.com/adiholden) in
[dragonflydb/dragonfly#2823
- fix: io_uring/fibers bug in DnsResolve by
[@&#8203;romange](https://togithub.com/romange) in
[dragonflydb/dragonfly#2825

##### New Contributors

- [@&#8203;manojks1999](https://togithub.com/manojks1999) made their
first contribution in
[dragonflydb/dragonfly#2659
- [@&#8203;fafg](https://togithub.com/fafg) made their first
contribution in
[dragonflydb/dragonfly#2706
- [@&#8203;enjoy-binbin](https://togithub.com/enjoy-binbin) made their
first contribution in
[dragonflydb/dragonfly#2779

##### Huge thanks to all the contributors! ❤️

🇮🇱  🇺🇦

**Full Changelog**:
dragonflydb/dragonfly@v1.15.0...v1.16.0

</details>


Co-authored-by: repo-jeeves[bot] <106431701+repo-jeeves[bot]@users.noreply.github.com>
Closes: Add master Lag check in replica side to check if replica is out of sync (#2773)