
Improve consistency and isolation semantics by adding Context parameter to DB API #2310

Open
2 tasks done
Tracked by #2331
PokIsemaine opened this issue May 14, 2024 · 19 comments
Labels
enhancement

Comments

@PokIsemaine
Contributor

Search before asking

  • I had searched in the issues and found no similar issues.

Motivation

A single DB API operation may issue multiple read operations or make nested calls. These reads do not use a fixed snapshot, so data from different snapshots may be read during one operation, causing inconsistency.
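
A minimal sketch of the problem and the intended fix, with an illustrative function name and key names (not kvrocks's actual API or key encoding): without a fixed snapshot, each Get implicitly uses the latest snapshot, so a concurrent write between two reads can make them observe different versions; pinning one snapshot in ReadOptions makes both reads consistent.

#include <string>
#include <rocksdb/db.h>

void ReadPairWithFixedSnapshot(rocksdb::DB *db, std::string *metadata, std::string *subkey) {
  const rocksdb::Snapshot *snapshot = db->GetSnapshot();
  rocksdb::ReadOptions read_options;
  read_options.snapshot = snapshot;  // both reads now observe the same version of the DB
  db->Get(read_options, "illustrative_metadata_key", metadata);
  db->Get(read_options, "illustrative_sub_key", subkey);
  db->ReleaseSnapshot(snapshot);
}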

Solution

Referring to the existing LatestSnapshot and GetOptions, we can add a Context parameter to each DB API through which a fixed snapshot can be passed.

After a few initial attempts, I found that the change has a ripple effect: several modules need to change their APIs at the same time, which results in a huge PR that is hard to break into smaller ones. I plan to open a draft PR soon to sketch the overall approach, and then gradually refine the changes to each module.

Are you willing to submit a PR?

  • I'm willing to submit a PR!
@PokIsemaine added the enhancement label May 14, 2024
@PokIsemaine
Contributor Author

PokIsemaine commented May 16, 2024

Need some help:

struct Context {
  engine::Storage *storage_ = nullptr;
  // Snapshot taken when the Context is created; every read in the operation uses it.
  const rocksdb::Snapshot *snapshot_ = nullptr;
  // Buffers the operation's own writes so they can be read back before the final Write.
  rocksdb::WriteBatchWithIndex *batch_ = nullptr;

  Context() = default;
  explicit Context(engine::Storage *storage)
      : storage_(storage), snapshot_(storage->GetDB()->GetSnapshot()) {}

  // Returns ReadOptions whose snapshot field is set to snapshot_.
  rocksdb::ReadOptions GetReadOptions();
  const rocksdb::Snapshot *GetSnapShot();
};

This is the general idea: pass a Context into the Database API and use a fixed snapshot for the call, so the snapshot does not change during the entire calling process. When an operation needs to read data it has written itself, we use WriteBatchWithIndex + GetFromBatchAndDB, reading from the batch first and then from the DB at snapshot = ctx.snapshot_.
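
Building on the struct above, a minimal sketch of how a read inside the DB API could go through the Context (the function name and signature are illustrative, not the actual kvrocks API): the command's own buffered writes are consulted first, then the DB is read at the Context's fixed snapshot.

rocksdb::Status GetThroughContext(Context &ctx, const rocksdb::Slice &ns_key, std::string *value) {
  rocksdb::ReadOptions read_options = ctx.GetReadOptions();  // read_options.snapshot == ctx.snapshot_
  if (ctx.batch_ != nullptr) {
    // GetFromBatchAndDB sees the operations buffered in batch_ plus the snapshot view of the DB.
    return ctx.batch_->GetFromBatchAndDB(ctx.storage_->GetDB(), read_options, ns_key, value);
  }
  return ctx.storage_->GetDB()->Get(read_options, ns_key, value);
}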

Most operations currently use WriteBatch. Is there a way to copy the operation sequence from a WriteBatch into ctx's WriteBatchWithIndex? That way I would only need to modify the final Write step in Storage, instead of updating ctx's WriteBatchWithIndex every time the DB API modifies the WriteBatch.

@mapleFU
Member

mapleFU commented May 16, 2024

Most operations currently use WriteBatch. Is there a way to copy the operation sequence from a WriteBatch into ctx's WriteBatchWithIndex? That way I would only need to modify the final Write step in Storage, instead of updating ctx's WriteBatchWithIndex every time the DB API modifies the WriteBatch.

RocksDB has the following class relationships:

WriteBatchBase
WriteBatchWithIndex : WriteBatchBase
WriteBatch : WriteBatchBase

Should we switch to WriteBatchBase in some places? Besides, WriteBatchWithIndex has a WriteBatch* GetWriteBatch() override; interface here.
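
For illustration, a minimal sketch of that relationship (a standalone example, not kvrocks code): code written against WriteBatchBase accepts either a WriteBatch or a WriteBatchWithIndex, and GetWriteBatch() exposes the underlying WriteBatch of an indexed batch, e.g. for the final db->Write().

#include <rocksdb/write_batch.h>
#include <rocksdb/utilities/write_batch_with_index.h>

rocksdb::Status AppendPut(rocksdb::WriteBatchBase *batch, const rocksdb::Slice &key,
                          const rocksdb::Slice &value) {
  return batch->Put(key, value);  // same call whether batch is a WriteBatch or a WriteBatchWithIndex
}

void Example() {
  rocksdb::WriteBatchWithIndex indexed_batch;
  AppendPut(&indexed_batch, "key", "value");
  rocksdb::WriteBatch *raw = indexed_batch.GetWriteBatch();  // underlying WriteBatch for db->Write()
  (void)raw;
}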

@PokIsemaine
Contributor Author

Most operations currently use WriteBatch. Is there a way to copy the operation sequence from a WriteBatch into ctx's WriteBatchWithIndex? That way I would only need to modify the final Write step in Storage, instead of updating ctx's WriteBatchWithIndex every time the DB API modifies the WriteBatch.

Correction: because there may be multiple Writes, the operations should be appended to ctx's WriteBatchWithIndex rather than simply copied.

@mapleFU
Member

mapleFU commented May 16, 2024

@PokIsemaine I've checked that most call sites use GetWriteBatch with a WriteBatchBase; would that be OK for the scenario here?

@PokIsemaine
Contributor Author

I think there are currently two options:

  1. Keep the current WriteBatch, then call WriteBatch::Iterate(&handler) at write time and, following batch_debugger.h, write a WriteBatch::Handler that appends the WriteBatch operations to a WriteBatchWithIndex one by one (sketched below).
  2. Return a WriteBatchWithIndex from GetWriteBatchBase, but I found that WriteBatchWithIndex does not support all WriteBatch operations, such as DeleteRange. Even if we call DeleteRange on the WriteBatch obtained via WriteBatchWithIndex::GetWriteBatch, GetFromBatchAndDB cannot see the effect of that DeleteRange in the batch.

For DeleteRange, perhaps we need to switch to a loop of single Delete calls, but I don't know whether that would have a big performance impact. With the first option, what happens if we only append to the batch but never perform the Write?
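
A minimal sketch of option 1 (the column-family handle map and class name are assumptions for illustration, not the actual kvrocks code): a WriteBatch::Handler that replays each recorded operation into a WriteBatchWithIndex, driven by WriteBatch::Iterate(&handler). Other operation types (Merge, SingleDelete, LogData, ...) would need to be handled the same way.

#include <cstdint>
#include <map>
#include <rocksdb/db.h>
#include <rocksdb/write_batch.h>
#include <rocksdb/utilities/write_batch_with_index.h>

class CopyToIndexedBatchHandler : public rocksdb::WriteBatch::Handler {
 public:
  CopyToIndexedBatchHandler(rocksdb::WriteBatchWithIndex *dest,
                            std::map<uint32_t, rocksdb::ColumnFamilyHandle *> cf_handles)
      : dest_(dest), cf_handles_(std::move(cf_handles)) {}

  rocksdb::Status PutCF(uint32_t cf_id, const rocksdb::Slice &key,
                        const rocksdb::Slice &value) override {
    return dest_->Put(cf_handles_.at(cf_id), key, value);
  }
  rocksdb::Status DeleteCF(uint32_t cf_id, const rocksdb::Slice &key) override {
    return dest_->Delete(cf_handles_.at(cf_id), key);
  }
  rocksdb::Status DeleteRangeCF(uint32_t cf_id, const rocksdb::Slice &begin,
                                const rocksdb::Slice &end) override {
    // WriteBatchWithIndex cannot index a range tombstone, so DeleteRange has to be
    // handled separately (rejected, or expanded into single Deletes).
    return rocksdb::Status::NotSupported("DeleteRange is not indexable");
  }

 private:
  rocksdb::WriteBatchWithIndex *dest_;
  std::map<uint32_t, rocksdb::ColumnFamilyHandle *> cf_handles_;
};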


@PokIsemaine
Contributor Author

PokIsemaine commented May 18, 2024

Some other questions:

  1. What isolation level can we expect if kvrocks requests are processed by multiple threads without using transactions? Serializable, snapshot isolation, or something else?
  2. What resources are specifically protected by LockGuard for write operations?


@mapleFU
Member

mapleFU commented May 18, 2024

(2) LockGuard protects the "keys" of an operation. During a write, it first collects the keys it intends to write and then locks all of them.

(1) is an interesting problem. I think we're aiming for SI (snapshot isolation). That cannot avoid concurrent read-modify-write sequences on the same key, but it would make a single write, or multiple write operations, see the same snapshot.
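
For (2), a schematic sketch of the "collect the keys, then lock them all" pattern (purely illustrative; this is not the actual kvrocks lock manager): keys are hashed onto a fixed pool of mutexes, and a multi-key guard locks the deduplicated, ordered set of buckets so concurrent writers always lock in the same order and cannot deadlock.

#include <functional>
#include <mutex>
#include <set>
#include <string>
#include <vector>

class KeyLockPool {
 public:
  explicit KeyLockPool(size_t n = 1024) : mutexes_(n) {}
  std::mutex &For(const std::string &key) {
    return mutexes_[std::hash<std::string>{}(key) % mutexes_.size()];
  }

 private:
  std::vector<std::mutex> mutexes_;
};

class MultiKeyGuard {
 public:
  MultiKeyGuard(KeyLockPool *pool, const std::vector<std::string> &keys) {
    std::set<std::mutex *> buckets;  // dedupe and order buckets for a consistent locking order
    for (const auto &key : keys) buckets.insert(&pool->For(key));
    for (auto *m : buckets) {
      m->lock();
      locked_.push_back(m);
    }
  }
  ~MultiKeyGuard() {
    for (auto it = locked_.rbegin(); it != locked_.rend(); ++it) (*it)->unlock();
  }

 private:
  std::vector<std::mutex *> locked_;
};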

@git-hulk
Member

git-hulk commented May 23, 2024

For DeleteRange, perhaps we need to switch to a loop of single Delete calls, but I don't know whether that would have a big performance impact. With the first option, what happens if we only append to the batch but never perform the Write?

This should be unacceptable for performance reasons. For example, it might cause Kvrocks to stop serving if it runs the FLUSH[DB|ALL] command with a large number of keys in the DB. To mitigate this, it would be better to disallow the DeleteRange operation in transactions if necessary.

@mapleFU
Member

mapleFU commented May 23, 2024

I also think we should check how we use DeleteRange. It is essentially a performance hack, since its implementation is itself a "hack".

@PokIsemaine
Contributor Author

This should be unacceptable for performance reasons. For example, it might cause Kvrocks to stop serving if it runs the FLUSH[DB|ALL] command with a large number of keys in the DB. To mitigate this, it would be better to disallow the DeleteRange operation in transactions if necessary.

Does this impact occur when db_->Write is executed? What if I just use the WriteBatchWithIndex to record the operations performed but never do a Write?

@mapleFU
Member

mapleFU commented May 23, 2024

https://github.com/facebook/rocksdb/wiki/DeleteRange-Implementation

In short, DeleteRange writes an explicit "range" tombstone, and this affects scans, reads, compaction, and so on.
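
For reference, a minimal sketch of the "for + Delete" alternative mentioned earlier (a standalone example, not kvrocks code): instead of writing one range tombstone, scan the range and record one Delete per key in the indexed batch. This is indexable by GetFromBatchAndDB but touches every key, which is the performance concern for large ranges such as FLUSH[DB|ALL].

#include <memory>
#include <rocksdb/db.h>
#include <rocksdb/utilities/write_batch_with_index.h>

rocksdb::Status DeleteRangeByKeys(rocksdb::DB *db, rocksdb::ColumnFamilyHandle *cf,
                                  const rocksdb::Slice &begin, const rocksdb::Slice &end,
                                  rocksdb::WriteBatchWithIndex *batch) {
  rocksdb::ReadOptions read_options;
  std::unique_ptr<rocksdb::Iterator> iter(db->NewIterator(read_options, cf));
  for (iter->Seek(begin); iter->Valid() && iter->key().compare(end) < 0; iter->Next()) {
    auto s = batch->Delete(cf, iter->key());  // one indexable Delete per existing key
    if (!s.ok()) return s;
  }
  return iter->status();
}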

@PokIsemaine
Contributor Author

I also think we should check how we use DeleteRange. It is essentially a performance hack, since its implementation is itself a "hack".

Currently it mainly affects kvrocks2redis, rdb and some test cases.

@PokIsemaine
Contributor Author

PokIsemaine commented May 23, 2024

Progress update:

  1. Keep the current WriteBatch, then call WriteBatch::Iterate(&handler) at write time and, following batch_debugger.h, write a WriteBatch::Handler that appends the WriteBatch operations to a WriteBatchWithIndex one by one.

I'm trying the first option, but the golang test times out on a specific GitHub Actions version; I'm currently troubleshooting why.
unstable...PokIsemaine:kvrocks:unstable


@git-hulk
Member

@PokIsemaine I'm sorry, I didn't quite understand why we need to care about the WriteBatch. From the issue description, what we want to solve is that nested read operations may use an undetermined snapshot? If so, passing the snapshot via the context is a good solution.

For write operations and transaction mode, the per-key lock should guarantee that the corresponding key won't be changed, so it should be fine to leave that as it is.

Please correct me if I missed anything.

@mapleFU
Member

mapleFU commented May 23, 2024

I'm sorry, I didn't quite understand why we need to care about the WriteBatch. From the issue description, what we want to solve is that nested read operations may use an undetermined snapshot? If so, passing the snapshot via the context is a good solution.

The main issue is framework-level read-after-write, e.g. a command that issues a write and then reads after it. It's hard to ensure the command reads its "own workspace" after it may have written some data.

@git-hulk
Member

I see, thank you. But I can't remember any commands that write and then read, except Lua and transactions.

@mapleFU
Member

mapleFU commented May 23, 2024

@git-hulk @PokIsemaine I think for now we can first reject "DeleteRange" and check that the other commands work?

IMO DeleteRange is mostly used in the background. Maybe I can recheck the logic that uses "DeleteRange" in commands.

@PragmaTwice
Member

PragmaTwice commented May 23, 2024

Yeah the current conversation goes beyond the issue title. I think it's related to this project: https://summer-ospp.ac.cn/org/prodetail/249430512?lang=en&list=pro
We can have separate issues for it.

@PokIsemaine changed the title from "Read the determined rocksdb snapshot by passing the Context parameter" to "Improve consistency and isolation semantics by adding Context parameter to DB API" on May 25, 2024
@PokIsemaine
Contributor Author

Yeah the current conversation goes beyond the issue title. I think it's related to this project: https://summer-ospp.ac.cn/org/prodetail/249430512?lang=en&list=pro We can have separate issues for it.

Yes, we are discussing this project. I changed the title of this issue and grouped it under a new OSPP tracking issue.
