In-memory engine: wrong-total in jepsen test #17018

Open
Tracked by #16141
SpadeA-Tang opened this issue May 15, 2024 · 2 comments
Labels
jepsen type/bug Type: Issue - Confirmed a bug

Comments

@SpadeA-Tang
Member

Bug Report

What version of TiKV are you using?

Master

What operating system and CPU are you using?

Steps to reproduce

What did you expect?

Jepsen pass

What happened?

{:latency-graph {:valid? true},
  :rate-graph {:valid? true},
  :valid? true},
 :workload
 {:SI
  {:valid? false,
   :read-count 142404,
   :error-count 1,
   :first-error
   {:type :wrong-total,
    :total 61,
    :op
    {:type :ok,
     :f :read,
     :process 1,
     :time 38284047629,
     :value {0 12, 1 25, 2 10, 3 7, 6 5, 7 2},
     :txn-info {:start_ts 449754326792667154},
     :index 33719}},
   :errors
   {:wrong-total
    {:count 1,
     :first
     {:type :wrong-total,
      :total 61,
      :op
      {:type :ok,
       :f :read,
       :process 1,
       :time 38284047629,
       :value {0 12, 1 25, 2 10, 3 7, 6 5, 7 2},
       :txn-info {:start_ts 449754326792667154},
       :index 33719}},
     :worst
     {:type :wrong-total,
      :total 61,
      :op
      {:type :ok,
       :f :read,
       :process 1,
       :time 38284047629,
       :value {0 12, 1 25, 2 10, 3 7, 6 5, 7 2},
       :txn-info {:start_ts 449754326792667154},
       :index 33719}},
     :last
     {:type :wrong-total,
      :total 61,
      :op
      {:type :ok,
       :f :read,
       :process 1,
       :time 38284047629,
       :value {0 12, 1 25, 2 10, 3 7, 6 5, 7 2},
       :txn-info {:start_ts 449754326792667154},
       :index 33719}},
     :lowest
     {:type :wrong-total,
      :total 61,
      :op
      {:type :ok,
       :f :read,
       :process 1,
       :time 38284047629,
       :value {0 12, 1 25, 2 10, 3 7, 6 5, 7 2},
       :txn-info {:start_ts 449754326792667154},
       :index 33719}},
     :highest
     {:type :wrong-total,
      :total 61,
      :op
      {:type :ok,
       :f :read,
       :process 1,
       :time 38284047629,
       :value {0 12, 1 25, 2 10, 3 7, 6 5, 7 2},
       :txn-info {:start_ts 449754326792667154},
       :index 33719}}}}},
  :plot {:valid? true},
  :valid? false},
 :valid? false}
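
As a quick sanity check on the output above, the reported `:total` can be recomputed from the `:value` map of the failing read. The snippet below is only an illustrative sketch: the balances are copied from the report, and `read_total` is a hypothetical helper, not part of Jepsen. The total the workload expects is not shown in the output, so it is not asserted here.

```python
# Balances copied from the :value map of the failing read above
# (account id -> balance).
balances = {0: 12, 1: 25, 2: 10, 3: 7, 6: 5, 7: 2}


def read_total(balances):
    """Sum the per-account balances observed by a single read."""
    return sum(balances.values())


total = read_total(balances)
print(total)  # prints 61, matching :total in the report
```

The sum agrees with `:total 61`, so the checker's arithmetic is consistent with the read; the anomaly is in what the read observed, not in how it was summed.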
@SpadeA-Tang SpadeA-Tang added the type/bug Type: Issue - Confirmed a bug label May 15, 2024
@SpadeA-Tang SpadeA-Tang self-assigned this May 15, 2024
@SpadeA-Tang
Member Author

I have found some bugs and missing pieces that lead to this error:

  1. Deleting the lock directly violates write atomicity. We can let the write complete first and then delete the lock.
  2. Missing handling for remove peer, merge, and apply snapshot: all of these cases should evict the relevant regions.
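
To illustrate the first point, here is a minimal sketch of why the order of "delete lock" and "put write" matters to a concurrent reader. It is a toy model under stated assumptions, not the in-memory engine's code: `Store`, `read`, `commit_steps`, and `observe_midway` are hypothetical names, and the two dictionaries only loosely stand in for the lock and write column families.

```python
class Store:
    """Hypothetical two-column-family store (illustrative only)."""

    def __init__(self):
        self.lock = {}   # user_key -> start_ts of the pending transaction
        self.write = {}  # user_key -> list of (commit_ts, value), oldest first


def read(store, key, read_ts):
    """Return ("locked", None) if a pending lock blocks the read, otherwise
    ("ok", newest value committed at or before read_ts)."""
    if key in store.lock and store.lock[key] <= read_ts:
        return ("locked", None)
    visible = [v for ts, v in store.write.get(key, []) if ts <= read_ts]
    return ("ok", visible[-1] if visible else None)


def commit_steps(store, key, commit_ts, value, lock_first):
    """Apply the two commit steps one at a time, yielding after each step so
    that a reader can be interleaved between them."""
    def delete_lock():
        store.lock.pop(key, None)

    def put_write():
        store.write.setdefault(key, []).append((commit_ts, value))

    steps = [delete_lock, put_write] if lock_first else [put_write, delete_lock]
    for step in steps:
        step()
        yield


def observe_midway(lock_first):
    """Read the key after only the first commit step has been applied."""
    store = Store()
    store.write["acct"] = [(5, 10)]  # old committed balance
    store.lock["acct"] = 8           # pending transaction with start_ts 8
    steps = commit_steps(store, "acct", commit_ts=9, value=7, lock_first=lock_first)
    next(steps)                      # run exactly one step
    return read(store, "acct", read_ts=20)


print(observe_midway(lock_first=True))   # ('ok', 10): stale balance, no lock seen
print(observe_midway(lock_first=False))  # ('locked', None): reader must wait or retry
```

In this model, removing the lock first lets the reader return the old balance even though the commit is already under way, which is the kind of partial view that can produce a wrong total. Completing the write before deleting the lock leaves the reader with either the lock or the new value.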

@SpadeA-Tang
Member Author

In addition, using the same seqno for the whole batch also leads to this type of error. If a put lock and a delete lock for the same user key are in the same batch, the delete will be hidden by the put.
The fix is to increment the seqno for each key, just as RocksDB does.
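
The effect can be sketched with a toy ordered list of internal entries. This is a simplified model, not the engine's implementation: it assumes entries sort by ascending user key, then descending seqno, then descending type tag (modeled on RocksDB's internal key ordering, where a value is tagged 1 and a deletion 0). `apply_batch`, `get`, and `sort_key` are hypothetical helpers.

```python
PUT, DELETE = 1, 0  # assumption: type tags modeled on RocksDB (value = 1, deletion = 0)


def sort_key(entry):
    user_key, seqno, kind, _ = entry
    # Ascending user key, then descending seqno, then descending type tag.
    return (user_key, -seqno, -kind)


def apply_batch(ops, start_seqno, per_key_seqno):
    """Turn a batch of (kind, user_key, value) ops into sorted internal entries.

    With per_key_seqno=False every op shares start_seqno; with True the seqno
    is incremented after each op."""
    entries = []
    seqno = start_seqno
    for kind, user_key, value in ops:
        entries.append((user_key, seqno, kind, value))
        if per_key_seqno:
            seqno += 1
    return sorted(entries, key=sort_key)


def get(entries, user_key, snapshot_seqno):
    """Return the value of the first visible entry for user_key, or None if
    that entry is a delete or no entry is visible."""
    for key, seqno, kind, value in entries:
        if key == user_key and seqno <= snapshot_seqno:
            return value if kind == PUT else None
    return None


# A put lock followed by a delete lock on the same user key, in one batch.
batch = [(PUT, b"k", b"lock"), (DELETE, b"k", None)]

shared = apply_batch(batch, start_seqno=10, per_key_seqno=False)
distinct = apply_batch(batch, start_seqno=10, per_key_seqno=True)

print(get(shared, b"k", 100))    # b'lock': the put sorts first and hides the delete
print(get(distinct, b"k", 100))  # None: the later delete has the higher seqno and wins
```

With a shared seqno, the tie is broken by the type tag alone, so the lock appears to still exist after the batch; giving each op its own seqno restores the order in which the batch was written.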
