known issues / limitations of lightweight deletes #39870

Closed
filimonov opened this issue Aug 3, 2022 · 4 comments

Comments

@filimonov
Contributor

filimonov commented Aug 3, 2022

  1. they will be delayed when a lot of merges are happening (i.e. when there is high insert pressure and the pool is busy)

  2. they 'touch' all the parts - so if the list of parts is big, it leads to a lot of funny issues:

    • a lot of IOPS on the filesystem
    • huge transactions in zookeeper
  3. every delete makes the current set of parts inactive. If the rate of deletes and the number of parts are both high,
    the number of inactive parts can grow very large

    • cleaning old/inactive parts may become super slow and inefficient.
    • it increases the zookeeper traffic significantly (fetching the part list from a replica returns a lot of entries)
  4. they can pollute system.mutations (so selecting from it can become very slow); each lightweight delete is recorded there as a mutation command like:

    UPDATE _row_exists = 0 WHERE number = 1
    
  5. all the parts get renamed during a lightweight delete (the mutation version is incremented), which causes:

    • issues with backups
    • stale replicas will need to do a full resync to get back online.
  6. some queries get much slower because some optimizations are disabled, see "simple count queries are very slow after lightweight deletes" #47930

  7. under some conditions the deleted rows will never be removed from disk automatically (e.g. they are inside a huge 150 GB part, or in the only part of a partition)
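
To see whether items 3 and 4 apply to a given server, the two system tables mentioned above can be queried directly. This is a minimal sketch that only uses standard columns of system.mutations and system.parts; the grouping and ordering are illustrative choices, not something prescribed in this issue.

```sql
-- Unfinished mutations per table. Lightweight deletes appear here
-- with a command of the form: UPDATE _row_exists = 0 WHERE ...
SELECT
    database,
    table,
    count() AS pending_mutations,
    min(create_time) AS oldest_pending
FROM system.mutations
WHERE is_done = 0
GROUP BY database, table
ORDER BY pending_mutations DESC;

-- Active vs. inactive parts per table, to spot inactive parts piling up.
SELECT
    database,
    table,
    countIf(active) AS active_parts,
    countIf(NOT active) AS inactive_parts
FROM system.parts
GROUP BY database, table
ORDER BY inactive_parts DESC;
```

A steadily growing inactive_parts value next to a high rate of deletes would match the situation described in item 3.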

Good things compared to using a fake column and doing ALTER UPDATE:

  • completely transparent
  • it creates a mask (_row_exists UInt8 column) only for the parts where it really touches the data
  • after OPTIMIZE ... FINAL the column disappears
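
A small end-to-end sketch of the behaviour described in this list. The table name lwd_demo and its schema are hypothetical and exist only for illustration. On versions where lightweight deletes are still experimental, the setting shown in the comment is assumed to be required first.

```sql
-- Hypothetical demo table, not taken from the issue.
CREATE TABLE lwd_demo (number UInt64)
ENGINE = MergeTree
ORDER BY number;

INSERT INTO lwd_demo SELECT number FROM numbers(10);

-- Assumption: only needed on versions where the feature is experimental.
-- SET allow_experimental_lightweight_delete = 1;

-- Lightweight delete: masks the row via _row_exists instead of
-- physically removing it right away.
DELETE FROM lwd_demo WHERE number = 1;

-- The deleted row is filtered out transparently, no extra condition needed.
SELECT count() FROM lwd_demo;

-- Forces the parts to be rewritten, which physically drops the masked rows.
OPTIMIZE TABLE lwd_demo FINAL;
```

The explicit OPTIMIZE step is also the manual way out of the case in item 7, where the rows would otherwise stay on disk.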

/cc @zhangjmruc

@alexey-milovidov
Member

This list makes sense, but this task is not actionable. Closing.

@filimonov
Contributor Author

#50613

@den-crane
Contributor

#53083 (comment)

@filimonov
Contributor Author

#61959
