they will be delayed when there are a lot of merges happening (i.e. when insert pressure is high and the pool is busy)
they 'touch' all the parts - so if the list of parts is big, it leads to a lot of funny issues:
a lot of IOPS on the filesystem
huge transactions in ZooKeeper
every delete makes the current set of parts inactive. If the rate of deletes and the number of parts are both high:
the number of inactive parts can grow very large
cleaning up old/inactive parts may become very slow and inefficient
it increases ZooKeeper traffic significantly (fetching the part list from a replica returns a lot of data)
they can pollute system.mutations (so selecting from it can become very slow)
UPDATE _row_exists = 0 WHERE number = 1
all the parts during the lightweight delete get renamed (incrementing the mutation version)
issues with backups
stale replicas will need to do a full resync to get back online
in some conditions the deleted rows will never be removed from the disk automatically (e.g. when they are inside a huge 150 GB part, or in the only part of a partition)
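The build-up of inactive parts and mutation entries described above can be watched from the system tables. A minimal sketch, assuming a table called `default.test` (a hypothetical name used only for illustration); the columns come from `system.parts` and `system.mutations`:

```sql
-- Active vs. inactive parts for one table.
-- Inactive parts are expected to pile up when deletes are frequent.
SELECT
    database,
    table,
    countIf(active)     AS active_parts,
    countIf(NOT active) AS inactive_parts
FROM system.parts
WHERE database = 'default' AND table = 'test'   -- hypothetical table name
GROUP BY database, table;

-- Mutations that have not finished yet.
-- Lightweight deletes are listed here with commands like
-- UPDATE _row_exists = 0 WHERE number = 1
SELECT mutation_id, command, create_time, parts_to_do, is_done
FROM system.mutations
WHERE database = 'default' AND table = 'test' AND NOT is_done
ORDER BY create_time;
```

If `inactive_parts` keeps growing or the second query returns many rows, the delete rate is probably higher than the table can absorb.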
Good things compared to using a fake column and doing ALTER UPDATE:
completely transparent
it creates a mask (_row_exists UInt8 column) only for the parts where it really touches the data
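To make the comparison concrete, here is a rough sketch of both approaches. The table `test`, its `number` column and the `is_deleted UInt8` marker column are hypothetical names; depending on the server version, lightweight deletes may also have to be enabled by a setting first.

```sql
-- Fake column approach: mark rows with a regular mutation,
-- then remember to filter on the marker in every query.
ALTER TABLE test UPDATE is_deleted = 1 WHERE number = 1;
SELECT count() FROM test WHERE is_deleted = 0;

-- Lightweight delete: readers need no extra filter,
-- the _row_exists mask is applied transparently.
DELETE FROM test WHERE number = 1;
SELECT count() FROM test;
```

In the second variant existing queries stay unchanged, which is what "completely transparent" refers to.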
One more downside: some queries get much slower because some optimizations are disabled, e.g. "simple count queries are very slow after lightweight deletes" (#47930)
/cc @zhangjmruc