Clarify when to use DELETE vs. TRUNCATE #4094
This change LGTM.
Drive by comment on factual accuracy of existing content that maybe @vivekmenezes can confirm.
Reviewable status:
complete! 0 of 0 LGTMs obtained
v2.1/truncate.md, line 7 at r1 (raw file):
--- The `TRUNCATE` [statement](sql-statements.html) removes all rows from a table. At a high level, it works by dropping the table and recreating a new table with the same name.
Remove double space!
cdab970 to b9587a0
Fixed, thanks @lhirata
@@ -58,7 +57,7 @@ If disk usage is a concern, there are two potential solutions. The
first is to [reduce the time-to-live](configure-replication-zones.html)
(TTL) for the zone, which will cause garbage collection to clean up
deleted rows more frequently. Second, unlike `DELETE`,
[truncate](truncate.html) immediately deletes the entire table, so
Huh, I think maybe GH ate my comment. Maybe @vivekmenezes can chime in here, but AFAIK, TRUNCATE doesn't delete the on-disk data any sooner -- it just avoids writing deletes for every row.
Oh! I think it did eat it. Definitely did not receive this info earlier. Thanks for posting it again @dt
So if I'm reading you right, I have the following statements/questions which I would appreciate your (and/or @vivekmenezes's) input on:
- DELETE works by issuing deletes per-row (are these "delete intents" (term?) that are written to each row and later cleaned up during a GC pass? is that why you say they involve a write? if so, makes sense that it would be slower)
- TRUNCATE works by marking (?) the entire table all at once for deletion (using the same model from #1 above, if accurate)
- Because of this, TRUNCATE is faster from the user POV because it writes less to disk, i.e. O(1) (table) instead of O(n) (# of rows touched) -- though it doesn't actually delete the on-disk data sooner in terms of clock time since that is determined by when GC runs - yes?
If the above are approximately right, my (weak) opinion is that from the user's POV the semantics of TRUNCATE are still best expressed as "TRUNCATE immediately deletes the entire table". But it depends on how much of the machinery of deletion/GC we want to expose in the docs. Our users may really need this info to avoid confusion. Let me know what you think.
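To make the O(1) vs. O(n) question above concrete, here is a toy sketch in Python. It is not CockroachDB code: the table dictionary, `delete_all_rows`, `truncate`, and `bytes_on_disk` are all hypothetical names, and the model only assumes what is stated in this thread (DELETE writes one marker per row, TRUNCATE marks the whole table once, and neither frees disk before GC runs).

```python
# Toy model only; every name here is hypothetical, not a CockroachDB API.

def make_table(n_rows, row_size=100):
    """Build a fake table with n_rows rows of row_size bytes each."""
    return {"rows": list(range(n_rows)), "tombstones": set(),
            "dropped": False, "row_size": row_size}

def delete_all_rows(table):
    """Model DELETE: one tombstone write per row. Returns the write count."""
    writes = 0
    for key in table["rows"]:
        table["tombstones"].add(key)
        writes += 1
    return writes

def truncate(table):
    """Model TRUNCATE: a single write marks the whole table for deletion."""
    table["dropped"] = True
    return 1

def bytes_on_disk(table):
    """Row data stays on disk until garbage collection runs, in both cases."""
    return len(table["rows"]) * table["row_size"]

t_delete, t_truncate = make_table(1000), make_table(1000)
delete_writes = delete_all_rows(t_delete)    # grows with the number of rows
truncate_writes = truncate(t_truncate)       # constant
print(delete_writes, truncate_writes)
print(bytes_on_disk(t_delete), bytes_on_disk(t_truncate))
```

In this model the two statements differ only in how many writes they issue; the bytes still on disk are identical until a separate GC step (not modeled here) removes them.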
I think we want to say:
- Normally, TRUNCATE should be used to delete all the table data, but TRUNCATE is implemented as a schema change, which is not transactional. Therefore some data added within a short interval after the TRUNCATE might get deleted by the truncate.
- Alternatively, DELETE can be used for transactional deletion of table data, but care must be taken because it will not work on very large tables, due to the limitation that large transactions are not supported.
My point is that if you want the disk space back, neither TRUNCATE nor DELETE is going to give it to you, but the current text seems to imply that TRUNCATE will, which is what I was objecting to. To be fair, it is better than DELETEs in terms of disk space since it doesn't write a tombstone on every row, but it still isn't going to free it up any sooner than the GC time.
Thanks David and Vivek for the info. Sounds like neither statement is useful for reclaiming disk, so we need to remove that bit.
Just pushed a change (also copied below) to reflect that neither statement will give your disk space back (soon), and you must update your GC TTL settings. Let me know what you think.
If disk usage is a concern, the solution is to reduce the time-to-live (TTL) for the zone by setting `gc.ttlseconds` to a lower value, which will cause garbage collection to clean up deleted objects (rows, tables) more frequently.
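As a rough illustration of why lowering `gc.ttlseconds` helps, here is a small Python sketch. It is a toy calculation, not a CockroachDB API: `gc_eligible_at` is a hypothetical helper, the 90000-second default is an assumption you should check against your own zone configuration, and real garbage collection runs periodically, so actual cleanup may happen later than the earliest eligible time shown.

```python
# Toy model only; gc_eligible_at is a hypothetical helper, not a CockroachDB API.

def gc_eligible_at(deleted_at, gc_ttl_seconds):
    """Earliest time (in seconds) at which a deleted object may be collected."""
    return deleted_at + gc_ttl_seconds

ASSUMED_DEFAULT_TTL = 90000  # assumed default (25 hours); verify in your zone config
LOWERED_TTL = 600            # example lower value (10 minutes), chosen for illustration

deleted_at = 0  # the moment the DELETE or TRUNCATE takes effect
wait_default = gc_eligible_at(deleted_at, ASSUMED_DEFAULT_TTL)
wait_lowered = gc_eligible_at(deleted_at, LOWERED_TTL)

# Seconds of earlier eligibility gained by lowering the TTL.
print(wait_default - wait_lowered)
```

The same arithmetic applies whether the rows were removed with `DELETE` or `TRUNCATE`, which is the point made earlier in this thread.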
Reviewable status:
complete! 1 of 0 LGTMs obtained
v2.2/delete.md, line 112 at r2 (raw file):
~~~ #### Delete rows Using non-unique columns
While you're here, would you mind changing `Using` to `using` here and in the 2.1 version?
b9587a0 to c9e36bc
Reviewable status:
complete! 0 of 0 LGTMs obtained (and 1 stale)
v2.2/delete.md, line 112 at r2 (raw file):
Previously, jseldess (Jesse Seldess) wrote…
While you're here, would you mind changing `Using` to `using` here and in the 2.1 version?
Fixed, thanks for the catch.
Fixes #4088.
Summary of changes:
- Update `DELETE` page to remove unnecessary `TRUNCATE` example, and add a note saying to prefer `TRUNCATE` for tables >= 1000 rows. Also note that if you have disk space concerns you must update the GC TTL.
- Update `TRUNCATE` page to expand the initial description of its behavior slightly, and to add a note saying to prefer `DELETE` for tables with <= 1000 rows.
- Various small copy edits: typo fix; link update; whitespace.
33380e4 to 81e7ad6
Just spoke with David and Vivek IRL - merging this.
@rmloveland, while you're here, would you please mention on the drop database, drop table, and truncate pages that you can view the progress of these operations via
Whoops, @rmloveland, I just realized my last comment was post-merge. Could you open a follow-up?
Fixes #4088.
Summary of changes:
- Update `DELETE` page to remove unnecessary `TRUNCATE` example, and add a note saying to prefer `TRUNCATE` for tables >= 1000 rows.
- Update `TRUNCATE` page to expand the initial description of its behavior slightly, and to add a note saying to prefer `DELETE` for tables with < 1000 rows.
- Various small copy edits: typo fix; link update; whitespace.