Clarify when to use DELETE vs. TRUNCATE #4094

Merged
rmloveland merged 1 commit into master from truncate-delete-infinite-loop on Nov 28, 2018

Conversation

rmloveland
Contributor

Fixes #4088.

Summary of changes:

  • Update DELETE page to remove unnecessary TRUNCATE example, and add
    a note saying to prefer TRUNCATE for tables >= 1000 rows.

  • Update TRUNCATE page to expand the initial description of its
    behavior slightly, and to add a note saying to prefer DELETE for
    tables with < 1000 rows.

  • Various small copy edits: typo fix; link update; whitespace.

@cockroach-teamcity
Member

This change is Reviewable

@rmloveland
Contributor Author

Direct links for easier reference (these are for the stable versions, but the "dev" versions have the same changes):

@rmloveland rmloveland requested a review from dt November 21, 2018 21:59
Member

@dt dt left a comment

This change LGTM.

Drive by comment on factual accuracy of existing content that maybe @vivekmenezes can confirm.

Contributor

@lnhsingh lnhsingh left a comment

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained


v2.1/truncate.md, line 7 at r1 (raw file):

---

The `TRUNCATE` [statement](sql-statements.html) removes all rows from a table.  At a high level, it works by dropping the table and recreating a new table with the same name.

Remove double space!

@rmloveland
Contributor Author

Remove double space!

Fixed, thanks @lhirata

@@ -58,7 +57,7 @@ If disk usage is a concern, there are two potential solutions. The
first is to [reduce the time-to-live](configure-replication-zones.html)
(TTL) for the zone, which will cause garbage collection to clean up
deleted rows more frequently. Second, unlike `DELETE`,
[truncate](truncate.html) immediately deletes the entire table, so
Member

Huh, I think maybe GH ate my comment. Maybe @vivekmenezes can chime in here, but AFAIK, TRUNCATE doesn't delete the on-disk data any sooner -- it just avoids writing deletes for every row.

Contributor Author

Oh! I think it did eat it. Definitely did not receive this info earlier. Thanks for posting it again @dt

So if I'm reading you right, I have the following statements/questions which I would appreciate your (and/or @vivekmenezes's) input on:

  1. DELETE works by issuing deletes per-row (are these "delete intents" (term?) that are written to each row and later cleaned up during a GC pass? Is that why you say they involve a write? If so, it makes sense that it would be slower).
  2. TRUNCATE works by marking (?) the entire table all at once for deletion (using the same model as in item 1 above, if accurate).
  3. Because of this, TRUNCATE is faster from the user's POV because it writes less to disk, i.e. O(1) (table) instead of O(n) (# of rows touched), though it doesn't actually delete the on-disk data sooner in terms of clock time, since that is determined by when GC runs. Yes?

If the above are approximately right, my (weak) opinion is that from the user's POV the semantics of TRUNCATE are still best expressed as "TRUNCATE immediately deletes the entire table". But it depends on how much of the machinery of deletion/GC we want to expose in the docs. Our users may really need this info to avoid confusion. Let me know what you think.

Contributor

I think we want to say:

  1. Normally, TRUNCATE should be used to delete all the table data, but TRUNCATE is implemented as a schema change, which is not transactional. Therefore, some data added within a short interval after the TRUNCATE might get deleted by it.
  2. Alternatively, DELETE can be used for transactional deletion of table data, but care must be taken because it will not work on very large tables, due to the limitation that large transactions are not supported.
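
As a rough illustration of the two options described above (the table name `events` is hypothetical, and this is a sketch rather than text from the PR):

```sql
-- `events` is a hypothetical table used only for illustration.

-- Option 1: transactional, row-by-row removal; better suited to small tables.
-- (An interactive session with sql_safe_updates enabled may reject a DELETE
-- that has no WHERE clause.)
BEGIN;
DELETE FROM events;
COMMIT;

-- Option 2: removes all rows through a schema change, which per the comment
-- above is not transactional; better suited to large tables.
TRUNCATE TABLE events;
```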

Member

My point is that if you want the disk space back, neither TRUNCATE nor DELETE is going to give it to you, but the current text seems to imply that TRUNCATE will, which is what I was objecting to. To be fair, it is better than DELETE in terms of disk space since it doesn't write a tombstone on every row, but it still isn't going to free the space any sooner than the GC time.

Contributor Author

Thanks David and Vivek for the info. Sounds like neither statement is useful for reclaiming disk, so we need to remove that bit.

Just pushed a change (also copied below) to reflect that neither statement will give your disk space back (soon), and you must update your GC TTL settings. Let me know what you think.

If disk usage is a concern, the solution is to reduce the time-to-live (TTL) for the zone by setting `gc.ttlseconds` to a lower value, which will cause garbage collection to clean up deleted objects (rows, tables) more frequently.
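
A minimal sketch of lowering the GC TTL, assuming a hypothetical table named `events` and an illustrative value of 600 seconds (neither comes from this PR), using the `CONFIGURE ZONE` syntax available as of v2.1:

```sql
-- `events` and the 600-second value are placeholders for illustration only.
-- A lower TTL lets garbage collection reclaim deleted rows and tables sooner,
-- at the cost of a shorter window for historical (AS OF SYSTEM TIME) reads.
ALTER TABLE events CONFIGURE ZONE USING gc.ttlseconds = 600;
```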

Contributor

@jseldess jseldess left a comment

:lgtm:

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained


v2.2/delete.md, line 112 at r2 (raw file):

~~~

#### Delete rows Using non-unique columns

While you're here, would you mind changing Using to using here and in the 2.1 version?

Contributor Author

@rmloveland rmloveland left a comment

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale)


v2.2/delete.md, line 112 at r2 (raw file):

Previously, jseldess (Jesse Seldess) wrote…

While you're here, would you mind changing Using to using here and in the 2.1 version?

Fixed, thanks for the catch.

Fixes #4088.

Summary of changes:

- Update `DELETE` page to remove unnecessary `TRUNCATE` example, and add
  a note saying to prefer `TRUNCATE` for tables >= 1000 rows.  Also note
  that if you have disk space concerns you must update the GC TTL.

- Update `TRUNCATE` page to expand the initial description of its
  behavior slightly, and to add a note saying to prefer `DELETE` for
  tables with <= 1000 rows.

- Various small copy edits: typo fix; link update; whitespace.
@rmloveland
Contributor Author

Just spoke with David and Vivek IRL - merging this.

@rmloveland rmloveland merged commit acad3d2 into master Nov 28, 2018
@rmloveland rmloveland deleted the truncate-delete-infinite-loop branch November 28, 2018 20:10
@jseldess
Contributor

@rmloveland, while you're here, would you please mention on the `DROP DATABASE`, `DROP TABLE`, and `TRUNCATE` pages that you can view the progress of these operations via `SHOW JOBS` or the Jobs page of the Admin UI? Issue: #4103
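
For reference, a sketch of what checking progress from SQL might look like. Whether a given `TRUNCATE` or `DROP` appears as a job depends on the version, so treat that as an assumption to verify:

```sql
-- Lists jobs, including schema changes, with their status and fraction completed.
SHOW JOBS;
```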

@jseldess
Contributor

Whoops, @rmloveland, I just realized my last comment was post-merge. Could you open a follow-up?

Successfully merging this pull request may close these issues.

Fix English language infinite loop between TRUNCATE and DELETE pages
6 participants