Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Simplify single version compaction #1303

Closed

Conversation

ou05020
Copy link
Contributor

@ou05020 ou05020 commented Apr 14, 2020

When NumVersionsToKeep is 1, deleted and expired entries should always be removed if there is no overlap with lower levels. Otherwise, deleted entries will continue to grow for heavy write/delete use cases with many transient keys.

This PR adds simplified logic when NumVersionsToKeep is 1 that always sets the skipKey and only writes deleted or expired entries when hasOverlap is true.


This change is Reviewable

@muXxer
Copy link
Contributor

muXxer commented May 7, 2020

badger_gc
I guess we have the same problem in our software.
Here you can see a chart of the database size. We are deleting a lot of entries, but only the value size decreases at garbage collection.
NumVersionsToKeep is set to 1, because using 0 would never delete the entries/values in the database....

@misterpickypants
Copy link

@manishrjain @jarifibrahim Any chance you could review this? This was another fix that we needed along with #1302.

Copy link
Contributor

@jarifibrahim jarifibrahim left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for fixing this @ou05020 ! The fix works for the most part but I think we might have to make some changes. The following test fails on this branch (and master) . Let me know if you need help fixing them. I can help in fixing the failures.

diff --git a/levels_test.go b/levels_test.go
index 8c7df15..1703141 100644
--- a/levels_test.go
+++ b/levels_test.go
@@ -17,6 +17,7 @@
 package badger
 
 import (
+	"fmt"
 	"math"
 	"testing"
 	"time"
@@ -139,11 +140,11 @@ func getAllAndCheck(t *testing.T, db *DB, expected []keyValVersion) {
 		defer it.Close()
 		i := 0
 		for it.Rewind(); it.Valid(); it.Next() {
-			require.Less(t, i, len(expected), "DB has more number of key than expected")
+			// require.Less(t, i, len(expected), "DB has more number of key than expected")
 			item := it.Item()
 			v, err := item.ValueCopy(nil)
 			require.NoError(t, err)
-			// fmt.Printf("k: %s v: %d val: %s\n", item.key, item.Version(), v)
+			fmt.Printf("k: %s v: %d val: %s\n", item.key, item.Version(), v)
 			expect := expected[i]
 			require.Equal(t, expect.key, string(item.Key()), "expected key: %s actual key: %s",
 				expect.key, item.Key())
@@ -155,7 +156,7 @@ func getAllAndCheck(t *testing.T, db *DB, expected []keyValVersion) {
 				"key: %s expected meta: %d meta %d", item.key, expect.meta, item.meta)
 			i++
 		}
-		require.Equal(t, len(expected), i, "keys examined should be equal to keys expected")
+		require.Equal(t, len(expected), i, "keys examined %d should be equal to keys expected %d", i, len(expected))
 		return nil
 	})
 
@@ -254,6 +255,58 @@ func TestCompaction(t *testing.T) {
 			getAllAndCheck(t, db, []keyValVersion{{"foo", "bar", 3, 0}, {"fooz", "baz", 1, 0}})
 		})
 	})
+	t.Run("level 1 to level 2 with delete", func(t *testing.T) {
+		t.Run("with overlap", func(t *testing.T) {
+			runBadgerTest(t, &opt, func(t *testing.T, db *DB) {
+				l1 := []keyValVersion{{"foo", "bar", 3, bitDelete}, {"fooz", "baz", 1, bitDelete}}
+				l2 := []keyValVersion{{"foo", "bar", 2, 0}}
+				l3 := []keyValVersion{{"foo", "bar", 1, 0}}
+				createAndOpen(db, l1, 1)
+				createAndOpen(db, l2, 2)
+				createAndOpen(db, l3, 3)
+
+				// Set a high discard timestamp so that all the keys are below the discard timestamp.
+				db.SetDiscardTs(10)
+
+				getAllAndCheck(t, db, []keyValVersion{
+					{"foo", "bar", 3, 1}, {"foo", "bar", 2, 0}, {"foo", "bar", 1, 0}, {"fooz", "baz", 1, 1},
+				})
+				cdef := compactDef{
+					thisLevel: db.lc.levels[1],
+					nextLevel: db.lc.levels[2],
+					top:       db.lc.levels[1].tables,
+					bot:       db.lc.levels[2].tables,
+				}
+				require.NoError(t, db.lc.runCompactDef(1, cdef))
+				// foo version 2 should be dropped after compaction.
+				getAllAndCheck(t, db, []keyValVersion{{"foo", "bar", 3, bitDelete}, {"foo", "bar", 1, 0}})
+			})
+		})
+		t.Run("without overlap", func(t *testing.T) {
+			runBadgerTest(t, &opt, func(t *testing.T, db *DB) {
+				l1 := []keyValVersion{{"foo", "bar", 3, bitDelete}, {"fooz", "baz", 1, bitDelete}}
+				l2 := []keyValVersion{{"fooo", "barr", 2, 0}}
+				createAndOpen(db, l1, 1)
+				createAndOpen(db, l2, 2)
+
+				// Set a high discard timestamp so that all the keys are below the discard timestamp.
+				db.SetDiscardTs(10)
+
+				getAllAndCheck(t, db, []keyValVersion{
+					{"foo", "bar", 3, 1}, {"fooo", "barr", 2, 0}, {"fooz", "baz", 1, 1},
+				})
+				cdef := compactDef{
+					thisLevel: db.lc.levels[1],
+					nextLevel: db.lc.levels[2],
+					top:       db.lc.levels[1].tables,
+					bot:       db.lc.levels[2].tables,
+				}
+				require.NoError(t, db.lc.runCompactDef(1, cdef))
+				// foo version 2 should be dropped after compaction.
+				getAllAndCheck(t, db, []keyValVersion{{"fooo", "barr", 2, 0}})
+			})
+		})
+	})
 }
 
 func TestHeadKeyCleanup(t *testing.T) {

@ou05020 ou05020 force-pushed the single-version-lsm-compaction branch from 3462d8b to c52e68a Compare May 29, 2020 15:06
Copy link
Contributor Author

@ou05020 ou05020 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jarifibrahim I have incorporated your test with modifications.

The "level 1 to level 2 with delete/with overlap" test was missing "fooz/baz" in the expected results.

That entry still exists after compaction because hasOverlap in the compactBuildTables method is true which is expected because that is the purpose of the test and hasOverlap is not checked per key. hasOverlap is set true if any keys in the compaction tables overlap with lower layers which happened to be the case for the foo key in this test.

During the first compaction, all L1 items were moved to L2. Running a subsequent compaction with L2 and L3 results in removal of all keys.

Reviewable status: 0 of 2 files reviewed, all discussions resolved (waiting on @ashish-goswami and @manishrjain)

@jarifibrahim
Copy link
Contributor

@ou05020 Thanks! I didn't realize there was an overlap.

// When `NumVersionsToKeep == 1` the skipKey should always be set
// and there is no need for `lastValidVersion`. The only required check
// is to determine if a deleted key should be written due to `hasOverlap`.
if s.kv.opt.NumVersionsToKeep == 1 {
Copy link
Contributor

@jarifibrahim jarifibrahim May 31, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ou05020 I think all the changes in levels.go file can be replaced with the following

diff --git a/levels.go b/levels.go
index 764fcd9..0121650 100644
--- a/levels.go
+++ b/levels.go
@@ -578,6 +578,16 @@ func (s *levelsController) compactBuildTables(
 				// only valid version for a running transaction.
 				numVersions++
 
+				if isDeletedOrExpired(vs.Meta, vs.ExpiresAt) && !hasOverlap &&
+					s.kv.opt.NumVersionsToKeep == 1 {
+					skipKey = y.SafeCopy(skipKey, it.Key())
+
+					numSkips++
+					updateStats(vs)
+					continue
+				}
+
 				// Keep the current version and discard all the next versions if
 				// - The `discardEarlierVersions` bit is set OR
 				// - We've already processed `NumVersionsToKeep` number of versions

I tried this and everything seems to be working fine. I'll push some more tests on this PR to ensure we've covered all the cases. We have very few tests for compaction which were added recently.

Copy link
Contributor Author

@ou05020 ou05020 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 0 of 2 files reviewed, 1 unresolved discussion (waiting on @ashish-goswami, @jarifibrahim, and @manishrjain)


levels.go, line 579 at r2 (raw file):

Previously, jarifibrahim (Ibrahim Jarif) wrote…

@ou05020 I think all the changes in levels.go file can be replaced with the following

diff --git a/levels.go b/levels.go
index 764fcd9..0121650 100644
--- a/levels.go
+++ b/levels.go
@@ -578,6 +578,16 @@ func (s *levelsController) compactBuildTables(
 				// only valid version for a running transaction.
 				numVersions++
 
+				if isDeletedOrExpired(vs.Meta, vs.ExpiresAt) && !hasOverlap &&
+					s.kv.opt.NumVersionsToKeep == 1 {
+					skipKey = y.SafeCopy(skipKey, it.Key())
+
+					numSkips++
+					updateStats(vs)
+					continue
+				}
+
 				// Keep the current version and discard all the next versions if
 				// - The `discardEarlierVersions` bit is set OR
 				// - We've already processed `NumVersionsToKeep` number of versions

I tried this and everything seems to be working fine. I'll push some more tests on this PR to ensure we've covered all the cases. We have very few tests for compaction which were added recently.

@jarifibrahim That will work.

But I think the fix should also be expanded to address deleted/expired keys when NumVersionsToKeep > 1.

Consider the following compaction scenarios with num versions = 3 assuming no overlap:

Before Compaction -> After Compaction

v3, v2 delete, v1 -> v3
v3, v2, v1 delete -> v3, v2, v1 delete
v3 delete, v2, v1 ->

When the deleted/expired key is the last valid version it is kept even when there is no overlap. The second scenario above should have resulted in v3, v2 keys remaining.

The changes proposed in issue 1228 or PR #1354 will handle NumVersionsToKeep >= 1.

#1228 (comment)

@jarifibrahim
Copy link
Contributor

@ou05020 I'm going to close this PR in favor of PR #1354 since PR #1354 has simpler changes. Thank you so much for fixing this issue, we appreciate your contribution. We have cherry-picked your commit in PR #1354 and so you would be the co-author of that PR.

Thank you so much for your help 🎉
I would appreciate if you can also review #1354.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

4 participants