Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

test/persist: add test that compaction is happening as expected #10541

Merged
merged 1 commit into from
Feb 9, 2022

Conversation

ruchirK
Copy link
Contributor

@ruchirK ruchirK commented Feb 8, 2022

This commit adds a smoke test to check that persisted tables are being compacted
as they should be, using metrics for the number of batches in a persistent trace
to assert that the number decreases as expected.

This test is a little bit brittle, as it is very coupled to the way trace compaction
works today. A better alternative would be some kind of system table that can
automatically return details like the since frontier or the number of trace
batches. That is left as a followup.

Motivation

Tips for reviewer

Checklist

  • This PR has adequate test coverage / QA involvement has been duly considered.
  • This PR adds a release note for any user-facing behavior changes.

Touches MaterializeInc#10533

This commit adds a smoke test to check that persisted tables are being compacted
as they should be, using metrics for the number of batches in a persistent trace
to assert that the number decreases as expected.

This test is a little bit brittle, as it is very coupled to the way trace compaction
works today. A better alternative would be some kind of system table that can
automatically return details like the since frontier or the number of trace
batches. That is left as a followup.
@ruchirK
Copy link
Contributor Author

ruchirK commented Feb 9, 2022

verified that this fails without the fix for #10533 Hopefully, this test can help us prevent future regressions!

@aljoscha
Copy link
Contributor

aljoscha commented Feb 9, 2022

I think the test looks fine, but yes, it's also closely tied to how compaction works today. Could we maybe just do a lot of inserts, then , then verify that the blob count is below some reasonable threshold?

Copy link
Contributor

@philip-stoev philip-stoev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you very much! Lets see if there is any flakiness introduced and we will reevaluate.

@ruchirK
Copy link
Contributor Author

ruchirK commented Feb 9, 2022

We can actually do better than passively waiting for flakiness

altaria-2:materialize Test$ git diff
diff --git a/test/persistence/mzcompose.py b/test/persistence/mzcompose.py
index 1cc13734a..3e47767f8 100644
--- a/test/persistence/mzcompose.py
+++ b/test/persistence/mzcompose.py
@@ -168,3 +168,8 @@ def workflow_compaction(c: Composition) -> None:
     c.rm("materialized", "testdrive-svc", destroy_volumes=True)

     c.rm_volumes("mzdata")
+
+
+def workflow_stress_compaction(c: Composition) -> None:
+    for i in range(0, 200):
+        workflow_compaction(c)
altaria-2:materialize Test$

running this now to see if we observe any failures over 200 runs. will merge if thats green!

@ruchirK
Copy link
Contributor Author

ruchirK commented Feb 9, 2022

I think the test looks fine, but yes, it's also closely tied to how compaction works today. Could we maybe just do a lot of inserts, then , then verify that the blob count is below some reasonable threshold?

I thought about this and felt like it was non-trivial to write a test that would not give either a large percentage of false negatives (flakes) or false positives (fail to see a regression). When you send a large number of inserts its a bit opaque how many trace batches will get produced and its similarly opaque when those trace batches will get compacted. That makes it hard to figure out:

  • how should we set the baseline number of trace batches to look for?
  • how long should we wait to declare failure?

This test concretely observes the number of trace batches increasing and decreasing, and should break if we change trace compaction - but hopefully I'll be the only person hit by that for a while, and we can revisit the test when we have better introspection primitives

@ruchirK
Copy link
Contributor Author

ruchirK commented Feb 9, 2022

Aljoscha approved from Slack and this test ran 200 times without error so merging! TFTR!

@ruchirK ruchirK merged commit 6059761 into MaterializeInc:main Feb 9, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants