
OAK-10657: shrink in-DB documents after updates fail due to 16MB limit #1314

Draft · wants to merge 6 commits into trunk

Conversation

@reschke (Contributor) commented Feb 21, 2024

Just a proof-of-concept.

@reschke marked this pull request as draft on February 21, 2024 at 14:56
* @param revisionChecker filter for revisions (for instance, to check for cluster id)
* @return {@link UpdateOp} suitable for shrinking document, {@code null} otherwise
*/
public static @Nullable UpdateOp getShrinkOp(Document doc, String propertyName, Predicate<Revision> revisionChecker) {
Contributor:

If doc could be a NodeDocument instead of the generic Document, some of those instanceof checks could be avoided...

Contributor Author (@reschke):

I think we can do that once we use that from NodeDocumentStore, not DocumentStore...

}
}
// sort by age
Collections.sort(revs, new Comparator<Revision>() {
Contributor:

Wondering if there isn't such a Comparator in Oak land already?

Contributor Author (@reschke):

Not in Revision, as far as I can tell. I wanted a comparator that sorts by clusterId first; we may not need this if we always filter by cluster id though.

Comment on lines +334 to +341
for (Revision r : revs) {
    if (last != null) {
        if (last.getClusterId() == r.getClusterId()) {
            clean.removeMapEntry(propertyName, last);
        }
    }
    last = r;
}
Contributor:

  • this seems to be a somewhat broader GC than expected: it basically deletes all older revisions of the same clusterId. That seems fine, but then I'm wondering why restrict it specifically to branch commits. Which would lead to the assumption that the idea was to only remove "overwritten branch commits"?
  • is the clusterId check here still necessary, given the predicate check introduced?
  • this doesn't take the usual 24h GC max time nor active checkpoints into account.

Contributor:

PS:

  • it doesn't take late-writes into account (i.e. where the traversed state isn't equal to the head state)

Contributor Author (@reschke), Feb 21, 2024:

  • all older revisions that are branch commits (filtered earlier in the code); is there more that we can check? That would require _revisions, but those might be in a different document, right?
  • clusterId check: yes, unless we guarantee that the predicate will filter by id
  • ack; suggestions?

Contributor:

some ideas:

  • we could restrict it to "overwritten unmerged branch commits": those we know are garbage in any case
  • but that might still leave us with a too-large doc/prop. We could then try to get the traversed state from "24h ago or the oldest checkpoint" (whichever is older), then delete anything older than that
  • but that still might leave us with a too-large doc/prop. Then we might have to do an impromptu split and move anything younger than the previous into a split doc ... (a sketch of this cascade follows below)
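
A rough, self-contained sketch of that three-stage cascade (toy types only: the Rev record and MAX_ENTRIES are stand-ins, not Oak's Revision or the real 16MB document limit):

import java.util.ArrayList;
import java.util.List;

// Toy model of the proposed escalation; each stage only runs if the
// previous one did not shrink the property below the limit.
public class ShrinkCascadeSketch {

    record Rev(int clusterId, long timestamp, boolean unmergedBranchCommit, boolean overwritten) {}

    static final int MAX_ENTRIES = 4; // stand-in for the 16MB limit

    static List<Rev> shrink(List<Rev> entries, long nowMillis, long oldestCheckpointMillis) {
        List<Rev> kept = new ArrayList<>(entries);

        // Stage 1: overwritten unmerged branch commits are garbage in any case.
        kept.removeIf(r -> r.unmergedBranchCommit() && r.overwritten());
        if (kept.size() <= MAX_ENTRIES) {
            return kept;
        }

        // Stage 2: drop anything older than the traversed state at
        // "24h ago or the oldest checkpoint", whichever is older.
        long cutoff = Math.min(nowMillis - 24L * 60 * 60 * 1000, oldestCheckpointMillis);
        kept.removeIf(r -> r.timestamp() < cutoff);
        if (kept.size() <= MAX_ENTRIES) {
            return kept;
        }

        // Stage 3: an impromptu split would move the remaining newer entries
        // into a split document; not modeled in this sketch.
        throw new IllegalStateException("still too large, would need an impromptu split");
    }
}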

Contributor:

those 3 cases could be .. test cases .. :)

Contributor:

regarding

> get the traversed state from "24h ago or the oldest checkpoint" (whichever is older)

that might actually be a tricky thing to achieve - and I believe we might not have done that properly in the DetailedGC effort so far. I think we might need an actual checkpoint that corresponds to "24h ago"

Contributor:

> I think we might need an actual checkpoint that corresponds to "24h ago"

... or maybe not a physical checkpoint, but a root revision that corresponds to reading 24h ago, which we might substitute with corresponding revisions (with timestamp 24h minus 1 millisecond) for each known clusterId ... or something like that ...
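
A minimal sketch of that substitution, assuming the caller can enumerate the known clusterIds. Revision(timestamp, counter, clusterId) and RevisionVector are Oak's existing classes; the helper itself is hypothetical:

import java.util.Set;
import java.util.concurrent.TimeUnit;
import org.apache.jackrabbit.oak.plugins.document.Revision;
import org.apache.jackrabbit.oak.plugins.document.RevisionVector;

// Hypothetical helper: build a synthetic "read as of 24h ago" root revision,
// one revision per known clusterId, each timestamped now - 24h - 1ms.
static RevisionVector readRevisionFrom24hAgo(Set<Integer> knownClusterIds, long nowMillis) {
    long ts = nowMillis - TimeUnit.HOURS.toMillis(24) - 1;
    return new RevisionVector(knownClusterIds.stream()
            .map(id -> new Revision(ts, 0, id))
            .toArray(Revision[]::new));
}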

Contributor:

> we might not have done that properly in the DetailedGC effort so far

... taking that back. The difference between DetailedGC and this runtime GC case here is: in DetailedGC we're only looking at documents that have not been modified for 24+ hours. That means reading their traversed state with a headRevision of "now" is fine. But in this runtime GC case here, that is not fine (as we need to respect those 24+ hours' worth of MVCC)

* @param revisionChecker filter for revisions (for instance, to check for cluster id)
* @return {@link UpdateOp} suitable for shrinking document, {@code null} otherwise
*/
public static @Nullable UpdateOp getShrinkOp(Document doc, String propertyName, Predicate<Revision> revisionChecker) {
Contributor Author (@reschke):

@mbaedke - this is where we could check the feature flag for now...

* Produce an {@link UpdateOp} suitable for shrinking branch revision entries for given property in {@link Document}, {@code null} otherwise.
*
* @param doc document to inspect for repeated branch commits
* @param propertName property to check for
Contributor:

Make it @param propertyName, please.

Comment on lines +320 to +330
Collections.sort(revs, new Comparator<Revision>() {
    @Override
    public int compare(Revision r1, Revision r2) {
        if (r1.getClusterId() != r2.getClusterId()) {
            return r1.getClusterId() - r2.getClusterId();
        } else if (r1.getTimestamp() != r2.getTimestamp()) {
            return r1.getTimestamp() > r2.getTimestamp() ? 1 : -1;
        } else {
            return r1.getCounter() - r2.getCounter();
        }
    }});
Contributor:

Suggested change
Collections.sort(revs, new Comparator<Revision>() {
    @Override
    public int compare(Revision r1, Revision r2) {
        if (r1.getClusterId() != r2.getClusterId()) {
            return r1.getClusterId() - r2.getClusterId();
        } else if (r1.getTimestamp() != r2.getTimestamp()) {
            return r1.getTimestamp() > r2.getTimestamp() ? 1 : -1;
        } else {
            return r1.getCounter() - r2.getCounter();
        }
    }});
revs.sort((c1, c2) -> Comparator.comparing(Revision::getClusterId).thenComparing(Revision::getTimestamp).thenComparing(Revision::getCounter).compare(c1, c2));
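
The suggested lambda rebuilds the chained comparator on every comparison; a variant that builds it once might look like this (a sketch, relying only on the Revision accessors already used above: getClusterId() and getCounter() return int, getTimestamp() returns long):

import java.util.Comparator;

// Built once, reused for every sort; same ordering as the anonymous class above.
private static final Comparator<Revision> BY_CLUSTER_THEN_AGE =
        Comparator.comparingInt(Revision::getClusterId)
                  .thenComparingLong(Revision::getTimestamp)
                  .thenComparingInt(Revision::getCounter);

// usage:
revs.sort(BY_CLUSTER_THEN_AGE);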

*/
public static @Nullable UpdateOp getShrinkOp(Document doc, String propertyName, Predicate<Revision> revisionChecker) {
    Object t_bc = doc.get("_bc");
    Object t_property = doc.get(propertyName);
Contributor:

I would use camelCase.

@rishabhdaim (Contributor) commented Feb 23, 2024

@reschke I wonder if instead of modifying the existing DocumentStore classes we create a new wrapper over the DocumentStore (behind a feature flag just like for throttling) and perform all the diagnostic stuff there, wdyt?
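
A sketch of that wrapper idea (the class name and shrink hook are hypothetical; the class is declared abstract here only so the fragment compiles without spelling out every delegating DocumentStore method):

import org.apache.jackrabbit.oak.plugins.document.Collection;
import org.apache.jackrabbit.oak.plugins.document.Document;
import org.apache.jackrabbit.oak.plugins.document.DocumentStore;
import org.apache.jackrabbit.oak.plugins.document.DocumentStoreException;
import org.apache.jackrabbit.oak.plugins.document.UpdateOp;
import org.apache.jackrabbit.oak.spi.toggle.Feature;

// Hypothetical wrapper over an existing DocumentStore, analogous to the
// throttling wrapper: intercept failing updates, optionally shrink, rethrow.
public abstract class ShrinkingDocumentStoreWrapper implements DocumentStore {

    private final DocumentStore delegate;
    private final Feature shrinkFeature; // feature toggle, as with throttling

    protected ShrinkingDocumentStoreWrapper(DocumentStore delegate, Feature shrinkFeature) {
        this.delegate = delegate;
        this.shrinkFeature = shrinkFeature;
    }

    @Override
    public <T extends Document> T createOrUpdate(Collection<T> collection, UpdateOp update) {
        try {
            return delegate.createOrUpdate(collection, update);
        } catch (DocumentStoreException e) {
            if (shrinkFeature != null && shrinkFeature.isEnabled()) {
                // hypothetical: compute a shrink UpdateOp, apply it, retry once
            }
            throw e;
        }
    }

    // ... all other DocumentStore methods would delegate unchanged
}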

…ail due to 16MB limit

Introduced a new feature toggle to control the new commit cleanup feature on MongoDocumentStore.
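
For reference, a sketch of how such a toggle is typically wired via Oak's Feature API, assuming a Whiteboard instance is available (the toggle name here is made up; the name actually used in the PR may differ):

import org.apache.jackrabbit.oak.spi.toggle.Feature;
import org.apache.jackrabbit.oak.spi.whiteboard.Whiteboard;

// "FT_COMMIT_CLEANUP" is a hypothetical toggle name for illustration.
Feature cleanupToggle = Feature.newFeature("FT_COMMIT_CLEANUP", whiteboard);
if (cleanupToggle.isEnabled()) {
    // take the shrink / commit-cleanup path on MongoDocumentStore
}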