
OAK-10657: shrink in-DB documents after updates fail due to 16MB limit #1314

Draft · wants to merge 6 commits into trunk

Conversation

@reschke (Contributor) commented Feb 21, 2024

Just a proof-of-concept.

@reschke marked this pull request as draft on February 21, 2024 at 14:56
* @param revisionChecker filter for revisions (for instance, to check for cluster id)
* @return {@link UpdateOp} suitable for shrinking document, {@code null} otherwise
*/
public static @Nullable UpdateOp getShrinkOp(Document doc, String propertyName, Predicate<Revision> revisionChecker) {
Contributor:

If doc could be a NodeDocument instead of the generic Document, some of those instanceof checks could be avoided...

Contributor Author (@reschke):

I think we can do that once we use that from NodeDocumentStore, not DocumentStore...

}
}
// sort by age
Collections.sort(revs, new Comparator<Revision>() {
Contributor:

Wondering if there isn't such a Comparator in Oak land already?

Contributor Author (@reschke):

Not in Revision, as far as I can tell. I wanted a comparator that sorts by clusterId first; we may not need this if we always filter by cluster id though.

Comment on lines +334 to +341
for (Revision r : revs) {
    if (last != null) {
        if (last.getClusterId() == r.getClusterId()) {
            clean.removeMapEntry(propertyName, last);
        }
    }
    last = r;
}
Contributor:

  • this seems to be a somewhat broader GC than expected: it basically deletes all older revisions of the same clusterId. That seems fine, but then I'm wondering why restrict it specifically to branch commits. Which would lead to the assumption that the idea was to only remove "overwritten branch commits"?
  • is the clusterId check here still necessary, given the predicate check introduced?
  • this doesn't take the usual 24h GC max time nor active checkpoints into account.

Contributor:

PS:

  • it doesn't take late-writes into account (i.e. where the traversed state isn't equal to the head state)

Contributor Author (@reschke), Feb 21, 2024:

  • all older revisions that are branch commits (filtered earlier in the code); is there more that we can check? That would require _revisions, but those might be in a different document, right?
  • clusterId check: yes, unless we guarantee that the predicate will filter by id
  • ack; suggestions?

Contributor:

some ideas:

  • we could restrict it to "overwritten unmerged branch commits": those we know are garbage in any case
  • but that might still leave us with a too-large doc/prop. We could then try to get the traversed state from "24h ago or the oldest checkpoint" (whichever is older), then delete anything older than that
  • but that still might leave us with a too-large doc/prop. Then we might have to do an impromptu split and move anything younger than the previous into a split doc ... (a sketch of this cascade follows below)
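
A rough, self-contained sketch of that three-stage cascade (toy types only: the Rev record and MAX_ENTRIES are stand-ins, not Oak's Revision or the real 16MB document limit):

import java.util.ArrayList;
import java.util.List;

// Toy model of the proposed escalation; each stage only runs if the
// previous one did not shrink the property below the limit.
public class ShrinkCascadeSketch {

    record Rev(int clusterId, long timestamp, boolean unmergedBranchCommit, boolean overwritten) {}

    static final int MAX_ENTRIES = 4; // stand-in for the 16MB limit

    static List<Rev> shrink(List<Rev> entries, long nowMillis, long oldestCheckpointMillis) {
        List<Rev> kept = new ArrayList<>(entries);

        // Stage 1: overwritten unmerged branch commits are garbage in any case.
        kept.removeIf(r -> r.unmergedBranchCommit() && r.overwritten());
        if (kept.size() <= MAX_ENTRIES) {
            return kept;
        }

        // Stage 2: drop anything older than the traversed state at
        // "24h ago or the oldest checkpoint", whichever is older.
        long cutoff = Math.min(nowMillis - 24L * 60 * 60 * 1000, oldestCheckpointMillis);
        kept.removeIf(r -> r.timestamp() < cutoff);
        if (kept.size() <= MAX_ENTRIES) {
            return kept;
        }

        // Stage 3: an impromptu split would move the remaining newer entries
        // into a split document; not modeled in this sketch.
        throw new IllegalStateException("still too large, would need an impromptu split");
    }
}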

Contributor:

those 3 cases could be .. test cases .. :)

Contributor:

regarding

> get the traversed state from "24h ago or the oldest checkpoint" (whichever is older)

that might actually be a tricky thing to achieve - and I believe we might not have done that properly in the DetailedGC effort so far. I think we might need an actual checkpoint that corresponds to "24h ago"

Contributor:

> I think we might need an actual checkpoint that corresponds to "24h ago"

... or maybe not a physical checkpoint, but a root revision that corresponds to reading 24h ago, which we might substitute with corresponding revisions (with timestamp 24h minus 1 millisecond) for each known clusterId ... or something like that ...
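
A minimal sketch of that substitution, assuming the caller can enumerate the known clusterIds. Revision(timestamp, counter, clusterId) and RevisionVector are Oak's existing classes; the helper itself is hypothetical:

import java.util.Set;
import java.util.concurrent.TimeUnit;
import org.apache.jackrabbit.oak.plugins.document.Revision;
import org.apache.jackrabbit.oak.plugins.document.RevisionVector;

// Hypothetical helper: build a synthetic "read as of 24h ago" root revision,
// one revision per known clusterId, each timestamped now - 24h - 1ms.
static RevisionVector readRevisionFrom24hAgo(Set<Integer> knownClusterIds, long nowMillis) {
    long ts = nowMillis - TimeUnit.HOURS.toMillis(24) - 1;
    return new RevisionVector(knownClusterIds.stream()
            .map(id -> new Revision(ts, 0, id))
            .toArray(Revision[]::new));
}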

Contributor:

> we might not have done that properly in the DetailedGC effort so far

... taking that back. The difference between DetailedGC and this runtime GC case here is: in DetailedGC we're only looking at documents that have not been modified for 24+ hours. That means reading their traversed state with a headRevision of "now" is fine. But in this runtime GC case here, that is not fine (as we need to respect those 24+ hours' worth of MVCC)

* @param revisionChecker filter for revisions (for instance, to check for cluster id)
* @return {@link UpdateOp} suitable for shrinking document, {@code null} otherwise
*/
public static @Nullable UpdateOp getShrinkOp(Document doc, String propertyName, Predicate<Revision> revisionChecker) {
Contributor Author (@reschke):

@mbaedke - this is where we could check the feature flag for now...

* Produce an {@link UpdateOp} suitable for shrinking branch revision entries for given property in {@link Document}, {@code null} otherwise.
*
* @param doc document to inspect for repeated branch commits
* @param propertName property to check for
Contributor:

Make it @param propertyName, please.

Comment on lines +320 to +330
Collections.sort(revs, new Comparator<Revision>() {
    @Override
    public int compare(Revision r1, Revision r2) {
        if (r1.getClusterId() != r2.getClusterId()) {
            return r1.getClusterId() - r2.getClusterId();
        } else if (r1.getTimestamp() != r2.getTimestamp()) {
            return r1.getTimestamp() > r2.getTimestamp() ? 1 : -1;
        } else {
            return r1.getCounter() - r2.getCounter();
        }
    }});
Contributor:

Suggested change
Collections.sort(revs, new Comparator<Revision>() {
    @Override
    public int compare(Revision r1, Revision r2) {
        if (r1.getClusterId() != r2.getClusterId()) {
            return r1.getClusterId() - r2.getClusterId();
        } else if (r1.getTimestamp() != r2.getTimestamp()) {
            return r1.getTimestamp() > r2.getTimestamp() ? 1 : -1;
        } else {
            return r1.getCounter() - r2.getCounter();
        }
    }});
revs.sort((c1, c2) -> Comparator.comparing(Revision::getClusterId).thenComparing(Revision::getTimestamp).thenComparing(Revision::getCounter).compare(c1, c2));
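
The suggested lambda rebuilds the chained comparator on every comparison; a variant that builds it once might look like this (a sketch, relying only on the Revision accessors already used above: getClusterId() and getCounter() return int, getTimestamp() returns long):

import java.util.Comparator;

// Built once, reused for every sort; same ordering as the anonymous class above.
private static final Comparator<Revision> BY_CLUSTER_THEN_AGE =
        Comparator.comparingInt(Revision::getClusterId)
                  .thenComparingLong(Revision::getTimestamp)
                  .thenComparingInt(Revision::getCounter);

// usage:
revs.sort(BY_CLUSTER_THEN_AGE);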

*/
public static @Nullable UpdateOp getShrinkOp(Document doc, String propertyName, Predicate<Revision> revisionChecker) {
    Object t_bc = doc.get("_bc");
    Object t_property = doc.get(propertyName);
Contributor:

I would use camelCase.

@rishabhdaim (Contributor) commented Feb 23, 2024

@reschke I wonder if instead of modifying the existing DocumentStore classes we create a new wrapper over the DocumentStore (behind a feature flag just like for throttling) and perform all the diagnostic stuff there, wdyt?
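
A sketch of that wrapper idea (the class name and shrink hook are hypothetical; the class is declared abstract here only so the fragment compiles without spelling out every delegating DocumentStore method):

import org.apache.jackrabbit.oak.plugins.document.Collection;
import org.apache.jackrabbit.oak.plugins.document.Document;
import org.apache.jackrabbit.oak.plugins.document.DocumentStore;
import org.apache.jackrabbit.oak.plugins.document.DocumentStoreException;
import org.apache.jackrabbit.oak.plugins.document.UpdateOp;
import org.apache.jackrabbit.oak.spi.toggle.Feature;

// Hypothetical wrapper over an existing DocumentStore, analogous to the
// throttling wrapper: intercept failing updates, optionally shrink, rethrow.
public abstract class ShrinkingDocumentStoreWrapper implements DocumentStore {

    private final DocumentStore delegate;
    private final Feature shrinkFeature; // feature toggle, as with throttling

    protected ShrinkingDocumentStoreWrapper(DocumentStore delegate, Feature shrinkFeature) {
        this.delegate = delegate;
        this.shrinkFeature = shrinkFeature;
    }

    @Override
    public <T extends Document> T createOrUpdate(Collection<T> collection, UpdateOp update) {
        try {
            return delegate.createOrUpdate(collection, update);
        } catch (DocumentStoreException e) {
            if (shrinkFeature != null && shrinkFeature.isEnabled()) {
                // hypothetical: compute a shrink UpdateOp, apply it, retry once
            }
            throw e;
        }
    }

    // ... all other DocumentStore methods would delegate unchanged
}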

…ail due to 16MB limit

Introduced a new feature toggle to control the new commit cleanup feature on MongoDocumentStore.
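
For reference, a sketch of how such a toggle is typically wired via Oak's Feature API, assuming a Whiteboard instance is available (the toggle name here is made up; the name actually used in the PR may differ):

import org.apache.jackrabbit.oak.spi.toggle.Feature;
import org.apache.jackrabbit.oak.spi.whiteboard.Whiteboard;

// "FT_COMMIT_CLEANUP" is a hypothetical toggle name for illustration.
Feature cleanupToggle = Feature.newFeature("FT_COMMIT_CLEANUP", whiteboard);
if (cleanupToggle.isEnabled()) {
    // take the shrink / commit-cleanup path on MongoDocumentStore
}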