MODE-1988 Added ability to enable still-alpha child optimization logic #884
As the number of children increases for a node, it becomes more expensive to read and write that node's document representation. The ModeShape 3 rearchitecture included the ability to break up the child references in a parent node's document into multiple separate documents based upon the total number of children. The session cache framework is able to work with a node's representation regardless of how many block (page) documents are used to store the child references, and the session cache (and update) framework never changes the number of documents used to represent a node.
The responsibility of splitting a node's "too many" child references amongst more documents or combining/merging multiple documents with "too few" child references is completely separate and is done asynchronously in a scheduled background process that is called "document optimization".
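To make the split/merge idea concrete, here is a minimal sketch of repartitioning child references into evenly sized blocks. It is an illustration only: the class `ChildReferenceBlocks`, the method `repartition`, and the use of plain strings for child references are all hypothetical and are not ModeShape's actual classes or algorithm.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration; not part of the ModeShape codebase.
final class ChildReferenceBlocks {

    private ChildReferenceBlocks() {}

    /**
     * Regroups the child references held in the given blocks into consecutive
     * blocks of at most targetSize entries. Oversized blocks end up split and
     * undersized neighbors end up merged; the overall order is preserved.
     */
    static List<List<String>> repartition(List<List<String>> blocks, int targetSize) {
        if (targetSize < 1) {
            throw new IllegalArgumentException("targetSize must be positive");
        }
        List<List<String>> result = new ArrayList<>();
        List<String> current = new ArrayList<>();
        for (List<String> block : blocks) {
            for (String ref : block) {
                current.add(ref);
                if (current.size() == targetSize) {
                    // The block is full, so start a new one.
                    result.add(current);
                    current = new ArrayList<>();
                }
            }
        }
        if (!current.isEmpty()) {
            // Whatever is left over becomes the last (possibly smaller) block.
            result.add(current);
        }
        return result;
    }
}
```

Because this kind of regrouping only changes how references are distributed across documents, it can run in the background without the session cache needing to know about it.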
Prior to this change, there was no way to enable or run document optimization. The thought was that it is an optimization step that would eventually be enabled once users start running into issues with nodes containing very large numbers of child references. Also, it would be enabled only after the rest of the infrastructure was in place and more thoroughly vetted.
When we added federation, we made it possible for connectors to be "pageable", meaning that a connector could return a node document that contained only some of the child references, while the remaining child references would then be accessed as separate documents (i.e., pages). The session cache framework's ability to work with "segments" of child references was used for this part, and has proven to be quite useful. In other words, part of the original "paging"/"blocks"/"segments" session cache infrastructure is now being used in federation.
This commit exposes new configuration options (in both the JSON configuration and the EAP subsystem) for running the document optimization process as a scheduled background thread. The scheduling-related fields are identical to the garbage collection fields, and there are two new fields specific to document optimization: the target number of child references to have in each page, and the tolerance allowed between the actual and target numbers before optimization kicks in. Currently neither field has a default, so users must set both.
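One way to picture how the target and tolerance fields could interact is sketched below. It assumes the tolerance is an absolute number of child references; whether it is absolute or relative is not stated here. The names `BlockSizeCheck`, `Action`, and `evaluate` are hypothetical and do not come from the ModeShape codebase.

```java
// Hypothetical illustration of a target/tolerance check; not ModeShape's implementation.
final class BlockSizeCheck {

    /** What the optimizer would do with a block in this sketch. */
    enum Action { NONE, SPLIT, MERGE }

    private BlockSizeCheck() {}

    /**
     * Compares the actual number of child references in a block with the target.
     * Assumption: tolerance is an absolute count, so any size within
     * [target - tolerance, target + tolerance] is left alone.
     */
    static Action evaluate(int actual, int target, int tolerance) {
        if (target < 1 || tolerance < 0) {
            throw new IllegalArgumentException("target must be positive and tolerance non-negative");
        }
        long difference = (long) actual - target; // widen to long to avoid int overflow
        if (difference > tolerance) {
            return Action.SPLIT;  // too many references in this block
        }
        if (difference < -tolerance) {
            return Action.MERGE;  // too few references in this block
        }
        return Action.NONE;       // close enough to the target
    }
}
```

With a target of 1000 and a tolerance of 200, a block holding 1100 references would be left alone in this sketch, while blocks holding 1500 or 700 would be candidates for splitting or merging respectively.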
At this time, the document optimization process is DISABLED by default. Enabling it requires setting both fields and results in INFO-level log messages stating that this is a technology preview that should not be used in production. Therefore, the risk of incorporating these changes into the codebase is relatively low.
However, with these changes it is now possible for users to experiment with this feature and help us test it and identify/fix problems. It is not clear how long the feature will remain 'tech preview', but it will remain so only until we're much more satisfied with the stability and quality of the implementation AND have better defaults for the target and tolerance fields (based upon experimental testing).