MODE-1988 Added ability to enable still-alpha child optimization logic #884

rhauch · 2013-07-10T14:47:59Z

As the number of children of a node increases, it becomes more expensive to read and write that node's document representation. The ModeShape 3 rearchitecture included the ability to break up the child references in a parent node's document into multiple separate documents based upon the total number of children. The session cache framework is able to work with a node's representation regardless of how many block (page) documents are used to store the child references, and the session cache (and update) framework never changes the number of documents used to represent a node.

The responsibility of splitting a node's "too many" child references amongst more documents or combining/merging multiple documents with "too few" child references is completely separate and is done asynchronously in a scheduled background process that is called "document optimization".
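To make the split/merge idea concrete, here is a minimal sketch of rebalancing pages of child references. It is not ModeShape's implementation: the function name `repage`, the list-of-lists page model, and the use of Python are all assumptions made purely for illustration.

```python
from typing import List


def repage(pages: List[List[str]], target: int) -> List[List[str]]:
    """Redistribute child references so each page holds at most `target` entries.

    Hypothetical model: each page is a plain list of child-reference keys.
    """
    if target < 1:
        raise ValueError("target must be at least 1")
    # Flatten in page order so the relative order of children is preserved.
    refs = [ref for page in pages for ref in page]
    # Re-slice into pages of `target` entries; only the last page may be smaller.
    return [refs[i:i + target] for i in range(0, len(refs), target)]


# One oversized page and two undersized pages become evenly sized pages.
pages = [["a", "b", "c", "d", "e"], ["f"], ["g"]]
print(repage(pages, 3))  # [['a', 'b', 'c'], ['d', 'e', 'f'], ['g']]
```

The sketch handles both cases described above in one pass: pages with "too many" references are split, and pages with "too few" are merged with their neighbors.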

Prior to this change, there was no way to enable or run document optimization. The thought was that it is an optimization step that would eventually be enabled once users start running into issues with nodes containing very large numbers of child references. Also, it would be enabled only after the rest of the infrastructure was in place and more thoroughly vetted.

When we added federation, we made it possible for connectors to be "pageable", meaning that a connector could return a node document that contained only some of the child references, while the remaining child references would then be accessed as separate documents (i.e., pages). The session cache framework's ability to work with "segments" of child references was used for this part, and has proven to be quite useful. In other words, part of the original "paging"/"blocks"/"segments" session cache infrastructure is now being used in federation.
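As a rough illustration of how a reader can walk child references that are spread across pages, the sketch below follows a chain of page documents. The store layout, the `children` and `next` keys, and the function name `iter_children` are hypothetical and are not taken from ModeShape's document format.

```python
from typing import Dict, Iterator, Optional


def iter_children(store: Dict[str, dict], first_key: str) -> Iterator[str]:
    """Yield every child reference of a node, following page links in order.

    Hypothetical layout: each document has a "children" list and an
    optional "next" key naming the document that holds the next page.
    """
    key: Optional[str] = first_key
    while key is not None:
        page = store[key]
        yield from page["children"]
        key = page.get("next")  # None (or absent) ends the chain


# The node document holds the first segment; the rest live in page documents.
store = {
    "node": {"children": ["c1", "c2"], "next": "node/page1"},
    "node/page1": {"children": ["c3", "c4"], "next": "node/page2"},
    "node/page2": {"children": ["c5"], "next": None},
}
print(list(iter_children(store, "node")))  # ['c1', 'c2', 'c3', 'c4', 'c5']
```

The caller sees a single ordered sequence of children no matter how many pages back it, which is the property the session cache relies on.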

This commit exposes new configuration options (in both the JSON configuration and the EAP subsystem) for running the document optimization process as a scheduled background thread. The scheduling-related fields are identical to the garbage collection fields, while there are two new fields specific to document optimization: the target number of child references to have in each page, and the tolerance allowed between the actual and target numbers before optimization kicks in. Currently neither field has a default, so users must set both.
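The interaction between the target and tolerance fields can be sketched as a simple check. This assumes the tolerance is an absolute count of child references, which the description above does not specify; the function and parameter names are hypothetical and are not ModeShape configuration keys.

```python
def needs_optimization(actual: int, target: int, tolerance: int) -> bool:
    """Return True when a page's size strays from the target by more than the tolerance.

    Assumption: `tolerance` is an absolute number of child references.
    """
    if target < 1 or tolerance < 0:
        raise ValueError("target must be >= 1 and tolerance must be >= 0")
    return abs(actual - target) > tolerance


# With a target of 1000 and a tolerance of 100, pages holding 900 to 1100
# references are left alone; anything outside that band is a candidate.
print(needs_optimization(1050, 1000, 100))  # False
print(needs_optimization(1500, 1000, 100))  # True
print(needs_optimization(200, 1000, 100))   # True
```

A wider tolerance means fewer rewrites of page documents at the cost of less uniform page sizes, which is presumably why good defaults need experimental testing.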

At this time, the document optimization process is DISABLED by default. Enabling it requires setting values for both fields, and results in INFO-level log messages stating that this is a technology preview that should not be used in production. Therefore, the risk of incorporating these changes into the codebase is relatively low.

However, with these changes it is now possible for users to experiment with this feature and help us test it and identify/fix problems. It is not clear how long the feature will remain 'tech preview', but it will remain so only until we're much more satisfied with the stability and quality of the implementation AND have better defaults for the target and tolerance fields (based upon experimental testing).


rhauch · 2013-07-10T17:44:57Z

Merged into the 'master' branch, and the first commit (89bd15f) cherry-picked onto the '3.3.x' branch.

rhauch added 2 commits July 10, 2013 09:21

Fixed compiler warnings from recent commits.

5f0cb78

rhauch merged commit 5f0cb78 into ModeShape:master Jul 10, 2013
