MODE-1988 Added ability to enable still-alpha child optimization logic #884

Merged
merged 2 commits into from Jul 10, 2013

rhauch commented Jul 10, 2013

As the number of children of a node increases, it becomes more expensive to read and write that node's document representation. The ModeShape 3 rearchitecture included the ability to break up the child references in a parent node's document into multiple separate documents based upon the total number of children. The session cache framework can work with a node's representation regardless of how many block (page) documents are used to store the child references, and the session cache (and update) framework never changes the number of documents used to represent a node.
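
To illustrate why the session cache does not need to know how many block documents a node uses, here is a minimal Java sketch. It assumes a hypothetical layout in which each block holds some child references plus the key of the next block; `ChildBlocks`, `Block`, and `allChildren` are illustrative names, not ModeShape classes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical model of a parent node whose child references are spread over a
 * chain of block (page) documents. Names are illustrative, not ModeShape's API.
 */
public class ChildBlocks {

    /** One block document: some child references plus the key of the next block (null if last). */
    record Block(List<String> childKeys, String nextBlockKey) {}

    /**
     * Collects every child reference, starting from the parent's own document.
     * The caller never needs to know how many blocks the children are spread over.
     */
    static List<String> allChildren(String firstBlockKey, Map<String, Block> store) {
        List<String> result = new ArrayList<>();
        String key = firstBlockKey;
        while (key != null) {
            Block block = store.get(key);
            if (block == null) {
                throw new IllegalStateException("Missing block document: " + key);
            }
            result.addAll(block.childKeys());
            key = block.nextBlockKey();
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Block> store = Map.of(
            "parent", new Block(List.of("a", "b"), "block-2"),
            "block-2", new Block(List.of("c", "d"), "block-3"),
            "block-3", new Block(List.of("e"), null));
        System.out.println(allChildren("parent", store)); // [a, b, c, d, e]
    }
}
```

Because the reader just follows the chain, the same code works whether a node's children live in one document or in twenty.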

The responsibility of splitting a node's "too many" child references across more documents, or of combining/merging multiple documents with "too few" child references, is completely separate: it is done asynchronously by a scheduled background process called "document optimization".
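
The two operations this background process performs can be sketched as follows. This is a hypothetical illustration, not ModeShape's implementation: `BlockRebalancer`, `split`, and `merge` are made-up names, and blocks are modeled as plain lists of child keys.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch of the two restructuring operations: splitting a block with
 * too many child references and merging adjacent blocks with too few.
 */
public class BlockRebalancer {

    /** Splits one oversized block into consecutive blocks of at most {@code target} references. */
    static List<List<String>> split(List<String> block, int target) {
        if (target <= 0) {
            throw new IllegalArgumentException("target must be positive");
        }
        List<List<String>> pieces = new ArrayList<>();
        for (int start = 0; start < block.size(); start += target) {
            int end = Math.min(start + target, block.size());
            pieces.add(new ArrayList<>(block.subList(start, end)));
        }
        return pieces;
    }

    /** Merges two adjacent blocks; the first block's references stay in front. */
    static List<String> merge(List<String> first, List<String> second) {
        List<String> merged = new ArrayList<>(first);
        merged.addAll(second);
        return merged;
    }

    public static void main(String[] args) {
        System.out.println(split(List.of("a", "b", "c", "d", "e"), 2)); // [[a, b], [c, d], [e]]
        System.out.println(merge(List.of("x"), List.of("y", "z")));    // [x, y, z]
    }
}
```

Both operations preserve the order of child references, which matters because a node's children can be ordered; they change only how the references are distributed across documents, never which children exist.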

Prior to this change, there was no way to enable or run document optimization. The idea was that, as an optimization step, it would eventually be enabled once users started running into issues with nodes containing very large numbers of child references, and only after the rest of the infrastructure was in place and more thoroughly vetted.

When we added federation, we made it possible for connectors to be "pageable", meaning that a connector could return a node document that contained only some of the child references, while the remaining child references would be accessed as separate documents (i.e., pages). The session cache framework's ability to work with "segments" of child references was used for this, and has proven quite useful. In other words, part of the original "paging"/"blocks"/"segments" session cache infrastructure is now being used in federation.
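
Page-at-a-time access to a pageable source can be sketched as a lazy iterator. This assumes a hypothetical `Page` record (child keys plus the key of the next page) and a loader function standing in for the connector; it is not ModeShape's connector SPI.

```java
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.function.Function;

/**
 * Hypothetical sketch: the node document carries the first page of child references,
 * and later pages are fetched only when iteration actually reaches them.
 */
public class PagedChildren implements Iterable<String> {

    /** One page of child references plus the key of the next page (null if last). */
    record Page(List<String> childKeys, String nextPageKey) {}

    private final Page firstPage;
    private final Function<String, Page> pageLoader; // stands in for a connector call

    PagedChildren(Page firstPage, Function<String, Page> pageLoader) {
        this.firstPage = firstPage;
        this.pageLoader = pageLoader;
    }

    @Override
    public Iterator<String> iterator() {
        return new Iterator<>() {
            private Page page = firstPage;
            private int index = 0;

            @Override
            public boolean hasNext() {
                // Load the next page only once the current one is exhausted.
                while (index >= page.childKeys().size() && page.nextPageKey() != null) {
                    page = pageLoader.apply(page.nextPageKey());
                    index = 0;
                }
                return index < page.childKeys().size();
            }

            @Override
            public String next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                return page.childKeys().get(index++);
            }
        };
    }
}
```

Fetching the next page only inside `hasNext()` means that a caller who reads just the first few children never triggers a request to the external system for later pages.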

This commit adds new configuration options (in both the JSON configuration and the EAP subsystem) for running the document optimization process as a scheduled background thread. The scheduling-related fields are identical to the garbage collection fields, and there are two new fields specific to document optimization: the target number of child references in each page, and the tolerance allowed between the actual and target numbers before optimization kicks in. Currently neither field has a default, so users must set both.
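
One plausible reading of the target and tolerance fields is sketched below: a block is split when it holds more than target + tolerance references and merged when it holds fewer than target - tolerance. The exact rule ModeShape applies may differ; `OptimizationPolicy` and its methods are hypothetical.

```java
/**
 * Hypothetical sketch: how a target and a tolerance could decide whether a block of
 * child references should be split, merged, or left alone. Not ModeShape's actual rule.
 */
public class OptimizationPolicy {

    enum Action { NONE, SPLIT, MERGE }

    private final int target;
    private final int tolerance;

    OptimizationPolicy(Integer target, Integer tolerance) {
        // Neither value has a default, so both must be supplied explicitly.
        if (target == null || tolerance == null) {
            throw new IllegalArgumentException("Both target and tolerance must be set");
        }
        if (target <= 0 || tolerance < 0) {
            throw new IllegalArgumentException("target must be positive and tolerance non-negative");
        }
        this.target = target;
        this.tolerance = tolerance;
    }

    /** Split above target + tolerance, merge below target - tolerance, otherwise do nothing. */
    Action decide(int childCount) {
        if (childCount > target + tolerance) {
            return Action.SPLIT;
        }
        if (childCount < target - tolerance) {
            return Action.MERGE;
        }
        return Action.NONE;
    }

    public static void main(String[] args) {
        OptimizationPolicy policy = new OptimizationPolicy(1000, 100);
        System.out.println(policy.decide(1200)); // SPLIT
        System.out.println(policy.decide(950));  // NONE
        System.out.println(policy.decide(850));  // MERGE
    }
}
```

The tolerance band keeps the optimizer from repeatedly restructuring blocks whose size hovers near the target. A real optimizer would also only merge when there is an adjacent block to merge with.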

At this time, the document optimization process is DISABLED by default. Enabling it requires setting both fields, and doing so results in INFO-level log messages stating that this is a technology preview that should not be used in production. Therefore, the risk of incorporating these changes into the codebase is relatively low.
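
The opt-in behavior (disabled unless both fields are set, an INFO-level warning when enabled, then periodic runs) can be sketched with a standard `ScheduledExecutorService`. The class, parameters, and log text are hypothetical, and the schedule is simplified to an initial delay plus a fixed interval rather than ModeShape's actual garbage-collection-style fields.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;
import java.util.logging.Logger;

/**
 * Hypothetical sketch of an opt-in background optimization job. The class, parameters,
 * and log text are illustrative; they are not ModeShape's code or configuration fields.
 */
public class OptimizerScheduler {

    private static final Logger LOGGER = Logger.getLogger(OptimizerScheduler.class.getName());

    /** Schedules the pass only when both values are set; returns null while disabled (the default). */
    static ScheduledFuture<?> maybeSchedule(ScheduledExecutorService executor,
                                            Integer childCountTarget,
                                            Integer childCountTolerance,
                                            long initialDelay, long interval, TimeUnit unit,
                                            Runnable optimizationPass) {
        if (childCountTarget == null || childCountTolerance == null) {
            return null; // disabled: nothing is scheduled
        }
        // Make the tech-preview status visible in the logs at INFO level.
        LOGGER.info("Document optimization is a technology preview and should not be used in production");
        // Periodic runs: an initial delay followed by a fixed interval.
        return executor.scheduleAtFixedRate(optimizationPass, initialDelay, interval, unit);
    }

    public static void main(String[] args) throws InterruptedException {
        ScheduledExecutorService executor = Executors.newSingleThreadScheduledExecutor();
        try {
            // Not configured: stays disabled, so this prints true.
            System.out.println(maybeSchedule(executor, null, null, 0, 1, TimeUnit.HOURS, () -> {}) == null);
            // Configured: the first pass runs right away, then once per hour.
            CountDownLatch firstPass = new CountDownLatch(1);
            maybeSchedule(executor, 1000, 100, 0, 1, TimeUnit.HOURS, firstPass::countDown);
            System.out.println(firstPass.await(5, TimeUnit.SECONDS));
        } finally {
            executor.shutdownNow();
        }
    }
}
```

`scheduleAtFixedRate` never runs two executions of the same task concurrently; if a pass overruns the interval, the next one simply starts late.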

However, with these changes users can now experiment with this feature and help us test it and identify/fix problems. It is not clear how long the feature will remain a 'tech preview', but it will remain so only until we're much more satisfied with the stability and quality of the implementation AND have better defaults for the target and tolerance fields (based upon experimental testing).

rhauch merged commit 5f0cb78 into ModeShape:master Jul 10, 2013

rhauch commented Jul 10, 2013

Merged into the 'master' branch, and the first commit (89bd15f) cherry-picked onto the '3.3.x' branch.
