Fix deep stack sizes when serializing some schemas #331

stevedlawrence
Member

The "parents" val in a DPathCompileInfo is a backpointer to every
DPathCompileInfo that references it. When elements are shared, these
backpointers create a highly connected graph, and default Java
serialization needs a large stack to serialize it because it jumps
back and forth between parents and children. To avoid this large stack
requirement, we make the parents backpointer transient. This prevents
serialization from jumping back up to parents, so the required stack
depth is proportional to the schema depth. Once every DPathCompileInfo
has been serialized, we manually traverse all of them again and
serialize the parent sequences (via the serializeParents method).
Because every DPathCompileInfo is already serialized, this pass only
serializes the Sequence objects, and the stack depth is again
proportional to the schema depth.
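
The two-pass approach described above can be sketched in plain Java. This is a minimal illustration and not Daffodil's actual Scala code: the `Node` class, the `allNodes` traversal, and the `save`/`load` helpers are hypothetical names chosen for the example. It assumes that both sides can walk the child links in the same deterministic order, so the parent lists written in the second pass can be matched back to their nodes on load.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Set;

public class TransientParentsSketch {

    // Hypothetical stand-in for a DPathCompileInfo-like object.
    static final class Node implements Serializable {
        private static final long serialVersionUID = 1L;
        final String name;
        final List<Node> children = new ArrayList<>();
        // transient: default serialization never follows this backpointer.
        transient List<Node> parents = new ArrayList<>();

        Node(String name) { this.name = name; }

        void addChild(Node child) {
            children.add(child);
            child.parents.add(this);
        }
    }

    // Pre-order walk over child links only; each shared node is visited once.
    static List<Node> allNodes(Node root) {
        List<Node> ordered = new ArrayList<>();
        Set<Node> seen = Collections.newSetFromMap(new IdentityHashMap<Node, Boolean>());
        collect(root, seen, ordered);
        return ordered;
    }

    private static void collect(Node node, Set<Node> seen, List<Node> ordered) {
        if (!seen.add(node)) return;
        ordered.add(node);
        for (Node child : node.children) collect(child, seen, ordered);
    }

    static byte[] save(Node root) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(bytes)) {
            // Pass 1: serialize the graph through child links only.
            oos.writeObject(root);
            // Pass 2: write each parent list. Every node is already in the
            // stream, so the list elements are written as back-references.
            for (Node node : allNodes(root)) oos.writeObject(node.parents);
        }
        return bytes.toByteArray();
    }

    @SuppressWarnings("unchecked")
    static Node load(byte[] data) throws IOException, ClassNotFoundException {
        try (ObjectInputStream ois = new ObjectInputStream(new ByteArrayInputStream(data))) {
            Node root = (Node) ois.readObject();
            // Transient fields come back as null; restore them in the same order.
            for (Node node : allNodes(root)) node.parents = (List<Node>) ois.readObject();
            return root;
        }
    }

    public static void main(String[] args) throws Exception {
        Node root = new Node("root");
        Node a = new Node("a");
        Node b = new Node("b");
        Node shared = new Node("shared");
        root.addChild(a);
        root.addChild(b);
        a.addChild(shared);
        b.addChild(shared);

        Node copy = load(save(root));
        Node sharedCopy = copy.children.get(0).children.get(0);
        System.out.println(sharedCopy.name + " has " + sharedCopy.parents.size() + " parents");
    }
}
```

Both passes must use the same `ObjectOutputStream`. A separate stream would have no record of the nodes already written and would serialize them a second time instead of emitting back-references.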

On complex schemas, this change reduced the stack size needed during
serialization by an order of magnitude.

DAFFODIL-2283

Contributor

@mbeckerle mbeckerle left a comment

+1 Maybe a minor simplification possible.

Contributor

@bsloane1650 bsloane1650 left a comment

+1

@stevedlawrence stevedlawrence force-pushed the daffodil-2283-save-schema-stack-size branch from 99edd5d to 6872456 Compare March 9, 2020 16:47
@stevedlawrence stevedlawrence merged commit 1267898 into apache:master Mar 9, 2020
@stevedlawrence stevedlawrence deleted the daffodil-2283-save-schema-stack-size branch March 9, 2020 16:48
3 participants