Skip to content

Third Cut at Simplfying Layers - uses reflection to simpify variable handling#1176

Closed
mbeckerle wants to merge 40 commits intoapache:mainfrom
mbeckerle:daf-2845-layers
Closed

Third Cut at Simplfying Layers - uses reflection to simpify variable handling#1176
mbeckerle wants to merge 40 commits intoapache:mainfrom
mbeckerle:daf-2845-layers

Conversation

@mbeckerle
Copy link
Contributor

@mbeckerle mbeckerle commented Mar 3, 2024

Please look first at the code for the layers themselves in daffodil-runtime1-layers, and in daffodil-test.

Then look at the ".api" packages for anything in them that needn't be there.

Still TBD:

  • a possible namespace bug: bswap prefix vs. bs prefix for byteSwapLayer.dfdl.xsd
  • change dfdlx:layerTransform to dfdlx:layer and vals/vars/args named layerTransformer to either layer or layerFactory
  • review TODO/FIXME and a few commented out code sections I've not yet revisited.

layer API is all defined in Java clases and interfaces now.

Reworked to use DFDL variables for all
communications into the layer code.

Split layers so that layer limiting is its own layer (for boundary mark
and there is also a fixedLength layer).

There are two ways to do explicit layer limiting, that differ in an
important way.

If you use 'implicit' layer length (no restriction layer), and constraint
the layer length by a surrounding element of specified length, then the
layer length is controlled when parsing, but NOT controlled when unparsing.

If you use the layer "fixedLength", that dictates the parse and unparse length
to be the value of the layerLength variable. Similarly if you use a checksum
layer built with the ChecksumLayer base class, the length is controlled
for both parsing and unparsing.

DAFFODIL-2845
static initializer differences of some sort.

DAFFODIL-2845
Note that The "@throws" seem to be particularly problematic, so I've
just omitted those from the javadoc.

DAFFODIL-2845
Two tests fail still

test_checkDigit_01
test_checkDigit_01u

Both fail with circular deadlock.
nothing is ever writing the checkDigit variable.

This makes sense because the checkDigit isn't a fixed length layer in the same sense that
IPv4 is exactly 20 bytes.

You can't count-down the length on unparsing and trigger the close.

I think.

Remove LayerRuntime.getCharset

Rename wrap methods with simpler names.

Convert to reflection-based system for variables
and constructor args, and result variable getters.

Problem is....Don't know where to call the
thing that gets the results and assigns to the
result variables. for unparsing specifically.

When do you know a layer is complete
and you can grab its result values?
(In general, not for checksums only)

Notes on test that are needed
- a layer with a layer parameter of every type
- and a return variable of every type.

A layer with mismatches
- extra vars not corresponding constructor arg,
- wrong type constructor arg,
- no corresponding result getter,
- wrong type result getter.

- extra parameter args with no corresponding var.
- extra getter with no corresponding var.

DAFFODIL-2845
@mbeckerle mbeckerle requested a review from stevedlawrence March 3, 2024 16:50
Copy link
Member

@stevedlawrence stevedlawrence left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Really happy with how this. Just a bunch of thoughts. Only real question is if things are made more simple by using a setDFDLVariables function for input variables instead of using alternative constructor.

// In addition, this process of verifying the DFDL variables used by the layer
// pre-computes some data structures that facilitate fast run-time processing

val lvr = computeLayerVarsRuntime(lri.layerRuntimeData, spiLayerInstance)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it possible to have the LayerRegistry store the LVR information? So when it loads a layer, it immediately computes the LVR and stores it in the registry along with the Layer instance? So the registry acts as this cache and it's checked immediately when a layer instance is created? I'm not sure when the registry is loaded, but it would be good to trigger that on schema reload to make sure everything is checked if the classpath changed.

So this just becomes something like:

val (layerSPI, lvr) = LayerRegistry.findLayer(...)

In fact, do we even need the Layer instance from the registery? Doesn't lvr have everything needed to create an actual new instance?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This one I have to think about. Perhaps we can do this. Once we have the layer we have the layer namespace, and that will let us get the variables if we have the data processor. That's the issue. Do we have the data processor object (with schema set runtime data) already in hand at this point, or is that 'in process' of being created?

s: InputSourceDataInputStream,
layerRuntimeImpl: LayerRuntimeImpl,
): Unit = {
layerVarsRuntime.callGettersToPopulateResultVars(layer, layerRuntimeImpl)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was a little confused that the LayerDriver handles getting output variables but the LayerFactory handles setting input variables. I guess it makes sense since the Factory creates the Layer and so it needs the input variables, but I wonder if it would be more obvious if the driver handled both? Maybe this is an argument for a special setDFDLVariables function that is called after construction to set all variables? Then the factory get just get the lvr and create a Driver, and the driver handles both setting and getting variables?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The LayerFactory has to get variable values. The LayerDriver has to call the getters to assign the output variables.

This is ordinary I think. One could do something more java-beans flavored, and have setters for the args and getters for the results, but I think that's clumsier. Never did like java beans approach. So many lines of code to do nothing much.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The more I think about this, I actually like setters for the args and getters for the results. Using a setter gives a single place to do input error detection and avoids the dummy constructor stuff in scala implementations. Implementing a setter in scala feels a bit dirty, but this is a Java API where setters are normal. And you need mutable state for any getters, so more mutable state for setters is at least consistent.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I would prefer one setter for all params. Only a zero-arg default constructor is needed then.

But we still need to compute the call to the setter at schema compile time, which checks the types and number and names of the parameters against the variable definintions.

It's much closer to the java-beans approach, ie., 100% of making sure all the parameters are correct is on the user's back, vs the constructor approach, but both are viable.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is cleaned up next push of changes. The LayerDriver class now does all the invocation of the setter for vars and getter for results. The LayerFactory just invokes the LayerRuntimeCompiler and creates the Layer object instance, and LayerDriver that encapsulates it.

)
case pe: ParseError =>
state.setFailed(pe)
case e: Exception =>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could a runtime SDE be caught here? Should that be treated differently than LayerUnexpectedException?

Copy link
Contributor Author

@mbeckerle mbeckerle left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm addressing most of your comments in the next commits.

There is obviously layer doc to write, and tests to cover many areas must be created.

I plan to first investigate other layer implementations that exist in house at Owl to make sure those will convert easily.

Then I will return to the testing, doc, and the last few unresolved issues for which the comments are not resolved.

<appinfo source="http://www.ogf.org/dfdl/">

<dfdl:defineVariable name="boundaryMark" type="xs:string"/>
<dfdl:defineVariable name="layerEncoding" type="xs:string"/>
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should it be there or in the javadoc?

For DFDL schema users it should be in these files I think. I have not written this yet however.


override def wrapLayerOutput(jos: OutputStream, lr: LayerRuntime): OutputStream = {

if (fixedLength > maxFixedLength)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is not worth it.

It's better to just construct the object at schema compile time so as to verify everything works, and then do it again, once, on loading. The latter check is really only needed for the chance case that you use the wrong jar/class files that don't match the schema. And even in that case the check just verifies that the number of param args and their types and the return getters and their types haven't changed. You could still have the wrong jar.

// In addition, this process of verifying the DFDL variables used by the layer
// pre-computes some data structures that facilitate fast run-time processing

val lvr = computeLayerVarsRuntime(lri.layerRuntimeData, spiLayerInstance)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This one I have to think about. Perhaps we can do this. Once we have the layer we have the layer namespace, and that will let us get the variables if we have the data processor. That's the issue. Do we have the data processor object (with schema set runtime data) already in hand at this point, or is that 'in process' of being created?

s: InputSourceDataInputStream,
layerRuntimeImpl: LayerRuntimeImpl,
): Unit = {
layerVarsRuntime.callGettersToPopulateResultVars(layer, layerRuntimeImpl)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The LayerFactory has to get variable values. The LayerDriver has to call the getters to assign the output variables.

This is ordinary I think. One could do something more java-beans flavored, and have setters for the args and getters for the results, but I think that's clumsier. Never did like java beans approach. So many lines of code to do nothing much.

Copy link
Contributor Author

@mbeckerle mbeckerle left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@stevedlawrence Please take a look how this came out.

Now uses one setter to pass parameter variables to the layer.

Also eliminated the need to use the boxed JInt-like types in the layer definitions.

Just a few refactorings to move code around are left undone still from the review comments.

<appinfo source="http://www.ogf.org/dfdl/">

<dfdl:defineVariable name="boundaryMark" type="xs:string"/>
<dfdl:defineVariable name="layerEncoding" type="xs:string"/>
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree a README is the right thing.

We have to make sure the README is included in the delivered jars though.

Copy link
Member

@stevedlawrence stevedlawrence left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the way the setLayerVariable turned out. It definitely feels weird when you are so used to the scala way, but I think Java devs will feel right at home, and it does clean up scala implementations a bit. And doses't feel like it added much extra complication.

Couple minor questions that these changes made me think about.

*/
def setLayerVariableParameters(boundaryMark: String, layerEncoding: String): Unit = {
this.boundaryMark = boundaryMark
this.layerEncoding = layerEncoding
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we pass in LayerRuntime into this function for so users can throw errors here? For example, this might want to be something like:

private var boundaryMark: String = _
private var charset: Charset = _

def setLayerVariableParameters(lr: LayerRuntime, boundryMark: String, layerEncoding: String): Unit = {
  this.boundaryMark = boundaryMake
  this.charset = try {
    Charset.forName(layerEncoding)
  } catch {
    case e: Exception => lr.runtimeSchemaDefinitionError(...)
  } 
}

Or alternatively, maybe setLayerVariableParameters can throw an exception (maybe IllegalArgumentException or a custom LayerBadVariableException?) if any variables values don't make sense for a layer? And the layer backend swould catch that and convert it to an SDE?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can say any exception thrown from this becomes a SchemaDefinitionError. I added that as the defined way to do this. I didn't want to muddy the waters with an arg that isn't about a variable.

}

private def check(lr: LayerRuntime): Unit = {
if (fixedLength < 1)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Question above about the possibiity for setLayerVariableParameters to perform validation checks would mean this check function can go away, and wrap functions can always assume valid state.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this is gone. In the IPV4Checksum example, the setLayerVariableParameters has no args and just serves to initialize.

*
* @param lr The LayerRuntime instance.
*/
protected def init(lr: LayerRuntime): Unit = {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If setVariable can check errors I think init also goes away?

Then setLength would create failures if length is <= 0 however setVariable would?

I guess we still need a check to make sure implementations actually call setLength, so the wrap functions do need some kind of initialization check. Maybe you could change the var to a JInt that is initialized to null, and the wrap functions could do something simple like Assert.invariant(length != null). Just trying to think of ways to makes things as simple as possible.

Also, it feels like if a user doesn't cal setLength, that wants to be an error more aggressive than a processing error. That's a bug in the layer and needs to be fixed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not calling setLength is an SDE now, and check is gone.


lrd.context.SDE(tooManyConstructorsMsg)
}
val constructor = c.getConstructors.find { c => c.getParameterCount == 0 }.getOrElse {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can this be simplified to this to get the zero arg constructor:

val constructor = c.getConstructor()

In order for SPI to have found the layer and gotten this far, and created an instance this must return something, right?

Assert.invariantFailed("SPI-loaded class without default constructor? Not possible.")
}
val allMethods = c.getMethods
val optParamSetter = allMethods.find { _.getName == varParamSetter }
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we check if there there are multiple overloaded varParamSetter an error?

* This will be called automatically to retrieve the integer value that was returned from the `compute` method, and
* the DFDL variable 'digest' will be assigned that value.
*/
public abstract class ChecksumLayer extends ChecksumLayerBase {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need a comment that implementations must call setLength prior to wrap being called?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed.

@@ -63,11 +63,11 @@
* <p/>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

THis doc needs updating to use the new setVariable method.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated in next commits

* For example, a result value getter for a DFDL variable named 'checksum' of type xs:unsignedShort would be:
* <pre>
* Int getDFDLResultVariable_checksum() {
* Int getLayerVariableResult_checksum() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do these set/getVariable functions need o be public, or can reflection still find them if they are package private?

Also, it might be worth using int instead of Int since this is javadoc.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

package private works also. I've changed all ours, since those will be used as examples.

* @return the check digit value
*/
def getDFDLResultVariable_checkDigit: JInt =
def getLayerVariableResult_checkDigit: JInt =
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just curious if this must be a Jint or would a primitive Int work too? I think it would work with the new changes? Should our examples use primitive int since I imagine that what most Java devs would find more natural?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Question has not been resolved yet.

I got this far only by skimming; I didn't have time to review 2/3s of the PR thoroughly. I will accept that the changes work since the tests pass.

isUnparse: Boolean,
byteBuffer: ByteBuffer,
): Int = {
init(layerRuntime)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not a huge fan of this init, it feels an awful lot like the check() function that we tried to remove. Maybe it's required in this case? I did mention a possible alternative in another comment where the set variable function calls setLength, but I guess the issue here is this layer don't have the a set variable function to call setLength since this layer has no variables?

A thought, does it work to have an empty set variable functions? For example, maybe this checksum layer implementation could do:

def setLayerVariableParameters(): Unit = {
  setLength(20)
}

So the documentation for ChecksumLayer api says all implementations must implement a setLayerVariableParameters function and must call setLength(). If the layer has no variables, the function can have an empty parameter list.

So setLayerVariablParameters kindof becomes a layers init() function, and it can be parameterized by variables or not?

The parameter setter is now the
initialize method and is called even
if it has zero args.
LayerSchemaCompiler does schema-compile-time checking and constructs a LayerFactory

LayerRuntimeCompiler does
 runtime compiling of the construct, parameter/var setting,
 result/var getting structures.

 Both LayerSchemaCompiler and
 LayerFactory invoke the LayerRuntimeCompiler, in the former case to do checking.

 LayerRegistry is gone, merged into LayerFactory
Copy link
Contributor Author

@mbeckerle mbeckerle left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is ready for a significant review now.
@tuxji now would be worthwhile to add your perspective
@pkatlic please take a look
@olabusayoT please take a look - focus on the built in and example layers, and the APIs they use more than the implementation, though of course comments are welcome on that also.

Not done:

  • user documentation of these layers
  • clean up of this AIS example - which is using internals of BitsCharset so isn't really a clean user example
  • tests that pass every type of variable, multiple getters that get every type of result
  • negative tests to excite all the error situations (which codecov will spot when those error paths aren't used.
  • at Owl we have several layers that are in-house things, and I haven't yet re-done those to match this, but I'm pretty confident they will work fine, as they are just a variety of proprietary checksums, and one interesting zip-file-layer one.

*/
def setLayerVariableParameters(boundaryMark: String, layerEncoding: String): Unit = {
this.boundaryMark = boundaryMark
this.layerEncoding = layerEncoding
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can say any exception thrown from this becomes a SchemaDefinitionError. I added that as the defined way to do this. I didn't want to muddy the waters with an arg that isn't about a variable.

s: InputSourceDataInputStream,
layerRuntimeImpl: LayerRuntimeImpl,
): Unit = {
layerVarsRuntime.callGettersToPopulateResultVars(layer, layerRuntimeImpl)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is cleaned up next push of changes. The LayerDriver class now does all the invocation of the setter for vars and getter for results. The LayerFactory just invokes the LayerRuntimeCompiler and creates the Layer object instance, and LayerDriver that encapsulates it.

* This will be called automatically to retrieve the integer value that was returned from the `compute` method, and
* the DFDL variable 'digest' will be assigned that value.
*/
public abstract class ChecksumLayer extends ChecksumLayerBase {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed.

boolean isUnparse,
ByteBuffer byteBuffer
);
} No newline at end of file
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Turns out the code is tolerant of the unboxed types. I just had to make our own type checking tolerate it.

Javadoc for Layer has been updated to document the getter.

@@ -63,11 +63,11 @@
* <p/>
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated in next commits

* For example, a result value getter for a DFDL variable named 'checksum' of type xs:unsignedShort would be:
* <pre>
* Int getDFDLResultVariable_checksum() {
* Int getLayerVariableResult_checksum() {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

package private works also. I've changed all ours, since those will be used as examples.

<appinfo source="http://www.ogf.org/dfdl/">

<dfdl:defineVariable name="boundaryMark" type="xs:string"/>
<dfdl:defineVariable name="layerEncoding" type="xs:string"/>
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have NOT addressed this yet.

}

private def check(lr: LayerRuntime): Unit = {
if (fixedLength < 1)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this is gone. In the IPV4Checksum example, the setLayerVariableParameters has no args and just serves to initialize.

*
* @param lr The LayerRuntime instance.
*/
protected def init(lr: LayerRuntime): Unit = {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not calling setLength is an SDE now, and check is gone.

@mbeckerle mbeckerle requested review from olabusayoT and tuxji March 8, 2024 22:43
Copy link
Contributor

@tuxji tuxji left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems to be getting to a good point for merge, although unfinished parts should be completed first. I do wish that there was a way to split this PR into a series of several smaller PRs with cohesive changes in each PR since that would make the numerous changes easier to review. I skimmed 2/3s of the PR's changes.

protected def emptyFormatFactory = new DFDLSequence(newDFDLAnnotationXML("sequence"), this)

final lazy val <sequence>{apparentXMLChildren @ _*}</sequence> = (xml \\ "sequence")(0)
final lazy val <sequence>{apparentXMLChildren @ _*}</sequence> = (xml \\ "sequence").head
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A comment might be nice to explain this unusual syntax. I normally expect the "final lazy val" to be followed by an identifier, but I see what appears to be a pattern deconstruction expression instead. I believe that the pattern expression on the left side is deconstructing the XML Node on the right side, stripping off the <sequence>...</sequence> tags and initializing apparentXMLChildren with whatever was between the <sequence>...</sequence> tags. Correct?

I also suppose that if xml doesn't contain a <sequence> node as the parent node or as a child node below the parent node (which one is xml \\ "sequence" looking for?), then either the DPath expression or the pattern deconstruction expression will fail and throw an exception. Is the exception caught higher up and turned into a schema definition error?

override def separatorPosition: SeparatorPosition = SeparatorPosition.Infix

override def isLayered: Boolean = false
override def layerOption: Option[RefQName] = None
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My browser sees only 4 places where layerOption occurs in this GitHub page (even after loading all the diffs hidden from view). Of these 4 places, this appears to be the only place which initializes layerOption to anything, and it's initializing layerOption to None. A codecov warning also says this line is not called, so where else is the codebase actually setting layerOption to anything at all?

val (_, actual) = TestUtils.testBinary(sch, data, areTracing = false)
TestUtils.assertEqualsXMLElements(infoset, actual)

TestUtils.testUnparsingBinary(sch, infoset, data)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I noticed that there are not pairs of assertEquals on both the infoset and the data like I would expect, but it turns out that the TestUtils.testUnparsingBinary has its own assertEquals checking that the data round trips back to the same data.

* All layers are derived from this class, and must have no-args default constructors.
* <p/>
* Derived classes will be dynamically loaded by Java's SPI system.
* The names of concrete classes derived from Layer are listed in a resources/M.services file
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

*resources/META-INF/services

* @return the check digit value
*/
def getDFDLResultVariable_checkDigit: JInt =
def getLayerVariableResult_checkDigit: JInt =
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Question has not been resolved yet.

I got this far only by skimming; I didn't have time to review 2/3s of the PR thoroughly. I will accept that the changes work since the tests pass.

private val deprecatedProperties: Seq[DeprecatedProperty] = Seq(
DeprecatedProperty(XMLUtils.DFDL_NAMESPACE, "layerTransform", "dfdlx:layerTransform"),
private def deprecatedProperties: Seq[DeprecatedProperty] = Seq(
DeprecatedProperty(XMLUtils.DFDL_NAMESPACE, "layer" + "Transform", "dfdlx:layer"),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be changed back to "layerTransform" from "layer" + "Transform"?

}
afterBaos.close()
val actualAfterDelim = new String(afterBaos.toByteArray())
val rls = new RegexLimitingInputStream(bais, "\\r\\n(?!(?:\\t|\\ ))", "\r\n", utf8)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since these pattern strings are used several times here, should they be defined as a constant?

// regex is CRLF not followed by a tab or space.
//
val rls = new RegexLimitingStream(bba, "\\r\\n(?!(?:\\t|\\ ))", "\r\n", iso8859)
val rls = new RegexLimitingInputStream(bba, "\\r\\n(?!(?:\\t|\\ ))", "\r\n", iso8859)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similar question as above, should these pattern strings be moved to a common constant?

We don't care that these exceptions come from reflective calls.
We want to process them based on what
they are really.

org.apache.daffodil.charsets.BitsCharset_ISO_8859_1_Reverse_Definition
org.apache.daffodil.charsets.BitsCharsetTest_ISO_8859_13_Definition
org.apache.daffodil.charsets.BitsCharsetTest_ISO_8859_13_Definition No newline at end of file
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks to be missing a newline at the end of the file.

Copy link
Contributor

@olabusayoT olabusayoT left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great +1 with a small change

Comment on lines 73 to 76
<dfdl:assert
failureType="recoverableError"
test='{ checkDigit eq $cd:checkDigit }'
message='{ fn:concat("Incorrect check digit: ", checkDigit, ". Should be: ", $cd:checkDigit, ".") }'/>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In a separate project, we established that xmlllint didn't recognize the asserts as validation errors and decided instead to use an "invalid" element that only showed up when the checksum/checkdigit was incorrect, and had an enum facet that would always fail. I think showing that in this example might be the better approach.

The layer now has the LayerRuntime
as a member, so it does not need
to be passed to methods, and the layer
class itself can have error reporting methods.
Merged LayerRuntime and LayerRuntimeImpl.
Not needed, functionality was redundant.
Per review comment. Uses
an invalid element now.
@mbeckerle mbeckerle marked this pull request as draft March 13, 2024 03:35
AllTypes test works.

Several positive tests that set the stage
for negative tests working.
These tests are all where the layer and the layer vars are mismatched, or the
Layer class otherwise has errors in its construction.
Bomb-out test rig helps
exercise failure conditions.

2 important tests are failing.
(See FIXME comments in TestLayers.scala)
Added close() to InputSourceDataInputStream, as we weren't closing these.

Added try/catch unwind protection to Text sub-parser to ensure resource recovery even if the underlying layer read call fails via a throw.
Added appropriate try/catch to defend against bad
behavior by layer code.

3 tests failing still
Discovered that LayerVarsRuntime is holding onto VRDs for the
variables of the layer invocation. This doesn't work because
the VRDs are different for newVariableInstances from those of the
original variable declarations.

Variables are accessed by grabbing an index out of the VRD, and this
index was incorrect as the wrong VRDs were being used.

Right now, variables are being accessed by qname, which does a linear
search down the list of current VRDs for this context to get the correct
VRDs.

Ultimately, the way LayerVarsRuntime caches the "real" VRDs needs to be
updated, so that when they are dereferenced their real VRDs are captured.

There are still 3 tests failing related to backtracking - somehow the wrong layer data stream is getting the backtracking Mark, so when we backtrack, the stack of marks is empty.
Protecting every place that layer code is called from any sort of error/throw
requires quite a number of try/catch blocks.
Protecting every place that layer code is called from any sort of error/throw
requires quite a number of try/catch blocks.

Also, while the Layer SPI cache can be a singleton,
the LayerVarsRuntime cache cannot. It must be per schema set, since
it stores Variable VRDs, which for the same QName will be different
for different schemas.
@mbeckerle
Copy link
Contributor Author

Closing. These comments have been fixed/addressed.

This PR subsumed by

#1187

which was then subsumed by

#1191

@mbeckerle mbeckerle closed this Mar 28, 2024
@mbeckerle mbeckerle deleted the daf-2845-layers branch March 28, 2024 17:37
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants

Comments