Make include/import logic more precise and consistent#1067
Conversation
mbeckerle
left a comment
There was a problem hiding this comment.
+1
Minor suggestions for comment enhancement.
| * optContextURI must be provided or a usage exception is thrown. | ||
| * | ||
| * Returns None if this fails to resolve the schemaLocation to an URI that exists. If this | ||
| * successfully resolves, it returns a Some containing a tuple of the URI and a boolean that |
There was a problem hiding this comment.
It's worth using the word "compatibility" in here somewhere just for the keyword search value.
| SDW( | ||
| WarnID.DeprecatedBuiltInFormats, | ||
| "schemaLocation property uses deprecated include/import of edu/illinois/ncsa/daffodil/xsd/built-in-formats.xsd. Use org/apache/daffodil/xsd/DFDLGeneralFormat.dfdl.xsd instead and change the dfdl:format ref to \"GeneralFormat\".", | ||
| "schemaLocation property uses deprecated include/import of edu/illinois/ncsa/daffodil/xsd/built-in-formats.xsd. Use /org/apache/daffodil/xsd/DFDLGeneralFormat.dfdl.xsd instead and change the dfdl:format ref to \"GeneralFormat\".", |
There was a problem hiding this comment.
It has been years. I think we should get rid of this feature. We need to add that to a release note if we do, because it's a feature being removed.
There was a problem hiding this comment.
Seems resonable, this PR is probably a reasonable time to remove it. I'll add it to the Deprecation/Compatibility section.
| val optURI = Misc.getResourceRelativeOption(sysId, baseURI) | ||
| // We did not get a catalog resolution of the nsURI. We now look for the systemID (which | ||
| // comes from the schemaLocation attribute) on classpath or as a file. Note that we | ||
| // ignore if boolean |
There was a problem hiding this comment.
ignore if boolean..... what
There was a problem hiding this comment.
This "note" can be removed. It was saying that the boolean part of the tuple returned by resolveSchemaLocation can be ignored since we look at it elsewhere, but I move it as a comment a couple lines below and I guess I messed up removing this sentence. I'll remove this.
| * is true if a relative schemaLocation path was resolved absolutely on the classpath and | ||
| * false otherwise. This capability may be disabled in the future. | ||
| */ | ||
| def resolveSchemaLocation( |
There was a problem hiding this comment.
We should consider factoring out a daffodil-resolver library that contains the resolver stuff, and this utility.
Actually I guess breaking up daffodil-lib into compile-time and runtime parts is a separate topic, but I think it needs to happen.
There was a problem hiding this comment.
Agreed. I think he biggest obstacle is the XML catalog support, which I think is handled slightly different in the DaffodilXMLLoader and in the IIBase stuff. It also requires state for finding/loading the XML Catalog, so it felt a bit bigger task than I wanted to take on in this PR, which is relatively small and localized. I'll open a clean up ticket for it.
tuxji
left a comment
There was a problem hiding this comment.
Right now Main is duplicating some of the same code as XMLUtils.resolveSchemaLocation but not doing it entirely correctly. The best thing would be for Main to call that function too, but if it can't, then the Main code needs to be tweaked and ideally some tests added (these new tests won't be necessary if Main calls the XMLUtils function).
| Some(file.toURI) | ||
| } else { | ||
| Misc.getResourceRelativeOption(s, None) | ||
| val optURI = None |
There was a problem hiding this comment.
This None.orElse{}... expression evaluates each alternative in turn, stopping when one returns a non-empty Option.
If you think a reader might appreciate the above line as a comment, please add it. I actually had to go look at the definition of .orElse in IDEA.
There was a problem hiding this comment.
Yeah, I wasn't too sure about the `None.orElse{}.orElse{}.orElse{} pattern. It felt kindof nice since it removed a bunch of if-else statements, but I'm not sure we use it anywhere else, so it does feel off. I'm definitely open to other suggestions, but will add a comment explaining this if there's nothing better.
| .orElse { | ||
| if (s.startsWith("/")) { | ||
| val resource = this.getClass.getResource(s) | ||
| Option(resource.toURI) |
There was a problem hiding this comment.
If the resource does not exist or is not visible due to security considerations, the getResource method returns null. Do you need a null check before calling resource.toURI and another None if the null check fails?
Also, lines 237 - 238 don't seem to be covered by tests.
There was a problem hiding this comment.
Yep, good catch. At one point this was just Option(resource), and null was fine. But the toURI is needed since the resource is a URL. I fixed it in the XMLUtils function but not here. Will fix.
I'll see if I can add some CLI tests that hit these code paths. These are important use cases we want to make sure we don't accidentally break. I don't think we had tests for them before and that's probably a mistake.
| val resource = this.getClass.getResource("/" + s) | ||
| if (resource != null) { | ||
| Logger.log.warn(s"Found relative path on classpath absolutely, did you mean /$s") | ||
| Some(resource.toURI) |
There was a problem hiding this comment.
If the previous .orElse code returns None because a resource cannot be found when s starts with "/", this code will add another "/" to s, still fail to find the resource, and return None again.
Should we combine these two .orElse cases by assigning an if expression to val resource?
val resource = if (s.startsWith("/")) s else "/" + sActually, you want to warn the user if a relative path is found absolutely on the classpath, but you can guard the warning with an if (!s.startsWith("/")).
Line 247 doesn't seem to be covered by tests too.
There was a problem hiding this comment.
My thinking was that when we want to remove this fallback support and result in a normal "could not find schema" error, then we can just remove this orElse block and nothing else needs to be modified. I don't feel strongly about this though, I can merge the logic.
There was a problem hiding this comment.
Your reasoning makes sense to me - let's just add a comment to this orElse block saying the entire block is fallback support that we may remove a few releases later.
| val optURI = Misc.getResourceRelativeOption(sysId, baseURI) | ||
| // We did not get a catalog resolution of the nsURI. We now look for the systemID (which | ||
| // comes from the schemaLocation attribute) on classpath or as a file. Note that we | ||
| // ignore if boolean |
There was a problem hiding this comment.
The sentence Note that we ignore if boolean is incomplete and I don't know what you mean by it. Do you mean Note that we ignore sysId's boolean value.?
There was a problem hiding this comment.
Mentioned above, but this note should just e removed. The original comment was moved a couple lines below and I guess I didn't delete this one correctly.
| val pr = tdp.parse(input, b.length * 8) | ||
| assert( | ||
| !pr.isProcessingError && pf.getDiagnostics.isEmpty, | ||
| !pr.isProcessingError, |
There was a problem hiding this comment.
Let's discuss whether to remove && pf.getDiagnostics.isEmpty or not. When we compile the generated C code into an executable and run that executable, the program will invariably print validation diagnostics if it detects validation problems like an element's value not matching its fixed attribute. However, the program returns an error exit status only if you pass -V on or -V limited, otherwise it returns a successful exit status. I wanted to fail these test cases and print the validation diagnostics if any are printed.
Note that right now, the TDML Runner doesn't pass any validation option to the daffodilC executable, but the TDML Runner does return any printed messages as validation diagnostics in the TDML result, so that the TDML Runner will do the right thing with any validation error clauses in the test cases.
There was a problem hiding this comment.
I originally removed this because of the new warnings about a missing leading slash that causes this to fail (which are fixed in the subsequent commit in this PR). So there were new schema compile-time diagnostics that this check didn't like.
the program will invariably print validation diagnostics if it detects validation problems like an element's value not matching its fixed attribute
Are these parse-time diagnostics that you expect to be empty? Not schema compile/code generation-time diagnostics? Maybe this should have be pr.getDiagnostics.isEmpty instead of pf.getDiagnostics.isEmpty? Would that be sufficient for the check you're looking for, since the new warnings should only be on the ProcessorFactory but not on the ParseResult (I think?).
There was a problem hiding this comment.
Yes, you're right, pf.getDiagnostics.isEmpty should be pr.getDiagnostics.isEmpty (parse-time diagnostics, not schema compile/code generation-time diagnostics).
The new function works slightly differently than how main works. I think the main different is that if the schema location is relative it requires a context URI to resolve relative to, or else it's an assertion failure. The CLI allows relative path locations without a context. Though, I wonder if maybe when called from the CLI, we could say the |
| cr match { | ||
| case Left(diagnostics) => fail(s"getProcessor failed: ${diagnostics.mkString}") | ||
| case Right((diagnostics, tdmlDFDLProcessor)) => | ||
| assert(diagnostics.isEmpty) |
There was a problem hiding this comment.
@tuxji, this place where I removed diagnostics is just about the getProcessor call and not about parse/unparse diagnostics. Is it safe to keep this one removed? Or maybe I an change it so that it only asserts that there are no error error diagnostics, so that new warnings that get created (like this one) do not break this test?
There was a problem hiding this comment.
We should fix the relative schemaLocation causing the schema compile-time warning if possible. If not, let's remove diagnostics.isEmpty to make the test more robust.
There was a problem hiding this comment.
I was hoping to leave all the updates to schemaLocations in a separate PR so that 1) it's easier to look at the real changes if we ever need to come back to it and 2) if we ever add a .git-blame-ignore-revs file then maybe the second commit that updates all the schemaLocations could go in that. I would also like to make sure all tests pass in both commits separately in case we ever need to do a git bisect. So I'd rather we remove or change the assert so this test passes without needing a schemaLocation updatew. I think something like this should work and is maybe reasonable?
assert(diagnostics.forall(!_.isError))It's maybe the same idea as what the original assert was checking for, but a bit more robust against new warnings that might be added in the future.
|
I've pushed a new commit that I think resolves all the issues. Also, I was able to to modify Main to use the same function by making the contextURI option a URI to the current working directory. |
tuxji
left a comment
There was a problem hiding this comment.
+1
A lot of the CLI tests failed on Windows for some reason, so you'll have to fix that problem first before you can merge.
The issue is that Windows paths are not valid URI's. I've pushed a commit that converts the schema path to a valid URI string, keeping it relative if possible so that resolveSchemaLocation can use its relative lookup logic. Makes things a bit more complicated in Main, but I've learned that windows paths are actually pretty complicated. |
|
Yes, the new commit has made the CLI tests pass. I wonder why only the Main class needed to be changed? What about the Java/Scala API, are there ways to pass paths into Daffodil's japi/sapi which also need similar code to transform Windows paths into URIs (or do these API accept only URIs, not String paths, anyway)? |
mbeckerle
left a comment
There was a problem hiding this comment.
+1
One suggestion for a comment.
| // This test makes sure a schema can be compiled from an absolute path on the filesystem | ||
| val schema = path( | ||
| "daffodil-cli/src/test/resources/org/apache/daffodil/cli/global_element.dfdl.xsd", | ||
| ).toAbsolutePath |
There was a problem hiding this comment.
Is this going to be a MS-Windows style path when this test is run on Windows?
I honestly did not know that. I assumed Java paths were somehow portable things at that point.
If so, please add comment like // on MS Windows this will be a windows-style path with backslash separators.
There was a problem hiding this comment.
Yes, the path method just does Paths.get(...) which supports unix-style separators even on windows systems. And when call toString on that Path to get a string to pass in to the CLI, the string contains windows style paths.
So Paths.get("/path/containing/unix/separators").toString will work on Linux or Windows, and return the right thing depending on the OS.
The Main class expects The The Scala/Java API |
We currently resolve include/import schema locations in two different places, one in the DaffodilCatalogResolver and the other in dsom IIBase. The logic is not obviously the same, so this adds a new XMLUtils function specifically for resolving a schemaLocation attribute. The more complex cases like catalog managers are not implemented here and are still done in the different classes, but this at least consolidates the search when catalog manager fails (which is the common case). This also slightly tweaks the schemaLocation logic to the following - If a schemaLocation is an absolute URI (i.e. has a scheme) we just look it up directly - If a schemaLocation is not absolute and the path starts with a /, then we first look for it on the classpath. If not found we look for the absolute path on the file system - If a schemaLocation is not absolute and the path does not start with a /, then we resolve the path relative to the importing file. If not found, we look for the relative path on the classpath absolutely, but with a deprecation warning. Note that this deprecates looking up relative include imports on the classpath. A schemaLocation that is expected to be in different jar on the classpath should be an absolute path. This removes any ambiguity about if a file should be resolved relative to the importing schema or on the classpath. Eventually we should remove the deprecated capability completely, but most schemas currently depend on this behavior. For example, many schemas have something like the following: <import schemaLocation="org/apache/daffodil/xsd/DFDLGeneralFormat.dfdl.xsd" /> But this really should be an absolute path, like so: <import schemaLocation="/org/apache/daffodil/xsd/DFDLGeneralFormat.dfdl.xsd" /> This also modifies the CLI to have similar logic, where we warn if a schema path is provided absolutely. Deprecation/Compatibility: Resolving relative import/include schemaLocation paths absolutely is now deprecated and will output a warning, and support may eventually be removed. For example, a common import is: <import schemaLocation="org/apache/daffodil/xsd/DFDLGeneralFormat.dfdl.xsd" /> Although this path is relative, previous versions of Daffodil allowed this to be found on the classpath absolutely, as if it started with a slash. This behavior is deprecated. If a schemaLocation is expected to be found on the classpath and not relative to the importing file, then it should start with a slash, for example: <import schemaLocation="/org/apache/daffodil/xsd/DFDLGeneralFormat.dfdl.xsd" /> Daffodil still supports the old behavior and will look up relative schemaLocatin paths absolutely, but will output a schema definition warning. Future versions of Daffodil may change this to an error. Deprecation/Compatability: Support for importing "edu/illinois/ncsa/daffodil/xsd/built-in-formats.xsd" has been removed. Schemas should instead import "/org/apache/daffodil/xsd/DFDLGeneralFormat.dfdl.xsd" and change dfdl:format to reference "GeneralFormat" DAFFODIL-2828
This removes deprecated import/include schemaLocations and associated warnings. DAFFODIL-2828
277029d to
d8a2683
Compare
We currently resolve include/import schema locations in two different
places, one in the DaffodilCatalogResolver and the other in dsom IIBase.
The logic is not obviously the same, so this adds a new XMLUtils
function specifically for resolving a schemaLocation attribute. The more
complex cases like catalog managers are not implemented here and are
still done in the different classes, but this at least consolidates the
search when catalog manager fails (which is the common case).
This also slightly tweaks the schemaLocation logic to the following
look it up directly
we first look for it on the classpath. If not found we look for the
absolute path on the file system
/, then we resolve the path relative to the importing file. If not
found, we look for the relative path on the classpath absolutely, but
with a deprecation warning.
Note that this deprecates looking up relative include imports on the
classpath. A schemaLocation that is expected to be in different jar on
the classpath should be an absolute path. This removes any ambiguity
about if a file should be resolved relative to the importing schema or
on the classpath. Eventually we should remove the deprecated capability
completely, but most schemas currently depend on this behavior. For
example, many schemas have something like the following:
<import schemaLocation="org/apache/daffodil/xsd/DFDLGeneralFormat.dfdl.xsd" />But this really should be an absolute path, like so:
<import schemaLocation="/org/apache/daffodil/xsd/DFDLGeneralFormat.dfdl.xsd" />This also modifies the CLI to have similar logic, where we warn if a
schema path is provided absolutely.
Deprecation/Compatibility:
Resolving relative import/include schemaLocation paths absolutely is now
deprecated and will output a warning, and support may eventually be
removed. For example, a common import is:
<import schemaLocation="org/apache/daffodil/xsd/DFDLGeneralFormat.dfdl.xsd" />Although this path is relative, previous versions of Daffodil allowed
this to be found on the classpath absolutely, as if it started with a
slash. This behavior is deprecated. If a schemaLocation is expected to
be found on the classpath and not relative to the importing file, then
it should start with a slash, for example:
<import schemaLocation="/org/apache/daffodil/xsd/DFDLGeneralFormat.dfdl.xsd" />Daffodil still supports the old behavior and will look up relative
schemaLocatin paths absolutely, but will output a schema definition
warning. Future versions of Daffodil may change this to an error.
DAFFODIL-2840
Note that I've left this as two separate commits, one deprecating relative schemaLocations and one changing all the imports in Daffodil tests. This should hopefully make review easiser by mainly focusing on the first commit