Skip to content

De-personalize Diagnostic File URLs in Compiled Schemas#1065

Merged
olabusayoT merged 1 commit intoapache:mainfrom
olabusayoT:daf-2195-depersonalize
Mar 20, 2024
Merged

De-personalize Diagnostic File URLs in Compiled Schemas#1065
olabusayoT merged 1 commit intoapache:mainfrom
olabusayoT:daf-2195-depersonalize

Conversation

@olabusayoT
Copy link
Contributor

  • update DFDLSchemaFile.uriString to remove any prefix from Classpath and replace $HOME with ~ using Misc helper function
  • update systemId in SDE/SDW to replace $HOME with ~ using Misc helper function
  • update/add tests

DAFFODIL-2195

@olabusayoT olabusayoT force-pushed the daf-2195-depersonalize branch 2 times, most recently from c114b5d to 450a762 Compare August 4, 2023 15:32
Copy link
Contributor

@mbeckerle mbeckerle left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Small refactoring of some confusing code please.

*/
def depersonalizePath(path: String): String = {
path.replace(
System.getenv("HOME"),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

HOME isn't defined on MS-Windows. This is linux/unix specific.
According to ChatGPT, this is the portable way to do this:

        String homeDir = System.getProperty("user.home");

val finalUriString = Misc.depersonalizePath(
uriString_
.replace(
Misc.classPath.find(cp => uriString_.startsWith(cp.toString)).getOrElse("").toString,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This code is confusing.

Can you spread this code out more? Add an intermediate for the uriString where the prefix that is found on the classpath has been removed, then depersonalize that in a separate line? The cascade of the replace with the find with the getOrElse is not easily understood.

val tokens = uriString.split(splitter)
val classpathTrimmedUriString =
uriString.replace(
Misc.classPath.find(cp => uriString.startsWith(cp.toString)).getOrElse("").toString,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same idiom. I found confusing above.

Suggest create a private helper method and call it in both places. That way you can explain it once.

Name it "stripClasspathPrefix" or something like that perhaps.

<tdml:defineConfig name="cfg_maxParentDirectoriesForDiagnostics_3">
<daf:tunables xmlns="http://www.w3.org/2001/XMLSchema"
xmlns:xs="http://www.w3.org/2001/XMLSchema">
<daf:maxParentDirectoriesForDiagnostics>20</daf:maxParentDirectoriesForDiagnostics>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Need a test where you clamp this to 1. Maybe that's in another PR already merged? I don't see the definition of this new tunable in this PR, so I'm guessing that is already covered. If not add a test to that effect.

Copy link
Member

@stevedlawrence stevedlawrence left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some comments on a potentially different approach. Let me know your thoughts, as I haven't put a ton of thought into it.

*/
def depersonalizePath(path: String): String = {
path.replace(
System.getenv("HOME"),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Daffodil never looks at the home directory when resolving schemas unless it's on the classpath so if feels like we shouldn't need this. Feels like this is a sign that maybe our approach isn't right.

This also doesn't fully depersonalize schema paths. For example, if my schemas are in /home/slawrence/mycompany/confidential/schemas/, there's sill going to be personal information in the path. Same issue if schemas are in a different mount point like /dev/myschemas/. I'm wondering if there's a different approach we can take?

I'm wondering if a different approach could tie into recent discussions about absolute vs relative schemaLocation paths?

If an import schemaLocation is absolute (i.e. starts with slash), then that is what we display for whatever it resolves to. For example, if we have schemaLocation="/foo/bar/baz.dfdl.xsd" and that resolves to jar:file:/home/slawrence/.ivy1/pom/jars/foo.jar!/foo/bar/baz.dfdl.xsd, then the display schemaLocation is just /foo/bar/baz.dfdl.xsd, since that was the absolute schemaLocation that found it.

If an import schemaLocation is relative (i.e. doesn't start with a slash), then the display location becomes the schemaLocation resolved relative to the display URI of including schema. So for example, if the above baz.dfdl.xsd file imported ../qaz.dfdl.xsd, then display schemaLocation of that file be calculated like URI("/foo/bar/baz.dfdl.xsd).resolve("../qaz.dfdl.xsd") which would become URI("/foo/qaz.dfdl.xsd"). Note that this is very basically what we do to resolve relative paths, so the logic is very similar. We are just resolving reslative to the the dispaly schema location of the importing schema it's actual schema location

So we always end up with an absolute URI as the display schemaLocation. I imagine this is very similar to the results you get by looking at the classpath, but should be less finicky. For example, Misc.classpath wont' always return somethign if it's using a different classloader, which we have talk about for things like making plugins easier.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Steve's suggestion makes sense to me (so does Mike's). Also, Steve's PR #1067 has made the import/include logic more consistent, so you'll want to redo this PR against the latest HEAD anyway when you have time to work on it again.

runCLI(args"save-parser -s $schema $parser") { cli =>
cli.expectErr("[error]")
cli.expectErr("systemId: file:~")
cli.expectErr("Schema context: file:~")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of outputting file:~/..., it feels like this wants to be something like:

Schema context: daffodil-sapi/src/test/resources/test/sapi/mySchema6.dfdl.xsd",

since that is what the user provided. Similar to my other comment about how we want to use the impor/include schemaLocation as a hint of what to display, I think for the root schema we want to display the path that was actually provided.

As another example, if we had foo.jar on DAFFODIL_CLASSPATH and ran something like:

daffodil save-parser -s /some/file/in/foo/jar/dfdl.xsd

And that resolves to a path in foo.jar, then I think the schema context we output wants to be exactly /some/file/in/foo/jar/dfdl.xsd.

So I suggested that based on my other comment, when we resolve a schema, we either want to use the path directly (in the case of importing absolute schemaLocations or locations without an importing schema like this one), or resolve the schemaLocation relative to the importing display location in the case of relative imports with a context.

val tokens = uriString.split(splitter)
val classpathTrimmedUriString =
uriString.replace(
Misc.classPath.find(cp => uriString.startsWith(cp.toString)).getOrElse("").toString,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mentioned elsewhere, but Misc.classpath is not very reliable. If a differnet classloader is used than a URLClassLoader (which we have discussed) this is going to break. Suggested an alternative in other comments about using schemaLocation instead.

infosetType,
processor,
parseOpts.schema.toOption,
parseOptsSchema,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you can do something like to get the URI:

parseOpts.schema.map(_._2).toOption

Note that we can also change parseOpts.infosetType.toOption.get to just parseOpts.infosetType(). In fact, if we do that, I suggest we just remove the infosetType variable--I don't think it really adds anything over parseOpts.infosetType() is only used in that one spot.

Another thought, it might be useful to create a case class like this

case class SchemaOption(raw: String, uri: URI)

And then have the value converter return SchmaOption(s, uri) instead of a tuple.

Then the we can use .raw or .uri to access what we want instead of ._1 or ._2, some examples:

parseOpts.schema.map(_.uri).toOption

This CLI code is so spread out that it's hard to remember what the values of tuples represent, this makes that a little easier.

runCLI(args"save-parser -s $schema $parser") { cli =>
cli.expectErr("[error]")
cli.expectErr("Schema context")
cli.expectErr("daffodil-sapi")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we expect the full Schema Context line since this test is about making sure we aren't outputting absolute paths? For example, something like this:

cli.expectErr("Schema context: ... daffodil-sapi/src/test/resources/test/sapi/mySchema6.dfdl.xsd")

This way if we something ever changes and we output file://some/absolute/path/daffodil-sapi/src/test/.... this test will fail and catch it? Right now this test would still pass even if that happened.

optRootNamespace: Option[String] = None,
): ProcessorFactory = {
val source = URISchemaSource(file.toURI)
val source = URISchemaSource((file.getPath, file.toURI))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thoughts on using separate parameters instead of a tuple for the URISchemaSource constructor?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think with the SchemaOption case class, we can pass that in as a single argument rather than using the tuple

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The "SchemaOption" feels CLI specific to me and might not want to leak into the API.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, any reason it can't be part of UriSchemaSource, that way it can be used widely?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess maybe it adds an extra level in indirection without much gain? For example, if URISchemaSource looked like this:

class URISchemaSource(val schemaOption)

Then to create it you need to create a separate object, e.g.

val schemaOption = new SchemaOption(..., ...)
val schemaSource = new URISchemaSource(schemaOption)

And then either everything needs to be changed to do schemaSource.schemaOption.uri for example, or URISchemaSource needs to update its members to reach into schemaOption for he uri and diagnostics.

Really, the only reason why we need SchemaOption is because scallop doesn't have a way to access to the original CLI argument after it's been converted. So you either need the a tuple (or a new object that's basically a tuple) to keep the original string around. Nothing else in the code really needs this pair of diagnostic + uri separately/ And that's really all the URISchemaSource it's, it's a tuple of the URI and how to show that URI in diagnostics plus some helper stuff.

Which makes me wonder if instead of creating a URI in the fileResourceURIConverter, we should just create a URISchemaSource directly. I imagine everything that uses the schema option as a URI just creates a URISchemaSource anyways. And if it needs the URI, it can just access the uri member of the URISchemaSource?

schemaFileLocation.uriString,
schemaFileLocation.diagnosticFilepath,
schemaFileLocation.toString,
"Schema File",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did a test to see what this looks like and it's something like this:

Schema context: Schema File Location line 32 column 74 in daffodil-sapi/src/test/resources/test/sapi/mySchema6.dfdl.xsd

I wonder if Schema File doesn't really add anything and can be removed? Normally this would be something like Schema context: ex:someElement Location ... where ex:someElement is helpful information, but we don't have access to that when errors come from Xerces--we just have to rely on Xerces include that in the error message.

override lazy val diagnosticFilepath = if (self.schemaFile.isDefined) {
self.schemaFile.get.diagnosticFilepath
} else {
self.schemaSet.diagnosticFilepath
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mentioned elsewhere, but are we sure this is correct? Isn't this just the filepath of the root schema? I think getting information from the schemaSet might not be correct? I'm not even sure the schemaSet should have a diagnosticFilePath since it's a set of schemas?

}
override def uriForLoading = tempURI

override def diagnosticFilepath: String = Misc.stripJarPrefix(tempURI.getPath)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The tempURI will never be in a jar. I think the diagnostic file path for an InputStreamSchemaSource can just be the full URI path. I believe this is only used for tests so the diganostic probably doesn't matter that much.

Alternativeyl, we could allow users to pass in a diagnostic path, but I'm not sure it matters.

val tempSchemaFile = XMLUtils.convertNodeToTempFile(node, tmpDir.orNull, nameHint)
val tempURI = tempSchemaFile.toURI
tempURI
(Misc.stripJarPrefix(tempURI.getPath), tempURI)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similar, NodeSchemaSources are only used for tests, so this might not matter. Using the tempURI.getPath is probably fine for these.

Copy link
Contributor

@mbeckerle mbeckerle left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

I have no further objections other than the things Steve has raised that are unresolved as yet.

Copy link
Contributor

@tuxji tuxji left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

Looks good to me.

// we will be passing in and receiving back an absolute diagnosticFilepath from resolveSchemaLocation.
// In just this case, we want to ignore that absolute filepath and use the diagnosticFilepath
// from main, which is the XMLSchemaDocument diagnosticFilepath
val finalUriSchemaSource = if (xsdArg.isBootStrapSD) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do the letters "SD" stand for in "isBootStrapSD"?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe Schema Document? To bootstrap schema compilation we create a fake schema that imports the actual main schema. This makes some logic a little weird since we need special cases for this bootstrap logic, but it makes a lot of other logic more consistent, since importing the main schema is exaclyt the same as importing any other schema.

Lola and I breifly looked at removing the fake schema and we came to the conclusion that too many things rely on it.


org.apache.daffodil.japi.Compiler c = Daffodil.compiler();
java.io.File schemaFile = getResource("/test/japi/mySchema1.dfdl.xsd");
java.io.File schemaFile = new File("daffodil-japi/src/test/resources/test/japi/mySchema1.dfdl.xsd");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the reason for replacing getResource (was it a function we defined ourselves? I thought Java's getResource returned an URI or null, not a File) with new File?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is actually our own getResource function which wraps java's and converts the URI to a File

I think the idea with these changes is that the File that is returned is an absolute File path to a jar (e.g. something like jar:file:/path/to/foo.jar!/xsd/foo.dfdl.xsd), and so any diagnostics will not be depersonalized, they'll be these long paths.

By using a relative File, the diagnostic path is relative and should be depersonalized.

But both should work though. In fact, I would suggest we undo this change since we don't really care about diagnostics for unit tests, and the getResource stuff is a bit cleaner. It would also ensure we have tests that make sure you can call compileFile with a jar File.

If we want, when compileFile is called we could detect if the file is a jar and do the Misc.stripJarPrefix thing to get a good diagnostic path for cases like this where jar files are used.

Copy link
Member

@stevedlawrence stevedlawrence left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good! Feels like the right approach and I've doe some basic testing and it looks to be doing the right thing. Just some minor comments that might help simplify things a bit, and maybe remove some complex edge cases.

// pass it to resolveSchemaLocation to find the actual file or resource
val cwd = Paths.get("").toUri
XMLUtils.resolveSchemaLocation(uri.toString, Some(cwd))
XMLUtils.resolveSchemaLocation(uri.toString, Some(URISchemaSource(uri.getPath, cwd)))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This feels weird to me:

URISchemaSource(uri.getPath, cwd)

This creates a uri schema source where the uri is CWD but the diagnostic path to be used for that is the path of the URI? Shouldn't the diagnostic of the context also be cwd?

I guess because resolveSchemaLocation uses resolveSibling() it kindof requires the diagnostic part of URISchemaSource to be to an actual file? Which kindof makes sense, a schemaLocation also comes from a file doing an include/import, not from a directory. Maybe this wants to be something like:

val contextPath = Paths.get("fakeContext.dfdl.xsd").toAbsolutePath
val contextSource = URISchemaSource(contextPath.toString, conextPath.toUri)
XMLUtils.resolveSchemaLocation(uri.toString, Some(contextSource))

This way resolveSchemaLocation has an actual file to resolve relative paths against (even though that file doesn't technically exist). The context will never be output so the fact that it's fake doesn't really matter, but we need a comment that describes that resolveSchemaLocation expects a context that is a a file for diagnostic purposes.

// we create a new SchemaFileLocation because the Xerces error has line and column info that
// the original schemaFileLocation that's passed in doesn't contain, so we create this more
// complete SchemaFileLocation
val xsfl = new XercesSchemaFileLocatable(err, "")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The second parameter to XercesSchemaFileLocatable is always empty string. Is this intentional or was this a place holder?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That was intentional. It represents the diagnostic debug name, which we changed from "Schema File" to an empty string

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see, makes sense. So does it always want to be empty string then? Maybe instead of it being a parameter XercesSchemaFileLocatable can do something like override diagnosticDebugName = "" with a comment explaining why that's preferred over "Schema File"?

val newSchemaLocation = SchemaFileLocation(
xsfl,
schemaFileLocation,
)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I noticed XercesSchmaFileLocatable doesn't implement the SchemaFileLocatable trait. Should it, then you wouldn't need a new SchemaFileLocation apply method specific to Xerces. This could just bt

val xsfl = new XercesSchemaFileLocatable(err, "", schemaFileLocation)
val newSchemaLocation = SchemaFileLocation(xsfl)

Just thinking of ways to simplfy things. This is nice since now ScheaFileLocation doesn't have any know anything about some weird Xerces special case.

val enclosingSchemaURIAndDFP = schemaFile.map { sf =>
(sf.schemaSource.diagnosticFilepath, sf.schemaSource.uriForLoading)
}
val enclosingSchemaURISchemaSource = enclosingSchemaURIAndDFP
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we just do something like

val enclosingSchemaURISchemaSource = schemaFile.map { sf => sf.schemaSource }

Why do we extract the uri and diagnostic out of sf.schemaSouce and then create a new schema source?

// we will be passing in and receiving back an absolute diagnosticFilepath from resolveSchemaLocation.
// In just this case, we want to ignore that absolute filepath and use the diagnosticFilepath
// from main, which is the XMLSchemaDocument diagnosticFilepath
val finalUriSchemaSource = if (xsdArg.isBootStrapSD) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe Schema Document? To bootstrap schema compilation we create a fake schema that imports the actual main schema. This makes some logic a little weird since we need special cases for this bootstrap logic, but it makes a lot of other logic more consistent, since importing the main schema is exaclyt the same as importing any other schema.

Lola and I breifly looked at removing the fake schema and we came to the conclusion that too many things rely on it.

try {
uri.toURL.openStream.close
Some(uri, false)
val uss = URISchemaSource(uri.getPath, uri)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggest we call uriToDiagosticPath here to get a diganostic path. Absolute URI's should be pretty rare, and as mentioned in a comment above, we may even want to remove support for it some day. In the rare case they are used, we can fall back to our heuristic.

diagnosticFilepathPaths.toString
} else {
diagnosticFilepathPaths.resolveSibling(uri.getPath).toString
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need some comments about what is going on here. I'm not sure I understand why we need this.

Seem this logic is because sometimes the diagnosticFilePath isn't a Path, like it's a URI or something? Maybe if we require that it must be a Path, mentioned in another comment, then this special case goes away? If diagnosticFilePath is just a Path, we can always just call resolveSibling here and it should just work?

//
// In this case, we have a real TDML file (or resource) to open
URISchemaSource(uri)
val path = if (Misc.isNullOrBlank(uri.getPath)) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this in case uri is a "jar" URI? Maybe this goes away with the uriToDiagnosticPath idea mentioend above?

This probably raises an issue that we want to be careful about calling uri.getPath, since that doesn't work for "jar" uris. I imagine we don't do that except for converting URI's to diagnostics.

Copy link
Member

@stevedlawrence stevedlawrence left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1, only suggestion is to avoid using uriToDiagnosticFile, that should really only be used when all we have is a URI and know nothing about where it came from. If we have more information, we can probably use that to make a better diagnosticFile.

optRootNamespace: Option[String] = None,
): ProcessorFactory = {
val source = URISchemaSource(file.toURI)
val source = URISchemaSource(Misc.uriToDiagnosticFile(file.toURI), file.toURI)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we pass in file as the diagnostic here? file.toURI creates an absolute URI, so the resulting diagnostic File will be absolute even if the original File was relative. If we pass in file directly, our diagnostic file stays relative.

val compiler = Compiler()
val uri = Misc.getRequiredResource(resourcePathString)
val schemaSource = URISchemaSource(uri)
val schemaSource = URISchemaSource(Misc.uriToDiagnosticFile(uri), uri)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This probably wants to be Paths.get(resourcePathString).toFile. I think in most cases we should avoid using uriToDiagnosticFile unless we really only have a URI and don't have the original thing that led to use getting a URI.

}
val res = URISchemaSource(new File(tmpXMLFileName).toURI)
val res =
URISchemaSource(Paths.get(tmpXMLFileName).toFile, new File(tmpXMLFileName).toURI)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We already create a file, can we pull hat out and reuse it insead of doing Paths.get.toFile?

URISchemaSource(Misc.uriToDiagnosticFile(uri), uri),
)

def fromFile(file: File) = fromURI(file.toURI)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure how much this matters this is just config stuff, but we have a File here we can use for diagnostics, e.g.

def fromFile(fiel: File) = fromSchemaSource(
  URISchemaSource(file, file.toURI)
}

It will create slightly better diagnostics if the file is a relative path.

}) {
tempSchemaFileFromNode: File,
) extends URISchemaSource(
Misc.uriToDiagnosticFile(tempSchemaFileFromNode.toURI),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pass in tempSchemaFileFromNode here, no need to convert a File to a URI back to a File.

contextURIDiagnosticFile.toPath
.resolveSibling(Paths.get(uri.getPath))
.normalize()
.toFile
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we move these vals inside the map below? We probably only want to do this logic is resolveRelativeOnly found something.

// are strict about how absolute vs relative schemaLocations are resolved.
val pair = Option(this.getClass.getResource("/" + uri.getPath))
.map { _.toURI }
.map { r => URISchemaSource(Misc.uriToDiagnosticFile(r.toURI), r.toURI) }
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similar to above, we probably want to create the diagnostic File directly. Something like Paths.get("/" + uri.getPath).toFile.

val dafextURI = XMLUtils.dafextURI
val node = ldr.load(URISchemaSource(file.toURI), Some(dafextURI))
val node = ldr.load(
URISchemaSource(Misc.uriToDiagnosticFile(file.toURI), file.toURI),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just use file as the diagnostic.

def testScalaAPI4b(): Unit = {
val c = Daffodil.compiler()

val schemaFileName = getResource("/test/sapi/mySchema3.dfdl.xsd")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we revert these changes so they match the Java API tests?

val newNode =
loader.load(URISchemaSource(uri), optTDMLSchema, addPositionAttributes = true)
loader.load(
URISchemaSource(Misc.uriToDiagnosticFile(uri), uri),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use the file for the diag.

@olabusayoT olabusayoT force-pushed the daf-2195-depersonalize branch 2 times, most recently from 4275976 to b835c88 Compare March 20, 2024 17:51
- pass on java.io.File to URISchemaSource as well as URI; update usages
- update diagnostics to remove systemid but keep cause, line number and column number
- remove maxParentDirectoriesForDiagnostics tunable and diagnosticsStripLocationInfo tdml attribute as it's no longer used and has been replaced with diagnosticFile
- return URISchemaSource from resolveSchemaLocation and fileResourceURIConverter
- update URISchemaSource constructor to accept 2 arguments instead of tuple
- add XercesSchemaFileLocation class which uses xercesError and schemaFileLocation to create more complete schemaFileLocation
- if we're using a fakeSchemaDocXML, overwrite the diagnosticFile returned from resolveSchemaLocation to use the xmlSchemaDocument.diagnosticFile
- use Paths.resolveSibling for diagnosticFilepath resolution against context diagnosticFile
- set diagnosticFile to empty string for DaffodilXMLLoader with comment explaining why
- create uriToDiagnosticFile function with heuristic for jar/file/null scheme with separate handling
- pass in fake context in cwd to resolveSchemaLocation call from main
- clarification comments
- update tests to remove paths with unix path separators
- update/add tests

DAFFODIL-2195
@olabusayoT olabusayoT force-pushed the daf-2195-depersonalize branch from b835c88 to d024f17 Compare March 20, 2024 17:58
@olabusayoT olabusayoT merged commit 7db9e4e into apache:main Mar 20, 2024
@olabusayoT olabusayoT deleted the daf-2195-depersonalize branch March 20, 2024 18:29
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants