Skip to content

Expose the DFDLCatalogResolver to the SAPI/JAPI#862

Merged
jadams-tresys merged 1 commit into
apache:mainfrom
jadams-tresys:DFDLCatalogResolver
Oct 28, 2022
Merged

Expose the DFDLCatalogResolver to the SAPI/JAPI#862
jadams-tresys merged 1 commit into
apache:mainfrom
jadams-tresys:DFDLCatalogResolver

Conversation

@jadams-tresys
Copy link
Copy Markdown
Contributor

Comment thread daffodil-sapi/src/main/scala/org/apache/daffodil/sapi/Daffodil.scala Outdated
/**
* Exposes access to the DFDLCatalogResolver to the Java API.
*
* Resolves URI/URL/URNs to loadable files/streams.
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should way make it clear that this thing is what Daffodil uses to resolve entities? E.g.

Returns the EntityResolver used by Daffodil to resolve import/include schemaLocations

The idea being that if you are using DFDL schemas in outside of Daffodil and you need Daffodil's same behavior, then this is what you should use.

Also, should we be specific how this thing we return actually resolves entities? For example, it looks at the catalog manager, it looks at class paths, it looks at relative paths, etc. I'm not sure exactly what the logic is, but this feels like the right place for that documentation, and would be useful information for people using this class and know how Daffodil actually resolves entities. It would also resolve DAFFODIL-2616

* - org.apache.xerces.xni.parser.XMLEntityResolver
* - org.w3c.dom.ls.LSResourceResolver
* - org.apache.sax.EntityResolver
* - org.apache.sax.ext.EntityResolver2
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure listing the interfaces is particularly helpful, especially since most people probably don't care too much about them aside from EntityResolver?

Maybe instead we just change the get function to look something like this:

def get: EntityResolver = DFDLCatalogResolver.get

That way it is self documenting that this returns an EntityResolver.

Though, would a user ever need one of the other resolver interfaces? I would guess no? But if so, maybe we have additional functions that return each one that a user might need? E.g.

def getEntityResolver: EntityResolver = DFDLCatalogResolver.get
def getXMLEntityResolver: XMLEntityResolver = DFDLCatalogResolver.get
...

Seems a bit verbose, and I'm not sure if that makes sense or not?

Maybe another option is just have a class, e.g.

class DaffodilXMLEntityResolver extends DFDLCatalogResolver

And then maybe the javadoc just shows the interfaces that are implemented? I'm not sure if it does or not?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I like the idea of the multipl getXXXResovler functions. I believe I needed to use the XMLEntityResolver with one of the EXI implementations, so I definitely needed more than just the SAXEntityResolver interface.

*/
object DaffodilXMLEntityResolver {
def get = DFDLCatalogResolver.get
}
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have you checked that the java/scala docs show up reasonably well? I'm not sure how well comments on objects get converted to java docs.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually looks just like a normal Java class documentation, so I think this is good to go

Comment thread daffodil-sapi/src/main/scala/org/apache/daffodil/sapi/Daffodil.scala Outdated
Copy link
Copy Markdown
Member

@stevedlawrence stevedlawrence left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1, which some minor suggestions

* DFDLCatalogResolver uses an org.apache.xml.resolver.Catalog/CatalogManager to
* attempt to resolve namespaces and systemId's. If we are unable to resolve a
* file with the Catalog, we attempt to resolve the file using the systemId path
* from the schemaLocation attribute. If this fails we throw a
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we attempt to resolve the file using the systemId path from the schemaLocation attribute.

Does this require a little extra clarification? Don't we look at both the classpath and the filesystem, and if so, which do we do first? Is it worth mentioning relative paths in schemaLocations are relative to the file that includes (it I think that's right)?

This might also work better as a numeric list, e.g.

The returned resolver looks in the following places to resolve entities in this order:

1. org.apache....CataogManager  ...
2. Java classpath ...
3. Filesystem ...

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, looking through our code I don't think we technically reach out to the filesystem at all. We only search for things on the classpath or relative to some context URI t hat gets passed in

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I guess that makes sense, and it just depends on if the root schema was found in on the classpath or the filesystem. If the root schema was found on the classpath all imports will look on the classpath (since they are relative to things found on the classpath), and likewise with the filesystem.

So maybe instead of mentioning the classpath we say something like, catalogmanager is first, otherwise schemaLocations are resolves relative to the importing schema URI, which could be a file on the filesystem or in a jar on the classpath. Probably can be worded better, but something like that to make it clear we support classpaths or file depending on the original root schema.

* Returns the EntityResolver used by Daffodil to resolve import/include
* schemaLocations.
*
* DFDLCatalogResolver uses an org.apache.xml.resolver.Catalog/CatalogManager to
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't mention DFDLCatalogResolver, that class isn't public so mentioning it in public API is just going to be confusing. The user will just get an EntityResolver, XmlEntityResolver, etc.--that's all they need to know about.

* so we use ThreadLocal to only create one instance per thread.
*/
object DaffodilXMLEntityResolver {
def get = DFDLCatalogResolver.get
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we should return this. No one needs access to a DFDLCatalogResolver and it's capabilities, they just need one of the other kind of revolvers. If we do find someone needs access to this, we can consider making it public. But until then let's keep that implementation private. It's much easier to make something public that's private than to make something private that's public.

Copy link
Copy Markdown
Contributor

@tuxji tuxji left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

I agree with all of Steve's comments. I'm good with the PR after you resolve them.

* importing schema URI, which could either be a file on the filesystem or in
* a jar on the classpath.
*
* The DFDLCatalogResolver isn't thread safe, but it also is expensive and stateful,
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One more instance of DFDLCatalogResolver to remove. New changes look good to me.

@jadams-tresys jadams-tresys merged commit 4f84e04 into apache:main Oct 28, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants