Expose the DFDLCatalogResolver to the SAPI/JAPI#862
Conversation
| /** | ||
| * Exposes access to the DFDLCatalogResolver to the Java API. | ||
| * | ||
| * Resolves URI/URL/URNs to loadable files/streams. |
There was a problem hiding this comment.
Should way make it clear that this thing is what Daffodil uses to resolve entities? E.g.
Returns the EntityResolver used by Daffodil to resolve import/include schemaLocations
The idea being that if you are using DFDL schemas in outside of Daffodil and you need Daffodil's same behavior, then this is what you should use.
Also, should we be specific how this thing we return actually resolves entities? For example, it looks at the catalog manager, it looks at class paths, it looks at relative paths, etc. I'm not sure exactly what the logic is, but this feels like the right place for that documentation, and would be useful information for people using this class and know how Daffodil actually resolves entities. It would also resolve DAFFODIL-2616
| * - org.apache.xerces.xni.parser.XMLEntityResolver | ||
| * - org.w3c.dom.ls.LSResourceResolver | ||
| * - org.apache.sax.EntityResolver | ||
| * - org.apache.sax.ext.EntityResolver2 |
There was a problem hiding this comment.
I'm not sure listing the interfaces is particularly helpful, especially since most people probably don't care too much about them aside from EntityResolver?
Maybe instead we just change the get function to look something like this:
def get: EntityResolver = DFDLCatalogResolver.getThat way it is self documenting that this returns an EntityResolver.
Though, would a user ever need one of the other resolver interfaces? I would guess no? But if so, maybe we have additional functions that return each one that a user might need? E.g.
def getEntityResolver: EntityResolver = DFDLCatalogResolver.get
def getXMLEntityResolver: XMLEntityResolver = DFDLCatalogResolver.get
...
Seems a bit verbose, and I'm not sure if that makes sense or not?
Maybe another option is just have a class, e.g.
class DaffodilXMLEntityResolver extends DFDLCatalogResolverAnd then maybe the javadoc just shows the interfaces that are implemented? I'm not sure if it does or not?
There was a problem hiding this comment.
I think I like the idea of the multipl getXXXResovler functions. I believe I needed to use the XMLEntityResolver with one of the EXI implementations, so I definitely needed more than just the SAXEntityResolver interface.
| */ | ||
| object DaffodilXMLEntityResolver { | ||
| def get = DFDLCatalogResolver.get | ||
| } |
There was a problem hiding this comment.
Have you checked that the java/scala docs show up reasonably well? I'm not sure how well comments on objects get converted to java docs.
There was a problem hiding this comment.
Actually looks just like a normal Java class documentation, so I think this is good to go
stevedlawrence
left a comment
There was a problem hiding this comment.
+1, which some minor suggestions
| * DFDLCatalogResolver uses an org.apache.xml.resolver.Catalog/CatalogManager to | ||
| * attempt to resolve namespaces and systemId's. If we are unable to resolve a | ||
| * file with the Catalog, we attempt to resolve the file using the systemId path | ||
| * from the schemaLocation attribute. If this fails we throw a |
There was a problem hiding this comment.
we attempt to resolve the file using the systemId path from the schemaLocation attribute.
Does this require a little extra clarification? Don't we look at both the classpath and the filesystem, and if so, which do we do first? Is it worth mentioning relative paths in schemaLocations are relative to the file that includes (it I think that's right)?
This might also work better as a numeric list, e.g.
The returned resolver looks in the following places to resolve entities in this order:
1. org.apache....CataogManager ...
2. Java classpath ...
3. Filesystem ...
There was a problem hiding this comment.
Actually, looking through our code I don't think we technically reach out to the filesystem at all. We only search for things on the classpath or relative to some context URI t hat gets passed in
There was a problem hiding this comment.
Ah, I guess that makes sense, and it just depends on if the root schema was found in on the classpath or the filesystem. If the root schema was found on the classpath all imports will look on the classpath (since they are relative to things found on the classpath), and likewise with the filesystem.
So maybe instead of mentioning the classpath we say something like, catalogmanager is first, otherwise schemaLocations are resolves relative to the importing schema URI, which could be a file on the filesystem or in a jar on the classpath. Probably can be worded better, but something like that to make it clear we support classpaths or file depending on the original root schema.
| * Returns the EntityResolver used by Daffodil to resolve import/include | ||
| * schemaLocations. | ||
| * | ||
| * DFDLCatalogResolver uses an org.apache.xml.resolver.Catalog/CatalogManager to |
There was a problem hiding this comment.
Don't mention DFDLCatalogResolver, that class isn't public so mentioning it in public API is just going to be confusing. The user will just get an EntityResolver, XmlEntityResolver, etc.--that's all they need to know about.
| * so we use ThreadLocal to only create one instance per thread. | ||
| */ | ||
| object DaffodilXMLEntityResolver { | ||
| def get = DFDLCatalogResolver.get |
There was a problem hiding this comment.
I don't think we should return this. No one needs access to a DFDLCatalogResolver and it's capabilities, they just need one of the other kind of revolvers. If we do find someone needs access to this, we can consider making it public. But until then let's keep that implementation private. It's much easier to make something public that's private than to make something private that's public.
tuxji
left a comment
There was a problem hiding this comment.
+1
I agree with all of Steve's comments. I'm good with the PR after you resolve them.
| * importing schema URI, which could either be a file on the filesystem or in | ||
| * a jar on the classpath. | ||
| * | ||
| * The DFDLCatalogResolver isn't thread safe, but it also is expensive and stateful, |
There was a problem hiding this comment.
One more instance of DFDLCatalogResolver to remove. New changes look good to me.
DAFFODIL-2739
726c34a to
c977900
Compare
DAFFODIL-2739