RDFXML: possible to load incorrect XML #2620

Closed
sszuev opened this issue Aug 4, 2024 · 8 comments · Fixed by #2627

@sszuev
Contributor

sszuev commented Aug 4, 2024

Version

5.1.0

What happened?

This time I expect a failure (i.e. the behavior of Jena 4.x).
The document is OWL/XML, not RDF/XML.
This matters because owlcs/ONTAPI has a loading mechanism that iterates over formats (both Jena and OWLAPI).
If the document is parsed as RDF/XML successfully, then the mechanism stops and returns the resulting Graph. In this case it contains rubbish.
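
A minimal sketch of the kind of format-iterating loader described above, to show why a parser that only warns makes the loop stop too early. The names `FormatLoaderSketch`, `Attempt`, `Loaded` and `loadFirst` are hypothetical and are not ONT-API or Jena API; the real mechanism may differ.

```java
import java.util.List;
import java.util.Optional;
import java.util.function.Function;

/** Hypothetical sketch: try each format in order and keep the first result that does not throw. */
final class FormatLoaderSketch {

    /** One parse attempt: a format name plus a parser that throws on failure. */
    record Attempt<T>(String format, Function<String, T> parser) {}

    /** The format that accepted the document and the value it produced. */
    record Loaded<T>(String format, T value) {}

    static <T> Optional<Loaded<T>> loadFirst(String document, List<Attempt<T>> attempts) {
        for (Attempt<T> attempt : attempts) {
            try {
                T value = attempt.parser().apply(document);
                // The first parser that returns normally wins, even if its output is poor.
                return Optional.of(new Loaded<>(attempt.format(), value));
            } catch (RuntimeException e) {
                // This format rejected the document; try the next one.
            }
        }
        return Optional.empty();
    }
}
```

With this shape, an RDF/XML attempt placed before an OWL/XML attempt hides the OWL/XML parser whenever the RDF/XML parser returns without throwing.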


String data = "<?xml version=\"1.0\"?>\n" +
        "\n" +
        "\n" +
        "<!DOCTYPE Ontology [\n" +
        "    <!ENTITY xsd \"http://www.w3.org/2001/XMLSchema#\" >\n" +
        "    <!ENTITY xml \"http://www.w3.org/XML/1998/namespace\" >\n" +
        "    <!ENTITY rdfs \"http://www.w3.org/2000/01/rdf-schema#\" >\n" +
        "    <!ENTITY rdf \"http://www.w3.org/1999/02/22-rdf-syntax-ns#\" >\n" +
        "]>\n" +
        "\n" +
        "\n" +
        "<Ontology xmlns=\"http://www.w3.org/2002/07/owl#\"\n" +
        "     xml:base=\"http://www.derivo.de/ontologies/examples/anonymous-individuals\"\n" +
        "     xmlns:rdfs=\"http://www.w3.org/2000/01/rdf-schema#\"\n" +
        "     xmlns:xsd=\"http://www.w3.org/2001/XMLSchema#\"\n" +
        "     xmlns:rdf=\"http://www.w3.org/1999/02/22-rdf-syntax-ns#\"\n" +
        "     xmlns:xml=\"http://www.w3.org/XML/1998/namespace\"\n" +
        "     ontologyIRI=\"http://www.derivo.de/ontologies/examples/anonymous-individuals\">\n" +
        "    <Prefix name=\"\" IRI=\"http://www.derivo.de/ontologies/examples/anonymous-individuals#\"/>\n" +
        "    <Prefix name=\"owl\" IRI=\"http://www.w3.org/2002/07/owl#\"/>\n" +
        "    <Prefix name=\"rdf\" IRI=\"http://www.w3.org/1999/02/22-rdf-syntax-ns#\"/>\n" +
        "    <Prefix name=\"xsd\" IRI=\"http://www.w3.org/2001/XMLSchema#\"/>\n" +
        "    <Prefix name=\"rdfs\" IRI=\"http://www.w3.org/2000/01/rdf-schema#\"/>\n" +
        "    <Declaration>\n" +
        "        <Class IRI=\"#C\"/>\n" +
        "    </Declaration>\n" +
        "    <Declaration>\n" +
        "        <ObjectProperty IRI=\"#r\"/>\n" +
        "    </Declaration>\n" +
        "    <ClassAssertion>\n" +
        "        <Class IRI=\"#C\"/>\n" +
        "        <AnonymousIndividual nodeID=\"a\"/>\n" +
        "    </ClassAssertion>\n" +
        "    <ObjectPropertyAssertion>\n" +
        "        <ObjectProperty IRI=\"#r\"/>\n" +
        "        <AnonymousIndividual nodeID=\"a\"/>\n" +
        "        <AnonymousIndividual nodeID=\"a\"/>\n" +
        "    </ObjectPropertyAssertion>\n" +
        "</Ontology>";
ModelFactory.createDefaultModel().read(new StringReader(data), null, "rdf/xml")
        .setNsPrefixes(PrefixMapping.Standard).write(System.out, "ttl");

Relevant output and stacktrace

Expected an error, but got:

PREFIX dc:   <http://purl.org/dc/elements/1.1/>
PREFIX owl:  <http://www.w3.org/2002/07/owl#>
PREFIX rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX xsd:  <http://www.w3.org/2001/XMLSchema#>

[ rdf:type  owl:AnonymousIndividual ] .

[ rdf:type  owl:AnonymousIndividual ] .

[ rdf:type  owl:AnonymousIndividual ] .

[ rdf:type                     owl:Ontology;
  owl:ClassAssertion           [ rdf:type  owl:Class ];
  owl:Declaration              [ rdf:type  owl:ObjectProperty ];
  owl:Declaration              [ rdf:type  owl:Class ];
  owl:ObjectPropertyAssertion  [ rdf:type  owl:ObjectProperty ];
  owl:Prefix                   ""
] .

Are you interested in making a pull request?

None

@sszuev sszuev added the bug label Aug 4, 2024
@afs
Member

afs commented Aug 4, 2024

There are a lot of warnings.

But it parses for me using riot with both Jena 4.10.0 and Jena 5.1.0, although the resulting RDF differs.
RDF/XML documents don't have to have an rdf:RDF element.

Would it be possible to have more concise examples of problems? It should only need a short example.

@sszuev
Contributor Author

sszuev commented Aug 5, 2024

What about this?

String data = """
        <?xml version="1.0"?>
        <Ontology xmlns="http://www.w3.org/2002/07/owl#"
             xml:base="http://www.w3.org/2002/07/owl#"
             xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
             xmlns:xml="http://www.w3.org/XML/1998/namespace"
             xmlns:xsd="http://www.w3.org/2001/XMLSchema#"
             xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#">
            <Prefix name="owl" IRI="http://www.w3.org/2002/07/owl#"/>
            <Prefix name="rdf" IRI="http://www.w3.org/1999/02/22-rdf-syntax-ns#"/>
            <Prefix name="xml" IRI="http://www.w3.org/XML/1998/namespace"/>
            <Prefix name="xsd" IRI="http://www.w3.org/2001/XMLSchema#"/>
            <Prefix name="rdfs" IRI="http://www.w3.org/2000/01/rdf-schema#"/>
            <Declaration>
                <Class IRI="http://x#X"/>
            </Declaration>
        </Ontology>
        """;
ModelFactory.createDefaultModel()
        .read(new StringReader(data), null, "rdf/xml")
        .setNsPrefixes(PrefixMapping.Standard).write(System.out, "ttl");

output

PREFIX dc:   <http://purl.org/dc/elements/1.1/>
PREFIX owl:  <http://www.w3.org/2002/07/owl#>
PREFIX rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX xsd:  <http://www.w3.org/2001/XMLSchema#>

[ rdf:type         owl:Ontology;
  owl:Declaration  [ rdf:type  owl:Class ];
  owl:Prefix       ""
] .

To create such a document, OWLAPI or owlcs/ONTAPI can be used:

OWLOntologyManager m = OntManagers.createOWLAPIImplManager();
OWLOntology ont = m.createOntology();
ont.add(m.getOWLDataFactory().getOWLDeclarationAxiom(m.getOWLDataFactory().getOWLClass(IRI.create("http://x#X"))));
ont.saveOntology(new OWLXMLDocumentFormat(), System.out);

@afs afs added question and removed bug labels Aug 5, 2024
@afs
Member

afs commented Aug 5, 2024

I can't make Jena4 cause an error parsing; it does throw an exception when writing the model due to a relative IRI for a property.

Jena4 parsing issues warnings, as does Jena5.

There is a difference in output between Jena4 and Jena5.

<?xml version="1.0"?>

<Ontology xmlns:p="http://example/ns#">
  <Prefix p:name="rdf"/>
</Ontology>

With p:name there are no warnings. If the XML attribute is name= (no namespace), there is a warning.

In Jena4, ARP issues a warning and outputs a relative URI property. It also generates a relative URI for Ontology.

In Jena5, RRX issues a warning and skips the triple. It resolves the Ontology against the base (and there is always a base which may be external).

This is by design. In writing RRX, I checked all the cases ARP supports. The RDF 1.0 WG decided that bare attributes were not legal; they had been allowed in the earlier design phases, but that never made it into a spec. So RRX issues a warning and skips the triple; it is not a hard error because of the legacy behavior of ARP. Relative URIs will get into trouble later!

I'd be happy for that to become an error, not a warning.
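
As an illustration of what "bare attribute" means here, the following sketch lists every XML attribute that has no namespace, using the JDK's StAX reader. It is a standalone pre-check under stated assumptions, not how RRX or ARP detect the condition, and the class and method names are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

/** Hypothetical helper: report attributes that are not in any namespace. */
final class BareAttributeScan {

    /** Returns "element@attribute" for each attribute without a namespace, in document order. */
    static List<String> bareAttributes(String xml) {
        XMLInputFactory factory = XMLInputFactory.newFactory();
        factory.setProperty(XMLInputFactory.IS_NAMESPACE_AWARE, true);
        // Defensive settings for untrusted input.
        factory.setProperty(XMLInputFactory.SUPPORT_DTD, false);
        factory.setProperty(XMLInputFactory.IS_SUPPORTING_EXTERNAL_ENTITIES, false);

        List<String> found = new ArrayList<>();
        try {
            XMLStreamReader reader = factory.createXMLStreamReader(new StringReader(xml));
            try {
                while (reader.hasNext()) {
                    if (reader.next() != XMLStreamConstants.START_ELEMENT) {
                        continue;
                    }
                    // xmlns declarations are reported as namespaces, not attributes, so they are not counted.
                    for (int i = 0; i < reader.getAttributeCount(); i++) {
                        String ns = reader.getAttributeNamespace(i);
                        if (ns == null || ns.isEmpty()) {
                            found.add(reader.getLocalName() + "@" + reader.getAttributeLocalName(i));
                        }
                    }
                }
            } finally {
                reader.close();
            }
        } catch (XMLStreamException e) {
            throw new IllegalArgumentException("Not well-formed XML", e);
        }
        return found;
    }
}
```

For the small example above, the `p:name` variant yields an empty list and the `name=` variant yields `[Prefix@name]`.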

The fact that the doc parses as RDF/XML at all is because the root qname does not have to be rdf:RDF; for a single top-level element, it can be omitted. This is the case in both Jena4 and Jena5, in ARP and RRX.

ARP (0 - the original; 1 - more integrated into RIOT error handling) is available in Jena5, at the moment, as lang name "arp0" or "arp1", as a file extension, and via the deprecated constant RRX.RDFXML_ARP0.

Expect ARP0 to go away soon.

ARP1 was the RDF/XML parser from 4.7.0 to 4.10.0.

@afs
Member

afs commented Aug 5, 2024

As for distinguishing RDF/XML from OWL/XML, it is hard or impossible in the most general case. It would be better to read the input, snoop for the top-level element, decide, and then reparse (all after checking the MIME type and file extension, if available).

The file extension owl was never registered, and in the wild you can find owl used for RDF/XML.
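
The "snoop for the top-level element" step could look roughly like the sketch below, which reads only as far as the root element with the JDK's StAX reader. The class name, the `Guess` enum and the decision rule are assumptions for illustration. Because a root of owl:Ontology is also legal RDF/XML when rdf:RDF is omitted, the result for that case is only a hint.

```java
import java.io.StringReader;
import javax.xml.namespace.QName;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

/** Hypothetical helper: peek at the root element before choosing a parser. */
final class RootElementSniffer {
    static final String RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#";
    static final String OWL_NS = "http://www.w3.org/2002/07/owl#";

    enum Guess { RDF_XML, POSSIBLY_OWL_XML, UNKNOWN }

    /** Returns the qualified name of the document's root element. */
    static QName rootElement(String xml) {
        XMLInputFactory factory = XMLInputFactory.newFactory();
        factory.setProperty(XMLInputFactory.IS_NAMESPACE_AWARE, true);
        factory.setProperty(XMLInputFactory.SUPPORT_DTD, false);
        factory.setProperty(XMLInputFactory.IS_SUPPORTING_EXTERNAL_ENTITIES, false);
        try {
            XMLStreamReader reader = factory.createXMLStreamReader(new StringReader(xml));
            try {
                while (reader.hasNext()) {
                    if (reader.next() == XMLStreamConstants.START_ELEMENT) {
                        return reader.getName(); // stop at the first element: the root
                    }
                }
            } finally {
                reader.close();
            }
        } catch (XMLStreamException e) {
            throw new IllegalArgumentException("Not well-formed XML", e);
        }
        throw new IllegalArgumentException("No root element found");
    }

    /** A heuristic only: the root name alone cannot separate the two formats in general. */
    static Guess guess(String xml) {
        QName root = rootElement(xml);
        if (new QName(RDF_NS, "RDF").equals(root)) {
            return Guess.RDF_XML;
        }
        if (new QName(OWL_NS, "Ontology").equals(root)) {
            return Guess.POSSIBLY_OWL_XML;
        }
        return Guess.UNKNOWN;
    }
}
```

A caller would buffer the input, call `guess`, and then reparse the buffered bytes with the chosen parser; MIME type and file extension checks, where available, would come first.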

@sszuev
Contributor Author

sszuev commented Aug 5, 2024

> I can't make Jena4 cause an error parsing; it does throw an exception when writing the model due to a relative IRI for a property.

It seems you need to use the original ontology document: https://github.com/owlcs/ont-api/blob/4.x.x/src/test/resources/owlapi/owlxml_anonloop.owx

@afs
Member

afs commented Aug 5, 2024

What's the error message, and is there a small extract that picks up the feature in error?

@sszuev
Contributor Author

sszuev commented Aug 5, 2024

<?xml version="1.0"?>
<Ontology xmlns="http://www.w3.org/2002/07/owl#"
     xml:base="http://www.derivo.de/ontologies/examples/anonymous-individuals"
     xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
     xmlns:xsd="http://www.w3.org/2001/XMLSchema#"
     xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
     xmlns:xml="http://www.w3.org/XML/1998/namespace"
     ontologyIRI="http://www.derivo.de/ontologies/examples/anonymous-individuals">
    <Prefix name="" IRI="http://www.derivo.de/ontologies/examples/anonymous-individuals#"/>
    <Prefix name="owl" IRI="http://www.w3.org/2002/07/owl#"/>
    <Prefix name="rdf" IRI="http://www.w3.org/1999/02/22-rdf-syntax-ns#"/>
    <Prefix name="xsd" IRI="http://www.w3.org/2001/XMLSchema#"/>
    <Prefix name="rdfs" IRI="http://www.w3.org/2000/01/rdf-schema#"/>
    <ClassAssertion>
        <Class IRI="#C"/>
        <AnonymousIndividual nodeID="a"/>
    </ClassAssertion>
</Ontology>
Exception in thread "main" org.apache.jena.riot.RiotException: [line: 16, col: 42] {E201} Multiple children of property element
	at org.apache.jena.riot.system.ErrorHandlerFactory$ErrorHandlerStd.error(ErrorHandlerFactory.java:160)
	at org.apache.jena.riot.lang.ReaderRIOTRDFXML$ErrorHandlerBridge.error(ReaderRIOTRDFXML.java:292)
	at org.apache.jena.rdfxml.xmlinput.impl.ARPSaxErrorHandler.error(ARPSaxErrorHandler.java:37)
	at org.apache.jena.rdfxml.xmlinput.impl.XMLHandler.warning(XMLHandler.java:206)
	at org.apache.jena.rdfxml.xmlinput.impl.XMLHandler.warning(XMLHandler.java:183)
	at org.apache.jena.rdfxml.xmlinput.impl.XMLHandler.warning(XMLHandler.java:178)
	at org.apache.jena.rdfxml.xmlinput.impl.ParserSupport.warning(ParserSupport.java:147)
	at org.apache.jena.rdfxml.xmlinput.states.Frame.warning(Frame.java:57)
	at org.apache.jena.rdfxml.xmlinput.states.WantLiteralValueOrDescription.startElement(WantLiteralValueOrDescription.java:38)
	at org.apache.jena.rdfxml.xmlinput.impl.XMLHandler.startElement(XMLHandler.java:121)
	at java.xml/com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.startElement(AbstractSAXParser.java:510)
	at java.xml/com.sun.org.apache.xerces.internal.parsers.AbstractXMLDocumentParser.emptyElement(AbstractXMLDocumentParser.java:183)
	at java.xml/com.sun.org.apache.xerces.internal.impl.XMLNSDocumentScannerImpl.scanStartElement(XMLNSDocumentScannerImpl.java:351)
	at java.xml/com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl$FragmentContentDriver.next(XMLDocumentFragmentScannerImpl.java:2710)
	at java.xml/com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl.next(XMLDocumentScannerImpl.java:605)
	at java.xml/com.sun.org.apache.xerces.internal.impl.XMLNSDocumentScannerImpl.next(XMLNSDocumentScannerImpl.java:112)
	at java.xml/com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanDocument(XMLDocumentFragmentScannerImpl.java:534)
	at java.xml/com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:888)
	at java.xml/com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:824)
	at java.xml/com.sun.org.apache.xerces.internal.parsers.XMLParser.parse(XMLParser.java:141)
	at java.xml/com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.parse(AbstractSAXParser.java:1216)
	at java.xml/com.sun.org.apache.xerces.internal.jaxp.SAXParserImpl$JAXPSAXParser.parse(SAXParserImpl.java:635)
	at org.apache.jena.rdfxml.xmlinput.impl.RDFXMLParser.parse(RDFXMLParser.java:100)
	at org.apache.jena.rdfxml.xmlinput.impl.RDFXMLParser.parse(RDFXMLParser.java:88)
	at org.apache.jena.rdfxml.xmlinput.ARP.load(ARP.java:101)
	at org.apache.jena.riot.lang.ReaderRIOTRDFXML.parse(ReaderRIOTRDFXML.java:173)
	at org.apache.jena.riot.lang.ReaderRIOTRDFXML.read(ReaderRIOTRDFXML.java:72)
	at org.apache.jena.riot.RDFParser.read(RDFParser.java:420)
	at org.apache.jena.riot.RDFParser.parseNotUri(RDFParser.java:406)
	at org.apache.jena.riot.RDFParser.parse(RDFParser.java:356)
	at org.apache.jena.riot.RDFParserBuilder.parse(RDFParserBuilder.java:570)
	at org.apache.jena.riot.RDFDataMgr.parseFromReader(RDFDataMgr.java:728)
	at org.apache.jena.riot.RDFDataMgr.read(RDFDataMgr.java:287)
	at org.apache.jena.riot.RDFDataMgr.read(RDFDataMgr.java:271)
	at org.apache.jena.riot.adapters.RDFReaderRIOT.read(RDFReaderRIOT.java:62)
	at org.apache.jena.rdf.model.impl.ModelCom.read(ModelCom.java:245)

@afs
Member

afs commented Aug 7, 2024

The error occurs because the property element has two objects:

        <Outer xmlns="http://BASE/">
          <Prop>
            <A/>
            <B/>
          </Prop>
        </Outer>

Only the SAX based RRX parser is affected - the two StAX based ones, and ARP, detect this mistake.
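
To make the "two objects" condition concrete, here is a sketch that walks a document with RDF/XML striping in mind (node element, property element, node element, and so on) and reports property elements that have more than one child element. It uses the JDK's DOM parser, skips any property element carrying rdf:parseType as a simplification, and is not the check Jena performs; the names are hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

/** Hypothetical helper: find property elements with more than one child element. */
final class StripingCheck {
    static final String RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#";

    /** Returns the local names of property elements that contain two or more child elements. */
    static List<String> overfullPropertyElements(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            factory.setFeature(XMLConstants.FEATURE_SECURE_PROCESSING, true);
            Element root = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();

            List<String> found = new ArrayList<>();
            boolean wrapper = RDF_NS.equals(root.getNamespaceURI()) && "RDF".equals(root.getLocalName());
            if (wrapper) {
                // rdf:RDF is only a container; its children are the node elements.
                for (Element node : childElements(root)) {
                    visitNode(node, found);
                }
            } else {
                // rdf:RDF omitted: the single top-level element is itself a node element.
                visitNode(root, found);
            }
            return found;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    private static void visitNode(Element node, List<String> found) {
        for (Element property : childElements(node)) {
            if (property.hasAttributeNS(RDF_NS, "parseType")) {
                continue; // simplification: parseType changes what the children mean
            }
            List<Element> objects = childElements(property);
            if (objects.size() > 1) {
                found.add(property.getLocalName());
            }
            for (Element object : objects) {
                visitNode(object, found);
            }
        }
    }

    private static List<Element> childElements(Element parent) {
        List<Element> out = new ArrayList<>();
        for (Node n = parent.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n instanceof Element e) {
                out.add(e);
            }
        }
        return out;
    }
}
```

On the `Outer`/`Prop` example this reports `Prop`. On the ClassAssertion extract from the earlier comment it reports `ClassAssertion`, the element that holds both `Class` and `AnonymousIndividual`.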

An error is not guaranteed when parsing OWL/XML (".owx") as RDF/XML. Some simple documents will pass.

afs added a commit to afs/jena that referenced this issue Aug 9, 2024
sszuev added a commit to owlcs/ont-api that referenced this issue Aug 9, 2024
afs added a commit that referenced this issue Aug 11, 2024
@afs afs closed this as completed in #2627 Aug 11, 2024