folialint produces invalid FoLiA out of dubious input #23

Closed
kosloot opened this issue Oct 30, 2018 · 6 comments

@kosloot
Contributor

kosloot commented Oct 30, 2018

related to proycon/flat#138

Consider this file:

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="folia.xsl"?>
<FoLiA xmlns:xlink="http://www.w3.org/1999/xlink" xmlns="http://ilk.uvt.nl/folia" xml:id="test" generator="libfolia-v0.14" version="0.12.0">
  <metadata type="native">
    <annotations>
      <pos-annotation set="pos"/>
      <syntax-annotation set="syn"/>
    </annotations>
  </metadata>
  <text xml:id="test.text">
    <s xml:id="s.1">
      <w xml:id="s.1.w.1">
        <t>Is@</t>
        <pos class="BEP" />
      </w>
      <syntax>
        <su xml:id="s.1.su.1" class="IP-MAT">
          <su xml:id="s.1.su.2" class="NP-SBJ">
            <w xml:id="s.1.su.w.1">
              <t>*exp*</t>
              <pos class="EX" />
            </w>
          </su>
        </su>
      </syntax>
    </s>
  </text>
</FoLiA>

It contains a <w> inside the <su> node that IS NOT present in the <s> itself.
This is a construction that had not been anticipated until now.

When running folialint on this file, an INVALID output is produced:

<?xml-stylesheet type="text/xsl" href="folia.xsl"?>
<FoLiA xmlns:xlink="http://www.w3.org/1999/xlink" xmlns="http://ilk.uvt.nl/folia" xml:id="test" generator="libfolia-v1.20" version="0.12.0">
  <metadata type="native">
    <annotations>
      <pos-annotation set="pos"/>
      <syntax-annotation set="syn"/>
    </annotations>
  </metadata>
  <text xml:id="test.text">
    <s xml:id="s.1">
      <w xml:id="s.1.w.1">
        <t>Is@</t>
        <pos class="BEP"/>
      </w>
      <syntax>
        <su xml:id="s.1.su.1" class="IP-MAT">
          <su xml:id="s.1.su.2" class="NP-SBJ">
            <wref id="s.1.su.w.1" t="*exp*"/>
          </su>
        </su>
      </syntax>
    </s>
  </text>
</FoLiA>

A <wref> is generated that refers to a non-existent word!

Desired behavior:

  • Either reject the input
  • Or emit a real <w> rather than a <wref>
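
To illustrate the problem, here is a minimal Python sketch that reports every <wref> whose id does not match any <w> in the same document. It is not part of folialint or libfolia. The function name `dangling_wrefs` and the trimmed-down `SAMPLE` document are assumptions made for this example, and it uses only the standard-library `xml.etree.ElementTree` module.

```python
import xml.etree.ElementTree as ET

FOLIA = "{http://ilk.uvt.nl/folia}"
# ElementTree exposes the xml:id attribute under the XML namespace URI.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Trimmed-down version of the invalid output shown above (illustrative only).
SAMPLE = """<FoLiA xmlns="http://ilk.uvt.nl/folia" xml:id="test">
  <text xml:id="test.text">
    <s xml:id="s.1">
      <w xml:id="s.1.w.1"><t>Is@</t></w>
      <syntax>
        <su xml:id="s.1.su.1" class="IP-MAT">
          <su xml:id="s.1.su.2" class="NP-SBJ">
            <wref id="s.1.su.w.1" t="*exp*"/>
          </su>
        </su>
      </syntax>
    </s>
  </text>
</FoLiA>"""


def dangling_wrefs(xml_text):
    """Return the ids used by <wref> elements that no <w> element declares."""
    root = ET.fromstring(xml_text)
    declared = {w.get(XML_ID) for w in root.iter(FOLIA + "w")}
    return [
        ref.get("id")
        for ref in root.iter(FOLIA + "wref")
        if ref.get("id") not in declared
    ]


print(dangling_wrefs(SAMPLE))  # → ['s.1.su.w.1']
```

An empty list would mean that every <wref> resolves to a word that actually exists in the document.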
@proycon
Member

proycon commented Oct 30, 2018

also cross-referencing proycon/folia#58

@kosloot
Contributor Author

kosloot commented Oct 30, 2018

I added a patch to libfolia to output the <w> AS IS when it is the only occurrence (so there is NO reference to the same <w> elsewhere).

@kosloot
Contributor Author

kosloot commented Oct 30, 2018

I also ran foliavalidator on this file. It rejects it:

Error on line 19: Element su has extra content: w
Error on line 0: Extra element su in interleave
Error on line 17: Element su failed to validate content
Error on line 17: Element syntax failed to validate content
Error on line 0: Extra element syntax in interleave
Error on line 3: Element FoLiA failed to validate content
VALIDATION ERROR against RelaxNG schema (stage 1/2), in tests/scary.xml
Element su has extra content: w, line 19

@kosloot
Contributor Author

kosloot commented Oct 30, 2018

This may be stating the obvious, but text inside this kind of word is exempt from all text processing,
which is probably exactly what you would want to see...
In the example, the s.text() function should return just Is@, and not Is@ *exp*
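
The expected behavior can be sketched in Python as follows. This is not how libfolia implements s.text(); it is a simplified illustration under the assumption that sentence text is the space-joined <t> content of the <w> elements that are direct children of <s>. The helper name `sentence_text` and the reduced `SAMPLE` document are hypothetical.

```python
import xml.etree.ElementTree as ET

NS = {"f": "http://ilk.uvt.nl/folia"}

# Reduced version of the input file from this issue (illustrative only).
SAMPLE = """<FoLiA xmlns="http://ilk.uvt.nl/folia" xml:id="test">
  <text xml:id="test.text">
    <s xml:id="s.1">
      <w xml:id="s.1.w.1"><t>Is@</t></w>
      <syntax>
        <su xml:id="s.1.su.1" class="IP-MAT">
          <su xml:id="s.1.su.2" class="NP-SBJ">
            <w xml:id="s.1.su.w.1"><t>*exp*</t></w>
          </su>
        </su>
      </syntax>
    </s>
  </text>
</FoLiA>"""


def sentence_text(s_elem):
    """Join the <t> text of the <w> elements that are direct children of <s>.

    Words nested deeper (for example inside <syntax>/<su>) are not matched
    by the relative path "f:w", so they do not contribute to the result.
    """
    words = s_elem.findall("f:w", NS)
    return " ".join(w.findtext("f:t", default="", namespaces=NS) for w in words)


root = ET.fromstring(SAMPLE)
sentence = root.find(".//f:s", NS)
print(sentence_text(sentence))  # → Is@
```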

@kosloot
Contributor Author

kosloot commented Jan 21, 2019

I modified libfolia to also reject this kind of construction:

failed: XML error: connecting a <w> to an <su> is forbidden, use <wref>

I think it is correct now.
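
For readers who want to screen files before handing them to libfolia, a rough Python equivalent of this check might look like the sketch below. It only mirrors the rule described here (a <w> directly inside an <su> is rejected) and reuses the error text for readability. The function name `check_no_w_in_su` and the two small test documents are assumptions for this example, not libfolia code.

```python
import xml.etree.ElementTree as ET

FOLIA = "{http://ilk.uvt.nl/folia}"

# Hypothetical minimal documents: one with a <w> inside <su>, one with a <wref>.
BAD = """<FoLiA xmlns="http://ilk.uvt.nl/folia" xml:id="test">
  <text xml:id="test.text"><s xml:id="s.1"><syntax>
    <su xml:id="s.1.su.1" class="NP-SBJ">
      <w xml:id="s.1.su.w.1"><t>*exp*</t></w>
    </su>
  </syntax></s></text>
</FoLiA>"""

GOOD = """<FoLiA xmlns="http://ilk.uvt.nl/folia" xml:id="test">
  <text xml:id="test.text"><s xml:id="s.1">
    <w xml:id="s.1.w.1"><t>Is@</t></w>
    <syntax>
      <su xml:id="s.1.su.1" class="NP-SBJ"><wref id="s.1.w.1" t="Is@"/></su>
    </syntax>
  </s></text>
</FoLiA>"""


def check_no_w_in_su(xml_text):
    """Raise ValueError if any <su> has a <w> as a direct child."""
    root = ET.fromstring(xml_text)
    for su in root.iter(FOLIA + "su"):
        # find() with a plain tag name only looks at direct children.
        if su.find(FOLIA + "w") is not None:
            raise ValueError("connecting a <w> to an <su> is forbidden, use <wref>")


check_no_w_in_su(GOOD)  # passes silently
try:
    check_no_w_in_su(BAD)
except ValueError as err:
    print("failed: XML error:", err)
```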

@proycon
Member

proycon commented Jan 21, 2019

Agreed

@kosloot kosloot closed this as completed Mar 18, 2019