Small issue in section v.4 of TEI Guidelines #1949

Closed
trevordavidlawrence opened this issue Dec 3, 2019 · 25 comments

@trevordavidlawrence

In section v.4 "Complicating the Issue" of the TEI guidelines, the third paragraph reads:

[...] In XPath it is also possible to indicate that any number of other items may intervene by repeating the solidus. For example, the XPath expression /anthology/poem//line[1] will refer to the first line of each poem in the anthology, irrespective of whether it is in a stanza.

This is at the very least slightly misleading. The XPath expression /anthology/poem//line[1] will not select the first line element of each poem element; rather, it will select those descendant line elements of each poem element that are the first line element within their own parent.

This is demonstrated clearly by evaluating the expression against the document presented a few pages earlier:

<?xml version="1.0" encoding="UTF-8"?>
<anthology>
    <poem>
        <heading>The SICK ROSE</heading>
        <stanza>
            <line>O Rose thou art sick.</line>
            <line>The invisible worm,</line>
            <line>That flies in the night</line>
            <line>In the howling storm:</line>
        </stanza>
        <stanza>
            <line>Has found out thy bed</line>
            <line>Of crimson joy:</line>
            <line>And his dark secret love</line>
            <line>Does thy life destroy.</line>
        </stanza>
    </poem>
    <!-- more poems go here -->
</anthology>

The expression selects a sequence of two line elements:

<line>O Rose thou art sick.</line>
<line>Has found out thy bed</line>

as each is the first line within its parent stanza element. It does not select the "first line of each poem in the anthology" as stated.
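To check this behaviour without a full XSLT/XPath processor, here is a minimal sketch using Python's standard-library xml.etree.ElementTree. That library is only my choice for illustration and implements a limited subset of XPath. Its [n] predicate, like XPath's, tests an element's position among the same-named children of its own parent, so `poem//line[1]` evaluated from the <anthology> element stands in for /anthology/poem//line[1]. In XPath 1.0 terms, `//` abbreviates `/descendant-or-self::node()/`, so the [1] belongs to the final `line` step, not to the path as a whole.

```python
import xml.etree.ElementTree as ET

# The example document from the Guidelines (XML declaration omitted).
ANTHOLOGY = """<anthology>
    <poem>
        <heading>The SICK ROSE</heading>
        <stanza>
            <line>O Rose thou art sick.</line>
            <line>The invisible worm,</line>
            <line>That flies in the night</line>
            <line>In the howling storm:</line>
        </stanza>
        <stanza>
            <line>Has found out thy bed</line>
            <line>Of crimson joy:</line>
            <line>And his dark secret love</line>
            <line>Does thy life destroy.</line>
        </stanza>
    </poem>
</anthology>"""

root = ET.fromstring(ANTHOLOGY)  # root is the <anthology> element

# Paths are relative to root, so "poem//line[1]" plays the role of
# /anthology/poem//line[1]. The [1] is tested against each line's
# position among the <line> children of its own parent <stanza>.
first_in_parent = [line.text for line in root.findall("poem//line[1]")]
print(first_in_parent)  # ['O Rose thou art sick.', 'Has found out thy bed']
```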

To me, this initially surprising interaction between positional predicates and the // operator in XPath is one of the most common trip-ups that people new to XML/XPath encounter, so I think it's important enough to warrant some rephrasing. Unfortunately, I can't think of a way to express the exact meaning of that XPath expression at the general level of complexity of the surrounding text. Possibly you could drop the positional predicate from the example altogether and simply say that /anthology/poem//line will select all the lines in each poem.

@ebeshero
Member

ebeshero commented Dec 3, 2019

@trevordavidlawrence Thanks for this post. A simple fix to our XPath that would select only the very first line would be to wrap it in parentheses, so that the expression "walks the whole tree" first, and then selects the very first <line> element in the sequence of all results: (//anthology/poem//line)[1].
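ElementTree (used here only as a stand-in for a full XPath engine) does not accept parenthesized expressions, so this sketch emulates (//anthology/poem//line)[1] in two steps: build the whole sequence of lines first, then take its first member. The document is an abbreviated copy of the example poem.

```python
import xml.etree.ElementTree as ET

# Abbreviated copy of the example document.
ANTHOLOGY = """<anthology>
  <poem>
    <heading>The SICK ROSE</heading>
    <stanza>
      <line>O Rose thou art sick.</line>
      <line>The invisible worm,</line>
    </stanza>
    <stanza>
      <line>Has found out thy bed</line>
      <line>Of crimson joy:</line>
    </stanza>
  </poem>
</anthology>"""

root = ET.fromstring(ANTHOLOGY)

# Step 1: every <line> under every <poem>, in document order
# (the part of the expression inside the parentheses).
all_lines = root.findall("poem//line")

# Step 2: apply [1] to that whole sequence rather than per parent.
very_first = all_lines[0].text if all_lines else None
print(very_first)  # O Rose thou art sick.
```

Note that the parenthesized form yields a single line for the whole anthology, not one per poem.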

@ebeshero
Member

ebeshero commented Dec 3, 2019

Here's the location in the Guidelines that we need to amend to correct the XPath: https://tei-c.org/release/doc/tei-p5-doc/en/html/SG.html#SG152

@ebeshero
Member

ebeshero commented Dec 3, 2019

Now that I look at the section more closely, we should probably keep the XPath expression as is and amend the language about what it retrieves. This is simply retrieving the first <line> inside each <stanza>. We could add something more if it seems necessary to address how to retrieve the very first line in the poem, but that might be out of scope for this example.

@ebeshero
Member

ebeshero commented Dec 3, 2019

No... of course, as @trevordavidlawrence points out, the language is really problematic, especially as connected to the example given. We'll need to correct the XPath and the language. @lujessica and I will give this a try soon and come back to this ticket to ask whether our correction is an improvement!

@lb42
Member

lb42 commented Dec 3, 2019

As the onlie begetter of this problem, I am delighted to see that it's taken so long to get noticed, and that no-one has quite got it right yet... There's a clue to the authorial intent in the phrase "irrespective of whether it is in a stanza." The expression will match the first line within its parent, whatever its parent may be. See what happens if for example you add some <line>s at the end outside a <stanza>, or a <poem> without any <stanza> children.
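A sketch of the suggested experiment, again assuming Python's xml.etree.ElementTree as the evaluator. The test document is hypothetical, with placeholder line text: the first poem has a stanza followed by two loose lines, and the second poem has no stanzas at all.

```python
import xml.etree.ElementTree as ET

# Hypothetical test document (placeholder text): poem A ends with two
# <line>s outside any <stanza>; poem B has no <stanza> children.
TEST = """<anthology>
  <poem>
    <heading>A</heading>
    <stanza>
      <line>a1</line>
      <line>a2</line>
    </stanza>
    <line>a3</line>
    <line>a4</line>
  </poem>
  <poem>
    <heading>B</heading>
    <line>b1</line>
    <line>b2</line>
  </poem>
</anthology>"""

root = ET.fromstring(TEST)

# Same expression as in the Guidelines: /anthology/poem//line[1]
selected = [line.text for line in root.findall("poem//line[1]")]
print(selected)  # ['a1', 'a3', 'b1']
```

The loose line a3 is picked up because it is the first <line> child of its parent <poem>, and b1 for the same reason; a2, a4 and b2 are never first within their parents.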

@ebeshero
Member

ebeshero commented Dec 3, 2019

@lb42 The example here doesn't give us any <line> elements outside of a stanza, so that problem struck me as I was looking at it just now. If we want the matching of lines outside of stanzas to be a test case, we'd want an example that features such a thing. I don't think it's really necessary to change the nice, simple example though, to make the point about the solidus and, really, the descendant axis. I'm not sure how deep to get into the weeds here, but we do need to correct this.

@lb42
Member

lb42 commented Dec 3, 2019

Them's not weeds!

@ebeshero
Member

ebeshero commented Dec 3, 2019

Well, what if we add a little markup to the example? This is just a super-simple XML and not TEI, and the point is to show us something about XPath. Maybe something simple like <flower> to wrap around the word "rose" where it appears in the <heading> and the first <line>? And we make this example retrieve the first <flower> in the <poem> regardless of whether it's in a <line> or not?
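A rough sketch of how the proposed <flower> markup would behave, using the same Python ElementTree stand-in on an abbreviated copy of the poem. ElementTree has no descendant:: axis, so the per-poem result is emulated with iter(), which walks an element's descendants in document order.

```python
import xml.etree.ElementTree as ET

# Proposed markup (abbreviated poem): <flower> wraps "Rose" in the
# heading and in the first line.
FLOWERS = """<anthology>
  <poem>
    <heading>The SICK <flower>ROSE</flower></heading>
    <stanza>
      <line>O <flower>Rose</flower> thou art sick.</line>
      <line>The invisible worm,</line>
    </stanza>
  </poem>
</anthology>"""

root = ET.fromstring(FLOWERS)

# poem//flower[1]: every <flower> that is the first <flower> child of its
# own parent, so both the heading's and the line's flowers qualify.
per_parent = [f.text for f in root.findall("poem//flower[1]")]

# Emulates poem/descendant::flower[1]: the first <flower> at any depth
# inside each poem (None if a poem has no <flower>).
per_poem = []
for poem in root.findall("poem"):
    first = next(poem.iter("flower"), None)
    per_poem.append(first.text if first is not None else None)

print(per_parent)  # ['ROSE', 'Rose']
print(per_poem)    # ['ROSE']
```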

@lb42
Member

lb42 commented Dec 3, 2019

I'd suggest changing the explanation to say that the XPath (unchanged) will identify the first line of each stanza, as Trevor has indicated. I would, however, add a proviso that this XPath will also select the first line of a poem if the poem has no stanzas, or if it has stanzas within blorts within wibbles. It's a part of explaining what // means, after all. I would NOT recommend introducing complexities like bracketing or the phrase "walks the whole tree".
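A small sketch of that proviso under the same ElementTree assumption; <wibble> and <blort> are the placeholder names from the comment, used here as hypothetical wrappers. For contrast, the child-only path poem/stanza/line[1] is evaluated too, showing what the // adds.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: the first poem nests its stanza inside
# <wibble><blort>; the second poem has no stanza at all.
NESTED = """<anthology>
  <poem>
    <wibble>
      <blort>
        <stanza>
          <line>first</line>
          <line>second</line>
        </stanza>
      </blort>
    </wibble>
  </poem>
  <poem>
    <line>only line</line>
  </poem>
</anthology>"""

root = ET.fromstring(NESTED)

# With //, <line>s at any depth below <poem> are candidates.
selected = [line.text for line in root.findall("poem//line[1]")]

# With single slashes, only poem > stanza > line is considered.
child_only = [line.text for line in root.findall("poem/stanza/line[1]")]

print(selected)    # ['first', 'only line']
print(child_only)  # []
```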

@ebeshero
Member

ebeshero commented Dec 3, 2019

Yeah--it seems the point is really about how the descendant axis works, and how selecting the first of a phenomenon might catch it at different levels of the tree. Our current example isn't really helpful for showing that.

@ebeshero
Member

Okay, I'm reviewing this material in light of the full context and I think I see how we can fix it. @lujessica and I will be working on this together tomorrow. In passing I note that this links to some good but somewhat dated training resources on XPath. Any objections to my adding @djbpitt's XPath tutorial? http://dh.obdurodon.org/introduction-xpath.xhtml

@ebeshero
Member

@lujessica and I are thinking that, while we're in here editing this, we might as well update the graphic (showing the tree structure) to make it match the example better, and to better show the XML hierarchy.

@lujessica
Contributor

Our next move will be to revise the graphic, replacing title with heading, realigning heading with stanza, and removing numerical expressions from line.

@ebeshero ebeshero added this to the Guidelines 3.7.0 milestone Dec 14, 2019
@ebeshero
Member

@lujessica I think the new diagram is much improved and definitely is a better match to our example, now that it's using the same element names and more clearly representing the XML hierarchy. Thanks for working on it! From my local build of the Guidelines the new diagram + corrected prose looks like this on the built HTML page--it's a little fuzzy but I think the old one was, too:

[Screenshot 2019-12-20 11 26 34]

I tried converting it to SVG in Inkscape to see if I could sharpen it up, but that didn't work. And then I thought I might try a little exercise to output this as SVG via XSLT over the poem--since these are just simple shapes with text in them. If I can get it looking a little sharper that way, I'll push something else up. But thanks for fixing this age-old problematic diagram and revising the prose. I'll close this ticket once I've either succeeded or given up on SVG-ifying our new graphic.

@ebeshero
Member

ebeshero commented Dec 27, 2019

Yay! I set myself a little holiday XSLT puzzle to make an honest SVG of this diagram, and here's how it looks in my local build:
[Screenshot 2019-12-26 20 47 40]

@ebeshero
Member

ebeshero commented Dec 27, 2019

I'm closing this ticket with 93ed1d6 as we've now thoroughly revised both the explanation and the associated graphic.

@lb42
Member

lb42 commented Dec 27, 2019

The graphic is nice, but I still think it would have been better to include a stanza-free poem in the example. Pedagogically better because more realistic -- many poems don't have...

@lb42
Member

lb42 commented Dec 27, 2019

... stanzas -- as well as motivating the issue in the text discussed initially on this ticket

@ebeshero
Member

ebeshero commented Dec 27, 2019

If @lb42 will kindly review the initial posts on this ticket, it is evident that the issue is not poetic representation at all but a faulty explanation of XPath. Along the way we found faults with the graphic representing the example, and we took time to render it with matching element names, and now as an SVG made with XSLT that is readily adaptable to any example. Because we need a simple XML example for the purposes of explaining XPath, I don’t at this point find fault with the example at all, but if you do, @lb42, perhaps you wish to reopen the ticket and supply a different base of poems. And don’t forget to change the discussion of XPath accordingly. I saved an updated example, poems.xml, and my XSLT that produces the SVG (xml-to-svg-flowchart.xsl) in the Images directory together with the SVG file now in place, so I would urge you to update them to match whatever change to the base example you now wish to implement.

@ebeshero
Member

ebeshero commented Dec 27, 2019

By the way, the XSLT is optimized for the current example but if we go with a stanza-less poem with many lines, it will exceed the viewport I set for the SVG. I was thinking of determining the viewport by variables anyway so the XSLT is more universally applicable and renders something on one screen regardless of the size and hierarchy depth of the source documents. So if people really want to reopen this ticket and totally overhaul the example XML with a completely new set of poems, I am happy to adapt the XSLT that makes an SVG flowchart of them, whatever they are.

@sydb
Member

sydb commented Dec 28, 2019

I’m not exactly on-topic here, @ebeshero, but I’m wondering if you could post your XSLT program, or better yet check it into P5/Utilities/.

@ebeshero
Member

@sydb My XSLT is in P5/Images now, though I should check in a fresher version of it that seems to resize for a screen with changing source inputs. It was/is most definitely a fun “rabbit hole” for the holiday. I noticed one or two other XSLT files in Images that seemed to be connected to graphics in the Guidelines in some way, so I thought that might be the right directory for it, to keep it in context with both its immediate source and its output. Is Utilities/ better?

@lb42
Member

lb42 commented Dec 28, 2019

@ebeshero I was merely saying that if I were to write this example again, I'd (a) choose an example in which one poem had stanzas and one didn't (b) show an XPath which would select the first line of each poem whether or not it occurred inside a stanza (c) Discuss. That was my original intention as well but somewhere along the line the text got mangled, whence the perceptive bug report.
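For point (b), XPath 1.0 can express this as /anthology/poem/descendant::line[1]: the predicate then counts positions along the descendant axis from each poem, rather than among a line's siblings (the XPath 1.0 specification itself notes that //para[1] and /descendant::para[1] differ). ElementTree has no explicit axes, so this sketch emulates that expression with a helper, first_line_per_poem (a name introduced here for illustration), and compares it with the Guidelines' expression on a hypothetical mixed anthology.

```python
import xml.etree.ElementTree as ET

# Hypothetical mixed anthology: one poem with stanzas, one without.
MIXED = """<anthology>
  <poem>
    <heading>With stanzas</heading>
    <stanza>
      <line>s1</line>
      <line>s2</line>
    </stanza>
    <stanza>
      <line>s3</line>
    </stanza>
  </poem>
  <poem>
    <heading>Without stanzas</heading>
    <line>p1</line>
    <line>p2</line>
  </poem>
</anthology>"""

root = ET.fromstring(MIXED)


def first_line_per_poem(anthology):
    """Emulate /anthology/poem/descendant::line[1]: for each poem, the
    first <line> descendant in document order, at whatever depth."""
    result = []
    for poem in anthology.findall("poem"):
        first = next(poem.iter("line"), None)  # iter() is document order
        if first is not None:
            result.append(first.text)
    return result


per_parent = [line.text for line in root.findall("poem//line[1]")]
per_poem = first_line_per_poem(root)
print(per_parent)  # ['s1', 's3', 'p1']
print(per_poem)    # ['s1', 'p1']
```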

@ebeshero
Member

Reopening this for a moment while I replace the SVG with a .png image here after all, at least until we have worked out how to include SVG in the epub3 build of the Guidelines.

@ebeshero
Member

ebeshero commented Dec 29, 2019

Closing this again, as all is well with our builds now, but I'll open a new ticket on the Stylesheets repo about handling SVG in the Guidelines and the troubles with epub3.
