<lg> and <gap> #1419

Closed
hcayless opened this issue Jan 5, 2016 · 21 comments

Comments

@hcayless
Member

hcayless commented Jan 5, 2016

I have a case where an entire strophe of a poem has been omitted. Each strophe is wrapped in an <lg>. I'd like to be able to do

<lg type="strophe">
  <gap reason="lost" quantity="5" unit="line"/>
</lg>

but the content model of <lg> demands at least one ( model.lLike | model.stageLike | model.labelLike | lg ) before model.global (and hence <gap>) is permitted. Interestingly, the Schematron rule

<sch:assert test="count(descendant::tei:lg|descendant::tei:l|descendant::tei:gap) 
  &gt; 0">An lg element must contain at least one child l, lg or gap element.</sch:assert>

implies that an <lg> containing only a <gap> should be legal. I suggest the content model be changed to permit a single <gap> inside <lg>.

@martindholmes
Contributor

Makes sense, but I think you might want to allow multiple gaps too, in case the missing bits are missing in different ways or for different reasons.

@jamescummings
Member

Seems reasonable to me.

@hcayless
Member Author

hcayless commented Jan 5, 2016

Sure. Changing the first required part of the content model to allow gap would achieve that too, because it’s fine to have gaps after one of the permitted elements. Actually, it’s even worse, isn’t it, because you can’t have gap followed by lines inside an lg. So it really does need to be fixed.


@lb42
Member

lb42 commented Jan 5, 2016

If an entire strophe is missing, shouldn't that be a <gap quantity="1" unit="strophe"/>? I am having trouble with the concept of a strophe which is present but has nothing in it.

@hcayless
Member Author

hcayless commented Jan 5, 2016

It’s a conjecture, actually. The editor thinks there’s a missing strophe and conjectures an empty one. In the edition, it’s marked like:

. . . . . . . . . . .
. . . . . . . . . . .
. . . . . . . . . . .
. . . . . . . . . . .
. . . . . . . . . . .

The whole thing would be wrapped in an app, too, but I thought that was unnecessarily complicating things for purposes of a bug report.


@lb42
Member

lb42 commented Jan 5, 2016

OK, so this confirms my view that you have a <gap quantity="1" unit="strophe" resp="#madEditor"/> and there is no need to change the content of <lg> to permit something which seems ontologically implausible.

@raffazizzi
Contributor

I agree with @lb42: if the strophe is missing, I wouldn't expect <lg> to be there. It's true, though, that there wouldn't be an easy place to specify the number of lines... maybe @extent? ("indicates the size of the object concerned using a project-specific vocabulary combining quantity and units in a single string of words")

<gap quantity="1" unit="strophe" extent="5 lines"/>

@hcayless
Member Author

hcayless commented Jan 5, 2016

There are actually good reasons for thinking a strophe has dropped out :-). Your approach makes it much harder to make an argument about the poem’s structure in the encoding. Plus there’s the problem of having to deal with encoding the size, AND there’s still the issue that we can’t have a line group with a missing first line.


@jamescummings
Member

I was assuming that there is some other indication of the strophe. So let's say there are a series of strophes (that happens, right?) the start of which is marked in some renditional but non-textual way. And for one of them the text is entirely missing. If it takes up the same space, has the same layout and external markings but just the text content of the lines is gone, then I can see why @hcayless would want to mark it up this way. Markup isn't just for marking what is there, but for documenting an argument or interpretation, right? (Ok, I would probably mark it up as @raffazizzi has done to be honest... but I can see why one might want to do this.)

@lb42
Member

lb42 commented Jan 5, 2016

No-one is disputing that a strophe may be missing: just how to indicate the fact. I think Raff's suggestion indicates well enough how to encode the size. I agree however that the content model needs change if it doesn't permit something like

<lg> <gap/> <l/> </lg>

(And I still think the schematron rule you cited above is just wrong)

@hcayless
Member Author

hcayless commented Jan 5, 2016

@jamescummings yes, this is part of a structure where it's clear that what's dropped out is an entire line group. The editor is saying: "Hey, there's a line group here, but I don't know what went in it, only how big it was." Personally, I'm not at all comfortable with the idea of @unit containing a conceptual thing rather than an actual unit of measurement (lines, characters, centimeters, etc.). I wouldn't like @unit="paragraph" either, for the same reason. A strophe doesn't have a certain length (even possibly within a single poem, though in this particular case they're always 5 lines), so @lb42's and @raffazizzi's suggestions don't really satisfy.

@lb42
Member

lb42 commented Jan 5, 2016

Here are some cases of <gap> being used for a "conceptual thing" rather than a unit of measurement in the Guidelines, or of @extent being used for an arbitrary measurement of some kind:

<gap quantity="1" unit="essay" reason="sampling"/>
<gap extent="3 sylls"/>
<gap extent="2 beats"/>
<gap reason="passing-truck" quantity="5" unit="s"/>
<gap reason="illegible" agent="rubbing" quantity="4" unit="letter"/>

@raffazizzi
Contributor

To me adding <lg> in a text that otherwise is a transcription of primary source implies that there is a line group there. You said:

The editor is saying: "Hey, there's a line group here, but I don't know what went in it, only how big it was."

but how can there be a line group if there are no lines? Maybe there's a <space>?

I think it makes more sense to encode the editor's conjecture that a line group should be there with <gap>, but I agree with your concerns about its expressiveness, particularly re: the number of lines:

A strophe doesn't have a certain length (even possibly within a single poem, though in this particular case they're always 5 lines)

I'm not sure what to suggest there...

@PFSchaffner
Member

FWIW, I think that in many if not most cases in which we have bits of structure missing, we do attempt to create empty structures down to the lowest level that can be reasonably reconstructed, and fill them with gaps at that level. E.g. a missing scene in a play, a missing chapter, or a missing stanza, thus:

<lg>
<l><gap extent='1 span'/></l>
<l><gap extent='1 span'/></l>
<l><gap extent='1 span'/></l>
<l><gap extent='1 span'/></l>
<l><gap extent='1 span'/></l>
</lg>

counter-intuitive though that may be. It certainly makes rendition easier (".....").

@hcayless
Member Author

hcayless commented Jan 5, 2016

@lb42: The first one does seem a bit silly to me; @extent is there so you can use project-specific vocabulary, so no problem with 2 and 3. I assume the ’s’ in the 4th means 'second', so it's perfectly fine, and 'letter' or 'character' can be a unit of measurement (albeit a little imprecise). The latter is one of the recommended values of @unit. Strophe isn't a unit of measurement. I don't think essay is either. I'm tempted to file a bug on that example.

@raffazizzi this isn't a primary source transcription though, we're encoding an edition. Each strophe is spoken by one of two characters, and it's obvious that one character's "speech" is missing at this point. The editor prints something like:

(A.) . . . . . . . . . . 
     . . . . . . . . . .
     . . . . . . . . . .
     . . . . . . . . . .
     . . . . . . . . . .

and notes in the apparatus that other editors have attempted to resolve the problem by re-arranging the stanzas in various ways. He thinks there's a lacuna here. My reading of what he prints is that he's saying "There's a strophe here, with 5 lines of unknown content inside it." @PFSchaffner's solution is basically the same I think, but I can easily render what the editor printed from <gap unit="line" quantity="5" reason="lost"/>, without needing to encode empty lines.
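As a rough illustration of the rendering claim, here is a minimal Python sketch that turns a single line-counted <gap> into the rows of dots the editor prints. The function name `render_gap`, the `dots_per_line` default, and the way the speaker label is attached are all assumptions made for this example; nothing here comes from an actual stylesheet.

```python
import xml.etree.ElementTree as ET


def render_gap(gap, dots_per_line=10, label=None):
    """Render a <gap unit="line" quantity="N"/> as N rows of spaced dots.

    `label` is an optional speaker label (hypothetical parameter); it is
    printed on the first row only and later rows are padded to align.
    """
    if gap.get("unit") != "line":
        raise ValueError("this sketch only handles gaps measured in lines")
    count = int(gap.get("quantity"))
    row = " ".join(["."] * dots_per_line)
    prefix = f"({label}) " if label else ""
    padding = " " * len(prefix)
    return "\n".join(
        (prefix if i == 0 else padding) + row for i in range(count)
    )


gap = ET.fromstring('<gap reason="lost" quantity="5" unit="line"/>')
print(render_gap(gap, label="A."))
```

The same function would work whether the <gap> sits directly inside <lg> or elsewhere, since it only reads the element's own attributes.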

@raffazizzi
Contributor

@hcayless I see. Though if you had to encode a missing word from the same edition, say [....], you would encode it as <gap reason="lost" quantity="4" unit="character"/> not <w><gap unit="characters" extent="4"/></w>, right?
Although I concede that the latter encoding could still be useful / make sense in a tokenization.

@sydb
Member

sydb commented Jan 25, 2016

  1. I think @hcayless is correct, essay is not a unit. I’m inclined to say a strophe is not a unit, either.
  2. I think encoding 5 <l> <gap/> </l> inside an <lg> is a perfectly reasonable solution … it even scales well to the case where you have some of the strophe, but not all of it. (I think this is how we would do this at the WWP, in large part because it lets us count the number of <l> elements. Except that we would use the <wwp:elision> element instead of <tei:gap/>.)
  3. I think encoding a <gap extent="1 strophe"/> is also a reasonable solution. This one works better in the case when you don’t know exactly how many metrical lines are missing. (Although it is clumsy if you want to indicate both “1 stanza” and “5 or 7 metrical lines”.)
  4. However, if there really is a use case where an encoder wants to say <lg> <gap/> </lg>, I don’t see why the Guidelines should prevent it.

@lb42: You assert (pardon the pun) that the Schematron (reproduced below) is “just wrong”. Besides the fact that it uses the descendant:: axis when it talks about children, and besides the fact that it uses a count() function superfluously, and besides the fact that it is missing the comma after “lg”, what is wrong with it? (I’m guessing you object to the implication that <gap> can fulfill the content of <lg>, but want to be sure.)

      <sch:assert test="count(descendant::tei:lg|descendant::tei:l|descendant::tei:gap) &gt; 0">An lg element
        must contain at least one child l, lg or gap element.</sch:assert>

@martindholmes
Contributor

I think the descendant axis is used because of the possibility that the content may be enclosed within another element such as <argument> or <epigraph>; the idea I believe was that the content must at some level have line(s), linegroup(s), or gap(s) where they should be, but it's difficult to predict, given the complexity of the content model, whether it will be a direct child or not.
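To make the difference between the two axes concrete, the following Python sketch mimics the assertion's test with the standard library, once over descendants (as the rule is written) and once over children (as its message is worded). The helper names `passes_descendant_test` and `passes_child_test` and the sample documents are assumptions for illustration; this is not how a Schematron processor evaluates the rule.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
# Qualified names of the three elements the assertion looks for.
WANTED = {f"{{{TEI_NS}}}{name}" for name in ("lg", "l", "gap")}


def passes_descendant_test(lg):
    """True if any descendant of lg (at any depth) is an lg, l or gap."""
    return any(el is not lg and el.tag in WANTED for el in lg.iter())


def passes_child_test(lg):
    """True if any direct child of lg is an lg, l or gap."""
    return any(child.tag in WANTED for child in lg)


# An lg whose only content is a gap.
only_gap = ET.fromstring(
    f'<lg xmlns="{TEI_NS}"><gap reason="lost" quantity="5" unit="line"/></lg>'
)
# An lg whose only gap is buried inside an argument.
buried_gap = ET.fromstring(
    f'<lg xmlns="{TEI_NS}"><argument><p>Summary <gap/></p></argument></lg>'
)

for name, doc in (("only_gap", only_gap), ("buried_gap", buried_gap)):
    print(name, passes_descendant_test(doc), passes_child_test(doc))
```

The two tests agree on the first sample and disagree on the second, which is the case where the choice of axis matters.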

@sydb
Member

sydb commented Jan 25, 2016

Well, if that is why it was used, it seems to run against both the prose of the <assert> and the normal “TEI abstract model” definition of a line group (at least, in my mind). But far more importantly, neither <argument> nor <epigraph> satisfies the RELAX NG content model of <lg> without a following <l>, <camera>, <caption>, <move>, <sound>, <stage>, <tech>, <view>, <desc>, <label>, or <lg>; so it makes no sense.
To see this, note that the content model has 4 “clauses”, of which only the 2nd has any required elements.

(
  ( model.divTop | model.global )*,
  ( model.lLike | model.stageLike | model.labelLike | lg ),
  ( model.lLike | model.stageLike | model.labelLike | model.global | lg )*,
  ( ( model.divBottom ), model.global* )*
)

BTW, this is a pretty screwy content model. The references to model.stageLike and model.labelLike in clause 2 should be in clause 1. In which case, the Schematron probably would not be needed at all. This is obviously fodder for a different ticket, of course.
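As a rough check of the four clauses quoted above, the model can be approximated as a regular expression over the names of an <lg>'s child elements. This is only a sketch: the class memberships below are small illustrative subsets chosen for the example (an assumption, not the full TEI class definitions), and `lg_children_valid` is a hypothetical helper name.

```python
import re

# One-letter codes for each class; memberships are illustrative subsets only.
CLASSES = {
    "T": {"head", "argument", "epigraph"},   # model.divTop (subset)
    "G": {"gap", "note", "pb"},              # model.global (subset)
    "L": {"l"},                              # model.lLike
    "S": {"stage", "camera", "caption", "move", "sound", "tech", "view"},
    "B": {"label", "desc"},                  # model.labelLike
    "R": {"lg"},                             # nested lg
    "E": {"trailer", "closer"},              # model.divBottom (subset)
}

# Clause 1, clause 2 (the only required one), clause 3, clause 4.
LG_MODEL = re.compile(r"[TG]*[LSBR][LSBGR]*(?:EG*)*")


def encode(children):
    """Map a list of child element names to a string of class codes."""
    codes = []
    for name in children:
        for code, members in CLASSES.items():
            if name in members:
                codes.append(code)
                break
        else:
            raise ValueError(f"element not covered by this sketch: {name}")
    return "".join(codes)


def lg_children_valid(children):
    """True if the child sequence matches the approximated lg model."""
    return LG_MODEL.fullmatch(encode(children)) is not None


for seq in (["gap"], ["gap", "l"], ["l", "gap"], ["argument"], ["epigraph", "l"]):
    print(seq, lg_children_valid(seq))
```

Under these assumptions a lone `gap` is rejected because clause 2 is never satisfied, and the same holds for a lone `argument`. A `gap` followed by `l` is accepted through the leading `model.global` in clause 1 of the model as quoted.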

@sydb
Member

sydb commented Apr 26, 2016

SB to try to make content of <lg> allow <gap> in place of <l> or <lg> in deterministic way.

@sydb
Member

sydb commented Sep 26, 2016

I spent several hours on this, and while I still believe it is probably possible to get a deterministic content model that does the right thing, it requires (at a minimum) creating a new model class model.global.sans-gap, and probably others. Likely way more effort than it is worth.
Council decides to close w/o doing anything.
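One way to see why simply adding <gap> to the required clause is presumably not enough: a deterministic model must be able to decide, on reading each child, which clause consumes it. The Python sketch below checks only that one condition, by looking for elements that could be claimed both by the optional leading clause and by the required clause. The class contents are illustrative subsets and `ambiguous_tokens` is a hypothetical helper; this is a simplified check, not a full determinism test of a RELAX NG or DTD content model.

```python
# Illustrative subsets of the TEI classes (assumptions for this sketch).
MODEL_GLOBAL = {"gap", "note", "pb"}
MODEL_DIV_TOP = {"head", "argument", "epigraph"}

# Clause 1 is optional and repeatable; clause 2 is the required element.
clause1_current = MODEL_DIV_TOP | MODEL_GLOBAL
clause2_current = {"l", "stage", "label", "lg"}

# Naive fix: just add gap to the required clause.
clause2_with_gap = clause2_current | {"gap"}

# Fix along the lines of a model.global.sans-gap class for clause 1.
clause1_sans_gap = MODEL_DIV_TOP | (MODEL_GLOBAL - {"gap"})


def ambiguous_tokens(optional_prefix, required):
    """Elements that could be consumed by either clause when read first."""
    return optional_prefix & required


print(ambiguous_tokens(clause1_current, clause2_current))    # current model
print(ambiguous_tokens(clause1_current, clause2_with_gap))   # naive fix
print(ambiguous_tokens(clause1_sans_gap, clause2_with_gap))  # sans-gap fix
```

Only the naive variant reports an overlap, on `gap`. Removing the overlap requires a variant of the global class without <gap>, which matches the extra class mentioned above.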

sydb closed this as completed Sep 26, 2016