Recording page numbers in USFM and XML #9
Comments
See also #10 where the column break splits a word that has both a hyphen and a soft hyphen.
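Although the `\rem` tag seemed sensible, it causes serious problems in USFM where the page break occurs in the middle of a verse, or even in the middle of a word! For this reason, I changed to use a footnote marker `\f - ... \f*` to record page numbers and column breaks. This can be used mid-verse without much hassle. The minus sign signifies that there is no caller symbol for the note; more commonly, USFM footnotes would have a plus sign (refer to the USFM 2.4 User Reference). However, there may be a problem when the USFM footnote occurs before chapter 1 of any book.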
Is the intended use in digital form to be able to link to a physical page of
the print edition?
A cross reference type marker might be a slightly more appropriate hack than
a footnote: when processed, it automatically generates a link target designed
to be linked to from elsewhere, whereas footnotes generate only links
designed to link out.
I suggest using the form
`\x - \xo (current verse) \xt (current verse) \xta Page number \x*`
and possibly using a different form, `\ex ... \ex*` or `\fe ... \fe*`, instead
of `\x ... \x*`, to keep these page numbers separate from visible
cross references, if they exist.
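As a rough illustration of the suggested form, here is a minimal Python sketch that builds such a caller-less note and drops it into running text at a given offset. The function names, the `Page N` wording after `\xta`, and the sample reference are assumptions made for this example, not part of any existing tool.

```python
def page_note(verse_ref: str, page: int, marker: str = "x") -> str:
    """Build a caller-less note recording a print page number.

    `marker` is "x" by default; "ex" could be passed to keep page
    numbers apart from visible cross references (hypothetical usage).
    """
    return (
        f"\\{marker} - \\xo {verse_ref} \\xt {verse_ref} "
        f"\\xta Page {page} \\{marker}*"
    )


def insert_note(text: str, offset: int, note: str) -> str:
    """Insert the note at a character offset, which may fall mid-word."""
    return text[:offset] + note + text[offset:]


note = page_note("1:1", 254)
print(insert_note("In the beginning", 9, note))
```

Because the note is inserted purely by character offset, the same call works whether the page break falls between verses, between words, or inside a word.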
You might also read about the explicitly marked notes as an option:
`\f` (and `\ef`, `\ex`, `\fdc`, `\fe`, `\x`, `\xdc`, `\xnt`, `\xot` ....) all have
three forms for the marker argument:
`+` = software-created visible marker inline
`-` = no marker visible inline
(anything else) = the text given here is the visible marker that appears
inline. The explicit reading of the spec is '(singular) character', but in
practice I've noted some exceptions, and 2 character markers have been used.
The argument is therefore delimited by the space and might take more than
2 characters.
The problem with explicit markers in most SFM processors is that the code
may not accept multiple digits, and will not accept a space.
`\x 254 \xt ... \x*` : the 54 might not be recognised.
`\x Page 254 \xt ... \x*` : will present a P or a Page, and almost
certainly error on the digits.
`\x Page.254 \xt ... \x*` : everything after the P might not be recognised.
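To make the difference concrete, the following Python sketch contrasts two ways a processor might read the caller argument: delimited by whitespace, or strictly as one character. Both parsers are hypothetical stand-ins written for this example; they do not reproduce the behaviour of any particular SFM tool.

```python
import re

# Opening note marker followed by its caller argument (illustrative subset).
SPACE_DELIMITED = re.compile(r"\\(f|fe|ef|x|ex)\s+(\S+)\s")
SINGLE_CHAR = re.compile(r"\\(f|fe|ef|x|ex) (.)")


def classify_caller(caller: str) -> str:
    """Map a caller argument to one of the three forms."""
    if caller == "+":
        return "auto"      # software-created visible marker
    if caller == "-":
        return "none"      # no marker visible inline
    return "explicit"      # the text itself is the visible marker


def caller_by_space(usfm: str):
    """Lenient reading: the caller runs up to the next whitespace."""
    m = SPACE_DELIMITED.match(usfm)
    return m.group(2) if m else None


def caller_single_char(usfm: str):
    """Strict reading: the caller is exactly one character."""
    m = SINGLE_CHAR.match(usfm)
    return m.group(2) if m else None


for sample in (r"\x 254 \xt ... \x*", r"\x Page 254 \xt ... \x*"):
    print(sample, caller_by_space(sample), caller_single_char(sample))
```

Under the strict reading the remaining digits are left over as stray note content, which is the failure described above; under the lenient reading `Page 254` is still cut at the space.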
On Fri, Jan 5, 2018 at 9:55 AM, David Frank Haslam wrote:
Although the `\rem` tag seemed sensible, it causes serious problems in USFM
where the page break occurs in the middle of a verse, or even in the middle
of a word!
For this reason, I changed to use a footnote marker `\f - ... \f*` to
record page numbers and column breaks.
This can be used mid-verse without much hassle.
The minus sign signifies that there is *no caller symbol* for the note.
More commonly, USFM footnotes would have a plus sign; cf. the *USFM 2.4
User Reference*.
However, there may be a problem when the USFM footnote occurs *before
chapter 1* of any book.
The page and column breaks in the OSIS should not be links. OSIS 2.1.1 defines a milestone type for these. The USFM tags for remarks do at least convert to milestone elements, albeit not in the prescribed format. However, USFM footnotes or cross-references convert to note elements in OSIS. Using USFM was not part of Thom's original digitisation plan.
Also, it has since become apparent that Thom's OSIS for the first 12 books digitised has errors and other inconsistencies. I would have preferred to use a general milestone tag in USFM, but that does not yet exist. Even so, it's something I had already proposed to ICAP for USFM 3.1 (too late for 3.0).
I should add that my switch from `\rem` […] The problem with `\rem` is that it's not a character level marker. You can't use it conveniently in the middle of a word! Yet that's where column breaks and page breaks can occur. After converting USFM to OSIS, further postprocessing of these will be required. The advantage of […] But that does not mitigate the bigger problem of mid-word page or column breaks.
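One possible shape for that postprocessing step is sketched below in Python. It assumes, purely for illustration, that each page-number footnote arrives in the OSIS as a `note` element whose only content is `Page N`, and that the target is a `milestone` with `type="pb"` and the page number in `n`. The actual output of the converter and the exact milestone form prescribed by OSIS 2.1.1 should be checked before relying on this.

```python
import re

# Hypothetical input form: <note ...>Page 254</note>, possibly mid-word.
PAGE_NOTE = re.compile(r"<note\b[^>]*>\s*Page\s+(\d+)\s*</note>")


def notes_to_milestones(osis_fragment: str) -> str:
    """Replace page-number notes with empty milestone elements.

    Notes with any other content are left untouched.
    """
    return PAGE_NOTE.sub(r'<milestone type="pb" n="\1"/>', osis_fragment)


sample = 'In the be<note type="x-page">Page 254</note>ginning'
print(notes_to_milestones(sample))
```

Working on the serialised text rather than on a parsed tree keeps the surrounding word fragments in place, which matters when the break falls in the middle of a word.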
I observe that the XML files contain a record of where a page break occurs, together with the next page number.
There's no defined marker for this in USFM.
The best option would be to use the `\rem` tag to record the page number.
NB. Though USFM does have a `\pb` tag for an explicit page break, we're not trying to specify how future editions should be typeset, but merely recording FIO where page breaks occurred in the original.
btw. `u2o.py` converts each such remark into a milestone element, though not exactly the same as what you have already.