Recording page numbers in USFM and XML #9

Open
DavidHaslam opened this issue Jan 3, 2018 · 5 comments

@DavidHaslam

I observe that the XML files contain a record of where a page break occurs together with the next page number, e.g.

<milestone type="pb" n="62"/>

There's no defined marker for this in USFM.

The best option would be to use the \rem tag to record the page number, thus:

\rem page 62

NB. Although USFM does have the \pb tag for an explicit page break, we're not trying to specify how future editions should be typeset; we're merely recording, for information only, where page breaks occurred in the original.

Btw, u2o.py converts each such remark into a milestone element, though not in exactly the same form as the ones you already have.
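
To illustrate the mapping proposed above, here is a minimal Python sketch that turns page-break milestones from the XML into \rem lines. The function name `milestone_to_rem` is hypothetical, and the regex assumes the attributes appear in the order shown in the example (type, then n); it is not part of u2o.py or any existing tool.

```python
import re

# Matches a page-break milestone such as <milestone type="pb" n="62"/>.
# Assumption: attributes appear as type then n, as in the example above.
PB_MILESTONE = re.compile(r'<milestone\s+type="pb"\s+n="(\d+)"\s*/>')


def milestone_to_rem(xml_fragment: str) -> list:
    """Return one '\\rem page N' line per page-break milestone found."""
    return ["\\rem page " + n for n in PB_MILESTONE.findall(xml_fragment)]


if __name__ == "__main__":
    sample = '<p>some text <milestone type="pb" n="62"/> more text</p>'
    for line in milestone_to_rem(sample):
        print(line)
```

Note that this only yields the remark lines themselves; where each line should be placed in the USFM file still has to be decided separately.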

@DavidHaslam
Author

See also #10 where the column break splits a word that has both a hyphen and a soft hyphen.

@DavidHaslam
Author

Although the \rem tag seemed sensible, it causes serious problems in USFM where the page break occurs in the middle of a verse, or even in the middle of a word!

For this reason, I switched to using a footnote marker \f - ... \f* to record page numbers and column breaks.
This can be used mid-verse without much hassle.

The minus sign signifies that no caller symbol is generated for the note.
More commonly, USFM footnotes have a plus sign, which means the caller is generated automatically; cf. the USFM 2.4 User Reference.
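
As a rough illustration of why a character-level marker helps, the following Python sketch inserts a caller-less footnote at an arbitrary character offset, including mid-word. The note body (`\ft page 62`) and the helper names `break_note` and `insert_break` are assumptions made for the example; the thread only specifies the `\f - ... \f*` wrapper.

```python
def break_note(kind: str, number: str) -> str:
    """Build a caller-less footnote that records a page or column break.

    Assumption: the note body is written as "\\ft <kind> <number>";
    the thread does not say what goes between "\\f -" and "\\f*".
    """
    return "\\f - \\ft " + kind + " " + number + "\\f*"


def insert_break(text: str, offset: int, kind: str, number: str) -> str:
    """Insert the note at a character offset, which may fall mid-word."""
    return text[:offset] + break_note(kind, number) + text[offset:]


if __name__ == "__main__":
    # Hypothetical verse text with a page break after "begin".
    verse = "\\v 1 In the beginning"
    offset = verse.index("beginning") + len("begin")
    print(insert_break(verse, offset, "page", "62"))
```

A line-level marker such as \rem could not be placed at that offset without splitting the verse text across lines.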

However, there may be a problem when the USFM footnote occurs before chapter 1 of any book.

@cmahte

cmahte commented Jan 6, 2018 via email

@DavidHaslam
Author

The page and column breaks in the OSIS should not be links.

OSIS 2.1.1 defines a milestone type for these.

The USFM tags for remarks do at least convert to milestone elements, albeit not in the prescribed format.

However, USFM footnotes or cross-references convert to note elements in OSIS.

Using USFM was not part of Thom's original digitisation plan.
I suggested USFM as an expedient because

  1. it's easier to write, and
  2. there is a straightforward conversion script called u2o.py

It has also since become apparent that Thom's OSIS for the first 12 books digitised has errors and other inconsistencies.

I would have preferred to use a general milestone tag in USFM, but that does not yet exist.

Even so, it's something I had already proposed to ICAP for USFM 3.1 (too late for 3.0).

@DavidHaslam
Author

DavidHaslam commented Jan 6, 2018

I should add that my switch from \rem to using footnote markers is definitely a "kludge".

The problem with \rem is that it has to be on a line of its own in USFM.

It's not a character level marker. You can't use it conveniently in the middle of a word!

Yet that's where column breaks and page breaks can occur.

After converting USFM to OSIS, further postprocessing of these will be required.
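
The postprocessing step might look something like the Python sketch below, which rewrites break-recording notes as milestone elements. Everything about the input shape is an assumption: it supposes the converted note looks like `<note ...>page 62</note>` or `<note ...>column 2</note>`, which may not match what u2o.py actually emits. The `pb` type is taken from the existing XML; `column` is assumed as the type for column breaks. The name `notes_to_milestones` is hypothetical.

```python
import re

# Assumed shape of a converted break note: <note ...>page 62</note>
# or <note ...>column 2</note>. Real converter output may differ.
BREAK_NOTE = re.compile(r'<note\b[^>]*>\s*(page|column)\s+(\d+)\s*</note>')

# "pb" matches the existing XML; "column" is an assumed type for column breaks.
MILESTONE_TYPE = {"page": "pb", "column": "column"}


def notes_to_milestones(osis: str) -> str:
    """Replace break-recording notes with empty milestone elements."""

    def repl(match):
        kind, number = match.group(1), match.group(2)
        return '<milestone type="%s" n="%s"/>' % (MILESTONE_TYPE[kind], number)

    return BREAK_NOTE.sub(repl, osis)


if __name__ == "__main__":
    sample = 'In the begin<note placement="foot">page 62</note>ning'
    print(notes_to_milestones(sample))
```

Ordinary notes whose text does not match the assumed pattern are left untouched, so genuine footnotes survive the pass.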

The advantage of \rem is that it is allowed before any tags that determine displayed content.
You can even have it before the \mt1 book title.
And, unsurprisingly, that's where the first page number occurs in Genesis.

But that does not mitigate the bigger problem of mid-word page or column breaks.
