Index Reconstruction #5

Closed
tajmone opened this issue Aug 25, 2018 · 6 comments
Labels
📖 Alan Manual Issues relating to "The Alan Language Manual" 👮 styling conventions Policies: How elements should be styled in Alan docs ❔ question Further information is requested 💀 format porting issues Cross-format problems (ADoc, HTML, PDF, etc.)

Comments

@tajmone
Collaborator

tajmone commented Aug 25, 2018

Today I started looking into the reconstruction of the Index, but I've realized that there seem to be some problems in the original document.

In the PDF Manual's Index, the entries are not clickable links (although the page numbers they refer to seem correct). I've looked into the ODT file, and I can't find any Index entry fields.

What was the original word processor used to create the ODT document? I'm using LibreOffice Writer to open the original doc, so it might be that some features are not supported.

I guess that I'll have to try and reconstruct the Index manually, by checking the Index entries and the page they point to, and working out myself where to place ADoc style Index anchors in the document.

Any suggestions on this?

@tajmone tajmone added ❔ question Further information is requested 💀 format porting issues Cross-format problems (ADoc, HTML, PDF, etc.) 📖 Alan Manual Issues relating to "The Alan Language Manual" labels Aug 25, 2018
@tajmone
Collaborator Author

tajmone commented Aug 25, 2018

I think that I might have solved this by opening the "manual-pretranscript.odt" document instead.

What is the difference between this file and "manual.odt"?

@tajmone
Collaborator Author

tajmone commented Aug 26, 2018

It looks like "manual-pretranscript.odt" is an older document (some sections are not in the same order as the PDF and "manual.odt"), but contains all the Index entries (fields).

On the other hand, "manual.odt" seems to be the doc used to produce the latest PDF, but all Index entries are lost in it (trying to update the Index empties it).

Anyhow, the differences between the two are not all that huge, and I've managed to use "manual-pretranscript.odt" as a reference to rebuild the Index. Worst case scenario: I have to find the paragraph via the Search functionality, because it was moved around in the latest doc, but the indexed words are all there.

So far I've managed to successfully reconstruct the Index up to Chapter 3 (inclusive). In a few places I've taken the liberty of slightly adapting the entries to avoid having separate entries due to letter-case differences. I've also added a few items to the Index, following its overall pattern (and I plan to work actively on the Index in the future, as it can only benefit the reader, without damaging anything).
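
For anyone repeating this kind of reconstruction, listing the index fields stored in the older ODT can speed up the cross-checking. The following is only a minimal sketch, not the procedure actually used here: it assumes the entries are stored as point-style `text:alphabetical-index-mark` elements in the ODT's `content.xml` (range-style marks, which wrap the indexed text between start and end elements, are ignored), and the function name `list_index_marks` is hypothetical.

```python
import zipfile
import xml.etree.ElementTree as ET

# OpenDocument "text" namespace, used by index mark elements and attributes.
TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"


def _attr(elem, name):
    """Read a text:* attribute from an element (None if absent)."""
    return elem.get(f"{{{TEXT_NS}}}{name}")


def list_index_marks(odt_file):
    """Return (key1, key2, term) tuples for point-style index marks.

    `odt_file` may be a path or a binary file-like object.
    Assumption: entries are <text:alphabetical-index-mark> elements;
    range-style marks (-start/-end pairs) are not handled here.
    """
    with zipfile.ZipFile(odt_file) as odt:
        root = ET.fromstring(odt.read("content.xml"))
    marks = []
    for elem in root.iter(f"{{{TEXT_NS}}}alphabetical-index-mark"):
        marks.append(
            (_attr(elem, "key1"), _attr(elem, "key2"), _attr(elem, "string-value"))
        )
    return marks
```

Calling `list_index_marks("manual-pretranscript.odt")` would return the entries in document order, which can then be searched for in the newer document one by one.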

@tajmone
Collaborator Author

tajmone commented Aug 27, 2018

NOTE — the following considerations on the Index require previewing the PDF version of the Manual, which is currently being "gitignored" in the repository and therefore must be converted locally, in order to preview the current status of the Manual's Index.

Keywords in Index

@thoni56, having rebuilt the Index up to Chapter 3, I've noticed a problem with keywords in the Index: all styles are dropped in Index entries (including bold and italic).

IMO, Alan keywords should always be represented in uppercase in the Index, to avoid confusing them with plain English nouns, as well as to make looking up the Index quicker and more intuitive.

Here is a selection of some Index entries, copied from the converted PDF Manual (formatted as Verbatim block for editing convenience):

Locate statement, 13, 79
Location
   in What specifications, 92
   predefined class, 15
Look statement, 88
location, 7, 10
locations, 34

Thing
   predefined class, 15, 31
This expression, 98            <-- doesn't look good at all.
Transcript statement, 90

What specification, 92         <-- looks bad!
Where specification, 91        <-- looks bad!

... where this might look more intuitive:

LOCATE statement, 13, 79
LOCATION, 7, 10                <-- 'Location' and 'location' merged in single entry!
   in WHAT specifications, 92
   predefined class, 15
LOOK statement, 88
locations, 34                  <-- the concept, not the keyword!

THING
   predefined class, 15, 31
THIS expression, 98            <-- now more clear!
TRANSCRIPT statement, 90

WHAT specification, 92         <-- much better!
WHERE specification, 91        <-- much better!

ASCII-BETICAL SORTING!!! Also, note that Asciidoctor lists in the Index all entries starting with a capital letter first, followed by lowercase entries, which means that differently cased variants of the same word are kept apart in the list (in the above example, Look is placed between Location and location, whereas it should have come after location). It looks like it uses ASCII-betical sorting within each letter group, instead of sorting entries alphabetically.

Enforcing same-casing on keywords inside the Index would prevent them being indexed separately and apart!
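
The difference between the two orderings can be reproduced with a few lines of Python. This is only an illustration of the sorting behaviour described above, using the entries from the first listing; it makes no claim about how Asciidoctor implements its sorting, and the function names are hypothetical.

```python
def ascii_sort(entries):
    """Sort by code point: every uppercase letter precedes every lowercase one."""
    return sorted(entries)


def alphabetical_sort(entries):
    """Sort case-insensitively, using the original string only as a tie-breaker."""
    return sorted(entries, key=lambda entry: (entry.casefold(), entry))


entries = ["location", "Look statement", "Locate statement", "locations", "Location"]

print(ascii_sort(entries))
# ['Locate statement', 'Location', 'Look statement', 'location', 'locations']

print(alphabetical_sort(entries))
# ['Locate statement', 'Location', 'location', 'locations', 'Look statement']
```

In the second ordering, "Look statement" comes after "location" and "locations", which is what a reader scanning the Index would expect.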

Flow Index vs Concealed Index

On the one hand, the current system of indexing keywords as they appear in the text is practical because it allows using AsciiDoc flow index styling, e.g.:

the ((`Description` clause)) should

... will be indexed as "Description clause", which is more practical than using a concealed index (notice how the inline-code styling neither interferes with indexing nor affects how the entry shows up in the Index).

On the other hand, using a concealed index would allow us to control the letter casing of keywords in the Index, e.g.:

the `Description` clause (((DESCRIPTION clause))) should

... will be indexed as "DESCRIPTION clause".

Of course, in order to achieve this (and at the same time preserve the agreed-upon convention for letter-casing keywords in the Manual text) we'd always have to use the concealed index syntax when keywords are involved, which is slightly more verbose and somewhat interrupts the natural flow of the source text, but at least it will grant us fine-grained control over keyword casing.
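
If the switch to concealed entries had to be applied to many existing flow entries, part of it could be scripted. The sketch below is hypothetical and not part of the project's toolchain: it assumes that a keyword inside a flow index term is always written in inline-code backticks as the first word of the term, that the rest of the term contains no commas (a comma would split a concealed entry into primary and secondary terms), and the function name `to_concealed` is made up for illustration.

```python
import re

# A flow index term ((...)) whose first word is an inline-code keyword.
# The lookbehind skips terms that are already concealed, i.e. start with (((.
FLOW_KEYWORD_TERM = re.compile(r"(?<!\()\(\(`([A-Za-z_]+)`([^()]*)\)\)")


def to_concealed(line):
    """Rewrite ((`Keyword` rest)) as `Keyword` rest (((KEYWORD rest)))."""

    def replace(match):
        keyword, rest = match.group(1), match.group(2)
        return f"`{keyword}`{rest} ((({keyword.upper()}{rest})))"

    return FLOW_KEYWORD_TERM.sub(replace, line)


print(to_concealed("the ((`Description` clause)) should"))
# the `Description` clause (((DESCRIPTION clause))) should
```

Flow entries without an inline-code keyword are left untouched, so plain concepts keep the shorter flow syntax.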

Enforcing a casing convention on keywords has another beneficial effect on the Index: it avoids redundant entries. Asciidoctor creates independent Index entries for the same word with different casing, e.g.:

The ((`Actor`)) class is used to create an ((actor)).

... would create two separate Index entries: "Actor" and "actor"; whereas using a concealed index would create one single entry, pointing to two different pages:

The `Actor` (((actor))) class is used to create an ((actor)).

... where the first occurrence uses a concealed index and the second one a flow index.
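
The effect of casing on the number of entries can be shown with a small model of an index as a mapping from term to page list. This is a sketch for illustration only: the page numbers are invented, `build_index` is a hypothetical helper, and it does not reflect Asciidoctor's internals.

```python
from collections import defaultdict


def build_index(occurrences, normalizer=lambda term: term):
    """Group page numbers by index term; terms are keyed exactly as normalized."""
    index = defaultdict(list)
    for term, page in occurrences:
        index[normalizer(term)].append(page)
    return dict(index)


# Hypothetical occurrences: the same word indexed with two different casings.
occurrences = [("Actor", 31), ("actor", 32)]

print(build_index(occurrences))
# {'Actor': [31], 'actor': [32]}

print(build_index(occurrences, normalizer=str.lower))
# {'actor': [31, 32]}
```

Writing the concealed term by hand plays the role of the normalizer here: the author decides on one spelling, and both occurrences end up under it.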

Of course, the two types of indexing can coexist in the same document without problems; both are currently being used, as a concealed index is often required for practical reasons or because of the need for secondary and tertiary entries.

I see the Index as an important feature in the PDF version of the book (especially if a user prints it on paper, or if the Manual will be offered in paperback format via POD), for it allows the reader to quickly find a piece of information to solve a problem. In a paperback version of the Manual the reader wouldn't be able to use the Search feature of a PDF reader to find contents, so the Index would be the main way to look up specific keywords in the Manual.

So, it might be worth investing some extra energy in it, and tolerating the added verbosity of the concealed index syntax in the source document, for the final results in the Index are well worth it.

What's your view on this?

@tajmone tajmone added the 👮 styling conventions Policies: How elements should be styled in Alan docs label Aug 27, 2018
@thoni56
Contributor

thoni56 commented Aug 27, 2018

Again, I think you have laid the arguments out clearly and the conclusion is fairly simple: yes, it's worth some extra work to get a good index. So go for the concealed index wherever that is required to get nice, visual keyword identification and combination of terms in the index.

And I think I can remember my own reasoning going like this when I started the work on the index, because indexing in a .DOC is also quite cumbersome ;-) and you have to remember your own conventions to get it right.

tajmone added a commit that referenced this issue Aug 28, 2018
Enforce all-caps letter casing on keywords in the Index. (See #5)
tajmone added a commit that referenced this issue Aug 28, 2018
Add guidelines for Index. (See #5)
tajmone added a commit that referenced this issue Aug 28, 2018
Rebuilt index entries of Chapters 4 to 6, Appendices A, C, D, F.
(Ch 7, and Appendices E, G, H, and I didn't contain any entries!)
With this commit, all Index entries from the ODT file have been reconstructed.
There seem to be a few entries (about a dozen) listed in the PDF Index which
were not present in the ODT file (`manual-pretranscript.odt`) due to these
docs being from different versions (See Issue #5).
These missing entries can be rebuilt by looking at the latest PDF doc.
@tajmone
Collaborator Author

tajmone commented Aug 30, 2018

Asciidoctor Indexing Feature Request

I've opened a feature request on Asciidoctor PDF regarding the letter-casing sorting problem mentioned above:

Hopefully, this new feature would improve the usability of the Index.

@tajmone tajmone reopened this Aug 30, 2018
@tajmone
Collaborator Author

tajmone commented Sep 5, 2018

The Index in the PDF created via asciidoctor-fopub looks great: it doesn't have the ASCII-sorting problem, and its styles are fully customizable.

@tajmone tajmone added this to the PDF Conversion Toolchain milestone Sep 5, 2018
@tajmone tajmone closed this as completed Sep 5, 2018

2 participants