Allow Inclusion of non-UTF-8 Files #3248

Closed
tajmone opened this issue Apr 7, 2019 · 8 comments
Labels
compliance enhancement v2.0.11 Issues resolved in the 2.0.11 release

@tajmone

tajmone commented Apr 7, 2019

Feature request: add an optional attribute to the include:: directive to specify the encoding of the included source file, so that non-UTF-8 source files can be included by automatically converting them to UTF-8 first.

Rationale:

I'm currently working on a documentation project that involves source code files encoded in ISO-8859-1, which can't be included directly in AsciiDoc documents.

To include these source files (or parts of them) in the documents via the include:: directive, I first need to run a script that converts them to UTF-8 via iconv, then run the Asciidoctor toolchain and include the UTF-8 version instead. Here's a real-world example:

The script creates UTF-8 copies of the original source files (e.g. "mysource.alan"/".i" are converted to "mysource.utf8_alan"/".utf8_i").
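For illustration, the conversion step described above could also be done in plain Ruby instead of iconv. This is a minimal, hypothetical sketch, not the project's actual script: the helper names `utf8_copy_path` and `write_utf8_copy` are made up here, and it assumes the copy's extension is formed by prefixing `utf8_` to the original one, as in the "mysource.alan" to "mysource.utf8_alan" example.

```ruby
# Hypothetical sketch: create a UTF-8 copy of an ISO-8859-1 source file,
# doing the same job as `iconv -f ISO-8859-1 -t UTF-8` in the original script.

# "mysource.alan" -> "mysource.utf8_alan", "mysource.i" -> "mysource.utf8_i"
# (assumes the file has an extension).
def utf8_copy_path(path)
  ext = File.extname(path)            # e.g. ".alan"
  base = path.delete_suffix(ext)      # path without the extension
  "#{base}.utf8_#{ext.delete_prefix('.')}"
end

# Read the raw bytes, label them as ISO-8859-1, transcode to UTF-8 and write the copy.
# Every byte is a valid ISO-8859-1 character, so this transcoding cannot fail.
def write_utf8_copy(path, source_encoding = Encoding::ISO_8859_1)
  text = File.binread(path).force_encoding(source_encoding)
  target = utf8_copy_path(path)
  File.binwrite(target, text.encode(Encoding::UTF_8))
  target
end
```

This still produces the extra files the issue complains about; it only removes the dependency on iconv.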

This adds an extra layer of complexity and dependencies (especially on Windows, which has no native tool like iconv), creates extra files, and complicates managing any watch scripts.

If Asciidoctor allowed an extra attribute to control the encoding of the included file (e.g. include::path[encoding=iso-8859-1]), it would be much simpler and more elegant.

I know that today most source files are expected to be in UTF-8, but some legacy tools still cling to ISO encodings, and besides, many other encodings are still in use today. As an optional extra feature that doesn't break backward compatibility, this would be an added benefit for Asciidoctor. My guess is that there should be Ruby libraries to handle encoding conversion.

@mojavelinux
Member

This request does seem reasonable to me.

FYI, you can accomplish this today using a custom include processor. In Asciidoctor 2, we already isolated the read mode, so technically it is very feasible.
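The core of such a custom include processor is reading the target file in its declared encoding and handing UTF-8 lines back to the parser. Below is a minimal, hypothetical sketch of just that step using Ruby's built-in `Encoding` support; the method name `read_include_lines` is made up here, and registering it in an actual `Asciidoctor::Extensions::IncludeProcessor` (and how that processor would receive an `encoding` attribute) is deliberately left out.

```ruby
# Hypothetical helper: read an include target stored in `encoding`
# (e.g. the value of an encoding=iso-8859-1 attribute) and return its lines as UTF-8.
def read_include_lines(path, encoding)
  # Encoding.find raises ArgumentError for unknown names such as "no-such-encoding".
  source_encoding = Encoding.find(encoding)
  text = File.binread(path).force_encoding(source_encoding)
  # Transcode to UTF-8; raises Encoding::InvalidByteSequenceError or
  # Encoding::UndefinedConversionError for bytes that cannot be converted.
  utf8 = text.encode(Encoding::UTF_8)
  # encode is a no-op when the source is already UTF-8, so validate explicitly.
  raise ArgumentError, "invalid #{source_encoding} data in #{path}" unless utf8.valid_encoding?
  utf8.lines(chomp: true)
end
```

Called as `read_include_lines('mysource.alan', 'iso-8859-1')`, it would return the file's lines in UTF-8. Failing loudly on unknown encoding names or undecodable bytes is a design choice of this sketch; a real processor might log a warning and fall back instead.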

For now, this would not change the fact that the AsciiDoc document itself has to be encoded in UTF-8. We're still discussing making that configurable, but it would be a separate issue.

@mojavelinux mojavelinux self-assigned this Apr 7, 2019
@mojavelinux mojavelinux added this to the v2.x milestone Apr 7, 2019
@tajmone
Author

tajmone commented Apr 7, 2019

Thanks @mojavelinux , I'm looking forward to it.

FYI, you can accomplish this today using a custom include processor.

I'm sure there are many ways to circumvent this issue, and using a custom include processor would definitely be more elegant than my current solution (and won't require creating copies of the sources).

It's just that I think that keeping things simpler, by finding solutions within the native functionality of Asciidoctor, is always preferable, because in many projects the documentation is a side subproject managed by specific users, and not all contributors to the main project have experience with Asciidoctor (if any at all).

In quite a few projects I'm the one who handles the documentation, and I always try to leave behind something that is easy to use and understand, in case someone else has to take over its maintenance in the future.

Right now, the bash script solution is fine (even Windows contributors are expected to have Bash as part of Git for Windows, which includes iconv), but as the saying goes, "less is more": the proposed feature would take some burden off the project's complexity (there are already enough complications with custom extensions to handle Highlight and a few other third-party tools that generate documentation from sources).

@mojavelinux
Member

It's just that I think that keeping things simpler

I understand that. By suggesting the custom include processor, I was not arguing against the idea. I was simply offering you a path forward in the short term. So there's no need to provide further justification. I get it.

@tajmone
Author

tajmone commented Apr 7, 2019

Don't get me wrong, I understood perfectly that you were both welcoming my suggestion and offering a better workaround.

My intention was just to share personal experience and thoughts about Asciidoctor as a form of feedback, because I know that many people use Asciidoctor in different ways, each with their own needs and goals. So I thought that providing some context about the scenario I'm working in might offer some additional insight, i.e. that what are easy and natural solutions for everyday Asciidoctor users might be seen as an obstacle by other collaborators who aren't involved in the documentation side of projects.

Often, by reading users' comments in issues, I learn about ways of using Asciidoctor that I never considered, which broadens my view of the contexts the tool is used in.

@mojavelinux
Member

👍

@bedoro

bedoro commented Jul 29, 2019

In Gitter, you mentioned that you are thinking about scoping a feature for 2.0.11 or 2.0.12 that allows the importer to set the encoding of the file to be included. Any news about this? :)

tajmone added a commit to alan-if/alan-docs that referenced this issue Aug 30, 2019
First Glossary draft with an initial entry (*stropping*) and some
commented-out pending entries TBD later on (Closes #54).

Update contents of "§4.2. Words, Identifiers and Names":

 * Add "Stropping" sub-section.
 * Add `stropping` anchor.
 * Add `stropping` Index entry.
 * Revise and improve contents of this section:
    * More examples.
    * Extra admonitions.
    * Polish text.

Clean-up, polish and update README files in Alan Manual directory.

Referenced Issues: #36, #50, #54, asciidoctor/asciidoctor#3248.
tajmone added a commit to alan-if/alan-docs that referenced this issue Sep 1, 2019
tajmone added a commit to alan-if/alan-docs that referenced this issue Sep 1, 2019
mojavelinux added a commit to mojavelinux/asciidoctor that referenced this issue Sep 16, 2019
@mojavelinux
Member

I've submitted a PR. See #3419

@tajmone
Author

tajmone commented Sep 16, 2019

Thanks! I work on many large documentation projects for old software tools from the '80s and '90s, and I have to handle lots of source code in ISO and other legacy encodings!

mojavelinux added a commit to mojavelinux/asciidoctor that referenced this issue Sep 27, 2019
@mojavelinux mojavelinux modified the milestones: v2.x, v2.0.x Sep 30, 2019
@mojavelinux mojavelinux added compliance v2.0.11 Issues resolved in the 2.0.11 release labels Sep 30, 2019
tajmone added a commit to alan-if/alan-docs that referenced this issue Sep 18, 2020
tajmone added a commit to alan-if/alan-docs that referenced this issue Sep 20, 2020
tajmone added a commit to AnssiR66/AlanStdLib that referenced this issue Dec 28, 2020
Stop converting ALAN sources and transcripts to UTF-8 and directly
include the original ISO-8859-1 files in AsciiDoc sources (fixes #126).

This huge commit entirely removes from the repo all assets that dealt
with creating UTF-8 intermediate versions of the ISO-8859-1 ALAN sources
and transcripts, using instead the new (undocumented) `encoding` option
of Asciidoctor's `include::` directives, which was kindly added by
@mojavelinux on our request for the ALAN-IF projects:

- asciidoctor/asciidoctor#3248

The build toolchain is now much faster than before.
For the full details of the changes, see the task list of #126.
3 participants