Allow Inclusion of non-UTF-8 Files #3248

tajmone · 2019-04-07T08:17:31Z

Feature Request: add an (optional) attribute in the include:: directive to control the encoding of the included source file, and allow inclusion of non-UTF-8 source files by automatically converting them first.

Rationale:

I'm currently working on a documentation project that involves source code files encoded in ISO-8859-1, which can't be directly included in AsciiDoc documents.

In order to include the source files (or parts of them) in the documents via the include:: directive, I first need to run a script that converts them to UTF-8 via iconv, and then run the Asciidoctor toolchain and include the UTF-8 versions instead. Here's a real-world example:

https://github.com/alan-if/alan-docs/blob/master/alanguide/_adoc/generate-inc-files.sh

The script creates a copy of each original source file (e.g. "mysource.alan"/".i") converted to UTF-8 ("mysource.utf8_alan"/".utf8_i").

This introduces an extra layer of complexity and dependencies (especially on Windows, which doesn't have a native tool like iconv), adds extra files, and complicates managing any watch scripts.

If Asciidoctor allowed an extra attribute to control the encoding of the included file, e.g. include::path[encoding=iso-8859-1], things would be much simpler and more elegant.

I know that today most source files are expected to be in UTF-8, but some legacy tools still cling to ISO encodings and, besides, many other encodings are still in use today. Being an optional extra feature that doesn't break backward compatibility, this would be an added benefit to Asciidoctor. My guess is that there should be Ruby libraries to handle encoding conversion.
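For illustration, Ruby's core String and File classes can already do this kind of conversion without an extra library. The following is a minimal sketch, not Asciidoctor code: the helper name `convert_to_utf8` and the sample bytes are assumptions made up for this example, and ISO-8859-1 is used only because it is the encoding discussed here.

```ruby
# Bytes for "café" as stored in an ISO-8859-1 file: "é" is the single byte 0xE9.
latin1_bytes = "caf\xE9".b

# Label the bytes with their real encoding, then transcode them to UTF-8.
utf8_text = latin1_bytes.force_encoding(Encoding::ISO_8859_1).encode(Encoding::UTF_8)

puts utf8_text           # café
puts utf8_text.bytesize  # 5 ("é" takes two bytes in UTF-8)

# Hypothetical helper, roughly what `iconv -f ISO-8859-1 -t UTF-8 src > dest` does.
def convert_to_utf8(src, dest, from: 'ISO-8859-1')
  raw = File.binread(src)                                        # raw bytes, no decoding
  File.binwrite(dest, raw.force_encoding(from).encode('UTF-8'))  # write UTF-8 bytes
end
```

Calling `convert_to_utf8('mysource.alan', 'mysource.utf8_alan')` would produce the same kind of intermediate copy as the shell script, which is exactly the extra step the proposed attribute is meant to remove.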

mojavelinux · 2019-04-07T08:41:40Z

This request does seem reasonable to me.

FYI, you can accomplish this today using a custom include processor. In Asciidoctor 2, we already isolated the read mode, so technically it is very feasible.
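As a rough sketch of what such a custom include processor might look like, using Asciidoctor's Ruby extension API (`IncludeProcessor`, `handles?`, `process`, `push_include`): the class name, the `read_as_utf8` helper, the file-extension check, and the ISO-8859-1 default are all assumptions made for this example. The sketch does not implement `lines`/`tags` filtering or safe-mode checks, and it skips registration if the asciidoctor gem is not installed.

```ruby
# Hypothetical helper (not part of Asciidoctor): read a file stored in
# `encoding` and return its contents transcoded to UTF-8.
def read_as_utf8(path, encoding)
  File.read(path, mode: "r:#{encoding}:UTF-8")
end

begin
  require 'asciidoctor'
  require 'asciidoctor/extensions'

  # Hypothetical processor for legacy-encoded source files.
  class LegacyEncodingIncludeProcessor < Asciidoctor::Extensions::IncludeProcessor
    # Assumption: only *.alan and *.i targets need transcoding.
    def handles?(target)
      target.end_with?('.alan', '.i')
    end

    def process(doc, reader, target, attributes)
      # Simplification: resolve the target against the including file's directory.
      path = File.expand_path(target, reader.dir)
      # Use the directive's encoding attribute if given, else assume ISO-8859-1.
      encoding = attributes.fetch('encoding', 'ISO-8859-1')
      reader.push_include(read_as_utf8(path, encoding), path, target, 1, attributes)
    end
  end

  Asciidoctor::Extensions.register do
    include_processor LegacyEncodingIncludeProcessor
  end
rescue LoadError
  warn 'asciidoctor gem not found; include processor not registered'
end
```

Saved to a file (say, a hypothetical `legacy-encoding.rb`), this could be loaded with the `-r` option of the `asciidoctor` command, so the original ISO-8859-1 sources are included directly without intermediate UTF-8 copies.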

For now, this would not change the fact that the AsciiDoc document itself has to be encoded in UTF-8. We're still discussing making that configurable, but it would be a separate issue.

tajmone · 2019-04-07T08:59:08Z

Thanks @mojavelinux, I'm looking forward to it.

> FYI, you can accomplish this today using a custom include processor.

I'm sure there are many ways to circumvent this issue, and using a custom include processor would definitely be more elegant than my current solution (and won't require creating copies of the sources).

It's just that I think keeping things simple, by finding solutions within the native functionality of Asciidoctor, is always preferable. In many projects the documentation is a subproject on the side, managed by specific users, and not all contributors to the main project have experience with Asciidoctor (some have none at all).

In quite a few projects I'm the one who looks after the documentation, and I always try to leave behind something that is easy to use and understand, just in case someone else has to take over its maintenance in the future.

Right now, the Bash script solution is fine (and even Windows contributors are expected to have Bash as part of Git for Windows, which includes iconv), but as the saying goes, "less is more": the proposed feature would take some burden off the project's complexity (there are already enough complications with custom extensions to handle Highlight and a few other third-party tools to generate documentation from sources).

mojavelinux · 2019-04-07T09:40:14Z

> It's just that I think that keeping things simpler

I understand that. By suggesting the custom include processor, I was not arguing against the idea. I was simply offering you a path forward in the short term. So there's no need to provide further justification. I get it.

tajmone · 2019-04-07T09:53:44Z

Don't get me wrong, I understood perfectly that you were both welcoming my suggestion and offering a better workaround.

My intention was just to share personal experience and thoughts about Asciidoctor as a way of giving some feedback, because I know that so many people use Asciidoctor in different ways, each with his/her own needs and goals. So I thought that some context about the scenario I'm working in might provide additional insight, i.e. illustrate that what are easy and natural solutions for everyday Asciidoctor users might sometimes be seen as an obstacle by other collaborators who aren't into the documentation side of projects.

Often, by reading users' comments in issues, I learn about how others are using Asciidoctor in ways that I never considered, which broadens my view of the context the tool is being used in.

mojavelinux · 2019-04-07T10:42:54Z

👍

bedoro · 2019-07-29T11:26:30Z

In Gitter, you mentioned that you are thinking about scoping a feature for 2.0.11 or 2.0.12 that allows the importer to set the encoding of the file to be included. Any news about this? :)

First Glossary draft with an initial entry (*stropping*) and some commented-out pending entries TBD later on (Closes #54).

Update contents of "§4.2. Words, Identifiers and Names":

* Add "Stropping" sub-section.
* Add `stropping` anchor.
* Add `stropping` Index entry.
* Revise and improve contents of this section:
  * More examples.
  * Extra admonitions.
  * Polish text.

Clean-up, polish and update README files in Alan Manual directory.

Referenced Issues: #36, #50, #54, asciidoctor/asciidoctor#3248.

mojavelinux · 2019-09-16T08:47:18Z

I've submitted a PR. See #3419

tajmone · 2019-09-16T09:02:40Z

Thanks! I work on many large documentation projects for old software tools from the '80s and '90s, and I have to handle lots of source code in ISO and other legacy encodings.

Stop converting ALAN sources and transcripts to UTF-8 and directly include the original ISO-8859-1 files in AsciiDoc sources (fixes #126). This huge commit entirely removes from the repo all assets that dealt with creating UTF-8 intermediate versions of the ISO-8859-1 ALAN sources and transcripts, using instead the new (undocumented) `encoding` option of Asciidoctor's `include::` directives, which was kindly added by @mojavelinux on our request for the ALAN-IF projects: asciidoctor/asciidoctor#3248. The build toolchain is now much faster than before. For the full details of the changes, see the task list of #126.

mojavelinux self-assigned this Apr 7, 2019

mojavelinux added this to the v2.x milestone Apr 7, 2019

mojavelinux added the enhancement label Apr 7, 2019

This was referenced Aug 30, 2019

Squash Glossary in Beta7 Dev Branch alan-if/alan-docs#55

Merged

Add Support for UTF-8 Sources and I/O Stream alan-if/alan#12

Closed

mojavelinux added a commit to mojavelinux/asciidoctor that referenced this issue Sep 16, 2019

resolves asciidoctor#3248 allow encoding of include file to be specified using encoding attribute

cad8aeb

mojavelinux added a commit to mojavelinux/asciidoctor that referenced this issue Sep 27, 2019

resolves asciidoctor#3248 allow encoding of include file to be specified using encoding attribute

4c008b0

mojavelinux closed this as completed in 8376d7f Sep 28, 2019

mojavelinux modified the milestones: v2.x, v2.0.x Sep 30, 2019

mojavelinux added the compliance and v2.0.11 (issues resolved in the 2.0.11 release) labels Sep 30, 2019

This was referenced Nov 24, 2019

Index Sorting: Introduce Unicode Collation Instead of Asciibetical Sorting asciidoctor/asciidoctor-pdf#928

Closed

Externalize Hugo Code Examples tajmone/hugo-book#37

Open

mojavelinux mentioned this issue Apr 20, 2020

include:: and special characters #3628

Closed

This was referenced Dec 27, 2020

Use New encoding Option with include:: Directives alan-if/alan-docs#84

Closed

Include ALAN Assets directly in ISO Encoding AnssiR66/AlanStdLib#126

Closed
