Support RTL in output formats (particularly HTML) #1601

Open
shahryareiv opened this Issue Dec 8, 2015 · 7 comments

shahryareiv commented Dec 8, 2015

The back-ends need to be told the text direction. At the moment the HTML output is LTR by default. Adding something like:

:text-direction: rtl

should signal the back-end to apply the correct direction.

Please note that the direction (RTL or LTR) can change at the element level, so each element should have an option to override it.
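
A minimal sketch of this request, written in Python rather than taken from Asciidoctor's code: a document-level default direction with an optional per-element override, emitted as the HTML `dir` attribute. The names `Element`, `resolve_direction`, and `render_paragraph` are hypothetical and exist only for illustration; only the `:text-direction:` attribute name comes from the proposal above.

```python
from dataclasses import dataclass
from html import escape
from typing import Optional

VALID_DIRECTIONS = {"ltr", "rtl"}


@dataclass
class Element:
    """Hypothetical block element; `direction` is an optional override."""
    text: str
    direction: Optional[str] = None  # None means "inherit from the document"


def resolve_direction(doc_attrs: dict, element: Element) -> str:
    """The element override wins, then the document attribute, then LTR."""
    direction = element.direction or doc_attrs.get("text-direction", "ltr")
    if direction not in VALID_DIRECTIONS:
        raise ValueError(f"unsupported text direction: {direction!r}")
    return direction


def render_paragraph(doc_attrs: dict, element: Element) -> str:
    """Emit a paragraph carrying an explicit HTML `dir` attribute."""
    direction = resolve_direction(doc_attrs, element)
    return f'<p dir="{direction}">{escape(element.text)}</p>'


if __name__ == "__main__":
    attrs = {"text-direction": "rtl"}  # as if set by `:text-direction: rtl`
    print(render_paragraph(attrs, Element("یک نوشته")))
    print(render_paragraph(attrs, Element("some text", direction="ltr")))
```

Resolving the direction per element, instead of once per document, is what allows a single LTR block (a references section, for example) to sit inside an otherwise RTL document.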

mojavelinux added this to the v1.6.0 milestone Dec 17, 2015

mojavelinux commented Dec 17, 2015

I knew that this issue would one day come :) I've often thought about it. Thanks for submitting it!

We really have two concerns here, the input and the output.

On the input side, we want AsciiDoc to be friendly for RTL languages. But that is going to be tough because it may require changes to the parser, and with it some design changes.

On the output side, we should definitely allow the RTL to be controlled because that is critical for the reader. Let's make this issue about RTL in the output so we get one part of it down. Agreed?

mojavelinux changed the title from "Support RTL" to "Support RTL in output formats (particularly HTML)" Dec 17, 2015

shahryareiv commented Dec 17, 2015

There are two aspects to RTL on the input side: (1) writing in an RTL language, and (2) having AsciiDoc tags and macros in RTL. I created an issue for number 2 here: #1600.

As for number 1, I think it already works. I still need to test a sufficiently long and complex text (in Persian only) to see whether things work. I will post the results here.

mojavelinux modified the milestones: v1.6.0, v1.7.0 Dec 21, 2015

shahryareiv commented Dec 25, 2015

OK, good news. I think Asciidoctor is very close to being able to claim that it supports RTL (bidi) languages, at least to some extent. I took a reasonably complex article in Persian (Farsi), rewrote it in AsciiDoc, and converted it to HTML with Asciidoctor, using a custom CSS stylesheet (written in Sass) for the output. The input and output are attached, and the result looks quite good and readable. The only problem I noticed is that the automatic numbering (section headings, image numbers, and so on) is in English.
The input and output files are attached:
khoshksali.adoc.html.zip
khoshksali.adoc.txt

Also, Asciidoctor converted Persian role names into the matching HTML classes, also in Persian. For example, section 1.1, which is the references section, was given the role چپچین, which means "left-aligned". It was assigned the corresponding CSS class (again in Persian) and is displayed as LTR:

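// Sass (SCSS): force LTR on elements with the Persian role class or .ltr
.چپچین, .ltr {
    direction: ltr;
    // nested rule: apply LTR to every descendant as well
    * {
        direction: ltr;
    }
}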

I wish macros and block types (such as [appendix]) could also be written in the target language, which would make RTL AsciiDoc files more readable and consistent.

I think the English numbering problem should also be easy to fix: an attribute could set the document language (e.g. :lang: fa), and the numbers could then be translated to the right digits using a mapping array. For Persian and Arabic, at least, I know the mappings are:

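  // Persian digits: Unicode U+06F0 (۰) to U+06F9 (۹)
  const persian = ['۰', '۱', '۲', '۳', '۴', '۵', '۶', '۷', '۸', '۹'];
  // Arabic digits: Unicode U+0660 (٠) to U+0669 (٩)
  const arabic = ['٠', '١', '٢', '٣', '٤', '٥', '٦', '٧', '٨', '٩'];
  const english = ['0', '1', '2', '3', '4', '5', '6', '7', '8', '9'];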

Please note that the components of a section number should be ordered RTL. For example, for section 2, subsection 5 (2.5), the converted number should read 2 and then 5 starting from the right, not from the left.
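
The digit mapping described above could be sketched as follows. This is an illustration in Python, not Asciidoctor code: the function name `localize_number` and the idea of keying the tables on the value of `:lang:` are assumptions. It only substitutes digits; the right-to-left ordering of the components is left to the renderer and is not handled here.

```python
WESTERN_DIGITS = "0123456789"
# Build the digit strings from code points to avoid copy/paste mistakes.
PERSIAN_DIGITS = "".join(chr(0x06F0 + i) for i in range(10))  # U+06F0..U+06F9
ARABIC_DIGITS = "".join(chr(0x0660 + i) for i in range(10))   # U+0660..U+0669

# Translation tables keyed by language code (hypothetical use of `:lang:`).
DIGIT_TABLES = {
    "fa": str.maketrans(WESTERN_DIGITS, PERSIAN_DIGITS),
    "ar": str.maketrans(WESTERN_DIGITS, ARABIC_DIGITS),
}


def localize_number(number: str, lang: str) -> str:
    """Replace Western digits in e.g. '2.5' with the digits used by `lang`.

    Languages without a table are returned unchanged.
    """
    table = DIGIT_TABLES.get(lang)
    return number.translate(table) if table else number


if __name__ == "__main__":
    for lang in ("fa", "ar", "en"):
        print(lang, localize_number("2.5", lang))
```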

One more output format to check is DocBook. The conversion itself seems OK, but I wanted to check it more closely by converting the DocBook to LaTeX and then to PDF (using XeLaTeX). However, dblatex was escaping the Persian characters, and I have not yet been able to fix that in order to check the final result.

shahryareiv commented Dec 27, 2015

Results from the DocBook conversion (output attached):
There is another problem with RTL in the generated DocBook. HTML renderers usually detect changes from LTR to RTL or vice versa on their own (they only need to know the default direction). DocBook, however, is not tied to a standard rendering system, so LTR text inside RTL text (or vice versa) needs to be delimited explicitly.

At the moment we have:

<simpara> یک نوشته some text یک نوشته</simpara>

It should probably be:

<simpara> یک نوشته <phrase dir="ltr">some text</phrase> یک نوشته </simpara>

I don't know whether dir is inherited by child nodes (as in HTML). If it is, as seems likely, then for documents with :direction: rtl the main container needs dir="rtl", while all non-RTL text needs dir="ltr".
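
As a rough illustration of the delimiting step, the sketch below wraps runs of basic Latin text in `<phrase dir="ltr">`. It is written in Python and is only a crude heuristic under stated assumptions: it treats any run beginning with an ASCII letter as LTR text, it is not the Unicode Bidirectional Algorithm, and the name `wrap_ltr_runs` is hypothetical.

```python
import re
from xml.sax.saxutils import escape

# A run starts with an ASCII letter and ends with an ASCII letter or digit;
# spaces and a little punctuation are allowed in between (an assumption
# made for this sketch, not a complete definition of "LTR text").
LATIN_RUN = re.compile(r"[A-Za-z](?:[A-Za-z0-9 .,:/_-]*[A-Za-z0-9])?")


def wrap_ltr_runs(text: str) -> str:
    """Return DocBook inline markup with Latin runs marked as LTR."""
    parts = []
    pos = 0
    for match in LATIN_RUN.finditer(text):
        parts.append(escape(text[pos:match.start()]))  # text before the run
        parts.append(f'<phrase dir="ltr">{escape(match.group())}</phrase>')
        pos = match.end()
    parts.append(escape(text[pos:]))  # whatever follows the last run
    return "".join(parts)


if __name__ == "__main__":
    print(f"<simpara>{wrap_ltr_runs('یک نوشته some text یک نوشته')}</simpara>")
```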

The attached test renders DocBook to PDF through (Xe)LaTeX. Because the RTL delimiters described above are missing, the words of English phrases are rendered in reverse order. Also, because there is no RTL/LTR signaling, all numbers are rendered as Persian digits, which is undesirable where Latin digits are intended (such as in URLs). On the other hand, because automatic numbers are not normally generated in the DocBook output (I am not sure this is always the case), the auto-numbering problem seen in HTML does not occur.

asciidoc_rtl_docbook_xelatex_pdf.zip

mojavelinux commented Mar 16, 2016

I discovered this remarkably enlightening and timely article on the subject of bidirectional text on the web (posted on opensource.com). It was almost as though this article was written to help us resolve this issue :)

https://opensource.com/life/16/3/twisted-road-right-left-language-support

shahryareiv commented Dec 13, 2016

The Unicode Bidirectional Algorithm detects where bidi text changes from left-to-right to right-to-left (or vice versa) and applies the correct (custom) markup, so there is no need to specify it manually:
http://www.unicode.org/reports/tr9/
There are also open source implementations of this algorithm:
https://www.fribidi.org/
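
To give a feel for one small piece of that algorithm, here is a Python sketch of the "first strong character" rule that UAX #9 (rules P2 and P3) uses to pick a paragraph's base direction. It is a simplification: it ignores directional isolates and does none of the reordering that a full implementation such as FriBidi performs. The function name `base_direction` is hypothetical.

```python
import unicodedata


def base_direction(text: str) -> str:
    """Guess the paragraph direction from its first strong character.

    Bidi class L means left-to-right; R and AL (Arabic letter) mean
    right-to-left. Digits, spaces and punctuation are not strong and are
    skipped. With no strong character at all, the default is LTR.
    """
    for ch in text:
        bidi_class = unicodedata.bidirectional(ch)
        if bidi_class == "L":
            return "ltr"
        if bidi_class in ("R", "AL"):
            return "rtl"
    return "ltr"


if __name__ == "__main__":
    print(base_direction("یک نوشته some text"))
    print(base_direction("some text یک نوشته"))
```

A converter could use a check like this to choose a default `dir` value for a block when the author has not set one explicitly.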

mohamed-ali commented Jul 11, 2017

Hi all,

Any progress on this issue? If not, how can I help? Where can I start if I want to contribute?
PS: I am a Python developer.

cheers,
Mohamed Ali.
