Allow access to the @title AbstractBlock variable without its implicit substitutions. #4386

ggenzone · 2022-11-24T09:05:18Z

From the perspective of using the API to traverse the AST, I think it might be useful to expose the @title variable without implicit substitutions.

It would be desirable to be able to read the content of a block without the implicit substitutions applied to its inline markup. Specifically, it would be nice to have a getter that returns the title of a section without converting the inline markup.

For paragraphs this is possible, but there is no way (AFAIK) to achieve the same for a section title. Maybe a source getter could solve it.

require 'asciidoctor'

example_document = <<-ADOC
= Title

== image:filename.png[] Images feature with macros

Some paragraph with macros image:filename.png[]
ADOC

doc = Asciidoctor.load example_document


section = doc.blocks[0]

puts section 
#<Asciidoctor::Section@60 {level: 1, title: "image:filename.png[] Images feature with macros", blocks: 1}>

# Section title with macro subs
puts section.title
#<span class="image"><img src="filename.png" alt="filename"></span> Images feature with macros


# Section title without macro subs
#
# Expected: puts section.source (currently undefined for a section)
#
# Removing the subs does not work either:
#   section.remove_sub :specialcharacters
#   section.remove_sub :macros
#   section.remove_sub :quotes
#   section.remove_sub :replacements
#   section.remove_sub :post_replacements
#   puts section.title
#


paragraph = section.blocks[0]

# With macro subs
puts paragraph.content
# Some paragraph with macros <span class="image"><img src="filename.png" alt="filename"></span>

# Without macro subs
puts paragraph.source
# Some paragraph with macros image:filename.png[]

# asciidoctor -v
Asciidoctor 2.0.18 [https://asciidoctor.org]
Runtime Environment (ruby 3.0.2p107 (2021-07-07 revision 0db68f0233) [x86_64-linux-gnu]) (lc:UTF-8 fs:UTF-8 in:UTF-8 ex:UTF-8)


ggenzone · 2022-12-02T09:29:17Z

There are also other blocks that lack an interface for accessing the raw content.

= Document


.image:number-1.png[One] Set up networking
paragraph

// For a paragraph, there are two interfaces for accessing the title:
//    - puts paragraph_block.title                  <--- <span class="image"><img src="number-1.png" alt="number-1"></span> Set up networking
//    - puts paragraph_block.attributes['title']     <--- image:number-1.png[One] Set up networking


.image:number-1.png[One] Set up networking
* Item 1
* Item 2
* Item 3
// For a ulist/olist, there is only one interface for accessing the title:
//   - puts list_block.title                  <--- <span class="image"><img src="number-1.png" alt="number-1"></span> Set up networking

What would be nice is to be able to traverse the AST of a file and be able to read the original content. I don't think there are many more cases like these.

In short, these two structures do not have access to the original title without substitutions.

Asciidoctor::Section
Asciidoctor::List

If this would be possible, a file could be read and rewritten, replacing the text with translated text but keeping all the structure. Right now the only way is to try to reverse the substitutions in these places.

mojavelinux · 2022-12-02T09:35:48Z

It's possible to access this information today.

Asciidoctor::Section

section.instance_variable_get :@title

Asciidoctor::ListItem

li.instance_variable_get :@text

I understand that this is not ideal. I'm simply documenting what is possible.
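
The workaround above can be illustrated without loading Asciidoctor. The sketch below uses a hypothetical stand-in class, FakeSection, whose title reader applies a toy substitution; it only mimics the behavior described in this thread and is not Asciidoctor's implementation. The one real API it exercises is Ruby's Object#instance_variable_get, which reads the stored value without going through the reader.

```ruby
# Hypothetical stand-in for a node whose reader applies substitutions.
# It only mimics the behavior described in this thread; it is not Asciidoctor code.
class FakeSection
  def initialize(title)
    @title = title
  end

  # Toy "macro substitution": turns image:target[] into an <img> tag.
  def title
    @title.gsub(/image:([^\[\s]+)\[[^\]]*\]/) { %(<img src="#{$1}">) }
  end
end

section = FakeSection.new('image:filename.png[] Images feature with macros')

# The public reader returns the converted text.
converted = section.title
# Reflection bypasses the reader and returns the text as it was stored.
source = section.instance_variable_get(:@title)

puts converted
puts source
```

Because this reaches into an instance variable, it depends on an internal name that may change between releases, which is presumably why it is described as not ideal.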

mojavelinux · 2022-12-02T09:37:05Z

To keep consistent with the terminology that we are using, I think source_title and source_text make the most sense as names for the public accessors for this data. I want to avoid the word "raw" as that has a different meaning in AsciiDoc.

mojavelinux · 2022-12-02T09:38:40Z

> What would be nice is to be able to traverse the AST of a file and be able to read the original content. I don't think there are many more cases like these.

This is never going to be fully possible, at least not with the current implementation of Asciidoctor. That's because Asciidoctor is focused primarily on conversion, not building a 1-to-1 representation of the source. Asciidoctor does not track all lines, and it does not always keep track of where it got information from.

I describe a technique for linking up the parsed document with the original source in the following topic in the project chat: https://asciidoctor.zulipchat.com/#narrow/stream/279642-users/topic/How.20to.20access.20raw.20AsciiDoc.20when.20parsing

> If this would be possible, a file could be read and rewritten, replacing the text with translated text but keeping all the structure. Right now the only way is to try to reverse the substitutions in these places.

To reiterate, this will not be possible. You can't reproduce the input after parsing the document with Asciidoctor using purely the information in the parsed document. It's necessary to go back to the original source to apply edits if you want all the information to be preserved and in the same order.
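
One way to follow this advice is to collect the edits while walking the parsed document and then apply them to the original lines. The sketch below is not the technique from the linked chat topic; it assumes each edit is keyed by the 1-based line number of a section title, and apply_title_edits and the edits hash are hypothetical names. The regex only handles `=`-style section titles.

```ruby
# Hypothetical helper: rewrites section titles in the original source.
# edits maps a 1-based line number to the new title text.
def apply_title_edits(source, edits)
  lines = source.lines
  edits.each do |lineno, new_title|
    line = lines[lineno - 1]
    # Keep the leading marker (e.g. "== ") and the line ending untouched.
    match = line && line.match(/\A(=+\s+)(.*?)(\r?\n)?\z/)
    raise ArgumentError, "line #{lineno} is not a section title" unless match
    lines[lineno - 1] = "#{match[1]}#{new_title}#{match[3]}"
  end
  lines.join
end

source = <<~ADOC
  = Title

  == image:filename.png[] Images feature with macros

  Some paragraph with macros image:filename.png[]
ADOC

# Replace only the title text on line 3; every other line is preserved as-is.
result = apply_title_edits(source, { 3 => 'image:filename.png[] Bildfunktion mit Makros' })
puts result
```

With Asciidoctor, line numbers of this kind should be obtainable by loading the document with the sourcemap option enabled and reading each block's source location; treat that mapping as something to verify against your version rather than a guarantee.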

ggenzone · 2022-12-02T09:57:48Z

> It's possible to access this information today.
>
> Asciidoctor::Section
> section.instance_variable_get :@title
> Asciidoctor::ListItem
> li.instance_variable_get :@text
> I understand that this is not ideal. I'm simply documenting what is possible.

Awesome, I was able to access both. In ListItem I used @title, but it worked.

> This is never going to be fully possible, at least not with the current implementation of Asciidoctor. That's because Asciidoctor is focused primarily on conversion, not building a 1-to-1 representation of the source. Asciidoctor does not track all lines, and it does not always keep track of where it got information from.
>
> I describe a technique for linking up the parsed document with the original source in the following topic in the project chat: https://asciidoctor.zulipchat.com/#narrow/stream/279642-users/topic/How.20to.20access.20raw.20AsciiDoc.20when.20parsing
>
> To reiterate, this will not be possible. You can't reproduce the input after parsing the document with Asciidoctor using purely the information in the parsed document. It's necessary to go back to the original source to apply edits if you want all the information to be preserved and in the same order.

I understand the limitations. I also know that the preprocessor resolves include:: directives before the document is parsed, but there is still a lot that can be done. Although some information is lost, the content of the document can be read fairly well.

> To keep consistent with the terminology that we are using, I think source_title and source_text make the most sense as names for the public accessors for this data. I want to avoid the word "raw" as that has a different meaning in AsciiDoc.

That would be perfect. For now, the solution you gave me works for me, even if it is not ideal. Feel free to close the issue if you see fit.

Thanks again, I still owe you an example just to show you the potential of translating AsciiDoc content using NMT with this approach.

mojavelinux · 2022-12-02T22:05:24Z

Glad to hear that gets you moving!

I'm actually fond of this idea, so I'm going to keep it open and implement it in 2.1.0. One minor correction, though. The property names should be title_source and text_source, not the other way around. I had it mixed up when I replied.

These will be reader methods since it's already possible to set these properties using title= and text=.
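
A minimal sketch of how such a reader could sit next to the existing writer. FakeBlock is a hypothetical stand-in, and the lazy caching in title is an assumption made for illustration rather than Asciidoctor's code; title_source is simply the name proposed above.

```ruby
# Hypothetical stand-in; models a writer, a converting reader, and the
# proposed source reader. Not Asciidoctor's implementation.
class FakeBlock
  # Proposed reader: returns the title exactly as it was set.
  attr_reader :title_source

  def initialize(title = nil)
    self.title = title
  end

  # Writer: stores the source text and drops any cached conversion.
  def title=(value)
    @converted_title = nil
    @title_source = value
  end

  # Reader: converts lazily (toy substitution: *text* becomes <strong>).
  def title
    return nil unless @title_source
    @converted_title ||= @title_source.gsub(/\*([^*]+)\*/, '<strong>\1</strong>')
  end
end

block = FakeBlock.new('Set up *networking*')
first = [block.title, block.title_source]

# Setting a new title invalidates the cached conversion.
block.title = 'Configure *DNS*'
second = [block.title, block.title_source]

p first
p second
```

Keeping the reader read-only mirrors the reasoning above: the writer already exists, so the new method only needs to expose what the writer stored.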

mojavelinux · 2023-04-22T07:18:26Z

This has become a duplicate of #1146 (and vice versa).

mojavelinux self-assigned this Dec 2, 2022

mojavelinux added the enhancement label Dec 2, 2022

mojavelinux added this to the v2.1.x milestone Dec 2, 2022

mojavelinux mentioned this issue Apr 22, 2023

Make Raw Content available for titles and content-parts #1146

Open
