Allow access to the @title AbstractBlock variable without its implicit substitutions. #4386

Open
ggenzone opened this issue Nov 24, 2022 · 7 comments

@ggenzone

From the perspective of using the API to traverse the AST, I think it might be useful to expose the @title variable without implicit substitutions.

It would be desirable to allow access to the content of the blocks avoiding the implicit substitutions of the inline notation. Specifically, it would be nice to have a getter that allows you to read the title of a section without implicitly translating the inline code.

In paragraphs this is possible, but there is no way (AFAIK) to achieve the same with the section title. Maybe a source getter could solve it.

require 'asciidoctor'

example_document = <<-ADOC
= Title

== image:filename.png[] Images feature with macros

Some paragraph with macros image:filename.png[]
ADOC

doc = Asciidoctor.load example_document


section = doc.blocks[0]

puts section 
#<Asciidoctor::Section@60 {level: 1, title: "image:filename.png[] Images feature with macros", blocks: 1}>

# Section title with macro subs
puts section.title
#<span class="image"><img src="filename.png" alt="filename"></span> Images feature with macros


# Section title without macro subs
#
# Expected: something like `puts section.source`, which is undefined for a section
#
# Removing the subs does not work either:
#   section.remove_sub :specialcharacters
#   section.remove_sub :macros
#   section.remove_sub :quotes
#   section.remove_sub :replacements
#   section.remove_sub :post_replacements
#   puts section.title
#


paragraph = section.blocks[0]

# With macro subs
puts paragraph.content
# Some paragraph with macros <span class="image"><img src="filename.png" alt="filename"></span>

# Without macro subs
puts paragraph.source
# Some paragraph with macros image:filename.png[]
# asciidoctor -v
Asciidoctor 2.0.18 [https://asciidoctor.org]
Runtime Environment (ruby 3.0.2p107 (2021-07-07 revision 0db68f0233) [x86_64-linux-gnu]) (lc:UTF-8 fs:UTF-8 in:UTF-8 ex:UTF-8)
@ggenzone
Author

ggenzone commented Dec 2, 2022

There are also other blocks that lack an interface to access the raw content.

= Document


.image:number-1.png[One] Set up networking
paragraph

// Within a paragraph there are two interfaces to access the title
//    - puts paragraph_block.title                  <--- <span class="image"><img src="number-1.png" alt="number-1"></span> Set up networking
//    - puts paragraph_block.attributes['title']     <--- image:number-1.png[One] Set up networking


.image:number-1.png[One] Set up networking
* Item 1
* Item 2
* Item 3
// Within a ulist/olist there is only one interface to access the title
//   - puts list_block.title                       <--- <span class="image"><img src="number-1.png" alt="number-1"></span> Set up networking

What would be nice is to be able to traverse the AST of a file and be able to read the original content. I don't think there are many more cases like these.

In short, these two structures do not have access to the original title without substitutions.

  • Asciidoctor::Section
  • Asciidoctor::List

If this would be possible, a file could be read and rewritten, replacing the text with translated text but keeping all the structure. Right now the only way is to try to reverse the substitutions in these places.

@mojavelinux
Member

It's possible to access this information today.

Asciidoctor::Section

section.instance_variable_get :@title

Asciidoctor::ListItem

li.instance_variable_get :@text

I understand that this is not ideal. I'm simply documenting what is possible.
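
The workaround above can be wrapped in a small helper so that the reliance on internals stays in one place. The following is only a minimal sketch: `unsubstituted_value` and `FakeSection` are hypothetical names, and `FakeSection` is a stand-in (not an Asciidoctor class) so the snippet runs without the asciidoctor gem. Its toy `title` method merely imitates a reader that returns converted text.

```ruby
# Hypothetical helper: reads an instance variable that holds unsubstituted
# text, returning nil when the node does not define that variable.
def unsubstituted_value(node, ivar)
  node.instance_variable_defined?(ivar) ? node.instance_variable_get(ivar) : nil
end

# Stand-in for a parsed node (NOT an Asciidoctor class). It only mimics a
# node whose public reader returns converted text.
class FakeSection
  def initialize(title)
    @title = title
  end

  # Toy substitution: turns image:target[] into an <img> tag.
  def title
    @title.gsub(/image:(\S+?)\[\]/, '<img src="\1">')
  end
end

section = FakeSection.new('image:filename.png[] Images feature with macros')
puts section.title                             # converted text
puts unsubstituted_value(section, :@title)     # text as written in the source
p unsubstituted_value(section, :@text)         # nil, the variable is not defined
```

With real parsed nodes, the same helper would presumably be called as `unsubstituted_value(section, :@title)` for an Asciidoctor::Section and `unsubstituted_value(li, :@text)` for an Asciidoctor::ListItem, matching the two calls documented above.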

@mojavelinux
Member

To keep consistent with the terminology that we are using, I think source_title and source_text make the most sense as names for the public accessors for this data. I want to avoid the word "raw" as that has a different meaning in AsciiDoc.

@mojavelinux
Member

mojavelinux commented Dec 2, 2022

> What would be nice is to be able to traverse the AST of a file and be able to read the original content. I don't think there are many more cases like these.

This is never going to be fully possible, at least not with the current implementation of Asciidoctor. That's because Asciidoctor is focused primarily on conversion, not building a 1-to-1 representation of the source. Asciidoctor does not track all lines, and it does not always keep track of where it got information from.

I describe a technique for linking up the parsed document with the original source in the following topic in the project chat: https://asciidoctor.zulipchat.com/#narrow/stream/279642-users/topic/How.20to.20access.20raw.20AsciiDoc.20when.20parsing

> If this would be possible, a file could be read and rewritten, replacing the text with translated text but keeping all the structure. Right now the only way is to try to reverse the substitutions in these places.

To reiterate, this will not be possible. You can't reproduce the input after parsing the document with Asciidoctor using purely the information in the parsed document. It's necessary to go back to the original source to apply edits if you want all the information to be preserved and in the same order.
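
One way to picture "going back to the original source" is to apply edits by line number to the untouched input instead of regenerating it from the parsed document. The sketch below is an illustration under assumptions, not necessarily the technique described in the chat topic: `apply_line_edits` is a hypothetical helper, and the line number is hard-coded. In practice it might be obtained from a parsed node, for example through Asciidoctor's `sourcemap` option and the node's `lineno`.

```ruby
# Hypothetical helper: replaces whole lines of the original source.
# `edits` maps a 1-based line number to the replacement line; every other
# line is passed through unchanged, so structure and order are preserved.
def apply_line_edits(source, edits)
  source.lines(chomp: true).each_with_index.map do |line, index|
    edits.fetch(index + 1, line)
  end.join("\n")
end

source = <<~ADOC
  = Title

  == image:filename.png[] Images feature with macros

  Some paragraph with macros image:filename.png[]
ADOC

# Line 3 is hard-coded here; it stands for the line of the section title.
edits = { 3 => '== image:filename.png[] Images feature with macros (translated)' }
puts apply_line_edits(source, edits)
```

Note that this sketch only handles single-line replacements and drops the trailing newline of the input. Multi-line blocks would need line ranges rather than single line numbers.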

@ggenzone
Author

ggenzone commented Dec 2, 2022

> It's possible to access this information today.
>
> Asciidoctor::Section
>
> section.instance_variable_get :@title
>
> Asciidoctor::ListItem
>
> li.instance_variable_get :@text
>
> I understand that this is not ideal. I'm simply documenting what is possible.

Awesome, I was able to access both. For the list I used @title, and it worked.

> This is never going to be fully possible, at least not with the current implementation of Asciidoctor. That's because Asciidoctor is focused primarily on conversion, not building a 1-to-1 representation of the source. Asciidoctor does not track all lines, and it does not always keep track of where it got information from.
>
> I describe a technique for linking up the parsed document with the original source in the following topic in the project chat: https://asciidoctor.zulipchat.com/#narrow/stream/279642-users/topic/How.20to.20access.20raw.20AsciiDoc.20when.20parsing
>
> To reiterate, this will not be possible. You can't reproduce the input after parsing the document with Asciidoctor using purely the information in the parsed document. It's necessary to go back to the original source to apply edits if you want all the information to be preserved and in the same order.

I understand the limitations. I also know that the preprocessor replaces include:: directives with the included content, but there is still a lot that can be done. Although some information is lost, the content of the document can be read fairly well.

> To keep consistent with the terminology that we are using, I think source_title and source_text make the most sense as names for the public accessors for this data. I want to avoid the word "raw" as that has a different meaning in AsciiDoc.

That would be perfect. For now, the solution you described works for me, even if it is not ideal. Feel free to close the issue if you see fit.

Thanks again, I still owe you an example just to show you the potential of translating AsciiDoc content using NMT with this approach.

@mojavelinux
Member

mojavelinux commented Dec 2, 2022

Glad to hear that gets you moving!

I'm actually fond of this idea, so I'm going to keep it open and implement it in 2.1.0. One minor correction, though. The property names should be title_source and text_source, not the other way around. I had it mixed up when I replied.

These will be reader methods since it's already possible to set these properties using title= and text=.
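
To illustrate the shape being proposed (a read-only `title_source` next to the existing `title=` writer), here is a minimal sketch. `TitledNode` is a hypothetical stand-in and not Asciidoctor's implementation. The bold substitution is a toy, and the lazy caching is an assumption made for the example.

```ruby
# Hypothetical stand-in, NOT Asciidoctor's implementation.
class TitledNode
  # Writer: stores the unsubstituted text and drops any cached conversion.
  def title=(value)
    @converted_title = nil
    @title = value
  end

  # Reader for the text exactly as it was assigned (no substitutions).
  def title_source
    @title
  end

  # Reader that applies a toy substitution lazily and caches the result.
  def title
    @converted_title ||= @title&.gsub(/\*(.+?)\*/, '<strong>\1</strong>')
  end
end

node = TitledNode.new
node.title = 'Set up *networking*'
puts node.title          # converted text
puts node.title_source   # text as assigned
```

Because the source text can already be set through the writer, only a reader is needed to expose it, which is why no `title_source=` appears in the sketch.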

@mojavelinux mojavelinux self-assigned this Dec 2, 2022
@mojavelinux mojavelinux added this to the v2.1.x milestone Dec 2, 2022
@mojavelinux
Member

This has become a duplicate of #1146 (and vice versa).
