fetch includes asynchronously (in-browser / client-side) #630
Hello @tigregalis, I had this discussion with @mojavelinux, but it's not an easy task 😅 As you saw, the includes need to be resolved sequentially. In fact, Asciidoctor parses the document from top to bottom and resolves includes along the way. For instance, let's take this simple example:
main.adoc
include::attrs.adoc[]
include::installation.adoc[]
attrs.adoc
:program-version: 1.2.3
installation.adoc
Install program version {program-version}
In this example
However, I think we can safely resolve includes asynchronously when the included file is not an AsciiDoc document:
[source,ruby]
----
include::app.rb[]
----
Or (embedded) images when
image::sunset.jpg[Sunset,300,200]
We could hack something in Asciidoctor.js, but I think it would be better if Asciidoctor core knew that the function can be async... |
If we think outside of the include processor (so to speak), the solution to this is a lot easier than we realize. What we want to do is start processing the document once all the includes have been resolved. In order to do that, we flip the include processing inside-out. First, we grab all the include URLs, then toss them into a Promise.all to build a map of URL to content. When that promise resolves, we invoke the processor. The include processor can reach back into this resolved map to grab the content for a URL when it comes across an include. (In fact, you don't have to limit it to URLs since file IO is also async). This is actually the right way to do async includes. If we designed the processor to understand an async include processor, all we would end up doing is turning it into a sync operation because we can't advance the line until we have the content. The benefit of the approach I'm proposing is that it opens the door to managing the cache of the content you downloaded to avoid having to download it every time. And if you're processing a lot of documents, you could even avoid repeat visits to the same URL. In order to make this work, we'd need an API for extracting the include targets. (Keep in mind this could be recursive). It's actually pretty simple since includes aren't context sensitive (they don't see blocks). A very naive approach would just be to look for any lines that match an include directive. A slightly better approach is to handle all preprocessor conditionals too. There might be some sharp edges we'll find, but I think we can get most of the way there, then think about how to take it the rest of the way after that. |
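The "inside-out" flow described above can be sketched roughly as follows. This is only an illustration of the naive approach (match any line that looks like an include directive), not an existing Asciidoctor.js API. The names `extractIncludeTargets`, `prefetchIncludes` and the caller-supplied `fetchText` function are hypothetical, and the sketch assumes targets are already resolvable as written (no attribute references, no conditionals, no relative-path handling).

```javascript
// Naive scan: any line that looks like an include directive counts.
// Assumption: conditionals, attribute references and escaped directives are ignored.
const INCLUDE_RX = /^include::([^\s\[]+)\[(.*)\]$/;

function extractIncludeTargets(text) {
  const targets = [];
  for (const line of text.split(/\r?\n/)) {
    const match = INCLUDE_RX.exec(line);
    if (match) targets.push(match[1]);
  }
  return targets;
}

// Fetch the root document and, recursively, every include target reachable from it.
// `fetchText(target)` is supplied by the caller and must return a Promise<string>.
// Returns a Map of target -> content that a synchronous include processor could read from.
async function prefetchIncludes(rootTarget, fetchText) {
  const pending = new Map(); // target -> Promise<string>

  function visit(target) {
    // Already requested: reuse it. This also stops include cycles from looping forever.
    if (pending.has(target)) return Promise.resolve();
    const request = fetchText(target);
    pending.set(target, request);
    // Once the text arrives, request all of its includes concurrently.
    return request.then((text) => Promise.all(extractIncludeTargets(text).map(visit)));
  }

  await visit(rootTarget);

  const contents = new Map();
  for (const [target, request] of pending) contents.set(target, await request);
  return contents;
}
```

Because the map is keyed by target, the same object can double as the download cache mentioned above: a target that is included from several places (or from several documents) is only requested once.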
Btw, in order to support URL includes in Antora, I'd really like to do it this way instead of invoking a sync subprocess. (Though we could do the latter first, then improve on it). Related issue in Antora: https://gitlab.com/antora/antora/issues/246 |
After reading @Mogztter's comment I was about to suggest, but @mojavelinux already beat me to it: Can you defer the resolution of attributes until all includes have been retrieved? In the case of recursion, recursion is already an issue as it is:
== this is from recursion 1
hello
include::recursion-2.adoc[]
== this is from recursion 2
world
include::recursion-1.adoc[] renders:
To deal with this, it would be appropriate to generate a tree of includes, and at each include, check along the current path from root-to-branch. Given the following documents:
And the following includes, in this order:
The tree generated would be:
|
👍
It might be an edge case but what about: attrs.adoc :include-all: main.adoc include::attrs.adoc[]
ifdef::include-all[]
include::a.adoc[]
include::b.adoc[]
include::c.adoc[]
include::d.adoc[]
include::e.adoc[]
include::f.adoc[]
// ...
endif::[] If we took the naive approach, we would fetch
What do you mean by conditionals? We should execute all the registered preprocessors, but I don't get the conditionals. Maybe you are referring to
If there's an infinite loop, we should use the
Indeed, if we have the complete tree we could do the resolution and detect loops early 👍 |
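A minimal sketch of the loop check discussed in this thread: build the include tree and, at each include, look along the current root-to-branch path. The function name `buildIncludeTree` and the node shape are hypothetical, and the sketch assumes all file contents are already available in a `Map` keyed by include target.

```javascript
// Build an include tree from already-retrieved contents (Map of target -> text).
// A target that already appears on the path from the root to the current node
// is kept as a leaf flagged `cyclic`, and is not descended into again.
function buildIncludeTree(target, contents, path = []) {
  const node = { target, children: [], cyclic: false };
  if (path.includes(target)) {
    node.cyclic = true; // loop detected on the current branch
    return node;
  }
  const text = contents.get(target) ?? '';
  for (const line of text.split(/\r?\n/)) {
    const match = /^include::([^\s\[]+)\[.*\]$/.exec(line);
    if (match) {
      node.children.push(buildIncludeTree(match[1], contents, [...path, target]));
    }
  }
  return node;
}
```

Checking the path rather than a global "already seen" set is deliberate: the same file may legitimately be included twice from sibling branches, and only a repeat along a single branch is a loop. With the `recursion-1.adoc` / `recursion-2.adoc` pair above, the tree stops at the second occurrence of `recursion-1.adoc`.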
Attributes can also be in the target itself, e.g, https://docs.antora.org/antora/1.0/asciidoc/include-content/#include-partial include::{partialsdir}/log-definition.adoc[]
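To illustrate why attribute references in the target matter for prefetching, here is a small sketch that substitutes `{name}` references using whatever attributes are known so far and reports when the target cannot be resolved yet. The helper name `resolveTarget` is hypothetical, and the `partialsdir` value used in the usage line is made up for the example. Real AsciiDoc attribute handling has more rules than this.

```javascript
// Substitute {name} references in an include target using the attributes known so far.
// Returns null when at least one reference is still unknown, which signals that
// the fetch has to be deferred until more of the document has been processed.
function resolveTarget(target, attributes) {
  let unresolved = false;
  const resolved = target.replace(/\{([\w-]+)\}/g, (reference, name) => {
    if (Object.prototype.hasOwnProperty.call(attributes, name)) return attributes[name];
    unresolved = true;
    return reference;
  });
  return unresolved ? null : resolved;
}

// Usage (the attribute value is an assumption for illustration only):
// resolveTarget('{partialsdir}/log-definition.adoc', { partialsdir: 'partials' })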
You could do it greedily, but iteratively. As far as I'm aware, there are four types of includes:
The flow would be:
If all includes were unconditional, this would be the best-case scenario, with the least time spent waiting for files. If all includes were conditional, this would be the worst-case scenario, and a lot of time would be spent waiting for files.
Example
Retrieve the root document
Parse the root document to find unconditional includes and conditional includes, and start building up the tree one-level deep, leaving a stub for each.
Retrieve all unconditional stubs. Don't retrieve the conditional stubs.
Parse the retrieved documents to find unconditional includes and conditional includes.
At this point, there are two conditions which are true: there are no unconditional stubs, and there are conditional stubs. So resolve the attributes up to the first conditional.
Even if the
Retrieve all unconditional stubs. Don't retrieve the conditional stubs.
Parse the retrieved documents to find unconditional includes and conditional includes.
At this point, there are no unconditional stubs, and there are no conditional stubs. So start processing the overall document. |
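The "parse to find unconditional includes and conditional includes" step in the walkthrough above could look something like the sketch below. It is a simplification under several assumptions: the function name `classifyIncludes` is hypothetical, escaped directives and comment blocks are not handled, and an include wrapped in a single-line conditional is simply treated as conditional.

```javascript
// Split the include targets of one document into those that can be fetched
// immediately (unconditional) and those guarded by ifdef/ifndef/ifeval (conditional).
function classifyIncludes(text) {
  const includeRx = /^include::([^\s\[]+)\[.*\]$/;
  const unconditional = [];
  const conditional = [];
  let depth = 0; // nesting depth of open conditional blocks

  for (const line of text.split(/\r?\n/)) {
    let match;
    if (/^ifeval::\[.*\]$/.test(line)) {
      depth += 1;
    } else if ((match = /^(?:ifdef|ifndef)::[^\[]+\[(.*)\]$/.exec(line))) {
      if (match[1] === '') {
        depth += 1; // block form: closed by a later endif::[]
      } else {
        // Single-line form, e.g. ifdef::attr[include::x.adoc[]]
        const inner = includeRx.exec(match[1]);
        if (inner) conditional.push(inner[1]);
      }
    } else if (/^endif::[^\[]*\[\]$/.test(line)) {
      depth = Math.max(0, depth - 1);
    } else if ((match = includeRx.exec(line))) {
      (depth === 0 ? unconditional : conditional).push(match[1]);
    }
  }
  return { unconditional, conditional };
}
```

Each round of the flow would then fetch everything in `unconditional` concurrently and keep `conditional` as stubs until the attributes up to the first conditional have been resolved.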
Thanks @tigregalis for the detailed example. @mojavelinux I'm wondering how we should proceed. Maybe we could implement something in Asciidoctor.js and introduce an experimental API so we can test it against various documents and fine-tune the solution (before committing to it in Asciidoctor core)? Speaking of attributes and resolution, it reminds me of another issue: asciidoctor/asciidoctor#1911 (comment) |
@tigregalis Are you working on it? I want to give it a try but I don't want to step on your toes 😉 |
@Mogztter I've seen that you've started making some changes in #633 and read a bit of the discussion there. I don't understand the codebase well enough to contribute, and the syntax has a lot of quirks (discussed further below), but I have been playing around with building a wrapper/extension as a proof-of-concept, with my plan as follows:
Other than asciidoctor.js itself processing the file(s) at the end, this should be completely async and should be quite efficient. I've been going through the asciidoctor user manual and testing a number of scenarios, and I'm finding there's a lot of flexibility, a lot of exceptions and a lot of strange behaviour that is hard to account for. Asynchronously fetching includes is relatively easy, even doing nested includes: as long as it's not conditional. Doing conditional resolution of includes though... a lot of these edge-cases can be pretty nightmarish.
I guess it stems from the fact that includes are just points in the target file at which you can substitute the include text with the contents of the referenced file, i.e. partials. There are no syntactical rules for when an include is valid or not. All includes are valid. Some of this unfortunate flexibility is also found with the This is an aside, but are syntactical changes within the scope of Asciidoctor 2.x? If so, there are a few ideas I have to make it more logical and tree-like:
I'd also suggest building it async-first. A JSON AST as an output format would be excellent as well: I've dabbled with writing an extension to do this, but haven't gotten very far yet. |
@tigregalis No worries! Again thanks for taking the time to write your thoughts. I think we should iterate on this feature and take one step at a time. While I agree with a lot of things you said I think we should leave attributes for now and focus on unconditional includes that don't rely on attributes.
Even with this limited scope we have a few things to decide. vfs.addFile({path: '/path/to/file.adoc', contents: Buffer()}) or create/provide one: asciidoctor.convert(input, {vfs: vfs}) At some point we will also need to decide if we should replace the include directive with the file content when it's "safe". Might be a premature optimization but we don't need to parse an include twice with this simple example: main.adoc include::a.adoc[] a.adoc a In this case we could resolve the include and call Asciidoctor with the following input: main.adoc
Anyway, first things first: let's work on the virtual file system and on the process to fetch files efficiently. |
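To make the virtual file system idea concrete, here is a minimal in-memory sketch. Everything in it is hypothetical: the `VirtualFileSystem` class, its methods, and `inlineSafeIncludes` are illustrations, and Asciidoctor.js does not accept a `vfs` option today (that is the proposal in this thread). "Safe" is taken here to mean an include with empty brackets (no tags, lines or other attributes) whose target is already in the VFS. The thread does not define it more precisely.

```javascript
// Hypothetical in-memory virtual file system, mirroring the vfs.addFile(...) idea.
class VirtualFileSystem {
  constructor() {
    this.files = new Map(); // path -> string contents
  }

  addFile({ path, contents }) {
    this.files.set(path, String(contents)); // accepts a string or a Buffer
    return this;
  }

  hasFile(path) {
    return this.files.has(path);
  }

  readFile(path) {
    if (!this.files.has(path)) throw new Error(`File not found in VFS: ${path}`);
    return this.files.get(path);
  }
}

// Replace "safe" include lines with the file content so the include does not
// have to be resolved again by the processor. Other lines are left untouched.
function inlineSafeIncludes(text, vfs) {
  return text
    .split(/\r?\n/)
    .map((line) => {
      const match = /^include::([^\s\[]+)\[\]$/.exec(line);
      return match && vfs.hasFile(match[1]) ? vfs.readFile(match[1]) : line;
    })
    .join('\n');
}
```

With `a.adoc` registered as `a`, the `main.adoc` example above (`include::a.adoc[]`) is reduced to the single line `a` before it is handed to the processor. Includes nested inside the inlined content are left as-is in this sketch.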
Haven't been keeping up with this for a while. Has this moved at all? If I want to contribute to the codebase, where would be the best place to start (reading)? |
It's still a work in progress: #644
If you want to understand how Asciidoctor resolves the include directive then you should read the Asciidoctor Ruby code base. In fact, the code generated from Opal is harder to read. The Asciidoctor.js code base is now in the packages/core directory. Please note that most of the code is generated from the Asciidoctor Ruby library. If you want to override a method from Asciidoctor Ruby you should create a Ruby file. For instance we override the |
To be honest, I just don't know. The language was designed in such a way that includes are synchronous and trying to decouple that in any way has side effects that may cause the parser to produce a different result. (Yes, I said what I said above, but in reality it's just more complex). One possible solution is to do something like what Asciidoctor Reducer does, which is to expand all the includes before parsing (i.e., preprocess the document from top to bottom). The benefit there is that you can leverage the existing parser without having to resort to using what amounts to an alternate parser. Of course, that means Asciidoctor Reducer would need to be able to work with async operations itself...so changes may still be needed, but it's perhaps less invasive. In the AsciiDoc Language, what we really need to do is be able to walk a document to identify preprocessor directives without parsing...which may even end up being a separate step in the parsing of AsciiDoc. It's a major problem for defining a grammar as well. |
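The "expand all the includes before parsing" idea can be illustrated with a toy top-to-bottom walk that awaits each include in turn. This is not how Asciidoctor Reducer is implemented. It is a sketch under heavy assumptions: the function name `reduceIncludes` and the caller-supplied `readText` are hypothetical, only `:name: value` attribute entries and `{name}` references in targets are understood, and conditionals, tags and line ranges are ignored.

```javascript
// Walk a document top to bottom, replacing each include line with the (recursively
// expanded) lines of the included file. Attribute entries seen so far are tracked
// so that later include targets such as {dir}/file.adoc can be resolved.
// `readText(target)` is supplied by the caller and must return a Promise<string>.
async function reduceIncludes(target, readText, attributes = new Map(), stack = []) {
  if (stack.includes(target)) {
    throw new Error(`Include cycle: ${[...stack, target].join(' -> ')}`);
  }
  const text = (await readText(target)).replace(/\r?\n$/, ''); // drop one trailing newline
  const output = [];
  for (const line of text.split(/\r?\n/)) {
    const entry = /^:([\w-]+):(?:\s+(.*))?$/.exec(line);
    if (entry) attributes.set(entry[1], entry[2] ?? '');

    const include = /^include::([^\s\[]+)\[.*\]$/.exec(line);
    if (include) {
      const resolved = include[1].replace(/\{([\w-]+)\}/g, (reference, name) =>
        attributes.has(name) ? attributes.get(name) : reference);
      // Awaiting here keeps the walk strictly sequential, like the current parser.
      output.push(...(await reduceIncludes(resolved, readText, attributes, [...stack, target])));
    } else {
      output.push(line);
    }
  }
  return output; // array of lines; join with '\n' before handing to the processor
}
```

The sequential `await` is the trade-off discussed earlier in the thread: the result stays faithful to top-to-bottom semantics, but nothing is fetched in parallel, so the only gain over synchronous I/O is that the browser is not blocked.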
IMHO, this is just narrow minded of them. |
I'd like to fetch documents and includes asynchronously (in-browser), using fetch. By default, it uses synchronous XMLHttpRequest, which locks the browser and also gives us a warning:
That's why I'd like to suggest a loadAsync / convertAsync API for Asciidoctor.js.
Perhaps a related suggestion is a way for the user to specify their own http client, e.g. if someone wishes to use fetch, axios, or node-fetch.
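As a rough picture of what such an API could look like from the caller's side, here is a hypothetical wrapper. Neither `convertAsync` nor an `httpClient` option exists in Asciidoctor.js. Both are assumptions made for this sketch, and the wrapper only fetches the root document (includes would still need one of the strategies discussed in the comments). The `convert` callback is whatever synchronous conversion the caller wants to run once the source has arrived.

```javascript
// Default client built on the standard fetch API (browsers, Node 18+).
async function defaultHttpClient(url) {
  const response = await fetch(url);
  if (!response.ok) throw new Error(`HTTP ${response.status} for ${url}`);
  return response.text();
}

// Hypothetical async entry point: download the source with a pluggable client,
// then hand the text to a synchronous `convert` function supplied by the caller.
async function convertAsync(url, { httpClient = defaultHttpClient, convert }) {
  const source = await httpClient(url);
  return convert(source);
}
```

A caller could then plug in any client that maps a URL to a `Promise<string>`, for example `httpClient: (url) => axios.get(url, { responseType: 'text' }).then((res) => res.data)`, and pass `convert: (source) => asciidoctor.convert(source)`.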
As a really convoluted work-around / proof-of-concept, I've managed to use two include processor extensions and Promises to progressively "load" the document. Right now this will only go down to one level of includes. There are a number of other issues of course, but the primary one is that it's loading/parsing the document more than once.