Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

resolves #3765 download and embed custom remote stylesheet if allow-uri-read is set #3766

Merged

Conversation

mojavelinux
Copy link
Member

No description provided.

@mojavelinux mojavelinux force-pushed the issue-3765-embed-remote-stylesheet branch from dc49b35 to aa61dda Compare September 28, 2020 06:00
@mojavelinux mojavelinux force-pushed the issue-3765-embed-remote-stylesheet branch from aa61dda to eea4a23 Compare October 23, 2020 05:47
@mojavelinux mojavelinux merged commit 3bf8cb3 into asciidoctor:master Oct 24, 2020
@mojavelinux mojavelinux deleted the issue-3765-embed-remote-stylesheet branch October 24, 2020 05:57
@mojavelinux
Copy link
Member Author

@Mogztter This may have broke tests in Asciidoctor.js. Let me know if something seems incorrect.

@ggrossetie
Copy link
Member

Thanks for the heads up!
Indeed, two tests are failing in the browser tests suite: https://github.com/asciidoctor/asciidoctor.js/runs/1301380876
I will take a look later today or tomorrow.

@mojavelinux
Copy link
Member Author

Thanks! I want to make sure we get this cleared up before the 2.0.11 release, though there's still plenty of time for that.

@ggrossetie
Copy link
Member

ggrossetie commented Oct 25, 2020

So the issue is that in a browser environment the target is a URI and the read_contents will only read the contents of the file if allow-uri-read is defined.
Previously, read_asset will unconditionally read the contents of the file.

Let's take a concret example, the following code will not download and embed the stylesheet:

const options = {
  doctype: 'article',
  safe: 'unsafe',
  header_footer: true,
  attributes: ['showtitle', 'stylesheet=asciidoctor.css', 'stylesdir=http://localhost:5000/build/css']
}
const html = asciidoctor.convert('=== Test', options)

In my opinion, it's a bit odd to force the user to define allow-uri-read when everything is on the same domain.

The second issue is more or less the same but when using the chrome:// protocol: asciidoctor/asciidoctor.js#816

Until now, my workaround was to monkey patch PathResolver#root? to return true when the path starts with http://, https:// or chrome:// in a browser environment:

https://github.com/asciidoctor/asciidoctor.js/blob/3ce815f48a2041624776829a48c9220effd521dc/packages/core/lib/asciidoctor/js/asciidoctor_ext/browser/path_resolver.rb

I could probably monkey patch the whole read_contents in a browser environment to check if the target is on the same domain.
If this is the case, then the attribute allow-uri-read won't be required.

A cleaner approach would be to define a allow_uri_read? method:

def allow_uri_read? target
  @document.attr? 'allow-uri-read'
end

That would allow Asciidoctor.js to monkey patch this specific method (instead of the whole read_contents method).
Something like:

def allow_uri_read? target
  # chrome:// extension URI
  return true if target.start_with? 'chrome://'
  # same domain
  return true if target.start_with? `window.location.protocol + '//' + window.location.host`

  @document.attr? 'allow-uri-read'
end

@mojavelinux What do you think?

@mojavelinux
Copy link
Member Author

the read_contents will only read the contents of the file if allow-uri-read is defined. Previously, read_asset will unconditionally read the contents of the file.

I was not expecting there to be difference there in the logic. I'll need to revisit that.

Until now, my workaround was to monkey patch PathResolver#root? to return true when the path starts with http://, https:// or chrome:// in a browser environment:

I think it would be better if we moved this check to a method like url_protocol? or something so it is simpler to patch. The question we are asking is whether this is a URL we can read from. Outside of the browser environment, it can only be http:// or https://. But the browser environment extends this. And that's what the code should reflect. So let's try to get that in there.

@mojavelinux
Copy link
Member Author

mojavelinux commented Oct 25, 2020

I was not expecting there to be difference there in the logic. I'll need to revisit that.

I see, the logic is different given how you have patched it. The read_asset method in core only reads from the filesystem. The read_contents method goes further an enables reading from a URL. Obviously, in core, this is going to require allow-uri-read since we cannot read from any URL unless permission is granted.

I would recommend replacing this method since the whole point of the method is different in the browser environment.

@mojavelinux
Copy link
Member Author

It's actually a bit of a mistake that we have both read_asset and read_contents. I added read_contents because read_asset wasn't sufficient for what we needed it to do. Rather than try to add behavior to it, I just cut a new method. But I think read_asset should probably be phased out eventually.

@ggrossetie
Copy link
Member

I think it would be better if we moved this check to a method like url_protocol? or something so it is simpler to patch. The question we are asking is whether this is a URL we can read from. Outside of the browser environment, it can only be http:// or https://. But the browser environment extends this. And that's what the code should reflect. So let's try to get that in there.

Why not 👍

Obviously, in core, this is going to require allow-uri-read since we cannot read from any URL unless permission is granted.

I think this logic still makes sense in a browser environment. The only difference is that, in a browser environment, we can establish a list of trusted domain/protocol.
Expanding on this idea, currently, the allow-uri-read attribute is all or nothing but it might be useful to configure a list of trusted URIs (allowlist/denylist).
For instance, I might want to read assets from https://cdn.happylittletrees.org but I don't want Asciidoctor to download and embed assets from http://evil.com.
In this context, a allow_uri_read? target method would probably makes sense in Asciidoctor core?

Anyway, it might be something we should consider in the future.

I would recommend replacing this method since the whole point of the method is different in the browser environment.

I don't 100% agree, but it's probably the safest/quickest solution.
I will patch read_contents logic in Asciidoctor.js 👍

@mojavelinux
Copy link
Member Author

mojavelinux commented Oct 25, 2020

I think this logic still makes sense in a browser environment. The only difference is that, in a browser environment, we can establish a list of trusted domain/protocol. Expanding on this idea, currently, the allow-uri-read attribute is all or nothing but it might be useful to configure a list of trusted URIs (allowlist/denylist).

I think you're trying to change the meaning of allow-uri-read. In core, it should be all or nothing because the point is to control use of the network...even for trusted domains.

If we entertained the idea of trusted domains in core, it would only be as a comma-separate list in the value of the allow-uri-read attribute. So it would build on the existing feature, not circumvent it.

If the browser environment has a different definition of a remote URI (like an expanded definition of a file URI to include chrome), then it can treat certain URIs differently. (And we want enough logic hooks for that).

In this context, a allow_uri_read? target method would probably makes sense in Asciidoctor core?

Only with allow-uri-read set and an additional allowlist (which, of course, the extension could then leverage).

If the chrome protocol is really a local request, and you want to match the behavior of the read_asset method, then consider patching Helpers.uriish?. I think you keep trying to figure out how to allow remote requests when what you should focus on is what the definition of a remote URI is. If the browser environment can influence what is local vs remote, then you won't have problems. And if you can't find a path, then overriding read_content is the right approach.

(It's okay if the browser environment and core have a different understanding of what constitutes a remote URI).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants