Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Issues related to rendering Pod6 in GitHub #55

Closed
kivikakk opened this issue Dec 14, 2018 · 38 comments
Closed

Issues related to rendering Pod6 in GitHub #55

kivikakk opened this issue Dec 14, 2018 · 38 comments

Comments

@kivikakk
Copy link

Hi! I'm implementing Pod 6 rendering on github.com using this library. Everything works great so far! Have a look:

https://github.com/kivikakk/pod6test/blob/master/test.pod6

There's only one nit, which you'll see in that example: class names are getting Pod::To::HTML:: prepended, I guess because the code is getting evaluated in the context of the module.

Here's a quick repro:

test.p6
#| Hello!
class XYZ {}

Using perl6 --doc=HTML:

$ perl6 --doc=HTML test.p6 | awk '/pod-body/, /\/div/'
    <div class="pod-body no-toc)">
    <h1>XYZ: Hello!</h1>
    </div>
$

Using Pod::To::HTML directly:

$ perl6
To exit type 'exit' or '^D'
> use Pod::To::HTML;
Nil
> for Pod::To::HTML.render('test.p6'.IO).lines { .say if /'pod-body'/ ff /'/div'/ }
    <div class="pod-body no-toc)">
    <h1>Pod::To::HTML::XYZ: Hello!</h1>
    </div>
>

Above is with Rakudo Star 2018.10. I've just installed Rakudo 2018.11 from scratch and tried with that, and the results are exactly same, weird no-toc) bit and all.

@kivikakk
Copy link
Author

Oh hey, look, https://github.com/perl6/Pod-To-HTML/blob/master/README.pod6 renders now!

@kivikakk
Copy link
Author

So, this is probably not the solution... but, is it?

diff --git a/lib/Pod/To/HTML.pm b/lib/Pod/To/HTML.pm
index 64f88ca..f195e63 100644
--- a/lib/Pod/To/HTML.pm
+++ b/lib/Pod/To/HTML.pm
@@ -301,7 +301,7 @@ multi sub node2html(Pod::Block::Declarator $node) {
         }
         default {
             Debug { note "I don't know what {$node.WHEREFORE.WHAT.perl} is. Assuming class..." };
-        "<h1>"~ node2html([$node.WHEREFORE.perl, q{: }, $node.contents])~ "</h1>";
+        "<h1>"~ node2html([$node.WHEREFORE.perl.subst(/^'Pod::To::HTML::'/, ''), q{: }, $node.contents])~ "</h1>";
         }
     }
 }

@JJ
Copy link

JJ commented Dec 14, 2018

Hi, Ashe. First, thanks a lot for doing this. We'll do what we can to help you. But let me understand what you need. You need no class at all there, or something else? We can implement a switch that, by default, does not include any class name (and the TOC part). Would that be OK for you? I don't think substituting it on the tail end would make a lot of sense, but it might...

@JJ
Copy link

JJ commented Dec 14, 2018

OK, I get this. You are talking about actual class names. Let's create a test for that and fix it.

JJ added a commit that referenced this issue Dec 14, 2018
Refs #55, for the time being it does not reproduce the error. Will
need more tests.
JJ added a commit that referenced this issue Dec 14, 2018
@JJ
Copy link

JJ commented Dec 14, 2018

Well, it does happen if you use Pod::To::HTML directly. Now that's weird.

@JJ
Copy link

JJ commented Dec 14, 2018

But I think I know the reason why...

@JJ
Copy link

JJ commented Dec 14, 2018

EVAL is used to render the pod, but that's done within the current package. That is why the namespace of the current package is appended, but only if you call it the way you do. That's by design, it's not an error. If you positively need to to that, we can try and fix it at the package level. At any rate, we do need to remove package names prepended to class names. So mid-term fix will have to be fixing that. And thanks!

JJ added a commit that referenced this issue Dec 14, 2018
Refs #55. Still the same problem with strings, but that's not a
blocker right now. Please @kivikakk download new version to check this
works correctly, ping me if it does not.
@kivikakk
Copy link
Author

FWIW, we're using a vendored copy of Pod::To::HTML for this (see https://github.com/github/markup/tree/master/vendor), so we can just update the submodule reference, publish a new github-markup gem and use that in github.com to get the latest fix. So let me know when you think it's ready! cdc6266 looks promising.

@kivikakk
Copy link
Author

btw, this process has been a lovely introduction to Perl 6 for me! Really enjoying it.

@JJ
Copy link

JJ commented Dec 15, 2018

It takes a full day to update the repositories, so it's not going to work until later today, circa 12pm Madrid time. I'll let you know. And thanks for all the work you're doing! Also, welcome to the community :-)

@JJ JJ closed this as completed in 22d0418 Dec 15, 2018
@JJ
Copy link

JJ commented Dec 15, 2018

It's already in the repositories. You'll need to reinstall dependencies, since there's a new one now. Please reopen if you find any other problem or it does not really fix it.

@kivikakk
Copy link
Author

Thanks so much. Changes underway now to use it.

@kivikakk
Copy link
Author

There seems to be a few issues with the new Pod::To::HTML:

  • It only gets the first Pod "element" from a document, and ignores the rest.

    e.g. https://github.com/kivikakk/pod6test/blob/3114490895bf9fb26ef8ab9fe0013b7f35c2cb80/test6.pod renders only the line "Pod 6, determined heuristically." If I remove the =begin pod ... =end pod section, it only renders the <h1>, and not the Pod for the sub that follows. In short, most of the document won't be written out.

  • It writes temporary files that aren't cleaned up.

    /tmp/perl6-pod-load is filled with Pod document fragments, and subdirectories under there are filled with compilation artifacts. We can't use this on our servers as-is -- it will fill up containers until they crash.

  • There is a race condition with tempfile writing.

    See load in Pod::Load:

    my $initials= $string.words.map( *.substr(1,1) )[^128]:v;
    my $id = $tmp-dir~ $initials.join("") ~ ".pod6";
    spurt $id, $string;

    If I attempt to view two documents rendered via Pod::Load whose words have the same second letter, a race condition will ensue. Tempfiles should never have predictable names. Using File::Temp would shield you from race conditions, as it uses :exclusive when attempting to create the temp file (and retries in case the file already existed). It also ensures created files are unlinked, although an open issue on the GitHub repo for it raises questions about whether or not that really works due to reference holding.

    It doesn't help with the precompilation file issue, as they get created by something else entirely.

I would like to use the new version to get this bug fixed, but all three issues are individually show-stoppers.

@kivikakk
Copy link
Author

Maybe Temp::Path is better than File::Temp (see https://colabti.org/irclogger/irclogger_log/perl6-dev?date=2017-06-05#l313).

@JJ
Copy link

JJ commented Dec 17, 2018 via email

@JJ
Copy link

JJ commented Dec 17, 2018

Re: temporary files: You are going to have that all over the place if you use Perl 6, it's the precomp cache and you'll have .precomp files all over. I don't see an easy solution to that. We can remove them every time they are created, but that would have a (maybe small) performance penalty.

@kivikakk
Copy link
Author

You are going to have that all over the place if you use Perl 6, it's the precomp cache and you'll have .precomp files all over. I don't see an easy solution to that.

It's fine if we end up with precompiled code as a result of running our Perl code -- it's not okay if we end up with precompiled code sitting on the disk as a result of converting Pod to HTML. (i.e. it's an artifact of converting a particular piece of Pod.) The first category won't grow with continued use, the second category will.

@JJ JJ reopened this Dec 17, 2018
@JJ JJ changed the title Class names get Pod::To::HTML prepended in some cases Issues related to rendering Pod6 in GitHub Dec 17, 2018
JJ added a commit to JJ/p6-pod-load that referenced this issue Dec 17, 2018
@JJ
Copy link

JJ commented Dec 17, 2018

OK, I'm working my way through this. 1 down, two to go. Checking out Temp::Path now in JJ/p6-pod-load#2

JJ added a commit to JJ/p6-pod-load that referenced this issue Dec 17, 2018
This refs to Raku/Pod-To-HTML#55, also closes #2.

Added some tests also for good measure
@JJ
Copy link

JJ commented Dec 17, 2018

OK, two down, one to go. This one is actually the messiest. Let's see.

@kivikakk
Copy link
Author

Excellent! One last question. Precompiling this way appears to execute the referenced code, not just precompile it. For instance:

$ cat test.p6
use v6.c;

use nqp;

our $precomp-store = CompUnit::PrecompilationStore::File.new(prefix => './tmp/'.IO);
our $precomp = CompUnit::PrecompilationRepository::Default.new(store => $precomp-store);

my $id = nqp::sha1(~'my-file');
my $handle = $precomp.load($id)[0];
without $handle {
    $precomp.precompile('my-file'.IO, $id, :force);
    $handle = $precomp.load($id)[0] // fail("Could not precompile");
}
$ cat my-file
"hello".say;
$ perl6 test.p6
hello
$

More precisely, it's $precomp.load that causes execution, rather than $precomp.precompile. We can't execute arbitrary code on our servers in order to render documentation -- Markup isn't made for things which need sandboxing. (We've already had to disable the current Pod 6 renderer having realised the old version of Pod::To::HTML's use of EVAL was completely unsafe.)

@JJ
Copy link

JJ commented Dec 17, 2018

The thing is that Pod6 is part of the language itself, not just some dumb comments. I was thinking about creating some Pod::To::Pod module that extracts just-the-pod out of a file. But even so there might be some way of executing code (via config passed to the Pod chunks). That can be eliminated too via a smart Pod::To::Pod module, though.

@kivikakk
Copy link
Author

Is the same not true of Pod in perl5? Pod::Simple, which we use for perl5 Pod, doesn't appear to actually execute the given code.

@JJ
Copy link

JJ commented Dec 17, 2018

In the case above, the thing is that piece of code is actually run ASAP after compilation, same as any top-level code. Even if it's called pod6, it will actually compile it.
Just seen your other comment. No, it's not. Pod in Perl5 are simply comments, not a DSL that is interpreted by the interpreter itself. perl5 sees it and says "OK, I don't care about this". You need stuff like perldoc to actually interpret it. That's not the case for Perl6.
I would say the only option is to "sanitize" the code leaving just the POD DSL. If you can think of something else, just let me know.

@JJ
Copy link

JJ commented Dec 17, 2018

Ah, and also it's not possible for the precomp store to actually stop doing that, since it's running part of the code to compute dependencies, for instance. That shouldn't be needed for Pod, though.
At the end of the day, the right thing would be to include in the core something that deals just with POD6. Next best thing, however, is to eliminate actual code from the Pod (which I was planning to do anyway for my Test::Pod::Embedded module, WIP)

JJ added a commit to JJ/p6-pod-load that referenced this issue Dec 17, 2018
We were using only the first element of the pod. However, this is a
major change in interface, because load now returns an array with at
least one element, but array nonetheless. That's going to break
everything. Well, the two functions that use it, anyway.

This closes #3, and works towards Raku/Pod-To-HTML#55. Going back there now
@JJ
Copy link

JJ commented Dec 17, 2018

In the way to fix the third part... Almost there now. The sanitizing part, that's going to be harder.

JJ added a commit that referenced this issue Dec 18, 2018
@JJ JJ closed this as completed in 86544cc Dec 18, 2018
@tbrowder
Copy link
Member

tbrowder commented Dec 18, 2018

@JJ, have you looked at having the pod classes spit out their original pod again? If we could do that, sanitizing would be easy, but maybe I’m missing something. (New lib Pod::To::Pod?)

@JJ
Copy link

JJ commented Dec 18, 2018

@tbrowder yep, it was mentioned. The problem is, as @jnthn says here, it's almost completely impossible to parse something without running something. The only alternative is to split the Perl 6 grammar to get only the pod part. Do you think that's even possible?

@tbrowder
Copy link
Member

tbrowder commented Dec 18, 2018

Sorry, I’m muddling up the whole process in my mind. FWIW, much of the Rakudo pod parsing code has been separated from other grammar and actions in Grammar.nqp and Actions.nqp.

Completely separating it may be easier now, although I think some of the non-pod code will need to be mixed in.

@JJ
Copy link

JJ commented Dec 18, 2018 via email

@tbrowder
Copy link
Member

Find the pod-only sections by searching on “POD-ONLY” in both Grammar.nqp and Actions.nqp.

I think a separate, pod-only parser would be an interesting challenge!

@tbrowder
Copy link
Member

tbrowder commented Dec 19, 2018

I revisited original comments of @jnthn on subject and he said a pod slang should do the trick, and I agree that it would be better than splitting off pod6 handling as a separate program. Breaking out the slang really looks way to complex for my p6 skills. I think Larry took quite some time on the last major slang work, so it’s not a weekend project for mere mortals for sure!

@raiph
Copy link

raiph commented Dec 20, 2018

P6 bucks norms and "tortures implementors" and this is a case in point.

The announcement of the draft of the design of Pod6 in 2009 that became today's Pod6 was at https://www.nntp.perl.org/group/perl.perl6.language/2009/08/msg32352.html

A thread in the ensuing discussion is directly relevant to the matters relevant to this github issue. There are two sub threads both of which began with a comment by Mortiz Lenz at https://www.nntp.perl.org/group/perl.perl6.language/2009/08/msg32359.html I'll repeat the essential text in Moritz's comment and do likewise for each of the replies that followed for the two sub threads.

SUB THREAD #1

Moritz Lenz: However it seems we have to pay a price: each act of rendering a Pod file actually means executing the program that's being documented (at least the BEGIN blocks and other stuff that happens at compile time), with all the security risks implied. So we'll need a very good sandbox. Is that worth it?

Damian: I believe so. ... do we just use Perl 6 as its metalanguage? [yes] ... Sure, there are downsides: Pod becomes much more tightly tied to Perl 6. ... the bad guys have been able to do [bad things] ... for a long time now ... And the people who do take such precautions [run in a sandbox] will simply start to do the same thing for any documented code (or coded documentation :-). Or maybe the sandbox will just be an option to disable any DOC blocks except the default one. Or else maybe perl -doc will just run under an augmented taint mode that suspects not just anything that emerges from an input op, but also anything created in a DOC'd command.

Kyle Hasselbacher: Pod itself is a DSL. If we're committed to giving guns to books, can we default to having the safety on? Can it be so that 'perl6doc foo.pl' does not execute any code without an option to allow it? Perl 5 programmers are sometimes surprised to find that 'perl -c strange.pl' can execute code. Imagine their surprise to find that 'perl6doc' does too.

Damian: There is no perl6doc. There is only: perl6 -doc That is, running the Perl interpreter in a special mode to get documentation. ... you can't tell what's documentation and what's code until you parse the mixture. And you can't parse Perl (5 or 6) without executing stuff. Look, I'm sure we will have a safety mode of parsing Pod (maybe: perl -undoc) But it can't be on by default

Jerry Gay: this is why it's spelled 'perl6 --doc', which should give you some hint that you're running the compiler, just as 'perl -c' does, and 'perldoc' doesn't.

SUB THREAD #2

Moritz Lenz: However it seems we have to pay a price: each act of rendering a Pod file actually means executing the program that's being documented (at least the BEGIN blocks and other stuff that happens at compile time), with all the security risks implied. So we'll need a very good sandbox. Is that worth it?

David Green: Yes. In general, if you've installed a module, it's because you're going to use it, and you already trust it. So this is a problem only if you're looking at the documentation for the first time to decide
whether you do want to use the module (and didn't already read the docs on CPAN.org first or something). Of course, CPAN will need a static copy of the docs anyway, so the solution is that authors should
provide a static file (preferably in a few formats, at least text and HTML). Sites like CPAN will probably make a static doc file a requirement, and even the cpan shell could warn users about any modules that don't
include static docs -- in fact, I think it would be reasonable to refuse to install such modules by default.

Jan Ingvoldstad: Why is it worth it? In general, executable documentation is a bad thing. It's been shown
to be a bad thing many times over. If we absolutely must have some sort of executable documentation, then
if I could, I would insist that it wouldn't be a feature complete language. That is: absolutely no IO in the language, no way of executing code that's foreign to the doc, and so on.

@raiph
Copy link

raiph commented Jan 3, 2019

A recap of this issue; and what I'm thinking is a proposal for progress. Afaict my proposal is practical and requires no action on the part of GH.

@kivikakk got pod6 rendering working. But realized that it requires execution of a parser that can do arbitrary execution and therefore needs sand boxing. As they put it, "Markup isn't made for things which need sandboxing." (cf the 2009 discussion in my previous comment.) Then, unless I'm mistaken, JJ closed this issue with a solution that requires sandboxing, i.e. helpful for other situations but, aiui, even if working perfectly, is a non-starter as a solution for this issue.

I can be an idiot. Maybe I'm missing obvious stuff. I'd be very happy to hear I've misunderstood. Until then the rest of this comment is a strawman proposal of what simple thing could be done next.

Afaict the only common sense conclusion is we must have a solution that avoids arbitrary execution on third party servers for rendering pod6. This is for GH and any other sane third party storage service.

Hence the following strawman proposal:

  • Authors in the P6 community start generating target translation files that are pre-rendered or renderable files that are in an existing format that doesn't require sandboxing for use when stored in repos. For example, given a file foo.p6 that contains pod6, foo's author could generate a foo.p6.md that contains a markdown version of the pod in foo.p6. Then this file could be copied alongside foo.p6 wherever it goes into a repo, such as when stored in a GH repo. The markdown version will then be available for clicking on, and will render appropriately, without GH needing to do anything.

  • We could/should/would also nail down a consistent convention to make this a little more organized, e.g. that any directory containing P6 source may contain a .doc subdirectory that contains these files that contain doc corresponding to the files in the parent directory of the .doc subdirectory.

  • We could/should/would create or modify tooling to make this easier.

  • We include a link in a source file to its corresponding doc file and from the doc file to its corresponding source file. The links could/should/would try to notice if their host source file and rendered doc file are out of step; avoid 404ing; and whatever else can be done to make this solution initially adequate (just a link) and eventually as good as it can be.

  • Maybe there's something we ask GH to do once we've got it together; but quite possibly this is an entirely autonomous solution and GH and other storage services don't need to do anything to support this.

@AlexDaniel
Copy link
Member

@raiph committing generated files (especially that amount of generated files) in a git repo is typically a big no-no.

@JJ
Copy link

JJ commented Jan 4, 2019

@raiph plus if you're generating .md, GitHub is going to render it for you anyway. Only way out I see is to generate a module that renders pod6 via an independent pod6 grammar (which, as @AlexDaniel says, could be actually overkill) or some other method.

@tbrowder
Copy link
Member

tbrowder commented Jan 4, 2019 via email

@finanalyst
Copy link

I've been re-writing the tool-chain for docs.perl6.org. Although the work is not quite complete, there are interim results . The main difference between this and the main docs.perl6.org is that there are no files under Routine and Syntax and the Search button generates a different type of output).

The reason I started this is that the tool-chain is a rats nest because it has evolved organically. Some of the issues raised here are related to this rats nest. Some are not.

The reason why there are no Routine/Syntax files is that all of these files are generated by htmlify from the pod sources in docs/Language and docs/Type

Pod::To::HTML generates a lot of the HTML in many different places. The Pod::Code block can run an external program provided to it from an environment variable (this is to allow for an external syntax highlighter). Pod::To::HTML does not always handle Pod blocks recursively. Consequently, external programs such as htmlify have to run code separately to get to the final Str, eg. for a Heading.

The perl6 --doc=xxx CLI calls render twice. So that the code for rendering pod6 is called twice.

The module I have created is purely based on templates, and so could easily be adapted to generate Markdown from Pod6 sources. The holdup for me at the moment is that the method rakudo calls providing it with the pod tree is Pod::To::xxx::render and rakudo calls it twice.

@raiph
Copy link

raiph commented Jan 11, 2019

@AlexDaniel

committing generated files ... in a git repo is typically a big no-no.

Sure.

But we've got dueling no-nos here.

kivikakk has confirmed that GH do not sandbox doc rendering so they need compelling assurance that a renderer is safe. This was as predicted for public repos in general and discussed in 2009 by Damian and Moritz et al in the comments I've already excerpted and linked above.

Do you accept we have a fundamental problem? First, that Pod6 is Pod6 in the sense discussed in that 2009 thread. And second that modifying Rakudo, or writing a rendering module, or writing a custom parser, doesn't solve the problem either?

(especially that amount of generated files)

I'm not suggesting we commit any particular amount of files. I'm suggesting we commit some unless we're willing to accept the status quo which is that pod6 doc is rendered as ordinary unformatted text.

@JJ

if you're generating .md, GitHub is going to render it for you anyway.

Exactly. That's why I'm suggesting it.

Only way out I see is to generate a module that renders pod6 via an independent pod6 grammar (which, as @AlexDaniel says, could be actually overkill) or some other method.

The issue, aiui, is that they need a safe solution, one they trust.

That in turn seems largely incompatible with us cooking up our own renderer unless we use some grammar or regex driven tool they already consider safe and drop support for Pod6 features too complex to render with that tool. Am I wrong?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

7 participants
@JJ @kivikakk @finanalyst @tbrowder @raiph @AlexDaniel and others