TOC generator for generated html #180

adamdruppe · 2015-12-22T19:02:59Z

This PR is not yet ready to be pulled; I just want to show what it does and see whether @andralex thinks it is worth continuing.

Compile the new file with dmd generate_toc.d dom.d, then run it on a downloaded copy of a page from the website: first wget http://dlang.org/spec/declaration.html, then ./generate_toc declaration.html, and you will get this:

WARNING: no anchor for heading <h1>Declarations</h1>
WARNING: no anchor for heading <h3>Declaration Syntax</h3>
WARNING: no anchor for heading <h3>Void Initializations</h3>
WARNING: no anchor for heading <h3>Global and Static Initializers</h3>
WARNING: no anchor for heading <h3>Type Qualifiers vs. Storage Classes</h3>
<ol>
        <li><a href="#AutoDeclaration">Implicit Type Inference</a></li>
        <li><a href="#alias">Alias Declarations</a></li>
        <li><a href="#extern">Extern Declarations</a></li>
        <li><a href="#typeof">typeof</a></li>
</ol>

The warnings ought to be fixed in the source... or I could make the program do that automatically too; that's trivial. The question is just what we want to do. We so badly need these sections to be easy to link to though! So something needs to be done.

The generated TOC is HTML suitable to be pasted right in. This program could also very easily insert it itself, given a <div id="toc"></div> placeholder or something similar. Or we could make an index or sitemap file with links to all the sections too.
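For readers without the PR's source at hand, the behaviour described above (warn about headings that have no anchor, emit an ordered list for the rest) can be sketched roughly as follows. This is not generate_toc.d and does not use dom.d; it is a minimal regex-based approximation using only Phobos. The function name buildToc and the warn callback are hypothetical, and the sketch assumes each heading is well-formed, sits on a single line, and carries its anchor as an <a id="..."> or <a name="..."> element inside the heading.

```d
import std.array : appender;
import std.regex : matchAll, matchFirst, regex, replaceAll;
import std.stdio : stderr, write;

// Hypothetical helper (not the PR's code): builds an <ol> from the headings
// in `html` and calls `warn` for every heading that lacks an anchor.
string buildToc(string html, void delegate(string) warn)
{
    // Assumption: a heading opens and closes on the same line.
    auto heading = regex(`<(h[1-6])[^>]*>(.*?)</h[1-6]>`);
    // Assumption: the anchor is an <a> element with an id or name attribute.
    auto anchor = regex(`<a\s[^>]*?\b(?:id|name)="([^"]*)"`);
    auto tags = regex(`<[^>]+>`);

    auto toc = appender!string();
    toc.put("<ol>\n");
    foreach (m; html.matchAll(heading))
    {
        auto a = m[2].matchFirst(anchor);
        if (a.empty)
        {
            warn(m.hit); // nothing to link to, so report it and move on
            continue;
        }
        // Strip nested markup so only the heading text becomes the link label.
        toc.put("        <li><a href=\"#" ~ a[1] ~ "\">"
            ~ m[2].replaceAll(tags, "") ~ "</a></li>\n");
    }
    toc.put("</ol>\n");
    return toc.data;
}

void main()
{
    auto page = "<h1>Declarations</h1>\n"
        ~ "<h3><a id=\"alias\">Alias Declarations</a></h3>\n";
    write(buildToc(page,
        (string h) { stderr.writeln("WARNING: no anchor for heading ", h); }));
}
```

A regex scan like this is far more fragile than a real HTML parser, which is the trade-off discussed later in the thread.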

As you can see in the source, dom.d makes postprocessing html really easy and there's tons of possibilities to improve the website by expanding on this idea.

adamdruppe · 2015-12-22T19:05:28Z

I bundled dom.d right here to avoid having an external dependency. Since it is just a single file anyway, it isn't a huge problem to drop it in here. And since it is reasonably stable, we probably won't hit any bugs... but if we do, I'll fix them upstream then copy the file over here again - KISS package management :)

CyberShadow · 2015-12-22T19:45:02Z

FWIW chmgen also does HTML parsing to some degree and warns about broken internal links.

andralex · 2016-01-12T13:57:41Z

Thanks! I think this is a neat idea, and having an official means of postprocessing our HTML opens many possibilities - hey, including perhaps cross-referencing and  insertion. @MartinNowak what do you think?

A few notes about tactics:

There is appeal to KISS package management, but we should also be wary of unwittingly plopping 6KLOC of code into the project.
The code looks foreign compared to all other tools - style is very different etc. It's not that it's worse - I actually prefer Egyptians myself - but it's just different, creates a precedent, etc.
Generation should probably issue ddo(c|x) code, so we can control it further. The page would need more than a bare HTML list.
In the same spirit as @CyberShadow's comment in another PR, I think a great way to introduce a tool is together with a use of it, in this case to automate some page of the site or to generate a neat new page.

@adamdruppe how do you think we can address these?

adamdruppe · 2016-01-13T01:34:39Z

On Tue, Jan 12, 2016 at 05:57:44AM -0800, Andrei Alexandrescu wrote:

Thanks! I think this is a neat idea, and having an official means of postprocessing our HTML opens many possibilities - hey, including perhaps cross-referencing and  insertion. @MartinNowak what do you think?

Yeah, it could do that too, though cross-referencing is really more of a source thing. Doing it as a post processor is pretty leaky.

There is appeal to KISS package management, but we should also be wary of unwittingly plopping 6KLOC of code into the project.

The code looks foreign compared to all other tools - style is very different etc. It's not that it's worse - I actually prefer Egyptians myself - but it's just different, creates a precedent, etc.

Maybe the tools could try using dub and dfmt. Dog food a little!

Generation should probably issue ddo(c|x) code, so we can control it further. The page would need more than a bare HTML list.

What do you have in mind?

andralex · 2016-01-13T19:22:19Z

What do you have in mind?

Nothing fancy, just generate a ddoc file with $(UL ...) and $(LI ...). Then we can put it in any shape by defining DDOC appropriately.
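As a rough illustration of that suggestion, the generator could print Ddoc macros instead of HTML tags. The sketch below is an assumption-laden example, not code from the PR: TocEntry and toDdoc are hypothetical names, and it relies on the UL, LI and LINK2 macros that Ddoc predefines (dlang.org's own .ddoc files may use different link macros). It also assumes ids and titles contain no unbalanced parentheses, which would need escaping in real Ddoc.

```d
import std.array : appender;
import std.stdio : write;

// Hypothetical record for one TOC line: anchor id plus visible title.
struct TocEntry
{
    string id;
    string name;
}

// Emits a Ddoc list rather than HTML, so the final markup is decided by
// whatever the .ddoc macro file defines for UL, LI and LINK2.
string toDdoc(TocEntry[] toc)
{
    auto o = appender!string();
    o.put("$(UL\n");
    foreach (e; toc)
        o.put("    $(LI $(LINK2 #" ~ e.id ~ ", " ~ e.name ~ "))\n");
    o.put(")\n");
    return o.data;
}

void main()
{
    write(toDdoc([
        TocEntry("AutoDeclaration", "Implicit Type Inference"),
        TocEntry("alias", "Alias Declarations"),
    ]));
}
```

Redefining UL or LI in a .ddoc file would then reshape the list without touching the generator.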

wilzbach · 2017-06-20T13:54:22Z

We so badly need these sections to be easy to link to though! So something needs to be done.

What's the state of this? What was the intended goal? Just a TOC or simply a Ddoc postprocessor?
In any case it seems like this PR is dead?

there's tons of possibilities to improve the website by expanding on this idea.

Yeah I have recently added footer navigation for the spec, which is really hard to do for such a trivial problem...

andralex · 2017-06-30T15:50:37Z

I'll leave this to @CyberShadow. I recall he had concerns about adding additional dependencies.

CyberShadow · 2017-06-30T15:58:09Z

Ouch, dom.d is 6K LOC. I'm not sure. Is an alternative approach an option?

Pre-process the DDoc instead of the HTML
Post-process the HTML with ugly but simple hacks, like regular expressions
A combination of the above (make DDoc generate HTML markup that's useful for post-processing HTML)
Wait until dlang.org is built with DDox (though I don't think anyone is working on this).

andralex · 2017-06-30T16:22:08Z

@CyberShadow the path of least resistance is to generate ddoc with a different .ddoc macros file and then filter out the chaff with sed.

wilzbach · 2017-07-04T20:17:48Z

Ouch, dom.d is 6K LOC. I'm not sure. Is an alternative approach an option?

Well, we could always use the ugly std.xml, e.g.:

void main(string[] args)
{
    import std.file, std.meta, std.stdio, std.typecons, std.xml;
    string s = readText(args[1]);

    alias TocEntry = Tuple!(string, "id", string, "name");

    TocEntry[] toc;
    auto xml = new DocumentParser(s);
    foreach (heading; AliasSeq!("h1", "h2", "h3", "h4", "h5"))
    {
        xml.onStartTag[heading] = (ElementParser parser)
        {
            TocEntry entry;

            parser.onStartTag["a"] = (ElementParser e) {
                if (auto v = "id" in e.tag.attr)
                    entry.id = *v;
            };
            parser.onText = (string s) { entry.name = s; };
            parser.parse();
            toc ~= entry;
        };
    }
    xml.parse();
    foreach (entry; toc)
        writefln(`<a href="#%s">%s</a>`, entry.id, entry.name);
}

Post-process the HTML with ugly but simple hacks, like regular expressions
A combination of the above (make DDoc generate HTML markup that's useful for post-processing HTML)

It's not that hard to include a DUB dependency nowadays, but this brings another series of problems with it, e.g. I am pretty sure we don't want to maintain an XML library.

Wait until dlang.org is built with DDox (though I don't think anyone is working on this).

Yeah, not sure when/if this is going to happen.

I see these okayish solutions here:

use the ugly std.xml to transform the generated HTML
use something like footer_gen to generate the headings in Ddoc and store the result in git (the tool should be idempotent, so that it can be rerun to update all headings)

Improving footer_gen is probably a bit better.
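To make the idempotency requirement concrete, here is one possible shape for it, sketched under stated assumptions rather than taken from footer_gen: the generated block lives between two marker strings in the Ddoc source, and every run replaces whatever sits between them. The function name replaceGenerated and the $(TOC_BEGIN) / $(TOC_END) markers are hypothetical; they would have to be macros that expand to nothing.

```d
import std.stdio : write;
import std.string : indexOf;

// Replaces the text between the two markers with `generated`.
// The markers themselves are kept, so a second run finds them again.
string replaceGenerated(string source, string generated,
    string beginMark = "$(TOC_BEGIN)", string endMark = "$(TOC_END)")
{
    auto b = source.indexOf(beginMark);
    if (b < 0)
        return source; // no marker: leave the file untouched
    size_t start = b + beginMark.length;
    auto e = source[start .. $].indexOf(endMark);
    if (e < 0)
        return source; // unterminated region: leave the file untouched
    size_t end = start + e;
    return source[0 .. start] ~ "\n" ~ generated ~ source[end .. $];
}

void main()
{
    auto src = "intro\n$(TOC_BEGIN)\nstale list\n$(TOC_END)\nbody\n";
    auto once = replaceGenerated(src, "$(UL\n)\n");
    // Running the tool on its own output changes nothing.
    assert(replaceGenerated(once, "$(UL\n)\n") == once);
    write(once);
}
```

Because the stored result is regenerated from the headings each time, the file in git only changes when the headings do.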

adamdruppe · 2017-07-05T01:44:12Z

On Tue, Jul 04, 2017 at 08:17:50PM +0000, Sebastian Wilzbach wrote:

Well, we could always use the ugly std.xml, e.g.:

I don't think std.xml can read the broken html ddoc tends to produce.

It's not that hard to include a DUB dependency nowadays, but this brings another series of problems with it, e.g. I am pretty sure we don't want to maintain an XML library.

dom.d is fairly stable and has built with several versions of dmd (so you could probably just keep the fork unmodified without issue), and besides, I use it heavily so you don't have to maintain it if you'd prefer to just keep up with me. While I don't believe in this approach any more, I'll still work with you on the html lib.

adamdruppe · 2017-07-05T01:49:33Z

On Fri, Jun 30, 2017 at 08:58:10AM -0700, Vladimir Panteleev wrote:

Post-process the HTML with ugly but simple hacks, like regular expressions
A combination of the above (make DDoc generate HTML markup that's useful for post-processing HTML)

You could probably also just make a null-defined macro with a name that is easily searchable to store some meta info and then grep for it.
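One reading of this suggestion, sketched with invented names: put a marker macro such as $(TOC_ENTRY id, title) next to each heading, define it as empty in the .ddoc file so it leaves no trace in the rendered page, and have a small tool search the sources for it. TOC_ENTRY, TocEntry and findMarkers are all hypothetical, and the pattern assumes neither argument contains a closing parenthesis.

```d
import std.regex : matchAll, regex;
import std.stdio : writefln;
import std.string : strip;

// Hypothetical record for one marker found in the Ddoc source.
struct TocEntry
{
    string id;
    string name;
}

// Collects every $(TOC_ENTRY id, title) occurrence, in source order.
TocEntry[] findMarkers(string ddocSource)
{
    auto marker = regex(`\$\(TOC_ENTRY\s+([^,)]+),([^)]*)\)`);
    TocEntry[] found;
    foreach (m; ddocSource.matchAll(marker))
        found ~= TocEntry(m[1].strip, m[2].strip);
    return found;
}

void main()
{
    auto src = "$(H3 Alias Declarations)\n"
        ~ "$(TOC_ENTRY alias, Alias Declarations)\n";
    foreach (e; findMarkers(src))
        writefln("#%s -> %s", e.id, e.name);
}
```

The same search could be done with grep on the command line; the D version is shown only to stay in the language the rest of the tooling uses.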

wilzbach · 2018-01-02T19:12:02Z

Okay, I went with improving the existing footer generation script, and it doesn't seem too hard to parse the ddoc macros directly from their raw sources:

dlang/dlang.org#2043

So in this approach of parsing it directly from Ddoc, we couldn't have used your dom.d.

@adamdruppe I'm sorry that your work got wasted :/

TOC generator for generated html

6e89762

dlang-bot added the stalled label Jan 1, 2018

wilzbach closed this Jan 2, 2018

wilzbach mentioned this pull request Jan 7, 2018

Automatically generate TOC menu for the specification pages dlang/dlang.org#2043

Closed

Geod24 added the PR.NeedsAttention label (a PR that is stalled, not mergeable anymore, or abandoned and needs to be taken over) and removed the stalled label Apr 8, 2020

6 participants