Add page categorization (or concept tagging) feature #436

Open
virtuous-sloth opened this issue Jan 11, 2023 · 8 comments

Comments

@virtuous-sloth

virtuous-sloth commented Jan 11, 2023

This is a non-trivial feature request: implement MediaWiki's Categories functionality, which TrueWiki also implements. This would provide an alternative, more flexible and more dynamic way to index pages, in addition to directory structures with breadcrumbs.

The advantage of categories over a directory structure is that one can start categorizing pages arbitrarily, simply by assigning category names to pages along multiple categorization dimensions, and find structure later. Category pages themselves can be categorized, resulting in a hierarchical category structure with subcategories, if desired. Also, pages can belong to multiple categories and therefore to multiple category hierarchies (like dimensions in a categorization space), which is not possible with a directory structure (which would still be available for one particular categorization).

I assume YAML metadata could be used for the actual categorization of a page, with the category names (e.g. Foo, Bar, and 'Some Category' in the sample YAML metadata below) living in an isolated namespace with as few restrictions as possible (strings, basically) and no relation to any other naming, including directories.

However, category names should be restricted to valid page names so that the categories themselves can optionally be instantiated as pages, with the custom page content rendered above the calculated index of categorized pages. The custom content would also let the category page carry its own categorization metadata in its YAML, essentially allowing the creation of category hierarchies, i.e. sub-categorization.

MediaWiki isolates namespace storage using a namespace field in its database. In terms of URL paths, normal pages have an implicit Page: namespace which is not part of the path, but uploaded files (File:), templates (Template:) and categories (Category:), among others, use a Namespace: prefix on the page name to access the specialized content.

TrueWiki, which is similar to Gollum in using git as a backing store, solves the storage problem by including the namespace in the directory path. The top two directory levels of the git repository are for language (e.g. /en/, /fr/) and namespace (/en/Page/, /fr/File/, etc.).

Gollum currently does not care about language and has a default implicit page namespace (i.e. pages are any file with a markup path extension from the root on down). But it does effectively map the File: namespace to /uploads/.

In this spirit, category pages, if customized, would be instantiated in /categories/, a sibling directory to /uploads/. If the user marks a page's YAML with a category named Foo, that category would get a red (not-yet-created) link to the absolute path /categories/Foo. That path would render a default category page, with a page title of 'Category: Foo' and a dynamically rendered list of all the pages in that category. The user could also create a custom portion of the page, which would be stored in a /categories/Foo.md file, for example. The custom content would render between the 'Category: Foo' title and the dynamically rendered index of all pages in that category.
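The rendering order described above (title, optional custom content, then the generated listing) can be sketched as follows. This is only an illustration: `render_category_page` and its parameters are hypothetical names, not part of Gollum, and the listing uses Gollum's double-bracket link syntax purely as an example output format.

```ruby
# Hypothetical helper, not a Gollum API: assemble the Markdown for a category
# page from its name, its member pages, and optional custom content.
def render_category_page(name, members, custom = nil)
  parts = ["# Category: #{name}"]

  # Custom content (e.g. the body of /categories/Foo.md) sits between the
  # title and the generated index, and is skipped when absent or blank.
  parts << custom.strip if custom && !custom.strip.empty?

  listing = members.sort.map { |page| "- [[#{page}]]" }
  parts << (listing.empty? ? "_No pages in this category yet._" : listing.join("\n"))

  parts.join("\n\n") + "\n"
end

default_page = render_category_page("Foo", ["Home", "Bar Page"])
custom_page  = render_category_page("Foo", ["Home"], "Everything about Foo.\n")

puts default_page
puts custom_page
```

A category with no backing file simply passes `nil` for the custom content, which matches the "default category page" case.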

This would also include a mandatory hard-coded category called 'Categories', which would render at the absolute URL path '/categories/Categories' and allow customization in a file at /categories/Categories.md. It is hard-coded in the sense that every category page rendered at a path like '/categories/Some Category' would implicitly have the metadata tag 'category: Categories', regardless of whether a file '/categories/Some Category.md' exists, with or without the YAML frontmatter 'category: Categories' or 'categories: [Categories, Foo, Bar]'.

Here are some sample options for the YAML metadata:

categories:
  - Foo
  - Bar
  - 'Some Category'

or

categories: [Foo, Bar, 'Some Category']

or

category: Foo
category: Bar
category: 'Some Category'
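
A minimal sketch of how the three forms above behave when read with Ruby's standard YAML library. The helper name `categories_from` and the front-matter regular expression are assumptions made for illustration and are not Gollum's own parsing code. Note that YAML requires mapping keys to be unique, so the repeated-key form is not expected to yield all three names.

```ruby
require "yaml"

# Hypothetical helper: return the category names declared in a page's YAML
# front matter, accepting a `categories` list and/or a single `category`.
def categories_from(markdown)
  match = markdown.match(/\A---\s*\n(.*?)\n---\s*$/m)
  return [] unless match

  meta = YAML.safe_load(match[1]) || {}
  return [] unless meta.is_a?(Hash)

  names = Array(meta["categories"]) + Array(meta["category"])
  names.map { |name| name.to_s.strip }.reject(&:empty?).uniq
end

block_form = "---\ncategories:\n  - Foo\n  - Bar\n  - 'Some Category'\n---\n# Page\n"
flow_form  = "---\ncategories: [Foo, Bar, 'Some Category']\n---\n# Page\n"
repeated   = "---\ncategory: Foo\ncategory: Bar\ncategory: 'Some Category'\n---\n# Page\n"

p categories_from(block_form)
p categories_from(flow_form)
# Repeated keys collapse into a single mapping entry, so at most one
# category survives here.
p categories_from(repeated)
```

The first two forms are interchangeable; accepting a scalar `category` key as well costs one extra `Array(...)` call.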

Page rendering would include category membership much like breadcrumbs currently do, but at the bottom of the page instead of at the top, and as a list instead of as a breadcrumb hierarchy. That is what MediaWiki does. However, MediaWiki goes one step further: instead of rendering subcategories as simple links to the subcategory page, it renders them with a dynamic button that reveals the pages in the subcategory in place on the parent category page, much like a book index might show sub-categorized pages.

Since this would provide an indexing function, the category index would have to be generated on startup and stored in memory, which might cause performance problems, or persisted to a file and cached in memory, which adds complexity (where to store it, and how to manage corruption and synchronization between the index file and the source of truth for categorization, the YAML metadata).
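
The in-memory variant can be sketched as a simple inverted index built in one pass. This assumes the per-page category lists have already been read from the YAML metadata. `build_category_index` and `ROOT_CATEGORY` are hypothetical names, and leaving 'Categories' out of its own listing is an assumption the proposal does not spell out.

```ruby
ROOT_CATEGORY = "Categories"

# Hypothetical helper: turn { page path => [category names] } into
# { category name => [sorted page paths] }.
def build_category_index(pages)
  index = Hash.new { |hash, key| hash[key] = [] }

  pages.each do |path, names|
    names.each { |name| index[name] << path }
  end

  # Every category seen so far is implicitly a member of the root category,
  # whether or not a /categories/<name>.md file exists.
  (index.keys - [ROOT_CATEGORY]).each do |name|
    index[ROOT_CATEGORY] << "categories/#{name}"
  end

  index.each_value { |paths| paths.sort!.uniq! }
  index.default_proc = nil # later lookups must not create empty entries
  index
end

pages = {
  "Home.md"           => ["Foo"],
  "guides/Install.md" => ["Foo", "Some Category"],
  "categories/Foo.md" => ["Bar"] # a category page that is itself categorized
}
index = build_category_index(pages)
index.each { |name, paths| puts "#{name}: #{paths.join(', ')}" }
```

Rebuilding this hash from scratch on startup avoids the synchronization problem entirely, at the cost of reading every page once.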

@virtuous-sloth
Author

I think my functional specification above may not be the clearest, so I assume improving it would be the first thing (I'd have) to do. It reads better with reference to the MediaWiki and TrueWiki specifications, so if we could avoid duplication and continue to use them by reference, that would be helpful, I hope.

Although this feature request would be non-trivial, I do think it has the nice property of being fairly easy to isolate. Gollum already supports the YAML metadata with no restrictions, to my understanding. It already has the breadcrumbs as an example of boilerplate content rendered on every page.

@guillaume-d

Hi!

What you initially said makes perfect sense to me and I am pretty much on the same page as you! 😄

I implemented a very early prototype a few months ago, more or less along these lines, and finally took the time to minimally clean it up and publish it.

Some notes/caveats:

I use the term "tag" and not category as it seems to be more standard, see below.

This is and will stay Markdown-only unless someone else helps and maintains that part.

For tags I use YAML metadata.
Tag names must also be valid wiki page names (because each tag can also have a normal page), but are not otherwise limited.

There is no new concept of (tag) namespace, although the lists of tagged pages are stored for performance as *..json (sic!) files alongside the *.md files, but that's more to avoid file name collisions than for conceptual namespacing really.

The list of tagged pages is never displayed automatically anywhere for now; a Gollum custom macro must be invoked everywhere you want it displayed.
Accordingly, there are no automatically generated tag pages, although I agree they would be nice to have.

YAML metadata and thus tag names are still rendered as ugly and useless plain text (unchanged from Gollum's default), although that would be my next TODO.
Hence of course some server-side styling (or Javascript code?) to color not yet created tags is missing.
Fancy stuff like breadcrumbs and foldable subtag lists would be nice (given much more time).

Not thought about a tags tag yet, my implementation seems not to need it, but it might help as a marker to collect all intended tags (those that are not spelling mistakes of real ones that is).

I decided to use tags in YAML and not categories, as that already seems to have become quite standardized; see some research at https://gitlab.com/iguillaumed/playground/tagging/-/blob/main/tags.md
Is your 3rd YAML tag syntax proposal with repeated key name really valid YAML? That would be nice and my preferred syntax then!

As regards performance and other needed work, lots of things are still missing.
The most glaring omission is that tags deleted from pages are not deleted from the index pages yet.
And at least there should be a full reindexing command, if only as a 1st inefficient synchronization implementation.
Patches to the Gollum code itself may not be needed for what I want to achieve, also I still have to test how the current search implementation reacts to lots of YAML frontmatter...
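
A minimal sketch of such a full reindexing pass, which also takes care of tags deleted from pages because every index file is rebuilt from scratch. It is not the PoC's code. The `reindex` name, the `tags` metadata key, the flat directory layout, and the `<Tag>..json` file naming are all assumptions made for illustration.

```ruby
require "json"
require "tmpdir"
require "yaml"

# Hypothetical full reindex: rebuild every tag list from the pages on disk and
# drop index files for tags that no page uses any more.
def reindex(root)
  index = Hash.new { |hash, key| hash[key] = [] }

  Dir.glob(File.join(root, "*.md")).sort.each do |path|
    front = File.read(path)[/\A---\s*\n(.*?)\n---\s*$/m, 1]
    meta = front ? YAML.safe_load(front) : nil
    tags = meta.is_a?(Hash) ? Array(meta["tags"]) : []
    tags.each { |tag| index[tag.to_s] << File.basename(path, ".md") }
  end

  # Delete all old index files first, so tags removed from pages disappear.
  Dir.glob(File.join(root, "*..json")).each { |old| File.delete(old) }
  index.each do |tag, pages|
    File.write(File.join(root, "#{tag}..json"), JSON.generate(pages))
  end

  index.keys.sort
end

files_before, tags_after, files_after = Dir.mktmpdir do |root|
  page = File.join(root, "Home.md")

  File.write(page, "---\ntags: [Foo, Bar]\n---\n# Home\n")
  reindex(root)
  before = Dir.children(root).sort

  File.write(page, "---\ntags: [Foo]\n---\n# Home\n") # the Bar tag is removed
  tags = reindex(root)

  [before, tags, Dir.children(root).sort]
end

p files_before
p tags_after
p files_after
```

This is the inefficient first implementation: it rereads every page on each run, but it cannot drift out of sync with the YAML metadata.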

So if you are interested, you can try and understand what I did there (it is not much code yet): https://gitlab.com/iguillaumed/playground/tagging-example/

@guillaume-d

See also the earlier (2015) Gollum issue which among other things outlines a fully manual approach: gollum/gollum#1058 (comment).
It might work for you if nothing else does, as it offers the most flexibility.

@guillaume-d

> Hence of course some server-side styling (or Javascript code?) to color not yet created tags is missing.

Actually, rendering broken internal links in general (or "absent" pages as gollum-lib names it) is only supported when using the Gollum-specific double-bracketed link syntax (for example [[NonExistingPage]] ), see #177 (comment)! :-/
It seems to be implemented in https://github.com/gollum/gollum-lib/blob/master/lib/gollum-lib/filter/tags.rb.

As I only plan to support (standard) Markdown syntax, and as in my design tags also double as normal pages, only Javascript code seems possible for me. On the brighter side, it means that again no modification to Gollum will be needed, but just a custom.js instead.

@guillaume-d

> YAML metadata and thus tag names are still rendered as ugly and useless plain text (unchanged from Gollum's default), although that would be my next TODO.

This is now implemented in the latest version of my PoC, see https://gitlab.com/iguillaumed/playground/tagging/-/blob/v0.0.3/CHANGELOG.md#003-2023-02-19: all present pages' tags are now hyperlinked to the associated tag pages.
This was done using customized views, modifying the small part of wiki_content.mustache related to metadata.
The styling is very crude for now but could be made nicer using a custom.css, some additional CSS classes may be required though.

@guillaume-d

> [...] all present pages' tags are now hyperlinked to the associated tag pages. This was done using customized views, modifying the small part of wiki_content.mustache related to metadata. [...]

I just implemented a more flexible (*) Gollum custom macro for that in addition to the above in the latest version of my PoC, see https://gitlab.com/iguillaumed/playground/tagging/-/blob/v0.0.4/CHANGELOG.md#004-2023-03-15.
The old mechanism still works for now but is hidden behind Gollum's display_metadata flag in case someone still finds it useful.

(*) Now the hyperlinked tag list can be positioned anywhere in the header and/or footer and/or sidebar and its display logic is unconstrained contrary to the powerless Mustache templates...

From there my next TODOs are probably: (re)indexing fixes/MVP-features, search tuning and cosmetic nice-to-have JS ergonomics (in that order).

@virtuous-sloth
Author

Thanks for all your work, Guillaume! I will hopefully try to test this out in the next couple weeks.

@guillaume-d

> Fancy stuff like breadcrumbs and foldable subtag lists would be nice (given much more time).

For those interested I added minimal support for subtags (JS-less non-foldable tag tree) in the latest version of my PoC and fixed a few things, see https://gitlab.com/iguillaumed/playground/tagging/-/blob/v0.0.5/CHANGELOG.md#005-2023-04-10.
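
To illustrate what a JS-less, non-foldable tag tree could look like, here is a small sketch that renders subtags as a nested Markdown list. It is not taken from the PoC. `render_tag_tree` and the `parents` mapping (each tag page's own tags) are hypothetical, and links assume tag pages live at the root under the tag's name.

```ruby
# parents maps each tag to the tags its own page carries, so a tag is a
# subtag of every name listed for it.
def render_tag_tree(parents, root, depth = 0, seen = [])
  return [] if seen.include?(root) # guard against tag cycles

  lines = ["#{'  ' * depth}- [#{root}](#{root})"]
  children = parents.select { |_tag, tags| tags.include?(root) }.keys.sort
  children.each do |child|
    lines.concat(render_tag_tree(parents, child, depth + 1, seen + [root]))
  end
  lines
end

parents = {
  "Languages" => [],
  "Python"    => ["Languages"],
  "Ruby"      => ["Languages"],
  "Rails"     => ["Ruby"]
}

tree = render_tag_tree(parents, "Languages").join("\n")
puts tree
```

Starting the walk from a top-level marker tag would list every hierarchy in one place, while starting from a narrower tag shows only that subtree.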

Please note another limitation I had not documented yet here or elsewhere: for the moment subdirectories are not supported and do not work correctly: I want my users and myself to use tags for structuring.
I might (need to) fix that myself at some point but patches are welcome!
Also I am not sure I want tags to lie elsewhere than at the root anyway.

> Not thought about a tags tag yet, my implementation seems not to need it, but it might help as a marker to collect all intended tags (those that are not spelling mistakes of real ones that is).

Doing the above, I realized such a tag also makes it possible to collect all hierarchies defined by a tag tree, although one could even define specialized tags so as to show only some of them.

> From there my next TODOs are probably: (re)indexing fixes/MVP-features, search tuning and cosmetic nice-to-have JS ergonomics (in that order).

I now have subtags in a way which satisfies my current needs and I can live without other bells and whistles, so once (re)indexing and search are done as explained above I will probably declare victory 😀 and move on to other things...
In the meantime feedback is welcome though.
