Commit ecf3097

Merge pull request #271 from Raku/solution-250
Standardize documentation search categories (solution for #250)
2 parents 9ab1462 + ddfdfb0 commit ecf3097

1 file changed

+280
-0
lines changed

@@ -0,0 +1,280 @@
# Search categories

The Pod6 markup language provides a mechanism to "index" text via the `X` formatting code.
The code consists of an optional text part, which is rendered into the document when given,
and one or more mandatory index entries separated with semicolons. Each entry may be
multi-level, with its levels separated by commas:

```
# Common
X<text|entry>
# Variations
X<|entry-level-1, entry-level-2>
X<|entry1;entry-level1, entry-level2;entry2>
```

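To make the structure of the code concrete, here is a minimal Python sketch that splits a single `X<...>` code into its text part and its index entries. It is not how Documentable parses Pod6; the names `IndexCode` and `parse_index_code` are hypothetical, and the sketch assumes single angle brackets, a mandatory `|` separator, and that whitespace around levels is insignificant.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class IndexCode:
    text: Optional[str]       # visible text, None when omitted
    entries: List[List[str]]  # each entry is a list of its levels


def parse_index_code(code: str) -> IndexCode:
    """Split one `X<text|entry;entry>` code into text and entries."""
    if not (code.startswith("X<") and code.endswith(">")):
        raise ValueError(f"not an X<> code: {code!r}")
    body = code[2:-1]
    text, sep, raw_entries = body.partition("|")
    if not sep:
        # Assumption of this sketch: the entry part is always present.
        raise ValueError(f"missing '|' separator: {code!r}")
    entries = [
        [level.strip() for level in entry.split(",")]  # levels: comma
        for entry in raw_entries.split(";")            # entries: semicolon
    ]
    return IndexCode(text=text or None, entries=entries)
```

For the last variation above, `parse_index_code("X<|entry1;entry-level1, entry-level2;entry2>")` yields no text and three entries, the second of which has two levels.
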
The [documentation for Raku](https://docs.raku.org) website supports a search feature:
the user can look up a term by text and navigate to the documentation pieces that
describe it.

On the website, the search results (a set of search entries) are presented
as a list of items divided into *categories*.

Both building the set of search entries for the website and assigning
categories to those entries are done by the Documentable module, which
extracts the search anchors from index entries, headings and pages.

When the Documentable module was written, a lot of legacy code
was simply migrated into it. As a result, the existing indexing
system remains unpredictable and bizarre to the documentation writer.

Currently, Documentable has six different code paths that set the
category for different cases of index entries, headings and pages,
and each has its own arbitrary way of deciding the category.

Moreover, the documentation emerged with no guidelines for indexing
entries at all, so contributors had to come up with index formatting
in an ad hoc manner, sometimes confusing its syntax in different ways.
As a result, both the state of the indexed entries and the usability of the search suffer.

This document describes the solution chosen for the Raku language
documentation sources. It addresses their specific range of issues in this
particular field and may or may not apply to other uses of Pod6.

# Currently existing ways to categorize search entries

The Raku documentation is currently organized into six kinds of pages.
These include primary pages:

* Explaining various aspects of the language itself, so assigned kind `Language`
* Describing existing types included in the language, so assigned kind `Type`
* Describing various topics around working with the language, so assigned kind `Program`

As well as secondary pages, generated from the primary pages:

* Explaining routines (subroutines, methods, submethods) provided, so assigned kind `Routine`
* Explaining syntax, so assigned kind `Syntax`
* Describing various concepts the documentation refers to, so assigned kind `Reference`

The following ways to assign a category to a search item currently exist:

* For pages of kind `Type`, `Language` and `Program`, the category depends on the page subkind
* For secondary, generated documentation pages (kind `Routine`, `Syntax`), try
to guess a category based on a subkind, or simply take the first available page subkind
as the category
* For entries of kind `Reference`, simply take the kind as the category
* For a Pod6 heading with an `X` formatting code inside, use the first
entry part as the category (so for `=head1 X<text|entry1,entry2>` an item
of category `entry1` and hardcoded kind `Syntax` will be created)
* `X` formatting codes are also indexed a second time, where a
dubious code path uses a list of subkinds instead of calculating the category
(the list is later reduced to just its first item,
which is most likely to become `'reference'`)
* Headings are parsed for items like `variable`, `token`, `sub`, `method` etc.,
and for matching headings search entries are created automatically, with
categories simply hardcoded in the grammar's actions.

Some of the code paths above simply hardcode an arbitrary categorization,
while others are driven by the sources (the content of the indexing formatting code itself).
This means that if, for example, a contributor makes a mistake and writes:

```
X<text|entry term,category>
```

Then the term `category` will be indexed under the category `entry term`,
even if a search category named `entry term` makes no sense to the reader.

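The effect of that mistake can be modelled in a few lines of Python. This is only an illustration of the "first entry part becomes the category" behaviour described above, not Documentable's actual code; the function name `legacy_category_and_term` is hypothetical.

```python
from typing import Tuple


def legacy_category_and_term(entry: str) -> Tuple[str, str]:
    """Model of the source-driven path: whatever comes first is the category."""
    first, _, rest = entry.partition(",")
    return first.strip(), rest.strip()


# The swapped entry from the example above:
category, term = legacy_category_and_term("entry term,category")
print(f"category={category!r} term={term!r}")
```

Nothing in this path checks the first part against a known list, so any typo or swapped order silently creates a new search category.
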
# Solution

To address the issues described above, I propose the following steps:

1. Introduce strict guidelines for indexing.

To specify an index entry, the `X<>` markup code is restricted to one of the
following forms:

```
X<|$category,term>
X<text|$category,term>
X<text|$category-1,term-1;...;$category-N,term-N>
```

where the `$category-1`, ... `$category-N` variables *explicitly* set the category
of the corresponding indexed term and must refer to one of the supported
categories described in this document (including possible future extensions); see the next step.

Sticking to the syntax described here implies two important things:

* No implicit category setting (setting a category manually becomes mandatory)
* Sub-categories (e.g. `X<|Cat,Sub1,Sub2,Term>`) are forbidden

Please note that in the forms above, names such as `$category` and `$category-1` are
placeholders for an actual category name.

Valid examples are:

```
X<|Syntax,does>
X<|Language,\ (container binding)>
X<Subroutines|Syntax,sub>
X<|Variables,$*PID>
X<Automatic signatures|Variables,@_;Variables,%_>
X<Typing|Language,typed array;Syntax,[ ] (typed array)>
X<Attributes|Language,Attribute;Foreign,Property;Foreign,Member;Foreign,Slot>
```

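The two implications of the strict form (a mandatory category, no sub-categories) amount to "every entry has exactly two levels". A minimal Python sketch of such a structural check follows. The function name `parse_strict_entries` is hypothetical, the input is assumed to be the content between `X<` and `>`, and stripping whitespace around levels is an assumption of the sketch rather than something this document specifies.

```python
from typing import List, Tuple


def parse_strict_entries(body: str) -> List[Tuple[str, str]]:
    """Return (category, term) pairs, or raise if the strict form is violated."""
    _text, sep, raw_entries = body.partition("|")
    if not sep:
        raise ValueError(f"missing '|' separator: {body!r}")
    pairs = []
    for entry in raw_entries.split(";"):
        levels = [level.strip() for level in entry.split(",")]
        # Exactly two non-empty levels: one category and one term.
        if len(levels) != 2 or not all(levels):
            raise ValueError(f"expected 'Category,term', got {entry!r}")
        pairs.append((levels[0], levels[1]))
    return pairs
```

With this check, `Automatic signatures|Variables,@_;Variables,%_` parses into two `Variables` entries, while `|Cat,Sub1,Sub2,Term` (sub-categories) and `|does` (implicit category) are rejected. A term that itself contains a comma or semicolon cannot be expressed in this sketch.
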
2. Establish a list of supported search categories; see Appendix A for the suggested one.

3. Write a test for the documentation sources that gathers all present index entries
and checks them against the list of approved categories. The test passes when
all index entries adhere to the approved categories.

4. Fix the indexing bits of the `Documentable` module to make them clearer
and compliant with the scheme suggested in this document.

5. Gradually adapt the current documentation index entries to pass both tests.

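The test from step 3 could look roughly like the following Python sketch, which scans source text for `X<...>` codes and reports every entry whose category is not in the approved list from Appendix A. It is an illustration, not the test in the documentation repository: the names `INDEX_CODE` and `find_unapproved` are hypothetical, and the regular expression assumes single angle brackets with no nested `<` or `>` inside the code.

```python
import re
from typing import FrozenSet, List, Tuple

# The standard list from Appendix A.
APPROVED_CATEGORIES: FrozenSet[str] = frozenset({
    "Types", "Modules", "Routines", "Subroutines", "Methods",
    "Terms", "Adverbs", "Traits", "Phasers", "Asynchronous Phasers",
    "Pragmas", "Variables", "Control flow", "Regexes",
    "Operators", "Listop operators", "Infix operators", "Metaoperators",
    "Postfix operators", "Prefix operators", "Circumfix operators",
    "Postcircumfix operators",
    "Tutorial", "Foreign", "Syntax", "Reference", "Language", "Programs",
})

# Assumption: X<text|entries> with no nested angle brackets.
INDEX_CODE = re.compile(r"X<([^<>|]*)\|([^<>]*)>")


def find_unapproved(
    source: str, approved: FrozenSet[str] = APPROVED_CATEGORIES
) -> List[Tuple[str, str]]:
    """Return (code, category) for every entry with an unknown category."""
    problems = []
    for match in INDEX_CODE.finditer(source):
        for entry in match.group(2).split(";"):
            category = entry.split(",", 1)[0].strip()
            if category not in approved:
                problems.append((match.group(0), category))
    return problems
```

A test built on this would pass when `find_unapproved` returns an empty list for every documentation source file.
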
# Outcomes

* An organized list of categories makes the search results easier for the user to understand
* A stable set of search categories makes it easier for contributors to index items under the correct
categories
* Possible separate documentation search frontends get a much cleaner set of categories to search by

# Appendix A: Search Categories

This list is made with a few styling rules in mind:

* Titlecase over lowercase (`Subroutines` over `routines`)
* Plural over singular (`Subroutines` over `routine`)
* Categorize both the Raku standard library API (`Types`, `Subroutines`) and
language-related topics and terms (`Language`, `Syntax`), with related categories grouped together

The existing ad hoc list of search categories, as of 05.01.2021, is:

- [ ] `class`
- [ ] `role`
- [ ] `enum`
- [ ] `module`
- [ ] `Language`
- [ ] `programs`
- [ ] `sub`
- [ ] `prefix`
- [ ] `listop`
- [ ] `infix`
- [ ] `Routine`
- [ ] `postfix`
- [ ] `postcircumfix`
- [ ] `method`
- [ ] `submethod`
- [ ] `routine`
- [ ] `trait`
- [ ] `term`
- [ ] `twigil`
- [ ] `variable`
- [ ] `syntax`
- [ ] `regex`
- [ ] `regex quantifier`
- [ ] `parameter`
- [ ] `quote`
- [ ] `matching adverb`
- [ ] `regex adverb`
- [ ] `substitution adverb`
- [ ] `hyper`
- [ ] `Phasers`
- [ ] `Asynchronous Phasers`
- [ ] `Python`
- [ ] `constant`
- [ ] `hash (Basics)`
- [ ] `rakudoc`
- [ ] `control flow`
- [ ] `:sym<>`
- [ ] `scalar (Basics)`
- [ ] `statement (Basics)`
- [ ] `string literal (Basics)`
- [ ] `TOP`
- [ ] `topic variable (Basics)`
- [ ] `variable interpolation (Basics)`
- [ ] `declarator`
- [ ] `:cached`
- [ ] `eager (statement prefix)`
- [ ] `gather (statement prefix)`
- [ ] `identifier`
- [ ] `classes`
- [ ] `lazy (statement prefix)`
- [ ] `macros`
- [ ] `pack`
- [ ] `react (statement prefix)`
- [ ] `sink (statement prefix)`
- [ ] `supply (statement prefix)`
- [ ] `<sym>`
- [ ] `->`
- [ ] `is default (Variable)`
- [ ] `try (statement prefix)`
- [ ] `with orwith without`
- [ ] `Reference`

Most of them are absorbed into the new standard list during step 3 of the solution.

The standard list is:

- [ ] `Types`
- [ ] `Modules`
- [ ] `Routines` (a common category for something that exists as both a method and a subroutine)
- [ ] `Subroutines`
- [ ] `Methods`

- [ ] `Terms`
- [ ] `Adverbs`
- [ ] `Traits`
- [ ] `Phasers`
- [ ] `Asynchronous Phasers`
- [ ] `Pragmas`
- [ ] `Variables`

- [ ] `Control flow` (everything related to control flow)
- [ ] `Regexes` (everything related to regexes)

- [ ] `Operators` (a common category for something that exists as an operator with different applications)
- [ ] `Listop operators`
- [ ] `Infix operators`
- [ ] `Metaoperators`
- [ ] `Postfix operators`
- [ ] `Prefix operators`
- [ ] `Circumfix operators`
- [ ] `Postcircumfix operators`

- [ ] `Tutorial`
- [ ] `Foreign` (for terms from other languages and migration guides)
- [ ] `Syntax` (legacy, various bits of language syntax explained at meta-level)
- [ ] `Reference` (legacy, default category for general reference)
- [ ] `Language` (legacy, language-related topics)
- [ ] `Programs` (legacy, program writing-related topics)

# Appendix B: Updates to the list of supported search term categories

The suggested list of categories has proven able to cover all
index entries in the language documentation at the
time of writing. Should issues arise, the list can be updated
by starting a discussion with other maintainers and providing the
reasoning behind the change, for one of the following reasons:

* The current list does not include an important category which
absolutely does not fit into any of the existing ones
* The current list contains a category hindering the understanding of
the language in any way
* The current list creates name clashes.
For example, if the `Asynchronous Phasers` category did not exist,
both `QUIT` (asynchronous phaser) and `QUIT` ("normal" phaser),
which mean two different things, would fall into the same category `Phasers`
and cause confusion

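The name-clash case can be detected mechanically. The Python sketch below groups entries by `(category, term)` and reports any pair that points at more than one target. The function name `find_clashes` and the target labels are hypothetical placeholders, not real page anchors.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def find_clashes(
    entries: Iterable[Tuple[str, str, str]]
) -> Dict[Tuple[str, str], Set[str]]:
    """entries are (category, term, target); a clash is one key, many targets."""
    seen: Dict[Tuple[str, str], Set[str]] = defaultdict(set)
    for category, term, target in entries:
        seen[(category, term)].add(target)
    return {key: targets for key, targets in seen.items() if len(targets) > 1}


# Without a separate category, both QUIT phasers share one key:
merged = [
    ("Phasers", "QUIT", "asynchronous phaser docs"),
    ("Phasers", "QUIT", "normal phaser docs"),
]
# With `Asynchronous Phasers`, the keys differ and nothing clashes:
split = [
    ("Asynchronous Phasers", "QUIT", "asynchronous phaser docs"),
    ("Phasers", "QUIT", "normal phaser docs"),
]
```

Here `find_clashes(merged)` reports the `("Phasers", "QUIT")` key with both targets, while `find_clashes(split)` is empty.
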
But not for one of the following reasons:

* A matter of style or preferences (e.g. `Subroutine` vs `Subroutines`
or `routine`)
* Overspecializing categories (e.g. splitting `Infix operators` into
`Set infix operators`, `Compare infix operators` etc.)
