|
| 1 | +# Search categories |
| 2 | + |
| 3 | +The Pod6 markup language provides a mechanism to "index" text via the `X` formatting code. |
| 4 | +The code consists of an optional text part, which is rendered into the document if given, |
| 5 | +and one or more mandatory index entries separated with semicolons. Each entry may be |
| 6 | +multi-level, with levels separated with commas: |
| 7 | + |
| 8 | +``` |
| 9 | +# Common |
| 10 | +X<text|entry> |
| 11 | +# Variations |
| 12 | +X<|entry-level-1, entry-level-2> |
| 13 | +X<|entry1;entry-level1, entry-level2;entry2> |
| 14 | +``` |
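The splitting rules above can be sketched in a few lines of Python. This is only an illustration of the syntax as described here, not the parser Documentable uses; the function name `parse_index_code` is hypothetical, and rejecting codes without entries is an assumption of this sketch.

```python
from typing import List, Tuple


def parse_index_code(code: str) -> Tuple[str, List[List[str]]]:
    """Split an X<text|entries> code into its text part and index entries."""
    if not (code.startswith("X<") and code.endswith(">")):
        raise ValueError(f"not an X<> formatting code: {code!r}")
    body = code[2:-1]
    # Everything before the first '|' is the optional text part.
    text, pipe, raw_entries = body.partition("|")
    if not pipe or not raw_entries.strip():
        # Assumption of this sketch: codes without index entries are rejected.
        raise ValueError(f"no index entries in {code!r}")
    # Entries are separated by ';', levels within an entry by ','.
    entries = [
        [level.strip() for level in entry.split(",")]
        for entry in raw_entries.split(";")
    ]
    return text, entries


print(parse_index_code("X<text|entry>"))
print(parse_index_code("X<|entry1;entry-level1, entry-level2;entry2>"))
```

The sketch does not handle terms that themselves contain `,`, `;`, `|` or `>`.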
| 15 | + |
| 16 | +The [documentation for Raku](https://docs.raku.org) website supports a search feature: |
| 17 | +the user can look up terms by text to navigate to relevant documentation pieces describing |
| 18 | +the term. |
| 19 | + |
| 20 | +On the website the search results (a set of search entries) are presented |
| 21 | +as a list of items divided into *categories*. |
| 22 | + |
| 23 | +Both building the set of search entries for the website and assigning |
| 24 | +categories to those entries are done by the Documentable module, which |
| 25 | +extracts the search anchors from the index entries, headings and pages. |
| 26 | + |
| 27 | +When the Documentable module was written, a lot of legacy code |
| 28 | +was simply migrated into it, and as a result the existing |
| 29 | +indexing system remains unpredictable and bizarre |
| 30 | +to the documentation writer. |
| 31 | + |
| 32 | +Currently, Documentable has six different code paths where the |
| 33 | +category is set for different cases of index entries, headings and pages, |
| 34 | +and each has its own arbitrary way of deciding the category. |
| 35 | + |
| 36 | +Moreover, because the documentation grew with no guidelines |
| 37 | +for indexing entries at all, contributors had to come up with index formatting |
| 38 | +in an ad-hoc manner, sometimes confusing its syntax in different ways. |
| 39 | +As a result, both the state of the indexed entries and the usability of the search suffer. |
| 40 | + |
| 41 | +This document describes a solution chosen for the Raku language |
| 42 | +documentation sources to solve their specific range of issues in this |
| 43 | +particular field; it may or may not apply to other usages of Pod6. |
| 44 | + |
| 45 | +# Currently existing ways to categorize search entries |
| 46 | + |
| 47 | +The Raku documentation is currently organized into six types of pages. |
| 48 | +These include primary pages: |
| 49 | + |
| 50 | +* Explaining various aspects of the language itself, so assigned kind `Language` |
| 51 | +* Describing existing types included in the language, so assigned kind `Type` |
| 52 | +* Describing various topics around working with the language, so assigned kind `Program` |
| 53 | + |
| 54 | +As well as secondary pages, generated based on the primary pages: |
| 55 | + |
| 56 | +* Explaining routines (subroutines, methods, submethods) provided, so assigned kind `Routine` |
| 57 | +* Explaining syntax, so assigned kind `Syntax` |
| 58 | +* Describing various concepts the documentation refers to, so assigned kind `Reference` |
| 59 | + |
| 60 | +To assign a category to a search item, these ways currently exist: |
| 61 | + |
| 62 | +* For pages of kind `Type`, `Language` and `Program`, the category depends on the page subkind |
| 63 | +* For secondary, generated documentation pages (kind `Routine`, `Syntax`) try |
| 64 | + to guess a category based on a subkind or simply take the first page subkind available |
| 65 | + as one |
| 66 | +* For entries of kind `Reference` simply take the kind as a category |
| 67 | +* For a Pod6 heading with an `X` formatting code inside, use the first |
| 68 | + entry part as the category (so for `=head1 X<text|entry1,entry2>` an item |
| 69 | + of category `entry1` will be created with the hardcoded kind `Syntax`) |
| 70 | +* `X` formatting codes are also indexed a second time, where a |
| 71 | + dubious code path uses a list of subkinds instead of calculating |
| 72 | + the category (the list is reduced to just its first item down the line, |
| 73 | + which is most likely to become `'reference'`) |
| 74 | +* Headings are parsed for items like `variable`, `token`, `sub`, `method` etc. |
| 75 | + and for matching headers search entries are created automatically with |
| 76 | + categories being simply hardcoded in the grammar's actions. |
| 77 | + |
| 78 | +Some of the code paths above simply hardcode some arbitrary categorization |
| 79 | +while others are driven by the sources (the content of indexing formatting code itself), |
| 80 | +which means that if e.g. a contributor makes a mistake and writes: |
| 81 | + |
| 82 | +``` |
| 83 | +X<text|entry term,category> |
| 84 | +``` |
| 85 | + |
| 86 | +Then the `category` term will be indexed under the `entry term` category, |
| 87 | +even if the search category `entry term` does not make any sense to the consumer. |
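To make the failure mode concrete, here is a minimal Python sketch of the heading rule as described in the list above (first entry part becomes the category, kind hardcoded to `Syntax`). It only mimics the described behaviour; the function name `legacy_heading_item` and the dictionary shape are hypothetical and are not taken from Documentable.

```python
def legacy_heading_item(entry: str) -> dict:
    """Mimic the described heading rule: the first level becomes the category."""
    levels = [level.strip() for level in entry.split(",")]
    return {
        "category": levels[0],  # whatever the contributor happened to write first
        "term": levels[-1],
        "kind": "Syntax",       # hardcoded, as described for headings
    }


# The mistaken entry from the example above.
print(legacy_heading_item("entry term,category"))
```

Nothing in this rule checks that the first part is a meaningful category, which is why the mistake goes unnoticed.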
| 88 | + |
| 89 | +# Solution |
| 90 | + |
| 91 | +To address the issues described above, I propose to follow these steps: |
| 92 | + |
| 93 | +1. Introduce strict guidelines for indexing. |
| 94 | + |
| 95 | +To specify an index entry, the `X<>` markup code is restricted to one of the |
| 96 | +following forms: |
| 97 | + |
| 98 | +``` |
| 99 | +X<|$category,term> |
| 100 | +X<text|$category,term> |
| 101 | +X<text|$category-1,term-1;...;$category-N,term-N> |
| 102 | +``` |
| 103 | + |
| 104 | +where the `$category-1`, ..., `$category-N` variables *explicitly* set the category |
| 105 | +for the corresponding indexed term and must refer to one of the supported |
| 106 | +categories described in this document (including possible future extensions); see the next step. |
| 107 | + |
| 108 | +Sticking to the syntax described here implies two important things: |
| 109 | + |
| 110 | +* No implicit category setting (setting a category manually becomes mandatory) |
| 111 | +* Sub-categories (e.g. `X<|Cat,Sub1,Sub2,Term>`) are forbidden |
| 112 | + |
| 113 | +Please note that in the forms above, names such as `$category` and `$category-1` are |
| 114 | +placeholders for actual category names. |
| 115 | + |
| 116 | +Valid examples are: |
| 117 | + |
| 118 | +``` |
| 119 | +X<|Syntax,does> |
| 120 | +X<|Language,\ (container binding)> |
| 121 | +X<Subroutines|Syntax,sub> |
| 122 | +X<|Variables,$*PID> |
| 123 | +X<Automatic signatures|Variables,@_;Variables,%_> |
| 124 | +X<Typing|Language,typed array;Syntax,[ ] (typed array)> |
| 125 | +X<Attributes|Language,Attribute;Foreign,Property;Foreign,Member;Foreign,Slot> |
| 126 | +``` |
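As a rough illustration of these guidelines, the following Python sketch checks a single `X<>` code against the restricted form: every entry must be exactly `category,term`, and the category must come from the supported list. The names `SUPPORTED_CATEGORIES` and `check_index_code` are hypothetical; the category set is copied from the standard list suggested in Appendix A.

```python
from typing import List

# Copied from the standard list suggested in Appendix A.
SUPPORTED_CATEGORIES = {
    "Types", "Modules", "Routines", "Subroutines", "Methods",
    "Terms", "Adverbs", "Traits", "Phasers", "Asynchronous Phasers",
    "Pragmas", "Variables", "Control flow", "Regexes",
    "Operators", "Listop operators", "Infix operators", "Metaoperators",
    "Postfix operators", "Prefix operators", "Circumfix operators",
    "Postcircumfix operators",
    "Tutorial", "Foreign", "Syntax", "Reference", "Language", "Programs",
}


def check_index_code(code: str) -> List[str]:
    """Return the problems found in one X<> code; an empty list means valid."""
    if not (code.startswith("X<") and code.endswith(">")):
        return [f"not an X<> formatting code: {code!r}"]
    _text, pipe, raw_entries = code[2:-1].partition("|")
    if not pipe:
        return ["missing '|' before the index entries"]
    problems = []
    for entry in raw_entries.split(";"):
        levels = [level.strip() for level in entry.split(",")]
        # Exactly two non-empty levels: no implicit category, no sub-categories.
        if len(levels) != 2 or not all(levels):
            problems.append(f"expected 'category,term', got {entry!r}")
        elif levels[0] not in SUPPORTED_CATEGORIES:
            problems.append(f"unsupported category {levels[0]!r}")
    return problems


print(check_index_code(r"X<|Language,\ (container binding)>"))
print(check_index_code("X<|Cat,Sub1,Sub2,Term>"))
```

A term that itself contains a comma or a semicolon cannot be expressed in this simple sketch; a real check would need an escaping rule for such terms.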
| 127 | + |
| 128 | +2. Establish a list of supported search categories, see Appendix A for the suggested one. |
| 129 | + |
| 130 | +3. Write a test for the documentation sources to gather all present index entries |
| 131 | + and check them against the approved categories list. The test passes when |
| 132 | + all index entries adhere to the approved categories. |
| 133 | + |
| 134 | +4. Fix the `Documentable` module indexing bits to make them both clearer |
| 135 | + and adhere to the scheme suggested in this document. |
| 136 | + |
| 137 | +5. Gradually adapt current documentation index entries to pass both tests. |
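The test from step 3 could look roughly like the Python sketch below: it gathers every `X<>` code from a source text and reports the categories that are not on the approved list. The regular expression, the function name `unapproved_entries` and the sample text are assumptions made for illustration; the real test would run against the documentation sources and use the full approved list.

```python
import re
from typing import Iterable, List, Tuple

# Simplification: matches X<text|entries> without nested angle brackets.
INDEX_CODE = re.compile(r"X<[^|<>]*\|([^<>]+)>")


def unapproved_entries(source: str, approved: Iterable[str]) -> List[Tuple[int, str]]:
    """Return (line number, category) pairs for categories not in `approved`."""
    approved = set(approved)
    found = []
    for line_no, line in enumerate(source.splitlines(), start=1):
        for match in INDEX_CODE.finditer(line):
            for entry in match.group(1).split(";"):
                # The part before the first comma is treated as the category.
                category = entry.split(",", 1)[0].strip()
                if category not in approved:
                    found.append((line_no, category))
    return found


# Hypothetical sample input, not taken from the real documentation.
sample = (
    "=head1 X<Subroutines|Syntax,sub>\n"
    "See X<|routine,foo> and X<|Variables,$*PID>.\n"
)
print(unapproved_entries(sample, {"Syntax", "Variables"}))
```

The test passes when the returned list is empty for every documentation source file.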
| 138 | + |
| 139 | + |
| 140 | +# Outcomes |
| 141 | + |
| 142 | +* An organized list of categories makes the search results easier for the user to understand |
| 143 | +* A stable set of search categories makes it easier for the contributor to index items into the correct |
| 144 | + categories |
| 145 | +* Possible separate documentation search frontends get a much cleaner set of categories to search by |
| 146 | + |
| 147 | +# Appendix A: Search Categories |
| 148 | + |
| 149 | +This list follows a few styling rules: |
| 150 | + |
| 151 | +* Titlecase over lowercase (`Subroutines` over `routines`) |
| 152 | +* Plural over singular (`Subroutines` over `routine`) |
| 153 | +* Categorize both the Raku standard library API (`Types`, `Subroutines`) and |
| 154 | + language-related topics and terms (`Language`, `Syntax`) into groups |
| 155 | + |
| 156 | +The ad-hoc list of search categories existing as of 05.01.2021 is: |
| 157 | + |
| 158 | +- [ ] `class` |
| 159 | +- [ ] `role` |
| 160 | +- [ ] `enum` |
| 161 | +- [ ] `module` |
| 162 | +- [ ] `Language` |
| 163 | +- [ ] `programs` |
| 164 | +- [ ] `sub` |
| 165 | +- [ ] `prefix` |
| 166 | +- [ ] `listop` |
| 167 | +- [ ] `infix` |
| 168 | +- [ ] `Routine` |
| 169 | +- [ ] `postfix` |
| 170 | +- [ ] `postcircumfix` |
| 171 | +- [ ] `method` |
| 172 | +- [ ] `submethod` |
| 173 | +- [ ] `routine` |
| 174 | +- [ ] `trait` |
| 175 | +- [ ] `term` |
| 176 | +- [ ] `twigil` |
| 177 | +- [ ] `variable` |
| 178 | +- [ ] `syntax` |
| 179 | +- [ ] `regex` |
| 180 | +- [ ] `regex quantifier` |
| 181 | +- [ ] `parameter` |
| 182 | +- [ ] `quote` |
| 183 | +- [ ] `matching adverb` |
| 184 | +- [ ] `regex adverb` |
| 185 | +- [ ] `substitution adverb` |
| 186 | +- [ ] `hyper` |
| 187 | +- [ ] `Phasers` |
| 188 | +- [ ] `Asynchronous Phasers` |
| 189 | +- [ ] `Python` |
| 190 | +- [ ] `constant` |
| 191 | +- [ ] `hash (Basics)` |
| 192 | +- [ ] `rakudoc` |
| 193 | +- [ ] `control flow` |
| 194 | +- [ ] `:sym<>` |
| 195 | +- [ ] `scalar (Basics)` |
| 196 | +- [ ] `statement (Basics)` |
| 197 | +- [ ] `string literal (Basics)` |
| 198 | +- [ ] `TOP` |
| 199 | +- [ ] `topic variable (Basics)` |
| 200 | +- [ ] `variable interpolation (Basics)` |
| 201 | +- [ ] `declarator` |
| 202 | +- [ ] `:cached` |
| 203 | +- [ ] `eager (statement prefix)` |
| 204 | +- [ ] `gather (statement prefix)` |
| 205 | +- [ ] `identifier` |
| 206 | +- [ ] `classes` |
| 207 | +- [ ] `lazy (statement prefix)` |
| 208 | +- [ ] `macros` |
| 209 | +- [ ] `pack` |
| 210 | +- [ ] `react (statement prefix)` |
| 211 | +- [ ] `sink (statement prefix)` |
| 212 | +- [ ] `supply (statement prefix)` |
| 213 | +- [ ] `<sym>` |
| 214 | +- [ ] `->` |
| 215 | +- [ ] `is default (Variable)` |
| 216 | +- [ ] `try (statement prefix)` |
| 217 | +- [ ] `with orwith without` |
| 218 | +- [ ] `Reference` |
| 219 | + |
| 220 | +Most of them are absorbed into the new standard list during step 3 of the solution. |
| 221 | + |
| 222 | +The standard list is: |
| 223 | + |
| 224 | +- [ ] `Types` |
| 225 | +- [ ] `Modules` |
| 226 | +- [ ] `Routines` (a common category for something existing as a method and a subroutine) |
| 227 | +- [ ] `Subroutines` |
| 228 | +- [ ] `Methods` |
| 229 | + |
| 230 | +- [ ] `Terms` |
| 231 | +- [ ] `Adverbs` |
| 232 | +- [ ] `Traits` |
| 233 | +- [ ] `Phasers` |
| 234 | +- [ ] `Asynchronous Phasers` |
| 235 | +- [ ] `Pragmas` |
| 236 | +- [ ] `Variables` |
| 237 | + |
| 238 | +- [ ] `Control flow` (everything related to control flow) |
| 239 | +- [ ] `Regexes` (everything related to regex) |
| 240 | + |
| 241 | +- [ ] `Operators` (a common category for something existing as an operator with different application) |
| 242 | +- [ ] `Listop operators` |
| 243 | +- [ ] `Infix operators` |
| 244 | +- [ ] `Metaoperators` |
| 245 | +- [ ] `Postfix operators` |
| 246 | +- [ ] `Prefix operators` |
| 247 | +- [ ] `Circumfix operators` |
| 248 | +- [ ] `Postcircumfix operators` |
| 249 | + |
| 250 | +- [ ] `Tutorial` |
| 251 | +- [ ] `Foreign` (for terms from other languages and migration guides) |
| 252 | +- [ ] `Syntax` (legacy, various bits of language syntax explained at meta-level) |
| 253 | +- [ ] `Reference` (legacy, default category for general reference) |
| 254 | +- [ ] `Language` (legacy, language-related topics) |
| 255 | +- [ ] `Programs` (legacy, program writing-related topics) |
| 256 | + |
| 257 | +# Appendix B: Updates to the list of supported search term categories |
| 258 | + |
| 259 | +The suggested list of categories has been shown to cover all |
| 260 | +index entries in the existing language documentation at the |
| 261 | +time of writing. If issues arise, the list can be updated |
| 262 | +by starting a discussion with other maintainers and providing the |
| 263 | +reasoning behind the change, for one of the following reasons: |
| 264 | + |
| 265 | +* The current list does not include an important category which |
| 266 | + absolutely does not fit into any of the existing ones |
| 267 | +* The current list contains a category hindering the understanding of |
| 268 | + the language in any way |
| 269 | +* The current list creates cases where names clash. |
| 270 | + For example, if the `Asynchronous Phasers` category did not exist, |
| 271 | + both `QUIT` (asynchronous phaser) and `QUIT` ("normal" phaser), |
| 272 | + which mean two different things, would fall into the same category `Phasers` |
| 273 | + and cause confusion |
| 274 | + |
| 275 | +But not for one of the following reasons: |
| 276 | + |
| 277 | +* A matter of style or preferences (e.g. `Subroutine` vs `Subroutines` |
| 278 | + or `routine`) |
| 279 | +* Overspecializing categories (e.g. splitting `Infix operators` into |
| 280 | + `Set infix operators`, `Compare infix operators` etc.) |