-
Notifications
You must be signed in to change notification settings - Fork 123
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Code blocks: Support partial text highlighting #1478
Code blocks: Support partial text highlighting #1478
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice work at getting this to work @ryoarmanda !
For the proposed syntax lineNumber[start:end]
, would it be better to have start
and end
refer to the index of a word in a line? I think this would reduce the work on the part of the user as they would not need to manually count the character indices.
What are your thoughts about this?
} | ||
|
||
const [boundStart, boundEnd] = this.bounds; | ||
const start = lineStart <= boundStart && boundStart <= lineEnd ? boundStart : lineStart; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we put a parenthesis around the two conditions for readability?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, no problem
return data; | ||
}); | ||
|
||
if (shouldHighlight.every(v => v === true)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it be better to have .every(v => v)
if shouldHighlight
is an array of booleans?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not all items in shouldHighlight
array is boolean. It is just an array to collate the return value as described in the beginning of the function, which may be true
, false
, or an array of two numbers. I explicitly used === true
and === false
to not include the array of two numbers (which is truthy in itself).
In hindsight maybe I should call this just highlightData
// Essentially, we have to change the text node to become a tag node | ||
|
||
node.children.forEach((child, idx) => { | ||
if (shouldHighlight[idx] === false) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it be better to have !shouldHighlight[idx]
if shouldHighlight
is an array of booleans?
const text = child.data; | ||
let newElement; | ||
|
||
if (shouldHighlight[idx] === true) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This might be similar to the issue above.
Hmm I was working under the spec on the original discussion here. My two cents is, if the unit of highlighting is words rather than characters, we would have to do some extra work in determining which of these characters can be grouped up as words. Moreover, with how code is generally written, sometimes users may want to highlight beyond words, such as words continued with special characters (like |
Having to count characters is not ideal, but I guess we can anticipate users will do that in two passes i.e., uses a rough count in the first try and tweak the numbers based on the result. |
This can be a good addition as the easier version of the line-slice syntax, provided the section to be highlighted is not that long. The line-slice syntax can then be a more advanced-user variant where users can express highlights in a more concise way. In fact, we can build up this easier syntax on top of the line-slice one. But to (user-side) writing and (dev-side) parsing the rule can be quite tricky, there might be quotes present in the text to be highlighted, in which the user may have to manually escape this in order for the whole May need to put some proper thought into this, I might address it in a separate PR for now. |
True. But given it is an additional convenience feature only, we can specify under what conditions the simpler syntax should be used. In all other cases, the user should use the slice syntax instead. Hence, we can avoid dealing with those complex corner cases. |
Just now I found out a way to support the convenience syntax on top of the line-slice processing (behind the scenes, the former rule will be converted to the latter, before handing over to NodeProcessor). I might push it here after docs and tests updates. But yeah, looks like we have a hard restriction, for some reason we can't have the same type of quotes that specifies the whole So, something with |
How about sticking to positioning? we could introduce something like
but you can try removing the patch now as well as its no longer needed with #1403 it was introduced as we previously had multiple nunjucks passes to make |
Perhaps we can explore this syntax as well. I might define a word as just a sequence of non-whitespace characters in order to include special characters (e.g. However, specific portions of words text might not be able to be highlighted which I guess is to be expected as word-level highlight is inherently less fine than character-level. For example, if a user wants to highlight only the With this syntax, do you think we should keep the previously proposed one? |
I'm open to either, specifying the words is more flexible, but also more repititive and verbose. On a personal stance I would go with positions however, for consistency with the whole line-slice syntax. Also, since repitition is against the trend here =P. Any thoughts? @damithc @raysonkoh
Does backslash escapes |
@ryoarmanda what are the choices being considered here? I haven't been following the previous discussion. |
To sum up the discussion, we have two proposed syntax for word-highlighting. One is the line-part syntax
Another is the word variant of line-slice syntax
With their own pros and limitations, which of the two do you prefer to be supported? Or should we just support both at once? |
Why not support both but let IIRC, we used Sorry if these were discussed before. |
I think we can support
For |
Some additional concerns about the expected behavior of
|
My vote:
|
yup. character positions would be supported in either choice
I think we should stick with this (versus single colon for words). Its consistent with the existing
I'm open to supporting both as well (although it is a niche feature) (are we going with both?) If possible, this PR could be split into 4 commits (or prs) as well:
I'll be doing a couple things shortly as well to make the reviewing easier:
|
All done. Sorry, this should really have been done beforehand. 😓 diff for |
Currently in this PR there are already character positions and I am not sure if this is too much introduction of new features in one PR, should I break away the |
Both sounds fine. It all ties under one logical change (partial text highlighting), but is a little large as you mentioned. Let's split up the commits if sticking to the PR though. You can also use fixups (or just rebasing in your git ui) to address reviews later in new, separate commits, then squash them with the relevant commit once done) |
@@ -176,6 +176,142 @@ class NodeProcessor { | |||
cheerio(node).remove(); | |||
} | |||
|
|||
/* |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's move these to a separate file for a start
packages/core/src/utils/index.js
Outdated
@@ -9,6 +9,14 @@ const { | |||
markdownFileExts, | |||
} = require('../constants'); | |||
|
|||
const htmlUnescapedMapping = { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
how about something like https://www.npmjs.com/package/he?
might become a little tedious to maintain the list
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This mapping is taken from markdown-it
library, just reversed. I did this as in our codebase we use markdown-it/utils
function escapeHtml
for a lot of purposes, but that package didn't expose unescaping methods for HTML, so I took it upon myself to add a complementary function. Looks like it's going to be only that 5 entries though, as this function only aims to reverse the escapeHtml
function and that function only escapes those 5 entries.
Though if you feel we can add a new package to make everything concise I'm also okay with it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ahh, thanks for clarifying!
In that case let's stick with this then, we might not want to unescape things markdown-it did not escape as well.
On the same lines, should we put this somewhere in the lib/markdown-it folder? Also a comment to document why / where the mappings came from may help 🙂
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 I agree. This is more related to operations involving markdown-it as well so it makes sense to put it in the lib/markdown-it folder. Though I can't actually expose the new function through the markdownIt
object in lib/markdown-it/index.js
, and I can't disturb the exports there unless I replace all imports in the project, so I will just make it available for manual import through lib/markdown-it/utils
.
It's the best middle ground I have, but I'm afraid it will somewhat be confusing as now we have two different modules for utilities on markdown-it, one from the library itself usually referred to as md.utils
, the other is the new utils
module. I might put an explanation on the top of our utils
module that this is a separate extension of the one from the library.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice work! 🙂
Just 1 / 2 edges cases and some minor suggestions for the code:
const highlightLinesInput = getAttributeAndDelete(token, 'highlight-lines'); | ||
let highlightRules = []; | ||
if (highlightLinesInput) { | ||
const highlightLines = highlightLinesInput.split(','); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could this cause issues if the word variant contains ,
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, this will also eat up the commas inside the brackets. I'll add a regex for a more robust split so that it can ignore the inner commas.
highlightRules.forEach((rule) => { | ||
// Note: authors provide line numbers based on the 'start-from' attribute if it exists, | ||
// so we need to shift line numbers back down to start at 0 | ||
rule.offsetLines(-startFromZeroBased); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
how about shifting this into the parsing / construction stage? so index.js
dosen't need to be concerned with accidentally (not) calling this method; we keep these concerns isolated there
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh yeah, I hadn't thought of this approach. I can take it one step further so that the result of parseRuleComponent
is always a properly defined rule component wrt the code block (i.e. already accounting for line offset, figuring out the actual bounds wrt the line, converting to word slices wrt the line, etc), or null if it's not valid.
So, no need to have intermediate properties such as isWordSlice
, linePart
, anymore.
} | ||
|
||
// Convert word variant of line-slice to char | ||
if (rule.hasWordSlice()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
similarly here and hasLinePart
the checks may not be needed as well, since you already have the guards in the convertPartsToSlices / convertWordSliceToCharSlice
methods
} | ||
|
||
const line = lines[comp.lineNumber - 1]; // line numbers are 1-based | ||
const { 1: content } = HighlightRule._splitCodeAndIndentation(line); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could array destructing be used instead?
const { 1: content } = HighlightRule._splitCodeAndIndentation(line); | |
const [, content] = HighlightRule._splitCodeAndIndentation(line); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
similar to earlier, could we move this (and the similar logic in convertWordSliceToCharSlice
) down into HighlightRuleComponent
construction?
So the all the heavy lifting for the component stays in HighlightRuleComponent
, HighlightRule
just facilitates accessing / using these as a whole 🙂
return new HighlightRule(components); | ||
} | ||
|
||
offsetLines(offset) { | ||
this.ruleComponents.forEach(comp => comp.offsetLineNumber(offset)); | ||
} | ||
|
||
convertPartsToSlices(lines) { | ||
if (!this.hasLinePart()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this guard may be unneeded as well since you have !comp.linePart
below
return; | ||
} | ||
|
||
codeNode.children.forEach((line) => { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
codeNode.children.forEach((line) => { | |
codeNode.children.forEach((codeEl) => { |
* @returns {[number, boolean | [number, number]]} An array of two items. | ||
*/ | ||
function traverseLinePart(node, hlStart, hlEnd) { | ||
// Return value is an array of two items: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
let's use /*...*/
for multiline comments per https://github.com/airbnb/javascript#comments
dosen't seem to be configured correctly in our eslint rules 🤔
// 2. Highlighting data to be used by the node's parent. It can be: | ||
// - true (ask to apply highlighting from parent) | ||
// - false (do not process this node further) | ||
// - array of two numbers (only for text nodes, inform parent to |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
how about an object with named properties?
{
numCharTraversed: ...,
....
}
}); | ||
|
||
// Set the references accordingly | ||
node.children.forEach((child, idx) => { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if I read this right, you could use (in the earlier foreach) cheerio(child).wrap('<span class="highlighted"></span>')
, so you don't have to fiddle with the lower level linking
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this certainly is more concise if the text is undisturbed (like the case when highlightData[idx] === true
i.e. highlight spans the whole text), but so far I wasn't able to work a similar solution for the else
case (highlight only spans partially). The problem is that the case needs to break up the text, creating multiple elements/nodes in the process before being wrapped up by a span. I tried out something like cheerio(child).wrap('<span></span>'); cheerio(child).html(...)
and it doesn't work :/
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Update: I found out a way to make it work with cheerio, will implement this approach instead of manually assigning the references.
docs/userGuide/syntax/code.mbdf
Outdated
|
||
To highlight only the text portion of the line, you can just use the line numbers as is. | ||
You can specify the rules in many different ways, depending on how you want it to be. There are three main variants: | ||
full text, substring, bounded (character-wise or word-wise), or full line highlighting. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
seems a little lengthy; Could we compress all this into a table? 🤔
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah I agree :/ Felt like a lot to explain from the various types, syntax formats, and its limitations. But not sure how far I can condense it to a table, looks like some parts may need to just have a brief sentence or two without making the table feel like a wall of text. I'll try to do it soon.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks really neat now 👍 thanks!
@@ -56,8 +56,8 @@ Content in a fenced code block | |||
20 | |||
``` | |||
|
|||
**`highlight-lines` attr with line-slice syntax of empty indices should highlight leading/trailing spaces | |||
```xml {highlight-lines="2[:],4[:]-5[:]"} | |||
**`highlight-lines` attr with empty (any variant) line-slice syntax should highlight leading/trailing spaces** |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
let's add in the edge cases (if they are) earlier once fixed as well
ffe0c65
to
9844388
Compare
1432943
to
7e19b93
Compare
Hi @ang-zeyu, I have reworked the processing flow for the highlight rules, now the major work is done during parsing the component rule (including offsetting line numbers, computing actual bounds, etc) so it is ensured that after parsing, the rule can be applied immediately to the code block without further processing. Can you help review whether the flow is streamlined enough? Thanks! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Really excellent work, especially on the test cases and docs.
Just a few more nits:
* @param node The node of the line part to be traversed | ||
* @param hlStart The highlight start position, relative to the start of the line part | ||
* @param hlEnd The highlight end position, relative to the start of the line part | ||
* @returns {object} An array of two items. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
missed one here
let curr = 0; | ||
const highlightData = node.children.map((child) => { | ||
const data = traverseLinePart(child, hlStart - curr, hlEnd - curr); | ||
curr += data.numCharsTraversed; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
would resData.numCharsTraversed += data.numCharsTraversed;
be more concise? since curr
isn't used anywhere else
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
True, but replacing curr
with resData.numCharsTraversed
might end up being more verbose (especially at the data
constant declaration there). But I'll try to remove curr
and rework the hlStart - curr
and hlEnd - curr
to be computed before the recursive call.
* @return | ||
*/ | ||
function highlightCodeBlock(node) { | ||
const codeNode = node.children.find(c => c.name === 'code'); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oops 🙈, misread this
const [indents, content] = HighlightRule._splitCodeAndIndentation(codeStr); | ||
return `<span>${indents}<span class="highlighted">${content}</span>\n</span>`; | ||
static _highlightPartOfText(codeStr, bounds) { | ||
// Note: As part-of-text highlighting requires walking over the node of the generated |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
multiline here
</foo> | ||
``` | ||
|
||
**`highlight-lines` attr with partial word-variant line-slice syntax should defaults highlight to start/end of line** |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
**`highlight-lines` attr with partial word-variant line-slice syntax should defaults highlight to start/end of line** | |
**`highlight-lines` attr with partial word-variant line-slice syntax should default highlight to start/end of line** |
**Ranged full text highlight**<br>Highlights only the text portion of the lines within the range | `lineStart-lineEnd` | `2-4` | ||
**Ranged full line highlight**<br>Highlights the entirety of the lines within the range | `lineStart[:]-lineEnd` or `lineStart-lineEnd[:]` | `1[:]-5`,`10-12[:]` | ||
**Ranged character-bounded highlight**<br>Highlights the text portion of the lines within the range, but starts/ends at an arbitrary character | `lineStart[start:]-lineEnd` or `lineStart-lineEnd[:end]` | `3[2:]-7`, `4-9[:17]` | ||
**Ranged word-bounded highlight**<br>Highlights the text portion of the lines within the range, but starts/ends at an arbitrary word | `lineStart[start::]-lineEnd` or `lineStart-lineEnd[::end]` | `16[1::]-20`,`22-24[::3]` | ||
|
||
<include src="codeAndOutputCode.md" boilerplate > |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
let's shift this up before the value of highlight-lines...
so the user has a brief idea what the usage is like first :-)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
might have missed one ^ @ryoarmanda
disregard if it looks stranger
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did this already @ang-zeyu, I just reworded the value of highlight-lines...
to the ones in line 85-86
of the file
Edit: strange it doesn't show up in the diffs, you can look for it on the preview site, it's already changed there. Maybe because I force-pushed as I rebased a fixup?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this
was suggesting the reverse - moving the example up (the <include>
tag), so the user gets a brief overview of the entire syntax usage first before the details
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ahh okay, will rebase the fix shortly
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done, the change is reverted in the force push :)
docs/userGuide/syntax/code.mbdf
Outdated
For ranges, you only need to use line-slices on either ends. | ||
Type | Format | Example | ||
-----|--------|-------- | ||
**Ranged full text highlight**<br>Highlights only the text portion of the lines within the range | `lineStart-lineEnd` | `2-4` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
**Ranged full text highlight**<br>Highlights only the text portion of the lines within the range | `lineStart-lineEnd` | `2-4` | |
**Ranged full text highlight**<br>Highlights from the first non-whitespace character to the last non-whitespace character | `lineStart-lineEnd` | `2-4` |
interpretation of text portion
is subjective (e.g. in-between whitespaces should / should not be highlighted) =P
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah yes, will do
docs/userGuide/syntax/code.mbdf
Outdated
-----|--------|-------- | ||
**Ranged full text highlight**<br>Highlights only the text portion of the lines within the range | `lineStart-lineEnd` | `2-4` | ||
**Ranged full line highlight**<br>Highlights the entirety of the lines within the range | `lineStart[:]-lineEnd` or `lineStart-lineEnd[:]` | `1[:]-5`,`10-12[:]` | ||
**Ranged character-bounded highlight**<br>Highlights the text portion of the lines within the range, but starts/ends at an arbitrary character | `lineStart[start:]-lineEnd` or `lineStart-lineEnd[:end]` | `3[2:]-7`, `4-9[:17]` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Like ranged full text highlight, but...
the word one too
3d942f5
to
faff513
Compare
The docs has been updated to address the comments, can have a look again and see if there is anything I missed out. Thanks for all the reviews so far @ang-zeyu 🙇 |
faff513
to
bff12e5
Compare
bff12e5
to
cd40d8d
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Lgtm 👍
@ryoarmanda try to avoid 'all possible usage in one example' technique. It might work for test cases but not a good option in user documentation. From user POV, the example is too complicated and unrealistic. Can you do a minor PR to split the example into separate small examples? |
BTW, kudos for getting this difficult feature to work @ryoarmanda 💯 |
What is the purpose of this pull request?
Resolves #1381
Overview of changes:
Partial text highlighting is available with the line-slice syntax format
lineNumber[start:end]
. You can see this in action in the deployed UG here.The processing is done in NodeProcessor in order to efficiently leverage the parsed HTML to traverse the code block node. For each line, the highlighting range is carried over by
hl-start
andhl-end
attributes, specified from themarkdown-it
patch, which will be extracted in NodeProcessor.In short, the traversal strategy is as follows:
highlighted
class in the node for concisenessWith the addition of partial text highlighting, better support for range highlighting is achievable. Users can specify whether to
start/end the range with partial text highlight. Range highlight processing is modified to support this.
Anything you'd like to highlight / discuss:
Testing instructions:
Proposed commit message: (wrap lines at 72 characters)
Code blocks: Support partial text highlighting
The code blocks line highlighting functionality only supports full-text
and full-line highlights. Authors who wish to only highlight certain
words or key variables are not able to do so with the supported syntax.
Let's add a new highlight syntax to provide authors with the ability to
partially highlight text in a line.
Checklist: ☑️