Syntax highlighting sometimes incorrect with multiline constructs #117
Comments
Do you mean that delta should try to locate the entire file on disk, and use that to determine exactly what is going on in a diff that shows just a fragment of the file? That's a good point, delta could try to do that. It won't always work of course, because the diff might just be piped to delta and the files mentioned in the diff might not even exist, but delta does already use libgit2 to see if it is in a git repo. As you say, it could be an option. |
Yup, that was what I meant. I hadn't considered that the diff source might not be files. |
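The "look up the whole file, fall back if it is not there" idea could be sketched roughly as below. This is only an illustration: `path_from_plus_header` and `load_full_file` are hypothetical helpers, not delta functions, and the sketch assumes git's default `+++ b/<path>` header form (no quoting, no custom prefixes). Note also that the working-tree copy may not be the version the diff refers to, so a `None` fallback is not the only failure mode.

```rust
use std::fs;
use std::path::{Path, PathBuf};

/// Extract the new-file path from a `+++ b/<path>` diff header line.
/// Returns `None` for other lines and for `/dev/null` (a deleted file).
/// Assumption: git's default `b/` prefix and an unquoted path.
pub fn path_from_plus_header(line: &str) -> Option<PathBuf> {
    let rest = line.strip_prefix("+++ ")?;
    if rest == "/dev/null" {
        return None;
    }
    // Drop the `b/` prefix when present; otherwise use the path as given.
    let rest = rest.strip_prefix("b/").unwrap_or(rest);
    Some(PathBuf::from(rest))
}

/// Try to read the full file named by the header, relative to `repo_root`.
/// `None` means "not available": the caller should fall back to
/// highlighting the hunk on its own, e.g. when the diff was piped in
/// and the file does not exist on disk.
pub fn load_full_file(repo_root: &Path, header_line: &str) -> Option<String> {
    let relative = path_from_plus_header(header_line)?;
    fs::read_to_string(repo_root.join(relative)).ok()
}
```

A caller would try `load_full_file` first and only use the result to seed the highlighter state when it returns `Some`.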
Quick n' dirty: I think that syntax highlighting both portions (even if incorrect) is less disastrous than not highlighting the code at all. Perhaps a quick and simple workaround would be to ignore Robust and accurate: |
FYI this would only work with Git which is content-addressable, and not (as simply) with Mercurial. I'm not even sure that every operation in Git can be recovered by looking up a file given the information available in a diff, but I don't know enough about Git. |
There exists the |
Potentially a middle-ground between @YodaEmbedding's options: Try to highlight both sides, capture how successful we were (does |
Hi @TheKevJames, delta does not use bat for syntax highlighting |
@dandavison oh, interesting! I took a quick glance at the README before writing my above message and came across the following lines:
They definitely don't state that "bat is used for syntax highlighting" directly, but my read of them implied it. IMO "why mention bat if it's not relevant here?" Guesswork starts here: perhaps the library you use for syntax highlighting is just the same one as bat's, or something like that? Maybe mentioning the library could help? e.g. "All the syntax-highlighting color themes that are available with $library -- the same library used by bat -- are available with delta" |
Actually, re-reading what you wrote, you're just talking about guessing languages, not syntax highlighting. OK, so the way this works is:
|
Yes, exactly, you're right -- delta and bat use the same syntax highlighting library. Thanks, I'll change the README as you suggest -- this confuses lots of people. I thought it might help to mention that they're the same themes to reassure people that they will be able to create a visually consistent experience using both programs. |
Definitely agree on mentioning the similarity to bat -- the two having the same themes was certainly relevant to me when I started looking into using delta! But yeah, mentioning/linking to syntect will certainly help folks looking for a more specific understanding of what lib does what thing. So in that case, it looks to me like the functionality I described would best be implemented via changes to syntect code; e.g. they would need to offer some way to check if a given SyntaxSet "seems to be correct". My initial thought is that "seems to be correct" might be approximated by checking the number of modifications: my naive assumption is that in cases where we're highlighting the code we would get many tokens highlighted with their syntax, but in cases where we're highlighting the comment we would get few highlights (or perhaps "all the same"? Not sure if there's a highlight for "this token doesn't make sense here"). So in that case, an option which doesn't require syntect changes might be to modify this chunk of delta. Pardon my incredible lack of Rust fluency here, but in attempted pseudo-code / bad code, we might be able to do something like:
let mut line_styles = Vec::new();
for (line, _) in lines.iter() {
    let sections = highlighter
        .highlight_line(line, &config.syntax_set)
        .unwrap();
    // MIN_STYLED_TOKENS might be "at least n tokens should be highlighted";
    // alternatively, it could be "at least n% of the line is highlighted"
    let is_styled = sections
        .iter()
        .filter(|(style, _)| *style != NULL && *style != INVALID_TOKEN)
        .count() >= MIN_STYLED_TOKENS;
    line_styles.push((sections, is_styled));
}
// i.e. "split our vector into lines before and lines after a block comment"
let index = lines.iter().position(|(line, _)| line.trim() == "\"\"\"").unwrap();
let styled_tokens_before = line_styles[..index]
    .iter()
    .filter(|(_, is_styled)| *is_styled)
    .count();
let styled_tokens_after = line_styles[index..]
    .iter()
    .filter(|(_, is_styled)| *is_styled)
    .count();
// This could also be thresholded: e.g. "if they're close enough, style both"
if styled_tokens_before > styled_tokens_after {
    for (sections, _) in &line_styles[..index] {
        line_sections.push(sections.clone());
    }
    for (line, _) in lines[index..].iter() {
        line_sections.push(vec![(config.null_syntect_style, line.as_str())]);
    }
} else {
    for (line, _) in lines[..index].iter() {
        line_sections.push(vec![(config.null_syntect_style, line.as_str())]);
    }
    for (sections, _) in &line_styles[index..] {
        line_sections.push(sections.clone());
    }
} |
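The decision step of that proposal can be separated from syntect entirely, which makes it easier to reason about. The following is a minimal sketch under stated assumptions: `choose_side`, `KeepSide`, and the `tolerance` parameter are hypothetical names, and the input is assumed to be a per-line count of tokens that received a non-default style (however the highlighter is asked for that). It inherits the naive assumption from the comment above, namely that correctly highlighted code yields more styled tokens than text highlighted as a string.

```rust
/// Which side of the delimiter line should keep its syntax highlighting.
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
pub enum KeepSide {
    Before,
    After,
    Both,
}

/// Decide which side of a multiline-string delimiter to keep styled.
///
/// `styled_tokens[i]` is the number of tokens on line `i` that received a
/// non-default style. Lines from `delimiter_index` onward form the "after"
/// side. A line counts as styled when it has at least `min_styled_tokens`
/// styled tokens. If the two sides differ by at most `tolerance` styled
/// lines, both are kept ("if they're close enough, style both").
pub fn choose_side(
    styled_tokens: &[usize],
    delimiter_index: usize,
    min_styled_tokens: usize,
    tolerance: usize,
) -> KeepSide {
    // Clamp so an out-of-range index cannot panic when slicing.
    let index = delimiter_index.min(styled_tokens.len());
    let count_styled = |counts: &[usize]| {
        counts.iter().filter(|&&n| n >= min_styled_tokens).count()
    };
    let before = count_styled(&styled_tokens[..index]);
    let after = count_styled(&styled_tokens[index..]);

    if before.abs_diff(after) <= tolerance {
        KeepSide::Both
    } else if before > after {
        KeepSide::Before
    } else {
        KeepSide::After
    }
}
```

The caller would then push the real highlighted sections for the kept side and the null style for the other, as in the pseudo-code above.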
If delta receives a hunk that looks something like the example below, then it will syntax highlight the code as a string. (This is a Python example, but I think the problem exists in any language with multiline string literals):
That seems like an inevitable ambiguity. However, are there perhaps some heuristics that could be introduced to allow delta to guess which side of the
"""
is code and which side string literal?
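Before any heuristic can guess a side, the ambiguous case itself has to be detected. A minimal sketch of that first step, assuming the simplest situation where the hunk contains exactly one `"""` in total: `lone_delimiter_line` is a hypothetical helper, not part of delta, and it deliberately gives up on hunks with more than one delimiter rather than trying to pair them.

```rust
const DELIM: &str = "\"\"\"";

/// Return the index of the line holding the only `"""` in the hunk.
///
/// `Some(i)` means the hunk is ambiguous: the text on one side of line `i`
/// is inside a multiline string, but the hunk alone does not say which.
/// `None` means there is no delimiter, or more than one (including an
/// opening and closing pair on the same line), which this sketch does
/// not attempt to resolve.
pub fn lone_delimiter_line(hunk: &str) -> Option<usize> {
    let mut found = None;
    for (i, line) in hunk.lines().enumerate() {
        let n = line.matches(DELIM).count();
        match (n, found) {
            (0, _) => {}
            (1, None) => found = Some(i),
            // A second delimiter anywhere: not the simple ambiguous case.
            _ => return None,
        }
    }
    found
}
```

The returned index is the split point that any "which side is code?" heuristic would then work from.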