Syntax highlighting sometimes incorrect with multiline constructs #117

dandavison · 2020-03-23T18:44:10Z

If delta receives a hunk that looks something like the example below, then it will syntax highlight the code as a string. (This is a python example, but the problem exists in any language with multiline string literals I think):

    ...and this is the end of my docstring.
    """
    x = some_function("some_string")  # This line should be syntax-highlighted as code.

That seems like an inevitable ambiguity. However, are there perhaps some heuristics that could be introduced to allow delta to guess which side of the """ is code and which side string literal?

The text was updated successfully, but these errors were encountered:

dandavison · 2020-09-04T17:23:51Z

From @kofls in #316

I think a simple solution would be to get the syntax style per file and use that for each diff section

Do you mean that delta should try to locate the entire file on disk, and use that to determine exactly what is going on in a diff that shows just a fragment of the file? That's a good point, delta could try to do that. It won't always work of course, because the diff might just be piped to delta and the files mentioned in the diff might not even exist, but delta does already use libgit2 to see if it is in a git repo. As you say, it could be an option.

kofls · 2020-09-04T21:55:26Z

Yup, that was what I meant. I hadn't considered that the diff source might not be files.

YodaEmbedding · 2021-04-05T10:52:11Z

Quick n' dirty: I think that syntax highlighting both portions (even if incorrect) is less disastrous than not highlighting the code at all. Perhaps a quick and simple workaround would be to ignore """ when highlighting, so that everything gets highlighted?

Robust and accurate: delta knows the file location, line numbers, and the line number of function definition. So read the file, skip to the function definition, then highlight from there until the end of the function. Now extract the relevant lines from the diff and display those. There are probably some edge cases to handle but that seems to be a good way to do this properly.

Alphare · 2021-05-05T09:35:05Z

Robust and accurate: delta knows the file location, line numbers, and the line number of function definition

FYI this would only work with Git which is content-addressable, and not (as simply) with Mercurial. I'm not even sure that every operation in Git can be recovered by looking up a file given the information available in a diff, but I don't know enough about Git.

Alphare · 2021-05-07T12:00:29Z

There exists the git diff -U and hg diff -U flags that give more context. delta could use this flag and then strip down the added context. Note that the theoretical "right" approach in Mercurial would be to use some kind of extdiff mechanism, but that's probably more work. :)

TheKevJames · 2022-10-26T14:52:29Z

Potentially a middle-ground between @YodaEmbedding's options:

Try to highlight both sides, capture how successful we were (does bat expose details on what the guessed language was? This could even be a simple "bat guessed the wrong language for this file, probably a comment!"), only show the highlights if error_count < n.

dandavison · 2022-10-26T15:09:33Z

Hi @TheKevJames, delta does not use bat for syntax highlighting ~~or guessing languages~~. I'm curious -- could you tell me where you got the impression that it does? You're not alone, it's a common misconception!

TheKevJames · 2022-10-26T15:28:51Z

@dandavison oh, interesting! I took a quick glance at the README before writing my above message and came across the following lines:

Language syntax highlighting with the same syntax-highlighting themes as bat
All the syntax-highlighting color themes that are available with bat are available with delta:

They definitely don't state that "bat is used for syntax highlighting" directly, but my read from them implied it. IMO "why mention bat if it's not relevant here?"

Guesswork start here: perhaps the library you use for syntax highlighting is just the same one as bat's, or something like that? Maybe mentioning the library could help? eg.

"All the syntax-highlighting color themes that are available with $library -- the same library used by bat -- are available with delta"

dandavison · 2022-10-26T15:29:08Z

Actually, re-reading what you wrote, you're just talking about guessing languages, not syntax highlighting. OK, so the way this works is:

Delta uses some code from bat to map a file name to a language.
Delta uses syntect to syntax highlight that language.

dandavison · 2022-10-26T15:31:06Z

Yes, exactly, you're right -- delta and bat use the same syntax highlighting library. Thanks, I'll change the README as you suggest -- this confuses lots of people. I thought it might help to mention that they're the same themes to reassure people that they will be able to create a visually consistent experience using both programs.

TheKevJames · 2022-10-26T16:02:29Z

Definitely agree on mentioning the similarity to bat -- the two having the same themes was certainly relevant to me when I started looking into using delta! But yeah, mentioning/linking to syntect will certainly help folks looking for a more specific understanding of what lib does what thing.

So in that case, it looks to me like the functionality I described would best be implemented via changes to syntect code; eg. they would need to offer someway to check if a given SyntaxSet "seems to be correct".

My initial thought is that "seems to be correct" might be approximate by checking on the number of modifications: my naive assumption is that in cases where we're highlighting the code we would get many tokens highlighted with their syntax but in cases where we're highlighting the comment we would get few highlights (or perhaps "all the same"? Not sure if there's a highlight for "this token doesn't make sense here").

So in that case, an option which doesn't require syntect changes might be to modify this chunk of delta.

Pardon my incredible lack of rust fluency here, but in attempted psuedo-code / bad code, we might be able to do something like:

let mut line_styles = Vec::new();
for (line, _) in lines.iter() {
    let sections = highlighter
        .highlight_line(line, &config.syntax_set)
        .unwrap();
    # MIN_STYLED_TOKENS might be "at least n tokens should be highlighted"
    # alternatively, it could be "at least n% of the line is hightlighted"
    let is_styled = sections
        .reduce(|acc, (style, _)| {
            if (style != NULL && style != INVALID_TOKEN) { acc += 1 }
        }) <= MIN_STYLED_TOKENS;
    line_styles.push((sections, is_styled));
}

# ie. "split our vector into lines before and lines after a block comment"
let index = line_styles.iter().position(|(line, _)| line == "\"\"\"").unwrap();
let styled_tokens_before = line_styles[..index]
    .filter(|(_, is_styled)| { is_styled })
    .count();
let styled_tokens_after = line_styles[index..]
    .filter(|(_, is_styled)| { is_styled })
    .count();

# This could also be thresholded: eg. "if they're close enough, style both"
if (styled_tokens_before > styled_tokens_after) {
    line_sections.push(line_styles[..index]);
    for (line, _) in lines[index..].iter() {
        line_sections.push(vec![(config.null_syntect_style, line.as_str())]);
    }
} else {
    for (line, _) in lines[..index].iter() {
        line_sections.push(vec![(config.null_syntect_style, line.as_str())]);
    }
    line_sections.push(line_styles[index..]);
}

dandavison mentioned this issue Jul 21, 2020

🐛 PHP syntax highlighting not working properly #162

Open

dandavison changed the title ~~Heuristics for multiline string parse ambiguity~~ Syntax highlighting incorrect with multiline strings Sep 4, 2020

dandavison mentioned this issue Sep 4, 2020

🐛 Syntax highlighting is sometimes wrong #316

Closed

dandavison mentioned this issue Jan 27, 2021

🐛 Wrong syntax highlighting after python docstring #509

Closed

dandavison mentioned this issue Apr 26, 2021

🐛 Delta gets confused when highlighting some Python classes with block comments #577

Closed

Alphare mentioned this issue May 7, 2021

🐛 Cannot display hg help diff when using delta #594

Closed

dandavison mentioned this issue Nov 18, 2021

Support --color-moved #72

Closed

dandavison mentioned this issue Jan 5, 2022

🚀 Syntax highlighting of diffs should derive from the source file #886

Open

dandavison mentioned this issue Jun 25, 2022

🚀 Improve syntax highlighting by narrowing context #1091

Open

dandavison mentioned this issue Jan 17, 2024

🐛 Syntax highlighting fails when the visible snippet includes a multiline comment #1606

Closed

2 tasks

dandavison changed the title ~~Syntax highlighting incorrect with multiline strings~~ Syntax highlighting sometimes incorrect with multiline constructs Jan 20, 2024

dandavison mentioned this issue Jan 20, 2024

🐛 Vue SFC syntax highlighting does not work if opening tag is not included in diff #1546

Open

1 task

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Syntax highlighting sometimes incorrect with multiline constructs #117

Syntax highlighting sometimes incorrect with multiline constructs #117

dandavison commented Mar 23, 2020 •

edited

Loading

dandavison commented Sep 4, 2020

kofls commented Sep 4, 2020 •

edited

Loading

YodaEmbedding commented Apr 5, 2021 •

edited

Loading

Alphare commented May 5, 2021

Alphare commented May 7, 2021

TheKevJames commented Oct 26, 2022 •

edited

Loading

dandavison commented Oct 26, 2022 •

edited

Loading

TheKevJames commented Oct 26, 2022

dandavison commented Oct 26, 2022

dandavison commented Oct 26, 2022

TheKevJames commented Oct 26, 2022

Syntax highlighting sometimes incorrect with multiline constructs #117

Syntax highlighting sometimes incorrect with multiline constructs #117

Comments

dandavison commented Mar 23, 2020 • edited Loading

dandavison commented Sep 4, 2020

kofls commented Sep 4, 2020 • edited Loading

YodaEmbedding commented Apr 5, 2021 • edited Loading

Alphare commented May 5, 2021

Alphare commented May 7, 2021

TheKevJames commented Oct 26, 2022 • edited Loading

dandavison commented Oct 26, 2022 • edited Loading

TheKevJames commented Oct 26, 2022

dandavison commented Oct 26, 2022

dandavison commented Oct 26, 2022

TheKevJames commented Oct 26, 2022

dandavison commented Mar 23, 2020 •

edited

Loading

kofls commented Sep 4, 2020 •

edited

Loading

YodaEmbedding commented Apr 5, 2021 •

edited

Loading

TheKevJames commented Oct 26, 2022 •

edited

Loading

dandavison commented Oct 26, 2022 •

edited

Loading