Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Highlighting improvements #1244

Merged
merged 6 commits into from
Nov 29, 2022

Conversation

phillipwood
Copy link
Contributor

While using delta I have noticed a few cases where the highlighting
was not quite as good as it could have been. This PR attempts to
address that. The most significant change is that where possible
insertions and deletions are grouped together to minimize the number of
groups of changes within a line. For example the change

-printf "%s\n" s y y ...
+test_write_lines s y n ...

is now highlighted as

-[printf "%s\n"] s y n ...
+[test_write_lines] s y y ...

rather than

-[printf "%]s[\n"] [s ]y y ...
+[test_write_lines ]s y y ...

There are several other changes:

  • Unchanged tokens are less likely to be highlighted when the length of
    the line is unchanged.
  • Where possible deletions will now precede insertions.
  • An unchanged space at the end of a line will not be highlighted as
    changed when it is preceded by a changed word.

I was unsure whether to squash the last two commits together. The
diffstat of the combined patch is quite a bit smaller than the two
separate patches but I think it is clearer to show the changes
separately.

edits::tests::test_infer_edits_12 passes so there is no need to ignore
it.
Reduce the amount of repetition and specify the Levenshtein distance
explicitly in preparation for changing the Levenshtein calculation and
adding more tests in a future commit. As there are quite a few inputs to
each test use a struct rather than a plain function for the tests as
this effectively provides named parameters for the inputs.
When a token has moved within a line try to highlight it as a deletion
followed by an insertion. Currently

    -'b '
    +' b'

is highlighted as

    -'b[ ]'
    +'[ ]b'

as if the space has moved to the left, instead highlight it as

    -'[b] '
    +' [b]'

as if the "b" has moved to the right. As the changes to the tests show
this also tends to favor a longer match at the start of the line.
Only unchanged space between changed tokens should be highlighted, if
the line ends with an unchanged space then the space should not be
highlighted.
Sometimes the highlighting of a change is not as clear as it could
be. For example the change

    -printf "%s\n" s y y ...
    +test_write_lines s y n ...

is highlighted as

    -[printf "%]s[\n"] [s ]y y ...
    +[test_write_lines ]s y y ...

rather than

    -[printf "%s\n"] s y n ...
    +[test_write_lines] s y y ...

This is because the Levenshtein distance calculation only checks if
the current tokens match without considering if the previous tokens
matched or not. Adding a small penalty for starting a run of changed
tokens forces the changes to be grouped together whenever possible. A
knock on effect of adding the penalty is that the cost a substitution
needs to be increased relative to the cost of an insertion/deletion
otherwise the lowest cost for "ab" -> "ba" is two substitutions rather
than an insertion, match and deletion.

There are several changes to the tests

 - the Levenshtein distance is updated to reflect the new calculation in
   tests::align::*

 - several new tests are added to tests::align to check the grouping of
   different combinations of insertions and deletions

 - tests::edits::test_infer_edits_10 shows an improvement in the
   highlighting as the unchanged space at the end of the changed tokens
   is no longer highlighted. This is because it is no longer considered
   to be deleted by the Levenshtein matching as deleting the space
   between the deleted words now has a lower cost.

 - there is a new test in tests::edits using the example in this
   message.
Now that the lowest cost edit path groups deletions and insertions
together there seems to be little point in keeping substitutions. Indeed
the lower cost of substitutions compared to a deletion followed by an
insertion can adversely affect the highlighting. If the length of the
line does not change delta will prefer to use substitutions over
deletions and insertions even if it results in unchanged tokens being
marked as changed. For example in

    -[a] a a a a a [b b b]
    +[c] a a a a a [a c c]

unchanged tokens are marked as deleted and inserted. Without
substitutions this is highlighted as

    -a a a a a a [b b b]
    +[c ]a a a a a a [c c]

which highlights the insertion at the beginning of the line and the
change at the end of the line more clearly.

There is a change to the distance calculation as substitutions only
contributed the length of the deletion whereas now the same change will
add the length of both the deletion and the insertion. This is addressed
by doubling the contribution of unchanged sections so that the
denominator equals the sum of the distance contributions of the plus
line and minus line.
@dandavison
Copy link
Owner

dandavison commented Nov 26, 2022

@phillipwood this looks like one of the best improvements to delta that has been made since the project started. Thank you very much for working on this.

I'm very short of time currently unfortunately, but I'm certainly going to run your branch as my local delta and I've collected some screenshots of differences below. If some other delta contributors would like to try this branch out that would be great. Unless someone's aware of a major breaking change that some users are likely not to like, then I'm optimistic that we can merge and release this.

@phillipwood Is there any chance you'd be able to do the following? Look through some of the open issues that contain criticism of the details of delta's highlighting and comment on whether this branch fixes them / could fix them with modification / whether you think the issue should stay open?

Have you formed any view on what the default value of max-line-distance should be?

Before After
image image
image image
image image
image image
image image
image image

@phillipwood
Copy link
Contributor Author

phillipwood commented Nov 26, 2022

@dandavison Thanks, I'll try and take a look at the issues this week. I'm not sure what to do with max-line-distance. I've been using it unchanged and that seems to give reasonable results though there is some change it what gets highlighted and what does not. The screenshots below show the diff in the output between master and this branch when viewing the history of git (reading diffs of diffs takes a bit of getting used to but it is a convenient way to find changes in the output).

A couple of changes that get highlighted when they weren't before
Screenshot from 2022-11-26 15-30-18
image

A couple of examples where we lose the highlighting
image
image

Here are a couple examples of improvements which show changes being grouped together and the benefit of removing substitutions.
image
image

This is an improvement but I'm pretty sure it is just luck due to matching the longest prefix possible which helps when an item is inserted into a list.
image

I've not seen anything terrible but these are slightly worse now (due to the prefix matching which improved the example above). In the future I'd like to play with sliding the highlights to optimize their position in the same way git's diff.indentHeurstic does for diffs.
image
image

[edited to add some comments to explain the source of the change in highlighting]

@phillipwood
Copy link
Contributor Author

@dandavison I've been though all the open highlighting issues I could find and commented on the ones improved by this PR. Most of the issues seem to stem from problems pairing up added and deleted lines which are not addressed by this series. #521 is the closest match but that is already closed. #345 & #1181 can be closed if this is merged I think.

@dandavison dandavison merged commit 96c2fa8 into dandavison:master Nov 29, 2022
@dandavison
Copy link
Owner

LGTM! And thanks very much for going through those issues -- that's extremely useful.

@nestor-custodio
Copy link

I know the v0.15.0 release is a few days old at this point, but the highlighting improvement is so noticeable I just had to come back to this issue and say WOW. Thank you for making these changes!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants