Better diff algorithm #15

Closed
dmerejkowsky opened this issue Oct 18, 2018 · 10 comments · Fixed by #66

Comments

@dmerejkowsky
Collaborator

dmerejkowsky commented Oct 18, 2018

Right now, if you replace foo_bar with spam_eggs, the underscore won't get colored when printing the diff.

I've tried and failed to fix the algorithm, so maybe we should use another diff library instead.

Maybe this one: https://docs.rs/diff/0.1.11/diff/index.html ?

Or we port diff-so-fancy from Perl to Rust :P

dmerejkowsky added a commit that referenced this issue Oct 19, 2018
@ErichDonGubler

ErichDonGubler commented Oct 23, 2018

Porting diff-so-fancy sounds like an EXCELLENT idea, but then again I'm all for rewriting things in Rust, so...

EDIT: The above is a joke. diff-so-fancy doesn't actually implement a diffing algorithm, it just prettifies a unified format patch and brings some config knobs with it. Please don't do this unless you plan on bringing in another diffing library as a dependency.

@ErichDonGubler

Another alternative that looks like it might do what you want is difference: https://crates.io/crates/difference

@dmerejkowsky
Collaborator Author

We're already using difference :)

@cgestes

cgestes commented Oct 24, 2018

I believe this should be done in two steps.

For the input: diffing the input like sed would do for: s/input//
For the output: diffing the previous output with the expected output.

So basically:

ruplacer foo_bar lol_rofl

Input: I love foo_bar eggs.
Output: I love lol_rofl eggs.
Intermediary output: I love eggs.

IN: diff input / Intermediary
OUT: diff Intermediary / Output
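
The three strings in this proposal can be sketched as follows. This is only an illustration for a literal (non-regex) pattern; the `stages` helper is a hypothetical name, not something taken from ruplacer, and the two diffs themselves would still have to be computed by whatever diff library is in use.

```rust
/// Hypothetical helper (not from ruplacer): returns the three strings
/// (input, intermediary, output) for a literal pattern.
fn stages(input: &str, pattern: &str, replacement: &str) -> (String, String, String) {
    // Intermediary: the input with every occurrence of the pattern removed,
    // like `s/pattern//g` in sed.
    let intermediary = input.replace(pattern, "");
    // Output: the input with every occurrence replaced.
    let output = input.replace(pattern, replacement);
    (input.to_string(), intermediary, output)
}

fn main() {
    let (input, intermediary, output) = stages("I love foo_bar eggs.", "foo_bar", "lol_rofl");
    println!("Input: {}", input);
    println!("Intermediary output: {}", intermediary);
    println!("Output: {}", output);
    // IN would then be diff(input, intermediary),
    // OUT would be diff(intermediary, output).
}
```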

@dmerejkowsky
Collaborator Author

Interesting. This way, instead of showing the reality of the diff, we show the user what they meant.

Example:

# Whatever diff lib we use, we don't know about the pattern and replacement there,
# so we print the 'real' diff:
$ ruplacer foo_bar_baz foo_bar_spam
-- baz
++ spam
# With @cgestes's suggestion:
# We know about the 'foo_bar_baz' pattern and
# we can use it to display the replacements:
$ ruplacer foo_bar_baz foo_bar_spam
-- foo_bar_baz
++ foo_bar_spam

The second option may give a more verbose output, but it will help if a complex regex with captured groups is used.

@dmerejkowsky
Collaborator Author

Sadly going through an intermediary string still does not work:

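use difference::Changeset;

// BetterReplacement (definition not shown) holds the three strings:
// start (input), middle (intermediary) and end (output).
impl BetterReplacement {
    fn print_self(&self) {
        // First diff: input against intermediary
        let changeset = Changeset::new(&self.start, &self.middle, "");
        // ...
        // Second diff: intermediary against output
        let changeset = Changeset::new(&self.middle, &self.end, "");
        // ...
    }
}

let start = "I love foo and eggs";
let middle = "I love and eggs";
let end = "I love bar and eggs";

let replacement = BetterReplacement { start, middle, end };
replacement.print_self();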

This gives the following result (where _ marks what gets colored):

-- I love foo and eggs
          _____
++ I love bar and eggs
          _ ___

I think the only way to move forward is to compute the changed indexes "by hand" when ruplacing.
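
As a rough illustration of computing the changed indexes by hand, here is a minimal sketch. It assumes a literal pattern and ASCII text (byte offsets are used as columns); `changed_ranges` and `underline` are hypothetical names, not ruplacer's actual functions.

```rust
/// Byte ranges (start, end) of every literal occurrence of `pattern` in `line`.
/// Hypothetical helper for illustration only.
fn changed_ranges(line: &str, pattern: &str) -> Vec<(usize, usize)> {
    line.match_indices(pattern)
        .map(|(start, matched)| (start, start + matched.len()))
        .collect()
}

/// Builds a marker line with '_' under each changed byte and spaces elsewhere.
/// Assumes ASCII input, so one byte is one column.
fn underline(line: &str, ranges: &[(usize, usize)]) -> String {
    let mut marks = vec![' '; line.len()];
    for &(start, end) in ranges {
        for mark in &mut marks[start..end] {
            *mark = '_';
        }
    }
    marks.into_iter().collect::<String>().trim_end().to_string()
}

fn main() {
    let line = "I love foo and eggs";
    let ranges = changed_ranges(line, "foo");
    println!("-- {}", line);
    println!("   {}", underline(line, &ranges));
}
```

Because the ranges come from the match itself rather than from a diff of two strings, only the matched text is marked.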

dmerejkowsky added a commit that referenced this issue Jul 27, 2020
Instead of computing the new line and then trying to guess the diff,
we compute the positions where the string needs to change and use that
to both print the diffs and compute the new string

Fix #15
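
The idea in the commit message above (compute the positions once, then use them both to build the new string and to print the diff) might look roughly like this. It is a sketch under assumptions, not the code from the commit: it handles literal patterns only, uses byte offsets, and `replace_with_positions` is a made-up name.

```rust
/// For each match, a pair of byte ranges:
/// (range in the input, range in the output).
type Positions = Vec<((usize, usize), (usize, usize))>;

/// Hypothetical helper: builds the new string while recording where each
/// replacement starts and ends in both the input and the output.
fn replace_with_positions(input: &str, pattern: &str, replacement: &str) -> (String, Positions) {
    let mut output = String::new();
    let mut positions = Vec::new();
    let mut last = 0; // end of the previous match in the input
    for (start, matched) in input.match_indices(pattern) {
        // Copy the unchanged text between the previous match and this one.
        output.push_str(&input[last..start]);
        let out_start = output.len();
        output.push_str(replacement);
        let end = start + matched.len();
        positions.push(((start, end), (out_start, output.len())));
        last = end;
    }
    // Copy whatever follows the last match.
    output.push_str(&input[last..]);
    (output, positions)
}

fn main() {
    let input = "I love foo_bar eggs, foo_bar!";
    let (output, positions) = replace_with_positions(input, "foo_bar", "lol_rofl");
    println!("-- {}", input);
    println!("++ {}", output);
    for ((in_start, in_end), (out_start, out_end)) in positions {
        println!(
            "input[{}..{}] -> output[{}..{}]",
            in_start, in_end, out_start, out_end
        );
    }
}
```

The recorded ranges are exactly the spans to color on the "--" and "++" lines, so no separate diffing step is needed.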
@dmerejkowsky
Collaborator Author

dmerejkowsky commented May 10, 2021

OK, so I managed to write a POC in Python. It's in the python-poc branch if you want to check it out.

All that's left to do is to rewrite it in Rust :)

@dmerejkowsky
Collaborator Author

Phew - it's done. All that's left is tidying up the history of the refactor branch - let's tackle that next time

@sergeevabc

sergeevabc commented Feb 12, 2024

This thread mentions the diff variation known as unified, where the original and the change are written one under the other in a column. However, when I work with version 0.8.2 of the app, this variation is missing: the original and the change are written on one long line. This is inconvenient on a small screen. Can you do something about it, please?
