Skip to content

feat(patch_set): support git diff application#59

Open
weihanglo wants to merge 10 commits intobmwill:masterfrom
weihanglo:gitdiff
Open

feat(patch_set): support git diff application#59
weihanglo wants to merge 10 commits intobmwill:masterfrom
weihanglo:gitdiff

Conversation

@weihanglo
Copy link
Copy Markdown
Contributor

@weihanglo weihanglo commented Apr 12, 2026

Add git diff output parsing and application support. Some hightlights:

  • The first few refactor commits add parse options for Patch, making skipping preamble and rejecting hunks configurable. Optional preamble skipping is required for git diff parsing because git diff patch may have only git headers without any unidiff portion, followed by the next diff --git header. If we still skip preamble, the next diff --git will be swallowed.
  • Binary patch is not yet supported, though we have skip and fail options which user can configure to either skip binary patch, or fail on encountering a binary patch. I plan to add a Keep enum variant when we add the support the binary patch parsing/application. (See https://github.com/weihanglo/diffy/blob/4fc7c4f0e4bc6db55da6a8eb92f4cdf20113225c/src/patches/parse.rs#L172-L197)

}
};

// FIXME: error spans point at `diff --git` line, not the specific offending line
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Leave it for future, I am lazy again to write new code 🙇🏾‍♂️.

@weihanglo weihanglo force-pushed the gitdiff branch 2 times, most recently from ffb8730 to 2515dd3 Compare April 12, 2026 21:08
Comment on lines +333 to +349
// Can't use `--numstat` for GitDiff: it shows `-\t-\t` for both
// actual binary diffs AND pure binary renames (100% similarity).
// Parser correctly handles pure renames (rename headers, no binary content).
//
// Use `--raw` for total count, subtract actual binary markers from diff.
//
// TODO: once `--binary` is passed above, count ALL `--raw`
// entries — every file will have patch data (delta, literal, or text).
let raw = git(repo, &["diff", "--raw", parent, child]);
let total = raw.lines().filter(|l| !l.is_empty()).count();
let type_changes = count_type_changes(&raw);
let binary = diff_output
.lines()
.filter(|l| l.starts_with("Binary files ") || l.starts_with("GIT binary patch"))
.count();
skipped += binary;
total - binary + type_changes
Copy link
Copy Markdown
Contributor Author

@weihanglo weihanglo Apr 12, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a partial port of https://github.com/weihanglo/diffy/blob/e50f9286b95671b0f131fb5ac4f5725933e78232/tests/replay.rs#L292-L304. We will remove the binary filter eventually to avoid duplicate the parsing logic in test when we add binary patch support.

@weihanglo weihanglo force-pushed the gitdiff branch 2 times, most recently from eaa47ff to 49376b2 Compare April 13, 2026 19:24
`.lines()` strips line endings, so callers tracking byte offsets
need to re-add the `\r\n` or `\n` length manually.
Extract the repeated inline pattern into a reusable helper.
* Parse `diff --git` extended headers
* split multi-file git diffs at `diff --git` boundaries
Compat test for also `git apply`.
Unlike unidiff,
gitdiff produces patches for empty file creations/deletions
(`0\t0` in numstat)
because they carry `diff --git` + extended headers even without hunks.

Binary files (`-\t-\t`) are skipped in gitdiff mode for now.
Comment on lines +116 to +120
/// Skip binary diffs silently.
pub fn skip_binary(mut self) -> Self {
self.binary = Binary::Skip;
self
}
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not good actually. We may want a binary marker like patch so people explicitly know they are skipping something

Comment on lines +491 to +494
/// * `a/<path> b/<path>` (default prefix)
/// * `<path> <path>` (`git diff --no-prefix`)
/// * `<src-prefix><path> <dst-prefix><path>` (custom prefix)
/// * `"<prefix><path>" "<prefix><path>"` (quoted, with escapes)
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should add more compat tests around this logic.

Comment on lines +388 to +396
return Ok(FileOperation::Rename {
from: Cow::Borrowed(from),
to: Cow::Borrowed(to),
});
}
if let (Some(from), Some(to)) = (header.copy_from, header.copy_to) {
return Ok(FileOperation::Copy {
from: Cow::Borrowed(from),
to: Cow::Borrowed(to),
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We might return the quoted filename. This need to be fixed.

fn extract_file_op_gitdiff<'a>(
header: &GitHeader<'a>,
patch: &Patch<'a, str>,
) -> Result<FileOperation<'a>, PatchSetParseError> {
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pre-existing issue: This FileOperation preserves the raw path with prefix unstripped, e.g., a/src/lib.rs and b/src/lib.rs. I personally think this is the right chose on syntactic level and consumer should know their patch better than us. However, the API doc should call this out more explicitly.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also note that we have yet supported non-UTF8 path even in #64.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant