fix!: store raw commit/tag actor headers and parse lazily (see #2177) #2253

Pingasmaster · 2025-11-09T10:11:02Z

This PR makes signature handling truly lossless for "creative" emails and other info. We now stash the raw name <email> slice on IdentityRef/SignatureRef and fall back to it when rewriting, so even commits with embedded angle brackets round-trip cleanly (might want to expand to other malformed characters before merging? idk). Parsing and serialization honor that flag but still keep strict validation for normal input. I also added regression coverage for these scenarios. Might not be the most elegant solution, as now every ctor/helper that builds signatures or identities explicitly sets raw: None. This is only me taking a stab at it for fun. It is not prod ready but just an idea.

Tries to help with #2177.

Byron

Thanks a lot for making this happen.

I took a first look and don't like it ~~at all~~, but also wouldn't know how to achieve round-tripping differently.

If you wouldn't mind, breaking changes should be in a separate commit, probably prefixed with fix!:, and all adjustments to other crates should go into another separate commit (not marked as breaking, assuming they aren't breaking).

Byron · 2025-11-10T04:16:34Z

> Might not be the most elegant solution, as now every ctor/helper that builds signatures or identities explicitly sets raw: None.

First of all, the raw field is only in the *Ref versions of the type, and there already was a precedent for altering these to support round-tripping. From that point of view, I think it's acceptable, but… this never had to go so far and implement a by-pass.

I was wondering… what if this raw information would be stored on the CommitRef instead? Then I'd even go as far as to turn the existing fields with parsed *Ref types into &BStr, turning it into the raw field effectively.

Then one can use commit.author to access the raw field, and commit.author() to get a fallible parsed version of it just like before.
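The field-versus-method split described above could look roughly like the following. This is only a minimal sketch using the standard library: `RawCommit`, `ParsedSignature`, `ParseError`, `parse_signature` and `trim` are hypothetical stand-ins, not the real `gix-object` or `gix-actor` types, and the parser is deliberately simplistic.

```rust
/// Hypothetical stand-in for a parsed signature (not `gix_actor::SignatureRef`).
#[derive(Debug, PartialEq)]
pub struct ParsedSignature<'a> {
    pub name: &'a [u8],
    pub email: &'a [u8],
    pub time: &'a [u8],
}

/// Hypothetical error type for this sketch.
#[derive(Debug, PartialEq)]
pub struct ParseError;

/// Hypothetical stand-in for a borrowed commit that keeps the header verbatim.
pub struct RawCommit<'a> {
    /// Raw bytes of the `author` header value, exactly as read.
    pub author: &'a [u8],
}

impl<'a> RawCommit<'a> {
    /// Parse the raw header on demand; this can fail, so it returns a `Result`.
    pub fn author(&self) -> Result<ParsedSignature<'a>, ParseError> {
        parse_signature(self.author)
    }
}

/// Remove leading and trailing ASCII spaces.
fn trim(mut bytes: &[u8]) -> &[u8] {
    while let [b' ', rest @ ..] = bytes {
        bytes = rest;
    }
    while let [rest @ .., b' '] = bytes {
        bytes = rest;
    }
    bytes
}

/// Simplistic strict parser for `name <email> time` (an assumption of this sketch).
/// It rejects values whose email part contains further angle brackets.
fn parse_signature(raw: &[u8]) -> Result<ParsedSignature<'_>, ParseError> {
    let open = raw.iter().position(|&b| b == b'<').ok_or(ParseError)?;
    let close = raw.iter().rposition(|&b| b == b'>').ok_or(ParseError)?;
    if close < open {
        return Err(ParseError);
    }
    let email = &raw[open + 1..close];
    if email.iter().any(|&b| b == b'<' || b == b'>') {
        return Err(ParseError);
    }
    Ok(ParsedSignature {
        name: trim(&raw[..open]),
        email,
        time: trim(&raw[close + 1..]),
    })
}
```

With this shape, a header that fails strict parsing is still available unchanged through the `author` field, while `author()` reports the failure to the caller.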

So this PR is definitely good at showing that alternative solutions like the one mentioned here are worth exploring.
Is this something you'd be interested in?

Pingasmaster · 2025-11-11T00:44:34Z

I'm not sure I can do exactly what you have in mind but I gave it a try. It's much cleaner code-wise at least. Divided it into 2 commits like you asked too, but I'm still not sure this is 100% ready. It's 2AM right now for me so I'll sleep and see tomorrow if I have anything else which might make this cleaner. Thanks for the feedback!

Byron

Thanks a lot for the second round!

Yes, using the fields directly and parsing on the fly is the way to go. In general, this parsing is now fallible, and .expect/.unwrap can't be used.

Besides that, it's definitely getting there, thanks again!

Byron · 2025-11-11T04:11:08Z

gix-object/src/commit/mod.rs

    /// Return the author, with whitespace trimmed.
    ///
    /// This is different from the `author` field which may contain whitespace.
    pub fn author(&self) -> gix_actor::SignatureRef<'a> {


These must be fallible, panics aren't allowed. It's OK to use the error type returned by SignatureRef::from_bytes().

Byron · 2025-11-11T04:13:18Z

gix-object/src/commit/write.rs

+}
+
+fn write_signature(mut out: &mut dyn io::Write, field: &[u8], raw: &bstr::BStr) -> io::Result<()> {
+    if signature_requires_raw(raw) {


I wonder why this differentiation is still required. In theory, the committer and author are now verbatim, which should always be what's written back.

Byron · 2025-11-11T04:15:56Z

gix-object/src/object/convert.rs

        } = other;
+        let tagger = tagger.map(|raw| {
+            gix_actor::SignatureRef::from_bytes::<()>(raw.as_ref())
+                .expect("signatures were validated during parsing")


Every time the signature is parsed it must be fallible.
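One way to avoid the `.expect(...)` on an optional header is to turn the `Option<Result<..>>` produced by `map` into a `Result<Option<..>>` with `transpose` and propagate it with `?`. The sketch below only illustrates that idiom: `parse_email` is a hypothetical, simplified stand-in for the real signature parser, and `ParseError` and `tagger_email` are made-up names.

```rust
/// Hypothetical error type for this sketch.
#[derive(Debug, PartialEq)]
pub struct ParseError;

/// Hypothetical stand-in for signature parsing: returns the bytes between
/// `<` and `>` and rejects nested angle brackets.
fn parse_email(raw: &[u8]) -> Result<&[u8], ParseError> {
    let open = raw.iter().position(|&b| b == b'<').ok_or(ParseError)?;
    let close = raw.iter().rposition(|&b| b == b'>').ok_or(ParseError)?;
    if close < open {
        return Err(ParseError);
    }
    let email = &raw[open + 1..close];
    if email.iter().any(|&b| b == b'<' || b == b'>') {
        return Err(ParseError);
    }
    Ok(email)
}

/// Convert an optional raw tagger header into an optional parsed email.
/// `map` yields `Option<Result<..>>`, `transpose` flips it to
/// `Result<Option<..>>`, and `?` propagates the error instead of panicking.
pub fn tagger_email(raw: Option<&[u8]>) -> Result<Option<&[u8]>, ParseError> {
    let email = raw.map(parse_email).transpose()?;
    Ok(email)
}
```

A missing tagger stays `Ok(None)`, and a malformed one becomes an `Err` that the caller has to handle.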

Byron · 2025-11-11T04:18:00Z

gix-object/src/tag/write.rs

+    gix_actor::SignatureRef::from_bytes::<()>(raw.as_ref()).expect("signatures were validated during parsing")
+}
+
+fn signature_requires_raw(raw: &BStr) -> bool {


Again, I think differentiating between these shouldn't be necessary.

Pingasmaster · 2025-11-11T13:57:58Z

I've adjusted the commit and tag APIs so they return fallible results. CommitRef::author/committer/time report Result<SignatureRef, decode::Error> and after successful decoding trim the parsed value. TagRef::tagger also follows the same pattern. I made the owned conversions (CommitRef::into_owned/to_owned and TagRef::into_owned) use TryFrom so everything propagates cleanly, but don't hesitate to tell me if you have a better idea.
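For readers unfamiliar with the pattern, a borrowed-to-owned conversion via `TryFrom` might be sketched like this. All types here (`RawCommitRef`, `OwnedCommit`, `Signature`, `ParseError`) and the `parse` helper are hypothetical simplifications for illustration, not the actual `gix-object` definitions.

```rust
use std::convert::TryFrom;

/// Hypothetical error type for this sketch.
#[derive(Debug, PartialEq)]
pub struct ParseError;

/// Hypothetical owned signature, reduced to name and email for brevity.
#[derive(Debug, PartialEq)]
pub struct Signature {
    pub name: Vec<u8>,
    pub email: Vec<u8>,
}

/// Hypothetical borrowed commit that keeps actor headers as raw bytes.
pub struct RawCommitRef<'a> {
    pub author: &'a [u8],
    pub committer: &'a [u8],
}

/// Hypothetical owned commit with fully parsed actors.
#[derive(Debug, PartialEq)]
pub struct OwnedCommit {
    pub author: Signature,
    pub committer: Signature,
}

/// Simplistic strict parser (an assumption of this sketch).
fn parse(raw: &[u8]) -> Result<Signature, ParseError> {
    let open = raw.iter().position(|&b| b == b'<').ok_or(ParseError)?;
    let close = raw.iter().rposition(|&b| b == b'>').ok_or(ParseError)?;
    if close < open {
        return Err(ParseError);
    }
    let email = &raw[open + 1..close];
    if email.iter().any(|&b| b == b'<' || b == b'>') {
        return Err(ParseError);
    }
    // Drop the single space that separates the name from `<`, if present.
    let name = raw[..open].strip_suffix(b" ").unwrap_or(&raw[..open]);
    Ok(Signature {
        name: name.to_vec(),
        email: email.to_vec(),
    })
}

impl TryFrom<RawCommitRef<'_>> for OwnedCommit {
    type Error = ParseError;

    /// Parsing happens only here, so the conversion itself is fallible.
    fn try_from(other: RawCommitRef<'_>) -> Result<Self, Self::Error> {
        Ok(OwnedCommit {
            author: parse(other.author)?,
            committer: parse(other.committer)?,
        })
    }
}
```

Callers would then write `OwnedCommit::try_from(commit_ref)?` where an infallible `into()` was used before.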

I’ve also removed the raw/canonical split. You were right, just streaming the stored header bytes and sizing them via raw.len() for the commit and tag writers is much better.
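As an illustration of the verbatim approach, a writer that streams the stored bytes and a matching size computation could be sketched as follows. `write_header` and `header_size` are hypothetical helper names, and the `<field> <value>\n` layout is the usual shape of a commit or tag header line; the actual code in the PR may be organized differently.

```rust
use std::io::{self, Write};

/// Write one header line as `<field> <raw>\n`, copying `raw` verbatim
/// without re-parsing or re-formatting it.
fn write_header(out: &mut dyn Write, field: &[u8], raw: &[u8]) -> io::Result<()> {
    out.write_all(field)?;
    out.write_all(b" ")?;
    out.write_all(raw)?;
    out.write_all(b"\n")
}

/// Size in bytes of the line produced by `write_header`:
/// field name, one space, the raw value, and the trailing newline.
fn header_size(field: &[u8], raw: &[u8]) -> usize {
    field.len() + 1 + raw.len() + 1
}
```

Because nothing is re-serialized, a header with unusual content such as doubled angle brackets comes out byte-for-byte identical to what was read.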

Every place that re-parses actors during conversion or utility code should be fallible now. Tell me if you see anything else.

Byron · 2025-11-22T10:16:57Z

My apologies for letting this PR wait for so long. I do hope to get to it this weekend.

Byron · 2025-11-30T14:25:17Z

Thanks for the patience - I will do a final pass now and make sure this PR gets merged - no more action is needed from your side.

Copilot

Pull request overview

This PR changes the signature handling in gitoxide to be lossless by storing raw commit/tag actor headers as byte slices and parsing them on-demand, rather than parsing them eagerly during deserialization. This allows round-tripping of commits and tags with malformed signature headers (like embedded angle brackets).

- Changed CommitRef and TagRef to store raw signature bytes (&'a BStr) instead of parsed SignatureRef structures
- Added author() and committer() methods to CommitRef, and tagger() method to TagRef that parse signatures on demand
- Changed conversions from *Ref to owned types (Commit, Tag) from From to TryFrom to handle parsing errors

Reviewed changes

Copilot reviewed 21 out of 21 changed files in this pull request and generated 4 comments.


| File | Description |
| --- | --- |
| gix/tests/gix/repository/object.rs | Updated tests to call new `tagger()`, `author()`, and `committer()` methods with error handling |
| gix/src/revision/spec/parse/error.rs | Added fallback error handling when parsing committer signature fails |
| gix/src/object/commit.rs | Updated documentation to reference the new accessor methods |
| gix/src/commit.rs | Updated `describe` functions to handle the new `Result`-returning `tagger()` method |
| gix-odb/tests/odb/store/loose.rs | Removed helper function and updated tests to use raw signature bytes |
| gix-object/tests/object/tag/mod.rs | Updated tests to use raw signature bytes and added test for `tagger()` method |
| gix-object/tests/object/encode/mod.rs | Changed conversions from `into()` to `try_from()` to handle new fallible conversions |
| gix-object/tests/object/commit/message.rs | Updated tests to use raw signature bytes and validate accessor methods |
| gix-object/tests/object/commit/from_bytes.rs | Updated tests to use raw signature bytes, added validation of accessor methods, and added new test for `author_method_returns_trimmed_signature()` |
| gix-object/src/tag/write.rs | Changed serialization to write raw bytes directly instead of using signature formatting |
| gix-object/src/tag/mod.rs | Added `tagger()` method that parses raw bytes on-demand and changed `into_owned()` to return `Result` |
| gix-object/src/tag/decode.rs | Changed parsing to use `signature_with_raw` helper that captures both parsed signature and raw bytes |
| gix-object/src/parse.rs | Added `signature_with_raw()` helper function that returns both parsed signature and raw input bytes |
| gix-object/src/object/mod.rs | Changed `into_owned()` and `to_owned()` methods to return `Result` due to fallible conversions |
| gix-object/src/object/convert.rs | Changed `From` implementations to `TryFrom` for `TagRef`, `CommitRef`, and `ObjectRef` conversions with signature parsing |
| gix-object/src/lib.rs | Changed `CommitRef` fields `author` and `committer`, and `TagRef` field `tagger` to store raw bytes instead of parsed signatures |
| gix-object/src/commit/write.rs | Changed serialization to write raw bytes directly instead of using signature formatting |
| gix-object/src/commit/mod.rs | Added `parse_signature()` helper and changed `author()`, `committer()`, and `time()` methods to return `Result`; changed `into_owned()` and `to_owned()` to return `Result` |
| gix-object/src/commit/decode.rs | Changed parsing to use `signature_with_raw` helper that captures both parsed signature and raw bytes |
| gix-merge/src/commit/virtual_merge_base.rs | Added error type for decode errors and updated conversion to use `try_from()` |
| examples/log.rs | Updated to use the new `author()` method with error handling |

gix-object/src/commit/decode.rs

Copilot · 2025-11-30T14:29:38Z

gix-object/src/tag/decode.rs

+            parse::header_field(i, b"tagger", parse::signature_with_raw).map(|(signature, raw)| {
+                let _ = signature;
+                raw
+            })


[nitpick] The let _ = signature; pattern is used to explicitly discard the parsed signature value. Using an underscore pattern _ in the closure parameter would be more idiomatic:

parse::header_field(i, b"tagger", parse::signature_with_raw).map(|(_, raw)| raw)

Suggested change

-            parse::header_field(i, b"tagger", parse::signature_with_raw).map(|(signature, raw)| {
-                let _ = signature;
-                raw
-            })
+            parse::header_field(i, b"tagger", parse::signature_with_raw).map(|(_, raw)| raw)

gix-object/src/object/convert.rs

Note that this means you have to call `CommitRef::committer()?` or `CommitRef::author()?`, and `TagRef::tagger()?`, instead of assuming pre-parsed fields. This PR makes signature handling truly lossless for "creative" emails and other info. We now stash the raw name <email> slice on IdentityRef/SignatureRef and fall back to it when rewriting, so even commits with embedded angle brackets round-trip cleanly (might want to expand to other malformed characters before merging?). Parsing and serialization honor that flag but still keep strict validation for normal input. I also added regression coverage for these scenarios.

- remove unwraps()
- reduce duplication

Byron

Alright, this will work :)!
I also added an assertion to show that #2177 is now indeed fixed.

Byron reviewed Nov 9, 2025


Pingasmaster force-pushed the raw-email-attempt-fix branch from 2a27780 to 250d531 Compare November 11, 2025 00:38

Pingasmaster force-pushed the raw-email-attempt-fix branch 3 times, most recently from ddf9121 to 678bba4 Compare November 11, 2025 02:06

Pingasmaster changed the title ~~Enable SignatureRef/IdentityRef to preserve raw actor bytes for round-tripping malformed commits (see #2177)~~ fix!: store raw commit/tag actor headers and parse lazily (see #2177) Nov 11, 2025

Pingasmaster force-pushed the raw-email-attempt-fix branch 2 times, most recently from 114f986 to ddf21a1 Compare November 11, 2025 03:28

Byron requested changes Nov 11, 2025


Pingasmaster force-pushed the raw-email-attempt-fix branch from c423fbe to f547dcc Compare November 11, 2025 14:46

Byron self-assigned this Nov 30, 2025

Byron requested a review from Copilot November 30, 2025 14:24

Copilot started reviewing on behalf of Byron November 30, 2025 14:25

Copilot finished reviewing on behalf of Byron November 30, 2025 14:28

Copilot AI reviewed Nov 30, 2025


Byron force-pushed the raw-email-attempt-fix branch from f547dcc to 797c6c5 Compare November 30, 2025 15:58

refactor

6f7b23a

- remove unwraps()
- reduce duplication

Byron linked an issue Nov 30, 2025 that may be closed by this pull request

The CommitRef from invalid_email_or_committer test cannot be materialized or round-tripped #2177

Closed

Byron force-pushed the raw-email-attempt-fix branch from 797c6c5 to 6f7b23a Compare November 30, 2025 16:09

Byron approved these changes Nov 30, 2025


Byron enabled auto-merge November 30, 2025 16:10

Byron merged commit f471ac5 into GitoxideLabs:main Nov 30, 2025
28 checks passed
