Skip to content

Conversation

@Pingasmaster
Copy link
Contributor

This PR makes signature handling truly lossless for "creative" emails and other info. We now stash the raw name slice on IdentityRef/SignatureRef and fall back to it when rewriting, so even commits with embedded angle brackets round-trip cleanly (might want to expand to other malformed characters before merging? idk). Parsing and serialization honor that flag but still keep strict validation for normal input. I also added regression coverage for these scenarios. Might not be the most elegant solution, as now every ctor/helper that builds signatures or identities explicitly sets raw: None. This is only me taking a stab at it for fun. It is not prod ready but just an idea.

Tries to help with #2177.

Copy link
Member

@Byron Byron left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot for making this happen.

I took a first look and don't like it at all, but also wouldn't know how to achieve round-tripping differently.

If you wouldn't mind, breaking changes should be in a separate commit and prefixed with fix!: probably, and all adjustments to other crates go into a separate commit (not marked as breaking (assuming they aren't breaking).

@Byron
Copy link
Member

Byron commented Nov 10, 2025

Might not be the most elegant solution, as now every ctor/helper that builds signatures or identities explicitly sets raw: None.

First of all, the raw field is only in the *Ref versions of the type, and there already was a precedent for altering these to support round-tripping. From that point of view, I think it's acceptable, but… this never had to go so far and implement a by-pass.

I was wondering… what if this raw information would be stored on the CommitRef instead? Then I'd even go as far as to turn the existing fields with parsed *Ref types into &BStr, turning it into the raw field effectively.

Then one can use commit.author to access the raw field, and commit.author() to get a fallible parsed version of it just like before.

So this PR is definitely good at showing that alternative solutions like the one mentioned here a worth exploring.
Is this something you'd be interested in?

@Pingasmaster
Copy link
Contributor Author

Pingasmaster commented Nov 11, 2025

I'm not sure I can do exactly what you have in mind but I gave it a try. It's much cleaner code-wise at least. Divided it into 2 commits like you asked too, but I'm still not sure this is 100% ready. It's 2AM right now for me so I'll sleep and see tomorrow if I have anything else which might make this cleaner. Thanks for the feedback!

@Pingasmaster Pingasmaster force-pushed the raw-email-attempt-fix branch 3 times, most recently from ddf9121 to 678bba4 Compare November 11, 2025 02:06
@Pingasmaster Pingasmaster changed the title Enable SignatureRef/IdentityRef to preserve raw actor bytes for round-tripping malformed commits (see #2177) fix!: store raw commit/tag actor headers and parse lazily (see #2177) Nov 11, 2025
@Pingasmaster Pingasmaster force-pushed the raw-email-attempt-fix branch 2 times, most recently from 114f986 to ddf21a1 Compare November 11, 2025 03:28
Copy link
Member

@Byron Byron left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot for the second round!

Yes, using the fields directly and parsing on the fly is the way to go. In general, this parsing is now fallible, and .expect/.unwrap can't be used.

Besides that, it's definitely getting there, thanks again!

/// Return the author, with whitespace trimmed.
///
/// This is different from the `author` field which may contain whitespace.
pub fn author(&self) -> gix_actor::SignatureRef<'a> {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These must be fallible, panics aren't allowed. It's OK to use the error type returned by SignatureRef::from_bytes().

}

fn write_signature(mut out: &mut dyn io::Write, field: &[u8], raw: &bstr::BStr) -> io::Result<()> {
if signature_requires_raw(raw) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder why this differentiation is still required. In theory, the committer and author are now verbatim, which should always be what's written back.

} = other;
let tagger = tagger.map(|raw| {
gix_actor::SignatureRef::from_bytes::<()>(raw.as_ref())
.expect("signatures were validated during parsing")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Every time the signature is parsed it must be fallible.

gix_actor::SignatureRef::from_bytes::<()>(raw.as_ref()).expect("signatures were validated during parsing")
}

fn signature_requires_raw(raw: &BStr) -> bool {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again, I think differentiating between these shoudln't be necessary.

@Pingasmaster
Copy link
Contributor Author

I've adjusted the commit and tag APIs so they return fallible results. CommitRef::author/committer/time report Result<SignatureRef, decode::Error> and after successful decoding trim the parsed value. TagRef::tagger also follows the same pattern. I made the owned conversions (CommitRef::into_owned/to_owned and TagRef::into_owned) use TryFrom so everything propagates cleanly, but don't hesitate to tell me if you have a better idea.

I’ve also removed the raw/canonical split. You were right, just streamring the stored header bytes and sizing them via raw.len() for commits and tag writers is much better.

Every place that re-parses actors during conversion or utility code should be fallible now. Tell me if you see anything else.

@Byron
Copy link
Member

Byron commented Nov 22, 2025

My apologies for letting this PR wait for so long. I do hope to get to it this weekend.

@Byron Byron self-assigned this Nov 30, 2025
@Byron Byron requested a review from Copilot November 30, 2025 14:24
@Byron
Copy link
Member

Byron commented Nov 30, 2025

Thanks for the patience - I will do a final pass now and make sure this PR gets merged - no more action is needed from your side.

Copilot finished reviewing on behalf of Byron November 30, 2025 14:28
Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR changes the signature handling in gitoxide to be lossless by storing raw commit/tag actor headers as byte slices and parsing them on-demand, rather than parsing them eagerly during deserialization. This allows round-tripping of commits and tags with malformed signature headers (like embedded angle brackets).

  • Changed CommitRef and TagRef to store raw signature bytes (&'a BStr) instead of parsed SignatureRef structures
  • Added author() and committer() methods to CommitRef, and tagger() method to TagRef that parse signatures on demand
  • Changed conversions from *Ref to owned types (Commit, Tag) from From to TryFrom to handle parsing errors

Reviewed changes

Copilot reviewed 21 out of 21 changed files in this pull request and generated 4 comments.

Show a summary per file
File Description
gix/tests/gix/repository/object.rs Updated tests to call new tagger(), author(), and committer() methods with error handling
gix/src/revision/spec/parse/error.rs Added fallback error handling when parsing committer signature fails
gix/src/object/commit.rs Updated documentation to reference the new accessor methods
gix/src/commit.rs Updated describe functions to handle the new Result-returning tagger() method
gix-odb/tests/odb/store/loose.rs Removed helper function and updated tests to use raw signature bytes
gix-object/tests/object/tag/mod.rs Updated tests to use raw signature bytes and added test for tagger() method
gix-object/tests/object/encode/mod.rs Changed conversions from into() to try_from() to handle new fallible conversions
gix-object/tests/object/commit/message.rs Updated tests to use raw signature bytes and validate accessor methods
gix-object/tests/object/commit/from_bytes.rs Updated tests to use raw signature bytes, added validation of accessor methods, and added new test for author_method_returns_trimmed_signature()
gix-object/src/tag/write.rs Changed serialization to write raw bytes directly instead of using signature formatting
gix-object/src/tag/mod.rs Added tagger() method that parses raw bytes on-demand and changed into_owned() to return Result
gix-object/src/tag/decode.rs Changed parsing to use signature_with_raw helper that captures both parsed signature and raw bytes
gix-object/src/parse.rs Added signature_with_raw() helper function that returns both parsed signature and raw input bytes
gix-object/src/object/mod.rs Changed into_owned() and to_owned() methods to return Result due to fallible conversions
gix-object/src/object/convert.rs Changed From implementations to TryFrom for TagRef, CommitRef, and ObjectRef conversions with signature parsing
gix-object/src/lib.rs Changed CommitRef fields author and committer, and TagRef field tagger to store raw bytes instead of parsed signatures
gix-object/src/commit/write.rs Changed serialization to write raw bytes directly instead of using signature formatting
gix-object/src/commit/mod.rs Added parse_signature() helper and changed author(), committer(), and time() methods to return Result; changed into_owned() and to_owned() to return Result
gix-object/src/commit/decode.rs Changed parsing to use signature_with_raw helper that captures both parsed signature and raw bytes
gix-merge/src/commit/virtual_merge_base.rs Added error type for decode errors and updated conversion to use try_from()
examples/log.rs Updated to use the new author() method with error handling

Comment on lines 23 to 26
parse::header_field(i, b"tagger", parse::signature_with_raw).map(|(signature, raw)| {
let _ = signature;
raw
})
Copy link

Copilot AI Nov 30, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[nitpick] The let _ = signature; pattern is used to explicitly discard the parsed signature value. Using an underscore pattern _ in the closure parameter would be more idiomatic:

parse::header_field(i, b"tagger", parse::signature_with_raw).map(|(_, raw)| raw)
Suggested change
parse::header_field(i, b"tagger", parse::signature_with_raw).map(|(signature, raw)| {
let _ = signature;
raw
})
parse::header_field(i, b"tagger", parse::signature_with_raw).map(|(_, raw)| raw)

Copilot uses AI. Check for mistakes.
Note that this means you have to call `CommitRef::commiter|author()?` and `TagRef::tagger()?` instead
of assuming pre-parsed fields.

This PR makes signature handling truly lossless for "creative" emails and other info.
We now stash the raw name <email> slice on IdentityRef/SignatureRef and fall back to it when rewriting,
so even commits with embedded angle brackets round-trip cleanly (might want to expand to other malformed characters before merging?
Parsing and serialization honor that flag but still keep strict validation for normal input.
I also added regression coverage for these scenarios.
@Byron Byron force-pushed the raw-email-attempt-fix branch from f547dcc to 797c6c5 Compare November 30, 2025 15:58
- remove unwraps()
- reduce duplication
Copy link
Member

@Byron Byron left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alright, this will work :)!
I also added an assertion to show that #2177 is now indeed fixed.

@Byron Byron enabled auto-merge November 30, 2025 16:10
@Byron Byron merged commit f471ac5 into GitoxideLabs:main Nov 30, 2025
28 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

The CommitRef from invalid_email_or_committer test cannot be materialized or round-tripped

2 participants