-
-
Notifications
You must be signed in to change notification settings - Fork 404
fix!: store raw commit/tag actor headers and parse lazily (see #2177) #2253
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks a lot for making this happen.
I took a first look and don't like it at all, but also wouldn't know how to achieve round-tripping differently.
If you wouldn't mind, breaking changes should be in a separate commit and prefixed with fix!: probably, and all adjustments to other crates go into a separate commit (not marked as breaking (assuming they aren't breaking).
First of all, the I was wondering… what if this Then one can use So this PR is definitely good at showing that alternative solutions like the one mentioned here a worth exploring. |
2a27780 to
250d531
Compare
|
I'm not sure I can do exactly what you have in mind but I gave it a try. It's much cleaner code-wise at least. Divided it into 2 commits like you asked too, but I'm still not sure this is 100% ready. It's 2AM right now for me so I'll sleep and see tomorrow if I have anything else which might make this cleaner. Thanks for the feedback! |
ddf9121 to
678bba4
Compare
114f986 to
ddf21a1
Compare
Byron
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks a lot for the second round!
Yes, using the fields directly and parsing on the fly is the way to go. In general, this parsing is now fallible, and .expect/.unwrap can't be used.
Besides that, it's definitely getting there, thanks again!
gix-object/src/commit/mod.rs
Outdated
| /// Return the author, with whitespace trimmed. | ||
| /// | ||
| /// This is different from the `author` field which may contain whitespace. | ||
| pub fn author(&self) -> gix_actor::SignatureRef<'a> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These must be fallible, panics aren't allowed. It's OK to use the error type returned by SignatureRef::from_bytes().
gix-object/src/commit/write.rs
Outdated
| } | ||
|
|
||
| fn write_signature(mut out: &mut dyn io::Write, field: &[u8], raw: &bstr::BStr) -> io::Result<()> { | ||
| if signature_requires_raw(raw) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder why this differentiation is still required. In theory, the committer and author are now verbatim, which should always be what's written back.
gix-object/src/object/convert.rs
Outdated
| } = other; | ||
| let tagger = tagger.map(|raw| { | ||
| gix_actor::SignatureRef::from_bytes::<()>(raw.as_ref()) | ||
| .expect("signatures were validated during parsing") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Every time the signature is parsed it must be fallible.
gix-object/src/tag/write.rs
Outdated
| gix_actor::SignatureRef::from_bytes::<()>(raw.as_ref()).expect("signatures were validated during parsing") | ||
| } | ||
|
|
||
| fn signature_requires_raw(raw: &BStr) -> bool { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Again, I think differentiating between these shoudln't be necessary.
|
I've adjusted the commit and tag APIs so they return fallible results. I’ve also removed the raw/canonical split. You were right, just streamring the stored header bytes and sizing them via Every place that re-parses actors during conversion or utility code should be fallible now. Tell me if you see anything else. |
c423fbe to
f547dcc
Compare
|
My apologies for letting this PR wait for so long. I do hope to get to it this weekend. |
|
Thanks for the patience - I will do a final pass now and make sure this PR gets merged - no more action is needed from your side. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
This PR changes the signature handling in gitoxide to be lossless by storing raw commit/tag actor headers as byte slices and parsing them on-demand, rather than parsing them eagerly during deserialization. This allows round-tripping of commits and tags with malformed signature headers (like embedded angle brackets).
- Changed
CommitRefandTagRefto store raw signature bytes (&'a BStr) instead of parsedSignatureRefstructures - Added
author()andcommitter()methods toCommitRef, andtagger()method toTagRefthat parse signatures on demand - Changed conversions from
*Refto owned types (Commit,Tag) fromFromtoTryFromto handle parsing errors
Reviewed changes
Copilot reviewed 21 out of 21 changed files in this pull request and generated 4 comments.
Show a summary per file
| File | Description |
|---|---|
| gix/tests/gix/repository/object.rs | Updated tests to call new tagger(), author(), and committer() methods with error handling |
| gix/src/revision/spec/parse/error.rs | Added fallback error handling when parsing committer signature fails |
| gix/src/object/commit.rs | Updated documentation to reference the new accessor methods |
| gix/src/commit.rs | Updated describe functions to handle the new Result-returning tagger() method |
| gix-odb/tests/odb/store/loose.rs | Removed helper function and updated tests to use raw signature bytes |
| gix-object/tests/object/tag/mod.rs | Updated tests to use raw signature bytes and added test for tagger() method |
| gix-object/tests/object/encode/mod.rs | Changed conversions from into() to try_from() to handle new fallible conversions |
| gix-object/tests/object/commit/message.rs | Updated tests to use raw signature bytes and validate accessor methods |
| gix-object/tests/object/commit/from_bytes.rs | Updated tests to use raw signature bytes, added validation of accessor methods, and added new test for author_method_returns_trimmed_signature() |
| gix-object/src/tag/write.rs | Changed serialization to write raw bytes directly instead of using signature formatting |
| gix-object/src/tag/mod.rs | Added tagger() method that parses raw bytes on-demand and changed into_owned() to return Result |
| gix-object/src/tag/decode.rs | Changed parsing to use signature_with_raw helper that captures both parsed signature and raw bytes |
| gix-object/src/parse.rs | Added signature_with_raw() helper function that returns both parsed signature and raw input bytes |
| gix-object/src/object/mod.rs | Changed into_owned() and to_owned() methods to return Result due to fallible conversions |
| gix-object/src/object/convert.rs | Changed From implementations to TryFrom for TagRef, CommitRef, and ObjectRef conversions with signature parsing |
| gix-object/src/lib.rs | Changed CommitRef fields author and committer, and TagRef field tagger to store raw bytes instead of parsed signatures |
| gix-object/src/commit/write.rs | Changed serialization to write raw bytes directly instead of using signature formatting |
| gix-object/src/commit/mod.rs | Added parse_signature() helper and changed author(), committer(), and time() methods to return Result; changed into_owned() and to_owned() to return Result |
| gix-object/src/commit/decode.rs | Changed parsing to use signature_with_raw helper that captures both parsed signature and raw bytes |
| gix-merge/src/commit/virtual_merge_base.rs | Added error type for decode errors and updated conversion to use try_from() |
| examples/log.rs | Updated to use the new author() method with error handling |
gix-object/src/tag/decode.rs
Outdated
| parse::header_field(i, b"tagger", parse::signature_with_raw).map(|(signature, raw)| { | ||
| let _ = signature; | ||
| raw | ||
| }) |
Copilot
AI
Nov 30, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[nitpick] The let _ = signature; pattern is used to explicitly discard the parsed signature value. Using an underscore pattern _ in the closure parameter would be more idiomatic:
parse::header_field(i, b"tagger", parse::signature_with_raw).map(|(_, raw)| raw)| parse::header_field(i, b"tagger", parse::signature_with_raw).map(|(signature, raw)| { | |
| let _ = signature; | |
| raw | |
| }) | |
| parse::header_field(i, b"tagger", parse::signature_with_raw).map(|(_, raw)| raw) |
Note that this means you have to call `CommitRef::commiter|author()?` and `TagRef::tagger()?` instead of assuming pre-parsed fields. This PR makes signature handling truly lossless for "creative" emails and other info. We now stash the raw name <email> slice on IdentityRef/SignatureRef and fall back to it when rewriting, so even commits with embedded angle brackets round-trip cleanly (might want to expand to other malformed characters before merging? Parsing and serialization honor that flag but still keep strict validation for normal input. I also added regression coverage for these scenarios.
f547dcc to
797c6c5
Compare
797c6c5 to
6f7b23a
Compare
Byron
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Alright, this will work :)!
I also added an assertion to show that #2177 is now indeed fixed.
This PR makes signature handling truly lossless for "creative" emails and other info. We now stash the raw name slice on IdentityRef/SignatureRef and fall back to it when rewriting, so even commits with embedded angle brackets round-trip cleanly (might want to expand to other malformed characters before merging? idk). Parsing and serialization honor that flag but still keep strict validation for normal input. I also added regression coverage for these scenarios. Might not be the most elegant solution, as now every ctor/helper that builds signatures or identities explicitly sets raw: None. This is only me taking a stab at it for fun. It is not prod ready but just an idea.
Tries to help with #2177.