The loose header manipulation currently lives in git-pack but it
depends on nothing *of* git-pack.
Move it to git-object (`::encode` and `::decode`) and update the way
it's generated. A header should be no more than 28 bytes: 6 bytes for
the kind, 20 for the size (log10(2**64) = 19.3), 1 for the space and 1
for the NUL. So return a `SmallVec<[u8; 28]>` instead of writing it out
directly (this should slightly reduce the amount of IO when writing out
the header to an unbuffered stream). This also avoids the need for
`header_buf` in the traversal state.
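As an illustration of the 28-byte bound, here is a minimal sketch of the encoding using only the standard library. The name `loose_header` and the `(buffer, length)` return value are assumptions made for this example; the actual change returns a `SmallVec<[u8; 28]>`.

```rust
/// Upper bound: 6 (kind) + 1 (space) + 20 (decimal u64) + 1 (NUL).
const MAX_HEADER_LEN: usize = 28;

/// Hypothetical helper: encodes `<kind> <size>\0` into a stack buffer and
/// returns the buffer together with the number of bytes actually used.
fn loose_header(kind: &str, size: u64) -> ([u8; MAX_HEADER_LEN], usize) {
    // The longest object kind is "commit" (6 bytes).
    assert!(kind.len() <= 6, "object kind too long");

    let mut buf = [0u8; MAX_HEADER_LEN];
    let mut len = kind.len();
    buf[..len].copy_from_slice(kind.as_bytes());
    buf[len] = b' ';
    len += 1;

    // Render the decimal digits right to left into a scratch buffer.
    let mut digits = [0u8; 20];
    let mut start = digits.len();
    let mut n = size;
    loop {
        start -= 1;
        digits[start] = b'0' + (n % 10) as u8;
        n /= 10;
        if n == 0 {
            break;
        }
    }
    let rendered = &digits[start..];
    buf[len..len + rendered.len()].copy_from_slice(rendered);
    len += rendered.len();

    // Terminating NUL.
    buf[len] = 0;
    len += 1;
    (buf, len)
}
```

The caller can then hand `&buf[..len]` to the output stream in a single write.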
In order to generate the header without having to write out the entire
content, add `WriteTo::size()`; the size is (relatively) easy to
compute for all kinds. Ultimately this should avoid having to buffer
or move around the object data before generating the header (though
that's a bit TBD, I don't remember making those changes in git-pack).
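A rough sketch of the idea, with a simplified stand-in for the trait: the `Entry` type, its fields, and the `usize` return type are assumptions made for this example, not the actual git-object definitions.

```rust
use std::io::{self, Write};

/// Simplified stand-in for `WriteTo`, reduced to the two relevant methods.
trait WriteTo {
    /// Serialize the object into `out`.
    fn write_to(&self, out: &mut dyn Write) -> io::Result<()>;
    /// Number of bytes `write_to` would produce, computed without writing.
    fn size(&self) -> usize;
}

/// Hypothetical tree-entry-like object: `<mode> <name>\0<20-byte id>`.
struct Entry {
    mode: &'static str,
    name: String,
    id: [u8; 20],
}

impl WriteTo for Entry {
    fn write_to(&self, out: &mut dyn Write) -> io::Result<()> {
        out.write_all(self.mode.as_bytes())?;
        out.write_all(b" ")?;
        out.write_all(self.name.as_bytes())?;
        out.write_all(&[0])?;
        out.write_all(&self.id)
    }

    fn size(&self) -> usize {
        // mode + space + name + NUL + raw object id
        self.mode.len() + 1 + self.name.len() + 1 + self.id.len()
    }
}
```

With `size()` available, the header can be produced first and the body streamed afterwards, without serializing the body into a temporary buffer just to learn its length.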
This also requires adding size computations to `git_actor::Signature`
and `git_actor::Time`. For the latter the result should be reasonably
efficient[^bithack]. If the time part gets moved to 64b, this should
probably be updated to use a lookup table[^lookup] or even better
`u64::log10` as hopefully it'll be stable by then[^70887].
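The branch-based digit count described here can be sketched as follows, assuming a 32-bit timestamp. The function name `decimal_len` is made up for this example and the real implementation may be structured differently.

```rust
/// Number of decimal digits needed to print `n`, testing the largest
/// magnitudes first so that timestamps after September 2001 (>= 1e9)
/// return on the first comparison.
fn decimal_len(n: u32) -> usize {
    if n >= 1_000_000_000 {
        10
    } else if n >= 100_000_000 {
        9
    } else if n >= 10_000_000 {
        8
    } else if n >= 1_000_000 {
        7
    } else if n >= 100_000 {
        6
    } else if n >= 10_000 {
        5
    } else if n >= 1_000 {
        4
    } else if n >= 100 {
        3
    } else if n >= 10 {
        2
    } else {
        1
    }
}
```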
Also add direct shortcuts to `WriteTo` (to generate a loose header)
and `ObjectRef` (to directly parse a loose object).
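For the parsing direction, a minimal sketch of splitting a decompressed loose object into its header fields and body offset. The name `decode_header` and its return shape are assumptions made for this example, not the git-object `::decode` API.

```rust
/// Hypothetical decoder: parses `<kind> <size>\0` at the start of `input`
/// and returns the kind, the declared size, and the offset of the body.
/// Returns `None` if the header is malformed.
fn decode_header(input: &[u8]) -> Option<(&str, u64, usize)> {
    // The header ends at the first NUL byte.
    let nul = input.iter().position(|&b| b == 0)?;
    let header = std::str::from_utf8(&input[..nul]).ok()?;
    let (kind, size) = header.split_once(' ')?;
    // `str::parse` would accept a leading `+`, so insist on plain digits.
    if size.is_empty() || !size.bytes().all(|b| b.is_ascii_digit()) {
        return None;
    }
    let size = size.parse::<u64>().ok()?;
    Some((kind, size, nul + 1))
}
```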
Others:
* Lift conversion and header encoding at the start of
`Sink::write_stream` as they don't seem to depend on the hash kind.
* Alter `loose::Store` to inline `write_header` but extract the
creation of the output stream, this reveals that the dispatch on
hash kind wasn't useful anymore as only the creation of the stream
needs it (aside from `finalize_object`).
One concern I found when making this change is things are kinda broken
wrt 32/64b: the existing code is a bit mixed between usize and u64,
but tends to store data in buffers which work by usize anyway.
But of course using usize means broken / corrupted files > 4GB on 32b
platforms, which is not great either.
Then again git itself has the same issue except even worse: it uses
`unsigned long` internally, which is not only 32b on 32b platforms
(ILP32) but also on 64 bits windows (LLP64)...
Final note: the `cargo fmt` for commits and tags is *really bad* as it
puts one sub-expression per line, a better alternative might be to
provide size-only and `#[inline]` versions of the encoding helpers?
[^bithack]: it's considered a good solution when the input is
uniformly distributed
(http://graphics.stanford.edu/~seander/bithacks.html#IntegerLog10Obvious),
and for us the input is biased toward efficiency: timestamp 1e9 is
September 2001, and timestamp 1e8 is March 1973, so the vast
majority of legit commits will take the first branch, and only
joke commits will fail the second really.
[^lookup]: https://commaok.xyz/post/lookup_tables/ however these solve
the problem for u32
[^70887]: rust-lang/rust#70887; efficiency improvements could then be
contributed to the stdlib directly.