Implement chunking for blobs, at least #17
Comments
Note that this resolver thing is also needed to implement incremental decoding (issue #11).
I'm making good progress... As a first step I'm only doing one level of nesting for the compound blobs. Making it into a tree is going to be similar to how we would do chunking for lists, etc.
This was referenced Jul 31, 2015
arv referenced this issue in arv/noms-old on Aug 4, 2015:
At this point the compoundBlob only contains blob leaves, but a future change will create multiple tiers. Both of these implement the new Blob interface. The splitting is done using a rolling hash over the last 64 bytes; when that hash ends with 13 consecutive ones, we split the data. Issue #17
Merged
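A toy sketch of that splitting rule (hypothetical `chunker` package and `boundaries` helper, not the actual noms code; the real change uses a proper rolling hash):

```go
// A toy sketch of the splitting rule: keep a rolling hash over the
// last 64 bytes of input and cut a chunk whenever the low 13 bits of
// the hash are all ones ("13 consecutive ones").
package chunker

const (
	windowSize = 64
	multiplier = 16777619          // odd multiplier for the toy polynomial hash
	mask       = uint32(1<<13 - 1) // split when the low 13 hash bits are all ones
)

// powW is multiplier^windowSize (mod 2^32), used to remove the oldest
// byte from the rolling hash.
var powW = func() uint32 {
	p := uint32(1)
	for i := 0; i < windowSize; i++ {
		p *= multiplier
	}
	return p
}()

// boundaries returns the end offsets of the chunks in data. The split
// decision depends only on the last windowSize bytes, which is what
// makes the chunking content-defined.
func boundaries(data []byte) []int {
	var (
		cuts   []int
		window [windowSize]byte
		hash   uint32
	)
	for i, b := range data {
		old := window[i%windowSize]
		window[i%windowSize] = b
		// Slide the hash: mix in the new byte, drop the oldest one.
		hash = hash*multiplier + uint32(b) - uint32(old)*powW
		if hash&mask == mask {
			cuts = append(cuts, i+1)
		}
	}
	if n := len(data); n > 0 && (len(cuts) == 0 || cuts[len(cuts)-1] != n) {
		cuts = append(cuts, n) // the final chunk ends at end-of-data
	}
	return cuts
}
```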
arv referenced this issue in arv/noms-old on Aug 4, 2015:
This adds a test to ensure that we generate the same blob leaves when we prepend and append to the data. Issue #17
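A rough sketch of such a test, reusing the hypothetical `boundaries` helper from the sketch above (not the real noms test). Because a split decision only looks at the last 64 bytes, appending data must not disturb the earlier leaves:

```go
package chunker

import (
	"bytes"
	"math/rand"
	"testing"
)

// chunksOf splits data at the end offsets returned by boundaries.
func chunksOf(data []byte) [][]byte {
	var chunks [][]byte
	prev := 0
	for _, end := range boundaries(data) {
		chunks = append(chunks, data[prev:end])
		prev = end
	}
	return chunks
}

func TestLeavesStableUnderAppend(t *testing.T) {
	base := make([]byte, 1<<16)
	rand.New(rand.NewSource(42)).Read(base) // deterministic pseudo-random data

	appended := append(append([]byte{}, base...), []byte("some extra tail bytes")...)

	a, b := chunksOf(base), chunksOf(appended)
	// Every leaf except the trailing remainder must be byte-identical.
	for i := 0; i < len(a)-1 && i < len(b)-1; i++ {
		if !bytes.Equal(a[i], b[i]) {
			t.Fatalf("leaf %d changed after appending data", i)
		}
	}
}
```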
arv referenced this issue in arv/noms-old on Aug 4, 2015:
This is similar to io.MultiReader but it does not deref the Future until needed. Issue #17
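Something along these lines, assuming a hypothetical Future with a Deref() method that yields an io.Reader (the real noms Future API may differ):

```go
// A sketch of a reader that behaves like io.MultiReader but defers the
// Future until its bytes are actually requested. Future and Deref are
// assumptions here, not the real noms types.
package blob

import "io"

type Future interface {
	Deref() io.Reader // resolve the underlying value and expose its bytes
}

type lazyMultiReader struct {
	futures []Future
	current io.Reader
}

func NewLazyMultiReader(futures ...Future) io.Reader {
	return &lazyMultiReader{futures: futures}
}

func (r *lazyMultiReader) Read(p []byte) (int, error) {
	for {
		if r.current == nil {
			if len(r.futures) == 0 {
				return 0, io.EOF
			}
			// Only now is the next future dereferenced.
			r.current = r.futures[0].Deref()
			r.futures = r.futures[1:]
		}
		n, err := r.current.Read(p)
		if err == io.EOF {
			r.current = nil
			if n > 0 {
				return n, nil
			}
			continue // move on to the next future
		}
		return n, err
	}
}
```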
arv referenced this issue in arv/noms-old on Aug 4, 2015:
This is in preparation for Seek. Issues #155, #17
arv referenced this issue in arv/noms-old on Aug 5, 2015:
- Put the length last
- Skip the initial 0, since the first blob is always at 0
Issue #17
arv referenced this issue in arv/noms-old on Aug 7, 2015:
The JSON serialization now only contains the length of each individual blob child. The Go representation still uses offsets, but they are offsets of the end delimiter. For "hi" "bye" we get:
{"cb", [{"ref": "sha1-hi"}, 2, {"ref": "sha1-bye"}, 3]}
compoundBlob{[2, 5], [sha1-hi, sha1-bye]}
Keeping the length in the serialization leads to smaller serializations. Using the end offset leads to simpler binary search and allows us to use the last entry as the total length. Issue #17
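The simpler binary search mentioned there might look roughly like this (a sketch; the offsets field mirrors the compoundBlob{[2, 5], ...} representation above, while Len and childAt are assumed names, not the real noms API):

```go
// A sketch of how end offsets keep lookup simple: the last entry is
// the total length, and sort.Search finds the child containing a
// given byte position.
package blob

import "sort"

type compoundBlob struct {
	offsets []uint64 // end offset of each child, e.g. [2, 5] for "hi"+"bye"
}

// Len is simply the last end offset.
func (cb compoundBlob) Len() uint64 {
	if len(cb.offsets) == 0 {
		return 0
	}
	return cb.offsets[len(cb.offsets)-1]
}

// childAt returns the index of the child blob containing byte position pos.
func (cb compoundBlob) childAt(pos uint64) int {
	return sort.Search(len(cb.offsets), func(i int) bool {
		return cb.offsets[i] > pos
	})
}
```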
arv referenced this issue in arv/noms-old on Sep 3, 2015:
After a compound blob is created, we try to chunk it again, in a similar way to how we chunk Lists. We use the refs of the sub blobs and compute a rolling hash over these. If the hash matches a pattern, we split the existing compound blob into a new compound blob whose sub blobs are slices of the original compound blob. Issue #17
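Roughly, that second-level chunking over refs could look like this sketch (hypothetical Ref type, groupRefs helper, and toy hash; the real code rolls a proper hash over the actual chunk refs):

```go
// A toy sketch of the second-level chunking: roll a hash over the
// child refs and start a new group whenever the hash matches the
// pattern. Each group would become one sub blob of a new, higher-tier
// compound blob.
package blob

import "hash/fnv"

type Ref [20]byte // sha1-sized ref, for illustration only

func groupRefs(refs []Ref, pattern uint32) [][]Ref {
	var groups [][]Ref
	var rolling uint32
	start := 0
	for i, r := range refs {
		h := fnv.New32a()
		h.Write(r[:])
		rolling = rolling<<1 ^ h.Sum32() // toy hash accumulated since the last split
		if rolling&pattern == pattern {
			groups = append(groups, refs[start:i+1])
			start = i + 1
			rolling = 0
		}
	}
	if start < len(refs) {
		groups = append(groups, refs[start:]) // remaining refs form the last group
	}
	return groups
}
```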
This is done.
We want to split large pieces of data up into chunks. Initially, we'll do this for blobs, but we'll probably want to expand beyond that.
We'll use a rolling hash function to figure out where to chunk things, and will need to deal with chunks that are still too big.
A blob value will really just contain some list of Refs to the chunks that make it up, and the Ref of a blob will be some kind of hash-of-hashes thing. This does introduce some complexity to the Reader() of a blob, since it now needs to find and decode all the chunks that make up the blob. Aaron proposes a "Resolver" interface that all composite types will have a pointer to, which looks like this:
type Resolver struct{}
func (res *Resolver) Resolve(r ref.Ref) Value
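As a rough illustration of the "hash-of-hashes" idea (made-up Ref type and blobRef helper, not the actual noms encoding), a chunked blob's ref could be derived from the refs of its chunks, so identical content always yields the same ref:

```go
// An illustration of the "hash-of-hashes" idea: the ref of a chunked
// blob is computed from the refs of its chunks.
package blob

import "crypto/sha1"

type Ref [sha1.Size]byte

func blobRef(chunkRefs []Ref) Ref {
	h := sha1.New()
	for _, r := range chunkRefs {
		h.Write(r[:]) // hash the child hashes, in order
	}
	var out Ref
	copy(out[:], h.Sum(nil))
	return out
}
```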