Refactor blob refs #454
Force-pushed from 20c2161 to 102597d.
Nice, I agree this makes the handling of blob references nicer. Some suggestions:
Thanks for the detailed review @jorisdral!
mheinzel
left a comment
Makes sense and gives a nice foundation for the upcoming changes. Nice! Not much to add to Joris' comments.
Force-pushed from 102597d to 3bd6b7e.
@jorisdral I think everything is addressed.
jorisdral
left a comment
LGTM! Just #454 (comment) to address, and then let's merge it
Force-pushed from 4c1de9c to 1fdd309.
Done.
Not yet used in this patch. Also move the BlobSpan definition to this module.
Where before it contained a file handle and a ref counter, it now just contains a BlobFile (which itself is the pair of a handle and counter).
data BlobRef m h previously contained:
blobRefFile :: !h
which meant almost all uses of BlobRef had to be BlobRef m (Handle h)
Now we make that internal:
blobRefFile :: !(FS.Handle h)
and so all uses are now simply BlobRef m h
This continues with the trend to avoid having to use (Handle h)
everywhere.
This is a large but simple patch, that just deals with the fallout of
this local change.
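A minimal sketch of the shape of this change, for illustration only. The `Handle` newtype here is a simplified stand-in for `FS.Handle`, and the `Old`/`New` suffixes and the two accessor functions are hypothetical names that do not appear in the patch. The point is how call-site signatures shorten once the handle wrapper moves inside the record.

```haskell
{-# LANGUAGE KindSignatures #-}

import Data.Kind (Type)

-- Simplified stand-in for FS.Handle (assumption: the real type carries more).
newtype Handle h = Handle h
  deriving (Eq, Show)

-- Before: the file field is the bare type parameter, so callers have to
-- instantiate it to (Handle h) themselves.
data BlobRefOld (m :: Type -> Type) h = BlobRefOld
  { oldBlobRefFile :: !h
  }

-- After: the Handle wrapper is internal to the record.
data BlobRefNew (m :: Type -> Type) h = BlobRefNew
  { newBlobRefFile :: !(Handle h)
  }

-- Before: every use site spells out BlobRef m (Handle h).
handleOfOld :: BlobRefOld m (Handle h) -> Handle h
handleOfOld = oldBlobRefFile

-- After: use sites mention only BlobRef m h.
handleOfNew :: BlobRefNew m h -> Handle h
handleOfNew = newBlobRefFile
```

Both accessors return the same `Handle h`; only the type that callers have to write changes.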
To better reflect that it does not itself maintain a reference count, and to distinguish it from a new type we will introduce soon: a strong blob ref. Then we will have a clear type distinction: raw, weak and strong.
Originally it contained the file handle and the refcounter of either the Run or the write buffer that the blob ref came from (pointed to). In a previous patch we changed the Run and WriteBufferBlobs to themselves contain an independently ref-counted BlobFile, so the blob ref's refcount already refers to the blob file's refcount rather than that of the parent object. In this patch we change RawBlobRef to directly hold a BlobFile. This makes the representation logically more uniform between the run and write buffer cases, since we now just point to the BlobFile held by each blob source. This refactoring will make things much clearer once we switch the representation of ref-counted objects, since we'll maintain a reference to a BlobFile rather than a lower-level combination of a file handle and a reference count.
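To illustrate the representation described above, here is a hedged sketch, not the actual definitions from the patch. The field names, the `IORef Int` counter, the `BlobSpan` field types and the `mkBlobFile`/`mkRawBlobRef` helpers are all assumptions, and the `m` parameter is dropped for brevity.

```haskell
import Data.IORef (IORef, newIORef)
import Data.Word (Word32, Word64)

-- Simplified stand-in for FS.Handle (assumption).
newtype Handle h = Handle h

-- A blob file pairs a file handle with its own reference count.
data BlobFile h = BlobFile
  { blobFileHandle   :: !(Handle h)
  , blobFileRefCount :: !(IORef Int)  -- assumed counter representation
  }

-- Location of one blob within a blob file (field types are assumptions).
data BlobSpan = BlobSpan
  { blobSpanOffset :: !Word64
  , blobSpanSize   :: !Word32
  }

-- A raw blob ref points at the BlobFile itself, not at a separate
-- handle and counter, and does not touch the count.
data RawBlobRef h = RawBlobRef
  { rawBlobRefFile :: !(BlobFile h)
  , rawBlobRefSpan :: !BlobSpan
  }

-- Hypothetical constructor: a fresh blob file with a single owner.
mkBlobFile :: Handle h -> IO (BlobFile h)
mkBlobFile h = BlobFile h <$> newIORef 1

-- The same function serves blob files owned by a run or by the write
-- buffer, which is the uniformity the commit message refers to.
mkRawBlobRef :: BlobFile h -> BlobSpan -> RawBlobRef h
mkRawBlobRef = RawBlobRef
```

In this sketch, creating a raw reference leaves the count unchanged, so the reference is only valid while the owning run or write buffer keeps the BlobFile alive.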
Previously a RawBlobRef served a dual purpose, as both the raw and the strong reference. Give the strong case its own type to keep that distinction clear. For the moment, all three have equivalent representations, but this is likely to change in a later refactoring of the reference counting API. Also split the functions for making blob refs from runs or the write buffer into the two variants that we use: raw (internally) and weak (externally).
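One way to picture the raw, weak and strong distinction is the sketch below. It is an illustration under stated assumptions, not the code in this PR: the blob file is modelled as an in-memory string with an `IORef Int` count, all function signatures are invented for the example, and returning `Maybe` for a dead weak reference is an assumption (the real code may signal this differently).

```haskell
import Control.Exception (bracket)
import Data.IORef (IORef, atomicModifyIORef', newIORef, readIORef)

-- Hypothetical in-memory stand-in for a ref-counted blob file.
data BlobFile = BlobFile
  { blobFileContents :: !String
  , blobFileRefCount :: !(IORef Int)
  }

-- Three wrappers with the same representation but different guarantees.
newtype RawBlobRef    = RawBlobRef    BlobFile  -- valid only while the owner is alive
newtype WeakBlobRef   = WeakBlobRef   BlobFile  -- may be dead; must be upgraded before use
newtype StrongBlobRef = StrongBlobRef BlobFile  -- holds one count; safe for I/O

rawToWeak :: RawBlobRef -> WeakBlobRef
rawToWeak (RawBlobRef f) = WeakBlobRef f

-- Upgrade succeeds only if the file is still referenced by someone.
acquireStrong :: WeakBlobRef -> IO (Maybe StrongBlobRef)
acquireStrong (WeakBlobRef f) =
  atomicModifyIORef' (blobFileRefCount f) $ \n ->
    if n > 0 then (n + 1, Just (StrongBlobRef f)) else (n, Nothing)

releaseStrong :: StrongBlobRef -> IO ()
releaseStrong (StrongBlobRef f) =
  atomicModifyIORef' (blobFileRefCount f) $ \n -> (n - 1, ())

readStrong :: StrongBlobRef -> IO String
readStrong (StrongBlobRef f) = pure (blobFileContents f)

-- Temporarily upgrade a weak ref for the duration of an action,
-- releasing the count again even if the action throws.
withWeakBlobRef :: WeakBlobRef -> (StrongBlobRef -> IO a) -> IO (Maybe a)
withWeakBlobRef weak action =
  bracket (acquireStrong weak) (mapM_ releaseStrong) (traverse action)
```

Under these assumptions, `withWeakBlobRef weak readStrong` yields `Just` the contents while the owner still holds the file and `Nothing` once the count has dropped to zero, and the strong reference never escapes the bracket.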
The only uses of BlobRef.readBlob were in the context of withWeakBlobRef, so we can go straight to a helper function that does just that. Also remove the now-unused WriteBufferBlobs.readBlob.
Instead of open-coding the block IO stuff in the top level Internal module in retrieveBlobs, it is moved next to BlobRef.readWeakBlobRef, so we have the singular and bulk versions next to each other.
We don't actually need to export the StrongBlobRef type since all uses of it are temporary, for doing I/O.
Noticed while refactoring
Currently, pre-snapshots, the Run does _not_ delete its blob files when the run is closed, but the WriteBufferBlobs _does_ delete its blob file. Insert a temporary hack to accommodate this. This commit can be reverted as part of implementing snapshots properly.
* Rename readBlobFile to readBlob
* Change newBlobFile to openBlobFile, also now responsible for opening the file, rather than adopting an existing file handle.
* Update comments
* Add specialise pragmas
* Drop unused language pragmas
It's a bit harder to provide something for RunBuilder since it has its own infrastructure for updating CRCs along with writing.
Force-pushed from 1fdd309 to 1c2269e.
PR designed to be reviewed patch by patch.
Overall, this is a refactoring in preparation for changes to the ref counting API. The ref counting change will be about making things follow a model of there being singular references, rather than manipulating raw reference counts. This changes the BlobRef representation so that it follows this style: it now holds a reference to a BlobFile (which is itself a reference counted object). Previously it was more ad-hoc. Also more cleanly separate three kinds of reference: raw, weak and strong.