Collaborator

@dcoutts dcoutts commented Nov 6, 2024

PR designed to be reviewed patch by patch.

Overall, this is a refactoring in preparation for changes to the ref counting API. The ref counting change will make things follow a model of singular references, rather than manipulating raw reference counts. This PR changes the BlobRef representation to follow that style: it now holds a reference to a BlobFile (which is itself a reference-counted object). Previously the representation was more ad hoc. The PR also separates three kinds of reference more cleanly: raw, weak and strong.

@dcoutts dcoutts changed the title from "Dcoutts/blob ref blob file refactor" to "Refactor blob refs" Nov 6, 2024
@dcoutts dcoutts force-pushed the dcoutts/blob-ref-blob-file-refactor branch from 20c2161 to 102597d November 6, 2024 23:03
Collaborator

@jorisdral jorisdral left a comment

Nice, I agree this makes the handling of blob references nicer. Some suggestions:

Collaborator Author

dcoutts commented Nov 7, 2024

Thanks for the detailed review @jorisdral !

Collaborator

@mheinzel mheinzel left a comment

Makes sense and gives a nice foundation for the upcoming changes. Nice! Not much to add to Joris' comments.

@dcoutts dcoutts force-pushed the dcoutts/blob-ref-blob-file-refactor branch from 102597d to 3bd6b7e November 8, 2024 12:51
@dcoutts dcoutts requested a review from jorisdral November 8, 2024 12:52
Collaborator Author

dcoutts commented Nov 8, 2024

@jorisdral I think everything is addressed.

Collaborator

@jorisdral jorisdral left a comment

LGTM! Just #454 (comment) to address, and then let's merge it

@dcoutts dcoutts force-pushed the dcoutts/blob-ref-blob-file-refactor branch 2 times, most recently from 4c1de9c to 1fdd309 November 8, 2024 15:10
Collaborator Author

dcoutts commented Nov 8, 2024

Done.

@dcoutts dcoutts enabled auto-merge November 8, 2024 15:11
@dcoutts dcoutts added this pull request to the merge queue Nov 8, 2024
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to a conflict with the base branch Nov 8, 2024
Not yet used in this patch.

Also move the BlobSpan definition to this module.
Where before it contained a file handle and a ref counter, it now just
contains a BlobFile (which itself is the pair of a handle and counter).
data BlobRef m h previously contained:
      blobRefFile  :: !h
which meant almost all uses of BlobRef had to be BlobRef m (Handle h)

Now we make that internal:
      blobRefFile  :: !(FS.Handle h)
and so all uses are now simply BlobRef m h

This continues with the trend to avoid having to use (Handle h)
everywhere.

This is a large but simple patch that just deals with the fallout of
this local change.
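
The effect of this change on use sites can be illustrated with a minimal
sketch. It is not the code from the patch: the Handle and BlobSpan types
below are simplified stand-ins, and BlobRefOld, the span fields and the
spanSize helpers are hypothetical names introduced only for illustration.
Only blobRefFile and the shape of its type come from the commit message.

```haskell
{-# LANGUAGE KindSignatures #-}
module Main (main) where

import Data.Kind (Type)
import Data.Word (Word32, Word64)

-- Stand-in for the file system library's handle type (FS.Handle in the patch).
newtype Handle h = Handle h

-- Hypothetical: the offset and size of a blob within its file.
data BlobSpan = BlobSpan
  { blobSpanOffset :: !Word64
  , blobSpanSize   :: !Word32
  }

-- Before: the field has the bare parameter type, so callers must
-- instantiate it themselves as BlobRefOld m (Handle h).
data BlobRefOld (m :: Type -> Type) h = BlobRefOld
  { oldBlobRefFile :: !h
  , oldBlobRefSpan :: !BlobSpan
  }

-- After: the Handle wrapper is internal to the record, so callers
-- simply write BlobRef m h.
data BlobRef (m :: Type -> Type) h = BlobRef
  { blobRefFile :: !(Handle h)
  , blobRefSpan :: !BlobSpan
  }

-- The old signature has to mention Handle at every use site...
spanSizeOld :: BlobRefOld m (Handle h) -> Word32
spanSizeOld = blobSpanSize . oldBlobRefSpan

-- ...while the new one does not.
spanSize :: BlobRef m h -> Word32
spanSize = blobSpanSize . blobRefSpan

main :: IO ()
main = do
  let old = BlobRefOld (Handle "blobs") (BlobSpan 0 16) :: BlobRefOld IO (Handle String)
      new = BlobRef (Handle "blobs") (BlobSpan 0 16) :: BlobRef IO String
  print (spanSizeOld old, spanSize new)
```
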
To better reflect that it does not itself maintain a reference count,
and to distinguish it from a new type we will introduce soon: a strong
blob ref.

Then we will have a clear type distinction: raw, weak and strong.
Originally it contained the file handle and the refcounter of either the
Run or write buffer that the blob ref came from (pointed to).

In a previous patch we changed the Run and WriteBufferBlobs to
themselves contain an independently ref-counted BlobFile, so from that
point the blob ref's refcount referred to the blob file's refcount, not
the refcount of the parent object.

In this patch we now change RawBlobRef to directly hold a BlobFile. This
makes the representation logically more uniform between the run and
write buffer cases, since we now just point to the BlobFile being held
by each blob source.

This refactoring will make things much clearer once we switch the
representation of ref-counted objects, since we'll maintain a reference
to a BlobFile rather than a lower-level combo of a file handle and
reference count.
Previously a RawBlobRef served a dual purpose, as both a raw and a
strong reference. Make that distinction explicit in the types.

For the moment, all three have equivalent representations, but this is
likely to change in a later refactoring of the reference counting API.

Also split the functions for making blob refs from runs or the write
buffer into the two variants that we use: raw (internally) and weak
(externally).
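
As a rough illustration of the raw, weak and strong distinction, here is a
simplified sketch. It assumes a BlobFile modelled as a path plus an IORef
counter, and a withWeakBlobRef that returns Nothing when the blob file has
already been closed. Those details, along with the acquire and release
helpers and rawToWeakBlobRef, are assumptions made for this example; the
library's actual signatures and failure behaviour may differ.

```haskell
module Main (main) where

import Control.Exception (bracket)
import Control.Monad (when)
import Data.IORef (IORef, atomicModifyIORef', newIORef)

-- Hypothetical stand-in: a blob file with its own reference count.
data BlobFile = BlobFile
  { blobFilePath     :: !FilePath
  , blobFileRefCount :: !(IORef Int)
  }

data BlobSpan = BlobSpan { blobSpanOffset :: !Int, blobSpanSize :: !Int }

-- All three kinds share one representation (as in the commit message),
-- but distinct types keep their intended uses apart.
data RawBlobRef    = RawBlobRef    !BlobFile !BlobSpan -- internal, no count held
data WeakBlobRef   = WeakBlobRef   !BlobFile !BlobSpan -- handed out externally
data StrongBlobRef = StrongBlobRef !BlobFile !BlobSpan -- holds a count during I/O

rawToWeakBlobRef :: RawBlobRef -> WeakBlobRef
rawToWeakBlobRef (RawBlobRef bf sp) = WeakBlobRef bf sp

-- Take a reference only if the file is still open (count > 0).
acquire :: BlobFile -> IO Bool
acquire bf = atomicModifyIORef' (blobFileRefCount bf) $ \n ->
  if n > 0 then (n + 1, True) else (n, False)

release :: BlobFile -> IO ()
release bf = atomicModifyIORef' (blobFileRefCount bf) $ \n -> (n - 1, ())

-- Upgrade a weak reference to a strong one for the duration of an action.
withWeakBlobRef :: WeakBlobRef -> (StrongBlobRef -> IO a) -> IO (Maybe a)
withWeakBlobRef (WeakBlobRef bf sp) action =
  bracket (acquire bf) (\ok -> when ok (release bf)) $ \ok ->
    if ok then Just <$> action (StrongBlobRef bf sp) else pure Nothing

main :: IO ()
main = do
  rc <- newIORef 1 -- the owning run or write buffer holds one reference
  let bf   = BlobFile "0.blobs" rc
      weak = rawToWeakBlobRef (RawBlobRef bf (BlobSpan 0 16))
      size (StrongBlobRef _ sp) = pure (blobSpanSize sp)
  withWeakBlobRef weak size >>= print -- file still open
  release bf                          -- the owner closes the blob file
  withWeakBlobRef weak size >>= print -- the weak reference is now stale
```

In this sketch the strong reference only exists inside the callback, which
matches the later note that strong blob refs are temporary and used only for
doing I/O.
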
The only uses of BlobRef.readBlob were in the context of withWeakBlobRef
so we can go straight to a helper function that does just that.

Also remove now-unused WriteBufferBlobs.readBlob
Instead of open-coding the block IO stuff in the top level Internal
module in retrieveBlobs, it is moved next to BlobRef.readWeakBlobRef, so
we have the singular and bulk versions next to each other.
We don't actually need to export the StrongBlobRef type since all uses
of it are temporary, for doing I/O.
Noticed while refactoring
Currently, pre-snapshots, the Run does _not_ delete its blob files when
the run is closed, but the WriteBufferBlobs _does_ delete its blob file.

Insert a temporary hack to accommodate this. This commit can be reverted
as part of implementing snapshots properly.
* Rename readBlobFile to readBlob
* Change newBlobFile to openBlobFile, which is now also responsible for
  opening the file, rather than adopting an existing file handle.
* Update comments
* Add specialise pragmas
* Drop unused language pragmas
It's a bit harder to provide something for RunBuilder since it has its
own infrastructure for updating CRCs along with writing.
@dcoutts dcoutts force-pushed the dcoutts/blob-ref-blob-file-refactor branch from 1fdd309 to 1c2269e November 8, 2024 16:06
@dcoutts dcoutts enabled auto-merge November 8, 2024 16:06
@dcoutts dcoutts added this pull request to the merge queue Nov 8, 2024
Merged via the queue into main with commit 01b734f Nov 8, 2024
24 checks passed
@dcoutts dcoutts deleted the dcoutts/blob-ref-blob-file-refactor branch November 8, 2024 17:25