Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

refactor: gcs storage driver #4120

Merged
merged 1 commit into from Nov 17, 2023

Conversation

milosgajdos
Copy link
Member

@milosgajdos milosgajdos commented Oct 20, 2023

This refactors GCS storage driver from the ground up and makes it a bit more consistent with the rest of the storage drivers.

Notably, the writer now contains a reference to the driver in a similar fashion the S3 writer is implemented.

Some of the "helper" funcs were removed and others were made methods on specific GCS driver objects. This simplifies the code quite a bit and gets it more aligned with other storage driver implementations in this repository.

NOTE: There is more work to be done but I figured this could be a first step hence the PR is marked as a DRAFT at the moment

CC: @flavianmissi @jmontleon @kaovilai since you were the OG authors of the recent GCS driver update. Would you mind having a look and if you have a way to test it somewhere in GCS that'd be grand. I've only done some local "simulation" tests.

Copy link
Collaborator

@corhere corhere left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have you seen the (storage.Writer).ChunkSize field? It sounds like all the buffering, chunking and retry code in the driver could be replaced with setting wc.ChunkSize = d.chunkSize!

registry/storage/driver/gcs/gcs.go Outdated Show resolved Hide resolved
registry/storage/driver/gcs/gcs.go Outdated Show resolved Hide resolved
Comment on lines 380 to +364
closed bool
cancelled bool
committed bool
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This smells like an ad-hoc state machine. Are all eight combinations of these flags valid states? Are all transitions valid? You might want to look into whether this code could benefit from one of the state machine design patterns. (The state-space is small enough that importing an FSM or state-chart library would be massive overkill.) Even just sketching out a state diagram is really helpful, in my experience.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, agree. Right now this basically follows the same FSM established in the S3 driver. I'd like to zoom in on that at some point, too.

registry/storage/driver/gcs/gcs.go Outdated Show resolved Hide resolved
@milosgajdos
Copy link
Member Author

I've fixed the first batch of bugs and can report I successfully tested this on my personal GCS bucket 😄

$ docker push localhost:5000/distribution/hello-world
Using default tag: latest
The push refers to repository [localhost:5000/distribution/hello-world]
01bb4fce3eb1: Pushed
latest: digest: sha256:7e9b6e7ba2842c91cf49f3e214d04a7a496f8214356f41d81a6e6dcad11f11e3 size: 525

@milosgajdos milosgajdos marked this pull request as ready for review October 23, 2023 22:43
@corhere
Copy link
Collaborator

corhere commented Oct 24, 2023

Now I see that the chunking built into the Google Cloud SDK does not coalesce short writes. It only breaks up long writes, but that still covers half of what our own buffering is useful for. And we don't have to opt in as the ChunkSize defaults to 16 MiB. bufio.Writer could cover the other half: it coalesces short writes and passes through long ones. I think there's easily at least delta -60 lines to be found by replacing our own buffering with off-the-shelf parts. Is there something unique that rolling our own buffering affords for us that I'm not seeing?

@milosgajdos
Copy link
Member Author

milosgajdos commented Oct 24, 2023

Is there something unique that rolling our own buffering affords for us that I'm not seeing?

I dont think so. GCS driver has always been marked as experimental historically - that's also the reason why building it requires explicitly opting in by specifying a build tag.

The code was written by someone in the community and was never cleaned up; I'm attempting to do it but it's a slow burn because I'm not yet very familiar with the GCS SDK and can only test it against my personal GCP account 😅

But at least things are starting to work now and I can finally iterate on them. Way more to be done here.

@flavianmissi
Copy link
Contributor

flavianmissi commented Oct 24, 2023

Running gcs tests with code from this branch yields hundreds of "already closed" errors, here's the full output (down below I paste the output of a run on master as well for reference).

$ go test -timeout=60m -tags=include_gcs -v ./registry/storage/driver/gcs/...
=== RUN   Test

----------------------------------------------------------------------
FAIL: /Users/flavian/go/src/github.com/openshift/docker-distribution/registry/storage/driver/testsuites/testsuites.go:1013: DriverSuite.TestConcurrentFileStreams

/Users/flavian/go/src/github.com/openshift/docker-distribution/registry/storage/driver/testsuites/testsuites.go:1254:
    c.Assert(err, check.IsNil)
... value *errors.errorString = &errors.errorString{s:"already closed"} ("already closed")

/Users/flavian/go/src/github.com/openshift/docker-distribution/registry/storage/driver/testsuites/testsuites.go:1254:
/Users/flavian/go/src/github.com/openshift/docker-distribution/registry/storage/driver/testsuites/testsuites.go:1254:
    c.Assert(err, check.IsNil)
... value *errors.errorString = &errors.errorString{s:"already closed"} ("already closed")

    c.Assert(err, check.IsNil)
... value *errors.errorString = &errors.errorString{s:"already closed"} ("already closed")

/Users/flavian/go/src/github.com/openshift/docker-distribution/registry/storage/driver/testsuites/testsuites.go:1254:
    c.Assert(err, check.IsNil)
... value *errors.errorString = &errors.errorString{s:"already closed"} ("already closed")

/Users/flavian/go/src/github.com/openshift/docker-distribution/registry/storage/driver/testsuites/testsuites.go:1254:
    c.Assert(err, check.IsNil)
... value *errors.errorString = &errors.errorString{s:"already closed"} ("already closed")

/Users/flavian/go/src/github.com/openshift/docker-distribution/registry/storage/driver/testsuites/testsuites.go:1254:
    c.Assert(err, check.IsNil)
... value *errors.errorString = &errors.errorString{s:"already closed"} ("already closed")

/Users/flavian/go/src/github.com/openshift/docker-distribution/registry/storage/driver/testsuites/testsuites.go:1254:
    c.Assert(err, check.IsNil)
... value *errors.errorString = &errors.errorString{s:"already closed"} ("already closed")

/Users/flavian/go/src/github.com/openshift/docker-distribution/registry/storage/driver/testsuites/testsuites.go:1254:
    c.Assert(err, check.IsNil)
... value *errors.errorString = &errors.errorString{s:"already closed"} ("already closed")

/Users/flavian/go/src/github.com/openshift/docker-distribution/registry/storage/driver/testsuites/testsuites.go:1254:
    c.Assert(err, check.IsNil)
... value *errors.errorString = &errors.errorString{s:"already closed"} ("already closed")

/Users/flavian/go/src/github.com/openshift/docker-distribution/registry/storage/driver/testsuites/testsuites.go:1254:
    c.Assert(err, check.IsNil)
... value *errors.errorString = &errors.errorString{s:"already closed"} ("already closed")

/Users/flavian/go/src/github.com/openshift/docker-distribution/registry/storage/driver/testsuites/testsuites.go:1254:
/Users/flavian/go/src/github.com/openshift/docker-distribution/registry/storage/driver/testsuites/testsuites.go:1254:
    c.Assert(err, check.IsNil)
    c.Assert(err, check.IsNil)
... value *errors.errorString = &errors.errorString{s:"already closed"} ("already closed")

... value *errors.errorString = &errors.errorString{s:"already closed"} ("already closed")

/Users/flavian/go/src/github.com/openshift/docker-distribution/registry/storage/driver/testsuites/testsuites.go:1254:
    c.Assert(err, check.IsNil)
... value *errors.errorString = &errors.errorString{s:"already closed"} ("already closed")

/Users/flavian/go/src/github.com/openshift/docker-distribution/registry/storage/driver/testsuites/testsuites.go:1254:
    c.Assert(err, check.IsNil)
... value *errors.errorString = &errors.errorString{s:"already closed"} ("already closed")

/Users/flavian/go/src/github.com/openshift/docker-distribution/registry/storage/driver/testsuites/testsuites.go:1254:
    c.Assert(err, check.IsNil)
... value *errors.errorString = &errors.errorString{s:"already closed"} ("already closed")

/Users/flavian/go/src/github.com/openshift/docker-distribution/registry/storage/driver/testsuites/testsuites.go:1254:
    c.Assert(err, check.IsNil)
... value *errors.errorString = &errors.errorString{s:"already closed"} ("already closed")

/Users/flavian/go/src/github.com/openshift/docker-distribution/registry/storage/driver/testsuites/testsuites.go:1254:
    c.Assert(err, check.IsNil)
... value *errors.errorString = &errors.errorString{s:"already closed"} ("already closed")

/Users/flavian/go/src/github.com/openshift/docker-distribution/registry/storage/driver/testsuites/testsuites.go:1254:
/Users/flavian/go/src/github.com/openshift/docker-distribution/registry/storage/driver/testsuites/testsuites.go:1254:
    c.Assert(err, check.IsNil)
... value *errors.errorString = &errors.errorString{s:"already closed"} ("already closed")

    c.Assert(err, check.IsNil)
... value *errors.errorString = &errors.errorString{s:"already closed"} ("already closed")

/Users/flavian/go/src/github.com/openshift/docker-distribution/registry/storage/driver/testsuites/testsuites.go:1254:
    c.Assert(err, check.IsNil)
... value *errors.errorString = &errors.errorString{s:"already closed"} ("already closed")

/Users/flavian/go/src/github.com/openshift/docker-distribution/registry/storage/driver/testsuites/testsuites.go:1254:
/Users/flavian/go/src/github.com/openshift/docker-distribution/registry/storage/driver/testsuites/testsuites.go:1254:
/Users/flavian/go/src/github.com/openshift/docker-distribution/registry/storage/driver/testsuites/testsuites.go:1254:
    c.Assert(err, check.IsNil)
... value *errors.errorString = &errors.errorString{s:"already closed"} ("already closed")

    c.Assert(err, check.IsNil)
... value *errors.errorString = &errors.errorString{s:"already closed"} ("already closed")

    c.Assert(err, check.IsNil)
... value *errors.errorString = &errors.errorString{s:"already closed"} ("already closed")

/Users/flavian/go/src/github.com/openshift/docker-distribution/registry/storage/driver/testsuites/testsuites.go:1254:
    c.Assert(err, check.IsNil)
... value *errors.errorString = &errors.errorString{s:"already closed"} ("already closed")

/Users/flavian/go/src/github.com/openshift/docker-distribution/registry/storage/driver/testsuites/testsuites.go:1254:
    c.Assert(err, check.IsNil)
... value *errors.errorString = &errors.errorString{s:"already closed"} ("already closed")

/Users/flavian/go/src/github.com/openshift/docker-distribution/registry/storage/driver/testsuites/testsuites.go:1254:
    c.Assert(err, check.IsNil)
... value *errors.errorString = &errors.errorString{s:"already closed"} ("already closed")

/Users/flavian/go/src/github.com/openshift/docker-distribution/registry/storage/driver/testsuites/testsuites.go:1254:
    c.Assert(err, check.IsNil)
... value *errors.errorString = &errors.errorString{s:"already closed"} ("already closed")

/Users/flavian/go/src/github.com/openshift/docker-distribution/registry/storage/driver/testsuites/testsuites.go:1254:
    c.Assert(err, check.IsNil)
... value *errors.errorString = &errors.errorString{s:"already closed"} ("already closed")

/Users/flavian/go/src/github.com/openshift/docker-distribution/registry/storage/driver/testsuites/testsuites.go:1254:
    c.Assert(err, check.IsNil)
... value *errors.errorString = &errors.errorString{s:"already closed"} ("already closed")

/Users/flavian/go/src/github.com/openshift/docker-distribution/registry/storage/driver/testsuites/testsuites.go:1254:
    c.Assert(err, check.IsNil)
... value *errors.errorString = &errors.errorString{s:"already closed"} ("already closed")

/Users/flavian/go/src/github.com/openshift/docker-distribution/registry/storage/driver/testsuites/testsuites.go:1254:
    c.Assert(err, check.IsNil)
... value *errors.errorString = &errors.errorString{s:"already closed"} ("already closed")

/Users/flavian/go/src/github.com/openshift/docker-distribution/registry/storage/driver/testsuites/testsuites.go:1254:
    c.Assert(err, check.IsNil)
... value *errors.errorString = &errors.errorString{s:"already closed"} ("already closed")


----------------------------------------------------------------------
FAIL: /Users/flavian/go/src/github.com/openshift/docker-distribution/registry/storage/driver/testsuites/testsuites.go:391: DriverSuite.TestContinueStreamAppendLarge

/Users/flavian/go/src/github.com/openshift/docker-distribution/registry/storage/driver/testsuites/testsuites.go:396:
    suite.testContinueStreamAppend(c, chunkSize)
/Users/flavian/go/src/github.com/openshift/docker-distribution/registry/storage/driver/testsuites/testsuites.go:452:
    c.Assert(err, check.IsNil)
... value *errors.errorString = &errors.errorString{s:"already closed"} ("already closed")


----------------------------------------------------------------------
FAIL: /Users/flavian/go/src/github.com/openshift/docker-distribution/registry/storage/driver/testsuites/testsuites.go:401: DriverSuite.TestContinueStreamAppendSmall

/Users/flavian/go/src/github.com/openshift/docker-distribution/registry/storage/driver/testsuites/testsuites.go:402:
    suite.testContinueStreamAppend(c, int64(32))
/Users/flavian/go/src/github.com/openshift/docker-distribution/registry/storage/driver/testsuites/testsuites.go:452:
    c.Assert(err, check.IsNil)
... value *errors.errorString = &errors.errorString{s:"already closed"} ("already closed")


----------------------------------------------------------------------
FAIL: /Users/flavian/go/src/github.com/openshift/docker-distribution/registry/storage/driver/testsuites/testsuites.go:275: DriverSuite.TestWriteReadLargeStreams

/Users/flavian/go/src/github.com/openshift/docker-distribution/registry/storage/driver/testsuites/testsuites.go:297:
    c.Assert(err, check.IsNil)
... value *errors.errorString = &errors.errorString{s:"already closed"} ("already closed")


----------------------------------------------------------------------
FAIL: /Users/flavian/go/src/github.com/openshift/docker-distribution/registry/storage/driver/testsuites/testsuites.go:235: DriverSuite.TestWriteReadStreams1

/Users/flavian/go/src/github.com/openshift/docker-distribution/registry/storage/driver/testsuites/testsuites.go:238:
    suite.writeReadCompareStreams(c, filename, contents)
/Users/flavian/go/src/github.com/openshift/docker-distribution/registry/storage/driver/testsuites/testsuites.go:1290:
    c.Assert(err, check.IsNil)
... value *errors.errorString = &errors.errorString{s:"already closed"} ("already closed")


----------------------------------------------------------------------
FAIL: /Users/flavian/go/src/github.com/openshift/docker-distribution/registry/storage/driver/testsuites/testsuites.go:243: DriverSuite.TestWriteReadStreams2

/Users/flavian/go/src/github.com/openshift/docker-distribution/registry/storage/driver/testsuites/testsuites.go:246:
    suite.writeReadCompareStreams(c, filename, contents)
/Users/flavian/go/src/github.com/openshift/docker-distribution/registry/storage/driver/testsuites/testsuites.go:1290:
    c.Assert(err, check.IsNil)
... value *errors.errorString = &errors.errorString{s:"already closed"} ("already closed")


----------------------------------------------------------------------
FAIL: /Users/flavian/go/src/github.com/openshift/docker-distribution/registry/storage/driver/testsuites/testsuites.go:251: DriverSuite.TestWriteReadStreams3

/Users/flavian/go/src/github.com/openshift/docker-distribution/registry/storage/driver/testsuites/testsuites.go:254:
    suite.writeReadCompareStreams(c, filename, contents)
/Users/flavian/go/src/github.com/openshift/docker-distribution/registry/storage/driver/testsuites/testsuites.go:1290:
    c.Assert(err, check.IsNil)
... value *errors.errorString = &errors.errorString{s:"already closed"} ("already closed")


----------------------------------------------------------------------
FAIL: /Users/flavian/go/src/github.com/openshift/docker-distribution/registry/storage/driver/testsuites/testsuites.go:259: DriverSuite.TestWriteReadStreams4

/Users/flavian/go/src/github.com/openshift/docker-distribution/registry/storage/driver/testsuites/testsuites.go:262:
    suite.writeReadCompareStreams(c, filename, contents)
/Users/flavian/go/src/github.com/openshift/docker-distribution/registry/storage/driver/testsuites/testsuites.go:1290:
    c.Assert(err, check.IsNil)
... value *errors.errorString = &errors.errorString{s:"already closed"} ("already closed")


----------------------------------------------------------------------
FAIL: /Users/flavian/go/src/github.com/openshift/docker-distribution/registry/storage/driver/testsuites/testsuites.go:267: DriverSuite.TestWriteReadStreamsNonUTF8

/Users/flavian/go/src/github.com/openshift/docker-distribution/registry/storage/driver/testsuites/testsuites.go:270:
    suite.writeReadCompareStreams(c, filename, contents)
/Users/flavian/go/src/github.com/openshift/docker-distribution/registry/storage/driver/testsuites/testsuites.go:1290:
    c.Assert(err, check.IsNil)
... value *errors.errorString = &errors.errorString{s:"already closed"} ("already closed")


----------------------------------------------------------------------
FAIL: /Users/flavian/go/src/github.com/openshift/docker-distribution/registry/storage/driver/testsuites/testsuites.go:534: DriverSuite.TestWriteZeroByteContentThenAppend

/Users/flavian/go/src/github.com/openshift/docker-distribution/registry/storage/driver/testsuites/testsuites.go:556:
    c.Assert(err, check.IsNil)
... value driver.PathNotFoundError = driver.PathNotFoundError{Path:"var/folders/p0/c1z2jht90l99pg75kvywwyfw0000gn/T/driver-4129983992/u-zpwjhw.yhgt/9zyu.xvl/p-e1.0iu", DriverName:"gcs"} ("gcs: Path not found: var/folders/p0/c1z2jht90l99pg75kvywwyfw0000gn/T/driver-4129983992/u-zpwjhw.yhgt/9zyu.xvl/p-e1.0iu")


----------------------------------------------------------------------
FAIL: /Users/flavian/go/src/github.com/openshift/docker-distribution/registry/storage/driver/testsuites/testsuites.go:476: DriverSuite.TestWriteZeroByteStreamThenAppend

/Users/flavian/go/src/github.com/openshift/docker-distribution/registry/storage/driver/testsuites/testsuites.go:490:
    c.Assert(err, check.IsNil)
... value *errors.errorString = &errors.errorString{s:"already closed"} ("already closed")

OOPS: 25 passed, 11 FAILED
--- FAIL: Test (1003.73s)
=== RUN   TestCommitEmpty
    gcs_test.go:132: writer.Close: unexpected error: already closed
--- FAIL: TestCommitEmpty (0.43s)
=== RUN   TestCommit
    gcs_test.go:179: writer.Close: unexpected error: already closed
--- FAIL: TestCommit (3.74s)
=== RUN   TestRetry
--- PASS: TestRetry (28.83s)
=== RUN   TestEmptyRootList
--- PASS: TestEmptyRootList (0.60s)
=== RUN   TestMoveDirectory
--- PASS: TestMoveDirectory (0.52s)
FAIL
FAIL	github.com/distribution/distribution/v3/registry/storage/driver/gcs	1038.991s

Not all the tests are passing on the main branch either - for reference, here's the output of running the tests on main:

$ go test -timeout=60m -tags=include_gcs -v ./registry/storage/driver/gcs/...
=== RUN   Test

----------------------------------------------------------------------
FAIL: /Users/flavian/go/src/github.com/openshift/docker-distribution/registry/storage/driver/testsuites/testsuites.go:534: DriverSuite.TestWriteZeroByteContentThenAppend

/Users/flavian/go/src/github.com/openshift/docker-distribution/registry/storage/driver/testsuites/testsuites.go:556:
    c.Assert(err, check.IsNil)
... value driver.PathNotFoundError = driver.PathNotFoundError{Path:"/1b8.va_2flb2e-kf_nq_wnj6nw/e/uk", DriverName:"gcs"} ("gcs: Path not found: /1b8.va_2flb2e-kf_nq_wnj6nw/e/uk")


----------------------------------------------------------------------
FAIL: /Users/flavian/go/src/github.com/openshift/docker-distribution/registry/storage/driver/testsuites/testsuites.go:476: DriverSuite.TestWriteZeroByteStreamThenAppend

/Users/flavian/go/src/github.com/openshift/docker-distribution/registry/storage/driver/testsuites/testsuites.go:507:
    c.Assert(err, check.IsNil)
... value driver.PathNotFoundError = driver.PathNotFoundError{Path:"/f85-g-6um9/kw8n-wh6ao2r.yx-w/e2", DriverName:"gcs"} ("gcs: Path not found: /f85-g-6um9/kw8n-wh6ao2r.yx-w/e2")

OOPS: 34 passed, 2 FAILED
--- FAIL: Test (1848.90s)
=== RUN   TestCommitEmpty
--- PASS: TestCommitEmpty (0.49s)
=== RUN   TestCommit
--- PASS: TestCommit (1.77s)
=== RUN   TestRetry
--- PASS: TestRetry (28.47s)
=== RUN   TestEmptyRootList
--- PASS: TestEmptyRootList (0.61s)
=== RUN   TestMoveDirectory
--- PASS: TestMoveDirectory (0.52s)
FAIL
FAIL	github.com/distribution/distribution/v3/registry/storage/driver/gcs	1882.070s
FAIL

@milosgajdos
Copy link
Member Author

milosgajdos commented Oct 24, 2023

I need to look into them 😬 I've been testing this mostly by simpe docker pull and docker push directly into my GCS bucket.

I have changed some code around how the driver is marked as closed....so that needs looking into

@flavianmissi what do you use for these tests? Do you just point them into an actual GCS package or do you use some nice tool that let you run them locally?

@kaovilai
Copy link
Contributor

@kaovilai
Copy link
Contributor

I'd just create a mock gcp client
https://github.com/googleapis/google-cloud-go/blob/main/testing.md#testing-using-mocks

another tool I've seen used in OSS is mockery https://vektra.github.io/mockery/latest/#why-mockery

@milosgajdos
Copy link
Member Author

milosgajdos commented Oct 24, 2023

Mocks are cute and handy for unit tests, but not a replacement for integration tests 😄

I'm interested in the latter

@milosgajdos
Copy link
Member Author

@flavianmissi the close errors should now be addressed.

@corhere Go cloud API SDK still doesn't seem to provide "native" support for resumable upload. I was searching their GH repo and came across googleapis/google-cloud-go#1224 but it's 4 years old and as good as dead.

This is as far as I could clean it up using the SDK. PTAL.

@milosgajdos milosgajdos changed the title [WIP] refactor: gcs storage driver refactor: gcs storage driver Oct 24, 2023
@corhere
Copy link
Collaborator

corhere commented Oct 24, 2023

@milosgajdos the SDK uses resumable uploads internally to power chunking and retrying. That doesn't help when the upload needs to be resumed appended to from another host, of course. Could we use compose operations to append to uploads as suggested in the issue you linked, instead of resuming the upload session?

@milosgajdos
Copy link
Member Author

Could we use compose operations to append to uploads as suggested in the issue you linked, instead of resuming the upload session?

I was skimming through docs and on the first and very short look I couldnt tell. I'd need to spend a bit more time with it. Worth exploring indeed..

@corhere
Copy link
Collaborator

corhere commented Oct 25, 2023

Thinking about it some more, we'd have to fundamentally redesign how the gcs driver implements append-writes in order to use compose operations. That's way bigger than "just" a refactor. Maybe it's even big enough of a change to justify forking the driver. Regardless, it would be massive scope creep for this PR so I'm going to shut up and review this PR as is.

registry/storage/driver/gcs/gcs.go Outdated Show resolved Hide resolved
registry/storage/driver/gcs/gcs.go Outdated Show resolved Hide resolved
registry/storage/driver/gcs/gcs.go Outdated Show resolved Hide resolved
registry/storage/driver/gcs/gcs.go Outdated Show resolved Hide resolved
registry/storage/driver/gcs/gcs.go Outdated Show resolved Hide resolved
registry/storage/driver/gcs/gcs.go Outdated Show resolved Hide resolved
registry/storage/driver/gcs/gcs.go Outdated Show resolved Hide resolved
registry/storage/driver/gcs/gcs.go Outdated Show resolved Hide resolved
registry/storage/driver/gcs/gcs.go Outdated Show resolved Hide resolved
registry/storage/driver/gcs/gcs.go Outdated Show resolved Hide resolved
@milosgajdos
Copy link
Member Author

I've accepted a bunch of suggestions on the first look but will give this a proper look tomorrow. I've also realized that GCS writer doesn't have a reference to driver's context like the S3 driver has...so I'll add that in so we can call w.writeChunk(w.ctx), etc.

@milosgajdos
Copy link
Member Author

Some fantastic optimisation gains, @corhere. There is still the question of that Read comment you made. I need to have a think about it. In the meantime, would you mind taking another pass at this?

@milosgajdos
Copy link
Member Author

Alright, PTAL @corhere

Copy link
Collaborator

@corhere corhere left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The (driver).gcs field is only ever used for one thing: d.gcs.Bucket(d.bucket). And with two exceptions, the (driver).bucket field is only used in the aforementioned expressions. A *BucketHandle field on the driver could take the place of the gcs and bucket fields, much like what we did with the writer and *ObjectHandle! See inline comments for my suggestion to elide one of the d.bucket exceptions. The other exception, in URLFor, can be elided by switching to func (*BucketHandle) SignedURL. Also, by only constructing the BucketHandle once per driver instance, we would only have to apply the retryer config in one code location!

registry/storage/driver/gcs/gcs.go Outdated Show resolved Hide resolved
registry/storage/driver/gcs/gcs.go Outdated Show resolved Hide resolved
registry/storage/driver/gcs/gcs.go Outdated Show resolved Hide resolved
@milosgajdos
Copy link
Member Author

Anything else you want me to address on this PR @corhere ?

@corhere
Copy link
Collaborator

corhere commented Oct 31, 2023

@milosgajdos just this:

wrap the putContent calls in Cancel() and Close() with retry()

The Google Cloud SDK will not retry those requests as uploads without preconditions are not idempotent and the default retry config only retries idempotent requests.

@milosgajdos
Copy link
Member Author

Aaaaah, right. Sorry I'm in a different timezone now and now it's coming back to me 🙃

@milosgajdos
Copy link
Member Author

PTAL @Jamstah @wy65701436 @thaJeztah

@corhere
Copy link
Collaborator

corhere commented Nov 6, 2023

@milosgajdos what's your plan for merging this PR? Are you planning to squash it all down to a single commit, or clean up the history with rebase?

@milosgajdos
Copy link
Member Author

@milosgajdos what's your plan for merging this PR? Are you planning to squash it all down to a single commit, or clean up the history with rebase?

Yes, gonna squash today. I have been leaving it unsquashed both because I was lazy but also was hoping more people would chip in. I'll squash and chase folks

@milosgajdos
Copy link
Member Author

milosgajdos commented Nov 6, 2023

Ok, squashed.

PTAL @thaJeztah @Jamstah @squizzi @wy65701436 🙏

@@ -49,11 +48,13 @@ const (
driverName = "gcs"
dummyProjectID = "<unknown>"

minChunkSize = 256 * 1024
defaultChunkSize = 16 * 1024 * 1024
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This default was changed, was that intentional? (if so, that may warrant it to be put in a separate commit, outlining why it was changed, and what the new value was based on).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IIRC we set it to the default ChunkSize as defined in the docs https://pkg.go.dev/cloud.google.com/go/storage#Writer (see the ChunkSize comment).
I can rip this out into a separate commit but I'm not sure it's worth it. We're doing a major overhaul in this PR that changes things almost from the grounds up - by the same token we'd have to separate a lot of changes to dedicated commits on a branch that's not been released yet 🤷‍♂️

res.Body.Close()
var status *googleapi.Error
if errors.As(err, &status) {
if status.Code == http.StatusNotFound {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bit of a nit, but while we're at it, perhaps change this block to a switch instead of multiple if statements

switch status.Code {
case http.StatusNotFound:
// ... etc

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done: d461472

res.Body.Close()
obj, err := d.storageStatObject(ctx, path)
if status.Code == http.StatusRequestedRangeNotSatisfiable {
attrs, err := obj.Attrs(ctx)
if err != nil {
return nil, err
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some of the other errors are converted to a specific error-type (storagedriver.InvalidOffsetError, storagedriver.PathNotFoundError); is any conversion needed for this case? (is a specific error-type expected for this, or is that already handled by obj.Attrs?)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a generic error that can happen when we fail to unpack object attributes: https://pkg.go.dev/cloud.google.com/go/storage#ObjectHandle.Attrs. It can return ErrObjectNotExist which we could return as PathNotFoundError though that'd have been handled in the earlier check via the http.StatusNotFound check 🤔

if res.Header.Get("Content-Type") == uploadSessionContentType {
defer res.Body.Close()
if r.Attrs.ContentType == uploadSessionContentType {
defer r.Close()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was existing code, so perhaps no need to change, but looks like we don't need a defer here, because we're returning immediately, so this could be just r.Close()?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fair, but mind you, the return statement is right below it 😄

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There ya go: 591271b

@@ -596,7 +577,7 @@ func retry(req request) error {
}

status, ok := err.(*googleapi.Error)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not new code, so perhaps for a follow-up, but maybe we should rewrite this to errors.As

}
req.Header.Set("Content-Type", blobContentType)
if from == to+1 {
req.Header.Set("Content-Range", fmt.Sprintf("bytes */%v", size))
Copy link
Member

@thaJeztah thaJeztah Nov 13, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

size is a string; could safe some cycles by using a straight concat here;

req.Header.Set("Content-Range", "bytes */" + size)

Or otherwise make it explicit that it's a string;

req.Header.Set("Content-Range", fmt.Sprintf("bytes */%s", size))

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done: 9b3043c

if from == to+1 {
req.Header.Set("Content-Range", fmt.Sprintf("bytes */%v", size))
} else {
req.Header.Set("Content-Range", fmt.Sprintf("bytes %v-%v/%v", from, to, size))
Copy link
Member

@thaJeztah thaJeztah Nov 13, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here;

req.Header.Set("Content-Range", fmt.Sprintf("bytes %d-%d/%s", from, to, size))

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done: 9b3043c


resp, err := client.Do(req)
bytesPut := int64(0)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps;

var bytesPut int64

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I actually prefer explicit assignment to the default Go init values 😄

Comment on lines -893 to +868
if totalSize < 0 && resp.StatusCode == 308 {
if totalSize < 0 && resp.StatusCode == http.StatusPermanentRedirect {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like this was added in 7162cb1 (#1438), but I actually had to Google to see what's the difference between a 308 and 301;

The main difference between the 301 and 308 redirects is that when a 308 redirect code is specified, the client must repeat the exact same request (POST or GET) on the target location. For 301 redirect, the client may not necessarily follow the exact same request.

So the difference here would be that a 308 MUST preserve the request type (i.e. POST or GET), but "exact same request" (my interpretation) would also mean "preserve all headers", which MAY include authentication headers, correct?

Wondering if (in a follow-up) we should explicitly call out why a 308 is used here.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So the difference here would be that a 308 MUST preserve the request type (i.e. POST or GET), but "exact same request" (my interpretation) would also mean "preserve all headers", which MAY include authentication headers, correct?

That's my assumption, too.

Comment on lines 649 to 676
if err == iterator.Done {
break
}
if err != nil {
return nil, err
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also see my other comment; we should be consistent, and pick one approach for these;

if err != nil {
	if err == iterator.Done {
		break
	}
	return nil, err
}

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yah, nice catch 77624e7

@milosgajdos
Copy link
Member Author

milosgajdos commented Nov 14, 2023

Think I've addressed all your comments @thaJeztah PTAL

Seems like GH actions were having a bad time yesterday. Re-ran the failing actions and we're back to good now.

With the latest update, we're on par with what's currently on main as long as integration tests are concerned -- as @flavianmissi reports above (I've verified this by running integration tests in GCP):

Test Output
$ GOOGLE_APPLICATION_CREDENTIALS="$HOME/.config/application_default_credentials.json" REGISTRY_STORAGE_GCS_BUCKET="distributiontest" go test -timeout=10m -tags=include_gcs -v  ./registry/storage/driver/gcs/...
=== RUN   Test

----------------------------------------------------------------------
FAIL: /home/milosgajdos/distribution/registry/storage/driver/testsuites/testsuites.go:535: DriverSuite.TestWriteZeroByteContentThenAppend

/home/milosgajdos/distribution/registry/storage/driver/testsuites/testsuites.go:557:
    c.Assert(err, check.IsNil)
... value driver.PathNotFoundError = driver.PathNotFoundError{Path:"tmp/driver-2184697195/n-r-3/rww9h/z-78n/p-ntxaec_k99y", DriverName:"gcs"} ("gcs: Path not found: tmp/driver-2184697195/n-r-3/rww9h/z-78n/p-ntxaec_k99y")


----------------------------------------------------------------------
FAIL: /home/milosgajdos83/distribution/registry/storage/driver/testsuites/testsuites.go:477: DriverSuite.TestWriteZeroByteStreamThenAppend

/home/milosgajdos/distribution/registry/storage/driver/testsuites/testsuites.go:508:
    c.Assert(err, check.IsNil)
... value driver.PathNotFoundError = driver.PathNotFoundError{Path:"tmp/driver-2184697195/u9-0xnjsdq5w9/ujo/h/bpnzw4/x_71", DriverName:"gcs"} ("gcs: Path not found: tmp/driver-2184697195/u9-0xnjsdq5w9/ujo/h/bpnzw4/x_71")

OOPS: 34 passed, 2 FAILED
--- FAIL: Test (152.24s)
=== RUN   TestCommitEmpty
--- PASS: TestCommitEmpty (0.09s)
=== RUN   TestCommit
--- PASS: TestCommit (0.47s)
=== RUN   TestRetry
--- PASS: TestRetry (28.00s)
=== RUN   TestEmptyRootList
--- PASS: TestEmptyRootList (0.12s)
=== RUN   TestMoveDirectory
--- PASS: TestMoveDirectory (0.12s)
FAIL
FAIL    github.com/distribution/distribution/v3/registry/storage/driver/gcs     181.535s
FAIL

I have some ideas about why the two tests fail, but I'd have to verify that theory first. For now, we're good.

PTAL @thaJeztah @Jamstah

@milosgajdos
Copy link
Member Author

milosgajdos commented Nov 17, 2023

Ok, I figured out the cause of the failing tests 🎉 CC: @flavianmissi all the tests are now passing on this branch 😄

Test Output
GOOGLE_APPLICATION_CREDENTIALS="$HOME/.config/application_default_credentials.json" REGISTRY_STORAGE_GCS_BUCKET="distributiontest" go test -timeout=10m -tags=include_gcs -v  ./registry/storage/driver/gcs/...
=== RUN   Test
OK: 36 passed
--- PASS: Test (151.72s)
=== RUN   TestCommitEmpty
--- PASS: TestCommitEmpty (0.09s)
=== RUN   TestCommit
--- PASS: TestCommit (0.42s)
=== RUN   TestRetry
--- PASS: TestRetry (28.75s)
=== RUN   TestEmptyRootList
--- PASS: TestEmptyRootList (0.11s)
=== RUN   TestMoveDirectory
--- PASS: TestMoveDirectory (0.09s)
PASS
ok      github.com/distribution/distribution/v3/registry/storage/driver/gcs     181.748s

PTAL @thaJeztah @wy65701436 @squizzi @Jamstah

@Jamstah
Copy link
Collaborator

Jamstah commented Nov 17, 2023

Does the current code work against GCS? If not then I'll happily approve this.

If it does, then I think testing is the big thing here. If we haven't tested against GCS then we shouldn't merge.

@Jamstah
Copy link
Collaborator

Jamstah commented Nov 17, 2023

Although it looks like here - you did sign up for a bucket?

#4120 (comment)

@milosgajdos
Copy link
Member Author

Yeah, it's tested on my personal GCP account.

This a proper integration test result. No LARP 😁

Copy link
Collaborator

@Jamstah Jamstah left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code looks squeaky clean, haven't tested myself but confident in the testing performed.

@milosgajdos
Copy link
Member Author

Would be nice to figure out integration tests across all major cloud providers.

I'll open a ticket with CNCF. Maybe they have some cloud credits we could (ab)use.

@milosgajdos
Copy link
Member Author

Going to squash and then merge.

This commit refactors the GCS storage driver from the ground up and makes
it more consistent with the rest of the storage drivers.

We are also fixing GCS authentication using default app credentials:
When the default application credentials are used we don't initialize the
GCS storage client which then panics.

Co-authored-by: Cory Snider <corhere@gmail.com>
Signed-off-by: Milos Gajdos <milosthegajdos@gmail.com>
@milosgajdos milosgajdos merged commit 9610a1e into distribution:main Nov 17, 2023
15 checks passed
@milosgajdos milosgajdos deleted the gcs-refactoring branch November 17, 2023 13:06
Copy link
Member

@thaJeztah thaJeztah left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

post-merge LGTM

sorry was on PTO for a few days, and it was hard to look from my phone 😂

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants