-
Notifications
You must be signed in to change notification settings - Fork 205
Added implementation for Google Cloud Storage with REST #1645
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
54c2607 to
2b7ad35
Compare
|
The only issue now is your private key. |
2b7ad35 to
c3b84c0
Compare
Fixed it! |
aaronmondal
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't get through all of it, but at the moment I'd say that this would introduce too much maintenance overhead considering that this is all a single store. If this is the amount of logic necessary to support a reliable GCS setup I'd honestly lean towards dropping this entirely.
cc @kubevalet AFAIU we already run on gcs via the filesystem store, right? Would it be viable to just recommend people to use that instead and improve documentation around that? I'm trying to understand the tradeoff that we'd be making here in terms of maintainability vs maintenance overhead. Are there usecases where one "has" to use GCS?
@amankrx Maybe my concerns are unreasonable. Have you explored ways to make this... well let's say "less code"?
Reviewed 19 of 19 files at r1, all commit messages.
Reviewable status: 0 of 2 LGTMs obtained, and all files reviewed, and 12 discussions need to be resolved
-- commits line 2 at r1:
"Add" instead of "Added"
nativelink-store/Cargo.toml line 31 at r1 (raw file):
bytes = { version = "1.10.0", default-features = false } bytes-utils = { version = "0.1.4", default-features = false } chrono = "0.4.39"
Add this (and the other creates) with x = { version = "y", default -features = false } by default to prevent pulling in unnecessary deps.
nativelink-store/src/gcs_client/client.rs line 84 at r1 (raw file):
pub async fn new_with_http_client( spec: &GcsSpec, client: hyper::Client<C>,
fyi this will need a hyper 1.x implementation after #1639. It might be easier to build on top of that so that we don't need migrate this one as well. The hyper 1.x migration is a bit complicated and we're still testing, but it should make it into main in the next few days.
nativelink-store/src/gcs_client/client.rs line 229 at r1 (raw file):
return Err(make_err!( match status.as_u16() { 401 | 403 => Code::PermissionDenied,
nit: Do these have typed names?
nativelink-store/src/gcs_client/client.rs line 315 at r1 (raw file):
} tokio::time::sleep(Duration::from_millis(200)).await; // Small delay to allow GCS to finalize
Let's make this a named const.
nativelink-store/src/gcs_client/client.rs line 642 at r1 (raw file):
match status.as_u16() { 200 | 201 | 308 => Ok(()),
nit: as above
nativelink-store/src/gcs_client/operations.rs line 14 at r1 (raw file):
// See the License for the specific language governing permissions and // limitations under the License.
nit: Seems unnecessary to have this in a dedicated file.
Cargo.lock line 42 at r1 (raw file):
[[package]] name = "android_system_properties"
Whatever causes these android_* dependencies to be pulled in should probably be disabled. My guess is that it's some default feature in some cratej (chrono maybe?). Try adding the deps as somedep = { version = "x.y.z", default-features = false, features = ["somefeature"] } in Cargo.toml where you only enable the minimal set of features that we need.
nativelink-store/src/gcs_store.rs line 42 at r1 (raw file):
}; struct ConnectionPool {
fyi We'll probably need to rework this as we're moving the entire nativelink-store setup to hyper 1.x (Current WIP branch at #1639). I general I like the explicitness of this approach though and I'll check whether this is something I should add to the aws store as well.
nativelink-store/src/gcs_store.rs line 204 at r1 (raw file):
NowFn: Fn() -> I + Send + Sync + Unpin + 'static, { async fn has_with_results(
This looks duplicated. We might get away with lifting this out of the aws implementation and reusing it in both places.
nativelink-store/src/gcs_store.rs line 272 at r1 (raw file):
// For larger files, use resumable upload // First, initiate the upload session let upload_id = self
note to self: This looks wrong to me. I can't pinpoint it though. Will to evaluate in more detail.
nativelink-store/src/gcs_store.rs line 275 at r1 (raw file):
.retrier .retry(unfold(object.clone(), move |object| async move { let conn = match self.connection_pool.acquire().await {
Hmm I'm not entirely sure on this one but I feel like the connection acquisition and retrial of that should be handled via the client side. Otherwise we'd have to "remember" to always acquire the permit in the calls. And it introduces fairly tight coupling between the store implementation and the underlying client.
nativelink-store/src/gcs_store.rs line 436 at r1 (raw file):
.retrier .retry(unfold( (object.clone(), offset, end_offset),
I believe all of these xx.clone() calls in the unfolds (here and everywhere else) should be moves. It shouldn't be necessary to clone the objects or at least only do so in the minimal cases necessary.
nativelink-store/src/gcs_client/connector.rs line 83 at r1 (raw file):
Pin::new(&mut Pin::get_mut(self).connection).poll_shutdown(cx) } }
fyi this will need to be changed after the hyper 1.x migration mentioned earlier.
.gitignore line 6 at r1 (raw file):
.idea/ .zed .idea/
This is duplicated from two lines above. Should be safe to remove.
nativelink-store/src/gcs_client/auth.rs line 35 at r1 (raw file):
#[derive(Debug, Serialize)] struct JwtClaims {
Is it actually necessary to roll our own implementation here? Are there reasons we can't use e.g. https://github.com/googleapis/google-cloud-rust/tree/main/src/auth or https://github.com/yoshidan/google-cloud-rust/tree/main/foundation/auth ?
For instance, if we used the official sdk, if we wanted to move the gcs store to non-experimental we could just wait until the official rust sdk reaches GA and then "bump a dependency".
I'm also worried that rolling our own here would require us to maintain perfect coverage and fuzzing as it would be "on us" to maintain the security guarantees of this implementation. This seems like a big lift. cc @kubevalet
amankrx
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can get this done with less code altogether. The primary issue is that we don't have a proper GCS SDK with a custom choice of client to use. So, the majority of the implementation is primarily the client itself. I'm not sure if we would be changing the client that much later on. So, it's up to us if we want a custom client like the one that the one I have provided, or to use third-party ones like https://github.com/abdolence/gcloud-sdk-rs
Also, it was initially less than 1000 lines of code PR, but the primary overhead was implementing traits and associated methods to add the unit tests. Apart from that, if it is absolutely necessary, I can work on reducing the unnecessary addition of extra modularity and concise it a bit.
Reviewable status: 0 of 2 LGTMs obtained, and all files reviewed, and 12 discussions need to be resolved
Previously, aaronmondal (Aaron Siddhartha Mondal) wrote…
"Add" instead of "Added"
Sure. I will fix the commit message before the final merge.
.gitignore line 6 at r1 (raw file):
Previously, aaronmondal (Aaron Siddhartha Mondal) wrote…
This is duplicated from two lines above. Should be safe to remove.
Ohh... This was added a while ago when I had raised the PR with the gRPC implementation. We didn't have .idea added at that time. I will remove it now.
Cargo.lock line 42 at r1 (raw file):
Previously, aaronmondal (Aaron Siddhartha Mondal) wrote…
Whatever causes these
android_*dependencies to be pulled in should probably be disabled. My guess is that it's some default feature in some cratej (chronomaybe?). Try adding the deps assomedep = { version = "x.y.z", default-features = false, features = ["somefeature"] }in Cargo.toml where you only enable the minimal set of features that we need.
Yeah, looks like chrono added a number of crates unnecessarily. I will disable most of the features.
nativelink-store/src/gcs_store.rs line 204 at r1 (raw file):
Previously, aaronmondal (Aaron Siddhartha Mondal) wrote…
This looks duplicated. We might get away with lifting this out of the aws implementation and reusing it in both places.
Yup, this is the same implementation we have for AWS. The primary implementation is what we have for the has method above. I would also argue that we can reuse the update and get_part methods, and the primary implementation can be shifted to a similar method we have as the has method.
nativelink-store/src/gcs_client/auth.rs line 35 at r1 (raw file):
Previously, aaronmondal (Aaron Siddhartha Mondal) wrote…
Is it actually necessary to roll our own implementation here? Are there reasons we can't use e.g. https://github.com/googleapis/google-cloud-rust/tree/main/src/auth or https://github.com/yoshidan/google-cloud-rust/tree/main/foundation/auth ?
For instance, if we used the official sdk, if we wanted to move the gcs store to non-experimental we could just wait until the official rust sdk reaches GA and then "bump a dependency".
I'm also worried that rolling our own here would require us to maintain perfect coverage and fuzzing as it would be "on us" to maintain the security guarantees of this implementation. This seems like a big lift. cc @kubevalet
As I mentioned above, we can definitely use these crates, but they tend to pull in the reqwest library and we can't turn off that feature right away. If we decide to use that library, we wouldn't have to worry about the GCS client implementation though.
nativelink-store/src/gcs_client/client.rs line 315 at r1 (raw file):
Previously, aaronmondal (Aaron Siddhartha Mondal) wrote…
Let's make this a named const.
Sure!
nativelink-store/src/gcs_client/operations.rs line 14 at r1 (raw file):
Previously, aaronmondal (Aaron Siddhartha Mondal) wrote…
nit: Seems unnecessary to have this in a dedicated file.
Can be moved to client.rs.
|
@amankrx Ah I think I get now where the aversion to reqwest is coming from 😅 You're probably referring to this: #1547 (review). What I meant by that is that we shouldn't directly depend on reqwest as we have a bunch of other mechanisms to make requests. If the google-cloud-sdk pulls that in it'd be fine as it would be similar to the aws sdk where we simply cannot control what official packages require. It's also possible to minimize the transitive dependencies by e.g. overriding the default features (although I wouldn't even consider it that problematic as long as we don't pull in any complex C/C++ dependencies). So if we can get auth with the official sdk, even if that pulls in reqwest transitively it'd be totally fine as it's not us directly depending on reqwest. This should significantly simplify this setup (and alleviate my security concerns as we can essentially treat it as "not our problem" if the official sdk has issues 😆 This should help us "remove" the current If the new client implementation in #1639 is generic enough we should be able to remove the Regarding the current There doesn't seem to be any easy way around the With changes along these lines I'm optimistic that we can significantly reduce the maintenance overhead of the GCSStore. |
c3b84c0 to
7213aaa
Compare
Yup! I have added an extra commit now, such that it would be easier if we do revert the changes. The current implementation is based on the Also, there might be some CI failures. I will address some of the @aaronmondal 's comments as well by today. I just wanted to push these changes to get an idea if the current implementation seems to be in the right direction. I tried using AWS's SDK for GCS auth and stuff, but there were always some challenge with the trait bounds, or some other issue. If we really need that implementation, then I can dive a bit more deeper into it, but until then, please let me know if this implementation seems good enough? |
7213aaa to
9f1d1ca
Compare
aaronmondal
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great job @amankrx and thank you for being so patient with all the reviews! This was certainly a pretty difficult one with a bunch of back and forth and I feel like we now have a strong baseline for future stores as well ❤️
Reviewed 2 of 22 files at r5, 9 of 12 files at r10, 2 of 2 files at r11, all commit messages.
Reviewable status: 2 of 2 LGTMs obtained, and all files reviewed, and 1 discussions need to be resolved
ffb619f to
c7c66f0
Compare
|
Done! Thanks everyone for such a detailed and awesome review process! |
c7c66f0 to
b45617f
Compare
b45617f to
d5a0a08
Compare
aaronmondal
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 3 of 4 files at r12, 1 of 1 files at r13, all commit messages.
Reviewable status:complete! 2 of 2 LGTMs obtained, and all files reviewed
Description
This PR implements the Google Cloud Store implementation and closely aligns with the AWS S3 implementation.
This is an update to my previous PR I raised using the gRPC implementation. There were multiple issues with that PR, and the primary issue was the stability. This PR fixes all the underlying issues we were facing with that PR. We are using the
google-cloud-storageandgoogle-cloud-authcrates.Note: To run the
gcs_backend.json5example that I have provided, you should ensure that the path to the credentials file is specified in theGOOGLE_APPLICATION_CREDENTIALSenvironment variable. Check https://docs.rs/google-cloud-storage/0.24.0/google_cloud_storage/#automatically for more details.Fixes #659
/claim #659
Type of change
Please delete options that aren't relevant.
How Has This Been Tested?
I did an E2E testing using the
gcs_backend.json5example that I have provided. Along with that, I have set up a mocking framework for the GCS client and added a few tests for both the GCS store and the GCS client.Checklist
bazel test //...passes locallygit amendsee some docsThis change is