optimize: avoid redundant blob fetching #3569

Conversation

justadogistaken
Contributor

I read through the proxyBlobStore code and found that the logic of ServeBlob can be optimized: when caching a blob, we don't need to request the same blob from the remote twice.

@justadogistaken justadogistaken force-pushed the optimize/avoid-redundant-blob-fetching branch from bc95d2c to c182ba3 Compare January 20, 2022 03:55
@codecov-commenter

codecov-commenter commented Jan 20, 2022

Codecov Report

Merging #3569 (abfc675) into main (02e2231) will decrease coverage by 0.01%.
The diff coverage is n/a.

@@            Coverage Diff             @@
##             main    #3569      +/-   ##
==========================================
- Coverage   56.34%   56.32%   -0.02%     
==========================================
  Files         101      101              
  Lines        7314     7309       -5     
==========================================
- Hits         4121     4117       -4     
+ Misses       2536     2535       -1     
  Partials      657      657              
| Impacted Files | Coverage Δ |
|---|---|
| ...tion/distribution/registry/proxy/proxyblobstore.go | 55.31% <0.00%> (-0.81%) ⬇️ |
| ...thub.com/distribution/distribution/context/http.go | 63.07% <0.00%> (-0.29%) ⬇️ |

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 02e2231...abfc675. Read the comment docs.

@justadogistaken
Contributor Author

/cc @milosgajdos @deleteriousEffect

@justadogistaken justadogistaken force-pushed the optimize/avoid-redundant-blob-fetching branch from c182ba3 to 9a3e52a Compare January 24, 2022 09:06
@milosgajdos
Member

We should address the CodeQL issues despite their not being directly related to this PR.

@justadogistaken
Contributor Author

> We should address the CodeQL issues despite their not being directly related to this PR.

So let's disable the configuration for insecure cipher suites?

@justadogistaken justadogistaken force-pushed the optimize/avoid-redundant-blob-fetching branch 2 times, most recently from 76d1e1d to c74bde3 Compare January 24, 2022 11:35
Comment on lines 60 to 68

/* The following cipher suites are insecure, so we disable those options.
TLS_RSA_WITH_RC4_128_SHA.
TLS_RSA_WITH_AES_128_CBC_SHA256.
TLS_ECDHE_ECDSA_WITH_RC4_128_SHA.
TLS_ECDHE_RSA_WITH_RC4_128_SHA.
TLS_ECDHE_ECDSA_WITH_AES_128_CBC_SHA256.
TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA256.
*/
Member

It's better to move this commit to a separate pull request; otherwise the security change might get overlooked (buried together with unrelated changes).

I'd also just remove the commented code (as it has no real purpose) and instead put this information in the commit message, e.g.

This commit removes the following cipher suites that are known to be insecure:

    TLS_RSA_WITH_RC4_128_SHA
    TLS_RSA_WITH_AES_128_CBC_SHA256
    TLS_ECDHE_ECDSA_WITH_RC4_128_SHA
    TLS_ECDHE_RSA_WITH_RC4_128_SHA
    TLS_ECDHE_ECDSA_WITH_AES_128_CBC_SHA256
    TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA256

That said, perhaps we should follow the recommendation that's posted in CI (if that works):

The following example shows how to create a safer TLS configuration:

package main

import "crypto/tls"

func saferTLSConfig() {
    config := &tls.Config{}
    config.MinVersion = tls.VersionTLS12
    config.MaxVersion = tls.VersionTLS13
    // OR
    config.MaxVersion = 0 // GOOD: Setting MaxVersion to 0 means that the highest version available in the package will be used.
}
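
As a quick sanity check on the list of suites above, Go's crypto/tls package exposes tls.InsecureCipherSuites(), and the six names from this thread can be looked up there. This is only a sketch: `isInsecureSuite` is a hypothetical helper name, and it checks the standard library's own classification, not the registry's configuration.

```go
package main

import (
	"crypto/tls"
	"fmt"
)

// isInsecureSuite reports whether crypto/tls lists the named cipher suite
// among the suites it considers insecure.
// Hypothetical helper for illustration only.
func isInsecureSuite(name string) bool {
	for _, s := range tls.InsecureCipherSuites() {
		if s.Name == name {
			return true
		}
	}
	return false
}

func main() {
	// The six suites named in the review comment.
	removed := []string{
		"TLS_RSA_WITH_RC4_128_SHA",
		"TLS_RSA_WITH_AES_128_CBC_SHA256",
		"TLS_ECDHE_ECDSA_WITH_RC4_128_SHA",
		"TLS_ECDHE_RSA_WITH_RC4_128_SHA",
		"TLS_ECDHE_ECDSA_WITH_AES_128_CBC_SHA256",
		"TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA256",
	}
	for _, name := range removed {
		fmt.Printf("%-42s insecure=%v\n", name, isInsecureSuite(name))
	}
}
```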

Contributor Author

Got it!

@justadogistaken justadogistaken force-pushed the optimize/avoid-redundant-blob-fetching branch 3 times, most recently from 37c80e1 to 5627378 Compare January 27, 2022 03:56
@milosgajdos milosgajdos added the area/proxy Related to registry as a pull-through cache label Feb 21, 2022
Member

@milosgajdos milosgajdos left a comment

This looks OK to me. I only have nits about variable naming.

registry/proxy/proxyblobstore.go (outdated)
mu.Unlock()
}()

desc, err := pbs.remoteStore.Stat(ctx, dgst)
Collaborator

Please move the Stat to line 114, just before Create(), to avoid unnecessary load in the failure case.

Collaborator

Also, please wrap this code block in a method.

All the code in a function should be at the same level of abstraction:


func {
  method()
  method()
  details....
}

== >

func {
  method()
  method()
  method()
}

@wy65701436
Copy link
Collaborator

I am not quite sure what the original idea was behind using a goroutine to call storeLocal() asynchronously, but this PR changes it to synchronous mode, if I understand correctly. Is that expected? cc @milosgajdos

return err
}

remoteReader, err := pbs.remoteStore.Open(ctx, dgst)
Collaborator

pbs.scheduler.AddBlob(blobRef, repositoryTTL)

is removed in this PR. Is that right?

Contributor Author

Sorry, I forgot this part.

@justadogistaken justadogistaken force-pushed the optimize/avoid-redundant-blob-fetching branch 2 times, most recently from f0b3cd6 to abfc675 Compare March 24, 2022 07:16
@sc0530

sc0530 commented Sep 12, 2023

Any progress on this optimization? It appears to be a valuable approach for eliminating redundant blob requests to the registry.

@milosgajdos
Member

This needs a rebase @justadogistaken

@justadogistaken
Contributor Author

> This needs a rebase @justadogistaken

I will rebase it later.

@justadogistaken justadogistaken force-pushed the optimize/avoid-redundant-blob-fetching branch from abfc675 to 808f435 Compare September 13, 2023 12:52
@justadogistaken
Contributor Author

@thaJeztah @wy65701436 Please help review this work.

@Jamstah
Collaborator

Jamstah commented Sep 17, 2023

If a copy is already in flight, does it just fall back to streaming directly from the remote store to the client? If that is the case, a comment might be good there.

Just trying to get my head around the logic before adding a LGTM.

@justadogistaken justadogistaken force-pushed the optimize/avoid-redundant-blob-fetching branch from 808f435 to b99307c Compare September 17, 2023 16:06
@justadogistaken
Contributor Author

> If a copy is already in flight, does it just fall back to streaming directly from the remote store to the client? If that is the case, a comment might be good there.
>
> Just trying to get my head around the logic before adding a LGTM.

Yes, it will fall back to streaming directly from the remote store, which is what it used to do.
This work mainly removes a redundant blob fetch when the requested blob does not exist locally and the blob digest is not already in flight.
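
A rough sketch of that decision, assuming an in-memory set of digests guarded by a mutex. The names `inflightSet`, `tryAcquire`, and `release` are hypothetical and are not taken from proxyblobstore.go: the first request for a digest claims it and would tee the remote stream into the local store, while later requests for the same digest would proxy straight from the remote.

```go
package main

import (
	"fmt"
	"sync"
)

// inflightSet tracks digests that are currently being copied to local storage.
// Hypothetical type for illustration only.
type inflightSet struct {
	mu      sync.Mutex
	digests map[string]struct{}
}

func newInflightSet() *inflightSet {
	return &inflightSet{digests: make(map[string]struct{})}
}

// tryAcquire returns true if the caller is the first to claim dgst and should
// therefore populate the cache; false means a copy is already in flight.
func (s *inflightSet) tryAcquire(dgst string) bool {
	s.mu.Lock()
	defer s.mu.Unlock()
	if _, ok := s.digests[dgst]; ok {
		return false
	}
	s.digests[dgst] = struct{}{}
	return true
}

// release marks the copy for dgst as finished.
func (s *inflightSet) release(dgst string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	delete(s.digests, dgst)
}

func main() {
	s := newInflightSet()
	dgst := "sha256:abc" // placeholder digest

	fmt.Println(s.tryAcquire(dgst)) // first request: serve and cache in one pass
	fmt.Println(s.tryAcquire(dgst)) // concurrent request: proxy directly from remote
	s.release(dgst)
	fmt.Println(s.tryAcquire(dgst)) // after release the digest can be claimed again
}
```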

Collaborator

@Jamstah Jamstah left a comment

I'd appreciate a comment in ServeBlob explaining that if there is already a copy in flight that the current thread will just proxy straight from the remote to the client, I assume because there's no sensible way to join in with the tee.

But that's not blocking and overall, very sensible change, lgtm.

Signed-off-by: baojiangnan <baojn1998@163.com>
@justadogistaken justadogistaken force-pushed the optimize/avoid-redundant-blob-fetching branch from b99307c to 1795292 Compare September 18, 2023 02:40
@justadogistaken
Contributor Author

> I'd appreciate a comment in ServeBlob explaining that if there is already a copy in flight that the current thread will just proxy straight from the remote to the client, I assume because there's no sensible way to join in with the tee.
>
> But that's not blocking and overall, very sensible change, lgtm.

Thank you. The comments have been added.

@milosgajdos milosgajdos merged commit 42ce5d4 into distribution:main Sep 18, 2023
12 checks passed
@justadogistaken justadogistaken deleted the optimize/avoid-redundant-blob-fetching branch September 18, 2023 07:13

7 participants