Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add multiplatform (manifest list) proxying + fix concurrent writes to files + add concurrent download of layers #314

Merged

Conversation

awoimbee
Copy link
Collaborator

@awoimbee awoimbee commented Feb 1, 2022

Sorry, I wanted this PR to be smaller but then went down the rabbithole of file integrity and concurrent writes.

  • Fixes Trow cannot proxy multiplatform images #310 (Fixes Digest sha256:(...) not in repository (...) Extrality/trow#1):
    Relevant changes are in fn create_accept_header and async fn download_manifest_and_layers + newly created async fn download_blob
    We have to download every layer of every platform because a client might ask for any random layer.
  • Lots of improvements regarding Race conditions on the file system #309:
    I ended up creating lib/server/temporary_file.rs, take a look at test_temporary_file_async_cancellation ;)
    Thanks to this wrapper:
    • there are no stale scratch file: either a file exists and is being worked on, or it doesn't exist
    • trow cannot write to a scratch file without being its creator: no concurrent writes
  • Concurrent download of multi platform manifests and of layers: the speed increase is noticeable
  • I didn't understand why Trow used fs::copy to move blobs from the scratch to their destination, so I changed it to fs::rename (I think it's fair to assume that both these directories are on the same filesystem), the speed increase is noticeable here too

@splitice
Copy link

splitice commented Feb 1, 2022

wow, great work with the filesystem corruption points. Thats a serious bug.

I'm not a maintainer but I'm not sure I agree with "I think it's fair to assume that both these directories are on the same filesystem". It might be the case 90% of the time, but thats not a good argument. Perhaps a default and a flag? Or attempt a copy if the rename fails?

@awoimbee
Copy link
Collaborator Author

awoimbee commented Feb 2, 2022

@splitice keep in mind that the cli accepts a --data-dir arg, this dir then contains blobs/ manifests/ scratch/. So the file operations are all happening inside the data-dir (not between eg /tmp and /data).

@splitice
Copy link

splitice commented Feb 4, 2022

While I see your point @awoimbee I much prefer retaining compatibility and flexibility. Is there some reason attempting a move if the rename fails would be difficult or suboptimal?

The errno could be checked.

@awoimbee
Copy link
Collaborator Author

awoimbee commented Feb 4, 2022

Anything can be done, the question is should it be done. I forgot to add that fs::rename is an atomic operation while fs::copy is not, meaning fs::copy might cause Trow to serve invalid files.
I also don't want to spend more time on this, nor do I want to write tests for edge cases (eg test_copy_if_rename_fails). So if @amouat says I need to roll-back, I'll roll-back, otherwise not.

@amouat
Copy link
Contributor

amouat commented Feb 4, 2022

Just looking at this now. Thanks a lot for all the work here.

@amouat amouat merged commit f378275 into Trow-Registry:main Feb 4, 2022
@amouat
Copy link
Contributor

amouat commented Feb 4, 2022

Yeah, I think rename is better. I am worried there was a reason I didn't do it, but I suspect I just never noticed. This way makes things a lot simpler. Thanks for the tests as well, this was a great PR.

@splitice thanks for bringing up those points; they both need to be thought about. In this case, I really don't see much impact on existing users. And @awoimbee is correct about the filesystem. This does make it harder to move to a system where those directories are on a different FS, but I don't expect that to be a requested feature in the short term at least. And if we're wrong we can always rework this.

@awoimbee awoimbee deleted the upstream-fix-digest-not-in-repository branch February 7, 2022 09:27
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Trow cannot proxy multiplatform images Digest sha256:(...) not in repository (...)
3 participants