Add Container/File export and check for workdir escaping #3348
Since under the hood this gives us a tar stream, would it make sense to just expose it as a In other words:

    extend type Container {
      "Export the container to an OCI-compatible image archive"
      export: Directory!
    }

Then compose that with anything that can take a Directory?
@shykes I really like the idea of funneling everything into I'm pretty hesitant to change Here's a counter-proposal to tear apart:

    extend type HostDirectory {
      export(container: ContainerID!, path: String!)
    }

This avoids the issue of not having a clear
That seems fine. Just to play devil's advocate: could it be orchestrated on top of a primitive that unpacks a tarball into a Directory? Possibly with the help of the shim? By the way this could be done later as a future replacement to the "king of the jungle" approach.
Here we go :)

    extend type Container {
      "Write the container image to the host directory as an OCI tarball"
      export(to: HostDirectoryID!, path: String!): Boolean!
    }

    "A directory on the host"
    type HostDirectory {
      "Write the contents of another directory to the directory"
      write(contents: DirectoryID!, path: String): Boolean!
    }

These 2 are doing about the same (at least in terms of BuildKit call, they're both an Export), however their GraphQL API is very different: Exporting the raw tarball is done in the Container object by passing a HostDirectory as an argument, while exporting the unpacked tarball is done in the HostDirectory object by passing the Directory as an argument (e.g. swapping object<->subject).

I think we should keep it consistent with one or the other.

=== unrelated to this PR, but still related ===

While looking at the API, I realize there's something funky: the only way to "write" (whether it's a host directory write or container export) is to do so through HostDirectory, which itself must be "read" before. e.g. writing to Basically writing forces us to read first. Is that a problem?

Alternatively, we could use

    "Information about the host execution environment"
    type Host {
      "The current working directory on the host"
      workdir: Directory!
      "Write the contents of a directory to the host"
      exportDirectory(id: DirectoryID!, path: String!): Boolean!
      "Write the container image to the host directory as an OCI tarball"
      exportContainer(id: ContainerID!, path: String!): Boolean!
    }

This would:
@aluzzardi One correction: you don't need to read a The weird part to me is having to know the workdir's re:
Fully agree with this especially since I realized I also need an
e51a72b to
907cdfb
|
Inspired by this discussion, here's a proposal for both 1) a simpler way to write to host filesystem, and 2) container export to OCI archive.
907cdfb to
e6ca864
Title changed: "container { export } for exporting OCI tarballs to the host" → "export for exporting OCI tarballs to the host"
e6ca864 to
04c4370
Title changed: "export for exporting OCI tarballs to the host" → "export and check for workdir escaping"
I guess this API is different than the local dir export one in that it directly creates files and thus forces the engine to be local to the client.
From the discussion earlier, it seems like we might end up going down that route anyways, but this makes the decision for us if we keep the implementation as is (unless I'm missing something of course).
My best guess at long term solution here is services (can stream this back to clients over websocket), but depending on how we end up implementing file sync between clients and remote engine maybe this will end up using that same mechanism.
To me all these host APIs are a stopgap until we figure out how the remote engine architecture should work and how files should be synced back and forth. It's pretty hard for me to make sense of any of this API in that world.
My mental picture is that the routing API will always be co-located with Buildkit, so for this particular piece of code it shouldn't make much difference between doing an os.Create here vs in Buildkit against the same filesystem. It's more just dealing with the fact that this export type gives us a stream instead.
But it's strange to think about how this code would work in that context and whether it's consistent with the rest because I feel like we'd want a fundamentally different interface based on filesync anyhow. :)
Yeah I'm okay with merging this implementation for now. The only other options would be to get the exported file into buildkit's cache somehow and then do a local export of that. Probably possible but not worth it.
Sorry if I'm slow here... I want to make sure I understand.
- Directory export works by receiving the buildkit exporter tar stream and unpacking it in the target host directory;
- File export works by querying the file's contents from the buildkit state, then writing it to the target host file;
- Both Directory and File export write the result to the client's host filesystem (ie. where the client code is running)
- But @sipsma you're concerned that by using different buildkit APIs (llb state inspection vs. exporter), these two implementations may be affected differently by the future change to a remote engine architecture
I think I'm missing something as I don't understand the last part. Sorry if I'm missing something obvious.
> - Directory export works by receiving the buildkit exporter tar stream and unpacking it in the target host directory;
> - File export works by querying the file's contents from the buildkit state, then writing it to the target host file;
> - Both Directory and File export write the result to the client's host filesystem (ie. where the client code is running)

Yes (technically it's not a tar stream but rather buildkit's own diffcopy protocol, but same idea)

> But @sipsma you're concerned that by using different buildkit APIs (llb state inspection vs. exporter), these two implementations may be affected differently by the future change to a remote engine architecture

I'm just pointing out that this API implementation is different in that nothing is streamed back to the client; we just os.Create the file in the engine code directly and presume that it's the same filesystem as the client's. So before we move to a remote engine we need to fix this as, unlike the other file export mechanism, it's a hard assumption on the engine being local. That's totally fine for now, I just wanted to make sure we're aware of this and tracking it (#3624).
ad9a114 to
af5e2f0
    defer out.Close()

    return host.Export(ctx, bkclient.ExportEntry{
        Type: bkclient.ExporterOCI,
afaict this exporter type supports multiplatform too, so we should export it for each platform (same as here)
I copied everything from Publish into a private function shared by Export, but one caveat is that the multi-platform OCI images created by it aren't compatible with docker load since they don't contain a manifest.json.
As a compromise I've made it include a manifest.json if there's only one platform, which should be compatible with the previous behavior.
Great catch on that issue, thanks!
    "Write the file to a directory on the host"
    export(path: String!): Boolean!
I get why we want to make path be a directory on the host (it's annoying to have to specify the filename all the time), but my initial intuition would have been that path should be set to the path where I want the file at, not the parent dir.
The problem is that if a user has that expectation and does something like export a file named "foo.txt" like this:
    file.Export(ctx, "/dir/subdir/bar.txt")

The actual result will be that we create /dir/subdir/bar.txt/foo.txt even though they probably expected us to create /dir/subdir/bar.txt with the contents of foo.txt.
I don't have great solutions in the short term for this other than considering making it so path points to the path of the file rather than the parent dir. But I think at minimum we could be more explicit in the docstring here.
Agree. I actually started with 'file or dir' but then realized there's no way for us to tell the user's intention (besides requiring a trailing / which is too subtle).
So I went with just directory, but now that you mention it exporting to a file path feels much more intuitive alongside the container export.
I think it can be done easily by specifying a new name for the file on the right-hand side of llb.Copy and just exporting it to the specified file path's parent dir.
Starting on that now!
p.s.: the only downside to this is kind of the opposite of what you mention: if someone does file.Export(ctx, ".") (targeting a dir) it won't work. I would expect some kind of 'is a directory' error.
Oh, yikes, the Buildkit exporter actually seems to do an os.RemoveAll of the destination directory in order to replace it with a file. So that actually "works" but doesn't do NEARLY what the user wanted. 😱
I'll add a guard for this. Yet another thing to keep in mind with remote filesync. 🤔
Signed-off-by: Alex Suraci <alex@dagger.io>
also updated GraphQL descriptions Signed-off-by: Alex Suraci <alex@dagger.io>
0a273ce to
a06ff1f
Adds the following schema:
Side note: I thought about supporting

    export(path: String!, address: ContainerAddress!)

so you can set a name for the exported image, but supporting that isn't possible with the OCI exporter until Buildkit ships a new version (possibly v0.11?) with annotation support (moby/buildkit#2879).