
network mode "custom_network" not supported by buildkit #175

Open
MarcosMorelli opened this issue Oct 23, 2019 · 70 comments

@MarcosMorelli commented Oct 23, 2019

Background:
Running a simple integration test fails with the network option:

docker network create custom_network
docker run -d --network custom_network --name mongo mongo:3.6
docker buildx build --network custom_network --target=test .

Output:
network mode "custom_network" not supported by buildkit

Is this still not supported? Related code:
https://github.com/docker/buildx/blob/master/build/build.go#L462-L463

@tonistiigi (Member)

Sorry, I'm not sure if we will ever start supporting this as it makes the build dependent on the configuration of a specific node and limits the build to a single node.

@bryanhuntesl

Sorry, I'm not sure if we will ever start supporting this as it makes the build dependent on the configuration of a specific node and limits the build to a single node.

That horse has bolted - SSH mount makes the build dependent upon the configuration of a single node - where did that dogma even get started?

@tonistiigi (Member)

That horse has bolted - SSH mount makes the build dependent upon the configuration of a single node

No, it does not. You can forward your ssh agent against any node or a cluster of nodes in buildx. Not really different than just using private images.

@bryanhuntesl

That horse has bolted - SSH mount makes the build dependent upon the configuration of a single node

No, it does not. You can forward your ssh agent against any node or a cluster of nodes in buildx. Not really different than just using private images.

Why would someone do that? ssh-agent is something that needs to be fairly well locked down - why would someone forward it across an insecure connection?

I mean, that's a tangent anyway. Being able to run integration tests in a docker build was an incredibly useful feature: one less VM to spin up, and one less iceberg to melt. It's just useful because it's efficient.

It's also great not to have to run nodejs, ruby, etc. on the build host, but instead just have them as container dependencies; if you can do all your tests in a docker build container, it's one less thing to lock down.

Anyhow, I apologise for running off on a tangent. All I'm saying is, it would be awesome if you could bring that functionality into the latest version of docker, along with the means to temporarily mount secrets. It's just a really lightweight way to run disposable VMs without touching the host or even giving any rights to run scripts or anything on the host.

@tonistiigi (Member)

why would someone forward it across an insecure connection?

Why would that connection be insecure? Forwarding the agent is more secure than build secrets because your nodes never get access to your keys.

if you can do all your tests in a docker build container it's one less thing to lock down.
along with the means to temporary mount secrets

We have solutions for build secrets, privileged execution modes (where you previously needed docker run for more complicated integration tests), and persistent cache for your apt/npm cache, etc. moby/buildkit#1337 is implementing sidecar container support. None of this breaks the portability of the build. And if you really want it, host networking is available for you.
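
For readers skimming this thread, a minimal sketch of the options mentioned above (flag syntax is current buildx; file paths, image names, and secret IDs are illustrative):

# Build secret: mounted only for the RUN step that requests it, never stored in the image.
# Dockerfile side (illustrative): RUN --mount=type=secret,id=npmrc cat /run/secrets/npmrc
docker buildx build --secret id=npmrc,src=$HOME/.npmrc .

# SSH forwarding: the build uses your local agent; the nodes never see the private key.
# Dockerfile side (illustrative): RUN --mount=type=ssh git clone git@github.com:org/private-repo.git
docker buildx build --ssh default .

# Host networking, if you really want it (the docker-container driver additionally needs
# the network.host entitlement on the builder).
docker buildx build --network host .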

@bryanhuntesl

None of this breaks the portability of the build. And if you really want it, host networking is available for you.

But I'd like to spin up a network for each build - and have all the stuff running that would be needed for the integration tests. But again, I have to loop back around and either do weird stuff with iptables, or run postgres on the host and share it with all builds (contention/secrets/writing to the same resources/etc).

You can see how it would be so much more encapsulated and attractive if I could spin up a network per build with a bunch of stub services and tear it down afterwards?

@bryanhuntesl

why would someone forward it across an insecure connection?

Why would that connection be insecure? Forwarding the agent is more secure than build secrets because your nodes never get access to your keys.

I'm talking about the socat hack where you forward the socket over TCP - you might have been referring to something else.

@bryanhuntesl

moby/buildkit#1337 sounds cool but honestly, given the choice between something that works right now and something that will drop in 2 years' time, I know what most of the community would choose.

@tonistiigi (Member)

you might have been referring to something else.

https://medium.com/@tonistiigi/build-secrets-and-ssh-forwarding-in-docker-18-09-ae8161d066

@bryanhuntesl

you might have been referring to something else.

https://medium.com/@tonistiigi/build-secrets-and-ssh-forwarding-in-docker-18-09-ae8161d066

Nah, your secrets and forwarding feature is great - love it. Rocker had secrets support 3 years ago, but that project withered on the vine.

@bryanhuntesl

The sidecar also sounds great and very clever and well structured. But again, 3 years ago I could build with secrets and talk to network services to run integration tests.

@alanreyesv

The sidecar also sounds great and very clever and well structured. But again, 3 years ago I could build with secrets and talk to network services to run integration tests.

Also, it does work in compose, while build secrets do not.

@sudo-bmitch

Adding another use case where specifying the network would be useful: "hermetic builds".

I'm defining a docker network with --internal that has one other container on the network, a proxy that is providing all the external libraries and files needed for the build. I'd like the docker build to run on this network without access to the external internet, but with access to that proxy.

I can do this with the classic docker build today, or I can create an entire VM with the appropriate network settings; perhaps it would also work if I set up a DinD instance, but it would be useful for buildkit to support this natively.
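
A rough sketch of that hermetic setup with the classic builder (the proxy image name, port, and network name here are illustrative placeholders, not from this thread):

docker network create --internal hermetic-net
docker run -d --name build-proxy --network hermetic-net some-proxy-image   # caching proxy serving the build's dependencies
# Force the classic builder and run the build on the internal network; RUN steps can reach
# the proxy but not the external internet.
DOCKER_BUILDKIT=0 docker build --network hermetic-net \
  --build-arg HTTP_PROXY=http://build-proxy:3128 \
  --build-arg HTTPS_PROXY=http://build-proxy:3128 \
  .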

@bryanhuntesl

Adding another use case where specifying the network would be useful: "hermetic builds".

I'm defining a docker network with --internal that has one other container on the network, a proxy that is providing all the external libraries and files needed for the build. I'd like the docker build to run on this network without access to the external internet, but with access to that proxy.

I can do this with the classic docker build today, or I can create an entire VM with the appropriate network settings; perhaps it would also work if I set up a DinD instance, but it would be useful for buildkit to support this natively.

Good point, I should have mentioned I was doing that too for git dependencies, and... Docker themselves have blogged about using it to augment the docker cache. Now I just burn the network, take lots of coffee breaks, and do my bit to melt the ice caps.

@tonistiigi (Member)

@bryanhuntesl The proxy vars are still supported. For this use case, cache mounts might be a better solution now https://github.com/moby/buildkit/blob/master/frontend/dockerfile/docs/syntax.md#run---mounttypecache
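
For anyone looking into the cache-mount route, a minimal sketch (base image and cache path are illustrative; the cache persists on the builder between builds but never ends up in the image):

cat > Dockerfile <<'EOF'
# syntax=docker/dockerfile:1
FROM node:20
WORKDIR /app
COPY package*.json ./
# npm's cache directory is kept in a named cache mount across builds
RUN --mount=type=cache,target=/root/.npm npm ci
COPY . .
EOF
docker buildx build .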

@eriksw commented Sep 23, 2021

This is particularly needed in environments such as Google Cloud Build where ambient credentials (via special-IP metadata service) are available only on a particular named network, not on the default network, in order to keep their exposure to build steps opt-in.

@septum commented Oct 26, 2021

Any updates on this? I have also checked this issue, moby/buildkit#978, but can't find a straight answer. I've disabled BuildKit in the Docker Desktop configuration to be able to build my containers, but I'm guessing that is just a workaround. Any progress on this would be appreciated.

@tonistiigi (Member) commented Oct 26, 2021

The recommendation is to use buildx create --driver-opt network=custom instead when you absolutely need this capability. The same applies to the google cloud build use case.
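
Applied to the original report, a sketch of that recommendation might look like this (builder name is illustrative; note the DNS-resolution caveats discussed further down in this thread):

docker network create custom_network
docker run -d --network custom_network --name mongo mongo:3.6
# Attach the builder container itself to the network instead of passing --network to the build.
docker buildx create --name net-builder --driver docker-container \
  --driver-opt network=custom_network --use
docker buildx build --target=test .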

@septum commented Oct 26, 2021

Thank you! It seemed like this was a weird use case, but it fits my needs for now. I'll be looking for a better solution, but in the meantime I'll use the recommendation.

@existere

The recommendation is to use buildx create --driver-opt network=custom instead when you absolutely need this capability. The same applies to the google cloud build use case.

Anyone have a working example of this in GitHub Actions? It's not working for me.

Run docker/setup-buildx-action@v1
  with:
    install: true
    buildkitd-flags: --debug
    driver-opts: network=custom-network
    driver: docker-container
    use: true
  env:
    DOCKER_CLI_EXPERIMENTAL: enabled
Docker info
Creating a new builder instance
  /usr/bin/docker buildx create --name builder-3eaacab9-d53e-490c-9020-xxx --driver docker-container --driver-opt network=custom-network --buildkitd-flags --debug --use
  builder-3eaacab9-d53e-490c-9020-bae1d022b444
Booting builder
Setting buildx as default builder
Inspect builder
BuildKit version
  moby/buildkit:buildx-stable-1 => buildkitd github.com/moby/buildkit v0.9.3 8d2625494a6a3d413e3d875a2ff7xxx
Build
/usr/bin/docker build -f Dockerfile -t my_app:latest --network custom-network --target production .
time="2022-01-19T17:00:XYZ" level=warning msg="No output specified for docker-container driver. Build result will only remain in the build cache. To push result image into registry use --push or to load image into docker use --load"
error: network mode "custom-network" not supported by buildkit. You can define a custom network for your builder using the network driver-opt in buildx create.
Error: The process '/usr/bin/docker' failed with exit code 1

@crazy-max (Member) commented Jan 19, 2022

@existere https://github.com/docker/buildx/blob/master/docs/reference/buildx_create.md#use-a-custom-network

@existere

@existere https://github.com/docker/buildx/blob/master/docs/reference/buildx_create.md#use-a-custom-network

I don't see how that setup is any different from my configuration. Am I missing something?

Use a custom network
$ docker network create foonet
$ docker buildx create --name builder --driver docker-container --driver-opt network=foonet --use
$ docker buildx inspect --bootstrap
$ docker inspect buildx_buildkit_builder0 --format={{.NetworkSettings.Networks}}
map[foonet:0xc00018c0c0]

/usr/bin/docker buildx create --name builder-3eaacab9-d53e-490c-9020-xxx --driver docker-container --driver-opt network=custom-network --buildkitd-flags --debug --use

Here's the network create:

/usr/bin/docker network create custom-network
35bb341a1786f50af6b7baf7853ffc46926b62739736e93709e320xxx
/usr/bin/docker run --name my_container --network custom-network 

@tonistiigi (Member)

I don't see how that setup is any different from my configuration

You don't pass the custom network name with build commands. Your builder instance is already part of that network.
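
In other words, with the builder from the workflow above already created with --driver-opt network=custom-network, the build step would simply drop the --network flag (a sketch based on the log above):

# Same command as in the failing log, minus --network; the builder created with
# --driver-opt network=custom-network is already attached to that network.
/usr/bin/docker build -f Dockerfile -t my_app:latest --target production .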

@philomory

OK, so once you've got it set up, how do you get name resolution to work? If I have a container foo that's running on my custom network, and I do docker run --rm --network custom alpine ping -c 1 foo, it's able to resolve the name foo. Likewise, if I create a builder with docker buildx create --driver docker-container --driver-opt network=custom --name example --bootstrap, and then docker exec buildx_buildkit_example0 ping -c 1 foo, that works. But if I have a Dockerfile with RUN ping -c 1 foo and then run docker buildx build --builder example ., I get bad address foo. If I manually specify the IP address, that works, but hard-coding an IP address into the Dockerfile hardly seems reasonable.

@xkobal commented Feb 7, 2022

I have the same problem as @philomory. Name resolution doesn't work.
I am using network=cloudbuild on Google Cloud platform, so I can't hardcode any IP address.

Step #2: #17 3.744 WARNING: Compute Engine Metadata server unavailable on attempt 1 of 5. Reason: [Errno -2] Name or service not known
Step #2: #17 3.750 WARNING: Compute Engine Metadata server unavailable on attempt 2 of 5. Reason: [Errno -2] Name or service not known
Step #2: #17 3.756 WARNING: Compute Engine Metadata server unavailable on attempt 3 of 5. Reason: [Errno -2] Name or service not known
Step #2: #17 3.762 WARNING: Compute Engine Metadata server unavailable on attempt 4 of 5. Reason: [Errno -2] Name or service not known
Step #2: #17 3.768 WARNING: Compute Engine Metadata server unavailable on attempt 5 of 5. Reason: [Errno -2] Name or service not known

Step #2: #17 3.771 WARNING: No project ID could be determined. Consider running `gcloud config set project` or setting the GOOGLE_CLOUD_PROJECT environment variable
Step #2: #17 3.782 WARNING: Compute Engine Metadata server unavailable on attempt 1 of 5. Reason: HTTPConnectionPool(host='metadata.google.internal', port=80): Max retries exceeded with url: /computeMetadata/v1/instance/service-accounts/default/?recursive=true (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7efc17f85820>: Failed to establish a new connection: [Errno -2] Name or service not known'))
Step #2: #17 3.917 WARNING: Compute Engine Metadata server unavailable on attempt 2 of 5. Reason: HTTPConnectionPool(host='metadata.google.internal', port=80): Max retries exceeded with url: /computeMetadata/v1/instance/service-accounts/default/?recursive=true (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7efc17f85c40>: Failed to establish a new connection: [Errno -2] Name or service not known'))
Step #2: #17 3.925 WARNING: Compute Engine Metadata server unavailable on attempt 3 of 5. Reason: HTTPConnectionPool(host='metadata.google.internal', port=80): Max retries exceeded with url: /computeMetadata/v1/instance/service-accounts/default/?recursive=true (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7efc17f860d0>: Failed to establish a new connection: [Errno -2] Name or service not known'))
Step #2: #17 3.934 WARNING: Compute Engine Metadata server unavailable on attempt 4 of 5. Reason: HTTPConnectionPool(host='metadata.google.internal', port=80): Max retries exceeded with url: /computeMetadata/v1/instance/service-accounts/default/?recursive=true (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7efc17f85af0>: Failed to establish a new connection: [Errno -2] Name or service not known'))
Step #2: #17 3.942 WARNING: Compute Engine Metadata server unavailable on attempt 5 of 5. Reason: HTTPConnectionPool(host='metadata.google.internal', port=80): Max retries exceeded with url: /computeMetadata/v1/instance/service-accounts/default/?recursive=true (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7efc17f85880>: Failed to establish a new connection: [Errno -2] Name or service not known'))
Step #2: #17 3.944 WARNING: Failed to retrieve Application Default Credentials: Failed to retrieve http://metadata.google.internal/computeMetadata/v1/instance/service-accounts/default/?recursive=true from the Google Compute Engine metadata service. Compute Engine Metadata server unavailable

Builder has been created with the following command:

docker buildx create --driver docker-container --driver-opt network=cloudbuild --name test --use

@fibbers commented Mar 9, 2022

It seems GCE's metadata server IP is 169.254.169.254 (but I'm not sure if this is always the case), so this worked for me in Google Cloud Build:

docker buildx create --name builder --driver docker-container --driver-opt network=cloudbuild --use
docker buildx build \
  --add-host metadata.google.internal:169.254.169.254 \
  ... \
  .

and inside Dockerfile (or using Cloud Client Libraries which use Application Default Credentials):

RUN curl "http://metadata.google.internal/computeMetadata/v1/project/project-id" -H "Metadata-Flavor: Google"

@xkobal commented Mar 10, 2022

Thanks for the tips @fibbers, it works like a charm. It will do the job until a real fix.

@poconnor-lab49

@tonistiigi What's the right way to use the docker run scenario you describe?

We have solutions for build secrets, privileged execution modes (where you previously needed docker run for more complicated integration tests), and persistent cache for your apt/npm cache, etc. moby/buildkit#1337 is implementing sidecar container support. None of this breaks the portability of the build. And if you really want it, host networking is available for you.

I'm currently doing something like

# create network and container the build relies on
docker network create echo-server
docker run -d --name echo-server --network echo-server -p 8080:80 ealen/echo-server
# sanity check that the echo server is on the network
docker run --rm --network echo-server curlimages/curl http://echo-server:80

# create the Dockerfile, will need to hit echo-server during the build
cat << EOF > echo-client.docker
FROM curlimages/curl
RUN curl echo-server:80 && echo
EOF

# create the builder using the network from earlier
docker buildx create --name builder-5fa507d2-a5c6-4fb8-8a18-7340b233672e \
    --driver docker-container \
    --driver-opt network=echo-server \
    --buildkitd-flags '--allow-insecure-entitlement security.insecure --allow-insecure-entitlement network.host' \
    --use

# run the build, output to docker to sanity check
docker buildx build --file echo-client.docker \
    --add-host echo-server:$(docker inspect echo-server | jq '.[0].NetworkSettings.Networks["echo-server"].IPAddress' | tr -d '"\n') \
    --tag local/echo-test-buildx \
    --output type=docker \
    --builder builder-5fa507d2-a5c6-4fb8-8a18-7340b233672e .

Using add-host like this seems like a dirty hack just to reach another container on the same network. What would be the right way to do this?

@sudo-bmitch

I've been seeing similar. You can run the build in a user-specified network, but the buildkit container on that network has its DNS set to Docker's localhost resolver, which doesn't get passed through to nested containers. So the RUN steps within the build don't have that DNS resolution. I'm not sure of the best way to get that to pass through; perhaps a proxy running in the buildkit container that lets DNS get set to the container IP instead of localhost?
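
A quick way to see that mismatch (builder name follows the earlier example; the comments describe what you would typically observe, not a verbatim log):

# The builder container itself uses Docker's embedded DNS:
docker exec buildx_buildkit_example0 cat /etc/resolv.conf     # shows nameserver 127.0.0.11

# But a RUN step inside a build on that builder typically sees public fallback resolvers instead:
printf 'FROM alpine\nRUN cat /etc/resolv.conf\n' | \
  docker buildx build --builder example --no-cache --progress=plain -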

@Zahlii commented Oct 27, 2023

I am currently hitting this issue too, with the following setup on my Jenkins. I want to a) spin up a postgres docker container and b) build a python library inside a Dockerfile, while running tests against said postgres database with a fixed name.

The issue is that my company wants me to use the docker plugin for Jenkins (https://plugins.jenkins.io/docker-workflow/, see https://docs.cloudbees.com/docs/cloudbees-ci/latest/pipelines/docker-workflow).

The Jenkinsfile code looks similar to this here:

sh 'docker network create flow-network'
docker.image('postgres:15.4-bookworm').withRun('--network=flow-network --name postgres-db...') { c ->
    docker.build(TAG, "--network flow-network .") // This will run python code assuming DB available at postgres-db
}

Now, I could rewrite this code to work with buildx as said above, but then I'd need to use basic shell syntax as opposed to the plugin, which will perform clean-up activities in case of failures automatically.
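
For reference, the shell commands such a rewrite would have to wrap in sh steps might look roughly like this (builder name is illustrative; cleanup would need to go in a post/finally block instead of being automatic, and name resolution inside RUN steps may still need the DNS workarounds discussed elsewhere in this thread):

docker network create flow-network
docker run -d --name postgres-db --network flow-network postgres:15.4-bookworm
docker buildx create --name flow-builder --driver docker-container \
  --driver-opt network=flow-network --use
docker buildx build -t "$TAG" --load .
docker buildx rm flow-builder
docker rm -f postgres-db
docker network rm flow-network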

@jorismak commented Nov 2, 2023

My pipelines have been working fine for some time, but trying to create a new container locally I now notice that docker build is not working anymore. Apparently docker build is an alias for buildx (whatever that is), but buildx can't take a simple 'network' flag to build with a network? (The recommendation is still to disable the default network on local installs and create a custom bridge network, so this seems quite essential??)

So, to get a build going locally, I now have to 'create a buildx node' with the network flag, then tell buildx to use that node, then use 'buildx build' instead of just 'build', and the first thing it does is load up some buildx image.

.. why? If the build system is being replaced by buildx, at least make it seamless and feature-parity before doing things like this (or make users opt in to the experimental stuff).

Making the docker build command line compatible by including an option to set the network to use would already make it a lot better, but reading this thread, I'm sure other people have other issues with it.

@64J0 commented Dec 20, 2023

I was able to reach the containers using --network "host", but this is not good enough, since some people who would run this command are not using Linux, and this flag does not work on other major OSes like Mac and Windows (even with WSL).

@64J0 commented Dec 20, 2023

So, this other approach apparently worked for me:

# start the necessary container using docker-compose
# so it already creates the network and attaches the container to it
# this container exposes a port and is attached to the network some-network
docker-compose up -d some-database

# get the gateway IP for the some-network
SOME_NETWORK_GW_IP=$(docker container inspect some-database --format='{{ $network := index .NetworkSettings.Networks "some-network" }}{{ $network.Gateway }}')

docker build . --add-host "some-database:${SOME_NETWORK_GW_IP}" --tag "some-api" --target "some-api" --file "Dockerfile.api"

Update: This doesn't work in Windows WSL, unfortunately.

@jorismak commented Dec 20, 2023 via email

@64J0 commented Dec 20, 2023

Indeed, ideally it would be simpler. These workarounds are ugly as hell.

@vladimirrostok commented Dec 29, 2023

I found a nice workaround; it's also relevant to any other frontend framework. The short workaround strategy is this:

  1. There is a host network mode available; by default the build happens in sandboxed, isolated mode, so we switch our Next.js app's build-time network from isolation to host mode.
  2. Once we're on the host network, follow this rule: from your container's perspective, localhost is the container itself. If you want to communicate with the host machine, you need the IP address of the host itself. Depending on your operating system and Docker configuration, this IP address varies; host.docker.internal is recommended from Docker version 18.03 onward.
  3. Use host.docker.internal as the Strapi hostname. With the Strapi+Postgres containers up and running and port :1337 exposed for Strapi, we need a connection string like this:
    NEXT_PUBLIC_STRAPI_API_URL=http://host.docker.internal:1337/graphql

That way both build and runtime work just fine, without any external network setup, with just two independent docker-compose files started locally on the same machine.
It works like this:

  • The Next.js build network driver is switched from the sandboxed mode to host mode -> the build asks graphql for data -> connects to host.docker.internal:1337 -> Docker's internal network redirects it to localhost:1337 -> Docker routes this to the locally exposed Strapi running at port :1337 -> the connection is fine.

So the frontend build stage can connect straight to the internet to fetch a cloud CMS instance, or use a host.docker.internal connection to access another local docker-compose setup, but it can't connect directly to localhost:1337 or attach to a custom network and use DNS there (like http://strapi:1337/). The build-stage network setup is very limited and needs some tweaking like this: only https://cms-in-the-cloud.com/graphql or http://host.docker.internal:1337/graphql will work, stepping outside the default sandboxed mode set for docker build.

There are only two ways to achieve the desired result with the new BuildKit engine used by Docker:
Option one: Use a CMS deployment in the cloud and connect Next.js to it during the build phase.
Option two: Use a local CMS deployment from a different network, but connect to it via host.docker.internal.

This will work on a Windows machine with the latest Docker version (Docker v3 and above with the BuildKit engine); for Linux/macOS, consider using the 127.0.0.1 address instead if you have an older version of Docker.

To make this work, follow an external network configuration in Docker like this:

Docker-compose A for Next.js
Specify the .env connection to Strapi like this: NEXT_PUBLIC_STRAPI_API_URL=http://host.docker.internal:1337/graphql

version: '3.8'
services:
  webnextjs:
    container_name: webnextjs
    ....
    ....
    build:
      network: host
    ports:
      - '3000:3000'

Docker-compose B for Strapi + PostgreSQL for data storage

version: "3.8"

services:
  strapiDB:
    container_name: strapiDB
    ....
    ....
    ports:
      - "5432:5432"
    expose:
      - "5432"
    networks:
      - strapiPostgresNetwork

  strapi:
    container_name: strapi
    ....
    ....
    ports:
      - "1337:1337"
    expose:
      - "1337"
    depends_on:
      - strapiDB
    networks:
      - strapiPostgresNetwork

networks:
  strapiPostgresNetwork:
    driver: bridge
    name: strapiPostgresNetwork

Run docker-compose B to get Strapi up, and then run docker-compose A to run the Next.js build+run process; it will go through the build and run phases nicely through the
NEXT_PUBLIC_STRAPI_API_URL=http://host.docker.internal:1337/graphql connection, and that's it!

More docs:

Read #591 (comment) to see that BuildKit supports none, host, and default (sandboxed) as network modes: "Supporting custom network names is currently not planned as it conflicts with the portability and security guarantees." Don't use any other custom networks with BuildKit; that's the new reality of the docker build command.

In short, a custom network can no longer be specified for the build step of a docker-compose file, meaning that connecting two docker-compose files to a single custom external network to have the build run as before is not feasible anymore after Docker moved to the https://docs.docker.com/build/buildkit/ engine, which was enabled by default in v23.0.0 on 2023-02-01 (https://docs.docker.com/engine/release-notes/23.0/#2300).

With a custom network the runtime phase is always fine, but this Docker v3 update breaks the build phase completely: if any getStaticProps static asset can't be fetched during the build phase (docker build), the build fails and never reaches the runtime phase (docker run). (If you skip the build and go straight to runtime, it will work.)


The use case for this setup:
Your organization has separate website/Strapi and website/Nextjs repositories, but you just want to run them both locally on the same Docker network, running the website/Strapi docker-compose file and the website/Nextjs docker-compose file with a build step and getStaticProps. This is a stupidly simple use case we all hit in everyday life, and it is where Docker v3 introduced that breaking change.

The only other solution I found is to use docker buildx ..., but that way you lose the docker-compose features, or you merge Strapi and Next.js into one big monorepo with a single big docker-compose file.

@dm17 commented Dec 29, 2023

Thank you @VladimirAndrianov96 - it is a great summary of how they're wasting our time by removing features they claim we didn't want/need.

Does anyone know if podman/podman-compose has custom networks during build time? I'm ready to take the leap, despite the unpaid dev hours it'll take.

@jorismak

I found a nice workaround, it's also relevant to any other frontend framework too, short workaround strategy is this:

Any workaround which involves 'finding out the IP of' something is not acceptable, since that absolutely does not make it portable :). That works maybe for a local setup where you are the only one using the docker files and you know your host. In any kind of network setting or team setting, that goes out the window.

Also, this can't be 'the way it is supposed to be':

Read #591 (comment) to see that BuildKit supports none, host, and default (sandboxed) as network modes: "Supporting custom network names is currently not planned as it conflicts with the portability and security guarantees." Don't use any other custom networks with BuildKit; that's the new reality of the docker build command.

The thing is that they are still recommending (and it's kind of needed with how it works) disabling the built-in bridge network as step 1 after installing Docker Engine. So you need to create custom networks and supply them to any container or build command that starts and needs internet access or some sort of container-to-container networking.
Breaking the --network flag of docker build (since it's passed blindly to buildx, which uses a different network parameter altogether) means breaking the build command.

I know a lot of people here are commenting about how it works with docker-compose, but it just basically breaks the default docker build command. How....

The workaround is to create your own builder, and apply a network to that. But I would much rather just see the 'buildx' enforcement being removed, because it seems completely useless to me if you are just building things locally to then push.
But that's also not a good workaround if you often switch networks for your build commands (as you do in a docker-compose file with custom networks).

Any other solution I found is to use docker buildx ... but that way you lose the docker-compose features or merge strapi and next.js in one big monorepo with big single docker-compose file

In theory, this means you use docker-compose to build and then launch? Simply building in a separate step would work, I guess. I don't use docker-compose here. Development is done locally and I'm the one making the images for deployment for the team, so I only work locally and push to our registry.

If someone can tell me how to go back to the normal old 'docker build' command and just disable and ignore the whole buildx thing - yes please.

@dm17 commented Dec 29, 2023

If someone has a Hacker News account with street/interweb cred, then perhaps they can post this thread on there... It is often how such situations get attention from companies with loads of money whose sole purpose is supposedly to remedy such situations - rather than cause them.

@jedevc (Collaborator) commented Jan 2, 2024

If someone has a Hacker News account with street/interweb cred, then perhaps they can post this thread on

Heya, so, I'm no longer at docker and so only really speak for myself here, but I do feel the need as a maintainer on this project to call this out.

This is an open source project. Contributions and contributors are welcome - any person in this thread is welcome to engage on how to fix this issue, and upstream any contributions. It's known to be a limitation, but demanding that the maintainers drop everything to work on this is just not great form, sorry. Just like you, the maintainers of this project are busy, having to deal with internal/external constraints on their time, and often other things take priority.

That said, if you're a paid user of docker and this is affecting you, and you need to see this resolved, then you should raise this through support/your account representative. They can escalate internally and help set engineering priority - this is generally true of any corporate-backed open source project.

Finally, essentially calling for a brigade of a github thread during the break between Christmas and the new year, a time which very often people take off from work and would not be able to respond is... not great. I sincerely hope no good open source citizen would do this, but if they had, it would have the potential to disrupt people's time with family and friends. Please don't do that.

Please let's move this thread back on track - sharing workarounds and discussing ways in which this could be worked on productively.

@dm17 commented Jan 2, 2024

I'm not surprised that someone with pronouns in their GitHub profile just created an entire victim narrative out of this. However, I am thankful for jedevc for pointing out that I should try pushing on a docker account rep for this issue.

Calling it a "brigade" is your narrative - not a fact. Especially when loads of stuff has recevied positive attention via this method, there is no rule against posting in a more general developer publication (which HN is), and after the attention was gotten via that method no one retrospectively said "I know billion dollar corporation X helped us only after thread Y was posted, but we now realize this was actually a brigade." I believe it is clear in my comment that I am not for pushing anyone who is working for free to fix it, but rather for the utilization of Docker Inc's power to fix (restore) the functionality.

I will also add that "we should not post anything that causes workers to scramble unnecessarily during the holidays" - despite the fact that I never said the opposite. I do realize it could be implied by the fact that I made the suggestion on Dec 29th, but I view this as an async rather than sync comm.

@dm17 commented Jan 2, 2024

Please let's move this thread back on track - sharing workarounds and discussing ways in which this could be worked on productively.

It was blocked immediately by this:

Sorry, I'm not sure if we will ever start supporting this as it makes the build dependent on the configuration of a specific node and limits the build to a single node.

Where is the evidence for this claim? Also, it seems to me that this person is on Docker's payroll - and yet I am still not asking them to stop their holiday and work @jedevc - I'm merely replying with a reasonable question in a session of asynchronous communication. They can choose to use the technology they own so they only see work messages during work hours, for example.

The comment also implies that a fix for this issue will not be accepted by Docker, so it definitely dissuades anyone (who isn't on the Docker payroll) from attempting a fix during their free time. That's why the argument that "this is an open source project" works for us rather than Docker Inc in this scenario.

@thaJeztah (Member)

I'm not surprised that someone with pronouns in their GitHub profile just created an entire victim narrative out of this.

Please keep such comments at the door. No need for this.

I'll skip over some of your other wordings, because I don't think there's anything constructive in those.

Where is the evidence for this claim?

There are many things associated with this, and this feature is not trivial to implement (if possible at all). Let me try to do a short summary from the top of my head. Note that I'm not a BuildKit maintainer; I'm familiar with the overall aspects, but not deeply familiar with all parts of BuildKit.

First of all, buildx (the repository this ticket is posted in) is a client for BuildKit; the feature requested here would have to be implemented by the builder (BuildKit). Ideally this ticket would be moved to a more appropriate issue tracker, but unfortunately GitHub does not allow transferring tickets between orgs; in addition, multiple projects may need to be involved. There's no harm in having a ticket in this repository (nor in having discussions elsewhere), but ultimately this feature would have to be accepted by upstream BuildKit and/or Moby maintainers, and implemented in the respective projects.

Now for the more technical aspect of this feature request (or: this is where the fun starts).

BuildKit can be run in different settings / ways:

  • standalone (bare-metal, containerized, or in a Kubernetes cluster); buildx provides commands to instantiate new builders (docker buildx create), which can create new containerized builders
  • as part of the Moby daemon ("Docker Engine"), as the embedded / built-in builder

Depending on the above, semantics and available features will differ.

Standalone BuildKit builders

Standalone builders are designed to be stateless; builders may have cache from prior builds, but don't persist images (only build-cache), nor do they have a concept of "networks" (other than "host" networking or "no" networking). The general concept here is that standalone builders can be ephemeral (auto-scaling cluster), and that builds don't depend on prior state. They may advertise certain capabilities (BuildKit API version, native platform/architecture), but otherwise builders are interchangeable: perform a build, and export the result (e.g. push to a registry, or export as an OCI bundle).

Given that standalone builders don't have access to "docker custom networks" (there's no docker daemon involved), it won't be possible to provide a per-build option to use a specific network. The workaround mentioned in this thread is to:

  • create a containerized builder (a BuildKit daemon running in a container)
  • run that container with a specific docker network attached
  • run builds inside the container with "host" networking, i.e. in the host's networking namespace, where in this setup "host" means the container in which the BuildKit daemon runs

In this configuration, the "host" (container) can resolve containers running in that custom network, but this only works in very specific scenarios:

  • the builder-container is running on a Docker Engine (this can't be supported on bare-metal BuildKit daemons, nor on Kubernetes or other drivers)
  • the Docker Engine must already have the given custom network (standalone BuildKit has no means to control the Docker Engine to create the network)
  • dependencies must be managed through other means; other containers that must be resolvable must be attached to the custom network, and must be running before the build is started
  • all builds on the builder must use the same custom network; using a different custom network means the builder must be either destroyed/recreated (or detached and re-attached to another network), which also means it cannot be used for parallel builds depending on different networks
  • if builds affect state in dependency containers (e.g. depend on a database container), all parts of a build must run on the same node (and potentially must be serialized, not run in parallel)

All of the above combined makes this a workaround that would work in only very specific scenarios. Implementing this as a native feature for standalone builders would quickly go down the rabbit hole: "custom network attached to this builder" as well as "state of dependencies" would have to be exposed as capabilities, so that buildx can query which builder to select for a specific build, or an external orchestrator would need to be involved to construct builders (and dependencies) on demand. Both would move away significantly from the current design of standalone builders (stateless, ephemeral).

As part of the Moby daemon ("Docker Engine")

When using the "default" builder on Docker Engine, BuildKit is not running as a standalone daemon, but is compiled into the Docker Engine. In this scenario, BuildKit has some (but limited) access to features provided by the Docker Engine. For example, it's possible to use images that are available in the local image cache as part of the build (e.g. an image you built earlier, but that's not pushed to a registry). There are also limitations; when using "graphdrivers", the Docker Engine does not provide a multi-arch image store, so it's not possible to build multiple architectures in a single build. (This will be possible in the near future and is being worked on as part of the containerd image store integration.)

Containers created during build are optimized for performance; containers used for build steps tend to be very short-lived. "Regular" containers as created by the Docker Engine have non-insignificant overhead to provide all the features that docker run can provide. To reduce this overhead, BuildKit creates optimized containers with a subset of those features; to further improve performance, it also skips some intermediate processes: BuildKit acts as its own runtime, and it can directly use the OCI runtime (runc) without requiring the intermediate "docker engine" and "containerd" processes.

Long story short: build-time containers are (by design) isolated from "regular" containers; they're not managed by the docker daemon itself, won't show up in docker ps, and their networking is not managed by the regular networking stack (no DNS entries are registered in the internal DNS).

So, while the "embedded" BuildKit may have more potential options to integrate with custom networks, integrating with custom networks will require a significant amount of work, both in BuildKit (integration with the network stack) and in Moby / "docker engine" to (somehow) allow BuildKit to create different "flavors" of containers (isolated / non-isolated). This will come with a performance penalty, in addition to the complexity involved (a secondary set of (short-lived) containers that can be attached to a network, but are not directly managed by the Moby / Docker Engine itself).

@dm17 commented Jan 3, 2024

Thank you for the analysis.
Do you think the classic builder will be deprecated in the near future within docker, docker compose or buildx?

@TafkaMax commented Jan 3, 2024

TLDR; Which scenario would be better?

EDIT: For me personally, build time is not the primary thing I am after; I just wish for the solution to work. Currently the classic builder still works, but my primary concern is that this feature will disappear.

@TBBle commented Jan 3, 2024

Thank you @thaJeztah for that clarifying summary.

I wonder if it'd be worth a feature request against Docker Compose, which seems like it can do most of what the "containerised builder workaround" needs, since people in this thread have tried to do it that way already.

If Docker Compose was able to create/use/remove temporary private builder instances on its own (or the config-specified) network, along with resolving the apparent issue that a standalone BuildKit instance in a Docker container attached to a custom network doesn't get the right DNS setup for services on that custom network (see #175 (comment) from earlier attempts to use the "workaround" manually), then I think it can deliver the "workaround" flow fairly naturally for some of the use-cases described here.

I see a similar idea mentioned at docker/compose#10745 (comment) but I think that was about selecting an existing builder, rather than instantiating a new one. compose-spec/compose-spec#386 is along these lines but it wants to be super-generic; it is probably too generic for this use-case, particularly if we want to tear-down the builder instance after usage or at compose-down time. In this case, the builder is more like another service in Docker Compose that compose knows how to use when building service images. (That also might be a better way to visualise and implement it, similar to existing depends_on semantics, and semantics for defining service deployment etc. in the Compose file.)

That is separate from the existing --builder flag for docker compose build introduced in Docker Compose 2.20.0 in July 2023.

That said, I'm not a Docker Compose user, so I may be overestimating what it can cover, or misunderstanding the relevant workflow. Either way, unless this is an obviously-faulty idea to Docker Compose users, it'd be better to discuss in a ticket there, to focus on the relevant use-cases that involve Docker Compose, and also focus this ticket on buildx-involved use-cases.

@thaJeztah (Member)

Do you think the classic builder will be deprecated in the near future within docker, docker compose or buildx?

There are no immediate plans to actively remove the classic builder, but no active development is happening on it; consider it in "maintenance mode", kept mostly to support building native Windows containers (BuildKit does not yet support Windows containers, although work on that is in progress). The classic builder's architecture does not give a lot of room for further expansion, so it will start to diverge / fall behind BuildKit more over time. The classic builder may also not make the transition to the containerd image-store integration; there's currently a very rudimentary implementation to help the transition, but we're aware of various limitations in that implementation that may not be addressable with the classic builder.

EDIT: For me personally - build time is not the primary thing I am after. I wish the solution to work. Currently using the classic builder still works, but my primary concern is that this feature disappears.

Perhaps it'd be useful to start a GitHub discussion; GitHub tickets aren't "great" for longer conversations and don't provide threads. (Not sure which repository would be best; perhaps BuildKit: https://github.com/moby/buildkit/discussions.) The goal would be to collect slightly more in-depth information about use-cases. I know there was a lot of contention around the original implementation, which also brought up concerns about portability and the feature being out of scope for building (see the lengthy discussions on moby/moby#10324 and moby/moby#20987, carried in moby/moby#27702).

Use-cases that I'm aware of:

  • (ab)using docker build as a CI pipeline: spin up a stack of containers to test/verify things in an intermediate stage before tagging the final stage
  • caching: e.g. run an apt-mirror / Apt-Cacher in a container to locally host packages used for the build

But there may be other use-cases. Some of those may make more sense in a "controlled" environment (local / special purpose machine; single user), but get complicated fast in other environments. Having more data about use-cases could potentially help design around those (which may be through a different approach).

@sudo-bmitch

To repeat a nearly 2-year-old comment here: buildkit does support running with a custom network, and it's even documented: https://docs.docker.com/build/drivers/docker-container/#custom-network

The issue isn't that it won't run on a custom network; instead, as so often happens on the internet, it's DNS. When you run a container runtime (which buildkit is) inside of a container on a custom network, it sees the DNS settings of that parent container:

$ docker network create build
9b7c83ceda7e2552e99d27c29d275936e882fd9cc9488361209bbf4421c2f180

$ docker run -it --rm --net build busybox cat /etc/resolv.conf
search lan
nameserver 127.0.0.11
options ndots:0

And as docker and other runtimes do, they refuse to use 127.0.0.11 as a DNS server, so the nested container falls back to 8.8.8.8. Here's a demo for proof:

#!/bin/sh

set -ex

docker network create custom-network

echo "hello from custom network" >test.txt
docker run --name "test-server" --net custom-network -d --rm \
  -v "$(pwd)/test.txt:/usr/share/nginx/html/test.txt:ro" nginx
server_ip="$(docker container inspect test-server --format "{{ (index .NetworkSettings.Networks \"custom-network\").IPAddress }}")"
echo "Server IP is ${server_ip}"

cat >Dockerfile <<EOF
FROM curlimages/curl as build
USER root
ARG server_ip
RUN mkdir -p /output \
 && cp /etc/resolv.conf /output/resolv.conf \
 && echo \${server_ip} >/output/server_ip.txt \
 && (curl -sSL http://test-server/test.txt >/output/by-dns.txt 2>&1 || :) \
 && (curl -sSL http://\${server_ip}/test.txt >/output/by-ip.txt 2>&1 || :)

FROM scratch
COPY --from=build /output /
EOF

docker buildx create \
  --name custom-net-build \
  --driver docker-container \
  --driver-opt "network=custom-network"
docker buildx build --builder custom-net-build --build-arg "server_ip=${server_ip}" \
  -o "type=local,dest=output" .

docker buildx rm custom-net-build
docker stop test-server
docker network rm custom-network

Running that shows that custom networks are supported, just not DNS:

$ ./demo-custom-network.sh
+ docker network create custom-network
bd5ce7361f5fc94b0da0fe32a3f5482176a6fcaca68997556d3449269c451cea
+ echo hello from custom network
+ pwd
+ docker run --name test-server --net custom-network -d --rm -v /home/bmitch/data/docker/buildkit-network/test.txt:/usr/share/nginx/html/test.txt:ro nginx
31bf9516fdc6dba7d97796be6b6c55f2a134a3050a7b1286a2ac96658e444c62
+ docker container inspect test-server --format {{ (index .NetworkSettings.Networks "custom-network").IPAddress }}
+ server_ip=192.168.74.2
+ echo Server IP is 192.168.74.2
Server IP is 192.168.74.2
+ cat
+ docker buildx create --name custom-net-build --driver docker-container --driver-opt network=custom-network
custom-net-build
+ docker buildx build --builder custom-net-build --build-arg server_ip=192.168.74.2 -o type=local,dest=output .
[+] Building 4.2s (9/9) FINISHED
 => [internal] booting buildkit                                                                    1.8s
 => => pulling image moby/buildkit:buildx-stable-1                                                 0.4s
 => => creating container buildx_buildkit_custom-net-build0                                        1.5s
 => [internal] load build definition from Dockerfile                                               0.0s
 => => transferring dockerfile: 393B                                                               0.0s
 => [internal] load metadata for docker.io/curlimages/curl:latest                                  0.6s
 => [auth] curlimages/curl:pull token for registry-1.docker.io                                     0.0s
 => [internal] load .dockerignore                                                                  0.0s
 => => transferring context: 2B                                                                    0.0s
 => [build 1/2] FROM docker.io/curlimages/curl:latest@sha256:4bfa3e2c0164fb103fb9bfd4dc956facce32  1.2s
 => => resolve docker.io/curlimages/curl:latest@sha256:4bfa3e2c0164fb103fb9bfd4dc956facce32b6c5d4  0.0s
 => => sha256:4ca545ee6d5db5c1170386eeb39b2ffe3bd46e5d4a73a9acbebc805f19607eb3 42B / 42B           0.1s
 => => sha256:fcad2432d35a50de75d71a26d674352950ae2f9de77cb34155bdb570f49b5fc3 4.04MB / 4.04MB     0.8s
 => => sha256:c926b61bad3b94ae7351bafd0c184c159ebf0643b085f7ef1d47ecdc7316833c 3.40MB / 3.40MB     0.8s
 => => extracting sha256:c926b61bad3b94ae7351bafd0c184c159ebf0643b085f7ef1d47ecdc7316833c          0.1s
 => => extracting sha256:fcad2432d35a50de75d71a26d674352950ae2f9de77cb34155bdb570f49b5fc3          0.1s
 => => extracting sha256:4ca545ee6d5db5c1170386eeb39b2ffe3bd46e5d4a73a9acbebc805f19607eb3          0.0s
 => [build 2/2] RUN mkdir -p /output  && cp /etc/resolv.conf /output/resolv.conf  && echo 192.168  0.2s
 => [stage-1 1/1] COPY --from=build /output /                                                      0.0s
 => exporting to client directory                                                                  0.0s
 => => copying files 357B                                                                          0.0s
+ docker buildx rm custom-net-build
custom-net-build removed
+ docker stop test-server
test-server
+ docker network rm custom-network
custom-network

$ cat output/resolv.conf
search lan
options ndots:0

nameserver 8.8.8.8
nameserver 8.8.4.4
nameserver 2001:4860:4860::8888
nameserver 2001:4860:4860::8844

$ cat output/server_ip.txt
192.168.74.2

$ cat output/by-dns.txt
curl: (6) Could not resolve host: test-server

$ cat output/by-ip.txt
hello from custom network

@thaJeztah (Member) commented Jan 3, 2024

And as docker and other runtimes do, they refuse to use 127.0.0.11 as a DNS server, so the nested container falls back to 8.8.8.8. Here's a demo for proof:

Hmm.. right, but that should not be the case when using --network=host. In that case, the container should inherit the /etc/resolv.conf from the host. Here's a docker-in-docker setup running with a custom network (127.0.0.11 is the embedded DNS resolver); a container with --network=host inherits the host's settings:

cat /etc/resolv.conf
nameserver 127.0.0.11
options ndots:0

docker run --rm --network=host alpine cat /etc/resolv.conf
nameserver 127.0.0.11
options ndots:0

Trying to do the same with a BuildKit container builder that has host networking allowed, and a build started with --network=host shows that BuildKit does not do the same; it uses the default DNS servers;

Creating a custom network and a "test-server" container attached to it;

docker network create custom-network
docker run -d --name test-server --network custom-network nginx:alpine
docker container inspect test-server --format '{{ (index .NetworkSettings.Networks "custom-network").IPAddress }}'
172.24.0.2

Create a custom builder attached to the network, and allow "host-mode" networking;

docker buildx create --name custom-net-build --driver docker-container --driver-opt network=custom-network --buildkitd-flags '--allow-insecure-entitlement network.host'
custom-net-build

Running a build with --network=host, which should run in the host's networking namespace and inherit the host's DNS configuration (127.0.0.11 - the embedded DNS);

docker buildx build --no-cache --builder custom-net-build --network=host --progress=plain --load -<<'EOF'
FROM alpine
RUN cat /etc/resolv.conf
RUN wget http://test-server
EOF

However, it looks like BuildKit sets the default DNS resolvers, not inheriting from the host;

#4 [1/3] FROM docker.io/library/alpine:latest@sha256:51b67269f354137895d43f3b3d810bfacd3945438e94dc5ac55fdac340352f48
#4 resolve docker.io/library/alpine:latest@sha256:51b67269f354137895d43f3b3d810bfacd3945438e94dc5ac55fdac340352f48 done
#4 CACHED

#5 [2/3] RUN cat /etc/resolv.conf
#5 0.038 options ndots:0
#5 0.038
#5 0.038 nameserver 8.8.8.8
#5 0.038 nameserver 8.8.4.4
#5 0.038 nameserver 2001:4860:4860::8888
#5 0.038 nameserver 2001:4860:4860::8844
#5 DONE 0.0s

#6 [3/3] RUN wget http://test-server
#6 0.057 wget: bad address 'test-server'
#6 ERROR: process "/bin/sh -c wget http://test-server" did not complete successfully: exit code: 1

So I think there's something funky going on there, and BuildKit's executor / runtime does not take host networking into account for DNS resolvers 🤔

Had a quick peek at code that I think is related to this; it looks like it logs a message about host networking; https://github.com/moby/buildkit/blob/8849789cf8abdc7d63ace61f8dc548582d22f3b5/executor/runcexecutor/executor.go#L184-L188

But after that unconditionally uses the standard /etc/resolv.conf that was generated (and used for all containers used during build); https://github.com/moby/buildkit/blob/8849789cf8abdc7d63ace61f8dc548582d22f3b5/executor/oci/resolvconf.go#L27-L118

@TBBle commented Jan 3, 2024

BuildKit incorrectly replacing localhost DNS resolvers when using host networking is moby/buildkit#3210. There was a PR in progress just over a year ago, but it wasn't completed. moby/buildkit#2404 seems to have had more recent activity, but looks much wider in scope than moby/buildkit#3210.

@crazy-max (Member)

BuildKit incorrectly replacing localhost DNS resolvers when using host networking is moby/buildkit#3210. There was a PR in progress just over a year ago, but it wasn't completed. moby/buildkit#2404 seems to have had more recent activity, but looks much wider in scope than moby/buildkit#3210.

Should be solved with moby/buildkit#4524

@TBBle commented Feb 11, 2024

If you want to test the fixed BuildKit right now, create a builder with --driver-opt=image=moby/buildkit:master,network=custom-network <other params>.

In fact, I made that change in the shell script from #175 (comment) on line 30

--driver-opt "image=moby/buildkit:master,network=custom-network" \

and it failed the same way, even though I confirmed that it was running BuildKit from commit 2873353.

I can see that the BuildKit worker is supposed to be using host-mode networking:

Labels:
 org.mobyproject.buildkit.worker.executor:         oci
 org.mobyproject.buildkit.worker.hostname:         d1295746541b
 org.mobyproject.buildkit.worker.network:          host
 org.mobyproject.buildkit.worker.oci.process-mode: sandbox
 org.mobyproject.buildkit.worker.selinux.enabled:  false
 org.mobyproject.buildkit.worker.snapshotter:      overlayfs

So I tried with the extra changes from #175 (comment), resulting in

docker buildx create \
  --name custom-net-build \
  --driver docker-container \
  --driver-opt "image=moby/buildkit:master,network=custom-network" \
  --buildkitd-flags "--allow-insecure-entitlement network.host"
docker buildx build --builder custom-net-build --build-arg "server_ip=${server_ip}" \
  --network host \
  -o "type=local,dest=output" .

and that worked correctly:

$ cat output/resolv.conf
nameserver 127.0.0.11
options ndots:0
$ cat output/by-dns.txt
hello from custom network

So it'd be nice if the buildx docker-container driver could automatically set up the hosted BuildKit with the equivalent of --network host, either when using a custom network, or when the network mode is not CNI (defaulted or explicitly "host").

Technically, I guess if BuildKit used the worker's network config rather than the specific RUN command's network config when making resolv.conf decisions, then it'd work too, since clearly the custom network is reachable even without --network host.

I'm not 100% clear on the network layering here. Maybe the RUN's container (without --network=host) is actually being attached to the custom network too, rather than inheriting from its parent, and so BuildKit is still doing the wrong thing by not producing a resolv.conf that matches this behaviour?

Anyway, the relevant buildkit change should be part of the 0.13 release and any pre-releases after 0.13b3, and automatically picked up by docker-container drivers through the default moby/buildkit:buildx-stable-1 tag, updated when the BuildKit maintainers bless a build as stable-enough.

@TBBle

TBBle commented Feb 12, 2024

Okay, the result of the discussion with the BuildKit maintainer on that PR is that making this smoother is a buildx thing, as BuildKit is changing its default network config such that host mode will no longer be the implicit default.

For reference, here is my current working mod of #175 (comment)

#!/bin/sh

set -ex

docker network create custom-network

echo "hello from custom network" >test.txt
docker run --name "test-server" --net custom-network -d --rm \
  -v "$(pwd)/test.txt:/usr/share/nginx/html/test.txt:ro" nginx
server_ip="$(docker container inspect test-server --format "{{ (index .NetworkSettings.Networks \"custom-network\").IPAddress }}")"
echo "Server IP is ${server_ip}"

cat >Dockerfile <<EOF
FROM curlimages/curl as build
USER root
ARG server_ip
ADD http://${server_ip}/test.txt /output/by-ip-add.txt
ADD http://test-server/test.txt /output/by-dns-add.txt
RUN mkdir -p /output \
 && cp /etc/resolv.conf /output/resolv.conf \
 && echo \${server_ip} >/output/server_ip.txt \
 && (curl -sSL http://test-server/test.txt >/output/by-dns.txt 2>&1 || :) \
 && (curl -sSL http://\${server_ip}/test.txt >/output/by-ip.txt 2>&1 || :)

FROM scratch
COPY --from=build /output /
EOF

docker buildx create \
  --name custom-net-build \
  --driver docker-container \
  --driver-opt "image=moby/buildkit:master,network=custom-network" \
  --buildkitd-flags "--allow-insecure-entitlement=network.host --oci-worker-net=host"
docker buildx build --builder custom-net-build --build-arg "server_ip=${server_ip}" \
  --network host \
  -o "type=local,dest=output" .
docker buildx inspect custom-net-build

docker buildx rm custom-net-build
docker stop test-server
docker network rm custom-network

set +x
for d in output/by-*; do echo -n "$d:"; cat $d; done

This should remain working even when BuildKit changes default networking to bridge, as I've explicitly passed "--oci-worker-net=host" to the buildkitd in the container.

I also added a pair of ADD calls to the Dockerfile, to demonstrate that even if you remove --network host from the docker buildx build call, they can still use the custom network's DNS, as they operate in buildkitd's network namespace, i.e. implicitly host. Once bridge becomes the default, that will break. (I tried a quick test of what happens when bridge-mode becomes the default, but hit a minor issue with the buildkit image. Edit: Working around that, it seems to give the same results as host-mode, so I guess it's irrelevant here: ADD and COPY operate in host-mode no matter what you pass for the worker mode, so the change in worker net-mode default won't break these.)

For buildx's docker-container driver, a simple nicety would be for the "custom network" driver option to automatically enable those two buildkitd flags (--allow-insecure-entitlement=network.host and --oci-worker-net=host), since without the former you can't pass --network host to docker buildx build, and without the latter the default BuildKit network mode change will break builds that work today by using the custom network only for ADD and COPY.

(Thinking about this, the kubernetes driver must be doing something similar, unless the same problem shows up there...)

Then the buildx docker-container custom network docs also need to mention that you need to use docker buildx build --network host to access your custom network from RUN commands. That's a little semantically weird ("which host?") but it is documentable. Nicety options there are welcome, but I note that the BuildKit maintainer feedback was that simply adding --network host automatically was not a good choice.

Longer term/more-fully, buildx could perhaps use the new bridge mode to expose the custom network to both the buildkitd (ADD/COPY) and the workers (RUN), removing the need to do anything special in the docker buildx build command except choose the already-configured-for-custom-network builder instance, and presumably mildly improving the network isolation in the process.

@tonistiigi
Member

#2255
#2256

@dm17

dm17 commented Feb 12, 2024

@TBBle Does that mean that one would be able to use bake with a compose file?

@TBBle

TBBle commented Feb 13, 2024

I believe so, yes. AFAIK Bake is just driving these same components underneath, and I think all the relevant configuration options as used in the test-case I extended are exposed in compose.yaml; but note that I am not a Compose user so I haven't tried this, and some of the compose.yaml fields will not be usable as they will be passing parameters to docker buildx build that instead need to go to docker buildx create.

Edit: Actually no. I tried to get this working, and realised that I don't see how you specify a compose file that actually starts services and then runs builds that rely on those services; the docker compose up --build command wants to build things before starting any services. (Apparently in 2018 depends_on affected building, but that wasn't intended behaviour.)

So I'd need to see a working (and ideally simple) example (i.e. from non-buildx) to understand what I'm trying to recreate.

Also, as noted in #175 (comment), compose doesn't currently support creating a buildx builder instance, so the builder instance would need to be created first, which means the network must be created first and then referenced in the compose.yaml as external: true.
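
For illustration, that ordering would look roughly like this (a sketch reusing the names and flags from the commands later in this comment; the external network reference is the only compose-side change):

# Create the network and the builder up front, outside of compose
docker network create custom-network
docker buildx create --name custom-net-build --driver docker-container \
  --driver-opt "image=moby/buildkit:master,network=custom-network" \
  --buildkitd-flags "--allow-insecure-entitlement=network.host"

# In compose.yaml, reference the pre-existing network instead of letting
# compose create (and name-mangle) one:
#
#   networks:
#     custom-network:
#       external: true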

I also just remembered you specified docker buildx bake with a compose.yaml, not docker compose up, which I was testing with. I didn't think docker buildx bake ran services from the compose.yaml; I understood it just parsed out build targets.

So yeah, without an example of what you think ought to work, I can't really advance any compose/bake discussion further.


If you're just using docker compose up to bring up the relevant services on the custom network, and then docker buildx bake to do the build against those, it should work, but you'd still need to either pre-create the custom network and builder before running the tools, or, between compose-up and buildx-bake, create the custom builder attached to the compose-created network.

I recreated the same test script in this style:

networks:
  custom-network:
    # Force the name so that we can reference it when creating the builder
    name: custom-network

services:
  test-server:
    # Force the name so that we can reference from the build-only Dockerfile
    container_name: test-server
    image: nginx
    networks:
      - custom-network
    volumes:
      - type: bind
        source: ./test.txt
        target: /usr/share/nginx/html/test.txt
        read_only: true
  build-only:
    build:
      network: host
      dockerfile_inline: |
        FROM curlimages/curl as build
        USER root
        ADD http://test-server/test.txt /output/by-dns-add.txt
        RUN mkdir -p /output \
         && cp /etc/resolv.conf /output/resolv.conf \
         && (curl -sSL http://test-server/test.txt >/output/by-dns.txt 2>&1 || :)
        RUN for d in /output/by-*; do echo -n "$$d:"; cat $$d; done

        FROM scratch
        COPY --from=build /output /

and then created test.txt with the desired contents ("hello from custom network"), and:

$ docker compose up test-server --detach
$ docker buildx create --name custom-net-build --driver docker-container --driver-opt "image=moby/buildkit:master,network=custom-network" --buildkitd-flags "--allow-insecure-entitlement=network.host"
$ docker buildx bake --builder custom-net-build --progress plain --no-cache --set=build-only.output=type=local,dest=output
...
#11 [build 4/4] RUN for d in /output/by-*; do echo -n "$d:"; cat $d; done
#11 0.071 /output/by-dns-add.txt:hello from custom network
#11 0.072 /output/by-dns.txt:hello from custom network
#11 DONE 0.1s
...
$ docker buildx rm custom-net-build
$ docker compose down
$ for d in output/by-*; do echo -n "$d:"; cat $d; done
output/by-dns-add.txt:hello from custom network
output/by-dns.txt:hello from custom network

Note that in the docker buildx bake call, --progress plain, --no-cache, and --set=build-only.output=type=local,dest=output are only there so you can see the output in the log, so I can rerun the command and see fresh output each time, and to simulate the shell script's -o "type=local,dest=output" (dumping the final image into a directory so you can examine the results), respectively. (Weirdly, --set=build-only.output.type=<any string> also had this effect, which I'm pretty sure is a bug/unintended feature.)

So that seems to work, yeah. If compose and/or bake were able to define builders to create (and ideally agree on the real name of either the builder or the network in order to avoid hard-coding the network name as I did here) then it would just be a relatively simple compose up -d && bake && compose down.
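
Until then, the manual wrapper is essentially the commands from above collected together (a sketch; same names and flags as in the example):

docker compose up test-server --detach
docker buildx create --name custom-net-build --driver docker-container \
  --driver-opt "image=moby/buildkit:master,network=custom-network" \
  --buildkitd-flags "--allow-insecure-entitlement=network.host"
docker buildx bake --builder custom-net-build --set=build-only.output=type=local,dest=output
docker buildx rm custom-net-build
docker compose down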

If that's what you want, then I'd suggest opening a new feature request for it: I think it makes more sense to be part of compose so it can be used for compose's own builds too but SWTMS (See What The Maintainers Say). (Also, docker buildx bake doesn't know about compose name-mangling, so if this was part of bake then you'd still need to give the network an explicit name; either way you end up hard-coding the container_name for the services accessed from the Dockerfile, so local parallelisability is of the bake call only, not the whole stack)

(If you're feeling really clever, you could include a builder as a normal service in the compose spec, and then use the remote buildx driver to connect to it, making the docker buildx create call more generic; however, I haven't used the remote buildx driver so can't promise that this ends up simpler... it probably ends up more complex as you have to manage TLS and however you reach the custom-network-based builder from your buildx-hosting instance. The docker-container and kubernetes drivers avoid all this by using docker exec and kubectl exec equivalents.)

@dm17

dm17 commented Apr 9, 2024

@TBBle Thanks; that's also what I found. I appreciate the suggestions but don't yet have the mental energy earmarked in order to complete any. I'll wait a little longer in hopes that someone else streamlines a solution before attempting again :)
