no active session for <id>: context deadline exceeded #456

Closed
errordeveloper opened this issue Nov 27, 2020 · 41 comments
@errordeveloper (Contributor)

I’m seeing a buildx error like this:

#25 exporting to image
#25 exporting layers
#25 exporting layers 2.4s done
#25 exporting manifest sha256:bee6a0ade00b7bc2bcaae86deefc8e88e6a274dee097c8ade078e4b3a88de520 done
#25 exporting config sha256:ac3b1fe36ac072dafadc6f2aed84ff305d810a92a3e62f71a820928c832d9ac9
#25 exporting config sha256:ac3b1fe36ac072dafadc6f2aed84ff305d810a92a3e62f71a820928c832d9ac9 0.0s done
#25 exporting manifest sha256:b7812799f180bbded0a8de865efba29dcf351c858f958e506cc13e612657e4b8 0.0s done
#25 exporting config sha256:b7efbce252b10357c495758cfb1299a4c6908153eec71fc8e3097bcca5d3eb8f done
#25 exporting manifest list sha256:cbf719abf8d8c678350283152e91f9f49c1138e0087758eeaa67f6697ae4dfda done
#25 ERROR: no active session for os44n22472jqaz3bk2t3jktq2: context deadline exceeded

Things to note:

  • buildx v0.4.2
  • build takes 4h50m
  • it's meant to push to quay.io
  • full build log
@errordeveloper (Contributor, Author)

My first hunch is that some context is initialised with a short deadline, and that deadline expires because the build takes so long.
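
To illustrate the hunch, here is a minimal self-contained Go sketch (not buildkit's actual code) of a context with a short deadline expiring underneath a long-running operation:

package main

import (
	"context"
	"fmt"
	"time"
)

func main() {
	// a context initialised with a short deadline...
	ctx, cancel := context.WithTimeout(context.Background(), 2*time.Second)
	defer cancel()

	// ...expires while the (stand-in) long-running build is still in flight
	select {
	case <-time.After(5 * time.Second): // pretend this is the 4h50m build
		fmt.Println("build finished")
	case <-ctx.Done():
		fmt.Println(ctx.Err()) // prints "context deadline exceeded"
	}
}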

The error is very ambiguous; however, "no active session for" is quite unique and is returned from here:

// in buildkit's session manager: the lookup for a session ID gives up
// as soon as the caller's context expires, producing exactly this error
case <-ctx.Done():
	sm.mu.Unlock()
	return nil, errors.Wrapf(ctx.Err(), "no active session for %s", id)

Also, "ERROR:" is a unique prefix too, perhaps somewhat surprisingly!

if v.Error != "" {
if v.logsPartial {
fmt.Fprintln(p.w, "")
}
if strings.HasSuffix(v.Error, context.Canceled.Error()) {
fmt.Fprintf(p.w, "#%d CANCELED\n", v.index)
} else {
fmt.Fprintf(p.w, "#%d ERROR: %s\n", v.index, v.Error)

All I can tell is that this came from here:

var c console.Console
if cons, err := console.ConsoleFromFile(out); err == nil && (mode == "auto" || mode == "tty") {
	c = cons
}
// not using shared context to not disrupt display but let it finish reporting errors
pw.err = progressui.DisplaySolveStatus(ctx, "", c, out, statusCh)
close(doneCh)

I am not entirely sure where to go from here; this seems like some kind of generic processing-queue thing, and it could be coming from just about anywhere in either the buildkit or buildx code...

@SphtKr commented Dec 4, 2020

Not much to add, but I'm seeing the same behavior here today with a similar setup: building multiarch across three nodes (amd64 + aarch64 + armv7l). The armv7l build took close to eight hours and then hit this. I'm using plain TCP socket contexts to the two ARM nodes and issuing the build on the amd64 node (UNIX socket).

@tonistiigi (Member)

Was that export supposed to end with a push? What version of buildkit?

@errordeveloper (Contributor, Author)

Was that export supposed to end with a push? What version of buildkit?

@tonistiigi yes, in my case it was.

@errordeveloper (Contributor, Author)

Using plain TCP socket context to the two ARM nodes issuing the build on the amd64 node (UNIX socket).

In my case everything was built in the same docker instance, with qemu for some of the Arm stages.

@errordeveloper (Contributor, Author)

@tonistiigi ping!

@tonistiigi (Member)

It looks like maybe the session connection just dropped because your build took almost 5h. You should see that in the daemon logs. If that is the case, then maybe we could add logic to redial, although that's not really different from the build request connection itself dropping, and if it happens at the same time the session is being used, it would still fail.
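
To make the redial idea concrete, here is a hedged sketch of what client-side redial with backoff could look like (keepSessionAlive and dialSession are hypothetical names, not buildkit's API):

package session

import (
	"context"
	"time"
)

// keepSessionAlive redials a dropped session with exponential backoff.
// dialSession stands in for whatever re-establishes the gRPC session
// stream; it blocks until the session drops or ctx ends.
func keepSessionAlive(ctx context.Context, dialSession func(context.Context) error) error {
	backoff := time.Second
	for {
		err := dialSession(ctx)
		if err == nil || ctx.Err() != nil {
			return err // clean shutdown, or the build context itself ended
		}
		select {
		case <-time.After(backoff):
		case <-ctx.Done():
			return ctx.Err()
		}
		if backoff < 30*time.Second {
			backoff *= 2 // back off, capped at 30s between redials
		}
	}
}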

@errordeveloper (Contributor, Author)

Would you mind pointing me to where the redial logic would have to be implemented? Also, how can we improve the error message? I'm happy to have a go at fixing this one; I just need some pointers. I'm pretty sure it's some sort of a drop.

@tonistiigi (Member)

The session request happens in https://github.com/moby/buildkit/blob/master/client/solve.go#L165, but first you should figure out whether it is dropping and whether there is any error message or condition that causes it.

@carlonluca

The same happens to me, very frequently. In some cases it is almost impossible to push images to Docker Hub. In my case I'm cross-building on arm to other arm versions or amd64.

#13 exporting to image
#13 sha256:e8c613e07b0b7ff33893b694f7759a10d42e180f2b4dc349fb57dc6b71dcab00
#13 exporting layers
#13 exporting layers 103.9s done
#13 exporting manifest sha256:eb3e1510baff02d53aa0afd867f1e4d297935fc99ee88d8152bc7cd977ba8108 0.1s done
#13 exporting config sha256:5c767564ca11f26821ce80e1fe80a713b1b0bffae5fff3e47c03cd29f295e013
#13 exporting config sha256:5c767564ca11f26821ce80e1fe80a713b1b0bffae5fff3e47c03cd29f295e013 0.1s done
#13 pushing layers
#13 pushing layers 765.1s done
#13 ERROR: no active session for mxx75rit9yyhguolxzs23e4oq: context deadline exceeded
------
 > exporting to image:
------
error: failed to solve: rpc error: code = DeadlineExceeded desc = no active session for mxx75rit9yyhguolxzs23e4oq: context deadline exceeded

@pgehring commented Feb 25, 2021

We are getting the same error while pushing to GitHub Container Registry. We thought this was related to the 5 GB limit on a single image layer, but we were able to push images with layer sizes >5 GB.
As far as we observed, the layers in this example are pushed successfully according to the logfile.

#9 exporting to image
#9 sha256:e8c613e07b0b7ff33893b694f7759a10d42e180f2b4dc349fb57dc6b71dcab00
#9 exporting layers
#9 exporting layers 989.6s done
#9 exporting manifest sha256:78e6645f9a005842a4243947bd2e6e928f9a45b071637e2a1ba42ad9bf1c1136
#9 exporting manifest sha256:78e6645f9a005842a4243947bd2e6e928f9a45b071637e2a1ba42ad9bf1c1136 0.4s done
#9 exporting config sha256:4782358c0f8cd40619f1244a283965c470e3977ef112276f93a84eb5e5111a27
#9 exporting config sha256:4782358c0f8cd40619f1244a283965c470e3977ef112276f93a84eb5e5111a27 0.4s done
#9 pushing layers
#9 pushing layers 85.5s done
#9 ERROR: no active session for d2mw2899j00mfampt13ooz376: context deadline exceeded
------
 > exporting to image:
------
error: failed to solve: rpc error: code = DeadlineExceeded desc = no active session for d2mw2899j00mfampt13ooz376: context deadline exceeded
Error: buildx call failed with: error: failed to solve: rpc error: code = DeadlineExceeded desc = no active session for d2mw2899j00mfampt13ooz376: context deadline exceeded

The following log is copied from a build of an image with layer sizes above 5 GB. Judging by this log, it is the pushing of the manifest data that fails in the example above.

#9 exporting to image
#9 sha256:e8c613e07b0b7ff33893b694f7759a10d42e180f2b4dc349fb57dc6b71dcab00
#9 pushing layers 461.4s done
#9 pushing manifest for ghcr.io/...
#9 pushing manifest for ghcr.io/... 1.2s done
#9 DONE 805.3s
🛒 Extracting digest...
sha256:56e2cf1b9b5d78b293851ef6e68e23494885974db6e945141677240c458d380c

@errordeveloper (Contributor, Author) commented Mar 12, 2021

moby/buildkit#2019 is potentially related with regard to improving how the underlying error handling machinery is implemented.

@deepio commented Apr 6, 2021

For anyone else running into this issue as part of their CI pipeline: we've decided to downgrade to legacy builds instead, because reliability is more important than speed.
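
(For anyone wanting the same escape hatch: disabling BuildKit selects the legacy builder; the image name below is a placeholder.)

DOCKER_BUILDKIT=0 docker build -t myimage:latest .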

@hfedcba commented May 23, 2021

This also happens with short build times, in this case less than 10 minutes. Often the same build passes with a build time of about 12 minutes. I'd say it fails in about 50% of the jobs.

#9 DONE 528.6s
#10 exporting to image
#10 exporting layers
#10 exporting layers 10.6s done
#10 exporting manifest sha256:e341e2c0bc3510d62cc7650f3ee0e5006bb679499977832fc8a6b493d5dc8491
#10 exporting manifest sha256:e341e2c0bc3510d62cc7650f3ee0e5006bb679499977832fc8a6b493d5dc8491 0.4s done
#10 exporting config sha256:082f6382fb3047e7182df5bf0360c2cf5b9927b4b4fd8664ad058c75663bc485
#10 exporting config sha256:082f6382fb3047e7182df5bf0360c2cf5b9927b4b4fd8664ad058c75663bc485 0.4s done
#10 ERROR: no active session for 7c6uxna8fa27v8oanfq8zcj3l: context deadline exceeded
------
 > exporting to image:
------
failed to solve: rpc error: code = Unknown desc = no active session for 7c6uxna8fa27v8oanfq8zcj3l: context deadline exceeded
Cleaning up file based variables 00:00
ERROR: Job failed: exit status 1

@Alexander-Bartosh

It looks like it's related to concurrent builds on the same instance of buildkit.
I have reproduced the same problem multiple times. All build requests take no more than 15 minutes.
Env:
Buildkit: moby/buildkit:v0.8.3-rootless, Azure Kubernetes Service, Azure Container Service

error writing layer blob: no active session for s6kthksskcg47bmuu60w25p3o: context deadline exceeded

@wojiushixiaobai

The same problem.

@Oliveirakun

Does anybody know any workaround for this issue?

@wojiushixiaobai commented Jul 18, 2021

@Oliveirakun
A temporary workaround is to add a retry mechanism around the build:

export DOCKER_BUILDKIT=1
# try the build up to 3 times before giving up
for i in $(seq 1 3); do
  if docker buildx build --platform linux/amd64,linux/arm64 -t test:dev . --push; then
    break
  fi
  sleep 3
  if [[ "${i}" == "3" ]]; then
    echo "[Error]: build error"
    exit 1
  fi
done


@tonistiigi (Member)

I'd like to see daemon debug logs for this case. Possible causes are (case 1 is sketched below):

  1. a network hiccup when dialing the session. The solution would be to increase the timeout; I think it is 5s at the moment.
  2. the session disconnects in the middle of the build, or the session stream receives some HTTP/2 error. The solution would be for the client to redial.
  3. something goes wrong and the wrong session ID is asked for.
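
A minimal Go sketch of case 1 (the socket address and the 5s figure are illustrative; the real dial path and timeout live in buildkit):

package main

import (
	"context"
	"fmt"
	"time"

	"google.golang.org/grpc"
)

func main() {
	// a short dial timeout: one network hiccup longer than this and the
	// session never attaches, surfacing as "context deadline exceeded"
	ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
	defer cancel()

	conn, err := grpc.DialContext(ctx, "unix:///run/buildkit/buildkitd.sock",
		grpc.WithInsecure(), // era-appropriate; newer code passes credentials explicitly
		grpc.WithBlock(),    // block so the context deadline applies to the dial itself
	)
	if err != nil {
		fmt.Println("dial failed:", err) // e.g. context deadline exceeded
		return
	}
	defer conn.Close()
}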

@ifeelingz

Same problem :(

@ddomnik commented Aug 16, 2021

Same problem here... has anybody got a solution or hack for this? Alternatively, can we store multiarch images locally in our "images" and then try to push them?

@ifeelingz commented Aug 16, 2021

Hi everyone!

I added this before the "Build and push" step:

  - name: Pre Build
    run: "docker system prune -af"

or

  - name: Pre Build
    run: "docker system prune --volume -af"

It works for me.

@Alexander-Bartosh

Guys, I have stopped seeing the problem after updating to Buildkit moby/buildkit:v0.9.0-rootless and increasing the cache size from the default to 50 GB (--oci-worker-gc-keepstorage=50000).

@Alexander-Bartosh commented Aug 24, 2021

Still have it on Buildkit moby/buildkit:v0.9.0-rootless, Azure Kubernetes Service, Azure Container Service.

In my case it was related to two concurrent cache exports:

2021-08-24T12:19:53.5930996Z #26 exporting cache
2021-08-24T12:19:53.5931582Z #26 sha256:2700d4ef94dee473593c5c614b55b2dedcca7893909811a8f2b48291a1f581e4
2021-08-24T12:19:53.5931959Z #26 preparing build cache for export
2021-08-24T12:19:54.1934009Z #26 preparing build cache for export 0.5s done
2021-08-24T12:19:54.1934497Z #26 writing layer sha256:0da622ee6d9e9d6172a4eb7e3647a200f74190536a9c3b8b53a9c2d014803b7d
2021-08-24T12:20:14.4543178Z #26 20.95 error: no active session for aim388ulrmif949ccgd2k0xza: context deadline exceeded
2021-08-24T12:20:14.4543633Z #26 20.95 retrying in 1s
2021-08-24T12:20:44.4583718Z #26 50.95 error: no active session for aim388ulrmif949ccgd2k0xza: context deadline exceeded
2021-08-24T12:20:44.4584324Z #26 50.95 retrying in 2s
2021-08-24T12:20:59.4693657Z #26 65.96 error: no active session for aim388ulrmif949ccgd2k0xza: context deadline exceeded
2021-08-24T12:20:59.4694097Z #26 65.96 retrying in 4s
2021-08-24T12:21:09.4759423Z #26 writing layer sha256:0da622ee6d9e9d6172a4eb7e3647a200f74190536a9c3b8b53a9c2d014803b7d 75.4s done
2021-08-24T12:21:09.4760392Z #26 75.97 error: no active session for aim388ulrmif949ccgd2k0xza: context deadline exceeded
2021-08-24T12:21:09.4760992Z #26 ERROR: error writing layer blob: no active session for aim388ulrmif949ccgd2k0xza: context deadline exceeded

2021-08-24T12:19:39.0062856Z #43 exporting cache
2021-08-24T12:19:39.0063170Z #43 sha256:2700d4ef94dee473593c5c614b55b2dedcca7893909811a8f2b48291a1f581e4
2021-08-24T12:19:39.0063480Z #43 preparing build cache for export
2021-08-24T12:19:39.4326968Z #43 preparing build cache for export 0.5s done
2021-08-24T12:19:39.5829077Z #43 writing layer sha256:0da622ee6d9e9d6172a4eb7e3647a200f74190536a9c3b8b53a9c2d014803b7d
2021-08-24T12:19:54.4506394Z #43 15.50 error: no active session for aim388ulrmif949ccgd2k0xza: context deadline exceeded
2021-08-24T12:19:54.4506870Z #43 15.50 retrying in 1s
2021-08-24T12:20:24.4557915Z #43 45.50 error: no active session for aim388ulrmif949ccgd2k0xza: context deadline exceeded
2021-08-24T12:20:24.4558485Z #43 45.50 retrying in 2s
2021-08-24T12:20:54.4602936Z #43 75.51 error: no active session for aim388ulrmif949ccgd2k0xza: context deadline exceeded
2021-08-24T12:20:54.4604240Z #43 75.51 retrying in 4s
2021-08-24T12:21:04.4717026Z #43 writing layer sha256:0da622ee6d9e9d6172a4eb7e3647a200f74190536a9c3b8b53a9c2d014803b7d 85.0s done
2021-08-24T12:21:04.4717958Z #43 85.52 error: no active session for aim388ulrmif949ccgd2k0xza: context deadline exceeded
2021-08-24T12:21:04.4718705Z #43 ERROR: error writing layer blob: no active session for aim388ulrmif949ccgd2k0xza: context deadline exceeded

@wojiushixiaobai

Still have it on Buildkit moby/buildkit:v0.9.0-rootless, Azure Kubernetes Service, Azure Container Service.
Maybe you could format the reply so that others can read it better.

@tonistiigi (Member)

moby/buildkit#2369 was merged. Hopefully this is fixed now.

@tonistiigi (Member)

@ekaterinadimitrova2 are you running the master buildkit build with the patch?

@koehn commented Nov 11, 2021

I can confirm that this is still happening with the master buildkit build (moby/buildkit).

All local, using qemu for the emulated arm64 bits.

@buu700 commented Jan 25, 2022

I've been consistently running into this with Docker for Mac 4.4.2 on my M1 MacBook Air, specifically when building a Dockerfile with a COPY instruction. Downgrading to 4.3.2 seems to have fixed the problem.

@errordeveloper (Contributor, Author)

I've been consistently running into this with Docker for Mac 4.4.2 on my M1 MacBook Air, specifically when building a Dockerfile with a COPY instruction. Downgrading to 4.3.2 seems to have fixed the problem.

Do you use 'docker buildx create'?

@buu700 commented Jan 25, 2022

I tried it both ways with the same result, but when I did use buildx create the command was docker buildx create --buildkitd-flags '--oci-worker-gc --oci-worker-gc-keepstorage 50000' --name cyph_build_context; docker buildx use cyph_build_context. The buildkitd flags were set based on @Alexander-Bartosh's suggested workaround above, and I also tried setting that in the global configuration file through the Docker for Mac preferences window.

@jamshid commented Apr 2, 2022

I've been trying to figure out why I get this dreaded error when doing a few concurrent DOCKER_BUILDKIT=1 docker build(x) builds:

failed to solve with frontend dockerfile.v0: failed to read dockerfile: no active session for kqupcokbxys31oapli62ax81k: context deadline exceeded

Btw, I don't get the error when DOCKER_HOST is blank; I only get it when using DOCKER_HOST=ssh://docker.example.com. I'm on that server, docker.example.com (passwordless ssh is configured and working).

Anyway, I saw this closed bug and figured it's been fixed for so long that I must have the fix, but my docker 20.10.14 on Ubuntu 20 still uses buildx Docker Buildx (Docker Inc., v0.8.1-docker)! Why is it so old? Is there any way to upgrade it?

@errordeveloper (Contributor, Author)

@jamshid does the error occur right away in your case, or after a timeout? This was originally reported as a timeout case; if you are seeing it right away, it might be a different bug altogether.

@jamshid commented Apr 3, 2022

@errordeveloper thanks, yes it's pretty immediate.

Where are the useful logs, just journalctl -u docker? Can I enable debug logging for the docker client (--debug), server, and buildx somehow?

I haven't had much luck narrowing it down to something easily reproducible, but I can file a bug with the logs. I'd still like to upgrade buildx to see if this is already fixed, though.

@errordeveloper (Contributor, Author)

Anyway I saw this closed bug and figure it's been fixed for so long I must have the fix, but my docker 20.10.14 on ubuntu 20 still uses buildx: Docker Buildx (Docker Inc., v0.8.1-docker)! Why is it so old? Is there any way to upgrade it?

Are you using the official Docker packages for Ubuntu? I'd recommend trying the latest official packages; it's probably a good idea to also remove ~/.docker/cli-plugins/docker-buildx before installing the latest version.
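
If you do want a newer plugin than the packaged one, a manual install along these lines should work (the version and platform in the URL are examples; check https://github.com/docker/buildx/releases for current releases):

mkdir -p ~/.docker/cli-plugins
curl -fsSL -o ~/.docker/cli-plugins/docker-buildx \
  https://github.com/docker/buildx/releases/download/v0.8.2/buildx-v0.8.2.linux-amd64
chmod +x ~/.docker/cli-plugins/docker-buildx
docker buildx version   # confirm the new plugin is picked up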

@errordeveloper (Contributor, Author)

I have to admit, what I said above is just a general point about upgrading; having looked more specifically at the details, I can see that you don't have to upgrade.

Firstly, buildx v0.8.1 is only two weeks old and one patch release behind the latest (v0.8.2).
Secondly, this bug is most likely to be in buildkitd, which by default is part of the Docker engine. Docker 21.x is not out yet, so you won't get the latest buildkit unless you use the container driver.

You should try this to get an instance of buildkitd running in a container:

docker buildx create --use --driver docker-container

Having done that, you should get the latest stable version out of the box.
If you can reproduce the behaviour this way, please do open another issue, as from what you are saying it looks like a new bug associated with the SSH transport.
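
To confirm what the new builder is actually running, docker buildx inspect can bootstrap it and print its status (recent versions also report the BuildKit version):

docker buildx inspect --bootstrap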

@jamshid commented Apr 4, 2022

I am using the latest official Ubuntu docker packages. OK, I will file a new bug about the SSH transport.

Sorry, my confusion about buildx versions was that I was looking at the buildkit releases under https://github.com/moby/buildkit/releases. That versioning is apparently unrelated to the buildx CLI plugin versioning (https://github.com/docker/buildx).

@Cveinnt commented May 21, 2022

Getting this error in a GitHub Action - has anyone encountered anything similar?

@aikoven commented Jun 22, 2022

I was getting this error in GitHub Actions. Solved by bumping action versions:

docker/setup-buildx-action@v2
docker/login-action@v2
docker/build-push-action@v3

UPD: still getting the error sometimes 😔

UPD2: solved by running on a machine with more memory.

@buu700 commented Jun 28, 2022

I've been consistently running into this with Docker for Mac 4.4.2 on my M1 MacBook Air, specifically when building a Dockerfile with a COPY instruction. Downgrading to 4.3.2 seems to have fixed the problem.

I confirmed that this is still an issue in version 4.9.1, this time testing on x86 macOS. As before, downgrading to 4.3.2 is an effective workaround.

akihironitta added a commit to Lightning-AI/pytorch-lightning that referenced this issue Jul 26, 2022
akihironitta added a commit to Lightning-AI/pytorch-lightning that referenced this issue Aug 10, 2022
awaelchli pushed a commit to Lightning-AI/pytorch-lightning that referenced this issue Aug 15, 2022
jessecambon pushed a commit to jessecambon/lightning that referenced this issue Aug 16, 2022
lexierule pushed a commit to Lightning-AI/pytorch-lightning that referenced this issue Aug 17, 2022
Borda pushed a commit to Lightning-AI/pytorch-lightning that referenced this issue Aug 22, 2022
lantiga pushed a commit to Lightning-AI/pytorch-lightning that referenced this issue Aug 22, 2022
@padrepitufo

I was seeing this error crop up, but only when using v0.33.2 of Tilt; having upgraded to v0.33.10, the error went away 🤷‍♂️
