Ci pipeline fails with TestContainers #8350
Comments
I had a look at the error. This 304 is because we ask the container to shut down, but the Docker API says that the container remains unchanged after the command completes, which means it's still running. It's a little weird that this happens, but the logs might give us a hint.
Looking at the logs, you can see that the container crashed with an out-of-memory error during the test run, so when we tried to shut it down, it had already stopped, which explains the 304. In fact, the broker never even started; it failed immediately:
Possibly we're running out of memory on the Docker daemon? Could be related to the increase in parallelism.
I ran a couple builds and I was monitoring our memory usage. We never get close to the limit (60Gi) - our max for the integration agent (all containers included) is about 32Gi, so there's plenty of room for the Docker containers. I really don't get how this could have happened 🤔
I'm not closing this yet since I did see it happen, but it happens very sporadically, so I'd leave it as low priority for now.
@npepinpe Did you see this occur again? If not, I would propose to close this issue.
Yes, it still happens sporadically (and for some reason usually consecutively in batches), but I haven't seen that with the GHA pipeline, so I would also be in favor of closing.
9462: Integrate Jenkins pipeline with TC Cloud r=npepinpe a=npepinpe

## Description

This PR updates both the Jenkins pipeline and the GHA pipeline to run against Testcontainers Cloud (TCC). Since TCC cannot access your local images, it relies on us starting a local registry and proxying it to the remote VM. This means the image under test changes from `camunda/zeebe:current-test` (which is really `docker.io/camunda/zeebe:current-test`) to one that specifies the local registry's URL, `localhost:5000/camunda/zeebe:current-test`.

To make our build work (while keeping our local workflow), some preliminary changes were necessary, mainly about finding the right image name. The most notable change is that the name of the image under test can now be overridden via an environment variable, `ZEEBE_TEST_DOCKER_IMAGE` (it will fall back to `camunda/zeebe:current-test` when not defined).

### How it works

TCC generally works by providing you a remote VM where your containers are started. While you can broadly think of it as Docker-as-a-Service, Docker is an implementation detail as far as TCC is concerned, so we shouldn't rely on that. To use it, you start the TCC agent locally, which the `testcontainers-java` library will use to communicate with the VM. No changes are required in your tests.

That said, under the hood, we still use only one VM, so we're still limited in terms of resources. One way to overcome this is to specify the concurrency, which will allocate more VMs for your tests. Right now, this is limited to 4, but hopefully we can increase this.

Each JVM is pinned to a single VM in a round-robin fashion. So if you have VMs 0-4 (`VM_0`, `VM_1`, etc.) and JVMs 0-4 (`JVM_0`, `JVM_1`, etc.), then you can imagine that `JVM_0` starts containers on `VM_0`, `JVM_1` on `VM_1`, etc. This is just an example, of course. This means that if you have more JVMs than VMs, the VMs are shared. So you can still run out of resources depending on test ordering and execution, but the load is a little more spread out than in the previous situation of one large VM.

Right now, I only configured the integration tests to run with TCC. The Go tests and the Elasticsearch exporter tests also make use of containers, but as they are not resource heavy, we can run them in the same job for now. If we want to increase parallelism, then we might want to look into setting them up with TCC as well.

## Related issues

closes #8350

Co-authored-by: Nicolas Pepin-Perreault <nicolas.pepin-perreault@camunda.com>
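
To make the two mechanisms in the PR description a bit more concrete, here is a minimal Java sketch. It is not taken from the Zeebe code base or from TCC: the class `TccSketch` and its methods are hypothetical, treating a blank variable as "not defined" is an assumption, and the modulo mapping is just one plausible reading of "pinned in a round-robin fashion". Only the variable name `ZEEBE_TEST_DOCKER_IMAGE` and the default `camunda/zeebe:current-test` come from the description.

```java
import java.util.Map;

/** Illustrative helpers only; all names here are hypothetical. */
final class TccSketch {
  // Both values are quoted from the PR description.
  static final String IMAGE_ENV_VAR = "ZEEBE_TEST_DOCKER_IMAGE";
  static final String DEFAULT_IMAGE = "camunda/zeebe:current-test";

  private TccSketch() {}

  /**
   * Returns the image under test: the override from the environment if present,
   * otherwise the default. Treating a blank value as unset is an assumption.
   */
  static String resolveImageName(final Map<String, String> env) {
    final String override = env.get(IMAGE_ENV_VAR);
    return (override == null || override.isBlank()) ? DEFAULT_IMAGE : override.trim();
  }

  /**
   * Maps a JVM index to a VM index using modulo, so VMs are shared once
   * there are more JVMs than VMs. How TCC really assigns VMs is not known here.
   */
  static int vmForJvm(final int jvmIndex, final int vmCount) {
    if (jvmIndex < 0) {
      throw new IllegalArgumentException("jvmIndex must be >= 0");
    }
    if (vmCount <= 0) {
      throw new IllegalArgumentException("vmCount must be > 0");
    }
    return jvmIndex % vmCount;
  }
}
```

In a test setup, `TccSketch.resolveImageName(System.getenv())` would yield `localhost:5000/camunda/zeebe:current-test` on CI when the variable is set to that value, and the default locally. With 4 VMs, `vmForJvm(5, 4)` lands on the same VM as `vmForJvm(1, 4)`, which is the sharing case the description warns about.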
Description
While merging, I encountered some issues with Testcontainers:
https://ci.zeebe.camunda.cloud/blue/organizations/jenkins/camunda-cloud%2Fzeebe/detail/staging/1463/pipeline/260
I think the interesting bit is the last part:
Is this rate limiting related to the GitHub rate limiting you mentioned earlier this week, @npepinpe?
Another thing I was wondering about is that no test is marked as failed (in the Tests tab) in the Blue Ocean view.