Description
For a university project, we are supposed to introduce refactoring changes to the OpenTelemetry demo architecture. To run locally built images, we cloned the repo and changed the version names in .env and opentelemetry-demo.yaml, which forces Docker to build the images locally instead of pulling the remote ones. No refactoring changes have been introduced yet. Even so, the frontend pod gets OOMKilled every 5-15 minutes. Running the architecture with the remote images works without problems on the same minikube setup (8 CPUs, 14000 MB memory, running in a GCP VM).
The log file of the terminated frontend pod:
npm error path /app
npm error command failed
npm error signal SIGKILL
npm error command sh -c node --require ./Instrumentation.js server.js
npm notice
npm notice New major version of npm available! 10.9.2 -> 11.4.2
npm notice Changelog: https://github.com/npm/cli/releases/tag/v11.4.2
npm notice To update run: npm install -g npm@11.4.2
npm notice
npm error A complete log of this run can be found in: /home/nextjs/.npm/_logs/2025-06-17T18_23_27_597Z-debug-0.log
The debug log referenced in the last line is stored inside the pod, so it cannot be retrieved once the pod has been terminated.
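Although the npm debug log is lost, the termination reason itself is recorded in the pod status. The following is a minimal sketch, not something taken from our setup: it assumes the JSON printed by `kubectl get pod <name> -o json` and a container named `frontend`, and the `sample` document and the helper name `last_termination` are hypothetical. It reads the `lastState.terminated` record of the container, where an OOM kill shows up as reason `OOMKilled` with exit code 137 (128 + SIGKILL).

```python
import json
from typing import Optional


def last_termination(pod: dict, container: str) -> Optional[dict]:
    """Return the lastState.terminated record of a container, or None."""
    statuses = pod.get("status", {}).get("containerStatuses", [])
    for status in statuses:
        if status.get("name") == container:
            return status.get("lastState", {}).get("terminated")
    return None


# Trimmed, hypothetical stand-in for `kubectl get pod <name> -o json` output.
sample = json.loads("""
{
  "status": {
    "containerStatuses": [
      {
        "name": "frontend",
        "restartCount": 4,
        "lastState": {
          "terminated": {"reason": "OOMKilled", "exitCode": 137}
        }
      }
    ]
  }
}
""")

term = last_termination(sample, "frontend")
print(term["reason"], term["exitCode"])  # prints: OOMKilled 137
```

With real output, the JSON could be piped in via `json.load(sys.stdin)` instead of the inline sample.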
We also observed a much higher CPU I/O wait percentage: about 40-50% when running the local images, compared to 1-5% when running the remote images.
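For anyone who wants to reproduce the I/O wait comparison, here is a minimal sketch of how such a percentage can be derived from two samples of the aggregate `cpu` line in `/proc/stat` on Linux. It assumes the standard field order (user, nice, system, idle, iowait, irq, softirq, steal); the sample lines and the function names are made up for illustration and are not our measured values.

```python
from typing import List


def parse_cpu_line(line: str) -> List[int]:
    """Parse the aggregate 'cpu' line of /proc/stat into its first 8 counters."""
    fields = line.split()
    if not fields or fields[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    # guest and guest_nice (fields 9 and 10) are already counted in user/nice,
    # so only the first eight counters are used for the total.
    return [int(value) for value in fields[1:9]]


def iowait_percent(before: str, after: str) -> float:
    """Share of CPU time spent in iowait between two samples, in percent."""
    start = parse_cpu_line(before)
    end = parse_cpu_line(after)
    deltas = [e - s for s, e in zip(start, end)]
    total = sum(deltas)
    if total <= 0:
        return 0.0
    return 100.0 * deltas[4] / total  # index 4 is the iowait counter


# Hypothetical samples taken a few seconds apart.
before = "cpu 1000 0 500 8000 500 0 0 0 0 0"
after = "cpu 1300 0 700 8100 900 0 0 0 0 0"
print(iowait_percent(before, after))  # prints: 40.0
```

On a real system the two lines would come from reading the first line of `/proc/stat` twice with a short sleep in between.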
We would appreciate any ideas or clues about the source of this problem.