Agent is unable to kill container when task definition memory limit is reached #794

Closed
dm03514 opened this issue May 9, 2017 · 22 comments

@dm03514

dm03514 commented May 9, 2017

I am running a container, and when the hard task memory limit is reached it is not killed. In addition to not dying, it begins to generate a large amount of docker.io.read_bytes (observed from the Datadog ECS integration).

$ uname -r
4.4.51-40.58.amzn1.x86_64
$ sudo docker -v
Docker version 1.12.6, build 7392c3b/1.12.6

Agent version 1.14.1

docker stats shows that the container frequently sits at 100% memory usage, and that BLOCK I/O is perpetually increasing (the application should only use block I/O to read a configuration file during startup):

$ sudo docker stats <container_id>
CONTAINER           CPU %               MEM USAGE / LIMIT     MEM %               NET I/O             BLOCK I/O           PIDS
43a240399e21        3.82%               99.93 MiB / 100 MiB   99.93%              0 B / 0 B           45.72 GB / 0 B      0

The container remains up:

CONTAINER ID        IMAGE                                                                           COMMAND              CREATED             STATUS              PORTS               NAMES
43a240399e21        420876366622.dkr.ecr.us-east-1.amazonaws.com/events-writer:latest   "events-writer.py"   13 minutes ago      Up 13 minutes                           ecs-production-events-writer-57-s3eventswriterworker-9ceef8a99883ccd77d00
$ sudo docker inspect 43a240399e21
[
    {
        "Id": "43a240399e21012586a0f25207c20b0718590a8f267d4d3425095e84648ac853",
        "Created": "2017-05-09T19:38:28.950062224Z",
        "Path": "events-writer.py",
        "Args": [],
        "State": {
            "Status": "running",
            "Running": true,
            "Paused": false,
            "Restarting": false,
            "OOMKilled": false,
            "Dead": false,
            "Pid": 25144,
            "ExitCode": 0,
            "Error": "",
            "StartedAt": "2017-05-09T19:38:29.592819604Z",
            "FinishedAt": "0001-01-01T00:00:00Z"
        },
        "Image": "sha256:8bbab436d3e9458aa48c9f65e1c99832033ef3d6dc9c41a728962fd7a40ab553",
        "ResolvConfPath": "/var/lib/docker/containers/43a240399e21012586a0f25207c20b0718590a8f267d4d3425095e84648ac853/resolv.conf",
        "HostnamePath": "/var/lib/docker/containers/43a240399e21012586a0f25207c20b0718590a8f267d4d3425095e84648ac853/hostname",
        "HostsPath": "/var/lib/docker/containers/43a240399e21012586a0f25207c20b0718590a8f267d4d3425095e84648ac853/hosts",
        "LogPath": "",
        "Name": "/ecs-production-events-writer-57-s3eventswriterworker-9ceef8a99883ccd77d00",
        "RestartCount": 0,
        "Driver": "devicemapper",
        "MountLabel": "",
        "ProcessLabel": "",
        "AppArmorProfile": "",
        "ExecIDs": null,
          "HostConfig": {
            "Binds": [
                "/etc/xxx:/etc/xxx",
                "/raid0/workers/log:/raid0/workers/log"
            ],
            "ContainerIDFile": "",
            "LogConfig": {
                "Type": "awslogs",
                "Config": {
                    "awslogs-group": "ecs-production-events-writer-xxx",
                    "awslogs-region": "us-east-1",
                    "awslogs-stream": "s3_events_writer/s3_events_writer_worker/ff36c369-f4e7-46b3-b8af-89a3823dcc37"
                }
            },
            "NetworkMode": "host",
            "PortBindings": null,
            "RestartPolicy": {
                "Name": "",
                "MaximumRetryCount": 0
            },
            "AutoRemove": false,
            "VolumeDriver": "",
            "VolumesFrom": null,
            "CapAdd": null,
            "CapDrop": null,
            "Dns": null,
            "DnsOptions": null,
            "DnsSearch": null,
            "ExtraHosts": null,
            "GroupAdd": null,
            "IpcMode": "",
            "Cgroup": "",
            "Links": null,
            "OomScoreAdj": 0,
            "PidMode": "",
            "Privileged": true,
            "PublishAllPorts": false,
            "ReadonlyRootfs": false,
            "SecurityOpt": [
                "label=disable"
            ],
            "UTSMode": "",
            "UsernsMode": "",
            "ShmSize": 67108864,
            "Runtime": "runc",
            "ConsoleSize": [
                             0,
                0
            ],
            "Isolation": "",
            "CpuShares": 256,
            "Memory": 104857600,
            "CgroupParent": "",
            "BlkioWeight": 0,
            "BlkioWeightDevice": null,
            "BlkioDeviceReadBps": null,
            "BlkioDeviceWriteBps": null,
            "BlkioDeviceReadIOps": null,
            "BlkioDeviceWriteIOps": null,
            "CpuPeriod": 0,
            "CpuQuota": 0,
            "CpusetCpus": "",
            "CpusetMems": "",
            "Devices": null,
            "DiskQuota": 0,
            "KernelMemory": 0,
            "MemoryReservation": 0,
            "MemorySwap": 209715200,
            "MemorySwappiness": -1,
            "OomKillDisable": false,
            "PidsLimit": 0,
            "Ulimits": null,
            "CpuCount": 0,
            "CpuPercent": 0,
            "IOMaximumIOps": 0,
            "IOMaximumBandwidth": 0
        },
        "GraphDriver": {
            "Name": "devicemapper",
            "Data": {
                "DeviceId": "125",
                "DeviceName": "docker-202:1-263237-7e419402b4a47f4da21314bf3ae14aff3a4f95b26b2932c3099089474329eed9",
                "DeviceSize": "10737418240"
            }
        },
        "Mounts": [
            {
                "Source": "/etc/xxx",
                "Destination": "/etc/xxx",
                "Mode": "",
                "RW": true,
                "Propagation": "rprivate"
            },
            {
                "Source": "/raid0/workers/log",
                "Destination": "/raid0/workers/log",
                "Mode": "",
                "RW": true,
                "Propagation": "rprivate"
            }
        ],
        "Config": {
            "Hostname": "ip-10-0-116-202",
            "Domainname": "",
            "User": "",
            "AttachStdin": false,
            "AttachStdout": false,
            "AttachStderr": false,
            "Tty": false,
            "OpenStdin": false,
            "StdinOnce": false,
            "Env": [
                "XXX=XXX",
                "MAX_BUFFER_EVENTS_FOR_FLUSH=7000",
                "XXX=8000",
                "XXX=35",
                "EVENTS_CHANNEL=s3_events_writer_xxx",
                "MAX_EVENTS_PER_FILE=500",
                "S3_BUCKET=XXX",
                "XXX=1",
                "XXX=1",
                "PATH=/usr/local/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin",
                "LANG=C.UTF-8",
                "GPG_KEY=XXX",
                "PYTHON_VERSION=2.7.13",
                "XXX=9.0.1",
                "XXX=1"
            ],
            "Cmd": [
                "events-writer.py"
            ],
            "Image": "420876366622.dkr.ecr.us-east-1.amazonaws.com/events-writer:latest",
            "Volumes": null,
            "WorkingDir": "/usr/src/app",
            "Entrypoint": null,
            "OnBuild": null,
            "Labels": {
                "com.amazonaws.ecs.cluster": "production",
                "com.amazonaws.ecs.container-name": "s3_events_writer_worker",
                "com.amazonaws.ecs.task-arn": "arn:aws:ecs:us-east-1:420876366622:task/ff36c369-f4e7-46b3-b8af-89a3823dcc37",
                "com.amazonaws.ecs.task-definition-family": "production-events-writer",
                "com.amazonaws.ecs.task-definition-version": "57"
            }
        },
        "NetworkSettings": {
            "Bridge": "",
            "SandboxID": "f4dbb092647e221c6ebf8e5cb5f6dd2ad5a0152919738ecf5bad7f84af84de2e",
            "HairpinMode": false,
            "LinkLocalIPv6Address": "",
            "LinkLocalIPv6PrefixLen": 0,
            "Ports": {},
            "SandboxKey": "/var/run/docker/netns/default",
            "SecondaryIPAddresses": null,
            "SecondaryIPv6Addresses": null,
            "EndpointID": "",
            "Gateway": "",
            "GlobalIPv6Address": "",
            "GlobalIPv6PrefixLen": 0,
            "IPAddress": "",
            "IPPrefixLen": 0,
            "IPv6Gateway": "",
            "MacAddress": "",
            "Networks": {
                "host": {
                    "IPAMConfig": null,
                    "Links": null,
                    "Aliases": null,
                    "NetworkID": "1a8ec44a83a5c735e9c8c25b017874ed8041249bc8ed525ac6f68a9097abf47d",
                    "EndpointID": "0f3888269e99921cdfb848f5aaecbb77a4d6cbd9d4dac7b2c5685d40230b888e",
                    "Gateway": "",
                    "IPAddress": "",
                    "IPPrefixLen": 0,
                    "IPv6Gateway": "",
                    "GlobalIPv6Address": "",
                    "GlobalIPv6PrefixLen": 0,
                    "MacAddress": ""
                }
            }
        }
    }
]

Sometimes the agent IS able to kill the container after 10-20 minutes:

43a240399e21        420876366622.dkr.ecr.us-east-1.amazonaws.com/events-writer:latest    "events-writer.py"   19 minutes ago      Exited (137) About a minute ago                       ecs-production-events-writer-57-s3eventswriterworker-9ceef8a99883ccd77d00

Also, once the container is in the 100% state, if I try to docker exec -it <container_id> /bin/bash it will hang for a while and then register the SIGKILL, almost as if it only recognizes the SIGKILL after I exec.

The daemonization and auto-restart features are critical to keeping resource-exhaustion failures from taking down other services, so I would really appreciate any insight you can provide.

Thank you

@jhaynes
Contributor

jhaynes commented May 10, 2017

Hi @dm03514, thanks for filing the issue. I'd like a little more information before I can help though. Could you add the output of docker info as well as your task definition? If you're not comfortable putting it on github, feel free to email me at jushay at amazon dot com.

Thanks,
Justin

@dm03514
Author

dm03514 commented May 10, 2017

$ docker info

Containers: 3
 Running: 3
 Paused: 0
 Stopped: 0
Images: 2
Server Version: 1.12.6
Storage Driver: devicemapper
 Pool Name: docker-docker--pool
 Pool Blocksize: 524.3 kB
 Base Device Size: 10.74 GB
 Backing Filesystem: ext4
 Data file:
 Metadata file:
 Data Space Used: 1.752 GB
 Data Space Total: 26.54 GB
 Data Space Available: 24.79 GB
 Metadata Space Used: 647.2 kB
 Metadata Space Total: 29.36 MB
 Metadata Space Available: 28.71 MB
 Thin Pool Minimum Free Space: 2.654 GB
 Udev Sync Supported: true
 Deferred Removal Enabled: true
 Deferred Deletion Enabled: true
 Deferred Deleted Device Count: 0
 Library Version: 1.02.93-RHEL7 (2015-01-28)
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
 Volume: local
 Network: host null bridge overlay
Swarm: inactive
Runtimes: runc
Default Runtime: runc
Security Options:
Kernel Version: 4.4.51-40.58.amzn1.x86_64
Operating System: Amazon Linux AMI 2016.09
OSType: linux
Architecture: x86_64
CPUs: 4
Total Memory: 7.307 GiB
Name: ip-10-0-116-202
ID: XYQI:QZAZ:VXWH:MJPR:SMKN:ZYO4:EQUK:ZRTQ:27BO:6W25:6E2H:ONRX
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Registry: https://index.docker.io/v1/
Insecure Registries:
 127.0.0.0/8

Emailing you the task definition. Thank you.

@jhaynes
Contributor

jhaynes commented May 11, 2017

Thanks for the additional info @dm03514. Can you confirm a few suspicions I have?

  1. Are you running with swap enabled?
  2. Can you see your IO burst balance dropping in the CloudWatch metrics for either EBS volume attached to your instance? (Quick checks for both are sketched just below.)
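
A couple of quick ways to check both, for reference (the volume ID and time range are placeholders):

# 1. Is swap enabled on the instance?
free -m
swapon -s

# 2. Is the EBS BurstBalance metric dropping for the volume backing Docker storage?
aws cloudwatch get-metric-statistics \
  --namespace AWS/EBS \
  --metric-name BurstBalance \
  --dimensions Name=VolumeId,Value=vol-xxxxxxxx \
  --start-time 2017-05-10T00:00:00Z --end-time 2017-05-10T12:00:00Z \
  --period 300 --statistics Average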

Assuming those are both true, what you're seeing is related to #124 (comment). Docker's --memory flag, and the associated API, default to configuring swap equal to the amount requested in the flag.
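
For reference, a minimal sketch of what this looks like at the plain-Docker level (image name and limit are illustrative): with --memory alone, Docker 1.12 allows an equal amount of swap on top of the limit, while setting --memory-swap to the same value removes it.

# Hard memory limit of 100 MiB; --memory-swap set to the same value caps
# memory + swap at 100 MiB, so the container gets no swap at all.
docker run -d \
  --memory=100m \
  --memory-swap=100m \
  example/events-writer:latest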

@dm03514
Author

dm03514 commented May 12, 2017

Thank you, I'll check first thing tomorrow morning (EST).

@dm03514
Author

dm03514 commented May 15, 2017

It looks like we do not have swap enabled, based on:

  • ops checking: "i checked with free, there's no swap memory on the machines"

And the IO burst balance was observed to be dropping by the person helping us through AWS Support.

@jhaynes
Contributor

jhaynes commented May 15, 2017

Thank you for the info. I haven't been able to reproduce this on my end, but if you have repro steps, please let me know.

Could you send me the container instance ARN, Docker logs, and ecs-agent logs from an instance where this is happening? You can use the ECS Logs Collector to grab those logs as well as some helpful system logs.

@bobzoller

We're debugging the exact same issue here, but I believe it lies with the kernel, not the ECS agent or even Docker (the OOM killer lives in the kernel).

Very basic containers (a few different nodejs-based apps, one collectd container) reach their memory limit, sit between 99.9% and 100% of it, and start chewing through IO reads on the docker volume, which eventually exhausts our burst balance, at which point the host (and other workloads) becomes pretty unhappy. The container may or may not eventually be OOM-killed, but not as soon as one would expect.

In one case I directly observed, docker stats reported the container in question flapping between 99% and 100% usage, but it was only OOM-killed after almost an hour in that state. Syslogs confirm the kernel didn't consider killing it until then.

A few things that seem relevant to note:

  • we manually adjust the value of /cgroup/memory/docker/<container>/memory.memsw.limit_in_bytes to match /cgroup/memory/docker/<container>/memory.limit_in_bytes, to try to disable swap usage for all containers (by default it is 2x the memory limit); a sketch of this adjustment follows the list
  • we have swap enabled on the host (with a 5GB EBS volume), but the spike in IO reads we see is on the docker devicemapper volume, not the swap volume
  • the host sets vm.swappiness=0
  • Amazon ECS-Optimized Amazon Linux AMI 2016.09.g
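
A sketch of that adjustment, assuming the cgroup v1 hierarchy is mounted under /cgroup as on Amazon Linux (paths differ on other distros):

#!/bin/sh
# For every container cgroup, cap memory+swap at the memory limit so the
# container cannot spill into swap. memsw must stay >= the memory limit,
# so writing the same value is the tightest allowed setting.
for cg in /cgroup/memory/docker/*/; do
  limit=$(cat "${cg}memory.limit_in_bytes")
  echo "$limit" > "${cg}memory.memsw.limit_in_bytes"
done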

@dm03514 if we figure it out I'll make sure you hear about it, would appreciate the same!

@dm03514
Author

dm03514 commented May 15, 2017

@bobzoller absolutely, I was afraid the problem was going to be in the OS :( Debugging those sorts of issues is well beyond my experience level. Have you had any success with different kernel versions? :p Looking for the easy way out :)

@bobzoller

@swettk and I spent more time on this today and have a plausible theory: as the container approaches its memory limit, it starts causing major page faults. This would explain why we see high reads but almost no writes, and why they are reads from the docker disk, not the swap disk. The container eventually crosses the actual cgroup memory limit and is then OOM-killed, but while it hangs out at that boundary you may end up thrashing your disk.

we're personally planning to investigate:

  • using cgroups to limit disk IO, as this cluster is not intended for disk-heavy loads (see the sketch after this list)
  • writing an agent to police containers and kill those with high major page faults and/or disk IO, just as the oomkiller does with memory
  • increasing our IOPS budget (which currently stands at a pretty low 150)
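
A sketch of the first option at the plain-Docker level, assuming the container's storage sits on /dev/xvda and a 10 MB/s ceiling is acceptable (both are illustrative; ECS task definitions did not expose this at the time):

# Throttle the container's reads from the backing block device so a thrashing
# workload cannot exhaust the instance's EBS burst balance on its own.
docker run -d \
  --memory=100m \
  --device-read-bps /dev/xvda:10mb \
  example/worker:latest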

@jhaynes
Contributor

jhaynes commented May 24, 2017

@bobzoller and/or @dm03514 I'd like to add one more avenue of investigation. Try increasing your task's memory limit a bit.

The page faults you're seeing are likely due to the page cache being partially flushed to free up more process memory. This, in turn, causes your application to re-read portions of itself (or its dependencies) from disk. Depending on your application's structure, this can create a fairly tight feedback loop in which lots of disk IO must occur for any unit of work to proceed.

@bobzoller

bobzoller commented May 25, 2017

Correct @jhaynes; as I said, major page faults are absolutely the problem, and increasing the memory limit will resolve it until you bump up against the new limit again.

As we're striving for container isolation and protecting the health of the host, we chose to write a simple reaper that runs on every ECS instance and stops containers that have crossed a major page fault threshold we chose based on our environment (happy containers might cause 300/day, while sad containers can rack up hundreds of thousands within a few minutes). Running it every minute from cron has been effective (a sample cron entry follows the script below): these containers are now killed off within 60 seconds of starting to thrash the disk, and the host recovers without intervention. ECS reschedules the container if necessary, and we notify the responsible engineer so they can investigate later. 👌

Our script looks something like this:

#!/bin/sh

# don't kill containers using these images even if they're misbehaving
EXCLUDES_PATTERN=$(cat <<'EOF' | xargs | sed 's/ /|/g'
amazon/amazon-ecs-agent
EOF
)

# list all the candidate containers
targets=$(docker ps --no-trunc --format '{{.ID}} {{.Image}}' | grep -Ev "$EXCLUDES_PATTERN" | awk '{ print $1; }' | xargs)

for target in $targets; do
  cd "/cgroup/memory/docker/$target" || exit
  info="id=$target $(docker inspect --format 'image={{.Config.Image}} StartedAt="{{.State.StartedAt}}"' "$target") pgmajfault=$(grep total_pgmajfault memory.stat | awk '{print $2;}')"
  value=$(echo "$info" | awk '{ print $4;}' | sed 's/pgmajfault=//g')

  if [ "$value" -gt 10000 ]; then
    echo "Executing docker stop on container due to $value major page faults ($info)"
    docker stop "$target" &
  fi

  cd - || exit
done

wait
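
A hypothetical cron entry for the reaper (the script path and log tag are illustrative):

# /etc/cron.d/container-reaper: check for disk-thrashing containers every minute
* * * * * root /usr/local/bin/container-reaper.sh 2>&1 | logger -t container-reaper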

HTH!

@jhaynes
Contributor

jhaynes commented May 26, 2017

@dm03514 I'm inclined to close this since it isn't directly related to an ECS issue. However, if you or @bobzoller wind up with other questions or issues, feel free to open bugs here or engage directly with AWS Support.

@HarryWeppner

@jhaynes @bobzoller Just ran into this issue myself and am wondering whether an "out-of-agent" cron job is still the recommended course of action?

@bobzoller

We still run our cron job "reaper" just in case, but since moving off Amazon Linux onto Ubuntu we haven't seen a single occurrence. I'd assume this has more to do with the kernel version than with the distro, but I can't tell you for sure. FWIW we're currently running kernel 4.13.0-31-generic on Ubuntu Xenial 16.04.

@HarryWeppner

@bobzoller thanks. I am seeing this on an Amazon ECS-Optimized Amazon Linux AMI 2017.09.f

$ uname -r
4.9.75-25.55.amzn1.x86_64

@vikalpj

vikalpj commented Feb 21, 2018

Thanks @bobzoller for the wonderful script.
It seems the script above needs some updates for newly set up ECS hosts (where the cgroup path includes the task ID and container ID):

#!/bin/bash -e

##
# Use this annotated script as a base for killing containers that misbehave
# on reaching their memory limit.
#
# Requirements:
# - `jq` must be installed on the ECS host
##

# don't kill containers using these images even if they're misbehaving
EXCLUDES_PATTERN=$(cat <<'EOF' | xargs | sed 's/ /|/g'
amazon/amazon-ecs-agent
EOF
)

# list all the candidate containers
targets=$(docker ps --no-trunc --format '{{.ID}} {{.Image}}' | grep -Ev "$EXCLUDES_PATTERN" | awk '{ print $1; }' | xargs)
for target in $targets; do

  # get taskid and dockerid from ecs
  task=$(curl -s "http://localhost:51678/v1/tasks?dockerid=$target")
  taskId=$(echo $task | jq -r ".Arn" | cut -d "/" -f 2)
  dockerId=$(echo $task | jq -r ".Containers[0] .DockerId")
  memoryStatsFile="/cgroup/memory/ecs/$taskId/$dockerId/memory.stat"

  # skip the current target if the memory stats file cannot be found; the container might not be managed by ECS
  if ! [ -f "$memoryStatsFile" ]
  then echo "Memory stats not found for taskId=$taskId dockerId=$dockerId" && continue
  fi

  info="id=$target $(docker inspect --format 'image={{.Config.Image}} StartedAt="{{.State.StartedAt}}"' "$target") pgmajfault=$(grep total_pgmajfault $memoryStatsFile | awk '{print $2;}')"
  majorPageFaults=$(echo "$info" | awk '{ print $4;}' | sed 's/pgmajfault=//g')

  if [ "$majorPageFaults" -gt 5000 ]; then
    echo "Stopping container due to major page faults exceeding threshold ($info)"
    docker stop "$target"
  fi
done

@adamgotterer

We are also having the same problem on Amazon Linux AMI 2017.09. A container uses up all of its available memory and starts thrashing reads. The container is pretty much unavailable until it is eventually killed off.

Besides the reaper cron, has anyone found a reasonable solution?

@owengo

owengo commented Apr 7, 2018

amzn-ami-2017.09.i-amazon-ecs-optimized is still affected by this issue.
Is there any plan to provide a kernel compatible with Docker for the "ecs-optimized" AMI?
The Ubuntu "solution" and the reaper-cron "solution" do not feel really sound.

@toredash

We hit this issue ourselves when someone configured too little memory for a task.

I think one part of the problem is that the container never reaches its memory limit. I tested this by giving a container that requires 128 MB of RAM just to start only 8 MB.

The container (according to quay.io/vektorlab/ctop, run via docker run --rm -ti --name=ctop -v /var/run/docker.sock:/var/run/docker.sock quay.io/vektorlab/ctop:latest) never reaches more than 6 MB before thrashing the system with disk I/O. My expectations are probably wrong, but the hard limit is in my view the trigger point at which ecs-agent/docker/the kernel should kill the process, since it is far outside the expected operating threshold.

My biggest annoyance with this is that it is really hard to detect. I could use the script provided by vikalpj, log the output to a log group in CloudWatch, and trigger an alarm on new events. But that is not what I expect of the ECS product; I expect it to kill the container and tell me why. Right now it just thrashes the disk.
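
A hypothetical sketch of that detection path, having the reaper publish a custom CloudWatch metric (the namespace and metric name are made up) so an alarm can fire instead of someone watching a log group:

# Emit one data point per container stopped for excessive major page faults.
aws cloudwatch put-metric-data \
  --namespace "Custom/ECS" \
  --metric-name "ThrashingContainerStopped" \
  --dimensions InstanceId="$(curl -s http://169.254.169.254/latest/meta-data/instance-id)" \
  --value 1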

@jhaynes, are you open to re-opening this issue, or to looking at alternatives for logging this from ecs-agent?

@owengo

owengo commented Dec 10, 2018

We hit this issue ourselves when someone configured too little memory for a task.

I think one part of the problem is that the container never reaches its memory limit. I tested this by giving a container that requires 128 MB of RAM just to start only 8 MB.

The container (according to quay.io/vektorlab/ctop, run via docker run --rm -ti --name=ctop -v /var/run/docker.sock:/var/run/docker.sock quay.io/vektorlab/ctop:latest) never reaches more than 6 MB before thrashing the system with disk I/O. My expectations are probably wrong, but the hard limit is in my view the trigger point at which ecs-agent/docker/the kernel should kill the process, since it is far outside the expected operating threshold.

Yes, all of the fs cache has disappeared, but the application is not "out of memory". The application allocated only 6 MB, but when the kernel needs to access the application's code, it is not available in memory, so it has to read it from disk. It is as if the application were running 100% on swap, except for the heap memory segment.

The workaround I have is to configure ECS tasks with a memory reservation (aka the "soft" limit) big enough to fit the process image and all the files needed by the application. Then you hope that your application never breaks the hard limit, or that if it does, it does so with a single big allocation that breaks the limit at once, before any disk thrashing occurs, allowing the OOM killer to destroy your process.
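
A hypothetical container definition fragment along those lines (names and numbers are illustrative): size memoryReservation for the resident set plus the file-backed pages the application keeps re-reading, and leave headroom before the hard limit.

# Illustrative ECS container definition: 256 MiB soft limit, 512 MiB hard limit.
cat > container-definition.json <<'EOF'
{
  "name": "events-writer",
  "memoryReservation": 256,
  "memory": 512
}
EOF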

Obviously you have to spend some time reading docker stats for your workload. And if your application leaks slowly, you will hit the problem again and again.

Maybe some fine-tuning under sys/vm could fix or seriously alleviate the issue, but I would like the official ECS AMI to ship with a correct setting.

@wpalmer

wpalmer commented Dec 2, 2019

This issue, or one very similar to it, still appears to be present (hello from the tail end of 2019). Is there any official documentation on how to approach it, as it appears to have been intentionally closed as not-fixed?

@thabiger

thabiger commented Mar 4, 2020

Memory allocation limits are enforced by the host operating system's cgroups and OOM killer. Just as we don't expect the whole OS to shut down just because one of the processes it runs has eaten up the memory(*), we shouldn't expect that from containers. In fact, they are not much more than processes running on a system. What we usually observe is the OOM killer ending the processes that caused the exhaustion.

In the case of containers that, following good practice, contain only one process, a kill by the OOM killer has the effect of terminating the container, because that PID 1 process and the container are the same thing.

The problem begins when an additional process manager is introduced inside the container, either by forking additional processes or by using tools like supervisord, systemd, etc. Here's an example with plain Docker:

docker run -it -m 1024M --memory-swap=1024M --entrypoint /bin/sh debian -c "apt-get update && apt-get install stress-ng -y; stress-ng --brk 4 --stack 4 --bigheap 4"

The container allocates almost all of the available memory. CPU usage and disk reads are high (the lack of memory means the application and library files can't be kept in the page cache and are constantly re-read from disk in order to execute).

CONTAINER ID        NAME                CPU %               MEM USAGE / LIMIT   MEM %               NET I/O             BLOCK I/O           PIDS
1d2ba95930f2        awesome_mendeleev   669.94%             1022MiB / 1GiB      99.76%              11.3MB / 44.6kB     18.6GB / 25.9MB     26

At the same time, the OOM killer keeps killing the processes that caused the exhaustion:


Mar 04 18:08:52 thalap kernel: Memory cgroup out of memory: Killed process 5080 (stress-ng-brk) total-vm:191364kB, anon-rss:160880kB, file-rss:376kB, shmem-rss:4kB, UID:100000000 pgtables:384kB oom_score_adj:1000
Mar 04 18:08:52 thalap kernel: Memory cgroup out of memory: Killed process 5094 (stress-ng-bighe) total-vm:177008kB, anon-rss:146520kB, file-rss:504kB, shmem-rss:4kB, UID:100000000 pgtables:360kB oom_score_adj:1000
Mar 04 18:08:52 thalap kernel: oom_reaper: reaped process 5080 (stress-ng-brk), now anon-rss:0kB, file-rss:0kB, shmem-rss:4kB
Mar 04 18:08:52 thalap kernel: oom_reaper: reaped process 5094 (stress-ng-bighe), now anon-rss:0kB, file-rss:0kB, shmem-rss:4kB
...

which are then spawned again and again, as described in the stress-ng docs: "If the out of memory killer (OOM) on Linux kills the worker or the allocation fails then the allocating process starts all over again." Many apps behave similarly. What else could they do to keep working when some of their workers have been stopped or killed?

Even if enforcing the limits were theoretically the ECS agent's responsibility, the OS intervention keeps memory usage below the given threshold, so the agent would not be able to take any action.

How to approach that?

  • use only soft limits and monitor by how much they are exceeded,
  • use an autoscaling mechanism to avoid failures due to memory shortage,
  • if you have to use hard limits, set them reasonably high.

(*)For non-containerized systems, this is actually possible with the kernel setting: vm.panic_on_oom = 1.
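
For reference, that host-level knob looks like this (not something you would normally enable on a shared ECS instance, and memory-cgroup OOMs do not trigger the panic at this value):

# Panic the whole host when a system-wide OOM occurs.
sysctl -w vm.panic_on_oom=1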
