"Error loading buddy information"; root filesystem remounted read-only #1811

Closed
bgilbert opened this Issue Feb 16, 2017 · 8 comments

bgilbert commented Feb 16, 2017

See kernel logs below. Prometheus, while generating high I/O load, hits the container memcg limit and is OOM-killed. ext4 immediately reports "Error loading buddy information" as a result of a failed ext4_mb_load_buddy() call from ext4_discard_preallocations(), presumably triggered by a Prometheus file descriptor being closed. That report is made via ext4_error(), causing a journal abort and a read-only remount of the root FS. I haven't done an exhaustive audit, but it doesn't appear that any allocations under ext4_mb_load_buddy() should be accountable to the memcg.

This has happened on 4.7.3-coreos-r2 and 4.8.15-coreos. The root filesystem is on EBS. The problem is sporadic but recurring.
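
For illustration, the failing path looks roughly like this (a paraphrased sketch of the 4.7-era ext4_discard_preallocations() loop in fs/ext4/mballoc.c, not a verbatim excerpt):

/* Paraphrased sketch, not verbatim kernel code. */
list_for_each_entry_safe(pa, tmp, &list, u.pa_tmp_list) {
	BUG_ON(pa->pa_type != MB_INODE_PA);
	group = ext4_get_group_number(sb, pa->pa_pstart);

	/* Loads the group's buddy bitmap through the page cache of
	 * s_buddy_cache; that allocation can be charged to (and refused
	 * by) the calling task's memcg. */
	err = ext4_mb_load_buddy(sb, group, &e4b);
	if (err) {
		/* ext4_error() treats the failure as filesystem
		 * corruption: the journal is aborted and the FS is
		 * remounted read-only. */
		ext4_error(sb, "Error loading buddy information for %u",
			   group);
		continue;
	}
	/* ... release the preallocation, then ext4_mb_unload_buddy() ... */
}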

bgilbert commented Feb 16, 2017

prometheus invoked oom-killer: gfp_mask=0x24000c0(GFP_KERNEL), order=0, oom_score_adj=997
prometheus cpuset=[...] mems_allowed=0
CPU: 2 PID: 31004 Comm: prometheus Tainted: G        W       4.7.3-coreos-r2 #1
EXT4-fs error (device xvda9): ext4_discard_preallocations:4028: comm prometheus: Error loading buddy information for 149
Aborting journal on device xvda9-8.
EXT4-fs (xvda9): Remounting filesystem read-only
Hardware name: Xen HVM domU, BIOS 4.2.amazon 11/11/2016
 0000000000000286 00000000cf53517a ffff8805e5effc70 ffffffffb3308d73
 ffff8805e5effd50 ffff88071508ec00 ffff8805e5effcd8 ffffffffb31f14ff
 00000000ead330c0 00000000cf53517a ffffffffb31780fb 0000000000000001
Call Trace:
 [<ffffffffb3308d73>] dump_stack+0x63/0x90
 [<ffffffffb31f14ff>] dump_header+0x60/0x1df
 [<ffffffffb31780fb>] ? find_lock_task_mm+0x3b/0x80
 [<ffffffffb3178ad9>] oom_kill_process+0x229/0x410
 [<ffffffffb31e88fb>] ? mem_cgroup_iter+0x1db/0x350
 [<ffffffffb31eac47>] mem_cgroup_out_of_memory+0x307/0x350
 [<ffffffffb31ebae5>] mem_cgroup_oom_synchronize+0x335/0x340
 [<ffffffffb31e6510>] ? high_work_func+0x20/0x20
 [<ffffffffb31791f4>] pagefault_out_of_memory+0x44/0xc0
 [<ffffffffb30662c1>] mm_fault_error+0x91/0x190
 [<ffffffffb3066b7a>] __do_page_fault+0x47a/0x4a0
 [<ffffffffb3066bc2>] do_page_fault+0x22/0x30
 [<ffffffffb35aee58>] page_fault+0x28/0x30
Task in /docker/[...] killed as a result of limit of /docker/[...]
memory: usage 3072000kB, limit 3072000kB, failcnt 123365
memory+swap: usage 3072000kB, limit 9007199254740988kB, failcnt 0
kmem: usage 1648kB, limit 9007199254740988kB, failcnt 0
Memory cgroup stats for /docker/[...]: cache:8KB rss:3070344KB rss_huge:501760KB mapped_file:0KB dirty:0KB writeback:0KB swap:0KB inactive_anon:0KB active_anon:3070344KB inactive_file:4KB active_file:4KB unevictable:0KB
[ pid ]   uid  tgid total_vm      rss nr_ptes nr_pmds swapents oom_score_adj name
[30971]     0 30971   776337   768008    1517       8        0           997 prometheus
Memory cgroup out of memory: Kill process 30971 (prometheus) score 1998 or sacrifice child
Killed process 30971 (prometheus) total-vm:3105348kB, anon-rss:3068684kB, file-rss:3348kB, shmem-rss:0kB

bgilbert commented Feb 16, 2017

prometheus invoked oom-killer: gfp_mask=0x24000c0(GFP_KERNEL), order=0, oom_score_adj=997
prometheus cpuset=[...] mems_allowed=0
CPU: 1 PID: 20930 Comm: prometheus Tainted: G        W       4.8.15-coreos #1
Hardware name: Xen HVM domU, BIOS 4.2.amazon 11/11/2016
EXT4-fs error (device xvda9): ext4_discard_preallocations:4031: comm prometheus: Error loading buddy information for 397
Aborting journal on device xvda9-8.
EXT4-fs (xvda9): Remounting filesystem read-only
 0000000000000286 0000000013548d17 ffff953846b9bc60 ffffffffb83157f3
 ffff953846b9bd40 ffff9537fffe1d40 ffff953846b9bcc8 ffffffffb81fc36e
 0000000000000002 ffffffffb817d8eb ffff953904fc57c0 0000000000000001
Call Trace:
 [<ffffffffb83157f3>] dump_stack+0x63/0x90
 [<ffffffffb81fc36e>] dump_header+0x5d/0x1e1
 [<ffffffffb817d8eb>] ? find_lock_task_mm+0x3b/0x80
 [<ffffffffb817e515>] oom_kill_process+0x225/0x3f0
 [<ffffffffb81f38e5>] ? mem_cgroup_iter+0x1a5/0x310
 [<ffffffffb81f5bbc>] mem_cgroup_out_of_memory+0x2bc/0x300
 [<ffffffffb81f6aa5>] mem_cgroup_oom_synchronize+0x335/0x340
 [<ffffffffb81f14f0>] ? high_work_func+0x20/0x20
 [<ffffffffb817eb9c>] pagefault_out_of_memory+0x4c/0xc0
 [<ffffffffb8067ff1>] mm_fault_error+0x91/0x190
 [<ffffffffb80688c6>] __do_page_fault+0x476/0x4a0
 [<ffffffffb8068912>] do_page_fault+0x22/0x30
 [<ffffffffb85c0af8>] page_fault+0x28/0x30
Task in /docker/[...] killed as a result of limit of /docker/[...]
memory: usage 3072000kB, limit 3072000kB, failcnt 164128
memory+swap: usage 3072000kB, limit 9007199254740988kB, failcnt 0
kmem: usage 8168kB, limit 9007199254740988kB, failcnt 0
Memory cgroup stats for /docker/[...]: cache:4KB rss:3063828KB rss_huge:514048KB mapped_file:0KB dirty:0KB writeback:0KB swap:0KB inactive_anon:0KB active_anon:3063828KB inactive_file:0KB active_file:0KB unevictable:0KB
[ pid ]   uid  tgid total_vm      rss nr_ptes nr_pmds swapents oom_score_adj name
[20889]     0 20889   777317   768312    1519       8        0           997 prometheus
Memory cgroup out of memory: Kill process 20889 (prometheus) score 1999 or sacrifice child
Killed process 20889 (prometheus) total-vm:3109268kB, anon-rss:3062540kB, file-rss:10708kB, shmem-rss:0kB

eddieesquivel commented Feb 24, 2017

Another instance of this happening; logs are below:

Feb 23 20:53:32 <IP> kernel: EXT4-fs error (device xvda9): ext4_discard_preallocations:4028: comm prometheus: Error loading buddy information for 989
Feb 23 20:53:32 <IP> kernel: Aborting journal on device xvda9-8.
Feb 23 20:53:32 <IP> kernel: EXT4-fs (xvda9): Remounting filesystem read-only
Feb 23 20:53:32 <IP> kernel: CPU: 7 PID: 16116 Comm: prometheus Not tainted 4.7.3-coreos-r3 #1
Feb 23 20:53:32 <IP> kernel: Hardware name: Xen HVM domU, BIOS 4.2.amazon 11/11/2016
Feb 23 20:53:32 <IP> kernel:  0000000000000286 000000000bd2c9d4 ffff880590ac7c70 ffffffff91308d73
Feb 23 20:53:32 <IP> kernel:  ffff880590ac7d50 ffff8807e3cd2c00 ffff880590ac7cd8 ffffffff911f14ff
Feb 23 20:53:32 <IP> kernel:  ffff880590ac7c98 000000000bd2c9d4 ffffffff911780fb 0000000000000001
Feb 23 20:53:32 <IP> kernel: Call Trace:
Feb 23 20:53:33 <IP> kernel:  [<ffffffff91308d73>] dump_stack+0x63/0x90
Feb 23 20:53:33 <IP> kernel:  [<ffffffff911f14ff>] dump_header+0x60/0x1df
Feb 23 20:53:33 <IP> kernel:  [<ffffffff911780fb>] ? find_lock_task_mm+0x3b/0x80
Feb 23 20:53:33 <IP> kernel:  [<ffffffff91178ad9>] oom_kill_process+0x229/0x410
Feb 23 20:53:33 <IP> kernel:  [<ffffffff911e88fb>] ? mem_cgroup_iter+0x1db/0x350
Feb 23 20:53:33 <IP> kernel:  [<ffffffff911eac47>] mem_cgroup_out_of_memory+0x307/0x350
Feb 23 20:53:33 <IP> kernel:  [<ffffffff911ebae5>] mem_cgroup_oom_synchronize+0x335/0x340
Feb 23 20:53:33 <IP> kernel:  [<ffffffff911e6510>] ? high_work_func+0x20/0x20
Feb 23 20:53:33 <IP> kernel:  [<ffffffff911791f4>] pagefault_out_of_memory+0x44/0xc0
Feb 23 20:53:33 <IP> kernel:  [<ffffffff910662c1>] mm_fault_error+0x91/0x190
Feb 23 20:53:33 <IP> kernel:  [<ffffffff91066b7a>] __do_page_fault+0x47a/0x4a0
Feb 23 20:53:33 <IP> kernel:  [<ffffffff91066bc2>] do_page_fault+0x22/0x30
Feb 23 20:53:33 <IP> kernel:  [<ffffffff915aee58>] page_fault+0x28/0x30

bgilbert commented Mar 4, 2017

The buddy allocator which (typically) manages ext4 preallocations stores its state in an inode (ext4_sb_info.s_buddy_cache) and accesses it through the page cache. The page cache, in turn, always charges allocations to the current memcg. This has apparently caused similar problems before (torvalds/linux@adb7ef6), and the proper fix may be the same as in that commit: adding __GFP_NOFAIL to the ext4_discard_preallocations() path. However, the memcg OOM case should already be handled by a special case in try_charge() that permits allocations from dying processes, and I'd like to understand why that's not working before patching.
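
For reference, the shape of the fix in that commit (paraphrased from torvalds/linux@adb7ef6, not a verbatim excerpt): ext4_mb_load_buddy() gained a _gfp variant so that callers which must not fail can pass __GFP_NOFAIL.

/* Paraphrased sketch of the adb7ef6 approach, not verbatim kernel code. */
static noinline_for_stack int
ext4_mb_load_buddy_gfp(struct super_block *sb, ext4_group_t group,
		       struct ext4_buddy *e4b, gfp_t gfp)
{
	/* ... page-cache lookups here now use the caller-supplied
	 * gfp mask instead of a hard-coded GFP_NOFS ... */
}

static int ext4_mb_load_buddy(struct super_block *sb, ext4_group_t group,
			      struct ext4_buddy *e4b)
{
	/* Existing callers keep the old behavior. */
	return ext4_mb_load_buddy_gfp(sb, group, e4b, GFP_NOFS);
}

/* A caller that cannot tolerate failure opts out of memcg-induced
 * -ENOMEM entirely: */
err = ext4_mb_load_buddy_gfp(sb, group, &e4b,
			     GFP_NOFS | __GFP_NOFAIL);

The equivalent change for this bug would pass GFP_NOFS | __GFP_NOFAIL from the ext4_discard_preallocations() path.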

crawford commented Mar 13, 2017

Another interesting datapoint: #1309 (comment).

bgilbert commented Mar 16, 2017

Also reproduced on 4.9.9-coreos-r1.

bgilbert commented Jun 12, 2017

This should be fixed in kernel 4.12.

bgilbert commented Jul 6, 2017

This should now be fixed in the beta channel, as well as in the next alpha and stable releases, due shortly.

bgilbert closed this Jul 6, 2017
