ampfsm: <error> fileop_new: kmem_cache_alloc(path_name) failed #4
Comments
Another thought I had, which I haven't made a test case for yet, might explain why the issue doesn't seem randomly distributed among processes on the host: it might be triggered by a process whose CWD is / (which may look empty?). I'll have to check whether the relevant processes do set their CWD to /, in case this is a factor.
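A rough sketch of how that check could be made, assuming a Linux host where /proc/<pid>/cwd is a symlink to each process's current working directory. The helper names cwd_of and pids_with_root_cwd are made up for illustration and are not part of ampfsm or the AMP connector.

```shell
#!/bin/sh
# Hypothetical helpers (names invented for this sketch).

# Print the current working directory of the given PID.
# Prints nothing if the process is gone or its cwd link is unreadable.
cwd_of() {
    readlink "/proc/$1/cwd" 2>/dev/null
}

# Print "PID COMM" for every process whose CWD is /.
# Without root, only the caller's own processes are readable.
pids_with_root_cwd() {
    for dir in /proc/[0-9]*; do
        pid=${dir#/proc/}
        if [ "$(cwd_of "$pid")" = "/" ]; then
            printf '%s %s\n' "$pid" "$(cat "$dir/comm" 2>/dev/null)"
        fi
    done
}
```

Running pids_with_root_cwd as root and comparing its list against the process names in the ampfsm error log would show whether a CWD of / is specific to the affected processes or common to most of them.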
Hi @mtinberg. Thanks for reporting the issue.
The “fileop_new: kmem_cache_alloc(path_name) failed” error indicates a memory allocation failure. Memory pressure on a system can have a variety of causes, and in a sensitive area of kernel code such as this (i.e., code that must run to completion and cannot sleep) it is normal for memory allocation to fail occasionally. ampfsm is designed to handle this gracefully: it skips processing of the current file operation and carries on, so that processing returns to normal once memory pressure is relieved.
I noticed you are running kernel version 3.10.0-1160.71.1.0.1.el7.x86_64, which I understand is a recent update for Oracle Linux 7. What Linux distribution are you running? Did this problem appear or become more frequent after a recent kernel or software upgrade?
In my investigation so far I have not found a defect in ampfsm, but I did find opportunities to make memory allocation more efficient, including requesting smaller memory blocks and freeing them sooner. While speculative, those improvements could help reduce occurrences of this error.
Lastly, if you are using ampfsm because you are a Cisco customer, don't hesitate to contact the Technical Assistance Center and open a case. That would let us provide more direct help.
I'm cc'ing a CyberSecurity staff member who manages AMP. According to my alarm log this started around July 5, which also corresponds to an update to the ciscoampconnector package. The host is running Oracle Linux 7.9 and the problem has persisted across a few minor patch levels of the kernel.
$ rpm -qi ciscoampconnector
Name : ciscoampconnector
Version : 1.19.0.846
Release : 1.el7
Architecture: x86_64
Install Date: Tue 05 Jul 2022 10:08:21 AM CDT
Apr 19 05:07:07 Installed: kernel-3.10.0-1160.62.1.el7.x86_64
Jun 21 05:02:22 Installed: kernel-3.10.0-1160.66.1.el7.x86_64
Jul 19 05:11:57 Installed: kernel-3.10.0-1160.71.1.0.1.el7.x86_64
—
Mark Tinberg <***@***.***>
Division of Information Technology-Network Services
University of Wisconsin-Madison
(In reply to Anthony Chan's comment of Friday, August 5, 2022 12:01 PM.)
SR 694170620: Issue with ampfsm
A TAC case has now been opened. There was a delay due to staff vacations and the handoff of this issue on our end.
systemd (PID 1) and the scraper processes that seemed to be most commonly affected do set their CWD to /, but so do most other processes, so I'm not sure that differentiates anything.
Fixed in 62a5dae (Optimize memory allocation to reduce chance of allocation failures).
ampfsm_fileop_new_error.log
Over the last few weeks we started getting a bunch of errors from ampfsm. I'm not sure what's causing them, and the last change to the fileop_new call and its usage was a couple of years ago. I'm not a kernel developer, or even a C programmer, but I did note that the struct member used in kmem_cache_alloc() changed from what was renamed to path_kmem_cache to the next struct member, combined_path_kmem_cache, and guessed that maybe this isn't populated in some obscure case.
$ modinfo /usr/lib/modules/$(uname -r)/extra/ampfsm.ko
filename: /usr/lib/modules/3.10.0-1160.71.1.0.1.el7.x86_64/extra/ampfsm.ko
description: Cisco AMP Filesystem Module
author: Craig Davison crdaviso@cisco.com
author: Russ Kubik rkubik@cisco.com
license: GPL
retpoline: Y
rhelversion: 7.8
srcversion: 2E9B5B876C49847D65CC59B
depends:
vermagic: 3.10.0-1127.el7.x86_64 SMP mod_unload modversions
$ modinfo ampfsm
filename: /lib/modules/3.10.0-1160.71.1.0.1.el7.x86_64/weak-updates/lib/modules/3.10.0-1160.2.2.el7.x86_64/extra/ampfsm.ko
description: Cisco AMP Filesystem Module
author: Craig Davison crdaviso@cisco.com
author: Russ Kubik rkubik@cisco.com
license: GPL
retpoline: Y
rhelversion: 7.8
srcversion: 5DB2BA437180AD290E7C36F
depends:
vermagic: 3.10.0-1127.el7.x86_64 SMP mod_unload modversions