ampfsm: <error> fileop_new: kmem_cache_alloc(path_name) failed #4

Closed
mtinberg opened this issue Jul 26, 2022 · 5 comments

Comments

@mtinberg

ampfsm_fileop_new_error.log

Over the last few weeks we started getting a bunch of errors from ampfsm. I'm not sure what's causing them, and the last change to the fileop_new call and its usage was a couple of years ago. I'm not a kernel developer, or even a C programmer, but I did notice that the cache passed to kmem_cache_alloc() changed from what was renamed to path_kmem_cache to the next struct member, combined_path_kmem_cache, and I guessed that maybe this isn't populated in some obscure case.

$ modinfo /usr/lib/modules/$(uname -r)/extra/ampfsm.ko
filename: /usr/lib/modules/3.10.0-1160.71.1.0.1.el7.x86_64/extra/ampfsm.ko
description: Cisco AMP Filesystem Module
author: Craig Davison crdaviso@cisco.com
author: Russ Kubik rkubik@cisco.com
license: GPL
retpoline: Y
rhelversion: 7.8
srcversion: 2E9B5B876C49847D65CC59B
depends:
vermagic: 3.10.0-1127.el7.x86_64 SMP mod_unload modversions
$ modinfo ampfsm
filename: /lib/modules/3.10.0-1160.71.1.0.1.el7.x86_64/weak-updates/lib/modules/3.10.0-1160.2.2.el7.x86_64/extra/ampfsm.ko
description: Cisco AMP Filesystem Module
author: Craig Davison crdaviso@cisco.com
author: Russ Kubik rkubik@cisco.com
license: GPL
retpoline: Y
rhelversion: 7.8
srcversion: 5DB2BA437180AD290E7C36F
depends:
vermagic: 3.10.0-1127.el7.x86_64 SMP mod_unload modversions

@mtinberg
Author

Another thought I had, which I haven't made a test case for yet, might explain why the issue doesn't seem randomly distributed amongst processes on the host: it might be triggered by a process with a CWD of / (which may look empty?). I'll have to check to see if the relevant processes do set their CWD to / in case this is a factor.

@antchan2
Contributor

antchan2 commented Aug 5, 2022

Hi @mtinberg. Thanks for reporting the issue.

The “fileop_new: kmem_cache_alloc(path_name) failed” error indicates a memory allocation failure. Memory pressure on the system can have a variety of causes, and in a sensitive area of kernel code such as this (i.e., code that must run to completion and cannot sleep), it is normal for memory allocation to fail occasionally. ampfsm is designed to handle this gracefully: it skips processing of the current file operation and continues, so that processing returns to normal once memory pressure is relieved.
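The skip-on-failure behaviour described above can be illustrated with a minimal userspace sketch. This is not ampfsm code: `demo_cache`, `demo_cache_alloc`, `demo_cache_free`, and `demo_handle_fileop` are hypothetical names, and the `available` counter is just an assumed way to simulate memory pressure with `malloc`. The only point is the shape of the pattern: a non-blocking allocation that may return NULL, an error message, and an early return that skips the current operation without affecting later ones.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a kernel object cache. "available" simulates
 * memory pressure: when it reaches zero, allocations fail immediately. */
struct demo_cache {
    size_t obj_size;   /* size of every object handed out */
    int    available;  /* allocations left before simulated pressure */
};

/* Mimics a non-sleeping allocation: returns NULL rather than waiting. */
static void *demo_cache_alloc(struct demo_cache *c)
{
    void *obj;

    assert(c != NULL && c->obj_size > 0);
    if (c->available <= 0)
        return NULL;               /* simulated memory pressure */
    obj = malloc(c->obj_size);
    if (obj != NULL)
        c->available--;
    return obj;
}

static void demo_cache_free(struct demo_cache *c, void *obj)
{
    if (obj == NULL)
        return;
    free(obj);
    c->available++;
}

/* Returns 1 if the file operation was processed, 0 if it was skipped
 * because the path buffer could not be allocated. */
static int demo_handle_fileop(struct demo_cache *c, const char *path)
{
    char *buf = demo_cache_alloc(c);

    if (buf == NULL) {
        /* Log and skip this operation only; later calls may succeed. */
        fprintf(stderr, "demo: <error> alloc(path_name) failed, skipping\n");
        return 0;
    }
    strncpy(buf, path, c->obj_size - 1);
    buf[c->obj_size - 1] = '\0';
    /* ... the path would be inspected here ... */
    demo_cache_free(c, buf);       /* release as soon as we are done */
    return 1;
}
```

With `available` set to 0, `demo_handle_fileop` logs the error and returns 0; once the counter is raised again, the next call succeeds, which mirrors processing returning to normal when pressure is relieved.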

I noticed you are running kernel version 3.10.0-1160.71.1.0.1.el7.x86_64. I understand this is a recent update for Oracle Linux 7. What Linux distribution are you running? Did this problem appear or become more frequent after a recent kernel or software upgrade?

In my investigation so far, I have not found a defect in ampfsm, but I did find opportunities to make memory allocation more efficient. These include requesting smaller memory blocks and freeing those blocks sooner. While speculative, those improvements could help reduce the occurrences of this error.

Lastly, if you are using ampfsm because you are a Cisco customer, don't hesitate to contact the Technical Assistance Center and open a case. That gives us a way to provide more direct help.

@mtinberg
Author

mtinberg commented Aug 5, 2022 via email

@mtinberg
Author

> contact Technical Assistance Center and open a case.

SR 694170620 : Issue with ampfsm

A TAC case has now been opened. There was a delay due to staff vacations and the handoff of this issue on our end.

> I'll have to check to see if the relevant processes do set their CWD to / in case this is a factor

systemd (PID 1) and the scraper processes that seemed to be most commonly affected by this do set their CWD to /, but so do most other processes, so I'm not sure that differentiates anything.
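For anyone wanting to repeat the CWD check, one way on Linux is to read the `/proc/<pid>/cwd` symlink. The sketch below is only an illustration: `demo_cwd_is_root` is a hypothetical helper name, the 4096-byte buffer is an assumed upper bound on path length, and reading another user's process normally requires root.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Returns 1 if the process's current working directory is "/",
 * 0 if it is something else, and -1 if the link cannot be read
 * (no such process, or insufficient permissions). */
static int demo_cwd_is_root(pid_t pid)
{
    char link[64];
    char target[4096];   /* assumed large enough for the resolved path */
    ssize_t n;

    snprintf(link, sizeof(link), "/proc/%ld/cwd", (long)pid);
    n = readlink(link, target, sizeof(target) - 1);
    if (n < 0)
        return -1;
    assert((size_t)n < sizeof(target));
    target[n] = '\0';    /* readlink() does not NUL-terminate */
    return strcmp(target, "/") == 0;
}
```

Calling `demo_cwd_is_root(1)` as root would report on systemd; calling it with `getpid()` reports on the calling process itself.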

@antchan2
Contributor

Fixed in 62a5dae (Optimize memory allocation to reduce chance of allocation failures).
