Bug: non-UTF-8 characters kill syscheckd silently #23354
Comments
@zbalkan Thank you for reporting this issue. We have successfully reproduced the problem and are currently working on a fix.
Backtrace
We will keep you posted.
Hi @vikman90, thanks for the update.
I have uploaded a preliminary attempt at a fix, and at least I have managed to replace the crash with this log:
Next steps
Hi @vikman90, logging properly is better. It would have saved our logs and troubleshooting efforts. Here, the question is whether converting the byte sequence to an array of UTF-8 chars is worth the time if the files are most probably unreadable chunks. In my case, they are useless. Our solution was to remove them all. But if there are valid cases, then it may be worth it.
@zbalkan, if I understood you correctly, this is what I was referring to in the fourth point. Currently, the code introduces some non-UTF-8 data into the JSON structure (in this case, the file name). When that JSON is printed as a string, the error occurs. It would be worth sanitizing the data before ingesting it into the structure, as well as improving the logging, as you mention. However, our highest priority is to avoid a crash in the next version of the product.
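To make the sanitization idea concrete, here is a minimal sketch of what "sanitize before ingesting" could look like. It is not the code from the fix branch: `is_valid_utf8` and `sanitize_utf8` are hypothetical helper names, and the choice to replace each offending byte with U+FFFD is an assumption made for illustration.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: returns the length (1-4) of the well-formed UTF-8
// sequence starting at s[i], or 0 if the bytes there are not valid UTF-8.
static std::size_t utf8_sequence_length(const std::string& s, std::size_t i) {
    const auto at = [&](std::size_t k) { return static_cast<unsigned char>(s[k]); };
    const auto is_cont = [&](std::size_t k) {
        return k < s.size() && (at(k) & 0xC0) == 0x80;  // 10xxxxxx
    };
    const unsigned char b0 = at(i);

    if (b0 < 0x80) return 1;                      // ASCII
    if (b0 >= 0xC2 && b0 <= 0xDF)                 // 2-byte sequence
        return is_cont(i + 1) ? 2 : 0;
    if (b0 >= 0xE0 && b0 <= 0xEF) {               // 3-byte sequence
        if (!is_cont(i + 1) || !is_cont(i + 2)) return 0;
        const unsigned char b1 = at(i + 1);
        if (b0 == 0xE0 && b1 < 0xA0) return 0;    // overlong encoding
        if (b0 == 0xED && b1 > 0x9F) return 0;    // UTF-16 surrogate range
        return 3;
    }
    if (b0 >= 0xF0 && b0 <= 0xF4) {               // 4-byte sequence
        if (!is_cont(i + 1) || !is_cont(i + 2) || !is_cont(i + 3)) return 0;
        const unsigned char b1 = at(i + 1);
        if (b0 == 0xF0 && b1 < 0x90) return 0;    // overlong encoding
        if (b0 == 0xF4 && b1 > 0x8F) return 0;    // beyond U+10FFFF
        return 4;
    }
    return 0;  // stray continuation byte, 0xC0/0xC1, or 0xF5..0xFF
}

// True if every byte of s belongs to a well-formed UTF-8 sequence.
bool is_valid_utf8(const std::string& s) {
    for (std::size_t i = 0; i < s.size();) {
        const std::size_t n = utf8_sequence_length(s, i);
        if (n == 0) return false;
        i += n;
    }
    return true;
}

// Copies s, replacing each offending byte with U+FFFD (EF BF BD in UTF-8).
std::string sanitize_utf8(const std::string& s) {
    std::string out;
    out.reserve(s.size());
    for (std::size_t i = 0; i < s.size();) {
        const std::size_t n = utf8_sequence_length(s, i);
        if (n == 0) {
            out += "\xEF\xBF\xBD";
            ++i;
        } else {
            out.append(s, i, n);
            i += n;
        }
    }
    return out;
}
```

A file name would be passed through `sanitize_utf8` (or rejected via `is_valid_utf8`) before it is stored in the JSON structure. Note that the replacement is lossy: the sanitized name no longer identifies the file on disk. As an alternative at the serialization step, nlohmann/json's `dump()` accepts an `error_handler_t` argument (`strict` by default, with `replace` and `ignore` as the other options); whether that suits syscheckd is not settled in this thread.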
New error condition
Configuration
<syscheck>
  <directories realtime="yes" report_changes="yes">/root/test/fim</directories>
  <synchronization>
    <interval>10s</interval>
  </synchronization>
</syscheck>
Steps
Backtrace
Proposed solution for crashes
Hello team,
This last error does not occur in syscheck but in rsync; this is because during the callback
So we should add error handling here as well. After introducing these changes in the fix branch: 4.7.5...fix/23354-fim-crash
I have analyzed the behavior of syscheck in all modes (whodata, realtime and scheduled): the event with invalid characters is lost and an error appears in the log, but at least syscheck does not crash:
I have also checked that realtime mode works correctly with content that has invalid characters, and there is no problem:
Issue to analyze the problem in depth
Dear @zbalkan,
We have just applied the fix for the issue you reported. As of the next patch version, the agent will print a log like this when trying to monitor a file whose name is unsupported (non-UTF-8):
This way, there is a double layer:
On the other hand, we will extend the support for this kind of file:
Thank you very much for bringing this to our attention.
Best regards!
Problem
We detected that the FIM logs were not being generated for a long while. Since other logs were still being forwarded, we could not detect this in time. After some troubleshooting, we found out that the syscheckd process dies within a couple of seconds of being started. The logs do not include any error message.
Root Cause Analysis
We checked that the problem occurred after updating the agent to v4.6.0. But there was no anomaly on any other agents.
We then tried to check ossec.log of the agent, but it was hopeless.
The next step was to run syscheckd manually, yet it failed in less than a minute, leaving no logs or traces.
We checked whether the agent was hitting the inotify limit, but it was being killed when the number of inotify watches was around 12,000. The limit on the computer is set to 3,000,000, and enough memory was allocated for this change. The number of inotify instances was just 1, so it was not about inotify. There were no logs regarding the issue in /var/log/dmesg either.
We then observed the CPU and memory usage to see whether the process was killed by the OOM Killer or for any other reason. Resource usage was low, so we ruled this scenario out.
We enabled debug logging in the agent by setting the syscheck.debug value to 2 in /var/ossec/etc/internal_options.conf. No issues were detected in the logs: the daemon was running normally and then just stopped without any trace.
We used strace for a detailed investigation. The last message we saw in the strace output was [json.exception.type_error.316] invalid UTF-8 byte at index <some number>, triggered by the nlohmann.json library. What caused the error was some files with weird names.
We discovered that some time ago, a compression operation on some old files had failed, and its output ended up as weirdly named directories and files scattered around. The names are actually excerpts from the binary compressed data, and some of them happened to be accepted as valid file and directory names. The text representation of these binary-like paths looked human readable, but they are ultimately raw bytes of compressed data that do not form valid UTF-8 sequences.
When the syscheckd daemon has calculated the checksum successfully, it tries to create the JSON message from the collected data. When the directory names are byte-like char sequences, the JSON dump function tries to interpret them as UTF-8 characters and fails, instead of writing them to the stream as is. This is by design, as the dump() function requires strings to be UTF-8, as mentioned here. See the issue on the library's repository. That function is used in syscheckd:
wazuh/src/syscheckd/src/db/src/db.cpp
Line 169 in 1e51c2d
wazuh/src/syscheckd/src/db/src/db.cpp
Line 221 in 1e51c2d
wazuh/src/syscheckd/src/db/src/db.cpp
Line 352 in 1e51c2d
Expected behavior
When the underlying dependency raises an error, syscheckd is expected to catch it and either log and continue, or resolve it in the business logic, instead of the process being killed silently with SIGABRT.
Hints
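The "log and continue" behavior described under Expected behavior can be sketched as below. This is only an illustration, not syscheckd code: `serialize_event` is a hypothetical stand-in for the JSON serialization step, its byte check is deliberately partial (it rejects only bytes that can never occur in UTF-8), it does no JSON escaping, and the log wording is invented.

```cpp
#include <cstddef>
#include <iostream>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-in for the serialization step. It throws on bytes that
// can never appear in UTF-8 (0xC0, 0xC1, 0xF5..0xFF), which mimics a
// serializer that rejects invalid input. Not a full UTF-8 validator.
std::string serialize_event(const std::string& path) {
    for (std::size_t i = 0; i < path.size(); ++i) {
        const unsigned char b = static_cast<unsigned char>(path[i]);
        if (b == 0xC0 || b == 0xC1 || b >= 0xF5) {
            throw std::invalid_argument("invalid UTF-8 byte at index " +
                                        std::to_string(i));
        }
    }
    return "{\"path\":\"" + path + "\"}";  // no escaping: illustration only
}

struct ScanResult {
    std::vector<std::string> events;  // successfully serialized events
    std::size_t dropped = 0;          // events lost to serialization errors
};

// Serializes every path. A failure is logged and counted, and the loop moves
// on to the next path instead of letting the exception escape and abort.
ScanResult process_paths(const std::vector<std::string>& paths) {
    ScanResult result;
    for (const auto& path : paths) {
        try {
            result.events.push_back(serialize_event(path));
        } catch (const std::exception& e) {
            ++result.dropped;
            std::cerr << "ERROR: dropping FIM event: " << e.what() << '\n';
        }
    }
    return result;
}
```

With this shape, one unsupported file name costs a single event and leaves a line in the log, while the remaining paths are still processed. An uncaught exception, by contrast, ends in `std::terminate`, which is consistent with the silent SIGABRT described above.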