Since version 0.105 the scan is unbearably slow #590
Comments
I noticed the same thing!
Thank you for letting us know. Is there a particular set of files/file types causing the issue? Any sample files you could provide to demonstrate the issue would help in fixing it. Thanks,
As described, the problem occurs when scanning all of $HOME. I can't tell you which of the >98,000 files is causing the problem; I'd guess all of them. The log also doesn't show how long the scan of each file took, only an overall statistic, so I can't provide a sample file either. Overall performance is significantly worse with version 0.105 than with version 0.104; I'm sorry, but that's all I can say.
On this point, I've hit what looks to be the same issue, so I've opened a separate issue for it: #593. I'm unable to attach the PDF I'm experiencing the issue with due to sensitive contents. @martin-ms, if you have any example files that aren't sensitive and that report "Can't parse data ERROR", could you attach them to that ticket to help with diagnosis?
@alext done
Thank you for the updates. I understand that you cannot determine which files are causing the issues. I attempted scanning the sample in #593, and it scanned much quicker with 0.105 than with 0.104, so the two don't appear related. I'll let you know when I am able to reproduce the issue.
When I was reading this earlier, I initially thought the scan time might be longer because we're now calculating fuzzy hashes for image files. But then we realized a much more obvious reason: in 0.105 we increased the default max file-size, max scan-size, etc. Specifically:
Ref: #489 @martin-ms what scan options do you use? If you're scanning with the defaults, then it would make a lot of sense that 0.105 is significantly slower: it will be scanning a lot more files, and a lot more data in those files.
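For orientation, a sketch of the corresponding clamd.conf settings to restore the previous behavior, assuming the pre-0.105 defaults of a 100 MB max scan-size and 25 MB max file-size (treat #489 as the authoritative list of what changed):

```
# clamd.conf: restore the pre-0.105 limit defaults
# (values assumed from the 0.104-era documentation; see #489)
MaxScanSize 100M
MaxFileSize 25M
```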
Thank you for taking care of the issue. Although I had used the default settings, I have now changed the variables mentioned back to the old values. clamconf reports these as non-default values:
but unfortunately that doesn't change the behavior; it still runs much longer than with version 0.104:
Here, for comparison, is the same task a few minutes later with the same settings under version 0.104:
I don't know if it's important, but with v0.104 I got several
Apologies, I should've shared the options for use with clamd.conf. To get a similar effect with clamscan:
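The exact options posted here were lost in this rendering of the thread; a sketch of what such a clamscan invocation could look like, assuming the pre-0.105 defaults of 25 MB max file-size and 100 MB max scan-size:

```
# Scan recursively with the old 0.104-era limits
# (size values accept an M suffix)
clamscan --recursive \
  --max-filesize=25M \
  --max-scansize=100M \
  "$HOME"
```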
I tried it with the given command line parameters, but it didn't get significantly faster:
@martin-ms Interesting. I'm not sure what to say. We do a bit of performance profiling/monitoring on a selection of file types, but I think we will have to extend that and compare older and newer versions to understand what's going on.
Abstract booklet CNIC Inflammation Day.pdf
This seems weird.
But it only takes 19 seconds in 0.104.4 or 0.105.1 with the same limits.
Any news on this issue?
Another antivirus as a comparison, with the caveat that it must load its virus definitions before scanning (that period is included in the total scan time).
Pay attention to the number of virus definitions!
I have a problem with slow scan times on big PDF files, and I just found that these two options/settings have the most significant effect on scan time.
The scanned file is an email, and I created a signature from a JPG image inside the attached PDF using an image fuzzy hash:
pdf01: f0e00b0fef9689cc
You can see the different scan times with different scan option adjustments. Scan with the default settings:
Scan with the default setting values specified explicitly:
Scan with half of the default setting values:
However, when I tested with the bigger PDF file from above (Abstract.booklet.CNIC.Inflammation.Day.pdf):
LibClamAV debug: cli_unzip: Time limit reached (max: 120000)
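For context, a sketch of how such an image fuzzy hash signature is typically produced and written, based on the ClamAV signature documentation; the input file name here is hypothetical, and the Engine range is an assumption (the subsignature type requires a 0.105+ engine):

```
# Generate the image fuzzy hash with sigtool (ships with ClamAV 0.105+)
sigtool --fuzzy-img attachment.jpg
# output: attachment.jpg: f0e00b0fef9689cc

# Logical signature (.ldb) using that hash; gating on newer engines keeps
# older versions from choking on the unknown subsignature type
pdf01;Engine:150-255,Target:0;0;fuzzy_img#f0e00b0fef9689cc
```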
The problem affects any service that uses ClamAV, for example Amavis or Squid eCAP; it turns them on their head without any problem.
Scanning using Sophos Protection for Linux (avscanner), the new replacement for Sophos Antivirus for Linux:
Any news on that?
From the Slackware forum at https://www.linuxquestions.org/:
I tried again with the current version 1.0.0, but I got the same result. Scanning directories with version 1.0.0:
Then the same directories with the same options and settings with version 0.104.2:
It's still more than three times slower. I'll stay with 0.104.2 for now, but I may have to look for something else, as I can't work with an outdated version forever.
Now ClamAV 1.0.0 can only reasonably be used for small files (perhaps under 1 MB); is this by design?
A commercial alternative that is compatible with ClamAV could be IKARUS scan.server.
It also works over a Unix socket, and version 6.1.7 includes an option to configure socket permissions (very useful).
Still no improvement to the existing problem in version 1.1.0!
To those affected: could you please provide a flamegraph showing where ClamAV is spending more time? This is the best way to show us what you're seeing on your system so we can figure out a fix. Instructions here: https://docs.clamav.net/manual/Development/performance-profiling.html?highlight=flame#flame-graph-profiling Ideally, we'd like a flamegraph of the older, more performant version and of the latest, so we can compare the two.
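The linked documentation is authoritative; for orientation, a capture typically looks something like this sketch, assuming perf and Brendan Gregg's FlameGraph scripts (https://github.com/brendangregg/FlameGraph) are available:

```
# Sample stacks at 99 Hz while scanning (ClamAV built with debug symbols)
perf record -F 99 -g -- clamscan /path/to/slow/file.pdf

# Fold the stacks and render an SVG flamegraph
perf script > out.perf
./stackcollapse-perf.pl out.perf > out.folded
./flamegraph.pl out.folded > clamscan-flamegraph.svg
```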
I'm sorry, but I don't have the necessary hardware to perform those operations.
I don't know where to obtain the missing file; it is not part of the distributed installation package. [UPDATE] and the results are for 0.104.2. The performance was the same both times; I can't imagine what this test is good for, as it doesn't bring any new insights.
I compiled perf, recompiled ClamAV (1.0.1) with debug symbols, and made some attempts.
Don't know about that symbol error.
As I already said in a previous comment, I don't know how to handle the
as the command line with the following results:
For me it's all useless stuff and wasted time, but for those who like it...
I did some more tests with this PDF file and found that it seems to keep clamscan busy until one of the limits is hit, which can be shown by adding the debug option. In 0.103.8 the default MaxFileSize gets hit pretty quickly, and that's why the scan appears to be so fast. When increasing the size limits, the scan runs longer until it hits the MaxScanTime. With newer versions and their higher default size limits, the MaxScanTime limit gets hit when running the engine with defaults, but when increasing the time, one of the size limits gets hit as well. I tried this with size limits of up to 1000 MB and time limits of up to 200 seconds and got no regular finish of the scan process with either version. I think the ClamAV team should have a closer look at this file to see why it always drives ClamAV to its limits. [Update] See my correction in the next post [/Update]
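A sketch of the kind of probe described above; the limit values mirror the ones mentioned in that comment, and --max-scantime takes milliseconds:

```
# Raise the limits and grep the debug output to see which limit stops the scan
clamscan --debug \
  --max-filesize=1000M --max-scansize=1000M --max-scantime=200000 \
  "Abstract booklet CNIC Inflammation Day.pdf" 2>&1 | grep -i "limit"
```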
I have to correct myself regarding the assumed never-ending scan of that file. It turned out that with just a little more resources than I had tried before, the scan does come to an end in both versions, with comparable timings:
@martin-ms thank you for making the flamegraph. Sadly it lacks the debug symbols required to give real insight into what's going on. A debug build of ClamAV would be required (i.e. building with the debug build type; see the sketch below). As @rma-x identified, it seems this particular file is very slow for both versions if you crank up the scan limits. I'll see if I can do something similar with a flamegraph while scanning the provided Abstract booklet CNIC Inflammation Day.pdf. Maybe it will shed some light on why this file is so slow to scan.
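For reference, a debug-symbol build with ClamAV's CMake setup might look like this sketch; RelWithDebInfo keeps optimizations while adding the symbols that profiling needs:

```
# Configure and build ClamAV with debug symbols but optimizations intact
cmake -B build -S . -DCMAKE_BUILD_TYPE=RelWithDebInfo
cmake --build build
```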
I just started using ClamAV yesterday on Windows 10 x64, version 1.1.1. It takes 55 minutes to scan 62 GB on an M.2 3500 MB/s SSD, with an i5-10600K overclocked to 5 GHz and 32 GB of DDR4 RAM (F4-4000C16-16GTZRA) overclocked to 4,266 MHz; I list my components to show this is a very fast system. 55 minutes for a 62 GB OS. I have a 22 TB HDD with 8 TB free that needs scanning; imagine how long it would take ClamAV to scan that. I am not trying it. I am just here to inform you that ClamAV is awfully, unbearably slow. Here is the command I am using: clamdscan --multiscan --infected --move /quarantine_directory --log /clamav-directory
Nothing new yet, not even in version 1.2.0 |
I had a closer look at that PDF file and found that it consists of 32 images, each 2623 x 3651 pixels: two for each of the 16 pages, one containing the actual content and one for the borders and cut marks. When I unpack all these images to 24-bit PPM files they take up a total of about 900 MB, which might explain why clamscan reports about 1 GB of "Data scanned". @micahsnyder I hope this is useful for your further analysis of what ClamAV does with that file that takes so long, and whether that's actually needed for proper scanning of the file.
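As a sanity check on those numbers: 2623 × 3651 pixels × 3 bytes per pixel is roughly 28.7 MB per uncompressed image, and 32 × 28.7 MB ≈ 919 MB, which matches both the ~900 MB of unpacked PPM data and the ~1 GB of "Data scanned" that clamscan reports.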
Additional observations: When I use poppler's
BTW, the original file is PDF-1.4 whereas |
For me it makes no difference if I scan PDFs or not. If I exclude them and all graphics using
then the result with 0.104 is
and with 1.2.0
The execution time with version 1.2.0 is still about twice as long as with version 0.104.0. Adding to all the misery, today I'm also getting a lot of "LibClamAV Error: cli_html_normalise: style chunk size underflow" error messages that didn't exist in the past. Instead of getting better, it keeps getting worse.

I reported the problem as soon as it occurred. I have little understanding for the developers' inability to work out the differences between versions 0.104 and 0.105; they need to know what they changed between the two versions and could start their investigation there. Instead, we as users have been looking for the causes for over a year, testing everything possible, sometimes with enormous effort, but without any tangible results. I'm tired of further attempts; my strategy now is to continue using version 0.104 for as long as possible and then to avoid ClamAV altogether, because I can't expect any support from the developers. As far as I'm concerned, we can close the bug report; there won't be anything more from me.
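For anyone reproducing this comparison, the exclusion described above might look like the following sketch; the exact pattern was not preserved in this thread, so the regex here is an assumption:

```
# Skip PDFs and common image formats (lower- and upper-case variants);
# --exclude matches file names against a regex and may be given repeatedly
clamscan --recursive \
  --exclude='\.(pdf|png|jpe?g|gif)$' \
  --exclude='\.(PDF|PNG|JPE?G|GIF)$' \
  "$HOME"
```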
ClamAV seems to have become a difficult project to manage. Maybe because of a lack of human or financial resources, maybe both!
1.2.0 apparently checks a much larger portion of your data, which of course takes more time, but it also has a chance of finding malware that wouldn't have been found before. Did you do these test runs with identical limits for the two versions, or with the respective defaults? BTW, if you want to stick with an older version for now, it might be better to stay with LTS version 0.103, which will be supported for another 11 months, whereas 0.104 and 0.105 are already EOL.
I am aware of this, and as you can see from #590 (comment), I had set the limits to the values recommended by Micah Snyder. But nothing changed in the result, so that can't be the problem.
Yes, I only swapped the program version between the runs.
My provider only offers the current program version. I backed up version 0.104 and reinstalled it because it was the last working version before 0.105. I would have to build the installation package for LTS version 0.103 myself from the current sources and would also have to constantly monitor changes. And in 11 months I'll face the same problem again. |
Yep, I noticed that after my post and therefore revoked it.
Sorry, but it is still not clear to me if you went with the (differing) default limits or forced both versions to use the same limits in your latest performance comparison.
Well, I would hope that in 11 months either the issues with the newer versions will have been sorted out, or support for version 0.103 will be extended once more.
In the latest comparison I swapped only the program version and used each version's default limits, but I can repeat it with the values mentioned by Micah Snyder for both runs if that might be useful. But since the first try on Jun 8, 2022 didn't improve the performance, I doubt it would change anything now.
Well, using different limits will definitely result in different scan times if a significant number of files in your workload exceed one of the limits from the lower set. So, if you want to show that there are performance differences that are not caused by the limit changes, you always have to use identical limits for the two runs you want to compare. BTW, the
OK… then here are the results for both runs with the same limits:
This is not too bad at all, only 5 minutes more; and this is the result with uppercase file extensions excluded and the limits applied:
Scanning all files in
and the same with 0.104:
So it looks like, with graphics and PDFs excluded, the scanning time is almost the same, but it takes about twice as long across all files, and setting limits makes no difference in this case. In the meantime, I was also able to build an installation package from the sources of 0.103.10 LTS, but I haven't tested it for functionality yet.
to follow up.
No change in scan speed after removing the bytecode signatures here.
By the way, in 1.4 we'll be adding an option to disable image scanning and image fuzzy hashing. For clamscan:
for clamd.conf:
Ref: https://github.com/Cisco-Talos/clamav/releases/tag/clamav-1.4.0-rc This may help for those whole-hard-drive scans where image fuzzy hashing, added in 0.105, was slowing down scans. You may also want to adjust the max-filesize or max-scansize limits to match 0.104 and older versions.
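Per the linked release notes, the new switches are expected to look like this sketch; verify against clamscan --help and the sample clamd.conf shipped with 1.4:

```
# clamscan / clamdscan command line
clamscan --scan-image=no --scan-image-fuzzy-hash=no /path/to/scan

# clamd.conf equivalents
ScanImage no
ScanImageFuzzyHash no
```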
The scanning speed depends a little on the hardware used, but it is still low. On the old server, scanning the .pdf from here takes 2 minutes; on the new one it takes 1 minute 30 seconds.
A full scan on $HOME now needs 98 mins with version 0.105:
After downgrading to 0.104 and performing a scan on the same folder a few minutes later, it completes in 26 mins:
That is about four times faster, and the normal duration experienced in the past. The number of files between the runs is almost the same, but the "Data scanned" (whatever that means) is remarkably different.
I also now get a "Can't parse data ERROR" on various PNG and PDF files while scanning with version 0.105; the same files can be processed with version 0.104. -> will be handled in separate report #593
Any suggestion for getting back the old speed, as known up to version 0.104, is appreciated.