Enhance reliability with fuzzy hashing #691

Benji377 · 2024-03-30T13:43:39Z

Is your feature request related to a problem? Please describe.
Currently Raspirus is highly dependent on MD5 signatures. If a virus's signature is not in our database, Raspirus has no way of knowing the file is malicious. Even if we built a massive database of every known MD5 signature and kept it constantly up to date, an attacker could simply add a single whitespace character to the file and completely change its MD5 hash.
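To illustrate the whitespace point, here is a minimal Python sketch. It hashes a made-up payload with and without one trailing space. MD5 is a cryptographic hash, so the two digests share no useful structure, and an exact-match signature lookup misses the modified file. `md5_hex` and the sample bytes are hypothetical.

```python
import hashlib

def md5_hex(data: bytes) -> str:
    """Return the hex MD5 digest of data (an exact-match signature)."""
    return hashlib.md5(data).hexdigest()

original = b"print('hello')\n"   # made-up sample file contents
tweaked = original + b" "        # same contents plus one trailing space

h1 = md5_hex(original)
h2 = md5_hex(tweaked)

# Count hex positions that happen to coincide; for unrelated digests
# the expected count is only 2 of 32 (1/16 per position).
matching = sum(a == b for a, b in zip(h1, h2))
print(h1)
print(h2)
print(f"{matching}/32 hex characters match")
```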

Describe the solution you'd like
It would be great to have a system that tells us how likely a file is to be malware. Ideally, it should be lightweight and fast. That's where fuzzy hashing comes into play: like MD5, it produces a hash of a given file, but with the added benefit that two hashes can be compared to measure how similar the underlying files are. Implementing this would let us compare a given file against a database of known-malware fuzzy hashes and return a percentage indicating how similar the file is to known malware. We then set a threshold: everything above it is considered malware, and everything below it is considered safe.
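To make the compare-and-threshold idea concrete, here is a minimal Python sketch. It does not implement a real fuzzy hash such as ssdeep or TLSH. As a simpler stand-in, it treats the set of hashed 4-byte windows of a file as its "digest" and uses Jaccard similarity as the percentage. `toy_digest`, `similarity`, `classify`, the window length and the 80% threshold are all hypothetical choices for illustration.

```python
import hashlib
from typing import Optional

NGRAM = 4  # hypothetical window length in bytes

def toy_digest(data: bytes) -> set[int]:
    """Toy similarity digest (NOT ssdeep or TLSH): the set of 32-bit BLAKE2b hashes
    of every NGRAM-byte window in the file. Small edits only change a few windows."""
    return {
        int.from_bytes(hashlib.blake2b(data[i:i + NGRAM], digest_size=4).digest(), "big")
        for i in range(len(data) - NGRAM + 1)
    }

def similarity(a: set[int], b: set[int]) -> float:
    """Jaccard similarity of two digests as a percentage (0-100)."""
    if not a and not b:
        return 100.0  # two empty digests: treat as identical
    return 100.0 * len(a & b) / len(a | b)

def classify(sample: bytes, known_malware: dict[str, set[int]],
             threshold: float = 80.0) -> tuple[bool, Optional[str], float]:
    """Score the sample against every known-malware digest and flag it if the
    best score reaches the (hypothetical) threshold."""
    digest = toy_digest(sample)
    best_name: Optional[str] = None
    best_score = 0.0
    for name, known in known_malware.items():
        score = similarity(digest, known)
        if score > best_score:
            best_name, best_score = name, score
    return best_score >= threshold, best_name, best_score

# Usage with made-up data: the "database" holds one toy malware digest.
malware = b"evil payload: delete all files, then phone home to example.invalid" * 4
db = {"Toy.Malware.A": toy_digest(malware)}

modified = malware + b" "  # the one-whitespace trick that defeats an MD5 lookup
print(classify(modified, db))                  # flagged, score close to 100
print(classify(b"hello, harmless world", db))  # not flagged
```

Unlike this toy digest, which grows with file size, real fuzzy hashes condense a file into a short string. That matters for keeping the signature database small.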

Describe alternatives you've considered

A machine learning algorithm - Too slow and unpredictable. Also hard to implement with the current setup.
Yara signatures - Resource intensive; would drop support for single-board computers and lower-end PCs.
File analysis - Too slow; would require opening each file and "looking at it".

Additional context
The main obstacle right now is gathering the fuzzy hashes, which might take a while. Even then, we would still need to keep the database up to date and rework the backend. We might let the user choose between MD5 signatures (fast, higher coverage, higher miss rate) and fuzzy hashing (lower coverage due to missing samples, lower miss rate, and more accurate analysis).

Benji377 · 2024-03-30T13:44:16Z

@GamingGuy003 does this roughly sum it up?

GamingGuy003 · 2024-03-31T09:45:33Z

Sounds about right. This will presumably increase scan times considerably, so we might have to come up with something to address that (threading? making fuzzy hashing optional if you just want a quick scan?).

Benji377 · 2024-04-12T17:41:12Z

Threading might be a good idea, but we might need to scale it to the user's resources. Adding a switch on the frontend to choose between signature scanning and fuzzy scanning might also be useful.

GamingGuy003 · 2024-05-14T12:45:51Z

Threading with a dynamically scaled thread pool shouldn't be a problem. The toggle makes sense.
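A dynamically scaled thread pool plus the scan-mode toggle could look roughly like this Python sketch. `ScanMode`, `pool_size`, `make_checker` and `scan_all` are hypothetical names. The "keep one core free" sizing rule is an assumption. The fuzzy check is passed in as a callable (for instance, a similarity score compared against a threshold) rather than implemented here.

```python
import hashlib
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor
from enum import Enum
from pathlib import Path
from typing import Callable, Iterable

class ScanMode(Enum):
    """Hypothetical frontend toggle between the two scan types."""
    SIGNATURE = "signature"  # quick scan: exact MD5 lookup
    FUZZY = "fuzzy"          # slower scan: similarity check

def pool_size(reserve: int = 1) -> int:
    """Scale the worker count to the machine, keeping `reserve` cores free
    for the UI (hypothetical policy)."""
    cores = os.cpu_count() or 1  # os.cpu_count() may return None
    return max(1, cores - reserve)

def make_checker(mode: ScanMode, md5_db: set[str],
                 fuzzy_check: Callable[[bytes], bool]) -> Callable[[bytes], bool]:
    """Return the per-file check for the selected mode."""
    if mode is ScanMode.SIGNATURE:
        return lambda data: hashlib.md5(data).hexdigest() in md5_db
    return fuzzy_check  # e.g. "best similarity score >= threshold"

def scan_all(paths: Iterable[Path], mode: ScanMode, md5_db: set[str],
             fuzzy_check: Callable[[bytes], bool]) -> dict[Path, bool]:
    """Scan files on a thread pool sized by pool_size(); returns {path: flagged}."""
    check = make_checker(mode, md5_db, fuzzy_check)
    paths = list(paths)
    with ThreadPoolExecutor(max_workers=pool_size()) as pool:
        flags = list(pool.map(lambda p: check(p.read_bytes()), paths))
    return dict(zip(paths, flags))

# Usage: one file whose MD5 is in a toy database, one that is not.
with tempfile.TemporaryDirectory() as tmp:
    bad, ok = Path(tmp, "bad.bin"), Path(tmp, "ok.bin")
    bad.write_bytes(b"known bad sample")
    ok.write_bytes(b"harmless")
    known_md5s = {hashlib.md5(b"known bad sample").hexdigest()}
    results = scan_all([bad, ok], ScanMode.SIGNATURE, known_md5s,
                       fuzzy_check=lambda data: False)  # unused in SIGNATURE mode
    flagged = sorted(p.name for p, hit in results.items() if hit)
print(flagged)  # ['bad.bin']
```

In Python, threads pay off here because file reads release the GIL, and `hashlib` releases it for inputs larger than 2047 bytes. A CPU-bound, pure-Python fuzzy comparison would need a process pool instead.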

Benji377 added the enhancement label (New feature or request) on Mar 30, 2024

Benji377 mentioned this issue on Jun 16, 2024: Integrating parts of the SIMBIoTA project #795 (Open)
