Enhance reliability with fuzzy hashing #691

Open
Benji377 opened this issue Mar 30, 2024 · 4 comments
Labels
enhancement New feature or request

Comments

@Benji377
Member

Is your feature request related to a problem? Please describe.
Currently, Raspirus is highly dependent on MD5 signatures. If we don't have the signature of a virus, Raspirus has no way to know it's a virus. Even if we built a massive database of every known malware MD5 signature and always kept it up to date, an attacker could still add a single whitespace character to the file and completely change its MD5 signature.
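
To illustrate the whitespace problem, here is a minimal Python sketch using only the standard library. The payload bytes are made up for the example; they stand in for a file's contents.

```python
import hashlib

original = b"example payload bytes"  # made-up stand-in for a file's contents
modified = original + b" "           # the same bytes plus one trailing space

md5_original = hashlib.md5(original).hexdigest()
md5_modified = hashlib.md5(modified).hexdigest()

# Count the hex positions where the two digests happen to agree
matching = sum(a == b for a, b in zip(md5_original, md5_modified))

print(md5_original)
print(md5_modified)
print(f"{matching}/32 hex characters match")
```

The two digests are unrelated, so an exact lookup of the modified file in a signature database finds nothing.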

Describe the solution you'd like
It would be great to have a system that tells us how likely a file is to be malware. Ideally, it should be lightweight and fast. That's where fuzzy hashing comes into play: it creates a hash of a given file, just like MD5, but with the added benefit that two hashes can be compared to measure how similar the underlying files are. Implementing this would let us compare a given file against a database of known-malware fuzzy hashes and return a percentage of how similar the file is to a known sample. We then set a threshold: everything above it is considered malware, everything below it is considered safe.
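
A rough Python sketch of the compare-and-threshold idea follows. It is not a real fuzzy hashing scheme such as ssdeep or TLSH: to stay within the standard library it swaps in a small MinHash digest over byte shingles, which has the same "similar input, similar digest" property. All names (`similarity_digest`, `classify`), the digest size, the shingle size, and the threshold of 80 are assumptions made for illustration.

```python
import hashlib

NUM_HASHES = 64   # digest length (assumed value)
SHINGLE_SIZE = 4  # bytes per overlapping chunk (assumed value)


def similarity_digest(data: bytes) -> tuple:
    """MinHash stand-in for a fuzzy hash: one minimum per salted hash function."""
    last_start = max(1, len(data) - SHINGLE_SIZE + 1)
    shingles = {data[i:i + SHINGLE_SIZE] for i in range(last_start)}
    digest = []
    for seed in range(NUM_HASHES):
        salt = seed.to_bytes(2, "big")
        digest.append(min(
            hashlib.blake2b(s, digest_size=8, salt=salt).digest()
            for s in shingles
        ))
    return tuple(digest)


def similarity_percent(a: tuple, b: tuple) -> int:
    """Share of digest positions that agree, as a percentage."""
    matches = sum(x == y for x, y in zip(a, b))
    return 100 * matches // len(a)


def classify(data: bytes, known_digests: list, threshold: int = 80):
    """Return (best similarity, flagged?) against a list of known digests."""
    digest = similarity_digest(data)
    best = max((similarity_percent(digest, k) for k in known_digests), default=0)
    return best, best >= threshold


# Made-up data: one "known malware" sample, a padded variant, a benign file
known_sample = bytes((i * 37 + 11) % 251 for i in range(2000))
database = [similarity_digest(known_sample)]

variant = known_sample + b" "
benign = b"an ordinary text document with nothing suspicious in it. " * 30

print(classify(variant, database))
print(classify(benign, database))
```

Unlike the MD5 case, the padded variant still scores close to the known sample, while the unrelated file scores near zero. Where exactly to put the threshold would have to be tuned on real samples.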

Describe alternatives you've considered

  • A machine learning algorithm - Too slow and unpredictable. Also hard to implement with the current setup
  • YARA signatures - Resource intensive; we would have to drop support for single-board computers and lower-end PCs
  • File analysis - Too slow; it would require opening each file and "looking at it"

Additional context
The main obstacle right now is gathering the fuzzy hashes, which might take a while. Even then, we would still need to keep the database up to date and rework the backend. We might let the user choose between MD5 signatures (fast, higher coverage, higher miss rate) and fuzzy hashing (lower coverage due to missing samples, lower miss rate, and more accurate analysis).

Benji377 added the enhancement label Mar 30, 2024
@Benji377
Member Author

@GamingGuy003 does this roughly sum it up?

@GamingGuy003
Collaborator

Sounds about right. This will presumably increase scan time considerably, so we might have to come up with something for that (threading? / making fuzzy hashing optional if you just want a quick scan?)

@Benji377
Member Author

Threading might be a good idea, but we might need to scale it to the user's resources. A switch on the frontend to choose between signature scanning and fuzzy scanning might also be useful.

@GamingGuy003
Collaborator

Threading with a dynamically scaled thread pool shouldn't be a problem. The toggle makes sense.
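
As a rough illustration of the two ideas discussed here (a pool sized from the machine's resources and a mode toggle), a Python sketch could look like the following. Everything in it is an assumption for the example: the `ScanMode` enum, the `worker_count` rule of leaving one core free, and the in-memory "files". The fuzzy check is only a stand-in that uses `difflib` on raw bytes; a real scanner would compare fuzzy hashes instead.

```python
import difflib
import hashlib
import os
from concurrent.futures import ThreadPoolExecutor
from enum import Enum


class ScanMode(Enum):
    SIGNATURE = "signature"  # exact MD5 lookup
    FUZZY = "fuzzy"          # similarity-based check


KNOWN_SAMPLE = b"known bad payload"  # made-up sample
KNOWN_MD5 = {hashlib.md5(KNOWN_SAMPLE).hexdigest()}


def signature_check(data: bytes) -> bool:
    return hashlib.md5(data).hexdigest() in KNOWN_MD5


def fuzzy_check(data: bytes, threshold: float = 0.8) -> bool:
    # Stand-in only: compares raw bytes, not fuzzy hashes
    ratio = difflib.SequenceMatcher(None, data, KNOWN_SAMPLE).ratio()
    return ratio >= threshold


CHECKS = {ScanMode.SIGNATURE: signature_check, ScanMode.FUZZY: fuzzy_check}


def worker_count(reserved: int = 1) -> int:
    """Scale the pool to the machine, leaving some cores free (assumed policy)."""
    return max(1, (os.cpu_count() or 1) - reserved)


def scan_all(blobs: dict, mode: ScanMode) -> dict:
    """Run the check selected by the toggle over all blobs in a thread pool."""
    check = CHECKS[mode]
    with ThreadPoolExecutor(max_workers=worker_count()) as pool:
        verdicts = list(pool.map(check, blobs.values()))
    return dict(zip(blobs, verdicts))


blobs = {
    "exact.bin": KNOWN_SAMPLE,
    "padded.bin": KNOWN_SAMPLE + b" ",
    "notes.txt": b"meeting at ten, bring coffee",
}
print(scan_all(blobs, ScanMode.SIGNATURE))
print(scan_all(blobs, ScanMode.FUZZY))
```

In signature mode only the exact copy is flagged; in fuzzy mode the padded copy is flagged as well, at the cost of a more expensive per-file check.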

Projects
Status: 🆕 New
Development

No branches or pull requests

2 participants