Virus analysis tools should use local heuristical analysis/sandboxes plus artificial CNS #1206
Comments
Thanks for the... interesting suggestion. This approach does not seem workable for a number of reasons, not the least of which is the apparent lack of a coherent suggestion and workable implementation plan. Since you're obviously a fan of "AI" I've asked Gemini to assist in drafting the remainder of my response:
Resource Challenges:
False Positive Issues:
Current Methods Work Well:
Alternative Solutions:
Do not trust AI; AI is just sin, is not an artificial CNS.
Resources: This post suggests to produce artificial CNS, and shows you FLOSS resources of artificial CNS (such as APXR and HSOM) which have examples of how to set this up. This post also suggests use of heuristical analysis plus sandboxes, and links to resources (such as Virustotal/Zenbox) which do so.
Current methods: Other researchers would not have begun to produce new methods if the old methods were good enough.
How this affects us: Safety concerns are the main reason that autonomous robots do not work outdoors to mass produce structures (such as houses) for us.
It's clear that you don't have the depth to engage on this topic. Artificial Neural Networks (ANNs) aren't exactly the same as a human brain (CNS). However, ANNs are inspired by the structure and function of the brain and fall under the broad umbrella of Artificial Intelligence (AI). AI encompasses various approaches to mimicking human intelligence, and ANNs are one specific technique.
You know what already uses heuristics? ClamAV! https://blog.clamav.net/2011/03/top-5-misconceptions-about-clamav.html I'll also note quickly that the blog post also indicates that the ClamAV team use sandboxes, though perhaps not in the automated way that you're envisioning (some sort of honeypot perhaps?)
It is clear that you do not understand how antiviruses and endpoint protection services work. It is uncommon to 'undo the infection' (i.e. clean infected files), instead these tools focus on preventing the exploitation of a device by preventing the execution of "bad" code on an endpoint (and detecting and quarantining infected files).
Gemini is not able to follow links or parse sources. Lots of antiviruses are able to undo infection from programs, Was stupid to not have found those pages about how ClamAV/ClamScan uses some heuristical analysis, |
I agree with the sentiment of your request. It is a good request to investigate AI / ML to identify malware. Just last week, the Snort team released SnortML, which is a module for Snort that may load ML models to classify HTTP URI inputs to identify zero day attacks: https://blog.snort.org/2024/03/talos-launching-new-machine-learning.html It would be wonderful to add detection capabilities to ClamAV. It seems like a promising research area for folks interested in malware research. |
Updated original post (English fixes, + extra examples/sources) |
This is too large of a request. If you want to make such a thing, we could possibly accept a pull request with this kind of feature added. It is also probably too resource intensive to run on the devices that ClamAV uses. |
Is fast with caches. To train (produce synaptic weights for) the CNS, is slow plus requires access to huge sample databases, |
Artificial central nervous system's
Original post was pseudocode, is now C++. |
Original post has new fixes. Comments have new fixes. |
@ETERNALBLUEbullrun The concepts you're discussing are so far outside my wheelhouse that it mostly sounds like ChatGPT made up some tech jargon. The code you shared isn't what I would call C++. It's just C++ wrapping around Python code. Sorry, we're not interested.
SwuduSusuwu/SubStack#6 " Was that the sole concern? With C++ implementation of |
Last post before this ( #1206 (comment) ) was about how to produce virus signatures (which is just one submodule of this issue). Is this what you are referring to? Am curious: what can you ask ChatGPT which has a chance to produce this? Which part confused you? Was it the part about how formulas to compress data (lossless) with codebooks, are close to formulas to produce virus signatures? Formulas such as
This is not a concept, runnable C++ source exists. |
Was the confusion from the original post's For comparison; |
Repurposed from https://swudususuwu.substack.com/p/howto-produce-better-virus-scanners ("Allows all uses")
Static analysis + sandbox + CNS = 1 second (approx) analysis of new executables (protects all app launches,) but caches reduce this to less than 1ms (just cost to lookup `ResultList::hashes`, which is `std::unordered_set<decltype(Sha2(const FileBytecode &))>`; a hash set of hashes).
/* Licenses: allows all uses ("Creative Commons"/"Apache 2") */
[Version of post is "Reduce 5183071 to `envpS.empty() ? execv : execve`" · SwuduSusuwu/SubStack@f2b58d5]
For the newest sources (+ static libs), use apps such as iSH (for iOS) or Termux (for Android OS) to run this:
`git clone https://github.com/SwuduSusuwu/SubStack.git && cd ./SubStack/ && ./build.sh`
`less cxx/ClassPortableExecutable.hxx`
`less cxx/ClassSha2.cxx`
`less cxx/ClassResultList.hxx`
`less cxx/ClassCns.hxx`
`less cxx/ClassCns.cxx`
`less cxx/VirusAnalysis.hxx`
`less cxx/VirusAnalysis.cxx`
`less cxx/main.cxx`
To run most of this fast (lag less,) use `CXXFLAGS` which auto-vectorizes/auto-parallelizes, and to setup CNS synapses (`Cns::setupSynapses()`) fast, use TensorFlow's `MapReduce`. Resources: How to have computers process fast.
For comparison; `produceVirusFixCns` is close to assistants (such as "ChatGPT 4.0" or "Claude-3 Opus",) have such demo as `produceAssistantCns`;
`less cxx/AssistantCns.hxx`
`less cxx/AssistantCns.cxx`
========
Hash resources:
Is just a checksum (such as Sha-2) of all sample inputs, which maps to "this passes" (or "this does not pass".)
https://wikipedia.org/wiki/Sha-2
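As a minimal sketch of such a hash list (not the code from `cxx/ClassResultList.hxx`): the names `placeholderHash`, `HashLists` and `Verdict` are hypothetical, and 64-bit FNV-1a stands in for SHA-2 just so that the example has no external dependencies. A real tool would use a cryptographic hash.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <unordered_set>

// Placeholder checksum (64-bit FNV-1a). NOT cryptographic; a real tool
// would use SHA-2 here. FNV-1a just keeps this sketch self-contained.
std::uint64_t placeholderHash(const std::string &bytes) {
    std::uint64_t hash = 0xcbf29ce484222325ULL; // FNV offset basis
    for (unsigned char byte : bytes) {
        hash ^= byte;
        hash *= 0x100000001b3ULL; // FNV prime
    }
    return hash;
}

enum class Verdict { Pass, Block, Unknown };

// Hypothetical container: hashes of samples which pass or do not pass.
struct HashLists {
    std::unordered_set<std::uint64_t> passes;
    std::unordered_set<std::uint64_t> blocks;

    void addPass(const std::string &sample) {
        const std::uint64_t hash = placeholderHash(sample);
        assert(blocks.count(hash) == 0); // a sample cannot both pass and not pass
        passes.insert(hash);
    }
    void addBlock(const std::string &sample) {
        const std::uint64_t hash = placeholderHash(sample);
        assert(passes.count(hash) == 0);
        blocks.insert(hash);
    }

    // Average constant-time lookup; Unknown means slower analyses must run.
    Verdict lookup(const std::string &executable) const {
        const std::uint64_t hash = placeholderHash(executable);
        if (blocks.count(hash) != 0) { return Verdict::Block; }
        if (passes.count(hash) != 0) { return Verdict::Pass; }
        return Verdict::Unknown;
    }
};
```

A caller would fill the lists from a sample database, then call `lookup()` with the bytes of each executable before launch.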
Signature resources:
Is just a substring (or regex) of infections, which the virus analysis tool checks all executables for; if the signature is found in the executable, do not allow to launch, otherwise launch this.
https://wikipedia.org/wiki/Regex
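A signature check of this sort can be sketched as a plain substring search. The function names `containsSignature` and `allowLaunch` are hypothetical, and only the substring case is shown (a regex version could use `std::regex_search` in its place).

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// True if `signature` occurs as a contiguous run of bytes in `executable`.
bool containsSignature(const std::string &executable, const std::string &signature) {
    assert(!signature.empty()); // an empty signature would match every file
    return std::search(executable.begin(), executable.end(),
                       signature.begin(), signature.end()) != executable.end();
}

// Launch is allowed only if none of the known signatures are found.
bool allowLaunch(const std::string &executable,
                 const std::vector<std::string> &signatures) {
    return std::none_of(signatures.begin(), signatures.end(),
        [&executable](const std::string &signature) {
            return containsSignature(executable, signature);
        });
}
```

This scans the whole file once per signature, which is simple but slow for large signature databases; multi-pattern search (such as Aho-Corasick) is the usual choice at scale.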
Static analysis resources:
https://github.com/topics/analysis has lots of open source (FLOSS) analysis tools (such as
https://github.com/kylefarris/clamscan,
which wraps https://github.com/Cisco-Talos/clamav/ ,)
which show how to use hex dumps (or disassembled sources) of the apps/SW (executables) to deduce what the apps/SW do to your OS.
Static analysis (such as Clang/LLVM has) just checks programs for accidental security threats (such as buffer overruns/underruns, or null-pointer-dereferences,) but could act as a basis,
if you add a few extra checks for deliberate vulnerabilities/signs of infection (these are heuristics, so the user should have a choice to quarantine and submit for review, or continue launch of this).
https://github.com/llvm/llvm-project/blob/main/clang/lib/StaticAnalyzer
is part of Clang/LLVM (license is FLOSS,) does static analysis (emulation produces inputs to functions, formulas analyze stacktraces (+ heap/stack uses) to produce lists of possible unwanted side effects to warn you of); versus `-fsanitize`, do not have to recompile to do static analysis. `-fsanitize` requires you to produce inputs, static analysis does this for you.
LLVM is lots of files, Phasar is just its static analysis:
https://github.com/secure-software-engineering/phasar
Example outputs (tests “Fdroid.apk”) from VirusTotal, of static analysis + 2 sandboxes;
the false positive outputs (from VirusTotal's Zenbox) show the purpose of manual review.
Sandbox resources:
As opposed to static analysis of the executable's hex (or disassembled sources,)
sandboxes perform chroot + functional analysis.
https://wikipedia.org/wiki/Valgrind is just meant to locate accidental security vulnerabilities, but is a common example of functional analysis.
On Unix-like OSes (such as Linux), tools can use:
`chroot()` (run `man chroot` for instructions) so that the programs you test cannot alter stuff out of the test;
plus can use `strace` (run `man strace` for instructions, or look at https://opensource.com/article/19/10/strace or https://www.geeksforgeeks.org/strace-command-in-linux-with-examples/ ) which hooks all system calls and saves logs for functional analysis.
Simple sandboxes just launch programs with `chroot` + `strace` for a few seconds,
with all outputs sent for manual reviews;
if more complex, has heuristics to guess what is important (in case of lots of submissions, so manual reviews have less to do.)
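A simple sandbox launch of this sort can be sketched as a function that just builds the command line. The name `buildSandboxCommand` and all paths are hypothetical; it assumes that the `timeout` (GNU coreutils), `strace` and `chroot` commands are installed, that the jail directory holds all that the program requires, and that the caller has root (which `chroot` requires).

```cpp
#include <cassert>
#include <string>
#include <vector>

// Builds argv for:
//   timeout <seconds> strace -f -o <log> chroot <jail> <program>
std::vector<std::string> buildSandboxCommand(const std::string &jailDirectory,
                                             const std::string &programInJail,
                                             const std::string &logPath,
                                             unsigned seconds) {
    assert(seconds > 0); // GNU `timeout 0` would disable the time limit
    return {
        "timeout", std::to_string(seconds), // stop the test after a few seconds
        "strace", "-f",                     // -f: also trace child processes
        "-o", logPath,                      // -o: save the system call log to a file
        "chroot", jailDirectory,            // confine file access to the jail
        programInJail                       // path as seen from inside the jail
    };
}
```

The result can be converted to a `char *` array and passed to `execvp()` in a forked child; the log file is then what goes to manual review (or to heuristics).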
Autonomous sandboxes (such as Virustotal's) use full outputs from all analyses,
with formulas to guess if the app/SW is safe for us
(thousands of rules such as "Should not alter files of other programs unless prompted to through OS dialogs", "Should not perform network access unless prompted to from you", "Should not perform actions leading to obfuscation which could hinder analysis",)
which, if violated, add to the executable's "danger score" (which the analysis results page shows you.)
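How rules add to a "danger score" can be sketched as below. This is not how Virustotal computes scores (that is not public in this post); the `Event` fields, the two rules and their weights are all made up for illustration, and the rules are simplified (such as: each unprompted write counts as a violation).

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <vector>

// One sandbox observation; the fields are hypothetical, chosen for this sketch.
struct Event {
    std::string action; // such as "write" or "connect"
    std::string target; // such as a file path or a host
    bool userPrompted;  // true if an OS dialog asked the user first
};

struct Rule {
    std::string description;
    std::function<bool(const Event &)> violatedBy;
    int weight; // how much one violation adds to the danger score
};

// Sums the weights of the rules violated, once per violating event.
int dangerScore(const std::vector<Event> &events, const std::vector<Rule> &rules) {
    int score = 0;
    for (const Event &event : events) {
        for (const Rule &rule : rules) {
            assert(rule.violatedBy != nullptr); // each rule must have a predicate
            if (rule.violatedBy(event)) { score += rule.weight; }
        }
    }
    return score;
}

// Two simplified rules with made-up weights.
std::vector<Rule> exampleRules() {
    return {
        {"Should not alter files unless prompted to through OS dialogs",
         [](const Event &e) { return e.action == "write" && !e.userPrompted; }, 40},
        {"Should not perform network access unless prompted to",
         [](const Event &e) { return e.action == "connect" && !e.userPrompted; }, 30},
    };
}
```

A results page could show the total plus the descriptions of the rules that fired, so that manual review can spot false positives.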
CNS resources:
Once the virus analysis tool has static+functional analysis (+ sandbox,) the next logical move is to do artificial CNS.
Just as (if humans grew trillions of neurons plus thousands of layers of cortices) one of us could parse all databases of infections (plus samples of fresh apps/SW) to setup our synapses to parse hex dumps of apps/SW (to allow us to revert all infections to fresh apps/SW, or if the whole thing is an infection just block,)
so too could artificial CNS (with trillions of artificial neurons) do this:
For analysis, pass training inputs mapped to outputs (infection -> block, fresh apps/SW -> pass) to artificial CNS;
To undo infections (to restore to fresh apps/SW,)
inputs = samples of all (infections or fresh apps/SW,)
outputs = EOF/null (if is infection that can not revert to fresh apps/SW,) or else outputs = fresh apps/SW;
To setup synapses, must have access to huge sample databases (such as Virustotal's access.)
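The "infection -> block, fresh apps/SW -> pass" mapping is supervised training. A minimal sketch with one artificial neuron (a classic perceptron, which is far smaller than the CNS this post suggests) is below; the `Sample` features are hypothetical (such as scaled scores from the other analyses), not what `cxx/ClassCns.cxx` uses.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Sample {
    std::vector<double> features; // hypothetical, such as scaled analysis scores
    int label;                    // 1 = infection (block), 0 = fresh (pass)
};

struct Perceptron {
    std::vector<double> weights;
    double bias = 0.0;

    explicit Perceptron(std::size_t inputs) : weights(inputs, 0.0) {}

    int predict(const std::vector<double> &features) const {
        assert(features.size() == weights.size());
        double sum = bias;
        for (std::size_t i = 0; i < weights.size(); ++i) {
            sum += weights[i] * features[i];
        }
        return sum > 0.0 ? 1 : 0;
    }

    // Perceptron rule: move the weights toward each misclassified sample.
    void train(const std::vector<Sample> &samples, int epochs, double rate) {
        for (int epoch = 0; epoch < epochs; ++epoch) {
            for (const Sample &sample : samples) {
                const int error = sample.label - predict(sample.features);
                for (std::size_t i = 0; i < weights.size(); ++i) {
                    weights[i] += rate * error * sample.features[i];
                }
                bias += rate * error;
            }
        }
    }
};
```

One neuron can only learn inputs that a straight line (or plane) splits, which is why real classifiers stack lots of neurons into layers and require huge sample databases.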
GitHub has lots of FLOSS (open source software) simulators of CNS at https://github.com/topics/artificial-neural-network such as:
"HSOM" (license is FLOSS) has simple Python artificial neural networks: https://github.com/CarsonScott/HSOM
"apxr_run" (https://github.com/Rober-t/apxr_run/ , license is FLOSS) is more complex;
"apxr_run" has various FLOSS neural network activation functions (absolute, average, standard deviation, sqrt, sin, tanh, log, sigmoid, cos), plus sensor functions (vector difference, quadratic, multiquadric, saturation [+D-zone], gaussian, cartesian/planar/polar distances): https://github.com/Rober-t/apxr_run/blob/master/src/lib/functions.erl
Various FLOSS neuroplastic functions (self-modulation, Hebbian function, Oja's function): https://github.com/Rober-t/apxr_run/blob/master/src/lib/plasticity.erl
Various FLOSS neural network input aggregator functions (dot products, product of differences, mult products): https://github.com/Rober-t/apxr_run/blob/master/src/agent_mgr/signal_aggregator.erl
Various simulated-annealing functions for artificial neural networks (dynamic [+ random], active [+ random], current [+ random], all [+ random]): https://github.com/Rober-t/apxr_run/blob/master/src/lib/tuning_selection.erl
Choices to evolve connections through Darwinian or Lamarckian formulas: https://github.com/Rober-t/apxr_run/blob/master/src/agent_mgr/neuron.erl
Simple to convert Erlang functions to Java/C++ (to reuse for fast programs);
the syntax is close to Lisp's.
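To show what such functions look like in C++, a minimal sketch is below. These are the textbook forms of two activation functions, the dot product aggregator, the Hebbian rule and Oja's rule; this is not a port of the apxr_run sources, so names and signatures here are assumptions.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Activation functions (textbook forms).
double sigmoid(double x) { return 1.0 / (1.0 + std::exp(-x)); }
double hyperbolicTangent(double x) { return std::tanh(x); }

// Input aggregator: dot product of inputs and synaptic weights.
double dotProduct(const std::vector<double> &inputs,
                  const std::vector<double> &weights) {
    assert(inputs.size() == weights.size());
    double sum = 0.0;
    for (std::size_t i = 0; i < inputs.size(); ++i) {
        sum += inputs[i] * weights[i];
    }
    return sum;
}

// Hebbian rule: weight += rate * input * output.
void hebbianUpdate(std::vector<double> &weights, const std::vector<double> &inputs,
                   double output, double rate) {
    assert(inputs.size() == weights.size());
    for (std::size_t i = 0; i < weights.size(); ++i) {
        weights[i] += rate * inputs[i] * output;
    }
}

// Oja's rule: weight += rate * output * (input - output * weight);
// the extra term keeps the weights from growing without bound.
void ojaUpdate(std::vector<double> &weights, const std::vector<double> &inputs,
               double output, double rate) {
    assert(inputs.size() == weights.size());
    for (std::size_t i = 0; i < weights.size(); ++i) {
        weights[i] += rate * output * (inputs[i] - output * weights[i]);
    }
}
```

A neuron's output is then `sigmoid(dotProduct(inputs, weights))` (or with `hyperbolicTangent`), and a plasticity function is applied after each activation.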
Examples of howto setup APXR as artificial CNS; https://github.com/Rober-t/apxr_run/blob/master/src/examples/
Examples of howto setup HSOM as artificial CNS; https://github.com/CarsonScott/HSOM/tree/master/examples
Simple to setup once you have access to databases.
Alternative CNS:
https://swudususuwu.substack.com/p/albatross-performs-lots-of-neural
This post was about general methods to produce virus analysis tools,
does not require that local resources do all of this;
For systems with lots of resources, could have local sandboxes/CNS;
For systems with fewer resources, could just submit samples of unknown apps/SW to hosts to perform analysis;
Could have small local sandboxes (that just run for a few seconds) and small CNS (just billions of neurons with hundreds of layers,
versus the trillions of neurons with thousands of layers of cortices that antivirus hosts would use for this);
Allows reuses of workflows the analysis tool has (could just add (small) local sandboxes, or just add artificial CNS to antivirus hosts for extra analysis.)
How to reproduce the problem
Scan new executables (that are not part of stock databases)