# SecML Malware Tutorial

In this tutorial, you will learn how to use this plugin to test the already implemented attacks against a PyTorch network of your choice.

In [1]:
!pip install python-magic
import magic

Collecting python-magic
  Downloading python_magic-0.4.27-py2.py3-none-any.whl (13 kB)
Installing collected packages: python-magic
Successfully installed python-magic-0.4.27


In [2]:
%pip install secml

Collecting secml
  Downloading secml-0.15.6-py3-none-any.whl (463 kB)
Installing collected packages: secml
Successfully installed secml-0.15.6


In [6]:
try:
  import secml_malware
except ImportError:
  %pip install git+https://github.com/elastic/ember.git
  %pip install secml-malware

Collecting git+https://github.com/elastic/ember.git
  Cloning https://github.com/elastic/ember.git to /tmp/pip-req-build-452wyoxt
  Running command git clone --filter=blob:none --quiet https://github.com/elastic/ember.git /tmp/pip-req-build-452wyoxt
  Resolved https://github.com/elastic/ember.git to commit d97a0b523de02f3fe5ea6089d080abacab6ee931
  Preparing metadata (setup.py) ... done
Building wheels for collected packages: ember
  Building wheel for ember (setup.py) ... done
  Created wheel for ember: filename=ember-0.1.0-py3-none-any.whl size=13050 sha256=989b01705b6aa79ab9d36308200b3bc26b49b1460c846eaed82c6b5f0090f3f9
  Stored in directory: /tmp/pip-ephem-wheel-cache-cn05zcv7/wheels/7a/af/81/7e3bd4d43fd62c37273aa84e0720752df8dbc9c43700279961
Successfully built ember
Installing collected packages: ember
Successfully installed ember-0.1.0
Collecting secml-malware
  Downloading secml_malware-0.2.8-py3-none-any.whl (3.9 MB)

In [7]:
import os
import magic
from secml.array import CArray

from secml_malware.models.malconv import MalConv
from secml_malware.models.c_classifier_end2end_malware import CClassifierEnd2EndMalware, End2EndModel

net = MalConv()
net = CClassifierEnd2EndMalware(net)
net.load_pretrained_model()

First, we create the network (MalConv) and wrap it in a *CClassifierEnd2EndMalware* model class.
This class generalizes PyTorch end-to-end ML models.
Since MalConv is already implemented inside the plugin, its pretrained weights ship with it, and you can load them with the *load_pretrained_model* method.

If you wish to use different weights, pass the path to the PyTorch *pth* file to that method.

In [8]:
from secml_malware.attack.whitebox.c_header_evasion import CHeaderEvasion

partial_dos = CHeaderEvasion(net, random_init=False, iterations=50, optimize_all_dos=False, threshold=0.5)

This is how an attack is created; no further action is needed.
The `random_init` parameter specifies whether the bytes should be assigned random values before the optimization begins, `iterations` sets the number of steps of the attack, `optimize_all_dos` sets whether the whole DOS header should be perturbed or just the first 58 bytes, and `threshold` is the detection threshold used as a stopping condition.

The attack stops as soon as the confidence drops below `threshold`. If you want to see how much the attack can degrade the network's score, set `threshold` to 0.
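To make the `optimize_all_dos` switch more concrete, here is a minimal, standard-library sketch of which header offsets each variant could touch. The helper `dos_editable_offsets` and the toy header are hypothetical and are not part of the plugin. The sketch assumes that the 58 editable bytes are the ones between the `MZ` magic (offsets 0-1) and the `e_lfanew` pointer (offsets 0x3C-0x3F), and that the full variant additionally covers the bytes up to the PE header.

```python
import struct

def dos_editable_offsets(header: bytes, optimize_all_dos: bool = False) -> list:
    """Return byte offsets a DOS-header attack could edit (illustrative only)."""
    # Offsets 0-1 hold the "MZ" magic and 0x3C-0x3F hold e_lfanew (the pointer
    # to the PE header); both are left untouched here.
    offsets = list(range(2, 0x3C))  # 58 bytes: the "partial" DOS region
    if optimize_all_dos:
        # Assumption: the "full" variant also covers everything between the
        # end of e_lfanew and the start of the PE header.
        (pe_offset,) = struct.unpack_from("<I", header, 0x3C)
        offsets += list(range(0x40, pe_offset))
    return offsets

# Toy 128-byte header: "MZ" magic, zero filler, e_lfanew pointing at 0x80.
toy_header = bytearray(0x80)
toy_header[0:2] = b"MZ"
struct.pack_into("<I", toy_header, 0x3C, 0x80)

partial = dos_editable_offsets(bytes(toy_header), optimize_all_dos=False)
full = dos_editable_offsets(bytes(toy_header), optimize_all_dos=True)
print(len(partial), len(full))  # → 58 122
```

On a real executable the second number depends on where the PE header starts, so the full variant usually has more bytes to work with.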

In [9]:
folder = "secml_malware/data/malware_samples/test_folder"
X = []
y = []
file_names = []
for i, f in enumerate(os.listdir(folder)):
    path = os.path.join(folder, f)
    if 'petya' not in path:
        continue
    if "PE32" not in magic.from_file(path):
        continue
    with open(path, "rb") as file_handle:
        code = file_handle.read()
    x = End2EndModel.bytes_to_numpy(
        code, net.get_input_max_length(), 256, False
    )
    _, confidence = net.predict(CArray(x), True)
    conf = confidence[0, 1].item()  # malware-class score

    # Keep only samples the network already detects as malware.
    if conf < 0.5:
        continue

    print(f"> Added {f} with confidence {conf}")
    X.append(x)
    y.append([1 - conf, conf])
    file_names.append(path)

FileNotFoundError: ignored

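We load a simple dataset from the `malware_samples/test_folder` directory, which you must fill with malware samples before testing the attacks.
We discard every sample that is not a PE32 file or that the network does not already detect as malware (confidence below 0.5); this example also keeps only files whose path contains `petya`.
The `CArray` class is the base object you will handle when dealing with vectors in this library.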
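The call to `End2EndModel.bytes_to_numpy` turns the raw file into a fixed-length vector for the network. The sketch below is a rough, standard-library approximation of that idea, not the plugin's implementation: the helper `pad_bytes` is hypothetical, and the final `False` argument of the real call is not modeled. It assumes that the file is truncated to the maximum input length and padded with the value 256, which lies outside the 0-255 byte range and therefore cannot be confused with real content.

```python
def pad_bytes(code: bytes, max_length: int, padding_value: int = 256) -> list:
    """Truncate or right-pad raw bytes to a fixed-length list of integers."""
    values = list(code[:max_length])  # each byte becomes an int in 0..255
    values += [padding_value] * (max_length - len(values))
    return values

print(pad_bytes(b"MZ\x90", 6))      # → [77, 90, 144, 256, 256, 256]
print(pad_bytes(b"MZ\x90\x00", 2))  # → [77, 90]
```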

In [None]:
for sample, label in zip(X, y):
    y_pred, adv_score, adv_ds, f_obj = partial_dos.run(CArray(sample), CArray(label[1]))
    print(partial_dos.confidences_)
    print(f_obj)

[0.9112271666526794, 0.06050172820687294]
0.06050172820687294
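The two-entry trace printed above (the starting score, then the score after one step) is what you would expect when the attack stops as soon as the confidence falls below `threshold=0.5`. The toy loop below sketches that stopping rule. The function `attack_trace` and all scores after the second one are made up for illustration and do not come from the plugin.

```python
def attack_trace(scores, threshold, iterations):
    """Return the confidences a toy attack loop would record before stopping."""
    trace = [scores[0]]  # score before any perturbation
    for step in range(1, min(iterations, len(scores) - 1) + 1):
        if trace[-1] < threshold:  # sample already evades detection: stop
            break
        trace.append(scores[step])
    return trace

# Hypothetical per-step scores; only the first two match the output above.
scripted_scores = [0.91, 0.06, 0.03, 0.01]

print(attack_trace(scripted_scores, threshold=0.5, iterations=50))  # → [0.91, 0.06]
print(attack_trace(scripted_scores, threshold=0.0, iterations=50))  # → [0.91, 0.06, 0.03, 0.01]
```

With `threshold=0.0` the early exit never triggers, so the loop keeps going until the steps run out.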


Inside the `adv_ds` object, you can find the adversarial example computed by the attack.
You can reconstruct the functioning example by using a specific function inside the plugin:

In [None]:
adv_x = adv_ds.X[0,:]
real_adv_x = partial_dos.create_real_sample_from_adv(file_names[0], adv_x)
print(len(real_adv_x))
real_x = End2EndModel.bytes_to_numpy(real_adv_x, net.get_input_max_length(), 256, False)
_, confidence = net.predict(CArray(real_x), True)
print(confidence[0,1].item())

806912
0.06050172820687294


... and you're done!
If you want to create a real sample (stored on disk), have a look at the `create_real_sample_from_adv` method of each attack. It accepts a third string argument that is used as the destination file path for storing the adversarial example.
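If you prefer to write the bytes yourself, a plain-Python alternative could look like the sketch below. It assumes the object returned by `create_real_sample_from_adv` is bytes-like, which is consistent with the `len(...)` and `bytes_to_numpy` calls above. The helper `save_sample`, the stand-in payload, and the file name are hypothetical.

```python
import hashlib
import tempfile
from pathlib import Path

def save_sample(sample_bytes, destination):
    """Write the adversarial bytes to disk and return their SHA-256 digest."""
    data = bytes(sample_bytes)
    Path(destination).write_bytes(data)
    return hashlib.sha256(data).hexdigest()

# Stand-in payload; in the tutorial this would be `real_adv_x`.
fake_adv = bytearray(b"MZ" + bytes(62))

with tempfile.TemporaryDirectory() as tmp:
    out_path = Path(tmp) / "adv_sample.bin"
    digest = save_sample(fake_adv, out_path)
    # Re-read the file to confirm that what landed on disk matches the digest.
    round_trip_ok = hashlib.sha256(out_path.read_bytes()).hexdigest() == digest

print(round_trip_ok)
```

Recording the digest makes it easier to tell the adversarial file apart from the original sample later on.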

## Bonus: more attacks!
So far we have used only one attack, the Partial DOS one. But what if we want to use others?
Easy peasy task! Just open the [source code](https://github.com/pralab/secml_malware/tree/master/secml_malware/attack/whitebox) or the [documentation](https://secml-malware.readthedocs.io/en/docs/source/secml_malware.attack.whitebox.html) of the other white box attacks, and instantiate the one you like!
Let's use the [FGSM attack](https://arxiv.org/abs/1802.04528), for instance:

In [None]:
from secml_malware.attack.whitebox import CKreukEvasion

fgsm = CKreukEvasion(net, how_many_padding_bytes=2048, epsilon=1.0, iterations=5)
for i, (sample, label) in enumerate(zip(X, y)):
    y_pred, adv_score, adv_ds, f_obj = fgsm.run(CArray(sample), CArray(label[1]))
    print(fgsm.confidences_)
    print(f_obj)
    real_adv_x = fgsm.create_real_sample_from_adv(file_names[i], adv_ds.X[0, :])  # adv_ds holds only the current sample
    with open(file_names[i], 'rb') as f:
        print('Original length: ', len(f.read()))
    print('Adversarial sample length: ', len(real_adv_x))


[0.9112271666526794, 0.67103111743927, 0.0]
1.1346487553964835e-05
Original length:  806912
Adversarial sample length:  808960
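The two lengths printed above differ by exactly the number of padding bytes requested through `how_many_padding_bytes`, which suggests that this attack appends its payload to the end of the file. A quick check of the arithmetic, using only the values shown in this tutorial:

```python
original_length = 806912       # "Original length" printed above
how_many_padding_bytes = 2048  # value passed to CKreukEvasion
expected_adv_length = original_length + how_many_padding_bytes
print(expected_adv_length)     # → 808960
```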


... and you're done! Remember that this particular attack might take a while, depending on how many bytes the algorithm is tasked to edit and on the number of iterations.
In the meantime, **happy coding with SecML Malware!**