# Practical 07 - Evasion Attacks on Static Malware Detection

Assignment for the lecture "AI Security for Malware Detection" in the 2025 AI and Cybersecurity course at the University of Luxembourg.

_Hamid Bostani (hamid.bostani@uni.lu) and Mashal Zainab (mashal.zainab@uni.lu)_

_31 October 2025_

-----

**Name:**

-----

In this assignment, you will:
- Train and evaluate a static Android malware classifier.
- Perform evasion attacks in the feature space.
- Measure the performance of the malware classifier and the evasion attacks using standard metrics.
- Analyze the impact of attack parameters.
- Interpret findings in the context of real-world malware manipulation.

**Instructions:**
- Fill in your name.
- Answer the questions and complete the code where necessary.
- Run the notebook cells in order. Wherever you see `# TODO`, implement the requested functionality.
- Keep all function signatures unchanged (unless noted) so autograders can run your notebook.
- Place the dataset files (`dataset-X.json`, `dataset-Y.json`, `dataset-meta.json`) in a local `data/` folder or update `config['path']` accordingly.
- This file is formatted with interactive cells ("# %%") so you can run it in Jupyter, VSCode, or other IDEs that understand Python interactive cells.
- Save the notebook as a PDF and submit that in Moodle together with the `.ipynb` notebook file.
- The easiest way to create a PDF of your notebook is to use File > Print Preview when running your notebook in the browser. 

### Setup

Run the following cell to import required packages and load configuration:

In [None]:
import os
import numpy as np
import json
import sys
from sklearn.model_selection import train_test_split
from sklearn.svm import LinearSVC
from sklearn.metrics import confusion_matrix, f1_score
import random
from scipy.sparse import csr_matrix

config = {"path" : "/path/to/data/folder/", "svm_c": 0.0006, "no_modifiable_features": 20}



### Dataset Preparation Code

In [2]:
def dataset_preparation():
    path = config["path"]
    try:
        X_filename = os.path.join(path, "dataset-X.json")
        with open(X_filename, 'rb') as f:
            X = json.load(f)

        Y_filename = os.path.join(path, "dataset-Y.json")
        with open(Y_filename, 'rt') as f:
            Y = json.load(f)

        meta_filename = os.path.join(path, "dataset-meta.json")
        with open(meta_filename, 'rt') as f:
            meta = json.load(f)

        X, y, vec = vectorize(X, Y)

        random_state = 60
        train_idxs, test_idxs = train_test_split(
            range(X.shape[0]),
            stratify=y,
            test_size=0.33,
            random_state=random_state)
        X_train = X[train_idxs]
        X_test = X[test_idxs]
        y_train = y[train_idxs]
        y_test = y[test_idxs]
        m_train = [meta[i] for i in train_idxs]
        m_test = [meta[i] for i in test_idxs]


        column_idxs = perform_feature_selection(X_train, y_train)
        X_train = X_train[:, column_idxs]
        X_test = X_test[:, column_idxs]

        return X_train, y_train, X_test, y_test, m_train, m_test, vec, np.array(vec.feature_names_)[column_idxs]

    except Exception as ex:
        print(ex)
        sys.exit(1)

def perform_feature_selection(X_train, y_train):
    _num_features = 10000
    if _num_features is not None:
        selector = LinearSVC(C=1)
        selector.fit(X_train, y_train)
        cols = np.argsort(np.abs(selector.coef_[0]))[::-1]
        cols = cols[:_num_features]
    else:
        cols = [i for i in range(X_train.shape[1])]
    return cols

def vectorize(X, y):
    from sklearn.feature_extraction import DictVectorizer
    vec = DictVectorizer()
    X = vec.fit_transform(X)
    y = np.asarray(y)
    return X, y, vec


### Load Data & Build Classifier

In [5]:
X_train, y_train, X_test, y_test, m_train, m_test, _, feature_set = dataset_preparation()

model = LinearSVC(C=config["svm_c"])
model.fit(X_train, y_train)

y_pred = model.predict(X_test)
tn, fp, fn, tp = confusion_matrix(y_test, y_pred).ravel()

print("Dataset Loaded & Model Trained ✅")




Dataset Loaded & Model Trained ✅


##### Q1. Dataset Overview and Classifier Configuration (2 points)

##### Q1a. Report the dimensions of the training and the test set.

In [8]:
# TODO: Print dimensions of the training and the test set.

##### Q1b. Report the number of malware samples in each set.
What to do: Count how many malware (label 1) and benign (label 0) samples are in both the training and test splits and print them. 

In [7]:
# TODO: Count malware class occurrences in y_train and y_test

##### Q1c. Explain what the true positive rate (TPR) and false positive rate (FPR) mean in the context of malware detection, and report their values.
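
As a reminder of the definitions, TPR is the fraction of malware samples that are flagged as malware, and FPR is the fraction of benign samples that are wrongly flagged as malware. The sketch below illustrates the arithmetic on a small set of invented labels (the arrays `y_true` and `y_hat` are assumptions for illustration and are not taken from the dataset):

```python
import numpy as np

# Toy labels, invented for illustration: 1 = malware, 0 = benign.
y_true = np.array([1, 1, 1, 1, 0, 0, 0, 0, 0, 0])
y_hat = np.array([1, 1, 1, 0, 1, 0, 0, 0, 0, 0])

tp = int(np.sum((y_true == 1) & (y_hat == 1)))  # malware flagged as malware
fn = int(np.sum((y_true == 1) & (y_hat == 0)))  # malware that was missed
fp = int(np.sum((y_true == 0) & (y_hat == 1)))  # benign apps flagged as malware
tn = int(np.sum((y_true == 0) & (y_hat == 0)))  # benign apps correctly passed

tpr = tp / (tp + fn)  # share of malware that is detected
fpr = fp / (fp + tn)  # share of benign apps that raise a false alarm
print(f"TPR = {tpr:.2f}, FPR = {fpr:.2f}")
```

In the notebook itself, the same four counts are already available from the `confusion_matrix(...).ravel()` call in the model-training cell.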

#TODO: Add your answer (explanation part)

In [None]:
# TODO: Compute TPR and FPR, and report their values

##### Q1d. Tune the regularization parameter `C` so that the classifier achieves the maximum true positive rate (TPR) at a false positive rate (FPR) of approximately 2%.

In [None]:
# TODO: Perform tuning loop and report final TPR and FPR

#### Evasion Attacks
Evasion attacks transform an input sample into an adversarial example by adding features, removing features, or both. Removing features is risky in the malware domain, since it may break the app's functionality, so in this assignment we only consider adding features. Note that the feature representations of Android apps in the prepared datasets are binary: a value of 1 means that a certain feature (such as an API call) is present, and 0 means that it is absent. To add a feature, you simply change its value from 0 to 1.
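
A minimal sketch of a single feature addition on a sparse row is shown here. The toy vector and the helper name `add_feature` are hypothetical and chosen only for illustration. The sketch assumes each sample is a 1 x d SciPy sparse row, as produced by indexing one row of the `DictVectorizer` output.

```python
import numpy as np
from scipy.sparse import csr_matrix

# Toy 1 x 6 binary feature vector (hypothetical values, not from the dataset).
x = csr_matrix(np.array([[1, 0, 0, 1, 0, 0]]))

def add_feature(x, j):
    """Return a copy of x with feature j set to 1; x itself is left untouched."""
    x_adv = x.tolil(copy=True)  # LIL format allows cheap single-entry writes
    x_adv[0, j] = 1
    return x_adv.tocsr()

# Indices of features that are currently absent and could therefore be added.
absent = np.flatnonzero(x.toarray()[0] == 0)
x_adv = add_feature(x, int(absent[0]))
print(absent, x_adv.toarray())
```

Working on a copy keeps the original sample intact, which makes it easy to count how many features were added afterwards.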

##### Q2. Implement the Random Attack (2 points)
This attack **randomly adds new features** until the malware is misclassified as benign or the perturbation limit is reached.

In [None]:
def random_attack(x, no_modifiable_features, model):
    # TODO
    return x_adv, y_adv, cnt


##### Q3. Implement the success rate metric (1 point) 

The success rate of an adversarial attack is defined as:

$$
S = 1 - \frac{n_d}{n_t}
$$

Where:

- $S$: Attack success rate  
- $n_d$: Number of malware samples that are still detected after the attack  
- $n_t$: Total number of malware samples subjected to the attack  

A higher value of $S$ indicates a more effective evasion attack.
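
As a worked example of the formula, the sketch below uses a hypothetical vector of post-attack predictions (`y_after` is invented for illustration). It shows the arithmetic only and does not follow the signature of the function you are asked to implement.

```python
import numpy as np

# Hypothetical post-attack predictions for 8 attacked malware samples
# (1 = still detected as malware, 0 = now classified as benign).
y_after = np.array([1, 0, 0, 1, 0, 1, 0, 0])

n_t = y_after.size                # total number of attacked malware samples
n_d = int(np.sum(y_after == 1))   # samples that are still detected
S = 1 - n_d / n_t
print(f"n_d = {n_d}, n_t = {n_t}, S = {S:.3f}")
```

Here 3 of the 8 samples are still detected, so $S = 1 - 3/8 = 0.625$.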

Implement this metric and apply it to the random attack.



In [None]:
def calculate_success_rate(y_adv_malware_samples, adv_malware_samples): 
    #TODO: Write function to compute success_rate
    return success_rate 

##### Q4. Random Attack Success Rate & Explain Results (1 point)
Report the success rate by using the method you implemented.

Explain:  
- Why might the performance be low or high?

In [None]:
# TODO: result calc

##### Q5. Implement PK-Greedy Attack (4 points)
This attack ranks features by the **magnitude of their negative SVM weights**, i.e. the features that most strongly push the classifier's decision toward the benign class.

Strategy:

1️⃣ Sort features by how strongly they push the model’s decision toward benign (most negative weight first).

2️⃣ Add the features with the most negative weights first.

3️⃣ Stop if the sample is misclassified as benign or the perturbation limit is reached.

This typically outperforms random attacks.
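
The strategy above can be sketched on a hand-made linear model. The weights `w`, bias `b`, sample `x`, and `budget` below are all hypothetical, and the sketch assumes that a score greater than 0 means "malware", as is the case for `LinearSVC` with label 1 as the positive class. It does not follow the `pk_attack` signature and operates on a dense toy vector.

```python
import numpy as np

# Hypothetical linear model: score = w . x + b, and score > 0 means "malware".
w = np.array([0.8, -0.5, 0.3, -1.2, -0.1, 0.6])
b = -0.3
x = np.array([1, 0, 1, 0, 0, 1])  # toy malware sample (initial score is 1.4)
budget = 3

# Candidates: features that are absent in x and have a negative weight,
# ordered from most negative to least negative.
candidates = [int(j) for j in np.argsort(w) if x[j] == 0 and w[j] < 0]

x_adv, added = x.copy(), []
for j in candidates[:budget]:
    if w @ x_adv + b <= 0:  # already classified as benign: stop early
        break
    x_adv[j] = 1            # add the feature (flip 0 -> 1)
    added.append(j)

final_score = float(w @ x_adv + b)
print(added, final_score <= 0)
```

In this toy case the two most negative features are enough to push the score below zero, so the third unit of budget is never spent.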


In [None]:
def pk_attack(x, no_modifiable_features, model):
    # TODO
    return x_adv, y_adv, cnt

##### Q6. Measure performance of the PK-Greedy Attack (2 points) 
Report the success rate again and interpret why the success rate is higher or lower.

In [None]:
# TODO: result calc 

##### Q7. Relationship between perturbation bounds and attack strength (2 points)
What is the relationship between the perturbation bounds and the strength of the attack? 

#TODO Add your answer 

##### Q8. Effect of changing no_modifiable_features across attacks (3 points)
Change `no_modifiable_features` in both attacks, report and describe results. 

Pick representative budgets, run both attacks, and then:

- Plot or tabulate success rate vs. number of modifications.
- Describe how adding more features changes attack effectiveness.


In [None]:
# TODO

##### Q9. Which features change, hard cases, and realizability (3 points)
Q9a. What are the top 10 common features that are changed when generating adversarial malware?  

For all successful adversarial examples (across an attack or across both attacks), compute which feature indices were modified most often. Map the top-10 indices to feature_set (vectorizer feature names) and report them with counts and percentages. 
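
One possible way to do the counting is sketched below, assuming you have recorded, for each successful adversarial example, the list of column indices that were flipped. The feature names and the `modified` lists are invented for illustration and do not come from the dataset.

```python
from collections import Counter

import numpy as np

# Hypothetical inputs: feature names and, per successful adversarial example,
# the list of column indices that were flipped from 0 to 1.
names = np.array(["api::a", "perm::b", "url::c", "intent::d"])
modified = [[1, 3], [1], [0, 1], [3]]

# Count how often each feature index appears across all examples.
counts = Counter(j for idxs in modified for j in idxs)
n_examples = len(modified)

# Report the most frequently modified features with counts and percentages.
for j, c in counts.most_common(10):
    print(f"{names[j]:<10} count={c} share={100 * c / n_examples:.0f}%")
```

The percentage here is taken relative to the number of successful adversarial examples; if you prefer a share of all modifications instead, divide by the total number of flips.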

In [None]:
# TODO 

Q9b. Hard cases: how many malware samples are "hard"? 

Define a "hard case" as a malware sample that a chosen attack (or, alternatively, neither of the two attacks) could not flip to benign within the perturbation budget. Count how many such samples exist in the attacked set.  
Hint: For each malware sample, check whether `y_adv == 1` after the attack has used its full budget; if so, it is a hard case for that attack.

In [None]:
# TODO 

Q9c. Are the adversarial examples realizable? Justify your answer. 

Provide a reasoned answer about whether the adversarial examples you produce (binary feature flips) could correspond to real, functioning malware once modified. Consider the feature representation and domain constraints explained in the lecture. 

#TODO Add your answer 

##### Q10. Bonus: Improve PK (greedy) attack (2 points)
The PK attack is greedy and may miss the best perturbation. Propose at least one algorithmic improvement and implement it. Compare the improved attack against PK and Random with the same perturbation budgets.

In [None]:
# TODO 