# Serialization

### Insecure Deserialization

Insecure deserialization lets an attacker do any of the following while an object is being deserialized:
- abuse application logic,
- deny service, or
- execute arbitrary code.

> [CWE-502: Deserialization of Untrusted Data](https://cwe.mitre.org/data/definitions/502.html)

> Don't compile and run an untrusted pull request.<br><br>
Don't load pickle files you found on the street.

## Pickle

### Pickling

Preserving for future use.

The pickle module implements binary protocols for serializing and de-serializing a Python object structure.

The preferred way to serialize Python objects.
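
As a small illustration of the protocols mentioned above (a sketch that is not part of the original notebook; the `sample` value is made up for the example), the `protocol` argument of `pickle.dumps()` selects the binary format, and `pickle.loads()` detects it on its own:

```python
import pickle

# Hypothetical sample value, used only for this illustration.
sample = {"name": "example", "values": [1, 2.0, 3]}

# Serialize with every protocol this interpreter supports and
# check that each one round-trips to an equal object.
sizes = {}
for protocol in range(pickle.HIGHEST_PROTOCOL + 1):
    blob = pickle.dumps(sample, protocol=protocol)
    assert pickle.loads(blob) == sample
    sizes[protocol] = len(blob)

print("default protocol:", pickle.DEFAULT_PROTOCOL)
print("bytes per protocol:", sizes)
```

The exact sizes and the default protocol depend on the Python version, so no output is shown here.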

### Pickling Basics

In [45]:
import pickle

# An arbitrary collection of objects supported by pickle.
data = {
    'a': [1, 2.0, 3, 4+6j],
    'b': ("character string", b"byte string"),
    'c': {None, True, False}
}

canned_data = pickle.dumps(data)
print(canned_data)

b'\x80\x04\x95y\x00\x00\x00\x00\x00\x00\x00}\x94(\x8c\x01a\x94]\x94(K\x01G@\x00\x00\x00\x00\x00\x00\x00K\x03\x8c\x08builtins\x94\x8c\x07complex\x94\x93\x94G@\x10\x00\x00\x00\x00\x00\x00G@\x18\x00\x00\x00\x00\x00\x00\x86\x94R\x94e\x8c\x01b\x94\x8c\x10character string\x94C\x0bbyte string\x94\x86\x94\x8c\x01c\x94\x8f\x94(\x89\x88N\x90u.'


In [46]:
pickle.loads(canned_data)

{'a': [1, 2.0, 3, (4+6j)],
 'b': ('character string', b'byte string'),
 'c': {False, None, True}}

### Pickle Security

`pickle` security warning at docs.python.org:

> **Warning:** The pickle module **is not secure**. Only unpickle data you trust.<br><br>
It is possible to construct malicious pickle data which will **execute arbitrary code during unpickling**. Never unpickle data that could have come from an untrusted source, or that could have been tampered with.

### Pickle Code Execution

Pickle can also import modules and call functions while unpickling:

In [47]:
# This will call OS command `echo hello world`
import pickle
pickle.loads(b"cos\nsystem\n(S'echo hello world'\ntR.")

# For demo purposes we have to run the code with
# a system interpreter to capture its std output
!python ../src/pickle_code_execution.py

hello world
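
To see why the payload above runs a command, it can help to look at its opcodes without executing them. The following is a minimal sketch using the standard `pickletools` module; the variable names are chosen for illustration. It is meant for inspection only and should not be treated as a security filter, since newer protocols encode imports differently (for example with `STACK_GLOBAL`).

```python
import pickletools

payload = b"cos\nsystem\n(S'echo hello world'\ntR."

# genops() parses the opcode stream without executing it.
opcodes = [(op.name, arg) for op, arg, _pos in pickletools.genops(payload)]
for name, arg in opcodes:
    print(name, arg)

# GLOBAL imports a callable by name and REDUCE calls it:
# together they are what turns data into code execution.
suspicious = [arg for name, arg in opcodes if name == "GLOBAL"]
print("imported globals:", suspicious)
```

Here `GLOBAL` names `os system`, the `STRING` and `TUPLE` opcodes build the argument tuple, and `REDUCE` performs the call.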


### Secure `pickle` Usage

If you are brave enough and need to use pickle, ensure the data comes from a trusted source and can't be altered on its way:

1. Encrypted network connection or cryptographic signature
2. Strict permissions if loading from disk
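
The second point can be sketched roughly as follows. This is an illustration under stated assumptions rather than a complete defense: it assumes a POSIX system (it relies on `os.getuid()`), and `load_private_pickle` is a hypothetical helper name, not a library function.

```python
import os
import pickle
import stat
import tempfile


def load_private_pickle(path):
    """Unpickle `path` only if the current user owns it and
    nobody else is allowed to write to it (POSIX only)."""
    with open(path, "rb") as f:
        # Inspect the already-opened file so it cannot be swapped
        # between the check and the read.
        info = os.fstat(f.fileno())
        if info.st_uid != os.getuid():
            raise PermissionError(f"{path} is not owned by the current user")
        if info.st_mode & (stat.S_IWGRP | stat.S_IWOTH):
            raise PermissionError(f"{path} is writable by group or others")
        return pickle.load(f)


# Usage: a file with mode 0o600 loads; after chmod 0o666 it is refused.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "data.pkl")
    with open(path, "wb") as f:
        pickle.dump({"k": 1}, f)
    os.chmod(path, 0o600)
    loaded = load_private_pickle(path)
    os.chmod(path, 0o666)
    try:
        load_private_pickle(path)
        refused = False
    except PermissionError:
        refused = True

print(loaded, refused)
```

The check only covers the file itself; permissions on the parent directories matter as well.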

### Signing Serialized Data

In [48]:
import pickle
# An arbitrary collection of objects supported by pickle.
obj = {
    'a': [1, 2.0, 3, 4+6j],
    'b': ("character string", b"byte string"),
    'c': {None, True, False}
}
data = pickle.dumps(obj)

In [49]:
# assume both parties agreed on some random secret key for the session
import secrets
secret_random_key = secrets.token_bytes(32)

In [50]:
# send the pickled data along with its digest
import hmac
digest = hmac.new(secret_random_key, data, 'sha256').digest()
payload = digest + data
print(payload)

b'\xe1\x97\xa7`$1\x83\x1a\xdd\x98*\xcbxO-.\xb5\x1e&\xc2D7\xfdU\x17\xd4\x07cI*\xf8\xf6\x80\x04\x95y\x00\x00\x00\x00\x00\x00\x00}\x94(\x8c\x01a\x94]\x94(K\x01G@\x00\x00\x00\x00\x00\x00\x00K\x03\x8c\x08builtins\x94\x8c\x07complex\x94\x93\x94G@\x10\x00\x00\x00\x00\x00\x00G@\x18\x00\x00\x00\x00\x00\x00\x86\x94R\x94e\x8c\x01b\x94\x8c\x10character string\x94C\x0bbyte string\x94\x86\x94\x8c\x01c\x94\x8f\x94(\x89\x88N\x90u.'


In [51]:
# receive the payload and verify the digest before using the data
def unpickle_payload(payload):
    rcv_digest = payload[:32]
    rcv_data = payload[32:]
    expected_digest = hmac.new(secret_random_key,
                               rcv_data,
                               'sha256').digest()
    if not secrets.compare_digest(rcv_digest, expected_digest):
        raise ValueError('Integrity check failed')
    return pickle.loads(rcv_data)

unpickle_payload(payload)

{'a': [1, 2.0, 3, (4+6j)],
 'b': ('character string', b'byte string'),
 'c': {False, None, True}}

In [52]:
# check that an unauthorized modification is detected
unpickle_payload(payload + b'hack')

ValueError: Integrity check failed

### Restricted Unpickler

Limit unpickler to 'safe' objects to mitigate risk of code execution.

> Restricted unpickler does not mitigate all security risks with `pickle.load()`.


In the end this will not fully protect you. A dedicated attacker can always find a way.
Avoid using pickle. Consider JSON or protobuf as more secure alternatives.

[python-can-i-safely-unpickle-untrusted-data](https://stackoverflow.com/questions/25353753/python-can-i-safely-unpickle-untrusted-data)

[Reverse engineering pickle](https://hackmd.io/@2KUYNtTcQ7WRyTsBT7oePg/BycZwjKNX?print-pdf#/)

In [53]:
import builtins
import pickle

safe_builtins = {
    'range',
    'complex',
    'set',
    'frozenset',
    'slice',
}

class RestrictedUnpickler(pickle.Unpickler):

    def find_class(self, module, name):
        # Only allow safe classes from builtins.
        if module == "builtins" and name in safe_builtins:
            return getattr(builtins, name)
        # Forbid everything else.
        raise pickle.UnpicklingError("global '%s.%s' is forbidden" %
                                     (module, name))

In [54]:
import io
def restricted_loads(s):
    """Helper function analogous to pickle.loads()."""
    return RestrictedUnpickler(io.BytesIO(s)).load()

In [55]:
# try loading builtins only
obj = {
    'a': [1, 2.0, 3, 4+6j],
    'b': ("character string", b"byte string"),
    'c': {None, True, False}
}

restricted_loads(pickle.dumps(obj))

{'a': [1, 2.0, 3, (4+6j)],
 'b': ('character string', b'byte string'),
 'c': {False, None, True}}

In [56]:
# try loading dangerous code
restricted_loads(b"cos\nsystem\n(S'echo hello world'\ntR.")

UnpicklingError: global 'os.system' is forbidden
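
Where the data is simple enough, the JSON alternative mentioned above sidesteps the problem, because `json.loads()` only ever builds dicts, lists, strings, numbers, booleans and `None`. A minimal sketch follows; the `record` value is made up for the example and deliberately leaves out `complex`, `bytes` and `set`, which JSON cannot represent directly.

```python
import json

# Hypothetical record restricted to JSON-compatible types.
record = {
    "a": [1, 2.0, 3],
    "b": "character string",
    "c": [None, True, False],
}

text = json.dumps(record)
restored = json.loads(text)
print(restored)

# The pickle exploit string is simply not valid JSON.
try:
    json.loads("cos\nsystem\n(S'echo hello world'\ntR.")
    rejected = False
except json.JSONDecodeError:
    rejected = True
print("exploit rejected:", rejected)
```

The trade-off is that richer Python types need an explicit, application-defined encoding.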

### Numpy Loading

NumPy is vulnerable to remote code execution when loading arrays of objects, because object arrays are stored using pickle.

Prefer `allow_pickle=False` when saving and loading with NumPy.

In [57]:
import numpy

import io
outfile = io.BytesIO()

numpy.save(outfile, numpy.arange(10), allow_pickle=False)
_ = outfile.seek(0) # Only needed here to simulate closing & reopening file
print(numpy.load(outfile, allow_pickle=False))

[0 1 2 3 4 5 6 7 8 9]


### Demo CVE-2019-6446 Exploit

`allow_pickle` defaulted to `True` in `numpy.load()` before NumPy version 1.16.3:

In [58]:
# CVE-2019-6446 exploit
import numpy
import io
import pickle
import os

class Test(object):
    def __init__(self):
        self.a = 1

    def __reduce__(self):
        return (os.system, ('echo System command executed',))

malicious_object = Test()

# Create an npy file in memory from a plain pickle
npy_bytes = io.BytesIO(pickle.dumps(malicious_object))
# This line executes the OS command `echo` embedded in the numpy file
numpy.load(npy_bytes, allow_pickle=True)

# For demo purposes we have to run the code with
# a system interpreter to capture its std output
!python ../src/numpy_exploit.py

System command executed


> The `__reduce__()` method takes no argument and shall return either a string or preferably a tuple (the returned object is often referred to as the "reduce value"). ... When a tuple is returned, it must be between two and six items long. Optional items can either be omitted, or None can be provided as their value. The semantics of each item are in order:
>
> - A callable object that will be called to create the initial version of the object.
> - A tuple of arguments for the callable object. An empty tuple must be given if the callable does not accept any argument. ...

https://davidhamann.de/2020/04/05/exploiting-python-pickle/
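
The first two items of the reduce value can be seen in action with a harmless callable. In this sketch (the `Harmless` class is made up for illustration) the unpickler calls `sorted([3, 1, 2])`, and the result of that call replaces the original object:

```python
import pickle


class Harmless:
    def __reduce__(self):
        # (callable, args): the unpickler will call sorted([3, 1, 2]).
        return (sorted, ([3, 1, 2],))


blob = pickle.dumps(Harmless())
result = pickle.loads(blob)

# The unpickled value is whatever the callable returned,
# not a Harmless instance.
print(type(result), result)  # <class 'list'> [1, 2, 3]
```

Swapping `sorted` for `os.system` is all it takes to turn this into the exploit shown earlier, which is why the callable must never be under an attacker's control.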

## Yaml

### What is YAML?

YAML is a data serialization format.

It is easy to read and edit as plain text, so it is frequently used for configuration files.

### Yaml vs. JSON:

Yaml:

``` yaml
klocwork_linux:
  os:
    ubuntu18
klocwork_win:
  os:
    windows
```

JSON:

```javascript
{
  "klocwork_linux": {
    "os": "ubuntu18"
  },
  "klocwork_win": {
    "os": "windows"
  }
}
```

TODO: basic types

TODO: advanced python types

### Yaml Safe Loading

`yaml.load()` with an unsafe loader is as powerful as `pickle.load()`, and just as dangerous on untrusted input.

Prefer using `yaml.safe_load()` and `yaml.safe_dump()` when loading and saving yaml files.

YAML can construct arbitrary Python objects; `yaml.safe_load()` limits this to
simple Python objects like lists and integers.

In [59]:
import yaml
text = yaml.safe_dump({'a':1})
yaml.safe_load(text)

{'a': 1}

### Yaml Code Execution

Recent PyYAML versions confine most of these issues to `yaml.UnsafeLoader`.

In [60]:
import yaml
data = b"""!!python/object/new:os.system [echo EXPLOIT!]"""
# This line executes the OS command `echo` embedded in the yaml data
deserialized_data = yaml.load(data, Loader=yaml.UnsafeLoader)

# For demo purposes we have to run the code with
# a system interpreter to capture its std output
!python ../src/yaml_code_execution.py

EXPLOIT!


### Useful Links

[OWASP Deserialization Cheat Sheet](https://cheatsheetseries.owasp.org/cheatsheets/Deserialization_Cheat_Sheet.html)